By default, trrex uses word boundaries
"(\b)" to delimit keywords. This can be problematic if the words contain
punctuation symbols, because \b only matches between a word character and a
non-word character. In those cases you can pass a custom prefix and suffix:

    In : import trrex as tx
    In : import re
    In : emoticons = [":)", ":D", ":("]
    In : pattern = tx.make(emoticons, prefix=r"(?<!\w)", suffix=r"(?!\w)")
    In : result = re.findall(pattern, "The smile :), and the laugh :D and the sad :(")
    In : result
    Out: [':)', ':D', ':(']

In the above example the parentheses need no escaping because they are inside a character set:

    In : pattern
    Out: '(?<!\\w):[D)(](?!\\w)'

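As a quick check of both points, the sketch below uses only the standard re module. The lookaround pattern is copied by hand from the output above rather than generated by trrex, and the \b-delimited variant is a hypothetical pattern written for comparison; the variable names are illustrative.

```python
import re

text = "The smile :), and the laugh :D and the sad :("

# Pattern copied from the output above; ')' and '(' are literal inside [...].
lookaround_pattern = r"(?<!\w):[D)(](?!\w)"

# Hypothetical variant that delimits the same body with word boundaries.
boundary_pattern = r"\b:[D)(]\b"

lookaround_matches = re.findall(lookaround_pattern, text)
boundary_matches = re.findall(boundary_pattern, text)

print(lookaround_matches)  # all three emoticons are found
print(boundary_matches)    # nothing is found
```

The \b variant finds nothing here because every emoticon is preceded by a space, and there is no word boundary between a space and ":".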
In general, however, regex metacharacters need to be escaped in order to match them literally:

    In : words = ["bab.y", "b#ad", "b?at"]
    In : pattern = tx.make(map(re.escape, words))
    In : pattern
    Out: '\\bb(?:ab\\.y|\\(?:?at|#ad))\\b'

Notice that you need to use re.escape for each character of the string in order to work properly with trrex.
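To see why escaping matters, here is a minimal sketch that uses only the standard re module. It joins the escaped words into a plain alternation, which is an assumption made for illustration and not the pattern trrex builds; the sample sentence and variable names are likewise hypothetical.

```python
import re

words = ["bab.y", "b#ad", "b?at"]

# Unescaped, '.' matches any character and '?' makes the 'b' optional.
unescaped_dot = re.fullmatch("bab.y", "babxy")      # matches, which is not wanted
unescaped_question = re.fullmatch("b?at", "b?at")   # does not match itself

# Escaped, every word matches exactly itself.
escaped = [re.escape(word) for word in words]
self_matches = [re.fullmatch(e, w) is not None for e, w in zip(escaped, words)]

# Plain alternation of the escaped words (illustrative, not trrex output).
alternation = r"\b(?:" + "|".join(escaped) + r")\b"
found = re.findall(alternation, "a bab.y and a b?at but not babxy")

print(found)
```

With the escaped words, "babxy" is no longer picked up, while "bab.y" and "b?at" are matched literally.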