Advanced usage
By default, trrex uses word boundaries (\b) to delimit keywords.
This can be problematic if the words contain punctuation symbols.
In those cases, you can pass a custom prefix and suffix:
In [1]: import trrex as tx
In [2]: import re
In [3]: emoticons = [":)", ":D", ":("]
In [4]: pattern = tx.make(emoticons, prefix=r"(?<!\w)", suffix=r"(?!\w)")
In [5]: result = re.findall(pattern, "The smile :), and the laugh :D and the sad :(")
In [6]: result
Out[6]: [':)', ':D', ':(']
In the above example, the parentheses need no escaping because they are inside a character set:
In [7]: pattern
Out[7]: '(?<!\\w):[D)(](?!\\w)'
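To see why the default word boundaries fall short for the emoticons above, the following sketch compares the two kinds of delimiters using only the standard re module. The character-set core is copied from the pattern shown above; wrapping it in \b by hand is an assumption made for illustration, not output produced by trrex.

```python
import re

text = "The smile :), and the laugh :D and the sad :("

# Core of the pattern shown above, without any delimiters.
core = r":[D)(]"

# \b only matches between a word character and a non-word character.
# Every emoticon here starts with ":" right after a space, so \b never matches.
with_word_boundaries = re.findall(r"\b" + core + r"\b", text)

# The lookarounds only require that no word character is adjacent.
with_lookarounds = re.findall(r"(?<!\w)" + core + r"(?!\w)", text)

print(with_word_boundaries)  # []
print(with_lookarounds)      # [':)', ':D', ':(']
```

The lookaround version still rejects emoticon-like text embedded in a word, since a neighbouring word character makes the lookbehind or lookahead fail.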
In general, however, regex metacharacters must be escaped in order to match them literally:
In [8]: words = ["bab.y", "b#ad", "b?at"]
In [9]: pattern = tx.make(map(re.escape, words))
In [10]: pattern
Out[10]: '\\bb(?:ab\\.y|\\(?:?at|#ad))\\b'
Note that re.escape must be applied to each string before it is passed to trrex for the resulting pattern to work properly.
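The effect of escaping can be checked with the standard re module alone. The sketch below joins the words into a plain alternation by hand, as a stand-in for a generated pattern and not as a reproduction of trrex output; the sample text is made up for illustration.

```python
import re

words = ["bab.y", "b#ad", "b?at"]
text = "bab.y babxy b#ad b?at bat"

# Hand-joined alternations (illustrative only, not generated by trrex).
unescaped = "|".join(words)
escaped = "|".join(map(re.escape, words))

# Unescaped, "." matches any character and "?" makes the preceding "b" optional.
unescaped_matches = re.findall(unescaped, text)

# Escaped, every metacharacter is matched literally.
escaped_matches = re.findall(escaped, text)

print(unescaped_matches)  # ['bab.y', 'babxy', 'b#ad', 'at', 'bat']
print(escaped_matches)    # ['bab.y', 'b#ad', 'b?at']
```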