Advanced usage

By default, trrex uses word boundaries ("\b") to delimit keywords. This can be a problem when the keywords begin or end with punctuation, because "\b" only matches between a word character and a non-word character. For those cases, you can supply your own prefix and suffix:

In [1]: import trrex as tx

In [2]: import re

In [3]: emoticons = [":)", ":D", ":("]

In [4]: pattern = tx.make(emoticons, prefix=r"(?<!\w)", suffix=r"(?!\w)")

In [5]: result = re.findall(pattern, "The smile :), and the laugh :D and the sad :(")

In [6]: result
Out[6]: [':)', ':D', ':(']
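To see why the default delimiters fail on this input, the following minimal sketch uses only the standard re module. The core pattern is written by hand for illustration (it is not produced by trrex here), and the variable names are hypothetical.

```python
import re

text = "The smile :), and the laugh :D and the sad :("

# Hand-written core pattern for the three emoticons (an assumption for this
# sketch; trrex is not called here).
core = r":[D)(]"

# Default-style delimiters: \b needs a word character on one side, but every
# emoticon is preceded by a space and starts with ":", so nothing matches.
with_word_boundaries = r"\b" + core + r"\b"

# Lookaround delimiters: only require that no word character is adjacent.
with_lookarounds = r"(?<!\w)" + core + r"(?!\w)"

boundary_matches = re.findall(with_word_boundaries, text)
lookaround_matches = re.findall(with_lookarounds, text)

print(boundary_matches)    # []
print(lookaround_matches)  # [':)', ':D', ':(']
```

The lookarounds are zero-width, so they constrain the surrounding characters without becoming part of the match.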

In the example above, the parentheses need no escaping because they end up inside a character set:

In [7]: pattern
Out[7]: '(?<!\\w):[D)(](?!\\w)'
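The difference between a parenthesis inside and outside a character set can be checked with the standard re module alone. This is a small sketch with hypothetical variable names, not part of trrex.

```python
import re

# Inside [...] the parentheses are ordinary literal characters.
char_set = re.compile(r":[D)(]")
set_matches = char_set.findall(":) :D :(")
print(set_matches)  # [':)', ':D', ':(']

# Outside a character set, an unescaped ")" makes the pattern invalid.
try:
    re.compile(r":)")
    unescaped_is_valid = True
except re.error as exc:
    unescaped_is_valid = False
    print("invalid pattern:", exc)

# Escaping the parenthesis makes it a literal again.
escaped_matches = re.findall(r":\)", ":) :D :(")
print(escaped_matches)  # [':)']
```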

In general, however, regex metacharacters need to be escaped in order to match them literally:

In [8]: words = ["bab.y", "b#ad", "b?at"]

In [9]: pattern = tx.make(map(re.escape, words))

In [10]: pattern
Out[10]: '\\bb(?:ab\\.y|\\(?:?at|#ad))\\b'

Note that re.escape must be applied to every string (it escapes each metacharacter in it) for the pattern to work properly with trrex.
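The effect of escaping can be illustrated without trrex by joining the words into a plain alternation. This is only a sketch under that assumption: the alternation is built by hand and is not the pattern trrex generates, and the sample text is made up.

```python
import re

words = ["bab.y", "b#ad", "b?at"]
text = "bab.y babxy b#ad b?at bat"

# Hand-built alternation (an assumption for illustration, not trrex output).
unescaped = r"\b(?:" + "|".join(words) + r")\b"
escaped = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"

unescaped_matches = re.findall(unescaped, text)
escaped_matches = re.findall(escaped, text)

# Without escaping, "." matches any character and "?" makes the "b" optional,
# so unintended strings such as "babxy" and "bat" are matched.
print(unescaped_matches)

# With re.escape, only the literal keywords are matched.
print(escaped_matches)  # ['bab.y', 'b#ad', 'b?at']
```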