Tutorials
Advanced usage
By default, trrex uses word boundaries ("\b") to delimit keywords.
This can be problematic if the words contain punctuation symbols. In
those cases you can do the following:
import trrex as tx
import re
emoticons = [":)", ":D", ":("]
pattern = tx.make(emoticons,
                  prefix=r"(?<!\w)",
                  suffix=r"(?!\w)")
result = re.findall(pattern, "The smile :), and the laugh :D and the sad :(")
In the example above the parentheses need no escaping because they end up inside a character set.
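To see why the default word boundaries fail for the emoticons, the two kinds of delimiter can be compared using only the standard re module. This is a minimal sketch: the alternation below is written by hand as a stand-in (an assumption) for whatever pattern trrex generates, and only the delimiters around it matter here.

```python
import re

text = "The smile :), and the laugh :D and the sad :("

# Hand-written stand-in for the generated pattern (assumption, not trrex output).
alternation = r"(?::\)|:D|:\()"

# "\b" needs a word character on exactly one side of the position.
# Between a space and ":" there is none, so no emoticon can match.
with_boundaries = re.findall(r"\b" + alternation + r"\b", text)

# The lookarounds only require that no word character is adjacent.
with_lookarounds = re.findall(r"(?<!\w)" + alternation + r"(?!\w)", text)
```

Here with_boundaries comes back empty, while with_lookarounds contains all three emoticons.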
In general, however, regex metacharacters need to be escaped in order to match them literally:
words = ["bab.y", "b#ad", "b?at"]
# apply re.escape to each character of each word
pattern = tx.make(tuple(map(re.escape, word)) for word in words)
Notice that you need to apply re.escape to each character of each string for the resulting pattern to work properly with trrex.
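The effect of escaping can be checked with the standard re module alone. The sketch below first shows what goes wrong with an unescaped "?" and then what the per-character escaping from the snippet above produces for each word. It does not call trrex, so how trrex consumes these tuples internally is not shown.

```python
import re

words = ["bab.y", "b#ad", "b?at"]

# Unescaped, "?" is a quantifier: the pattern "b?at" only matches "at" here.
unescaped = re.search("b?at", "b?at").group()

# Escaped, the same pattern matches the literal text.
escaped = re.search(re.escape("b?at"), "b?at").group()

# Per-character escaping: each word becomes a tuple whose elements are
# single characters, with metacharacters already escaped.
per_char = [tuple(map(re.escape, word)) for word in words]
```

For the last word, per_char holds the tuple ('b', '\\?', 'a', 't'), so the escaped "?" travels as one element instead of being split into a backslash and a question mark.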
How not to use it
The code below makes a separate pattern for each word and hence does not take advantage of trrex. It offers no performance benefit over a standard Python string search.
import trrex as tx
import re
text = "The bad bat scared the baby"
words = ["bad", "baby", "bat"]
for word in words:
    pattern = tx.make([word])
    match = re.search(pattern, text)
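For contrast, here is a minimal sketch of the intended usage, where one pattern covers all the words and the text is scanned once. To keep the block runnable without trrex installed, the combined pattern is written by hand as an assumption of what a trie-shaped pattern could look like. In practice it would come from a single call such as tx.make(words), and the exact string trrex generates may differ.

```python
import re

text = "The bad bat scared the baby"
words = ["bad", "baby", "bat"]

# Hand-written, trie-shaped pattern (assumption): the shared prefix "ba"
# is factored out. With trrex this would come from one tx.make(words) call.
combined = r"\bba(?:by|d|t)\b"

# A single pass over the text finds every keyword.
matches = re.findall(combined, text)
```

All three keywords are found in one scan, in the order they appear in the text.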