Integration with other libraries

Because trrex builds a regular expression pattern, its output can be used with any library that accepts a regular expression.
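As a minimal sketch of that idea, the snippet below passes a pattern to the standard-library re module. To keep it runnable without trrex installed, the pattern is a hand-written stand-in for what a call such as tx.make(["dog", "fox"]) might return; the exact string trrex generates is an assumption here and may differ.

```python
import re

# Hand-written stand-in for a trrex-generated pattern (assumption: the
# real output of tx.make(["dog", "fox"]) may be structured differently).
pattern = r"\b(?:dog|fox)\b"

# The pattern is an ordinary string, so it can be compiled once and reused.
compiled = re.compile(pattern)

# Collect every whole-word occurrence of the listed words.
found = compiled.findall("The quick brown fox jumps over the lazy dog")

# The word boundaries keep the pattern from matching inside longer words.
inside_word = compiled.search("hotdogs")
```

Here found holds the matched words in order of appearance, and inside_word is None because "dog" only occurs inside a longer word.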

Working with pandas

In [1]: import trrex as tx

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(
   ...:     ["The quick brown fox", "jumps over", "the lazy dog"], columns=["text"]
   ...: )
   ...: 

In [4]: pattern = tx.make(["dog", "fox"])

In [5]: df["text"].str.contains(pattern)
Out[5]: 
0     True
1    False
2     True
Name: text, dtype: bool

As the example above shows, the pattern works with any pandas function that accepts a regular expression.
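The same holds for other pandas string methods. The sketch below uses Series.str.findall and Series.str.replace on the DataFrame from the example above. The pattern is again a hand-written stand-in for tx.make(["dog", "fox"]), an assumption made so the snippet runs without trrex.

```python
import pandas as pd

df = pd.DataFrame(
    ["The quick brown fox", "jumps over", "the lazy dog"], columns=["text"]
)

# Hand-written stand-in for tx.make(["dog", "fox"]); the string trrex
# actually generates may differ.
pattern = r"\b(?:dog|fox)\b"

# Series.str.findall returns the list of matches found in each row.
matches = df["text"].str.findall(pattern)

# Series.str.replace treats the first argument as a regex when regex=True.
masked = df["text"].str.replace(pattern, "<animal>", regex=True)
```

Rows without a match produce an empty list in matches and are left unchanged in masked.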

Efficient gazetteer for spaCy

trrex can be used together with the spaCy EntityRuler to build a gazetteer, that is, a list of known entity names to match in text:

In [6]: import trrex as tx

In [7]: from spacy.lang.en import English

In [8]: nlp = English()

In [9]: ruler = nlp.add_pipe("entity_ruler")

In [10]: patterns = [
   ....:     {
   ....:         "label": "ORG",
   ....:         "pattern": [
   ....:             {"TEXT": {"REGEX": tx.make(["Amazon", "Apple", "Netflix", "Netlify"])}}
   ....:         ],
   ....:     },
   ....:     {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
   ....: ]
   ....: 

In [11]: ruler.add_patterns(patterns)

In [12]: doc = nlp("Netflix HQ is in Los Gatos.")

In [13]: [(ent.text, ent.label_) for ent in doc.ents]
Out[13]: [('Netflix', 'ORG')]

Fuzzy matching with regex

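trrex patterns can also take advantage of the fuzzy matching provided by the third-party regex module. In the example below, the suffix {1<=e<=2} is regex's fuzzy-matching syntax for permitting at least one and at most two errors: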

In [14]: import regex

In [15]: import trrex as tx

In [16]: pattern = tx.make(
   ....:     ["monkey", "monster", "dog", "cat"], prefix="", suffix=r"{1<=e<=2}"
   ....: )
   ....: 

In [17]: regex.search(pattern, "This is really a master dag", regex.BESTMATCH)
Out[17]: <regex.Match object; span=(24, 27), match='dag', fuzzy_counts=(1, 0, 0)>
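The fuzzy_counts attribute of the match reports the number of substitutions, insertions and deletions, in that order. The sketch below unpacks it for a simpler case. The pattern is a hand-written stand-in for a trrex pattern built with an empty prefix and a fuzzy suffix (an assumption, not trrex's actual output), and it uses {e<=1}, which permits at most one error.

```python
import regex

# Hand-written stand-in for a trrex pattern with prefix="" and a fuzzy
# suffix; {e<=1} permits at most one error of any kind.
pattern = r"(?:dog|cat){e<=1}"

# BESTMATCH asks regex to look for the best fuzzy match rather than
# stopping at the first acceptable one.
match = regex.search(pattern, "This is really a dag", regex.BESTMATCH)

# fuzzy_counts is (substitutions, insertions, deletions).
substitutions, insertions, deletions = match.fuzzy_counts
total_errors = substitutions + insertions + deletions
```

Checking total_errors after the search is one way to rank or filter fuzzy hits, for example to prefer exact matches over approximate ones.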