Integrations
Because trrex builds a regular expression pattern, its output can be used with any library that accepts a regular expression.
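The same applies to Python's built-in re module. The sketch below is only an illustration: so that it runs without extra installs, it uses a hand-written stand-in pattern (an assumption, not necessarily what trrex generates) where a real script would call tx.make(["dog", "fox"]).

```python
import re

# Hand-written stand-in for the string returned by tx.make(["dog", "fox"]).
# The pattern trrex actually generates may look different.
pattern = r"\b(?:dog|fox)\b"

text = "The quick brown fox jumps over the lazy dog"

# Compile once and reuse, exactly as with any other regular expression.
compiled = re.compile(pattern)
matches = compiled.findall(text)
print(matches)
```

Since the pattern is just a string, nothing about it is specific to trrex once it has been built.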
Working with pandas
In [1]: import trrex as tx
In [2]: import pandas as pd
In [3]: df = pd.DataFrame(
   ...:     ["The quick brown fox", "jumps over", "the lazy dog"], columns=["text"]
   ...: )
   ...:
In [4]: pattern = tx.make(["dog", "fox"])
In [5]: df["text"].str.contains(pattern)
Out[5]:
0 True
1 False
2 True
Name: text, dtype: bool
As the example above shows, the pattern works with any pandas function that accepts a regular expression.
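For instance, the same pattern string could be passed to other string methods such as str.findall, str.count, and str.replace. This is a minimal sketch, assuming a hand-written stand-in pattern in place of the tx.make(["dog", "fox"]) output; the pattern trrex actually produces may differ.

```python
import pandas as pd

# Hand-written stand-in for tx.make(["dog", "fox"]) (an assumption).
pattern = r"\b(?:dog|fox)\b"

df = pd.DataFrame(
    ["The quick brown fox", "jumps over", "the lazy dog"], columns=["text"]
)

# List every match per row.
found = df["text"].str.findall(pattern)

# Count matches per row.
counts = df["text"].str.count(pattern)

# Replace each match; regex=True makes pandas treat the pattern as a regex.
replaced = df["text"].str.replace(pattern, "ANIMAL", regex=True)

print(found.tolist())
print(counts.tolist())
print(replaced.tolist())
```

Methods that return captured groups, such as str.extract, need at least one capture group, so the pattern would have to be wrapped in parentheses first.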
Efficient gazetteer for spaCy
trrex can be combined with spaCy's EntityRuler to build a gazetteer:
In [6]: import trrex as tx
In [7]: from spacy.lang.en import English
In [8]: nlp = English()
In [9]: ruler = nlp.add_pipe("entity_ruler")
In [10]: patterns = [
   ....:     {
   ....:         "label": "ORG",
   ....:         "pattern": [
   ....:             {"TEXT": {"REGEX": tx.make(["Amazon", "Apple", "Netflix", "Netlify"])}}
   ....:         ],
   ....:     },
   ....:     {"label": "GPE", "pattern": [{"LOWER": "san"}, {"LOWER": "francisco"}]},
   ....: ]
   ....:
In [11]: ruler.add_patterns(patterns)
In [12]: doc = nlp("Netflix HQ is in Los Gatos.")
In [13]: [(ent.text, ent.label_) for ent in doc.ents]
Out[13]: [('Netflix', 'ORG')]
Fuzzy matching with regex
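We can take advantage of the fuzzy matching offered by the regex module. Here the suffix {1<=e<=2} permits at least one and at most two errors (insertions, deletions, or substitutions), and regex.BESTMATCH asks for the best match rather than the first one: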
In [14]: import regex
In [15]: import trrex as tx
In [16]: pattern = tx.make(
   ....:     ["monkey", "monster", "dog", "cat"], prefix="", suffix=r"{1<=e<=2}"
   ....: )
   ....:
In [17]: regex.search(pattern, "This is really a master dag", regex.BESTMATCH)
Out[17]: <regex.Match object; span=(24, 27), match='dag', fuzzy_counts=(1, 0, 0)>