An Introduction to Using Semgrex
Chloé Kiddon
What is Semgrex?
A Java utility (in JavaNLP) for identifying patterns in Stanford JavaNLP SemanticGraph structures
Much like Tregex (Levy and Andrew, 2006), which does this for tree structures and is based on tgrep-2 style syntax and functionality. (The structure of these slides is adapted from theirs.)
Used much the way regular expressions are used to find patterns in strings
[Figure: dependency graph for “Bob bought a red shirt”: bought -nsubj-> Bob, bought -dobj-> shirt, shirt -det-> a, shirt -amod-> red]
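Ex. {tag:/VB.*/} >dobj ({} >amod {lemma:red}) matches a verb whose direct object has an adjectival modifier with the lemma “red” (in the example graph, “bought”)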
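A toy sketch of what the example pattern checks, assuming a hypothetical mini graph: the Word and Edge records below are illustrative stand-ins, not JavaNLP classes, and regex values are treated as whole-string matches (an assumption of this sketch). Each nested loop mirrors one relation in the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: hand-codes what the pattern
//   {tag:/VB.*/} >dobj ({} >amod {lemma:red})
// checks over a hypothetical mini graph. Word, Edge and these helpers are
// not part of JavaNLP.
public class RedObjectExample {

    // A node: sentence position plus a few attributes.
    record Word(int index, String word, String tag, String lemma) {}

    // A labeled dependency edge from governor to dependent.
    record Edge(Word gov, String rel, Word dep) {}

    // All dependents of gov reached by an edge labeled rel.
    static List<Word> dependents(List<Edge> edges, Word gov, String rel) {
        List<Word> out = new ArrayList<>();
        for (Edge e : edges) {
            if (e.gov().equals(gov) && e.rel().equals(rel)) {
                out.add(e.dep());
            }
        }
        return out;
    }

    // Words w whose tag matches VB.* (whole-string match, assumed here) and
    // that have a dobj dependent which itself has an amod dependent whose
    // lemma is "red".
    static List<String> matchingWords(List<Word> words, List<Edge> edges) {
        List<String> result = new ArrayList<>();
        for (Word w : words) {
            if (!w.tag().matches("VB.*")) continue;                // {tag:/VB.*/}
            boolean found = false;
            for (Word obj : dependents(edges, w, "dobj")) {        // >dobj ({}
                for (Word mod : dependents(edges, obj, "amod")) {  // >amod
                    if (mod.lemma().equals("red")) found = true;   // {lemma:red})
                }
            }
            if (found) result.add(w.word());
        }
        return result;
    }

    // The graph for "Bob bought a red shirt".
    static List<String> exampleMatches() {
        Word bob = new Word(1, "Bob", "NNP", "Bob");
        Word bought = new Word(2, "bought", "VBD", "buy");
        Word a = new Word(3, "a", "DT", "a");
        Word red = new Word(4, "red", "JJ", "red");
        Word shirt = new Word(5, "shirt", "NN", "shirt");
        List<Word> words = List.of(bob, bought, a, red, shirt);
        List<Edge> edges = List.of(
                new Edge(bought, "nsubj", bob),
                new Edge(bought, "dobj", shirt),
                new Edge(shirt, "det", a),
                new Edge(shirt, "amod", red));
        return matchingWords(words, edges);
    }

    public static void main(String[] args) {
        System.out.println(exampleMatches()); // [bought]
    }
}
```

In JavaNLP a SemgrexPattern performs this search for you; the sketch only shows how the parenthesized subpattern constrains the dobj node before the outer verb node can match.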
Semgrex Overview
SemgrexPatterns are composed of nodes, representing IndexedWords, and relations between them, representing edges in a SemanticGraph
SemgrexMatchers can be used on a single SemanticGraph OR on two SemanticGraphs plus an Alignment between them
Ex. an RTE (recognizing textual entailment) problem has the hypothesis graph, the text graph, and the alignment from the hypothesis graph’s IndexedFeatureLabels to the text graph’s IndexedFeatureLabels
SemgrexPatterns return matches for IndexedFeatureLabels in a SemanticGraph
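To make the two-graph case concrete, here is a hypothetical sketch that models an alignment as a plain map from hypothesis words to text words, with unaligned hypothesis words pointing at a sentinel that plays the role of the empty word. The Word record, the NO_WORD sentinel, and the example sentences are all illustrative assumptions, not the JavaNLP Alignment class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an RTE-style alignment as a map from hypothesis
// words to text words. Not the JavaNLP Alignment API.
public class AlignmentSketch {

    // Minimal stand-in for a word node: sentence position plus surface form.
    record Word(int index, String word) {}

    // Hypothetical sentinel for "aligned to nothing" (the empty word).
    static final Word NO_WORD = new Word(-1, "#");

    // Hypothesis: "Bob purchased a cheap shirt".
    static final List<Word> HYPOTHESIS = List.of(
            new Word(1, "Bob"), new Word(2, "purchased"), new Word(3, "a"),
            new Word(4, "cheap"), new Word(5, "shirt"));

    // Text: "Bob bought a red shirt".
    static final List<Word> TEXT = List.of(
            new Word(1, "Bob"), new Word(2, "bought"), new Word(3, "a"),
            new Word(4, "red"), new Word(5, "shirt"));

    // Each hypothesis word maps to a text word or to NO_WORD.
    static Map<Word, Word> alignment() {
        Map<Word, Word> a = new LinkedHashMap<>();
        a.put(HYPOTHESIS.get(0), TEXT.get(0)); // Bob -> Bob
        a.put(HYPOTHESIS.get(1), TEXT.get(1)); // purchased -> bought
        a.put(HYPOTHESIS.get(2), TEXT.get(2)); // a -> a
        a.put(HYPOTHESIS.get(3), NO_WORD);     // cheap -> (nothing)
        a.put(HYPOTHESIS.get(4), TEXT.get(4)); // shirt -> shirt
        return a;
    }

    // Surface form of the text word aligned to a hypothesis word.
    static String alignedTo(String hypothesisWord) {
        for (Map.Entry<Word, Word> e : alignment().entrySet()) {
            if (e.getKey().word().equals(hypothesisWord)) {
                return e.getValue().word();
            }
        }
        throw new IllegalArgumentException("not in hypothesis: " + hypothesisWord);
    }

    // Hypothesis words that have no counterpart in the text.
    static List<String> unaligned() {
        return alignment().entrySet().stream()
                .filter(e -> e.getValue().equals(NO_WORD))
                .map(e -> e.getKey().word())
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(alignedTo("purchased")); // bought
        System.out.println(unaligned());            // [cheap]
    }
}
```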
Syntax - Nodes
Nodes are represented as {attr1:value1;attr2:value2;…}
Attributes are plain strings; values can be strings or regular expressions enclosed in slashes (“/”)
{lemma:run;pos:/VB.*/} => any verb form of the word “run”
{} is any node in the graph
{$} is any root in the graph
{#} is the empty word (IndexedFeatureLabel.NO_WORD)
Comes up when working with alignments
Descriptions can be negated with !
!{lemma:boy} => any word whose lemma isn’t “boy”
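The node rules above can be sketched as a tiny evaluator for a single node description. This is a hypothetical toy, not the JavaNLP parser: a word is modeled as a map from attribute names to values, root status is passed in as a flag, {#} and grouping are not handled, regexes containing “;” would break the naive split, and /regex/ values are assumed to match the whole attribute value.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical toy evaluator for one Semgrex-style node description.
// Handles {}, {$}, {attr:value;...} with /regex/ values, and a leading !.
public class NodeDescription {

    // Checks whether a word (attribute name -> value) satisfies desc.
    static boolean matches(String desc, Map<String, String> word, boolean isRoot) {
        boolean negated = desc.startsWith("!");
        if (negated) {
            desc = desc.substring(1);
        }
        if (!desc.startsWith("{") || !desc.endsWith("}")) {
            throw new IllegalArgumentException("not a node description: " + desc);
        }
        String body = desc.substring(1, desc.length() - 1).trim();
        boolean result;
        if (body.isEmpty()) {
            result = true;    // {} matches any node
        } else if (body.equals("$")) {
            result = isRoot;  // {$} matches only a root
        } else {
            result = true;    // every attr:value pair must hold (AND)
            for (String constraint : body.split(";")) {
                if (constraint.isBlank()) {
                    continue;
                }
                int colon = constraint.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("missing ':' in " + constraint);
                }
                String attr = constraint.substring(0, colon).trim();
                String value = constraint.substring(colon + 1).trim();
                String actual = word.get(attr);
                if (actual == null || !valueMatches(value, actual)) {
                    result = false;
                    break;
                }
            }
        }
        return negated ? !result : result;
    }

    // A /.../ value is a regex (whole-string match assumed); otherwise exact.
    static boolean valueMatches(String value, String actual) {
        if (value.length() >= 2 && value.startsWith("/") && value.endsWith("/")) {
            String regex = value.substring(1, value.length() - 1);
            return Pattern.matches(regex, actual);
        }
        return value.equals(actual);
    }

    // Builds a toy word with word, pos and lemma attributes.
    static Map<String, String> word(String form, String pos, String lemma) {
        return Map.of("word", form, "pos", pos, "lemma", lemma);
    }

    public static void main(String[] args) {
        String runVerb = "{lemma:run;pos:/VB.*/}";
        System.out.println(matches(runVerb, word("running", "VBG", "run"), false));      // true
        System.out.println(matches(runVerb, word("run", "NN", "run"), false));           // false
        System.out.println(matches("!{lemma:boy}", word("girl", "NN", "girl"), false));  // true
    }
}
```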
Grouping Nodes
Perhaps you want a node that is either a word with the NER tag TIME or a word with the lemma “when”. The node {ner:TIME;lemma:when} does not accomplish this OR operation, since it requires both attributes to match
Can use brackets and | (or &) to specify these groupings
[ {lemma:locate} | {ner:LOCATION} ]
A node that is either a word with the lemma “locate” or a word with the NER tag LOCATION
Groups can also be negated by putting a ! in front
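By default, & takes precedence over |, but there is little reason to use &, since listing several attributes in one node (e.g. {ner:TIME;lemma:when}) already requires all of them to match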
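The grouping rules can be mirrored with java.util.function.Predicate composition: and for attributes listed in one node, or for |, negate for a leading !, and explicit nesting to show & binding tighter than |. This is a hypothetical sketch that builds the boolean structure by hand; it does not parse Semgrex syntax, and the attribute values (such as ner O or the tags) are illustrative assumptions.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: node groupings modeled as composed Predicates over
// attribute maps. Hand-built boolean structure, not JavaNLP code; each
// comment shows the bracketed pattern the predicate mirrors.
public class NodeGroups {

    // A one-attribute node description, e.g. {ner:TIME}.
    static Predicate<Map<String, String>> attr(String key, String value) {
        return w -> value.equals(w.get(key));
    }

    // {ner:TIME;lemma:when}: both attributes must match.
    static final Predicate<Map<String, String>> BOTH =
            attr("ner", "TIME").and(attr("lemma", "when"));

    // [ {ner:TIME} | {lemma:when} ]: either attribute may match.
    static final Predicate<Map<String, String>> EITHER =
            attr("ner", "TIME").or(attr("lemma", "when"));

    // ![ {lemma:locate} | {ner:LOCATION} ]: the whole group negated.
    static final Predicate<Map<String, String>> NOT_PLACE =
            attr("lemma", "locate").or(attr("ner", "LOCATION")).negate();

    // [ {lemma:locate} | {ner:LOCATION} & {tag:NNP} ] with & binding
    // tighter than |: locate OR (LOCATION AND NNP).
    static final Predicate<Map<String, String>> PRECEDENCE =
            attr("lemma", "locate").or(attr("ner", "LOCATION").and(attr("tag", "NNP")));

    // The other possible reading, (locate OR LOCATION) AND NNP, for contrast.
    static final Predicate<Map<String, String>> OTHER_GROUPING =
            attr("lemma", "locate").or(attr("ner", "LOCATION")).and(attr("tag", "NNP"));

    // Builds a toy word with word, lemma, ner and tag attributes.
    static Map<String, String> word(String form, String lemma, String ner, String tag) {
        return Map.of("word", form, "lemma", lemma, "ner", ner, "tag", tag);
    }

    public static void main(String[] args) {
        Map<String, String> when = word("when", "when", "O", "WRB");
        Map<String, String> located = word("located", "locate", "O", "VBN");
        System.out.println(BOTH.test(when));              // false
        System.out.println(EITHER.test(when));            // true
        System.out.println(PRECEDENCE.test(located));     // true
        System.out.println(OTHER_GROUPING.test(located)); // false
    }
}
```

The last two lines show why precedence matters: “located” satisfies the lemma branch on its own under the default reading, but fails once the | is grouped first and the NNP tag becomes mandatory.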