Commit a4c43e5

Allow Matcher to match on ENT_ID and ENT_KB_ID (explosion#9688)

* Added ENT_ID and ENT_KB_ID into the list of the attributes that Matcher matches on
* Added ENT_ID and ENT_KB_ID to TEST_PATTERNS in test_pattern_validation.py. Disabled tests that I added before
* Update website/docs/api/matcher.md
* Format
* Remove skipped tests

Co-authored-by: Adriane Boyd <adrianeboyd@gmail.com>

1 parent 7fec5fd commit a4c43e5

3 files changed: +8 -0 lines changed

spacy/schemas.py

Lines changed: 2 additions & 0 deletions
@@ -222,6 +222,8 @@ class TokenPattern(BaseModel):
     lemma: Optional[StringValue] = None
     shape: Optional[StringValue] = None
     ent_type: Optional[StringValue] = None
+    ent_id: Optional[StringValue] = None
+    ent_kb_id: Optional[StringValue] = None
     norm: Optional[StringValue] = None
     length: Optional[NumberValue] = None
     spacy: Optional[StrictBool] = None

spacy/tests/matcher/test_pattern_validation.py

Lines changed: 4 additions & 0 deletions
@@ -22,6 +22,8 @@
     ([{"TEXT": {"VALUE": "foo"}}], 2, 0),  # prev: (1, 0)
     ([{"IS_DIGIT": -1}], 1, 0),
     ([{"ORTH": -1}], 1, 0),
+    ([{"ENT_ID": -1}], 1, 0),
+    ([{"ENT_KB_ID": -1}], 1, 0),
     # Good patterns
     ([{"TEXT": "foo"}, {"LOWER": "bar"}], 0, 0),
     ([{"LEMMA": {"IN": ["love", "like"]}}, {"POS": "DET", "OP": "?"}], 0, 0),
@@ -33,6 +35,8 @@
     ([{"orth": "foo"}], 0, 0),  # prev: xfail
     ([{"IS_SENT_START": True}], 0, 0),
     ([{"SENT_START": True}], 0, 0),
+    ([{"ENT_ID": "STRING"}], 0, 0),
+    ([{"ENT_KB_ID": "STRING"}], 0, 0),
 ]
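
The new test entries can be exercised by hand. The following is a minimal sketch, assuming a spaCy installation that already includes this commit and that `validate_token_pattern` is importable from `spacy.schemas` as it is in the test module. The variable names are illustrative only.

```python
from spacy.schemas import validate_token_pattern

# A non-string value for ENT_ID should be rejected by the schema,
# mirroring the "bad pattern" entries added to TEST_PATTERNS.
bad_pattern = [{"ENT_ID": -1}]

# A plain string value for ENT_KB_ID should validate cleanly,
# mirroring the "good pattern" entries.
good_pattern = [{"ENT_KB_ID": "STRING"}]

# validate_token_pattern returns a list of error messages (empty if valid).
bad_errors = validate_token_pattern(bad_pattern)
good_errors = validate_token_pattern(good_pattern)

print("bad pattern errors:", bad_errors)
print("good pattern errors:", good_errors)
```

On a spaCy version that predates this commit, both patterns would likely be reported as invalid, because the schema did not yet know the `ent_id` and `ent_kb_id` fields.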
website/docs/api/matcher.md

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ rule-based matching are:
 | `SPACY` | Token has a trailing space. ~~bool~~ |
 | `POS`, `TAG`, `MORPH`, `DEP`, `LEMMA`, `SHAPE` | The token's simple and extended part-of-speech tag, morphological analysis, dependency label, lemma, shape. ~~str~~ |
 | `ENT_TYPE` | The token's entity label. ~~str~~ |
+| `ENT_ID` | The token's entity ID (`ent_id`). ~~str~~ |
+| `ENT_KB_ID` | The token's entity knowledge base ID (`ent_kb_id`). ~~str~~ |
 | `_` <Tag variant="new">2.1</Tag> | Properties in [custom extension attributes](/usage/processing-pipelines#custom-components-attributes). ~~Dict[str, Any]~~ |
 | `OP` | Operator or quantifier to determine how often to match a token pattern. ~~str~~ |
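
To illustrate the two documented attributes, here is a minimal sketch of patterns that match on `ENT_ID` and `ENT_KB_ID`. It assumes a spaCy version that includes this commit. The sentence, the entity ID `"apple"`, and the knowledge base ID `"Q312"` are made-up values for illustration, and the entity is set by hand rather than predicted by a trained pipeline.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Apple opened an office in Paris")

# Annotate the first token as an entity with a (hypothetical) KB ID.
doc.ents = [Span(doc, 0, 1, label="ORG", kb_id="Q312")]
# Set a (hypothetical) entity ID on the token after assigning doc.ents,
# so the assignment above cannot overwrite it.
doc[0].ent_id_ = "apple"

# validate=True checks patterns against the TokenPattern schema,
# which is what this commit extends.
matcher = Matcher(nlp.vocab, validate=True)
matcher.add("BY_KB_ID", [[{"ENT_KB_ID": "Q312"}]])
matcher.add("BY_ENT_ID", [[{"ENT_ID": "apple"}]])

# Map each rule name to the text of the span it matched.
results = {
    nlp.vocab.strings[match_id]: doc[start:end].text
    for match_id, start, end in matcher(doc)
}
print(results)
```

In a real pipeline these values would typically come from a component such as an entity ruler or entity linker rather than being assigned manually.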