Commit c4483f9

MOD-4432: Add to JSONPath filter the regexp match operator (RedisJSON#848)
* Add to JSONPath filter the regexp match operator
* Improve coverage
* Minor cosmetics
* Allow match using regex pattern from a field and add end-to-end test
* Test with numeric combined in filter
* Add more tests and documentation
1 parent ece58d3 commit c4483f9

File tree

6 files changed

+188
-5
lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ serde = "1.0"
 libc = "0.2"
 redis-module = { version="1.0", features = ["experimental-api"]}
 itertools = "0.10"
+regex = "1"
 [features]
 # Workaround to allow cfg(feature = "test") in redismodue-rs dependencies:
 # https://github.com/RedisLabsModules/redismodule-rs/pull/68

docs/docs/path.md

Lines changed: 29 additions & 3 deletions
@@ -13,7 +13,7 @@ RedisJSON knows which syntax to use depending on the first character of the path

 ## JSONPath support

-RedisJSON v2.0 introduces [JSONPath](http://goessner.net/articles/JsonPath/) support. It follows the syntax described by Goessner in his article.
+RedisJSON v2.0 introduces [JSONPath](http://goessner.net/articles/JsonPath/) support. It follows the syntax described by Goessner in his [article](http://goessner.net/articles/JsonPath/).

 A JSONPath query can resolve to several locations in a JSON document. In this case, the JSON commands apply the operation to every possible location. This is a major improvement over [legacy path](#legacy-path-syntax) queries, which only operate on the first path.

@@ -40,7 +40,7 @@ The following JSONPath syntax table was adapted from Goessner's [path syntax com
 | [] | Subscript operator, accesses an array element. |
 | [,] | Union, selects multiple elements. |
 | [start\:end\:step] | Array slice where start, end, and step are indexes. |
-| ?() | Filters a JSON object or array. Supports comparison operators <nobr>(==, !=, <, <=, >, >=)</nobr> and logical operators <nobr>(&&, \|\|)</nobr>. |
+| ?() | Filters a JSON object or array. Supports comparison operators <nobr>(`==`, `!=`, `<`, `<=`, `>`, `>=`, `=~`)</nobr>, logical operators <nobr>(`&&`, `\|\|`)</nobr>, and parenthesis <nobr>(`(`, `)`)</nobr>. |
 | () | Script expression. |
 | @ | The current element, used in filter or script expressions. |

@@ -154,7 +154,7 @@ You can use an array slice to select a range of elements from an array. This exa
 "[\"Noise-cancelling Bluetooth headphones\",\"Wireless earbuds\"]"
 ```

-Filter expressions `?()` let you select JSON elements based on certain conditions. You can use comparison operators (==, !=, <, <=, >, >=) and logical operators (&&, \|\|) within these expressions.
+Filter expressions `?()` let you select JSON elements based on certain conditions. You can use comparison operators (`==`, `!=`, `<`, `<=`, `>`, `>=`, `=~`), logical operators (`&&`, `||`), and parenthesis (`(`, `)`) within these expressions. A filter expression can be applied on an array or on an object, iterating all the elements in the array or all the key-value pair in the object, retrieving only the ones that match the filter condition. The filter condition is using `@` to denote the current array element or the current object. Use `@.key_name` to refer to a specific value. Use `$` to denote the top-level, e.g., `$.top_level_key_name`

 For example, this filter only returns wireless headphones with a price less than 70:

@@ -170,6 +170,32 @@ This example filters the inventory for the names of items that support Bluetooth
 "[\"Noise-cancelling Bluetooth headphones\",\"Wireless earbuds\",\"Wireless keyboard\"]"
 ```

+The comparison operator `=~` is matching a string value of the left-hand side against a regular expression pattern on the right-hand side. The supported regular expression syntax is detailed [here](https://docs.rs/regex/latest/regex/#syntax).
+
+For example, this filters only keyboards with some sort of USB connection (notice this match is case-insensitive thanks to the prefix `(?i)` in the regular expression pattern `"(?i)usb"` ):
+
+```sh
+127.0.0.1:6379> JSON.GET store '$.inventory.keyboards[?(@.connection =~ "(?i)usb")]'
+"[{\"id\":22346,\"name\":\"USB-C keyboard\",\"description\":\"Wired USB-C keyboard\",\"wireless\":false,\"connection\":\"USB-C\",\"price\":29.99,\"stock\":30,\"free-shipping\":false}]"
+```
+The regular expression pattern can also be taken from a JSON string key on the right-hand side.
+
+For example, let's add each keybaord object with a `regex_pat` key:
+
+```sh
+127.0.0.1:6379> JSON.SET store '$.inventory.keyboards[0].regex_pat' '"(?i)bluetooth"'
+OK
+127.0.0.1:6379> JSON.SET store '$.inventory.keyboards[1].regex' '"usb"'
+OK
+```
+
+Now we can match against this `regex_pat` key instead of a hard-coded regular expression pattern, and get the keyboard with the `Bluetooth` string in its `connection` key (notice the one with `USB-C` did not match since its regular expression pattern is case-sensitive and the regular expression pattern is using lower case):
+
+```sh
+127.0.0.1:6379> JSON.GET store '$.inventory.keyboards[?(@.connection =~ @.regex_pat)]'
+"[{\"id\":22345,\"name\":\"Wireless keyboard\",\"description\":\"Wireless Bluetooth keyboard\",\"wireless\":true,\"connection\":\"Bluetooth\",\"price\":44.99,\"stock\":23,\"free-shipping\":false,\"colors\":[\"black\",\"silver\"],\"regex\":\"(?i)Bluetooth\",\"regex_pat\":\"(?i)bluetooth\"}]"
+```
+
 #### Update JSON examples

 You can also use JSONPath queries when you want to update specific sections of a JSON document.
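
The two `=~` forms documented in this change (a literal pattern, and a pattern read from a key of the current element) can be approximated outside Redis. What follows is a minimal Python sketch, not RedisJSON code. It assumes Python's `re` module as a stand-in for the Rust `regex` crate (the syntaxes overlap for these patterns but are not identical), trims the keyboard objects to the fields the examples use, and gives the second keyboard a lowercase `regex_pat` of `usb`, as the prose describes. The helper name `regex_filter` is hypothetical.

```python
import re

# Trimmed copies of the two keyboard objects from the documentation example.
# Assumption: the second keyboard carries a lowercase, case-sensitive pattern.
keyboards = [
    {"id": 22345, "name": "Wireless keyboard", "connection": "Bluetooth",
     "regex_pat": "(?i)bluetooth"},
    {"id": 22346, "name": "USB-C keyboard", "connection": "USB-C",
     "regex_pat": "usb"},
]


def regex_filter(items, value_key, pattern=None, pattern_key=None):
    """Keep items whose item[value_key] matches a regex.

    The regex is either the literal `pattern` or, when `pattern` is None,
    the string stored under `pattern_key` in the same item.
    """
    selected = []
    for item in items:
        value = item.get(value_key)
        pat = pattern if pattern is not None else item.get(pattern_key)
        # Only string-against-string comparisons can match.
        if not isinstance(value, str) or not isinstance(pat, str):
            continue
        try:
            if re.search(pat, value):
                selected.append(item)
        except re.error:
            # A pattern that does not compile is treated as "no match".
            continue
    return selected


# Roughly: $.inventory.keyboards[?(@.connection =~ "(?i)usb")]
usb_ids = [k["id"] for k in regex_filter(keyboards, "connection", pattern="(?i)usb")]

# Roughly: $.inventory.keyboards[?(@.connection =~ @.regex_pat)]
by_field_ids = [k["id"] for k in regex_filter(keyboards, "connection", pattern_key="regex_pat")]

print(usb_ids)       # [22346]
print(by_field_ids)  # [22345]
```

The second query skips the USB-C keyboard because `usb` without `(?i)` does not match the uppercase `USB-C`.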

src/jsonpath/json_path.rs

Lines changed: 27 additions & 2 deletions
@@ -3,6 +3,7 @@ use pest::Parser;
 use std::cmp::Ordering;

 use crate::jsonpath::select_value::{SelectValue, SelectValueType};
+use regex::Regex;
 use std::fmt::Debug;

 #[derive(Parser)]
@@ -433,8 +434,32 @@ impl<'i, 'j, S: SelectValue> TermEvaluationResult<'i, 'j, S> {
         !self.eq(s)
     }

-    fn re(&self, _s: &Self) -> bool {
-        false
+    fn re_is_match(regex: &str, s: &str) -> bool {
+        Regex::new(regex).map_or_else(|_| false, |re| Regex::is_match(&re, s))
+    }
+
+    fn re_match(&self, s: &Self) -> bool {
+        match (self, s) {
+            (TermEvaluationResult::Value(v), TermEvaluationResult::Str(regex)) => {
+                match v.get_type() {
+                    SelectValueType::String => Self::re_is_match(regex, v.as_str()),
+                    _ => false,
+                }
+            }
+            (TermEvaluationResult::Value(v1), TermEvaluationResult::Value(v2)) => {
+                match (v1.get_type(), v2.get_type()) {
+                    (SelectValueType::String, SelectValueType::String) => {
+                        Self::re_is_match(v2.as_str(), v1.as_str())
+                    }
+                    (_, _) => false,
+                }
+            }
+            (_, _) => false,
+        }
+    }
+
+    fn re(&self, s: &Self) -> bool {
+        self.re_match(s)
     }
 }
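
The matching rules in `re_match` can be restated compactly. The sketch below is a Python analogue, assuming Python's `re.search` behaves like the unanchored `Regex::is_match` for the patterns of interest. The function names mirror the Rust ones, but the code is illustrative only. It collapses the `Value`/`Str` distinction into a plain string check, so it does not model which side of the operator may hold a literal.

```python
import re


def re_is_match(pattern: str, text: str) -> bool:
    """Unanchored regex test; a pattern that fails to compile yields False."""
    try:
        return re.search(pattern, text) is not None
    except re.error:
        return False


def re_match(left, right) -> bool:
    """Return True only when both operands are strings and `right`,
    read as a regex, matches somewhere in `left`."""
    if isinstance(left, str) and isinstance(right, str):
        return re_is_match(right, left)
    # Numbers, booleans, null, arrays and objects never match.
    return False


print(re_match("kafoosh", ".*foo"))  # True
print(re_match("kafoosh", "[f.*"))   # False (invalid pattern)
print(re_match(7.0, ".*foo"))        # False (left side is not a string)
print(re_match("foo", 42))           # False (pattern is not a string)
```

Treating a compile error as a non-match, rather than raising, is what lets a filter with a bad pattern return an empty result instead of failing the whole query.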
440465

tests/pytest/test.py

Lines changed: 44 additions & 0 deletions
@@ -1173,6 +1173,50 @@ def testEscape(env):
     r.expect('JSON.SET', 'doc', '$', '{"val": "escaped unicode here:\u2B50"}').ok()
     r.expect('JSON.GET', 'doc', '$.val').equal('["escaped unicode here:⭐"]')

+def testFilter(env):
+    # Test JSONPath filter
+    r = env
+
+    doc = {
+        "arr": ["kaboom", "kafoosh", "four", "bar", 7.0, "foolish", ["food", "foo", "FoO", "fight"], -9, {"in" : "fooctious"}, "ffool", "(?i)^[f][o][o]$", False, None],
+        "pat_regex": ".*foo",
+        "pat_plain": "(?i)^[f][o][o]$",
+        "pat_bad": "[f.*",
+        "pat_not_str1": 42,
+        "pat_not_str2": None,
+        "pat_not_str3": True,
+        "pat_not_str4": {"p":".*foo"},
+        "pat_not_str5": [".*foo"],
+    }
+    r.expect('JSON.SET', 'doc', '$', json.dumps(doc)).ok()
+
+    # regex match using a static regex pattern
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ =~ ".*foo")]').equal('["kafoosh","foolish","ffool"]')
+
+    # regex match using a field
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ =~ $.pat_regex)]').equal('["kafoosh","foolish","ffool"]')
+
+    # regex case-insensitive match using a field (notice the `.*` before the filter)
+    r.expect('JSON.GET', 'doc', '$.arr.*[?(@ =~ $.pat_plain)]').equal('["foo","FoO"]')
+
+    # regex match using field after being modified
+    r.expect('JSON.SET', 'doc', '$.pat_regex', '"k.*foo"').ok()
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ =~ $.pat_regex)]').equal('["kafoosh"]')
+
+    # regex mismatch (illegal pattern)
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ == $.pat_bad)]').equal('[]')
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ == $.pat_bad || @>4.5)]').equal('[7.0]')
+
+    # regex mismatch (missing pattern)
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ =~ $.pat_missing)]').equal('[]')
+
+    # regex mismatch (not a string pattern)
+    for i in range(1, 6):
+        r.expect('JSON.GET', 'doc', '$.arr[?(@ =~ $.pat_not_str{})]'.format(i)).equal('[]')
+
+    # plain string match
+    r.expect('JSON.GET', 'doc', '$.arr[?(@ == $.pat_plain)]').equal('["(?i)^[f][o][o]$"]')
+

 # class CacheTestCase(BaseReJSONTest):
 # @property
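
The expected values in `testFilter` follow from applying the filter to the direct children of whatever node precedes it. As a rough illustration (plain Python, with `re` standing in for the Rust `regex` crate and the hypothetical helpers `matches` and `filter_children`), filtering `$.arr` inspects the array's own elements, while `$.arr.*` first steps into each element and filters that element's children:

```python
import re


def matches(value, pattern):
    """True only for a string value matched by a valid string pattern."""
    if not isinstance(value, str) or not isinstance(pattern, str):
        return False
    try:
        return re.search(pattern, value) is not None
    except re.error:
        return False


def filter_children(node, pattern):
    """Apply the regex filter to the direct children of an array or object."""
    if isinstance(node, list):
        children = node
    elif isinstance(node, dict):
        children = list(node.values())
    else:
        children = []  # scalars have no children to filter
    return [child for child in children if matches(child, pattern)]


# Same "arr" value as in testFilter.
arr = ["kaboom", "kafoosh", "four", "bar", 7.0, "foolish",
       ["food", "foo", "FoO", "fight"], -9, {"in": "fooctious"},
       "ffool", "(?i)^[f][o][o]$", False, None]

# Roughly: $.arr[?(@ =~ ".*foo")]
top_level = filter_children(arr, ".*foo")

# Roughly: $.arr.*[?(@ =~ "(?i)^[f][o][o]$")]
nested = [m for element in arr for m in filter_children(element, "(?i)^[f][o][o]$")]

print(top_level)  # ['kafoosh', 'foolish', 'ffool']
print(nested)     # ['foo', 'FoO']
```

Only the nested array contributes to the second result: the string elements of `arr` have no children, and `fooctious` fails the anchored pattern.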

tests/rust_tests/op.rs

Lines changed: 86 additions & 0 deletions
@@ -366,3 +366,89 @@ fn op_for_same_type() {
         ]),
     );
 }
+
+#[test]
+fn op_string_regexp_match() {
+    setup();
+
+    select_and_then_compare(
+        r#"$.tags[?(@ =~ "^[a-z]{4}$")]"#,
+        read_json("./json_examples/data_obj.json"),
+        json!(["aute", "elit", "esse"]),
+    );
+
+    select_and_then_compare(
+        r#"$.tags[?(@ =~ "^[ec].*")]"#,
+        read_json("./json_examples/data_obj.json"),
+        json!(["elit", "esse", "culpa"]),
+    );
+
+    select_and_then_compare(
+        r#"$.arr[?(@ =~ "^[ec.*")]"#, //erroneous regexp
+        json!([{
+            "arr": ["eclectic", 54, "elit", "esse", "culpa"],
+        }]),
+        json!([]),
+    );
+
+    select_and_then_compare(
+        // Flat visit all JSON types
+        r#"$.arr[?(@ =~ "^[Ee]c.*")]"#,
+        json!({
+            "arr": ["eclectic", 54, "elit", 96.33, {"eclipse":"ecstatic"}, "esse", true, "culpa", "echo", ["ecu", "eching"], "Ecuador", null, "etc"],
+        }),
+        json!(["eclectic", "echo", "Ecuador"]),
+    );
+
+    select_and_then_compare(
+        // Recursive visit all JSON types
+        r#"$..[?(@ =~ "^[Ee]c.*")]"#,
+        json!({
+            "arr": ["eclectic", 54, "elit", 96.33, {"eclipse":"ecstatic"}, "esse", true, "culpa", "echo", ["ecu", "eching", "plan"], "Ecuador", null, "etc"],
+        }),
+        json!([
+            "eclectic", "echo", "Ecuador", // "arr" filtered
+            "eclectic", // "arr" content flattened - 1st level
+            "ecstatic", // "arr" content flattened - "eclipse" filtered
+            "ecstatic", // "arr" content flattened - "eclipse" flattened
+            "echo", // "arr" content flattened - 1st level
+            "ecu", "eching", // "arr" content flattened - anonymous array filtered
+            "ecu", "eching", // "arr" content flattened - anonymous array flattened
+            "Ecuador" // "arr" content flattened - 1st level
+        ]),
+    );
+}
+
+#[test]
+fn op_string_regexp_field_match() {
+    setup();
+
+    select_and_then_compare(
+        r#"$.arr[?(@ =~ $.pat1)]"#, //regex
+        json!({
+            "arr": ["kaboom", "kafoosh", "four", "bar", 7.0, -9, false, null, "foolish", "ffool", "[f][o][o]"],
+            "pat1":"foo",
+            "pat2":"k.*foo"
+        }),
+        json!(["kafoosh", "foolish", "ffool"]),
+    );
+
+    select_and_then_compare(
+        r#"$.arr[?(@ =~ $.pat2)]"#, //regex
+        json!({
+            "arr": ["kaboom", "kafoosh", "four", "bar", 7.0, -9, false, null, "foolish", "ffool", "[f][o][o]"],
+            "pat1":"foo",
+            "pat2":"k.*foo"
+        }),
+        json!(["kafoosh"]),
+    );
+
+    select_and_then_compare(
+        r#"$.arr[?(@ == $.pat1)]"#, //plain string
+        json!({
+            "arr": ["kaboom", "kafoosh", "four", "bar", 7.0, -9, false, null, "foolish", "ffool", "[f][o][o]"],
+            "pat1":"[f][o][o]"
+        }),
+        json!(["[f][o][o]"]),
+    );
+}
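
The field-based tests contrast `=~`, which treats the referenced string as a regular expression, with `==`, which compares it literally. A small Python sketch of that difference, using the same `arr` values and the hypothetical helpers `regex_select` and `equal_select` (Python's `re` is assumed to agree with the Rust `regex` crate on these simple patterns):

```python
import re

# Same "arr" value as in op_string_regexp_field_match.
arr = ["kaboom", "kafoosh", "four", "bar", 7.0, -9, False, None,
       "foolish", "ffool", "[f][o][o]"]


def regex_select(values, pattern):
    """Analogue of `@ =~ $.pat`: keep strings the pattern matches anywhere."""
    return [v for v in values if isinstance(v, str) and re.search(pattern, v)]


def equal_select(values, literal):
    """Analogue of `@ == $.pat`: keep strings equal to the stored text."""
    return [v for v in values if isinstance(v, str) and v == literal]


print(regex_select(arr, "foo"))        # ['kafoosh', 'foolish', 'ffool']
print(regex_select(arr, "k.*foo"))     # ['kafoosh']
print(regex_select(arr, "[f][o][o]"))  # ['kafoosh', 'foolish', 'ffool']
print(equal_select(arr, "[f][o][o]"))  # ['[f][o][o]']
```

The same stored string `[f][o][o]` selects three elements as a pattern and exactly one as a literal, which is why the operator choice matters when the right-hand side comes from a document field.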
