
[JsonPath] Improve compliance to the RFC test suite #60699


Open: wants to merge 1 commit into base branch 7.3


@alexandre-daubois (Member) commented Jun 5, 2025

| Q             | A   |
| ------------- | --- |
| Branch?       | 7.3 |
| Bug fix?      | yes |
| New feature?  | no  |
| Deprecations? | no  |
| Issues        | -   |
| License       | MIT |

This PR is a big step forward in making the component RFC-compliant. Many more tests now pass, as you can see in JsonPathComplianceTestSuiteTest. It mainly deals with whitespace handling, better expression parsing, and support for the ! operator and bare literals such as true, false and null.
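To illustrate the bare-literal part, here is a minimal sketch, not the component's actual code, of how a trimmed operand could be recognised as one of the RFC 9535 literals true, false or null. The function name parseBareLiteral() is hypothetical, and returning an [isLiteral, value] pair is an assumption made so that a null literal can be told apart from "not a literal".

```php
<?php

// Hypothetical helper: recognise a bare RFC 9535 literal (true, false, null).
// Returns [bool $isLiteral, mixed $value] so that a null literal can be
// distinguished from "not a literal at all".
function parseBareLiteral(string $token): array
{
    // Surrounding whitespace is ignored; the literals themselves are case-sensitive.
    return match (trim($token)) {
        'true' => [true, true],
        'false' => [true, false],
        'null' => [true, null],
        default => [false, null],
    };
}

var_dump(parseBareLiteral(' null '));
var_dump(parseBareLiteral('True'));
```

In a comparison such as @.flag == true, an evaluator could try this check on each operand before treating it as a path or a function call.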

@carsonbot added this to the 7.3 milestone Jun 5, 2025
@alexandre-daubois changed the title [JsonPath] Handle special whitespaces in filters to [JsonPath] Handle special whitespaces in expressions Jun 5, 2025
@@ -163,13 +185,14 @@ private function evaluateBracket(string $expr, mixed $value): array
return $result;
}

// start, end and step
-if (preg_match('/^(-?\d*):(-?\d*)(?::(-?\d+))?$/', $expr, $matches)) {
+if (preg_match('/^(-?\d*)\s*:\s*(-?\d*)(?:\s*:\s*(-?\d+))?$/', $expr, $matches)) {
Member

I suggest changing all those * to *+.
Due to the structure of the regex, using possessive quantifiers will match exactly the same strings, but will avoid useless backtracking on non-matching strings (a very smart regex compiler might make them possessive automatically as an optimization, but I'm not sure PCRE does it).
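One way to check the claim that the possessive version matches exactly the same strings is to run both forms of the slice regex from this hunk on a few inputs. This is a sketch assuming PHP 8 with PCRE; the helper parseSlice() is hypothetical.

```php
<?php

// Hypothetical helper: apply a slice regex and return [start, end, step],
// or null when the expression is not a slice.
function parseSlice(string $pattern, string $expr): ?array
{
    if (!preg_match($pattern, $expr, $m)) {
        return null;
    }

    // $m[3] is absent when the optional ":step" group did not participate.
    return [$m[1], $m[2], $m[3] ?? null];
}

// Pattern from the diff, with greedy quantifiers.
$greedy = '/^(-?\d*)\s*:\s*(-?\d*)(?:\s*:\s*(-?\d+))?$/';
// Same pattern with every * and + made possessive, as suggested in the review.
$possessive = '/^(-?\d*+)\s*+:\s*+(-?\d*+)(?:\s*+:\s*+(-?\d++))?$/';

foreach (['1:3', '0 : 10 : 2', '::2', '-1:', 'a:b'] as $expr) {
    // Both patterns should agree on every input, matching or not.
    var_dump(parseSlice($greedy, $expr) === parseSlice($possessive, $expr));
}
```

Possessive quantifiers are safe here because the token following each quantifier can never match a character the quantifier itself would consume (digits are followed by whitespace, a colon or the end; whitespace by a colon or a digit), so backtracking into them could never produce a match anyway.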

@@ -211,8 +234,8 @@ private function evaluateBracket(string $expr, mixed $value): array
}

// filter expressions
-if (preg_match('/^\?(.*)$/', $expr, $matches)) {
-$filterExpr = $matches[1];
+if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
Member

Suggested change
-if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
+if (preg_match('/^\?\s*+(.*)$/', $expr, $matches)) {

In this case, we want the possessive quantifier to ensure that all spaces are always consumed by the \s* part: we have two consecutive quantifiers that could both consume them, which is the typical ReDoS pattern.
Alternatively, let the spaces be consumed by the (.*) (as done before) and be cleaned up by the trimming on the next line.
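The two options in this comment can be compared side by side. This is a minimal sketch assuming the filter selector arrives as a string starting with ?; both function names are hypothetical.

```php
<?php

// Option 1 (hypothetical helper): a possessive \s*+ takes all leading spaces,
// so the two quantifiers can no longer compete for them.
function filterBodyPossessive(string $expr): ?string
{
    return preg_match('/^\?\s*+(.*)$/', $expr, $m) ? $m[1] : null;
}

// Option 2 (hypothetical helper): let (.*) capture everything, then trim.
function filterBodyTrimmed(string $expr): ?string
{
    return preg_match('/^\?(.*)$/', $expr, $m) ? trim($m[1]) : null;
}

var_dump(filterBodyPossessive('?  @.price < 10'));
var_dump(filterBodyTrimmed('?  @.price < 10  '));
```

Option 1 only strips leading whitespace, so trailing spaces remain in the capture and still need a trim() afterwards. That is why option 2 ends up simpler.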

@@ -346,7 +373,7 @@ private function evaluateScalar(string $expr, array $context): mixed
}

// function calls
-if (preg_match('/^(\w+)\((.*)\)$/', $expr, $matches)) {
+if (preg_match('/^(\w+)\s*\(\s*(.*)\s*\)$/', $expr, $matches)) {
Member

\s*(.*)\s*\) will be a ReDoS pattern for non-matching strings (and it does not even ensure that the last \s* consumes the trailing spaces).

It would be simpler to let (.*) match all the content between the parentheses and to trim $matches[2] before using it.
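A minimal sketch of this suggestion, reusing the original pattern /^(\w+)\((.*)\)$/ and trimming the capture afterwards. The helper parseFunctionCall() and its return shape are hypothetical. It also shows that with the pattern under review, the trailing spaces end up in the capture group.

```php
<?php

// Hypothetical helper following the review suggestion: capture everything
// between the parentheses with (.*) and trim it afterwards.
function parseFunctionCall(string $expr): ?array
{
    if (!preg_match('/^(\w+)\((.*)\)$/', $expr, $m)) {
        return null;
    }

    return ['name' => $m[1], 'args' => trim($m[2])];
}

// For comparison: with the pattern under review, the greedy (.*) keeps the
// trailing spaces, because the final \s* can match an empty string.
preg_match('/^(\w+)\s*\(\s*(.*)\s*\)$/', 'length( @.a  )', $old);
var_dump($old[2]);
var_dump(parseFunctionCall('length( @.a  )'));
```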

@alexandre-daubois (Member, Author)

Thank you @stof, I didn't know that denial of service via regexes was actually a thing. I updated the code accordingly.

@stof (Member) commented Jun 5, 2025

@alexandre-daubois this can be a thing when you allow user input for the string being matched by the regex (which could totally happen in this component). Backtracking engines (like PCRE) can have polynomial or even exponential complexity in the length of the input when attempting to match an affected regex (and failing to match it, as this is the worst case for backtracking).

This is commonly reported in the npm ecosystem (also because JS does not support possessive quantifiers in its RegExp syntax, so in many cases it cannot apply this easy fix, which makes the issue more common).

@alexandre-daubois (Member, Author)

Applied your suggestions, it makes the code a bit simpler. Thanks!

-foreach (explode(',', $expr) as $index) {
-$index = (int) trim($index);
+foreach (explode(',', $expr) as $indexStr) {
+$index = (int) trim($indexStr);
Member

Not sure the renaming makes sense to me.

Comment on lines 366 to 369
if (trim($args)) {
$argList = array_map(
fn ($arg) => $this->evaluateScalar(trim($arg), $context),
preg_split('/\s*,\s*/', trim($args))
Member

Suggested change
-if (trim($args)) {
-$argList = array_map(
-fn ($arg) => $this->evaluateScalar(trim($arg), $context),
-preg_split('/\s*,\s*/', trim($args))
+if ($args = trim($args) ?: []) {
+$args = array_map(
+fn ($arg) => $this->evaluateScalar($arg, $context),
+preg_split('/\s*,\s*/', $args)

@nicolas-grekas (Member), Jun 13, 2025

But is the '0' string handled correctly, if that input makes sense here?
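The concern is that ?: tests truthiness, and in PHP the string '0' is falsy. A minimal sketch, with hypothetical function names and a simplified body that only splits the arguments, contrasting the suggested guard with an explicit '' !== comparison:

```php
<?php

// Mirrors the suggested guard `$args = trim($args) ?: []`; the function name is
// hypothetical. "?:" tests truthiness, and in PHP the string "0" is falsy.
function splitArgsElvis(string $args): array
{
    if ($args = trim($args) ?: []) {
        return preg_split('/\s*,\s*/', $args);
    }

    return [];
}

// Alternative with an explicit comparison, so only the empty string is rejected.
function splitArgsStrict(string $args): array
{
    if ('' !== $args = trim($args)) {
        return preg_split('/\s*,\s*/', $args);
    }

    return [];
}

var_dump(splitArgsElvis(' 0 '));  // the "0" argument is lost
var_dump(splitArgsStrict(' 0 ')); // the "0" argument is kept
```

Whether a lone 0 argument can reach this code is exactly the open question; the strict form avoids having to answer it.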

Comment on lines 190 to 191
$current = trim($current);
if ('' !== $current) {
Member

I skipped similar suggestions, but this is the preferred coding style (CS) for such lines:

Suggested change
-$current = trim($current);
-if ('' !== $current) {
+if ('' !== $current = trim($current)) {

@alexandre-daubois changed the title [JsonPath] Handle special whitespaces in expressions to [JsonPath] Improve compliance to the RFC test suite Jun 13, 2025
@alexandre-daubois (Member, Author) commented Jun 13, 2025

I reworked the whole PR to keep pushing compliance with the test suite forward. This PR removes 265 skipped tests, so that's pretty nice.

Status: Needs Review

@alexandre-daubois force-pushed the jsonpath-blankspaces branch 2 times, most recently from d71e0e5 to 84193ab on June 13, 2025 14:57

4 participants