[JsonPath] Improve compliance to the RFC test suite #60699
base: 7.3
ba52060 to 9ceeb5f
@@ -163,13 +185,14 @@ private function evaluateBracket(string $expr, mixed $value): array
     return $result;
 }

 // start, end and step
-if (preg_match('/^(-?\d*):(-?\d*)(?::(-?\d+))?$/', $expr, $matches)) {
+if (preg_match('/^(-?\d*)\s*:\s*(-?\d*)(?:\s*:\s*(-?\d+))?$/', $expr, $matches)) {
I suggest changing all those * to *+.
Due to the structure of the regex, possessive quantifiers will match exactly the same strings, but they avoid useless backtracking on non-matching strings. (A very smart regex compiler might make them possessive automatically as an optimization, but I'm not sure PCRE does that.)
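A minimal sketch of what that suggestion could look like, assuming the slice regex from the hunk above with every * turned into *+. The function name parseSlice and the shape of its return value are hypothetical and are not part of the component. Possessive quantifiers are safe here because digits, whitespace and the colon never overlap, so giving characters back can never produce a match.

```php
<?php

// Hypothetical helper, for illustration only: parses "start:end:step".
// Each *+ is possessive, so PCRE never retries shorter runs of digits
// or whitespace when the overall match fails.
function parseSlice(string $expr): ?array
{
    if (!preg_match('/^(-?\d*+)\s*+:\s*+(-?\d*+)(?:\s*+:\s*+(-?\d+))?$/', $expr, $m)) {
        return null; // not a slice expression
    }

    return [
        'start' => '' === $m[1] ? null : (int) $m[1],
        'end' => '' === $m[2] ? null : (int) $m[2],
        // $m[3] is absent when the optional step group did not take part in the match
        'step' => isset($m[3]) ? (int) $m[3] : null,
    ];
}
```

For example, parseSlice('1 : 5') yields start 1, end 5 and no step, while parseSlice('abc') returns null.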
@@ -211,8 +234,8 @@ private function evaluateBracket(string $expr, mixed $value): array
 }

 // filter expressions
-if (preg_match('/^\?(.*)$/', $expr, $matches)) {
-    $filterExpr = $matches[1];
+if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
-if (preg_match('/^\?\s*(.*)$/', $expr, $matches)) {
+if (preg_match('/^\?\s*+(.*)$/', $expr, $matches)) {
In this case, we want the possessive quantifier to ensure that all spaces are always consumed by the \s* part, as we have 2 consecutive quantifiers that could consume them, which is the typical ReDoS pattern.
Alternatively, let spaces be consumed by the (.*) (as done before) and be cleaned by the trimming on the next line.
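The two options from this comment can be sketched side by side. Both function names below are hypothetical and only illustrate the idea; they are not the component's code. The first keeps \s* but makes it possessive, the second leaves all whitespace to (.*) and trims afterwards.

```php
<?php

// Option 1 (hypothetical helper): the possessive \s*+ always consumes the
// leading spaces, so they can never be split between \s* and (.*).
function filterBodyPossessive(string $expr): ?string
{
    return preg_match('/^\?\s*+(.*)$/', $expr, $m) ? $m[1] : null;
}

// Option 2 (hypothetical helper): a single quantifier captures everything,
// and trim() removes the surrounding whitespace afterwards.
function filterBodyTrimmed(string $expr): ?string
{
    return preg_match('/^\?(.*)$/', $expr, $m) ? trim($m[1]) : null;
}
```

The two differ only in trailing whitespace: the trimmed variant strips it, while the possessive variant leaves it in the captured group.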
@@ -346,7 +373,7 @@ private function evaluateScalar(string $expr, array $context): mixed
 }

 // function calls
-if (preg_match('/^(\w+)\((.*)\)$/', $expr, $matches)) {
+if (preg_match('/^(\w+)\s*\(\s*(.*)\s*\)$/', $expr, $matches)) {
\s*(.*)\s*\) will be a ReDoS pattern for non-matching strings (and it does not even ensure the last \s* consumes the trailing spaces).
It would be simpler to let the (.*) match all the content between the parentheses and to trim $matches[2] before using it.
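A rough sketch of that simpler approach, assuming a standalone helper (parseFunctionCall is a hypothetical name, not the method from the PR): a single (.*) captures everything between the outer parentheses, and trim() handles the whitespace.

```php
<?php

// Hypothetical helper, for illustration only. Only one quantifier can
// consume the spaces inside the parentheses, so there is no ambiguity
// for the regex engine to explore on non-matching input.
function parseFunctionCall(string $expr): ?array
{
    if (!preg_match('/^(\w+)\s*+\((.*)\)$/', $expr, $matches)) {
        return null; // not a function call
    }

    return [
        'name' => $matches[1],
        'args' => trim($matches[2]), // whitespace is removed here, not in the regex
    ];
}
```

With this shape, parseFunctionCall('length( @.a )') gives the name length and the argument string @.a.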
9ceeb5f to 9e73c21
Thank you @stof, I didn't know that denial of service was actually a thing with regex. I updated accordingly.
@alexandre-daubois this can be a thing when you allow user input for the string being matched by the regex (which could totally happen in this component). Backtracking engines (like PCRE) have an exponential complexity based on the length of the input when attempting to match an affected regex and failing to match it, as this is the worst case of backtracking. This is commonly reported in the npm ecosystem, partly because JS does not support possessive quantifiers in its RegExp syntax and so cannot apply the easy fix in many cases, which makes the issue more common.
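To make the backtracking cost visible, here is a small sketch using a textbook nested-quantifier pattern rather than one of the regexes from this PR. The helper tryMatch and the lowered pcre.backtrack_limit value are assumptions chosen so the demonstration stays fast. On a typical PHP build the nested pattern is expected to stop with a PCRE error instead of a clean non-match, while the possessive variant fails immediately.

```php
<?php

// Keep the demonstration bounded: PCRE aborts once this limit is reached.
ini_set('pcre.backtrack_limit', '100000');

// Hypothetical helper: runs a match and reports the PCRE error code.
function tryMatch(string $pattern, string $subject): array
{
    $result = preg_match($pattern, $subject);

    return ['result' => $result, 'error' => preg_last_error()];
}

// 28 "a" characters followed by a "b" that makes the match impossible.
$subject = str_repeat('a', 28).'b';

// Nested greedy quantifiers: the run of "a" can be split between the inner
// and outer + in a number of ways that grows exponentially with its length.
$nested = tryMatch('/^(a+)+$/', $subject);

// Possessive quantifiers: each run is consumed once and never given back.
$possessive = tryMatch('/^(a++)++$/', $subject);

var_dump($nested, $possessive);
```

The possessive call returns 0 with PREG_NO_ERROR. The nested call never returns 1 for this subject; whether it reports a backtrack-limit error or merely takes longer depends on the PCRE build and configuration.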
9e73c21 to 2c412e1
2c412e1 to 5beab82
Applied your suggestions, it makes the code a bit simpler. Thanks!
5beab82 to ab0853a
-foreach (explode(',', $expr) as $index) {
-    $index = (int) trim($index);
+foreach (explode(',', $expr) as $indexStr) {
+    $index = (int) trim($indexStr);
not sure the renaming makes sense to me
if (trim($args)) {
    $argList = array_map(
        fn ($arg) => $this->evaluateScalar(trim($arg), $context),
        preg_split('/\s*,\s*/', trim($args))
-if (trim($args)) {
-    $argList = array_map(
-        fn ($arg) => $this->evaluateScalar(trim($arg), $context),
-        preg_split('/\s*,\s*/', trim($args))
+if ($args = trim($args) ?: []) {
+    $args = array_map(
+        fn ($arg) => $this->evaluateScalar($arg, $context),
+        preg_split('/\s*,\s*/', $args)
BUT, is the '0' string handled correctly, if that makes sense?
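The concern about '0' comes from PHP's truthiness rules: the string '0' is falsy, so a bare if (trim($args)) would skip a call whose only argument is 0. A minimal sketch of a stricter check, assuming a standalone helper (splitArgs is a hypothetical name and is not taken from the PR):

```php
<?php

// Hypothetical helper, for illustration only: splits an argument string
// on commas. Comparing against '' instead of relying on truthiness keeps
// the single argument "0" from being treated as "no arguments".
function splitArgs(string $args): array
{
    if ('' === $args = trim($args)) {
        return [];
    }

    return preg_split('/\s*,\s*/', $args);
}
```

The naive splitting on commas is kept from the code under review; arguments that themselves contain commas would need a real tokenizer.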
$current = trim($current);
if ('' !== $current) {
I skipped similar suggestions but this is the preferred CS for such lines:
-$current = trim($current);
-if ('' !== $current) {
+if ('' !== $current = trim($current)) {
ab0853a to 9453d33
I reworked the whole PR to keep pushing forward the compliance test suite. This PR removes 265 skips, so that's pretty nice.
Status: Needs Review
d71e0e5 to 84193ab
84193ab to 03f1b10
This PR is a big step forward in making the component RFC compliant. Many more tests are now green, as you can see in JsonPathComplianceTestSuiteTest. Mainly, it's about dealing with whitespace, better expression parsing, and supporting ! and bare literals like true, false and null.