[CssSelector] Fix escape patterns #48771
@@ -49,22 +49,22 @@ public function __construct()
         $this->identifierPattern = '-?(?:'.$this->nmStartPattern.')(?:'.$this->nmCharPattern.')*';
         $this->hashPattern = '#((?:'.$this->nmCharPattern.')+)';
         $this->numberPattern = '[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)';
-        $this->quotedStringPattern = '([^\n\r\f%s]|'.$this->stringEscapePattern.')*';
+        $this->quotedStringPattern = '([^\n\r\f\\\\%s]|'.$this->stringEscapePattern.')*';
https://github.com/scrapy/cssselect/blob/ddd9784977fca6e7da4439d837e4f510f1f10638/cssselect/parser.py#L926 (We are missing `\\` before the quote.)
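To illustrate why the missing `\\` matters, here is a minimal sketch. The `$escape` pattern is a simplified stand-in, not the exact value of `stringEscapePattern`, and `matchStringBody()` is a hypothetical helper written only for this example. When the character class does not exclude the backslash, the backslash of `\"` is consumed as an ordinary character, so the match stops at the escaped quote.

```php
<?php
// Simplified stand-in for the string escape alternatives (an assumption,
// not the exact Symfony pattern): a backslash followed by 1-6 hex digits
// or by any single non-hex, non-newline character.
$escape = '\\\\(?:[0-9a-f]{1,6}|[^\n\r\f0-9a-f])';

// Before the fix: the character class does not exclude the backslash.
$before = '([^\n\r\f%s]|'.$escape.')*';
// After the fix: a backslash can only be consumed by the escape branch.
$after = '([^\n\r\f\\\\%s]|'.$escape.')*';

// Hypothetical helper: returns the string body matched at the start of $input.
function matchStringBody(string $template, string $quote, string $input): string
{
    preg_match('~^'.sprintf($template, $quote).'~i', $input, $m);

    return $m[0];
}

// Raw input: a\"b" rest  (the first quote is escaped, the second one closes the string)
$input = 'a\\"b" rest';

var_dump(matchStringBody($before, '"', $input)); // raw value: a\    (stops at the escaped quote)
var_dump(matchStringBody($after, '"', $input));  // raw value: a\"b  (escaped quote kept in the body)
```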
     }

     public function getNewLineEscapePattern(): string
     {
-        return '~^'.$this->newLineEscapePattern.'~';
+        return '~'.$this->newLineEscapePattern.'~';
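The effect of dropping the `^` anchor can be sketched as follows. The `$newLineEscape` value is an assumption modelled on the CSS escaped-newline rule (a backslash followed by LF, CRLF, CR or FF), and `stripEscapedNewLines()` is a hypothetical helper, not part of the component. With the anchor, only an escape at the very start of the value could ever be removed.

```php
<?php
// Assumed pattern for an escaped newline: backslash + LF, CRLF, CR or FF.
$newLineEscape = '\\\\(?:\n|\r\n|\r|\f)';

// Hypothetical helper mirroring the substitution step.
function stripEscapedNewLines(string $regex, string $value): string
{
    return preg_replace($regex, '', $value);
}

// Two escaped newlines, neither of them at the start of the value.
$value = "foo\\\nbar\\\nbaz";

$anchored = stripEscapedNewLines('~^'.$newLineEscape.'~', $value);
$unanchored = stripEscapedNewLines('~'.$newLineEscape.'~', $value);

var_dump($anchored === $value);        // bool(true): nothing was removed
var_dump($unanchored === 'foobarbaz'); // bool(true): both escapes were removed
```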
Check the three `_sub_*` escape "functions" and how they are used: all occurrences are replaced, not just one at the start of the string.
             ['#test\:colon', ['Hash[Element[*]#test:colon]']],
             [".a\xc1b", ["Class[Element[*].a\xc1b]"]],
             // unicode escape: \22 == "
             ['*[aval="\'\22\'"]', ['Attribute[Element[*][aval = \'\'"\'\']]']],
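As a rough illustration of what the `\22` test case exercises, the sketch below decodes CSS unicode escapes in a string body. Both functions are hypothetical helpers written for this example, the regex is an assumed form of the unicode escape pattern, and the UTF-8 encoder does no range or surrogate validation.

```php
<?php
// Hypothetical helper: encodes a code point as UTF-8 (no validation).
function codePointToUtf8(int $c): string
{
    if ($c < 0x80) {
        return chr($c);
    }
    if ($c < 0x800) {
        return chr(0xC0 | ($c >> 6)).chr(0x80 | ($c & 0x3F));
    }
    if ($c < 0x10000) {
        return chr(0xE0 | ($c >> 12)).chr(0x80 | (($c >> 6) & 0x3F)).chr(0x80 | ($c & 0x3F));
    }

    return chr(0xF0 | ($c >> 18)).chr(0x80 | (($c >> 12) & 0x3F))
        .chr(0x80 | (($c >> 6) & 0x3F)).chr(0x80 | ($c & 0x3F));
}

// Hypothetical helper: replaces every "\" + 1-6 hex digits (plus one optional
// trailing whitespace character) with the character it encodes.
function decodeUnicodeEscapes(string $value): string
{
    return preg_replace_callback(
        '~\\\\([0-9a-f]{1,6})(?:\r\n|[ \n\r\t\f])?~i',
        static fn (array $m): string => codePointToUtf8((int) hexdec($m[1])),
        $value
    );
}

// Raw input '\22' (with the single quotes) decodes to '"'.
var_dump(decodeUnicodeEscapes('\'\\22\'') === '\'"\''); // bool(true)
```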
I borrowed those tests from https://github.com/scrapy/cssselect/blob/ddd9784977fca6e7da4439d837e4f510f1f10638/tests/test_cssselect.py#L550. They highlighted the "quotedStringPattern" problem.
@fancyweb if you have some availability and are keen on checking that Python code, you might want to work on porting the missing features to Symfony 6.3 😄 I created a few issues about them (but I'm quite sure the list is still incomplete for now).
Having a followup PR that would catch up with upstream would be great indeed!
Thank you @fancyweb.
I rechecked the original Python source code we "borrowed" (https://github.com/scrapy/cssselect/blob/master/cssselect/parser.py) and fixed some issues.