
[CssSelector] Fix escape patterns #48771


Merged (1 commit, Dec 23, 2022)


fancyweb (Contributor)

Q A
Branch? 5.4
Bug fix? yes
New feature? no
Deprecations? no
Tickets #46682 and #44516
License MIT
Doc PR -

I rechecked the original Python source code we "borrowed" (https://github.com/scrapy/cssselect/blob/master/cssselect/parser.py) and fixed some issues in the escape patterns.


fancyweb changed the base branch from 6.3 to 5.4 on December 23, 2022 11:41
@@ -49,22 +49,22 @@ public function __construct()
$this->identifierPattern = '-?(?:'.$this->nmStartPattern.')(?:'.$this->nmCharPattern.')*';
$this->hashPattern = '#((?:'.$this->nmCharPattern.')+)';
$this->numberPattern = '[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)';
-$this->quotedStringPattern = '([^\n\r\f%s]|'.$this->stringEscapePattern.')*';
+$this->quotedStringPattern = '([^\n\r\f\\\\%s]|'.$this->stringEscapePattern.')*';
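The only change in the quotedStringPattern line is the extra `\\\\` in the negated character class, so a literal backslash is no longer accepted as an ordinary string character. The minimal PHP sketch below shows why that matters for an escaped quote. It is an illustration under assumptions: the escape pattern is a simplified, hypothetical stand-in (a backslash followed by one character that is not a line break or hex digit), and `matchStringBody()` is a made-up helper, not Symfony API.

```php
<?php
// Hypothetical, simplified escape pattern: a backslash followed by one character
// that is not a line break or a hex digit (not copied from Symfony).
$stringEscapePattern = '\\\\[^\n\r\f0-9a-f]';

// Quoted-string body templates before and after the fix; %s is the quote character.
$before = '([^\n\r\f%s]|'.$stringEscapePattern.')*';
$after = '([^\n\r\f\\\\%s]|'.$stringEscapePattern.')*';

// Hypothetical helper: match the string body from the start of $input.
function matchStringBody(string $template, string $quote, string $input): string
{
    preg_match('~^'.sprintf($template, $quote).'~i', $input, $m);

    return $m[0] ?? '';
}

// The CSS text a\'b' : an escaped quote, then the real closing quote.
$input = "a\\'b'";

echo matchStringBody($before, "'", $input), "\n"; // a\
echo matchStringBody($after, "'", $input), "\n";  // a\'b
```

With the old class the match stops at `a\`, so the escaped quote would be taken as the closing one. With the fix, the backslash can only be consumed by the escape alternative, and the match runs to `a\'b`.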

stof modified the milestones: 6.3, 5.4 on Dec 23, 2022
}

public function getNewLineEscapePattern(): string
{
-return '~^'.$this->newLineEscapePattern.'~';
+return '~'.$this->newLineEscapePattern.'~';
fancyweb (Contributor Author)

https://github.com/scrapy/cssselect/blob/ddd9784977fca6e7da4439d837e4f510f1f10638/cssselect/parser.py#L930

Check the three "_sub_*" escape functions and how they are used: all occurrences are replaced, so the pattern must not be anchored to the start of the string.
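Python's compiled-pattern `sub()` replaces every non-overlapping match, whereas the `^` anchor in the old PHP pattern only allowed a match at the very start of the string. A minimal sketch of the difference with `preg_replace()` follows. The newline-escape regex (a backslash followed by `\n`, `\r\n`, `\r` or `\f`) is an assumption modelled on the CSS syntax rules, not copied from Symfony.

```php
<?php
// Assumed newline-escape pattern: a backslash followed by a line break.
$newLineEscapePattern = '\\\\(?:\n|\r\n?|\f)';

$anchored = '~^'.$newLineEscapePattern.'~';  // before the fix
$unanchored = '~'.$newLineEscapePattern.'~'; // after the fix

// A quoted-string body containing two escaped line continuations.
$value = "foo\\\nbar\\\nbaz";

// Unchanged: the ^ anchor only allows a match at offset 0, which is "f".
echo preg_replace($anchored, '', $value), "\n";

// Both escaped newlines are removed, matching Python's sub() behaviour.
echo preg_replace($unanchored, '', $value), "\n"; // foobarbaz
```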

['#test\:colon', ['Hash[Element[*]#test:colon]']],
[".a\xc1b", ["Class[Element[*].a\xc1b]"]],
// unicode escape: \22 == "
['*[aval="\'\22\'"]', ['Attribute[Element[*][aval = \'\'"\'\']]']],
fancyweb (Contributor Author)

I borrowed those tests from https://github.com/scrapy/cssselect/blob/ddd9784977fca6e7da4439d837e4f510f1f10638/tests/test_cssselect.py#L550. They highlighted the "quotedStringPattern" problem.
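For readers unfamiliar with the escapes these tests exercise: `\22` is a CSS unicode escape (hex 22 is `"`), optionally followed by one whitespace character that is swallowed, and `\:` is a simple escape for `:`. The sketch below decodes both kinds. The patterns are written from the CSS syntax rules, and `decodeCssEscapes()` is a hypothetical helper, not the Symfony or cssselect implementation.

```php
<?php
// Assumed patterns, written from the CSS syntax rules (not copied from Symfony):
// unicode escape = backslash, 1-6 hex digits, one optional whitespace (CRLF counts as one);
// simple escape = backslash plus any single character.
const UNICODE_ESCAPE = '~\\\\([0-9a-f]{1,6})(?:\r\n|[ \n\r\t\f])?~i';
const SIMPLE_ESCAPE = '~\\\\(.)~';

// Hypothetical helper: decode unicode escapes first, then simple ones, so that
// "\22" is not misread as a simple escape of the character "2".
function decodeCssEscapes(string $value): string
{
    $value = preg_replace_callback(UNICODE_ESCAPE, static function (array $m): string {
        // Convert the code point to UTF-8 via a numeric HTML entity, which only needs
        // ext/standard. This sketch does no special handling of invalid code points.
        return html_entity_decode('&#'.hexdec($m[1]).';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }, $value);

    return preg_replace(SIMPLE_ESCAPE, '$1', $value);
}

echo decodeCssEscapes("'\\22'"), "\n";       // '"'
echo decodeCssEscapes("'\\22 2'"), "\n";     // '"2'  (the space after \22 is swallowed)
echo decodeCssEscapes('test\\:colon'), "\n"; // test:colon
```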

stof (Member) commented Dec 23, 2022

@fancyweb if you have some availability and are keen on checking that Python code, you might want to work on porting the missing features to Symfony 6.3 😄 I created a few issues about them (but I'm quite sure the list is still incomplete for now).

fabpot (Member) left a comment


Having a follow-up PR that would catch up with upstream would be great indeed!

fabpot (Member) commented Dec 23, 2022

Thank you @fancyweb.

fabpot merged commit e7ec8a6 into symfony:5.4 on Dec 23, 2022
fancyweb deleted the css-selector/fix-escape-patterns branch on December 23, 2022 15:24