-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
[HtmlSanitizer] Use the native HTML5 parser when using PHP 8.4+ #61366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
48dda59
to
117bdf9
Compare
117bdf9
to
743f812
Compare
324d9b6
to
3383879
Compare
98b6dab
to
1c40c66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
1c40c66
to
bf602fc
Compare
bf602fc
to
d0f98ad
Compare
…ext arg to ParserInterface::parse() (nicolas-grekas) This PR was merged into the 8.0 branch. Discussion ---------- [HtmlSanitizer] Remove MastermindsParser and add $context arg to ParserInterface::parse() | Q | A | ------------- | --- | Branch? | 8.0 | Bug fix? | no | New feature? | yes | Deprecations? | no | Issues | - | License | MIT Follows #61366 Commits ------- b291f58 [HtmlSanitizer] Remove MastermindsParser and add $context arg to ParserInterface::parse()
// Remove NULL character | ||
$input = str_replace(\chr(0), '', $input); | ||
// Remove NULL character and HTML entities for null byte | ||
$input = str_replace([\chr(0), '�', '�', '�', '�'], '�', $input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$input = str_replace([\chr(0), '�', '�', '�', '�'], '', $input);
Why not like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The native parser already turns variants of �
into �
. We cannot really hook there. I think this is what the spec says to do.
Also: removing characters is a know vector for attacks, because it breaks content scanners.
Together with #61356, this PR allows removing any dependency on masterminds/html5 in favor of the native HTML5 capabilities of PHP 8.4 on Symfony 8
In order to do so, this we:
MastermindsParser
; useNativeParser
insteadParserInterface::parse()
can now return\Dom\Node|\DOMNode|null
instead of just\DOMNode|null
$context
toParserInterface::parse()
Note that
DomVisitor
is internal so no BC breaks there.And
StringSanitizer::htmlLower()
can leveragestrtolower()
since PHP 8.2 thanks to https://wiki.php.net/rfc/strtolower-ascii