Skip to content

[HtmlSanitizer] Use the native HTML5 parser when using PHP 8.4+ #61366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 12, 2025

Conversation

nicolas-grekas
Copy link
Member

@nicolas-grekas nicolas-grekas commented Aug 8, 2025

Q A
Branch? 7.4
Bug fix? no
New feature? yes
Deprecations? yes
Issues #53666
License MIT

Together with #61356, this PR allows removing any dependency on masterminds/html5 in favor of the native HTML5 capabilities of PHP 8.4 on Symfony 8

In order to do so, this we:

  • Use the native HTML5 parser when using PHP 8.4+
  • Deprecate MastermindsParser; use NativeParser instead
  • [BC BREAK] ParserInterface::parse() can now return \Dom\Node|\DOMNode|null instead of just \DOMNode|null
  • Add argument $context to ParserInterface::parse()

Note that DomVisitor is internal so no BC breaks there.
And StringSanitizer::htmlLower() can leverage strtolower() since PHP 8.2 thanks to https://wiki.php.net/rfc/strtolower-ascii

@nicolas-grekas nicolas-grekas force-pushed the hs-php84 branch 2 times, most recently from 98b6dab to 1c40c66 Compare August 12, 2025 10:43
Copy link
Member

@alexandre-daubois alexandre-daubois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

nicolas-grekas added a commit that referenced this pull request Aug 12, 2025
…ext arg to ParserInterface::parse() (nicolas-grekas)

This PR was merged into the 8.0 branch.

Discussion
----------

[HtmlSanitizer] Remove MastermindsParser and add $context arg to ParserInterface::parse()

| Q             | A
| ------------- | ---
| Branch?       | 8.0
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Issues        | -
| License       | MIT

Follows #61366

Commits
-------

b291f58 [HtmlSanitizer] Remove MastermindsParser and add $context arg to ParserInterface::parse()
@nicolas-grekas nicolas-grekas deleted the hs-php84 branch August 12, 2025 16:44
// Remove NULL character
$input = str_replace(\chr(0), '', $input);
// Remove NULL character and HTML entities for null byte
$input = str_replace([\chr(0), '�', '�', '�', '�'], '', $input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$input = str_replace([\chr(0), '�', '�', '�', '�'], '', $input);

Why not like this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The native parser already turns variants of � into . We cannot really hook there. I think this is what the spec says to do.
Also: removing characters is a know vector for attacks, because it breaks content scanners.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants