[DomCrawler] Use the native HTML5 parser on PHP 8.4 #61475

Merged: 1 commit, merged Aug 21, 2025


nicolas-grekas (Member)

| Q             | A
| ------------- | ---
| Branch?       | 7.4
| Bug fix?      | no
| New feature?  | no
| Deprecations? | no
| Issues        | -
| License       | MIT

This PR keeps the existing DOM*-based API but uses the native HTML5 parser on PHP 8.4 instead of masterminds/html5.
It works by parsing HTML strings with Dom\HTMLDocument, serializing the result to XML, and loading that XML again with DOMDocument::loadXML().
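A minimal sketch of that round trip, assuming PHP 8.4 with ext-dom. The function name `parseHtml5AsLegacyDom()` is hypothetical: it is not Symfony's API and not the code merged in this PR. It only illustrates where the double parsing comes from (one HTML5 parse, then one XML parse). Since the HTML5 parser places HTML elements in the XHTML namespace, the reloaded document likely carries that namespace too, so XPath queries on it may need a registered prefix.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: parse an HTML string with PHP 8.4's native HTML5
 * parser, then return it as a legacy \DOMDocument by round-tripping
 * through XML (this is the "double parsing" trade-off).
 */
function parseHtml5AsLegacyDom(string $html): \DOMDocument
{
    if (!class_exists(\Dom\HTMLDocument::class)) {
        // No native HTML5 parser before PHP 8.4; Symfony keeps using
        // masterminds/html5 there, which this sketch does not reproduce.
        throw new \RuntimeException('Dom\HTMLDocument requires PHP 8.4+.');
    }

    // 1st parse: HTML5 tree construction; LIBXML_NOERROR silences parse-error warnings.
    $html5 = \Dom\HTMLDocument::createFromString($html, \LIBXML_NOERROR);

    // Serialize the HTML5 tree as well-formed XML.
    $xml = $html5->saveXml();

    // 2nd parse: load that XML into the legacy DOM* API.
    $dom = new \DOMDocument();
    if (false === $xml || !$dom->loadXML($xml)) {
        throw new \RuntimeException('Could not reload the serialized document.');
    }

    return $dom;
}
```

For example, `parseHtml5AsLegacyDom('<p>One<p>Two')` should yield a document with two `<p>` elements, because HTML5 tree construction implicitly closes the first paragraph when the second one opens.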

This effectively replaces #61356, since this approach introduces no BC breaks.

The drawback compared to a more native approach is the double parsing: each document is parsed once as HTML5 and then again as XML.
This could be improved later by providing a way to use the new Dom\* API directly, once that has proven worth it.

carsonbot added this to the 7.4 milestone Aug 20, 2025
OskarStark changed the title [DomCrawler] Use the native HTM5 parser on PHP 8.4 [DomCrawler] Use the native HTML5 parser on PHP 8.4 Aug 20, 2025
nicolas-grekas force-pushed the dom-native-html5 branch 4 times, most recently from f963618 to a6a0033 August 21, 2025 06:43
nicolas-grekas merged commit 8187625 into symfony:7.4 Aug 21, 2025
10 of 12 checks passed
nicolas-grekas added a commit that referenced this pull request Aug 21, 2025
…nks to the native DOM parser (nicolas-grekas)

This PR was merged into the 8.0 branch.

Discussion
----------

[DomCrawler] Always parse according to HTML5 rules thanks to the native DOM parser

| Q             | A
| ------------- | ---
| Branch?       | 8.0
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Issues        | -
| License       | MIT

Follows #61475

Commits
-------

0425b2a [DomCrawler] Always parse according to HTML5 rules thanks to the native DOM parser
nicolas-grekas deleted the dom-native-html5 branch August 21, 2025 09:08