[DomCrawler] Use the native HTML5 parser on PHP 8.4 #61475

Open: 1 commit to be merged into base branch 7.4

nicolas-grekas (Member)

| Q             | A
| ------------- | ---
| Branch?       | 7.4
| Bug fix?      | no
| New feature?  | no
| Deprecations? | no
| Issues        | -
| License       | MIT

This PR keeps the DOM*-based API but uses the native HTML5 parser on PHP 8.4 instead of masterminds/html5.
This works by parsing HTML strings using Dom\HTMLDocument then serializing to XML, and loading again using DOMDocument::loadXML().
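
The round trip described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `htmlToLegacyDom()` is hypothetical, and the choice of the `LIBXML_NOERROR` and `Dom\HTML_NO_DEFAULT_NS` options is an assumption made so that sloppy markup does not emit warnings and elements stay out of the XHTML namespace.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not the PR's actual code): parses an HTML string with
 * the native HTML5 parser, then re-loads the result as a legacy DOMDocument.
 */
function htmlToLegacyDom(string $html): \DOMDocument
{
    if (\PHP_VERSION_ID < 80400) {
        throw new \RuntimeException('Dom\HTMLDocument requires PHP 8.4 or later.');
    }

    // First parse: the native HTML5 parser.
    // Assumed options: LIBXML_NOERROR silences parse-error warnings, and
    // Dom\HTML_NO_DEFAULT_NS keeps elements out of the XHTML namespace.
    $html5 = \Dom\HTMLDocument::createFromString(
        $html,
        \LIBXML_NOERROR | \Dom\HTML_NO_DEFAULT_NS
    );

    // Serialize the parsed tree as XML.
    $xml = $html5->saveXml();
    if (false === $xml) {
        throw new \RuntimeException('Could not serialize the document to XML.');
    }

    // Second parse: load the XML into the legacy DOM* API.
    $dom = new \DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new \RuntimeException('Could not load the serialized XML.');
    }

    return $dom;
}
```

Callers keep working with `DOMDocument`, `DOMElement`, and `DOMXPath` as before, e.g. `htmlToLegacyDom('<p>Hello<br>world')->getElementsByTagName('p')`; only the parsing step changes.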

This basically replaces #61356 since it removes any BC breaks.

The drawback compared to a more native approach is the double parsing.
This could be addressed later by providing a way to leverage the new Dom\* API directly,
but that needs to be proven worth it first.
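
For comparison, a single-parse approach that stays entirely on the new Dom\* API might look like the sketch below. It is only an illustration of what "directly" could mean: `extractHrefs()` is a hypothetical helper, and nothing here reflects a planned DomCrawler API.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: collects link targets using only the PHP 8.4 Dom\* API,
 * so the HTML is parsed once and never converted to a legacy DOMDocument.
 *
 * @return list<string>
 */
function extractHrefs(string $html): array
{
    // LIBXML_NOERROR (assumed choice) silences warnings for malformed markup.
    $doc = \Dom\HTMLDocument::createFromString($html, \LIBXML_NOERROR);

    $hrefs = [];
    // querySelectorAll() is available on the new Dom\Document classes.
    foreach ($doc->querySelectorAll('a[href]') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }

    return $hrefs;
}
```

The trade-off is that the returned nodes are `Dom\Element` instances rather than `DOMElement`, so code written against the DOM*-based API would not accept them unchanged.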

carsonbot added this to the 7.4 milestone on Aug 20, 2025
OskarStark changed the title from "[DomCrawler] Use the native HTM5 parser on PHP 8.4" to "[DomCrawler] Use the native HTML5 parser on PHP 8.4" on Aug 20, 2025