-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
Prevent adding non-DOMElement elements in DomCrawler #16058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note that I submitted it in 2.8 because there is a slight behavior change, and the previous changes about |
} | ||
|
||
if (!$node instanceof \DOMElement) { | ||
throw new \InvalidArgumentException(sprintf('Nodes set in a Crawler must be DOMElement instances, "%s" given.', get_class($node))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must be DOMElement or DOMDocument instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test for this new exception?
👍 |
Many methods of the DomCrawler component are relying on the DOMElement API, not only on the DOMNode API.
507fdf9
to
9f362a1
Compare
updated. This PR will conflict with #16057. I will rebase one of them after the merge of the first one. |
@symfony/deciders this one is ready |
Thank you @stof. |
…stof) This PR was merged into the 2.8 branch. Discussion ---------- Prevent adding non-DOMElement elements in DomCrawler | Q | A | ------------- | --- | Bug fix? | kind of | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | n/a | License | MIT | Doc PR | n/a Many methods of the DomCrawler component are relying on the DOMElement API, not only on the DOMNode API. All the typehints in the Form and Link APIs were already fixed in 2.5 because they are unusable with other kinds of nodes (fatal errors). However, the Crawler itself was not fixed. and this means that a bunch of its APIs can trigger fatal errors when passing other kinds of nodes. Thus, there is a case where the code was allowing such nodes to be injected in the Crawler for some XPath queries. I fixed it to avoid it, adding the same kind of filtering than in other places. Commits ------- 9f362a1 Prevent adding non-DOMElement elements in DomCrawler
{ | ||
$crawler = new Crawler(); | ||
$crawler->add(new \DOMNode()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does not pass anymore because of the trigger added by the other PR. @stof Can you submit a PR to fix this? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is why I said I would rebase to fix conflicts: the order of triggers should be reverted (first the validation, then the deprecation). I will send a PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #16073 for the PR
This PR was merged into the 2.8 branch. Discussion ---------- Fix the DomCrawler tests | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | n/a | License | MIT | Doc PR | n/a This fixes the tests by applying changes from #16057 and #16058 in the right order in ``addNode`` Commits ------- e6feed2 Fix the DomCrawler tests
This commit breaks my project HtmlPageDom, as it is relying on adding
|
@wasinger it works well for a small part of the Crawler API, and triggers fatal errors for other methods. this was never a supported use case of the component |
and btw, your child class replaces some Crawler method with methods with a different behavior, which does not respect the OOP principles. You should rather build your library using decoration rather than inheritance. |
@stof You maybe right that it was originally not supposed to work or be used in a way but people did and it worked. So I have to second @wasinger in proposing to move this PR to 3.0 and to add deprecation warnings to 2.8. because this is a BC break and I have already hit it with 2 different packages. |
@stof You are right when saying that overwriting methods with a different behavior is not a nice programming style, I'm also not happy with it and will think about a rewrite of my library. But even using decoration I would need to have text nodes stored in a Crawler object. |
@boekkooi having half of the methods triggering fatal errors is not something I call a supported use case. |
@stof but half of them did work and since a lot of people just use stuff that works and that is now broken this PR is a BC break. Also having none of the method's work is more of a problem then having some of them fail. |
Many methods of the DomCrawler component are relying on the DOMElement API, not only on the DOMNode API. All the typehints in the Form and Link APIs were already fixed in 2.5 because they are unusable with other kinds of nodes (fatal errors). However, the Crawler itself was not fixed. and this means that a bunch of its APIs can trigger fatal errors when passing other kinds of nodes.
Thus, there is a case where the code was allowing such nodes to be injected in the Crawler for some XPath queries. I fixed it to avoid it, adding the same kind of filtering than in other places.