Skip to content

Prevent adding non-DOMElement elements in DomCrawler #16058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 2, 2015

Conversation

stof
Copy link
Member

@stof stof commented Oct 1, 2015

Q A
Bug fix? kind of
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets n/a
License MIT
Doc PR n/a

Many methods of the DomCrawler component are relying on the DOMElement API, not only on the DOMNode API. All the typehints in the Form and Link APIs were already fixed in 2.5 because they are unusable with other kinds of nodes (fatal errors). However, the Crawler itself was not fixed. and this means that a bunch of its APIs can trigger fatal errors when passing other kinds of nodes.

Thus, there is a case where the code was allowing such nodes to be injected in the Crawler for some XPath queries. I fixed it to avoid it, adding the same kind of filtering than in other places.

@stof
Copy link
Member Author

stof commented Oct 1, 2015

Note that I submitted it in 2.8 because there is a slight behavior change, and the previous changes about DOMNode vs DOMDocument were not applied in 2.3

@stof stof added the DomCrawler label Oct 1, 2015
}

if (!$node instanceof \DOMElement) {
throw new \InvalidArgumentException(sprintf('Nodes set in a Crawler must be DOMElement instances, "%s" given.', get_class($node)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be DOMElement or DOMDocument instances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test for this new exception?

@fabpot
Copy link
Member

fabpot commented Oct 2, 2015

👍

Many methods of the DomCrawler component are relying on the DOMElement
API, not only on the DOMNode API.
@stof stof force-pushed the validate_crawler_element branch from 507fdf9 to 9f362a1 Compare October 2, 2015 08:01
@stof
Copy link
Member Author

stof commented Oct 2, 2015

updated.

This PR will conflict with #16057. I will rebase one of them after the merge of the first one.

@stof
Copy link
Member Author

stof commented Oct 2, 2015

@symfony/deciders this one is ready

@fabpot
Copy link
Member

fabpot commented Oct 2, 2015

Thank you @stof.

@fabpot fabpot merged commit 9f362a1 into symfony:2.8 Oct 2, 2015
fabpot added a commit that referenced this pull request Oct 2, 2015
…stof)

This PR was merged into the 2.8 branch.

Discussion
----------

Prevent adding non-DOMElement elements in DomCrawler

| Q             | A
| ------------- | ---
| Bug fix?      | kind of
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | n/a
| License       | MIT
| Doc PR        | n/a

Many methods of the DomCrawler component are relying on the DOMElement API, not only on the DOMNode API. All the typehints in the Form and Link APIs were already fixed in 2.5 because they are unusable with other kinds of nodes (fatal errors). However, the Crawler itself was not fixed. and this means that a bunch of its APIs can trigger fatal errors when passing other kinds of nodes.

Thus, there is a case where the code was allowing such nodes to be injected in the Crawler for some XPath queries. I fixed it to avoid it, adding the same kind of filtering than in other places.

Commits
-------

9f362a1 Prevent adding non-DOMElement elements in DomCrawler
@stof stof deleted the validate_crawler_element branch October 2, 2015 10:50
{
$crawler = new Crawler();
$crawler->add(new \DOMNode());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not pass anymore because of the trigger added by the other PR. @stof Can you submit a PR to fix this? Thanks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is why I said I would rebase to fix conflicts: the order of triggers should be reverted (first the validation, then the deprecation). I will send a PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #16073 for the PR

@stof stof mentioned this pull request Oct 2, 2015
fabpot added a commit that referenced this pull request Oct 2, 2015
This PR was merged into the 2.8 branch.

Discussion
----------

Fix the DomCrawler tests

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | n/a
| License       | MIT
| Doc PR        | n/a

This fixes the tests by applying changes from #16057 and #16058 in the right order in ``addNode``

Commits
-------

e6feed2 Fix the DomCrawler tests
@wasinger
Copy link

wasinger commented Oct 8, 2015

This commit breaks my project HtmlPageDom, as it is relying on adding DOMText nodes to a Crawler, which worked fine so far.

  • Is it really needed to allow only DOMElements (or would it be possible to allow DOMText too)?
  • In any case it breaks existing applications so I consider it a breaking change that should not be in a minor version. Please consider to remove it from 2.8 and introduce it in 3.0 only.

@stof
Copy link
Member Author

stof commented Oct 8, 2015

@wasinger it works well for a small part of the Crawler API, and triggers fatal errors for other methods. this was never a supported use case of the component

@stof
Copy link
Member Author

stof commented Oct 8, 2015

and btw, your child class replaces some Crawler method with methods with a different behavior, which does not respect the OOP principles. You should rather build your library using decoration rather than inheritance.

@boekkooi
Copy link
Contributor

boekkooi commented Nov 2, 2015

@stof You maybe right that it was originally not supposed to work or be used in a way but people did and it worked. So I have to second @wasinger in proposing to move this PR to 3.0 and to add deprecation warnings to 2.8. because this is a BC break and I have already hit it with 2 different packages.

@wasinger
Copy link

wasinger commented Nov 2, 2015

@stof You are right when saying that overwriting methods with a different behavior is not a nice programming style, I'm also not happy with it and will think about a rewrite of my library. But even using decoration I would need to have text nodes stored in a Crawler object.

@stof
Copy link
Member Author

stof commented Nov 2, 2015

@boekkooi having half of the methods triggering fatal errors is not something I call a supported use case.

@boekkooi
Copy link
Contributor

boekkooi commented Nov 2, 2015

@stof but half of them did work and since a lot of people just use stuff that works and that is now broken this PR is a BC break. Also having none of the method's work is more of a problem then having some of them fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants