Skip to content

[DomCrawler] Revert previous restriction, allow selection of every DOMNode object #17021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

[DomCrawler] Revert previous restriction, allow selection of every DOMNode object #17021

wants to merge 8 commits into from

Conversation

EdgarPE
Copy link
Contributor

@EdgarPE EdgarPE commented Dec 15, 2015

Q A
Bug fix? no
New feature? yes, revert to previous behaviour
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #16933
License MIT
Doc PR

Update: no BC break here.

Related Pull request: #16058
Mostly reverted previous commits: f416e70 9e60980

The linked PR above introduced a serious limitation in the DomCrawler component, namely only allows DOMElement and DOMDocument objects to be selected/added to the Crawler. Previously it was possible to completely parse, process, and also modify any Html, and XML documents.

Text nodes, attribute nodes and comment nodes are now impossible to search or add using the Crawler. I think these changes greatly reduce the usage possibilities of DomCrawler. So, my suggestion is:

Let's revert two previous changes and allow selecting every possible DOMNode object. There were fatal errors in some cases, witch were fixed with the changed type hints. Instead of type hints, now \InvalidArgumentException is thrown if needed in these methods: Crawler::link(), Crawler::links() and Crawler::form(), because these methods add special treatment for specific nodes.

I am new to this community so I'm sorry if my commit or Pull Request have some flaws or miss something. In that case please let me know what should I do to fix it.

@EdgarPE EdgarPE changed the title [DomCrawler] Revert previous restriction, allow selection of every DOMNode pbject [DomCrawler] Revert previous restriction, allow selection of every DOMNode object Dec 15, 2015
* @param string $currentUri The URI of the page where the form is embedded
* @param string $method The method to use for the link (if null, it defaults to the method defined by the form)
* @param string $baseHref The URI of the <base> used for relative links, but not for empty action
*
* @throws \LogicException if the node is not a button inside a form tag
*/
public function __construct(\DOMElement $node, $currentUri, $method = null, $baseHref = null)
public function __construct(\DOMNode $node, $currentUri, $method = null, $baseHref = null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be changed back, as this class cannot work with anything else than a DOMElement (it triggers fatal errors for others)

@stof
Copy link
Member

stof commented Dec 15, 2015

All changes in Form, Link and fields should be reverted

@@ -830,14 +842,9 @@ private function filterRelativeXPath($xpath)

$crawler = $this->createSubCrawler(null);

foreach ($this->nodes as $node) {
foreach ($this as $node) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should be reverted. There is no reason to iterate on the crawler rather than iterator directly on the internal array

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 15, 2015

@stof Thank you for your comments. Later today I'll update my patch to incorporate everything you asked, and also fix fabpot.io errors. The Travis CI fail seems to be irrelevant, or at least caused by something else.

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 15, 2015

@stof based on your input I kept all type hints in Form, Link and Fields/*, also related tests. But removed the DOMElement restriction form Crawler on all nodes. Nodes now can contain DOMNode-s, but Crawler::link(). Crawler::links() and Crawler::forms() throw an \InvalidArgumentException if needed.

This way there is no need for a second BC BREAK.

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 15, 2015

Also reworded the exceptions and made the changes in test cases. Now, this is a much smaller PR, only 2 files changed.

@@ -706,6 +702,10 @@ public function link($method = 'get')

$node = $this->getNode(0);

if(!$node instanceof \DOMElement) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing space before the opening parenthesis.

@stof
Copy link
Member

stof commented Dec 16, 2015

should we merge this in older branches as a bugfix ?

@xabbuh
Copy link
Member

xabbuh commented Dec 16, 2015

Does it work the same when applied to older branches?

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 16, 2015

This change set reverts/rewrites 9e60980 which was introduced in 2.8 Sadly, this exact change set as a patch does not apply to 2.8. Should I work on a 2.8 version of these changes?

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 16, 2015

I don't know what is the policy of backports, so I made a different PR for it.

@@ -706,6 +702,10 @@ public function link($method = 'get')

$node = $this->getNode(0);

if (!$node instanceof \DOMElement) {
throw new \InvalidArgumentException(sprintf("The current node list should contain only DOMElement instances, '%s' found.", get_class($node)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, we should use single quotes to enclose the message, and double quotes inside the message:

throw new \InvalidArgumentException(sprintf('The current node list should contain only DOMElement instances, "%s" found.', get_class($node)));

@jakzal
Copy link
Contributor

jakzal commented Dec 17, 2015

@EdgarPE great job!

👍 once my comments are addressed

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 17, 2015

@jakzal Thank you! Let me know if there is anything I should fix.

@@ -745,6 +735,26 @@ public function testLink()
}
}

/**
* @expectedException \InvalidArgumentException
* @expectedExceptionMessageRegExp /^The current node list should contain only DOMElement instances/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be only @expectedExceptionMessage (no regexp)

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 18, 2015

It looks like the tests are failing because of the new commit is master. Let me know if there is anything I should fix, or if I need to do a rebase now.

@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 21, 2015

Rebased my changes to latest master, DomCrawler tests still pass. I'm looking forward for your update on this.

@stof
Copy link
Member

stof commented Dec 22, 2015

I suggest merging #17035 instead, which does the same changes in 2.8

@xabbuh
Copy link
Member

xabbuh commented Dec 22, 2015

I agree with @stof.

…MNode object.

Squashed commit of the following:

commit 69c97511c99e761dfeb612bb1d3b8a83b8984810
Author: EdgarPE <hello@edgarpe.hu>
Date:   Tue Dec 15 17:01:10 2015 +0100

    Add CHANGELOG

commit 19a171af813e83112fb404a1b86985215c83346a
Author: EdgarPE <hello@edgarpe.hu>
Date:   Tue Dec 15 16:48:46 2015 +0100

    Test coverage for InvalidArgumentExceptions in Crawler.php

commit 8b050b780222e8c09c1b5ac2776584cd0d42d807
Author: EdgarPE <hello@edgarpe.hu>
Date:   Tue Dec 15 15:43:13 2015 +0100

    Reverted [DomCrawler] Changed typehints form DomNode to DomElement f416e70

commit 52ff5af0c13417277b4e3b122f35536ed43d0668
Author: EdgarPE <hello@edgarpe.hu>
Date:   Tue Dec 15 15:33:58 2015 +0100

    Reverted: feature #16058 Prevent adding non-DOMElement elements in DomCrawler (stof) 9f362a1
@EdgarPE
Copy link
Contributor Author

EdgarPE commented Dec 22, 2015

@xabbuh @stof 👍 Merging #17035 into master causes some conflicts but it's easy to solve

@jakzal jakzal closed this Dec 23, 2015
jakzal added a commit that referenced this pull request Dec 23, 2015
…ion of every DOMNode object (EdgarPE)

This PR was squashed before being merged into the 2.8 branch (closes #17035).

Discussion
----------

[DomCrawler] Revert previous restriction, allow selection of every DOMNode object

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes, revert to previous behaviour
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #16933
| License       | MIT
| Doc PR        |

This is a backport of PR #17021

Commits
-------

d2872a3 [DomCrawler] Revert previous restriction, allow selection of every DOMNode object
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants