-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
Deprecate loading multiple documents in the same crawler #16057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -20,7 +20,10 @@ public function testConstructor() | |||
$crawler = new Crawler(); | |||
$this->assertCount(0, $crawler, '__construct() returns an empty crawler'); | |||
|
|||
$crawler = new Crawler(new \DOMNode()); | |||
$doc = new \DOMDocument(); | |||
$node = $doc->createElement('test'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change should actually be done in older branches too. Passing an orphan DOMNode creates a broken crawler:
- we actually only accept
DOMElement
(andDOMDocument
where we get the documentElement in it), not anyDOMNode
(you get errors in some cases if you inject such nodes in a crawler) - we don't accept nodes not attached to a document (you would get an
Invalid state error
from the DOM library when accessingownerDocument
, which we already do when filtering the crawler)
the failure with deps=low in the Security component seems related to a bug in PHPUnit 5.0 Edit: reported as sebastianbergmann/phpunit#1886 |
and for deps=high, it is because 2.8 needs to be merged into master to add the new class |
Once this PR is merged, I will work on the update of the component for 3.0 to avoid using SplObjectStorage (I prefer waiting until this PR is merged to avoid useless conflicts) |
@@ -307,6 +313,14 @@ public function addNodes(array $nodes) | |||
*/ | |||
public function addNode(\DOMNode $node) | |||
{ | |||
if (null !== $this->document && $this->document !== $node->ownerDocument) { | |||
@trigger_error('Attaching DOM nodes from multiple documents in a Crawler is deprecated as of 2.8 and will be forbidden in 3.0.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing E_USER_DEPRECATED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
5a8e2ad
to
a78caa9
Compare
@fabpot it would be great to merge 2.8 into master to fix the deps=high builds (this is painful because of the descriptor fixture changes btw) |
a78caa9
to
0d1cb3b
Compare
@symfony/deciders this one is ready |
Thank you @stof. |
…er (stof) This PR was merged into the 2.8 branch. Discussion ---------- Deprecate loading multiple documents in the same crawler | Q | A | ------------- | --- | Bug fix? | no | New feature? | no | BC breaks? | no | Deprecations? | yes | Tests pass? | yes | Fixed tickets | #15849 | License | MIT | Doc PR | n/a Note that loading multiple documents in the same crawler already creates weird things when working with namespaces (the list of mapping of aliases to namespaces is shared between documents, which was flawed). As said in the issue, this opens the door to optimizations in the future (sharing the DOMXpath instance for instance, including with subcrawler) Commits ------- 0d1cb3b Deprecate loading multiple documents in the same crawler
This PR was merged into the 2.8 branch. Discussion ---------- Fix the DomCrawler tests | Q | A | ------------- | --- | Bug fix? | yes | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | n/a | License | MIT | Doc PR | n/a This fixes the tests by applying changes from #16057 and #16058 in the right order in ``addNode`` Commits ------- e6feed2 Fix the DomCrawler tests
Note that loading multiple documents in the same crawler already creates weird things when working with namespaces (the list of mapping of aliases to namespaces is shared between documents, which was flawed).
As said in the issue, this opens the door to optimizations in the future (sharing the DOMXpath instance for instance, including with subcrawler)