Skip to content

Deprecate loading multiple documents in the same crawler #16057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 2, 2015

Conversation

stof
Copy link
Member

@stof stof commented Oct 1, 2015

Q A
Bug fix? no
New feature? no
BC breaks? no
Deprecations? yes
Tests pass? yes
Fixed tickets #15849
License MIT
Doc PR n/a

Note that loading multiple documents in the same crawler already creates weird things when working with namespaces (the list of mapping of aliases to namespaces is shared between documents, which was flawed).

As said in the issue, this opens the door to optimizations in the future (sharing the DOMXpath instance for instance, including with subcrawler)

@@ -20,7 +20,10 @@ public function testConstructor()
$crawler = new Crawler();
$this->assertCount(0, $crawler, '__construct() returns an empty crawler');

$crawler = new Crawler(new \DOMNode());
$doc = new \DOMDocument();
$node = $doc->createElement('test');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change should actually be done in older branches too. Passing an orphan DOMNode creates a broken crawler:

  • we actually only accept DOMElement (and DOMDocument where we get the documentElement in it), not any DOMNode (you get errors in some cases if you inject such nodes in a crawler)
  • we don't accept nodes not attached to a document (you would get an Invalid state error from the DOM library when accessing ownerDocument, which we already do when filtering the crawler)

@stof stof added the DomCrawler label Oct 1, 2015
@stof
Copy link
Member Author

stof commented Oct 1, 2015

the failure with deps=low in the Security component seems related to a bug in PHPUnit 5.0

Edit: reported as sebastianbergmann/phpunit#1886

@stof
Copy link
Member Author

stof commented Oct 1, 2015

and for deps=high, it is because 2.8 needs to be merged into master to add the new class

@stof
Copy link
Member Author

stof commented Oct 1, 2015

Once this PR is merged, I will work on the update of the component for 3.0 to avoid using SplObjectStorage (I prefer waiting until this PR is merged to avoid useless conflicts)

@@ -307,6 +313,14 @@ public function addNodes(array $nodes)
*/
public function addNode(\DOMNode $node)
{
if (null !== $this->document && $this->document !== $node->ownerDocument) {
@trigger_error('Attaching DOM nodes from multiple documents in a Crawler is deprecated as of 2.8 and will be forbidden in 3.0.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing E_USER_DEPRECATED

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@stof stof force-pushed the deprecate_multiple_documents branch from 5a8e2ad to a78caa9 Compare October 2, 2015 07:17
@stof
Copy link
Member Author

stof commented Oct 2, 2015

@fabpot it would be great to merge 2.8 into master to fix the deps=high builds (this is painful because of the descriptor fixture changes btw)

@stof stof force-pushed the deprecate_multiple_documents branch from a78caa9 to 0d1cb3b Compare October 2, 2015 08:46
@stof stof added the Ready label Oct 2, 2015
@stof
Copy link
Member Author

stof commented Oct 2, 2015

@symfony/deciders this one is ready

@fabpot
Copy link
Member

fabpot commented Oct 2, 2015

Thank you @stof.

@fabpot fabpot merged commit 0d1cb3b into symfony:2.8 Oct 2, 2015
fabpot added a commit that referenced this pull request Oct 2, 2015
…er (stof)

This PR was merged into the 2.8 branch.

Discussion
----------

Deprecate loading multiple documents in the same crawler

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #15849
| License       | MIT
| Doc PR        | n/a

Note that loading multiple documents in the same crawler already creates weird things when working with namespaces (the list of mapping of aliases to namespaces is shared between documents, which was flawed).

As said in the issue, this opens the door to optimizations in the future (sharing the DOMXpath instance for instance, including with subcrawler)

Commits
-------

0d1cb3b Deprecate loading multiple documents in the same crawler
@stof stof deleted the deprecate_multiple_documents branch October 2, 2015 10:50
@stof stof mentioned this pull request Oct 2, 2015
fabpot added a commit that referenced this pull request Oct 2, 2015
This PR was merged into the 2.8 branch.

Discussion
----------

Fix the DomCrawler tests

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | n/a
| License       | MIT
| Doc PR        | n/a

This fixes the tests by applying changes from #16057 and #16058 in the right order in ``addNode``

Commits
-------

e6feed2 Fix the DomCrawler tests
@fabpot fabpot mentioned this pull request Nov 16, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants