[DomCrawler] Inherit the namespace cache in subcrawlers #19422

Merged on Jul 25, 2016 (1 commit)

Conversation

stof
Member

@stof stof commented Jul 25, 2016

| Q             | A
| ------------- | ---
| Branch?       | 2.8
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #12298
| License       | MIT
| Doc PR        | n/a

With this change, subcrawlers inherit every namespace the parent crawler has already discovered or registered, which improves performance when using namespaces.

I submitted this to 2.8 rather than 2.7 because the namespace mapping feature was actually buggy in 2.x: nodes could belong to different documents in the same Crawler while the namespace map was shared. The fact that the map was not inherited in subcrawlers mitigated this issue (by reducing the chances of having multiple documents in the same subcrawler). 2.8 deprecated the possibility of having multiple documents, so I'm fine with applying this here.

Note that the subcrawler inherits the namespace cache at the time it is created, but the cache is not shared between instances (so if a subcrawler discovers an additional namespace of the document, it will not be available to the parent crawler or to other subcrawlers of the parent). Sharing the cache would be totally possible (as they share the same document anyway) and would make the experience even better (removing the need to ensure that the root crawler discovers namespaces before filtering). But it would require moving from an array to an object. I'm not sure we want to do this in a patch release. What do you think @symfony/deciders ?
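To make the difference between "inherited at creation" and "shared" concrete, here is a minimal sketch. It does not use the real `Symfony\Component\DomCrawler\Crawler`; `SnapshotCrawler`, `SharedCrawler`, `knows()` and the public `createSubCrawler()` are hypothetical names chosen for illustration, and the namespace URIs are just sample values. The only thing it relies on is that PHP arrays are copied by value while objects are passed by handle.

```php
<?php
// Hypothetical model of the behaviour described above, NOT the real Crawler class.

final class SnapshotCrawler
{
    /** @var array<string, string> prefix => namespace URI */
    private array $namespaces = [];

    public function registerNamespace(string $prefix, string $uri): void
    {
        $this->namespaces[$prefix] = $uri;
    }

    public function knows(string $prefix): bool
    {
        return isset($this->namespaces[$prefix]);
    }

    public function createSubCrawler(): self
    {
        $sub = new self();
        // Arrays have value semantics: the subcrawler gets a snapshot of the
        // parent's cache as it is right now, not a live view of it.
        $sub->namespaces = $this->namespaces;

        return $sub;
    }
}

final class SharedCrawler
{
    private \ArrayObject $namespaces;

    public function __construct(?\ArrayObject $namespaces = null)
    {
        $this->namespaces = $namespaces ?? new \ArrayObject();
    }

    public function registerNamespace(string $prefix, string $uri): void
    {
        $this->namespaces[$prefix] = $uri;
    }

    public function knows(string $prefix): bool
    {
        return isset($this->namespaces[$prefix]);
    }

    public function createSubCrawler(): self
    {
        // Objects are passed by handle: parent and child use the same cache.
        return new self($this->namespaces);
    }
}

// Array-based cache: inherited once, then independent.
$root = new SnapshotCrawler();
$root->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
$sub = $root->createSubCrawler();
$sub->registerNamespace('media', 'http://search.yahoo.com/mrss/');
$sibling = $root->createSubCrawler();

var_dump($sub->knows('atom'));      // bool(true): inherited from the parent
var_dump($root->knows('media'));    // bool(false): not propagated upwards
var_dump($sibling->knows('media')); // bool(false): not propagated sideways

// Object-based cache: one map visible to every crawler of the document.
$sharedRoot = new SharedCrawler();
$sharedSub = $sharedRoot->createSubCrawler();
$sharedSub->registerNamespace('media', 'http://search.yahoo.com/mrss/');

var_dump($sharedRoot->knows('media')); // bool(true)
```

The first half mirrors what this PR does (copy on creation); the second half is the "array to object" alternative raised above, which changes who can observe a newly discovered namespace.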

@stof
Member Author

stof commented Jul 25, 2016

See #12298 (comment) for the blackfire comparison

@stof
Member Author

stof commented Jul 25, 2016

I missed this when I optimized the component in the past because I was working on queries without namespaces.

@xabbuh
Member

xabbuh commented Jul 25, 2016

👍

Status: Reviewed

@nicolas-grekas
Member

Thank you @stof.

@nicolas-grekas nicolas-grekas merged commit e89c758 into symfony:2.8 Jul 25, 2016
nicolas-grekas added a commit that referenced this pull request Jul 25, 2016
…tof)

This PR was merged into the 2.8 branch.

Commits
-------

e89c758 [DomCrawler] Inherit the namespace cache in subcrawlers
@stof stof deleted the share_domcrawler_namespace_cache branch July 25, 2016 16:30