Able to load big xml files with DomCrawler #16873
zorn-v commented on Dec 7, 2015
| Q | A |
|---|---|
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | yes |
| Fixed tickets | |
| License | MIT |
| Doc PR | |
@@ -230,7 +230,7 @@ public function addXmlContent($content, $charset = 'UTF-8')
 $dom->validateOnParse = true;

 if ('' !== trim($content)) {
-    @$dom->loadXML($content, LIBXML_NONET);
+    @$dom->loadXML($content, LIBXML_NONET | LIBXML_PARSEHUGE);
Does this option have any drawbacks when parsing non-huge documents?
The manual says it only relaxes the parser's hardcoded limits:
https://secure.php.net/manual/en/libxml.constants.php
It is only available with libxml >= 2.7.0, but I don't know whether older versions are still widespread.
For example, CentOS 6 ships 2.7.6.
Maybe we need something like:
LIBXML_NONET | (defined('LIBXML_PARSEHUGE') ? LIBXML_PARSEHUGE : 0)
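The conditional flag suggested above can be dropped into a loader roughly as follows. This is a minimal sketch assuming a standalone script; `xmlLoadFlags()` is a hypothetical helper, not part of DomCrawler.

```php
<?php
// Hypothetical helper: always use LIBXML_NONET, and add LIBXML_PARSEHUGE only
// when the running PHP build actually registers that constant.
function xmlLoadFlags(): int
{
    $flags = LIBXML_NONET; // disable network access while loading
    if (defined('LIBXML_PARSEHUGE')) {
        $flags |= LIBXML_PARSEHUGE; // relax libxml2's hardcoded parser limits
    }
    return $flags;
}

$dom = new DOMDocument('1.0', 'UTF-8');
$ok = @$dom->loadXML('<root><item>hello</item></root>', xmlLoadFlags());
var_dump($ok);
```

Falling back to `0` (here: simply not OR-ing the flag in) keeps the call valid on builds where the constant is missing, at the cost of silently keeping the default limits there.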
This constant is defined by the PHP libxml extension and has been available since PHP 5.2.12 and 5.3.2, both of which are below DomCrawler's minimum requirement.
@zorn-v is it always defined, or does it depend on the libxml version being used? Distributions generally compile PHP against the system libxml rather than the version bundled with PHP, so the libxml version may differ from one installation to another.
You are right. In PHP 5.3.9, ext\libxml\libxml.c contains:
#if LIBXML_VERSION >= 20703
REGISTER_LONG_CONSTANT("LIBXML_PARSEHUGE", XML_PARSE_HUGE, CONST_CS | CONST_PERSISTENT);
#endif
So the minimum libxml version is actually 2.7.3, not 2.7.0.
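To see which side of that `#if` guard a given PHP build falls on, a script can compare the `LIBXML_VERSION` constant (the libxml2 version PHP was compiled against, encoded as an integer) with 20703. This is a hedged sketch; `libxmlVersionCode()` is a hypothetical helper, and checking `defined('LIBXML_PARSEHUGE')` directly remains the simpler test.

```php
<?php
// Hypothetical helper: convert a dotted libxml2 version such as "2.7.3" into
// the integer form used by LIBXML_VERSION (major * 10000 + minor * 100 + patch).
function libxmlVersionCode(string $dotted): int
{
    // Pad missing components with 0 so "2.9" is treated as "2.9.0".
    $parts = array_map('intval', explode('.', $dotted) + [0, 0, 0]);
    return $parts[0] * 10000 + $parts[1] * 100 + $parts[2];
}

// In PHP 5.3.9, LIBXML_PARSEHUGE is registered only when LIBXML_VERSION >= 20703.
$threshold = libxmlVersionCode('2.7.3');
$hasParseHuge = LIBXML_VERSION >= $threshold;

printf(
    "Compiled against libxml2 %s (%d); LIBXML_PARSEHUGE %s\n",
    LIBXML_DOTTED_VERSION,
    LIBXML_VERSION,
    $hasParseHuge ? 'should be available' : 'is not registered'
);
```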
The only distribution I can find with libxml2 < 2.7.3 is CentOS 5, but its standard repository ships PHP 5.1.6.
Even Debian 6 has 2.7.8.
I think there is no point in that check, but I added it just in case.
Libxml introduced XML_MAX_TEXT_LENGTH in …
We recently updated the library to …
Please also mention http://symfony.com/blog/security-release-symfony-2-0-17-released about the …
Thank you @zorn-v.
This PR was submitted for the 2.8 branch but it was merged into the 2.3 branch instead (closes #16873).
Commits: 3dae825 Able to load big xml files with DomCrawler
I think this should also be applied to the …
…-grekas) This PR was merged into the 2.3 branch.

Discussion: [DomCrawler] Dont use LIBXML_PARSEHUGE by default

| Q | A |
|---|---|
| Branch | 2.3 |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | no |
| Fixed tickets | #16873, #17956 |
| License | MIT |
| Doc PR | - |

Because of http://symfony.com/blog/security-release-symfony-2-0-17-released

Commits: fda32f8 [DomCrawler] Dont use LIBXML_PARSEHUGE by default
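Since the follow-up change stops passing `LIBXML_PARSEHUGE` by default, one way to keep huge-document support reachable is to make it an explicit opt-in. The sketch below assumes a standalone function; `loadXmlDocument()` and its `$allowHuge` parameter are hypothetical and are not the API DomCrawler adopted.

```php
<?php
// Hypothetical loader (not Symfony's API): LIBXML_PARSEHUGE is opt-in, so
// libxml2's hardcoded parser limits stay active unless the caller asks to lift them.
function loadXmlDocument(string $content, bool $allowHuge = false): ?DOMDocument
{
    if ('' === trim($content)) {
        return null; // nothing to parse; loadXML() rejects empty input
    }

    $flags = LIBXML_NONET; // disable network access while loading
    if ($allowHuge && defined('LIBXML_PARSEHUGE')) {
        $flags |= LIBXML_PARSEHUGE; // relax size/depth limits only on request
    }

    $dom = new DOMDocument('1.0', 'UTF-8');

    // Collect libxml errors instead of emitting warnings, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $ok = $dom->loadXML($content, $flags);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $dom : null;
}

$default = loadXmlDocument('<root><item>small</item></root>');
$huge = loadXmlDocument('<root><item>big</item></root>', true);
var_dump($default instanceof DOMDocument, $huge instanceof DOMDocument);
```

Defaulting `$allowHuge` to `false` keeps libxml2's built-in limits in force for arbitrary input, which seems the safer choice when the XML may come from an untrusted source.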