Skip to content

[DomCrawler] Fix memory leak with filter method #11221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jderusse
Copy link
Member

Q A
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #10879
License MIT
Doc PR no

Add a public method destroy on Translator to reset properties and free memory

$crawler->filter('tr > td');
}

$after = memory_get_usage(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to get an accurate estimation of the memory used, you should call gc_collect_cycles before memory_get_usage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but gc_collect_cycles also solve (partialy) the circular reference memory leak. I don't use it, because I want the test prove that there is a issue when GC is not called,

@jderusse jderusse changed the title Fix DomCrawler memory leak with filter method [DomCrawler] Fix memory leak with filter method Jun 24, 2014
@@ -61,8 +61,10 @@ public static function toXPath($cssExpr, $prefix = 'descendant-or-self::')
->registerParserShortcut(new ClassParser())
->registerParserShortcut(new HashParser())
;
$resp = $translator->cssToXPath($cssExpr, $prefix);
$translator->destroy();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An other solution to solve this meomry leak, is to replace this line by a simple.

if (gc_enabled()) {
    gc_collect_cycles();
}

Wich solution do you think is the better ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the garbage collector sounds like a good option when available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relyinbg on the garbage collector would indeed be better (calling destroy on a translator puts it in a non-working state, so I would prefer avoiding to add it in the API)

@@ -85,6 +85,21 @@ public function __construct(ParserInterface $parser = null)
}

/**
* Resets local properties to there initial values
*/
public function destroy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset() sounds better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldnt this just be a destructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean __destructor, the answer is no, because the __destructor is not called due to circular reference.
If you suggest the name destructor, I'm not aggre, because this method does not destruct anything. @fabpot is right, reset is better name.

@apfelbox
Copy link
Contributor

@jeremy-derusse could you add do some benchmarks? I can imagine that invoking the GC every time could be quite expensive.

@jderusse
Copy link
Member Author

@apfelbox In this article http://alexatnet.com/articles/optimize-php-memory-usage-eliminate-circular-references gc_collect_cycles seems to be as expensive as ->destroy()

In our case

for ($i = 0; $i < 10000; $i++) {
    $crawler = new Crawler($html);
    $crawler->filter('tr > td');
}

with call to ->destroy() => 2.9146590232849
with call to gc_collect_cycles() => 3.0352509021759

@jpauli
Copy link

jpauli commented Jun 25, 2014

gc_collect_cycles() only collects circular references. So it is useless if you don't have circular references. PHP will clean variables when they become out of scope, automatically

@nicolas-grekas
Copy link
Member

gc_collect_cycles() can't really be benchmarked, because it depends on the recent runtime history of the PHP engine. When collecting a simple object graph, it will be fast. But it can grow really expensive with complex structures I believe. It'd prefer letting the PHP engine do this on his own.
What we can and should do is removing circular references IMHO, nothing more.

@stof
Copy link
Member

stof commented Jun 25, 2014

@jpauli the issue here is precisely that the object graph of the CssSelector translator is cyclic

@jderusse
Copy link
Member Author

@jpauli here is the circular reference: https://github.com/symfony/symfony/blob/2.3/src/Symfony/Component/CssSelector/XPath/Translator.php#L79
https://github.com/symfony/symfony/blob/2.3/src/Symfony/Component/CssSelector/XPath/Extension/NodeExtension.php#L50

Thank you @nicolas-grekas, I miss this point in my benchmark. Regarding those arguments, it's not a good idea to use gc_collect_cycles, I'll just rename destroy into reset as suggested by @fabpot

@stof
Copy link
Member

stof commented Jun 25, 2014

@jeremy-derusse exposing reset() in the translator is not a good diea either, because it puts it in a non-working state. The right fix would be to refactor the logic to eliminate the circular reference. I have an idea for this but I don't have time to implement it just now. I will probably have to wait until tomorrow evening at least

@stof
Copy link
Member

stof commented Jun 26, 2014

see #11242 for my own attempt at fixing it, without altering the external API of the Translator and without putting it in an invalid state

@jderusse jderusse closed this Jun 27, 2014
fabpot added a commit that referenced this pull request Jun 27, 2014
…cular object graph (stof)

This PR was merged into the 2.3 branch.

Discussion
----------

[CssSelector] Refactored the CssSelector to remove the circular object graph

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #10879, replaces  #11221
| License       | MIT
| Doc PR        | n/a

This allows the translator and its extensions to be garbage collected based on the refcount rather than requiring the garbage collector run, making it much more likely to happen at the end of the ``CssSelector::toXPath`` call.

Node translators now receive the Translator as second argument, instead of requiring to inject it in the extension to keep a reference to it. This way, the Translator is referenced nowhere inside it, only by the caller, and so will be destructed at the end of the usage (and extensions will then be destructed after it when not used anymore).

Commits
-------

994f81f Refactored the CssSelector to remove the circular object graph
@jderusse jderusse deleted the domcrawler-memory-leak branch May 1, 2017 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants