Skip to content

[Finder] Handle filtering of recursive iterators and use it to skip looping over excluded directories #15802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

nicolas-grekas
Copy link
Member

Q A
Bug fix? yes
New feature? yes
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets #5951, #8685
License MIT
Doc PR -

By implementing RecursiveIterator in FilterIterator, we can make it able to skip children of excluded branches of recursive inner iterators.
We use it immediately for our main target: skip over children of excluded directories in the Finder.
This is a significant performance boost when iterating over big directories, thus the "bugfix" status.

@jakzal
Copy link
Contributor

jakzal commented Sep 15, 2015

Would it fix #5951 completely?

@fabpot
Copy link
Member

fabpot commented Sep 15, 2015

So, as a bug fix, it's going to be merged into 2.3 instead.

@fabpot
Copy link
Member

fabpot commented Sep 15, 2015

Work great! 50% faster for the following code (from the Symfony root dir):

require_once __DIR__.'/vendor/autoload.php';

use Symfony\Component\Finder\Finder;

$finder = Finder::create();

$ct = 1;
foreach ($finder->in(__DIR__) as $file) {
    ++$ct;
}
print "$ct\n";

And here is the result from Blackfire:

  Total time:   985 ms    (-1.16 s)
    CPU time:   924 ms    (-982 ms)
         I/O:  61.1 ms    (-174 ms)
      Memory:     1 MB  (+ 5.41 KB)
     Network:      n/a

Visually, it confirms that we are doing a better job at limiting the number of calls:

https://blackfire.io/profiles/compare/f45a1e98-0ab2-4f31-99bf-621da8a329fb/graph

Note: For the record, the speed improvement for such a simple Finder command comes from the fact that we are excluding the VCS directories by default (which adds about 9 exclude directory constraints to the Finder).

@fabpot fabpot mentioned this pull request Sep 15, 2015
2 tasks
@stof
Copy link
Member

stof commented Sep 15, 2015

It would be great to run benchmarks of the PHP adapter vs find-based adapters again with this change applied, to see how much better they behave ( @jfsimon had such benchmark when working on the component).

@nicolas-grekas another possible improvement could be to split the DepthRangeFilterIterator in 2:

  • a min-depth filter which would applied after flattening the recursive iterator as today (it is necessary as a recursive filter would not be able to exclude a node while keeping its children)
  • a max-depth filter which would be applied before flattening the recursive iterator, which would avoid looking at children deeper than the limit.

@@ -18,9 +18,24 @@
*
* @author Alex Bogomazov
*/
abstract class FilterIterator extends \FilterIterator
abstract class FilterIterator extends \FilterIterator implements \RecursiveIterator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure implementing RecursiveIterator on all out filters is a good idea. Most of our filters are required to be applied on a flattened iterator to work properly.

What about using a child class RecursiveFilterIterator which would become the parent of the filters applied in a recursive way ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would also avoid the need to memoize the return value in all other filters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary: implementing inner recursive iterator is just a new feature of the class, but it's opt-in. If you don't use it as a recursive iterator (which nobody does since it didn't implement the interface) then its the same as before. Now, if one uses it as a recursive iterator, then it works. I see no need for more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value is not memoized in all other filters, but in filters where having a recursive inner iterator could make sense. The filter on the contents of a file for example doesn't memoize.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas-grekas the issue is that this makes it harder to distinguish filters which can be applied in a recursive way and filters which simply cannot be applied this way because it would change their behavior.

For instance, the PathFilterIterator should never be applied as a recursive iterator: it would mean that rejecting a folder would also reject all files in it (because of the behavior of RecursiveIteratorIterator which does not look for children of excluded elements).
Implementing RecursiveIterator on these filter iterators would be misleading for people looking at them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memoization removed, this should fix your concern

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memoization was not the culprit (it was simply useless). The issue is marking filter iterators as being RecursiveIterator while using them in a recursive way provides a broken behavior (it will exclude too much things)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not broken at all, it's just a new behavior that didn't exist before and can be changed easily by wrapping the injected recursive iterator in a RecursiveIteratorIterator. I see no issue here. To get the new behavior is really opt-in: inject a recursive iterator, and use this one as a recursive iterator also. I wouldn't say anyone will do this by mistake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolas-grekas but having an iterator configured to exclude /Tests$/ and having it exclude .../Tests/foo.php would be confusing.
The filtering iterators are meant to implement the filters provided in the finder component, and implementing RecursiveIterator should be done only on filters which are actually providing their expected behavior in a recursive filtering IMO. Otherwise it would be too easy for someone to break the behavior in the future by moving filters around and getting one before the RecursiveIteratorIterator by mistake

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if that's exactly what I want, excluding /Tests$/ and having it exclude .../Tests/foo.php?
This gives more power to the user, and we wouldn't force what can or can't be done with the iterators.
More power means more way to get things wrong of course... and I see no issue with that.

@nicolas-grekas
Copy link
Member Author

@stof I'm checking-out the benchmark suite.
I thought the same for depth limiting, but there is nothing to fix: the max depth limiting is implemented natively by RecursiveIteratorIterator, which knows about RecursiveIterator and already skips children correctly.

@nicolas-grekas
Copy link
Member Author

Benched with https://github.com/jfsimon/symfony-finder-benchmark:

before:

   1010 files
    110 directories
     10 iterations
      6 cases
      2 adapters

            case             php        gnu_find
               0         134,536          32,502
               1         139,751          32,799
               2         146,635          55,269
               3         165,655         108,679
               4         153,492          26,968
               5         181,146          34,751

      0 Find files by name containing a*
      1 Find files by name containing a* not containing *a
      2 Find files by name containing ~^a.*~ not containing ~.*a$~
      3 Find files by path containing a*
      4 Find files by path containing a* not containing *a
      5 Find files by path containing ~^a.*~ not containing ~.*a$~

after:

   1010 files
    110 directories
     10 iterations
      6 cases
      2 adapters

            case             php        gnu_find
               0          77,216          34,788
               1          81,621          36,329
               2          80,585          56,178
               3          85,982          70,764
               4          91,257          19,029
               5          91,729          38,958

      0 Find files by name containing a*
      1 Find files by name containing a* not containing *a
      2 Find files by name containing ~^a.*~ not containing ~.*a$~
      3 Find files by path containing a*
      4 Find files by path containing a* not containing *a
      5 Find files by path containing ~^a.*~ not containing ~.*a$~

@nicolas-grekas nicolas-grekas force-pushed the skip-excluded-dirs branch 2 times, most recently from 84b2512 to 07e101f Compare September 15, 2015 09:35
@stof
Copy link
Member

stof commented Sep 15, 2015

I thought the same for depth limiting, but there is nothing to fix: the max depth limiting is implemented natively by RecursiveIteratorIterator, which knows about RecursiveIterator and already skips children correctly.

right. I forgot about the way the DepthRangeFilterIterator is implemented internally

$path = $this->isDir() ? $this->current()->getRelativePathname() : $this->current()->getRelativePath();
$path = $this->current();

if (isset($this->patterns[$path->getFilename()])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks wrong for files rather than folders

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right! fixed, memoization removed also!

@nicolas-grekas nicolas-grekas force-pushed the skip-excluded-dirs branch 2 times, most recently from 0b2996e to 8ee45b2 Compare September 15, 2015 10:11
@fabpot
Copy link
Member

fabpot commented Sep 15, 2015

Thank you @nicolas-grekas.

fabpot added a commit that referenced this pull request Sep 15, 2015
…t to skip looping over excluded directories (nicolas-grekas)

This PR was submitted for the 2.8 branch but it was merged into the 2.3 branch instead (closes #15802).

Discussion
----------

[Finder] Handle filtering of recursive iterators and use it to skip looping over excluded directories

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #5951, #8685
| License       | MIT
| Doc PR        | -

By implementing RecursiveIterator in FilterIterator, we can make it able to skip children of excluded branches of recursive inner iterators.
We use it immediately for our main target: skip over children of excluded directories in the Finder.
This is a significant performance boost when iterating over big directories, thus the "bugfix" status.

Commits
-------

8c691bd [Finder] Handle filtering of recursive iterators and use it to skip looping over excluded directories
@stof stof closed this Sep 15, 2015
@nicolas-grekas nicolas-grekas deleted the skip-excluded-dirs branch September 15, 2015 12:35
nicolas-grekas added a commit that referenced this pull request Sep 17, 2015
This PR was merged into the 2.3 branch.

Discussion
----------

[Finder] Fix recursive filter iterator

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #15802
| License       | MIT
| Doc PR        | -

#15802 was broken because the filters where not propagated to children iterators. This fixes the issue and adds a test case for this situation. The RecursiveIterator implementation is moved from FilterIterator to ExcludeDirectoryFilterIterator. Doing so on other filters is possible but would be a new feature that is not required for fixing the performance issue we had previously.

Commits
-------

cf3019b [Finder] Fix recursive filter iterator
fabpot added a commit that referenced this pull request Sep 18, 2015
This PR was merged into the 2.3 branch.

Discussion
----------

[Finder] Optimize the hot-path

| Q             | A
| ------------- | ---
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #15824
| License       | MIT
| Doc PR        | -

A significant part of the perf gain in #15802 was related to filters not being applied recursively...
#15824 fixing this, performance dropped again.

This PR optimizes the hot path by replacing a regexp test by a simple `isset` when possible.
Blackfire diff after #15824 is the following:
https://blackfire.io/profiles/compare/9e489018-998d-4acb-92a0-46011828e83b/graph

`preg_match` is not called anymore, and  `Symfony\Component\Finder\Iterator\RecursiveDirectoryIterator::current()` is also cut by two.

When this `isset` optimization is disabled and replaced by a concatenation of all the regexps patterns in a single bigger one, the gain is still significant but lower:
https://blackfire.io/profiles/compare/db86b80e-b63e-4fc9-9ff3-9ed32baeb948/graph

This makes me think that an other and last round of optimization is possible by merging all regexps in one in MultiplePcreFilterIterator filters. If someone wants to work on this, please do it :)

Commits
-------

f156de6 [Finder] Optimize the hot-path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants