Skip to content

gh-74598: add fnmatch.filterfalse for excluding names #121185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Apr 8, 2025

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Jun 30, 2024

In this implementation, I did not use a lambda function as it was proposed in the original patch nor did I use auxiliary functions.

EDIT 07/04/2025: I kept the use of filterfalse as on a PGO build the loop variant was much slower when processing past a certain number of files. For < 10 files, the numbers are roughly the same and in the range of nanoseconds so we don't really care.

# 6 files to test

small_match_none_loop: Mean +- std dev: 628 ns +- 9 ns
small_match_none_iter: Mean +- std dev: 622 ns +- 14 ns

small_match_some_loop: Mean +- std dev: 675 ns +- 12 ns
small_match_some_iter: Mean +- std dev: 688 ns +- 12 ns

small_match_digits_loop: Mean +- std dev: 962 ns +- 15 ns
small_match_digits_iter: Mean +- std dev: 987 ns +- 12 ns

small_match_all_loop: Mean +- std dev: 700 ns +- 7 ns
small_match_all_iter: Mean +- std dev: 765 ns +- 19 ns

# 60 files to test

medium_match_none_loop: Mean +- std dev: 4.75 us +- 0.10 us
medium_match_none_iter: Mean +- std dev: 3.85 us +- 0.07 us

medium_match_some_loop: Mean +- std dev: 5.08 us +- 0.07 us
medium_match_some_iter: Mean +- std dev: 4.23 us +- 0.07 us

medium_match_digits_loop: Mean +- std dev: 7.93 us +- 0.15 us
medium_match_digits_iter: Mean +- std dev: 7.12 us +- 0.14 us

medium_match_all_loop: Mean +- std dev: 5.35 us +- 0.12 us
medium_match_all_iter: Mean +- std dev: 4.66 us +- 0.10 us

# 600 files to test

large_match_none_loop: Mean +- std dev: 43.9 us +- 0.9 us
large_match_none_iter: Mean +- std dev: 33.9 us +- 0.7 us

large_match_some_loop: Mean +- std dev: 47.4 us +- 1.0 us
large_match_some_iter: Mean +- std dev: 36.4 us +- 0.7 us

large_match_digits_loop: Mean +- std dev: 75.4 us +- 1.5 us
large_match_digits_iter: Mean +- std dev: 64.6 us +- 1.2 us

large_match_all_loop: Mean +- std dev: 48.2 us +- 0.9 us
large_match_all_iter: Mean +- std dev: 39.4 us +- 0.7 us
Benchmark script
import itertools
import os
import posixpath
from fnmatch import _compile_pattern

import pyperf

def filterfalse_loop(names, pat):
    result = []
    pat = os.path.normcase(pat)
    match = _compile_pattern(pat)
    if os.path is posixpath:
        # normcase on posix is NOP. Optimize it away from the loop.
        for name in names:
            if match(name) is None:
                result.append(name)
    else:
        for name in names:
            if match(os.path.normcase(name)) is None:
                result.append(name)
    return result

def filterfalse_iter(names, pat):
    pat = os.path.normcase(pat)
    match = _compile_pattern(pat)
    if os.path is posixpath:
        # normcase on posix is NOP. Optimize it away from the loop.
        return list(itertools.filterfalse(match, names))

    result = []
    for name in names:
        if match(os.path.normcase(name)) is None:
            result.append(name)
    return result

runner = pyperf.Runner()
runner.parse_args()

base_data = ['Python', 'Ruby', 'Perl', 'Tcl',
             '123456789_path_with_digits.json',
             '_path_with_digits_123456789.json']
for data_label, data_size in [('small', 1), ('medium', 10), ('large', 100)]:
    for pat_label, pattern in [('none', 'A*'), ('some', 'P*'), ('digits', '*[1-9]*'), ('all', '*')]:
        data = base_data * data_size
        runner.bench_func(f'{data_label}_match_{pat_label}_loop', filterfalse_loop, data, pattern)
        runner.bench_func(f'{data_label}_match_{pat_label}_iter', filterfalse_iter, data, pattern)

📚 Documentation preview 📚: https://cpython-previews--121185.org.readthedocs.build/

@picnixz picnixz changed the title gh-74598: add fnmatch.filterfalse for excluding patterns gh-74598: add fnmatch.filterfalse for excluding names Jun 30, 2024
@picnixz picnixz added the stdlib Python modules in the Lib dir label Aug 1, 2024
@picnixz picnixz requested a review from barneygale December 15, 2024 14:17
@picnixz picnixz requested a review from serhiy-storchaka March 1, 2025 11:58
@picnixz
Copy link
Member Author

picnixz commented Mar 1, 2025

FT failure is known so I'll wait until the fix (#130724) is merged.

@picnixz picnixz added the stale Stale PR or inactive for long period of time. label Mar 18, 2025
@picnixz picnixz marked this pull request as draft April 7, 2025 07:35
@picnixz picnixz marked this pull request as ready for review April 7, 2025 07:36
@picnixz picnixz removed the stale Stale PR or inactive for long period of time. label Apr 7, 2025
@picnixz
Copy link
Member Author

picnixz commented Apr 7, 2025

I'm going to run some benchmarks again and then I'll merge this one. The feature can still be useful and maintenance is not that hard. If, however, benchmarks are not good, I'll write a recipe instead.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have other plans for fnmatch, for supporting complex filters with multiple positive and negative patterns, this is why I was not particularly interested in this feature. But I don't think that adding filterfalse() is wrong. It will not conflict with future features, and they will not make it obsolete.

LGTM. But please add more tests for filterfalse() similar to other filter() tests in FilterTestCase. The code is completely separate, so we need to test all this.

@picnixz
Copy link
Member Author

picnixz commented Apr 7, 2025

But please add more tests for filterfalse() similar to other filter() tests in FilterTestCase

Will do!

@picnixz picnixz enabled auto-merge (squash) April 8, 2025 09:48
@picnixz picnixz merged commit 3eda146 into python:main Apr 8, 2025
39 checks passed
@picnixz picnixz deleted the fnmatch-filterfalse branch April 8, 2025 10:30
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants