Skip to content

[mypyc] feat: ForFilter generator helper for builtins.filter #19643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

BobTheBuidler
Copy link
Contributor

@BobTheBuidler BobTheBuidler commented Aug 12, 2025

This PR adds a ForFilter helper class for for loops over builtins.filter which allows mypyc to maintain control of the function calling, so it can call native functions and use primitive ops, and to optimize filter's boolean check based on func's return value type

@BobTheBuidler
Copy link
Contributor Author

Just realized I can extend this class to make it work for both builtins.filter and itertools.filterfalse with only 4 more LOC

I wouldn't think to make a primitive for iteration solely over itertools.filterfalse, but considering it would share the same object and only require one more boolean attribute, I think it makes sense to support here. What do you guys think?

@BobTheBuidler BobTheBuidler changed the title [mypyc] feat: ForFilter generator helper for builtins.filter [mypyc] feat: ForFilter generator helper for builtins.filter Aug 12, 2025
@BobTheBuidler
Copy link
Contributor Author

I went ahead and implemented the above on a new branch here, we can merge it into this branch if we decide its worth special casing. I'd say it probably is, considering the code change is quite minimal change once we already have ForFilter defined.

@BobTheBuidler
Copy link
Contributor Author

I don't yet think the failing docs workflow is related to this PR, given that I only added a single bullet point to the native_operations.rst, though can't say for sure

@JukkaL
Copy link
Collaborator

JukkaL commented Aug 15, 2025

I wouldn't think to make a primitive for iteration solely over itertools.filterfalse, but considering it would share the same object and only require one more boolean attribute, I think it makes sense to support here. What do you guys think?

filterfalse seems to be used quite rarely (on the order of 100x less often than filter, based on a large codebase I have access to), so I think it's not worth supporting it at this time. My general guidance is to avoid itertools in any performance-sensitive code, so not having any support any of these makes it easier to teach users what to avoid. Otherwise users would have to remember which functions in itertools have optimized support.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, filter is a fairly common builtin function, so specializing it is useful.

@@ -54,3 +54,5 @@ These variants of statements have custom implementations:
* ``for ... in seq:`` (for loop over a sequence)
* ``for ... in enumerate(...):``
* ``for ... in zip(...):``
* ``for ... in filter(...):``
* ``for ... in itertools.filterfalse(...):``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in my comment, this seems rare enough that it doesn't seem worth it to support it. Every special case adds some overhead to users, as they may try to remember all the stdlib features with optimized primitives.

Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there currently any API for a "mypyc plugin" like a mypy plugin? Or future plans for one?

Could be cool to have a mypyc functools plugin and a mypyc itertools plugin, etc, to cover special casing of things that don't necessarily belong in the main mypy repo

something like iterools.accumulate would see quite large performance increases if we could handle the math operations in C. (this is something I wanted/needed for my own libs and was driving towards with these generator helpers but I'll just fork mypy for now to implement)

but to compile optimized C code for other fun things like functools.lru_cache and functools.cached_property would be really really cool (and fast) (and useful (in niche cases)) if there was a way it could be done without polluting the main repo

Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though I do think some functools special casing might still be appropriate in the main repo, if not now as a longer term goal. even though that isn't the current way things are done, it seems like we've already started in that direction (see the singledispatch feature set)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we just implement but don't document this, so as to add no overhead to users?

fake_call_expr = CallExpr(self.filter_func_def, [self.index], [ARG_POS], [None])

# I put this here to prevent a circular import
from mypyc.irbuild.expression import transform_call_expr
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my comment in the map PR, you can use builder.accept to avoid the cyclic dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting! I commented this out and replaced it with builder.accept, and it broke the IR! Check out the test failures.

Once you've seen them I'll revert this to its original state so we're not boxing and unboxing unnecessarily. I'll add a TODO comment to debug the builder.accept issue and then replace this hacky import with builder.accept when its ready

L7:
return s

[case testForFilterStr]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems specialized enough to move to a run test (see my comments in other PRs about our conventions related to irbuild tests).

print(f([5, 15, 25]))
print(f([]))

[out]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please merge all the tests under a single [case ...], leave out driver.py, use assertions instead of [out] and have test cases in def test_... functions. (See my similar comment in the map PR for the motivation.)

@@ -54,3 +54,5 @@ These variants of statements have custom implementations:
* ``for ... in seq:`` (for loop over a sequence)
* ``for ... in enumerate(...):``
* ``for ... in zip(...):``
* ``for ... in filter(...):``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered supporting list(filter(...)) as well -- this seems quite common (in a follow-up PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I actually have that drafted already. but it won't be a special case for list(filter(...)) it will be a special case for [list|tuple|set](some_builtin_we_have_a_helper_for_in_for_helpers(...)) which will account for any builtin we have ForGenerator helpers for

Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw this was part of the intent behind the list-built-from-range tests

I wasn't actually testing that we can build a list from a range, I was preparing IR to reflect how this helper would change the C implementation. Will work for map, filter, range, zip, enumerate, and future ops with special-case gen helpers

@BobTheBuidler
Copy link
Contributor Author

Thanks for the feedbacks. I have a few busy days ahead of me so I'll have to table these for now, but can try to get them all fixed up next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants