
[mypyc] feat: extend stararg fastpath from #19623 to handle lists and generic sequences #19629


Merged: 11 commits, Aug 18, 2025

Conversation

@BobTheBuidler (Contributor) commented Aug 9, 2025

This PR extends #19623 with additional logic for handling non-tuple starred arguments.

Now the fast path applies to any arbitrary sequence, not just tuples.

I opted to keep this PR separate from #19623 so that both stay smaller and easier to review, and to declutter the changes in the IR.

This PR is ready for review (once #19623 is reviewed and merged).
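To illustrate the kind of call the change targets (a hypothetical sketch; the function and variable names here are made up, not taken from the PR): a starred argument that is a list or other sequence, rather than a tuple, can now also take the fast path.

```python
# Hypothetical example of star-arg calls: previously only the tuple case
# was eligible for the fast path; this PR extends it to lists and other
# sequences.
def add(a: int, b: int, c: int) -> int:
    return a + b + c

args_tuple = (1, 2, 3)   # tuple case, handled by #19623
args_list = [4, 5, 6]    # list case, newly handled by this PR

print(add(*args_tuple))  # prints 6
print(add(*args_list))   # prints 15
```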

@BobTheBuidler (Contributor, Author) commented:

I pulled in the latest changes from master; the diff is now cleaned up and this PR is ready for review.

@JukkaL (Collaborator) left a comment:


Thanks! Looks good, just a few comments about tests.

Have you tried to use a microbenchmark to measure the performance impact? This looks obviously beneficial, since fewer objects are created, but it would also be nice to get some idea about the % impact.

```
@@ -3546,3 +3540,266 @@ L0:
    r2 = PyObject_Vectorcall(r1, 0, 0, 0)
    r3 = box(None, 1)
    return r3

[case testStarArgFastPathTuple]
```
A Collaborator commented on this diff:

Can you add run tests corresponding to these irbuild test cases? Please look for existing run tests where you can add new def test_foo functions, or add a new test case that has no driver and multiple def test_foo functions corresponding to the individual irbuild cases. Each top-level run test case has significant per-test-case overhead, so grouping them is preferable.
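For context, the suggestion above can be sketched roughly as follows (a hypothetical test-data fragment; the case name and function bodies are invented for illustration, not taken from the PR): in a mypyc run test case with no driver, every top-level def test_foo function is invoked by the default driver, so several related checks can share one case.

```
[case testStarArgFastPathRun]
def call(*args: int) -> int:
    return sum(args)

def test_star_tuple() -> None:
    assert call(*(1, 2, 3)) == 6

def test_star_list() -> None:
    assert call(*[1, 2, 3]) == 6
```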

@BobTheBuidler (Contributor, Author) commented Aug 16, 2025

Have you tried to use a microbenchmark to measure the performance impact?

No, I did not. I typically only do real benchmarking when it's ambiguous whether a change is truly beneficial. When it's clearly beneficial but the magnitude is unclear, I usually only benchmark if I'm unusually curious about the result. Is that a blocker for this one?

@JukkaL (Collaborator) commented Aug 18, 2025

Is that a blocker for this one?

No. I ran a quick micro-benchmark just to get a general idea of whether the performance impact is likely to be visible. In this case performance improved by 1.6x, so it looks good. (We've had cases where an apparent optimization actually made code slower, so it's a good idea to do at least minimal benchmarking if there are any doubts about the impact.)

Many optimizations aren't visible in (most) larger-scale benchmarks: the noise floor tends to be 0.5% or more, so any improvement below that can't be easily measured.
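A micro-benchmark of the kind discussed above can be as small as a timeit comparison (a minimal sketch with a hypothetical workload; this is not the benchmark JukkaL ran, and in interpreted CPython the ratio will differ from compiled mypyc code):

```python
# Minimal micro-benchmark sketch: time a starred call with a list against
# an equivalent call with explicit positional arguments.
import timeit

def add(a: int, b: int, c: int) -> int:
    return a + b + c

args = [1, 2, 3]

star = timeit.timeit(lambda: add(*args), number=100_000)
direct = timeit.timeit(lambda: add(1, 2, 3), number=100_000)
print(f"star: {star:.4f}s  direct: {direct:.4f}s  ratio: {star / direct:.2f}x")
```

Running both variants under the compiled extension before and after the change gives the per-call impact without the noise floor of a large-scale benchmark.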

@JukkaL (Collaborator) left a comment:


Thanks, this looks like a solid improvement. It improves performance in some use cases where compiled code could previously be slower than interpreted Python, which is particularly nice.

@JukkaL JukkaL merged commit 91487cb into python:master Aug 18, 2025
13 checks passed