-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-132336: Mark a few "slow path" functions used by the interpreter loop as noinline #132337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These are all the slow path and should not be inlined into the interpreter loop. Unfortunately, they end up being inlined with LTO and the current PGO task.
I'm surprised the coverage benchmark isn't even significantly slower... That's the only benchmark I can think of that could be negatively affected by this, so since it isn't, let's do it. |
|
Buildbot failure is not related to this change, it's been sporadically failing that way for at least a month now. |
See my comments on the issue about what we should inline or not. It should be easy enough to @Yhg1s @mpage |
I will be very happy to adopt this as a general CPython policy, yes.
A 0.9% change is very much in the noise for pyperformance as run by bench_runner on https://github.com/facebookexperimental/free-threading-benchmarking. It's in the noise for unrelated changes that affect the compiler output, and it's even in the noise for A-A tests. We don't accept 1% speedups as real without a lot more vetting, as well as reasoning why it's real, and under what circumstances. Rolling back the change isn't a problem if it turns out the slowdown is real. |
Only for the free-threaded build, where it was 0.3% slower (what I would call neutral).
I don't think that's a fair characterization. I went back through the PRs that I've submitted or have been mentioned on and have consistently treated anything less than 1% as neutral. I've always reported the numbers and said which way I thought things were leaning. In this case it seems like things might be trending slightly negatively, but the most negatively affected benchmarks are also pretty noisy (at least based on my experience), which is why I said that I thought it was worth keeping the builds in sync. I try hard to be a good steward of perf for the default build: I always run benchmark runs for both builds for any changes that might affect perf and try to leave PRs open for long enough to allow someone from the faster-cpython team to have a look at them before they are merged (at times waiting months). I also specifically wanted to give you a chance to comment, which is why I said in the PR summary:
Sorry you didn't have a chance to weigh in before this was merged. I'm happy to put up a PR with something like |
…eter loop as noinline (python#132337) Mark a few functions used by the interpreter loop as noinline These are all the slow path and should not be inlined into the interpreter loop. Unfortunately, they end up being inlined with LTO and the current PGO task.
These are all on the slow path and probably shouldn't be inlined into the interpreter loop.
Performance improves by ~3.6% on the free-threaded build.
The default build reports a ~0.9% regression, but I suspect that's mostly noise. We can restrict this change to only the free-threaded build (where it's clearly a win) if folks are concerned, but I think it's probably worth keeping the builds in sync.