-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
Replace a call to PyTuple_New with _PyTuple_FromArraySteal #96516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PyTuple_New will zero out the tuple before returning to the caller, and a surprising amount of time can be saved by not doing this zeroing. One option is to add a non-zeroing version of PyTuple_New, which I did in python#96446, but there was resistance to the unsafety of it. Fortunately it looks like most of the tuple-zeroing happens directly from the BUILD_TUPLE opcode in the interpreter, which already has the arguments in an appropriate array, so we can just convert this to _PyTuple_FromArraySteal This seems to result in a ~0.2% speedup on macrobenchmarks.
This is a great small change, especially if it ends up providing most of the benefit of #96446. I'm running this PR on the Faster CPython team's standard benchmarking machine as well to get another data point (though I may not be able to report back until after the long weekend). |
I measured only a 0.07% speedup on the pyperformance and pyston macrobenchmarks. I'm going to run this (and the baseline) again, because it's surprising it's so little. I don't want to imply that's an argument against this PR (and it's not really up to me anyway). Reproducibility of benchmarks in general, especially when the margins are so small, is an ongoing challenge we're all struggling with. |
Yeah I think that even with some changes to make benchmarking more stable (bolt + longer bolt task) the systematic errors are around this level so I think it's hard to rely on the benchmarking numbers too much. |
Sorry I'm not super familiar with the cpython workflow, is there anything I can do to help progress this through the "awaiting_merge" state? |
This improvement is very small to users. I think we don't need news file for this.
|
thanks! |
PyTuple_New will zero out the tuple before returning to the caller, and a
surprising amount of time can be saved by not doing this zeroing. One option
is to add a non-zeroing version of PyTuple_New, which I did in #96446, but
there was resistance to the unsafety of it.
Fortunately it looks like most of the tuple-zeroing happens directly from the
BUILD_TUPLE opcode in the interpreter, which already has the arguments in an
appropriate array, so we can just convert this to _PyTuple_FromArraySteal
This seems to result in a ~0.2% speedup on macrobenchmarks.