-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-40137: Micro-optimize _PyType_GetModuleByDef() #25393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Honestly, the runtime check on PyModule_GetState() argument could replaced with an assertion. But I prefer to not change in as part of this PR. See https://bugs.python.org/issue37406 for my previous attempt: "Disable runtime checks in release mode". But _PyModule_GetState() remains useful to ensure that the code can be inlined, especially on macOS which doesn't use LTO: https://bugs.python.org/issue41181 |
Make _PyType_GetModuleByDef() 2.3 ns faster. Benchmark: 43.2 ns +- 0.7 ns -> 40.9 ns +- 1.0 ns: 1.05x faster ./python -m pyperf timeit \ --duplicate=4096 -s "from functools import lru_cache; f = lru_cache(lambda: 42)" \ "f()" Changes: * _PyType_GetModuleByDef(): add fast-path for tp_mro[0] and use _PyType_HasFeature(). * Add a new pycore_moduleobject.h internal C API header file. * Add _PyModule_GetDef() and _PyModule_GetState() which can be inlined without LTO and don't check invalid argument at runtime. * Replace PyModule_GetState() with _PyModule_GetState() in _abc, _array, _operator, _pickle, _queue, _random, _struct and os extension modules. The _array, _queue and _struct extensions are now built with Py_BUILD_CORE_MODULE macro defined.
This looks good to my eyes but I haven't deeply thought about what the slow path is doing and what cases it handling. So, this is a 95% sign-off. |
🤖 New build scheduled with the buildbot fleet by @rhettinger for commit 158c102 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
Hum. I'm not 100% sure about the assumptions that I made in my optimization. I plan to write first preparation changes to make the assumptions reliable :-) |
Also, PR #25492 makes this optimization less important. |
This is still important. I mitigated only one case. If this PR isn't applied, I think all the other cases should be reverted. We're making everyone pay a cost and it looks like there is currently zero offsetting benefit. It is not clear that these ever should have gone in the first place. |
I splitted this PR into 3 smaller PRs with a few minor fixes: https://bugs.python.org/issue40137#msg391547 The main change is that I added a runtime check to ensure that a type MRO cannot be empty. I close this PR. |
Make _PyType_GetModuleByDef() 2.3 ns faster.
Benchmark: 43.2 ns +- 0.7 ns -> 40.9 ns +- 1.0 ns: 1.05x faster
./python -m pyperf timeit
--duplicate=4096
-s "from functools import lru_cache; f = lru_cache(lambda: 42)"
"f()"
Changes:
_PyType_HasFeature().
inlined without LTO and don't check invalid argument at runtime.
_array, _operator, _pickle, _queue, _random, _struct and os
extension modules. The _array, _queue and _struct extensions are
now built with Py_BUILD_CORE_MODULE macro defined.
https://bugs.python.org/issue40137