Skip to content

bpo-40137: Micro-optimize _PyType_GetModuleByDef() #25393

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

bpo-40137: Micro-optimize _PyType_GetModuleByDef() #25393

wants to merge 1 commit into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Apr 13, 2021

Make _PyType_GetModuleByDef() 2.3 ns faster.

Benchmark: 43.2 ns +- 0.7 ns -> 40.9 ns +- 1.0 ns: 1.05x faster

./python -m pyperf timeit
--duplicate=4096
-s "from functools import lru_cache; f = lru_cache(lambda: 42)"
"f()"

Changes:

  • _PyType_GetModuleByDef(): add fast-path for tp_mro[0] and use
    _PyType_HasFeature().
  • Add a new pycore_moduleobject.h internal C API header file.
  • Add _PyModule_GetDef() and _PyModule_GetState() which can be
    inlined without LTO and don't check invalid argument at runtime.
  • Replace PyModule_GetState() with _PyModule_GetState() in _abc,
    _array, _operator, _pickle, _queue, _random, _struct and os
    extension modules. The _array, _queue and _struct extensions are
    now built with Py_BUILD_CORE_MODULE macro defined.

https://bugs.python.org/issue40137

@vstinner
Copy link
Member Author

Replace PyModule_GetState() with _PyModule_GetState() in (...) extension modules

Honestly, the runtime check on PyModule_GetState() argument could replaced with an assertion. But I prefer to not change in as part of this PR. See https://bugs.python.org/issue37406 for my previous attempt: "Disable runtime checks in release mode".

But _PyModule_GetState() remains useful to ensure that the code can be inlined, especially on macOS which doesn't use LTO: https://bugs.python.org/issue41181

Make _PyType_GetModuleByDef() 2.3 ns faster.

Benchmark: 43.2 ns +- 0.7 ns -> 40.9 ns +- 1.0 ns: 1.05x faster

./python -m pyperf timeit \
    --duplicate=4096
    -s "from functools import lru_cache; f = lru_cache(lambda: 42)" \
    "f()"

Changes:

* _PyType_GetModuleByDef(): add fast-path for tp_mro[0] and use
  _PyType_HasFeature().
* Add a new pycore_moduleobject.h internal C API header file.
* Add _PyModule_GetDef() and _PyModule_GetState() which can be
  inlined without LTO and don't check invalid argument at runtime.
* Replace PyModule_GetState() with _PyModule_GetState() in _abc,
  _array, _operator, _pickle, _queue, _random, _struct and os
  extension modules. The _array, _queue and _struct extensions are
  now built with Py_BUILD_CORE_MODULE macro defined.
@rhettinger
Copy link
Contributor

This looks good to my eyes but I haven't deeply thought about what the slow path is doing and what cases it handling. So, this is a 95% sign-off.

@rhettinger rhettinger added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 21, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @rhettinger for commit 158c102 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 21, 2021
@pablogsal pablogsal self-requested a review April 21, 2021 03:12
@vstinner vstinner marked this pull request as draft April 21, 2021 15:47
@vstinner
Copy link
Member Author

Hum. I'm not 100% sure about the assumptions that I made in my optimization. I plan to write first preparation changes to make the assumptions reliable :-)

@vstinner
Copy link
Member Author

Also, PR #25492 makes this optimization less important.

@rhettinger
Copy link
Contributor

rhettinger commented Apr 21, 2021

This is still important. I mitigated only one case.

If this PR isn't applied, I think all the other cases should be reverted. We're making everyone pay a cost and it looks like there is currently zero offsetting benefit. It is not clear that these ever should have gone in the first place.

@vstinner
Copy link
Member Author

I splitted this PR into 3 smaller PRs with a few minor fixes: https://bugs.python.org/issue40137#msg391547

The main change is that I added a runtime check to ensure that a type MRO cannot be empty. I close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants