gh-132042: Remove resolve_slotdups to speedup class creation #132156
Maybe take a look at this comment and keep it in mind.
@StanFromIreland Yeah, thanks! I just checked both scripts, and the timings are nearly the same. But I have not finished the main task yet, so the results may change.
Intermediate results: Benchmark results (outdated)
I added tests from gh-76527 (pyperf test script: b.txt):
Benchmark results (outdated)
Also benchgcclasses2.py output:
new:
The time varies; I'm not sure the difference is statistically significant. The object counts are nearly the same. Async import time (
new:
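The pyperf script (b.txt) itself is not shown here. As a rough stand-in, the kind of measurement behind benchmark names such as `1000-empty_cls` and `1000-cls_with_bases` can be sketched with the standard library's `timeit`. The helper names, the choice of dunder methods, and the use of `timeit` instead of pyperf are all assumptions made for illustration, not the actual script.

```python
import timeit


def _add(self, other):
    return NotImplemented


def _len(self):
    return 0


def _getitem(self, key):
    raise KeyError(key)


# Hypothetical namespace for the "cls_with_dunders" style of benchmark.
DUNDERS = {"__add__": _add, "__len__": _len, "__getitem__": _getitem}


def make_empty_classes(n=1000, bases=()):
    """Create n classes with an empty namespace."""
    return [type(f"C{i}", bases, {}) for i in range(n)]


def make_classes_with_dunders(n=1000, bases=()):
    """Create n classes that each define a few dunder methods."""
    return [type(f"D{i}", bases, dict(DUNDERS)) for i in range(n)]


def best_time_ms(func, repeat=5):
    """Best-of-`repeat` wall time of one call to func, in milliseconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat)) * 1000.0


if __name__ == "__main__":
    print(f"empty classes:        {best_time_ms(make_empty_classes):.2f} ms")
    print(f"classes with dunders: {best_time_ms(make_classes_with_dunders):.2f} ms")
    print(f"list subclasses:      "
          f"{best_time_ms(lambda: make_empty_classes(bases=(list,))):.2f} ms")
```

Unlike pyperf, this sketch does no warm-up, process isolation, or statistical analysis, so it is only useful for a quick sanity check between two builds.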
Remove resolve_slotdups (new2): Benchmark results (outdated)
It is ready for review. Please take a look.
Include/cpython/descrobject.h
Outdated
@@ -16,6 +16,7 @@ struct wrapperbase {
    const char *doc;
    int flags;
    PyObject *name_strobj;
    uint8_t name_count;
Please add a comment explaining the purpose of this member and how it's computed. You can mention _PyType_InitSlotDefsNameCounts().
Added a comment. Please take a look and check whether it is clear enough.
wrapperbase is a public C API. It would be better to leave it unchanged if possible. Would it be possible to store this information somewhere here, in a private space?
Moved it to its own array. Please take a look.
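The idea discussed in this thread, a per-entry count of how many slot definitions share the same dunder name, kept in a separate private array rather than inside `wrapperbase`, can be modelled in a few lines of Python. This is only a simplified model: the table below is a hypothetical, heavily abbreviated stand-in for `slotdefs`, and the function names are invented for illustration; the real C code in `Objects/typeobject.c` may differ in detail.

```python
from collections import Counter

# Hypothetical, abbreviated stand-in for the slotdefs table:
# (dunder name, slot it fills). Some names fill more than one slot.
SLOTDEFS = [
    ("__getitem__", "mp_subscript"),
    ("__getitem__", "sq_item"),
    ("__len__", "sq_length"),
    ("__len__", "mp_length"),
    ("__add__", "nb_add"),
    ("__add__", "sq_concat"),
    ("__repr__", "tp_repr"),
    ("__hash__", "tp_hash"),
]


def init_name_counts(slotdefs):
    """Return a list parallel to slotdefs holding, for each entry,
    the number of entries that share its name."""
    totals = Counter(name for name, _ in slotdefs)
    return [totals[name] for name, _ in slotdefs]


# Computed once up front and stored next to, not inside, the table.
NAME_COUNTS = init_name_counts(SLOTDEFS)


def is_unique_name(index):
    """True if no other slot definition shares this entry's name."""
    return NAME_COUNTS[index] == 1


if __name__ == "__main__":
    for (name, slot), count in zip(SLOTDEFS, NAME_COUNTS):
        print(f"{name:12} {slot:13} shared by {count} entries")
```

Keeping the counts in a parallel array means the public structure stays unchanged, and the lookup at class-creation time is a single index operation instead of a scan for duplicates.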
Co-authored-by: Victor Stinner <vstinner@python.org>
Updated results, only for removing resolve_slotdups (run on a Windows 11 x64 desktop, CPU: 11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz): Details
Updated results
@@ -340,6 +340,8 @@ Objects/obmalloc.c - obmalloc_state_main -
Objects/obmalloc.c - obmalloc_state_initialized -
Objects/typeobject.c - name_op -
Objects/typeobject.c - slotdefs -
# It initialized only once when main interpeter starts
Objects/typeobject.c - slotdefs_name_counts -
It seems that c-analyzer is not very happy, but it doesn't say why :(
@ericsnowcurrently: Any idea how to make c-analyzer happy?
@vstinner @ericsnowcurrently It seems that we need to adjust the max sizes for typeobject.c in c-analyzer. If this is correct, I will prepare a separate PR with the fix.
Well done with fixing c-analyzer!
Results for this PR vs current main (1.14x faster):

```
+-------------------------------------------------------------+---------+-----------------------+
| Benchmark                                                   | ref5    | dups5                 |
+=============================================================+=========+=======================+
| 1000-empty_cls                                              | 4.52 ms | 3.75 ms: 1.21x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_dunders                                       | 5.20 ms | 4.64 ms: 1.12x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['A', 'B']                  | 5.74 ms | 5.00 ms: 1.15x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['A', 'B']                        | 6.33 ms | 5.72 ms: 1.11x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['A', 'B', 'D']             | 6.53 ms | 5.73 ms: 1.14x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['A', 'B', 'D']                   | 7.03 ms | 6.42 ms: 1.09x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['A_dun', 'B_dun']          | 5.47 ms | 4.96 ms: 1.10x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['A_dun', 'B_dun']                | 6.25 ms | 5.73 ms: 1.09x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['A_dun', 'B_dun', 'D_dun'] | 6.22 ms | 5.67 ms: 1.10x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['A_dun', 'B_dun', 'D_dun']       | 7.14 ms | 6.34 ms: 1.13x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['Logger']                  | 5.34 ms | 4.78 ms: 1.12x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['Logger']                        | 6.04 ms | 5.43 ms: 1.11x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['DatagramHandler']         | 6.70 ms | 5.95 ms: 1.13x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['DatagramHandler']               | 7.19 ms | 6.58 ms: 1.09x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['MagicMock']               | 7.97 ms | 7.04 ms: 1.13x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['MagicMock']                     | 8.31 ms | 7.68 ms: 1.08x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['Shelf']                   | 9.50 ms | 8.90 ms: 1.07x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['Shelf']                         | 10.1 ms | 9.67 ms: 1.05x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['tuple']                   | 5.68 ms | 4.42 ms: 1.29x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['tuple']                         | 6.40 ms | 5.43 ms: 1.18x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['dict']                    | 5.69 ms | 4.62 ms: 1.23x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['dict']                          | 6.32 ms | 5.38 ms: 1.17x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-empty_cls_with_bases-bases=['list']                    | 6.07 ms | 4.59 ms: 1.32x faster |
+-------------------------------------------------------------+---------+-----------------------+
| 1000-cls_with_bases-bases=['list']                          | 6.67 ms | 5.32 ms: 1.25x faster |
+-------------------------------------------------------------+---------+-----------------------+
| Geometric mean                                              | (ref)   | 1.14x faster          |
+-------------------------------------------------------------+---------+-----------------------+
```

@Yhg1s May I ask you a favor: could you run the benchmarks on this PR too?
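As a quick cross-check of the "Geometric mean" row in the results above, the overall figure can be approximated from the rounded per-benchmark speedups. This is only a sketch: pyperf works from the underlying measurements rather than the rounded ratios, so small differences from its reported value are possible. The list name `SPEEDUPS` is chosen here for illustration.

```python
from statistics import geometric_mean

# Rounded "N.NNx faster" ratios copied from the results for this PR,
# in the order the benchmarks are listed.
SPEEDUPS = [
    1.21, 1.12, 1.15, 1.11, 1.14, 1.09, 1.10, 1.09, 1.10, 1.13,
    1.12, 1.11, 1.13, 1.09, 1.13, 1.08, 1.07, 1.05,
    1.29, 1.18, 1.23, 1.17, 1.32, 1.25,
]

overall = geometric_mean(SPEEDUPS)

if __name__ == "__main__":
    print(f"{overall:.2f}x faster over {len(SPEEDUPS)} benchmarks")
```

The geometric mean is the usual choice for averaging ratios, because a 2x speedup and a 2x slowdown then cancel out to 1.0x instead of averaging to 1.25x.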
Thanks!
Maybe @mdboom will be interested in running benchmarks for this PR?
Misc/NEWS.d/next/Core_and_Builtins/2025-04-10-01-52-42.gh-issue-132042.fePwlj.rst
Outdated
Yep, I'm kicking it off now. Results may take a few hours (the queue gets long this time of year).
Co-authored-by: sobolevn <mail@sobolevn.me>
Overall on pyperformance, it's a wash. Some benchmarks are 10% faster, but we should probably also understand why some are 5% slower, to make sure there aren't negative effects of this change. (Ignore the gc_traversal benchmark; it is not particularly reliable.)
@mdboom Thank you! I will profile the slow benchmarks.
Removes resolve_slotdups.