-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-132042: Prebuild mro_dict for find_name_in_mro to speedup class creation #132618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-132042: Prebuild mro_dict for find_name_in_mro to speedup class creation #132618
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All optimizations give about 40% speedup on tests.
Would you mind to run benchmarks on this PR?
First, rebase the PR on the main branch to retrieve the "Do not lookup tp_dict each time to speedup class creation" change.
Yes, but I can do it on the weekends (at least I will try later today on my traveling laptop, but results will not be the same as from original PR). |
@vstinner benchmark's results: Details
There are two mro column - for two runs. I believe results are not very stable due throttling (I ran benchmarks on macbook retina 2013 on windows, cpu - i5-4258U @ 2.40GHz) |
@vstinner Please take a look. |
Updated benchmarks (ran windows 11 x64 desktop, cpu- 11th Gen Intel(R) Core(TM) i5-11600K @ 3.90GHz): Details
|
…hile type initialized - in the old realisation this exception swallowed and this base not checked while finding in mro. so we don't change observed behavior with this change.
Updated results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I just have a remark on a comment.
I'm surprised that the latest benchmark run no longer shows slower tests.
Co-authored-by: Victor Stinner <vstinner@python.org>
Yeah, ref2 became a bit slower than ref (ref is ran on main 3 days ago, ref2 on actual main). Details
|
Merged with main due conflicts after #131174 |
Updated benchmarks with main (a bit slower again - ref3 today main)
This PR - 1.20x faster
|
I also benchmarked with FT. main vs main-ft (34% slower)
main-ft vs PR-ft (20% faster)
|
FYI, I've scheduled full benchmark runs (regular and free-threaded) on https://github.com/facebookexperimental/free-threading-benchmarking. (It'll be a couple of hours before they show up.) I'm a little concerned about the need for the test change, but I guess it's rare enough to have custom |
Thanks for benchmarking. I'm looking forward to results :)
IIUC, |
This is an interesting optimization and worth considering IMHO. However, I am also concerned about unexpected behavioral differences caused by it. Doing a merge on each type dict is not the same has doing a dict lookup on each slot. As the broken unit test shows, you can have code that executes on the dict lookup and that can change the result of the operation (e.g. the hash method on a name value). To be fair, code that does that is probably "cursed" and deserves to break. However, I'd be a little worried about applying this optimization in the 3.14 release. Maybe we should defer and revisit in the 3.15 cycle? |
Thanks! I'm crying! 😿 |
Yeah, I agree with you. In the initial version I just stopped building of MRO-dict and fallback to original version of
It is up to core-devs - I just write code a bit :) |
Don't cry, this is the nature of doing performance optimizations of Python. The low hanging fruit is pretty much gone and so optimizations are either quite complex or they only provide benefit to a specific subset of code. Just because the pyperformance benchmarks don't show a win, it doesn't mean this is not worth pursing. I'm sure some code out there does create more type objects and would see a significant benefit. So it's a matter of how costly the optimization is, in terms of code maintenance and risk of changing behavior (breaking currently working programs). In this case, the maintenance cost looks pretty minor, the implementation is not complex. It's the change in behavior that's the concern and I think we are too close to the beta of 3.14 to risk it. Again, I think this is worth re-visiting and I hope you are not discouraged from looking for other kinds of optimizations. |
@nascheme Thank you for kind words! |
This is one of the optimizations from #132156 that moved to separate PR.
All three optimizations from original PR give about 40% speedup on tests.
This optimization give about 15%-18% speedup.