Skip to content

bpo-44889: Specialize LOAD_METHOD with PEP 659 adaptive interpreter #27722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Aug 17, 2021

Conversation

Fidget-Spinner
Copy link
Member

@Fidget-Spinner Fidget-Spinner commented Aug 11, 2021

Up to 24% faster in logging_silent, 2% faster on average in pyperformance.

https://bugs.python.org/issue44889

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Aug 11, 2021

Needs some cleanup before it's ready for review.

Pyperformance right now, vs main branch 9816777:

pyperf compare_to 2021-08-10_13-53-main-9816777861a3.json.gz 2021-08-10_16-21-specialize_load_method-a3d6dd34a147.json.gz  -G --min-speed=2

Slower (9):
- scimark_sparse_mat_mult: 13.3 ms +- 0.2 ms -> 14.2 ms +- 0.2 ms: 1.06x slower
- nqueens: 225 ms +- 4 ms -> 236 ms +- 20 ms: 1.05x slower
- pathlib: 52.4 ms +- 0.9 ms -> 54.6 ms +- 6.9 ms: 1.04x slower
- scimark_fft: 1.01 sec +- 0.01 sec -> 1.05 sec +- 0.06 sec: 1.04x slower
- pickle_list: 9.94 us +- 0.15 us -> 10.2 us +- 0.1 us: 1.03x slower
- unpack_sequence: 140 ns +- 2 ns -> 145 ns +- 2 ns: 1.03x slower
- scimark_lu: 422 ms +- 6 ms -> 435 ms +- 12 ms: 1.03x slower
- xml_etree_parse: 383 ms +- 3 ms -> 391 ms +- 4 ms: 1.02x slower
- crypto_pyaes: 255 ms +- 2 ms -> 261 ms +- 2 ms: 1.02x slower

Faster (15):
- logging_silent: 405 ns +- 12 ns -> 328 ns +- 6 ns: 1.24x faster
- deltablue: 14.9 ms +- 0.2 ms -> 13.7 ms +- 0.2 ms: 1.09x faster
- logging_simple: 18.3 us +- 0.3 us -> 16.8 us +- 0.3 us: 1.09x faster
- pickle_pure_python: 1.05 ms +- 0.04 ms -> 975 us +- 11 us: 1.08x faster
- logging_format: 20.1 us +- 0.3 us -> 18.6 us +- 0.3 us: 1.08x faster
- raytrace: 993 ms +- 6 ms -> 938 ms +- 8 ms: 1.06x faster
- hexiom: 22.3 ms +- 0.3 ms -> 21.2 ms +- 0.4 ms: 1.05x faster
- pickle_dict: 61.4 us +- 0.2 us -> 58.9 us +- 0.4 us: 1.04x faster
- pyflate: 1.52 sec +- 0.01 sec -> 1.46 sec +- 0.01 sec: 1.04x faster
- richards: 158 ms +- 2 ms -> 153 ms +- 2 ms: 1.03x faster
- chaos: 228 ms +- 2 ms -> 222 ms +- 2 ms: 1.02x faster
- json_dumps: 31.8 ms +- 0.5 ms -> 31.1 ms +- 0.4 ms: 1.02x faster
- tornado_http: 364 ms +- 7 ms -> 356 ms +- 6 ms: 1.02x faster
- unpickle_pure_python: 711 us +- 8 us -> 696 us +- 8 us: 1.02x faster
- go: 467 ms +- 6 ms -> 457 ms +- 4 ms: 1.02x faster

Benchmark hidden because not significant (34): 2to3, chameleon, django_template, dulwich_log, fannkuch, float, json_loads, mako, meteor_contest, nbody, pickle, pidigits, python_startup, python_startup_no_site, regex_compile, regex_dna, regex_effbot, regex_v8, scimark_monte_carlo, scimark_sor, spectral_norm, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, unpickle, unpickle_list, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Aug 13, 2021

I knew it was too good to be true when we got away with 0 segfaults on this change :(. Seems like the refleak buildbots caught some errors by making some tests "hot". The failing tests (test_json, test_descr and test_capi) all involved some complex subclassing of C types. I honestly have no clue why that breaks anything. But we can just validate the type, and it's probably way more robust like that.

FWIW, specialization successes/failures didn't change at all on my subset of tests. This only affects an extremely small number of cases. Since this added 1 branch and 1 read, I'll be re-benchmarking one final time.

@Fidget-Spinner Fidget-Spinner added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 13, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit a12406b 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 13, 2021
@markshannon
Copy link
Member

I don't think the last change is correct.
The version check is a stronger test than the class identity test and the identity test provides no protection against deallocation of the class, or an ABA change.

To diagnose the problem, you could change DEOPT_IF(cls != cache3->obj, LOAD_METHOD) to assert(cls == cache3->obj).

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Aug 13, 2021

I don't think the last change is correct.
The version check is a stronger test than the class identity test

Indeed, I thought so too -- so I'm thoroughly confused. After changing to an assert, I get a plain AssertionError in C in the same tests, even after it passes the type and dict version checks. So the identity check was a last failsafe.

FWIW, the previous attempt at LOAD_METHOD with the old opcache similarly used a type identity check along with version check #23503.

I suspect something weird is going on with tp_version_tag and complex C subclasses (maybe a subtle bug?). This will take me a few days or more to investigate.

On a side note: I have zero clue when Py_TPFLAGS_VALID_VERSION_TAG isn't set. It seems that everything in CPython sets it as long as we call _PyType_Lookup in _Py_Specialize_LoadMethod`.

@Fidget-Spinner
Copy link
Member Author

Fidget-Spinner commented Aug 14, 2021

This took me a full day to debug: I think I found a bug in either the libregretest refleak hunter/unittest or the type version cache:

Please see https://bugs.python.org/issue44914.

@Fidget-Spinner Fidget-Spinner added 🔨 test-with-buildbots Test PR w/ buildbots; report in status section and removed DO-NOT-MERGE labels Aug 16, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit 020a326 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 16, 2021
@Fidget-Spinner Fidget-Spinner added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 16, 2021
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @Fidget-Spinner for commit d27e388 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 16, 2021
@markshannon
Copy link
Member

2% faster

@Fidget-Spinner
Copy link
Member Author

The s390x Fedora buildbot is stuck running test_multiprocessing_spawn. It has been stuck for 17 hours: https://buildbot.python.org/all/#/builders/356/builds/117.

THE AMD64 RHEL7 is stuck at ./configure for 17 hours https://buildbot.python.org/all/#/builders/169/builds/122 .

The AMD64 Fedora buildbot is stuck at test_concurrent_futures for 17 hours https://buildbot.python.org/all/#/builders/52/builds/115

I can't cancel the builds (only core devs can). But I doubt these are related to PR. These bots passed previously (other than the refleak builds) just one commit ago. All refleak buildbots (bar AMD64 Fedora) and other buildbots (excluding those above) have passed.

@markshannon
Copy link
Member

Looks good 👍

I think that there are extensions we can make here.
E.g. an attribute doesn't need to be a method to be cachable. Non-descriptor instances of immutable classes are OK as well.
We might be able to streamline the code around tp_dictoffset as well.
But those improvements can wait for another PR.

@markshannon markshannon merged commit 96346cb into python:main Aug 17, 2021
@Fidget-Spinner
Copy link
Member Author

Thanks for all the reviews Mark. This had some unexpected detours and took longer than I expected :). You're right about the extensions. Looking at it now, LOAD_METHOD_CLASS doesn't need to check the dict's version, tp_version_tag alone will be sufficient thanks to your fixes in that other PR. But with how generalized it currently is, we can certainly easily use LOAD_METHOD_CLASS to do the other cases which you suggested.

Anyways, I think >92.5% specialization successes is quite good. I purposely left ~1.5% on the table by not supporting LOAD_METHOD_CLASS for builtin types because I found 0 speedup in microbenchmarks.

@Fidget-Spinner Fidget-Spinner deleted the specialize_load_method branch August 17, 2021 15:07
@Fidget-Spinner
Copy link
Member Author

Just a FYI, some *nix PGO buildbots have test failures followed by success with the following message:

======================================================================
ERROR: tearDownClass (test.test_support.TestSupport)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildbot/workers/enable-optimizations-bot/3.x.gps-debian-profile-opt.pgo/build/Lib/test/test_support.py", line 41, in tearDownClass
    support.clear_ignored_deprecations(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/buildbot/workers/enable-optimizations-bot/3.x.gps-debian-profile-opt.pgo/build/Lib/test/support/__init__.py", line 2079, in clear_ignored_deprecations
    if message.endswith(endswith):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'endswith'
----------------------------------------------------------------------

I was almost ready to revert this PR, but it seems that this has been happening before this PR was committed, so we're safe (for now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants