-
-
Notifications
You must be signed in to change notification settings - Fork 32.7k
gh-129069: make list ass_slice and memory_repeat safe in free-threading #131882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
tom-pytel
wants to merge
24
commits into
python:main
Choose a base branch
from
tom-pytel:fix-issue-129069
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+187
−29
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Ping @colesbury , @Yhg1s , Windows doesn't like |
Yhg1s
reviewed
Apr 1, 2025
Misc/NEWS.d/next/Core_and_Builtins/2025-03-29-20-12-28.gh-issue-129069.QwbbWV.rst
Outdated
Show resolved
Hide resolved
Misc/NEWS.d/next/Core_and_Builtins/2025-03-29-20-12-28.gh-issue-129069.QwbbWV.rst
Outdated
Show resolved
Hide resolved
Looks like CI is failing on macOS. |
I've been seeing those for a bit, even in other PR, it doesn't appear to be related to the PR:
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR exists because of #129069 and
Tools/tsan/suppressions_free_threading.txt
:It seems to hit the requirements: performance, atomicity assured (per pointer), TSAN shuts up. Pretty sure can remove the
list_ass_slice_lock_held
andlist_inplace_repeat_lock_held
suppressions, and if not yet then should be able to add the atomic memcpy tolist_resize
where needed to be able to do so.The "atomic memcpy" (atomic per ptr, not whole memcpy) functions are in the atomic wrappers header because they can be reused, if want otherwise though can move into
listobject.c
, or somewhere else. Can also make non-inline callable real functions. Or the inline funcs could go intopyatomic.h
?@colesbury, you may recognize this, I did something similar for the lock-free array module PR, but this is simpler. PyObject pointers are a bit more fragile than pure values though so I assume you want this here?
Here is an example of what the inner loop a
FT_ATOMIC_MEMMOVE_PTR_RELAXED
compiles to, its norep movsq
or other specialized move instructions, but its fast:Simple benchmark. Note the decrementing address copy case in
FT_ATOMIC_MEMMOVE_PTR_RELAXED
seems to hurt a bit cache-wise in the 'tins' simple bench, but it doesn't seem to make a difference in the large move or overall inpyperformance
. I've had this one jump around from parity with current to this (which is the worst case so I put it here). The difference disappears if the realmemmove
is used in this case but the difference seems to come from the realmemmove
using special instructions (which are not atomic), a la: https://codebrowser.dev/glibc/glibc/sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S.html. The other two simple (tdel
andtins big
) are solid where they are. Test script:Times, average of 10 runs each:
pyperformance
benchmark. Difference in average performance is essentially nonexistent though individual tests can vary a bit (AMD 7950x running in VirtualBox):