BUG: Race adding legacy casts to custom dtype under free threading #28048
Comments
Hmmmm, I wonder if we can/should do this at registration time already, to avoid having to add another locking point (registration should happen at import, I think, and if not, it should be the caller's problem to lock).
Here's the C traceback:
NumPy is lazily generating casts, but we can probably initialize casts between user dtypes and the built-in DTypes using this machinery at DType registration time. Failing that, we can keep the current implementation and lock the DType when we need to fill the cache. Possibly behind a …
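To make the registration-time idea concrete, here is a sketch in Python (illustrative only; the real machinery is C, and `CastRegistry`, `finalize`, and `_make_legacy_cast` are hypothetical names). The point is that if the cast cache is filled once, while registration is still single-threaded, every later lookup is a pure read and needs no lock:

```python
def _make_legacy_cast(target):
    # Hypothetical stand-in for wrapping a legacy cast function.
    return ("legacy_cast", target)

class CastRegistry:
    """Illustrative stand-in for a DType's castingimpls cache."""

    def __init__(self):
        self._casts = {}        # target DType -> cast implementation
        self._finalized = False

    def register_cast(self, target, impl):
        # Registration happens at import time, before any threads exist.
        assert not self._finalized
        self._casts[target] = impl

    def finalize(self, builtin_dtypes):
        # Eagerly wrap legacy casts for every built-in DType so the
        # cache is never mutated again after this point.
        for dt in builtin_dtypes:
            self._casts.setdefault(dt, _make_legacy_cast(dt))
        self._finalized = True

    def lookup(self, target):
        # Read-only after finalize(): safe without locking, even on the
        # free-threaded build.
        return self._casts[target]
```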
I edited the reproducer to use …
@seberg I need your opinion about what the best way forward here is. The reason this is set up this way is that user dtypes first create a DType instance and only afterwards register a bunch of casts; there's no function that says "this DType is fully initialized, do finalization steps." The current approach for wrapping the legacy user dtype casts avoids the need to monkey with the internals of the legacy dtype implementation (including all the …). I see three ways to fix this, and none are great.
Ah, so I had a very good reason to put it exactly where it is :(. My thoughts:
Since I think being a …
I'll have to look closer at this, but I think you'd need to apply a critical section when reading …
Another, perhaps less satisfying, option is to let the race happen and just ignore this error case: the cast created by both threads should be identical.
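A sketch of that "benign race" option in Python (illustrative; `build_cast` is a hypothetical stand-in for the C-level cast construction): as long as both threads build equivalent casts, an atomic insert-if-absent such as `dict.setdefault` discards the loser's duplicate instead of raising the "cast was already added" error:

```python
cast_cache = {}  # stand-in for the castingimpls dict

def build_cast(target):
    # Hypothetical: both threads deterministically produce an
    # equivalent cast object.
    return ("legacy_cast", target)

def get_cast(target):
    impl = cast_cache.get(target)
    if impl is None:
        # Two threads can both see a miss and both build a cast...
        new_impl = build_cast(target)
        # ...but setdefault is a single atomic dict operation: the first
        # thread's value wins and the duplicate is silently dropped.
        impl = cast_cache.setdefault(target, new_impl)
    return impl
```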
I'll look at converting the bits of NumPy needed to put …
Yeah, in fact, I think the … If that isn't possible (and not just not the nicest way to do it!), I would still like to understand why it isn't OK to call …
I managed to get the numpy tests passing after converting … I think the most straightforward way to add the mutex is to extend the NPY_DType_Slots struct:

```diff
diff --git a/numpy/_core/src/multiarray/dtypemeta.h b/numpy/_core/src/multiarray/dtypemeta.h
index a8b78e3f75..888ec09fba 100644
--- a/numpy/_core/src/multiarray/dtypemeta.h
+++ b/numpy/_core/src/multiarray/dtypemeta.h
@@ -85,6 +85,12 @@ typedef struct {
      * dtype instance for backward compatibility. (Keep this at end)
      */
     PyArray_ArrFuncs f;
+
+    /*
+     * Mutex to protect against race conditions modifying castingimpls at
+     * runtime
+     */
+    void *mutex;
 } NPY_DType_Slots;
```

I can then heap-allocate the mutex during DType creation to avoid polluting the internal C API with C++ constructs, and follow the pattern of the other … If anyone spots any issues with that, let me know, but I think I'm going to try to implement this next.
I mean, I still really don't understand why we shouldn't just try once (even with this mutex!), and then a critical section on the castingimpls itself seems just as well (very, very unlikely to be hit in practice).
Isn't that thread-unsafe? All accesses need to be done under the critical section; otherwise you could be reading from the dict while another thread writes to it. That's why I want an RWlock.
I mean, the whole point of the … (Which doesn't mean we shouldn't just do the C++ conversion; maybe it is time...)
(I somewhat remember things as: for ufuncs we said that using a dict rather than a custom hash-table would be a solution where we don't need a custom lock.)
What's happening right now is technically thread-safe in the sense that there are no data races, but there is a race condition. We could use …
What I want to do is make the race condition impossible, by blocking all threads from reading the …
Does that make more sense?
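To illustrate the distinction with a Python sketch (not NumPy's actual C code; the cache and lock names are made up): each dict operation below is individually free of data races, but the check-then-insert sequence is still a race condition; holding one lock across the whole fill, as proposed above, makes the race impossible:

```python
import threading

cast_cache = {}                      # stand-in for castingimpls
cast_cache_lock = threading.Lock()   # stand-in for the per-DType mutex

def get_cast(target):
    # Fast path: a plain read, itself free of data races.
    impl = cast_cache.get(target)
    if impl is not None:
        return impl
    # Without the lock, two threads could both see a miss above (the
    # race condition) and both try to add the cast, which is exactly
    # what raises "A cast was already added".
    with cast_cache_lock:
        impl = cast_cache.get(target)        # re-check under the lock
        if impl is None:
            impl = ("legacy_cast", target)   # hypothetical construction
            cast_cache[target] = impl
    return impl
```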
@ngoldbaum but the … If the return is …
So, if the return was … Since …
I opened #28290
Describe the issue:
If the following code is run under Python 3.13.1t, it fails nondeterministically with:
A cast was already added for <class 'numpy.dtype[rational]'> -> <class 'numpy.dtypes.Int8DType'>. (method: legacy_cast)
Reproduce the code example:
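The original snippet was not preserved in this capture; below is a minimal sketch along the same lines, assuming the `rational` test dtype from `numpy._core._rational_tests` and a free-threaded CPython build (the exact original reproducer may differ):

```python
import threading

import numpy as np
from numpy._core._rational_tests import rational

NUM_THREADS = 8
barrier = threading.Barrier(NUM_THREADS)

def do_cast():
    # Line the threads up so they all perform their first
    # rational -> int8 cast at the same moment, racing to create
    # the lazily-built legacy cast wrapper.
    barrier.wait()
    arr = np.zeros(100, dtype=rational)
    arr.astype(np.int8, casting="unsafe")

threads = [threading.Thread(target=do_cast) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```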
Error message:
Python and NumPy Versions:
2.3.0.dev0+git20241219.35b2c4a
3.13.1 experimental free-threading build (tags/v3.13.1:06714517797, Dec 15 2024, 15:38:01) [Clang 18.1.8 (11)]
Runtime Environment:
[{'numpy_version': '2.3.0.dev0+git20241219.35b2c4a',
'python': '3.13.1 experimental free-threading build '
'(tags/v3.13.1:06714517797, Dec 15 2024, 15:38:01) [Clang 18.1.8 '
'(11)]',
'uname': uname_result(system='Linux', node='', release='', version='#1 SMP PREEMPT_DYNAMIC Debian 6.redacted (2024-10-16)', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Zen',
'filepath': '/usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.27.so',
'internal_api': 'openblas',
'num_threads': 128,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]
Context for the issue:
Found when working on free-threading support in JAX.