-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Fix #5339: cache dtype.__hash__. #5343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Small benchmarks:
|
/* Cached hash value (-1 if not yet computed). | ||
* This was added for NumPy 2.0.0. | ||
*/ | ||
npy_hash_t hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this unfortunately breaks the ABI and causes forward compatibility issues with cython.
though I am playing with breaking ABI by hiding the ufunc structure (at least partially) maybe we could break this too.
But if we do we should do it properly and add a pointer to a private object or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't adding a field at the end be fine, ABI-wise? We don't allow
subclassing. What cython issue are you thinking of?
I doubt there's any legitimate reason for anyone outside numpy to access
anything besides a few fields at the beginning anyway though (in
particularly itemsize), and I've also been pondering plans to rewrite the
dtype struct. So I wouldn't rule that out either. Rather than using a
pimpl, though, it might be sufficient to just hide the private fields
behind a private struct definition that only we can cast to.
On 4 Dec 2014 06:47, "Julian Taylor" notifications@github.com wrote:
In numpy/core/include/numpy/ndarraytypes.h:
@@ -619,6 +620,10 @@ typedef struct _PyArray_Descr {
* for NumPy 1.7.0.
*/
NpyAuxData *c_metadata;
/\* Cached hash value (-1 if not yet computed).
\* This was added for NumPy 2.0.0.
*/
npy_hash_t hash;
this unfortunately breaks the ABI and causes forward compatibility issues
with cython.
though I am playing with breaking ABI by hiding the ufunc structure (at
least partially) maybe we could break this too.
But if we do we should do it properly and add a pointer to a private
object or something similar.—
Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/5343/files#r21288459.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we do we should do it properly and add a pointer to a private object or something similar.
That sounds reasonable. Do you have a patch for that somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extending structs breaks programs embedding this struct into other structures (unlikely) and cython checks the size of these types and complains if they are different than what the program was built with, thats just an annoyance but one that comes up very often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished my ufunc patch yet, though it goes along the lines that njsmith suggested, keep the current structure and internally use a larger one. Additionally it might be interesting to mangle the public part in the development version to trigger failures for people that use the internals. For release the mangling can be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm less knowledgeable than you guys :) What should be the way forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one way would be to implement the hiding of the object internals, another wait until someone else does it, but that might take a while.
Unfortunately the dtype internals are probably far more commonly used than the ufunc internals I want to hide. This probably gives us much less wiggle room for breaking the ABI.
Unless someone finds a way to do so without breaking it, maybe some kind of magic number in one of the existing fields could work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I am also just overly cautious, in light of no better idea if you want to work on it I think the approach of adding a second private struct that embeds the public struct is worthwhile. Its only problem is that we technically can't trust the hash to be correct if others do unusual stuff with the public part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the general scheme work like PyArray_Fields currently does? Which dtype fields should be private vs. public exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping on my questions above :)
Re-re-ping. |
@juliantaylor @njsmith Thoughts? I agree with Julian that the descr internals are accessed in a lot of code, and that embedding the structure might be a way forward, i.e., dividing it into private and public parts. Long term, we might want to add functions for access, but I don't think we can require them in any time frame less than several/many years. |
I'm all for doing some more general rearrangements to how the dtype struct I'd worry more about correctness in face of the fact that dtypes are Making the uninitialized value 0 as discussed above might be a minor
|
Could you show me an example of how to mutate a dtype? |
|
Thanks. Apparently |
Hmm, is there a good reason for the names being mutable? Should we consider removing that ability? |
As a matter of fact, there's the following comment in the setter for the
|
I've updated the PR to invalidate the cache when necessary, and add tests. |
Why the merge from master? Few comments
|
npy_3kcompat.h includes ndarraytypes.h includes ndarrayobject.h includes npy_3kcompat.h.
Depends what you call "cleaner" :-) Perhaps move the "npy_hash_t" typedef into npy_common.h? |
Mailing list thread on the topic http://thread.gmane.org/gmane.comp.python.numeric.general/43047/focus=43119. Note the comment that the dtype was made mutable to work around the inablility to do the renames with a *.view. We might want to take a look at that, as it seems that would be the preferred alternative. |
@pitrou Yeah, let's put |
30e4313
to
6a373cb
Compare
Computing the type of a dhash can be slow for complex (e.g. structured) dtypes. Hashing dtypes can be useful in some applications, such as when doing type-based dispatching, and speed can be critical in those cases. This enhancement caches the once-computed hash value in the dtype structure, so as to save time on further lookups. The cached value is invalidated in the rare cases where the dtype is mutated. Benchmarks numbers: python3.4 -m timeit -s "import numpy as np; t=np.dtype('uint64')" "hash(t)" * before patch: 1000000 loops, best of 3: 0.498 usec per loop * after patch: 10000000 loops, best of 3: 0.0616 usec per loop python3.4 -m timeit -s "import numpy as np; t=np.dtype([(s, 'f') for s in 'abcdefghij'])" "hash(t)" * before patch: 100000 loops, best of 3: 4.43 usec per loop * after patch: 10000000 loops, best of 3: 0.0603 usec per loop Closes numpy#5339.
@charris, ok, I think I have done everything you asked for. |
Fix #5339: cache dtype.__hash__.
LGTM, thanks @pitrou. @juliantaylor @njsmith This simply extends the struct at the end. I don't think this will break anything in practice. If we intend to hide some stuff we should settle how to do that soon. |
I don't like starting the 1.7 cython warning nonsense again and a (minor) abi break for a performance improvement most users are never going to see ... |
Perhaps Cython simply needs fixing? What is the warning about exactly? |
change of structure size, aka abi break. its probably too noisy as the abi is only broken for (hopefully) very uncommon cases. Changing cython will not help much as the current structure size is already included in existing binaries which you'd need to recompile to remove it. |
Details at #432. That was/is indeed annoying as hell. |
The change in #432 to make cython shut up about this nonsense is still in On Sat, Apr 4, 2015 at 4:21 PM, Ralf Gommers notifications@github.com
Nathaniel J. Smith -- http://vorpus.org |
PyObjects should only be allocated dynamically and passed by reference (the exception is static definition of a PyTypeObject), so I don't know why a size change would be a problem. CPython can change the size of builtin objects from one version to another: does Cython complain about that? |
It's never made a huge amount of sense to me either. You might bring it up (Does CPython really change the size of builtin objects in point releases? On Sat, Apr 4, 2015 at 4:26 PM, Antoine Pitrou notifications@github.com
Nathaniel J. Smith -- http://vorpus.org |
@njsmith I just checked, the filters in |
No, normally not. Only in feature releases. |
I agree :-) |
…n noise. See numpygh-432 for details. Motivation for adding this now is the discussion on numpygh-5343.
Okay, done in gh-5750. |
Fix #5339: cache
dtype.__hash__