-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
DEP: Deprecate registering dtype names with np.sctypeDict? #24699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for the summary @ngoldbaum! I think it would be useful to get the actual requirements and constraints here clear first, before thinking about potential solutions to make string names for external dtypes work.
Isn't this completely untested/documented and just happpened to work by relying on numpy-internal implementation details? If we did document or recommended it, that would make a difference here. But my current impression is that this could have broken at any time, and it's bad practice for a package to modify global state in a way that can break other packages.
I don't see a real reason to support this, at least if the only rationale is to save a few characters (since Also, how often do you actually need this dtype instance? Idiomatic code in numpy uses |
@rgommers Well, the entire numpy type extension API is more or less undocumented, so to a certain extent that's true of anything related to user-defined types. I know there are plans to change that (NEP 42) but they aren't ready yet. This extension has existed in various forms since 2017 (originally as part of TensorFlow), and it looks like we started adding entries to The issue is not so much that we write this, it's that our users who may write this or expect to write this. I don't think it's that prevalent, but some users definitely do it. And it strikes me as something that's reasonable to expect: we're trying to add additional NumPy integer types ( There are other instances where to do this right may require updating global state, for instance users expect to write things like That said, I suspect this problem is less pressing than it once was: now that |
Sure, but part of the point of all the cleanups in 2.0 is to reduce the many ways of doing the same thing, and give users some better guidance of doing things. And this to me does not look like recommended usage. E.g.: >>> x = np.ones(2, dtype=np.int8) # the canonical way
>>> x = np.ones(2, dtype='int8')
>>> x = np.ones(2, dtype='i1')
>>> x = np.ones(2, dtype=np.dtype('int8'))
>>> x = np.ones(2, dtype=np.byte)
>>> ... # etc. This is quite a mess. I believe the recommended way to write code for
Sure. We can think about ways to keep this working. However before we arrive at "we cannot touch this at all for 2.0", let's first figure out how this is supposed to look in the future. And only after that how we move forward in a way that isn't too disruptive for JAX. I'll note that you have to do a new release for 2.0 support anyway, so if we give you any other way (perhaps temporary/private) to keep |
I think one challenge here is that JAX/ml_dtypes cannot deprecate We could certainly update all instances in our own code and other code that we have control over, but any more gradual runtime deprecation behavior would have to come from NumPy.
No, it's explicitly documented. e.g. in the README at https://github.com/jax-ml/ml_dtypes:
That's probably my fault – I had assumed this sort of registration was intentionally supported by NumPy, and so I advertised it as so. |
Can't you simply insert
I meant in the numpy docs.
No worries at all. It's not impossible that this was in some numpy tutorial or docs. And yes, NumPy historically had neither docs for this kind of thing nor any sort of reasonable public/private split. So who knows if this dict was ever meant to be public for reading from (let alone for writing into). |
I don't totally follow: say we create a shadow |
I haven't done this with the legacy API, but could you add a |
Does accessing the dtype singleton involve instantiating the scalar type? |
OK, so I traced it through: I think numpy/numpy/core/src/multiarray/descriptor.c Line 1572 in f6bf183
So as long as import warnings
import ml_dtypes
import numpy as np
class _deprecated_bfloat16:
@classmethod
@property
def dtype(self):
warnings.warn("np.dtype('bfloat16') is deprecated. Use np.dtype(ml_dtypes.bfloat16) instead.")
return np.dtype(ml_dtypes.bfloat16)
np.sctypeDict['bfloat16'] = _deprecated_bfloat16
print(np.dtype('bfloat16'))
# UserWarning: np.dtype('bfloat16') is deprecated. Use np.dtype(ml_dtypes.bfloat16) instead.
# dtype(bfloat16) It would be nice to make numpy/numpy/core/src/multiarray/descriptor.c Lines 1536 to 1538 in f6bf183
I don't see any good overloading route from the Python side within Let me know if you have better ideas! |
I agree that the deprecation should be spit out by NumPy, happy to nudge you towards doing it, but hacking it in The only thing that would make me pause is if you/users have a large want to keep it working. In that case we should add a new way to do a proper registration with a function. (and I might be fine to just make the old one fail.) |
I think this would be ideal – after all the internal dtype system already has a notion of dtype string name, and does string-to-dtype lookups at the C level, even for user-registered dtypes, without any references to |
I don't really like it, because you cannot control name clashes well and the next thing will be someone asking if we can allow But as I said, I would be fine with hiding a registration function somewhere (which could internally just insert into the |
On import,
ml_dtypes
adds new entries tonp.sctypeDict
so that e.g.np.dtype(“int4”)
returns an int4 dtype defined outside NumPy.Since jax currently documents this behavior to users and relies on it internally, I don’t think we can reasonably break it without a deprecation story and a migration story.
For deprecating it, we would keep a list of all the strings that NumPy accepts out of the box and if any other string is passed in and somehow we get back a valid dtype, we should raise a deprecation warning. I don’t know if there are other ways of injecting a string dtype name into NumPy’s internals without manipulating
sctypeDict
so this will catch any other shenanigans.We should probably also deprecate
np.sctypeDict
too?In a few releases after adding the deprecation, we could make it so
np.dtype
can only return dtype instances with a mapping defined out of the box in NumPy or via some as-yet unwritten mechanism to associate string names with dtypes, probably with some kind of support for namespacing.As far as I know jax is the only downstream library that injects dtype names into the
np.dtype("dtype_name")
mechanism.The deprecation should not be added until we have a clear migration story for the jax library and any possible impacts on jax users are considered.
xref #24376 (comment) and the discussion that follows for context
The text was updated successfully, but these errors were encountered: