API: Cleaning `numpy/__init__.py` and main namespace - Part 3 [NEP 52] #24376

mtsokol · 2023-08-09T16:48:11Z

Follow-up PR after #24357.

Current scope of the PR:

Remove other aliases:

float_
complex_
longfloat
singlecomplex
cfloat
longcomplex
clongfloat
string_
unicode_

Continue establishing the main namespace as outlined in ENH: Overhaul of NumPy main namespace [NEP 52] #24306 (this removes the next 18 items from its "remove" column). Remove:

Inf, Infinity, NaN, infty
issctype, maximum_sctype, obj2sctype, sctype2char, sctypeDict, sctypes, issubsctype
set_string_function
asfarray (removed completely from NumPy)
issubclass_
tracemalloc_domain (still available in np.lib)
mat (alias for np.asmatrix)
recfromcsv and recfromtxt (still available under np.lib.npyio module)
safe_eval, deprecate, deprecate_with_doc (all three already deprecated)
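For downstream code hit by the removals listed above, a rough migration aid could look like the sketch below. The replacement names follow my reading of the NumPy 2.0 migration notes and are not part of this PR's diff. The helper `suggest_replacements` and the `REPLACEMENTS` table are hypothetical names made up for illustration, and the table only covers the aliases that have a one-to-one spelling.

```python
import re

# Old main-namespace name -> suggested spelling (illustrative, not exhaustive).
REPLACEMENTS = {
    "float_": "float64",
    "complex_": "complex128",
    "longfloat": "longdouble",
    "singlecomplex": "complex64",
    "cfloat": "complex128",
    "longcomplex": "clongdouble",
    "clongfloat": "clongdouble",
    "string_": "bytes_",
    "unicode_": "str_",
    "Inf": "inf",
    "Infinity": "inf",
    "infty": "inf",
    "NaN": "nan",
    "mat": "asmatrix",
}

# Match `np.<name>` or `numpy.<name>` only when <name> is the whole attribute,
# so that e.g. `np.float_power` or `np.matmul` are left alone.
_PATTERN = re.compile(
    r"\b(np|numpy)\.(" + "|".join(map(re.escape, REPLACEMENTS)) + r")\b"
)


def suggest_replacements(source: str) -> str:
    """Return `source` with removed aliases rewritten to their suggested names."""
    return _PATTERN.sub(
        lambda m: f"{m.group(1)}.{REPLACEMENTS[m.group(2)]}", source
    )


print(suggest_replacements("x = np.float_(1.0) + np.NaN"))
```

This is plain text substitution, so it will also rewrite matches inside strings and comments; any result should be reviewed by hand.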

ngoldbaum · 2023-08-14T17:50:22Z

It looks like the doctests for doc/source/reference/arrays.dtypes.rst are failing because np.unicode_ got removed.

ngoldbaum

Left a few comments below. I didn't catch any other issues, but this probably needs at least one more careful review before it can be merged.

doc/source/reference/arrays.dtypes.rst

numpy/doc/constants.py

seberg · 2023-08-15T21:10:41Z

For some I wonder if there is enough use that we shouldn't bother, but overall nothing springs up a lot. tracemalloc_domain should be public, or did you add it to the docs somewhere? It could (and arguably should) be moved maybe, but not sure where exactly; and doing so is fine, since it should only be relevant for debugging.
asfarray looks terrifying.

Someone at pandas recently mentioned they rely on sctypeDict to find a list of all dtypes/scalars. I am unsure if we don't have to provide something new if we slash it down a lot (although I generally like slashing it down quite a bit).

rgommers · 2023-08-16T08:57:30Z

Someone at pandas recently mentioned they rely on sctypeDict to find a list of all dtypes/scalars. I am unsure if we don't have to provide something new if we slash it down a lot (although I generally like slashing it down quite a bit).

It looks undocumented besides a single brief mention in https://numpy.org/devdocs/reference/arrays.dtypes.html, and it's unclear what the design of it is. I think it's worth revisiting in a follow-up PR after some of the aliases are removed, and then (assuming we keep it) documenting properly what will be in it and how users are expected to use it?

Right now this seems like total chaos:

>>> np.sctypeDict.keys()
dict_keys(['?', 0, 'byte', 'b', 1, 'ubyte', 'B', 2, 'short', 'h', 3, 'ushort', 'H', 4, 'i', 5, 'uint', 'I', 6, 'intp', 'p', 7, 'uintp', 'P', 8, 'long', 'l', 'ulong', 'L', 'longlong', 'q', 9, 'ulonglong', 'Q', 10, 'half', 'e', 23, 'f', 11, 'double', 'd', 12, 'longdouble', 'g', 13, 'cfloat', 'F', 14, 'cdouble', 'D', 15, 'clongdouble', 'G', 16, 'O', 17, 'S', 18, 'unicode', 'U', 19, 'void', 'V', 20, 'M', 21, 'm', 22, 'b1', 'bool8', 'i8', 'int64', 'u8', 'uint64', 'f2', 'float16', 'f4', 'float32', 'f8', 'float64', 'f16', 'float128', 'c8', 'complex64', 'c16', 'complex128', 'c32', 'complex256', 'object0', 'bytes0', 'str0', 'void0', 'M8', 'datetime64', 'm8', 'timedelta64', 'int32', 'i4', 'uint32', 'u4', 'int16', 'i2', 'uint16', 'u2', 'int8', 'i1', 'uint8', 'u1', 'complex_', 'single', 'csingle', 'singlecomplex', 'float_', 'intc', 'uintc', 'int_', 'longfloat', 'clongfloat', 'longcomplex', 'bool_', 'bytes_', 'string_', 'str_', 'unicode_', 'object_', 'int', 'float', 'complex', 'bool', 'object', 'str', 'bytes', 'a', 'int0', 'uint0'])
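To see how few distinct scalar types sit behind all of those keys, one option is to invert the mapping. This is a minimal sketch, assuming `np.sctypeDict` is importable from the main namespace in the NumPy version in use; the variable names `aliases` and `unique_types` are mine.

```python
from collections import defaultdict

import numpy as np

# Group every key of np.sctypeDict by the scalar type it maps to.
aliases = defaultdict(list)
for key, scalar_type in np.sctypeDict.items():
    aliases[scalar_type].append(key)

# The distinct scalar types, sorted by name for a stable listing.
unique_types = sorted(aliases, key=lambda t: t.__name__)

for scalar_type in unique_types:
    print(scalar_type.__name__, aliases[scalar_type])
```

The exact keys printed depend on the platform and NumPy version, so the listing is best treated as a diagnostic rather than a stable interface.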

mtsokol · 2023-08-16T09:01:04Z

Someone at pandas recently mentioned they rely on sctypeDict to find a list of all dtypes/scalars. I am unsure if we don't have to provide something new if we slash it down a lot (although I generally like slashing it down quite a bit).

It looks undocumented besides a single brief mention in https://numpy.org/devdocs/reference/arrays.dtypes.html, and it's unclear what the design of it is. I think it's worth revisiting in a follow-up PR after some of the aliases are removed, and then (assuming we keep it) documenting properly what will be in it and how users are expected to use it?

I think #24411 is a good place to refactor building process of sctypeDict.

mtsokol · 2023-08-16T10:44:30Z

For some I wonder if there is enough use that we shouldn't bother, but overall nothing springs up a lot. tracemalloc_domain should be public, or did you add it to the docs somewhere? It could (and arguably should) be moved maybe, but not sure where exactly; and doing so is fine, since it should only be relevant for debugging. asfarray looks terrifying.

@seberg This PR is only about cleaning top-level namespace: tracemalloc_domain and asfarray are still available under np.lib.

rgommers · 2023-08-16T10:58:48Z

@seberg This PR is only about cleaning top-level namespace: tracemalloc_domain and asfarray are still available under np.lib.

I'd be +1 for removing asfarray completely right now, it's really bad and should be deleted.

seberg · 2023-08-16T11:40:34Z

asfarray is fine, just would be nice to explain a bit why we think it is bad. tracemalloc is also good to move, might be nice to find a better place eventually, but only documenting where it is important.

rgommers · 2023-08-16T12:29:55Z

asfarray is fine, just would be nice to explain a bit why we think it is bad

For asfarray, the semantics just seem unreasonable. dtype is optional but if not given, floating-point arrays are converted to float64 which is unexpected. And if dtype= is given a non-floating-point dtype, it just swallows it:

>>> x = np.array([1.5, 2.5], dtype=np.float32)
>>> np.asfarray(x)  # converts an array that is already floating-point: bad
array([1.5, 2.5])
>>> # writing it like this will preserve `float32` & co, but has other bad side-effects,
>>> # like not working with array-likes and ignoring explicitly non-float dtypes
>>> np.asfarray(x, dtype=x.dtype)
array([1.5, 2.5], dtype=float32)
>>> x = np.array([1, 2], dtype=np.int32)
>>> np.asfarray(x, dtype=x.dtype)
array([1., 2.])
>>> np.asfarray(x, dtype=bool)  # one would expect this to raise an exception
array([1., 2.])

The default behavior is better written as np.asarray(x, dtype=np.float64). And for the non-default behavior there are also better ways of writing whatever you actually want. Hence: not a useful function.
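For anyone porting code away from asfarray, the two intents described above can be spelled out explicitly. This is a minimal sketch, not something this PR adds: `as_float_array` is a hypothetical helper name, and rejecting non-floating `dtype` arguments is my assumption about what "one would expect this to raise" should mean.

```python
import numpy as np


def as_float_array(a, dtype=np.float64):
    """Hypothetical helper: keep floating-point input as-is, convert the rest.

    Unlike asfarray, a non-floating `dtype` is rejected rather than ignored.
    """
    dtype = np.dtype(dtype)
    if not np.issubdtype(dtype, np.floating):
        raise TypeError(f"expected a floating-point dtype, got {dtype}")
    arr = np.asarray(a)
    if np.issubdtype(arr.dtype, np.floating):
        return arr  # float16/float32/float64/longdouble are preserved
    return arr.astype(dtype)


x32 = np.array([1.5, 2.5], dtype=np.float32)
ints = np.array([1, 2], dtype=np.int32)

always_f64 = np.asarray(x32, dtype=np.float64)  # the old default behavior
preserved = as_float_array(x32)                 # stays float32
converted = as_float_array(ints)                # becomes float64
```

The first spelling makes the upcast visible at the call site; the helper is only worth having if preserving lower-precision floats is actually the goal.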

mtsokol · 2023-08-16T13:24:16Z

@rgommers Would you like me to purge np.asfarray completely here, or should I address it in the next PR? (In this PR it's only removed from the top namespace).

rgommers · 2023-08-16T13:39:53Z

I'd go ahead and delete it. Until now, the numpy and numpy.lib namespaces contained basically the same things, so separating the cleanup for things that we do not want to keep would probably mean having to look at such functions twice.

seberg · 2023-08-16T13:55:32Z

My guess: if users turn up, I would rather replace it with a more sane version that does the same thing than just hide it away.
In general, I am mainly worried about the mass of changes meaning that many users who are practically non-programmers run into these...

rgommers · 2023-08-16T14:04:32Z

I would rather replace it with a more sane version that does the same thing than just hide it away.

Are you proposing a silent behavioral change to np.asfarray here, so it preserves lower-precision floating point dtypes by default? That seems more against our backwards compatibility policy than simply removing it.

ngoldbaum · 2023-08-16T17:07:35Z

The doctest failure is caused by pandas trying to import np.NaN, which is removed in this PR. That will be fixed once pandas-dev/pandas#54579 is merged and a new pandas nightly wheel gets generated.

jakevdp · 2023-08-17T18:14:45Z

What is the reasoning for removing float_ and complex_, but not removing int_ and uint? It seems like all four are the same category of object: aliases for specific (default?) scalar types of a given type class.

For consistency, I'd advocate either removing all four, or keeping all four.

mtsokol · 2023-08-17T18:36:56Z

What is the reasoning for removing float_ and complex_, but not removing int_ and uint? It seems like all four are the same category of object: aliases for specific (default?) scalar types of a given type class.

For consistency, I'd advocate either removing all four, or keeping all four.

I think one reason is that int_ and uint are also canonical names, whereas float_ and complex_ are only aliases (the canonical names that they alias are np.double and np.cdouble).

In [24]: np.float_??
...
:Canonical name: `numpy.double`

In [25]: np.complex_??
...
:Canonical name: `numpy.cdouble`

In [26]: np.int_??
...
:Canonical name: `numpy.int_`

In [27]: np.uint??
...
:Canonical name: `numpy.uint`
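The alias relationship can also be checked without IPython. A small sketch: the identity checks rely on `np.double` and `np.cdouble` being the same objects as `np.float64` and `np.complex128`, and the `hasattr` loop simply reports whatever the installed NumPy exposes, so no particular output is claimed for it.

```python
import numpy as np

# The canonical C-style names and the sized names refer to the same objects.
double_is_float64 = np.double is np.float64
cdouble_is_complex128 = np.cdouble is np.complex128
print(double_is_float64, cdouble_is_complex128)

# Which of the four names under discussion exist depends on the NumPy version.
for name in ("float_", "complex_", "int_", "uint"):
    print(name, "available:", hasattr(np, name))
```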

effigies · 2023-08-28T15:30:15Z

Is numpy.core.sctypes still expected to be available in the long term, or should we be prepared for it to be deprecated and figure out alternatives?

rgommers · 2023-08-28T15:33:48Z

Is numpy.core.sctypes still expected to be available in the long term, or should we be prepared for it to be deprecated and figure out alternatives?

Please don't use anything from numpy.core. That whole module is not public and anything you use from it will likely go away in numpy 2.0.

effigies · 2023-08-28T15:37:41Z

Thanks for the quick response!

jakevdp · 2023-09-13T15:57:58Z

Hi - quick question: in the ml_dtypes project, we access numpy.sctypeDict in order to register new dtypes with NumPy (examples). How should we do this registration in a NumPy 2.0-compatible way?

mtsokol · 2023-09-13T16:26:19Z

Hi - quick question: in the ml_dtypes project, we access numpy.sctypeDict in order to register new dtypes with NumPy (examples). How should we do this registration in a NumPy 2.0-compatible way?

I don't have an answer right away. If there's no replacement then we can move np.sctypeDict back to the main namespace.
NEP 42 introduces a C-level mechanism for registering new dtypes, available with #include "numpy/experimental_dtype_api.h".

jakevdp · 2023-09-13T16:50:36Z

Thanks – experimental_dtype_api.h looks interesting, but given all the warnings and caveats it contains I'd hesitate to depend on it for a project that we'd like to be stable.

It would be great to figure out the correct way forward ASAP – currently JAX's nightly CI job testing against nightly numpy & scipy is broken by this with no obvious workaround. This CI run has proven valuable in the past for early flagging of potential incompatibilities, and I'd like to get it green again.

ngoldbaum · 2023-09-13T17:04:10Z

Hmm, I think we want to avoid breaking downstream, but I also am not sure whether we anticipated downstream users inserting things into that dictionary. @seberg, do you have a suggestion?

In the short term the easiest solution for jax would either be to restore the removed item or ask them to use the private one in core for now.

ngoldbaum · 2023-09-13T18:13:05Z

We chatted about this in this week's numpy community meeting and decided to restore sctypeDict to the main namespace for now to fix downstream CI. @mtsokol should take care of that soon.

We'd like to find a way to do what ml_dtypes is trying to do with sctypeDict without manipulating global state in NumPy. Can you share a bit more detail about why sctypeDict in particular needs to be touched? Is it just so that e.g. np.dtype('int4') works and returns the ml_dtypes int4 dtype or are there other implications? Do you want users to access dtypes defined in ml_dtypes via np.dtypes or was this choice made for convenience in the implementation?

In the experimental dtype API, user-defined dtypes don't have any character codes at all and the vision is that users manipulate them by importing the dtype class from a python namespace rather than passing a string to np.dtype. That said, maybe we should support string dtype names? But I think at a minimum the names would need to be namespaced so that e.g. Jax's int4 doesn't step on some other project's int4 (or numpy's if numpy adds it in the future).

If you'd like, we have another community meeting two weeks from today that you're welcome to drop in on to discuss this further. We'd also be happy to schedule a low-latency chat or find an appropriate location in Jax community spaces to have this discussion if hiding it in a merged NumPy github PR is obnoxious.

hawkinsp · 2023-09-13T18:21:14Z

@ngoldbaum Indeed, it's so np.dtype('int4') works. Now, maybe we should just say "that shouldn't work". I don't think it would be that disruptive, since you can also spell that as np.dtype(ml_dtypes.int4) (ml_dtypes.int4 is a scalar type object), but we'd need to go through a deprecation period of our own for that.

But yes, you should probably decide if you want to support named lookups of extension types.

(Also, I'll add that ml_dtypes only exists because these types aren't upstream. We'd be thrilled to upstream whatever you want to accept, assuming it doesn't require a gigantic amount of work.)

ngoldbaum · 2023-09-13T20:15:42Z

but we'd need to go through a deprecation period of our own for that

NumPy should also probably deprecate and eventually prevent registering dtype names like this, see #24699

seberg · 2023-09-18T11:36:27Z

Hmm, I think we want to avoid breaking downstream, but I also am not sure whether we anticipated downstream users inserting things into that dictionary. @seberg, do you have a suggestion?

I suspected/mentioned this was likely to happen and while I don't like this, I suspect we may need to just allow it. I would be fine with forcing ml_dtypes to use a hidden away function to "register" themselves, so that sctypeDict isn't really directly mutable.
(A mini-string language seems almost ridiculous to me to replace from ml_dtypes import bfloat16 and using that.)
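To make the two ideas in this part of the thread concrete (namespaced names, and registration through a function instead of direct mutation), here is a pure-Python sketch. Nothing in it is a NumPy API: `register_dtype_name`, `dtype_names`, and the stand-in classes are all hypothetical, and a real mechanism would live inside NumPy and deal with dtype objects rather than bare classes.

```python
from types import MappingProxyType

_registry = {}

# Read-only view: callers can look names up but cannot assign into it.
dtype_names = MappingProxyType(_registry)


def register_dtype_name(namespace, name, scalar_type):
    """Register `scalar_type` under the key "<namespace>.<name>"."""
    if not namespace.isidentifier() or not name.isidentifier():
        raise ValueError("namespace and name must be valid identifiers")
    key = f"{namespace}.{name}"
    if key in _registry:
        raise ValueError(f"{key!r} is already registered")
    _registry[key] = scalar_type
    return key


class Int4:
    """Stand-in for one project's 4-bit integer scalar type."""


class OtherInt4:
    """Stand-in for another project's type with the same short name."""


# Two projects can both offer an "int4" without stepping on each other.
register_dtype_name("ml_dtypes", "int4", Int4)
register_dtype_name("other_project", "int4", OtherInt4)
print(sorted(dtype_names))
```

With a scheme like this, a bare `"int4"` stays free for NumPy itself, while importing the class directly remains the simpler route that the comment above prefers.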

seberg · 2023-09-18T11:47:03Z

(Also, I'll add that ml_dtypes only exists because these types aren't upstream. We'd be thrilled to upstream whatever you want to accept, assuming it doesn't require a gigantic amount of work.)

Yes, at least bfloat16 did meet sympathy I think, but we also need a bit more to decide than just "would be nice" and maybe a write up, etc. Which is all not quite trivial:

Do we do this top-level or hide it away a bit?
If it is top-level, does it support all ufuncs? What happens if a ufunc is not supported, will it do the (IMO not great) thing that float16 currently often does and just upcast to float32?

Maybe all of these turn out to be details we can ignore, but the question is how to push it forward, and I suspect it needs a bit of a fatter proposal than "upstream it". Even if the implementation may end up being only that.

github-actions bot added the 30 - API label Aug 9, 2023

mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch 3 times, most recently from bd8ab0a to bb5b6da Compare August 13, 2023 11:09

mtsokol marked this pull request as ready for review August 14, 2023 11:54

mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch from fa785a5 to 1fd0839 Compare August 14, 2023 12:58

ngoldbaum reviewed Aug 15, 2023

mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch from ffffe98 to d065f8f Compare August 16, 2023 10:47

mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch from d065f8f to c440330 Compare August 16, 2023 11:44

mtsokol mentioned this pull request Aug 16, 2023

ENH: Reflect changes from numpy namespace refactor Part 3 pandas-dev/pandas#54579

Merged

mtsokol requested a review from ngoldbaum August 16, 2023 13:26

mtsokol mentioned this pull request Aug 16, 2023

ENH: Reflect changes from numpy namespace refactor part 3 scipy/scipy#19078

Merged

This was referenced Aug 17, 2023

MNT: Update dtypes to reflect numpy namespace refactor scikit-learn/scikit-learn#27082

Closed

ENH: Reflect changes from numpy namespace refactor Part 3 jax-ml/jax#17155

Closed

jakevdp mentioned this pull request Aug 17, 2023

deprecate jax.numpy.issubsctype jax-ml/jax#17160

Merged

effigies mentioned this pull request Aug 28, 2023

Replace np.sctypes for numpy 2.0 compat nipy/nibabel#1250

Merged

neutrinoceros mentioned this pull request Aug 31, 2023

BUG: fix an incompatibility with numpy 2.0 (np.string_ was removed) SAIL-Labs/AMICAL#183

Merged

larrybradley mentioned this pull request Sep 7, 2023

MNT: Remove np.float_ alias; it is removed in NumPy 2.0 scikit-image/scikit-image#7118

Merged

braingram mentioned this pull request Sep 12, 2023

Fix a bug preventing usage of numpy.linalg.inv and fix numpy 2.0 compatibility spacetelescope/tweakwcs#185

Merged

jakevdp mentioned this pull request Sep 13, 2023

⚠️ Nightly upstream-dev CI failed ⚠️ jax-ml/jax#16989

Closed

ngoldbaum mentioned this pull request Sep 13, 2023

DEP: Deprecate registering dtype names with np.sctypeDict? #24699

Open

jakevdp mentioned this pull request Sep 18, 2023

Discussion: uint / int_ / float_ / complex_ in NEP 52 #24743

Closed

WilliamJamieson mentioned this pull request Sep 25, 2023

Fixes for NumPy 2.0 compatiblity seperman/deepdiff#422

Merged

braingram mentioned this pull request Sep 27, 2023

numpy 2.0 fixes spacetelescope/webbpsf#743

Merged

melissawm mentioned this pull request Dec 6, 2023

DOC: The examples for numpy.mat do not demonstrate numpy.mat #20522

Closed

BvB93 added a commit to BvB93/numpy that referenced this pull request Dec 21, 2023

TYP: Remove remnants of functions deprecated in numpy#24376

41e50df

Xref numpy#24376

BvB93 added a commit to BvB93/numpy that referenced this pull request Dec 21, 2023

TYP: Remove remnants of functions deprecated in numpy#24376

0c259b5

Xref numpy#24376

lucascolley mentioned this pull request Dec 24, 2023

Use of NumPy type aliases to be removed in NumPy 2.0 cupy/cupy#8049

Closed

ngoldbaum mentioned this pull request May 30, 2024

MNT: np.set_string_function implementation should be removed #26576

Closed

andrew-s28 mentioned this pull request Jun 17, 2024

Updating for NumPy 2.0 OceanParcels/Parcels#1588

Closed

schloerke added a commit to posit-dev/py-shiny that referenced this pull request Jun 18, 2024

Use np.nan not, NaN

e365a4a

numpy/numpy#24376

schloerke mentioned this pull request Jun 18, 2024

chore: Use np.nan instead of np.NaN posit-dev/py-shiny#1468

Merged

PGijsbers mentioned this pull request Jul 3, 2024

DOC: sctypes & sctypeDict not documented #12334

Open
