Skip to content

API: Cleaning numpy/__init__.py and main namespace - Part 3 [NEP 52] #24376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Aug 25, 2023

Conversation

mtsokol
Copy link
Member

@mtsokol mtsokol commented Aug 9, 2023

Follow-up PR after #24357.

Current scope of the PR:

  1. Remove other aliases:
  • float_
  • complex_
  • longfloat
  • singlecomplex
  • cfloat
  • longcomplex
  • clongfloat
  • string_
  • unicode_
  1. Continue establishing main namespace as is in ENH: Overhaul of NumPy main namespace [NEP 52] #24306 (removed next 18 "remove" column items), remove:
  • Inf, Infinity, NaN, infty
  • issctype, maximum_sctype, obj2sctype, sctype2char, sctypeDict, sctypes, issubsctype
  • set_string_function,
  • asfarray (removed completely from NumPy)
  • issubclass_
  • tracemalloc_domain (still available in np.lib)
  • mat (alias for np.asmatrix)
  • recfromcsv and recfromtxt (still available under np.lib.npyio module)
  • safe_eval, deprecate, deprecate_with_doc (all three already deprecated)

@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch 3 times, most recently from bd8ab0a to bb5b6da Compare August 13, 2023 11:09
@mtsokol mtsokol marked this pull request as ready for review August 14, 2023 11:54
@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch from fa785a5 to 1fd0839 Compare August 14, 2023 12:58
@ngoldbaum
Copy link
Member

It looks like the doctests for doc/source/reference/arrays.dtypes.rst are failing because np.unicode_ go removed.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments below, I didn't catch any other issues but this probably needs at least one more careful review before it can be merged.

@seberg
Copy link
Member

seberg commented Aug 15, 2023

Some I wonder if there is enough use that we shouldn't bother, but overall nothing springs up a lot. tracemalloc_domain should be public, or did you add it to the docs somehwere? It could (and arguably should) be moved maybe, but not sure where exactly; and doing so is fine, since it should only be relevant for debugging.
asfarray looks terrifying.

Someone at pandas recently mntiond they rely on sctypeDict to find a list of all dtyps/scalars. I am unsure if we don't have to provide something new if we slash it down a lot (although I generally like slashing it down quit a bit).

@rgommers
Copy link
Member

Someone at pandas recently mntiond they rely on sctypeDict to find a list of all dtyps/scalars. I am unsure if we don't have to provide something new if we slash it down a lot (although I generally like slashing it down quit a bit).

It looks undocumented besides a single brief mention in https://numpy.org/devdocs/reference/arrays.dtypes.html, and it's unclear what the design of it is. I think it's worth revisiting in a follow-up PR after some of the aliases are removed, and then (assuming we keep it) documenting properly what will be in it and how users are expected to use it?

Right now this seems like total chaos:

>>> np.sctypeDict.keys()
dict_keys(['?', 0, 'byte', 'b', 1, 'ubyte', 'B', 2, 'short', 'h', 3, 'ushort', 'H', 4, 'i', 5, 'uint', 'I', 6, 'intp', 'p', 7, 'uintp', 'P', 8, 'long', 'l', 'ulong', 'L', 'longlong', 'q', 9, 'ulonglong', 'Q', 10, 'half', 'e', 23, 'f', 11, 'double', 'd', 12, 'longdouble', 'g', 13, 'cfloat', 'F', 14, 'cdouble', 'D', 15, 'clongdouble', 'G', 16, 'O', 17, 'S', 18, 'unicode', 'U', 19, 'void', 'V', 20, 'M', 21, 'm', 22, 'b1', 'bool8', 'i8', 'int64', 'u8', 'uint64', 'f2', 'float16', 'f4', 'float32', 'f8', 'float64', 'f16', 'float128', 'c8', 'complex64', 'c16', 'complex128', 'c32', 'complex256', 'object0', 'bytes0', 'str0', 'void0', 'M8', 'datetime64', 'm8', 'timedelta64', 'int32', 'i4', 'uint32', 'u4', 'int16', 'i2', 'uint16', 'u2', 'int8', 'i1', 'uint8', 'u1', 'complex_', 'single', 'csingle', 'singlecomplex', 'float_', 'intc', 'uintc', 'int_', 'longfloat', 'clongfloat', 'longcomplex', 'bool_', 'bytes_', 'string_', 'str_', 'unicode_', 'object_', 'int', 'float', 'complex', 'bool', 'object', 'str', 'bytes', 'a', 'int0', 'uint0'])

@mtsokol
Copy link
Member Author

mtsokol commented Aug 16, 2023

Someone at pandas recently mntiond they rely on sctypeDict to find a list of all dtyps/scalars. I am unsure if we don't have to provide something new if we slash it down a lot (although I generally like slashing it down quit a bit).

It looks undocumented besides a single brief mention in https://numpy.org/devdocs/reference/arrays.dtypes.html, and it's unclear what the design of it is. I think it's worth revisiting in a follow-up PR after some of the aliases are removed, and then (assuming we keep it) documenting properly what will be in it and how users are expected to use it?

I think #24411 is a good place to refactor building process of sctypeDict.

@mtsokol
Copy link
Member Author

mtsokol commented Aug 16, 2023

Some I wonder if there is enough use that we shouldn't bother, but overall nothing springs up a lot. tracemalloc_domain should be public, or did you add it to the docs somehwere? It could (and arguably should) be moved maybe, but not sure where exactly; and doing so is fine, since it should only be relevant for debugging. asfarray looks terrifying.

@seberg This PR is only about cleaning top-level namespace: tracemalloc_domain and asfarray are still available under np.lib.

@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch from ffffe98 to d065f8f Compare August 16, 2023 10:47
@rgommers
Copy link
Member

@seberg This PR is only about cleaning top-level namespace: tracemalloc_domain and asfarray are still available under np.lib.

I'd be +1 for removing fasarray completely right now, it's really bad and should be deleted.

@seberg
Copy link
Member

seberg commented Aug 16, 2023

asfarray is fine, just would be nice to explain a bit why we think it is bad. tracemalloc is also good to move, might be nice to find a better place eventually, but only documenting where it is important.

@mtsokol mtsokol force-pushed the overhaul-of-main-namespace-part-3 branch from d065f8f to c440330 Compare August 16, 2023 11:44
@rgommers
Copy link
Member

asfarray is fine, just would be nice to explain a bit why we think it is bad

For asfarray, the semantics just seem unreasonable. dtype is optional but if not given, floating-point arrays are converted to float64 which is unexpected. And if dtype= is given a non-floating-point dtype, it just swallows it:

>>> x = np.array([1.5, 2.5], dtype=np.float32)
>>> np.asfarray(x)  # converts an array that is already floating-point: bad
array([1.5, 2.5])
>>>  # writing it like this will preserve `float32` & co, but has other bad side-effects,
>>> # like not working with array-likes and ignoring explicitly non-float dtypes
>>> np.asfarray(x, dtype=x.dtype)
array([1.5, 2.5], dtype=float32)
>>> x = np.array([1, 2], dtype=np.int32)
>>> np.asfarray(x, dtype=x.dtype)
array([1., 2.])
>>> np.asfarray(x, dtype=bool)  # one would expect this to raise an exception
array([1., 2.])

The default behavior is better written as np.asarray(x, dtype=np.float64). And for the non-default behavior there's also better ways of writing whatever you actually want. Hence: not a useful function.

@mtsokol
Copy link
Member Author

mtsokol commented Aug 16, 2023

@rgommers Would like me to purge np.asfarray completely here, or should I address it in the next PR? (In this PR it's only removed from the top namespace).

@mtsokol mtsokol requested a review from ngoldbaum August 16, 2023 13:26
@rgommers
Copy link
Member

I'd go ahead and delete it. Until now, the numpy and numpy.lib namespaces contained basically the same things, so separating the cleanup for things that we do not want to keep is probably resulting in having to look at such functions twice.

@seberg
Copy link
Member

seberg commented Aug 16, 2023

My guess: if users turn up, I would rather replace it with a more sane version that does the same thing then just hide it away.
In general, I am mainly worried about the mass of changes meaning that many practically non-programmers run into these...

@rgommers
Copy link
Member

I would rather replace it with a more sane version that does the same thing then just hide it away.

Are you proposing a silent behavioral change to np.asfarray here, so it preserves lower-precision floating point dtypes by default? That seems more against our backwards compatibility policy than simply removing it.

@ngoldbaum
Copy link
Member

The doctest failure is caused by pandas trying to import np.NaN, which is removed in this PR. That will be fixed once pandas-dev/pandas#54579 is merged and a new pandas nightly wheel gets generated.

@jakevdp
Copy link
Contributor

jakevdp commented Aug 17, 2023

What is the reasoning for removing float_ and complex_, but not removing int_ and uint? It seems like all four are the same category of object: aliases for specific (default?) scalar types of a given type class.

For consistency, I'd advocate either removing all four, or keeping all four.

@mtsokol
Copy link
Member Author

mtsokol commented Aug 17, 2023

What is the reasoning for removing float_ and complex_, but not removing int_ and uint? It seems like all four are the same category of object: aliases for specific (default?) scalar types of a given type class.

For consistency, I'd advocate either removing all four, or keeping all four.

I think one reason is that int_ and uint are also canonical names, where float_ and complex_ are only aliases (canonical names that they alias are np.double and np.cdouble).

In [24]: np.float_??
...
:Canonical name: `numpy.double`

In [25]: np.complex_??
...
:Canonical name: `numpy.cdouble`

In [26]: np.int_??
...
:Canonical name: `numpy.int_`

In [27]: np.uint??
...
:Canonical name: `numpy.uint`

@effigies
Copy link
Contributor

Is numpy.core.sctypes still expected to be available in the long term, or should we be prepared for it to be deprecated and figure out alternatives?

@rgommers
Copy link
Member

Is numpy.core.sctypes still expected to be available in the long term, or should we be prepared for it to be deprecated and figure out alternatives?

Please don't use anything from numpy.core. That whole module is not public and anything you use from it will likely go away in numpy 2.0.

@effigies
Copy link
Contributor

Thanks for the quick response!

@jakevdp
Copy link
Contributor

jakevdp commented Sep 13, 2023

Hi - quick question: in the ml_dtypes project, we access numpy.sctypeDict in order to register new dtypes with NumPy (examples). How should we do this registration in a NumPy 2.0-compatible way?

@mtsokol
Copy link
Member Author

mtsokol commented Sep 13, 2023

Hi - quick question: in the ml_dtypes project, we access numpy.sctypeDict in order to register new dtypes with NumPy (examples). How should we do this registration in a NumPy 2.0-compatible way?

I don't have an answer right away. If there's no replacement then we can move np.sctypeDict back to the main namespace.
NEP 42 introduces a C-level mechanism for registering new dtypes, available with #include "numpy/experimental_dtype_api.h".

@jakevdp
Copy link
Contributor

jakevdp commented Sep 13, 2023

Thanks – experimental_dtype_api.h looks interesting, but given all the warnings and caveats it contains I'd hesitate to depend on it for a project that we'd like to be stable.

It would be great to figure out the correct way forward ASAP – currently JAX's nightly CI job testing against nightly numpy & scipy is broken by this with no obvious workaround. This CI run has proven valuable in the past for early flagging of potential incompatibiilties, and I'd like to get it green again.

@ngoldbaum
Copy link
Member

Hmm, I think we want to avoid breaking downstream, but I also am not sure whether we anticipated downstream users inserting things into that dictionary. @seberg, do you have a suggestion?

In the short term the easiest solution for jax would either be to restore the removed item or ask them to use the private one in core for now.

@ngoldbaum
Copy link
Member

We chatted about this in this week's numpy community meeting and decided to restore sctypeDict to the main namespace for now to fix downstream CI. @mtsokol should take care of that soon.

We'd like to find a way to do what ml_dtypes is trying to do with sctypeDict without manipulating global state in NumPy. Can you share a bit more detail about why sctypeDict in particular needs to be touched? Is it just so that e.g. np.dtype('int4') works and returns the ml_dtypes int4 dtype or are there other implications? Do you want users to access dtypes defined in ml_dtypes via np.dtypes or was this choice made for convenience in the implementation?

In the experimental dtype API, user-defined dtypes don't have any character codes at all and the vision is that users manipulate them by importing the dtype class from a python namespace rather than passing a string to np.dtype. That said, maybe we should support string dtype names? But I think at a minimum the names would need to be namespaced so that e.g. Jax's int4 doesn't step on some other project's int4 (or numpy's if numpy adds it in the future).

If you'd like, we have another community meeting two weeks from today that you're welcome to drop in on to discuss this further. We'd also be happy to schedule a low-latency chat or find an appropriate location in Jax community spaces to have this discussion if hiding it in a merged NumPy github PR is obnoxious.

@hawkinsp
Copy link
Contributor

@ngoldbaum Indeed, it's so np.dtype('int4') works. Now, maybe we should just say "that shouldn't work". I don't think it would be that disruptive, since you can also spell that as np.dtype(ml_dtypes.int4) (ml_dtypes.int4 is a scalar type object), but we'd need to go through a deprecation period of our own for that.

But yes, you should probably decide if you want to support named lookups of extension types.

(Also, I'll add that ml_dtypes only exists because these types aren't upstream. We'd be thrilled to upstream whatever you want to accept, assuming it doesn't require a gigantic amount of work.)

@ngoldbaum
Copy link
Member

but we'd need to go through a deprecation period of our own for that

NumPy should also probably deprecate and eventually prevent registering dtype names like this, see #24699

@seberg
Copy link
Member

seberg commented Sep 18, 2023

Hmm, I think we want to avoid breaking downstream, but I also am not sure whether we anticipated downstream users inserting things into that dictionary. @seberg, do you have a suggestion?

I suspected/mentioned this was likely to happen and while I don't like this, I suspect we may need to just allow it. I would be fine with forcing ml_dtypes to use a hidden away function to "register" themselves, so that sctypeDict isn't really directly mutable.
(A mini-string language seems almost ridiculous to me to replace from ml_dtypes import bfloat16 and using that.)

@seberg
Copy link
Member

seberg commented Sep 18, 2023

(Also, I'll add that ml_dtypes only exists because these types aren't upstream. We'd be thrilled to upstream whatever you want to accept, assuming it doesn't require a gigantic amount of work.)

Yes, at least bfloat16 did meet sympathy I think, but we also need a bit more to decide than just "would be nice" and maybe a write up, etc. Which is all not quite trivial:

  • Do we do this top-level or hide it away a bit?
  • If it is top-level, does it support all ufuncs. What happens if a ufunc is not supported, will it do the (IMO not great) thing that float16 currently often does and just upcast to float32?

Maybe all of these turn out to be details we can ignore, but the question is how to push it forward, and I suspect it needs a bit of a fatter proposal than "upstream it". Even if the implementation may end up being only that.

BvB93 added a commit to BvB93/numpy that referenced this pull request Dec 21, 2023
BvB93 added a commit to BvB93/numpy that referenced this pull request Dec 21, 2023
schloerke added a commit to posit-dev/py-shiny that referenced this pull request Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants