Skip to content

API: Add DType classes into new numpy.dtypes module #23358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 27, 2023

Conversation

seberg
Copy link
Member

@seberg seberg commented Mar 7, 2023

This PR consists of two (initial) commits, which can be viewed independently.

  1. Reorganize the name creating in C, because I needed the names in C later, and it seemed much easier to do it in one place. I think this is a nice cleanup (I am sure a lot more could be done).

  2. Add numpy.types and add new names to it. I have picked Float64DType, StrDType, LongDoubleDType, and camel case for the datetimes. The integers are both as C names and as integer names (this matches the scalars). The reason is that otherwise not all are available and platform dependent.

The docs are just a table for now, I am sure this can be expanded.

(Probably should have a release note, because it is an addition, albeit not interesting for a wide audience probably.)


I am not set on the names, or even on numpy.types, it just seemed like a decent thing and the only alternative was numpy.dtypes but this way we could add e.g. the ArrayFunctionDispatcher to it also.

@BvB93 would this need more from a typing perspective?


EDIT: Direct link to the generated docs, which gives a clearer overview maybe https://output.circle-artifacts.com/output/job/6842ef47-ca31-498b-bee0-4f3b94e9c090/artifacts/0/doc/build/html/reference/routines.other.html#module-numpy.types

@seberg seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Mar 7, 2023
@seberg seberg added 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes 09 - Backport-Candidate PRs tagged should be backported 30 - API and removed 30 - API 09 - Backport-Candidate PRs tagged should be backported labels Mar 7, 2023
@seberg seberg force-pushed the dtype-class-in-types branch from 7ecb74b to 39c1cfb Compare March 7, 2023 21:00
@ngoldbaum
Copy link
Member

Would it make sense for the __class_getitem__ implementation on np.dtype to return the actual dtype class instances in np.types? If for some reason that breaks the typing support, would it be possible to change the repr for the GenericAlias instance you get back right now to at least point runtime users to the correct place to get the type they want?

In [2]: np.dtype[str]
Out[2]: numpy.dtype[str]

In [3]: type(_)
Out[3]: types.GenericAlias

@seberg
Copy link
Member Author

seberg commented Mar 8, 2023

Just to be clear, are you suggesting that as a replacement or as additional to this? I had that originally even, but we pulled it out for GenericAlias.

I have not digged in whether making __class_getitem__ work for both typing and this can work. I.e. why not return concrete instance (when it exists!) and otherwise the GenericAlias?! Maybe that would even be better for typing?

@ngoldbaum
Copy link
Member

I think it could work as a replacement, although I like the nice explicit names you've given the dtype classes here, so I guess weakly as an addition to this.

I don't know enough about numpy's typing support or python typing in general to say whether or not the concrete types would break anything, but I think eliminating the use of GenericAlias and instead returning concrete dtype classes would be best if that's possible.

@seberg
Copy link
Member Author

seberg commented Mar 8, 2023

@BvB93 do you know if adapting __class_getitem__ to return the concrete classes would work? I guess np.dtype[Any] must keep returning the GenericAlias though at the very least?!

I don't hate it as a replacement, although the current class __name__ is also annoying (since it is not a proper name)! So additionally to np.dtype[...] this does give them a nicer name (although not much nicer to read, but one that makes sense as a variable name).

@seberg
Copy link
Member Author

seberg commented Mar 16, 2023

OK, I think the answer for modifying np.dtype[np.float64] at runtime is a naaaah: https://mail.python.org/archives/list/typing-sig@python.org/thread/LRDYM2MR3ZNXG6IUFTDCDJBOOVDT2U72/

The only alternative then would be if we want np.dtypes or so as a namespace, and the only argument for that I have is that we might at some point want more functionality similar to np.promote_types (e.g. it is probably a good addition to have np.cast_dtype).

@seberg
Copy link
Member Author

seberg commented Mar 23, 2023

@ngoldbaum (and others), whats your take on this? How can we come to a decision whether to just do this or what alternatives to explore?

@ngoldbaum
Copy link
Member

Given the reaction you got on typing-sig, I think this PR is probably the way to go.

@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Mar 27, 2023
@seberg
Copy link
Member Author

seberg commented Mar 29, 2023

@rgommers and @mattip should we go ahead with this?

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public API and docs of this new np.types submodule LGTM, so +1 for those.

I did not review the C changes.

Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the later reply, been a bit busy as of late unfortunately. Do have one comment on types.pyi though.

numpy/types.pyi Outdated
@@ -0,0 +1,39 @@
from numpy.types import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem here is that type checkers only see the relevant .pyi file as it shadows the underlying .py file; so your effectively trying to import objects from within the same modular here. Instead, the dtype types would have to be explicitly defined here:

import numpy as np

__all__: list[str]

BoolDType = np.dtype[np.bool_]
UInt64DType = np.dtype[np.uint64]
ByteDType = np.dtype[np.byte]
# etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks. One day we might want additional attributes like FloatDType.finfo which seem hard to combine with np.dtype[np.float32]. But I guess that is a bridge to cross when we come to it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that one might actually be fairly easy to do by only allowing specific dtype types for self. There are handful of similar patterns already in the code for structured arrays for example, which are the only arrays that allow for string-based indexing.

_DTypeScalar_co = TypeVar("_DTypeScalar_co", covariant=True, bound=generic)
_NBit = TypeVar("_NBit1", bound=NBitBase)

class dtype(Generic[_DTypeScalar_co]):
    @property
    def finfo(self: dtype[np.inexact[_NBit]]) -> np.finfo[np.floating[_NBit]]:

@seberg seberg force-pushed the dtype-class-in-types branch from 05b2174 to b439175 Compare April 5, 2023 14:46
@seberg seberg removed the triage review Issue/PR to be discussed at the next triage meeting label Apr 5, 2023
@rgommers
Copy link
Member

rgommers commented Apr 7, 2023

When reviewing NEP 52, both @mhvk and @stefanv commented about the choice for types as a submodule name as not ideal, thinking it was a choice that that NEP made. It's not, the choice is being made here - so I'll xref this PR and let Stefan and Marten comment here for themselves.

@mhvk
Copy link
Contributor

mhvk commented Apr 7, 2023

Hmm, I had seen this PR vaguely and thought it was all about typing... But now see it is about storing dtypes somewhere. Copying my comment:

I think that makes sense, though would love to stop using type as equivalent to a data type as much as possible... Since this is a new module, there is an opportunity in naming it. data_types or datatypes are probably too long; is dtypes an option? Or too confusing with np.dtype?
Or could it be np.dt? One could then think of pre-defining some dtype so that one could do np.array(..., dtype=np.dt.f8).

I actually like my last suggestion, of having the dtypes defined by the abbreviations. This is similar to what we do for astropy.units, where the module contains the long and short names of all the units as pre-defined instances (well, one could in principle have __getattr__ define them as needed...)

Of course, one could also encourage a standard pattern like import numpy.dtypes as dt - that allows a longer name.

Anyway, I think the name is important and types is a bad one! I just implies classes to me...

@mhvk
Copy link
Contributor

mhvk commented Apr 7, 2023

Reading #23537 (comment), I'm a bit less sure about having f8 etc as instances; maybe float64 is better and clearer. Though ideally float64 and co would then be instances of dtype classes, not classes of scalars. Would this be a right time to do that? (In the new module at least)

@stefanv
Copy link
Contributor

stefanv commented Apr 7, 2023

My gut feeling aligns with Marten's:

  • If we use numpy.types to suggest an equivalence between "numpy types" and "numpy data type", that's an unnecessary (and for new users counter-intuitive) jump.
  • If we use numpy.types to suggest that we will only store actual types in the submodule, that is unnecessarily restrictive.

Sure, the DTypes may be types, but that is not relevant to how we group the functionality here, and so I would also prefer np.dtypes (np.dt could work, but may be a bit cryptic to new users).

@ksunden
Copy link
Contributor

ksunden commented Apr 7, 2023

Personally, I don't like dt because it is relatively common to do import datetime as dt

(including internal to numpy source, as well as at least pandas and xarray)

@mhvk
Copy link
Contributor

mhvk commented Apr 7, 2023

On further thought, I agree that it should not be np.dt - better to have np.dtypes (or perhaps np.datatypes). Users can then decide themselves whether or not to do a import numpy.d[ata]types as dt (datetime cannot handle leap seconds so is pretty useless in astronomy...)

@seberg
Copy link
Member Author

seberg commented Apr 11, 2023

The question is a bit what we want. I don't disagree with np.dtypes or so as a name. The question is whether np.dtypes.f8 for example is a useful alias for Float64DType (the class) or not for the singleton instance np.dtype("f8"). (This is not always clear, because semantics can differ in subtle ways and I am not actually sure what we want typically: asarray(asarray([1.], ">f8"), dtype=Float64DType) might preserve the endianess. OTOH, the dtype/signature in ufuncs accepts the instance but really cares more about the class).

The follow-up question then would be: If np.dtypes.f8 are not then hiding them in np.types seems OK to me again. OTOH, maybe it should be the classes and if you are worried about possible funny byte-order preserving, you should write dtype=dt.f8().

I can see more dtype utility functions, for example, we are missing a way to spell new_dtype = dtype.astype(new_dtype_or_Class) (which could be a method as here, though). The drop metadata hanging PR is also one (could be a method as well).

@rgommers
Copy link
Member

The question is a bit what we want. I don't disagree with np.dtypes or so as a name. The question is whether np.dtypes.f8

I think these are two separate questions. The one on the submodule name (types, dtypes, or other) clearly needs a decision. Introducing new aliases like f8 is unrelated to that, and I'm personally -1 on that given that we already have way too many aliases floating around. Use of scalars should be discouraged, so new short aliases are going in the wrong direction.

@seberg
Copy link
Member Author

seberg commented Apr 11, 2023

My thought was: If the purpose here is to house the DType classes for niche use-cases, I am not sure putting them into a np.dtypes namespace that may also house convenience aliases like np.dtypes.f8 if all that useful.

@rgommers
Copy link
Member

I suggest to forget about f8 and focus on the main comments of Marten and Stefan, who both don't like the types name.

If the purpose here is to house the DType classes for niche use-cases,

That is not the sole stated purpose in the docs. From types.py in this PR: Similar to the builtin types module, this submodule defines types (classes) that are not widely used directly.

DType classes are a subset of that, so I'm personally fine with the types name. And dtypes seems more restrictive. What if we'd want to move other classes here, like return types for objects like what np.finfo returns?

@seberg
Copy link
Member Author

seberg commented Apr 11, 2023

The point of the original suggestion was to have a place to put types/classes that users rarely/never access. DTypes are most of those (there could be more, like np.ufunc, ArrayFunctionDispatcher).

I don't care about the name, the suggestion was because Python has types as a catch-all for classes that users rarely access directly. The point I wanted to make still stands: If we don't expect it to house any conveniences but only rarely used classes, I am not sure I mind the name. If we want it to house more dtypes related stuff, then that is a very clear argument against the name.

That doesn't mean I care much either way, I don't.

@mhvk
Copy link
Contributor

mhvk commented Apr 11, 2023

Agreed to leave out f8, etc. They were not obviously a good idea.

On the name: I came very much at this from the NEP write-up, so had no idea about the larger idea of types becoming a collection of rarely used but exposed classes. I can see that logic, but wonder if it goes too much in the direction of just having another lib or utils. It also does not look much like python's types module, which does indeed have a set of classes, but they are mostly the building blocks, not collections of related classes like the data types. If one were to stick with np.types, it might be more logical to define a np.types.dtypes submodule.

But with all the work to define the data types, it seems OK to me for them to have their own space even if it is not used much -- and it may get used more than you think once it becomes easy to define user dtype in python rather than in C! So, from that perspective, I think it still makes sense to have np.dtypes as a place to hold everything related to how dtypes are defined.

@seberg
Copy link
Member Author

seberg commented Apr 12, 2023

That's fair. At least abstract DTypes should happen pretty soon, and unlike these classes may be useful a bit more often.

The first/main "create DType in Python" thing, that I currently think would be nice would be to create a custom structured DTypes in a manner similar to dataclasses.

@seberg seberg changed the title API: Add DType classes into new numpy.types module API: Add DType classes into new numpy.dtypes module Apr 12, 2023
@seberg seberg removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Apr 12, 2023
@seberg seberg force-pushed the dtype-class-in-types branch from b439175 to 569f7bc Compare April 12, 2023 10:40
@seberg
Copy link
Member Author

seberg commented Apr 12, 2023

OK, I updated this to np.dtypes. I am assuming we will at least put the open PR for drop_metadata there (we can always deprecate+replace).

@seberg
Copy link
Member Author

seberg commented Apr 25, 2023

Can we move this forward? If we have it, we can use it to fix-up and put in the "drop_meatadata" function.

@seberg seberg requested a review from mattip April 26, 2023 19:32
Copy link
Member

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, a few nits in comments and docstrings. The module appears in the documentation on the routines.other page, and as far as I can tell no other documentation links to it. I guess that is OK for now, we can always add more links if the module is helpful.

Co-authored-by: Matti Picus <matti.picus@gmail.com>
@mattip mattip merged commit 6d34e34 into numpy:main Apr 27, 2023
@mattip
Copy link
Member

mattip commented Apr 27, 2023

Thanks @seberg

@seberg seberg deleted the dtype-class-in-types branch April 27, 2023 11:17
BvB93 added a commit to BvB93/numpy that referenced this pull request May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
30 - API 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. Numpy 2.0 API Changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants