ENH: add a canonical way to determine if dtype is integer, floating point or complex #17325

Closed
rgommers opened this issue Sep 16, 2020 · 15 comments

Comments

@rgommers
Member

There is currently no good way (AFAIK) to figure out if the dtype of an array is integer, floating point or complex. Right now, one of these is probably the most common:

x.dtype.kind in np.typecodes["AllFloat"]
np.issubdtype(x.dtype, np.floating)

Both are pretty awful.

A naive way to write code in the absence of something like is_floating_point/is_integer/is_complex would be:

x.dtype in (np.float16, np.float32, np.float64)

The trouble is that we have extended precision dtypes, and only one of float96 or float128 will actually exist (the other one will raise an AttributeError, also annoying and a frequent source of bugs).
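
To make the platform dependence concrete, here is a small sketch that probes which extended-precision alias is defined on the running machine. The names `candidates` and `available` are illustrative only; which alias shows up (if any) depends on the platform.

```python
import numpy as np

# float96 and float128 are platform-dependent aliases; probe for them
# instead of accessing them directly, which could raise AttributeError.
candidates = ("float96", "float128")
available = [name for name in candidates if hasattr(np, name)]
print("extended-precision aliases on this platform:", available)

# np.longdouble is defined everywhere, whatever its actual width.
print("longdouble itemsize in bytes:", np.dtype(np.longdouble).itemsize)
```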

Adding a set of functions is_floating_point/is_integer/is_complex (whether with or without an underscore in the name, or naming it floating or floating_point) seems like a good idea to me.

In other libraries: TensorFlow doesn't seem to have any API for this, PyTorch has is_floating_point and is_complex.

Thoughts?

@eric-wieser
Member

eric-wieser commented Sep 16, 2020

In my mind, we already have canonical spellings for these and they are:

  • np.issubdtype(x.dtype, np.integer)
  • np.issubdtype(x.dtype, np.floating)
  • np.issubdtype(x.dtype, np.complexfloating)

Perhaps these spellings are too long, but "provide a convenient spelling" is different to "provide a canonical one".
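
As a rough illustration of how the three spellings above could be wrapped, consider the sketch below. The helper name `numeric_category` is hypothetical and not part of NumPy; only `np.issubdtype` and the abstract scalar classes come from the library.

```python
import numpy as np

def numeric_category(dtype):
    """Return 'integer', 'floating', 'complex' or 'other' for a dtype-like."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return "integer"
    if np.issubdtype(dtype, np.floating):
        return "floating"
    if np.issubdtype(dtype, np.complexfloating):
        return "complex"
    return "other"  # e.g. bool, str, object

x = np.arange(3, dtype=np.uint8)
print(numeric_category(x.dtype))
print(numeric_category(np.longdouble))
print(numeric_category(np.complex64))
print(numeric_category(bool))
```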


The trouble is that we have extended precision dtypes, and only one of float96 or float128 will actually exist (the other one will raise an AttributeError, also annoying and a frequent source of bugs).

While I would not recommend using this approach of listing the types, the safe spelling is

x.dtype in (np.half, np.single, np.double, np.longdouble)
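
A minimal sketch of that listing approach, for illustration rather than as a recommendation. `FLOAT_TYPES` and `is_listed_float` are made-up names; the C-like scalar type names are the only NumPy API used.

```python
import numpy as np

# The C-like names exist on every platform, unlike float96/float128.
FLOAT_TYPES = (np.half, np.single, np.double, np.longdouble)

def is_listed_float(arr):
    # Comparing a dtype with a scalar type coerces the type to a dtype,
    # so membership in a tuple of scalar types works as expected.
    return arr.dtype in FLOAT_TYPES

print(is_listed_float(np.zeros(2, dtype=np.float32)))
print(is_listed_float(np.zeros(2, dtype=np.int64)))
```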

@rgommers
Member Author

rgommers commented Sep 16, 2020

In my mind, we already have canonical spellings for these and they are

There's a number of issues with that:

  • It's not documented as far as I can tell, I'd expect it in https://numpy.org/devdocs/reference/arrays.dtypes.html or on https://numpy.org/devdocs/reference/routines.dtype.html.
  • If I check the NumPy and SciPy code bases, both methods I mentioned are used
  • If one reads the docstring for np.floating, all it says is "Abstract base class of all floating-point scalar types". So someone familiar with Python would probably try to use isinstance first.
  • The issubdtype docstring says "Returns True if first argument is a typecode lower/equal in type hierarchy." which won't make sense to many users, and it's not even clear why one would prefer that over issubsctype given the one-line descriptions.
  • Users that want to know "is this an array of floats or integers?" shouldn't have to understand the details of NumPy's dtype hierarchy.
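
The isinstance pitfall mentioned in the list above can be shown in a few lines. This is only a sketch; the variable names are illustrative.

```python
import numpy as np

x = np.zeros(3, dtype=np.float32)

# A scalar value is an instance of the abstract class ...
scalar_is_floating = isinstance(x[0], np.floating)
# ... but a dtype object is not, so the same check on the dtype is False.
dtype_is_floating = isinstance(x.dtype, np.floating)
# The scalar type behind the dtype is a subclass, which is what issubdtype tests.
type_is_floating = issubclass(x.dtype.type, np.floating)
via_issubdtype = np.issubdtype(x.dtype, np.floating)

print(scalar_is_floating, dtype_is_floating, type_is_floating, via_issubdtype)
```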

I'm not too interested in arguing about the semantics of "canonical", so let me just say: there is no good way of doing this currently - should we add is_floating, is_complex and is_integer or something similarly named as a sane way of doing this?

@eric-wieser
Member

My leaning is towards making the documentation much clearer for issubdtype and those abstract classes. We already have an open issue about the various inconsistent pages documenting the scalar types, fixing that would help a fair bit here.

Also, we already have

IMO, introducing more functions will make things more confusing not less.

@rgommers
Member Author

Oh yes, that doesn't help, more weird and badly documented functions :(

In [13]: z = np.arange(3, dtype=np.complex64)                              

In [14]: np.iscomplex(z)                                                   
Out[14]: array([False, False, False])

In [15]: np.iscomplexobj(z)                                                
Out[15]: True

In [16]: np.iscomplexobj(z.dtype)   # iscomplexobj docstring: "Check for a complex type ...."                        
Out[16]: False

@eric-wieser
Member

eric-wieser commented Sep 16, 2020

it's not even clear why one would prefer that over issubsctype given the one-line descriptions.

It looks like we should probably deprecate one or the other of these. issubsctype looks weird, because it tries to squash the semantics of isinstance and issubclass into a single function. obj2sctype suffers from the same weirdness.

@rgommers
Member Author

Deprecating issubsctype sounds like a good idea to me - but then maybe all the other sctype ones as well?

And hide some of the other weird stuff like issubclass_ towards the bottom of the page with some disclaimer, if we don't want to deprecate it?

@andyfaff
Member

You can distinguish integers from floating point immediately using np.issubdtype(o, np.inexact)
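
For example, a short sketch of that check (the helper name `is_inexact` is hypothetical):

```python
import numpy as np

def is_inexact(dtype):
    # np.inexact is the common base of floating and complex floating types.
    return bool(np.issubdtype(dtype, np.inexact))

for dt in (np.int32, np.uint8, np.float16, np.float64, np.complex128):
    print(np.dtype(dt).name, is_inexact(dt))
```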

@seberg
Member

seberg commented Oct 28, 2020

My mid-term goal would be that we do have an inexact DType, to be just a bit clearer than the np.inexact scalar. np.issubdtype does not check whether it is an "instance" (e.g. array with an inexact dtype or scalar of one).

That does not actually touch np.issubdtype, it rather makes it more precise/extensible.

I do not think we have an API equivalent to Python's isinstance, which would be something like:

def is_dtype_instance(obj, DType):
    return isinstance(np.array(obj).dtype, DType)

Admittedly, it currently works with np.issubdtype, but that is only because np.dtype(obj) works if np.dtype(obj.dtype) succeeds.
That is the weird dual use Eric was referring to. np.issubdtype is really equivalent to issubclass.
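
A runnable version of that sketch might look as follows. It assumes a NumPy version in which every dtype has its own class, and it obtains that class with `type(...)`; the name `Float64DType` here is just a local variable chosen for illustration.

```python
import numpy as np

def is_dtype_instance(obj, DType):
    """isinstance-style check on the dtype that obj has when viewed as an array."""
    return isinstance(np.array(obj).dtype, DType)

# Assumption: recent NumPy versions give each dtype its own class,
# which can be retrieved from a dtype instance with type().
Float64DType = type(np.dtype(np.float64))

print(is_dtype_instance(1.0, Float64DType))
print(is_dtype_instance(np.zeros(2), Float64DType))
print(is_dtype_instance(1, Float64DType))

# issubdtype, by contrast, compares types, much like issubclass does.
print(np.issubdtype(np.float64, np.floating), issubclass(np.float64, np.floating))
```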

@andyfaff
Member

On a semi related topic, is it possible to enumerate all the inexact types somehow? Is there a list somewhere?

@seberg
Member

seberg commented Oct 28, 2020

np.typecodes has them in a sense, other than that, probably not unless you want to check __subclasses__()...
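
One way to turn those type codes into a list of scalar types, as a sketch (the variable names are illustrative):

```python
import numpy as np

# "AllFloat" holds the character codes for the real and complex floating types.
codes = np.typecodes["AllFloat"]
inexact_types = [np.dtype(c).type for c in codes]

for code, scalar_type in zip(codes, inexact_types):
    print(code, scalar_type.__name__)
```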

@andyfaff
Member

Are typecodes that the machine doesn't support eliminated from that list?

@eric-wieser
Member

eric-wieser commented Oct 28, 2020

There is no such thing as an unsupported type, only aliased names that exist on some but not all platforms.

@axil
Contributor

axil commented Jan 8, 2022

The very first example from the first post:

x.dtype.kind in np.typecodes["AllFloat"]

is apparently incorrect:

>>> np.array([1,2,3], dtype=np.uint8).dtype.kind in np.typecodes['UnsignedInteger']
False
>>> np.array([1,2,3], dtype=np.uint8).dtype.kind in np.typecodes['AllInteger']
False

It should've been x.dtype.char instead of x.dtype.kind:

>>> np.array([1,2,3], dtype=np.uint8).dtype.char in np.typecodes['UnsignedInteger']
True
>>> np.array([1,2,3], dtype=np.uint8).dtype.char in np.typecodes['AllInteger']
True

Also, bool, str, bytes, object and void dtypes ('?', 'U', 'S', 'O', 'V' accordingly) are missing from the np.typecodes dict (except under the 'All' key).
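
The difference between the two attributes can be sketched as below: `dtype.kind` is a coarse one-letter category, while `dtype.char` is the per-type code that `np.typecodes` stores. The `KIND_NAMES` mapping and the `kind_name` helper are hypothetical names for this illustration.

```python
import numpy as np

# dtype.kind groups dtypes into coarse categories.
KIND_NAMES = {
    "i": "signed integer",
    "u": "unsigned integer",
    "f": "floating",
    "c": "complex",
}

def kind_name(dtype):
    return KIND_NAMES.get(np.dtype(dtype).kind, "other")

dt = np.dtype(np.uint8)
print(dt.kind, dt.char)                        # kind letter vs. type code
print(dt.kind in "iu")                         # kind-based integer check
print(dt.char in np.typecodes["AllInteger"])   # char-based integer check
print(kind_name(np.float32), kind_name(bool))
```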

@InessaPawson InessaPawson changed the title from "add a canonical way to determine if dtype is integer, floating point or complex" to "ENH: add a canonical way to determine if dtype is integer, floating point or complex" on Aug 27, 2022
@ntessore
Contributor

ntessore commented Jun 7, 2023

FWIW, when implementing special functions, I find iscomplexobj to be the most useful interface. That's usually either for determining whether an input dtype needs to be widened for functions mapping reals to complexes,[1] or whether a faster branch can be taken for purely real input.

Footnotes

  1. I would love to have a companion ascomplexobj() function that takes care of figuring out the appropriate complex type.
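
A minimal sketch of what such a companion could look like. The function `as_complex` is hypothetical and not a NumPy API; it assumes that promoting the input dtype with `np.complex64` via `np.result_type` is an acceptable way to pick the complex type.

```python
import numpy as np

def as_complex(x):
    """Hypothetical helper: return x as an array with a complex dtype."""
    x = np.asarray(x)
    if np.iscomplexobj(x):
        return x  # already complex, nothing to widen
    # Promote with the smallest complex type so that precision is kept:
    # float32 -> complex64, float64 -> complex128, and so on.
    return x.astype(np.result_type(x.dtype, np.complex64))

print(as_complex(np.zeros(2, dtype=np.float32)).dtype)
print(as_complex(np.zeros(2, dtype=np.float64)).dtype)
```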

tylerjereddy added a commit to tylerjereddy/scipy that referenced this issue Jul 7, 2023
* replace `*sctype*` NumPy usage per
numpy/numpy#23999
and NEP52 in preparation for NumPy 2.0

* there seem to be straightforward replacements
that still pass the testsuite in all cases

* `git grep -E -i "sctype"` is clean on this branch
(only present in comments for clarity where needed)

* there may be better canonical ways to do some
of these things in the future, though considerable
confusion remains per numpy/numpy#17325

* `UMFPACK`-related changes were not tested locally
(wasn't particularly friendly for PyPI-based setup/venv)

[skip circle]
alugowski pushed a commit to alugowski/scipy that referenced this issue Jul 16, 2023
@rgommers
Member Author

For 2.0 we deprecated or removed the sctype APIs, cleaned up the dtype aliases so that each dtype has one canonical Pythonic name and one canonical C-like name, and added numpy.isdtype. So I think we addressed this issue to a sufficient extent.
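
For readers landing here, a short sketch of how numpy.isdtype can answer the original question. It requires NumPy 2.0 or later; the wrapper name `describe` is hypothetical, and the kind strings are the ones documented for `np.isdtype`.

```python
import numpy as np

def describe(dtype):
    """Classify a dtype using np.isdtype (requires NumPy 2.0 or later)."""
    if np.isdtype(dtype, "bool"):
        return "bool"
    if np.isdtype(dtype, "integral"):
        return "integer"
    if np.isdtype(dtype, "real floating"):
        return "floating"
    if np.isdtype(dtype, "complex floating"):
        return "complex"
    return "other"
```

With an array `x`, the call would be `describe(x.dtype)`.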

Thanks all!

@rgommers rgommers added this to the 2.0.0 release milestone Jun 16, 2024
MarDiehl added a commit to MarDiehl/DAMASK that referenced this issue Jun 19, 2024