
ENH: Make np.number generic with respect to its precision #17540


Merged
merged 12 commits into from
Oct 21, 2020


BvB93
Member

@BvB93 BvB93 commented Oct 12, 2020

This pull request makes np.number generic with respect to its precision, the latter being
represented by a hierarchical set of subclasses during type checking.

The Problem

A challenge which became apparent during the typing of the arithmetic magic methods was
"how to deal with the issue of numerical precision?", e.g. how to express float16 + int64 -> float64.
The most "straightforward" option would be to brute force the issue and simply add more
overloads, or if we're feeling particularly fancy to let a mypy plugin deal with the precision.

Both approaches suffer from a serious drawback though, namely that their scope will be limited
to a small subset of predetermined functions/methods (e.g. the arithmetic and bitwise ops).
This is undesirable for such a common pattern.
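
To make the overload drawback concrete, a brute-force version might look like the sketch below. The scalar classes here are hypothetical stand-ins defined locally for illustration (they are not the real NumPy types), and only four of the many float/int combinations are listed.

```python
from typing import overload


# Hypothetical stand-ins for NumPy scalar types (illustration only).
class float16: ...
class float32: ...
class float64: ...
class int32: ...
class int64: ...


# One overload per (float, int) pair; every new function or method
# that needs precision-aware typing would have to repeat this list.
@overload
def add(a: float16, b: int32) -> float64: ...
@overload
def add(a: float16, b: int64) -> float64: ...
@overload
def add(a: float32, b: int32) -> float64: ...
@overload
def add(a: float32, b: int64) -> float64: ...
def add(a, b):
    # Dummy runtime implementation; the point is the signature list above.
    return float64()
```

The number of overloads grows with the product of the input types, and the list has to be duplicated for each annotated function.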

The Solution

This PR attempts to resolve the precision issue via an alternative approach, one where
np.number is made generic w.r.t. its precision. This allows one to annotate any function
quite compactly, as demonstrated below:

from typing import TypeVar
import numpy as np
import numpy.typing as npt

T = TypeVar("T", bound=npt.NBitBase)  # The precision

# e.g. float16 + int64 -> float64
def add(a: "np.floating[T]", b: "np.integer[T]") -> "np.floating[T]":
    return a + b

The Details

To accomplish this the PR does the following two things:

  1. It adds a hierarchical set of subclasses representing the number's numerical precision
    (numpy.typing.NBitBase). The classes' hierarchical structure makes it easy to infer the return
    type's precision, i.e. find the common base class (a task which is impossible with Literals, for example).
class NBitBase: ...
class _64Bit(NBitBase): ...
class _32Bit(_64Bit): ...
class _16Bit(_32Bit): ...
  2. It makes np.number generic w.r.t. the newly introduced NBitBase class.
    As is demonstrated below, a consequence of this approach is that, while the likes of float64 are still
    subtypes of floating, they are no longer treated as formal subclasses (while type checking, that is).
from typing import Generic, TypeVar

_NBit = TypeVar("_NBit", bound=NBitBase)

class floating(Generic[_NBit]): ...
float64 = floating[_64Bit]
float32 = floating[_32Bit]
float16 = floating[_16Bit]

TODO

  • Update the annotations and protocols
  • Update the tests

@seberg
Member

seberg commented Oct 12, 2020

@BvB93 this makes me slightly worried about the extreme complexity involved. But this is just about the scalar types, right? I.e. no mixed scalars-arrays, no array-array operations? And also it is just about the basic math operators? In that case, I suppose it's special enough to just do this.

More generically for array-ops and ufuncs, there is no reason to get access to runtime behaviour? Promotion rules are not simple and I would like promotion rules to be customizable (not that it should be used a lot, but there are tricky cases e.g. for datetimes).
I guess, it might be that all of those promotion rules, etc. just need to be duplicated for typing...

There is another difficulty here, which maybe you can ignore: 0-D arrays use "value-based promotion" and their precision is sometimes demoted. It would be great to get rid of that in the long run (except probably for Python floats/ints), but we are not close.

@BvB93
Member Author

BvB93 commented Oct 12, 2020

@BvB93 this makes me slightly worried about the extreme complexity involved. But this is just about the scalar types, right? I.e. no mixed scalars-arrays, no array-array operations? And also it is just about the basic math operators? In that case, I suppose it's special enough to just do this.

Correct, this PR only affects the basic scalar + scalar bitwise and arithmetic operations.
Updating the array + array and array + scalar ops is planned for some point in the future,
but it's a difficult task and at minimum we'd first have to make ndarray generic w.r.t.
its data type.

More generically for array-ops and ufuncs, there is no reason to get access to runtime behaviour?

Can you clarify what you mean by this statement?

I guess, it might be that all of those promotion rules, etc. just need to be duplicated for typing...

Yup, for the number types themselves (integer, floating, etc) there is no escaping the duplication process.
This PR does make it a bit easier to deal with their precision.

There is another difficulty here, which maybe you can ignore: 0-D arrays use "value-based promotion" and their precision is sometimes demoted. It would be great to get rid of that in the long run (except probably for Python floats/ints), but we are not close.

Yes, I've noticed this in previous PRs when typing ndarray methods. Definitely makes things more complicated,
but it's not a hindrance for this particular PR as it is limited to numbers.

@BvB93
Member Author

BvB93 commented Oct 12, 2020

The CI failure is expected as the tests still have to be updated.

@BvB93 BvB93 added this to the 1.20.0 release milestone Oct 16, 2020
@BvB93 BvB93 marked this pull request as ready for review October 16, 2020 14:47
Bas van Beek added 11 commits October 17, 2020 18:05
* Removed redundant `type: ignore` messages
* Set the return precision as `Union[_NBit_co, _NBit]`
* Type the precision of `builtins.int` operations as `Any`
Mypy uses a `*` whenever an annotation or one of its parameters is based on a TypeVar.

Its added value is negligible and it unnecessarily complicates the `reveal` tests, so let's just ignore them.

Note that this is done after running mypy, so it won't affect cases where `*` is used as multiplication operator.
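
A minimal sketch of the post-processing described in this commit message, assuming the revealed type arrives as a plain string taken from mypy's output. The helper name `strip_typevar_marker` and the sample string are hypothetical and only illustrate the idea.

```python
def strip_typevar_marker(revealed: str) -> str:
    """Drop the `*` markers from a revealed-type string.

    This is applied to mypy's output, not to the source file being
    checked, so a `*` used as the multiplication operator in that
    source file is never touched.
    """
    return revealed.replace("*", "")


# Hypothetical revealed type, as it might appear in mypy's output.
print(strip_typevar_marker("builtins.int*"))
```
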
@BvB93
Member Author

BvB93 commented Oct 21, 2020

Any further comments on this PR?

@mattip
Member

mattip commented Oct 21, 2020

It seems to be desirable to use more generic expressions, but I do think it would be nice to have a shorter alias like complex64 for numpy.complexfloating[numpy.typing._64Bit, numpy.typing._64Bit]. I guess that can be a future enhancement.

@mattip mattip merged commit ebc57e1 into numpy:master Oct 21, 2020
@mattip
Member

mattip commented Oct 21, 2020

Thanks @BvB93

@BvB93
Member Author

BvB93 commented Oct 21, 2020

It seems to be desirable to use more generic expressions

I completely agree with this.
Unfortunately this is not something we can do on our side; it'll have to be implemented in mypy.
It seems there is already an open issue about the subject (python/mypy#9381).
