ENH: Add dtype support to the array comparison ops #18128


Merged: 5 commits into numpy:master, Jan 21, 2021

Conversation

@BvB93 (Member) commented Jan 5, 2021

Per the title: this PR adds dtype support to the array comparison ops (__lt__, __le__, etc.).

Note that in order to get this to work, the variance of the `dtype` and `ndarray` parameters had to be changed from invariant (the default) to covariant. Without this, the likes of `np.dtype[np.int64]` would not be considered sub-types of `np.dtype[np.integer]`, making things needlessly complicated.
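As a minimal sketch of what the covariance change enables (the names `dt_i8` and `expects_integer_dtype` below are illustrative, not taken from this PR):

from __future__ import annotations

from typing import Any, TYPE_CHECKING
import numpy as np

def expects_integer_dtype(dt: np.dtype[np.integer[Any]]) -> None: ...

dt_i8: np.dtype[np.int64]

if TYPE_CHECKING:
    # Accepted once `np.dtype` is covariant in its scalar type:
    # `dtype[int64]` is now a subtype of `dtype[integer[Any]]`.
    expects_integer_dtype(dt_i8)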

Examples

from __future__ import annotations

from typing import Any, TYPE_CHECKING
import numpy as np

AR_i8: np.ndarray[Any, np.dtype[np.int64]]
AR_f8: np.ndarray[Any, np.dtype[np.float64]]
AR_m: np.ndarray[Any, np.dtype[np.timedelta64]]

if TYPE_CHECKING:
    # note: Revealed type is 'Union[numpy.bool_, numpy.ndarray[Any, numpy.dtype[numpy.bool_]]]'
    reveal_type(AR_f8 < AR_i8)

    # error: Unsupported operand types for < ("ndarray[Any, dtype[floating[_64Bit]]]" and "ndarray[Any, dtype[timedelta64]]")  [operator]
    reveal_type(AR_f8 < AR_m)

# `Sequence[int]`) and `str`. As `str` is a recursive sequence of
# strings, it will pass through the final overload otherwise

@overload
@BvB93 (Member Author) commented:

To summarize the various overloads (a rough sketch follows the list):

  1. Overload for filtering out any str/bytes-based array-likes. This is needed as `bytes` would otherwise be recognized as a `Sequence[int]` sub-type (which is bad) and `str` would otherwise be caught by overload 6 (as `str` is a sequence of strings, which is a sequence of strings, and so on).
  2. Number-based overload
  3. Timedelta-based overload
  4. Datetime-based overload
  5. Object-based overload. other is typed as Any here as object arrays can be extremely flexible (depending on the actual underlying objects).
  6. A final overload to handle all sequences whose nesting level is too deep for the previous overloads.
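A rough, hypothetical sketch of that ordering (the aliases `_NumberLike` and `_TD64Like`, and the class itself, are placeholders rather than the actual definitions in the stubs):

from __future__ import annotations

from typing import Any, Sequence, Union, overload
import numpy as np

_NumberLike = Union[bool, int, float, complex, np.number]
_TD64Like = Union[np.timedelta64, Sequence[np.timedelta64]]

class _ComparisonSketch:
    @overload
    def __lt__(self, other: Union[str, bytes]) -> Any: ...   # 1. catch str/bytes first
    @overload
    def __lt__(self, other: _NumberLike) -> np.bool_: ...    # 2. numbers
    @overload
    def __lt__(self, other: _TD64Like) -> np.bool_: ...      # 3. timedeltas
    @overload
    def __lt__(self, other: np.datetime64) -> np.bool_: ...  # 4. datetimes
    @overload
    def __lt__(self, other: Any) -> Any: ...                 # 5./6. object arrays and deeply nested sequences
    def __lt__(self, other: Any) -> Any:
        raise NotImplementedError("illustrative sketch only")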

@a-reich commented Jan 10, 2021

Just curious about something, since I've been following the work to add typing support for dtypes: won't making a mutable container type covariant cause issues? For the usual reason that, e.g., if `my_func` accepts `ndarray[Any, np.dtype[np.float64]]` it could assign `float64` values to the array, yet calling `my_func` with an `ndarray[Any, np.dtype[np.float32]]` would no longer raise type-check errors despite being wrong.
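For instance, a hypothetical illustration of that hazard (`my_func` as in the comment above):

from __future__ import annotations

from typing import Any
import numpy as np

def my_func(arr: np.ndarray[Any, np.dtype[np.float64]]) -> None:
    # Perfectly legal for float64, but overflows float32 to `inf` at runtime.
    arr[0] = 1e300

ar_f4: np.ndarray[Any, np.dtype[np.float32]] = np.zeros(3, dtype=np.float32)
my_func(ar_f4)  # with a covariant dtype parameter this would no longer be flagged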

@BvB93 (Member Author) commented Jan 11, 2021

@a-reich this is arguably more of a problem with the variance of `number`, and one where I'm not quite sure what the best solution would be. Namely, making `number[~NBit]` invariant (instead of covariant) w.r.t. its precision would resolve the issue. On the other hand, it would also make precision-based casting even more difficult than it already is, to the point where I'm not convinced it'd be worthwhile. The latter would mean ditching the limited cases where we can currently deal with precisions, i.e. changing the return types to the likes of `number[Any]`.

In any case, besides the abovementioned issue, the problem with leaving `ndarray` (and `dtype`) invariant is that it leads to some strange situations with the abstract(-ish) part of the generic hierarchy. For example, it would mean that a `float64` array would not be acceptable where an `inexact` array is expected.
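For example (an illustrative snippet; `expects_inexact` is a made-up name):

from __future__ import annotations

from typing import Any, TYPE_CHECKING
import numpy as np

def expects_inexact(arr: np.ndarray[Any, np.dtype[np.inexact[Any]]]) -> None: ...

AR_f8: np.ndarray[Any, np.dtype[np.float64]]

if TYPE_CHECKING:
    # Fine with a covariant dtype parameter; an invariant one would reject
    # this call, as `dtype[float64]` would not be a subtype of `dtype[inexact[Any]]`.
    expects_inexact(AR_f8)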

@seberg (Member) commented Jan 11, 2021

@BvB93 I am trying to get a full picture of this. For your last example, it seems clear that `inexact` must behave covariantly (as must all abstract dtypes). The concrete dtypes/types must be invariant (in most ways at least), but because of that, my preferred solution is to not allow subclassing at all (and if we ever do allow it, we would be very careful to note that it must be extremely limited, whatever that means exactly).

As for `number[Any]`, I am a bit confused about how/where this is used (with a bit-width).
My first thought would be that `number[Any]` should be covariant as well (it is an abstract dtype/type), but that it would be "invariant" with respect to precision, since a higher precision is not a subtype to begin with?

@a-reich commented Jan 12, 2021

@BvB93 I see what you mean about the abstract parts of the dtype hierarchy making this difficult. However, your point about the invariance of `number` w.r.t. precision is interesting.
I guess that ultimately there might not be a way to have the whole type system work "correctly" here, so it comes down to which approach generates more false positives/false negatives and is more annoying for downstream users to patch over. For example, how common are operations between different precisions in practice, and if the lack of automatic inference by mypy means users have to annotate their casts explicitly, how bad is that? Compare that to the false negatives for assignments with the wrong precision (which admittedly do not raise at runtime, but silently store truncated results).

@BvB93 (Member Author) commented Jan 12, 2021

> As for `number[Any]`, I am a bit confused about how/where this is used (with a bit-width).

To give a bit of background: `number` is currently parametrized w.r.t. a set of `npt.NBitBase` subclasses, the latter being used to represent precision via an object-based approach.

_NBit_co = TypeVar("_NBit_co", covariant=True)

class floating(inexact[_NBit_co]):
    ...

# Note that the types below are not necessarily treated as `floating` subclasses;
# it's more along the lines of how `Sequence[int]` is a subtype of plain `Sequence`.
#
# This greatly simplifies the rather complicated typing of precisions, as you can
# always just hit the panic button when things get too complicated: e.g. `return floating[Any]`
float16 = floating[_16Bit]
float32 = floating[_32Bit]
float64 = floating[_64Bit]

The key point is that, because `NBitBase` subclasses inherit from each other, whenever one of them is used as a covariant parameter mypy is allowed to simplify the likes of `Union[_64Bit, _32Bit]` to `_64Bit`, allowing for some basic precision-based casting without adding loads of overloads.
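For instance (a small sketch; the exact revealed type depends on the overloads in the stubs):

from __future__ import annotations

from typing import TYPE_CHECKING
import numpy as np

f8 = np.float64(1.0)
f4 = np.float32(1.0)

if TYPE_CHECKING:
    # Because `_32Bit` subclasses `_64Bit`, `floating[_32Bit]` is a subtype of
    # `floating[_64Bit]`, so the union of the two collapses to `floating[_64Bit]`.
    reveal_type(f8 + f4)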

I recall from #17540 that something like this was actually possible for invariant parameters, but I believe there were more immediate issues with it, at least compared to its covariant counterpart. I'd have to do a bit of digging, as I don't quite remember the details.

@BvB93 (Member Author) commented Jan 12, 2021

As a side note:
Let's hold off on merging until #18155 is in, as it would allow for the removal of an overload.

@BvB93 (Member Author) commented Jan 15, 2021

Never mind #18128 (comment); it turns out that, unfortunately, there were some serious problems with the PR referenced therein.

@charris (Member) commented Jan 19, 2021

Needs rebase

Bas van Beek added 5 commits January 19, 2021 20:09
The dtype's scalar type and the ndarray's dtype are now covariant instead of invariant.
This change is necessary in order to ensure that all `generic` subclasses can be used as the underlying scalar type.
…ere neglected

More specifically: operations between array-likes of `timedelta64` and `ndarray`s that can be cast into `timedelta64`.

For example:
    ar_i = np.array([1])
    seq_m = [np.timedelta64()]
    ar_i > seq_m
@BvB93 (Member Author) commented Jan 21, 2021

The branch has been rebased and the tests seem to be passing.

@charris merged commit 33273e4 into numpy:master Jan 21, 2021
@charris (Member) commented Jan 21, 2021

Thanks Bas.
