
ENH: Make isfinite work with datetime/timedelta. #7856


Closed · wants to merge 3 commits

Conversation

ssanderson (Author)

  • Adds new ufunc definitions for isfinite for datetime/timedelta.

  • Fixes an issue where ufunc_loop_matches would always return False
    for datetime inputs with units, because it used
    PyArray_CanCast{Type,Array}To to check if the input array was
    compatible with a target dtype, but the target dtype is created with
    PyArray_DescrFromType, which means it will never have any unit
    metadata.

    The cleanest fix for this, I think, would be to copy the unit metadata
    from the dtype of the input array to the temporary dtype used to
    determine casting safety, but that requires calling
    get_datetime_metadata_from_dtype (defined in multiarray) from
    ufunc_loop_matches (defined in umath), and I couldn't figure out how to
    make that happen correctly (I think I'd need to create a public wrapper
    and register it in numpy_api.py?).

    The somewhat hacky fix is to short-circuit if the type number being
    checked exactly matches the type of the input being checked. This avoids
    calling PyArray_CanCastTo in the case where we have exactly-equal
    types. It's not totally clear to me whether this is safe, though. (A
    sketch of the intended user-facing behaviour follows below.)
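
For reference, this is the user-facing behaviour the change is aiming for, as a minimal sketch (the expected outputs reflect the intent described above, not output captured from this branch):

```python
import numpy as np

# isfinite on datetime64/timedelta64 should report False only for NaT
# and True for every other value.
dates = np.array(['2016-07-18', 'NaT', '2016-07-20'], dtype='datetime64[D]')
print(np.isfinite(dates))   # expected: [ True False  True]

deltas = np.array([np.timedelta64(1, 'h'), np.timedelta64('NaT', 'h')])
print(np.isfinite(deltas))  # expected: [ True False]
```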

@ssanderson (Author)

Hrm, the short-circuit here seems to be changing how relational operators get resolved.

@ssanderson (Author)

So, it turns out the build failures here were actually just caused by np.testing.assert_equal misbehaving now that isfinite works on datetime/timedelta. I think this should be correct now, but it'd be good to get confirmation from someone who knows more than I do about the expected semantics of ufunc_loop_matches.

else if (types[i] == PyArray_TYPE(op[i])) {
    /*
     * Don't bother doing casting checks if the type numbers exactly
     * match.
Member

It might be safer to do this check outside the loop so that it only checks the first element, possibly special cased for timedelta/datetime. I can see this check going wrong for multiple input arguments.

Author

> I can see this check going wrong for multiple input arguments.

I'm not sure I understand why the ufunc arity would make a difference. As I understand it, this function is used to check whether a ufunc-implementation registered for a specific set of input/output type codes is compatible with a set of supplied input/output arrays, and the loop here is iterating through the type codes registered for the inputs and checking whether the supplied arrays are compatible with those type codes. I can't think of a circumstance where we'd register a ufunc with input typecodes (a, b, c) where we wouldn't accept arrays with exactly those typecodes.
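
As a rough, Python-side illustration of the registered-signature list that this matching walks over (the real resolution happens in C; np.add is just a convenient ufunc to inspect):

```python
import numpy as np

# Each entry is one registered loop: input type codes -> output type code.
# ufunc_loop_matches asks, for each candidate loop, whether the supplied
# arrays can be cast to that loop's input type codes.
print(np.add.types)       # e.g. ['??->?', 'bb->b', ..., 'dd->d', ...]
print(np.isfinite.types)  # e.g. ['e->?', 'f->?', 'd->?', ...]
```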

Author

Is the worry that we might have something like a binary function taking two datetime/timedelta inputs that depends on the unit metadata matching exactly? I'd naively expect that it'd be that function's responsibility to validate metadata-specific constraints.

Member

My concern with a ufunc taking multiple arguments is that we don't have any way to distinguish between ufuncs that accept arguments with the same time units and ufuncs that accept any time units. E.g., if S and T are type variables, distinguishing between the signatures (datetime64[T], datetime64[T]) and (datetime64[S], datetime64[T]).

In practice, I'm not sure this is a major concern for ufuncs added in NumPy, because unit tests should catch these sorts of issues. But at least in theory, datetime64 ufuncs can also be written in third party code (e.g., with Numba).

Basically, I think it's important to emphasize in the comments that this is a special-case rule, and to limit its scope as much as possible so it causes as few complications as possible.
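
For concreteness, a small illustration (ordinary NumPy, independent of this branch) of why unit metadata, and not just the typenum, matters once a datetime ufunc takes more than one argument:

```python
import numpy as np

# Both operands have typenum NPY_DATETIME, but different units; the
# result depends on reconciling the units (here, casting to hours).
a = np.datetime64('2016-07-24', 'D')
b = np.datetime64('2016-07-23T12', 'h')
print(a - b)           # 12 hours
print((a - b).dtype)   # timedelta64[h]
```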

Author

> My concern with a ufunc taking multiple arguments is that we don't have any way to distinguish between ufuncs that accept arguments with the same time units and ufuncs that accept any time units.

That makes sense, I think. It seems like the right long-term fix for this would be to provide an API for ufunc registration with more granularity than just the typenum (for example, by allowing registration with a full PyArray_Descr)?

> Basically, I think it's important to emphasize in the comments that this is a special-case rule, and to limit its scope as much as possible so it causes as few complications as possible.

I think if we truly just want a special case here, it's safer/clearer to just add a new type resolver instead of adding an obscure branch to the default resolver to make one datetime ufunc work. Does that seem reasonable?

Member

> I think if we truly just want a special case here, it's safer/clearer to just add a new type resolver instead of adding an obscure branch to the default resolver to make one datetime ufunc work. Does that seem reasonable?

Yes, agreed. We could make it slightly more generic (e.g., to handle all datetime64->bool signatures instead of just isfinite) but this sounds good for now.

Member

I think the long-term design we want here for handling parametrized dtypes in the ufunc system is that loops should be matched based on typenum equivalence, and then the ufunc loop gets access to the actual PyArray_Descr.

@ssanderson (Author)

AppVeyor failure looks unrelated to the changes here.

@ssanderson force-pushed the nat-isfinite branch 3 times, most recently from 5d2d99c to f20d137 on July 24, 2016 at 21:50
@ssanderson (Author) commented Jul 24, 2016

@shoyer updated to use a custom resolver. The first build passed on every target except USE_CHROOT=1 ARCH=i386 DIST=trusty PYTHON=2.7, where it failed with Error in 'python': free(): invalid pointer: 0x0e8103e0.

Cursory googling suggests to me that that wouldn't be caused by a ref-counting issue, so I'm not sure what else in the changes here would cause that issue.

@ssanderson (Author)

Segfaulted on the second run. Failure appears to be real :/. It looks like the USE_CHROOT build is actually for testing 32-bit systems? How do people usually go about debugging 32-bit-specific failures?

@homu (Contributor) commented Sep 2, 2016

☔ The latest upstream changes (presumably #7148) made this pull request unmergeable. Please resolve the merge conflicts.

@TYPE@_isfinite(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    OUTPUT_LOOP {
        *((@type@ *)op1) = 1;
Member

@type@ seems wrong, probably the reason for your segfaults.

Member

I am wondering whether we should do this, implement np.isnat instead, or do both (and, if we implement isnat, whether it would need to do this kind of type resolution the way you did here).

Member

From a user perspective, I would usually prefer to be able to use a function like np.isfinite (or maybe np.isna) that does not require me to know the dtype of the array ahead of time. But np.isnat could also be useful in some cases.

Member

Having a np.isna function that handles both nans and NATs might be good. I'm thinking of the future when we might have "missing value" dtypes. Also note that all the nan* functions can also be easily modified to handle NA, or even finite values. Makes me wonder if those functions are too specific.
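
To make the idea concrete, here is a rough sketch of what such a combined predicate could look like; isna here is hypothetical (not an existing NumPy function), and it assumes an isnat-style ufunc is available:

```python
import numpy as np

def isna(arr):
    """Hypothetical helper: True where a value is NaN or NaT."""
    arr = np.asarray(arr)
    if arr.dtype.kind in 'mM':              # timedelta64 / datetime64
        return np.isnat(arr)                # assumes an isnat ufunc exists
    if arr.dtype.kind in 'fc':              # floating / complex
        return np.isnan(arr)
    return np.zeros(arr.shape, dtype=bool)  # ints, bools, etc. have no "missing"
```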

Member

So in that future, I guess we would have np.isna as well as something like np.ismissing, and np.isna would be roughly np.ismissing | np.isnan?

Scott Sanderson added 3 commits March 19, 2017 16:51
- Adds new ufunc definitions for isfinite for datetime/timedelta.

- Adds a new ufunc type resolver, ``PyUFunc_UnaryPredicateResolver``,
  which handles ufuncs that take a single argument and always produce a
  bool.  This resolver is currently only used for ``isfinite``.

- Adds explicit loop definitions for ``isfinite`` on integral and
  boolean types.  Previously, ``isfinite`` worked because integer and
  boolean arrays would be (comparatively expensively) coerced to float
  and then dispatched to float loops.  We now just fill the output array
  with 1s, since all integral values are finite.  This yields a modest
  speedup, albeit for a somewhat silly operation.
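
For context, the behaviour being specialized (semantically unchanged; the new loops just skip the intermediate float cast):

```python
import numpy as np

# Integer and boolean values are always finite, so the result is all-True
# whether the loop casts to float first or simply fills the output with 1s.
print(np.isfinite(np.arange(5)))             # [ True  True  True  True  True]
print(np.isfinite(np.array([True, False])))  # [ True  True]
```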
In order to handle the difference between 0.0 and -0.0 on scalar values,
assert_array_equal used to compare scalar inputs against 0.

This triggers a DeprecationWarning when np.datetime64 and np.timedelta64
objects are provided, because comparisons between integers and datetime
types currently work but are deprecated.

By chance, we used to avoid this warning because we would call
``gisfinite`` in the previous block, which would raise a TypeError on
datetime types, causing us to skip to direct equality comparison.

Now that ``isfinite`` works for datetimes, we continue on to the next
block, which triggers the warning.

The fix is to avoid comparison against 0 entirely and just
unconditionally check that signbits are equal.  This is a harmless no-op
for integer-based types, and still correctly handles floating-point
types.
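
A minimal illustration (plain NumPy, independent of this branch) of the distinction the signbit check preserves:

```python
import numpy as np

# -0.0 and 0.0 compare equal, so an equality check alone cannot tell
# them apart; the sign bit can.
print(np.float64(-0.0) == np.float64(0.0))  # True
print(np.signbit(np.float64(-0.0)))         # True
print(np.signbit(np.float64(0.0)))          # False

# signbit also accepts integer input, where it simply reports negativity.
print(np.signbit(np.array([1, -1])))        # [False  True]
```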
@ssanderson (Author)

@shoyer I finally got around to fixing the issues on this PR. The segfaults on 32-bit are now fixed after applying @seberg's suggestion about the incorrect cast in the integer specializations for isfinite.

It looks like there's been some discussion in the interim on whether this is still a desirable change. Should I post a question to the mailing list?

@shoyer (Member) commented Mar 22, 2017

I don't think we need to debate this on the mailing list: this seems like a very clear win to me, regardless of whether or not we choose to add a more specialized function like isnat.

@eric-wieser (Member)

@ssanderson: Yes and no. Yes, it's the right place, but no, the tests are not currently there.

You want test_type_check.py

@homu (Contributor) commented May 1, 2017

☔ The latest upstream changes (presumably #8967) made this pull request unmergeable. Please resolve the merge conflicts.

@seberg (Member) commented Sep 22, 2019

This was actually implemented in NumPy 1.17 in a separate PR.

@seberg closed this Sep 22, 2019