
ENH: Make isfinite work with datetime/timedelta. #7856


Closed · wants to merge 3 commits

Conversation

ssanderson (Author)

  • Adds new ufunc definitions for isfinite for datetime/timedelta.

  • Fixes an issue where ufunc_loop_matches would always return False
    for datetime inputs with units, because it used
    PyArray_CanCast{Type,Array}To to check if the input array was
    compatible with a target dtype, but the target dtype is created with
    PyArray_DescrFromType, which means it will never have any unit
    metadata.

    The cleanest fix for this, I think, would be to copy the unit metadata
    from the dtype of the input array to the temporary dtype used to
    determine casting safety, but that requires calling
    get_datetime_metadata_from_dtype (defined in multiarray) from
    ufunc_loop_matches (defined in umath), and I couldn't figure out how to
    make that happen correctly (I think I'd need to create a public wrapper
    and register it in numpy_api.py?).

    The somewhat hacky fix is to short-circuit if the type number being
    checked exactly matches the type of the input being checked. This avoids
    calling PyArray_CanCastTo in the case where we have exactly-equal
    types. It's not totally clear to me whether this is safe, though. (A
    sketch of the intended user-facing behaviour follows below.)
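
For reference, this is the user-facing behaviour the change is aiming for, as a minimal sketch (the expected outputs reflect the intent described above, not output captured from this branch):

```python
import numpy as np

# isfinite on datetime64/timedelta64 should report False only for NaT
# and True for every other value.
dates = np.array(['2016-07-18', 'NaT', '2016-07-20'], dtype='datetime64[D]')
print(np.isfinite(dates))   # expected: [ True False  True]

deltas = np.array([np.timedelta64(1, 'h'), np.timedelta64('NaT', 'h')])
print(np.isfinite(deltas))  # expected: [ True False]
```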

@ssanderson (Author)

Hrm, the short-circuit here seems to be changing how relational operators get resolved.

@ssanderson (Author)

So, it turns out the build failures here were actually just caused by np.testing.assert_equal misbehaving now that isfinite works on datetime/timedelta. I think this should be correct now, but it'd be good to get confirmation from someone who knows more than I do about the expected semantics of ufunc_loop_matches.

else if (types[i] == PyArray_TYPE(op[i])) {
    /*
     * Don't bother doing casting checks if the type numbers exactly
     * match.
Member

It might be safer to do this check outside the loop so that it only checks the first element, possibly special cased for timedelta/datetime. I can see this check going wrong for multiple input arguments.

Author

> I can see this check going wrong for multiple input arguments.

I'm not sure I understand why the ufunc arity would make a difference. As I understand it, this function is used to check whether a ufunc-implementation registered for a specific set of input/output type codes is compatible with a set of supplied input/output arrays, and the loop here is iterating through the type codes registered for the inputs and checking whether the supplied arrays are compatible with those type codes. I can't think of a circumstance where we'd register a ufunc with input typecodes (a, b, c) where we wouldn't accept arrays with exactly those typecodes.
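
As a rough, Python-side illustration of the registered-signature list that this matching walks over (the real resolution happens in C; np.add is just a convenient ufunc to inspect):

```python
import numpy as np

# Each entry is one registered loop: input type codes -> output type code.
# ufunc_loop_matches asks, for each candidate loop, whether the supplied
# arrays can be cast to that loop's input type codes.
print(np.add.types)       # e.g. ['??->?', 'bb->b', ..., 'dd->d', ...]
print(np.isfinite.types)  # e.g. ['e->?', 'f->?', 'd->?', ...]
```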

Author

Is the worry that we might have something like a binary function taking two datetime/timedelta inputs that depends on the unit metadata matching exactly? I'd naively expect that it'd be that function's responsibility to validate metadata-specific constraints.

Member

My concern with a ufunc taking multiple arguments is that we don't have any way to distinguish between ufuncs that accept arguments with the same time units and ufuncs that accept any time units. E.g., if S and T are type variables, distinguishing between the signatures (datetime64[T], datetime64[T]) and (datetime64[S], datetime64[T]).

In practice, I'm not sure this is a major concern for ufuncs added in NumPy, because unit tests should catch these sorts of issues. But at least in theory, datetime64 ufuncs can also be written in third party code (e.g., with Numba).

Basically, I think it's important to emphasize in the comments that this is a special-case rule, and to limit its scope as much as possible so it causes as few complications as possible.
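
For concreteness, a small illustration (ordinary NumPy, independent of this branch) of why unit metadata, and not just the typenum, matters once a datetime ufunc takes more than one argument:

```python
import numpy as np

# Both operands have typenum NPY_DATETIME, but different units; the
# result depends on reconciling the units (here, casting to hours).
a = np.datetime64('2016-07-24', 'D')
b = np.datetime64('2016-07-23T12', 'h')
print(a - b)           # 12 hours
print((a - b).dtype)   # timedelta64[h]
```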

Author

> My concern with a ufunc taking multiple arguments is that we don't have any way to distinguish between ufuncs that accept arguments with the same time units and ufuncs that accept any time units.

That makes sense, I think. It seems like the right long-term fix for this would be to provide an API for ufunc registration with more granularity than just the typenum (for example, by allowing registration with a full PyArray_Descr)?

> Basically, I think it's important to emphasize in the comments that this is a special-case rule, and to limit its scope as much as possible so it causes as few complications as possible.

I think if we truly just want a special case here, it's safer/clearer to just add a new type resolver instead of adding an obscure branch to the default resolver to make one datetime ufunc work. Does that seem reasonable?

Member

> I think if we truly just want a special case here, it's safer/clearer to just add a new type resolver instead of adding an obscure branch to the default resolver to make one datetime ufunc work. Does that seem reasonable?

Yes, agreed. We could make it slightly more generic (e.g., to handle all datetime64->bool signatures instead of just isfinite) but this sounds good for now.

Member

I think the long-term design we want here for handling parametrized dtypes in the ufunc system is that loops should be matched based on typenum equivalence, and then the ufunc loop gets access to the actual PyArray_Descr.

@ssanderson (Author)

AppVeyor failure looks unrelated to the changes here.

@ssanderson force-pushed the nat-isfinite branch 3 times, most recently from 5d2d99c to f20d137 on July 24, 2016 at 21:50
@ssanderson (Author) commented Jul 24, 2016

@shoyer updated to use a custom resolver. The first build passed on every target except USE_CHROOT=1 ARCH=i386 DIST=trusty PYTHON=2.7, where it failed with Error in 'python': free(): invalid pointer: 0x0e8103e0.

Cursory googling suggests to me that that wouldn't be caused by a ref-counting issue, so I'm not sure what else in the changes here would cause that issue.

@ssanderson (Author)

Segfaulted on the second run. Failure appears to be real :/. It looks like the USE_CHROOT build is actually for testing 32-bit systems? How do people usually go about debugging 32-bit-specific failures?

@homu (Contributor) commented Sep 2, 2016

☔ The latest upstream changes (presumably #7148) made this pull request unmergeable. Please resolve the merge conflicts.

@TYPE@_isfinite(char **args, npy_intp *dimensions, npy_intp *steps, void *NPY_UNUSED(func))
{
    OUTPUT_LOOP {
        *((@type@ *)op1) = 1;
Member

@type@ seems wrong, probably the reason for your segfaults.

Member

I am wondering whether we should do this, implement np.isnat instead, or do both (and, if we implement isnat, whether it would need to do this kind of type resolution the way you did here).

Member

From a user perspective, I would usually prefer to be able to use a function like np.isfinite (or maybe np.isna) that does not require me to know the dtype of the array ahead of time. But np.isnat could also be useful in some cases.

Member

Having a np.isna function that handles both nans and NATs might be good. I'm thinking of the future when we might have "missing value" dtypes. Also note that all the nan* functions can also be easily modified to handle NA, or even finite values. Makes me wonder if those functions are too specific.
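
To make the idea concrete, here is a rough sketch of what such a combined predicate could look like; isna here is hypothetical (not an existing NumPy function), and it assumes an isnat-style ufunc is available:

```python
import numpy as np

def isna(arr):
    """Hypothetical helper: True where a value is NaN or NaT."""
    arr = np.asarray(arr)
    if arr.dtype.kind in 'mM':              # timedelta64 / datetime64
        return np.isnat(arr)                # assumes an isnat ufunc exists
    if arr.dtype.kind in 'fc':              # floating / complex
        return np.isnan(arr)
    return np.zeros(arr.shape, dtype=bool)  # ints, bools, etc. have no "missing"
```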

Member

So in that future, I guess we would have np.isna as well as something like np.ismissing, and np.isna would be roughly np.ismissing | np.isnan?

Scott Sanderson added 3 commits March 19, 2017 16:51
- Adds new ufunc definitions for isfinite for datetime/timedelta.

- Adds a new ufunc type resolver, ``PyUFunc_UnaryPredicateResolver``,
  which handles ufuncs that take a single argument and always produce a
  bool.  This resolver is currently only used for ``isfinite``.

- Adds explicit loop definitions for ``isfinite`` on integral and
  boolean types.  Previously, ``isfinite`` worked because integer and
  boolean arrays would be (comparatively expensively) coerced to float
  and then dispatched to float loops.  We now just fill the output array
  with 1s, since all integral values are finite.  This yields a modest
  speedup, albeit for a somewhat silly operation.
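
For context, the behaviour being specialized (semantically unchanged; the new loops just skip the intermediate float cast):

```python
import numpy as np

# Integer and boolean values are always finite, so the result is all-True
# whether the loop casts to float first or simply fills the output with 1s.
print(np.isfinite(np.arange(5)))             # [ True  True  True  True  True]
print(np.isfinite(np.array([True, False])))  # [ True  True]
```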
In order to handle the difference between 0.0 and -0.0 on scalar values,
assert_array_equal used to compare scalar inputs against 0.

This triggers a DeprecationWarning when np.datetime64 and np.timedelta64
objects are provided, because comparisons between integers and datetime
types currently work but are deprecated.

By chance, we used to avoid this warning because we would call
``gisfinite`` in the previous block, which would raise a TypeError on
datetime types, causing us to skip to direct equality comparison.

Now that ``isfinite`` works for datetimes, we continue on to the next
block, which triggers the warning.

The fix is to avoid comparison against 0 entirely and just
unconditionally check that signbits are equal.  This is a harmless no-op
for integer-based types, and still correctly handles floating-point
types.
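
A minimal illustration (plain NumPy, independent of this branch) of the distinction the signbit check preserves:

```python
import numpy as np

# -0.0 and 0.0 compare equal, so an equality check alone cannot tell
# them apart; the sign bit can.
print(np.float64(-0.0) == np.float64(0.0))  # True
print(np.signbit(np.float64(-0.0)))         # True
print(np.signbit(np.float64(0.0)))          # False

# signbit also accepts integer input, where it simply reports negativity.
print(np.signbit(np.array([1, -1])))        # [False  True]
```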
@ssanderson (Author)

@shoyer I finally got around to fixing the issues on this PR. The segfaults on 32-bit are now fixed after applying @seberg's suggestion about the incorrect cast in the integer specializations for isfinite.

It looks like there's been some discussion in the interim on whether this is still a desirable change. Should I post a question to the mailing list?

@shoyer (Member) commented Mar 22, 2017

I don't think we need to debate this on the mailing list: this seems like a very clear win to me, regardless of whether or not we choose to add a more specialized function like isnat.

@eric-wieser (Member)

@ssanderson: Yes and no. Yes, it's the right place, but no, the tests are not currently there.

You want test_type_check.py

@homu (Contributor) commented May 1, 2017

☔ The latest upstream changes (presumably #8967) made this pull request unmergeable. Please resolve the merge conflicts.

@seberg (Member) commented Sep 22, 2019

This was actually implemented in NumPy 1.17 in a separate PR.

@seberg closed this Sep 22, 2019