Skip to content

Add isfinite support for datetime64 and timedelta64 #5610

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gerritholl opened this issue Feb 26, 2015 · 28 comments · Fixed by #13218
Closed

Add isfinite support for datetime64 and timedelta64 #5610

gerritholl opened this issue Feb 26, 2015 · 28 comments · Fixed by #13218

Comments

@gerritholl
Copy link
Contributor

numpy.isfinite should support datetime64 and timedelta64 objects. It should return True for any ordinary value, and False for datetime64('NaT') and timedelta64('NaT').

As a side-effect, this will make isclose and allclose usable for datetime64 and timedelta64 objects.

In [97]: numpy.isfinite(numpy.datetime64(10**9, 's'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-97-8555b18406e5> in <module>()
----> 1 numpy.isfinite(numpy.datetime64(10**9, 's'))

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
@shoyer
Copy link
Member

shoyer commented Oct 14, 2015

I made some progress on this today: cc99cdd /
https://github.com/shoyer/numpy/tree/datetime64-isfinite

It works for unit-less datetime64/timedelta64, but fails for dates or times with units attached:

In [1]: import numpy as np

In [2]: np.isfinite(np.timedelta64(10))
Out[2]: True

In [3]: np.isfinite(np.timedelta64('NaT'))
Out[3]: False

In [4]: np.isfinite(np.timedelta64('NaT', 'us'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-3998f88199cc> in <module>()
----> 1 np.isfinite(np.timedelta64('NaT', 'us'))

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

It appears that this needs something with the ufunc dtype resolution to handle datetimes with units, but I'm not quite sure what that is. Somehow PyUFunc_SimpleBinaryComparisonTypeResolver and PyUFunc_SimpleUnaryOperationTypeResolver can handle datetime64 with units but the default PyUFunc_DefaultTypeResolver cannot.

@seberg
Copy link
Member

seberg commented Jan 21, 2016

@shoyer I was wondering whether you got somewhere with this? I fear that if we change the NaT comparison handling, we probably cannot avoid implementing isnan and isfinite for it.
Otherwise even our own tests should not like them (though of course we could work around that).

EDIT: See also gh-5222

@shoyer
Copy link
Member

shoyer commented Jan 22, 2016

@seberg nope, I got stuck on ufunc type resolution (in the linked branch). I do agree this would be useful, though I usually find checking for x != x is a good way to identify NaN/NaT even without a dedicated function.

@rabernat
Copy link

I just hit this bug. I hope it can be fixed.

@ssanderson
Copy link

ssanderson commented Jul 20, 2016

I took a look at @shoyer's branch at cc99cdd. The issue seems to be that the default type resolver ends up invoking linear_search_type_resolver, which essentially executes something like:

for type in ufunc.registered_types:
    if ufunc_loop_matches(input_data, type, ...):
       <use type>

(the actual code here is https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_type_resolution.c#L1867).

For at least the scalar case, ufunc_loop_matches essentially boils down to a call to PyArray_CanCastTypeTo: https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_type_resolution.c#L1529-L1535 with a casting rule of 'safe'.

The issue ends up being that the input dtype is datetime64[<unit>] or timedelta64[<unit>], but the type registered for the ufunc is just vanilla datetime64 or timedelta64. Since casting from a datetime with a unit to one without a unit isn't deemed, safe, we decide that we don't have a ufunc registered for the input data. (This is why the dispatch works correctly on unit-less datetimes/timedeltas.).

I think the easiest fix here is probably to just add a new ufunc type resolver for isfinite. It doesn't seem worthwhile to try to shoehorn datetime/timedelta-handling logic into the default resolver, since almost all ufuncs that work with datetimes/timedeltas seem to need their own custom behavior to deal with unit conversion (there are already custom resolvers used for add, subtract and multiply for example.

@seberg
Copy link
Member

seberg commented Jul 20, 2016

@ssanderson would it also work to register the datetime types as types to np.isfinite and np.isnan (assuming we are OK using those names)? Honestly, have not tried to understand the issues yet really.

If you ahve some time, we all always appreciate pull requests!

@ssanderson
Copy link

@seberg I don't think it's possible to register ufuncs for the specific unit. Looking at the signature for PyUFunc_FromFuncAndData, it looks like you can only provide type numbers, but the units are metadata attached to the type number.

@ssanderson
Copy link

I made some progress on a custom resolver last night; I'll see if I can tie off that work tonight.

@shoyer
Copy link
Member

shoyer commented Jul 21, 2016

It's possibly worth figuring out what path in ufunc_type_resolution isnan/isfinite normally take (for float data) and why that doesn't work for datetime64.

@ssanderson
Copy link

ssanderson commented Jul 21, 2016

@shoyer I think my long post above outlines the path they take.

The root issue is that part of (at least the generic case) of ufunc resolution is checking whether the input dtype can be safely cast to a dtype for which there's a registered ufunc for the function being dispatched. ufuncs that work on datetime/timedelta are always just registered with a generic dtype, and it's not considered safe to cast from datetime64[<unit>] to generic datetime64 (which seems correct in general?) which means that, as far as I can tell, every ufunc that works on datetime/timedelta needs a custom resolver to work correctly.

The only exception seems to be ufuncs dispatched with PyUFunc_SimpleBinaryComparisonTypeResolver, which directly invokes PyArray_ResultType to figure out how to coerce the input, and then always yields NPY_BOOL for the output. Something similar, e.g. a PyUFunc_SimpleUnaryPredicateTypeResolver for functions that always yield NPY_BOOL would probably work for isfinite, and isnan if we wanted to make that work for datetime (I'd personally be 👎 on that: I'd rather have a generic isnull than have isnan work on non-floating values).

@ssanderson
Copy link

@shoyer specifically, the reason we end up trying to check coercions between a generic datetime and a unit-having datetime is that we get our target type in ufunc_loop_matches by calling PyArray_DescrFromType on the typenum being checked. The dtype created there has no unit set, so it behaves like a generic datetime when we try to check coercions against unit-aware inputs later on in the function.

So now I'm wondering if maybe the right fix here is to check if we're resolving a datetime/timedelta and if so copy the metadata from our input to the temporary dtype used for the casting check...

@ssanderson
Copy link

Here's an attempt that gets the isfinite tests working for datetime and friends: #7856.

I'm not totally sure whether the short-circuit on equal typenums I added is safe. The alternative, I think, is to figure out how to correctly populate the temp dtype's metadata by copying it from the input array. As far as I can tell, that requires a new publicly-visible function in numpy_api.py. At the very least, I couldn't figure out how to call a private function defined in multiarray from a function defined in umath.

@pwolfram
Copy link
Contributor

@ssanderson, do we have any movement on this issue? It would be great if we could get this in sometime soon because it is affecting our workflow (MPAS-Dev/MPAS-Analysis#269). Can we get some idea of what is needed to resolve this for the short-term? For instance, #7856 has gone stale-- what help (if any) is needed to revitalize the PR? Or, is an alternative approach needed?

@seberg
Copy link
Member

seberg commented Nov 17, 2017

By now we have np.isnat, a PR is probably good. It might be that you can steal almost everything from there. Dunno if there is any discussion necessary about whether or not we should add this, but allowing it in isfinite seems OK.

@starfleetjames
Copy link

starfleetjames commented Dec 22, 2017

Bump. Ran into np.isnan failing on pd.datetimeindex, tried converting it to a datetime, pandas.Timestamp, and np.datetime64 but same issue.

@willgdjones
Copy link

Yep, just ran into this issue today too.

@mhvk
Copy link
Contributor

mhvk commented Mar 14, 2018

I think what this requires is including the datetime and timedelta dtypes in the loops understood by isfinite, with the loops themselves essentially copied from isnat (but obviously with the opposite result). In principle, not difficult, though I labeled it as "intermediate" since anything to do with ufuncs is non-trivial.

@madphysicist
Copy link
Contributor

@mhvk I've decided to try to solve this issue. It's high time I learned about ufuncs and the C API and I would like to use this as a stepping stone to make it possible to histogram datetimes.

@mhvk
Copy link
Contributor

mhvk commented Mar 27, 2018

Great!

@mattip
Copy link
Member

mattip commented Sep 4, 2018

There is some discusssion around the idea that ufunc inner loops should have access to the operand dtypes. It is not clear how to do this yet in a backward-compatible fashion, perhaps extending the signature of PyUFuncGenericFunction to void loopfunc(char** args, npy_intp* dimensions, npy_intp* steps, void* data, PyArray_Descr * dtypes)? Then datetime64 loops could acces the units. Theoretically, we could repurpose the rarely-used data argument, but that would break code like PyUFunc_e_e and friends, which are used in downstream projects.

Edit: add link to PyUFuncGenericFunction

@seberg
Copy link
Member

seberg commented Sep 4, 2018 via email

@mattip
Copy link
Member

mattip commented Sep 4, 2018

@seberg

Then that function could also set a data void*

Do you mean the dtype resolution function? Yes, I think I see where you are going with that, the resolution function could set the return dtype and also set the data pointer to a specific function as determined from the operand dtypes, this specific function would be called in the inner loop like PyUFunc_e_e and friends, as in the end of this example but not hard-coding the data pointer.

Error handling: the specific function could set a PyErr exception, and abort the inner loop function and then we could check after the loop execution PyErr_Occurred to raise the exception in the ufunc machinery. Note there is no return value from the inner loop functions, so there is no direct way to know that a loop erred out. This does in some way break the ufunc design that inner loops are not python-capi-aware, but I don't know if that was an explicit design goal.

@seberg
Copy link
Member

seberg commented Sep 4, 2018 via email

@mhvk
Copy link
Contributor

mhvk commented Sep 4, 2018

My sense is that there is no real need to adjust the inner loops themselves for dealing with types - there is already the general purpose *data (I've used it for a prototype chained ufunc); so far, the only real weakness I have found with the whole ufunc mechanism is the inability to assign output arrays, but I think all that requires is moving the hooks for type resolution up earlier in the code, and allowing code to do more in that hook (using the type resolver also seems to have been the conclusion in this thread).

One part where a standard for passing to inner loops might be more useful is for where - working via a write-back buffer prevents meaningful use of where in reductions.

(It does seem we're a bit far off track of the actual issue here! It would seem useful to collect cases where the current ufunc structure doesn't work so well in a wiki; for what it is worth, for units I think chained ufuncs are the way to go, i.e., one calls an upper inner loop which calls more standard inner loops below).

@concretevitamin
Copy link

Just got hit by this bug. My use case is simple: np.histogram(ndarray_of_datetime64). Without being able to do this, I basically have no reason to use the datetime64 support. Please fix!

@eric-wieser
Copy link
Member

I think #12988 might address this.

@eric-wieser
Copy link
Member

eric-wieser commented Mar 5, 2019

@concretevitamin: np.histogram will work fine as long as you compute the bins for it yourself. There are more things blocking that than just nan-checking. linspace is another.

@concretevitamin
Copy link

@eric-wieser Sure - make ndarray of datetime64 function more like first-class citizen will be welcome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.