-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
[ENH](WIP) new ufunc implementations for object arrays #6320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
EDIT: This comment is obsolete. See comment #6320 (comment) below for a more up-to-date description Here is what this PR does. (This can go into Docs later) ufuncs and object arraysUfuncs operate specially on objects arrays, since the objects contained in the array may not be numpy types. To evaluate a ufunc numpy tries a series of strategies in order for each element of the array (for unary ufuncs) or for each pair of elements from the arrays (for binary ufuncs):
If no step succeeds, a This provides a rough way of creating new numeric types compatible with numpy, as you can define a class which implements all ufuncs missing from step 1 as methods, and then create an object array containing elements of your type. However, note that creating a user-defined type (see "User-Defined Types") is often preferrable. User defined types will be much faster and you will have control over casting and coercion. |
Can you post this to the list, so more eyes will see it?
|
Sure. Also:
|
numpy/core/_objectmath.py
Outdated
|
||
def signbit(x): | ||
if isinstance(x, complex): | ||
return cmath.copysign(1, x) < 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmath.copysign
isn't a thing.
#368 has some earlier discussion of this idea. |
I hate to do this, but my question is: what about for example decimals? |
That's exactly the kind of thing I was worried about missing. I hadn't thought of Decimals. It looks like they are already mostly handled.
Some ufuncs could be improved, eg using the But, should we even try to support Decimal? It doesn't play well with the other numeric types. For example, |
The thing I am worried about is not about them not working (I am fine with that really). What I have a bit of a bad feeling about is silently casting decimals to float, which to me feels wrong (it is an unsafe cast). Ideally, you we would probably make it easy to create a full dtype for it, but the "object" dtype should play well with them in any case. In other words, I am not quite comfortable with calling into |
The same problem arises for people using the mpmath types. My original thought was that if an object implemented But now I am thinking you are right, it would be confusing. Maybe we should have something like
(Note that |
Also, what do you think of fallback implementation in terms of other ufuncs? Eg, what if we have
where On the plus side, it means the user doesn't need to implement |
It sounds like a lot of the use cases that this is aiming at would be
|
Updated. Everyone's comments make me think a lot of ufuncs should just fail for types that aren't builtin (bool/int, long, float or complex), so now they raise Ufuncs which exist to increase precision ( |
@njsmith, while I do think that a lot of this would fit much better with some better dtype system, we should have reasonable stuff in place for plain I think it might be better if we do not use isinstance, but actually
So my question is: unless it is completely obvious that one thing is right (sign seems fine to me for example, though then again, strings ^^, so maybe a "number" check....), how about we always check the type exactly. That feels pretty future proof to me in terms of not breaking down at some point and does probably solve almost all real live cases?! |
I think you're right about not using isinstance. That will be a problem for mpmath types too, eg Also
|
see #1848 |
Ah! But now that I get down to implementing it, is this what we want?
I would expect +1 based on wikipedia's definition since Edit: Oh, I think this is Python's problem, not numpy's. |
@ahaldane: |
But it probably would also be fine to define it in terms of whatever limiting procedure, which might be less surprising for some cases. |
@pv, that makes sense. But also, in python and numpy I can specify the type of infinity:
But this is not very accurate then:
|
31998e0
to
b15f145
Compare
All right, I updated all the ufunc implementations based on recent discussion, fixed small bugs in the C code, and also added a rough unit test. The unit test checks that the ufunc works similarly for numpy types and python builtin types. However, it turns out a bunch of my ufunc implementations failed the test. I fixed some of them, but others I've just disabled for now in the unit test. See the comments in the unit test. |
If we are only going to handle
|
Interestingly, that definition also makes sense for ndarrays and eventually
|
Actually, I think it would likely make sense to define "object" arrays in the sense of "try to call the type specific ufunc -- if we know it -- on each element" + some fallback (unless they map to a python operator maybe). |
For most objects that support both + and np.add they should do the same
|
Too bad I refuse to read that discussion ;). Well, I think one reason to not have it the same would be to have elementwise vs. some other logic (such as concatenate)? In that sense, I think if we call Which actually gets me philosophizing. Are maybe containers just dtype-scalars in some sense? Is there actually any difference between how a ufunc knows how to work on a single element and how |
5f973f6
to
c1740de
Compare
rebased. Still needs more work in the tests. Now that the |
if (ret != NULL) { | ||
arr = (PyArrayObject*)PyArray_FromScalar(ret, NULL); | ||
Py_DECREF(ret); | ||
ret = PyArray_GETITEM(arr, PyArray_DATA(arr)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pure-python float and complex objects we call the ufunc on the object, which converts to a numpy type, eg float -> float64. Then we convert back to python types by calling .item()
, but this method only exists on arrays so we need to convert the scalar to an array first, then call PyArray_GETITEM
.
return bool(x and y) | ||
|
||
def logical_or(x, y): | ||
return bool(x or y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave the bool
to a separate PR, since that's a behaviour change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I want to go even further the other way. logical_or
is documented to guarantee it returns a boolean array as output.
I've already forced isinf
, isnan,
isfinite
and signbit
to return booleans, but it looks like I forgot to do so for the logical_
ufuncs (in generate_umath.py
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's true maybe I can leave that for another PR... if I go all the way through with that idea I'd also have to make the comparison operators (less
and so on) return bool arrays.
Edit: Never mind I forgot: I already made less
and othe comparisons return boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be much keener to leave behaviour changes to a second PR, if possible - in particular, we might want a deprecation cycle, or even a way to explicitly request the old behaviour. But I guess we can review it all together, and split it up once everything else looks good.
numpy/core/src/umath/loops.c.src
Outdated
@@ -2877,8 +2880,13 @@ OBJECT_@func@(char **args, npy_intp *dimensions, npy_intp *steps, void *data) | |||
return; | |||
} | |||
|
|||
#if @out_bool@ | |||
*((npy_bool *)op1) = PyObject_IsTrue(ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handles -1
badly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just break the loop, as we do above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, will fix
if isint(x): | ||
return nfunc(float(x)) | ||
raise TypeError(_notimplemented_msg(name, type(x))) | ||
return mathfunc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: set mathfunc.__name__ = nfunc.__name__
), | ||
'power': | ||
Ufunc(2, 1, None, | ||
docstrings.get('numpy.core.umath.power'), | ||
None, | ||
TD(ints), | ||
TD(inexact, f='pow', astype={'e':'f'}), | ||
TD(O, f='npy_ObjectPower'), | ||
TD(O, f='PyNumber_Power'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look safe to me - we don't pass enough arguments to PyNumber_Power
in the loop - npy_ObjectPower
is there to add an extra Py_None
argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, fixing
exp, expm1, log, log10, log1p, sqrt, ceil, trunc, fabs, floor, | ||
arctan2, conjugate, square, reciprocal, _ones_like, sign, degrees, | ||
rad2deg, radians, deg2rad, exp2, log2, cbrt, rint, spacing# | ||
#out_bool = 1, 1, 1, 1, 1, 0*37# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan of this information being duplicated in python and C
npy_cache_import("numpy.core._objectmath", "@func@", &fallback); | ||
if (fallback == NULL) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put these cache imports, and/or the function name, in data
instead? Then we'd only need one loop function, rather than one for each ufunc (or perhaps two - one for boolean return values)
NPY_NO_EXPORT int | ||
_is_float_complex(PyObject *op) | ||
{ | ||
if (PyFloat_Check(op) || PyComplex_Check(op)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work for objects that define __float__
or __complex__
methods.
|
||
/* Emulates Python's 'not b' behavior */ | ||
static PyObject * | ||
npy_ObjectLogicalNot(PyObject *i1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not modify these to return proper numpy bools instead of python objects instead of getting rid of them?
@@ -2189,6 +2189,8 @@ def norm(x, ord=None, axis=None, keepdims=False): | |||
|
|||
if not issubclass(x.dtype.type, (inexact, object_)): | |||
x = x.astype(float) | |||
elif issubclass(x.dtype.type, object_): | |||
x = x*1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding support for anything with __float__
would likely eliminate the need for this step.
if x == -1: | ||
return -1 | ||
if x == 0: | ||
return np.reciprocal(0).item() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives RuntimeWarning: divide by zero encountered in reciprocal
- was that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember the intention anymore, but maybe. Doesn't the corresponding int32/int64 ufunc raise the warning too? Then the object ufunc probably should as well.
np.reciprocal([1,0,1])
/usr/bin/ipython3:1: RuntimeWarning: divide by zero encountered in reciprocal
#!/usr/bin/python3
/usr/bin/ipython3:1: RuntimeWarning: invalid value encountered in reciprocal
#!/usr/bin/python3
array([ 1, -9223372036854775808, 1])
In discussion we did not feel that this should move forward as is. It would be best if this could be done outside numpy, perhaps via a new dtype or using one of the protocols somehow. It needs a champion and a clear use case. Closing, please reopen with a clear plan for moving it forward. |
This PR overhauls the ufunc implementations for object type, to follow the rules I will paste in the comment below.
At this point I am pasting it to see if it there is interest so it is worth effort to finish unit tests, and to discuss what the ufunc implementations should be. I still need to do tests, docs, and bug-hunting.
motivation
Currently many ufuncs do not work on object arrays (eg all transcendental functions), and some that do work can be improved (see recent discussion of
np.sign
). With this PR all (or most...) missing ufuncs are implemented using themath
andcmath
modules as a fallback.request for comments
I am sure I have missed funny edge cases in the ufunc implementations, so I would love more eyes looking at them. The current implementations are in
_objectmath.py
.Some issues in particular: I am not sure how to implement
np.nextafter
,np.spacing
ornp.cbrt
in pure python. Also, I changed the way thelogical_
ufuncs work: In current numpy,logical_and
(for example) behave likex and y
(Ie, it will return either x or y), while I changed it toTrue if (x and y) else False
(so it returns a boolean). The new way behaves more like the ufuncs behave for float/int arrays.Also, while we are updating the ufunc implementation, maybe this is the time for back-compat breaks. For example, should the fallback to (eg)
sqrt
be to call theelem.sqrt
method, or might we rename it toelem.npy_sqrt
to avoid name conflicts?