NEP 13 should mention how we handle unknown scalar types #12258

Open
shoyer opened this issue Oct 24, 2018 · 5 comments
@shoyer
Member

shoyer commented Oct 24, 2018

This section of NEP 13 describes how __array_ufunc__ interacts with Python's binary operations:
http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#behavior-in-combination-with-python-s-binary-operations

But it doesn't fully describe what happens when an object of unknown type that implements arithmetic (e.g., decimal.Decimal) but doesn't implement any NumPy special methods like __array_ufunc__ or __array_priority__ is encountered in arithmetic with a NumPy array:

  1. Assuming the unknown type properly returns NotImplemented from binary operations for unrecognized arguments, the NumPy ufunc will get called.
  2. The NumPy ufunc (e.g., np.add) will coerce the unknown object into a scalar NumPy array with object dtype.
  3. The scalar array gets broadcast to the same shape as the NumPy array.
  4. The NumPy ufunc loop (for object dtype) calls the Python operator on each element of the two arrays.
  5. The result is another object dtype array.
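The steps above can be checked with a small sketch. It assumes an integer array, because the object loop ends up calling the Python operator on each pair, and Decimal only supports arithmetic with int (mixing it with float raises TypeError). The variable names are illustrative only.

```python
import decimal

import numpy as np

arr = np.array([1, 2, 3])
dec = decimal.Decimal("1.5")

# Step 2: coercing the unknown object yields a 0-d array with object dtype.
coerced = np.asarray(dec)

# Forward operation: ndarray.__add__ calls np.add(arr, dec).
forward = arr + dec

# Reflected operation: Decimal.__add__ returns NotImplemented for an ndarray,
# so Python falls back to ndarray.__radd__, which again calls np.add.
reflected = dec + arr

print(coerced.dtype, coerced.shape)
print(forward.dtype, forward.tolist())
print(reflected.dtype, reflected.tolist())
```

Both results are object dtype arrays whose elements are Decimal instances, which matches step 5.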

If this seems correct, I will make a minor addendum to add this clarification. This is particularly valuable as an example to other projects (e.g., see pandas-dev/pandas#23293) of the right way to implement arithmetic like NumPy.

@jorisvandenbossche
Contributor

> The NumPy ufunc (e.g., np.add) will coerce the unknown object into a scalar NumPy array with object dtype.

One question here: how does numpy decide whether to coerce the unknown object or rather return NotImplemented?

Because at this point, you need to decide: is this a scalar-like object for which we want to apply the ufunc element-wise with the objects in the array (call the ufunc), or is it rather an unknown container-like object to which we want to dispatch the ufunc (return NotImplemented)? This is the aspect I am unsure about in the pandas case.

@shoyer
Member Author

shoyer commented Oct 24, 2018

> One question here: how does numpy decide whether to coerce the unknown object or rather return NotImplemented?

  • If other.__array_ufunc__ is None (recommended) or other.__array_priority__ > self.__array_priority__ (for backwards compatibility), then NumPy returns NotImplemented. Control reverts to Python, and either other handles the arithmetic operation or Python raises TypeError.
  • Otherwise, NumPy calls the ufunc (and is guaranteed to not return NotImplemented):
    • If other defines __array_ufunc__, it has the opportunity to override the ufunc's behavior.
    • If other does not define __array_ufunc__, the normal ufunc behavior is invoked. This does coercion to 0d object arrays and applies the ufunc elementwise (as described in my first post).
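A minimal sketch of these branches, using three hypothetical classes (OptsOut, OptsOutNoRadd and PlainScalar are made up for illustration and are not part of NumPy):

```python
import numpy as np


class OptsOut:
    """Hypothetical type that disables ufuncs and handles arithmetic itself."""
    __array_ufunc__ = None

    def __radd__(self, other):
        return "OptsOut.__radd__"


class OptsOutNoRadd:
    """Hypothetical type that disables ufuncs but defines no reflected method."""
    __array_ufunc__ = None


class PlainScalar:
    """Hypothetical scalar-like type with no NumPy special methods."""

    def __radd__(self, other):
        return f"radd({other})"


arr = np.arange(3)

# ndarray.__add__ returns NotImplemented, so Python calls OptsOut.__radd__.
deferred = arr + OptsOut()

# Nobody handles the operation, so Python raises TypeError.
try:
    arr + OptsOutNoRadd()
except TypeError:
    raised_type_error = True
else:
    raised_type_error = False

# No __array_ufunc__ at all: the object is coerced to a 0-d object array and
# the Python operator is applied to each element.
elementwise = arr + PlainScalar()

print(deferred)
print(raised_type_error)
print(elementwise.dtype, elementwise.tolist())
```

Note that PlainScalar.__radd__ is called once per element rather than once with the whole array, which is the difference between the two outcomes discussed here.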

@jorisvandenbossche
Contributor

> If other defines __array_ufunc__, it has the opportunity to override the ufunc's behavior.

So probably we should do something similar for pandas? For object dtype, instead of checking against a specific list of handled types (as in the NDArrayOperatorsMixin example implementation), we could check whether an unknown object defines __array_ufunc__, and in that case return NotImplemented; otherwise we apply the ufunc to the underlying values.
So I think this is basically what I proposed on the pandas PR earlier (pandas-dev/pandas#23293 (comment)):

> We could check for objects that implement __array_ufunc__ themselves? If they have the attribute, and are not known to us (not in our _HANDLED_TYPES list, e.g., dask or xarray objects), we return NotImplemented, otherwise we pass through.

And then for object dtype, we would not return NotImplemented for anything else, but actually apply ufunc.
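As a rough sketch of that check, assuming a hypothetical _HANDLED_TYPES tuple and a made-up helper name should_defer (neither is taken from pandas or NumPy):

```python
import decimal
import numbers

import numpy as np

# Hypothetical list of types the wrapper knows how to handle itself.
_HANDLED_TYPES = (np.ndarray, numbers.Number)


def should_defer(obj):
    """Return True if obj overrides ufuncs and is not a type we know about."""
    return (hasattr(type(obj), "__array_ufunc__")
            and not isinstance(obj, _HANDLED_TYPES))


class ForeignArray:
    """Hypothetical third-party container that implements __array_ufunc__."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented


print(should_defer(ForeignArray()))         # unknown container: defer
print(should_defer(decimal.Decimal("1")))   # unknown scalar: apply the ufunc
print(should_defer(np.arange(3)))           # known type: apply the ufunc
```

Under this rule an unknown scalar such as decimal.Decimal is passed through to the ufunc, while an unknown container that participates in the __array_ufunc__ protocol gets a chance to handle the operation.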

@shoyer
Member Author

shoyer commented Oct 24, 2018

It does seem like it would be worthwhile to figure out the canonical example of how to write an __array_ufunc__ method that works exactly like numpy.ndarray (but adds a wrapper).

I think the right answer is a mix of the NDArrayOperatorsMixin example with the checking for __array_ufunc__ attributes from ndarray.__array_ufunc__:

import numpy as np

class ArrayLike(np.lib.mixins.NDArrayOperatorsMixin):
    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        out = kwargs.get('out', ())
        # Defer to any foreign type that overrides __array_ufunc__; ArrayLike
        # itself and plain ndarrays are handled below.
        for item in inputs + out:
            if (not isinstance(item, ArrayLike) and
                    hasattr(item, '__array_ufunc__') and
                    type(item).__array_ufunc__ is not np.ndarray.__array_ufunc__):
                return NotImplemented

        # Defer to the implementation of the ufunc on unwrapped values.
        inputs = tuple(x.value if isinstance(x, ArrayLike) else x
                       for x in inputs)
        if out:
            kwargs['out'] = tuple(
                x.value if isinstance(x, ArrayLike) else x
                for x in out)
        result = getattr(ufunc, method)(*inputs, **kwargs)

        # FIXME: should also check ufunc.signature to ensure it's not a gufunc
        if type(result) is tuple:
            # multiple return values
            return tuple(type(self)(x) for x in result)
        elif method == 'at':
            # no return value
            return None
        else:
            # one return value
            return type(self)(result)

    def __repr__(self):
        return '%s(%r)' % (type(self).__name__, self.value)

Note that NDArrayOperatorsMixin automatically implements Python special methods like __add__ that correctly check whether other.__array_ufunc__ is None, e.g.,

def _disables_array_ufunc(obj):
    try:
        return obj.__array_ufunc__ is None
    except AttributeError:
        return False

class NDArrayOperatorsMixin:
    ...
    def __add__(self, other):
        if _disables_array_ufunc(other):
            return NotImplemented
        return np.add(self, other)
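The effect of that check can be seen with a deliberately stripped-down wrapper. Wrapped and OptsOut below are hypothetical classes written for this illustration; Wrapped omits the out handling and the deferral loop from the fuller example, so it is a sketch rather than a recommended implementation.

```python
import decimal

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class Wrapped(NDArrayOperatorsMixin):
    """Hypothetical minimal wrapper: unwrap inputs, apply the ufunc, rewrap."""

    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        inputs = tuple(x.value if isinstance(x, Wrapped) else x
                       for x in inputs)
        return Wrapped(getattr(ufunc, method)(*inputs, **kwargs))


class OptsOut:
    """Hypothetical type that sets __array_ufunc__ = None."""
    __array_ufunc__ = None

    def __radd__(self, other):
        return "OptsOut.__radd__"


w = Wrapped([1, 2])

# The mixin's __add__ sees __array_ufunc__ is None and returns NotImplemented,
# so Python calls OptsOut.__radd__.
deferred = w + OptsOut()

# An unknown scalar without NumPy special methods goes through np.add and
# ends up in an object dtype array inside the wrapper.
with_decimal = w + decimal.Decimal("0.5")

print(deferred)
print(type(with_decimal).__name__, with_decimal.value.dtype, with_decimal.value.tolist())
```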

@shoyer
Member Author

shoyer commented Oct 25, 2018

Cc @mhvk @njsmith
