Skip to content

np.median behavior broken for array subclasses in 1.8.0rc1 #3846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
astrofrog opened this issue Oct 1, 2013 · 15 comments · Fixed by #3851
Closed

np.median behavior broken for array subclasses in 1.8.0rc1 #3846

astrofrog opened this issue Oct 1, 2013 · 15 comments · Fixed by #3851
Milestone

Comments

@astrofrog
Copy link
Contributor

Previously, when writing Numpy array subclasses, it was possible to override the behavior of np.mean and np.median by defining a mean method on the sub-class. For example,

import numpy as np

print np.__version__

class MySubClass(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_prepare__(self, obj, context=None):
        print "IN PREPARE"
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __array_wrap__(self, out_arr, context=None):
        print "IN WRAP", context
        return np.ndarray.__array_wrap__(self, out_arr, context)

    def mean(self, axis=None, dtype=None, out=None):
        print "IN MEAN"

    def median(self, axis=None, dtype=None, out=None):
        print "IN MEDIAN"

a = MySubClass([1,2,3])
np.median(a)

produced the following output:

1.7.1
IN MEAN

However, in 1.8.0rc1, it looks like none of the methods shown above get called:

1.8.0rc1

This means that it is no longer possible to override the behavior of median, which makes things like quantity handing with units break. For instance:

In [1]: import quantities as u 

In [2]: import numpy as np

In [4]: np.median(np.array([1,2,3]) * u.m)
Out[4]: 2.0

In [5]: np.mean(np.array([1,2,3]) * u.m)
Out[5]: array(2.0) * m

or

In [3]: import numpy as np

In [4]: from astropy import units as u

In [5]: np.median(np.array([1,2,3]) * u.m)
Out[5]: 2.0

In [6]: np.mean(np.array([1,2,3]) * u.m)
Out[6]: <Quantity 2.0 m>

It would be great if this could be fixed by 1.8.0 final, as this will otherwise break a lot of unit-handing code.

@njsmith
Copy link
Member

njsmith commented Oct 1, 2013

Hmm, sounds RC to me.

Thomas: do you really mean that you want to be able to define a .mean
method in order to override .median? I don't understand this part.

@jtaylor: did we just forget about ndarray.median when putting in the new
percentile code, or what...? I guess np.median should call ndarray.median
should call percentile?
On 1 Oct 2013 07:57, "Thomas Robitaille" notifications@github.com wrote:

Previously, when writing Numpy array subclasses, it was possible to
override the behavior of np.mean and np.median by defining a mean method
on the sub-class. For example,

import numpy as np

print np.version

class MySubClass(np.ndarray):

def __new__(cls, input_array, info=None):
    obj = np.asarray(input_array).view(cls)
    obj.info = info
    return obj

def __array_prepare__(self, obj, context=None):
    print "IN PREPARE"
    return obj

def __array_finalize__(self, obj):
    if obj is None: return
    self.info = getattr(obj, 'info', None)

def __array_wrap__(self, out_arr, context=None):
    print "IN WRAP", context
    return np.ndarray.__array_wrap__(self, out_arr, context)

def mean(self, axis=None, dtype=None, out=None):
    print "IN MEAN"

def median(self, axis=None, dtype=None, out=None):
    print "IN MEDIAN"

a = MySubClass([1,2,3])
np.median(a)

produced the following output:

1.7.1
IN MEAN

However, in 1.8.0rc1, it looks like none of the methods shown above get
called:

1.8.0rc1

This means that it is no longer possible to override the behavior of
median, which makes things like quantity handing with units break. For
instance:

In [1]: import quantities as u

In [2]: import numpy as np

In [4]: np.median(np.array([1,2,3]) * u.m)
Out[4]: 2.0

In [5]: np.mean(np.array([1,2,3]) * u.m)
Out[5]: array(2.0) * m

or

In [3]: import numpy as np

In [4]: from astropy import units as u

In [5]: np.median(np.array([1,2,3]) * u.m)
Out[5]: 2.0

In [6]: np.mean(np.array([1,2,3]) * u.m)
Out[6]: <Quantity 2.0 m>

It would be great if this could be fixed by 1.8.0 final, as this will
otherwise break a lot of unit-handing code.


Reply to this email directly or view it on GitHubhttps://github.com//issues/3846
.

@astrofrog
Copy link
Contributor Author

@njsmith - in Numpy < 1.8, when calling np.median, if a mean method was defined on the array sub-class, it was called with either the central value of the array (if there was an odd number of elements) or the two central values (if there was an even number of elements). Ideally it would be nice to keep the same behavior so as not to break compatibility, but if this is too convoluted, then at least it would be nice if np.median checks if the sub-class defines a median method and calls that, as it does for np.mean and np.var (with mean and var methods respectively). Does this make sense?

@astrofrog
Copy link
Contributor Author

Similarly, it would be great if np.percentile checked for the presence of a percentile method.

@rgommers
Copy link
Member

rgommers commented Oct 1, 2013

I would say that all the code depending on the previous behavior relied on an implementation detail. If a lot of it breaks though, it can't hurt to keep the call to the mean method at the end of median.

We shouldn't add new things like this though, like you propose for percentile. There's a clear pattern, functions call their corresponding methods if and only if those methods exist on ndarray.

@astrofrog
Copy link
Contributor Author

@rgommers - this currently breaks the quantities code and astropy.units, so at least reinstating the previous behavior for median would be great. For the long term, is the only solution to monkey-patch functions such as np.percentile, etc. to do the right thing? Or does Numpy provide a mechanism for doing this more cleanly?

@rgommers
Copy link
Member

rgommers commented Oct 1, 2013

Why monkey-patch? If you have code that accepts objects with and without a percentile method, then just write:

try:
    result =data.percentile()
except AttributeError:
    result = np.percentile(data)

Since ndarray doesn't have that method, I don't think the above should be in numpy.

@astrofrog
Copy link
Contributor Author

@rgommers - things are a little more complicated because the issue is that quantities and astropy.units define quantity classes that are basically sub-classes of Numpy arrays. As one of the developers of Astropy, I want things like np.mean(np.array([1,2,3]) * u.m) to work seamlessly, which they do. All ufuncs can also be supported because things like __array_wrap__ and __array_prepare__ get called. However, for non-ufuncs (except mean, var, and before 1.8.0rc1 median), the units will be silently dropped if a user does np.percentile(np.array([1,2,3]) * u.m, 10), unless we monkey-patch np.percentile. The use case I am thinking of is not my own code, where I know which functions will and won't drop units, but users who may not realize that some functions support units, and some can't.

@rgommers
Copy link
Member

rgommers commented Oct 1, 2013

But then your problem is every non-ufunc function your users ever want to use right? So why change just percentile?

@pv
Copy link
Member

pv commented Oct 1, 2013

So the question is whether percentile should take measures to preserve array subclass, using the same mechanisms as ufuncs. The answer to this is probably yes,

@rgommers: the motivation for that is probably to avoid a change of this behavior in median.

@astrofrog: also, an unrelated issue: please take a look at the __numpy_ufunc__ mechanism that will be added in Numpy 1.9. There's still several months to propose and implement modifications to it. (Further discussion on this topic on the mailing list, not this issue, thanks.)

@astrofrog
Copy link
Contributor Author

@rgommers - indeed, we would need to monkey-patch many of the Numpy non-ufuncs because they do not preserve sub-classes - I was just using percentile as an example. Ideally, __array_prepare__ and __array_wrap__ should be called for all Numpy functions, not just ufuncs, but there must be a reason why this wasn't done?

@pv thanks for the note about __numpy_ufunc__ - we'll look into it!

@seberg
Copy link
Member

seberg commented Oct 1, 2013

@astrofrog, I doubt there is a big reason for that (though there may well be, I do not follow numpy for long). Half the time it was likely forgotten that it makes sense to call __array_wrap__ for a specific function (some few functions do after all). Though I must say I find the problem with subclasses that mess with the shape, such as matrix, annoying.

The subclassing situation does not seem great, but I guess the bigger subclassing packages will probably have to help if it is to be improved substantially, just like scipy.sparse now gave us __numpy_ufunc__, which is a big step forward.

@njsmith
Copy link
Member

njsmith commented Oct 1, 2013

In the long term, the way forward for 'units' is to improve custom dtype
support and switch to using dtypes-with-units in a plain-vanilla ndarray.
That's just a much better tool for solving the problem. I don't see how to
make ndarray subclass support really solid and consistent in the long term
regardless. Moving everything to methods on ndarray is a boil-the-ocean
solution. Anyway this is a long way of saying that I agree with Ralf that
we shouldn't mess about trying to fix percentile.

But, in the short term it does seem like we should come up with some
duct-tape hack to keep np.median working for astropy in 1.8 though.

Looking at the diff between 1.7.1 and 1.8, median doesn't actually call
percentile -- I think the difference is just that we now call np.array at
the top of the function. Would just doing np.asanyarray instead fix this?
@jtaylor, any thoughts?

On Tue, Oct 1, 2013 at 10:49 AM, seberg notifications@github.com wrote:

@astrofrog https://github.com/astrofrog, I doubt there is a big reason
for that (though there may well be, I do not follow numpy for long). Half
the time it was likely forgotten that it makes sense to call
array_wrap for a specific function (some few functions do after all).
Though I must say I find the problem with subclasses that mess with the
shape, such as matrix, annoying.

The subclassing situation does not seem great, but I guess the bigger
subclassing packages will probably have to help if it is to be improved
substantially, just like scipy.sparse now gave us numpy_ufunc, which
is a big step forward.


Reply to this email directly or view it on GitHubhttps://github.com//issues/3846#issuecomment-25437028
.

@astrofrog
Copy link
Contributor Author

@njsmith - I agree about the dtype, in fact we reached the same idea as a long-term solution.

For now, simply fixing np.median would be great!

@juliantaylor
Copy link
Contributor

using asanyarray instead of asarray is seems to work fine. I'll fix that.

@juliantaylor
Copy link
Contributor

note that median, percentile and sort will (hopefully) get extended axis support and (the first two) the keepdims keyword argument in 1.9.
I guess subclasses should be prepared to handle that too if they want to keep being drop in replacements.

@njsmith njsmith closed this as completed in e6f0c90 Oct 2, 2013
charris pushed a commit to charris/numpy that referenced this issue Oct 2, 2013
shoyer added a commit to shoyer/numpy that referenced this issue May 26, 2019
I think this is an important warning to include for subclass authors.
Otherwise, we will be expanding our exposure of internal APIs as part of
``__array_function__``. All things being equal, it's great when things "just
work" subclasses, but I don't want to guarantee it. In particular, I would be
very displeased if  ``__array_function__`` leads to NumPy adding more subclass
specific hacks like always calling ``mean()`` inside ``median()`` (numpyGH-3846).

mhvk: please take a look.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants