Skip to content

BUG allow subclasses in MaskedArray ufuncs -- for non-ndarray _data #3907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 17, 2015

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Oct 13, 2013

This is a follow-on to #3900, making two furthe changes to tryto reach the ultimate goal of MaskedArray behaving sensibly for subclasses.

  1. Ensure that for two-argument ufunc's, the data in the subclass are no longer converted to ndarray; rather, a possible subclass is kept. This is particularly relevant for my work on a MaskedQuantity class, where _data contains a Quantity which has not just a value (the ndarray) but also a unit.
  2. Avoid the creation of object arrays, so that non-array Classes with an __array_priority__ set and with a reverse method defined properly lead to NotImplemented (i.e., making the behaviour match that of regular arrays).

In order for this to work, I

  • removed subok=False in the getdata calls in the two BinaryOperation classes
  • slightly changed the method used to keep previous values when they are masked, including allowing this to fail (e.g., when I multiply two masked quantities in meters, the result is an area, and hence it is senseless to keep the original values, and quantities raise an appropriate UnitError). (Without this, the matrix subclass tests would not pass).
  • slightly changed the order, with result masks only calculated if the primary calculation succeeded.
  • slightly changed getdata to avoid the creation of object arrays, and getmaskedarray to deal with the consequences.

I also added a test case, which perhaps best makes clear why this is useful.

CC @charris -- I consider the above a correction of a bug, but realise this is subjective; it could also be ENH, even though the API does not change. It would be wonderful if this could still make it to 1.8.

@njsmith
Copy link
Member

njsmith commented Oct 13, 2013

I'm a little wary of putting in a change of this complexity into 1.8 at
this point - there's lots of room for unexpected consequences here...
On 13 Oct 2013 07:04, "Marten van Kerkwijk" notifications@github.com
wrote:

This is a follow-on to #3900 #3900,
making two furthe changes to tryto reach the ultimate goal of MaskedArraybehaving sensibly for subclasses.

  1. Ensure that for two-argument ufunc's, the data in the subclass are no
    longer converted to ndarray; rather, a possible subclass is kept. This is
    particularly relevant for my work on a MaskedQuantity class, where _datacontains a
    Quantity which has not just a value (the ndarray) but also a unit.
  2. Avoid the creation of object arrays, so that non-array Classes with an
    array_priority set and with a reverse method defined properly lead to
    NotImplemented (i.e., making the behaviour match that of regular arrays).

In order for this to work, I

  • removed subok=False in the getdata calls in the two BinaryOperationclasses
  • slightly changed the method used to keep previous values when they
    are masked, including allowing this to fail (e.g., when I multiply two
    masked quantities in meters, the result is an area, and hence it is
    senseless to keep the original values, and quantities raise an appropriate
    UnitError). (Without this, the matrix subclass tests would not pass).
  • slightly changed the order, with result masks only calculated if the
    primary calculation succeeded.
  • slightly changed getdata to avoid the creation of object arrays, and
    getmaskedarray to deal with the consequences.

I also added a test case, which perhaps best makes clear why this is
useful.

CC @charris https://github.com/charris -- I consider the above a
correction of a bug, but realise this is subjective; it could also be ENH,
even though the API does not change. It would be wonderful if this could

still make it to 1.8.

You can merge this Pull Request by running

git pull https://github.com/mhvk/numpy ma/allow-subclass-in-ufunc

Or view, comment on, or merge it at:

#3907
Commit Summary

  • Allow subclasses in MaskedArray ufuncs -- for non-ndarray _data

File Changes

Patch Links:

@rgommers
Copy link
Member

Proposed changes sound good to me.

Tested with scipy.stats.mstats, no issues.

Test failure is real, needs fixing.

@charris
Copy link
Member

charris commented Oct 13, 2013

I think it is too late for 1.8.0. However, it is good that masked arrays are getting a look over. There may be other parts that can be improved or fixed up using some of the machinery that has gone into numpy in the last few years. In particular, I'm wondering if the new __numpy_ufunc__ functionality in 1.9 can be used.

@mhvk
Copy link
Contributor Author

mhvk commented Oct 14, 2013

@njsmith, @rgommers, @charris - having spent several hours looking at the test failure, I now agree this PR is not for 1.8... The failure is actually a combination of two previous bugs canceling each other out (see below).
I made a separate PR to correct that (#3914); I also included that in this PR, so that the tests all pass again

The test that fails effectively does

import numpy as np; from numpy import isclose, nan, isnan
x = np.ma.masked_where([True, True, False], [nan, nan, nan])
y = isclose(nan, x, equal_nan=True)
# and at the very end of `isclose` (filling in vars),
cond = np.zeros_like(x, dtype=np.bool)  # net effect of quite a few lines
cond[isnan(x) & isnan(nan*np.ones_like(x, dtype=np.bool))] = True
assert all(cond._mask == [True, True, False])

Now the differences are:

  1. (nan*np.ones_like(x, dtype=np.bool))._data yields array([ 0., 0., nan]) in the current version and array([ nan, nan, nan]) in my PR. Of course, in both cases the first two elements are masked.
  2. Since a mask on the index for is ignored in __setitem__, and since for any items set also the mask is adjusted, the assignment to cond[isnan...] = True] by chance keeps the mask OK in the current version, and clears it for all elements for mine.

The basic problem is not so much what is done to masked values (FWIW, I think my PR improves that), but rather that masked values should not be used in the assignment. Indeed, there is a commented out check on the index in MaskedArray.__setitem__ -- but turning that on finds a number of other test failures... And while it should not be used, it is also good if it can be made to work as frequently as possible (#3914 attempts to do so).

@mhvk
Copy link
Contributor Author

mhvk commented Oct 14, 2013

While writing, may as well ask: ideally, I'd like to see that, e.g., masked_array * array_subclass returns masked_array_subclass if such a subclass is available (in my particular case, MaskedQuantity). In principle, this is not too difficult if the array subclass has a link to a masked version of itself, say an _masked_class attribute). Would something like that make sense? Is it fine to introduce such a standard? Or is there perhaps something in place already? (E.g., is one perhaps meant to use update_from? I'm a bit confused by that).

@charris
Copy link
Member

charris commented Oct 14, 2013

Looks like you merged master into this. That is generally not a good idea, better to rebase on master, and I think you should do that here: git rebase master. That should clean things up a bit. If things get really messed up, the other option is a new branch off current master and merge this, but I haven't had to do that.

@charris
Copy link
Member

charris commented Oct 14, 2013

"Is it fine to introduce such a standard?" Questions like this are best raised on the list for discussion.

@mhvk
Copy link
Contributor Author

mhvk commented Oct 14, 2013

@charris - OK, rebased (and I'll raise my question on the mailing list)

@charris
Copy link
Member

charris commented Oct 16, 2013

Ping Travis.

@charris charris closed this Oct 16, 2013
@charris charris reopened this Oct 16, 2013
@@ -655,8 +655,11 @@ def getdata(a, subok=True):
data = a._data
except AttributeError:
data = np.array(a, copy=False, subok=subok)
if not subok:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation for the function is no longer correct if this goes in. It needs correction. I also wonder if returning a is the right thing when data is an object array. It seems a little off to me and an indirect way of communicating the result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a crucial change, which is something I'm having to produce horrible code to work around (I'm giving my classes _data attributes that override view, not at all pretty).

My guess is that it simply is an oversight, but really it seems one should not stuff something in arrays here when it doesn't obviously fit: that can still happen inside _MaskedBinaryOperation.

The specific problem I have for astropy quantities perhaps illustrates this best. There, we use a Unit class, which if multiplied with an numpy array, gives a quantity:

In[8]: np.arange(3)*Unit('m')
Out[8]: <Quantity [0,1,2] m>

It does this by setting __array_priority__, and thus its __rmul__ method gets called. Now when I do

In[9]: np.ma.arange(3)*Unit('m')
Out[9]: 
masked_array(data = [<Quantity 0 m> <Quantity 1 m> <Quantity 2 m>],
             mask = False,
       fill_value = ?)

The reason is that because the unit gets wrapped in an object array, it looses its __array_priority__ (and even if it didn't, it wouldn't help, since it is now an array; see #3164 by another astropy developer), and hence the actual multiplication of the unmasked array with the unit-inside-an-array proceeds by feeding every single number to the unit, which happily turns this into a quantity, so that one is left with a masked array full of objects. (Actually, since most installed versions of numpy do not have my NotImplemented bug fix from #3900, I cannot get the above to work at all, but at least with my hacks unit*masked_array works).

p.s. I definitely should have changed the documentation -- will do so if you agree my change makes sense.

@mhvk
Copy link
Contributor Author

mhvk commented Nov 4, 2013

@charris - I had somewhat forgotten about this PR, and looking back at it realised the main thing still missing was documentation of the change in behaviour for subok=False in getdata for the case where the input is not an array (sub)class, nor anything easily convertible into an array. I've now added that. The other issue we discussed above was whether self.__rmul__(other) should lead to multiply(other, self) or the reverse (and same for __radd__); I think by analogy with __rdiv__ (and __rsub__) my approach is slightly more logical, but as @pv mentioned, any code that relies on either behaviour is badly written. I'm certainly fine with taking that change out.

@charris
Copy link
Member

charris commented Mar 29, 2014

@mhvk 1.9 is coming up, anything you want to add?

@mhvk
Copy link
Contributor Author

mhvk commented Apr 1, 2014

@charris - thanks for warning me about 1.9. I think this is still a very useful change. Just to be sure it is still OK, I rebased on current master and pushed it again (assuming this triggers a new travis built).

It so happens I just started working again on getting masked subclasses to work in astropy, with the current idea being that we'll have a MaskedArray subclass in which we override parts that currently do not work well, and make separate PRs to numpy for those. I'll try to get the simpler ones of those in soon.

@mhvk
Copy link
Contributor Author

mhvk commented Jul 6, 2014

@charris - an astropy bug report (astropy/astropy#2701) reminded me of this solution. I'm assuming it is too late to get this in 1.9 still; if not, do let me know and I'll rebase & ensure it is mergeable.

@charris
Copy link
Member

charris commented Jul 6, 2014

@mhvk Needs a rebase. And the commit messages could use a tweak, see doc/source/dev/gitwash/development_workflow.rst for prefixes.

I've been putting off all the uncommited fixes/workarounds from astropy until 1.10 development starts. This doesn't look terribly risky, but at this point I'd prefer all those changes have more time to settle.

@mhvk
Copy link
Contributor Author

mhvk commented Jul 6, 2014

@charris - fair enough. I rebased it anyway, even though as I think you wrote earlier, it may well be best to rewrite MaskedArray using __numpy_ufunc__, in which case all of this may be irrelevant. Also, for astropy, since we have to support lower versions of numpy, we'll anyway be stuck with half-forking ma.py (of course, I'll continue to raise issues we find; I feel slightly guilty for being responsible for two out of the three remaining 1.9 blockers...)

@charris
Copy link
Member

charris commented Jul 6, 2014

As an aside, you might be interested in the thread that contains this quote

Aside: While I am at it, let me reiterate what I have said to the other
developers privately: there is NO value to inheriting from the array
class. Don't try to achieve that capability if it costs anything, even
just effort, because it buys you nothing. Those of you who keep
remarking on this as if it would simply haven't thought it through IMHO.
It sounds so intellectually appealing that David Ascher and I had a
version of Numeric that almost did it before we realized our folly.

@mhvk
Copy link
Contributor Author

mhvk commented Jul 6, 2014

@charris - thanks for the quote! Though I should add that after my involvement in getting Quantity to work properly as an ndarray subclass, I would disagree. It may sound trivial, but it helps enormously to simply be able to do np.sin(q) where q could be an ndarray in radians or a quantity with any angular unit, if only because code can be reused so much more easily than if one had to import Quantity's own ufuncs. Unfortunately, MaskedArray is a bit less friendly to work with...

@mhvk
Copy link
Contributor Author

mhvk commented May 6, 2015

@charris - thanks for pointing me to #5227. I tried the trial implementation of taking account of __array_priority__ of @abalkin in #5230 and that solves the problem where <masked-array> * <unit> made a masked array of object dtype. With that, I think that the somewhat problematic edits to getdata would no longer be needed.

However, #5230 does not address that subclasses should be passed on (the main goal of this PR). Since it is simpler, though, maybe it should be done first?

@charris
Copy link
Member

charris commented May 14, 2015

@mhvk Where do you think this stands with respect to the __numpy_ufunc__ discussion? Also, needs a fix if this is going to be considered for 1.10.

@mhvk
Copy link
Contributor Author

mhvk commented May 14, 2015

@charris - This is in addition to the discussion on __array_priority__ -- ideally, I would rebase after #5864 is merged, and focus just on the subclass issue (i.e., remove the changes to getdata).

@charris
Copy link
Member

charris commented May 14, 2015

@mhvk Thanks for the update. I'll let this wait on #5864 then.

@charris
Copy link
Member

charris commented Jun 4, 2015

Could use a couple of comments.

@charris
Copy link
Member

charris commented Jun 15, 2015

@mhvk #5864 is in.

@mhvk
Copy link
Contributor Author

mhvk commented Jun 15, 2015

@charris - thanks, I'll try to look at this and possible tests for #4617 as soon as I can.

@mhvk mhvk force-pushed the ma/allow-subclass-in-ufunc branch 2 times, most recently from eda932a to 5a15d3f Compare June 16, 2015 20:06
@mhvk mhvk force-pushed the ma/allow-subclass-in-ufunc branch from 5a15d3f to 3c6b6ba Compare June 16, 2015 20:28
@mhvk
Copy link
Contributor Author

mhvk commented Jun 16, 2015

@charris - OK, I rebased this one, which meant that, as hoped, the changes to getdata could be reverted. Furthermore, with #5864 in, there no longer is a need to pass down NotImplemented from within the ufunc implementations (more in the spirit of what we seem to be getting to for #5844, so good in that sense). So, all that's left is now a rewrite that ensures that subclasses get to tell how the ufuncs are done. Which is quite clean and nicely stand-alone.

Apart from that, I added the comments to __radd__ and __rmul__ you requested and one test for #4617 (can do more if you think it is important but I ran out of time today).

@charris
Copy link
Member

charris commented Jun 16, 2015

Hmm, this test failure looks potentially real (but maybe not repeatable). I'm going to restart the test but keeping this for reference. The numpy/ma/testutils.py is the only thing that looks related.

FAIL: test_io.test_load_refcount

----------------------------------------------------------------------

Traceback (most recent call last):

  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest

    self.test(*self.arg)

  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/lib/tests/test_io.py", line 1920, in test_load_refcount

    assert_equal(n_before, n_after)

  File "/home/travis/build/numpy/numpy/builds/venv/lib/python2.7/site-packages/numpy/ma/testutils.py", line 100, in assert_equal

    raise AssertionError(msg)

AssertionError: 

Items are not equal:

 ACTUAL: 55930

 DESIRED: 55911

@mhvk
Copy link
Contributor Author

mhvk commented Jun 16, 2015

@charris - yes, I had seen that very weird test result, and was going to ask to restart, as it seemed unlikely any change in the python code would lead to a refcount problem. But you beat me to it... oddly, it seems irreproduceable (though at least it means it was not due to this PR).

charris added a commit that referenced this pull request Jun 17, 2015
BUG allow subclasses in MaskedArray ufuncs -- for non-ndarray _data
@charris charris merged commit 6c1e1de into numpy:master Jun 17, 2015
@charris
Copy link
Member

charris commented Jun 17, 2015

Well, lets get this in. Thanks @mhvk.

@mhvk mhvk deleted the ma/allow-subclass-in-ufunc branch June 17, 2015 20:09
@argriffing
Copy link
Contributor

A possibly related issue on the mailing list: https://mail.scipy.org/pipermail/numpy-discussion/2016-February/074940.html

mhvk added a commit to mhvk/numpy that referenced this pull request Feb 21, 2016
charris added a commit that referenced this pull request Feb 21, 2016
Revert part of #3907 which incorrectly propogated MaskedArray info.
charris pushed a commit to charris/numpy that referenced this pull request Feb 21, 2016
charris added a commit that referenced this pull request Feb 21, 2016
Backport 7296, Revert part of #3907 which incorrectly propogated MaskedArray info.
jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants