ENH: Let MaskedArray getter, setter respect baseclass overrides #4586

mhvk · 2014-04-04T21:52:11Z

As is, MaskedArray uses explicity calls to ndarray.__setitem__(self.data, indx, value) and ndarray.__getitem__(self.data, indx) to access elements of its baseclass. This PR uses self.data[indx] instead, so that possible overrides of __getitem__ and __setitem__ are respected.

This is particularly useful for astropy's Quantity ndarray subclass, since it should check that anything written to it has the correct unit, and associate units for anything gotten from it.

(Note: this includes #4585, as otherwise some of the tests would fail; I'll rebase once that is in.)

charris · 2014-04-04T22:43:47Z

LGTM.

charris · 2014-04-12T02:20:20Z

Needs updating for the recent commits. I'm not sure why the merge button is still green.

mhvk · 2014-04-16T04:29:25Z

@charris - I rebased, and it passes the tests. OK to merge?

charris · 2014-04-21T00:19:35Z

LGTM. I cannot figure out why ndarray.__getitem__ and friends were use instead of the usual indexing. Any ideas? The ma code probably dates back to the days of python 2.0, but I don't think that should make a difference.

mhvk · 2014-04-21T01:01:29Z

No ideas, I'm afraid... But it seemed good to clean up, hence also #4617.

charris · 2014-04-21T01:56:09Z

numpy/ma/core.py

@@ -3050,16 +3049,16 @@ def __setitem__(self, indx, value):
 #        if getmask(indx) is not nomask:
 #            msg = "Masked arrays must be filled before they can be used as indices!"
 #            raise IndexError(msg)
-        _data = ndarray.view(self, ndarray.__getattribute__(self, '_baseclass'))


OK, this line has pretty much everything the bothers me ;) Note the here _data is viewed as _baseclass, which may, but I'm not sure, have a different shape than self._data in the matrix case or some other odd case. Same thing above, where ndarray.view(self, ndarray) may be different from self.data, the latter may actually be a matrix type. I'd also like to know what the difference between self.data and self._data is, likewise self.mask and self._mask.

Dealing with masked arrays is like trying to sneak up on a tiger while creeping across eggshells. Disaster threatens at every move.

@charris - yes, it is weird, but it may be that this predates the following lines

def _get_data(self): """Return the current data, as a view of the original underlying data. """ return ndarray.view(self, self._baseclass) _data = property(fget=_get_data) data = property(fget=_get_data)

Similarly,

def _get_mask(self): """Return the current mask. """ # We could try to force a reshape, but that wouldn't work in some cases. # return self._mask.reshape(self.shape) return self._mask mask = property(fget=_get_mask, fset=__setmask__, doc="Mask")

So, this would seem to be safe... Or is there a MaskedMatrix class or so where parts get overwritten? My larger worry is that this was all written at a time when people were not as used to writing lots and lots of test cases...

Turns out the code actually postdates those functions you quote by about two years. Two different authors though, so we may not be the first struggling with the code ;) The mask part does look safe, although I'd like to know what the comment on line 2006 means.

I don't feel very confidence reviewing masked array code, there are a lot of subtleties that I don't understand yet...

mhvk · 2014-04-21T16:48:38Z

I went ahead and copied test_indexing in test_core.py to a new test_matrix_indexing. These tests fail with the current master, but with the addition of a mask.shape = dout.shape, they pass with this PR. So,it looks like this might solve even the masked matrix indexing problem! (well, better first see what travis thinks of it...).

charris · 2014-05-04T15:08:11Z

@mhvk I'm going to delay the rest of the masked array work until 1.9 is branched.

mhvk · 2014-05-04T17:52:45Z

@charris - yes, that is better, certainly for this part.

charris · 2014-08-24T00:49:23Z

@mhvk OK, I'm back and sinking in this quicksand ;) I would help if I had a better understanding of what quantities is doing. That is, what methods are overridden, etc. If I understand correctly, the main problem here is the scalar returned by indexing. This is likely to be a general property, although ufuncs behave correctly.

In [6]: class Foo(ndarray):
    def __new__(cls, *args, **kwargs):
        return array(*args, **kwargs).view(cls)
   ...:     

In [7]: a = Foo([1]*3)

In [8]: a[...]
Out[8]: Foo([1, 1, 1])

In [9]: a[:2]
Out[9]: Foo([1, 1])

In [10]: a[1]
Out[10]: 1

In [11]: a.sum()
Out[11]: Foo(3)

Maybe there is a deeper problem here that needs to be addressed.

mhvk · 2014-08-26T20:09:38Z

@charris - there are problems with both __getitem__ and __setitem__ that this PR tries to address. The one with __setitem__ is most obvious: if an item is unsuitable to be stored in the baseclass (in our case, e.g., a quantity with the wrong unit), the baseclass should be allowed to raise an exception (currently, I cannot easily give an example with quantity, since it requires some of the other PRs, but see the second half of test_subclass_items).

For __getitem__, you are indeed right that in Quantity -- and in the test case in this PR -- we are working around what one might well consider a deeper problem, that indexing a single item does not preserve subclass. But even so, by analogy with __setitem__, it would seem reasonable to just use the getter of baseclass rather than that of ndarray. (I really have no idea why MaskedArray was written to explicitly use ndarray.__setitem__ and ndarray.__getitem__.)

From a broader perspective, it would seem that MaskedArray effectively is a container class. As such, it would seem to make sense to let it make minimal assumptions about what it contains.

mhvk · 2014-08-26T20:19:17Z

p.s. I rebased, which meant moving the note about this change from 1.9 to 1.10.

charris · 2015-01-25T17:16:55Z

@mhvk OK, I'm ready to walk into this swamp again ;) Could you rebase?

mhvk · 2015-01-25T21:10:35Z

@charris - OK, rebased. Haven't looked at this myself for the past year, but still think the essence here is logical: let the baseclass handle things were appropriate.

ahaldane · 2015-03-27T22:19:41Z

@mhvk I noticed some similar problems when combining ma and matrices, maybe you have an idea. Consider

>>> ax = matrix(np.random.rand(3,3))
>>> mx = np.ma.array(ax, mask=(np.random.rand(3,3)) < 0.5))

If you do ax*ax it performs a dot product, but doing mx*mx semi-unexpectedly performs a simple multiply.

Now, much like the changes in this PR, we could try to fix it by making MaskedArray.__mul__ end up calling self.data.__mul__ rather than (as currently) np.multiply(self.data. But my question is, how would we know that the mask should also be updated according to a dot product rather than multiply?

Similarly, a subtype might make certain ndarray methods return the elements in a different order, or in a different shape. How would np.ma know how to update the mask in those cases? (This is already sort of a problem with matrices which sometimes add dimensions of size 1, which do not get added to the mask, as I noticed here).

These problems make it seem very tough to me to make masked arrays interoperate well with arbitrary ndarray subtypes.

mhvk · 2015-03-28T05:53:53Z

@ahaldane - when I wrote this PR I was mercilessly unaware of the pitfalls of matrices! Your example indeed shows a big one: either the masked version operates differently from the unmasked one, or the mask gets out of whack.

I think there are two parts to a possible solution. First would be for MaskedArray not to define __mul__, etc., but purely work at the ufunc level (i.e., with __array_prepare__, __array_wrap__, or, better, __numpy_ufunc__). Second to ensure that it takes on the methods what is being masked, so that the methods of the enclosed class are available. Then, when one calls mx * mx, this leads to Matrix.__mul__, which calls np.dot, which will first be passed through MaskedArray.__numpy_ufunc__, and there the mask can be dealt with, with the proper ufunc in hand.

But I'm far from sure this would work for all cases; in end, my sense would be that in terms of broadcasting, etc., MaskedArray should simply be able to assume that a subclass behaves like an ndarray. But obviously it does not currently, so this would break backwards compatibility.

Now on your specific example, I should add that my impression was that the matrix class has become such a problem that most people simply do not use it. Indeed, in python3.5 a new operator @ and associated __matmul__ method will be specifically introduced to make matrix operations with regular arrays easier (see http://legacy.python.org/dev/peps/pep-0465/).

ahaldane · 2015-03-28T16:11:25Z

@mhvk thanks for the helpful explanations. I want to better understand what the "end goal" might be for masked arrays, before I start making smaller changes.

In the case of dot, there are ways I see that almost work through varous combinations of views, but I still haven't found one that really works. The problem with your suggestion is that dot (and sum and others) are not ufuncs, and that matrix.__mul__ never goes through the ufunc machnery. A solution for dot is not going to involve ufuncs at all, I think. Also, even if matrix multiplication is fixed up in python3.5, there is still a similar problem with sum and others. Anyway, I'll keep thinking about it.

charris · 2015-04-22T22:26:56Z

@mhvk Could you rebase this again. It's probably just the release notes causing problems.

mhvk · 2015-04-22T23:31:57Z

@charris - ok, rebased. As you guessed, it was just the release notes -- and travis still passes!

charris · 2015-06-04T00:48:34Z

I've convinced myself that the gain is worth the possible pain, so in it goes.

ENH: Let MaskedArray getter, setter respect baseclass overrides

charris · 2015-06-04T00:49:16Z

Thanks @mhvk .

For a masked array holding objects that themselves are arrays, when selecting a single item, it is treated as if it were a slice of the array and an attempt is made to set its mask. This was always a bug, but it become visible with a recent change to `MaskedArray.__getitem__` (numpygh-4586) where it is attempted to change the shape of the mask. With this PR, this case gets special treatment in that the object is made into a Masked Array with a full set mask. (A previous attempt to do the perhaps more logical think and just return `masked` caused quite a few errors in astropy.io.votable; it seemed it was not worth breaking backwards compatibility that much). Test case that now works but used to fail: ``` mx1 = MaskedArray([1.], mask=[True]) mx2 = MaskedArray([1., 2.]) mx = MaskedArray([mx1, mx2], mask=[False, True]) mx[0] ```

mhvk mentioned this pull request Apr 12, 2014

ENH: allow subclass overrides by removing explicit ndarray methods #4617

Merged

charris reviewed Apr 21, 2014
View reviewed changes

mhvk force-pushed the ma/subclass-item-setting-getting branch from 613e061 to eea2342 Compare August 26, 2014 20:16

mhvk mentioned this pull request Oct 21, 2014

Rename masked column fails for subsequent slicing astropy/astropy#3023

Merged

mhvk mentioned this pull request Nov 3, 2014

Implement MaskedQuantity astropy/astropy#1852

Closed

mhvk force-pushed the ma/subclass-item-setting-getting branch from eea2342 to aca4d1d Compare January 25, 2015 21:05

rgommers added component: numpy.ma masked arrays 03 - Maintenance labels Mar 8, 2015

mhvk mentioned this pull request Mar 23, 2015

ENH: make some masked array methods behave more like ndarray methods #5706

Merged

charris added this to the 1.10.0 release milestone Apr 7, 2015

ENH: Let MaskedArray getter, setter respect baseclass overrides

7a84c56

mhvk force-pushed the ma/subclass-item-setting-getting branch from aca4d1d to 7a84c56 Compare April 22, 2015 23:09

charris self-assigned this Apr 28, 2015

mhvk mentioned this pull request May 9, 2015

BUG: MaskedArray ignores __array_priority__ #5230

Closed

charris added a commit that referenced this pull request Jun 4, 2015

Merge pull request #4586 from mhvk/ma/subclass-item-setting-getting

36b9404

ENH: Let MaskedArray getter, setter respect baseclass overrides

charris merged commit 36b9404 into numpy:master Jun 4, 2015

mhvk deleted the ma/subclass-item-setting-getting branch June 4, 2015 00:54

This was referenced Jun 12, 2015

BUG Ensure masked object arrays can always return single items. #5962

Merged

Numpy-dev issues for votable astropy/astropy#3848

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Let MaskedArray getter, setter respect baseclass overrides #4586

ENH: Let MaskedArray getter, setter respect baseclass overrides #4586

mhvk commented Apr 4, 2014

charris commented Apr 4, 2014

charris commented Apr 12, 2014

mhvk commented Apr 16, 2014

charris commented Apr 21, 2014

mhvk commented Apr 21, 2014

charris Apr 21, 2014

mhvk Apr 21, 2014

charris Apr 21, 2014

mhvk commented Apr 21, 2014

charris commented May 4, 2014

mhvk commented May 4, 2014

charris commented Aug 24, 2014

mhvk commented Aug 26, 2014

mhvk commented Aug 26, 2014

charris commented Jan 25, 2015

mhvk commented Jan 25, 2015

ahaldane commented Mar 27, 2015

mhvk commented Mar 28, 2015

ahaldane commented Mar 28, 2015

charris commented Apr 22, 2015

mhvk commented Apr 22, 2015

charris commented Jun 4, 2015

charris commented Jun 4, 2015

ENH: Let MaskedArray getter, setter respect baseclass overrides #4586

ENH: Let MaskedArray getter, setter respect baseclass overrides #4586

Conversation

mhvk commented Apr 4, 2014

charris commented Apr 4, 2014

charris commented Apr 12, 2014

mhvk commented Apr 16, 2014

charris commented Apr 21, 2014

mhvk commented Apr 21, 2014

charris Apr 21, 2014

Choose a reason for hiding this comment

mhvk Apr 21, 2014

Choose a reason for hiding this comment

charris Apr 21, 2014

Choose a reason for hiding this comment

mhvk commented Apr 21, 2014

charris commented May 4, 2014

mhvk commented May 4, 2014

charris commented Aug 24, 2014

mhvk commented Aug 26, 2014

mhvk commented Aug 26, 2014

charris commented Jan 25, 2015

mhvk commented Jan 25, 2015

ahaldane commented Mar 27, 2015

mhvk commented Mar 28, 2015

ahaldane commented Mar 28, 2015

charris commented Apr 22, 2015

mhvk commented Apr 22, 2015

charris commented Jun 4, 2015

charris commented Jun 4, 2015