-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Let MaskedArray getter, setter respect baseclass overrides #4586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LGTM. |
Needs updating for the recent commits. I'm not sure why the merge button is still green. |
@charris - I rebased, and it passes the tests. OK to merge? |
LGTM. I cannot figure out why |
No ideas, I'm afraid... But it seemed good to clean up, hence also #4617. |
@@ -3050,16 +3049,16 @@ def __setitem__(self, indx, value): | |||
# if getmask(indx) is not nomask: | |||
# msg = "Masked arrays must be filled before they can be used as indices!" | |||
# raise IndexError(msg) | |||
_data = ndarray.view(self, ndarray.__getattribute__(self, '_baseclass')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this line has pretty much everything the bothers me ;) Note the here _data
is viewed as _baseclass
, which may, but I'm not sure, have a different shape than self._data
in the matrix case or some other odd case. Same thing above, where ndarray.view(self, ndarray)
may be different from self.data
, the latter may actually be a matrix type. I'd also like to know what the difference between self.data
and self._data
is, likewise self.mask
and self._mask
.
Dealing with masked arrays is like trying to sneak up on a tiger while creeping across eggshells. Disaster threatens at every move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charris - yes, it is weird, but it may be that this predates the following lines
def _get_data(self):
"""Return the current data, as a view of the original
underlying data.
"""
return ndarray.view(self, self._baseclass)
_data = property(fget=_get_data)
data = property(fget=_get_data)
Similarly,
def _get_mask(self):
"""Return the current mask.
"""
# We could try to force a reshape, but that wouldn't work in some cases.
# return self._mask.reshape(self.shape)
return self._mask
mask = property(fget=_get_mask, fset=__setmask__, doc="Mask")
So, this would seem to be safe... Or is there a MaskedMatrix
class or so where parts get overwritten? My larger worry is that this was all written at a time when people were not as used to writing lots and lots of test cases...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the code actually postdates those functions you quote by about two years. Two different authors though, so we may not be the first struggling with the code ;) The mask part does look safe, although I'd like to know what the comment on line 2006 means.
I don't feel very confidence reviewing masked array code, there are a lot of subtleties that I don't understand yet...
I went ahead and copied |
@mhvk I'm going to delay the rest of the masked array work until 1.9 is branched. |
@charris - yes, that is better, certainly for this part. |
@mhvk OK, I'm back and sinking in this quicksand ;) I would help if I had a better understanding of what quantities is doing. That is, what methods are overridden, etc. If I understand correctly, the main problem here is the scalar returned by indexing. This is likely to be a general property, although ufuncs behave correctly.
Maybe there is a deeper problem here that needs to be addressed. |
@charris - there are problems with both For From a broader perspective, it would seem that |
613e061
to
eea2342
Compare
p.s. I rebased, which meant moving the note about this change from 1.9 to 1.10. |
@mhvk OK, I'm ready to walk into this swamp again ;) Could you rebase? |
eea2342
to
aca4d1d
Compare
@charris - OK, rebased. Haven't looked at this myself for the past year, but still think the essence here is logical: let the baseclass handle things were appropriate. |
@mhvk I noticed some similar problems when combining ma and matrices, maybe you have an idea. Consider
If you do Now, much like the changes in this PR, we could try to fix it by making Similarly, a subtype might make certain ndarray methods return the elements in a different order, or in a different shape. How would These problems make it seem very tough to me to make masked arrays interoperate well with arbitrary ndarray subtypes. |
@ahaldane - when I wrote this PR I was mercilessly unaware of the pitfalls of matrices! Your example indeed shows a big one: either the masked version operates differently from the unmasked one, or the mask gets out of whack. I think there are two parts to a possible solution. First would be for But I'm far from sure this would work for all cases; in end, my sense would be that in terms of broadcasting, etc., Now on your specific example, I should add that my impression was that the matrix class has become such a problem that most people simply do not use it. Indeed, in python3.5 a new operator |
@mhvk thanks for the helpful explanations. I want to better understand what the "end goal" might be for masked arrays, before I start making smaller changes. In the case of |
@mhvk Could you rebase this again. It's probably just the release notes causing problems. |
aca4d1d
to
7a84c56
Compare
@charris - ok, rebased. As you guessed, it was just the release notes -- and travis still passes! |
I've convinced myself that the gain is worth the possible pain, so in it goes. |
ENH: Let MaskedArray getter, setter respect baseclass overrides
Thanks @mhvk . |
For a masked array holding objects that themselves are arrays, when selecting a single item, it is treated as if it were a slice of the array and an attempt is made to set its mask. This was always a bug, but it become visible with a recent change to `MaskedArray.__getitem__` (numpygh-4586) where it is attempted to change the shape of the mask. With this PR, this case gets special treatment in that the object is made into a Masked Array with a full set mask. (A previous attempt to do the perhaps more logical think and just return `masked` caused quite a few errors in astropy.io.votable; it seemed it was not worth breaking backwards compatibility that much). Test case that now works but used to fail: ``` mx1 = MaskedArray([1.], mask=[True]) mx2 = MaskedArray([1., 2.]) mx = MaskedArray([mx1, mx2], mask=[False, True]) mx[0] ```
As is,
MaskedArray
uses explicity calls tondarray.__setitem__(self.data, indx, value)
andndarray.__getitem__(self.data, indx)
to access elements of its baseclass. This PR usesself.data[indx]
instead, so that possible overrides of__getitem__
and__setitem__
are respected.This is particularly useful for astropy's
Quantity
ndarray subclass, since it should check that anything written to it has the correct unit, and associate units for anything gotten from it.(Note: this includes #4585, as otherwise some of the tests would fail; I'll rebase once that is in.)