-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Let MaskedArray getter, setter respect baseclass overrides #4586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3043,8 +3043,7 @@ def __getitem__(self, indx): | |
# if getmask(indx) is not nomask: | ||
# msg = "Masked arrays must be filled before they can be used as indices!" | ||
# raise IndexError(msg) | ||
_data = ndarray.view(self, ndarray) | ||
dout = ndarray.__getitem__(_data, indx) | ||
dout = self.data[indx] | ||
# We could directly use ndarray.__getitem__ on self... | ||
# But then we would have to modify __array_finalize__ to prevent the | ||
# mask of being reshaped if it hasn't been set up properly yet... | ||
|
@@ -3074,6 +3073,8 @@ def __getitem__(self, indx): | |
# Update the mask if needed | ||
if _mask is not nomask: | ||
dout._mask = _mask[indx] | ||
# set shape to match that of data; this is needed for matrices | ||
dout._mask.shape = dout.shape | ||
dout._sharedmask = True | ||
# Note: Don't try to check for m.any(), that'll take too long... | ||
return dout | ||
|
@@ -3091,16 +3092,16 @@ def __setitem__(self, indx, value): | |
# if getmask(indx) is not nomask: | ||
# msg = "Masked arrays must be filled before they can be used as indices!" | ||
# raise IndexError(msg) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, this line has pretty much everything the bothers me ;) Note the here Dealing with masked arrays is like trying to sneak up on a tiger while creeping across eggshells. Disaster threatens at every move. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @charris - yes, it is weird, but it may be that this predates the following lines
Similarly,
So, this would seem to be safe... Or is there a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Turns out the code actually postdates those functions you quote by about two years. Two different authors though, so we may not be the first struggling with the code ;) The mask part does look safe, although I'd like to know what the comment on line 2006 means. I don't feel very confidence reviewing masked array code, there are a lot of subtleties that I don't understand yet... |
||
_data = ndarray.view(self, ndarray.__getattribute__(self, '_baseclass')) | ||
_mask = ndarray.__getattribute__(self, '_mask') | ||
_data = self._data | ||
_mask = self._mask | ||
if isinstance(indx, basestring): | ||
ndarray.__setitem__(_data, indx, value) | ||
_data[indx] = value | ||
if _mask is nomask: | ||
self._mask = _mask = make_mask_none(self.shape, self.dtype) | ||
_mask[indx] = getmask(value) | ||
return | ||
#........................................ | ||
_dtype = ndarray.__getattribute__(_data, 'dtype') | ||
_dtype = _data.dtype | ||
nbfields = len(_dtype.names or ()) | ||
#........................................ | ||
if value is masked: | ||
|
@@ -3124,21 +3125,21 @@ def __setitem__(self, indx, value): | |
mval = tuple([False] * nbfields) | ||
if _mask is nomask: | ||
# Set the data, then the mask | ||
ndarray.__setitem__(_data, indx, dval) | ||
_data[indx] = dval | ||
if mval is not nomask: | ||
_mask = self._mask = make_mask_none(self.shape, _dtype) | ||
ndarray.__setitem__(_mask, indx, mval) | ||
_mask[indx] = mval | ||
elif not self._hardmask: | ||
# Unshare the mask if necessary to avoid propagation | ||
if not self._isfield: | ||
self.unshare_mask() | ||
_mask = ndarray.__getattribute__(self, '_mask') | ||
_mask = self._mask | ||
# Set the data, then the mask | ||
ndarray.__setitem__(_data, indx, dval) | ||
ndarray.__setitem__(_mask, indx, mval) | ||
_data[indx] = dval | ||
_mask[indx] = mval | ||
elif hasattr(indx, 'dtype') and (indx.dtype == MaskType): | ||
indx = indx * umath.logical_not(_mask) | ||
ndarray.__setitem__(_data, indx, dval) | ||
_data[indx] = dval | ||
else: | ||
if nbfields: | ||
err_msg = "Flexible 'hard' masks are not yet supported..." | ||
|
@@ -3149,7 +3150,7 @@ def __setitem__(self, indx, value): | |
np.copyto(dindx, dval, where=~mindx) | ||
elif mindx is nomask: | ||
dindx = dval | ||
ndarray.__setitem__(_data, indx, dindx) | ||
_data[indx] = dindx | ||
_mask[indx] = mindx | ||
return | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking this should be
because it is possible that the
_baseclass
overrides the indexing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you logic (especially after the troubles with masked matrices for
.flat
...; indeed, cc @seberg) but one of the main points of this PR is actually that, to allow the baseclass to have its say in getting and setting. For astropy'sQuantity
, getting a proper slice would work fine, since that gets turned back into aMaskedArray
and is updated (which is important as it copies theQuantity
's unit). But if one gets a single item, ndarray's__getitem__
leaves one with annp.float64
object, which, if not masked, will be returned as is, i.e., one has lost the unit (or, more generally, removed the baseclass aspect of the item).Indeed, the above is why we have a
Quantity.__getitem__
: we return a single item as a a̶r̶r̶a̶y̶ ̶s̶c̶a̶l̶a̶r̶ 0-dimensional array instead (and have an internal property that remembers that this is a scalar).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what to think of this, except that it seems like this "masked matrix" stuff was always broken with indexing... It likely makes sense to actually change this, the finalization stage should attempt to fix the mask shape (needs test with matrix). But shape fixing in the data array and not having the same thing in the mask array, still seems like asking for trouble to me in any case (i.e. the whole reason why the .flat was needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem to make sense to allow the
_baseclass
to override the indexing. I did check scipy with your changes and didn't see anything the looked related to this. Hmm... could you add a note here and we'll give this a try.Looks like the code in
MaskedArray
could use a cleanup, if only for clarity.Curious that
_data
anddata
are the same thing, likewise_mask
andmask
. I wonder why that was done? Backward compatibility? Intent to deprecate one?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masked array could probably use a rewrite using
__numpy_ufunc__
anyway when it is time...Isn't
data
a property and_data
the real array? Though I think you are probably right and it is identical. If it doesn't, it would be better if.data
would return a view on_data
and not the object itself.About the matrix stuff/subclasses that change the shape... This makes enough trouble in numpy, getting it to work right in ma is probably hell and buggy at best at this time (just my guess on things, I never even used masked arrays ^^). For what I care, I would be fine with deprecating all that
.flat
usage. Do we really need to support subclasses that mess with the shape inside masked arrays?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charris - It is indeed odd that
_data
anddata
are the same thing. For_mask
andmask
it looks more like the usual private property shielded by a getter and setter.@seberg - definitely would be great to move to
__numpy_ufunc__
... but it does need someone to sit down and be careful...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Y'all, sorry for not being able to be more responsive on all these problems. The
_data
is indeed the private property accessing the underlyingndarray
(just like_mask
stores the booleanndarray
corresponding to, you guessed it). Neither_data
or_mask
are supposed to be accessible to the lay person, hence thedata
andmask
properties. Well, hence themask
: thedata
was introduced to give a quick access to thendarray
in cases where you'd want to drop theMaskedArray
part for a while, perform faster computations and pull the mask back at the end.In the problem at hand, the objective is to get the underlying data in its 'least-adulterated' form: @charris is right, it should be `dout=self.view(...)[indx].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierregm - the problem with doing
dout=self.view(np.ndarray)[indx]
is that, as mentioned above, this presumes that a single item from a subclass can be represented as anp.float64
scalar. This, however, is false at least for astropy'sQuantity
, where one does need to keep the unit.Note that the problem is worse for
__setitem__
: the baseclass may not allow one to set parts to just anything, it may need to check that, e.g., the data put in has the same class (or, forQuantity
, that the units are consistent, and convert if necessary).