-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Fix various problems with the np.ma.masked constant #9336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It's still possible to create duplicate MaskedConstants through `.copy()`
This seems to solve the problem everywhere. It's not clear if this works on PyPy. Fixes numpygh-9328
numpy/ma/core.py
Outdated
# everywhere else, we want to downcast to MaskedArray, to prevent a | ||
# duplicate maskedconstant. | ||
self.__class__ = MaskedArray | ||
self.__array_finalize__(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see a good alternative here to assigning to __class__
- unfortunately, __array_finalize__
asks us to operate in place, rather than return a new object.
Does this work on PyPy, @mattip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be alright on PyPy.js
, with a fake inheritance of MaskedConstant -> MaskedArray -> int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my above suggestion, I'd hope this won't be necessary any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure we're stuck with this - too much of the MaskedArray
code calls .view(type(self))
, and we really don't want to have to special case every single one of those.
An alternative approach would be to allow __array_finalize__
to have a return value, which replaces self
- but there's almost certainly code out there that returns something random because it doesn't know what its doing, so that's not safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would seem anything.view(MaskedConstant)
is almost certainly an error.... And setting __class__
is quite awful. But I defer to your judgment - I'm on holidays and worry looking in more detail is not going to be a short process...
(and I agree we cannot change __array_finalize__
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would seem anything.view(MaskedConstant) is almost certainly an error
Written that way, yes. The problem is any kind of subclass-capable code that tries to reattach the subclass, perhaps after allocating a new buffer. That shouldn't have to know about np.ma
, so np.ma
needs to be able to make .view(MaskedConstant)
do something sensible
numpy/ma/tests/test_core.py
Outdated
MC = np.ma.core.MaskedConstant | ||
assert_(not isinstance(np.ma.masked[...], MC)) | ||
assert_(not isinstance(np.ma.masked.copy(), MC)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these tests previously failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit I looked at this without really referring to the rest of the code, which means I didn't understand some things. Apologies! I think my suggestion for __new__
and __array_finalize__
would be an improvement, though.
numpy/ma/core.py
Outdated
_mask = mask = np.array(True) | ||
_baseclass = ndarray | ||
|
||
__singleton = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just a single _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as not to clash with any subclasses of MaskConstant
, perhaps? Why not __
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__
to me is associated with dunder
while this is really just a private attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__
is a special mechanism in python to do name-mangling, to make attributes unique to their class, so that property names don't clash with subclasses. __foo
in the class suite of Bar
is mangled to _Bar__foo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, learned something new! So, here it makes sense (but for __inop
it then does not; presumably that should be inherited - not that anyone really should want to inherit MaskedConstant
...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heck, I should probably add a del __inop
there anyway, which makes the name irrelevant
numpy/ma/core.py
Outdated
_mask = mask = np.array(True) | ||
_baseclass = ndarray | ||
|
||
__singleton = None | ||
|
||
def __new__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at it, could you change self
to cls
here - this really is super confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good plan
numpy/ma/core.py
Outdated
mask.flags.writeable = False | ||
|
||
masked = MaskedArray(data, mask=mask) | ||
self.__singleton = masked.view(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This especially would be nicer as cls._singleton = ...
numpy/ma/core.py
Outdated
data.flags.writeable = False | ||
mask.flags.writeable = False | ||
|
||
masked = MaskedArray(data, mask=mask) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, I'd avoid the view casing and do instead (see comment on __array_finalize__
):
masked = super(MaskedConstant, cls).__new__(cls, data, mask=mask)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the view casing anyway, because MaskedArray.__new__
calls .view
numpy/ma/core.py
Outdated
|
||
def __array_finalize__(self, obj): | ||
return | ||
if self.__singleton is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to choose path based on obj
? If one uses super
in __new__
as suggested above, then the relevant case that is OK is obj is None
- all other cases imply view casting or slicing, neither of which I think should be allowed (except if obj is masked
) - could they just raise?
numpy/ma/core.py
Outdated
|
||
def __reduce__(self): | ||
"""Override of MaskedArray's __reduce__. | ||
""" | ||
return (self.__class__, ()) | ||
|
||
# inplace operations have no effect. We have to override them to avoid | ||
# trying to modify the readonly data and mask arrays | ||
def __inop(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single _
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guess I named this to look like __iadd__
without actually looking like a special method. Could call it _handle__iop__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer that, but _iop
would also be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually calling it __iop__
, so that np.ma.masked.__add__.__name__
prints something somewhat reasonable
numpy/ma/core.py
Outdated
if self is masked: | ||
return 'masked' | ||
else: | ||
# something is wrong, make it obvious |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure, this will not happen in any normal usage, correct? (e.g., from the test below i assume .copy
returns something else)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's the hope. There were many different ways that this could happen before, and this change made it easy to spot.
It also means that subclasses of MaskedConstant
don't masquerade as masked
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, makes sense.
numpy/ma/core.py
Outdated
# everywhere else, we want to downcast to MaskedArray, to prevent a | ||
# duplicate maskedconstant. | ||
self.__class__ = MaskedArray | ||
self.__array_finalize__(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my above suggestion, I'd hope this won't be necessary any more.
numpy/ma/tests/test_core.py
Outdated
def test_no_copies(self): | ||
MC = np.ma.core.MaskedConstant | ||
assert_(not isinstance(np.ma.masked[...], MC)) | ||
assert_(not isinstance(np.ma.masked.copy(), MC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a test that checks the actual class that is returned by [...]
and .copy()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you worried the type might be ndarray
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more that I had no idea what it would be! (As mentioned, I didn't look at the rest of the code, but from this PR it was not obvious what the result should be, and on current master it is a MaskedConstant
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will adjust to asserting the direct type here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fun fact - copying 0d masked arrays doesn't copy the mask
numpy/ma/tests/test_core.py
Outdated
assert_(not isinstance(np.ma.masked.copy(), MC)) | ||
|
||
def test_immutable(self): | ||
assert_raises(ValueError, operator.setitem, np.ma.masked.data, (), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add the test for just trying to set np.ma.masked
itself?
Hopefully addressed most of the reviews. This raised #9430, but I'd rather open that can of worms in another PR. |
numpy/ma/tests/test_core.py
Outdated
|
||
m_copy = np.ma.masked.copy() | ||
assert_equal(type(m_copy), np.ma.MaskedArray) | ||
# assert_equal(m_copy.mask, True) - gh-9430 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can of worms left unopened
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here: #9432
# trying to modify the readonly data and mask arrays | ||
def __iop__(self, other): | ||
return self | ||
__iadd__ = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is OK because the singleton can never be unmasked? Might be cleaner stylewise to just assign the function rather than have all the line continuations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, if the singleton is unmasked there are much bigger problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the continuations, as they ensure that __iop__
doesn't get mispelt as one of the other methods
numpy/ma/tests/test_core.py
Outdated
|
||
def test_write_to_copy(self): | ||
# gh-9328 | ||
x = np.ma.masked.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So x
is a downcast masked array, not the singleton in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, strange behavior for copy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, because that seemed the least bad option, with the others being:
return self
- this is clearly not a copy, and will fail when trying to write to it, whereas normally you would expect to be able to write to the result of copy() without side-effectsreturn MaskedConstant(self)
(current master) - this is super confusing, because it printed asmasked
, and would be still be unwriteable, yet failsis masked
- stop returning
masked
anywhere internally - this is huge compatibility change, and breaks anyis np.ma.masked
comparison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it back, this should return self
, because x.copy() is x
is already true for np.True_
, so there's precedent for that behaviour for scalars
LGTM. Do we need a release note? The behavior of masked is changed in subtle ways. |
Possibly an improvements release note? |
e87fd89
to
36f481b
Compare
|
||
def test_immutable(self): | ||
orig = np.ma.masked | ||
assert_raises(np.ma.core.MaskError, operator.setitem, orig, (), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit weird how this raises a different error, but shouldn't be a big deal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and importantly, hasn't changed in this PR)
If the behavior has changed, it should probably be in the compatibility notes. |
@eric-wieser Could you add a release note so I can merge this? |
Looks like I forgot to push the release notes fix. I'm still away from git for the next day or two |
Code LGTM. The following might be out of the scope of this PR, but I came across them while testing: The singleton's shape can be edited: >>> np.ma.masked.shape = (1,1,1)
>>> np.ma.array([1], mask=True)[0].shape
(1, 1, 1) Not sure if this prexisting problem is with >>> np.ma.masked.mean() # fails, unlike other 0d masked arrays
AttributeError: 'int' object has no attribute 'shape' |
@charris - a summary of the behaviour changes:
Perhaps these latter two deserve a release note. |
a52db41
to
1616b02
Compare
1616b02
to
d4509fc
Compare
Added a release note about point 2, @charris |
Thanks Eric. |
That |
Did this patch change whether |
Hmm, I'm just going by @astrofrog's comment that this PR broke our tests, and recalled this hashable question about |
It turns out that
np.ma.masked
is actually pretty bad at preserving identity or valueThis PR attempts to fix that.
Fixes #9328 and #4595