Skip to content

BUG: Fix various problems with the np.ma.masked constant #9336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 21, 2017

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Jun 30, 2017

It turns out that np.ma.masked is actually pretty bad at preserving identity or value

This PR attempts to fix that.

Fixes #9328 and #4595

It's still possible to create duplicate MaskedConstants through `.copy()`
This seems to solve the problem everywhere. It's not clear if this works on PyPy.

Fixes numpygh-9328
numpy/ma/core.py Outdated
# everywhere else, we want to downcast to MaskedArray, to prevent a
# duplicate maskedconstant.
self.__class__ = MaskedArray
self.__array_finalize__(obj)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see a good alternative here to assigning to __class__ - unfortunately, __array_finalize__ asks us to operate in place, rather than return a new object.

Does this work on PyPy, @mattip?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be alright on PyPy.js, with a fake inheritance of MaskedConstant -> MaskedArray -> int

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my above suggestion, I'd hope this won't be necessary any more.

Copy link
Member Author

@eric-wieser eric-wieser Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure we're stuck with this - too much of the MaskedArray code calls .view(type(self)), and we really don't want to have to special case every single one of those.

An alternative approach would be to allow __array_finalize__ to have a return value, which replaces self - but there's almost certainly code out there that returns something random because it doesn't know what its doing, so that's not safe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem anything.view(MaskedConstant) is almost certainly an error.... And setting __class__ is quite awful. But I defer to your judgment - I'm on holidays and worry looking in more detail is not going to be a short process...

(and I agree we cannot change __array_finalize__)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would seem anything.view(MaskedConstant) is almost certainly an error

Written that way, yes. The problem is any kind of subclass-capable code that tries to reattach the subclass, perhaps after allocating a new buffer. That shouldn't have to know about np.ma, so np.ma needs to be able to make .view(MaskedConstant) do something sensible

MC = np.ma.core.MaskedConstant
assert_(not isinstance(np.ma.masked[...], MC))
assert_(not isinstance(np.ma.masked.copy(), MC))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these tests previously failed

@charris
Copy link
Member

charris commented Jul 16, 2017

@mhvk @ahaldane Could you folks take a look at this?

@eric-wieser eric-wieser requested review from mhvk and ahaldane July 16, 2017 16:56
Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit I looked at this without really referring to the rest of the code, which means I didn't understand some things. Apologies! I think my suggestion for __new__ and __array_finalize__ would be an improvement, though.

numpy/ma/core.py Outdated
_mask = mask = np.array(True)
_baseclass = ndarray

__singleton = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just a single _?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as not to clash with any subclasses of MaskConstant, perhaps? Why not __?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__ to me is associated with dunder while this is really just a private attribute.

Copy link
Member Author

@eric-wieser eric-wieser Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__ is a special mechanism in python to do name-mangling, to make attributes unique to their class, so that property names don't clash with subclasses. __foo in the class suite of Bar is mangled to _Bar__foo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, learned something new! So, here it makes sense (but for __inop it then does not; presumably that should be inherited - not that anyone really should want to inherit MaskedConstant...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heck, I should probably add a del __inop there anyway, which makes the name irrelevant

numpy/ma/core.py Outdated
_mask = mask = np.array(True)
_baseclass = ndarray

__singleton = None

def __new__(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, could you change self to cls here - this really is super confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good plan

numpy/ma/core.py Outdated
mask.flags.writeable = False

masked = MaskedArray(data, mask=mask)
self.__singleton = masked.view(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This especially would be nicer as cls._singleton = ...

numpy/ma/core.py Outdated
data.flags.writeable = False
mask.flags.writeable = False

masked = MaskedArray(data, mask=mask)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I'd avoid the view casing and do instead (see comment on __array_finalize__):

masked = super(MaskedConstant, cls).__new__(cls, data, mask=mask)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the view casing anyway, because MaskedArray.__new__ calls .view

numpy/ma/core.py Outdated

def __array_finalize__(self, obj):
return
if self.__singleton is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to choose path based on obj? If one uses super in __new__ as suggested above, then the relevant case that is OK is obj is None - all other cases imply view casting or slicing, neither of which I think should be allowed (except if obj is masked) - could they just raise?

numpy/ma/core.py Outdated

def __reduce__(self):
"""Override of MaskedArray's __reduce__.
"""
return (self.__class__, ())

# inplace operations have no effect. We have to override them to avoid
# trying to modify the readonly data and mask arrays
def __inop(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single _?

Copy link
Member Author

@eric-wieser eric-wieser Jul 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guess I named this to look like __iadd__ without actually looking like a special method. Could call it _handle__iop__?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer that, but _iop would also be fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually calling it __iop__, so that np.ma.masked.__add__.__name__ prints something somewhat reasonable

numpy/ma/core.py Outdated
if self is masked:
return 'masked'
else:
# something is wrong, make it obvious
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, this will not happen in any normal usage, correct? (e.g., from the test below i assume .copy returns something else)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's the hope. There were many different ways that this could happen before, and this change made it easy to spot.

It also means that subclasses of MaskedConstant don't masquerade as masked either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense.

numpy/ma/core.py Outdated
# everywhere else, we want to downcast to MaskedArray, to prevent a
# duplicate maskedconstant.
self.__class__ = MaskedArray
self.__array_finalize__(obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my above suggestion, I'd hope this won't be necessary any more.

def test_no_copies(self):
MC = np.ma.core.MaskedConstant
assert_(not isinstance(np.ma.masked[...], MC))
assert_(not isinstance(np.ma.masked.copy(), MC))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that checks the actual class that is returned by [...] and .copy()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you worried the type might be ndarray here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more that I had no idea what it would be! (As mentioned, I didn't look at the rest of the code, but from this PR it was not obvious what the result should be, and on current master it is a MaskedConstant)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will adjust to asserting the direct type here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun fact - copying 0d masked arrays doesn't copy the mask

assert_(not isinstance(np.ma.masked.copy(), MC))

def test_immutable(self):
assert_raises(ValueError, operator.setitem, np.ma.masked.data, (), 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add the test for just trying to set np.ma.masked itself?

@eric-wieser
Copy link
Member Author

Hopefully addressed most of the reviews. This raised #9430, but I'd rather open that can of worms in another PR.


m_copy = np.ma.masked.copy()
assert_equal(type(m_copy), np.ma.MaskedArray)
# assert_equal(m_copy.mask, True) - gh-9430
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can of worms left unopened

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: #9432

# trying to modify the readonly data and mask arrays
def __iop__(self, other):
return self
__iadd__ = \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is OK because the singleton can never be unmasked? Might be cleaner stylewise to just assign the function rather than have all the line continuations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, if the singleton is unmasked there are much bigger problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the continuations, as they ensure that __iop__ doesn't get mispelt as one of the other methods


def test_write_to_copy(self):
# gh-9328
x = np.ma.masked.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So x is a downcast masked array, not the singleton in this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, strange behavior for copy.

Copy link
Member Author

@eric-wieser eric-wieser Jul 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, because that seemed the least bad option, with the others being:

  • return self - this is clearly not a copy, and will fail when trying to write to it, whereas normally you would expect to be able to write to the result of copy() without side-effects
  • return MaskedConstant(self) (current master) - this is super confusing, because it printed as masked, and would be still be unwriteable, yet fails is masked
  • stop returning masked anywhere internally - this is huge compatibility change, and breaks any is np.ma.masked comparison

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it back, this should return self, because x.copy() is x is already true for np.True_, so there's precedent for that behaviour for scalars

@charris
Copy link
Member

charris commented Jul 18, 2017

LGTM. Do we need a release note? The behavior of masked is changed in subtle ways.

@eric-wieser
Copy link
Member Author

Possibly an improvements release note?


def test_immutable(self):
orig = np.ma.masked
assert_raises(np.ma.core.MaskError, operator.setitem, orig, (), 1)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird how this raises a different error, but shouldn't be a big deal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and importantly, hasn't changed in this PR)

@charris
Copy link
Member

charris commented Jul 18, 2017

If the behavior has changed, it should probably be in the compatibility notes.

@charris
Copy link
Member

charris commented Jul 21, 2017

@eric-wieser Could you add a release note so I can merge this?

@eric-wieser
Copy link
Member Author

Looks like I forgot to push the release notes fix. I'm still away from git for the next day or two

@ahaldane
Copy link
Member

ahaldane commented Sep 4, 2017

Code LGTM.

The following might be out of the scope of this PR, but I came across them while testing:
(edited since first posting)

The singleton's shape can be edited:

>>> np.ma.masked.shape = (1,1,1) 
>>> np.ma.array([1], mask=True)[0].shape
(1, 1, 1)

Not sure if this prexisting problem is with mean or with MaskedConstant:

>>> np.ma.masked.mean()  # fails, unlike other 0d masked arrays
AttributeError: 'int' object has no attribute 'shape'

@charris charris added this to the 1.14.0 release milestone Sep 4, 2017
@eric-wieser
Copy link
Member Author

eric-wieser commented Sep 5, 2017

@charris - a summary of the behaviour changes:

  • masked is better at being the singleton it is documented to be, by not allowing copies:

    • round-tripped through pickle by identity, rather than making a copy
    • masked.copy() doesn't copy, because it doesn't need to.
      These seem like bugfixes that aren't note-worthy
  • The buffers owned by masked are now readonly

  • np.ma.masked.view(type(np.ma.masked)) no longer returns np.ma.masked. There are competing semantics here

    • view on an array should always return a new wrapper of the same type around the same buffer
    • view on a scalar should unpack the result, which in this case gives masked:
      • np.True_.view(np.ndarray) is np.True_ (identity holds for `bool_ only - otherwise it's equality)
      • np.True_.view(np.matrix) gives np.matrix([[True]], dtype=bool)

    It's hard to pick something consistent here anyway, because of np.ma.masked is not a scalar #7588.

Perhaps these latter two deserve a release note.

@eric-wieser
Copy link
Member Author

Added a release note about point 2, @charris

@charris charris merged commit 1ccfa62 into numpy:master Sep 21, 2017
@charris
Copy link
Member

charris commented Sep 21, 2017

Thanks Eric.

@mhvk
Copy link
Contributor

mhvk commented Oct 2, 2017

That np.ma.masked is no longer hashable is breaking tests in astropy - astropy/astropy#6645 (I think we had discussion about this, but am not quite sure where...)

@eric-wieser
Copy link
Member Author

Did this patch change whether masked could be hashed?

@mhvk
Copy link
Contributor

mhvk commented Oct 2, 2017

Hmm, I'm just going by @astrofrog's comment that this PR broke our tests, and recalled this hashable question about MaskedConstant. Perhaps I jumped to conclusions too quickly (am in the middle of classes and meetings with students, so cannot investigate right now).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: MaskedConstant is mutated despite copying
4 participants