Skip to content

API: Make MaskedArray.mask return a view, rather than the underlying mask #10308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 12, 2019

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Jan 2, 2018

This prevents consumers from reshaping the mask in place, which breaks things

As a result, x.mask is x.mask returns False, but this was already true of x.data is x.data.

May also be related to gh-10270


This fixes the following:

assert np.ma.masked.mask.shape == ()  # fails!

a = np.ma.masked
b = np.ma.array(1, mask=a.mask)
b.shape = (1,)
assert np.ma.masked.mask.shape == ()  # fails!

@mhvk
Copy link
Contributor

mhvk commented Jan 2, 2018

I don't know that I like this: in general, python is fairly consistent in just giving you a direct link, so that you can do tests like a is b, and that comes with the responsibility to not just screw things up by assuming you can mutate (as in directories passed through). I think the analogy with data is not quite correct, since that really returns a different type.

Separately, I don't think we should ever suggest to use private attributes.

@eric-wieser
Copy link
Member Author

The other argument for making this change is it gives us the freedom to swap out the implementation for one that computes the mask on the fly - such as how x.imag is x.imag also returns False.

It seems pretty apparent to me that is is a bad way to detect "is a view of the same data" in numpy. Perhaps we need a np.is_full_view = lambda x, y: x.__array_interface__ == y.__array_interface__ or similar

@mhvk
Copy link
Contributor

mhvk commented Jan 2, 2018

Hmm, looking more at #10270, I see at least the rationale. And I see that __array_finalize__ happily changes the shape of the mask it got from obj. But from an admittedly not in-depth inspection, it is also the only place: everywhere else special care is taken to view or copy the mask properly.

@rgommers
Copy link
Member

rgommers commented Jan 2, 2018

That this PR updates mostly tests seems to indicate that this will break user code. So I share @mhvk's concern.

Also, these are the things that should be discussed on the mailing list before merging, since it's an intentional backwards compat break.

@eric-wieser
Copy link
Member Author

seems to indicate that this will break user code.

Probably mostly user tests. Only code relying on the identity of arr.mask will fail.

should be discussed on the mailing list before merging

Agreed - I figured I'd make the PR first to work out exactly what needs to change

@rgommers
Copy link
Member

rgommers commented Jan 2, 2018

should be discussed on the mailing list before merging

Agreed - I figured I'd make the PR first to work out exactly what needs to change

Yeah, good idea - is a useful strategy especially if the PR is not hard. Easier to discuss a concrete rather than an abstract change.

@mhvk
Copy link
Contributor

mhvk commented Jan 2, 2018

Something else that is slightly illogical is that this changes only how ma.mask behaves - if you use np.ma.getmask(ma) you would still get the mask directly. Logically, it would seem that, if anything, this should be the reverse. But if one changes the latter to return a view, lots more tests break.

Or, more particularly, a number of additional tests break if one changes this in __array_finalize__ - the one place where really one should not be meddling with the mask shape and have it propagate back to obj! Looking deeper into this: there is actually an incredible hack - it makes shape-setting work by the following rather horrible logic:

  1. ma.shape = <shape> calls the base ndarray.shape setter (getset.c, array_shape_set).
  2. This internally calls PyArray_Reshape, i.e., effectively new = np.ndarray.reshape(ma, <shape>).
  3. Which calls new.__array_finalize__(ma), and therefore mangles ma.mask
  4. Upon success, the c code copies new.shape to ma.shape, but does not touch the mask.
  5. Since the mask is already changed, this works.

@eric-wieser
Copy link
Member Author

I'm not sure I understand how _mask fits in there - are you sure that the .view(np.ndarray) actually happens?

@mhvk
Copy link
Contributor

mhvk commented Jan 2, 2018

Sorry that was not clear: I should have written np.ndarray.reshape(ma, <shape>) (will edit above) - this calls __array_finalize__.

@mhvk
Copy link
Contributor

mhvk commented Jan 2, 2018

I saw a comment with an example of b=MaskedArray(1., mask=masked.mask) fly by but don't know where - this would seem a bug in the initializer, which should always take a new view of the mask.

@mhvk
Copy link
Contributor

mhvk commented Jan 2, 2018

Fixing __array_finalize__ does not seem to be entirely trivial, sadly. Really, we should be ensuring ma.shape does the right thing, but it's not enough. An initial attempt is at #10314.

@mattip
Copy link
Member

mattip commented May 11, 2019

Needs rebasing and changing the release note update from 1.15 to 1.17

…mask

This prevents consumers from reshaping the mask in place, which breaks things

As a result, `x.mask is x.mask` returns `False`, but this was already true of `x.data is x.data`.

May also be related to numpygh-10270
@eric-wieser eric-wieser added this to the 1.17.0 release milestone May 11, 2019
@eric-wieser eric-wieser requested a review from mattip May 11, 2019 22:04
@charris
Copy link
Member

charris commented May 12, 2019

close/reopen

@charris charris closed this May 12, 2019
@charris charris reopened this May 12, 2019
@mattip mattip merged commit d2d5897 into numpy:master May 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants