-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG, DEP: Fix masked arrays to properly edit views. ( #5558 ) #5580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think this is another one of those policy questions that need discussion on the list. It looks like the current behavior was intentional. |
Posted to the mailing list. |
@charris, how do we know if we have reached consensus? It seems like I haven't heard any disagreement about this fix. It just needs tests for another use case. |
@@ -3109,8 +3109,6 @@ def __setitem__(self, indx, value): | |||
_mask[indx] = tuple([True] * nbfields) | |||
else: | |||
_mask[indx] = True | |||
if not self._isfield: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the default here is self._sharedmask = True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea why self._isfield
was important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll start by explaining what is going on and then answer your questions if that is alright. Please forgive me if I go into too much detail and explain the obvious. Good questions by the way. Sorry if my answer gets too long.
This particular branch is dealing with the case of a structured array. Suppose for example we have a structured array created like so a = np.zeros((2,), dtype=[("c",int, 2), ("d", float, 3)])
. We could construct a masked structured array like so b = np.ma.array(a, mask=np.ma.getmaskarray(a))
. In this example, we have two fields c
and d
. If we index into one say c
like so b["c"]
, we will get a normal masked array.
However, there is no way to know that this masked array came from a structured masked array; so, the _isfield
member variable indicates this. One can verify this is the case by trying b._isfield
and b["c"]._isfield
.
Now, why was _isfield
checked? My belief, as I can't speak for the author personally, is the author correctly assumed that if we are looking at a field of a structured array our mask is a view of the mask from the original structured array's mask and we want to ensure changes here are propagated back to the original and so this must remain shared if it already is (this indirectly answers your first question). However, if it is not shared, it will remain the case.
To directly answer your first question, we can't assume anything about self._sharedmask
. Its value will be dependent on what .
The argument that I am making here is that this one can't assume this is not a view if it is not a field. In other words, we could always be working with a view of another masked array where we want changes propagated to an original and should not be forcing the user's hand here. Additionally, it would be inconsistent to "unshare" the mask as the data remains shared regardless.
I'd say the consensus (of one other ;) was that the behavior needed changing, but it worries me. Have you tested this with any other packages, matplotlib for instance? Also needs a rebase. |
Thanks for reviewing this @charris. I'll block out some time this weekend to both respond to your questions/suggestions and test this a bit more rigorously. |
It's true, unfortunately, the conversation didn't attract as many people as either of us might have hoped. I have added some responses to give you more insight to the changes that are proposed here. I have not had time to test this with other packages and will try to set aside some more time to do that. Are there any other packages that you recommend? Agreed that this needs a rebase, but I will hold off until you have had some time to see/respond to my comments otherwise they get buried on GitHub. |
self.assertTrue(y2._data.__array_interface__ == x1.__array_interface__) | ||
#self.assertTrue( y2.mask is m) | ||
self.assertTrue(y2._mask.__array_interface__ == m.__array_interface__) | ||
#self.assertTrue( y2.mask is m3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space after the #
. It isn't clear that this is a comment rather than a commented out statement, it would be better something like
# check that y2.mask is m3 in (whatever the case is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if it is a commented out statement, it should be removed. Frankly, I don't understand what is going on here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I wasn't clear as to why these commented statements were kept either. I think I tested the original code with those comments removed, but it failed if I recall correctly. Sure, we can drop them completely.
@jakirkham Needs a rebase also. |
Well, I read through it twice now in the hope of better understanding maskedarray, but it's a bit unclear. It does seem like all the unshared mask stuff should be removed - the code makes more sense without, and it fixes a bug. But then it is quite mysterious why that code was there in the first place. |
Sorry to get to this very late. I liked the idea but wondered whether there really was nothing in the documentation. It turns out the current behaviour is explicitly mentioned [1]:
Personally, I feel this somewhat contradicts the expectation that one gets a view of everything. But obviously, if we change it, the documentation should be changed as well, and should include a note on how one can get back the old behaviour (presumably, something like Note, though, that with the new behaviour things will not be consistent if the original masked array did not have a mask. For that case, its mask attribute will hold [1] http://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html#indexing-and-slicing |
It is reassuring that the current behavior was intended as opposed to being a possible mistake even if it contradicts our intuition.
Agreed.
Sure, I'll make sure to add this to this PR. It is also worth noting that the
Yeah, |
I go back and forth on this. The change does make sense, but then the original behavior was intended, no doubt for some reason that escapes us. The nomask case is also bothersome. Would it be an imposition if I put this off until 1.11-devel? |
OK, I'm going to push this off to 1.11.0, it is more of a change than I want to see in the 1.10 release at this point. |
4bda89f
to
c110ced
Compare
Sure, I don't have lots of time at present to work on this anyways. Do you think it would be worth getting rid of nomask in favor of just having an array full of Also, rebased. :) |
@charris, do you think it would be a worthwhile to post something on mailing list about deprecating |
If we were to rewrite |
Maybe we should do a new "MaskedContainer" and try to solve some of the problems with ma. If we get a version of |
The deprecation went into 1.11, so I am going to push this off to 1.14, which should be a little less than two years. |
Close and reopen to restart testss. |
So NumPy 1.13 has already been branched out, right? Can this be merged? |
@mhvk Any final thoughts on this? |
@jakirkham Needs a 1.14 release note. |
Please let me know what you would like to see for a release note. |
@@ -379,8 +379,8 @@ is masked. | |||
When accessing a slice, the output is a masked array whose | |||
:attr:`~MaskedArray.data` attribute is a view of the original data, and whose | |||
mask is either :attr:`nomask` (if there was no invalid entries in the original | |||
array) or a copy of the corresponding slice of the original mask. The copy is | |||
required to avoid propagation of any modification of the mask to the original. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should note the previous behavior, the new behavior, and the release in which things changed.
@jakirkham Take a look at the |
I confess that the incompatible nomask behavior bothers me quite a bit. Maybe we should seek to fix that problem first. If there is a fix. |
I'm thinking of something like
to fix the nomask problem. The mask and hard/soft could be stored in a container that is inherited by the view, so can be changed, and the mask property accesses the container. |
Do you mean something like: class MaskedArray:
@property
def mask(self):
if self._nomaskscalar_backcompat and not self._mask.any():
return nomask
return self._mask Where |
I was thinking more like storing the |
Also wondering if we need some locking around view taking if the mask is also a view. The thought is that another view could modify the mask while a new view is being taken. I suppose that this is a problem with views in general, and something that folks using them in threading environments need to take into account in any case. I wonder just how atomic ndarray indexing is? |
Overall, I would like to go on, now that we've had the Future/Deprecation cycle, because I do prefer that masks are viewed just like the data. But I also agree that we should not be in a state where the mask sometimes propagates and sometimes not... I quite like the wrapping in a new class, though also @eric-wieser's suggestion of faking the |
Given the widespread use of both Hmm, maybe we should start with a PR that changes masked arrays to always use arrays internally and see if that causes problems. |
I'm kind of holding the opinion that maybe a complete rewrite of |
So if you do go the rewrite |
OK, I'll put this in as is. I expect the |
Thanks everyone. Glad to see this one in. |
Fixes #5558
A detailed description of the bug is available at issue #5558. In short, when changing a view of a masked array, the data is properly propagated to the original, but changes to the mask are not. This pull request attempts to address this by ensuring the mask will also be properly altered.