-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Raise a quieter MaskedArrayFutureWarning
for mask changes.
#7194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Raise a quieter MaskedArrayFutureWarning
for mask changes.
#7194
Conversation
a41ab10
to
aa81395
Compare
FutureWarning
for masked array when it can propagate.MaskedArrayFutureWarning
for mask changes.
MaskedArrayFutureWarning
for mask changes.MaskedArrayFutureWarning
for mask changes.
aa81395
to
a262155
Compare
This seems a bit too quiet to me, but I am not sure there is a good fix. The problem that I see is that once some masked array that shares a mask gets assigned to it will copy the mask and so will no longer share it. So, future operations with this object will not raise a warning. However, I don't see a good way to fix this. See an example below.
|
a262155
to
2768476
Compare
FutureWarning | ||
) | ||
if (self._mask is not nomask) and \ | ||
(self._sharedmask or self._data.flags.owndata): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By checking whether we own the data of the data part of the array, we can see if it is a view or not. If it is a view the mask should be a view in the future, so this is a good reason to check. This will only be worth checking if we don't already note that we share the mask.
I think I fixed the former issue by checking to see if the data is a view or not. |
2768476
to
8aea362
Compare
Weird. So, if I do the following, I think I am seeing a bug with
|
Too busy right now, but do you mind having a look at my take on it as well, maybe you find some mistake. you do not expect a warning the second case, since copying the mask will set the sharedmask attribute to False? |
Oh, sorry, I copied and pasted my example code from before. I now work around this, but yes it will actually make the |
@jakirkham I really think we should put the warning to exactly where we plan to change things, which is as far as I can tell just the |
My concern with adding the warning to I'm not sure that I like recommending messing with an internal API detail like I felt like as I did introduce this problem and people do view it as one that I should make some effort to fix it. You're right that we shouldn't both be fixing. Currently, I am not working on this. It is merely a proposal for our discussion and definitely needs some more work as mentioned. |
MaskedArrayFutureWarning
for mask changes.MaskedArrayFutureWarning
for mask changes.
FutureWarning | ||
) | ||
if (self._mask is not nomask) and \ | ||
(self._sharedmask or not self._data.flags.owndata): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While owndata
seems like the right sort of check, it seems that views of the data are being regularly created as part of construction a MaskedArray. We need another way to get at this information. Maybe a reference count to the original array, but am not sure how to get that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not to mention that if you mask an existing array the data is a view, so that won't work. Reference counting would be a source of endless troubles. Hmm...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, as views on the data act only as a proxy for what we really want to know. I agree reference counting is to messy.
One thought I had is if we have some flag set here in __setitem__
to note if the mask was shared previously like self._oldsharedmask
. Admittedly, if other references to the old mask are dropped, there might still be a few false positives, but trying to address that special case involves referenced counting, which I think we both want to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and implemented the solution described above.
8aea362
to
99342d5
Compare
@jakirkham @seberg One thing worth thinking about is if there is a workaround that will result in the original behavior, i.e., data view and mask copy, that way people could maintain backward compatibility if they really wanted the old behavior. |
Simple call |
unshare_mask gives users the ability to request the future behavior now, On Fri, Feb 5, 2016 at 5:13 PM, jakirkham notifications@github.com wrote:
Nathaniel J. Smith -- https://vorpus.org http://vorpus.org |
No, the other way around. It will make sure a copy of the mask is made and used as the new mask. So, it will stop propagation to the former mask's views as they won't share the same data any more. |
Ah, okay, right, sorry, I had it mixed up. |
I should add that doing so with this change to the warning will silence it. Though any trivial masks like those in matplotlib will be silenced anyways. |
The only way I can think of, is to do |
" will propagate values back through all masks that are present.", | ||
FutureWarning | ||
) | ||
if (self._mask is not nomask) and \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enclosing parenthesis are preferred to line continuations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This should be corrected.
So I probably don't quite understand all the nuances of what happens when a view is taken, which I presume is where all the trouble is, but if
Anyway, it does seem to me that if you know what the propective changes will be (and that should be layed out before this), then the best thing would be to issue the warning where it makes a difference. |
And now I've gone back and reviewed the original PR, which I had forgotten about. The nomask case really does add a complication. In fact the whole thing is sufficiently complicated that I'm tempted to push this off until we do have a clear PR that has been discussed that implements the new behavior so that it can be discussed in detail. |
It feels to me like this is being overthought. The reality is if a slice is taken and the mask is non-trivial All we want to do is raise a warning in
This is all we are doing with this warning. I am not sure how I can help make this any clearer. Please ask questions if you have them. |
The behavior is implemented in that PR and has been discussed. It also has been on the mailing list at least once (possibly twice IIRC). The |
@jakirkham Let me think on it for another night. I want to be clear about what is going on and I think you would agree that the masked array code takes a bit of work to understand. |
Ok, that's fair. I need to do some other things atm, but will check back in tomorrow. Please let me know if you have questions. |
OK, now I get the oldsharedmask idea. I am not sure I feel it is very necessary. We mostly worry about propagating values back to the parent, this most likely is either bad already at the first warning, or never. It is true that continuing to warn might help finding errors in some cases, so it is a nice idea, but I am not quite convinced it is necessary or that helpful. For simplicities sake, I still prefer to placing the warning exactly where the code change will occur in the future. Note that my "added complexity" with reference counting only silences some additional spurious warnings in cases that cannot be affected by the change (unless someone does weakref crazyness, or has hooks in the mask arrays destruction). Probably everyone knows these things, but in case it might make it a bit more clear. The unshare mask logic, is a mechanism designed to prevent back-propagation to the parent in:
Additionally there is the The proposed change, as far as I understand, is to simply never call There are also many many places where |
BUG: Raise a quieter `MaskedArrayFutureWarning` for mask changes.
Well, let's give this a shot in 1.11.0b3. |
Thanks @charris. |
Glad you follow, @seberg. Sorry I didn't explain it better before. You may be right, but at this point I think any adjustment to this warning will try to follow the changes in the patch, but I will need to revisit the patch to make sure its merge conflicts have been resolved correctly.
Agreed. Will need to fix some merge conflicts with the old patch to make sure that we know exactly where they are and they are correct.
Thanks for doing a better job explaining it than I did. :)
Glad to hear you are coming around on this. I suppose we could suggest some kind of inheritance that fixes this or a monkey patch (only suggested as inheritance of NumPy arrays is a bit complex) to get this behavior earlier to avoid accidentally messing up
That's the idea and you are probably right we can just put the warning there. I think there was one tweak with fields of record arrays, but they should still hit this warning. |
Just to note, I am very sure that there is at least one spurious I have not tested this code when it comes to spurious errors in the |
I doubt we will ever get perfect precision and recall here. Though I will contend that the changes should help make a noticeable improvement on the situation. |
Sure, but we should (really must) avoid all warnings we know of and |
Once confirmed, we can look at any and all special cases that appear. I think the better thing to do though is to make sure the bugfix patch has its merge conflicts fixed and tests passing. After this is done, the warning can probably be cut down a bit more by just emitting warnings where the patch changes behavior. |
Just to note in case we want to make it even quieter after this, in #7020 I suggested to only raise the warning in
this way the warning only shows up in code which attempts to assign to a view of a masked array. I'm not sure exactly how much that solution overlaps with the one here involving oldsharedmask. |
@ahaldane The problem is that
|
FWIW, @ahaldane, that is what I tried first, but it turns out that all masked arrays that create a view of an |
@jakirkham: OWNDATA is just not a reliable sign of whether there are multiple user-exposed views of the same array -- masked arrays are far from the only place that acts like this. I wouldn't bother trying to fix it; of course it's best to have a signal that's 100% reliable, but in some ways it's worse to have a signal that's 99% reliable than a signal that's 90% reliable, because with the 99% reliable signal people are more likely to rely on it. |
Sorry if that was confusing, I am merely agreeing with @charris that using As far as thinking about improving the warning, I was only considering this if it was found to be causing issues somewhere else. Thus far, (and this may be premature), this does not seem to be the case. If that changes, then I am happy to rework the warning patch. |
Hmm, fair enough, I didn't realize that. |
I would have thought you use |
Fixes #7164
Related: #7020
Related: #5580
Drop the
__getitem__
warning. In__setitem__
, check to see if the masked array is shared. If it is shared, we know it will propagate upstream in the future. Also, use a more specific warning type instead of usingFutureWarning
so that this can be explicitly ignored.