-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
ENH: Make no unshare mask future warnings less noisy #7363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
the mask of ``child`` is a view into the mask of ``ma_array``. | ||
Also when creating ``ma_array = MaskedArray(data, mask)`` will | ||
make ``ma_array`` view the data in ``mask`` and setting an item | ||
will currently copy ``mask`` but in the future modify the original. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few issues with this text:
- it's in the wrong place --it's currently under "things that will happen in 1.12", but there's no way we're going to flip the switch on this in 1.12, it's way too early.
- the description here is hard to understand i think. "Setting items of the array using the square bracket notation" to me sounds like
__setitem__
. I'm not 100% sure what the part aboutMaskedArray
is saying either. Maybe: "Currently, taking a view of a masked array produces a confusing result. For example, if we writemasked_view = masked_original[:]
, thenmasked_view
's data array will be a view ofmasked_original
's data array, so modifications to one array's data will also affect the other:masked_view[0] = 123; assert masked_original[0] == 123
. But currently, the mask array is copied, rather than taking a view, so changes to one array's mask will not affect the other:masked_view[0] = np.ma.masked; assert masked_original[0] is not np.ma.masked
. A similar situation happens when explicitly constructing a masked array usingMaskedArray(data, mask)
-- the returned array will have a view ofdata
but a copy ofmask
. In the future, these cases will be normalized so that the data and mask arrays are treated the same way, and modifications to either will propagate between views. In 1.11, numpy will issue awhatever class it is
warning whenever user code modifies the mask of a view. To silence these warnings, and make your code robust against the upcoming changes, you have two options: if you want to keep the current behavior, callmasked_view.unshare_mask()
before modifying the mask. If you want to get the future behavior early, domasked_view._sharedmask = False
."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: do unshare_mask
and/or _sharedmask
work in older versions of numpy? If so then we should say that, because downstream will be working about how to handle both old numpy and 1.11 in the same code and that will make their lives easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, there is a twist. The mask is a view until you call setitem (or unshare_mask explicitly). I am trying to adapt your text a bit to be clear on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, but frankly, I am not sure it can be understood. It is tricky business....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that logic was there since a very long time (probably forever). but setting _sharedmask
to false, might break explicit calls to unshare_mask, hmmpf, not hat I assume there are many, but still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could just document that unshare_mask is unreliable after this, I can't
imagine people will want to intentionally be using both. Could also make at
least the new version of unshare_mask copy unconditionally (or maybe it
already does this) in preparation for removing _sharedmask logic entirely.
On Feb 28, 2016 11:53, "seberg" notifications@github.com wrote:
In doc/release/1.11.0-notes.rst
#7363 (comment):@@ -41,8 +41,21 @@ Future Changes
The following changes are scheduled for Numpy 1.12.0.
- Support for Python 2.6, 3.2, and 3.3 will be dropped.
-* Slicing aMaskedArray
will return views of both data and mask.
- Currently the mask is returned as a copy.
+* When setting items of aMaskedArray
using the square bracket- notation NumPy will in the future never copy the mask. Some
- examples of this are
child = ma_array[::2]; view[3] = 3
, where- the mask of
child
is a view into the mask ofma_array
.- Also when creating
ma_array = MaskedArray(data, mask)
will- make
ma_array
view the data inmask
and setting an item- will currently copy
mask
but in the future modify the original.Yeah, that logic was there since a very long time (probably forever). but
setting _sharedmask to false, might break explicit calls to unshare_mask,
hmmpf, not hat I assume there are many, but still.—
Reply to this email directly or view it on GitHub
https://github.com/numpy/numpy/pull/7363/files#r54355225.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, added a mention. No, it only copies when the _sharedmask
is True. This is the case far more often then necessary, but I think it is reliable to always copy the mask if it may be necessary (to not propagate to parent). As you said before, there might be cases where unshare mask may be wanted, though I guess in those cases an unconditional copy is likely to be perfectly fine, so making it unconditional may be an option, hmmmm (we would just not call it unless also giving the warning otherwise).
Overall:
So modulo comments above my impression is that something like this should probably be merged. Cc @jakirkham - thoughts? |
6913d75
to
11ba029
Compare
@njsmith we are "getting away" with it, because our test suit is broken when it comes to warning testing, especially FutureWarnings, see my suppressed warnings PR ;). |
11ba029
to
c8c338b
Compare
Incorporates Nathaniels suggestions for a longer explanation in the release notes.
Bah, one other annoying thing, though maybe we can ignore it. The mask is of course never a view if it is nomask. |
So, I haven't had lots of time to take a close look. Though my impression after looking through is this looks very similar to the previous version ( #7187 ). My criticisms are pretty much the same. Not a fan of reference counting as this could result in weird behavior in terms of performance and validity (of warning). Concerned as this will not raise warnings when there are views of views (and so on). This is what Though the big question for me right now is whether this change is needed. I haven't heard any complaints about the warning level on the initial issue. Though that doesn't mean that there are no complaints. So, my question is, are there still problems downstream? Are there problems here? If so, what are they? |
This appears to already be addressed here thanks to the fact that the first branch rules that out ( https://github.com/numpy/numpy/pull/7363/files#diff-e85f023d9c07df014a190b485db72c3dR3229 ). |
@jakirkham, it annoys me because it is inconsistent in the final behaviour. Your desired end result is, that mask changes tend to propagate back to the original array, but they won't in the case of As to the rest, as I said, the reference counting is only a further filter, and yes, it is safe even for views of views, because that is what it checks. That only "self" holds a reference to the mask and the mask is not a view. If someone had a view into that same mask, they would also -- indirectly -- have a reference to it. Feel free to proof me wrong, but I predict unless you use wekrefs you won't find a loophole. You are right about bad code smell, but I will just claim my nose is better trained. If you check carefully, all the tweaks in About [1]:
|
I thought you were musing over quieting the warning by a check for
Yeah, still feel like ref counting is a bit sketchy. Not sure how much we can trust this given how masked arrays are constructed.
Again haven't had lots of time to take a close look. It seems like a call to
When I have more time I'll try to give a better explanation. To say it simply and roughly, it tries to mimic what the future behavior will be. I suppose it could be tweaked more... Though when these improvements were contemplated the question was raised about how accurate we really want to make the warning and whether more accurate might not be worse by lulling people into reliance on a nearly (not entirely) accurate warning. This brings me to my original question, which I am still not seeing the answer to. What problems are we still solving? If it is to silence a few false alarms in covariance, median, etc., maybe we should just silence those by following the recommend procedure from before namely calling |
What to do about this? Looking at Pauli's tests I see two occurances of There is mention of other independent parts of this PR that look desireable. Could someone expand on that? |
Sorry, I was trying to be clever using |
As for as the proposal presented here, below are my thoughts. Others may of course think differently. Seem like good ideas:
Subject of discussion:
Seem like not good ideas:
|
My impression is that the main effect of this is to put the warning directly onto the same code path that is actually going to be changed in the future, which is usually the simplest and most accurate way to do things. Is that wrong? Can you elaborate on what the actual trade-offs are here?
It seems safe and accurate to me, but meh, whatever, I'm not so excited that I want to argue about it :-)
IIUC, what you're saying here is that you don't like the recommendation that users do |
Hi,
At first when reading this PR I thought we would have to wait for 1.12, and this sentence was not clear because of the meaning of "_sharedmask": setting it to False seems like it prevent the mask to be shared, though it is actually the contrary. So now that I read the code I understand, but maybe the sentence should be more explicit ? |
If anyone wants me to do explicit fixup stuff here (i.e. remove the The general point whether we want it. Dunno, in general I think I |
I've been meaning to review this but just haven't had time! I'll take a look once I get a chance. (by the way I wouldn't consider myself a MaskedArray user really... I got interested in MaskedArrays due to changes I wanted to make elsewhere in numpy, which snowballed) |
I'm going to stick with what we have for 1.11.0. If there are internal uses that raise warnings, we should deal with that at some point, but the number of warnings currently generated downstream don't strike me as excessive. I will try to update the release notes, but with some editing as I find the version here confusing, but otherwise an improvement as it offers a more extended explanation. |
Updated release notes in #7430, comments welcome. |
@seberg @jakirkham I'm thinking we might solve the nomask problem by making the EDIT, We might want to control access with a lock, or maybe not, we don't bother to do that with ordinary views and I don't know if array assignments or property getters/setters are atomic. |
Hehe, yeah but another level of indirection would make it even harder Ceterum censeo, that moving the place to warn has only advantages, |
ENH: Make no unshare mask future warnings less noisy
I went and checked both Matplotlib and StatsModels for both versions and this PR does a better job. So merging. Thanks Sebastian. |
@saimn, that warms my heart. 😄 |
@charris, ok, if you think it is better, I trust your judgement. |
In particular, the StatsModel warning went away, which was good because it didn't seem associated with any test. For Matplotlib, the warning was in a different place and looked valid whereas the warning without this PR did not look correct. Not sure what the difference was there. Setting the stacklevel also helped. |
@jakirkham I figured the only valid way to settle the question was to gather some actual data and it does look like this PR performs better. EDIT: I was hoping they would both work the same, but it didn't turn out that way. |
No need to keep selling it. 😄 I have learned my lesson after the disaster that was my first attempt at this warning. 😉 |
A little late here since you just merged, but I was just looking through the history of changes here and this PR looks ok to me. I was at first a little worried about the refcounting, but @seberg's comment that this is merely a further filter on the warning satisfied me: There are no cases where we should emit a warning but don't (it can't happen that the mask is really shared but appears unshared by refounting), although there are still some rare cases where we probably don't need to warn but do (eg if the users manually makes a reference to arr._mask, but doesn't write to it). |
@seberg Just to be sure, assignment to a masked data value can also change the mask, does this cover that case? |
@charris sorry, a bit daft here, can you give an example? My logic here always was: Put the warning where the future change is and trust that it must be right. |
@SeberT Actually, looks like a miss...
No warning issued. |
NVM, bad example... EDIT: Looks OK. |
Heh, almost difficult to get a good example:
|
Mostly a rebase of the other one, mostly in case anyone still cares (yeah, I can't stop myself from just not liking ):
_sharedmask = False
calls also fix in extras.py. Those warnings are caused by intermediate arrays without any user control.MaskedArrayFutureWarning
as its own class, I believe at the point where we feel that we have to make it easier to suppress a warning, there is probably something not quite right (the warning should be avoided not silenced). But it left it because it does not hurt._sharedmask = False
it is messing with internals, so it is ugly and I don't mind removing it.unshare_mask
, i.e. field-indexing, hardmasks assign "masked". Putting the warning where the call is just does not break my brain and gets down the warning hits in the test suit to something like 6 (~28 without the refcount logic), from 65 hits (counting test failures when turning to error).