-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ma.empty/ones/zeros_like to copy mask properly #3404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As underlying `empty_like()` knows nothing about mask, it would be `_convert2ma`'s job to properly set up the output mask. `copy` argument is added to figure out which functions need this behavior. `ones_like()` and `zeros_like()` which were mere aliases of regular numpy functions now go through `_convert2ma` as well. Hopefully this fixes numpy#3145, numpy#2572.
a.mask = False | ||
assert_array_equal(a_empty.mask, m) | ||
assert_array_equal(a_ones.mask, m) | ||
assert_array_equal(a_zeros.mask, m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this says that all the *_like
functions return arrays in which everything that was masked in the input, is masked in the output. This seems odd to me. Shouldn't e.g. ones_like
give you an array whose contents is all-ones and whose mask is all-False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might depend on the design of API. Yet, many users would expect *_like
functions return arrays of same shape and mask. In either way, current implementation returns a new array with old mask, which is definitely confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to follow up, it looks like everyone on the mailing list prefers the API where the *_like
functions return a new array with the mask set to nomask
, rather than copying the input array's mask, and that's my intuition as well. Want to update the PR to do that? (Or if you have an argument for why it should be the other way around, bring that up on the list?)
Also Pierre GM made a suggestion for how the _convert2ma
function might be changed that you might want to take a look at.
What I am wondering here is, if the mask should not be copied whenever in array_finalize the memory between the two arrays is not shared. Also in the case where it is shared we should create a view due to reshape bugs (there are some issues about this open). The latter I once started (but have not much clue if it is right nor the time to think much about it in seberg/numpy@master...safemask . Though in that code I used apparently used the owndata flag, which I think should likely be replaced with the may_share_memory check mentioned. Just if you have some time and interested and use/know masked arrays. I think it might solve other smaller issues elsewhere, too. Though I don't know if someone might actually work with the fact that masks are not copied sometimes when the array is (but it seems weird to me). |
Asked about the API: On Thu, Jun 6, 2013 at 1:35 PM, seberg notifications@github.com wrote:
|
A follow-up that won't be captured in the above thread, with a dissenting opinion: |
Needs decision. |
Well, that's no help ;) The majority voted for nomask, whereas Gregorio wants a mask copy and has code that uses that assumption. Hmm..., I think backward compatibility (almost) requires the copy, whereas changing the mask values would require deprecation. So I vote for this patch as is. Thoughts? |
Or at least for |
Alternatively, we could use an ma specific keyword to determine which behavior to use, say |
Previously, operations which created a new masked array from an old masked array -- e.g., np.empty_like -- would tend to result in the new and old arrays sharing the same .mask attribute. This leads to horrible brokenness in which writes to one array affect the other. In particular this was responsible for part of the brokenness that @jenshnielsen reported in numpygh-5184 in which np.gradient on masked arrays would modify the original array's mask. This fixes the worst part of the issues addressed in numpygh-3404, though there's still an argument that we ought to deprecate the mask-copying behaviour entirely so that empty_like returns an array with an empty mask. That can wait until we find someone who cares though. I also applied a small speedup to np.gradient (avoiding one copy); previously this inefficiency was masking (heh) some of the problems with masked arrays, so removing it is both an optimization and makes it easier to test that things are working now.
Previously, operations which created a new masked array from an old masked array -- e.g., np.empty_like -- would tend to result in the new and old arrays sharing the same .mask attribute. This leads to horrible brokenness in which writes to one array affect the other. In particular this was responsible for part of the brokenness that @jenshnielsen reported in numpygh-5184 in which np.gradient on masked arrays would modify the original array's mask. This fixes the worst part of the issues addressed in numpygh-3404, though there's still an argument that we ought to deprecate the mask-copying behaviour entirely so that empty_like returns an array with an empty mask. That can wait until we find someone who cares though. I also applied a small speedup to np.gradient (avoiding one copy); previously this inefficiency was masking (heh) some of the problems with masked arrays, so removing it is both an optimization and makes it easier to test that things are working now.
As reported in #3145 and #2572,
ma.empty_like
does not copy, but merely assigns a reference to the original mask, leading to undesirable results.This patch makes
numpy.ma.empty_like()
to work properly, in addition to newly introducednumpy.ma.ones_like()
andnumpy.ma.zeros_like()
.However,
numpy.empty_like()
and siblings still do not handle masks correctly. This could be a work demanding more lower-level changes.