Skip to content

ma.empty/ones/zeros_like to copy mask properly #3404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

tomyun
Copy link

@tomyun tomyun commented Jun 6, 2013

As reported in #3145 and #2572, ma.empty_like does not copy, but merely assigns a reference to the original mask, leading to undesirable results.

This patch makes numpy.ma.empty_like() to work properly, in addition to newly introduced numpy.ma.ones_like() and numpy.ma.zeros_like().

However, numpy.empty_like() and siblings still do not handle masks correctly. This could be a work demanding more lower-level changes.

tomyun added 3 commits June 5, 2013 16:13
As underlying `empty_like()` knows nothing about mask, it would be
`_convert2ma`'s job to properly set up the output mask. `copy` argument
is added to figure out which functions need this behavior.

`ones_like()` and `zeros_like()` which were mere aliases of regular
numpy functions now go through `_convert2ma` as well.

Hopefully this fixes numpy#3145, numpy#2572.
a.mask = False
assert_array_equal(a_empty.mask, m)
assert_array_equal(a_ones.mask, m)
assert_array_equal(a_zeros.mask, m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this says that all the *_like functions return arrays in which everything that was masked in the input, is masked in the output. This seems odd to me. Shouldn't e.g. ones_like give you an array whose contents is all-ones and whose mask is all-False?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might depend on the design of API. Yet, many users would expect *_like functions return arrays of same shape and mask. In either way, current implementation returns a new array with old mask, which is definitely confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to follow up, it looks like everyone on the mailing list prefers the API where the *_like functions return a new array with the mask set to nomask, rather than copying the input array's mask, and that's my intuition as well. Want to update the PR to do that? (Or if you have an argument for why it should be the other way around, bring that up on the list?)

Also Pierre GM made a suggestion for how the _convert2ma function might be changed that you might want to take a look at.

@seberg
Copy link
Member

seberg commented Jun 6, 2013

What I am wondering here is, if the mask should not be copied whenever in array_finalize the memory between the two arrays is not shared. Also in the case where it is shared we should create a view due to reshape bugs (there are some issues about this open). The latter I once started (but have not much clue if it is right nor the time to think much about it in seberg/numpy@master...safemask . Though in that code I used apparently used the owndata flag, which I think should likely be replaced with the may_share_memory check mentioned.

Just if you have some time and interested and use/know masked arrays. I think it might solve other smaller issues elsewhere, too. Though I don't know if someone might actually work with the fact that masks are not copied sometimes when the array is (but it seems weird to me).

@njsmith
Copy link
Member

njsmith commented Jun 10, 2013

Asked about the API:
http://mail.scipy.org/pipermail/numpy-discussion/2013-June/066832.html

On Thu, Jun 6, 2013 at 1:35 PM, seberg notifications@github.com wrote:

What I am wondering here is, if the mask should not be copied whenever in
array_finalize the memory between the two arrays is not shared. Also in the
case where it is shared we should create a view due to reshape bugs (there
are some issues about this open). The latter I once started (but have not
much clue if it is right nor the time to think much about it in
seberg/numpy@master...safemask . Though in
that code I used apparently used the owndata flag, which I think should
likely be replaced with the may_share_memory check mentioned.

Just if you have some time and interested and use/know masked arrays. I
think it might solve other smaller issues elsewhere, too. Though I don't
know if someone might actually work with the fact that masks are not copied
sometimes when the array is (but it seems weird to me).


Reply to this email directly or view it on GitHubhttps://github.com//pull/3404#issuecomment-19042020
.

@njsmith
Copy link
Member

njsmith commented Jul 17, 2013

A follow-up that won't be captured in the above thread, with a dissenting opinion:
http://mail.scipy.org/pipermail/numpy-discussion/2013-July/067136.html

@charris
Copy link
Member

charris commented Aug 16, 2013

Needs decision.

@charris
Copy link
Member

charris commented Mar 28, 2014

Well, that's no help ;) The majority voted for nomask, whereas Gregorio wants a mask copy and has code that uses that assumption. Hmm..., I think backward compatibility (almost) requires the copy, whereas changing the mask values would require deprecation. So I vote for this patch as is. Thoughts?

@charris
Copy link
Member

charris commented Mar 28, 2014

Or at least for empty_like, and having the new zeros_like and ones_like behave differently would be confusing. Of course, once they are all in, I suspect further changes are off the table.

@charris
Copy link
Member

charris commented Mar 28, 2014

Alternatively, we could use an ma specific keyword to determine which behavior to use, say clearmask=0 or nomask=0.

njsmith added a commit to njsmith/numpy that referenced this pull request Oct 18, 2014
Previously, operations which created a new masked array from an old
masked array -- e.g., np.empty_like -- would tend to result in the new
and old arrays sharing the same .mask attribute. This leads to
horrible brokenness in which writes to one array affect the other. In
particular this was responsible for part of the brokenness that
@jenshnielsen reported in numpygh-5184 in which np.gradient on masked
arrays would modify the original array's mask.

This fixes the worst part of the issues addressed in numpygh-3404, though
there's still an argument that we ought to deprecate the mask-copying
behaviour entirely so that empty_like returns an array with an empty
mask. That can wait until we find someone who cares though.

I also applied a small speedup to np.gradient (avoiding one copy);
previously this inefficiency was masking (heh) some of the problems
with masked arrays, so removing it is both an optimization and makes
it easier to test that things are working now.
@njsmith
Copy link
Member

njsmith commented Oct 18, 2014

Closing in favor of #5203. The commentary in this PR is still relevant, but progress is stalled and the actual change currently in the branch is a subset of what's in #5203.

@njsmith njsmith closed this Oct 18, 2014
njsmith added a commit to njsmith/numpy that referenced this pull request Oct 21, 2014
Previously, operations which created a new masked array from an old
masked array -- e.g., np.empty_like -- would tend to result in the new
and old arrays sharing the same .mask attribute. This leads to
horrible brokenness in which writes to one array affect the other. In
particular this was responsible for part of the brokenness that
@jenshnielsen reported in numpygh-5184 in which np.gradient on masked
arrays would modify the original array's mask.

This fixes the worst part of the issues addressed in numpygh-3404, though
there's still an argument that we ought to deprecate the mask-copying
behaviour entirely so that empty_like returns an array with an empty
mask. That can wait until we find someone who cares though.

I also applied a small speedup to np.gradient (avoiding one copy);
previously this inefficiency was masking (heh) some of the problems
with masked arrays, so removing it is both an optimization and makes
it easier to test that things are working now.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants