Skip to content

Array-MaskedArray inplace ufunc interactions can turn out nonsensical results #9394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
chewxy opened this issue Jul 9, 2017 · 7 comments
Labels
component: numpy.ma masked arrays

Comments

@chewxy
Copy link

chewxy commented Jul 9, 2017

What is seen:

>>> import numpy as np
>>> import numpy.ma as ma
>>> y = ma.array([1, 2, 3], mask = [0, 1, 0])
>>> incr = np.array([100,100,100])
>>> x = np.array([2,4,6])
>>> incr += x + y
>>> incr
array([103, 104, 109])

While this is somewhat logical, this starts to make less sense when you come to something like this:

>>> y = ma.array([1, 2, 3], mask = [0, 1, 0])
>>> incr = np.array([100,100,100])
>>> x = np.array([2,4,6])
>>> incr += x * y
>>> incr
array(102, 104, 118])

Here we see that the equivalent operation is incr[1] += x[1] .

Combined with the above case with addition, we can rapidly come to a conclusion that it's essentially doing this: incr[idx_with_invalid] += x[idx_with_invalid] <OP> identity. I'm not sure if that's the correct thing to do.

The documentation on the website is unclear on this either, stating only:

Masked arrays also support standard numpy ufuncs. The output is then a masked array. The result of a unary ufunc is masked wherever the input is masked. The result of a binary ufunc is masked wherever any of the input is masked. If the ufunc also returns the optional context output (a 3-element tuple containing the name of the ufunc, its arguments and its domain), the context is processed and entries of the output masked array are masked wherever the corresponding input fall outside the validity domain

So, the question is what would the correct interpretation be with these sorts of interactions? If it's supposed to be the identity of the operation then it should be noted in the docs

@eric-wieser
Copy link
Member

Part of the problem here is that you're discarding the mask information by forcing the result into a standard array. Your code is equivalent in result to incr = incr + np.array(x*y).

@chewxy
Copy link
Author

chewxy commented Jul 10, 2017

Fair enough. This should be documented down somewhere tho

@mhvk
Copy link
Contributor

mhvk commented Jul 10, 2017

This is a problem faced by in-place operations more generally, that there exists no __iradd__ and hence the object being added cannot influence how the operation proceeds. This would be solved once we implement MaskedArray using __array_ufunc__ since in that case it can influence the result (more specifically, it can cause it to fail, which is probably appropriate here: the mask should not just be discarded without the user explicitly doing that). Anyway, agreed it would be good to document... PR most welcome...

@eric-wieser
Copy link
Member

eric-wieser commented Jul 10, 2017

This problem came up in scipy sparse as a result of a __array_ufunc__ change, didn't it? What was the resolution there regarding correct implementation of __array_ufunc__?

Edit: that was #9019.

@mhvk
Copy link
Contributor

mhvk commented Jul 10, 2017

Ah, so that was for a change in behaviour of __array_priority__. With __array_ufunc__, the sparse matrix could just return NotImplemented itself and things should work. Anyway, I think we're getting a bit off-topic here, except that I think we could solve the issue in a future MaskedArray that relied on __array_ufunc__.

@eric-wieser
Copy link
Member

eric-wieser commented Jul 10, 2017

But this already did the right thing for scipy.sparse before __array_ufunc__, right? How was that achieved? Is it because it was a duck type?

@mhvk
Copy link
Contributor

mhvk commented Jul 10, 2017

The way __array_priority__ was handled was changed a bit as part of introducing __array_ufunc__ - since it was attempted to sanitize the code allowing overrides more generally - and this is what tripped up scipy.sparse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: numpy.ma masked arrays
Projects
None yet
Development

No branches or pull requests

4 participants