Skip to content

np.tile modifies the mask of the inputted masked array #3140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tjanez opened this issue Mar 12, 2013 · 11 comments
Closed

np.tile modifies the mask of the inputted masked array #3140

tjanez opened this issue Mar 12, 2013 · 11 comments

Comments

@tjanez
Copy link

tjanez commented Mar 12, 2013

Here is an example of the problem:

>>> import numpy as np
>>> import numpy.ma as ma
>>> A = ma.masked_array(np.array([1., 2.]), mask=np.array([False, False]))
>>> A
masked_array(data = [1.0 2.0],
             mask = [False False],
       fill_value = 1e+20)

>>> B = np.tile(A, (2, 1))
>>> A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/numpy/ma/core.py", line 3553, in __repr__
    data=str(self), mask=str(self._mask),
  File "/usr/lib64/python2.7/site-packages/numpy/ma/core.py", line 3537, in __str__
    res[m] = f
ValueError: boolean index array should have 1 dimension
>>> A._data
array([ 1.,  2.])
>>> A._mask
array([[False, False]], dtype=bool)

The issue is that the call to np.tile changed A._mask from array([False, False], dtype=bool) to array([[False, False]], dtype=bool), which made the __repr___() function return the error.

Digging deeper, I found that the issue is with the call _nx.array(A,copy=False,subok=True,ndmin=d) in the tile function.
Before this call, A has the correct _mask and after this call, A's _mask changes.

Does anyone have any ideas how to fix this?

This is on Fedora-rawhide with:
python-2.7.3-34.fc19.x86_64
numpy-1.7.0-1.fc19.x86_64

@seberg
Copy link
Member

seberg commented Mar 12, 2013

Yes... I think there is an older bug report about this... The basic problem is that somewhere the masks shape is assigned using mask.shape = new_shape (along those lines, in array_finalize or so), which fails to create a new view, and thus can change the mask of the earlier one as well. A long time ago I had a short look at this, not sure if it is quite right, but maybe you can have a look, do necessary cleanups and make a pull request. Here are the changes from back then:

https://github.com/seberg/numpy/compare/safemask

I did change the .shape attribute completely, I am not 100% sure as to why all those changes though.

@seberg
Copy link
Member

seberg commented Mar 12, 2013

So the more thorough fix would probably include changing the ndarray.shape attribute setting logic (possible fixing #145 while at it. I am not certain as to the discussion, but since I messed with it recently, attempt_nocopy_reshape can easily be modified to cover all cases (if someone works on that, I have had done it, but it would be unused right now so it is not in numpy), so if it is exposed to the getset.c it could be just called directly then.

But just saying if someone feels like looking at into finding a good solution. I don't think its difficult overall, and I am happy to give pointers.

@tjanez
Copy link
Author

tjanez commented Mar 14, 2013

@seberg, thanks for a quick and thorough reply.

I'm a bit confused right now. Do you think working on https://github.com/seberg/numpy/compare/safemask would be a good way to fix this or do you think this issue should be tackled by fixing #145? The latter is harder to understand for me and I don't know why it hasn't been merged yet?

Also, I haven't contributed to Numpy yet, so I'm not very familiar with the workflow (though I've read the development workflow). Is this easy enough for a first contribution?

@seberg
Copy link
Member

seberg commented Mar 14, 2013

Don't know, but I had a bit longer look, and I think fixing the latter (gh-145) has some tedious details, and maybe the approach in that PR is best. I just though that by changing ndarray.shape attribute setting, you may not need to give the masked array its own shape attribute. While I do not remember why I did that, there was probably some reason...

So my guess is, the basic approach in that safemask branch is probably good, but needs at the very least some tests.

Contributing is very easy, all you need to do is create your own branch in a github fork, push to it and then github will already show you a button to create a pull request. And once you have the pull request open, anyone can easily comment on it to smooth out the details.

@honnorat
Copy link

The bug is still present identically in recent numpy versions.

>>> np.__version__
'1.14.3'
>>> A.mask.shape
(1, 2)
>>> A.shape
(2,)
>>> A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/honnorat/usr/conda/lib/python3.6/site-packages/numpy/ma/core.py", line 3942, in __repr__
    self._insert_masked_print(),
  File "/Users/honnorat/usr/conda/lib/python3.6/site-packages/numpy/ma/core.py", line 3866, in _insert_masked_print
    _recursive_printoption(res, mask, masked_print_option)
  File "/Users/honnorat/usr/conda/lib/python3.6/site-packages/numpy/ma/core.py", line 2439, in _recursive_printoption
    np.copyto(result, printopt, where=mask)
ValueError: could not broadcast where mask from shape (1,2) into shape (2)

@eric-wieser
Copy link
Member

Thanks for reviving this - I think you just found the root cause of #10270. Pinging @mhvk, who has a PR somewhere that is a smaller alternative to #10308, that was just waiting on a test.

Here's that test:

np.tile(np.ma.masked, (2, 1))
assert_equal(np.ma.masked.mask.ndim, 0)  # fails

@mhvk
Copy link
Contributor

mhvk commented May 16, 2018

@honnorat - thanks, indeed. My PR 1308 #10314 indeed solves this problem, and now uses @tjanez's example on top in a test that should ensure the problem will not recur.

@eric-wieser
Copy link
Member

@mhvk: Wrong PR number?

@mhvk
Copy link
Contributor

mhvk commented May 29, 2018

Thanks, corrected to #10314

@cmarmo
Copy link
Contributor

cmarmo commented Oct 13, 2022

Hello, I have run the snippet of code in the description and I am under the impression that this issue can be closed?

Python 3.10.7 (main, Sep  7 2022, 00:00:00) [GCC 12.2.1 20220819 (Red Hat 12.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.24.0.dev0+934.gdb7414b7f'
>>> A = ma.masked_array(np.array([1., 2.]), mask=np.array([False, False]))
>>> A
masked_array(data=[1.0, 2.0],
             mask=[False, False],
       fill_value=1e+20)
>>> B = np.tile(A, (2, 1))
>>> A
masked_array(data=[1.0, 2.0],
             mask=[False, False],
       fill_value=1e+20)
>>> A._data
array([1., 2.])
>>> A._mask
array([False, False])

Thanks for listening.

@seberg
Copy link
Member

seberg commented Oct 13, 2022

Thanks for the note @cmarmo. Agree, seems this has been fixed years ago! (in gh-10314)

@seberg seberg closed this as completed Oct 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants