Indexing multidimensional structured masked array fails to translate fill_value correctly, leading to broadcasting errors ValueError when calling .filled() #6723

Closed
gerritholl opened this issue Nov 25, 2015 · 6 comments

Comments

@gerritholl
Contributor

When using a structured masked array where one of the structure elements has a multidimensional dtype, indexing does not correctly set the fill_value attribute for the new masked array. This leads to a fill_value with a shape that is incompatible with .data or .mask, which inevitably leads to broadcasting problems when fill_value is used, such as when calling filled():

In [332]: A = ma.masked_array(data=[([0,1,2],), ([3,4,5],)], mask=[([True, False, False],), ([False, True, False],)], dtype=[("A", ">i2", (3,))])

In [339]: A["A"][:, 0].filled()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-339-ddde509d73bf> in <module>()
----> 1 A["A"][:, 0].filled()

/home/users/gholl/venv/stable-3.5/lib/python3.5/site-packages/numpy/ma/core.py in filled(self, fill_value)
   3573             result = self._data.copy('K')
   3574             try:
-> 3575                 np.copyto(result, fill_value, where=m)
   3576             except (TypeError, AttributeError):
   3577                 fill_value = narray(fill_value, dtype=object)

ValueError: could not broadcast input array from shape (3) into shape (2)

In [341]: A.shape, A.fill_value.shape, A["A"].shape, A["A"].fill_value.shape, A["A"][:, 0].shape, A["A"][:, 0].fill_value.shape
Out[341]: ((2,), (), (2, 3), (3,), (2,), (3,))

Indeed, the fill value itself does not change when the masked array is indexed; only its dtype does:

In [351]: A.fill_value, A["A"].fill_value, A["A"][:, 0].fill_value
Out[351]: 
(([16959, 16959, 16959],),
 array([16959, 16959, 16959], dtype=int16),
 array([16959, 16959, 16959], dtype=int16))
@gerritholl
Contributor Author

Would it be sensible to require that a fill_value should always have .ndim==0?

@gerritholl
Contributor Author

This also propagates to a more commonly used function, pcolor: I found this bug through matplotlib's pcolor function/method. I am currently working around it by explicitly setting X._fill_value = X.fill_value.flat[0], so that pcolor works.
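The workaround above writes to a private attribute. A variant that leaves the array untouched is to pass a scalar to filled() explicitly. The following is only a sketch: `filled_with_scalar` is a hypothetical helper name, and the field here is float64 rather than the >i2 of the original report, purely for illustration.

```python
import numpy as np

def filled_with_scalar(marr):
    """Fill masked entries using only the first element of marr.fill_value.

    Hypothetical helper, not part of NumPy.
    """
    # np.ravel copes with a scalar, 0-d, or n-d fill_value alike.
    scalar = np.ravel(marr.fill_value)[0]
    return marr.filled(scalar)

# Structured masked array with one multidimensional field, built from
# explicit structured ndarrays for the data and the mask.
dt = np.dtype([("A", "f8", (3,))])
data = np.zeros(2, dtype=dt)
data["A"] = [[0, 1, 2], [3, 4, 5]]
mask = np.zeros(2, dtype=[("A", bool, (3,))])
mask["A"] = [[True, False, False], [False, True, False]]
A = np.ma.masked_array(data, mask=mask)

X = A["A"][:, 0]                # shape (2,), first entry masked
result = filled_with_scalar(X)  # plain ndarray, masked entry replaced
print(result)
```

Because the fill value handed to filled() is always a scalar, it broadcasts against X regardless of the shape of X.fill_value.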

@gerritholl
Contributor Author

@charris The dimensionality of the fill_value goes up upon indexing because we access a multidimensional field, A["A"] in the above example. A.dtype == A.fill_value.dtype == dtype([('A', '>i2', (3,))]), A.shape == (2,), A.fill_value.shape == (). However, A["A"].dtype == dtype('>i2') and A["A"].shape == (2, 3). So any time we index a field that is multidimensional, A[field] will have higher dimensionality than A. In this example, A.ndim == 1 and A["A"].ndim == 2, A.fill_value.ndim == 0, and finally A["A"].fill_value.ndim == 1. It's this last dimensionality that causes problems.

@gerritholl
Contributor Author

More compactly put: M[field].ndim == M.ndim + len(M.dtype[field].shape)

(Seems dtypes have a .shape, but not an .ndim...)
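The relation above can be checked directly. This is a small sketch; the dtype with fields "A", "B" and "C" is made up for illustration, and len(dtype.shape) is used for the field dimensionality, as in the formula.

```python
import numpy as np

# Hypothetical dtype: a 1-d field, a 2-d field, and a scalar field.
dt = np.dtype([("A", ">i2", (3,)), ("B", "f8", (2, 4)), ("C", "i4")])
M = np.ma.masked_array(np.zeros(5, dtype=dt))

# For each field, compare the observed ndim with the predicted one.
ndims = {}
for field in M.dtype.names:
    field_ndim = len(M.dtype[field].shape)
    ndims[field] = (M[field].ndim, M.ndim + field_ndim)
    print(field, M[field].shape, ndims[field])
```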

gerritholl added a commit to gerritholl/numpy that referenced this issue Dec 7, 2015
Fix issue numpy#6723.  Given an exotic masked structured array, where one of
the fields has a multidimensional dtype, make sure that, when accessing
this field, the fill_value still makes sense.  As it stands prior to this
commit, the fill_value will end up being multidimensional, possibly with
a shape incompatible with the mother array, which leads to broadcasting
errors in methods such as .filled().  This commit uses the first element
of this multidimensional fill value as the new fill value.  When more
than one unique value existed in fill_value, a warning is issued.

Also add a test to verify that fill_value.ndim remains 0 after indexing.
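
The behaviour described in this commit message can be sketched as a standalone function. This is not the code from the commit: `collapse_fill_value` and the warning text are hypothetical, chosen only to illustrate "take the first element, warn if the values differ".

```python
import warnings
import numpy as np

def collapse_fill_value(fill_value):
    """Reduce a multidimensional fill value to a 0-d one (illustrative only)."""
    fv = np.asarray(fill_value)
    if fv.ndim == 0:
        return fv  # already scalar-like, nothing to do
    first = fv.flat[0]
    # Warn when collapsing would discard distinct values.
    if not (fv == first).all():
        warnings.warn(
            "fill_value has more than one unique value; using the first",
            stacklevel=2,
        )
    return np.array(first, dtype=fv.dtype)

fv = np.array([16959, 16959, 16959], dtype=">i2")
scalar_fv = collapse_fill_value(fv)
print(scalar_fv.ndim, scalar_fv)
```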
ahaldane added a commit that referenced this issue Jan 13, 2016
…rray_fillvalue

BUG/TST: Fix for #6723 including test: force fill_value.ndim==0
jaimefrio pushed a commit to jaimefrio/numpy that referenced this issue Mar 22, 2016
@eric-wieser
Member

eric-wieser commented Feb 28, 2017

Since the above rebases are confusing, the PR for this was #6728

@mhvk
Contributor

mhvk commented May 4, 2017

This has been fixed (I assume by #6728), so closing.
