Skip to content

BUG, DEP: Fix masked arrays to properly edit views. ( #5558 ) #5580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 9, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions doc/source/reference/maskedarray.generic.rst
Original file line number Diff line number Diff line change
Expand Up @@ -379,8 +379,8 @@ is masked.
When accessing a slice, the output is a masked array whose
:attr:`~MaskedArray.data` attribute is a view of the original data, and whose
mask is either :attr:`nomask` (if there was no invalid entries in the original
array) or a copy of the corresponding slice of the original mask. The copy is
required to avoid propagation of any modification of the mask to the original.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should note the previous behavior, the new behavior, and the release in which things changed.

array) or a view of the corresponding slice of the original mask. The view is
required to ensure propagation of any modification of the mask to the original.

>>> x = ma.array([1, 2, 3, 4, 5], mask=[0, 1, 0, 0, 1])
>>> mx = x[:3]
Expand Down
25 changes: 1 addition & 24 deletions numpy/ma/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -3259,8 +3259,6 @@ def __setitem__(self, indx, value):
_mask[indx] = tuple([True] * nbfields)
else:
_mask[indx] = True
if not self._isfield:
self._sharedmask = False
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the default here is self._sharedmask = True.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why self._isfield was important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll start by explaining what is going on and then answer your questions if that is alright. Please forgive me if I go into too much detail and explain the obvious. Good questions by the way. Sorry if my answer gets too long.

This particular branch is dealing with the case of a structured array. Suppose for example we have a structured array created like so a = np.zeros((2,), dtype=[("c",int, 2), ("d", float, 3)]). We could construct a masked structured array like so b = np.ma.array(a, mask=np.ma.getmaskarray(a)). In this example, we have two fields c and d. If we index into one say c like so b["c"], we will get a normal masked array.

However, there is no way to know that this masked array came from a structured masked array; so, the _isfield member variable indicates this. One can verify this is the case by trying b._isfield and b["c"]._isfield.

Now, why was _isfield checked? My belief, as I can't speak for the author personally, is the author correctly assumed that if we are looking at a field of a structured array our mask is a view of the mask from the original structured array's mask and we want to ensure changes here are propagated back to the original and so this must remain shared if it already is (this indirectly answers your first question). However, if it is not shared, it will remain the case.

To directly answer your first question, we can't assume anything about self._sharedmask. Its value will be dependent on what .

The argument that I am making here is that this one can't assume this is not a view if it is not a field. In other words, we could always be working with a view of another masked array where we want changes propagated to an original and should not be forcing the user's hand here. Additionally, it would be inconsistent to "unshare" the mask as the data remains shared regardless.


# Get the _data part of the new value
Expand All @@ -3276,27 +3274,6 @@ def __setitem__(self, indx, value):
_mask = self._mask = make_mask_none(self.shape, _dtype)
_mask[indx] = mval
elif not self._hardmask:
# Unshare the mask if necessary to avoid propagation
# We want to remove the unshare logic from this place in the
# future. Note that _sharedmask has lots of false positives.
if not self._isfield:
notthree = getattr(sys, 'getrefcount', False) and (sys.getrefcount(_mask) != 3)
if self._sharedmask and not (
# If no one else holds a reference (we have two
# references (_mask and self._mask) -- add one for
# getrefcount) and the array owns its own data
# copying the mask should do nothing.
(not notthree) and _mask.flags.owndata):
# 2016.01.15 -- v1.11.0
warnings.warn(
"setting an item on a masked array which has a shared "
"mask will not copy the mask and also change the "
"original mask array in the future.\n"
"Check the NumPy 1.11 release notes for more "
"information.",
MaskedArrayFutureWarning, stacklevel=2)
self.unshare_mask()
_mask = self._mask
# Set the data, then the mask
_data[indx] = dval
_mask[indx] = mval
Expand Down Expand Up @@ -4575,7 +4552,7 @@ def put(self, indices, values, mode='raise'):
if self._mask is nomask and getmask(values) is nomask:
return

m = getmaskarray(self).copy()
m = getmaskarray(self)

if getmask(values) is nomask:
m.put(indices, False, mode=mode)
Expand Down
17 changes: 14 additions & 3 deletions numpy/ma/tests/test_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -394,15 +394,26 @@ def test_copy(self):
y1._data.__array_interface__)
self.assertTrue(y1a.mask is y1.mask)

y2 = array(x1, mask=m)
y2 = array(x1, mask=m3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this test? m3 is a copy of m. A note of explanation would help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the point is that we have generated a masked array in two different ways first using y2 = array(x1, mask=m) and then y2a = array(x1, mask=m, copy=1). In short, we are testing whether the keyword argument copy worked as intended. My understanding was this is to make sure the data and the mask used were copied when it was made true, but would remain the same array if not. By default, copy is false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what kind of explanation would you like here? Are we looking for just a comment explaining the two use cases? Should I break them into two separate tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

self.assertTrue(y2._data.__array_interface__ == x1.__array_interface__)
self.assertTrue(y2._mask.__array_interface__ == m.__array_interface__)
self.assertTrue(y2._mask.__array_interface__ == m3.__array_interface__)
self.assertTrue(y2[2] is masked)
y2[2] = 9
self.assertTrue(y2[2] is not masked)
self.assertTrue(y2._mask.__array_interface__ != m.__array_interface__)
self.assertTrue(y2._mask.__array_interface__ == m3.__array_interface__)
self.assertTrue(allequal(y2.mask, 0))

y2a = array(x1, mask=m, copy=1)
self.assertTrue(y2a._data.__array_interface__ != x1.__array_interface__)
#self.assertTrue( y2a.mask is not m)
self.assertTrue(y2a._mask.__array_interface__ != m.__array_interface__)
self.assertTrue(y2a[2] is masked)
y2a[2] = 9
self.assertTrue(y2a[2] is not masked)
#self.assertTrue( y2a.mask is not m)
self.assertTrue(y2a._mask.__array_interface__ != m.__array_interface__)
self.assertTrue(allequal(y2a.mask, 0))

y3 = array(x1 * 1.0, mask=m)
self.assertTrue(filled(y3).dtype is (x1 * 1.0).dtype)

Expand Down
156 changes: 90 additions & 66 deletions numpy/ma/tests/test_old_ma.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,7 @@
import numpy as np
import numpy.core.umath as umath
import numpy.core.fromnumeric as fromnumeric
from numpy.testing import (
TestCase, run_module_suite, assert_, suppress_warnings)
from numpy.testing import TestCase, run_module_suite, assert_
from numpy.ma.testutils import assert_array_equal
from numpy.ma import (
MaskType, MaskedArray, absolute, add, all, allclose, allequal, alltrue,
Expand Down Expand Up @@ -258,73 +257,98 @@ def test_testCI(self):

def test_testCopySize(self):
# Tests of some subtle points of copying and sizing.
with suppress_warnings() as sup:
sup.filter(
np.ma.core.MaskedArrayFutureWarning,
"setting an item on a masked array which has a "
"shared mask will not copy")

n = [0, 0, 1, 0, 0]
m = make_mask(n)
m2 = make_mask(m)
self.assertTrue(m is m2)
m3 = make_mask(m, copy=1)
self.assertTrue(m is not m3)

x1 = np.arange(5)
y1 = array(x1, mask=m)
self.assertTrue(y1._data is not x1)
self.assertTrue(allequal(x1, y1._data))
self.assertTrue(y1.mask is m)

y1a = array(y1, copy=0)
self.assertTrue(y1a.mask is y1.mask)

y2 = array(x1, mask=m, copy=0)
self.assertTrue(y2.mask is m)
self.assertTrue(y2[2] is masked)
y2[2] = 9
self.assertTrue(y2[2] is not masked)
self.assertTrue(y2.mask is not m)
self.assertTrue(allequal(y2.mask, 0))

y3 = array(x1 * 1.0, mask=m)
self.assertTrue(filled(y3).dtype is (x1 * 1.0).dtype)

x4 = arange(4)
x4[2] = masked
y4 = resize(x4, (8,))
self.assertTrue(eq(concatenate([x4, x4]), y4))
self.assertTrue(eq(getmask(y4), [0, 0, 1, 0, 0, 0, 1, 0]))
y5 = repeat(x4, (2, 2, 2, 2), axis=0)
self.assertTrue(eq(y5, [0, 0, 1, 1, 2, 2, 3, 3]))
y6 = repeat(x4, 2, axis=0)
self.assertTrue(eq(y5, y6))
n = [0, 0, 1, 0, 0]
m = make_mask(n)
m2 = make_mask(m)
self.assertTrue(m is m2)
m3 = make_mask(m, copy=1)
self.assertTrue(m is not m3)

x1 = np.arange(5)
y1 = array(x1, mask=m)
self.assertTrue(y1._data is not x1)
self.assertTrue(allequal(x1, y1._data))
self.assertTrue(y1.mask is m)

y1a = array(y1, copy=0)
self.assertTrue(y1a.mask is y1.mask)

y2 = array(x1, mask=m3, copy=0)
self.assertTrue(y2.mask is m3)
self.assertTrue(y2[2] is masked)
y2[2] = 9
self.assertTrue(y2[2] is not masked)
self.assertTrue(y2.mask is m3)
self.assertTrue(allequal(y2.mask, 0))

y2a = array(x1, mask=m, copy=1)
self.assertTrue(y2a.mask is not m)
self.assertTrue(y2a[2] is masked)
y2a[2] = 9
self.assertTrue(y2a[2] is not masked)
self.assertTrue(y2a.mask is not m)
self.assertTrue(allequal(y2a.mask, 0))

y3 = array(x1 * 1.0, mask=m)
self.assertTrue(filled(y3).dtype is (x1 * 1.0).dtype)

x4 = arange(4)
x4[2] = masked
y4 = resize(x4, (8,))
self.assertTrue(eq(concatenate([x4, x4]), y4))
self.assertTrue(eq(getmask(y4), [0, 0, 1, 0, 0, 0, 1, 0]))
y5 = repeat(x4, (2, 2, 2, 2), axis=0)
self.assertTrue(eq(y5, [0, 0, 1, 1, 2, 2, 3, 3]))
y6 = repeat(x4, 2, axis=0)
self.assertTrue(eq(y5, y6))

def test_testPut(self):
# Test of put
with suppress_warnings() as sup:
sup.filter(
np.ma.core.MaskedArrayFutureWarning,
"setting an item on a masked array which has a "
"shared mask will not copy")
d = arange(5)
n = [0, 0, 0, 1, 1]
m = make_mask(n)
x = array(d, mask=m)
self.assertTrue(x[3] is masked)
self.assertTrue(x[4] is masked)
x[[1, 4]] = [10, 40]
self.assertTrue(x.mask is not m)
self.assertTrue(x[3] is masked)
self.assertTrue(x[4] is not masked)
self.assertTrue(eq(x, [0, 10, 2, -1, 40]))

x = array(d, mask=m)
x.put([0, 1, 2], [-1, 100, 200])
self.assertTrue(eq(x, [-1, 100, 200, 0, 0]))
self.assertTrue(x[3] is masked)
self.assertTrue(x[4] is masked)
d = arange(5)
n = [0, 0, 0, 1, 1]
m = make_mask(n)
m2 = m.copy()
x = array(d, mask=m)
self.assertTrue(x[3] is masked)
self.assertTrue(x[4] is masked)
x[[1, 4]] = [10, 40]
self.assertTrue(x.mask is m)
self.assertTrue(x[3] is masked)
self.assertTrue(x[4] is not masked)
self.assertTrue(eq(x, [0, 10, 2, -1, 40]))

x = array(d, mask=m2, copy=True)
x.put([0, 1, 2], [-1, 100, 200])
self.assertTrue(x.mask is not m2)
self.assertTrue(x[3] is masked)
self.assertTrue(x[4] is masked)
self.assertTrue(eq(x, [-1, 100, 200, 0, 0]))

def test_testPut2(self):
# Test of put
d = arange(5)
x = array(d, mask=[0, 0, 0, 0, 0])
z = array([10, 40], mask=[1, 0])
self.assertTrue(x[2] is not masked)
self.assertTrue(x[3] is not masked)
x[2:4] = z
self.assertTrue(x[2] is masked)
self.assertTrue(x[3] is not masked)
self.assertTrue(eq(x, [0, 1, 10, 40, 4]))

d = arange(5)
x = array(d, mask=[0, 0, 0, 0, 0])
y = x[2:4]
z = array([10, 40], mask=[1, 0])
self.assertTrue(x[2] is not masked)
self.assertTrue(x[3] is not masked)
y[:] = z
self.assertTrue(y[0] is masked)
self.assertTrue(y[1] is not masked)
self.assertTrue(eq(y, [10, 40]))
self.assertTrue(x[2] is masked)
self.assertTrue(x[3] is not masked)
self.assertTrue(eq(x, [0, 1, 10, 40, 4]))

def test_testMaPut(self):
(x, y, a10, m1, m2, xm, ym, z, zm, xf, s) = self.d
Expand Down