Skip to content

BUG: Fix MaskedArray.__setitem__ #8594

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 20, 2017

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Feb 9, 2017

Fixes #8510.

The root cause here is that np.ma.getdata does a conversion to np.ndarray that we don't want to happen.

This conversion in general is a bad idea, because when delegating to base numpy functions, it doesn't allow that function to do the correct conversion with extra information.

numpy/ma/core.py Outdated
return data

# object dtype should not be converted to an array
if np.dtype(dtype) == np.object_:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could use the hasobject property here, and avoid needlessley creating one:

if dtype is not None and np.dtype(dtype).hasobject:
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, for speed this probably should be:

if dtype is not None:
    dtype = np.dtype(dtype)
    if dtype.hasobject:
        return a

return np.array(a, copy=False, subok=subok, dtype=dtype)

This avoids possibly converting a string to dtype twice.

Copy link
Member Author

@eric-wieser eric-wieser Feb 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely do not want hasobject - that would catch structured dtypes, whereas I specifically want just object types

numpy/ma/core.py Outdated
try:
data = a._data
except AttributeError:
data = np.array(a, copy=False, subok=subok)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason not to keep the logic here? That avoids having to add the else clause.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the logic only made sense in this particular case. This separates the three cases of how the dtype parameter should be used.

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

@eric-wieser - apart from the very minor comments, why the check for the object dtype at all? It seems this is not required to make the test case work (I do agree with the sentiment, but this is not really a function that will be used outside of np.ma -- indeed, one could also insist any dtype passed in is already a dtype instance).

Separately: nice to actually use getmask consistently!

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 9, 2017

It seems this is not required to make the test case work

Well, it is! Otherwise test_set_element_as_object fails.

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

Ah, I see, it was an existing test that starts breaking with your change; should have considered that possibility. But I fear this will fail the general case. E.g., what happens if one is trying to do ma['object'] = <object> if ma is a masked single-element record array and object is a record with objects?

Might it be an idea to move this object-checking piece of the logic out of getdata and in to __setitem__ (where one has self-awareness and can thus choose not to go through getdata at all if one is trying to set a single object).

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 9, 2017

But I fear this will fail the general case.

Note that it can't fail any existing case, since the only place right now that sets the dtype parameter of getdata is in __setitem__. Without that argument, the behaviour is as (broken?) as before.

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

I meant the "general case of __setitem__".

Indeed, that you only use the dtype in __setitem__ is why I suggested to keep the "does self hold objects" part of the logic there. As is, getdata always returns an array, and I think it makes a lot of sense to allow one to tell what dtype the result should have, but it makes less sense to allow it not to return an array any more.

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

Looking more at test_set_element_as_object, I realize some parts of numpy I don't like. At least, I find the following bizarre:

a = np.ma.empty(5, dtype=object)
a[0] = x
a
# masked_array(data = [(1, 2, 3, 4, 5) None None None None],
#              mask = False,
#        fill_value = ?)
a[:] = x
a
# masked_array(data = [1 2 3 4 5],
#              mask = False,
#        fill_value = ?)
a[0:1] = x
# ValueError: cannot copy sequence with size 5 to array axis with dimension 1

Anyway, it does seem we're stuck with this.

@eric-wieser
Copy link
Member Author

I think that's consistent with how non-ma arrays behave

@mhvk
Copy link
Contributor

mhvk commented Feb 9, 2017

p.s. Is there anything against just using dval = getattr(value, '_data', value)? This is in closer analogy to what getmask does, and may solve everything too? At least, I just tried on your branch and there are no errors...

@eric-wieser
Copy link
Member Author

@mhvk: Very tempting. I'd argue that getdata should become that, not just the use in setitem. Or at the very least, everything in ma.core should maybe use that.

@eric-wieser
Copy link
Member Author

@mhvk: Actually, that makes much more sense as an approach, as really we want to forward array-conversion behaviour on to the base ndarray functions, rather than try and emulate the same behaviour ourself in np.ma.

I'm not sure if it's acceptable to change the public behaviour of getdata in that way though, so perhaps the entire of ma would use _getdata(), and getdata can be deprecated?

@eric-wieser eric-wieser force-pushed the MaskedArray.__setitem__ branch from 315886f to 6aa9541 Compare February 10, 2017 15:03
@eric-wieser
Copy link
Member Author

@mhvk: Really good spot there - this probably caused other bugs in subtle places as well. Updated to use getattr(value, '_data', value) plus some backwards-compatible boilerplate

@eric-wieser eric-wieser force-pushed the MaskedArray.__setitem__ branch from 6aa9541 to c08566f Compare February 10, 2017 15:57
@mhvk
Copy link
Contributor

mhvk commented Feb 10, 2017

@eric-wieser - I tried to do something quite similar in an earlier PR, and at least at the time that was felt to be too risky. My suggestion, therefore, would be to split this PR in an uncontroversial one that fixes the bug (i.e., use getattr(value, '_data', value) in __setitem__) and a maintenance PR that updates getdata (the latter should probably also have the generic use of getmask since it is not necessary for the bug fix).

But perhaps best to ask a real maintainer...

@eric-wieser
Copy link
Member Author

split this PR in an uncontroversial one that fixes the bug (i.e., use getattr(value, '_data', value) in __setitem__

I suspect there are a whole class of similar bugs, that mean this change should be made through np.ma.core. But perhaps exposing this change publically is a bad idea

@eric-wieser eric-wieser force-pushed the MaskedArray.__setitem__ branch from c08566f to b4f0210 Compare February 10, 2017 16:27
@eric-wieser eric-wieser changed the title BUG: Fix MaskedArray.__setitem__ BUG: Fix various bits of MaskedArray Feb 10, 2017
@eric-wieser eric-wieser force-pushed the MaskedArray.__setitem__ branch 2 times, most recently from f604105 to 31e7726 Compare February 20, 2017 15:25
@eric-wieser eric-wieser changed the title BUG: Fix various bits of MaskedArray BUG: Fix MaskedArray.__setitem__, and change np.ma.getdata Feb 20, 2017
@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2017

@eric-wieser - looking back at this (now with a maintainer hat on): I do very much like the change to getdata in (as I think it is much more logical, and means there is at least a hope of a MaskedQuantity class, one of my longstanding goals...), but it also still seems to me that that part really is not for a bug fix.

So, I think this is best done in two steps, the bug fix just using getattr(value, '_data', value), and then a MAINT PR that changes getdata. Is that OK with you?

@eric-wieser
Copy link
Member Author

@mhvk: I've been trying to conjure up other bugs that arise from the current implementation, but haven't found any yet. If I don't succeed in doing so, then yes, I guess I'll split the PR

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 20, 2017

Also, I moved the np.ma.where stuff to #8647, which it seems the change to getdata requires be merged first

@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2017

OK; if it does stay together, then I guess it should be MAINT for 1.13 -- one cannot really introduce a deprecation warning in a bug-fix release...

@eric-wieser
Copy link
Member Author

Oh, I see what you mean now - I hadn't realized targeting 12.1 was on the cards. How would you feel about me introducing _get_data, and using that everywhere in ma/core.py? Then the MAINT commit can just rename that to get_data

@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2017

My sense remains that we should just fix the __setitem__ bug for the bug fix, not make further changes...

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 20, 2017

And fix it for 1.12 too then?

In that case, I should rebase on the branch point for 1.12. Where is that?

Update: 1718ee8?

@eric-wieser
Copy link
Member Author

eric-wieser commented Feb 20, 2017

@mhvk: Ok, I've stripped it down to just that change, and rebased on the branch point.

@mhvk mhvk changed the title BUG: Fix MaskedArray.__setitem__, and change np.ma.getdata BUG: Fix MaskedArray.__setitem__ Feb 20, 2017
@mhvk mhvk added this to the 1.12.1 release milestone Feb 20, 2017
@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2017

OK, that looks good. I'll merge assuming the tests pass (don't see how they could not, but just to be sure).

@eric-wieser
Copy link
Member Author

Arguably #8648 is one that should have the 12.1 release milestone

@eric-wieser
Copy link
Member Author

@mhvk: tests pass :)

@mhvk
Copy link
Contributor

mhvk commented Feb 20, 2017

Hmm, yes, it is my astropy background again, where we set the milestone to the lowest relevant release and do backporting after the fact. But I think you've done the right thing here, so will merge both...

@mhvk mhvk merged commit b8769a2 into numpy:master Feb 20, 2017
@eric-wieser eric-wieser deleted the MaskedArray.__setitem__ branch February 20, 2017 20:44
@charris
Copy link
Member

charris commented Feb 20, 2017

@mhvk Our backport policy is still somewhat ad hoc, exspecially as I the only one who has been doing release. What I currently do is set the milestone to the earlier version, so I will find the PR when looking for backports, then do a backport, label it as such, set the milestone on the backported version, and remove the milestone from the original. I'm not completely happy with the process, so if you have better ideas I'd like to hear them. One option I've considered is a backported label in addition to the backport label, then use the latter for things to be backported.

@eric-wieser
Copy link
Member Author

Thanks @mhvk. This now unblocks #8511 :)

@mhvk
Copy link
Contributor

mhvk commented Feb 21, 2017

@charris - OK, so clearly we should continue to set the milestone to a bug-release version; that is no effort to anyone and keeps things clear. For the rest, it would be nice if things could be more automated. E.g., might it be possible to have some travis magic that does a trial merge & test? But maybe we better discuss this on the mailing list? ... which I will do now.

@charris charris changed the title BUG: Fix MaskedArray.__setitem__ BUG: Fix MaskedArray.__setitem__ May 9, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants