-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: make some masked array methods behave more like ndarray methods #5706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d11dbf5
to
3cd5514
Compare
Closing and reopening to see if the tests can be rejiggered. |
Yep, that works. Restarting the tests doesn't have the same effect. |
Try again. |
3cd5514
to
dff1dce
Compare
Updated Also I noticed that many ma operations will show a warning if you perform an invalid operation on an element even if that element is masked. Eg,
will raise a warning, which doesn't seem right. (The first of these also fails when using But actually the correct fix may instead be to do the domain check with the masked array, not the raw data. However, in the case of |
Keeping masked arrays interfaces in sync with ndarray has been a loosing battle in the past because most of the people who enhance ndarray either don't know or don't care about masked arrays. The things only got worse when masked array were reimplemented (over objections of original designers) to inherit from ndarray. After that, any new method added to ndarray has automatically became a broken method of masked arrays. Case in point: the dot method. See #5185. I can see only one way to fix this situation for the long term: add a test that would introspect andarray and masked arrays methods and make sure they match. Unfortunately, introspecting ndarray methods are not always possible because some of them are implemented in C, but in those cases we can probably parse the docstring to extract the signature. See #4356. Finally, I think this issue overlaps with #4537. |
if isinstance(out, MaskedArray): | ||
outmask = getattr(out, '_mask', nomask) | ||
if (outmask is nomask): | ||
outmask = out._mask = make_mask_none(out.shape) | ||
outmask.flat = newmask | ||
outmask[()] = newmask.reshape(result.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more idiomatic to use ...
when assigning to the entire array. Empty tuple indexing should only be used in generic code where ndim can happen to be 0. Literal a[()]
is confusing and unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the second thought, what was wrong with the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm you're right using .flat
works. I was worried that a non-1d newmask would cause problems.
I still think outmask[...] =
is better (thanks for explaining ()
vs ...
). I guess I am biased against .flat
after reading some other devs say they didn't like it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always found assigning to .flat
confusing, so as someone who might read your code at some point in the future I'd suggest keeping your change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I only objected to the use of [()]
. Getting rid of .flat
is an improvement, I was just curious if that change was necessary to implement keepdims
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end I don't think I can replace .flat =
with [...] =
.
Using .flat
is currently necessary for interopreability of ma and matrices. See #4585 (comment), #4615.
Eg, if you sum a 3x3 masked-matrix along the 2nd axis, the resulting data shape is (3,1)
, but the mask shape becomes (3,)
(since mask is an ndarray). [...] =
fails when trying to assign a (3,)
shape to a (3,1)
shape, but .flat =
doesn't.
If we really want to get rid of a.flat = b
here, something like this might be a replacement:
a[...] = b.reshape(a.shape) if not np.isscalar(b) else b
but that seems a bit ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaldane - ah, bitten by matrices again... Yes, in that case, I would also just stick with .flat
.
In my opinion, subclassing from |
@@ -28,12 +28,14 @@ | |||
|
|||
import numpy as np | |||
import numpy.core.umath as umath | |||
from numpy.core import multiarray as mu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I'd really like to get rid of most of these imports, plain old np.*
seems to work in most cases.
Might want to look at the corresponding nan functions, I expect there is some commonality. There needs to be tests for all the new functionality, might be able to copy some of those from the nanfunction tests also. Making the masked functions conform is good. We are looking for a masked array maintainer if anyone is interested in pursuing that ;) |
dff1dce
to
9baf7a9
Compare
Updated. Summary so far:
I haven't tested it much yet. By the way, for a separate PR, I noticed the ndarray doctrings are missing keepdim args. Eg in I also notices that the C-Api function |
6814cfa
to
31afad5
Compare
More changes: I got burned due to the So I made |
Test failure is spurious, happens because http://www.rabbitmq.com is down. |
Apart from the two small comments (one of which can be ignored if you wish), to me this seems all ready to go in. Sorry for it having been such a long slog... |
ef527a1
to
451614d
Compare
Seems like this is a bit cleaned up/less compared to before. I will try to look over it again sooner rather then later, so hopefully by the weekend, don't feel like it now, but poke me if I don't (though if someone else beats me to it, would be more than happy), and since @mhvk already reviewed it, I think it should be good anyway. That way the changes can hopefully sit at least a little bit on master to notice oddities. |
@ahaldane - OK, all seems fine now! |
`None`, all non-masked elements are counted. | ||
axis : None or int or tuple of ints, optional | ||
Axis or axes along which the count is performed. | ||
The default (`axis` = `None`) is perform the count sum over all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are single ticks OK here (honestly not sure)? The sentence reads a bit funny. "is to perform" or "performs" sound fine to me, though who knows english has some weird grammar ;).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs as I read them, single-backticks go around variable, module, function, and class names, while double-backticks go around inline code. So I think it's right here. (Edit: and note that this docstring was copied from the np.any
docstring, though slightly mangled)
I'll fix the grammar though.
Just some silly questions/nitpicks. Then I guess I will put it in and hope nothing annoying crops up (with MA I won't try to guess). |
451614d
to
a455a22
Compare
Updated any, all, sum, prod, cumsum, cumprod, min, max, argmin, argmax, mean, var
a455a22
to
798dd4f
Compare
Updated.
|
OK, lets give it a shot. Thanks a lot @ahaldane was a long work.... |
Whew! Good thing I had some tenacious reviewers to help, thanks both of you. |
Follow up to numpy#5706. Fixes numpy#7509
Follow up to numpy#5706. Fixes numpy#7509
I recently learned about masked arrays and was trying them out in a project, and found that the masked array functions
mean
,var
,sum
,amax
and similar do not behave quite the same as their ndarray counterparts.They don't support multiple axes like ndarrays, eg
they don't have a
keepdims
argument, and looking at the code the behavior is a little different.In this PR I modified
np.ma.mean
,np.ma.sum
andnp.ma.count
to behave more like ndarray methods, as proof of concept.Does it look like a good idea, and is the code OK so far? If so I'll do the same thing for the rest of the functions (
var
,sum
etc).Note that I modified a test to get it to pass: I removed the test that the return type of
np.ma.count
was of typenp.intp
. The return type can now be an ndarray (for partial axes) or an int. The return type was discussed in #4698, and I think the problem at that time was thatarr.size
could return variable types, but I removed dependence onarr.size
so I don't think that's a problem any more.