-
-
Notifications
You must be signed in to change notification settings - Fork 11.3k
BUG: allow MaskedArray min/max with dtype=object #29576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tests the case of arrays of 0-D arrays, and check for identity rather than equality.
@@ -5942,8 +5942,9 @@ def min(self, axis=None, out=None, fill_value=None, keepdims=np._NoValue): | |||
# No explicit output | |||
if out is None: | |||
result = self.filled(fill_value).min( | |||
axis=axis, out=out, **kwargs).view(type(self)) | |||
if result.ndim: | |||
axis=axis, out=out, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused here I think. When I write a new regression test that more closely matches the original reproducer, like below, it errors out here.
--- a/numpy/ma/tests/test_core.py
+++ b/numpy/ma/tests/test_core.py
@@ -1461,6 +1461,11 @@ def test_minmax_object_dtype(self, arg1, arg2, arg3):
assert a.min() is arg1
assert a.max() is arg3
+
+ def test_gh_9103(self):
+ arr = masked_array([1, 2, 3], dtype='object', mask=[True, False, False])
+ assert arr.min() == 2
Traceback
_____________________________________________________________________________ TestMaskedArrayArithmetic.test_gh_9103 ______________________________________________________________________________
self = <numpy.ma.tests.test_core.TestMaskedArrayArithmetic object at 0x1033739b0>
def test_gh_9103(self):
arr = masked_array([1, 2, 3], dtype='object', mask=[True, False, False])
> assert arr.min() == 2
arr = masked_array(data=[--, 2, 3],
mask=[ True, False, False],
fill_value=np.str_('?'),
dtype=object)
self = <numpy.ma.tests.test_core.TestMaskedArrayArithmetic object at 0x1033739b0>
numpy/ma/tests/test_core.py:1467:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
numpy/ma/core.py:5944: in min
result = self.filled(fill_value).min(
_mask = array([ True, False, False])
axis = None
fill_value = None
keepdims = <no value>
kwargs = {}
newmask = np.False_
out = None
self = masked_array(data=[--, 2, 3],
mask=[ True, False, False],
fill_value=np.str_('?'),
dtype=object)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
a = array(['?', 2, 3], dtype=object), axis = None, out = None, keepdims = False, initial = <no value>, where = True
def _amin(a, axis=None, out=None, keepdims=False,
initial=_NoValue, where=True):
> return umr_minimum(a, axis, None, out, keepdims, initial, where)
E TypeError: '<=' not supported between instances of 'str' and 'int'
a = array(['?', 2, 3], dtype=object)
axis = None
initial = <no value>
keepdims = False
out = None
where = True
numpy/_core/_methods.py:45: TypeError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great catch, I totally neglected to even test the masked array with an actual mask! I would say that is a separate bug from the one I'm trying to fix though. Actually it seems like it may be a regression that has occurred in the 8 years since #9103 was reported. Even without my change, in current versions of numpy, the #9103 reproducer now gives that TypeError rather than the AttributeError. Of course you can work around this by setting an appropriate fill_value
other than '?' (and arguably you should have a fill_value
that matches the actual data you're storing in the array). But I'm not sure why it worked at all in 2017. I'll look into it a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... even with the contemporary numpy 1.13.0 from 2017, I'm seeing the TypeError
rather than the AttributeError
on the #9103 reproducer. I only get the AttributeError
if I remove the mask or set a fill_value
. I'd test older versions, but I'm running into some segfaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to think if there is any "good" way to fix the TypeError
issue, besides the obvious way of creating a universal infinity object that compares greater to any other object regardless of type1, and putting that in min_filler
, but really I don't think there is (short of completely reimplementing min). That said, it should be a very easy fix to create such an object, so I'd be willing to add that to the PR if you think it is an acceptable solution.
Footnotes
-
...as long as those other objects aren't also playing the same game of overriding comparison operators for all other types (eg numpy array broadcasting) -- which is part of why I don't consider it a "good" solution even though it should work for most cases. ↩
I haven't thought too much more about this, but if the discussion above is suggesting that this PR doesn't fix the reproducer in the original ticket, it might be sensible to at least not have this PR set to automatically close/address the ticket, etc. Then the scope might be clearer/a bit less confusing--it may very well be just fine to address the separate issue here without closing that ticket. |
That's reasonable. I changed it to no longer close #9103. |
axis=axis, out=out, **kwargs).view(type(self)) | ||
if result.ndim: | ||
axis=axis, out=out, **kwargs) | ||
if isinstance(result, ndarray) and result.ndim: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an (extreme) edge case I am aware of here that my code fails to address: ideally, I think that
x = np.array([1, 2])
assert np.fromiter([x], dtype=object).view(np.ma.MaskedArray).min() is x
should pass, but with my change, x will get erroneously converted to a masked array when returned by min
.
This is identical to the old behavior, so it's not necessarily a problem, but it is a failure of my assumptions: initially when working on this PR, I thought I only needed to worry about arrays containing 0-D array objects, since arrays of higher dimension are not orderable (and so you'd just get an error when calling min()
, as you should). But of course I forgot to consider that min()
will succeed regardless if there is only one item. Also, as another similar edge case, higher dimensional arrays actually are orderable if they themselves only contain one item (not just if they are 0-D).
I'm not sure if this is actually worth fixing though. Thoughts?
Addresses the
AttributeError
reported in #9103 by checking if the result of min/max is anndarray
before calling.view()
.In the process of fixing this bug, I also found several other related issues by searching for uses of
.view()
. I was initially planning to fix those in this PR, but I think some of them may require a bit more discussion. I plan to open a new issue about that later.