Skip to content

BUG: Masked scalar comparison returns float #4332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
abalkin opened this issue Feb 20, 2014 · 9 comments
Open

BUG: Masked scalar comparison returns float #4332

abalkin opened this issue Feb 20, 2014 · 9 comments

Comments

@abalkin
Copy link
Contributor

abalkin commented Feb 20, 2014

>>> (numpy.ma.array(5.5, mask=True) > 0).dtype
dtype('float64')

Moreover, the result has float dtype even if the masked scalar is an int:

>>> (numpy.ma.array(5, dtype='i', mask=True) < 0).dtype
dtype('float64')

The problem is not present in oldnumeric:

>>> (numpy.oldnumeric.ma.array(5.5, mask=True) < 0).dtype
dtype('bool')
>>> numpy.__version__
'1.8.0'
@abalkin
Copy link
Contributor Author

abalkin commented Feb 20, 2014

Apparently the problem is with this logic

            if result.shape == () and m:
                return masked

in MaskedArray.__array_wrap__. Any scalar result from ufunc gets replaced with the masked singleton which has

>>> numpy.ma.masked.dtype
dtype('float64')

So the issue is not limited to comparison operators:

>>> -numpy.ma.array(5, mask=True) is numpy.ma.masked
True

abalkin added a commit to abalkin/numpy that referenced this issue Feb 20, 2014
Removed logic replacing masked scalar results from ufuncs
with ma.masked singleton which has dtype float64.

Fixes numpy#4332.
@pierregm
Copy link
Contributor

The whole idea is indeed to return a specific value, the masked singleton, however it is defined. That way, one can test whether an item of a ndarray is masked by simply checking item is masked.

The reason why it worked with oldnumeric is that the masked singleton was defined as np.ma.MaskedArray(False, dtype=bool). I'd be more willing to have this changed back than having __array_wrap__ no longer returning the np.ma.masked value. I'm afraid the current PR is such a redical change it will break things down the road.

@abalkin
Copy link
Contributor Author

abalkin commented Feb 22, 2014

The reason why it worked with oldnumeric is that the masked singleton
was defined as np.ma.MaskedArray(False, dtype=bool).

No,

 >>> numpy.oldnumeric.ma.masked.dtype
 dtype('int32')

but

 >>> (numpy.oldnumeric.ma.masked > numpy.oldnumeric.ma.masked).dtype
 dtype('bool')

@abalkin
Copy link
Contributor Author

abalkin commented Feb 22, 2014

Would it be less radical if we preserve the existing behavior when result dtype is float, but return the correct type otherwise?

@pierregm
Copy link
Contributor

@abalkin Can you give an example of what you have in mind?

@abalkin
Copy link
Contributor Author

abalkin commented Feb 23, 2014

@pierregm

--- a/numpy/ma/core.py
+++ b/numpy/ma/core.py
@@ -2841,7 +2841,7 @@ class MaskedArray(ndarray):
                     # Don't modify inplace, we risk back-propagation
                     m = (m | d)
             # Make sure the mask has the proper size
-            if result.shape == () and m:
+            if result.shape == () and m and m.dtype == masked.dtype:
                 return masked
             else:
                 result._mask = m

I am not sure backward compatibility is such a big concern here. I understand that some people like a[i] is ma.masked idiom, but I am yet to see anyone writing a.sum() is not ma.masked.

In the long run, I would really like to see masked singleton use deprecated. As numpy type system getting reacher, it will cause more and more problems.

@charris
Copy link
Member

charris commented Feb 24, 2014

Labelling a bug until the conversation concludes.

@pierregm
Copy link
Contributor

Returning the masked singleton was a design choice since numarray. I understand it can be frustrating, that a better solution needs to be implemented (once a consensus will be reached on missing/masked data...), that testing x is np.ma.masked is a bit weird (on another hand, testing x is None is quite common)... Nevertheless, I would first try to set np.ma.masked=MaskedArray(False,mask=True, dtype=bool), see whether it breaks anything in numpy and matplotlib before using your solution (because there's a good chance that m.dtype will never be masked.dtype in a generic case).
Note #1: OK, I see the comments in L5673/L5764 about precedence, so there must have been a good reason at the time, but I can't remember which one..
Note #2: another possibility with masked scalars is to have a comparison return False all the time (like np.nan does). But that's a radical change too...

@abalkin
Copy link
Contributor Author

abalkin commented Feb 25, 2014

I would first try to set np.ma.masked=MaskedArray(False,mask=True, dtype=bool), see whether it breaks anything

Do you recall the reason for changing ma.masked dtype from int32 to float between oldnumeric.ma and numpy.ma?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants