BUG: Fix NEP50 weak-type scalar logic for binary operations on masked arrays #28567

rhugonnet · 2025-03-21T16:52:10Z

This PR adds NEP50 logic (https://numpy.org/neps/nep-0050-scalar-promotion.html) to masked arrays, which was overlooked in all binary operations.

For in-place operations specifically, this resulted in errors, see #27679. But in general, wrong dtypes were returned for all binary operations.

Tagging @mattip @seberg as related to the above issue and NEP50 that you are familiar with.

Resolves #27679

Short example of the bug

For an ndarray, this passes:

import numpy as np
res = np.array([1], dtype=np.uint8) + 2
assert res.dtype == np.uint8

The masked array equivalent fails:

res = np.ma.masked_array([1], dtype=np.uint8) + 2
assert res.dtype == np.uint8

Logic of the fix

Binary ops not in-place

We modify getdata() with a return_pyscalar argument to optionally return the scalar itself (instead of always converting to an array), and update the getdata() calls in _MaskedBinaryOperation and _DomainedBinaryOperation to use the scalar directly in their call to the corresponding np.umath function.

This fixes the issue for all not in-place binary ops.

Binary ops in-place

The problem:
For in-place operations, the problem is more delicate. As masked arrays vow to preserve the .data under the mask during an operation (see first lines here: https://numpy.org/doc/stable/reference/maskedarray.generic.html#operations-on-masked-arrays), they need a specific in-place logic.
In case of binary ops, the resulting .data is preserved under the final mask with respect to the left term, so self.data for in-place. To perform this without adding more memory usage, the existing logic in the ma replaces data under the future mask in other_data with the invariant value for the operation (0 for add/subtract, 1 for power/division), before passing it to the corresponding np.umath.
Therefore, we cannot rely on the np.umath function to take care of the NEP50 scalar logic, as the scalar other_data has to be converted to an array to substitute some cells to these invariant values before calling the np.umath.

Proposed fix:
We still use the return_pyscalar argument of getdata(), as for not in-place above, to get other_data as a scalar. Then, in the case where it is a scalar, we run an equivalent of result_dtype() on the scalar+array for this operation. However, result_dtype itself is not sufficient here. We need a function that can take as input (scalar, array_type, operator) and can both return the proper result_dtype for NEP50, and flag if we need to raise an OverflowError.

Does such a function exist internally somewhere in NumPy? (I couldn't find any)

For now, as a placeholder operation simulating the same behaviour, I've used a dubious call to a single value numpy array + the scalar for this operation.
For example for __iadd__:

if isinstance(other_data, (bool, int, float)):
    placehold_arr = np.array([0], dtype=self._data.dtype)  # Placeholder data
    placehold_arr.__iadd__(other_data)   # Raises appropriate errors
    other_dtype = placehold_arr.dtype   # Returns appropriate dtype to create a other_data as array

This fixes the issue for all in-place binary ops.

New tests

First, I've mirrored all the relevant tests in https://github.com/numpy/numpy/blob/main/numpy/_core/tests/test_nep50_promotions.py for masked arrays.

However, as those don't cover in-place operations, I've also added a test_nep50_inplace_ma that checks the behaviour for all combinations of in-place ops, scalars and dtypes.

rhugonnet · 2025-04-04T15:22:27Z

@mattip @seberg @tylerjereddy @ngoldbaum Any feedback on this?

Just in case you didn't dive in: It's relatively minor changes (almost all line changes are additional tests).

If we can easily define a function _result_dtype_raise_overflow(scalar, array_dtype, operator), it'll just be about calling it within all in-place binary ops, instead of the current placeholder method that replicates that behaviour.
And for not in-place ops, it's just those couple lines passing the scalar to the op directly.

seberg

Thanks for looking into it, sorry masked arrays tend to not get much attention. I won't be able to look at this for a while, but the rough approach seems good, that "hack" looks a bit awkward, though.

seberg · 2025-04-04T15:34:34Z

numpy/ma/core.py

+        other_data = getdata(other, return_scalar=True)
+        if isinstance(other_data, (bool, int, float)):
+            # Trick to raise appropriate error and get output dtype
+            placehold_arr = np.array([0], dtype=self._data.dtype)


This trick seems a bit strange to me. Where does it actually fail if you don't do this here?

Yes, this is where we'd need a _result_dtype_raise_overflow(scalar, array_dtype, operator) function to give us the dtype or raise the OverflowError, and cleanly replace this hack.

Without this bit, the code would fail here (even with other_data now being returned as a scalar):

other_data = np.where(self._mask, other_data.type(0), other_data) self._data.__isub__(other_data)

It fails because the python scalar has to be converted into an array before the actual operation.
This conversion is (unfortunately) required in order replace masked values by the neutral value for the op (0 for add/sub, 1 for pow/div), as masked array vow not to touch masked values.

Hmmm, you only need the promotion result? The other error will happen naturally when the mask is created?
This is an possibility, I think:

dtype = np.true_divide.resolve_dtypes((np.dtype(np.int8), int, np.dtype(np.int8)), casting="unsafe")[1]

but I would prefer to avoid that if we can think of a way. We could use where= on the ufunc call (as part of the __itruediv__), but I suppose that will make things slower.

Yes, that'd be ideal, but I couldn't think of any option. I was afraid that using where for a in-place operation would unnecessarily increase memory usage (I think we'd need to conserve self._data, and also have the result of op(self._data, other_data) in order to run where?)
Depending on how __iadd__ for ndarray works in the background, maybe there's a way, but I wouldn't know.

Following your advice, I tried the option of changing the hack everywhere into this:

if isinstance(type(other_data), (int, float, complex)): other_dtype = np.ufunc.resolve_dtypes( (self._data.dtype, type(other_data), None), casting="unsafe")[1]

with ufunc the corresponding operation to the __ioperation__.

It doesn't seem to give exactly the same other_dtype output as the previous hack that worked for all.
Now, combinations of np.uint8 and Python int fail for iadd, isub, imul, ifloordiv and ipow (so everything except truediv and div) during the __ioperation__:

E numpy._core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'floor_divide' output from dtype('int64') to dtype('uint32') with casting rule 'same_kind'

Any ideas?

And unfortunately, the OverflowError doesn't happen naturally with the above option.
For instance, this doesn't raise one while it does with np.array():

np.ma.masked_array([1], dtype=np.uint8) + 300

My bad, I forgot the part about using in instead of isinstance!

All passing with the following instead of the hack (including OverflowError!) 😁 :

if type(other_data) in (int, float, complex): other_dtype = np.ufunc.resolve_dtypes( (self._data.dtype, type(other_data), None), casting="unsafe")[1]

seberg · 2025-04-04T15:39:49Z

numpy/ma/core.py

-        other_data = getdata(other)
-        other_data = np.where(self._mask, other_data.dtype.type(1), other_data)
+        other_data = getdata(other, return_scalar=True)
+        if isinstance(other_data, (bool, int, float)):


(bool, int, float) is wrong, it has to be (int, float, complex). Also make sure to use type(other_data) in ... for an exact type check.
(Yes, I realize that there is some discussion about whether it should be an exact check, but it is what we use currently, and float64 is a float subclass so isinstance may at times be confusing.)

Clearly this applies to all occurances. Also rename return_scalar to return_pyscalar or return_pyliterals even to make it sure that this is not about preserving scalars in general.

rhugonnet · 2025-04-04T21:17:54Z

Pushed the changes, all works nicely with np.ufunc.resolve_dtypes for in-place! The last test function I added checks this quite thoroughly.

If there is no where solution that comes to mind without computing/memory trouble, I'm happy with this! 🙂

ngoldbaum · 2025-08-20T18:29:54Z

Ping @seberg - mind giving this another once-over?

@rhugonnet it looks like there's a merge conflict - mind clearing that?

Sorry this got dropped. We just looked at this at a triage meeting where Sebastian wasn't there, so I hoped that pinging Sebastian is sufficient to say this has been triaged.

jorenham · 2025-08-20T18:37:20Z

numpy/ma/core.py

@@ -710,7 +710,7 @@ def get_masked_subclass(*arrays):
    return rcls


-def getdata(a, subok=True):
+def getdata(a, subok=True, return_pyscalar=False):


Seeing as it's being used as keyword-only, so maybe we should just enforce it?

Suggested change

def getdata(a, subok=True, return_pyscalar=False):

def getdata(a, subok=True, *, return_pyscalar=False):

rhugonnet · 2025-08-21T23:42:54Z

Thanks for picking this up again @ngoldbaum!

Fixed the two merge conflicts: I kept the idiv removal of #28927, and the linting change of #28755.

Add NEP50 logic for masked arrays and tests

74cb500

github-actions bot added the 00 - Bug label Mar 21, 2025

rhugonnet added 3 commits March 21, 2025 17:53

Remove space edits from editor in test file

738bb00

Further fix edited spaces

3ab5f4d

Final fix for spaces

f39ffc7

tylerjereddy added the component: numpy.ma masked arrays label Mar 21, 2025

ngoldbaum added the triage review Issue/PR to be discussed at the next triage meeting label Mar 24, 2025

seberg reviewed Apr 4, 2025

View reviewed changes

rhugonnet added 2 commits April 4, 2025 22:29

Use np.ufunc.resolve_dtypes instead of placeholder operation

901fe80

Reverse trailing space removal from automatic editor linter

a77d5c3

melissawm added this to NumPy first-time contributor PRs Apr 7, 2025

melissawm moved this to Awaiting a code review in NumPy first-time contributor PRs Apr 7, 2025

rhugonnet mentioned this pull request Apr 7, 2025

Add NumPy 2 support GlacioHack/geoutils#605

Merged

ngoldbaum added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Aug 20, 2025

jorenham reviewed Aug 20, 2025

View reviewed changes

Merge branch 'main' into bug_nep50_ma

1748a80

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Fix NEP50 weak-type scalar logic for binary operations on masked arrays #28567

BUG: Fix NEP50 weak-type scalar logic for binary operations on masked arrays #28567

rhugonnet commented Mar 21, 2025 •

edited

Loading

Uh oh!

rhugonnet commented Apr 4, 2025

Uh oh!

seberg left a comment

Uh oh!

seberg Apr 4, 2025

Uh oh!

rhugonnet Apr 4, 2025

Uh oh!

seberg Apr 4, 2025

Uh oh!

rhugonnet Apr 4, 2025 •

edited

Loading

Uh oh!

rhugonnet Apr 4, 2025

Uh oh!

rhugonnet Apr 4, 2025

Uh oh!

seberg Apr 4, 2025

Uh oh!

rhugonnet commented Apr 4, 2025

Uh oh!

ngoldbaum commented Aug 20, 2025

Uh oh!

jorenham Aug 20, 2025

Uh oh!

rhugonnet commented Aug 21, 2025

Uh oh!

Uh oh!

	def getdata(a, subok=True, return_pyscalar=False):
	def getdata(a, subok=True, *, return_pyscalar=False):

Uh oh!

BUG: Fix NEP50 weak-type scalar logic for binary operations on masked arrays #28567

Are you sure you want to change the base?

BUG: Fix NEP50 weak-type scalar logic for binary operations on masked arrays #28567

Conversation

rhugonnet commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Short example of the bug

Logic of the fix

Binary ops not in-place

Binary ops in-place

New tests

Uh oh!

rhugonnet commented Apr 4, 2025

Uh oh!

seberg left a comment

Choose a reason for hiding this comment

Uh oh!

seberg Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

rhugonnet Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

seberg Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

rhugonnet Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rhugonnet Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

rhugonnet Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

seberg Apr 4, 2025

Choose a reason for hiding this comment

Uh oh!

rhugonnet commented Apr 4, 2025

Uh oh!

ngoldbaum commented Aug 20, 2025

Uh oh!

jorenham Aug 20, 2025

Choose a reason for hiding this comment

Uh oh!

rhugonnet commented Aug 21, 2025

Uh oh!

Uh oh!

rhugonnet commented Mar 21, 2025 •

edited

Loading

rhugonnet Apr 4, 2025 •

edited

Loading