ENH: Adding where for argmin #21625

m10an · 2022-05-28T20:28:24Z

Addresses #14371

m10an · 2022-05-28T20:37:13Z

I'm not sure about np.nanargmin, np.nanargmax and MaskedArray's argmin, argmax.
Are initial and where needed in those functions?

seberg

Nice, this is a pretty thorough start! We need to pass this through the mailing list as an API decision at some point. I suspect adding where= should be pretty straightforward since it matches max (and reduce-like operations in general).

OTOH, I don't have any intuition for what initial= means. initial= is normally the starting value! np.max([1, 2], initial=100) makes sense: it returns 100, since that is larger than all the others.
But using initial as a "fill" value has a very different meaning. The only way I could make sense of it would be:

np.argmax([1, 2], initial=100)

returning some special value like -1 (indicating that initial was the largest value). But that feels like it may be too special, at least unless someone comes along with a clear real-world use case. (It is much easier and better to make API decisions with a specific use case in mind than to just fill apparent holes in the API.)
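To make the two readings of initial= concrete, here is a minimal sketch. The np.max call is existing NumPy behaviour; argmax_with_initial is a hypothetical helper (not part of NumPy or of this PR) that only illustrates the "-1 means initial won" idea floated above.

```python
import numpy as np

# Existing behaviour: initial= is the starting value of the reduction,
# so it wins whenever it is larger than every element.
print(np.max([1, 2], initial=100))  # 100


def argmax_with_initial(a, initial):
    """Hypothetical: flat argmax of `a`, or -1 if `initial` is the largest value."""
    a = np.asarray(a)
    # Assumption: an empty input also reports the sentinel.
    if a.size == 0 or initial > a.max():
        return -1
    return int(np.argmax(a))


print(argmax_with_initial([1, 2], initial=100))  # -1
print(argmax_with_initial([1, 2], initial=0))    # 1
```

The sentinel is what makes this feel special-cased: the return value stops being a valid index as soon as initial dominates.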

There are a couple of other issues that I suspect exist that would need fixing here however:

where=True should be the default and be optimized out.
where=arr[::2] i.e. non-contiguous arrays must work and be tested! (it does not look like they will?)
where must work if shapes mismatch, as long as it can be broadcast to the input.
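For reference, np.max already handles these three cases, so it can serve as a model for what the arg-functions would need to match. This is a small sketch using only existing NumPy calls; the array contents are made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# where=True (the default) selects every element.
full = np.max(a, where=True, initial=-1)

# Non-contiguous mask: every second column of a wider boolean array.
wide = np.zeros((3, 8), dtype=bool)
wide[:, ::4] = True
mask_nc = wide[:, ::2]  # shape (3, 4); columns 0 and 2 are selected
assert not mask_nc.flags.c_contiguous
strided = np.max(a, where=mask_nc, initial=-1)

# Mismatched shapes: a (4,) mask broadcasts against the (3, 4) input.
mask_row = np.array([True, False, False, False])
broadcast = np.max(a, where=mask_row, initial=-1)

print(full, strided, broadcast)  # 11 10 8
```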

BvB93 · 2022-05-30T19:56:43Z

numpy/__init__.pyi

+    @overload
+    def argmax(
+        self,
+        axis: None = ...,
+        out: None = ...,
+        *,
+        keepdims: bool = ...,
+        initial: _ScalarLike_co = ...,
+        where: _ArrayLikeBool_co = ...,
+    ) -> intp: ...


No need for any additional overloads here (or anywhere in this PR), as neither initial nor where affect the output type, dtype or shape. In this case you can simply add the new parameters to the existing overloads:

 @overload
 def argmax(
     self,
     axis: None = ...,
     out: None = ...,
     *,
     keepdims: bool = ...,
+    initial: _ScalarLike_co = ...,
+    where: _ArrayLikeBool_co = ...,
 ) -> intp: ...

m10an · 2022-05-31T19:31:40Z

@seberg I totally agree that the initial= argument is counterintuitive. I just followed the existing example and only realised that while implementing it :)

But I would use where= as a simple mask, and raise when the mask is all False:

ValueError: attempt to get argmax of an empty sequence

And leave initial= as a parameter of the reduce-function family.
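A minimal Python sketch of that proposal, assuming the axis=None case only. argmax_where is a hypothetical name, not the PR's C implementation; it restricts the search to the selected flat indices, so no fill value is needed.

```python
import numpy as np


def argmax_where(a, where=True):
    """Hypothetical helper: flat argmax over the elements selected by `where`."""
    a = np.asarray(a)
    # Broadcast the mask to the input shape (raises if the shapes are incompatible).
    mask = np.broadcast_to(np.asarray(where, dtype=bool), a.shape)
    candidates = np.flatnonzero(mask)  # flat indices that may be returned
    if candidates.size == 0:
        raise ValueError("attempt to get argmax of an empty sequence")
    # argmax among the candidates, mapped back to an index into `a`.
    return int(candidates[np.argmax(a.ravel()[candidates])])


print(argmax_where([5, 9, 7]))                             # 1
print(argmax_where([5, 9, 7], where=[True, False, True]))  # 2
```

Because masked-out elements are never compared, the result cannot be confused with a masked position, which is the ambiguity a fill-value approach would introduce.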

m10an · 2022-07-07T06:34:07Z

I've been leisurely working on major issues pointed out by @seberg (non-contiguous case and mismatched shapes).

The first one is pretty straightforward (I believe), since there was already a piece of code forcing alignment:

wp = (PyArrayObject *)PyArray_ContiguousFromAny((PyObject *)where, NPY_BOOL, 0, 0);

But the second one turned out to be trickier for me. I decided to use NpyIter so as not to reinvent the wheel.
My first test went smoothly (test_masked), but the second one (test_masked_2d) exits with a segmentation fault (core dumped) caused by the call to NpyIter_MultiNew here.

I've tried to recreate this case using np.nditer:

import numpy as np

method = 'max'
n = 5
a = np.zeros([2, n], dtype=int)
where = np.ones([2, n], dtype=bool)
value = getattr(np.iinfo(a.dtype), method)

a[:, 0] = value
a[:, n - 1] = value

arg_method = getattr(a, 'arg' + method)
mask_args = dict(initial=0, where=where)
where[0, 0] = False

it = np.nditer(
    [a, where], 
    ['multi_index'],
    [['readonly'], ['readonly']], casting='no', order='K'
)
with it:
    while not it.finished:
        it.debug_print()
        print(it[0], it[1])
        it.iternext()

But that didn't help me much...

The last straw was that gdb suddenly refused to show lines and step through code...

Thread 1 "python" hit Breakpoint 1, 0x00007ffff6d0f0f0 in _PyArray_ArgMinMaxCommon () from /home/ivan/proj/m10an/numpy/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so
(gdb) list
1       /usr/local/src/conda/python-3.9.13/Programs/python.c: No such file or directory.
(gdb) s
Single stepping until exit from function _PyArray_ArgMinMaxCommon,
which has no line number information.
PyType_IsSubtype (a=0x7ffff70bb200 <PyArray_Type>, b=0x7ffff70d1d80 <PyGenericArrType_Type>) at /usr/local/src/conda/python-3.9.13/Objects/typeobject.c:1425
1425    /usr/local/src/conda/python-3.9.13/Objects/typeobject.c: No such file or directory.

I hope this is just a silly mistake that I am overlooking, or maybe NpyIter is not the best tool here.
I would appreciate your thoughts on this.

seberg · 2022-07-07T14:13:48Z

numpy/core/src/multiarray/calculation.c

+        it_ops[0] = ap;
+        it_ops[1] = wp;
+        iter = NpyIter_MultiNew(2, it_ops, 0, NPY_KEEPORDER, NPY_NO_CASTING, 
+                                it_opflags, NULL);


You will have to pass a couple of flags here, probably: NPY_ITER_EXTERNAL_LOOP and NPY_ITER_ZEROSIZE_OK, maybe also NPY_ITER_REFS_OK. Some of the following code does not necessarily make much sense without the external loop flag.

The crash looks like a PyArray_Check() on some invalid data, but it may be in the code after the new creation. The main error here may be the missing check for error returns on iter and some of the following functions (although I am not certain, it may well be that calling some of these is just invalid without the appropriate flags).

I am surprised you are missing debugging symbols on a local build, but try recompiling with CFLAGS=-g or adding -g to runtests.py if you are using that.
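As a rough Python-level analogue of the suggested flags (a sketch, not the C code in this PR), np.nditer accepts 'external_loop', 'zerosize_ok' and 'refs_ok'. With the external loop each step yields 1-D chunks rather than scalars, and the mask is broadcast against the input by the iterator. The arrays and the running-offset bookkeeping are assumptions made for illustration; the offset relies on forcing C iteration order.

```python
import numpy as np

a = np.array([[3, 9, 1], [7, 2, 8]])
where = np.array([True, False, True])  # shape (3,), broadcast against (2, 3)

it = np.nditer(
    [a, where],
    flags=["external_loop", "zerosize_ok", "refs_ok"],
    op_flags=[["readonly"], ["readonly"]],
    order="C",  # forced C order so a running offset gives the flat index
)

best_val = None
best_idx = -1
offset = 0  # flat index of the first element of the current chunk
with it:
    for chunk, mask in it:
        selected = np.flatnonzero(mask)  # positions allowed in this chunk
        if selected.size:
            local = selected[np.argmax(chunk[selected])]
            if best_val is None or chunk[local] > best_val:
                best_val = chunk[local]
                best_idx = offset + int(local)
        offset += chunk.size

print(best_idx)  # 5
```

The 9 at flat index 1 is skipped because its column is masked out, so the result points at the 8 in the second row.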

seberg · 2024-01-13T12:23:42Z

Closing this. There may be a good start here, so it can be reopened, but it needs work and has not been active for 2 years.

m10an added 7 commits May 18, 2022 09:32

Init masking kwargs and testing numpy#14371

f1fd235

Add PyArray_BoolArrayConverter numpy#14371

1f61a56

Use masking in arg methods numpy#14371

81b2c69

Update testing and add 2d case numpy#14371

3cea455

Add overloads with initial and where numpy#14371

f93b817

Update docs numpy#14371

64333cc

Remove initial and where from MaskedArray numpy#14371

ffb2bbd

github-actions bot added the 01 - Enhancement label May 28, 2022

seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label May 28, 2022

m10an added 3 commits May 29, 2022 08:13

Fix version refrence numpy#14371

b2e729e

Add new params to nanfunctions numpy#14371

38dbcd2

Fix linter warnings numpy#14371

1ab28c9

seberg reviewed May 29, 2022

seberg added the 55 - Needs work label May 29, 2022

BvB93 reviewed May 30, 2022

BvB93 linked an issue May 30, 2022 that may be closed by this pull request

ENH: Adding where for argmin #14371

Open

m10an added 4 commits May 31, 2022 21:37

ENH: Update test_masked_2d (numpy#14371)

9f00da9

ENH: Fix non-contiguous where (numpy#14371)

cee30f3

ENH: Add non-contiguous case to test_masked_2d (numpy#14371)

f94344f

ENH: Fix argmax and argmin overloads (numpy#14371)

93dac93

As neither `initial` nor `where` affect the output type, dtype or shape, they simply should be added to the existing overloads.

m10an requested a review from BvB93 May 31, 2022 19:31

ENH: Init where broadcasting using NpyIter

c99b8a7

m10an requested a review from seberg July 7, 2022 06:34

seberg reviewed Jul 7, 2022

seberg closed this Jan 13, 2024

carlosgmartin mentioned this pull request Mar 11, 2024

Add where argument to argmax, argmin, ptp, cumsum, cumprod jax-ml/jax#20177

Open

5 tasks

carlosgmartin mentioned this pull request Apr 23, 2024

ENH: Add where argument to reduction functions that are missing it #26336

Open

13 tasks
