Skip to content

Commit fcc77bd

Browse files
committed
ENH: Add extended sorting APIs
This PR adds new sorting APIs that can be used for extending our current sort types. The new C-APIs do not have the *which* argument of the legacy C-APIs, instead they have *flags* request flag API. In this transition, heapsort can no longer be called directly, introsort (quicksort) is called instead. The sort/argsort methods get new keyword arguments used to set the flags when *kind* is not given, so *flags* is invisible at the Python level. The new keywords are: - stable -- whether the sort should stable, default is False, - descending -- whether to sort in descending order, default is False, - nanfirst -- whether NaNs sort to the last or first, default is False. The main difference here is that the keywords select by functionality, whereas *kind* selects by algorithm. Note that descending and nanfirst are not yet implemented, so an error is raised if either is specified. The added C-API functions are: - (API)PyArray_SortEx, - (API)PyArray_ArgSortEx, Those are the two functions that need changes depending on how we want to implement adding new sort algorithms. They should be pretty easy to adapt to whatever we choose to do. The legacy C-API functions, PyArray_Sort and PyArray_ArgSort, have been modified to call through the new C-API functions. In order to do so the *kind* function has been encoded in the *flags* like so: - "quicksort" - default (0) - "heapsort" - default (0) - "mergesort" - stable (1) I note that the partitioning/selection functions have not been modified, we may want to do that a some point.
1 parent bde097d commit fcc77bd

File tree

9 files changed

+306
-154
lines changed

9 files changed

+306
-154
lines changed
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
The NPY_SORTKIND enum has been enhanced with new variables
2+
----------------------------------------------------------
3+
This is of interest if you are using ``PyArray_Sort`` or ``PyArray_ArgSort``.
4+
We have changed the semantics of the old names in the NPY_SORTKIND enum and
5+
added new ones. The changes are backward compatible, and no recompilation is
6+
needed. The new names of interest are:
7+
8+
* NPY_SORT_DEFAULT -- default sort (same value as NPY_QUICKSORT)
9+
* NPY_SORT_STABLE -- the sort must be stable (same value as NPY_MERGESORT)
10+
* NPY_SORT_DESCENDING -- the sort must be descending
11+
* NPY_SORT_NANFIRST -- NaNs sort to the beginning
12+
13+
The semantic change is that NPY_HEAPSORT is mapped to NPY_QUICKSORT when used.
14+
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
Sorting ``kind='heapsort'`` now maps to ``kind='quicksort'``
2+
------------------------------------------------------------
3+
It is unlikely that this change will be noticed, but if you do see a change in
4+
execution time or unstable argsort order, this is likely the cause.

doc/source/reference/c-api/array.rst

Lines changed: 41 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -2308,9 +2308,9 @@ Item selection and manipulation
23082308
is sorted using the algorithm denoted by *kind*, which is an integer/enum pointing
23092309
to the type of sorting algorithms used.
23102310
2311-
.. c:function:: PyObject* PyArray_ArgSort(PyArrayObject* self, int axis)
2311+
.. c:function:: PyObject* PyArray_ArgSort(PyArrayObject* self, int axis, NPY_SORTKIND kind)
23122312
2313-
Equivalent to :meth:`ndarray.argsort<numpy.ndarray.argsort>` (*self*, *axis*).
2313+
Equivalent to :meth:`ndarray.argsort<numpy.ndarray.argsort>` (*self*, *axis*, *kind*).
23142314
Return an array of indices such that selection of these indices
23152315
along the given ``axis`` would return a sorted version of *self*. If *self* ->descr
23162316
is a data-type with fields defined, then self->descr->names is used
@@ -4321,7 +4321,11 @@ Enumerated Types
43214321
.. c:enum:: NPY_SORTKIND
43224322
43234323
A special variable-type which can take on different values to indicate
4324-
the sorting algorithm being used.
4324+
the sorting algorithm being used. These algorithm types have not been
4325+
treated strictly for some time, but rather treated as stable/not stable.
4326+
In NumPy 2.4 they are replaced by requirements (see below), but done in a
4327+
backwards compatible way. These values will continue to work, except that
4328+
that NPY_HEAPSORT will do the same thing as NPY_QUICKSORT.
43254329
43264330
.. c:enumerator:: NPY_QUICKSORT
43274331
@@ -4335,11 +4339,41 @@ Enumerated Types
43354339
43364340
.. c:enumerator:: NPY_NSORTS
43374341
4338-
Defined to be the number of sorts. It is fixed at three by the need for
4339-
backwards compatibility, and consequently :c:data:`NPY_MERGESORT` and
4340-
:c:data:`NPY_STABLESORT` are aliased to each other and may refer to one
4341-
of several stable sorting algorithms depending on the data type.
4342+
Defined to be the number of sorts. It is fixed at three by the need for
4343+
backwards compatibility, and consequently :c:data:`NPY_MERGESORT` and
4344+
:c:data:`NPY_STABLESORT` are aliased to each other and may refer to one
4345+
of several stable sorting algorithms depending on the data type.
43424346
4347+
In NumPy 2.4 the algorithm names are replaced by requirements. You can still use
4348+
the old values, a recompile is not needed, but they are reinterpreted such that
4349+
4350+
* NPY_QUICKSORT and NPY_HEAPSORT -> NPY_SORT_DEFAULT
4351+
* NPY_MERGESORT and NPY_STABLE -> NPY_SORT_STABLE
4352+
4353+
.. c:enumerator:: NPY_SORT_DEFAULT
4354+
4355+
The default sort for the type. For the NumPy builtin types it may be
4356+
stable or not, but will be ascending and sort NaN types to the end. It
4357+
is usually chosen for speed and/or low memory.
4358+
4359+
.. c:enumerator:: NPY_SORT_STABLE
4360+
4361+
(Requirement) Specifies that the sort must be stable.
4362+
4363+
.. c:enumerator:: NPY_SORT_DESCENDING
4364+
4365+
(Requirement) Specifies that the sort must be in descending order.
4366+
This functionality is not yet implemented for any of the NumPy types
4367+
and cannot yet be set from the Python interface.
4368+
4369+
.. c:enumerator:: NPY_SORT_NANFIRST
4370+
4371+
(Requirement) Specifies that the sort must sort NaNs to
4372+
the beginning, e.g. NaNs compare less than any other value for the
4373+
type. This is only relevant if the type has NaN equivalents.
4374+
This functionality is not yet implemented for any of the NumPy types
4375+
and cannot yet be set from the Python interface. It may change in the
4376+
future.
43434377
43444378
.. c:enum:: NPY_SCALARKIND
43454379

doc/source/reference/c-api/types-and-structures.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -650,7 +650,7 @@ PyArray_ArrFuncs
650650
address is given. The final argument is the array which is
651651
needed to get the itemsize for variable-length arrays.
652652
653-
.. c:member:: int sort(void* start, npy_intp length, void* arr)
653+
.. c:member:: int sort(void* start, npy_intp length, void *arr)
654654
655655
An array of function pointers to a particular sorting
656656
algorithms. A particular sorting algorithm is obtained using a
@@ -659,7 +659,7 @@ PyArray_ArrFuncs
659659
in-place assuming contiguous and aligned data.
660660
661661
.. c:member:: int argsort( \
662-
void* start, npy_intp* result, npy_intp length, void *arr)
662+
void* start, npy_intp* result, npy_intp length, void* arr)
663663
664664
An array of function pointers to sorting algorithms for this
665665
data type. The same sorting algorithms as for sort are

numpy/_core/include/numpy/ndarraytypes.h

Lines changed: 31 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -162,18 +162,39 @@ enum NPY_TYPECHAR {
162162
};
163163

164164
/*
165-
* Changing this may break Numpy API compatibility
166-
* due to changing offsets in PyArray_ArrFuncs, so be
167-
* careful. Here we have reused the mergesort slot for
168-
* any kind of stable sort, the actual implementation will
169-
* depend on the data type.
165+
* Changing this may break Numpy API compatibility due to changing offsets in
166+
* PyArray_ArrFuncs, so be careful. Here we have reused the mergesort slot for
167+
* any kind of stable sort, the actual implementation will depend on the data
168+
* type.
169+
*
170+
* Updated in NumPy 2.4
171+
*
172+
* Updated with new names denoting requirements rather than the algorithm. All
173+
* the previous values are reused in a way that should be downstream
174+
* compatible, but the actual algorithms used may be different than before. The
175+
* new approach should be more flexible and easier to update. The idea is that
176+
* NPY_SORT_STABLE | NPY_SORT_DESCENDING | NPY_SORT_NANFIRST should provide an
177+
* index.
178+
*
179+
* Names with a leading underscore are private, and should only be used
180+
* internally by NumPy.
181+
*
182+
* NPY_NSORTS remains the same for backwards compatibility, it should not be
183+
* changed.
170184
*/
185+
171186
typedef enum {
172-
_NPY_SORT_UNDEFINED=-1,
173-
NPY_QUICKSORT=0,
174-
NPY_HEAPSORT=1,
175-
NPY_MERGESORT=2,
176-
NPY_STABLESORT=2,
187+
_NPY_SORT_UNDEFINED = -1,
188+
NPY_QUICKSORT = 0,
189+
NPY_HEAPSORT = 1,
190+
NPY_MERGESORT = 2,
191+
NPY_STABLESORT = 2,
192+
// new style names
193+
_NPY_SORT_HEAPSORT = 1,
194+
NPY_SORT_DEFAULT = 0,
195+
NPY_SORT_STABLE = 2,
196+
NPY_SORT_DESCENDING = 4,
197+
NPY_SORT_NANFIRST = 8,
177198
} NPY_SORTKIND;
178199
#define NPY_NSORTS (NPY_STABLESORT + 1)
179200

numpy/_core/src/multiarray/_multiarray_tests.c.src

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2089,10 +2089,18 @@ run_sortkind_converter(PyObject* NPY_UNUSED(self), PyObject *args)
20892089
return NULL;
20902090
}
20912091
switch (kind) {
2092-
case _NPY_SORT_UNDEFINED: return PyUnicode_FromString("_NPY_SORT_UNDEFINED");
2093-
case NPY_QUICKSORT: return PyUnicode_FromString("NPY_QUICKSORT");
2094-
case NPY_HEAPSORT: return PyUnicode_FromString("NPY_HEAPSORT");
2095-
case NPY_STABLESORT: return PyUnicode_FromString("NPY_STABLESORT");
2092+
case _NPY_SORT_UNDEFINED:
2093+
return PyUnicode_FromString("_NPY_SORT_UNDEFINED");
2094+
case NPY_QUICKSORT:
2095+
return PyUnicode_FromString("NPY_QUICKSORT");
2096+
case NPY_HEAPSORT:
2097+
return PyUnicode_FromString("NPY_HEAPSORT");
2098+
case NPY_STABLESORT:
2099+
return PyUnicode_FromString("NPY_STABLESORT");
2100+
default:
2101+
// the other possible values in NPY_SORTKIND can only
2102+
// be set with keywords.
2103+
break;
20962104
}
20972105
return PyLong_FromLong(kind);
20982106
}

0 commit comments

Comments
 (0)