-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT Add a private cython module for sorting utilities #19950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests should fail because of bad type coercion. I am trying to see how to support fused types here (I simplified it to use inlined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To validate the design of this module, could you please use it to update #19473?
sklearn/neighbors/_sorts.pyx
Outdated
return 0 | ||
|
||
cpdef np.ndarray[ITYPE_t, ndim=2, mode='c'] argpartition( | ||
np.ndarray[DTYPE_t, ndim=2, mode='c'] data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense to use use typed memory views instead of the np.ndarray
types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to, but if we make them const
(which is not done yet but we might want to) we won't yet be able to used fused types (#10624 for reference).
This is the current case in K-Means.
What shall we do? Go without const
to use memory views? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget what I've said: those aren't fused types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6506756 introduced Cython fused-types in.
@@ -26,6 +26,12 @@ def configuration(parent_package='', top_path=None): | |||
language="c++", | |||
libraries=libraries) | |||
|
|||
config.add_extension('_sorts', | |||
sources=['_sorts.pyx'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just use the singular for this module _sort.py
or alternatively _sorting.pyx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably _sorting
is better for consistency (we have *._testing
for instance).
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan Just to let you know: I'm very much in favour of your initiative to make use of great C++ features where it makes life easier like this one:+1: |
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Closing as it has been made and will be made irrelevant with the latest PRs. We might start from scratch later if having a utility module for some computational routines makes sense. |
Reference Issues/PRs
None.
What does this implement/fix? Explain your changes.
Some internals rely on sorts, like
neighbours.KNeighborsMixin
for partitioning neighbours.The C++ std library exposes a lot of efficient sorting algorithms, like
nth_element
, which is an efficient partial sort forneighbours.KNeighborsMixin
's case for example.To use
std::algorithm
, a few fixture can be defined using Cython, mainly:This is an attempt at structuring a cython private module for sorts relying on
std::algorithm
.Any other comments?
The original scope of PR proposition was to speed-up
neighbours.KNeighborsClassifier.predict
.Setting up the connection to
std::algorithms
made me think of enlarging its original scope.If we want to pursue with such a module, probably a next step would be to merge it with
_partition_nodes
, introduced in #19473PS: also, this PR definitely introduces adherence to C++ code, something which always has been avoided for the implementations' maintainability and/or understandability.
See #19473 (comment) as a pointer to recent discussions.