Skip to content

MAINT Add a private cython module for sorting utilities #19950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Apr 21, 2021

Reference Issues/PRs

None.

What does this implement/fix? Explain your changes.

Some internals rely on sorts, like neighbours.KNeighborsMixin for partitioning neighbours.

The C++ std library exposes a lot of efficient sorting algorithms, like nth_element, which is an efficient partial sort for neighbours.KNeighborsMixin's case for example.

To use std::algorithm, a few fixture can be defined using Cython, mainly:

  • Cython functions used in Cython implementations. Those call
  • C++ functions that wraps function of std::algorithm and (sometimes) use
  • an Comparator to state how to sort

This is an attempt at structuring a cython private module for sorts relying on std::algorithm.

Any other comments?

The original scope of PR proposition was to speed-up neighbours.KNeighborsClassifier.predict.
Setting up the connection to std::algorithms made me think of enlarging its original scope.

If we want to pursue with such a module, probably a next step would be to merge it with _partition_nodes, introduced in #19473


PS: also, this PR definitely introduces adherence to C++ code, something which always has been avoided for the implementations' maintainability and/or understandability.

See #19473 (comment) as a pointer to recent discussions.

@jjerphan
Copy link
Member Author

Tests should fail because of bad type coercion.

I am trying to see how to support fused types here (I simplified it to use inlined int and floating for experiments).
I'll get into it tomorrow.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To validate the design of this module, could you please use it to update #19473?

return 0

cpdef np.ndarray[ITYPE_t, ndim=2, mode='c'] argpartition(
np.ndarray[DTYPE_t, ndim=2, mode='c'] data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to use use typed memory views instead of the np.ndarray types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to, but if we make them const (which is not done yet but we might want to) we won't yet be able to used fused types (#10624 for reference).

This is the current case in K-Means.

What shall we do? Go without const to use memory views? 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget what I've said: those aren't fused types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6506756 introduced Cython fused-types in.

@@ -26,6 +26,12 @@ def configuration(parent_package='', top_path=None):
language="c++",
libraries=libraries)

config.add_extension('_sorts',
sources=['_sorts.pyx'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use the singular for this module _sort.py or alternatively _sorting.pyx.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably _sorting is better for consistency (we have *._testing for instance).

jjerphan and others added 2 commits April 22, 2021 11:52
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan jjerphan closed this Apr 22, 2021
@jjerphan jjerphan deleted the sorts_module branch April 22, 2021 15:33
@jjerphan jjerphan restored the sorts_module branch April 22, 2021 15:33
@jjerphan jjerphan reopened this Apr 22, 2021
@lorentzenchr
Copy link
Member

@jjerphan Just to let you know: I'm very much in favour of your initiative to make use of great C++ features where it makes life easier like this one:+1:

@jjerphan
Copy link
Member Author

Closing as it has been made and will be made irrelevant with the latest PRs. We might start from scratch later if having a utility module for some computational routines makes sense.

@jjerphan jjerphan closed this Sep 30, 2021
@jjerphan jjerphan deleted the sorts_module branch October 21, 2022 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants