WIP PERF Faster KNeighborsClassifier.predict #14543
Conversation
The radius neighbours input is perfect for transforming into a 2d CSR matrix. Just set indices to the concatenated neighbour indices, and indptr to the cumulative neighbour counts per query. I suspect you can efficiently use sparse matrices to compute the result for both.
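A minimal sketch of the construction being suggested here, with toy data and an illustrative one-hot label encoding (none of the names below come from the PR itself): the output of `radius_neighbors` maps directly onto the CSR `indices`/`indptr` arrays, and multiplying that graph by a one-hot encoding of the training labels gives per-query class counts whose row-wise argmax is the prediction.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import NearestNeighbors

# Toy data: 5 training points with binary labels, 3 query points.
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y_train = np.array([0, 0, 0, 1, 1])
X_query = np.array([[0.05], [1.05], [0.15]])

nn = NearestNeighbors(radius=0.3).fit(X_train)
neigh_ind = nn.radius_neighbors(X_query, return_distance=False)

# CSR layout as suggested above: `indices` is the concatenation of each
# query's neighbour indices, `indptr` the cumulative neighbour counts.
indices = np.concatenate(neigh_ind)
indptr = np.concatenate([[0], np.cumsum([len(ind) for ind in neigh_ind])])
data = np.ones(len(indices))
graph = sparse.csr_matrix((data, indices, indptr),
                          shape=(len(X_query), len(X_train)))

# Multiplying by a one-hot encoding of the labels gives per-query class
# counts; the predicted class is then a row-wise argmax.
one_hot = sparse.csr_matrix(
    (np.ones(len(y_train)), (np.arange(len(y_train)), y_train)))
class_counts = graph @ one_hot          # shape (n_queries, n_classes)
pred = np.asarray(class_counts.argmax(axis=1)).ravel()
print(pred)                             # [0 1 0]
```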
Thanks for the pointer. I'll try to investigate, but first I need to fix the test failure, which seems to point to a genuine bug.
@jjerphan In case you are interested in continuing it, this should also give an easy performance win for KNN classification :) As a first step, maybe just making this happen for …
Thanks!
Closes #13783
This makes `KNeighborsClassifier.predict` faster by re-writing `scipy.stats.mode` as an argmax of a sparse array, as discussed in the parent issue.

Using the example provided in #13783 (comment):
On master
With this PR
so in this particular case `KNeighborsClassifier.predict` is 3.1x faster.

It works in a straightforward way on both weighted and unweighted data, making `sklearn.utils.weighted_mode` no longer necessary.

The downside is that it makes `RadiusNeighborsClassifier.predict` slower by about 30% on the following example.

TODO:
- investigate the `RadiusNeighborsClassifier.predict` performance regression.
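For reference, a rough sketch of the idea the description refers to (not the actual diff; data and the helper name `sparse_mode` are illustrative): the (optionally weighted) neighbour votes are accumulated into a sparse matrix of shape `(n_queries, n_classes)` and the prediction is a row-wise argmax, replacing the per-row `scipy.stats.mode` / `sklearn.utils.weighted_mode` calls. Labels are assumed to be encoded as `0..n_classes - 1`, as `KNeighborsClassifier` does internally.

```python
import numpy as np
from scipy import sparse, stats

rng = np.random.RandomState(0)
n_queries, n_neighbors, n_classes = 1000, 5, 3

# Labels of the k nearest neighbours of each query point, i.e. what
# y[neigh_ind] would give after a kneighbors() query.
neigh_labels = rng.randint(n_classes, size=(n_queries, n_neighbors))
# Optional per-neighbour weights (e.g. 1 / distance); ones == unweighted.
weights = rng.uniform(0.1, 1.0, size=(n_queries, n_neighbors))

def sparse_mode(neigh_labels, weights):
    """Row-wise (weighted) mode via argmax of a sparse vote matrix."""
    n_queries, n_neighbors = neigh_labels.shape
    rows = np.repeat(np.arange(n_queries), n_neighbors)
    votes = sparse.csr_matrix(
        (weights.ravel(), (rows, neigh_labels.ravel())),
        shape=(n_queries, neigh_labels.max() + 1))
    return np.asarray(votes.argmax(axis=1)).ravel()

# Unweighted votes agree with scipy.stats.mode (both break ties in
# favour of the smaller label when the CSR indices are sorted).
unweighted = sparse_mode(neigh_labels, np.ones_like(neigh_labels, dtype=float))
expected = np.ravel(stats.mode(neigh_labels, axis=1)[0])
assert np.array_equal(unweighted, expected)

# Weighted votes, previously handled by sklearn.utils.weighted_mode.
weighted = sparse_mode(neigh_labels, weights)
```

For the radius-based classifier, the neighbour counts vary per query, so the vote matrix has to be assembled from concatenated indices and an `indptr` array as in the CSR construction sketched earlier in the conversation.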