WIP PERF Faster KNeighborsClassifier.predict #14543


Closed
wants to merge 4 commits

Conversation

rth
Member

@rth commented Aug 1, 2019

Closes #13783

This makes KNeighborsClassifier.predict faster by re-writing scipy.stats.mode as an argmax of a sparse array as discussed in the parent issue.

Using the example provided in #13783 (comment),

On master

%timeit knn.predict(X_grid)                                                 
2.47 s ± 37.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

With this PR

%timeit knn.predict(X_grid)                                                 
793 ms ± 39.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

so in this particular case, KNeighborsClassifier.predict is 3.1x faster.

It works in a straightforward way on both weighted and unweighted data, making sklearn.utils.weighted_mode no longer necessary.
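The idea can be sketched roughly as follows (a hypothetical `sparse_mode` helper, not the PR's actual code): accumulate the (optionally weighted) class votes of each query point into a CSR matrix of shape `(n_samples, n_classes)`, then take the per-row argmax. Duplicate `(row, col)` entries are summed during CSR construction, which is what makes this work.

```python
import numpy as np
from scipy import sparse


def sparse_mode(y_neigh, weights=None, n_classes=None):
    """Row-wise (weighted) mode via a sparse vote matrix.

    y_neigh : (n_samples, n_neighbors) int array of encoded class labels.
    weights : optional (n_samples, n_neighbors) float array of vote weights.
    """
    n_samples, n_neighbors = y_neigh.shape
    if n_classes is None:
        n_classes = y_neigh.max() + 1
    if weights is None:
        weights = np.ones_like(y_neigh, dtype=np.float64)
    rows = np.repeat(np.arange(n_samples), n_neighbors)
    # Duplicate (row, class) pairs are summed when building the CSR matrix,
    # so entry (i, c) ends up holding the total vote weight of class c.
    votes = sparse.csr_matrix(
        (weights.ravel(), (rows, y_neigh.ravel())),
        shape=(n_samples, n_classes),
    )
    return np.asarray(votes.argmax(axis=1)).ravel()
```

With uniform weights this reduces to `scipy.stats.mode` along axis 1; passing distance-based weights gives the weighted mode in the same single vectorized pass.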

The downside is that it makes RadiusNeighborsClassifier.predict slower by about 30% on the following example,

In [1]: import numpy as np
   ...: from sklearn.datasets import make_blobs
   ...: from sklearn.neighbors import RadiusNeighborsClassifier
   ...:
   ...: X, y = make_blobs(centers=2, random_state=4, n_samples=30)
   ...: knn = RadiusNeighborsClassifier(algorithm='kd_tree', radius=2).fit(X, y)
   ...:
   ...: x_min, x_max = X[:, 0].min(), X[:, 0].max()
   ...: y_min, y_max = X[:, 1].min(), X[:, 1].max()
   ...:
   ...: xx = np.linspace(x_min, x_max, 100)
   ...: # change 100 to 1000 below and wait a long time
   ...: yy = np.linspace(y_min, y_max, 100)
   ...:
   ...: X1, X2 = np.meshgrid(xx, yy)
   ...: X_grid = np.c_[X1.ravel(), X2.ravel()]

In [2]: %timeit knn.predict(X_grid)                                                 
1.27 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The difference is mostly that in `RadiusNeighborsClassifier` this function is called repeatedly on 1D arrays, as opposed to once on a 2D array. In the worst case, I could revert to `scipy.stats.mode` and `sklearn.utils.weighted_mode` for `RadiusNeighborsClassifier`, but maintaining two different implementations is annoying.

TODO

  • fix the remaining 2 test failures on the comparison between the multi-output and single-output case.
  • investigate RadiusNeighborsClassifier.predict performance regression.

@jnothman
Member

The radius neighbours input is perfect for transforming into a 2d CSR matrix. Just set indices to the concatenated neighbour indices, and indptr to the cumulative sums of array([len(x) for x in neighbor_indices]) (with a leading zero).

I suspect you can efficiently use sparse matrices to compute the result for both.
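This suggestion can be sketched as follows (a hypothetical `neighbors_to_csr` helper, not code from the PR). Note that `indptr` must be the running cumulative sum of the per-query neighbor counts, prefixed with a zero, rather than the raw lengths:

```python
import numpy as np
from scipy import sparse


def neighbors_to_csr(neigh_ind, n_fit_samples):
    """Pack the ragged per-query neighbor index lists returned by a
    radius-neighbors query into a single CSR indicator matrix of shape
    (n_queries, n_fit_samples). Assumes at least one query point."""
    lengths = [len(ind) for ind in neigh_ind]
    # indptr[i]:indptr[i + 1] delimits row i, so it is the cumulative sum
    # of the row lengths with a leading zero.
    indptr = np.concatenate([[0], np.cumsum(lengths)])
    indices = np.concatenate(neigh_ind)
    data = np.ones(indptr[-1])
    return sparse.csr_matrix(
        (data, indices, indptr), shape=(len(neigh_ind), n_fit_samples)
    )
```

Once the ragged query results are in this 2D form, a single vectorized sparse operation can replace the per-query 1D calls that dominate `RadiusNeighborsClassifier.predict`.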

@rth rth marked this pull request as draft September 1, 2020 11:16
@rth
Member Author

rth commented Sep 1, 2020

Thanks for the pointer. I'll try to investigate, but first I need to fix the test failure which seems to point to a genuine bug.

Base automatically changed from master to main January 22, 2021 10:51
@rth
Member Author

rth commented Apr 22, 2021

@jjerphan In case you are interested in continuing this, it should also give an easy performance win for KNN classification :) As a first step, making this happen just for KNeighborsClassifier would already be a good start.
It would need re-running the benchmarks, merging with main, and fixing the two remaining test failures.

@jjerphan
Member

@rth: thanks for the comment!

I might have a look after #19950, which might help improve the performance of neighbors.KNeighborsClassifier.predict :)

@rth
Member Author

rth commented Apr 22, 2021

Thanks!

@jjerphan
Member

@rth: #24076 might be preferred over this PR. I'll let you close this one if you agree.


Successfully merging this pull request may close these issues.

knn predict unreasonably slow b/c of use of scipy.stats.mode