[WIP] Pre-compute norms to speed-up NearestNeighbors.kneighbors algorithm='brute' metric='euclidean' #10212


Closed
wants to merge 2 commits

Conversation

lesteve
Member

@lesteve lesteve commented Nov 28, 2017

This problem was spotted in the context of https://github.com/erikbern/ann-benchmarks. In particular, look at http://www.itu.dk/people/pagh/SSS/ann-benchmarks/rand-euclidean-data_10_-1_rand-euclidean-query_euclidean.html, where bruteforce (NearestNeighbors with algorithm='brute') is a lot slower than bruteforce-blas (which is similar to fast_euclidean_neighbors in the snippet below).

Below is a simple snippet showing a 3x speed-up in NearestNeighbors.kneighbors. The cost is storing the precomputed squared norms of X at fit time.

import numpy as np

from sklearn.utils.extmath import row_norms
from sklearn.neighbors import NearestNeighbors

n_samples = int(1e6)
n_features = 1000
n_neighbors = 10
n_queries = 1
metric = 'euclidean'

rng = np.random.RandomState(42)
X = rng.randn(n_samples, n_features)
X_norms = row_norms(X, squared=True)

def fast_euclidean_neighbors(X_queries, n_neighbors):
    """Quick function extracting bits and pieces from the NearestNeighbors code."""
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2. The -2 q.x term is a single
    # GEMM, and the precomputed X_norms avoid recomputing ||x||^2 per query.
    dist_squared = np.dot(X_queries, X.T)
    dist_squared *= -2
    dist_squared += row_norms(X_queries, squared=True)[:, np.newaxis]
    dist_squared += X_norms
    # Partial sort: select the n_neighbors smallest distances per query row.
    indices = np.argpartition(dist_squared, n_neighbors - 1, axis=1)
    indices = indices[:, :n_neighbors]

    # Sort only the selected candidates so neighbors come out by increasing distance.
    neighbors_range = np.arange(n_queries)[:, np.newaxis]
    dist_squared = dist_squared[neighbors_range, indices]
    arg_ind = np.argsort(dist_squared, axis=1)
    dist = np.sqrt(dist_squared[neighbors_range, arg_ind])
    indices = indices[neighbors_range, arg_ind]
    return dist, indices

X_queries = rng.randn(n_queries, n_features)

res_fast = fast_euclidean_neighbors(X_queries, n_neighbors)

nn = NearestNeighbors(algorithm='brute', metric=metric)
nn.fit(X)
res_kneighbors = nn.kneighbors(X_queries, n_neighbors=n_neighbors)

np.testing.assert_allclose(res_fast, res_kneighbors)

print('fast_euclidean_neighbors')
%timeit fast_euclidean_neighbors(X_queries, n_neighbors)
print('NearestNeighbors.kneighbors')
%timeit nn.kneighbors(X_queries, n_neighbors=n_neighbors)
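The trick above rests on the identity ||q − x||² = ||q||² − 2 q·x + ||x||², which lets the precomputed ||x||² terms be reused across queries. A minimal standalone check (illustrative variable names, not from the PR) that this vectorized form matches scipy's reference distances:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.RandomState(0)
X = rng.randn(50, 8)
Q = rng.randn(3, 8)

# Squared row norms of X, computed once (done at fit time in the PR).
X_sq = (X ** 2).sum(axis=1)
Q_sq = (Q ** 2).sum(axis=1)

# ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, with the cross term as one GEMM.
d2 = -2 * Q.dot(X.T) + Q_sq[:, np.newaxis] + X_sq
d2 = np.maximum(d2, 0)  # guard against tiny negative values from rounding
np.testing.assert_allclose(np.sqrt(d2), cdist(Q, X), atol=1e-10)
```

The expansion can lose a little precision when points are close together, hence the clamp to zero before the square root.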

scikit-learn master:

fast_euclidean_neighbors
430 ms ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
NearestNeighbors.kneighbors
1.44 s ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This PR:

fast_euclidean_neighbors
424 ms ± 9.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
NearestNeighbors.kneighbors
421 ms ± 6.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I'd like to do more extensive benchmarks; suggestions about which parameters to vary and which parameter ranges to use are more than welcome.

@lesteve lesteve changed the title [WIP] Pre-compute norms to speed-up NearestNeighbors.kneighbors [WIP] Pre-compute norms to speed-up NearestNeighbors.kneighbors algorithm='brute' metric='euclidean' Nov 28, 2017
@lesteve lesteve force-pushed the improve-kneighbors-brute branch from a5bd59d to 3d757bd Compare November 28, 2017 13:17
@rth
Member

rth commented Nov 29, 2017

I like the idea! So this would speed up the kneighbors method at the cost of a somewhat slower fit.

Here are some additional benchmarks where I look at the speed-up of the fit and kneighbors methods for randomly selected cases in a parameter space defined by n_samples, n_features, n_queries and n_neighbors.

In summary:

  • For dense arrays, this change indeed speeds up the kneighbors method by a factor of 1.5-2 (best case 2-3, worst case 0.95). In exchange, fit is slowed down, with a "speed-up" factor of 0.45-0.7. Overall the speed-up for fit + kneighbors is 1.1-1.3, with a best case around 3 and a worst case of 0.95.
  • For sparse arrays, this change would change both fit and kneighbors by a factor somewhere between 0.25 and 0.6 (i.e. make everything much slower), so it's probably better to keep the current code path for sparse arrays.

Note that this uses your simplified fast_euclidean_neighbors function rather than the actual code in the PR.
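A sketch of the kind of parameter sweep described above, timing fit and kneighbors separately over a small grid (sizes and grid values here are illustrative, not the actual benchmark configuration):

```python
import itertools
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
results = []
# Vary n_samples and n_features; a real sweep would also vary
# n_queries and n_neighbors, and use larger sizes.
for n_samples, n_features in itertools.product([1000, 5000], [10, 200]):
    X = rng.randn(n_samples, n_features)
    Q = rng.randn(10, n_features)
    nn = NearestNeighbors(algorithm='brute', metric='euclidean')
    t0 = time.perf_counter()
    nn.fit(X)
    t_fit = time.perf_counter() - t0
    t0 = time.perf_counter()
    nn.kneighbors(Q, n_neighbors=5)
    t_query = time.perf_counter() - t0
    results.append((n_samples, n_features, t_fit, t_query))

for row in results:
    print('n_samples=%d n_features=%d fit=%.4fs kneighbors=%.4fs' % row)
```

Comparing the same sweep on master and on this branch would give per-cell speed-up ratios like those summarized above.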

Also, this would ignore the n_jobs parameter for euclidean distances, which I think is reasonable, as it's better to let BLAS handle the parallelism. So a warning could be raised when a user attempts to compute euclidean distances on dense arrays with n_jobs > 1 (which would partially address #8216).

@jnothman
Member

Test failures suggest you've got logic errors.

In terms of implementation, can't we just add the norms in effective_metric_params_?
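One hedged sketch of how that suggestion could look: since euclidean_distances already accepts a Y_norm_squared keyword, the norms stored at fit time could simply be passed through the metric kwargs at query time (the dict name fitted_params below is illustrative; the actual PR may wire this up differently):

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.utils.extmath import row_norms

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
Q = rng.randn(5, 20)

# At fit time: store the squared norms of the fitted data, playing the
# role of the extra entry in effective_metric_params_.
fitted_params = {'Y_norm_squared': row_norms(X, squared=True)[np.newaxis, :]}

# At query time: reuse them, so only the -2 * Q @ X.T GEMM and the
# query-side norms remain to be computed.
d_fast = euclidean_distances(Q, X, **fitted_params)
d_ref = euclidean_distances(Q, X)
np.testing.assert_allclose(d_fast, d_ref)
```

This keeps the change local to how the metric params are assembled, rather than special-casing the euclidean path in kneighbors itself.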

In what cases would you recommend algorithm='brute' for euclidean? If sparse, please benchmark that case too.

@rth
Member

rth commented Dec 14, 2017

In what cases would you recommend algorithm='brute' for euclidean

In my experience, when computing pairwise distances with n_features above a few hundred, where BallTree doesn't work so well.
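To make that concrete, a small sanity sketch comparing the two algorithms at moderately high dimension (sizes kept tiny here; the point is that both return identical neighbors, while at n_features in the hundreds brute's single GEMM typically outruns BallTree's tree traversal, which degrades toward a full scan in high dimension):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.randn(2000, 300)   # high-dimensional: tree-based pruning degrades
Q = rng.randn(5, 300)

ind = {}
for algo in ('brute', 'ball_tree'):
    nn = NearestNeighbors(algorithm=algo, metric='euclidean').fit(X)
    _, ind[algo] = nn.kneighbors(Q, n_neighbors=10)

# Both algorithms agree on the neighbors; only the runtime differs.
assert np.array_equal(ind['brute'], ind['ball_tree'])
```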

@jnothman
Member

jnothman commented Dec 14, 2017 via email

@lesteve lesteve force-pushed the improve-kneighbors-brute branch from 3d757bd to b7caa23 Compare December 22, 2017 16:26
@amueller amueller added the Needs Benchmarks, Performance and Stalled labels Aug 5, 2019
Base automatically changed from master to main January 22, 2021 10:49
@lesteve
Member Author

lesteve commented May 5, 2021

This is no longer relevant; something similar has probably been done by @jeremiedbb's work on nearest neighbors.

@lesteve lesteve closed this May 5, 2021
Labels: module:neighbors, Needs Benchmarks, Performance, Stalled
4 participants