Fitting a NearestNeighbors model fails with sparse input and a callable as metric #9199

Closed
tttthomasssss opened this issue Jun 22, 2017 · 11 comments · Fixed by #9579

Comments

@tttthomasssss
Contributor

tttthomasssss commented Jun 22, 2017

Description

Fitting a NearestNeighbors model fails when a) the distance metric used is a callable and b) the input to the NearestNeighbors model is sparse.

Steps/Code to Reproduce

from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def sparse_metric(x, y): # Some metric accepting sparse input
    return x.count_nonzero() / y.count_nonzero()

A = sparse.random(10, 5, density=0.3, format='csr')

nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)

Expected Results

No error is thrown when passing a callable as metric with sparse input

Actual Results

ValueError                                Traceback (most recent call last)
<ipython-input-2-a9d2fd7f843b> in <module>()
      7 A = sparse.random(10, 5, density=0.3, format='csr')
      8 
----> 9 nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)

/Volumes/LocalDataHD/thk22/.virtualenvs/nlpy3/lib/python3.5/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
    797             or [n_samples, n_samples] if metric='precomputed'.
    798         """
--> 799         return self._fit(X)

/Volumes/LocalDataHD/thk22/.virtualenvs/nlpy3/lib/python3.5/site-packages/sklearn/neighbors/base.py in _fit(self, X)
    213             if self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:
    214                 raise ValueError("metric '%s' not valid for sparse input"
--> 215                                  % self.effective_metric_)
    216             self._fit_X = X.copy()
    217             self._tree = None

ValueError: metric '<function sparse_metric at 0x1097d0378>' not valid for sparse input

Some Analysis/Wild Speculation

The problem seems to be that, for sparse input, the code only checks whether the given metric is in the list of metrics that accept sparse input. It never checks whether the metric is a string or a callable, so a callable can never pass: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/base.py#L210
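
To illustrate the speculation above, here is a minimal stand-alone sketch of a membership-only check. It does not import scikit-learn: `VALID_SPARSE_BRUTE` and `check_metric` are hypothetical stand-ins for `VALID_METRICS_SPARSE['brute']` and the check inside `_fit`, and the metric names in the list are only an assumed subset.

```python
# Hypothetical stand-in for VALID_METRICS_SPARSE['brute']; the real list
# lives in scikit-learn and may contain different entries.
VALID_SPARSE_BRUTE = ["cosine", "euclidean", "manhattan"]


def sparse_metric(x, y):
    # Same callable as in the reproduction script.
    return x.count_nonzero() / y.count_nonzero()


def check_metric(metric):
    """Mimic a check that only tests membership in a list of metric names."""
    if metric not in VALID_SPARSE_BRUTE:
        raise ValueError("metric '%s' not valid for sparse input" % metric)


check_metric("euclidean")  # a known string metric passes

try:
    check_metric(sparse_metric)  # a function is never equal to a string
except ValueError as exc:
    message = str(exc)
    print(message)
```

Because a function object never compares equal to any string in the list, every callable ends up in the error branch, regardless of whether it can handle sparse input.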

Versions

Darwin-15.6.0-x86_64-i386-64bit
Python 3.5.1 (default, Dec  8 2015, 06:00:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.2
@jnothman
Member

I think this may be fixed in master due to #9145

@jnothman
Member

B

@jnothman jnothman reopened this Jun 22, 2017
@jnothman
Member

Please check

@tttthomasssss
Contributor Author

I just cloned master and it produces the same error. The line of code I referenced above was actually from master as well.

ValueError                                Traceback (most recent call last)
<ipython-input-2-a9d2fd7f843b> in <module>()
      7 A = sparse.random(10, 5, density=0.3, format='csr')
      8 
----> 9 nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)

~/DevSandbox/InfiniteSandbox/scikit-learn/sklearn/neighbors/base.py in fit(self, X, y)
    801             or [n_samples, n_samples] if metric='precomputed'.
    802         """
--> 803         return self._fit(X)

~/DevSandbox/InfiniteSandbox/scikit-learn/sklearn/neighbors/base.py in _fit(self, X)
    214             if self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:
    215                 raise ValueError("metric '%s' not valid for sparse input"
--> 216                                  % self.effective_metric_)
    217             self._fit_X = X.copy()
    218             self._tree = None

ValueError: metric '<function sparse_metric at 0x1122a57b8>' not valid for sparse input

Versions

Darwin-15.6.0-x86_64-i386-64bit
Python 3.5.1 (default, Dec 8 2015, 06:00:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.13.0
SciPy 0.19.0
Scikit-Learn 0.19.dev0

@tttthomasssss
Contributor Author

A very simple fix would be to change Line 214 in sklearn/neighbors/base.py from

if self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:

to

if not callable(self.effective_metric_) and self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:
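
The effect of the proposed condition can be sketched outside scikit-learn as follows. This is only an illustration under assumptions: `VALID_SPARSE_BRUTE` and `check_metric_fixed` are hypothetical names, and the list is an assumed subset of the real `VALID_METRICS_SPARSE['brute']`.

```python
# Hypothetical stand-in for VALID_METRICS_SPARSE['brute'].
VALID_SPARSE_BRUTE = ["cosine", "euclidean", "manhattan"]


def check_metric_fixed(metric):
    """Reject unknown string metrics, but let any callable through."""
    if not callable(metric) and metric not in VALID_SPARSE_BRUTE:
        raise ValueError("metric '%s' not valid for sparse input" % metric)
    return metric


def sparse_metric(x, y):
    # Same callable as in the reproduction script.
    return x.count_nonzero() / y.count_nonzero()


check_metric_fixed(sparse_metric)  # callable: accepted
check_metric_fixed("euclidean")    # listed string metric: accepted

try:
    check_metric_fixed("not_a_metric")  # unlisted string: still rejected
except ValueError as exc:
    rejected = str(exc)
    print(rejected)
```

With this guard, string metrics are still validated against the list, while a callable is accepted as-is and it is left to the user to make sure it handles sparse rows.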

@jnothman
Member

jnothman commented Jun 22, 2017 via email

@tttthomasssss
Contributor Author

Cool, I'll submit a PR along with a test for it.

@taineleau-zz

@tttthomasssss have you fixed this bug?

@jnothman
Member

No pull request yet

@tttthomasssss
Contributor Author

@taineleau @jnothman thanks for the reminder and apologies for the delay!

I have a fix for the problem itself, but I haven't gotten round to writing an appropriate test for it yet.

I will submit a PR within the next few days.

@taineleau-zz

taineleau-zz commented Aug 16, 2017 via email

tttthomasssss added a commit to tttthomasssss/scikit-learn that referenced this issue Aug 17, 2017
…ghbors model fails with sparse input and a callable as metric
TomDLT pushed a commit that referenced this issue Dec 6, 2017
tttthomasssss added a commit to tttthomasssss/scikit-learn that referenced this issue Dec 14, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017
3 participants