Skip to content

[MRG+2] algorithm='auto' should always work for nearest neighbors (continuation) #9145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 19, 2017

Conversation

herilalaina
Copy link
Contributor

Reference Issue

Fixes #4931, continuation of #5596

What does this implement/fix? Explain your changes.

Implement test for metric in ['mahalanobis', 'wminkowski', 'seuclidean']

Any other comments?

Should we warn the user when the algorithm is set into brute? (instead of auto)

if metric not in _metrics:
nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto',
metric=metric).fit(Xcsr)
if metric != "precomputed":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that normal that I can't call kneighbors method on sparse matrix when metric='precomputed'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That case is not implemented: it would need to be treated a bit differently to the precomputed dense array case; and it doesn't really make sense. An array containing distances which is mostly 0s is pretty useless for nearest neighbors. (Unless we're admitting negative distances, or having a different interpretation of 0s, both of which may be useful.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't understand why precomputed should be in VALID_METRICS_SPARSE and hence why it's relevant tot his code path.

@herilalaina herilalaina changed the title [WIP] algorithm='auto' should always work for nearest neighbors (continuation) [MRG] algorithm='auto' should always work for nearest neighbors (continuation) Jun 17, 2017
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise LGTM

if metric not in _metrics:
nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto',
metric=metric).fit(Xcsr)
if metric != "precomputed":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That case is not implemented: it would need to be treated a bit differently to the precomputed dense array case; and it doesn't really make sense. An array containing distances which is mostly 0s is pretty useless for nearest neighbors. (Unless we're admitting negative distances, or having a different interpretation of 0s, both of which may be useful.)

if metric not in _metrics:
nn = neighbors.NearestNeighbors(n_neighbors=3, algorithm='auto',
metric=metric).fit(Xcsr)
if metric != "precomputed":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't understand why precomputed should be in VALID_METRICS_SPARSE and hence why it's relevant tot his code path.

@@ -988,6 +990,38 @@ def custom_metric(x1, x2):
assert_array_almost_equal(dist1, dist2)


def test_algo_auto_metrics():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a better name or comment would be appreciated

``'effective_metric_'`` is in the ``'VALID_METRICS'`` list of
``'ball_tree'``. It selects ``'brute'`` if :math:`k < N/2` and the
``'effective_metric_'`` is not in the ``'VALID_METRICS'`` list of
``'kd_tree'`` or ``'ball_tree'``. It selects ``'brute'`` if :math:`k >= N/2`. This choice is based on the assumption that the number of query points is at least the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please keep this under 80 chars

@herilalaina
Copy link
Contributor Author

Thanks for the review. The change has been done!

@jnothman jnothman changed the title [MRG] algorithm='auto' should always work for nearest neighbors (continuation) [MRG+1] algorithm='auto' should always work for nearest neighbors (continuation) Jun 17, 2017
@@ -988,6 +990,46 @@ def custom_metric(x1, x2):
assert_array_almost_equal(dist1, dist2)


def test_unsupported_metric_for_auto():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean by unsupported. It's clearly not unsupported if you're testing it works.

@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
X = rng.rand(12, 12)
Xcsr = csr_matrix(X)

# Metric which don't required any additional parameter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the comment a bit misleading. I would rename the variable and say only test metrics that don't require additional arguments.
And maybe assert that some strange metric is in VALID_METRICS['brute'].

Checking that the test is non-empty, and more didactic variable name
@amueller amueller changed the title [MRG+1] algorithm='auto' should always work for nearest neighbors (continuation) [MRG+2] algorithm='auto' should always work for nearest neighbors (continuation) Jun 19, 2017
@amueller
Copy link
Member

LGTM, I made minor minor changes to fix my nitpicks, merge on green.

@herilalaina
Copy link
Contributor Author

Thank you!

@amueller amueller merged commit 60deaea into scikit-learn:master Jun 19, 2017
@herilalaina herilalaina deleted the iss4931 branch June 19, 2017 19:10
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
…ntinuation) (scikit-learn#9145)

* Merge neighbors.rst

* issue scikit-learn#4931

* Improve test implementation

* Update base.py

* Remove unused import

* Customize test for precomputed metric

* Change test function name

* rename _metrics to require_params, add set assert

Checking that the test is non-empty, and more didactic variable name

* Remove blank line
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

algorithm='auto' should always work for nearest neighbors
4 participants