MAINT Use _validate_params in Nearest Centroid #23874

2357juan · 2022-07-09T21:35:37Z

Reference Issues/PRs

towards #23462
Ref #23890

What does this implement/fix? Explain your changes.

Added _parameter_constraints class variable to Nearest Centroid.
Added metrics pairwise _VALID_METRICS constraints.

Any other comments?

The left bound of the shrink_threshold of 0 may need some review.
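
As a rough illustration of declaring constraints on a class and checking them at fit time, here is a minimal standalone sketch. It does not use scikit-learn's private validation helpers: the class `TinyNearestCentroid`, the `SUPPORTED_METRICS` set, and the predicate-based constraint format are all hypothetical. The strictly positive bound on `shrink_threshold` is only one possible answer to the open question above.

```python
from numbers import Real

# Hypothetical set of accepted metric names, for illustration only.
SUPPORTED_METRICS = {"euclidean", "manhattan"}


class TinyNearestCentroid:
    """Toy estimator; not scikit-learn's NearestCentroid."""

    # Parameter name -> (human-readable description, validity predicate).
    # This format is an assumption made for the sketch.
    _parameter_constraints = {
        "metric": (
            "a str among {'euclidean', 'manhattan'}",
            lambda v: isinstance(v, str) and v in SUPPORTED_METRICS,
        ),
        "shrink_threshold": (
            "None or a real number > 0",
            lambda v: v is None
            or (isinstance(v, Real) and not isinstance(v, bool) and v > 0),
        ),
    }

    def __init__(self, metric="euclidean", shrink_threshold=None):
        self.metric = metric
        self.shrink_threshold = shrink_threshold

    def _validate_params(self):
        # Check every declared constraint and fail with a readable message.
        for name, (description, is_valid) in self._parameter_constraints.items():
            value = getattr(self, name)
            if not is_valid(value):
                raise ValueError(
                    f"The {name!r} parameter must be {description}. "
                    f"Got {value!r} instead."
                )

    def fit(self, X, y):
        # Validation runs first, so bad parameters surface at fit time.
        self._validate_params()
        return self
```

With this shape, `TinyNearestCentroid(metric="mahalanobis").fit(X, y)` raises a `ValueError` that names the offending parameter before any computation happens.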

sklearn/neighbors/_nearest_centroid.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel · 2022-07-19T15:54:34Z

Since this is fixing the bug reported in #23890, could you please add a FIX changelog entry for this estimator targeting 1.2? See doc/whats_new/v1.2.rst.

Something like: NearestCentroid now only explicitly accepts Euclidean and Manhattan metrics and raises an informative error message at fit time otherwise, instead of failing with a low-level error message at predict time.

ogrisel · 2022-07-19T15:56:55Z

Also, please remove the unnecessary import reported by the linting runner on the continuous integration:

./sklearn/neighbors/_nearest_centroid.py:22:1: F401 'sklearn.metrics.pairwise._VALID_METRICS' imported but unused
from sklearn.metrics.pairwise import _VALID_METRICS
^

ogrisel · 2022-07-19T15:59:13Z

I just realized that the common test is also failing:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=44573&view=logs&j=dde5042c-7464-5d47-9507-31bdd2ee0a3a&t=4bd2dad8-62b3-5bf9-08a5-a9880c530c94

but I don't understand why. Any idea @jeremiedbb or @glemaitre?

sklearn/neighbors/_nearest_centroid.py

Valentin-Laurent

The documentation for the metric param should probably be updated.
Beware, this will conflict with main (see merged PR #23806).

ogrisel · 2022-07-20T08:33:32Z

@glemaitre we actually need a changelog entry as explained in #23874 (comment).

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

glemaitre · 2022-07-20T11:54:02Z

@glemaitre we actually need a changelog entry as explained in #23874 (comment).

Oh yes, I see. @2357juan Could you solve the linting problem and add an entry in the what's new?

Micky774 · 2022-07-21T16:32:19Z

You'll also need to modify some tests in sklearn/neighbors/tests/test_nearest_centroid.py to not use unsupported metrics, e.g.

test_iris
test_iris_shrinkage

and you can safely delete test_precomputed in the same file.

jeremiedbb · 2022-07-22T13:13:59Z

Since this is fixing the bug reported in #23890, could you please add a FIX changelog entry for this estimator targeting 1.2? See doc/whats_new/v1.2.rst.

Something like: NearestCentroid now only explicitly accepts Euclidean and Manhattan metrics and raises an informative error message at fit time otherwise, instead of failing with a low-level error message at predict time.

@ogrisel, the failure that only happens at predict time only applies to wminkowski, seuclidean or mahalanobis. Many other metrics don't fail but are now rejected. As discussed in #23890, all metrics other than euclidean or manhattan raise a warning saying that averaging for these metrics is not supported, taking the mean instead. I don't think completely removing support for these metrics is a "bug fix"; the behavior is intentional. If we really want to drop support for these metrics, we should at least do a deprecation cycle imo.
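
To make the described behavior concrete, here is a minimal sketch of a per-class centroid computation that warns and falls back to the mean for metrics other than euclidean or manhattan. The function name `class_centroid` and the warning text are hypothetical, not scikit-learn's. The median branch for manhattan reflects my understanding of the estimator and is not stated in this thread, so treat it as an assumption.

```python
import warnings
from statistics import mean, median


def class_centroid(rows, metric):
    """Return one centroid (a list of floats) for the samples of one class.

    `rows` is a list of equal-length feature lists. Hypothetical helper,
    written only to illustrate the warn-then-use-the-mean behavior.
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    if metric == "manhattan":
        # Assumption: the feature-wise median is used for manhattan.
        return [median(col) for col in columns]
    if metric != "euclidean":
        # Paraphrased warning; not the library's exact wording.
        warnings.warn(
            "Averaging is not supported for this metric; "
            "using the mean instead.",
            UserWarning,
        )
    return [mean(col) for col in columns]
```

Under this sketch a metric such as cosine still produces a centroid, which is why rejecting it outright changes behavior rather than fixing a crash.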

jeremiedbb · 2022-07-22T13:37:24Z

I pushed a more conservative validation of the metric options that only discards the metrics that fail at predict time. I think that if we really want to drop support for all metrics except euclidean and manhattan, we should do a proper deprecation in a separate PR.
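
A minimal sketch of the conservative approach, assuming a flat list of candidate metric names: only the three metrics named in this thread as failing at predict time are removed, and everything else stays allowed. `CANDIDATE_METRICS` is a hypothetical sample, and `conservative_metric_options` and `check_metric` are illustrative helpers rather than scikit-learn functions.

```python
# Metrics named in this thread as failing only at predict time.
FAILS_AT_PREDICT = frozenset({"wminkowski", "seuclidean", "mahalanobis"})

# Hypothetical sample of metric names; the real list lives in scikit-learn.
CANDIDATE_METRICS = (
    "cosine",
    "euclidean",
    "mahalanobis",
    "manhattan",
    "seuclidean",
    "wminkowski",
)


def conservative_metric_options(candidates, rejected=FAILS_AT_PREDICT):
    """Return the sorted candidate names that are not in `rejected`."""
    return sorted(set(candidates) - set(rejected))


def check_metric(metric, allowed):
    """Raise a readable error at fit time if `metric` is not allowed."""
    if metric not in allowed:
        raise ValueError(
            f"metric must be one of {allowed}. Got {metric!r} instead."
        )
    return metric
```

For the sample above, `conservative_metric_options(CANDIDATE_METRICS)` keeps cosine, euclidean and manhattan, so metrics that merely warn remain usable until a separate deprecation decides otherwise.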

glemaitre · 2022-07-27T08:36:27Z

LGTM then. We can open a subsequent PR as suggested by @jeremiedbb

Valentin-Laurent · 2022-08-01T17:38:24Z

LGTM then. We can open a subsequent PR as suggested by @jeremiedbb

I'm working on it (if that's OK with you @2357juan)

Juan Gomez added 2 commits July 9, 2022 14:02

adding validate-params to nearest centroid

b7fa6a1

ran black on modifications

fe219e1

github-actions bot added the module:neighbors label Jul 9, 2022

2357juan mentioned this pull request Jul 9, 2022

Make all estimators use _validate_params #23462

Closed

Micky774 added the "No Changelog Needed" and "Validation" (related to input validation) labels Jul 10, 2022

ogrisel mentioned this pull request Jul 13, 2022

NearestCentroid not handling properly distance metrics other than Manhattan or Euclidean. #23890

Closed

ogrisel reviewed Jul 13, 2022

sklearn/neighbors/_nearest_centroid.py

Update sklearn/neighbors/_nearest_centroid.py

e078104

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel removed the No Changelog Needed label Jul 19, 2022

glemaitre reviewed Jul 19, 2022

sklearn/neighbors/_nearest_centroid.py

glemaitre added the No Changelog Needed label Jul 19, 2022

Valentin-Laurent reviewed Jul 20, 2022

ogrisel removed the No Changelog Needed label Jul 20, 2022

Update sklearn/neighbors/_nearest_centroid.py

e8855f9

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

glemaitre added No Changelog Needed and removed No Changelog Needed labels Jul 20, 2022

Merge remote-tracking branch 'upstream/main' into pr/2357juan/23874

47fa743

jeremiedbb added 3 commits July 22, 2022 15:28

what's new

c9031b9

conservative valid metric options

018e0a6

lint

7545165

Merge branch 'main' into nearest-centroid-validate-params

15cde18

glemaitre merged commit a601e8c into scikit-learn:main Jul 27, 2022

Valentin-Laurent mentioned this pull request Aug 2, 2022

API Deprecate metrics other than euclidean and manhattan for NearestCentroid #24083

Merged
