Skip to content

MAINT Use _validate_params in Nearest Centroid #23874

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

2357juan
Copy link
Contributor

@2357juan 2357juan commented Jul 9, 2022

Reference Issues/PRs

towards #23462
Ref #23890

What does this implement/fix? Explain your changes.

Added _parameter_constraints class variable to Nearest Centroid.
Added metrics pairwise _VALID_METRICS constraints.

Any other comments?

The left bound of the shrink_threshold of 0 may need some review.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ogrisel
Copy link
Member

ogrisel commented Jul 19, 2022

Since this is fixing the bug reported in #23890, could you please add a FIX changelog entry for this estimator targeting 1.2? See doc/whats_new/v1.2rst.

Something like: NearestCentroid now only explicitly accepts Euclidean and Manhattan metrics and raise an informative error message at fit-time otherwise instead of failing with a low-level error message at predict-time.

@ogrisel
Copy link
Member

ogrisel commented Jul 19, 2022

Also, please remove the unecessary import reported by the linting runner on the continuous integration:

./sklearn/neighbors/_nearest_centroid.py:22:1: F401 'sklearn.metrics.pairwise._VALID_METRICS' imported but unused
from sklearn.metrics.pairwise import _VALID_METRICS
^

@ogrisel
Copy link
Member

ogrisel commented Jul 19, 2022

Copy link
Contributor

@Valentin-Laurent Valentin-Laurent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably that the documentation about the metric param should be updated.
Beware, this will conflict with main(see merged PR #23806).

@ogrisel
Copy link
Member

ogrisel commented Jul 20, 2022

@glemaitre we actually need a changelog entry as explained in #23874 (comment).

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre
Copy link
Member

@glemaitre we actually need a changelog entry as explained in #23874 (comment).

Oh yes I see. @2357juan Could you solve the linting problem and add an entry in the what's new

@Micky774
Copy link
Contributor

You'll also need to modify some tests in sklearn/neighbors/tests/teast_nearest_centroid.py to not use unsupported metrics, e.g.

test_iris
test_iris_shrinkage

and you can safely delete test_precomputed in the same file.

@jeremiedbb
Copy link
Member

Since this is fixing the bug reported in #23890, could you please add a FIX changelog entry for this estimator targeting 1.2? See doc/whats_new/v1.2rst.

Something like: NearestCentroid now only explicitly accepts Euclidean and Manhattan metrics and raise an informative error message at fit-time otherwise instead of failing with a low-level error message at predict-time.

@ogrisel, the issue with the failure happening only at predict time only applies to wminkowski, seuclidean or mahalanobis. Many other metrics don't fail but are now rejected. As discussed in #23890, all metrics other than euclidean or manhattan raise a warning saying that averaging for these metrics is not supported, taking the mean instead. I don't think completely removing support for these metrics is a "bug fix", the behavior is intentional. If we really want to not support these metrics we should at least do a deprecation cycle imo.

@jeremiedbb
Copy link
Member

I pushed a more conservative validation of the metric options with only discarding the failing at predict metrics. I think that if we really want to not support all metrics but euclidean and manhattan, we should make a proper deprecation in a separate PR.

@glemaitre glemaitre merged commit a601e8c into scikit-learn:main Jul 27, 2022
@glemaitre
Copy link
Member

LGTM then. We can open a subsequent PR as suggested by @jeremiedbb

@Valentin-Laurent
Copy link
Contributor

LGTM then. We can open a subsequent PR as suggested by @jeremiedbb

I'm working on it (if that's OK with you @2357juan)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:neighbors Validation related to input validation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants