MAINT Parameters validation for sklearn.metrics.pairwise_distances_argmin #26124

Charlie-XIAO
Contributor

Reference Issues/PRs

Towards #24862.

What does this implement/fix? Explain your changes.

Adds automatic parameter validation for sklearn.metrics.pairwise_distances_argmin.
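As a rough illustration of what declarative parameter validation means, the sketch below uses only the standard library. The decorator `validate_params_example`, the set `VALID_METRICS_EXAMPLE`, and the function `argmin_stub` are hypothetical names made up for this example; this is not scikit-learn's own validation machinery.

```python
import functools
import inspect

# Hypothetical stand-in for a list of accepted metric names.
VALID_METRICS_EXAMPLE = {"euclidean", "manhattan", "cosine"}


def validate_params_example(constraints):
    """Check named arguments against simple constraints before the call.

    Each constraint is either a set of allowed strings or the built-in
    `callable`, meaning that any callable object is accepted.
    """

    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, options in constraints.items():
                value = bound.arguments[name]
                ok = any(
                    callable(value)
                    if opt is callable
                    else (isinstance(value, str) and value in opt)
                    for opt in options
                )
                if not ok:
                    raise ValueError(
                        f"The {name!r} parameter of {func.__name__} "
                        f"is invalid: got {value!r}."
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@validate_params_example({"metric": [VALID_METRICS_EXAMPLE, callable]})
def argmin_stub(X, Y, metric="euclidean"):
    # Placeholder body: a real function would compute distances here.
    return metric if isinstance(metric, str) else "callable"
```

With this sketch, `argmin_stub([], [], metric="cosine")` passes the check, while an unknown string such as `"not-a-metric"` raises `ValueError` before the function body runs.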

@Charlie-XIAO
Contributor Author

I'm not sure which values metric is allowed to take. Using only ArgKmin.valid_metrics() is not enough, because some metrics such as "cosine" are also allowed. Using only _VALID_METRICS does not seem to be enough either (I think), because I tested metrics such as "infinity" and "p" and no errors were raised. Please let me know the correct way to do this and I will make the changes ASAP.

@jeremiedbb added the "No Changelog Needed" and "Validation" (related to input validation) labels Apr 12, 2023
@glemaitre
Member

My intuition would be to have the same metrics as allowed in the neighbors:

VALID_METRICS = dict(
    ball_tree=BallTree._valid_metrics,
    kd_tree=KDTree._valid_metrics,
    # The following list comes from the
    # sklearn.metrics.pairwise doc string
    brute=sorted(set(PAIRWISE_DISTANCE_FUNCTIONS).union(SCIPY_METRICS)),
)
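To make the `brute` entry concrete, here is a small standard-library sketch of the same union pattern. The dict `PAIRWISE_DISTANCE_FUNCTIONS_EXAMPLE` and the list `SCIPY_METRICS_EXAMPLE` are hypothetical stand-ins with made-up contents, not the real scikit-learn objects; the point is only that `set(some_dict)` collects the dict's keys, which are then merged with a list of names and sorted.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine(x, y):
    """Cosine distance (1 - cosine similarity) for non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


# Hypothetical registry mapping metric names to callables.
PAIRWISE_DISTANCE_FUNCTIONS_EXAMPLE = {"euclidean": euclidean, "cosine": cosine}

# Hypothetical list of metric names handled by another backend.
SCIPY_METRICS_EXAMPLE = ["chebyshev", "euclidean", "hamming"]

# set(dict) iterates over keys, so this is the union of all metric names.
brute_metrics = sorted(
    set(PAIRWISE_DISTANCE_FUNCTIONS_EXAMPLE).union(SCIPY_METRICS_EXAMPLE)
)
print(brute_metrics)
```

Names that appear in both sources, such as "euclidean" here, are listed only once in the result.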

@Charlie-XIAO
Contributor Author

@glemaitre It seems that _VALID_METRICS (as below) is a superset of SCIPY_METRICS in sklearn.neighbors.

_VALID_METRICS = [

Originally, _VALID_METRICS was defined well after pairwise_distances_argmin, so I moved it above that function. However, PAIRWISE_DISTANCE_FUNCTIONS is a dict mapping names to callables, so I cannot move it to the top of the module, where those callables (functions) have not yet been defined. What would you suggest as the best workaround?
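One generic way around this kind of definition-order problem, sketched here with the standard library only, is to look the registry up at call time rather than at definition time. This is an illustration under assumptions, not necessarily what the merged change does; `check_metric` and `DISTANCE_FUNCTIONS_EXAMPLE` are hypothetical names.

```python
import math


def check_metric(metric):
    """Return a distance callable for `metric`, or raise ValueError.

    DISTANCE_FUNCTIONS_EXAMPLE is resolved when this function is called,
    not when it is defined, so the registry may appear later in the module.
    """
    if callable(metric):
        return metric
    if isinstance(metric, str) and metric in DISTANCE_FUNCTIONS_EXAMPLE:
        return DISTANCE_FUNCTIONS_EXAMPLE[metric]
    raise ValueError(f"Unknown metric: {metric!r}")


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


# Defined after check_metric, once the callables exist.
DISTANCE_FUNCTIONS_EXAMPLE = {"euclidean": euclidean, "manhattan": manhattan}
```

Because Python resolves global names when a function body executes, `check_metric("manhattan")` works even though the registry is defined below it. The trade-off is that the set of valid names is no longer available as a constant at decoration time.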

Member

@adrinjalali adrinjalali left a comment

I agree we could make the valid metrics the same in most places, if not everywhere, but that should probably be a separate PR.

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR, @Charlie-XIAO!

Member

@thomasjpfan thomasjpfan left a comment

LGTM

@thomasjpfan thomasjpfan merged commit 5b9ec99 into scikit-learn:main May 24, 2023
@Charlie-XIAO Charlie-XIAO deleted the param-val-pairwise_distances_argmin_min branch September 23, 2023 13:37
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…rn#26124)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>