Skip to content

FIX Add "sqeuclidean" to valid metrics for PairwiseDistancesReduction #24771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jjerphan
Copy link
Member

Reference Issues/PRs

Relates to #24556.

What does this implement/fix? Explain your changes.

This adds "sqeuclidean" to valid metrics for PairwiseDistancesReduction which simplies has been forgotten for inclusion in the past.

Any other comments?

Originally observed by @Vincent-Maladiere in #24556.

Any other distance metrics that I forgot? 🙃

This metric has been forgotten in the past.
@@ -73,7 +73,7 @@ def valid_metrics(cls) -> List[str]:
"hamming",
*BOOL_METRICS,
}
return sorted(set(METRIC_MAPPING.keys()) - excluded)
return sorted(({"sqeuclidean"} | set(METRIC_MAPPING.keys())) - excluded)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use _VALID_METRICS instead ?

_VALID_METRICS = [

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_VALID_METRICS and METRIC_MAPPING.keys() intersect but none isn't included in the other.

METRIC_MAPPING.keys() is the list of DistanceMetric within scikit-learn while _VALID_METRICS has other distance metrics (like "sqeuclidean" that we conceptually support without any DistanceMetric via the Euclidean* specialisation).

Hence, I think f9beef1 can be reverted and we can implement new DistanceMetric in the future (or have our be in common with SciPy's).

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the metrics in VALID_METRICS that are not in METRIC_MAPPING:

    "correlation",
    "cosine",
    "sqeuclidean",
    "yule",
    "nan_euclidean",

and here are the ones that are in METRIC_MAPPING but not in VALID_METRICS:

    'p'
    'infinity'
    'pyfunc'

I agree, let's keep using METRIC_MAPPING for now. At some point it would nice to have more consistency throughout the pairwise module, but let's leave that for later :)

@jjerphan
Copy link
Member Author

jjerphan commented Nov 3, 2022

The error observed on the CI should be fixed by the NaN guard introduced in #24542, another small PR I am currently assessing any performance regression of.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@jeremiedbb jeremiedbb merged commit 363ce86 into scikit-learn:main Nov 18, 2022
@jjerphan jjerphan deleted the fix/pdr-add-sqeuclidean-valid-metrics branch November 18, 2022 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants