-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Inconsistent check_pairwise_arrays
in pairwise_distances
#31162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
For metrics in PAIRWISE_DISTANCE_FUNCTIONS, it delegates the check to the metric functions since they already do it. Otherwise it would result in a double validation.
Yes, in case of a callable, all validations are left to the callable. It's done on purpose to allow for use cases that we don't have within sklearn but may be useful more generally like distances between strings. |
To summarize, I think the status of |
Sure but we could at least leave a comment in the code about:
|
Indeed adding a comment would not hurt :) |
That was wrong. We still call So in the end the use of |
Yeah I saw that later too. Agreed, it's enough |
Note: This is not necessarily a 'problem' to be fixed but it could be better documented at least.
In
pairwise_distances
,check_pairwise_arrays
is not run ifmetric
is a callable or a metric inPAIRWISE_DISTANCE_FUNCTIONS
.check_pairwise_arrays
notably ensures the inputs are 2D and performscheck_array
, which does more than I fully understand, so I can't confidently comment on whether it is 'needed'.Most
PAIRWISE_DISTANCE_FUNCTIONS
metrics runcheck_pairwise_arrays
themselves but the following do not:haversine_distances
(this is implemented in Cython but does seem to check thatX
is 2D)cosine_distances
For user provided
metric
callables, I think we leave it up to the user to ensure input is correct. I found this comment (#27456 (comment)) in and old PR, about leaving it to the user to ensure the input is 2D.We also do have the note:
scikit-learn/sklearn/metrics/pairwise.py
Lines 228 to 229 in 35c431d
I am not sure this comment is in the best place, but that is an easy fix.
check_array
is also not performed for user provided metric callables, but maybe this is not 'needed' ?cc @jeremiedbb and maybe @adrinjalali ?
Context: noticed while working on #29822
The text was updated successfully, but these errors were encountered: