Description
Note: This is not necessarily a 'problem' to be fixed but it could be better documented at least.
In `pairwise_distances`, `check_pairwise_arrays` is not run if `metric` is a callable or a metric in `PAIRWISE_DISTANCE_FUNCTIONS`.
`check_pairwise_arrays` notably ensures the inputs are 2D and performs `check_array`, which does more than I fully understand, so I can't confidently comment on whether it is 'needed'.
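For reference, a minimal sketch of what this validation does when called directly. It assumes scikit-learn and NumPy are installed; the sample inputs and the `raises_value_error` helper are made up for illustration and are not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import check_pairwise_arrays

# Plain nested lists of ints (illustrative data).
X = [[0, 1], [1, 0], [2, 2]]
Y = [[1, 1]]

# Valid input comes back as 2D floating-point arrays.
X_checked, Y_checked = check_pairwise_arrays(X, Y)


def raises_value_error(*args):
    """Hypothetical helper: True if check_pairwise_arrays rejects the input."""
    try:
        check_pairwise_arrays(*args)
    except ValueError:
        return True
    return False


# 1D input is rejected by the default 2D check.
rejects_1d = raises_value_error(np.array([0.0, 1.0]), np.array([1.0, 1.0]))

# X and Y with a different number of features are rejected as well.
rejects_mismatch = raises_value_error(np.ones((3, 2)), np.ones((1, 3)))
```

Any code path that skips this call does not get these conversions and checks unless the metric performs them itself.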
Most `PAIRWISE_DISTANCE_FUNCTIONS` metrics run `check_pairwise_arrays` themselves but the following do not:
- `haversine_distances` (this is implemented in Cython but does seem to check that `X` is 2D)
- `cosine_distances`
For user-provided `metric` callables, I think we leave it up to the user to ensure the input is correct. I found this comment (#27456 (comment)) in an old PR, about leaving it to the user to ensure the input is 2D.
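To make "leaving it to the user" concrete, here is a rough sketch of a callable that validates its own inputs. It relies on the documented contract that a callable metric receives two 1D vectors and returns a single value. The decorator `with_input_checks` and the `manhattan` metric are hypothetical names, and plain Python sequences are used so the sketch has no dependencies.

```python
import math


def with_input_checks(metric):
    """Hypothetical decorator: validate each pair of vectors before calling metric."""

    def checked(x, y):
        # Convert to floats; non-numeric entries raise here.
        x = [float(v) for v in x]
        y = [float(v) for v in y]
        if len(x) != len(y):
            raise ValueError(f"length mismatch: {len(x)} != {len(y)}")
        if not all(math.isfinite(v) for v in x + y):
            raise ValueError("inputs must not contain NaN or infinity")
        return metric(x, y)

    return checked


@with_input_checks
def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))


distance = manhattan([0, 1], [2, 3])
```

A callable wrapped this way could then be passed as `metric`, so the checks run per pair of rows regardless of what the caller validates.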
We also do have the note in scikit-learn/sklearn/metrics/pairwise.py, lines 228 to 229 in 35c431d.
I am not sure this comment is in the best place, but that is an easy fix. `check_array` is also not performed for user-provided metric callables, but maybe this is not 'needed'?
cc @jeremiedbb and maybe @adrinjalali ?
Context: noticed while working on #29822