Inconsistent check_pairwise_arrays in pairwise_distances #31162

Closed
@lucyleeow

Note: This is not necessarily a problem to be fixed, but it could at least be better documented.

In pairwise_distances, check_pairwise_arrays is not run if metric is a callable or a metric in PAIRWISE_DISTANCE_FUNCTIONS.
check_pairwise_arrays notably ensures the inputs are 2D and runs check_array, which does more than I fully understand, so I can't say with confidence whether it is needed.

Most PAIRWISE_DISTANCE_FUNCTIONS metrics run check_pairwise_arrays themselves but the following do not:

  • haversine_distances (this is implemented in Cython but does seem to check that X is 2D)
  • cosine_distances
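
For reference, here is a minimal sketch of what calling check_pairwise_arrays directly looks like. It assumes a recent scikit-learn release in which the function is importable from sklearn.metrics.pairwise. The array values are made up for illustration, and it only demonstrates the 2D check mentioned above, not everything check_array does.

```python
import numpy as np
from sklearn.metrics.pairwise import check_pairwise_arrays

# Illustrative data only: 3 samples, 2 features.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

# With Y=None, the validated X is returned for both outputs.
X_checked, Y_checked = check_pairwise_arrays(X, None)

# A 1D input is rejected because 2D arrays are enforced by default.
try:
    check_pairwise_arrays(np.array([0.0, 1.0]), None)
    rejected_1d = False
except ValueError:
    rejected_1d = True
```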

For user-provided metric callables, I think we leave it up to the user to ensure the input is correct. I found this comment (#27456 (comment)) in an old PR, about leaving it to the user to ensure the input is 2D.

We also do have the note:

# Only check the number of features if 2d arrays are enforced. Otherwise,
# validation is left to the user for custom metrics.

I am not sure this comment is in the best place, but that is an easy fix. check_array is also not performed for user-provided metric callables, but maybe this is not needed?
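
As a rough illustration of what "leaving validation to the user" could look like in practice, the sketch below validates the inputs explicitly before passing a custom callable to pairwise_distances. The names manhattan_metric and validated_pairwise are hypothetical helpers invented for this example, not part of scikit-learn, and this is only one possible approach rather than a recommended pattern.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import check_pairwise_arrays


def manhattan_metric(u, v):
    # Hypothetical user metric: receives one row of X and one row of Y.
    return float(np.abs(u - v).sum())


def validated_pairwise(X, Y=None):
    # Hypothetical helper: validate explicitly (2D, matching feature
    # counts), then delegate to pairwise_distances with the callable.
    X, Y = check_pairwise_arrays(X, Y)
    return pairwise_distances(X, Y, metric=manhattan_metric)


# Illustrative data only.
X = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
D = validated_pairwise(X)
```

With this wrapper, a 1D input raises a ValueError from check_pairwise_arrays before the custom metric is ever called.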

cc @jeremiedbb and maybe @adrinjalali ?

Context: noticed while working on #29822