Skip to content

FEA Introduce PairwiseDistances, a generic back-end for pairwise_distances #26983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 46 commits into
base: main
Choose a base branch
from

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented Aug 1, 2023

Reference Issues/PRs

Towards #23958
Supersedes #25561

What does this implement/fix? Explain your changes.

From #25561:
This simplifies the original implementation of PairwiseDistance by @jjerphan, with the following differences:

  • PairwiseDistance{32,64} doesn't subclass BaseDistancesReduction{32,64} anymore.
  • This allows to add _parallel_on_{X,Y} methods to PairwiseDistance{32,64}, since these methods are decorated with @final in BaseBaseDistancesReduction{32,64} and thus can't be overwritten.
  • This also remove the chunk computing mechanism, by considering only the case chunk_size = 1, as proposed by @ogrisel in this comment.
  • This doesn't implement the Euclidean specialization yet to make benchmarks simpler.

This PR:

  • Preserves existing specializations and dispatches based on benchmark-driven heuristic (benchmarks will be published soon)
  • Simplifies usability criteria for PairwiseDistances
  • Changes backend dispatch process in pairwise_distances
  • Adds n_jobs parameter to PairwiseDistances to preserve n_jobs semantics from pairwise_distances
  • Moves X_is_Y attribute to PairwiseDistances to minimize change to unrelated API (all that would be affected by DatasetsPair)
  • Cleans misc. changes/comments and updates PR
  • Minimizes diff

Any other comments?

Benchmarks will be coming soon

Vincent-Maladiere and others added 30 commits February 7, 2023 10:06
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
This allows using older versions of threadpoolctl.
@github-actions
Copy link

github-actions bot commented Aug 1, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7254cb0. Link to the linter CI: here

@Micky774 Micky774 marked this pull request as ready for review August 2, 2023 13:04
@Micky774
Copy link
Contributor Author

Micky774 commented Aug 2, 2023

@jjerphan @Vincent-Maladiere When developing this, had either of you investigated the viability of parallelizing DistinceMetric.{csr_}pdist directly, and using that + additional validation in pairwise_distances?

@jjerphan
Copy link
Member

jjerphan commented Aug 4, 2023

I do not think that we have tried.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants