Skip to content

[ENH, DRAFT] Implement Precomputed distances in the Pairwise Distances Reduction framework #26032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

adam2392
Copy link
Member

Reference Issues/PRs

Towards: ##25888

This is just a draft and I will mark it ready for review when I am done.

What does this implement/fix? Explain your changes.

  • Adds PrecomputedDistanceMatrix subclass of DatasetsPair, which has the precomputed distance matrix between two pairs of arrays
  • Adds the relevant API and changes in the logic to support a user passing in 'precomputed' as the metric.

Any other comments?

Thanks @jjerphan for pointing me in the right direction initially with his commit at jjerphan@78b3aa4#diff-2c0fbe478f4ed2459849f7227487a1df0f7cac915508a970dd96220418ca5784

Signed-off-by: Adam Li <adam2392@gmail.com>
@jjerphan
Copy link
Member

Hi @adam2392, thank you for contributing!

Do you need some guidance for this contribution?

@adam2392
Copy link
Member Author

Do you need some guidance for this contribution?

Hey @jjerphan! Sorry been busy and I have a deadline coming up so haven't had the chance to work on this yet. I still need to review the code design, but from my initial understanding, adding precomputed distances should not be that difficult or involved, since most of the "computing distances" dispatch part is not used(?)

@jjerphan
Copy link
Member

No problem, focus on your deadline. We can follow-up later ; you are not under a SLA or something, I just wanted to know if my instructions were clear.

@Micky774
Copy link
Contributor

@adam2392 I just wanted to ping and check if you were still interested in pursuing this PR. Totally fine if it is not a priority in the near future :)

@adam2392
Copy link
Member Author

Hey @Micky774 yeah I'm interested in helping out with this push, but then got a bunch out on my plate. Once neurips rebuttal period is over I was going to revisit this.

If it's urgent tho, feel free to proceed!

@Micky774
Copy link
Contributor

Hey @Micky774 yeah I'm interested in helping out with this push, but then got a bunch out on my plate. Once neurips rebuttal period is over I was going to revisit this.

If it's urgent tho, feel free to proceed!

Thanks for the update! No rush on our part either, just trying to get a sense of where we are at regarding backend work. Best of luck during the rebuttal period 😄

@adam2392 adam2392 marked this pull request as ready for review February 17, 2024 14:52
@adam2392 adam2392 marked this pull request as draft February 17, 2024 14:52
Copy link

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


black

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=23.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py	2024-02-17 14:53:35.062353 +0000
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py	2024-02-17 14:53:51.460787 +0000
@@ -76,11 +76,13 @@
             # of distances in this case.
             # TODO: implement a stable simultaneous_sort.
             "hamming",
             *BOOL_METRICS,
         }
-        return sorted(({"sqeuclidean", "precomputed"} | set(METRIC_MAPPING64.keys())) - excluded)
+        return sorted(
+            ({"sqeuclidean", "precomputed"} | set(METRIC_MAPPING64.keys())) - excluded
+        )
 
     @classmethod
     def is_usable_for(cls, X, Y, metric) -> bool:
         """Return True if the dispatcher can be used for the
         given parameters.
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py

Oh no! 💥 💔 💥
1 file would be reformatted, 907 files would be left unchanged.

ruff

ruff detected issues. Please run ruff --fix --show-source . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.2.1.


warning: The `--show-source` argument is deprecated. Use `--output-format=full` instead.
warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py:81:89: E501 Line too long (97 > 88)
   |
79 |             *BOOL_METRICS,
80 |         }
81 |         return sorted(({"sqeuclidean", "precomputed"} | set(METRIC_MAPPING64.keys())) - excluded)
   |                                                                                         ^^^^^^^^^ E501
82 | 
83 |     @classmethod
   |

Found 1 error.

Generated for commit: 124a8dc. Link to the linter CI: here

@kyrajeep
Copy link

Hello @adam2392,

I am checking in on the status of this PR. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants