-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FEAT Support precomputed distance matrix for PairwiseDistancesReductions
#29483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
This feature was implemented last year so updates need to be pulled
…nto feat_precomputed
❌ Linting issuesThis PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling You can see the details of the linting issues under the
|
No. Keep in mind that |
Thank you for the clarification |
In class PrecomputedDistanceMatrix, I don't see the option to input the precomputed distance matrix. Should I make that option or is it stored somewhere else? |
Hi! I don't have much time and energy to have a look at it in details but basically:
Let me know if this is unclear. 🙂 |
PairwiseDistancesReductions
…atrix stored in datasets_pair.pyx
Thank you for your comments. They make sense, are helpful, and I believe I am going in the right direction. So far I have made progress on all the points made above by @jjerphan except for creating an instance to store the precomputed matrix. I think even though there are things (referencing the precomputed matrix and syntax) that should be definitely changed a review will be helpful and probably worthwhile at this point :) Higher level review will be helpful enough as I am familiar with the class hierarchy and codebase. I have a couple questions that are not syntax related. We have a class for precomputed that subclasses DatasetsPair. In that class we have surrogate_dist and dist functions. My current understanding is that they are both used to compute distance depending on metric, chunking strategy, etc.. I want to create a method that will take in a precomputed matrix, if provided, and another method to retrieve this matrix to be passed into the compute method. Does that sound about right? As for creating an instance to store the precomputed matrix itself... which file should I include it in? Feedback will be appreciated! Thank you! |
316c72a
to
e2ef246
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx.tp
Outdated
Show resolved
Hide resolved
Signed-off-by: Julien Jerphanion <git@jjerphan.xyz>
e2ef246
to
3a027c7
Compare
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Thank you for the reviews! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, run the linters and remove the unused fixture in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, revert changes to this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need tests checking that the results of PairwiseDistancesReduction
obtained respectively on (X, Y)
or on their precomputed distance matrix are identical.
Catching up on this PR. I wanted to mention that I don't know if I like the current API with To enable this, we could simply enforce an API where What do you all think? |
Thank you for shiming in, @Micky774. We can indeed revise the API to get them in line with the public ones. |
Hi @Micky774, I'm sorry to take a while to respond. That's a great point. I agree that including 'precomputed' in one of the options for metrics is clearer. How about making the change in a separate PR so that this PR does not get too lengthy? Thanks! |
Considering that it is a fairly key part of the API being affected by this PR, I think it is better to resolve it here. In my opinion it costs less to fix it now, than it would to start up a new PR for that change. And no worries on the delay -- I appreciate you continuing your work here! |
Ok, thank you :) |
While working on implementing the change in the API to enable 'precomputed' as one of the metrics, I realized that X = precomputed argument may lead to issues due to the dimensions being different from X. This is from the current DatasetsPair docstring: X : {ndarray, sparse matrix} of shape (n_samples_X, n_features) What do you all think? |
I'm not sure I see the problem here. If you're worried about the fact that an If that's not what your concern is, then I may need a bit more of an explanation :) |
Thank you for your input, @Micky774 ! That's exactly what I meant, but shouldn't we verify the shape of the precomputed matrix to throw an error if the dimension is not correct? What do you think, @jjerphan ? :) |
I would look at and implement the same logic as the one of estimator taking a data matrix or a pre-computed distance matrix for |
Sounds good. Thank you. |
is_usable = False | ||
|
||
# is_usable = (X is not None and Y is not None) ^ bool(precomputed) | ||
if is_usable == False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't is_usable
be undefined if metric != "precomputed"
?
@@ -1622,7 +1709,7 @@ def test_radius_neighbors_classmode_strategy_consistent(outlier_label): | |||
X=X, | |||
Y=Y, | |||
radius=radius, | |||
metric=metric, | |||
metric=metric, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metric=metric, | |
metric=metric, |
Towards #25888
(1) To take in precomputed distance directly, uses 'compute' in {ArgKmin, RadiusNeighbors}{ClassMode} {32,64}.
(2) When this is received after a XOR check (X, Y or precomputed) initializes an instance of class PrecomputedDistance.
(3) This data is formatted (wrapped?) by datasets_pair.py
(4) It is passed onto the backend by using the get_for() factory method.
Thank you to @jjerphan for the reviews!
Co-authors: @jjerphan @adam2392