-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Add dtype preservation for SpectralClustering
#22669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test_spectral.py
to test implementations on 32bit datasetsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
According to some irl discussions, such tests should only be added after Let's keep this PR open in the mean time. |
Resurrecting this PR: |
SpectralClustering
@@ -744,7 +744,7 @@ def fit(self, X, y=None): | |||
params["coef0"] = self.coef0 | |||
self.affinity_matrix_ = pairwise_kernels( | |||
X, metric=self.affinity, filter_params=True, **params | |||
) | |||
).astype(X.dtype, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should work on making pairwise_kernels
work efficiently with float32
data first.
Since affinity="rbf"
is the default, flagging SpectralClustering
as dtype
preserving without that prerequisite would be misleading to our users: there would be very few performance or peak memory usage gains in passing float32
data to this estimator when using the default hyper-params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we also need a test that checks the dtype of affinity_matrix_
when the affinity
param is the str name of a kernel function and another such assertion in a test that covers the case when affinity is nearest_neighbors
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to have ArgKmin
and RadiusNeighbors
preserve dtypes because it is also used when affinity="precomputed_nearest_neighbors"
.
Let's turn this PR as a draft for now.
@@ -96,11 +97,12 @@ def test_spectral_clustering_sparse(assign_labels): | |||
assert adjusted_rand_score(y, labels) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the dtype of affinity_matrix_
here.
@@ -122,13 +124,14 @@ def test_precomputed_nearest_neighbors_filtering(): | |||
assert_array_equal(results[0], results[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the dtype of affinity_matrix_
here.
Closing for now, might reopen later. |
Reference Issues/PRs
Partially addresses #22881
Precedes #22590
What does this implement/fix? Explain your changes.
This parametrizes tests from
test_spectral.py
to run on 32bit datasets.Any other comments?
We could introduce a mechanism to be able to able to remove tests' execution on 32bit datasets if this takes too much time to complete.