ENH Add dtype preservation for `SpectralClustering` #22669

jjerphan · 2022-03-03T14:59:32Z

Reference Issues/PRs

Partially addresses #22881
Precedes #22590

What does this implement/fix? Explain your changes.

This parametrizes the tests in test_spectral.py so that they also run on 32bit datasets.

Any other comments?

We could introduce a mechanism to skip the tests on 32bit datasets if they take too long to complete.
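A minimal sketch of what such an opt-in mechanism could look like, assuming an environment variable gates the float32 runs. The variable name `RUN_FLOAT32_TESTS` and the helper `dtypes_to_test` are hypothetical and only serve this illustration; they are not taken from this PR.

```python
import os

import numpy as np

# Hypothetical variable name, chosen for this sketch only.
FLOAT32_ENV_VAR = "RUN_FLOAT32_TESTS"


def dtypes_to_test(environ=os.environ):
    """Return the dtypes a parametrized test should run on.

    float64 is always tested; float32 is added only when the
    (hypothetical) environment variable is set to "1".
    """
    dtypes = [np.float64]
    if environ.get(FLOAT32_ENV_VAR, "0") == "1":
        dtypes.append(np.float32)
    return dtypes


# Usage: build the same toy dataset once per selected dtype.
for dtype in dtypes_to_test():
    X = np.ones((4, 2), dtype=dtype)
    assert X.dtype == dtype
```

With a helper like this, the 32bit runs stay off by default and can be switched on in a dedicated CI job.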

sklearn/cluster/tests/test_spectral.py

jeremiedbb

LGTM

jeremiedbb · 2022-06-08T09:37:14Z

According to some irl discussions, such tests should only be added after SpectralClustering can natively work on float32 data. For now it converts to float64 right away.

Let's keep this PR open in the meantime.

jjerphan · 2022-11-16T15:51:14Z

Resurrecting this PR: SpectralClustering should now preserve dtype with the latest commits.

ogrisel · 2022-11-29T17:03:42Z

sklearn/cluster/_spectral.py

@@ -744,7 +744,7 @@ def fit(self, X, y=None):
                params["coef0"] = self.coef0
            self.affinity_matrix_ = pairwise_kernels(
                X, metric=self.affinity, filter_params=True, **params
-            )
+            ).astype(X.dtype, copy=False)


I think we should work on making pairwise_kernels work efficiently with float32 data first.

Since affinity="rbf" is the default, flagging SpectralClustering as dtype preserving without that prerequisite would be misleading to our users: there would be very few performance or peak memory usage gains in passing float32 data to this estimator when using the default hyper-params.
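To illustrate the concern, here is a small NumPy-only sketch. The function `rbf_kernel_float64` is a hypothetical stand-in for a kernel routine that computes in float64 internally; it is not the actual `pairwise_kernels` code path. It shows that casting the result afterwards still requires the full float64 matrix to be allocated first.

```python
import numpy as np


def rbf_kernel_float64(X, gamma=1.0):
    # Hypothetical stand-in: upcast to float64 before computing the kernel.
    X64 = X.astype(np.float64)
    sq_norms = (X64 ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X64 @ X64.T)
    # Clip tiny negative values caused by floating point rounding.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-gamma * sq_dists)


rng = np.random.default_rng(0)
X32 = rng.standard_normal((50, 3)).astype(np.float32)

K64 = rbf_kernel_float64(X32)            # full float64 matrix is allocated
K32 = K64.astype(X32.dtype, copy=False)  # a second, float32 matrix is allocated

# copy=False only avoids the copy when no conversion is needed.
same_dtype_cast = K64.astype(np.float64, copy=False)

print(K64.dtype, K32.dtype, np.shares_memory(K64, K32), same_dtype_cast is K64)
```

The final attribute ends up in float32, but peak memory is driven by the float64 intermediate, so the cast alone brings no gain during `fit`.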

BTW we also need a test that checks the dtype of affinity_matrix_ when the affinity param is the str name of a kernel function and another such assertion in a test that covers the case when affinity is nearest_neighbors.

We also need to have ArgKmin and RadiusNeighbors preserve dtypes because they are also used when affinity="precomputed_nearest_neighbors".

Let's turn this PR into a draft for now.

ogrisel · 2022-11-29T17:05:28Z

sklearn/cluster/tests/test_spectral.py

@@ -96,11 +97,12 @@ def test_spectral_clustering_sparse(assign_labels):
    assert adjusted_rand_score(y, labels) == 1


Please check the dtype of affinity_matrix_ here.

ogrisel · 2022-11-29T17:06:22Z

sklearn/cluster/tests/test_spectral.py

@@ -122,13 +124,14 @@ def test_precomputed_nearest_neighbors_filtering():
    assert_array_equal(results[0], results[1])


Please check the dtype of affinity_matrix_ here.

jjerphan · 2023-03-13T14:27:21Z

Closing for now, might reopen later.

TST Adapt test_spectral.py to test implementations on 32bit datasets

582aa0b

jjerphan added the No Changelog Needed label Mar 3, 2022

github-actions bot added the module:cluster label Mar 3, 2022

jjerphan marked this pull request as ready for review March 4, 2022 13:19

jjerphan added 2 commits March 23, 2022 16:39

Merge branch 'main' into tst/test_spectral-32bit

2a0b747

TST Use global_dtype

818e85c

jjerphan changed the title ~~TST Adapt test_spectral.py to test implementations on 32bit datasets~~ TST use global_dtype in sklearn/cluster/tests/test_spectral.py Mar 24, 2022

jjerphan mentioned this pull request Mar 24, 2022

Improve tests to make them run on variously typed data using the global_dtype fixture #22881

Open

jjerphan added the Waiting for Reviewer label Mar 24, 2022

jeremiedbb reviewed Mar 25, 2022


Address reviews comments

dcc03f5

jeremiedbb approved these changes Jun 3, 2022


jeremiedbb added the Quick Review label Jun 3, 2022

cmarmo added the float32 label and removed the Waiting for Reviewer and Quick Review labels Jul 16, 2022

jjerphan added 2 commits November 16, 2022 16:39

Merge branch 'main' into tst/test_spectral-32bit

b60a65f

Make sure to preserve dtype in SpectralClustering

1d05d2d

jjerphan changed the title ~~TST use global_dtype in sklearn/cluster/tests/test_spectral.py~~ ENH Add dtype preservation for SpectralClustering Nov 23, 2022

Merge branch 'main' into tst/test_spectral-32bit

ac6dce4

jjerphan added the Waiting for Second Reviewer label Nov 23, 2022

ogrisel reviewed Nov 29, 2022


jjerphan marked this pull request as draft November 30, 2022 09:37

glemaitre requested review from glemaitre and removed request for glemaitre December 28, 2022 14:04

glemaitre removed the Waiting for Second Reviewer label Dec 28, 2022

jjerphan closed this Mar 13, 2023
