ENH: Faster Eigen Decomposition For Isomap & KernelPCA #31247

yaichm · 2025-04-24T17:29:04Z

Fixes #31246

Implemented randomized_eigh(selection='value') and integrated it into KernelPCA and Isomap

- Introduced a new eigenvalue decomposition strategy, randomized_eigh(selection='value'), for faster computation.
- Integrated this solver into both KernelPCA and Isomap as an alternative to the dense solvers.
- Added tests in test_extmath.py to validate the accuracy of the decomposition.
- Benchmarked against existing solvers, comparing:
  - Execution time in KernelPCA and Isomap
  - Reconstruction error in Isomap

The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comment below.
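For readers unfamiliar with the approach, the idea of a randomized eigendecomposition with selection by value can be sketched as follows. This is a minimal NumPy illustration and not the code from this PR: the function name `randomized_eigh_by_value`, its parameters, and the test matrix are all assumptions made for the example. It uses a randomized range finder with power iterations, followed by a small dense `eigh` on the projected matrix.

```python
import numpy as np


def randomized_eigh_by_value(M, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the largest (signed) eigenpairs of a symmetric matrix M.

    Hypothetical helper for illustration only; not scikit-learn's API.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    sketch_size = min(n, n_components + n_oversamples)

    # Randomized range finder with QR-stabilized power iterations.
    Q = rng.standard_normal((n, sketch_size))
    for _ in range(n_iter + 1):
        Q, _ = np.linalg.qr(M @ Q)

    # Project onto the small subspace and solve a dense symmetric problem.
    B = Q.T @ M @ Q
    B = (B + B.T) / 2  # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(B)  # eigenvalues in ascending order

    # 'value' selection: keep the largest eigenvalues by signed value.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], Q @ eigvecs[:, order]


# Illustrative 50x50 symmetric, non-PSD matrix with a known spectrum.
rng = np.random.default_rng(42)
U, _ = np.linalg.qr(rng.standard_normal((50, 50)))
spectrum = np.concatenate(([10.0, 8.0, -9.0], rng.uniform(-0.1, 0.1, 47)))
M = (U * spectrum) @ U.T

approx_vals, approx_vecs = randomized_eigh_by_value(M, n_components=2)
```

One caveat of this sketch: the range finder favors directions with large eigenvalues in absolute value, so the oversampling has to be large enough for the subspace to contain the wanted positive eigenvalues even when large negative ones are present.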


github-actions · 2025-04-24T17:30:02Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 94e1961. Link to the linter CI: here

ogrisel · 2025-04-25T12:15:22Z

Thanks for the PR @yaichm.

Please follow the instructions of the automated comment above to resolve the failing linter continuous integration.

If you need help (e.g. the instructions are not clear enough), please let us know with a specific description of your attempt at resolving the problems and the problem you faced.

> The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comment below.

Looking forward to it. Please feel free to ping me once done.

Dlimim · 2025-04-26T14:38:15Z

In this Kernel PCA benchmark (focus on the left side of the graph), our custom randomized_value solver scales better than the standard solvers when the number of components is large.

Dlimim · 2025-04-26T14:41:52Z

As the number of samples increases, our randomized_value solver consistently achieves much lower execution times than the full solver, and it remains efficient even on large datasets.

Dlimim · 2025-04-26T14:52:31Z

In this Isomap benchmark, both solvers show comparable execution times for a small number of components. However, as the number of components increases (> 10), randomized_value significantly outperforms the full solver, running much faster on large datasets.

Dlimim · 2025-04-26T15:00:12Z

The projections obtained with the auto solver and the randomized solver are visually very similar, confirming that the randomized solver preserves the structure and quality of the embedding. This highlights its reliability alongside its faster execution.

Dlimim · 2025-04-26T15:08:23Z

The reconstruction error is very similar between the auto and randomized solvers across different sample sizes. This confirms that the randomized solver preserves the quality of the reconstruction while offering faster execution.

yaichm · 2025-04-27T14:51:07Z

I’ve followed the automated instructions and fixed the linter issues.

The benchmark result graphs have also been added.

@ogrisel, feel free to review whenever you are available.

smarie · 2025-04-28T12:23:03Z

Thanks to the team for finalizing these results!

To summarize:

- On KernelPCA, we see similar results with method 5.3 (randomized eigenvalues) as with the "trick" of using 4.3 (randomized SVD). This makes me quite confident that the implementation is correct.
- On Isomap, where the "trick" could not be used (since matrices are not guaranteed to be PSD):
  - When the number of components is low (2), there is no drawback: the speed is the same as the current method based on arpack, and the result on the sample dataset is identical (same visual figure, same reconstruction error).
  - We see clear speed benefits when the number of components is high (right figure of this message, where 50 components are selected). @yaichm, it would be interesting to see the reconstruction error for this situation, to make sure that the speed gain was not obtained at the cost of a higher reconstruction error.

yaichm · 2025-04-28T20:56:06Z

As requested by @smarie, here is the image showing the reconstruction error for 50 components.

smarie · 2025-05-05T07:31:19Z

@ogrisel I think the team is now ready for a first review (see summary #31247 (comment))

smarie · 2025-05-05T07:34:43Z

sklearn/utils/tests/test_extmath.py

@@ -198,10 +198,6 @@ def test_randomized_eigsh(dtype):
    # eigenvectors
    assert eigvecs.shape == (4, 2)

-    # with 'value' selection method, the negative eigenvalue does not show up


Please replace this section with:

    eigvals, eigvecs = _randomized_eigsh(X, n_components=2, selection="value")
    # eigenvalues
    assert eigvals.shape == (2,)
    assert_array_almost_equal(eigvals, [3.0, 1.0])  # signed ordering: positive eig
    # eigenvectors
    assert eigvecs.shape == (4, 2)

smarie · 2025-05-05T07:38:06Z

sklearn/utils/tests/test_extmath.py

+    # make a random PSD matrix
+    X = make_sparse_spd_matrix(n_features, random_state=0)


This is PSD, but we should also check the non-PSD case, so let's create a symmetric random matrix instead. If you wish, you can test both cases:

- add a @pytest.mark.parametrize("is_psd", (True, False)) decorator to the test;
- in the test, branch on `if is_psd:` to create the random X.
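One possible way to build the two kinds of input is sketched below. `make_symmetric_matrix` is a hypothetical helper invented for this example (it is not part of scikit-learn or of this PR); in the actual test it would be called with the `is_psd` value supplied by `pytest.mark.parametrize`.

```python
import numpy as np


def make_symmetric_matrix(n_features, is_psd, seed=0):
    """Return a random symmetric matrix, PSD or not (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_features, n_features))
    if is_psd:
        # A Gram matrix is symmetric positive semidefinite by construction.
        return A @ A.T
    # Symmetrizing a Gaussian matrix gives a symmetric matrix that
    # generally has eigenvalues of both signs.
    return (A + A.T) / 2


X_psd = make_symmetric_matrix(10, is_psd=True)
X_sym = make_symmetric_matrix(10, is_psd=False)
```

A test can then compare the randomized solver against a dense reference such as `np.linalg.eigh` on both matrices.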

smarie · 2025-05-05T07:53:40Z

sklearn/utils/extmath.py

+    with bounded error. Unlike the 'module' strategy, it works efficiently with
+    non-positive semidefinite matrices, handling both positive and negative
+    eigenvalues directly.


I would rather suggest a simple comment here:

Suggested change

-    with bounded error. Unlike the 'module' strategy, it works efficiently with
-    non-positive semidefinite matrices, handling both positive and negative
-    eigenvalues directly.
+    with bounded error. Unlike the 'module' strategy, it returns the top `k` eigenvalues by decreasing value: all the positive ones first, then the negative ones if any.

And to add a more detailed comment in the Strategy "module" description:

Strategy 'module': (...existing definition...) Unlike the 'value' strategy, this returns the top `k` eigenvalues by decreasing **module** (absolute value). Therefore, when `M` is not positive semidefinite, large negative eigenvalues will be returned before small positive ones. When `M` is PSD, both strategies lead to the same results, as all eigenvalues are non-negative.
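The difference between the two orderings can be illustrated with an exact dense reference. This is only a sketch: `top_k_eigh` is a hypothetical helper, not the randomized solver from this PR, and the diagonal matrix is an assumed toy input chosen to be consistent with the expected values in the test above.

```python
import numpy as np


def top_k_eigh(M, k, selection="module"):
    """Dense reference for both selection strategies (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending by signed value
    if selection == "module":
        # Largest absolute value first, whatever the sign.
        order = np.argsort(-np.abs(eigvals))[:k]
    elif selection == "value":
        # Largest signed value first: positive ones before negative ones.
        order = np.argsort(-eigvals)[:k]
    else:
        raise ValueError(f"unknown selection: {selection!r}")
    return eigvals[order], eigvecs[:, order]


# Toy symmetric matrix with eigenvalues 1, -2, 0 and 3 (not PSD).
M = np.diag([1.0, -2.0, 0.0, 3.0])

vals_module, _ = top_k_eigh(M, 2, selection="module")  # picks 3 and -2
vals_value, _ = top_k_eigh(M, 2, selection="value")    # picks 3 and 1
```

On a PSD matrix the two calls would select the same eigenvalues, since absolute value and signed value then coincide.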

oussama er-rabie and others added 8 commits April 16, 2025 23:02

add randomized_eigh(selection='value')

b286daf

add changelog

66573f0

Add randomized_eigsh(selection='value') for fast eigendecomposition with tests and integration into Isomap and KernelPCA

ce755eb

Merge branch 'scikit-learn:main' into feature/randomized_eigsh_value

5bccfbc

Merge remote-tracking branch 'upstream/main' into feature/randomized_eigsh_value

94cc727

Merge remote-tracking branch 'origin/feature/randomized_eigsh_value' into feature/randomized_eigsh_value

13db528

Merge branch 'scikit-learn:main' into feature/randomized_eigsh_value

cbdd2ad

Merge branch 'scikit-learn:main' into feature/randomized_eigsh_value

64da5d2

github-actions bot added module:decomposition module:manifold module:utils labels Apr 24, 2025

yaichm changed the title ~~ENH: Faster Eigen Decomposition For & KernelPCA~~ ENH: Faster Eigen Decomposition For Isomap & KernelPCA Apr 24, 2025

ogrisel mentioned this pull request Apr 25, 2025

Faster Eigen Decomposition for Isomap & KernelPCA #31246

Open

Merge branch 'main' into feature/randomized_eigsh_value

d204484

Mohamed Yaich added 2 commits April 27, 2025 12:12

Fix linting errors and rename the changelog to match the PR

53e0102

Fix docstring of randomized_eigen_decomposition

80f8ee7

Merge branch 'main' into feature/randomized_eigsh_value

161658d

Mohamed Yaich and others added 3 commits April 29, 2025 00:46

Added test for array API compliance

91654c8

Merge branch 'main' into feature/randomized_eigsh_value

ca7f378

Merge branch 'main' into feature/randomized_eigsh_value

4829984

Merge branch 'main' into feature/randomized_eigsh_value

4939734

smarie reviewed May 5, 2025


yaichm added 2 commits May 5, 2025 10:34

Merge branch 'main' into feature/randomized_eigsh_value

e3628d1

Merge branch 'main' into feature/randomized_eigsh_value

94e1961
