
ENH: Faster Eigen Decomposition For Isomap & KernelPCA #31247


Status: Open · wants to merge 20 commits into base: main

Conversation

@yaichm (Contributor) commented Apr 24, 2025

Fixes #31246

Implemented _randomized_eigsh(selection="value") and integrated it into KernelPCA and Isomap.

  • Introduced a new eigenvalue decomposition mode, _randomized_eigsh(selection="value"), for faster computation.

  • Integrated this solver into both KernelPCA and Isomap as an alternative to the dense solvers.

  • Added comprehensive tests in test_extmath.py to validate the decomposition accuracy.

  • Benchmarked against existing solvers, comparing:

    • Execution time in KernelPCA and Isomap
    • Reconstruction error in Isomap

The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comments below.
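For reference, a minimal usage sketch of the proposed option (this assumes this PR's branch is installed; the "randomized_value" solver string is this PR's proposal and does not exist in released scikit-learn):

    # Minimal sketch, assuming this PR's branch: "randomized_value" is the
    # proposed solver name, not a released scikit-learn option.
    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import KernelPCA
    from sklearn.manifold import Isomap

    X, _ = make_swiss_roll(n_samples=2000, random_state=0)

    # KernelPCA with the proposed randomized eigenvalue solver
    kpca = KernelPCA(
        n_components=50, kernel="rbf", eigen_solver="randomized_value", random_state=0
    )
    X_kpca = kpca.fit_transform(X)

    # Isomap with the same proposed solver option
    iso = Isomap(n_components=50, eigen_solver="randomized_value")
    X_iso = iso.fit_transform(X)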

github-actions (bot) commented Apr 24, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e8656f1.

@yaichm changed the title from "ENH: Faster Eigen Decomposition For & KernelPCA" to "ENH: Faster Eigen Decomposition For Isomap & KernelPCA" on Apr 24, 2025
@ogrisel (Member) commented Apr 25, 2025

Thanks for the PR @yaichm.

Please follow the instructions in the automated comment above to resolve the failing linter CI checks.

If you need help (e.g. if the instructions are not clear enough), please let us know with a specific description of what you attempted and the problem you faced.

The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comments below.

Looking forward to it. Please feel free to ping me once done.

@Dlimim commented Apr 26, 2025

[Image: KPCA_Execution_Time_vs_components]

In this Kernel PCA benchmark (focus on the left side of the graph), our custom randomized_value solver shows better scalability compared to standard solvers when the number of components is large.

@Dlimim commented Apr 26, 2025

[Image: KPCA_Execution_Time_vs_Samples]

When increasing the number of samples, our randomized_value solver consistently achieves much lower execution times compared to the full solver. Even with large datasets, randomized_value remains highly efficient and scalable.

@Dlimim commented Apr 26, 2025

[Image: Isomap_Solvers_Comparaison]

In this Isomap benchmark, for a small number of components, both solvers show comparable execution times. However, as the number of components increases (> 10), randomized_value significantly outperforms the full solver, achieving much faster execution on large datasets.

@Dlimim commented Apr 26, 2025

[Image: Isomap_Solvers]

The projections obtained with the auto solver and the randomized solver are visually very similar, confirming that the randomized solver preserves the structure and quality of the embedding. This highlights its reliability alongside its faster execution.

@Dlimim commented Apr 26, 2025

[Image: Isomap_Reconstruction_Erreur]

The reconstruction error is very similar between the auto and randomized solvers across different sample sizes. This confirms that the randomized solver preserves the quality of the reconstruction while offering faster execution.

@yaichm (Contributor, Author) commented Apr 27, 2025

I’ve followed the automated instructions and fixed the linter issues.

The benchmark result graphs have also been added.

@ogrisel, feel free to review whenever you are available.

@smarie (Contributor) commented Apr 28, 2025

Thanks to the team for finalizing these results!

So, to summarize:

  • on KernelPCA, we see similar results with method 5.3 (randomized eigenvalues) as with the "trick" of using 4.3 (randomized SVD). This makes me quite confident that the implementation is correct.
  • on Isomap, where the "trick" could not be used (since the matrices are not guaranteed to be PSD; see the small illustration after this list):
    • when the number of components is low (2), there is no drawback: the speed is the same as with the current arpack-based method, and the result on the sample dataset is identical (same visual figure, same reconstruction error);
    • we see clear speed benefits when the number of components is high (right figure of this message, where 50 components are selected). @yaichm It would be interesting to see the reconstruction error corresponding to this situation, to make sure that the speed gain was not obtained at the cost of a higher reconstruction error.
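To make the PSD caveat concrete, here is a tiny numpy illustration (not from the PR) of why the randomized-SVD "trick" requires a PSD matrix: for a symmetric matrix, the singular values are the absolute values of the eigenvalues, so the signs are lost whenever the matrix is not PSD.

    import numpy as np

    M = np.diag([3.0, -5.0])                   # symmetric, but not PSD
    print(np.linalg.eigvalsh(M))               # [-5.  3.]  signed eigenvalues
    print(np.linalg.svd(M, compute_uv=False))  # [5. 3.]   the sign of -5 is lost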

@yaichm (Contributor, Author) commented Apr 28, 2025

As requested by @smarie, here is the image showing the reconstruction error for 50 components.
[Image: screenshot from 2025-04-28 showing the reconstruction error for 50 components]

@smarie (Contributor) commented May 5, 2025

@ogrisel I think the team is now ready for a first review (see summary #31247 (comment))

@@ -198,10 +198,6 @@ def test_randomized_eigsh(dtype):
# eigenvectors
assert eigvecs.shape == (4, 2)

# with 'value' selection method, the negative eigenvalue does not show up
Review comment (Contributor):

Please replace this section with:

    eigvals, eigvecs = _randomized_eigsh(X, n_components=2, selection="value")
    # eigenvalues
    assert eigvals.shape == (2,)
    assert_array_almost_equal(eigvals, [3.0, 1.0])  # signed ordering: positive eigenvalues first
    # eigenvectors
    assert eigvecs.shape == (4, 2)

Comment on lines +282 to +283
# make a random PSD matrix
X = make_sparse_spd_matrix(n_features, random_state=0)
Review comment (Contributor):

This is PSD, but we should also check the non-PSD case, so let's create a symmetric random matrix instead. If you wish, you can test both cases:

  • add a @pytest.mark.parametrize("is_psd", (True, False))
  • in the test, switch on if is_psd: to create the random X (see the sketch below).
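A minimal sketch of such a parametrized test (the test name, the n_features value, and the non-PSD construction are assumptions; selection="value" only exists on this PR's branch):

    import numpy as np
    import pytest
    from sklearn.datasets import make_sparse_spd_matrix
    from sklearn.utils.extmath import _randomized_eigsh


    @pytest.mark.parametrize("is_psd", (True, False))
    def test_randomized_eigsh_value_psd_and_non_psd(is_psd):
        n_features = 10
        if is_psd:
            # PSD case: sparse symmetric positive definite matrix
            X = make_sparse_spd_matrix(n_features, random_state=0)
        else:
            # non-PSD case: symmetrize a random matrix (mixed-sign eigenvalues)
            rng = np.random.RandomState(0)
            A = rng.randn(n_features, n_features)
            X = (A + A.T) / 2
        # selection="value" is implemented by this PR, not by released sklearn
        eigvals, eigvecs = _randomized_eigsh(X, n_components=2, selection="value")
        assert eigvals.shape == (2,)
        assert eigvecs.shape == (n_features, 2)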

Comment on lines +783 to +785
with bounded error. Unlike the 'module' strategy, it works efficiently with
non-positive semidefinite matrices, handling both positive and negative
eigenvalues directly.
Review comment (Contributor):

I would rather suggest a simpler comment here:

Suggested change
with bounded error. Unlike the 'module' strategy, it works efficiently with
non-positive semidefinite matrices, handling both positive and negative
eigenvalues directly.
with bounded error. Unlike the 'module' strategy, it returns the top `k` eigenvalues by decreasing value: all the positive ones first, then the negative ones if any.

And to add a more detailed comment in the Strategy "module" description:

Strategy 'module':
    (...existing definition...)
    Unlike the 'value' strategy, this returns the top `k` eigenvalues by decreasing **module**. Therefore, when `M` is non-positive semidefinite, large negative eigenvalues will be returned before small positive ones.
    When `M` is PSD, both strategies lead to the same results, as all eigenvalues are positive.
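To make the two orderings concrete, a small numpy illustration (not part of the PR) on a symmetric non-PSD matrix with eigenvalues {3, 1, -5}:

    import numpy as np

    vals = np.linalg.eigvalsh(np.diag([3.0, 1.0, -5.0]))   # symmetric, not PSD

    k = 2
    # 'value' ordering: decreasing signed value -> positive eigenvalues first
    by_value = np.sort(vals)[::-1][:k]                      # [3., 1.]
    # 'module' ordering: decreasing absolute value -> -5 comes before 1
    by_module = vals[np.argsort(np.abs(vals))[::-1][:k]]    # [-5., 3.]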

# -- eigenvectors comparison
assert eigvecs_lapack.shape == (n_features, k)
dummy_vecs = np.zeros_like(eigvecs).T
eigvecs, _ = svd_flip(eigvecs, dummy_vecs)
Review comment (Contributor):

Suggested change
eigvecs, _ = svd_flip(eigvecs, dummy_vecs)
# Fix the signs so that both solvers' eigenvectors have the same sign convention
eigvecs, _ = svd_flip(eigvecs, dummy_vecs)
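For context, a sketch of how this sign fix allows entrywise comparison of eigenvectors from two solvers, since eigenvectors are only defined up to sign (the align_signs helper is illustrative; only the public svd_flip is from scikit-learn):

    import numpy as np
    from sklearn.utils.extmath import svd_flip

    def align_signs(eigvecs):
        # svd_flip expects a (u, v) pair; a zero dummy v means only the column
        # signs of u are adjusted, based on each column's largest-magnitude entry
        u, _ = svd_flip(eigvecs, np.zeros_like(eigvecs).T)
        return u

    # eigvecs_a and eigvecs_b from two solvers can then be compared directly:
    # np.testing.assert_allclose(align_signs(eigvecs_a), align_signs(eigvecs_b))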

@@ -169,7 +171,7 @@ class Isomap(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
"n_neighbors": [Interval(Integral, 1, None, closed="left"), None],
"radius": [Interval(Real, 0, None, closed="both"), None],
"n_components": [Interval(Integral, 1, None, closed="left")],
"eigen_solver": [StrOptions({"auto", "arpack", "dense"})],
"eigen_solver": [StrOptions({"auto", "arpack", "dense", "randomized_value"})],
Review comment (Contributor):

"randomized" could probably be more straightforward as a parameter name.

@@ -57,6 +57,8 @@ class Isomap(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
'dense' : Use a direct solver (i.e. LAPACK)
for the eigenvalue decomposition.

'randomized_value' : Use randomized solver in order to reduce complexity.
Review comment (Contributor):

"randomized" could probably be more straightforward as a parameter name.

@@ -263,7 +263,9 @@ class KernelPCA(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator
"kernel_params": [dict, None],
"alpha": [Interval(Real, 0, None, closed="left")],
"fit_inverse_transform": ["boolean"],
"eigen_solver": [StrOptions({"auto", "dense", "arpack", "randomized"})],
"eigen_solver": [
StrOptions({"auto", "dense", "arpack", "randomized", "randomized_value"})
Review comment (Contributor):

If we really add randomized_value to the KernelPCA solvers, it should be declared in the docstring of the parameters (as for Isomap). If we do this, it should be clear that we expect the results to be equivalent in speed and quality to the current "randomized" (SVD) method, except when a precomputed kernel matrix is provided that is actually not PSD. In that case, the current "randomized" may not lead to correct results, or may even raise an error (if a negative eigenvalue is returned by the solver).

@@ -363,6 +365,14 @@ def _fit_transform_in_place(self, K):
random_state=self.random_state,
selection="module",
)
elif eigen_solver == "randomized_value":
self.eigenvalues_, self.eigenvectors_ = _randomized_eigsh(
@smarie (Contributor) commented May 26, 2025:

Maybe we could add comments to this branch and the previous branch of the if, so that it is easier to remember the difference when maintaining the code:

# Use selection='module' for SVD decomposition (wrong if non-PSD precomputed kernel)

and

Suggested change
self.eigenvalues_, self.eigenvectors_ = _randomized_eigsh(
# Use selection='value' for direct eigenvalue decomposition (robust to non-PSD)
self.eigenvalues_, self.eigenvectors_ = _randomized_eigsh(

random_state=None,
):
"""
Approximate eigenvalue decomposition of a Hermitian matrix A ≈ U Λ U*.
Review comment (Contributor):

Question to core devs: is the symbol ≈ allowed in docstrings?

Suggested change
Approximate eigenvalue decomposition of a Hermitian matrix A ≈ U Λ U*.
Approximate eigenvalue decomposition of a Hermitian matrix A ≈ U Λ U*.

**Halko, Martinsson, and Tropp (2011)**: *Finding Structure with Randomness: Probabilistic Algorithms for Constructing
Approximate Matrix Decompositions*. This method provides a faster alternative to existing eigen decomposition techniques.
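For readers unfamiliar with the reference, here is a minimal self-contained sketch of a Halko-style randomized Hermitian eigensolver (illustrative only: the function name and defaults are made up, and this is not the PR's implementation):

    import numpy as np

    def randomized_eigh_sketch(A, k, n_oversamples=10, n_iter=4, random_state=0):
        """Approximate the top-k eigenpairs (by signed value) of symmetric A."""
        rng = np.random.RandomState(random_state)
        n = A.shape[0]
        # Range finder: capture the dominant subspace with a random projection,
        # refined by a few QR-normalized power iterations
        Q = rng.normal(size=(n, k + n_oversamples))
        for _ in range(n_iter):
            Q, _ = np.linalg.qr(A @ Q)
        # Rayleigh-Ritz: solve the small projected eigenproblem exactly
        eigvals, V = np.linalg.eigh(Q.T @ A @ Q)   # ascending order
        # Keep the k largest by signed value and map back to the full space
        idx = np.argsort(eigvals)[::-1][:k]
        return eigvals[idx], Q @ V[:, idx]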

- Integrated :func:`eigen_decomposition_one_pass` into :class:`sklearn.manifold.Isomap` and
  :class:`sklearn.decomposition.KernelPCA` as an additional option for eigen decomposition.
Review comment (Contributor):

Suggested change
- Integrated :func:`eigen_decomposition_one_pass` into :class:`sklearn.manifold.Isomap` and
- Integrated :func:`randomized_eigen_decomposition` into :class:`sklearn.manifold.Isomap` and


- Added a test suite comparing the new method to existing solvers (:obj:`arpack`, :obj:`dense`, etc.), ensuring numerical
Review comment (Contributor):

Suggested change
- Added a test suite comparing the new method to existing solvers (:obj:`arpack`, :obj:`dense`, etc.), ensuring numerical
- Added a test suite comparing the new method to existing solvers (``'arpack'``, ``'dense'``, etc.), ensuring numerical

Comment on lines +16 to +17
by :user: `Sylvain Marié<@smarie>`, `Mohamed yaich<@yaichm>`, `Oussama Er-rabie<@eroussama>`, `Mohamed Dlimi<@Dlimim>`,
`Hamza Zeroual<@HamzaLuffy>` and `Amine Hannoun<@AmineHannoun>`.
@smarie (Contributor) commented May 26, 2025:

You guys did most of the work and deserve to appear first :)

Suggested change
by :user: `Sylvain Marié<@smarie>`, `Mohamed yaich<@yaichm>`, `Oussama Er-rabie<@eroussama>`, `Mohamed Dlimi<@Dlimim>`,
`Hamza Zeroual<@HamzaLuffy>` and `Amine Hannoun<@AmineHannoun>`.
by `Mohamed yaich<@yaichm>`, `Oussama Er-rabie<@eroussama>`, `Mohamed Dlimi<@Dlimim>`,
`Hamza Zeroual<@HamzaLuffy>`, `Amine Hannoun<@AmineHannoun>` and :user: `Sylvain Marié<@smarie>`.

@@ -0,0 +1,17 @@
Feature
------------

Review comment (Contributor):

Generally, with this changelog I get the impression that it is too big/detailed compared to what the sklearn core team expects. @lesteve or @ogrisel, let us know if there is a guideline here or a reference example PR doing it right, or feel free to propose an alternative text directly.


What you can observe:
----------------------
- The `auto` solver provides a reference solution.
Review comment (Contributor):

The 'auto' behaviour should rather be improved to rely on some heuristic in order to solve the problem faster automatically. See what has been done in #27491
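A hypothetical sketch of what such a heuristic could look like (thresholds and branching are illustrative only; see #27491 for the approach actually taken):

    def choose_eigen_solver(n_samples, n_components):
        # Illustrative heuristic only, not scikit-learn's actual rule.
        if n_samples > 200 and n_components < 10:
            return "arpack"      # few components of a large matrix: iterative solver
        if n_samples >= 10 * n_components:
            return "randomized"  # randomized methods pay off when k << n
        return "dense"           # small problem: exact LAPACK solver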

@ogrisel (Member) commented May 26, 2025

Sorry for the lack of feedback, I plan to come back to review this PR soonish. In the meantime, could you please expand the right-hand side of this benchmark plot: #31247 (comment)

to go to even larger numbers of data points and/or components? Also, could you please try log-scaled axes to ease extrapolation, assuming the gap between the two methods is not a fixed constant?
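A sketch of the requested log-log presentation (the timing curves below are synthetic placeholders shaped like the expected asymptotics, not measurements); on log-log axes a power-law cost appears as a straight line, which makes extrapolating the gap straightforward:

    import matplotlib.pyplot as plt
    import numpy as np

    n = np.array([1_000, 2_000, 5_000, 10_000, 20_000], dtype=float)
    # Placeholder curves only: dense eigh is O(n^3), randomized roughly O(n^2 k)
    t_dense = 1e-9 * n**3
    t_rand = 1e-7 * n**2

    plt.loglog(n, t_dense, "o-", label="dense")
    plt.loglog(n, t_rand, "s-", label="randomized_value")
    plt.xlabel("n_samples")
    plt.ylabel("fit time (s)")
    plt.legend()
    plt.show()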
