Skip to content

ENH: Faster Eigen Decomposition For Isomap & KernelPCA #31247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

yaichm
Copy link

@yaichm yaichm commented Apr 24, 2025

Fixes #31246

Implemented randomized_eigh(selection='values') and integrated it into KernelPCA and Isomap

  • Introduced a new eigenvalue decomposition function randomized_eigh(values) for faster computation.

  • Integrated this solver into both KernelPCA and Isomap as an alternative to dense solvers.

  • Added comprehensive tests in extmath.py to validate the decomposition accuracy.

  • Benchmarked against existing solvers, comparing:

    • Execution time in KernelPCA and Isomap
    • Reconstruction error in Isomap

    The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comment below.

Copy link

github-actions bot commented Apr 24, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4939734. Link to the linter CI: here

@yaichm yaichm changed the title ENH: Faster Eigen Decomposition For & KernelPCA ENH: Faster Eigen Decomposition For Isomap & KernelPCA Apr 24, 2025
@ogrisel
Copy link
Member

ogrisel commented Apr 25, 2025

Thanks for the PR @yaichm.

Please follow the instructions of the automated comment above to resolve the failing linter continuous integration.

If you need help (e.g. the instructions are not clear enough), please let us know with a specific description of your attempt at resolving the problems and the problem you faced.

The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comment below.

Looking forward to it. Please feel free to ping me once done.

@Dlimim
Copy link

Dlimim commented Apr 26, 2025

KPCA_Execution_Time_vs_components

In this Kernel PCA benchmark (focus on the left side of the graph), our custom randomized_value solver shows better scalability compared to standard solvers when the number of components is large.

@Dlimim
Copy link

Dlimim commented Apr 26, 2025

KPCA_Execution_Time_vs_Samples

When increasing the number of samples, our randomized_value solver consistently achieves much lower execution times compared to the full solver. Even with large datasets, randomized_value remains highly efficient and scalable.

@Dlimim
Copy link

Dlimim commented Apr 26, 2025

Isomap_Solvers_Comparaison

In this Isomap benchmark , for a small number of components, both solvers show comparable execution times. However, as the number of components increases ( > 10 ), randomized_value significantly outperforms the full solver, achieving much faster execution for large datasets.

@Dlimim
Copy link

Dlimim commented Apr 26, 2025

Isomap_Solvers

The projections obtained with the auto solver and the randomized solver are visually very similar, confirming that the randomized solver preserves the structure and quality of the embedding. This highlights its reliability alongside its faster execution.

@Dlimim
Copy link

Dlimim commented Apr 26, 2025

Isomap_Reconstruction_Erreur

The reconstruction error is very similar between the auto and randomized solvers across different sample sizes. This confirms that the randomized solver preserves the quality of the reconstruction while offering faster execution.

@yaichm
Copy link
Author

yaichm commented Apr 27, 2025

I’ve followed the automated instructions and fixed the linter issues.

The benchmark result graphs have also been added.

@ogrisel , feel free to review whenever you are available.

@smarie
Copy link
Contributor

smarie commented Apr 28, 2025

Thanks to the team for finalizing these results !

So to summarize

  • on KernelPCA, we see similar results with method 5.3 (randomized eigenvalues) than with the "trick" of using 4.3 (randomized SVD). This makes me quite confident that the implementation is correct
  • on Isomap, where the "trick" could not be used (since matrices are not guaranteed to be PSD),
    • when the number of components is low (2), there is no drawback: the speed is the same than the current method based on arpack, and the result on sample dataset is identical (same visual figure, same reconstruction error)
    • we see clear speed benefits when the number of components is high (right figure of this message, where 50 components are selected). @yaichm It would be interesting to see the reconstruction error corresponding to this situation, to make sure that the speed gain was not obtained at the cost of increasing the reconstruction error

@yaichm
Copy link
Author

yaichm commented Apr 28, 2025

As requested by @smarie, here is the image showing the reconstruction error for 50 components.
Capture d’écran du 2025-04-28 22-48-25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster Eigen Decomposition for Isomap & KernelPCA
4 participants