-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
Description
(disclaimer: this issue and associated PR are part of a student project supervised by @smarie )
Summary
Eigendecomposition is slow when number of samples is large. This impacts decomposition models such as KernelPCA and Isomap. A "randomized" eigendecomposition method (from Halko et al) has been introduced for KernelPCA leveraging Halko's algorithm 4.3 for randomized SVD decomposition (also used in PCA).
Unfortunately, the current approach is only valid for decomposition of PSD matrices - which suits well for KernelPCA but can not be true in the context of Isomap. Therefore Isomap has not accelerated implementation as of today.
We propose to introduce an additional approximate eigendecomposition method based on algorithm 5.3 from the same paper.
This method should offer a faster alternative to existing solvers (arpack, dense, etc.) while maintaining accuracy, and as opposed to randomized svd, is suitable to find eigenvalues for non-PSD matrices.
Describe your proposed solution
- Implement
_randomized_eigsh(selection='value')
, that is left as NotImplemented today. - Integrate it as an alternate solver in
Isomap
and inKernelPCA
. - Add tests comparing performance with existing solvers.
- Provide benchmarks to evaluate speedup and accuracy.
Motivation
- Improves scalability for large datasets.
- Reduces computation time for eigen decomposition-based methods.
Note: this solution could be used to accelerate all models relying on eigenvalue decomposition, including possibly #22330