Skip to content

Faster Eigen Decomposition for Isomap & KernelPCA #31246

@oerrabie

Description

@oerrabie

(disclaimer: this issue and associated PR are part of a student project supervised by @smarie )

Summary

Eigendecomposition is slow when number of samples is large. This impacts decomposition models such as KernelPCA and Isomap. A "randomized" eigendecomposition method (from Halko et al) has been introduced for KernelPCA leveraging Halko's algorithm 4.3 for randomized SVD decomposition (also used in PCA).

Unfortunately, the current approach is only valid for decomposition of PSD matrices - which suits well for KernelPCA but can not be true in the context of Isomap. Therefore Isomap has not accelerated implementation as of today.

We propose to introduce an additional approximate eigendecomposition method based on algorithm 5.3 from the same paper.
This method should offer a faster alternative to existing solvers (arpack, dense, etc.) while maintaining accuracy, and as opposed to randomized svd, is suitable to find eigenvalues for non-PSD matrices.

Describe your proposed solution

  • Implement _randomized_eigsh(selection='value'), that is left as NotImplemented today.
  • Integrate it as an alternate solver in Isomap and in KernelPCA.
  • Add tests comparing performance with existing solvers.
  • Provide benchmarks to evaluate speedup and accuracy.

Motivation

  • Improves scalability for large datasets.
  • Reduces computation time for eigen decomposition-based methods.

Note: this solution could be used to accelerate all models relying on eigenvalue decomposition, including possibly #22330

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions