-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH: Faster Eigen Decomposition For Isomap & KernelPCA #31247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ENH: Faster Eigen Decomposition For Isomap & KernelPCA #31247
Conversation
…ith tests and integration into Isomap and KernelPCA
…into feature/randomized_eigsh_value
Thanks for the PR @yaichm. Please follow the instructions of the automated comment above to resolve the failing linter continuous integration. If you need help (e.g. the instructions are not clear enough), please let us know with a specific description of your attempt at resolving the problems and the problem you faced.
Looking forward to it. Please feel free to ping me once done. |
I’ve followed the automated instructions and fixed the linter issues. The benchmark result graphs have also been added. @ogrisel , feel free to review whenever you are available. |
Thanks to the team for finalizing these results ! So to summarize
|
As requested by @smarie, here is the image showing the reconstruction error for 50 components. |
@ogrisel I think the team is now ready for a first review (see summary #31247 (comment)) |
@@ -198,10 +198,6 @@ def test_randomized_eigsh(dtype): | |||
# eigenvectors | |||
assert eigvecs.shape == (4, 2) | |||
|
|||
# with 'value' selection method, the negative eigenvalue does not show up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace this section with
eigvals, eigvecs = _randomized_eigsh(X, n_components=2, selection="value")
# eigenvalues
assert eigvals.shape == (2,)
assert_array_almost_equal(eigvals, [3.0, 1.0]) # signed ordering: positive eig
# eigenvectors
assert eigvecs.shape == (4, 2)
# make a random PSD matrix | ||
X = make_sparse_spd_matrix(n_features, random_state=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is PSD, but we should also check non-PSD. So let's create a symmetric random matrix instead. If you wish you can test the two cases
- add a
@pytest.mark.parametrize("is_psd", (True, False))
- in the test, switch on
if is_psd:
to create the random X.
with bounded error. Unlike the 'module' strategy, it works efficiently with | ||
non-positive semidefinite matrices, handling both positive and negative | ||
eigenvalues directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather suggest to have a simple comment here :
with bounded error. Unlike the 'module' strategy, it works efficiently with | |
non-positive semidefinite matrices, handling both positive and negative | |
eigenvalues directly. | |
with bounded error. Unlike the 'module' strategy, it returns the top `k` eigenvalues by decreasing value: all the positive ones first, then the negative ones if any. |
And to add a more detailed comment in the Strategy "module"
description:
Strategy 'module':
(...existing definition...)
Unlike the 'value' strategy, this returns the top `k` eigenvalues by decreasing **module**. Therefore, when `M` is non-positive semidefinite large negative eigenvalues will be returned before small positive ones.
When `M` is psd both strategies lead to the same results as all eigenvalues are positive.
Fixes #31246
Implemented randomized_eigh(selection='values') and integrated it into KernelPCA and Isomap
Introduced a new eigenvalue decomposition function randomized_eigh(values) for faster computation.
Integrated this solver into both KernelPCA and Isomap as an alternative to dense solvers.
Added comprehensive tests in extmath.py to validate the decomposition accuracy.
Benchmarked against existing solvers, comparing:
The benchmark result graphs comparing execution time and reconstruction error with existing solvers will be added in the comment below.