Skip to content

Add svd_solver="lobpcg" option to PCA #30075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

flying-sheep
Copy link

Reference Issues/PRs

Fixes #12079

What does this implement/fix? Explain your changes.

This allows PCA to pass the svd_solver argument on to svds

Copy link

github-actions bot commented Oct 15, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 229764a. Link to the linter CI: here

@ogrisel
Copy link
Member

ogrisel commented Oct 15, 2024

Out of curiosity, could you please adapt benchmarks/bench_pca_solvers.py to also include this solver and post the resulting plots in the description of this PR?

@ogrisel
Copy link
Member

ogrisel commented Oct 15, 2024

BTW, before considering reviewing this PR, we would need to make the tests pass first. There are many failures, and I did not investigate the cause myself.

@flying-sheep
Copy link
Author

Am I interpreting this correctly that once the tests pass and I fix possible other issues (like some places where I missed adding it to some set or so) this will go in?

We’re currently deprecating some old code and your answer determines if we’ll tell our users “stop using lobpcg, we won’t support it in the future” or nothing, because we’ll just swap out the implementation at some point in the future.

@glemaitre
Copy link
Member

Am I interpreting this correctly that once the tests pass and I fix possible other issues (like some places where I missed adding it to some set or so) this will go in?

Together with the fix, the important point here is to have a benchmark. If we add a new option, we need to document it and clearly state to our users when to use it (we could already improve our documentation in this regard).

I assume that LOBPCG should be compared when ARPACK is useful (sparse data) and check if there is a specific regime where LOBPCG brings added value. The previous benchmarks (#12319) were certainly not specific enough and we look at LOBPCG as a preconditioner if I recall.

@ogrisel
Copy link
Member

ogrisel commented Oct 16, 2024

LOBPCG can also be compared to our current "randomized" method, both for dense and sparse data. All solvers have different speed / accuracy trade-offs, and it's interesting how close and where they stand on the Pareto optimality frontier of that trade off. It's also important to test on data with different shape, sparsity and distribution and different number of components to extract, as all those parameters impact the trade-off.

@ogrisel
Copy link
Member

ogrisel commented Oct 16, 2024

I would start with running the benchmark and sharing the results here before attempting to make all the tests pass.

@flying-sheep
Copy link
Author

OK, sorry for the noise. We talked with @ivirshup who said that for our use case, lobpcg hasn’t been terribly accurate.

We could probably troubleshoot this and make it more accurate here and for our use case, but we don’t have enough motivation to do this, so we will just deprecate it for now and maybe re-add it later if someone else has added support for it in the meantime.

@lobpcg
Copy link
Contributor

lobpcg commented Oct 24, 2024

@flying-sheep @ivirshup
Some default parameters in different eigenvalue solvers are incompatible. For your case, it's the "Maximum number of iterations" I am quite sure.

From
https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html

maxiter
int, default: 20
Maximum number of iterations.

while, e.g., in https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html

maxiterint, optional
Maximum number of Arnoldi update iterations allowed Default: n*10

To fix this, one needs to add the possibility of passing this parameter and set it up externally as needed,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new feature: add LOBPCG as an SVD solver in PCA
4 participants