-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Add svd_solver="lobpcg" option to PCA #30075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Out of curiosity, could you please adapt |
BTW, before considering reviewing this PR, we would need to make the tests pass first. There are many failures, and I did not investigate the cause myself. |
Am I interpreting this correctly that once the tests pass and I fix possible other issues (like some places where I missed adding it to some set or so) this will go in? We’re currently deprecating some old code and your answer determines if we’ll tell our users “stop using lobpcg, we won’t support it in the future” or nothing, because we’ll just swap out the implementation at some point in the future. |
Together with the fix, the important point here is to have a benchmark. If we add a new option, we need to document it and clearly state to our users when to use it (we could already improve our documentation in this regard). I assume that LOBPCG should be compared when ARPACK is useful (sparse data) and check if there is a specific regime where LOBPCG brings added value. The previous benchmarks (#12319) were certainly not specific enough and we look at LOBPCG as a preconditioner if I recall. |
LOBPCG can also be compared to our current "randomized" method, both for dense and sparse data. All solvers have different speed / accuracy trade-offs, and it's interesting how close and where they stand on the Pareto optimality frontier of that trade off. It's also important to test on data with different shape, sparsity and distribution and different number of components to extract, as all those parameters impact the trade-off. |
I would start with running the benchmark and sharing the results here before attempting to make all the tests pass. |
OK, sorry for the noise. We talked with @ivirshup who said that for our use case, lobpcg hasn’t been terribly accurate. We could probably troubleshoot this and make it more accurate here and for our use case, but we don’t have enough motivation to do this, so we will just deprecate it for now and maybe re-add it later if someone else has added support for it in the meantime. |
@flying-sheep @ivirshup From
while, e.g., in https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.eigs.html
To fix this, one needs to add the possibility of passing this parameter and set it up externally as needed, |
Reference Issues/PRs
Fixes #12079
What does this implement/fix? Explain your changes.
This allows
PCA
to pass thesvd_solver
argument on tosvds