-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] add lobpcg svd_solver to PCA and TruncatedSVD #12319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request introduces 5 alerts when merging 539dfd4 into 63e5ae6 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 3 alerts when merging 2beaafc into 63e5ae6 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 3 alerts when merging 5375a48 into 63e5ae6 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 3 alerts when merging 17c9196 into 4e2e1fa - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 2 alerts when merging 3546217 into 5fd9e03 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
The development docs say: "The full scikit-learn tests can be run using ‘make’ in the root folder. Alternatively, running ‘pytest’ in a folder will run all the tests of the corresponding subpackages." This works for me:
after building the package in-place |
This pull request introduces 2 alerts when merging d658176 into 5fd9e03 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
OK, I have your branch now. I am looking at the LGTM alerts. |
I am not sure how you want to proceed with working on this branch together, so here is my proposal:
You can add my repo by adding the following to your .git/config:
and then just use |
OK, I accepted your invitation, and pushed my commit to this PR. Let's see :) |
This pull request introduces 1 alert when merging 7db19f8 into 03c3af5 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
OK, I will try debugging the errors (probably tomorrow). Concerning the single LGTM alert |
This means that |
Don't worry about the lgtm.com alerts
|
This pull request introduces 1 alert when merging d435213 into 00c2f41 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 1 alert when merging eea2083 into 00c2f41 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 1 alert when merging 3661ef5 into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 1 alert when merging 7f9ab5a into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 1 alert when merging 4c93c2f into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
This pull request introduces 1 alert when merging 11ef291 into 39bd736 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
@lobpcg I have fetched your changes, thanks. I do not really understand the |
This pull request introduces 1 alert when merging 2c2e5bb into 8985a63 - view on LGTM.com new alerts:
Comment posted by LGTM.com |
@lobpcg in what way? Seems OK for me. BTW. I have fixed the docstring errors. |
I call git in two alternative ways, both GUI, locally and might have created a conflict locally.
I of course noticed, the fixes are way above my head, great work! All tests are passed now, except for LGTM, which are recommended to be ignored. There is nothing else trivial I can think of is left for us to do. I removed [WIP]. Should I put [MRG]? lobpcg.py is well improved, thanks to your editing. Do you want to PR it to scipy? Add to the existing scipy PR? BTW, in scipy/scipy#9275 I suggest adding lobpcg_svd to scipy, where it naturally belongs. It now can probably be just mostly copy/pasted from https://github.com/lobpcg/scikit-learn/blob/lobpcg_svd/sklearn/utils/extmath.py to scipy... |
The easiest way of fixing that is to fetch from github, and hard-reset your branch (master?) to the fetched one, in case you do not have local changes. If you have non-commited local changes, you can stash them first (
It might be easier to create a new PR, after scipy/scipy#9352 is merged. Of course, just in case scipy maintainers agree to merge it. I will ping them to get some opinions.
OK. |
I do not really understand the merge conflict - we just added a function, and the function logsumexp() below it was removed in the master - removing logsumexp() in this branch did not help resolving the problem. |
To my vague recollection, the cost/iteration should be m^2n+9mn^2+n^3 in LOBPCG vs probably something like m^2n+mn^2+n^3 in randomized. When m/n is large, the difference 8mn^2 is well dominated by the common first term m^2n, so the cost per iteration is nearly the same, while LOBPCG should always converge faster, often MUCH faster, than randomized, which may become visible even after only a few iterations. The speed improvement depends on the distribution of the singular values, not directly on m/n, that is why it should be inconsistent between different datasets. So the reported tests results for "large" m/n are as expected. Another consideration: "Direct solve" eigh uses n=m and takes m^3 (no iterations) in LAPACK. When m/n is small, like 10, and you need to iterate at least 10 times, it's much better just to run plain LAPACK eigh, so testing this PR on m/n<10 makes little practical sense. In fact, if m/n<5 LOBPCG algorithm does not even run - the lobpcg code simply switches inside to the standard LAPACK solver, moreover, does it very inefficiently, see scipy/scipy#10983, (the user is not expected to run The bottom line is that LOBPCG cannot lose to randomized, unless m/n is small, but then they both should lose to eigh |
scipy/scipy#9275 is now merged, adding lobpcg_svd to scipy |
@glemaitre or anybody else: I have just updated https://github.com/lobpcg/scikit-learn/tree/master from https://github.com/scikit-learn/scikit-learn I am now trying to update this branch https://github.com/lobpcg/scikit-learn/tree/lobpcg_svd from https://github.com/lobpcg/scikit-learn/tree/master but cannot due to Conflicting files which I do not know how to resolve. Could you please help to resolve the Conflicting files so that I can bring this branch up to date with https://github.com/lobpcg/scikit-learn/tree/master . I just use GitHub Desktop, not really familiar with git commands. |
@thomasjpfan -thanks very much for your help! There are some docstring errors/warnings that I still cannot figure out how to fix,
But I guess they can be just ignored for now. |
Just chiming in to support this PR! |
@lobpcg If merged, this would work correctly for a sparse input into PCA (i.e. implicitly center the input), right? |
@dkobak Thanks for your support! Yes, this PR surely also works correctly for a sparse input into PCA, adding a new solver 'lobpcg' within scikit. However, it is an independent separate issue from and does not supersede the implicit centering the input like in #12794. |
@lobpcg this pr seems stalled. Do you need any help getting it merged? I'd be happy to contribute commits |
@zachmayer yes, thanks for your willingness to help! The known work remained is as follows:
Still interested to help getting it merged? |
@lobpcg I'll take a look. Could you elaborate on step 3? I may be able to at least help resolve the merge conflicts |
scipy/scipy#10830 has implemented LOBPCG solver in svds in scipy while this PR being stalled. Which gives yet another option to add lobpcg svd_solver to PCA and TruncatedSVD simply by calling https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html with solver=‘lobpcg’ |
Ahh, yeah calling the scipy solver with |
Yes, but one needs to consider that:
|
I think it's fine for a new feature to be only available for newer versions of the dependency as long as we raise a |
scipy/scipy#10830 has implemented LOBPCG solver in svds in scipy while this PR being stalled. Which allows using lobpcg svd_solver to PCA and TruncatedSVD simply by calling https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html with solver=‘lobpcg’ |
I think I am a bit confused here @lobpcg: the scipy PR was merged two years ago, so why are you only closing this PR now, two years later? In any case, thanks a lot for your efforts! |
I thought about coming back to it, factoring in the SciPy implementation, but never found time and then forgot about it. |
The comment of @ogrisel gives the way forward since this is merged in SciPy: #12319 (comment) Basically, we have two possibilities: either backport the method in
I am pretty upset when reading these types of comments that I find really indecent. |
I surely did not mean to upset anyone, so I apologize. |
Reference Issues/PRs
fixes #12079, fixes #12080
What does this implement/fix? Explain your changes.
#12079 adds LOBPCG as an SVD solver in PCA
#12080 adds LOBPCG solver to Truncated PCA
lobpcg_svd should also be useful in KernelPCA for faster partial decompositions, see #12068
This PR also includes multiple LOBPCG related bug fixes, including vendoring sklearn/externals/_lobpcg.py from scipy 1.3.0
Any other comments?
@ogrisel Transferred from permanently closed PR #12291
Keep in mind for testing, that lobpcg_svd falls back to dense eigensolver unless n_components < 3*matrix_size, where matrix_size = min (n_samples, n_features)
Still to do, better in new focused PRs after this one is merged
example plot_faces_decomposition may include lobpcg_svd, just change
('Eigenfaces - PCA using randomized SVD',
decomposition.PCA(n_components=n_components, svd_solver='randomized',
whiten=True),
True),
to
but lobpcg currently fails here for unclear numerical reasons. More testing may be needed for float32 data, like in this example.