[MRG + 2] ENH: optimizing power iterations phase for randomized_svd #5141
Conversation
So far, the plots suggest that 2 may be the right choice. Larger numbers may even degrade the quality of the approximation. The flat behaviour of the last two plots comes from very small datasets. Ping @ogrisel
This looks great, although the plots on diabetes and abalone are not very interesting as they extract a single component. Maybe you should force the extraction of at least 5 components to make them more relevant.
plt.legend(loc="lower right")
plt.axhline(y=0, ls=':', c='black')
plt.suptitle("%s: distances from SVD vs. running time\n \
fraction of compontens used in rand_SVD = %i" %
Typo: components.
Also, from your plots, it looks like this is the number of components and not the fraction, right?
Yes, I always take 5%; the number shown is the absolute one. Thanks.
Force-pushed from 4be01ff to b83adab.
I have coded one last test. Looking at the 2 new plots, I think we can be confident that 2 is a good default value. Input was (500, 5000). `rank` is the rank of the input matrix, `n_comps` is the parameter of `randomized_svd`.
This is great. Thanks for this empirical study. +1 for using 2 power iterations in `randomized_svd`.
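For readers following along, a minimal usage sketch (mine, not part of the PR) of calling `randomized_svd` with the proposed default of 2 power iterations; the matrix shape mirrors the (500, 5000) benchmark input mentioned above:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5000))  # same shape as the benchmark input

# n_iter=2 is the default value proposed in this PR.
U, s, Vt = randomized_svd(X, n_components=10, n_iter=2, random_state=0)
print(U.shape, s.shape, Vt.shape)  # (500, 10) (10,) (10, 5000)
```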
I wonder if we should change the default value for `n_iter` too. Any opinion from others, e.g. @larsmans?
How do we compare against fbpca (https://github.com/facebook/fbpca)? That also just uses power iterations, right?
Another thing that would be nice to investigate if you have time: our implementation of the range finder uses Algorithm 4.3, but according to the paper that procedure is numerically unstable. The paper suggests a subspace iteration method in Algorithm 4.4. It would be interesting to compare both algorithms in terms of accuracy and computational time.
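For concreteness, a sketch (mine, assuming a dense NumPy array `A`) of the two range finders from Halko et al.: Algorithm 4.3 applies the power iterations with no intermediate normalization, while Algorithm 4.4 re-orthonormalizes with QR after every application of `A` or `A.T`:

```python
import numpy as np
from scipy.linalg import qr


def range_finder_power_iteration(A, size, n_iter, rng):
    """Halko et al., Algorithm 4.3: plain power iterations.

    Rounding errors progressively wash out all but the dominant singular
    directions, which is the numerical instability mentioned above.
    """
    Y = A @ rng.normal(size=(A.shape[1], size))
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = qr(Y, mode='economic')
    return Q


def range_finder_subspace_iteration(A, size, n_iter, rng):
    """Halko et al., Algorithm 4.4: orthonormalize between each step."""
    Q, _ = qr(A @ rng.normal(size=(A.shape[1], size)), mode='economic')
    for _ in range(n_iter):
        W, _ = qr(A.T @ Q, mode='economic')
        Q, _ = qr(A @ W, mode='economic')
    return Q
```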
It does, and it also uses 2 power iterations by default. I ran a few tests: sklearn seems slower. Performance is similar, but sklearn is slightly better here. A difference is that … Code is in this gist; graphs attached.
I don't understand this, could you please rephrase?
Also, could you please annotate the scatter plot with the number of power iterations for each dot? See for instance: http://stackoverflow.com/questions/5147112/matplotlib-how-to-put-individual-tags-for-a-scatter-plot (no need for arrows)
Just commenting on the two plots: sklearn runs slower than fbpca, but performance in terms of matrix factorization error is slightly better than fbpca, until sklearn starts to degrade on lfw_people.
Sure.
The difference is in how the power iterations are computed, see: https://github.com/facebook/fbpca/blob/master/fbpca.py#L1562. An LU factorization is used to normalize the intermediate matrices between the power iterations. It would be worth checking the paper: maybe our code does not do it because of an oversight on our part. It seems to make the power iterations more stable: too many power iterations do not seem to cause a degradation in the quality of the approximation when the LU steps are there.
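A sketch of my reading of the linked fbpca code (not a verbatim copy): a pivoted LU factorization replaces QR as the normalization between iterations, keeping only the permuted unit lower-triangular factor:

```python
import numpy as np
from scipy.linalg import lu, qr


def lu_normalized_range_finder(A, size, n_iter, rng):
    Q = rng.normal(size=(A.shape[1], size))
    for _ in range(n_iter):
        # permute_l=True returns P @ L, whose columns stay well conditioned;
        # an LU factorization is cheaper than a QR at every step.
        Q, _ = lu(A @ Q, permute_l=True)
        Q, _ = lu(A.T @ Q, permute_l=True)
    # an orthonormal basis is still needed at the end, hence a final QR
    Q, _ = qr(A @ Q, mode='economic')
    return Q
```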
Maybe @ajtulloch would be interested in following this discussion :)
I think fbpca switches "modes" depending on the number of components, IIRC. And should we also switch modes internally?
This is only the case when …
Looking into the code, the 3 differences I see are: …
I am comparing with the external interface …
The paper suggests QR factorizations anyway.
Actually, on LFW sklearn is slightly slower, not faster. Maybe the final QR is less expensive when the LU is done in the intermediate steps? It would be great to confirm with an experiment by adding an option to enable the LU steps in our codebase.
This looks very likely indeed.
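For what it's worth, scikit-learn did eventually expose this switch as `power_iteration_normalizer` on `randomized_svd`; a sketch of what the experiment could look like with it (matrix shape, `n_components` and `n_iter` are arbitrary choices of mine):

```python
import time
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 5000))

for normalizer in ("none", "LU", "QR"):
    start = time.time()
    U, s, Vt = randomized_svd(X, n_components=10, n_iter=4,
                              power_iteration_normalizer=normalizer,
                              random_state=0)
    err = np.linalg.norm(X - (U * s) @ Vt)  # Frobenius reconstruction error
    print(normalizer, time.time() - start, err)
```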
See also the implementation in https://github.com/blaze/dask/blob/master/dask/array/linalg.py#L225
It would be interesting to compare QR and LU decompositions. Another thing I am curious about is whether it would be worth using a stopping criterion for the loop in the range finder. Halko et al. suggest monitoring ||A - QQ^T A||. This can, however, be expensive to compute.
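For reference, a sketch (mine) of that exact posterior check; it is expensive because the spectral norm of the residual requires an SVD of a full-size matrix:

```python
import numpy as np

def posterior_error(A, Q):
    """Spectral norm of A - Q Q^T A, the quantity Halko et al. suggest
    monitoring as a stopping criterion for the range finder."""
    return np.linalg.norm(A - Q @ (Q.T @ A), ord=2)
```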
Cool, great discussion! LMK if we (Mark Tygert and I wrote fbpca) can contribute in any way. There's a fairly detailed evaluation at http://tygert.com/implement.pdf if that's useful as well.
Thanks for the very useful reference @ajtulloch! I am going to make a benchmark file for playing with different versions of the power iterations. I will keep comparing with your implementation.
lgtm.
This lgtm as well.
The what's new entries should be put under 0.18.
Sure. I was hoping for a merge in 0.17 ;) Also, we should have a common default policy for the power iterations (`n_iter` in `randomized_svd`, `TruncatedSVD` and `RandomizedPCA`).
Force-pushed from 7a9c3bc to e4fb7e9.
I did not implement the faster spectral norm estimate. That measure depends on random initialization, which makes benchmarking trickier (it needs many runs, averaged performance, etc.).
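For context, the faster estimate refers to the a posteriori bound of Halko et al. (eq. 4.3): with r Gaussian probe vectors, ||A - QQ^T A||_2 <= 10 sqrt(2/pi) max_i ||(I - QQ^T) A omega_i||_2 holds with probability at least 1 - 10^{-r}. A sketch (assumptions mine) showing why its value depends on the random draw:

```python
import numpy as np

def estimate_residual_norm(A, Q, n_probes=10, rng=None):
    """Randomized bound on ||A - Q Q^T A||_2; the result varies with the
    draw of the probe vectors, which is what makes benchmarking trickier."""
    rng = np.random.RandomState(0) if rng is None else rng
    omega = rng.normal(size=(A.shape[1], n_probes))
    B = A @ omega - Q @ (Q.T @ (A @ omega))  # (I - Q Q^T) A omega
    return 10 * np.sqrt(2.0 / np.pi) * np.linalg.norm(B, axis=0).max()
```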
Force-pushed from 0a12481 to ce138f7, then from ce138f7 to b18f295.
Done here.
+1 for merging this now and re-adjusting the default params for consistency in the PCA collapse PR if needed.
[MRG + 2] ENH: optimizing power iterations phase for randomized_svd
🍋cello!
🍻
:craft beer:
In practice this is often enough for obtaining a good approximation of the
true eigenvalues/vectors in the presence of noise. By `Giorgio Patrini`_.

- :func:`randomized_range_finder` is more numerically stable when many
Can you please add a link to the PR in whatsnew? And `versionadded` hints for the changed parameters would be nice ;)
You added this in the wrong diff, @amueller.
OK, now I get it @amueller. Just give me time :)
I am writing some benchmarks to see if we can set a default number of power iterations for `utils.extmath.randomized_svd`. The function implements the work of *Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions*, Halko et al., 2009 (http://arxiv.org/abs/arXiv:0909.4061).
See the code for more details.
Still to do:

- decide whether `randomized_svd` should sample the random matrix from U[-1, 1] instead of from a Gaussian (see the sketch after this list)
- and finally, settle a default `n_iter` in `randomized_svd` and a default strategy for normalizing matrices between power iterations in `randomized_svd` (in TruncatedSVD and RandomizedPCA)
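A quick sketch (mine, not from the PR) of what the first item amounts to: swapping the Gaussian test matrix used by the range finder for one drawn uniformly from [-1, 1]:

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.RandomState(0)
n_features, size = 5000, 25

Omega_gaussian = rng.normal(size=(n_features, size))         # current behaviour
Omega_uniform = rng.uniform(-1, 1, size=(n_features, size))  # proposed alternative

# Either test matrix can then be fed to the range finder, e.g. for a
# matrix A of shape (n_samples, n_features):
# Q, _ = qr(A @ Omega_uniform, mode='economic')
```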