TruncatedSVD by eigh #2572


Closed

mblondel opened this issue Nov 4, 2013 · 9 comments

Comments

@mblondel
Member

mblondel commented Nov 4, 2013

The right singular vectors of X can be computed by the eigenvalue decomposition of np.dot(X.T, X). So if n_features is not too large (say < 1000), this could be an efficient solver for TruncatedSVD.

Likewise, if n_samples << n_features, the left singular vectors could be obtained by the eigenvalue decomposition of np.dot(X, X.T), but then we would only be able to implement fit_transform, not transform.

See http://en.wikipedia.org/wiki/Singular_value_decomposition
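For concreteness, a minimal NumPy sketch of the idea (the data, shapes, and sign handling here are purely illustrative, not a proposed implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 10)  # n_samples=50, n_features=10
k = 3                  # number of components to keep

# Eigendecomposition of the (n_features x n_features) Gram matrix X^T X.
# eigh returns eigenvalues in ascending order, so take the last k of them.
evals, evecs = np.linalg.eigh(np.dot(X.T, X))
V_eigh = evecs[:, ::-1][:, :k]         # top-k right singular vectors (as columns)
sigma_eigh = np.sqrt(evals[::-1][:k])  # corresponding singular values

# Reference: thin SVD of X itself.
U, sigma, VT = np.linalg.svd(X, full_matrices=False)

# Same vectors up to sign flips (assuming distinct singular values).
print(np.allclose(np.abs(V_eigh.T), np.abs(VT[:k])))  # True
print(np.allclose(sigma_eigh, sigma[:k]))             # True
```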

@MechCoder
Member

Hi @mblondel, I would like to work on this issue. I have played with the current implementation of TruncatedSVD using a few examples given in the research papers. If I understand correctly, you would like a new "eigh" algorithm option for computing the SVD decomposition of X?

Any specific pointers, before I actually start coding on this and submit a PR within 2-3 days?

@mblondel
Member Author

mblondel commented Nov 7, 2013

Nope, only the right singular vectors.

@larsmans
Member

We have this already. TruncatedSVD(algorithm="arpack") uses scipy.sparse.linalg.svds, which

is a naive implementation using ARPACK as an eigensolver on A.H * A or A * A.H, depending on which one is more efficient.

(http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html)
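For reference, a small usage sketch of that code path (the toy data here is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(0)
# Mostly-zero matrix, stored sparse (~10% non-zeros).
X = csr_matrix(rng.rand(100, 50) * (rng.rand(100, 50) > 0.9))

svd = TruncatedSVD(n_components=5, algorithm="arpack")
X_reduced = svd.fit_transform(X)  # routed through scipy.sparse.linalg.svds
print(X_reduced.shape)            # (100, 5)
print(svd.components_.shape)      # (5, 50) -- the right singular vectors
```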

@mblondel
Member Author

Thanks, I didn't know that. If we only need the right singular vectors, it seems that svds does a little bit more than necessary (but I guess we can live with that?).
https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/arpack/arpack.py#L1660

@larsmans
Member

I'm not sure what you mean; we always want both the components and the reduced X, right? (Otherwise, a PR to SciPy and a backported svds would seem more appropriate than introducing our own variant.)

@mblondel
Member Author

@larsmans
Member

That was actually supposed to be a TODO: find out if there's any case where doing safe_sparse_dot(X, VT.T).T would be faster than np.dot(U, Sigma.T). The latter is a matrix-vector multiplication of cost O(n_samples×n_components), while the former is typically a sparse-dense matrix multiplication of cost O(X.nnz×n_components), if I'm not mistaken. But since X.nnz is expected to be larger than n_samples and there's a large constant hiding in the latter big-O, I think the comment can go away.
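As a rough illustration of the two ways of forming the reduced X discussed above (the toy data and variable names are illustrative; both routes give the same result up to floating-point error):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.RandomState(0)
X = csr_matrix(rng.rand(200, 30) * (rng.rand(200, 30) > 0.8))

U, Sigma, VT = svds(X, k=5)

# Option 1: project the (sparse) data onto the components,
# cost roughly O(X.nnz * n_components).
X_reduced_project = X.dot(VT.T)

# Option 2: rescale the left singular vectors,
# cost roughly O(n_samples * n_components).
X_reduced_scale = U * Sigma  # Sigma broadcasts across the columns of U

print(np.allclose(X_reduced_project, X_reduced_scale))  # True
```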

@larsmans
Member

Shall we close this issue?

@mblondel
Member Author

I guess so. We could add the following comment:

svds is a naive implementation using ARPACK as an eigensolver on A.H * A or A * A.H, depending on which one is more efficient.
