TruncatedSVD by eigh #2572


Closed

mblondel opened this issue Nov 4, 2013 · 9 comments

Comments

@mblondel
Member

mblondel commented Nov 4, 2013

The right singular vectors of X can be computed by the eigenvalue decomposition of np.dot(X.T, X). So if n_features is not too large (say < 1000), this could be an efficient solver for TruncatedSVD.

Likewise, if n_samples << n_features, the left singular vectors could be obtained by the eigenvalue decomposition of np.dot(X, X.T), but then we would only be able to implement fit_transform, not transform.

See http://en.wikipedia.org/wiki/Singular_value_decomposition
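For concreteness, a minimal NumPy sketch of the idea (the data, shapes, and sign handling here are purely illustrative, not a proposed implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 10)  # n_samples=50, n_features=10
k = 3                  # number of components to keep

# Eigendecomposition of the (n_features x n_features) Gram matrix X^T X.
# eigh returns eigenvalues in ascending order, so take the last k of them.
evals, evecs = np.linalg.eigh(np.dot(X.T, X))
V_eigh = evecs[:, ::-1][:, :k]         # top-k right singular vectors (as columns)
sigma_eigh = np.sqrt(evals[::-1][:k])  # corresponding singular values

# Reference: thin SVD of X itself.
U, sigma, VT = np.linalg.svd(X, full_matrices=False)

# Same vectors up to sign flips (assuming distinct singular values).
print(np.allclose(np.abs(V_eigh.T), np.abs(VT[:k])))  # True
print(np.allclose(sigma_eigh, sigma[:k]))             # True
```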

@MechCoder
Member

Hi @mblondel, I would like to work on this issue. I have played with the current implementation of TruncatedSVD using a few examples given in the research papers. If I understand correctly, you would like a new "eigh" algorithm option for computing the SVD decomposition of X?

Any specific pointers, before I actually start coding on this and submit a PR within 2-3 days?

@mblondel
Member Author

mblondel commented Nov 7, 2013

Nope, only the right singular vectors.

@larsmans
Member

We have this already. TruncatedSVD(algorithm="arpack") uses scipy.sparse.linalg.svds, which

is a naive implementation using ARPACK as an eigensolver on A.H * A or A * A.H, depending on which one is more efficient.

(http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.svds.html)
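For reference, a small usage sketch of that code path (the toy data here is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(0)
# Mostly-zero matrix, stored sparse (~10% non-zeros).
X = csr_matrix(rng.rand(100, 50) * (rng.rand(100, 50) > 0.9))

svd = TruncatedSVD(n_components=5, algorithm="arpack")
X_reduced = svd.fit_transform(X)  # routed through scipy.sparse.linalg.svds
print(X_reduced.shape)            # (100, 5)
print(svd.components_.shape)      # (5, 50) -- the right singular vectors
```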

@mblondel
Member Author

Thanks, I didn't know that. If we only need the right singular vectors, it seems that svds does a little bit more than necessary (but I guess we can live with that?).
https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/eigen/arpack/arpack.py#L1660

@larsmans
Member

I'm not sure what you mean; we always want both the components and the reduced X, right? (Otherwise, a PR to SciPy and a backported svds would seem more appropriate than introducing our own variant.)

@mblondel
Member Author

@larsmans
Member

That was actually supposed to be a TODO: find out if there's any case where doing safe_sparse_dot(X, VT.T).T would be faster than np.dot(U, Sigma.T). The latter is a matrix-vector multiplication of cost O(n_samples×n_components), while the former is typically a sparse-dense matrix multiplication of cost O(X.nnz×n_components), if I'm not mistaken. But since X.nnz is expected to be larger than n_samples and there's a large constant hiding in the latter big-O, I think the comment can go away.
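As a rough illustration of the two ways of forming the reduced X discussed above (the toy data and variable names are illustrative; both routes give the same result up to floating-point error):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.RandomState(0)
X = csr_matrix(rng.rand(200, 30) * (rng.rand(200, 30) > 0.8))

U, Sigma, VT = svds(X, k=5)

# Option 1: project the (sparse) data onto the components,
# cost roughly O(X.nnz * n_components).
X_reduced_project = X.dot(VT.T)

# Option 2: rescale the left singular vectors,
# cost roughly O(n_samples * n_components).
X_reduced_scale = U * Sigma  # Sigma broadcasts across the columns of U

print(np.allclose(X_reduced_project, X_reduced_scale))  # True
```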

@larsmans
Member

Shall we close this issue?

@mblondel
Member Author

I guess so. We could add the following comment:

svds is a naive implementation using ARPACK as an eigensolver on A.H * A or A * A.H, depending on which one is more efficient.
