Apache Mahout - SSVD PDF

10/15/2014
Apache Mahout: Scalable machine learning and data mining
Stochastic Singular Value Decomposition

The Stochastic SVD method in Mahout produces reduced rank Singular Value Decomposition output in its strict
mathematical definition: $A \approx U \Sigma V^{\top}$.
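To make the reduced rank definition concrete, here is a minimal sketch using base R's stock svd() rather than Mahout. The sizes, the seed and the variable names are illustrative assumptions, not taken from this page. It builds a matrix of exact rank k, keeps only the k leading singular triplets, and measures how well $U \Sigma V^{\top}$ reproduces the input.

```r
# Reduced rank SVD with base R's svd(): keep only the k leading triplets.
set.seed(42)                       # arbitrary seed, for reproducibility only
m <- 60; n <- 40; k <- 5           # illustrative sizes (assumption)

# Build a matrix of exact rank k so that a rank-k SVD can reproduce it.
a <- matrix(rnorm(m * k), m, k) %*% matrix(rnorm(k * n), k, n)

s <- svd(a, nu = k, nv = k)        # s$u is m x k, s$v is n x k
sigma <- diag(s$d[1:k])            # k x k diagonal matrix of singular values
a_k <- s$u %*% sigma %*% t(s$v)    # U Sigma V^T

# Relative Frobenius-norm error of the rank-k reconstruction.
rel_err <- norm(a - a_k, "F") / norm(a, "F")
print(rel_err)
```

For an input whose rank exceeds k, the same error measure would show how much is lost by truncating to rank k.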
The benefits over other methods are:

Fewer flops are required than with Krylov subspace methods.
In the map-reduce world, a fixed number of MR iterations is required regardless of the rank requested.
The precision/speed balance can be tuned with options.
A is a Distributed Row Matrix where rows may be identified by any Writable (such as a document path). As
such, it works directly on the output of seq2sparse.
As of 0.7 trunk, it includes a PCA and dimensionality reduction workflow (EXPERIMENTAL! Feedback on
performance, other PCA-related issues, and blogs is greatly appreciated.)
Map-Reduce characteristics:
SSVD uses at most 3 sequential MR steps (map-only + map-reduce + 2 optional parallel map-reduce jobs) to
produce a reduced rank approximation of the U, V and S matrices. Additionally, two more map-reduce steps are added
for each power iteration step, if requested.
Potential drawbacks:
The result is potentially less precise (but adding even one power iteration seems to improve that quite a bit).
Documentation
Overview and Usage
Note: Please use 0.6 or later! For the PCA workflow, please use 0.7 or later.
Publications
Nathan Halko's dissertation "Randomized methods for computing low-rank approximations of matrices" contains
a comprehensive definition of the parallelization strategy taken in the Mahout SSVD implementation, as well as some
precision/scalability benchmarks, esp. w.r.t. the Mahout Lanczos implementation on a typical corpus data set.
The Halko, Martinsson, Tropp paper discusses a family of random projection-based algorithms and contains theoretical
error estimates.
R simulation
Non-parallel SSVD simulation in R with power iterations and PCA options. Note that this implementation is not
the most optimal for a sequential flow solver; it is for demonstration purposes only.
However, try this R code to simulate a meaningful input:
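tests.R
# problem dimensions: x will be an m x n matrix of rank k
n <- 1000
m <- 2000
k <- 10
# number of power iterations to try
qi <- 1

# simulated input: known singular values k..1 with random orthonormal factors
svalsim <- diag(k:1)
usim <- qr.Q(qr(matrix(rnorm(m*k, mean=3), nrow=m, ncol=k)))
vsim <- qr.Q(qr(matrix(rnorm(n*k, mean=5), nrow=n, ncol=k)))
x <- usim %*% svalsim %*% t(vsim)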
Then compare the performance of ssvd.svd(x) and stock svd(x) for the same rank k, and note the difference in
running time. Also experiment with power iterations (qIter) and compare the accuracy of standard svd and SSVD.
Note: numerical stability of R algorithms may differ from that of Mahout's distributed version. We haven't studied
accuracy of the R simulation. For study of accuracy of Mahout's version, please refer to Nathan's dissertation as
referenced above.
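Because the simulated input is built from orthonormal factors and the diagonal k:1, its true nonzero singular values are known to be k, k-1, ..., 1, which gives a reference for accuracy checks. The sketch below applies that check to stock svd() only. The ssvd.svd function comes from the R script mentioned above and is not reproduced here, so applying the same measure to its output is left as an assumption about whatever singular values it returns. The sizes are deliberately smaller than in tests.R so the example runs quickly.

```r
# Accuracy and timing check for stock svd() on the simulated input.
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 100; m <- 200; k <- 10        # smaller than tests.R (assumption, for speed)

svalsim <- diag(k:1)
usim <- qr.Q(qr(matrix(rnorm(m * k, mean = 3), nrow = m, ncol = k)))
vsim <- qr.Q(qr(matrix(rnorm(n * k, mean = 5), nrow = n, ncol = k)))
x <- usim %*% svalsim %*% t(vsim)

# Time the decomposition and keep the result.
timing <- system.time(s <- svd(x, nu = k, nv = k))

# Largest deviation of the leading k singular values from the known k:1.
max_sv_err <- max(abs(s$d[1:k] - k:1))
# Largest of the remaining singular values, which should be near zero.
tail_sv <- max(s$d[(k + 1):length(s$d)])

print(timing[["elapsed"]])
print(max_sv_err)
print(tail_sv)
```

Timing the SSVD simulation with the same system.time() call gives a like-for-like comparison of running time.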
Modified SSVD Algorithm.
Given an $m \times n$ matrix $A$, a target rank $k \in \mathbb{N}_1$, an oversampling parameter $p \in \mathbb{N}_1$, and the number of
additional power iterations $q \in \mathbb{N}_0$, this procedure computes an $m \times (k + p)$ SVD $A \approx U \Sigma V^{\top}$:

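1. Create seed for random $n \times (k + p)$ matrix $\Omega$. The seed defines matrix $\Omega$ using Gaussian unit vectors
per one of the suggestions in Halko, Martinsson, Tropp.
2. $Y = A\Omega$, $Y \in \mathbb{R}^{m \times (k+p)}$.
3. Column-orthonormalize $Y \to Q$ by computing thin decomposition $Y = QR$. Also,
$Q \in \mathbb{R}^{m \times (k+p)}$, $R \in \mathbb{R}^{(k+p) \times (k+p)}$. I denote this as $Q = \mathrm{qr}(Y).Q$.
4. $B_0 = Q^{\top} A$: $B \in \mathbb{R}^{(k+p) \times n}$.
5. If $q > 0$ repeat: for $i = 1..q$: $B_i^{\top} = A^{\top} \mathrm{qr}(A B_{i-1}^{\top}).Q$ (power iterations step).
6. Compute Eigensolution of a small Hermitian $B_q B_q^{\top} = \hat{U} \Lambda \hat{U}^{\top}$,
$B_q B_q^{\top} \in \mathbb{R}^{(k+p) \times (k+p)}$.
7. Singular values $\Sigma = \Lambda^{0.5}$, or, in other words, $s_i = \sqrt{\sigma_i}$.
8. If needed, compute $U = Q \hat{U}$.
9. If needed, compute $V = B_q^{\top} \hat{U} \Sigma^{-1}$. Another way is $V = A^{\top} U \Sigma^{-1}$.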
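The steps of the modified SSVD algorithm can be sketched sequentially in base R. This is a minimal illustration under stated assumptions, not the Mahout implementation and not the R script referenced above: the function name ssvd_sketch, its default p, the use of plain Gaussian entries for the random matrix, and the choice to return only the leading k triplets are all hypothetical. It also assumes k + p does not exceed min(m, n).

```r
# Sequential sketch of the modified SSVD steps (hypothetical helper).
ssvd_sketch <- function(a, k, p = 10, q = 0) {   # p = 10 is an arbitrary default
  n <- ncol(a)
  r <- k + p
  # Step 1: random n x (k+p) matrix Omega (plain Gaussian entries, an assumption).
  omega <- matrix(rnorm(n * r), nrow = n, ncol = r)
  # Step 2: Y = A Omega.
  y <- a %*% omega
  # Step 3: Q = qr(Y).Q, an m x (k+p) matrix with orthonormal columns.
  qm <- qr.Q(qr(y))
  # Step 4: store B_0 transposed, B_0^T = A^T Q, an n x (k+p) matrix.
  bt <- crossprod(a, qm)
  # Step 5: power iterations, B_i^T = A^T qr(A B_{i-1}^T).Q.
  for (i in seq_len(q)) {
    qm <- qr.Q(qr(a %*% bt))
    bt <- crossprod(a, qm)
  }
  # Step 6: eigensolution of the small symmetric matrix B_q B_q^T.
  e <- eigen(crossprod(bt), symmetric = TRUE)
  # Step 7: singular values are square roots of the eigenvalues
  # (pmax guards against tiny negative values caused by rounding).
  sigma <- sqrt(pmax(e$values[1:k], 0))
  uhat <- e$vectors[, 1:k, drop = FALSE]
  # Step 8: U = Q U-hat, using the Q from the last QR step.
  u <- qm %*% uhat
  # Step 9: V = B_q^T U-hat Sigma^{-1}, dividing column j by sigma[j].
  v <- sweep(bt %*% uhat, 2, sigma, "/")
  list(u = u, d = sigma, v = v)
}

# Usage on a small simulated input with known singular values k:1.
set.seed(7)                        # arbitrary seed, for reproducibility only
m <- 200; n <- 100; k <- 10
usim <- qr.Q(qr(matrix(rnorm(m * k, mean = 3), nrow = m, ncol = k)))
vsim <- qr.Q(qr(matrix(rnorm(n * k, mean = 5), nrow = n, ncol = k)))
x <- usim %*% diag(k:1) %*% t(vsim)

res <- ssvd_sketch(x, k = k, p = 5, q = 1)
print(round(res$d, 6))
```

Storing $B^{\top}$ rather than $B$ is a design choice of this sketch: it lets both the power iteration and step 9 reuse the same $n \times (k+p)$ matrix without extra transposes.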
Copyright 2014 The Apache Software Foundation, Licensed under the Apache License, Version 2.0 .
Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
https://mahout.apache.org/users/dim-reduction/ssvd.html