Parallel Computing::: Cheat Sheet
• sparklyr, iotools
• pbdR (programming with big data in R)

2. Master - Worker Models :
(M tasks on C cores; usually 1 < C << M)

R Packages:
• snow, snowFT, snowfall
• foreach
• future, future.apply

1. generated on workers
# clusterApply(cl, x, FUN) e.g. FUN = function(n) rnorm(n)
2. generated on master and passed to workers
# ind <- splitIndices(200, 5)
# clusterApply(cl, ind, FUN)
# (-) : not efficient for Big Data : heavy
3. chunk on workers # copy of original Data on all workers
# clusterExport(cl, "M") e.g. M is a matrix
# clusterApply(cl, x, FUN) FUN contains subset M

foreach.R : Sequential
library(foreach) # by default returns a list

doParallel.R : backend of foreach
library(doParallel)
cl <- makeCluster(ncores) # ncores = 2,3,…
registerDoParallel(cl) # register the backend
foreach(…) %dopar% FUN(…)

doFuture.R : backend of foreach
library(doFuture)
registerDoFuture()
plan(cluster, workers = 3) # can be other plans
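
Example (sketch): the three ways of getting data to the workers listed above (generated on workers, generated on master and passed, chunked on workers) could look roughly like this with the base `parallel` package. The matrix `M`, the worker count, and the summary functions are illustrative assumptions, not part of the original sheet.

```r
library(parallel)

cl <- makeCluster(2)                 # assumption: 2 workers; adjust to your machine
clusterSetRNGStream(cl, iseed = 1)   # reproducible random streams on the workers

# 1. Data generated on the workers: each task draws its own random numbers.
res1 <- clusterApply(cl, rep(5, 4), function(n) mean(rnorm(n)))

# 2. Data generated on the master and passed to the workers chunk by chunk.
M <- matrix(seq_len(200 * 3), nrow = 200)   # hypothetical example data
ind <- splitIndices(nrow(M), 5)             # 5 chunks of row indices
chunks <- lapply(ind, function(i) M[i, , drop = FALSE])
res2 <- clusterApply(cl, chunks, function(chunk) sum(chunk))

# 3. Full copy of M exported to every worker; each task subsets M itself.
clusterExport(cl, "M", envir = environment())
res3 <- clusterApply(cl, ind, function(i) sum(M[i, , drop = FALSE]))

stopCluster(cl)

total2 <- sum(unlist(res2))
total3 <- sum(unlist(res3))
```

Approaches 2 and 3 return the same totals; they differ only in how much data travels to each worker.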
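
Example (sketch): a sequential `foreach` loop next to the same loop run through the doParallel backend. The loop body `sqrt(i)` and the choice of 2 workers are placeholders chosen for illustration.

```r
library(foreach)
library(doParallel)

# Sequential: %do% runs on the master and returns a list by default.
seq_res <- foreach(i = 1:4) %do% sqrt(i)

# Parallel: register a cluster as the backend, then switch to %dopar%.
cl <- makeCluster(2)                 # assumption: 2 workers
registerDoParallel(cl)               # register the backend

# .combine = c collapses the per-iteration results into a vector.
par_res <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

stopCluster(cl)
registerDoSEQ()                      # fall back to the sequential backend
```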
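
Example (sketch): the doFuture backend lets the `future` plan decide where `%dopar%` iterations run. This assumes the `future` and `doFuture` packages are installed; `multisession` with 2 workers stands in here for the `cluster` plan shown above, and the loop body is illustrative.

```r
library(foreach)
library(future)
library(doFuture)

registerDoFuture()                   # foreach now dispatches through future
plan(multisession, workers = 2)      # assumption: 2 background R sessions

fut_res <- foreach(i = 1:4, .combine = c) %dopar% i^2

plan(sequential)                     # release the background workers
```

Changing only the `plan(...)` line moves the same loop to a different execution strategy without touching the loop itself.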
RStudio® is a trademark of RStudio, Inc. • CC BY Ardalan Mirshani • ardeeshany@gmail.com • 814-777-8547 • ArdalanMirshani.com • Updated: 2019-03