Parallel Computing :: CHEAT SHEET

Splitting a code by:
1. Task (different tasks on the same data)
2. Data (one task on different chunks of data)

Hardware needs:
• CPU (2+ cores)
• RAM (shared memory vs distributed memory)

2 ideas in parallel computing:
1. Map-Reduce Models (distributed data; physically on different devices)
   • Hadoop
   • Spark
   R packages: sparklyr, iotools, pbdR (programming with big data in R)
2. Master-Worker Models (M tasks on C cores; usually 1 < C << M)
   (-) not efficient for Big Data: communication is heavy
   R packages: snow, snowFT, snowfall, foreach, future, future.apply

Not always parallel computing:
• stopping/starting a cluster takes time
• overhead: communication time between master and workers (not good for repeatedly sending big data!)

Sequential vs Parallel:
library(microbenchmark)
microbenchmark(FUN1(…), FUN2(…), times = 10)
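A concrete version of this benchmark (a minimal sketch; the data, cluster size, and functions are illustrative, not from the sheet). With a cheap task like mean(), the parallel version can lose to the sequential one because of the communication overhead noted above:

library(parallel)
library(microbenchmark)

x  <- replicate(8, rnorm(1e5), simplify = FALSE)   # 8 chunks of data
cl <- makeCluster(2)                               # 2 workers

microbenchmark(
  sequential = lapply(x, mean),
  parallel   = parLapply(cl, x, mean),             # same work, spread over the cluster
  times = 10
)

stopCluster(cl)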

parallel.R : core package
library(parallel)
ncores <- detectCores(logical = FALSE)   # physical cores
cl <- makeCluster(ncores)
clusterApply(cl, x = c(…), fun = FUN)    # runs FUN(x, …) on the workers
stopCluster(cl)

Initialization of workers:
clusterCall(cl, FUN)           # calls FUN on every worker
clusterEvalQ(cl, exp)          # evaluates an expression on every worker
  ## clusterEvalQ(cl, library(foo))
clusterExport(cl, varlist)     # copies the variables named in varlist to the workers
  ## clusterExport(cl, c("mean")) where mean = 10

Data Chunk on workers:
1. generated on the workers
   # clusterApply(cl, x, FUN)             e.g. FUN generates its own data with rnorm()
2. generated on the master and passed to the workers
   # ind <- splitIndices(200, 5)
   # clusterApply(cl, ind, FUN)
3. chunked on the workers (a copy of the original data on all workers)
   # clusterExport(cl, "M")               e.g. M is a matrix
   # clusterApply(cl, x, FUN)             FUN subsets M on the worker
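A runnable sketch that ties these pieces together (the matrix M, its size, and the use of colMeans are made up for the example):

library(parallel)

ncores <- detectCores(logical = FALSE)
cl <- makeCluster(ncores)

M <- matrix(rnorm(200 * 10), nrow = 200)       # data created on the master
clusterExport(cl, "M")                         # copy M to every worker
ind <- splitIndices(nrow(M), length(cl))       # one chunk of row indices per worker

res <- clusterApply(cl, ind, function(i) colMeans(M[i, , drop = FALSE]))
stopCluster(cl)

str(res)                                       # one partial result per worker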
foreach.R : Sequential
library(foreach)                                            # by default returns a list
foreach(n = rep(5, 3), m = 10^(0:2)) %do% FUN(n, m)         # iterates over n and m together
foreach(n = …, .packages = "X") %do% FUN(n)                 # FUN needs package X to run
foreach(n = …, .export = c("Y")) %do% FUN(n, b = Y)         # FUN needs the outside object/function "Y"
foreach(n = …, .combine = rbind) %do% FUN(n)                # row-bind the results
foreach(n = …, .combine = '+') %do% FUN(n)                  # sum the results (like rbind then colSums)
foreach(n = …, .combine = c) %do% FUN(n)                    # return a vector
foreach(n = …, .combine = c) %:% when(n > 2) %do% FUN(n)    # keep only the iterations where n > 2
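A small runnable illustration of these options (the toy function sq() is made up for the example):

library(foreach)

sq <- function(n, m = 0) n^2 + m                            # toy function

foreach(n = rep(5, 3), m = 10^(0:2)) %do% sq(n, m)          # list: 26, 35, 125
foreach(n = 1:4, .combine = c) %do% sq(n)                   # vector: 1 4 9 16
foreach(n = 1:4, .combine = rbind) %do% c(n, sq(n))         # one row per iteration
foreach(n = 1:4, .combine = c) %:% when(n > 2) %do% sq(n)   # 9 16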

future.R : asynchronous (a variable starts being computed as soon as it is created)
library(future)
plan(multicore)                 # plans: sequential, cluster, multicore, multiprocess
x %<-% mean(rnorm(100))         # x and y are futures, evaluated asynchronously
y %<-% mean(rnorm(100))

future.apply.R : parallel *apply functions
library(future.apply)
plan(multicore)                 # can be other plans
future_apply(…), future_lapply(…), future_sapply(…)    # parallel versions of apply(), lapply(), sapply()
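A minimal sketch (plan(multisession, workers = 2) is used here as a portable choice instead of the sheet's plan(multicore); the worker count and toy expressions are illustrative):

library(future)
library(future.apply)
plan(multisession, workers = 2)

x %<-% mean(rnorm(100))         # both futures start running in background sessions
y %<-% mean(rnorm(100))
x + y                           # blocks only until x and y are resolved

future_sapply(1:4, function(i) mean(rnorm(10^i)))       # parallel sapply()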

foreach.R : Parallel
needs a registered backend package that supports parallel computing:
• doParallel (built on parallel.R), doFuture (built on future.R), doSEQ
foreach(…) %dopar% FUN(…)

doParallel.R : backend of foreach
library(doParallel)
cl <- makeCluster(ncores)       # ncores = 2, 3, …
registerDoParallel(cl)          # register the backend
foreach(…) %dopar% FUN(…)

doFuture.R : backend of foreach
library(doFuture)
registerDoFuture()
plan(cluster, workers = 3)      # can be other plans
foreach(…) %dopar% FUN(…)
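A runnable sketch with the doParallel backend (the cluster size and loop body are illustrative):

library(doParallel)
library(foreach)

cl <- makeCluster(2)
registerDoParallel(cl)          # %dopar% now runs on this cluster

res <- foreach(n = 1:4, .combine = c) %dopar% mean(rnorm(10^n))   # each iteration runs on a worker
res

stopCluster(cl)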
Load Balancing (for uneven task times):
clusterApplyLB(cl, x, FUN)                                   # load-balanced; not worth it for very short tasks
clusterApply(cl, x = splitIndices(10, 2), FUN)               # pre-chunk the work yourself
library(itertools)
foreach(s = isplitVector(1:10, chunks = 2)) %dopar% FUN(s)   # e.g. FUN = function(s) sapply(s, "*", 100)
future_sapply(…, future.scheduling = 1)                      # on average one chunk of tasks per worker
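A small sketch contrasting the two scheduling styles (the slow() helper and its timings are made up for the example):

library(parallel)

cl <- makeCluster(2)
slow <- function(n) { Sys.sleep(n / 10); n }      # toy task whose run time grows with n

clusterApply(cl, 1:6, slow)                       # tasks handed out round-robin
clusterApplyLB(cl, 1:6, slow)                     # next task goes to whichever worker is free first

stopCluster(cl)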

RStudio® is a trademark of RStudio, Inc. • CC BY Ardalan Mirshani • ardeeshany@gmail.com • 814-777-8547 • ArdalanMirshani.com • Updated: 2019-03
