Pram Algorithms: Parallel and Distributed Algorithms BY Debdeep Mukhopadhyay AND Abhishek Somani
Pram Algorithms: Parallel and Distributed Algorithms BY Debdeep Mukhopadhyay AND Abhishek Somani
Pram Algorithms: Parallel and Distributed Algorithms BY Debdeep Mukhopadhyay AND Abhishek Somani
PRAM ALGORITHMS
2
1
27‐07‐2015
COST MODELS
Uniform Cost Criterion: each RAM instruction Consider an 8 bit adder.
requires one unit of time to execute. Every In the uniform cost criteria to
register requires one unit of space. analyze the run time of the
Logarithmic Cost Criterion: Assumes that adder, we would say that the
every instruction takes a logarithmic number adder takes 1 unit of time, ie.
of time units (wrt. the length of the T(N)=1.
operands), and that every register requires However, in the logarithmic
a logarithmic number of units of space. model you would consider that
Thus, uniform cost criteria count the number the 1’s position bits are added,
of operations and logarithmic cost criteria followed by the 2’s position bits,
count the number of bit operations. and so on. In this model, thus
there are 8 smaller additions
The uniform cost criterion is applicable if
the values manipulated by a program (for every bit positions) and each
always fit into one computer word. requires a unit of time. Thus,
T(N)=8. Generalizing,
T(N)=log(N).
4
2
27‐07‐2015
Analogous definitions hold for the space complexities (just replace the
time word by space).
3
27‐07‐2015
4
27‐07‐2015
PRAM MODELS
The models differ in how they handle read or write
conflicts, ie. when two or more processors attempt to read
from or write to the same global memory location.
1. EREW (Exclusive Read Exclusive Write) Read or write conflicts are not
allowed.
2. CREW (Concurrent Read Exclusive Write) Concurrent reading allowed, ie.
Multiple processors may read from the same global memory location
during the same instruction step. Write conflicts are not allowed.
1. During a given time, ie. During a given step of an algorithm, arbitrarily many PEs can read
the value of a cell simultaneously while at most one PE can write a value into a cell.
3. CRCW (Concurrent Read Concurrent Write): Concurrent reading and
writing are allowed. A variety of CRCW models exist with different
policies for handling concurrent writes to the same global address:
1. Common: All processors concurrently writing into the same global address must be writing
the same value.
2. Arbitrary: If multiple processors concurrently write to the same global address, one of the
competing processors is arbitrarily choses as the winner, and its value is written.
3. Priority: The processor with the lowest index succeeds in writing its value.
9
RELATIVE STRENGTHS
The EREW model is the weakest.
A CREW PRAM can execute any EREW PRAM algorithm in the same time. This is
obvious, as the concurrent read facility is not used.
Similarly, a CRCW PRAM can execute any EREW PRAM algorithm in the same amount
of time.
Because the PRIORITY PRAM model is stronger than the EREW PRAM
model, an algorithm to solve a problem on the EREW PRAM can have
higher time complexity than an algorithm solving the same problem on
the PRIORITY PRAM model.
10
5
27‐07‐2015
11
6
27‐07‐2015
IMPLICATION
A p-processor PRIORITY PRAM can be simulated by a p-processor
EREW PRAM with time complexity increased by a factor of Θ log .
13
PRAM ALGORITHMS
PRAM algorithms work in two phases:
First phase: a sufficient number of processors are activated.
Second phase: These activated processors perform the computation in
parallel.
Given a single active processor to begin with it is easy to see that
log activation steps are needed to activate p processors.
14
7
27‐07‐2015
PARALLEL REDUCTION
The binary tree is one of the most important paradigms of parallel
computing.
In the algorithms that we refer here, we consider an inverted binary
tree.
Data flows from the leaves to the root. These are called fan-in or reduction
operations.
16
8
27‐07‐2015
17
18
9
27‐07‐2015
A[2i]+A[2i+2j].
P0 P2 j=1
Note, the processor which is
active has an i such that:
i mod 2j=0 (ie. keep only P0 j=2
those processors active).
P0 j=3
Also check that the array
does not go out of bound.
ie, 2i+2j<n
19
20
10
27‐07‐2015
COMPLEXITY
21
PREFIX SUM
Given a set of n values a1, a2, …, an, and an associative operation
⊕, the prefix sum problem is to calculate the n quantities:
a1,
a1 ⊕ a2,
…
a1 ⊕ a2 ⊕ … ⊕ an
22
11
27‐07‐2015
23
24
12
27‐07‐2015
25
COMPLEXITY
Running time is t(n) = (lg n)
Cost is c(n) = p(n) t(n) = (n lg n)
Note not cost optimal, as RAM takes (n)
26
13
27‐07‐2015
27
28
14
27‐07‐2015
29
COMPLEXITY ANALYSIS
Analysis:
Step 1 takes O(k) = O(lg n) time.
Step 2 takes (lg m) = (lg n/k)
= O(lg n- lg k) = (lg n - lg lg n)
= (lg n)
Step 3 takes O(k) = O(lg n) time
The running time for this algorithm is (lg n).
The cost is ((lg n) n/(lg n)) = (n)
Cost optimal, as the sequential time is O(n)
30
15
27‐07‐2015
BROADCASTING ON A PRAM
“Broadcast” can be done on CREW PRAM in O(1) steps:
Broadcaster sends value to shared memory
Processors read from shared memory
M B
P P P P P P P P
Requires lg(P) steps on EREW PRAM.
16
27‐07‐2015
Concurrent-write
In step 4 and 5, processors with A[i] < A[j] write the same value ‘false’ into the same
location m[i]
This actually implements m[i] = (A[i] A[1]) … (A[i] A[n])
Is this work efficient?
No, n2 processors in O(1)
O(n2) work vs. sequential algorithm is O(n)
What is the time complexity for the Exclusive-write?
Initially elements “think” that they might be the maximum
First iteration: For n/2 pairs, compare.
n/2 elements might be the maximum.
Second iteration: n/4 elements might be the maximum.
log n th iteration: one element is the maximum.
So Fast_max with Exclusive-write takes O(log n).
O(1) (CRCW) vs. O(log n) (EREW)
CRCW
Hardware implementations are expensive
Used infrequently
Easier to program, runs faster, more powerful.
Implemented hardware is slower than that of EREW
In reality one cannot find maximum in O(1) time
EREW
Programming model is too restrictive
Cannot implement powerful algorithms
17