Lect5 Parallel System

The document discusses the performance measurement of parallel systems, focusing on criteria such as running time, speedup, number of processors, cost, and efficiency. It highlights the importance of speedup as a comparison metric between parallel and sequential algorithms, as well as factors limiting speedup like load imbalance and communication overhead. Additionally, it addresses the implications of processor count and cost optimality in achieving efficient parallel computation.

Performance of Parallel Systems

How should the performance of a parallel computation be measured?
A number of criteria are commonly used in evaluating the goodness of an algorithm:

1. Running time & Speedup
2. Number of Processors
3. Cost
4. Efficiency
1. Running time & Speedup

The primary reason for using a parallel algorithm is to speed up sequential computation.

Therefore, it is natural to compare the running time of a parallel algorithm designed for a certain problem to that of the best available sequential algorithm for the same problem. This ratio gives the speedup.
Speedup

SP = TS / TP

Where:
TS is the best possible sequential time.
TP is the time taken by a parallel algorithm on P processors.
What is meant by TS? Is it the time taken to run:
• A sequential algorithm on one processor of the parallel computer ?

• A sequential algorithm on the fastest serial machine available ?

• A parallel algorithm on a single processor ?

To keep things fair, TS should be the best possible time in the serial world.
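The ratio can be expressed as a small helper (a sketch; the timing values below are illustrative, not from the slides):

```python
def speedup(t_s, t_p):
    """S_P = T_S / T_P, where T_S is the best possible sequential time
    and T_P the time taken by the parallel algorithm on P processors."""
    return t_s / t_p

# illustrative values: a 76 s sequential run halved on two processors
print(speedup(76, 38))  # 2.0
```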
Speedup SP
If we consider SP as the ratio of the time taken by the parallel algorithm on one processor to the time taken by the parallel algorithm on P processors, the result is misleading: many parallel algorithms contain extra operations to accommodate the parallelism (e.g. communication), which inflates the one-processor time and thus exaggerates the speedup.
Speedup SP Limitation

SP = TS / TP where SP < P

Why?
1. Because the problem cannot always be decomposed into independent computations to be executed simultaneously while keeping all processors sufficiently busy.
2. Because the structure of the parallel computer used imposes certain restrictions (e.g. communications).
Speedup Curves

Superlinear Speedup

Speedup Linear Speedup

Typical Speedup

Number of Processors
Speedup Curves
• Whichever definition is used, the ideal is to produce linear speedup: a speedup of N using N processors.

• In practice, however, the typical speedup falls below this ideal value of N.

• Superlinear speedup results when:
  • unfair values are used for TS, or
  • there are differences in the nature of the hardware used.
Factors that limit Speedup SP
(sources of parallel overhead in parallel processing systems)

1. Load imbalance.
2. Communication overhead.
3. Extra computation.
Factors that limit Speedup SP

• Load imbalance
Speedup is generally limited by the speed of the slowest node, so an important consideration is to ensure that each node performs the same amount of work.

• Extra computation
Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation.
(Factors that limit Speedup SP)

• Communication overhead
Assuming that communication and calculation cannot be overlapped, any time spent in communication between processors directly degrades the speedup.

To conclude
Speedup SP does not measure how efficiently the
processors are being used.

Q: Is it worth using 100 processors to get a speedup of 2 ?


2. Number of Processors
Q: Why the number of processors?

1- Given two parallel algorithms that solve a problem with different numbers of processors, the one that uses fewer processors is preferred.

2- In some cases, an optimal time, or a certain speedup, can be achieved only with a given number of processors.

3- A minimum number of processors may be required to guarantee the success of the computation.

4- By trying to keep all of its processors continuously busy while solving a problem, a parallel algorithm may require a longer running time than if it used fewer processors.

5- An analysis may reveal that processors which are idle most of the time can be discarded while maintaining the same performance.

6- The structure of the parallel computer for which an algorithm is destined may not accommodate the number of processors the algorithm requires.
3. Cost

Suppose a parallel algorithm runs in time t(n) and uses p(n) processors to solve a problem of size n. Then the total number of steps executed is given by C(n):

C(n) = t(n) x p(n)

This is an upper bound; in some cases, not all processors are active throughout the t(n) time steps.
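The definition translates directly into code (a minimal sketch):

```python
def total_cost(t, p):
    """C(n) = t(n) * p(n): an upper bound on the total number of steps,
    since not all processors need be active throughout the t(n) steps."""
    return t * p

# e.g. 8 time steps on 16 processors
print(total_cost(8, 16))  # 128
```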
4. Efficiency

Efficiency is defined as the ratio of the speedup to the number of processors required to achieve it:

Efficiency ( EP ) = SP / P = TS / (P x TP)
4. Efficiency
• EP = 1
the parallel algorithm is cost optimal.

• EP < 1
the parallel algorithm is not cost optimal.

• EP > 1
a faster sequential algorithm can be obtained
by simulating the parallel one.
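In code (a sketch), this also answers the earlier question about using 100 processors for a speedup of 2:

```python
def efficiency(t_s, t_p, p):
    """E_P = S_P / P = T_S / (P * T_P)."""
    return t_s / (p * t_p)

# a speedup of 2 (t_s / t_p = 2) on 100 processors
print(efficiency(2, 1, 100))  # 0.02 -- almost certainly not worth it
```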
Example
Processors   Time (secs)   Speedup   Efficiency
    1             76         1.00       1.00
    2             38         2.00       1.00
    4             20         3.80       0.95
    5             16         4.75       0.95
    6             14         5.43       0.90
    8             11         6.91       0.86
    9             10         7.60       0.84
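The speedup and efficiency columns can be reproduced from the measured run times alone:

```python
# run times from the table above, keyed by processor count
times = {1: 76, 2: 38, 4: 20, 5: 16, 6: 14, 8: 11, 9: 10}
t_s = times[1]  # sequential baseline

for p, t_p in sorted(times.items()):
    s = t_s / t_p   # speedup
    e = s / p       # efficiency
    print(f"{p:2d} {t_p:3d} {s:5.2f} {e:5.2f}")
```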
Evaluating static interconnection networks

• Diameter
The diameter of a network is the maximum distance between any two processors in the network.

• Connectivity
The connectivity of a network is a measure of the multiplicity of paths between any two processors. A network with high connectivity is desirable because it lowers contention for communication resources.
Evaluating static interconnection networks
• Bisection width and bisection bandwidth
The bisection width of a network is the minimum number of communication links that have to be removed to partition the network into two halves.
The bisection bandwidth of a network is the minimum volume of communication allowed between any two halves of the network with an equal number of processors. It is the product of the bisection width and the channel bandwidth.

• Cost
The cost of a network is the number of communication links it requires.
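As a concrete instance, the standard values of these metrics for a p-node hypercube (an assumed example; the slide does not pick a specific topology) can be computed:

```python
import math

def hypercube_metrics(p):
    """Static-network metrics for a hypercube of p = 2^d processors."""
    d = int(math.log2(p))
    return {
        "diameter": d,              # longest shortest path is d = log2(p)
        "bisection_width": p // 2,  # links cut to split into two halves
        "cost": (p * d) // 2,       # total number of communication links
    }

print(hypercube_metrics(16))
```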
Cost-optimal

The cost of solving a problem on a parallel processor is the product of the parallel run time and the number of processors used. Cost is also referred to as work, or the processor-time product.

A parallel system is said to be cost optimal if the cost of solving a problem on a parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor:

Cost = Θ(TS)
The effect of data mapping on performance

* Using fewer than the maximum possible number of processors to execute a parallel algorithm is called scaling down a parallel system in terms of the number of processors.
* A naive way to scale down a parallel system is to design a parallel algorithm for one input element per processor and then use fewer processors to simulate a larger number of processors.

If there are n inputs and only p processors (p < n), we can use the parallel algorithm designed for n processors by assuming n virtual processors and having each of the p physical processors simulate n/p virtual processors.

When the number of processors decreases by a factor of n/p:

 - the computation at each processor increases by a factor of n/p;
 - the overall communication time does not grow by more than a factor of n/p;
 - the total run time increases by a factor of n/p (at most).
If a parallel system with n processors is cost optimal
 - a parallel system with p (p < n) processors is also cost optimal.

If a parallel system with n processors is not cost optimal to begin with
 - it may still not be cost optimal after the granularity of computation increases.
Ex: Adding n numbers on an n-processor hypercube.

TS = n - 1 ≈ n if n is large (e.g. TS = 15 for n = 16).

TP = 2 log(n)

S = n / (2 log(n))

E = 1 / (2 log(n))
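These formulas can be evaluated for a concrete size (a sketch; n = 16 matches the TS = 15 above):

```python
import math

def add_on_n_cube(n):
    """Adding n numbers on an n-processor hypercube (n a power of two).
    Each of the log2(n) combining steps costs one addition plus one
    communication, hence the factor 2."""
    t_s = n - 1                # ~ n for large n
    t_p = 2 * math.log2(n)
    s = n / t_p                # n / (2 log n)
    e = s / n                  # 1 / (2 log n)
    return t_s, t_p, s, e

print(add_on_n_cube(16))  # (15, 8.0, 2.0, 0.125)
```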
Adding n numbers on p (p < n) processors of a hypercube.

- We need (n/p) log(p) steps on p processors (communication steps).
- In the remaining steps no communication is required, as the remaining numbers are added locally: adding n/p numbers needs (n/p - 1) ≈ n/p computation time.

TP = 2(n/p) log(p) + (n/p) = 2(n/p)(log(p) + 0.5) ≈ 2(n/p) log(p)

Cost = 2(n/p) log(p) x p = 2n log(p) > n (the cost of adding n numbers on one processor)

This parallel system is not cost optimal.
Adding n numbers on p (p < n) processors of a hypercube: second method.

Step (a) takes computation time n/p.
Steps (b), (c), (d) add the partial results across the p processors, taking log(p) computation and communication time.

TP = (n/p) + 2 log(p)
Cost = n + 2p log(p)

Hence this parallel system is cost optimal as long as n = Ω(p log p), since the cost is then Θ(n).
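Both schemes can be compared directly (a sketch following the slides' formulas; log is base 2):

```python
import math

def method1_time(n, p):
    """Simulate n virtual processors on p physical ones:
    2(n/p) log p communication-heavy steps plus n/p local additions."""
    return 2 * (n / p) * math.log2(p) + n / p

def method2_time(n, p):
    """Add n/p numbers locally first, then combine across p processors."""
    return n / p + 2 * math.log2(p)

n, p = 1024, 16
print(method1_time(n, p), p * method1_time(n, p))  # 576.0 9216.0
print(method2_time(n, p), p * method2_time(n, p))  # 72.0 1152.0
```

For this size the second method is eight times faster and far cheaper, illustrating why the data mapping matters.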
The role of mapping computations onto processors in parallel algorithm design

The complete design of a parallel algorithm should take into account:
- the mapping of data onto processors;
- a description of its implementation on an arbitrary number of processors.

That is why we keep the input size and the number of processors as two separate variables when designing and analysing parallel algorithms.
Speed Scalability of parallel systems

n       P=1    P=4    P=8    P=16   P=32

64      1      .8     .57    .33    .17
192     1      .92    .8     .6     .38
320     1      .95    .87    .71    .5
512     1      .97    .91    .8     .62

Speed Scalability of parallel systems

Speed scalability of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processors. It reflects a parallel system's ability to utilize increasing processing resources effectively.
Minimum Execution time and minimum cost optimal execution time

The minimum execution time is the minimum possible execution time of a parallel algorithm; it is found by solving dTP/dp = 0 for p.

The minimum cost optimal execution time is the minimum time in which a problem can be solved by a cost optimal parallel system.
