Lect5 Parallel System

The document discusses the performance measurement of parallel systems, focusing on criteria such as running time, speedup, number of processors, cost, and efficiency. It highlights the importance of speedup as a comparison metric between parallel and sequential algorithms, as well as factors limiting speedup like load imbalance and communication overhead. Additionally, it addresses the implications of processor count and cost optimality in achieving efficient parallel computation.

Performance of Parallel Systems

How should the performance of a parallel computation be measured?
A number of criteria are commonly used in evaluating the goodness of an algorithm:

1. Running time & Speedup
2. Number of Processors
3. Cost
4. Efficiency
1. Running time & Speedup

The primary reason for using a parallel algorithm is to speed up sequential computation.

Therefore, it is natural to compare the running time of a parallel algorithm designed for a certain problem to that of the best available sequential algorithm for the same problem. This ratio gives the speedup.
Speedup

SP = TS / TP

Where:
TS is the best possible sequential time.
TP is the time taken by a parallel algorithm on P processors.
What is meant by TS? Is it the time taken to run:
• A sequential algorithm on one processor of the parallel computer ?

• A sequential algorithm on the fastest serial machine available ?

• A parallel algorithm on a single processor ?

To keep things fair, TS should be the best possible time in the serial world.
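The ratio can be expressed as a small helper (a sketch; the timing values below are illustrative, not from the slides):

```python
def speedup(t_s, t_p):
    """S_P = T_S / T_P, where T_S is the best possible sequential time
    and T_P the time taken by the parallel algorithm on P processors."""
    return t_s / t_p

# illustrative values: a 76 s sequential run halved on two processors
print(speedup(76, 38))  # 2.0
```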
Speedup SP
If we consider SP as the ratio of the time taken by the parallel algorithm on one processor to the time taken by the parallel algorithm on P processors, the result is misleading: many parallel algorithms contain extra operations to accommodate the parallelism (e.g. communication), which inflates the one-processor time and thus exaggerates the speedup.
Speedup SP Limitation

SP = TS / TP where SP < P

Why?
1. Because the problem cannot always be decomposed into independent computations to be executed simultaneously while keeping all processors sufficiently busy.
2. Because the structure of the parallel computer used imposes certain restrictions (e.g. communications).
Speedup Curves

Superlinear Speedup

Speedup Linear Speedup

Typical Speedup

Number of Processors
Speedup Curves
• Whichever definition is used, the ideal is to produce linear speedup: a speedup of N using N processors.

• In practice, however, the typical speedup falls below this ideal value of N.

• Superlinear speedup results when:
  • unfair values are used for TS, or
  • there are differences in the nature of the hardware used.
Factors that limit Speedup SP
(sources of parallel overhead in parallel processing systems)

1. Load imbalance.
2. Communication overhead.
3. Extra computation.
Factors that limit Speedup SP

• Load imbalance
Speedup is generally limited by the speed of the slowest node, so an important consideration is to ensure that each node performs the same amount of work.

• Extra computation
Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation.
(Factors that limit Speedup SP)

• Communication overhead
Assuming that communication and calculation cannot be overlapped, any time spent in communication between processors directly degrades the speedup.

To conclude
Speedup SP does not measure how efficiently the
processors are being used.

Q: Is it worth using 100 processors to get a speedup of 2 ?


2. Number of Processors
Q: Why the number of processors?

1- Given two parallel algorithms that solve a problem with different numbers of processors, the one that uses fewer processors is preferred.

2- In some cases, an optimal time, or a certain speedup, can be achieved only with a given number of processors.

3- A minimum number of processors may be required to guarantee the success of the computation.

4- By trying to keep all of its processors continuously busy while solving a problem, a parallel algorithm may require a longer running time than if it used fewer processors.

5- An analysis may reveal that processors which are idle most of the time can be discarded while maintaining the same performance.

6- The structure of the parallel computer for which an algorithm is destined may not accommodate the number of processors the algorithm requires.
3. Cost

Suppose a parallel algorithm runs in time t(n) and uses p(n) processors to solve a problem of size n. Then the total number of steps executed is given by C(n):

C(n) = t(n) x p(n)

This is an upper bound; in some cases, not all processors are active throughout the t(n) time steps.
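The definition translates directly into code (a minimal sketch):

```python
def total_cost(t, p):
    """C(n) = t(n) * p(n): an upper bound on the total number of steps,
    since not all processors need be active throughout the t(n) steps."""
    return t * p

# e.g. 8 time steps on 16 processors
print(total_cost(8, 16))  # 128
```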
4. Efficiency

Efficiency is defined as the ratio of the speedup to the number of processors required to achieve it:

Efficiency ( EP ) = SP / P = TS / (P x TP)
4. Efficiency
• EP = 1
the parallel algorithm is cost optimal.

• EP < 1
the parallel algorithm is not cost optimal.

• EP > 1
a faster sequential algorithm can be obtained
by simulating the parallel one.
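In code (a sketch), this also answers the earlier question about using 100 processors for a speedup of 2:

```python
def efficiency(t_s, t_p, p):
    """E_P = S_P / P = T_S / (P * T_P)."""
    return t_s / (p * t_p)

# a speedup of 2 (t_s / t_p = 2) on 100 processors
print(efficiency(2, 1, 100))  # 0.02 -- almost certainly not worth it
```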
Example
Processors   Time (secs)   Speedup   Efficiency
    1             76         1.00       1.00
    2             38         2.00       1.00
    4             20         3.80       0.95
    5             16         4.75       0.95
    6             14         5.43       0.90
    8             11         6.91       0.86
    9             10         7.60       0.84
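The speedup and efficiency columns can be reproduced from the measured run times alone:

```python
# run times from the table above, keyed by processor count
times = {1: 76, 2: 38, 4: 20, 5: 16, 6: 14, 8: 11, 9: 10}
t_s = times[1]  # sequential baseline

for p, t_p in sorted(times.items()):
    s = t_s / t_p   # speedup
    e = s / p       # efficiency
    print(f"{p:2d} {t_p:3d} {s:5.2f} {e:5.2f}")
```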
Evaluating static interconnection networks

• Diameter
The diameter of a network is the maximum distance between any two processors in the network.

• Connectivity
The connectivity of a network is a measure of the multiplicity of paths between any two processors. A network with high connectivity is desirable because it lowers contention for communication resources.
Evaluating static interconnection networks
• Bisection width and bisection bandwidth
The bisection width of a network is the minimum number of communication links that have to be removed to partition the network into two halves.
The bisection bandwidth of a network is the minimum volume of communication allowed between any two halves of the network with an equal number of processors. It is the product of the bisection width and the channel bandwidth.

• Cost
The cost of a network is the number of communication links it requires.
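As a concrete instance, the standard values of these metrics for a p-node hypercube (an assumed example; the slide does not pick a specific topology) can be computed:

```python
import math

def hypercube_metrics(p):
    """Static-network metrics for a hypercube of p = 2^d processors."""
    d = int(math.log2(p))
    return {
        "diameter": d,              # longest shortest path is d = log2(p)
        "bisection_width": p // 2,  # links cut to split into two halves
        "cost": (p * d) // 2,       # total number of communication links
    }

print(hypercube_metrics(16))
```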
Cost-optimal

The cost of solving a problem on a parallel processor is the product of the parallel run time and the number of processors used. Cost is also referred to as work, or the processor-time product.

A parallel system is said to be cost optimal if the cost of solving a problem on a parallel computer is proportional to the execution time of the fastest known sequential algorithm on a single processor:

Cost = Θ(TS)
The effect of data mapping on performance

* Using fewer than the maximum possible number of processors to execute a parallel algorithm is called scaling down a parallel system in terms of the number of processors.
* A naive way to scale down a parallel system is to design a parallel algorithm for one input element per processor and then use fewer processors to simulate a larger number of processors.

If there are n inputs and only p processors (p < n), we can use the parallel algorithm designed for n processors by assuming n virtual processors and having each of the p physical processors simulate n/p virtual processors.

When the number of processors decreases by a factor of n/p:

 - the computation at each processor increases by a factor of n/p;
 - the overall communication time does not grow by more than a factor of n/p;
 - the total run time increases by a factor of n/p (at most).
If a parallel system with n processors is cost optimal
 - a parallel system with p (p < n) processors is also cost optimal.

If a parallel system with n processors is not cost optimal to begin with
 - it may still not be cost optimal after the granularity of computation increases.
Ex: Adding n numbers on an n-processor hypercube.

TS = n - 1 ≈ n if n is large (e.g. TS = 15 for n = 16).

TP = 2 log(n)

S = n / (2 log(n))

E = 1 / (2 log(n))
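These formulas can be evaluated for a concrete size (a sketch; n = 16 matches the TS = 15 above):

```python
import math

def add_on_n_cube(n):
    """Adding n numbers on an n-processor hypercube (n a power of two).
    Each of the log2(n) combining steps costs one addition plus one
    communication, hence the factor 2."""
    t_s = n - 1                # ~ n for large n
    t_p = 2 * math.log2(n)
    s = n / t_p                # n / (2 log n)
    e = s / n                  # 1 / (2 log n)
    return t_s, t_p, s, e

print(add_on_n_cube(16))  # (15, 8.0, 2.0, 0.125)
```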
Adding n numbers on p (p < n) processors of a hypercube.

- We need (n/p) log(p) steps on p processors (communication steps).
- In the remaining steps no communication is required, as the remaining numbers are added locally: adding n/p numbers needs (n/p - 1) ≈ n/p computation time.

TP = 2(n/p) log(p) + (n/p) = 2(n/p)(log(p) + 0.5) ≈ 2(n/p) log(p)

Cost = 2(n/p) log(p) x p = 2n log(p) > n (the cost of adding n numbers on one processor)

This parallel system is not cost optimal.
Adding n numbers on p (p < n) processors of a hypercube: second method.

Step (a) takes computation time n/p.
Steps (b), (c), (d) add the partial results across the p processors, taking log(p) computation and communication time.

TP = (n/p) + 2 log(p)
Cost = n + 2p log(p)

Hence this parallel system is cost optimal as long as n = Ω(p log p), since the cost is then Θ(n).
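Both schemes can be compared directly (a sketch following the slides' formulas; log is base 2):

```python
import math

def method1_time(n, p):
    """Simulate n virtual processors on p physical ones:
    2(n/p) log p communication-heavy steps plus n/p local additions."""
    return 2 * (n / p) * math.log2(p) + n / p

def method2_time(n, p):
    """Add n/p numbers locally first, then combine across p processors."""
    return n / p + 2 * math.log2(p)

n, p = 1024, 16
print(method1_time(n, p), p * method1_time(n, p))  # 576.0 9216.0
print(method2_time(n, p), p * method2_time(n, p))  # 72.0 1152.0
```

For this size the second method is eight times faster and far cheaper, illustrating why the data mapping matters.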
The role of mapping computations onto processors in parallel algorithm design

The complete design of a parallel algorithm should take into account:
- the mapping of data onto processors;
- a description of its implementation on an arbitrary number of processors.

That is why we keep the input size and the number of processors as two separate variables when designing and analysing parallel algorithms.
Speed Scalability of parallel systems

n       P=1    P=4    P=8    P=16   P=32

64      1      .8     .57    .33    .17
192     1      .92    .8     .6     .38
320     1      .95    .87    .71    .5
512     1      .97    .91    .8     .62

Speed Scalability of parallel systems

Speed scalability of a parallel system is a measure of its capacity to increase speedup in proportion to the number of processors. It reflects a parallel system's ability to utilize increasing processing resources effectively.
Minimum Execution time and minimum cost optimal execution time

The minimum execution time is the minimum possible execution time of a parallel algorithm; it is found by solving dTP/dp = 0 for p.

The minimum cost optimal execution time is the minimum time in which a problem can be solved by a cost optimal parallel system.
