Lecture 4: Analytical Modeling of Parallel Programs
Parallel Computing
Fall 2008
Performance Metrics for Parallel Systems
Number of processing elements p
Execution Time
Parallel runtime: the time that elapses from the
moment a parallel computation starts to the moment
the last processing element finishes execution.
Ts: serial runtime
Tp: parallel runtime
Total Parallel Overhead T0
Total time collectively spent by all the processing elements, over and above the time required by the fastest known sequential algorithm for solving the same problem on a single processing element.
T0 = p*Tp - Ts
Performance Metrics for Parallel Systems
Speedup S:
The ratio of the serial runtime of the best sequential algorithm
for solving a problem to the time taken by the parallel
algorithm to solve the same problem on p processing elements.
S = Ts/Tp, where Ts is the runtime of the best known sequential algorithm
Example: adding n numbers: Tp = Θ(log n), Ts = Θ(n), S = Θ(n/log n) (see the short sketch at the end of this slide)
Theoretically, speedup can never exceed the number of processing elements p (S ≤ p).
Proof: Assume a speedup greater than p. Then each processing element spends less than Ts/p time solving the problem, i.e. Tp < Ts/p. In this case, a single processing element could emulate the p processing elements and solve the problem in p*Tp < Ts units of time. This is a contradiction, because speedup, by definition, is computed with respect to the best sequential algorithm.
Superlinear speedup: In practice, a speedup greater than p is sometimes observed. This usually happens when the work performed by the serial algorithm is greater than that performed by its parallel formulation, or when hardware features put the serial implementation at a disadvantage.
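As a rough sanity check of these definitions, here is a minimal Python sketch (assuming the idealized counts Ts = n - 1 additions and Tp = log2 n parallel steps for the adding-n-numbers example on p = n processing elements); it also evaluates the total overhead T0 = p*Tp - Ts defined earlier.

```python
import math

def speedup_metrics(n):
    """Idealized counts for adding n numbers on p = n processing elements."""
    Ts = n - 1                # serial work: n - 1 additions
    Tp = math.log2(n)         # parallel steps (n assumed a power of two)
    p = n
    S = Ts / Tp               # speedup S = Ts / Tp, Theta(n / log n)
    T0 = p * Tp - Ts          # total parallel overhead T0 = p*Tp - Ts
    return S, T0

for n in (16, 1024, 2**20):
    S, T0 = speedup_metrics(n)
    print(f"n = {n:>7}: S = {S:10.1f}, T0 = {T0:14.0f}")
```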
Example for Superlinear Speedup
Superlinear speedup:
Example 1: Superlinear effects from caches. With a problem instance of size A and a 64 KB cache, the cache hit rate is 80%. Assume a cache latency of 2 ns and a DRAM latency of 100 ns; the average memory access time is then 2*0.8 + 100*0.2 = 21.6 ns. If the computation is memory bound and performs one FLOP per memory access, this corresponds to a processing rate of 46.3 MFLOPS. When the problem is split across two processing elements, each works on an instance of size A/2 with its own 64 KB cache, so the cache hit rate is higher, i.e., 90%; 8% of the accesses are served by the local DRAM and the remaining 2% by remote DRAM with a latency of 400 ns, giving an average memory access time of 2*0.9 + 100*0.08 + 400*0.02 = 17.8 ns. The corresponding execution rate at each processor is 56.18 MFLOPS, and for two processors the total processing rate is 112.36 MFLOPS. The speedup is then 112.36/46.3 = 2.43!
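The arithmetic above can be verified with a short Python sketch; the hit rates and latencies are exactly the ones assumed in the example.

```python
def mflops(access_time_ns):
    """One FLOP per memory access -> rate in MFLOPS."""
    return 1e3 / access_time_ns   # 1 op per t ns = 1000/t MFLOPS

# One processor, size A: 80% cache hits (2 ns), 20% local DRAM (100 ns)
t1 = 2 * 0.80 + 100 * 0.20                      # 21.6 ns
# Two processors, size A/2 each: 90% cache, 8% local DRAM, 2% remote DRAM (400 ns)
t2 = 2 * 0.90 + 100 * 0.08 + 400 * 0.02         # 17.8 ns

r1 = mflops(t1)                  # ~46.3 MFLOPS on one processor
r2 = 2 * mflops(t2)              # ~112.36 MFLOPS aggregate on two processors
print(f"t1 = {t1} ns, t2 = {t2} ns, speedup = {r2 / r1:.2f}")   # ~2.43
```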
Example for Superlinear Speedup
Superlinear speedup:
Example 2: Superlinear effects due to exploratory decomposition. Consider exploring the leaf nodes of an unstructured tree. Each leaf has a label associated with it, and the objective is to find a node with a specified label, say S. The solution node is the rightmost leaf in the tree. A serial formulation of this problem based on depth-first traversal explores the entire tree, i.e. all 14 nodes, taking 14 units of time. Now consider a parallel formulation in which the left subtree is explored by processing element 0 and the right subtree by processing element 1. The total work done by the parallel algorithm is only 9 nodes, and the corresponding parallel time is 5 units. The speedup is therefore 14/5 = 2.8.
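A toy Python illustration of the same effect (the tree below is made up for the sketch and is not the 14-node tree from the slide's figure): it counts the nodes a serial depth-first search visits before reaching the target, against the steps needed when two processing elements each search one subtree.

```python
# Hypothetical unstructured tree: each node is (label, list of children).
tree = ('a', [
    ('b', [('c', []), ('d', [('e', []), ('f', [])])]),  # left subtree: no solution
    ('g', [('h', []), ('S', [])]),                       # right subtree holds 'S'
])

def dfs_visits(node, target):
    """Return (nodes visited, found?) for a depth-first search that stops at target."""
    label, children = node
    visited = 1
    if label == target:
        return visited, True
    for child in children:
        v, found = dfs_visits(child, target)
        visited += v
        if found:
            return visited, True
    return visited, False

serial, _ = dfs_visits(tree, 'S')            # one PE, left subtree explored first
left_subtree, right_subtree = tree[1]
work0, _ = dfs_visits(left_subtree, 'S')     # PE 0 exhausts its subtree (no 'S')
work1, _ = dfs_visits(right_subtree, 'S')    # PE 1 stops as soon as it finds 'S'
parallel = work1                             # search ends when PE 1 reports success
print(f"serial: {serial} nodes, parallel: {parallel} steps, "
      f"speedup on 2 PEs: {serial / parallel:.2f}")
```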
Performance Metrics for Parallel Systems (cont.)
Efficiency E
Ratio of speedup to the number of processing elements: E = S/p
A measure of the fraction of time for which a processing element is usefully employed.
Example: adding n numbers on n processing elements: Tp = Θ(log n), Ts = Θ(n), S = Θ(n/log n), E = Θ(1/log n)
Cost (also called Work or processor-time product) W
Product of the parallel runtime and the number of processing elements used: W = Tp*p
Example: adding n numbers on n processing elements: W = Θ(n log n).
Cost-optimal: a parallel system is cost-optimal if the cost of solving a problem on the parallel computer has the same asymptotic growth (in Θ terms), as a function of the input size, as the fastest known sequential algorithm on a single processing element.
Problem Size W2
The number of basic computation steps in the best sequential algorithm to solve the problem on a single processing element.
W2 = Ts of the fastest known algorithm to solve the problem on a sequential computer.
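A small Python sketch, in the same spirit as the speedup sketch earlier, that evaluates efficiency and cost for adding n numbers on p = n processing elements; the printed cost grows as n log n while the serial time grows only as n, which is why this formulation is not cost-optimal.

```python
import math

def adding_n_numbers(n):
    """Idealized metrics for adding n numbers on p = n processing elements."""
    p = n
    Ts = n - 1              # serial work: n - 1 additions
    Tp = math.log2(n)       # parallel time: log n steps
    E = (Ts / Tp) / p       # efficiency E = S/p, Theta(1/log n)
    W = p * Tp              # cost W = p*Tp, Theta(n log n)
    return Ts, W, E

for n in (64, 4096, 2**20):
    Ts, W, E = adding_n_numbers(n)
    print(f"n = {n:>7}: Ts = {Ts:>8}, W = {W:>12.0f}, E = {E:.3f}")
```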
Parallel vs Sequential Computing: Amdahl's Law
Theorem 0.1 (Amdahl's Law) Let f, 0 ≤ f ≤ 1, be the fraction of a computation that is inherently sequential. Then the maximum obtainable speedup S on p processors is S ≤ 1/(f + (1-f)/p).
Proof. Let T be the sequential running time of the computation. fT is the time spent on the inherently sequential part of the program. On p processors the remaining computation, if fully parallelizable, would achieve a running time of at most (1-f)T/p. Thus the running time of the parallel program on p processors is at least the sum of the execution times of the sequential and parallel components, that is, fT + (1-f)T/p. The maximum obtainable speedup is therefore S ≤ T/(fT + (1-f)T/p) = 1/(f + (1-f)/p), and the result is proven.
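A one-function Python sketch of the bound just derived; for f = 0.1 the speedup flattens toward 1/f = 10 as p grows.

```python
def amdahl(f, p):
    """Upper bound on speedup when a fraction f of the work is inherently sequential."""
    return 1.0 / (f + (1.0 - f) / p)

for p in (2, 8, 64, 1024, 10**6):
    print(f"p = {p:>7}: S <= {amdahl(0.1, p):6.2f}")   # approaches 10 as p -> infinity
```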
Amdahl's Law
Amdahl used this observation to advocate building even more powerful sequential machines, since one cannot gain much by using parallel machines. For example, if f = 10%, then S ≤ 10 as p → ∞. The underlying assumption in Amdahl's Law is that the sequential component of a program is a constant fraction of the whole program. In many instances, as the problem size increases, the fraction of the computation that is inherently sequential decreases. In many cases even a speedup of 10 is quite significant by itself.
In addition, Amdahl's Law is based on the premise that parallel computing always tries to minimize the parallel running time. In some cases a parallel computer is used instead to increase the problem size that can be solved in a fixed amount of time. For example, in weather prediction this would increase the accuracy of, say, a three-day forecast, or would allow a more accurate five-day forecast.
Parallel vs Sequential Computing: Gustafson's Law
Theorem 0.2 (Gustafson's Law) Let the execution time of a parallel algorithm consist of a sequential segment fT and a parallel segment (1-f)T, where the sequential segment is constant. The scaled speedup of the algorithm is then
S = (fT + (1-f)Tp)/(fT + (1-f)T) = f + (1-f)p
For f = 0.05 and p = 20, we get S = 19.05, whereas Amdahl's law gives S ≤ 10.26.
[Figure: execution on p processors takes time T = fT + (1-f)T, while the same scaled computation on 1 processor would take fT + (1-f)Tp = T(f + (1-f)p).]
Amdahl's Law assumes that the problem size is fixed when it deals with scalability. Gustafson's Law assumes that the running time is fixed.
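A quick Python check of the numbers quoted above, assuming p = 20 processors:

```python
def gustafson(f, p):
    """Scaled speedup with a constant sequential segment fT."""
    return f + (1.0 - f) * p

def amdahl(f, p):
    """Fixed-problem-size speedup bound, for comparison."""
    return 1.0 / (f + (1.0 - f) / p)

f, p = 0.05, 20
print(f"Gustafson scaled speedup: {gustafson(f, p):.2f}")   # 19.05
print(f"Amdahl bound:             {amdahl(f, p):.2f}")      # 10.26
```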
Brent's Scheduling Principle (Emulation)
Suppose we have an efficient parallel algorithm with unlimited parallelism, i.e. an algorithm that runs on zillions of processors. In practice, zillions of processors may not be available; suppose we have only p processors. A question that arises is: what can we do to run the efficient zillion-processor algorithm on our limited machine?
One answer is emulation: simulate the zillion-processor algorithm on the p-processor machine.
Theorem 0.3 (Brent's Principle) Suppose a parallel algorithm performs m operations in total and runs in parallel time t. Then running this algorithm on a machine with only p processors requires time at most m/p + t.
Proof: Let m_i be the number of computational operations at the i-th step, i.e. m_1 + m_2 + ... + m_t = m. If we assign the p processors to work on the m_i operations of the i-th step, they can complete that step in time ⌈m_i/p⌉ ≤ m_i/p + 1. Thus the total running time on p processors would be
Σ_{i=1}^{t} ⌈m_i/p⌉ ≤ Σ_{i=1}^{t} (m_i/p + 1) = m/p + t.
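The emulation argument can be sketched in a few lines of Python (the per-step operation counts m_i here are made up at random): the emulated time, the sum of ⌈m_i/p⌉ over all steps, never exceeds the bound m/p + t.

```python
import math
import random

def emulate(step_ops, p):
    """Time to emulate a schedule with per-step operation counts step_ops on p processors."""
    return sum(math.ceil(m_i / p) for m_i in step_ops)

random.seed(0)
step_ops = [random.randint(1, 500) for _ in range(40)]   # m_i for t = 40 parallel steps
m, t, p = sum(step_ops), len(step_ops), 8

emulated = emulate(step_ops, p)
print(f"emulated time = {emulated}, Brent bound m/p + t = {m / p + t:.1f}")
assert emulated <= m / p + t
```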
End
Thank you!