Parallel Programming
Parallel Computers (2)
Parallel Programming - Lecture 2
Potential for Increased Computational Speed
Speedup Factor
What is the Maximum Speedup?
Speedup Factor
$$S(p) = \frac{\text{Execution time using one processor (best sequential algorithm)}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \frac{t_s}{t_p}$$
where:
$t_s$ is the execution time of the best sequential algorithm running on a single processor;
$t_p$ is the execution time for solving the same problem on a multiprocessor.
S(p) gives the increase in speed obtained by using the multiprocessor.
The underlying algorithm used for the parallel implementation might be (and usually is) different from that of the sequential version.
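As a purely illustrative worked example (the numbers are assumed, not from the lecture): if the best sequential program takes $t_s = 20$ s and the parallel version takes $t_p = 5$ s on 8 processors, then
$$S(p) = \frac{20}{5} = 4,$$
a fourfold speedup, which is well below the processor count of 8.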
Speedup Factor
In theoretical analysis, the speedup factor can also be cast in terms of computational steps:
$$S(p) = \frac{\text{Number of computational steps using one processor}}{\text{Number of parallel computational steps with } p \text{ processors}}$$
Can also extend time complexity to parallel
computations.
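As a standard textbook illustration of the step-count form (not on this slide, and assuming an idealized machine): adding $n$ numbers takes $n - 1$ steps on one processor, while a tree reduction using $n/2$ processors needs about $\log_2 n$ parallel steps, giving
$$S(p) \approx \frac{n-1}{\log_2 n}.$$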
Speedup Factor
The maximum speedup is usually p with p
processors (linear speedup).
$$S(p) \le \frac{t_s}{t_s/p} = p$$
It is possible to get superlinear speedup ($S(p) > p$), for example because the original sequential algorithm was not optimal.
One common reason for superlinear speedup is the extra memory available in a multiprocessor system.
What is the Maximum Speedup?
Several factors will appear as overhead in the
parallel version and limit the speedup:
1. Periods when not all the processors can be
performing useful work and are simply idle.
2. Extra computations in the parallel version not
appearing in the sequential version, for
example, to recompute the constants locally.
3. Communication time between processes.
Maximum Speedup
Maximum Speedup: Amdahl’s law
If f is the fraction of the computation that cannot be parallelized (the inherently serial part), the speedup factor is given by:
$$S(p) = \frac{t_s}{f\,t_s + (1-f)\,t_s/p} = \frac{p}{1 + (p-1)f}$$
This equation is known as Amdahl's law.
Speedup against number of processors
Even with an infinite number of processors, the maximum speedup is limited to 1/f:
$$\lim_{p \to \infty} S(p) = \frac{1}{f}$$
Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
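As a rough illustration (a minimal C sketch using the formula above, not part of the lecture), the following program evaluates Amdahl's law for an assumed serial fraction f = 0.05 and shows the speedup levelling off near 1/f = 20:

```c
#include <stdio.h>

/* Predicted speedup by Amdahl's law for serial fraction f and p processors. */
double amdahl_speedup(double f, int p) {
    return (double)p / (1.0 + (p - 1) * f);
}

int main(void) {
    const double f = 0.05;                      /* assume 5% of the work is serial */
    const int procs[] = {4, 16, 64, 256, 1024};
    for (int i = 0; i < 5; i++)
        printf("p = %4d   S(p) = %6.2f\n", procs[i], amdahl_speedup(f, procs[i]));
    /* The printed speedups approach 1/f = 20, however many processors are used. */
    return 0;
}
```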
Speedup against number of processors
Types of Parallel Computers
There are two basic types of parallel computers:
1. Shared memory multiprocessor
2. Distributed memory multicomputer
Conventional Computer
A conventional computer consists of a processor
executing a program stored in a (main) memory:
Each main memory location is identified by its address.
Addresses start at 0 and extend to $2^b - 1$ when there are b bits (binary digits) in the address.
Shared Memory Multiprocessor System
A natural way to extend the single-processor model is to have multiple processors connected to multiple memory modules, such that each processor can access any memory module, in a so-called shared memory configuration.
Shared Memory Multiprocessor System
The connection between the processors and
memory is through some form of
interconnection network.
A shared memory multiprocessor system employs a single address space, which means that each location in the whole main memory system has a unique address that each processor uses to access the location.
Programming Shared Memory Multiprocessors
Programming a shared memory multiprocessor
involves having executable code stored in the
shared memory for each processor to execute.
The data for each program will also be stored in
the shared memory, and each program could
access all the data if needed.
Programming Shared Memory Multiprocessors
One way to produce the executable code for each processor is to use a high-level parallel programming language that has special parallel programming constructs and statements for declaring shared variables and parallel code sections.
Programming Shared Memory Multiprocessors
The programming approaches can be divided into:
1. A regular sequential programming language with preprocessor directives to specify the parallelism. Example: OpenMP, an industry-standard set of compiler directives and constructs added to C/C++ and Fortran (a minimal sketch follows this list).
2. Threads, in which regular high-level language code sequences are written for the individual processors.
3. A regular sequential programming language with modified syntax to specify the parallelism. Example: UPC (Unified Parallel C).
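As an illustration of approach 1 only (a minimal sketch, not taken from the lecture; the array, its size, and the reduction are invented for the example), the following C fragment uses OpenMP directives to mark a shared array and a parallel code section. It can be built with an OpenMP-capable compiler, e.g. gcc -fopenmp:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    static double a[N];   /* resides in shared memory: visible to every thread */
    double sum = 0.0;

    /* Parallel code section: the loop iterations are divided among the threads.
       'a' and 'sum' are shared; the loop index is private to each thread. */
    #pragma omp parallel for shared(a) reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("sum = %f (computed by up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```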
Shared Memory Multiprocessor System
Two-processor shared memory systems are particularly cost-effective.
However, with a large number of processors it is very difficult to implement the hardware so that all the processors have fast access to all the shared memory.
Most large shared memory systems have some form of
hierarchical or distributed memory structure. Then,
processors can physically access nearby memory
locations much faster than more distant memory
locations.
The term nonuniform memory access (NUMA) is used in
these cases, as opposed to uniform memory access
(UMA).
Message-Passing Multicomputer
Complete computers connected through an
interconnection network:
Message-Passing Multicomputer
Each computer consists of a processor and local memory, but this memory is not accessible by other processors.
The interconnection network provides for processors to send messages to other processors.
The messages carry data from one processor to another as dictated by the program.
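As an illustrative sketch only (MPI is a common message-passing library for such machines, though it is not introduced on this slide; the value sent is invented for the example), the following C program sends one integer from the local memory of process 0 to process 1 over the interconnection network. It must be launched with at least two processes, e.g. via mpirun:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* data held in process 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* The message arrives through the interconnection network. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d from process 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```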
Networks for Multicomputers
The purpose of the interconnection network is to
provide a physical path for messages sent from
one computer to another computer.
Key issues in network design are:
1. Bandwidth
2. Latency
3. Cost
Key issues in network design
The bandwidth is the number of bits that can be
transmitted in unit time, given as bits/sec.
The network latency is the time to make a
message transfer through the network.
The communication latency is the total time to
send the message including the software
overhead and interface delays.
Message latency, or startup time, is the time to send a zero-length message (finding the route, packing, and unpacking).
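As a rough worked example with assumed numbers (and a simple model in which the communication time is the startup time plus the message length divided by the bandwidth): sending a 1 Mbit message over a link with a 50 μs startup time and a bandwidth of 1 Gbit/s takes approximately
$$t_{comm} \approx 50\ \mu\text{s} + \frac{10^6\ \text{bits}}{10^9\ \text{bits/s}} = 50\ \mu\text{s} + 1000\ \mu\text{s} \approx 1.05\ \text{ms},$$
so the bandwidth dominates for long messages, while the startup time dominates for short ones.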
Key issues in network design
The diameter is the minimum number of links
between the two farthest nodes (computers) in
the network.
The bisection width of a network is the
minimum number of links that must be cut to
divide the network into two equal parts.
Multicomputer System
There are several ways one could interconnect
computers to form a multicomputer system:
1. Connecting every computer to every other computer with links.
   1. With c computers there are c(c − 1)/2 links in all (for example, 16 computers would need 120 links).
   2. Feasible only for very small systems.
   3. As the size increases, the number of interconnections becomes impractical for economic and engineering reasons.
2. Using a network with restricted direct interconnections. Two such networks are:
   1. The mesh network
   2. The hypercube network.
Mesh Network
A two dimensional mesh can be created by having
each node in a two dimensional array connect to
its four nearest neighbors.
[Figure: a two-dimensional mesh, with each computer/processor connected to its neighbors by links.]
Mesh Network
The diameter of a $\sqrt{p} \times \sqrt{p}$ mesh is $2(\sqrt{p} - 1)$, since reaching one corner from the opposite corner requires a path across $\sqrt{p} - 1$ links and down $\sqrt{p} - 1$ links.
Torus: the free ends of the mesh might circulate back to the opposite sides. This network is called a torus.
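As a concrete illustrative case (numbers assumed, not from the slide): $p = 64$ processors arranged as an $8 \times 8$ mesh give a diameter of
$$2(\sqrt{64} - 1) = 14 \text{ links},$$
and a bisection width of 8, since cutting the 8 links between the two middle columns splits the network into two equal halves.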
Mesh Network
Meshes are particularly convenient for many
scientific and engineering problems in which
solution points are arranged in two-dimensional or
three-dimensional arrays.
The Intel Touchstone Delta computer was designed with a two-dimensional mesh.
The J-machine, a research prototype, was designed with a three-dimensional mesh.