001 - DDS IIIT Jan 10th
• Serial computing
  • The problem is broken down into a stream of instructions, executed sequentially one after the other on a single processor
  • Only one instruction is executed at a time
• Parallel computing
  • The problem is broken down into parts that can be solved concurrently
  • Each part is further divided into a stream of instructions
  • Instructions from different parts execute simultaneously on different processors
Why Parallel Computing?
• The real world is parallel
• Complex interrelated events happen simultaneously: galaxies, planetary movements, functioning of the brain, weather, traffic
[Figure: the simulation pipeline: physical problems (e.g., the Navier-Stokes equations) -> mathematical models -> algorithms, solvers, and application codes -> numerical solutions via parallel computing on supercomputers -> visualization and validation with data-viz software -> physical insights.]
Parallel Computers
• Racks of server/compute/processing units (40-80 servers per rack) sitting on 68,680 sq ft of floor space (Google data center)
• On a supercomputer, however, a single application can use the entire machine at the same time.
Parallel Computers
The 13 fastest supercomputers in the world - 2022
https://www.top500.org/
Data Center
Traditional Processor
• The traditional logical view of a sequential computer: memory connected to a processor via a datapath.
[Figure: block diagram of a simple processor: controller (control logic and state register), register file, IR, PC, and ALU, connected to program memory and data memory, running an example program (total = 0; for i = 1 to …).]
• All three components (processor, memory, datapath) can become bottlenecks for the overall processing rate of the computer
• A number of architectural innovations have addressed these bottlenecks:
• Improvement of clock speed
• Microprocessors such as the Itanium, Sparc Ultra, MIPS, and Power4 support multiple-instruction execution.
• If the assembly of a car takes 100 time units and can be broken into 10 pipelined stages of 10 units each, a single assembly line can produce a car every 10 time units!
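Stated generally (the slide gives only the specific numbers): if assembly takes T time units and is split into k equal pipelined stages, a full pipeline completes one car every T/k time units, a throughput gain of up to k; here T = 100 and k = 10, so one car every 100/10 = 10 units.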
sum = 0;
for(i=0 ; i<n ; i++) {
x = compute_next_value(…);
sum += x;
}
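A self-contained sketch of this serial loop, where compute_next_value is a hypothetical stand-in for the elided computation (here it just returns small pseudo-random values):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the elided compute_next_value(...) */
static int compute_next_value(void) {
    return rand() % 20;
}

int main(void) {
    int n = 1000;   /* number of values to compute */
    int sum = 0;

    /* A single core computes all n values and accumulates them one by one. */
    for (int i = 0; i < n; i++) {
        int x = compute_next_value();
        sum += x;
    }

    printf("sum = %d\n", sum);
    return 0;
}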
If we have p cores, each core can instead form a partial sum over its own share of the values:

my_sum = 0;
my_first_i = …;
my_last_i = …;
for (my_i = my_first_i; my_i < my_last_i; my_i++) {
    x = compute_next_value(…);
    my_sum += x;
}
When all cores are done, core 0 adds up their partial sums:
sum = 8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95
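A minimal sketch of this combining step, assuming the eight per-core partial sums from the slide have already been gathered into an array on core 0 (the send/receive mechanics between cores are omitted):

#include <stdio.h>

int main(void) {
    /* Partial sums produced by cores 0..7 (values taken from the slide). */
    int my_sums[8] = {8, 19, 7, 15, 7, 13, 12, 14};
    int sum = 0;

    /* Core 0 adds the partial sums one after another. */
    for (int core = 0; core < 8; core++)
        sum += my_sums[core];

    printf("global sum = %d\n", sum);   /* prints 95 */
    return 0;
}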
Parallel Programming Platforms
• There are two main types of parallel systems: shared-memory systems and distributed-memory systems.
• In a distributed-memory system, each core has its own private memory, and the cores must communicate explicitly, for example by sending messages across a network.
• MPI is a library of type definitions, functions, and macros that can be used in C programs.
[Figure: (a) shared-memory system, (b) distributed-memory system, (c) GPU architecture.]
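A minimal sketch of how such a global sum could be written for a distributed-memory system with MPI; the per-process partial sum is a hypothetical stand-in for real work, and MPI_Reduce combines the values on process 0 by message passing:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical per-process partial sum (stands in for real computation). */
    int my_sum = rank + 1;
    int sum = 0;

    /* Combine all partial sums on process 0; communication is explicit messages. */
    MPI_Reduce(&my_sum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum over %d processes = %d\n", size, sum);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with mpiexec, each process keeps its own private copy of my_sum, matching the distributed-memory model described above.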
Limitation of Memory System Performance
• Every time a memory request is made, the processor must wait 100 cycles before the data arrive
• Effect of memory latency on performance
Example: if each floating-point operation needs one operand fetched from memory, the processor completes at most one operation every 100 ns:
1 / (100 × 10⁻⁹ s) = 10⁷ FLOPS = 10 MFLOPS
If n is a power of 2, these operations can be performed in log₂(n) steps.
Ts = Θ(n), Tp = Θ(log(n))
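A sketch of where the Θ(log n) parallel time comes from: when n is a power of 2 the additions can be paired off, so each round halves the number of partial sums and only log₂(n) rounds are needed (here the rounds are simulated sequentially in one process):

#include <stdio.h>

#define N 8   /* must be a power of 2 for this sketch */

int main(void) {
    int a[N] = {8, 19, 7, 15, 7, 13, 12, 14};   /* values to be summed */

    /* Each round adds pairs of partial sums; with one core per pair, the
       additions within a round would run in parallel, giving log2(N) rounds. */
    for (int stride = 1; stride < N; stride *= 2)
        for (int i = 0; i + stride < N; i += 2 * stride)
            a[i] += a[i + stride];

    printf("sum = %d\n", a[0]);   /* prints 95 */
    return 0;
}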
F = fraction of the calculation that is serial
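The serial fraction F is the quantity used in Amdahl's law; a minimal sketch, assuming the standard form speedup = 1 / (F + (1 - F)/p) for p cores:

#include <stdio.h>

/* Amdahl's law: upper bound on speedup when a fraction F of the work is serial. */
static double amdahl_speedup(double F, int p) {
    return 1.0 / (F + (1.0 - F) / p);
}

int main(void) {
    /* Hypothetical values: 10% serial work on 16 cores gives at most ~6.4x. */
    printf("speedup = %.2f\n", amdahl_speedup(0.1, 16));
    return 0;
}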
Textbook:
An Introduction to Parallel Programming
by Peter S. Pacheco