Unit 2 – Parallel Computing Architecture

2022 Parallel Computing | UNITEN
Objectives
• To differentiate the categories in Flynn's Taxonomy
• To explain the concept of a symmetric multiprocessor
• To differentiate the MIMD types: shared-memory and distributed-memory architectures
Flynn's Classical Taxonomy
• There are different ways to classify parallel computers. One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy.
• Flynn's taxonomy distinguishes multi-processor computer architectures according to the two independent dimensions of Instruction and Data. Each dimension can have only one of two possible states: Single or Multiple.
Flynn Taxonomy - Categories
SINGLE DATA MULTIPLE DATA
SISD SIMD
SINGLE
Single Instruction Stream Single Instruction Stream
INSTRUCTION
Single Data stream Multiple Data stream
MISD MIMD
MULTIPLE
Multiple Instruction Stream Multiple Instruction Stream
INSTRUCTION
Single Data stream Multiple Data stream
2022 Parallel Computing | UNITEN 4
Taxonomy of Parallel Processor Architectures

Processor Organization
• SISD – Uniprocessor
• SIMD – Vector processor, Array processor
• MISD
• MIMD – Shared memory (SMP, NUMA), Distributed memory (Clusters)
Single Instruction, Single Data (SISD)
Flynn Taxonomy – Categories: SISD
• Uniprocessor
• Single processor
• Single instruction
• Single memory
Single Instruction, Single Data (SISD)
• A serial (non-parallel) computer
• Single instruction: only one instruction stream is acted on by the CPU during any one clock cycle
• Single data: only one data stream is used as input during any one clock cycle
• Deterministic execution
• This is the oldest and, until recently, the most prevalent form of computer
• Examples: most PCs, single-CPU workstations and mainframes
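The serial execution described above can be sketched in a few lines – a minimal illustration in plain Python, not tied to any particular machine: one instruction stream processes one data element at a time, in order.

```python
# SISD-style serial execution: a single instruction stream consumes
# one data element per step, in program order.
data = [1, 2, 3, 4]
total = 0
for x in data:        # exactly one element is processed at a time
    total += x
print(total)          # 10
```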
Single Instruction, Multiple Data (SIMD)
Flynn Taxonomy – Categories: SIMD
• A single instruction controls the simultaneous execution of the Processing Elements (PEs)
• Each PE operates on different data
• Vector and array processors
Single Instruction, Multiple Data (SIMD)
• A type of parallel computer
• Single instruction: all processing units execute the same instruction at any given clock cycle
• Multiple data: each processing unit can operate on a different data element
• This type of machine typically has an instruction dispatcher, a very high-bandwidth internal network, and a very large array of very small-capacity instruction units
• Best suited for specialized problems characterized by a high degree of regularity, such as image processing
• Synchronous (lockstep) and deterministic execution
• Two varieties: processor arrays and vector pipelines
• Examples:
  • Processor arrays: Connection Machine CM-2, MasPar MP-1, MP-2
  • Vector pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820
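The SIMD idea – one operation applied uniformly across many data elements – can be sketched conceptually in Python. This is only an analogy: real SIMD hardware applies a vector instruction (e.g. SSE/AVX) to all elements in lockstep within one clock cycle.

```python
# Conceptual SIMD sketch (analogy only, not real vector hardware):
# a single "instruction" (multiply by 2) is applied identically
# to every element of the data set.

data = [1.0, 2.0, 3.0, 4.0]          # multiple data elements

# Same operation across all elements, as a SIMD unit would do in lockstep:
result = [x * 2.0 for x in data]

print(result)    # [2.0, 4.0, 6.0, 8.0]
```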
SIMD: Single Instruction, Multiple Data Stream
Multiple Instruction, Single Data (MISD)
Flynn Taxonomy – Categories: MISD
• Same data, different processors
• Different instructions
Multiple Instruction, Single Data (MISD)
• A single data stream is fed into multiple processing units.
• Each processing unit operates on the data independently via
independent instruction streams.
• Few actual examples of this class of parallel computer have ever existed.
One is the experimental Carnegie-Mellon C.mmp computer (1971).
• Some conceivable uses might be:
• multiple frequency filters operating on a single signal stream
• multiple cryptography algorithms attempting to crack a single coded
message.
Multiple Instruction, Multiple Data (MIMD)
Flynn Taxonomy – Categories: MIMD
• Different instructions
• Different data
• Different processors
• Applied in:
  • Symmetric multiprocessor (SMP)
  • Nonuniform memory access (NUMA)
  • Clusters
Multiple Instruction, Multiple Data (MIMD)
• Currently the most common type of parallel computer; most modern computers fall into this category
• Multiple instruction: every processor may be executing a different instruction stream
• Multiple data: every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
• Examples: most current supercomputers, networked parallel computer "grids", and multiprocessor SMP computers, including some types of PCs
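The MIMD idea can be sketched with Python threads – a hypothetical illustration, assuming only the standard `threading` module: each worker runs a different instruction stream on different data.

```python
# Minimal MIMD sketch: two threads execute *different* instruction
# streams (summer vs. joiner) on *different* data streams.
import threading

results = {}

def summer(data):                 # instruction stream 1
    results["sum"] = sum(data)

def joiner(data):                 # instruction stream 2
    results["joined"] = "-".join(data)

t1 = threading.Thread(target=summer, args=([1, 2, 3],))
t2 = threading.Thread(target=joiner, args=(["a", "b"],))
t1.start(); t2.start()
t1.join(); t2.join()

print(results)
```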
Shared Memory

Processor Organization
• SISD – Uniprocessor
• SIMD – Vector processor, Array processor
• MISD
• MIMD – Shared memory (SMP, NUMA), Distributed memory (Clusters)
MIMD: Shared Memory
• Symmetric Multiprocessor (SMP)
• Nonuniform Memory Access (NUMA)
Symmetric Multiprocessor (SMP) – Characteristics
• Two or more similar processors of comparable capability (identical processors)
• Uniform Memory Access (UMA)
• All processors:
  • Share a connection to memory and I/O devices
  • Share access to I/O devices
  • Perform the same functions
  • Have equal access, and equal access time, to memory
• The system is controlled by an integrated OS
  • Manages interaction between processors at the job, task, file, and data levels
• Cache coherent
  • When one processor updates memory, all processors are aware of the update
SMP – Advantages
• Performance – enables parallel processing
• Availability – failure of a single processor does not halt the machine
• Flexibility & incremental growth – performance can be enhanced by adding a processor
• Scaling – vendors can offer a wide range of products with different price and performance characteristics, based on the number of processors
SMP – Organization
[Figure: several processors, each with private L1 and L2 caches, connected over a shared bus to main memory and the I/O subsystem]
SMP – Key Design Issues
• Simultaneous concurrent processes – the OS must allow processes to execute in parallel across processors
• Scheduling – avoid conflicts between processes
• Synchronization – enforce mutual exclusion and event ordering, since processes may access the same memory location
• Memory management – paging
• Reliability and fault tolerance – the OS must handle the degradation caused by a processor failure
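The synchronization issue above can be sketched with Python's standard `threading` module: without a lock, concurrent increments of a shared counter can interleave and lose updates; the lock enforces mutual exclusion.

```python
# Mutual exclusion sketch: four threads update one shared counter.
# The lock guarantees only one thread modifies it at a time.
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:            # critical section: one thread at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)    # 40000 – no lost updates
```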
MIMD: Shared Memory
• Nonuniform Memory Access (NUMA)
NUMA – Nonuniform Memory Access
• Often made by physically linking two or more SMPs
• All processors can access all parts of main memory
• Access time depends on the memory location
NUMA – Characteristics
• Maintains a transparent system-wide memory across multiprocessor nodes
• One SMP can directly access the memory of another SMP
• Not all processors have equal access time to all memories
• If cache coherence is maintained among the caches of the different processors, the system is called Cache-Coherent NUMA (CC-NUMA)
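The nonuniform access time can be captured in a toy cost model – the latency numbers below are illustrative assumptions, not real hardware figures: an access to the local node's memory is cheap, while an access that crosses the interconnect to a remote node costs more.

```python
# Toy NUMA latency model (illustrative numbers only).
LOCAL_LATENCY_NS = 100      # access to this node's main memory
REMOTE_LATENCY_NS = 300     # access via the interconnection network

def access_latency(node_of_address, requesting_node):
    """Return the (toy) latency of one memory access."""
    if node_of_address == requesting_node:
        return LOCAL_LATENCY_NS       # local main memory
    return REMOTE_LATENCY_NS          # remote node's main memory

print(access_latency(0, 0))   # 100 – local access
print(access_latency(1, 0))   # 300 – remote access
```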
CC-NUMA – Organization of a Fetch Operation
[Flowchart, summarized:]
1. The processor initiates a fetch for a memory address.
2. If the address is found in the current cache, the data is sent to the processor and the operation ends.
3. On a cache miss, the L2 cache initiates a fetch. If the location is in the local main memory, the data is fetched across the local bus.
4. Otherwise the location is in a remote node's main memory, and the data is fetched across the interconnection network.
CC-NUMA – Organization
[Figure: N nodes, each containing m processors with private L1 and L2 caches, a local bus, local main memory, and an I/O subsystem; the nodes are joined by an interconnection network]
NUMA – Design Issues
• Issue: memory accesses to remote nodes can be overloaded
  Solution: use L1 and L2 caches to minimize memory accesses
• Issue: additional nodes mean additional traffic
  Solutions: software with good spatial locality (virtual memory); OS page migration
NUMA – Pros and Cons
Pros:
• Performance, without major software changes
• Availability
Cons:
• Not as transparent as SMP – software changes may be needed
MIMD: Distributed Memory
• Cluster
Distributed Memory

Processor Organization
• SISD – Uniprocessor
• SIMD – Vector processor, Array processor
• MISD
• MIMD – Shared memory (SMP, NUMA), Distributed memory (Clusters)
Distributed Memory: Cluster
• A group of interconnected computers working together as a unified computing resource
• An approach to providing high performance and high availability
• Node – each computer in the cluster
Distributed Memory: Cluster
• Each processor has its own local memory – memory is not shared
• Each node operates independently
• When a CPU needs data that resides in another processor's memory, the communication must be defined by the programmer
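The programmer-defined communication above can be sketched as explicit message passing. This is a simplified analogy, not a real cluster: two Python threads with private state stand in for two nodes, and a standard-library `Queue` stands in for the network link; real clusters would use MPI or sockets.

```python
# Message-passing sketch: "node A" explicitly sends a value that
# "node B" explicitly receives. Neither node can read the other's
# local variables directly - data moves only via the link.
import threading
import queue

link = queue.Queue()      # stands in for the interconnect
received = []

def node_a():
    local_data = [1, 2, 3]            # node A's private memory
    link.put(sum(local_data))          # explicit send to node B

def node_b():
    value = link.get()                 # explicit receive from node A
    received.append(value)             # store in node B's private memory

ta = threading.Thread(target=node_a)
tb = threading.Thread(target=node_b)
ta.start(); tb.start()
ta.join(); tb.join()

print(received)    # [6]
```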
Distributed Memory: Advantages
• Memory is scalable with the number of processors
• Absolute scalability – better than a standalone machine
• Incremental scalability – new nodes can be added
• High availability – failure of one node does not bring down the whole system
• Superior price/performance – equal or greater computing power can be added at lower cost
• Each processor can rapidly access its own memory without interference
Distributed Memory: Disadvantages
• Increased programmer responsibility – the programmer must arrange the data
• It may be difficult to map existing data structures that are based on global memory
• Non-uniform memory access times – data residing on a remote node takes longer to access than node-local data
Cluster Configurations
• Standby server with no shared disk
• Shared disk
Cluster Configuration
Standby server with no shared disk:
• Two interconnected nodes
• Message exchange over a high-speed link (LAN or WAN)
Shared disk:
• Two interconnected nodes
• Message exchange over a high-speed link (LAN or WAN)
• Shared disk subsystem (e.g. a RAID system)
Standby server with no shared disk
[Figure: two nodes, each with its own processors, main memory, and I/O subsystem, connected only by a high-speed message link]

Shared disk
[Figure: the same two nodes and high-speed message link, with both I/O subsystems also connected to a shared RAID disk array]
Clustering Methods
• Passive standby
  • One computer handles all of the processing load; only in the event of a failure does the other computer in the cluster take over
  • The primary machine sends "heartbeat" messages to the standby machine
  • Increases availability, but not performance
• Active secondary
  • Separate servers – all servers actively process work
  • Servers connected to disks
  • Servers share disks
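The passive-standby heartbeat can be sketched as a staleness check – a hypothetical illustration, not any real clustering product: the primary records a heartbeat timestamp, and the standby decides to take over when the heartbeat becomes stale.

```python
# Heartbeat sketch for passive standby: the standby takes over
# only if the primary's last heartbeat is older than the timeout.
import time

HEARTBEAT_TIMEOUT = 2.0              # seconds of silence before failover

last_heartbeat = time.monotonic()    # primary just sent a heartbeat

def standby_should_take_over(now):
    """Standby's check: has the primary's heartbeat gone stale?"""
    return (now - last_heartbeat) > HEARTBEAT_TIMEOUT

print(standby_should_take_over(last_heartbeat + 1.0))   # False: primary alive
print(standby_should_take_over(last_heartbeat + 3.0))   # True: fail over
```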
Clusters Compared to SMP
• Number of processors – Clusters: multiple processors; SMP: multiple processors
• Configuration – Clusters: must deal with a range of computer types and specifications; SMP: easier to manage and configure
• Physical space – Clusters: more space, allocated to each node; SMP: located on one computer motherboard
• Power consumption – Clusters: power required by each node; SMP: power required by each processor
• Stability – Clusters: less stable, dependent on the interconnect and each node's availability; SMP: stable and established
• Scalability – Clusters: large and expandable; SMP: small-scale, single computer
Hybrid Distributed-Shared Memory
Hybrid Distributed-Shared Memory
• Combines the shared- and distributed-memory architectures
• Shared-memory component – a shared-memory machine and/or graphics processing units (GPUs)
• Distributed-memory component – a network of multiple shared-memory/GPU machines, each of which knows only its own memory
• Network communications are required to transfer data between machines
2022 Universiti Tenaga Nasional. All rights reserved.