
Chapter Five

Parallel Processing and Multiprocessors

COA chap-5 1
Contents

 Parallel processing, pipelining and vector processing
 Multiprocessor organization
 Trends in computer architecture development (reading assignment)
Parallel Processing

 Parallel processing means simultaneous processing of data/instructions
 Parallel processing includes:
 Pipelining: arithmetic sub-operations or the phases of a computer instruction cycle overlap in execution
 Vector processing: deals with computations involving large vectors and matrices
Pipelining

 The result of the computation in each segment is transferred to the next segment in the pipeline
 Consider the combined multiply-and-add operation Ai*Bi + Ci for i = 1, 2, …, 7
 The sub-operations performed in each segment are:
 Segment 1: R1 ← Ai, R2 ← Bi
 Segment 2: R3 ← R1*R2, R4 ← Ci
 Segment 3: R5 ← R3 + R4
Pipelining
The five registers are loaded with new data on every clock pulse
It takes three clock cycles for each process to compute a result
Without the pipeline, 21 clock cycles (7 tasks × 3 cycles each) are required to compute all the results
With the pipeline, only 9 clock cycles (3 + (7 − 1)) are needed
 Pipelining is efficient for applications that need to repeat the same task many times with different sets of data
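The cycle counts above can be checked with a short calculation; a minimal Python sketch:

```python
def total_cycles(n_tasks, k_segments, pipelined):
    """Clock cycles to finish n_tasks on a k-segment pipeline.

    Non-pipelined: each task occupies the whole unit for k cycles.
    Pipelined: k cycles fill the pipe, then one result per cycle.
    """
    if pipelined:
        return k_segments + (n_tasks - 1)
    return n_tasks * k_segments

print(total_cycles(7, 3, pipelined=False))  # 21
print(total_cycles(7, 3, pipelined=True))   # 9
```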
Computation in pipelined system

Pipelining speed-up

 For a k-segment pipeline executing n tasks with a clock cycle time tp, the first task T1 requires k·tp time to complete; the remaining n−1 tasks emerge from the pipe at a rate of one task per clock cycle and are completed after an additional (n−1)·tp
 The total time required is (k + n − 1)·tp
 In a non-pipelined system performing the same operations, each task takes a time tn to complete, so n tasks require n·tn
Speed-up…

 The speed-up of pipeline processing is then:
 S = n·tn / ((k + n − 1)·tp)
 As the number of tasks increases, n becomes much larger than k − 1, and k + n − 1 approaches the value of n. The speed-up becomes:
 S = tn/tp
 If the time taken to execute one task is the same in the pipelined and non-pipelined systems, then tn = k·tp and S = k·tp/tp = k
 k is the maximum speed-up a pipeline can provide
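This limit can be seen numerically; a short Python sketch (k = 4 segments and tp = 1 are arbitrary values chosen for illustration):

```python
def speedup(n_tasks, k_segments, tp=1.0):
    # tn = k*tp: one task takes the same time with or
    # without the pipeline.
    tn = k_segments * tp
    return (n_tasks * tn) / ((k_segments + n_tasks - 1) * tp)

for n in (10, 100, 10000):
    print(n, round(speedup(n, k_segments=4), 3))
# the speed-up climbs toward the maximum k = 4 as n grows
```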

Relating Pipeline to SIMD

A k-segment pipeline processor achieves, in the steady state, the performance of k copies of an equivalent non-pipelined processing unit
 The parallel circuit accepts four input data items simultaneously and performs four tasks at the same time
 The SIMD organization uses a single instruction operating on multiple data items in parallel
Multiple functional units in parallel
Application of Pipeline organization

 There are two areas of computer design where the pipeline is applicable:
 Arithmetic pipeline: divides an arithmetic operation into sub-operations for execution in the pipeline segments
 Instruction pipeline: operates on streams of instructions by overlapping the fetch, decode and execute phases of the instruction cycle
Arithmetic pipeline

 Pipelined arithmetic units are found in very high-speed computers
 They are used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations
Example of floating point Add/Subtract pipeline
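The classic floating-point add pipeline has four segments: compare exponents, align mantissas, add the mantissas, and normalize the result. A minimal Python sketch of those stages, using decimal mantissa/exponent pairs for readability (real hardware works in binary):

```python
def fp_add(x, y):
    """Four-segment floating-point addition, stage by stage.

    Operands are (mantissa, exponent) pairs meaning m * 10**e.
    """
    (mx, ex), (my, ey) = x, y

    # Segment 1: compare exponents and pick the larger one.
    e = max(ex, ey)

    # Segment 2: align mantissas by shifting the smaller operand right.
    mx /= 10 ** (e - ex)
    my /= 10 ** (e - ey)

    # Segment 3: add the aligned mantissas.
    m = mx + my

    # Segment 4: normalize so that 0.1 <= |m| < 1.
    while abs(m) >= 1:
        m, e = m / 10, e + 1
    while m != 0 and abs(m) < 0.1:
        m, e = m * 10, e - 1
    return m, e

# 0.9504*10**3 + 0.8200*10**2 = 1032.4, normalized to 0.10324*10**4
m, e = fp_add((0.9504, 3), (0.8200, 2))
print(m, e)
```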

4-segment Instruction pipeline

 Up to four instructions can be in progress at the same time in the instruction cycle
 While instruction 1 is being executed in segment 4, an operand can be fetched from memory for instruction 2 in segment 3, the effective address can be calculated for instruction 3 in segment 2, and instruction 4 can be fetched and placed in an instruction FIFO
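The overlap can be visualized as a space-time diagram; a small Python sketch, using the common stage labels FI (fetch instruction), DA (decode and calculate address), FO (fetch operand) and EX (execute):

```python
STAGES = ["FI", "DA", "FO", "EX"]

def space_time_diagram(n_instructions):
    # One row per instruction; each row is shifted one clock cycle
    # (three characters: a two-letter stage label plus a space).
    return ["   " * i + " ".join(STAGES) for i in range(n_instructions)]

for row in space_time_diagram(4):
    print(row)
```

Each column of the printed diagram is one clock cycle; in the steady state every segment is busy with a different instruction.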

Four segment instruction pipeline

4-segment Instruction pipeline
Pipeline conflicts
 Data dependency
 Procedural dependency
 Resource conflicts

True Data Dependency

ADD r1, r2 (r1 := r1 + r2)
MOVE r3, r1 (r3 := r1)
The second instruction can be fetched and decoded in parallel with the first
The second instruction can NOT be executed until the first is finished
Procedural Dependency

Instructions after a branch cannot be executed in parallel with instructions before the branch
They have a procedural dependency on the branch and cannot be executed until the branch is executed
Also, if the instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
This prevents simultaneous fetches
Resource Conflict

Two or more instructions require access to the same resource at the same time
e.g. two arithmetic instructions when there is only one ALU
Resources can be duplicated
e.g. two arithmetic units
Dependencies

Vector Processing

 There is a class of computational problems that is beyond the capabilities of a conventional computer
 These problems are characterized by a vast number of computations that would take a conventional computer days or even weeks to complete
 Such problems can be formulated in terms of vectors and matrices that lend themselves to vector processing
Vector Processing

 Computers with vector processing capabilities are in demand in specialized applications, such as:
 Long-range weather forecasting
 Petroleum exploration
 Medical diagnosis
 Aerodynamics and space flight simulations
 Artificial intelligence and expert systems
 Mapping the human genome
 Image processing
Vector operations

 Many scientific problems require arithmetic operations on large arrays of numbers
 These numbers are formulated as vectors and matrices of floating-point numbers
 A vector V of length n is represented as V = [V1 V2 V3 … Vn]
 A conventional computer is capable of processing operands only one at a time
Adding of two vectors…

 The program loop reads a pair of operands from arrays A and B, performs a floating-point addition, and repeats these steps 100 times
 A computer with vector processing capability eliminates the overhead associated with the time it takes to fetch and execute the instructions in the program loop
 It allows the operation to be specified with a single vector instruction of the form:
 C(1:100) = A(1:100) + B(1:100)
 The addition is done with a pipelined floating-point adder
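The contrast between the loop and the vector form can be sketched in Python (plain lists stand in for the hardware vector operands; this is an illustration, not how a vector unit is actually programmed):

```python
# Element-wise loop: instruction fetch/decode overhead repeats
# on every one of the 100 iterations.
a = [float(i) for i in range(100)]
b = [float(2 * i) for i in range(100)]

c_loop = []
for i in range(100):
    c_loop.append(a[i] + b[i])

# Vector form: the whole operation is expressed once, analogous
# to the single instruction C(1:100) = A(1:100) + B(1:100).
c_vector = [x + y for x, y in zip(a, b)]

print(c_loop == c_vector)  # True
```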
Multiprocessors Organizations

 The two most common multiple-processor organizations


are symmetric multiprocessors (SMPs) and clusters.
 An SMP consists of multiple similar processors within
the same computer, interconnected by a bus or some
sort of switching arrangement.
 The most critical problem to address in an SMP is that
of cache coherence. Each processor has its own cache
and so it is possible for a given line of data to be
present in more than one cache.

Multiprocessors…

 When more than one processor is implemented on a single chip,


the configuration is referred to as chip multiprocessing. A related
design scheme is to replicate some of the components of a single
processor so that the processor can execute multiple threads
concurrently; this is known as a multithreaded processor.
 A cluster is a group of interconnected, whole computers working
together as a unified computing resource that can create the
illusion of being one machine. The term whole computer means a
system that can run on its own, apart from the cluster.

Multiple Processor Organization

Single instruction, single data stream - SISD


Single instruction, multiple data stream - SIMD
Multiple instruction, single data stream - MISD
Multiple instruction, multiple data stream- MIMD

Single Instruction, Single Data
Stream - SISD

Single processor
Single instruction stream
Data stored in single memory
Uni-processor

Parallel Organizations - SISD

Single Instruction, Multiple
Data Stream - SIMD

Single machine instruction


Controls simultaneous execution
Number of processing elements
Each processing element has associated data
memory
Each instruction executed on different set of
data by different processors
Vector and array processors

Parallel Organizations - SIMD

Multiple Instruction, Single
Data Stream - MISD

Sequence of data
Transmitted to set of processors
Each processor executes different instruction
sequence
Never been implemented

Multiple Instruction, Multiple
Data Stream- MIMD

Set of processors
Simultaneously execute different instruction
sequences
Different sets of data
SMPs, clusters and NUMA systems

Parallel Organizations - MIMD
Shared Memory

Parallel Organizations - MIMD
Distributed Memory

Taxonomy of Parallel Processor
Architectures

Processor coupling

 Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common shared memory
- Shared memory system
 Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Communicate by message-passing packets
- Overhead for data exchange is high
- Distributed memory system
Amdahl’s Law

 Given a program:
 f : fraction of execution time spent on operations that must be performed serially
 With N processors, the maximum possible speedup is S = 1 / (f + (1 − f)/N)
 As N → ∞, S approaches the upper bound 1/f
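A short Python check of the bound (the serial fraction f = 0.1 is an arbitrary value chosen for illustration):

```python
def amdahl_speedup(f, n):
    # f: fraction of the program that must run serially
    # n: number of processors
    return 1.0 / (f + (1.0 - f) / n)

for n in (2, 8, 1024):
    print(n, round(amdahl_speedup(0.1, n), 2))
# with f = 0.1 the speedup never exceeds 1/f = 10,
# no matter how many processors are added
```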
