
Chapter Five

Parallel Processing and Multiprocessors

COA chap-5 1
Contents

 Parallel processing, pipelining and vector processing
 Multiprocessor organization
 Trends in computer architecture development (reading assignment)
Parallel Processing

 Parallel processing means simultaneous processing of data/instructions
 Parallel processing includes:
 Pipelining: arithmetic sub-operations or the phases of a computer instruction cycle overlap in execution
 Vector processing: deals with computations involving large vectors and matrices
Pipelining

 The result of the computation in each segment is transferred to the next segment in the pipeline
 Consider the combined multiply-and-add operation Ai*Bi + Ci for i = 1, 2, …, 7
 The sub-operations performed in each segment are:
 Segment 1: R1 ← Ai, R2 ← Bi
 Segment 2: R3 ← R1*R2, R4 ← Ci
 Segment 3: R5 ← R3 + R4
Pipelining
The five registers are loaded with new data on every clock pulse
It takes three clock cycles for each process to compute a result
Without the pipeline, 21 clock cycles (7 tasks × 3 cycles each) are required to compute all the results
With the pipeline, only 9 clock cycles (3 + (7 − 1)) are needed
 Pipelining is efficient for applications that need to repeat the same task many times with different sets of data
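The cycle counts above can be checked with a short calculation; a minimal Python sketch:

```python
def total_cycles(n_tasks, k_segments, pipelined):
    """Clock cycles to finish n_tasks on a k-segment pipeline.

    Non-pipelined: each task occupies the whole unit for k cycles.
    Pipelined: k cycles fill the pipe, then one result per cycle.
    """
    if pipelined:
        return k_segments + (n_tasks - 1)
    return n_tasks * k_segments

print(total_cycles(7, 3, pipelined=False))  # 21
print(total_cycles(7, 3, pipelined=True))   # 9
```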
Computation in pipelined system

Pipelining speed-up

 For a k-segment pipeline executing n tasks with a clock cycle time tp, the first task T1 requires k·tp time to complete; the remaining n−1 tasks emerge from the pipe at a rate of one task per clock cycle and are completed after an additional (n−1)·tp
 The total time required is (k + n − 1)·tp
 In a non-pipelined system performing the same operations, each task takes a time tn to complete, so n tasks require n·tn
Speed-up…

 The speed-up of pipeline processing is then:
 S = n·tn / ((k + n − 1)·tp)
 As the number of tasks increases, n becomes much larger than k − 1, and k + n − 1 approaches the value of n. The speed-up becomes:
 S = tn/tp
 If the time taken to execute one task is the same in the pipelined and non-pipelined systems, then tn = k·tp and S = k·tp/tp = k
 k is the maximum speed-up a pipeline can provide
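This limit can be seen numerically; a short Python sketch (k = 4 segments and tp = 1 are arbitrary values chosen for illustration):

```python
def speedup(n_tasks, k_segments, tp=1.0):
    # tn = k*tp: one task takes the same time with or
    # without the pipeline.
    tn = k_segments * tp
    return (n_tasks * tn) / ((k_segments + n_tasks - 1) * tp)

for n in (10, 100, 10000):
    print(n, round(speedup(n, k_segments=4), 3))
# the speed-up climbs toward the maximum k = 4 as n grows
```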

Relating Pipeline to SIMD

A k-segment pipeline processor achieves, in the steady state, the performance of k copies of an equivalent non-pipelined processing unit
 The parallel circuit accepts four input data items simultaneously and performs four tasks at the same time
 The SIMD organization uses a single instruction operating on multiple data items in parallel
Multiple functional units in parallel
Application of Pipeline organization

 There are two areas of computer design where the pipeline is applicable:
 Arithmetic pipeline: divides an arithmetic operation into sub-operations for execution in the pipeline segments
 Instruction pipeline: operates on streams of instructions by overlapping the fetch, decode and execute phases of the instruction cycle
Arithmetic pipeline

 Pipelined arithmetic units are found in very high-speed computers
 They are used to implement floating-point operations, multiplication of fixed-point numbers, and similar computations
Example of floating point Add/Subtract pipeline
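The classic floating-point add pipeline has four segments: compare exponents, align mantissas, add the mantissas, and normalize the result. A minimal Python sketch of those stages, using decimal mantissa/exponent pairs for readability (real hardware works in binary):

```python
def fp_add(x, y):
    """Four-segment floating-point addition, stage by stage.

    Operands are (mantissa, exponent) pairs meaning m * 10**e.
    """
    (mx, ex), (my, ey) = x, y

    # Segment 1: compare exponents and pick the larger one.
    e = max(ex, ey)

    # Segment 2: align mantissas by shifting the smaller operand right.
    mx /= 10 ** (e - ex)
    my /= 10 ** (e - ey)

    # Segment 3: add the aligned mantissas.
    m = mx + my

    # Segment 4: normalize so that 0.1 <= |m| < 1.
    while abs(m) >= 1:
        m, e = m / 10, e + 1
    while m != 0 and abs(m) < 0.1:
        m, e = m * 10, e - 1
    return m, e

# 0.9504*10**3 + 0.8200*10**2 = 1032.4, normalized to 0.10324*10**4
m, e = fp_add((0.9504, 3), (0.8200, 2))
print(m, e)
```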

4-segment Instruction pipeline

 Up to four instructions can be in progress at the same time in the instruction cycle
 While instruction 1 is being executed in segment 4, an operand can be fetched from memory for instruction 2 in segment 3, the effective address can be calculated for instruction 3 in segment 2, and instruction 4 can be fetched and placed in an instruction FIFO
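The overlap can be visualized as a space-time diagram; a small Python sketch, using the common stage labels FI (fetch instruction), DA (decode and calculate address), FO (fetch operand) and EX (execute):

```python
STAGES = ["FI", "DA", "FO", "EX"]

def space_time_diagram(n_instructions):
    # One row per instruction; each row is shifted one clock cycle
    # (three characters: a two-letter stage label plus a space).
    return ["   " * i + " ".join(STAGES) for i in range(n_instructions)]

for row in space_time_diagram(4):
    print(row)
```

Each column of the printed diagram is one clock cycle; in the steady state every segment is busy with a different instruction.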

Four segment instruction pipeline

4-segment Instruction pipeline
Pipeline conflicts
 Data dependency
 Procedural dependency
 Resource conflicts

True Data Dependency

ADD r1, r2 (r1 := r1 + r2)
MOVE r3, r1 (r3 := r1)
The second instruction can be fetched and decoded in parallel with the first
The second instruction can NOT be executed until the first is finished
Procedural Dependency

Instructions after a branch cannot be executed in parallel with instructions before the branch
They have a procedural dependency on the branch and cannot be executed until the branch is executed
Also, if the instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
This prevents simultaneous fetches
Resource Conflict

Two or more instructions require access to the same resource at the same time
e.g. two arithmetic instructions when there is only one ALU
Resources can be duplicated
e.g. two arithmetic units
Dependencies

Vector Processing

 There is a class of computational problems that is beyond the capabilities of a conventional computer
 These problems are characterized by a vast number of computations that would take a conventional computer days or even weeks to complete
 Such problems can be formulated in terms of vectors and matrices that lend themselves to vector processing
Vector Processing

 Computers with vector processing capabilities are in demand in specialized applications, such as:
 Long-range weather forecasting
 Petroleum exploration
 Medical diagnosis
 Aerodynamics and space flight simulations
 Artificial intelligence and expert systems
 Mapping the human genome
 Image processing
Vector operations

 Many scientific problems require arithmetic operations on large arrays of numbers
 These numbers are formulated as vectors and matrices of floating-point numbers
 A vector V of length n is represented as V = [V1 V2 V3 … Vn]
 A conventional computer is capable of processing operands only one at a time
Adding of two vectors…

 The program loop reads a pair of operands from arrays A and B, performs a floating-point addition, and repeats these steps 100 times
 A computer with vector processing capability eliminates the overhead associated with the time it takes to fetch and execute the instructions in the program loop
 It allows the operation to be specified with a single vector instruction of the form:
 C(1:100) = A(1:100) + B(1:100)
 The addition is done with a pipelined floating-point adder
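The contrast between the loop and the vector form can be sketched in Python (plain lists stand in for the hardware vector operands; this is an illustration, not how a vector unit is actually programmed):

```python
# Element-wise loop: instruction fetch/decode overhead repeats
# on every one of the 100 iterations.
a = [float(i) for i in range(100)]
b = [float(2 * i) for i in range(100)]

c_loop = []
for i in range(100):
    c_loop.append(a[i] + b[i])

# Vector form: the whole operation is expressed once, analogous
# to the single instruction C(1:100) = A(1:100) + B(1:100).
c_vector = [x + y for x, y in zip(a, b)]

print(c_loop == c_vector)  # True
```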
Multiprocessors Organizations

 The two most common multiple-processor organizations


are symmetric multiprocessors (SMPs) and clusters.
 An SMP consists of multiple similar processors within
the same computer, interconnected by a bus or some
sort of switching arrangement.
 The most critical problem to address in an SMP is that
of cache coherence. Each processor has its own cache
and so it is possible for a given line of data to be
present in more than one cache.

Multiprocessors…

 When more than one processor is implemented on a single chip,


the configuration is referred to as chip multiprocessing. A related
design scheme is to replicate some of the components of a single
processor so that the processor can execute multiple threads
concurrently; this is known as a multithreaded processor.
 A cluster is a group of interconnected, whole computers working
together as a unified computing resource that can create the
illusion of being one machine. The term whole computer means a
system that can run on its own, apart from the cluster.

Multiple Processor Organization

Single instruction, single data stream - SISD


Single instruction, multiple data stream - SIMD
Multiple instruction, single data stream - MISD
Multiple instruction, multiple data stream- MIMD

Single Instruction, Single Data
Stream - SISD

Single processor
Single instruction stream
Data stored in single memory
Uni-processor

Parallel Organizations - SISD

Single Instruction, Multiple
Data Stream - SIMD

Single machine instruction


Controls simultaneous execution
Number of processing elements
Each processing element has associated data
memory
Each instruction executed on different set of
data by different processors
Vector and array processors

Parallel Organizations - SIMD

Multiple Instruction, Single
Data Stream - MISD

Sequence of data
Transmitted to set of processors
Each processor executes different instruction
sequence
Never been implemented

Multiple Instruction, Multiple
Data Stream- MIMD

Set of processors
Simultaneously execute different instruction
sequences
Different sets of data
SMPs, clusters and NUMA systems

Parallel Organizations - MIMD
Shared Memory

Parallel Organizations - MIMD
Distributed Memory

Taxonomy of Parallel Processor
Architectures

Processor coupling

 Tightly Coupled System
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common shared memory
- Shared memory system
 Loosely Coupled System
- Tasks or processors do not communicate in a synchronized fashion
- Communicate by message-passing packets
- Overhead for data exchange is high
- Distributed memory system
Amdahl’s Law

 Given a program:
 f : fraction of execution time spent on operations that must be performed serially
 With N processors, the maximum possible speedup is S = 1 / (f + (1 − f)/N)
 As N → ∞, S approaches the upper bound 1/f
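A short Python check of the bound (the serial fraction f = 0.1 is an arbitrary value chosen for illustration):

```python
def amdahl_speedup(f, n):
    # f: fraction of the program that must run serially
    # n: number of processors
    return 1.0 / (f + (1.0 - f) / n)

for n in (2, 8, 1024):
    print(n, round(amdahl_speedup(0.1, n), 2))
# with f = 0.1 the speedup never exceeds 1/f = 10,
# no matter how many processors are added
```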
