0% found this document useful (0 votes)
2K views18 pages

Pipeline and Vector Processing

The document discusses pipelining and vector processing techniques for improving parallel processing performance. It covers: 1) Pipelining which breaks down instructions into sequential sub-operations that can execute concurrently across multiple stages. This includes arithmetic, instruction, and RISC pipelines. 2) Vector processing which performs the same operation on multiple data elements simultaneously using array processors. 3) Parallel computers classified by Flynn's taxonomy based on the number of instruction and data streams as SISD, SIMD, MISD, and MIMD architectures.

Uploaded by

nancy_01
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2K views18 pages

Pipeline and Vector Processing

The document discusses pipelining and vector processing techniques for improving parallel processing performance. It covers: 1) Pipelining which breaks down instructions into sequential sub-operations that can execute concurrently across multiple stages. This includes arithmetic, instruction, and RISC pipelines. 2) Vector processing which performs the same operation on multiple data elements simultaneously using array processors. 3) Parallel computers classified by Flynn's taxonomy based on the number of instruction and data streams as SISD, SIMD, MISD, and MIMD architectures.

Uploaded by

nancy_01
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 18

Pipelining and Vector Processing 1

PIPELINING AND VECTOR PROCESSING

• Parallel Processing

• Pipelining

• Arithmetic Pipeline

• Instruction Pipeline

• RISC Pipeline

• Vector Processing

• Array Processors

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 2 Parallel Processing

PARALLEL PROCESSING

Execution of Concurrent Events in the computing


process to achieve faster Computational Speed

Levels of Parallel Processing

- Job or Program level

- Task or Procedure level

- Inter-Instruction level

- Intra-Instruction level

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 3 Parallel Processing

PARALLEL COMPUTERS
Architectural Classification

– Flynn's classification
» Based on the multiplicity of Instruction Streams and
Data Streams
» Instruction Stream
• Sequence of Instructions read from memory
» Data Stream
• Operations performed on the data in the processor

Number of Data Streams


Single Multiple

Number of Single SISD SIMD


Instruction
Streams Multiple MISD MIMD

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 4 Pipelining

PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess being
executed in a partial dedicated segment that
operates concurrently with all other segments.
Ai * Bi + Ci for i = 1, 2, 3, ... , 7
Ai Bi Memory Ci
Segment 1
R1 R2

Multiplier
Segment 2
R3 R4

Adder
Segment 3

R5

R1  Ai, R2  Bi Load Ai and Bi


R3  R1 * R2, R4  Ci Multiply and load Ci
R5  R3 + R4 Add
Computer Organization Computer Architectures Lab
Pipelining and Vector Processing 5 Pipelining

OPERATIONS IN EACH PIPELINE STAGE

Clock Segment 1 Segment 2 Segment 3


Pulse
Number R1 R2 R3 R4 R5
1 A1 B1
2 A2 B2 A1 * B1 C1
3 A3 B3 A2 * B2 C2 A1 * B1 + C1
4 A4 B4 A3 * B3 C3 A2 * B2 + C2
5 A5 B5 A4 * B4 C4 A3 * B3 + C3
6 A6 B6 A5 * B5 C5 A4 * B4 + C4
7 A7 B7 A6 * B6 C6 A5 * B5 + C5
8 A7 * B7 C7 A6 * B6 + C6
9 A7 * B7 + C7

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 6 Pipelining

GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
Clock

Input S1 R1 S2 R2 S3 R3 S4 R4

Space-Time Diagram
1 2 3 4 5 6 7 8 9 Clock cycles
Segment 1 T1 T2 T3 T4 T5 T6
2 T1 T2 T3 T4 T5 T6
3 T1 T2 T3 T4 T5 T6
4 T1 T2 T3 T4 T5 T6

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 7 Pipelining

PIPELINE SPEEDUP
n: Number of tasks to be performed

Conventional Machine (Non-Pipelined)


tn: Clock cycle
: Time required to complete the n tasks
 = n * tn

Pipelined Machine (k stages)


tp: Clock cycle (time to complete each suboperation)
: Time required to complete the n tasks
 = (k + n - 1) * tp

Speedup
Sk: Speedup

Sk = n*tn / (k + n - 1)*tp
tn
lim Sk = ( = k, if tn = k * tp )
n tp

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 8 Arithmetic Pipeline

ARITHMETIC PIPELINE
Floating-point adder Exponents
a b
Mantissas
A B
X = A x 2a
Y = B x 2b R R

[1] Compare the exponents Compare Difference


Segment 1: exponents
[2] Align the mantissa by subtraction
[3] Add/sub the mantissa
[4] Normalize the result
R

Segment 2: Choose exponent Align mantissa

Segment 3: Add or subtract


mantissas

R R

Segment 4: Adjust Normalize


exponent result

R R

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 9 Instruction Pipeline

INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place

* Some instructions skip some phases


* Effective address calculation can be done in
the part of the decoding phase
* Storage of the operation result into a register
is done automatically in the execution phase

==> 4-Stage Pipeline

[1] FI: Fetch an instruction from memory


[2] DA: Decode the instruction and calculate
the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 10 Instruction Pipeline

INSTRUCTION PIPELINE

Execution of Three Instructions in a 4-Stage Pipeline


Conventional

i FI DA FO EX

i+1 FI DA FO EX

i+2 FI DA FO EX

Pipelined

i FI DA FO EX
i+1 FI DA FO EX
i+2 FI DA FO EX

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 11 Instruction Pipeline

INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

Segment1: Fetch instruction


from memory

Decode instruction
Segment2: and calculate
effective address

yes Branch?
no
Fetch operand
Segment3: from memory

Segment4: Execute instruction

Interrupt yes
Interrupt?
handling
no
Update PC

Empty pipe
Step: 1 2 3 4 5 6 7 8 9 10 11 12 13
Instruction 1 FI DA FO EX
2 FI DA FO EX
(Branch) 3 FI DA FO EX
4 FI FI DA FO EX
5 FI DA FO EX
6 FI DA FO EX
7 FI DA FO EX

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 12 RISC Pipeline

RISC PIPELINE
RISC
- Machine with a very fast clock cycle that
executes at the rate of one instruction per cycle
<- Simple Instruction Set
Fixed Length Instruction Format
Register-to-Register Operations

Instruction Cycles of Three-Stage Instruction Pipeline


Data Manipulation Instructions
I: Instruction Fetch
A: Decode, Read Registers, ALU Operations
E: Write a Register

Load and Store Instructions


I: Instruction Fetch
A: Decode, Evaluate Effective Address
E: Register-to-Memory or Memory-to-Register

Program Control Instructions


I: Instruction Fetch
A: Decode, Evaluate Branch Address
E: Write Register(PC)
Computer Organization Computer Architectures Lab
Pipelining and Vector Processing 13 RISC Pipeline

DELAYED LOAD
LOAD: R1  M[address 1]
LOAD: R2  M[address 2]
ADD: R3  R1 + R2
STORE: M[address 3]  R3
Three-segment pipeline timing
Pipeline timing with data conflict

clock cycle 1 2 3 4 5 6
Load R1 I A E
Load R2 I A E
Add R1+R2 I A E
Store R3 I A E

Pipeline timing with delayed load

clock cycle 1 2 3 4 5 6 7
Load R1 I A E
The data dependency is taken
Load R2 I A E care by the compiler rather
NOP I A E than the hardware
Add R1+R2 I A E
Store R3 I A E

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 14 RISC Pipeline

DELAYED BRANCH
Compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps

Using no-operation instructions


Clock cycles: 1 2 3 4 5 6 7 8 9 10
1. Load I A E
2. Increment I A E
3. Add I A E
4. Subtract I A E
5. Branch to X I A E
6. NOP I A E
7. NOP I A E
8. Instr. in X I A E

Rearranging the instructions


Clock cycles: 1 2 3 4 5 6 7 8
1. Load I A E
2. Increment I A E
3. Branch to X I A E
4. Add I A E
5. Subtract I A E
6. Instr. in X I A E

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 15 Vector Processing

VECTOR PROCESSING
Vector Processing Applications
• Problems that can be efficiently formulated in terms of vectors
– Long-range weather forecasting
– Petroleum explorations
– Seismic data analysis
– Medical diagnosis
– Aerodynamics and space flight simulations
– Artificial intelligence and expert systems
– Mapping the human genome
– Image processing

Vector Processor (computer)


Ability to process vectors, and related data structures such as matrices
and multi-dimensional arrays, much faster than conventional computers

Vector Processors may also be pipelined

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 16 Vector Processing

VECTOR PROGRAMMING

DO 20 I = 1, 100
20 C(I) = B(I) + A(I)

Conventional computer

Initialize I = 0
20 Read A(I)
Read B(I)
Store C(I) = A(I) + B(I)
Increment I = i + 1
If I  100 goto 20

Vector computer

C(1:100) = A(1:100) + B(1:100)

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 17 Vector Processing

VECTOR INSTRUCTION FORMAT

Vector Instruction Format


Operation Base address Base address Base address Vector
code source 1 source 2 destination length

Pipeline for Inner Product

Source
A

Source Multiplier Adder


B pipeline pipeline

Computer Organization Computer Architectures Lab


Pipelining and Vector Processing 18 Vector Processing

MULTIPLE MEMORY MODULE AND INTERLEAVING

Multiple Module Memory


Address bus
M0 M1 M2 M3

AR AR AR AR

Memory Memory Memory Memory


array array array array

DR DR DR DR

Data bus

Address Interleaving

Different sets of addresses are assigned to


different memory modules

Computer Organization Computer Architectures Lab

You might also like