Pipelining and Superscalar Techniques: CSE539: Advanced Computer Architecture
Pipelining and Superscalar Techniques: CSE539: Advanced Computer Architecture
Chapter 6
Pipelining and Superscalar
Techniques
Book: Advanced Computer Architecture Parallelism, Scalability, Programmability, Hwang & Jotwani
Sumit Mittu
Assistant Professor, CSE/IT
Lovely Professional University
sumit.12735@lpu.co.in
In this chapter
Linear Pipeline Processors
Non-linear Pipeline Processors
Instruction Pipeline Design
Arithmetic Pipeline Design
Superscalar Pipeline Design
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 2
LINEAR PIPELINE PROCESSORS
Linear Pipeline Processor
o (Definition)
Models of Linear Pipeline
o Synchronous Model
o Asynchronous Model
o (Corresponding reservation tables)
Clocking and Timing Control
o Clock Cycle
o Pipeline Frequency
o Clock skewing
o Flow-through delay
o Speedup, Efficiency and Throughput
Optimal number of Stages and Performance-Cost Ratio (PCR)
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 3
LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 4
LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 5
NON-LINEAR PIPELINE PROCESSORS
Dynamic Pipeline
o Static v/s Dynamic Pipeline
o Streamline connection, feed-forward connection and feedback connection
Reservation and Latency Analysis
o Reservation tables
o Evaluation time
Latency Analysis
o Latency
o Collision
o Forbidden latencies
o Latency Sequence, Latency Cycle and Average Latency
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 6
NON-LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 7
NON-LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 8
NON-LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 9
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 10
Instruction Execution Phases
o E.g. Fetch, Decode, Issue, Execute, Write-back
o In-order Instruction issuing and Reordered Instruction issuing
E.g. X = Y + Z , A = B x C
Mechanisms/Design Issues for Instruction Pipelining
o Pre-fetch Buffers
o Multiple Functional Units
o Internal Data Forwarding
o Hazard Avoidance
Dynamic Scheduling
Branch Handling Techniques
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 11
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 12
Fetch: fetches instructions from memory; ideally one per cycle
Decode: reveals instruction operations to be performed and identifies the resources needed
Issue: reserves the resources and reads the operands from registers
Execute: actual processing of operations as indicated by instruction
Write Back: writing results into the registers
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 13
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 14
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 15
Pre-fetch Buffers
o Sequential Buffers
o Target Buffers
o Loop Buffers
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 16
Multiple Functional Units
o Reservation Station and Tags
o Slow-station as Bottleneck stage
Subdivision of Pipeline Bottleneck stage
Replication of Pipeline Bottleneck stage
(Example to be discussed)
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 17
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 18
Internal Forwarding and Register Tagging
o Internal Forwarding:
A short-circuit technique to replace unnecessary memory accesses by register-register
transfers in a sequence of fetch-arithmetic-store operations
o Register Tagging:
Use of tagged registers , buffers and reservation stations, for exploiting concurrent activities
among multiple arithmetic units
o Store-Fetch Forwarding
(M R1, R2 M) replaced by (M R1, R2 R1)
o Fetch-Fetch Forwarding
(R1 M, R2 M) replaced by (R1 M, R2 R1)
o Store-Store Overwriting
(M R1, M R2) replaced by (M R2)
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 19
Internal Forwarding and Register Tagging
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 20
Internal Forwarding and Register Tagging
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 21
Hazard Detection and Avoidance
o Domain or Input Set of an instruction
o Range or Output Set of an instruction
o Data Hazards: RAW, WAR and WAW
o Resolution using Register Renaming approach
INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 22
Idea of Static Scheduling
o Compiler based scheduling strategy to resolve Interlocking among instructions
Dynamic Scheduling
o Tomasulos Algorithm (Register-Tagging Scheme)
Hardware based dependence-resolution
o Scoreboarding Technique
Scoreboard: the centralized control unit
A kind of data-driven mechanism
INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 23
INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 24
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 25
Branch Taken, Branch Target, Delay Slot
Effect of Branching
o Parameters:
k : No. of stages in the pipeline
n : Total no. of instructions or tasks
p : Percentage of Brach instructions over n
q : Percentage of successful branch instructions (branch taken) over p.
b : Delay Slot
: Pipeline Cycle Time
o Branch Penalty = q of (p of n) * b = pqnb
o Effective Execution Time:
T
eff
= [k + (n-1)] + pqnb =[k + (n-1) + pqnb]
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 26
Effect of Branching
o Effective Throughput:
H
eff
= n/T
eff
H
eff
= n / {[k + (n-1) + pqnb]} = nf / [k + (n-1) + pqnb]
As nInfinity and b = k-1
o H*
eff
= f / [pq(k-1)+1]
If p=0 and q=0 (no branching occurs)
o H**
eff
= f = 1/
o Performance Degradation Factor
D = 1 H*
eff
/ f = pq(k-1) / [pq(k-1)+1]
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 27
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 28
Branch Prediction
o Static Branch Prediction: based on branch code types
o Dynamic Branch prediction: based on recent branch history
Strategy 1: Predict the branch direction based on information found at decode stage.
Strategy 2: Use a cache to store target addresses at effective address calculation stage.
Strategy 3: Use a cache to store target instructions at fetch stage
o Brach Target Buffer Organization
Delayed Branches
o A delayed branch of d cycles allows at most d-1 useful instructions to be executed following the
branch taken.
o Execution of these instructions should be independent of branch instruction to achieve a zero
branch penalty
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 29
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 30
ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 31
Finite-precision arithmetic
Overflow and Underflow
Fixed-Point operations
o Notations:
Signed-magnitude, ones complement and two-complement notation
o Operations:
Addition: (n bit, n bit) (n bit) Sum, 1 bit output carry
Subtraction: (n bit, n bit) (n bit) difference
Multiplication: (n bit, n bit) (2n bit) product
Division: (2n bit, n bit) (n bit) quotient, (n bit) remainder
ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 32
Floating-Point Numbers
o X = (m, e) representation
m: mantissa or fraction
e: exponent with an implied base or radix r.
Actual Value X = m * r
e
o Operations on numbers X = (m
x
, e
x
) and Y = (m
y
, e
y
)
Addition: (m
x
* r
ex-ey
+ m
y
, e
y
)
Subtraction: (m
x
* r
ex-ey
m
y
, e
y
)
Multiplication: (m
x
* m
y
, e
x
+e
y
)
Division: (m
x
/ m
y
, e
x
e
y
)
Elementary Functions
o Transcendental functions like: Trigonometric, Exponential, Logarithmic, etc.
ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 33
Separate units for fixed point operations and floating point operations
Scalar and Vector Arithmetic Pipelines
Uni-functional or Static Pipelines
Arithmetic Pipeline Stages
o Majorly involve hardware to perform: Add and Shift micro-operations
o Addition using: Carry Propagation Adder (CPA) and Carry Save Adder (CSA)
o Shift using: Shift Registers
Multiplication Pipeline Design
o E.g. To multiply two 8-bit numbers that yield a 16-bit product using CSA and CPA Wallace Tree.
ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 34
0
ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 35
ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 36
Multifunctional Pipeline:
o Static multifunctional pipeline
o Dynamic multifunctional pipeline
Case Study: T1/ASC static multifunctional pipeline architecture
ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 37
ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 38
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 39
Pipeline Design Parameters
o Pipeline cycle, Base cycle, Instruction issue rate, Instruction issue Latency, Simple Operation Latency
o ILP to fully utilize the pipeline
Superscalar Pipeline Structure
Data and Resource Dependencies
Pipeline Stalling
Superscalar Pipeline Scheduling
o In-order Issue and in-order completion
o In-order Issue and out-of-order completion
o Out-of-order Issue and out-of-order completion
Superscalar Performance
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 40
Parameter Base Scalar Processor Super Scalar Processor
(degree = K)
Pipeline Cycle 1 (base cycle) K
Instruction Issue Rate 1 K
Instruction Issue Latency 1 1
Simple Operation Latency 1 1
ILP to fully utilize pipeline 1 K
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 41
4,
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 42
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 43
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 44
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 45
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 46
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 47
4,
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 48
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 49
Time required by base scalar machine:
o T(1,1) = K + N 1
The ideal execution time required by m-issue superscalar machine:
o T(m,1) = K + (N m)/m
o Where,
K is the time required to execute first m instructions through m pipelines of k-stages
simultaneously
Second term corresponds to time required to execute remaining N-m instructions , m per
cycle through m pipelines
The ideal speedup of superscalar machine
o S(m,1) = T(1,1)/T(m,1) = m(N + k 1)/[N+ m(k 1)]
As n infinity, S(m,1) m