0% found this document useful (0 votes)
119 views49 pages

Pipelining and Superscalar Techniques: CSE539: Advanced Computer Architecture

This document discusses chapter 6 of the book "Advanced Computer Architecture - Parallelism, Scalability, Programmability" which covers pipelining and superscalar techniques. The chapter contains sections on linear pipeline processors, non-linear pipeline processors, instruction pipeline design, arithmetic pipeline design, and superscalar pipeline design. It describes different pipeline models, clocking mechanisms, optimal pipeline stages, dynamic versus static pipelines, reservation tables, latency analysis, and mechanisms for instruction and arithmetic pipelining such as forwarding, scheduling, and branch handling techniques.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
119 views49 pages

Pipelining and Superscalar Techniques: CSE539: Advanced Computer Architecture

This document discusses chapter 6 of the book "Advanced Computer Architecture - Parallelism, Scalability, Programmability" which covers pipelining and superscalar techniques. The chapter contains sections on linear pipeline processors, non-linear pipeline processors, instruction pipeline design, arithmetic pipeline design, and superscalar pipeline design. It describes different pipeline models, clocking mechanisms, optimal pipeline stages, dynamic versus static pipelines, reservation tables, latency analysis, and mechanisms for instruction and arithmetic pipelining such as forwarding, scheduling, and branch handling techniques.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 49

CSE539: Advanced Computer Architecture

Chapter 6
Pipelining and Superscalar
Techniques
Book: Advanced Computer Architecture Parallelism, Scalability, Programmability, Hwang & Jotwani


Sumit Mittu
Assistant Professor, CSE/IT
Lovely Professional University
sumit.12735@lpu.co.in
In this chapter
Linear Pipeline Processors
Non-linear Pipeline Processors
Instruction Pipeline Design
Arithmetic Pipeline Design
Superscalar Pipeline Design
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 2
LINEAR PIPELINE PROCESSORS
Linear Pipeline Processor
o (Definition)
Models of Linear Pipeline
o Synchronous Model
o Asynchronous Model
o (Corresponding reservation tables)
Clocking and Timing Control
o Clock Cycle
o Pipeline Frequency
o Clock skewing
o Flow-through delay
o Speedup, Efficiency and Throughput
Optimal number of Stages and Performance-Cost Ratio (PCR)
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 3
LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 4
LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 5
NON-LINEAR PIPELINE PROCESSORS
Dynamic Pipeline
o Static v/s Dynamic Pipeline
o Streamline connection, feed-forward connection and feedback connection
Reservation and Latency Analysis
o Reservation tables
o Evaluation time
Latency Analysis
o Latency
o Collision
o Forbidden latencies
o Latency Sequence, Latency Cycle and Average Latency
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 6
NON-LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 7
NON-LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 8
NON-LINEAR PIPELINE PROCESSORS
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 9
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 10
Instruction Execution Phases
o E.g. Fetch, Decode, Issue, Execute, Write-back
o In-order Instruction issuing and Reordered Instruction issuing
E.g. X = Y + Z , A = B x C
Mechanisms/Design Issues for Instruction Pipelining
o Pre-fetch Buffers
o Multiple Functional Units
o Internal Data Forwarding
o Hazard Avoidance
Dynamic Scheduling
Branch Handling Techniques
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 11
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 12
Fetch: fetches instructions from memory; ideally one per cycle
Decode: reveals instruction operations to be performed and identifies the resources needed
Issue: reserves the resources and reads the operands from registers
Execute: actual processing of operations as indicated by instruction
Write Back: writing results into the registers
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 13
INSTRUCTION PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 14
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 15
Pre-fetch Buffers
o Sequential Buffers
o Target Buffers
o Loop Buffers
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 16
Multiple Functional Units
o Reservation Station and Tags
o Slow-station as Bottleneck stage
Subdivision of Pipeline Bottleneck stage
Replication of Pipeline Bottleneck stage
(Example to be discussed)
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 17
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 18
Internal Forwarding and Register Tagging
o Internal Forwarding:
A short-circuit technique to replace unnecessary memory accesses by register-register
transfers in a sequence of fetch-arithmetic-store operations
o Register Tagging:
Use of tagged registers , buffers and reservation stations, for exploiting concurrent activities
among multiple arithmetic units
o Store-Fetch Forwarding
(M R1, R2 M) replaced by (M R1, R2 R1)
o Fetch-Fetch Forwarding
(R1 M, R2 M) replaced by (R1 M, R2 R1)
o Store-Store Overwriting
(M R1, M R2) replaced by (M R2)
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 19
Internal Forwarding and Register Tagging
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 20
Internal Forwarding and Register Tagging
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 21
Hazard Detection and Avoidance
o Domain or Input Set of an instruction
o Range or Output Set of an instruction
o Data Hazards: RAW, WAR and WAW
o Resolution using Register Renaming approach
INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 22
Idea of Static Scheduling
o Compiler based scheduling strategy to resolve Interlocking among instructions
Dynamic Scheduling
o Tomasulos Algorithm (Register-Tagging Scheme)
Hardware based dependence-resolution
o Scoreboarding Technique
Scoreboard: the centralized control unit
A kind of data-driven mechanism
INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 23
INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 24
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 25
Branch Taken, Branch Target, Delay Slot
Effect of Branching
o Parameters:
k : No. of stages in the pipeline
n : Total no. of instructions or tasks
p : Percentage of Brach instructions over n
q : Percentage of successful branch instructions (branch taken) over p.
b : Delay Slot
: Pipeline Cycle Time
o Branch Penalty = q of (p of n) * b = pqnb
o Effective Execution Time:
T
eff
= [k + (n-1)] + pqnb =[k + (n-1) + pqnb]
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 26
Effect of Branching
o Effective Throughput:
H
eff
= n/T
eff
H
eff
= n / {[k + (n-1) + pqnb]} = nf / [k + (n-1) + pqnb]
As nInfinity and b = k-1
o H*
eff
= f / [pq(k-1)+1]
If p=0 and q=0 (no branching occurs)
o H**
eff
= f = 1/
o Performance Degradation Factor
D = 1 H*
eff
/ f = pq(k-1) / [pq(k-1)+1]

INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 27
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 28
Branch Prediction
o Static Branch Prediction: based on branch code types
o Dynamic Branch prediction: based on recent branch history
Strategy 1: Predict the branch direction based on information found at decode stage.
Strategy 2: Use a cache to store target addresses at effective address calculation stage.
Strategy 3: Use a cache to store target instructions at fetch stage
o Brach Target Buffer Organization
Delayed Branches
o A delayed branch of d cycles allows at most d-1 useful instructions to be executed following the
branch taken.
o Execution of these instructions should be independent of branch instruction to achieve a zero
branch penalty
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 29
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 30
ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 31
Finite-precision arithmetic
Overflow and Underflow
Fixed-Point operations
o Notations:
Signed-magnitude, ones complement and two-complement notation
o Operations:
Addition: (n bit, n bit) (n bit) Sum, 1 bit output carry
Subtraction: (n bit, n bit) (n bit) difference
Multiplication: (n bit, n bit) (2n bit) product
Division: (2n bit, n bit) (n bit) quotient, (n bit) remainder
ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 32
Floating-Point Numbers
o X = (m, e) representation
m: mantissa or fraction
e: exponent with an implied base or radix r.
Actual Value X = m * r
e
o Operations on numbers X = (m
x
, e
x
) and Y = (m
y
, e
y
)
Addition: (m
x
* r
ex-ey
+ m
y
, e
y
)
Subtraction: (m
x
* r
ex-ey
m
y
, e
y
)
Multiplication: (m
x
* m
y
, e
x
+e
y
)
Division: (m
x
/ m
y
, e
x
e
y
)
Elementary Functions
o Transcendental functions like: Trigonometric, Exponential, Logarithmic, etc.
ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 33
Separate units for fixed point operations and floating point operations
Scalar and Vector Arithmetic Pipelines
Uni-functional or Static Pipelines
Arithmetic Pipeline Stages
o Majorly involve hardware to perform: Add and Shift micro-operations
o Addition using: Carry Propagation Adder (CPA) and Carry Save Adder (CSA)
o Shift using: Shift Registers
Multiplication Pipeline Design
o E.g. To multiply two 8-bit numbers that yield a 16-bit product using CSA and CPA Wallace Tree.
ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 34
0
ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 35
ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 36
Multifunctional Pipeline:
o Static multifunctional pipeline
o Dynamic multifunctional pipeline
Case Study: T1/ASC static multifunctional pipeline architecture
ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 37
ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 38

SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 39
Pipeline Design Parameters
o Pipeline cycle, Base cycle, Instruction issue rate, Instruction issue Latency, Simple Operation Latency
o ILP to fully utilize the pipeline
Superscalar Pipeline Structure
Data and Resource Dependencies
Pipeline Stalling
Superscalar Pipeline Scheduling
o In-order Issue and in-order completion
o In-order Issue and out-of-order completion
o Out-of-order Issue and out-of-order completion
Superscalar Performance
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 40
Parameter Base Scalar Processor Super Scalar Processor
(degree = K)
Pipeline Cycle 1 (base cycle) K
Instruction Issue Rate 1 K
Instruction Issue Latency 1 1
Simple Operation Latency 1 1
ILP to fully utilize pipeline 1 K
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 41
4,
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 42
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 43
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 44
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 45
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 46
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 47
4,
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 48
SUPERSCALAR PIPELINE DESIGN
Sumit Mittu, Assistant Professor, CSE/IT, Lovely Professional University 49
Time required by base scalar machine:
o T(1,1) = K + N 1
The ideal execution time required by m-issue superscalar machine:
o T(m,1) = K + (N m)/m
o Where,
K is the time required to execute first m instructions through m pipelines of k-stages
simultaneously
Second term corresponds to time required to execute remaining N-m instructions , m per
cycle through m pipelines
The ideal speedup of superscalar machine
o S(m,1) = T(1,1)/T(m,1) = m(N + k 1)/[N+ m(k 1)]
As n infinity, S(m,1) m

You might also like