Systems I: Pipelining II
Systems I: Pipelining II
Systems I: Pipelining II
Pipelining II
Topics
Pipelining hardware: registers and
feedback paths
Difficulties with pipelines: hazards
Method of mitigating hazards
valM
SEQ+ Hardware
data out
read
Mem. Data
Data
Memory control memory
memory
write
Addr Data
Still sequential
Bch valE
implementation
ALU
Reorder PC stage to put at Execute CC
CC ALU
ALU fun.
beginning ALU
A
ALU
B
PC Stage
Task is to select PC for valA valB dstE dstM srcA srcB
current instruction
dstE dstM srcA srcB
Based on results computed
Decode A B
PC is no longer stored in
register Fetch Instruction
Instruction
memory
memory
PC
PC
increment
increment
2
Adding Pipeline Registers
valE, valM
Write back W_icode, W_valM W_valE, W_valM, W_dstE, W_dstM
valM
valM
W
valM
Data
Data
Memory M_icode, Data
Data
memory
memory Memory
M_Bch, memory
memory
M_valA
Addr, Data
Addr, Data
M
valE
E
valA, valB
valA, valB
srcA, srcB
Decode d_srcA,
dstA, dstB A B
Decode A B
icode, valC Register M
Register d_srcB
RegisterM
Register
valP file
file
E file
file E
Write back
valP D
icode, ifun valP
rA, rB
valC
Instruction PC icode, ifun,
Fetch Instruction PC
rA, rB, valC valP
memory
memory increment
increment
Fetch Instruction
Instruction PC
PC
memory
memory increment
increment
PC predPC
PC pState
PC f_PC
3
Pipeline Stages
Fetch
Select current PC
Read instruction
Compute incremented PC
Decode
Read program registers
Execute
Operate ALU
Memory
Read or write data memory
Write Back
Update register file
4
Write back
read
valE valM
data out
dstE dstM
Mem. Data
Data
control memory
memory
write
Memory
Pipeline registers hold Addr
data in
M_valA
intermediate values
M_Bch
ALU ALU
d_srcA d_srcB
Select d_rvalA dstE dstM srcA srcB
Cannot jump past A
W_valM
Decode A B
stages Register M
Register
file
file
E
W_valE
F predPC
5
Signal Naming Conventions
S_Field
Value of Field held in stage S pipeline
register
s_Field
Value of Field computed in stage S
6
Write back
read
valE valM
data out
dstE dstM
Mem. Data
Data
control memory
memory
write
Memory
Predicted PC M_Bch
Addr
data in
M_valA
e_Bch
Branch information
ALU
CC
CC ALU
ALU fun.
ALU ALU
Jump taken/not-taken Execute
A B
Fall-through or target
address E icode ifun valC valA valB dstE dstM srcA srcB
d_srcA d_srcB
Select d_rvalA dstE dstM srcA srcB
A B W_valM
Register M
Register
Read from memory file
file W_valE
E
Select
M_valA
PC W_valM
F predPC
7
Pipeline Demonstration
1 2 3 4 5 6 7 8 9
Cycle 5
File: demo-basic.ys W
I1
M
I2
E
I3
D
I4
F
I5
8
Data Dependencies: 3 Nop’s
1 2 3 4 5 6 7 8 9 10 11
# demo-h3.ys
0x000: irmovl $10,%edx F D E M W
0x006: irmovl $3,%eax F D E M W
0x00c: nop F D E M W
0x00d: nop F D E M W
0x00e: nop F D E M W
0x00f: addl %edx,%eax F D E M W
0x011: halt F D E M W
Cycle 6
W
R[ %eax] 3
Cycle 7
D
valA R[ %edx] = 10
valB R[ %eax] = 3 9
Data Dependencies: 2 Nop’s
1 2 3 4 5 6 7 8 9 10
# demo-h2.ys
0x000: irmovl $10,%edx F D E M W
0x006: irmovl $3,%eax F D E M W
0x00c: nop F D E M W
0x00d: nop F D E M W
0x00e: addl %edx,%eax F D E M W
0x010: halt F D E M W
Cycle 6
W
R[ %eax] 3
10
Data Dependencies: 1 Nop
# demo-h1.ys 1 2 3 4 5 6 7 8 9
Cycle 5
W
R[ %edx] 10
M
M_valE = 3
M_dstE = %eax
Now a
problem with •
both operands •
•
D
Error
valA R[ %edx] = 0
valB R[ %eax] = 0 11
Data Dependencies: No Nop
1 2 3 4 5 6 7 8
# demo-h0.ys
0x000: irmovl $10,%edx F D E M W
0x006: irmovl $3,%eax F D E M W
0x00c: addl %edx,%eax F D E M W
0x00e: halt F D E M W
Cycle 4
M
M_valE = 10
M_dstE = %edx
E
e_valE 0 + 3 = 3
Wow - we really E_dstE = %eax
missed the boat here…
D
Error
valA R[ %edx] = 0
valB R[ %eax] = 0
12
M_icode
PC
W_valM
D icode ifun rA rB valC valP
Predict
PC
Need
valC
Instr PC
PC
valid increment
Need increment
regids
Split
Split Align
Align
Byte 0 Bytes 1-5
Instruction
Instruction
memory
memory
Select
PC
F predPC
13
Our Prediction Strategy
Instructions that Don’t Transfer Control
Predict next PC to be valP
Always reliable
Conditional Jumps
Predict next PC to be valC (destination)
Only correct if branch is taken
Typically right 60% of time
Return Instruction
Don’t try to predict
14
Recovering
M_icode
M_Bch
M_valA
from PC
W_icode
W_valM
D icode ifun rA rB valC valP
Misprediction Instr
Need
valC
Predict
PC
PC
PC
valid increment
Need increment
regids
Split
Split Align
Align
Byte 0 Bytes 1-5
Instruction
Instruction
memory
memory
Select
PC
F predPC
Mispredicted Jump
Will see branch flag once instruction reaches memory stage
Can get fall-through PC from valA
Return Instruction
Will get return PC when ret reaches write-back stage
In both cases
Need to throw away instructions fetched between prediction and resolution
15
Branch Misprediction Example
demo-j.ys
16
Branch Misprediction Trace
# demo-j 1 2 3 4 5 6 7 8 9
Cycle 5
M
Incorrectly execute two M_Bch = 0
M_valA = 0x007
instructions at branch target
E
valE 3
dstE = %edx
D
valC = 4
dstE = %ecx
F
valC 1
rB %eax 17
Return Example demo-ret.ys
18
Incorrect Return Example
# demo-ret
0x023: ret F D E M W
0x024: irmovl $1,%eax # Oops! F D E M W
0x02a: irmovl $2,%ecx # Oops! F D E M W
0x030: irmovl $3,%edx # Oops! F D E M W
0x00e: irmovl $5,%esi # Return F D E M W
Incorrectly execute 3
W
instructions following ret
valM = 0x0e
M
valE = 1
dstE = %eax
E
valE 2
dstE = %ecx
D
valC = 3
dstE = %edx
F
valC 5
rB %esi 19
Pipeline Summary
Concept
Break instruction execution into 5 stages
Run instructions through in pipelined mode
Limitations
Can’t handle dependencies between instructions when
instructions follow too closely
Data dependencies
One instruction writes register, later one reads it
Control dependency
Instruction sets PC in way that pipeline did not predict correctly
Mispredicted branch and return
20
The problem is hazards
Make the pipelined processor work!
Data Hazards
Instruction having register R as source follows shortly after
instruction having register R as destination
Common condition, don’t want to slow down pipeline
Control Hazards
Mispredict conditional branch
Our design predicts all branches as being taken
Naïve pipeline executes two extra instructions
Getting return address for ret instruction
Naïve pipeline executes three extra instructions
21
How do we fix the Pipeline?
Pad the program with NOPs
Yuck!
22
Stalling for Data Dependencies
1 2 3 4 5 6 7 8 9 10 11
# demo-h2.ys
0x000: irmovl $10,%edx F D E M W
0x006: irmovl $3,%eax F D E M W
0x00c: nop F D E M W
0x00d: nop F D E M W
bubble E M W
0x00e: addl %edx,%eax F D D E M W
0x010: halt F F D E M W
23
Stall Condition
Source Registers
srcA and srcB of current
instruction in decode stage
Destination Registers
dstE and dstM fields
Instructions in execute,
memory, and write-back
stages
Special Case
Don’t stall for register ID 15
(0xF)
Indicates absence of
register operand
Don’t stall for failed
conditional move
24
Detecting Stall Condition
1 2 3 4 5 6 7 8 9 10 11
# demo-h2.ys
0x000: irmovl $10,%edx F D E M W
0x006: irmovl $3,%eax F D E M W
0x00c: nop F D E M W
0x00d: nop F D E M W
bubble E M W
0x00e: addl %edx,%eax F D D E M W
0x010: halt F F D E M W
Cycle 6
W
W_dstE = %eax
W_valE = 3
•
•
•
D
srcA = %edx
srcB = %eax 25
Stalling X3
1 2 3 4 5 6 7 8 9 10 11
# demo-h0.ys
0x000: irmovl $10,%edx F D E M W
0x006: irmovl $3,%eax F D E M W
bubble E M W
bubble E M W
bubble E M W
0x00c: addl %edx,%eax F D D D D E M W
0x00e: halt F F F F D E M W
Cycle 6
W
Cycle 5 W_dstE = %eax
M
Cycle 4 M_dstE = %eax •
•
E • •
E_dstE = %eax
•
•
D D D
srcA = %edx srcA = %edx srcA = %edx
srcB = %eax srcB = %eax srcB = %eax 26
What Happens When Stalling?
# demo-h0.ys
Cycle 8
4
5
6
7
0x000: irmovl $10,%edx
Write Back 0x000: bubble
0x006: irmovl $10,%edx
$3,%eax
0x006: irmovl $3,%eax
Memory 0x000: bubble
0x006: irmovl $10,%edx
$3,%eax
0x00c: addl %edx,%eax
Execute 0x006: bubble
0x00c: irmovl
addl %edx,%eax
$3,%eax
0x00e: halt
Decode 0x00c: halt
0x00e: addl %edx,%eax
Fetch 0x00e: halt
27
Implementing Stalling
W_dstM
W_dstE
M_dstM
M_dstE
E_dstM
Pipe E_dstE
control
logic E_bubble
E icode ifun valC valA valB dstE dstM srcA srcB
d_srcB
d_srcA
srcB
D_icode
srcA
F_stall
F predPC
Pipeline Control
Combinational logic detects stall condition
Sets mode signals for how pipeline registers should update
28
Pipeline Register Modes
Rising
Input = y Output = x clock Output = y
Normal x
y
stall bubble
=0 =0
Rising
Input = y Output = x clock Output = x
Stall x
x
stall bubble
=1 =0
Rising
Input = y Output = x clock Output = nop
Bubble x
n
o
p
stall bubble
=0 =1
29
Summary
Today
Data hazards (read after write)
Control hazards (branch, return)
Mitigating hazards through stalling
Next Time
Hazard mitigation through pipeline forwarding
Hardware support for forwarding
Forwarding to mitigate control (branch) hazards
30