Lec18 Pipeline
Lec18 Pipeline
Lec18 Pipeline
Parallel processing
Theoretical only
processors receive different instructions, but operate
on the same data.
Sequential Laundry
6 PM
10
11
Midnight
Time
30 40 20 30 40 20 30 40 20 30 40 20
T
a
s
k
A
B
O
r
d
e
r
C
90 min
D
This operator scheduled his loads to be delivered to the laundry every 90
minutes which is the time required to finish one load. In other words he
will not start a new task unless he is already done with the previous task
The process is sequential. Sequential laundry takes 6 hours for 4 loads
10
11
Midnight
Time
30 40
T
a
s
k
40
40
40 20
40 40 40
A
B
O
r
d
e
r
C
D
Another operator asks for the delivery of loads to the laundry every 40
minutes!?.
Pipelined laundry takes 3.5 hours for 4 loads
Pipelining Facts
6 PM
9
Time
T
a
s
k
O
r
d
e
r
30 40
A
40
40 20
B
C
40
The washer
waits for the
dryer for 10
minutes
Multiple tasks
operating
simultaneously
Pipelining doesnt help
latency of single task,
it helps throughput of
entire workload
Pipeline rate limited by
slowest pipeline stage
Potential speedup =
Number of pipe stages
Unbalanced lengths of
pipe stages reduces
speedup
Time to fill pipeline
and time to drain it
reduces speedup
9.2 Pipelining
Decomposes a sequential process into segments.
9.2 Pipelining
k segments
SPEEDUP
Example
SPEEDUP
Example
5-Stage Pipelining
S1
S2
S3
S4
S5
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Time
S1
1 2 3 4 5 6 7 8 9
S2
1 2 3 4 5 6 7 8
S3
1 2 3 4 5 6 7
S4
1 2 3 4 5 6
S5
1 2 3 4 5
Example Answer
Example
Example Answer
Maximum Speedup:
100*N/ (6+N-1)*30 = 10/3
Some definitions
Some definitions
Throughput of the instruction pipeline is determined by
how often an instruction exits the pipeline. Pipelining
does not decrease the time for individual instruction
execution. Instead, it increases instruction throughput.
sequential processing
Instruction pipeline
sequential processing is
Instructions seperate
1.
2.
3.
4.
5.
5-Stage Pipelining
S1
S2
S3
S4
S5
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Time
S1
1 2 3 4 5 6 7 8 9
S2
1 2 3 4 5 6 7 8
S3
1 2 3 4 5 6 7
S4
1 2 3 4 5 6
S5
1 2 3 4 5
Five Stage
Instruction Pipeline
Fetch instruction
Decode instruction
Fetch operands
Execute instructions
Write result
Difficulties...
If a complicated memory access occurs in
stage 1, stage 2 will be delayed and the
rest of the pipe is stalled.
If there is a branch, if.. and jump,
then some of the instructions that have
already entered the pipeline should not be
processed.
We need to deal with these difficulties to
keep the pipeline moving
Pipeline Hazards
Structural hazard
Data hazard
Branch hazard
Pipeline Hazards
Structural hazard
Data hazard
Branch hazard
Structural hazard
Structural hazard
Memory data fetch requires on FI and FO
S1
S2
S3
S4
S5
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Time
S1
1 2 3 4 5 6 7 8 9
S2
1 2 3 4 5 6 7 8
S3
1 2 3 4 5 6 7
S4
1 2 3 4 5 6
S5
1 2 3 4 5
Structural hazard
Structural hazard
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Time
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Data hazard
Example:
ADD
SUB
AND
OR
XOR
R1R2+R3
R4R1-R5
R6R1 AND R7
R8R1 OR R9
R10R1 XOR R11
Data hazard
FO: fetch data value
S1
S2
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Time
Execution
Instruction
(EI)
Write
Operand
(WO)
Data hazard
ADD
No-op
No-op
SUB
AND
OR
XOR
R1R2+R3
R4R1-R5
R6R1 AND R7
R8R1 OR R9
R10R1 XOR R11
Data hazard
Data hazard
Data hazard
Branch hazards
Branch hazards
Freeze scheme
Predict-untaken scheme
Predict-taken scheme
Delayed branch
5-Stage Pipelining
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Time
S1
1 2 3 4 5 6 7 8 9
S2
1 2 3 4 5 6 7 8
S3
1 2 3 4 5 6 7
S4
1 2 3 4 5 6
S5
1 2 3 4 5
Write
Operand
(WO)
Branch Untaken
(Freeze approach)
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Branch Taken
(Freeze approach)
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Branch Taken
(Freeze approach)
Branch Hazards
(Predicted-untaken)
Branch Untaken
(Predicted-untaken)
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Time
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Branch Taken
(Predicted-untaken)
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Branch Taken
(Predicted-taken)
Branch Untaken
(Predicted-taken)
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Branch taken
(Predicted-taken)
Fetch
Instruction
(FI)
Decode
Instruction
(DI)
Fetch
Operand
(FO)
Execution
Instruction
(EI)
Write
Operand
(WO)
Delayed Branch
Delay slot
branch target if taken
Delayed Branch
Optimal
Delayed
Branch
If the optimal is not
available:
(b) Act like
predict-taken
(in complier way)
(c) Act like
predict-untaken
(in complier way)
Delayed Branch
Branch Prediction
Branch Prediction