Unit 6
Unit 6
Unit 6
Time
30 40 20 30 40 20 30 40 20 30 40 20
Sequential laundry takes 6 hours
A for 4 loads
If they learned pipelining, how
long would laundry take?
B
D
Traditional Pipeline Concept
6 PM 7 8 9 10 11 Midnight
Time
T
a 30 40 40 40 40 20
s
k A
Pipelined laundry takes
3.5 hours for 4 loads
O B
r
d C
e
r D
Traditional Pipeline Concept
Pipelining doesn’t help
6 PM 7 8 9 latency of single task, it helps
throughput of entire
Time
workload
T Pipeline rate limited by
a 30 40 40 40 40 20
slowest pipeline stage
s Multiple tasks operating
A
k simultaneously using
different resources
O B Potential speedup = Number
r pipe stages
d C Unbalanced lengths of pipe
e stages reduces speedup
r Time to “fill” pipeline and
D
time to “drain” it reduces
speedup
Stall for Dependences
Use the Idea of Pipelining in a
ComputerFetch + Execution
T ime
I1 I2 I3
Time
Clock cycle 1 2 3 4
F E F E F E
1 1 2 2 3 3 Instruction
I1 F1 E1
(a) Sequential execution
I2 F2 E2
Interstage buffer
B1
I3 F3 E3
Instruction Execution
fetch unit (c) Pipelined execution
unit
Computer
Clock cycle 1 2 3 4 5 6 7
Instruction
I1 F1 D1 E1 W1
Fetch + Decode
+ Execution + Write I2 F2 D2 E2 W2
I3 F3 D3 E3 W3
I4 F4 D4 E4 W4
Interstage buffers
D : Decode
F : Fetch instruction E: Execute W : Write
instruction and fetch operation results
operands
B1 B2 B3
Cycle time
= Maximum delay due to any stage + Delay due to its
register
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Part-02: Non-Pipeline Execution Time-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8
Part-04: Pipeline Time For 1000 Tasks-
Part-06: Throughput-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 3, 2, 4, 2, 3 } + 0
= 4 ns
Execution Time For 100 Instructions in Design D1-
Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99
instructions
= 1 x 5 clock cycles + 99 x 1 clock cycle
= 5 x cycle time + 99 x cycle time
= 5 x 4 ns + 99 x 4 ns
= 20 ns + 396 ns
= 416 ns
Cycle Time in Design D2-
Cycle time
= Delay due to a stage + Delay due to its register
= 2 ns + 0
= 2 ns
Execution Time For 100 Instructions in Design D2-
Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99
instructions
= 1 x 8 clock cycles + 99 x 1 clock cycle
= 8 x cycle time + 99 x cycle time
= 8 x 2 ns + 99 x 2 ns
= 16 ns + 198 ns
= 214 ns
Time Saved-
Time saved
= Execution time in design D1 – Execution time in design D2
= 416 ns – 214 ns
= 202 ns
Role of Cache Memory
Each pipeline stage is expected to complete in one
clock cycle.
The clock period should be long enough to let the
slowest pipeline stage to complete.
Faster stages can only wait for the slowest one to
complete.
Since main memory is very slow compared to the
execution, if each instruction needs to be fetched
from main memory, pipeline is almost useless.
Fortunately, we have cache.
Pipeline Performance
The potential increase in performance resulting from
pipelining is proportional to the number of pipeline
stages.
However, this increase would be achieved only if all
pipeline stages require the same time to complete,
and there is no interruption throughout program
execution.
Unfortunately, this is not true.
Pipeline Performance
Clock cycle 1 2 3 4 5 6 7 8 9
Time
Instruction
I1 F1 D1 E1 W1
I2 F2 D2 E2 W2
I3 F3 D3 E3 W3
I4 F4 D4 E4 W4
I5 F5 D5 E5
Figure 8.3. Effect of an execution operation taking more than one clock cycle.
Pipeline Performance
The previous pipeline is said to have been stalled for two clock
cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard – any condition in which either the source or the
destination operands of an instruction are not available at the time
expected in the pipeline. So some operation has to be delayed, and
the pipeline stalls.
Instruction (control) hazard – a delay in the availability of an
instruction causes the pipeline to stall.
Structural hazard – the situation when two instructions require the
use of a given hardware resource at the same time.
Data Hazards
We must ensure that the results obtained when instructions are
executed in a pipelined processor are identical to those obtained
when the same instructions are executed sequentially.
Hazard occurs
A←3+A
B←4×A
No hazard
A←5×C
B ← 20 + C
When two operations depend on each other, they must be
executed sequentially in the correct order.
Another example:
Mul R2, R3, R4
Add R5, R4, R6
Data Hazards Time
Clock cycle 1 2 3 4 5 6 7 8 9
Instruction
I1 (Mul) F1 D1 E1 W1
I2 (Add) F2 D2 D2A E2 W2
I3 F3 D3 E3 W3
I4 F4 D4 E4 W4
Instruction
I1 F1 E1
I3 F3 X
Ik Fk Ek