Cse410 10 Pipelining A

This document summarizes the execution cycle and pipelining in a 5-stage MIPS processor. It discusses how instructions are fetched, decoded, executed, access memory, and write results back in separate pipeline stages. Pipelining allows overlapping execution of multiple instructions to improve throughput. However, data and structural hazards can occur when instructions depend on results not yet available or compete for hardware resources. Solutions like deeper pipelines, separate caches, and forwarding of register values between stages help address these hazards and maximize performance.


Pipelining
CSE 410, Spring 2005
Computer Systems
http://www.cs.washington.edu/410

Execution Cycle

IF  ID  EX  MEM  WB

1. Instruction Fetch
2. Instruction Decode
3. Execute
4. Memory
5. Write Back

IF and ID Stages

1. Instruction Fetch
» Get the next instruction from memory
» Increment the Program Counter value by 4
2. Instruction Decode
» Figure out what the instruction says to do
» Get values from the named registers
» The simple instruction format means we know which registers we may need before the instruction is fully decoded

Simple MIPS Instruction Formats

R: op code | source 1 | source 2 | dest   | shamt  | function
   6 bits  | 5 bits   | 5 bits   | 5 bits | 5 bits | 6 bits

I: op code | base reg | src/dest | offset or immediate value
   6 bits  | 5 bits   | 5 bits   | 16 bits

J: op code | word offset
   6 bits  | 26 bits
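Because the fields sit at fixed bit positions, a decoder can slice out every register number before it even knows which instruction it has. A minimal Python sketch of R-type field extraction (field names follow the format table above; this is an illustrative model, not real decode hardware):

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its fixed fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs":     (word >> 21) & 0x1F,  # source 1
        "rt":     (word >> 16) & 0x1F,  # source 2
        "rd":     (word >> 11) & 0x1F,  # dest
        "shamt":  (word >> 6)  & 0x1F,  # shift amount
        "funct":  word         & 0x3F,  # function code
    }

# add $s0, $s1, $s2 encodes as 000000 10001 10010 10000 00000 100000
word = 0b00000010001100101000000000100000
fields = decode_r_type(word)  # rs=17 ($s1), rt=18 ($s2), rd=16 ($s0)
```

The register fields can be sent to the register file immediately, which is why ID can read $s1 and $s2 before decode finishes.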
EX, MEM, and WB Stages

3. Execute
» On a memory reference, add up base and offset
» On an arithmetic instruction, do the math
4. Memory Access
» If load or store, access memory
» If branch, replace PC with the destination address
» Otherwise do nothing
5. Write Back
» Place the results in the appropriate register

Example: add $s0, $s1, $s2

op code | source 1 | source 2 | dest  | shamt | function
000000  | 10001    | 10010    | 10000 | 00000 | 100000

• IF: get the instruction at PC from memory
• ID: determine what the instruction is and read registers
» op code 000000 with function 100000 is the add instruction
» get contents of $s1 and $s2 (eg: $s1=7, $s2=12)
• EX: add 7 and 12 = 19
• MEM: do nothing for this instruction
• WB: store 19 in register $s0
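The stage-by-stage walk above can be sketched as a tiny simulation. This is an illustrative model of the five stages, not the real datapath; the register values are the ones from the example:

```python
def run_add(regs, rd, rs, rt):
    """Trace add rd, rs, rt through the five stages (simplified model)."""
    trace = ["IF: fetch the instruction, PC += 4"]
    a, b = regs[rs], regs[rt]                  # ID: read the source registers
    trace.append(f"ID: read {rs}={a}, {rt}={b}")
    result = a + b                             # EX: do the math
    trace.append(f"EX: {a} + {b} = {result}")
    trace.append("MEM: nothing for an arithmetic instruction")
    regs[rd] = result                          # WB: write the destination register
    trace.append(f"WB: {rd} = {result}")
    return trace

regs = {"$s0": 0, "$s1": 7, "$s2": 12}
trace = run_add(regs, "$s0", "$s1", "$s2")     # leaves $s0 = 19
```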

Example: lw $t2, 16($s0)

op code | base reg | src/dest | offset or immediate value
100011  | 10000    | 01010    | 0000000000010000

• IF: get the instruction at PC from memory
• ID: determine what 100011 is
» 100011 is lw
» get contents of $s0 and $t2 (the hardware doesn't yet know that it won't need $t2), eg: $s0=0x200D1C00, $t2=77763
• EX: add 16 to 0x200D1C00 = 0x200D1C10
• MEM: load the word stored at 0x200D1C10
• WB: store the loaded value in $t2

Latency & Throughput

        1  2  3  4  5   6  7  8  9  10
inst 1  IF ID EX MEM WB
inst 2                  IF ID EX MEM WB

Latency: the time it takes for an individual instruction to execute
• What's the latency for this implementation?
» One instruction takes 5 clock cycles: Cycles Per Instruction (CPI) = 5
Throughput: the number of instructions that execute per unit time
• What's the throughput of this implementation?
» One instruction is completed every 5 clock cycles: average CPI = 5
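The lw walk-through can be sketched the same way as the add example. This is a simplified model with a dictionary standing in for memory; the value stored at the target address (42 here) is made up for illustration:

```python
def run_lw(regs, mem, rt, offset, base):
    """Walk lw rt, offset(base) through EX/MEM/WB (simplified model)."""
    addr = regs[base] + offset        # EX: compute the effective address
    value = mem[addr]                 # MEM: read the word at that address
    regs[rt] = value                  # WB: write the destination register
    return addr

regs = {"$s0": 0x200D1C00, "$t2": 77763}
mem = {0x200D1C10: 42}                # hypothetical memory contents
addr = run_lw(regs, mem, "$t2", 16, "$s0")   # addr = 0x200D1C10
```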
A Case for Pipelining

• If execution is non-overlapped, the functional units are underutilized because each unit is used only once every five cycles
• If the Instruction Set Architecture is carefully designed, the functional units can be arranged so that they execute in parallel
• Pipelining overlaps the stages of execution so every stage has something to do each cycle

Pipelined Latency & Throughput

        1  2  3  4  5   6   7   8   9
inst 1  IF ID EX MEM WB
inst 2     IF ID EX  MEM WB
inst 3        IF ID  EX  MEM WB
inst 4           IF  ID  EX  MEM WB
inst 5               IF  ID  EX  MEM WB

• What's the latency of this implementation?
• What's the throughput of this implementation?
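The two timelines can be compared with a one-line cycle count: non-overlapped execution needs 5 cycles per instruction, while the pipeline needs 5 cycles to fill and then retires one instruction per cycle. A sketch, assuming an ideal pipeline with no stalls:

```python
def total_cycles(n_instructions, stages=5, pipelined=True):
    """Clock cycles to run an instruction stream (ideal, no stalls)."""
    if pipelined:
        # stages cycles to fill the pipe, then one completion per cycle
        return stages + (n_instructions - 1)
    return stages * n_instructions    # non-overlapped execution

# Latency per instruction is still 5 cycles either way; throughput
# of the pipelined machine approaches one instruction per cycle.
five_pipelined = total_cycles(5)                      # 9 cycles, as in the diagram
five_sequential = total_cycles(5, pipelined=False)    # 25 cycles
```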

Pipelined Analysis

• A pipeline with N stages could improve throughput by N times, but
» each stage must take the same amount of time
» each stage must always have work to do
» there may be some overhead to implement
• Also, latency for each instruction may go up
» within some limits, we don't care

Throughput is good!

[Figure: total time versus an increasing number of instructions; the overlapped (pipelined) line rises far more slowly than the sequential line.]
MIPS ISA: Born to Pipeline

• Instructions are all one length
» simplifies the Instruction Fetch stage
• Regular format
» simplifies Instruction Decode
• Few memory operands, only registers
» only lw and sw instructions access memory
• Aligned memory operands
» only one memory access per operand

Memory Accesses

• An efficient pipeline requires each stage to take about the same amount of time
• The CPU is much faster than the memory hardware
• Cache is provided on chip
» i-cache holds instructions
» d-cache holds data
» a critical feature for a successful RISC pipeline
» more about caches next week

The Hazards of Parallel Activity

• Any time you get several things going at once, you run the risk of interactions and dependencies
» juggling doesn't take kindly to irregular events
• Unwinding activities after they have started can be very costly in terms of performance
» drop everything on the floor and start over

Design for Speed

• Most of what we talk about next relates to the CPU hardware itself
» problems keeping a pipeline full
» solutions that are used in the MIPS design
• Some programmer-visible effects remain
» many are hidden by the assembler or compiler
» the code that you write tells what you want done, but the tools rearrange it for speed
Pipeline Hazards

• Structural hazards
» instructions in different stages need the same resource, eg memory
• Data hazards
» data not available to perform the next operation
• Control hazards
» data not available to make a branch decision

Structural Hazards

• Concurrent instructions want the same resource
» a lw instruction in stage four (memory access)
» an add instruction in stage one (instruction fetch)
» both of these actions require access to memory; they would collide if the hardware were not designed for it
• Add more hardware to eliminate the problem
» separate instruction and data caches
• Or stall (cheaper and easier), but that is not usually done

Data Hazards

• When an instruction depends on the results of a previous instruction still in the pipeline
• This is a data dependency

add $s0, $s1, $s2   IF ID EX MEM WB        ← $s0 is written here (WB)
add $s4, $s3, $s0      IF ID EX MEM WB     ← $s0 is read here (ID)

Stall for Register Data Dependency

• Stall the pipeline until the result is available
» this would create a 3-cycle pipeline bubble

add s0,s1,s2   IF ID EX MEM WB
add s4,s3,s0      IF stall stall stall ID EX MEM WB
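The dependency above is a read-after-write (RAW) hazard, and it can be detected by comparing register names. A minimal sketch, using a simplified (dest, sources) tuple per instruction:

```python
def raw_hazard(producer, consumer):
    """True if consumer reads a register that producer writes.

    Each instruction is modeled as (dest_reg, [source_regs]).
    """
    dest, _ = producer
    _, sources = consumer
    return dest in sources

i1 = ("$s0", ["$s1", "$s2"])     # add $s0, $s1, $s2
i2 = ("$s4", ["$s3", "$s0"])     # add $s4, $s3, $s0  -- reads $s0
```

Real hazard-detection logic compares the rs/rt fields of the younger instruction against the rd field latched in the pipeline registers, but the comparison is the same idea.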
Read & Write in the Same Cycle

• Write the register in the first part of the clock cycle
• Read it in the second part of the clock cycle
• A 2-cycle stall is still required

add s0,s1,s2   IF ID EX MEM WB             ← write $s0 (first half of cycle 5)
add s4,s3,s0      IF stall stall ID EX MEM WB   ← read $s0 (second half of cycle 5)

Solution: Forwarding

• The value of $s0 is known internally after cycle 3 (after the first instruction's EX stage)
• The value of $s0 isn't needed until cycle 4 (before the second instruction's EX stage)
• If we forward the result there isn't a stall

add s0,s1,s2   IF ID EX MEM WB
add s4,s3,s0      IF ID EX MEM WB
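The stall counts discussed so far can be summarized in one small function. This is a simplified model of this 5-stage pipeline, not a general rule for all designs:

```python
def stalls_needed(producer_is_load, forwarding=True):
    """Bubble cycles between a producer and a dependent next instruction
    in the 5-stage MIPS pipeline (simplified model from the slides)."""
    if not forwarding:
        return 2   # wait for WB, with split-cycle register write/read
    # An ALU result can be forwarded from EX to the next EX with no stall;
    # a loaded value exists only after MEM, so one bubble remains (see below).
    return 1 if producer_is_load else 0
```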

Another Data Hazard

• What if the first instruction is lw?
• $s0 isn't known until after the MEM stage
» we can't forward back into the past
• Either stall or reorder instructions

lw  s0,0(s2)   IF ID EX MEM WB
add s4,s3,s0      IF ID EX MEM WB     ← NO! the add needs $s0 in EX before MEM produces it

Stall for lw Hazard

• We can stall for one cycle, but we hate to stall

lw  s0,0(s2)   IF ID EX MEM WB
add s4,s3,s0      IF ID stall EX MEM WB
Instruction Reorder for lw Hazard

• Try to execute an unrelated instruction between the two instructions

lw  s0,0(s2)   IF ID EX MEM WB
sub t4,t2,t3      IF ID EX  MEM WB
add s4,s3,s0         IF ID  EX  MEM WB

Reordering Instructions

• Reordering instructions is a common technique for avoiding pipeline stalls
• Static reordering
» the programmer, compiler, and assembler do this
• Dynamic reordering
» modern processors can see several instructions at once
» they execute any that have no dependency
» this is known as out-of-order execution and is complicated to implement, but effective
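The payoff of reordering can be checked with a small scheduler sketch. This simplified model assumes full forwarding, inserts one bubble on a load-use dependency, and only checks each instruction against its immediate predecessor:

```python
def ex_cycles(instrs):
    """Cycle in which each instruction enters EX, with one bubble
    inserted for a load-use dependency (simplified model).
    Each instruction is (is_load, dest, sources)."""
    cycle, out, prev = 3, [], None     # the first instruction reaches EX in cycle 3
    for is_load, dest, sources in instrs:
        if prev is not None and prev[0] and prev[1] in sources:
            cycle += 1                 # load-use bubble: stall one cycle
        out.append(cycle)
        prev, cycle = (is_load, dest, sources), cycle + 1
    return out

# lw s0,0(s2) followed directly by add s4,s3,s0: one bubble
stalled = ex_cycles([(True, "s0", ["s2"]),
                     (False, "s4", ["s3", "s0"])])
# move the unrelated sub t4,t2,t3 in between: no bubble at all
reordered = ex_cycles([(True, "s0", ["s2"]),
                       (False, "t4", ["t2", "t3"]),
                       (False, "s4", ["s3", "s0"])])
```

In the stalled schedule the add enters EX a cycle late; in the reordered one every instruction enters EX back to back, which is exactly what the timeline above shows.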
