Lecture 16
Lecture 16
Lecture 16
Montek Singh
Mon, Oct 31, 2005 Topic: Memory Hierarchy Design (HP3 Ch. 5)
Outline
Motivation for Caches Principle of locality
Levels of Memory Hierarchy
Cache Organization
Cache Read/Write Policies Block replacement policies Write-back vs. write-through caches Write buffers
Processor
Input Control Memory Datapath
Output
Processor
Cache
DRAM
Address Space
2n
The Principle of Locality Program accesses a relatively small portion of the address space at any instant of time Example: 90% of time in 10% of the code Two different types of locality Temporal Locality (locality in time):
if an item is referenced, it will tend to be referenced again soon
Spatial Locality (locality in space):
Registers
Words
L1, L2, Cache Blocks
Memory
Pages Disk Files
user/operator Mbytes OS 4-64K bytes
Tape/Network
adjacent levels
processor
Block The smallest unit of information that can either be present or not present in the two-level hierarchy
To Processor Upper Level (Cache)
Blk X
From Processor
Blk Y 7
Hit Rate = fraction of memory access found in upper level Hit Time = time to access the upper level
level
Time to replace a block in the upper level from lower level + Time to deliver the block the processor
Cache Addressing
Set 0
Set j-1 Block 0 Sector 0 Byte 0 Block k-1 Replacement info Sector m-1 Byte n-1 Valid Tag Dirty Shared
Block/line is unit of allocation Sector/sub-block is unit of transfer and coherence Cache parameters j, k, m, n are integers, and generally
powers of 2
Cache Shapes
Direct-mapped (A = 1, S = 16)
2-way set-associative (A = 2, S = 8)
4-way set-associative (A = 4, S = 4)
8-way set-associative (A = 8, S = 2)
11
12
Cache Organization
Direct Mapped Cache Each memory location can only mapped to 1 cache location No need to make any decision :-)
Current item replaces previous item in that cache location
N-way Set Associative Cache Each memory location have a choice of N cache locations Fully Associative Cache Each memory location can be placed in ANY cache location Cache miss in a N-way Set Associative or Fully
Associative Cache
Bring in new block from memory Throw out a cache block to make room for the new block Need to decide which block to throw out!
13
cache miss
14
WRITE Write into cache only Write allocate with write and set dirty bit (so that back on replacement, block is written back to MM only if modified)
15
MISS
CPU detects miss, stalls Cache selects replacement block New block loaded from MM Requested word sent to CPU CPU resumes operation CPU detects miss CPU writes MM (cache also if write allocate) stalls until write completes
Write Through
WRITE
CPU writes cache CPU writes MM and stalls until write completes
HIT READ
CPU reads cache
MISS
CPU detects miss, stalls Cache selects replacement block New block loaded from MM Word sent to CPU CPU resumes operation CPU detects miss, stalls Cache selects replacement block Old block evicted from cache New block loaded from MM (write allocate) CPU resumes operation
16
WRITE
Write Back
:
Addr. Tag
:
Addr. Tag
Compare
SEL11
Mux
0 SEL0
Compare
: :
Byte 1
Byte 0
Byte 33 Byte 32
:
18
:
Entry 63
19
What!!! How can this be? Isnt memory too slow for this?
20
Write Buffer
Write Buffer: needed between cache and main mem Processor: writes data into the cache and the write buffer Memory controller: write contents of the buffer to memory Write buffer is just a FIFO Typical number of entries: 4 Works fine if store freq. (w.r.t. time) << 1 / DRAM write cycle Memory system designers nightmare Store frequency (w.r.t. time) > 1 / DRAM write cycle Write buffer saturation
21
Store buffer will overflow no matter how big you make it CPU Cycle Time << DRAM Write Cycle Time
Processor
Cache
L2 Cache
DRAM
22
Write Buffer