William Stallings
Computer Organization
and Architecture
10th Edition
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Chapter 4 Cache Memory
Table 4.1
Key Characteristics of Computer Memory Systems
Capacity
Memory is typically expressed in terms of bytes
Unit of transfer
For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module
[Figure: the memory hierarchy — outboard storage (magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM, Blu-Ray) and off-line storage (magnetic tape)]
[Figure: average access time (between T1 and T1 + T2) versus the fraction of accesses involving only Level 1 (hit ratio)]
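As a hedged aside (the standard two-level access-time model, not stated explicitly on the slide): with hit ratio H, level-1 access time T1, and level-2 access time T2, the average access time is
Ts = H × T1 + (1 − H) × (T1 + T2)
so the curve falls from T1 + T2 at H = 0 toward T1 as H approaches 1.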
Disk cache
A portion of main memory can be used as a buffer to temporarily hold data that is to be read out to disk
A few large transfers of data can be used instead of many small transfers of data (sketched below)
Data can be retrieved rapidly from the software cache rather than slowly from the disk
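A minimal sketch of that buffering idea, assuming a simple in-memory buffer; the buffer size, file name, and record format are illustrative assumptions, not from the slides:

```c
/* Hedged sketch: many small logical writes are collected in memory and
 * flushed to disk as a few large transfers (the disk-cache idea above). */
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096               /* assumed buffer size */

static char   buf[BUF_SIZE];
static size_t used = 0;

static void flush_to_disk(FILE *f)  /* one large physical transfer */
{
    if (used > 0) {
        fwrite(buf, 1, used, f);
        used = 0;
    }
}

static void cached_write(FILE *f, const char *data, size_t len)
{
    if (used + len > BUF_SIZE)      /* assumes len <= BUF_SIZE */
        flush_to_disk(f);
    memcpy(buf + used, data, len);  /* small write lands in memory only */
    used += len;
}

int main(void)
{
    FILE *f = fopen("out.dat", "wb");       /* illustrative file name */
    if (!f)
        return 1;
    for (int i = 0; i < 1000; i++)
        cached_write(f, "record\n", 7);     /* 1000 small logical writes */
    flush_to_disk(f);                       /* becomes a handful of large fwrite calls */
    fclose(f);
    return 0;
}
```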
[Figure: cache and main memory — one or more levels of cache between the processor and main memory, ranging from fastest (closest to the processor) to slow (main memory)]
[Figure: cache/main-memory structure — (a) cache: C lines with a block length of K words; (b) main memory: 2^n addressable words organized as M blocks]
[Figure: cache read operation — receive address RA from CPU; if the block containing RA is in the cache, deliver the RA word to the CPU; otherwise load the main memory block into a cache line and then deliver the word; DONE]
[Figure: typical cache organization — processor and cache connected through address and data buffers to the system bus (address, control, and data lines)]
Table 4.2
Elements of Cache Design
Cache Addresses
Virtual memory
Facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available
When used, the address fields of machine instructions contain virtual addresses
For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory
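A minimal sketch of that translation, assuming 4 KB pages and a tiny flat page table (the parameters and names are illustrative; a real MMU does this in hardware, typically with a TLB):

```c
/* Hedged sketch: virtual-to-physical translation through a flat page table. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12u                 /* 4 KB pages (assumption) */
#define NUM_PAGES 16u                 /* toy virtual address space: 16 pages */

/* page_table[vpn] = physical frame number for that virtual page */
static uint32_t page_table[NUM_PAGES] = { 7, 3, 12, 5 };

static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = (vaddr >> PAGE_BITS) % NUM_PAGES;   /* virtual page number */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);    /* offset within the page */
    uint32_t frame  = page_table[vpn];                    /* page-table lookup */
    return (frame << PAGE_BITS) | offset;                 /* physical address */
}

int main(void)
{
    uint32_t va = 0x00001234;          /* virtual page 1, offset 0x234 */
    printf("virtual 0x%08X -> physical 0x%08X\n",
           (unsigned)va, (unsigned)translate(va));
    return 0;
}
```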
[Figure: logical and physical cache — processor, cache, and main memory data paths for a cache addressed with virtual (logical) addresses versus physical addresses]
[Figure: mapping from main memory to cache — (a) direct mapping: the first m blocks of main memory (m equal to the size of the cache in lines) map one-to-one onto the m cache lines, each line holding a tag and a block (t = length of tag in bits, b = length of block in bits); (b) associative mapping: any one block of main memory can be loaded into any line of the cache]
[Figure: direct-mapping cache organization — the main memory address is split into an (s−r)-bit tag, an r-bit line number, and a w-bit word field; the tag stored in the selected line Li is compared with the address tag: 1 if match (hit in cache), 0 if no match (miss in cache)]
[Figure: direct mapping example — 16-MByte main memory and a 16K-line cache with 4-byte blocks; each cache entry holds an 8-bit tag and 32 bits of data (e.g., line 0001 holds tag 16 and data 11235813); memory address values are in binary, other values in hexadecimal]
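A minimal sketch of the address split implied by that example (24-bit address, 16K lines, 4-byte blocks, giving an 8-bit tag, 14-bit line number, and 2-bit word field); the cache structures themselves are omitted:

```c
/* Hedged sketch: direct-mapped address decomposition using the field
 * widths of the example above (8-bit tag, 14-bit line, 2-bit word). */
#include <stdint.h>
#include <stdio.h>

#define WORD_BITS 2u    /* 4 bytes per block  -> 2-bit word field  */
#define LINE_BITS 14u   /* 16K cache lines    -> 14-bit line field */

int main(void)
{
    uint32_t addr = 0x160004;  /* address 000101100000000000000100 from the figure */
    uint32_t word = addr & ((1u << WORD_BITS) - 1);
    uint32_t line = (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1);
    uint32_t tag  = addr >> (WORD_BITS + LINE_BITS);
    /* Prints tag=0x16 line=0x0001 word=0, matching the cache entry shown above. */
    printf("tag=0x%02X line=0x%04X word=%u\n",
           (unsigned)tag, (unsigned)line, (unsigned)word);
    /* A hit occurs when the selected line's stored tag equals 'tag'; a block
     * can live in exactly one line, so no search is needed. */
    return 0;
}
```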
[Figure: fully associative cache organization — the address supplies an s-bit tag and a w-bit word field; the tag is compared in parallel with the tag of every cache line: 1 if match (hit in cache), 0 if no match in any line (miss in cache)]
[Figure: associative mapping example — 16K-line cache and 16-MByte main memory; main memory address = 22-bit tag + 2-bit word; each cache entry holds a 22-bit tag and 32 bits of data (e.g., tag 058CE7 holds data FEDCBA98); memory address values are in binary, other values in hexadecimal]
[Figure: k-way set-associative mapping and cache organization — the cache is divided into v sets of k lines each (number of sets v = 2^d); the first v blocks of main memory (equal to the number of sets) map to successive sets; the address supplies an (s−d)-bit tag, a d-bit set number, and a w-bit word field, and the tag is compared with every line in the selected set: 0 if no match in any way (miss in cache)]
[Figure: two-way set-associative mapping example — 16-MByte main memory and a 16K-line cache organized as 8K sets of two lines; each entry holds a 9-bit tag and 32 bits of data (e.g., set 0001 holds tag 02C with data 11235813); memory address values are in binary, other values in hexadecimal]
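A minimal sketch of a lookup under the two-way split implied by that example (24-bit address: 9-bit tag, 13-bit set number, 2-bit word); the cache contents below are illustrative:

```c
/* Hedged sketch: two-way set-associative lookup with the field widths above. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WORD_BITS 2u
#define SET_BITS  13u                 /* 8K sets */
#define NUM_SETS  (1u << SET_BITS)
#define WAYS      2u

struct line { bool valid; uint32_t tag; uint32_t data; };
static struct line cache[NUM_SETS][WAYS];

/* Returns true on a hit and stores the cached word in *out. */
static bool lookup(uint32_t addr, uint32_t *out)
{
    uint32_t set = (addr >> WORD_BITS) & (NUM_SETS - 1);   /* select one set */
    uint32_t tag = addr >> (WORD_BITS + SET_BITS);
    for (unsigned way = 0; way < WAYS; way++) {            /* compare each way's tag */
        if (cache[set][way].valid && cache[set][way].tag == tag) {
            *out = cache[set][way].data;
            return true;                                   /* hit */
        }
    }
    return false;                                          /* miss */
}

int main(void)
{
    /* Mirror one entry from the example: set 0001, tag 02C, data 11235813. */
    cache[0x0001][0] = (struct line){ true, 0x02C, 0x11235813 };
    uint32_t word = 0;
    bool hit = lookup(0x160004, &word);
    printf("hit=%d word=0x%08X\n", (int)hit, (unsigned)word);
    return 0;
}
```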
[Figure: hit ratio (0.0 to 0.6) versus cache size from 1k to 1M bytes, with curves for direct-mapped and 2-way, 4-way, 8-way, and 16-way set-associative organizations]
Once the cache has been filled, when a new block is brought
into the cache, one of the existing blocks must be replaced
For direct mapping there is only one possible line for any
particular block and no choice is possible
First-in-first-out (FIFO)
Replace that block in the set that has been in the cache longest
Easily implemented as a round-robin or circular buffer technique
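A minimal sketch of that round-robin implementation, assuming an illustrative 4-way geometry (set count and names are not from the slides):

```c
/* Hedged sketch: FIFO replacement as a per-set round-robin (circular) pointer. */
#include <stdint.h>

#define NUM_SETS 1024u   /* assumed cache geometry */
#define WAYS     4u

static uint32_t fifo_next[NUM_SETS];    /* next way to evict in each set */

/* Pick the victim way for a miss in 'set'; FIFO order, implemented round-robin. */
static uint32_t pick_victim(uint32_t set)
{
    uint32_t way = fifo_next[set];
    fifo_next[set] = (way + 1) % WAYS;  /* advance the circular pointer */
    return way;
}

int main(void)
{
    /* Five misses to the same set evict ways 0, 1, 2, 3, then wrap back to 0. */
    for (int i = 0; i < 5; i++)
        (void)pick_victim(5);
    return 0;
}
```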
If at least one write operation has been performed on a word in that line of the cache, then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block
A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache - if a word is altered in one cache it could conceivably invalidate a word in other caches
Write back
Minimizes memory writes
Updates are made only in the cache (see the sketch after this list)
Portions of main memory are invalid and hence accesses by I/O
modules can be allowed only through the cache
This makes for complex circuitry and a potential bottleneck
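A minimal sketch of the write-back policy just described (structures and names are illustrative, not from the slides): writes update only the cache and set a dirty bit, and a dirty line is written back to main memory only when it is replaced.

```c
/* Hedged sketch: write-back policy with a dirty bit per line. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_WORDS 4u   /* assumed block size */

struct line {
    bool     valid;
    bool     dirty;                    /* set when the cached copy differs from memory */
    uint32_t tag;
    uint32_t data[BLOCK_WORDS];
};

/* A write hit updates only the cache; main memory is left stale. */
static void write_word(struct line *l, unsigned word, uint32_t value)
{
    l->data[word] = value;
    l->dirty = true;
}

/* On replacement, a dirty line is written back before the new block is loaded. */
static void replace_line(struct line *l, uint32_t new_tag, const uint32_t *new_block,
                         void (*write_back)(uint32_t tag, const uint32_t *block))
{
    if (l->valid && l->dirty)
        write_back(l->tag, l->data);   /* update main memory first */
    l->tag   = new_tag;
    l->dirty = false;
    l->valid = true;
    memcpy(l->data, new_block, sizeof l->data);
}

static void to_memory(uint32_t tag, const uint32_t *block)
{
    printf("writing back line with tag 0x%X (first word 0x%X)\n",
           (unsigned)tag, (unsigned)block[0]);
}

int main(void)
{
    struct line l = { .valid = true, .tag = 0x16 };
    uint32_t incoming[BLOCK_WORDS] = { 0 };
    write_word(&l, 0, 0x12345678);             /* hit: cache updated, memory now stale */
    replace_line(&l, 0xFF, incoming, to_memory); /* dirty line flushed before reload */
    return 0;
}
```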
The on-chip cache reduces the processor’s external bus activity, speeding up execution and increasing overall system performance
When the requested instruction or data is found in the on-chip cache, the bus
access is eliminated
On-chip cache accesses will complete appreciably faster than would even
zero-wait state bus cycles
During this period the bus is free to support other transfers
Two-level cache:
Internal cache designated as level 1 (L1)
External cache designated as level 2 (L2)
Potential savings due to the use of an L2 cache depends on the hit rates
in both the L1 and L2 caches
The use of multilevel caches complicates all of the design issues related
to caches, including size, replacement algorithm, and write policy
Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1
[Figure: total hit ratio (roughly 0.78 to 0.98) versus L2 cache size from 1k to 2M, with one curve for L1 = 8k and one for L1 = 16k]
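As a hedged note for reading this figure (a standard identity, not stated on the slide): if H1 is the L1 hit ratio and H2 is the fraction of L1 misses that hit in L2, the total (L1 and L2) hit ratio is
H = H1 + (1 − H1) × H2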