CL10 MemoryMgmt


Memory Management

Hardware Implementation
# More than one word is put in one cache block to
(a) exploit the temporal locality of reference in a program
(b) exploit the spatial locality of reference in a program
(c) reduce the miss penalty
(d) none of the above
Cache Performance Metrics
Miss Rate
• Fraction of memory references not found in cache (misses / accesses) = 1 – hit rate
• Typical numbers (in percentages):
– 3-10% for L1
– can be quite small (e.g., < 1%) for L2, depending on size, etc.

Hit Time
• Time to deliver a line in the cache to the processor
• Includes time to determine whether the line is in the cache
• Typical numbers:
– 1-2 clock cycles for L1
– 5-20 clock cycles for L2

Miss Penalty
• Additional time required because of a miss
• Typically 50-200 cycles for main memory (trend: increasing!)
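The three metrics above combine into the average memory access time (AMAT). A minimal sketch in Python (used here purely for illustration), with hypothetical values picked from the typical ranges above:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the hit time,
    and a fraction miss_rate additionally pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical L1 parameters: 2-cycle hit time, 5% miss rate,
# 100-cycle penalty to main memory.
assert amat(2, 0.05, 100) == 7.0
```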

9 15-213, S’08
Cache Misses
• Compulsory (cold start or process migration, first reference): first
access to a block
– "Cold" fact of life: not a whole lot you can do about it
– Note: if you are going to run billions of instructions, compulsory
misses are insignificant
• Capacity:
– Cache cannot contain all blocks accessed by the program
– Solution: increase cache size
• Conflict (collision):
– Multiple memory locations mapped to the same cache location
– Solution 1: increase cache size
– Solution 2: increase associativity
• Coherence (invalidation): other process (e.g., I/O) updates memory


Direct Mapped Cache
Taking advantage of spatial locality, we read 4 bytes at a time.
[Figure: a 16-block main memory (4 bytes per block) mapped onto a 4-line
direct-mapped cache. The address splits into TAG | line number | block offset.
The low-order bits of the block number (here the last 2 bits) give the cache
line number: e.g. block 5 (0101) has tag 01 and maps to line 01, and block 15
(1111) maps to line 11. The mapping is many-to-one.]
[Figure: the same idea with 6-bit block numbers. Memory blocks 0-3
(000000-000011, tag 0000) and blocks 48-51 (110000-110011, tag 1100, i.e.
decimal 12) map to cache lines 00-11 respectively, so block 0 and block 48
compete for line 0 and are told apart only by their tags.]
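The address split used in the direct-mapped example can be sketched in Python (illustrative only; the parameters match the 4-line, 4-byte-block figure):

```python
def split_address(addr, offset_bits, line_bits):
    """Split a byte address into (tag, line, offset) for a
    direct-mapped cache."""
    offset = addr & ((1 << offset_bits) - 1)
    line = (addr >> offset_bits) & ((1 << line_bits) - 1)
    tag = addr >> (offset_bits + line_bits)
    return tag, line, offset

# Figure's parameters: 4-byte blocks (2 offset bits), 4 lines (2 line bits).
# Byte 23 lies in memory block 5 (0101): tag 01, cache line 01, offset 11.
assert split_address(23, 2, 2) == (1, 1, 3)
# Byte 60 lies in block 15 (1111): tag 11, cache line 11, offset 00.
assert split_address(60, 2, 2) == (3, 3, 0)
```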
Problem: fill in the missing entries, assuming byte-addressable memory.

MM Size   Cache Size   Block Size   TAG bits   TAG Directory Size
128 KB    16 KB        256 B        3          3 × 2^6 = 192 bits   (6 line bits)
32 GB     32 KB        1 KB         20         20 × 2^5 = 640 bits  (5 line bits)
64 MB     512 KB       1 KB         7          7 × 2^9 bits         (9 line bits)
16 GB     16 MB        4 KB         10         10 × 2^12 bits       (12 line bits)
64 MB     64 KB        —            10         depends on block size
64 MB     512 KB       —            7          depends on block size

Worked example (address bits = tag + line + offset): row 1 has 128 KB = 2^17
bytes of memory (17 address bits), 256 B = 2^8 blocks (8 offset bits), and
16 KB / 256 B = 2^6 lines (6 line bits), so tag = 17 − 6 − 8 = 3 bits and the
directory holds 3 × 2^6 = 192 bits. In the last two rows the cache size is
fixed by the tag width alone (cache address bits = memory address bits − tag
bits), but the block size, and hence the directory size, is not determined.
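The table entries can be checked mechanically. A sketch in Python (illustrative; the function name is ours, not from the slides), applied to the first two rows:

```python
from math import log2

def direct_mapped_params(mm_bytes, cache_bytes, block_bytes):
    """Tag width and tag-directory size (in bits) for a direct-mapped,
    byte-addressable cache."""
    offset_bits = int(log2(block_bytes))
    lines = cache_bytes // block_bytes
    line_bits = int(log2(lines))
    addr_bits = int(log2(mm_bytes))
    tag_bits = addr_bits - line_bits - offset_bits
    return tag_bits, tag_bits * lines   # one tag per line

# Row 1: 128 KB memory, 16 KB cache, 256 B blocks -> 3 tag bits, 192 bits.
assert direct_mapped_params(128 * 1024, 16 * 1024, 256) == (3, 192)
# Row 2: 32 GB memory, 32 KB cache, 1 KB blocks -> 20 tag bits, 640 bits.
assert direct_mapped_params(32 * 2**30, 32 * 1024, 1024) == (20, 640)
```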
Direct Mapping pros & cons
• Simple & Fast
• Inexpensive
• Fixed location for a given block
— If a program repeatedly accesses 2 blocks that map to the same line,
the miss rate is very high


Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only one set
K-Way Set Associative Cache Organization
Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k × 2^d
• Size of tag = (s – d) bits
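The relations in the summary can be sketched as a small helper (Python for illustration; s, w, d, k follow the slide's notation, and the example configuration is hypothetical):

```python
def set_assoc_params(s, w, d, k):
    """Derived quantities: (s + w)-bit addresses, 2**w bytes per
    block/line, v = 2**d sets, k lines per set."""
    sets = 2 ** d                  # v
    lines = k * sets               # k * 2**d lines in the cache
    tag_bits = s - d               # block-number bits minus set bits
    cache_bytes = lines * 2 ** w   # total data capacity
    return sets, lines, tag_bits, cache_bytes

# Hypothetical: 24-bit addresses (s=22, w=2), 2-way, 8192 sets
# -> 16384 lines, 9 tag bits, 64 KB of data.
assert set_assoc_params(22, 2, 13, 2) == (8192, 16384, 9, 65536)
```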
Associative Caches

• Block 12 placed in 8 block cache:


– Fully associative, direct mapped, 2-way set associative
– S.A. mapping: set number = block number modulo number of sets
• Fully associative: block 12 can go anywhere
• Direct mapped: block 12 can go only into cache block 4 (12 mod 8)
• 2-way set associative: block 12 can go anywhere in set 0 (12 mod 4)
[Figure: an 8-line cache drawn three ways — as 8 individual lines (direct
mapped), as 4 sets of 2 lines (2-way set associative), and as one set of
8 lines (fully associative) — against a 32-block main memory.]
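The three placement policies can be captured in one function (an illustrative sketch; the function name is ours):

```python
def placements(block, num_lines, ways):
    """Cache lines where a memory block may reside, for a cache of
    num_lines lines organized ways-way set associative.
    ways=1 is direct mapped; ways=num_lines is fully associative."""
    sets = num_lines // ways
    s = block % sets                     # set number
    return list(range(s * ways, s * ways + ways))

# Block 12 in an 8-line cache, as in the example above:
assert placements(12, 8, 1) == [4]             # direct mapped: 12 mod 8
assert placements(12, 8, 2) == [0, 1]          # 2-way: set 0 (12 mod 4)
assert placements(12, 8, 8) == list(range(8))  # fully associative
```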
Disadvantages of Set Associative Cache
• N-way Set Associative Cache versus Direct Mapped Cache:
– N comparators vs. 1
– Extra MUX delay for the data
– Data comes AFTER Hit/Miss decision and set selection
• In a direct mapped cache, Cache Block is available BEFORE
Hit/Miss:

[Figure: datapath of a 2-way set associative cache. The cache index selects
one set; the two ways' stored tags are compared in parallel with the address
tag (Adr Tag), the two compare outputs are ORed to form Hit, and Sel0/Sel1
drive a mux that picks the cache block from the matching way — so the data
arrives only after the hit/miss decision.]
Replacement Algorithms

Direct mapping

• No choice
• Each block only maps to one line
• Replace that line
Associative & Set Associative

• Algorithm must be implemented in hardware (speed)


• Least Recently Used (LRU)
— e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
– For each slot, have an extra bit, USE. Set it to 1 when the slot is
accessed, and set all others to 0.
— For more than 2-way set associative, need a time stamp for each slot —
expensive
• First In First Out (FIFO)
— Replace the block that has been in the cache longest
— Easy to implement as a circular buffer
• Least Frequently Used (LFU)
— Replace the block which has had the fewest hits
— Needs a counter to sum the number of hits
• Random
— Almost as good as LFU and simple to implement
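LRU behaviour for one set can be sketched in a few lines (illustrative only; real hardware tracks recency with USE bits or timestamps as described above, not a list):

```python
class LRUSet:
    """One cache set with LRU replacement (software sketch)."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = []            # tags, least recently used first

    def access(self, tag):
        """Return True on hit, False on miss (loading the block)."""
        if tag in self.lines:      # hit: move to most-recently-used
            self.lines.remove(tag)
            self.lines.append(tag)
            return True
        if len(self.lines) == self.ways:
            self.lines.pop(0)      # evict the least recently used
        self.lines.append(tag)
        return False

s = LRUSet(2)
hits = [s.access(t) for t in [0, 1, 0, 2, 1]]
# 0 miss, 1 miss, 0 hit, 2 miss (evicts 1), 1 miss (evicts 0)
assert hits == [False, False, True, False, False]
```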
Write Policy
• Must not overwrite a cache block unless main memory is
up to date
• Multiple CPUs may have individual caches
• I/O may address main memory directly
Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep
local (to CPU) cache up to date
• Lots of traffic
• Slows down writes

• Remember bogus write through caches!


Write back

• Updates are initially made in the cache only
– The dirty bit is set when we write to the cache; this indicates the cache
is now inconsistent with main memory
• The dirty bit for a cache slot is cleared when the block is written back to
main memory (or when a fresh block is loaded)
• If a cache line is to be replaced and its dirty bit is set, write the
existing cache line to main memory before loading the new memory block
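The dirty-bit protocol above can be sketched for a single cache line (an illustrative model, not a hardware description; class and field names are ours):

```python
class WriteBackLine:
    """One cache line under a write-back policy with a dirty bit."""
    def __init__(self):
        self.tag, self.dirty = None, False
        self.writebacks = 0        # count of flushes to main memory

    def write(self, tag):
        self._load(tag)
        self.dirty = True          # cache is now newer than memory

    def read(self, tag):
        self._load(tag)

    def _load(self, tag):
        if self.tag != tag:        # replacement needed
            if self.dirty:
                self.writebacks += 1   # flush old contents first
                self.dirty = False
            self.tag = tag

line = WriteBackLine()
line.write(5)   # block 5 cached, dirty
line.read(9)    # replaces dirty block 5 -> one write-back
line.read(5)    # block 9 was clean -> no write-back
assert line.writebacks == 1
```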
Unified vs Split Caches
• One cache for data and instructions, or two: one for data and one for
instructions
• Advantages of unified cache
— Higher hit rate
– Balances the load of instruction and data fetches
– Only one cache to design & implement
• Advantages of split cache
— Eliminates cache contention between the instruction fetch/decode unit
and the execution unit
– Important in pipelining
A computer has a 256 KByte, 4-way set associative, write back data cache with a block size of 32 Bytes. The
processor sends 32-bit addresses to the cache controller. Each cache tag directory entry contains, in addition
to the address tag, 2 valid bits, 1 modified bit and 1 replacement bit. The number of bits in the tag field of an
address is

(A) 11 (B) 14 (C) 16 (D) 27

Answer: (C)

Que-2: Consider the data given in previous question. The size of the cache tag directory is
(A) 160 Kbits (B) 136 bits (C) 40 Kbits (D) 32 bits

Answer: (A)

Que-3: An 8KB direct-mapped write-back cache is organized as multiple blocks, each of size 32-bytes. The
processor generates 32-bit addresses. The cache controller maintains the tag information for each cache block
comprising the following: 1 valid bit, 1 modified bit, and as many bits as the minimum needed to identify the
memory block mapped in the cache. What is the total size of the memory needed at the cache controller to store
meta-data (tags) for the cache?

(A) 4864 bits (B) 6144 bits (C) 6656 bits (D) 5376 bits

Answer: (D)
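The three answers above can be verified by direct computation. A sketch in Python (illustrative; the helper name is ours):

```python
from math import log2

def sa_tag_bits(cache_bytes, block_bytes, ways, addr_bits):
    """Tag width for a set-associative cache (ways=1: direct mapped)."""
    sets = cache_bytes // block_bytes // ways
    return addr_bits - int(log2(sets)) - int(log2(block_bytes))

# Que-1: 256 KB, 4-way, 32 B blocks, 32-bit addresses -> 16 tag bits (C).
tag1 = sa_tag_bits(256 * 1024, 32, 4, 32)
# Que-2: 8192 lines x (16 tag + 2 valid + 1 modified + 1 replacement) bits
#        = 163840 bits = 160 Kbits (A).
dir2 = (256 * 1024 // 32) * (tag1 + 4)
# Que-3: 8 KB direct mapped, 32 B blocks -> 19 tag bits;
#        256 lines x (19 + 1 valid + 1 modified) = 5376 bits (D).
tag3 = sa_tag_bits(8 * 1024, 32, 1, 32)
meta3 = (8 * 1024 // 32) * (tag3 + 2)
```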
A cache memory unit with capacity of N words and block size of B words is to be designed. If it is designed as
direct mapped cache, the length of the TAG field is 10 bits. If the cache unit is now designed as a 16-way set-
associative cache, the length of the TAG field is ______ bits.

(A) 12
(B) 14
(C) 16
(D) 18

Answer: (B)

Explanation: For the direct-mapped design, the tag is 10 bits. Making the cache 16-way set associative merges
2^4 = 16 lines into each set, removing 4 set-index bits from the address; those bits move into the tag, so the
new tag field is 10 + log2 16 = 14 bits. In general, a k-way set-associative design needs log2 k more tag bits
than the direct-mapped design of the same capacity and block size.

The size of the physical address space of a processor is 2^P bytes. The word length is 2^W bytes. The
capacity of cache memory is 2^N bytes. The size of each cache block is 2^M words. For a K-way set-
associative cache memory, the length (in number of bits) of the tag field is

(A) P − N − log2 K (B) P − N + log2 K (C) P − N − M − W − log2 K (D) P − N − M − W + log2 K

Ans (B)
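Answer (B) can be checked against a first-principles computation on a concrete configuration (the instance below is hypothetical; function names are ours):

```python
from math import log2

def tag_direct(P, W, N, M, K):
    """Tag width from first principles: address bits minus set-index
    bits minus block-offset bits (block offset = M + W byte bits)."""
    block_bytes = 2 ** (M + W)
    sets = (2 ** N // block_bytes) // K
    return P - int(log2(sets)) - (M + W)

def tag_formula(P, W, N, M, K):
    return P - N + int(log2(K))    # option (B)

# Hypothetical instance: 32-bit addresses, 4-byte words (W=2),
# 32 KB cache (N=15), 4-word blocks (M=2), 4-way (K=4).
assert tag_direct(32, 2, 15, 2, 4) == tag_formula(32, 2, 15, 2, 4) == 19
```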
Q. Consider a 2-way set associative cache with 256 blocks that uses LRU replacement. Initially the cache is
empty. Conflict misses are those misses which occur due to contention of multiple blocks for the same cache set.
Compulsory misses occur due to first-time access to a block. The following sequence of accesses to memory
blocks
(0, 128, 256, 128, 0, 128, 256, 128, 1, 129, 257, 129, 1, 129, 257, 129)
is repeated 10 times. The number of conflict misses experienced by the cache is ____________ .
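This can be answered by simulating the cache — a sketch (names ours) of a 2-way LRU cache with 128 sets, classifying each miss as compulsory (first touch) or conflict, per the question's definitions:

```python
def count_misses(trace, num_sets=128, ways=2):
    """Simulate a ways-way LRU cache of num_sets sets over a block trace;
    return (compulsory, conflict) miss counts."""
    sets = [[] for _ in range(num_sets)]   # per set: LRU-first tag list
    seen = set()
    compulsory = conflict = 0
    for b in trace:
        s = sets[b % num_sets]
        if b in s:                 # hit: refresh recency
            s.remove(b)
            s.append(b)
            continue
        if b in seen:              # missed a block seen before
            conflict += 1
        else:                      # first-ever access
            compulsory += 1
            seen.add(b)
        if len(s) == ways:
            s.pop(0)               # evict least recently used
        s.append(b)
    return compulsory, conflict

trace = [0, 128, 256, 128, 0, 128, 256, 128,
         1, 129, 257, 129, 1, 129, 257, 129] * 10
comp, conf = count_misses(trace)
# 6 compulsory misses (one per distinct block) and 76 conflict misses
```

Blocks 0, 128, 256 all contend for set 0 and 1, 129, 257 for set 1, so the two sets behave identically; the simulation reports 76 conflict misses in total.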

Q. If a 16-way set associative cache is made up of 64-bit words, 16 words per line and 8192 sets, how big is the
cache in megabytes?

Ans: 16 MB (8192 sets × 16 lines/set × 16 words/line × 8 bytes/word = 2^24 bytes)

Q. In a two-level cache system, the access times of the L1 and L2 caches are 1 and 8 clock cycles, respectively. The
miss penalty from the L2 cache to main memory is 18 clock cycles. The miss rate of the L1 cache is twice that of L2.
The average memory access time (AMAT) of this cache system is 2 cycles. The miss rates
of L1 and L2 respectively are:

Ans: With AMAT = 1 + m1 × (8 + m2 × 18) = 2 and m1 = 2 × m2, we get 36 m2² + 16 m2 − 1 = 0,
so m2 = 1/18 ≈ 0.056 and m1 = 1/9 ≈ 0.111.
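The miss rates can be found numerically — a sketch assuming the standard two-level model AMAT = t_L1 + m1 · (t_L2 + m2 · t_mem):

```python
from math import sqrt

# AMAT = 1 + m1*(8 + 18*m2) = 2 with m1 = 2*m2
# => 2*m2*(8 + 18*m2) = 1  =>  36*m2**2 + 16*m2 - 1 = 0
a, b, c = 36.0, 16.0, -1.0
m2 = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)   # positive root: 1/18
m1 = 2 * m2                                     # 1/9

amat = 1 + m1 * (8 + 18 * m2)                   # recovers 2 cycles
```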
Q. Is there any difference in cache behaviour between the two programs below? Assume the cache has a block size of 4 words, each word being 4 bytes.

Program 1:

int sum_array(int a[4][4]) {
    int i, j, sum = 0;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            sum += a[i][j];
    return sum;
}

Program 2:

int sum_array(int a[4][4]) {
    int i, j, sum = 0;
    for (j = 0; j < 4; j++)
        for (i = 0; i < 4; i++)
            sum += a[i][j];
    return sum;
}

Program 1 walks the array in row-major order (the order rows are laid out in memory), so consecutive accesses fall in the same block; Program 2 walks column by column, jumping a full row between accesses.
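A tiny simulation makes the difference concrete — a sketch assuming a minimal cache holding a single 16-byte line (this worst case isolates the effect of spatial locality; with a cache large enough to hold the whole 64-byte array, both programs would miss equally):

```python
def misses(order, words_per_block=4):
    """Count misses when traversing a 4x4 int array (1 word per element)
    through a one-line cache."""
    cached_block, count = None, 0
    for i, j in order:
        block = (i * 4 + j) // words_per_block   # flat word index -> block
        if block != cached_block:
            count += 1
            cached_block = block
    return count

row_major = [(i, j) for i in range(4) for j in range(4)]   # Program 1's order
col_major = [(i, j) for j in range(4) for i in range(4)]   # Program 2's order
assert misses(row_major) == 4    # one miss per 4-word block
assert misses(col_major) == 16   # every access touches a different block
```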
Q. Consider a cache memory of size 16 KB with a cache block size of 16
bytes. The processor generates 32-bit physical addresses. Assume
the cache is fully associative. What are the TAG and index bits?
__________
(A) 28 and 4 bits (B) 28 and 0 bits (C) 24 and 4 bits (D) 24 and 0 bits

Answer: (B). A fully associative cache has no index field; the 4 block-offset
bits leave 32 − 4 = 28 tag bits.

GATE 2019
