revision1
revision1
revision1
Problems
April 27, 2023
1. Suppose we have a memory and a direct-mapped cache with the following characteristics.
● Memory is byte addressable
● Memory addresses are 16 bits (i.e., the total memory size is 2 16 = 65536 bytes)
● The cache has 8 rows (i.e., 8 cache lines)
● Each cache row (line) holds 16 bytes of data
How 16 address bits will be allocated to the offset, index, and tag parts of the address used to reference the cache?
1. Suppose we have a memory and a direct-mapped cache with the following characteristics.
● Memory is byte addressable
● Memory addresses are 16 bits (i.e., the total memory size is 2^16 = 65536 bytes)
● The cache has 8 rows (i.e., 8 cache lines)
● Each cache row (line) holds 16 bytes of data
How 16 address bits will be allocated to the offset, index, and tag parts of the address used to reference the cache?
4 bits offset
3 bit index
9 bit tag
2. Below is a sequence of four binary memory addresses with 16 address bits in the order they are used to reference memory. Assume
that the cache is initially empty. For each reference, write down the tag and index bits and indicate whether that reference is a hit or a
miss.
0010 1101 1011 0011
0000 0110 1111 1100
0010 1101 1011 1000
1010 1010 1010 1011
2. Below is a sequence of four binary memory addresses with 16 address bits in the order they are used to reference memory. Assume
that the cache is initially empty. For each reference, write down the tag and index bits and indicate whether that reference is a hit or a
miss.
0010 1101 1011 0011
0000 0110 1111 1100
0010 1101 1011 1000
1010 1010 1010 1011
Main memory (RAM) : 1sec 100ms 10ms 1ms 100µs 10µs 1µs 100ns 10ns 1ns
Hard disk drive: 1sec 100ms 10ms 1ms 100µs 10µs 1µs 100ns 10ns 1ns
CPU register: 1sec 100ms 10ms 1ms 100µs 10µs 1µs 100ns 10ns 1ns
3. Different storage devices have different access times and costs. For each of the following devices, circle the access time that is
closest to the typical time needed to read information from that device given current hardware technologies. The important thing is to
get the order of magnitude right; the exact number is not the issue. (ms = millisecond, µs = microsecond, ns = nanosecond). If it
matters, assume that these numbers are for a computer with a 1GHz clock (i.e., a low-end current machine)
Main memory (RAM) : 1sec 100ms 10ms 1ms 100µs 10µs 1µs 100ns 10ns 1ns
Hard disk drive: 1sec 100ms 10ms 1ms 100µs 10µs 1µs 100ns 10ns 1ns
CPU register: 1sec 100ms 10ms 1ms 100µs 10µs 1µs 100ns 10ns 1ns
4. Two of the design choices in a cache are the row size (number of bytes per row or line) and whether each row is organized as a
single block of data (direct mapped cache) or as more than one block (2-way or 4-way set associative). The goal of a cache is to
reduce overall memory access time. Suppose that we are designing a cache and we have a choice between a direct-mapped cache
where each row has a single 64-byte block of data, or a 2-way set associative cache where each row has two 32-byte blocks of data.
Which one would you choose and why? Give a brief technical justification for your answer. If the choice would make no difference in
performance then explain why not.
4. Two of the design choices in a cache are the row size (number of bytes per row or line) and whether each row is organized as a
single block of data (direct mapped cache) or as more than one block (2-way or 4-way set associative). The goal of a cache is to
reduce overall memory access time. Suppose that we are designing a cache and we have a choice between a direct-mapped cache
where each row has a single 64-byte block of data, or a 2-way set associative cache where each row has two 32-byte blocks of data.
Which one would you choose and why? Give a brief technical justification for your answer. If the choice would make no difference in
performance then explain why not.
Although the TLB and the cache both hold copies of main memory data, they are used in different parts of the memory subsystem, and
accesses to them have different characteristics.
a. TLB and the cache are used in different parts of a memory access. A TLB uses the virtual address to locate data, while a cache
normally uses the physical (translated) address as the index and tag.
b. Each entry in the TLB contains a pair of addresses: a virtual page number and a page frame address. Because individual TLB entries
translate a large number of virtual addresses (a full page each), relatively few of them are active at any time. So a TLB tends to be
fairly small, and the hardware to look up a virtual page number is fully associative.
The regular cache, on the other hand, has much more data per row to exploit locality of instruction and data memory
references. It also tends to be much larger than a TLB to get a reasonably large hit rate, so it is not practical to use fully associative
addressing. Instead, most caches tend to be 2-way or 4-way set associative.
c. A TLB miss and a cache miss are also handled in quite different ways. A TLB miss requires walking the page table; a cache miss
does not use a secondary data structure.
Because of these differences it makes sense to use specialized hardware optimized for each function rather than having a single “super
cache”.
6. True or false: By having the cache act as a bridge between main memory and the CPU, the time it takes to retrieve a block of data
from main memory is longer than it would be if the cache did not exist.
6. True or false: By having the cache act as a bridge between main memory and the CPU, the time it takes to retrieve a block of data
from main memory is longer than it would be if the cache did not exist.
(a) 8.4 nS (b) 33 nS (c) 24.6 nS (d) 27.0 nS (e) 2.4 nS (f) 3.0 nS
7. Assume a memory access to main memory on a cache "miss" takes 30 ns and a memory access to the cache on a cache "hit" takes
3 ns. If 80% of the processor's memory requests result in a cache "hit", what is the average memory access time?
(a) 8.4 nS (b) 33 nS (c) 24.6 nS (d) 27.0 nS (e) 2.4 nS (f) 3.0 nS
Since we know that not all of the memory accesses are going to the cache, the average access time must be greater than the cache
access time of 3 ns. This eliminates answers 'e' and 'f'. Similarly, not all accesses are going to the main memory, so the average must
be less than 30 ns eliminating 'b' as an answer. For the exact answer, you need to see that 80% of the time, the access will be 3 ns
while the rest of the time (20%) the access time will be 30 ns. You should be able to see then that the average is going to be closer to 3
ns than 30 ns meaning the answer is a.
First, convert 7121C5 to binary. 7121C5 16 = 0111 0001 0010 0001 1100 0101
The last two bits, 01, identify the word position within the block. 01 means that it is in the second column of data.
The next eight bits, 01110001, should identify the set. 01110001 identifies the set consisting of the third and fourth rows
from the bottom.
The fourth row from the bottom has the matching tag of 0111000100100.
Therefore, the data is in the second column of the fourth row from the bottom, i.e., A5.
b. How many lines are contained in this cache?
(i) 8 (ii) 256 (iii) 512 (iv) 1024 (v) 16K (vi) Cannot be determined
b. How many lines are contained in this cache?
(i) 8 (ii) 256 (iii) 512 (iv) 1024 (v) 16K (vi) Cannot be determined
c. How many blocks are contained in the memory space (not the cache, but the memory) of the cache system defined above?
(i) 2^24 (ii) 2^22 (iii) 2^14 (iv) 2^8 (iv) 2^2 (v) Cannot be determined
c. How many blocks are contained in the memory space (not the cache, but the memory) of the cache system defined above?
(i) 2^24 (ii) 2^22 (iii) 2^14 (iv) 2^8 (iv) 2^2 (v) Cannot be determined
9. Which is the fastest cache mapping function?
(a) Direct mapping (b) Set associative mapping (c) Fully associative mapping
10. Which cache mapping function is least likely to thrash, i.e., it has the lowest chance of two blocks contending with each other to be
stored in the same line?
(a) Direct mapping (b) Set associative mapping (c) Fully associative mapping
11. Which cache mapping function does not require a replacement algorithm?
(a) Direct mapping (b) Set associative mapping (c) Fully associative mapping
12. True or false: The number of lines contained in a set associative cache can be calculated from the number of bits in the memory
address, the number of bits assigned to the tag, the number of bits assigned to the word id (identifying the number of words per block),
and the number of bits assigned to the set id (identifying the number of sets.)
14. Which cache write mechanism allows an updated memory location in the cache to remain out of date in memory until the block
containing the updated memory location is replaced in the cache?
(a) Write through (b) Write back (c) Both (d) Neither
15. Which cache write mechanism best supports bus watching for use with multi-processor systems?
(a) Write through (b) Write back (c) does not make a difference
9. Which is the fastest cache mapping function?
(a) Direct mapping (b) Set associative mapping (c) Fully associative mapping
10. Which cache mapping function is least likely to thrash, i.e., it has the lowest chance of two blocks contending with each other to be
stored in the same line?
(a) Direct mapping (b) Set associative mapping (c) Fully associative mapping
11. Which cache mapping function does not require a replacement algorithm?
(a) Direct mapping (b) Set associative mapping (c) Fully associative mapping
12. True or false: The number of lines contained in a set associative cache can be calculated from the number of bits in the memory
address, the number of bits assigned to the tag, the number of bits assigned to the word id (identifying the number of words per block),
and the number of bits assigned to the set id (identifying the number of sets.)
14. Which cache write mechanism allows an updated memory location in the cache to remain out of date in memory until the block
containing the updated memory location is replaced in the cache?
(a) Write through (b) Write back (c) Both (d) Neither
15. Which cache write mechanism best supports bus watching for use with multi-processor systems?
(a) Write through (b) Write back (c) does not make a difference
16. Suppose a computer has a 4-way set associative cache with one-word blocks. It has a capacity of 256 bytes. Given the sequence
of byte addresses 8, 64, 96, 128, 64, 96, 256, 192, 24 show the final cache contents and state the number of hits and misses.
16. Suppose a computer has a 4-way set associative cache with one-word blocks. It has a capacity of 256 bytes. Given the sequence
of byte addresses 8, 64, 96, 128, 64, 96, 256, 192, 24 show the final cache contents and state the number of hits and misses.
There are 256/(4*4) = 16 sets. The sequence generates the following table. Empty sets are not shown. The currently accessed cache
block is in boldface. The notation m[i] means the word located at memory address i.
17. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss rate is 12% and the data miss
rate is 6%, and that on average, 30% of all instructions contain one data reference. The miss penalty for the cache is 10 cycles. What
is the total CPI?
17. A machine has a base CPI of 2 clock cycles. Measurements obtained show that the instruction miss rate is 12% and the data miss
rate is 6%, and that on average, 30% of all instructions contain one data reference. The miss penalty for the cache is 10 cycles. What
is the total CPI?
= 3.38
18. A machine has a 500MHz system clock. Memory takes 30 ns to access a word. How many clock cycles is this?
18. A machine has a 500MHz system clock. Memory takes 30 ns to access a word. How many clock cycles is this?
The clock rate of 250 MHz implies a clock cycle of length 4 ns (because 1/(250*10^6) = 4 *
10^-9.
Therefore the memory access time is 25 cycles and the secondary cache access time is 2.5
cycles.
The effective CPI = 1.5 + cycles missed by primary but caught by secondary + cycles missed by
both caches
= 1.5 + (0.05 - 0.01)*2.5 + 0.01*( 2.5 + 25 )
= 1.5 + 0.1 + 0.275
= 1.875
20. Draw a picture showing the organization of a direct-mapped cache with 16 words per block, with a capacity of 128KB. Show any
multiplexers, gates, needed. Show how a 32-bit physical address is mapped to a cache block.
Cache capacity is 128KB. Each block has 16*4 = 64 bytes. There are 128K/64 = 2K = 2048 blocks. The bytes offset is 2 bits, the word
offset is 4 bits (16 = 2^4) and the index is 11 bits since 2K = 2^11. That leaves 15 bits for the tag.