Computer Architecture Questions
Computer Architecture Questions
Computer Architecture Questions
These questions were collected from previous exams and tests, so you will find a
new set of processor specifications inserted at various locations: the questions
following use those processor specifications. You will also find some essentially
identical questions! Re-use is a well-established software engineering
principle: we use it for exam questions too!
Except where otherwise indicated, use the following operating system and processor
characteristics in all questions.
Your operating system uses 8 kbyte pages. The machine you are using has a 4-way
set associative 32 kbyte unified L1 cache and a 64 entry fully associative TLB.
Cache lines contain 32 bytes. Integer registers are 32 bits wide. Physical addresses
are also 32 bits. It supports virtual addresses of 46 bits. 1 Gbyte of main memory is
installed.
kkk) What fields would you expect to find in a page table entry? Add a short phrase
indicating the purpose of each field.
lll) Why does a system interface unit provide separate queues for read and write
transactions?
mmm) Why does a read transaction check the write queue in a system interface unit?
nnn) If a processor with a simple branch predictor sees a conditional branch, under
what circumstances can it make a prediction which has a high probability of being
successful? Why?
ooo) Why is the branch processing unit placed as early in the pipeline as possible?
ppp) Irrespective of the number and power of the individual processing elements in a
parallel processor, what factor primarily determines whether the processor will be
efficient?
qqq) Under what conditions would you expect to achieve 100% TLB hits? (Two
answers required. For one, you are expected to do a calculation and give an
approximate numeric answer!)
rrr) Why are caches built with long (ie more than 8 byte) lines? (Two reasons needed.)
sss) A superscalar processor has 6 functional units. What determines the maximum
number of instructions that this processor can start every cycle?
ttt) List the functions of the instruction issue unit of a superscalar processor. No
marks for issue instructions (somewhat obvious!)- list the other functions that
the IIU performs.
uuu) You are trying to estimate the performance on your application of a superscalar
processor with 8 functional units and a clock speed of 2GHz. Its a conventional
RISC machine. You decide to start by working out how many instructions the
processor can complete every second. The application is a commercial one with
no floating point operations. What question do you need to ask before you can
make this estimate? (Alternatively: what piece of information do you need to find
in the processors data sheets?)
vvv) Have dataflow architectures disappeared in the way of the dinosaurs? Explain
your answer.
www) Describe a situation in which it is beneficial for an OS to share pages between
different processes or users.
xxx) In a multiprocessor based on a common bus, why must the bus support a READ-
MODIFY-WRITE transaction?
yyy) What is the purpose of the snooping circuitry in a high performance processor.
zzz) The number of high performance processors that can be placed on a single bus
when constructing a parallel processor is limited. How many processors would
you expect to be able to support on a single bus? (You are not expected to give a
precise answer: an approximate one appropriate to current technology will
suffice.)
aaaa) Give as many reasons as you can for this limitation.
bbbb) What is an SIMD processor? Explain the acronym and describe the basic
characteristics of an SIMD processor.
cccc) Give one example of a problem that can be solved effectively with an SIMD
architecture processor.
dddd) Some modern high performance processors have capabilities of an SIMD
machine. Explain this statement.
eeee) Sizes of components in computer architectures are usually powers of 2.
Which of the following must be a power or 2?
Answer yes if it would require extraordinary effort on behalf of a compiler or
operating system or a large amount of additional circuitry to manage a value other
than a power of 2.
(i) The maximum number of bytes in a physical address space YES / NO
(ii) The actual amount of memory installed in a computer YES / NO
(iii) The number of lines in a direct mapped cache YES / NO
(iv) The number of lines in a fully associative cache YES / NO
(v) The number of ways in a set associative cache YES / NO
(vi) The number of entries in a fully associative TLB YES / NO
(vii) The number of bytes in a page YES / NO
(viii) The number of functional units in a superscalar machine YES / NO
(ix) The number of general purpose registers YES / NO
(x) The number of stack entries in a stack machine YES / NO
(xi) The number of stages in a pipeline YES/ NO
(xii) The maximum number of primary op-codes (distinct instructions)
in an architecture YES / NO
(xiii) The number of bus clocks for transferring data in the data phase of a bus
transaction YES / NO
ffff) Describe a scenario in which you would prefer a write-back cache to a write-
through one. Explain why a write-back cache should perform better.
You can describe the scenario in English, pseudo-code, actual code or any other
way that would describe it clearly and unambiguously.
gggg) Describe a scenario in which you would prefer a write-through cache to a write-
back one. Explain why a write-through cache should perform better.
hhhh) Give one advantage and one disadvantage of a direct mapped cache.
Except where otherwise stated, assume that all caches in the following questions are 4-
way set-associative, have lines of 64 bytes and a total capacity of 64kbytes.
iiii) The system bus is 64 bits wide. How many bus clock cycles are used to transfer
data for the most common bus transaction?
jjjj) The system has a split address and data bus. List the overhead bus cycles needed
for the common bus transaction. An overhead cycle is one which transfers no
data. A simple name implying a function for each cycle will suffice. The list is
started for you.
Address Bus Request
kkkk) The cache is write-through. The machine emits 64-bit addresses. One bit is used
for an LRU algorithm. What is the total number of bits in the cache? Count all
overhead bits.
Since you do not have a calculator, show your working leading to a numeric
expression which would, if fed into a calculator, give the final answer.
An OS uses 8kbyte pages. Its running on a system with a 44-bit virtual address
space. A page table entry requires 4 bytes. Physical addresses are 32 bits.
How much space is needed for the page table for each user?
pppp) If the page tables are inverted and the system can handle 256 simultaneous
processes, how much space is needed for page tables?
qqqq) Show how the address emitted by a program running on the system in Q4 is
translated into a physical address.
rrrr) Discuss the benefits (if any) of having separate TLBs for instructions and data.
ssss) A system with a 64kb cache exhibits a hit rate of 95% on a benchmark program.
A cache access time is 1.8 cycles, so that pipeline is stalled for 1 cycle.
Increasing the cache size to 128kb increases the hit rate to 98% and the cache
access time to 2.0 cycles. The access time for a main memory access is 15 cycles.
Is increasing the cache size a good idea?
tttt) Your processors L1 cache contains 16kB of data; it is organized as an 8-way set
associative cache with lines of 64 bytes each. The processor has a 64 bit data bus.
a. How many data cycles would you expect in the most common bus
transaction?
b. How many sets does this cache contain?
c. You are designing a program to process matrices: what situation would
you look out for? Be precise supply a number in your answer!
d. How many comparators are required?
e. How many bits will be in each tag?
f. How many tags will this cache hold?
g. For an image processing program that works its way sequentially through
2Mbyte monochrome images (each pixel is one byte), what would you
expect the hit rate to be?
h. If the image is stored in row-major order and a program processes the
image column-by-column, what would you expect the hit rate to be?
i. For an engineering program that processes streams of double precision
floats that have been captured on disc, what would you expect the hit rate
to be?
uuuu) List the advantages of separate instruction and data caches.
vvvv) The OS manages pages of 8kB. The TLB has 128 entries.
a. What is the coverage of this TLB?
b. Your program needs to multiply matrices which contain 100x100 doubles.
How would you expect the TLB to perform?
Would there be any advantage to padding the matrices out to 128x128
elements? (Dont forget the cache!)
wwww) Why does a typical branch predictor count the number of times that it
predicted the branch direction successfully?
xxxx) Most elements of a typical processor are replicated 2k times where k is an integer.
Which of the following need to be 2k in size? Interpret need here to mean that
either that a considerable amount of extra circuitry would be required or that
software would become considerably more complicated.
a. Maximum physical memory supported by a processor.
b. Actual amount of physical memory installed in a processor
c. Number of lines in a
i. direct mapped cache
ii. fully associative cache
iii. set-associative cache
d. Number of entries in a TLB
e. Size of a page
f. Number of entries in a page table
g. Number of data phases in the most common bus transaction
yyyy) On a system with a 128Kbyte L1 cache for a program with a working data set of
2Mbytes:
a. Calculate the expected cache hit rate when no assumptions can be made
about data access patterns.
b. Would you expect the actual hit rate to be better or worse than this? Why?
Better this assumes perfectly random access to everything. Loop variables, constants,
etc are likely to have much better hit rates.
zzzz) Your system has a 4-way set associative cache with 4 32-bit words per cache line.
a. If the total cache size is 64kbytes, how many sets are there?
b. A physical address on this system has 32-bits. How much main memory
can it accommodate?
c. What is the total number of bits are needed for the cache? Its a write-
back cache.
d. How many comparators does this cache require?
e. How are the real addresses of lines in the same set related?
f. If this cache was fully associative,
i. How many comparators would be needed?
ii. How many overhead bits would be required?
iii. What would be the advantage (if any) obtained from the extra
resources.
aaaaa) Your OS uses a page size of 8kbytes.
a. What is the coverage of a 64 entry TLB?
b. What will the TLB hit rate be for a 2Mb working data set program
(making no assumptions about access patterns)?
c. If the program sweeps through the data accessing 64-bit doubles in
sequential order, what will the TLB hit rate be?
bbbbb) An OS uses 8kbyte pages. Its running on a system with a 44-bit virtual address
space. A page table entry requires 4 bytes. Physical addresses are 32 bits. How
much space is needed for the page table for each user?
ccccc) If the page tables are inverted and the system can handle 256 simultaneous
processes, how much space is needed for page tables?
ddddd) Show how the address emitted by a program running on the system in Q4 is
translated into a physical address.
eeeee) A system with a 64kb cache exhibits a hit rate of 95% on a benchmark program.
A cache access time is 1.8 cycles, so that pipeline is stalled for 1 cycle.
Increasing the cache size to 128kb increases the hit rate to 98% and the cache
access time to 2.0 cycles. The access time for a main memory access is 15 cycles.
Is increasing the cache size a good idea?
fffff) What benefit would you expect from a fully associative cache (compared to other
cache organizations)?
Parallel Processing
1.1 Superscalar procesors
1. Draw a diagram showing how the instruction fetch and execution units of a
superscalar processor are connected. Show the widths of the datapath (in words - not
bits; your diagram should be relevant to a 32-bit or 64-bit processor). Which factor
primarily determines performance: the instruction issue width (number of instructions
issued per cycle) or the number of functional units?
2. List the capabilities of the instruction fetch/despatch unit needed to make an effective
superscalar processor.
1.2 Branch Prediction
3. Why does branch prediction speed up a processor?
Two reasons one to do with the effect of branches on performance, the other to do
with the likelihood that prediction is possible.
4. If you only had a few transistors to implement a branch prediction system, what
would you do? Why would it be effective?
5. In addition to the answer that you almost certainly gave for the previous question,
describe a scenario where you would expect branch prediction to be successful.
6. Describe the status bits in a branch target buffer.
7. Does it make sense to have both branch prediction and speculative execution in the
same processor? Explain your answer.
Atomic Instructions
2 Is an atomic instruction, such as a test-and-set instruction necessary for (a) a single
processor running a multi-threaded OS and (b) a shared memory parallel processor.
In each case, explain your answer.
8. A computer bus must support READ and WRITE commands. List some other
commands that it must support. Consider also the situation when the processors have
snooping caches.
2.1 Programming Model
9. What does the Shared Memory programming model imply?
10. Distinguish between a Uniform Memory Access system and a Non-Uniform
Memory Access system.
11. When using a shared memory, cache coherence transactions are expensive and could
potentially clog up a bus so much that the bandwidth available to useful transactions
becomes very low. A hacker (whos spent his time reading the manual for his
processor on the net instead of going to SE363 lectures) discovers that theres an
instruction that will disable the cache and decides that this will solve the problem.
Why is this likely to be a bad idea?
12. (State transition diagram for the MESI protocol inserted here.)
13. Explain the significance of each state in a MESI protocol.
14. Using the diagram, describe a scenario that would lead to the transition
in the diagram.
2.2 Other Parallel Processors
15. Why does a VLIW machine need a good optimizing compiler?
16. Where can you find a small dataflow machine in every high performance processor?