CMSC414 Practice
CMSC414 Practice
CMSC414 Practice
Email:
rtFp
Instructions:
• The Goal: Show me what you know.
• The Method: For each section, read the directions carefully.
• The Plan: If you don’t know what to do, skip to another problem.
• The Warning: There is about a test-and-a-half of work here–don’t be intimidated.
• The Requirement: Explain your answer clearly and with appropriate assumptions.
• The Reward: Generous partial credit for correct and clear partial answers.
• The Penalty: No partial credit for content-free answer. Nothing relevant or correct, no credit. So don’t waste
the effort.
• The Expected Outcome: You never have to take me again, unless you really want to do so.
1
Scratch Paper–Just in Case
2
Problem 1: Now, It’s Marsupials? (30 pts)
Welcome to the land of uncertainty. Circle either True or False as appropriate for the statement below. If
you feel the need to write an explanation, don’t–it won’t be graded; so, just circle False instead.
True False [1.1 To be meaningful, benchmarks must specify a suite of programs to be run; a set of
compiler parameter requirements; and a set of hardware specifications for which expected results are
available from a trusted source.
True False [1.2 Decreasing the average IC for the CPU will never decrease the execution times of
programs run on that CPU.
True False 1.3 In the context of conditional branches, the address of the “taken” instruction is in the
PC (program counter) after the branch instruction is fetched.
True False [1.4 Executing the MIPS command LH R0, 8(R2) will never cause a byte alignment error,
regardless of the value in register R2.
True False [1.5 The floating point pipe-like-pipe (FPLP) increased the amount of ILP available by
using pipelined functional units where possible.
True False [1.6 In the superscalar Tomasulo algorithm, instructions start executing in compiler order.
True False [1.7 In Tomasulo’s algorithm, instructions leave the reservation stations and enter the
functional units in the order that they were issued.
True False [1.8 In a superscalar machine (also called multiple-issue), any two instructions may be
issued during the same clock cycle.
True False [1.9 Tomasulo’s algorithm prevents all data hazards using register renaming.
True False [1.10 Old sequential machines, often described as SISD, were not vulnerable to hazards.
True False [1.11 Superscalar architectures may issue more than one instruction per clock cycle, but
need not.
True False [1.12 Branch prediction using only prediction bits does not require compiler assistance.
True False [1.13 Current machines called GP GPU’s (General Purpose Graphic Processing Units) can
support SIMD applications that don’t require graphical outputs.
True False [1.14 The MIPS64 architecture supports only register, immediate, and displacement ad-
dressing modes.
True False [1.15 The Branch Target and Branch Prediction Buffers are static methods of handling
control hazards.
3
Problems 2 and 3 are essentially the practice material for week fourteen. Check the work-
sheet and quiz solutions linked to the Week 14 page for more detailed answers.
F T N (2.2) Start of next instruction stalled until WAR hazards have cleared.
F T N (2.5) The functional units receive all values directly from registers.
F T N (2.8) Suffers from structural hazards associated with the Common Data Bus.
F T N (2.10) Forwarding, bypassing, and load interlocks are used to avoid WAR hazards.
4
Problem 3: Under? New Option! (52 points)
3.1–3.8 Suppose that we have implemented a (0,3) GPR machine, using a Superscalar Speculative Tomasulo
(SST) architecture that executes the MIPS64 ISA.
General Instructions:
For example, the correct response to the statement Uses register renaming to prevent data dependences
from becoming hazards would be to circle F and TO, since register renaming is not associated with
either SUperscalar or SPeculative.
If you feel the need to write an explanation don’t–it won’t be graded; so, just circle F instead–and at
least guess which attribute, SU, SP, or T rules the day. Each correctly circled item is worth two (2)
points.
3.1 Out-of-order issue in the SST increases the amount of ILP available to be exploited.
3.2 The SST reorder buffer supports Out-of-order instruction completion.
3.3 The SST uses register renaming to prevent WAR hazards.
3.4 The Ideal CPI of the SST is greater than or equal to one because several instructions can complete
simultaneously.
3.5 Compiler support for the SST always identifies safe instructions and schedules them to run im-
mediately after a branch is encountered in the MIPS64 code intended for the SST.
3.6 The SST can have at most one iteration of a loop in flight simultaneously.
3.7 In the SST, instructions are eligible to start running as soon as their operands are ready.
3.8 Committed instructions in the SST exit the reorder buffer in compiler order.
3.1 T F and SU SP TO
3.2 T F and SU SP TO
3.3 T F and SU SP TO
3.4 T F and SU SP TO
3.5 T F and SU SP TO
3.6 T F and SU SP TO
3.7 T F and SU SP TO
3.8 T F and SU SP TO
5
3.9–3.18 Who am I? Identify which of the methods below (for minimizing the impact of control
hazards) is introducing itself in the sentences that follow. Each correct response is worth two
points. If more than one answer is correct, you need only indicate one of them for full credit
of 2pts.
Potential Answers:
D (branch Delay slots.)
F (branch Folding)
P (branch Prediction buffer)
T (branch Target buffer without prediction)
None (None of the above)
3.10 I am a dynamic method that strives to decrease the impact of structural hazards
on execution time.
3.11 I contain the next instruction to be executed after some known unconditional
branch instructions.
3.12 I am modified only after I know that the branch was predicted taken.
3.13 I require compiler support to decrease all stalls or no-ops caused by control
hazards.
3.14 I contain the “predicted next PC ” for some taken branches in a program.
3.15 I always contain the next PC value after the branch condition has been evalu-
ated.
3.18 I can remove all stalls associated with incorrect branch prediction.
6
Problem 4: Racing Injures Startled Koalas (60pts)
Fill in the short answer that best matches the following phrase, where your answers come from the following
set of terms. Not all need to be used, and terms may be reused, as well. Each correct answer is worth up to
3 points. Zero credit for a blank answer.
7
anti-dependence associativity branch target buffer
clock rate conditional branch conflict miss
control-Z data dependence data hazard
dirty bits dynamic schedule effective address
exception forwarding immediate AND
index bits reservation station sign extend
speculation speedup structural hazard
superscalar Tomasulo unroll loop
unsigned binary valid bits VLIW
4.19 Predicted PC
8
Problem 5: Pademelon Offers Lorikeet Opals! (30pts)
Welcome to the land of uncertainty, where a ’null’ response means ’no opinion–no points’. In this problem,
You are to assess the veracity (that’s truthfulness) of the statements that follow. Your answer must come
from the selections below. Blank answers are worth zero (0) points. Correct answer are worth two(2) points.
Hint: Mark the ones that you can tell are absolutely false ( N ) or absolutely true (A). Then, go back and
look at the others more carefully.
9
Problem 6: Pull Over! Kangaroo Enters Race (24pts)
Once again, you have the privilege of being the memory management system. Use the information below
regarding the cache organization and the breakdown of the address of a byte in memory to determine if the
indicated memory requests result in a hit or a miss.
• The cache is 2-way set associative, with a 4 byte line size and 16 total lines.
10
Part 1 (10pt)
For the given address, indicate the cache entry accessed and the cache byte value returned in hex. Indicate
whether a cache miss occurs.
If there is a cache miss, enter “-” for “Cache Byte returned”.
Address: 0x 1D33
Memory reference
Parameter Value
Byte offset 0x
Cache Index 0x
Cache Tag 0x
Cache Hit? (Y/N)
Cache Byte returned 0x
Part 2 (10pt)
For the given address, indicate the cache entry accessed and the cache byte value returned in hex. Indicate
whether a cache miss occurs.
If there is a cache miss, enter “-” for “Cache Byte returned”.
Adress: 0x 0845
Memory reference
Parameter Value
Byte offset 0x
Cache Index 0x
Cache Tag 0x
Cache Hit? (Y/N)
Cache Byte returned 0x
11
Problem 7: Brisbane Onions, Cabbage, Carrots–Excellent! (60pts)
The L3 cache for an Intel Itanium machine with byte-addressable memory is 32M bytes, with a block size
of 4K bytes. Answer the following 4-point questions about potential cache organizations for this machine,
assuming that the address has 64 bits.
7.1 Diagram the address of a typical memory access, assuming that the organization of the cache is 4-way
set associative. Label all parts of the address clearly with name and size.
7.2 Suppose that your colleague modifies the cache organization so that you have 12 index bits. What is the
organization of this new cache, assuming that no other cache parameters were altered? Be as specific
as possible, and justify your work for full credit.
7.3 Another colleague proposes a different modification to the 32MB cache. She raises the block size to 8K
bytes, she makes the cache fully associative and writeback. Draw a diagram of a typical cache slot
or cache line associated with this configuration, being sure to label each field clearly with name and
size.
12
7.4 Write an expression for the number of bits of overhead required to access information in the 32MB
cache, as a function of t, the number of tag bits for a machine with 64 address bits. For this problem
only, assume that the block size is 8K and that space is present for a dirty bit. To assist the grader,
make sure you label all fields in the cache line that are considered part of the overhead. Note: since t
is represented by a variable, you don’t have to know anything about the associativity of the cache to
answer this question.
Erase the assumptions for the first four questions from your mind for the remaining questions about
Memory Matters
7.5 Write an expression for memory access time, given that the hit time (HT) is 2 clock cycles, the miss
rate is 5% and the miss penalty is 500 clock cycles.
13
7.6 Suppose that we consider a slightly different memory model. The miss penalty for reads is 500 clock
cycles; however, the miss penalty for writes is now 750 clock cycles. Separate memories are used for
instructions and data, with the following values for the appropriate parameters.
Write an expression for the memory access time when the cache is not perfect.
The topics that follow will be covered at a level significantly less than they were in the worksheets,
quizzes and exam in the first portion of the course. It might be better to use the detailed answer
keys to review the material instead of rolling back to the original information in the videos and books.
Don’t forget to visit the speedup related word problems, the floating point conversion problems (you
will be given the floating point formats, the so-called fixed point representations when the binary point
is always to the right of the least significant bit. You should also expect to write or analyze some MIPS
code.
14