ECE 4100 Advanced Computer Architecture Final Exam - Summer 2003
Name:__________________________________________
2. (10 points) What limits the size and complexity of the L1 cache, and why is an L2
cache common today?
3. (5 points) What are the advantages and disadvantages of FLASH memory compared with disk?
7. (10 points) A processor issues 50 disk I/O requests per second. The times between
requests are exponentially distributed, and the disk drive has an average service time
of 18 ms. Use queuing theory to compute the following:
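For reference, the standard M/M/1 (single server, exponential arrivals and service) relations that a problem of this kind draws on can be sketched in Python. This is only an illustration under stated assumptions: the function name `mm1_metrics`, the M/M/1 model itself, and the choice of quantities (utilization, mean time in queue, mean response time, mean queue length) are not taken from the exam.

```python
def mm1_metrics(arrival_rate_per_s, service_time_s):
    """Return basic M/M/1 queue metrics (assumed model, hypothetical helper)."""
    # Server utilization: fraction of time the disk is busy.
    utilization = arrival_rate_per_s * service_time_s
    if not 0 <= utilization < 1:
        raise ValueError("M/M/1 formulas require utilization in [0, 1)")

    # Mean time a request waits in the queue before service starts.
    time_in_queue_s = service_time_s * utilization / (1 - utilization)

    # Mean response time = waiting time + service time.
    response_time_s = time_in_queue_s + service_time_s

    # Little's law: mean number of requests waiting in the queue.
    queue_length = arrival_rate_per_s * time_in_queue_s

    return {
        "utilization": utilization,
        "time_in_queue_s": time_in_queue_s,
        "response_time_s": response_time_s,
        "queue_length": queue_length,
    }


if __name__ == "__main__":
    # Numbers from the problem statement: 50 I/Os per second, 18 ms service time.
    for name, value in mm1_metrics(50, 0.018).items():
        print(f"{name}: {value:.4f}")
```

With the values from the problem, the utilization comes out at roughly 0.9, so the mean waiting time is about nine times the service time.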
8. (10 points) A computer has two levels of cache. The L1 cache has a 5% miss rate and
the L2 cache has a local miss rate of 33%. Main memory takes 50 clock cycles at 2 GHz,
an L2 hit takes 8 clock cycles, and an L1 hit takes 1 clock cycle. Compute the average
memory access time.
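The usual two-level average memory access time (AMAT) formula can be sketched in Python as follows. The sketch assumes that the 8-cycle L2 hit time is paid only on an L1 miss and that the 50-cycle memory time is paid in addition on an L2 miss; the function names are hypothetical and other readings of the latencies would give a different result.

```python
def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """AMAT in clock cycles for a two-level cache (assumed additive penalties)."""
    # Cost of an L1 miss: go to L2, and on an L2 miss go on to main memory.
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty


def cycles_to_ns(cycles, clock_ghz):
    """Convert clock cycles to nanoseconds at the given clock frequency."""
    return cycles / clock_ghz


if __name__ == "__main__":
    # Values from the problem statement.
    cycles = amat_two_level(l1_hit=1, l1_miss_rate=0.05,
                            l2_hit=8, l2_local_miss_rate=0.33,
                            mem_penalty=50)
    print(f"AMAT = {cycles:.3f} cycles = {cycles_to_ns(cycles, 2.0):.4f} ns")
```

Under these assumptions the computation is 1 + 0.05 × (8 + 0.33 × 50), which is about 2.225 cycles.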
9. (15 points) Part I (8 of 15 points): Unroll the loop shown below three times to reduce the number of
stalls and control overhead. You can assume that the loop executes a multiple of three times. Use registers
F10..F30, if needed. Indicate any stalls in your answer.
Instruction producing result
    FP ALU Op
    FP ALU Op
    Load Double
    Load Double
LOOP:   L.D     F8, 0(R1)
        L.D     F4, 0(R2)
        ADD.D   F6, F4, F8
        SUB.D   F4, F6, F10
        S.D     F4, 0(R1)
        DADDIU  R1, R1, #8
        BNE     R1, R3, LOOP
Part II (7 of 15 points): Using the code example above (with the same number of loop unrollings as a
basis), use software pipelining to minimize stalls. Startup and cleanup code is not required. Indicate any
stalls in your answer. Note: Do not unroll more than 3 times for the pipelined code.
10. (10 points) Consider the program segment below running on a single-issue machine
using Tomasulo's algorithm. Fill in the clock cycle numbers in the table that follows,
assuming the latencies listed after the program.
        L.D     F4, 0(R1)
        L.D     F6, 0(R2)
        MUL.D   F2, F6, F4
        SUB.D   F4, F4, F10
        DIV.D   F4, F6, F2
        ADD.D   F8, F8, F4
        DADDIU  R1, R1, #-8
        BNE     R1, R3, LOOP
Reservation Stations
    2
    2
    2
    2 each Load/Store buffers
    None
Note: The FP arithmetic units are NOT pipelined, i.e., you must wait for the current
operation to finish execution before using the unit again. You can start a new FP load/store
every clock cycle. The WB stage takes only 1 clock cycle, and during that cycle the new
data appears on the CDB and at all reservation stations.
Instruction              Issue   Execute   WB
L.D    F4, 0(R1)         0
L.D    F6, 0(R2)
MUL.D  F2, F6, F4
SUB.D  F4, F4, F10
DIV.D  F4, F6, F2
ADD.D  F8, F8, F4
DADDIU R1, R1, #-8
BNE    R1, R3, LOOP
Optional worksheet for the last problem. Use it only if you find it helpful for keeping track of everything
(this sheet will not be graded and does not need to be turned in).
Reservation Stations
Name    Busy    Op    Vj    Vk    Qj    Qk
Load1
Load2
Add1
Add2
Mult1
Mult2
Div1
Div2
Register Status
Field    F2    F4    F6    F8    F10