ECE 4100 Advanced Computer Architecture Final Exam - Summer 2003


SCORE:________

Name:__________________________________________

ECE 4100 Advanced Computer Architecture


Final Exam Summer 2003
1. (10 points) What is the significance of a snooping protocol and what is it used for
in modern computer systems?

2. (10 points) What limits the size and complexity of the L1 cache, and why is an L2
cache common today?

3. (5 points) What are the advantages and disadvantages of flash memory compared with disk?

4. (10 points) Adding 3 additional processors to a parallel computer system runs an
application 3.5 times faster than on one processor (ignore additional communication
overhead between processors and any possibility of superlinear speedup effects). What
percentage of the original program code would have to be able to run in parallel on four
processors for this to be possible? (Assume the code runs either in parallel on four
processors or sequentially on one.)

Percentage of code that can run in parallel _____%
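
An answer to Problem 4 can be checked by inverting Amdahl's Law. The sketch below is an illustration only and is not part of the original exam; the function name `parallel_fraction` is hypothetical, and it assumes the ideal model the problem states (no communication overhead, code runs either on all n processors or on one).

```python
def parallel_fraction(speedup: float, n: int) -> float:
    """Solve speedup = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    # Rearranged: 1/speedup = 1 - p * (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


def amdahl_speedup(p: float, n: int) -> float:
    """Forward form of Amdahl's Law, used here to verify the inversion."""
    return 1.0 / ((1.0 - p) + p / n)


p = parallel_fraction(3.5, 4)
print(f"parallel fraction: {100 * p:.2f}%")        # about 95.24%
print(f"check speedup:     {amdahl_speedup(p, 4):.2f}")
```

Feeding the result back through the forward formula is a cheap way to confirm that the algebra was rearranged correctly.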


5. (10 points) Assume a network with a 100 Mbit/sec bandwidth has a sending overhead
of 50 usec and a receiving overhead of 75 usec. How long would it take to send a
1 Mbyte message? Assume the machines are 5000 km apart and use the book's estimate of
the speed of light in a conductor (2/3 of the speed in a vacuum). Compute the total latency
for the message (to four decimal places).

Total latency for message _____________________
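
The usual decomposition for Problem 5 is total latency = sending overhead + time of flight + message size / bandwidth + receiving overhead. The sketch below is an illustration, not part of the exam. It makes assumptions the problem does not spell out: 1 Mbyte is taken as 10^6 bytes, 100 Mbit/sec as 10^8 bits/sec, and the speed of light in a vacuum as 300,000 km/sec. The function name `total_latency_ms` is hypothetical.

```python
def total_latency_ms(send_us: float, recv_us: float, distance_km: float,
                     msg_bytes: float, bandwidth_bps: float,
                     c_km_per_s: float = 300_000.0) -> float:
    """Total message latency in milliseconds.

    Assumption: signals travel at 2/3 of c_km_per_s in the conductor.
    """
    flight_s = distance_km / ((2.0 / 3.0) * c_km_per_s)   # time of flight
    transmit_s = (msg_bytes * 8) / bandwidth_bps           # serialization time
    overhead_s = (send_us + recv_us) * 1e-6                # software overheads
    return (flight_s + transmit_s + overhead_s) * 1e3


# Assumed unit conventions: 1 Mbyte = 1e6 bytes, 100 Mbit/sec = 1e8 bits/sec.
latency = total_latency_ms(50, 75, 5000, 1e6, 100e6)
print(f"{latency:.4f} ms")   # 105.1250 ms under these assumptions
```

If 1 Mbyte is instead read as 2^20 bytes, only the transmission term changes, so passing `msg_bytes=2**20` gives the alternative figure.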


6. (10 points) Compute (to four decimal places) the average time to read or write a
4096-byte sector on a disk with these features: the average seek time is 8 ms (use the
book's suggested 1/3 correction factor for a more realistic seek time), the transfer rate
is 20 Mbytes/sec, the disk rotates at 15,000 RPM, and the controller overhead is 0.1 ms.

Average time to read or write a sector is ___________ms
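
For Problem 6, the average access time is commonly modeled as seek time + average rotational latency (half a rotation) + transfer time + controller overhead. The sketch below is an illustration, not part of the exam. It assumes the 1/3 correction factor means multiplying the stated seek time by 1/3, and that 20 Mbytes/sec means 20 x 10^6 bytes/sec. The function name `disk_access_ms` is hypothetical.

```python
def disk_access_ms(seek_ms: float, seek_factor: float, rpm: float,
                   sector_bytes: int, transfer_bytes_per_s: float,
                   controller_ms: float) -> float:
    """Average time to read or write one sector, in milliseconds."""
    effective_seek = seek_ms * seek_factor                  # corrected seek time
    rotation_ms = 60_000.0 / rpm                            # one full rotation
    rotational_latency = 0.5 * rotation_ms                  # average: half a turn
    transfer = sector_bytes / transfer_bytes_per_s * 1e3    # sector transfer time
    return effective_seek + rotational_latency + transfer + controller_ms


# Assumed: seek scaled by 1/3, 20 Mbytes/sec = 20e6 bytes/sec.
t = disk_access_ms(8.0, 1.0 / 3.0, 15_000, 4096, 20e6, 0.1)
print(f"{t:.4f} ms")   # 4.9715 ms under these assumptions
```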

7. (10 points) A processor sends 50 disk I/Os per second, the I/O requests are
exponentially distributed, and the disk drive has an average service time of 18ms. Use
queuing theory to compute the following:

Disk utilization _________%

Average Queue Time = _____________ms

Average Queue Length = _______________
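
The three quantities in Problem 7 follow from the standard single-server (M/M/1) queuing formulas: utilization = arrival rate x service time, time in queue = service time x utilization / (1 - utilization), and queue length = arrival rate x time in queue (Little's Law). The sketch below is an illustration, not part of the exam; it assumes an M/M/1 model with exponentially distributed service times, which the problem does not state explicitly, and the function name `mm1_queue` is hypothetical.

```python
def mm1_queue(arrival_per_s: float, service_ms: float):
    """Return (utilization, queue time in ms, queue length) for an M/M/1 queue."""
    utilization = arrival_per_s * service_ms / 1000.0
    if utilization >= 1.0:
        raise ValueError("system is saturated; the queue grows without bound")
    queue_time_ms = service_ms * utilization / (1.0 - utilization)
    queue_length = arrival_per_s * queue_time_ms / 1000.0   # Little's Law
    return utilization, queue_time_ms, queue_length


u, tq, lq = mm1_queue(50, 18)
print(f"utilization  = {100 * u:.0f}%")    # 90%
print(f"queue time   = {tq:.1f} ms")       # 162.0 ms
print(f"queue length = {lq:.1f}")          # 8.1
```

The queue time excludes the service time itself; adding the 18 ms service time gives the response time seen by a request.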

8. (10 points) A computer has two levels of cache. The L1 cache has a 5% miss rate and
the L2 cache has a local miss rate of 33%. Main memory takes 50 clock cycles at 2 GHz,
an L2 hit is 8 clock cycles, and an L1 hit is 1 clock cycle. Compute the average memory
access time.

Average memory access time = _____________________________ ns.
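
For a two-level cache, average memory access time is usually written as L1 hit time + L1 miss rate x (L2 hit time + L2 local miss rate x memory time). The sketch below applies that formula to Problem 8 as an illustration, not as part of the exam. It assumes the 8-cycle L2 hit and 50-cycle memory access are the penalties added on each miss, and that all cycle counts refer to the 2 GHz clock. The function name `amat_ns` is hypothetical.

```python
def amat_ns(l1_hit: float, l1_miss_rate: float, l2_hit: float,
            l2_local_miss_rate: float, mem_cycles: float,
            clock_ghz: float) -> float:
    """Average memory access time in nanoseconds for a two-level cache."""
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_cycles   # in cycles
    amat_cycles = l1_hit + l1_miss_rate * l1_miss_penalty
    return amat_cycles / clock_ghz    # one cycle lasts 1/clock_ghz ns


t = amat_ns(1, 0.05, 8, 0.33, 50, 2.0)
print(f"{t:.4f} ns")   # 1.1125 ns (2.225 cycles) under these assumptions
```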

9. (15 points) Part I (8 of 15 points): Unroll the loop shown below three times to reduce the number of
stalls and control overhead. You can assume that the loop executes a multiple of three times. Use registers
F10..F30, if needed. Indicate any stalls in your answer.

Instruction producing result    Instruction using result    Latency in clock cycles
FP ALU Op                       FP ALU Op                   3
FP ALU Op                       Store Double                2
Load Double                     FP ALU Op                   1
Load Double                     Store Double                0

LOOP:   L.D     F8, 0(R1)
        L.D     F4, 0(R2)
        ADD.D   F6, F4, F8
        SUB.D   F4, F6, F10
        S.D     F4, 0(R1)
        DADDIU  R1, R1, #8
        BNE     R1, R3, LOOP

Part II (7 of 15 points): Using the code example above (with the same number of loop unrollings as a
basis), use software pipelining to minimize stalls. Startup and cleanup code is not required. Indicate any
stalls in your answer. Note: Do not unroll more than 3 times for the pipelined code.

10. (10 points) Consider the program segment below running on a single-issue machine
using Tomasulo's algorithm. Fill in the clock cycle numbers in the table below, assuming
the latencies shown in the table below the program.

        L.D     F4, 0(R1)
        L.D     F6, 0(R2)
        MUL.D   F2, F6, F4
        SUB.D   F4, F4, F10
        DIV.D   F4, F6, F2
        ADD.D   F8, F8, F4
        DADDIU  R1, R1, #-8
        BNE     R1, R3, LOOP

Details of Functional Units:

Unit             Latency (in Execute)    Reservation Stations
FP Add/Sub       5                       2
FP Mult          8                       2
FP Div           12                      2
FP Load/Store    2                       2 each Load/Store buffers
Integer Unit     1                       None

Note: The FP arithmetic units are NOT pipelined, i.e., you must wait for the current
operation to finish execution before using the unit again. You can do a new FP load/store
every clock cycle. The WB stage takes only 1 clock cycle and during that cycle the new
data appears on the CDB and at all reservation stations.

Instruction               Issue    Execute    WB
L.D    F4, 0(R1)          0
L.D    F6, 0(R2)
MUL.D  F2, F6, F4
SUB.D  F4, F4, F10
DIV.D  F4, F6, F2
ADD.D  F8, F8, F4
DADDIU R1, R1, #-8
BNE    R1, R3, LOOP

Optional Worksheet for last problem, use it only if you find it helpful to keep track of everything (this sheet
will not be graded and does not need to be turned in)

Reservation Stations

Name     Busy    Op    Vj    Vk    Qj    Qk
Load1
Load2
Add1
Add2
Mult1
Mult2
Div1
Div2

Register Status

Field    F2    F4    F6    F8    F10
