A5 Solution
1. (4 pts) We wish to compare the performance of two different computers: M1 and M2. The
following measurements have been made on these computers:
a) (1 pt) Which computer is faster for each program, and how many times as fast is it?
b) (1 pt) Find the instruction execution rate (instructions per second) for each computer
when running program 1.
c) (1 pt) The clock rates for M1 and M2 are 3 GHz and 5 GHz respectively. Find the CPI
for program 1 on both machines.
d) (1 pt) Suppose that program 1 must be executed 1600 times each hour. Any remaining
time should be used to run program 2. Which computer is faster for this workload?
Performance is measured here by the throughput of program 2.
Solution:
a) For program 1, M2 is 2.0/1.5 = 1.33 times as fast as M1.
For program 2, M1 is 10.0/5.0 = 2 times as fast as M2.
b) For program 1:
Execution rate on M1 = 5 × 10^9 / 2.0 = 2.5 × 10^9 IPS (Instructions Per Second).
Execution rate on M2 = 6 × 10^9 / 1.5 = 4 × 10^9 IPS.
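Parts (c) and (d) reduce to two short formulas: CPI = clock rate × execution time / instruction count, and, for the workload, program 2 runs per hour = (3600 s − 1600 × time of program 1) / time of program 2. The sketch below applies them to the figures quoted in the solution above (program 1: 2.0 s and 5 × 10^9 instructions on M1, 1.5 s and 6 × 10^9 instructions on M2; program 2: 5.0 s on M1, 10.0 s on M2). It assumes the full hour is available with no other overhead; the function names are illustrative.

```python
# Figures taken from the solution to parts (a) and (b).
CLOCK_HZ = {"M1": 3e9, "M2": 5e9}
TIME_P1 = {"M1": 2.0, "M2": 1.5}      # seconds per run of program 1
IC_P1 = {"M1": 5e9, "M2": 6e9}        # instructions executed by program 1
TIME_P2 = {"M1": 5.0, "M2": 10.0}     # seconds per run of program 2


def cpi(clock_hz: float, seconds: float, instructions: float) -> float:
    """CPI = (clock rate x execution time) / instruction count."""
    return clock_hz * seconds / instructions


def program2_runs_per_hour(machine: str, p1_runs: int = 1600) -> float:
    """Runs of program 2 that fit in the time left after p1_runs of program 1."""
    remaining = 3600.0 - p1_runs * TIME_P1[machine]
    return remaining / TIME_P2[machine]


for m in ("M1", "M2"):
    print(m,
          "CPI(program 1) =", round(cpi(CLOCK_HZ[m], TIME_P1[m], IC_P1[m]), 3),
          "program 2 runs/hour =", program2_runs_per_hour(m))
```

Under these assumptions the machine with the larger "runs/hour" value has the higher program 2 throughput for the workload in part (d).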
2. (2 pts) Suppose you wish to run a program P with 7.5 × 10^9 instructions on a 5 GHz
machine with a CPI of 1.2.
a) (1 pt) What is the CPU execution time?
b) (1 pt) When you run program P, it takes 3 seconds of wall time to complete. What
percentage of the CPU time did program P receive?
Solution:
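A minimal sketch of the two steps, applying CPU time = instruction count × CPI / clock rate to the numbers in the problem statement, then dividing by the 3 s of wall time to get the share of the CPU that P received. The function name `cpu_time` is illustrative.

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = instruction count x CPI / clock rate."""
    return instructions * cpi / clock_hz


t_cpu = cpu_time(7.5e9, 1.2, 5e9)   # CPU seconds program P needs
wall = 3.0                          # measured wall-clock seconds
share = t_cpu / wall                # fraction of the wall time P spent on the CPU
print(f"CPU time = {t_cpu:.2f} s, share of CPU = {share:.0%}")
# prints: CPU time = 1.80 s, share of CPU = 60%
```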
3. (4 pts) Consider two different implementations, M1 and M2, of the same instruction set.
There are five classes of instructions (A, B, C, D and E) in the instruction set. M1 has a
clock rate of 4 GHz and M2 has a clock rate of 6 GHz.
a) (2 pts) Assume that peak performance is defined as the fastest rate that a computer
can execute any instruction sequence. What are the peak performances of M1 and M2
expressed in instructions per second?
b) (2 pts) If the instructions executed in a certain program are divided equally among
the classes of instructions, except for class A, which occurs twice as often as each of
the others, how much faster is M2 than M1?
Solution:
a) Peak performance is reached when each machine executes only the instructions with
its lowest CPI. M1 will be executing instructions of class A only, and M2 will be
executing only instructions that belong to class A, B, or C.
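The reasoning for both parts can be sketched as follows: peak rate = clock rate / smallest CPI, and for part (b) the average CPI is weighted 2:1:1:1:1 toward class A. The per-class CPI values are not listed in the text above, so this sketch takes them as parameters rather than guessing them; the names `peak_ips`, `average_cpi`, `speedup`, and `MIX_PART_B` are hypothetical. Because both machines run the same instruction sequence, the instruction count cancels in the speedup.

```python
from typing import Mapping

# Relative frequencies for part (b): class A occurs twice as often as each other class.
MIX_PART_B = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1}


def peak_ips(clock_hz: float, cpi: Mapping[str, float]) -> float:
    """Peak instructions per second: execute only the lowest-CPI class(es)."""
    return clock_hz / min(cpi.values())


def average_cpi(cpi: Mapping[str, float], mix: Mapping[str, float]) -> float:
    """Frequency-weighted CPI; `mix` holds relative counts and need not sum to 1."""
    total = sum(mix.values())
    return sum(cpi[cls] * count for cls, count in mix.items()) / total


def speedup(clock_new: float, cpi_new: Mapping[str, float],
            clock_old: float, cpi_old: Mapping[str, float],
            mix: Mapping[str, float]) -> float:
    """Speedup of 'new' over 'old' on the same program (instruction count cancels)."""
    perf_new = clock_new / average_cpi(cpi_new, mix)
    perf_old = clock_old / average_cpi(cpi_old, mix)
    return perf_new / perf_old


# Usage, with cpi_m1 / cpi_m2 filled in from the problem's CPI table:
#   peak_ips(4e9, cpi_m1), peak_ips(6e9, cpi_m2)
#   speedup(6e9, cpi_m2, 4e9, cpi_m1, MIX_PART_B)   # how much faster M2 is than M1
```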
4. (5 pts) Consider two different implementations, M1 and M2, of the same instruction set.
There are three classes of instructions (A, B, and C) in the instruction set. M1 has a clock
rate of 6 GHz and M2 has a clock rate of 3 GHz. The CPI for each instruction class on
M1 and M2 is given in the following table:
The above table also contains a summary of the usage of instruction classes generated by
three different compilers: C1, C2, and C3. Assume that each compiler generates the same
number of instructions for a given program.
a) (1 pt) Using C1 compiler on both M1 and M2, how much faster is M1 than M2?
b) (1 pt) Using C2 compiler on both M1 and M2, how much faster is M2 than M1?
c) (1 pt) If you purchase M1, which compiler would you use?
d) (1 pt) If you purchase M2, which compiler would you use?
e) (1 pt) Which computer and compiler combination gives the best performance?
Solution:
a) Using C1 compiler:
Average CPI on M1 = 0.4 × 2 + 0.4 × 3 + 0.2 × 5 = 3.0
Average CPI on M2 = 0.4 × 1 + 0.4 × 2 + 0.2 × 2 = 1.6
b) Using C2 compiler:
Average CPI on M1 = 0.4 × 2 + 0.2 × 3 + 0.4 × 5 = 3.4
Average CPI on M2 = 0.4 × 1 + 0.2 × 2 + 0.4 × 2 = 1.6
e) Compiler C3 gives the best average CPI on both M1 and M2.
Because the instruction count is the same, performance is proportional to clock rate / CPI.
Performance of M1 relative to M2 = (6 GHz / 3 GHz) × (1.4 / 2.9) = 2.8/2.9 ≈ 0.97
Therefore, M2 gives the best performance using Compiler C3.
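The comparisons in parts (a) through (e) can be checked with a short sketch. The per-class CPIs (M1: A = 2, B = 3, C = 5; M2: A = 1, B = 2, C = 2) and the C1/C2 mixes are read off the products in the part (a) and (b) solution; C3's mix is not repeated in the text, so the sketch uses the average CPIs quoted in part (e) (2.9 on M1, 1.4 on M2). The helper names are illustrative.

```python
# Per-class CPIs, read off the products in the part (a)/(b) solution.
CPI = {"M1": {"A": 2, "B": 3, "C": 5}, "M2": {"A": 1, "B": 2, "C": 2}}
CLOCK_GHZ = {"M1": 6.0, "M2": 3.0}
# Fraction of instructions in each class for C1 and C2, as used in the solution.
MIX = {"C1": {"A": 0.4, "B": 0.4, "C": 0.2},
       "C2": {"A": 0.4, "B": 0.2, "C": 0.4}}
# Average CPIs for C3 as quoted in part (e).
C3_AVG_CPI = {"M1": 2.9, "M2": 1.4}
COMPILERS = ("C1", "C2", "C3")


def average_cpi(machine: str, compiler: str) -> float:
    """Frequency-weighted CPI of one machine/compiler pair."""
    if compiler == "C3":
        return C3_AVG_CPI[machine]
    return sum(frac * CPI[machine][cls] for cls, frac in MIX[compiler].items())


def performance(machine: str, compiler: str) -> float:
    """Same instruction count everywhere, so performance ~ clock rate / CPI."""
    return CLOCK_GHZ[machine] / average_cpi(machine, compiler)


print("C1: M1 is", round(performance("M1", "C1") / performance("M2", "C1"), 4), "x as fast as M2")
print("C2: M2 is", round(performance("M2", "C2") / performance("M1", "C2"), 4), "x as fast as M1")
for m in ("M1", "M2"):
    # Lowest average CPI means best performance on a fixed machine.
    print("Best compiler on", m, "=", min(COMPILERS, key=lambda c: average_cpi(m, c)))
best = max(((m, c) for m in ("M1", "M2") for c in COMPILERS),
           key=lambda mc: performance(*mc))
print("Best combination:", best)
```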
5. (2 pts) A benchmark program runs for 100 seconds. We want to speed up the benchmark
by a factor of 3. We enhance the floating-point hardware to make floating-point
instructions run 5 times faster. What fraction of the initial execution time must
floating-point instructions account for to give an overall speedup of 3 on this
benchmark?
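Solution:
Let f be the fraction of the initial execution time spent in floating-point instructions.
Then, after the enhancement, the time taken by the machine will be:
T_new = 100 × (1 − f) + (100 × f) / 5 seconds
For an overall speedup of 3, T_new = 100 / 3 seconds, so (1 − f) + f / 5 = 1/3,
i.e. (4/5) × f = 2/3, which gives f = 5/6.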
Therefore, 5/6 of the initial 100 seconds, or 83.33 seconds, must be spent
executing floating-point instructions.
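The same result follows from solving Amdahl's law, speedup = 1 / ((1 − f) + f / k), for f: f = (1 − 1/S) / (1 − 1/k). A minimal sketch with exact rational arithmetic, using S = 3 (target overall speedup) and k = 5 (floating-point speedup) from the problem; the function names are illustrative.

```python
from fractions import Fraction


def required_fraction(overall: Fraction, enhanced: Fraction) -> Fraction:
    """Solve Amdahl's law 1 / ((1 - f) + f / k) = S for f."""
    return (1 - 1 / overall) / (1 - 1 / enhanced)


def amdahl_speedup(f: Fraction, enhanced: Fraction) -> Fraction:
    """Overall speedup when a fraction f of the time is sped up by a factor k."""
    return 1 / ((1 - f) + f / enhanced)


f = required_fraction(Fraction(3), Fraction(5))
print(f, float(100 * f))                    # fraction, and seconds of the original 100 s
print(amdahl_speedup(f, Fraction(5)))       # check: should give back the target speedup
```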
6. (3 pts) Consider the following fragment of MIPS code. Assume that a and b are arrays of
words, the base address of a is in $a0, and the base address of b is in $a1. How many
instructions are executed when this code runs? If ALU instructions (addu and addiu)
take 1 cycle to execute, load/store instructions (lw and sw) take 5 cycles, and the
branch instruction (bne) takes 3 cycles, how many cycles are needed to execute the
following code (all iterations)? What is the average CPI?