Numerical: Central Processing Unit
Hence A on P1 is faster.
Problem: Amdahl’s Law
A company is releasing two new versions (beta and gamma) of its basic
processor architecture, alpha. Beta and gamma are designed by
modifying three major components (X, Y and Z) of alpha.
For a program A, the fractions of the total execution time spent
on X, Y and Z are 40%, 30% and 20%,
respectively. Beta speeds up X and Z by 2 times but slows down Y by 1.3
times, whereas gamma speeds up X, Y and Z by 1.2, 1.3 and 1.4 times,
respectively.
(a) How much faster is gamma than alpha for running A?
(b) Which of beta and gamma is faster for running A? Find the speedup factor.
Beta: S = 1 / {(1 - fx - fy - fz) + (fx/Nx) + (fy/Ny) + (fz/Nz)}
= 1 / {0.1 + (0.4/2) + (0.3/0.77) + (0.2/2)} = 1.266 times
(Y is slowed down by 1.3 times, so Ny = 1/1.3 = 0.77.)
(a) Gamma: S = 1 / {(1 - fx - fy - fz) + (fx/Nx) + (fy/Ny) + (fz/Nz)}
= 1 / {0.1 + (0.4/1.2) + (0.3/1.3) + (0.2/1.4)} = 1.239 times
(b) Beta is faster than gamma by a factor of 1.266/1.239, about 1.02 times.
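The two speedup calculations above can be checked with a short script. This is a minimal sketch, not part of the original solution: the function name amdahl_speedup and the (fraction, factor) input layout are assumptions made for illustration. A slowdown is passed as a factor below 1, so Y on beta uses 1/1.3.

```python
def amdahl_speedup(parts):
    """Overall speedup for a list of (time_fraction, speedup_factor) pairs.

    A factor below 1 models a slowdown. Whatever fraction of time is not
    listed in `parts` is assumed to run at its original speed.
    """
    untouched = 1.0 - sum(fraction for fraction, _ in parts)
    new_time = untouched + sum(fraction / factor for fraction, factor in parts)
    return 1.0 / new_time


# Beta: X and Z are 2x faster, Y is 1.3x slower (factor 1/1.3).
beta = amdahl_speedup([(0.4, 2.0), (0.3, 1 / 1.3), (0.2, 2.0)])

# Gamma: X, Y, Z are 1.2x, 1.3x and 1.4x faster.
gamma = amdahl_speedup([(0.4, 1.2), (0.3, 1.3), (0.2, 1.4)])

print(f"beta  = {beta:.3f}")
print(f"gamma = {gamma:.3f}")
print(f"beta over gamma = {beta / gamma:.3f}")
```

Working from unrounded values keeps the final ratio free of the rounding error that creeps in when 1.266 is divided by 1.239 by hand.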
Problem: Pipeline Hazards
Given a non-pipelined architecture running at 1.5 GHz that takes 5 cycles to
finish an instruction, you want to pipeline it with 5 stages. Due to
hardware overhead, the pipelined design will operate at only 1 GHz. 5% of
memory instructions cause a stall of 50 cycles, 30% of branch instructions cause
a stall of 2 cycles, and load-ALU combinations cause a stall of 1 cycle. Assume
that in a given program 20% of instructions are branches and 30% are
memory instructions, and 10% of instructions are load-ALU combinations. What is
the speedup of the pipelined design over the non-pipelined design?
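One way to work this problem is to compare the average time per instruction in the two designs. The sketch below is an illustration under stated assumptions rather than the official solution: the ideal pipelined CPI is taken as 1, "5% of memory instructions" is read as 5% of the 30% that are memory instructions, "30% of branch instructions" as 30% of the 20% that are branches, and the load-ALU stall is applied to 10% of all instructions. The names pipelined_cpi and hazards are hypothetical.

```python
def pipelined_cpi(base_cpi, hazards):
    """Average CPI = ideal CPI + sum of (stall frequency * stall cycles)."""
    return base_cpi + sum(freq * penalty for freq, penalty in hazards)


# Non-pipelined: 5 cycles per instruction at 1.5 GHz.
nonpipe_time_ns = 5 / 1.5

# Stall sources as (fraction of all instructions, stall cycles).
hazards = [
    (0.30 * 0.05, 50),  # 5% of the 30% memory instructions stall 50 cycles
    (0.20 * 0.30, 2),   # 30% of the 20% branch instructions stall 2 cycles
    (0.10, 1),          # load-ALU combinations stall 1 cycle
]

cpi = pipelined_cpi(1.0, hazards)  # assumes an ideal pipelined CPI of 1
pipe_time_ns = cpi / 1.0           # 1 GHz clock, so 1 ns per cycle

speedup = nonpipe_time_ns / pipe_time_ns
print(f"pipelined CPI = {cpi:.2f}")
print(f"speedup = {speedup:.3f}")
```

Under these assumptions the stalls add 0.75 + 0.12 + 0.10 = 0.97 cycles per instruction, so the pipelined CPI is 1.97 and the speedup is (5/1.5)/1.97, roughly 1.69.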
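[Figure: pipeline timing diagrams. The alignment of stages to clock-cycle columns was lost; each row lists one instruction's stages in order, and * marks a stall cycle.]
L.D  IF ID EX ME WB
ADD  IF * * * ID EX ME WB
L.D  IF * * * ID EX ME WB
ADD  IF * * * ID
Clock cycles: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
L.D  IF ID EX ME WB
ADD  IF * ID EX ME WB
L.D  IF ID EX ME WB
ADD  IF * ID EX ME WB
Clock cycles: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
F D X M W
F D M1 M2 M3 M4 M5 M6 M7 M W
*
F D A1 A2 A3 A4 M W
* * * * * * *
F D M W
* * * * * * * * * *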
Problem: Compiler Scheduling
Assume a multi-cycle MIPS pipeline with 1 integer unit (EX), 1 FP adder (4-stage
pipelined unit), 1 multiplier (7-stage pipelined unit) and 1 divider (24-stage
unpipelined unit).
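      ADDI   R8, R0, #4     // R0 always holds 0, so R8 = 4 (loop counter)
Loop: LW     F1, 0(R5)      // value A is loaded to F1
      LW     F2, 0(R6)      // value B is loaded to F2
      MUL.D  F3, F1, F1     // F3 = A*A
      MUL.D  F4, F2, F2     // F4 = B*B
      MULI.D F1, F1, #2     // F1 = 2A
      MUL.D  F1, F1, F2     // F1 = 2AB
      ADD.D  F1, F1, F3     // F1 = 2AB + A*A
      ADD.D  F1, F1, F4     // F1 = A*A + 2AB + B*B
      SUB.D  F2, F3, F4     // F2 = A*A - B*B
      DIV.D  F1, F1, F2     // F1 = C
      SW     F1, 0(R7)      // value C is stored to memory
      ADD    R5, R5, #8     // advance pointer to A
      ADD    R6, R6, #8     // advance pointer to B
      ADD    R7, R7, #8     // advance pointer to C
      SUB    R8, R8, #1     // decrement loop counter
      BNEZ   R8, Loop
      ADDI   R8, R0, #8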
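To make the data dependences in the loop body easier to follow, the floating-point arithmetic of one iteration can be written out step by step. This is only a sketch of what the listing appears to compute, C = (A*A + 2AB + B*B) / (A*A - B*B); the function name loop_body is hypothetical, and the variables simply mirror the register names.

```python
def loop_body(a, b):
    """Mirror one loop iteration; variables are named after the FP registers."""
    f3 = a * a      # MUL.D  F3, F1, F1
    f4 = b * b      # MUL.D  F4, F2, F2
    f1 = a * 2      # MULI.D F1, F1, #2
    f1 = f1 * b     # MUL.D  F1, F1, F2  (needs the MULI.D result)
    f1 = f1 + f3    # ADD.D  F1, F1, F3  (needs both multiplies)
    f1 = f1 + f4    # ADD.D  F1, F1, F4  (needs the previous ADD.D)
    f2 = f3 - f4    # SUB.D  F2, F3, F4
    return f1 / f2  # DIV.D  F1, F1, F2  (needs the ADD.D and SUB.D results)


print(loop_body(3.0, 1.0))  # (9 + 6 + 1) / (9 - 1) = 2.0
```

Each statement that reads a value written by the one just before it marks a read-after-write dependence, which is where a scheduler would look to place independent instructions.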
Problem: Compiler Scheduling
O INSTRUCTION DIFF WB CLK REMARKS
CYCLE