Ec23 Chapter1
[Adapted from Mary Jane Irwin’s slides (PSU) based on Computer Organization
and Design, ARM Edition, Patterson & Hennessy, © 2017, Elsevier]
https://www.top500.org/lists/top500/2022/06/
It is used in
Thor
Captain America
X-Men: First Class
The Avengers
Gravity
Guardians of the Galaxy
Star Wars: The Force Awakens
Game of Thrones
…
System software
Operating system – supervising program that interfaces the user's program with the hardware (e.g., Linux, macOS, Windows)
- Handles basic input and output operations
- Allocates storage and memory
- Schedules tasks and provides for protected sharing among multiple applications
Compiler – translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute
EC Chapter 1.20 Dept. of Comp. Arch., UMA
Below the Program, Cont'd
High-level language program (in C):

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

The C compiler translates each high-level statement into one or more (one-to-many) assembly language instructions.
[Figure: Matrix multiply, relative speedup over a Python version (18-core CPU)]
                Intel Pentium II (1997)   Intel Pentium III (1999)   Intel Pentium 4 (2000)
Transistors     7.5 × 10^6                28 × 10^6                  42 × 10^6
Clock rate      300 MHz                   733 MHz                    1.5 GHz
Die size        209 mm²                   140 mm²                    224 mm²
Remarkable improvement in performance, without
considering power consumption or die size
Alder Lake
- 8P + 8E cores
- Clock: 1 GHz to 5.5 GHz (P-cores) / 770 MHz to 4.0 GHz (E-cores)
- GPU with 32 EUs
- TDP: 9 W to 241 W
- L3: 4 MB to 30 MB
- Process: 10 nm
13th Generation Intel Core Processor
Raptor Lake
- 8(16)P + 16E cores
- Clock: up to 6 GHz (P-cores) / up to 4.3 GHz (E-cores)
- GPU with up to 96 EUs
- TDP: 35 W to 253 W
- L3: 36 MB
- Process: Intel 7 (a 10 nm-class process)
Processors for desktops and laptops
Typical power: desktops (35-150 W), laptops (9-55 W) and mobile devices (9-25 W).
Examples: Intel Raptor Lake U (mobile), AMD Ryzen 7000 U
Electronics technology continues to evolve:
- Increased capacity and performance
- Reduced cost
[Figure courtesy of Intel®]
Technology Scaling Road Map (ITRS)
Year                 2006   2008   2010   2012   2014   2020   2022
Feature size (nm)      65     45     32     22     14      5      4
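Under the idealized assumption that a transistor's area scales with the square of the feature size, each shrink in the roadmap translates into a density gain. A minimal sketch (the function name density_gain is hypothetical, and real processes deviate from this ideal):

```c
/* Idealized scaling sketch (assumption): if a transistor's area scales with
   the square of the feature size, shrinking from old_nm to new_nm multiplies
   the number of transistors per unit area by (old_nm / new_nm)^2.
   Real processes do not follow this ideal exactly. */
double density_gain(double old_nm, double new_nm)
{
    double linear = old_nm / new_nm;   /* linear shrink factor */
    return linear * linear;            /* area shrinks quadratically */
}
```

For example, density_gain(65.0, 45.0) is about 2.09 and density_gain(45.0, 32.0) about 1.98, i.e. roughly 2× more transistors per unit area for each two-year step early in the roadmap.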
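[Figure: Power (Watts, log scale) of Intel processors versus feature size (nm): i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4 (2000) and Intel Core Duo (2006), with a hot plate and a nuclear reactor as reference points]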
In CMOS IC technology:
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
(Power grew ×30 while frequency grew ×1000.)
How could clock rates grow by a factor of 1000 while
power grew by only a factor of 30?
Answer: the supply voltage dropped from 5 V to 1 V, and power depends on the square of the voltage.
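The question can be checked numerically with the power equation. The sketch below (the helper name power_ratio is hypothetical) returns the ratio between the power of two designs from the ratios of their capacitive load, voltage and frequency; any constant factor in the equation cancels in a ratio.

```c
/* Power = Capacitive load x Voltage^2 x Frequency.
   Given the ratios (new / old) of each factor between two designs,
   return the ratio of their power (new / old). */
double power_ratio(double cap_ratio, double volt_ratio, double freq_ratio)
{
    return cap_ratio * volt_ratio * volt_ratio * freq_ratio;
}
```

With an unchanged capacitive load, power_ratio(1.0, 1.0 / 5.0, 1000.0) is about 40: going from 5 V to 1 V divides power by 25, which absorbs most of the ×1000 frequency increase. Under this simplified model, the remaining gap between ×40 and the observed ×30 would be attributed to the capacitive load term.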
$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$
"X is n times faster than Y" means:
$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$
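A minimal sketch of these two definitions (function names are hypothetical):

```c
/* performance_X = 1 / execution_time_X */
double performance(double execution_time)
{
    return 1.0 / execution_time;
}

/* n such that X is n times faster than Y:
   n = performance_X / performance_Y = execution_time_Y / execution_time_X */
double times_faster(double execution_time_x, double execution_time_y)
{
    return execution_time_y / execution_time_x;
}
```

With hypothetical times of 10 s for X and 15 s for Y, times_faster(10.0, 15.0) returns 1.5: X is 1.5 times faster than Y.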
Productivity
Throughput: the total amount of work done per unit of time
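[Figure: timeline in which the processor alternates between the OS, P1.exe, P2.exe and I/O until P1.exe delivers its result. User CPU Time is the time spent executing P1.exe's own instructions; System CPU Time is the time the OS spends executing on its behalf.]
User-level instructions of the kind counted in User CPU Time (the swap procedure in MIPS assembly):

    sll  $2, $5, 2     # $2 = k * 4 (byte offset of v[k])
    add  $2, $4, $2    # $2 = address of v[k] ($4 holds v)
    lw   $15, 0($2)    # $15 = v[k] (temp)
    lw   $16, 4($2)    # $16 = v[k+1]
    sw   $16, 0($2)    # v[k] = v[k+1]
    sw   $15, 4($2)    # v[k+1] = temp
    jr   $31           # return to the caller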
Instruction class   A   B   C
CPI                 1   2   3
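With a CPI per instruction class, a program's total clock cycles are the sum over classes of CPI × instruction count, and its average CPI is that total divided by the number of instructions. A sketch using the CPIs from the table (function names are hypothetical; the instruction mixes below are illustrative only):

```c
#include <stddef.h>

/* CPI of each instruction class, from the table: A = 1, B = 2, C = 3. */
static const int class_cpi[3] = {1, 2, 3};

/* Clock cycles = sum over classes of (CPI_i x instruction count_i). */
long cpu_clock_cycles(const long counts[3])
{
    long cycles = 0;
    for (size_t i = 0; i < 3; i++)
        cycles += class_cpi[i] * counts[i];
    return cycles;
}

/* Average CPI = clock cycles / total instruction count. */
double average_cpi(const long counts[3])
{
    long total = counts[0] + counts[1] + counts[2];
    return (double)cpu_clock_cycles(counts) / (double)total;
}
```

A hypothetical sequence with 2 A, 1 B and 2 C instructions needs 2 + 2 + 6 = 10 cycles (average CPI 2.0), while one with 4 A, 1 B and 1 C needs 4 + 2 + 3 = 9 cycles (average CPI 1.5): more instructions, yet fewer cycles.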
Using the Performance Equation
Computers A and B implement the same ISA. Computer
A has a clock cycle time of 250 ps and an effective CPI
of 2.0 for some program and computer B has a clock
cycle time of 500 ps and an effective CPI of 1.2 for the
same program. Which computer is faster and by how
much?
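Since A and B run the same program on the same ISA, the instruction count I is the same for both: CPU time_A = I × 2.0 × 250 ps = 500·I ps and CPU time_B = I × 1.2 × 500 ps = 600·I ps, so A is 600 / 500 = 1.2 times faster than B. The same computation as a small sketch (function names are hypothetical):

```c
/* Average time per instruction = CPI x clock cycle time (in ps).
   A and B run the same program on the same ISA, so the instruction
   count is identical and cancels out when the two are compared. */
double time_per_instruction_ps(double cpi, double cycle_time_ps)
{
    return cpi * cycle_time_ps;
}

/* Speedup of A over B = CPU time_B / CPU time_A. */
double speedup_a_over_b(double cpi_a, double tck_a_ps,
                        double cpi_b, double tck_b_ps)
{
    return time_per_instruction_ps(cpi_b, tck_b_ps) /
           time_per_instruction_ps(cpi_a, tck_a_ps);
}
```

speedup_a_over_b(2.0, 250.0, 1.2, 500.0) evaluates to 1.2.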
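Terms of CPU time = Instruction count × CPI × Clock cycle affected by each component:
                        Instruction count   CPI   Clock cycle
Algorithm                       X            X
Programming language            X            X
Compiler                        X            X
ISA                             X            X         X
Core organization                            X         X
Technology                                             X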
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
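$$T_{CPU} = \frac{NI \times CPI}{F} = NI \times CPI \times t_{ck}$$
where NI is the number of instructions executed, CPI the average clock cycles per instruction, F the clock frequency and t_ck = 1/F the clock cycle time.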
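Both forms of the equation give the same CPU time. A sketch with hypothetical function names:

```c
/* Tcpu = NI x CPI / F, with F in Hz; result in seconds. */
double tcpu_from_frequency(double ni, double cpi, double f_hz)
{
    return ni * cpi / f_hz;
}

/* Equivalent form Tcpu = NI x CPI x tck, with tck = 1 / F in seconds. */
double tcpu_from_cycle_time(double ni, double cpi, double tck_s)
{
    return ni * cpi * tck_s;
}
```

For a hypothetical program of 10^9 instructions with CPI = 2 on a 4 GHz clock, tcpu_from_frequency(1e9, 2.0, 4e9) returns 0.5 s, and tcpu_from_cycle_time(1e9, 2.0, 1.0 / 4e9) gives the same result up to rounding.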
Static metrics:
- How many bytes does the program occupy in memory?
Dynamic metrics:
- How many instructions are executed? (instruction count)
- How many bytes does the processor fetch to execute the program?
- How many clock cycles does each instruction require on average? (CPI)
- How short a clock cycle is practical? (clock cycle time)
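MIPS (millions of instructions per second):
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$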
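The two ends of the MIPS equation agree because execution time = instruction count × CPI / clock rate. A sketch with hypothetical function names:

```c
/* MIPS = instruction count / (execution time x 10^6) */
double mips_from_time(double instruction_count, double execution_time_s)
{
    return instruction_count / (execution_time_s * 1e6);
}

/* MIPS = clock rate / (CPI x 10^6), with the clock rate in Hz */
double mips_from_cpi(double clock_rate_hz, double cpi)
{
    return clock_rate_hz / (cpi * 1e6);
}
```

For a hypothetical 4 GHz processor with CPI = 2, mips_from_cpi(4e9, 2.0) gives 2000 MIPS; a program of 10^9 instructions on it takes 0.5 s, and mips_from_time(1e9, 0.5) also gives 2000.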
Supercomputers metric:
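[Figure: CPU time without the improvement (SM) and with it (CM); t_sub is the time of the part that is improved and t_resto the rest of the time, which the improvement does not affect]
$$T_{CPU,SM} = t_{sub,SM} + t_{resto}, \qquad T_{CPU,CM} = t_{sub,CM} + t_{resto}$$
$$S = \frac{T_{CPU,SM}}{T_{CPU,CM}} \;\;(1) \qquad F_m = \frac{t_{sub,SM}}{T_{CPU,SM}} \;\;(2) \qquad S_m = \frac{t_{sub,SM}}{t_{sub,CM}} \;\;(3)$$
From (3) and (2), $t_{sub,CM} = \frac{F_m \, T_{CPU,SM}}{S_m}$ and $t_{resto} = (1 - F_m)\, T_{CPU,SM}$, so $T_{CPU,CM} = T_{CPU,SM}\left(\frac{F_m}{S_m} + (1 - F_m)\right)$.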
Amdahl’s Law
Pitfall: Improving an aspect of a computer and
expecting a proportional improvement in overall
performance
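$$S = \frac{T_{CPU\_org}}{T_{CPU\_imp}} = \frac{1}{\dfrac{F_m}{S_m} + (1 - F_m)}$$
Fm = fraction of the original execution time that benefits from the improvement
Sm = improvement factor (speedup) of that fraction
Limits:
$$S_m = 1 \;\Rightarrow\; S = \frac{1}{F_m + (1 - F_m)} = 1 \qquad\qquad S_m \to \infty \;\Rightarrow\; S = \frac{1}{0 + (1 - F_m)} = \frac{1}{1 - F_m}$$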
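Amdahl's law and its upper bound as a small sketch (function names are hypothetical):

```c
/* Amdahl's law: overall speedup S when a fraction fm of the original
   execution time is made sm times faster. */
double amdahl_speedup(double fm, double sm)
{
    return 1.0 / (fm / sm + (1.0 - fm));
}

/* Upper bound as sm grows without limit: S -> 1 / (1 - fm). */
double amdahl_limit(double fm)
{
    return 1.0 / (1.0 - fm);
}
```

With Fm = 0.8, making the improved part 4 times faster gives amdahl_speedup(0.8, 4.0) = 2.5 overall, and no value of Sm can take the overall speedup beyond amdahl_limit(0.8) = 5.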
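Amdahl's Law
Example: multiply accounts for 80 s of a 100 s execution time.
How much must multiply performance improve to get 5× overall?
$$\frac{100}{5} = 20 = \frac{80}{n} + 20 \;\Rightarrow\; \frac{80}{n} = 0$$
Can't be done! With Fm = 0.8, even Sm → ∞ only gives S = 1 / (1 − 0.8) = 5, so 5× is approached in the limit but never reached.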
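The example can be solved for n directly, working in seconds as the slide does. A sketch (the function name is hypothetical) that returns -1.0 when no finite improvement reaches the target:

```c
/* t_total: original execution time; t_improved: the part that can be
   sped up (e.g. multiply); s_target: desired overall speedup.
   Solves t_total / s_target = t_improved / n + (t_total - t_improved) for n.
   Returns -1.0 when no finite n can reach the target. */
double required_improvement(double t_total, double t_improved, double s_target)
{
    double t_target = t_total / s_target;        /* e.g. 100 / 5 = 20 s  */
    double t_unaffected = t_total - t_improved;  /* e.g. 100 - 80 = 20 s */
    double budget = t_target - t_unaffected;     /* time left for t_improved / n */
    if (budget <= 0.0)
        return -1.0;  /* unaffected time alone already meets or exceeds the target */
    return t_improved / budget;
}
```

required_improvement(100.0, 80.0, 5.0) returns -1.0, matching "Can't be done!", while a 4× overall target needs n = 16 (80/16 + 20 = 25 s, and 100/25 = 4).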
Concluding Remarks
Cost/performance is improving
Due to underlying technology development