Computer Hardware Engineering: IS1200, Spring 2015
David Broman
Associate Professor, KTH Royal Institute of Technology
Assistant Research Engineer, University of California, Berkeley
Course Structure
[Figure: course structure overview. Surviving labels: L3, L4, E2, E3, E4, E9, Lab: nios2int, Lab: nios2time, Home lab: C.]
Part I Part II
David Broman Multiprocessors, Parallelism, Instruction-Level
dbro@kth.se Concurrency, and Speedup Parallelism
Application Software
Software
Operating System
Microarchitecture
Digital Circuits
Analog Circuits
Analog Design and Physics
Devices and Physics
Agenda
Part I: Multiprocessors, Parallelism, Concurrency, and Speedup
Part II: Instruction-Level Parallelism
Part I
Multiprocessors, Parallelism,
Concurrency, and Speedup
Acknowledgement: The structure and several of the good examples are derived from the book
“Computer Organization and Design” (2014) by David A. Patterson and John L. Hennessy
Moore’s law:
• Integrated circuit resources (transistors)
double every 18-24 months.
What is a multiprocessor?
A multiprocessor is a computer system with two or more processors.
By contrast, a computer with one processor is called a uniprocessor.
Why multiprocessors?
Parallelism is about doing (executing) many things at the same time. Parallelism may be viewed from the hardware viewpoint.

Software (sequential or concurrent) versus hardware (serial or parallel):
• Sequential software on serial hardware. Example: matrix multiplication on a unicore processor.
• Concurrent software on serial hardware. Example: a Linux OS running on a unicore processor.
• Sequential software on parallel hardware. Example: matrix multiplication on a multicore processor.
• Concurrent software on parallel hardware. Example: a Linux OS running on a multicore processor.

Note: As always, not everybody agrees on the definitions of concurrency and parallelism. The matrix is from Patterson & Hennessy (2014), and the informal definitions above are similar to what was said in a talk by Rob Pike.
Speedup
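How much can we improve the performance using parallelization?

\[
\text{Speedup} = \frac{T_{\text{before}}}{T_{\text{after}}}
\]

Here T_before is the execution time of one program before improvement, and T_after is the execution time after improvement.

[Figure: speedup plot. Linear speedup (or ideal speedup) is the reference line. Superlinear speedup is either wrong, or due to e.g. cache effects.]

\[
T_{\text{before}} = T_{\text{affected}} + T_{\text{unaffected}}
\]

\[
\text{Speedup} = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{T_{\text{before}}}{\dfrac{T_{\text{affected}}}{N} + T_{\text{unaffected}}}
\]

This is sometimes referred to as Amdahl’s law.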
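As a worked consequence of the formula above, letting the number of processors N grow without bound shows the limit on the achievable speedup. The 90%/10% split used in the example is an illustrative assumption, not a figure from the slides.

\[
\lim_{N \to \infty} \frac{T_{\text{before}}}{\dfrac{T_{\text{affected}}}{N} + T_{\text{unaffected}}} = \frac{T_{\text{before}}}{T_{\text{unaffected}}}
\]

For example, if T_affected = 0.9 T_before and T_unaffected = 0.1 T_before, the speedup can never exceed 1 / 0.1 = 10, regardless of N.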
\[
\text{Speedup} = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{T_{\text{before}}}{\dfrac{T_{\text{affected}}}{N} + T_{\text{unaffected}}}
\]

Speedup by number of processors and size of matrices:

Size of matrices    10 processors    40 processors
10x10               5.5              8.8
20x20               8.2              20.5

But was not the maximal speedup 11.1 when N → infinity?

Strong scaling = measuring speedup while keeping the problem size fixed.
Weak scaling = measuring speedup when the problem size grows proportionally to the increased number of processors.
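The speedup figures for the two matrix sizes can be reproduced with a short calculation. This is a minimal sketch, assuming the workload of the corresponding Patterson and Hennessy example: 10 scalar additions that run sequentially, plus one addition per matrix element that parallelizes perfectly. The function names and the constant are hypothetical and chosen only for illustration.

```python
def amdahl_speedup(t_affected, t_unaffected, n):
    """Speedup = T_before / (T_affected / N + T_unaffected)."""
    t_before = t_affected + t_unaffected
    t_after = t_affected / n + t_unaffected
    return t_before / t_after


# Assumption: 10 sequential scalar additions (the unaffected part).
SEQUENTIAL_ADDITIONS = 10


def matrix_sum_speedup(side, processors):
    """Speedup when the side*side element additions are spread over the processors."""
    return amdahl_speedup(side * side, SEQUENTIAL_ADDITIONS, processors)


if __name__ == "__main__":
    for side in (10, 20):
        for processors in (10, 40):
            speedup = matrix_sum_speedup(side, processors)
            print(f"{side}x{side}, {processors} processors: {speedup:.1f}")
```

With a fixed matrix size the sequential part quickly dominates, whereas growing the matrix from 10x10 to 20x20 lets 40 processors pay off far better.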
[Figure: classification table with an "Instruction Stream" axis. Surviving labels: SISD; "No examples today"; "E.g. Intel Core i7"; "cluster computers".]
Physical Q/A: What is a modern Intel CPU, such as Core i7? Stand for MIMD, on the table for SIMD.
Part II
Instruction-Level Parallelism
Acknowledgement: The structure and several of the good examples are derived from the book
“Computer Organization and Design” (2014) by David A. Patterson and John L. Hennessy
2. Multiple issue
A technique where multiple instructions are issued in each clock cycle.
ILP may decrease the CPI (cycles per instruction) to lower than 1 or,
using the inverse metric, instructions per clock cycle (IPC),
increase the IPC above 1.
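The relation between CPI and IPC can be made concrete with a small calculation. This is a minimal sketch with hypothetical numbers: an ideal dual-issue processor that sustains a CPI of 0.5, a program of one billion instructions, and a 1 GHz clock are all assumptions for illustration. Real processors rarely reach the ideal because of dependencies and hazards.

```python
def ipc_from_cpi(cpi):
    """IPC is the inverse of CPI."""
    return 1.0 / cpi


def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    instructions = 1_000_000_000  # hypothetical program size
    clock_rate = 1e9              # hypothetical 1 GHz clock

    for label, cpi in (("single issue", 1.0), ("ideal dual issue", 0.5)):
        ipc = ipc_from_cpi(cpi)
        seconds = execution_time(instructions, cpi, clock_rate)
        print(f"{label}: CPI={cpi}, IPC={ipc}, time={seconds} s")
```

Halving the CPI at the same clock rate halves the execution time, which is why multiple issue is attractive even without a faster clock.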
[Figure: processor datapath diagram. Surviving labels: ALU, Instruction Memory, Data Memory, PCnext, A2, A3, RD2, WD3, WD, Sign Extend, <<2, instruction fields 20:16 and 15:11.]
Bonus Part
Time-Aware Systems Design
Research Challenges
David Broman @ KTH
(not part of the Examination in IS1200)
• Physical simulations (Simulink, Modelica, etc.)
• Time-stamped distributed systems (e.g. Google Spanner)
Minimize the slack
No point in making the execution time shorter, as long as the deadline is met.

Objective: Minimize area, memory, energy.
Challenge: Still guarantee to meet all timing constraints.

[Figure: timeline showing a task, its slack, and the deadline.]
Programming
Model
You are
• ambitious and interested in learning new things
You want to
• do a real research project as part of your Bachelor’s or Master’s
thesis project
Please send an email to dbro@kth.se, so that we can discuss
some ideas.
Summary