Advanced Computer Systems Architecture Lect-1

This document provides an overview of an Advanced Computer Systems Architecture course. It introduces the course instructor, topics to be covered including computer arithmetic, instruction level parallelism, and memory organization. It also lists required textbooks and the grading breakdown. Finally, it gives a broad introduction to computer architecture, trends in technology, and different classes of computers and parallelism techniques.


Advanced Computer Systems

Architecture
Course Teacher: Dr.-Ing. Shehzad Hasan
CIS, NED University

Lecture # 1

Fall Semester 2015 CS-506 ACSA 1


Topics
• Introduction
• Computer Arithmetic
• Instruction Level Parallelism
• Memory Organization
• Data Level Parallelism
• Recent Architectures



Books
• Computer Architecture – A Quantitative Approach
J.L. Hennessy & D.A. Patterson
Morgan Kaufmann, 2012, 5th Edition
• Computer Arithmetic: Algorithms and Hardware Design
Behrooz Parhami
Oxford University Press, New York, 2010, 2nd Edition
• Advanced Computer Architecture
Kai Hwang & Jotwani
McGraw Hill, 2010



Grading

• Midterm 25%
– 9th Lecture i.e. 29th September (if everything goes fine)

• Quizzes (surprise) 1-2 05%


• Class Participation / Presentations 05%
• Assignments 05%

• Final 60%



Computer Architecture
• “Old” view of computer architecture:
– Instruction Set Architecture (ISA) design
– i.e. decisions regarding:
• registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding



Computer Architecture
• “Real” computer architecture:
– Determine what attributes are important for a new
computer, then design a computer to maximize
performance and energy efficiency while staying
within cost, power, and availability constraints.

– Includes Instruction Set Design, logic design, IC design, packaging, power and cooling.

– Optimization requires familiarity with
• Compilers
• Operating Systems
• Logic Design
• Packaging



Classes of Computers
• Personal Mobile Device (PMD)
– Web-based and Media oriented
applications
– Battery powered
– No fan for cooling
• e.g. smart phones, tablet computers

Emphasis on
– Energy efficiency
– Size (code size, and hardware)
– Cost (less expensive packaging,
optimized memory)
– Responsiveness
• Real-time constraints (Soft*)
– Weight

* possible to occasionally miss the time constraint on an event, as long as not too many are missed



Classes of Computers
• Desktop Computing
– Largest market in terms of revenues
• e.g. low-end netbooks, laptops, PCs, heavily configured workstations
– Well characterized in terms of applications and benchmarking
– For the last few years, about half of the desktop computers made each year have been battery-operated laptops
Emphasis on
– Performance
– Cost
• Price-performance combination
– Response time



Classes of Computers
• Servers
– Large scale and reliable file and computing
services
– Replacement of traditional mainframes
Emphasis on
– Availability (24/7 operational)
– Scalability
• As demand increases, the computing capacity, memory, storage, and I/O bandwidth of the server should increase accordingly.
– Throughput
• Responsiveness to an individual request is
important but efficiency is determined by
how many requests can be handled in a unit
time.



Classes of Computers
• Clusters
– Used for “Software as a Service (SaaS)”
• e.g. Search, Social Networking, Online shopping, Multiplayer games, file
sharing etc.
– Clusters are collection of desktop computers or servers connected by
local area networks to act as a single larger computer
– When tens of thousands of servers act as one, the clusters are called
warehouse-scale computers
Emphasis on
– Availability
– Price
– Performance
– Power
– Internet Bandwidth



Classes of Computers
• Embedded Computers
– Embedded systems are information processing systems that are
embedded into a larger product.
• Products like cars, trains, planes, medical equipment, buildings,
telecommunications
– PMDs are examples of embedded computers, but they can run externally
developed software and so share some characteristics with desktop
computers
– Wide variety of processors (general purpose, special purpose, single
purpose) and wide variety of applications
Emphasis on
– Power and Energy
– Cost
– Size
– Performance at minimum price
(rather than higher performance at higher price)
– Real-time constraints (hard and soft)



Single Processor Performance



Parallelism
• Cannot continue to leverage instruction-level parallelism (ILP) alone
– Single-processor performance improvement ended in 2003

• New models for performance:
– Data-level parallelism (DLP)
– Thread-level parallelism (TLP)
– Request-level parallelism (RLP)

• These require explicit restructuring of the application



Parallelism
• Classes of architectural parallelism:

1) Instruction-Level Parallelism (ILP)


• Exploits parallelism with the compiler’s help, using ideas like pipelining
and speculative execution

2) Vector architectures/Graphic Processor Units (GPUs)


• Exploits data-level parallelism by applying a single instruction to a
collection of data in parallel



Parallelism

3) Thread-Level Parallelism
• Exploits either data-level or task-level parallelism in a tightly
coupled hardware model that allows for interaction among
parallel threads

4) Request-Level Parallelism
• Exploits parallelism among largely decoupled tasks specified by
the programmer or operating system



Flynn’s Taxonomy
• Single instruction stream, single data stream (SISD)
– Uniprocessor
– Standard sequential computer but can exploit instruction-level parallelism

• Single instruction stream, multiple data streams (SIMD)


– Same instruction is executed by multiple processors using different data
– Separate data memories, but single instruction memory and control processor
– Exploits data-level parallelism
– Vector architectures, Multimedia extensions, Graphics processor units

• Multiple instruction streams, single data stream (MISD)


– No commercial implementation

• Multiple instruction streams, multiple data streams (MIMD)


– Each processor fetches its own instruction and operates on its own data
– Targets task-level parallelism, can also exploit data-level parallelism but with
more overhead than SIMD computers
– Tightly-coupled MIMD – exploits thread-level parallelism (high communication
and synchronization)
– Loosely-coupled MIMD – exploits request-level parallelism



Trends in Technology
• Integrated circuit technology
– Transistor density increase: 35%/year
– Die size increase: less predictable and slower, about 10-20%/year
– Overall growth rate of transistor count on a chip: 40-55%/year,
or doubling every 18 to 24 months (Moore’s Law)
• DRAM capacity: 25-40%/year (slowing)
– Doubling every 2 to 3 years and perhaps will stop due to difficulty in
manufacturing smaller DRAM cells
• Flash capacity: 50-60%/year
– Also doubling every two years
– 15-20X cheaper/bit than DRAM
• Magnetic disk technology: 40%/year
– 15-25X cheaper/bit than Flash
– 300-500X cheaper/bit than DRAM
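These compound annual growth rates translate directly into doubling times; a short sketch (using the transistor-count rates quoted above) confirms the 18-24 month Moore's Law figure:

```python
import math

def doubling_time(annual_growth_rate):
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

# Transistor count growing 40-55%/year doubles roughly every 1.6-2.1 years,
# consistent with the "doubling every 18 to 24 months" figure above.
print(round(doubling_time(0.40), 2))  # 2.06
print(round(doubling_time(0.55), 2))  # 1.58
```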



Throughput and Latency
• Bandwidth or throughput
– Total work done in a given time
– 10,000-25,000X improvement for processors
– 300-1200X improvement for memory and disks

• Latency or response time


– Time between the start and the completion of an event
– 30-80X improvement for processors
– 6-8X improvement for memory and disks



Throughput and Latency

Log-log plot of bandwidth and latency milestones over the last 25 to 40 years.



Transistors and wires
• Feature size
– Minimum size of transistor or a wire in either x or
y dimension
– 10 microns in 1971 to 14 nm in 2015 (700x)
– The density of transistors increases quadratically
with decrease in feature size
– Transistor performance improves linearly with
decrease in feature size but
• Wire delay does not improve with reduced feature size,
because resistance per unit length increases as dimensions shrink:

R = ρL / (W × H)

(ρ: resistivity; L: wire length; W: wire width; H: wire height)
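A numeric sketch of this scaling argument (the copper resistivity is real; the wire dimensions are made-up illustrative values): if a wire's length, width, and height all shrink by the same factor s, its resistance grows by 1/s.

```python
def wire_resistance(rho, L, W, H):
    """Wire resistance R = rho * L / (W * H)."""
    return rho * L / (W * H)

rho = 1.7e-8          # resistivity of copper, ohm-metres
s = 0.7               # one process generation: dimensions shrink to ~0.7x

R_old = wire_resistance(rho, L=1e-3, W=100e-9, H=200e-9)
R_new = wire_resistance(rho, L=1e-3 * s, W=100e-9 * s, H=200e-9 * s)

print(R_new / R_old)  # ~1.43, i.e. 1/s: the scaled-down wire is more resistive
```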



Dynamic Energy and Power
• Dynamic energy
– Energy of a full pulse of a logic transition (0 → 1 → 0 or 1 → 0 → 1):
Energy_dynamic ∝ Capacitive load × Voltage²
– For a single transition (0 → 1 or 1 → 0):
Energy_dynamic ∝ ½ × Capacitive load × Voltage²

• Dynamic power
Power_dynamic ∝ ½ × Capacitive load × Voltage² × Frequency

• For a fixed task, slowing clock rate reduces power but not energy

• If two processors are to be compared for efficiency, the energy
consumption for executing the task should be considered, not the
power consumption
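The "power but not energy" point can be checked with a toy calculation (capacitance, voltage, and cycle count are arbitrary illustrative values):

```python
def dynamic_power(C, V, f):
    """Dynamic power, treating the proportionality as equality: 0.5 * C * V**2 * f."""
    return 0.5 * C * V**2 * f

C, V = 1e-9, 1.0   # illustrative capacitance (F) and supply voltage (V)
cycles = 1e9       # a fixed task = a fixed number of clock cycles

p_fast = dynamic_power(C, V, f=2e9)   # power at 2 GHz
p_slow = dynamic_power(C, V, f=1e9)   # power at 1 GHz: half of p_fast

e_fast = p_fast * (cycles / 2e9)      # energy = power x execution time
e_slow = p_slow * (cycles / 1e9)      # the task takes twice as long at 1 GHz

# Halving the clock halves power, but energy for the fixed task is unchanged.
print(p_slow / p_fast, e_slow / e_fast)  # 0.5 1.0
```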



Dynamic Energy and Power
• Microprocessors today, particularly for embedded applications, are designed
with a supply voltage that can be adjusted dynamically, a technique known as
dynamic voltage scaling (DVS). As the voltage is scaled down, the operating
frequency must also be decreased, so a 15% reduction in voltage may
result in a 15% reduction in frequency. What would be the impact on
dynamic energy and on dynamic power?

• Dynamic energy
– Since the capacitance is unchanged, the energy scales with the square of
the voltage ratio: (0.85)² ≈ 0.72, about a 28% reduction in energy

• Dynamic power
– We multiply by the ratio of the frequencies too:
(0.85)² × 0.85 ≈ 0.61, about a 39% reduction in power
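The impact can be computed directly (a sketch of the 15% scenario above):

```python
v_ratio = 0.85  # voltage reduced by 15%
f_ratio = 0.85  # frequency must drop by 15% as well

energy_ratio = v_ratio ** 2            # energy ~ C * V^2, capacitance unchanged
power_ratio = v_ratio ** 2 * f_ratio   # power ~ 0.5 * C * V^2 * f

print(energy_ratio)  # ~0.72: about a 28% energy reduction
print(power_ratio)   # ~0.61: about a 39% power reduction
```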



Power
• Intel 80386 consumed ~2 W
• A 3.3 GHz Intel Core i7 consumes 130 W
• Heat must be dissipated from a 1.5 × 1.5 cm chip
• This is the limit of what can be cooled by air



Measuring Performance
• Typical performance metrics:
– Response time, also known as execution time
– Throughput

• Speedup of X relative to Y
– The phrase “X is n times faster than Y” means that the response time or
execution time is lower on X than on Y:

n = Execution time_Y / Execution time_X

– Since execution time is the reciprocal of performance, the following relationship holds:

n = Execution time_Y / Execution time_X = Performance_X / Performance_Y
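For instance, with hypothetical execution times of 15 s on machine Y and 10 s on machine X:

```python
def speedup(exec_time_y, exec_time_x):
    """Speedup n of X relative to Y: n = time_Y / time_X = perf_X / perf_Y."""
    return exec_time_y / exec_time_x

n = speedup(15.0, 10.0)
print(n)  # 1.5 -> "X is 1.5 times faster than Y"
```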



Measuring Performance
• What actually is Execution time?

– Wall clock time: includes all system overheads to complete a task,
e.g. disk accesses, memory accesses, input-output activities,
operating system overhead, etc.

– CPU time: only the computation time
In multiprogramming, the processor works on another program while
waiting for I/O, so CPU time means the time the processor spends
computing, not including the time spent waiting for I/O or running
other programs.



Improving Performance
• Take Advantage of Parallelism
– Use
• multiple processors
• disks
• memory banks
• multiple functional units
– Pipelining
• Multiple instructions are
overlapped in execution.



Improving Performance
• Principle of Locality
– Programs tend to reuse data and instructions they have
used recently
– 90-10 rule of thumb
• 90% of the time 10% of the code is running
– We can predict with reasonable accuracy what instructions
and data a program will use in near future
– Temporal Locality
• Recently accessed items are likely to be accessed in near future
– Spatial Locality
• Items whose addresses are near one another tend to be
referenced close together in time
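A minimal sketch of the two access patterns (array sizes are arbitrary; plain Python lists blunt the cache effect that contiguous arrays in C or NumPy would show, so this only illustrates the traversal orders):

```python
# Build a rows x cols table; element (i, j) holds the value i*cols + j.
rows, cols = 200, 200
a = [[i * cols + j for j in range(cols)] for i in range(rows)]

# Row-major traversal: the inner loop visits neighbouring elements
# (good spatial locality in a contiguously stored array).
row_major_sum = sum(a[i][j] for i in range(rows) for j in range(cols))

# Column-major traversal: the inner loop strides across rows
# (poor spatial locality - each access lands far from the previous one).
col_major_sum = sum(a[i][j] for j in range(cols) for i in range(rows))

print(row_major_sum == col_major_sum)  # True: same result, different access order
```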



Improving Performance
• Focus on the Common Case
– In making a design tradeoff, favour the frequent case over
the infrequent case
• e.g. if the decode unit of a processor is used more frequently than a
multiplier unit, optimize the decode unit first.
• Any other common example?
• Amdahl’s Law
– Performance improvement to be gained from using some faster
mode of execution is limited by the fraction of time the faster
mode can be used

Speedup = (Execution time for entire task without using the enhancement) /
(Execution time for entire task using the enhancement when possible)



Improving Performance
• Amdahl’s law
– Fraction Enhanced:
• The fraction of the computation time in the original computer that
can be converted to take advantage of the enhancement
• Always less than or equal to 1
– Speedup Enhanced:
• The improvement gained by the enhanced execution mode
• Always greater than 1



Amdahl’s Law
• A new processor is designed which is 10 times faster on
computation in the web serving application than the original
processor. Assume that the processor is busy with computation
40% of the time and is waiting for I/O 60% of the time. What is
the overall speedup gained by incorporating the enhancement?

– Fraction_enhanced = 0.4
– Speedup_enhanced = 10

– Speedup_overall = 1 / (0.6 + 0.4/10) = 1 / 0.64 ≈ 1.56
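A small helper wrapping Amdahl's Law checks the slide's result:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

overall = amdahl_speedup(0.4, 10)
print(round(overall, 4))  # 1.5625, i.e. ~1.56
```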



Do it yourself
• Assume an enhancement is made to a computer that improves some
mode of execution by a factor of 10. Enhanced mode is used 50% of
the time, measured as a percentage of the execution time when the
enhanced computer is in use.
– A) What is the speedup obtained from fast mode?
– B) What percentage of the original execution time has been converted
to fast mode? Or what is the Fractionenhanced?
Remember: Amdahl’s law depends on the fraction of the original,
unenhanced execution time that could make use of the enhanced mode, so
we cannot directly use this 50% measurement to compute the speedup with
Amdahl’s law.

