Chapter 2-Part 12 1

Distributed Architecture &
Intensive Computing
Presented by Dr. A. Djenadi
CHAPTER 2: COMPREHENSIVE PERFORMANCE
ASSESSMENT ACROSS ARCHITECTURES
Chapter objectives
The knowledge provided in this chapter will prove valuable to you, whether you are tasked with choosing a
new system or aiming to enhance the performance of an existing one.
Additionally, the chapter explores various factors influencing performance.
By the end of this chapter, you will have a clear understanding of what to examine in system tuning
reports and how each piece of information contributes to the broader perspective of overall system
performance.
Introduction
The word architecture covers all three aspects of computer design: Software, Instruction set architecture,
and hardware.
Optimization targets:
• Software: programming language, compiler
• Instruction set architecture (ISA)
• Hardware: microarchitecture, transistor
Introduction
Computer architects must design a computer considering the following aspects:
• Functional requirements
• Trends in technology
• Performance
• Power
• Availability
• Price/cost goals
Functional requirements
Definition: This refers to the intended functionality and capabilities of the computer system.
• Application area:
o Personal mobile device (Real-time performance, graphics, videos and audio, energy efficiency.)
o Desktop computer (Real-time performance, graphics, videos and audio)
o Servers (Support for databases and transaction processing; enhancements for reliability and
availability; support for scalability).
o Cluster computers (Throughput performance for many independent tasks; error correction for
memory; energy proportionality)
o Internet of things / Embedded computing (Special support for graphics or video, or other
application-specific extensions; power limitations and power control may be required; real-time
constraints)
Functional requirements
• Level of software compatibility:
• Operating system requirements: necessary features to support the chosen OS
• Standards: certain standards may be required by the marketplace
• Floating point: format and arithmetic (IEEE 754 standard, special arithmetic for graphics or signal
processing)
• I/O interfaces: for I/O devices (Serial ATA, Serial Attached SCSI, PCI Express)
• Networks: support required for different networks (Ethernet)
• Programming languages: languages (ANSI C, C++, Java, Fortran) affect the instruction set
Trends in Technology
Computer architects must stay up to date with rapidly changing implementation technologies, including:
• Integrated circuit logic technology: transistor density and die size both increase over time. However,
this increase does not follow Moore’s law.
• Semiconductor DRAM (dynamic random-access memory).
• Semiconductor Flash (electrically erasable programmable read-only memory). This nonvolatile
semiconductor memory is the standard storage device in personal mobile devices (PMDs).
• Magnetic disk technology.
• Network technology.
Performance measurement and analysis
Question 1: What does it mean when we say that computer X has better performance than computer Y?
Answer 1: Computer X is faster than computer Y.
Question 2: What does it mean for computer X to be faster than computer Y?
Answer 2: It depends on the perspective of the user and on both external and internal considerations of
the machine.
The user of a desktop computer may say a computer is faster when a program runs in less time, while a
computer center administrator may say a computer is faster when it completes more transactions per
hour.
• The desktop computer user wants to reduce the response time (execution time) which is defined as the
time between the start and the completion of an event.
• The administrator wants to increase throughput, which is defined as the total amount of
work done in a given time.
• In both cases, the metric used to assess performance is time.
Important: The primary, consistent, and reliable measure of performance is the execution time of
real programs.
Time & computer: The clock system
The actions carried out by a processor, such as retrieving an instruction, interpreting the instruction, loading
and storing data and executing arithmetic operations, are controlled by a system clock.
Typically, all operations begin with the pulse of the clock.
At the most fundamental level, the speed of a processor is dictated by the pulse frequency produced by the
clock, measured in cycles per second, or Hertz (Hz).
Clock signal generation
[Diagram: quartz crystal → analog-to-digital conversion]
Example 1: A 1-GHz processor receives 1 billion pulses per second.
• The rate of pulses is known as the clock rate, or clock speed (frequency).
• One increment, or pulse, of the clock is referred to as a clock tick.
• The time between pulses is the cycle time, also called the clock period, clock, or cycle.
CPU time (Execution time): The Processor Performance Equation
CPU time (execution time) for a program can be expressed in seconds in two ways:
$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time (period)}$$

$$\text{CPU time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$
• CPU Time (execution time): This is the total time the CPU spends executing a specific program. It is
often measured in seconds.
• CPU Clock Cycles for a Program: This refers to the number of clock cycles the CPU takes to execute
all the instructions in the program.
• Clock Cycle Time (period): This is the duration of a single clock cycle, measured in seconds. It
represents the time it takes for the CPU to complete one clock cycle.
• Clock rate: This is the clock frequency (the number of clock cycles per second).
If we know the number of clock cycles and the instruction count (IC), we can calculate the average
number of clock cycles per instruction (CPI).
$$\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}$$

Thus, we can use the CPI in the execution time formula (CPU time):

$$\text{CPU time} = \text{Instruction count} \times \text{Cycles per instruction} \times \text{Clock cycle time}$$

$$\text{Execution time} = IC \times CPI \times T_{\text{period}}$$
Example 2:
• A program P1 consists of 30 instructions.
• Clock frequency = 1 GHz.
• Number of cycles per instruction = 3 cycles.
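$$\text{Cycle time} = \frac{1}{1\ \text{GHz}} = \frac{1}{1000\ \text{MHz}} = 0.001\ \mu\text{s} = 1\ \text{ns}$$

$$\text{CPU time for P1} = \text{Execution time for P1} = 30 \times 3 \times 1\ \text{ns} = 90\ \text{ns}$$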
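The calculation in Example 2 can be sketched in a few lines of Python. This is only an illustration of the formula CPU time = IC × CPI × clock cycle time; the function name `cpu_time_seconds` and its parameter names are chosen here for readability and do not come from the slides.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """Return CPU time in seconds: IC x CPI x clock cycle time."""
    # The clock cycle time (period) is the reciprocal of the clock rate.
    cycle_time_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * cycle_time_s


# Program P1 from Example 2: 30 instructions, 3 cycles each, 1 GHz clock.
p1_time_s = cpu_time_seconds(instruction_count=30, cpi=3, clock_rate_hz=1e9)
print(f"{p1_time_s * 1e9:.0f} ns")  # prints "90 ns"
```

Halving the clock cycle time, the CPI, or the instruction count each halves the result, which is why the three factors are treated as separate optimization targets.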
Expressing the initial formula in terms of units of measurement illustrates the integration of its components:
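$$\frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time}$$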
As this formula demonstrates, processor performance is dependent upon three characteristics:

• Clock cycle time: Hardware technology and organization
• Clock cycles per instruction (CPI): Organization and instruction set architecture
• Instruction count: Instruction set architecture and compiler technology
Remarks:
• Executing an instruction involves multiple steps, such as retrieving it from memory, decoding, and
performing operations. Thus, most instructions on most processors require multiple clock cycles to
complete. Some instructions may take only a few cycles, while others require dozens.
• On any given processor, the number of clock cycles required varies for different types of instructions, such
as load, store, branch, and so on.
• A straight comparison of clock speeds on different processors does not tell the whole story about
performance.
Thus, the previous formulas become:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} IC_i \times CPI_i$$

where $IC_i$ represents the number of times instruction $i$ is executed in a program and $CPI_i$ represents the
average number of clock cycles per instruction for instruction $i$.
Thus, the CPU time (execution time) formula becomes:

$$\text{CPU time} = \left( \sum_{i=1}^{n} IC_i \times CPI_i \right) \times \text{Clock cycle time}$$
The overall CPI or the global CPI:
$$\text{Global CPI} = \frac{\sum_{i=1}^{n} IC_i \times CPI_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction count}} \times CPI_i$$

The overall version of the CPI calculation considers each specific $CPI_i$ and its frequency in a program (i.e.,
$IC_i \div \text{Instruction count}$).
Because it must include pipeline effects, cache misses, and any other memory system inefficiencies, $CPI_i$
should be measured and not just calculated from a table in the back of a reference manual.
Example 3: Suppose we made the following measurements:
• Frequency of floating point (FP) operations: 25%
• Average CPI of FP operations: 4 cycles
• Average CPI of other instructions: 1.33 cycles
What is the global CPI?

$$\text{Global CPI} = 0.25 \times 4 + 0.75 \times 1.33 \approx 2\ \text{cycles}$$
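The weighted sum in Example 3 can be checked with a short Python sketch. The helper `global_cpi` and the (frequency, CPI) pair layout are assumptions made for this illustration; the measured values are the ones given in the example.

```python
def global_cpi(instruction_mix):
    """Return the overall CPI from (frequency, cpi) pairs.

    Each frequency is IC_i / instruction count, so the frequencies must sum to 1.
    """
    total_frequency = sum(freq for freq, _ in instruction_mix)
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cpi for freq, cpi in instruction_mix)


# Example 3: 25% floating-point operations at CPI 4, 75% other instructions at CPI 1.33.
cpi = global_cpi([(0.25, 4.0), (0.75, 1.33)])
print(round(cpi, 2))  # about 2 cycles per instruction
```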
Performance comparison
We often compare the performance of two different computers, X and Y, by using the assessment “X is faster
than Y”, which means that execution time is lower on X than on Y for the given task.
In particular, “X is n times as fast as Y” will mean:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

Because execution time is the reciprocal of performance, we have the following relationship:

$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1 / \text{Performance}_Y}{1 / \text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$
The execution time can be replaced by the throughput metric to compare the performance of X and Y
in terms of the amount of work done in a given time. Because higher throughput means better performance,
the ratio is taken with X in the numerator:

$$n = \frac{\text{Throughput}_X}{\text{Throughput}_Y}$$

Example:
“The throughput of X is 5.2 times that of Y” means that the number of tasks completed per unit time
on computer X is 5.2 times the number completed on Y.
Remarks:
➢ Execution time is expressed in seconds. It may or may not include instruction processing, memory access,
I/O, interrupts, and operating system overhead.
➢ Throughput is expressed as the number of instructions per second (for a processor), the number of
queries processed per hour (for a server), MIPS (Million Instructions Per Second), or MFLOPS (Million
Floating-point Operations Per Second).

$$\text{MIPS} = \frac{IC}{\text{CPU time (execution time)} \times 10^6} = \frac{\text{Clock frequency}}{CPI \times 10^6}$$

$$\text{MFLOPS} = \frac{\text{Number of executed floating-point operations in a program}}{\text{Execution time} \times 10^6}$$
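As a rough sketch, the two forms of the MIPS formula and the MFLOPS formula translate directly into Python. The function names and the workload figures below (a 1 GHz clock, a CPI of 2, 50 million instructions, 10 million floating-point operations) are hypothetical values chosen only to show that both MIPS forms agree.

```python
def mips_from_time(instruction_count, cpu_time_s):
    """MIPS = IC / (CPU time x 10^6)."""
    return instruction_count / (cpu_time_s * 1e6)


def mips_from_clock(clock_rate_hz, cpi):
    """MIPS = clock frequency / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mflops(fp_operations, execution_time_s):
    """MFLOPS = executed floating-point operations / (execution time x 10^6)."""
    return fp_operations / (execution_time_s * 1e6)


# Hypothetical workload, not taken from the slides.
instruction_count = 50_000_000
cpi = 2.0
clock_rate_hz = 1e9
cpu_time_s = instruction_count * cpi / clock_rate_hz  # IC x CPI / clock rate

print(mips_from_time(instruction_count, cpu_time_s))
print(mips_from_clock(clock_rate_hz, cpi))
print(mflops(10_000_000, cpu_time_s))
```

The first two printed values should match, since substituting CPU time = IC × CPI / clock rate into the first form yields the second.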
Benchmarks
Definition: Performance benchmarking involves objectively evaluating the performance of one system (e.g.
computer, software, component) in comparison to another.
Reliable benchmarks play a crucial role in cutting through marketing exaggerations and statistical
manipulations. In essence, effective benchmarks help pinpoint systems that deliver optimal performance at a
reasonable cost.
Benchmark types
• Kernels, which are small, key pieces of real applications.
• Toy programs, which are 100-line programs from beginning programming assignments, such
as Quicksort.
• Synthetic benchmarks, which are fake programs invented to imitate the behavior of real applications,
such as Dhrystone.
Flaws and limitations
• The compiler writer and architect can manipulate the test results by making the computer appear faster
on these surrogate programs than on real applications.
• The use of benchmark-specific compiler flags to improve the performance of a benchmark. These flags
often cause transformations that would be illegal on many programs or would slow down performance
on others.
• Modification of the source code of the benchmarks. Three policies exist:
o No modifications are allowed.
o Modifications are allowed but are essentially impossible to make (e.g., database benchmarks).
o Source modifications are allowed, as long as the altered version produces the same output.
Better benchmarking solution: benchmark suites
An accepted solution for performance assessment is the use of collections of benchmark applications, called
benchmark suites.
A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of
the other benchmarks.
SPEC: Standard Performance Evaluation Corporation
Among the most widely recognized standardized benchmark suites are those from SPEC (Standard
Performance Evaluation Corporation).
The first SPEC benchmark suite was developed in the late 1980s to benchmark workstations. Currently, there are
SPEC benchmarks covering many application classes. All the SPEC benchmark suites and their reported
results are found at http://www.spec.org.
SPEC: Standard Performance Evaluation Corporation
[Figure: Active benchmarks from SPEC as of 2017]
Reporting Performance Results
The key principle in reporting performance measurements is reproducibility: report everything that
another experimenter would need to replicate the results.
A SPEC benchmark report requires an extensive description of the computer and the compiler flags, as well
as the publication of both the baseline and the optimized results.
Alongside hardware, software, and baseline tuning details, a SPEC report includes performance times
displayed in tables and graphs.
SPEC results comparison: SPECRatio
SPEC normalizes execution times to a reference computer by dividing the time on the reference
computer by the time on the computer being rated, yielding a ratio proportional to performance. This ratio
is called the SPECRatio.
For example, suppose that the SPECRatio of computer A on a benchmark is 2.56 times that of computer
B; then we know

$$2.56 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\text{Execution time}_{\text{reference}} / \text{Execution time}_A}{\text{Execution time}_{\text{reference}} / \text{Execution time}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$
Note: The choice of the reference computer is irrelevant when the comparisons are made as a ratio.
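The note above can be illustrated with a small Python sketch. The execution times (100 s on A, 256 s on B) and the two reference times are hypothetical numbers chosen so that the ratio comes out near 2.56; `spec_ratio` is a name introduced here, not a SPEC tool.

```python
def spec_ratio(reference_time_s, measured_time_s):
    """SPECRatio = time on the reference computer / time on the rated computer."""
    return reference_time_s / measured_time_s


# Hypothetical execution times for one benchmark on computers A and B.
time_a_s = 100.0
time_b_s = 256.0

# Try two different (hypothetical) reference computers.
comparisons = []
for reference_time_s in (1000.0, 5000.0):
    ratio_a = spec_ratio(reference_time_s, time_a_s)
    ratio_b = spec_ratio(reference_time_s, time_b_s)
    comparisons.append(ratio_a / ratio_b)
    print(reference_time_s, ratio_a, ratio_b, ratio_a / ratio_b)
```

The individual SPECRatios change with the reference computer, but their quotient stays at time_b_s / time_a_s because the reference time cancels.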
After choosing a benchmark suite, the performance results of the suite are summarized in a single number:
the geometric mean of the SPECRatios of the programs in the suite.

$$\text{Geometric mean} = \sqrt[n]{\prod_{i=1}^{n} \text{sample}_i}$$

In the case of SPEC, $\text{sample}_i$ is the SPECRatio for program $i$.
Why use Geometric mean:
1. The geometric mean of the ratios is the same as the ratio of the geometric means.
2. The ratio of the geometric means is equal to the geometric mean of the performance ratios, which implies
that the choice of the reference computer is irrelevant.
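The first property can be checked numerically. The sketch below uses made-up SPECRatios for two computers on a three-program suite; `geometric_mean` is written out by hand here (via logarithms, which is one common way to compute it) so that the example stays self-contained.

```python
import math


def geometric_mean(samples):
    """Return the n-th root of the product of n positive samples."""
    # Summing logarithms avoids overflow when the product is large.
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# Hypothetical SPECRatios for computers A and B on the same three programs.
ratios_a = [10.0, 4.0, 2.5]
ratios_b = [5.0, 8.0, 1.0]

ratio_of_means = geometric_mean(ratios_a) / geometric_mean(ratios_b)
mean_of_ratios = geometric_mean([a / b for a, b in zip(ratios_a, ratios_b)])

print(ratio_of_means, mean_of_ratios)
print(math.isclose(ratio_of_means, mean_of_ratios))
```

The same check with an arithmetic mean would generally fail, which is one reason the geometric mean is preferred for summarizing normalized results.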
Performance enhancement: Amdahl’s Law
Objective: enhancing the performance by improving a portion of a computer.
Definition: Amdahl’s Law states that the performance improvement to be gained from using some faster
mode of execution is limited by the fraction of the time the faster mode can be used.
Speedup: Amdahl’s Law defines the speedup that can be gained by using a particular feature. Speedup is
the ratio given by:
$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$$

Or, as a function of the execution times:

$$\text{Speedup} = \frac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$$
Amdahl’s Law factors:
$\text{Fraction}_{\text{enhanced}}$: The fraction of the computation time in the original computer that can be converted to take
advantage of the enhancement. This value is always less than or equal to 1.
Example: If 20 seconds of the execution time of a program that takes 60 seconds in total can use an
enhancement, the fraction is 20/60.
$\text{Speedup}_{\text{enhanced}}$: The improvement gained by the enhanced execution mode. This value is the time of the
original mode over the time of the enhanced mode. This value is always greater than 1.
Example: If the enhanced mode takes 4 seconds for a portion of the program, while it takes 40 seconds in the
original mode, the improvement is 40/4 or 10.
The new enhanced execution time
The execution time using the original computer with the enhanced mode will be the time spent using the
unenhanced portion of the computer plus the time spent using the enhancement:
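$$\text{Execution time}_{\text{new}} = \text{Execution time}_{\text{old}} \times \left( (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right)$$

The overall speedup is given by:

$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$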
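A standard consequence of this formula, not stated on the slide but following directly from it, is an upper bound on the overall speedup. As the enhanced mode becomes arbitrarily fast, the enhanced term vanishes and only the unenhanced fraction remains:

$$\lim_{\text{Speedup}_{\text{enhanced}} \to \infty} \text{Speedup}_{\text{overall}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$

For instance, if only 40% of the execution time can be enhanced, the overall speedup can never exceed $1 / 0.6 \approx 1.67$, however fast the enhanced mode is.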
Example: Amdahl’s Law
Suppose that we want to enhance the processor used for web serving. The new processor is 10 times faster
on computation in the web serving application than the old processor. Assume that the original processor is
busy with computation 40% of the time and is waiting for I/O 60% of the time.
What is the overall speedup gained by incorporating the enhancement?

$$\text{Fraction}_{\text{enhanced}} = 0.4$$

$$\text{Speedup}_{\text{enhanced}} = 10$$

$$\text{Speedup}_{\text{overall}} = \frac{1}{0.6 + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56$$
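The same calculation can be sketched in Python. The function name `amdahl_speedup` and the input checks are additions made for this illustration; the numbers are those of the web-serving example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup according to Amdahl's Law."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # Unenhanced part runs at the old speed; enhanced part is divided by its speedup.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Web-serving example: 40% of the time is computation, made 10 times faster.
overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10)
print(round(overall, 2))  # 1.56
```

Trying a much larger `speedup_enhanced` with the same fraction shows the result approaching, but never reaching, 1 / 0.6, because the 60% I/O wait is untouched by the enhancement.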