UNIT1


Rapid Progress: Computers have advanced rapidly, offering more performance at lower costs. For example, today's $500 phones are as powerful as a $50 million computer from 1993.

Early Performance Growth: In the first 25 years, computer performance grew by 25% annually
due to technological advances and better designs.

Microprocessor Rise: The late 1970s introduced microprocessors, which improved performance by
35% each year and were mass-produced, making computers cheaper.

New Architecture: Two major changes helped new computer designs succeed:

• Less assembly language programming.
• Standardized operating systems like UNIX and Linux.

RISC Architecture: In the 1980s, RISC (Reduced Instruction Set Computer) architecture was developed. It simplified instructions and used techniques like:

• Instruction-level parallelism (running multiple instructions at once).
• Caches (for faster data access).

Increased Competition: RISC computers forced other designs to improve or become obsolete.

Complex Translations Simplified: By the late 1990s, translating more complex computer
instructions (like x86) became easier with more transistors.

ARM Architecture: In low-power devices (like phones), ARM’s RISC architecture became
dominant due to its efficiency in power and size.

Sustained Growth: From 1980 to 1997, computer performance grew by over 50% annually.

New Computer Classes: Affordable computers like personal computers and workstations appeared
due to improved cost-performance.

Moore’s Law: The prediction that transistor counts would double every two years drove
semiconductor manufacturing improvements, favoring microprocessor-based computers.

Multi-core Processors: By 2004, companies like Intel moved away from single, fast processors and
began using multiple efficient cores for better performance.

How Moore's Law Works:

1. Observation by Moore: Moore's Law is based on an observation by Gordon Moore in 1965, stating that the number of transistors on a microchip doubles periodically, initially predicted to be every year.

2. Adjustments: Over time, the interval was adjusted:

◦ First, it was changed to two years.
◦ Later, it became around 18 months for doubling transistor density.

3. Exponential Growth: Despite these adjustments, the growth of transistor density continued
at an exponential rate, driving innovation and opportunities in the semiconductor industry
for decades.

4. Impact: This continuous exponential growth in the number of transistors led to more
powerful, smaller, and cost-efficient chips, which fueled advancements in computing
technology.
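
To make the doubling arithmetic concrete, here is a minimal sketch in Python, assuming a simple exponential model. The 2,300-transistor starting point (roughly the first commercial microprocessor) and the 10-year horizon are illustrative choices, not figures from the text.

# Minimal sketch: projecting transistor counts under a simple doubling model.
def projected_transistors(initial_count, years, doubling_period_years):
    # The count doubles once per doubling period, so growth is 2**(years / period).
    return initial_count * 2 ** (years / doubling_period_years)

# Compare the original 1-year prediction, the revised 2-year interval,
# and the ~18-month figure over a 10-year span.
for period in (1.0, 2.0, 1.5):
    count = projected_transistors(2_300, years=10, doubling_period_years=period)
    print(f"doubling every {period} years -> about {count:,.0f} transistors after 10 years")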

Classes of Parallelism and Parallel Architectures (Simplified):

1. Two Kinds of Parallelism in Applications:

◦ Data-level parallelism (DLP): Multiple data items can be processed at the same time.
◦ Task-level parallelism (TLP): Independent tasks can run simultaneously.
2. Ways to Exploit Parallelism in Hardware:

◦ Instruction-level parallelism (ILP): Uses techniques like pipelining and speculative execution to exploit data-level parallelism.
◦ Vector architectures, GPUs: Apply the same instruction to multiple data items at once.
◦ Thread-level parallelism: Handles parallel threads that exploit either DLP or TLP (a short software sketch of DLP and TLP follows this list).
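
As a rough software analogy (not hardware, and assuming NumPy is installed), the sketch below shows data-level parallelism as one operation applied to a whole array and task-level parallelism as independent tasks submitted concurrently; the workloads are made up for illustration.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Data-level parallelism (DLP): the same operation applied to many data items.
# NumPy dispatches this to vectorized code, much like SIMD hardware would.
pixels = np.arange(1_000_000, dtype=np.float32)
brightened = pixels * 1.1               # one multiply, a million data elements

# Task-level parallelism (TLP): independent tasks that can run at the same time.
# (True parallel speedup for CPU-bound Python needs processes; threads suffice
# here to illustrate the idea of independent tasks.)
def word_count(text):
    return len(text.split())

def checksum(data):
    return sum(data) % 255

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(word_count, "independent tasks can run simultaneously")
    f2 = pool.submit(checksum, range(1_000))
    print(brightened[:3], f1.result(), f2.result())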

PERFORMANCE-BASED COMPUTING:

Flynn's Classification Explained:

1. SISD (Single Instruction Stream, Single Data Stream):


◦ How it works: A single processor executes one instruction at a time on a single data
stream.
◦ Example: Traditional sequential computers where only one operation happens at a
time.
◦ Parallelism: Can take advantage of instruction-level parallelism (ILP), which
means that even though it processes one instruction at a time, techniques like
pipelining (breaking tasks into smaller stages) allow multiple parts of the instruction
to be processed simultaneously.
2. SIMD (Single Instruction Stream, Multiple Data Streams):
◦ How it works: Multiple processors carry out the same instruction on different pieces
of data at the same time.
◦ Example: Graphics processing units (GPUs), where the same operation (like adding
brightness to pixels) is applied to many data elements (pixels) simultaneously.
◦ Parallelism: Data-level parallelism (DLP) is exploited, meaning multiple data
items are processed in parallel under the same operation.

3. MISD (Multiple Instruction Streams, Single Data Stream):


◦ How it works: Multiple processors execute different instructions on the same data.
This architecture has not been practically implemented.
◦ Example: There are no commercial computers using this model, but conceptually, it
could be used for systems needing high reliability where the same data is processed
by different methods for error checking.
◦ Parallelism: It's a theoretical model with no real-world applications.

4. MIMD (Multiple Instruction Streams, Multiple Data Streams):


◦ How it works: Multiple processors execute different instructions on different data at
the same time.
◦ Example: Modern multicore processors, where each core can run its own program
independently (e.g., a server handling different tasks like video streaming, database
queries, etc.).
◦ Parallelism: Task-level parallelism (TLP) is exploited because different tasks can
run in parallel on different processors. It can also exploit data-level parallelism
(DLP), but with higher overhead compared to SIMD because each processor runs
independently. It's more flexible but generally more expensive than SIMD due to the
complexity of coordinating independent tasks.
In summary, Flynn's classification defines how computers manage instruction and data streams in parallel. While SISD is the most basic form, SIMD and MIMD are widely used today for handling large-scale computations efficiently, particularly in areas like gaming, scientific simulations, and cloud computing.

THE MYOPIC VIEW OF COMPUTER ARCHITECTURE:

Instruction Set Architecture (ISA): The Basics

1. Definition of ISA: ISA acts as the interface between software and hardware. It defines how
the software communicates with the hardware, specifying the instructions a computer's
processor can execute.

2. ISA Examples:

◦ 80x86 (Intel architecture).


◦ ARMv8 (Advanced RISC Machine).
◦ RISC-V (an open-source RISC architecture).
3. Popular RISC Processors:

◦ ARM is the most widely used RISC processor, found in smartphones, tablets, and
many embedded systems.
◦ RISC-V is an open-source ISA, available for use in custom chips and field-
programmable gate arrays (FPGAs). It also has a full software ecosystem, including
compilers and operating systems.

4. ISA Classification:

◦ Most ISAs today are classified as general-purpose register architectures, meaning they use registers (small storage areas inside the CPU) or memory locations as operands.
◦ The 80x86 architecture has 16 general-purpose registers and 16 floating-point registers (for handling decimal calculations).
◦ RISC-V has 32 general-purpose registers and 32 floating-point registers, giving it more flexibility for calculations.

Seven Dimensions of Instruction Set Architecture (ISA)

2. Memory Addressing:

◦ Byte Addressing: Most computers, like 80x86, ARMv8, and RISC-V, use byte
addressing to access memory.
◦ Alignment: Some architectures, like ARMv8, require data to be aligned (address
must be divisible by data size). While 80x86 and RISC-V don't strictly require
alignment, aligned data is accessed faster.
3. Addressing Modes:

◦ Addressing modes define how memory addresses are calculated (the address arithmetic is sketched in the example after this list).
◦ RISC-V: Has three addressing modes: Register, Immediate (constant), and Displacement (constant offset added to a register).
◦ 80x86: Supports more modes, including:
▪ Absolute (no register involved).
▪ Base-indexed (two registers plus a displacement).
▪ Scaled-index (one register is multiplied by the operand size in bytes and
added to another register and displacement).
4. Operand Types and Sizes:

◦ Common operand sizes:


▪ 8-bit (ASCII character),
▪ 16-bit (Unicode or half-word),
▪ 32-bit (integer or word),
▪ 64-bit (long integer or double word).
◦ Floating Point: 32-bit (single precision), 64-bit (double precision). 80x86 also supports an extended 80-bit floating point.
5. Operations:

◦ General operations include data transfer, arithmetic/logical, control, and floating-point.
◦ RISC-V: Known for simplicity and efficient pipelining, which makes it a representative RISC architecture.
6. Control Flow Instructions:

◦ Most ISAs, including 80x86, ARMv8, and RISC-V, support:


▪ Conditional branches,
▪ Unconditional jumps,
▪ Procedure calls,
▪ Returns.
◦ All three use PC-relative addressing (the branch address is calculated relative to the
program counter).
◦ RISC-V: Tests register contents for branches.
◦ 80x86 and ARMv8: Test condition codes (set by arithmetic/logic operations).
◦ Procedure calls: ARMv8 and RISC-V store the return address in a register, while
80x86 uses a stack in memory.
7. Encoding an ISA:

◦ Two encoding options:


▪ Fixed length (like ARMv8 and RISC-V, where all instructions are 32 bits
long, making decoding simpler).
▪ Variable length (like 80x86, where instructions range from 1 to 18 bytes, allowing smaller programs but making decoding more complex).
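
As a rough illustration of the addressing modes named above, the Python sketch below computes effective addresses for the displacement, base-indexed, and scaled-index modes; the register names, contents, and offsets are hypothetical.

# Hypothetical register contents (values chosen only for illustration).
regs = {"base": 0x1000, "index": 0x0020}

# Displacement (RISC-V style): constant offset added to a register.
displacement_addr = regs["base"] + 16

# Base-indexed (80x86 style): two registers plus a displacement.
base_indexed_addr = regs["base"] + regs["index"] + 8

# Scaled-index (80x86 style): the index register multiplied by the operand size
# in bytes (8 here, i.e. a 64-bit operand), added to a base register and a displacement.
scaled_index_addr = regs["base"] + regs["index"] * 8 + 8

print(hex(displacement_addr), hex(base_indexed_addr), hex(scaled_index_addr))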

Trends in Technology: Key Points

1. Integrated Circuit Logic Technology:


◦ Transistor density used to increase by about 35% per year, leading to a doubling of
transistors every 18-24 months (Moore's Law).
◦ Increases in chip size are slower, about 10-20% per year.
◦ Today, transistor speed isn't improving as fast, and Moore's Law is no longer
accurate.
2. Semiconductor DRAM (Main Memory Technology):

◦ DRAM growth has slowed significantly.


◦ Previously, DRAM capacity quadrupled every three years, but now it takes much
longer (e.g., from 8-gigabit in 2014 to 16-gigabit in 2019).
◦ No plans for 32-gigabit DRAM anytime soon.
3. Semiconductor Flash Memory:

◦ Flash memory, used in devices like phones, is growing fast in capacity (50-60% per
year, doubling every two years).
◦ Flash is 8-10 times cheaper per bit than DRAM, making it more popular for storage.
4. Magnetic Disk Technology (Hard Drives):

◦ Disk density increased rapidly in the past, but growth has now slowed to less than
5% per year.
◦ Hard drives are still cheaper per bit than Flash (8-10 times) and DRAM (200-300
times), so they are important for large-scale storage.
◦ New technology like HAMR is the last chance for significant improvement in disk
density.
5. Network Technology:

◦ Network performance depends on switches and transmission technology.


◦ As technology rapidly changes, designers must plan for these updates because
components like Flash and network systems evolve quickly, with devices typically
lasting only 3-5 years.
Performance Trends: Bandwidth vs. Latency

1. Bandwidth Improvements:
◦ Microprocessors and networks have seen major gains in bandwidth, increasing by
32,000 to 40,000 times.
◦ For memory and disks, bandwidth has also improved significantly, increasing by 400
to 2,400 times.
2. Latency Improvements:

◦ Latency has improved at a much slower rate compared to bandwidth:


▪ 50 to 90 times for microprocessors and networks.
▪ Only 8 to 9 times for memory and disks.
3. Rule of Thumb:

◦ Bandwidth generally grows much faster than latency, typically improving by at least the square of the improvement in latency (a quick check against the figures above follows below).

Takeaway: Designers should focus more on increasing bandwidth rather than latency, as bandwidth tends to grow at a much faster rate and has a larger impact on performance.
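
A quick check of this rule of thumb against the ranges quoted above, pairing the larger latency gain with the smaller bandwidth gain from each range (these are the text's figures, not new data):

cases = {
    "microprocessors/networks": {"latency_gain": 90, "bandwidth_gain": 32_000},
    "memory/disks":             {"latency_gain": 9,  "bandwidth_gain": 400},
}

for name, c in cases.items():
    # Rule of thumb: bandwidth improves by at least the square of the latency improvement.
    required = c["latency_gain"] ** 2
    print(f"{name}: latency gain squared = {required}, "
          f"bandwidth gain = {c['bandwidth_gain']}, rule holds: {c['bandwidth_gain'] >= required}")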

Trends in Power and Energy in Integrated Circuits:

1. Power Distribution and Dissipation:


◦ Power must be effectively delivered to the chip and dissipated as heat.
◦ Modern microprocessors use many pins and interconnect layers for power and
grounding.
2. Peak Power:

◦ The maximum power a processor needs is crucial to ensure the power supply can
meet the demand.
◦ Voltage drops can occur if the supply can't provide enough power, leading to device
malfunction.
◦ Modern processors regulate voltage and slow down to manage peak power demands.
3. Thermal Design Power (TDP):

◦ TDP is a measure of sustained power consumption, determining the cooling needs of


a system.
◦ It is typically less than peak power but higher than the average power during regular
use.
◦ Cooling systems are designed to handle the TDP, and processors lower clock rates or
shut down when temperatures exceed safe limits.
4. Energy Efficiency:

◦ Energy, not just power, is a better metric to compare processor efficiency.
◦ Energy = Power × Execution Time.
◦ A processor with higher power but shorter execution time can still be more energy-efficient than a slower, less power-consuming processor. For example, if a processor uses 20% more power but completes a task in 70% of the time, it consumes less overall energy (84%, since 1.2 × 0.7 = 0.84).

Conclusion: To determine which processor is more efficient, it is important to compare their energy consumption for the same task, not just their power usage (the arithmetic is sketched below).
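
The arithmetic behind that example, as a minimal sketch (the 20%-more-power and 70%-of-the-time figures come from the example above; the normalization to 1.0 is just for convenience):

# Energy = Power x Execution Time, normalized to a baseline processor.
baseline_power, baseline_time = 1.0, 1.0
fast_power = baseline_power * 1.20      # 20% more power
fast_time = baseline_time * 0.70        # finishes in 70% of the time

baseline_energy = baseline_power * baseline_time
fast_energy = fast_power * fast_time    # 1.2 * 0.7 = 0.84

print(f"faster processor uses {fast_energy / baseline_energy:.0%} of the baseline energy")
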
Trends in Cost:

1. Importance of Cost:

◦ While cost is less crucial for supercomputers, it is increasingly important for cost-
sensitive designs.
2. Impact of Time:

◦ The cost of manufactured computer components tends to decrease over time, even
without major technological advancements.
◦ This is influenced by the learning curve, where manufacturing costs decrease as
more units are produced.
◦ The learning curve is often measured by yield, or the percentage of devices that
pass testing.
3. Price of DRAM:

◦ The price per megabyte of DRAM has generally fallen over the long term.
◦ DRAM prices are closely tied to production costs, except during shortages or
oversupply situations.
4. Microprocessor Pricing:

◦ Microprocessor prices also decline over time but are more complex due to their less
standardized nature compared to DRAM.
5. Volume Effects on Cost:

◦ Increased production volume affects costs positively:


▪ Faster Learning Curve: More units produced reduces the time needed to improve manufacturing efficiency.
▪ Purchasing and Manufacturing Efficiency: Higher volume leads to cost reductions through bulk purchasing and improved manufacturing processes.
▪ Amortization of Development Costs: Higher volumes spread development costs over more units, lowering the cost per unit (a brief amortization sketch follows this section's summary).
6. Commoditization:

◦ Commodities are products sold by multiple vendors in large quantities that are
essentially identical (e.g., DRAM, Flash memory, keyboards).
◦ High competition among vendors drives prices closer to costs, reducing overall
product costs.
◦ The competitive nature of commodity markets allows suppliers to achieve economies
of scale, further lowering costs.
In summary, the cost of computer components is influenced by time, volume, and market dynamics. Understanding these trends can help in designing cost-effective systems.
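
A brief sketch of the amortization effect mentioned above, assuming unit cost splits into a per-unit manufacturing cost plus a one-time development cost spread over the production volume; the dollar figures are invented for illustration.

def cost_per_unit(manufacturing_cost, development_cost, volume):
    # The one-time development cost is amortized across every unit produced.
    return manufacturing_cost + development_cost / volume

for volume in (10_000, 100_000, 1_000_000):
    unit_cost = cost_per_unit(manufacturing_cost=20.0,
                              development_cost=5_000_000.0,
                              volume=volume)
    print(f"volume {volume:>9,}: cost per unit = ${unit_cost:,.2f}")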

Performance Metrics and Evaluation

1. Importance of Hardware Performance:


◦ The performance of hardware is crucial for the overall effectiveness of a computer
system.
2. Measuring and Comparing Performance:
◦ Performance must be evaluated to compare different design and technological
approaches effectively.
◦ Different applications may require different performance metrics, each highlighting a different aspect of the system's behavior.
3. Factors Affecting Performance:

◦ Key factors include:


▪ Instruction use and implementation
▪ Memory hierarchy
▪ Input/Output (I/O) handling
Defining Performance

• Subjectivity of Performance:
◦ Performance can mean different things to different stakeholders.
◦ An analogy from the airline industry can illustrate this:
▪ Cruising Speed: Indicates how fast the system operates.
▪ Flight Range: Reflects how far the system can go before needing a recharge
or refuel.
▪ Passengers: Represents how many tasks or users the system can handle
simultaneously.
Performance Metrics

1. Response (Execution) Time:

◦ Definition: The time from the start to the completion of a task.
◦ Significance: Measures the user's perception of the system's speed.
◦ Relevance: Particularly important in reactive and time-critical systems, such as single-user computers or applications requiring immediate feedback.
2. Throughput:

◦ Definition: The total number of tasks completed in a specific period.
◦ Significance: Most relevant for batch processing applications (e.g., billing, credit card processing).
◦ Relevance: Primarily used in input/output systems, like disk access and printers, where the volume of tasks is essential.
In summary, evaluating performance requires understanding different metrics, considering the specific context of applications, and recognizing the factors that influence performance outcomes (a small measurement sketch follows).
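
A small measurement sketch of the two metrics, assuming a batch of identical, made-up tasks; a real system would measure real workloads.

import time

def do_task():
    return sum(i * i for i in range(100_000))   # stand-in for one unit of work

start = time.perf_counter()
response_times = []
for _ in range(20):
    t0 = time.perf_counter()
    do_task()
    response_times.append(time.perf_counter() - t0)   # response (execution) time per task
elapsed = time.perf_counter() - start

throughput = len(response_times) / elapsed             # tasks completed per second
print(f"mean response time: {sum(response_times) / len(response_times):.4f} s, "
      f"throughput: {throughput:.1f} tasks/s")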

Benchmarks in Performance Measurement

1. Real Applications as Benchmarks:

◦ The most effective benchmarks for measuring performance are real applications,
such as Google Translate.
◦ Simpler programs often lead to misleading performance results.
2. Types of Benchmarks:

◦ Kernels: Small, essential parts of real applications that are critical for performance measurement.
◦ Toy Programs: Simple programs (e.g., Quicksort) typically used in introductory programming that do not reflect real-world performance.
◦ Synthetic Benchmarks: Invented programs designed to mimic the behavior of real applications (e.g., Dhrystone).
3. Compiler Flags:

◦ Using benchmark-specific compiler flags can improve performance, but these transformations might not be applicable to other programs.
◦ Benchmark developers often require the same compiler and flags for consistency across tests.
4. Source Code Modifications:

◦ Three approaches to handling source code modifications in benchmarks:

1. No modifications allowed.
2. Modifications allowed but impractical (e.g., large database programs).
3. Modifications allowed as long as the output remains the same.
5. Key Issues in Benchmark Design:

◦ Designers must balance allowing modifications with ensuring benchmarks accurately predict real-world performance.
◦ The aim is to create a benchmark suite that effectively characterizes the relative
performance of computers for applications not included in the suite.
6. Standardized Benchmark Suites:

◦ One of the most successful standardized benchmarks is the SPEC (Standard


Performance Evaluation Corporation), initiated in the late 1980s for workstations.
◦ SPEC has evolved to cover various application classes over time.
Desktop Benchmarks
1. Classification:

◦ Desktop benchmarks are divided into two main categories:
▪ Processor-Intensive Benchmarks: Focus on measuring CPU performance.
▪ Graphics-Intensive Benchmarks: Evaluate graphics processing, often involving significant CPU activity.
2. SPEC Benchmark Evolution:

◦ The SPEC benchmark suite began with a focus on processor performance (initially called SPEC89) and has evolved through six generations, with the latest being SPEC2017.
◦ SPEC2017 is split into integer benchmarks and floating-point benchmarks.
3. Characteristics of SPEC Benchmarks:

◦ SPEC benchmarks are real programs that have been modified to ensure portability
and minimize I/O effects on performance.
◦ Integer Benchmarks: Include various applications such as parts of a C compiler, a
Go program, and video compression.
◦ Floating-Point Benchmarks: Feature applications like molecular dynamics, ray
tracing, and weather forecasting.
◦ The SPEC CPU suite is suitable for benchmarking processors in both desktop
systems and single-processor servers.
Server Benchmarks
1. Variety of Server Benchmarks:

◦ Servers serve multiple functions, requiring various types of benchmarks.


◦ A straightforward benchmark focuses on processor throughput.
2. Throughput Measurement:

◦ SPEC CPU2017 utilizes SPEC CPU benchmarks to create a throughput benchmark, measuring the processing rate of multiprocessors by running multiple instances (usually equal to the number of processors) of each SPEC CPU benchmark.
◦ This results in a measurement called SPECrate, which assesses request-level parallelism (a toy version of this kind of rate measurement is sketched at the end of this section).
3. Thread-Level Parallelism:

◦ SPEC offers high-performance computing benchmarks for measuring thread-level parallelism, utilizing frameworks like OpenMP and MPI, as well as benchmarks for GPUs.
4. I/O Activity:

◦ Most server benchmarks involve significant I/O activity related to storage or network traffic.
◦ Types of benchmarks include:
▪ File server systems
▪ Web servers
▪ Database and transaction processing systems
5. Transaction-Processing (TP) Benchmarks:

◦ TP benchmarks evaluate a system's capacity to manage transactions, which consist of


database accesses and updates.
◦ Typical examples include airline reservation systems and bank ATM systems.
◦ The Transaction Processing Council (TPC) was established in the mid-1980s to
develop realistic and fair TP benchmarks.
6. TPC Benchmarks:

◦ The first TPC benchmark, TPC-A, was published in 1985, later replaced and
enhanced by various benchmarks.
◦ TPC-C (1992): Simulates a complex query environment.
◦ TPC-H: Models ad hoc decision support with unrelated queries.
◦ TPC-DI: Focuses on data integration tasks, important for data warehousing.
◦ TPC-E: Represents an online transaction processing (OLTP) workload for a
brokerage firm's customer accounts.
7. Performance Measurement:

◦ All TPC benchmarks measure performance in transactions per second and include
response time requirements to ensure throughput is only measured when response
time limits are met.
◦ The cost of the benchmark system is also considered for accurate cost-performance
comparisons.
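
A toy version of the SPECrate-style rate measurement mentioned earlier: launch one copy of a CPU-bound stand-in workload per processor and report copies completed per second. The workload function is invented for illustration; real SPECrate runs the actual SPEC CPU benchmarks.

import os
import time
from concurrent.futures import ProcessPoolExecutor

def benchmark_copy(_):
    # Stand-in for one instance of a CPU-bound benchmark.
    return sum(i * i for i in range(2_000_000))

if __name__ == "__main__":
    copies = os.cpu_count() or 1                 # one copy per processor, as in the text
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=copies) as pool:
        list(pool.map(benchmark_copy, range(copies)))
    elapsed = time.perf_counter() - start
    print(f"{copies} copies in {elapsed:.2f} s -> rate = {copies / elapsed:.2f} copies/s")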
