Computer Architecture Unit 1


Unit:-1

What Is Computer Architecture?

Computer architecture refers to the end-to-end structure of a computer system that
determines how its components interact with each other to execute the machine's
purpose (i.e., processing data), usually without reference to the actual technical
implementation.

Examples of Computer Architecture: Von Neumann Architecture (a) and Harvard
Architecture (b)

Components of Computer Architecture

Depending on the method of categorization, the parts of a computer architecture can be
subdivided in several ways. The main components of a computer architecture are the CPU,
memory, and peripherals. All these elements are linked by the system bus, which comprises an
address bus, a data bus, and a control bus. Within this framework, the computer architecture
has eight key components, as described below.

1. Input unit and associated peripherals

The input unit provides external data sources to the computer system; it connects the
external environment to the computer. It receives information from input devices,
translates it into machine language, and then supplies it to the computer system.
The keyboard, mouse, and other input devices are the most often utilized, and each has a
corresponding hardware driver that allows it to work in sync with the rest of the
computer architecture.

2. Output unit and associated peripherals

The output unit delivers the results of the computer's processing to the user. A majority of
the output data comprises music, graphics, or video. A computer architecture's output devices
encompass the display, printing unit, speakers, headphones, etc.

To play an MP3 file, for instance, the system reads an array of numbers from the disk into
memory. The computer architecture manipulates these numbers to convert compressed audio
data to uncompressed audio data, and then outputs the resulting set of numbers
(the uncompressed audio) to the audio chips. The chips then make it user-ready through the
output unit and associated peripherals.

3. Storage unit/memory

The storage unit contains numerous computer parts that are employed to store data. It is
typically separated into primary storage and secondary storage.

Quantitative techniques in computer design:-

 CPU performance equation
 Clock cycle time: hardware and organization
 CPI: organization and instruction set architecture
 Instruction count: instruction set architecture and compiler technology

The most important and pervasive design principle of computer design is to make the
common case fast. In applying this simple principle we have to decide what the frequent
case is and how much performance can be improved by making that case faster.
A fundamental law, called Amdahl's law, can be used to quantify this principle.
Quantitative Computer Design
 Principles that are useful in the design and analysis of computers:
 Make the common case fast!
o If a design trade-off is necessary, favor the frequent case (which
is often simpler) over the infrequent case.
o For example, given that overflow in addition is infrequent, favor
optimizing the case when no overflow occurs.

 Objective:
 Determine the frequent case.
 Determine how much improvement in performance is possible by making it faster.

 Amdahl's law can be used to quantify the latter, given that we have information
concerning the former.
Quantitative Computer Design
 Amdahl's Law:
o The performance improvement to be gained from using some
faster mode of execution is limited by the fraction of the time the
faster mode can be used.

o Amdahl's law defines the speedup obtained by using a particular
feature:

Speedup = Execution time for the entire task without the enhancement
          / Execution time for the entire task using the enhancement when possible

o Two factors:

 Fraction_enhanced: the fraction of compute time in the original machine that can be
converted to take advantage of the enhancement.
 Always <= 1.
 Speedup_enhanced: the improvement gained by the enhanced execution mode.

Quantitative Computer Design

 Amdahl's Law (cont):
o Execution time using the original machine with the enhancement:

Execution time_new = Execution time_old
                     × ( (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced )

o Speedup overall using Amdahl's Law:

Speedup_overall = Execution time_old / Execution time_new
                = 1 / ( (1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced )
Quantitative Computer Design

 Amdahl's Law (cont): An example:
o Consider an enhancement that takes 20 ns on a machine with the
enhancement and 100 ns on a machine without it, so
Speedup_enhanced = 100/20 = 5. Assume the enhancement can
only be used 30% of the time (Fraction_enhanced = 0.3).

o What is the overall speedup?

Speedup_overall = 1 / (0.7 + 0.3/5) = 1 / 0.76 ≈ 1.32
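The arithmetic of this example can be checked with a short sketch; the function name `amdahl_speedup` is ours, not from the text:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup per Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Enhanced mode is 100 ns / 20 ns = 5x faster, usable 30% of the time.
overall = amdahl_speedup(0.3, 100 / 20)
print(round(overall, 3))  # 1 / (0.7 + 0.3/5) = 1 / 0.76 ≈ 1.316
```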



Quantitative Computer Design

 Amdahl's Law expresses the law of diminishing returns, i.e.:

o Assume the first improvement costs us $1,000.

o Assume we are thinking about spending $100,000 to speed up
the 30% by a factor of 500.

o Is this a worthy investment? (Will we get anything like a 100-fold
increase in performance for the 100-fold increase in cost?)

o NO! The best that we can ever do, even if the enhanced 30% took
no time at all, is 1/0.7 ≈ 1.43!
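The diminishing returns are easy to see numerically. This sketch (our own illustration, reusing the speedup formula above) evaluates the same 30% fraction with ever larger enhancement factors:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup per Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Speeding up the same 30% of the program by larger and larger factors:
for s in (5, 50, 500, 5000):
    print(s, amdahl_speedup(0.3, s))
# The overall speedup creeps toward, but never reaches, 1/0.7 ≈ 1.43.
```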


Quantitative Computer Design
 CPU Performance Equation:
o Often it is difficult to measure the improvement in time from a
new enhancement directly.

o A second method, which decomposes the CPU execution time into
three components, makes this task simpler.

o CPU Performance Equation:

CPU time = CPU clock cycles for a program × Clock cycle time
         = CPU clock cycles for a program / Clock rate

 where, for example, Clock cycle time = 2 ns for a
500 MHz clock rate.
Quantitative Computer Design
 CPU Performance Equation:

o An alternative to "number of clock cycles" is "number of
instructions executed", or Instruction Count (IC).

o Given both the number of clock cycles and the IC of a program, the
average Clocks Per Instruction (CPI) is given by:

CPI = CPU clock cycles for a program / IC

o Therefore, CPU performance is dependent on three
characteristics:

CPU time = IC × CPI × Clock cycle time

o Note that CPU time is equally dependent on all three, i.e. a 10%
improvement in any one of them leads to a 10% improvement in CPU
time.
Quantitative Computer Design
 CPU Performance Equation:
o One difficulty: it is difficult to change one in isolation from the
others:
 Clock cycle time: hardware and organization.
 CPI: organization and instruction set architecture.
 Instruction count: instruction set architecture and compiler technology.

 A variation of this equation is:

CPU time = ( Σ_i IC_i × CPI_i ) × Clock cycle time

 where IC_i represents the number of times instruction i is executed in a program and
CPI_i represents the average number of clock cycles for instruction i.

 Why isn't CPI_i a constant? (Hint: cache behavior.)

 Key advantage: it is often possible to measure the constituent parts of the CPU
performance equation, unlike the components of Amdahl's equation.
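As a concrete illustration, the summation form of the equation can be evaluated like this; the instruction mix, counts, and CPI values below are invented for the example, not measurements:

```python
# Hypothetical mix: instruction class -> (count IC_i, average CPI_i).
mix = {
    "ALU":        (45_000_000, 1.0),
    "load/store": (30_000_000, 2.0),
    "branch":     (25_000_000, 1.5),
}

clock_rate = 500e6              # 500 MHz, i.e. a 2 ns clock cycle time
cycle_time = 1.0 / clock_rate

total_ic = sum(ic for ic, _ in mix.values())
total_cycles = sum(ic * cpi for ic, cpi in mix.values())

avg_cpi = total_cycles / total_ic        # sum(IC_i * CPI_i) / IC
cpu_time = total_cycles * cycle_time     # sum(IC_i * CPI_i) * cycle time
print(avg_cpi, cpu_time)                 # ≈ 1.425 CPI, ≈ 0.285 s
```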
Fallacies and Pitfalls
 MIPS (millions of instructions per second) is NOT an alternative metric to time.

 The implication: the bigger the MIPS, the faster the machine.

 Three problems with MIPS:

 MIPS is dependent on the instruction set. This makes it difficult to compare across
platforms.
 MIPS varies between programs on the same computer.
 MIPS can vary inversely with performance!
o Classic example: the MIPS rating on a machine with floating point
hardware may be LOWER than on a machine with floating point
emulation, because the emulating machine's integer instructions
take fewer clock cycles to execute. But there are a lot more of
them!

 What is important is how much work gets done.

 MIPS is typically used to measure PEAK performance, not real performance.
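A toy illustration of the inversion (both machines and all the numbers are invented for the example): the emulating machine retires far more cheap integer instructions, so its MIPS rating is higher even though it takes twice as long to finish the same program.

```python
def mips(instruction_count, exec_time_seconds):
    # MIPS = instruction count / (execution time x 10^6)
    return instruction_count / (exec_time_seconds * 1e6)

# Machine A, FP hardware:  100M instructions, finishes in 1.0 s.
# Machine B, FP emulation: 500M instructions, finishes in 2.0 s.
mips_a = mips(100e6, 1.0)   # 100 MIPS
mips_b = mips(500e6, 2.0)   # 250 MIPS -- higher rating, yet the slower machine
print(mips_a, mips_b)
```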


Fallacies and Pitfalls
 Synthetic benchmarks (e.g. Whetstone and Dhrystone) aren't necessarily much
good, for several reasons:
 These benchmarks contain code sequences that are not typically found in real code.
Therefore:
o Compilers can perform optimizations on this code, artificially
inflating performance.
o They don't reflect the behavior of real programs.
 Synthetic benchmarks often fit into cache; real code doesn't always do this.

 Peak performance != observed performance.

 Peak performance is performance that the machine is "guaranteed not to exceed".
 A machine is rarely able to run at peak performance for any extended period of time on
real programs.
 Just because a processor can run at 300 MFLOPS doesn't mean it always does (pipeline
hazards, memory accesses, and the range of CPIs can slow down the CPU).

4. Measuring and Reporting Performance of Benchmarks

Benchmark programs should be derived from how actual applications will execute. However,
performance is often the result of combined characteristics of a given computer architecture
and system software/hardware components in addition to the microprocessor. Other factors
such as the operating system, compilers, libraries, memory design and I/O subsystem
characteristics may also have impacts on the results and make comparisons difficult.

4.1 Measuring Performance

Two ways to measure the performance are:

1. The speed measure - which measures how fast a computer completes a single task. For
example, the SPECint95 is used for comparing the ability of a computer to complete
single tasks.

2. The throughput measure - which measures how many tasks a computer can complete in
a certain amount of time. The SPECint_rate95 measures the rate of a machine carrying
out a number of tasks.

4.2 Interpreting Results

There are three important guidelines to remember when interpreting benchmark results:

1. Be aware of what is being measured. When making critical purchasing decisions based on
results from standard benchmarks, it is very important to know what is actually being
measured. Without knowing, it is difficult to tell whether the measurements obtained are even
relevant to the applications that will run on the system being purchased. Questions to
consider are: does the benchmark measure the overall performance of the system, or just
components of the system such as the CPU or memory?

2. Representativeness is key. How close is the benchmark to the actual application being
executed? The closer it is, the better it will be at predicting performance. For example, a
component-level benchmark would not be a good predictor of performance for an application
that uses the entire system. Likewise, application benchmarks are the most
accurate predictors of performance for individual applications.

3. Avoid single-measure metrics. Application performance should not be measured with just a
single number. No single numerical measurement can completely describe the performance of
a complex device like the CPU or the entire system. Also, try to avoid benchmarks that average
several results into a single measurement. Important information may be lost in average values.
Try to evaluate all the results from different benchmarks that are relevant to the application.
This may give a more accurate picture than evaluating the results from one benchmark alone.

4.3 Reporting Performance

There are some points to remember when reporting results obtained from running
benchmarks.

 Use the newer version over the older. If an updated and revised version of a benchmark
suite is available, it is usually preferred over the outdated one. Generally there are good
reasons for revising the original. They include, but are not limited to, changes in technology,
improvements in compiler efficiency, etc.
 Use all programs in a suite. There may be legitimate reasons why only a subset was
used, but they should be explained. Otherwise, someone looking at the results may
become suspicious as to why the other programs were not considered. Explain the
selection process, why it was not arbitrary, and why it was useful.
 Report the compilation mode. The compilation mode that was used is important and should
be reported in every case. The effect of a certain new hardware feature may
depend on whether it is applied to optimized or unoptimized programs.
 Use a variety of benchmarks when reporting performance. Generally it is a good idea to
use other sets of programs as additional test cases. One set of benchmarks may behave
differently than another set, and such observations may be useful for the next round of
benchmark selections.
 List all factors affecting performance. Provide enough information about the performance
measurements to allow readers to duplicate the results. These include:
1. program input
2. version of the program
3. version of compiler
4. optimizing level of compiled code
5. version of operating system
6. amount of main memory
7. number and types of disks
8. version of the CPU
 What is Pipelining?

 Pipelining is the process of accumulating instructions from the processor through
a pipeline. It allows storing and executing instructions in an orderly
process. It is also known as pipeline processing.

 Pipelining is a technique where multiple instructions are overlapped
during execution. The pipeline is divided into stages, and these stages are
connected with one another to form a pipe-like structure. Instructions
enter from one end and exit from the other end.

 Pipelining increases the overall instruction throughput.

 In a pipelined system, each segment consists of an input register followed by
a combinational circuit. The register is used to hold data, and the
combinational circuit performs operations on it. The output of the
combinational circuit is applied to the input register of the next segment.

A pipelined system is like the modern-day assembly line setup in factories. For
example, in a car manufacturing industry, a huge assembly line is set up, and at
each point there are robotic arms to perform a certain task; the car then
moves on to the next arm.

Types of Pipeline

It is divided into 2 categories:

1. Arithmetic Pipeline
2. Instruction Pipeline

Arithmetic Pipeline

Arithmetic pipelines are usually found in most computers. They are used for
floating point operations, multiplication of fixed point numbers, etc. For
example, the input to a floating point adder pipeline is:

X = A * 2^a

Y = B * 2^b

Here A and B are mantissas (the significant digits of the floating point numbers),
while a and b are exponents.

Floating point addition and subtraction is done in 4 parts:

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

Registers are used for storing the intermediate results between the above
operations.
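The four steps above can be sketched in plain Python. This is a toy model with unbounded-precision mantissas, not IEEE-754 arithmetic; subtraction and rounding are omitted:

```python
def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add X = A * 2^a and Y = B * 2^b, mantissas normalized to [1, 2)."""
    # 1. Compare the exponents; make X the operand with the larger one.
    if a_exp < b_exp:
        a_mant, a_exp, b_mant, b_exp = b_mant, b_exp, a_mant, a_exp
    # 2. Align the mantissas: shift the smaller operand right.
    b_mant /= 2 ** (a_exp - b_exp)
    # 3. Add the mantissas.
    mant, exp = a_mant + b_mant, a_exp
    # 4. Normalize the result back into [1, 2).
    while mant >= 2:
        mant /= 2
        exp += 1
    return mant, exp

# 1.5 * 2^3 + 1.0 * 2^1  ->  12 + 2 = 14  ->  1.75 * 2^3
print(fp_add(1.5, 3, 1.0, 1))  # (1.75, 3)
```

In a real arithmetic pipeline each numbered step would be one segment, with pipeline registers holding the intermediate results between them.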
Instruction Pipeline

In this case, a stream of instructions can be executed by
overlapping the fetch, decode, and execute phases of an instruction cycle. This
type of technique is used to increase the throughput of the computer
system.
An instruction pipeline reads instructions from memory while previous
instructions are being executed in other segments of the pipeline. Thus we
can execute multiple instructions simultaneously. The pipeline will be more efficient if
the instruction cycle is divided into segments of equal duration.
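The throughput gain is easy to quantify: with k equal-duration stages and n instructions, the classic result is k + (n − 1) cycles instead of n * k. The sketch below assumes an ideal pipeline with no stalls:

```python
def unpipelined_cycles(n_instructions, n_stages):
    # Each instruction runs all stages to completion before the next starts.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # The first instruction fills the pipe; each later one completes
    # one cycle behind its predecessor (assuming no stalls).
    return n_stages + (n_instructions - 1)

n, k = 100, 3  # 100 instructions; fetch, decode, execute stages
print(unpipelined_cycles(n, k), pipelined_cycles(n, k))  # 300 vs 102 cycles
```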

What are Pipeline Hazards?

As we all know, the CPU's speed is limited by memory. There is one more case to consider: in a
pipelined design, several instructions are at some stage of execution at the same time. There is
a chance that these instructions will become dependent on one another, reducing the pipeline's
pace. Dependencies arise for a variety of reasons, which we will examine shortly. The
dependencies in the pipeline are referred to as hazards since they put the execution at risk.

Types of Pipeline Hazards in Computer Architecture


The three different types of hazards in computer architecture are:

1. Structural

2. Data

3. Control

Structural Hazard
Hardware resource conflicts among the instructions in the pipeline cause structural hazards.
Memory, a GPR register, or an ALU might all be used as resources here. When more than one
instruction in the pipe requires access to the very same resource in the same clock cycle, a
resource conflict is said to arise. In an overlapped pipelined execution, this is a circumstance
where the hardware cannot handle all potential combinations.

Control Hazards
Branch hazards are caused by branch instructions and are known as control hazards in
computer architecture. The flow of program/instruction execution is controlled by branch
instructions. Remember that conditional statements are used in higher-level languages for
iterative loops and condition testing (correlate with while, for, and if statements). These
are converted into one of the BRANCH instruction variations. As a result, when the decision to
execute one instruction depends on the result of another instruction, such as a conditional
branch that examines the condition's resulting value, a control hazard develops.

Data Hazards and Their Handling Methods

Data hazards occur when an instruction depends on the result of a previous instruction and
that result has not yet been computed. Whenever two different instructions use the same
storage location, that location must appear as if it is accessed in sequential order.

There are four types of data dependencies: Read after Write (RAW), Write after Read (WAR),
Write after Write (WAW), and Read after Read (RAR). These are explained below.

 Read after Write (RAW):
It is also known as a true dependency or flow dependency. It occurs when a value produced
by an instruction is required by a subsequent instruction. For example:
ADD R1, --, --;
SUB --, R1, --;
Stalls are required to handle these hazards.
 Write after Read (WAR):
It is also known as an anti dependency. These hazards occur when an instruction writes a
register that is read by a previous instruction. For example:
ADD --, R1, --;
SUB R1, --, --;
 Write after Write (WAW):
It is also known as an output dependency. These hazards occur when an instruction writes a
register that was also written by a previous instruction. For example:
ADD R1, --, --;
SUB R1, --, --;
 Read after Read (RAR):
It occurs when both instructions read from the same register. For example:
ADD --, R1, --;
SUB --, R1, --;
Since reading a register does not change its value, Read after Read (RAR)
hazards don't cause a problem for the processor.
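The four cases can be classified mechanically from each instruction's read and write register sets. This is our own toy model for illustration; it ignores pipeline timing entirely and only names the dependency:

```python
def classify(first, second):
    """Classify the dependency of `second` on an earlier `first`.
    Each instruction is a (writes, reads) pair of register-name sets."""
    w1, r1 = first
    w2, r2 = second
    if w1 & r2:
        return "RAW"   # true/flow dependency
    if r1 & w2:
        return "WAR"   # anti dependency
    if w1 & w2:
        return "WAW"   # output dependency
    if r1 & r2:
        return "RAR"   # harmless: reads don't change the register
    return "none"

# ADD R1, --, --  then  SUB --, R1, --  (second reads what first wrote)
print(classify(({"R1"}, set()), (set(), {"R1"})))  # RAW
```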

Handling Data Hazards:

There are various methods used to handle data hazards: forwarding, code reordering, and
stall insertion.

These are explained below.

1. Forwarding:
It adds special circuitry to the pipeline. This method works because it takes less time for the
required values to travel through a wire than it does for a pipeline segment to compute its
result.
2. Code reordering:
We need a special type of software to reorder code. We call this type of software a
hardware-dependent compiler.
3. Stall insertion:
It inserts one or more stalls (no-op instructions) into the pipeline, which delays the execution
of the current instruction until the required operand is written to the register file; this
method decreases pipeline efficiency and throughput.
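Stall counts for a RAW hazard can be estimated with a toy model of a classic 5-stage pipeline (IF ID EX MEM WB). We assume the register file is written in WB and can be read by ID in that same cycle, and that the producer is an ALU instruction whose result is ready at the end of EX; both the assumptions and the formula are ours, for illustration only:

```python
def raw_stalls(distance, forwarding):
    """Stalls needed by a consumer that follows an ALU producer.
    distance = 1 means the instructions are back to back."""
    if forwarding:
        return 0  # result is bypassed from EX/MEM straight into the ALU input
    # Without forwarding, the consumer's ID stage must wait for the
    # producer's WB stage, which is 3 cycles after the producer's ID.
    return max(3 - distance, 0)

# ADD R1, --, --  immediately followed by  SUB --, R1, --
print(raw_stalls(1, forwarding=False), raw_stalls(1, forwarding=True))  # 2 0
```

This makes the trade-off in the text concrete: stall insertion costs cycles on every dependent pair, while forwarding removes those cycles at the price of extra circuitry.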

Exception handling is the process of responding to unwanted or unexpected events when a
computer program runs. Exception handling deals with these events to avoid the program or
system crashing; without this process, exceptions would disrupt the normal operation of
a program.
Pipelining is commonly used to optimize throughput by partitioning the function of the
circuit into stages, so that multiple instructions can be processed concurrently. The
partitioning of instruction processing is done by introducing state-holding elements, called
pipeline registers (delays), into the pipeline.
