
CO UNIT 2

Multiple Bus Organization with Hardwired Control


A Multiple Bus Organization is a processor design that uses multiple internal buses to
facilitate efficient data transfer between various functional units within the processor
(such as registers, ALU, memory, and other units). The goal is to reduce the number of
cycles needed to execute an instruction by allowing multiple components to
communicate in parallel. When combined with hardwired control, this organization
becomes an efficient means of implementing instruction execution.

Let’s break down the key concepts:

1. Multiple Bus Organization:


In processors with a multiple bus architecture, there are typically several buses
connecting different components (such as registers, ALU, memory, etc.) within the CPU.
Each bus can carry data between components simultaneously, enabling parallel
operations. Here’s a breakdown of the typical components and buses in a multiple bus
organization:

 Registers: These are small, high-speed storage locations used to hold data temporarily.
Common types of registers in this architecture are general-purpose
registers (GPRs), status registers, instruction register (IR), program counter
(PC), memory address register (MAR), and memory data register (MDR).
 ALU: The Arithmetic and Logic Unit performs computations such as addition, subtraction,
AND, OR, and so on.
 Buses: Multiple buses (e.g., data bus, address bus) carry data between registers,
memory, and the ALU.

Key Benefits of Multiple Bus Organization:

 Parallel Data Transfer: With multiple buses, different units (e.g., registers, ALU) can
communicate simultaneously, speeding up execution by eliminating the need for
serialized data transfer.
 Reduced Instruction Cycle Time: By allowing more than one data transfer per clock
cycle, multiple bus organizations can reduce the overall number of cycles needed to
execute an instruction.
 Flexibility: Different components of the processor (like the ALU, registers, or memory)
can operate concurrently on different data without waiting for each other to complete.

Example Configuration:

Consider a basic processor with the following buses:

 Bus 1: Carries data between the ALU and registers.


 Bus 2: Carries data between memory and registers.
 Bus 3: Carries addresses between the memory unit and the MAR.
By using multiple buses, instructions can be fetched from memory while simultaneously
performing operations in the ALU and updating registers.
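
As a rough illustration (not a model of any real processor; the bus assignments and register names are assumptions), the following Python sketch shows how an instruction fetch and an ALU operation could proceed in the same clock cycle because they use different buses:

registers = {"PC": 100, "IR": None, "R2": 7, "R3": 5, "R4": 0}
memory = {100: "ADD R1, R2, R3"}

def one_cycle():
    # Bus 3: the address bus carries the PC value to memory (via the MAR).
    address = registers["PC"]
    # Bus 2: the data bus returns the fetched instruction word into the IR.
    registers["IR"] = memory[address]
    # Bus 1: in the same cycle, the ALU path operates on register operands.
    registers["R4"] = registers["R2"] + registers["R3"]

one_cycle()
print(registers["IR"], registers["R4"])   # -> ADD R1, R2, R3  12

With a single shared bus, the fetch and the ALU transfer would have to be serialized over separate cycles.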

2. Hardwired Control:
In processor design, control units are responsible for generating the control signals
needed to manage the operation of the CPU. The hardwired control unit is a type of
control unit that uses fixed logic (usually combinational logic circuits) to generate control
signals based on the current instruction.

How Hardwired Control Works:

 In a hardwired control system, the control signals are generated directly from
the decoded instruction bits (the opcode, or operation code).
 The control unit decodes the opcode to determine which operation to perform and then
sends the corresponding control signals to different parts of the processor (e.g., ALU,
registers, buses).

The key features of a hardwired control unit are:

 Speed: Hardwired control units are faster than microprogrammed control because they
use fixed logic circuits that don't require fetching or interpreting control data from
memory.
 Simplicity: Hardwired control is simpler to implement, especially for processors with a
fixed instruction set. The control unit directly generates signals based on the decoded
instruction, typically using decoders and multiplexers.
 Limited Flexibility: While hardwired control is fast and simple, it is less flexible
compared to microprogramming because adding new instructions or modifying existing
ones often requires changes to the hardware (i.e., redesigning the control unit).

3. Hardwired Control with Multiple Bus Organization:


When hardwired control is used in a multiple bus organization, the control unit
generates signals that manage the flow of data between registers, the ALU, and memory
using the multiple buses available.

Steps in the Instruction Cycle with Hardwired Control and Multiple Bus
Organization:

1. Instruction Fetch:
 The PC (Program Counter) holds the address of the instruction to be fetched.
 The address is sent to memory via the address bus.
 Memory retrieves the instruction and sends it back through the data bus to
the IR (Instruction Register).
2. Decode:
 The opcode of the instruction in IR is decoded by the control unit (via a decoder or a
logic circuit).
 Based on the decoded opcode, the control unit sends appropriate control signals to the
relevant parts of the processor. These signals direct the operation of the ALU, memory,
and registers.
3. Execute:
 The control unit sends signals to the ALU to perform the operation on the operands
stored in registers.
 If the instruction involves memory, the data is transferred via the data bus from
registers to memory, or vice versa, depending on whether it is a load/store operation.
 The ALU can also communicate with the registers via the buses to update the contents of
registers after computation.
4. Memory Access (if applicable):
 For memory operations (like load or store), the address bus carries the address to be
accessed, and the data bus transfers the data to/from the memory.
5. Write-back:
 The result of the ALU computation (or memory fetch) is written back to the register
file using the data bus.

Example of Hardwired Control Signal Generation:

Let’s take a simple instruction like ADD R1, R2, R3, which adds the contents of
registers R2 and R3 and stores the result in R1. The hardwired control process might
involve:

 Opcode decode: The control unit decodes the opcode (ADD).


 ALU operation: The control unit sends a signal to the ALU to perform
an addition operation.
 Bus control: The control unit ensures that the contents of R2 and R3 are placed on the
appropriate buses, and the result is written to R1.
 Data paths: The control unit controls the buses to route data between registers and the
ALU, ensuring that the operands are correctly fetched and the result is stored in the right
place.
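
To make this concrete, here is a minimal Python sketch of the opcode-to-control-signal mapping; the opcodes and signal names are illustrative assumptions, and a real hardwired unit implements this mapping with decoders and gates rather than a lookup table:

HARDWIRED_DECODE = {
    # decoded opcode -> control signals asserted for the execute step
    "ADD":   {"alu_op": "add",  "reg_write": True,  "mem_read": False, "mem_write": False},
    "SUB":   {"alu_op": "sub",  "reg_write": True,  "mem_read": False, "mem_write": False},
    "LOAD":  {"alu_op": "pass", "reg_write": True,  "mem_read": True,  "mem_write": False},
    "STORE": {"alu_op": "pass", "reg_write": False, "mem_read": False, "mem_write": True},
}

def generate_control_signals(opcode):
    # Combinational mapping: the decoded opcode selects a bundle of control lines.
    return HARDWIRED_DECODE[opcode]

print(generate_control_signals("ADD"))
# {'alu_op': 'add', 'reg_write': True, 'mem_read': False, 'mem_write': False}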

4. Control Signals in Hardwired Control with Multiple Buses:


The hardwired control unit in a multiple bus organization generates various control
signals that govern the actions of different parts of the CPU, including:

 Bus selection: Signals to select which bus should be used for data transfer (e.g.,
between registers, memory, and ALU).
 Register load signals: Signals that determine which register should be loaded with
data.
 ALU operation signals: Control lines that specify which operation the ALU should
perform (e.g., addition, subtraction, logical operations).
 Memory read/write signals: Signals to read data from or write data to memory.
 Multiplexer control: Multiplexers are used to select data from multiple sources, and
control signals select which data path is active at any time.
5. Example:
Consider a simple example where we are executing the following instruction:

ADD R1, R2, R3 ; R1 = R2 + R3

Step-by-Step Breakdown:

1. Instruction Fetch:
 The PC provides the address to memory.
 The instruction is fetched from memory and loaded into the IR.
2. Instruction Decode:
 The control unit decodes the opcode (ADD) from IR.
 It sends signals to the ALU to perform addition, and to the buses to route the values
of R2 and R3.
3. ALU Execution:
 The control unit activates the ALU to perform the addition of the contents of R2 and R3.

 The result is placed on the data bus.


4. Write-back:
 The result from the ALU is written to R1 via the data bus.

By using multiple buses and hardwired control, the processor can perform
the ADD instruction efficiently, with parallel data transfers between the ALU, registers,
and memory, without having to wait for sequential transfers. This improves execution
speed and reduces the overall clock cycle count per instruction.

A Complete Processor
Central Processing Unit (CPU)

The CPU (Central Processing Unit) is the brain of the computer. It is the part that
does most of the work in a computer system. Just as our brain controls our body and
processes information, the CPU carries out instructions from programs and performs
calculations. It is made up of smaller components that work together to execute
tasks, making it the heart of any computing device.
All types of data processing operations from simple arithmetic to complex
tasks and all the important functions of a computer are performed by the CPU.
It helps input and output devices to communicate with each other and perform
their respective operations. It also stores data which is input, intermediate
results in between processing, and instructions. The CPU’s job is to make sure
everything runs smoothly and efficiently. In this article, we are going to discuss
CPU in detail.
What is a CPU?
A Central Processing Unit is the most important component of a computer
system. A CPU is hardware that performs data input/output, processing, and
storage functions for a computer system. A CPU can be installed into a CPU
socket. These sockets are generally located on the motherboard. CPU can
perform various data processing operations. CPU can store data, instructions,
programs, and intermediate results.


History of CPU
Since 1823, when Baron Jons Jakob Berzelius discovered silicon, which is still
the primary component used in manufacturing CPUs today, the history of the
CPU has experienced numerous significant turning points. The
first transistor was created by John Bardeen, Walter Brattain, and William
Shockley in December 1947. In 1958, the first working integrated circuit was
built by Robert Noyce and Jack Kilby.
The Intel 4004 was the company's first microprocessor, unveiled in 1971 and
developed with the help of Ted Hoff. Intel followed with the 8008 CPU in 1972,
the 8086 in 1978, and the 8088 in June 1979. The Motorola 68000, a 16/32-bit
processor, was also released in 1979. Sun unveiled the SPARC CPU in 1987, and
AMD introduced the AM386 CPU series in March 1991.
In January 1999, Intel introduced the Celeron 366 MHz and 400 MHz
processors. AMD followed in April 2005 with its first dual-core processor, and
Intel introduced the Core 2 Duo processor in 2006. Intel released the first
Core i5 desktop processor with four cores in September 2009.
In January 2010, Intel released other processors such as the Core 2 Quad
processor Q9500, the first Core i3 and i5 mobile processors, and the first
Core i3 and i5 desktop processors.
In June 2017, Intel released the Core i9 desktop processor, and it introduced
its first Core i9 mobile processor in April 2018.
Different Parts of CPU
Now, the CPU consists of 3 major units, which are:
 Memory or Storage Unit
 Control Unit
 ALU(Arithmetic Logic Unit)
In the block diagram of a computer, these three major components are shown
together with the input and output units. So, let us discuss these major
components in detail.
Memory or Storage Unit
As the name suggests this unit can store instructions, data, and intermediate
results. The memory unit is responsible for transferring information to other
units of the computer when needed. It is also known as an internal storage unit
or the main memory or the primary storage or Random Access Memory (RAM)
as all these are storage devices.
Its size affects speed, power, and performance. There are two types of memory
in the computer, which are primary memory and secondary memory. Some
main functions of memory units are listed below:
 Data and instructions are stored in memory units which are required for
processing.
 It also stores the intermediate results of any calculation or task when they
are in process.
 The final results of processing are stored in the memory units before these
results are released to an output device for giving the output to the user.
 All sorts of inputs and outputs are transmitted through the memory unit.
Control Unit
As the name suggests, a control unit controls the operations of all parts of the
computer, but it does not carry out any data processing operations itself. While
executing stored instructions, it directs the computer system by issuing electrical
control signals. It takes instructions from the memory unit, decodes them, and then
executes them, thereby controlling the functioning of the computer. Its main task is
to maintain the flow of information across the processor. Some main functions of the
control unit are listed below:
 Controlling of data and transfer of data and instructions is done by the
control unit among other parts of the computer.
 The control unit is responsible for managing all the units of the computer.
 The main task of the control unit is to obtain the instructions or data that is
input from the memory unit, interpret them, and then direct the operation of
the computer according to that.
 The control unit is responsible for communication with Input and output
devices for the transfer of data or results from memory.
 The control unit is not responsible for the processing of data or storing data.
ALU (Arithmetic Logic Unit)
ALU (Arithmetic Logic Unit) is responsible for performing arithmetic and logical
functions or operations. It consists of two subsections, which are:
 Arithmetic Section: By arithmetic operations, we mean operations like
addition, subtraction, multiplication, and division, and all these operations
and functions are performed by ALU. Also, all the complex operations are
done by making repetitive use of the mentioned operations by ALU.
 Logic Section: By Logical operations, we mean operations or functions like
selecting, comparing, matching, and merging the data, and all these are
performed by ALU.
Note: The CPU may contain more than one ALU and it can be used for
maintaining timers that help run the computer system.
What Does a CPU Do?
The main function of a computer processor is to execute instructions and
produce an output. Fetch, Decode, and Execute are the fundamental
operations of the CPU.
 Fetch: First, the CPU gets the instruction, i.e., the binary-encoded
instruction is passed from RAM to the CPU.
 Decode: Once the instruction is in the CPU, the control unit's instruction
decoder interprets it to determine which operation is required.
 Execute: After the decode step, the instruction is carried out, with the
ALU performing any required arithmetic or logic.
 Store: After the execute step, the result is stored back in a register or
in memory.
Types of CPU
We have three different types of CPU:
 Single Core CPU: The oldest type of computer CPU is the single-core CPU.
These CPUs were used in the 1970s. They have only one core, so they can
process only one operation at a time, which makes a single-core CPU poorly
suited to multitasking.
 Dual-Core CPU: Dual-core CPUs contain a single integrated circuit with
two cores. Each core has its own cache and controller, and these work
together as a single unit, so dual-core CPUs can work faster than single-core
processors.
 Quad-Core CPU: Quad-core CPUs contain four independent cores on a single
integrated circuit (IC), effectively two dual-core processors on one chip.
These cores read and execute instructions independently, which increases the
overall speed of programs without even raising the clock speed.
What is CPU Performance?
CPU performance is how fast a computer’s processor (CPU) can complete the
task. It is measured by the number of instructions completed in one second. Its
performance depends on the processor’s clock speed and other factors like its
design and the size of its cache.
What are Computer Programs and Where are They Stored?
A computer program is a set of instructions written by a programmer that tells
a computer what to do. For example, a web browser and a word processor are
programs; so is the software that performs math operations or that responds
when you click and select items with a mouse or touchpad.
Storage of Programs
There are two ways of storing programs on the computer memory:
 Permanent Storage: Programs are stored permanently on storage devices
like HDD, or SSD.
 Temporary Storage: When a program is running on the CPU, its data is loaded
into RAM from the HDD or SSD. This storage is temporary because RAM is
volatile: it loses all data when the power is turned off.
Advantages
 Versatility: The CPU can handle a wide range of tasks, from basic
calculations to managing the operating system.
 Performance: Modern CPUs are very fast and can perform billions of
calculations per second.
 Multi-core: CPUs with multiple cores can handle multiple tasks
simultaneously.
 Compatibility: CPUs are designed to be compatible with a wide range of
software, which allows many different applications to run on a single CPU.
Disadvantages
 Overheating: The CPU generates a lot of heat while performing complex
tasks. This requires effective cooling solutions, such as fans or liquid
cooling systems.
 Power Consumption: High-performance CPUs can consume a very large amount
of power, which leads to higher electricity bills and the need for a robust
power supply.
 Cost: High-end CPUs can be expensive, which can be a barrier for some
users or applications that need high computing power.
 Limited Parallel Processing: While multi-core CPUs can handle multiple
tasks at once, they are still not as efficient at parallel processing as
specialized hardware like GPUs (Graphics Processing Units), which are
designed for handling many tasks simultaneously.

Microprogrammed Control
Microprogrammed control is a method used in the design of the control unit of a
computer’s processor. Unlike hardwired control, where control signals are generated
directly by combinational logic, microprogrammed control uses a set
of microinstructions stored in memory (usually in Control Memory) to generate
control signals.

In microprogrammed control, each instruction in the machine's instruction set is
translated into a sequence of smaller operations called micro-operations
or microinstructions. These micro-operations are executed one by one to
carry out the full operation of the instruction.

Key Concepts in Microprogrammed Control:

1. Microinstructions
2. Microprogram Sequencing
3. Wide Branch Addressing
4. Microinstructions with Next-Address Field
5. Prefetching Microinstructions
6. Emulation

Let’s break down each of these concepts:

1. Microinstructions
A microinstruction is a low-level instruction that defines a control signal or a set of
control signals for a specific operation within the processor. These control signals are
necessary to coordinate the operation of the processor's functional units (such as the
ALU, registers, buses, and memory).

 Micro-operations are smaller operations that occur in a microcycle and correspond to
individual steps in the execution of an instruction. For example, moving data between
registers, performing an ALU operation, or transferring data to/from memory.
A microinstruction typically consists of:

 Control Bits: Each bit of the microinstruction controls a specific part of the processor
(e.g., ALU control, register load signals, memory read/write signals).
 Address Field: Specifies the address of the next microinstruction in the sequence,
which is used for the microprogram sequencing.

Example: A microinstruction could specify the operation "load the value of register A
into the ALU" by setting control bits that activate the appropriate register and ALU paths.
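
A minimal sketch of such a microinstruction, assuming an illustrative set of fields (the field names, widths, and the next-address value are not from any standard format):

from dataclasses import dataclass

@dataclass
class MicroInstruction:
    alu_op: str         # ALU operation to assert this microcycle ("add", "none", ...)
    reg_load: str       # register that latches the bus this microcycle, if any
    mem_read: bool      # assert the memory-read control line
    mem_write: bool     # assert the memory-write control line
    next_address: int   # address of the next microinstruction in control memory

# "Load the value of register A into the ALU input latch, then continue at 0x21."
load_a = MicroInstruction(alu_op="none", reg_load="ALU_IN_A",
                          mem_read=False, mem_write=False, next_address=0x21)
print(load_a)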

2. Microprogram Sequencing
Microprogram sequencing refers to the method used to determine the sequence in
which microinstructions are executed. When an instruction is fetched and decoded, the
control unit must determine which microinstructions need to be executed in order to
perform the required operations.

 Microprogram Sequencer: The sequencer generates the address of the next
microinstruction. It can be part of a control memory system or a specialized logic unit.

Types of Microprogram Sequencing:

1. Linear Sequence: The microinstructions are executed sequentially, one after the other.
2. Branching Sequence: Some microinstructions may cause a jump to another sequence
of microinstructions. This is necessary for handling control flow in the program (such as
branching, loops, or subroutine calls).

Key Consideration: The sequencer decides the next address by either incrementing
the address (in the case of a linear sequence) or using a jump mechanism (in the case of
branching).
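
A minimal Python sketch of these two sequencing modes, assuming a toy control memory and a single condition flag (the microinstruction contents and addresses are illustrative only):

control_memory = {
    0: {"signals": "fetch",       "branch_if_zero": None, "next": 1},
    1: {"signals": "decode",      "branch_if_zero": None, "next": 2},
    2: {"signals": "alu_add",     "branch_if_zero": 5,    "next": 3},
    3: {"signals": "writeback",   "branch_if_zero": None, "next": 0},
    5: {"signals": "handle_zero", "branch_if_zero": None, "next": 0},
}

def run(start_address, zero_flag, steps=4):
    address = start_address
    for _ in range(steps):
        micro = control_memory[address]
        print(f"uPC={address}: assert {micro['signals']}")
        # Branching sequence: jump when the tested condition holds,
        # otherwise fall through to the linear next address.
        if micro["branch_if_zero"] is not None and zero_flag:
            address = micro["branch_if_zero"]
        else:
            address = micro["next"]

run(start_address=0, zero_flag=True)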

3. Wide Branch Addressing


Wide Branch Addressing refers to the ability to use a wide range of addresses when
branching between microinstructions. Since the control memory stores microprograms
that are executed in sequence, branching is required when certain conditions are met,
such as jumping to a new microprogram or handling conditional instructions.

 Branching in a microprogram typically occurs when there are decisions to be made
(e.g., on the result of a comparison or after an instruction fetch).
 Wide Addressing allows the system to support larger address spaces for branches,
enabling more complex control flow and allowing microprograms to jump over longer
sequences of instructions.

This is particularly useful for handling:

 Conditionally executed microinstructions.


 Subroutine calls (such as procedure calls).
 Interrupts that require a jump to a specific set of microinstructions.
Wide addressing generally uses larger address fields in the microinstruction to specify
jump addresses, compared to systems with smaller control memory addressing.

4. Microinstructions with Next-Address Field


A microinstruction with a next-address field is one where the microinstruction not
only contains the control bits (which specify what operations to perform) but also
includes the address of the next microinstruction to be executed.

 The next-address field directly specifies the address of the next microinstruction,
effectively controlling the flow of execution in the microprogram.
 This field is crucial for branching within the microprogram or for sequences that depend
on the outcome of an operation (such as the result of an arithmetic operation or a
comparison).

For example, a microinstruction could instruct the ALU to add two registers, and then use
the next-address field to either continue the execution of the next sequential
microinstruction or jump to a different part of the microprogram based on a conditional
test.

5. Prefetching Microinstructions
Prefetching microinstructions is a technique used to improve the performance of
microprogrammed control by fetching the next microinstruction from control memory
before it is actually needed.

 This is similar to instruction prefetching in traditional pipelined processors, where
instructions are fetched ahead of time to keep the pipeline filled.
 In the case of microprogramming, the microinstruction fetch process is pipelined, so
that while one microinstruction is being executed, the next microinstruction is fetched
and ready for execution.

Benefits of Prefetching:

 Reduced latency: By fetching microinstructions in advance, the processor doesn't need
to wait for the next instruction to be loaded, improving overall execution time.
 Improved throughput: More efficient use of the control memory system, as
microinstructions are readily available when needed.
 Effective use of idle cycles: The microprogram control unit can be busy fetching
instructions while the CPU is working on executing the current microinstruction.

6. Emulation
Emulation refers to the ability of a microprogrammed control system to simulate the
behavior of another system or instruction set architecture (ISA). This allows one
hardware system to execute programs written for a different system.

 Emulation of machine instructions: A microprogram can be written to interpret a
higher-level machine instruction as a sequence of microinstructions. This makes it
possible for processors to execute instructions from a different instruction set, thereby
enabling software compatibility across different systems.

Types of Emulation:

1. Instruction Emulation: The microprogrammed control unit simulates the execution of
instructions from one machine by executing a sequence of microinstructions that
correspond to that machine’s instruction.
2. Virtual Machines: A virtual machine (VM) is an example of an emulation layer that uses
microprogramming to execute programs designed for a different machine architecture.

Example of Emulation:

 A processor using microprogrammed control might be able to emulate a legacy
processor (like an older microprocessor architecture), translating the instructions from
the legacy architecture into micro-operations that can be executed by the modern
processor.

Cache Memories: Mapping Functions, Replacement Algorithms, and Example of Mapping Technique
Cache memory is a small, high-speed memory located close to the CPU that stores
frequently accessed data to reduce the time it takes to fetch data from slower main
memory (RAM). The primary goal of cache memory is to speed up data access by taking
advantage of spatial and temporal locality in programs.

Key Concepts of Cache Memory


1. Cache Mapping: The method by which data from the main memory is placed into the
cache.
2. Cache Replacement Algorithms: Algorithms used to decide which cache line to
replace when the cache is full and a new data block needs to be loaded into the cache.
3. Example of Mapping Technique: A practical example of how data is mapped from
main memory to cache.

1. Cache Mapping Functions


Cache mapping refers to how data from the main memory is transferred to the cache
memory. The process depends on how the cache is organized and how the blocks of
data in memory are mapped to specific locations in the cache.

There are three primary types of cache mapping:

(a) Direct-Mapped Cache

 In direct-mapped cache, each memory location can be mapped to exactly one cache
line. This is the simplest form of cache mapping.
 The memory address is divided into three parts:
 Tag: Identifies which block of memory the cache line corresponds to.
 Index: Specifies which cache line the data should be placed into.
 Block offset: Indicates the specific byte within a cache block (or line).

Direct-mapped cache structure:

 The cache has a fixed number of lines (or slots), and each memory address maps to
exactly one of these lines.

Example: For a 32-bit memory address and a cache with 64 lines, the memory address
could be divided into:

 Tag: The most significant bits (to identify the block).


 Index: The next set of bits (to map to a specific cache line).
 Block offset: The least significant bits (to specify a byte within a block).

(b) Fully Associative Cache

 In fully associative cache, a memory block can be placed in any cache line. There is
no predefined index for where data will be placed; instead, the entire cache is searched
for a match.
 This type of cache is more flexible and can potentially have fewer cache misses, but it
requires more complex hardware to perform the search across all cache lines.

Example:

 A 32-bit memory address might only contain a Tag and a Block offset. The index part
of the address is omitted since any block can be stored in any cache line.

(c) Set-Associative Cache

 Set-associative cache is a hybrid between direct-mapped and fully associative caches.
In this approach, the cache is divided into sets, and each set contains multiple cache
lines.
 A memory address is divided into three parts:
 Tag: Identifies the block.
 Set index: Specifies which set the block is likely to be placed in.
 Block offset: Identifies the exact byte within the cache block.

For example, a 4-way set-associative cache means that each set has 4 cache lines, and a
memory address is mapped to one of these sets but can be placed in any line within the
set.

2. Cache Replacement Algorithms


When the cache is full and a new data block needs to be loaded, the cache must decide
which existing block to evict (replace). Several replacement algorithms exist to
determine which cache line should be replaced:
(a) Least Recently Used (LRU)

 The LRU algorithm evicts the cache block that has not been used for the longest period
of time. The idea is that blocks that haven’t been used recently are less likely to be used
again soon.
 LRU typically requires maintaining a record of access times or ordering, which can be
done either using a counter or a stack.

(b) First-In, First-Out (FIFO)

 The FIFO algorithm evicts the block that was loaded into the cache first. It maintains a
queue of cache blocks, and the oldest block (at the front of the queue) is replaced when
a new block needs to be loaded.
 FIFO is simple to implement, but it does not always perform optimally.

(c) Optimal (OPT) / Belady's Algorithm

 The Optimal algorithm evicts the block that will not be used for the longest period of
time in the future. While optimal, it’s not practical for real systems because it requires
knowledge of future memory accesses.
 It is often used as a benchmark to compare the performance of other algorithms.

(d) Least Frequently Used (LFU)

 The LFU algorithm replaces the block that has been accessed the least frequently. It
maintains a count of accesses to each cache block, and the block with the lowest access
count is evicted.

(e) Random Replacement

 The Random algorithm randomly selects a cache line to replace when a new data block
must be loaded into the cache. It’s simple to implement but may result in poor cache
performance compared to other algorithms.
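
As a quick illustration of LRU bookkeeping from the list above, here is a small Python sketch using an ordered mapping; real caches track recency with hardware bits, so this is only a behavioral model:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block address -> data, oldest first

    def access(self, block, data=None):
        if block in self.blocks:                      # cache hit
            self.blocks.move_to_end(block)            # mark as most recently used
            return self.blocks[block]
        if len(self.blocks) >= self.capacity:         # cache full: evict the LRU block
            evicted, _ = self.blocks.popitem(last=False)
            print(f"evict block {evicted}")
        self.blocks[block] = data                     # bring in the new block
        return data

cache = LRUCache(capacity=2)
cache.access(0xA, "a"); cache.access(0xB, "b")
cache.access(0xA)            # touch A, so B becomes the least recently used
cache.access(0xC, "c")       # evicts block 11 (0xB)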

3. Example of Mapping Technique


Let’s consider a simple example to demonstrate Direct-Mapped Cache:

Example:

 Memory Address: 32 bits


 Cache Size: 64 cache lines
 Block Size: 4 bytes

The 32-bit memory address is divided as follows:

 Tag: 24 bits (the remaining most significant bits)
 Index: 6 bits (since the cache has 64 lines, we need log2(64) = 6 bits for indexing)
 Block offset: 2 bits (since the block size is 4 bytes, we need log2(4) = 2 bits for
the block offset)

Address Breakdown:

 The Tag (24 bits) is used to identify which block of memory is being referenced.
 The Index (6 bits) specifies which line of the cache the block will be placed in.
 The Block offset (2 bits) tells which byte within the 4-byte block we are interested in.

Example Mapping:

If the memory address is 0x12345678, we would break it down into:

 Tag: The first 24 bits of the address.
 Index: The next 6 bits (used to find the cache line).
 Block offset: The last 2 bits (to determine which byte within the cache block).

After breaking down the address, we would:

1. Use the Index to find the corresponding cache line.
2. Compare the Tag with the tag stored in the cache line to check if there is a cache hit.
3. If there’s a cache miss, we load the data from the main memory into the cache at the
corresponding cache line.

Example Cache Operation:

 Memory Address: 0x12345678
 Tag: 0x123456 (the upper 24 bits)
 Index: 0b011110 = 30 (the next 6 bits)
 Block Offset: 0b00 = 0 (the last 2 bits)

This means the data for address 0x12345678 would be placed in cache line 30, and the
block offset selects byte 0 within that 4-byte block.
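
The same breakdown and lookup can be sketched in a few lines of Python using the parameters assumed above (64 lines, 4-byte blocks); this is an illustrative model, not a full cache simulator:

# Decompose a 32-bit address for a direct-mapped cache with 64 lines of
# 4 bytes each: offset = 2 bits, index = 6 bits, tag = remaining 24 bits.
OFFSET_BITS = 2          # log2(4-byte block)
INDEX_BITS = 6           # log2(64 lines)

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)   # -> 0x123456 30 0

# Direct-mapped lookup: exactly one (tag, block) slot per index.
cache = [None] * (1 << INDEX_BITS)
hit = cache[index] is not None and cache[index][0] == tag
if not hit:
    cache[index] = (tag, f"block fetched from memory for tag {hex(tag)}")
print("hit" if hit else "miss")   # first access -> miss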

Performance Considerations in Computer Organization: Interleaving, Hit Rate and Miss Penalty, Caches on Processor Chip, and Other Enhancements
In Computer Organization, achieving optimal performance relies heavily on efficient
memory access and data management. Key performance considerations in memory
systems include how data is organized in memory, how it is cached for fast access, and
how various memory techniques reduce latency and improve throughput. The four
primary areas of focus are interleaving, hit rate and miss penalty, caches on the
processor chip, and other cache-related enhancements. Below is an in-depth
explanation of each.
1. Memory Interleaving
Memory interleaving is a technique used to improve memory access speed by
distributing data across multiple memory banks so that simultaneous memory accesses
can occur.

How It Works:

 Memory interleaving divides the system's main memory into multiple banks (e.g., 2-way,
4-way, or 8-way interleaving), with each bank containing a portion of the address space.
 Data from consecutive memory addresses is placed in different memory banks, so when
the CPU needs to access consecutive data, it can request data from multiple banks
simultaneously.
 This parallelism minimizes waiting time and maximizes throughput, reducing
the memory bottleneck.

Types of Interleaving:

1. 2-way interleaving: Every other word or block of memory is placed in a different
memory bank. For example, odd-numbered addresses go to one bank, and even-numbered
addresses go to another.
2. 4-way interleaving: Memory is split into four banks, with different data placed in each
one, allowing for even more parallelism.
3. N-way interleaving: Memory is divided into N banks, each handling a portion of the
memory space.

Benefits:

 Increased Throughput: With multiple banks, the memory system can handle more
than one request at a time, reducing wait times.
 Reduced Latency: By splitting the memory load, interleaving can speed up the overall
memory access time, as the CPU can simultaneously access data from multiple locations.

Example:

 In a 2-way interleaved memory system, addresses are divided into two groups:
addresses 0, 2, 4, 6 are mapped to Bank 1, and addresses 1, 3, 5, 7 are mapped to Bank
2. By accessing data from two banks at once, the CPU reduces the latency of sequential
accesses.
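
A small Python sketch of this low-order interleaving (the bank count is an assumption; banks are numbered from 0 here, corresponding to Bank 1 and Bank 2 in the example above):

NUM_BANKS = 2   # 2-way interleaving; use 4 or 8 for wider interleaving

def bank_of(address):
    return address % NUM_BANKS          # bank selected by the low-order bits

def row_in_bank(address):
    return address // NUM_BANKS         # word position inside that bank

for addr in range(8):
    print(f"address {addr} -> bank {bank_of(addr)}, row {row_in_bank(addr)}")
# addresses 0, 2, 4, 6 land in bank 0 while 1, 3, 5, 7 land in bank 1,
# so two sequential accesses can be serviced by different banks at once.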

2. Hit Rate and Miss Penalty


(a) Hit Rate

The hit rate is the percentage of memory accesses that are satisfied by the cache,
meaning the requested data is already present in the cache. A high hit rate means that
the processor can quickly access data without needing to fetch it from slower main
memory.
 Formula:
Hit Rate = Cache Hits / Total Memory Accesses
 Impact: A high hit rate reduces the need to access main memory, leading to faster data
retrieval and better performance.

(b) Miss Penalty

The miss penalty is the additional time it takes to retrieve data from main memory (or
possibly from secondary storage like a hard disk) when a cache miss occurs. This penalty
occurs because accessing main memory is much slower than accessing the cache, and a
miss results in a delay that must be accounted for in performance.

 Formula:
Miss Penalty = Time to fetch data from main memory + Time to load data into cache
 Impact: A higher miss penalty means longer delays in fetching data, which directly
impacts system performance. Therefore, reducing miss penalties is a key design
consideration.

Overall Performance:

 High Hit Rate: Fewer accesses to slower memory, leading to faster data retrieval and
reduced processor idle time.
 Low Miss Penalty: Minimizes the time spent waiting for data to be retrieved from
memory, reducing performance bottlenecks.

Example:

If the hit rate is 90%, 90% of memory accesses are served by the cache, and only 10%
require accessing slower memory. If the miss penalty is high (e.g., 100 CPU cycles to
fetch data from memory), the system will still experience slowdowns despite the high hit
rate.
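
The combined effect of hit rate and miss penalty is often summarized as the average memory access time (AMAT). The quick calculation below uses the 90% hit rate and 100-cycle miss penalty from the example, plus an assumed 1-cycle cache hit time:

# Average memory access time = hit time + miss rate * miss penalty.
hit_time = 1            # cycles to access the cache on a hit (assumed)
hit_rate = 0.90         # from the example above
miss_penalty = 100      # extra cycles to fetch from main memory on a miss

miss_rate = 1 - hit_rate
amat = hit_time + miss_rate * miss_penalty
print(f"AMAT = {amat:.1f} cycles")   # -> AMAT = 11.0 cycles

Even with 90% hits, the 10% of misses dominate the average, which is why reducing the miss penalty matters as much as raising the hit rate.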

3. Caches on the Processor Chip


On-chip caches are memory caches that are integrated directly onto the processor chip
itself. These caches are essential for reducing the latency involved in memory access, as
accessing data from on-chip cache is orders of magnitude faster than accessing data
from main memory.

There are typically multiple levels of cache (L1, L2, and L3) to optimize performance
further:

(a) L1 Cache

 Location: Closest to the CPU core.


 Size: Small (typically 16 KB to 128 KB).
 Speed: Very fast (1 to 2 CPU cycles).
 Split Design: Often split into L1d (data cache) and L1i (instruction cache).
 Role: Handles the most frequently accessed data and instructions, providing the fastest
possible data retrieval.

(b) L2 Cache

 Location: Typically located on the processor chip, but may also be located near the
processor.
 Size: Larger than L1 (typically 128 KB to several MB).
 Speed: Slower than L1 but still faster than main memory (3-6 CPU cycles).
 Role: Holds data and instructions that are not in the L1 cache but are likely to be
accessed soon.

(c) L3 Cache

 Location: Often shared among multiple processor cores.


 Size: Much larger (typically 2 MB to 16 MB).
 Speed: Slower than L2 (10+ CPU cycles).
 Role: Acts as a shared buffer for multiple cores, helping to reduce cache misses between
cores.

Benefits of On-Chip Caches:

1. Reduced Latency: The closer a cache is to the CPU, the faster the data retrieval,
reducing delays in memory access.
2. Higher Data Throughput: Caches on the processor chip can handle more data in
parallel, ensuring that the CPU spends less time waiting for memory access.
3. Improved Performance: With data readily available in caches, the processor can
execute instructions more efficiently, improving overall performance.

Cache Hierarchy:

 L1, L2, and L3 Caches form a hierarchy, where data is first searched in L1, then L2,
and finally L3. If the data is not found in any of these caches, it will be fetched from main
memory.

Example:

 If the CPU needs data that is frequently used, L1 cache will provide the fastest response.
If it's not in L1, L2 cache will be checked, and if it's not there, L3 cache is consulted, and
then the main memory. This multi-level cache hierarchy ensures that data access
times are minimized.

4. Other Cache Enhancements


In addition to the basic concepts of interleaving, hit rate, and on-chip caches, modern
processors implement several enhancements to further optimize cache performance.

(a) Cache Prefetching


 Definition: Prefetching is a technique in which the processor or the hardware preloads
data into the cache before it is actually requested, based on predictable patterns of
memory access.
 Goal: Minimize cache misses by loading data into the cache in anticipation of future
requests, reducing the wait time when the data is eventually needed.

(b) Cache Coherence

 Problem: In a multi-core processor, each core has its own cache. This can lead to cache
coherence problems where multiple caches contain different versions of the same
data.
 Solution: Cache coherence protocols like MESI (Modified, Exclusive, Shared,
Invalid) ensure that all caches in the system maintain a consistent view of the data.
 Modified: The data has been modified in the cache, and main memory holds a stale copy.
 Exclusive: The data is present in only one cache and matches main memory.
 Shared: The data is present in multiple caches and is unmodified.
 Invalid: The cache copy is no longer valid because the data has been updated elsewhere.

(c) Non-Uniform Memory Access (NUMA)

 NUMA is an architecture in multi-socket systems where memory is divided
into local and remote regions, with faster access to local memory (memory attached to
the same processor) compared to remote memory (memory attached to another
processor).
 NUMA optimization: Caches can be optimized to take advantage of local memory,
reducing access latency and improving performance.

(d) Cache Replacement Policies

 LRU (Least Recently Used): Replaces the least recently accessed cache block when a
new one needs to be loaded.
 FIFO (First In, First Out): Replaces the oldest cache block.
 Random Replacement: Chooses a cache block to replace at random.
 Optimal Replacement: Replaces the cache block that will not be used for the longest
period in the future.

Virtual Memory – Address Translation


In virtual memory, blocks of memory (called pages) are mapped from one set
of addresses (called virtual addresses) to another set (called physical
addresses). It is also possible for a virtual page to be absent from main
memory and not be mapped to a physical address; in that case, the page
resides on disk. Physical pages can be shared by having two virtual addresses
point to the same physical address. This capability is used to allow two
different programs to share data or code.
Unallocated Pages
These are the pages that have not yet been allocated (or created) by the VM
system. Unallocated pages do not have any data associated with them, and
thus do not occupy any space on disk.

Cached Pages
These are those allocated pages that are currently cached in physical
memory. They are also present on disk.

Uncached Pages
These are those allocated pages that are not cached in physical memory. They
are present on disk.

Example

A worked example of this kind specifies the number of addresses in the virtual
address space, the number of addresses in the physical address space, and the
page size, and from these derives the number of virtual pages and the number
of physical pages.
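
As a worked example with assumed values (these numbers are chosen for illustration and are not taken from the original figure): suppose a 32-bit virtual address space, a 30-bit physical address space, and a 4 KB page size.

virtual_addresses = 2 ** 32      # 32-bit virtual address space (assumed)
physical_addresses = 2 ** 30     # 30-bit physical address space (assumed)
page_size = 2 ** 12              # 4 KB pages (assumed)

num_virtual_pages = virtual_addresses // page_size    # 2**20 = 1,048,576
num_physical_pages = physical_addresses // page_size  # 2**18 = 262,144

print(num_virtual_pages, num_physical_pages)

Under these assumptions the virtual address space contains far more pages than physical memory can hold, which is why some virtual pages must reside on disk.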


Virtual and Physical Address
In virtual memory, the address is broken into a virtual page number and a
page offset.

The physical page number constitutes the upper portion of the physical
address, while the page offset, which is not changed, constitutes the lower
portion.

The number of bits in the page offset field determines the page size.

A page table is used to carry out the translation from the virtual page number
to the physical page number.

Page Table
A page table is indexed with the page number from the virtual address. It
maps to a physical address in memory. Each program has its own page table,
which maps the virtual address space of that program to main memory. The
page table may contain entries for pages not present in memory.

The page table itself also resides in main memory. To indicate the location of
the page table in memory, the hardware includes a register that points to the
start of the page table, called the Page Table Base Register (PTBR).

The page table, together with the program counter and the registers,
specifies the state of a program. This state is often referred to as a process. The
process is considered active when it is in possession of the processor;
otherwise, it is considered inactive. If we want to allow another program to
use the processor, we must save this state. The operating system can make a
process active by loading the process’s state, including the program counter,
which will initiate execution at the value of the saved program counter.

Rather than saving the entire page table, the operating system simply loads
the page table register to point to the page table of the process it wants to
make active.

The operating system is responsible for allocating the physical memory and
updating the page tables, so that the virtual address spaces of different
processes do not collide. The use of separate page tables also provides
protection of one process from another.

Because the page table contains a mapping for every possible virtual page, no
tags are required. The index that is used to access the page table consists of
the full block address, which is the virtual page number.

A valid bit is used in each page table entry. If the bit is off, the page is not
present in main memory (but present in disk) and a page fault occurs. If the
bit is on, the page is in memory and the entry contains the physical page
number.
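
The translation described above can be sketched as follows; the page size, the page-table contents, and the way the valid bit is checked are illustrative assumptions:

# Sketch of virtual-to-physical address translation with a page table.
# 4 KB pages are assumed, so the page offset is the low 12 bits.
PAGE_OFFSET_BITS = 12

# Toy page table: virtual page number -> (valid bit, physical page number)
page_table = {
    0x00010: (1, 0x005),   # valid: page is resident in main memory
    0x00011: (0, None),    # invalid: page fault, page resides on disk
}

def translate(virtual_address):
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (0, None))
    if not valid:
        raise RuntimeError("page fault: OS must bring the page into memory")
    # Physical page number forms the upper bits; the offset is unchanged.
    return (ppn << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x00010ABC)))   # -> 0x5abc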

Page Hit
In case of a page hit, the page table entry contains the physical page number.
1. Processor sends the Virtual Address (VA) to the MMU.
2. MMU uses the Page Table Entry Address (PTEA) to access the Page Table in memory.
3. MMU retrieves the Page Table Entry (PTE) from the Page Table.
4. MMU sends the Physical Address (PA) to the L1 cache.
5. L1 cache sends the requested data word to the processor.

Page Fault
If the valid bit for a virtual page is off, a page fault occurs. Control is
transferred to the operating system via the exception mechanism.

Once the operating system gets control, it must find the page in the next level
of the hierarchy and decide where to place the requested page in main
memory.

The space on the disk reserved for the full virtual memory space of a process
is called the swap space. The operating system allocates this swap space.

The operating system also tracks which processes and which virtual
addresses use each physical page. When a page fault occurs, if all the pages
in main memory are in use, the operating system must choose a page to
replace.

The operating system searches for the least recently used (LRU) page,
assuming that a page that has not been used in a long time is less likely to be
needed than a more recently accessed page. The replaced pages are written
to swap space on the disk.
1. Processor sends the Virtual Address (VA) to the MMU.
2. MMU uses the Page Table Entry Address (PTEA) to access the Page Table in memory.
3. MMU retrieves the Page Table Entry (PTE) from the Page Table.
4. The valid bit is zero, so the MMU triggers a page fault exception.
5. The handler identifies a victim page and, if it is dirty, pages it out to disk.
6. The handler pages in the new page and updates the PTE in memory.
7. The handler returns to the original process, restarting the faulting instruction.
