CO Unit 2
1. Multiple Bus Organization:
Registers: These are small, high-speed storage locations used to hold data temporarily.
Common types of registers in this architecture are general-purpose
registers (GPRs), status registers, instruction register (IR), program counter
(PC), memory address register (MAR), and memory data register (MDR).
ALU: The Arithmetic and Logic Unit performs computations such as addition, subtraction,
AND, OR, and so on.
Buses: Multiple buses (e.g., data bus, address bus) carry data between registers,
memory, and the ALU.
Parallel Data Transfer: With multiple buses, different units (e.g., registers, ALU) can
communicate simultaneously, speeding up execution by eliminating the need for
serialized data transfer.
Reduced Instruction Cycle Time: By allowing more than one data transfer per clock
cycle, multiple bus organizations can reduce the overall number of cycles needed to
execute an instruction.
Flexibility: Different components of the processor (like the ALU, registers, or memory)
can operate concurrently on different data without waiting for each other to complete.
2. Hardwired Control:
In processor design, control units are responsible for generating the control signals
needed to manage the operation of the CPU. The hardwired control unit is a type of
control unit that uses fixed logic (usually combinational logic circuits) to generate control
signals based on the current instruction.
Speed: Hardwired control units are faster than microprogrammed control because they
use fixed logic circuits that don't require fetching or interpreting control data from
memory.
Simplicity: Hardwired control is simpler to implement, especially for processors with a
fixed instruction set. The control unit directly generates signals based on the decoded
instruction, typically using decoders and multiplexers.
Limited Flexibility: While hardwired control is fast and simple, it is less flexible
compared to microprogramming because adding new instructions or modifying existing
ones often requires changes to the hardware (i.e., redesigning the control unit).
3. Steps in the Instruction Cycle with Hardwired Control and Multiple Bus Organization:
1. Instruction Fetch:
The PC (Program Counter) is loaded with the address of the instruction.
The address is sent to memory via the address bus.
Memory retrieves the instruction and sends it back through the data bus to
the IR (Instruction Register).
2. Decode:
The opcode of the instruction in IR is decoded by the control unit (via a decoder or a
logic circuit).
Based on the decoded opcode, the control unit sends appropriate control signals to the
relevant parts of the processor. These signals direct the operation of the ALU, memory,
and registers.
3. Execute:
The control unit sends signals to the ALU to perform the operation on the operands
stored in registers.
If the instruction involves memory, the data is transferred via the data bus from
registers to memory, or vice versa, depending on whether it is a load/store operation.
The ALU can also communicate with the registers via the buses to update the contents of
registers after computation.
4. Memory Access (if applicable):
For memory operations (like load or store), the address bus carries the address to be
accessed, and the data bus transfers the data to/from the memory.
5. Write-back:
The result of the ALU computation (or memory fetch) is written back to the register
file using the data bus.
4. Control Signals:
Let’s take a simple instruction like ADD R1, R2, R3, which adds the contents of
registers R2 and R3 and stores the result in R1. The hardwired control process might
involve:
Bus selection: Signals to select which bus should be used for data transfer (e.g.,
between registers, memory, and ALU).
Register load signals: Signals that determine which register should be loaded with
data.
ALU operation signals: Control lines that specify which operation the ALU should
perform (e.g., addition, subtraction, logical operations).
Memory read/write signals: Signals to read data from or write data to memory.
Multiplexer control: Multiplexers are used to select data from multiple sources, and
control signals select which data path is active at any time.
5. Example:
Consider a simple example where we are executing the following instruction:
ADD R1, R2, R3 ; R1 = R2 + R3
Step-by-Step Breakdown:
1. Instruction Fetch:
The PC provides the address to memory.
The instruction is fetched from memory and loaded into the IR.
2. Instruction Decode:
The control unit decodes the opcode (ADD) from the IR.
It sends signals to the ALU to perform addition, and to the buses to route the values
of R2 and R3.
3. ALU Execution:
The control unit activates the ALU to perform the addition of the contents of R2 and R3.
By using multiple buses and hardwired control, the processor can perform
the ADD instruction efficiently, with parallel data transfers between the ALU, registers,
and memory, without having to wait for sequential transfers. This improves execution
speed and reduces the overall clock cycle count per instruction.
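To make this concrete, here is a minimal Python sketch of the decode-to-signal idea for the ADD example above. The signal names and lookup table are invented for illustration; a real hardwired control unit produces these signals with combinational logic (decoders and gates), not software.

```python
# A sketch of hardwired decoding: each opcode selects a fixed control word.
# Signal names (alu_op, dest_load) and the table itself are invented for
# illustration; real hardware derives them with combinational logic.
CONTROL_SIGNALS = {
    "ADD": {"alu_op": "add", "dest_load": True},
    "SUB": {"alu_op": "sub", "dest_load": True},
}

def execute(instruction, regs):
    """Decode a textual instruction and apply its control signals."""
    op, dest, src1, src2 = instruction.replace(",", "").split()
    signals = CONTROL_SIGNALS[op]                 # decode: opcode -> control word
    a, b = regs[src1], regs[src2]                 # operands driven onto two source buses
    result = a + b if signals["alu_op"] == "add" else a - b  # ALU operation
    if signals["dest_load"]:                      # register-load signal asserted
        regs[dest] = result                       # write-back over the result bus
    return regs

regs = {"R1": 0, "R2": 5, "R3": 7}
print(execute("ADD R1, R2, R3", regs))            # {'R1': 12, 'R2': 5, 'R3': 7}
```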
A Complete Processor
Central Processing Unit (CPU)
The CPU (Central Processing Unit) is the brain of the computer: the part that
does most of the work in a computer system. Just as our brain controls our
body and processes information, the CPU carries out instructions from
programs and performs calculations. It is made up of smaller components that
work together to execute tasks, making it the heart of any computing device.
All data processing operations, from simple arithmetic to complex tasks, and
all the important functions of a computer are performed by the CPU. It helps
input and output devices communicate with each other and perform their
respective operations. It also stores input data, intermediate results during
processing, and instructions. The CPU's job is to make sure everything runs
smoothly and efficiently.
What is a CPU?
A Central Processing Unit is the most important component of a computer
system. A CPU is hardware that performs data input/output, processing, and
storage functions for a computer system. A CPU can be installed into a CPU
socket. These sockets are generally located on the motherboard. The CPU can
perform various data processing operations and can store data, instructions,
programs, and intermediate results.
History of CPU
Since 1823, when Baron Jons Jakob Berzelius discovered silicon, which is still
the primary component used in manufacturing CPUs today, the history of the
CPU has experienced numerous significant turning points. The
first transistor was created by John Bardeen, Walter Brattain, and William
Shockley in December 1947. In 1958, the first working integrated circuit was
built by Robert Noyce and Jack Kilby.
The Intel 4004, unveiled in 1971, was the company’s first microprocessor; it
was developed with the help of Ted Hoff. Intel followed with the 8008 CPU in
1972, the 8086 in 1978, and the 8088 in June 1979. The Motorola 68000, a
16/32-bit processor, was also released in 1979. Sun unveiled the SPARC
processor in 1987, and AMD unveiled the AM386 CPU series in March 1991.
In January 1999, Intel introduced the 366 MHz and 400 MHz Celeron processors.
AMD followed in April 2005 with its first dual-core processor, and Intel
introduced the Core 2 Duo processor in 2006. Intel released the first Core i5
desktop processor with four cores in September 2009.
In January 2010, Intel released further processors, including the Core 2 Quad
processor Q9500 and the first Core i3 and i5 mobile and desktop processors.
In June 2017, Intel released the Core i9 desktop processor, and it introduced
its first Core i9 mobile processor in April 2018.
Different Parts of CPU
Now, the CPU consists of 3 major units, which are:
Memory or Storage Unit
Control Unit
ALU(Arithmetic Logic Unit)
In the block diagram of the computer, these three major components are shown
along with their interconnections. Let us discuss these major components in
detail.
Memory or Storage Unit
As the name suggests this unit can store instructions, data, and intermediate
results. The memory unit is responsible for transferring information to other
units of the computer when needed. It is also known as the internal storage
unit, the main memory, the primary storage, or Random Access Memory (RAM).
Its size affects speed, power, and performance. There are two types of memory
in the computer, which are primary memory and secondary memory. Some
main functions of memory units are listed below:
Data and instructions are stored in memory units which are required for
processing.
It also stores the intermediate results of any calculation or task when they
are in process.
The final results of processing are stored in the memory units before these
results are released to an output device for giving the output to the user.
All sorts of inputs and outputs are transmitted through the memory unit.
Control Unit
As the name suggests, the control unit controls the operations of all parts of
the computer, but it does not carry out any data processing operations itself.
It directs the computer system by issuing electrical control signals to execute
instructions already stored in memory: it takes instructions from the memory
unit, decodes them, and then executes them. In this way it controls the
functioning of the computer, and its main task is to maintain the flow of
information across the processor. Some main functions of the control unit are
listed below:
The control unit manages the transfer of data and instructions among the
other parts of the computer.
The control unit is responsible for managing all the units of the computer.
The main task of the control unit is to obtain the instructions or data from
the memory unit, interpret them, and then direct the operation of the
computer accordingly.
The control unit is responsible for communication with input and output
devices for the transfer of data or results from memory.
The control unit is not responsible for processing or storing data.
ALU (Arithmetic Logic Unit)
ALU (Arithmetic Logic Unit) is responsible for performing arithmetic and logical
functions or operations. It consists of two subsections, which are:
Arithmetic Section: By arithmetic operations, we mean operations like
addition, subtraction, multiplication, and division, and all these operations
and functions are performed by ALU. Also, all the complex operations are
done by making repetitive use of the mentioned operations by ALU.
Logic Section: By Logical operations, we mean operations or functions like
selecting, comparing, matching, and merging the data, and all these are
performed by ALU.
Note: The CPU may contain more than one ALU, and ALUs can also be used for
maintaining timers that help run the computer system.
What Does a CPU Do?
The main function of a computer processor is to execute instructions and
produce an output. Fetch, decode, and execute are the fundamental operations
of the CPU, as sketched in the code after this list.
Fetch: First, the CPU gets the instruction: binary data that is passed from
RAM to the CPU.
Decode: Once the instruction enters the CPU, the instruction decoder in the
control unit translates it into signals the other units can act on.
Execute: After the decode step, the instruction is ready to execute; the ALU
performs any required arithmetic or logic.
Store: After the execute step, the result is ready to be stored in memory.
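To illustrate the cycle, here is a toy fetch-decode-execute-store loop in Python. The one-accumulator machine and its instruction set are assumptions made up for this sketch, not a real ISA.

```python
# A toy fetch-decode-execute-store loop. The one-accumulator machine and
# its instruction set are invented purely to illustrate the cycle.
memory = [("LOAD", 7), ("ADD", 3), ("STORE", 0), ("HALT", None)]
acc, pc, running = 0, 0, True

while running:
    opcode, operand = memory[pc]          # Fetch the instruction at PC
    pc += 1                               # advance PC to the next instruction
    if opcode == "LOAD":                  # Decode the opcode, then Execute
        acc = operand                     # load a constant into the accumulator
    elif opcode == "ADD":
        acc += operand                    # arithmetic done by the "ALU"
    elif opcode == "STORE":
        memory[operand] = ("DATA", acc)   # Store the result back to memory
    elif opcode == "HALT":
        running = False

print(acc, memory[0])                     # 10 ('DATA', 10)
```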
Types of CPU
We have three different types of CPU:
Single-Core CPU: The oldest type of computer CPU is the single-core CPU.
These CPUs were used in the 1970s. They have only a single core to perform
all operations, so a single-core CPU can process only one operation at a time
and is not suitable for multitasking.
Dual-Core CPU: Dual-core CPUs contain a single integrated circuit with two
cores. Each core has its own cache and controller, and these controllers and
caches work as a single unit. Dual-core CPUs can work faster than single-core
processors.
Quad-Core CPU: Quad-core CPUs place four independent cores within a single
integrated circuit (IC), the equivalent of two dual-core processors on one
chip. These cores read and execute instructions independently. A quad-core
CPU increases the overall speed of programs, delivering higher performance
without even boosting the overall clock speed.
What is CPU Performance?
CPU performance is how fast a computer’s processor (CPU) can complete tasks.
It is measured by the number of instructions completed in one second. Its
performance depends on the processor’s clock speed and other factors like its
design and the size of its cache.
What are Computer Programs and Where are They Stored?
A computer program is a set of instructions written by a programmer that
tells a computer what to do. For example, a web browser or a word processor
is a program; performing math operations on a computer, or clicking and
selecting items with a mouse or touchpad, is likewise handled by programs.
Storage of Programs
There are two ways of storing programs on the computer memory:
Permanent Storage: Programs are stored permanently on storage devices like
an HDD or SSD.
Temporary Storage: When a program is running on the CPU, its data is loaded
into RAM from the HDD or SSD. This storage is temporary because RAM is
volatile: it loses all data when the power is turned off.
Advantages
Versatility: CPUs can handle complex tasks, from basic calculations to
managing the operating system.
Performance: Modern CPUs are very fast and can perform billions of
calculations per second.
Multi-core: CPUs have multiple cores and can handle multiple tasks
simultaneously.
Compatibility: CPUs are designed to be compatible with a wide range of
software, which lets a single CPU run many different applications.
Disadvantages
Overheating: CPUs generate a lot of heat while performing complex tasks.
This requires effective cooling solutions, such as fans or liquid cooling
systems.
Power Consumption: High-performance CPUs can consume a very high amount of
power, which leads to higher electricity bills and the need for a robust
power supply.
Cost: High-performance CPUs can be expensive, which can be a barrier for
some users or applications that need high computing power.
Limited Parallel Processing: While multi-core CPUs can handle multiple
tasks at once, they are still not as efficient at parallel processing as
specialized hardware like GPUs (Graphics Processing Units), which are
designed for handling many tasks simultaneously.
Microprogrammed Control
Microprogrammed control is a method used in the design of the control unit of a
computer’s processor. Unlike hardwired control, where control signals are generated
directly by combinational logic, microprogrammed control uses a set
of microinstructions stored in memory (usually in Control Memory) to generate
control signals.
1. Microinstructions
2. Microprogram Sequencing
3. Wide Branch Addressing
4. Microinstructions with Next-Address Field
5. Prefetching Microinstructions
6. Emulation
1. Microinstructions
A microinstruction is a low-level instruction that defines a control signal or a set of
control signals for a specific operation within the processor. These control signals are
necessary to coordinate the operation of the processor's functional units (such as the
ALU, registers, buses, and memory).
Control Bits: Each bit of the microinstruction controls a specific part of the processor
(e.g., ALU control, register load signals, memory read/write signals).
Address Field: Specifies the address of the next microinstruction in the sequence,
which is used for the microprogram sequencing.
Example: A microinstruction could specify the operation "load the value of register A
into the ALU" by setting control bits that activate the appropriate register and ALU paths.
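As a concrete illustration, the sketch below packs and unpacks a hypothetical 16-bit microinstruction word with ALU, register-load, memory, and next-address fields. The format and field widths are assumptions for this sketch; real microinstruction formats vary widely between processors.

```python
# Packing and unpacking a hypothetical 16-bit microinstruction word:
#   bits 15-12: ALU control    bits 11-8: register-load signals
#   bits 7-6:   memory R/W     bits 5-0:  next-address field
# These field widths are assumptions made for this sketch.

def encode(alu, reg_load, mem, next_addr):
    return (alu << 12) | (reg_load << 8) | (mem << 6) | next_addr

def decode(word):
    return {
        "alu": (word >> 12) & 0xF,        # which ALU operation to perform
        "reg_load": (word >> 8) & 0xF,    # which registers to load
        "mem": (word >> 6) & 0x3,         # memory read/write strobes
        "next_addr": word & 0x3F,         # address of the next microinstruction
    }

word = encode(alu=0x2, reg_load=0x1, mem=0x0, next_addr=0x05)
print(hex(word), decode(word))            # 0x2105 and the decoded fields
```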
2. Microprogram Sequencing
Microprogram sequencing refers to the method used to determine the sequence in
which microinstructions are executed. When an instruction is fetched and decoded, the
control unit must determine which microinstructions need to be executed in order to
perform the required operations.
1. Linear Sequence: The microinstructions are executed sequentially, one after the other.
2. Branching Sequence: Some microinstructions may cause a jump to another sequence
of microinstructions. This is necessary for handling control flow in the program (such as
branching, loops, or subroutine calls).
Key Consideration: The sequencer decides the next address by either incrementing
the address (in the case of a linear sequence) or using a jump mechanism (in the case of
branching).
4. Microinstructions with Next-Address Field
The next-address field directly specifies the address of the next
microinstruction, effectively controlling the flow of execution in the
microprogram.
This field is crucial for branching within the microprogram or for sequences that depend
on the outcome of an operation (such as the result of an arithmetic operation or a
comparison).
For example, a microinstruction could instruct the ALU to add two registers, and then use
the next-address field to either continue the execution of the next sequential
microinstruction or jump to a different part of the microprogram based on a conditional
test.
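The sketch below models a tiny micro-sequencer along these lines: each microinstruction carries a next-address field, and an optional branch target lets the flow depend on an ALU result. The microinstruction format and control-memory contents are invented for illustration.

```python
# A tiny micro-sequencer. Each microinstruction is a tuple of
# (action, next_addr, branch_addr); the format and control-memory
# contents are invented for illustration.
control_memory = {
    0: ("load_operands", 1, None),
    1: ("alu_add",       2, None),
    2: ("test_result",   3, 5),      # branch to 5 if the ALU result is zero
    3: ("write_back",    4, None),
    4: ("end",        None, None),
    5: ("handle_zero",   4, None),
}

def run(alu_result):
    addr, trace = 0, []
    while addr is not None:
        action, next_addr, branch_addr = control_memory[addr]
        trace.append(action)
        # The sequencer chooses between the next-address field and the
        # branch target, depending on the condition flag.
        take_branch = branch_addr is not None and alu_result == 0
        addr = branch_addr if take_branch else next_addr
    return trace

print(run(alu_result=7))   # sequential path: ...test_result, write_back, end
print(run(alu_result=0))   # branch taken: ...test_result, handle_zero, end
```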
5. Prefetching Microinstructions
Prefetching microinstructions is a technique used to improve the performance of
microprogrammed control by fetching the next microinstruction from control
memory before it is actually needed, overlapping its fetch with the execution
of the current microinstruction.
Benefits of Prefetching: prefetching hides the control-memory access time,
shortening the effective cycle time of the control unit.
6. Emulation
Emulation refers to the ability of a microprogrammed control system to simulate the
behavior of another system or instruction set architecture (ISA). This allows one
hardware system to execute programs written for a different system.
Example of Emulation: because the control logic resides in control memory, a
microprogrammed processor can be loaded with a microprogram that interprets a
different machine’s instruction set; early IBM System/360 models, for example,
used this technique to run programs written for older IBM computers.
Direct-Mapped Cache
In a direct-mapped cache, each memory location can be mapped to exactly one
cache line. This is the simplest form of cache mapping.
The memory address is divided into three parts:
Tag: Identifies which block of memory the cache line corresponds to.
Index: Specifies which cache line the data should be placed into.
Block offset: Indicates the specific byte within a cache block (or line).
The cache has a fixed number of lines (or slots), and each memory address maps to
exactly one of these lines.
Example: For a 32-bit memory address and a cache with 64 lines, the memory
address could be divided into a Tag, an Index, and a Block offset; with
256-byte blocks this gives an 18-bit tag, a 6-bit index, and an 8-bit block
offset, the split used in the worked example below.
Fully Associative Cache
In a fully associative cache, a memory block can be placed in any cache line.
There is no predefined index for where data will be placed; instead, the
entire cache is searched for a match.
This type of cache is more flexible and can potentially have fewer cache misses, but it
requires more complex hardware to perform the search across all cache lines.
Example:
A 32-bit memory address might only contain a Tag and a Block offset. The index part
of the address is omitted since any block can be stored in any cache line.
Set-Associative Cache
In a set-associative cache, the cache is divided into sets, and a memory block
maps to exactly one set but may occupy any line within that set. For example,
a 4-way set-associative cache means that each set has 4 cache lines, and a
memory address is mapped to one of these sets but can be placed in any line
within the set.
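A sketch of the set-associative lookup just described: the index selects a set, and every line in that set is searched for a matching tag. The set count, block size, and the simple FIFO eviction within a set are assumptions made for this sketch.

```python
# A sketch of a 4-way set-associative lookup: the index selects a set,
# then all 4 lines in that set are searched for a matching tag. Sizes
# (16 sets, 64-byte blocks) are invented for illustration.
NUM_SETS, WAYS, OFFSET_BITS = 16, 4, 6

# cache[set_index] is a list of up to WAYS stored tags
cache = [[] for _ in range(NUM_SETS)]

def access(addr):
    index = (addr >> OFFSET_BITS) % NUM_SETS   # which set the block maps to
    tag = addr >> (OFFSET_BITS + 4)            # remaining high bits (log2(16) = 4)
    ways = cache[index]
    if tag in ways:                            # search every line in the set
        return "hit"
    if len(ways) >= WAYS:
        ways.pop(0)                            # evict (FIFO order, for simplicity)
    ways.append(tag)
    return "miss"

print(access(0x1234))   # miss (cold cache)
print(access(0x1234))   # hit  (same block)
```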
Cache Replacement Algorithms
The LRU (Least Recently Used) algorithm evicts the cache block that has not
been used for the longest period of time. The idea is that blocks that haven’t
been used recently are less likely to be used again soon.
LRU typically requires maintaining a record of access times or ordering, which can be
done either using a counter or a stack.
The FIFO (First In, First Out) algorithm evicts the block that was loaded into the cache first. It maintains a
queue of cache blocks, and the oldest block (at the front of the queue) is replaced when
a new block needs to be loaded.
FIFO is simple to implement, but it does not always perform optimally.
The Optimal algorithm evicts the block that will not be used for the longest period of
time in the future. While optimal, it’s not practical for real systems because it requires
knowledge of future memory accesses.
It is often used as a benchmark to compare the performance of other algorithms.
The LFU (Least Frequently Used) algorithm replaces the block that has been accessed the least frequently. It
maintains a count of accesses to each cache block, and the block with the lowest access
count is evicted.
The Random algorithm randomly selects a cache line to replace when a new data block
must be loaded into the cache. It’s simple to implement but may result in poor cache
performance compared to other algorithms.
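Here is a minimal sketch of the LRU policy described above, using Python's OrderedDict so that insertion order doubles as the recency record; the capacity and access pattern are made up for illustration.

```python
from collections import OrderedDict

# A minimal LRU cache: OrderedDict order doubles as the recency record.
# Capacity and the access pattern are made up for illustration.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)    # hit: mark most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used
        self.blocks[block] = True             # load the new block
        return "miss"

cache = LRUCache(capacity=3)
for block in ["A", "B", "C", "A", "D", "B"]:
    print(block, cache.access(block))
# A, B, C miss; A hits; D misses and evicts B; B misses and evicts C
```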
Example:
Assume a 32-bit address, a cache with 64 lines, and a block size of 256 bytes.
The address fields are then:
Tag: 18 bits
Index: 6 bits (since the cache has 64 lines, we need log2(64) = 6 bits for
indexing)
Block offset: 8 bits (since the block size is 256 bytes, we need
log2(256) = 8 bits for the block offset)
Address Breakdown:
The Tag (18 bits) is used to identify which block of memory is being referenced.
The Index (6 bits) specifies which line of the cache the block will be placed in.
The Block offset (8 bits) tells which byte within the 256-byte block we are interested in.
Example Mapping:
For the address 0x12345678, the tag is 0x48D1, the index is 0x16, and the
block offset is 0x78. This means the data for address 0x12345678 would be
placed in cache line 0x16, and the requested byte sits at offset 0x78 within
that block.
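The field extraction in this worked example can be checked with a few lines of code; the constants below simply restate the 8-bit offset and 6-bit index assumed above.

```python
# Checking the worked example: 8-bit block offset, 6-bit index, and the
# remaining 18 bits of a 32-bit address form the tag.
OFFSET_BITS, INDEX_BITS = 8, 6

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)                  # low 8 bits
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)   # next 6 bits
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                  # top 18 bits
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset))   # 0x48d1 0x16 0x78
```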
Memory Interleaving
How It Works:
Memory interleaving divides the system's main memory into multiple banks (e.g., 2-way,
4-way, or 8-way interleaving), with each bank containing a portion of the address space.
Data from consecutive memory addresses is placed in different memory banks, so when
the CPU needs to access consecutive data, it can request data from multiple banks
simultaneously.
This parallelism minimizes waiting time and maximizes throughput, reducing
the memory bottleneck.
Types of Interleaving: common schemes are 2-way, 4-way, and 8-way interleaving, named for the number of banks across which consecutive addresses are spread.
Benefits:
Increased Throughput: With multiple banks, the memory system can handle more
than one request at a time, reducing wait times.
Reduced Latency: By splitting the memory load, interleaving can speed up the overall
memory access time, as the CPU can simultaneously access data from multiple locations.
Example:
In a 2-way interleaved memory system, addresses are divided into two groups:
addresses 0, 2, 4, 6 are mapped to Bank 1, and addresses 1, 3, 5, 7 are mapped to Bank
2. By accessing data from two banks at once, the CPU reduces the latency of sequential
accesses.
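A sketch of this low-order interleaving scheme; banks are numbered from 0 here rather than Bank 1/Bank 2 as in the example, but the mapping is the same.

```python
# Low-order 2-way interleaving: the least significant address bit picks
# the bank, so consecutive addresses alternate banks and can overlap.
NUM_BANKS = 2

def bank_for(addr):
    return addr % NUM_BANKS

for addr in range(8):
    print(f"address {addr} -> bank {bank_for(addr)}")
# addresses 0, 2, 4, 6 -> bank 0; addresses 1, 3, 5, 7 -> bank 1
```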
Hit Rate
The hit rate is the percentage of memory accesses that are satisfied by the cache,
meaning the requested data is already present in the cache. A high hit rate means that
the processor can quickly access data without needing to fetch it from slower main
memory.
Formula:
Hit Rate = Cache Hits / Total Memory Accesses
Impact: A high hit rate reduces the need to access main memory, leading to faster data
retrieval and better performance.
Miss Penalty
The miss penalty is the additional time it takes to retrieve data from main memory (or
possibly from secondary storage like a hard disk) when a cache miss occurs. This penalty
occurs because accessing main memory is much slower than accessing the cache, and a
miss results in a delay that must be accounted for in performance.
Formula:
Miss Penalty = Time to fetch data from main memory + Time to load data into cache
Impact: A higher miss penalty means longer delays in fetching data, which directly
impacts system performance. Therefore, reducing miss penalties is a key design
consideration.
Overall Performance:
High Hit Rate: Fewer accesses to slower memory, leading to faster data retrieval and
reduced processor idle time.
Low Miss Penalty: Minimizes the time spent waiting for data to be retrieved from
memory, reducing performance bottlenecks.
Example:
If the hit rate is 90%, 90% of memory accesses are served by the cache, and only 10%
require accessing slower memory. If the miss penalty is high (e.g., 100 CPU cycles to
fetch data from memory), the system will still experience slowdowns despite the high hit
rate.
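A common way to combine these two quantities is the average memory access time, AMAT = hit time + miss rate x miss penalty. The sketch below applies it to the example's numbers, with the 1-cycle hit time being an assumption not stated above.

```python
# Average memory access time (AMAT) for the example above, assuming a
# 1-cycle cache hit time (the hit time is an assumption, not given).
hit_time, hit_rate, miss_penalty = 1, 0.90, 100

amat = hit_time + (1 - hit_rate) * miss_penalty
print(round(amat, 2))   # 11.0 cycles on average, despite the 90% hit rate
```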
There are typically multiple levels of cache (L1, L2, and L3) to optimize performance
further:
(a) L1 Cache
Location: Built into each processor core.
Size: The smallest level (typically tens of KB).
Speed: The fastest cache, accessed in roughly 1-3 CPU cycles.
Role: Holds the data and instructions the core is actively using.
(b) L2 Cache
Location: Typically located on the processor chip, but may also be located near the
processor.
Size: Larger than L1 (typically 128 KB to several MB).
Speed: Slower than L1 but still faster than main memory (3-6 CPU cycles).
Role: Holds data and instructions that are not in the L1 cache but are likely to be
accessed soon.
(c) L3 Cache
Location: On the processor chip, shared among all cores.
Size: The largest cache level (several MB to tens of MB).
Speed: Slower than L1 and L2, but still much faster than main memory.
Role: Acts as a shared last-level cache for data that any core may need.
Benefits of this on-chip cache organization include:
1. Reduced Latency: The closer a cache is to the CPU, the faster the data retrieval,
reducing delays in memory access.
2. Higher Data Throughput: Caches on the processor chip can handle more data in
parallel, ensuring that the CPU spends less time waiting for memory access.
3. Improved Performance: With data readily available in caches, the processor can
execute instructions more efficiently, improving overall performance.
Cache Hierarchy:
L1, L2, and L3 Caches form a hierarchy, where data is first searched in L1, then L2,
and finally L3. If the data is not found in any of these caches, it will be fetched from main
memory.
Example:
If the CPU needs data that is frequently used, L1 cache will provide the fastest response.
If it's not in L1, L2 cache will be checked, and if it's not there, L3 cache is consulted, and
then the main memory. This multi-level cache hierarchy ensures that data access
times are minimized.
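A sketch of this search order, with invented cache contents and per-level latencies, just to show how each missed level adds its latency before the next one is consulted:

```python
# A sketch of the L1 -> L2 -> L3 -> main-memory search order. Cache
# contents and per-level latencies are invented for illustration.
hierarchy = [
    ("L1",  {0x10},             1),
    ("L2",  {0x10, 0x20},       4),
    ("L3",  {0x10, 0x20, 0x30}, 20),
    ("RAM", None,               100),   # main memory always has the data
]

def lookup(block):
    cycles = 0
    for level, contents, latency in hierarchy:
        cycles += latency                # each level checked adds its latency
        if contents is None or block in contents:
            return level, cycles

print(lookup(0x20))   # ('L2', 5): found after checking L1 (1) + L2 (4)
print(lookup(0x40))   # ('RAM', 125): misses every cache level
```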
Problem: In a multi-core processor, each core has its own cache. This can lead to cache
coherence problems where multiple caches contain different versions of the same
data.
Solution: Cache coherence protocols like MESI (Modified, Exclusive, Shared,
Invalid) ensure that all caches in the system maintain a consistent view of the data.
Modified: The cache line has been updated, so main memory’s copy is stale;
this cache holds the only up-to-date copy.
Exclusive: The line is present in only one cache and matches main memory.
Shared: The line is present in multiple caches and is unmodified.
Invalid: The line is no longer valid because the data has been updated elsewhere.
Common cache replacement policies, in summary:
LRU (Least Recently Used): Replaces the least recently accessed cache block when a
new one needs to be loaded.
FIFO (First In, First Out): Replaces the oldest cache block.
Random Replacement: Chooses a cache block to replace at random.
Optimal Replacement: Replaces the cache block that will not be used for the longest
period in the future.
Virtual Memory
Cached Pages
These are those allocated pages that are currently cached in physical
memory. They are also present on disk.
Uncached Pages
These are those allocated pages that are not cached in physical memory. They
are present on disk.
Page Size
The physical page number constitutes the upper portion of the physical
address, while the page offset, which is not changed, constitutes the lower
portion.
The number of bits in the page offset field determines the page size.
A page table is used to carry out the translation from the virtual page number
to the physical page number.
Page Table
A page table is indexed with the page number from the virtual address. It
maps to a physical address in memory. Each program has its own page table,
which maps the virtual address space of that program to main memory. The
page table may contain entries for pages not present in memory.
The page table itself also resides in main memory. To indicate the location of
the page table in memory, the hardware includes a register that points to the
start of the page table; called the Page Table Base Register (PTBR).
The page table, together with the program counter and the registers,
specifies the state of a program. This state is often referred to as a process. The
process is considered active when it is in possession of the processor;
otherwise, it is considered inactive. If we want to allow another program to
use the processor, we must save this state. The operating system can make a
process active by loading the process’s state, including the program counter,
which will initiate execution at the value of the saved program counter.
Rather than saving the entire page table, the operating system simply loads
the page table register to point to the page table of the process it wants to
make active.
The operating system is responsible for allocating the physical memory and
updating the page tables, so that the virtual address spaces of different
processes do not collide. The use of separate page tables also provides
protection of one process from another.
Because the page table contains a mapping for every possible virtual page, no
tags are required. The index that is used to access the page table consists of
the full block address, which is the virtual page number.
A valid bit is used in each page table entry. If the bit is off, the page is not
present in main memory (but present in disk) and a page fault occurs. If the
bit is on, the page is in memory and the entry contains the physical page
number.
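A sketch of this translation path, with a made-up page table and an assumed 4 KB page size (12-bit offset); a cleared valid bit takes the page-fault path described next.

```python
# Virtual-to-physical translation with a valid bit, assuming 4 KB pages
# (a 12-bit page offset). The page-table contents are made up.
PAGE_OFFSET_BITS = 12

page_table = {        # virtual page number -> (valid bit, physical page number)
    0: (1, 7),
    1: (0, None),     # valid bit off: the page is on disk, not in memory
    2: (1, 3),
}

def translate(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS                    # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)     # offset is unchanged
    valid, ppn = page_table[vpn]
    if not valid:
        raise RuntimeError("page fault: OS must bring the page in from swap")
    return (ppn << PAGE_OFFSET_BITS) | offset          # physical address

print(hex(translate(0x0ABC)))    # VPN 0 maps to PPN 7, giving 0x7abc
try:
    translate(0x1ABC)            # VPN 1 has its valid bit off
except RuntimeError as err:
    print(err)
```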
Page Hit
In case of a page hit, the page table entry contains the physical page number.
1. The processor sends the Virtual Address (VA) to the MMU.
2. The MMU uses the Page Table Entry Address (PTEA) to access the page table in memory.
3. The MMU retrieves the Page Table Entry (PTE) from the page table.
4. The MMU sends the Physical Address (PA) to the L1 cache.
5. The L1 cache sends the requested data word to the processor.
Page Fault
If the valid bit for a virtual page is off, a page fault occurs. Control is
transferred to the operating system via the exception mechanism.
Once the operating system gets control, it must find the page in the next level
of the hierarchy and decide where to place the requested page in main
memory.
The space on the disk reserved for the full virtual memory space of a process
is called the swap space. The operating system allocates this swap space.
The operating system also tracks which processes and which virtual
addresses use each physical page. When a page fault occurs, if all the pages
in main memory are in use, the operating system must choose a page to
replace.
The operating system searches for the least recently used (LRU) page,
assuming that a page that has not been used in a long time is less likely to be
needed than a more recently accessed page. The replaced pages are written
to swap space on the disk.
1. The processor sends the Virtual Address (VA) to the MMU.
2. The MMU uses the Page Table Entry Address (PTEA) to access the page table in memory.
3. The MMU retrieves the Page Table Entry (PTE) from the page table.
4. The valid bit is zero, so the MMU triggers a page fault exception.
5. The handler identifies a victim page and, if it is dirty, pages it out to disk.
6. The handler pages in the new page and updates the PTE in memory.
7. The handler returns to the original process, restarting the faulting instruction.