Computer System
Introduction to Computer System
Introduction :: Computer Organization and Architecture (1)
Computer technology has made incredible improvements over the past half century. In the early part of
computer evolution there were no stored-program computers; computational power was limited, and on
top of that, the machines were very large.
Today, a personal computer has more computational power, more main memory, and more disk storage; it
is smaller in size and available at an affordable cost.
This rapid rate of improvement has come both from advances in the technology used to build computers
and from innovation in computer design. In this course we will mainly deal with the innovation in
computer design.
The task that the computer designer handles is a complex one: Determine what attributes are important for
a new machine, then design a machine to maximize performance while staying within cost constraints.
This task has many aspects, including instruction set design, functional organization, logic design, and
implementation.
While examining the task of computer design, two terms come into the picture: computer organization
and computer architecture.
It is difficult to give precise definitions for the terms Computer Organization and Computer
Architecture. But while describing a computer system, we come across these terms, and in the
literature, computer scientists try to make a distinction between the two.
Computer architecture refers to those parameters of a computer system that are visible to a
programmer or those parameters that have a direct impact on the logical execution of a
program. Examples of architectural attributes include the instruction set, the number of bits
used to represent different data types, I/O mechanisms, and techniques for addressing memory.
Computer organization refers to the operational units and their interconnections that realize the
architectural specifications. Examples of organizational attributes include those hardware
details transparent to the programmer, such as control signals, interfaces between the computer
and peripherals, and the memory technology used.
In this course we will touch upon all those factors and finally show how these attributes
contribute to building a complete computer system.
The model of a computer can be described by four basic units in high-level abstraction. These
basic units are:
A. Central Processor Unit (CPU)
B. Input Unit
C. Output Unit
D. Memory Unit
Basic Computer Model and the Different Units of a Computer
A. Central Processor Unit (CPU) :
• The program control unit has a set of registers and a control circuit to generate control signals.
• In addition, the CPU may have some additional registers for the temporary storage of data.
B. Input Unit :
With the help of the input unit, data from outside can be supplied to the computer. A program or data is
read into main storage from an input device or secondary storage under the control of a CPU input instruction.
Examples of input devices: keyboard, mouse, hard disk, CD-ROM drive, scanner, microphone, digital
camera, etc.
C. Output Unit :
With the help of the output unit, computer results can be provided to the user or stored permanently
in a storage device for future use. Output data from main storage go to an output device under the
control of CPU output instructions.
D. Memory Unit :
The memory unit is used to store data and programs. The CPU can work with the information stored
in the memory unit. This memory unit is termed primary memory or main memory. These are basically
semiconductor memories.
There is another kind of storage device, apart from primary or main memory, which is known
as secondary memory.
Secondary memories are non-volatile and are used for permanent storage of data and programs.
Hard disk, floppy disk, and magnetic tape are magnetic devices; CD/DVD-ROM is an optical device.
Basic Working Principle of a Computer (2)
Before going into the details of the working principle of a computer, we will analyse how a computer
works with the help of a small hypothetical computer.
In this small computer, we do not consider the input and output units. We consider only the CPU and
the memory module. Assume that somehow we have stored the program and data in main memory. We will
see how the CPU can perform the job depending on the program stored in main memory.
Our assumption is that students understand common terms like program, CPU, and memory without
knowing the exact details.
Consider the Arithmetic and Logic Unit (ALU) of the Central Processing Unit :
Consider an ALU which can perform four arithmetic operations and four logical operations.
To distinguish between arithmetic and logical operations, we may use a signal line:
0 on that signal line represents an arithmetic operation, and
1 on that signal line represents a logical operation.
In a similar manner, we need another two signal lines to distinguish among the four operations of each group.
The different operations and their binary codes are as follows:

Arithmetic          Logical
000  ADD            100  OR
001  SUB            101  AND
010  MULT           110  NAND
011  DIV            111  ADD
Consider the part of the control unit whose task is to generate the appropriate signal at the right moment.
There is an instruction decoder in the CPU which decodes this information in such a way that the computer
can perform the desired task.
A simple model of the decoder is as follows: there are three input lines to the decoder, and
correspondingly it generates eight output lines. Depending on the input combination, only one of the
output signals will be activated, and it is used to indicate the corresponding ALU operation.
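The behaviour of such a 3-to-8 decoder can be sketched in a few lines of Python (an illustrative model added here, not part of the original course material):

```python
def decode_3_to_8(b2, b1, b0):
    """3-to-8 decoder: three input bits activate exactly one of
    eight output lines (a behavioural sketch, not a gate-level model)."""
    index = (b2 << 2) | (b1 << 1) | b0   # binary value of the three inputs
    outputs = [0] * 8
    outputs[index] = 1                   # exactly one output line goes high
    return outputs

# Opcode 010 (MULT in the table above) activates output line 2 only.
print(decode_3_to_8(0, 1, 0))   # [0, 0, 1, 0, 0, 0, 0, 0]
```

For every one of the eight input combinations, exactly one output line is high, which is precisely the property the ALU needs to select one operation.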
The computer needs storage space to hold information. Some of it is inside the CPU, in the form of
registers. The other, bigger chunk of storage space is known as primary memory or main memory.
The CPU can work with the information available in main memory only.
To access data from memory, we need two special registers: one is known as the Memory Data
Register (MDR), and the other is the Memory Address Register (MAR).
Data and programs are stored in main memory. While executing a program, the CPU brings
instructions and data from main memory and performs the tasks as per the instructions fetched from
memory. After completing an operation, the CPU stores the result back into memory.
In the next section, we discuss the memory organization of our small machine.
Main Memory Organization (3)
The main memory unit is the storage unit. There are several locations for storing information in the main
memory module. The capacity of a memory module is specified by the number of memory locations and the
amount of information stored in each location.
A memory module of capacity 16 X 4 indicates that there are 16 locations in the memory module, and in
each location we can store 4 bits of information.
We have to know how to indicate or point to a specific memory location. This is done by the address of the memory location.
READ operation : this operation retrieves data from memory and brings it to a CPU register.
WRITE operation : this operation stores data from a CPU register into a memory location.
We need some mechanism to distinguish these two operations READ and WRITE.
With the help of one signal line, we can differentiate these two operations: if the value on this signal line is
0, we perform a READ operation, and if it is
1, a WRITE operation.
To transfer data between the CPU and the memory module, we need some connection. This is termed the
DATA BUS.
The size of the data bus indicates how many bits we can transfer at a time. The size of the data bus is mainly
determined by the data storage capacity of each location of the memory module.
We have to resolve the issue of how to specify a particular memory location where we want to store our data
or from where we want to retrieve data.
This is done by the memory address. Each location can be specified with the help of a binary address.
If we use 4 signal lines, we have 16 different combinations on these four lines, provided we use two signal
values only (say 0 and 1).
So, to distinguish 16 locations, we need four signal lines. The signal lines used to identify a memory
location are termed the ADDRESS BUS.
The size of the address bus depends on the memory size. For a memory module with a capacity of 2^n
locations, we need n address lines, that is, an address bus of size n.
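The relationship between memory capacity and address-bus width can be checked with a small sketch (an illustration added here, using Python's standard math module):

```python
import math

def address_lines_needed(locations):
    """Smallest n such that 2**n >= locations, i.e. the width
    of the address bus needed to distinguish that many locations."""
    return math.ceil(math.log2(locations))

print(address_lines_needed(16))      # 4 lines address 16 locations
print(address_lines_needed(65536))   # 16 lines address 64K locations
```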
We use an address decoder to decode the address present on the address bus.
For example, consider a memory module of 16 locations, where each location can store 4 bits of
information.
The size of the address bus is 4 bits and the size of the data bus is 4 bits.
The size of the address decoder is 4 X 16.
If the content of the address bus is 0101, the content of the data bus is 1100, and R/W = 1, then 1100
will be written into location 5.
If the content of the address bus is 1011 and R/W = 0, then the contents of location 1011 will be
placed on the data bus.
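The behaviour of this 16 X 4 module can be sketched as a small simulation (an illustrative model added here; the method name `access` and the bit-string encoding of the buses are choices made for this sketch, not from the text):

```python
class MemoryModule:
    """A 16 x 4 memory module: 4-bit address bus, 4-bit data bus,
    and an R/W line where 0 = READ and 1 = WRITE, following the
    convention used in the text."""

    def __init__(self):
        self.cells = ["0000"] * 16        # 16 locations of 4 bits each

    def access(self, address_bus, rw, data_bus=None):
        location = int(address_bus, 2)    # the address decoder's job
        if rw == 1:                       # WRITE: data bus -> memory
            self.cells[location] = data_bus
            return None
        return self.cells[location]       # READ: memory -> data bus

m = MemoryModule()
m.access("0101", 1, "1100")    # write 1100 into location 5
print(m.access("0101", 0))     # read it back: 1100
```

This mirrors the two examples above: address 0101 with R/W = 1 writes the data-bus value into location 5, and a subsequent read with R/W = 0 places that value back on the data bus.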
In the next section, we will explain how to perform memory access operations in our small hypothetical computer.
Memory Instructions
We need some more instructions to work with the computer. Apart from the instructions needed to perform
tasks inside the CPU, we need instructions for data transfer from main memory to the CPU and vice
versa.
In our hypothetical machine, we use three signal lines to identify a particular instruction. If we want to
include more instructions, we need an additional signal line.
With this additional signal line, we can go up to 16 instructions. When the signal on this new line is 0, it
indicates an ALU operation. When the signal is 1, it indicates one of 8 new instructions. So, we can
design 8 new memory access instructions.
We have added 6 new instructions. Two codes are still unused, and these can be reserved for other
purposes. We show them as NOP, meaning No Operation.
We have seen that for an ALU operation, the instruction decoder generates the signal for the appropriate
ALU operation.
Apart from that, we need many more signals for the proper functioning of the computer.
Therefore, we need a module known as the control unit, which is a part of the CPU. The control unit is
responsible for generating the appropriate signals.
For example, for the LDAI instruction, the control unit must generate a signal which enables register A,
so that the incoming data is stored into register A.
One major task is to design the control unit so that it generates the appropriate signal at the appropriate
time for the proper functioning of the computer.
Consider a simple problem: add two numbers and store the result in memory. Say we want
to add 7 to 5.
To solve this problem on a computer, we have to write a computer program. The program is
machine specific, and it is related to the instruction set of the machine.
For our hypothetical machine, the program is written using the ALU and memory access instructions
introduced above.
Consider another example: say the first number is stored in memory location 13 and the second
in memory location 14. Write a program to add the contents of memory locations 13
and 14 and store the result in memory location 15.
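A sketch of how such a program might look and run follows. Note the hedge: only LDAI is named in the text, so the mnemonics LDA, ADD, and STA below are hypothetical names invented here for illustration, not the machine's actual instruction set:

```python
def run(program, memory):
    """Tiny interpreter for a hypothetical accumulator machine.
    LDAI appears in the text; LDA, ADD, STA are assumed names."""
    acc = 0                                   # accumulator register A
    for op, arg in program:
        if op == "LDAI":
            acc = arg                         # load an immediate value into A
        elif op == "LDA":
            acc = memory[arg]                 # load A from a memory location
        elif op == "ADD":
            acc += memory[arg]                # add a memory operand to A
        elif op == "STA":
            memory[arg] = acc                 # store A into a memory location
    return memory

memory = {13: 7, 14: 5}
run([("LDA", 13), ("ADD", 14), ("STA", 15)], memory)
print(memory[15])   # 12
```

The same three-instruction pattern (load, add, store) solves both examples; only the operand addressing differs.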
One question still remains unanswered: how do we store the program or data in main memory in the
first place? Only after the program and data are in main memory can the CPU execute the program. For
that we need some more instructions.
We need some instructions to perform the input tasks. These instructions are responsible for taking
input data from input devices and storing it in main memory. For example, instructions
are needed to take input from the keyboard.
We need some other instructions to perform the output tasks. These instructions are responsible for
delivering results to output devices. For example, instructions are needed to send results to the printer.
We have seen that the number of instructions that can be provided in a computer depends on the
signal lines used to encode the instruction, which is basically the size of the storage
devices of the computer.
For uniformity, we use the same size for all storage spaces, which are known as registers. If we work
with a 16-bit machine, the total number of instructions that can be encoded is 2^16.
The model that we have described here is known as the Von Neumann stored-program concept.
First we have to store all the instructions of a program in main memory, and the CPU works
with the contents stored in main memory. Instructions are executed one after another.
We have explained the concept of the computer at a very high level of abstraction, omitting most of
the details.
As the course progresses, we will explain the exact working principles of the computer in more detail.
Present-day digital computers are based on the stored-program concept introduced by Von
Neumann. In this stored-program concept, programs and data are stored in a storage
unit, called memory, separate from the CPU.
The Central Processing Unit, the main component of the computer, can work only with the information
stored in the storage unit.
In 1946, Von Neumann and his colleagues began the design of a stored-program computer at
the Institute for Advanced Study in Princeton. This computer is referred to as the IAS computer.
Figure : Structure of a first-generation computer : the IAS
The IAS computer has three basic units: the Central Processing Unit, the Main Memory Unit, and the
Input/Output Device.
The Central Processing Unit (CPU):
This is the main unit of the computer, responsible for performing all operations. The
CPU of the IAS computer consists of a data processing unit and a program control unit.
The data processing unit contains high-speed registers intended for the temporary storage of
instructions, memory addresses, and data. The main actions specified by instructions are
performed by the arithmetic-logic circuits of the data processing unit.
The control circuits in the program control unit are responsible for fetching instructions,
decoding opcodes, controlling the movement of information correctly through the system, and
providing proper control signals for all CPU actions.
The Main Memory Unit:
It is used for storing programs and data. A memory location of the memory unit is uniquely specified by
its memory address. M(X) is used to indicate the location of the memory unit M with address X.
Data transfer between the memory unit and the CPU takes place with the help of the data register DR. When
the CPU wants to read some information from the memory unit, the information is first brought to DR, and
after that it goes to the appropriate position. Similarly, data to be stored in memory must be put into DR
first, and then it is stored in the appropriate location in the memory unit.
The address of the memory location used during a memory read or memory write operation is
stored in the address register AR.
If the information fetched from memory is an operand of an instruction, it is moved from DR to the data
processing unit (either to AC or MQ). If it is an instruction, it is moved to the program control unit
(either to IR or IBR).
Two additional registers for the temporary storage of operands and results are included in the data
processing unit: the accumulator AC and the multiplier-quotient register MQ.
Two instructions are fetched simultaneously from M and transferred to the program control unit. The
instruction that is not to be executed immediately is placed in the instruction buffer register IBR. The
opcode of the other instruction is placed in the instruction register IR, where it is decoded.
In the decoding phase, the control circuits generate the required control signals to perform the
specified operation in the instruction.
The program counter PC is used to store the address of the next instruction to be fetched from
memory.
Input devices are used to put information into the computer. With the help of input devices we
can store information in memory so that the CPU can use it. Programs or data are read into main
memory from an input device or secondary storage under the control of a CPU input instruction.
Output devices are used to output information from the computer. If some results are computed
and stored in the computer, then with the help of output devices we can present them
to the user. Output data from main memory go to an output device under the control of a CPU
output instruction.
A Brief History of Computer Architecture (5)
Computer Architecture is the field of study of selecting and interconnecting hardware
components to create computers that satisfy functional, performance, and cost goals. It refers to
those attributes of the computer system that are visible to a programmer and have a direct effect
on the execution of a program:
• Instruction set
• Data formats
• Principle of operation (formal description of every operation)
• Features (organization of programmable storage, registers used, interrupt mechanisms, etc.)
In short, it is the combination of Instruction Set Architecture, Machine Organization and the
related hardware.
First Generation (1940-1950) :: Vacuum Tubes
ENIAC [1945]: Designed by Mauchly and Eckert, built for the US Army to calculate trajectories
of ballistic shells during World War II. Around 18,000 vacuum tubes and 1,500 relays were
used to build ENIAC, and it was programmed by manually setting switches.
UNIVAC [1950]: the first commercial computer.
John Von Neumann architecture: Goldstine and Von Neumann took the ideas of ENIAC
and developed the concept of storing a program in memory. This is known as the Von Neumann
architecture and has been the basis for virtually every machine designed since then.
Features:
Electron-emitting devices
Data and programs are stored in a single read-write memory
Memory contents are addressable by location, regardless of the content itself
Machine language / assembly language
Sequential execution
Second Generation (1950-1964) :: Transistors
• William Shockley, John Bardeen, and Walter Brattain invented the transistor, which reduced the size
of computers and improved their reliability. Vacuum tubes were replaced by transistors.
Third Generation (1964-1974) :: Integrated Circuits (IC)
Major advances in computer architecture are typically associated with landmark instruction set
designs. The definition of computer architecture itself has been through gradual changes. The following
were the main concerns of computer architecture through different periods:
1970-1980: instruction set design, especially for compilers; vector processing and shared-memory
multiprocessors
1980 onwards: RISC
Under a rapidly changing set of forces, computer technology keeps changing dramatically.
A Brief History of Computer Organization (6)
If computer architecture is a view of the whole design with the important characteristics
visible to the programmer, computer organization is how those features are implemented with the
specific building blocks visible to the designer, such as control signals, interfaces, memory
technology, etc. Computer architecture and organization are closely related, though not exactly
the same.
Computer organization defines the ways in which these components are interconnected and
controlled, along with the capabilities and performance characteristics of the principal functional
units. One architecture can have a number of organizational implementations, and the organization
differs between versions. Thus, all members of the Intel x86 family share the same basic
architecture, and the IBM System/370 family shares its basic architecture.
Computer architecture has progressed through generations: vacuum tubes, transistors, integrated circuits, and
VLSI. Computer organization has also made its historic progression accordingly.
The advance of the microprocessor (Intel):
1974: 8080 - the first general-purpose microprocessor, 8-bit data path, used in the first personal computer
1978: 8086 - much more powerful, 16-bit, 1 MB addressable, instruction cache that prefetches a few
instructions
1980: 8087 - the floating-point coprocessor is added
1982: 80286 - 16-Mbyte addressable memory space, plus additional instructions
1985: 80386 - 32-bit, new addressing modes and support for multitasking
1989 -- 1995:
80486 - 25 and 33 MHz, 1.2 M transistors, 5-stage pipeline, sophisticated powerful cache and
instruction pipelining, built-in math coprocessor.
Pentium - 60 and 66 MHz, 3.1 M transistors, branch prediction, pipelined floating point, multiple
instructions executed in parallel, first superscalar IA-32.
Pentium Pro - increased superscalar execution, register renaming, branch prediction, data-flow
analysis, and speculative execution
1995 -- 1997: Pentium II - 233 to 300 MHz, 7.5 M transistors, first compaction of the micro-
architecture, MMX technology for graphics, video, and audio processing.
1999: Pentium III - additional floating-point instructions for 3D graphics
2000: Pentium 4 - further floating-point and multimedia enhancements
Evolution of Memory
1995: EDO - Extended Data Output, which speeds up the read cycle between memory and
CPU, 20 MHz
1997-1998: SDRAM - Synchronous DRAM, which synchronizes itself with the CPU bus
and runs at higher clock speeds, PC66 at 66 MHz, PC100 at 100 MHz
1999: RDRAM - Rambus DRAM, a DRAM with very high bandwidth, 800 MHz
A bus is a parallel circuit that connects the major components of a computer, allowing the transfer of
electric impulses from one connected component to any other.
• IDE - Integrated Drive Electronics, also known as ATA, EIDE, Ultra ATA, Ultra DMA;
the most widely used interface for hard disks
• PS/2 port - mini-DIN plug with 6 pins, for a mouse and keyboard
Concept of Memory
We have already mentioned that the digital computer works on the stored-program concept introduced
by Von Neumann. We use memory to store information, which includes both programs and data.
For several reasons, we have different kinds of memory, used at different levels.
Internal memory is used by the CPU to perform its tasks, while external memory is used to store bulk
information, which includes large software and data.
Memory is used to store information in digital form. The memory hierarchy is given by:
• Register
• Cache Memory
• Main Memory
• Magnetic Disk
• Removable media (Magnetic tape)
Register:
Registers are part of the Central Processing Unit, so they reside inside the CPU. Information from
main memory is brought to the CPU and kept in registers. Due to space and cost
constraints, we have a limited number of registers in a CPU. These are basically the fastest
storage devices.
Cache Memory:
Cache memory is a storage device placed between the CPU and main memory. These are
semiconductor memories, basically fast memory devices, faster than main memory.
We cannot have a large volume of cache memory due to its higher cost and some constraints of
the CPU. Due to the higher cost we cannot replace the whole main memory with faster memory.
Generally, the most recently used information is kept in the cache memory; it is brought from
the main memory and placed in the cache. Nowadays, CPUs come with internal
cache.
Main Memory:
Like cache memory, main memory is also semiconductor memory, but it is relatively slower.
We have to first bring the information (whether data or program) to main memory; the CPU can
work with the information available in main memory only.
Magnetic Disk:
This is a bulk storage device. We have to deal with huge amounts of data in many applications,
but we do not have enough semiconductor memory to keep all this information in the computer.
Moreover, semiconductor memories are volatile in nature: they lose their contents once we
switch off the computer. For permanent storage, we use the magnetic disk. The storage capacity of
a magnetic disk is very high.
Removable media:
For different applications, we use different data. It may not be possible to keep all the
information on magnetic disk, so whichever data we are not currently using can be kept on
removable media. Magnetic tape is one kind of removable medium. A CD is also a removable
medium, and it is an optical device.
Registers, cache memory, and main memory are internal memory. Magnetic disks and removable
media are external memory. Internal memories are semiconductor memories, which are
categorised as volatile and non-volatile memory.
RAM: Random Access Memory is volatile in nature. As soon as the computer is switched
off, the contents of memory are lost.
ROM: Read Only Memory is non-volatile in nature. The storage is permanent, but it is read-only;
we cannot store new information in ROM.
PROM: Programmable Read Only Memory; it can be programmed once as per user
requirements.
EPROM: Erasable Programmable Read Only Memory; the contents of the memory can be
erased and new data stored into the memory. In this case, we have to erase the whole memory.
EEPROM: Electrically Erasable Programmable Read Only Memory; in this type of memory the
contents of a particular location can be changed without affecting the contents of other locations.
Main Memory
The main memory of a computer is semiconductor memory. The main memory unit of a computer
basically consists of two kinds of memory: RAM and ROM.
The permanent information is kept in ROM, and the user space is basically in RAM.
The smallest unit of information is known as a bit (binary digit), and in one memory cell we can store one bit
of information. 8 bits together are termed a byte.
The maximum size of main memory that can be used in any computer is determined by the addressing
scheme.
A computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K memory
locations. Similarly, for 32-bit addresses, the total capacity will be 2^32 = 4G memory
locations.
In some computers, the smallest addressable unit of information is a memory word, and the machine is called
word-addressable.
In some computers, an individual address is assigned to each byte of information; such a machine is called a
byte-addressable computer. In this computer, one memory word contains one or more memory bytes which
can be addressed individually.
If the MAR is k bits long, then the total number of addressable memory locations will be 2^k.
If the MDR is n bits long, then n bits of data are transferred in one memory cycle.
The transfer of data takes place through the memory bus, which consists of the address bus and the data
bus. In the above example, the size of the data bus is n bits and the size of the address bus is k bits.
The memory bus also includes control lines like Read, Write, and Memory Function Complete (MFC) for
coordinating data transfer. In the case of a byte-addressable computer, another control line is
added to indicate a byte transfer instead of a whole-word transfer.
The CPU initiates a memory operation by loading the appropriate data, i.e., the address, into MAR.
If it is a memory read operation, the CPU sets the read control line to 1. The contents of the
memory location are then brought to MDR, and the memory control circuitry indicates
this to the CPU by setting MFC to 1.
If the operation is a memory write, the CPU places the data into MDR and sets
the write control line to 1. Once the contents of MDR are stored in the specified memory
location, the memory control circuitry indicates the end of the operation by setting MFC to 1.
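The read and write handshake just described can be sketched behaviourally (an illustration added here; real timing and the separate control lines are simplified into method calls, and MFC is modelled as a flag):

```python
class Memory:
    """Behavioural sketch of the MAR/MDR/MFC handshake: the CPU side
    loads MAR (and MDR for a write); the memory side completes the
    transfer and raises MFC (Memory Function Complete)."""

    def __init__(self, size=16):
        self.cells = [0] * size
        self.MAR = 0      # Memory Address Register
        self.MDR = 0      # Memory Data Register
        self.MFC = 0      # Memory Function Complete flag

    def read(self, address):
        self.MAR = address                 # CPU loads the address into MAR
        self.MFC = 0                       # operation in progress
        self.MDR = self.cells[self.MAR]    # memory places the data in MDR
        self.MFC = 1                       # memory signals completion
        return self.MDR

    def write(self, address, data):
        self.MAR = address                 # CPU loads address and data
        self.MDR = data
        self.MFC = 0
        self.cells[self.MAR] = self.MDR    # memory stores the data
        self.MFC = 1                       # memory signals completion

mem = Memory()
mem.write(5, 42)
print(mem.read(5), mem.MFC)   # 42 1
```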
A useful measure of the speed of a memory unit is the time that elapses between the initiation of
an operation and its completion (for example, the time between Read and
MFC). This is referred to as the memory access time. Another measure is the memory cycle time:
the minimum time delay between the initiation of two independent memory operations
(for example, two successive memory read operations). The memory cycle time is slightly larger
than the memory access time.
Concept of Memory (8)
Memory constructed with the help of transistors is known as semiconductor memory.
Semiconductor memories are termed Random Access Memory (RAM), because it is
possible to access any memory location at random.
Depending on the technology used to construct a RAM, there are two types: dynamic RAM (DRAM)
and static RAM (SRAM).
Dynamic RAM (DRAM):
A DRAM is made with cells that store data as charge on capacitors. The presence or absence
of charge in a capacitor is interpreted as binary 1 or 0.
Because capacitors have a natural tendency to discharge due to leakage current, dynamic RAMs
require periodic charge refreshing to maintain data storage. The term dynamic refers to this
tendency of the stored charge to leak away, even with power continuously applied.
A typical DRAM structure for an individual cell that stores one bit of information is shown in the
figure.
For the write operation, a voltage signal is applied to the bit line B; a high voltage represents 1
and a low voltage represents 0. A signal is then applied to the address line, which turns on
the transistor T, allowing a charge to be transferred to the capacitor.
For the read operation, when a signal is applied to the address line, the transistor T turns on
and the charge stored on the capacitor is fed out onto the bit line B.
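The need for periodic refreshing can be illustrated with a behavioural sketch of one cell (the decay rate and the sensing threshold below are arbitrary illustrative values, not electrical parameters from the text):

```python
class DRAMCell:
    """Behavioural sketch of one DRAM cell: the stored charge leaks
    away over time, so the cell must be refreshed before the charge
    decays past the sense threshold."""

    def __init__(self):
        self.charge = 0.0                     # capacitor charge, 0.0 .. 1.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0     # charge or discharge the capacitor

    def leak(self, steps=1):
        self.charge *= 0.5 ** steps           # charge halves every time step

    def read(self):
        return 1 if self.charge > 0.5 else 0  # sense-amplifier threshold

    def refresh(self):
        self.write(self.read())               # read the value and rewrite it

cell = DRAMCell()
cell.write(1)
cell.refresh()        # refreshed in time: the stored 1 is preserved
print(cell.read())    # 1
cell.leak(3)          # left unrefreshed: charge decays to 0.125
print(cell.read())    # 0 -- the stored 1 has been lost
```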
Static RAM (SRAM):
In an SRAM, binary values are stored using traditional flip-flops constructed with
transistors. A static RAM will hold its data as long as power is supplied to it.
Four transistors (T1, T2, T3, T4) are cross-connected in an arrangement that produces a stable logic state.
In logic state 1, point A1 is high and point A2 is low; in this state T1 and T4 are off, and T2 and T3 are on.
In logic state 0, point A1 is low and point A2 is high; in this state T1 and T4 are on, and T2 and T3 are off.
The address line is used to open or close a switch, which is nothing but another transistor. The address line
controls two transistors (T5 and T6).
When a signal is applied to this line, the two transistors are switched on, allowing a read or write operation.
For a write operation, the desired bit value is applied to line B, and its complement is applied to the
complementary bit line. This forces the four transistors (T1, T2, T3, T4) into the proper state.
For a read operation, the bit value is read from line B. When a signal is applied to the address line, the
signal at point A1 becomes available on the bit line B.
SRAM versus DRAM :
• Both static and dynamic RAMs are volatile; that is, they retain the information only as long
as power is applied.
• A dynamic memory cell is simpler and smaller than a static memory cell. Thus a DRAM
is more dense, i.e., its packing density is higher (more cells per unit area). DRAM is less
expensive than the corresponding SRAM.
• DRAM requires supporting refresh circuitry. For larger memories, the fixed cost of the
refresh circuitry is more than compensated for by the lower cost of DRAM cells.
• SRAM cells are generally faster than DRAM cells. Therefore, to construct faster
memory modules (like cache memory), SRAM is used.
Cache Memory
Analysis of a large number of programs has shown that many instructions are executed
repeatedly. This may be in the form of simple loops, nested loops, or a few procedures that
repeatedly call each other. It is observed that many instructions in each of a few localized areas
of the program are repeatedly executed, while the remainder of the program is accessed
relatively infrequently. This phenomenon is referred to as locality of reference.
Now, if the active segments of a program can be kept in a fast memory, then the total execution time can be significantly reduced. This exploits the fact that the CPU is a fast device while memory is a relatively slow one.
Memory access is the main bottleneck for performance. If a faster memory device is inserted between main memory and the CPU, efficiency can be increased. The faster memory inserted between the CPU and main memory is termed cache memory.
To make this arrangement effective, the cache must be considerably faster than the main memory; typically it is 5 to 10 times faster. This approach is more economical than using fast memory devices to implement the entire main memory. It is also feasible because of the locality of reference present in most programs, which reduces the frequency of data transfers between main memory and cache memory.
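The benefit of this arrangement can be illustrated with a small sketch. The timings and hit ratio below are assumed values for illustration, not figures from the text:

```python
# Illustrative sketch: effective memory access time with a cache.
# The latencies and the hit ratio are assumed example values.

def effective_access_time(hit_ratio, cache_time_ns, memory_time_ns):
    """Average time per access: hits are served by the cache, misses by memory."""
    return hit_ratio * cache_time_ns + (1 - hit_ratio) * memory_time_ns

# Suppose the cache is 10x faster than main memory (10 ns vs 100 ns)
# and locality of reference gives a 95% hit ratio.
t = effective_access_time(0.95, 10, 100)
print(round(t, 1))  # about 14.5 ns: far closer to cache speed than memory speed
```

Even a modest hit ratio pulls the average access time close to the cache speed, which is why locality of reference makes this scheme pay off.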
51
Cache Memory
Operation of Cache Memory
The memory control circuitry is designed to take advantage of the property of locality of reference. Some assumptions are made while designing the memory control circuitry:
1. The CPU does not need to know explicitly about the existence of the cache.
2. The CPU simply makes Read and Write requests. The nature of these two operations is the same whether a cache is present or not.
3. The addresses generated by the CPU always refer to locations in main memory.
4. The memory access control circuitry determines whether or not the requested word currently exists in the cache.
When a Read request is received from the CPU, the contents of a block of memory words containing the
location specified are transferred into the cache. When any of the locations in this block is referenced by the
program, its contents are read directly from the cache.
The cache memory can store a number of such blocks at any given time.
The correspondence between the main memory blocks and those in the cache is specified by means of a mapping function.
52
Cache Memory
When the cache is full and a memory word is referenced that is not in the cache, a decision must be made as to which block should be removed from the cache to create space for the new block that contains the referenced word. Replacement algorithms are used to make the proper selection of the block to be replaced by the new one.
When a write request is received from the CPU, there are two ways that the system can proceed. In the
first case, the cache location and the main memory location are updated simultaneously. This is called the
store through method or write through method.
The alternative is to update only the cache location. The cache block is written back to main memory at replacement time. If no write operation has been performed on the cache block, it need not be written back to main memory. This information is kept with the help of an associated bit, which is set whenever there is a write operation on the cache block. During replacement, this bit is checked: if it is set, the cache block is written back to main memory; otherwise it is not. This bit is known as the dirty bit; if the bit is dirty (set to one), a write to main memory is required.
The write through method is simpler, but it results in unnecessary write operations in the main memory when a given cache word is updated a number of times during its cache residency period.
Consider the case where the addressed word is not in the cache and the operation is a read. Normally, the block of words is first brought into the cache and then the requested word is forwarded to the CPU. However, the requested word can instead be forwarded to the CPU as soon as it arrives at the cache, rather than waiting for the whole block to be loaded. This is called load through, and it offers some scope to save time.
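The write-back policy with a dirty bit, described above, can be sketched in a few lines. The single-block cache and the dictionary-based memory below are simplifying assumptions made for illustration:

```python
# Minimal sketch of the write-back policy with a dirty bit.
# A one-block cache and dict-based memory are illustrative simplifications.

class WriteBackCache:
    """Write-back: update only the cache; flush the block to memory on
    replacement, and only if its dirty bit is set."""
    def __init__(self, memory):
        self.memory = memory          # backing store: addr -> value
        self.block = None             # the single cached (addr, value)
        self.dirty = False

    def write(self, addr, value):
        if self.block is not None and self.block[0] != addr:
            self.evict()              # make room for the new block
        self.block = (addr, value)
        self.dirty = True             # main memory is now stale for this addr

    def evict(self):
        if self.dirty:                # write back only if modified
            a, v = self.block
            self.memory[a] = v
        self.block, self.dirty = None, False

memory = {0: 1, 4: 2}
cache = WriteBackCache(memory)
cache.write(0, 99)
print(memory[0])      # still 1: main memory not yet updated
cache.write(4, 77)    # replaces block 0, which is dirty, so it is written back
print(memory[0])      # now 99
```

A write-through cache would instead update `memory[addr]` inside `write` itself, trading extra memory traffic for simplicity.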
53
Cache Memory
During a write operation, if the addressed word is not in the cache, the information is written directly into the main memory.
A write operation normally refers to locations in data areas, and the property of locality of reference is not as pronounced when accessing data with write operations. Therefore, it is not advantageous to bring the data block into the cache when a write operation occurs and the addressed word is not present in the cache.
54
Cache Memory
Mapping Functions
The mapping functions are used to map a particular block of main memory to a particular
block of cache. This mapping function is used to transfer the block from main memory to
cache memory. Three different mapping functions are available:
Direct mapping:
A particular block of main memory can be brought only to one particular block of cache memory, so this method is not flexible.
Associative mapping:
In this mapping function, any block of main memory can potentially reside in any cache block position. This is a much more flexible mapping method.
Block-set-associative mapping:
In this method, the blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. In terms of flexibility, it lies between the other two methods.
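The three schemes differ only in which cache positions a main memory block may occupy. The cache size (128 blocks) and set size (4 blocks) below are assumed example figures:

```python
# Sketch of the candidate cache positions under each mapping function.
# A 128-block cache with 4-block sets is an illustrative assumption.

CACHE_BLOCKS = 128
SET_SIZE = 4
NUM_SETS = CACHE_BLOCKS // SET_SIZE   # 32 sets

def direct_mapped(block):
    """Main memory block i may occupy only cache block i mod 128."""
    return block % CACHE_BLOCKS

def associative(block):
    """Fully associative: all 128 cache positions are candidates."""
    return range(CACHE_BLOCKS)

def set_associative(block):
    """The block maps to one set; any of the 4 blocks in that set will do."""
    s = block % NUM_SETS
    return range(s * SET_SIZE, (s + 1) * SET_SIZE)

print(direct_mapped(300))             # 44: exactly one fixed position
print(list(set_associative(300)))     # set 12 -> blocks [48, 49, 50, 51]
```

Direct mapping gives one candidate, associative mapping gives all of them, and set-associative mapping gives a small fixed group, which matches the flexibility ordering described above.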
55
Cache Memory
Replacement Algorithms
When a new block must be brought into the cache and all the positions that it may occupy are full, a decision must be made as to which of the old blocks is to be overwritten. In general, a policy is required that keeps in the cache those blocks that are likely to be referenced in the near future. However, it is not easy to determine directly which of the blocks in the cache are about to be referenced. The property of locality of reference gives some clue for designing a good replacement policy.
A commonly used policy is to replace the least recently used (LRU) block. It can be implemented by associating a counter with each cache block of a set; for a four-block set, a 2-bit counter per block suffices.
When a hit occurs, that is, when a read request is received for a word that is in the cache, the counter of the referenced block is set to 0, all counters whose values were originally lower than the referenced block's are incremented by 1, and all other counters remain unchanged.
56
Cache Memory
When a miss occurs, that is, when a read request is received for a word and the word is not
present in the cache, we have to bring the block to cache.
If the set is not full, the counter associated with the new block loaded from the main memory is
set to 0, and the values of all other counters are incremented by 1.
If the set is full and a miss occurs, the block with the counter value 3 is removed, and the new block is put in its place. The new block's counter is set to zero, and the other three block counters are incremented by 1.
It is easy to verify that the counter values of the occupied blocks are always distinct. It also follows that the highest counter value indicates the least recently used block.
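The counter rules above can be sketched directly. The block tags and reference order below are made-up examples:

```python
# Sketch of the counter-based LRU policy for one 4-block set, following the
# rules above (counters 0..3; the highest value marks the LRU block).

class LRUSet:
    def __init__(self, size=4):
        self.size = size
        self.counters = {}            # block tag -> counter value

    def reference(self, tag):
        if tag in self.counters:      # hit
            old = self.counters[tag]
            for t, c in self.counters.items():
                if c < old:           # only counters below the old value move
                    self.counters[t] = c + 1
            self.counters[tag] = 0
            return None
        victim = None                 # miss
        if len(self.counters) == self.size:
            victim = max(self.counters, key=self.counters.get)  # counter == 3
            del self.counters[victim]
        for t in self.counters:
            self.counters[t] += 1
        self.counters[tag] = 0
        return victim                 # the replaced block, if any

s = LRUSet()
for tag in ["A", "B", "C", "D"]:
    s.reference(tag)                  # fill the set: D newest, A oldest
s.reference("B")                      # hit: B becomes most recent
print(s.reference("E"))               # miss on a full set: prints A (the LRU block)
```

Tracing the counters after each call confirms the property stated above: the occupied blocks always hold distinct counter values.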
57
Cache Memory
A simpler rule is to remove the oldest block from a full set when a new block must be brought in (a first-in first-out policy). With this technique, no updating is required when a hit occurs. When a miss occurs and the set is not full, the new block is put into an empty block position and the counter values of the occupied blocks are incremented by one. When a miss occurs and the set is full, the block with the highest counter value is replaced by the new block, whose counter is set to 0, and the counter values of all other blocks in that set are incremented by 1. The overhead of this policy is low, since no updating is required on a hit.
58
Memory Management
Main Memory
The main working principle of a digital computer is the von Neumann stored-program principle. First of all, we have to keep all the information in some storage, known as main memory, and the CPU interacts with the main memory only. Therefore, memory management is an important issue when designing a computer system.
On the other hand, not everything can be implemented in hardware, otherwise the cost of the system would be very high. Therefore some tasks are performed by software programs. A collection of such software programs is known as an operating system, so the operating system can be viewed as an extended machine. Many functions or instructions are implemented through software routines. The operating system is mainly memory resident, i.e., the operating system is loaded into main memory.
Due to that, the main memory of a computer is divided into two parts. One part is
reserved for operating system. The other part is for user program. The program
currently being executed by the CPU is loaded into the user part of the memory.
In a uni-programming system, the program currently being executed is loaded into the user part of the memory.
When memory holds multiple processes, the processor can switch from one process to another when one process is waiting. But the processor is so much faster than I/O that it is common for all the processes in memory to be waiting for I/O. Thus, even with multiprogramming, a processor could be idle most of the time.
The processor alternates between executing operating system instructions and executing user
processes. While the operating system is in control, it decides which process in the queue
should be executed next.
We know that the information of all the processes in execution must be placed in main memory. Since there is a fixed amount of memory, memory management is an important issue.
61
Memory Management
In an uni-programming system, main memory is divided into two parts : one part for the
operating system and the other part for the program currently being executed.
Since the size of main memory is fixed, it is possible to accommodate only a few processes in main memory. If all of them are waiting for I/O operations, the CPU again remains idle.
To utilize the idle time of the CPU, some processes must be offloaded from memory and new processes must be brought into their place. This is known as swapping.
What is swapping:
1. A process waiting for some I/O to complete must be stored back on disk.
2. A new ready process is swapped into main memory as space becomes available.
3. As a process completes, it is moved out of main memory.
4. If none of the processes in memory is ready,
swap out a blocked process to the intermediate queue of blocked processes, and
swap in a ready process from the ready queue.
But swapping is itself an I/O process, so it also takes time. Instead of leaving the CPU idle, it is sometimes advantageous to swap in a ready process and start executing it.
The main question that arises is where to put a new process in main memory. It must be done in such a way that the memory is utilized properly.
63
Memory Management
Partitioning
Partitioning is the splitting of memory into sections to allocate to processes, including the operating system. There are two schemes for partitioning:
• Fixed size partitions
• Variable size partitions
64
Memory Management
Even with the use of unequal-size partitions, there will be wastage of memory. In most cases, a process will not require exactly as much memory as provided by the partition.
For example, a process that requires 5 MB of memory would be placed in a 6-MB partition, the smallest available partition that fits. In this partition, only 5 MB is used; the remaining 1 MB cannot be used by any other process, so it is wasted. In this way, every partition may have some unused memory. The unused portion of memory in each partition is termed a hole.
With variable-size partitions, when a process is brought into memory, it is allocated exactly as much memory as it requires and no more. This leads to a hole at the end of the memory that is too small to use. It may seem that there will be only one such hole at the end, so the waste is small.
But this is not the only hole that will be present with variable-size partitions. When all processes are blocked, a process is swapped out and another process is brought in. The new swapped-in process may be smaller than the swapped-out process, since we will most likely not get two processes of the same size. So it will create another hole. If swap-out and swap-in occur many times, more and more holes will be created, which leads to more wastage of memory.
65
Memory Management
There are two simple ways to reduce the problem of memory wastage:
Coalesce: Join adjacent holes into one larger hole, so that some process can be accommodated in it.
Compaction: From time to time, go through memory and move the processes so that all the holes are collected into one free block of memory.
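Coalescing can be sketched as a small routine over a free list. Representing holes as `(start, size)` pairs and the hole addresses themselves are illustrative assumptions:

```python
# Sketch of coalescing: merge adjacent free holes into one larger hole.
# Holes are modelled as (start, size) pairs; the values are made-up examples.

def coalesce(holes):
    """Merge holes that are adjacent in memory into single larger holes."""
    holes = sorted(holes)                      # order by start address
    merged = [holes[0]]
    for start, size in holes[1:]:
        last_start, last_size = merged[-1]
        if last_start + last_size == start:    # adjacent: join them
            merged[-1] = (last_start, last_size + size)
        else:
            merged.append((start, size))
    return merged

# Two adjacent 64 KB holes become one 128 KB hole; the third stays separate.
print(coalesce([(0, 64), (64, 64), (256, 32)]))  # [(0, 128), (256, 32)]
```

Compaction goes further: it would also move the allocated regions so that the separated holes become one contiguous free block.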
During its execution, a process may be swapped in or swapped out many times. It is obvious that a process is not likely to be loaded into the same place in main memory each time it is swapped in. Furthermore, if compaction is used, a process may be shifted while in main memory.
A process in memory consists of instructions plus data. The instructions will contain addresses for memory locations of two types:
addresses of data items, and
addresses used by branching instructions.
66
Memory Management
These addresses will change each time a process is swapped in. To solve this problem, a distinction is made between logical addresses and physical addresses.
• A logical address is expressed as a location relative to the beginning of the program. Instructions in the program contain only logical addresses.
• A physical address is an actual location in main memory.
When the processor executes a process, it automatically converts a logical address to a physical address by adding the current starting location of the process, called its base address, to the logical address.
Each time the process is swapped into main memory, the base address may be different, depending on the allocation of memory to the process.
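The base-register translation can be sketched as follows. The base address, limit and the sample logical address are illustrative assumptions:

```python
# Sketch of logical-to-physical translation with a base address, as in the
# relocation scheme above. All concrete values are illustrative assumptions.

def to_physical(logical_addr, base_addr, limit):
    """Relocate a logical address by the process's current base address.
    The limit check is a simple protection against addressing outside
    the process's allocated region."""
    if logical_addr >= limit:
        raise ValueError("logical address outside the process")
    return base_addr + logical_addr

# Suppose the process is loaded just above a 512 KB operating system area
# this time; after the next swap-in its base address may differ.
print(to_physical(100, 512 * 1024, 425 * 1024))  # 524388
```

Because only the base address changes between swap-ins, the instructions themselves never need to be rewritten.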
Consider a main memory of 2 MB, of which 512 KB is used by the operating system. Consider three processes of sizes 425 KB, 368 KB and 470 KB, and suppose these three processes are loaded into memory. This leaves a hole at the end of memory that is too small for a fourth process. At some point none of the processes in main memory is ready. The operating system swaps out process-2, which leaves sufficient room for a new process-4 of size 320 KB. Since process-4 is smaller than process-2, another hole is created. Later a point is reached at which none of the processes in main memory is ready, but the swapped-out process-2 is; so process-1 is swapped out and process-2 is swapped in its place. This creates yet another hole. In this way many small holes are created in the memory system, which leads to more memory wastage.
67
Memory Management
In this scheme, known as paging,
the memory is partitioned into equal fixed-size chunks that are relatively small. These chunks of memory are known as frames or page frames.
Each process is also divided into small fixed-size chunks of the same size. The chunks of a program are known as pages.
In this scheme, the wasted space in memory for a process is only a fraction of a page frame, corresponding to the last page of the program.
At any given point of time some of the frames in memory are in use and some are free. The list of free frames is maintained by the operating system.
69
Virtual memory
Process A, stored on disk, consists of six pages. At the time of execution of process A, the operating system finds six free frames and loads the six pages of process A into these frames.
These six frames need not be contiguous frames in main memory. The operating system maintains a page table for each process.
Within the program, each logical address consists of a page number and a relative address within the page.
In the case of simple partitioning, a logical address is the location of a word relative to the beginning of the program; the processor translates that into a physical address.
With paging, a logical address is the location of a word relative to the beginning of the page containing it, because the whole program is divided into pages of equal length, and the length of a page is the same as the length of a page frame.
Given a logical address consisting of a page number and a relative address within the page, the processor uses the page table to produce the physical address, which consists of a frame number and a relative address within the frame.
The figure on next page shows the allocation of frames to a new process in the main memory. A page table
is maintained for each process. This page table helps us to find the physical address in a frame which
corresponds to a logical address within a process.
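The page-table lookup can be sketched in a few lines. The page size and the particular page-to-frame assignments below are illustrative assumptions:

```python
# Sketch of the page-table translation described above. The page size and
# the frame assignments are illustrative assumptions.

PAGE_SIZE = 1024                      # bytes per page (and per frame)

# Page table for a process with six pages: page number -> frame number.
# Note the frames need not be contiguous.
page_table = {0: 7, 1: 3, 2: 12, 3: 5, 4: 9, 5: 2}

def translate(logical_addr):
    """Split a logical address into (page, offset) and map page -> frame."""
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame = page_table[page]          # page-fault handling omitted here
    return frame * PAGE_SIZE + offset

# Logical address 2100 = page 2, offset 52 -> frame 12, same offset.
print(translate(2100))                # 12 * 1024 + 52 = 12340
```

With a power-of-two page size, the hardware performs the `divmod` simply by splitting the address bits, which is why equal-length pages and frames are used.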
70
Virtual memory
71
Virtual memory
The conversion of a logical address to a physical address is shown in the figure for process A.
This approach solves the problems described earlier. Main memory is divided into many small equal-size frames. Each process is divided into frame-size pages. A smaller process requires fewer pages, a larger process requires more. When a process is brought in, its pages are loaded into available frames and a page table is set up.
72
Virtual memory
The concept of paging helps us to develop truly effective multiprogramming systems.
Since a process need not be loaded into contiguous memory locations, a page of a process can be put into any free page frame. On the other hand, it is not necessary to load the whole process into main memory, because execution may be confined to a small section of the program (e.g., a subroutine).
It would clearly be wasteful to load many pages of a process when only a few pages will be used before the program is suspended.
Instead of loading all the pages of a process, each page of a process is brought in only when it is needed, i.e., on demand. This scheme is known as demand paging.
Demand paging also allows us to accommodate more processes in main memory, since we do not load whole processes into main memory; pages are brought into main memory as and when they are required.
With demand paging, it is not necessary to load an entire process into main memory.
This concept leads us to an important consequence: it is possible for a process to be larger than the size of main memory. So, while developing a new process, it is not necessary to consider the main memory available in the machine, because the process will be divided into pages and the pages will be brought into memory on demand.
73
Virtual memory
Because a process executes only in main memory, the main memory is referred to as real memory or physical memory.
A programmer or user perceives a much larger memory that is allocated on the disk. This memory is referred to as virtual memory. The programmer enjoys a huge virtual memory space in which to develop his or her program or software.
The execution of a program is the job of the operating system and the underlying hardware. To improve performance, some special hardware is added to the system. This hardware unit is known as the Memory Management Unit (MMU).
In a paging system, we make a page table for each process. The page table helps us to find the physical address from a virtual address.
The virtual address space is used to develop a process. The special hardware unit, called the Memory Management Unit (MMU), translates virtual addresses to physical addresses. When the desired data are in main memory, the CPU can work with these data. If the data are not in main memory, the MMU causes the operating system to bring them into memory from the disk.
74
Virtual memory
75
Module 3: CPU Design
2. Processor Organization
76
Introduction to CPU
The operations or tasks that the CPU must perform are: fetching instructions, interpreting instructions, fetching data, processing data, and writing data.
To do these tasks, it should be clear that the CPU needs to store some data temporarily. It must remember the location of the last instruction so that it can know where to get the next instruction. It needs to store instructions and data temporarily while an instruction is being executed. In other words, the CPU needs a small internal memory. These storage locations are generally referred to as registers.
The major components of the CPU are an arithmetic and logic unit (ALU) and a control unit (CU).
The ALU does the actual computation or processing of data. The CU controls the movement of data
and instruction into and out of the CPU and controls the operation of the ALU.
77
Introduction to CPU
The CPU is connected to the rest of the system through the system bus. Through the system bus, data and information are transferred between the CPU and the other components of the system.
The system bus may have three components:
Data Bus:
The data bus is used to transfer data between main memory and the CPU.
Address Bus:
The address bus is used to access a particular memory location by carrying the address of that memory location.
Control Bus:
The control bus is used to carry the different control signals generated by the CPU to different parts of the system. For example, memory read is a signal generated by the CPU to indicate that a memory read operation has to be performed. Through the control bus, this signal is transferred to the memory module to indicate the required operation.
There are three basic components of the CPU: the register bank, the ALU and the control unit. There are several data movements between these units, and for that an internal CPU bus is used. The internal CPU bus is needed to transfer data between the various registers and the ALU, because the ALU in fact operates only on data held in the CPU's internal registers.
78
Introduction to CPU
In this case, the arithmetic and logic unit (ALU) and all CPU registers are connected via a single common bus. This bus is internal to the CPU and is used to transfer information between the different components of the CPU. This organization is termed the single-bus organization, since only one internal bus is used for transferring information between the components of the CPU. There are also buses external to the CPU that connect it to the memory module and the I/O devices. The external memory bus is shown in Figure A, connected to the CPU via the memory data and address registers, MDR and MAR.
The number and function of registers R0 to R(n-1) vary considerably from one machine to another. They may be provided as general-purpose registers for the use of the programmer. Alternatively, some of them may be dedicated as special-purpose registers, such as index registers or stack pointers.
In this organization, two registers, namely Y and Z, are used which are transparent to the user. The programmer cannot directly access these two registers. They are used as input and output buffers to the ALU in ALU operations, and the CPU uses them as temporary storage for some instructions.
81
Processor Organization
82
Figure A : Single bus organization of the data path inside the CPU
Processor Organization
For the execution of an instruction, we need to perform an instruction cycle. An instruction cycle consists of two phases:
• Fetch cycle and
• Execution cycle.
Most of the operation of a CPU can be carried out by performing one or more of the following
functions in some pre-specified sequence:
1. Fetch the contents of a given memory location and load them into a CPU register.
2. Store a word of data from a CPU register into a given memory location.
3. Transfer a word of data from one CPU register to another or to the ALU.
4. Perform an arithmetic or logic operation, and store the result in a CPU register.
Now we will examine the way in which each of the above functions is implemented in a computer.
Fetching a Word from Memory:
Information is stored in memory locations identified by their addresses. To fetch a word from memory, the CPU has to specify the address of the memory location where the information is stored and request a Read operation. The information may be either data for an operation or an instruction of a program held in main memory.
83
Processor Organization
To perform a memory fetch operation, we need to complete the following tasks:
The CPU transfers the address of the required memory location to the Memory Address Register
(MAR).
The MAR is connected to the memory address line of the memory bus, hence the address of the
required word is transferred to the main memory.
Next, the CPU uses the control lines of the memory bus to indicate that a Read operation is required.
After issuing this request, the CPU waits until it receives an answer from the memory, in the form of the Memory Function Completed (MFC) signal, indicating that the requested operation has been completed.
The memory sets this signal to 1 to indicate that the contents of the specified memory location are available on the memory data bus.
As soon as the MFC signal is set to 1, the information available on the data bus is loaded into the Memory Data Register (MDR) and becomes available for use inside the CPU.
84
Processor Organization
As an example, assume that the address of the memory location to be accessed is kept in register R2 and that the memory contents are to be loaded into register R1. This is done by the following sequence of operations:
1. MAR ← [R2]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load the word from the data bus into MDR, and transfer [MDR] to R1
The time required for step 3 depends on the speed of the memory unit. In general, the time required to
access a word from the memory is longer than the time required to perform any operation within the CPU.
The scheme that is used here to transfer data from one device (memory) to another device (CPU) is referred
to as an asynchronous transfer.
This asynchronous transfer enables transfer of data between two independent devices that have different
speeds of operation. The data transfer is synchronised with the help of some control signals. In this
example, Read request and MFC signal are doing the synchronization task.
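The Read/MFC handshake can be modelled as a toy simulation. The memory latency in "cycles" and the sample contents are assumed values:

```python
# Toy sketch of the asynchronous Read/MFC handshake described above.
# The latency (in arbitrary cycles) and memory contents are assumptions.

class Memory:
    def __init__(self, contents, latency):
        self.contents = contents
        self.latency = latency        # cycles until MFC is raised
        self.mfc = False
        self.data_bus = None
        self._pending = None

    def read(self, addr):             # CPU drives the address and Read request
        self.mfc = False
        self._pending = (addr, self.latency)

    def tick(self):                   # one cycle of memory operation
        if self._pending:
            addr, remaining = self._pending
            if remaining == 1:
                self.data_bus = self.contents[addr]
                self.mfc = True       # signal Memory Function Completed
                self._pending = None
            else:
                self._pending = (addr, remaining - 1)

mem = Memory({0x40: 123}, latency=3)
mem.read(0x40)
cycles = 0
while not mem.mfc:                    # the CPU's "wait for MFC" step
    mem.tick()
    cycles += 1
print(cycles, mem.data_bus)           # 3 123
```

Because the CPU simply waits on `mfc` rather than on a fixed clock count, the same loop works unchanged for a faster or slower memory, which is exactly the point of the asynchronous scheme.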
An alternative scheme is synchronous transfer. In this case all the devices are controlled by a common clock (a continuously running clock of fixed frequency). These pulses provide a common timing signal to the CPU and the main memory. A memory operation is completed during every clock period. Though the synchronous data transfer scheme leads to a simpler implementation, it is difficult to accommodate devices with widely varying speeds: the duration of the clock pulse must be chosen to suit the slowest device, which reduces the speed of all devices to that of the slowest one.
85
Execution of a Complete Instructions:
We have discussed four different types of basic operations: fetching a word from memory, storing a word into memory, transferring a word between registers or to the ALU, and performing an arithmetic or logic operation.
To execute a complete instruction, we need to take the help of these basic operations, and we need to execute them in a particular order.
For example, consider the instruction: "Add the contents of memory location NUM to the contents of register R1 and store the result in register R1." For simplicity, assume that the address NUM is given explicitly in the address field of the instruction, i.e., the direct addressing mode is used.
Steps	Actions
1. PCout, MARin, Read, Clear Y, Set carry-in to ALU, Add, Zin
2. Zout, PCin, Wait for MFC
3. MDRout, IRin
4. Address field of IRout, MARin, Read
5. R1out, Yin, Wait for MFC
6. MDRout, Add, Zin
7. Zout, R1in
8. END
In Step 1:
The contents of the PC are transferred to the MAR and a memory Read is initiated. At the same time, with Y cleared, the carry-in to the ALU is set to 1 and an add operation is specified, so that register Z receives the incremented PC.
In Step 2:
The updated value is moved from register Z back into the PC. Step 2 is initiated
immediately after issuing the memory Read request without waiting for completion of memory function.
This is possible, because step 2 does not use the memory bus and its execution does not depend
on the memory read operation.
In Step 3:
Step3 has been delayed until the MFC is received. Once MFC is received, the word
fetched from the memory is transferred to IR (Instruction Register), Because it is an instruction. Step 1
through 3 constitute the instruction fetch phase of the control sequence.
The instruction fetch portion is same for all instructions. Next step onwards, instruction
execution phase takes place.
88
Execution of a Complete Instructions:
As soon as the IR is loaded with the instruction, the instruction decoding circuits interpret its contents. This enables the control circuitry to choose the appropriate signals for the remainder of the control sequence, steps 4 to 8, which we refer to as the execution phase.
To design the control sequence of the execution phase, knowledge of the internal structure and instruction format of the CPU is needed. Note also that the length of the execution phase is different for different instructions.
(Instruction format: opcode | M | R)
89
Execution of a Complete Instructions:
In Step 5:
The destination field of the IR, which contains the address of register R1, is used to transfer the contents of register R1 to register Y, and the CPU waits for Memory Function Completed. When the read operation is completed, the memory operand is available in the MDR.
In Step 6:
The addition is performed in this step: the memory operand in the MDR is added to the contents of Y, and the result is stored in register Z.
In Step 7:
The result of the addition is transferred from the temporary register Z to the destination register R1.
In Step 8:
The End signal is generated, indicating completion of the execution of the current instruction and causing a new fetch cycle to be started by going back to Step 1.
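The whole sequence can be traced with a toy register-transfer sketch. The memory contents, instruction encoding and initial register values below are illustrative assumptions:

```python
# Toy register-transfer trace of the ADD control sequence above.
# The instruction encoding and all concrete values are assumptions.

# Address 100 holds the instruction "ADD NUM, R1" with NUM = 200;
# address 200 holds the operand 5. R1 initially holds 10.
memory = {100: ("ADD", 200), 200: 5}
regs = {"PC": 100, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0, "R1": 10}

# Steps 1-3: fetch phase
regs["MAR"] = regs["PC"]                 # 1. PCout, MARin, Read
regs["Z"] = 0 + regs["PC"] + 1           # 1. Clear Y, carry-in = 1, Add, Zin
regs["PC"] = regs["Z"]                   # 2. Zout, PCin (overlaps the wait)
regs["MDR"] = memory[regs["MAR"]]        #    memory responds, MFC
regs["IR"] = regs["MDR"]                 # 3. MDRout, IRin

# Steps 4-7: execution phase
regs["MAR"] = regs["IR"][1]              # 4. address field of IR to MAR, Read
regs["Y"] = regs["R1"]                   # 5. R1out, Yin, wait for MFC
regs["MDR"] = memory[regs["MAR"]]        #    operand arrives in MDR
regs["Z"] = regs["Y"] + regs["MDR"]      # 6. MDRout, Add, Zin
regs["R1"] = regs["Z"]                   # 7. Zout, R1in

print(regs["R1"], regs["PC"])            # 15 101
```

The trace shows both effects of the sequence: R1 receives the sum (10 + 5) and the PC has already been advanced during the fetch phase.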
90
Design of Control Unit
To execute an instruction, the control unit of the CPU must generate the required control signals in the proper sequence. For example, during the fetch phase, the CPU has to generate the PCout signal along with the other required signals in the first clock pulse, and the PCin signal along with other required signals in the second clock pulse. So, during the fetch phase, the proper sequence for reading from and writing back the PC is PCout followed by PCin.
To generate the control signal in proper sequence, a wide variety of techniques exist. Most of
these techniques, however, fall into one of the two categories,
1. Hardwired Control
2. Microprogrammed Control.
91
Design of Control Unit
Hardwired Control
In the hardwired control technique, the control signals are generated by means of hardwired circuits. The main objective of the control unit is to generate the control signals in the proper sequence.
Consider the sequence of control signals required to execute the ADD instruction explained in the previous lecture. It is obvious that eight non-overlapping time slots are required for proper execution of the instruction represented by this sequence.
Each time slot must be at least long enough for the function specified in the corresponding step to be completed. Since the control unit is implemented with hardwired devices, and every device has a propagation delay (it requires some time to produce a stable output after the input signal is applied), determining the duration of the time slots is a complicated design task.
For the moment, for simplicity, let us assume that all slots are equal in time duration. Therefore the required
controller may be implemented based upon the use of a counter driven by a clock.
Each state, or count, of this counter corresponds to one of the steps of the control sequence of the
instructions of the CPU.
In the previous lecture, we presented the control sequence for the execution of one instruction only (ADD). In the same way, we need to design the control sequences of all the instructions.
92
Design of Control Unit
By looking into the design of the CPU, we may say that there are various instructions for the add operation. For example, one ADD instruction may take both operands from CPU registers, while another takes one operand from memory, as in the Add NUM, R1 instruction discussed earlier.
The control sequences for executing these two ADD instructions are different. Of course, the fetch phase of all instructions remains the same.
It is clear that the control signals depend on the instruction, i.e., on the contents of the instruction register. It is also observed that the execution of some instructions depends on the contents of the condition code or status flag register; for example, the control sequence of a conditional branch instruction depends on the status flags.
93
Design of Control Unit
Hence, the required control signals are uniquely determined by the following information:
• the contents of the control step counter,
• the contents of the instruction register,
• the contents of the condition code flags, and
• the external input signals, such as MFC.
The external inputs represent the state of the CPU and of the various control lines connected to it, such as the MFC status signal. The condition codes/status flags indicate the state of the CPU; these include flags like carry, overflow, and zero.
94
Design of Control Unit
Control Unit Organization
95
Design of Control Unit
The structure of the control unit can be represented in a simplified view by putting it in a block diagram; the detailed hardware involved may then be explored step by step. The simplified view of the control unit is given in fig (A) (previous page).
The decoder part of the decoder/encoder block provides a separate signal line for each control step, or time slot, in the control sequence. Similarly, the output of the instruction decoder consists of a separate line for each machine instruction: when an instruction is loaded into the IR, one of the output lines INS1 to INSm is set to 1 and all other lines are set to 0.
The detailed view of the control unit organization is shown in the figure (next page).
96
Design of Control Unit
1. Introduction to I/O
98
Introduction to I/O
Input/Output Organization
• The computer system's input/output (I/O) architecture is its interface to the outside world.
• Till now we have discussed two important modules of the computer system: the processor and the memory module.
• Each I/O module interfaces to the system bus and controls one or more peripheral devices.
There are several reasons why an I/O device or peripheral device is not directly connected to the system bus. Some of them are as follows –
• There is a wide variety of peripherals with various methods of operation. It would be impractical to incorporate the necessary logic within the processor to control several devices.
• The data transfer rate of peripherals is often much slower than that of the memory or processor. Thus, it is impractical to use the high-speed system bus to communicate directly with a peripheral.
• Peripherals often use different data formats and word lengths than the computer to which they are attached.
Input/Output Modules
The major functions of an I/O module fall into the following categories:
• Control & Timing
• Processor Communication
• Device Communication
• Data Buffering
• Error Detection
During any period of time, the processor may communicate with one or more external devices in an unpredictable manner, depending on the program's need for I/O.
The internal resources, such as main memory and the system bus, must be shared among a number of activities, including data I/O.
Control & Timing:
The I/O function includes a control and timing requirement to co-ordinate the flow of
traffic between internal resources and external devices.
For example, the control of the transfer of data from an external device to the
processor might involve the following sequence of steps :
a.) The processor interacts with the I/O module to check the status of the attached device.
b.) The I/O module returns the device status.
c.) If the device is operational and ready to transmit, the processor requests the transfer of
data, by means of a command to the I/O module.
d.) The I/O module obtains a unit of data from the external device.
e.) The data are transferred from the I/O module to the processor.
If the system employs a bus, then each of the interactions between the processor and the I/O
module involves one or more bus arbitrations.
Processor & Device Communication
During the I/O operation, the I/O module must communicate with the processor and with the
external device.
Processor communication involves the following -
Command decoding :
The I/O module accepts commands from the processor, typically sent as
signals on the control bus.
Data :
Data are exchanged between the processor and the I/O module over the data bus.
Status Reporting :
Because peripherals are so slow, it is important to know the status of the I/O module.
For example, if an I/O module is asked to send data to the processor (read), it may not
be ready to do so because it is still working on the previous I/O command. This fact
can be reported with a status signal. Common status signals are BUSY and READY.
Address Recognition :
Just as each word of memory has an address, so does each of the I/O devices. Thus, an
I/O module must recognize one unique address for each peripheral it controls.
On the other hand, the I/O module must also be able to perform device communication. This
communication involves commands, status information, and data.
Data Buffering:
An essential task of an I/O module is data buffering. Data buffering is required
due to the mismatch in speed between the CPU, the memory, and the peripheral devices. In
general, the speed of the CPU is higher than the speed of the peripheral devices, so
the I/O module stores the data in a data buffer and regulates the transfer of data according
to the speed of the devices.
In the opposite direction, data are buffered so as not to tie up the memory in a slow
transfer operation. Thus the I/O module must be able to operate at both device and
memory speeds.
Error Detection:
Another task of the I/O module is error detection and the subsequent reporting of errors
to the processor. One class of errors includes mechanical and electrical malfunctions
reported by the device (e.g. a paper jam). Another class consists of unintentional
changes to the bit pattern as it is transmitted from the device to the I/O module.
There will be many I/O devices connected through I/O modules to the system. Each device
will be identified by a unique address.
When the processor issues an I/O command, the command contains the address of the device
to which it is directed. Each I/O module must interpret the address lines to check whether the
command is meant for itself.
Generally, in most processors, the processor, main memory, and I/O share a common bus
(data, address, and control lines). Two schemes are used to address I/O devices over this shared bus:
• Memory-mapped I/O
• Isolated or I/O mapped I/O
Memory-mapped I/O:
There is a single address space for memory locations and I/O devices.
The processor treats the status and address registers of the I/O modules as memory locations.
For example, if the size of the address bus of a processor is 16 bits, then there are 2^16 combinations, and altogether 2^16 address locations can be addressed with these 16 address lines.
Out of these 2^16 address locations, some can be used to address I/O devices and the others are used to address memory locations.
Since the I/O devices are included in the same memory address space, the status and address
registers of the I/O modules are treated as memory locations by the processor. Therefore, the same
machine instructions are used to access both memory and I/O devices.
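A minimal sketch of the memory-mapped scheme, assuming a 16-bit address space in which the top 256 addresses stand in for I/O registers (the split point is an arbitrary choice for the example): the same load/store routines reach both memory and device registers.

```c
#include <stdint.h>

/* Toy model of memory-mapped I/O: one 16-bit address space in which
 * the top 256 addresses (0xFF00-0xFFFF) are reserved for I/O
 * registers.  The same load/store path serves memory and devices.  */
#define IO_BASE 0xFF00u

static uint8_t address_space[1u << 16];   /* 2^16 addressable bytes */

void store(uint16_t addr, uint8_t value) { address_space[addr] = value; }
uint8_t load(uint16_t addr)              { return address_space[addr]; }

/* Only the address decoder knows which range belongs to devices. */
int is_io_address(uint16_t addr)         { return addr >= IO_BASE; }
```

In real memory-mapped I/O the device-register region would be decoded in hardware and accessed through `volatile` pointers; the array here only models the single shared address space.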
Isolated or I/O-mapped I/O:
In this scheme, the full range of addresses may be available for both. Whether an address
refers to a memory location or an I/O device is specified with the help of a command line.
Since the full range of addresses is available for both memory and I/O devices, with 16 address
lines the system may now support both 2^16 memory locations and 2^16 I/O addresses.
Program controlled I/O
Three techniques are possible for I/O operations: programmed I/O, interrupt-driven I/O, and direct memory access (DMA).
With programmed I/O, the processor issues an I/O command and then repeatedly checks the status of the I/O module, doing no other work until the operation is complete.
With interrupt-driven I/O, the processor issues an I/O command, continues to execute other instructions, and is interrupted by the I/O module when the I/O module completes its work.
In Direct Memory Access (DMA), the I/O module and main memory exchange data directly, without processor involvement.
With both programmed I/O and interrupt-driven I/O, the processor is responsible for extracting
data from main memory for an output operation and for storing data in main memory for an input
operation.
To send data to an output device, the CPU simply moves that data to a special memory
location in the I/O address space if I/O mapped input/output is used or to an address in the
memory address space if memory mapped I/O is used.
To read data from an input device, the CPU simply moves data from the address (I/O or
memory) of that device into the CPU.
Input/Output Operation: An input or output operation looks very similar to a memory read
or write operation, except that it usually takes more time, since peripheral devices are slower
than main memory modules.
Input/Output Port
An I/O port is a device that looks like a memory cell to the computer but contains connections
to the outside world.
An I/O port typically uses a latch. When the CPU writes to the address associated with the
latch, the latch device captures the data and makes it available on a set of wires external to the
CPU and memory system.
The I/O ports can be read-only, write-only, or read/write. The write-only port is shown in the
figure.
First, the CPU places the address of the device on the I/O address bus and, with the help of
an address decoder, a signal is generated which enables the latch.
In the case of a read operation, the data already stored in the latch are transferred to the
CPU.
A read-only (input) port is simply the lower half of the figure.
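A write-only output port of the kind described above can be modelled roughly as follows; the port address and field names are invented for the example.

```c
#include <stdint.h>

/* Sketch of a write-only output port built around a latch: when the
 * CPU writes to the port's address, the latch captures the data and
 * presents it on the external pins.  Reading the port is undefined,
 * so no read path is modelled.                                     */
typedef struct {
    uint16_t address;        /* address the decoder matches         */
    uint8_t  external_pins;  /* value currently driven outside      */
} output_port;

/* Returns 1 if the write was latched (address matched), else 0. */
int port_write(output_port *p, uint16_t addr, uint8_t data)
{
    if (addr != p->address)
        return 0;            /* decoder does not enable the latch   */
    p->external_pins = data; /* latch captures the data             */
    return 1;
}
```

A write to any other address leaves the pins unchanged, mirroring the role of the address decoder in the figure.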
In the case of I/O-mapped I/O, a separate address space is used for I/O devices, distinct from
the address space for memory. In the case of memory-mapped I/O, the same address space is used
for both memory and I/O devices: some of the memory address space is kept reserved for I/O devices.
For memory-mapped I/O, any instruction that accesses memory can access a memory-mapped I/O
port.
Generally, a given peripheral device will use more than a single I/O port. A typical PC parallel
printer interface, for example, uses three ports: a read/write port, an input port, and an output
port.
The read/write port is the data port (it is read/write to allow the CPU to read back the last ASCII
character it wrote to the printer port).
Memory-mapped I/O subsystems and I/O-mapped subsystems both require the CPU to move
data between the peripheral device and main memory.
For example, to input a sequence of 20 bytes from an input port and store these bytes in
memory, the CPU must read each value from the port and store it into memory.
Programmed I/O:
In programmed I/O, the data transfer between CPU and I/O device is carried out with the help of a
software routine.
When a processor is executing a program and encounters an instruction relating to I/O, it executes that
I/O instruction by issuing a command to the appropriate I/O module.
The I/O module will perform the requested action and then set the appropriate bits in the I/O status
register.
It is the responsibility of the processor to check periodically the status of the I/O module until it finds
that the operation is complete.
In programmed I/O, when the processor issues a command to an I/O module, it must wait until the I/O
operation is complete.
Generally, the I/O devices are slower than the processor, so in this scheme CPU time is wasted:
the CPU keeps checking the status of the I/O module periodically without doing any other work.
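The busy-wait behaviour of programmed I/O can be sketched like this. The status bit, the register names, and the artificial "device becomes ready after three polls" rule are all assumptions made so the example terminates.

```c
#include <stdint.h>

/* Busy-wait (programmed I/O) sketch: the CPU reads the module's
 * status register in a loop until the READY bit appears, then
 * reads the data register.  polls counts the wasted checks.      */
#define STATUS_READY 0x01u

typedef struct {
    uint8_t status_reg;
    uint8_t data_reg;
    int     polls;           /* how many times the CPU had to ask */
} io_module;

uint8_t programmed_read(io_module *m)
{
    while (!(m->status_reg & STATUS_READY)) {
        m->polls++;
        /* In a real device the status would change on its own;
         * here we flip it after three polls to end the loop.    */
        if (m->polls >= 3)
            m->status_reg |= STATUS_READY;
    }
    return m->data_reg;
}
```

The `polls` counter makes the cost visible: every iteration is a CPU cycle spent asking instead of computing.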
I/O Commands
To execute an I/O-related instruction, the processor issues an address, specifying the particular
I/O module and external device, and an I/O command. There are four types of I/O commands
that an I/O module will receive when it is addressed by a processor –
Control : Used to activate a peripheral device and instruct it what to do. For example, a
magnetic tape unit may be instructed to rewind or to move forward one record. These
commands are specific to a particular type of peripheral device.
Test : Used to test various status conditions associated with an I/O module and its
peripherals. The processor will want to know whether the most recent I/O operation is complete
or whether any error has occurred.
Read : Causes the I/O module to obtain an item of data from the peripheral and place it in
the internal buffer.
Write : Causes the I/O module to take an item of data (byte or word) from the data bus
and subsequently transmit the data item to the peripheral.
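The four command classes above might be modelled with an enum and a dispatch routine inside the I/O module; the return-value conventions here are invented for illustration.

```c
#include <stdint.h>

/* The four I/O command classes, modelled as a hypothetical command
 * word that the module decodes and acts on.                        */
typedef enum { IO_CONTROL, IO_TEST, IO_READ, IO_WRITE } io_command;

typedef struct {
    uint8_t buffer;       /* module's internal data buffer        */
    uint8_t device_data;  /* data currently at the peripheral     */
    int     busy;         /* device activated / working           */
} io_module_t;

/* Returns a status or data value; the semantics are illustrative. */
int execute_command(io_module_t *m, io_command cmd, uint8_t data)
{
    switch (cmd) {
    case IO_CONTROL: m->busy = 1; return 0;       /* activate device   */
    case IO_TEST:    return m->busy;              /* report status     */
    case IO_READ:    m->buffer = m->device_data;  /* device -> buffer  */
                     return m->buffer;
    case IO_WRITE:   m->device_data = data;       /* bus -> device     */
                     return 0;
    }
    return -1;
}
```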
Interrupt Control I/O
Interrupt driven I/O
The problem with programmed I/O is that the processor has to wait a long time for the I/O
module of concern to be ready for either reception or transmission of data. The processor,
while waiting, must repeatedly interrogate the status of the I/O module.
This type of I/O operation, where the CPU constantly tests a port to see if data is available, is
called polling: the CPU polls (asks) the port whether it has data available or whether it is capable of
accepting data. Polled I/O is inherently inefficient.
The solution to this problem is to provide an interrupt mechanism. In this approach the
processor issues an I/O command to a module and then goes on to do other useful work.
The I/O module then interrupts the processor to request service when it is ready to exchange
data with the processor. The processor then executes the data transfer. Once the data transfer
is over, the processor resumes its former processing.
Let us consider how it works.
A. From the I/O module's point of view, the action for an input is as follows:
The I/O module receives a READ command from the processor.
The I/O module then proceeds to read data from an associated peripheral device.
Once the data are in the module's data register, the module issues an interrupt to the
processor over a control line.
The module then waits until its data are requested by the processor.
When the request is made, the module places its data on the data bus and is then ready
for another I/O operation.
B. From the processor's point of view, the action for an input is as follows:
At the end of each instruction cycle, the processor checks for interrupts.
When the interrupt from an I/O module occurs, the processor saves the context
(e.g. program counter & processor registers) of the current program and processes the interrupt.
In this case, the processor reads the word of data from the I/O module and stores it in
memory.
It then restores the context of the program it was working on and resumes execution.
Direct Memory Access [ DMA ]
We have discussed the data transfer between the processor and I/O devices using two different
approaches, namely programmed I/O and interrupt-driven I/O. Both methods require the active
intervention of the processor to transfer data between memory and the I/O module, and any data
transfer must traverse a path through the processor. Thus, both these forms of I/O suffer from
two inherent drawbacks:
o The I/O transfer rate is limited by the speed with which the processor can test and service a
device.
o The processor is tied up in managing an I/O transfer; a number of instructions must be
executed for each I/O transfer.
To transfer large blocks of data at high speed, a special control unit may be provided to allow
the transfer of a block of data directly between an external device and the main memory, without
continuous intervention by the processor. This approach is called direct memory access, or
DMA.
DMA transfers are performed by a control circuit associated with the I/O device; this circuit
is referred to as the DMA controller. The DMA controller allows direct data transfer between the
device and the main memory without involving the processor.
To transfer data between memory and I/O devices, the DMA
controller takes over control of the system from the processor,
and the transfer of data takes place over the system bus. For this
purpose, the DMA controller must use the bus only when the
processor does not need it, or it must force the processor to
suspend operation temporarily. The latter technique is more
common and is referred to as cycle stealing, because the DMA
module in effect steals a bus cycle.
When the processor wishes to read or write a block of data, it issues a command to the DMA module,
sending it the following information:
• Whether a read or write is requested, using the read or write control line between the processor and the DMA module.
• The address of the I/O device involved, communicated on the data lines.
• The starting location in memory to read from or write to, communicated on the data lines and stored by the DMA module in its address register.
• The number of words to be read or written, again communicated via the data lines and stored in the data count register.
The processor then continues with other work; it has delegated this I/O operation to the DMA module.
The DMA module checks the status of the I/O device whose address was communicated to the DMA
controller by the processor. If the specified I/O device is ready for data transfer, the DMA module
generates a DMA request to the processor. The processor then indicates the release of the system bus
through a DMA acknowledge.
The DMA module transfers the entire block of data, one word at a time, directly to or from memory,
without going through the processor.
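A simplified model of this sequence — the processor programs the address and word-count registers, then the controller moves the block one word at a time and raises an interrupt at the end — might look like this; the register widths and names are illustrative, not those of any real controller.

```c
#include <stdint.h>
#include <stddef.h>

/* DMA controller sketch: the processor fills in address_reg and
 * count_reg, then the controller moves the whole block into memory
 * without further processor involvement.                           */
typedef struct {
    uint16_t address_reg;    /* starting memory address             */
    uint16_t count_reg;      /* words left to transfer              */
    int      interrupt;      /* raised when the transfer completes  */
} dma_controller;

void dma_transfer(dma_controller *c, uint8_t *memory,
                  const uint8_t *device, size_t device_len)
{
    size_t i = 0;
    while (c->count_reg > 0 && i < device_len) {
        /* one "stolen" bus cycle per word */
        memory[c->address_reg++] = device[i++];
        c->count_reg--;
    }
    c->interrupt = 1;        /* signal completion to the processor  */
}
```

After the call, the count register reads zero and the interrupt flag is set, matching the completion sequence described in the text.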
When the transfer is completed, the DMA module sends an interrupt signal to the processor. After
receiving the interrupt signal, the processor takes over the system bus.
Fig : DMA break points — the points in the instruction cycle at which the processor may be suspended.
When the processor is suspended, the DMA module transfers one word and then returns control
to the processor.
Note that this is not an interrupt: the processor does not save a context and do something else.
Rather, the processor pauses for one bus cycle.
During that time the processor may perform some other task which does not involve the system
bus. In the worst case, the processor will wait for some time until the DMA module releases the bus.
The net effect is that the processor executes more slowly. Nevertheless, the overall effect is an
enhancement of performance, because for a multiple-word I/O transfer DMA is far more efficient
than interrupt-driven or programmed I/O.
The DMA mechanism can be configured in different ways. The most common configurations
are:
• Single bus, detached DMA
• Single bus, integrated DMA - I/O
• Separate I/O bus
Single bus, detached DMA - I/O configuration
In this organization all modules share the same system bus. The DMA module here acts as a
surrogate processor: it uses programmed I/O to exchange data between memory and an I/O module
through the DMA module itself.
For each transfer it uses the bus twice: first when transferring the data between the I/O module
and the DMA module, and second when transferring the data between the DMA module and memory.
Since the bus is used twice per transfer, the processor is suspended twice; the transfer
consumes two bus cycles.
Single bus, integrated DMA - I/O configuration
By integrating the DMA and I/O functions, the number of required bus cycles can be reduced. In
this configuration, the DMA module and one or more I/O modules are integrated in such a way
that the system bus is not involved in transfers between them. The DMA logic may actually be a part
of an I/O module, or it may be a separate module that controls one or more I/O modules.
The DMA module, processor and the memory module are connected through the system bus.
In this configuration each transfer will use the system bus only once and so the processor is
suspended only once.
The system bus is not involved when transferring data between the DMA module and the I/O device,
so the processor is not suspended during this phase. The processor is suspended only when data is
transferred between the DMA module and memory.
The configuration is shown in the figure.
Separate I/O bus configuration
In this configuration the I/O modules are connected to the DMA module through a separate I/O bus.
In this case the number of I/O interfaces in the DMA module is reduced to one.
The transfer of data between an I/O module and the DMA module is carried out through this I/O bus.
During this transfer the system bus is not in use, so the processor need not be suspended.
There is another transfer phase, between the DMA module and memory. During this phase the system
bus is needed for the transfer and the processor is suspended for one bus cycle. The configuration is
shown in the figure.
Module 5 : Connecting I/O Devices
1. I/O Buses
Buses
The processor, main memory, and I/O devices can be interconnected through common data
communication lines, which are termed a common bus.
The primary function of a common bus is to provide a communication path between the
devices for the transfer of data. The bus also includes the control lines needed to support interrupts
and arbitration.
The bus lines used for transferring data may be grouped into three categories:
• data lines,
• address lines, and
• control lines.
A single line is used to indicate a Read or Write operation. When several transfer sizes are possible,
such as byte, word, or long word, control signals are required to indicate the size of the data.
The bus control signals also carry timing information to specify the times at which the
processor and the I/O devices may place data on the bus or receive data from the bus.
Several schemes exist for handling the timing of data transfers over a bus. These can
be broadly classified as –
• Synchronous bus
• Asynchronous bus
Synchronous Bus :
In a synchronous bus, all the devices are synchronized by a common clock, so all devices
derive timing information from a common clock line of the bus. Equally spaced pulses on this
line define equal time intervals.
In the simplest form of a synchronous bus, each of these clock pulses constitutes a bus cycle
during which one data transfer can take place.
The timing of an input transfer on a synchronous bus is shown in the figure (Next Page..).
Let us consider the sequence of events during an input (read) operation.
In any data transfer operation, one device plays the role of the master, which initiates the data
transfer by issuing read or write commands on the bus. Normally the processor acts as the master,
but other devices with DMA capability may also become bus master. The device addressed by the
master is referred to as the slave or target device.
At time t0, the master places the device address on the address lines and sends an appropriate
command (read, in the case of input) on the command lines. The command also indicates the length
of the operand to be read, if necessary.
The clock pulse width, t1 - t0, must be longer than the maximum propagation delay between
two devices connected to the bus.
After decoding the information on the address and control lines, the slave device at that
particular address responds at time t1 by placing the required input data on the data lines.
At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its
input buffer. The period t2 - t1 must be greater than the maximum propagation delay on the bus
plus the setup time of the master's input buffer register. A similar procedure is followed
for an output operation.
In an output operation, the master places the output data on the data lines at the same time that
it transmits the address and command information. At time t2, the addressed device strobes the
data lines and loads the data into its data buffer.
Multiple Cycle Transfer
The simple design of a device interface by a synchronous bus has some limitations.
• A transfer has to be completed within one clock cycle. The clock period must be long
enough to accommodate the slowest device on the bus, which forces all devices to operate at
the speed of the slowest device.
• The processor or the master has no way to determine whether the addressed device has
actually responded. It simply assumes that the output data have been received by the device
or that the input data are available on the data lines.
To solve these problems, most buses incorporate control signals that represent a response from
the device. These signals inform the master that the target device has recognized its address and
that it is ready to participate in the data transfer operation.
They also make it possible to adjust the duration of the data transfer period to suit the needs of
the participating devices.
A high-frequency clock is used, so that a complete data transfer operation spans several clock
cycles. The number of clock cycles involved can vary from device to device.
An instance of this scheme is shown in the figure.
In clock cycle 1, the master sends the address and command on the bus, requesting a read
operation.
The target device responds at clock cycle 3, indicating that it is ready to participate in the
data transfer by driving the slave-ready signal high. The target device then places the data on
the data lines.
The target device is a slower device and needs two clock cycles to transfer the information.
After two clock cycles, that is at clock cycle 5, it pulls the slave-ready signal down.
When the slave-ready signal goes down, the master strobes the data from the data bus into its
input buffer.
If the addressed device does not respond at all, the master waits for some predefined maximum
number of clock cycles and then aborts the operation.
Asynchronous Bus
In the asynchronous mode of transfer, a handshake between master and slave is used.
There is no common clock; the common clock signal is replaced by two timing control
signals: master-ready and slave-ready.
The master-ready signal is asserted by the master to indicate that it is ready for a transaction,
and the slave-ready signal is the response from the slave.
• The master places the address and command information on the bus. Then it indicates to all
devices that it has done so by activating the master-ready signal.
• The selected target device performs the required operation and informs the processor (or master)
by activating the slave-ready line.
• The master waits for slave-ready to become asserted before it removes its signals from the bus.
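The three handshake phases above can be sketched as a small state machine. A real bus deals with electrical timing; this model only enforces the ordering of the signals, and the function and field names are invented for the example.

```c
#include <stdint.h>

/* Handshake sketch for an asynchronous input transfer: master-ready
 * and slave-ready replace the common clock.  Each function advances
 * the exchange by one phase; only the order of phases matters.     */
typedef struct {
    int     master_ready;
    int     slave_ready;
    uint8_t data_lines;
} async_bus;

/* Phase 1: master puts address/command on the bus, asserts master-ready. */
int master_request(async_bus *b) { b->master_ready = 1; return 1; }

/* Phase 2: slave sees master-ready, places data, asserts slave-ready. */
int slave_respond(async_bus *b, uint8_t data)
{
    if (!b->master_ready) return 0;  /* nothing has been requested yet */
    b->data_lines = data;
    b->slave_ready = 1;
    return 1;
}

/* Phase 3: master strobes the data, then both sides drop their signals. */
int master_complete(async_bus *b, uint8_t *out)
{
    if (!b->slave_ready) return 0;   /* master must wait for the slave */
    *out = b->data_lines;
    b->master_ready = 0;
    b->slave_ready = 0;
    return 1;
}
```

Calling the phases out of order simply fails, which is the essential property of the handshake: each side waits for the other's signal rather than for a clock edge.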
The timing of an input data transfer using the handshake scheme is shown in the figure.
The timing of an output operation using handshaking scheme is shown in the figure.
External Memory
Main memory plays an important role in the working of a computer. We have seen that a
computer works on the von Neumann stored-program principle: we keep the information in
main memory and the CPU accesses the information from main memory.
Main memory is made up of semiconductor devices and by nature it is volatile. For
permanent storage of information we need some non-volatile memory. The memory devices
needed to store information permanently are termed external memory. While working, the
information is transferred from external memory to main memory.
The devices used to store information permanently are either magnetic or optical devices.
Module 6 : Pipeline
2. Performance Issues
3. Branching