COA Chapter 5-7


BHU Department of Computer Science

CHAPTER - 5
MEMORY ORGANIZATION
5.1 Memory Hierarchy
The memory hierarchy of a computer system is shown in the figure below. The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

Figure: Memory hierarchy in a computer system

 Memory hierarchy in a computer system:
 Main Memory: the memory unit that communicates directly with the CPU (RAM)
 Auxiliary Memory: devices that provide backup storage (disk drives)
 Cache Memory: special very-high-speed memory used to increase processing speed (cache RAM)
 Multiprogramming: enables the CPU to process a number of independent programs concurrently.
 Memory Management System: supervises the flow of information between auxiliary memory and main memory.
5.2 Main Memory
 It is the memory unit that communicates directly with the CPU.
 It is the central storage unit in a computer system
 It is relatively large and fast memory used to store programs and data during the
computer operation.

a) RAM Chips
 RAM is used for storing the bulk of the programs and data that are subject to change.
 Available in two possible operating modes:
1. Static RAM- consists essentially of internal flip-flops that store binary information.
2. Dynamic RAM – stores the binary information in the form of electric charges that are
applied to capacitors.


Figure: Typical RAM chip


b) ROM chip
 ROM is used for storing programs permanently.
 The ROM portion is needed for storing an initial program called the bootstrap loader.
 The bootstrap loader is a program whose function is to start the computer software operating
when power is turned on.
 The startup of a computer consists of turning the power on and starting the execution of
an initial program.

Figure: Typical ROM chip


Memory address map

 A memory address map is a table or pictorial representation of the address space assigned to each memory chip in the system.
 Example: memory configuration of 512 bytes RAM + 512 bytes ROM,
built from 4 x 128-byte RAM chips + 1 x 512-byte ROM chip.

Figure: Memory address map for microcomputer


Memory Connection to CPU
 RAM and ROM chips are connected to a CPU through the data and address buses
 The low-order lines in the address bus select the byte within the chips and other lines in
the address bus select a particular chip through its chip select inputs

Figure: Memory Connection to CPU


 Each RAM receives the seven low order bits of the address bus to select one of 128
possible bytes.
 The particular RAM chip selected is determined from lines 8 and 9 in the address bus.
This is done through a 2 x 4 decoder whose outputs go to the CS1 inputs in each RAM
chip.
 The selection between RAM and ROM is achieved through bus line 10.
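The chip-select logic just described can be sketched in Python (the function name and return format are illustrative; bit positions follow the text, with address line 1 as bit 0):

```python
def decode_address(addr):
    """Split a 10-bit address for the example map:
    4 x 128-byte RAM chips + 1 x 512-byte ROM."""
    if addr & 0x200:                  # line 10 set: ROM is selected
        return ("ROM", addr & 0x1FF)  # lines 1-9: byte within the 512-byte ROM
    offset = addr & 0x7F              # lines 1-7: byte within a 128-byte RAM chip
    ram_chip = (addr >> 7) & 0x3      # lines 8-9: 2x4 decoder picks the chip
    return (f"RAM{ram_chip + 1}", offset)

decode_address(0x080)  # second RAM chip, byte 0
```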

5.3 Auxiliary Memory (External Memories: Magnetic Disks, Magnetic Tape, and Optical Disks)
Auxiliary memory devices provide backup storage.
The most common auxiliary memory devices used in computer systems are:
 Magnetic Disk: a memory device consisting of a flat disk covered with a magnetic
coating on which information is stored ex: HDD(Hard Disk Drive)- computer hardware
that holds and spins a magnetic or optical disk and reads and writes information on it
 Magnetic Tape: Memory device consisting of a long thin plastic strip coated with iron
oxide; used to record audio or video signals or to store computer information
 Optical Disk: a plastic-coated disk that stores digital data as tiny pits etched in the surface; it is read with a laser that scans the surface. Examples: CD-R (a compact disc that can be written once and thereafter only read), other optical disc drives (ODD), and DVD.
5.4 Cache Memory
 Cache memory is a small fast memory and is sometimes used to increase the
speed of processing by making current programs and data available to the CPU
at a rapid rate.
 Keeping the most frequently accessed instructions and data in the fast cache memory
 Locality of Reference: the references to memory tend to be confined within a few
localized areas in memory.
The basic operation of the cache is as follows.
 When the CPU needs to access memory, the cache is examined. If the word is found in
the cache, it is read from the fast memory.
 If the word addressed by the CPU is not found in the cache, the main memory is
accessed to read the word.
 A block of words containing the one just accessed is then transferred from main memory
to cache memory.
Hit ratio: the quantity used to measure the performance of cache memory;
the ratio of the number of hits to the total number of CPU references to memory (hits plus misses).
 Hit: the CPU finds the word in the cache
 Miss: the word is not found in the cache, so the CPU must read it from main memory
Example: A computer with cache memory access time = 100 ns, main memory access time =
1000 ns, hit ratio = 0.9. Out of every 10 references:
1 miss: 1 x 1000 ns = 1000 ns
9 hits: 9 x 100 ns = 900 ns
average access time = 1900 ns / 10 = 190 ns
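The average-access-time calculation above can be written as a small helper (the weighted-average formula is the one implied by the example):

```python
def average_access_time(hits, misses, cache_ns, main_ns):
    """Weighted average of cache and main-memory access times."""
    total_time = hits * cache_ns + misses * main_ns
    return total_time / (hits + misses)

average_access_time(9, 1, 100, 1000)  # 190.0 ns, as in the example
```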
Mapping is the transformation of data from main memory to the cache memory.
 There are three types of mapping processes:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
Example of cache memory:

 Main memory: 32 K x 12 bit word (15 bit address lines)


 Cache memory: 512x 12 bit word


 CPU sends a 15-bit address to cache


 Hit: CPU accepts the 12-bit data from cache
 Miss: CPU reads the data from main memory (then data is written to cache)
5.5 Associative mapping
 The fastest and most flexible cache organization uses an associative memory.
 The associative memory stores both the address and content (data) of the memory
word.

Figure: Associative Mapping Cache (all number in octal)


 Any location in the cache can store any word from main memory.
 The address value of 15 bits is shown as a five-digit octal number and its corresponding
12-bit word is shown as a four- digit octal number.
5.6 Direct mapping
 Each memory block has only one place to load in Cache
 Mapping Table is made of RAM instead of CAM
 In the general case, there are 2^k words in cache memory and 2^n words in main memory.
 The n-bit memory address consists of two parts: k bits for the index field and n - k bits for the tag field.
 The n-bit address is used to access main memory, and the k-bit index is used to access the cache.

Figure: Addressing Relationships between Main and Cache Memories



Operation
 The CPU generates a memory request with the index field which is used for the address
to access the cache.
 The tag field of the CPU address is compared with the tag in the word read from the
cache.
 If the two tags match, there is a hit and the desired data word is in cache.
 If there is no match, there is a miss and the required word is read from main memory.
 It is then stored in the cache together with the new tag replacing the previous value.
Example: Consider the numerical example of Direct Mapping shown in figure below

Figure: Direct Mapping Cache Organization


The word at address zero is presently stored in the cache (index=000, tag=00, data=1220).
Suppose that the CPU now wants to access the word at address 02000. The index address is
000, so it is used to access cache. The two tags are then compared. The cache tag is 00 but the
address tag is 02, which does not produce a match. Therefore, the main memory is accessed
and the data word 5670 is transferred to the CPU. The cache word at index address 000 is then
replaced with a tag of 02 and data of 5670.
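The lookup just traced can be sketched as a tiny direct-mapped cache simulator (class and method names are illustrative; 512 words give a 9-bit index, so 15-bit addresses leave a 6-bit tag):

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one-word blocks."""

    def __init__(self, index_bits=9):
        self.index_bits = index_bits
        self.lines = {}  # index -> (tag, data)

    def access(self, address, memory):
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            return "hit", line[1]        # tags match: word is in the cache
        data = memory[address]           # miss: read main memory
        self.lines[index] = (tag, data)  # replace line with the new tag/data
        return "miss", data
```

Replaying the worked example: octal address 00000 loads 1220; address 02000 maps to the same index 000 but tag 02, so it misses and 5670 replaces the cache word.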

The direct-mapping example just described uses a block size of one word. The same
organization but using a block size of 8 words is shown in figure below.


Figure: Direct Mapping cache with block size of 8 words


5.7 Set-Associative mapping
 Each data word is stored together with its tag, and the number of tag-data items in one word of cache is said to form a set.

Figure: Two-way set-associative mapping cache.


 Each memory block can be loaded into any of a set of locations in the cache; the figure shows a set-associative mapping cache with a set size of two.
 In general, a set-associative cache of set size k will accommodate k words of main
memory in each word of cache.
Operation
 When the CPU generates a memory request, the index value of the address is used to
access the cache.
 The tag field of the CPU address is then compared with both tags in the cache to
determine if a match occurs.
 The comparison logic is done by an associative search of the tags in the set similar to an
associative memory search: thus the name “set-associative.”
 The hit ratio will improve as the set size increases because more words with the same
index but different tags can reside in cache. However, an increase in the set size
increases the number of bits in words of cache and requires more complex comparison
logic.
 When a miss occurs in a set-associative cache and the set is full, it is necessary to
replace one of the tag-data items with a new value.
The most common replacement algorithms used are:
1. Random Replacement: With the random replacement policy the control chooses one
tag-data item for replacement at random.
2. First-In, First-Out (FIFO): The FIFO procedure selects for replacement the item that
has been in the set the longest.
3. Least Recently Used (LRU): The LRU algorithm selects for re-placement the item that
has been least recently used by the CPU.
Both FIFO and LRU can be implemented by adding a few extra bits in each word of cache.
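LRU replacement for a single set can be sketched with an ordered dictionary (an illustrative software model, not a description of the hardware bits):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set of size k with LRU replacement: on a miss with a
    full set, the least recently used tag-data item is evicted."""

    def __init__(self, k=2):
        self.k = k
        self.items = OrderedDict()  # tag -> data, least recently used first

    def access(self, tag, fetch):
        if tag in self.items:
            self.items.move_to_end(tag)    # hit: mark most recently used
            return "hit", self.items[tag]
        if len(self.items) >= self.k:
            self.items.popitem(last=False)  # evict the LRU item
        self.items[tag] = fetch()           # load the missed word
        return "miss", self.items[tag]
```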
Writing into Cache
There are two ways of writing into memory:
1. Write-Through
When writing into memory:
 If hit, both the cache and main memory are written in parallel
 If miss, only main memory is written
2. Write-Back
When writing into memory:
 If hit, only the cache is written; the modified word is copied back to main memory when the word is removed from the cache
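The two policies can be contrasted in a small sketch (the dictionaries and the dirty set are illustrative stand-ins for cache lines and main memory):

```python
def write(cache, memory, dirty, addr, value, policy):
    """Apply one write under the given policy; returns True on a hit."""
    hit = addr in cache
    if policy == "write-through":
        if hit:
            cache[addr] = value
        memory[addr] = value       # memory is always written
    else:                          # write-back
        if hit:
            cache[addr] = value    # only the cache is written on a hit
            dirty.add(addr)        # flushed to memory when the line is replaced
        else:
            memory[addr] = value
    return hit
```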


CHAPTER- 6
INPUT/OUTPUT ORGANIZATION
6.1 Peripheral Devices

Peripheral devices are the I/O devices that are externally connected to the machine to read or
write information.
Input Devices
 Keyboard
 Optical input devices
 Magnetic Input Devices- Magnetic Stripe Reader
 Screen Input Devices: touch screen, light pen, mouse
 Analog Input Devices
Output Devices
 Card Puncher, Paper Tape Puncher
 CRT
 Printer (Impact, Ink Jet, Laser, Dot Matrix)
 Plotter
 Analog
6.2 Input-Output Interface

 Input-output interface provides a method for transferring information between internal


storage (such as memory and CPU registers) and external I/O devices.

 Interface is a special hardware component between the CPU and peripherals


 Interface supervises and synchronizes all input and output transfers
I/O Bus and Interface Modules
 The I/O bus consists of data lines, address lines, and control lines (collectively called
system bus).

 Each peripheral has an interface module associated with it.


 Each interface:
 Decodes the device address (device code) and interprets it for the
peripheral
 Decodes control (operation/commands) received from the I/O bus and interprets
them for the peripherals
 Provides signals for the peripheral controller
 All peripherals whose address does not correspond to the address in the bus are
disabled by their interface.
 There are four types of commands that an interface may receive.
1) Control: is issued to activate the peripheral and to inform it what to do.
2) Status: is used to test various status conditions in the interface and the
peripheral.
3) Data Input: causes the interface to receive an item of data from the peripheral
and place it in its buffer register.
4) Data Output: causes the interface to respond by transferring data from the bus
into one of its registers.
Connection of I/O Bus to CPU

I/O Bus and Memory Bus


 Memory bus is for information transfers between the CPU and the memory unit (MU)
 I/O bus is for information transfers between CPU and I/O devices through their I/O
interface.
 Like the I/O bus, the memory bus contains data, address, and read/write control lines.
 There are three ways that computer buses can be used to communicate with memory
and I/O:

1. Use two separate buses, one for memory and the other for I/O.
2. Use one common bus for both memory and I/O but have separate control lines
for each.
3. Use one common bus for memory and I/O with common control lines.
I/O Mapping
Two types of I/O mapping:
1. Isolated I/O
2. Memory-Mapped I/O
Isolated I/O
 Separate I/O read/write control lines in addition to memory read/write control lines
 Separate (isolated) memory and I/O address spaces
 Distinct input and output instructions
Memory-Mapped I/O
 A single set of read/write control lines (no distinction between memory and I/O transfers)
 Memory and I/O addresses share a common address space (reduces the memory address range available)
 No specific input or output instructions (the same memory-reference instructions can be used for I/O transfers)
 Considerable flexibility in handling I/O operations

I/O Interface for an Input Device


The address decoder, the data and status registers, and the control circuitry required to
coordinate I/O transfers constitute the device's interface circuit.

6.3 Types of Data Transfer

1. Synchronous -All devices derive the timing information from common clock line
2. Asynchronous -No common clock
Asynchronous data transfer
 Asynchronous data transfer between two independent units requires that control signals
be transmitted between the communicating units to indicate the time at which data is
being transmitted

 Two Asynchronous data transfer methods:


1. Strobe pulse: a strobe pulse is supplied by one unit to indicate to the other unit when the transfer has to occur.
2. Handshaking: a control signal accompanies each data item being transmitted to indicate the presence of data; the receiving unit responds with another control signal to acknowledge receipt of the data.
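The handshaking sequence can be traced with a simple simulation (the four numbered steps are the standard two-wire cycle; the function itself is illustrative):

```python
def source_initiated_handshake(words):
    """Record the four-step control sequence for each word transferred:
    (1) source raises request with valid data, (2) destination latches
    the data and raises acknowledge, (3) source drops request,
    (4) destination drops acknowledge."""
    received, signals = [], []
    for word in words:
        signals.append("request=1")  # (1) data valid on the bus
        received.append(word)        # (2) destination accepts the word
        signals.append("ack=1")
        signals.append("request=0")  # (3) bus released
        signals.append("ack=0")      # (4) ready for the next word
    return received, signals
```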
Strobe Control
 Employs a single control line to time each transfer
 The strobe may be activated by either the source or the destination unit

 Source-initiated: the source unit that initiates the transfer has no way of knowing
whether the destination unit has actually received the data.
 Destination-initiated: the destination unit that initiates the transfer has no way of knowing
whether the source has actually placed the data on the bus.

6.4 Modes of Transfer

There are four different data transfer modes between the central computer (CPU & memory) and peripherals:
1) Programmed-Controlled I/O
2) Interrupt-Initiated I/O
3) Direct Memory Access (DMA)
4) I/O Processor (IOP)
Programmed-Controlled I/O (I/O devices to CPU)
 Transfer of data under programmed I/O is between CPU and peripherals
 Programmed I/O operations are the result of I/O instructions written in the computer
program.
 An example of data transfer from an I/O device through an interface into the CPU is
shown in the following figure:


Interrupt - Initiated I/O

 Polling takes valuable CPU time


 Communication is opened only when some data has to be passed -> interrupt.
 The I/O interface, instead of the CPU, monitors the I/O
device
 When the interface determines that the I/O device is ready for data transfer, it generates
an Interrupt Request to the CPU
 Upon detecting an interrupt, CPU stops momentarily the task it is doing, branches to the
service routine to process the data transfer, and then returns to the task it was
performing
6.5 DMA (Direct Memory Access)
 DMA refers to the ability of an I/O device to transfer data directly to and from the memory unit, bypassing the CPU
 Large blocks of data transferred at a high speed to or from high speed devices, magnetic
drums, disks, tapes, etc.
 DMA controller:- Interface that provides I/O transfer of data directly to and from the
memory and the I/O device
 CPU initializes the DMA controller by sending a memory address and the number of
words to be transferred
 Actual transfer of data is done directly between the device and memory through the DMA
controller, freeing the CPU for other tasks
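The CPU-initialization/transfer split can be sketched as follows (class and method names are assumptions for illustration):

```python
class DMAController:
    """Toy DMA controller: the CPU calls setup() once; the block is
    then moved into memory without further CPU involvement."""

    def setup(self, memory, start_addr, word_count):
        self.memory = memory     # CPU sends the memory address...
        self.addr = start_addr
        self.count = word_count  # ...and the number of words to transfer

    def transfer(self, device_words):
        moved = device_words[:self.count]
        for word in moved:
            self.memory[self.addr] = word  # word goes straight to memory
            self.addr += 1
        self.count -= len(moved)
        return self.count == 0   # True when the terminal count is reached
```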


6.6 I/O Processor (IOP)


 Communicates directly with all I/O devices
 Fetches and executes its own instructions
 IOP instructions are specifically designed to facilitate I/O transfers
 Unlike a DMA controller, which must be set up entirely by the CPU, the IOP is designed to handle the details of I/O processing itself
 The block diagram of a computer with two processors is shown in the following figure.

Commands:
 Instructions that are read from memory by an IOP are called commands, to
distinguish them from instructions that are read by the CPU
 Commands are prepared by experienced programmers and are stored in
memory
 A sequence of command words constitutes the IOP program
CPU - IOP Communication
 The memory unit acts as a message center: each processor leaves information for the other


I/O Channel

 Three types of channel


1) Multiplexer channel: for slow- and medium-speed devices; operates with a number of I/O
devices simultaneously
2) Selector channel: for high-speed devices; one I/O operation at a time
3) Block-multiplexer channel: combines the features of 1 and 2

 I/O instruction format


Operation code (8 bits):

 Start I/O, Start I/O fast release (less CPU time), Test I/O, Clear I/O, Halt I/O, Halt
device, Test channel, Store channel ID
 Channel Status Word:
 Always stored in Address 64 in memory
 Key: Protection used to prevent unauthorized access
 Address: Last channel command word address used by channel
 Count: 0 (if successful transfer)


6.7 Priority Interrupt

 Identifies the source of the interrupt when several sources request service
simultaneously
 Determines which condition is to be serviced first when two or more requests arrive
simultaneously
 Priority interrupt can be done by:
1) Software: Polling
2) Hardware: Daisy chain, Parallel priority
Polling
 Identify the highest-priority source by software means
 One common branch address is used for all interrupts
 Program polls the interrupt sources in sequence
 The highest-priority source is tested first
 Drawback of polling: if there are many interrupt sources, the time required to poll them can exceed the
time available to service the I/O device. In that case, a hardware priority interrupt is used instead.
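The polling loop can be sketched as follows (device names are made up; the list is ordered highest priority first):

```python
def poll(sources):
    """Return the first source with a pending request, or None.
    'sources' is a list of (name, pending) pairs in priority order."""
    for name, pending in sources:
        if pending:
            return name
    return None

poll([("disk", False), ("printer", True), ("keyboard", True)])  # "printer"
```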

Daisy-Chaining
 Either a serial or a parallel connection of interrupt lines can establish the hardware
priority function.
 The serial connection is known as the daisy- chaining method.


The following figure shows one stage of the daisy-chain priority arrangement:

6.8 Serial Communication


 A data communication can be classified into two:
1. Serial communication
2. Parallel communication
 Serial communication: is the process of sending data one bit at a time sequentially
over a communication channel or computer bus.

 Serial communication is used for all long-distance communication and most computer
networks
 Slow data transfer
 Parallel communication: is the process of transferring multiple data bits over a
communication channel or computer bus simultaneously.

 Parallel communication is used for all short-distance communication.


 High-data transfer rate
 Data can be transmitted between two points in three different modes:
1. Simplex,
2. Half-duplex
3. Full-duplex
 A simplex line carries information in one direction only (e.g. Radio and television
broadcasting).
 A half-duplex transmission system is one that is capable of transmitting in both
directions, but only in one direction at a time (e.g. walkie-talkie)
 A full-duplex transmission can send and receive data in both directions simultaneously.
(e.g. Telephone)


CHAPTER - 7
PIPELINE AND VECTOR PROCESSING
7.1 Parallel processing
Parallel processing is a term used to denote a large class of techniques that are used to
provide simultaneous data-processing tasks for the purpose of increasing the computational
speed of a computer system.
A parallel processing system is able to perform concurrent data processing to achieve
faster execution time.
Example: While an instruction is being executed in the ALU, the next instruction can be
read from memory.
The system may have two or more ALUs to execute two or more operations at the same
time.
The purpose of parallel processing is to speed up the computer's processing capability
and to increase its throughput, which is the amount of processing that can be accomplished
during a given interval of time.


Parallel processing is established by distributing the data among the multiple functional
units.
Figure below shows one possible way of separating the execution unit into eight
functional units operating in parallel.

Figure: Processor with Multiple Functional Units


Parallel processing can be considered under the following topics:
1. Pipeline processing
2. Vector processing
3. Array processors
7.2 Pipelining
Pipelining is a technique of decomposing a sequential process into suboperations, with
each subprocess being executed in a special dedicated segment that operates
concurrently with all other segments.
Example: to perform the combined multiply and add operations with a stream of
numbers:
Ai*Bi + Ci for i = 1, 2, 3, ..., 7
The sub-operations performed in each segment of the pipeline are as follows:


Figure: Example of Pipeline Processing
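The behavior in the figure can be traced as a cycle-by-cycle simulation (register pairs R1/R2 and R3/R4 carry values between segments; the Python model itself is illustrative):

```python
def pipeline_multiply_add(A, B, C):
    """Simulate the three-segment pipeline for Ai*Bi + Ci.
    Segment 1: R1 <- Ai, R2 <- Bi
    Segment 2: R3 <- R1*R2, R4 <- Ci
    Segment 3: R5 <- R3 + R4"""
    n = len(A)
    results = []
    r12 = None  # register pair between segments 1 and 2
    r34 = None  # register pair between segments 2 and 3
    for clock in range(n + 2):   # n loads plus 2 cycles to drain the pipe
        if r34 is not None:      # segment 3 produces a result
            results.append(r34[0] + r34[1])
        # segment 2 consumes the operands loaded on the previous cycle
        r34 = (r12[0] * r12[1], C[clock - 1]) if r12 is not None else None
        # segment 1 loads the next pair of operands
        r12 = (A[clock], B[clock]) if clock < n else None
    return results
```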


7.2.1 Arithmetic Pipeline
Used to implement floating-point operations, multiplication of fixed-point numbers and
similar computations encountered in scientific problems.
Example: consider a pipelined floating-point adder for two normalized binary numbers:
X = A x 2^a
Y = B x 2^b
where A and B are two fractions that represent the mantissas and a and b are the exponents.
The floating-point addition and subtraction can be performed in four segments as
follows:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result.
The following procedure is outlined in the below figure.


Figure: Pipeline for Floating-Point Addition and Subtraction
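The four segments can be sketched in Python (a simplified model using fractional mantissas and integer exponents; no rounding or overflow handling):

```python
def float_add(A, a, B, b):
    """Add X = A * 2**a and Y = B * 2**b (fractional mantissas A, B);
    returns the normalized (mantissa, exponent) of the sum."""
    if a < b:                  # 1. compare the exponents (keep the larger)
        A, a, B, b = B, b, A, a
    B = B / (2 ** (a - b))     # 2. align the mantissa of the smaller operand
    m, e = A + B, a            # 3. add the mantissas
    while abs(m) >= 1:         # 4. normalize so that 0.5 <= |m| < 1
        m, e = m / 2, e + 1
    while m != 0 and abs(m) < 0.5:
        m, e = m * 2, e - 1
    return m, e
```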


7.2.2 Instruction Pipeline
Instruction pipeline is a technique for overlapping the execution of several instructions to
reduce the execution time of a set of instructions.

Six Phases in an Instruction Cycle:


1. Fetch the instruction from memory
2. Decode the instruction.
3. Calculate the effective address
4. Fetch the operands from memory
5. Execute the instruction
6. Store the result in the proper place.
Example: Four – segment instruction pipeline
The below Figure shows how the instruction cycle in the CPU can be processed with a
four segment pipeline.


Figure: Four Segment CPU Pipeline


The segments in the above figure are abbreviated as follows:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective
address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.

Figure: Timing of Instruction Pipeline


In general, there are three major difficulties that cause the instruction pipeline to deviate
from its normal operation:
1. Resource conflicts, caused by access to memory by two segments at the same
time. Most of these conflicts can be resolved by using separate instruction and
data memories.
2. Data dependency conflicts, which arise when an instruction depends on the result of a
previous instruction but this result is not yet available.


3. Branch difficulties, which arise from branch and other instructions that change the value
of the PC.
7.2.3 RISC Pipeline
A major characteristic of the reduced instruction set computer (RISC) is its ability to use an efficient instruction
pipeline.
RISC is a machine with a very fast clock cycle that executes at the rate of one
instruction per cycle.
 Simple Instruction Set
 Fixed Length Instruction Format
 Register-to-Register Operations

Example: Three- segment instruction pipeline

Figure: Three-segment pipeline timing


The instruction cycle can be divided into three sub-operations and implemented in three
segments:
I: Instruction Fetch
A: ALU operation
E: Execute instruction

7.3 Vector Processing


Computers with vector processing capabilities are in demand in specialized applications.
Some of the major application areas of Vector processing are:
 Long-range weather forecasting
 Petroleum explorations
 Seismic data analysis
 Medical diagnosis
 Aerodynamics and space flights simulations
 Artificial intelligence and expert systems
 Mapping the human genome
 Image processing
7.3.1 Vector operations
A vector is an ordered, one-dimensional array of data items. A vector V of
length n is represented as a row vector by V = [V1, V2, V3, ..., Vn]. It may be represented as a
column vector if the data items are listed in a column. On a scalar machine, operations on a vector
must be broken down into single computations with subscripted variables. The element
Vi of vector V is written as V(I), and the index I refers to a memory address or register
where the number is stored. To examine the difference between a conventional scalar
processor and a vector processor, consider the following Fortran DO loop:
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
This is a program for adding two vectors A and B of length 100 to produce a vector
C.
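A vector processor performs this as a single vector instruction; in Python the whole loop collapses to one element-wise operation (illustrative):

```python
def vector_add(A, B):
    """Element-wise vector add: the operation of the Fortran loop
    C(I) = B(I) + A(I), expressed as one vector operation."""
    return [a + b for a, b in zip(A, B)]
```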
a) Matrix Multiplication
Matrix multiplication is one of the most computationally intensive operations performed in
computers with vector processors. The multiplication of two n x n matrices consists of n^2
inner products or n^3 multiply-add operations. An n x m matrix of numbers has n rows
and m columns and may be considered as constituting a set of n row vectors or a set of m
column vectors. Consider, for example, the multiplication of two 3 x 3 matrices A and B.

The product matrix C is a 3 x 3 matrix whose elements are related to the elements of A
and B by the inner product:


For example, the number in the first row and first column of matrix C is calculated by
letting i = 1, j = 1, to obtain c11 = a11 b11 + a12 b21 + a13 b31.

Figure: Instruction format for vector processor


In general, the inner product consists of the sum of k product terms of the form cij = ai1 b1j + ai2 b2j + ... + aik bkj.
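The inner-product rule can be checked with a short sketch (plain Python lists stand in for matrices):

```python
def matmul(A, B):
    """n x n matrix product: c[i][j] is the inner product of row i of A
    with column j of B, i.e. the sum over k of a[i][k] * b[k][j]."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```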

b) Memory Interleaving
Memory interleaving is the technique of using memory from two or more sources. An
instruction pipeline may require the fetching of an instruction and an operand at the same
time from two different segments. Similarly, an arithmetic pipeline usually requires two
or more operands to enter the pipeline at the same time. Instead of using two memory
buses for simultaneous access, the memory can be partitioned into a number of modules
connected to common memory address and data buses.

Figure: Multiple module memory organization


The advantage of a modular memory is that it allows the use of a technique called interleaving.
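Low-order interleaving maps consecutive addresses to consecutive modules so that a pipeline can access several modules in overlapped fashion; a minimal sketch (the module count is an assumption for the example):

```python
def interleave(addr, modules=4):
    """Low-order interleaving: return (module number, word within module)
    so that consecutive addresses fall in consecutive modules."""
    return addr % modules, addr // modules
```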
7.4 Array Processors
An array processor is a processor that performs computations on large arrays of data.
The term is used to refer to two different types of processors. An attached array
processor is an auxiliary processor attached to a general-purpose computer. It is
intended to improve the performance of the host computer in specific numerical
computation tasks. An SIMD array processor is a processor that has a single-instruction,
multiple-data organization. It manipulates vector instructions by means of
multiple functional units responding to a common instruction.
 Attached Array Processor
An attached array processor is designed as a peripheral for a conventional host computer,
and its purpose is to enhance the performance of the computer by providing vector
processing for complex scientific applications. It achieves high performance by means of
parallel processing with multiple functional units. It includes an arithmetic unit containing
one or more pipelined floating-point adders and multipliers. The array processor can be
programmed by the user to accommodate a variety of arithmetic problems.

Figure: Attached array processor with host computer


The above figure represents the host computer connected to the array processor.
 SIMD Array Processor
An SIMD array processor is a computer with multiple processing units operating in
parallel. The processing units are synchronized to perform the same operation under the
control of a common control unit, thus providing a single-instruction, multiple-data-stream
(SIMD) organization. A general block diagram of an array processor is shown in the
figure below. It contains a set of identical processing elements (PEs), each having a
local memory M.


Figure: SIMD array processor organization
