Unit IV Memory Organization
Unit IV Memory Organization
Unit IV Memory Organization
MEMORY ORGANIZATION
Hardware Organization
Match Logic
Read Operation
Write Operation
Cache Memory
Associative
Direct
Set associative Mapping
Writing into cache
Virtual Memory
Address space and Memory Space
Address Mapping using Pages
Associative Memory page table
Page Replacement
Lets us Sum Up
Lesson – End Activities
Points for Discussion
References
INTRODUCTION
In this unit we will discuss about the main memory, Associative memory, cache memory
and virtual memory.
MEMORY HIERARCHY
Magnetic Tape
I/O PROCESSOR
Main Memory
The part of the operating system that supervises the flow of information between all
storage devices is called “Memory Management System”. The memory management system
distributes program and data to various levels in the memory hierarchy. They are
Batch Mode
Time sharing mode
In a Batch mode, each user prepares his program off- line and submits it to the computer
center. An operator loads all programs into the computer where they are executed. The operator
retrieves the printed output and returns it to the user.
A major concept common to both batch and time – sharing modes is their use of
Multiprogramming. Processor logic is usually faster than main memory access time with
the result that processing speed is mostly limited by the speed of main memory.
The cache memory is used to store segments of programs currently being executed in the
CPU. In a computer system where the demand for service is high, it is customary to run all
programs in one of two possible modes. At the bottom of the hierarchy are the relatively slow
magnetic tapes used to store removable files. Above if are the magnetic disks or drums used as
backup storage. The main memory occupies a central position by being able to communicate
directly with the CPU and with auxiliary devices through an I/O processor.
When programs not residing in main memory are needed by the CPU, they are brought in
from auxiliary memory. Programs not currently needed in main memory are transferred into
auxiliary memory to provide space for currently used programs and data according to their
expected frequency of storage. The objective of the memory management system is to adjust the
frequency with which the various memories are referenced to provide an efficient method of
transfers between levels so as to maximize the utilization of all computer components.
MAIN MEMORY
The main memory is the central storage unit in a computer system.
It is a relatively large and fast memory used to store programs and data during the
computer operation. The principal technology used for the main memory is based on
semiconductor integrated circuits. Integrated circuit RAM chips are available in two possible
operating modes, static and dynamic. The static RAM consists essentially of internal flip- flops
that store the binary information. The stored information remains valid as long as power is
applied to the unit. The dynamic RAM stores the binary information in the form of electric
charges that are applied to capacitors. The capacitors are provided inside the chip by MOS
transistors. The stored charges on the capacitors tend to discharge with time and the capacitors
must be periodically recharged by refreshing the dynamic memory. Refreshing is done by cycling
through the words every few milliseconds to restore the decaying charge. The dynamic RAM
offers reduced power consumption and larger storage capacity in a single memory chip. The
static RAM is easier to use and has shorter read and write cycles.
Among other things, the ROM portion of main memory is needed for storing an initial
program called a bootstrap loader. The bootstrap loader is a program whose function is to start
the computer software operating when power is turned on.
The start up a computer consists of turning the power on and starting the execution of
an initial program. Thus when power is turned on, the hardware of the computer sets the
program counter to first address of the bootstrap loader. The bootstrap program loads a portion
of the operating system from disk to; main memory and control is then transferred to the
operating system, which prepares the computer for general use.
RAM and ROM chips are available in a variety of sizes. If the memory needed for the
computer is larger than the capacity of one chip, it is necessary to combine a number of chips to
form the required memory size. To demonstrate the chip interconnection, we will show an
example of a 1024 * 8 memory constructed with 128 * 8 RAM chips and 412 * 8 ROM chips.
A RAM chip is better suited for communication with the CPU if it has one or more
control inputs that selected the chip only when needed. Another common feature is a
bidirectional data bus that allows the transfer of data either from memory to CPU during a read
operation or from CPU to memory during a write operation. A bidirectional bus can be
constructed with three-state buffers. A three-state buffer output could be placed in one of three
possible states: a signal equivalent to logic 1, a signal equivalent to logic 0, or high impedance
state. The logic 1 and 0 are normal digital signals. The high impedance state behaves like an
open circuit, which means that the output does not carry a signal and has no logic significance.
The block diagram if a RAM chip is shown in the Figure 4.2(a).
The capacity of the memory is 128 words of eight bits per word. This requires a 7-bit
address and an 8-bit bidirectional data bus. The read and write inputs specify the memory
operation and the two chips select (CS) control inputs are for enabling the chip only when it is
select the chip facilitates the decoding of the address lines when multiple chips are used in the
microcomputer. The read and write inputs are sometimes combined into one line labeled R/W.
When the chip is selected, the two binary states in this line specify the two operations of read or
write.
The function table listed in Figure. 4.2(b) specifies the operation of the RAM chip. The
unit is in operation only when CS1 = 1 and CS2 = 0. The bar on top of the second select variable
indicates that this input is enabled when it is equal to 0. If the chip select inputs are not enabled,
or if they are enabled but the read or write inputs are not enabled, the memory is inhibited and its
data bus is in a high- impedance state. When CS1 = 1 and CS2 = 0, the memory can be placed in
a write or read mode. When the WR in-put is enabled, the memory stores a byte from the data
bus into a location specifies by the address input lines. When the RD input is enabled, the
content of the selected byte is placed into the data bus. The RD and WR signals control the
memory operation as well as the bus buffers associated with the bidirectional data bus.
A ROM chip is organized externally in a similar manner. However, since a ROM can
only read, the data bus can only be in an output mode. The block diagram of a ROM chip is
shown in Figure. 4.3. For the same-size chip, it is possible to have more bits of ROM than in
RAM, because the internal binary cells in ROM occupy less space than of RAM. For this reason,
the diagram specifies a 412-byte ROM, while the RAM has only 128 bytes.
chip select 1
CS1
The designer of a computer system must calculate the amount of memory required for the
particular application and assign it to either RAM or ROM. The interconnection between
memory and processor is then established from knowledge of the size of memory needed and the
type of RAM and ROM chips available. The addressing of memory can be established by means
of a table that specifies the memory address assigned to each chip. The table, called a memory
address map, is a pictorial representation of assigned address space for each chip in thesystem.
To demonstrate with a particular example, assume that a computer system needs 412
bytes of RAM and 412 bytes of ROM. The RAM and ROM chips to be used are specified in
Figure.4. 2 and 4. 3. The memory address map for this configuration is shown in Table 4.1. The
component column specifies whether a RAM or a ROM chip is used. The hexadecimal address
column assigns a range of hexadecimal equivalent addresses for each chip. The address bus lines
are listed in the third column. Although there are 16 lines in the address bus, the table shows
only 10 lines because the other 6 are not used in this example and are assumed to be zero. The
small x‟s under the address bus lines designate those lines that must be connected to the address
inputs in each chip. The RAM chips have 128 bytes and need seven address lines. The ROM
chip has 412 bytes and needs 9 address lines. The x‟s are always assigned to the low-order bus
lines: lines 1 through 7 for the RAM and lines 1 through 9 for the ROM. It is now necessary to
distinguish between four RAM chips by assigning to each a different address. For this particular
example we choose bus lines 8 and 9 to represent four distinct binary combinations. Note that
any other pair of unused bus lines can be chosen for this purpose. The table clearly shows that
the nine low-order bus lines constitute a memory space for RAM equal to 29 = 412 bytes. The
distinction between a RAM and ROM address is done with another bus line. Here we choose
line 10 for this purpose. When line 10 is 0, the CPU selects a RAM, and when this line is equal
to 1, it selects the ROM.
ADDRESS BUS
COMPONENT HEXA DECIMAL 10 9 8 7 6 4 4 3 2 1
ADDRESS
RAM 1 0000-007F 0 0 0 X X X X X X X
RAM 2 0080-00FF 0 0 1 X X X X X X X
RAM 3 0100-017F 0 1 0 X X X X X X X
RAM 4 0180-01FF 0 1 1 X X X X X X X
ROM 0200-03FF 1 X X X X X X X X X
TABLE 4.1 Memory Address Map for Microcomputer
The equivalent hexadecimal address for each chip is obtained from the information under the
address bus assignment. The address bus lines are subdivided into groups of four bits each so
that each group can be represented with a hexadecimal digit. The first hexadecimal digit
represents lines 9 to 12, but lines 11 and 12 are always 0. The range of hexadecimal addresses
for each component is determined from the x‟s associated with it. These x‟s represent a binary
number that can range from an all-1‟svalue.
RAM and ROM chips are connected to a CPU through the data and address buses. The
low-order lines in the address bus select the byte within the chips and other lines in the address
bus select a part9cular chip through its chip select inputs. The connection of memory chips to
the CPU is shown in Figure. 4.4. This configuration gives a memory capacity of 412 bytes of
RAM and 412 bytes of ROM. It implements the memory map of Table 4.4. Each RAM receives
the seven low-order bits of the address bus to select one of 128 possible bytes. The particular
RAM chip selected is determines from lines 8 and 9 in the address bus. This is done through a 2
* 4 decoder whose outputs go to the CS1 inputs in each RAM chip is selected. When 01, the
second RAM chip is selected, and so on. The RD and WR outputs from the microprocessor are
applied to the inputs of each RAM chip.
The selection between RAM and ROM is achieved through bus line 10. The RAMs are
selected when the bit in this line is 0 and the ROM when the bit is 1. The other chip select input
in the ROM is connected to the RD control line for the ROM chip to be enabled only during a
read operation. Address bus lines 1 to 9 are applied to the input address of ROM without going
through the decoder. This assigns addresses 0 to 411 to RAM and 412 to 1023 to ROM. The
data bus of the RAMs can transfer information in both directions.
Decoder
3210
CS1
CS2
RD 128 X 8 Data
WR RAM 1
AD7
CS1
CS2
RD 128 X 8 Data
WR RAM 2
AD7
CS1
CS2
RD 128 X 8 Data
WR RAM 3
AD7
CS1
CS2
RD 128 X 8 Data
WR RAM 4
AD7
CS1
CS2
128 X 8 Data
AD9 ROM
Self check Exercise: 1
(a) The secondary memory is slower than that of main memory but has a larger capacity.
True / False
(b) In Random Access Memory any memory location can be accessed independently.
True / False
ASSOCIATIVE MEMORY
Many data processing applications require the search of items in a table to be stored in
memory. The time required to find an item stored in memory can be reduced considerably if
stored data can be identified for access by the content of the data itself rather than by an address.
The memory locates all words, which match the specified content, and marks them for
reading. Because of its organization, the associative memory is uniquely suited to do parallel
searches by data association. Moreover, searches can be done on an entire word or on a specific
field within a word. Associative memories are used in applications where the search time is very
critical and must be very short.
The block diagram of an associative memory is shown in the following Figure. 4.4.
Argument register (A)
Read
M
Write
M-words n-bits per word
Output
Figure. 4.4 Block Diagram of Associative Memory
The associative memory consists of a memory array and logic for m words with n bits per
word. The argument register A and key register K each have n bits, one for each have n bits, one
for each bit of a word. The argument register A and key register K each have n bits, one for each
bit of a word. The match register M has m bits, one for each memory word.
Each word in memory is compared in parallel with the content of the argument register.
The word that matches the bits of the argument register set a corresponding bit in the match
register. Reading is accomplished by a sequential process or access to memory for those words
whose corresponding bits in the match register have been set. The key register provides a mask
for choosing a particular field or key in the argument word.
The entire argument is compared with each memory word if the key register contains all
1‟s. Otherwise, only those bits in the argument that have 1‟s in their corresponding position of
the key register are compared.
Example
A 101 111100
K 111 000000
Word1 100 111100 => no match
Word2 101 000001 => match
Word2 matches the unmasked argument field because the three left-most bits of the argument and
the word are equal. The cells in the array are marked by the letter C with two subscripts. The first
subscript gives the word number and second specifies the bit position of the logic in the word.
Figure 4.6 Associative memory of an m word, n cells – per word
The relation between the memory array and external registers in an associative memory is
shown in Figure. 4.6
A1 Aj An
K1 Kj Kn
Input
Ai Kj
Write
R S
F ij
Match
Logic
Read
Output
The match logic for each word can be derived from the comparison algorithm of two
binary numbers. First, we neglect the K bits and compare the argument in A with the bits stored
in the cells of the words.
Word I is equal to the argument in A if
Aj = Fij for j=1,2…n
Xj = Aj Fij + Aj‟Fij„
For a word I to be equal to the argument in A we must have all Xj variables equal to 1.
This is the condition for setting the corresponding match bit Mi to 1. The function for this
condition is Mi = x1 x2x3… xn and constitute the AND operation of all pairs of matched bits.
We now include the key bit Kj in the comparison logic. The requirement is that if Kj=0,
the corresponding bits of Aj and need no comparison. Only when Kj=1 must be compared.
This requirement is achieved by OR ing each term with Kj. Thus
xj if kj = 1
Xj + Kj „=
1 if kj = 0
The match logic for word I in an associative memory can now be expressed by the
following Boolean function.
If we substitute the original definition of xj, the above Boolean function can be expressed
as follow
n
Mi = (Aj Fij + Aj„Fij „+ kj‟)
j=1
It is then necessary to scan the bits of the match register one at a time. The matched
words are read in sequence by applying a read signal to each word line whose corresponding M i
bit is a 1.
If only one word may match the unmasked argument field, then connect output Mi
directly to the read line in the same word position, the content of the matched word will be
presented automatically at the output lines and no special read command signal is needed.
Furthermore, if we exclude words having zero content, then all zero output will indicate that no
match occurred and that the searched item is not available in memory.
If the entire memory is loaded with new information at once, then the writing can be done
by addressing each location in sequence. The information is loaded prior to a search operation. If
unwanted words have to be deleted and new words inserted one at a time, there is a need for a
special register to distinguish between active an inactive words. This register is called “Tag
Register”.
The references to memory at any given interval of time tent to be contained within a few
localized areas in memory. This is called “Locality of reference”. If the active portions of the
program and data are placed in a fast small memory, the average memory access time can be
reduced, thus reducing the total execution time of the program. Such a fast small memory is
referred to as “Cache Memory”.
The performance of the cache memory is measured in terms of a quality called “Hit
Ratio”. When the CPU refers to memory and finds the word in cache, it produces a hit. If the
word is not found in cache, it counts it as a miss.
The hit ratios of 0.9 and higher have been reported. The average memory
access time of a computer system can be improved considerably by use of cache.
It is placed between the CPU and main memory. It is the faster component in the
hierarchy and approaches the speed of CPU components. Most frequently accessed instruction
and data are kept in the fast cache memory. Although the cache is only a small fraction of the
size of main memory, a large fraction of memory requests will be found due to locality of
reference.
When the CPU needs to access memory, the cache is examined. If it is found in the
cache, it is read very quickly. If it is not found in the cache, the main memory is accessed. A
block of words containing the one just accessed is then transferred from main memory to cache
memory.
For example,
A computer with cache access time of 100ns, a main memory access time of 1000ns and
a hit of 0.9 produce an average access time of 200ns. This is a considerable improvement over a
similar computer without a cache memory, whose access time is 1000ns.
The basic characteristic of cache memory is its fast access time. Therefore, very little or
no time must be wasted when searching for words in the cache. The transformation of data from
main memory to cache memory is referred to as a
“Mapping Process”. There are three types of mapping procedures are available.
Associative Mapping
Direct Mapping
Self – Associative Mapping.
ASSOCIATIVE MAPPING
The diagram, given below shows 3 words presently stored in cache. The address value of 14 bits
it‟s shown as a 4 digit octal number and its corresponding 12 bits is shown as 4 digit octal
number
A CPU address of 14 bits is placed in the argument register and associative memory is
searched for a matching address. If the address is found, the corresponding 12 bit data is read and
sent to the CPU.
Argument register
Address Data
01000 3440
02777 6710
22344 1234
Figure. 4.9 Associative Mapping Cache (all numbers in octal)
If no match occurs, the main memory is accessed for the word. The address – data pair is
then transferred to associative cache memory. If the cache is full, it must be displayed, using
replacement algorithm. FIFO may be used.
DIRECT MAPPING
Here, using RAM for the cache is presented in the following Figure 4.10.
6 Bits 9 Bits
Tag Index
00 000 000
32k x 12 a 412 x 12
Main Memory o d Cache Memory
Address = 14 bits c d Address = 9 bits
Data = 12 bits t r Data = 12 bits
a e
l s
s
777
77 777
4.10 Addressing relationships between main and cache memories
The 14-bit CPU address is divided into two fields. The 9 least significant bits constitute
the index field and the remaining 6 bits form the tag fields. The main memory needs an address
but includes both the tag and the index bits. The cache memory requires the index bit only i.e., 9.
There are 2k words in the cache memory & 2n words in the main memory.
Eg: k = 9, n = 14
The internal organization of the words in the cache is shown in the Figure. 4.11. Each
word in cache consists of the data word and it associated tag. When a new word is brought into
cache, the tag bits store along data. When the CPU generates a memory request, the index field is
used in the address to access the cache. The tag field of the CPU address is equal to tag in the
word from cache; there is a hit, otherwise miss.
00777 2340
01000 3440
a) Cache Memory
02777 6710
a) Main Memory
Figure 4.11 Direct mapping Cache Organization
The cache tag is 00 but the address tag is 02, which does not produce match. Therefore
the main memory is accessed and the data word 4670 is transferred to CPU. The cache word at
index address 000 is then replaced with a tag of 02 and data of 4670.
Each data word is stored together with its tag and thenumber of tag – data items in one word
of cache is said to form a set.
02 6710 00 2340
777
Figure 4.12 Self – associative mapping cache with set size of two
Each index address refers to two data words and their associated tags. Each tag requires 6
bits & each data word has 12 bits, so the word length is 2(6+12) =36 bits. An index address of 9
bits can accommodate 412 words. It can accommodate 1024 words. When the CPU generates a
memory request, the index value of the address is used to access the cache.
The tag field of the CPU address is compared with both tags in the cache.
The most common replacement algorithms are
Random replacement
FIFO
Least Recently Used (LRU)
The simplest and most commonly used procedure is to update main memory with every
memory write operation, with cache memory being update in parallel if it contains the word at
the specified address. This is called the “Write-through method”.
This method has the advantage that main memory always contains the same data as the
cache. The second procedure is called the “write-back method”. In this method only the cache
location is updated during a write operation. The location is then marked by a flag so that later
when the word is removed from the cache it is copied into main memory.
The reason for the write-back method is that during the time a word resides in the cache,
it may be updated several times. Analytical results indicate that the number of memory writes in
a typical program ranges between 10 and 30 % of the total references to memory.
Self-Check Exercise 2
1. The direct mapping of main memory to cache is achieved by mapping a block of main
memory to any slot ofcache.
True / False
2. A replacement algorithm is needed only for associative and set associative mapping of cache.
True / False
3. Write policy is not needed for instruction cache.
True / False
In a memory hierarchy system, programs and data are first stored in auxiliary memory.
Portions of program or data are brought into main memory, as they are needed by the CPU.
Virtual memory is a concept used in some large computer systems that permit the user to
construct his programs as though he had a large memory space, equal to the totality of auxiliary
memory. Each address, which is referenced by the CPU, goes through an address mapping from
FC:\WINDOWS\hinhem.scrthe so- called virtual address to an actual address in main memory.
Program 1 Program 1
Data1, 1
Data 1,1
Program 2
Data 2, 1
Memory Space
M=32k=214
Address Space
N=1024k =220
Figure 4.13 Relation between address and memory space in a virtual memory system
A table is needed to map a virtual address of 20 bits to a physical address of 14 bits. The
mapping table may be stored in a separate memory or in main memory.
Virtual Address
Memory
Virtual Address Mapping Main Memory
Register Main Memory
Table
(20 Bits) Address Register
(14 Bits)
Introduction in the address space and the memory space are each divided into groups of
equal size. The group in the physical memory called blocks, 64 to 4096 words. The group in the
auxiliary memory is called pages.
For example,
Address space is divided into 1024 pages and main memory is divided into 32 blocks.
The program is divided into pages. Positions of programs are moved from auxiliary memory to
main memory. The terms page refers to group of address space of the same size.
Consider a computer with an address space of 8k and a memory space of 4k. If we split
each into groups of 1k words we obtain 8 pages and 4 blocks.
Page 0
Page 1 M= 4k= 212
Page 2
Page 3
Page 4
Page 4
Page 6
Page 7
Address Space
N=8k=213
Block 0
Block 1
Block 2
Figure 4.14 Address space and memory space split into groups of 1k words
At any time, 4 pages may reside in main memory. Each virtual address is represented by
two numbers. A page number address and a line within the page. In a computer with 2p words
per page, p bits are used to specify a line address and the remaining bits of the virtual address
specify the page number. A virtual address has 13 bits. Since each page consists of 2 10 = 1024
words, the high order 3 bits of a virtual address will specify one of the 8 pages and the low order
10 bits give the line address within the page.
Virtual Address
Page Number
Argument Register
101 Line number
001 11
010 00 Associative memory
101 01
110 10
Page no block no
Figure. 4.17 An Associative Memory Page Table
Page replacement
(1) Which page in memory ought to be removed to make room for a new page
(2) When a new page is to be transferred from auxiliary memory and
(3) Where the page is to be placed in the main memory. The hardware mapping
mechanism and the memory management software together constitute the
architectural of a virtual memory.
When a program starts execution one or more pages are transferred into main memory
and the page table is set to indicate checks position.
The program is executed from main memory until if attempt to reference a page that is
still in auxiliary memory. This condition is called page fault when page fault. When page fault
occurs, the execution of the present program is suspended until the required page is brought into
main memory.
Since loading a page from auxiliary memory to main memory is basically on I/O
operation, the operating system assigns this task to the I/O processor. In the mean time control is
transferred to the program in memory that is waiting to be processed in the CPU. Later when the
memory block has been assigned and the transfer completed the original program can resume its
operation.
Two of the most common development algorithms used are the first in first out (FIFO)
and the least recently used (LRU). The FIFO algorithms select for replacement the page that has
been in the memory the longest time. Each time a page is loaded into memory, its identification
number is pushed into a FIFO will be full whenever memory has no more empty blocks. When a
new page must be loaded, the page least recently brought in is removed. The page to be removed
is easily determined because its identification number is at the top of the FIFO stack. The FIFO
replacement policy has the advantage of being easy to employment. It has the disadvantage that
under certain circumstances pages are removed and loaded from memory too frequently.
The LRU policy is more difficult to implement but has been more attractive on the
assumption that the least recently used page is better candidate for removal than the least
recently loaded page as in FIFO. The LRU algorithm can be implemented by associating a
counter with every page that is in main memory. When a page is referenced, its associated
counted is set to zero. At fixed intervals of time, the counters associated with all pages presently
in memory are incremented by 1. The least recently used page is the page with the highest count.
The counters are often called aging registers as their count indicates their age, that is, how long
ago their associated pages have been referenced.
Self-Check Exercise: 3
Thus we have taken a complete view of the memory system of computer system along
with the various technologies. The unit has outlined the importance of the memory system, the
memory hierarchy, the main memory and its technologies.