Memory Hierarchy and Cache Memory
As can be seen, for high percentages of level 1 accesses, the average total access time is
much closer to that of level 1 than that of level 2. In our example, suppose 95% of the
memory accesses are found in the cache (H = 0.95). Then the average time to access
a byte can be expressed as

Ts = H × T1 + (1 − H) × (T1 + T2)

where T1 and T2 are the access times of level 1 (the cache) and level 2 (main memory),
respectively. Since H is close to 1 and T1 is much smaller than T2, the result is close
to the access time of the faster memory. So the strategy of using two memory levels
works in principle.
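As a quick check of this formula, here is a minimal Python sketch. The access times used (T1 = 0.1 μs for the cache, T2 = 1 μs for main memory) are illustrative values assumed for this sketch, not figures taken from the text.

```python
def average_access_time(h: float, t1: float, t2: float) -> float:
    """Average time per access for a two-level memory.

    A hit costs t1; a miss costs t1 (the failed cache lookup)
    plus t2 (the main-memory access).
    """
    return h * t1 + (1 - h) * (t1 + t2)

# Assumed example values: H = 0.95, T1 = 0.1 us, T2 = 1 us.
print(round(average_access_time(0.95, 0.1, 1.0), 3))  # 0.15 us -- close to T1
```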
The basis for the validity of this condition (decreasing frequency of access to
successively lower levels of memory by the processor) is a principle known as locality
of reference.
Locality of reference, also known as the principle of locality, is the tendency of a
processor to access the same set of memory locations repetitively over a short period
of time.
Accordingly, it is possible to organize data across the hierarchy such that the
percentage of accesses to each successively lower level is substantially less than that
of the level above.
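As an illustration (a hypothetical sketch, not from the text), the following loop exhibits both forms of locality:

```python
data = list(range(1024))

total = 0
for i in range(len(data)):
    # 'total' and 'i' are touched on every iteration (temporal
    # locality); data[i] walks consecutive addresses, so one fetched
    # cache block serves several subsequent accesses (spatial locality).
    total += data[i]
```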
The fastest, smallest, and most expensive type of memory consists of the registers
internal to the processor.
Skipping down two levels, main memory is the principal internal memory system of
the computer. Each location in main memory has a unique address.
Main memory is usually extended with a higher-speed, smaller cache. The cache is not
usually visible to the programmer or, indeed, to the processor. It is a device for staging
the movement of data between main memory and processor registers to improve
performance.
The three forms of memory just described are, typically, volatile and employ
semiconductor technology.
Data are stored more permanently on external mass storage devices, of which the
most common are hard disk and removable media, such as removable disk, tape, and
optical storage.
External, nonvolatile memory is also referred to as secondary memory or auxiliary
memory. These are used to store program and data files.
A hard disk is also used to provide an extension to main memory known as virtual
memory.
Additional levels can be effectively added to the hierarchy in software. For example, a
portion of main memory can be used as a buffer to temporarily hold data that are to
be read out to disk. Such a technique, sometimes referred to as a disk cache, improves
performance in two ways:
Disk writes are clustered. Instead of many small transfers of data, we have a few large
transfers of data. This improves disk performance and minimizes processor
involvement.
Some data destined for write-out may be referenced by a program before the next
dump to disk. In that case, the data are retrieved rapidly from the software cache
rather than slowly from the disk.
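The following Python sketch models this idea under stated assumptions: writes accumulate in a main-memory buffer and are flushed to the (simulated) disk in one clustered transfer, and reads of not-yet-flushed blocks are served from the buffer. The DISK dictionary and the threshold of 8 blocks are illustrative choices, not details from the text.

```python
DISK = {}  # stand-in for the disk itself: block number -> data

class DiskCache:
    """Buffer writes in main memory; flush them in one large transfer."""

    def __init__(self, flush_threshold: int = 8):
        self.buffer = {}                      # pending writes
        self.flush_threshold = flush_threshold

    def write(self, block_no: int, data: bytes) -> None:
        self.buffer[block_no] = data
        if len(self.buffer) >= self.flush_threshold:
            self.flush()                      # one clustered disk transfer

    def read(self, block_no: int) -> bytes:
        if block_no in self.buffer:           # referenced before the next
            return self.buffer[block_no]      # dump: served from memory
        return DISK.get(block_no, b"")        # slow path: go to disk

    def flush(self) -> None:
        DISK.update(self.buffer)              # many writes, one transfer
        self.buffer.clear()
```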
Motivation
The rate at which the processor can execute instructions is clearly limited by the
memory cycle time (the time it takes to read one word from or write one word to
memory).
This limitation has been a significant problem because of the persistent mismatch
between processor and main memory speeds:
Over the years, processor speed has consistently increased more rapidly than memory
access speed.
We are faced with a trade-off among speed, cost, and size.
Ideally, main memory should be built with the same technology as that of the
processor registers, giving memory cycle times comparable to processor cycle times.
This has always been too expensive a strategy. The solution is to exploit the principle
of locality by providing a small, fast memory between the processor and main
memory, namely the cache.
Cache Principles
The cache contains a copy of a portion of main memory.
When the processor attempts to read a byte or word of memory, a check is made to
determine if the byte or word is in the cache.
If so, the byte or word is delivered to the processor (a cache hit). If not, a block of main
memory, consisting of some fixed number of bytes, is first read into the cache, and then
the byte or word is delivered to the processor (a cache miss).
Systems commonly use multiple levels of cache: the L2 cache is slower and typically
larger than the L1 cache, and the L3 cache is slower and typically larger than the L2
cache.
The following figure depicts the structure of a cache/main-memory system.
Main memory consists of up to 2^n addressable words, with each word having a unique
n-bit address.
This memory is considered to consist of a number of fixed-length blocks of K words
each. That is, there are M = 2^n / K blocks.
The cache consists of C slots (also referred to as lines) of K words each, and the number
of slots is considerably less than the number of main-memory blocks (C << M, where <<
means "much less than").
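Plugging in illustrative numbers (assumed for this sketch, not given in the text): with a 16-bit address and 4-word blocks, the number of blocks far exceeds any plausible number of cache slots.

```python
n, K = 16, 4        # assumed: 16-bit addresses, 4 words per block
C = 128             # assumed number of cache slots, chosen so C << M
M = 2**n // K       # number of main-memory blocks
print(M)            # 16384 -- far more blocks than the 128 cache slots
```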
If a word in a block of memory that is not in the cache is read, that block is transferred
to one of the slots of the cache.
Each slot includes a tag that identifies which particular block is currently being stored.
The tag is usually some number of higher-order bits of the address and refers to all
addresses that begin with that sequence of bits.
As a simple example, suppose that we have a 6-bit address and a 2-bit tag. The tag 01
refers to the block of locations with the following addresses: 010000, 010001, 010010,
010011, 010100, 010101, 010110, 010111, 011000, 011001, 011010, 011011, 011100,
011101, 011110, 011111.
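This can be checked directly; the short sketch below enumerates every 6-bit address whose two high-order bits equal the tag 01:

```python
tag = 0b01
addresses = [a for a in range(2**6) if a >> 4 == tag]  # top 2 of 6 bits
print([format(a, "06b") for a in addresses])
# ['010000', '010001', ..., '011111'] -- the 16 addresses listed above
```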
Cache Read Operation
The processor generates the address, RA, of a word to be read. If the word is contained
in the cache, it is delivered to the processor. Otherwise, the block containing that word is
loaded into the cache and the word is delivered to the processor.
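A minimal sketch of this read operation, assuming for illustration a 64-word main memory, 4-word blocks, and a cache modeled as a dictionary keyed by block number (so the key plays the role of the tag):

```python
K = 4                                  # assumed words per block
MEMORY = list(range(64))               # assumed 64-word main memory
cache: dict[int, list[int]] = {}       # block number -> resident block

def read(ra: int) -> int:
    """Deliver the word at address RA, loading its block on a miss."""
    block_no, offset = divmod(ra, K)
    if block_no not in cache:          # cache miss: transfer the whole
        start = block_no * K           # block containing RA
        cache[block_no] = MEMORY[start:start + K]
    return cache[block_no][offset]     # cache hit: deliver the word

print(read(22))  # miss: block 5 (words 20-23) is loaded; prints 22
print(read(23))  # hit: block 5 is already resident
```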
CACHE DESIGN
Key elements in cache design:
Cache size
Block size
Mapping function
Replacement algorithm
Write policy
Number of cache levels
Cache size: It turns out that reasonably small caches can have a significant impact on
performance.
Block size: The unit of data exchanged between cache and main memory. As the block size
increases from very small to larger sizes, the hit ratio will at first increase because of the
principle of locality: data in the vicinity of a referenced word are likely to be referenced in
the near future. As the block becomes even larger, however, the hit ratio begins to decrease,
because the probability of using the newly fetched data becomes less than the probability of
reusing the data that must be displaced to make room for the new block.
Mapping Function: When a new block of data is read into the cache, the mapping function
determines which cache location the block will occupy. Two constraints affect the design of
the mapping function.
First, when one block is read in, another may have to be replaced.
The more flexible the mapping function, the more scope we have to design a replacement
algorithm to maximize the hit ratio.
Second, the more flexible the mapping function, the more complex is the circuitry required to
search the cache to determine if a given block is in the cache.
The replacement algorithm: An effective strategy is to replace the block that has been in the
cache longest with no reference to it. This policy is referred to as the least-recently-used (LRU)
algorithm.
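A minimal LRU sketch, using Python's OrderedDict to track recency; the capacity and interface are assumptions for illustration, not part of the text.

```python
from collections import OrderedDict

class LRUCache:
    """Evict the slot that has gone longest without a reference."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = OrderedDict()             # least recently used first

    def access(self, block_no: int, block) -> None:
        if block_no in self.slots:
            self.slots.move_to_end(block_no)   # referenced: now most recent
        else:
            if len(self.slots) >= self.capacity:
                self.slots.popitem(last=False) # evict least recently used
            self.slots[block_no] = block
```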
Write policy: If the contents of a block in the cache have been altered, it is necessary to write
the block back to main memory before replacing it. The write policy dictates when this memory
write operation takes place.
At one extreme, the writing occurs every time the block is updated (write through). At the other
extreme, the writing occurs only when the block is replaced (write back).
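The two policies can be sketched as follows (an illustrative model; MAIN_MEMORY and the dirty set are assumptions of the sketch, not details from the text):

```python
MAIN_MEMORY = {}   # stand-in for main memory: block number -> data

def write_through(cache: dict, block_no: int, data) -> None:
    cache[block_no] = data
    MAIN_MEMORY[block_no] = data       # memory updated on every write

def write_back(cache: dict, dirty: set, block_no: int, data) -> None:
    cache[block_no] = data
    dirty.add(block_no)                # memory write deferred

def replace(cache: dict, dirty: set, block_no: int) -> None:
    if block_no in dirty:              # altered: must write back first
        MAIN_MEMORY[block_no] = cache[block_no]
        dirty.discard(block_no)
    del cache[block_no]
```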