Memory


Memory Technology

Main Memory
Memory Hierarchy
DRAM vs. SRAM
•DRAM is short for Dynamic Random Access Memory

•SRAM is short for Static Random Access Memory

DRAM is dynamic in that, unlike SRAM, its storage cells must be
refreshed (given a new electronic charge) every few milliseconds.
SRAM does not need refreshing because each cell is a bistable
circuit that holds its value as long as power is supplied, rather
than a capacitor whose charge leaks away.
Random-Access Memory (RAM)
• Key features
– RAM is packaged as a chip.
– Basic storage unit is a cell (one bit per cell).
– Multiple RAM chips form a memory.
• Static RAM (SRAM)
– Each cell stores a bit with a six-transistor circuit.
– Retains value indefinitely, as long as it is kept powered.
– Relatively insensitive to disturbances such as electrical noise.
– Faster and more expensive than DRAM.
• Dynamic RAM (DRAM)
– Each cell stores a bit with a capacitor and transistor.
– Value must be refreshed every 10-100 ms.
– Sensitive to disturbances.
– Slower and cheaper than SRAM.
Parameters of Memory Technology
• Access time
• Memory size
• Cost
• Bandwidth
• Unit of transfer
Inclusion
• CPU registers
• (M1) Cache
• (M2) Main memory
• (M3) Disk storage
• (M4) Tape unit

Conventional DRAM Organization
• d x w DRAM: dw total bits organized as d supercells of size w bits
• Example: a 16 x 8 DRAM chip holds 16 supercells of 8 bits each,
arranged as a 4 x 4 array of rows and columns
• The memory controller (connected to the CPU) selects a supercell
such as (2,1) by sending a 2-bit row address and a 2-bit column address
• The selected row is first copied into an internal row buffer, and the
8 data bits are then transferred

[Figure: 16 x 8 DRAM chip as a 4 x 4 supercell array, with a memory
controller driving a 2-bit address line and an 8-bit data line]

Typical Bus Structure Connecting
CPU and Memory
• A bus is a collection of parallel wires that carry
address, data, and control signals.
• Buses are typically shared by multiple devices.

[Figure: the CPU chip (register file, ALU, bus interface) connects
via the system bus to an I/O bridge, which connects via the memory
bus to main memory]
Read Operation
• On a read, the CPU first tries to find the data in the cache; if it is
not there, the cache is filled from main memory and the data is then
returned to the CPU.

Write Operation
• On a write, the CPU writes the information into both the cache and
main memory.
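The read and write behavior described above can be sketched with a toy direct-mapped, write-through cache. Everything here (the dict-backed memory, the line count, the class name) is hypothetical; real caches operate on multi-byte blocks with tag comparison in hardware:

```python
class ToyCache:
    """Toy direct-mapped, write-through cache over a dict-based 'memory'."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory
        self.num_lines = num_lines
        self.lines = {}              # line index -> (address, value)
        self.hits = self.misses = 0

    def read(self, addr):
        line = addr % self.num_lines
        entry = self.lines.get(line)
        if entry is not None and entry[0] == addr:
            self.hits += 1           # found in cache
            return entry[1]
        self.misses += 1             # miss: fill the line from main memory
        value = self.memory[addr]
        self.lines[line] = (addr, value)
        return value

    def write(self, addr, value):
        # Write-through: update both the cache and main memory.
        self.lines[addr % self.num_lines] = (addr, value)
        self.memory[addr] = value

memory = {a: 0 for a in range(16)}
cache = ToyCache(memory)
cache.write(3, 42)
print(cache.read(3))  # served from the cache: a hit
```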
Use of Cache

Why is cache used on parallel computers?


The advances in memory technology aren’t keeping up with
processor innovations.
Memory isn’t speeding up as fast as the processors.
One way to alleviate the performance gap between main
memory and the processors is to have local cache.
The cache memory can be accessed faster than the main
memory.
Cache keeps up with the fast processors, and keeps them
busy with data.
Shared Memory

[Figure: processors 1, 2, and 3, each with its own cache and memory
module (Memory 1-3), connected by a network]
Cache Coherence

What is cache coherence?

Keeping a data element that appears in several caches
consistent across those caches and with the value in
main memory.
Various cache coherence protocols are used:
• snoopy protocols
• directory-based protocols
Types of main memory

There are two types of main memory, Random Access Memory (RAM) and Read
Only Memory (ROM)

Random Access Memory (RAM)

• holds its data as long as the computer is switched on
• All data in RAM is lost when the computer is switched off
• Described as being volatile
• It is direct access as it can be both written to and read from in any order

Its purpose is to temporarily hold programs and data for processing. In
modern computers it also holds the operating system.
Types of RAM
1. Dynamic Random Access Memory (DRAM)
• Contents are constantly refreshed 1000 times per second
• Access time 60 – 70 nanoseconds
Note: a nanosecond is one billionth of a second!

2. Synchronous Dynamic Random Access Memory (SDRAM)


• Quicker than DRAM
• Access time less than 60 nanoseconds

3. Direct Rambus Dynamic Random Access Memory (DRDRAM)


• New type of RAM architecture
• Access time 20 times faster than DRAM
• More expensive
Types of RAM
4. Static Random Access Memory (SRAM)
• Doesn’t need refreshing
• Retains contents as long as power applied to the chip
• Access time around 10 nanoseconds
• Used for cache memory
• Also holds date and time settings, kept powered by a small battery

5. Cache memory
• Small amount of memory typically 256 or 512 kilobytes
• Temporary store for often used instructions
• Level 1 cache is built within the CPU (internal)
• Level 2 cache may be on chip or nearby (external)
• Faster for CPU to access than main memory
Types of RAM
6. Video Random Access memory
• Holds data to be displayed on computer screen
• Has two data paths allowing READ and WRITE to occur at the same time
• A system’s amount of VRAM relates to the number of colours and resolution
• A graphics card may have its own VRAM chip on board

7. Virtual memory
• Uses backing storage e.g. hard disk as a temporary location for programs
and data where insufficient RAM available
• Swaps programs and data between the hard-disk and RAM as the CPU
requires them for processing
• A cheap method of running large or many programs on a computer system
• Cost is speed: the CPU can access RAM in nanoseconds but hard-disk in
milliseconds (Note: a millisecond is a thousandth of a second)
• Virtual memory is much slower than RAM
Locality Of reference
• 90-10 rule: a typical program may spend 90%
of its execution time in only 10% of its code.
• Temporal
• Spatial
• sequential
• Principle of Locality:
Locality
– Programs tend to reuse data and instructions near those they
have used recently, or that were recently referenced
themselves.
– Temporal locality: Recently referenced items are likely to be
referenced in the near future.
– Spatial locality: Items with nearby addresses tend to be
referenced close together in time.
Memory Capacity Planning
• Hit Ratio
• Effective access time
• Cost of memory hierarchy
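Of these parameters, the effective access time follows directly from the hit ratio. A minimal sketch, using the common two-level formulation in which a miss costs a full main-memory access (the 10 ns cache and 100 ns memory timings are assumed for illustration):

```python
def effective_access_time(hit_ratio, t_cache, t_memory):
    """T_eff = h * t_cache + (1 - h) * t_memory (two-level hierarchy)."""
    return hit_ratio * t_cache + (1 - hit_ratio) * t_memory

# With a 95% hit ratio, a 10 ns cache, and 100 ns main memory:
print(effective_access_time(0.95, 10, 100))  # -> 14.5 (ns)
```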
Page replacement algorithm
• FIFO
• Optimal page replacement
• LRU
• LFU
• MFU
Page Replacement Algorithms
• So, when we have a page fault we have to find
an eviction candidate.
• Optimally, we would like to evict the page that
will not be referenced again for the longest
amount of time.
– In reality the OS has no way of knowing when
each of the pages will be referenced next.
Not Recently Used
• Examine M and R bits associated with each page.
• When a page fault occurs, the OS places all pages
into one of 4 classifications
– Class 0: R=0, M=0
– Class 1: R=0, M=1
– Class 2: R=1, M=0
– Class 3: R=1, M=1
• NRU Algorithm says to remove a page at random
from the lowest nonempty class.
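The four classes can be computed directly from the R and M bits. A small sketch (the page names and bit values are hypothetical; the real algorithm then evicts a random page from the lowest nonempty class):

```python
def nru_class(referenced, modified):
    """Map a page's R and M bits to its NRU class (0..3)."""
    return 2 * int(referenced) + int(modified)

# Hypothetical snapshot at a page fault: {page: (R, M)}
pages = {"A": (1, 1), "B": (0, 1), "C": (1, 0), "D": (0, 1)}
lowest = min(nru_class(r, m) for r, m in pages.values())
# Sorted here for determinism; NRU actually picks one at random.
candidates = sorted(p for p, (r, m) in pages.items() if nru_class(r, m) == lowest)
print(lowest, candidates)  # class 1: B and D are eviction candidates
```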
FIFO
• Simple design of having a queue maintained for
pages in memory.
• The head of the queue contains oldest page in
memory.
• The tail of the queue contains the newest page in
memory.
• Is there a potential problem with evicting the head of
the queue each time?
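The queue-based design above can be sketched as a page-fault counter (the reference string is an arbitrary example, not from the source):

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()                 # head = oldest page in memory
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()     # evict the head (oldest page)
            frames.append(page)      # newest page goes to the tail
    return faults

print(fifo_faults([7, 0, 1, 2, 0, 3, 0, 4], 3))  # -> 7 faults
```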
Optimal Page-Replacement Algorithm

• Replace the page that will not be used for the longest
period of time

• This is designed to guarantee the lowest page-fault
rate for a fixed number of frames
FIFO with second chance
• Performs like FIFO, but inspects the R bit.
• If R bit of the head page is 1, it is cleared and placed at the
tail.
• What happens if all pages in memory have the R bit set to
one?
Clock Page Replacement
• Behaves like FIFO with second chance, except that
it uses a circular linked list.
Least Recently Used

• Assume pages used recently will be used again soon
– throw out the page that has been unused for the longest time
• Must keep a linked list of pages
– most recently used at front, least at rear
– update this list every memory reference !!
• Keep counter in each page table entry
– choose page with lowest value counter
– periodically zero the counter
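The linked-list scheme above can be sketched with a plain list, front = most recently used (a simulation sketch; a real implementation would need the per-reference hardware update noted above):

```python
def lru_faults(reference_string, num_frames):
    """Count page faults under LRU, keeping a recency-ordered list."""
    frames = []                      # front = most recently used, rear = least
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.remove(page)      # refresh recency on every reference
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.pop()         # evict least recently used (rear)
        frames.insert(0, page)       # page is now most recently used
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # -> 10 faults
```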
LRU (cont.)
• Alternatively, use an n by n matrix, where n is the number of page
frames.
– All values in the matrix are initially set to 0
– When a page frame, k, is referenced, the hardware
first sets all the bits in row k to 1.
– Then sets all the bits in column k to 0.
– The row whose binary value is lowest is the LRU at
any instant.
Example of LRU using Matrix

Pages referenced in order 0,1,2,3,2,1,0,3,2,3
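A sketch of the matrix method on this reference sequence. It assumes (as the example does) that page number equals frame number, and it performs in software the row/column updates the hardware would do on every reference:

```python
def matrix_lru(n):
    """n x n bit matrix; the row with the smallest binary value is the LRU frame."""
    matrix = [[0] * n for _ in range(n)]

    def reference(k):
        matrix[k] = [1] * n          # set all bits in row k ...
        for row in matrix:
            row[k] = 0               # ... then clear all bits in column k

    def lru_frame():
        values = [int("".join(map(str, row)), 2) for row in matrix]
        return values.index(min(values))

    return reference, lru_frame

reference, lru_frame = matrix_lru(4)
for page in [0, 1, 2, 3, 2, 1, 0, 3, 2, 3]:
    reference(page)
print(lru_frame())  # frame 1 has gone unused longest
```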


Simulated LRU – Not Frequently Used
• Most machines do not have the hardware to perform
true LRU, but it may be simulated.
• We can keep a counter for each page, adding the
page's R bit to its counter on every timer tick.
• The page with the lowest counter is evicted.
– What is the problem with this type of history keeping?
Modified NFU
• Counters are each shifted right 1 bit before
the R bit is added.
• R bit is added to the leftmost, rather than the
rightmost bit.
• This modified algorithm is known as aging.
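A sketch of one aging tick, assuming 8-bit counters and two hypothetical pages A and B:

```python
COUNTER_BITS = 8  # width of each aging counter (illustrative)

def age(counters, r_bits):
    """One timer tick of aging: shift right, then add the R bit at the left."""
    for page in counters:
        counters[page] >>= 1                       # shift right 1 bit first
        if r_bits[page]:
            counters[page] |= 1 << (COUNTER_BITS - 1)  # R bit enters leftmost
        r_bits[page] = 0                           # R bits are cleared each tick

counters = {"A": 0, "B": 0}
age(counters, {"A": 1, "B": 0})   # A referenced this tick
age(counters, {"A": 0, "B": 1})   # B referenced this tick
print(min(counters, key=counters.get))  # A: smaller counter, evicted first
```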
Belady’s Anomaly
• With some algorithms, increasing the number of frames
can actually increase the page-fault rate.
Belady's anomaly
• normally, increasing the number of frames allocated
to a process will reduce the number of page faults;
however, this is not always the case

• with the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5,
FIFO actually incurs more page faults with 4 frames
than with 3

• this rare but highly undesirable situation is known as
Belady's anomaly
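The anomaly can be checked numerically with a small FIFO fault counter (a sketch; a plain list stands in for the queue):

```python
def fifo_faults(refs, num_frames):
    """Count FIFO page faults for a reference string (sketch)."""
    frames, faults = [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)        # evict the oldest page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))  # -> 9 10: more frames, more faults
```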
Optimal algorithm
• Belady's anomaly led to the search for an optimal page replacement algorithm
– the optimal algorithm has the lowest possible page fault rate
– the optimal algorithm never suffers from Belady's anomaly

• OPT: select as victim the page that will not be used for the longest time
– the same reference string requires only 9 page faults (compared with 15 for FIFO)

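OPT can be sketched with a small lookahead simulation (an illustration only; a real system cannot know future references). On the Belady reference string above it incurs 7 faults with 3 frames, fewer than FIFO's 9:

```python
def opt_faults(refs, num_frames):
    """Count page faults under OPT: evict the resident page whose
    next use lies farthest in the future (sketch)."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue

        def next_use(p):
            # Distance to p's next reference; never used again -> infinity-like.
            future = refs[i + 1:]
            return future.index(p) if p in future else len(refs)

        frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_faults(refs, 3))  # -> 7
```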
Virtual Memory
Virtual memory
• memory management allows us a logical view of memory that is
consistent with the program creation process

• previously, we discussed techniques for managing multiple processes
in physical memory (contiguous allocation, paging, segmentation)
– assumed each process would be provided sufficient physical memory
– brief mention of swapping processes in and out, but considered too costly

• note: at any given time, only 1 instruction is being executed
– with paging, only 1 instruction page is required to be in physical memory
– may also require 1 or more data pages

• not requiring an entire process to be in physical memory → virtual memory
Virtual space > physical space
• logical address space can be much larger than physical address space
– an application is allocated a large virtual address space
– multiple jobs can share the same physical space, each with its own larger
virtual space
– requires (hardware) translation of virtual addresses to physical addresses
– the virtual address space can be contiguous or segmented

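The hardware translation mentioned above can be sketched as a page-table lookup. The 4 KB page size and the page-table contents below are hypothetical, chosen only for illustration:

```python
PAGE_SIZE = 4096  # 4 KB pages -> low 12 bits are the offset (illustrative)

# Hypothetical page table: virtual page number -> physical frame number
page_table = {0: 5, 1: 2, 2: 7}

def translate(virtual_addr):
    """Translate a virtual address to a physical address via the page table."""
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError("page fault: VPN %d not in memory" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4096 + 100))  # VPN 1 -> frame 2, so 2*4096 + 100 = 8292
```

A reference to an unmapped page (here, any VPN other than 0, 1, or 2) raises a "page fault", which in a real system would trigger the replacement algorithms above.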
Caches
• Cache: A smaller, faster storage device that acts as a staging
area for a subset of the data in a larger, slower device.
• Fundamental idea of a memory hierarchy:
– For each k, the faster, smaller device at level k serves as a cache
for the larger, slower device at level k+1.
• Why do memory hierarchies work?
– Programs tend to access the data at level k more often than they
access the data at level k+1.
– Thus, the storage at level k+1 can be slower, and thus larger and
cheaper per bit.
– Net effect: A large pool of memory that costs as much as the
cheap storage near the bottom, but that serves data to programs
at the rate of the fast storage near the top.
Write Policy:
Write-Through vs Write-Back
• Write-through: all writes update cache and underlying
memory/cache
– Can always discard cached data - most up-to-date data is in memory
– Cache control bit: only a valid bit
• Write-back: all writes simply update cache
– Can’t just discard cached data - may have to write it back to memory
– Cache control bits: both valid and dirty bits
• Other Advantages:
– Write-through:
• memory (or other processors) always have latest data
• Simpler management of cache
– Write-back:
• uses much less memory bandwidth, since data is often overwritten
multiple times before being written back
• Better tolerance to long-latency memory?
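The bandwidth advantage can be illustrated with a one-block write-back sketch (the class, the block address, and the write count are all hypothetical):

```python
class WriteBackCache:
    """Sketch: a one-block cache with a dirty bit; counts main-memory writes."""

    def __init__(self):
        self.block = None
        self.dirty = False
        self.memory_writes = 0

    def write(self, addr):
        if self.block != addr:
            self.evict()             # replacing a block may force a write-back
            self.block = addr
        self.dirty = True            # write updates only the cache

    def evict(self):
        if self.dirty:
            self.memory_writes += 1  # dirty block is written back exactly once
        self.block, self.dirty = None, False

wb = WriteBackCache()
for _ in range(100):
    wb.write(0x10)                   # 100 CPU writes to the same block
wb.evict()
print(wb.memory_writes)  # -> 1 memory write; write-through would need 100
```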
Read Only Memory (ROM)

• ROM holds programs and data permanently, even when the computer is switched off
• Data can be read by the CPU in any order, so ROM is also direct access
• The contents of ROM are fixed at the time of manufacture
• Stores a program called the bootstrap loader that helps start up the computer
• Access time of between 10 and 50 nanoseconds
