Memory System
Chapter Outline
• Basic memory circuits & memory organization
• Memory technology
• Memory hierarchy
• Virtual memory
Basic Concepts
• Access provided by processor-memory interface
• Address and data lines, and also control lines for
command (Read/Write), timing, data size
• Memory access time is time from initiation to
completion of a word or byte transfer
• Memory cycle time is minimum time delay
between initiation of successive transfers
• Random-access memory (RAM) means that
access time is the same, independent of location
Semiconductor RAM Memories
• Memory chips have a common organization
• Cells holding single bits arranged in an array
• Words are rows; cells connected to word lines
(cells per row ≥ bits per processor word)
• Cells in columns connect to bit lines
• Sense/Write circuits are interfaces between
internal bit lines and data I/O pins of chip
• Typical control pin connections include
Read/Write command and chip select (CS)
Internal Organization and Operation
• Example of 16-word 8-bit memory chip has
decoder to select word line from 4-bit address
• Two complementary bit lines for each data bit
• External source provides stable address bits,
and asserts chip-select input with command
• For Read operation, Sense/Write circuits
transfer data from selected row to I/O pins
• For Write operation, Sense/Write circuits
transfer data from I/O pins to selected cells
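The external behavior described above can be sketched in Python. This is a minimal model, not a real chip interface: the class name, method, and pin names are illustrative. A 4-bit address drives the decoder, chip-select gates the operation, and the Read/Write command chooses the data direction.

```python
# Illustrative sketch of a 16-word x 8-bit memory chip's external behavior.
class MemoryChip16x8:
    def __init__(self):
        self.cells = [0] * 16          # 16 words of 8 bits each

    def access(self, cs, read, address, data_in=0):
        if not cs:                     # chip not selected: no response
            return None
        row = address & 0xF            # 4-bit decoder selects one word line
        if read:                       # Read: Sense/Write circuits -> I/O pins
            return self.cells[row]
        self.cells[row] = data_in & 0xFF   # Write: I/O pins -> selected cells
        return None

chip = MemoryChip16x8()
chip.access(cs=True, read=False, address=5, data_in=0xA7)
print(chip.access(cs=True, read=True, address=5))   # 167 (0xA7)
```

Note that when `cs` is deasserted the chip ignores the request entirely, which is what allows several chips to share the same address and data lines in a larger memory.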
Static RAMs and CMOS Cell
• Static memories need power to retain state;
usually have short access times (a few nanoseconds)
• A static RAM cell in a chip consists of
two cross-connected inverters to form a latch
• Chip implementation typically uses CMOS cell
whose advantage is low power consumption
• Two transistors controlled by word line act as
switches between the cell and the bit lines
• To write, bit lines driven with desired data
Dynamic RAMs
• Static RAMs have short access times, but need
several transistors per cell, so density is lower
• Dynamic RAMs are simpler for higher density
and lower cost, but access times are longer
• Density/cost advantages outweigh slowness
• Dynamic RAMs are widely used in computers
• Cell consists of a transistor and a capacitor
• State is presence/absence of capacitor charge
• Charge leaks away and must be refreshed
Dynamic RAM Chip Operation
• Reflects general principles of chip operation
• For Read, charge from cells in selected row
is checked by sense amplifiers on bit lines
• 1 or 0 if charge is above or below threshold
• Action of sensing the bit lines also causes
refresh of charge in all cells of selected row
• For Write, access the row and drive bit lines
to alter amount of charge in subset of cells
• Refresh rows periodically to maintain charge
More on Dynamic RAM Chips
• Consider 32M × 8 chip with 16K × 16K array
• 16,384 cells per row organized as 2048 bytes
• 14 bits to select row, 11 bits for byte in row
• Use multiplexing of row/column on same pins
• Row/column address latches capture bits
• Row/column address strobe signals for timing
(asserted low with row/column address bits)
• Asynchronous DRAMs: delay-based access,
external controller refreshes rows periodically
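The row/column split above is simple bit arithmetic: a 25-bit byte address (2^25 = 32M) divides into a 14-bit row number and an 11-bit column number. A small sketch (function name is illustrative):

```python
# Row/column multiplexing for the 32M x 8 example:
# 25-bit address = 14-bit row (16K rows) + 11-bit column (2048 bytes/row).
def split_dram_address(addr):
    row = (addr >> 11) & 0x3FFF    # upper 14 bits, latched with RAS
    col = addr & 0x7FF             # lower 11 bits, latched with CAS
    return row, col

print(split_dram_address(2048))    # (1, 0): first byte of the second row
```

The same pins carry `row` first (with RAS asserted) and `col` second (with CAS asserted), halving the number of address pins the package needs.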
Fast Page Mode
• In preceding example, all 16,384 cells in a row
are accessed (and also refreshed as a result)
• But only 8 bits of data are actually transferred
for each full row/column addressing sequence
• For more efficient access to data in same row,
latches in sense amplifiers hold cell contents
• For consecutive data, just assert CAS signal
and increment column address in same row
• This fast page mode is useful in block transfers
Synchronous DRAMs
• In early 1990s, DRAM technology enhanced
by including clock signal with other chip pins
• More circuitry also added for enhancements
• These chips are called synchronous DRAMs
• Sense amplifiers still have latching capability
• Additional benefits from internal buffering
and availability of synchronizing clock signal
• Internal row counter enables built-in refresh
instead of relying on external controller
SDRAM Features
• Synchronous DRAM (SDRAM) chips include
data registers as well as address latches
• New access operation can be initiated while
data are transferred to or from these registers
• Also have more sophisticated control circuitry
• SDRAM chips require power-up configuration
• Memory controller initializes mode register
• Used to specify burst length for block transfers
and also to set delays for control of timing
Efficient Block Transfers
• Asynchronous DRAM incurs longer delay from
CAS assertion for each column address
• Synchronous DRAM reduces delay by having
CAS assertion once for initial column address
• SDRAM circuitry increments column counter
and transfers consecutive data automatically
• Burst length determines number of transfers
• Consider example with burst length of 4,
RAS delay of 3 cycles, CAS delay of 2 cycles
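The timing arithmetic of this example can be checked directly. Assuming the first word is transferred after the RAS and CAS delays and each remaining word of the burst follows one per clock cycle (a simplification of real SDRAM timing):

```python
# Cycles until the last word of a burst is transferred, under the
# simplified model: RAS delay + CAS delay + one cycle per extra word.
def burst_cycles(ras_delay, cas_delay, burst_length):
    first_word = ras_delay + cas_delay        # cycles until first transfer
    return first_word + (burst_length - 1)    # remaining words, one per cycle

print(burst_cycles(3, 2, 4))   # 8 cycles for the slide's example
```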
Double-Data-Rate (DDR) SDRAM
• Early SDRAMs transferred on rising clock edge
• Later enhanced to use rising and falling edges
• Doubles effective rate after RAS/CAS assertion
• Requires more complex clock/control circuitry
• Internal array access not significantly faster
• How can transfer rate be effectively doubled?
• Interleave consecutive data across two arrays
• Switch between arrays for each clock edge
Structure of Larger Memories
• Internal chip organization has been discussed
• Larger memories combine multiple chips
• How are these chips connected together?
– Consider an example based on static memory
– Memory size is 2M words, each 32 bits in size
– Implement with 512K × 8 static memory chips
– 4 chips for 32 bits, 4 groups of 4 chips for 2M
Address Decoder and Tri-state Pins
• 2M word-addressable memory needs 21 bits
• Each chip has only 19 address bits (2^19 = 512K)
• Address bits A20 and A19 select one of 4 groups
• Outputs of 2-bit decoder drive chip-select pins
• Only the selected chips respond to request
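The decoding above is a straightforward bit split of the 21-bit address. A sketch (function name is illustrative):

```python
# 2M-word memory from 512K x 8 chips: A20..A19 pick one of 4 groups
# via the 2-bit decoder, A18..A0 go to every chip's address pins.
def decode_21bit(addr):
    group = (addr >> 19) & 0x3      # drives one of 4 chip-select lines
    chip_addr = addr & 0x7FFFF      # 19-bit address within the group
    return group, chip_addr

print(decode_21bit(1 << 19))   # (1, 0): first word of the second group
```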
SUM := 0
for j := 0 to 9 do
    SUM := SUM + A(0,j)
end
AVG := SUM/10
for i := 9 downto 0 do
    A(0,i) := A(0,i)/AVG
end
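A runnable Python version of the pseudocode above, with illustrative data: sum the ten elements of row 0 of the array, compute their average, then divide each element by it (the second loop walks the row backwards, as in the pseudocode):

```python
# Normalize the elements of row 0 by their average (sample data).
A = [[2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]]

total = 0.0
for j in range(10):
    total += A[0][j]
avg = total / 10            # average of the ten elements

for i in range(9, -1, -1):  # downto loop: i = 9, 8, ..., 0
    A[0][i] = A[0][i] / avg
```

After normalization the elements sum to 10, since each was divided by their average.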
Illustrative Example (contd.)
Performance Consideration
• Hit Rate and Miss Penalty
– Number of hits as a fraction of all attempted
accesses is called the hit rate
– Miss rate is the number of misses stated as a
fraction of attempted accesses
– Total access time seen by the processor when a
miss occurs is called the miss penalty
– Average access time experienced by the processor:
• tavg = hC + (1 − h)M, where h is hit rate, C is cache access
time, and M is miss penalty
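The formula is a simple weighted average. A sketch, using the values that appear in the example on the following slides (h = 0.95, C = τ, M = 19τ, with τ taken as 1 time unit):

```python
# Average access time: hit rate weights the cache time,
# miss rate weights the miss penalty.
def t_avg(h, C, M):
    return h * C + (1 - h) * M

print(t_avg(0.95, 1.0, 19.0))   # 0.95*1 + 0.05*19 = 1.9
```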
Performance Consideration (Contd.)
• Access times to the cache and main memory are τ and 10τ,
respectively.
• When a cache miss occurs, a block of 8 words is transferred
from the main memory to the cache.
• It takes 10τ to transfer the first word of the block, and the
remaining 7 words are transferred at the rate of one word
every τ seconds.
• Miss penalty also includes a delay of τ for the initial access to
the cache, which misses, and another delay of τ to transfer
the word to the processor after the block is loaded into the
cache.
• Thus, the miss penalty in this computer is given by:
M = τ + 10τ + 7τ + τ = 19τ
Performance Consideration (Contd.)
• Assume that 30 percent of the instructions in a
typical program perform a Read or a Write operation
• This means there are 130 memory accesses (100
instruction fetches + 30 data accesses) for every
100 instructions executed.
• Assume that the hit rates in the cache are 0.95 for
instructions and 0.9 for data.
• Assume further that the miss penalty is the same for
both read and write accesses
Performance Consideration (Contd.)
• Time without cache / Time with cache =
(130 × 10τ) / (100(0.95τ + 0.05 × 19τ) + 30(0.9τ + 0.1 × 19τ)) ≈ 4.7
• Time for real cache / Time for ideal cache =
(100(0.95τ + 0.05 × 19τ) + 30(0.9τ + 0.1 × 19τ)) / (130τ) ≈ 2.1
• For two-level cache with L1 and L2,
tavg = h1C1 + (1 − h1)(h2C2 + (1 − h2)M), where
h1 is the hit rate for L1, h2 is the hit rate for L2, C1 is the time to access
data in L1, C2 is the miss penalty to transfer data from L2 to L1, M is
the miss penalty to transfer data from main memory to L2.
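The arithmetic across these performance slides can be reproduced in a few lines, taking τ as 1 time unit:

```python
# Reproducing the slides' cache-performance arithmetic (tau = 1 unit).
tau = 1.0
M = tau + 10*tau + 7*tau + tau          # miss penalty = 19 tau

# 130 accesses per 100 instructions: 100 fetches (h = 0.95) + 30 data (h = 0.9).
with_cache = 100*(0.95*tau + 0.05*M) + 30*(0.9*tau + 0.1*M)
no_cache   = 130 * 10*tau               # every access goes to main memory
ideal      = 130 * tau                  # every access hits the cache

print(round(no_cache / with_cache, 1))  # 4.7
print(round(with_cache / ideal, 1))     # 2.1

# Two-level cache formula from the last bullet above.
def t_avg_two_level(h1, C1, h2, C2, M):
    return h1*C1 + (1 - h1)*(h2*C2 + (1 - h2)*M)
```

Even with a real cache the processor is about twice as slow as with an ideal (always-hit) cache, which is the motivation for adding the second level.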
Virtual Memory
• Physical memory capacity < address space size
• A large program or many active programs may
not be entirely resident in the main memory
• Use secondary storage (e.g., magnetic disk) to
hold portions exceeding memory capacity
• Needed portions are automatically loaded
into the memory, replacing other portions
• Programmers need not be aware of actions;
virtual memory hides capacity limitations
Virtual Memory
• Programs written assuming full address space
• Processor issues virtual or logical address
• Must be translated into physical address
• Proceed with normal memory operation
when addressed contents are in the memory
• When no current physical address exists,
perform actions to place contents in memory
• System may select any physical address;
no unique assignment for a virtual address
Memory Management Unit
• Implementation of virtual memory relies on a
memory management unit (MMU)
• Maintains virtual-to-physical address mapping
to perform the necessary translation
• When no current physical address exists,
MMU invokes operating system services
• Causes transfer of desired contents from disk
to the main memory using DMA scheme
• MMU mapping information is also updated
Address Translation
• Use fixed-length unit of pages (2K-16K bytes)
• Larger size than cache blocks due to slow disks
• For translation, divide address bits into 2 fields
• Lower bits give offset of word within page
• Upper bits give virtual page number (VPN)
• Translation preserves offset bits, but causes
VPN bits to be replaced with page frame bits
• Page table (stored in the main memory)
provides information to perform translation
Page Table
• MMU must know location of page table
• Page table base register has starting address
• Adding VPN to base register contents gives
location of corresponding entry about page
• If page is in memory, table gives frame bits
• Otherwise, table may indicate disk location
• Control bits for each entry include a valid bit
and modified bit indicating needed copy-back
• Also have bits for page read/write permissions
Translation Lookaside Buffer
• MMU must perform lookup in page table
for translation of every virtual address
• For large physical memory, MMU cannot hold
entire page table with all of its information
• Translation lookaside buffer (TLB) in the MMU
holds recently-accessed entries of page table
• Associative searches are performed on the TLB
with virtual addresses to find matching entries
• If miss in TLB, access full table and update TLB
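The TLB lookup sequence above can be sketched as a small cache in front of the page table (dicts and entry values are illustrative; replacement policy and associative hardware search are abstracted away):

```python
# TLB sketch: try the small TLB first; on a miss, consult the full
# page table in main memory and cache the entry in the TLB.
page_table = {0: 7, 1: 3, 2: 9}   # illustrative VPN -> frame entries
tlb = {}                           # recently used translations only

def lookup(vpn):
    if vpn in tlb:                 # TLB hit: no memory access needed
        return tlb[vpn]
    frame = page_table[vpn]        # TLB miss: access full table in memory
    tlb[vpn] = frame               # update TLB for subsequent accesses
    return frame

print(lookup(2))   # 9 -- miss, entry loaded into the TLB
print(lookup(2))   # 9 -- hit, served from the TLB
```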
Page Faults
• A page fault occurs when a virtual address has
no corresponding physical address
• MMU raises an interrupt for operating system
to place the containing page in the memory
• Operating system selects location using LRU,
performing copy-back if needed for old page
• Delay may be long, involving disk accesses,
hence another program is selected to execute
• Suspended program restarts later when ready
Memory Management Requirements
• Virtual address space divided as system space and user space (one
space per user)
• One page table per process
• MMU uses Page Table Base Register to determine where the
corresponding page table is located
• By changing PTBR content, OS can switch from one space to
another
• Processor runs in two modes – supervisor mode and user mode
• Privileged instructions can be run in supervisor mode only – e.g.,
update the PTBR. Thus user processes cannot overwrite each
other’s data
• For data sharing, pages may be shared between processes – with
control bits set to read/write, etc.
Sections to Read
(From Hamacher’s Book)
• Chapter on Memory System
– All sections and sub-sections except 8.3.5 (Flash
Memory) and 8.10 (Secondary Storage)