13m Arch PDF
10/19/98
13 Main Memory Architecture
18-548/15-548 Memory System Architecture
Philip Koopman
October 19, 1998
Required Reading:
Assignments
Preview
Main Memory
[Figure: memory hierarchy block diagram. Labeled blocks: CPU (I-unit, E-unit, register file, special-purpose memory), on-chip L1 cache, on-chip L2 cache (?), L2/L3 cache, L3/L4 cache (?), cache bypass, TLB, special-purpose caches, main memory, interconnection network, other computers & WWW, virtual memory, disk files & databases, CD-ROM, tape, etc.]
Main Memory
DRAM OPERATION
Signals: RAS# (RAS.L), CAS# (CAS.L)
Cycle time: RAS + CAS + rewriting data back to the array
Refresh cycle: access to refresh the capacitors; needed every few milliseconds (say, 64 ms); varies with chip
[Figure: DRAM timing by year of introduction, 1980 to 1994, for the 64 Kbit, 256 Kbit, 1 Mbit, 4 Mbit, 16 Mbit, and 64 Mbit generations. Vertical axis: time in ns (0 to 200). Series: cycle time (ns), CAS (ns), speed (ns).]
Various modes:
Nibble mode: DRAM provides several bits sequentially for every RAS
Fast page mode: DRAM row can be randomly addressed with several CAS cycles
Static column: same as page mode, but asynchronous CAS access
Use fast page mode, etc., to read several words over a modest-width DRAM bank
Can provide higher bandwidth with a modest latency penalty
Often a cost-effective tradeoff, since the cache is already helping with latency on most accesses
Bandwidth takes into account 110 ns for the first cycle and 40 ns for each subsequent CAS cycle (MB = 2^20 bytes)
Bandwidth for one word = 8 bytes / 110 ns = 69.36 MB/sec
Bandwidth for two words = 16 bytes / (110 ns + 40 ns) = 101.73 MB/sec
Peak bandwidth = 8 bytes / 40 ns = 190.73 MB/sec
Maximum sustained bandwidth = (256 words * 8 bytes) / (110 ns + 256 * 40 ns) = 188.71 MB/sec
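The arithmetic above can be checked with a short sketch. It assumes, as the figures imply, that MB means 2^20 bytes; the names `WORD_BYTES`, `FIRST_NS`, `CAS_NS`, and `bandwidth_mb_per_s` are illustrative and not from the lecture. The sustained case follows the slide's formula as written, which charges 256 CAS cycles on top of the 110 ns first cycle.

```python
MB = 2 ** 20          # assumption: the slide's figures match MB = 2^20 bytes

WORD_BYTES = 8        # 64-bit data bus
FIRST_NS = 110        # first access in a row
CAS_NS = 40           # each additional page-mode CAS cycle


def bandwidth_mb_per_s(num_bytes, time_ns):
    """Bytes transferred per second, expressed in MB (2^20 bytes)."""
    return num_bytes / (time_ns * 1e-9) / MB


one_word = bandwidth_mb_per_s(WORD_BYTES, FIRST_NS)
two_words = bandwidth_mb_per_s(2 * WORD_BYTES, FIRST_NS + CAS_NS)
peak = bandwidth_mb_per_s(WORD_BYTES, CAS_NS)
# Slide's formula as written: 110 ns plus 256 CAS cycles for 256 words.
sustained = bandwidth_mb_per_s(256 * WORD_BYTES, FIRST_NS + 256 * CAS_NS)

for label, value in [("one word", one_word), ("two words", two_words),
                     ("peak", peak), ("sustained", sustained)]:
    print(f"{label}: {value:.2f} MB/sec")
```

The gap between the one-word and sustained figures is the point of the slide: longer page-mode bursts amortize the 110 ns first access over more data.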
Cache on a Shoestring
INTERLEAVED MEMORY
BUT, going through the interleaving logic adds latency to a single, isolated memory access
[Figure: block diagram of a CPU connected through interleave controllers to multiple DRAM sets.]
Need for high bandwidth in successive accesses having poor spatial locality
Superscalar access to multiple data locations
Multiprocessing accessing shared main memory
Minimum memory size can be a cost constraint on all but the biggest systems
Assume a 64-bit data bus and 64 Mbit DRAMs in a 1-bit-wide configuration
A 64-bit DRAM module will then have 64 chips, for 512 MB of DRAM
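The minimum-size arithmetic can be sketched as follows. The function name `min_module_size` and its parameters are hypothetical, and the 4-bit-wide case is added only for comparison; it is not from the lecture.

```python
MBIT = 2 ** 20  # bits per Mbit
MB = 2 ** 20    # bytes per MB


def min_module_size(bus_width_bits, chip_capacity_bits, chip_width_bits):
    """Return (chip count, total bytes) for the smallest module that fills the bus."""
    # Each chip supplies chip_width_bits of the bus, so this many chips are needed.
    chips = bus_width_bits // chip_width_bits
    total_bytes = chips * chip_capacity_bits // 8
    return chips, total_bytes


# Case from the text: 64-bit bus, 64 Mbit DRAMs, 1 bit wide.
chips_x1, bytes_x1 = min_module_size(64, 64 * MBIT, 1)
print(chips_x1, "chips,", bytes_x1 // MB, "MB")

# Illustrative comparison (assumption): the same capacity chip organized 4 bits wide.
chips_x4, bytes_x4 = min_module_size(64, 64 * MBIT, 4)
print(chips_x4, "chips,", bytes_x4 // MB, "MB")
```

Under these assumptions, wider chips reduce the chip count needed to fill the bus, and with it the smallest memory size that can be built.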
REVIEW
Review