Seminar Report
Introduction
Memory technology has steadily progressed in capacity and performance to meet the increasing
requirements of other PC hardware subsystems and software. In the past, there have been
relatively clear industry transitions from one memory technology to its successor.
However, today there are multiple choices— PC133, RDRAM, and Double Data Rate
(DDR)— and more choices may exist in the future as providers of DRAM system
memory accommodate a growing variety of platforms and form factors. These range
from small handheld devices to high-end servers, each with different power, space, speed,
and capacity requirements for system memory. This seminar focuses on PC system
memory issues and trends. It begins by reviewing the role of memory in the system and
the key memory parameters that affect system performance. This report then presents the
basics of memory technology and today's alternatives. Finally, it discusses key upcoming memory interface transitions and future RAM technologies.
Role of Memory in the System
The primary role of memory is to store code and data for the processor. Although
caching and other processor architecture features have reduced its dependency on
memory performance, the processor still requires most of the memory bandwidth. Figure
1 shows the major consumers of memory bandwidth: the processor, graphics subsystem,
PCI devices (such as high-speed communications devices), and hard drives. Other lower
bandwidth interfaces such as the USB and parallel ports must also be accommodated. The
memory hub provides an interface to system memory for all of the high bandwidth
devices. The I/O hub schedules requests from other devices into the memory hub.
Memory plays a key role in the efficient operation of I/O devices such as graphics
adapters and disk drives. In a typical system, most data transfers move through system
memory. For example, when transferring a file from the network to a local disk, the PCI
host adapter transfers data from the network to memory. This is commonly referred to as
direct memory access (DMA), as opposed to programmed I/O (PIO), in which the
processor is directly involved in all data transfers. The processor, after performing any
required formatting operations, initiates a transfer from memory to local disk storage.
Once initiated, the data is transferred directly from memory to disk without any further
processor involvement. In summary, the system memory functions as the primary storage
component for processor code and data, and as a centralized transfer point for most data transfers.
Performance Factors
Memory parameters that impact system performance are capacity, bandwidth, and
latency.
Capacity
How does capacity impact system performance? The first step in answering this
question is to describe the memory hierarchy. Table 1 shows the capacities and speeds of the storage mechanisms in service today.
These storage mechanisms range from the very fast, but low-capacity, Level 1 (L1)
cache memory to the much slower, but higher-capacity, disk drive. Ideally, a computer
would use the fastest available storage mechanisms—in this case L1 cache—for all data.
However, the laws of physics (which dictate that higher capacity storage mechanisms are
slower) and cost considerations prevent this. Instead, PCs use a mechanism called
“virtual memory,” which makes use of the L1 and L2 cache, main system memory, and
the hard drive. The virtual memory mechanism allows a programmer to use more
memory than is physically available in the system, and to keep the most frequently and
recently used data in the fastest storage. When more memory is needed than is available
in system memory, some data or code must be stored on disk. When the processor
accesses data not available in memory, information that has not been accessed recently is
saved to the hard drive. The system then uses the vacated memory space to complete the
processor's request. However, disk access is comparatively slow and system performance
is significantly impacted if the processor must frequently wait for disk access. Adding
system memory reduces this probability. The amount of capacity required to reduce disk
activity to an acceptable level depends on the operating system and the type and number of applications in use. Memory component capacities have been increasing at a substantial compound annual growth rate, and the outlook is for similar increases over the next few years. These increases have exceeded the requirements of mainstream desktop PCs and, as a result, the number of memory slots is being reduced from three to two in many of today's client platforms. However, servers and high-end workstations continue to take advantage of the growing capacities.
Bandwidth
Memory bandwidth is a measure of the rate at which data can be transferred to and
from memory, typically expressed in megabytes per second (MB/sec). Peak bandwidth is
the theoretical maximum transfer rate between any device and memory. In practice, peak
bandwidth is reduced by interference from other devices and by the “lead-off” time
required for a device to receive the first bit of data after initiating a memory request.
There should be adequate memory bandwidth to support the actual data rates of the devices in the system and the transfers between devices. In many systems, memory and I/O hubs are designed to accommodate these concurrent demands.
Table 2 shows the data rates of various system components over the last 4 years.
Although the need for memory bandwidth is not directly proportional to these data rates,
the upward trend is obvious. Memory systems have done a fairly good job of keeping up
with system requirements over this period of time, moving from 533 MB/sec to 2133 MB/sec and 3200 MB/sec.
Latency
Latency is a measure of the delay from the data request until the data is returned. It is
a function of peak bandwidth, lead-off time, and interference between devices. In general,
processors are more sensitive to latency than bandwidth because they work with smaller
blocks of data and can waste a significant number of clocks waiting for critical data. In
contrast, I/O data transfers are relatively long, and bandwidth is a more important
consideration than latency. Data transfers moving to and from system memory must pass
through the memory hub and, in many cases, the I/O hub. These components are
collectively referred to as the chip set or core logic, and are major contributors to the
latency from a device to memory. They can restrict or exploit system memory capabilities by the way they schedule data transfers for optimum memory performance. Over the last 4 years, memory
bandwidth has kept up with system needs, but latency improvements have lagged.
Current Rambus and DDR technologies double the memory bandwidth over 100-MHz SDRAM, but do not reduce latency. Ideally, latency should be reduced in proportion to the bandwidth gains.
CAS Latency
The RAM module is an organized collection of integrated circuits (ICs). Each module is
controlled by the memory controller, which handles the signals going from the CPU to
the RAM. There are two key signals: the Row Access Strobe (RAS) and the Column Access Strobe (CAS). Each memory chip is divided into rows and columns, making it look like a cell matrix. Each cell has a row address and a column address. When the CPU sends a request to the memory controller, the controller first accesses the row by putting an address on the memory's address pins and activating the RAS signal. It then waits a few clock cycles (the RAS-to-CAS delay), puts the column address on the address pins, and activates the CAS signal. After a wait of another few clock cycles, the CAS delay, the data appears on the pins of the RAM.
The CAS delay is called CAS Latency. Lower CAS latency provides faster data
access. CAS-2 is where the CPU waits for 2 clock cycles; CAS-3 is where it waits for 3
clock cycles, and so on. However, CAS-2 affords a noticeable performance gain over CAS-3 only in demanding use such as gaming or when overclocking the CPU. The CAS latency is typically 2.5 clock cycles for DDR SDRAM.
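To put these delays in perspective, the short sketch below (in Python, with illustrative timing values that are assumptions rather than figures from the text) converts a RAS-to-CAS delay and a CAS latency, both counted in clock cycles, into nanoseconds at a given memory clock rate:

# Sketch: estimated DRAM access delay from RAS-to-CAS delay plus CAS latency.
# The clock rate and cycle counts below are assumed example values.
def access_delay_ns(clock_mhz, trcd_cycles, cas_cycles):
    cycle_ns = 1000.0 / clock_mhz            # length of one clock cycle in ns
    return (trcd_cycles + cas_cycles) * cycle_ns

print(access_delay_ns(133, 3, 2))            # about 37.6 ns before data appears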
Processors also employ features such as out-of-order execution and prefetch instructions to help applications that are sensitive to memory latency.
Memory Technology Basics
While the complex world of main system memory technology can be regarded from many angles, it is beneficial to begin with a look at two large-scale perspectives: ROM and RAM. The main difference between the two is that RAM is volatile memory, that is, any data existing in the memory is erased when the system shuts down. By comparison, ROM memory is referred to as nonvolatile, meaning its data content remains intact, irrespective of shutdown and boot activity. Note that RAM and ROM are commonly used terms, and are used here, although their literal meanings do not apply strictly to every modern device.
ROM
As the name implies, ROM memory can only be read in operation, preventing the re-
writing of contents as part of its normal function. Basic ROM stores critical information
in computers and other digital devices, information whose integrity is vital to system
operation and is unlikely to change. Other commonly used forms of ROM are:
PROM (Programmable ROM)
Programmable ROM is a blank memory chip open for one-time-only recording of data.
EEPROM (Electrically Erasable Programmable ROM)
This type of ROM is rewritable by way of software. It is used in flash BIOS, in which the software allows users to upgrade the stored BIOS information (flashing).
RAM
The Random component of RAM's name refers to this type of memory's ability to access any location directly, without stepping through neighboring bytes. The main difference between a ROM and a RAM is that a RAM can be written to during normal operation, whereas a ROM is programmed outside the computer and is normally only read. RAM plays a major role in system operations and specifically performance. Essentially, the more complex a program is, the more its execution will benefit from the presence of both ample and efficient RAM. RAM takes two forms, SRAM (Static RAM) and DRAM (Dynamic RAM). A detailed look at each follows.
SRAM (Static RAM)
Static RAM memory devices retain data for as long as DC power is applied. Because
no special action (except power) is required to retain stored data, these devices are called
static memory. They are also called volatile memory because they will not retain data
without power. The SRAM stores temporary data and is used when the size of the
read/write memory is relatively small. Static RAM provides significant advantages for
performance, in that it holds data without needing to be refreshed constantly, which allows very fast access. The drawback is SRAM's high cost to produce, limiting most of its practical applications to memory caching functions. There are three types of SRAM: Async SRAM, Sync SRAM, and Pipeline Burst SRAM.
Async SRAM
The designation Async is short for Asynchronous, meaning here that the SRAM
functions with no dependence on the system clock. Async SRAM is an older type of SRAM.
Sync SRAM
Sync SRAM is synchronized with the system clock, which speeds up its operations. The marked speed advantage of this feature also dictates a higher cost.
Pipeline Burst SRAM
Most common, this type of SRAM directs larger packets of data (requests) to the memory (pipelining) all at the same time, allowing a much quicker reaction on the part of the RAM, thus increasing access speed. This type of SRAM accommodates higher bus speeds.
DRAM (Dynamic RAM)
DRAM is essentially the same as SRAM, except that it retains data for only 2 or 4 ms, after which the contents must be completely rewritten (refreshed) because the capacitors, which store a logic 1 or logic 0, lose their charge. In order to refresh a DRAM, the contents of a section of the memory must periodically be read or written. Any read or write automatically refreshes an entire section of the DRAM. The number of bits refreshed depends on the size of the memory.
Refresh cycles are accomplished by doing a read, a write, or a special refresh cycle
that doesn’t read or write data. The refresh cycle is totally internal to the DRAM and is
accomplished while other memory components in the system operate. This type of
refresh is called either hidden refresh, transparent refresh, or sometimes cycle stealing.
For example, it takes the 8086/8088, running at a 5 MHz clock rate, 800 ns to do a
read or write. Because the DRAM must have a refresh cycle every 15.6 µs, this means
that for every 19 memory reads or writes, the memory system must run a refresh cycle or
memory data will be lost. This represents a loss of 5% of the computer’s time, a small
price to pay for the savings represented by using dynamic RAM. While raw speed renders it a secondary choice to SRAM, its superior cost advantages make it the standard choice for main system memory.
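The 5% figure follows directly from the numbers quoted above, as the brief sketch below shows:

# Sketch: refresh overhead using the 8086/8088 figures quoted above.
access_ns = 800                 # one read or write takes 800 ns
refresh_interval_ns = 15_600    # a refresh cycle is required every 15.6 microseconds

accesses_per_refresh = refresh_interval_ns // access_ns    # about 19 accesses
overhead = access_ns / refresh_interval_ns                 # time lost to the stolen cycle
print(accesses_per_refresh, f"{overhead:.1%}")             # 19 and roughly 5.1%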
In the last few years, a tremendous upsurge has occurred in the evolution of DRAM
technology. Originally, nearly all PC system memory was confined to FPM (Fast Page Mode) DRAM. Rapid growth in both CPU and motherboard bus speeds, however, has demanded faster and faster memory capabilities. Numerous options are now, and have been,
available, some popular, some less so. They fall into two main categories, Asynchronous
and Synchronous, with the latter taking the lead in recent years.
Asynchronous DRAM Models
FPM (Fast Page Mode)
An improvement on its Page Mode Access predecessor, FPM quickly became the most widely utilized access mode for DRAM, almost universally installed for a time, and still widely supported. Its bypassing of power-hungry sense and restore current is a major benefit. With speeds originally limited to around 120 ns, and later improvements to as low as 60 ns, FPM was still unable to keep up with the soon-to-be ubiquitous 66 MHz system bus. Presently, FPM is rarely used, having been usurped by several superior technologies. Ironically, its rarity of deployment has resulted in it often being more costly than newer, faster memory types.
EDO (Extended Data Output)
EDO, or Hyper Page Mode, as it is sometimes called, was the last significant asynchronous DRAM interface technology. Its advantage over FPM lies in its ability, by not turning off the output buffers, to allow a new access operation to begin before the previous one has completed. EDO is manufactured much like FPM DRAM, with no increase in silicon usage, and hence, package size. Improvements in the 30-40% range are seen in the implementation of EDO DRAM. Additionally, it offers higher bus speed capability. Given proper chip support, even 100 MHz bus speeds are accessible, although with much lower results than the newer synchronous forms. Users whose bus speed requirements are no higher than 83 MHz should see no clear advantage in upgrading from EDO. The steady advance of CPU, bus, and chipset capabilities has, for the most part, left EDO in the past alongside FPM DRAM.
BEDO (Burst EDO)
BEDO DRAM utilized a burst mode combined with a dual-bank architecture to beat EDO capabilities. Simply put, the BEDO advantage lay in its ability to prepare 3 subsequent addresses internally following an initial address input. This technique allowed it to overcome time delays resulting from the input of each new address. The timing of its development was, however, poor, in that by the time it became a possible alternative, most large manufacturers had devoted most of their development energies towards SDRAM and related advancement. Consequently, industry wisdom, led by Intel, negated the viability of BEDO, and it never achieved widespread adoption.
SDRAM (Synchronous DRAM) Models
As parallel advances made it clear that the future belonged to ever-higher bus speeds, a move to synchronous operation was the result, allowing the memory to run in step with the system clock. Presently accepted as the de facto industry standard for system memory, DDR SDRAM is described in more detail later in this report. Implementations of DIMM memory, for both SDRAM and DDR SDRAM, are available in two major types, Unbuffered and Registered.
About Speed Ratings
It is worthwhile to be aware of the conventions used for rating the relative speed of
SDRAM modules. While ratings for asynchronous DRAM are listed in nanoseconds, an
additional MHz rating is applied to SDRAM, since its synchronous nature requires that it
be compatible with the system bus speed. Unfortunately, exactly matching SDRAM ratings with bus speeds, that is, to say, for example, that a 100 MHz/10 ns-rated SDRAM will operate properly with a 100 MHz system, is not practically correct. This
stems from the fact that the SDRAM MHz rating refers to optimum conditions in the
system, a scenario that rarely, if ever, occurs in real-world operations. A more workable approach is to "over-match" the SDRAM rating to compensate, choosing a module rated somewhat faster than the nominal bus speed.
To make the entire process easier, Intel implemented a Speed Rating system for
SDRAM qualification, its now universally exercised “PC” rating. The qualification,
intended as an aid to both manufacturers and users, takes into account the above-noted discrepancies. As an across-the-board rating system, this scale comes very close to assuring the compatibility of
SDRAM with its host system speed. According to the Intel system, a PC100-compliant
SDRAM module will perform well with a 100 MHz system, and likewise for its PC66 and PC133 ratings. The ratings also guarantee compatibility with Intel chipset protocols.
Current Memory Technologies
Over time, PC system memory has evolved to increase capacity and bandwidth, and
reduce latency. There has been progress at the component and interface level. While the
basic component core architecture has not changed significantly, the interfaces to the
component cores have evolved and continue to evolve. Speed and capacity advances in the core have been accompanied by interface changes, from conventional DRAM through FPM (fast page-mode) and extended data out (EDO) DRAM. Today, three main memory technologies are in use:
• SDRAM
• DDR SDRAM
• Rambus
SDRAM
SDRAM is a 64-bit-wide interface (or 72 bits in implementations that include error correction
capability). The interface clock rate began at 66 MHz, evolved to 100 MHz, and is now capable of
operating at 133 MHz in system memory implementations. (100- and 133-MHz implementations
are often referred to as PC100 and PC133.) The bandwidth in MB/sec is equal to the clock
multiplied by 8. (A 64-bit wide interface transfers 8 bytes per clock.) This yields 1066 MB/sec
bandwidth for current 133-MHz SDRAM. Both latency and bandwidth have improved in proportion
to the clock speed increases. A single system memory semiconductor component may provide 4,
8, or 16 bits on each data transfer. Regardless of the bits per transfer, each component contains
the same amount of storage (in bits) for a given DRAM component generation. Multiple
components are mounted on a memory module (or DIMM). The system's memory hub interfaces
with one or more of these memory modules. An SDRAM module combines the components to
provide 64 bits on each data transfer. Desktop and portable systems use 8- and 16-bit
component interfaces to provide 64-bit interfaces in lower capacities. For example, a typical
desktop or portable SDRAM module might consist of four 16-bit components, each with a 128-
megabit (Mb) capacity, for a total of 64 MB of memory. In contrast, 4- and 8-bit components are
used for the higher capacities needed in servers. An SDRAM module used in a server might
contain 16 4-bit components, each with 128-Mb capacity, for a total capacity of 256 MB of
memory. Component capacities have increased over the life of SDRAM in step with successive DRAM component generations. Current SDRAM performance is adequate for the application mix in the target markets, but it is expected to fall short as the demands of processors, operating systems, and applications increase. Dual SDRAM interfaces are used in servers to provide additional bandwidth and capacity. SDRAM frequencies are not expected to increase beyond 133 MHz. Instead, DDR will be used to extend performance beyond the 133-MHz point. (SDRAM frequencies beyond 133 MHz are not expected to become a mainstream PC standard.)
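These bandwidth and capacity relationships reduce to simple arithmetic. The sketch below restates them; the function names are illustrative, and the component counts are the examples given above:

# Sketch: SDRAM peak bandwidth and module capacity, restating the figures above.
def sdram_peak_bandwidth_mb_s(clock_mhz, bus_bits=64):
    return clock_mhz * (bus_bits // 8)           # 8 bytes transferred per clock

def module_capacity_mb(components, mbits_each=128):
    return components * mbits_each // 8          # convert total megabits to megabytes

print(sdram_peak_bandwidth_mb_s(100))    # 800 MB/sec for PC100
print(sdram_peak_bandwidth_mb_s(133))    # 1064 here; about 1066 MB/sec at the nominal 133.33 MHz
print(module_capacity_mb(4))             # four 16-bit, 128-Mb parts: 64 MB
print(module_capacity_mb(16))            # sixteen 4-bit, 128-Mb parts: 256 MB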
DDR SDRAM
DDR SDRAM uses the same basic interface and clock frequencies as SDRAM, but adds strobes that operate at the same frequency as the clock. These strobes
travel with the data from the memory hub to the memory components on writes and to the
memory hub from the memory components on reads. Data is transferred on both the
rising and falling edges of the strobe. This architectural change, coupled with lower
signal levels and advances in semiconductor processes, doubles the data rates and the
bandwidth. However, DDR does not reduce latency. In general, the PC1600
implementations have greater latency than PC133. Despite this, PC1600 provides a net performance gain because of its greater bandwidth.
Like SDRAM, PC system memory implementations of DDR have 64-bit data paths (or
72 bits for error correction capability). Bandwidths for DDR are 1600 MB/sec for
PC1600 and 2133 MB/sec for PC2100. Component capacities for volume production
DDR began at 128 Mb and will be available at least through the 1-Gb generation.
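Because DDR transfers data on both clock edges, peak bandwidth works out to twice the clock rate times the 8-byte bus width, which is also where the PC1600 and PC2100 module names come from. A minimal sketch (the helper name is illustrative):

# Sketch: DDR peak bandwidth; the PC rating of a module is this figure in MB/sec.
def ddr_peak_bandwidth_mb_s(clock_mhz, bus_bytes=8):
    return clock_mhz * 2 * bus_bytes             # two transfers per clock, 8 bytes each

print(ddr_peak_bandwidth_mb_s(100))    # 1600 -> DDR200 components, PC1600 modules
print(ddr_peak_bandwidth_mb_s(133))    # 2128 here, about 2133 nominally -> DDR266, PC2100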
One of the goals of DDR was a smooth technology transition. Many memory hubs are
designed to operate with either SDRAM or DDR SDRAM modules. However, space and cost considerations, driven in part by the different module connectors, have prevented any practical system-level application of this capability. DDR has minor incremental system- and component-level cost penalties over standard SDRAM, but these are expected to diminish as production volumes grow.
Memory type             Component rating    Module rating
DDR SDRAM (100 MHz)     DDR200              PC1600
DDR SDRAM (133 MHz)     DDR266              PC2100
Table 2: DDR naming conventions
Some topological differences between DDR SDRAM and SDRAM DIMM modules are summarized in Table 3.
             SDRAM    DDR
Profile
Pin count    168      184
Notches      2        1
Table 3: Topological differences between DDR SDRAM and SDRAM DIMM modules
DDR modules, like their SDRAM predecessors, arrive as DIMM modules. Although motherboards designed for DDR are similar to those that use SDRAM, the two memory types are not interchangeable. You cannot use DDR modules in earlier SDRAM-based motherboards, nor can you use SDRAM modules in DDR-based motherboards.
Rambus
SDRAM and DDR SDRAM share many architectural and signaling features. Both
use a parallel data bus, mainly available in component widths of x8 or x16, both have a
single addressing command bus that must be shared to transmit row and column addresses. DDR SDRAM raises the peak data rate by transmitting data on both edges of the synchronous clock signal, thus in theory
“doubling” the data rate of the memory. It is important to note that DDR SDRAM does
not double the address command bandwidth of the system by using both edges of the
clock on the command bus, a factor that ultimately limits the real world performance gain
from using DDR signaling on the data bus. RDRAM memory, which pioneered double
data rate technology for DRAM in 1990, takes a different approach. It combines a
conventional DRAM core with a narrow, high-speed interface called the RDRAM Channel, which runs at much higher clock rates than the SDRAM and DDR interfaces. RDRAM memory uses separate buses to transmit row and column address
information on both edges of the clock. This enables higher efficiency by allowing the
issue of a column packet while a row packet is simultaneously being issued to another
device, a feature not present in SDRAM or DDR. Both commands and data are
communicated on packet buses from the memory hub to the modules. There are two
command buses—RAS bus and CAS bus—and one data bus. Eight transfers on both
edges of a 400-MHz clock are used for all packets. The separate command and data
buses, along with identical packet lengths, allow memory hubs to improve the scheduling
of memory transfers to achieve higher memory data bus utilization. The result is higher
actual bandwidth. One of the benefits of this architecture is that it allows the banks of the individual memory devices to be managed so as to avoid conflict and thus minimize bank conflict effects. RDRAM memory can be added in small increments, providing a granularity of capacity that better matches the needs of a particular system. However, each memory component must be capable of providing the full bandwidth needed by the system. This adds to the cost of each component.
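The text gives the clock rate (400 MHz, with transfers on both edges) but not the width of the data bus; assuming the 16-bit (2-byte) channel width typical of RDRAM implementations, the peak bandwidth per channel can be sketched as follows:

# Sketch: RDRAM Channel peak bandwidth. The 2-byte (16-bit) channel width is an
# assumption about typical RDRAM implementations, not a figure from the text above.
def rdram_peak_bandwidth_mb_s(clock_mhz=400, bytes_per_transfer=2):
    return clock_mhz * 2 * bytes_per_transfer    # transfers occur on both clock edges

print(rdram_peak_bandwidth_mb_s())               # 1600 MB/sec per channel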
Unlike SDRAM, motherboards that support Rambus memory require that all memory
slots in the motherboard be populated, though not necessarily with memory modules. On an SDRAM-based motherboard, you could have 4 memory slots and fill only one with a memory module, and the system would operate.
On Rambus based motherboards, all blank or unfilled memory sockets must be populated
with either a memory module (in this case a RIMM, or Rambus Inline Memory Module) or a Continuity Module (CRIMM) to complete the memory path to the bus. Rambus is a proprietary design, and manufacturers that want to produce it are required to pay a royalty to Rambus Inc. DDR designs, on the
other hand, are open architecture. Since Rambus is an entirely new design, there are other
cost factors to be added in, such as an entirely new module manufacturing and testing
process.
The results of the bandwidth study for typical memories of different types, operating
frequencies and core latency (tRAC) timings can be referenced in Table 4 below.
Upcoming Memory Interface Transitions
There are key upcoming transitions in the DDR and Rambus memory technologies, as well as a possible new interface, ADT.
ADT
ADT is expected to arrive after DDR. Details of this interface have not been publicly disclosed; however, all major
DRAM component manufacturers are involved in defining both the DDR and ADT
interfaces. It is expected that ADT will initially target desktop applications. There is a
strong possibility that only one of these standards will be adopted by the PC industry.
Rambus
Rambus has plans for a new high-speed memory interface called Yellowstone. This
next-generation interface is intended to provide 3.2 Gb/sec per differential signal pair. A new low-voltage differential signal is planned to support this data rate. Yellowstone is expected to appear within the next few years.
Multiple Memory Interfaces
Although memory technology has kept pace with increasing capacity requirements,
system memory bandwidth is not keeping pace with the demands of the rest of the
system. The ratio of bandwidth to capacity for a given memory module implementation is
falling over time. A dual interface helps to alleviate the resulting performance issues.
Multiple memory interfaces in high-end systems are not new, but their use in mainstream platforms is a newer development, enabled by advances in semiconductor technology and the techniques used to maintain signal integrity and timing
across the interface. There are challenges on the component side to providing capacity
and speed increases at the pace of Moore's law; it is difficult to make components bigger
and faster at the same time. Containing cost, reducing size, controlling emissions, and
maintaining the reliability of the system memory interface become more difficult as
performance increases. Future interface transitions must address all of these issues.
Future RAM Technologies
MRAM
Manufacturers such as IBM, Motorola and Infineon are specifically involved in the development of MRAM. This type of RAM uses the principle of alignment of magnetic particles as the basis of data storage. An inherent advantage of MRAM is that, since it is based on magnetic rather than electric storage, data is retained even when the power is switched off. Additionally, these elements do not need an electric charge to retain information, which reduces power consumption. Finally, with the absence of the delays associated with transferring electric charge between storage elements, MRAM can be considerably faster than DRAM. MRAM is thus poised to give rise to a new breed of instantly-on computing devices.
The technology is already up and running in some high-tech labs around the world.
But only after the fabrication process for this memory becomes commercially available will it make its way into mainstream home PCs and handhelds.
IRAM
IRAM (Intelligent RAM) integrates processing logic and DRAM on the same chip; for the solution to be complete, memory latency needs to be reduced, too, and placing memory alongside the processor addresses this. Currently, IRAM products are not commercially available; however, several university and company research projects are under way.
NRAM
NRAM, under development by Nantero of Woburn, Massachusetts, may be out at any time now. NRAM is non-volatile, like MRAM, so your computer can boot up within a second. The chips would run at 2 GHz, and last virtually forever. The concept behind NRAM is simple: combine carbon nanotubes (molecules of carbon atoms connected like a tube and sealed at both ends) with traditional semiconductor techniques. Ones and zeroes revisited: a nanotube is bent using electrostatic forces so that the top touches the bottom. Bent nanotubes can be unbent too. Bent means a 1, and unbent means a 0. And billions of nanotubes can be packed onto a single chip.
Conclusion
In today's world of ever-growing demand for faster and more powerful computers, the need for faster and better memory technology is never ending. Consider, for example, that Intel's newest Pentium 4 processors have front-side bus architectures operating at data speeds of 533 MHz and 800 MHz. This translates to peak data bandwidths of 4.2 GB/s (533 MHz x 8 bytes) and 6.4 GB/s (800 MHz x 8 bytes), which system memory must strive to match.
Considering how far the computer industry has come, from FPM to RDRAM, it can now be said that a day will come when every user will be able to start his or her PC like a light bulb, without any delay (no time required for booting), giving rise to a new era of instantly-on computing.