
White Paper

Embedded DDR Interfaces


Ten Tips to Success for Your SoC
September 2009

Author
Graham Allan, Sr. Manager, Product Marketing, Synopsys Inc.

Preface
DRAM standards such as DDR3 are developed by JEDEC, the standards-setting organization for many semiconductor components. Most of the DRAM standards developed by JEDEC are written with a specific system, or a small number of systems, in mind. For example, the DDR3 SDRAM standard was primarily developed for use on unbuffered and registered DIMMs for laptop, desktop and server applications. Embedded applications for DRAM, such as video processing chips, typically consume a very small percentage of the DRAMs manufactured and are often left to make do with a less-than-perfect fit for the application. Unless a given company regularly attends the JEDEC meetings, it is very difficult to appreciate the intended system view(s) that represent the focus for each type of DRAM being standardized. Synopsys is a regular participant in JEDEC meetings, both to represent the interests of the memory controller and to keep abreast of what standards are in development and how the current standards may evolve. This white paper does not disclose any proprietary information about work in progress at JEDEC, but it does address a number of issues and concerns that come up frequently in customer interactions.

Introduction
Emerging from a host of competing technologies, DDR2 and DDR3 SDRAM ("DDR") have become the dominant off-chip memory storage solution for system-on-chip (SoC) designs. With high volumes driven by the PC market, stability of supply, and attractive pricing, DDR has defeated all of the contenders, including QDR SRAM, RLDRAM, Rambus DRAM and other memory technologies, to take the RAM crown for embedded applications. Unfortunately, many SoC designers are unfamiliar with the realities of the DRAM standards, typical DRAM applications and the DRAM market. This paper presents ten guiding principles for embedded DDR interfaces, many of which the DRAM standards and vendor data sheets do not explain.

1. Carefully Consider the Exact DRAM You Require


DRAM pricing and technology are driven by desktops, workstations and notebooks (collectively, "PCs"), not by the embedded applications served by most SoCs. PC makers buy DRAMs in huge volumes and represent approximately 90% of the revenue for DRAM vendors such as Elpida, Hynix, Micron, and Samsung. Embedded applications represent a small portion of the DRAM market and are often a difficult area for DRAM vendors to address because they present a wide variety of differing requirements.

Mainstream DRAMs designed for the PC market are the most plentiful and the cheapest. Currently, for example, the highest volume and cheapest products are the 64Mb x8 (512Mb) and 128Mb x8 (1Gb) DDR2 SDRAMs, since these are the most commonly used DRAMs in today's PCs. Some design teams might conclude that they should simply use the cheapest available product for their SoC applications, but it is often not that simple: DRAMs targeting embedded applications typically have different requirements from DRAMs designed for PCs.

To accommodate the memory sub-system requirement for expandability, PCs use 64-bit wide memory modules (72-bit wide if ECC is used), which are effectively small PCBs with a self-contained memory subsystem. The most common module used in PCs today is the UDIMM, or Unbuffered Dual In-Line Memory Module. The "Dual" part is a legacy term, as most UDIMMs today have only one side of the module populated with DRAMs. There are many other types of DIMMs, such as RDIMMs (Registered DIMMs, popular in servers) and SODIMMs (Small Outline DIMMs, popular in notebooks and netbooks). Each PC UDIMM typically uses eight (nine if error correction code [ECC] is used) 8-bit wide DRAMs in a parallel configuration, providing anywhere from 256MB to 4GB of total RAM capacity per DIMM depending on the number and capacity of the DRAMs used. PCs typically accommodate more than one DIMM slot per memory channel, although this goal is becoming more challenging with high-end DDR2 and DDR3. Compared to PCs, embedded applications typically call for a narrower channel and wider components with less overall memory capacity. A more typical embedded memory configuration is a 32-bit channel using two 32Mb x16 (512Mb) or 64Mb x16 (1Gb) DRAMs, for a total RAM capacity of 128MB to 256MB.

The net result is that any price-per-bit crossover point reported by the media (the point at which newer technology is cheaper than older technology for an equivalent size DRAM) applies to the PC market, not necessarily the embedded market. For example, if DDR3 is expected to be the lowest priced DRAM at some point in 2010, this will apply to the 128Mb x8 (1Gb) or 256Mb x8 (2Gb) DDR3 DRAMs used in PCs. Since they are produced in smaller volumes, x16 DRAMs may take longer to cross over. In addition, since the focus is on the x8 DRAMs for PCs, the higher speed DRAMs typically roll out as 8-bit wide DRAMs first, followed by the 16-bit wide versions some time later.

The moral of this particular story is that a DRAM data sheet may cover well over 30 separate part numbers offering a variety of width options, speed grades, latency options and even temperature range options. The pricing for each option is considerably different, and some combinations may simply not be available. If you are designing SoCs for cost-sensitive embedded applications, make sure you understand the availability and pricing implications of whichever DRAM you target for the system by talking with your DRAM suppliers.
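The configuration arithmetic above is easy to sanity-check in a few lines of code. The Python sketch below simply relates channel width, device width and device density to device count and total capacity; the two example configurations are the ones described in this section, and the function name is invented for illustration.

# Sketch: compute device count and total capacity for one rank of a
# DRAM channel. Illustrative only -- not a vendor catalog.

def channel_config(channel_bits: int, device_bits: int, device_mbits: int):
    """Return (device_count, total_capacity_MB) for one rank."""
    if channel_bits % device_bits:
        raise ValueError("channel width must be a multiple of device width")
    n = channel_bits // device_bits
    total_mbytes = n * device_mbits // 8  # Mb per device -> MB total
    return n, total_mbytes

# PC-style channel: 64 bits wide, eight x8 1Gb DRAMs (one UDIMM rank)
print(channel_config(64, 8, 1024))   # -> (8, 1024), i.e. 1 GB

# Typical embedded channel: 32 bits wide, two x16 512Mb DRAMs
print(channel_config(32, 16, 512))   # -> (2, 128), i.e. 128 MB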

2. New DDR Technology is Not Always Better


Believe it or not, if you have the choice of buying a new PC using DDR2-800 or DDR3-1066 SDRAM, you should pick the DDR2-800-based system, all other things being equal. While the DDR3-based system does offer higher (theoretical) bandwidth, it will also likely have a higher overall latency, and a PC's performance is often just as sensitive to latency as it is to bandwidth. For embedded applications, DDR2 may represent the best choice for many designs for some time to come. The good news for cost-conscious embedded system developers is that there is plenty of life in current and older DRAM technologies. DDR2 is the workhorse for embedded applications today, and is going to be around for a long time. For that matter, DDR SDRAMs are still widely available, as are the original non-DDR synchronous DRAMs that first became available in the mid-1990s. Most memory technologies remain available for years or even decades after they have peaked in the mainstream memory market.
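The latency point can be made concrete with a little arithmetic. The CAS latency values below are typical launch-era speed bins assumed for illustration (they are not quoted in this paper): converting CL cycles to nanoseconds shows why a higher data rate does not automatically mean lower latency.

# Sketch: convert CAS latency from clock cycles to nanoseconds.
# CL values below are assumed typical speed bins, for illustration only.

def cas_ns(data_rate_mtps: float, cl_cycles: int) -> float:
    clock_mhz = data_rate_mtps / 2          # DDR: clock = data rate / 2
    return cl_cycles * 1000.0 / clock_mhz   # cycles * clock period (ns)

print(f"DDR2-800  CL5: {cas_ns(800, 5):.1f} ns")   # 12.5 ns
print(f"DDR2-800  CL6: {cas_ns(800, 6):.1f} ns")   # 15.0 ns
print(f"DDR3-1066 CL7: {cas_ns(1066, 7):.1f} ns")  # 13.1 ns
print(f"DDR3-1066 CL8: {cas_ns(1066, 8):.1f} ns")  # 15.0 ns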


As an example, today's DDR2 is well supported at both 800 and 1066 Mbps. These data rates are sufficient for most embedded applications today. DDR2 uses a 1.8V terminated I/O, so power consumption will be higher than for DDR3, which uses 1.5V, but overall power consumption is also heavily dependent on the values chosen for on-die termination (ODT) and output drive impedance. There are also some DDR2 products available that operate at 1.5V instead of 1.8V to reduce the power signature of the DRAMs. DDR2 also has two output drive impedances that are programmable in the DRAM itself: nominal drive strength and reduced drive strength. Most lightly loaded, embedded applications should use the reduced drive strength. Unfortunately, DDR3 lacks a similar reduced drive strength that is more suited to embedded applications.

DDR2 also supports common 256Mb and 512Mb devices, whereas DDR3 nominally starts at 1Gb devices. You may be able to find a data sheet for a 512Mb DDR3 SDRAM, but it is likely to be lower volume and therefore more expensive. A 256Mb or 512Mb DDR2 SDRAM will be cheaper than a 1Gb DDR3 SDRAM for years after the official DDR3 price crossover, which assumes similar DRAMs. On the higher end, DDR2 also supports up to 4Gb devices. DDR2 and DDR3 are available in 4-bit, 8-bit and 16-bit wide configurations, but some vendors offer a non-JEDEC-standard 32-bit DDR2 option, which may be preferred for some embedded applications to allow high interface bandwidth with a minimum number of DRAMs.

So why consider DDR3? The most common reason is the expectation that the price for DDR3 will be cheaper at some point in the future (and you require 1Gb or larger DRAMs). Higher bandwidth is another reason: supported by clock frequencies up to 800 MHz, DDR3 goes up to 1600 Mbps (with the expectation that this will increase to 2133 Mbps in the future). There is also a potential power savings in the DDR3 1.5V interface (with the expectation that this will decrease to 1.35V in the future), and there is a wider variety of ODT settings in the DDR3 SDRAMs. DDR3 is also planned to support DRAMs up to 8Gb.

However, there are some caveats specific to DDR3. First, due to the higher operating frequency range and the on-chip delay-locked loop (DLL), the minimum operating frequency is 300 MHz (vs. 125 MHz for DDR2), which might not suit an embedded application that has a broad frequency range requirement. Second, DDR3 uses an 8-bit prefetch, compared to a 4-bit prefetch for DDR2. Thus, DDR3 is effectively limited to a burst length of 8, whereas DDR2 offers programmable burst lengths of 4 or 8. The longer burst length of DDR3 can potentially result in more delays to access data and poorer channel utilization even with the higher clock frequencies, especially for video processing applications.

Aside from operating frequencies and burst length, there are other factors to consider when weighing DDR3 against DDR2. Because the clock frequencies are higher and the product is younger, DDR3 DRAMs typically have higher latencies. DDR3 DIMMs use a technique called "fly-by" routing that daisy-chains address and control signals along the length of the DIMM. This reduces signal-integrity concerns, but requires Address and Command termination on the DIMM, causing issues for multi-DIMM systems (see tip 9). Finally, due to the higher data rates and single-ended signaling, DDR3 makes it more likely that a flip-chip package will be required for SoCs with the embedded DRAM interface.
SoC designers also have two other DDR options to consider. Mobile DDR, also known as low-power DDR (LPDDR), has no DLL on the DRAM, no lower operating frequency limit, no on-die termination (ODT), and a 1.8V interface. Mobile DDR can easily switch into lower power modes, and it supports low-power features such as partial array self-refresh (PASR) and a clock stop mode. Ranging from 128Mb to 2Gb, Mobile DDR is available at frequencies up to 200 MHz, or 400 Mbps, and offers a 32-bit wide interface option, which is a nice feature for embedded applications. However, due to the lower production volumes, Mobile DDR carries a significant price premium over an equivalent, higher-performing DDR2 product. LPDDR2 is an emerging technology that aims to double the performance of Mobile DDR, going up to 400 MHz and 800 Mbps and ultimately targeting 1066 Mbps for package-on-package or multi-chip packaged systems. LPDDR2 features a 1.2V interface with no ODT, supports devices up to 8Gb, and provides a 32-bit wide option. While LPDDR2 is not yet widely available, a significant price premium compared to Mobile DDR should be expected for quite some time.
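Several of these constraints (minimum DLL frequency, maximum data rate, burst length, device density) can be captured in a simple screening script to shortlist candidate technologies before the pricing discussion starts. The sketch below encodes only figures quoted in this section; anything not quoted here is left unchecked rather than guessed.

# Sketch: screen DDR technologies against basic system requirements
# using only constraints quoted in this section. None means the
# constraint is not quoted here and is deliberately left unchecked.

TECH = {
    # name:       (min_mtps, max_mtps, burst_lengths, min_mb, max_mb)
    "DDR2":       (250,  1066, (4, 8), 256,  4096),
    "DDR3":       (600,  1600, (8,),   1024, 8192),  # nominally 1Gb and up
    "Mobile DDR": (None, 400,  None,   128,  2048),
    "LPDDR2":     (None, 800,  None,   None, 8192),
}

def candidates(data_rate_mtps, burst_len, device_mb):
    for name, (lo, hi, bls, dmin, dmax) in TECH.items():
        if lo is not None and data_rate_mtps < lo:
            continue  # below the DLL's minimum operating frequency
        if data_rate_mtps > hi:
            continue
        if bls is not None and burst_len not in bls:
            continue
        if dmin is not None and device_mb < dmin:
            continue
        if device_mb > dmax:
            continue
        yield name

# Example: a 400 Mbps channel, burst of 4, using 512Mb devices.
print(list(candidates(400, 4, 512)))
# -> ['DDR2', 'Mobile DDR', 'LPDDR2']; availability and price come next.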


3. Do Not Apply DRAM Standards to Memory Controllers


JEDEC DRAM standards apply to the DRAMs themselves, not to the memory controllers. Further, as previously noted, the standards are developed with PCs in mind, not embedded applications. Thus, JEDEC DRAM standards only have an indirect relevance to memory controllers for embedded applications in that the memory controller has to properly drive the JEDEC standard DRAMs. With the exception of a few aging interface standards (which only apply to DDR and DDR2), how the memory controller drives and receives information from the DRAM is not standardized at all. Unlike a PC-based memory controller, embedded memory controllers must accommodate a wide range of potential system applications. Perhaps the most common embedded configuration uses two DRAMs soldered on the motherboard next to the SoC, with the SDRAM PHY and controller embedded in the SoC (see Figure 1). Here, the individual bidirectional data and data strobe nets are point-to-point. The Address/Command lane is unidirectional and carries two loads.

[Figure: SoC containing the SDRAM controller and SDRAM PHY; bidirectional, point-to-point DQ/DQS nets to each of two SDRAMs; unidirectional Address/Command net with 2 loads per net]
Figure 1: A Common Embedded Application Configuration Uses an SoC with Two SDRAMs

Another common embedded configuration uses 4 SDRAMs arranged in a single rank (see Figure 2). This configuration typically uses 16-bit wide devices for a 64-bit wide data channel. The wide channel provides higher bandwidth, but the Address and Command lines are more heavily loaded. Fortunately, the Address and Command channel is single data rate so it can accommodate more loading than the double data rate data channel. Four loads on the Address and Command channel is typically not a difficult design challenge.
[Figure: SoC containing the SDRAM controller and SDRAM PHY driving four SDRAMs; bidirectional, point-to-point DQ/DQS nets; unidirectional Address/Command net with 4 loads per net]

Figure 2: Embedded Application with a 64-bit Bus Using Four 16-bit Wide SDRAMs


Some embedded configurations may use more than one rank of SDRAMs (see Figure 3). A rank is defined as an independent set of SDRAMs that can be accessed using the full databus width but shares the Address and Command channel with all other ranks. Each rank has a unique chip select control signal to identify the specific rank being accessed. Ranks cannot be accessed simultaneously, as they share the same data path to the SoC. Multiple ranks allow the memory sub-system to achieve higher performance: one Address and Command bus controls independent memory ranks, avoiding many of the system timing limits in play when a single rank is used.

[Figure: SoC driving two ranks of SDRAMs; per-rank chip selects CS0* and CS1* (*also CKEn, ODTn); shared, unidirectional Address/Command net with 4 loads per net; bidirectional DQ/DQS nets, point to 2 points]

Figure 3: Memory Sub-system with 2 Ranks

While not common in consumer applications, networking and computing applications often use one or more single- or dual-rank DIMMs (see Figure 4 and Figure 5). Unbuffered DIMMs are generally preferred over their Registered counterparts, as the registers used in RDIMMs add an additional cycle of latency and are generally more expensive. However, RDIMMs do have the advantage that they provide intermediate buffering of the Address and Command signals and the clocks (via on-DIMM PLLs) to limit the fan-out from the memory controller.
[Figure: SoC containing the SDRAM controller and SDRAM PHY interfacing to one single-rank unbuffered DIMM; bidirectional, point-to-point DQ/DQS nets; unidirectional Address/Command net with 8-9 loads per net]

Figure 4: SoC Interfacing to One Single Rank Unbuffered DIMM


[Figure: SoC interfacing to one dual-rank unbuffered DIMM with the second rank of DRAMs on the back side; per-rank CS0*/CS1* (*also CKEn, ODTn); bidirectional DQ/DQS nets, point to 2 points; unidirectional Address/Command net with 16-18 loads per net]

Figure 5: SoC Interfacing to One Dual Rank Unbuffered DIMM

The conclusion is that an embedded memory controller may have to support a number of different memory configurations depending on how the end customer wants to use the SoC. Without any standard for the memory controller, the SoC developer needs to ensure that the DRAMs will be driven properly in any configuration. Key features to look for in this regard include flexibility of output drive impedance, flexibility of the I/O configuration to match the number of I/Os required to the number of ranks, flexibility to interface to DIMMs or individual components, etc.
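One way to capture that flexibility during controller and PHY configuration is a small topology descriptor that reports the loading each lane must drive, in the spirit of Figures 1 through 5. The Python sketch below is illustrative only; the class and field names are invented for this example.

# Sketch: describe a memory topology and derive per-net loading.
# Class and field names are invented for illustration.

from dataclasses import dataclass

@dataclass
class Topology:
    ranks: int             # each rank adds a CS/CKE/ODT set
    devices_per_rank: int  # DRAMs sharing the Address/Command lane per rank
    device_width: int      # bits per DRAM (x4/x8/x16)

    @property
    def channel_width(self):
        return self.devices_per_rank * self.device_width

    @property
    def addr_cmd_loads(self):
        # Address/Command is routed to every device in every rank.
        return self.ranks * self.devices_per_rank

    @property
    def dq_loads(self):
        # Each DQ/DQS net sees one device per rank.
        return self.ranks

for t in (Topology(1, 2, 16),   # Figure 1: two x16 DRAMs
          Topology(1, 4, 16),   # Figure 2: 64-bit channel, four x16 DRAMs
          Topology(2, 2, 16)):  # Figure 3: two ranks
    print(t.channel_width, "bits,", t.addr_cmd_loads, "A/C loads,",
          t.dq_loads, "load(s) per DQ net")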

4. Proper Termination and Drive Strength is Key for Power and Performance
DDR2 and DDR3 SDRAMs offer a host of programmable options for the drive strength of the output buffers and for the on-die termination impedance. The embedded memory controller will offer similar choices. The DRAM data sheets and JEDEC standards clearly outline the options available and how to program the settings, but how is a typical SoC designer supposed to choose the optimum settings?

Termination requires a lot of care in component-based, embedded applications, and it is an area where there are frequent misunderstandings. Many engineers read the JEDEC specs and deduce that a DDR3 controller should use a 34 ohm output drive. The problem is that they are reading the spec for the DDR3 DRAM, not the memory controller (and remember, there is no standard for the memory controller). Recall that DRAM standards are developed with PCs in mind and that PCs use DIMMs. In a system that uses DIMMs, the 34 ohm output drive setting for the DRAM is the optimal choice. DDR DIMMs use a series termination resistor that isolates the DIMM stub from the memory channel transmission line; specifically, DDR3 DIMMs use a 15 ohm series termination. In a typical PC system, 34 plus 15 ohms equals 49 ohms, which is an almost perfect match for the characteristic impedance of the loaded transmission lines, typically 50-60 ohm traces.

So how is an embedded system developer to manage all of these choices? First of all, for simple point-to-point systems, series termination resistors are not typically required. The result is that the 34 ohm output drive that was optimal for the PC may no longer be the right choice for an embedded system: it may be too strong. In addition, without the 15 ohm series termination provided by a DIMM, a 34 ohm output drive will require termination matched to the transmission line (~50-60 ohms) to avoid reflections, which can result in a high-power system. Ideally, it may be better to raise the output drive impedance to 50-60 ohms (matching the impedance of the transmission line to absorb reflections) and then use a higher ODT setting such as 120 ohms to reduce the power dissipation in the termination. The ideal settings for drive strength and ODT will also depend on the clock frequency, to ensure that inter-symbol interference (ISI) effects are not introduced.

Figure 6 contrasts two embedded DDR3 systems: one that uses 34 ohm drive strength in the controller with a 60 ohm ODT in the DRAM, and one that uses 60 ohm drive strength with a 120 ohm ODT. In each case, the Write data eye as seen at the DRAM pins is shown. Notice that the optimized system creates smaller reflections, cleaner edge transitions, a comparable data eye width and, most importantly, a 37% lower overall power dissipation.

Standard JEDEC (34 ohm driver, 60 ohm ODT, VSWING 1V):
- Valid data eye: 743ps
- Significant reflections; timed with DQS transitions = jitter
- Driver and termination power: Address driver 7.4mW/bit; Address termination 3.9mW/bit; DQ driver 6.6mW/bit; DQ termination 10mW/bit
- Total: 950mW for a 32-bit channel

Optimized embedded (60 ohm driver, 120 ohm ODT, VSWING 1V):
- Valid data eye: 741ps
- Reflections absorbed by the driver; less DQS jitter
- Driver and termination power: Address driver 7.0mW/bit; Address termination 2.0mW/bit; DQ driver 4.2mW/bit; DQ termination 5.2mW/bit
- Total: 600mW for a 32-bit channel (a 37% savings)

Figure 6: Optimized Output Driver and ODT Settings (DDR3-1066) Result in a Significant Power Savings

To support optimized embedded applications, embedded memory controllers should have many options for output driver strength and ODT. Ideally, the memory controller should be more programmable than the DRAMs to enable engineers to select the optimal value based on their clock frequency and power requirements. In every case, a signal integrity analysis should be performed to allow designers to find optimal output drive impedance and ODT values for their system.
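A first-order feel for the Figure 6 trade-off needs nothing more than the transmission-line reflection coefficient and a DC divider. The sketch below compares the two driver/ODT combinations against an assumed 50 ohm trace, modeling the ODT as a termination to VTT = VDDQ/2; it is a static estimate only and is no substitute for the signal integrity analysis recommended above.

# Sketch: first-order reflection and static termination power for the
# two Figure 6 settings. Static DC estimate only -- not a substitute
# for signal integrity simulation.

VDDQ = 1.5  # DDR3 I/O supply (V); ODT modeled as termination to VDDQ/2

def gamma(z_load, z0=50.0):
    """Reflection coefficient seen at an impedance z_load on a z0 line."""
    return (z_load - z0) / (z_load + z0)

def static_power_mw(r_drv, r_odt, vddq=VDDQ):
    """DC power when the driver holds the line low against ODT to VTT."""
    vtt = vddq / 2
    i = vtt / (r_drv + r_odt)    # current through driver plus ODT
    return vtt * i * 1000.0      # total mW dissipated per pin

for r_drv, r_odt in ((34, 60), (60, 120)):
    print(f"{r_drv} ohm drive / {r_odt} ohm ODT: "
          f"gamma(driver) = {gamma(r_drv):+.2f}, "
          f"static power = {static_power_mw(r_drv, r_odt):.1f} mW/pin")
# The 60/120 combination roughly halves the static termination power,
# consistent with the savings shown in Figure 6.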

5. The Address and Command Lane Needs Close Attention


When developing DDR memory interfaces, many designers focus most of their attention on the double data rate data bus. The single data rate Address and Command signals are often overlooked, which can lead to unreliable memory channels, since the Address and Command lanes are typically routed to every device in every rank and are often very heavily loaded (refer to Figures 4 and 5). Nowhere in the JEDEC DDR standards will you find that the Address and Command bus is designed to be terminated via discrete resistors on the motherboard, and most DRAM vendor data sheets do not mention such termination either. Designers should nevertheless plan for termination of the Address/Command signals. Unlike the data signals, these nets are not provided with dynamic termination on the DRAM die, since they must be terminated at the end of the net, which usually carries multiple DRAM loads. Termination occurs on the motherboard with discrete components, which consume power whenever the memory controller is driving the Address/Command bus. Making the power dissipation worse, DDR SDRAMs do not always allow their inputs to be high impedance (Hi-Z): their input buffers are only turned off in power-down or self-refresh mode.

DIMM-based designs present even more challenges for the Address/Command bus. If a design has two UDIMMs with two ranks on each UDIMM, a DDR application could result in an Address/Command bus with 32 or 36 loads. The heavy loading on this bus slows down the edges of the signal transitions and results in ISI problems, as the signals do not reach their steady-state potential in one clock period, effectively clipping the signal. One solution is to implement 2T (two-period) command timing, which allows more setup and hold time for the signals. In this case, there is one Address/Command transfer for every two cycles, and the signals are clocked on every other rising clock edge. This reduces timing uncertainty, but each Address and Command signal now needs to be held for two cycles.

Finally, many designers would like to simply leave the Address/Command bus unterminated to save power. At very low frequencies this may work, but more often than not an unterminated Address/Command bus leads to overshoot and undershoot that exceeds the DRAM data sheet specifications. The result may not be a system that fails immediately, but one with reduced long-term reliability as the overshoot/undershoot slowly damages the DRAM. The address eye patterns in Figure 7 compare the received eye at the address ball of the DRAM for a typical system operating at 1066 Mbps with both a terminated and an unterminated net. The terminated net shows an eye opening that is over 300ps wider than the unterminated net. More importantly, the unterminated net has overshoot and undershoot that far exceeds the JEDEC standard specification and vendor data sheets (for DDR3, the JEDEC standard and DRAM data sheets specify an absolute maximum overshoot/undershoot of VDDQ+0.4V/-0.4V for Address/Command/Control pins).

Figure 7: DDR3 Address Eyes, 50 ohm Termination to VTT (Left), Unterminated (Right)
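A crude way to see when 2T timing becomes attractive is to compare a lumped-RC settling estimate for the loaded Address/Command net against the clock period. The Python sketch below uses invented example values for drive impedance and per-input capacitance; only a real signal integrity simulation can confirm the choice.

# Sketch: crude lumped-RC settling check for a loaded Address/Command
# net, to decide between 1T and 2T timing. Example values are invented;
# verify with real signal integrity simulation.

import math

def settle_time_ps(r_drive_ohm, c_load_pf_per_device, n_devices,
                   settle_fraction=0.95):
    c_total = c_load_pf_per_device * n_devices        # lumped load, pF
    tau_ps = r_drive_ohm * c_total                    # 1 ohm * 1 pF = 1 ps
    return -tau_ps * math.log(1.0 - settle_fraction)  # time to settle

clock_period_ps = 1875            # DDR2-1066: 533 MHz clock
for n in (4, 8, 16, 32):
    t = settle_time_ps(40.0, 2.0, n)   # assumed 40 ohm drive, ~2 pF/input
    timing = "1T ok" if t < clock_period_ps else "consider 2T"
    print(f"{n:2d} loads: settle ~{t:5.0f} ps -> {timing}")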

6. Pay Attention to the SoC Package


DRAM interfaces often take more pins than designers really want to put on their SoCs. The DRAM pins create a source-synchronous parallel interface and can create various signal integrity issues, especially at high data rates. Designers often look to reduce the pin count, but the memory system design may not permit this. The use of multiple memory ranks requires more pins, including chip select (CS), ODT, and clock enable (CKE) for each rank. One possible way to reduce the pin count is to omit the data mask (DM) pins if Write data mask capability is not required. These pins mask data when the SoC Writes to memory; typically there is one DM pin for every 8 bits of data. In many embedded applications the DM pins are never used, but design teams pin them out anyway. The DM pins are active-high, input-only pins on the DDR SDRAMs, so if the controller does not pin them out, the unused DM pins can simply be connected to ground at the DRAMs via a resistor.
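Interface pin counts add up quickly, and the DM savings mentioned above can be quantified with simple bookkeeping. The tally below is a sketch for a DDR3 component interface; exact counts vary with density and vendor, so the address width and miscellaneous pin counts are assumptions for illustration.

# Sketch: tally SoC signal pins for a DDR3 component interface.
# Address width and miscellaneous pin counts are assumptions.

def ddr3_pins(data_bits, ranks, use_dm=True, addr_bits=15, bank_bits=3):
    n_bytes = data_bits // 8
    pins = {
        "DQ": data_bits,
        "DQS/DQS# (differential, per byte)": n_bytes * 2,
        "DM": n_bytes if use_dm else 0,
        "Address": addr_bits,
        "Bank address": bank_bits,
        "RAS#/CAS#/WE#": 3,
        "CS#/CKE/ODT (per rank)": ranks * 3,
        "CK/CK# (differential)": 2,
        "RESET#": 1,
    }
    return pins, sum(pins.values())

for use_dm in (True, False):
    _, total = ddr3_pins(32, ranks=1, use_dm=use_dm)
    print(f"32-bit, 1 rank, DM={'yes' if use_dm else 'no'}: "
          f"{total} signal pins")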


The type of package is also important. Wire bond packages have higher lead inductance and require more power and ground pins than flip-chip packages to avoid current-induced power rail collapse when all of the signals switch in the same direction. Flip-chip packages do not use highly inductive wires to deliver power to the die; as a result, designers may be able to use half the number of power and ground connections with a flip-chip package compared to a wire bond package. Of course, many design teams would prefer to use a wire bond package because of its inherently lower cost. Sufficient on-die decoupling capacitance is also a design criterion that is more critical in wire bond applications. The power planes in the package are also important. In addition to having enough pads on the SoC, it is important to have a sufficient number of balls on the package, and enough vias to connect those balls to the power plane, to avoid constraining the power supply to the device.

7. Beyond the Package: Realize that DDR Performance is System Dependent


A common question that Synopsys gets from potential customers is "my system is (insert brief description of the system), will it work?" There is never a simple answer to that question. DDR interfaces, especially higher performance or heavily loaded systems, always require a signal integrity analysis to ensure the system will be robust; there are no shortcuts. The signal integrity analysis produces several timing budgets that determine whether or not there is positive timing margin in each area of the system for all potential operating conditions.

When it comes to meeting timing budgets, designers have little control over the timing contributions of the DRAM. The controller and PHY IP that is selected can certainly make a difference, and the SoC packaging can have a significant impact as well. Designers also have control over the interconnect between the SoC and the DRAM. With good signal integrity techniques, designers can significantly improve the timing characteristics of that interconnect in a way that helps meet timing budgets. Interconnect timing can be improved by reducing crosstalk, inter-symbol interference, reflections, and skew, and by controlling simultaneous switching output (SSO) effects. Designers should, for example, keep package skew to an absolute minimum, preferably +/- 10ps from DQS to DQ to DM. There are also DDR PHY IP solutions that help manage skew; for example, a DDR PHY that supports per-bit deskew can correct for skew between different DQS, DQ and DM signals within the same data byte. Crosstalk can impact both setup and hold times, but can often be controlled by increasing the spacing between signal lines on the motherboard. As already discussed, ISI, which can keep signals from reaching their full threshold levels before the next bit occurs, can be addressed with 2T timing as well as careful selection of parallel termination values. SSO effects can be mitigated by increasing the number of I/O power and ground paths on the SoC and in the package, as well as by on-chip decoupling capacitance.

Given the importance of managing signal integrity effects, it is very important that design teams run signal integrity simulations on the entire system rather than simply relying on rules of thumb. Figure 8 shows the correlation between simulation and measurement of an actual 32-bit wide DDR2-800 interface that is possible when a simulation environment is carefully assembled. In Figure 8, DQ(EVEN) shows the probed DQ signal when all other DQs switch in the same direction (worst-case SSO), and DQ(ODD) shows the probed DQ signal when it switches in the opposite direction to all other DQ signals (worst-case crosstalk). Close correlation between measured and simulated results validates the simulation environment and increases confidence in the broad series of simulation results used in a signal integrity analysis.



Figure 8: Example of System Simulation versus Measurement for DDR2-800 Write Data Eyes
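At its core, each timing budget mentioned above is an accounting exercise: subtract every uncertainty from the bit period and check that positive margin remains. The Python sketch below shows the shape of a Write-side budget; every contribution value is an invented placeholder, since real numbers come from SI simulation and the PHY and DRAM data sheets.

# Sketch: shape of a Write-data-eye timing budget. All contribution
# values are invented placeholders; real numbers come from SI
# simulation and PHY/DRAM data sheets.

def write_margin_ps(data_rate_mtps, contributions_ps):
    bit_time = 1e6 / data_rate_mtps   # one unit interval in ps
    return bit_time - sum(contributions_ps.values())

budget = {                            # example placeholders (ps)
    "DRAM tDS + tDH": 250,
    "PHY output uncertainty": 150,
    "package skew (DQS to DQ/DM)": 20,   # the +/-10ps goal noted above
    "PCB skew and crosstalk": 120,
    "ISI and reflections": 130,
    "SSO-induced jitter": 80,
}
margin = write_margin_ps(800, budget)    # DDR2-800: 1250 ps bit time
print(f"remaining eye margin: {margin:.0f} ps "
      f"({'PASS' if margin > 0 else 'FAIL'})")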

For a more complete treatment of signal integrity analysis for DDR interfaces, refer to the Synopsys white paper "Meeting Timing Budgets for DDR Memory Interfaces."

8. The DRAM Interface Power Dissipation Depends on Many Variables


As a supplier of DDR interface IP, Synopsys is frequently asked how much power an embedded DDR interface will dissipate. It is an excellent question and often vital to overall chip and package planning. However, it is a very difficult question to answer, because a DDR PHY's power consumption can vary from a few hundred milliwatts to several watts depending on the system configuration and operating parameters. The power consumption of a DDR PHY is affected by the following issues:
- DDR type (e.g., DDR2, DDR3): DDR2 uses a 1.8V I/O VDDQ, DDR3 uses a 1.5V I/O VDDQ; higher VDDQ = higher power
- Channel width: more data bits = more active power
- Number of ranks: more ranks = more ODT termination during Writes and more capacitive load on the Address and Command channel
- Channel data rate: higher data rate = more power consumption
- Ratio of Reads to Writes to idle cycles on the data bus: Reads use ODT in the PHY and consume power in the input buffers; Writes consume power in the output drivers
- Activity ratio of the Address/Command bus: the more switching, the more power consumed in external capacitance
- Data switching activity: the more switching, the more power consumed in external capacitance
- Drive strength and ODT values in the DRAM: smaller ODT values = higher system power
- Drive strength and ODT values in the PHY: lower impedance drive = higher system power
- Termination resistor value for the Address/Command bus: smaller termination values = higher system power
- Process technology: core power (VDD) varies with feature size; higher VDD = higher power
- PVT: traditional process, voltage and temperature effects
- Pin loading: traditional CV²F power
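To see how these variables interact, the toy model below scales a baseline PHY power figure by a few of the factors listed above. Every coefficient, including the baseline, is an invented placeholder for illustration; use the IP vendor's power models or measurements for real estimates.

# Sketch: toy DDR PHY power model combining factors from the list
# above. All coefficients and the baseline are invented placeholders.

def phy_power_mw(vddq, data_bits, data_rate_mtps, read_pct, write_pct,
                 r_odt_phy, baseline_mw_per_bit_gtps=10.0):
    rate_gtps = data_rate_mtps / 1000.0
    core = baseline_mw_per_bit_gtps * data_bits * rate_gtps
    io_scale = (vddq / 1.5) ** 2        # CV^2F-style supply scaling
    odt_scale = 120.0 / r_odt_phy       # smaller ODT value -> more power
    # Reads burn power in the PHY's ODT; Writes in its output drivers.
    activity = 1.0 + 0.6 * read_pct + 0.4 * write_pct
    return core * io_scale * activity * odt_scale

ddr3 = phy_power_mw(1.5, 32, 1066, read_pct=0.4, write_pct=0.4,
                    r_odt_phy=120)
ddr2 = phy_power_mw(1.8, 32, 800, read_pct=0.4, write_pct=0.4,
                    r_odt_phy=75)
print(f"DDR3-1066 example: ~{ddr3:.0f} mW; DDR2-800 example: ~{ddr2:.0f} mW")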

Figure 9 shows how DDR3 PHY power dissipation decreases significantly as the ODT termination values programmed into the DDR SDRAM are increased.



[Figure: Total DDR PHY power vs. ODT; normalized PHY power (0 to 1) on the y-axis versus on-die termination settings of 40, 60 and 120 ohms on the x-axis]

Figure 9: DDR PHY Power Decreases as the ODT Setting Increases

DDR PHY power is also proportional to bus activity and to the type of activity on the bus (e.g., Read, Write or idle). Due to the ODT that is active in the DDR PHY during Reads from the DDR SDRAMs, DDR PHYs typically consume more power during a Read than during a Write, when the data is driven off-chip to the DRAMs (the reverse of an unterminated interface, which typically consumes more power in the off-chip drivers during a Write). Figure 10 shows how the power consumption of a DDR3 PHY can vary according to the Read/Write/idle percentage: the higher the Read percentage, the higher the overall power consumption.

[Figure: Total DDR3 PHY power as a function of bus activity; normalized PHY power (0 to 1) for Write/Read/idle percentages of 60/20/20, 20/60/20, 40/40/20, 20/40/40 and 40/20/40]

Figure 10: PHY Power Increases as the Percentage of Read Operations Increases

One key advantage to licensing DDR interface IP versus building your own is the ability to accurately estimate the power dissipation of the interface very early in the design process.

9. Using DIMMs Often Limits Your Performance, and Beware of the DDR3 UDIMM
We have already explored how the loading of a large number of DRAMs, and the resulting ISI, can impact the performance of a DIMM-based system. As data rates have increased, the number of DIMMs that can be supported on a single channel has decreased, because each UDIMM on the channel typically adds at least 8 DRAM loads to the Address and Command channel. As the Address and Command bus becomes more heavily loaded, the signal edges slow down, causing ISI effects. If you review what has been possible throughout the history of PC chipsets and look towards the future, you will arrive at a chart resembling Figure 11. A useful rule of thumb for embedded designs: if the PC chipsets can't do it, then you will not be able to either! Today's PC chipsets use flip-chip packaging, and the PC motherboard is a highly engineered design that is unlikely to be out-designed in an embedded application. If your system will not handle your preferred Unbuffered DIMM loading, you should consider Registered DIMMs to reduce the fan-out of the Address/Command channel.

[Figure: Number of UDIMMs per DDR memory channel (0 to 4) by speed grade, from DDR-200 through DDR-333, DDR2-400 through DDR2-1066, and DDR3-1066 through DDR3-2133. *Most systems cannot achieve this today.]
Figure 11: Number of UDIMMs per DDR Memory Channel

DDR3 UDIMMs present a unique challenge not seen previously with DDR memory channels. As shown on the left of Figure 12, DDR2 UDIMMs use a standard "T" routing topology in which the Address and Command net is routed to every DRAM on the DIMM with equal track length, resulting in the same flight time for the signals so they arrive at each DRAM synchronously. This minimizes data-to-data skew across the 64/72-bit data word. In the case of DDR2, the termination is located on the motherboard at the end of the net. DDR3 DIMMs use a new technique for routing the Address and Command bus called fly-by routing. As shown on the right of Figure 12, fly-by routing addresses the DDR3 SDRAMs from one side of the UDIMM to the other, and no attempt is made to match the flight times to each DRAM. Write leveling is used by the memory controller to zero out the DRAM-to-DRAM skew caused by the signals' flight time down the UDIMM.

Perhaps more significantly, the DDR3 UDIMM's address and control nets are terminated on the UDIMM itself, not on the motherboard as with DDR2. The result is that any DDR3 UDIMM-based system that uses 2 UDIMMs per channel will have two terminations per net in parallel. Not only does this increase power consumption, as the termination impedance is effectively cut in half, but the data eyes for these nets will be much more limited in their swing. Fundamentally, any DDR3 UDIMM-based system using 2 UDIMMs must take special precautions to enable the Address and Command channel to function properly. These include using 2T timing on the Address and Command bus (each command uses two clock cycles, with the first cycle providing extra-long setup time for the signals) or making a copy of the Address and Command bus (including bond pads on the SoC) for every individual DDR3 UDIMM that is supported. Micron Technology has published an excellent Technical Note on this subject titled "Design Guide for Two DDR3-1066 UDIMM Systems." Micron also offers many other useful Technical Notes on its web site, www.micron.com.
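The two-UDIMM penalty is easy to quantify: two on-DIMM terminations in parallel halve the effective termination impedance, which raises termination power and compresses the signal swing. The sketch below works through that divider; the drive and termination resistor values are illustrative assumptions, not values from this paper.

# Sketch: effect of two DDR3 UDIMM on-DIMM terminations in parallel on
# an Address/Command net. Resistor values are illustrative assumptions.

def parallel(*rs):
    return 1.0 / sum(1.0 / r for r in rs)

def half_swing_at_dram(r_drv, r_term, vddq=1.5):
    # Terminations pull to VTT = VDDQ/2; the driver works against the
    # divider formed by its own impedance and the termination.
    vtt = vddq / 2
    return vtt * r_term / (r_drv + r_term)

R_DRV, R_TERM = 40.0, 60.0   # assumed controller drive and on-DIMM term

one = half_swing_at_dram(R_DRV, R_TERM)
two = half_swing_at_dram(R_DRV, parallel(R_TERM, R_TERM))
print(f"1 UDIMM:  term {R_TERM:.0f} ohm, half-swing {one * 1000:.0f} mV")
print(f"2 UDIMMs: term {parallel(R_TERM, R_TERM):.0f} ohm, "
      f"half-swing {two * 1000:.0f} mV")  # impedance halved, swing squeezed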



[Figure: left, a DDR2 UDIMM with balanced "T" Address/Command routing from the memory controller to all DRAMs and termination on the motherboard; right, a DDR3 UDIMM with fly-by Address/Command routing along the DIMM and termination on the DIMM itself]
Figure 12: DDR2 UDIMMs (Left) versus DDR3 UDIMMs (Right) Highlighting the Different Routing and Termination Techniques

10. DDR Interfaces Do Not Differentiate SoCs


The final point to be made is that a clever DDR interface adds no real value to an SoC. Nobody is going to differentiate an SoC product based on its commodity memory interface. Thus, it is much lower risk to purchase DDR memory interface IP than to design a do-it-yourself interface. In this way, engineering teams can focus on areas that differentiate their SoC rather than on standard interfaces. In addition, your SoC design will benefit from the hundreds of engineer-years embodied in the DDR IP products and from the accumulated silicon experience of a vast number of successful past projects. Early access to area and power information is also a key benefit of licensing DDR IP versus building your own. Finally, purchasing DDR IP provides you with an experienced DDR team that can support your interface during prototype bring-up.

Synopsys offers DesignWare DDR IP supporting Mobile DDR, DDR, DDR2, DDR3, and many of the variations of DDR (including 1.5V DDR2 and 1.35V DDR3L) at data rates up to 2133 Mbps. Synopsys participates in the JEDEC committees that develop memory standards to keep abreast of what is coming in the future and to represent the interests of embedded memory controllers as the DRAM standards are developed. Synopsys memory controller IP is designed to support the DRAM standards, not designed to the DRAM standards (which, as tip 3 explains, do not govern controllers), and the IP supports an expansive range of system configurations. Finally, signal integrity expertise is built into the products, along with flexible ODT and output drive impedance options. For more information on the Synopsys DesignWare DDR IP portfolio, visit: http://www.synopsys.com/ddr

Conclusion
With a wide array of choices in DRAMs, memory configurations, and memory controller IP, there is certain to be a good solution for almost any SoC design. Successful design with DRAM calls for a good understanding of DRAM systems and the DRAM market, and a willingness to challenge common misunderstandings about the requirements for DRAM controllers. As noted in this paper, embedded applications are not prominent on the radar screen of DRAM makers, whose primary customers are in the PC market. Designers of embedded applications must therefore take extra steps to become informed customers in order to meet their system requirements.

Synopsys, Inc. 700 East Middlefield Road Mountain View, CA 94043 www.synopsys.com © 2009 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is available at http://www.synopsys.com/copyright.html. All other names mentioned herein are trademarks or registered trademarks of their respective owners. 08/09.CE.09-17767.
