Embedded DDR Interfaces: Ten Tips to Success for Your SoC
Author
Graham Allan, Sr. Manager, Product Marketing, Synopsys Inc.
Preface
DRAM standards such as DDR3 are developed by JEDEC, the standard-setting organization for many semiconductor components. Most of the DRAM standards developed by JEDEC are written with a specific system, or a small number of systems, in mind. For example, the DDR3 SDRAM standard was primarily developed for use on unbuffered and registered DIMMs for laptop, desktop and server applications. Embedded applications for DRAM, such as video processing chips, typically consume a very small percentage of the DRAMs manufactured and are often left to make do with a less-than-perfect fit for the application. Unless a given company regularly attends the JEDEC meetings, it is very difficult to appreciate the intended system view(s) that represent the focus for each type of DRAM being standardized. Synopsys is a regular participant in JEDEC meetings in order to represent the memory controller's interests and to keep abreast of what standards are in development and how the current standards may evolve. This whitepaper does not disclose any proprietary information about work in progress at JEDEC, but it does address a number of issues and concerns that have come up frequently in customer interactions.
Introduction
Emerging from a host of competing technologies, DDR2 and DDR3 SDRAM (DDR) have become the dominant off-chip memory storage solution for system-on-chip (SoC) designs. With high volumes driven by the PC market, stability of supply, and attractive pricing, DDR has defeated all of the contenders including QDR SRAM, RLDRAM, Rambus DRAM and other memory technologies to take the RAM crown for embedded applications. Unfortunately, many SoC designers are unfamiliar with the realities of the DRAM standards, typical DRAM applications and the DRAM market. This paper presents ten guiding principles for embedded DDR interfaces, many of which the DRAM standards and vendor data sheets do not explain.
Mainstream DRAMs designed for the PC market represent the most plentiful and cheapest DRAMs. Currently, for example, the highest volume and cheapest products are the 64Mb x8 (512Mb) and 128Mb x8 (1Gb) DDR2 SDRAMs, since these are the most commonly used DRAMs in today's PCs. Some design teams might conclude that they want to use the cheapest available product for their SoC applications, but it is often not that simple: DRAMs targeting embedded applications typically have different requirements from DRAMs designed for PCs. To accommodate the memory sub-system requirements for expandability, PCs use 64-bit wide memory modules (72-bit wide if ECC is used), which are effectively small PCBs with a self-contained memory subsystem. The most common module used in PCs today is the UDIMM, or Unbuffered Dual In-Line Memory Module. The "Dual" part is a legacy term, as most UDIMMs today have only one side of the module populated with DRAMs. There are many other types of DIMMs, such as RDIMMs (Registered DIMMs, popular in servers) and SODIMMs (Small Outline DIMMs, popular in notebooks and netbooks). Each PC UDIMM typically uses eight (nine if error correction code [ECC] is used) 8-bit wide DRAMs in a parallel configuration, providing anywhere from 256MB to 4GB total RAM capacity per DIMM depending on the number and capacity of the DRAMs used. PCs typically accommodate more than one DIMM slot per memory channel, although this goal is becoming more challenging with high-end DDR2 and DDR3.
Compared to PCs, embedded applications typically call for a narrower channel and wider components with less overall memory capacity. A more typical embedded memory configuration is a 32-bit channel using two 32Mb x16 (512Mb) or 64Mb x16 (1Gb) DRAMs for a total RAM capacity of 128MB to 256MB. The net result is that any price-per-bit crossover point reported by the media (the point at which the newer technology is cheaper than the older technology for an equivalent size DRAM) applies to the PC market, not necessarily the embedded market. For example, if DDR3 is expected to be the lowest priced DRAM at some point in 2010, this will apply to the 128Mb x8 (1Gb) or 256Mb x8 (2Gb) DDR3 DRAMs used in PCs. Since they are produced in smaller volumes, x16 DRAMs may take longer to cross over. In addition, since the focus is on x8 DRAMs for PCs, the higher speed DRAMs typically roll out as 8-bit wide DRAMs first, followed by the 16-bit wide versions sometime later. The moral of this particular story is that a DRAM data sheet may cover well over 30 separate part numbers offering a variety of width options, speed grades, latency options and even temperature range options. The pricing for each option is considerably different, and some combinations may simply not be available. If you are designing SoCs for cost-sensitive embedded applications, make sure you understand the availability and pricing implications of whichever DRAM you target for the system by talking with your DRAM suppliers.
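The capacity arithmetic above is simple enough to capture in a few lines. The following Python sketch (an illustration, not production code) computes total channel capacity from the channel width, the DRAM width and the DRAM density:

```python
def channel_capacity_mb(channel_bits, dram_bits, dram_density_mbits, ranks=1):
    """Total capacity in MB for a channel built from identical DRAMs:
    (channel width / DRAM width) DRAMs per rank, times ranks, times density."""
    drams_per_rank = channel_bits // dram_bits
    total_mbits = drams_per_rank * ranks * dram_density_mbits
    return total_mbits // 8  # bits to bytes

# Typical embedded channel: 32 bits wide, two x16 DRAMs
print(channel_capacity_mb(32, 16, 512))   # 128 MB from two 32Mb x16 parts
print(channel_capacity_mb(32, 16, 1024))  # 256 MB from two 64Mb x16 parts

# PC UDIMM rank: 64 bits wide, eight x8 DRAMs of 1Gb each
print(channel_capacity_mb(64, 8, 1024))   # 1024 MB per rank
```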
As an example, today's DDR2 is well supported at both 800 and 1066 Mbps, data rates sufficient for most embedded applications today. DDR2 uses a 1.8V terminated I/O, so power consumption will be higher compared to DDR3, which uses 1.5V, but overall power consumption is also heavily dependent on the values chosen for on-die termination (ODT) and output drive impedance. There are also some DDR2 products available that operate at 1.5V instead of 1.8V to reduce the power signature of the DRAMs. DDR2 also has two output drive impedances that are programmable in the DRAM itself: nominal drive strength and reduced drive strength. Most lightly loaded, embedded applications should use the reduced drive strength. Unfortunately, DDR3 lacks a similar reduced drive strength that is more suited to embedded applications. DDR2 also supports common 256Mb and 512Mb devices, whereas DDR3 nominally starts at 1Gb devices. You may be able to find a data sheet for a 512Mb DDR3 SDRAM, but it is likely to be lower volume and therefore more expensive. A 256Mb or 512Mb DDR2 SDRAM will be cheaper than a 1Gb DDR3 SDRAM for years after the official DDR3 price crossover, which assumes similar DRAMs. On the higher end, DDR2 also supports up to 4Gb devices. DDR2 and DDR3 are available in 4-bit, 8-bit and 16-bit wide configurations, but some vendors offer a non-JEDEC-standard 32-bit DDR2 option, which may be preferred for some embedded applications to allow high interface bandwidth with a minimum number of DRAMs.
So why consider DDR3? The most common reason is the expectation that the price for DDR3 will be cheaper at some point in the future (and you require 1Gb or larger DRAMs). Higher bandwidth is another: supported by clock frequencies up to 800 MHz, DDR3 goes up to 1600 Mbps (with the expectation that this will increase to 2133 Mbps in the future). There is also a potential power savings in the DDR3 1.5V interface (with the expectation that this will decrease to 1.35V in the future), and there is a wider variety of ODT settings in the DDR3 SDRAMs. DDR3 is also planned to support DRAMs up to 8Gb. However, there are some caveats specific to DDR3. First, due to the higher operating frequency range and the on-chip delay-locked loop (DLL), the minimum operating frequency is 300 MHz (vs. 125 MHz for DDR2), which might not suit an embedded application that has a broad frequency range requirement. Second, DDR3 uses an 8-bit prefetch, compared to a 4-bit prefetch for DDR2. Thus, DDR3 is effectively limited to a burst length of 8, whereas DDR2 offers programmable burst lengths of 4 or 8. The longer burst length of DDR3 can result in more delay to access data and poorer channel utilization even at the higher clock frequencies, especially for video processing applications. Aside from operating frequencies and burst length, there are other factors to consider when weighing DDR3 against DDR2. Because the clock frequencies are higher and the product is younger, DDR3 DRAMs typically have higher latencies. DDR3 DIMMs use a technique called fly-by routing that daisy-chains the address and control signals along the length of the DIMM. This reduces signal-integrity concerns, but requires Address and Command termination on the DIMM, causing issues for multi-DIMM systems (refer to section 9). Finally, due to the higher data rates and single-ended signaling, DDR3 makes it more likely that a flip-chip package will be required for SoCs with the embedded DRAM interface.
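The burst-length caveat is easy to quantify. The sketch below (idealized: it ignores bank timing and command overhead) compares how long one minimum-size burst occupies the data bus for DDR2 at burst length 4 versus DDR3 at burst length 8:

```python
def burst_time_ns(data_rate_mbps, burst_length):
    """Time (ns) one burst occupies the data bus. Each beat transfers
    on one clock edge, so beat time = 1000 / data rate (Mbps)."""
    return burst_length * 1000.0 / data_rate_mbps

# DDR2-800, BL4: the minimum transfer on a 32-bit channel is 16 bytes
print(burst_time_ns(800, 4))   # 5.0 ns of bus occupancy
# DDR3-1066, BL8: faster clock, but twice the beats per access (32 bytes)
print(burst_time_ns(1066, 8))  # ~7.5 ns, so short accesses hold the bus longer
```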
SoC designers also have two other DDR options to consider. Mobile DDR, also known as low-power DDR (LPDDR), has no DLL on the DRAM, no lower operating limit, no on-die termination (ODT), and a 1.8V interface. Mobile DDR can easily switch into lower power modes and it supports low-power features such as partial array self-refresh (PASR) and clock stop mode. Ranging from 128Mb to 2Gb, Mobile DDR is available at frequencies up to 200 MHz, or 400 Mbps, and offers a 32-bit wide interface option which is a nice feature for embedded applications. However, due to the lower production volumes, Mobile DDR carries a significant price premium over an equivalent, higher performing DDR2 product. LPDDR2 is an emerging technology that aims to double the performance of Mobile DDR, going up to 400 MHz and 800 Mbps and ultimately targeting 1066 Mbps for package-on-package or multi-chip packaged systems. LPDDR2 features a 1.2V interface with no ODT, supports devices up to 8Gb, and provides a 32-bit wide option. While LPDDR2 is not yet widely available, a significant price premium compared to Mobile DDR should be expected for quite some time.
Figure 1: A Common Embedded Application Configuration Uses an SoC with Two SDRAMs
Another common embedded configuration uses four SDRAMs arranged in a single rank (see Figure 2). This configuration typically uses 16-bit wide devices for a 64-bit wide data channel. The wide channel provides higher bandwidth, but the Address and Command lines are more heavily loaded. Fortunately, the Address and Command channel runs at single data rate, so it can accommodate more loading than the double data rate data channel. Four loads on the Address and Command channel typically do not present a difficult design challenge.
Figure 2: Embedded Application with a 64-bit Bus Using Four 16-bit Wide SDRAMs
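The loading headroom on the Address/Command channel follows directly from its single-data-rate timing. A back-of-the-envelope sketch, ignoring board skew and jitter:

```python
def bit_window_ns(clock_mhz, double_data_rate):
    """Width of one bit window in ns. SDR signals get a full clock
    period; DDR signals get only half a period."""
    period_ns = 1000.0 / clock_mhz
    return period_ns / 2 if double_data_rate else period_ns

clk = 533  # MHz clock for a DDR2-1066 channel
print(bit_window_ns(clk, double_data_rate=True))   # ~0.94 ns per data bit
print(bit_window_ns(clk, double_data_rate=False))  # ~1.88 ns per address/command bit
```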
Some embedded configurations may use more than one rank of SDRAMs (see Figure 3). A rank is defined as an independent set of SDRAMs that can be accessed using the full databus width but shares the Address and Command channel with all other ranks. Each rank has a unique chip select control signal to identify the specific rank being accessed. Ranks cannot be accessed simultaneously as they share the same data path to the SoC. Multiple ranks allow the memory sub-system to achieve higher performance: one Address and Command bus controls independent memory ranks, which avoids many of the system timing limits in play when a single rank is used.
Figure 3: Embedded Application Using Two Ranks of SDRAMs with Separate Chip Selects (CS0*, CS1*) Sharing the Address/Command Bus
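Chip select decoding itself is straightforward. The following hypothetical sketch (the contiguous rank mapping is one common choice, not the only one) shows how a controller might pick CS0* or CS1* from a flat physical address:

```python
def decode_rank(address, rank_size_bytes, num_ranks=2):
    """Pick the chip select for a flat physical address, assuming ranks
    are stacked contiguously in the address map; interleaved mappings
    are an equally valid design choice."""
    rank = address // rank_size_bytes
    assert rank < num_ranks, "address beyond populated memory"
    return rank  # 0 drives CS0*, 1 drives CS1*

rank_size = 256 * 1024 * 1024  # two ranks of 256MB each
print(decode_rank(0x0400_0000, rank_size))  # 0 -> CS0* asserted
print(decode_rank(0x1400_0000, rank_size))  # 1 -> CS1* asserted
```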
While not common in consumer applications, networking and computing applications often use one or more single- or dual-rank DIMMs (see Figure 4 and Figure 5). Unbuffered DIMMs are generally preferred over their Registered counterparts, as the registers used in RDIMMs add an additional cycle of latency and are generally more expensive. However, RDIMMs do have the advantage of providing intermediate buffering of the Address and Command signals and the clocks (via on-DIMM PLLs) to limit the fan-out from the memory controller.
Figure 4: Embedded Application Using a Single-Rank DIMM

Figure 5: Embedded Application Using a Dual-Rank DIMM (CS0*, CS1*)
The conclusion is that an embedded memory controller may have to support a number of different memory configurations depending on how the end customer wants to use the SoC. Without any standard for the memory controller, the SoC developer needs to ensure that the DRAMs will be driven properly in any configuration. Key features to look for in this regard include flexibility of output drive impedance, flexibility of the I/O configuration to match the number of I/Os required to the number of ranks, flexibility to interface to DIMMs or individual components, etc.
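One way to think about that flexibility is as a configuration record the controller must expose. The sketch below is purely illustrative; the field names and legal values are assumptions, not any specific controller's interface:

```python
from dataclasses import dataclass

@dataclass
class DdrPhyConfig:
    """Hypothetical settings an embedded DDR controller/PHY should expose.
    Names and value ranges are illustrative only."""
    drive_ohms: int    # output driver impedance, e.g. 34, 40, 60
    odt_ohms: int      # on-die termination in the PHY, e.g. 60, 120
    num_ranks: int     # number of chip selects to drive
    dimm_based: bool   # True for DIMMs, False for soldered-down components
    cmd_timing: str    # "1T" or "2T" Address/Command timing

# Lightly loaded, soldered-down embedded system: weaker drive, higher ODT
embedded = DdrPhyConfig(drive_ohms=60, odt_ohms=120, num_ranks=1,
                        dimm_based=False, cmd_timing="1T")
```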
4. Proper Termination and Drive Strength Are Key for Power and Performance
DDR2 and DDR3 SDRAMs offer a host of programmable options for the drive strength of the output buffers and for the on-die termination impedance, and the embedded memory controller will offer similar choices. The DRAM data sheets and JEDEC standards clearly outline the options available and how to program the settings, but how is a typical SoC designer supposed to choose the optimum settings? Termination requires a lot of care in component-based, embedded applications, and it is an area where there are frequent misunderstandings. Many engineers read the JEDEC specs and deduce that a DDR3 controller should use a 34 ohm output drive. The problem is that they are reading the spec for the DDR3 DRAM, not the memory controller (and remember, there is no standard for the memory controller). Recall that DRAM standards are developed with PCs in mind and that PCs use DIMMs. In a system that uses DIMMs, the 34 ohm output drive setting for the DRAM is the optimal choice. DDR DIMMs use a series termination resistor that isolates the DIMM stub from the memory channel transmission line; specifically, DDR3 DIMMs use a 15 ohm series termination. In a typical PC system, 34 plus 15 ohms equals 49 ohms, which is almost a perfect match for the characteristic impedance of the loaded transmission lines, typically 50-60 ohm traces.
So how is an embedded system developer to manage all of these choices? First of all, for simple point-to-point systems, series termination resistors are not typically required. The result is that the 34 ohm output drive that was optimal for the PC may no longer be the right choice for an embedded system; it may be too strong. In addition, without the 15 ohm series termination provided by a DIMM, a 34 ohm output drive will require termination matched to the transmission line (~50-60 ohms) to avoid reflections, which can result in a high-power system. It may be better to raise the output drive impedance to 50-60 ohms (matching the impedance of the transmission line to absorb reflections) and then use a higher ODT setting such as 120 ohms to reduce the power dissipation in the termination. The ideal settings for drive strength and ODT will also depend on the clock frequency to ensure that inter-symbol interference (ISI) effects are not introduced. Figure 6 contrasts two embedded DDR3 systems: one that uses 34 ohm drive strength in the controller with a 60 ohm ODT in the DRAM, and one that uses 60 ohm drive strength with a 120 ohm ODT. In each case, the Write data eye as seen at the DRAM pins is shown. Notice that the optimized system creates smaller reflections, cleaner edge transitions, a comparable data eye width and, most importantly, 37% lower overall power dissipation.
                           Standard JEDEC               Optimized embedded
Output drive strength      34 ohms                      60 ohms
DRAM ODT                   60 ohms                      120 ohms
VSWING                     1V                           1V
Valid data eye             743ps                        741ps
Reflections                Significant                  Absorbed by driver
DQS jitter                 Reflections timed with       Less DQS jitter
                           DQS transitions = jitter
Address driver power       7.4mW/bit                    7.0mW/bit
Address termination power  3.9mW/bit                    2.0mW/bit
DQ driver power            6.6mW/bit                    4.2mW/bit
DQ termination power       10mW/bit                     5.2mW/bit
Power, 32-bit channel      950mW                        600mW (37% savings)
Figure 6: Optimized Output Driver and ODT Settings (DDR3-1066) Result in a Significant Power Savings
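The impedance arithmetic behind this comparison can be checked with the standard reflection coefficient, rho = (Zt - Z0)/(Zt + Z0). A minimal sketch, assuming a 50 ohm loaded trace:

```python
def reflection_coeff(z_term, z_line):
    """Fraction of an incident wave reflected at a termination:
    rho = (Zt - Z0) / (Zt + Z0). Zero means a perfect match."""
    return (z_term - z_line) / (z_term + z_line)

Z0 = 50.0  # assumed loaded trace impedance (typically 50-60 ohms)

# PC/DIMM case: 34 ohm driver plus the DIMM's 15 ohm series resistor
print(reflection_coeff(34 + 15, Z0))  # ~-0.01, nearly perfect match
# Same 34 ohm driver without the series resistor (embedded, no DIMM)
print(reflection_coeff(34, Z0))       # ~-0.19, noticeable re-reflection
# Raising the driver impedance to 60 ohms improves the source-end match
print(reflection_coeff(60, Z0))       # ~+0.09
```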
To support optimized embedded applications, embedded memory controllers should have many options for output driver strength and ODT. Ideally, the memory controller should be more programmable than the DRAMs to enable engineers to select the optimal value based on their clock frequency and power requirements. In every case, a signal integrity analysis should be performed to allow designers to find optimal output drive impedance and ODT values for their system.
Address/Command bus terminations consume power whenever the memory controller is driving the Address/Command bus. Making the power dissipation worse, DDR SDRAMs do not always allow their inputs to be high impedance (Hi-Z); their input buffers are only turned off in a power-down or self-refresh mode. DIMM-based designs present even more challenges for the Address/Command bus. If a design has two UDIMMs with two ranks on each UDIMM, a DDR application could result in an Address/Command bus with 32 or 36 loads. The heavy loading on this bus slows down the edges of the signal transitions and results in ISI problems, as the signals do not reach their steady-state potential in one clock period, effectively clipping the signal. One solution is to implement 2T (two-period) command timing, which allows more setup and hold time for the signals. In this case, there is one Address/Command transfer for every two cycles, and the signals are clocked on every other rising clock edge. This reduces timing uncertainty, but each Address and Command signal now needs to be held for two cycles.
Finally, many designers would like to simply leave the Address/Command bus unterminated to save power. At very low frequencies this may work, but more often than not, an unterminated Address/Command bus leads to overshoot and undershoot that exceeds the DRAM data sheet specifications. The result may not be a system that fails immediately, but one with reduced long-term reliability as the overshoot/undershoot slowly damages the DRAM. The address eye patterns in Figure 7 compare the received eye at the address ball of the DRAM for a typical system operating at 1066 Mbps for both a terminated and an unterminated net. The terminated net shows an eye opening that is over 300ps wider than the unterminated net. More importantly, the unterminated net has overshoot and undershoot that far exceeds the JEDEC standard specification and vendor data sheets (for DDR3, the JEDEC standard and DRAM data sheets specify an absolute maximum overshoot/undershoot of VDDQ+0.4V/-0.4V for Address/Command/Control pins).
Figure 7: DDR3 Address Eyes, 50 Ohm Termination to VTT (left), Unterminated (right)
The type of package is also important. Wire bond packages have higher lead inductance and require more power and ground pins than flip-chip packages to avoid current-induced power rail collapse when all of the signals switch in the same direction. Flip-chip packages do not use highly inductive wires to deliver power to the die; as a result, designers may be able to use half the number of power and ground connections with a flip-chip package compared to a wire bond package. Of course, many design teams would prefer to use a wire bond package because of its inherently lower cost. Sufficient on-die decoupling capacitance is also a design criterion that is more critical in wire bond applications. The power planes in the package are also important: in addition to having enough pads on the SoC, it is important to have a sufficient number of balls on the package, and enough vias to connect those balls to the power plane, to avoid starving the power supply to the device.
Figure 8: Example of System Simulation versus Measurement for DDR2-800 Write Data Eyes
For a more complete treatment of signal integrity analysis for DDR interfaces, refer to the Synopsys white paper "Meeting Timing Budgets for DDR Memory Interfaces."
Figure 9 shows how DDR3 PHY power dissipation decreases significantly as the ODT values programmed into the DDR SDRAM are increased.
Figure 9: DDR3 PHY Power Dissipation versus the ODT Value Programmed into the DDR SDRAM
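The trend in Figure 9 follows from a simple resistor-divider model: with center-tap termination to VTT = VDDQ/2, the DC current falls as the termination resistance rises. The sketch below is a first-order approximation only; it ignores switching and receiver power:

```python
def static_term_power_mw(vddq, r_drive, r_odt):
    """Approximate DC power (mW) per pin while driving a low level
    against a center-tap termination to VTT = VDDQ/2:
    I = VTT / (Rdrive + Rodt), P = VTT * I. A trend, not a prediction."""
    vtt = vddq / 2.0
    return 1000.0 * vtt ** 2 / (r_drive + r_odt)

VDDQ = 1.5  # DDR3 supply
for odt in (40, 60, 120):
    print(odt, round(static_term_power_mw(VDDQ, 34, odt), 2))
# 40 -> ~7.6 mW, 60 -> ~5.98 mW, 120 -> ~3.65 mW: power falls as ODT rises
```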
DDR PHY power is also proportional to bus activity and to the type of activity on the bus (e.g., Read, Write or idle). Due to the ODT that is active in the DDR PHY during Reads from the DDR SDRAMs, DDR PHYs typically consume more power during a Read than during a Write, when the data is driven off-chip to the DRAMs (this is the reverse of an unterminated interface, which typically consumes more power in the off-chip drivers during a Write). Figure 10 shows how the power consumption of a DDR3 PHY can vary according to the Read/Write/idle percentage. The higher the Read percentage, the higher the overall power consumption.
Figure 10: PHY Power Increases as the Percentage of Read Operations Increases
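The behavior in Figure 10 can be modeled as a traffic-weighted average of per-state power. In the sketch below, the per-state milliwatt values are illustrative placeholders, not measured data; only the shape of the trend matters:

```python
def phy_power_mw(write_pct, read_pct, idle_pct,
                 p_write=450.0, p_read=600.0, p_idle=150.0):
    """Average PHY power as a traffic-weighted mix. The per-state values
    are placeholders; Reads cost more than Writes here because the PHY's
    own ODT is active during Reads."""
    assert write_pct + read_pct + idle_pct == 100
    return (write_pct * p_write + read_pct * p_read + idle_pct * p_idle) / 100.0

print(phy_power_mw(60, 20, 20))  # write-heavy mix: 420.0 mW
print(phy_power_mw(20, 60, 20))  # read-heavy mix draws more: 480.0 mW
```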
One key advantage to licensing DDR interface IP versus building your own is the ability to accurately estimate the power dissipation of the interface very early in the design process.
9. Using DIMMs Often Limits Your Performance; Beware of the DDR3 UDIMM
We have already explored how the loading of a large number of DRAMs and ISI can impact the performance of your DIMM-based system. As data rates have increased, the number of DIMMs that can be supported on a single channel has decreased over time, as each UDIMM on the channel typically adds at least 8 DRAM loads to the Address and Command channel. As the Address and Command bus becomes more heavily loaded, the signal edges slow down, causing ISI effects. If you review what has been possible throughout the history of PC chipsets and look towards the future, you will arrive at a chart resembling Figure 11. A useful rule of thumb for embedded designs is that if the PC chipsets can't do it, then you will not be able to either! Today's PC chipsets use flip-chip packaging, and the PCB is a highly engineered design that is unlikely to be out-designed in an embedded application. If your system will not handle your preferred Unbuffered DIMM loading, then you should consider Registered DIMMs to reduce the fan-out of the Address/Command channel.
DDR3 UDIMMs present a unique challenge not seen previously with DDR memory channels. As shown on the left of Figure 12, DDR2 UDIMMs use a standard "T" routing topology in which the Address and Command net is routed to every DRAM on the DIMM with equal track length, resulting in the same flight time so the signals arrive at each DRAM synchronously. This minimizes data-to-data skew across the 64/72-bit data word. In the case of DDR2, the termination is located on the motherboard at the end of the net. DDR3 DIMMs use a new technique for routing the Address and Command bus called fly-by routing. As shown on the right of Figure 12, fly-by routing addresses the DDR3 SDRAMs from one side of the UDIMM to the other, and no attempt is made to match the flight times to each DRAM. Write leveling is used by the memory controller to zero out the DRAM-to-DRAM skew caused by the signals' flight time down the UDIMM.
Perhaps more significantly, the DDR3 UDIMM's address and control nets are terminated on the UDIMM itself, not on the motherboard as with DDR2. The result is that any DDR3 UDIMM-based system that uses two UDIMMs per channel will have two terminations per net in parallel. Not only does this increase power consumption, as the termination impedance is effectively cut in half, but the data eyes for the nets will be much more limited in their swing. Fundamentally, any DDR3 UDIMM-based system using two UDIMMs will have to take special precautions to enable the Address and Command channel to function properly. These include using 2T timing on the Address and Command bus (each command uses two clock cycles, with the first cycle used for extra-long setup time of the signals) or making a copy of the Address and Command bus (including bond pads on the SoC) for every individual DDR3 UDIMM that is supported. Micron Technology has published an excellent Technical Note on this subject titled "Design Guide for Two DDR3-1066 UDIMM Systems." Micron also offers many other useful Technical Notes on its website, www.micron.com.
Figure 12: DDR2 UDIMMs (Left) versus DDR3 UDIMMs (Right) Highlighting the Different Routing and Termination Techniques
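The two-UDIMM termination penalty is ordinary parallel-resistor arithmetic. A minimal sketch, assuming both UDIMMs program the same nominal Rtt on the shared net (the 39 ohm value is an assumption for illustration):

```python
def parallel(*resistances):
    """Equivalent resistance of terminations in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)

RTT = 39.0  # ohms, an assumed on-DIMM address/control termination value
print(parallel(RTT))        # 39.0 ohms with one UDIMM populated
print(parallel(RTT, RTT))   # 19.5 ohms with two: effective termination halves
# Halving the termination doubles the drive current needed for the same
# signal swing, which is why two DDR3 UDIMMs per channel need special care.
```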
Conclusion
With a wide array of choices in DRAMs, memory configurations, and memory controller IP, there is certain to be a good solution for almost any SoC design. Successful design with DRAM calls for a good understanding of the DRAM systems, the DRAM market, and a willingness to challenge common misunderstandings about the requirements for DRAM controllers. As noted in this paper, embedded applications are not prominent on the radar screen for DRAM makers, whose primary customers are in the PC market. Designers of embedded applications must therefore take extra steps to become informed customers in order to meet their system requirements.
Synopsys, Inc. 700 East Middlefield Road Mountain View, CA 94043 www.synopsys.com 2009 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is available at http://www.synopsys.com/copyright.html. All other names mentioned herein are trademarks or registered trademarks of their respective owners. 08/09.CE.09-17767.