Tutorial On FPGA
PROGRAMMABLE DEVICES
Architectures: A Tutorial
RECENTLY, the development of new types of sophisticated field-programmable devices (FPDs) has dramatically changed the process of designing digital hardware. Unlike previous generations of hardware technology in which board-level designs included large numbers of SSI (small-scale integration) chips containing basic gates, virtually every digital design produced today consists mostly of high-density devices. This is true not only of custom devices such as processors and memory but also of logic circuits such as state machine controllers, counters, registers, and decoders. When such circuits are destined for high-volume systems, designers integrate them into high-density gate arrays. However, the high nonrecurring engineering costs and long manufacturing time of gate arrays make them unsuitable for prototyping or other low-volume scenarios. Therefore, most prototypes and many production designs now use FPDs. The most compelling advantages of FPDs are low startup cost, low financial risk, and, because the end user programs the device, quick manufacturing turnaround and easy design changes.
STEPHEN BROWN JONATHAN ROSE
The FPD market has grown over the past decade to the point where there is now a wide assortment of devices to choose from. To choose a product, designers face the daunting task of researching the best uses of the various chips and learning the intricacies of vendor-specific software. Adding to the difficulty is the complexity of the more sophisticated devices. To help sort out the confusion, we provide an overview
0740-7475/96/$05.00 © 1996 IEEE
Figure 1. PAL structure.
plane output to produce the logical sum of any AND plane output. With this structure, PLAs are well-suited for implementing logic functions in sum-of-products form. They are also quite versatile, since both the AND and OR terms can have many inputs (product literature often calls this feature "wide AND and OR gates").
When Philips introduced PLAs in the early 1970s, their main drawbacks were expense of manufacturing and somewhat poor speed performance. Both disadvantages arose from the two levels of configurable logic; programmable logic planes were difficult to manufacture and introduced significant propagation delays. To overcome these weaknesses, Monolithic Memories (MMI, later merged with Advanced Micro Devices) developed PAL devices. As Figure 1 shows, PALs feature only a single level of programmability: a programmable, wired-AND plane that feeds fixed-OR gates. To compensate for the lack of generality incurred by the fixed-OR plane, PALs come in variants with different numbers of inputs and outputs and various sizes of OR gates. To implement sequential circuits, PALs usually contain flip-flops connected to the OR gate outputs.
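The PAL structure described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the function names `product_term` and `pal_output` and the example programming are hypothetical and do not correspond to any vendor's fuse-map format. Each product term is a programmable selection of literals (the AND plane), and each output ORs a fixed group of product terms.

```python
# Hypothetical model of a PAL: programmable AND plane, fixed OR gates.
# A product term maps input names to the value each literal requires
# (True = x, False = NOT x); inputs absent from the map are not connected.

def product_term(term, inputs):
    """AND together the literals selected by the programmed connections."""
    return all(inputs[name] == required for name, required in term.items())

def pal_output(terms, inputs):
    """Fixed OR gate: the output is 1 if any of its product terms is 1."""
    return any(product_term(t, inputs) for t in terms)

# Example programming (an assumption for illustration): f = a XOR b
xor_terms = [
    {"a": True, "b": False},   # a AND (NOT b)
    {"a": False, "b": True},   # (NOT a) AND b
]

truth_table = [
    (a, b, pal_output(xor_terms, {"a": a, "b": b}))
    for a in (False, True)
    for b in (False, True)
]
```

Because the OR gate is fixed, a function that needs more product terms than the gate provides does not fit that output, which is why PALs come in variants with different OR gate sizes.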
The introduction of PAL devices profoundly affected digital hardware design, and they are the basis of some of the newer, more sophisticated architectures that we will describe shortly. Variants of the basic PAL architecture appear in several products known by various acronyms. We group all small FPDs, including PLAs, PALs, and PAL-like devices, into the single category of simple programmable-logic devices (SPLDs), whose most important characteristics are low cost and very high pin-to-pin speed performance.
SUMMER 1996
Figure 2. FPGA structure.
Figure 3. FPD logic capacities. (Legend entries: Altera Flex 10K, AT&T ORCA 2; Altera Max 9000; Altera Max 7000, AMD Mach, Lattice (p)LSI, Cypress Flash370, Xilinx XC9500.)
Advances in technology have produced devices with higher capacities than SPLDs. The difficulty with increasing a strict SPLD architecture's capacity is that the programmable-logic plane structure grows too quickly as the number of inputs increases. The only feasible way to provide large-capacity devices based on SPLD architectures is to programmably interconnect multiple SPLDs on a single chip. Many FPD products on the market today have this basic structure and are known as complex programmable-logic devices (CPLDs).
Altera pioneered CPLDs, first in their Classic EPLD chips, and then in the Max 5000, 7000, and 9000 series. Because of a rapidly growing market for large FPDs, other manufacturers developed CPLD devices, and many choices are now available. CPLDs provide logic capacity up to the equivalent of about 50 typical SPLD devices, but extending these architectures to higher densities is difficult. Building FPDs with very high logic capacity requires a different approach.
The highest capacity general-purpose logic chips available today are the traditional gate arrays sometimes referred to as mask-programmable gate arrays (MPGAs). An MPGA consists of an array of prefabricated transistors customized for the user's logic circuit by means of wire connections. Because the silicon foundry performs customization during chip fabrication, the manufacturing time is long, and the user's setup cost is high.
Although MPGAs are clearly not FPDs, we mention them here because they motivated the design of the field-programmable equivalent, FPGAs. Like MPGAs, an FPGA consists of an array of uncommitted circuit elements (logic blocks) and interconnect resources, but the end user configures the FPGA through programming. Figure 2 shows a typical FPGA architecture. As the only type of FPD that supports very high logic capacity, FPGAs have engendered a major shift in digital-circuit design.
Figure 3 illustrates the logic capacities available in each FPD category. "Equivalent gates" refers loosely to the number of two-input NAND gates. The chart serves as a guide for selecting a device for an application according to the logic capacity needed. However, as we explain later, each type of FPD is inherently better suited for some applications than for others. There are also special-purpose devices optimized for specific applications (for example, state machines, analog gate arrays, large interconnection problems). Since such devices have limited use, we do not describe them here.
User-programmable switch technologies
User-programmable switches are the key to user customization of FPDs. The first user-programmable switch developed was the fuse used in PLAs. Although some smaller devices still use fuses, we will not discuss them here because newer technology is quickly replacing them. For higher density devices, CMOS dominates the IC industry, and different approaches to implementing programmable switches are necessary. For CPLDs, the main switch technologies (in commercial products) are floating gate transistors like those used in EPROM (erasable programmable read-only memory) and EEPROM (electrically erasable PROM). For FPGAs, they are SRAM (static RAM) and antifuse. Table 1 lists the most important characteristics of these programming technologies.

Table 1. Summary of FPD programming technologies.
Switch type   Reprogrammable?        Volatile?   Technology
Fuse          No                     No          Bipolar
EPROM         Yes (out of circuit)   No          UVCMOS
EEPROM        Yes (in circuit)       No          EECMOS
SRAM          Yes (in circuit)       Yes         CMOS
Antifuse      No                     No          CMOS+
To use an EPROM or EEPROM transistor as a programmable switch for CPLDs (and many SPLDs), the manufacturer places the transistor between two wires to facilitate implementation of wired-AND functions. Figure 4 shows EPROM transistors connected in a CPLD's AND plane. An input to the AND plane can drive a product wire to logic level 0 through an EPROM transistor, if that input is part of the corresponding product term. For inputs not involved in a product term, the appropriate EPROM transistors are programmed as permanently turned off. The diagram of an EEPROM-based device would look similar to the one in Figure 4.
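The wired-AND behavior described above can be modeled roughly as follows. This is a sketch under explicit assumptions, not a circuit-accurate description: it assumes the product wire is pulled up to logic 1, that an active transistor pulls it to 0 when the wire driving it is at logic 1, and that each input is available in true and complemented form. The names `product_wire` and `and_plane_wires` are hypothetical.

```python
def product_wire(input_wires, enabled):
    """Level of a pulled-up product wire in a wired-AND plane.

    input_wires: dict of wire name -> bool level on the wire driving a transistor.
    enabled: set of wire names whose transistor is left active; every other
             transistor is programmed permanently off.
    Assumption: an active transistor pulls the product wire to 0 when its
    driving wire is at logic 1.
    """
    pulled_low = any(input_wires[name] for name in enabled)
    return not pulled_low

def and_plane_wires(a, b):
    # Assumption: each input reaches the plane in true and complemented form.
    return {"a": a, "a_n": not a, "b": b, "b_n": not b}

# Product term a AND (NOT b): leave active the transistors on the wires that
# must be 0 for the term to be true, that is, a_n and b.
term_enabled = {"a_n", "b"}

results = {
    (a, b): product_wire(and_plane_wires(a, b), term_enabled)
    for a in (False, True)
    for b in (False, True)
}
```

Only the input combination a = 1, b = 0 leaves the product wire high, which is the wired-AND of the two selected literals.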
Although no technical reason prevents application of EPROM or EEPROM to FPGAs, current commercial FPGA products use either SRAM or antifuse technologies. The example of SRAM-controlled switches in Figure 5 illustrates two applications, one to control the gate nodes of pass-transistor switches and the other, the select lines of multiplexers that drive logic block inputs. The figure shows the connection of one logic block (represented by the AND gate in the upper left corner) to another through two pass-transistor switches and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass transistors, multiplexers, or both depends on the particular product.
Figure 4. EPROM programmable switches.
Figure 5. SRAM-controlled programmable switches.
Antifuses are originally open circuits that take on low resistance only when programmed. Antifuses are manufactured using modified CMOS technology. As an example, Figure 6 depicts Actel's PLICE (programmable-logic interconnect circuit element) antifuse structure.1 The antifuse, positioned between two interconnect wires, consists of three sandwiched layers: conductors at top and bottom and an insulator in the middle. Unprogrammed, the insulator isolates the top and bottom layers; programmed, the insulator becomes a low-resistance link.
Figure 6. Actel's PLICE antifuse structure.
Figure 7. CAD process for SPLDs.
PLICE uses polysilicon and n+ diffusion as conductors and a custom-developed compound, ONO (oxide-nitride-oxide),1 as an insulator. Other antifuses rely on metal for conductors, with amorphous silicon as the middle layer.2,3
CAD for FPDs
Computer-aided design programs are essential in designing circuits for implementation in FPDs. Such software tools are important not only for CPLDs and FPGAs, but also for SPLDs. A typical CAD system for SPLDs includes software for the following tasks: initial design entry, logic optimization, device fitting, simulation, and configuration. Figure 7 illustrates the SPLD design process. To enter a design, the designer creates a schematic diagram with a graphical CAD tool, describes the design in a simple hardware description language, or combines these methods. Since initial logic entry is not usually in an optimized form, the system applies algorithms to optimize the circuits. Then additional algorithms analyze the resulting logic equations and fit them into the SPLD. Simulation verifies correct operation, and the designer returns to the design entry step to fix errors. When a design simulates correctly, the designer loads it into a programming unit to configure an SPLD. In most CAD systems, the designer performs the original design entry step manually, and all other steps are automatic.
The steps involved in CPLD design are similar to those for SPLDs, but the CAD tools are more sophisticated. Because the devices are complex and can accommodate large designs, it is more common to use different design entry methods for different modules of a circuit. For instance, the designer might use a small hardware description language such as ABEL for some modules, a symbolic schematic capture tool for others, and a full-featured hardware description language such as VHDL for still others. Also, the device-fitting process may require steps similar to those described next for FPGAs, depending on the CPLD's sophistication. Either the CPLD manufacturer or a third party supplies the necessary software for these tasks.
The FPGA design process is similar to that of CPLDs but requires additional tools to support increased chip complexity. The major difference is in device fitting, for which FPGAs need at least three tools: a technology mapper to transform basic logic gates into the FPGA's logic blocks, a placement tool to choose the specific logic blocks, and a router to allocate wire segments to interconnect the logic blocks. With this added complexity, the CAD tools take a fairly long time (often more than an hour, or even several hours) to complete their tasks.
Commercially available FPDs
This overview provides examples of commercial FPD products and their applications. We encourage readers interested in more details to contact the manufacturers or distributors for the latest data sheets. Most FPD manufacturers provide data sheets on the World Wide Web at name. com.
SPLDs. As a staple of digital hardware designers for the past two decades, SPLDs are very important devices. They have the highest speed performance of all FPDs and are inexpensive. Because they are straightforward and well understood, we discuss them only briefly here.
IEEE DESIGN & TEST OF COMPUTERS
Two of the most popular SPLDs are the AMD (Advanced Micro Devices) 16R8 and 22V10 PALs. Both of these devices are industry standards, widely second-sourced by other companies. The designation 16R8 means that the PAL has a maximum of 16 inputs (eight dedicated inputs and eight input/outputs) and a maximum of eight outputs, and that each output is registered (R) by a D flip-flop. Similarly, the 22V10 has a maximum of 22 inputs and ten outputs. The V means versatile; that is, each output can be registered or combinational.
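The naming convention explained above (maximum inputs, output type letter, maximum outputs) can be decoded mechanically. The following is a small sketch; the function name `parse_pal_name` and the dictionary layout are hypothetical, and only the two letters explained in the text (R and V) are given a description.

```python
import re

# Output-type letters explained in the text; any other letter is reported as-is.
OUTPUT_TYPES = {
    "R": "registered",
    "V": "versatile (registered or combinational)",
}

def parse_pal_name(name):
    """Split a designation such as '16R8' into its three fields."""
    match = re.fullmatch(r"(\d+)([A-Z])(\d+)", name)
    if match is None:
        raise ValueError(f"not a recognized PAL designation: {name!r}")
    inputs, letter, outputs = match.groups()
    return {
        "max_inputs": int(inputs),
        "output_type": OUTPUT_TYPES.get(letter, letter),
        "max_outputs": int(outputs),
    }

pal_16r8 = parse_pal_name("16R8")
pal_22v10 = parse_pal_name("22V10")
```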
Another widely used and second-sourced SPLD is the Altera Classic EP610. This device is similar in complexity to PALs, but offers more flexibility in the production of outputs and has larger AND and OR planes. The EP610's outputs can be registered, and the flip-flops are configurable as D, T, JK, or SR.
Many other SPLD products are available from a wide array of companies. All share common characteristics such as logic planes (AND, OR, NOR, or NAND), but each offers unique features suitable for particular applications. A partial list of companies that offer SPLDs includes AMD, Altera, ICT, Lattice, Cypress, and Philips-Signetics. The complexity of some of these SPLDs approaches that of CPLDs.
CPLDs. As we said earlier, CPLDs consist of multiple SPLD-like blocks on a single chip. However, CPLD products are much more sophisticated than SPLDs, even at the level of their basic SPLD-like blocks. In the following descriptions, we present sufficient details to compare competing products, emphasizing the most widely used devices.
Altera Max. Altera has developed three families of CPLD chips: Max 5000, 7000, and 9000. We focus on the 7000 series because of its wide use and state-of-the-art logic capacity and speed performance. Max 5000 represents an older technology that offers a cost-effective solution; Max 9000 is similar to Max 7000 but offers higher logic capacity (the industry's highest for CPLDs).
Figure 8 depicts the general architecture of the Altera Max 7000 series. It consists of an array of logic array blocks and a set of interconnect wires called a programmable interconnect array (PIA). The PIA can connect any logic array block input or output to any other logic array block. The chip's inputs and outputs connect directly to the PIA and to logic array blocks. A logic array block is a complex, SPLD-like structure, and so we can consider the entire chip an array of SPLDs.
Figure 8. Altera Max 7000 series architecture.
Figure 9. Altera Max 7000 logic array block.
Figure 10. Max 7000 macrocell.
Figure 11. AMD Mach 4 structure.
Figure 9 shows the structure of a logic array block. Each logic array block consists of two sets of eight macrocells (shown in Figure 10). A macrocell is a set of programmable product terms (part of an AND plane) that feeds an OR gate and a flip-flop. The flip-flops can be D, JK, T, or SR, or can be transparent. As Figure 10 shows, the product select matrix allows a variable number of inputs to the OR gate in a macrocell:
Any or all of the five product terms in the macrocell can feed the OR gate, which can have up to 15 extra product terms from macrocells in the same logic array block. This product-term flexibility makes the Max 7000 series more efficient in chip area than classic SPLDs, because typical logic functions need no more than five product terms, and the
between this block and a normal PAL: 1) a product term (PT) allocator between the AND plane and the macrocells (the macrocells comprise an OR gate, an EXOR gate, and a flip-flop), and 2) an output switch matrix between the OR gates and the I/O pins. These features make a Mach 4 chip easier to use because they decouple sections of the PAL-like block. More specifically, the product-term allocator distributes and shares product terms from the AND plane to OR gates that require them, allowing much more flexibility than the fixed-size OR gates in regular PALs. The output switch matrix enables any macrocell output (OR gate or flip-flop) to drive any I/O pin connected to the PAL-like block, again providing greater flexibility than a PAL, in which each macrocell can drive only one specific I/O pin. Mach 4's combination of in-system programmability and high flexibility allows easy hardware design changes.
Lattice pLSI and ispLSI. Lattice offers a complete range of CPLDs, with two main product lines: the pLSI and the ispLSI. Each consists of three families of EEPROM CPLDs with different logic capacities and speed performance. The ispLSI devices are in-system programmable.
Lattice's earliest generation of CPLDs is the pLSI and ispLSI 1000 series. Each chip consists of a collection of SPLD-like blocks and a global routing pool to connect the blocks. Logic capacity ranges from about 1,200 to 4,000 gates, and pin-to-pin delays are 10 ns. Lattice also offers the 2000 series, relatively small CPLDs with between 600 and 2,000 gates. The 2000 series features a higher ratio of macrocells to I/O pins and higher speed performance than the 1000 series. At 5.5-ns pin-to-pin delays, the 2000 series provides state-of-the-art speed.
Lattice's 3000 series consists of the company's largest CPLDs, with up to 5,000 gates and 10- to 15-ns pin-to-pin delays. Compared with the chips discussed so far, the functionality of the 3000 series is most similar to that of the Mach 4. Unlike the other Lattice CPLDs, the 3000 series offers enhancements to support more recent design styles, such as IEEE Std 1149.1 boundary scan.
Figure 12. Mach 4 34V16 PAL-like block.
Figure 13. Lattice pLSI and ispLSI architecture.
Figure 13 shows the general structure of a Lattice pLSI or ispLSI device. Around the chip's outside edges are bidirectional I/Os, which connect to both the generic logic blocks and the global routing pool. As the magnified view on the right side of the figure shows, the generic logic blocks are small PAL-like blocks consisting of an AND plane, a product term allocator, and macrocells. The global routing pool is a set of wires that span the chip to connect generic logic block inputs and outputs. All interconnects pass through the global routing pool, so timing between logic levels is fully predictable, as it is for the AMD Mach devices.
Cypress Flash370. Cypress has recently developed CPLD products similar to the AMD and Lattice devices in several ways. Cypress Flash370 CPLDs use flash EEPROM technology and offer speed performance of 8.5 to 15 ns pin-to-pin delays. The Flash370s are not in-system programmable. To meet the needs of larger chips, the devices provide more I/O pins than competing products, with a linear relationship between the number of macrocells and the number of bidirectional I/O pins. The smallest parts have 32 macrocells and 32 I/O pins; the largest have 256 macrocells and 256 pins.
Figure 14. Cypress Flash370 architecture. (PIM: programmable interconnect matrix.)
Figure 15. Altera Flashlogic CPLD: general architecture (a); CFB in PAL mode (b); CFB in SRAM mode (c).
Figure 14 shows that Flash370s have a typical CPLD architecture with multiple PAL-like blocks connected by a programmable interconnect matrix. Each PAL-like block contains an AND plane that feeds a product term allocator that
all other CPLDs: Instead of containing AND/OR logic, a CFB can serve as a 10-ns SRAM block. Figure 15b shows a CFB configured as a PAL, and Figure 15c shows another configured as an SRAM. In the SRAM configuration, the PAL block becomes a 128-word by 10-bit read/write memory. Inputs that would normally feed the AND plane in the PAL become address lines, data lines, or control signals for the memory. Flip-flops and tristate buffers are still available in the SRAM configuration.
In the Flashlogic device, the AND/OR logic plane's configuration bits are SRAM cells connected to EPROM or EEPROM cells. Applying power loads the SRAM cells with a copy of the nonvolatile EPROM or EEPROM, but the SRAM cells control the chip's configuration. The user can reconfigure the chips in system by downloading new information into the SRAM cells. The user can make the SRAM cell reprogramming nonvolatile by writing the SRAM cell contents back to the EPROM cells.
ICT PEEL Arrays. ICT PEEL (programmable, electrically-erasable logic) Arrays are large PLAs that include logic macrocells with flip-flops and feedback to the logic planes. Figure 16 illustrates this structure, which consists of a programmable AND plane that feeds a programmable OR plane. The OR plane's outputs are partitioned into groups of four, and each group can be input to any of the logic cells. The logic cells provide registers for the sum terms and can feed back the sum terms to the AND plane. Also, the logic cells connect sum terms to I/O pins.
Because they have a PLA-like structure, the logic capacity of PEEL Arrays is difficult to measure compared to the CPLDs discussed so far, but we estimate a capacity of 1,600 to 2,800 equivalent gates. Containing relatively few I/O pins, the largest PEEL Array comes in a 40-pin package. Since they do not consist of SPLD-like blocks, PEEL Arrays do not fit well into the CPLD category. Nevertheless, we include them here because they exemplify PLA-based (rather than PAL-based) devices and offer larger capacity than a typical SPLD.
Figure 16. ICT PEEL Array architecture.
The PEEL Array logic cell, shown in Figure 17, includes a flip-flop, configurable as D, T, or JK, and two multiplexers. Each multiplexer produces a logic cell output, either registered or combinational. One logic cell output can connect to an I/O pin, and the other output is buried. An interesting feature of the logic cell is that the flip-flop clock, preset, and clear are full sum-of-products logic functions. Distinguishing PEEL Arrays from all other CPLDs, which simply provide product terms for these signals, this feature is attractive for some applications. Because of their PLA-like OR plane, PEEL Arrays are especially well suited to applications that require very wide sum terms.
CPLD applications. Their high speeds and wide range of capacities make CPLDs useful for many applications, from implementing random glue logic to prototyping small gate arrays. An important reason for the growth of the CPLD market is the conversion of designs that consist of multiple SPLDs into a smaller number of CPLDs.
Figure 17. ICT PEEL Array logic cell structure.
CPLDs can realize complex designs such as graphics, LAN, and cache controllers. As a rule of thumb, circuits that can exploit wide AND/OR gates and do not need a large number of flip-flops are good candidates for CPLD implementation. Finite state machines are an excellent example of this class of circuits. A significant advantage of CPLDs is that they allow simple design changes through reprogramming (all commercial CPLD products are reprogrammable). In-system programmable CPLDs even make it possible to reconfigure hardware (for example, change a protocol for a communications circuit) without powering down.
Designs often partition naturally into the SPLD-like blocks in a CPLD, producing more predictable speed performance than a design split into many small pieces mapped into different areas of the chip. Predictability of circuit implementation is one of the strongest advantages of CPLD architectures.
FPGAs. As one of the fastest growing segments of the semiconductor industry, the FPGA marketplace is volatile. The pool of companies involved changes rapidly, and it is difficult to say which products will be most significant when the industry reaches a stable state. We focus here on products currently in widespread use. In describing each device, we list its capacity in two-input NAND gates as given by the vendor. Gate count is an especially contentious issue in the FPGA industry, and so the numbers given should not be taken too seriously. In fact, wags have coined the term "dog gates," a reference to the often-cited ratio between human and dog years, to indicate the dubiousness of vendors' figures.
Figure 18. Xilinx XC4000 CLB.
Figure 19. Xilinx XC4000 wire segments.
The two basic categories of FPGAs on the market today are SRAM- and antifuse-based FPGAs. In the first category, Xilinx and Altera lead in number of users, their major competitor being AT&T. For antifuse-based products, Actel, Quicklogic, and Cypress are the leading manufacturers.
Xilinx FPGAs. Xilinx FPGAs have an array-based structure, each chip comprising a two-dimensional array of logic blocks interconnected by horizontal and vertical routing channels (see Figure 2). Xilinx introduced the first FPGA series, the XC2000, in about 1985 and now offers three more generations: XC3000, XC4000, and XC5000. Although the XC3000 devices are still widely used, we focus on the more recent and more popular XC4000 family. The XC4000 devices range in capacity from about 2,000 to more than 15,000 equivalent gates. The XC5000 family provides similar features at a more attractive price with some penalty in speed. Xilinx has recently announced an antifuse-based FPGA family, the XC8100. The XC8100 has many interesting features, but since it is not yet in widespread use, we do not discuss it here.
The XC4000 features a configurable logic block (CLB) based on lookup tables. A lookup table is a 1-bit-wide memory array; the memory address lines are logic block inputs, and the 1-bit memory output is the lookup table output. A lookup table with K inputs corresponds to a 2^K x 1-bit memory, and the user can realize any K-input logic function by programming the logic function's truth table directly into the memory. In the configuration shown in Figure 18, an XC4000 CLB contains two four-input lookup tables fed by CLB inputs, and a third lookup table fed by the other two. This arrangement allows the CLB to implement a wide range of logic functions of up to nine inputs, two separate four-input functions, or other possibilities. Each CLB also contains two flip-flops.
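The lookup-table idea lends itself to a short illustration. The Python sketch below models a K-input lookup table as a list of 2^K one-bit entries and then wires two four-input tables into a third, in the spirit of the arrangement described above. It is an assumption-laden model, not the actual XC4000 circuit: the function names are hypothetical, and the third table is given one extra direct input so that the block can cover nine inputs.

```python
def make_lut(k, func):
    """Program a K-input lookup table: store func's truth table in a
    2**K x 1-bit memory (a Python list indexed by address)."""
    memory = []
    for address in range(2 ** k):
        bits = [(address >> i) & 1 for i in range(k)]
        memory.append(func(*bits) & 1)
    return memory

def read_lut(memory, bits):
    """The logic inputs act as address lines; the stored bit is the output."""
    address = sum(bit << i for i, bit in enumerate(bits))
    return memory[address]

# Example function (chosen for illustration): 9-input parity.
F = make_lut(4, lambda a, b, c, d: a ^ b ^ c ^ d)   # first four-input table
G = make_lut(4, lambda a, b, c, d: a ^ b ^ c ^ d)   # second four-input table
H = make_lut(3, lambda f, g, x: f ^ g ^ x)          # third table: F, G, extra input

def clb_parity9(bits):
    """Evaluate the three-table arrangement on nine input bits."""
    f = read_lut(F, bits[0:4])
    g = read_lut(G, bits[4:8])
    return read_lut(H, [f, g, bits[8]])
```

Changing the function requires only different memory contents, not different wiring, which is what makes lookup tables attractive as a general-purpose logic block.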
The XC4000 chips have features designed to support the integration of entire systems. For instance, each CLB contains circuitry that allows it to efficiently perform arithmetic (that is, a circuit that implements a fast carry operation for adder-like circuits). Also, users can configure the lookup tables as read/write RAM cells. A new addition, the 4000E allows configuration as a dual-port RAM with a single write and two read ports, and RAM blocks can be synchronous RAM. Each XC4000 chip includes very wide AND planes around the periphery of the logic block array to facilitate implementation of circuit blocks such as wide decoders.
Besides its logic, the other key feature that distinguishes an FPGA is its interconnect structure. Horizontal and vertical channels characterize the XC4000 interconnect. Each channel contains short wire segments that span a single CLB (the number of segments in each channel varies for each member of the XC4000 family), longer segments that span two CLBs, and very long segments that span the chip's entire length or width. Programmable switches are available (see Figure 5) to connect CLB inputs and outputs to the wire segments or to connect one wire segment to another. A small section of an XC4000 routing channel appears in Figure 19. The figure shows only the wire segments in a horizontal channel, not the vertical routing channels, CLB inputs and outputs, and the routing switches. An important point about the Xilinx interconnect is that signals must pass through switches to reach one CLB from another, and the total number of switches traversed depends on the particular set of wire segments used. Thus, an implemented circuit's speed performance depends in part on how CAD tools allocate the wire segments to individual signals.
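To make the dependence on segment choice concrete, here is a rough Python sketch that counts the switches on a route assembled from wire segments. The counting rule is an assumption made for illustration (one switch from the source CLB onto the first segment, one between consecutive segments, and one into the destination CLB); it is not taken from Xilinx documentation, and `switches_traversed` is a hypothetical name.

```python
def switches_traversed(segment_lengths):
    """Number of programmable switches on a route built from the given
    wire segments, listed in order (lengths are in CLBs spanned).

    Assumed rule: one switch onto the first segment, one between each pair
    of consecutive segments, and one from the last segment into the
    destination CLB.
    """
    if not segment_lengths:
        raise ValueError("a route needs at least one wire segment")
    return len(segment_lengths) + 1

# Two ways to cover the same distance of four CLBs:
route_short = [1, 1, 1, 1]   # four single-length segments
route_long = [2, 2]          # two double-length segments

same_distance = sum(route_short) == sum(route_long)
short_count = switches_traversed(route_short)
long_count = switches_traversed(route_long)
```

Under this rule the route made of single-length segments passes through more switches than the route made of double-length segments, even though both cover the same distance, which is why the router's segment allocation affects speed.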
Altera Flex 8000 and Flex 10K.
Altera's Flex 8000 series combines FPGA and CPLD technologies. The devices consist of a three-level hierarchy much like that of CPLDs. However, the lowest level of the hierarchy is a set of lookup tables, rather than an SPLD-like block, and so we categorize the Flex 8000 as an FPGA. The SRAM-based Flex 8000 features a four-input lookup table as its basic logic block. Logic capacity of the 8000 series ranges from about 4,000 to more than 15,000 gates.
Figure 20 illustrates the overall Flex 8000 architecture. The basic logic block, called a logic element, contains a four-input lookup table, a flip-flop, and special-purpose carry circuitry for arithmetic circuits (similar to the Xilinx XC4000). The logic element also includes cascade circuitry that allows efficient implementation of wide AND functions. Figure 21 shows details of the logic element.
Figure 20. Altera Flex 8000 architecture.
Figure 21. Flex 8000 logic element.
This design groups logic elements into sets of eight, called logic array blocks (a term borrowed from Altera's CPLDs). As shown in Figure 22, each logic array block contains local interconnection, and each local wire can connect any logic element to any other logic element within the same logic array block. The local interconnect also connects to the Flex 8000's FastTrack global interconnect. Like the long wires in the Xilinx XC4000, each FastTrack wire extends the full width or height of the device. However, a major difference between Flex 8000 and Xilinx chips is that FastTrack consists only of long lines, making the Flex 8000 easy for CAD tools to configure automatically. All FastTrack horizontal wires are identical. Therefore, interconnect delays in the Flex 8000 are more predictable than in FPGAs that employ many shorter segments because the longer paths contain fewer programmable switches. Moreover, connections between horizontal and vertical lines pass through active buffers, further enhancing predictability.
Figure 22. Flex 8000 logic array block.
Figure 24. AT&T ORCA programmable-function unit.
The Flex 10K family offers all the Flex 8000 features with the addition of variable-size blocks of SRAM called embedded array blocks. As Figure 23 shows, each row of a Flex 10K chip has an embedded array block on one end. Users can configure each embedded array block to serve as an SRAM block with a variable aspect ratio: 256x8, 512x4, 1Kx2, or 2Kx1. Alternatively, CAD tools can configure an embedded array block to implement a complex logic circuit, such as a multiplier, by employing it as a large, multioutput lookup table. Altera CAD tools provide several macrofunctions that implement useful logic circuits in embedded array blocks. Counting the embedded array blocks as logic gates, Flex 10K offers the highest logic capacity of any FPGA, although obtaining an accurate number is difficult.
Figure 23. Altera Flex 10K architecture.
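The four aspect ratios all describe the same 2,048 bits of storage, and the 256x8 shape happens to fit a 4-bit by 4-bit multiplier exactly: eight address bits in, an eight-bit product out. The Python sketch below illustrates that idea. The 4x4 operand size and all names are assumptions chosen for illustration; the text says only that a multiplier can be implemented this way.

```python
EAB_BITS = 2048  # 256 x 8 = 512 x 4 = 1K x 2 = 2K x 1

# (words, bits per word) for each aspect ratio listed in the text.
ASPECT_RATIOS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]

def build_multiplier_table(n_bits=4):
    """Table contents: the address is {a, b} and the stored word is a * b."""
    size = 1 << n_bits
    return [a * b for a in range(size) for b in range(size)]

def multiply_by_lookup(table, a, b, n_bits=4):
    """Multiply by reading a single word from the table."""
    return table[(a << n_bits) | b]

# A 4x4 multiplier needs 256 words of 8 bits (largest product is 15 * 15 = 225).
TABLE = build_multiplier_table()
```

For example, `multiply_by_lookup(TABLE, 13, 11)` reads the word at address 219 and returns 143.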
AT&T ORCA. AT&T's SRAM-based FPGAs, called Optimized Reconfigurable Cell Arrays (ORCAs), feature an overall structure similar to that of Xilinx FPGAs. The ORCA logic block contains an array of programmable function units (Figure 24) based on lookup tables. A programmable function unit is unique among lookup-table-based logic blocks: It is configurable as four 4-input lookup tables, two 5-input lookup tables, or one 6-input lookup table. A key element of this architecture is that when the programmable function unit
units based on the original ORCA architecture.
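The three programmable function unit configurations named above (four 4-input, two 5-input, or one 6-input lookup table) all use the same 64 bits of table storage. The Python sketch below checks that arithmetic and shows one generic way a 6-input function can be evaluated from four 16-entry tables, by using two inputs to select a table. The selection scheme and all names are assumptions for illustration; the text does not describe how the ORCA hardware combines its tables.

```python
def lut_read(table, inputs):
    """Read a lookup table; inputs[0] is the most significant address bit."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | bit
    return table[addr]

# (number of tables, inputs per table) for each configuration in the text.
PFU_MODES = {"four 4-input": (4, 4), "two 5-input": (2, 5), "one 6-input": (1, 6)}

def mode_bits(count, n_inputs):
    """Total truth-table bits needed by a configuration."""
    return count * (1 << n_inputs)

def split_lut6(table64):
    """Slice a 64-entry table into four 16-entry tables, one per value of the top two inputs."""
    return [table64[i * 16:(i + 1) * 16] for i in range(4)]

def lut6_from_four_lut4(quarters, inputs):
    """Evaluate a 6-input function with four 4-input tables and a 4:1 select (assumed scheme)."""
    select = (inputs[0] << 1) | inputs[1]
    return lut_read(quarters[select], inputs[2:])

# Example function: 6-input odd parity.
PARITY6 = [bin(i).count("1") & 1 for i in range(64)]
```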
Actel FPGAs. Actel offers three main FPGA families: Act 1, Act 2, and Act 3. Although the three generations have similar features, we focus on the most recent devices. Unlike the FPGAs described so far, Actel's devices use antifuse technology and a structure similar to traditional gate arrays. Their design arranges logic blocks in rows with horizontal routing channels between adjacent rows (Figure 25). Actel logic blocks, based on multiplexers, are small compared to those based on lookup tables. Figure 26 illustrates the Act 3 logic block, which consists of an AND and an OR gate connected to a multiplexer-based circuit block. In combination with the two logic gates, the arrangement of the multiplexer circuit enables a single logic block to realize a wide range of functions. About half the logic blocks in an Act 3 device also contain a flip-flop.
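To illustrate how a multiplexer-based block can realize many functions, the Python sketch below models a 4:1 multiplexer whose two select lines are driven by an AND gate and an OR gate. This wiring is an assumption made for illustration and is not taken from Figure 26; the input names are hypothetical. Different functions result from tying the block's inputs to signals or to constants.

```python
def mux_block(d00, d01, d10, d11, a0, b0, a1, b1):
    """Hypothetical block: an AND gate and an OR gate drive the selects of a 4:1 multiplexer."""
    s0 = a0 & b0          # AND gate on the low select line
    s1 = a1 | b1          # OR gate on the high select line
    return (d00, d01, d10, d11)[(s1 << 1) | s0]

def and3(x, y, z):
    # The output follows z only when x AND y is 1; otherwise it is 0.
    return mux_block(0, z, 0, 0, x, y, 0, 0)

def xor2(x, y):
    # s0 = x and s1 = y; the data inputs hold the XOR truth table.
    return mux_block(0, 1, 1, 0, x, 1, y, 0)

def mux2(sel, p, q):
    # Plain 2:1 multiplexer: q when sel is 1, else p.
    return mux_block(p, q, 0, 0, sel, 1, 0, 0)
```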
Actel's horizontal routing channels consist of various-length wire segments with antifuses to connect logic blocks to wire segments or one wire to another. Although not shown in Figure 25, vertical wires also overlie the logic blocks, forming signal paths that span multiple rows. The speed performance of Actel chips is not fully predictable because the number of antifuses traversed by a signal depends on how CAD tools allocate the wire segments during circuit implementation. However, a rich selection of wire segment lengths in each channel and algorithms that guarantee strict limits on the number of antifuses traversed by any two-point connection improve speed performance significantly.
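The dependence of antifuse count on segment allocation can be sketched with a toy model of one routing channel. In the Python below, each track is a list of wire segments given as inclusive column ranges; a connection is assumed to need one antifuse at each end to reach the logic block pins and one more for every joint between adjacent segments. The channel layout, the counting rule, and the function names are all illustrative assumptions; vertical wires and segments already in use are ignored.

```python
def segments_used(track, lo, hi):
    """Segments of one track that a connection spanning columns lo..hi must occupy."""
    return [(s, e) for s, e in track if s <= hi and e >= lo]

def antifuse_count(track, src, dst):
    """Two pin-to-wire antifuses plus one wire-to-wire antifuse per segment joint (assumed rule)."""
    lo, hi = min(src, dst), max(src, dst)
    used = segments_used(track, lo, hi)
    return 2 + (len(used) - 1)

def best_track(channel, src, dst):
    """Pick the index of the track that minimises the antifuses traversed."""
    return min(range(len(channel)), key=lambda i: antifuse_count(channel[i], src, dst))

# A hypothetical 16-column channel with three tracks of different segment lengths.
CHANNEL = [
    [(0, 3), (4, 7), (8, 11), (12, 15)],   # short segments
    [(0, 7), (8, 15)],                     # medium segments
    [(0, 15)],                             # one full-length segment
]
```

In this model a connection from column 2 to column 13 traverses five antifuses on the short-segment track but only two on the full-length track, while a connection from column 1 to column 3 costs two antifuses on any track and so can be placed on a short segment.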
Figure 25. Actel FPGA structure (I/O blocks, routing channels, and logic block rows).
Quicklogic pASIC. Actel's main competitor in antifuse-based FPGAs is Quicklogic, which has two device families, pASIC and pASIC2. The pASIC, illustrated in Figure 27a, has similarities to several other FPGAs: Like Xilinx FPGAs, it has an array-based structure; like Actel FPGAs, its logic blocks use multiplexers; and like Altera Flex 8000s, its interconnect consists only of long lines. The pASIC2 is a recently introduced enhanced version, which we will not discuss here. Cypress also offers devices using the pASIC architecture, but we discuss only Quicklogic's version.
Figure 26. Actel Act 3 logic module (inputs, multiplexer-based circuit block, output).
Figure 27. Quicklogic pASIC structure (a) and ViaLink (b). The structure comprises logic cells, I/O blocks, and a ViaLink at every wire crossing.
Figure 28. Quicklogic pASIC logic cell.
Quicklogic's ViaLink antifuse structure (see Figure 27b) consists of a metal top layer, an amorphous-silicon insulating layer, and a metal bottom layer. Compared to Actel's PLICE antifuse, ViaLink offers very low on-resistance, about 50 ohms (PLICE's is about 300 ohms), and a low parasitic capacitance. ViaLink antifuses are present at every crossing of logic block pins and interconnect wires, providing generous connectivity. Figure 28 shows the pASIC multiplexer-based logic block. It is more complex than Actel's logic module, with more inputs and wide (six-input) AND gates on the multiplexer select lines. Every logic block also contains a flip-flop.
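As a rough feel for what the quoted on-resistances mean, the Python sketch below adds up the series resistance of the programmed antifuses along a path and converts it to a lumped RC time constant. Only the 50-ohm and 300-ohm figures come from the text; the antifuse count and the load capacitance are hypothetical inputs, and wire resistance and the differing parasitic capacitance of the two antifuse types are ignored.

```python
VIALINK_OHMS = 50    # approximate on-resistance quoted in the text
PLICE_OHMS = 300     # approximate on-resistance quoted in the text

def path_resistance(n_antifuses, ohms_per_antifuse):
    """Series resistance contributed by the programmed antifuses alone."""
    return n_antifuses * ohms_per_antifuse

def rc_delay_ns(n_antifuses, ohms_per_antifuse, load_pf):
    """Lumped RC time constant in nanoseconds; load_pf is a hypothetical load in picofarads."""
    # ohms * picofarads = 1e-12 seconds = 1e-3 nanoseconds
    return path_resistance(n_antifuses, ohms_per_antifuse) * load_pf * 1e-3
```

With three antifuses in series, the antifuses alone contribute 150 ohms in the ViaLink case and 900 ohms in the PLICE case, a factor of six for the same load.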
FPGA applications. FPGAs have gained rapid acceptance over the past decade because users can apply them to a wide range of applications: random logic, integrating multiple SPLDs, device controllers, communication encoding and filtering, small- to medium-size systems with SRAM blocks, and many more.
Another interesting FPGA application is prototyping designs to be implemented in gate arrays by using one or more large FPGAs. (A large FPGA corresponds to a small gate array in terms of capacity.) Still another application is the emulation of entire large hardware systems via the use of many interconnected FPGAs. Quickturn⁴ and others have developed products consisting of the FPGAs and software necessary to partition and map circuits for hardware emulation.
An application only beginning development is the use of FPGAs as custom computing machines. This involves using the programmable parts to execute software, rather than compiling the software for execution on a regular CPU. For information, we refer readers to the proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, held for the last four years.⁵
As mentioned earlier, pieces of designs often map naturally to the SPLD-like blocks of CPLDs. However, designs mapped into an FPGA break up into logic-block-size pieces distributed through an area of the FPGA. Depending on the FPGA's interconnect structure, the logic block interconnections may produce delays. Thus, FPGA performance often depends more on how CAD tools map circuits into the chip than does CPLD performance.
THE LOW COST OF FPDs makes them attractive to small firms and large companies alike. Their fast manufacturing turnaround is an essential element of their market success. Although their large, slow programmable switches prevent FPDs from providing the speed performance and logic capacity of MPGAs, improvements in architecture and CAD tools will overcome these disadvantages. Over time FPDs will become the dominant technology for implementing digital circuits.
Acknowledgments
We acknowledge students, colleagues, and acquaintances in industry who have contributed to our knowledge.
References
1. E. Hamdy et al., "Dielectric-Based Antifuse for Logic and Memory ICs," Tech. Digest IEEE Int'l Electron Devices Meeting, IEEE, Piscataway, N.J., 1988, pp. 786-789.
2. J. Birkner et al., "A Very-High-Speed Field-Programmable Gate Array Using Metal-to-Metal Antifuse Programmable Elements," Microelectronics J., Vol. 23, 1992, pp. 561-568.
3. D. Marple and L. Cooke, "Programming Antifuses in CrossPoint's FPGA," Proc. IEEE Int'l Custom Integrated Circuits Conf., IEEE, Piscataway, N.J., 1994, pp. 185-188.
4. H. Wolff, "How QuickTurn Is Filling the Gap," Electronics, Apr. 1990.
5. Proc. IEEE Symp. FPGAs for Custom Computing Machines, IEEE Computer Society Press, Los Alamitos, Calif., 1993-1996.
Suggested reading
S. Brown et al., Field-Programmable Gate Arrays, Kluwer Academic Publishers, Norwell, Mass., 1992. A general introduction to FPGAs.
J. Oldfield and R. Dorf, Field-Programmable Gate Arrays, John Wiley & Sons, New York, 1995. A textbook-like treatment, including digital logic design based on the Xilinx 3000 series and the Algotronix CAL chip.
J. Rose, A. El Gamal, and A. Sangiovanni-Vincentelli, "Architecture of Field-Programmable Gate Arrays," Proc. IEEE, Vol. 81, No. 7, July 1993, pp. 1013-1029. Detailed discussion of architectural trade-offs.
Field-Programmable Gate Array Technology, S. Trimberger, ed., Kluwer Academic Publishers, Norwell, Mass., 1994. Discussion of three FPGA/CPLD architectures.
Up-to-date FPD research appears in the published proceedings of several conferences:
Proc. IEEE Int'l Custom Integrated Circuits Conf., IEEE.
Proc. Int'l Conf. Computer-Aided Design (ICCAD), IEEE CS Press, Los Alamitos, Calif.
Proc. Design Automation Conference (DAC), IEEE CS Press.
FPGA Symp. Series: Third Int'l ACM Symp. Field-Programmable Gate Arrays (FPGA 95) and Fourth Int'l ACM Symp. Field-Programmable Gate Arrays (FPGA 96), Assoc. for Computing Machinery, New York.
IEEE DESIGN & TEST OF COMPUTERS
Stephen Brown is an assistant professor of electrical and computer engineering at the University of Toronto. He holds a PhD in electrical engineering from that university; his dissertation (on architecture and CAD for FPGAs) won him the Canadian NSERC's 1992 prize for the best doctoral thesis in Canada. In 1990, the International Conference on Computer-Aided Design awarded him and coauthor Jonathan Rose a Best Paper award. A coauthor of the book Field-Programmable Gate Arrays, he has also won four awards for excellence in teaching electrical engineering, computer engineering, and computer science courses. Brown is the general and program chair for the Fourth Canadian Workshop on Field-Programmable Devices (FPD 96), and is on the Technical Program Committee for the Sixth International Workshop on Field-Programmable Logic (FPL 96). He is a member of the IEEE and the Computer Society.
Jonathan Rose is an associate professor of electrical and computer engineering at the University of Toronto. His research interests are in the area of architecture and CAD for field-programmable gate arrays and systems. He coauthored the book Field-Programmable Gate Arrays. Rose holds a PhD in electrical engineering from the University of Toronto. He is the general chair of the Fourth International Symposium on FPGAs (FPGA 96) and serves on the technical program committee for the Sixth International Workshop on Field-Programmable Logic. In 1990, ICCAD awarded him and coauthor Stephen Brown a Best Paper award. He is a member of the IEEE, the Computer Society, the Association for Computing Machinery, and SIGDA.
Direct questions concerning this article to Stephen Brown, Dept. of Electrical and Computer Engineering, Univ. of Toronto, 10 Kings College Rd., Toronto, ONT, Canada M5S 3G4; brown@eecg.toronto.edu.