
INVITED PAPER

Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology

This paper reflects on how Moore's Law has driven the design of FPGAs through three epochs: the age of invention, the age of expansion, and the age of accumulation.

By Stephen M. (Steve) Trimberger, Fellow IEEE

ABSTRACT | Since their introduction, field programmable gate arrays (FPGAs) have grown in capacity by more than a factor of 10 000 and in performance by a factor of 100. Cost and energy per operation have both decreased by more than a factor of 1000. These advances have been fueled by process technology scaling, but the FPGA story is much more complex than simple technology scaling. Quantitative effects of Moore's Law have driven qualitative changes in FPGA architecture, applications and tools. As a consequence, FPGAs have passed through several distinct phases of development. These phases, termed "Ages" in this paper, are The Age of Invention, The Age of Expansion and The Age of Accumulation. This paper summarizes each and discusses their driving pressures and fundamental characteristics. The paper concludes with a vision of the upcoming Age of FPGAs.

Fig. 1. Xilinx FPGA attributes relative to 1988. Capacity is logic cell count. Speed is same-function performance in programmable fabric. Price is per logic cell. Power is per logic cell. Price and power are scaled up by 10 000. Data: Xilinx published data.

KEYWORDS | Application-specific integrated circuit (ASIC); commercialization; economies of scale; field-programmable gate array (FPGA); industrial economics; Moore's Law; programmable logic

I. INTRODUCTION

Xilinx introduced the first field programmable gate arrays (FPGAs) in 1984, though they were not called FPGAs until Actel popularized the term around 1988. Over the ensuing 30 years, the device we call an FPGA increased in capacity by more than a factor of 10 000 and increased in speed by a factor of 100. Cost and energy consumption per unit function decreased by more than a factor of 1000 (see Fig. 1).

These advancements have been driven largely by process technology, and it is tempting to perceive the evolution of FPGAs as a simple progression of capacity, following semiconductor scaling. This perception is too simple. The real story of FPGA progress is much more interesting.
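The aggregate factors above compound over three decades; a quick back-of-the-envelope conversion (my arithmetic, not a figure from the paper) shows how steep the underlying annual rates are:

```python
# Illustrative arithmetic only: converting the 30-year aggregate factors
# above into compound annual growth rates.
factors = {"capacity": 10_000, "speed": 100, "cost or energy per function": 1_000}
years = 30
for name, factor in factors.items():
    cagr = factor ** (1 / years) - 1
    print(f"{name}: {factor}x over {years} years = {cagr:.1%} per year")
# capacity: ~35.9%/yr; speed: ~16.6%/yr; cost or energy: ~25.9%/yr
```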
Since their introduction, FPGA devices have progressed through several distinct phases of development. Each phase was driven by both process technology opportunity and application demand. These driving pressures caused observable changes in the device characteristics and tools. In this paper, I review three phases I call the "Ages" of FPGAs. Each age is eight years long and each became apparent only in retrospect. The three ages are:
1) Age of Invention 1984–1991;
2) Age of Expansion 1992–1999;
3) Age of Accumulation 2000–2007.

Manuscript received September 18, 2014; revised November 21, 2014 and December 11, 2014; accepted December 23, 2014. Date of current version April 14, 2015. The author is with Xilinx, San Jose, CA 95124 USA (e-mail: steve.trimberger@xilinx.com). Digital Object Identifier: 10.1109/JPROC.2015.2392104

II. PREAMBLE: WHAT WAS THE BIG DEAL ABOUT FPGAs?

A. FPGA Versus ASIC

In the 1980s, Application-Specific Integrated Circuit (ASIC) companies brought an amazing product to the electronics market: the built-to-order custom integrated circuit. By the mid-1980s, dozens of companies were selling ASICs, and in the fierce competition, the winning attributes were low cost, high capacity and high speed. When FPGAs appeared, they compared poorly on all of these measures, yet they thrived. Why?

The ASIC functionality was determined by custom mask tooling. ASIC customers paid for those masks with an up-front non-recurring engineering (NRE) charge. Because they had no custom tooling, FPGAs reduced the up-front cost and risk of building custom digital logic. By making one custom silicon device that could be used by hundreds or thousands of customers, the FPGA vendor effectively amortized the NRE costs over all customers, resulting in no NRE charge for any one customer, while increasing the per-unit chip cost for all.

The up-front NRE cost ensured that FPGAs were more cost-effective than ASICs up to some volume [38]. FPGA vendors touted this in their "crossover point," the number of units that justified the higher NRE expense of an ASIC. In Fig. 2, the graphed lines show the total cost for a number of units purchased. An ASIC has an initial cost for the NRE, and each subsequent unit adds its unit cost to the total. An FPGA has no NRE charge, but each unit costs more than the functionally equivalent ASIC, hence the steeper line. The two lines meet at the crossover point. If fewer than that number of units is required, the FPGA solution is cheaper; more than that number of units indicates the ASIC has lower overall cost.

Fig. 2. FPGA versus ASIC crossover point. Graph shows total cost versus number of units. FPGA lines are darker and start at the lower left corner. With the adoption of the next process node (arrows from the earlier node in dashed lines to later node in solid lines), the crossover point, indicated by the vertical dotted line, grew larger.

The disadvantage of the FPGA per-unit cost premium over ASIC diminished over time as NRE costs became a larger fraction of the total cost of ownership of ASIC. The dashed lines in Fig. 2 indicate the total cost at some process node. The solid lines depict the situation at the next process node, with increased NRE cost, but lower cost per chip. Both FPGA and ASIC took advantage of lower cost manufacturing, while ASIC NRE charges continued to climb, pushing the crossover point higher. Eventually, the crossover point grew so high that for the majority of customers, the number of units no longer justified an ASIC. Custom silicon was warranted only for very high performance or very high volume; all others could use a programmable solution.

This insight, that Moore's Law [33] would eventually propel FPGA capability to cover ASIC requirements, was fundamental to the programmable logic business from its earliest days. Today, device cost is less of a driver in the FPGA versus ASIC decision than performance, time-to-market, power consumption, I/O capacity and other capabilities. Many ASIC customers use older process technology, lowering their NRE cost, but reducing the per-chip cost advantage.
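The crossover arithmetic is simple enough to sketch. The dollar figures below are purely illustrative assumptions (the paper quotes no prices); the point is the mechanism: rising NRE and falling unit costs push the crossover volume up with each node.

```python
# A minimal sketch of the crossover-point arithmetic behind Fig. 2.
# Total ASIC cost: NRE + n * unit_asic; total FPGA cost: n * unit_fpga.
def crossover_units(nre_asic, unit_asic, unit_fpga):
    """Volume n at which the two total-cost lines meet."""
    return nre_asic / (unit_fpga - unit_asic)

# Earlier node (dashed lines): assumed $250k NRE, $10 ASIC unit, $35 FPGA unit.
print(crossover_units(250_000, 10.0, 35.0))      # 10000.0 units
# Next node (solid lines): NRE climbs, both unit costs fall; crossover rises.
print(crossover_units(1_000_000, 5.0, 17.5))     # 80000.0 units
```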
Many ASIC customers use older process technology,
lowering their NRE cost, but reducing the per-chip cost
II . PREAMBLE: WHAT WAS THE advantage.
BIG DEAL ABOUT FPGAs? Not only did FPGAs eliminate the up-front masking
charges and reduce inventory costs, but they also reduced
A. FPGA Versus ASIC design costs by eliminating whole classes of design prob-
In the 1980s, Application-Specific Integrated Circuit lems. These design problems included transistor-level de-
(ASIC) companies brought an amazing product to the sign, testing, signal integrity, crosstalk, I/O design and
electronics market: the built-to-order custom integrated clock distribution.
circuit. By the mid-1980s, dozens of companies were sell- As important as low up-front cost and simpler design
ing ASICs, and in the fierce competition, the winning at- were, the major FPGA advantages were instantly availabi-
tributes were low cost, high capacity and high speed. When lity and reduced visibility of a failure. Despite extensive
FPGAs appeared, they compared poorly on all of these simulation, ASICs rarely seemed to be correct the first
measures, yet they thrived. Why? time. With wafer-fabrication turnaround times in the
The ASIC functionality was determined by custom mask weeks or months, silicon re-spins impacted schedules sig-
tooling. ASIC customers paid for those masks with an up- nificantly, and as masking costs rose, silicon re-spins were
front non-recurring engineering (NRE) charge. Because noticeable to ever-rising levels in the company. The high
they had no custom tooling, FPGAs reduced the up-front cost of error demanded extensive chip verification. Since
cost and risk of building custom digital logic. By making an FPGA can be reworked in minutes, FPGA designs in-
one custom silicon device that could be used by hundreds or curred no weeks-long delay for an error. As a result, veri-
thousands of customers, the FPGA vendor effectively fication need not be as thorough. ‘‘Self-emulation,’’ known
amortized the NRE costs over all customers, resulting in colloquially as ‘‘download-it-and-try-it,’’ could replace ex-
no NRE charge for any one customer, while increasing the tensive simulation.
per-unit chip cost for all. Finally, there was the ASIC production risk: an ASIC
The up-front NRE cost ensured that FPGAs were more company made money only when their customer’s design
cost effective than ASICs at some volume [38]. FPGA went into production. In the 1980s, because of changing
vendors touted this in their ‘‘crossover point,’’ the number requirements during the development process, product
of units that justified the higher NRE expense of an ASIC. failures or outright design errors, only about one-third of
In Fig. 2, the graphed lines show the total cost for a number all designs actually went to production. Two-thirds of de-
units purchased. An ASIC has an initial cost for the NRE, signs lost money. The losses were incurred not only by the
and each subsequent unit adds its unit cost to the total. An ASIC customers, but also by the ASIC suppliers, whose
FPGA has no NRE charge, but each unit costs more than the NRE charges rarely covered their actual costs and never
functionally equivalent ASIC, hence the steeper line. The covered the cost of lost opportunity in their rapidly depre-
two lines meet at the crossover point. If fewer than that ciating manufacturing facilities. On the other hand,
number of units is required, the FPGA solution is cheaper; programmable-logic companies and customers could still
more than that number of units indicates the ASIC has make money on small volume, and a small error could be
lower overall cost. corrected quickly, without costly mask-making.


Fig. 3. Generic PAL architecture.

B. FPGA Versus PAL

Programmable logic was well established before the FPGA. EPROM-programmed Programmable Array Logic (PAL) had carved out a market niche in the early 1980s. However, FPGAs had an architectural advantage. To understand the FPGA advantage, we first look at the simple programmable logic structures of these early 1980s devices. A PAL device, as depicted in Fig. 3, consists of a two-level logic structure [6], [38]. Inputs are shown entering at the bottom. On the left side, a programmable AND array generates product terms, ANDs of any combination of the inputs and their inverses. A fixed OR gate in the block at the right completes the combinational logic function of the macrocell's product terms. Every macrocell output is an output of the chip. An optional register in the macrocell and feedback to the input of the AND array enable a very flexible state machine implementation.

Not every function could be implemented in one pass through the PAL's macrocell array, but nearly all common functions could be, and those that could not were realized in two passes through the array. The delay through the PAL array is the same regardless of the function performed or where it is located in the array. PALs had simple fitting software that mapped logic quickly to arbitrary locations in the array with no performance concerns. PAL fitting software was available from independent EDA vendors, allowing IC manufacturers to easily add PALs to their product line.

PALs were very efficient from a manufacturing point of view. The PAL structure is very similar to an EPROM memory array, in which transistors are packed densely to yield an efficient implementation. PALs were sufficiently similar to memories that many memory manufacturers were able to expand their product line with PALs. When the cyclical memory business faltered, memory manufacturers entered the programmable logic business.

The architectural issue with PALs is evident when one considers scaling. The number of programmable points in the AND array grows with the square of the number of inputs (more precisely, inputs times product terms). Process scaling delivers more transistors with the square of the shrink factor. However, the quadratic increase in the AND array limits PALs to grow logic only linearly with the shrink factor. PAL input and product-term lines are also heavily loaded, so delay grows rapidly as size increases. A PAL, like any memory of this type, has word lines and bit lines that span the entire die. With every generation, the ratio of the drive of the programmed transistor to the loading decreased. More inputs or product terms increased loading on those lines. Increasing transistor size to lower resistance also raised total capacitance. To maintain speed, power consumption rose dramatically. Large PALs were impractical in both area and performance. In response, in the 1980s, Altera pioneered the Complex Programmable Logic Device (CPLD), composed of several PAL-type blocks with smaller crossbar connections among them. But FPGAs had a more scalable solution.
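A toy model makes the scaling argument concrete (the counts below are illustrative, not taken from any real PAL):

```python
# Toy model: programmable points in a PAL AND array vs. available transistors.
# Each of p product terms can tap each of n inputs in true or complement form.
def pal_and_array_points(n_inputs, p_terms):
    return 2 * n_inputs * p_terms

for shrink in (1, 2, 4):                 # linear shrink factor s
    transistors = 10_000 * shrink ** 2   # a shrink supplies ~s^2 more transistors
    n = 16 * shrink                      # but the array cost is quadratic, so
    p = 32 * shrink                      # inputs and product terms grow only ~s
    print(shrink, transistors, pal_and_array_points(n, p))
# The AND array consumes the s^2 transistor budget while logic capacity
# (inputs, product terms) grows only linearly with s.
```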
The FPGA innovation was the elimination of the AND array that provided the programmability. Instead, configuration memory cells were distributed around the array to control functionality and wiring. This change gave up the memory-array-like efficiency of the PAL structure in favor of architectural scalability. The architecture of the FPGA, shown in Fig. 4, consists of an array of programmable logic blocks and interconnect with field-programmable switches. The capacity and performance of the FPGA were no longer limited by the quadratic growth and wiring layout of the AND array.

Fig. 4. Generic array FPGA architecture. 4 × 4 array with three wiring tracks per row and column. Switches are at the circles at intersections. Device inputs and outputs are distributed around the array.


Not every function was an output of the chip, so capacity could grow with Moore's Law. The consequences were great.
• FPGA architecture could look nothing like a memory. Design and manufacturing were very different than memory.
• The logic blocks were smaller. There was no guarantee that a single function would fit into one. Therefore, it was difficult to determine ahead of time how much logic would fit into the FPGA.
• The performance of the FPGA depended on where the logic was placed in the FPGA. FPGAs required placement and routing, so the performance of the finished design was not easy to predict in advance.
• Complex EDA software was required to fit a design into an FPGA.

With the elimination of the AND array, FPGA architects had the freedom to build any logic block and any interconnect pattern. FPGA architects could define whole new logic implementation models, not based on transistors or gates, but on custom function units. Delay models need not be based on metal wires, but on nodes and switches. This architectural freedom ushered in the first Age of FPGAs, the Age of Invention.

III. AGE OF INVENTION 1984–1991

The first FPGA, the Xilinx XC2064, contained only 64 logic blocks, each of which held two three-input Look-Up Tables (LUTs) and one register [8]. By today's counting, this would be about 64 logic cells, less than 1000 gates. Despite its small capacity, it was a very large die, larger than the commercial microprocessors of the day. The 2.5-micron process technology used for the XC2064 was barely able to yield it. In those early years, cost containment was critical to the success of FPGAs.

"Cost containment was critical to the success of FPGAs." A modern reader will accept that statement as some kind of simplistic statement of the obvious, but this interpretation seriously underemphasizes the issue. Die size and cost per function were crushingly vital. The XC2064, with only 64 user-accessible flip-flops, cost hundreds of dollars because it was such a large die. Since yield (and hence, cost) is super-linear for large die, a 5% increase in die size could have doubled the cost or, worse, yield could have dropped to zero, leaving the startup company with no product whatsoever. Cost containment was not a question of mere optimization; it was a question of whether or not the product would exist. It was a question of corporate life or death. In those early years, cost containment was critical to the success of FPGAs.

As a result of cost pressure, FPGA architects used their newfound freedom to maximize the efficiency of the FPGA, turning to any advantage in process technology and architecture. Although static memory-based FPGAs were re-programmable, they required an external PROM to store the programming when power was off. Reprogrammability was not considered to be an asset, and Xilinx downplayed it to avoid customer concerns about what happened to their logic when power was removed. And memory dominated the die area.

Antifuse devices promised the elimination of the second die and elimination of the area penalty of memory-cell storage, but at the expense of one-time programmability. The early antifuse was a single transistor structure; the memory cell switch was six transistors. The area savings of antifuses over memory cells was inescapable. Actel invented the antifuse and brought it to market [17], and in 1990 the largest capacity FPGA was the Actel 1280. Quicklogic and Crosspoint followed Actel and also developed devices based on the advantages of the antifuse process technology.

In the 1980s, Xilinx's four-input LUT-based architectures were considered "coarse-grained". Four-input functions were observed as a "sweet spot" in logic designs, but analysis of netlists showed that many LUT configurations were unused. Further, many LUTs had unused inputs, wasting precious area. Seeking to improve efficiency, FPGA architects looked to eliminate waste in the logic block. Several companies implemented finer-grained architectures containing fixed functions to eliminate the logic cell waste. The Algotronix CAL used a fixed-MUX function implementation for a two-input LUT [24]. Concurrent (later Atmel) and their licensee, IBM, used a small-cell variant that included two-input NAND and XOR gates and a register in the CL devices. Pilkington based their architecture on a single NAND gate as the logic block [23], [34]. They licensed Plessey (ERA family), Toshiba (TC family) and Motorola (MPA family) to use their NAND-cell-based, SRAM-programmed device. The extreme of fine-grained architecture was the Crosspoint CLi FPGA, in which individual transistors were connected to one another with antifuse-programmable connections [31].

Early FPGA architects noted that an efficient interconnect architecture should observe the two-dimensionality of the integrated circuit. The long, slow wires of PALs were replaced by short connections between adjacent blocks that could be strung together as needed by programming to form longer routing paths. Initially, simple pass transistors steered signals through the interconnect segments to adjacent blocks. Wiring was efficient because there were no unused fractions of wires. These optimizations greatly shrank the interconnect area and made FPGAs possible. At the same time, though, they increased signal delay and delay uncertainty through FPGA wiring due to large capacitances and distributed series resistances through the pass transistor switch network. Since interconnect wires and switches added size, but not (billable) logic, FPGA architects were reluctant to add much. Early FPGAs were notoriously difficult to use because they were starved for interconnect.

IV. AGE OF INVENTION IN RETROSPECT

In the Age of Invention, FPGAs were small, so the design problem was small.


Though they were desirable, synthesis and even automated placement and routing were not considered essential. Many deemed it impractical even to attempt design automation on the personal computers of the time, since ASIC placement and routing was being done on mainframe computers. Manual design, both logical and physical, was acceptable because of the small problem size. Manual design was often necessary because of the limited routing resources on the chips [41].

Radically different architectures precluded universal FPGA design tools, as were available in the ASIC business. FPGA vendors took on the added burden of EDA development for their devices. This was eventually recognized as an advantage, as FPGA vendors experimented and improved their architectures. The PAL vendors of the previous decade had relied upon external tool vendors to provide software for mapping designs into their PALs. As a result, PAL vendors were restricted to those architectures the tool vendors supported, leading to commoditization, low margins and lack of innovation. PLD architecture was stifled while FPGA architecture flourished.

A further advantage of captive software development was that FPGA customers were not required to purchase tools from a third-party EDA company, which would have increased their NRE costs. As they did with NRE charges, FPGA vendors amortized their tool development costs into their silicon pricing, keeping the up-front cost of using their devices very low. EDA companies were not much interested in FPGA tools anyway, with their fragmented market, low volume, low selling price, and requirement to run on underpowered computers.

In the Age of Invention, FPGAs were much smaller than the applications that users wanted to put into them. As a result, multiple-FPGA systems [1], [42] became popular, and automated multi-chip partitioning software was identified as an important component of an FPGA design suite [36], even as automatic placement and routing were not.

V. INTERLUDE: SHAKEOUT IN FPGA BUSINESS

The Age of Invention ended with brutal attrition in the FPGA business. A modern reader may not recognize most of the companies or product names in Section III and in the FPGA genealogical tree in Fig. 5 [6], [38]. Many of the companies simply vanished. Others quietly sold their assets as they exited the FPGA business. The cause of this attrition was more than the normal market dynamics. There were important changes in the technology, and those companies that did not take advantage of the changes could not compete. Quantitative changes due to Moore's Law resulted in qualitative changes in the FPGAs built with semiconductor technology. These changes characterized the Age of Expansion.

Fig. 5. FPGA architecture genealogical tree, ca. 2000. All trademarks are the property of their respective owners.

VI. AGE OF EXPANSION 1992–1999

Through the 1990s, Moore's Law continued its rapid pace of improvement, doubling transistor count every two years. Pioneering the fabless business model, FPGA startup companies typically could not obtain leading-edge silicon technology in the early 1990s. As a result, FPGAs began the Age of Expansion lagging the process introduction curve. In the 1990s, they became process leaders as the foundries realized the value of using the FPGA as a process-driver application. Foundries were able to build SRAM FPGAs as soon as they were able to yield transistors and wires in a new technology. FPGA vendors sold their huge devices while foundries refined their processes. Each new generation of silicon doubled the number of transistors available, which doubled the size of the largest possible FPGA and halved the cost per function. More important than simple transistor scaling, the introduction of chemical-mechanical polishing (CMP) permitted foundries to stack more metal layers.

Fig. 6. Growth of FPGA LUTs and interconnect wires. Wire length is measured in millions of transistor pitches.


Valuable for ASICs, this capability was explosive for FPGAs because the cost of valuable (nonbillable) interconnect dropped even faster than the cost of transistors, and FPGA vendors aggressively increased the interconnect on their devices to accommodate the larger capacity (see Fig. 6). This rapid process improvement had several effects, which we now examine.

A. Area Became Less Precious

No one who joined the FPGA industry in the mid-1990s would agree that cost was unimportant or area was not precious. However, those who had experienced the agonies of product development in the 1980s certainly saw the difference. In the 1980s, transistor efficiency was necessary in order to deliver any product whatsoever. In the 1990s, it was merely a matter of product definition. Area was still important, but now it could be traded off for performance, features and ease-of-use. The resulting devices were less silicon efficient. This was unthinkable in the Age of Invention just a few years earlier.

B. Design Automation Became Essential

In the Age of Expansion, FPGA device capacity increased rapidly as costs fell. FPGA applications became too large for manual design. In 1992, the flagship Xilinx XC4010 delivered a (claimed) maximum of 10 000 gates. By 1999, the Virtex XCV1000 was rated at a million. In the early 1990s, at the start of the Age of Expansion, automatic placement and routing was preferred, but not entirely trusted. By the end of the 1990s, automated synthesis [9], placement and routing [3], [4], [19], [32], [37] were required steps in the design process. Without the automation, the design effort would be simply too great. The life of an FPGA company was now dependent upon the ability of design automation tools to target the device. Those FPGA companies that controlled their software controlled their future.

Cheaper metal from process scaling led to more programmable interconnect wire, so that automated placement tools could succeed with a less-precise placement. Automated design tools required automation-friendly architectures, architectures with regular and plentiful interconnect resources to simplify algorithmic decision-making. Cheaper wire also admitted longer wire segmentation, interconnect wires that spanned multiple logic blocks [2], [28], [44]. Wires spanning many blocks effectively make physically distant logic logically closer, improving performance. The graph in Fig. 7 shows large performance gains from a combination of process technology and interconnect reach. Process scaling alone would have brought down the curve, but retained the shape; longer segmentation flattened the curve. The longer segmented interconnect simplified placement because with longer interconnect, it was not as essential to place two blocks in exactly the right alignment needed to connect them with a high performance path. The placer can do a sloppier job and still achieve good results.

Fig. 7. Performance scaling with longer wire length segmentation.
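A crude RC model shows why segment length matters. A route built from pass-transistor switches accumulates series resistance and distributed capacitance, so its Elmore delay grows roughly quadratically with the number of switch hops; the values below are assumptions for illustration, not device data.

```python
# Rough Elmore-delay sketch of a route spanning eight logic blocks.
R_SWITCH = 1_000.0    # ohms per pass-transistor switch (assumed)
C_BLOCK = 20e-15      # farads of wire load per spanned block (assumed)

def route_delay(num_segments, blocks_per_segment):
    # Each switch drives all of the segment capacitance downstream of it.
    c_seg = C_BLOCK * blocks_per_segment
    return sum(R_SWITCH * c_seg * (num_segments - i) for i in range(num_segments))

print(route_delay(8, 1))  # eight single-block hops: ~720 ps
print(route_delay(2, 4))  # two length-4 segments:   ~240 ps
```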
On the down side, when the entire length of the wire segment is not used, parts of the metal trace are effectively wasted. Many silicon-efficient Age of Invention architectures were predicated on wiring efficiency, featuring short wires that eliminated waste. Often, they rigidly followed the two-dimensional limitation of the physical silicon, giving those FPGAs the label "cellular." In the Age of Expansion, longer wire segmentation was possible because the cost of wasted metal was now acceptable. Architectures dominated by nearest-neighbor-only connections could not match the performance or ease-of-automation of architectures that took advantage of longer wire segmentation.

A similar efficiency shift applied to logic blocks. In the Age of Invention, small, simple logic blocks were attractive because their logic delay was short and because they wasted little when unused or partially used. Half of the configuration memory cells in a four-input LUT are wasted when a three-input function is instantiated in it. Clever designers could manually map complex logic structures efficiently into a minimum number of fine-grained logic blocks, but automated tools were not as successful. For larger functions, the need to connect several small blocks put greater demand on the interconnect. In the Age of Expansion, not only were there more logic blocks, but the blocks themselves became more complex.

Many Age of Invention architectures, built for efficiency with irregular logic blocks and sparse interconnect, were difficult to place and route automatically. During the Age of Invention, this was not a serious issue because the devices were small enough that manual design was practical. But excessive area efficiency was fatal to many devices and companies in the Age of Expansion. Fine-grained architectures based on minimizing logic waste (such as the Pilkington NAND-gate block, the Algotronix/Xilinx 6200 MUX-based 2LUT block, the Crosspoint transistor-block) simply died. Architectures that achieved their efficiency by starving the interconnect also died. These included all nearest-neighbor grid-based architectures.


The Age of Expansion also doomed time-multiplexed devices [14], [39], since equivalent capacity expansion could be had without the attendant complexity and performance loss by merely waiting for the next process generation. The survivors in the FPGA business were those that leveraged process technology advancement to enable automation. Altera was first, bringing the long-distance connections of their CPLDs to the Altera FLEX architecture. FLEX was more automatable than other FPGAs of the period that were dominated by short wires. It achieved quick success. In the mid-1990s, AT&T/Lucent released ORCA [26] and Xilinx scaled up its XC4000 interconnect in number and length as it built larger devices. The Age of Expansion was firmly established.

C. Emergence of SRAM as Technology of Choice

One aspect of the rapid progress of Moore's Law was the need to be on the forefront of process technology. The easiest way to double the capacity and halve the cost for logic was to target the next process technology node. This pressured FPGA vendors to adopt leading-edge process technology. FPGA companies using technologies that could not be easily implemented on a new process were at a structural disadvantage. This was the case with nonvolatile programmable technologies such as EPROM, Flash and antifuse. When a new process technology becomes available, the first components that are available are transistors and wires, the essential components of electronics. A static-memory-based device could use a new, denser process immediately. Antifuse devices were accurately promoted as being more efficient on a particular technology node, but it took months or years to qualify the antifuse on the new node. By the time the antifuse was proven, SRAM FPGAs were already starting to deliver on the next node. Antifuse technologies could not keep pace with technology, so they needed to be twice as efficient as SRAM just to maintain product parity.

Antifuse devices suffered a second disadvantage: lack of reprogrammability. As customers grew accustomed to "volatile" SRAM FPGAs, they began to appreciate the advantages of in-system programmability and field-updating of hardware. In contrast, a one-time-programmable device needed to be physically handled to be updated or to remedy design errors. The alternative for antifuse devices was an extensive ASIC-like verification phase, which undermined the value of the FPGA. The rapid pace of Moore's Law in the Age of Expansion relegated antifuse and flash FPGAs to niche products.

D. Emergence of LUT as Logic Cell of Choice

LUTs survived and dominated despite their documented inefficiency in the Age of Expansion for several reasons. First, LUT-based architectures were easy targets for synthesis tools. This statement would have been disputed in the mid-1990s, when synthesis vendors complained that FPGAs were not "synthesis friendly." This perception arose because synthesis tools were initially developed to target ASICs. Their technology mappers expected a small library in which each cell was described as a network of NANDs with inverters. Since a LUT implements any of the 2^(2^n) functions of its n inputs, a complete library would have been enormous. ASIC technology mappers did a poor job on LUT-based FPGAs. But by the mid-1990s, targeted LUT mappers exploited the simplicity of mapping arbitrary functions into LUTs [9].
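The LUT principle is easy to state in code. A 4-LUT is just a 16-bit memory addressed by its inputs, so it realizes any of 2^16 = 65,536 configurations (2^(2^n) functions for n inputs), which is why an exhaustive ASIC-style cell library was hopeless. The sketch below is mine, not a vendor's model:

```python
# Minimal model of a 4-input LUT: a 16-bit truth table indexed by the inputs.
def make_lut4(table16):
    def lut(a, b, c, d):
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (table16 >> index) & 1
    return lut

# Configure f = a XOR b XOR c. The unused input d duplicates the two halves
# of the table, illustrating the wasted memory cells discussed earlier.
table = sum((a ^ b ^ c) << ((d << 3) | (c << 2) | (b << 1) | a)
            for d in (0, 1) for c in (0, 1) for b in (0, 1) for a in (0, 1))
xor3 = make_lut4(table)
assert xor3(1, 1, 0, 0) == 0 and xor3(1, 0, 0, 1) == 1
```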
The LUT has hidden efficiencies. A LUT is a memory, and memories lay out efficiently in silicon. The LUT also saves interconnect. FPGA programmable interconnect is expensive in area and delay. Rather than a simple metal wire as in an ASIC, FPGA interconnect contains buffers, routing multiplexers and the memory cells to control them. Therefore, much more of the cost of the logic is actually in the interconnect [15]. Since a LUT implements any function of its inputs, automation tools need only route the desired signals together at a LUT in order to retire the function of those inputs. There was no need to make multiple levels of LUTs just to create the desired function of a small set of inputs. LUT input pins are arbitrarily swappable, so the router need not target a specific pin. As a result, LUT-based logic reduced the amount of interconnect required to implement a function. With good synthesis, the waste from unused LUT functionality was less than the savings from the reduced interconnect requirement.

Distributed-memory-cell programming permitted architectural freedom and gave FPGA vendors nearly universal access to process technology. LUTs for logic implementation eased the burden on interconnect. Xilinx-derived LUT-based architectures appeared at Xilinx second sources: Monolithic Memories, AMD and AT&T. In the Age of Expansion, other companies, notably Altera and AT&T/Lucent, adopted memory cell and LUT architectures as well.

VII. INTERLUDE: FPGA CAPACITY BELL CURVE

The bell curve in Fig. 8 represents the histogram of the distribution of sizes of ASIC applications. FPGA capacity at some time is a point on the X-axis, shown by a vertical bar. All the applications to the left of the bar are those that can be addressed by FPGAs, so the addressable market for FPGAs is the shaded area under the curve to the left of the bar.

Fig. 8. Growth of the FPGA addressable market.


During the Age of Expansion, FPGA capacity increased at Moore's Law pace, so the bar moved to the right. Of course, the entire bell curve of applications also moved to the right, but the rate of growth in application size was slower than the FPGA capacity growth. As a result, the bar representing FPGAs moved quickly to the right relative to the distribution of designs. Since FPGAs were addressing the low end of the curve, even a small increase in available capacity admitted a large number of new applications. During the Age of Expansion, FPGA capacity covered a growing fraction of extant designs and grew to address the majority of ASIC applications.
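Assuming, for illustration, a normal distribution of log design sizes (the paper gives no parameters), the asymmetry is easy to see: below the peak each capacity step admits a large slice of the market, a point Fig. 11 later revisits from the other side of the peak.

```python
# Sketch of the bell-curve argument: fraction of designs an FPGA can absorb,
# modeling log2(design size) as normal with assumed mean 14 and sigma 2.
from math import erf, sqrt

def addressable_fraction(log2_capacity, mean=14.0, sigma=2.0):
    return 0.5 * (1.0 + erf((log2_capacity - mean) / (sigma * sqrt(2.0))))

# Approaching the peak, one extra bit of capacity admits many designs...
print(addressable_fraction(12), addressable_fraction(13))  # ~0.16 -> ~0.31
# ...past the peak, the same step admits progressively fewer (the Fig. 11 story).
print(addressable_fraction(16), addressable_fraction(17))  # ~0.84 -> ~0.93
```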
This increasing applicability can also be seen from the "Design Gap" slide popular with EDA vendors in the late 1990s (Fig. 9). The capacity of ASICs and FPGAs grew with Moore's Law: ASICs growing at a 59% annual growth rate, and FPGAs at a 48% annual growth rate. The observed average ASIC design start grew considerably more slowly, only 25% per year. As a result, FPGA capacity met the average ASIC design size in 2000, though for a large (expensive) FPGA. By 2004, though, a ten-dollar FPGA was predicted to meet the average ASIC requirement. That crossover point moved farther out in the early 2000s as FPGAs addressed the low end of the ASIC market and those small designs became FPGA designs. Small designs were no longer included in the average ASIC design size calculation, thereby inflating the average ASIC design size in the new millennium. Today, the average ASIC is much larger than Fig. 9 would suggest, because FPGAs successfully absorbed nearly the entire low end of the ASIC business.

Fig. 9. Design gap. Data: Synopsys, Gartner Dataquest, VLSI Technology, Xilinx.
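Compounding those growth rates reproduces the slide's crossover; the starting sizes below are assumed for illustration, tuned so that FPGA capacity meets the average design around 2000 as the text states.

```python
# Compounding the Design Gap rates: FPGA capacity +48%/yr, average ASIC
# design size +25%/yr (ASIC capacity, +59%/yr, is not the binding curve here).
fpga, design = 0.3, 1.0            # assumed relative sizes in 1992
for year in range(1992, 2006):
    if fpga >= design:
        print("FPGA capacity passes the average ASIC design around", year)
        break
    fpga *= 1.48
    design *= 1.25
# prints: ... around 2000
```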

VIII. AGE OF EXPANSION IN RETROSPECT

Through the Age of Expansion, Moore's Law rapidly increased the capacity of FPGAs, leading to a demand for design automation and permitting longer interconnect segmentation. Overly efficient architectures that could not be effectively automated simply disappeared. SRAM devices were first to exploit new process technology and dominated the business. FPGAs encroached on ASIC territory as FPGA device capacity grew more rapidly than the demand from applications. No longer did users clamor for multi-FPGA partitioning software: designs fit, sometimes snugly, into the available FPGAs.

As FPGAs became more popular, EDA companies became interested in providing tools for them. However, overtures by EDA companies were viewed with suspicion. FPGA veterans had seen how PLD vendors had lost control over their innovation by surrendering the software. They refused to let that happen in the FPGA domain. Further, the major FPGA companies feared their customers could become dependent upon an external EDA company's tools. If that happened, the EDA company could effectively drive up the FPGA NRE by their software tool prices. This would undermine the FPGA value proposition, shifting the crossover point back to lower volume. Some important FPGA-EDA alliances were made in the synthesis domain, an arm's length from the physical design tools that defined the architecture. Despite the alliances, the FPGA companies maintained competitive projects to prevent the possibility of dependence. In the Age of Expansion, FPGA vendors found themselves competing against both ASIC technology and EDA technology.

IX. INTERLUDE: XILINX MARKETING, CA. 2000

By the late 1990s, the Age of Expansion was well understood in the FPGA business. FPGA vendors were aggressively pursuing process technology as the solution to their size, performance and capacity issues. Each new process generation brought with it numerous new applications.

Fig. 10. Xilinx marketing ca. 2000. Image courtesy of Xilinx.


The slide in Fig. 10 is excerpted from a Xilinx marketing presentation from the year 2000. The Virtex 1000, the largest FPGA available at the time, is depicted as the tiny black rectangle at the lower left. The slide shows the expectation that the Age of Expansion would continue unabated, increasing the number of gates to 50 million over the following five years. This did not happen, despite unwavering progress of Moore's Law. In the following section we examine what really happened and why.

X. AGE OF ACCUMULATION 2000–2007

By the start of the new millennium, FPGAs were common components of digital systems. Capacity and design size were growing and FPGAs had found a huge market in the data communications industry. The dot-com bust of the early 2000s created a need for lower cost. The increasing cost and complexity of silicon manufacturing eliminated "casual" ASIC users. Custom silicon was simply too risky for a small team to execute successfully. When they saw they could fit their problem into an FPGA, they became FPGA customers.

As in the Age of Expansion, the inexorable pace of Moore's Law made FPGAs ever larger. Now they were larger than the typical problem size. There is nothing bad about having capacity greater than what is needed, but neither is there anything particularly virtuous in it. As a result, customers were unwilling to pay a large premium for the largest FPGA.

Increased capacity alone was insufficient to guarantee market growth, either. Consider Fig. 11, the FPGA bell curve, again. As the FPGA capacity passed the average design size, the peak of the bell curve, an increase in capacity admitted progressively fewer applications. Mere size, which virtually guaranteed a successful product in the Age of Expansion, attracted fewer and fewer new customers in the years following.

Fig. 11. Shrinking growth of the FPGA addressable market.

FPGA vendors addressed this challenge in two ways. For the low end of the market, they refocused on efficiency and produced families of lower-capacity, lower-performance "low-cost" FPGAs: Spartan from Xilinx, Cyclone from Altera and EC/ECP from Lattice.

For the high end, FPGA vendors looked to make it easier for customers to fill up their spacious FPGAs. They produced libraries of soft logic (IP) for important functions. The most notable of these soft logic functions were microprocessors (Xilinx MicroBlaze and Altera Nios) [21], memory controllers and various communications protocol stacks. Before the Ethernet MAC was implemented in transistors on Virtex-4, it was implemented in LUTs as a soft core for Virtex-II. Standard interfaces to IP components consumed additional LUTs, but that inefficiency was not a great concern compared with the design effort savings.

Large FPGAs were larger than the average ASIC design. By the mid-2000s, only ASIC emulators needed multi-chip partitioners. More customers were interested in aggregating multiple, potentially unrelated components on a single spacious FPGA [25]. Xilinx promoted "Internet Reconfigurable Logic" and partitioning of the FPGA area to allow dynamic insertion of functional units into a subset of the programmable logic resources.

The characteristics of designs changed in the 2000s. Large FPGAs admitted large designs that were complete subsystems. FPGA users were no longer working simply on implementing logic; they needed their FPGA design to adhere to systems standards. These standards were primarily communications standards for signaling and protocols, either to interface to external components or to communicate among internal blocks. Processing standards became applicable due to the FPGA's growing role in compute-intensive applications. As the FPGA grew as a fraction of the customer's overall system logic, its cost and power grew accordingly. These issues became much more important than they were in the Age of Expansion.

Pressure to adhere to standards, decrease cost and decrease power led to a shift in architecture strategy from simply adding programmable logic and riding Moore's Law, as was done in the Age of Expansion, to adding dedicated logic blocks [7], [13], [29], [43]. These blocks included large memories, microprocessors, multipliers, flexible I/O and source-synchronous transceivers. Built of custom-designed transistors rather than ASIC gates, they were often even more efficient than ASIC implementations. For applications that used them, they reduced the overhead of programmability in area, performance, power and design effort [27].

The result was the "Platform FPGA," captured by the Xilinx marketing slide from 2005 in Fig. 12. Compare this with Fig. 10. No longer was the message the millions of gates, but rather the pre-defined, high-performance dedicated blocks. Even the term "gate" had disappeared from the slide. This FPGA was not simply a collection of LUTs, flip-flops, I/O and programmable routing. It included multipliers, RAM blocks, multiple Power-PC microprocessors, clock management, gigahertz-rate source-synchronous transceivers and bitstream encryption to protect the IP of the design. FPGA tools grew to target this growing array of implementation targets.

To ease the burden of using the new functions and to meet system standards, FPGA vendors offered logic generators to build targeted functionality from a combination of their dedicated functions and soft logic [22].


Fig. 12. Xilinx marketing ca. 2005. Image courtesy of Xilinx.

Generators and libraries of soft logic provided the interfaces to CoreConnect, AXI and other busses for peripherals on soft and hardened processors. They also built the bus protocol logic that wrapped around the fixed-function physical interfaces of serial transceivers. Xilinx System Generator and Altera's DSP Builder automated much of the assembly for DSP systems, constructed from a combination of fixed functions and LUTs. To simplify creation of microprocessor systems, Xilinx provided the Embedded Design Kit (EDK) while Altera released their Embedded System Design Kit (ESDK). Demonstrations of these capabilities included Linux running on the FPGA processor with video compression and decompression in the FPGA fabric.

But, what of those customers in the Age of Accumulation who do not need the fixed functions? To a customer who does not need a Power-PC processor, memory or multiplier, the area of that block is wasted, effectively degrading the cost and speed of the FPGA. At first, FPGA vendors tried to ensure those functions could be used for logic if they were not needed for their primary purpose. They provided "large-LUT-mapping" software to move logic into unused RAM blocks. Xilinx released the "Ultracontroller" to map state machines into microprocessor code for the hardened Power-PC in the Virtex-II Pro. But these measures were eventually perceived as unimportant. It is an indication of how far we had come from the Age of Invention that FPGA vendors and customers alike simply accepted the wasted area. A Xilinx Vice-President remarked that he would provide four Power-PC processors on an FPGA and did not care if customers did not use any of them: "We give them the processors for free."

XI. INTERLUDE: ALL AGES AT ALL TIMES

Dedicated blocks were not unique to the Age of Accumulation, just as increased device capacity was not unique to the Age of Expansion or architecture innovation unique to the Age of Invention. Gates, routing and three-state bussing were available in the Age of Invention, while arithmetic, memory and specialized I/O appeared in the Age of Expansion (Table 1). Dedicated blocks have been added throughout the ages of FPGAs, and there is every indication that they will continue to evolve in variety and complexity. In general, though, successful dedicated functions have been generic in nature, using the flexibility of programmable LUTs and interconnect to customize the functionality. Attempts to produce niche-targeted or Application-Specific FPGAs have not proved successful, as they lose the advantage of volume manufacturing on which FPGA economics relies. This was, of course, until the Age of Accumulation gave rise to the "communications FPGA."

Table 1. Selected Dedicated Logic on FPGA.

XII. AGE OF ACCUMULATION IN RETROSPECT

A. Applications

The biggest change in FPGAs in the Age of Accumulation was the change in the target application. The FPGA business grew not from general ASIC replacement, but from adoption by the communications infrastructure. Companies such as Cisco Systems used FPGAs to make custom data paths for steering huge volumes of internet and packetized voice traffic through their switches and routers [20], [30]. Their performance requirements eliminated standard microprocessors and array processors, and unit volumes were within the FPGA crossover point. New network routing architectures and algorithms could be implemented in FPGAs quickly and updated in the field. In the Age of Accumulation, sales to the communications industry segment grew rapidly to well over half the total FPGA business.


Of course, this success led major FPGA manufacturers to customize their FPGAs for the communications industry. Made-for-communications FPGAs incorporated high-speed I/O transceivers, thousands of dedicated high-performance multipliers, and the ability to make wide data paths and deep pipelines for switching large amounts of data without sacrificing throughput. The dedicated blocks and routing that were added to better serve the communications application requirements reduced the available general logic area. By the end of the 2000s, FPGAs were not general-purpose ASIC replacements as much as they were data-routing engines. As multi-core processors and general-purpose graphics processor units (GPGPUs) appeared, FPGAs were still preferred for high-throughput, real-time computation. At the same time, FPGAs retained their generality. FPGA bitwise programmability assured their continued use in a wide range of applications, including control and automotive systems.

B. Moore's Law

Classical Dennard scaling, with simultaneous improvements in cost, capacity, power and performance, ended in the mid-2000s [5], [18]. Subsequent technology generations still gave improvements in capacity and cost. Power continued to improve also, but with a clear tradeoff against performance. Performance gains from one technology node to the next were modest and were traded off against power savings. This effect is evident in the slowdown of performance growth in the 2000s in Fig. 1. These tradeoffs also drove the accumulation of functions, because simple reliance on process technology scaling, as in the Age of Expansion, was not sufficient to improve power and performance. Hardening the logic provided the needed improvements.
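For readers who want the textbook background (this is standard scaling theory, not derived in the paper): under classical Dennard scaling, shrinking dimensions and supply voltage by the same factor kept power density constant; once voltage stopped scaling, that balance broke.

```latex
% Classical Dennard scaling by a factor \kappa > 1 (L, W, t_{ox}, V \to 1/\kappa):
\begin{aligned}
\text{area per transistor} &\propto 1/\kappa^{2}, \qquad \text{delay} \propto 1/\kappa,\\
P_{\text{transistor}} &\propto C V^{2} f \propto 1/\kappa^{2}, \qquad
\text{power density} = P/\text{area} \approx \text{const.}
\end{aligned}
% Post-Dennard, V is roughly fixed, so P \propto C V^{2} f no longer shrinks
% with area; holding power forces the performance tradeoffs described above.
```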
"We are now well into the next Age of FPGAs. What is this next age?"

XIII. CURRENT AGE: NO LONGER PROGRAMMABLE LOGIC

By the end of the Age of Accumulation, FPGAs were not arrays of gates, but collections of accumulated blocks integrated with the programmable logic. They were still programmable but were not restricted to programmable logic. The additional dimensions of programmability acquired in the Age of Accumulation added design burden. Design effort, an advantage for FPGAs in their competition with ASIC, was a disadvantage in competition with newly arrived multi-core processors and GPUs.

Pressures continued to mount on FPGA developers. The economic slowdown beginning in 2008 continued to drive the desire for lower cost. This pressure is exhibited not only in the demand for lower price for functionality, but also in lower power consumption, which reflects the cost of using the devices [29], [40]. Post-Dennard process technology failed to deliver the huge concurrent benefits in cost, capacity, performance, power and reliability that new process technology had delivered in preceding decades. Of particular concern was the demand for tradeoffs between power and performance. Now what?

A. Applications

During the Age of Accumulation, the ASIC companies that brought custom devices to market in the 1980s were quietly disappearing. Custom socket-specific ASIC devices still existed, of course, but only for designs with very large volume or extreme operating requirements. Did FPGAs defeat them? Well, partially. In the 2000s, ASIC NRE charges simply grew too large for most applications. This can be seen in Fig. 13, where development cost in millions of dollars is plotted against technology node. The development cost of a custom device reached tens, then hundreds of millions of dollars. A company that invests 20% of income on research and development requires half a billion dollars revenue from sales of a chip to justify one hundred million dollars development cost. The FPGA crossover point reached millions of units. There are very few chips that sell in that volume: notably microprocessors, memories and cell phone processors. Coupled with tight financial controls in the wake of another recession, the sales uncertainty and long lead time to revenue for new products, the result was inescapable: if the application requirements could be met by a programmable device, programmable logic was the preferred solution. The FPGA advantage from the very earliest days was still operating: lower overall cost by sharing development cost.

Fig. 13. Estimated chip design cost, by process node, worldwide. Data: Xilinx and Gartner, 2011.

ASICs did not die. ASICs survived and expanded by adding programmability in the form of application specific standard product (ASSP) system-on-chip (SoC) devices. An SoC combines a collection of fixed function blocks along with a microprocessor subsystem.


The function blocks are typically chosen for a specific application domain, such as image processing or networking. The microprocessor controls the flow of data and allows customization through programming as well as field updates. The SoC gave a structure to the hardware solution, and programming the microprocessors was easier than designing hardware. Leveraging the FPGA advantages, programmable ASSP devices served a broader market, amortizing their development costs more broadly. Companies building ASSP SoCs became fabless semiconductor vendors in their own right, able to meet the sales targets required by high development costs.

Following the ASIC migration to SoC, programmable logic vendors developed programmable SoCs [12]. This is decidedly not the data-throughput engine so popular in the data communications domain and also not an array of gates. The Programmable System FPGA is a full programmable system-on-a-chip, containing memory, microprocessors, analog interfaces, an on-chip network and a programmable logic block. Examples of this new class of FPGA are the Xilinx All-Programmable Zynq, the Altera SoC FPGA, and the Actel/Microsemi M1.

B. Design Tools

These new FPGAs have new design requirements. Most importantly, they are software programmable as well as hardware programmable. The microprocessor is not the simple hardware block dropped into the FPGA as was done in the Age of Accumulation but includes a full environment with caches, busses, Network-on-Chip and peripherals. Bundled software includes operating systems, compilers and middleware: an entire ecosystem, rather than an integrated function block. Programming software and hardware together adds design complexity.

But this is still the tip of the iceberg. To achieve their goal of displacing ASICs or SoCs, FPGAs inherit the system requirements of those devices. Modern FPGAs have power controls, such as voltage scaling and the Stratix adaptive body bias [29]. State-of-the-art security is required, including public-key cryptography in the Xilinx Zynq SoC and Microsemi SmartFusion. Complete systems require mixed-signal interfaces for real-world interfacing. These also monitor voltage and temperature. All these are required for the FPGA to be a complete system on a chip, a credible ASSP SoC device. As a result, FPGAs have grown to the point where the logic gate array is typically less than half the area. Along the way, FPGA design tools have grown to encompass the broad spectrum of design issues. The number of EDA engineers at FPGA companies grew to be comparable to the number of design engineers.

C. Process Technology

Although process scaling has continued steadily through the past three decades, the effects of Moore's Law on FPGA architecture were very different at different times. To be successful in the Age of Invention, FPGAs required aggressive architectural and process innovation. In the Age of Expansion, riding Moore's Law was the most successful way to address an ever-growing fraction of the market. As FPGAs grew to become systems components, they were required to address those standards, and the dot-com bust required them to provide those interfaces at a much lower price. The FPGA industry has relied on process technology scaling to meet many of these requirements.

Since the end of Dennard scaling, process technology has limited performance gains to meet power goals. Each process node has delivered less density improvement as well. The growth in the number of transistors in each new node slowed as complex processes became more expensive. Some predictions claim the cost per transistor will rise. The FPGA industry, like the semiconductor industry as a whole, has relied on technology scaling to deliver improved products. If improvements no longer come from technology scaling, where do they come from?

Slowing process technology improvement enhances the viability of novel FPGA circuits and architecture: a return to the Age of Invention. But it is not as simple as returning to 1990. These changes must be incorporated without degrading the ease-of-use of the FPGA. This new age puts a much greater burden on FPGA circuit and applications engineers.

D. Design Effort

Notice how that last section focused on device attributes: cost, capacity, speed, and power. Cost, capacity and speed were precisely those attributes at which FPGAs were at a disadvantage to ASIC in the 1980s and 1990s. Yet they thrived. A narrow focus on those attributes would be misguided, just as the ASIC companies' narrow focus on them in the 1990s led them to underestimate FPGAs. Programmability gave FPGAs an advantage despite their drawbacks. That advantage translated into lower risk and easier design. Those attributes are still valuable, but other technologies offer programmability, too.

Design effort and risk are emerging as critical requirements in programmable logic. Very large systems are difficult to design correctly and require teams of designers. The problems of assembling complex compute or data processing systems drive customers to find easier solutions. As design cost and time grow, they become as much of a problem for FPGAs as ASIC NRE costs were for ASICs in the 1990s [16]. Essentially, large design costs undermine the value proposition of the FPGA.

Just as customers seeking custom integrated circuits 30 years ago were attracted to FPGAs over the complexity of ASICs, many are now attracted to multicore processors, graphics processing units (GPU) and software-programmable Application Specific Standard Products (ASSPs). These alternative solutions provide pre-engineered systems with software to simplify mapping problems onto them. They sacrifice some of the flexibility, the performance and the power efficiency of programmable logic for ease-of-use.

Vol. 103, No. 3, March 2015 | Proceedings of the IEEE 329


Trimberger: Three Ages of FPGAs

It is clear that, while there are many FPGA users who need to exploit the limits of FPGA technology, there are many others for whom the technological capability is adequate but who are daunted by the complexity of using that technology.

The complexity and capability of devices have driven an increase in the capability of design tools. Modern FPGA toolsets include high-level synthesis compilation from C, CUDA and OpenCL to logic or to embedded microprocessors [10], [11], [35]. Vendor-provided libraries of logic and processing functions defray design costs. Working operating systems and hypervisors control FPGA SoC operation. Team design functions, including build control, are built into FPGA design systems. Some capabilities are built by the vendors themselves; others are part of the growing FPGA ecosystem.
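As a small illustration of this high-level synthesis flow, the loop below is the style of C source such tools compile into pipelined logic. The function and its data sizes are invented for illustration; the pragma follows the spelling used by Vivado HLS-style tools and is shown as an example of the approach, not as the required syntax of any one compiler.

/*
 * Illustrative sketch: C source in the style accepted by HLS tools.
 * The tool maps the arrays onto memory ports and builds a pipelined
 * datapath for the loop body.
 */
#define N 1024

void vadd(const int a[N], const int b[N], int out[N])
{
    for (int i = 0; i < N; i++) {
        /* Directive asking the compiler to start a new loop
         * iteration every clock cycle (initiation interval 1). */
        #pragma HLS PIPELINE II=1
        out[i] = a[i] + b[i];
    }
}

The same source can also be compiled, with the pragma ignored, for an embedded microprocessor, which is what lets these flows target either logic or processors from a single description.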
Clearly, usability is critical to this next age of FPGAs. Will that usability be realized through better tools, novel architectures, exploitation of the process technology, or greater accumulation of fixed blocks? Most likely, just as every previous age was required to contribute to each successive age, all techniques will be needed to succeed. And more besides. As with the other Ages, the next Age of FPGAs will only be completely clear in retrospect.

Throughout the age, expect to see time-honored good engineering: producing the best products possible from the available technology. This good engineering will be accomplished as the available technology and the definition of ‘‘best’’ continuously change.

XIV. FUTURE AGE OF FPGAS

What of the future? What is the age after this one? I refuse to speculate, but instead issue a challenge: remember the words of Alan Kay, ‘‘The best way to predict the future is to invent it.’’

REFERENCES

[1] J. Babb et al., ‘‘Logic emulation with virtual wires,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 16, no. 6, pp. 609–626, Jun. 1997.
[2] V. Betz and J. Rose, ‘‘FPGA routing architecture: Segmentation and buffering to optimize speed and density,’’ in Proc. FPGA ’99, ACM Symp. FPGAs, 1999, pp. 59–68.
[3] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Boston, MA, USA: Kluwer Academic, Feb. 1999.
[4] V. Betz and J. Rose, ‘‘VPR: A new packing, placement and routing tool for FPGA research,’’ in Proc. Int. Workshop Field Program. Logic Appl., 1997, pp. 213–222.
[5] M. Bohr, ‘‘A 30 year retrospective on Dennard’s MOSFET scaling paper,’’ IEEE Solid-State Circuits Soc. Newslett., vol. 12, no. 1, pp. 11–13, 2007.
[6] S. Brown and J. Rose, ‘‘FPGA and CPLD architectures: A tutorial,’’ IEEE Design Test Comput., vol. 13, no. 2, pp. 32–57, 1996.
[7] T. Callahan, J. Hauser, and J. Wawrzynek, ‘‘The Garp architecture and C compiler,’’ IEEE Computer, 2000.
[8] W. Carter, K. Duong, R. H. Freeman, H. Hsieh, J. Y. Ja, J. E. Mahoney, L. T. Ngo, and S. L. Sze, ‘‘A user programmable reconfigurable gate array,’’ in Proc. Custom Integr. Circuits Conf., 1986, pp. 233–235.
[9] J. Cong and Y. Ding, ‘‘An optimal technology mapping algorithm for delay optimization in lookup-table FPGA designs,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 13, no. 1, Jan. 1994.
[10] J. Cong et al., ‘‘High-level synthesis for FPGAs: From prototyping to deployment,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 4, Apr. 2011.
[11] T. S. Czajkowski et al., ‘‘From OpenCL to high-performance hardware on FPGAs,’’ in Proc. Int. Conf. Field Program. Logic Appl. (FPL), 2012, pp. 531–534.
[12] L. Crockett, R. Elliot, M. A. Enderwitz, and R. W. Stewart, The Zynq Book. Strathclyde Academic, 2014.
[13] A. DeHon, ‘‘DPGA-coupled microprocessors: Commodity ICs for the early 21st century,’’ in Proc. IEEE FCCM, 1994, pp. 31–39.
[14] A. DeHon, ‘‘DPGA utilization and application,’’ in Proc. FPGA, 1996, pp. 115–121.
[15] A. DeHon, ‘‘Balancing interconnect and computation in a reconfigurable computing array (or, why you don’t really want 100% LUT utilization),’’ in Proc. Int. Symp. Field Program. Gate Arrays, Feb. 1999, pp. 125–134.
[16] P. Dworsky, ‘‘How can we keep our FPGAs from falling into the productivity gap,’’ Design and Reuse, 2012, viewed Sep. 16, 2014. [Online]. Available: http://www.slideshare.net/designreuse/111207-ip-so-c-dworskyfpga-panel-slides
[17] K. El-Ayat et al., ‘‘A CMOS electrically configurable gate array,’’ IEEE J. Solid-State Circuits, vol. 24, no. 3, pp. 752–762, Mar. 1989.
[18] H. Esmaeilzadeh, E. Blem, R. St. Amant, K. Sankaralingam, and D. Burger, ‘‘Dark silicon and the end of multicore scaling,’’ in Proc. ISCA, 2011, pp. 365–376.
[19] J. Frankle, ‘‘Iterative and adaptive slack allocation for performance-driven layout and FPGA routing,’’ in Proc. IEEE Design Autom. Conf., 1992, pp. 536–542.
[20] G. Gibb, J. W. Lockwood, J. Naous, P. Hartke, and N. McKeown, ‘‘NetFPGA: An open platform for teaching how to build gigabit-rate network switches and routers,’’ IEEE Trans. Educ., vol. 51, no. 3, pp. 364–369, Aug. 2008.
[21] T. Halfhill, ‘‘MicroBlaze v7 gets an MMU,’’ Microprocessor Rep., Nov. 13, 2007.
[22] J. Hwang and J. Ballagh, ‘‘Building custom FIR filters using System Generator,’’ in Field-Programmable Logic and Applications: Reconfigurable Computing Is Going Mainstream (Lecture Notes in Computer Science), M. Glesner, P. Zipf, and M. Renovell, Eds. New York, NY, USA: Springer, 2002, pp. 1101–1104.
[23] G. Jones and D. M. Wedgewood, ‘‘An effective hardware/software solution for fine grained architectures,’’ in Proc. FPGA, 1994.
[24] T. Kean, ‘‘Configurable logic: A dynamically programmable cellular architecture and its VLSI implementation,’’ Ph.D. dissertation CST62-89, Dept. Comput. Sci., Univ. Edinburgh, Edinburgh, U.K.
[25] C. Koo, ‘‘Benefits of partial reconfiguration,’’ Xcell J., Xilinx, Fourth Quarter 2005.
[26] R. H. Krambeck, C.-T. Chen, and R. Y. Tsui, ‘‘ORCA: A high speed, high density FPGA architecture,’’ in Dig. Papers Compcon Spring ’93, 1993, pp. 367–372.
[27] I. Kuon and J. Rose, ‘‘Measuring the gap between FPGAs and ASICs,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, 2007.
[28] D. Lewis et al., ‘‘The Stratix-II logic and routing architecture,’’ in Proc. FPGA, 2003.
[29] D. Lewis et al., ‘‘Architectural enhancements in Stratix-III and Stratix-IV,’’ in Proc. ACM/SIGDA Int. Symp. Field Program. Gate Arrays, 2009, pp. 33–42.
[30] J. W. Lockwood, N. Naufel, J. S. Turner, and D. E. Taylor, ‘‘Reprogrammable network packet processing on the field programmable port extender (FPX),’’ in Proc. FPGA, 2001, pp. 87–93.
[31] D. J. Marple, ‘‘An MPGA-like FPGA,’’ IEEE Design Test Comput., vol. 9, no. 4, 1989.
[32] L. McMurchie and C. Ebeling, ‘‘PathFinder: A negotiation-based performance-driven router for FPGAs,’’ in Proc. FPGA ’95, 1995.
[33] G. Moore, ‘‘Are we really ready for VLSI?’’ in Proc. Caltech Conf. Very Large Scale Integr., 1979.
[34] H. Muroga et al., ‘‘A large scale FPGA with 10K core cells with CMOS 0.8 um 3-layered metal process,’’ in Proc. CICC, 1991.
[35] A. Papakonstantinou et al., ‘‘FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs,’’ in Proc. IEEE 7th Int. Symp. Appl.-Specific Processors (SASP), 2009.
[36] K. Roy and C. Sechen, ‘‘A timing-driven N-way multi-chip partitioner,’’ in Proc. IEEE ICCAD, 1993, pp. 240–247.
[37] V. P. Roychowdhury, J. W. Greene, and A. El-Gamal, ‘‘Segmented channel routing,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 12, no. 1, pp. 79–95, 1993.
[38] S. Trimberger, Ed., Field Programmable Gate Array Technology. Boston, MA, USA: Kluwer Academic, 1994.
[39] S. Trimberger, R. Carberry, A. Johnson, and J. Wong, ‘‘A time-multiplexed FPGA,’’ in Proc. FCCM, 1997.
[40] T. Tuan, A. Rahman, S. Das, S. Trimberger, and S. Kao, ‘‘A 90-nm low-power FPGA for battery-powered applications,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, 2007.
[41] B. von Herzen, ‘‘Signal processing at 250 MHz using high-performance FPGA’s,’’ in Proc. FPGA, 1997, pp. 62–68.
[42] J. E. Vuillemin et al., ‘‘Programmable active memories: Reconfigurable systems come of age,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 4, no. 1, pp. 56–69, Feb. 1996.
[43] S. J. E. Wilton, J. Rose, and Z. G. Vranesic, ‘‘Architecture of centralized field-configurable memory,’’ in Proc. ACM 3rd Int. Symp. Field-Programmable Gate Arrays (FPGA), 1995.
[44] V. Betz and J. Rose, ‘‘FPGA routing architecture: Segmentation and buffering to optimize speed and density,’’ in Proc. FPGA ’99, ACM Symp. FPGAs, Feb. 1999, pp. 140–149.

ABOUT THE AUTHOR


Stephen M. Trimberger (F’11) received the B.S.
degree in engineering and applied science from
the California Institute of Technology, Pasadena,
CA, USA, in 1977, the M.S. degree in information
and computer science from the University of
California, Irvine, in 1979, and the Ph.D. degree in
computer science from the California Institute of
Technology, in 1983.
He was employed at VLSI Technology from
1982 to 1988. Since 1988 he has been at Xilinx, San
Jose, CA, USA, holding a number of positions. He is currently a Xilinx
Fellow, heading the Circuits and Architectures group in Xilinx Research
Labs. He is author and editor of five books as well as dozens of papers
and journal articles. He is an inventor with more than 200 U.S. patents in
the areas of IC design, FPGA and ASIC architecture, CAE, 3-D die stacking
semiconductors and cryptography.
Dr. Trimberger is a four-time winner of the Freeman Award, Xilinx’s
annual award for technical innovation. He is a Fellow of the Association
for Computing Machinery.
