FPGA Based On Integration of Memristors and CMOS Devices
Wei Wang, Tom T. Jing, and Brian Butcher
College of Nanoscale Science and Engineering, University at Albany, SUNY
Albany, NY 12203, USA
This work was supported in part by SRC FCRP, NSF EMT, Air
Force STTR, and International Sematech research grants.
FPGA applications. The proposed FPGA is similar to the existing
magnetic memory-based FPGA [12]. The CMOS-memristor hybrid devices are
expected to provide higher density and a fabrication flow that is more
compatible with foundry-type CMOS technology. For example, HfO2 and ZrO2
materials have been used in CMOS and can be used to establish efficient
resistive memory devices [9] as well as the mFPGA.

B. 1T1M-based FPGA Elements

1) NOR-based 1T1M Array as Memory Element
Using 1T1M structures, the NOR-cell array provides faster access than the
NAND-cell array, which requires a larger area and has slower access. The
NOR 1T1M array is used to replace the SRAM cells in the block memory of an
FPGA, which is expected to give a 6X density enhancement, as referenced in
[13]. Since information storage in the NOR array is based on the
resistance change of the memristor, almost no power is required to
maintain the stored data. A 1T1M cell can therefore significantly reduce
the standby power of the corresponding SRAM cell, PSRAM, with a 6X
improvement as shown in [13].

2) Two Novel CMOS-Memristor Routing Switches
As shown in Fig. 3a, the conventional CMOS routing switch consists of a
pass transistor controlled by a six-transistor SRAM cell to provide the
routing function. By integrating memristor devices with CMOS devices, we
obtain two new routing switches that improve the area and standby power
consumption of the FPGA. Note that the standby power of the 7-transistor
(7T) SRAM routing switch is determined by the standby power of its
6-transistor SRAM cell, PSRAM.

Figure 3. (a) A 7T SRAM routing switch and its ON/OFF operations, (b) a
2T1M routing switch and its ON/OFF operations, (c) a 2T2M routing switch
and its ON/OFF operations.

Table I. Programming-stage and operating-stage voltages of the routing
switches.

Switch    Programming Stage                 Operating Stage
7T SRAM   “ON”: 1.2V                        “ON”: Vg = 1.2V
          “OFF”: 0V                         “OFF”: Vg = 0V
                                            VAB = 0V – 1.2V
2T1M      “ON”: VP1 = 3V and VP2 = 0V       VAB = 0V – 1.2V
          “OFF”: VP1 = 0V and VP2 = 3V
2T2M      “ON”: VP = -3V                    “ON”: Vg = 1.19V
          “OFF”: VP = 3V                    “OFF”: Vg = 0.01V
                                            VAB = 0V – 1.2V
Note: The pass transistor has Vdd = 1.2V, VT = 0.4V [14] and the
memristor requires ±3V to program [10, 11, 13].

Fig. 3b shows our first design, the 2T1M structure, for comparison with
Fig. 3a: two programming transistors are used to configure one junction.
During the configuration stage, the programming voltages VP1 and VP2 are
passed through the two programming transistors to set the 1M junction to
RON or ROFF (see Table I). If the 1M junction is RON, it connects “A” and
“B” during the operational stage. If the 1M junction is ROFF, it
disconnects “A” and “B”. The programming voltage |VP1 - VP2| is much
larger than the operational voltage |VAB|, as shown in Table I.

Our second design, a 2T2M structure, is shown in Fig. 3c. Compared with
both Fig. 3a and Fig. 3b, this design has one programming transistor with
a programming voltage VP used to configure two complementary junctions
connected to Vdd and GND, respectively. Note that |Vdd - VP| is used to
program the Vdd junction, while VP is used for the GND junction. (When one
junction is ON, the other junction is OFF.) Table I summarizes the typical
values of VP used to program the pass transistor in the programming stage;
it also shows how the combination of one RON junction and one ROFF
junction works as a voltage divider during the operational stage. The
ROFF/RON ratio determines the pass transistor gate voltage VG. ROFF/RON
must be large enough to ensure both VG < VT when the pass transistor is
OFF and VG > VT when the pass transistor is ON. Generally, the memristor
has a ratio in the range of 10^4 [10, 11, 13]; this is much higher than
the ratio of 119, which is assumed here as a lower bound. This 2T2M
routing switch operates exactly like an SRAM routing switch: VG is only
modestly reduced, which does not affect the ON/OFF operation of the pass
transistor.

Compared with a 7T SRAM cell, the 2T2M or 2T1M switch can have around a
3.5X density improvement. The standby power of the 2T2M or 2T1M switch is
equal to 1/3 (2 transistors / 6 transistors) of PSRAM. The 2T1M switch is
expected to operate faster than a 7T SRAM switch or a 2T2M switch, but the
2T2M switch leads to more reliable circuit operation than both the SRAM
and 2T1M designs. The proposed mFPGA structure considers both new routing
switches.

The proposed 2D mFPGA maintains the architecture of the baseline 2D FPGA
(Fig. 2), while utilizing RRAM devices to build several FPGA building
blocks. In particular, the block RAM memory is based on the NOR 1T1M
arrays; CB and SB are designed by using 2T1M or 2T2M routing switches; LB
is designed by using 1T1M cells.

Figure 4. The proposed mFPGA structure considers both new routing
switches. (a) The 4-by-4 CB structure, (b) replacement of a 7T switch with
a 2T1M switch, (c) replacement of a 7T switch with a 2T2M switch.

The 4-by-4 CMOS CB is shown in Fig. 4a [15]. By using the 2T1M routing
device to replace the 7T SRAM switch, we obtain a high-density and
low-power design (Fig. 4b). We also consider the use of the 2T2M switch to
replace the 7T switch in the CB (see Fig. 4c).

By using the 2T1M or 2T2M switch to replace each 7T SRAM switch in Fig. 5a
and 5b, we obtain CMOS-memristor designs for SB-1 and SB-2 (these two
designs are for 1-bit switching operation, as shown in Fig. 5c and 5d,
respectively). The 4-bit CB and SB operations are summarized in Table II
in terms of area, delay, and power performance estimations.
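The voltage-divider behaviour of the 2T2M switch and the transistor-count estimates quoted for the routing switches can be checked numerically. The following is a minimal sketch, not the authors' simulation: it assumes an ideal resistive divider with no gate leakage, assumes that a RON junction on the Vdd side pulls the gate high, and assumes that the 4-by-4 CB contains 16 switches. All function and variable names are hypothetical.

```python
# Values quoted in Table I and the surrounding text.
VDD = 1.2  # pass-transistor supply voltage (V)
VT = 0.4   # pass-transistor threshold voltage (V)


def gate_voltage(r_to_vdd, r_to_gnd, vdd=VDD):
    """Midpoint voltage of an ideal two-resistor divider (assumption)."""
    return vdd * r_to_gnd / (r_to_vdd + r_to_gnd)


def switch_levels(ratio, r_on=1.0):
    """Return (VG when ON, VG when OFF) for a given ROFF/RON ratio."""
    r_off = ratio * r_on
    vg_on = gate_voltage(r_to_vdd=r_on, r_to_gnd=r_off)   # Vdd junction is RON
    vg_off = gate_voltage(r_to_vdd=r_off, r_to_gnd=r_on)  # Vdd junction is ROFF
    return vg_on, vg_off


# Lower-bound ratio assumed in the text.
vg_on, vg_off = switch_levels(119)
print(f"ratio 119: VG(ON) = {vg_on:.2f} V, VG(OFF) = {vg_off:.2f} V")

# Transistor-count estimates for one routing switch.
sram_switch = 7       # 6T SRAM cell plus one pass transistor
memristor_switch = 2  # 2T1M or 2T2M
density_gain = sram_switch / memristor_switch  # 3.5X
standby_fraction = memristor_switch / 6        # 1/3 of PSRAM

# 4-by-4 CB, assuming 16 switches.
cb_switches = 16
cb_sram_transistors = cb_switches * sram_switch
cb_standby_memristor = cb_switches * standby_fraction  # in units of PSRAM
print(density_gain, cb_sram_transistors, round(cb_standby_memristor, 1))
```

With the ratio of 119, the sketch reproduces the 1.19 V and 0.01 V gate levels listed in Table I, which lie on either side of VT = 0.4 V, as well as the 112 n-transistors and 5.3 PSRAM entries for the CB.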
Figure 5. The 1-bit SB operation: (a) SB-1: CMOS design [15], (b) SB-2:
CMOS design [15], (c) SB-1: CMOS-memristor design (four branches are
required; only one branch is shown) and its equivalent circuit, (d) SB-2:
CMOS-memristor design (four branches are required; only one branch is
shown) and its equivalent circuit.

Table II. Area, delay, and standby power estimations for the 4-bit CB and
SB.

Block           Metric    CMOS SRAM design    2T1M-based design   2T2M-based design
4 × 4 CB        Area      112 n-transistors   8 n-transistors     32 n-transistors
(Fig. 4)        Standby   16 PSRAM            5.3 PSRAM           5.3 PSRAM
SB-1            Area      440 n-transistors   240 n-transistors   240 n-transistors
(Fig. 5 a, c)   Delay     τSB-1               τSB-1/3             τSB-1
                Standby
SB-2            Area      432 n-transistors   192 n-transistors   192 n-transistors
(Fig. 5 b, d)   Delay     τSB-2               τSB-2/3             τSB-2
Note: The p-transistor in the SRAM cell is equivalent to an n-transistor.
Each p-type transistor in the buffer (two inverters) or individual
inverter is similar to two n-type transistors.

In the proposed mFPGA architecture, the CMOS LB design is also modified by
utilizing 1T1M cells. Most commercial FPGAs use LBs (Fig. 6a) with basic
logic elements (BLEs, Fig. 6b) based on look-up tables (LUTs). Each LB
contains N BLEs fed by I cluster inputs. The BLE consists of a K-input LUT
and a register, which feed a two-input MUX that determines whether the
registered or unregistered LUT output drives the BLE output. A truth table
and the related implementation for a 2-input LUT are illustrated in Fig.
6c and Fig. 6d, respectively. In the conventional design, SRAM is used as
shown in Fig. 6e. Here, we use a 1T1M cell to replace each SRAM cell in
the design of the FPGA LB LUT (Fig. 6f) to reduce area and power
consumption.

As we analyzed before, one SRAM cell consists of six transistors while the
1T1M cell uses only one transistor. The BLE LUT consists of four SRAM
cells and six additional pass transistors. Therefore, compared with the
SRAM-based LUT design, the 1T1M-based design can have a 3X density
improvement. The standby power of the four 1T1M cells is equal to 1/6 (4
transistors / 24 transistors) of that of the four SRAM cells, leading to
2/3 PSRAM.

Figure 6. (a) LB (BLE: basic logic elements) [3], (b) BLE [3], (c) a truth
table for a 2-input logic, (d) a LUT implementation, (e) LUT with SRAM,
(f) LUT with 1T1M.

B. 3D mFPGA
The proposed mFPGA can be modified to a 3D FPGA structure. As shown in
Fig. 7, the 3D mFPGA is a face-to-face two-layer structure. The use of
1T1M cells and circuits in the 3D mFPGA is expected to give new
opportunities, similar to the 2D mFPGA, which will reduce power density
and thus improve the thermal performance of 3D ICs.

Figure 7. Architecture of the 2-layer 3D mFPGA based on face-to-face
bonding using bumps without area penalty.

IV. BENCHMARK SIMULATIONS
In order to demonstrate the efficiency of the proposed 2D and 3D mFPGAs,
we simulate the Toronto-20 FPGA benchmark circuits for 2D and 3D mFPGAs
following the methods described in [7, 8]. The simulation results are
summarized in Table III, including area, critical-path delay, and total
power consumption (including both dynamic and standby power values). These
results provide an accurate comparison between the proposed 2D and 3D
mFPGAs with
Table III. Simulation results for the benchmark circuits.

Circuit   Area (µm²)                                            Critical Path Delay (ns)            Power (mW)
          2D CMOS  2D mFPGA 2D mFPGA 3D CMOS  3D mFPGA 3D mFPGA 2D CMOS  2D mFPGA 3D CMOS  3D mFPGA 2D CMOS  2D mFPGA 2D mFPGA 3D CMOS  3D mFPGA 3D mFPGA
          [8]      (2T1M)   (2T2M)            (2T1M)   (2T2M)            (2T1M)            (2T1M)   [7]      (2T1M)   (2T2M)            (2T1M)   (2T2M)
alu4      137700   55080    60588    68850    47540    58654    5.1      4.59     2.04     1.8      0.062    0.056    0.058    0.0362   0.028    0.03
apex2     166050   66420    79704    83025    53210    61875    6        5.4      2.4      2.2      0.067    0.06     0.063    0.0421   0.033    0.036
apex4     414619   165848   197865   207310   102924   115432   5.5      4.95     2.2      2        0.042    0.038    0.04     0.0203   0.014    0.015
clma      623194   249278   348976   311597   144639   157865   13.1     11.79    5.24     4.7      0.2      0.18     0.19     0.178    0.147    0.149
diffeq    100238   40095    48765    50119    40048    43876    6        5.4      2.4      2.2      0.024    0.022    0.023    0.0152   0.011    0.014
elliptic  213638   85455    118769   106819   62728    71903    8.6      7.74     3.44     3.1      0.069    0.052    0.053    0.0502   0.04     0.042
ex1010    391331   156532   219876   195666   98266    113245   9        8.1      3.6      3.2      0.113    0.092    0.093    0.096    0.079    0.081
ex5p      100238   40095    52794    50119    40048    48765    5.1      4.59     2.04     1.8      0.0314   0.018    0.025    0.0105   0.009    0.011
frisc     230850   92340    101574   115425   66170    85439    11.3     10.17    4.52     4.1      0.0627   0.046    0.049    0.0472   0.047    0.048
misex3    124538   49815    69877    62269    44908    47654    5.3      4.77     2.12     1.9      0.0513   0.036    0.047    0.0299   0.022    0.023
pdc       369056   147622   189078   184528   93811    107653   9.6      8.64     3.84     3.5      0.101    0.081    0.089    0.087    0.061    0.063
s298      166050   66420    73062    83025    53210    55282    10.7     9.63     4.28     3.9      0.042    0.028    0.029    0.0261   0.019    0.021
s38417    462713   185085   204651   231357   112543   127530   7.3      6.57     2.92     2.6      0.124    0.102    0.108    0.122    0.091    0.093
s38584.1  438413   175365   210049   219207   107683   111235   4.8      4.32     1.92     1.7      0.136    0.112    0.115    0.121    0.098    0.099
seq       151369   60548    67456    75685    50274    55672    5.4      4.86     2.16     1.9      0.065    0.049    0.052    0.042    0.033    0.034
spla      326025   130410   179028   163013   85205    90378    7.3      6.57     2.92     2.6      0.087    0.068    0.069    0.0754   0.041    0.044
tseng     78469    31388    53297    39235    35694    38976    6.3      5.67     2.52     2.3      0.029    0.016    0.017    0.0101   0.009    0.01
Avg.      264382   105753   133848   132191   72877    81849    7.44     6.69     2.97     2.68     0.077    0.062    0.066    0.059    0.046    0.048
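The averages in the last row can be turned into relative improvements of the 2D mFPGA over the 2D CMOS baseline. The following is a minimal sketch under one stated assumption: the first three area values and the first three power values of the Avg. row are read as the 2D baseline, the 2D 2T1M design, and the 2D 2T2M design, in that order. The dictionary keys and helper names are hypothetical.

```python
# Avg. row values copied from the table; the column order is an assumption.
avg_area_um2 = {"cmos": 264382, "2t1m": 105753, "2t2m": 133848}
avg_power_mw = {"cmos": 0.077, "2t1m": 0.062, "2t2m": 0.066}


def gain(baseline, value):
    """How many times smaller `value` is than `baseline`."""
    return baseline / value


def saving(baseline, value):
    """Fractional reduction of `value` relative to `baseline`."""
    return 1.0 - value / baseline


area_gain = {k: gain(avg_area_um2["cmos"], avg_area_um2[k])
             for k in ("2t1m", "2t2m")}
power_saving = {k: saving(avg_power_mw["cmos"], avg_power_mw[k])
                for k in ("2t1m", "2t2m")}

for k in ("2t1m", "2t2m"):
    print(f"2D {k}: area gain {area_gain[k]:.2f}X, "
          f"power saving {100 * power_saving[k]:.1f}%")
```

Under this reading, the 2T1M-based 2D design occupies about 2.5 times less area on average than the baseline, and the 2T2M-based design slightly less than 2 times.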
