Device-to-System Performance Evaluation From Trans


The original manuscript was finished in 2017 but not submitted in time due to the delay in the preparation for source code release. Though some of the references and assumptions in this manuscript are dated, the key messages which the authors were trying to convey remain valid today amid the continuous advancement of one-/two-dimensional materials:
a) The intrinsic performance gain through the adoption of two-dimensional materials as the FET channel
materials can be greatly limited by the parasitic resistances.
b) Design technology co-optimization across the boundaries between devices, interconnects, circuits,
and systems can help direct resources toward the most promising candidates and maximize the values
of new technologies.
c) Machine learning algorithms can facilitate holistic design technology co-optimization.
As a result, we have decided to upload the manuscript to arXiv now, in the hope that our findings and insights can still benefit the research community.
Device-to-System Performance Evaluation:
from Transistor/Interconnect Modeling to VLSI
Physical Design and Neural-Network Predictor
Chi-Shuen Lee, Brian Cline, Saurabh Sinha, Greg Yeric, and H.-S. Philip Wong, Fellow, IEEE

Abstract—We present a DevIce-to-System Performance EvaLuation (DISPEL) workflow that integrates transistor and interconnect modeling, parasitic extraction, standard cell library characterization, logic synthesis, cell placement and routing, and timing analysis to evaluate system-level performance of new CMOS technologies. As the impact of parasitic resistances and capacitances continues to increase with dimensional downscaling, component-level optimization alone becomes insufficient, calling for a holistic assessment and optimization methodology across the boundaries between devices, interconnects, circuits, and systems. The physical implementation flow in DISPEL enables realistic analysis of complex wires and vias in VLSI systems and their impact on the chip power, speed, and area, which simple circuit simulations cannot capture. To demonstrate the use of DISPEL, a 32-bit commercial processor core is implemented using theoretical n-type MoS2 and p-type black phosphorus (BP) planar FETs at a projected 5-nm node, and the performance is benchmarked against Si FinFETs. While the superior gate control of the MoS2/BP FETs can theoretically provide a 51% reduction in the iso-frequency energy consumption, the actual performance can be greatly limited by the source/drain contact resistances. With the large amount of data generated by DISPEL, a neural network is trained to predict the key performance metrics of the 32-bit processor core using the characteristics of transistors and interconnects as the input features, without the need to go through the time-consuming physical implementation flow. The machine learning algorithms show great potential as a means for evaluation and optimization of new CMOS technologies and for identifying the most significant technology design parameters.

Index Terms—design-technology co-optimization, technology assessment, neural networks.

I. INTRODUCTION
As conventional scaling of Si transistors and Cu interconnects began to face significant difficulties [1-3], candidates to complement Si and Cu to extend CMOS technology scaling in the sub-10-nm technology nodes have been researched extensively. Active development of new technologies includes nanowires [4], two-dimensional (2D) semiconductors [5], and carbon nanotubes (CNT) for transistors [6]; and cobalt, graphene, and CNT for interconnects¹ [7-8]. The high cost of developing a new technology makes it vital to gain an early understanding of their potential benefits. Early insights into the key performance detractors help focus development efforts such that resources can be directed toward the most important challenges.
However, accurate technology assessment at an early stage is difficult due to the growing impact of parasitic and interconnect resistances and capacitances, which depend on the cell layouts and system architecture. Traditional simple benchmarks [9-10] become insufficient to capture the complexity of interconnects in a Very-Large-Scale Integration (VLSI) system. Ring oscillators with fixed fan-out and wire loads are a common benchmark to compare the power and speed of different technologies; their wire lengths are typically estimated by the average wire length of VLSI circuit implementations or a multiple of the contacted gate pitch (CGP). While this simple approach provides some insights, estimation of the wire load is not easy because different transistor drive current and capacitance can lead to different optimal circuit topologies (see Section IV). Other system-level models [11-14] employed Rent’s rule [15] to derive a stochastic wire distribution, optimize the interconnects given a gate delay model, and predict the final chip power, speed, and area. Despite the high-level abstraction of architectures and wiring optimization, many empirical parameters must be decided (e.g. the Rent’s constants), which can be architecture and/or technology dependent and need careful calibration. Recent works resorted to full physical design flows to assess system-level performance by actual implementation of a VLSI system or a large circuit module [16-19]. This approach provides the most accurate performance evaluation as complex wiring is considered realistically. Therefore, in this paper we present an end-to-end DevIce-to-System Performance EvaLuation (DISPEL) workflow that automates both Process Design Kit (PDK) development and physical design flows, which enables efficient system-level performance evaluation of transistor and interconnect technologies. A similar method has been employed in [19] to evaluate the benefits of carbon nanotube field-effect transistors (FETs). Here the emphasis is on: (i) the methodology of performance evaluation, (ii) the impact of parasitics on performance, and (iii) applying neural-network (NN) models to efficient processor core-level performance evaluation.
C.-S. Lee was with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305 USA. He is now with Google LLC. (e-mail:
chishuen@alumni.stanford.edu)
Brian Cline is with ARM Ltd.
Saurabh Sinha was with ARM Ltd. He is now with Apple Inc.
Greg Yeric was with ARM Ltd. He is now with Cerfe Labs Inc.
H.-S. P. Wong is with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305 USA (e-mail: hspwong@stanford.edu).
¹ The term “interconnect” in this paper refers to both wires and vias.
Fig. 1. Overview of the DevIce-to-System Performance EvaLuation Platform (DISPEL) workflow. Corresponding EDA tools or file formats are specified in the
parentheses. Acronyms–LEF: Library Exchange Format, Lib: Liberty model, RTL: Register Transfer Language, ITF: Interconnect Technology Format, VS: Virtual
Source [21].

This paper is organized as follows. The DISPEL workflow is introduced in Section II. The methodology of technology evaluation and optimization is presented in Section III, where, to demonstrate the use of DISPEL, CMOS devices composed of n-type MoS2 and p-type black phosphorus (BP) FETs are used to implement a 32-bit commercial processor core at a projected 5-nm node, and the performance is compared against Si FinFETs to stress the impact of device parasitics. In Section IV, NN models are trained to predict the core energy consumption, delay, and die area using the data generated by DISPEL, addressing the long runtime of physical design flows. In Section V, the NN models are analyzed to understand what has been learnt from the data. Finally, in Section VI, we discuss the limitations of DISPEL and the NN models, as well as potential future research directions.

II. DISPEL: DEVICE-TO-SYSTEM PERFORMANCE EVALUATION PLATFORM
An overview of the DISPEL workflow is depicted in Fig. 1. The flow consists of three parts: (i) transistor and interconnect modeling, (ii) PDK development, and (iii) physical design flows. The major steps are described in this section to provide essential knowledge for the rest of the sections.
A. Transistor and Interconnect Models
The compact Virtual-Source (VS) model [21] is used as a basis for the transistor modeling because the model parameters can be extracted from experimental data and have physical meaning in the quasi-ballistic transport regime. To ensure physically meaningful results, the apparent carrier mobility (μ) [22] and injection velocity (v) of VS models are extracted from either experimental data or physical simulations. In Fig. 2, VS models are fitted to the simulated current-voltage (I-V) curves of monolayer n-type MoS2 and p-type BP FETs based on the Non-Equilibrium Green’s function (NEGF) formalism [23-24] to extract μ and v (a parameter extractor is provided in [25]). Here μ and v should be viewed as fitting parameters reflecting the theoretical performance of MoS2 and BP FETs with clear physical meanings [22].
Fig. 2. Virtual Source model (lines) fitted to NEGF-based numerically simulated monolayer n-type MoS2 [23] and p-type black phosphorus (BP) [24] FETs (circles). (a) ID-VDS for BP and (b) MoS2. (c) ID-VGS for BP and (d) MoS2. LGATE = 10 nm and equivalent oxide thickness = 0.7 nm for both FETs.
Characteristics of interconnect wires, vias, and interlayer dielectrics are specified in an Interconnect Technology Format (ITF) file, including the dimensions, resistivities, and dielectric constants of each layer, which are parameterized for easy experimentation with different interconnect properties. The interconnects can be conceptually categorized into two parts: local middle-end-of-line (MEOL) and intermediate/global back-end-of-line (BEOL) interconnects, as illustrated in Fig. 3 (a) and (b). At advanced technology nodes, cell-level parasitics dominate over transistor resistance and capacitance (RC) [26]. One of the dominant components is the metal-to-semiconductor source/drain (S/D) contact resistance (RCON). Here RCON is modeled by RCON = ρCON/(LCON·W), assuming LCON is much shorter than the transfer length [27], where ρCON is the specific contact resistivity, LCON is the contact length, and W is the device width. RCON is included in the MEOL parasitic RC during the parasitic extraction step in the workflow rather than in the transistor
model because RCON depends on the circuit topology. For instance, when an S/D contact is shared by two parallel transistors, the impact of RCON is amplified, which is an effect difficult to capture inside a compact model. The BEOL interconnects include the Metal-1 (M1) layer and the layers above; copper (Cu) is used as the default material for wires and vias. Here the Steinhögl model [28] calibrated to experimental data [29] is employed to account for the dependence of Cu resistivity (ρCu) on the cross-sectional area, as shown in Fig. 4. ρCu increases dramatically with dimensional scaling due to electron scattering at the surfaces and grain boundaries.
Fig. 3. Illustration of (a) BEOL interconnects and (b) MEOL interconnects and cell-level parasitic RC. MA, MB are local interconnect metals; TS is trench silicide (for Si) or the metals directly connecting to the source (S) and drain (D); RCON is contact resistance; CG2C is gate-to-contact coupling capacitance.
Fig. 4. The Steinhögl model [28] fitted to experimental data [29], used in DISPEL to capture the copper resistivity vs. cross-section area relationship.
B. Process Design Kit (PDK) Creation
The PDK creation flow starts with dimensional scaling of standard cell (SC) layouts. The SC library used in this paper includes more than 100 cells. Fig. 5 illustrates the scaling of an INV_X1 cell (i.e. inverter with 1 finger) as an example. The cell width is proportional to CGP and the height is proportional to the M2 pitch. All the widths and lengths of the polygons in the layouts are parameterized such that they can be scaled easily by computer programs (e.g. Python scripts). The Design Rule Checks (DRC) are relaxed so that layouts pass the checks as long as two polygons on the same layer do not touch or overlap each other. The relaxed DRCs help us focus on the impact of scaling on performance without worrying about the constraints of photolithography. The scaled layouts are then tested by Layout Versus Schematic checks to ensure consistency after scaling. With the scaled layouts, netlist templates, and an ITF file, cell-level parasitic extraction is performed by Synopsys StarRC [30] to generate extracted SC netlists with all the parasitic RCs. Timing and power characteristics of SCs in the library are then characterized by Synopsys SiliconSmart [31].
Fig. 5. Illustration of layout scaling of the INV_X1 standard cell (i.e. inverter with 1 finger). Cell height is proportional to M2 pitch where N is the number of tracks, a constant number.
C. Physical Design
A simplified yet complete physical design flow is used to map a design in Register Transfer Language (RTL) into physical geometric representations of all the layers that can be manufactured. The flow involves floor-planning, logic synthesis, Design-For-Test scan chain insertion, cell placement, clock tree synthesis, signal routing, RC extraction, setup-/hold-time violation fix, and static timing analysis. Physical verification and signal integrity analysis are skipped as the focus here is performance. Inputs to the physical design flow include: (i) a technology file defining the metal layers used for signal routing (e.g. minimum metal width and separation); (ii) a circuit/system RTL design to be implemented (e.g. a microprocessor core); (iii) a floorplan specifying the die size, locations of the input/output (I/O) pins, positions of the module macros and blockages; (iv) a design constraint file (e.g. set clock uncertainty, maximum signal transition, etc.); (v) a target clock frequency (fTAR). Outputs at the end of the flow are the total chip power consumption, total cell area, and critical-path timing analysis. Rather than employing complex techniques to achieve timing closure after the physical design, an achieved clock frequency (fACH), defined as 1/fACH = (1/fTAR) − tSLACK, is used as the metric of speed, where tSLACK is the timing slack of the critical path.
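The achieved-frequency definition from the physical design flow, 1/fACH = (1/fTAR) − tSLACK, can be illustrated with a minimal sketch. The function name, the GHz/ns units, and the example numbers are assumptions made for illustration; they are not taken from the DISPEL source code.

```python
def achieved_frequency_ghz(f_tar_ghz: float, t_slack_ns: float) -> float:
    """Return fACH in GHz from 1/fACH = 1/fTAR - tSLACK.

    Units (GHz and ns) are an illustrative assumption; 1 / GHz = ns.
    Positive slack means the critical path is faster than the target period,
    so the achieved frequency exceeds the target; negative slack lowers it.
    """
    if f_tar_ghz <= 0:
        raise ValueError("target frequency must be positive")
    achieved_period_ns = 1.0 / f_tar_ghz - t_slack_ns
    if achieved_period_ns <= 0:
        raise ValueError("slack is not smaller than the target period")
    return 1.0 / achieved_period_ns


if __name__ == "__main__":
    # Hypothetical slack values for a 2 GHz target (0.5 ns period).
    for slack_ns in (-0.125, 0.0, 0.1):
        f_ach = achieved_frequency_ghz(2.0, slack_ns)
        print(f"slack = {slack_ns:+.3f} ns -> fACH = {f_ach:.3f} GHz")
```

Reporting fACH in this way lets a run that misses (or beats) its target still contribute a valid point to the energy-frequency analysis, instead of iterating the flow until the slack is exactly zero.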
TABLE I. PROJECTED 5-NM NODE METAL WIDTHS AND SPACINGS
Layer Min Width Min Spacing
M1, M2, M3 12 nm 12 nm
M4, M5 18 nm 18 nm
M6 24 nm 24 nm
V1, V2, V3 12 nm 12 nm
V4, V5 18 nm 18 nm

Fig. 7. Pareto-optimal energy-frequency curve for the 32-bit processor core implemented using n-MoS2/p-BP FETs following the methodology in Section III. Each dot represents an implementation for a target clock frequency.
Fig. 6. (a) Floorplan. I/O pins are located on the top of the core to connect to the
memory system. The die size may change but the aspect ratio (Height/Width)
and I/O locations remain fixed. (b) Top-down view of output GDS layout of an
implemented core (the lines in orange/dark red are metal lines).

III. SYSTEM-LEVEL PERFORMANCE ASSESSMENT OF NEW TRANSISTOR/INTERCONNECT TECHNOLOGIES
A commercial 32-bit microprocessor core is implemented at a projected 5-nm technology node using the monolayer n-type MoS2 and p-type BP FETs (see Fig. 2) to demonstrate the use of DISPEL. The 5-nm node is assumed to have a CGP of 36 nm and an M2 pitch of 24 nm, extrapolated from the 7-nm node [32]. Table I summarizes the minimum widths and spacings of metal wires and vias at different levels. M2 and the layers above, up to M6, are used for signal routing as the core is a power-efficient and relatively simple design. MoS2 and BP are selected to represent the 2D layered material family because MoS2 is the most mature among the family [33] and BP has the highest mobility in theory [34]. Evaluation of other transistor technologies has been studied in [16, 35]. This paper focuses on the methodology for performance evaluation, device structure optimization, and analysis of interconnects.
A. Methodology for Core Performance Evaluation
The processor core implementation flow is simplified as follows: (i) process-voltage-temperature (PVT) variations are not considered, i.e. only the nominal-case transistor and interconnect models at the typical corners are used in the timing and power analysis; (ii) Static Random-Access Memories (SRAMs) are removed from the core because predictions of SRAM performance rely heavily on many other factors such as process variability, which requires separate optimization [36-37] and is beyond the scope of this paper; (iii) a simple floorplan is used for all the core implementations throughout this paper, as shown in Fig. 6. All the I/O pins are located on one side of the core to connect to the memory system. While floor-planning can significantly affect the core performance [38], this simple floorplan allows us to focus on the impact of CMOS technologies on the system-level performance to gain early insights. The final outputs of the workflow are core energy consumption, fACH, and total cell area. Throughout this paper these numbers are normalized to an arbitrary number to emphasize the trend rather than the absolute values.
Fig. 8. Minimum core energy-delay product vs. gate spacer length (see Fig. 3b) with fixed CGP = 36 nm and LGATE = 10 nm. The low resistivity case (green) is assumed to have 0.1× the BEOL metal resistivity of the Cu case.
The primary performance metric used in this section is the Pareto-optimal energy vs. fACH trade-off curve (E-f curve), where the energy is calculated as the total power consumption divided by fACH. The E-f curves are generated as follows:
1. For a given supply voltage (VDD), tune the threshold voltage in the VS model to meet the target transistor leakage current (IOFF = 1 nA/μm throughout this paper).
2. Scale the standard cell layouts based on the input dimensional parameters (e.g. LGATE, LCON) and run parasitic extraction to generate extracted netlists.
3. Characterize the timing and power of the SC library.
4. Run the physical design flow for a given fTAR to generate the core energy consumption and fACH.
5. Repeat Step 4 over a range of fTAR with a coarse frequency interval (e.g. from 1 to 3 GHz with a step size of 0.2 GHz).
6. Pick the design at the maximum fACH from Step 5 and scale the die width and height proportionally with a fixed aspect ratio to achieve a cell utilization of 60%.
7. Repeat Step 4 around the maximum fACH with a finer step size (e.g. a step size of 0.02 GHz).
8. Repeat Steps 1 and 3-7 for different VDD’s (e.g. from 0.5 to 0.9 V with a step size of 0.1 V).
9. Generate a Pareto-optimal E-f curve by connecting the optimal design points, i.e. the point with the lowest energy consumption for any given fACH, as illustrated in Fig. 7.
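The final step of the procedure, selecting the Pareto-optimal points from all implementations, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Design` record, its field names and units, and the sample numbers are hypothetical, and the selection rule (keep a design only if no other design reaches an equal or higher fACH at equal or lower energy) is one common reading of "lowest energy for any given fACH", not necessarily the authors' exact script.

```python
from typing import List, NamedTuple


class Design(NamedTuple):
    """One core implementation (hypothetical record for illustration)."""
    vdd_v: float
    f_ach_ghz: float
    power_mw: float


def energy_pj(design: Design) -> float:
    # Energy = total power / fACH; mW / GHz gives pJ per cycle.
    return design.power_mw / design.f_ach_ghz


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep designs not dominated in (higher fACH, lower energy)."""
    # Sweep from the fastest design downward; a slower design survives only
    # if its energy is strictly lower than every faster (or equal) design.
    ordered = sorted(designs, key=lambda d: (-d.f_ach_ghz, energy_pj(d)))
    front = []
    best_energy = float("inf")
    for d in ordered:
        e = energy_pj(d)
        if e < best_energy:
            front.append(d)
            best_energy = e
    return front[::-1]  # ascending frequency, ready for plotting


if __name__ == "__main__":
    # Made-up sample points, not results from the paper.
    runs = [
        Design(0.6, 1.0, 10.0),
        Design(0.6, 1.5, 18.0),
        Design(0.7, 1.5, 24.0),
        Design(0.7, 2.0, 40.0),
        Design(0.8, 1.8, 45.0),
    ]
    for d in pareto_front(runs):
        print(f"VDD={d.vdd_v} V  fACH={d.f_ach_ghz} GHz  E={energy_pj(d):.1f} pJ")
```

Because the front is taken over all supply voltages at once, each point on the resulting E-f curve may come from a different VDD.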
Fig. 9. Statistics of 98 core implementations for different VDD’s after timing closure. (a) Average wire length of each metal layer (normalized to CGP = 36 nm)
between two connected standard cells on the top 20 critical paths (illustrated by the inset). (b) Average net length over a core. (c) Number of inverter and buffer
instances. (d) Ratio of inverter and buffer instances that have two or more fingers to the total number of inverters and buffers. (e)(f) Contributions of interconnect
(wires + vias) resistances (RITC) and capacitances (CITC) to the critical path delays (tCP), where tCP-R(C) is the delay with RITC (CITC) zeroed out.
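The contribution metric plotted in Fig. 9(e)(f) is defined in Section III.C as 1 − tCP-R(C)/tCP. A minimal sketch of that arithmetic is given below; the function name and the normalized delay values are illustrative assumptions, chosen so that the results land near the ~25% and ~40% contributions reported in the text.

```python
def interconnect_contribution(t_cp: float, t_cp_zeroed: float) -> float:
    """Fraction of the critical-path delay attributed to an interconnect term.

    t_cp        : critical-path delay with full interconnect RC
    t_cp_zeroed : the same path re-timed with RITC (or CITC) set to zero
    """
    if t_cp <= 0:
        raise ValueError("critical-path delay must be positive")
    return 1.0 - t_cp_zeroed / t_cp


if __name__ == "__main__":
    t_cp = 1.00          # normalized delay (hypothetical)
    t_cp_no_r = 0.75     # delay with RITC zeroed out (hypothetical)
    t_cp_no_c = 0.60     # delay with CITC zeroed out (hypothetical)
    print(f"RITC contribution: {interconnect_contribution(t_cp, t_cp_no_r):.0%}")
    print(f"CITC contribution: {interconnect_contribution(t_cp, t_cp_no_c):.0%}")
```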

B. Device Structure Optimization
Ultrathin (~1 nm) channel materials (e.g. 1D carbon nanotubes, 2D MoS2) can provide superior electrostatic gate control and carrier mobility to bulk materials (e.g. Si, Ge) [16] at similar channel thickness. To maintain good gate control, the channel body thickness of bulk materials must be thinned down, leading to drastic degradation in the carrier mobility [39]. Good gate control allows further reduction of the gate length (LGATE) without increasing subthreshold leakage current; shorter LGATE also helps reduce intrinsic gate capacitance and gives more space to the gate spacer and S/D contacts to lower the cell-level parasitic RC. Trade-offs exist between the lengths of the gate spacer (LSPA), S/D contacts (LCON), and LGATE for a fixed CGP: as LSPA increases, LCON decreases accordingly, resulting in decreasing gate-to-contact coupling capacitance (CG2C) but increasing RCON. To fully understand the potential of a transistor, it is important to explore the full design space and find an optimal design to evaluate its performance. Fig. 8 shows the design points with the minimum core energy-delay product (min-EDP) for different LSPA’s and LCON’s at VDD = 0.6 V and a fixed LGATE = 10 nm. The optimal design for min-EDP in this case happens at LSPA = 8 nm and LCON = 10 nm. To stress the importance of taking interconnects into account during device optimization, an artificial interconnect technology with lower resistivity is created to compare against the Cu case; it is assumed to have 0.1× the BEOL metal resistivity (including wires and vias) of Cu, obtained by modifying a parameter in the Steinhögl model. As shown in Fig. 8, the optimal device design shifts toward the left as the interconnect resistances (RITC) decrease, because RCON then becomes more dominant such that a shorter LSPA and thus longer LCON provides more benefits.
C. Interconnect Analysis
Evaluation of different wire materials other than Cu has been studied in [35]. In this section, wire distributions in the cores are analyzed to shed light on the complexity of interconnects. Interconnects in VLSI circuits/systems become increasingly complex with technology scaling due to increasing interconnect RC and more stringent lithography rules. Many design techniques (e.g. buffer insertion and gate sizing) are employed in modern place-and-route (P&R) EDA tools to place and connect standard cells on a die as densely as possible while meeting the timing requirements [41-42]. Analyses of wire distributions in the cores after timing closure for different fTAR’s and VDD’s are shown in Fig. 9. The average wire length, normalized to CGP, of each metal layer between two connected standard cells on the top 20 critical paths is shown in Fig. 9a. Wires on the low-level metal layers are much shorter than those on the high-level metal layers, as the P&R tool manages to avoid using more resistive layers to transmit signals over long distances. Meanwhile, design-rule violations and routing congestion need to be minimized, which makes the optimization problem even more complex. Average net length over an entire core and number of buffers (and inverters) vs. fACH for different VDD’s are shown in Fig. 9b and Fig. 9c. To achieve higher fACH’s for a fixed VDD, the average net length becomes shorter as more buffers are inserted to break long wires. Moreover, the sizes of buffers also become larger to deliver more drive current to meet the timing requirement, as shown in Fig. 9d. To quantify the impact of interconnect RC on the core speed, Fig. 9e and 9f show the contributions of RITC and interconnect capacitance (CITC) to the critical-path delay (tCP). The contribution is measured as 1 − tCP-R(C)/tCP, where tCP-R(C) is the critical-path
delay after zeroing out RITC (CITC) in the extracted netlists of the critical path. As fACH increases for a fixed VDD, the impact of RITC increases while the impact of CITC decreases, because wider logic gates are required to deliver more drive current (i.e. lower cell-level resistance but higher capacitance). In general, RITC and CITC contribute ~25% and ~40% of tCP, respectively, across different VDD’s. All in all, interconnects in a VLSI system are complex because various design techniques are used by the design tools to balance the interconnect RC against the transistor RC. Therefore, evaluation of the system-level performance without factoring in P&R optimization will be incomplete.
D. n-MoS2/p-BP Planar FETs vs. Si FinFET
n-MoS2/p-BP FETs are compared against a projected 5-nm Si FinFET using DISPEL to study their potentials and challenges. While device performance benchmarking is itself an important topic requiring careful analysis [45-46], the aim here is not to arrive at a definitive assessment of a technology, but rather to demonstrate the importance of considering both parasitic and intrinsic parts as well as transistors and interconnects in the performance evaluation.
The VS model for the Si FinFET is fitted to the 14-nm data [43] to extract μ and v and capture the I-V profile. Then μ and v are scaled up by 1.1× to match the projected intrinsic current assuming ballistic carrier transport [44]. The key VS model parameters and more details are summarized in the Appendix. The optimized device structure for the Si FinFET following the same methodology described in Section III.B is shown in Fig. 10. The Pareto-optimal E-f curves are compared between the theoretical n-MoS2/p-BP FET and Si FinFET in Fig. 11. With the same assumed ρCON of 10^-8 Ω-cm^2, the MoS2/BP FETs provide 51% lower iso-frequency energy consumption, or 39% faster iso-energy frequency, compared to the Si FinFET. The superior energy efficiency is attributed to the much lower cell-level capacitance, as shown in Fig. 12. Input capacitance of the INV_X1 cell based on the MoS2/BP FETs is 53% lower than the Si-FinFET counterpart. 60% of the MoS2/BP capacitance benefit comes from the reduced CG2C due to the longer LSPA (thanks to the better LGATE scalability) and its planar structure (as it does not have the gate-to-epi capacitance illustrated in Fig. 3b), which highlights the importance of device and standard cell co-optimization to reduce parasitic RC. However, metal-to-2D-material contact resistances still limit the performance of 2D FETs to date [39]. As shown in Fig. 11, for the MoS2/BP FETs to achieve performance comparable to the Si FinFET (with an assumed metal-to-Si ρCON of 10^-8 Ω-cm^2), a ρCON < 5×10^-8 Ω-cm^2 is required, which is ~6× lower than the ρCON for monolayer MoS2-to-metal contacts obtained from experiments [39]. Comparison of core area vs. frequency between the two transistor technologies is shown in Fig. 13.
Fig. 10. Device structures optimized for minimum core energy-delay product. (a) Planar n-MoS2/p-BP FET (b) Si FinFET. All numbers are in the unit of nm.
Fig. 11. Pareto-optimal energy-frequency curves of the projected n-MoS2/p-BP FET and the Si FinFET for different specific contact resistivities (ρCON).
Fig. 12. (a) Input pin capacitance and (b) drive current (i.e. VGS = VDS = VDD = 0.6 V) of the INV_X1 standard cell for the projected Si FinFET and n-MoS2/p-BP FET. CMEOL is the cell-level MEOL parasitic capacitance (see Fig. 3) and CFET is the transistor-level gate capacitance. For the Si FinFET case, there are 3 fins in the n and p active regions, respectively.
Fig. 13. Core area vs. clock frequency of the core implemented with the n-MoS2/p-BP FET and the Si FinFET for different VDD’s.
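To give a feel for the contact-resistance numbers discussed above, the following sketch evaluates the model RCON = ρCON/(LCON·W) from Section II.A for a few spacer lengths. Two things here are assumptions rather than statements from the paper: the pitch budget CGP = LGATE + 2·LSPA + LCON (it does reproduce the reported optimum of LSPA = 8 nm and LCON = 10 nm at CGP = 36 nm and LGATE = 10 nm), and the normalization to a 1 μm device width. The model itself is only valid when LCON is much shorter than the transfer length.

```python
NM_TO_CM = 1e-7
UM_TO_CM = 1e-4


def contact_length_nm(cgp_nm: float, l_gate_nm: float, l_spa_nm: float) -> float:
    # ASSUMPTION: one gate, two spacers and one shared contact per gate pitch.
    l_con = cgp_nm - l_gate_nm - 2 * l_spa_nm
    if l_con <= 0:
        raise ValueError("no room left for the S/D contact")
    return l_con


def contact_resistance_ohm(rho_con_ohm_cm2: float, l_con_nm: float,
                           width_um: float = 1.0) -> float:
    # RCON = rhoCON / (LCON * W), with all lengths converted to cm.
    area_cm2 = (l_con_nm * NM_TO_CM) * (width_um * UM_TO_CM)
    return rho_con_ohm_cm2 / area_cm2


if __name__ == "__main__":
    for rho in (1e-8, 5e-8):                      # ohm-cm^2, values from the text
        for l_spa in (4, 6, 8, 10):               # nm, illustrative sweep
            l_con = contact_length_nm(36, 10, l_spa)
            r_con = contact_resistance_ohm(rho, l_con)
            print(f"rhoCON={rho:.0e}  LSPA={l_spa} nm  LCON={l_con} nm  "
                  f"RCON={r_con:.1f} ohm per 1 um width")
```

The sweep makes the trade-off explicit: every nanometer given to the spacer is taken from the contact, so RCON rises as LSPA grows, and it scales linearly with ρCON.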
Fig. 14. 41 features are selected as the input to the neural-network model, including 30 features to characterize the logic performance (6 logic gates × 5 quantities measured from the chain of logic) and 9 features to characterize the interconnect performance.
Fig. 15. A fully connected neural network as the system performance predictor. Input: a vector of technology parameters and achieved clock frequency; output: core energy or area.
Fig. 16. Representative training and validation losses vs. epochs, showing no signs of overfitting.
Despite the lower drive current of the planar MoS2/BP FETs, the core area is not necessarily larger than that of the Si FinFET for the same fACH because of its lower cell capacitance. However, in the low frequency regime where the cell sizes are smaller in general (see Fig. 9d), the fixed capacitive loads at the core I/O pins become more important. Therefore, the MoS2/BP FETs require more area (by ~3%) to drive the I/O pins and wire loads compared to the Si-FinFET counterpart.
To conclude this section, one of the key advantages of transistors based on one-/two-dimensional channel materials over bulk material-based transistors (e.g. Si FinFETs) is their low cell-level parasitic RC thanks to the superior electrostatic gate control, which enables further scaling of LGATE without the need to grow the height of the device structure and gives more room for LSPA and LCON. Therefore, it is important to factor in the cell-level capacitance and optimize the device structure and SC layout during technology assessment.
IV. NEURAL-NETWORK PERFORMANCE PREDICTOR
While DISPEL provides accurate evaluation of system-level performance compared to the empirical models [11-14], the run time is much longer, ranging from hours to days depending on the design complexity and constraints, which is not ideal for early technology assessment or design space exploration. Therefore, we train machine-learning (ML) models to predict system-level performance with technology-level parameters as the inputs. Most of the existing empirical models for system-level performance prediction rely on a set of explicit equations and empirical parameters (e.g. Rent’s exponent, logic depth, fan-out), which require calibration to accurate data generated by a full physical design flow. As discussed in Section III.C, wiring optimization is such a complex process that it is very challenging for any explicit analytical model to predict the results accurately. In this regard, ML models appear to be a reasonable approach since they are good at discovering intricate structure in large data sets. As DISPEL is highly automated, generating data for a variety of technology parameters becomes straightforward, making it feasible to train ML models.
To prepare training and testing data sets, we repeat the process introduced in Section III.A for a variety of combinations of technology parameters (e.g. different μ’s and v’s in the FET models, ρCON’s, dielectric constants of gate spacers, BEOL metal resistivity, etc.), while the system architecture and design rules remain unchanged. Only the data points sitting on the Pareto-optimal curves for a fixed VDD are selected as they represent the optimal designs. The resulting data set has 2,763 samples. Each combination of technology parameters is then transformed into a vector of numbers, or features, as the inputs to the ML models, as illustrated in Fig. 14. The input features include: (i) 30 features characterizing the performance of 6 different logic gates. For each logic gate, 5 characteristics are derived from a simple fan-out-of-3 (FO3) circuit [62], including pull-up and pull-down drive currents (ION), falling and rising delays, and average switching energy consumption; (ii) 9 features characterizing the interconnect technology. Only M2, M4, and M6 are selected to represent the BEOL metal stacks because the physical properties (dimensions, resistivity) of M3/M5 are identical to those of M2/M4, and M1 is not used for routing; for vias, only V1, V3, and V5 are selected for the same reason, and only the resistances are considered because via capacitances are negligible; (iii) the last two features are VDD and fACH. The model output is either the core energy or area. Next, a fully connected NN model as illustrated in Fig. 15 is trained using TensorFlow [49] with the following design choices: (i) the activation function of each neuron is a Softplus function [50], i.e. f(x) = ln(1 + e^x); (ii) mean squared error is used as the regression’s loss function; (iii) the Adam algorithm [51] is used as the optimizer. Before training, input vectors are rescaled to a range of [−1, 1], and the neuron weights are initialized by the Xavier initialization [52] for training stability. The entire data set is split into training and test data
Fig. 18. Comparison of minimum core energy-delay product vs. gate spacer
length between the implementation results using DISPEL and the neural-
network (NN) model predictions for nanowire FETs with fixed CGP = 36 nm
and LGATE = 11 nm at VDD = 0.6 V.

Fig. 17. Comparison of the core (a) energy and (b) area vs. frequency between
the implementation results using DISPEL and the neural-network (NN) model
predictions for n-MoS2/p-BP FET+BEOL interconnect with 50% wire
resistance and 25% lower capacitance.

sets. The training set is further divided into 80/20 ratios for
training and validation, respectively. Hyper-parameters such as
learning rates, L2 regularization, and the architecture of NNs
are experimented to minimize the validation error. We found
empirically that a 2(-hidden)-layer NN with 40 neurons on the
first hidden layer and 20 neurons on the second layer achieves
the minimum losses (~4% test loss). Fig. 16 shows one
representative training curve over one million epochs without
any signs of overfitting.
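The network design described in Section IV (Softplus activations, inputs rescaled to [−1, 1], Xavier initialization, mean-squared-error loss, and an 80/20 training/validation split) can be sketched in a few lines. The following is a minimal NumPy illustration of the forward pass only, not the authors' TensorFlow implementation: the function names, the random placeholder data, and the linear output layer are assumptions made here for illustration, and training with the Adam optimizer is omitted.

```python
import numpy as np

def softplus(x):
    # Softplus activation f(x) = ln(1 + e^x), computed in a numerically stable way.
    return np.logaddexp(0.0, x)

def rescale(X, lo, hi):
    # Linearly map each feature column from [lo, hi] to [-1, 1].
    return 2.0 * (X - lo) / (hi - lo) - 1.0

def xavier_uniform(rng, n_in, n_out):
    # Xavier (Glorot) uniform initialization for an (n_in x n_out) weight matrix.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def init_params(rng, sizes=(41, 40, 20, 1)):
    # 41 input features -> 40 neurons (HL1) -> 20 neurons (HL2) -> 1 output.
    return [(xavier_uniform(rng, a, b), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    h = X
    for W, b in params[:-1]:
        h = softplus(h @ W + b)      # hidden layers use Softplus
    W, b = params[-1]
    return h @ W + b                 # linear output layer (an assumption)

def mse(pred, target):
    # Mean squared error, the regression loss named in the text.
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
X_raw = rng.uniform(0.0, 5.0, size=(200, 41))   # placeholder features, not DISPEL data
y = rng.uniform(0.0, 1.0, size=(200, 1))        # placeholder targets
X = rescale(X_raw, X_raw.min(axis=0), X_raw.max(axis=0))

# 80/20 split of the training set into training and validation subsets.
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, val_idx = idx[:n_train], idx[n_train:]

params = init_params(rng)
val_loss = mse(forward(params, X[val_idx]), y[val_idx])
```

With real data, `X_raw` would hold the 41 technology features and `y` the core energy or area produced by the physical design flow; the validation loss would then guide the hyper-parameter choices.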
While high accuracy is necessary for a good ML model, it is also important to verify its physical robustness. To this end, three test sets are created. Each test set is composed of a combination of transistor and interconnect technologies that the model has never seen in the training data set. The first test checks whether the model can capture the relations between the key performance metrics: core energy consumption, area, and clock frequency. The test data is based on a hypothetical interconnect technology with a BEOL wire resistivity (RWIRE) 50% lower than the baseline case of Cu and a 25% lower interlayer dielectric constant (i.e. lower wire capacitances). Fig. 17 shows the predicted Pareto optimal curves of energy and area vs. frequency compared against the results generated by DISPEL. To generate the predictions, the input clock frequency to the NN is swept from low to high while the rest of the input features are fixed. The predicted energy consumption and core area increase smoothly and monotonically with increasing frequency at an accelerating rate, which is a result of the shallow NN with just two hidden layers and is more physically meaningful than the results of other models with higher complexity. In-depth analysis of the model is discussed in Section V. Note that the deviation of the predictions from the implementation results (through DISPEL) is larger in the high frequency regime. To achieve higher frequencies, more large logic gates are needed. Beyond a certain point, the capacitances of the logic gates on the critical paths become so dominant that making the gates bigger does not increase the frequency anymore, and thus the implementation results become less predictable in the high frequency regime. Nonetheless, it is the region around the ‘knees’ of the curves (illustrated in Fig. 17) that matters the most, because that is where the most efficient designs (e.g. min-EDP) reside.
The second test checks whether the model can be used to explore the optimal device structure for a given performance metric. The test data is based on a stacked nanowire (NW) FET model fitted to numerical simulations [53] and is not present in the training data set. Performance analysis of NW FETs has been studied elsewhere [40,54] and is not the focus of this paper, so the NW FETs here should be viewed as yet another new transistor technology to be explored. Fig. 18 shows the predicted core min-EDP at VDD = 0.6 V compared against the implementation results for different LSPA’s and LCON’s (see Fig. 3) with fixed LGATE and CGP. While the predictions of the min-EDP are slightly off in the long-LSPA regime, the predicted optimal LSPA and LCON are reasonably close to the DISPEL results.

Fig. 19. (a) Comparison of minimum energy-delay product vs. wire resistance multiplier (XRW) between the DISPEL implementation results and neural-network (NN) predictions. The dotted lines are predictions of ring-oscillator (RO) models with different lengths of the wire loads. (b) Core area vs. frequency for different XRW’s.

Fig. 20. Weights of the 41 input features to two of the neurons on the first hidden layer: (a) the 17th neuron, reactive to BEOL wire resistivity; (b) the 14th neuron, reactive to logic gate delays and drive currents (ION).

Fig. 21. Outputs of the 20 neurons on the second hidden layer vs. input frequency. The pivot neuron is the only neuron that transitions from being inactive to active across the frequency range while the other neurons always stay either inactive or active.
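The frequency sweep used to generate the predicted curves, and the extraction of the min-EDP point near the knee, can be illustrated as follows. This is a hedged sketch: `toy_energy` is a hypothetical stand-in for the trained NN (a Softplus-shaped curve with a knee near 2 GHz, chosen only for illustration), and the EDP is assumed here to be the energy per cycle multiplied by the clock period, i.e. E/f, which may differ in detail from the definition used in the paper.

```python
import numpy as np

def toy_energy(f_ghz):
    # Hypothetical stand-in for the trained predictor: nearly flat at low
    # frequency, rising steeply past a "knee" near 2 GHz.
    return 1.0 + 0.5 * np.logaddexp(0.0, (f_ghz - 2.0) / 0.2)

def min_edp(freqs, energies):
    # Assumed definition: EDP = energy per cycle x clock period = E / f.
    edp = energies / freqs
    i = int(np.argmin(edp))
    return freqs[i], edp[i]

# Sweep the frequency input from low to high with all other features fixed.
freqs = np.linspace(0.5, 4.0, 351)
energies = toy_energy(freqs)
f_opt, edp_opt = min_edp(freqs, energies)
```

For this toy curve the minimum-EDP frequency falls just below the knee, which mirrors the observation that the most efficient designs reside around the knees of the energy-frequency curves.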
The third test checks whether the impact of RWIRE on the core performance is captured by the model, as RWIRE is expected to increase rapidly with dimensional scaling. The test data is created by artificially multiplying RWIRE of M2 to M6 by a factor of XRW (from 0.1 to 4) in the ITF file in the DISPEL workflow. The predicted min-EDPs of the core vs. XRW are compared with the implementation results in Fig. 19a. The min-EDP grows sub-linearly with increasing XRW as the P&R tool manages to reduce the impact of wire resistances by, for instance, inserting more buffers. Consequently, the core areas also increase with increasing XRW, as shown in Fig. 19b. The NN model is able to capture the nonlinear relation between EDP and XRW, whereas a ring-oscillator model with a fixed-length wire load would predict a linear increase in EDP as XRW increases (as shown by the dotted lines in Fig. 19a).

V. ANALYSIS OF NEURAL-NETWORK MODEL
In-depth analysis of the 2(-hidden)-layer NN model introduced in Section IV is presented in this section. The first hidden layer (HL1) has 40 neurons. Each neuron has 41 weights corresponding to the 41 input features (see Fig. 15). The weights reflect the sensitivity of each neuron to different features. For instance, Fig. 20 shows that the 17th neuron is particularly reactive to RWIRE, while the 14th neuron is reactive to the logic gate characteristics. The weights corresponding to the logic gate ION and delay features have opposite polarities, which matches the intuition because large ION’s are preferred for performance whereas large delays are unfavorable. The second hidden layer (HL2) has 20 neurons, and its weights become too opaque to draw insights from. Nonetheless, as shown in Fig. 21, it is found that only one neuron transitions from an inactive state (i.e. the output is close to 0) to an active state (i.e. the output is >> 0) as the input frequency increases. The other neurons always stay either inactive or active across the frequency range. It is the transition of this pivot neuron from being inactive to active that leads to the “hockey-stick” shape of the energy (and area) vs. frequency relations, and the smooth transition is a characteristic of the Softplus function. The effects of the other neurons are mainly to shift the E-f (and A-f) curves in the vertical and/or lateral directions. For example, an increase in RWIRE raises the output of HL1’s 17th neuron, which activates the pivot neuron on HL2 earlier and shifts the curves toward the left, i.e. higher energy and larger area for the same frequency.
Interpreting what representations an NN has learned is still an active research area [55]. To see the effect of using the Softplus function, another NN with Rectified Linear Units (ReLUs) [56] (i.e. f(x) = max(0, x)) as the activation functions is trained, and the result is shown in Fig. 22 compared against the predictions of the Softplus-based NN. The zigzag pattern is apparent in the output of the ReLU-based NN because the output of a ReLU is a piecewise linear function, and overfitting becomes more likely to happen. Similar outcomes of overfitting are also observed in other ML models such as NNs with more hidden layers or random forest regression models. The 2-hidden-layer NN with Softplus activation functions is found to be the optimal model architecture, giving both accurate predictions and smooth outputs, which reduces the chance of overfitting.

Fig. 22. Comparison of the predicted energy-frequency curves between two neural-network models with different activation functions: the Softplus vs. Rectifier functions.
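The two observations in this section, a single pivot neuron that switches from inactive to active across the frequency sweep and the smoother response of Softplus compared with ReLU, can be illustrated with a toy example. This is a sketch under stated assumptions: the three-neuron layer, its pre-activation values, and the thresholds `low` and `high` are all hypothetical and are not taken from the trained model.

```python
import numpy as np

def softplus(x):
    # f(x) = ln(1 + e^x)
    return np.logaddexp(0.0, x)

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def find_pivot_neurons(acts, low=0.1, high=1.0):
    # acts has shape (n_sweep_points, n_neurons). A neuron counts as a pivot
    # if it is inactive (< low) at the start of the sweep and active (> high)
    # at the end. Both thresholds are illustrative choices.
    return np.where((acts[0] < low) & (acts[-1] > high))[0]

# Toy second hidden layer with three neurons; the swept input is the
# rescaled frequency in [-1, 1].
f = np.linspace(-1.0, 1.0, 201)
pre = np.stack([np.full_like(f, -6.0),   # always inactive
                8.0 * f,                 # pivot: inactive -> active
                np.full_like(f, 3.0)],   # always active
               axis=1)

pivots = find_pivot_neurons(softplus(pre))

# Largest change in slope along the sweep for the pivot neuron: the ReLU
# output has a kink at zero, while the Softplus output bends gradually.
kink_relu = np.abs(np.diff(relu(pre[:, 1]), n=2)).max()
kink_softplus = np.abs(np.diff(softplus(pre[:, 1]), n=2)).max()
```

In this toy case only the middle neuron is flagged as a pivot, and the slope change of the ReLU output is much larger than that of the Softplus output, which is consistent with the zigzag pattern attributed to piecewise linear activations.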
VI. DISCUSSION AND OUTLOOK
In this section, limitations of the DISPEL workflow and the ML-based performance predictors, as well as possible ways to improve them, are discussed. While only nominal cases at the typical corner were analyzed, PVT variations can be easily incorporated in DISPEL through multi-corner multi-mode analysis [57]. The key challenge is to create models that capture the process variations in the new transistor and interconnect technologies, which is a non-trivial task because new technologies are often not mature enough to provide sufficient amounts of data at different corners. Similarly, while integrating memory instances (e.g. SRAM) into DISPEL is possible, creating memory models and compilers [58] that accurately capture PVT variations to ensure adequate margins for millions of memory cells requires a significant amount of work.
The main idea of the DISPEL workflow is to streamline the process from end to end to provide a holistic view of the impact of CMOS technologies on the system-level performance. Performance evaluation across different technology nodes can provide insights into the benefits of further dimensional downscaling and design guidance. However, several parts are skipped in this paper for simplicity, including the design constraints of different lithography technologies and floor plan optimization. For more rigorous and realistic assessment, proper design rules and floor plan optimization need to be taken into account, which by themselves are also big research topics.
Thanks to the highly automated workflow of DISPEL, a large amount of data can be generated to leverage the power of ML algorithms to discover the dependencies of system-level performance on technology-level design parameters. The NN model presented in this paper is an attempt to predict the performance of a specific system (i.e. the 32-bit processor core) at a particular node (i.e. the projected 5-nm node). To generalize the method to different system architectures and/or technology nodes, the input features must be modified to encapsulate the architectural information. Rent’s exponent and constant are an example of abstracting a certain type of system into a few numbers [11]. To account for dimensional downscaling in a more general sense, CGP and interconnect dimensions need to be treated as free input variables. As with all ML problems, feature engineering can make a big difference. One can choose device- or process-level parameters such as LGATE, CGP, μ, or ρCON rather than the logic-level features. In any case, a large amount of data is required, which calls for a highly integrated and automated design flow across the boundaries between devices, interconnects, circuits, and systems like DISPEL.

VII. CONCLUSION
The Device-to-System Performance EvaLuation (DISPEL) platform presented in this paper provides a framework for assessment of new transistor and interconnect technologies from the standpoint of system-level performance through a highly integrated workflow from transistor and interconnect modeling to physical design flow. Full-chip placement and routing enables accurate evaluation of the system performance that simple benchmark circuits cannot offer. Using DISPEL, we demonstrate how device structures can be optimized to reduce the impact of parasitic RC on the performance of a 32-bit processor core at the projected 5-nm node and provide a more accurate view of the advantages of 2D-channel-material FETs and their challenges. The large amount of data generated by the highly automated DISPEL workflow is used to train neural-network models to predict the performance of the processor core. A two-hidden-layer neural network with Softplus activation functions is found to achieve the most accurate and physically favorable results. As technology scaling becomes ever more challenging, highly integrated and automated design flows across the boundaries between devices, interconnects, circuits, and systems like DISPEL can facilitate technology development and provide design guidance in the early stage.

APPENDIX
Key parameters of the VS models for the theoretical n-type MoS2 FET, p-type BP FET, and the projected Si FinFET at the 5-nm node are listed in Table II. For the n-MoS2 and p-BP FETs, μ and v are extracted from the current-/capacitance-voltage characteristics of physics-based numerical simulations [18, 19, 48] and should be viewed as theoretical predictions; for the Si FinFET, μ and v are extracted from the 14-nm FinFET experimental data [43] (Fig. 23) and scaled up by 1.1× to match the projection assuming ballistic transport [44]. LGATE is set at the value such that the subthreshold slope is 70 mV/dec based on numerical simulations. The inversion gate-to-channel capacitance (CINV) is derived from CINV = COX·CQ / (COX + CQ), where COX = εSiO2/EOT, CQ is the quantum capacitance, εSiO2 is the permittivity of SiO2, and EOT = 0.7 nm.

TABLE II. KEY VIRTUAL SOURCE MODEL PARAMETERS
Parameter                    | Theoretical n-MoS2 FET | Theoretical p-BP FET | Projected Si FinFET
v (10^7 cm/s)                | 1.17                   | 1.7                  | 0.97
μ (cm^2/V-s)                 | 200                    | 350                  | 253
LGATE (nm)                   | 10                     | 10                   | 18
CINV (μF/cm^2)               | 4.36                   | 4.26                 | 3.14
Fin Width/Height/Pitch (nm)  | N/A                    | N/A                  | 5/30/21
SS (mV/dec)                  | 70 (all three devices)

Fig. 23. Virtual Source model (lines) fitted to the 14-nm Si FinFET data [43] (circles) to extract carrier mobility (μ) and velocity (v). (a) ID-VDS and (b) ID-VGS. Effective thickness of the gate oxide is assumed to be 1.2 nm.
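The series-capacitance relation for CINV given in the Appendix can be checked numerically. The sketch below is only an arithmetic illustration: the relative permittivity of SiO2 (3.9) is the standard textbook value and is an assumption not stated in the text, the function names are hypothetical, and the quantum capacitance obtained by inverting the formula for the n-MoS2 FET is a back-calculation from Table II, not a value reported by the authors.

```python
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9              # relative permittivity of SiO2 (textbook value, assumed)

def c_ox_uF_per_cm2(eot_nm):
    # COX = eps_SiO2 / EOT, converted from F/m^2 to uF/cm^2 (1 F/m^2 = 100 uF/cm^2).
    return K_SIO2 * EPS0 / (eot_nm * 1e-9) * 100.0

def c_inv(c_ox, c_q):
    # Series combination: CINV = COX * CQ / (COX + CQ).
    return c_ox * c_q / (c_ox + c_q)

def c_q_from_c_inv(c_ox, c_inv_value):
    # Inverse of the relation above; valid only when c_inv_value < c_ox.
    return c_ox * c_inv_value / (c_ox - c_inv_value)

c_ox = c_ox_uF_per_cm2(0.7)                 # about 4.93 uF/cm^2 for EOT = 0.7 nm
c_q_mos2 = c_q_from_c_inv(c_ox, 4.36)       # implied CQ for the n-MoS2 FET row
c_inv_check = c_inv(c_ox, c_q_mos2)         # recovers the tabulated 4.36 uF/cm^2
```

Because CINV is a series combination, it is always smaller than COX, which is why all three CINV entries in Table II lie below the oxide capacitance computed here for EOT = 0.7 nm.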
ACKNOWLEDGMENT
This work was supported in part through the NCN-NEEDS program, which was funded by the National Science Foundation, contract 1227020-EEC, and by the Semiconductor Research Corporation, and through Systems on Nanoscale Information fabriCs (SONIC), and Function Accelerated NanoMaterials Engineering (FAME), two of the six SRC STARnet Centers, sponsored by MARCO and DARPA, as well as the member companies of the Stanford SystemX Alliance and the Initiative for Nanoscale Materials and Processes (INMP) at Stanford University.

REFERENCES
[1] T. Skotnicki, J. A. Hutchby, T.-J. King, H.-S. P. Wong, and F. Boeuf, “The road to the end of CMOS scaling,” IEEE Circuits Devices Mag., vol. 21, no. 1, pp. 16–26, Jan./Feb. 2005.
[2] R. Brain, “Interconnect Scaling: Challenges and Opportunities,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 9.3.1–9.3.4, Dec. 2016.
[3] K. Ronse, P. De Bisschop, G. Vandenberghe, E. Hendrickx, R. Gronheid, A. Vaglio Pret, A. Mallik, D. Verkest, and A. Steegen, “Opportunities and challenges in device scaling by the introduction of EUV lithography,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 18.5.1-18.5.4, 2012.
[4] V. Moroz, L. Smith, J. Huang, M. Choi, T. Ma, J. Liu, Y. Zhang, X.-W. Lin, J. Kawa, and Y. Saad, “Modeling and Optimization of Group IV and III-V FinFETs and Nano-Wires,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 7.4.1–7.4.4, Dec. 2014.
[5] C. D. English, K. Smithe, R. Xu, and E. Pop, “Approaching Ballistic Transport in Monolayer MoS2 Transistors with Self-Aligned 10 nm Top Gates,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 5.6.1–5.6.4, Dec. 2016.
[6] G. S. Tulevski, A. D. Franklin, D. Frank, J. M. Lobez, Q. Cao, H. Park, A. Afzali, S.-J. Han, J. B. Hannon, and W. Haensch, “Toward High-Performance Digital Logic Technology with Carbon Nanotubes,” ACS Nano, vol. 8, no. 9, pp. 8730-8745, 2014.
[7] Z. Tőkei, “End of Cu roadmap and beyond Cu,” in Proc. IEEE Int. Interconnect Technol. Conf./Adv. Metallization Conf. (IITC/AMC), Short Course, 2016.
[8] D. Kondo, H. Nakano, B. Zhou, A. I, K. Hayashi, M. Takahashi, S. Sato and N. Yokoyama, “Sub-10-nm-wide intercalated multi-layer graphene interconnects with low resistivity,” in Proc. IEEE Int. Interconnect Technol. Conf./Adv. Metallization Conf. (IITC/AMC), pp. 189-192, 2014.
[9] J. Ryckaert et al., “Design Technology Co-Optimization for N10,” in Proc. IEEE Custom Integr. Circuits Conf. (CICC), pp. 1-8, 2014.
[10] S. Sinha, L. Shifren, V. Chandra, B. Cline, G. Yeric, R. Aitken, B. Cheng, A. Brown, C. Riddet, C. Alexandar, C. Millar, and A. Asenov, “Circuit design perspectives for Ge FinFET at 10nm and beyond,” in Proc. Int. Symp. Quality Electron. Design (ISQED), pp. 57-60, 2015.
[11] H. B. Bakoglu and J. D. Meindl, “A System-Level Circuit Model for Multi- and Single-Chip CPUs,” IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 308-309, 1987.
[12] D. Sylvester and K. Keutzer, “System-Level Performance Modeling with BACPAC - Berkeley Advanced Chip Performance Calculator,” International Workshop on System-Level Interconnect Prediction, pp. 109-114, 1999.
[13] D. J. Frank, W. Haensch, G. Shahidi, and O. H. Dokumaci, “Optimizing CMOS Technology for Maximum Performance,” IBM J. Res. & Dev., vol. 50, no. 4/5, pp. 419-431, Jul./Sep. 2006.
[14] S. Wang, A. Pan, C. O. Chui, and P. Gupta, “Proceed: A pareto optimization-based circuit-level evaluator for emerging devices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 1, pp. 192-205, Jan. 2016.
[15] B. S. Landman and R. L. Russo, “On a pin versus block relationship for partitions of logic paths,” IEEE Trans. Comput., vol. C-20, pp. 1469–1479, Dec. 1971.
[16] G. Hills, M. G. Bardon, G. Doornbos, D. Yakimets, P. Schuddinck, R. Baert, D. Jang, L. Mattii, S. M. Y. Sherazi, D. Rodopoulos, R. Ritzenthaler, C.-S. Lee, A. V.-Y. Thean, I. Radu, A. Spessot, P. Debacker, F. Catthoor, P. Raghavan, M. M. Shulaker, H.-S. P. Wong, and S. Mitra, “Understanding Energy Efficiency Benefits of Carbon Nanotube Field-Effect Transistors for Digital VLSI,” IEEE Trans. Nanotechnol., vol. 17, no. 6, pp. 1259–1269, Nov. 2018.
[17] C. Pan, P. Raghavan, A. Ceyhan, F. Catthoor, Z. Tokei, and A. Naeemi, “Technology/Circuit/System Co-Optimization and Benchmarking for Multilayer Graphene Interconnects at Sub-10-nm Technology Node,” IEEE Trans. Electron Devices, vol. 62, no. 5, pp. 1530-1536, May 2015.
[18] J. Shi, D. Nayak, S. Banna, R. Fox, S. Samavedam, S. Samal, and S. K. Lim, “A 14nm Finfet Transistor-Level 3D Partitioning Design to Enable High-Performance and Low-Cost Monolithic 3D IC,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 2.5.1–2.5.4, Dec. 2016.
[19] K. Chang, K. Acharya, S. Sinha, B. Cline, G. Yeric, and S. K. Lim, “Impact and Design Guideline of Monolithic 3-D IC at the 7-nm Technology Node,” IEEE Trans. Very Large Scale Integration Syst., vol. PP, pp. 1-12, Apr. 2017.
[20] C.-S. Lee and H.-S. P. Wong. (2017). Device-to-System Performance EvaLuation tool (DISPEL) [Online]. Available: https://nano.stanford.edu/device-system-performance-evaluation-tool
[21] A. Khakifirooz, O. M. Nayfeh, and D. Antoniadis, “A Simple Semi-empirical Short-Channel MOSFET Current–Voltage Model Continuous Across All Regions of Operation and Employing Only Physical Parameters,” IEEE Trans. Electron Devices, vol. 56, no. 8, pp. 1674–1680, Aug. 2009.
[22] M. S. Lundstrom and D. A. Antoniadis, “Compact Models and the Physics of Nanoscale FETs,” IEEE Trans. Electron Devices, vol. 61, no. 2, pp. 225-233, Feb. 2014.
[23] L. Liu, Y. Lu, and J. Guo, “On Monolayer MoS2 Field-Effect Transistors at the Scaling Limit,” IEEE Trans. Electron Devices, vol. 60, no. 12, pp. 4133-4139, Dec. 2013.
[24] X. Cao and J. Guo, “Simulation of Phosphorene Field-Effect Transistor at the Scaling Limit,” IEEE Trans. Electron Devices, vol. 62, no. 2, pp. 659-665, Feb. 2015.
[25] S. Rakheja, D. Antoniadis, (2014). MVS Nanotransistor Model (Silicon). nanoHUB. doi:10.4231/D3H12V82S
[26] A. V-Y Thean, D. Yakimets, T. H. Bao, P. Schuddinck, S. Sakhare, M. G. Bardon, A. Sibaja-Hernandez, I. Ciofi, G. Eneman, A. Veloso, J. Ryckaert, P. Raghavan, A. Mercha, A. Mocuta, Z. Tokei, D. Verkest, P. Wambacq, K. De Meyer, and N. Collaert, “Vertical Device Architecture for 5nm and beyond: Device & Circuit Implications,” in Proc. VLSI Technol. Symp., pp. T26-T27, Jun. 2015.
[27] G. K. Reeves and H. B. Harrison, “Obtaining the specific contact resistance from transmission line model measurements,” IEEE Electron Device Lett., vol. 3, no. 5, pp. 111-113, May 1982.
[28] W. Steinhogl, G. Schindler, G. Steinlesberger, and M. Engelhardt, “Size-dependent resistivity of metallic wires in the mesoscopic range,” Phys. Rev. B, vol. 66, pp. 075414, 2002.
[29] A. Pyzyna, R. Bruce, M. Lofaro, H. Tsai, C. Witt, L. Gignac, M. Brink, M. Guillorn, G. Fritz, H. Miyazoe, D. Klaus, E. Joseph, K. P. Rodbell, C. Lavoie, D.-G. Park, “Resistivity of copper interconnects beyond the 7 nm node,” in Proc. VLSI Technol. Symp., pp. T120-T121, Jun. 2015.
[30] Synopsys. (2015). StarRC™ User Guide and Command Reference: Product Version K-2015.06. Mountain View, CA: Author.
[31] Synopsys. (2015). SiliconSmart ACE User Guide: Product Version K-2015.06. Mountain View, CA: Author.
[32] M. G. Bardon, P. Raghavan, G. Eneman, P. Schuddinck, M. Dehan, A. Mercha, A. Thean, D. Verkest, and A. Steegen, “Group IV channels for 7nm FinFETs: Performance for SoCs Power and Speed Metrics,” in Proc. VLSI Technol. Symp., pp. 88-89, Jun. 2014.
[33] G. R. Bhimanapati, “Recent Advances in Two-Dimensional Materials beyond Graphene,” ACS Nano, vol. 9, no. 12, pp. 11509-11539, 2015.
[34] Aron Szabo, Reto Rhyner, Hamilton Carrillo-Nunez, and Mathieu Luisier, “Phonon-limited performance of single-layer, single-gate black phosphorus n- and p-type field-effect transistors,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 12.1.1–12.1.4, Dec. 2015.
[35] C.-S. Lee, B. Cline, S. Sinha, G. Yeric, and H.-S. P. Wong, “32-bit Processor Core at 5-nm Technology: Analysis of Transistor and Interconnect Impact on VLSI System Performance,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 28.3.1–28.3.4, Dec. 2016.
[36] B. Amrutur and M. Horowitz, “A Replica Technique for Wordline and Sense Control in Low-Power SRAM’s,” IEEE J. Solid-State Circuits, vol. 33, no. 8, pp. 1208-1219, Aug. 1998.
[37] A. Raychowdhury, B. Geuskens, J. Kulkarni, J. Tschanz, K. Bowman, T. Karnik, S.-L. Lu, V. De, M. Khellah, “PVT-and-Aging Adaptive Wordline Boosting for 8T SRAM Power Reduction,” IEEE Int. Solid-State Circuits Conf., Dig. Tech., pp. 351-353, 2010.
[38] G. Yeric, “Challenges of 7nm CMOS Technologies: Circuit Application Requirements,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Short Course, Dec. 2014.
[39] C. D. English, G. Shine, V. E. Dorgan, K. C. Saraswat, and E. Pop, “Improved Contacts to MoS2 Transistors by Ultra-High Vacuum Metal Deposition,” Nano Lett., vol. 16, pp. 3824-3830, May 2016.
[40] C.-S. Lee, E. Pop, A. D. Franklin, W. Haensch, and H.-S. P. Wong, “A Compact Virtual-Source Model for Carbon Nanotube Field-Effect Transistors in the Sub-10-nm Regime - Part II: Extrinsic Elements, Performance Assessment, and Design Optimization,” IEEE Trans. Electron Devices, vol. 62, no. 9, pp. 3070-3078, Sep. 2015.
[41] K. Shahookar and P. Mazumder, “VLSI cell placement techniques,” ACM Computing Surveys, vol. 23, no. 2, pp. 143-220, Jun. 1991.
[42] J. Vygen, “Algorithms for large-scale flat placement,” in Proc. ACM/IEEE Design Automation Conf. (DAC), pp. 746-751, 1997.
[43] S. Natarajan et al., “A 14nm logic technology featuring 2nd-generation FinFET Transistors, air-gapped interconnects, self-aligned double patterning and a 0.0588 µm2 SRAM cell size,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 3.7.1-3.7.4, Dec. 2014.
[44] L. Smith, M. Choi, M. Frey, V. Moroz, A. Ziegler, and M. Luisier, “FinFET to Nanowire Transition at 5nm Design Rules,” in Proc. IEEE Int. Conf. Simul. Semiconductor Process. Devices, pp. 254–257, Sep. 2015.
[45] L. Shifren, R. Aitken, A. R. Brown, V. Chandra, B. Cheng, C. Riddet, C. L. Alexander, B. Cline, C. Millar, S. Sinha, G. Yeric, and A. Asenov, “Predictive Simulation and Benchmarking of Si and Ge pMOS FinFETs for Future CMOS Technology,” IEEE Trans. Electron Devices, vol. 61, no. 7, pp. 2271-2277, Jul. 2014.
[46] D. E. Nikonov and I. A. Young, “Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits,” IEEE J. Exploratory Solid-State Comput. Devices Circuits, vol. 1, no. 1, pp. 3-11, Dec. 2015.
[47] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[48] L. Scheffer, L. Lavagno, and G. Martin, EDA for IC Implementation, Circuit Design, and Process Technology. Boca Raton, FL, U.S.A.: CRC Press, 2006, pp. 5.1-5.23.
[49] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[50] X. Glorot, A. Bordes, and Y. Bengio, “Deep Sparse Rectifier Neural Networks,” in Proc. Conf. Artificial Intelligence and Statistics, pp. 315-323, 2011.
[51] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[52] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. Conf. on Artificial Intelligence and Statistics, vol. 9, pp. 249-256, 2010.
[53] M. Choi, V. Moroz, L. Smith, and J. Huang, “Extending drift-diffusion paradigm into the era of FinFETs and nanowires,” in Proc. IEEE Int. Conf. Simul. Semiconductor Process. Devices, pp. 242–245, Sep. 2015.
[54] D. Jang, D. Yakimets, G. Eneman, P. Schuddinck, M. G. Bardon, P. Raghavan, A. Spessot, D. Verkest, and A. Mocuta, “Device Exploration of NanoSheet Transistors for Sub-7-nm Technology Node,” IEEE Trans. Electron Devices, vol. 64, no. 6, pp. 2707-2713, Jun. 2017.
[55] D. Castelvecchi, “Can we open the black box of AI?” Nature, vol. 538, pp. 21-23, Oct. 2016.
[56] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Proc. Neural Information Processing Systems (NIPS), 2012.
[57] S. Onaissi, F. Taraporevala, J. Liu, F. Najm, “A Fast Approach for Static Timing Analysis Covering All PVT Corners,” in Proc. ACM/IEEE Design Automation Conf. (DAC), pp. 777-782, 2011.
[58] M. R. Guthaus, J. E. Stine, S. Ataei, B. Chen, B. Wu, and M. Sarwar, “OpenRAM: An Open-Source Memory Compiler,” in Proc. Int. Conf. on Computer-Aided Design (ICCAD), 2016.
[59] Y. Liu, J. Guo, E. Zhu, L. Liao, S.-J. Lee, M. Ding, I. Shakir, V. Gambin, Y. Huang, and X. Duan, “Approaching the Schottky–Mott limit in van der Waals metal–semiconductor junctions,” Nature, vol. 557, pp. 696-700, May 2018.
[60] Y. Wang, J. C. Kim, R. Wu, J. Martinez, X. Song, J. Yang, F. Zhao, A. Mkhoyan, H. Y. Jeong, and M. Chhowalla, “Van der Waals contacts between three-dimensional metals and two-dimensional semiconductors,” Nature, vol. 568, pp. 70-74, Apr. 2019.
[61] G. Pitner, G. Hills, J. P. Llinas, K.-M. Persson, R. Park, J. Bokor, S. Mitra, and H.-S. P. Wong, “Low-Temperature Side Contact to Carbon Nanotube Transistors: Resistance Distributions Down to 10 nm Contact Length,” Nano Lett., vol. 19, pp. 1083-1089, Jan. 2019.
[62] D. Harris, R. Ho, G.-Y. Wei, and M. Horowitz, “The fanout-of-4 Inverter Delay Metric” [Online]. Available: https://www.ece.ucdavis.edu/~bbaas/116/docs/paper.harris.FO4.pdf
