Device-to-System Performance Evaluation From Trans
for source code release. Though some of the references and assumptions in this manuscript are dated, the key
messages which the authors were trying to convey remain valid today amid the continuous advancement of
one-/two-dimensional materials:
a) The intrinsic performance gain through the adoption of two-dimensional materials as the FET channel
materials can be greatly limited by the parasitic resistances.
b) Design technology co-optimization across the boundaries between devices, interconnects, circuits,
and systems can help direct resources toward the most promising candidates and maximize the value
of new technologies.
c) Machine learning algorithms can facilitate holistic design technology co-optimization.
As a result, we have decided to upload the manuscript to arXiv and hope that our findings and insights
can still benefit the research community.
Device-to-System Performance Evaluation:
from Transistor/Interconnect Modeling to VLSI
Physical Design and Neural-Network Predictor
Chi-Shuen Lee, Brian Cline, Saurabh Sinha, Greg Yeric, and H.-S. Philip Wong, Fellow, IEEE
Abstract—We present a DevIce-to-System Performance EvaLuation (DISPEL) workflow that integrates
transistor and interconnect modeling, parasitic extraction, standard cell library characterization,
logic synthesis, cell placement and routing, and timing analysis to evaluate system-level performance
of new CMOS technologies. As the impact of parasitic resistances and capacitances continues to
increase with dimensional downscaling, component-level optimization alone becomes insufficient,
calling for a holistic assessment and optimization methodology across the boundaries between devices,
interconnects, circuits, and systems. The physical implementation flow in DISPEL enables realistic
analysis of complex wires and vias in VLSI systems and their impact on the chip power, speed, and
area, which simple circuit simulations cannot capture. To demonstrate the use of DISPEL, a 32-bit
commercial processor core is implemented using theoretical n-type MoS2 and p-type black phosphorus
(BP) planar FETs at a projected 5-nm node, and the performance is benchmarked against Si FinFETs.
While the superior gate control of the MoS2/BP FETs can theoretically provide a 51% reduction in the
iso-frequency energy consumption, the actual performance can be greatly limited by the source/drain
contact resistances. With the large amount of data generated by DISPEL, a neural network is trained
to predict the key performance metrics of the 32-bit processor core using the characteristics of
transistors and interconnects as the input features, without the need to go through the
time-consuming physical implementation flow. The machine learning algorithms show great potential as
a means for evaluation and optimization of new CMOS technologies and for identifying the most
significant technology design parameters.

Index Terms—design-technology co-optimization, technology assessment, neural networks.

I. INTRODUCTION
As conventional scaling of Si transistors and Cu interconnects began to face significant difficulties
[1-3], candidates to complement Si and Cu to extend CMOS technology scaling in the sub-10-nm
technology nodes are being researched extensively. Active development for new technologies includes
nanowires [4], two-dimensional (2D) semiconductors [5], and carbon nanotubes (CNT) for transistors
[6]; and cobalt, graphene, and CNT for interconnects¹ [7-8]. The high cost of developing a new
technology makes it vital to gain an early understanding of their potential benefits. Early insights
into the key performance detractors help focus development efforts such that resources can be
directed toward the most important challenges.

However, accurate technology assessment at an early stage is difficult due to the growing impact of
parasitic and interconnect resistances and capacitances, which depend on the cell layouts and system
architecture. Traditional simple benchmarks [9-10] become insufficient to capture the complexity of
interconnects in a Very-Large-Scale Integration (VLSI) system. Ring oscillators with fixed fan-out
and wire loads are a common benchmark to compare power and speed of different technologies, whose
wire lengths are typically estimated by the average wire length of VLSI circuit implementations or a
multiple of the contacted gate pitch (CGP). While this simple approach provides some insights,
estimation of the wire load is not easy because different transistor drive current and capacitance
can lead to different optimal circuit topologies (see Section IV). Other system-level models [11-14]
employed Rent’s rule [15] to derive a stochastic wire distribution, optimize the interconnects given
a gate delay model, and predict the final chip power, speed, and area. Despite the high-level
abstraction of architectures and wiring optimization, many empirical parameters must be decided
(e.g. the Rent’s constants), which can be architecture and/or technology dependent and need careful
calibration. Recent works resorted to full physical design flows to assess system-level performance
by actual implementation of a VLSI system or a large circuit module [16-19]. This approach provides
the most accurate performance evaluation as complex wiring is considered realistically. Therefore, in
this paper we present an end-to-end DevIce-to-System Performance EvaLuation (DISPEL) workflow that
automates both Process Design Kit (PDK) development and physical design flows, which enables
efficient system-level performance evaluation of transistor and interconnect technologies. A similar
method has been employed in [19] to evaluate the benefits of carbon nanotube field-effect
transistors (FETs). Here the emphasis is on: (i) the methodology of performance evaluation, (ii) the
impact of parasitics on performance, and (iii) applying neural-network (NN) models to efficient
processor core-level performance evaluation.
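For reference, Rent’s rule [15] is commonly written as a power law relating the number of external
terminals of a logic block to the number of gates it contains. The symbols below follow common
textbook usage and are not notation taken from this paper: T is the number of terminals, g the number
of gates, t the average number of terminals per gate, and p the Rent exponent. The empirical values
of t and p are the kind of "Rent’s constants" that such system-level models need to calibrate.

```latex
T = t \, g^{p}, \qquad 0 < p \le 1
```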
C.-S. Lee was with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305 USA. He is now with Google LLC. (e-mail:
chishuen@alumni.stanford.edu)
Brian Cline is with ARM Ltd.
Saurabh Sinha was with ARM Ltd. He is now with Apple Inc.
Greg Yeric was with ARM Ltd. He is now with Cerfe Labs Inc.
H.-S. P. Wong is with the Department of Electrical Engineering, Stanford
University, Stanford, CA 94305 USA (e-mail: hspwong@stanford.edu).
¹ The term “interconnect” in this paper refers to both wires and vias.
Fig. 1. Overview of the DevIce-to-System Performance EvaLuation Platform (DISPEL) workflow. Corresponding EDA tools or file formats are specified in the
parentheses. Acronyms–LEF: Library Exchange Format, Lib: Liberty model, RTL: Register Transfer Language, ITF: Interconnect Technology Format, VS: Virtual
Source [21].
Fig. 17. Comparison of the core (a) energy and (b) area vs. frequency between the implementation
results using DISPEL and the neural-network (NN) model predictions for n-MoS2/p-BP FET + BEOL
interconnect with 50% lower wire resistance and 25% lower capacitance.
Fig. 19. (a) Comparison of minimum energy-delay product vs. wire resistance multiplier (XRW) between
the DISPEL implementation results and neural-network (NN) predictions. The dotted lines are
predictions of ring-oscillator (RO) models with different lengths of the wire loads. (b) Core area
vs. frequency for different XRW’s.
Fig. 20. Weights of the 41 input features to two of the neurons on the first hidden layer: (a) the
17th neuron, reactive to BEOL wire resistivity; (b) the 14th neuron, reactive to logic gate delays
and drive currents (ION).
Fig. 21. Outputs of the 20 neurons on the second hidden layer vs. input frequency. The pivot neuron
is the only neuron that transitions from being inactive to active across the frequency range while
the other neurons always stay either inactive or active.
Fig. 22. Comparison of the predicted energy-frequency curves between two neural-network models with
different activation functions: the Softplus vs. Rectifier functions.

sets. The training set is further split 80/20 into training and validation subsets, respectively.
Hyper-parameters such as the learning rate, L2 regularization, and the NN architecture are tuned
experimentally to minimize the validation error. We found empirically that a 2(-hidden)-layer NN with
40 neurons on the first hidden layer and 20 neurons on the second layer achieves the minimum losses
(~4% test loss). Fig. 16 shows one representative training curve over one million epochs without any
signs of overfitting.

While high accuracy is necessary for a good ML model, it is also important to verify its physical
robustness. To this end, three test sets are created. Each test set is composed of a combination of
transistor and interconnect technologies that the model has never seen in the training data set. The
first test checks whether the model can capture the relations between the key performance metrics:
core energy consumption, area, and clock frequency. The test data is based on a hypothetical
interconnect technology with a BEOL wire resistivity (RWIRE) 50% lower than the baseline case of Cu
and a 25% lower interlayer dielectric constant (i.e. lower wire capacitances). Fig. 17 shows the
predicted Pareto optimal curves of energy and area vs. frequency compared against the results
generated by DISPEL. To generate the predictions, the input clock frequency to the NN is swept from
low to high while the rest of the input features are fixed. The predicted energy consumptions and
core areas increase smoothly and monotonically with increasing frequencies at an accelerating rate,
which is a result of the shallow NN with just two hidden layers and is more physically meaningful
than the results of other models with higher complexity. In-depth analysis of the model is discussed
in Section V. Note that the deviation of the predictions from the implementation results (through
DISPEL) is larger in the high frequency regime. To achieve higher frequencies, more and larger logic
gates are needed. Beyond a certain point, the capacitances of the logic gates on the critical paths
become so dominant that making the gates bigger does not increase the frequency anymore, and thus
the implementation results become less predictable in the high frequency regime. Nonetheless, it is
the region around the ‘knees’ of the curves (illustrated in Fig. 17) that matters the most, because
this is where the most efficient designs (e.g. min-EDP) reside.

The second test checks whether the model can be used to explore the optimal device structure for a
given performance metric. The test data is based on a stacked nanowire (NW) FET model fitted to
numerical simulations [53] and is not present in the training data set. Performance analysis of NW
FETs has been studied elsewhere [40,54] and is not the focus of this paper, so the NW FETs here
should be viewed as yet another new transistor technology to be explored. Fig. 18 shows the predicted
core min-EDP at VDD = 0.6 V compared against the implementation results for different LSPA’s and
LCON’s (see Fig. 3) with fixed LGATE and CGP. While the predictions of the min-EDP are slightly off
in the long LSPA regime, the predicted optimal LSPA and LCON are reasonably close to the DISPEL
results.

The third test checks whether the impact of RWIRE on the core performance is captured in the model,
as RWIRE is expected to increase rapidly with dimensional scaling. The test data is created by
artificially multiplying RWIRE of M2 to M6 by a factor of XRW (from 0.1 to 4) in the ITF file in the
DISPEL workflow. The predicted min-EDPs of the core vs. XRW are compared with the implementation
results in Fig. 19a. The min-EDP grows sub-linearly with increasing XRW as the P&R tool manages to
reduce the impact of wire resistances by, for instance, inserting more buffers. Consequently, the
core areas also increase with increasing XRW as shown in Fig. 19b. The NN model is able to capture
the nonlinear relation between EDP and XRW, whereas a ring-oscillator model with a fixed-length wire
load would predict a linear increase in EDP as XRW increases (as shown by the dotted lines in
Fig. 19a).

V. ANALYSIS OF NEURAL-NETWORK MODEL
In-depth analysis of the 2(-hidden)-layer NN model introduced in Section IV is presented in this
section. The first hidden layer (HL1) has 40 neurons. Each neuron has 41 weights corresponding to the
41 input features (see Fig. 15). The weights reflect the sensitivity of each neuron to different
features. For instance, Fig. 20 shows that the 17th neuron is particularly reactive to RWIRE, while
the 14th neuron is reactive to the logic gate characteristics. The weights corresponding to the logic
gate ION and delay features have opposite polarities, which matches the intuition because large ION’s
are preferred for performance whereas large delays are unfavorable. The second hidden layer (HL2) has
20 neurons and starts to become too opaque to draw insights from its weights. Nonetheless, as shown
in Fig. 21, it is found that only one neuron transitions from an inactive state (i.e. the output is
close to 0) to an active state (i.e. the output is >> 0) as the input frequency increases. The other
neurons always stay either inactive or active across the frequency range. It is the transition of
this pivot neuron from being inactive to active that leads to the “hockey-stick” shape of the energy
(and area) vs. frequency relations, and the smooth transition is a characteristic of the Softplus
function. The effects of other neurons are mainly to shift the E-f (and A-f) curves in the vertical
and/or lateral directions. For example, an increase in RWIRE raises the output of HL1’s 17th neuron,
which activates the pivot neuron on HL2 earlier and shifts the curves toward the left, i.e. higher
energy and larger area for the same frequency. Interpreting what representations an NN has learned
is still an active research area [55]. To see the effect of using the Softplus function, another NN
with Rectified Linear Units (ReLUs) [56] (i.e. f(x) = max(0, x)) as the activation functions is
trained and the result is shown in Fig. 22 compared against the predictions of the Softplus-based NN.
The zigzag pattern is apparent in the output of the ReLU-based NN because the output of a ReLU is a
piecewise linear function, and overfitting becomes more likely to happen. Similar outcomes of
overfitting are also observed in other ML models such as NNs with more hidden layers or random forest
regression models. The 2-hidden-layer NN with Softplus activation functions is found to be the
optimal model architecture, giving both accurate predictions and smooth outputs, which reduce the
chance of overfitting.
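As a rough illustration of the architecture analyzed in Section V, the following Python sketch
builds a 41-40-20 network with a selectable activation function and sweeps one input while the
others are held fixed. Several things here are assumptions made only for illustration: the weights
are random rather than trained, the two outputs stand for energy and area, the output layer is
linear, the frequency feature sits at index 0, and all function names are hypothetical. It is not
the trained model from this work.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); smooth everywhere.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x):
    # Rectifier max(0, x); piecewise linear with a kink at x = 0.
    return max(0.0, x)


def init_layer(n_in, n_out, rng):
    # Hypothetical random initialisation (NOT the trained weights).
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, act=None):
    # One fully connected layer: act(W x + b), or W x + b if act is None.
    weights, biases = layer
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(act(z) if act else z)
    return out


def build_mlp(rng, n_in=41, n_h1=40, n_h2=20, n_out=2):
    # 41 input features -> HL1 (40 neurons) -> HL2 (20 neurons) -> outputs.
    # n_out = 2 (energy, area) is an assumption for this sketch.
    return [
        init_layer(n_in, n_h1, rng),
        init_layer(n_h1, n_h2, rng),
        init_layer(n_h2, n_out, rng),
    ]


def forward(mlp, x, act=softplus):
    h1 = dense(x, mlp[0], act)
    h2 = dense(h1, mlp[1], act)
    return dense(h2, mlp[2])  # linear output layer (assumed)


def sweep_frequency(mlp, base, freq_index, freqs, act=softplus):
    # Vary only the (assumed) frequency feature; keep all other inputs fixed.
    curve = []
    for f in freqs:
        x = list(base)
        x[freq_index] = f
        curve.append(forward(mlp, x, act))
    return curve


rng = random.Random(0)
mlp = build_mlp(rng)
base = [0.5] * 41                      # placeholder normalised features
freqs = [i / 50 for i in range(51)]    # placeholder normalised frequency sweep
soft_curve = sweep_frequency(mlp, base, 0, freqs, softplus)
relu_curve = sweep_frequency(mlp, base, 0, freqs, relu)
```

With trained weights, passing relu instead of softplus to sweep_frequency would be expected to turn
the smooth curve into a piecewise-linear one, which is the kind of zigzag behavior described for
Fig. 22.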
VI. DISCUSSION AND OUTLOOK
In this section, limitations of the DISPEL workflow and the ML-based performance predictors as well
as possible ways to improve them are discussed. While only nominal cases at the typical corner were
analyzed, PVT variations can be easily incorporated in DISPEL through multi-corner multi-mode
analysis [57]. The key challenge is to create models that capture the process variations in the new
transistor and interconnect technologies, which is a non-trivial task because new technologies are
often not mature enough to provide sufficient amounts of data at different corners. Similarly, while
integrating memory instances (e.g. SRAM) into DISPEL is possible, creating memory models and
compilers [58] that accurately capture PVT variations to ensure adequate margins for millions of
memory cells requires a significant amount of work.

The main idea of the DISPEL workflow is to streamline the process from end to end to provide a
holistic view of the impact of CMOS technologies on the system-level performance. Performance
evaluation across different technology nodes can provide insights into the benefits of further
dimensional downscaling and design guidance. However, several parts are skipped in this paper for
simplicity, including the design constraints of different lithography technologies and floor plan
optimization. For more rigorous and realistic assessment, proper design rules and floor plan
optimization need to be taken into account, which by themselves are also big research topics.

Thanks to the highly automated workflow of DISPEL, a large amount of data can be generated to
leverage the power of ML algorithms to discover the dependencies of system-level performance on
technology-level design parameters. The NN model presented in this paper is an attempt to predict the
performance of a specific system (i.e. the 32-bit processor core) at a particular node (i.e. the
projected 5-nm node). To generalize the method to different system architectures and/or technology
nodes, the input features must be modified to encapsulate the architectural information. Rent’s
exponent and constants are an example of abstracting a certain type of system into a few numbers
[11]. To account for dimensional downscaling in a more general sense, CGP and interconnect dimensions
need to be treated as free input variables. As with all ML problems, feature engineering can make a
big difference. One can choose device- or process-level parameters such as LGATE, CGP, μ, or ρCON
rather than the logic-level features. In any case, a large amount of data is required, which calls
for a highly integrated and automated design flow across the boundaries between devices,
interconnects, circuits, and systems like DISPEL.

VII. CONCLUSION
The Device-to-System Performance EvaLuation (DISPEL) platform presented in this paper provides a
framework for assessment of new transistor and interconnect technologies from the standpoint of
system-level performance through a highly integrated workflow from transistor and interconnect
modeling to physical design flow. Full-chip placement and routing enables accurate evaluation of the
system performance that simple benchmark circuits cannot offer. Using DISPEL, we demonstrate how
device structures can be optimized to reduce the impact of parasitic RC on the performance of a
32-bit processor core at the projected 5-nm node and provide a more accurate view of the advantages
of 2D-channel-material FETs and their challenges. The large amount of data generated by the highly
automated DISPEL workflow is used to train neural-network models to predict the performance of the
processor core. A two-hidden-layer neural network with Softplus activation functions is found to
achieve the most accurate and physically favorable results. As technology scaling becomes ever more
challenging, highly integrated and automated design flows across the boundaries between devices,
interconnects, circuits, and systems like DISPEL can facilitate technology development and provide
design guidance in the early stage.

APPENDIX
Key parameters of the VS models for the theoretical n-type MoS2 FET, p-type BP FET, and the projected
Si FinFET at the 5-nm node are listed in Table II. For the n-MoS2 and p-BP FETs, μ and v are
extracted from the current-/capacitance-voltage characteristics of physics-based numerical
simulations [18, 19, 48] and should be viewed as theoretical predictions; for the Si FinFET, μ and v
are extracted from the 14-nm FinFET experimental data [43] (Fig. 23) and scaled up by 1.1× to match
the projection assuming ballistic transport [44]. LGATE is set at the value such that the
subthreshold slope is 70 mV/dec based on numerical simulations. The inversion gate-to-channel
capacitance (CINV) is derived from COX∙CQ / (COX + CQ), where COX = εSiO2/EOT, CQ is the quantum
capacitance, εSiO2 is the permittivity of SiO2, and EOT = 0.7 nm.

TABLE II. KEY VIRTUAL SOURCE MODEL PARAMETERS
Parameter                      Theoretical    Theoretical    Projected
                               n-MoS2 FET     p-BP FET       Si FinFET
v (10^7 cm/s)                  1.17           1.7            0.97
μ (cm^2/V-s)                   200            350            253
LGATE (nm)                     10             10             18
CINV (μF/cm^2)                 4.36           4.26           3.14
Fin Width/Height/Pitch (nm)    N/A            N/A            5/30/21
SS (mV/dec)                    70 (single entry)

Fig. 23. Virtual Source model (lines) fitted to the 14-nm Si FinFET data [43] (circles) to extract
carrier mobility (μ) and velocity (v). (a) ID-VDS and (b) ID-VGS. Effective thickness of the gate
oxide is assumed to be 1.2 nm.
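The series-capacitance expression for CINV given in the Appendix can be evaluated numerically. The
sketch below is a minimal illustration, assuming a relative permittivity of 3.9 for SiO2 (a textbook
value that is not stated in this paper); the function names are hypothetical. It computes COX for
EOT = 0.7 nm, which comes out at about 4.9 μF/cm^2, and back-solves the quantum capacitance that a
given CINV would imply, using the n-MoS2 value of 4.36 μF/cm^2 from Table II as the example input.

```python
EPS0_F_PER_CM = 8.854e-14   # vacuum permittivity in F/cm
K_SIO2 = 3.9                # relative permittivity of SiO2 (assumed textbook value)


def cox_uF_per_cm2(eot_nm):
    # C_OX = eps_SiO2 / EOT, returned in uF/cm^2.
    eot_cm = eot_nm * 1e-7
    return K_SIO2 * EPS0_F_PER_CM / eot_cm * 1e6


def series_cap(c_ox, c_q):
    # C_INV = C_OX * C_Q / (C_OX + C_Q): two capacitors in series.
    return c_ox * c_q / (c_ox + c_q)


def implied_cq(c_ox, c_inv):
    # Invert the series formula for C_Q; only defined when C_INV < C_OX.
    if c_inv >= c_ox:
        raise ValueError("C_INV must be smaller than C_OX")
    return c_ox * c_inv / (c_ox - c_inv)


c_ox = cox_uF_per_cm2(0.7)        # EOT = 0.7 nm from the Appendix
c_q = implied_cq(c_ox, 4.36)      # C_INV of the n-MoS2 FET from Table II
print(f"C_OX = {c_ox:.2f} uF/cm^2, implied C_Q = {c_q:.1f} uF/cm^2")
```

Because the series combination is always smaller than either capacitor, all three CINV values in
Table II must lie below COX, which they do under the assumed permittivity.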
ACKNOWLEDGMENT
This work was supported in part through the NCN-NEEDS program, which was funded by the National
Science Foundation, contract 1227020-EEC, and by the Semiconductor Research Corporation, and through
Systems on Nanoscale Information fabriCs (SONIC), and Function Accelerated NanoMaterials Engineering
(FAME), two of the six SRC STARnet Centers, sponsored by MARCO and DARPA, as well as the member
companies of the Stanford SystemX Alliance and the Initiative for Nanoscale Materials and Processes
(INMP) at Stanford University.

REFERENCES
[1] T. Skotnicki, J. A. Hutchby, T.-J. King, H.-S. P. Wong, and F. Boeuf, “The road to the end of
CMOS scaling,” IEEE Circuits Devices Mag., vol. 21, no. 1, pp. 16–26, Jan./Feb. 2005.
[2] R. Brain, “Interconnect Scaling: Challenges and Opportunities,” in Proc. IEEE Int. Electron
Devices Meeting (IEDM), pp. 9.3.1–9.3.4, Dec. 2016.
[3] K. Ronse, P. De Bisschop, G. Vandenberghe, E. Hendrickx, R. Gronheid, A. Vaglio Pret, A. Mallik,
D. Verkest, and A. Steegen, “Opportunities and challenges in device scaling by the introduction of
EUV lithography,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 18.5.1-18.5.4, 2012.
[4] V. Moroz, L. Smith, J. Huang, M. Choi, T. Ma, J. Liu, Y. Zhang, X.-W. Lin, J. Kawa, and Y. Saad,
“Modeling and Optimization of Group IV and III-V FinFETs and Nano-Wires,” in Proc. IEEE Int.
Electron Devices Meeting (IEDM), pp. 7.4.1–7.4.4, Dec. 2014.
[5] C. D. English, K. Smithe, R. Xu, and E. Pop, “Approaching Ballistic Transport in Monolayer MoS2
Transistors with Self-Aligned 10 nm Top Gates,” in Proc. IEEE Int. Electron Devices Meeting (IEDM),
pp. 5.6.1–5.6.4, Dec. 2016.
[6] G. S. Tulevski, A. D. Franklin, D. Frank, J. M. Lobez, Q. Cao, H. Park, A. Afzali, S.-J. Han,
J. B. Hannon, and W. Haensch, “Toward High-Performance Digital Logic Technology with Carbon
Nanotubes,” ACS Nano, vol. 8, no. 9, pp. 8730-8745, 2014.
[7] Z. Tőkei, “End of Cu roadmap and beyond Cu,” in Proc. IEEE Int. Interconnect Technol. Conf./Adv.
Metallization Conf. (IITC/AMC), Short Course, 2016.
[8] D. Kondo, H. Nakano, B. Zhou, A. I, K. Hayashi, M. Takahashi, S. Sato and N. Yokoyama,
“Sub-10-nm-wide intercalated multi-layer graphene interconnects with low resistivity,” in Proc. IEEE
Int. Interconnect Technol. Conf./Adv. Metallization Conf. (IITC/AMC), pp. 189-192, 2014.
[9] J. Ryckaert et al., “Design Technology Co-Optimization for N10,” in Proc. IEEE Custom Integr.
Circuits Conf. (CICC), pp. 1-8, 2014.
[10] S. Sinha, L. Shifren, V. Chandra, B. Cline, G. Yeric, R. Aitken, B. Cheng, A. Brown, C. Riddet,
C. Alexandar, C. Millar, and A. Asenov, “Circuit design perspectives for Ge FinFET at 10nm and
beyond,” in Proc. Int. Symp. Quality Electron. Design (ISQED), pp. 57-60, 2015.
[11] H. B. Bakoglu and J. D. Meindl, “A System-Level Circuit Model for Multi- and Single-Chip CPUs,”
IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 308-309, 1987.
[12] D. Sylvester and K. Keutzer, “System-Level Performance Modeling with BACPAC—Berkeley Advanced
Chip Performance Calculator,” International Workshop on System-Level Interconnect Prediction,
pp. 109-114, 1999.
[13] D. J. Frank, W. Haensch, G. Shahidi, and O. H. Dokumaci, “Optimizing CMOS Technology for Maximum
Performance,” IBM J. Res. & Dev., vol. 50, no. 4/5, pp. 419-431, Jul./Sep. 2006.
[14] S. Wang, A. Pan, C. O. Chui, and P. Gupta, “Proceed: A pareto optimization-based circuit-level
evaluator for emerging devices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 1,
pp. 192-205, Jan. 2016.
[15] B. S. Landman and R. L. Russo, “On a pin versus block relationship for partitions of logic
paths,” IEEE Trans. Comput., vol. C-20, pp. 1469–1479, Dec. 1971.
[16] G. Hills, M. G. Bardon, G. Doornbos, D. Yakimets, P. Schuddinck, R. Baert, D. Jang, L. Mattii,
S. M. Y. Sherazi, D. Rodopoulos, R. Ritzenthaler, C.-S. Lee, A. V.-Y. Thean, I. Radu, A. Spessot,
P. Debacker, F. Catthoor, P. Raghavan, M. M. Shulaker, H.-S. P. Wong, and S. Mitra, “Understanding
Energy Efficiency Benefits of Carbon Nanotube Field-Effect Transistors for Digital VLSI,” IEEE
Trans. Nanotechnol., vol. 17, no. 6, pp. 1259–1269, Nov. 2018.
[17] C. Pan, P. Raghavan, A. Ceyhan, F. Catthoor, Z. Tokei, and A. Naeemi,
“Technology/Circuit/System Co-Optimization and Benchmarking for Multilayer Graphene Interconnects at
Sub-10-nm Technology Node,” IEEE Trans. Electron Devices, vol. 62, no. 5, pp. 1530-1536, May 2015.
[18] J. Shi, D. Nayak, S. Banna, R. Fox, S. Samavedam, S. Samal, and S. K. Lim, “A 14nm Finfet
Transistor-Level 3D Partitioning Design to Enable High-Performance and Low-Cost Monolithic 3D IC,”
in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 2.5.1–2.5.4, Dec. 2016.
[19] K. Chang, K. Acharya, S. Sinha, B. Cline, G. Yeric, and S. K. Lim, “Impact and Design Guideline
of Monolithic 3-D IC at the 7-nm Technology Node,” IEEE Trans. Very Large Scale Integration Syst.,
vol. PP, pp. 1-12, Apr. 2017.
[20] C.-S. Lee and H.-S. P. Wong. (2017). Device-to-System Performance EvaLuation tool (DISPEL)
[Online]. Available: https://nano.stanford.edu/device-system-performance-evaluation-tool
[21] A. Khakifirooz, O. M. Nayfeh, and D. Antoniadis, “A Simple Semi-empirical Short-Channel MOSFET
Current–Voltage Model Continuous Across All Regions of Operation and Employing Only Physical
Parameters,” IEEE Trans. Electron Devices, vol. 56, no. 8, pp. 1674–1680, Aug. 2009.
[22] M. S. Lundstrom and D. A. Antoniadis, “Compact Models and the Physics of Nanoscale FETs,” IEEE
Trans. Electron Devices, vol. 61, no. 2, pp. 225-233, Feb. 2014.
[23] L. Liu, Y. Lu, and J. Guo, “On Monolayer MoS2 Field-Effect Transistors at the Scaling Limit,”
IEEE Trans. Electron Devices, vol. 60, no. 12, pp. 4133-4139, Dec. 2013.
[24] X. Cao and J. Guo, “Simulation of Phosphorene Field-Effect Transistor at the Scaling Limit,”
IEEE Trans. Electron Devices, vol. 62, no. 2, pp. 659-665, Feb. 2015.
[25] S. Rakheja and D. Antoniadis. (2014). MVS Nanotransistor Model (Silicon). nanoHUB.
doi:10.4231/D3H12V82S
[26] A. V-Y Thean, D. Yakimets, T. H. Bao, P. Schuddinck, S. Sakhare, M. G. Bardon,
A. Sibaja-Hernandez, I. Ciofi, G. Eneman, A. Veloso, J. Ryckaert, P. Raghavan, A. Mercha, A. Mocuta,
Z. Tokei, D. Verkest, P. Wambacq, K. De Meyer, and N. Collaert, “Vertical Device Architecture for 5nm
and beyond: Device & Circuit Implications,” in Proc. VLSI Technol. Symp., pp. T26-T27, Jun. 2015.
[27] G. K. Reeves and H. B. Harrison, “Obtaining the specific contact resistance from transmission
line model measurements,” IEEE Electron Device Lett., vol. 3, no. 5, pp. 111-113, May 1982.
[28] W. Steinhogl, G. Schindler, G. Steinlesberger, and M. Engelhardt, “Size-dependent resistivity of
metallic wires in the mesoscopic range,” Phys. Rev. B, vol. 66, pp. 075414, 2002.
[29] A. Pyzyna, R. Bruce, M. Lofaro, H. Tsai, C. Witt, L. Gignac, M. Brink, M. Guillorn, G. Fritz,
H. Miyazoe, D. Klaus, E. Joseph, K. P. Rodbell, C. Lavoie, D.-G. Park, “Resistivity of copper
interconnects beyond the 7 nm node,” in Proc. VLSI Technol. Symp., pp. T120-T121, Jun. 2015.
[30] Synopsys. (2015). StarRC™ User Guide and Command Reference: Product Version K-2015.06. Mountain
View, CA: Author.
[31] Synopsys. (2015). SiliconSmart ACE User Guide: Product Version K-2015.06. Mountain View, CA:
Author.
[32] M. G. Bardon, P. Raghavan, G. Eneman, P. Schuddinck, M. Dehan, A. Mercha, A. Thean, D. Verkest,
and A. Steegen, “Group IV channels for 7nm FinFETs: Performance for SoCs Power and Speed Metrics,”
in Proc. VLSI Technol. Symp., pp. 88-89, Jun. 2014.
[33] G. R. Bhimanapati, “Recent Advances in Two-Dimensional Materials beyond Graphene,” ACS Nano,
vol. 9, no. 12, pp. 11509-11539, 2015.
[34] Aron Szabo, Reto Rhyner, Hamilton Carrillo-Nunez, and Mathieu Luisier, “Phonon-limited
performance of single-layer, single-gate black phosphorus n- and p-type field-effect transistors,”
in Proc. IEEE Int. Electron Devices Meeting (IEDM), pp. 12.1.1–12.1.4, Dec. 2015.
[35] C.-S. Lee, B. Cline, S. Sinha, G. Yeric, and H.-S. P. Wong, “32-bit Processor Core at 5-nm
Technology: Analysis of Transistor and Interconnect Impact on VLSI System Performance,” in Proc.
IEEE Int. Electron Devices Meeting (IEDM), pp. 28.3.1–28.3.4, Dec. 2016.
[36] B. Amrutur and M. Horowitz, “A Replica Technique for Wordline and Sense Control in Low-Power
SRAM’s,” IEEE J. Solid-State Circuits, vol. 33, no. 8, pp. 1208-1219, Aug. 1998.
[37] A. Raychowdhury, B. Geuskens, J. Kulkarni, J. Tschanz, K. Bowman, T. Karnik, S.-L. Lu, V. De,
M. Khellah, “PVT-and-Aging Adaptive Wordline Boosting for 8T SRAM Power Reduction,” IEEE Int.
Solid-State Circuits Conf., Dig. Tech., pp. 351-353, 2010.
[38] G. Yeric, “Challenges of 7nm CMOS Technologies: Circuit Application Requirements,” in Proc. IEEE
Int. Electron Devices Meeting (IEDM), Short Course, Dec. 2014.
[39] C. D. English, G. Shine, V. E. Dorgan, K. C. Saraswat, and E. Pop, “Improved Contacts to MoS2
Transistors by Ultra-High Vacuum Metal Deposition,” Nano Lett., vol. 16, pp. 3824-3830, May 2016.
[40] C.-S. Lee, E. Pop, A. D. Franklin, W. Haensch, and H.-S. P. Wong, “A Compact Virtual-Source
Model for Carbon Nanotube Field-Effect Transistors in the Sub-10-nm Regime—Part II: Extrinsic
Elements, Performance Assessment, and Design Optimization,” IEEE Trans. Electron Devices, vol. 62,
no. 9, pp. 3070-3078, Sep. 2015.
[41] K. Shahookar and P. Mazumder, “VLSI cell placement techniques,” ACM Computing Surveys, vol. 23,
no. 2, pp. 143-220, Jun. 1991.
[42] J. Vygen, “Algorithms for large-scale flat placement,” in Proc. ACM/IEEE Design Automation Conf.
(DAC), pp. 746-751, 1997.
[43] S. Natarajan et al., “A 14nm logic technology featuring 2nd-generation FinFET Transistors,
air-gapped interconnects, self-aligned double patterning and a 0.0588 µm² SRAM cell size,” in Proc.
IEEE Int. Electron Devices Meeting (IEDM), pp. 3.7.1-3.7.4, Dec. 2014.
[44] L. Smith, M. Choi, M. Frey, V. Moroz, A. Ziegler, and M. Luisier, “FinFET to Nanowire Transition
at 5nm Design Rules,” in Proc. IEEE Int. Conf. Simul. Semiconductor Process. Devices, pp. 254–257,
Sep. 2015.
[45] L. Shifren, R. Aitken, A. R. Brown, V. Chandra, B. Cheng, C. Riddet, C. L. Alexander, B. Cline,
C. Millar, S. Sinha, G. Yeric, and A. Asenov, “Predictive Simulation and Benchmarking of Si and Ge
pMOS FinFETs for Future CMOS Technology,” IEEE Trans. Electron Devices, vol. 61, no. 7,
pp. 2271-2277, Jul. 2014.
[46] D. E. Nikonov and I. A. Young, “Benchmarking of Beyond-CMOS Exploratory Devices for Logic
Integrated Circuits,” IEEE J. Exploratory Solid-State Comput. Devices Circuits, vol. 1, no. 1,
pp. 3-11, Dec. 2015.
[47] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal
approximators,” Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[48] L. Scheffer, L. Lavagno, and G. Martin, EDA for IC Implementation, Circuit Design, and Process
Technology. Boca Raton, FL, U.S.A.: CRC Press, 2006, pp. 5.1-5.23.
[49] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems, 2015.
Software available from tensorflow.org.
[50] X. Glorot, A. Bordes, and Y. Bengio, “Deep Sparse Rectifier Neural Networks,” in Proc. Conf.
Artificial Intelligence and Statistics, pp. 315-323, 2011.
[51] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint
arXiv:1412.6980, 2014.
[52] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural
networks,” in Proc. Conf. on Artificial Intelligence and Statistics, vol. 9, pp. 249-256, 2010.
[53] M. Choi, V. Moroz, L. Smith, and J. Huang, “Extending drift-diffusion paradigm into the era of
FinFETs and nanowires,” in Proc. IEEE Int. Conf. Simul. Semiconductor Process. Devices, pp. 242–245,
Sep. 2015.
[54] D. Jang, D. Yakimets, G. Eneman, P. Schuddinck, M. G. Bardon, P. Raghavan, A. Spessot,
D. Verkest, and A. Mocuta, “Device Exploration of NanoSheet Transistors for Sub-7-nm Technology
Node,” IEEE Trans. Electron Devices, vol. 64, no. 6, pp. 2707-2713, Jun. 2017.
[55] D. Castelvecchi, “Can we open the black box of AI?” Nature, vol. 538, pp. 21-23, Oct. 2016.
[56] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional
Neural Networks,” in Proc. Neural Information and Processing Systems (NIPS), 2012.
[57] S. Onaissi, F. Taraporevala, J. Liu, F. Najm, “A Fast Approach for Static Timing Analysis
Covering All PVT Corners,” in Proc. ACM/IEEE Design Automation Conf. (DAC), pp. 777-782, 2011.
[58] M. R. Guthaus, J. E. Stine, S. Ataei, B. Chen, B. Wu, and M. Sarwar, “OpenRAM: An Open-Source
Memory Compiler,” in Proc. Int. Conf. on Computer-Aided Design (ICCAD), 2016.
[59] Y. Liu, J. Guo, E. Zhu, L. Liao, S.-J. Lee, M. Ding, I. Shakir, V. Gambin, Y. Huang, and
X. Duan, “Approaching the Schottky–Mott limit in van der Waals metal–semiconductor junctions,”
Nature, vol. 557, pp. 696-700, May 2018.
[60] Y. Wang, J. C. Kim, R. Wu, J. Martinez, X. Song, J. Yang, F. Zhao, A. Mkhoyan, H. Y. Jeong, and
M. Chhowalla, “Van der Waals contacts between three-dimensional metals and two-dimensional
semiconductors,” Nature, vol. 568, pp. 70-74, Apr. 2019.
[61] G. Pitner, G. Hills, J. P. Llinas, K.-M. Persson, R. Park, J. Bokor, S. Mitra, and H.-S. P.
Wong, “Low-Temperature Side Contact to Carbon Nanotube Transistors: Resistance Distributions Down to
10 nm Contact Length,” Nano Lett., vol. 19, pp. 1083-1089, Jan. 2019.
[62] D. Harris, R. Ho, G.-Y. Wei, and M. Horowitz, “The fanout-of-4 Inverter Delay Metric” [Online].
Available: https://www.ece.ucdavis.edu/~bbaas/116/docs/paper.harris.FO4.pdf