Thesis GaN

The thesis discusses the ASIC implementation of a RISC-V based microcontroller aimed at wireless charging applications, submitted by Mohamed Tarek Mohamed Ismail at Ain Shams University. It covers various aspects of ASIC design, including standard cell libraries, timing analysis, logic synthesis, and physical design considerations. The document also highlights the use of Gallium Nitride technology for transmitters in wireless power transmission systems.

Uploaded by

Mohamed Tarek

AIN SHAMS UNIVERSITY

FACULTY OF ENGINEERING
Electronics Engineering and Electrical Communications

ASIC implementation of RISC-V based Microcontroller
[For Wireless Charging applications]

A Thesis submitted in partial fulfillment of the requirements of the degree of
Bachelor in Electrical Engineering
(Electronics Engineering and Electrical Communications)
by
Mohamed Tarek Mohamed Ismail

Faculty of Engineering, Ain-Shams University, year 2023-2024


Supervised By
Prof. Hani Fikry
Eng. Islam Samir (ICpedia)
1 Table of Contents
2 Table of Figures ..................................................................................................................................... 6
1 Introduction .......................................................................................................................................... 9
1.1 PULP Platform ............................................................................................................................. 10
1.2 PULP project tape-outs .................................................................................................................... 14
1.3 PULPino SoC .................................................................................................................................... 14
1.4 PULPino SoC Peripherals: ................................................................................................................ 15
1.5 PULPissimo SoC ............................................................................................................................... 16
1.6 PULPissimo SoC Peripherals: ....................................................................................................... 17
2 ASIC Design approaches ........................................................................................................................ 17
2.1 Full Custom.................................................................................................................................. 17
2.2 Semi-Custom [Standard Cell Based ASICs] .................................................................................. 17
2.3 Programmable ASICs [Field Programmable Gate Array] ............................................................... 18
3 Standard Cell Library ........................................................................................................................... 19
3.1 Cell categories ............................................................................................................................. 19
3.2 Specify Libraries .......................................................................................................................... 20
4 Standard cell library Characterization ................................................................................................. 20
4.1 Liberty file (.lib/.db file) .............................................................................................................. 20
4.1.1 Delay Calculation Model ..................................................................................................... 22
4.2 Parasitic Estimation: WLM .......................................................................................................... 23
5 LEF: Library exchange format (.lef file) ............................................................................................... 23
5.1 Layout of cells [std_cell.gds]: ...................................................................................................... 23
5.2 SPICE netlists .............................................................................................................................. 23
5.3 The Multiple Analysis Corners..................................................................................................... 23
6 Static Timing analysis .......................................................................................................................... 24
6.1 Timing Path Types ...................................................................................................................... 25
7 Early and Latest analysis approaches: ................................................................................................. 26
8 Timing Verification of Synchronous Designs ....................................................................................... 26
8.1 The requirements of Setup and Hold on timing paths ................................................................. 27
9 What is metastability? ........................................................................................................................ 27
10 Logic Synthesis: ............................................................................................................................... 28

10.1 Steps of Synthesis: ...................................................................................................................... 28
10.2 Synthesis Flow and Design Constraints: ...................................................................................... 28
11 Formal Verification: ......................................................................................................................... 32
11.1 Main concepts in Formality are: ................................................................................................. 33
11.2 Guidance (Load Automated Setup File): ..................................................................................... 33
12 Design For Test (DFT): ..................................................................................................................... 33
12.1 Physical Defects:.......................................................................................................................... 33
12.2 Fault Models: .............................................................................................................................. 34
12.3 Two conditions should be met to have a testable design: .......................................... 34
12.4 Scan Chains: ................................................................................................................................ 34
12.5 Mechanics of Scan Chain: ........................................................................................................... 35
13 Overall Design Flow ......................................................................................................................... 36
13.1 Design Flow Overview ................................................................................................................. 36
13.2 Concept + Market Research ........................................................................................................ 36
13.3 RTL Coding and Simulation.......................................................................................................... 36
13.4 Logic Synthesis and Formal Verification ...................................................................................... 36
13.5 Place and Route (PnR) ................................................................................................................. 36
13.5.1 Floorplanning ...................................................................................................................... 36
13.5.2 Power Planning & Boundary Cell Insertion ......................................................................... 37
13.5.3 Placement ........................................................................................................................... 37
13.5.4 CTS and Post-CTS Logical Optimization ............................................................................... 37
13.5.5 Routing ................................................................................................................................ 37
13.6 Physical Verification .................................................................................................................... 37
13.7 Signoff/Tapeout........................................................................................................................... 37
14 Floorplanning .................................................................................................................................. 37
14.1 Chip Floorplanning ...................................................................................................................... 37
14.1.1 IO Pads ................................................................................................................................ 37
14.2 IP Floorplanning .......................................................................................................................... 38
14.3 Standard cell placement and Routing ......................................................................................... 38
14.4 Blockages and Halos .................................................................................................................... 38
14.5 Guidelines for a Good Floorplan ................................................................................................. 38

15 Power Planning and Related Challenges ......................................................................................... 38
15.1 Introduction ................................................................................................................................ 38
15.2 Power Planning Overview ........................................................................................................... 38
15.3 Three Levels of Power Distribution ............................................................................................. 39
16 IR Drop ............................................................................................................................................ 39
16.1 Effects of IR Drop ......................................................................................................................... 39
16.2 Mitigating Static IR Drop ............................................................................................................. 39
16.3 Mitigating Dynamic IR Drop ........................................................................................................ 40
17 Decap Cells ...................................................................................................................................... 40
18 Analyzing IR Drop ............................................................................................................................ 40
19 Static vs. Dynamic IR Analysis ......................................................................................................... 40
20 Electromigration.............................................................................................................................. 40
21 Placement in Physical Design .......................................................................................................... 41
21.1 Introduction ................................................................................................................................ 41
21.2 Placement Overview ................................................................................................................... 41
21.3 Timing-Driven Placement ............................................................................................................ 41
22 Placement Flow ............................................................................................................................... 41
22.1 Global Placement ........................................................................................................................ 41
22.2 Detailed Placement ..................................................................................................................... 41
23 Placement Optimizations ................................................................................................................ 41
24 Tie-Cell Insertion ............................................................................................................................. 42
25 Scan-Chain Reordering .................................................................................................................... 42
26 Logical Restructuring ....................................................................................................................... 42
27 Congestion in Placement ................................................................................................................ 42
27.1 Reasons for Congestion ............................................................................................................... 42
28 Placement Constraints .................................................................................................................... 42
29 Strategies to Fix Congestion ............................................................................................................ 43
30 Clock Tree Synthesis (CTS) in Physical Design ................................................................................. 43
30.1 Introduction ................................................................................................................................ 43
30.2 Clock Parameters......................................................................................................................... 43
30.2.1 Skew .................................................................................................................................... 43

30.2.2 Jitter .................................................................................................................................... 43
30.2.3 Slew ..................................................................................................................................... 43
30.3 Insertion Delay ............................................................................................................................ 43
31 Skew and Jitter ................................................................................................................................ 43
31.1 Local Skew and Global Skew ....................................................................................................... 43
31.2 Positive Skew and Negative Skew ............................................................................................... 44
32 Source Delay, Network Delay, and Uncertainty .............................................................................. 44
33 Clock Tree Synthesis (CTS) ............................................................................................................... 44
33.1 CTS goals ..................................................................................................................................... 44
33.2 CTS vs. High Fanout Synthesis (HFS)............................................................................................ 44
33.3 CTS Process Overview ................................................................................................................. 44
33.4 Clock Tree Constraints ................................................................................................................. 44
33.5 Clock Tree Synthesis Execution ................................................................................................... 44
33.6 Non-Default Clock Routing .......................................................................................................... 45
33.7 Recommendations for NDR (Non-Default Routing) .................................................................... 45
33.8 Effects of CTS ............................................................................................................................... 45
33.9 Clock Tree Optimization .............................................................................................................. 45
34 Routing ............................................................................................................................................ 45
34.1 Importance of Routing as Technology Shrinks[3]........................................................................ 45
34.2 Routing ........................................................................................................................................ 46
34.3 Inputs of Routing ......................................................................................................................... 46
34.4 Goals of Routing .......................................................................................................................... 46
34.5 Routing Constraints ..................................................................................................................... 47
34.6 Do we have any routes created before this step? ....................................................................... 47
34.7 Grid-Based Routing System ......................................................................................................... 47
34.8 Routing Flow ............................................................................................................................... 48
34.8.1 Global Routing..................................................................................................................... 49
34.8.2 Track Assignment ................................................................................................................ 49
34.8.3 Detailed Routing ................................................................................................................. 50
34.8.4 Search & Repair .................................................................................................................... 50
35 Filler Cell Insertion .......................................................................................................................... 50

36 Physical Only Cells ........................................................................................................................... 50
Decap ...................................................................................................................................................... 50
EndCap cells ............................................................................................................................................ 51
Well Tap cells ........................................................................................................................................ 51
37 Parasitics Extraction (PEX): .............................................................................................................. 51
38 Physical Verification (PVR):.............................................................................................................. 55
39 On-Chip Variation (OCV): ................................................................................................................ 58
40 Design Planning Overview............................................................................................................... 62
40.1 Hierarchical Design Planning Flow .............................................................................................. 62
40.2 Topographical technology ........................................................................................................... 63
40.2.1 Using Floorplan physical constraints................................................................................... 64
41 Future Work .................................................................................................................................... 65
42 Conclusion ....................................................................................................................................... 66
43 References:...................................................................................................................................... 67

2 Table of Figures
Figure 1: STWLC38 .......................................................................................................................................... 9
Figure 2: Ariane core ..................................................................................................................................... 11
Figure 3: RISCY core ...................................................................................................................................... 11
Figure 4: zero-riscy (Ibex) core ...................................................................................................................... 11
Figure 5: structure of a PULP Microcontroller .............................................................................................. 12
Figure 6: PULP cluster ................................................................................................................................... 13
Figure 7: PULP Multi-cluster ......................................................................................................................... 13
Figure 8: PULP ASIC chips ............................................................................................................................. 14
Figure 9: PULPino ......................................................................................................................................... 15
Figure 10: PULPissimo SoC .......................................................................................................................... 16
Figure 11: Semi custom ............................................................................................................................... 18
Figure 12: FPGA ........................................................................................................................................... 18
Figure 13: SC Library views.......................................................................................................................... 20
Figure 14: Liberty file library information ................................................................................................... 21
Figure 15: Liberty file cell information ......................................................................................................... 21
Figure 16: Liberty file pin information ......................................................................................................... 22
Figure 17: Timing model that abstracts cell behavior. ................................................................................ 22
Figure 18: Delay Calculation Table Model .................................................................................................... 22
Figure 19: Wire Load Model......................................................................................................................... 23
Figure 20: GDS vs. LEF .................................................................................................................................. 23
Figure 21: STA flow ....................................................................................................................................... 25
Figure 22: Timing paths................................................................................................................................ 25
Figure 23: positive and negative slacks ........................................................................................................ 26
Figure 24: D-FF timing parameters .............................................................................................................. 26
Figure 25: Setup and Hold on timing paths ................................................................................................. 27
Figure 26: inputs and outputs of logic synthesis......................................................................................... 28
Figure 27: Steps of synthesis ...................................................................................................................... 28
Figure 28: Clock Skew.................................................................................................................................. 30
Figure 29: input and output delay .............................................................................................................. 30
Figure 30: multicycle path ........................................................................................................................... 31
Figure 31: formal verification ...................................................................................................................... 32
Figure 32: logic cones and compare points............................................................................................... 33
Figure 33: formal verification flow .............................................................................................................. 33
Figure 34: Chip testing ................................................................................................................................ 33
Figure 35: Physical Defects .......................................................................................................................... 33
Figure 36: examples of stuck at fault .......................................................................................................... 34
Figure 37: bridging fault .............................................................................................................................. 34
Figure 38: Controllability and Observability................................................................................................ 34
Figure 39: scan flip flop ............................................................................................................................... 34
Figure 40: scan chain.................................................................................................................................... 35

Figure 41: mechanics of scan chain ............................................................................................................ 35
Figure 42: Chip Floorplanning .................................................................................................................... 38
Figure 43: Placement Blockages and Halos ................................................................................................ 38
Figure 44: Power Planning ......................................................................................................................... 38
Figure 45: Levels of Power Distribution ..................................................................................................... 39
Figure 46: IR Drop....................................................................................................................................... 39
Figure 47: Static IR Drop ............................................................................................................................. 39
Figure 48: Decap Cells ................................................................................................................................ 40
Figure 49: Electromigration ........................................................................................................................ 40
Figure 50: Placement ................................................................................................................................. 41
Figure 51: Tie-Cell Insertion ....................................................................................................................... 42
Figure 52: Congestion Map ........................................................................................................................ 42
Figure 53: Clock Parameters ....................................................................................................................... 43
Figure 54: Clock Tree Synthesis (CTS) ......................................................................................................... 44
Figure 55: Buffer Insertion for Clock Tree Optimization ............................................................................. 45
Figure 56: Multi-level Interconnection (MLI) Technology layer Stack .......................................................... 45
Figure 57: Routing ........................................................................................................................................ 46
Figure 58: Grid based routing with two metals ........................................................................................... 47
Figure 59: Routing Grids............................................................................................................................... 48
Figure 60: Grid Based Routing..................................................................................................................... 48
Figure 61: Routing Flow .............................................................................................................................. 48
Figure 62: Global Routing ............................................................................................................................ 49
Figure 63: Global Routing ............................................................................................................................ 49
Figure 64: Track assignment ........................................................................................................................ 49
Figure 65: SBoxes ........................................................................................................................................ 50
Figure 66: Design Rule Constraints ............................................................................................................. 50
Figure 67: Filler Cell Insertion ..................................................................................................................... 50
Figure 68: Well Tap Cell ............................................................................................................................... 51
Figure 69: Interconnect parasitic capacitance modeling ............................................................................. 51
Figure 70: Star-RCXT Flow ........................................................................................................................... 52
Figure 71: Extract nxtgrd database ............................................................................................................... 53
Figure 72: PVR Flow .................................................................................................................................... 55
Figure 73: DRC Checks................................................................................................................................. 55
Figure 74: schematic netlist vs extracted layout netlist .............................................................................. 57
Figure 75: different metal on gate representation...................................................................................... 57
Figure 76: represents the difference between PVT & OCV ......................................................................... 58
Figure 77: the difference in derates across enhancing technology ............................................................ 59
Figure 78: Setup analysis under OCV .......................................................................................................... 59
Figure 79: Hold analysis under OCV ............................................................................................................ 60
Figure 80: Common path that should be applied by CRPR (GreyOne) ....................................................... 60
Figure 81: Different path Delay in OCV Fixed Derate .................................................................................. 60
Figure 82: Sample AOCV table for setup analysis ....................................................................................... 61
Figure 83: Standard deviation of data ......................................................................................................... 61
Figure 84: The flow to implement a hierarchical design plan .................................................................... 63
Figure 85: Topographical technology in RTL synthesis .............................................................................. 64
Figure 86: Inputs and outputs of DC in topographical mode ..................................................................... 64
Figure 87: Gantt Chart ................................................................................................................................. 65

1 Introduction
Wireless power transmission (WPT) involves the transfer of electrical energy from a power source to
an electrical load without the need for physical conductors. The process typically includes a
transmitter (Tx) and a receiver (Rx) system. In this work, the Tx uses Gallium Nitride (GaN)
technology, while the Rx operates at low power and uses Complementary
Metal-Oxide-Semiconductor (CMOS) technology.

Gallium Nitride is a wide-bandgap semiconductor material that exhibits excellent high-frequency and
power-handling capabilities. GaN-based transmitters are often employed in WPT systems due to
their high electron mobility and efficiency, allowing for the efficient generation and transmission of
high-frequency signals. GaN transmitters are particularly advantageous in applications where high
power and high-frequency operation are essential.

In a WPT system, the Tx using GaN technology converts electrical power into high-frequency
electromagnetic waves. These waves can travel through space without the need for physical
conductors, allowing for efficient power transfer over short to moderate distances.

The receiver in a WPT system is responsible for capturing the transmitted electromagnetic waves
and converting them back into electrical power for the load. In this work, the Rx operates at
low power and employs CMOS technology.

The use of CMOS in the Rx is beneficial for low-power applications, making it suitable for energy
harvesting and wireless charging scenarios where power consumption is a critical factor. The
WPT receiver can also employ a microcontroller to ensure efficient power reception and to monitor
different bias points. An example of a wireless charging SoC is the STWLC38 IC from STMicroelectronics.

Figure 1: STWLC38

The goal of this project is to perform synthesis and place and route (PnR) of a digital
Microcontroller Unit (MCU) core based on an open-source RISC-V architecture, specifically the
PULPino SoC from the PULP platform.

1.1 PULP Platform
The PULP (Parallel Ultra-Low Power) platform is an open-source, energy-efficient RISC-V platform
developed as a joint effort of ETH Zurich, University of Bologna, and their partners. It includes a
variety of RISC-V cores, such as the CV32E40P (RI5CY) and Ibex (Zero-riscy), which are in-order,
single-issue 32-bit RISC-V cores tuned for high energy efficiency. The platform also offers a
range of ready-to-use FPGA flows on multiple boards and various safety, security, and
predictability features. Additionally, PULP provides a minimal single-core RISC-V SoC called
PULPino, which can be configured with either of these 32-bit cores, memory, and some peripherals.
The platform is designed for parallel ultra-low power computing and has been actively developed
since 2013 by the involved institutions and partners.

The PULP platform offers a variety of cores for building efficient RISC-V systems. Some of the key
cores provided by the PULP platform include:

CV32E40P (RI5CY): An in-order, single-issue 32-bit RISC-V core with a 4-stage pipeline, tuned for
high energy efficiency. It implements the RV32IMC instruction set and has an optional 32-bit FPU
supporting the F extension, as well as instruction set extensions for DSP, including hardware
loops, SIMD extensions, bit manipulation, and post-increment instructions.

Ibex (Zero-riscy): A single-issue 32-bit RISC-V integer core also tuned for high energy
efficiency. It is a minimal core with a 2-stage pipeline.

Micro-riscy: A minimal 32-bit RISC-V core with a 2-stage pipeline, designed for ultra-low power
computing.

Snitch: A small 32-bit RISC-V integer core with a single-stage pipeline, optimized for ultra-low
power computing.

CVA6 (Ariane): An application-class, 6-stage RISC-V CPU capable of booting Linux. It is maintained
by the OpenHW Group and is available on their GitHub repository. The CORE-V CVA6 is a
6-stage, in-order, 64-bit RISC-V CPU. It is an open-source core that can be used for various
applications, including running the Linux operating system.

These cores can be used to build a wide range of systems, from simple microcontrollers to
complex, energy-efficient accelerators for DSP loads.

Figure 2: Ariane core

Figure 3: RISCY core

Figure 4: zero-riscy (Ibex) core

In addition to the versatile cores mentioned, the PULP (Parallel Ultra-Low Power) platform also provides a
set of hardware intellectual properties (HW IPs) and peripherals that contribute to the flexibility and
functionality of the overall system design. These HW IPs are designed to complement the RISC-V cores and
enhance the capabilities of systems built on the PULP platform.

Some of the notable HW IPs found in the PULP platform include:

Memory IPs: PULP offers various memory IPs suitable for different applications, providing options
for both volatile and non-volatile memory. These memory IPs are crucial for storing program code,
data, and other essential information in a system.

Interconnect IPs: To enable efficient data transfer and communication between different
components within a system, PULP provides communication IPs such as AXI4 & APB interconnects
and buses. These IPs play a crucial role in ensuring seamless connectivity and data exchange.

Peripheral IPs: PULP supports a range of peripheral IPs that can be integrated into the system to
enhance its capabilities. These peripherals may include interfaces like UART (Universal
Asynchronous Receiver-Transmitter), SPI (Serial Peripheral Interface), I2C (Inter-Integrated
Circuit), GPIO (General-Purpose Input/Output), DMAs and more.

The PULP platform offers a variety of ASIC chips, ranging from single-core microcontrollers to
multi-cluster systems. For microcontrollers, PULP provides:

• PULPino
A minimal single-core RISC-V SoC and PULP’s first open-source release, which attracted a lot
of attention.
• PULPissimo
An advanced version of the PULP microcontroller. The main change is the presence of the
logarithmic interconnect between the core and the memory subsystem, allowing multiple
access ports. These are then used by an integrated uDMA that can copy data directly
between peripherals and memory, as well as by optional accelerators called
Hardware Processing Engines (HWPEs).

Figure 5: Structure of a PULP microcontroller

For cluster-based systems, PULP provides:

• Mr.Wolf
• Mia Wallace
• Flumine
• Honey Bunny

These chips are based on clusters of 32-bit RISC-V cores with direct access to a small and fast
scratchpad memory (Tightly Coupled Data Memory). The cluster is supported by a SoC that houses a
larger second-level memory, peripherals for input and output, and, in later versions, a complete
PULPissimo-class microcontroller for power management and basic operations. They mainly target
heterogeneous computation platforms.

Figure 6: PULP cluster

Multi-cluster accelerators

• HERO platform

HERO combines a PULP-based open-source parallel manycore accelerator implemented on FPGA with a
hard ARM Cortex-A multicore host processor running full-stack Linux. HERO is the first heterogeneous
system architecture that mixes a powerful ARM multicore host with a highly parallel and scalable manycore
accelerator based on RISC-V cores.

Figure 7: PULP Multi-cluster

1.2 PULP project tape outs

Figure 8: PULP ASIC chips

The PULP project has taped out and tested more than 40 PULP-related designs based on the released
open-source HW IPs. [1]

1.3 PULPino SoC


PULPino is a single-core System-on-a-Chip built for the RISC-V RI5CY and zero-riscy cores. PULPino reuses
most components from its bigger brother PULP. It uses separate single-port data and instruction RAMs. It
includes a boot ROM that contains a boot loader that can load a program via SPI from an external flash
device. Figure 9 shows a block diagram of the SoC. The SoC uses an AXI as its main interconnect with a
bridge to APB for simple peripherals. Both the AXI and the APB buses feature 32-bit wide data channels.
For debugging purposes, the SoC includes an advanced debug unit which enables access to core
registers, the two RAMs and memory-mapped IO via JTAG. Both RAMs are connected to the AXI bus via
bus adapters.

Figure 9: PULPino

PULPino supports both the RISC-V RI5CY and the RISC-V zero-riscy cores. The two cores have the same
external interfaces and are thus plug-compatible. Figure 3 shows the RI5CY core architecture and Figure 4
shows the zero-riscy core. The core uses a very simple data and instruction interface to talk to data and
instruction memories. To interface with AXI, a core2axi protocol converter is instantiated in PULPino. For
debugging purposes, all core registers have been memory mapped, which allows them to be accessed
over the AXI bus. The debug unit inside the core handles the requests over this bus and reads/sets the core
registers and/or halts the core.

1.4 PULPino SoC Peripherals:


GPIO

UART

The UART used in this system is compatible with a 16750. It features all the typical UART signals,
see Table 5.1, plus some additional signals defined by the 16750.

I2C

I2C is an open drain signaling protocol, meaning that the pad output driver will be switched on
and off, while always driving a low value when enabled. Logic high values are achieved by using a
pull-up resistor on the SDA and SCL lines.

Timer

The timer unit has 2 timers by default. This can be overridden by a parameter when instantiating
the timer.

SPI Slave

The SPI slave is an active peripheral in the sense that it receives/sends data without the assistance
of the core. Its intended purpose is to function as an external interface through which a user of
the system can access the internal memories from outside. This mechanism can be used to pre-load
programs into the memories, start the system, wait for an acknowledgment that the program
has terminated and then examine the results. The SPI slave has an AXI master through which it
can access all peripherals and memories. From the outside, it can be accessed by sending SPI/QSPI
commands to it. QSPI is an extension to SPI that uses four data lanes instead of one and is thus
four times as fast as the standard SPI.

SoC Control

PULPino features a small and simple APB peripheral which provides information about the
platform and provides the means for CLK gating on the ASIC.[2]

1.5 PULPissimo SoC


PULPissimo is a 32-bit RI5CY single-core System-on-a-Chip. PULPissimo is the second version of the
PULPino system, and it can be extended with the multi-core cluster of the PULP project. Unlike
the simpler PULPino system, PULPissimo uses a more complex memory subsystem, an autonomous I/O
subsystem which uses the uDMA, new peripherals (e.g., the camera interface (CAMIF)) and a new SDK.

Figure 10 shows a simplified block diagram of the SoC. As for PULPino, PULPissimo can be configured at
design time to use either the RI5CY or the zero-riscy core. The peripherals are connected to the uDMA, which
transfers the data to the memory subsystem efficiently. The JTAG and the AXI plug also have access to
the SoC. The AXI plug can be used to extend the microcontroller with a multi-core cluster or an
accelerator. As for PULPino, the advanced debug unit is used to access system and core registers,
memories, and memory-mapped IO via JTAG. A logarithmic interconnect allows the core and the
uDMA to access the memory banks simultaneously. [3]

Figure 10: PULPissimo SoC

1.6 PULPissimo SoC Peripherals:
FLLs

PULPissimo contains 3 frequency-locked loops (FLLs). One FLL is meant for generating the clock for
the peripheral domain, one for the core domain (core, memories, event unit etc.) and one is meant
for the cluster. The latter is not used.

APB GPIO

Timer

The timer unit has 2 timers by default. This can be overridden by a parameter when instantiating
the timer.

uDMA subsystem

The uDMA subsystem transfers data to the memory subsystem efficiently. It connects to all
peripherals and to the core through the logarithmic interconnect.

2 ASIC Design approaches


2.1 Full Custom
Full custom is a design methodology in which the resistors, transistors, digital logic,
capacitors, and analog circuits are all individually positioned in the circuit layout ("handcrafted" designs). [4]

Pros: Maximum performance, minimized area and highest degree of flexibility.

Cons: Huge design effort, high design and NRE cost, design is frozen in silicon, and long time to
market.

2.2 Semi-Custom [Standard Cell Based ASICs]


Components from a predesigned standard cell library are used. All logic cells are predesigned, while
the mask layers are still customized for each design. Standard cell libraries are usually designed
using the full custom approach. [4]

Pros over full custom: Easier, automatable/less design effort, practical to use for large designs,
reasonable turnaround time (TAT) and reduced risk.

Figure 11: Semi custom

2.3 Programmable ASICs: Field Programmable Gate Arrays (FPGAs)


FPGAs are large, complex reconfigurable devices. Their distinguishing features are programmable
logic cells and programmable interconnect; no mask layer is customized.
Xilinx/AMD, Altera/Intel, and Microsemi/Microchip are some of the important FPGA companies. [5]

Figure 12: FPGA

FPGA vs. ASIC

Advantages
• FPGA: Faster time to market. No NRE. Simpler design cycle, more predictable. Re-programmability.
Reusability. Perfect for prototyping. Has built-in blocks like MACs, memories, and high speed IOs.
• ASIC: Lower unit cost for mass production. Faster than FPGA. Lower power than FPGA. More
flexible; analog and mixed-signal designs can be created, which is not possible in FPGA.

Disadvantages
• FPGA: Higher unit cost. Slower than ASIC. Higher power than ASIC. No control over power
optimization. Design size is limited to the FPGA resources.
• ASIC: Longer time to market. High NRE, very expensive tools. Design cycle has to
analyze/enhance more aspects like DFM, crosstalk, EMIR, LVS, ERC/PERC. Design is frozen in silicon.

Typical Usage
• FPGA: Complex products in small volume (e.g., medical and defense). Digital HW prototyping.
• ASIC: Products in large volume.

3 Standard Cell Library


Standard Cell Libraries are crucial building blocks in ASIC design, offering a collection of pre-designed,
pre-characterized cells representing basic logic functions. These cells include gates, flip-flops, and
latches, designed to be easily interconnected to create complex digital circuits. The significance of
Standard Cell Libraries lies in their ability to balance flexibility, performance, and efficiency in chip design
[6].

3.1 Cell categories


• All basic and universal gates (AND, OR, NOT, NAND, NOR, XOR, etc.)

• Complex gates (MUX, HA, FA, Comparators, AOI, OAI etc)

• Clock tree cells (Clock buffers, clock inverters, ICG cells etc)

• Flip flops and latches

• Delay cells

• Physical only cells


• Scannable Flip flops, Latches.

3.2 Specify Libraries


The standard cell library is delivered by the fab as a collection of files that provide all the
information needed by the various EDA tools. It is provided in 3 different views:

Figure 13: SC Library views


Behavioral view: (std_cell.v / std_cell.vhd) used in gate-level simulation (GLS).

Physical view: (.gds) and (.lef) files that represent the layout. They are not used in synthesis but are used in PnR.

Liberty timing view: (.lib / .db) file, characterized at a specific PVT. It is used as an input to the
synthesis tool and provides timing, area, and power information: propagation delay, output transition
time, maximum output capacitance, and cell setup and hold times.

4 Standard cell library Characterization


4.1 Liberty file (.lib/.db file)
The .lib file is a human-readable ASCII format (the .db file is its compiled binary form) that
characterizes the standard cell library cells in terms of timing, area, power and other parameters.
Each cell is characterized using SPICE simulation; timing and power results are obtained under a
variety of conditions.
The liberty file contains information about:

Library: General information common to all cells in the library, for example operating conditions,
wire load models, and look-up tables.

Figure 14: Liberty file library information

Cell: Specific information about cell characterization, for example function, area, and leakage power.

Figure 15:Liberty file cell information

Pin: Timing, power, capacitance, leakage, functionality, design rules and other characteristics of each
pin in each cell.

Figure 16:Liberty file pin information

Running SPICE on a full design would consume a lot of time and computing resources. Instead, we use
a timing model that abstracts cell behavior and simplifies calculations.

For every timing arc, the .lib enables us to calculate:

A. Propagation delay

B. Output transition

Based on:

A. Input transition

B. Output load capacitance

Figure 17: Timing model that abstracts cell behavior.

For each signoff corner, we use the .lib file provided for this corner to perform static timing
analysis (STA) as well as power analysis.

4.1.1 Delay Calculation Model


Synopsys supports several delay models; the non-linear delay model (NLDM) is the most widely used in
the ASIC world.

The NLDM uses a circuit simulator to characterize a cell with various input transitions and output
load capacitances and records the results in a table.

If the delay has to be determined for a point not recorded in the table, then the value can be
determined through interpolation or extrapolation.

Figure 18: Delay Calculation Table Model
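
The table lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes a two-dimensional table indexed by input transition and output load and uses plain bilinear interpolation (extending the nearest segment for points outside the table). The function names and all numeric values are hypothetical and are not taken from any real library; commercial tools may use a different fitting scheme.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Return indices (i, i+1) of the axis segment used for x.
    Points outside the axis reuse the nearest segment (extrapolation)."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a table indexed by (input slew, output load)."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    # Interpolate along the load axis on both slew rows, then along slew.
    row0 = table[i0][j0] * (1 - tl) + table[i0][j1] * tl
    row1 = table[i1][j0] * (1 - tl) + table[i1][j1] * tl
    return row0 * (1 - ts) + row1 * ts

# Hypothetical characterization data (made-up values, for illustration only).
SLEWS_NS = [0.1, 0.5]       # input transition axis
LOADS_PF = [0.01, 0.05]     # output load axis
DELAY_NS = [[0.10, 0.20],   # rows: input slew, columns: output load
            [0.20, 0.40]]

delay = nldm_lookup(SLEWS_NS, LOADS_PF, DELAY_NS, slew=0.3, load=0.03)
print(f"interpolated delay = {delay:.3f} ns")
```

The same function can be applied to an output-transition table, since the .lib stores delay and transition tables with the same two indices.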

4.2 Parasitic Estimation: WLM
During logic synthesis, we don’t have actual placed cells. We use a wire load model (WLM) to
estimate interconnect parasitics based on the fanout of the net.

Figure 19: Wire Load Model
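
As a rough illustration of how a fanout-based estimate works, the sketch below models a wire load table as a mapping from fanout to estimated wire length, with a slope for fanouts beyond the table and per-unit-length resistance and capacitance. The function name, the assumption that the table has an entry for every fanout from 1 up to its maximum, and all numbers are hypothetical.

```python
def wlm_estimate(fanout, fanout_length, slope, cap_per_unit, res_per_unit):
    """Estimate (length, capacitance, resistance) of a net from its fanout.

    fanout_length maps fanout -> estimated wire length and is assumed to
    contain every fanout from 1 up to its largest key.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    max_fanout = max(fanout_length)
    if fanout <= max_fanout:
        length = fanout_length[fanout]
    else:
        # Beyond the table: extend linearly using the model's slope.
        length = fanout_length[max_fanout] + slope * (fanout - max_fanout)
    return length, length * cap_per_unit, length * res_per_unit

# Hypothetical wire load model (made-up values, for illustration only).
WLM_TABLE = {1: 10.0, 2: 20.0, 3: 35.0}

length, cap, res = wlm_estimate(5, WLM_TABLE, slope=15.0,
                                cap_per_unit=0.0002, res_per_unit=0.001)
print(length, cap, res)
```

Because the estimate depends only on fanout, two nets with the same fanout get the same parasitics regardless of where their cells end up, which is why the estimate is later replaced by extracted parasitics after placement and routing.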

5 LEF: Library exchange format (.lef file)

It’s a readable ASCII format that contains detailed pin information that is used later by PnR tools
to guide routing.

The LEF file abstracts the following information for PnR tools:

A. Cell size and shape.

B. Pin locations and layers.

C. Metal blockages (OBS section), which represent internal metal shapes of the cell that must not be
touched by routing.

Figure 20: GDS vs. LEF

5.1 Layout of cells [std_cell.gds]:


It contains information regarding layout of the cell and physical layers.

Used for LVS, DRC and custom layout.

GDSII file is a binary file format representing planar geometric shapes, text labels, and other information
about the layout in hierarchical form.

5.2 SPICE netlist

The netlist of each cell in SPICE format is used for simulation. In digital implementation, it is
typically used for LVS checking.

5.3 The Multiple Analysis Corners

A corner is defined as a process, voltage, and temperature (PVT) combination, and it is provided to
the analysis and optimization tool as logic libraries per PVT and parasitics data.
Corners are not due to functional settings, but rather result from process variations during
manufacturing, and voltage and temperature variations in the environment in which the chip will
operate.

Each standard cell library is characterized for a set of signoff corners, according to the required signoff
corner for the design(s) that will use the library later.

Supply voltage variations: supply noise, or a DC source or voltage regulator whose output changes over
time. The supply can go above or below the expected voltage, which changes the current and makes
the circuit faster or slower than nominal.

Temperature variation: in general, the delay increases as the temperature increases. For technology
nodes below 65nm, however, the delay can increase as the temperature decreases (temperature
inversion), and it is then maximum at -40°C.

Process variations: process variation is the deviation of transistor attributes during fabrication.

For the same technology, the same library is therefore provided for different operating conditions.

1- fast-fast (ff) library (min library): lowest cell delays in the technology (fast process corner).
It gives the worst case for hold analysis, so it is used for hold checks.

2- slow-slow (ss) library (max library): highest cell delays in the technology (slow process corner).
It gives the worst case for setup analysis, so it is used for setup checks.

3- typical-typical (tt) library (normal library): nominal cell delays in the technology (typical
process corner).

6 Static Timing analysis


Static Timing Analysis (STA) is a method for determining whether a circuit meets its timing
constraints without having to simulate it.

STA:

• Validates whether the design can operate at the set timing constraints (it reports timing violations).

• Is a complete and exhaustive verification of all timing checks of a design.

• Is used instead of simulation.

Figure 21: STA flow

6.1 Timing Path Types


There are 4 types of paths in any synchronous circuit:

1. Input to register.
2. Register to register.
3. Register to output.
4. Input to output.

A start point can be:

1. The clock pin of a FF.
2. An input port.

An endpoint can be:

1. The data input pin of a FF.
2. An output port.

Figure 22: Timing paths

Required time specifies the time point (interval) at which data is required to arrive at end point (data is
required to be stable after arrival).

Arrival time defines the time interval during which a data signal will arrive at a path endpoint (after
arrival time signal will be stable).

Slack is the difference between required time and arrival time.

If the slack is negative, then there is a timing violation.

If the slack is positive, then the constraints are met. The critical path is the path with the
lowest slack.

The STA tool calculates the slack of each logic path in order to find the critical path.

Figure 23: positive and negative slacks
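
The slack definition above can be illustrated with a short Python sketch. It assumes the usual sign convention in which setup slack is required time minus arrival time, while hold slack is arrival time minus required time, so that a negative value is a violation in both cases. The path names and times (in picoseconds) are hypothetical.

```python
def setup_slack(required_ps, arrival_ps):
    """Setup (max-delay) slack: data must arrive no later than the required time."""
    return required_ps - arrival_ps

def hold_slack(required_ps, arrival_ps):
    """Hold (min-delay) slack: data must arrive no earlier than the required time."""
    return arrival_ps - required_ps

def critical_path(slacks):
    """Return the name of the path with the lowest slack."""
    return min(slacks, key=slacks.get)

# Hypothetical setup data: path name -> (required time, arrival time) in ps.
paths = {
    "in_to_reg":  (2000, 1200),
    "reg_to_reg": (2000, 2150),
    "reg_to_out": (2000, 1700),
}

slacks = {name: setup_slack(req, arr) for name, (req, arr) in paths.items()}
violations = sorted(name for name, s in slacks.items() if s < 0)
worst = critical_path(slacks)
print(slacks, violations, worst)
```

In this made-up data set only the register-to-register path has negative slack, so it is both the single violation and the critical path.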

7 Early and Latest analysis approaches:


• Early (minimum-delay) analysis: assumes circuits have minimum delay and compares the arrival time
to the earliest required time (hold check).
• Late (maximum-delay) analysis: assumes circuits have maximum delay and compares the arrival time
to the latest required time (setup check).

8 Timing Verification of Synchronous Designs


A D-FF is characterized by 3 main parameters:

1. T[clk-to-q]: clock-to-output delay.
2. T[setup]: setup time.
3. T[hold]: hold time.

Figure 24: D-FF timing parameters

T[clk-to-q]

The amount of time needed for a change at the flip-flop clock input (e.g. a rising edge) to result
in a stable change at the flip-flop output (Q).

T[setup]

Setup time is the minimum amount of time the data input should be held steady before the clock event,
so that the data is reliably sampled by the clock.

T[hold]

Hold time is the minimum amount of time the data input should be held steady after the clock event, so
that the data is reliably sampled by the clock. It is not dependent on the clock period.

8.1 The requirements of Setup and Hold on timing paths

Figure 25: Setup and Hold on timing paths

• For Setup: T[clk-to-q] + T[comb] + T[setup] ≤ T[clk] + T[skew]

• For Hold: T[clk-to-q] + T[comb] ≥ T[hold] + T[skew]
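
The two inequalities can be checked numerically as in the sketch below. It assumes that a positive T[skew] means the capture clock arrives later than the launch clock, which is the convention implied by the inequalities above. The function names and all delay values (in picoseconds) are hypothetical.

```python
def setup_margin(t_clk, t_skew, t_clk2q, t_comb, t_setup):
    """Margin of the setup inequality; negative means a setup violation."""
    return (t_clk + t_skew) - (t_clk2q + t_comb + t_setup)

def hold_margin(t_skew, t_clk2q, t_comb, t_hold):
    """Margin of the hold inequality; negative means a hold violation.
    Note that the clock period does not appear here."""
    return (t_clk2q + t_comb) - (t_hold + t_skew)

def min_clock_period(t_skew, t_clk2q, t_comb, t_setup):
    """Smallest clock period that still satisfies the setup inequality."""
    return t_clk2q + t_comb + t_setup - t_skew

# Hypothetical flip-flop and path delays in ps.
T_CLK2Q, T_SETUP, T_HOLD = 100, 80, 40

long_path = setup_margin(t_clk=1000, t_skew=50, t_clk2q=T_CLK2Q,
                         t_comb=800, t_setup=T_SETUP)
short_path = hold_margin(t_skew=50, t_clk2q=T_CLK2Q, t_comb=20, t_hold=T_HOLD)
more_skew = hold_margin(t_skew=100, t_clk2q=T_CLK2Q, t_comb=20, t_hold=T_HOLD)
print(long_path, short_path, more_skew)
```

With these numbers, extra positive skew relaxes the setup check but turns the short path into a hold violation, which is why skew has to be considered for both checks.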

9 What is metastability?
Whenever the input signal D does not meet the T[setup] or T[hold] requirement of a flip-flop, the
flip-flop enters a state where its output is unpredictable. This state is known as the metastable
state (quasi-stable state).

At the end of the metastable state, the flip-flop settles down to either '1' or '0'.

10 Logic Synthesis:
Synthesis is the process that translates and maps RTL code written in an HDL into a
technology-specific gate-level netlist, optimized for a set of predefined constraints. It is an
iterative process aimed at achieving design goals. [9]

Inputs:
• Behavioral RTL Code.
• Standard Cell library.
• Set of Design Constraints. [7]

Outputs:
• Mapped Gate level netlist.
• Timing, area and power reports.
• .svf file. [7]

Figure 26: inputs and outputs of logic synthesis

10.1 Steps of Synthesis:


• Translation:
It translates RTL code into a generic technology (GTECH) netlist. This netlist doesn’t depend on
the technology (no timing information). Each cell has two inputs and there is no optimization. [7], [9]

• Mapping and Optimization:
This step depends on the technology (timing information exists). It maps combinational and
sequential generic cells to their technology-equivalent cells, as shown in Figure 27, achieving
high-level optimization. [7], [9]

Figure 27: Steps of synthesis

10.2 Synthesis Flow and Design Constraints:


• Specify Libraries
1. search_path variable
Defines the paths in which to search for design and library files.

DC command: lappend search_path “........put the path......”

2. target_library
Specifies the standard cell timing libraries, in .db format, that are used in the mapping step. The
libraries should be in a path defined in the search_path variable.

DC command: set target_library [list lib1.db lib2.db]

3. link_library
For a design to be complete, it must connect to all the library components and designs it
references.

DC command: set link_library [list * lib1.db lib2.db Macrolib.db]

The asterisk “*” specifies that Design Compiler should also search for references in the
designs already loaded in memory, in addition to the listed libraries.

• Read Design
The synthesis tool reads the design files and checks for non-synthesizable constructs. There are two
methods to read files:

- analyze / elaborate method:
analyze checks the syntax and reports errors. elaborate translates the design into a
technology-independent representation (GTECH) and performs link automatically.

DC commands:

analyze -format verilog {ALU.v Regfile.v TOP.v}

elaborate -lib work TOP

- read_file method:
This method checks the syntax, reports errors, and then translates the design into a
technology-independent representation (GTECH). It doesn’t execute the link command automatically.

DC commands:

read_file -format verilog {ALU.v Regfile.v TOP.v}

link

• Define Design Constraints


The constraints help the tool to optimize the netlist. There are three main categories of Design
constraints:

1) Timing definitions.
SDC commands:

- create_clock
This command is used to create a master clock object in the current design.

- create_generated_clock
This command is used to create a generated clock object in the current design and
specify the clock source from which it is generated. A generated clock is a clock that is
derived by on-chip logic from another clock, for example by clock dividers or clock gating. The
advantage of using this command is that whenever the master clock changes, the
generated clock changes automatically.

- set_clock_uncertainty
Clock uncertainty is the variation in the arrival times of the clock edges resulting from
clock jitter and clock skew.

- set_clock_latency
Clock latency is the amount of time it takes for the clock signal to propagate from the
original clock source to the clock pin of a sequential element in the design.

Figure 28: Clock Skew

- set_dont_touch_network
This command is used on clocks to prevent synthesis from buffering clock trees.

- set_input_delay
This command sets the input delay, i.e., the time at which a signal arrives at an input port
(or pin) relative to a clock signal.

- set_output_delay
This command sets the output delay, i.e., the time a signal still needs to travel after leaving
an output port (or pin), relative to a clock signal.

Figure 29: input and output delay

- set_max_delay
This command specifies the desired maximum delay for paths in the current design. The delay of
any path from a start point in from_list to an endpoint in to_list must be less than delay_value.

- set_min_delay
This command is used to set the minimum delay target for paths in the current design.
Minimum delay is considered as an optimization constraint by the compile command. If
a path violates the requirement given in a set_min_delay command, compile adds delay
to fix the violation.

2) Modeling the world external to the chip.


- set_driving_cell or set_input_transition
Rise and fall transition times on an input port affect the cell delay of the input gate. It
is therefore important to accurately model transition times on all inputs. For modeling
input transition, a specific transition time value or a driving cell on the input pin can be
used.

- set_load
Capacitive load on an output port affects the transition time, and thus the cell delay, of
the output driver. The value is in picofarads by default.

3) Optimization goals and timing exceptions.


- set_false_path
It is possible that certain timing paths are not real (or not possible) in the actual
functional operation of the design. Such paths can be turned off during STA by setting
them as false paths. Even if such a path violates a timing requirement according to STA, the data
never propagates through it in actual circuit operation, so it does not need to meet
any timing constraint. A false path is ignored by STA.

- set_multicycle_path
Multicycle paths are data paths that require more than one clock period for execution. In some
cases, the combinational data path between two flip-flops can take more than one clock cycle to
propagate through the logic. For example, if the data path can take up to three clock cycles, a
setup multicycle check of three cycles should be specified.

Figure 30: multicycle path

- set_max_area
Because timing has a higher priority than area in Design Compiler, the maximum area is usually
set to zero, so that optimization achieves the smallest possible area with timing met.

- set_case_analysis
A common case for designs is that some value should be assumed constant so this
command is used to propagate this constant through the design and disable irrelevant
timing arcs.

- set_max_transition
Sets the maximum transition time for all nodes of the design.

- set_max_capacitance
Sets the maximum capacitance of a net for all nodes of the design.

- set_max_fanout
Sets the maximum number of load cells a pin can be connected to.

● Select Compile Strategy


Can be repeated as necessary.

● Optimize the Design (Design Compiler command: compile or compile_ultra)


Optimization is the step in the synthesis process that attempts to implement a combination of
library cells that meets the functional, speed, and area requirements of your design.

● Generate Reports
Reports for Area, Power and Timing.

● Generate Gate Level Netlist (Design Compiler command: write)


Use write command to save and export the design from memory to disk, in the required format.

11 Formal Verification:
Formal verification refers to the process of establishing the functional equivalence of two designs. One is
called the reference design (ref), which is known to be functionally correct, and the other is called the
implementation design (imp), which is a modified version of the reference design that you want
to verify as functionally equivalent to the reference design.

Equivalence checkers prove or disprove that one design representation is logically equivalent to another.
In other words, the two circuits exhibit exactly the same behavior under all conditions despite their
different representations. The purpose of Formality is to detect unexpected differences that might have
been introduced into a design during development. [8], [9]

Figure 31: formal verification

Formality can be used to verify two RTL designs against each other, two gate-level designs against each
other, or an RTL design against a gate-level design. [7], [9]

11.1 Main concepts in Formality are:
● Compare Point:
− Primary output of a circuit.
− Registers within a circuit.
− Input to black boxes within a circuit. [8]

● Logic Cone:
A block of combinational logic which drives a compare point. [8]

Figure 32: logic cones and compare points

11.2 Guidance (Load Automated Setup File):
Before specifying the reference and implementation designs, an automated setup file (.svf) can
optionally be loaded into Formality. The automated setup file helps Formality process design changes
caused by other tools used in the design flow. Formality uses this file to assist the compare point
matching and verification process. For each automated setup file that is loaded, Formality processes
the content and stores the information for use during the name-based compare point matching period.
Because the file records these changes as they are made, Formality should be given access to it. [8], [9]

Figure 33: formal verification flow

12 Design For Test (DFT):
DFT supports manufacturing test, which seeks to ship ICs that are free of physical defects.
Packaged IC chips are tested using automated test equipment (ATE) through a test program.
If a unit fails any test in the program, it is discarded. Only units that pass every test in the
program are ever shipped to the user. [10]

Figure 34: Chip testing

12.1 Physical Defects:


A physical defect is an on-chip flaw, introduced during fabrication or packaging of
an individual IC, that makes the device malfunction. Most physical defects are caused
by dust particles or by fabrication process issues. These defects appear in a
wide variety of ways: open circuits, short circuits, and bridging between metal layers.

Figure 35: Physical Defects

12.2 Fault Models:
A logical model representing the effects of a physical defect:

● Stuck-at
A signal, or gate output/input, is stuck at a 0 or 1 value,
independent of the inputs to the circuit. [10]
Stuck-at-0: Node is shorted to GND.
Stuck-at-1: Node is shorted to VDD.

Figure 36: examples of stuck-at fault

● Bridge
Two or more normally distinct points are shorted together.

Figure 37: bridging fault

12.3 Two conditions should be met to have a testable design:

● Controllability:
All the inputs of the gates are controllable. The nearer a gate is to the inputs, the easier
it is to control; the nearer a gate is to the outputs, the more difficult it is to control.

● Observability:
The gate output is observable. The nearer a gate is to the outputs, the easier it is to
observe; the nearer a gate is to the inputs, the more difficult it is to observe.

Figure 38: Controllability and Observability

A real design is more complex than this circuit, so a method is needed to help in controlling
the inputs of the design and observing its outputs. One such method is using scan chains.
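
To make the stuck-at model, controllability, and observability concrete, the sketch below exhaustively compares a fault-free circuit with a faulty one. The circuit y = (a AND b) OR c, its internal node n, and the function names are hypothetical and chosen only for illustration; real ATPG tools do not enumerate all input vectors.

```python
from itertools import product


def circuit(a, b, c, stuck_n=None):
    """y = (a AND b) OR c, with an optional stuck-at fault on internal node n."""
    n = a & b
    if stuck_n is not None:
        n = stuck_n          # node n is forced to 0 or 1 regardless of a and b
    return n | c


def detecting_vectors(stuck_value):
    """Input vectors (a, b, c) whose output differs between good and faulty circuit."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, stuck_n=stuck_value)]


sa0_tests = detecting_vectors(0)
sa1_tests = detecting_vectors(1)
print("n stuck-at-0 detected by:", sa0_tests)
print("n stuck-at-1 detected by:", sa1_tests)
```

A vector detects the stuck-at-0 fault only if it drives n to 1 (controllability) and sets c = 0 so that the value of n reaches the output (observability).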

12.4 Scan Chains:


All the flip flops are replaced by scan flops (mux + normal flip flop), as shown in the figure.

The “compile -scan” command in Design Compiler causes the normal flops to be
scan-replaced during the synthesis process.

Figure 39: scan flip flop

All flops in the design are connected in series. The first flop in the chain is fed directly from an input pin,
scan input (SI), and the last flop in the chain feeds directly to an output pin, scan output (SO), as shown
in the figure below. [10]

Figure 40: scan chain

The Scan Enable (SE) signal is common to all flip flops in the chain and controls the mux to decide whether
the scan chain path or the normal functional path is activated. When SE = 0, the input
of the flip flop is the functional input D. When SE = 1, the input of the flip flop is the scan input (SI). [10]

Scan flops have a higher setup requirement, as the functional data has to go through an additional mux, so
it is possible that a path which was meeting timing before scan insertion starts failing after scan
insertion. Scan flip flops are also larger in area than normal flip flops. [10]

12.5 Mechanics of Scan Chain:

Figure 41: mechanics of scan chain

1- Shift in:
• Assert Scan (Scan Enable).
• Load input pattern serially to all FFs from scan-in.
• Logic blocks produce outputs at D-inputs of FFs. [11]
2- Capture:
• De-assert Scan (Scan Enable).
• D-inputs of FFs will be stored in Qs of FFs in 1 clock cycle (to capture logic blocks outputs). [11]

3- Shift out:
• Assert Scan (Scan Enable).
• Read results from all FFs serially from scan-out.
• Simultaneously load new input pattern serially to all FFs from scan-in. [11]
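
The three phases above can be sketched as a small behavioral model. This is an illustrative assumption, not the thesis design: the chain has three flops, `toy_logic` is a made-up combinational block, and all function names are hypothetical.

```python
def shift(chain, scan_in_bits):
    """SE = 1: clock the chain once per bit; return (new_chain, scan_out_bits)."""
    chain = list(chain)
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(chain[-1])        # SO is the Q of the last flop
        chain = [bit] + chain[:-1]        # SI enters flop 0, the rest shift along
    return chain, scan_out


def capture(chain, logic):
    """SE = 0: one functional clock; every flop stores its D input."""
    return list(logic(chain))


def toy_logic(q):
    """Hypothetical logic block: D0 = q1 AND q2, D1 = q0 XOR q2, D2 = q0."""
    return [q[1] & q[2], q[0] ^ q[2], q[0]]


chain = [0, 0, 0]
pattern = [0, 1, 1]                         # bits in the order applied at SI
chain, _ = shift(chain, pattern)            # 1- shift in
loaded = list(chain)
chain = capture(chain, toy_logic)           # 2- capture
captured = list(chain)
chain, response = shift(chain, [0, 0, 0])   # 3- shift out while the next pattern shifts in
print("loaded:", loaded, "captured:", captured, "response at SO:", response)
```

In this model the response appears at SO starting with the flop nearest the output, so it is the captured state read in reverse chain order.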

Automatic Test Pattern Generation (ATPG):


• Automatic generation of test patterns for scan ready design.
• ATPG tool is used for generating test pattern file and testbench for simulating test patterns. [11]

13 Overall Design Flow


13.1 Design Flow Overview
The design flow is a sequential progression of stages, each playing a pivotal role in the ultimate
realization of an integrated circuit. [12] The key stages encompass:

13.2 Concept + Market Research


The initial phase involves the conception of the design based on thorough market research and the
definition of architectural specifications. This stage serves as the foundation for subsequent design
decisions. [12]

13.3 RTL Coding and Simulation


Once the architectural specifications are established, the design team proceeds to create Register-
Transfer Level (RTL) code. This code, written in Verilog, captures the behavior of the circuit. Subsequent
simulation ensures that the design adheres to functional requirements, and key metrics such as timing,
area, and power are evaluated. [12]

13.4 Logic Synthesis and Formal Verification


Logic synthesis transforms the RTL code into gate-level netlists, facilitating further optimization. Formal
verification techniques are then applied to rigorously confirm the correctness of the design, ensuring
that the netlist remains functionally equivalent to the RTL while timing requirements are checked by
static timing analysis. [12]

13.5 Place and Route (PnR)


The transition from logical to physical design occurs during the Place and Route (PnR) stage. [12] This
encompasses several critical steps:

13.5.1 Floorplanning
Floorplanning involves the strategic determination of the chip's major design objects' size and
placement. Key influencers include the digital frontend, digital backend, analog-mixed signal designer,
and the project manager. Decisions made during floorplanning significantly impact the subsequent
stages of the design flow. [13]

13.5.2 Power Planning & Boundary Cell Insertion


Strategic power planning ensures efficient power distribution, and boundary cell insertion establishes
the chip's periphery, shaping its overall structure. [13]

13.5.3 Placement
Placement involves positioning standard cells within the defined floorplan, considering factors such as
congestion, cell density, and overall accessibility for routing. [13]

13.5.4 CTS and Post-CTS Logical Optimization


Clock Tree Synthesis (CTS) ensures a balanced and efficient distribution of clock signals throughout the
design. Post-CTS logical optimization refines the design for improved performance. [13]

13.5.5 Routing
The routing stage involves the creation of physical interconnections between the placed cells, adhering
to design rules and ensuring the integrity of signal paths. [13]

13.6 Physical Verification


Post-PnR, the design undergoes rigorous physical verification processes. This includes checks for
adherence to design rules, layout versus schematic (LVS) clean results, and confirmation that power
delivery network requirements are met. [12]

13.7 Signoff/Tapeout
The final stage involves comprehensive signoff checks to ensure that all criteria, including timing, power,
and area requirements, are met. Once validated, the design is taped out for fabrication. [12]

14 Floorplanning
14.1 Chip Floorplanning
The chip floorplanning process is a critical undertaking that
involves several considerations: [14]

14.1.1 IO Pads
Input/Output (IO) pads serve as crucial intermediaries
connecting internal signals to the external pins of the chip
package. They are organized into an IO ring at the chip's
periphery, featuring various types such as signal IOs,
power IOs, corner IOs, and filler IOs. [16]
Figure 41: I/O Pads

14.2 IP Floorplanning
For Intellectual Property (IP) floorplanning, the focus
shifts to die area, core area, pin placement, hard
IPs/macros placement, power delivery, and voltage
domains. [17]

14.3 Standard cell placement and Routing


Standard cell placement involves creating site rows and
placing cells so that routing and power grid requirements are
respected, while keeping area utilization at a level that
makes it easy to close routing. [17]

Figure 42: Chip Floorplanning

14.4 Blockages and Halos

Placement blockages and halos guide the tools in placing cells, with
considerations for partial blockages and halos to maintain optimal
utilization. [15]

14.5 Guidelines for a Good Floorplan


Guidelines for a well-structured floorplan emphasize careful macro
placement, adequate routing channels, pin accessibility, and congestion avoidance.
[17]

Figure 43: Placement Blockages and Halos

15 Power Planning and Related Challenges


15.1 Introduction
Power planning is a critical aspect of integrated circuit
design that ensures a uniform supply of voltages to all
standard cells and macros. This part delves into the
intricacies of power planning, addressing challenges
such as IR drop, dynamic IR, and electromigration. [15]

15.2 Power Planning Overview


Power planning is essential for maintaining stable
voltages, minimizing noise, avoiding IR drop, and
preventing electromigration-related issues. The power
distribution network (PDN) aims to uniformly
distribute power throughout the chip while
minimizing area utilization and simplifying layout. [15]

Figure 44: Power Planning

15.3 Three Levels of Power Distribution
1. Rings: Circulate (VDD, VSS) around the chip.

2. Stripes: Connect (VDD, VSS) from rings to the core.

3. Rails: Link (VDD, VSS) to standard cell pins.

Identifying suitable metal layers, widths, and pitch for stripes and rings, and determining the
number and location of power pads, are crucial steps in this process. [15]

Figure 45: Levels of Power Distribution

16 IR Drop
IR drop is a fundamental source of power noise and is addressed through the Power Delivery
Network (PDN) or power grid. Two types of IR drop exist: static and dynamic. [18], [19].

Figure 46: IR Drop

16.1 Effects of IR Drop


1. Timing Violations: Delay of standard cells depends on the available VDD, introducing potential
timing issues.

2. Propagation Delay: Lower VDD leads to higher propagation delay.

3. Potential IC Failure: Excessive IR drop, if unmitigated, can cause IC failure.

4. Power Noise: Introduction of power noise, including voltage drop and ground bounce, in power
supply nets.

16.2 Mitigating Static IR Drop


Static IR drop, an average voltage drop within the
design, is influenced by the RC network of the
power grid. [18], [19]. Mitigation strategies include:

• Utilizing higher metal layers if available.

• Increasing the width and number of straps.

• Verifying the presence of necessary vias.

Figure 47: Static IR Drop
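
The benefit of wider straps can be illustrated with Ohm's law on a simplified rail. The sketch below is only a toy model under stated assumptions: a single rail fed from one end, equal segment resistances, and constant average cell currents. The function name and all values are hypothetical.

```python
def rail_voltages(vdd, seg_resistance_ohm, cell_currents_a):
    """Node voltages along a rail fed from one end (static, average currents).

    Each segment carries the current of every cell at or beyond its far node,
    so the drop accumulates toward the far end of the rail.
    """
    voltages = []
    v = vdd
    remaining = sum(cell_currents_a)
    for i_cell in cell_currents_a:
        v -= seg_resistance_ohm * remaining   # V = I * R on this segment
        voltages.append(v)
        remaining -= i_cell
    return voltages


# Hypothetical numbers: 1.0 V supply, four cells drawing 1 mA each.
narrow = rail_voltages(1.0, 2.0, [1e-3] * 4)   # 2 ohm per segment
wide = rail_voltages(1.0, 1.0, [1e-3] * 4)     # wider strap: half the resistance
worst_narrow = 1.0 - min(narrow)
worst_wide = 1.0 - min(wide)
print(f"worst drop, narrow strap: {worst_narrow * 1e3:.0f} mV")
print(f"worst drop, wide strap:   {worst_wide * 1e3:.0f} mV")
```

In this model, halving the segment resistance halves the worst-case drop, and the cell farthest from the supply always sees the lowest voltage.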

16.3 Mitigating Dynamic IR Drop
Dynamic IR drop occurs due to high switching activity of transistors, causing a peak current demand. [18],
[19]. Strategies for mitigation include:

• Incorporating decap cells.

• Increasing the number of straps.

• Managing cell density and switching activity in specific regions.

17 Decap Cells
Decap cells, acting as capacitors, are essential for addressing dynamic IR drop.
They provide extra charge to the Power Delivery Network (PDN), supporting the
power network when current demand peaks. Decap cells are made from
MOS transistors and are inserted after the routing stage. [18], [19].

Figure 48: Decap Cells

18 Analyzing IR Drop
Analyzing the IR drop involves mapping it using a color map to highlight "hot spots" where the IR drop is
significant. Adjustments, such as adding extra stripes in problematic areas, can improve the overall
situation. [18], [19].

19 Static vs. Dynamic IR Analysis


Aspect | Static IR Analysis | Dynamic IR Analysis
Objective | Generating a robust power delivery network | Optimizing the insertion of decoupling capacitors
Timing | Can be used right after placement | Used as a final check
Current Calculation | Average current over a period (varying with frequency) | Instantaneous current at a specific time (independent of frequency)

20 Electromigration
Electromigration occurs when high current density displaces metal ions within the
metal layer. Effects include opens and shorts, leading to potential IC failure. The
drift of metal ions depends on current density, and mitigating strategies involve
considering wider power lines. [20]

Figure 49: Electromigration
21 Placement in Physical Design
21.1 Introduction
Placement is a crucial step in the physical design of
integrated circuits, where the goal is to optimally
position standard cells within the core boundary. This
part explores the placement process, its objectives,
and various optimization techniques employed to
achieve optimal results. [21]

21.2 Placement Overview


Placement involves arranging standard cells within a predefined floorplan while adhering to
technology constraints. The main objectives of placement include providing a legal location for
the entire netlist, minimizing routing congestion, and achieving optimal timing, area, and
power results. [21]

Figure 50: Placement

21.3 Timing-Driven Placement


Timing-driven placement integrates timing considerations into the placement process. It leverages
information from synthesis netlists to optimize cell positions for improved overall timing performance.
[21]

22 Placement Flow
The placement flow is typically divided into two stages: global placement and detailed placement
(legalization). [21]

22.1 Global Placement


• Cells are quickly assigned to “bins” while trying to minimize the number of connections between
groups. [21]
• In this stage, the tool does not check for overlap between instances. [21]

22.2 Detailed Placement


• Legalization: Provides a legal placement for each instance while minimizing wirelength or other
cost metrics. [21]
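
The legalization step can be sketched for a single row as follows. This is a simplified greedy approach assumed for illustration, not the algorithm of any particular tool: overlapping positions from global placement are snapped to sites and pushed right until no cells overlap. Cell names, positions, and widths are hypothetical.

```python
def legalize_row(cells, row_width):
    """Greedy single-row legalization.

    cells: list of (name, desired_x, width), positions and widths in site units.
    Returns {name: legal_x} with no overlaps, keeping the left-to-right order.
    """
    placed = {}
    cursor = 0                                   # first free site in the row
    for name, desired_x, width in sorted(cells, key=lambda c: c[1]):
        x = max(cursor, round(desired_x))        # snap to a site, avoid overlap
        if x + width > row_width:
            raise ValueError(f"row overflow while placing {name}")
        placed[name] = x
        cursor = x + width
    return placed


# Hypothetical global placement result with overlapping cells u1, u2 and u4.
global_result = [("u1", 0.4, 3), ("u2", 1.8, 2), ("u3", 7.1, 4), ("u4", 2.2, 2)]
legal = legalize_row(global_result, row_width=20)
print(legal)
```

Each cell moves as little as this greedy order allows, which loosely mirrors the goal of legalizing while keeping displacement, and therefore wirelength change, small.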

23 Placement Optimizations
Placement optimizations are crucial for achieving optimal timing closure and addressing high fanout
nets. [10] Various strategies include:

• Optimizing Timing: Reducing wirelength, dividing long nets using buffers, layer promotion,
resizing gates to meet timing.

• High Fanout Nets Fixing: Addressing nets with high fanout through tie-cell insertion and scan-
chain reordering.

• Logical Restructuring: Gate composition or decomposition to enhance worst negative slack
(WNS) or total negative slack (TNS).

24 Tie-Cell Insertion
Tie-cells are inserted to connect the input of logic
gates that need to be linked to VDD or VSS. This
prevents damage to the gate oxide under the poly
gate due to power noise. [21]
Figure 51: Tie-Cell Insertion
25 Scan-Chain Reordering
Scan-chain reordering optimizes the arrangement of scan chains to improve routing. This process
ensures easier routing of scan chains and reduces congestion. [21]

26 Logical Restructuring
Logical restructuring involves gate composition or decomposition to enhance worst slack or total
negative slack, improving overall placement quality. [21]

27 Congestion in Placement
Congestion occurs when the number of required routing tracks exceeds
the available tracks. It can lead to routing difficulties and must be
minimized or eliminated. Congestion maps are useful for evaluating
total congestion and identifying hotspots. [21]
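
The definition above (required tracks exceeding available tracks) can be turned into a simple overflow map. This is a minimal sketch with made-up demand and capacity values per routing region; the function name and the 3x3 grid are assumptions for illustration.

```python
def congestion_map(demand, capacity):
    """Per-region overflow: tracks required minus tracks available (0 if it fits)."""
    return [[max(0, d - c) for d, c in zip(d_row, c_row)]
            for d_row, c_row in zip(demand, capacity)]


# Hypothetical 3x3 grid of routing regions; each entry is a number of tracks.
demand = [[4, 9, 5],
          [6, 12, 7],
          [3, 8, 4]]
capacity = [[8, 8, 8],
            [8, 8, 8],
            [8, 8, 8]]

overflow = congestion_map(demand, capacity)
hotspots = [(r, c) for r, row in enumerate(overflow)
            for c, v in enumerate(row) if v > 0]
total_overflow = sum(map(sum, overflow))
print("overflow map:", overflow)
print("hotspots:", hotspots, "total overflow:", total_overflow)
```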

27.1 Reasons for Congestion


• High standard cell density in a small area.

• Placement of standard cells near macros.

• High pin density at the edge of the macro.

• Macro pins near the core area boundary.

• Blind double spacing/width for clock tree synthesis (CTS) in lower metal layers near pins.

Figure 52: Congestion Map

28 Placement Constraints
Placement constraints, including blockages, bounds, and keep-out margins, are used to reduce or avoid
congestion and enhance timing. [21]
29 Strategies to Fix Congestion
• Modify the floorplan to mark areas for low utilization.

• Align bus signal pins.

• Increase spacing between macros.

• Add blockages and halos.

• Adjust core aspect ratio and size.

• Optimize power grid.

30 Clock Tree Synthesis (CTS) in Physical Design


30.1 Introduction
Clock Tree Synthesis (CTS) is a critical phase in the physical design flow of integrated circuits, focused on
building a balanced buffer/inverter network to distribute the clock signal efficiently. This part explores
various aspects of CTS, including parameters, goals, skew, jitter, and the synthesis process. [22], [23].

30.2 Clock Parameters


30.2.1 Skew
Definition: Difference in clock arrival time at two spatially distinct points. [22], [23].

30.2.2 Jitter
Definition: Difference in clock period between different cycles. [22], [23].

30.2.3 Slew
Definition: Transition (rise/fall) time of the clock signal. [22], [23].

30.3 Insertion Delay


Definition: Delay from the clock source to the clock pins of the
registers. [22], [23].

31 Skew and Jitter


• Capture Clock Edge: The edge of the clock at which data is captured. [22], [23].

• Launch Clock Edge: The edge of the clock at which data is launched. [22], [23].

Figure 53: Clock Parameters

31.1 Local Skew and Global Skew


• Local Skew: Difference in the arrival of the clock signal at the clock pin of related flops.

• Global Skew: Difference in the arrival of the clock signal at the clock pin of non-related flops.

31.2 Positive Skew and Negative Skew


• Positive Skew: Capture clock comes late compared to launch clock, leading to potential hold
violations.

• Negative Skew: Capture clock comes early compared to launch clock, potentially causing setup
violations.
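
The two statements above follow from the usual single-clock slack equations, sketched here. Skew is taken as capture clock arrival minus launch clock arrival; the function names and all delay values are hypothetical, and uncertainty and derating are ignored.

```python
def setup_slack_with_skew(period, t_clk_q, t_logic, t_setup, skew):
    """Setup slack; skew = capture clock arrival - launch clock arrival."""
    return (period + skew) - (t_clk_q + t_logic + t_setup)


def hold_slack_with_skew(t_clk_q, t_logic, t_hold, skew):
    """Hold slack for the same launch/capture pair."""
    return (t_clk_q + t_logic) - (t_hold + skew)


# Hypothetical numbers in ns: a long path for setup, a short path for hold.
for skew in (+0.3, -0.3):
    su = setup_slack_with_skew(period=2.0, t_clk_q=0.1, t_logic=1.7,
                               t_setup=0.1, skew=skew)
    ho = hold_slack_with_skew(t_clk_q=0.1, t_logic=0.05, t_hold=0.05, skew=skew)
    print(f"skew {skew:+.1f} ns: setup slack {su:+.2f} ns, hold slack {ho:+.2f} ns")
```

With these assumed values, positive skew helps the setup check but breaks the hold check, and negative skew does the opposite.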

32 Source Delay, Network Delay, and Uncertainty


• Source Delay (Source Latency): Delay from the clock origin point to the clock definition point.

• Network Delay (Insertion Delay): Delay from the clock definition point to the clock pin of the
register.

• Uncertainty: Time difference between arrivals of clock signals at registers in one clock domain or
between domains.

33 Clock Tree Synthesis (CTS)


33.1 CTS goals
• CTS aims to build a buffer/inverter network to balance
relative delays of Flip-Flops (FFs) within a clock domain.

• Goals include minimizing global skew, optimizing power and area, and ensuring signal
integrity. [22], [23].

33.2 CTS vs. High Fanout Synthesis (HFS)


• CTS uses symmetric buffers/inverters with equal rise and fall times.

• HFS uses buffers with relaxed rise and fall times, often for static signals with high fan-out. [22],
[23].

Figure 54: Clock Tree Synthesis (CTS)

33.3 CTS Process Overview


• CTS begins with a buffered tree to minimize RC-delay and restore signal integrity.

• A "delay line" is added to meet the minimum insertion delay requirements. [22], [23].

33.4 Clock Tree Constraints


• Clock tree constraints include symmetric buffers/inverters, routing layers, NDR rules, target skew,
max transition, max capacitance, and CTS cell spacing. [22], [23].

33.5 Clock Tree Synthesis Execution


• CTS execution involves commands like clock_opt and set_clock_tree_exceptions to define stop,
float, or exclude pins. [22], [23].
33.6 Non-Default Clock Routing
• Non-default routing rules can be used for clock signals, impacting signal integrity and sensitivity
to crosstalk. [22], [23].

33.7 Recommendations for NDR (Non-Default Routing)


• Recommendations include routing clocks on metal 3 and above, avoiding NDR on clock sinks,
and being cautious with NDR on Metal 1. [22], [23].

33.8 Effects of CTS


• CTS introduces clock buffers, potentially increasing congestion.

• Non-clock cells may be relocated, leading to new timing and max transition/capacitance
violations. [22], [23].

33.9 Clock Tree Optimization


• Clock tree optimization involves buffer and gate sizing, buffer
relocation, and buffer insertion to achieve useful skew. [22], [23].

Figure 55: Buffer Insertion for Clock Tree Optimization

34 Routing
34.1 Importance of Routing as Technology Shrinks[3]
• Device (Gate) delay decreases.
• Interconnect resistance increases.
• Vertical heights of interconnect layers increase, in an attempt to offset increasing
interconnect resistance.
• Area component of interconnect capacitance no longer dominates.
• Lateral (sidewall) and fringing components of capacitance start to dominate the total
capacitance of the interconnect.
• Interconnect capacitance dominates total Gate loading.

Figure 56: Multi-level Interconnection (MLI) Technology layer Stack

34.2 Routing
Making physical connections between signal pins using metal layers is called
routing. Routing is the stage after CTS and optimization in which the exact paths for
the interconnection of standard cells, macros, and I/O pins are determined.
Electrical connections using metals and vias are created in the layout, as defined by
the logical connections present in the netlist (i.e. logical connectivity is converted
into physical connectivity). [26]

Figure 57: Routing

As ASIC designs are getting more complex and larger (e.g. sea of cells), routing is becoming more difficult
and challenging. It is possible for routing to fail to complete, or to take an unacceptable amount of
run time. Besides the routing algorithms, the factors which influence the routability of a given
ASIC are the standard cell layout style, a well-prepared floorplan, and the quality of standard cell
placement. [26]

After CTS, we have information of all the placed cells, blockages, clock tree buffers/inverters and I/O
pins. The tool relies on this information to electrically complete all connections defined in the netlist
such that:[26]

o There are minimal DRC violations while routing.
o The design is 100% routed with minimal LVS violations.
o There are minimal SI related violations.
o There are no or minimal congestion hot spots.
o The timing DRCs are met and the QoR is good.

34.3 Inputs of Routing


• Netlist
• All cells & ports should be legally placed with clock tree structure & CTS DEF file
• NDRs
• Routing blockages
• Technology data (metal layers (lef, tech file etc.), DRC rules, via creation rules, grid rules (metal
pitch) etc...)

34.4 Goals of Routing

Routing is the process of converting every logical connection in the netlist to a physical connection with
the help of metal layers and vias, taking the DRC rules into consideration while creating the actual routes. [26]

1) Establish a physical connection with min routing resources.


2) Meet skew requirements.

3) Meet timing requirements & logical DRCs.

4) No physical DRCs (min space , min width , …).

5) Shorts / opens clean.

6) No signal EM.

7) No crosstalk.

8) No congestion.

34.5 Routing Constraints


• Constraining the number of layers to be used during routing.
• Setting limits on routing to specific regions.
• Setting the maximum length for the routing wires.
• Blocking routing in specific regions.
• Setting stringent guidelines for minimum width and minimum spacing.
• Setting preferred routing directions for specific metal layers during routing.
• Constraining the routing density.
• Constraining the pin connections.

34.6 Do we have any routes created before this step?


Yes, some routes were already created before this step.

1) Power routing → in power planning stage.

2) Clock routing → in CTS stage.

3) Signal routing → in Route stage (this stage).

We have routing tracks in each layer. Each stage takes some of the routing resources.

34.7 Grid-Based Routing System


Most available routers are grid-based routers. Routing grids are defined for the entire layout, which
can be viewed as a graph, as shown in the figure below. For grid-based routers, there is also a preferred
routing direction defined for each metal layer, e.g. Metal1 has a preferred direction of “horizontal”,
Metal2 has a preferred direction of “vertical”, and so on. So, across the whole layout, metal1 routing
grids are drawn (superimposed) horizontally with the metal1 wire pitch, and metal2 grids are drawn
vertically with the metal2 wire pitch between them. [26]

Figure 58: Grid based routing with two metals

Figure 59: Routing Grids

The first figure shows how routing grids are drawn. Only two metals are considered here,
but in a process with more metals, similar grids are superimposed on the layout for all available
metals. Pitch is calculated by determining the minimum spacing required between grid lines of the same
metal. This can be the minimum spacing of the metal itself but is usually a greater value, because the
via dimension is considered as well, so that no two adjacent wires on the grid
create a DRC violation even when vias are present.

In a grid-based routing algorithm, the router switches metal layers according to the preferred
direction to interconnect the nodes. As shown in the second figure, metal1 & metal2 wires are drawn
along the metal1 & metal2 grids respectively. They are interconnected by via1 to complete the
routing path.

Metal traces (routes) are built along, and centered upon, routing tracks based on a grid.
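
The idea of preferred directions and layer switching through vias can be sketched with a breadth-first maze search on a two-layer grid. This is an illustrative toy and not the algorithm used by any commercial router: every wire step and every via is assumed to cost the same, and the grid size, pin locations, and function name are hypothetical.

```python
from collections import deque


def route(rows, cols, src, dst, blocked=frozenset()):
    """Breadth-first maze route on a two-layer grid.

    Nodes are (layer, row, col). Layer 0 (metal1) only moves horizontally,
    layer 1 (metal2) only moves vertically, and a via switches layer in place.
    Every move costs 1, so BFS returns a path with the fewest moves.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        layer, r, c = node
        if layer == 0:
            steps = [(0, r, c - 1), (0, r, c + 1)]   # metal1: horizontal
        else:
            steps = [(1, r - 1, c), (1, r + 1, c)]   # metal2: vertical
        steps.append((1 - layer, r, c))              # via1 between the two metals
        for nxt in steps:
            _, nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = node
                queue.append(nxt)
    return None                                      # no route found


path = route(4, 5, src=(0, 0, 0), dst=(0, 2, 3))
vias = sum(1 for a, b in zip(path, path[1:]) if a[0] != b[0])
print("path:", path)
print("vias used:", vias)
```

Both pins are assumed to sit on metal1, so any route between different rows has to go up to metal2 and come back down, using two vias.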

Figure 60: Grid Based Routing

34.8 Routing Flow


Due to the inherent complexity of ASIC designs and the
very large numbers of interconnections associated with
them, the overall routing is performed in three stages:
global routing, Track Assignment, and detail routing.[26]

Figure 61: Routing Flow

34.8.1 Global Routing
Global routing is a coarse-grain assignment of routes, which first
partitions the routing region into tiles/rectangles called global
routing cells (gcells) and decides tile-to-tile paths for all nets while
attempting to optimize some given objective function (e.g., total
wire length and circuit timing), but doesn’t make actual routes or
assign nets to specific paths within the routing regions. By default,
the width of a gcell is the same as the height of a standard cell and is
aligned with the standard cell rows. [26]

A quick global routing was already performed before this stage: in the placement stage, it was run
to generate a congestion map, ensuring that the placement of standard cells is optimal and free
from congested areas. [26]

Figure 62: Global Routing

Global routing takes the chip floorplan and placement information and generates instructions for
the detailed router on how to route every net. It also estimates the delay of each net from the
fan-out of its wire. [25]

Optimization:[4]
• Minimize routing length, especially for the critical path.
• Solve any possible congestion (cells/rows can be moved to make room for interconnections).
• Maximize the probability that the detailed router succeeds.

Figure 63: Global Routing

34.8.2 Track Assignment


In this stage, the routes planned by the global routing stage are assigned to routing tracks on
the metal layers. Tracks are assigned in the horizontal and vertical directions. If overlaps
occur, rerouting is done. [26]

It also attempts to:

• Make long, straight traces.
• Reduce the number of vias.

Note that track assignment (TA) is not DRC aware.

Figure 64: Track assignment

34.8.3 Detailed Routing
The detailed router uses the routing plan laid out during global routing and track assignment and lays
down the actual metal to connect pins with nets and other pins in the design. [26]

The violations that were created during the track assignment stage are fixed through multiple
iterations in this stage. [26]

Detailed routing starts with the router dividing the block into specific areas called switch boxes
(SBoxes), which are generally expressed in terms of gcells. For example, a 3x3 SBox
is a box which encompasses 9 gcells. [25]

Figure 65: SBoxes

In detailed routing, the actual wire delays are calculated and used by optimization methods such as
timing optimization and clock tree synthesis. [24]

34.8.4 Search&Repair
Search & Repair fixes remaining DRC violations through multiple
loops using progressively larger SBox sizes. [26]

Figure 66: Design Rule Constraints

35 Filler Cell Insertion

Filler cells are used to fill the gaps between the cells after placement. To make sure that each
cell gets a power and ground connection, the cells are abutted so that their VDD and VSS rails are
shorted together into continuous rails. This makes it possible to tap power at only one point
anywhere in the row. However, it is virtually impossible to fill 100% of the die area with regular
cells, so filler cells are used to fill the spaces between regular library cells and keep the power
rails continuous. [26]

Filler cells contain only power rails and the N-well, PPlus, and NPlus layers. When an Engineering
Change Order (ECO) is needed, filler cells can be deleted and the freed space can be utilized; in
this way they can also help in fixing setup or hold violations. They also help to
prevent density-related DRC errors for base layers (FEOL).

Figure 67: Filler Cell Insertion

36 Physical Only Cells


Decap Cells: Decoupling capacitors are another type of physical-only cell used in the PD flow. They do
not have any logical functionality. These cells essentially act as a capacitance between the power and ground
rails, and hence as a charge reservoir that can be relied upon when there is a high demand for current
from the power lines. They can be thought of as localized power supplies within the chip.
EndCap cells: Placed at both ends of each row to protect the gate of a standard cell
placed near the boundary from damage during manufacturing. They are also placed at the bottom and
top rows to ensure proper alignment with the other block.

Well Tap cells: Used to prevent latch-up in the CMOS design. Well tap cells
connect the n-well to VDD and the p-substrate to VSS. [26]

Figure 68 :Well Tap Cell

37 Parasitics Extraction (PEX):


The major purpose of parasitic extraction is to create an accurate analog model of the circuit, so that
detailed simulations can emulate actual digital and analog circuit responses. Digital circuit responses are
often used to populate databases for signal delay and loading calculation such as:

• Timing analysis
• Power analysis
• Circuit simulation
• Signal integrity analysis

Figure 69: Interconnect parasitic capacitance modeling

Figure 70: Star-RCXT Flow

Star-RCXT (StarRC) is an electronic design automation (EDA) tool developed by Synopsys for the
analysis and optimization of integrated circuits (ICs). It is widely used in the semiconductor
industry to address the growing complexity of modern IC designs.

The primary function of Star RCXT Flow is to perform accurate and efficient extraction of parasitic
elements from the layout of an IC. Parasitic elements, such as resistors, capacitors, and inductors, can
significantly impact the performance of the circuit. Star RCXT Flow employs advanced algorithms and
methodologies to model and extract these parasitics, providing designers with a comprehensive
understanding of the circuit behavior.
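To show why extracted resistance and capacitance matter for timing, the sketch below computes the first-order (Elmore) delay of a wire modeled as an RC ladder. It is only an illustration: the segment values are hypothetical, the function name is not part of any Synopsys tool, and the parasitic network written by the extractor is richer than this lumped model.

```python
def elmore_delay(segments, load_cap=0.0):
    """Elmore delay (seconds) of an RC ladder, from driver to far end.

    segments -- list of (resistance_ohm, capacitance_farad), driver first
    load_cap -- extra capacitance at the far end (receiver pin), in farads
    """
    if not segments:
        return 0.0
    caps = [c for _, c in segments]
    caps[-1] += load_cap
    delay = 0.0
    upstream_r = 0.0
    for (r, _), c in zip(segments, caps):
        # Each node capacitance is charged through all resistance upstream of it.
        upstream_r += r
        delay += upstream_r * c
    return delay


# Hypothetical wire: three segments of 100 ohm and 10 fF, with a 5 fF receiver pin.
wire = [(100.0, 10e-15)] * 3
print(elmore_delay(wire, load_cap=5e-15))  # about 7.5e-12 s (7.5 ps)
```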

The tool accepts its inputs at two levels:

• Cell level: the input is a Milkyway database or LEF and DEF files.

Milkyway or LEF/DEF plays the same role as the New Data Model (NDM), the database used by IC
Compiler II (ICC2): it holds the timing and physical information of the cells used in the design.

• Transistor level: the inputs are a GDSII file and a command file.

Graphic Data Stream (GDSII) is the final output file of the back-end design and contains the final
layout. The command (CMD) file defines the database, the technology information, the mapping file,
and the netlist format.

• We can also use an Interconnect Technology Format (ITF) file, from which the grdgenxo command
generates the nxtgrd database, together with a layer mapping file that contains all the
information about the layers, e.g. conducting_layer, via_layer, remove_layer, marker_layer.

The ITF file defines a cross-section profile of the process. It is an ordered list of conductor and
dielectric layer definition statements.

Process integration defines the chip's physical layer composition. To begin process
characterization, we specify the content of each layer in an ITF file and then run the grdgenxo
command to generate the nxtgrd database.

Figure 71: Extract nxtgrd database

The nxtgrd (New Xtraction Generic Regression Database) output file is a database containing
capacitance, resistance, and layer information. StarRC uses the nxtgrd file to calculate the
parasitics of the actual layout by pattern matching.

StarRC integrates into many design flows through standard design data formats like Milkyway, Library
Exchange Format/Design Exchange Format (LEF/DEF), Standard Parasitic Exchange Format (SPEF), and
Calibre Connectivity Interface (CCI).

In fact, widespread use of StarRC in third-party design flows as well as Synopsys design flows is occurring
today. This includes integration with static timing analysis tools (STA) and third-party place-and-route
tools (PNR) directly through the use of LEF/DEF and the Calibre Connectivity Interface. You can also use
GDSII by using Hercules (Milkyway XTR view) files.

• Input formats:

Milkyway, LEF/DEF, GDSII, CCI, and GRD models with a mapping file

• Output netlist formats:

SPICE netlist, Standard Parasitic Exchange Format (SPEF), and reports

What follows the Star-RCXT flow? In other words, which tools take the SPEF as input?

Simulation, STA, IR drop analysis, crosstalk (Xtalk) analysis, and other tools.

In conclusion, Star RCXT Flow is a powerful and versatile tool in the realm of EDA, contributing
significantly to the success of modern IC design projects. Its advanced capabilities in parasitic extraction,
compatibility with cutting-edge process technologies, and seamless integration within the design flow
make it a preferred choice for semiconductor professionals.

38 Physical Verification (PVR):

Figure 72: PVR Flow

Physical verification is the process of ensuring that a design's layout works as intended.

Steps include design rule checking (DRC), electrical rule checking (ERC), layout-versus-schematic
(LVS) checks, and antenna rule checks.

• Design rule checking (DRC)

DRC determines whether a chip layout satisfies a set of rules defined by the semiconductor
manufacturer. Each semiconductor process has its own set of rules, which ensure sufficient margins
so that normal variability in the manufacturing process will not result in chip failure.

Figure 73: DRC Checks
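Two of the most common design rules, minimum width and minimum spacing, can be sketched in a few lines. This is a simplified illustration rather than a foundry rule deck: it assumes axis-aligned rectangles on a single layer given as (x1, y1, x2, y2), and the rule values are hypothetical.

```python
from itertools import combinations
from math import hypot


def check_drc(rects, min_width, min_spacing):
    """Return width and spacing violations for rectangles on one layer."""
    violations = []
    for i, (x1, y1, x2, y2) in enumerate(rects):
        # Width rule: the narrow dimension of each shape must reach min_width.
        if min(x2 - x1, y2 - y1) < min_width:
            violations.append(("width", i))
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        # Horizontal and vertical separation; zero when the projections overlap.
        dx = max(b[0] - a[2], a[0] - b[2], 0)
        dy = max(b[1] - a[3], a[1] - b[3], 0)
        gap = hypot(dx, dy)
        # Touching or overlapping shapes (gap == 0) are treated as one shape.
        if 0 < gap < min_spacing:
            violations.append(("spacing", i, j))
    return violations


shapes = [(0, 0, 10, 2), (0, 3, 10, 5), (0, 9, 10, 9.5)]
print(check_drc(shapes, min_width=1.0, min_spacing=2.0))
# [('width', 2), ('spacing', 0, 1)]
```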

• Electrical rule checking (ERC)
ERC checks the design for electrical connections that are considered dangerous.

Floating gate error: If any gate is left unconnected, this could lead to leakage issues.

VDD/VSS errors: The well geometries need to be connected to power/ground. If the PG connection is
incomplete or the pins are not defined, the layout can report errors such as "NWELL not connected
to VDD".

• Layout vs. schematic (LVS)

LVS provides device and connectivity comparisons between the IC layout and the schematic. An LVS
tool enables accurate circuit verification because it is able to measure actual device geometries
across a full chip for a complete accounting of physical parameters. The measured device
parameters supply the information for back-annotation to the source schematic and comprehensive
data for running simulations.

LVS steps:

1. Extraction: The tool takes the GDSII file containing all the layers and uses a polygon-based
approach to determine the components (transistors, diodes, capacitors, and resistors) and the
connectivity between devices in the layout from their layers of construction. All device layers,
device terminals, device sizes, nets, vias, and pin locations are defined and given a unique
identification.

2. Reduction: All the defined information is extracted in the form of a netlist.

3. Comparison: The extracted layout netlist is then compared to the schematic netlist of the same
stage using the LVS rule deck. In this stage the numbers of instances, nets, and ports are
compared. All mismatches, such as shorts, opens, and pin mismatches, are reported. The tool also
checks for topology and size mismatches.

Issues that can be detected by LVS:

1. Shorts : Shorts are formed, if two or more wires which should not be connected together are
connected.

2. Opens : Opens are formed, if the wires or components which should be connected together are
left floating or partially connected.

3. Component mismatch : Component mismatch can happen if components of different types are used
(e.g., LVT cells instead of HVT cells).

4. Missing components : A missing component occurs if an expected component is left out of the
layout.

5. Parameter mismatch : Every component has its own properties, and the LVS tool is configured to
compare these properties within some tolerance. If the tolerance is not met, the tool reports a
parameter mismatch.

Figure 74: schematic netlist vs extracted layout netlist
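The comparison step and the issue types listed above can be sketched as follows. This is a deliberately simplified example: real LVS tools match devices by topology, whereas this sketch assumes both netlists use the same instance names. The netlist structure, the device names, and the 5% tolerance are all hypothetical.

```python
def compare_netlists(schematic, layout, tolerance=0.05):
    """Return mismatch messages between two netlists keyed by instance name."""
    issues = []
    for name, ref in schematic.items():
        dev = layout.get(name)
        if dev is None:
            issues.append(f"missing component: {name}")
            continue
        if dev["type"] != ref["type"]:
            issues.append(
                f"component mismatch: {name} is {dev['type']}, expected {ref['type']}"
            )
        if dev["pins"] != ref["pins"]:
            # A differing pin-to-net map covers both shorts and opens here.
            issues.append(f"connectivity mismatch: {name}")
        if abs(dev["width"] - ref["width"]) > tolerance * ref["width"]:
            issues.append(f"parameter mismatch: {name} width")
    for name in sorted(layout.keys() - schematic.keys()):
        issues.append(f"extra component: {name}")
    return issues


schematic = {
    "M1": {"type": "nmos_hvt", "pins": {"g": "A", "d": "Y", "s": "VSS"}, "width": 1.0},
    "M2": {"type": "pmos_hvt", "pins": {"g": "A", "d": "Y", "s": "VDD"}, "width": 2.0},
    "M3": {"type": "nmos_hvt", "pins": {"g": "Y", "d": "Z", "s": "VSS"}, "width": 1.0},
}
layout = {
    "M1": {"type": "nmos_lvt", "pins": {"g": "A", "d": "Y", "s": "VSS"}, "width": 1.0},
    "M2": {"type": "pmos_hvt", "pins": {"g": "A", "d": "Y", "s": "VDD"}, "width": 2.3},
}
for issue in compare_netlists(schematic, layout):
    print(issue)
```

With these inputs the sketch reports a component mismatch on M1, a parameter mismatch on M2, and a missing M3.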

Antenna rule checking

The process antenna effect, or "plasma-induced gate oxide damage", is a manufacturing effect: a
type of failure that can occur only at the manufacturing stage. The gate is damaged when charge
accumulates on the metal during processing and discharges to a gate through the gate oxide.

As the total area (length) of wire connected to a gate increases during processing, the voltage
stressing the gate oxide increases.

Antenna rules define the acceptable total area of wires.

Antenna ratio: area of metal connected to the gate / combined area of the gate.

(Antenna area) / (Gate area) < (Max antenna ratio)

Figure 75: different metal on gate representation
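The antenna rule above reduces to a ratio check per net. The sketch below applies it to a small table of nets; the net names, the areas, and the maximum ratio of 400 are hypothetical values chosen only for illustration.

```python
def antenna_violations(nets, max_ratio):
    """Return {net: ratio} for nets that fail (antenna area / gate area) < max_ratio.

    nets -- {net_name: (metal_area, gate_area)}, both in the same area unit
    """
    violations = {}
    for name, (metal_area, gate_area) in nets.items():
        ratio = metal_area / gate_area
        if not ratio < max_ratio:
            violations[name] = ratio
    return violations


nets = {"n1": (120.0, 0.5), "n2": (900.0, 1.5)}
print(antenna_violations(nets, max_ratio=400.0))
# {'n2': 600.0}
```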
39 On-Chip Variation (OCV):
OCV analysis is a method of modeling the random variations introduced by the fabrication process,
which make it challenging to achieve consistent and reliable IC performance.

These variations must be modeled because they can make a path delay faster or slower and so lead
to chip failure. To model them, extra pessimism is added to the timing analysis.

Figure 76: represents the difference between PVT & OCV

OCV can affect wire delays and cell delays differently in different portions of the chip. As a
result, the cells of the entire chip can no longer be modeled using the fast or slow process corner
alone: some cells will run faster and others slower than expected, depending on local changes in
process conditions and on design-dependent effects.

Failure to account for these issues can lead to setup and hold violations in designs that are, nominally at
least, error-free. As a result, the design’s timing needs to be analyzed in a way that takes into account the
potential for timing to change within a given process or temperature corner.

• Derates:
A timing derate is a percentage factor by which the delay values of cells and nets are multiplied,
adding pessimism to account for these variations.

Figure 77: the difference in derates across enhancing technology

OCV is generally a pessimistic way of modeling process variations. We assume that the delay of
every cell can vary by a derate of, say, X%, so the variation is modeled as a range from (-X%) to
(+X%).

The delays of all cells are then derated in the direction that makes the timing check pessimistic.
As long as the actual variation stays within the bracket of (-X%) to (+X%), the design is safe even
in the worst-case scenario.

For setup check, to increase pessimism: (Tcomb(max), Tskew(min))

• Data is multiplied by setup_data_late derate.
• Launch clock is multiplied by setup_clock_late derate.
• Capture clock is multiplied by setup_clock_early derate.

Figure 78: Setup analysis under OCV

For hold check, to increase pessimism: (Tcomb(min), Tskew(max))

• Data is multiplied by hold_data_early derate.
• Launch clock is multiplied by hold_clock_early derate.
• Capture clock is multiplied by hold_clock_late derate.

Figure 79: Hold analysis under OCV
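The effect of the setup and hold derates on slack can be worked through numerically. The sketch below is an illustration only: it assumes a single symmetric derate of X% (late = 1 + X, early = 1 - X) applied to clock and data alike, and all delay values (in ns) are hypothetical.

```python
def setup_slack(period, launch_clk, data_max, capture_clk, t_setup, derate):
    """Setup slack with late launch clock and data, early capture clock."""
    late, early = 1 + derate, 1 - derate
    arrival = launch_clk * late + data_max * late
    required = period + capture_clk * early - t_setup
    return required - arrival


def hold_slack(launch_clk, data_min, capture_clk, t_hold, derate):
    """Hold slack with early launch clock and data, late capture clock."""
    late, early = 1 + derate, 1 - derate
    arrival = launch_clk * early + data_min * early
    required = capture_clk * late + t_hold
    return arrival - required


# Hypothetical path: 2 ns clock, 0.5 ns clock latency on both launch and capture.
for x in (0.0, 0.10):
    s = setup_slack(2.0, 0.5, 1.2, 0.5, 0.1, derate=x)
    h = hold_slack(0.5, 0.2, 0.5, 0.05, derate=x)
    print(f"derate {x:.0%}: setup slack {s:.2f} ns, hold slack {h:.2f} ns")
```

With a 10% derate the setup slack of this example drops from 0.70 ns to 0.48 ns and the hold slack from 0.15 ns to 0.03 ns, which is the added pessimism described above.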

• Clock Reconvergence Pessimism Removal (CRPR):

CRPR removes the extra pessimism that arises because the same cell cannot have two different delays
under the same conditions at the same point in time.

The common clock path, present in both the launch and capture clock paths, is derated twice with
two different values, so this extra pessimism is removed.

Figure 80: Common path that should be applied by CRPR (GreyOne)
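The pessimism credited back by CRPR is the common clock path delay multiplied by the difference between the late and early derates. The sketch below shows this for a setup check; the split of the clock network into a common segment and two private branches, the derates of 1.1 and 0.9, and the delay values (in ns) are hypothetical.

```python
def setup_slack_with_crpr(period, common, launch_branch, capture_branch,
                          data_max, t_setup, late=1.1, early=0.9):
    """Return (raw_slack, slack_after_crpr) for a setup check under OCV."""
    arrival = (common + launch_branch + data_max) * late
    required = period + (common + capture_branch) * early - t_setup
    raw = required - arrival
    # The common segment was counted once as late and once as early;
    # CRPR gives that difference back.
    credit = common * (late - early)
    return raw, raw + credit


raw, corrected = setup_slack_with_crpr(
    period=2.0, common=0.4, launch_branch=0.1, capture_branch=0.1,
    data_max=1.2, t_setup=0.1,
)
print(f"raw slack {raw:.2f} ns, after CRPR {corrected:.2f} ns")
# raw slack 0.48 ns, after CRPR 0.56 ns
```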

• The drawback of OCV is that it applies fixed derates to all cells, which makes it very
pessimistic, and the pessimism grows with the number of stages. In reality, the cells in a path are
not all late or all early: some are slower and others faster, and these mixed effects partially
cancel over the path.

Figure 81: Different path Delay in OCV Fixed Derate

As the path depth grows, OCV becomes more pessimistic.

As technology shrinks, a need arises for a more efficient methodology for variation-aware timing
analysis.

• Advanced OCV (AOCV):

AOCV models the random and systematic variations across an IC that affect timing by using variable
derating factors. It considers the cell type, the location, and the logic depth of each path being
analyzed.

• Cell type: Variation is calculated for each individual cell type, since an AND gate and an OR
gate do not exhibit the same variation.
• Distance: As the distance increases, the systematic variation increases, so a higher derate
value is used to reflect the increased uncertainty in timing analysis.
• Path depth: Within a given distance, as the path depth increases, random variations tend to
cancel each other, so the AOCV derate decreases.

Figure 82: Sample AOCV table for setup analysis

• If the derate type is early, it is considered in hold analysis.

• If the derate type is late, it is considered in setup analysis.
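An AOCV table is indexed by path depth and distance. The sketch below looks up a late derate from such a table; the breakpoints and derate values are hypothetical, chosen only to follow the trends described above (derate falls with depth and rises with distance), and the lookup picks the nearest lower breakpoint rather than interpolating as a timing tool might.

```python
from bisect import bisect_right

DEPTHS = [1, 2, 4, 8, 16]        # logic depth breakpoints
DISTANCES = [0, 500, 1000]       # distance breakpoints in um (hypothetical)
LATE_DERATE = [
    [1.12, 1.10, 1.08, 1.06, 1.05],   # distance >= 0
    [1.14, 1.12, 1.10, 1.08, 1.07],   # distance >= 500
    [1.16, 1.14, 1.12, 1.10, 1.09],   # distance >= 1000
]


def aocv_late_derate(depth, distance):
    """Return the late derate at the nearest lower depth and distance breakpoint."""
    col = max(bisect_right(DEPTHS, depth) - 1, 0)
    row = max(bisect_right(DISTANCES, distance) - 1, 0)
    return LATE_DERATE[row][col]


print(aocv_late_derate(depth=1, distance=0))     # 1.12
print(aocv_late_derate(depth=8, distance=0))     # 1.06
print(aocv_late_derate(depth=5, distance=600))   # 1.1
```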

• Parametric On-Chip Variation (POCV):

POCV provides statistical benefits without the overhead of expensive statistical library
characterization. Instead of applying a specific derate factor to a cell, POCV calculates the cell
delay from the delay variation (σ) of the cell. It assumes that the delay of a cell follows a
normal distribution. An example of a normal distribution curve, with the standard deviation of the
data from the mean, is shown in the next figure.

Figure 83: Standard deviation of data

In a normal distribution, 68% of the data falls within ±1σ of the mean, 95% within ±2σ, and 99.7%
within ±3σ.
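The statistical benefit can be illustrated with a small calculation. The sketch below assumes that the cell delays in a path are independent and normally distributed, so the path sigma is the root-sum-square of the cell sigmas; this is an assumption of the example, and the 100 ps mean and 5 ps sigma per cell are hypothetical. It also reproduces the coverage percentages of the normal distribution.

```python
from math import erf, sqrt


def pocv_path_delay(cells, n_sigma=3.0):
    """Late path delay when independent cell sigmas add in root-sum-square."""
    mean = sum(m for m, _ in cells)
    sigma = sqrt(sum(s * s for _, s in cells))
    return mean + n_sigma * sigma


def flat_path_delay(cells, n_sigma=3.0):
    """Late path delay when every cell is pushed to its own n-sigma value."""
    return sum(m + n_sigma * s for m, s in cells)


def fraction_within(k):
    """Fraction of a normal distribution lying within k sigma of the mean."""
    return erf(k / sqrt(2))


path = [(100.0, 5.0)] * 16           # 16 cells: mean 100 ps, sigma 5 ps
print(pocv_path_delay(path))         # 1660.0
print(flat_path_delay(path))         # 1840.0
for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))   # 68.3, 95.4, 99.7
```

In this example the statistical sum adds 60 ps of margin to the 1600 ps mean, whereas pushing every cell to its own 3σ value adds 240 ps.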

40 Design Planning Overview


Design planning is an integral part of the RTL to GDSII design process. During design planning, you
assess the feasibility of different implementation strategies early in the design flow. For large
designs, design planning helps you to "divide and conquer" the implementation process by
partitioning the design into smaller, more manageable pieces for more efficient processing.

Similarly, design planning supports pattern-based power planning, including low-power design
techniques that can be used in multi-voltage designs. Using pattern-based power planning, you can
create different voltage areas within the design and define a power plan strategy for each voltage
area. The tool creates the power and ground mesh based on your specification, and it connects the
macros and standard cells to the mesh. You can quickly iterate on different power plan strategies
to investigate different implementations and select the optimal power plan. The IC Compiler II
tool supports complete hierarchical design planning for both channeled and abutted layout styles.

40.1 Hierarchical Design Planning Flow


The hierarchical design planning flow provides an efficient approach for managing large designs. By
dividing the design into multiple blocks, different design teams can work on different blocks in
parallel, from RTL through physical implementation. Working with smaller blocks and using multiply
instantiated blocks can reduce overall runtime. Consider using a hierarchical methodology in the
following scenarios:

• The design is large, complex, and requires excessive computing resources to process the design in
a flat form.

• You anticipate problems that might delay the delivery of some blocks and might cause the schedule
to slip. A robust hierarchical methodology accommodates late design changes to individual blocks
while maintaining minimal disruption to the design schedule.

• The design contains hard intellectual property (IP) macros such as RAMs, or the design was
previously implemented and can be converted and reused.

After the initial design netlist is generated in Design Compiler topographical mode, you can use the
hierarchical methodology for design planning in the IC Compiler II tool. Design planning is performed
during the first stage of the hierarchical flow to partition the design into blocks, generate hierarchical
physical design constraints, and allocate top-level timing budgets to lower-level physical blocks[28].
Figure 84: The flow to implement a hierarchical design plan

40.2 Topographical technology


Topographical technology makes it possible to accurately predict post-layout timing, area, and
power during RTL synthesis without the need for timing approximations based on wire load models. It
uses Synopsys' placement and optimization technologies to drive accurate timing prediction within
synthesis, ensuring better correlation with the final physical design.

In ultra-deep-submicron designs, interconnect parasitics have a major effect on path delays;
accurate estimates of resistance and capacitance are necessary to calculate path delays. In
topographical mode, Design Compiler leverages the Synopsys physical implementation solution to
derive the "virtual layout" of the design so that the tool can accurately predict and use real net
capacitances instead of statistical net approximations based on wire load models. If wire load
models are present, they are ignored.

Figure 85: Topographical technology in RTL synthesis

In addition, the tool updates capacitances as synthesis progresses. That is, it considers the variation of
net capacitances in the design by adjusting placement-derived net delays based on an updated virtual
layout at multiple points during synthesis. This approach eliminates the need for over constraining the
design or using optimistic wire load models in synthesis. The accurate prediction of net capacitances
drives Design Compiler to generate a netlist that is optimized for all design goals including area, timing,
test, and power. It also results in a better starting point for physical implementation.

Figure 86: Inputs and outputs of DC in topographical mode

40.2.1 Using Floorplan physical constraints


Design Compiler in topographical mode supports high-level physical constraints such as die area, core
area and shape, port locations, cell locations and orientations, keepout margins, placement blockages,
preroutes, bounds, vias, tracks, voltage areas, and wiring keepouts. Using floorplan physical constraints
in topographical mode improves timing correlation with post-place-and-route tools, such as IC Compiler,
by considering floorplanning information during optimization.

The main reason to use floorplan constraints in topographical mode is to accurately represent the
placement area and improve timing correlation with the post-place-and-route design. You can provide
high-level physical constraints that determine core area and shape, port location, macro location and
orientation, voltage areas, placement blockages, and placement bounds. These physical constraints
can be derived from IC Compiler floorplan data, extracted from an existing Design Exchange Format
(DEF) file, or created manually.

41 Future Work

Figure 87: Gantt Chart

42 Conclusion
In conclusion, the ASIC implementation of the RISC-V microcontroller project encompasses the
hardening of RISC-V cores from the PULP project, with a focus on achieving a low-power PULPino SoC
to be used in a wireless charging system, specifically at the receiver end.

The ASIC implementation focuses on hardening the Ibex core to achieve the lowest power consumption,
particularly targeting IoT applications. This involves considerations for a relaxed frequency,
implementation of power domains, and integration of power switches to efficiently manage and reduce
power consumption. The core is to be integrated in the PULPino SoC afterwards. These steps are crucial
in ensuring the SoC meets the power requirements for battery-operated devices.

In essence, the ASIC design flow serves as the backbone for achieving these objectives, with each step
playing a crucial role in shaping the final implementation. Through synthesis, floor planning, power
planning, placement, CTS, routing, and chip finishing, the project aims to deliver a PULP-based SoC that
excels in performance, power efficiency, and area optimization, thereby catering to diverse application
domains and market demands.

43 References:
[1] PULP Implementation (pulp-platform.org)

[2] PULPino datasheet

[3] PULPissimo datasheet

[4] A. B. Kahng, J. Lienig, I. L. Markov, and J. Hu, "VLSI Physical Design: From Graph Partitioning to Timing
Closure," Cambridge University Press, 2011.

[5] P. P. Chu, "FPGA Prototyping by Verilog Examples," [Online]. Available: [Link]

[6] M. Singh, "California State University Northridge Implementation of Complete ASIC Design Flow, RTL
to GDS-II," M.S. thesis, California State University, Northridge, Dec. 2011. [Online]. Available:
http://hdl.handle.net/10211.3/141729

[7] "Logic Synthesis," VLSI Backend Adventure, [Online]. Available: https://www.vlsi-backend-adventure.com/logic_synthesis.html

[8] "Formal Verification Basics," [Online]. Available: http://tech.tdzire.com/formal-verification-basics/

[9] "Verilog HDL: A Guide to Digital Design and Synthesis," Second Edition, IEEE 1364-2001 Verilog HDL
standard, Prentice Hall PTR, 2003, ISBN: 0-13-044911-3

[10] S. Churiwala and S. Garg, "Principles of VLSI RTL Design," Springer, 2011, Chapter 6, ISBN 978-1-4419-9295-6

[11] N. H. E. Weste and D. Harris, "CMOS VLSI Design," Third Edition, Pearson, Addison Wesley, 2004,
Chapter 9, ISBN: 978-0321149015

[12] "VLSI Design Cycle," GeeksforGeeks, [Online]. Available: https://www.geeksforgeeks.org/vlsi-design-cycle/.

[13] C. Nadella, "VLSI Physical Design Back-End Process," [Online]. Available:
https://www.slideshare.net/NADELLACHENCHU/vlsi-physical-design-back-end-process.

[14] "Understanding the Importance of Prerequisites in the VLSI Physical Design Stage," Design & Reuse,
[Online]. Available: https://www.design-reuse.com/articles/54634/understanding-the-importance-of-prerequisites-in-the-vlsi-physical-design-stage.html

[15] "Floorplan Guidelines for Sub-Micron Technology Node for Networking Chips," Design & Reuse,
[Online]. Available: https://www.design-reuse.com/articles/53962/floorplan-guidelines-for-sub-micron-technology-node-for-networking-chips.html

[16] C. Peng and C. Lin, "Low Loss I-O Pad with ESD Protection for K–Ka Bands," [Online]. Available:
https://www.semanticscholar.org/paper/Low-Loss-I-O-Pad-With-ESD-Protection-for-K-Ka-Bands-Peng-Lin/9c90c8dc17eba28fc8bccbb367f2bc937393dc1a

[17] "Understanding the Importance of Prerequisites in the VLSI Physical Design Stage," Design & Reuse,
[Online]. Available: https://www.design-reuse.com/articles/54634/understanding-the-importance-of-prerequisites-in-the-vlsi-physical-design-stage.html

[18] "IR Analysis in ASIC Design: Effects and Solutions," Team VLSI, [Online]. Available:
https://teamvlsi.com/2020/07/ir-analysis-in-asic-design-effects-and.html

[19] "IR Drop Analysis II," VLSI SOC Blog, [Online]. Available: https://vlsi-soc.blogspot.com/2019/06/ir-drop-analysis-ii.html

[20] "What is Electromigration?" ANSYS Blog, [Online]. Available: https://www.ansys.com/blog/what-is-electromigration

[21] "Placement - VLSI Physical Design," Physical Design 4U, [Online]. Available:
https://www.physicaldesign4u.com/2020/02/placement.html

[22] "Clock Tree Synthesis - VLSI Physical Design," Physical Design 4U, [Online]. Available:
https://www.physicaldesign4u.com/2020/02/clock-tree-synthesis.html

[23] "Clock Tree Synthesis (CTS) - VLSI Backend Adventure," VLSI Backend Adventure, [Online]. Available:
https://www.vlsi-backend-adventure.com/cts.html.

[24] W. Wolf, "Modern VLSI Design: IP-Based Design," Fourth Edition, Pearson, 2010.

[25] A. B. Kahng, J. Lienig, I. L. Markov, and J. Hu, "VLSI Physical Design: From Graph Partitioning to
Timing Closure," Cambridge University Press, 2011.

[26] K. Roy and S. C. Prasad, "PHYSICAL DESIGN ESSENTIALS: An ASIC Design Implementation
Perspective," Springer, 2005.

[27] M. Smith, "Application-specific-integrated circuit (ASIC)," [Provide Details of the Specific Work or
Publication]

[28] Synopsys, "Star RCXT Flow - User Guide," June 2011. [Online]. Available:
https://usermanual.wiki/Document/StarRCUserGuide.1597430316.pdf

[29] "OCV v/s AOCV," VLSI SOC Design, March 2017. [Online]. Available: https://vlsi-soc.blogspot.com/2017/03/ocv-vs-aocv.html

[30] "IC Compiler™ II Design Planning User Guide," Version L-2016.03-SP4, September 2016.
