
Creation and validation of flows with IEEE1500 test wrapper for core-based test methodology

Master Thesis
Pierre Schamberger
ST-Ericsson, Sophia-Antipolis, France
Supervisors:

Isabelle Delbaere, ST-Ericsson
Professor Ahmed Hemani, KTH

Electronic Systems, ES
School of Information and Communication Technology, KTH
Sweden

02/11/2012 TRITA-ICT-EX-2012:272
Acknowledgments

I would like to thank all the people at ST-Ericsson Sophia Antipolis who took time to share part of their knowledge with me and who helped me through my work.

I would especially like to acknowledge the following persons at ST-Ericsson:

– Isabelle Delbaere, company supervisor

– Christophe Eychenne, DfT team member

– Caroline Carin, DfT team member

– Emmanuel Solari, DfT team member

– Beatrice Brochier, team manager

Also, special thanks to my supervisors at Grenoble INP and KTH:

– Regis Leveugle, ENSIMAG supervisor

– Ahmed Hemani, KTH supervisor

– Jean-Michel Chabloz, KTH PhD student and advisor



Abstract

Systems-on-Chip are becoming more complex every day, making manufacturing test constantly more challenging. As chip size increases, a divide-and-conquer approach is adopted through a core-based methodology, using state-of-the-art Synopsys Design-for-Test (DfT) features. This document deals with flows implementing such architectures. A core wrapping flow is proposed that limits the impact on ports thanks to the compression feature. A fully automated flow is proposed, as well as one offering more test possibilities by implementing wrapper bypass paths for fully custom tests.

Then a top-down flow is presented to tackle an actual, complex ST-Ericsson project, using most of the Synopsys features, showing first how to use the tool and which workarounds must be implemented to achieve the expected architecture.

As a parallel study, clock management under test, one of the most challenging parts of DfT flows, was examined. Clocks are handled with On-Chip Clocking (OCC) controllers, which set the clock behavior dynamically through clock chains. It was shown that all clock chains should be handled in a single scan chain through the compression modules. As a consequence, and to avoid a clock chain longer than the regular chains, an update of the current Synopsys OCC controller was proposed, improving test time in coming projects.



Sammanfattning

Systems-on-Chip are becoming more and more complex, which makes manufacturing test of the chips more complicated. Because of the increase in chip size, a divide-and-conquer technique is presented, with a core-based method and Synopsys state-of-the-art Design-for-Test (DfT) features. This document develops flows for such architectures. First, an IP wrapper flow is proposed, which uses the compression feature to limit the impact on pads. Then a top-down flow is presented to handle a real, complex ST-Ericsson project, using many Synopsys features. The tool limitations are shown first, and then ways to work around them are presented in order to obtain the expected architecture.

A second study concerns clock management under test, which is controlled with so-called On-Chip-Clocking (OCC) controllers. This feature is examined, and the study shows that all the OCC control bits should be handled in a single clock chain through the compression modules. As a consequence, and to avoid a clock chain longer than the regular scan chains, an update of the OCC controller is proposed here, reducing the test time in coming projects.



Table of contents
ACKNOWLEDGMENTS
ABSTRACT
SAMMANFATTNING
TABLE OF CONTENTS
GLOSSARY
INTRODUCTION
I) DESIGN FOR TEST (DFT)
I.1) TEST PROCESS
I.2) PHYSICAL DEFECTS
I.3) FAULT MODELS
I.4) SCAN-BASED TEST
II) DESIGN-FOR-TEST FLOW
II.1) FROM RTL TO FUNCTIONAL WAFER
II.2) SYNOPSYS TOOLS
II.3) CONCLUSION
III) DESIGN-FOR-TEST STATE OF THE ART
III.1) SYNOPSYS MAIN DESIGN-FOR-TEST FEATURES
III.2) SYNOPSYS MAIN FLOWS
III.3) DFT COMPILER FLOWS MAIN LIMITATIONS
IV) ON-CHIP CLOCKING (OCC)
IV.1) TOOLS FACILITIES
IV.2) OCC LOCATION IN COMPRESSED CHAINS
IV.3) OCC MOORE’S LAW
IV.4) CONCLUSION
V) IEEE1500 STANDARD AND SYNOPSYS WRAPPER FEATURE
V.1) CORE-BASED TEST METHODOLOGY
V.2) STANDARD WRAPPER FEATURE
V.3) THE IEEE1500 STANDARD
V.4) IEEE1500 IMPLEMENTED LOGIC
V.5) SYNOPSYS WRAPPER FLOW
VI) IP WRAPPER & COMPRESSION FLOW
VI.1) REQUIREMENTS
VI.2) WRAPPER BYPASS PATHS
VI.3) DESIGN UNDER TEST
VI.4) SOLUTIONS PROPOSED
VI.5) CONCLUSION
VII) X/J PROJECTS FLOWS
VII.1) THE PROJECT
VII.2) TOOL LIMITATIONS
VII.3) FLOW PROPOSALS
VII.4) PIPELINES
VII.5) INTEGRATING IP X PROJECT
VII.6) TARGETING CHIP J PROJECT
VII.7) CONCLUSION
CONCLUSION
TABLE OF FIGURES
TABLE OF TABLES
BIBLIOGRAPHY
ANNEXES
A) ANNEX: OCC CLOCK CHAINS MODULE VERILOG CODE
B) ANNEX: FULL SINGLE HIERARCHY IP WRAPPER & COMPRESSION FLOW


Glossary
Key words that appear progressively in the report are listed below. They are defined in their respective parts, but a short definition is given here as a reminder.

ATPG: Automated Test Pattern Generation. This process uses algorithms to find the best sets of patterns to test a given chip, using all the inserted DfT logic. It can also designate the tool which generates those patterns.

DfT: Design-for-Test. This is the main topic of this master thesis. Its main goal is to add logic to a functional design in order to speed up the silicon test of the chip and catch manufacturing process failures.

Flow: A set of steps to follow in order to carry out a process correctly, such as a Design-for-Test insertion.

IP: Intellectual Property. A hardened hardware block, as opposed to a full chip. A chip may contain several IPs.

Netlist: The file resulting from compiling RTL files with a given physical cell library, describing all the gates and interconnections in a design.

OCC: On-Chip Clocking. This is used to control all functional clocks during test and to use a “slow” shift clock when loading and unloading patterns.

PLL: Phase-Locked Loop. In a design, this is usually synonymous with an on-chip clock source, allowing the design to work at different frequencies. Each PLL block provides a clock at a given frequency.

RTL: Register-Transfer Level. This is a level of abstraction used to describe the behavior of hardware modules.

SoC: System-on-Chip. This is a set of features integrated in a chip.

TCL: Tool Command Language. This is the command language used to write Synopsys command scripts, including basic programming language features.

Testcase: A set of files (scripts, logs, netlist, RTL files, …) used to test and show a given behavior of the tool, without implementing the feature in a complex design. It allows isolating errors for quicker debugging.



Introduction
Systems-on-Chip (SoC) are quickly getting bigger, following Moore’s law, and testing manufactured chips is now a challenging part of the creation process because of the time it takes. For that purpose, new test techniques keep appearing to hold the test time within a reasonable range. In addition, large chips consume more power under test than in functional mode, and their number of test pads clearly becomes a limitation. This makes the core-based methodology essential to avoid stressing the chip too much. To that end, dedicated test logic is inserted in the chip to ease and speed up manufacturing test. All these trends bring new challenges to Design-for-Test (DfT) teams, which must find new ways to implement test logic that makes test ever faster.

The aim of this master thesis is to set up DfT logic insertion flows for these current designs, using state-of-the-art features. Due to the complexity and size of the designs, all insertion is performed by tools. The user’s first task is to decide the test architecture to implement, depending on the physical target (chip size, number of pads available, IPs to be integrated, …) and the functional design properties. Then, the user provides the specifications to the tools through command script files to get the correct result. Depending on the complexity of the test architecture, the tools cannot always implement the required behavior, and a third task is to implement workarounds to keep the test architecture consistent with the specifications. The work realized within this master thesis was to get familiar with these tools, most particularly the Synopsys tools, and to write script sets targeting either specific DfT architecture projects or more general types of flows.

This document first introduces the basic Design-for-Test (DfT) key concepts, from the physical fault models to the scan-based test process. The following part explains which flow current DfT tools use, showing the major steps of test implementation. Then the DfT state-of-the-art section presents the best features on the market to improve DfT designs. After these background sections, a first study details techniques to manage clocks during test with dedicated modules (OCC) inside the design. In fact, clock structures are among the most challenging parts of test implementation when aiming for an efficient final test process. This part shows how the OCC feature has to be used and which configurations are best. Finally, an OCC update is proposed to speed up test in coming designs.

To limit the high power consumption during test, a core-based test methodology is introduced in section V, showing its main advantages and how it can be implemented. Among the features used, core wrapping is presented, both as the IEEE1500 standard proposes it and as the Synopsys tools implement it. The two last sections (VI and VII) deal with the core-based architecture and its implementation through 2 designs. First, a Wrapper & Compression flow for a standalone IP is described, with 2 different implementations, discussing the advantages and drawbacks of each. Then, the core-based methodology is applied to a current ST-Ericsson design project, where the whole DfT scan architecture is implemented with several state-of-the-art features. Again, 2 flows are proposed and evaluated, and the chosen process is currently applied in the chip development process.

The thesis was performed in the DfT team at ST-Ericsson Sophia-Antipolis, France. For confidentiality reasons, some values, sources and project names are not provided, as they still depend on internal ongoing projects.



I) Design for Test (DfT)
In a system-on-chip (SoC), two fundamental types of errors are controlled. First, before the chip is manufactured, the designer has to validate that the implemented system fits the functional requirements, thanks to testbenches and validation tests. This is called functional validation. On the other hand, chip manufacturing is not a 100% safe process, and a chip can contain manufacturing defects. Thus, each chip, once manufactured, has to be tested in order to check its correct behavior. This is done with the help of Design-for-Test (DfT) techniques, the main topic of this document.

I.1) Test process


Under test, the chip is inserted in a tester device that connects to every pad to inject values on the inputs while the output values are probed. A full functional test, checking every possible function of the chip, would take far too much time. Testing all the chips is time consuming, and the chip test time is crucial since it is part of the final price. Thus, the coming parts describe how the faults are tested, but first which kinds of defects are targeted.

I.2) Physical defects


The purpose of DfT is to detect physical defects on the final chip. In fact, impurities can get stuck on the silicon (Figure 1), or a lithography step can be slightly off, bringing up all kinds of defects. From common physical defects, fault models are elaborated, and these models are tested on each chip. In the coming part, the most common fault models are described [1, 2].

Figure 1: Physical defect on silicon

I.3) Fault models

I.3.1) Stuck-at fault model


First, and maybe the best known fault model, is the “stuck-at” fault model. It basically means that a net is stuck at a constant value (0 or 1), regardless of its driver. Two sub-cases exist: stuck-at-1 (SA1) faults and stuck-at-0 (SA0) faults.

As an example, Figure 2 shows a typical NAND gate on which stuck-at faults are simulated. Table 1 shows the output gate value with net A stuck at 0, net B stuck at 1, and net ZN stuck at 0 and at 1 as examples. Basically, in the first case, the defect is noticed only if a “1” is applied to both inputs (in Table 1, the faulty output values that differ from the normal output reveal the defect). Other input patterns show a regular behavior. To test all stuck-at possibilities on this single NAND gate, one needs 3 sets of input values (“01”, “10”, “11” for instance).

Figure 2: A NAND gate (inputs A and B, output ZN)

A B | ZN (normal) | ZN if A SA0 | ZN if B SA1 | ZN if ZN SA0 | ZN if ZN SA1
0 0 |      1      |      1      |      1      |      0       |      1
0 1 |      1      |      1      |      1      |      0       |      1
1 0 |      1      |      1      |      0      |      0       |      1
1 1 |      0      |      1      |      0      |      0       |      1

Table 1: NAND gate logic and stuck-at fault impact

On a single logic gate it can seem simple, but it gets much more complicated, and sometimes even impossible, to test some nets in a chip because of redundancy or complex structures.
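To make the fault detection mechanism concrete, the short TCL sketch below (TCL being the scripting language used throughout the Synopsys flows) simulates the NAND gate of Figure 2 with an optionally injected stuck-at fault and prints which input patterns expose each fault. The procedure and fault names are purely illustrative and not part of any tool.

# Evaluate the NAND gate of Figure 2, optionally injecting one stuck-at fault.
# fault is one of: none, A_SA0, B_SA1, ZN_SA0, ZN_SA1 (illustrative names).
proc nand_with_fault {a b {fault none}} {
    if {$fault eq "A_SA0"}  { set a 0 }
    if {$fault eq "B_SA1"}  { set b 1 }
    set zn [expr {!($a && $b)}]
    if {$fault eq "ZN_SA0"} { set zn 0 }
    if {$fault eq "ZN_SA1"} { set zn 1 }
    return $zn
}

# Reproduce Table 1: a fault is detected by an input pattern whenever the
# faulty output differs from the fault-free one.
foreach {a b} {0 0  0 1  1 0  1 1} {
    set good [nand_with_fault $a $b]
    foreach f {A_SA0 B_SA1 ZN_SA0 ZN_SA1} {
        if {[nand_with_fault $a $b $f] != $good} {
            puts "pattern $a$b detects $f"
        }
    }
}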



I.3.2) Stuck-at open fault model

This model is close to the previous one, but instead of having a net stuck at 0 or 1, the net is broken. It is then neither at 1 nor at 0. This can lead to problems since it has a memory effect. Figure 3 gives an example of such a fault: consider the NOR gate shown, which has a stuck-open defect (red cross). If the pattern (AB) = (00) is followed by (10), the output will stay high. But if the first pattern had been (01), the output would have stayed low, masking the defect.

For this kind of fault, which can be tracked thanks to the stuck-at fault model, one has to take care of the previous pattern (memory effect). Such faults are however very time consuming to target and thus not often tested.

Figure 3: NOR gate with stuck-at-open defect
I.3.3) Bridging fault model
Again, the bridging fault is another static fault model, and it occurs between two wires. The first one is called the aggressor, the second the victim. A structural defect in the chip can lead the victim to copy its value from the aggressor line instead of from its driver. This is called the bridging fault model.

These faults can also be detected with the stuck-at fault model. But the algorithms to detect such faults are not very developed, since they require a lot of placement information. Tools try to test those faults with both at-speed (delay fault) and stuck-at tests if the layout information is given.

I.3.4) Delay fault model


This fault model is caused by the rise and fall times of cells and by propagation delays. Even when a chip passes the previous static fault models successfully, it can still contain manufacturing defects which increase these delays. The tester must then check nets or cells where the value would take too much time traveling from one gate to another, causing synchronization problems. Compared to the previous models, these tests have to be done “at speed”, using the functional clock speed to run the tests in functional conditions. Two test models actually exist: the transition delay and path delay fault models. The first one considers a gate taking too much time to stabilize its output, hence a data synchronization problem. The path delay fault model is more about the time the data spends in the wires, from one gate to another.

This model is very important; together with the stuck-at fault model it is one of the 2 models usually tested in the industry, and it will be developed later in the report.

I.3.5) IDDQ fault model


This last model is more a way to check whether the chip is working from a global point of view. IDDQ stands for IDD quiescent (a current measurement in a quiescent state). The basic idea is to put the chip in a given state and watch the power consumption. If the value is over the specification, one can infer that there is some leakage defect and that the chip is not correctly manufactured. Even if this method looks quick compared to the many thousands of pattern vectors otherwise required, it is not really accurate: for a big chip, many power domains exist, and a leakage defect in a low-power domain can be masked by a “noisy” high-power domain. One then needs to split the chip into several parts, turning on only some parts of the chip at a time. Also, an analog power measurement takes a lot more time than a digital pattern vector test, and if several of those are required, it can become quite time consuming. It is however used because it can give a global idea of the chip state.

I.4) Scan-based test


Knowing the kinds of fault models targeted, this part now deals with the techniques used to target such faults. In fact, the tester does not have direct access to every gate in the design. The idea of scan-based test is to find a way to “shift” values inside the chip through the pads to target a given fault.

A way to set values in the chip is to force all registers (flip-flops) to a custom value, depending on the faults targeted. To do so, the design is slightly modified to provide access to each register without using the regular register input. Thus, all flip-flops (Figure 4, left) are replaced with scan cells (Figure 4, right), which have 2 extra ports. The TE (test enable, or SE, scan enable) port selects whether the regular D input or the test TI (test input, or SI, scan in) port is used as the register input.

Figure 4: Regular flip-flop (left) and corresponding scan flip-flop (right)

With this, the register can take either the value from the upstream logic or the value coming from the test input port. To provide access to these TI ports throughout the design, chains of scan cells are implemented, linking all scan cells with the following pattern:

scanin port -> Cell1/TI -> Cell1/Q -> Cell2/TI -> Cell2/Q -> … -> Cellx/TI -> Cellx/Q -> scanout port

In fact, these “scan chains” are wired from an input pad (scanin port) to an output one (scanout port), passing through a certain number of scan flip-flops, using the TI input port and the functional output Q port. However, the output port is still connected to the downstream logic; another net is simply added to link each scan cell to the next one.

A complete design generally contains several scan chains. A good design from a DfT point of view has small and equal scan chain lengths (currently around 200 cells), because test time is directly impacted by the shifting time, thus by the scan chain length. However, the number of pads is limited, and other specific techniques have to be implemented to achieve small scan chain lengths. Such techniques are detailed in a coming section of the report.

Once the scan chains are built, there is a hardware path to shift values inside the chain, by enabling TE to select the TI register input. Each cycle, a new value is inserted in the first cell of the chain, and all the previously inserted ones are shifted one cell downstream. For a scan chain including N cells, N shift cycles are required. After that point, the design is fully loaded with custom values.
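As a rough, purely illustrative calculation (the numbers below are hypothetical and not taken from any project), the dominant part of the test time is the product of the pattern count and the scan chain length, which is why the chain length matters so much:

# Hypothetical figures: 10 000 patterns, 200-cell chains, 50 MHz shift clock.
set n_patterns 10000
set chain_length 200
set shift_mhz 50.0

# The load of pattern i+1 overlaps the unload of pattern i, so each pattern costs
# about chain_length shift cycles; capture cycles are negligible in comparison.
set cycles [expr {$n_patterns * $chain_length}]
puts [format "approximate shift time: %.1f ms" [expr {$cycles / ($shift_mhz * 1e3)}]]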



Figure 5: Scan based test methodology. Scan chain from scanin to scanout port controlling register state during test.

Then a functional clock cycle is applied, with all TE ports selecting the regular D register input, to capture the functional value. As an example, considering Figure 5, when all registers are loaded, the functional clock cycle propagates the values from the left registers through logic block A and captures them in the right registers. Meanwhile, the values from the right registers go through logic block B to the functional outputs. Finally, the input selector enables the test input again, and the whole scan chain is shifted out through the scanout port and probed to check the correctness of the values. Meanwhile, another set of values (“pattern”) is already being shifted into the chip through the input pads.

A “test pattern” is a set of values defining the state of all design registers at a given time, to target a set of faults. Several patterns are required to test a chip, and each pattern is shifted one after the other. A good set of test patterns is a small set with maximal coverage of the chip defects. The usual target is over 99% fault coverage for the stuck-at fault model, and around 70% for the transition fault model, which is more difficult to target.

The work of DfT teams is then basically divided in two parts: create and insert the dedicated test logic discussed above, and find the best set of test patterns to test the chip with the help of powerful tools. The challenging part is to find a good compromise between test time and test coverage. This gets more complicated as chips grow. To reach the 99% coverage target, the required set of test patterns is getting bigger and bigger, and test capacity limits are now being reached.

At the same time, test power consumption is also increasing dramatically. This leads to a new issue: in functional mode, all the chip cells never switch within one single clock pulse. However, this can happen in test mode, and the current supply can then become a limiting factor. Because of that, a chip can appear to be mis-manufactured merely because of a lack of power. The consequence would be to throw away an operational chip (called yield loss). The DfT architecture must take this power constraint into account.

All these limits lead us today to implement new techniques to ease and speed up the test. This master thesis deals with these new techniques, such as wrapper logic, compression units and clock controllers, which are the new DfT standards used to implement Design-for-Test architectures enabling compact sets of test patterns and short test times. The next part explains the global DfT flow used with the Synopsys tools.



II) Design-for-Test flow
After seeing why DfT is important as well as its main process, this section presents the main steps of the DfT process, focusing on the DfT insertion process, which is the main topic of this master thesis. The test pattern generation and simulation steps are presented quickly, as they are verification steps to check the correctness of the DfT insertion. The second part of this section presents how the flow is applied with the Synopsys tools.

II.1) From RTL to functional wafer


Basically, DfT works at several levels, from the early RTL stage to the final tests on silicon. First, at RTL level, some blocks are integrated to speed up test later on, such as clock management, dynamic test modules, and Built-In Self-Test (BIST) for memories.

Then a global synthesis is done, mapping the RTL onto a given library set. The output of the synthesis is a Verilog file called a netlist, containing the whole design mapped onto a given technology (32/28 nm for instance, depending on the fab). This netlist is used again in DfT to insert all the scan logic. In fact, a first step is to replace all regular sequential elements (flip-flops) by scan cells, then stitch them together to create scan chains. During this step, DfT modules such as compressors and wrappers are dynamically inserted by the tool. All clock and reset architectures are described to the tool so it can handle them in the best way when inserting the DfT elements. This step creates a new and final netlist.

When all the information is provided and the hardware is in its final state, the tool should be able to deal with all the design features to generate test pattern sets. This step is achieved with an ATPG (Automated Test Pattern Generation) tool. Depending on the complexity of the design and the precision of the DfT insertion, the tool will or will not be able to give a correct set, with a variable test coverage. In this master thesis, this step was used to check the expected behavior of the test features and to verify that test patterns can be generated with the provided information.

The last step is to validate through simulation that the generated patterns are correct, checking again that the tool understood the chip behavior correctly. This last step usually forces a loop back to the first steps when problems occur. Basically, most troubles come from clock architectures and test signals which were not correctly handled.

This whole process finally provides a clean test pattern set fitting the final chip architecture. Test engineers use this pattern set to test chips at wafer level for mass production.

II.2) Synopsys tools


Among the main tool providers, one can mention Synopsys, Cadence and Mentor. These 3 Electronic Design Automation (EDA) vendors provide similar tool suites, not only for DfT, but for the whole design process, from RTL synthesis to back-end place and route. Synopsys is the tool suite used at ST-Ericsson for Design-for-Test.

Synopsys provides tools to perform the previously presented steps. In a global way, the Synopsys tools work with TCL command scripts that activate tool switches, read input files and create output files. The flow used during the master thesis was restricted to the post-synthesis netlist steps. The corresponding Synopsys tools are described below.



II.2.1) Inserting DfT: DfT Compiler
First, DfT Compiler [3] is used from the netlist stage and usually takes a netlist as its main input. Then a set of command lines (switches) is given, in order to tell the tool how to implement the scan chains and which options (Wrapper, Compression, …) are to be implemented. Also, all clocks, resets, and specific test signals are defined in the command script to guide the tool towards the expected result.

After processing all the given information, DfT Compiler produces an updated netlist with scan inserted, as well as a couple of report files and 2 important files:

– Test model file, commonly called CTL (after its extension). It describes all the DfT information inside the design (scan chains, scan ports, clock management, embedded DfT devices, …). It is used when the current design is integrated at a higher level (SoC level for an IP), to avoid redoing the work and to simplify IP integration.
– Test protocol file, commonly called SPF (after its extension). It describes almost the same information as the test model file, plus the test signal timings. It is aimed at the next step, ATPG, to drive the test signals correctly.
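As an illustration, a minimal DfT Compiler session for a plain scan insertion could look like the sketch below. Design, port and file names are hypothetical, and while the commands are the standard ones documented in [3], the exact options depend on the tool release:

# Read the post-synthesis netlist and select the design (names are examples).
read_verilog my_core_netlist.v
current_design my_core

# Declare the already existing test clock and reset so the tool can drive them.
set_dft_signal -view existing_dft -type ScanClock -port test_clk -timing {45 55}
set_dft_signal -view existing_dft -type Reset -port rst_n -active_state 0

# Ask for 8 balanced scan chains.
set_scan_configuration -chain_count 8

# Build the protocol, run the design rule checks, then insert the scan chains.
create_test_protocol
dft_drc
insert_dft

# Write the scan-inserted netlist, the test model (CTL) and the test protocol (SPF).
write -format verilog -hierarchy -output my_core_scan.v
write_test_model -output my_core.ctl
write_test_protocol -output my_core.spf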

II.2.2) Editing test files: SPFGen/CTLGen


Since complex designs imply advanced DfT architectures, DfT Compiler cannot always implement exactly what the user asked for, which forces tool workarounds such as modifying the netlist by hand. The test protocol and test model files are then corrupted and have to be regenerated. Synopsys provides 2 script tools, SPFGen (test protocol) and CTLGen (test model), which recreate the SPF and CTL files to avoid corrupted ones. They are not supposed to be used in regular flows, but due to tool limitations and unexpected results, this is sometimes the only way. The flows developed in this master thesis will unfortunately use such tools.

II.2.3) Automated Test Pattern Generation: TetraMax


At this step, the netlist is not modified anymore, and the design part is finished. TetraMax [3] then takes the updated netlist from above, as well as the test protocol files. If the work was done correctly before, this step basically just checks (DRC, Design Rule Checking step) that both the design and the test protocol are coherent, and that the ATPG tool is able to shift test values through the scan chains. It is also at this step that loops back to the DfT insertion are required when the implemented test architecture does not work as expected.

If this checking step passes, TetraMax can create a full list of faults to target in the design (stuck-at, transition, …). The user then decides the algorithms to be used for targeting the faults, and the tool runs to provide a correct set of patterns and a corresponding testbench for the design (clock, reset, scan enable, control signals, …). These patterns are first used in a simulation flow to check that they are usable, and finally run on a chip tester to test all chips in mass production.
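For reference, a typical TetraMax session following the insertion step can be sketched as follows (file and design names are hypothetical; the commands follow the documented flow [3, 4]):

# Read the scan-inserted netlist and the library cell models.
read_netlist my_core_scan.v
read_netlist my_lib_models.v -library
run_build_model my_core

# Check that the netlist and the test protocol (SPF) agree (DRC step).
run_drc my_core.spf

# Target all stuck-at faults and generate a compact pattern set.
set_faults -model stuck
add_faults -all
run_atpg -auto_compression

# Report the coverage and write the patterns for simulation and for the tester.
report_summaries
write_patterns my_core_stuck.stil -format stil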

II.2.4) Simulation: NCSIM


One cannot wait for the chip to be manufactured to check whether the DfT was well implemented. This is why simulation tools have to be used. NCSim is the only software used in this flow which is not from Synopsys but from Cadence; this does not disturb the flow. Time is a major issue, as simulating a whole chip is quite long (on the scale of days). At this step again, loops back to the DfT Compiler step are needed when errors occur. Simulation is usually run twice:

– First, with a regular netlist, with no back-end information. This is called zero-delay simulation since everything is still ideal and no placement information is available.
– After the Place & Route steps, the simulation is run again with all timing information to verify the correctness of the test architecture at speed. Several test cases are evaluated in different conditions (temperature, …) to validate the chip behavior in the worst and best cases.

When the second simulation passes, the chip is declared testable and can be sent to production.

II.3) Conclusion
DfT tools are really essential for digital circuits, and nothing could be done without them. They can handle a DfT implementation by themselves with very little information. However, when it comes to implementing a customized architecture, where the user wants specific details, they can be very hard to deal with. The aim of this master thesis is actually to master them, understand how they react when implementing such customized architectures, and find workarounds to their limitations.

The Synopsys tools, and more specifically DfT Compiler, allow regular scan chain insertion, but also the insertion of a lot of features to enhance test on chip, which are presented in the next section.



III) Design-for-Test state of the art
The DfT state of the art is globally restricted to tool improvements. In fact, Synopsys tools include more and more features to improve DfT architectures, and helped by regular feedback from design companies, the improvement is constant. The coming part therefore deals with the Synopsys state of the art [3, 4], since everything is done through their tools. First, a couple of features commonly used in today's DfT architectures are presented. The wrapper feature and the clock controller are left aside, since these items have dedicated parts below. At the end, some of the most important Synopsys DfT insertion flows are presented. All these features and flows will be used later to implement real flows.

III.1) Synopsys main Design-for-Test features


III.1.1) Multi-mode designs

Figure 6: Multi-test modes architecture

This first technique allows implementing designs with several test configurations, implying several scan chain lengths. One can for instance create a configuration (called a test mode) where all the flops of the design are grouped in a single scan chain, and another configuration with several scan chains. The tool automatically creates the required number of ports (called test_si and test_so in Figure 6) for each test mode, and links the scan chains to fit the requirements, trying to balance the chains to keep the lengths as small as possible and equal between chains. Figure 6 is an example of 2 test modes (blue and red) for a design containing 12 cells. The red test mode is declared with 3 scan chains, and the blue one with 2. The figure shows that the tool inserted multiplexers driven by a test_mode signal to switch from one test mode to another, modifying the scan chain lengths. In the end, each test mode has its own test protocol and its own set of patterns.
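For the example of Figure 6, the two test modes could be declared to DfT Compiler roughly as in the sketch below. The mode names are illustrative, and the exact multi-mode options (mode selection signals in particular) depend on the release, so this is only an outline:

# Declare two scan test modes on the same design.
define_test_mode red_mode -usage scan
define_test_mode blue_mode -usage scan

# Red mode uses 3 scan chains, blue mode uses 2.
set_scan_configuration -chain_count 3 -test_mode red_mode
set_scan_configuration -chain_count 2 -test_mode blue_mode

insert_dft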

III.1.2) Compression feature


Scan compression is a widely used technique for reducing the number of pads required. In fact, designs tend to decrease the scan chain length to speed up test time, and thus to increase the number of scan chains. To handle this increase with a limited number of pads, the Synopsys compression feature allows having more scan chains in the design than scan ports. A given input test port impacts several scan chains, and each output test port value results from several scan chains. Thanks to the information provided in the test protocol files, ATPG algorithms can handle this quite well, still computing a compact test pattern set.

In fact, DfT Compiler allows common compression ratios from 10 to 50 times. Over that limit, it is still implementable (up to 100 times), but it becomes difficult to reach a high fault coverage for the design. The decompressor is based on muxes, and the compressor is mostly made of a XOR tree.



Figure 7 represents the compression structure: the scan-in channels (left) feed a decompressor, the internal core scan chains sit between the decompressor and the compressor, and the compressor drives the scan-out channels (right).

Figure 7: Scan compression feature

This method is almost always used in today's designs because of its simplicity and its power. When it is implemented by DfT Compiler, 2 test modes are created, namely Internal_scan and ScanCompression_mode. Both modes use the same ports, but Internal_scan does not use the compressor modules and thus has longer scan chains. In fact, this mode recombines the chains from the compressed mode to get as many chains as there are scan channels. The Internal_scan mode is useful for debug, both on chip and as a first step during simulation.
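Enabling this feature in DfT Compiler typically takes only a couple of commands on top of the scan configuration, for example as below. The values are illustrative and the option names follow [3], with possible differences between releases:

# Enable scan compression (DFTMAX) on top of the regular scan configuration.
set_dft_configuration -scan_compression enable

# 8 external scan channels, and a compression ratio of at least 30.
set_scan_configuration -chain_count 8
set_scan_compression_configuration -minimum_compression 30 -xtolerance high

insert_dft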

III.1.3) Partition flow

Figure 8: Compressed sub-blocks (one decompressor/compressor pair per power domain) integrated at top level

Figure 9: Top-level compression using the partition flow to build 2 compressor modules, one per power domain


When it comes to multi-power-domain designs, it is not recommended to compress all chains through a single compressor unit; power-domain-dedicated compression should be implemented instead. This can be done by working on the power domain sub-blocks (Figure 8) and just integrating the 2 blocks at top level. Sometimes it cannot be that easy, and the compression has to be integrated at top level, with a top-down flow. In those cases, DfT Compiler has a partition feature (Figure 9). It allows the user to declare which blocks belong to which partition, and compression insertion, as well as the scan configuration, is completely independent from one partition to another. Since more and more chips use different voltages, this feature is getting more interesting. It will be used later in this report for the flow of a current ST-Ericsson IP.
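With this feature, each power domain gets its own scan and compression configuration, along the lines of the sketch below (instance and partition names are hypothetical):

# Put each power domain in its own DfT partition.
define_dft_partition PD1 -include {u_power_domain_1}
define_dft_partition PD2 -include {u_power_domain_2}

# Configure each partition independently, then insert everything at once.
current_dft_partition PD1
set_scan_configuration -chain_count 8
set_scan_compression_configuration -minimum_compression 20

current_dft_partition PD2
set_scan_configuration -chain_count 4
set_scan_compression_configuration -minimum_compression 10

insert_dft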

III.1.4) Serializer feature


Sometimes, for some chips, the number of pads dedicated to scan ports is very limited, and even with the compression feature it is not enough to compress all the scan chains. To avoid increasing the scan chain length, DfT Compiler has a feature called Serializer that the user can enable on top of compression.

This feature is shown in Figure 10. In the figure's case, it allows saving 2*(8-1) = 14 scan ports: for a compression block with 8 inputs, only one scan-in and one scan-out are used. The deserializer box is an 8-bit shift register; 1 clock cycle out of 8, when the deserializer shift register is full, it updates its value towards the decompressor.

Figure 10: Serializer feature (80 MHz serializer clock, 10 MHz core shift clock)

For a serialization ratio of N (here N = 8), the core shift frequency is divided by N, hence the test time is multiplied by N. But if the deserializer is close enough to the pads on the chip, with short propagation timing, one can increase the serializer clock frequency and thus lower the impact on test time. For N equal to 2 (the flow in section VII), it is even possible to double the serializer frequency so that the test time does not increase.

This feature is thus useful, but it brings a lot of constraints and incompatibilities with other flows. Depending on Synopsys' investment in this feature, it could become interesting in coming DfT Compiler releases.

III.1.5) Pipeline feature


DfT logic in the design is not really the major concern when it comes to timing. However, depending on the remaining pads, the wires can be quite long from the pad, through the decompressor, to the first register in the design under test, and possibly longer than what a clock cycle allows. The problem is the same for the output pads, from the last flip-flop of the scan chains through the compressor. To avoid synchronization issues, the idea is to add register elements between the pads and the decompressor unit, as well as between the compressor and the output pads, on each scan path. This adds an extra shift clock cycle, but it allows breaking a long scan path into two smaller ones and reduces the propagation delay.

Figure 11: Pipeline feature (pipeline cells added in green)

This feature might look simple and easy to implement, and almost indispensable in current designs. But the current DfT Compiler flows including pipelines are limited and bring more issues in complex architectures. This will be discussed for the flow of section VII.
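As an indication, the pipeline registers are requested in DfT Compiler through the pipeline scan data configuration, roughly as below (one head and one tail stage in this sketch; the exact option set is release dependent):

# Add one pipeline flop between the pads and the decompressor (head stage)
# and one between the compressor and the output pads (tail stage).
set_dft_configuration -pipeline_scan_data enable
set_pipeline_scan_data_configuration -head_pipeline_stages 1 -tail_pipeline_stages 1

insert_dft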

III.2) Synopsys main flows


For each feature described above, DfT Compiler has dedicated flows. However, from a more global point of view, Synopsys provides 2 main ways to handle DfT insertion in a complete design containing sub-blocks. These are similar to the standard hardware design flows, namely the bottom-up and top-down approaches. Both have advantages and drawbacks, and the good answer often comes from mixing the two flows to get the expected granularity.

III.2.1) Bottom-up
Also called the hierarchical flow in the Synopsys documentation, this flow allows a pleasant granularity when inserting DfT.

Figure 12: Hierarchical design example (a top level integrating blocks A to E)
As an example, for the design in Figure 12, one would process 6 runs of the tool, one for each module. This allows defining the scan configuration in a very precise way, for each block.

However, each run results in an updated netlist and a test model file (CTL) for the block. All the information has to be carried from a sub-module to its parent block, and this is often where things go wrong: in complex architectures, the user has to bypass tool limitations by "hacking" the netlist or the test model. It is then complicated to keep something coherent across hierarchy levels, and the flow quickly becomes unwieldy, with a lot of steps and files.

III.2.2) Top-down
A smarter way to handle DfT insertion may be to use a top-down approach. In the example of Figure 12, a top-down flow would treat everything in one pass, at top level. This technique is quick, clean, and does not require carrying DfT information several times. However, when the design gets slightly complex, the granularity, even when working with the partition flow, is poor and the DfT architecture is quickly limited. Moreover, as work is usually done by teams, this approach makes it hard to work in parallel.

III.2.3) Conclusion
As suggested above, the smartest way is generally a good mix of both flows. In the design example above, one could for instance treat block D as a top unit, all at once (including blocks A, B and C in a single DfT insertion), export a test model for this block, and integrate block D with block E at top level.

III.3) DfT Compiler flows main limitations


The Synopsys tools do have a lot of features with their respective flows. For simple test cases they are easy to implement, but the coming parts will show that this is no longer true in actual complex projects. Among the presented features, the Serializer and Pipeline flows are quite incompatible with other features. This is where engineering work is required, smartly mixing tools and workarounds.



IV) On-Chip Clocking (OCC)
The stuck-at fault model does not fully cover DfT needs: a deficient net can sometimes drive the correct value at reduced speed but fail at full speed. This is why the transition fault model, or so-called "at-speed" tests, is used. This requires however a fast clock, as fast as the functional ones, and these are more likely generated on chip since a pad cannot handle high frequencies (over 100 MHz). Moreover, today's designs contain several on-chip clocks (PLL, phase-locked loop), which are divided and mixed together, and sometimes synchronized with each other. Too many pads would be wasted to drive all those clocks during test.

An On-Chip Clocking (OCC) controller is basically a box which intercepts the PLL output clock and allows the ATPG tool to mix it with the shift clock for test purposes. Synopsys requires each on-chip clock generator output to be controlled by such an OCC controller.

Figure 13: The OCC controller takes control over each PLL clock with the shift clock (the shift clock comes from a primary input pad; the controller output drives the design flip-flops)

The PLL clock is the one used in functional mode, where the OCC controller is bypassed. During test, the shift clock (coming from a pad, the red wire in Figure 13) is used to shift patterns in and out of the chip. The capture cycles are then processed with the PLL clock (at speed). Such a module should therefore handle, in a glitch-free way, clock transitions from a slow clock (shift clock) to a fast clock (PLL clock), implying a full synchronization between the 2 clocks.

Also, in order to target complex faults, such as the ones in memories, designs may require multiple capture cycles. This is handled with dynamic control bits inside the OCC controller, driven as a scan chain (called the clock chain) by the ATPG tool. Each bit, when enabled, leads to a capture clock pulse before the result is shifted out.

IV.1) Tools facilities


This study was realized with the Synopsys tools, more specifically DfT Compiler F-2011.09-SP1 for DfT insertion and TetraMax D-2010.03-SP5 [3, 4] for the ATPG step. The release matters because functionality changes from one release to the next can make the results vary. The coming part explains, for the given releases, the way of describing OCCs to the tool and how the tools handle them. The coming lines will also freely talk about OCCs while still meaning OCC controllers, for simplicity.

IV.1.1) DfT Compiler: Inserting DfT


In DfT Compiler, the regular way to declare the OCCs present in the design is described in the following script extract:



# Defining the clock itself, with PLL source, shift clock and output clock pin.
# Also, the OCC control flops are defined in the -ctrl_bits list.
set_dft_signal -type Oscillator \
    -hookup_pin output_clock_pin \
    -ate_clock shift_clock \
    -pll_clock PLL_clock_pin \
    -ctrl_bits [list \
        0 occ_ref_clk/…/p_out_reg_0_/Q 1 \
        1 occ_ref_clk/…/p_out_reg_1_/Q 1 \
        2 occ_ref_clk/…/p_out_reg_2_/Q 1 \
        3 occ_ref_clk/…/p_out_reg_3_/Q 1 \
        4 occ_ref_clk/…/p_out_reg_4_/Q 1 ] \
    -view existing

# Then, the OCC control bits are grouped together in an OCC scan group.
set_scan_group clk_bit_chain_ref_clk \
    -class OCC \
    -segment_length 5 \
    -serial_routed true \
    -clock [list ate_clk] \
    -access [list \
        ScanDataIn occ_ref_clk/scan_in \
        ScanDataOut occ_ref_clk/scan_out ]

Figure 14: TCL script extract defining OCC control bits

The above script is divided in 2 commands. First, the clock itself is defined, with the different bits that control it (5 control bits = max. 5 capture cycles), the PLL source clock, the shift clock and the output clock pin. Then, a command defines how to group the OCC control bits in a particular class (named OCC), which creates a clock chain in the design among the regular scan chains.

This "-class" option of the set_scan_group command allows the tool to recognize a clock chain. It results in the creation of an easy path through the compressors, to avoid degrading the test coverage or increasing the pattern set size too much. In fact, the tool implicitly excludes the clock chains from compression. If some cells of a scan chain are defined in an OCC scan group, this whole scan chain will not be part of the compression, and it will take a dedicated scan channel for itself from the compression-dedicated pads.

As a first point, let us consider any basic compressed design (with no clock chains). As an example, given 8 primary scan channels, DfT Compiler divides them in two groups: 3 signals are selectors and 5 are data. Without an OCC architecture, DfT Compiler can handle up to 1024 internal scan chains with such a configuration. With an OCC architecture and still 8 scan channels, it falls down to 512.

In the first case (no OCC), each of the 5 data bits directly drives internal scan chains 0 to 4 (Figure 16, the C channels), and all the other internal scan chains are a logic mix of all 8 signals. The first C scan chains are thus feed-throughs, but a given value on them deeply impacts the values of all the other scan chains.

However, if an OCC chain is declared as a clock chain, one of the 5 (C) data bits is fully dedicated to that clock chain and not used for other scan chains. It means that the other scan chains are not impacted by the clock chain values.

IV.1.2) TetraMax: Generating pattern set


When generating patterns with TetraMax, two sets are generated: a stuck-at fault model pattern set and the transition "at-speed" one. For stuck-at, the user either gives a rule to use a single capture clock, or allows the tool to use "fast-sequential" or "full-sequential" modes, where the tool decides itself how many capture clock cycles (between 2 and 5 with the ST-Ericsson OCCs) the pattern will use, depending on the targeted faults. For transition faults, the minimum number of capture cycles is 2: contrary to the stuck-at fault model, one cycle is needed to "launch" the value on the targeted fault and another one to "capture" it. Again, the "fast/full-sequential" modes can be used to target more complex faults. The following picture gives an overview of how TetraMax uses the OCC for the transition fault model; the 2 cycles are shown on the last line, and "ATE clock" stands for the shift clock.

Figure 15: Capture sequence for Transition patterns

According to this, with a fully scanned design, the tool should be able to cover the whole design fault set. However, this is rarely the case in today's projects. In fact, some cells are not scanned, and some are reported as "uncontrollable". This happens when the cell clock or reset signal cannot be controlled, meaning it is neither a primary input clock nor an OCC-controlled one. This should happen as rarely as possible since it complicates test pattern generation.

Also, most of the faults that cannot be targeted with simple patterns are faults targeting memories. In fact, while the memory content is tested with Memory Built-In Self-Test (MBIST), the access logic has to be tested as regular scan logic. However, a memory access (write/read) requires around 3 cycles, and TetraMax has to wait those 3 cycles to see the results on the output pins of the memory. This is where most of the multi-cycle patterns are used.

In a general way, leaving the memory faults aside, most faults can be detected with simple patterns (single capture for stuck-at faults, and launch/capture for transition faults).
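For the at-speed set, the TetraMax run mainly differs from the stuck-at one in the fault model and the allowed capture depth, along these lines (a sketch only; the capture depth of 5 matches the five control bits of the ST-Ericsson OCC):

# Transition (at-speed) fault model, launched from the PLL clock pulsed by the OCC.
set_faults -model transition
set_delay -launch_cycle system_clock

# Allow fast-sequential patterns with up to 5 capture cycles, one per OCC control bit.
set_atpg -capture_cycles 5

add_faults -all
run_atpg -auto_compression
write_patterns my_core_transition.stil -format stil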

IV.2) OCC location in compressed chains


Using compression around an IP allows decreasing the number of scan ports while not increasing the scan chain length. However, when a scan input port is set to a certain value (0 or 1), it heavily impacts the decompressed values on several internal chains. In the same way, when the ATPG tool decides to target a given fault, it applies the corresponding pattern at the input of the decompressor, implying the state of other cells in the design. A bit whose value is decided for a targeted fault is called a "care bit".

The main issue with OCC control bit chains is that all 5 bits are always care bits, implying a lot of other bits in the design when using scan compression. This is why it is important to discuss the best way to place the OCC control bits in the scan chains when dealing with compression [5].

IV.2.1) Problem description


Problems appear when several scan chains have an OCC scan group on them. This can easily happen in designs, since some hardened macros can include clock chain segments in regular chains. Several macros including OCCs lead to one dedicated scan channel per clock chain, and merging them is not always possible because the resulting clock chain would be too long. Let us take an example: a core with 265 scan chains and 8 scan channels (8 in and 8 out). According to the DfT Compiler documentation [3, 4], 8 scan channels can handle up to 512 scan chains including 1 OCC-dedicated scan chain. If another core scan chain in the design has an OCC scan group, this reduces to 7 scan channels for the core scan chains, which limits the maximum number of internal scan chains handled to 224.

Clock chains are handled with such a high priority that the tool will, instead of reconsidering the OCC scan chain repartition, merge other core scan chains together to come down to 224 scan chains. Thus, 40 scan chains will have a length twice as long as wanted, doubling the shifting time, and thus the test time!

In the following parts, this problem will be put into evidence and the best implementation will be found.

IV.2.2) Spreading OCC clock chains in design


Tests were performed on a former NXP IP core. The faults were generated only on the core, without memory blocks, no DfT logic was tested, and only the Stuck-at fault model was used. This is however sufficient to illustrate the problem.
The tests were run with the following compression requirements:
 8 scan channels
 267 core scan chains
The ATPG run had the following parameters:
 capture cycles parameter: 0
 high effort on pattern merging, to get the most compact pattern set.

A first run, called Best Case below, was done to get a reference for the pattern set size and fault coverage one can expect. All OCC control bits were in a single dedicated clock chain, creating only one dedicated scan channel (scan chain 0).
Then, the OCC clock segments were spread into 4 scan chains, but only one of them was declared as an OCC scan group. To perform this, one simply removes the "-class OCC" parameter from the set_scan_group definition. Three tests were performed, declaring all (1), part (2), or none (3) of the OCC control bits. This was done by removing from the test protocol file the line (cf. Table 5):

Cycle X "<cell name>" 1;

in each clock controller structure block. When these lines are removed, the ATPG assumes that the correct value will appear at the right capture cycle, turning these bits into don't-care bits and thus removing some constraints on other scan chains. The notion of don't-care bits, as opposed to care bits, is very important in compression flows: the higher the ratio of don't-care cells, the more compact the pattern set.

If no line is removed, all 5 bits are considered as care bits, making the pattern set generation harder. Results are shown below as "OCC spread, 5 bits declared".
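For reference, the clock-chain declaration that is relaxed in these experiments is sketched below. The "-class occ" argument is the piece that is removed to spread the segments over regular chains; the instance and port names are placeholders, and some option names may differ between tool versions:

    # Declare the OCC control-bit segments as a dedicated clock-chain scan group
    set_scan_group occ_clock_chain \
        -include {i_OCC_0/occ_shift_reg i_OCC_1/occ_shift_reg} \
        -access  {ScanDataIn occ_si ScanDataOut occ_so} \
        -serial_routed true \
        -class occ

    # Removing "-class occ" above makes DfT Compiler treat the segment as a
    # regular scan segment instead of a clock chain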



As seen before, in the Stuck-at test model only the first capture cycle is used, so one may think that the 4 other control bits are useless in that model. The following test shows the impact of keeping only the first control bit as a care bit, and the others as don't-care bits. Results are shown below as "OCC spread, 1 bit declared".

As a last step, and as a kind of verification, all control bit information was removed, leaving only the pin definition of the OCC. During this test, all "-class OCC" parameters in the scan group definitions were also removed, spreading all the OCCs over 3 regular scan chains. In this way, all OCC control bits are seen as regular scan cells and are always don't-care bits. The results are shown under "OCC spread, no control bit declared".

IV.2.3) Declaring OCC clock chains


In the previous tests, one OCC chain was declared as a clock chain, dedicating a scan channel to it. One can wonder how the ATPG tool behaves if DfT Compiler did not adapt the compressor to any clock chain.

In fact, without any OCC consideration, in a decompressor with C input channels the first C internal scan chains are actually direct wires from the scan ports, and all other internal scan chains are a logic mix of these scan ports (Figure 16). One can wonder whether any improvement can be achieved by moving the OCC chain onto either a "wired" scan chain or a standard compressed chain.

Figure 16: Decompressor paths. The first C internal scan chains are directly wired from the C input channels; the remaining internal chains are a logic mix of them.

4 tests were performed, studying 2 parameters:

 Number of OCC control bits declared: 5 or 1 control bits declared. The case "0 control bit declared" was already covered in the previous study.
 Path through the compressor: the first C internal scan chains of the compression are always feed-throughs. The idea is to force the OCC chain (still not declared as an OCC chain) onto a privileged chain, or onto a standard one.

In all cases, all OCCs were on the same scan chain, but that chain was not declared as a clock chain.

IV.2.4) Main results


This part summarizes the results of all the tests above in Table 2.



Configuration                               Declaration                 Metric          5 bits declared   1 bit declared   No bit declared
Best case                                   Clock chain declared        Test coverage   99.55%            ---              ---
                                                                        Pattern set     1402              ---              ---
OCCs spread                                 Clock chain declared        Test coverage   99.21%            99.27%           99.55%
                                                                        Pattern set     18499             17738            1412
OCC chain on random chain                   Clock chain not declared    Test coverage   99.31%            99.26%           ---
                                                                        Pattern set     17793             17372            ---
OCC chain on privileged undeclared chain    Clock chain not declared    Test coverage   99.29%            99.29%           ---
                                                                        Pattern set     17536             17404            ---

Table 2: Pattern set size and test coverage for different OCC locations and declarations

Analyzing the results depending on the OCC location and its declaration:

The following paragraphs discuss the tests whose results are shown in rows 1 and 2 of Table 2.

The first row is the reference test, with all OCCs on a dedicated and declared clock chain. It achieves a good test coverage of 99.55%, with a pattern set of 1402 patterns.

The second row deals with the OCCs spread over 4 chains, only one of which is declared as a clock chain. The first test declared all 5 control bits. The ATPG run gives a test coverage of 99.21% for a pattern set of 18499 patterns: the set size is over 10 times the best case, and the test coverage is slightly lower (-0.3%) than the best case of row 1.

The second test declared only 1 control bit per OCC. The results give a test coverage of 99.27% and a pattern set of 17738 patterns. One can see a small pattern set reduction (≈5%) compared to the previous test, but it is still around 10 times larger than the best case, where all OCCs were in the same dedicated scan chain, and the test coverage remains slightly lower than the best case.

As a last step, all control bit declarations were removed. The results were then much better, with a test coverage of 99.55% for 1412 patterns, i.e. results similar to the best case. However, this test was performed without any OCC control bit information: TetraMAX then assumes that the OCC control bits will somehow take the required values, whereas they actually contain random values. All these control bits would have to be forced to the correct value before the ATPG capture cycle, otherwise all patterns fail during simulation, and there is no way to guess the control bit state required by the ATPG. This configuration is therefore not usable as such; it only shows that, without any care bits, the results match the best case.

Analyzing the results depending on the location of the undeclared clock chain:

First, the runs where the OCC scan chain is not declared as a clock chain and is placed on a randomly chosen scan chain (i.e. not on the first C pins) are presented in row 3 of Table 2. Both the 5-bit and the 1-bit declared configurations were run. The results show a small improvement for 1 declared bit compared to 5 declared bits, but both pattern set sizes are far too large to be acceptable.

Then, the undeclared clock chain was stitched onto one of the first C internal chains, as a privileged chain (directly wired from a scan port). One can notice a tiny improvement compared to the previous tests, but again the pattern set is around 17,000 patterns, which is not acceptable. It means that the ATPG tool handles the C privileged chains in the same way as the other chains.



To conclude, no significant difference is visible between these solutions, even when only 1 control bit is declared (the others being don't care). This is not a correct way to implement it either.

IV.2.5) Hints for IPs with no dedicated clock chain


If, despite the above considerations, an IP which does not have a dedicated clock chain must be integrated, two solutions appear:

 Enclose the full IP chain in the top-level clock chain. This enlarges the chain, but the test coverage and pattern set remain correct. However, the shifting time might increase.

 Declare the chain as an OCC scan chain, creating a feed-through in the compressor and thus losing a pair of scan ports. The compression ratio is reduced, and additional scan ports may be required to avoid merging regular scan chains.

IV.2.6) Conclusions
The best pattern set count was around 1400 patterns. It can be achieved either by declaring a single clock chain containing all OCCs (best case), or by removing all care bit constraints in the SPF file (OCC spread, no control bit declared). These tests also achieve the best test coverage (99.55%).

When spreading the OCCs over several scan chains, a small improvement can be seen when declaring only 1 control bit (the other 4 bits being don't care). But this improvement is very small (18,500 versus 17,740 patterns, against 1400 patterns in the best case), and the global overhead is not acceptable.

Since the "no control bit declared" architecture does not pass the simulation step, the only valid way to handle OCC recognition in the design is the best case described above, whenever it is possible to link all OCCs in one single scan chain. This is, of course, the way recommended by Synopsys.

But troubles appear when it is not possible to link all OCCs. In this case, all the considered options gave bad results, and the solutions studied were unsuccessful.

When several scan chains of a design contain OCCs, DfT Compiler, while inserting compression, uses a dedicated pair of scan in/out ports for each clock chain. This can be very port-consuming if OCCs are spread throughout the design, and can ultimately force the tool to merge 2 regular scan chains to fit the compression limits, doubling the shifting time. Also, if the OCC chains are not declared as clock chains, the ATPG struggles when processing patterns because the OCC control bits are all care bits, resulting in a huge pattern set.

As a conclusion of this study, it has been shown that the only viable way to get a correct design and a reasonable pattern set is to use only IPs with a dedicated clock chain (OCC) at IP top level. However, issues can still appear: more and more OCCs are present in designs, and the clock chain can become longer than the regular scan chain length requirements. This would lead to an increase of the test time. The next part deals with the clock chain length and proposes an update to avoid such a shift time overhead.



IV.3) OCC Moore’s law
From the previous part, it was decided that all OCCs should be grouped on a single clock chain to ease DfT Compiler's work. As long as the number of OCC controllers is not too high, this is fine. But current trends show that the OCC clock chain will soon be longer than the regular chains. One can then wonder how this coming issue can be anticipated.

IV.3.1) Problem description


Declaring the clock chains implies the loss of 1 dedicated scan channel for each clock chain, roughly halving the maximum number of internal scan chains inside the compression. The idea is therefore to dedicate only 1 single clock chain through the compression, where all OCC chains are gathered.

However, designs are now heading towards an all-OCC test design style, implying an increasing number of OCCs in order to avoid uncontrolled flip-flops. Common scan chain lengths are around 200 or 300 cells, with a decreasing trend. Given OCCs with 5 control bits in a design with a scan chain length of 200, if there are more than 40 OCCs the clock chain becomes longer than the regular chains, impacting the shift time.

But, as seen previously, cutting the clock chain into 2 chains basically requires a new pair of scan ports. This part deals with this problem, which will become acute very soon in coming designs: as an example, a current project contains around 40 OCCs for a scan chain length of 300.

One can wonder why a single extra scan port is so important. At SoC level, a scan port is a pad. Since DfT ports are plugged onto functional pads, the number of test pads is limited by the functional pads. Dedicating 2 pads just for a couple of OCC clock chains is a very poor usage of this limited resource: if they were added to a compressor, they would drive around 40 scan chains of 300 scan cells.

IV.3.2) Possible improvements


One can then wonder whether all 5 control bits are always required in the OCC controllers. In the Synopsys OCC template the length can vary freely, but the ST-Ericsson OCCs were set to 5 control bits. In fact, most patterns use only part of the 5 capture cycles. Table 3 shows the pattern repartition depending on the number of capture cycles; the pattern counts come from a current ST-Ericsson project, sorted by capture cycles, with both Stuck-at and Transition fault models taken into account.

Fault targeted type   Capture clock cycles   Pattern count   Ratio
Stuck-at              1 (S1)                 28018           49.76 %
                      2 (S2)                 597             1.06 %
                      3 (S3)                 7428            13.19 %
                      4 (S4)                 1980            3.52 %
Transition            2 (T2)                 2823            5.01 %
                      Over 2 (T4)            15459           27.46 %
Total                                        56305           100.00 %

Table 3: Pattern set size of a current ST-Ericsson project, classed by capture clock cycles.



In the above table, the single capture cycle pattern style for Stuck-at represents half of the pattern set. Also, using 3 capture cycles allows the ATPG to target most of the multi capture cycle Stuck-at faults.

This particularity can therefore be used to reduce the clock chain length for some patterns. The idea is not to statically reduce the control bit chain at instantiation, but rather to dynamically expose a given subset of those 5 cells to the ATPG. Indeed, for half of the pattern set, one can expose only a single control bit to the ATPG and force the others to 0, reducing the clock chain to a fifth of its length.

The following parts propose an update of the current OCC clock chain implementing this idea.

IV.3.3) Current ST-Ericsson OCC clock chain


Let us first describe the current ST-Ericsson OCC shift register, which is the part containing the OCC control bits. This module is similar to the Synopsys OCC. The schematic of this part is shown in Figure 17 and the corresponding Verilog source code is in Annex A, Table 8. The 5 control bits are 5 serially linked flip-flops, with a parallel output "p_out" driving the OCC control logic. The scan out port is p_out[4] (towards the next OCC in the clock chain). This part of the OCC will now be modified in order to take advantage of the previous observation.

Figure 17: Current ST-Ericsson OCC clock chain schematic (five serially linked scan flip-flops Ctrl_bit1 to Ctrl_bit5, scan input si, scan output so, shift enable and clk, parallel outputs p_out[4:0]).



IV.3.4) Update proposal for the OCC clock chain: Muxing the design
This update should offer the ATPG the choice of pulsing or not the output clock of each OCC through a given subset of the 5 control bits. It should also offer the user a choice over the number of capture cycles: more precisely, the user should be able to choose between 1 and 5 capture cycles dynamically, for each pattern set ("basic stuck-at" and "fast-seq stuck-at" patterns, for instance).

When processing the first patterns, the ATPG does not need multiple capture cycles, so only 1 control bit of each OCC is used in Stuck-at test mode, while the 4 others are set to 0.

This update hides the 4 unused control bits of each OCC during standard pattern generation. Thanks to this, a clock chain with 50 OCCs is reduced from 250 cells to 50 cells for most patterns. It means that for those patterns (in a design with a scan chain length of 200), the clock chain no longer impacts the shifting time; in the current ST-Ericsson project, this represents half of the pattern set. In the Transition fault model, 2 control bits are enough for most faults, implying a clock chain length of 100 flops, which still does not impact the shift time. Only for the few remaining patterns, including those dealing with memories, will the shift length exceed 200 cycles, up to 250 cycles (a 25% increase).

This method basically avoids creating a second clock chain, and thus an extra pair of scan ports, while limiting the impact on the test time.

The idea is to perform 4 tests:

 Stuck-at, single capture cycle: 1 control bit exposed to the ATPG tool.
 Stuck-at, multi capture cycles: depending on the design, the value should be set after some trials, but typically 3 control bits are enough for most multi-cycle faults.
 Transition, single launch/capture cycle: 2 control bits exposed to the ATPG.
 Transition, multi capture cycles: here again, depending on the design, the value should be set after some trials. The value giving the best tradeoff between fault coverage and a compact clock chain is to be used.

The number of OCC control bits to use is set in the WIR control registers (cf. next section) as a 3-bit register: with a standard binary coding ("001" to "101"), this register selects the number of control bits used.

There is no impact at design top level, since everything is handled by the WIR controller (cf. next section). In the shift clock chain, however, things are quite different, since a possible bypass must be implemented for each control bit (red wires on Figure 18).

The current design is compacted by using scan flip-flops (mux and sequential cell grouped), and 5 new muxes (in red on Figure 18) are added to decide whether each bit is bypassed. The combinational logic is grouped inside the i_logic block (top right corner), which evaluates the bypass enables.

One can see on Figure 18 the sel[2:0] signal going through the logic block and being transformed into a ctrl[4:0] signal that drives the bypass enables.



Figure 18: Schematic of the updated OCC clock chain proposal (sel[2:0] is decoded by the i_logic block into ctrl[4:0] bypass enables, with one bypass mux per control bit). For readability, only 2 control bits are shown on the picture; the real design still has 5.

IV.3.5) Update test results


Table 4 presents the results obtained on the former NXP IP project, for the Stuck-at fault model, confirming the correct behavior of the update. For all tests, the OCC chain was declared as a clock chain, in line with the first part of the OCC study.

                                                 5 control bits   1 control bit
                                                 declared         declared
Full configuration: 5 bits      Test coverage    99.55%           -
                                Pattern set      1402             -
Reduced configuration: 1 bit    Test coverage    25.21%           99.55%
                                Pattern set      169 (wrong)      1479

Table 4: Pattern set size and test coverage for the current OCC clock chain implementation and for the update proposal.

One can conclude that this update proposal modifies neither the pattern set nor the test coverage.

In order to pass the Design Rule Check in TetraMAX (the step that verifies the whole test protocol and learns how to handle the design, and the last step before pattern generation), the SPF must be slightly adapted when not all control bits are used: leaving the SPF file unchanged will make the DRC fail. The extra OCC control bits must be removed from the OCC definition (the bold lines in the SPF extract of Table 5). For instance, if only one control bit is used (basic-scan Stuck-at), all bold "Cycle" lines except "Cycle 0 …" should be removed, and the PLLCycles parameter adjusted accordingly (from 5 down to the number of control bits kept).



ClockStructures ScanCompression_mode {
  PLLStructures "PLL_STRUCT_0" {
    PLLCycles 5;
    Clocks {
      "i_OCC/fast_clk" Internal {
        OffState 0;
        PLLSource "pll_block/Z";
        ExternalSource "shift_clock";
        Cycle 0 "i_OCC/…/p_out_reg_0/Q" 1;
        Cycle 1 "i_OCC/…/p_out_reg_1/Q" 1;
        Cycle 2 "i_OCC/…/p_out_reg_2/Q" 1;
        Cycle 3 "i_OCC/…/p_out_reg_3/Q" 1;
        Cycle 4 "i_OCC/…/p_out_reg_4/Q" 1;
} } } }

Table 5: OCC declaration in the test protocol file. The bold information (PLLCycles and the Cycle lines) should be adjusted depending on the number of control bits used.

Then the ATPG tool, while tracing scan chains, will see nothing but a reduced OCC clock bit chain.
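As an illustration, a hedged sketch of the corresponding TetraMAX sequence is shown below; the file names are placeholders, and the only point of interest is that the adapted protocol (with the reduced PLLCycles and Cycle list) is the one passed to the DRC step:

    read_netlist ip_core_scan.v
    run_build_model ip_core

    # Run DRC with the adapted protocol: only the control bits really used
    # are declared in the ClockStructures section
    run_drc ip_core_reduced_occ.spf

    set_faults -model stuck
    add_faults -all
    run_atpg -auto_compression
    write_patterns patterns_stuckat.stil -format stil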

IV.3.6) Impacts on design


The design impact is very low, since only a few logic gates are required, with a 3-bit control signal coming from a WIR controller. In a C32 technology, the area rises from 83.01 to 84.86 for the whole OCC, i.e. an area increase of 2.2% (+1.85). This can be considered negligible, since the OCC represents a tiny fraction of the global design. Also, no timing impact is expected, since the clock chain is driven by a "slow" shift clock.

However, the number of control bits in the OCCs is determined once per test (depending on the SPF file and the WIR controller test setup). An update of the WIR register must therefore be planned between 2 runs to change the control bit bypass, which can slightly slow down the testing time. To check this, the time impact is evaluated on a concrete case below.

IV.3.7) Update advantages


Let us take an example to see the real benefits of the update (based on the current ST-Ericsson pattern set):
 The targeted faults are Stuck-at faults.
 A test setup takes around 1,000 clock cycles for the design used.
 The set of patterns using only 1 capture cycle contains around 28,000 patterns, out of a total of 38,000 Stuck-at patterns.
 The computation was made for 50 OCCs.
 The regular scan chain length is 200.

The current OCC design adds an extra 50 shift cycles for each pattern (5*50 = 250 OCC shift cycles versus 200 regular shift cycles). These 50*28,000 = 1,400,000 cycles can be avoided with the method described in this study, at the cost of about 1,000 cycles of test setup reconfiguration between the single capture cycle patterns and the multi-cycle patterns.
With the current method, the full test time for these patterns would be:
38,000*250 = 9,500,000 cycles for the current OCC implementation.

In contrast, with the update proposal:

28,000*200 + (38,000 – 28,000)*250 + 1,000 = 8,101,000 cycles for the update proposal implementation.
The gain is around 15% of the test time!



The following lines try, through several configurations, to highlight the benefits of such an update. According to coming trends, the number of OCCs is swept from 30 to 130 in the design, and the benefit on test time is observed in 2 cases: a first design with a scan chain length of 200 (the current project's scan chain length), and another with a scan chain length of 120, towards which current trends are heading.
Figure 19: Test time (simulation length, in millions of clock cycles) versus the number of OCCs, for a design with a scan chain length of 120.
Figure 20: Test time (simulation length, in millions of clock cycles) versus the number of OCCs, for a design with a scan chain length of 200.

The legend below applies to both graphs (Figure 19 and Figure 20, one per case). Table 6 details which pattern sets are included in each curve (the Sx and Tx labels refer to Table 3).

 Dark blue: the worst case, keeping the current implementation.
 Red: obtained when running all single capture Stuck-at patterns with an OCC clock chain length of 1.
 Green: same as the red curve, plus all 2-capture-cycle Transition patterns using an OCC clock chain length of 2.
 Purple: same as the green curve, plus all Stuck-at multi capture cycle patterns limited to 3 cycles. This implies the loss of a few patterns, but does not impact the fault coverage.
 Light blue: ideal case where the OCC clock chain is either split or at least shorter than the regular chains.

Patterns shifted with   Dark blue             Red               Green           Purple
1 cycle                 -                     S1                S1              S1
2 cycles                -                     -                 T2              T2
3 cycles                -                     -                 -               S2+S3+S4
4 cycles                -                     S2+S3+S4+T2+T4    S2+S3+S4+T4     T4
5 cycles                S1+S2+S3+S4+T2+T4     -                 -               -

Table 6: Detailed pattern count for each possible number of capture clock cycles, depending on the simulation processed.



Comments
On the previous figures, one can notice the threshold value (24 to 40 OCCs, i.e. roughly the regular scan chain length divided by the 5 control bits) below which the update does not improve the test time. The update still allows ATPG runs in that configuration, however, so the design can handle both situations.

Above this threshold, the reduction ratio grows with the number of OCCs, up to a 50% reduction for high OCC counts. The red and green curves show the gain without losing any fault coverage, and the purple curve provides a nice enhancement while losing a few patterns for Stuck-at multi capture cycles.

IV.3.8) Conclusions
The current implementation of the ST-Ericsson OCC uses 5 control bits. These 5 bits are sometimes all needed, but most patterns require only a limited number of capture clock cycles. With the growth of designs, the number of OCCs will rise, and the clock chain (including all OCC control bits) will soon be longer than 200 or 300 cells, i.e. longer than the regular scan chain length requirements. This will lead to an increase of the shifting time, depending on the number of OCCs in the design.

The proposal offers a solution to avoid this problem by limiting the number of control bits to shift for most patterns, namely simple Stuck-at and simple Transition patterns. Then, only the patterns with multiple capture cycles (fast-sequential), mainly those targeting memories, impact the shifting time and therefore the test time.

With this update proposal, and according to current project figures, it has been shown that the test time increase (up to 50%) can be avoided as the number of OCCs grows.

A further benefit of such an update would be, by dedicating more bits in the WIR registers, to configure each OCC individually, using for instance 2 control bits for a given OCC and 4 for another, depending on the targeted part of the design (memories versus regular simple logic cones). Moreover, this could be done without modifying the OCC structure any further.

IV.4) Conclusion
As a main conclusion, one can assert that the OCC is a mandatory feature which is more and more widely used. However, one should take care of how the control bit chains are placed in an IP when it comes to higher-level compression design. Also, an update of the current OCC clock chain was proposed to anticipate the rise of the number of OCCs in designs. It avoids splitting the clock chain into 2 smaller ones when it reaches the maximum scan chain length, and thus avoids an extra pair of scan ports at top level. This update has no real impact on the design, but can provide a significant test time reduction.



V) IEEE1500 Standard and Synopsys wrapper feature
The last feature, the wrapper, is a powerful tool, as it allows test pattern generation to target one core at a time in a multi-core chip. This is the purpose of this section. Core-based test is presented first, showing its advantages over a regular full-chip test; then the official IEEE1500 standard is presented, together with its limitations and the way it is used in ST-Ericsson flows and Synopsys tools.

V.1) Core-based test methodology


Chips are getting bigger and bigger, and tester limits are being reached. Also, when testing the whole device, a lot of power is required to switch cell states, mostly when shifting patterns. This power consumption can be higher than the regular functional power, and this over-consumption can create issues during test, flagging a chip as defective while it is in fact functional. These new constraints lead to new methodologies [1], among them the core-based methodology [7]. Given a 3-core chip, this methodology lets 3 DfT teams work in parallel, one per core, ideally reusing each core's test patterns at chip level.

Table 7 lists the pros and cons of the core-based methodology compared to a regular "all-at-once" methodology [8].

Traditional:
 ATPG at top level
 Can only start when the full chip design is done
 Requires implementation details and knowledge of all cores
 Very large SoCs increasingly difficult to handle by test tools

Core-based:
 Joint responsibility
   – Core provider: core-internal DfT and ATPG
   – Core user: top-level DfT and ATPG
 Allows concurrent engineering
 "Divide & Conquer" approach
 Test development might be sped up by test reuse
 Test coverage guaranteed at top level
 Core pattern expansion if no compression used

Table 7: Traditional versus core-based test methodology

V.2) Standard wrapper feature


To apply such a methodology when testing the whole chip, the cores must be tested one after the other. While the first core is tested, the other ones must not impact the test. This is the aim of the core wrapping technology: it isolates each core from the others. Each core is then tested standalone, and a last run can be done to test the chip interconnections (between cores).

V.2.1) Core wrapper


Assuming a core integration, wrapping the core implies capturing all functional core inputs and outputs in so-called wrapper cells. A wrapper cell (Figure 21) includes a scan cell (green block in Figure 21) and a multiplexer. The cell is the same for inputs and outputs; only the driver of the capture_enable signal differs, as it is always set to opposite values for inputs and outputs.

Figure 21: Standard wrapper cell with a scanned cell (green) and a mux to select the wrapper cell mode.

Figure 22 gives a global picture of what a wrapper looks like. The red wire connects all wrapper cells as a regular scan chain, and the glue logic block drives the capture signals for the output and input cells.



Figure 22: Synopsys wrapper implementation with input and output wrapper cells.

In functional mode, data still flows through the wrapper cells, but the scan cell (SDF cell in Figure 23 and Figure 24) is bypassed. The wrapper feature then only adds a mux (the capture mux) in the functional path from the port to the first cell inside the core. In test modes, however, the wrapper cells play two roles:

 When the core is tested, all input cells are loaded (through the red scan chain) with the input pattern of the corresponding functional input, and the output cells capture the values expected at the corresponding functional output port. In this way, from the core's point of view, only the core itself is visible, with no surrounding interconnection.
 When the core is not tested, the interconnection around the core can be tested. The input wrapper cells capture upstream values, and the output cells are set, through the wrapper scan chain, to values decided by the ATPG. In this way, the core appears transparent and the interconnection can be tested.

Figure 23: Wrapper cells in inward facing mode.
Figure 24: Wrapper cells in outward facing mode.

Two test modes are then required for each core, namely an inward facing wrapper test mode (called wrp_if) for core testing (Figure 23), and an outward facing wrapper test mode (called wrp_of) for interconnection testing (Figure 24). The 2 modes are represented in the figures above: Figure 23 shows an IP in inward facing wrapper mode, taking as functional inputs the pattern loaded through the wrapper chain, while the output values are captured in the output cells. When the pattern is unloaded, its correctness can be verified, as with a regular chain. Figure 24 shows the interconnection testing: the output cells of core A's wrapper set values, and core B's input wrapper cells capture the results. The same shift process is used again to get the output pattern and check its correctness.

A single wrapper chain is mentioned above, but for real IPs with several thousands of ports, several wrapper chains are required, because their length has the same impact on test time as the regular core scan chain length. Note also that the figures do not show the internal regular scan chains and scan ports, but those remain as explained before. Internal scan ports, as well as clocks, are excluded from the wrapper so as not to perturb the usual behavior of the core. The wrapper feature simply adds controllability and observability over the functional core inputs and outputs.

V.2.2) Chip top integration


At chip level, several wrapped cores are integrated. A common way to group the regular "internal" scan chains of each core is the following: since each core is tested alone, the same scan pads can be reused for each core, as in the multiplexed architecture of Figure 25 (red wires). Selection signals choose which core is under test and drive the mux selector.

Usually, the core internal scan chains already go through a compression feature inside each core to limit the number of scan ports, which makes it impossible to compress them again. However, each IP has around 10 to 20 wrapper chains which are not compressed, and these chains cannot be muxed with other cores' wrapper chains since they are all used together. For a 3-core chip, this can amount to around 50 wrapper chains to handle. A chip top level compression may then be used to limit the pad impact, as shown on Figure 25 (blue wires).

Figure 25: Chip top DfT architecture. Red wires are the internal scan ports, muxed together across the wrapped and compressed cores A, B and C (selected by core_select); blue wires are the extest wrapper scan chains, handled by a chip top level compression (decompressor/compressor).

V.3) The IEEE1500 Standard


To standardize this feature, the IEEE working group decided in 2007 to define a new standard called IEEE1500, or Standard Testability Method for Embedded Core-based Integrated Circuits [1]. This part first explains the most important rules of this standard, what it implies, and its limitations; then, the way it is actually implemented in SoC designs is presented.

Pierre Schamberger November 2012 page 30


V.3.1) Global specification
The IEEE1500 standard can be summarized with Figure 26. The situation is the same as explained before: a core to be wrapped in a shell. However, the "wrapper" is somewhat more complex than previously described.

Figure 26: IEEE1500 standard global view of the wrapper feature, with the wrapper cells (WBR) and the regular internal scan chains (WPP).

The signal nomenclature is the following:

 WPP: Wrapper Parallel Port: these are the regular core internal scan ports.
 WSP: Wrapper Serial Port: includes the Wrapper Serial Input/Output (WSI/WSO) and the Wrapper Serial Control (WSC) signals needed to drive this architecture. The mandatory WSC signals are:
o WCLK (clock), WRSTN (reset), UpdateWR, ShiftWR, CaptureWR, SelectWIR

The role of each signal is not detailed here; the reader should refer to the official IEEE1500 documentation [1] if needed. Globally, data from the WSI port is shifted into one of the following shift registers, which together form the IEEE1500 wrapper block:

 WBR (Wrapper Boundary Register): these registers correspond to all the wrapper cells previously presented, handling the functional core ports. All wrapper cell values are shifted through the WSI/WSO ports (limiting the number of wrapper scan chains to 1).
 WIR (Wrapper Instruction Register): stores the mode the block is in, i.e. which shift register will shift data from WSI to WSO (and then be updated).
 WBY (Wrapper Bypass Register): used as a default shift register when the data is not destined to this wrapper block. The standard indeed allows serially chaining all wrapper blocks, from one WSO to the next WSI port, to build a single wrapper serial chain.
 WDR (Wrapper Data Register): holds all test control signals, such as those defining test modes, wrapper and compression use, and clock management. A WDR can be of 2 types: "Control", which sets values at test setup to drive test switches throughout the design, and "Observe", which can capture global values, mostly from clock management, and unload them through the WSO port. The "Observe" type is less often used.



V.3.2) Limitation
On paper, this standard provides new features, such as the WDR to set static test control bits in the design without dedicating a pad to each value. The WBR feature corresponds to the wrapper chains presented in the previous part. However, this standard has a major drawback which makes it unusable directly: there is only a single WSI/WSO pair of scan ports for all the wrapper cells.

On the IP considered, there are around 1000 inputs and 1500 outputs, which means a wrapper scan chain of 2500 cells to shift for each pattern. When the remaining scan chains are aligned on a length of 200 or 300 cells, this is clearly not implementable.

Thus, this wrapper scan chain is usually cut into several smaller chains, aligned on the internal scan chain length and stitched to dedicated scan ports, outside the IEEE1500 rules (and usually compressed at chip top level afterwards). This corresponds to what was described at the beginning of this wrapper section.

V.4) IEEE1500 implemented logic


Despite this limitation, this standard is commonly used, and the control signals (WSP) are used to drive so-called WIR blocks (each of which actually contains a WIR, a WBY, and one or several WDRs).

Figure 27: Implemented IEEE1500 Standard WIR Controller

Previously called TCB in the IEEE1149.1 boundary scan context, this module contains the WIR, WBY and WDR shift registers and is commonly called "WIR" or "WIR Controller". These are strictly speaking misnomers, but the following parts use these terms for simplicity. Figure 27 shows how it is implemented.

The WIR signals (top left corner of Figure 27) are still the same as in the IEEE1500 standard. The "WDR Observe" module is less used. Basically, at test setup, a first sequence (a few clock cycles) is used to shift a code into the WIR block, which decides which shift register will be used next. It is usually the WDR Control, containing from a hundred to a thousand registers controlling the whole design (clock management, test modes, …). After that, the WIR is set towards the WBY register, and the real chip test can start.

A particularity of the WDR Control shift register is the possibility to use "dynamic control bits". Instead of setting a static value for the whole test, a dynamic control bit is caught in a scan chain, and its value can be determined dynamically by the ATPG. This is useful for core-based test time savings and is discussed in the next section.

This module is generated and integrated at RTL level, and is already present in the design when DfT insertion starts. One should note that the WIR test setup must be well defined, so that the right test control switches are enabled before the ATPG step, in order to pass the design rule checking step and achieve correct scan chain recognition by the ATPG tool.

V.5) Synopsys wrapper flow


The wrapper used to be inserted at RTL level with previous tools; Synopsys does it at gate level, during DfT insertion. This feature is easy to use: a single command line to enable wrapper insertion and a list of ports to exclude from the wrapper are enough to implement it [3], as in the sketch below.
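A hedged sketch of such an insertion script is shown here; set_dft_configuration and set_wrapper_configuration are the usual DfT Compiler entry points, but the exact option used to list excluded ports (clocks, resets, scan and WIR serial ports, …) is not reproduced and should be taken from the tool documentation [3]:

    # Enable wrapper insertion on the current core
    set_dft_configuration -wrapper enable
    set_wrapper_configuration -class core_wrapper

    # Clocks, resets, scan ports and WIR serial ports are excluded from
    # wrapping through an exclusion list (option not shown here)

    preview_dft
    insert_dft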

However, Synopsys does not support IEEE1500 compliance, and the instantiated wrapper is as simple as in Figure 22. In return, it is very easy to control the wrapper scan chain length and the number of wrapper chains.

The wrapper creates 2 test modes: inward facing and outward facing. Each test mode can have dedicated scan ports and can also use a different scan chain length. The main point is that the inward facing chains are usually included in a core compression feature, to limit the number of scan ports, whereas the outward facing chains have to be carried up to the chip top level, where they will be compressed too, together with the other cores' wrapper scan chains.

This feature has, however, a set of limitations. For simple wrapper flows it is easy to use, but when combined with other features (compression, pipeline, complex compression flows, …), either unsupported flows or serious errors arise. These limitations have to be worked around for current complex designs. This is the aim of the following parts, and also the main work achieved during this master thesis.



VI) IP wrapper & compression flow
After seeing what DfT Compiler can do, it is now time to present the achievements of this master thesis. After a fairly long learning period on the Synopsys tools' behavior, the first goal was to combine the wrapper feature with a compression of the internal scan chains on a given IP. This is the main topic of this section.

VI.1) Requirements
The first assignment was to write a DfT Compiler script meeting the requirements of Figure 28. Starting from a regularly scanned core with regular scan chains (brown block), the idea is to add a wrapper (WBR chains) around the core, with dedicated inward facing (intest) and outward facing (extest) ports (si/o_wbr vs. si/o_wbr_ext) at the core wrapper hierarchy, and a global compression (green blocks) over the internal scan and wrapper scan chains at the test compression hierarchy level. Apart from a few points, this can be done with Synopsys tools by activating the wrapper and compression flows.

Figure 28: IP Wrapper & Compression flow requirement with double bypass for each wrapper mode, and a complete control over
boundary cells direction.

The above figure shows extra features around the wrapper (WBR) chains which are not supported by the Synopsys tools. These bypass paths are discussed in the next part.

VI.2) Wrapper bypass paths


The benefits of wrapper cells were shown in the previous part, where it was explained that the Synopsys tools can handle the IP with 2 exclusive test modes: intest and extest. This behavior has a serious limitation, since there is no hardware possibility to test the chip interconnect (usually the extest wrapper mode) and the IP internal logic (intest wrapper mode) at the same time. Given a 2-core chip, if, after testing core A in standalone mode, one could test both the interconnection and core B at the same time, the interconnect test time would basically be saved [7]. With the Synopsys flow this is not possible, since activating both modes at chip level would include the wrapper cells both in the IP internal compressor and in the chip top level compressor, making test pattern generation fail.

To avoid such a behavior, bypasses were implemented (cf. Figure 28). 3 test control signals are used to provide full flexibility during test. By setting the wir_sel_wbr_chain control bit, the user decides which compression (core or chip level) handles the WBR (wrapper) chains:

 If set to 1 (Figure 29), the chip top compression drives the wrapper chains (wir_bypass_wbr_ext set to 0), and the IP compression sees only the bypass path (wir_bypass_wbr set to 1).

Figure 29: Wrapper bypass paths in extest mode.

 If set to 0 (Figure 30), the IP compression handles the wrapper chains (wir_bypass_wbr set to 0), and the chip top compression only sees the bypass chain (wir_bypass_wbr_ext set to 1).

Figure 30: Wrapper bypass paths in intest mode.

The bypass includes a flip-flop, to avoid a scan chain length of 0 (which would make the ATPG fail), and a latch, to handle a possible clock domain change.

To test the interconnection and a core at the same time, the wrapper cells must be able to switch from intest mode to extest mode, in order to capture faults either from the core or from the interconnection. The generated WIR block allows such a behavior by making the test mode control bits (generated by DfT Compiler) dynamic. Thanks to this feature, these 2 control bits can change dynamically from one test pattern to another, reversing the wrapper cells' behavior. The 3 control signals above must, however, be held static, so that the wrapper chains are captured either in the chip top compression or in the IP compression.

Also, global IDDQ test pattern generation should be done with all logic testable, meaning both the interconnection and the core logic. Activating both at chip top level would cause a failure, since the wrapper chains would be caught in 2 activated compressions; thus, this cannot be achieved with Synopsys tools alone.



All these requirements go beyond the Synopsys tools' features, and custom actions must therefore be taken.

VI.3) Design under Test


The implementation and tests were performed on a former NXP IP with the following settings:

 256 regular core scan chains of length 200.
 A clock management unit inside the core, including Clock Controller Blocks (CCB), which are former NXP-homemade OCC-like devices.
 Test Control Blocks (TCB) implemented inside the core to control the test signals. A TCB is a former NXP device equivalent to a WIR, but without the IEEE1500 standard interface.

VI.3.1) Extra updates


Due to the last 2 points of the above list, some updates were made in order to make the design Synopsys (OCC) and IEEE1500 (WIR) compliant, replacing the Clock Controller Block (CCB) and Test Control Block (TCB). The team had never worked with OCC and WIR before, and a significant part of the master thesis was actually devoted to learning how to use and integrate these features, migrating from the previous ones.

In practice, a WIR and an OCC were inserted at the test compression hierarchy level, while the core itself kept the old features; interface modules were created to handle this architecture. Current IPs are now fully WIR and OCC compliant.

VI.4) Solutions proposed


Attempts were made to implement such an architecture exclusively with Synopsys tools (command-line batch files), without success. Having a fully Synopsys-native flow is not a requirement, but it would simplify the flow and, above all, would directly produce correct test models and test protocols, avoiding manual edits after each run. In fact, the design and its corresponding test files have to be shared with other design teams (SoC integrators, test engineering, …), and the flow has to be kept as simple as possible. The flow should also be technology independent, since it might be reused on a future project, which makes netlist editing inadvisable.

VI.4.1) 2-level hierarchy flow


According to the Figure 28 requirements, the first attempt used 2 extra hierarchies around the core to implement the architecture, namely:

 IP_shell: this hierarchy level includes the wrapper features.
 IP_shell_tk: this level includes all compression modules, as well as the OCC and the WIR block.

A first pass is made at IP_shell level to insert the wrapper, with control signals coming from above (the WIR Controller at IP_shell_tk level). A test model is built with the following 3 test modes to fit the wrapper insertion (a hedged command sketch follows the list):

 A first test mode, called Internal_scan, includes all scan chains of the IP core (no wrapper consideration).
 A second one, called My_extest, includes exclusively the wrapper scan chains in extest mode.
 The last test mode, called My_intest, includes the wrapper cells in inward facing mode, as well as all core internal scan chains.
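A hedged sketch of how these test modes could be declared in DfT Compiler is given below; the -usage values follow the wrp_if/wrp_of naming used by the tool for wrapper modes, but the exact syntax should be checked against the DfT Compiler documentation [3, 4]:

    # Three test modes at IP_shell level
    define_test_mode Internal_scan -usage scan
    define_test_mode My_extest     -usage wrp_of
    define_test_mode My_intest     -usage wrp_if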



2 test mode signals (referred to as TM) are created to encode the test modes. They are stored in the WIR controller at the IP_shell_tk hierarchy level above. The test mode signals also drive the wrapper state, making dynamic wrapper reversal impossible. The first test mode (Internal_scan) is unused, but is automatically built by the tool.

As explained before, no bypass can be generated with the Synopsys tools. It was therefore implemented as an RTL block (inside the red boundary on Figure 31), compiled and inserted after the Synopsys DfT insertion, as a workaround. The wrapper chain extremities are connected to this block to provide the expected result. The 3 extra control signals (wir_sel_wbr_chain, wir_bypass_wbr, wir_bypass_wbr_ext, in gray on Figure 31) are statically driven from the WIR controller of the level above. When these 3 test control signals are set to the correct values (respectively "001" in intest and "110" in extest), this modification has no impact. However, if the tool is allowed to play with the TM signals, defined as dynamic control bits in the WIR, this only affects the wrapper cell capture signal, and the wrapper can be reversed from inward facing to outward facing dynamically during pattern generation.

Figure 31: wrapper bypass RTL block implemented (inside red boundary).

The second pass is done at IP_shell_tk level to implement the compression over the internal scan ports and the inward facing wrapper ports (si_wbr ports). The My_extest scan ports (si_wbr_ext ports) are not compressed and are used as another regular scan mode.

The integration should use the IP_shell test model file generated during the previous run to get the information about the sub-block. However, the integration is not straightforward, since this test model contains wrapper information and so-called "user-defined test modes" (My_extest, My_intest); these are 2 limitations that make test model integration impossible as-is. The test model must therefore be cleaned with Perl scripts to make it pass the integration. The main concern is to avoid corrupting the test model/protocol files, which is not easy: their structure is not obvious, and a single wrong change can corrupt all the data when the file is read again afterwards (ATPG, simulation, …). The main changes were to remove the unused Internal_scan test mode, to modify all the wrapper information (which first has to be located in the file), and to rename the My_intest test mode into Internal_scan, which is the only test mode allowed for test compression integration. Once this test model modification is done, the compression insertion is quick, using a regular scan compression flow.

ATPG runs were performed mainly to verify that all the clock information was correct. The fault coverage was also checked: this IP had been tested before with a very good coverage (around 99.5% of targeted faults), and after the previous steps the coverage should remain unchanged; a lower value would indicate a problem in the DfT implementation. Finally, the generated patterns were simulated with NCSim to check their correctness.



VI.4.2) Single hierarchy flow
A second, more straightforward flow was set up after discussion with Synopsys support. It avoids test model editing between two steps, since it contains only one: by enabling both the wrapper and compression flows at the same time at IP_shell level, the tool can produce almost the same result. Figure 32 shows the resulting architecture, listing all test functionalities.

A WIR block is used to control the test mode signals (TM1, TM2 and TM3), and an OCC controller (black block) is used to control the clock used by the wrapper and the WIR (black lines). The wrapper cells (in the green block, Wrapper Boundary Register) and the compression feature (decompressor and compressor) are both generated in the same DfT insertion run. A WIR serial chain is created from the wsi input ports, through the shell-level WIR controller and those inside the core, to the wso output ports. Similarly, and in line with the OCC explanations of the previous part, the shell clock chain is handled in a dedicated global clock chain grouping all OCC control bits (shell OCC + core OCC). To build this single chain, scan cell groups are created, and a scan path is declared through the shell clock chain and dedicated core ports.

This script is fully compliant with the Synopsys flows and provides a cleaner result, since no handmade step is required. However, in this flow the wrapper bypasses were not implemented, in accordance with Synopsys: there is no possibility to switch the wrapper state dynamically from outward facing to inward facing. Also, at chip top level, the ATPG will provide a pattern set for each core, plus one to test the interconnection, which can imply a small test time increase. In fact, the created wrapper is directly plugged into the compression feature, and it is not possible to modify the netlist as in the first flow to add bypasses and make the TM test mode signals dynamic. If the TM signals were dynamic, the inputs of the wrapper chains would come randomly from the extest or intest test ports, due to the scan-in multiplexer on the wrapper chains.

Figure 32: IP wrapper & compression flow in a single hierarchy. No bypass is implemented, but the flow is fully compliant with the Synopsys tools.
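A hedged sketch of this single-pass insertion script is given below: the wrapper and the compression are enabled for the same insert_dft run, producing the architecture of Figure 32. The chain count is illustrative, and some option names may differ between tool versions:

    # Enable both features for the same insert_dft run
    set_dft_configuration -wrapper enable -scan_compression enable
    set_wrapper_configuration -class core_wrapper

    # Compression setup (internal chain count is illustrative)
    set_scan_compression_configuration -chain_count 267

    create_test_protocol
    dft_drc
    preview_dft
    insert_dft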



VI.4.3) User custom flow
A full flow, from a core netlist to ready-to-use simulation files, was built using TCL, Perl and Bash scripts, generic enough to fit various IP cores easily. It implements the second, single-pass architecture. A configuration file must be filled with the correct values for specific signal names and key figures (input ports, scan chain length, …); then a small customization of some scripts is required to run the flow. Figure 40 in Annex B lists all the required actions as a step list.

VI.5) Conclusion
Two flows implementing the wrapper and compression architecture were built and tested. The most straightforward one uses only one extra hierarchy, but does not allow the bypass implementation. The first flow, using 2 extra hierarchies, is more flexible and allows the wrapper bypass implementation. For simplicity, the choice was made to automate the simpler one, but the bypass implementation scripts remain available for later use.

This IP wrapper & compression flow was a first task to master the tools; it was followed by a real, up-to-date project, which is the subject of the next and last section.



VII) X/J projects flows
The aim of this part is to use all the features previously introduced in a current ST-Ericsson project. The top module is a modem, which is to be delivered both as an IP (project X), to be integrated at a higher level in a chip, and as a standalone chip (project J). While the functional part is very similar, the DfT work is quite different. The idea is to handle both projects with a common DfT insertion flow for as long as possible, and to split it only at the last moment.

VII.1) The project


The targeted design (Figure 34 for the X IP, Figure 35 for the J SoC) includes 2 power domains, namely Vcore and Vsafe. The Vcore part is mainly the modem itself, whose power supply can be switched off when unused, and the Vsafe part includes all the remaining components which need a constant power supply. The architecture was defined to fit the following requirements:

 Power aware: limit both test and functional nets crossing power borders, to avoid extra logic insertion. Also, an independent test architecture should be implemented for each power domain: for test purposes, these 2 domains must be seen as 2 blocks with no interaction. For debug, one could even imagine testing one power domain with the other one switched off.
 Targeting 2 projects: maybe the most challenging part, the aim is to work for both projects. The IP will have an extra hierarchy, called IP_shell (light blue on Figure 33), where dedicated test logic will be inserted. The chip project will stop a level below, at the soc_top hierarchy (Figure 35), where the chip top test blocks (TAP_CTRL, …) will also be inserted. But below the IP_top level, everything, both functional and test, is similar; only the last tool run is therefore project dependent, and all the rest can be shared.

Since this work had to be done on a critical path of the project roadmap, a test case including the same test specificities was built in Verilog. The test case (initial view on Figure 33) had the same hierarchies, with a few logic gates and sequential elements in some parts (green blocks). The design includes WIR controllers and is fully On-Chip Clocking controller compliant: each functional clock is handled with an OCC controller and a shift clock for test purposes. Both are already inserted at RTL level, and the WIR wrapper serial interfaces are linked from one WIR controller to another. 2 scan chains of WIR controllers are built (red lines), one for each power domain.

Also, a sub-module, the Modem Core (gray box), will be provided by another team with DfT already inserted (among others, 2 compression units on the scan ports, with pipeline stages). A test model will be given by the developing team, to be integrated at IP and SoC levels. The targeted IP architecture is shown on Figure 34.

Figure 33: Base of the IP X project (hierarchies IP_shell, IP_top, IP_iso, IP_stub_shell, IP_stub, IP_core, IP_vcore_shell, IP_vcore_top, IP_vsafe_iso, IP_vsafe_shell and IP_vsafe_top around the Modem core, with WIR 1500 blocks, scan segments and free flops). WIR and OCC are inserted at RTL; all other features are to be inserted.



Figure 34: IP X project DfT requirements with 2 compressions at different levels, and a Wrapper feature.

The IP architecture includes 2 compression modules (at the IP_stub_shell and IP_shell levels, depending
on the power domain) and a wrapper (in pink), to be included in the Vsafe compression. The number of scan
ports is given on the figure, for a total of 128 scan ports (64 inputs and 64 outputs). Because the IP project
will be integrated in a bigger chip, a core-based test methodology will be applied at chip level. For that
purpose, the wrapper discussed in the previous sections, including bypass paths, will be implemented here. The
aim is to deliver a fully tested IP with its set of test patterns, in order to limit the integrator's work.

The next figure details the chip J project, which is quite similar to the IP X project.



Figure 35: Chip J project DfT requirements with the same compression architecture as IP X project, plus serializer at soc_shell level.

In the chip J project, the scan compression structure is the same, but there is no wrapper, since it is
pointless in a chip project without other IPs. However, due to the small dimensions of the chip, the number
of pads is limited, and thus the number of scan access ports too. To compensate for this, a serializer feature is
added to divide the test ports by 2 (from 128 ports to 64 ports). Also, some chip top level logic (such as the
Test Access Port controller) is inserted, and some of this logic will be scanned too (green scan segments in the
figure), included in the Vsafe compressor.



VII.2) Tool limitations
This part deals with the tool limitations which prevent this architecture from being implemented directly.
First, a so-called hybrid flow has to be used when inserting compression at a given level while a sub-level
already has compression logic inserted. This is the case here, with the Modem core including 2 compression
modules. However, the wrapper feature does not work with this hybrid flow. This first limitation heavily
impacts the flow. Two solutions then appeared: either hiding the wrapper feature from the tool, or hiding the
sub-block compression modules from it. The two resulting flows are presented below in the “Flow proposals” part.

Another limitation when using a hierarchical flow and bringing up information from sub-blocks is
that no user-defined test mode can be used. The impact of this limitation clearly appears when
integrating the IP X project: a wrapper implies dedicated test modes for intest and extest, as
defined in a previous part. This is another limitation to work around, discussed in the “Integrating IP X
project” part.

For the chip J project, serializers have to be inserted to limit the test port impact. Again, this is an issue
because the serializer flow does not work within the hybrid flow. The serializers are therefore inserted at
another level dedicated to that purpose (soc_shell). Such a change in the hierarchy was possible because of
the early stage of the chip J project. For the IP X project, however, there is no way to change anything, and
the architecture has to be implemented as described. This is discussed in the “Targeting chip J project” part.

VII.3) Flow proposals


As for the previous IP wrapper & compression flow, many trials were run, facing many tool
limitations. Most trials failed at some point, or became too complex to be implemented. Bugs
and tool limitations were reported to Synopsys support for analysis and for improvement in coming releases.
Synopsys also provided support by validating the flows which appeared to work, and helped to debug
when needed. Among all trials, 2 flows deserve to be described, since they are considered as working on the
test case built (at the time this report is written, the design is still under implementation, and only a few DfT
insertion steps can be done). The flows described below focus on the IP X project implementation; the
differences with the chip J project are handled in the “Targeting chip J project” part.

VII.3.1) Bottom-up flow


The first proposal is a flow called bottom-up, or hierarchical, which was often used in previous designs.
It starts with the lowest blocks in the design (here IP_vsafe_top and IP_vcore_top) and inserts DfT for each block
above. This is quite a long process, requiring as many steps (DfT insertion runs) as hierarchy levels, but it
provides the ability to declare and define all clocks and scan chains with a very good granularity. At each
step, a new netlist is created, with new test signals and ports, and used as the netlist input of the next step. A
test model is also usually written, to be used at the next step. For the IP X project, the steps are defined in
Figure 36. Starting from the Vsafe_top and Vcore_top levels up to IP_top, the figure describes what is done at
each step.



• Vsafe top: Creating regular chains (300 cells/chain). No test model used. No test mode created.
• Vsafe shell: Just stitching internal chains to new ports. Using Vsafe_top previously generated test model. No test mode created.
• Vsafe iso: Just stitching internal chains to new ports. Using Vsafe_shell previously generated test model. No test mode created.
• Vcore top: Creating regular chains (300 cells/chain). Using CTLGen of Modem_core block without compressors. No test mode created.
• Vcore shell: Just stitching internal chains to new ports. Using Vcore_top previously generated test model. No test mode created.
• IP core: Merging the 2 designs. Just stitching internal chains to new ports. Using Vsafe_iso and Vcore_shell previously generated test models. No test mode created.
• IP stub: Creating regular chains (300 cells/chain) and stitching internal chains to new ports. Using IP_core previously generated test model. No test mode created.
• IP stub shell: Inserting compression for Vcore scan chains. Excluding Vsafe chains from compression. Using IP_stub previously generated test model. ScanCompression_mode and Internal_scan test modes used.
• IP iso: Just stitching internal chains to new ports. Using IP_stub_shell previously generated test model. ScanCompression_mode and Internal_scan test modes used.
• IP top: Creating regular chains (300 cells/chain) for IP_top cells and stitching internal chains to new ports. Using IP_iso previously generated test model. ScanCompression_mode and Internal_scan test modes used.

Figure 36: X/J project bottom-up flow steps. Each line describes a step with the main actions to be done.

As an example, at IP_stub_shell, no scan replacement is made, since there are no cells in that hierarchy.
Then, a compression feature is inserted for all scan chains from the Vcore power domain, namely those coming
from IP_vcore_top and IP_stub. The other scan chains are excluded from compression. This step takes as input a
netlist and a test model from the previous step (IP_stub). In the new IP_stub_shell design after DfT insertion, 2
test modes are present, ScanCompression_mode and Internal_scan. The first one uses the compression
modules, and the second one bypasses the compressors and therefore contains long scan chains.



For all steps, the test model from the previous step is used, except for the Vcore_top step. In fact, there is no
way to integrate correctly the test model with compression from the Modem core: it would alter the whole
flow, making the IP_stub_shell and IP_shell compression insertion impossible. As a workaround, a fake test
model is generated with the CTLGen tool.

After describing all sub-steps to reach the IP_top level, the last one is the IP_shell DfT
insertion, which is the most challenging. The idea is basically to reuse the IP wrapper & compression flow from
the previous part. A single thing differs from that flow: the compression does not have to compress
all internal scan chains, since some of them (the Vcore and Modem_Core chains) are already handled in the
compression at IP_stub_shell.

A first step is then to create a fake test model for the IP_top block with all scan ports declared (with a
virtual internal scan chain length) using CTLGen. Then, by excluding the Vcore and Modem_Core scan chains from
compression and taking care of the OCC internal clock chains, the flow is exactly the same as the IP wrapper
& compression flow.

However, the test protocols generated will be completely wrong, since no compression information
from the IP_stub_shell and Modem_Core compressors is given. A correct test protocol therefore has to be built
by hand in SPF, taking the generated one as a basis: the compressors for Modem_Core and Vcore are added to it,
and the resulting file is then fully ATPG compliant.

VII.3.2) Top-Down approach


The previous flow is very hand-crafted and thus quite unstable. The purpose of the following flow is to
provide better final test files without building them from scratch. The main idea is to set aside what has been
done before and look at the problem from a new point of view. The biggest limitation is actually the mix of
the wrapper and the hybrid flow. In the previous flow, the hybrid flow was avoided; in this flow, the wrapper
will be hidden from the tool.

Two parallel tasks have to be achieved first: inserting DfT in Vsafe_iso and in Vcore_shell (which includes
integrating the Modem core test model in the Vcore block), and scanning both cores with regular scan chains of
300 cells.

Then, the tool is used to insert a wrapper at IP_shell. But all the wrapper-dedicated information, ports
and nets are removed afterwards, as well as the test modes. The only remaining part is the wrapper cells, in a
dedicated module called wrapper_shell. In this module, the same bypasses as in the IP wrapper & compression
flow are inserted. From the outside of the wrapper box, it then looks like a regular scanned block. Also,
dedicated intest and extest wrapper scan ports are created in the wrapper box. By creating a simplified
test model of this wrapper_shell, it can be handled at IP_shell as common scan chains. Twice the number of
wrapper chains is declared, namely wrapper_intest_x (green chains on Figure 37) and
wrapper_extest_x (red chains on Figure 37), to use dedicated ports for the intest and extest modes. However,
both sets of chains target the same WBR chain, depending on the state of the bypass control signals. Figure 37
explains the modified wrapper behavior.

Only a couple of ports deciding whether the wrapper is in inward facing or outward facing mode
(test modes and wbr_bypass control signals) have to be plugged manually to the IP_shell WIR controller.
After this step, the netlist is updated and ready for inserting all the compression at once (both Vcore and Vsafe,
with the wrapper chains).



Figure 37: Wrapper modified. WBR chains are handled in both intest and extest wrapper chains.
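
To make Figure 37 concrete, the following is a minimal Verilog sketch of such a wrapper box. All port and signal names (intest_wrapper_scanin, wbr_bypass, and so on) are illustrative assumptions, not the actual ST-Ericsson netlist, and the real block is built from library scan cells rather than plain assignments.

// Minimal sketch of the modified wrapper box of Figure 37 (illustrative names only).
// A single WBR chain is reachable from two sets of scan ports; bypass paths keep
// scan-chain continuity on whichever side does not own the WBR chain.
module wrapper_box_sketch (
  input  wire intest_wrapper_scanin,
  output wire intest_wrapper_scanout,
  input  wire extest_wrapper_scanin,
  output wire extest_wrapper_scanout,
  output wire wbr_scanin,    // towards the first WBR cell
  input  wire wbr_scanout,   // from the last WBR cell
  input  wire extest_mode,   // 1: outward facing (extest), 0: inward facing (intest)
  input  wire wbr_bypass     // forces the bypass path even on the active side
);
  // The WBR chain is fed either by the extest ports or by the intest ports.
  assign wbr_scanin = extest_mode ? extest_wrapper_scanin : intest_wrapper_scanin;

  // Each side either goes through the WBR chain or through its bypass path, so the
  // wrapper_intest_x and wrapper_extest_x chains always stay connected.
  assign intest_wrapper_scanout = ( extest_mode | wbr_bypass) ? intest_wrapper_scanin : wbr_scanout;
  assign extest_wrapper_scanout = (~extest_mode | wbr_bypass) ? extest_wrapper_scanin : wbr_scanout;
endmodule

Seen from IP_shell, this block simply exposes two independent scan chains, which is exactly why the tool can treat them as common scan chains in the following run.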

In fact, the last run reads the test models from the Vcore_shell and Vsafe_iso blocks, handles their scan
chains, and creates the last scan chains in IP_stub and IP_top. 3 test modes are created:

• Internal_scan (wrapper chains and all core chains declared).
• ScanCompression_mode (same as Internal_scan, but this test mode uses the compression).
• My_extest (only wrapper extest chains declared).

To decide which chains are handled by which compressor, the partition flow is used, with 3 partitions:

• Vcore partition:
  o IP_stub scan segments
  o IP_Vcore_top scan chains (from test model)
• Vsafe partition:
  o IP_Vsafe_top scan chains (from test model)
  o IP_top scan chains
  o Wrapper intest chains
• A default partition without compression, just to connect the Modem core scan chains, which are already compressed.

The tool offers the possibility to choose, for each partition, where the DfT logic has to be inserted. For
the Vcore compression, by setting it to the IP_stub_shell sub-block, the tool understands the intent and
implements the correct architecture. At last, with all these specifications, the tool is able to insert the
requested features and, above all, the test models and test protocols are correct and almost ready to use for the
ATPG step. The only mandatory step left before ATPG is the “test setup” generation, needed to start the test
correctly. This is a small part of the test protocol describing the first values to apply on which inputs (clocks
mostly), as well as the values to be shifted into the WIR controllers through the wsi ports.
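
As an illustration of what this test setup has to reproduce, the sketch below is a small, self-contained Verilog stimulus that loads an instruction into a WIR controller through its serial interface. Port names follow the IEEE 1500 WSP terminology; the instruction length, the shift order and the instruction value are assumptions, not the actual project settings.

// Illustrative-only sketch of a WIR load sequence, the kind of sequence the
// generated test setup has to describe. No DUT is instantiated here.
module wir_setup_sketch;
  parameter WIR_LEN = 8;                 // assumed WIR instruction length

  reg wrck, wrstn, wsi;                  // IEEE 1500 WSP signals
  reg selectwir, shiftwr, updatewr;

  // Free-running wrapper clock.
  initial wrck = 1'b0;
  always #5 wrck = ~wrck;

  // Shift one instruction into the WIR, then update it.
  task load_wir (input [WIR_LEN-1:0] instr);
    integer i;
    begin
      @(negedge wrck) begin
        selectwir = 1'b1;
        shiftwr   = 1'b1;
        wsi       = instr[0];            // LSB first (assumption)
      end
      for (i = 1; i < WIR_LEN; i = i + 1)
        @(negedge wrck) wsi = instr[i];
      @(negedge wrck) begin
        shiftwr  = 1'b0;
        updatewr = 1'b1;                 // transfer shift register to update register
      end
      @(negedge wrck) begin
        updatewr  = 1'b0;
        selectwir = 1'b0;
      end
    end
  endtask

  initial begin
    {wrstn, wsi, selectwir, shiftwr, updatewr} = 5'b0;
    repeat (2) @(negedge wrck);
    wrstn = 1'b1;                        // release wrapper reset
    load_wir(8'h03);                     // arbitrary example instruction code
    $finish;
  end
endmodule

In the real flow, the equivalent sequence ends up as vectors in the generated test protocol file rather than in a testbench.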



• Vsafe_iso: Creating regular chains (300 cells/chain).
• Vcore_shell: Creating regular chains (300 cells/chain). Integrating compression from Modem_core block with existing test model file.
• Wrapper insertion: Inserting wrapper feature with DFT Compiler. Removing all wrapper nets, ports and test modes. Adding bypass paths as an RTL block with test mode dedicated scan chains. Creating wrapper block test model with CTLGen, showing wrapper chains as regular chains. Updating netlist with wrapper block.
• Global DfT insertion: Using test model files for Vcore_shell, Vsafe_iso and wrapper blocks. Hybrid flow used with partitions:
  o Default partition with Modem_core chains, integrating Modem_core compression feature.
  o Vcore partition with IP_stub and Vcore chains. Compression modules located in IP_stub_shell.
  o Vsafe partition with Vsafe chains, IP_top and wrapper block. Compression modules located in IP_shell.
  3 test modes created:
  o Internal_scan and its related compressed mode (ScanCompression_mode), including all chains but external wrapper chains.
  o My_extest mode, excluding all core chains and internal wrapper chains (just extest wrapper chains remaining).

Figure 38: X/J project top-down flow steps. Each line describes a step with the main actions to be done.

VII.3.3) Flow evaluation: Top-Down vs. Bottom-Up


The two flows have been described above without any real judgment on their performance. This part
discusses the benefits and drawbacks of both.

First, the bottom-up flow is able to define very precisely, in each block, the clock management and
scan configuration. By inserting everything step by step, it gives a flow setup that is very easy to debug and
allows wide work sharing within the team. However, this flow requires 11 runs of DFT Compiler, each creating a
new netlist and a corresponding test model. It requires starting on the modem core with a simplified test model,
to hide the compression features inside, so limitations appear quickly and test model hacking soon becomes
tedious. Finally, the test model and test protocol of IP_shell have to be built with pieces of information from
everywhere, making them hard to automate. This matters because the ATPG tool requires very precise
information to work correctly, so this test protocol generation is important.



The top-down flow is however much faster, since only 4 runs have to be performed. Work can still be
shared, with the Vcore and Vsafe parts handled separately while a third engineer works on the IP_shell flow.
These two first steps also give a good granularity and an easier debug than a full top-down flow without
intermediate steps. Above all, this flow delivers correct, ready-to-use test models and protocols, which greatly
eases the flow. On the other hand, this flow does not reuse the previous IP wrapper & compression flow, which
was tested and proven to work up to simulation. Finally, there are no wrapper specificities left in the design
apart from the wrapper cells in the netlist, so the user should carefully check the behavior of the tool after DfT
insertion to set the wrapper in the correct mode.

All things considered, the second option was chosen, being the easier and faster flow. It is currently being
implemented with the real project netlist, and results are good so far.

VII.4) Pipelines
In the previous flow proposals, pipelining was not mentioned; it is a particularly thorny issue to deal
with. In fact, the specifications require 2 pipeline stages between each scan port and the first cells, before any
logic, to break the timing paths. The modem core already has 1 stage inside its block, so 2 pipeline stages have
to be added to all scan ports except those concerning the modem core (84 ports concerned); for those, only 1
stage is required. Synopsys handles pipeline insertion in most cases, but a bug has been detected and reported
when using the multi-mode feature at the same time: the tool does not meet the test mode dedicated scan port
requirements. Moreover, the tool is not able to equalize pipeline stages if a sub-block already contains
pipelines.
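
As a simple illustration of the required structure, here is a behavioural Verilog sketch of a scan-input pipeline. It is only a sketch under assumptions: in the real flow the inserted cells are dedicated scan flip-flops from the technology library, the names are invented, and the scan-output pipelines (often retimed on the opposite clock edge) are omitted.

// Behavioural sketch of an N-stage scan-input pipeline (illustrative only).
// One instance per scan-in port; the extra shift cycles it introduces must be
// declared to the ATPG tool, as explained below.
module scan_in_pipeline #(parameter STAGES = 2) (
  input  wire clk,      // pipeline clock, pulsed during shift
  input  wire si,       // chip-level scan-in port
  output wire si_piped  // delayed scan-in feeding the first cell of the chain
);
  reg [STAGES-1:0] pipe;
  integer i;

  always @(posedge clk) begin
    pipe[0] <= si;
    for (i = 1; i < STAGES; i = i + 1)
      pipe[i] <= pipe[i-1];
  end

  assign si_piped = pipe[STAGES-1];
endmodule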

The idea is then to insert pipeline cells, a particular kind of scan flip-flops, by hand in the netlist, after
the tool has inserted the DfT. A Tcl script has been written to perform such insertion for any technology.
However, since the netlist is modified after test protocol generation, there is no trace that extra
pipelines were inserted, and the ATPG tool would fail to recognize the scan chains. The test protocol files
therefore have to be modified, adding a couple of lines declaring the presence of the pipelines and modifying
the pipeline clock behavior to pulse whenever necessary. The flow is thus very far from the ideal push-button flow.

VII.5) Integrating IP X project


In a test model file, the tool lists all pins/ports which have a particular signification, among them
some pins of the WIR Controller, such as the test mode signals. The test model describes which value is to be set
for a given test mode. It gets tricky at a higher level of integration: when it is an input port, the tool can drive it
freely, but when it is an internal pin (from the WIR for instance), the tool still wants to drive it. Of course this
is not possible, and the tool always issues an error explaining that the provided test model file “contains
internal pins signals”, and refuses to use it. Several workarounds were tried. The one providing the best
results was a Perl script going through the test model file and removing all internal pin references. To correct this
limitation, Synopsys R&D is now working on a variable (test_allow_internal_pins_in_hierarchical_flow) to be set
to true when integrating such test models, relaxing the tool verification. However, the tool then assumes that the
internal pins will be set to the right values; in the final implementation, these values are provided through the
WIR serial interface at test setup.

The second issue when integrating IP_shell at a higher level (another SoC) is to provide the tool with all
the information about the wrapper modes and compression modules. However, user-defined multi-mode test
models are not supported at integration, and the tool cannot insert a compression on the extest ports, since it
looks as if the IP already has compression inside. To avoid this, one would need 2 test models. The first one,
with all the compression information and the ScanCompression_mode and Internal_scan test modes, is needed
to get the compression information when generating test patterns. A second test model is mandatory to insert
the DfT itself, mainly the compressor on the extest ports; this test model should only have the extest test
mode defined. Synopsys does not offer the possibility to write test-mode-dedicated test models. Again, several
trials were made.

The first trial was to start from scratch, using the CTLGen tool from Synopsys to create a fake test model.
The (not negligible) limitation is that CTLGen does not handle OCC definitions, so the test model loses all
clock management information. A Perl script was written to build a test model file with OCC information,
starting from a CTL generated with CTLGen and a text list with all the synthetic OCC info. This would however
require the user to write all this information himself, which can be quite long when the design contains around
50 OCC Controllers.

The next trial was to take as input the full CTL generated by the DFT Compiler run and, thanks to another
Perl script, remove all the compression information. This script is close to the one used in the first IP
wrapper & compression flow. It removes all references to the Internal_scan and ScanCompression_mode test
modes, and renames the My_extest test mode as Internal_scan, which becomes the only one in the design. No
trace of compression remains, and all the OCC information is already in place, untouched.

A last option, soon to be available from Synopsys R&D, is a new feature which will allow the user to
write a test model for a given test mode. The option is currently under test and does not work correctly yet,
but the expected result is the same as with the second trial.

The solution adopted for now is the second one, a full test model simplified by a Perl script, while
waiting for the Synopsys option to be corrected.

VII.6) Targeting chip J project


This part has mainly discussed the IP X project, but the chip J project is similar in its main lines.
The major changes are that no wrapper is inserted (pointless at chip level); instead, the serializer
feature may help to limit the number of test pads. At first, the serializers were supposed to be implemented in
soc_top, like the Vsafe compression, which is the top level of the chip. However, due to the limitations between
the hybrid flow, pipelines and serializers, which are not all supported together, it was decided, since the chip
architecture is not frozen, to move them into a soc_shell hierarchy above soc_top. The flow is then exactly the
same as for the IP X project, without the wrapper insertion step. A last run is done at soc_shell, enabling the
serializer flow.

At the time this report was written, the project was still under discussion, and the number of pads
available for scan purposes was not defined. There is no certainty that the serializer will really be used, as it
depends on the chip package. The serializer flow implemented therefore includes a parallel access. This means
that, depending on which ports are bonded to pads, the chip will use either a parallel mode (Figure 39, red
wires) with N scan channels, or a serial access (Figure 39, green wires) dividing the test port count by S. In the
chip J project, S should be equal to 2. Figure 39 shows the two possible modes.



Figure 39: Parallel mode (red wires) in a serializer flow.
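
As a rough illustration of the serial access principle, the sketch below shows the input side of a divide-by-2 deserializer with a parallel bypass. It is a behavioural sketch under assumptions (names, reset scheme), not the Synopsys implementation; in particular, the gating of the internal shift clock, which makes the core chains shift only once every S external cycles, is not shown.

// Behavioural sketch of the input side of a divide-by-2 serializer feature (S = 2),
// with the parallel bypass of Figure 39. Illustrative names only.
module scan_deserializer_s2 (
  input  wire       clk,          // external scan clock (fast)
  input  wire       rst_n,
  input  wire       serial_mode,  // 1: serial access, 0: parallel access
  input  wire       ser_si,       // single serial scan-in pad
  input  wire [1:0] par_si,       // 2 parallel scan-in pads
  output wire [1:0] core_si       // 2 scan-in channels towards the core
);
  reg       phase;     // which half of the serial pair is being received
  reg       first_bit; // first serial bit of the pair
  reg [1:0] deser;     // reassembled pair presented to the internal chains

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      phase     <= 1'b0;
      first_bit <= 1'b0;
      deser     <= 2'b00;
    end else begin
      phase <= ~phase;
      if (!phase)
        first_bit <= ser_si;             // receive the first bit of the pair
      else
        deser <= {ser_si, first_bit};    // pair complete: update both channels
    end
  end

  // In parallel mode the pads drive the core channels directly.
  assign core_si = serial_mode ? deser : par_si;
endmodule

The output side works symmetrically, multiplexing S internal scan-out channels onto one pad, which is where the division of the effective test frequency by S comes from.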

It should be emphasized that the serializer feature is quite recent, and feedback from projects having
embedded a serializer is not so good. Moreover, it divides the test frequency by S, increasing test time by a
factor of S. All this means that the serializers, even if implemented on the chip, will only be used as a last resort.

VII.7) Conclusion
This part was the final target of the master thesis at ST-Ericsson. The previous parts aimed at
understanding and handling the tools and all their possibilities, with the ultimate goal of implementing the best
flow for these projects. It has been clearly shown that the top-down flow, with the 2 sub-blocks Vcore and Vsafe,
is the better one in terms of flow simplicity, speed and workload sharing, since two engineers can each work on
one of the 2 blocks while a third one prepares the final top-down run, using the test model files generated by
the sub-block DfT insertions.

It has also been stressed that the Synopsys tools have many limitations, forcing the user to find
workarounds for real, complex projects. However, Synopsys support is very present, constantly asking
for feedback and test cases when limitations or errors are raised. A new feature or a bug fix required by a
design team can then quickly become effective in a coming tool release.



Conclusion
System on Chip designs are growing fast and manufacturing test is more than ever a key step in the
development process. Design for Test (DfT) architecture is to be planned upstream and test logic has to be
inserted carefully.

This master thesis presented the basic concepts of scan-based test and went through most of the up-to-date DfT
features. The Synopsys tool flow has been clarified, showing how strong, powerful and indeed indispensable it is.
Its limits in complex flows have also been demonstrated. In fact, the test architecture of a design is usually planned
depending on constraints such as the number of ports, as well as on functional design specificities. Thus, the tools are
sometimes not able to fulfill all requirements. To overcome these limitations, the user needs to implement smart
workarounds to insert the DfT logic. This is what has been done in this document: raising tool limitations and
proposing solutions around them.

Two major flows have been proposed in this master thesis. The first one, taking test power limits into
account, implements a full custom flow to wrap IPs, with regular scan compression for the IP internal chains.
These wrapped IPs can then easily be integrated at chip top level, compressing the wrapper chains to minimize the
scan impact. A flexible architecture, including wrapper bypass chains, has been developed, clearly showing the tools'
limitations. The bypass paths allow scan chain continuity for both internal and external wrapper chains, which could
eventually allow testing the core and the interconnections at the same time. The flow which was automated took
the Synopsys-compliant way, integrating both the compression and the wrapper feature in a single step. It provides a
faster setup for the coming IPs to be wrapped.

The second architecture developed dealt with a complex, current DfT design, implementing most of the
available features, including compression, wrapper, pipelines and partition flows. A System-on-Chip project, parallel
to the IP project, will also implement a serializer to limit the scan port impact, as their number is limited. Some of these
features are completely new in the DfT state of the art, and mixing them together did not always lead to the expected
results. Again, extensive work was needed to get around all the tool limitations, to finally propose 2 flows. No
fully Synopsys-compliant flow could be proposed, so some customization and netlist modifications have to be
made for both flows. The flow selected was the faster and simpler one, including as few steps as possible. It inserts the
wrapper feature in one step, keeping only the wrapper cells, without ports, nets and wrapper specifications. This approach
keeps the flow compatible with the complex compression architecture (hybrid flow).

Also, a strong emphasis has been put on clock management, most particularly on On-Chip-Clocking
controllers. This was the purpose of section IV, and it showed that OCC clock chains have to be correctly handled
in the compression flow. More precisely, all OCC clock chains must be merged in a single chain to provide the best
ATPG pattern set; any other combination tested led to a huge test pattern set increase. In a second study, taking into
account the coming growth of designs and the number of clocks to handle in them, an update was proposed to
improve the (similar) ST-Ericsson and Synopsys OCC clock chains. In fact, as the OCC clock chains must be kept in a
single chain, the clock chain will soon become longer than the regular scan chains. Noting that, for
a certain amount of test patterns, only part of the OCC control bits is used, the update allows shortening
the clock chain dynamically, thus reducing shift time during test.

All in all, this master thesis provided a broad overview of the DfT insertion flow, raised tool
limitations, and detailed some workarounds for a core-based test methodology. A close collaboration with
Synopsys support allowed constant help and feedback in both directions, and the result is correct and fully
functional flows. Moreover, Synopsys R&D is now aware of the raised limitations, which will surely be improved in
coming releases.



Table of figures
Figure 1: Physical defect on silicon ....................................................................................................................................... 2
Figure 2: A NAND Gate ......................................................................................................................................................... 2
Figure 3:NOR gate with Stuck-At-Open defect ..................................................................................................................... 3
Figure 4: Regular flip-flop (left) and corresponding scan flip-flop (right) ............................................................................. 4
Figure 5: Scan based test methodology. Scan chain from scanin to scanout port controlling register state during test. .... 5
Figure 6: Multi-test modes architecture ............................................................................................................................... 9
Figure 7: Scan compression feature .................................................................................................................................... 10
Figure 8: Compressed sub-blocks integrated at top level. .................................................................................................. 10
Figure 9: Top level compression using partition flow to build 2 compressor modules........................................................ 10
Figure 10: Serializer feature ................................................................................................................................................ 11
Figure 11: Pipeline feature (pipeline cells added in green) ................................................................................................. 11
Figure 12: Hierarchical design example .............................................................................................................................. 12
Figure 13: The OCC Controller is taking control over each PLL clock with the shift clock ................................................... 14
Figure 14: TCL script extract defining OCC control bits ....................................................................................................... 15
Figure 15: Capture sequence for Transition patterns ......................................................................................................... 16
Figure 16: Decompressor paths. The C first internal scan chains are directly wired from the C input channels. Remaining
internal chains are a logic mix of them. .............................................................................................................................. 18
Figure 17: Current ST-Ericsson OCC clock chain schematic ................................................................................................. 22
Figure 18: Schematic of the update OCC clock chain proposal. For esthetic purpose, only 2 control bits where reported on
the picture. The real design still have 5. ............................................................................................................................. 24
Figure 19: Test time depending on OCC number in design with a scan chain length of 120. ............................................. 26
Figure 20: Test time depending on OCC number in design with a scan chain length of 200. ............................................. 26
Figure 21: Standard wrapper cell with a scanned cell (green) and a mux to select the wrapper cell mode. ...................... 28
Figure 22: Synopsys wrapper implementation with input and output wrapper cells. ........................................................ 29
Figure 23: Wrapper cells in inward facing mode. ............................................................................................................... 29
Figure 24: Wrapper cells in outward facing mode. ............................................................................................................. 29
Figure 25: Chip top DfT architecture. Red wire are internal scan ports, muxed all together, and the blue ones are extest
wrapper scan chains handled in a compression. ................................................................................................................ 30
Figure 26: IEEE1500 Standard global view of the wrapper feature with the wrapper cells (WBR), and the regular internal
scan chains(WPP). ............................................................................................................................................................... 31
Figure 27: Implemented IEEE1500 Standard WIR Controller .............................................................................................. 32
Figure 28: IP Wrapper & Compression flow requirement with double bypass for each wrapper mode, and a complete
control over boundary cells direction. ................................................................................................................................. 34
Figure 29: Wrapper bypass paths in extest mode. .............................................................................................................. 35
Figure 30: Wrapper bypass paths in intest mode. .............................................................................................................. 35
Figure 31: wrapper bypass RTL block implemented (inside red boundary). ....................................................................... 37
Figure 32: IP wrapper & Compression flow in a single hierarchy. No more bypass implemented but a full compliance with
Synopsys tools. .................................................................................................................................................................... 38
Figure 33: Base of IP X project. WIR and OCC are inserted at RTL, all other features to be inserted. ................................ 40
Figure 34: IP X project DfT requirements with 2 compressions at different levels, and a Wrapper feature. ...................... 41
Figure 35: Chip J project DfT requirements with the same compression architecture as IP X project, plus serializer at
soc_shell level. .................................................................................................................................................................... 42
Figure 36: X/J project bottom-up flow steps. Each line describes a step with the main actions to be done. ...................... 44
Figure 37:Wrapper modified. WBR chains are handled in both intest and extest wrapper chains. ................................... 46
Figure 38: X/J project top-down flow steps. Each line describes a step with the main actions to be done. ....................... 47
Figure 39: Parallel mode (red wires) in a serializer flow. .................................................................................................... 50
Figure 40: Full flow developed for the single hierarchy IP Wrapper & Compression flow. ................................................... B



Table of tables
Table 1: NAND gate logic and Stuck-at faults impact ........................................................................................................... 2
Table 2: Pattern set size and test coverage for different OCC location and declaration .................................................... 19
Table 3: Pattern set size of a current ST-Ericsson project, classed by capture clock cycles. ............................................... 21
Table 4: Pattern set size and test coverage for current OCC clock chain implementation and update proposal one. ....... 24
Table 5: OCC declaration in test protocol file. Bold information should be adjusted depending on the number of control
bit used. .............................................................................................................................................................................. 25
Table 6: Detailed count of patterns for each possibility of capture clock cycle, depending on the simulation processed. . 26
Table 7: Traditional versus Core-based test methodology.................................................................................................. 28
Table 8: Verilog code of the current ST-Ericsson OCC clock chain......................................................................................... A





Annexes

A) Annex: OCC Clock chains module Verilog code


Current implementation:

module SNPS_OCC_DFT_shftreg_length5_0_0 (
  clk,
  s_in,
  shift_n,
  p_out);

  output [4:0] p_out;
  input clk, s_in, shift_n;
  wire n5, n4, n3, n2, n1;
  reg [4:0] p_out;

  always @(posedge clk) begin
    p_out[0] <= n5;
    p_out[1] <= n4;
    p_out[2] <= n3;
    p_out[3] <= n2;
    p_out[4] <= n1;
  end

  ste_clk_mux i_mux_0 ( .Z(n5), .D0(s_in),     .D1(p_out[0]), .S0(shift_n) );
  ste_clk_mux i_mux_1 ( .Z(n4), .D0(p_out[0]), .D1(p_out[1]), .S0(shift_n) );
  ste_clk_mux i_mux_2 ( .Z(n3), .D0(p_out[1]), .D1(p_out[2]), .S0(shift_n) );
  ste_clk_mux i_mux_3 ( .Z(n2), .D0(p_out[2]), .D1(p_out[3]), .S0(shift_n) );
  ste_clk_mux i_mux_4 ( .Z(n1), .D0(p_out[3]), .D1(p_out[4]), .S0(shift_n) );
endmodule

Update proposal:

module clk_shift_reg_logic (sel, ctrl);
  input  [2:0] sel;
  output [4:0] ctrl;
  reg    [4:0] ctrl;

  always @(sel) begin
    ctrl[0] <= 0;
    ctrl[1] <= 0;
    ctrl[2] <= 0;
    ctrl[3] <= 0;
    ctrl[4] <= 0;
    if (sel > 0) begin
      ctrl[0] <= 1;
      if (sel > 1) begin
        ctrl[1] <= 1;
        if (sel > 2) begin
          ctrl[2] <= 1;
          if (sel > 3) begin
            ctrl[3] <= 1;
            if (sel > 4) begin
              ctrl[4] <= 1;
            end
          end
        end
      end
    end
  end
endmodule

module clk_shift_reg (si, clk, shift, sel, so, p_out);
  input        si, clk, shift;
  input  [2:0] sel;
  output       so;
  output [4:0] p_out;
  wire n0, n1, n2, n3, n4;
  wire [4:0] ctrl;

  clk_shift_reg_logic i_logic (.sel(sel), .ctrl(ctrl));

  ste_common_1ff p_out_reg_0 (.Q(p_out[0]), .D(p_out[0]), .CP(clk), .TI(si), .TE(shift));
  ste_clk_mux    i_bp_reg_0  (.S0(ctrl[0]), .D0(si), .D1(p_out[0]), .Z(n1));
  ste_common_1ff p_out_reg_1 (.Q(p_out[1]), .D(p_out[1]), .CP(clk), .TI(n1), .TE(shift));
  ste_clk_mux    i_bp_reg_1  (.S0(ctrl[1]), .D0(n1), .D1(p_out[1]), .Z(n2));
  ste_common_1ff p_out_reg_2 (.Q(p_out[2]), .D(p_out[2]), .CP(clk), .TI(n2), .TE(shift));
  ste_clk_mux    i_bp_reg_2  (.S0(ctrl[2]), .D0(n2), .D1(p_out[2]), .Z(n3));
  ste_common_1ff p_out_reg_3 (.Q(p_out[3]), .D(p_out[3]), .CP(clk), .TI(n3), .TE(shift));
  ste_clk_mux    i_bp_reg_3  (.S0(ctrl[3]), .D0(n3), .D1(p_out[3]), .Z(n4));
  ste_common_1ff p_out_reg_4 (.Q(p_out[4]), .D(p_out[4]), .CP(clk), .TI(n4), .TE(shift));
  ste_clk_mux    i_bp_reg_4  (.S0(ctrl[4]), .D0(n4), .D1(p_out[4]), .Z(so));
endmodule
Table 8: Verilog code of the current ST-Ericsson OCC clock chain



B) Annex: Full single hierarchy IP Wrapper & Compression flow
This step list details the different steps to use the IP Wrapper & Compression flow on a given IP.

• Netlist & DfT Spec Generation: Get a shell netlist with scanned core and OCC inside. Configure the config file and the library file.
• IEEE1500 WIR Controller Generation: Perform WIR block creation (internal tool) and WIR insertion at shell level. Run do.all -wir to get a test model file of the WIR block.
• Create Core CTL and netlists: If no Core CTL file exists, customize run_ctlgen.tcl to get a CTL file (from the CTLGen tool). Run do.all -netlists to create all required netlist files for insertion.
• Wrapper & Compression insertion: Multimode wrapper and compression insertion at shell level with DFT Compiler, based on the core, by running do.all -insert. => Full shell test model file + final shell netlist.
• Test setup generation: Create the test mode-dependent test setup files (internal tool) + OCC setup by running do.all -spf. => ATPG-ready test protocol files.
• ATPG run: Customize the ATPG script (run_tmax.tcl) and run ATPG with the do.all -atpg command. => Patterns ready for simulation.

Figure 40: Full flow developed for the single hierarchy IP Wrapper & Compression flow.

