Useful Book Ieee1500 Dft
Sweden
02/11/2012 TRITA-ICT-EX-2012:272
ATPG: Automatic Test Pattern Generation. This process uses algorithms to find the best sets of patterns to test a given chip, using all inserted DfT logic. The term can also designate the tool that generates those patterns.
DfT: Design-for-Test. This is the main topic of this master thesis. Its main goal is to add logic to a functional design so that the manufactured silicon can be tested quickly and manufacturing process failures detected.
Flow: A set of steps to follow in order to carry out a process correctly, such as a Design-for-Test insertion.
IP: Intellectual Property. A hardened hardware block, as opposed to a full chip. A chip may contain several IPs.
Netlist: The file resulting from compiling RTL files against a given physical cell library; it describes all gates and interconnections in a design.
OCC: On-Chip Clocking. Used to control all functional clocks during test and to use a "slow" shift clock when loading and unloading patterns.
PLL: Phase-Locked Loop. In a design, this is usually synonymous with an on-chip clock source, allowing the design to work at different frequencies. Each PLL block provides a clock at a given frequency.
RTL: Register-Transfer Level. A level of abstraction used to describe the behavior of hardware modules.
TCL: Tool Command Language. The command language used to write Synopsys command scripts, including basic programming language features.
Testcase: A set of files (scripts, logs, netlists, RTL files …) used to test and demonstrate a given behavior of the tool without implementing the feature in a complex design. It isolates errors for quicker debugging.
The aim of this master thesis is to set up DfT logic insertion flows for current designs, using state-of-the-art features. Due to the complexity and size of the designs, all insertion is performed by tools. The user's first task is to decide on the test architecture to implement, depending on the physical target (chip size, number of available pads, IPs to be integrated …) and the functional design properties. The user then provides the specification to the tools through command script files to obtain the correct result. Depending on the complexity of the test architecture, the tools cannot always implement the required behavior, so a third task is to implement workarounds that keep the test architecture consistent with the specification. The work carried out within this master thesis was to become familiar with these tools, most particularly the Synopsys tools, and to write script sets targeting either specific DfT architecture projects or more global types of flows.
This document first introduces the basic Design-for-Test (DfT) key concepts, from physical fault models to the scan-based test process. The following part explains the flow current DfT tools use, showing the major steps of test implementation. The DfT state-of-the-art section then covers the best features on the market for improving DfT designs. After these background sections, a first study details techniques to manage clocks during test with dedicated modules (OCCs) inside the design. Indeed, clock structures are among the most challenging parts of test implementation when aiming for an efficient final test process. This part shows how the OCC feature has to be used, and which configurations are best. Finally, an OCC update is proposed to speed up test in coming designs.
To limit the high power consumption during test, a core-based test methodology is introduced in section V, showing its main advantages and how it can be implemented. Among the features used, core wrapping will be presented, both as the IEEE 1500 standard proposes it and as the Synopsys tools implement it. The last two sections (VI and VII) deal with core-based architecture and its implementation in 2 designs. First, a Wrapper & Compression flow for a standalone IP is described, with 2 different implementations, discussing the advantages and drawbacks of each. Then, the core-based methodology is applied to a current ST-Ericsson design project, where the whole DfT scan architecture is implemented with several state-of-the-art features. Again, 2 flows are proposed and evaluated, and the chosen process is currently applied in the chip development process.
The thesis was performed in the DfT team at ST-Ericsson Sophia-Antipolis, France. For confidentiality reasons, some values, sources, and project names are not provided, as they still relate to ongoing internal projects.
As an example, Figure 2 shows a typical NAND gate on which stuck-at faults will be simulated. Table 1 shows the gate output value with net A stuck at 0, net B stuck at 1, and net ZN stuck at 0 or 1, as examples. In the first case, the defect is noticed only if a "1" is applied at both inputs (results in red when revealing the defect). Other input patterns show regular behavior. To test all stuck-at possibilities on this single NAND gate, three sets of input values are needed ("01", "10", "11" for instance).
Figure 2: A NAND gate (inputs A and B, output ZN)

A  B | ZN (normal) | ZN if A SA0 | ZN if B SA1 | ZN if ZN SA0 | ZN if ZN SA1
0  0 |      1      |      1      |      1      |      0       |      1
0  1 |      1      |      1      |      1      |      0       |      1
1  0 |      1      |      1      |      0      |      0       |      1
1  1 |      0      |      1      |      0      |      0       |      1

Table 1: NAND gate logic and stuck-at faults impact
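The detection reasoning above is easy to reproduce with a short simulation. The following sketch (function and fault names are ours, not taken from any DfT tool) enumerates the six stuck-at faults of the gate and finds which input patterns detect each one:

```python
from itertools import product

def nand(a, b):
    # Fault-free NAND: ZN = not (A and B)
    return int(not (a and b))

def faulty_nand(a, b, fault):
    # Apply one stuck-at fault (net, forced value) and evaluate the gate.
    net, value = fault
    if net == "A":
        a = value
    if net == "B":
        b = value
    out = nand(a, b)
    if net == "ZN":
        out = value
    return out

# All six stuck-at faults on the three nets of the gate.
faults = [(n, v) for n in ("A", "B", "ZN") for v in (0, 1)]

# A pattern detects a fault when the faulty output differs from the good one.
detecting = {f: [p for p in product((0, 1), repeat=2)
                 if faulty_nand(*p, f) != nand(*p)]
             for f in faults}

for fault, patterns in detecting.items():
    print(fault, patterns)
```

Running it confirms the claim in the text: the three patterns "01", "10" and "11" together detect every stuck-at fault of the gate, while "00" on its own detects only ZN stuck-at-0.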
On a single logic gate this may seem simple, but it becomes much more complicated, and sometimes even impossible, to test some nets in a chip because of redundancy or complex structures.
These faults can also be detected with the stuck-at fault model. However, the algorithms for detecting such faults are not very mature, since they require a lot of placement information. Tools try to test these faults with both at-speed (delay fault) and stuck-at tests when the layout information is provided.
This model is very important: together with the stuck-at fault model, it is one of the two fault models usually tested in industry, and it will be developed later in the report.
A way to set values in the chip is to force all registers (flip-flops) to a chosen value, depending on the faults targeted. To do so, the design is slightly modified to provide access to each register without using the regular register input. Thus, all flip-flops (Figure 4, left) are replaced with scan cells (Figure 4, right), which have 2 extra ports. The TE (test enable, or SE, scan enable) port selects whether the regular D input or the test TI (test input, or SI, scan in) port is used as the register input.
With this, the register can take either the value from the upstream logic or the value from the test input port. To provide access to these TI ports throughout the design, chains of scan cells are implemented, linking all scan cells with the following pattern:
scanin port -> Cell1/TI -> Cell1/Q -> Cell2/TI -> Cell2/Q -> … -> Cellx/TI -> Cellx/Q -> scanout port
In fact, these "scan chains" are wired from an input pad (scanin port) to an output one (scanout port), passing through a certain number of scanned flip-flops, using the TI input port and the functional Q output port. The Q output port remains connected to the downstream logic; an additional net is simply added to link each scan cell to the next one.
A complete design generally contains several scan chains. A good design from a DfT point of view has short scan chains of equal length (currently around 200 cells), because test time is directly driven by shifting time, and thus by scan chain length. However, the number of pads is limited, so other specific techniques have to be implemented to achieve short scan chains. Such techniques will be detailed in a later section of the report.
Once the scan chains are built, there is a hardware path to shift values into the chain, by asserting TE to select the TI register input. Each cycle, a new value is inserted into the first cell of the chain, and all previously inserted values are shifted one cell downstream. For a scan chain of N cells, N shift cycles are required. After that point, the design is fully loaded with custom values.
Then one functional clock cycle is applied, with all TE ports selecting the regular D register input, to capture the functional values. As an example, considering Figure 5, when all registers are loaded, the functional clock cycle propagates the values from the left registers through Logic Block A and captures them in the right registers. Meanwhile, the values from the right registers go through Logic Block B to the functional outputs. Finally, the input selector enables the test input again, and the whole scan chain is shifted out through the scanout port and probed to check the correctness of the values. Meanwhile, the input ports are already shifting another set of values (a "pattern") into the chip through the input pads.
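The load/capture/unload sequence described above can be sketched in a few lines. The toy model below is ours (the 4-cell chain and the inverting logic block are invented for illustration), but the mechanics match the text: N shift cycles to load an N-cell chain, one capture cycle, then shifting out while the next pattern enters:

```python
def shift(chain, scan_in):
    """One shift cycle (TE = 1): scan_in enters the first cell, every value
    moves one cell downstream, and the last cell's value leaves on scanout."""
    scan_out = chain[-1]
    chain = [scan_in] + chain[:-1]
    return chain, scan_out

def load_pattern(chain, pattern):
    # N shift cycles fully load an N-cell chain with the pattern.
    for bit in pattern:
        chain, _ = shift(chain, bit)
    return chain

def capture(chain, logic):
    # One functional cycle (TE = 0): registers capture the upstream logic.
    return logic(chain)

chain = [0, 0, 0, 0]                       # 4-cell scan chain
chain = load_pattern(chain, [1, 0, 1, 1])  # 4 shift cycles
captured = capture(chain, lambda c: [1 - b for b in c])  # toy inverting logic

# Shift out for probing (the next pattern would enter through scan_in here).
state, unloaded = captured, []
for _ in range(len(captured)):
    state, out = shift(state, 0)
    unloaded.append(out)
print(unloaded)
```

Note that the cell nearest the scanout port comes out first, so the tester compares the unloaded stream against the expected values in that same order.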
A "test pattern" is a set of values defining the states of all design registers at a given time, to target a set of faults. Several patterns are required to test a chip, and the patterns are shifted in one after the other. A good test pattern set is a small set with maximal coverage of the chip defects. The usual target is over 99% fault coverage for the stuck-at fault model, and around 70% for the transition fault model, whose faults are more difficult to target.
The work of DfT teams is then basically divided into two parts: creating and inserting the dedicated test logic discussed above, and finding the best test pattern set for the chip with the help of powerful tools. The challenging part is to find a good compromise between test time and test coverage. This gets more complicated as chips grow. To reach the 99% coverage target, the required test pattern set is getting bigger and bigger, and test capacity limits are now being reached.
At the same time, test power consumption is also increasing dramatically. This leads to a new issue: in functional mode, all chip cells never switch within a single clock pulse. However, this can happen in test mode, and the current supply then becomes a limiting factor. Because of this, a chip can appear to be mis-manufactured simply because of a lack of power, and the consequence would be to throw away an operational chip (called yield loss). The DfT architecture must take this power constraint into account.
All these limits lead us today to implement new techniques to ease and speed up the test. This master thesis deals with these new techniques, such as wrapper logic, compression units, and clock controllers, which are the new DfT standards for implementing Design-for-Test architectures that enable compact test pattern sets and short test times. The next part explains the global DfT flow used with the Synopsys tools.
Then a global synthesis is done, mapping the RTL onto a given library set. The output of synthesis is a Verilog file called a netlist, containing the whole design mapped onto a given technology (32/28 nm for instance, depending on the fab). This netlist is used again in DfT to insert all the scan logic. A first step is to replace all regular sequential elements (flip-flops) with scan cells, then stitch them together to create scan chains. During this step, DfT modules such as compressors, wrappers … are dynamically inserted by the tool. All clock and reset architectures are described to the tool so it can handle them in the best way when inserting DfT elements. This step creates a new, final netlist.
When all the information has been provided and the hardware is in its final state, the tool should be able to deal with all design features to generate test pattern sets. This step is achieved with an ATPG (Automatic Test Pattern Generation) tool. Depending on the complexity of the design and the precision of the DfT insertion, the tool may or may not be able to produce a correct set, with a variable test coverage. In this master thesis, this step was used to check the expected behavior of test features and to verify that test patterns can be generated from the provided information.
A last step is to validate, through simulation, that the generated patterns are correct, checking again that the tool understood the chip behavior correctly. This last step usually forces a loop back to the first steps as problems occur. Most troubles come from clock architectures and test signals that were not handled correctly.
This whole process finally provides a clean test pattern set fitting the final chip architecture. Test engineers use this pattern set to test chips at wafer level for mass production.
Synopsys provides tools to perform the steps presented above. Generally speaking, Synopsys tools work with TCL command scripts that activate tool switches, read input files, and create output files. The flow used during the master thesis was restricted to the post-synthesis netlist steps. The corresponding Synopsys tools are described below.
After processing all the given information, DfT Compiler produces an updated netlist with scan inserted, as well as a couple of report files and 2 important files:
Test model file, commonly called CTL (after its extension). It describes all the DfT information inside the design (scan chains, scan ports, clock management, embedded DfT devices …). It is used when the current design is integrated at a higher level (SoC level for an IP), to avoid redoing the work and to simplify IP integration.
Test protocol file, commonly called SPF (after its extension). It describes almost the same information as the test model file, plus the test signal timings. It is aimed at the next step, ATPG, to drive the test signals correctly.
If this checking step passes, TetraMax can create a full list of faults to target in the design (stuck-at, transition …). The user then decides which algorithms to use for targeting faults, and the tool runs to provide a correct pattern set and a corresponding testbench for the design (clock, reset, scan enable, control signals …). These patterns are first used in a simulation flow to check that they are usable, and finally implemented on a chip tester to test all chips in mass production.
First, with a regular netlist, with no back-end information. This is called zero-delay simulation, since everything is still ideal and no placement information is available.
When the second simulation passes, the chip is declared testable and can be sent to production.
II.3) Conclusion
DfT tools are essential for digital circuits; nothing could be done without them. They can handle a DfT implementation by themselves with very little information. However, when it comes to implementing a customized architecture, where the user wants specific details, they can be very hard to deal with. The aim of this master thesis is actually to master them, understand how they react when implementing such customized architectures, and find workarounds for their limitations.
Synopsys tools, and more specifically DfT Compiler, allow regular scan chain insertion, but also the insertion of many features to enhance test on chip, which are presented in the next section.
This first technique allows implementing designs with several test configurations, implying several scan chain lengths. One can, for instance, create a configuration (called a test mode) where all flops in the design are grouped into a single scan chain, and another configuration with several scan chains. The tool automatically creates the required number of ports (called test_si and test_so in Figure 6) for each test mode, and links the scan chains to fit the requirements, trying to balance the chains to keep their lengths as short as possible and equal across chains. Figure 6 is an example of 2 test modes (blue and red) for a design containing 12 cells. The red test mode is declared with 3 scan chains, and the blue one with 2. The figure shows that the tool inserted multiplexers, driven by a test_mode signal, to switch from one test mode to another, modifying the scan chain lengths. In the end, each test mode has its own test protocol and its own pattern set.
In fact, DfT Compiler supports common compression ratios from 10 to 50 times. Beyond that limit, compression is still implementable (up to 100 times), but it becomes difficult to reach a high fault coverage for the design. The decompressor is based on muxes, and the compressor is mostly made of a XOR tree.
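The XOR-tree side can be illustrated with a toy output compactor. The chain-to-channel grouping below is invented for illustration; real tools derive it from the compression configuration:

```python
from functools import reduce
from operator import xor

def xor_compact(chain_outputs, groups):
    """Compact internal scan chain outputs: each output channel carries the
    XOR of one group of chains, evaluated every shift cycle."""
    return [reduce(xor, (chain_outputs[i] for i in group)) for group in groups]

# 8 internal chains compacted onto 2 output channels (4x output compression).
groups = [(0, 1, 2, 3), (4, 5, 6, 7)]
print(xor_compact([1, 0, 0, 1, 1, 1, 0, 0], groups))
```

A side effect worth noting: if an even number of erroneous bits land in the same group on the same cycle, they XOR-cancel (aliasing), which is one reason very high compression ratios make high fault coverage harder to reach.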
[Figure: Decompressor and Compressor blocks on either side of the internal scan chains]
This method is almost always used in today's designs because of its simplicity and its power. When implemented by DfT Compiler, 2 test modes are created, namely Internal_scan and ScanCompression_mode. Both modes use the same ports, but Internal_scan does not use the compressor modules, and thus has longer scan chains. In fact, this mode recombines the chains of the compressed mode to obtain as many chains as there are scan channels. The Internal_scan mode is useful for debug, both on chip and during simulation, as a first step.
[Figures: two power domains (Partition 1 and Partition 2), each with its own decompressor/compressor pair]
Figure 8: Compressed sub-blocks integrated at top level
Figure 9: Top-level compression using partition flow to build 2 compressor modules
This feature is useful, then, but brings a lot of constraints and incompatibilities with other flows. Depending on Synopsys' investment in this feature, it may become interesting in coming DfT Compiler releases.
This feature might look simple and easy to implement, and almost indispensable in current designs. But the current DfT Compiler flows including pipelines are limited and bring additional issues in complex architectures. This will be discussed in the section VII flow.
III.2.1) Bottom-up
Also called the hierarchical flow in the Synopsys documentation, this flow allows a fine granularity when inserting DfT.
Figure 12: Hierarchical design example (blocks A, B, C, D and E within top)
As an example, for the design in Figure 12, one would run the tool 6 times, once for each module. This allows the scan configuration to be defined very precisely for each block.
However, each run results in an updated netlist and a test model file (CTL) for the block. All the information has to be carried from a sub-module to its parent block, and this is often where things go wrong: in complex architectures, the user has to bypass tool limitations by "hacking" the netlist or the test model. It is then complicated to keep everything coherent across hierarchy levels, and the flow quickly becomes unwieldy, with many steps and files.
III.2.2) Top-down
A smarter way to handle DfT insertion may be a top-down approach. In the example of Figure 12, a top-down flow treats everything in one pass, at top level. This technique is quick and clean, and does not require carrying DfT information several times. However, when the design gets slightly more complex, this approach quickly reaches the tool's limits.
III.2.3) Conclusion
As suggested above, the smartest way is generally a good mix of both flows. In the design example above, one could for instance treat block D as a top unit, all at once (including blocks A, B and C in a single DfT insertion), export a test model for this block, and integrate block D with block E at top level.
An On-Chip Clocking (OCC) controller is basically a box that intercepts the PLL output clock and allows the ATPG tool to mix it with the shift clock for test purposes. Synopsys requires each on-chip clock generator output to be controlled by such an OCC controller.
Figure 13: The OCC Controller takes control over each PLL clock with the shift clock
The PLL clock is the one used in functional mode, where the OCC controller is bypassed. During test, the shift clock (coming from a pad, red wire in Figure 13) is used to shift patterns in and out of the chip. Capture cycles are then processed with the PLL clock (at speed). Such a module must therefore handle, in a glitch-free way, clock transitions from a slow clock (the shift clock) to a fast clock (the PLL clock), implying full synchronization between the 2 clocks.
Also, in order to target complex faults, such as those in memories, designs may now require multi-capture cycles. This is handled with dynamic control bits inside the OCC controller, driven as a scan chain (called the clock chain) by the ATPG tools. Each bit, when enabled, leads to a capture clock pulse before the result is shifted out.
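The role of the control bits can be sketched as a per-cycle clock gate. This is a simplified model of ours; a real OCC controller also handles the glitch-free shift/capture clock hand-off described above:

```python
def capture_burst(control_bits):
    """Return which of the available PLL cycles produce a capture pulse:
    each enabled control bit opens the clock gate for one cycle."""
    return [cycle for cycle, bit in enumerate(control_bits) if bit]

# Stuck-at pattern: a single capture pulse is enough.
print(capture_burst([1, 0, 0, 0, 0]))
# Transition pattern: a launch/capture pair of at-speed pulses.
print(capture_burst([1, 1, 0, 0, 0]))
```

With 5 control bits per OCC, up to 5 consecutive at-speed capture pulses can be requested pattern by pattern, which is what multi-cycle memory-access patterns rely on later in this study.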
# Then, OCC control bits are grouped together in an OCC scan group
set_scan_group clk_bit_chain_ref_clk \
    -class OCC \
    -segment_length 5 \
    -serial_routed true \
    -clock [list ate_clk] \
    -access [list \
        ScanDataIn  occ_ref_clk/scan_in \
        ScanDataOut occ_ref_clk/scan_out]
The above script is divided into 2 commands. First, the clock itself is defined, with the different bits controlling it (5 control bits = max. 5 capture cycles), the PLL source clock, the shift clock, and the clock output pin. Then, a command defines how to group the OCC control bits into a particular class (named OCC), which creates a clock chain in the design among the regular scan chains.
This "-class" option of the set_scan_group command allows the tool to recognize a clock chain. It results in the creation of an easy path through the compressors, to avoid degrading the test coverage or the pattern set size too much. In fact, the tool implicitly excludes clock chains from compression. If some cells of a scan chain are defined in an OCC scan group, the whole scan chain is left out of the compression, and it takes a dedicated scan channel for itself from the compression-dedicated pads.
As a first point, let us consider a basic compressed design (with no clock chains). As an example, given 8 primary scan channels, DfT Compiler divides them into two groups: 3 signals are selectors, and 5 are data. Without OCC architecture, DfT Compiler can handle up to 1024 internal scan chains with such a configuration. With OCC architecture and still 8 scan channels, this drops to 512.
In the first case (no OCC), each of the 5 data bits directly drives internal scan chains 0 to 4 (Figure 16, C channels), and all the other internal scan chains are a logic mix of all 8 signals. The first C scan chains are thus feed-throughs, but a given value on them deeply impacts the values of all the other scan chains.
However, if an OCC chain is declared as a clock chain, one of the 5 (C) data bits is fully dedicated to that clock chain and is not used for the other scan chains. This means that the other scan chains are not impacted by the clock chain values.
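The structure just described can be modeled with a toy mux-based decompressor. The mux mapping below is invented for illustration (DfT Compiler's actual mapping is proprietary); what matters is the shape: the first C internal chains are direct wires from the data channels, and every other chain is selected from them by the selector bits:

```python
C = 5  # data channels, as in the 8-channel example (3 selectors + 5 data)

def decompress(selectors, data, n_chains):
    """First C internal chains are feed-throughs of the data channels; each
    remaining chain muxes one data channel, chosen from the selector value
    and the chain index (hypothetical mapping)."""
    sel = selectors[0] * 4 + selectors[1] * 2 + selectors[2]
    direct = list(data)                               # chains 0 .. C-1
    mixed = [data[(sel + i) % C] for i in range(C, n_chains)]
    return direct + mixed

chains = decompress([0, 0, 1], [1, 0, 1, 1, 0], n_chains=12)
print(chains)
```

In this model, dedicating one data channel to a clock chain removes it from the pool available to the muxed chains, which is consistent with the reachable internal chain count dropping (1024 to 512 in the 8-channel example above).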
Accordingly, with a fully scanned design, the tool should be able to cover the design's whole fault set. However, this is rarely the case in today's projects. In fact, some cells are not scanned, and some are reported as "uncontrollable". This happens when the cell's clock or reset signal cannot be controlled, being neither a primary input clock nor an OCC-controlled one. Such cells should be as few as possible, since they complicate test pattern generation.
Also, most faults that cannot be targeted with simple patterns are faults around memories. In fact, while the memory contents are tested with Memory Built-In Self-Test (MBIST), the access logic has to be tested as regular scan logic. However, a memory access (write/read) requires around 3 cycles, and TetraMax has to wait those 3 cycles to see the results on the output pins of the memory. This is where most multi-cycle patterns are used.
Generally speaking, memory faults aside, most faults can be detected with simple patterns (a single capture for stuck-at faults, and a double launch/capture for transition faults).
A first run, called Best Case below, was done to get a reference for the pattern set size and fault coverage one can expect. All OCC control bits were in a single dedicated clock chain, creating only one dedicated scan channel (scan chain 0).
Then, the OCC clock segments were spread over 4 scan chains, but only one of them was declared as an OCC scan group. To do this, one simply removes the "-class OCC" parameter from the set_scan_group definition. Three tests were performed, declaring all (1), part (2), or none (3) of the OCC control bits. This was done by removing the corresponding line in the test protocol file (cf. Table 5):
If no line is removed, all 5 bits are considered care bits, making pattern set generation harder. The results are shown below as "OCC spread, 5 bits declared".
As a last step, as a kind of verification, all control bit information was removed, leaving only the pin definition of the OCC. In this test, all "-class OCC" parameters in the scan group definitions were also removed, spreading all the OCCs over 3 regular scan chains. This way, all OCC control bits are seen as regular scan cells and always as don't-care bits. The results are shown under "OCC spread, no control bit declared".
In fact, without any OCC consideration, in a decompressor with C input channels, the first C internal scan chains are actually direct wires from the scan ports, and all other internal scan chains are a logic mix of these scan ports (Figure 16). One can wonder whether any improvement can be achieved by moving the OCC chain onto either a "wired" scan chain or a standard compressed chain.
Figure 16: Decompressor paths. The first C internal scan chains are directly wired from the C input channels. The remaining internal chains are a logic mix of them.
Number of OCC control bits declared: 5 or 1 control bits declared. The case "0 control bits declared" was already covered in the previous study.
Path through the compressors: the first C scan chains of the compression are always feed-throughs. The idea is to force the OCC chain (still not declared as an OCC chain) onto a privileged chain, or onto a standard one.
In all cases, all OCCs were on the same scan chain, but not declared as a clock chain.
The first line is the reference test, with all OCCs on a dedicated and declared clock chain. It shows a good test coverage of 99.55%, with a pattern set of 1402 patterns.
The second line deals with the OCCs spread over 4 chains, only one of which is declared as a clock chain. The first test was with all 5 control bits declared. The ATPG run results show a test coverage of 99.21% for a pattern set of 18,499 patterns. The set size is over 10 times the best case! The test coverage is also slightly lower (-0.3%) than the best case on line 1.
The second test declared only 1 control bit per OCC. The results show a test coverage of 99.27% and a pattern set of 17,738 patterns. One can see a small pattern set size reduction (≈5%) compared to the previous test, but it is still around 10 times the best case, where all OCCs were in the same dedicated scan chain. And the test coverage is still slightly lower than in the best case.
As a last step, all control bit declarations were removed. The results were then much better, with a test coverage of 99.55% for 1412 patterns: with this run, one finds results similar to the best case again. However, this test was performed without any OCC control bit information. TetraMax thus assumed that the OCC would magically pick the right control bit states, whereas the OCC actually contains random values. All these control bits would have to be forced to a correct value before the ATPG capture cycle; otherwise all patterns fail during simulation. Since there is no way to guess the control bit states the ATPG requires, this test fails in simulation and is not usable in this state. It shows, however, that without any care bits the results correspond to the best case.
Then, the undeclared clock chain was stitched onto one of the first C internal chains, as a privileged chain (directly wired from a scan port). One can notice a tiny improvement compared to the previous tests. But again, the pattern set is around 17,000 patterns, which is not acceptable. It means that the ATPG tool handles the C privileged chains in the same way as the other chains.
Enclose the full IP chain in the top-level clock chain. This enlarges the chain, but test coverage and pattern set remain correct. However, shifting time may increase.
Declare the chain as an OCC scan chain, creating a feed-through in the compressor and thus losing a couple of scan ports. The compression ratio is reduced, and additional scan ports may be required to avoid merging regular scan chains.
IV.2.6) Conclusions
The best pattern set count was around 1400 patterns. It can be achieved either by declaring a single clock chain with all the OCCs (best case), or by removing all care bit constraints from the SPF file (OCC spread, no control bit declared). These tests also achieve the best test coverage (99.55%).
When spreading the OCCs in the design (over several scan chains), a small improvement can be seen when declaring only 1 control bit (the other 4 bits in a don't-care state). But this improvement is really small (18,500 down to 17,740 patterns, against 1400 patterns in the best case), and the overall overhead is not acceptable.
Since the "no control bit declared" architecture does not pass the simulation step, there is no other way than the best case described above to handle OCC recognition in a design, when it is possible to link all OCCs into one single scan chain. This is of course the Synopsys-recommended way of doing it.
But trouble appears when it is not possible to link all the OCCs. In this case, all the considered options gave bad results. Some solutions were studied, without success.
When several scan chains in a design contain OCCs, DfT Compiler, while inserting compression, uses a dedicated pair of scan-in and scan-out ports for each clock chain. This can be very port-consuming if the OCCs are spread across the design, and can ultimately force the tool to merge 2 regular scan chains to fit the compression limits, doubling the shifting time. Also, if the OCC chains are not declared as clock chains, the OCC control bits are considered care bits, and the ATPG runs into trouble when processing patterns because of the high number of care bits, resulting in a huge pattern set.
As a conclusion of this study, it has been shown that the only workable way to get a correct design and a reasonable pattern set is to use only IPs with a dedicated clock chain (OCC) at IP top level. However, issues can still appear. In fact, more and more OCCs are present in designs, and the clock chain can become longer than the regular scan chain length requirements. This would increase the test time. The next part deals with the clock chain length, and proposes an update to avoid such a shift time overhead.
However, designs are today heading towards an all-OCC test design style, implying an increasing number of OCCs, to avoid uncontrolled flip-flops. Common scan chain lengths are around 200 or 300 cells, with a decreasing trend. Given OCCs with 5 control bits each, in a design with a scan chain length of 200, if there are more than 40 OCCs the clock chain becomes longer than the regular chains, impacting shift time.
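The threshold quoted above is easy to check: shift time per pattern is set by the longest chain, clock chain included. A quick calculation, using the numbers from the text:

```python
def shift_cycles(n_occ, scan_len=200, bits_per_occ=5):
    """Shift cycles per pattern = length of the longest chain; the clock
    chain holds bits_per_occ control bits for each OCC in the design."""
    clock_chain = n_occ * bits_per_occ
    return max(scan_len, clock_chain)

print(shift_cycles(40))  # clock chain (200) just matches the regular chains
print(shift_cycles(41))  # one more OCC and shifting is clock-chain bound
print(shift_cycles(60))
```

Past 40 OCCs, every extra OCC adds 5 shift cycles to every pattern, which is the overhead the proposed update aims to remove.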
But, as seen previously, cutting the clock chain into 2 chains basically requires a new pair of scan ports. This part deals with this problem, which will become pressing very soon in coming designs. As an example, a current project contains around 40 OCCs for a 300-cell scan chain length.
One can wonder why a single extra scan port matters so much. At SoC level, a scan port is a pad. Since DfT ports are plugged on functional pads, the number of test pads is limited by the number of functional pads. Dedicating 2 pads just to a pair of OCC clock chains is a very poor usage of this limited resource: plugged into a compressor instead, they would drive around 40 scan chains of 300 scan cells each.
This particularity can be used to reduce the clock chain length in some patterns. The idea is not to statically reduce the control bit chain at instantiation, but to dynamically expose a given number of those 5 cells to the ATPG. Indeed, for half of the pattern set, one can expose only a single control bit to the ATPG and force the others to 0, reducing the clock chain to a fifth of its length.
The following parts propose an update of the current OCC clock chain hierarchy.
(Schematic of the current OCC clock chain: 5 scan flip-flops, Ctrl_bit1 to Ctrl_bit5, chained from the si port to the so port, with a shift clock input and parallel outputs P_out[4:0].)
When processing the first patterns, the ATPG does not need to deal with multi-capture cycles; therefore only 1 control bit of each OCC is used in stuck-at test mode, while the other 4 are set to 0.
This update hides the 4 unused control bits in each OCC during standard pattern generation. Thanks to this, a clock chain with 50 OCCs is reduced from 250 cells to 50 cells for most patterns. For those patterns (in a design with a 200-cell scan chain length), the clock chain no longer impacts the shifting time. In the current ST-Ericsson project, this represents half of the pattern set. In the transition fault model, 2 control bits are enough for most faults, implying a clock chain length of 100 flops, which still does not impact the shift time. Only for the few remaining patterns, including those dealing with memories, will the shift length exceed 200 cycles, slowing the shifting down by a fifth (250 cycles).
This method basically avoids creation of a second clock chain, thus the creation of an extra couple of
scan ports while lowering the impact on the test time.
Stuck-at, single capture cycle: 1 control bit shown to the ATPG tool.
Stuck-at, multi capture cycles: depending on the design, this should be set after some trials, but typically 3 control bits are enough for most multi-cycle faults.
Transition, single launch/capture cycle: 2 control bits shown to the ATPG.
Transition, multi capture cycles: here again, depending on the design, this should be set after some trials; the value giving the best tradeoff between fault coverage and a compact clock chain is to be used.
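These choices can be summarized as a small lookup table. This is a sketch: the mode names are illustrative, the bit counts are the ones listed above, and the multi-cycle values remain design dependent:

```python
# Control bits exposed to the ATPG per pattern type, as listed above.
# The multi-cycle values are "typical" and design dependent; the
# transition multi-cycle entry keeps all 5 bits as a worst case.
CTRL_BITS = {
    "stuckat_single": 1,
    "stuckat_multi": 3,
    "transition_single": 2,
    "transition_multi": 5,
}

def clock_chain_length(mode, n_occs):
    """Clock chain length seen by the ATPG for a given pattern type."""
    return CTRL_BITS[mode] * n_occs

print(clock_chain_length("stuckat_single", 50))     # 50 cells instead of 250
print(clock_chain_length("transition_single", 50))  # 100 cells
```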
The choice of the number of OCC control bits is stored in the WIR control registers (cf. next section) as a 3-bit register: with a standard binary coding (“001” to “101”), this register selects the number of control bits used.
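With the binary coding mentioned above, decoding the 3-bit WIR field is straightforward. A minimal sketch (the function name is illustrative):

```python
def decode_ctrl_bit_count(wir_field):
    """Map the 3-bit WIR register ("001" to "101") to the number of
    OCC control bits exposed to the ATPG (1 to 5)."""
    value = int(wir_field, 2)
    if not 1 <= value <= 5:
        raise ValueError("WIR field must encode 1 to 5 control bits")
    return value

print(decode_ctrl_bit_count("001"))  # 1 control bit (stuck-at basic scan)
print(decode_ctrl_bit_count("101"))  # all 5 control bits
```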
There is no impact at design top level, since everything is handled by the WIR Controller (cf. next section). In the shift clock chain, however, it is quite different, since a bypass must be implemented for each control bit (red wires on Figure 18).
The current design is compacted by using scan flip-flops (mux and sequential cell grouped), and 5 new muxes (in red on Figure 18) are added to decide whether each bit is bypassed. The combinational logic evaluating the bypass enables is grouped inside the i_logic block (top right corner).
One can see on Figure 18 the sel[2:0] signal going through the logic block, transformed into a ctrl[4:0] signal that drives the bypass enables.
(Figure 18 schematic: two of the chained control bits, Ctrl bit 0 and Ctrl bit 1, each a scan flip-flop with a bypass mux driven by the ctrl enables, between the SI and SO ports, with parallel outputs p_out_0 and p_out_1.)
Figure 18: Schematic of the updated OCC clock chain proposal. For clarity, only 2 control bits are shown in the picture; the real design still has 5.
One can conclude that this update proposal modifies neither the pattern set nor the test coverage.
In order to pass the Design Rules Check in TetraMax (the step that verifies the whole test protocol and learns how to deal with the design, and the last step before pattern generation), the SPF should be slightly adapted when not all control bits are used. Leaving the SPF file unchanged will fail. One should remove the extra OCC control bits from the OCC definition (the bold lines in the SPF extract in Table 5). For instance, if only one control bit is used (stuck-at basic scan), all bold lines except the “Cycle 0…” one should be removed, and the PLLCycles parameter adjusted accordingly.
The ATPG tool, while tracing scan chains, will then see nothing but a reduced OCC clock bit chain.
However, the number of control bits in the OCCs is fixed once per test run (depending on the SPF file and the WIR controller test setup). An update of the WIR register must therefore be planned between 2 runs to change the control bit bypass, which can slightly slow down the testing. To check this, the time impact is evaluated on a concrete case below.
The current OCC design adds an extra 50 shift cycles for each pattern (the 250-cell clock chain, 5 bits for each of the 50 OCCs, exceeds the 200-cell regular chains by 50 cycles). These 50*28,000 = 1,400,000 cycles can be avoided by the method described in this study, against about 1,000 cycles of test setup reconfiguration between the single capture cycle patterns and the multi-cycle patterns.
With the current method, the full test time for these patterns would be:
38,000*250 = 9,500,000 cycles for the current OCC implementation
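The saving can be checked with the figures quoted above. This is a sketch assuming 38,000 patterns in total, of which 28,000 can run with the reduced clock chain, and roughly 1,000 reconfiguration cycles between the two pattern groups:

```python
# Shift-cycle comparison, using the figures quoted in the text:
# 250-cycle shift with the full clock chain, 200 with the reduced one.
current = 38_000 * 250  # full clock chain shifted for every pattern

# Reduced-chain flow: 28,000 patterns shift only 200 cycles, the rest
# keep the full 250-cycle shift, plus ~1,000 cycles of WIR reconfiguration.
proposed = (38_000 - 28_000) * 250 + 28_000 * 200 + 1_000

print(current)             # 9,500,000 cycles
print(current - proposed)  # 1,399,000 cycles saved
```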
(Figures 19 and 20: bar charts of test time versus the number of OCC items, for OCC counts ranging from 20 to 146.)
Figure 19: Test time depending on the number of OCCs, in a design with a scan chain length of 120.
Figure 20: Test time depending on the number of OCCs, in a design with a scan chain length of 200.
For both graphs in Figure 19 and Figure 20 (each corresponding to one case), the legend is as follows. Table 6 shows which set of patterns is included in which count (the Sx and Tx refer to Table 3).
Dark blue: the worst case, keeping the current implementation.
Red: achieved when running all single capture stuck-at patterns with an OCC clock chain length of 1.
Green: same as the red curve, plus all 2-capture-cycle transition patterns using an OCC clock chain length of 2.
Purple: same as the green curve, plus all stuck-at multi capture cycle patterns limited to 3 cycles. This implies the loss of a few patterns, but does not impact the fault coverage.
Light blue: the ideal case, if the OCC clock chain is either split or at least shorter than the regular chains.
Above this threshold, the reduction ratio grows with the number of OCCs, up to a 50% reduction for high OCC counts. The red and green lines show the gain without losing any fault coverage, and the purple line provides a nice enhancement while losing a few patterns for stuck-at multi capture cycles.
IV.3.8) Conclusions
The current implementation of the ST-Ericsson OCC implies 5 control bits. Those 5 bits are sometimes all used, but most patterns require only a limited number of capture clock cycles. With the growth of designs, the number of OCCs will rise, and the clock chain (including all OCC control bits) will soon become longer than 200 or 300 cells, exceeding the regular scan chain length requirements. This will increase the shifting time, depending on the number of OCCs in the design.
The proposal offers a solution to this problem by limiting the number of control bits to shift for most patterns, such as simple stuck-at and simple transition patterns. Then, only patterns with multi capture cycles (fast-sequential), mainly targeting memories, will impact the shifting time, and therefore the test time.
With this update proposal, and according to current project values, it has been shown that one can avoid the test time rise (up to 50%) as the number of OCCs grows.
A nice extension of this update would be, by dedicating more bits in the WIR registers, to set each OCC dynamically in a custom way, using 2 control bits for a given OCC and 4 for another, depending on the targeted part of the design (memories versus regular simple logic cones). Moreover, this could be done without modifying the OCC structure any further.
IV.4) Conclusion
As a main conclusion, one can assert that the OCC is a mandatory tool which is more and more used. However, one should take care of how the control bit chains are placed in an IP when it comes to higher-level compression design. Also, an update of the current OCC clock chain was proposed to anticipate the rise of the number of OCCs in designs. It avoids dividing the clock chain into 2 smaller ones when the maximum scan chain length is reached, thus avoiding an extra pair of scan ports at top level. This update has no real impact on the design, but can provide a significant test time reduction.
Table 7 lists the pros and cons of the core-based methodology over a regular “all-at-once” methodology [8].
Traditional:
- ATPG at top level: can only start when the full chip design is done.
- Requires implementation details and knowledge of all cores.
- Very large SoCs are increasingly difficult to handle by test tools.
Core-based:
- Joint responsibility: the core provider handles core-internal DfT and ATPG, the core user handles top-level DfT and ATPG.
- Allows concurrent engineering (“divide & conquer” approach).
- Test development might be sped up by test reuse.
- Test coverage guaranteed at top level.
- Core pattern expansion if no compression is used.
Table 7: Traditional versus core-based test methodology
In functional mode, the wrapper cells are still traversed by data, but the scan cell (SDF cell in Figure 23 and Figure 24) is bypassed. The wrapper feature then only adds a mux (capture mux) in the functional path from the port to the first cell inside the core. In test modes, however, the wrapper cells play two roles:
When the core is tested, all input cells are loaded (through the red scan chain) with the input pattern of the corresponding functional input, and the output cells capture the values that are supposed to be at the corresponding functional output port. In that way, from the core's point of view, only the core is visible, with no interconnection around it.
When the core is not tested, the interconnection around the core can be tested. The input wrapper cells capture upstream values, and the output cells are set, through the wrapper scan chain, to given values (the ATPG's work). In that way, the core appears transparent and the interconnection can be tested.
Figure 23: Wrapper cells in inward facing mode.
Figure 24: Wrapper cells in outward facing mode.
Two test modes are then required for each core: an inward facing wrapper test mode (called wrp_if) for core testing (Figure 23), and an outward facing wrapper test mode (called wrp_of) for interconnection testing (Figure 24). The figures above represent the 2 modes: Figure 23 shows an inward facing wrapper, and Figure 24 an outward facing one.
A single wrapper chain is mentioned above, but for real IPs with several thousands of ports, several wrapper chains are required, because their length has the same impact on test time as the regular core scan chain length. The figures do not represent the internal regular scan chains and scan ports, but those remain as explained before. Internal scan ports are excluded from the wrapper, as are clocks, so as not to disturb the usual behavior of the core. The wrapper feature simply adds controllability and observability over the functional core inputs and outputs.
Usually, core internal scan chains have a compression feature inside each core, to limit the number of scan ports, which makes it impossible to compress them again. However, each IP has around 10 to 20 wrapper chains which are not compressed, and these chains cannot be muxed with other core wrapper chains since they are all used together. For a 3-core chip, this can mean around 50 wrapper chains to handle. A chip top level compression can then be used to limit the pad impact, as shown on Figure 25 (blue wires).
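The port pressure can be illustrated with a quick count. A sketch using the figures above (10 to 20 uncompressed wrapper chains per IP, 3 cores; the 17 chains/core value is illustrative):

```python
def top_level_wrapper_chains(chains_per_core, n_cores):
    """Total uncompressed wrapper chains the chip top level must handle."""
    return chains_per_core * n_cores

# Without top-level compression, each chain would need its own
# scan-in/scan-out pad pair, quickly exhausting the available pads.
chains = top_level_wrapper_chains(17, 3)
print(chains)      # ~50 chains for a 3-core chip
print(chains * 2)  # pads needed if left uncompressed
```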
(Figure 25 schematic: internal scan in/out ports in red, a core_select signal, Core A (wrapped & compressed), and a decompressor/compressor pair for the blue extest wrapper chains.)
Figure 25: Chip top DfT architecture. The red wires are internal scan ports, all muxed together, and the blue ones are extest wrapper scan chains handled in a compression.
Figure 26: IEEE1500 standard global view of the wrapper feature, with the wrapper cells (WBR) and the regular internal scan chains (WPP).
WPP (Wrapper Parallel Port): the regular core internal scan ports.
WSP (Wrapper Serial Port): includes the Wrapper Serial Input/Output (WSI/WSO) and the Wrapper Serial Control (WSC) signals used to drive this architecture. The mandatory WSC signals are:
o WCLK (clock), WRSTN (reset), UpdateWR, ShiftWR, CaptureWR, SelectWIR
The role of each signal will not be detailed here; the reader should refer to the official IEEE1500 documentation [1] if needed. Globally, the data from the WSI port is shifted into one of the following shift registers, which together form the IEEE1500 wrapper block:
WBR (Wrapper Boundary Register): these registers correspond to all the wrapper cells previously presented, handling the functional core ports. All wrapper cell values are shifted through the WSI/WSO ports (limiting the number of wrapper scan chains to 1).
WIR (Wrapper Instruction Register): stores the mode the block is in, i.e. which shift register will shift WSI inputs to WSO values (and then be updated).
WBY (Wrapper Bypass Register): used as a default shift register when the data is not destined to this wrapper block. In fact, the standard allows serially chaining all wrapper blocks from a WSO to a WSI port in order to get a single wrapper serial chain.
WDR (Wrapper Data Register): holds all test control signals, such as those defining test modes, wrapper and compression use, and clock management. A WDR can be of 2 types: “Control”, which sets values at test setup to drive test switches in the whole design, and “Observe”, which can capture global values, mostly from clock management, and unload them through the WSO port. The “Observe” type is less often used.
On the IP considered, there are around 1000 inputs and 1500 outputs, which means a wrapper scan chain of 2500 cells to shift for each pattern. When the remaining scan chains are aligned on a length of 200 or 300 cells, this is clearly not an implementable approach.
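Splitting the 2500-cell wrapper chain onto the internal chain length can be sketched as follows (the target lengths of 250 and 300 cells are illustrative values within the 200-300 range quoted above):

```python
import math

def split_wrapper_chain(n_wrapper_cells, target_chain_length):
    """Number of wrapper chains needed to align on the internal chain
    length, and the resulting balanced chain length."""
    n_chains = math.ceil(n_wrapper_cells / target_chain_length)
    return n_chains, math.ceil(n_wrapper_cells / n_chains)

# 1000 inputs + 1500 outputs = 2500 wrapper cells to distribute
print(split_wrapper_chain(2500, 250))  # (10, 250)
print(split_wrapper_chain(2500, 300))  # (9, 278)
```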
Thus, this wrapper scan chain is usually cut into several smaller chains, aligned on the internal scan chain length and stitched to dedicated scan ports, outside the IEEE1500 rules (usually compressed at chip top level afterwards). This corresponds to what was defined at the beginning of the wrapper section.
Previously called TCB in the IEEE1149.1 boundary scan standard, this module contains shift registers such as the WIR, WBY and WDR, and is commonly called “WIR” or “WIR Controller”. These are clearly misnomers, but the following parts will use these terms for simplicity. Figure 27 shows how it is implemented.
The WIR signals (Figure 27, top left corner) are the same as in the IEEE1500 standard. The “WDR Observe” module is less used. Basically, at test setup, a first sequence (a few clock cycles) is used to shift a code into the WIR block, which decides which shift register will be used next. It is usually the WDR Control, containing from a hundred to a thousand registers to control the whole design (clock management, test modes …). After that, the WIR is set towards the WBY register, and the real chip test can start.
A particularity of the WDR Control shift register is the possibility to use “dynamic control bits”. Instead of setting a static value for the whole test, a dynamic control bit is caught in a scan chain, and its value can then change from one pattern to another.
This module is generated and integrated at RTL level, and is already present in the design when DfT insertion starts. One should note that the WIR test setup must be well defined to enable the right test control switches before the ATPG step, in order to pass the design rule checking step and achieve correct scan chain recognition by the ATPG tool.
However, Synopsys does not support IEEE1500 compliance, and the wrapper instantiated is as simple as in Figure 22. In this way, it is very easy to control the wrapper scan chain length or the number of chains.
The wrapper creates 2 test modes: inward facing and outward facing. Each test mode can have dedicated scan ports, and can also use a different scan chain length. The main point is that inward facing chains are usually included in a core compression feature, to limit the number of scan ports. For outward facing mode, however, these chains have to be carried up to the chip top level, where they will be compressed together with the other core wrapper scan chains.
This nice feature has, however, a set of limitations. For simple wrapper flows it is easy to use, but when it comes to combining it with other features (compression, pipeline, complex compression flows …), either unsupported flows or serious errors arise. These limitations have to be bypassed for current complex designs. This is the aim of the following parts, and also the main work achieved during this master thesis.
VI.1) Requirements
The first assignment was to write a script for DfT Compiler to fit the requirements of Figure 28. Starting from a regular scanned core with regular scan chains (brown block), the idea is to add a wrapper (WBR chains) around the core, with dedicated inward facing (intest) and outward facing (extest) ports (si/o_wbr vs. si/o_wbr_ext) at the core wrapper hierarchy, and a global compression (green blocks) over internal scan and wrapper scan chains at the test compression hierarchy level. Apart from some points, this can be done with Synopsys tools by activating the wrapper and compression flows.
Figure 28: IP wrapper & compression flow requirement, with a double bypass for each wrapper mode and full control over the boundary cell direction.
The figure above shows extra features around the wrapper (WBR) chains which are not compliant with Synopsys tools. These bypass paths are discussed in the coming part.
To avoid such behavior, bypasses were implemented (cf. Figure 28). 3 test control signals are used to provide full flexibility during test. By setting the wir_sel_wbr_chain control bit, the user decides which compression (core or chip level) will handle the WBR (wrapper) chain:
If set to 1 (Figure 29), the chip top compression drives the wrapper chains (wir_bypass_wbr_ext set to 0), and the IP compression sees only the bypass path (wir_bypass_wbr set to 1).
If set to 0 (Figure 30), the IP compression handles the wrapper chains (wir_bypass_wbr set to 0), and the chip top compression only sees the bypass chain (wir_bypass_wbr_ext set to 1).
The bypass includes a flip-flop to avoid a scan chain length of 0 (the ATPG would fail) and a latch to handle a possible clock domain change.
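The two configurations above can be captured in a small helper. A sketch; the signal names are those on Figure 28:

```python
def wbr_bypass_settings(sel_wbr_chain):
    """Static values of the two bypass enables for a given wir_sel_wbr_chain:
    1 -> chip top compression drives the WBR chain, IP sees the bypass;
    0 -> IP compression drives the WBR chain, chip top sees the bypass."""
    return {
        "wir_sel_wbr_chain": sel_wbr_chain,
        "wir_bypass_wbr": 1 if sel_wbr_chain == 1 else 0,
        "wir_bypass_wbr_ext": 0 if sel_wbr_chain == 1 else 1,
    }

print(wbr_bypass_settings(1))  # chip top compression handles wrapper chains
print(wbr_bypass_settings(0))  # IP compression handles wrapper chains
```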
To test the interconnection and a core at the same time, the wrapper cells should be able to switch from intest mode to extest mode, to capture faults either from the core or from the interconnection. The generated WIR block can allow such behavior by setting the test mode control bits (generated by DfT Compiler) to be dynamic. Thanks to this feature, these 2 control bits can change dynamically from one test pattern to another, reversing the wrapper cells' behavior. The 3 control signals above should however be held static, to capture the wrapper chains in either the chip top compression or the IP compression.
Also, global IDDQ test pattern generation should be done with all logic testable, meaning both interconnection and core logic. Activating both at chip top level would cause a failure, since the wrapper chains would be caught in 2 activated compressions. Thus, this cannot be achieved with Synopsys tools.
In fact, at the test compression hierarchy level, a WIR and an OCC were inserted, but the core itself used the old features. Interface modules were created to handle such an architecture. Current IPs, however, are now fully WIR and OCC compliant.
A first pass is made at IP_shell level to insert the wrapper, with control signals coming from above (the WIR Controller at IP_shell_tk level). A test model is built with these 3 test modes to fit the wrapper insertion:
A first test mode, called Internal_scan, includes all scan chains from the IP core (no wrapper consideration).
A second, called My_extest, includes exclusively the wrapper scan chains in extest mode.
The last test mode, called My_intest, includes the wrapper cells in inward facing mode, as well as all core internal scan chains.
As explained before, no bypass can be provided with Synopsys tools. It was therefore implemented as an RTL block (with the red boundary on Figure 31), compiled and inserted after the Synopsys DfT insertion, as a workaround. The wrapper chain extremities are connected to this block to provide the expected result. The 3 extra control signals (wir_sel_wbr_chain, wir_bypass_wbr, wir_bypass_wbr_ext, in gray on Figure 31) are statically driven from a WIR controller at the level above. With these 3 test control signals set to the correct values (resp. “001” in intest and “110” in extest), this modification does not impact anything. However, if the tool can play with the previous TM signals, defined as dynamic control bits in the WIR, it will only affect the wrapper cell capture signal, and reverse the wrapper from inward facing to outward facing dynamically during pattern generation.
Figure 31: wrapper bypass RTL block implemented (inside red boundary).
The second pass is done at IP_shell_tk to implement the compression over the internal scan ports and the inward facing wrapper ports (si_wbr ports). The My_extest scan ports (si_wbr_ext ports) should not be compressed, and are used as another regular scan mode.
The integration should use the IP_shell test model file generated during the previous run, to get information about the sub-block. However, the integration is not that easy, since the test model contains wrapper information and so-called “user-defined test modes” (My_extest, My_intest): 2 limitations which make test model integration impossible. Thus, the test model must be cleaned with Perl scripts to make it pass integration. The main idea is always not to corrupt the test model/protocol files, which is not easy: the structure is not obvious, and a single change can corrupt all the data when the file is read again afterwards (ATPG, simulation …). The main changes were to remove the unused Internal_scan test mode, to modify all the wrapper information (which had to be found out), and to rename the My_intest test mode into Internal_scan, which is the only test mode allowed for test compression integration. Once this test model modification is done, the compression insertion is quick, enabling a regular scan compression flow.
ATPG runs were processed mainly to verify that all clock information was correct, and the fault coverage was checked. Indeed, this IP had been tested before with a very good coverage (around 99.5% of targeted faults), and after the previous steps the coverage should remain unchanged; a lower figure would indicate a problem in the DfT implementation. Finally, the generated patterns were simulated with NCSim to check their correctness.
A WIR block is used to control the test mode signals (TM1, TM2 and TM3), and an OCC controller (black block) is used to control the clock used in the wrapper and the WIR (black lines). The wrapper cells (in the green block, Wrapper Boundary Register) and a compression feature (decompressor and compressor) are both generated in the same DfT insertion run. A WIR serial chain is created from the input ports (wsi), through the WIR Controller at shell level and those inside the core, to the wso output ports. In a similar way, and according to the OCC explanations in a previous part, the shell clock chain is handled in a dedicated global clock chain grouping all OCC control bits (shell OCC + core OCC). To build this single chain, scan cell groups are created, and a scan path is declared through the shell clock chain and dedicated core ports.
This script is fully compliant with Synopsys flows, and provides a cleaner result since no handmade step is required. In this flow, however, the wrapper bypasses were not implemented, in accordance with Synopsys. This means there is no way to make the wrapper state switch dynamically from outward facing to inward facing. Also, at chip top level, the ATPG will provide a set of patterns for each core, plus one to test the interconnection, which can imply a small test time increase. In fact, the wrapper created is directly plugged into the compression feature, and it is not possible to modify the netlist as in the first flow to add bypasses and make the TM test mode signals dynamic. If the TM signals were dynamic, the inputs of the wrapper chains would come randomly from the extest or intest test ports, due to the scan-in multiplexer on the wrapper chains.
Figure 32: IP wrapper & compression flow in a single hierarchy. No bypass is implemented, but the flow is fully compliant with Synopsys tools.
VI.5) Conclusion
Two flows were implemented and tested with the wrapper and compression architecture. The most straightforward was the one using a single extra hierarchy, but it does not allow the bypass implementation. The first flow, using 2 extra hierarchies, is easier to customize and more flexible, allowing the wrapper bypass implementation. For simplicity, the choice was made to automate the simpler one, but the scripts for the bypass implementation remain available for later use.
This IP wrapper & compression flow was a first task to master the tools, and was followed by a real and up-to-date project. This project is the purpose of the next and last section.
Power aware: limit both test and functional nets crossing power borders, to avoid extra logic insertion. Also, an independent test architecture should be implemented for each power domain: for test purposes, these 2 domains must be seen as 2 blocks with no interaction. For debug, one could even imagine testing just one power domain with the other one switched off.
Targeting 2 projects: maybe the most challenging part, the aim is to work for both projects.
The IP will have an extra hierarchy, called IP_shell (light blue on Figure 33), where dedicated test logic will be inserted. The chip project will stop a level below, at the soc_top hierarchy (Figure 35), where the chip top test blocks (TAP_CTRL…) will also be inserted. Below the IP_top level, the last tool run is project dependent. Early in the project roadmap, a test case including the same test specificities was built in Verilog. The test case (initial view on Figure 33) had the same hierarchies, with a couple of logic blocks in each. The design includes a WIR controller and is fully On-Chip Clocking based, with an OCC Controller and a shift clock for test purposes. Serial chains of WIR Controllers are built (red lines), one for each power domain.
(Figure 33 schematic: test case hierarchy with IP_shell (test compression + IEEE1500), IP_top, IP_iso (protection cells), IP_stub (stub), IP_core, IP_vcore_top with scan segments and free flops, and WIR 1500 blocks in the Vcore and Vsafe power domains.)
(Figure 34 schematic: the same hierarchy annotated with scan port counts (12, 32, 10, …), a Modem Core, scan segments per power domain, wrapper cells, and WIR 1500 blocks.)
Figure 34: IP X project DfT requirements, with 2 compressions at different levels and a wrapper feature.
The IP architecture includes 2 compression modules (at IP_stub_shell and IP_shell levels, depending on the power domain) and a wrapper (in pink) to be included in the Vsafe compression. The number of scan ports is given above, for a total of 128 scan ports (64 inputs and 64 outputs). Because the IP project will be integrated in a bigger chip, a core-based test methodology will be applied at chip level. For that purpose, the wrapper discussed in the previous sections, including the bypass paths, will be implemented here. The aim is to deliver a fully tested IP with its set of test patterns, in order to limit the integrator's work.
The coming figure details the chip J project, which is quite similar to the IP X project.
(Figure 35 schematic: chip hierarchy soc_top/soc_iso/soc_vcore with the same scan segments as the IP X project, a TAP_CTRL block, WIR 1500 blocks, and input/output serializers on each scan port group (6, 12, 16, 32, 10 and 5 ports).)
Figure 35: Chip J project DfT requirements, with the same compression architecture as the IP X project, plus serializers at soc_shell level.
In the chip J project, the scan compression structure is the same, but there is no wrapper, since it is pointless in a chip project without other IPs. However, due to the small dimensions of the chip, the number of pads is limited, and thus the number of scan access ports too. To compensate, a serializer feature is added to divide the test ports by 2 (from 128 ports to 64 ports). Also, some chip top level logic (such as the Test Access Port Controller) is inserted, and some logic is scanned too (green scan segments), included in the Vsafe compressor.
Another limitation when using a hierarchical flow and bringing up information from sub-blocks is that no user-defined test mode can be used. The impact of this limitation will clearly appear when integrating the IP X project: a wrapper implies dedicated test modes for the intest and extest modes, as defined in a previous part. This is another limitation to bypass, which will be discussed in the “Integrating IP X project” part.
For the chip J project, serializers have to be inserted to limit the test port impact. Again, this is an issue because the serializer flow does not work in a hybrid flow. The serializers are therefore inserted at another level dedicated to that purpose (soc_shell). This change in the hierarchy was possible because of the early stage of the chip J project. Concerning the IP X project, however, there is no way to change anything, and the architecture must be implemented as described. This will be discussed in the “Targeting chip J project” part.
IP_stub: creating regular chains (300 cells/chain) and stitching internal chains to new ports; using the previously generated IP_core test model; no test mode created.
IP_stub_shell: inserting compression for the Vcore scan chains, excluding the Vsafe chains from compression; using the previously generated IP_stub test model; ScanCompression_mode and Internal_scan test modes used.
IP_iso: just stitching internal chains to new ports; using the previously generated IP_stub_shell test model; ScanCompression_mode and Internal_scan test modes used.
IP_top: creating regular chains (300 cells/chain) for the IP_top cells and stitching internal chains to new ports; using the previously generated IP_iso test model; ScanCompression_mode and Internal_scan test modes used.
Figure 36: X/J project bottom-up flow steps. Each line describes a step with the main actions to be done.
As an example, at IP_stub_shell, no scan replacement is made, since there is no cell in that hierarchy. Then, a compression feature is inserted for all scan chains from the Vcore power domain, namely those coming from IP_vcore_top and IP_stub. The other scan chains are excluded from compression. This step takes as input a netlist and a test model from the previous step (IP_stub). In the new IP_stub_shell design after DfT insertion, 2 test modes are present: ScanCompression_mode, which uses the compression modules, and Internal_scan, which bypasses the compressors and thus contains long scan chains.
After describing all the sub-steps needed to reach the IP_top level, the last one is to proceed to the IP_shell DfT insertion, which is the most challenging. The idea is basically to reuse the IP wrapper & compression flow from the previous part. A single thing differs from the previous flow: the compression does not have to compress all internal scan chains, since some of them (the Vcore and Modem_Core chains) are already handled by the compression at IP_stub_shell.
A first step is then to create, with CTLGen, a fake test model for the IP_top block with all scan ports declared (with a virtual internal scan chain length). Then, by excluding the Vcore and Modem_Core scan chains from compression and taking care of the OCC internal clock chains, the flow is exactly the same as the IP wrapper & compression flow.
However, the generated test protocols will be completely wrong, since no compression information from the IP_stub_shell and Modem_Core compressors is given. A correct test protocol must therefore be built in SPF, taking the generated one as a basis. The compressors for Modem_Core and Vcore are to be inserted, and the resulting file is fully ATPG-compliant.
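The protocol repair described above is a textual post-processing step. The sketch below shows one way such splicing could look, assuming a line-oriented protocol file; the `TestMode`/`Compressor` block names and the SPF-like syntax are purely illustrative, not real Synopsys SPF.

```python
# Illustrative sketch: splice missing compressor declarations into a
# generated test protocol, right after the target test-mode header.
# All block names and syntax here are hypothetical stand-ins for SPF.

def patch_protocol(protocol: str, mode: str, compressor_decls: list) -> str:
    """Insert compressor declarations after the given test-mode header line."""
    out = []
    for line in protocol.splitlines():
        out.append(line)
        # The generated protocol lacks the sub-block compressors; add them
        # back immediately after the mode declaration opens.
        if line.strip() == f'TestMode "{mode}" {{':
            out.extend(f"  {decl}" for decl in compressor_decls)
    return "\n".join(out)

generated = 'TestMode "ScanCompression_mode" {\n  ScanStructures { ... }\n}'
decls = ['Compressor "modem_core_cmp" { ... }',
         'Compressor "vcore_cmp" { ... }']
print(patch_protocol(generated, "ScanCompression_mode", decls))
```

In the real flow this role was played by hand-written SPF edits; the point is only that the generated protocol is taken as a basis and the missing compressor information is inserted mechanically.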
Two parallel tasks have to be achieved first: inserting DfT in Vsafe_shell_iso and in Vcore_shell. This includes inserting the Modem core test model in the Vcore block and scanning both cores with regular scan chains of 300 cells.
Then, the tool is used to insert a wrapper at IP_shell, but all wrapper-dedicated information, ports and nets are removed afterwards, as well as the test modes. The only remaining part is the wrapper cells, in a dedicated module called wrapper_shell. In this module, the same bypasses as in the IP wrapper & compression flow are inserted, so that from the outside the wrapper box looks like a regular scanned block. Also, dedicated intest and extest wrapper scan ports are created on the wrapper box. By creating a simplified test model of this wrapper_shell, it can be handled at IP_shell as common scan chains. Twice the number of wrapper chains will be declared, namely wrapper_intest_x (green chains on Figure 37) and wrapper_extest_x (red chains on Figure 37), so as to use dedicated ports for the intest and extest modes. However, both chains target the same WBR chain, depending on the states of the bypass control signals. Figure 37 explains the modified wrapper behavior.
Only a couple of ports deciding whether the wrapper is in inward-facing or outward-facing mode (test modes and wbr_bypass control signals) have to be plugged manually to the IP_shell WIR controller. After this step, the netlist is updated and ready for inserting all the compression at once (both Vcore and Vsafe with the wrapper chains).
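The intest/extest chain sharing can be summarized by a tiny behavioral model. This is a sketch, not RTL: the signal and chain names (`wbr_bypass_*`, `wbr_chain`, `bypass_ff`) are illustrative, and the real wrapper_shell carries one bypass per declared chain.

```python
# Behavioral sketch of the modified wrapper_shell: the declared
# wrapper_intest_x and wrapper_extest_x chains share one physical WBR
# chain; the bypass controls decide which declared chain actually
# traverses the WBR cells, the other one seeing only a bypass flip-flop.

def active_path(chain: str, wbr_bypass_intest: bool, wbr_bypass_extest: bool) -> str:
    """Return what a shift through the named declared chain goes through."""
    if chain == "wrapper_intest_x":
        return "bypass_ff" if wbr_bypass_intest else "wbr_chain"
    if chain == "wrapper_extest_x":
        return "bypass_ff" if wbr_bypass_extest else "wbr_chain"
    raise ValueError(f"unknown chain: {chain}")

# Internal (inward-facing) configuration: intest chains use the WBR,
# extest chains are bypassed, preserving scan chain continuity.
print(active_path("wrapper_intest_x", False, True))
print(active_path("wrapper_extest_x", False, True))
```

Flipping the two bypass controls gives the outward-facing (extest) configuration, which is why only these control ports need to be wired manually to the WIR controller.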
Figure 37: Modified wrapper (IP_shell). WBR chains are handled in both intest and extest wrapper chains.
In fact, the last run reads the test models from the Vcore_shell and Vsafe_iso blocks, handles their scan chains, and creates the last scan chains in IP_stub and IP_top. Three partitions are defined:
•Vcore partition:
o IP_stub scan segments
o IP_Vcore_top scan chains (from test model)
•Vsafe partition:
o IP_Vsafe_top scan chains (from test model)
o IP_top scan chains
o Wrapper intest chains
•A default partition without compression, just to connect the Modem core scan chains, which are already compressed.
The tool offers the possibility to choose, for each partition, where the DfT logic has to be inserted. For the Vcore compression, by setting it to the IP_stub_shell sub-block, the tool understands and implements the correct architecture. Finally, with all these specifications, the tool is able to insert the requested features and, overall, all test models and test protocols are correct and "almost" ready to use for the ATPG step. The only mandatory step left is the "test setup" generation, needed to start the test correctly. This is a small part of the test protocol stating the first values to apply on which inputs (mostly clocks), as well as the values to be shifted into the WIR controller through the wsi ports.
Global DfT insertion:
•Using the test model files for the Vcore_shell, Vsafe_iso and wrapper blocks.
•Hybrid flow used with partitions:
o Default partition with Modem_core chains, integrating the Modem_core compression feature.
o Vcore partition with IP_stub and Vcore chains. Compression modules located in IP_stub_shell.
o Vsafe partition with Vsafe chains, IP_top and the wrapper block. Compression modules located in IP_shell.
•3 test modes created:
o Internal_scan and its related compressed mode (ScanCompression_mode), including all chains but the external wrapper chains.
o My_extest mode, excluding all core chains and internal wrapper chains (only the extest wrapper chains remaining).
Figure 38: X/J project top-down flow steps. Each line describes a step with the main actions to be done.
First, the bottom-up flow is able to define very precisely, in each block, the clock management and the scan configuration. By inserting everything step by step, it offers a flow setup that is very easy to debug and allows very wide work sharing within a team. However, this flow requires 11 runs of DFT Compiler, each creating a new netlist and a corresponding test model. It requires starting on the modem core with a simplified test model, to hide the compression features inside. Limitations would appear quickly, and test model hacking would soon become tedious. Finally, the test model and test protocol of IP_shell have to be built with pieces of information from everywhere, making it hard to automate. Moreover, the ATPG tool requires very precise information to work correctly, so this test protocol generation is important.
All considered, the second option was chosen, as an easier and faster flow. It is currently being implemented with the real project netlist. Results are good so far.
VII.4) Pipelines
In the previous lines, pipelining was not mentioned for either flow proposal. It is a particularly thorny issue to deal with. In fact, two pipeline stages are required by the specifications between each scan port and the first cells, before any logic, to break the timing paths. The modem core already has one stage inside its block, so two pipeline stages have to be added to all scan ports except those concerning the modem core (84 ports concerned); for those ports, only one stage is required. Synopsys handles pipelines in most cases, but a bug has been detected and reported when using the multi-mode feature at the same time: the tool does not meet the requirements on test-mode-dedicated scan ports. Moreover, the tool is not able to equalize pipeline stages if a sub-block already contains a pipeline.
The idea is then to insert pipeline cells, a particular kind of scan flip-flop, by hand in the netlist, after the tool has inserted the DfT. A Tcl script has been written to perform such an insertion, for any technology. However, since the netlist is modified after test protocol generation, there is no trace that extra pipelines were inserted, and the ATPG tool would fail to recognize the scan chains. The test protocol files therefore have to be modified, adding a couple of lines declaring the presence of the pipelines, and modifying the pipeline clock behavior to pulse whenever necessary. The flow is thus very far from the ideal push-button flow.
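The reason the protocol must declare the pipelines is simple arithmetic: each head (or tail) pipeline stage adds one cycle between a scan port and the chain, so the shift must be padded accordingly. A minimal model, with illustrative numbers only:

```python
# Minimal model of pipeline padding on the scan shift: every pipeline
# stage between the port and the chain adds one shift cycle per pattern.
# Chain and stage counts below are illustrative, not project data.

def shift_cycles(chain_length: int, head_stages: int, tail_stages: int = 0) -> int:
    """Cycles needed to shift one pattern through a pipelined scan chain."""
    return chain_length + head_stages + tail_stages

# Regular port: 2 head pipeline stages required by the specification.
print(shift_cycles(300, 2))  # 302 cycles instead of 300
```

If the ATPG tool is not told about these extra cycles, the shifted values land two cells off and chain tracing fails, which is exactly why the protocol files are edited by hand.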
The second issue, when integrating IP_shell at a higher level (another SoC), is to provide the tool with all the information about wrapper modes and compression modules. However, user-defined multi-mode test models are not supported at integration, and the tool cannot insert compression on the extest ports since it looks as if the IP already contains compression. To avoid this, one would need two test models, one with all
The first trial was to start from scratch, using the CTLGen tool from Synopsys to create a fake test model. The (not negligible) limitation is that CTLGen does not handle OCC definitions: the test model then loses all clock management information. A Perl script was written which, from a CTL generated with CTLGen and a text list with all the synthetic OCC information, produces a test model file with the OCC information. This would however require the user to write all this information himself, which can take quite long when the design contains around 50 OCC controllers.
The next trial was to take as input the full CTL generated by the DFT Compiler run and, thanks to another Perl script, remove all compression information. This script is close to the one used in the first flow for the IP wrapper & compression. It removes all references to the Internal_scan and ScanCompression_mode test modes, and renames the My_extest test mode as Internal_scan, the only one left in the design. No trace of compression remains, and all the OCC information is already in place, untouched.
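The second trial's Perl post-processing can be sketched in a few lines. The CTL fragments below are heavily simplified stand-ins (a real CTL test model uses multi-line blocks); the point is only the order of operations: drop the compressed modes first, then promote My_extest.

```python
# Sketch of the CTL-simplification script (Perl in the project, Python
# here). The CTL lines are illustrative only. Dropped-mode filtering must
# run BEFORE the rename, otherwise the promoted Internal_scan lines would
# be removed too.

DROPPED_MODES = ("Internal_scan", "ScanCompression_mode")

def simplify_ctl(ctl: str) -> str:
    # Remove every line referencing the internal (compressed) test modes.
    kept = [line for line in ctl.splitlines()
            if not any(mode in line for mode in DROPPED_MODES)]
    # Rename the surviving extest mode as the design's only Internal_scan.
    return "\n".join(kept).replace("My_extest", "Internal_scan")

sample = ('ScanStructures Internal_scan { ... }\n'
          'ScanStructures ScanCompression_mode { ... }\n'
          'ScanStructures My_extest { ... }\n'
          'OCCInfo { clk0 clk1 }')
print(simplify_ctl(sample))
```

OCC lines pass through untouched, which is the whole advantage over the CTLGen-based first trial.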
A last option, which will be available soon from Synopsys R&D, is a new feature allowing the user to write a test model for a given test mode. The option is currently under test and does not work correctly yet, but the expected result is the same as with the second trial.
The solution adopted for now is therefore the second one, simplifying a full test model with a Perl script, while waiting for the Synopsys option to be fixed.
At the time this report was written, the project was still under discussion, and the number of pads available for scan purposes was not defined. There is no certainty that a serializer will really be used, as this depends on the chip package. The serializer flow implemented therefore includes a parallel access. This means that, depending on which ports are linked to pads, the chip will use either a parallel mode (Figure 39, red wires) with N scan channels or a serial access (Figure 39, green wires) dividing the test port count by S. In the IP X project, S should be equal to 2. Figure 39 shows the two possible modes.
It should be emphasized that the serializer feature is quite recent, and feedback from projects having embedded a serializer is still limited. Moreover, it divides the test frequency by S, increasing the test time by a factor S. All this means that serializers, even if implemented on the chip, will only be used as a last resort.
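The trade-off above reduces to two multiplications. A minimal sketch, with illustrative channel counts and times (only S = 2 comes from the project):

```python
# Serial-vs-parallel trade-off of the serializer: factor S divides the
# scan port count by S but also the effective shift frequency, so test
# time grows by the same factor S. N and the base time are made up.

def serializer_tradeoff(n_channels: int, base_test_time_ms: float, s: int):
    ports = n_channels // s            # test port count divided by S
    test_time = base_test_time_ms * s  # shift runs S times slower
    return ports, test_time

# IP X project: S should be equal to 2.
ports, time_ms = serializer_tradeoff(16, 10.0, 2)
print(ports, time_ms)  # 8 ports, 20.0 ms
```

This is why the serializer is a last resort: it only pays off when the package genuinely cannot afford the parallel pad count.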
VII.7) Conclusion
This part was the final target of the master thesis at ST-Ericsson. The previous parts aimed at understanding and mastering the tools and all their possibilities, with the ultimate goal of implementing the best flow for these projects. It has been clearly shown that the top-down flow, with the two sub-blocks Vcore and Vsafe, is the better one in terms of flow simplicity, speed and workload sharing, since two engineers can each work on one of the two blocks, while a third one prepares the final top-down flow, using the test model files generated by the sub-block DfT insertions.
It has also been stressed that the Synopsys tools have many limitations, forcing the user to find workarounds for really complex projects. However, Synopsys support is very responsive, constantly asking for feedback and test cases when limitations or errors are raised. A new feature or a bug fix requested by a design team can then quickly become effective in a coming tool release.
This master thesis presented the basic concepts of scan-based test and went through most of the up-to-date DfT features. The Synopsys tool flow has been clarified, showing how indispensable, strong and powerful it is. The tools' limits in complex flows have also been demonstrated. In fact, the test architecture of a design is usually planned depending on constraints such as the number of ports, as well as on functional design specificities. Thus, the tools are sometimes unable to fulfill all the requirements. To overcome these limitations, the user needs to implement smart workarounds to insert the DfT logic. This is what has been done in this document: raising tool limitations and proposing solutions around them.
Two major flows have been proposed in this master thesis. The first one, taking test power limits into account, implements a fully custom flow to wrap IPs, with regular scan compression for the IP internal chains. These wrapped IPs can then easily be integrated at chip top level, compressing the wrapper chains to minimize the scan impact. A flexible architecture, including wrapper bypass chains, has been developed, clearly showing the tools' limitations. The bypass paths allow scan chain continuity for both internal and external wrapper chains, which could eventually allow testing the core and the interconnections at the same time. The flow which was automated took the Synopsys-compliant way, integrating both the compression and the wrapper feature in a single step. It provides a faster setup for the coming IPs to be wrapped.
The second architecture developed dealt with a complex, current DfT design, implementing most of the available features, including compression, wrapper, pipelines and partition flows. A System-on-Chip project, parallel to the IP project, will also implement a serializer to limit the scan port impact, as their number is limited. Some of these features are completely new in the DfT state of the art, and mixing them together did not always lead to the expected results. Deep work has again been done to work around all the tool limitations, finally proposing two flows. No fully Synopsys-compliant flow could be proposed, so some customization and netlist modifications have to be made for both flows. The flow selected was the faster and simpler one, including as few steps as possible. It inserts the wrapper feature in one step, keeping only the wrapper cells, without ports, nets or wrapper specifications. This approach keeps the flow compatible with the complex compression architecture (hybrid flow).
Also, a strong emphasis has been put on clock management, most particularly on the On-Chip Clocking controllers. This was the purpose of section IV, which showed that the OCC clock chains have to be correctly handled in the compression flow. More precisely, all OCC clock chains must be merged into a single chain to provide the best ATPG pattern set; any other combination tested led to a huge increase of the test pattern set. A second study, taking into account the coming growth of designs and the number of clocks to handle in them, proposed an update to improve both the similar ST-Ericsson and Synopsys OCC clock chains. In fact, as the OCC clock chains must be kept in a single chain, the clock chain size will soon exceed the regular scan chain length. Noting that, for a certain amount of test patterns, only part of the OCC control bits are used, the update allows dynamically shortening the clock chain, thereby reducing the shift time during test.
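The benefit of the dynamic shortening can be seen with a simplified shift-time model: per-pattern shift time is set by the longest chain, so once the single merged clock chain outgrows the regular chains, it dominates. All lengths below are made up for illustration (bits per OCC controller vary by implementation).

```python
# Simplified shift-time model behind the OCC clock-chain update: the
# shift length per pattern equals the longest chain in the design. If
# the merged clock chain is longer than the regular 300-cell chains, it
# alone stretches every shift; shortening it to just the control bits a
# pattern actually uses hides it again. Lengths are illustrative.

def shift_len(regular_chain_len: int, clock_chain_len: int) -> int:
    """Shift cycles per pattern = length of the longest chain."""
    return max(regular_chain_len, clock_chain_len)

full_clock_chain = 50 * 8  # e.g. ~50 OCC controllers, 8 control bits each
print(shift_len(300, full_clock_chain))  # 400: the clock chain dominates
print(shift_len(300, 120))               # 300: shortened chain is hidden
```

With the full chain, every pattern pays 100 extra shift cycles; once shortened below the regular chain length, the clock chain costs nothing.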
All in all, this master thesis provided a broad overview of the DfT insertion flow, raised tool limitations, and detailed some workarounds for a core-based test methodology. A close collaboration with Synopsys support allowed constant help and feedback in both directions, and the result is correct and fully functional flows. Moreover, Synopsys R&D is now aware of the raised limitations, which will surely be improved in coming releases.
Netlist & DfT Spec Generation:
•Get a shell netlist with scanned core and OCC inside.
•Configure the config file and the library file.
IEEE1500 WIR Controller Generation:
•Perform WIR block creation (internal tool) and WIR insertion at shell level.
•Run do.all -wir to get a test model file of the WIR block.
Create Core CTL and netlists:
•If no core CTL file exists, run the custom run_ctlgen.tcl to get a CTL file (from the CTLGen tool).
•Run do.all -netlists to create all required netlist files for insertion.
Wrapper & Compression insertion:
•Multimode wrapper and compression insertion at shell level, thanks to DFT Compiler, based on the core, by running do.all -insert.
•=> Full shell test model file + final shell netlist.
Test setup generation:
•Create the test-mode-dependent test setup files (internal tool) + OCC setup by running do.all -spf.
•=> ATPG-ready test protocol files.
ATPG run:
•Run the custom ATPG script (run_tmax.tcl) with the do.all -atpg command.
•=> Patterns ready for simulation.
Figure 40: Full flow developed for the single hierarchy IP Wrapper & Compression flow.