A Case Study of Hierarchical Diagnosis For Core-Ba
Eric Wang
Freescale Semiconductor, 288 ZhuYuan Rd, SuZhou, JiangSu, P.R.C, 215011
Abstract

In this paper, a silicon debug case study is presented in the context of a hierarchical diagnosis flow for core-based SoCs. We discuss (1) how to design a simple core wrapper that supports at-speed test, (2) how to map the failures collected at the chip level to the core level, and (3) how to perform failure analysis and silicon debug under the guidance of diagnosis results.

1 Terminology and Introduction

The terminology used in this paper is briefly discussed below.

SoC: Designs that integrate a complete system onto one chip are called System-on-a-Chip (SoC) designs.

Core: In SoC designs, the IC is often made up of large pre-defined and pre-verified reusable building blocks, or intellectual property (IP) blocks, such as digital logic, processors, memories, and analog and mixed-signal circuits. These building blocks are called cores or embedded cores [1]. To shorten the design cycle and improve system integration efficiency, modules that implement more complex functions and must guarantee high-quality routing and timing are often delivered to the SoC design company as hard IP cores. Typically, a pre-generated and validated test pattern set for a hard IP core is delivered together with the core.

Core-based SoC design has become increasingly popular because designing a multi-million gate system with pre-designed and pre-verified reusable cores is the most effective way to reduce design time and cost. As the complexity of SoCs keeps increasing and process technology keeps shrinking, rapidly diagnosing test failures, analyzing diagnosis results, and identifying the root cause of defects become increasingly important for silicon debug and yield improvement.

Test Access Mechanism (TAM): On-chip hardware infrastructure that transports test stimuli from the SoC pins to the embedded cores. TAMs also transport test responses from the embedded cores to the SoC pins [1].

In prior publications, several TAM architectures were proposed for SoC testing [2-6]. They can be classified into two categories: (1) using existing functional paths inside cores and (2) inserting additional paths outside of cores dedicated to test purposes. It is also possible to use a combination of these approaches. The method proposed in [2] belongs to the first category. Most of the TAMs proposed in the literature belong to the second category, i.e. the test access paths are established outside of the embedded cores. The advantages of using these TAMs include (1) structured DFT, (2) simple control protocols, and (3) easier diagnosis. The major TAM techniques in this category include parallel access [3], serial access [4], Test Bus [5], and TestRAIL [6].

Wrapper: A wrapper is a thin shell around the core that provides an interface between the SoC pins and the core terminals during test [7] [8]. It also allows the test bandwidth to be adjusted, which provides the flexibility to make better use of the limited number of SoC pins available for test.

Controlled by the instruction register, a core can be set into functional mode, test mode, or bypass mode. In test mode, it can be accessed from the TAM through wrapper boundary cells. The functional input, output, and bi-directional terminals of a core connect to these wrapper cells. In addition, if a core has internal scan chains, the scan input and output terminals of the core can also be connected to the wrapper cells. This leads to multiple choices for configuring a wrapper. In the next section we first review the IEEE standard wrapper design and then describe the wrapper used in our SoC.
2 Core Wrapper Design

The IEEE 1500 core wrapper [8] is illustrated in Figure 1.

[...] register is selected. The WBY is intended to provide a minimum-length scan path through the wrapper, so that when several IEEE 1500 wrappers are serially chained together in an SoC, the wrappers that do not require a data register to be accessed can be bypassed with a short scan path through their WSI/WSO terminals.
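
To make the benefit of the bypass register concrete, the following minimal sketch computes the serial scan path length through a chain of wrappers. It assumes a one-bit WBY and uses made-up data register lengths; the `Wrapper` class and the `serial_path_length` function are illustrative names, not part of IEEE 1500 or of the design described in this paper.

```python
from dataclasses import dataclass

WBY_LENGTH = 1  # assumption: the bypass register is a single bit


@dataclass
class Wrapper:
    name: str
    data_register_length: int  # hypothetical length of the selected data register
    bypassed: bool = False


def serial_path_length(chain):
    """Return the number of bits between WSI and WSO across the whole chain."""
    return sum(WBY_LENGTH if w.bypassed else w.data_register_length for w in chain)


if __name__ == "__main__":
    # Four serially chained wrappers; only core1 needs its data register.
    chain = [
        Wrapper("core0", 1500, bypassed=True),
        Wrapper("core1", 1500),
        Wrapper("core2", 1500, bypassed=True),
        Wrapper("core3", 1500, bypassed=True),
    ]
    print(serial_path_length(chain))  # 1503 instead of 6000 without bypass
```

Under these assumptions, bypassing the three idle wrappers shortens the serial path from 6000 bits to 1503 bits.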
Figure 2(b) illustrates the core output side, and Figure 2(c) illustrates the core input side. Note that the internal scan chains of a core can be accessed through the WPP of the wrapper, which is not illustrated in Figure 2.

[Figure 2: DFT wrapper around the IP core logic; signals: wrapper_scan_in, wrapper_scan_out, wrapper_clk, wrapper_se, test_mode]

3 Hierarchical Diagnosis

The hierarchical nature of SoC designs makes hierarchical ATPG and hierarchical diagnosis the most straightforward and effective approaches. Since many prior papers have discussed how to perform hierarchical ATPG for core-based SoC designs, this section focuses only on how to run hierarchical diagnosis. It requires two steps to run hierarchical diagnosis.
The diagnosis flow described above can be applied iteratively from the upper level down to the lower levels if hierarchical cores are used. Essentially, hierarchical diagnosis provides better diagnosis resolution and shorter run time than diagnosing the entire chip.

4 Case Study

In this section, we share our experience with one industrial case study to illustrate the application of the proposed hierarchical diagnosis flow. The design is an SoC that integrates 4 hard IP cores and is manufactured in a 90nm CMOS process. Each core has 27947 scan cells, 46 scan chains, and about 1500 PIs/POs. The entire SoC design has about 7 million gates in total.

Since a hard IP core is typically used in multiple projects, regenerating its test patterns for each project would duplicate effort; instead, we map the IP-level patterns to the chip level. This method not only improves the reusability and reliability of the patterns, but also reduces the pattern debug effort at the SoC level.

The core-level patterns can be mapped to the chip level by applying the flow shown in Figure 3. It also takes advantage of the mapping database proposed in the previous section, but in the opposite direction compared to the diagnosis flow. First, we map the primary IO port names of the hard IP cores to the chip level. Then, we schedule the order in which the cores are tested and determine a power-on setup pattern sequence according to the chip reset sequence and boot-up mode. Finally, the chip-level patterns are obtained by merging the setup patterns and the patterns of each hard IP core, based on their scheduled order.

Figure 3: Test Pattern Integration Flow
[Figure 3 blocks: Hard IP constraints; SoC setup procedure; Fastscan (two instances); Hard IP pins vs. SoC pads mapping list; Hard IP WGL; SoC setup WGL; Mapped Hard IP WGL; SoC WGL]

The first silicon failed the at-speed test. We had to identify the root cause of the test failures as soon as possible and provide guidance to make the design function. Good EDA tools can help engineers run diagnosis at the logic and physical levels, which greatly reduces the effort of silicon debug. In our silicon debug phase, we first applied the proposed hierarchical diagnosis flow to identify the failing IP core. We then used Mentor Graphics's diagnosis tool YieldAssist™ to perform logic diagnosis at the core level.

The input data for YieldAssist include the flattened design model, the core-level ATPG test patterns, and the failure data log translated from the SoC level. The flattened model is generated during ATPG and includes the design netlist, the test signal constraints, and the circuit model of some modules. WGL, parallel STIL, ASCII, and Binary are the acceptable test pattern formats for YieldAssist. Since ATE vendors use different data log formats, the logs must be converted to a unified format that can be read into YieldAssist.

This case demonstrates the benefit of hierarchical diagnosis, which requires only the core-level test patterns and the core-level flattened model. Since the flattened model of a hard IP core is much smaller than that of the SoC, the runtime of YieldAssist is very short. Figure 4 illustrates our flow for diagnosing hard IP cores with YieldAssist.

Figure 4: Core Diagnosis with YieldAssist
[Figure 4 blocks: SoC WGL; chip-level ATE fail data log; core vs. SoC mapping database; hard IP flattened model; hard IP WGL; hard IP level fail data log; YieldAssist (pattern consistency check, failure file consistency check, chain diagnosis, logic diagnosis); diagnosis results: symptom separation, suspect type classification, suspect scoring and ranking, net, cell and pin]
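
The three pattern integration steps described for Figure 3 (port mapping, core scheduling, and merging) can be sketched as follows. This is a simplified illustration and not the Fastscan-based flow itself: it assumes each pattern is a list of per-cycle vectors stored as dictionaries, and all function names, port names, and pad names are hypothetical.

```python
def map_ports(core_pattern, pin_to_pad):
    """Rename the core port names in every vector to chip-level pad names."""
    return [
        {pin_to_pad[pin]: value for pin, value in vector.items()}
        for vector in core_pattern
    ]


def integrate_patterns(setup_pattern, core_patterns, pin_maps, schedule):
    """Merge the setup pattern and the mapped core patterns in scheduled order.

    Returns the chip-level vector list and, for each core, the chip-level
    cycle at which that core's pattern starts.
    """
    chip_pattern = list(setup_pattern)
    start_cycle = {}
    for core in schedule:
        start_cycle[core] = len(chip_pattern)
        chip_pattern.extend(map_ports(core_patterns[core], pin_maps[core]))
    return chip_pattern, start_cycle


if __name__ == "__main__":
    # Toy data: a two-cycle setup sequence and two small core patterns.
    setup = [{"RESET": "1"}, {"RESET": "0"}]
    core_patterns = {
        "ip0": [{"si": "1", "so": "X"}, {"si": "0", "so": "H"}],
        "ip1": [{"si": "1", "so": "L"}],
    }
    pin_maps = {
        "ip0": {"si": "PAD_A", "so": "PAD_B"},
        "ip1": {"si": "PAD_C", "so": "PAD_D"},
    }
    chip, starts = integrate_patterns(setup, core_patterns, pin_maps, ["ip1", "ip0"])
    print(len(chip), starts)
```

Recording the start cycle of each core during merging is one way to keep the cycle-number information that a core vs. SoC mapping database needs.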
In our case study, the YieldAssist diagnosis report highlighted a few particular nets as suspects. We correlated the diagnosis report with the STA report, which indicated that the suspected nets lie on a timing-critical path in the on-chip clock generation module. This tells us that the fault is caused by timing issues on the critical path. However, we still cannot conclude whether the root cause is design-related or process-related.

To further investigate the root cause, we drew shmoo plots over different voltages and clock frequencies. The shmoo plots using an internal on-chip clock and using an external clock are shown in Figures 5(a) and 5(b) respectively.

Comparing the two shmoo plots in Figures 5(a) and 5(b), we found that using the internal on-chip clock causes intermittent failures at 1.15V, whereas using external clocks does not.

To find out why intermittent failures occur at 1.15V with the internal on-chip clock, we used micro probe measurements in the failure analysis laboratory and observed that the clocks do not necessarily shift to the same side at the same time, which caused the phase to change between 2.9ns and 6.6ns. Figures 6(a) and 6(b) show that the phase difference between the launch and capture pulses is not stable around 1.15V. Figure 6(a) shows the minimum value of 3.04ns and Figure 6(b) shows the maximum value of 6.6ns. However, the expected value is around 12.7ns.

[Figures 6(a) and 6(b): micro probe waveforms of the launch pulse, capture pulse, and scan enable]
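
As a rough check of why this phase instability matters for an at-speed test, the launch-to-capture interval can be converted to an equivalent clock frequency. The sketch below assumes that the interval acts as one clock period of the at-speed cycle, which the paper does not state explicitly; the interval values are the ones quoted above, and the function name is illustrative.

```python
def equivalent_frequency_mhz(interval_ns):
    """Convert a launch-to-capture interval in ns to an equivalent frequency in MHz."""
    return 1e3 / interval_ns


if __name__ == "__main__":
    intervals = [
        ("expected", 12.7),
        ("Figure 6(a) minimum", 3.04),
        ("Figure 6(b) maximum", 6.6),
    ]
    for label, interval_ns in intervals:
        freq = equivalent_frequency_mhz(interval_ns)
        print(f"{label}: {interval_ns} ns -> {freq:.1f} MHz")
```

Under that assumption, the shortened intervals correspond to exercising the critical path at roughly two to four times the intended rate, which is consistent with a timing failure on that path.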
Based on these analyses, we can finally draw the following conclusions:

(1) The phase variations of the PLL-generated clock lead to changes in the output pulses of the on-chip clock generator.

(2) The intermittent failures are caused by an unstable launch/capture frequency generated by the on-chip clock generator.

(3) At voltages lower or higher than about 1.15V, the phase difference stabilizes at about 12.7ns, allowing the test to pass.

5 Conclusions

In this paper, we first proposed a hierarchical diagnosis flow in which diagnosis is performed directly at the core level instead of the chip level. This is achieved by maintaining a database that translates between core IO names and chip-level pads, and between core-level and chip-level failure cycle numbers. A silicon debug case study was also given in detail to explain how to correlate diagnosis results with shmoo plots and micro probe measurements to identify the root cause of the design problems.
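
The name and cycle-number translation summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' database: it assumes each core is tested in a contiguous window of chip-level cycles, that a failure is logged as a (pad, cycle) pair, and all names are hypothetical.

```python
from collections import defaultdict


def map_failures_to_core(chip_failures, pad_to_core_port, core_windows):
    """Translate chip-level (pad, cycle) failures to core-level (port, cycle).

    pad_to_core_port: {pad: {core: port}}, since a pad may serve several cores
    core_windows: {core: (start, end)} in chip-level cycles, end exclusive
    """
    per_core = defaultdict(list)
    for pad, chip_cycle in chip_failures:
        for core, (start, end) in core_windows.items():
            ports = pad_to_core_port.get(pad, {})
            if start <= chip_cycle < end and core in ports:
                # Core-level cycle is the offset into that core's test window.
                per_core[core].append((ports[core], chip_cycle - start))
    return dict(per_core)


if __name__ == "__main__":
    pad_to_core_port = {"PAD_B": {"ip0": "so"}, "PAD_D": {"ip1": "so"}}
    core_windows = {"ip1": (2, 3), "ip0": (3, 5)}
    chip_failures = [("PAD_B", 4), ("PAD_D", 2)]
    print(map_failures_to_core(chip_failures, pad_to_core_port, core_windows))
```

The per-core failure lists produced this way play the role of the core-level failure data log that is passed to the diagnosis tool.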