A Case Study of Hierarchical Diagnosis for Core-Based SoC
Eric Wang
Freescale Semiconductor, 288 ZhuYuan Rd, SuZhou, JiangSu, P.R.C, 215011
Yu Huang, Wu-Tung Cheng, Wu Yang, and James Fu

Mentor Graphics Corporation, 8005 S.W. Boeckman Rd., Wilsonville, OR 97070, USA
Abstract

In this paper, a silicon debug case study is given in the context of a hierarchical diagnosis flow for core-based SoC. We discuss (1) how to design a simple core wrapper that supports at-speed test, (2) how to map the failures collected from the chip level to core level, and (3) how to perform failure analysis and silicon debug under the guidance of diagnosis results.

1 Terminology and Introduction

The terminology used in this paper is briefly discussed below.

SoC: Designs that integrate a complete system onto one chip are called System-on-a-Chip (SoC) designs.

Core: In SoC designs, the design process involves an IC that is often made up of large pre-defined and pre-verified reusable building blocks or intellectual property (IP) blocks, such as digital logic, processors, memories, and analog and mixed-signal circuits. The IC building blocks are called cores or embedded cores [1]. In order to shorten the design cycle and improve system integration efficiency, modules that have more complex functions and must guarantee high quality of routing and timing are often delivered to the SoC design company as hard IP cores. Typically, a pre-generated and validated test pattern set for a hard IP core is delivered together with the core.

The core-based SoC design strategy has become more and more popular because designing a multi-million gate system with pre-designed and pre-verified reusable cores is the most effective way to reduce design time and cost. As the complexity of SoCs keeps increasing and process technology keeps shrinking, rapidly diagnosing test failures, analyzing diagnosis results and identifying the root cause of defects becomes more and more important for silicon debug and yield improvement.

Test Access Mechanism (TAM): On-chip hardware infrastructure to transport test stimuli from the SoC pins to the embedded cores. TAMs also transport test responses from the embedded cores to the SoC pins [1].

In prior publications, several TAM architectures were proposed for SoC testing [2-6]. They can be classified into two categories: (1) using existing functional paths inside cores and (2) inserting additional paths outside of cores dedicated to test purposes. It is also possible to use a combination of these approaches. The method proposed in [2] belongs to the first category. Most of the TAMs proposed in the literature belong to the second category, i.e. the test access paths are established outside of embedded cores. The advantages of using these TAMs include (1) structured DFT, (2) simple control protocols and (3) easier diagnosis. The major TAM techniques in this category include parallel access [3], serial access [4], Test Bus [5], and TestRAIL [6].

Wrapper: A wrapper is a thin shell around the core that provides an interface between SoC pins and core terminals during test [7] [8]. It also allows the test bandwidth to be adjusted, thus providing the flexibility to better utilize the limited number of SoC pins for test.

Controlled by the instruction register, a core can be set into functional mode, test mode or bypass mode. In test mode, it can be accessed from the TAM through wrapper boundary cells. The functional input / output / bi-directional terminals of a core connect with these wrapper cells. In addition, if a core has internal scan chains, the scan input and output terminals of the core can also be connected to the wrapper cells. This leads to multiple choices for configuring a wrapper. In the next section we first review the IEEE standard wrapper design and then describe the wrapper used in our SoC.
2 Core Wrapper Design

The IEEE 1500 core wrapper [8] is illustrated in Figure 1.

Figure 1: Standard Components of the IEEE 1500 wrapper

Its components are briefly described as follows.

(1) Wrapper Serial Port (WSP) has a set of serial terminals that could be sourced from chip-level pins or from an embedded controller such as an IEEE 1149.1-based (JTAG) controller. The WSP is used to load and unload instructions and data into and out of the IEEE 1500 registers. In addition to the wrapper serial input (WSI) and wrapper serial output (WSO) terminals shown in Figure 1, the WSP contains wrapper serial control (WSC) terminals used to control the operation of all IEEE 1500 registers.

(2) Wrapper Parallel Port (WPP) is a user-defined set of wrapper terminals providing a parallel interface to the IEEE 1500 wrapper. These terminals are used when the wrapper is configured into parallel mode.

(3) Wrapper Instruction Register (WIR) enables all IEEE 1500 wrapper operations. This register is loaded via the WSP with instructions that select an IEEE 1500 data register. The WIR can optionally be interfaced to the core for establishing test mode or functional operation.

(4) Wrapper Bypass Register (WBY) provides a bypass path for the WSI-WSO terminals of the WSP. The WBY is the default data register between WSI and WSO and should be selected by the current wrapper instruction when no other data register is selected. The WBY is intended to provide a minimum length scan path through the wrapper, so that when several IEEE 1500 wrappers are serially chained together in a SoC, the wrappers that do not require a data register to be accessed can be bypassed with a short scan path through their WSI/WSO terminals.

(5) Wrapper Boundary Register (WBR) is the data register through which test data stimuli are applied and pattern responses are captured. This register allows internal testing of the core, as well as testing of external connectivity to other cores and SoC integration circuitry, in response to an instruction loaded into the WIR.

Although the IEEE 1500 wrapper is very flexible for controlling and testing complicated SoCs with many cores, we decided to design a simpler wrapper for our SoC.

(1) We do not need the WBY for our SoC design. Our SoC design has only 4 hard IP cores, and hence the total testing time is not a big problem for us. Therefore we decided to apply serial core testing for simplicity. That is to say, we test one core at a time. When testing a core, its WPP is connected directly to the IO pads at chip level. The core test scheduling becomes very simple.

(2) Instead of using a dedicated WIR, we use a few control signals to directly set the core into test mode or functional mode. In Figure 2(a) we illustrate the conceptual view of the core wrapper used for our SoC. In Figures 2(b) and 2(c), we show the detailed implementation of a wrapper cell. To support launch-by-capture at-speed test, we design our wrapper with 2 flops for each wrapper cell. As can be seen, in functional mode, "test_mode" is set to 0 and the wrapper cell is in transparent mode. The IP signals can pass through the wrapper cell and communicate with other modules in the SoC. In test mode, "test_mode" is set to 1. The core input signals come from port Q of the flip-flops in the wrapper cell, and the core output signals go to port D of the flip-flops. By using two flops per wrapper cell, core IOs can be used as launch points for at-speed test, since the proposed wrapper cells can hold two values in two clock cycles. Under the control of the ATPG test vectors, when "wrapper_se" is 1, these flip-flops are connected as a scan chain to load the input signals and unload the output signals; when "wrapper_se" is 0, the flip-flops are used to update the input signals and capture the output signals.

The above-mentioned simplified wrapper satisfies our test requirements for the design at hand and reduces the silicon area overhead that a complicated IEEE 1500 wrapper would introduce.
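The shift and update behavior of a two-flop wrapper cell can be sketched with a small behavioral model. This is only an illustration under stated assumptions, not the circuit of Figure 2(c): the class name and method names are hypothetical, the two flops are assumed to be connected in series with the second flop driving the core input, and the first flop is assumed to hold its value when "wrapper_se" is 0.

```python
from dataclasses import dataclass


@dataclass
class InputWrapperCell:
    """Behavioral model of one core-input wrapper cell with two scan flops.

    Assumption: f1 feeds f2 and f2.Q drives the core input in test mode.
    The exact wiring in the paper's figure may differ.
    """
    f1: int = 0
    f2: int = 0

    def clock(self, wrapper_se: int, scan_in: int = 0) -> int:
        """Apply one wrapper_clk edge and return the bit shifted out."""
        scan_out = self.f2
        if wrapper_se:
            # Shift mode: the two flops act as a two-bit scan segment.
            self.f2, self.f1 = self.f1, scan_in
        else:
            # Update mode: f2 takes f1's value, which launches a transition
            # at the core input; f1 is assumed to hold its value.
            self.f2 = self.f1
        return scan_out

    def core_input(self, test_mode: int, functional_value: int) -> int:
        """Value seen by the core; the cell is transparent when test_mode is 0."""
        return self.f2 if test_mode else functional_value


cell = InputWrapperCell()
cell.clock(wrapper_se=1, scan_in=0)  # first shifted bit ends up in f2
cell.clock(wrapper_se=1, scan_in=1)  # second shifted bit ends up in f1
before = cell.core_input(test_mode=1, functional_value=0)
cell.clock(wrapper_se=0)             # launch edge: f2 <- f1
after = cell.core_input(test_mode=1, functional_value=0)
```

With the two bits loaded as 0 then 1, the core input changes from `before` to `after` on the launch edge, which is the two-values-in-two-cycles property the text relies on for at-speed test.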
In Figure 2(b), we illustrate the core output side and in Figure 2(c), we illustrate the core input side. Note that the internal scan chains of a core can be accessed through the WPP of the wrapper, which is not illustrated in Figure 2.

Figure 2(a): Conceptual View of Wrapper (a DFT wrapper around the IP core logic; wrapper signals: wrapper_scan_in, wrapper_scan_out, wrapper_clk, wrapper_se, test_mode)

Figure 2(b): Wrapper Cell Circuit at Core Output Side (two scan flops with d, sdi, se, ck and q pins, and a multiplexer controlled by test_mode between the core logic output and the DFT wrapper I/O)

Figure 2(c): Wrapper Cell Circuit at Core Input Side (two scan flops with d, sdi, se, ck and q pins, and a multiplexer controlled by test_mode between the DFT wrapper I/O and the core logic input)

3 Hierarchical Diagnosis

The hierarchical nature of SoC designs makes hierarchical ATPG and diagnosis the most straightforward and effective approaches. Since many prior papers discussed how to perform hierarchical ATPG for core-based SoC designs, in this section we focus only on how to run hierarchical diagnosis. It requires two steps.

The first step is to identify whether the failures come from cores or from glue logic between cores. If the failures come from cores, we need to identify which core(s) caused the failures. This is straightforward if we always maintain a core-level database that indicates the test cycle at a particular pin. This requires maintaining two mapping relations in one database:

(1) The mapping between the primary IO port names at core level and chip level in test mode.

Note that this is a simple task if we always statically allocate a core to a fixed TAM during the entire testing of this particular core. If cores are dynamically allocated to TAMs to achieve optimal test time, as in the methodologies proposed in [9] [10], the mapping relation must also be dynamically tracked. This adds another dimension, test time / cycle number, to the name mapping database.

(2) The mapping between the test cycle numbers at core level and chip level in test mode.

Note that this mapping is related not only to the test scheduling scheme but also to the test setup procedure at the power-on phase or between the testing of two cores. Occasionally a post-shift procedure is applied, which makes the mapping more error-prone. After the test scheduling is done and the test setup procedure is determined, we need to carefully calculate the mapping relation between test cycle numbers at core level and at chip level.

The second step is relatively easy. Suppose we have identified the failing core and translated the failure data log from chip level to core level, based on the information stored in the two mapping relations. We can then run core-level diagnosis as if the core were a failing chip, using the core netlist, the core patterns and the translated core-level failure data log.
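The two mapping relations and the chip-to-core translation of a failure can be sketched as follows. This is a minimal sketch assuming serial core testing with a static pad allocation; the core names, pad names, port names, cycle numbers and the `translate_failure` helper are all hypothetical and are not taken from the design in this paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CoreWindow:
    """Chip-level cycle window during which one core is under test."""
    core: str
    chip_start: int  # first chip-level cycle of this core's patterns
    length: int      # number of core-level cycles applied


# Hypothetical name mapping: (core under test, chip pad) -> core port.
PAD_TO_PORT = {
    ("core_a", "PAD_12"): "scan_out_3",
    ("core_b", "PAD_12"): "scan_out_3",
}

# Hypothetical cycle mapping: setup cycles before and between cores are gaps.
WINDOWS = [
    CoreWindow("core_a", chip_start=100, length=5000),
    CoreWindow("core_b", chip_start=5150, length=5000),
]


def translate_failure(chip_pad: str, chip_cycle: int) -> Optional[Tuple[str, str, int]]:
    """Map one chip-level failure to (core, core_port, core_cycle).

    Returns None when the cycle falls in a setup sequence or the pad is not
    mapped for the core under test, i.e. the failure is not attributed to a core.
    """
    for w in WINDOWS:
        if w.chip_start <= chip_cycle < w.chip_start + w.length:
            port = PAD_TO_PORT.get((w.core, chip_pad))
            if port is None:
                return None
            return (w.core, port, chip_cycle - w.chip_start)
    return None
```

Keying the name mapping on the core under test as well as the pad is one way to let the same pad serve different cores at different times; with dynamic TAM allocation the key would also need a cycle range, as the text notes.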
The above-mentioned diagnosis flow can be applied iteratively from the upper level down to the lower levels if hierarchical cores are used. Essentially, hierarchical diagnosis can provide better diagnosis resolution and shorter run time than diagnosing the entire chip.

4 Case Study

In this section, we share our experience with one industrial case study to illustrate the application of the proposed hierarchical diagnosis flow. The design we manufactured is a SoC integrating 4 hard IP cores, manufactured in a 90 nm CMOS process. Each core has 27947 scan cells, 46 scan chains, and about 1500 PIs/POs. The entire SoC design has about 7 million gates in total.

Since the hard IP cores are used in multiple projects, we avoid duplicating the test pattern generation effort by mapping the IP-level patterns to the chip level. This method not only improves the reusability and reliability of the patterns, but also reduces the pattern debug effort at SoC level.

The core-level patterns can be mapped to chip level by applying the flow shown in Figure 3. It also takes advantage of the mapping database proposed in the previous section, but in the opposite direction compared to the diagnosis flow. First, we map the primary IO port names of the hard IP cores to chip level. Then, we schedule the order of the cores to be tested, and determine a power-on setup pattern sequence according to the chip reset sequence and boot-up mode. Finally, the chip-level patterns are obtained by merging the setup patterns and the patterns of each hard IP core, based on their scheduled order.

Figure 3: Test Pattern Integration Flow (labels in the figure: Hard IP constraints, SoC setup procedure, Fastscan, Hard IP pins vs. SoC pads mapping list, Hard IP WGL, SoC setup WGL, Mapped Hard IP WGL, SoC WGL)

The first silicon failed the at-speed test. We had to identify the root cause of the test failures as soon as possible and provide guidance for making the design function. Good EDA tools can help engineers run diagnosis at the logic and physical levels, which greatly reduces the effort of silicon debug. In our silicon debug phase, we first applied the proposed hierarchical diagnosis flow and identified the failing IP core. Second, Mentor Graphics's diagnosis tool YieldAssist™ was used to perform logic diagnosis at core level.

The input data for YieldAssist include the flattened design model, the core-level ATPG test patterns and the failure data log translated from SoC level. The flattened model is generated during ATPG and includes the design netlist, test signal constraints and the circuit model of some modules. WGL, parallel STIL, ASCII and Binary are the acceptable test pattern formats for YieldAssist. Since ATE vendors use different data log formats, the logs must be converted to a unified format that can be read into YieldAssist.

This case demonstrates the benefit of hierarchical diagnosis, which requires only the core-level test patterns and flattened model for diagnosis. Since the flattened model of a hard IP core is much smaller than that of the SoC, the runtime of YieldAssist is very short. Figure 4 illustrates our flow of diagnosing hard IP cores with YieldAssist.

Figure 4: Core Diagnosis with YieldAssist (labels in the figure: SoC WGL, ATE, Chip level fail data log, Core vs. SoC mapping database, Hard IP flattened model, Hard IP level fail data log, Hard IP WGL; YieldAssist steps: pattern consistency check, failure file consistency check, chain diagnose, logic diagnose; diagnose results: symptom separation, suspect type classification, suspect scoring & ranking, net, cell and pin)
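The scheduling and merging step of the pattern integration flow in Figure 3 determines where each core's patterns start in the chip-level cycle count. The sketch below computes those start cycles for serial, one-core-at-a-time testing. The function name, the optional inter-core setup length and the numbers in the example are assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple


def schedule_cores(
    setup_cycles: int,
    core_cycles: Dict[str, int],
    order: List[str],
    inter_core_setup: int = 0,
) -> Tuple[Dict[str, int], int]:
    """Return ({core: chip_start_cycle}, total_chip_cycles) for serial testing.

    setup_cycles     : length of the power-on setup sequence (runs first)
    core_cycles      : number of core-level cycles in each core's pattern set
    order            : scheduled order of the cores
    inter_core_setup : assumed setup cycles inserted between two cores
    """
    starts: Dict[str, int] = {}
    cursor = setup_cycles
    for i, core in enumerate(order):
        if i > 0:
            cursor += inter_core_setup
        starts[core] = cursor
        cursor += core_cycles[core]
    return starts, cursor


starts, total = schedule_cores(
    setup_cycles=100,
    core_cycles={"core_a": 5000, "core_b": 5000},
    order=["core_a", "core_b"],
    inter_core_setup=50,
)
```

The same start cycles are what a diagnosis flow would subtract from a chip-level failing cycle to recover the core-level cycle, which is why the text describes pattern integration as the mapping database used in the opposite direction.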
In our case study, the YieldAssist diagnosis report highlighted a few particular nets as suspects. We then correlated the diagnosis report with the STA report, which indicated that the suspected nets given by YieldAssist are on a timing-critical path in the on-chip clock generation module. So we know that the fault is caused by timing issues on the critical path. However, we still cannot conclude whether the root cause is design-related or process-related.

To further investigate the root cause, we drew shmoo plots over different voltages and clock frequencies. The shmoo plots using an internal on-chip clock and using an external clock are shown in Figures 5(a) and 5(b) respectively.

Figure 5(a): Shmoo Plot with Internal On-Chip Clock Source

Figure 5(b): Shmoo Plot with External Clock Source

Comparing the two shmoo plots shown in Figures 5(a) and 5(b), we found that using the internal on-chip clock causes intermittent failures at 1.15 V, whereas this does not happen with the external clock.

To find out why we get intermittent failures at 1.15 V when using the internal on-chip clock, we used micro probe measurement in the failure analysis laboratory and observed that the clocks do not necessarily shift to the same side at the same time, which caused the phase to change between 2.9 ns and 6.6 ns. The two pictures in Figures 6(a) and 6(b) show that the phase difference between the launch and capture pulses is not stable around 1.15 V. Figure 6(a) shows the minimum value of 3.04 ns and Figure 6(b) shows the maximum value of 6.6 ns. However, the expected value should be around 12.7 ns.

Figure 6(a): Minimum phase difference (waveforms: launch pulse, capture pulse, scan enable)

Figure 6(b): Maximum phase difference (waveforms: launch pulse, capture pulse, scan enable)
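To put the measured phase differences in perspective, the launch-to-capture interval can be read as the period of the at-speed test, so that a shorter interval means the paths are exercised at a higher effective frequency. This reading is an interpretation added here for illustration; the measurements themselves give only the phase values, and the function name below is hypothetical.

```python
def effective_frequency_mhz(phase_ns: float) -> float:
    """Effective launch/capture frequency in MHz for a given interval in ns."""
    return 1e3 / phase_ns


expected_mhz = effective_frequency_mhz(12.7)    # expected phase difference
min_phase_mhz = effective_frequency_mhz(3.04)   # minimum measured phase
max_phase_mhz = effective_frequency_mhz(6.6)    # maximum measured phase

# How much shorter the minimum measured interval is than the expected one.
speedup_at_min_phase = 12.7 / 3.04
```

Under this reading the expected interval corresponds to roughly 79 MHz, while the measured intervals correspond to roughly 150 to 330 MHz, i.e. the critical paths were given about a quarter to a half of the intended time.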
Based on these analyses, we can finally draw the following conclusions:

(1) The phase variations of the PLL-generated clock lead to changes in the output pulses of the on-chip clock generator.

(2) The intermittent failures are caused by an unstable launch/capture frequency generated by the on-chip clock generator.

(3) At voltages lower or higher than about 1.15 V, the phase difference stabilizes at about 12.7 ns, so the test passes.

(4) The launch/capture pulse frequency leads to the AC scan failures. The timing-critical paths in the tested IP are the paths most sensitive to that frequency.

After identifying the root cause, we examined the PLL carefully. In the design phase we have to use a behavioral model of the PLL for simulation because it is an analog module. The behavioral model may not be as accurate as the synthesis models used for other parts of the same design. This is why we could not detect the pattern failure by simulation during the design phase.

To completely fix the problem, we need to create a better PLL model for future projects. However, for this project, we use an external clock source as a workaround to speed up time-to-market. The conceptual view of the clock path select circuit is illustrated in Figure 7. As can be seen, when external_clk_sel=1 and cgm_en=0, clk_root is driven from the external_scan_clk pad.

Figure 7: Bypass PLL to use External Clock Source (signals in the figure: func_clock_enable, clk from pll, CG, func_clk, ac_test_clk_enable, chip_test_mode, test_clk, external_scan_clk, clk_root; control expressions: !cgm_en | ac_test_clk_enable, and (external_clk_sel & !cgm_en) | (test_clock_mux_sel & cgm_en))

5 Conclusions

In this paper, we first proposed a hierarchical diagnosis flow such that diagnosis can be performed directly at core level instead of chip level. This is achieved by maintaining a database that translates between core IO names and chip-level pads and between core-level and chip-level failure cycle numbers. A silicon debug case study was also given in detail to explain how to correlate diagnosis results with shmoo plots and micro probe measurements to identify the root cause of design problems.

References

[1] Y. Zorian, E.J. Marinissen, and S. Dey, "Testing Embedded Core-Based System Chips," Computer, Vol. 32, Issue 6, pp. 52-60, June 1999.
[2] I. Ghosh, N.K. Jha, and S. Dey, "A Low Overhead Design for Testability and Test Generation Technique for Core-Based System-on-a-Chip," IEEE TCAD, Vol. 18, pp. 1661-1676, Nov. 1999.
[3] V. Immaneni and S. Raman, "Direct Access Test Scheme - Design of Block and Core Cells for Embedded ASICs," pp. 488-492, ITC 1990.
[4] N.A. Touba and B. Pouya, "Testing Embedded Cores Using Partial Isolation Rings," pp. 10-15, VTS 1997.
[5] P. Varma and S. Bhatia, "A Structured Test Re-Use Methodology for Core-Based System Chips," pp. 294-302, ITC 1998.
[6] E.J. Marinissen, R. Arendsen, G. Bos, H. Dingemanse, M. Lousberg, and C. Wouters, "A Structured and Scalable Mechanism for Test Access to Embedded Reusable Cores," pp. 284-292, ITC 1998.
[7] E.J. Marinissen, S.K. Goel, and M. Lousberg, "Wrapper Design for Embedded Core Test," pp. 911-920, ITC 2000.
[8] http://grouper.ieee.org/groups/1500/
[9] V. Iyengar, K. Chakrabarty, and E.J. Marinissen, "Test wrapper and test access mechanism co-optimization for system-on-chip," pp. 1023-1032, ITC 2001.
[10] S. Koranne, "Design of reconfigurable access wrappers for embedded core based SoC test," IEEE Trans. on VLSI Systems, Vol. 11, Issue 5, pp. 955-960, Oct. 2003.
