
Software Accelerated Functional Fault Simulation for Data-Path Architectures

1995, 32nd Design Automation Conference

M. Kassab, N. Mukherjee, J. Rajski†, and J. Tyszer
Microelectronics and Computer Systems Laboratory, McGill University, Montreal, Canada, H3A 2A7
†Mentor Graphics Corporation, Wilsonville, OR 97070, USA

Abstract - This paper demonstrates how fault simulation of the building blocks found in data-path architectures can be performed extremely efficiently and accurately by taking advantage of their simple functional models and structural regularity. The technique can be used to accelerate the simulation of those blocks in virtually any fault simulation environment, yielding fault simulation algorithms capable of fault grading in a very demanding BIST environment.

I. INTRODUCTION

Data-path architectures constitute a large portion of the circuits manufactured by the ASIC industry, and are mainly used in high-performance computing systems such as DSP circuits. The proliferation of embedded systems and high-level synthesis is expected to further increase the number of circuits comprising data-paths built from such regular blocks as adders, multipliers, multiplexers, shifters, and register files.

Recently, it has been shown that for circuits with data-path architectures, existing hardware on the chip, such as arithmetic and logic units (ALUs), can be used to successfully perform test pattern generation [1] and test response compaction [2]. Consequently, for a given circuit, a built-in self-test (BIST) scheme can be devised such that the circuit tests itself with virtually no area overhead and no performance degradation [3]. The test is applied at speed, which allows the application of a large number of cycles and increases the probability of detecting dynamic and unmodeled faults. However, in order to assess the quality of a proposed BIST scheme, fault grading has to be used.
This requires fault simulation to be performed for a relatively large number of vectors with no fault dropping, which is very computationally intensive for most known fault simulation techniques.

Existing fault simulators fall into several categories. Most are gate-level simulators, such as PROOFS [4] and HOPE [5], with very efficient structural simulation algorithms and the flexibility to handle any circuit whose structural model is known. However, they do not exploit the functionality of the circuit and its building blocks to reduce simulation time; every circuit is treated as random logic. Gate-level simulators also require the entire circuit to be modeled at the structural level, which precludes simulating circuits that contain blocks modeled only behaviorally. Some simulators [6] model faults functionally at a higher level of abstraction, which enhances performance at the expense of accuracy. Simulators like MOZART [7] allow multilevel simulation, so that different blocks can be modeled at different levels of abstraction. Blocks which are not to be fault-simulated, or which have no gate-level representation, can thus be modeled at a higher level of abstraction.

* This work was supported by a Cooperative Research and Development grant from the Natural Sciences and Engineering Research Council of Canada and Bell-Northern Research.

32nd ACM/IEEE Design Automation Conference. Permission to copy without fee all or part of this material is granted, provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1995 ACM 0-89791-756-1/95/0006 $3.50
More recent developments, like FEHSIM [8], use enhanced scheduling techniques to switch dynamically between levels of abstraction so as to maximize speed without losing accuracy.

The simulation approach presented in this paper exploits the fact that most modules in data-path architectures perform arithmetic operations for which fault-free simulation can be performed very efficiently. The regularity and functionality of many arithmetic structures make it possible to compute the faulty output functionally, without resorting to structural simulation. This is done without loss of accuracy for any fault model; hence, behavioral-level speed is obtained with gate-level accuracy. Memory usage is also drastically reduced, as no netlist has to be instantiated for the module and no values internal to the netlist need to be stored for the different faulty machines.

II. FUNCTIONAL FAULT MODELING IN REGULAR BLOCKS

To analyze the fault coverage of a circuit under a given BIST scheme, fault simulation has to be performed without fault dropping [9]; i.e., the entire fault list has to be simulated for all vectors, making the process very computationally intensive. The techniques presented in this paper exploit features of data-path building blocks to speed up their simulation, and hence make fault simulation of data-path circuits feasible in a BIST environment.

Data-path architectures mainly consist of building blocks such as adders, subtractors, multipliers, and comparators. These blocks have regular structures and simple functionality; hence, their faulty behavior can often be modeled and computed functionally with gate-level accuracy, as will be shown in this section. Fault simulation for blocks with functional fault models can thus be performed almost as fast as functional simulation of the fault-free model. This section examines the modeling of some of these building blocks.
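The cost of grading without fault dropping can be made concrete with a small sketch. The following Python fragment is an illustration, not the authors' implementation; the 4-bit adder target, the fault list, and all helper names are hypothetical. It shows the loop structure whose cost is |faults| × |vectors|, and where a fast functional model of each faulty block plugs in:

```python
# Sketch: fault grading without fault dropping. Every fault remains active
# for every vector, so the cost is |faults| x |vectors|; a fast functional
# model of each faulty block is what makes this loop feasible.
def fault_grade(vectors, faults, good_model, faulty_model):
    """Apply all vectors to all faults; return per-fault detection flags."""
    detected = {f: False for f in faults}
    for v in vectors:
        good = good_model(v)
        for f in faults:                  # no dropping: full list, every vector
            if faulty_model(f, v) != good:
                detected[f] = True        # recorded, but f is NOT removed
    return detected

# Toy target: a 4-bit adder with single stuck-at faults on its input lines.
MASK = 0xF

def good_add(v):
    a, b = v
    return (a + b) & MASK

def faulty_add(fault, v):
    port, bit, stuck = fault              # ('a' or 'b', bit index, stuck value)
    a, b = v
    if port == "a":
        a = (a & ~(1 << bit) | (stuck << bit)) & MASK
    else:
        b = (b & ~(1 << bit) | (stuck << bit)) & MASK
    return (a + b) & MASK

faults = [(p, i, s) for p in "ab" for i in range(4) for s in (0, 1)]
vectors = [(x, y) for x in range(16) for y in range(16)]
flags = fault_grade(vectors, faults, good_add, faulty_add)
coverage = 100.0 * sum(flags.values()) / len(faults)
```

With exhaustive 4-bit vectors every input-line fault is detected; the same loop structure, driven by pseudo-random BIST patterns, is what a fault-grading run performs.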
The fault-free functionality of such a block can be represented by a simple operation, which is invoked for faults external to the module. For faults internal to the module, the faulty output can be computed efficiently by superposing the effect of the fault on the fault-free output. Two examples are covered in detail: a ripple-carry adder and an array multiplier for unsigned numbers. In a similar way, fault models were developed for a number of other building blocks, such as Booth multipliers, ALUs, and multiplexers.

A. Adders, Subtractors, and Comparators

The modeling of an adder is illustrated using a ripple-carry adder (Figure 1), in which each bit-slice is implemented as a full-adder cell. This simulation technique can be applied to a variety of fault models; the single stuck-at model is used in this paper. The uncollapsed stuck-at fault set for the full-adder cell consists of 30 faults; hence, an n-bit adder contains 30n faults (uncollapsed fault set).

[Figure 1: Ripple-carry adder built from full-adder cells FA0 ... FAn-1, with operand inputs A and B, carry-in cin, sum output S, and carry-out cout.]

In the functional fault model, the faulty output due to an internal fault is computed by superposing the fault effect on the fault-free sum. The faulty behavior of a full-adder cell is known; a lookup table models the behavior of the cell by storing its outputs (sum and carry-out) for all 30 faults and all 2^3 = 8 possible input combinations. The table in this example models single stuck-at faults.

Consider an n-bit adder with fault-free sum output S and an internal fault located in bit-slice f. Let s and c be the fault-free sum and carry-out values of bit-slice f, respectively, while s_f and c_f denote the faulty values of s and c. The faulty output of the adder can be computed according to the following theorem:

Theorem 1: The output of the faulty adder is S + (s_f - s)·2^f + (c_f - c)·2^(f+1).

Proof: To superpose the effect of the fault on the output of the bit-slice, bit f of S has to change from s to s_f. This is equivalent to adding (s_f - s) to bit f, or adding (s_f - s)·2^f to S. The carry-out from bit f is an input to the adder formed by bits f+1 to n-1. The difference in the output of that adder, for the faulty machine, is equal to (c_f - c). Hence, the fault effect is realized by adding (c_f - c) to bit f+1, or adding (c_f - c)·2^(f+1) to S. The two effects are superposed, so that (s_f - s)·2^f + (c_f - c)·2^(f+1) is added to S. ∎

Based on Theorem 1, the output of the faulty adder is calculated as follows. First, the variables s, c, s_f, and c_f have to be determined. To look up these values in the table, the three inputs to faulty cell f are calculated: the two input bits to the cell, A_f and B_f, are extracted directly from primary inputs A and B, respectively, and the carry-in bit to the cell is equal to the carry-out of the sum of bit-range 0 to f-1. This is illustrated in Algorithm 1.

adder(fault, x, y, cin)
  case (fault location) of
    external to module:
      return (x + y + cin)
    internal to module:
      f = index of faulty full-adder
      a = x[f]
      b = y[f]
      if (fault in least significant bit-slice)
        c = cin
      else
        c = carry(x[f-1, 0] + y[f-1, 0] + cin)
      fault-free sum of f   = a ⊕ b ⊕ c
      fault-free carry of f = (a ∧ b) ∨ (b ∧ c) ∨ (a ∧ c)
      faulty sum of f   = table_sum(fault, a, b, c)
      faulty carry of f = table_carry(fault, a, b, c)
      d_sum   = (faulty sum of f) - (fault-free sum of f)
      d_carry = (faulty carry of f) - (fault-free carry of f)
      fault-free output = x + y + cin
      correction = (d_sum · 2^f) + (d_carry · 2^(f+1))
      faulty output = fault-free output + correction
      return (faulty output)

Algorithm 1: Binary adder functional fault model

The functional model computes the faulty output in constant time, independent of the adder size. The use of table lookup for the full-adder cell allows fast evaluation, as well as the flexibility to use different fault models.

Subtractors and comparators can be modeled as extensions of the adder. By inverting one of the adder's inputs and feeding a 1 to the carry-in of the least significant bit, the adder is transformed into a 2's-complement subtractor. The functional fault model then only needs to distinguish whether the fault lies on the adder or on the input inverters, and inject the fault accordingly. The comparator, with inputs A and B, is required to check whether A > B. This can be realized by feeding A and B to the negative and positive inputs of the subtractor, respectively; the result bit is the carry-out of the adder, which is 0 when A ≤ B and 1 when A > B.

B. Multipliers

The modeling of a multiplier is illustrated by an array multiplier for unsigned numbers [10] (Figure 2). The multiplier uses an array of carry-save adders to add the partial products; the structure consists of an array of full-adders. The multiplier accepts an m-bit input x and an n-bit input y. The implementation shown contains m(n-1) full-adders; hence, the uncollapsed fault set consists of 30m(n-1) stuck-at faults. Note that the structure of the circuit is very regular except for the last row, where the carry ripples horizontally.
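Theorem 1 and Algorithm 1 translate almost directly into code. The sketch below (Python; my own illustration, not the paper's implementation) builds the full-adder lookup table by injecting stuck-at faults into a cell model, then evaluates the faulty adder in constant time. For brevity only the ten stuck-at faults on the cell's five ports are tabulated, rather than the full 30-fault uncollapsed set; a bit-by-bit structural simulation is included as the reference the functional model must match:

```python
# Full-adder cell with optional stuck-at fault injection on one of its
# five ports: a, b, cin, sum, cout.
def fa(a, b, c, fault=None):
    def inj(name, val):
        return fault[1] if fault is not None and fault[0] == name else val
    a, b, c = inj("a", a), inj("b", b), inj("cin", c)
    s = a ^ b ^ c
    cout = (a & b) | (b & c) | (a & c)
    return inj("sum", s), inj("cout", cout)

# Lookup table: faulty (sum, carry) of the cell for every fault/input combo.
FAULTS = [(line, v) for line in ("a", "b", "cin", "sum", "cout") for v in (0, 1)]
TABLE = {(f, a, b, c): fa(a, b, c, f)
         for f in FAULTS for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def faulty_adder(fcell, fault, x, y, cin=0):
    """Theorem 1: superpose the fault effect of cell `fcell` on x + y + cin."""
    S = x + y + cin                               # fault-free sum
    a, b = (x >> fcell) & 1, (y >> fcell) & 1
    low = (1 << fcell) - 1                        # carry into cell fcell is the
    c = ((x & low) + (y & low) + cin) >> fcell    # carry-out of bits 0..fcell-1
    s_ok, c_ok = fa(a, b, c)
    s_f, c_f = TABLE[(fault, a, b, c)]
    return S + (s_f - s_ok) * 2**fcell + (c_f - c_ok) * 2**(fcell + 1)

# Structural reference: bit-by-bit ripple-carry simulation with the same fault.
def ripple_ref(n, fcell, fault, x, y, cin=0):
    c, S = cin, 0
    for i in range(n):
        s, c = fa((x >> i) & 1, (y >> i) & 1, c, fault if i == fcell else None)
        S |= s << i
    return S | (c << n)
```

The functional model runs in constant time regardless of adder width; for every input, fault site, and port fault it returns the same value as the O(n) structural simulation, which is exactly the claim of Theorem 1 restricted to this fault subset.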
[Figure 2: Array multiplier. Full-adder cells add the partial products x_i·y_j; the product bits are P0 ... Pn+m-1, and the carry ripples horizontally in the last row.]

The fault-free model is a multiplication operation, which is invoked for faults external to the multiplier. For internal faults, the proposed fault model computes the faulty output of the multiplier in the following 3 steps:
1. Determination of the coordinates of the faulty cell.
2. Computation of the three inputs of the faulty cell.
3. Computation of the output of the faulty multiplier.

As with the adder, the faulty output is determined by superposing the fault effect on the correct multiplication result. Given the cell in which the fault is located, the faulty and fault-free outputs of the cell need to be determined before the superposition can be performed. However, for the outputs of the cell to be extracted from the full-adder lookup table, the inputs of the faulty cell first have to be determined.

Let C_i,j denote the full-adder cell with coordinates i and j, where i and j are the column and row numbers, respectively. The coordinates of the faulty cell can be extracted from the fault identifier.

Inputs of faulty cell C_i,j

Each cell C_i,j has two inputs a_i,j and b_i,j, a carry-in input c_i,j, a sum output s_i,j, and a carry-out output e_i,j. Let the sum and carry-out outputs of C_i,j for a fault f located in the cell be s_i,j,f and e_i,j,f, respectively. We also denote the result of the multiplication by p for the fault-free multiplier, and by p_f for the multiplier with fault f. Because the carry ripples horizontally in the last row, the analysis to determine the three inputs of C_i,j is divided into two parts: the first applies to cells in any row but the last, the second to cells in the last row.

First, consider cells in any row except the last. Masks are applied to the inputs x and y such that the value of the desired line can be observed directly on the output of the multiplier with the masked inputs. Hence, the appropriate masks are applied, a functional multiplication is performed, and the specific bit is extracted from the output. Let x[i] represent the value of the i'th bit of x. Also let x_m and y_m be the masked values of inputs x and y, defined by:

x_m[p] = x[p] for i+1 ≤ p ≤ m-1;  x_m[p] = 0 for 0 ≤ p ≤ i.   (1)
y_m[q] = y[q] for 0 ≤ q ≤ j-1;    y_m[q] = 0 for q ≥ j.       (2)

The product of the masked inputs is p_m = x_m·y_m. Bit (i+j) of p_m contains the first input a_i,j to C_i,j; hence, a_i,j = p_m[i+j].

Theorem 2: The input a_i,j to cell C_i,j is the (i+j)-th bit of the product of x_m and y_m, where x_m and y_m are defined by Equations 1 and 2, respectively.

Proof: First, we prove that the value of a_i,j is the same for inputs x_m and y_m as it is for the original inputs x and y. From the structure of the multiplier, a_i,j is not affected by {x[q], q ≤ i} or by {y[r], r ≥ j}; hence, the bits masked in x_m and y_m are not used in computing a_i,j. It remains to show that a_i,j is observed on bit (i+j) of x_m·y_m. The value of a_i,j propagates to bit (i+j) of p_m through a number of full-adder cells whose other two inputs are reduced to zero by the masks applied to x and y. These full-adders therefore become transparent, and the value propagates to the output of the multiplier unchanged. ∎

The output s_i,j of C_i,j can be obtained in the same way by computing the input a_i,j+1 to C_i,j+1. The carry-in signal c_i,j can then be deduced as c_i,j = s_i,j ⊕ a_i,j ⊕ b_i,j = a_i,j+1 ⊕ a_i,j ⊕ b_i,j, where b_i,j = x[i] ∧ y[j].

Now consider cells in the last row. The output of cell C_i,j, where j = m, is s_i,j = p[i+j] = p[i+m]. The input a_i,j for any cell in the last row is computed as described above for the other rows. To calculate c_i,j, the carry-out of cell C_i,j-1 is calculated as in the previous case, by computing the three inputs of C_i,j-1 and evaluating (a_i,j-1 ∧ b_i,j-1) ∨ (a_i,j-1 ∧ c_i,j-1) ∨ (b_i,j-1 ∧ c_i,j-1). The input b_i,j is 0 for the first cell of the last row (i = 0), i.e., b_0,m = 0; for any other cell in the last row (i > 0), b_i,j = s_i,j ⊕ a_i,j ⊕ c_i,j. Given a_i,j, b_i,j, and c_i,j, the faulty sum and carry-out are obtained from the full-adder output lookup table.

Note that table lookup is just one method of determining the cell outputs. If the cell involved is large, the memory required to store its outputs for all internal faults and all input combinations may be impractical; in that case, some other technique, such as gate-level simulation of the cell alone, can be used to determine the faulty and fault-free cell outputs.

Faulty output

Given the sum and carry-out of the faulty cell, the faulty multiplier output is calculated by superposing the fault effect on the fault-free output. Let δ_s and δ_e denote the differences between the faulty and fault-free values of the sum and carry-out outputs of C_i,j, respectively:

δ_s = s_i,j,f - s_i,j
δ_e = e_i,j,f - e_i,j

Theorems 3 and 4 give the effect of changes to s_i,j and e_i,j, respectively, on the multiplication result.

Theorem 3: The difference between the outputs of the faulty and fault-free multipliers, due to the fault effect on the sum output of C_i,j, is δ_s·2^(i+j).

Proof: From the structure of the multiplier, the sum output of cell C_i,j (or any input of C_i,j) is added into the final product, which is essentially a sum of the different rows, at position (i+j). Due to the linearity of the circuit, a change from 0 to 1, or from 1 to 0, can be superposed on the product by adding ±1 at position (i+j), that is, by adding ±2^(i+j) to the fault-free result of the multiplication. ∎

Theorem 4: The difference between the outputs of the faulty and fault-free multipliers, due to the fault effect on the carry-out output of C_i,j, is δ_e·2^(i+j+1).

Proof: The carry-out of C_i,j is an input of C_i,j+1. By Theorem 3, the effect of the change can be superposed on the product at position i+(j+1), that is, by adding ±2^(i+j+1) to the fault-free multiplication. ∎

The effects of changes on both s_i,j and e_i,j, given by Theorems 3 and 4, can be superposed to compute the product of the faulty multiplier:

p_f = p + δ_s·2^(i+j) + δ_e·2^(i+j+1)

The multiplier functional fault model is shown in Algorithm 2. As with the adder, evaluation of the model requires constant time, i.e., its performance is independent of the size of the multiplier. Hence, the advantage of this simulation technique over gate-level simulation increases with larger structures.

multiplier(fault, x, y)
  case (fault location) of
    external to module:
      return (x · y)
    internal to module:
      Determine coordinates i and j of faulty cell C_i,j
      p_m = x[m-1, i+1] · y[j-1, 0]
      if (faulty cell in any row except the last)
        Input a of C_i,j = p_m[i+j]
        Input b of C_i,j = x[i] ∧ y[j]
        Output s of C_i,j = input a of C_i,j+1
        Input c_in of C_i,j = a ⊕ b ⊕ s
      else
        Input a of C_i,j = p_m[i+j]
        Output s of C_i,j = bit (i+j) of x · y
        Compute the 3 inputs of C_i,j-1, as done for C_i,j
        Output c_out of C_i,j-1 = (a ∧ b) ∨ (a ∧ c_in) ∨ (b ∧ c_in)
        Input c_in of C_i,j = c_out of C_i,j-1
        Input b of C_i,j = a ⊕ s ⊕ c_in
      fault-free sum of C_i,j   = a ⊕ b ⊕ c_in
      fault-free carry of C_i,j = (a ∧ b) ∨ (a ∧ c_in) ∨ (b ∧ c_in)
      faulty sum of C_i,j   = table_sum(fault, a, b, c_in)
      faulty carry of C_i,j = table_carry(fault, a, b, c_in)
      d_sum   = (faulty sum of C_i,j) - (fault-free sum of C_i,j)
      d_carry = (faulty carry of C_i,j) - (fault-free carry of C_i,j)
      fault-free output = x · y
      correction = (d_sum · 2^(i+j)) + (d_carry · 2^(i+j+1))
      faulty output = fault-free output + correction
      return (faulty output)

Algorithm 2: Array multiplier functional fault model

III. EXPERIMENTAL RESULTS

The experimental results presented in this section demonstrate the performance of the proposed fault simulation scheme and its applicability to typical data-path architectures. High-level synthesis benchmark circuits were simulated in a computationally demanding BIST environment. The experiments are divided into two parts. First, some basic building blocks are analyzed in Section III.A; this involves simulation benchmarks using the functional fault models, as well as testability results. In the second part, covered in Section III.B, a number of high-level synthesis benchmark circuits are used to evaluate the efficiency of the simulation technique. High-level synthesis benchmark circuits are computing structures whose data-paths comprise building blocks with regular structures. Each of these building blocks can consist of thousands of gates, making the circuits too computationally intensive to simulate at the gate level for a large number of vectors with no fault dropping.

A. Analysis of Building Blocks

Four arithmetic building blocks were simulated with pseudo-random test vectors: a multiplier, an adder, a subtractor, and a comparator. All simulations were run on a Sun SparcStation 5 with 32 MB of RAM, and the results are summarized in Table I. A 16-bit data-path is used for all experiments. Simulation times are provided both for the case in which detected faults are dropped from the list of active faults and for the case in which no fault dropping is performed, i.e., all faults are simulated for all vectors. Complete fault coverage is obtained for all blocks.
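The linearity argument behind Theorems 3 and 4 of Section II.B can be checked independently of the cell-level details. In the sketch below (Python; an illustration of the superposition principle, not the paper's code), a stuck-at fault is injected on the AND gate that produces partial-product bit x[i]·y[j]. Because that line feeds the array at weight 2^(i+j), the faulty product equals the fault-free product plus δ·2^(i+j):

```python
# Structural-style reference: the product as a sum of partial-product bits,
# with the AND gate for bit (i, j) stuck at `stuck`.
def product_pp_fault(x, y, m, n, i, j, stuck):
    total = 0
    for p in range(m):
        for q in range(n):
            bit = ((x >> p) & 1) & ((y >> q) & 1)
            if p == i and q == j:
                bit = stuck               # stuck-at fault on this AND output
            total += bit << (p + q)
    return total

# Functional model: superpose the fault effect on the fault-free product.
def product_functional(x, y, i, j, stuck):
    good_bit = ((x >> i) & 1) & ((y >> j) & 1)
    delta = stuck - good_bit              # -1, 0, or +1
    return x * y + delta * (1 << (i + j))
```

This covers only faults on the partial-product (b) inputs; faults on internal sum and carry lines additionally require the mask-based input extraction of Theorem 2 and the cell lookup table, as in Algorithm 2.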
TABLE I: Simulation of building blocks

Module   | Observation            | Faults | Vectors (100% FC) | CPU, dropping (s) | CPU, no dropping (s)
mul16    | 32-bit product         | 7103   | 280     | 3.2   | 14.3
mul16    | 16 MSB (TRUNC)         | 7008   | 245,096 | 857.2 | 12,512
mul16    | 16 MSB (XOR4)          | 7032   | 661     | 1.7   | 35.5
mul16    | 16 MSB (XOR2)          | 7056   | 320     | 0.9   | 17.9
mul16    | 16 MSB (XOR1)          | 7109   | 280     | 0.8   | 17.0
mul16    | 16 MSB (ADD)           | 7103   | 280     | 0.7   | 14.8
adder16  | 16-bit sum (no carry)  | 456    | 26      | 0.003 | 0.040
sub16    | 16-bit difference      | 522    | 22      | 0.003 | 0.33
cmp16    | 1-bit (carry of adder) | 445    | 178,397 | 39.9  | 244.8
cmp16    | 1-bit (XOR4)           | 463    | 154     | 0.020 | 0.24
cmp16    | 1-bit (XOR2)           | 487    | 85      | 0.011 | 0.17
cmp16    | 1-bit (XOR1)           | 540    | 67      | 0.009 | 0.16

The simulation speed is also expressed as the number of evaluations that can be performed per second for each building block, for both the faulty and fault-free models; the results are shown in Table II. For example, the faulty 16 × 16 multiplier model can be evaluated approximately 160,000 times per second. The time needed to evaluate a fault-free multiplier is approximately one order of magnitude less than that needed to evaluate the functional model of the faulty multiplier. The rate of equivalent gate evaluations expresses the performance achieved with the functional model relative to structural simulation: if the faulty multiplier can be evaluated 160,000 times per second, and its circuit consists of 1200 gates, this is equivalent to simulating 1.92×10^8 gates per second.

TABLE II: Simulation performance of building blocks

Module  | No. gates | Block eval/sec (fault-free) | Block eval/sec (faulty) | Equiv. gate eval/sec (fault-free) | Equiv. gate eval/sec (faulty)
mul16   | 1200 | 1,700,000 | 160,000 | 2,040,000,000 | 192,000,000
adder16 | 80   | 2,400,000 | 353,000 | 198,000,000   | 28,200,000
sub16   | 96   | 3,022,000 | 409,000 | 290,000,000   | 39,300,000
cmp16   | 96   | 2,600,000 | 394,000 | 250,000,000   | 37,800,000

Memory requirements are drastically reduced when functional fault modeling is used. No netlist or internal values need to be stored for the module. Only one copy of the fault model needs to be kept, as well as copies of the memory elements in the circuit (e.g., registers) for all faults. The lookup table for the full-adder cell requires little memory: the cell has 3 single-bit inputs (8 possible input combinations) and 30 internal faults (uncollapsed fault set), so the sum and carry-out bits must be stored for 240 input-fault combinations. If those 2 bits for every combination are stored in one byte of memory, the table requires 240 bytes; if the fault-free outputs are stored in the table as well (instead of being computed from the Boolean equations), 248 bytes are required. This memory can be reduced by a factor of 4 (to 62 bytes) by using all 8 bits of each byte of the array instead of only 2.

Since a 16-bit data-path is used, the 32-bit output of the multiplier is truncated to its most significant 16 bits. The truncation, however, makes many faults in the circuit hard to observe, as their effects have to propagate through many full-adder cells before reaching the observed outputs. This can be seen in the large number of input vectors needed to reach complete fault coverage. A number of modifications can be implemented to increase the observability of most faults in test mode, and hence decrease the test length. In the XOR1 scheme (Figure 3), the 16 least significant bits of the output are XORed together, and the result is fed to the carry-in of the adder chain in the last row of the multiplier, i.e., to the input of C_0,m, which is normally set to 0. The XOR2 and XOR4 schemes are the same as XOR1, except that the number of XOR gates is reduced by XORing every second or fourth bit, respectively; for example, in the XOR4 scheme, the carry-in bit of C_0,m is set to P_4 ⊕ P_8 ⊕ P_12. These modifications reduce the test length by three orders of magnitude. In the ADD scheme, the 16 LSBs are added to the 16 MSBs. The applicability of the ADD scheme depends on how the data-path is implemented and on whether the operation can be performed using existing hardware.

[Figure 3: Array multiplier with XOR gates; selected product bits from P0 ... Pm-1 are XORed and fed to the carry-in of the last-row adder chain.]

The comparator circuit suffers from an observability limitation similar to that of the multiplier. It is implemented as a subtractor, with the carry-out of the adder being observed; hence, fault effects have to propagate to the carry-out output of the adder to be observed. The modification in this case consists of XORing selected output bits of the adder with the output bit. This leads to a reduction in test length of more than three orders of magnitude.

In summary, a number of arithmetic building blocks were simulated with and without fault dropping. The simulation times illustrate the efficiency of the technique. Complete fault coverage can be achieved for all the blocks; however, some blocks feature limited observability and require a large number of test vectors to reach it. Modifications to these blocks can be devised to enhance the observability, and hence drastically decrease the test length required for complete fault coverage.

B. High-Level Synthesis Benchmarks

Two high-level synthesis benchmark circuits are analyzed in this section: an elliptical wave filter and a band-pass filter. The implementation of the elliptical wave filter contains 3 multipliers, 3 adders, and 17 registers. A total of 22,677 faults are injected on the adders and multipliers. For each input vector applied, the circuit performs a total of 34 operations: 26 additions and 8 multiplications. The input vectors are generated using an additive generator, which uses existing adders in the circuit. Compaction is done by converting the indicated addition operation to rotate-carry addition [3].
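Returning to the lookup-table sizing discussed in Section III.A, the packing of the 248 two-bit (sum, carry) entries into 62 bytes can be sketched as follows. This is my own Python illustration; the helper names and the placeholder entry values are not from the paper:

```python
# Sketch of the table packing from Section III.A: 248 two-bit (sum, carry)
# entries -- 30 faults x 8 input combinations plus the 8 fault-free
# combinations -- packed four entries per byte, giving 248/4 = 62 bytes.
N_FAULTS, N_INPUTS = 30, 8
N_ENTRIES = (N_FAULTS + 1) * N_INPUTS     # group 30 holds the fault-free cell

def pack(entries):
    """entries: sequence of (sum_bit, carry_bit) pairs, packed 4 per byte."""
    table = bytearray((len(entries) + 3) // 4)
    for k, (s, c) in enumerate(entries):
        table[k >> 2] |= ((c << 1) | s) << (2 * (k & 3))
    return table

def lookup(table, fault, a, b, cin):
    """Return (sum, carry); fault 0..29 selects a fault, 30 the fault-free cell."""
    k = fault * N_INPUTS + ((a << 2) | (b << 1) | cin)
    pair = (table[k >> 2] >> (2 * (k & 3))) & 0b11
    return pair & 1, pair >> 1

# Demo with placeholder entries; a real table would be filled from the cell's
# faulty and fault-free responses.
entries = [((k ^ (k >> 1)) & 1, (k >> 2) & 1) for k in range(N_ENTRIES)]
packed = pack(entries)
```

Packing four entries per byte trades two shift-and-mask operations per lookup for the factor-of-4 memory reduction noted in the paper.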
Table III indicates the fault coverage achieved and the simulation time required to apply a given number of vectors. Note that for each vector applied, all operations shown in the data flow graph (DFG) are performed (26 additions and 8 multiplications). This is the equivalent of simulating 11,680 gates for each of the 22,678 machines, for every vector applied. The fault coverage when the 16 least significant bits of the multipliers' outputs are truncated does not reach 100%, due to the observability limitation of the multiplier. Full fault coverage is reached for each of the modified circuits within a reasonable test length.

TABLE III: EWF benchmark simulation (fault coverage in %)

No. vectors | CPU time (sec) | TRUNC  | XOR4   | XOR2   | XOR1   | ADD
10          | 10.9           | 71.896 | 84.513 | 89.772 | 93.043 | 95.251
100         | 105            | 87.924 | 99.283 | 99.902 | 99.943 | 99.921
1000        | 1050           | 94.377 | 99.982 | 100.00 | 99.996 | 100.00
10,000      | 10,485         | 98.214 | 100.00 | 100.00 | 100.00 | 100.00
100,000     | 109,898        | 99.781 | 100.00 | 100.00 | 100.00 | 100.00

The second benchmark circuit simulated is the band-pass filter. A total of 15,640 faults are injected on the 2 multipliers, 2 adders, and 1 subtractor. There are 13 registers in the circuit. A total of 29 operations are performed for each input vector: 12 multiplications, 10 additions, and 7 subtractions. The simulation results are shown in Table IV. As with the elliptical wave filter circuit, complete fault coverage is achieved for all cases except when the multiplier output is truncated.

TABLE IV: BPF benchmark simulation (fault coverage in %)

No. vectors | CPU time (sec) | TRUNC  | XOR4   | XOR2   | XOR1   | ADD
10          | 9.8            | 68.097 | 88.766 | 94.275 | 94.397 | 99.258
100         | 98             | 88.647 | 100.00 | 100.00 | 100.00 | 100.00
1000        | 974            | 95.573 | 100.00 | 100.00 | 100.00 | 100.00
10,000      | 10,739         | 99.592 | 100.00 | 100.00 | 100.00 | 100.00
100,000     | ≈100,000       | 99.987 | 100.00 | 100.00 | 100.00 | 100.00

IV. CONCLUSIONS

In this paper, it has been shown that the regularity of several building blocks commonly used in data-path architectures can be exploited to derive accurate functional fault models. The faulty response is typically computed by isolating the fault effect and superposing it on the fault-free result. This leads to very efficient fault simulation of these blocks, reducing the simulation of hundreds or thousands of gates to a few instructions. The technique can be incorporated into a variety of simulation environments to accelerate the fault simulation of regular blocks that lend themselves to this modeling approach. Furthermore, memory usage is significantly reduced, since no netlist needs to be instantiated for the blocks and no internal values need to be stored for the different faulty machines.

REFERENCES

[1] S. Gupta, J. Rajski, and J. Tyszer, "Test Pattern Generation Based on Arithmetic Operations," Proc. ICCAD, pp. 117-124, Nov. 1994.
[2] J. Rajski and J. Tyszer, "Accumulator-Based Compaction of Test Responses," IEEE Trans. on Computers, pp. 643-650, June 1993.
[3] M. Kassab, J. Rajski, and J. Tyszer, "Accumulator-Based Compaction for Built-In Self Test of Data-path Architectures," Proc. 1st Asian Test Symposium, pp. 241-246, Hiroshima, Japan, Nov. 1992.
[4] T. M. Niermann, W. T. Cheng, and J. H. Patel, "PROOFS: A Fast, Memory Efficient Sequential Circuit Fault Simulator," IEEE Trans. on Computer-Aided Design, pp. 198-207, Feb. 1992.
[5] H. K. Lee and D. S. Ha, "HOPE: An Efficient Parallel Fault Simulator for Synchronous Sequential Circuits," Proc. 29th ACM/IEEE Design Automation Conference, pp. 336-340, June 1992.
[6] S. Ghosh, "Behavioral-Level Fault Simulation," IEEE Design and Test of Computers, pp. 31-42, June 1988.
[7] S. Gai, P. L. Montessoro, and F. Somenzi, "MOZART: A Concurrent Multilevel Simulator," IEEE Trans. on Computer-Aided Design, pp. 1005-1016, Sep. 1988.
[8] W. Meyer and R. Camposano, "Fast Hierarchical Multi-Level Fault Simulation of Sequential Circuits with Switch-Level Accuracy," Proc. 30th ACM/IEEE Design Automation Conference, pp. 515-519, 1993.
[9] N. Mukherjee, M. Kassab, J. Rajski, and J. Tyszer, "Arithmetic Built-In Self Test for High-Level Synthesis," Proc. VLSI Test Symposium, 1995.
[10] I. Koren, Computer Arithmetic Algorithms. Prentice Hall, 1993.