Module 5
Chapter 16
Fault-Based Testing
A model of potential program faults is a valuable source of information for evaluating and designing
test suites. Some fault knowledge is commonly used in functional and structural testing. Fault-based
testing uses a fault model directly to hypothesize potential faults in a program under test, as well as to
create or evaluate test suites based on their efficacy in detecting those hypothesized faults.
➢ Overview
Engineers study failures to understand how to prevent similar failures in the future. For example, failure
of the Tacoma Narrows Bridge in 1940 led to new understanding of oscillation in high wind and to the
introduction of analyses to predict and prevent such destructive oscillation in subsequent bridge design.
The causes of an airline crash are likewise extensively studied, and when traced to a structural failure they
frequently result in a directive to apply diagnostic tests.
Experience with common software faults sometimes leads to improvements in design methods and
programming languages. For example, the main purpose of automatic memory management in Java is not
to spare the programmer the trouble of releasing unused memory, but to prevent the programmer from
making the kinds of memory management errors (dangling pointers, redundant deallocations, and
memory leaks) that frequently occur in C and C++ programs.
Some faults must be detected through testing, and there too we can use knowledge about common faults
to be more effective. The basic concept of fault-based testing is to select test cases that would
distinguish the program under test from alternative programs that contain hypothetical faults.
Fault seeding can be used to evaluate the thoroughness of a test suite (as an element of a test adequacy
criterion), or for selecting test cases to augment a test suite, or to estimate the number of faults in a
program.
If the program under test has an actual fault, we may hypothesize that it differs from a
corrected program by only a small textual change. If so, then we need merely distinguish the program
from all such small variants to ensure detection of all such faults. This is known as the competent
programmer hypothesis, an assumption that the program under test is close to a correct program.
Some program faults are indeed simple typographical errors, and others involve deeper errors of
logic. Sometimes, though, an error of logic will result in much more complex differences in program text.
This may not invalidate fault-based testing with a simpler fault model, provided test cases sufficient for
detecting the simpler faults are sufficient also for detecting the more complex fault. This is known as the
coupling effect.
Fault-based testing uses the following terminology.
Program location: A region in the source code. The precise definition depends on the syntax of
a particular programming language. Typical locations are statements, arithmetic and Boolean expressions,
and procedure calls.
Alternate expression: Source code text that can be legally substituted for the text at a program location.
A substitution is legal if the resulting program is syntactically correct (i.e., it compiles without errors).
Alternate program: A program obtained from the original program by substituting an alternate
expression for the text at some program location.
Distinct behavior of an alternate program R for a test t: The behavior of an alternate program R is
distinct from the behavior of the original program P for a test t, if R and P produce a different result for t,
or if the output of R is not defined for t.
Distinguished set of alternate programs for a test suite T: A set of alternate programs is distinguished by
a test suite T if each alternate program in the set can be distinguished from the original program by at least one test in T.
Mutant: A program that differs from the original program in one syntactic element (e.g., a statement, a
condition, a variable, a label).
Distinguished mutant: A mutant that can be distinguished from the original program by executing at least
one test case.
Equivalent mutant: A mutant that cannot be distinguished from the original program.
Mutation operator: A rule for producing a mutant program by syntactically modifying the original
program.
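To make these definitions concrete, here is a small sketch, written in Java for brevity (the method names are invented for illustration and are not from the text). It shows a program location, an alternate expression produced by a relational-operator-replacement style mutation operator, and the kind of test case that distinguishes the resulting mutant.

public class MutantDemo {
    // Original code: the program location is the relational expression "pos < len".
    static boolean inRange(int pos, int len) { return pos < len; }

    // Mutant: the alternate expression "pos <= len" is legally substituted
    // at the same location by a relational-operator-replacement style operator.
    static boolean inRangeMutant(int pos, int len) { return pos <= len; }

    public static void main(String[] args) {
        // Only a test at the boundary (pos == len) distinguishes the mutant;
        // any test with pos < len or pos > len leaves it live.
        System.out.println(inRange(10, 10) + " vs " + inRangeMutant(10, 10)); // false vs true
    }
}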
➢ Mutation Analysis
Mutation analysis is the most common form of software fault-based testing. A fault model is used to
produce hypothetical faulty programs by creating variants of the program under test. Variants are created
by seeding faults, that is, by making a small change to the program under test following a pattern in
the fault model. The patterns for changing program text are called mutation operators, and each variant
program is called a mutant.
Mutants should be believable as faulty programs. Mutant programs that are rejected by a compiler, or that
fail almost all tests, are not good models of the faults we seek to uncover with systematic testing.
We say a mutant is valid if it is syntactically correct. A mutant obtained from the program of Figure 1
by substituting while for switch in the statement at line 13 would not be valid, since it would result in a
compile-time error.
We say a mutant is useful if, in addition to being valid, its behavior differs from the behavior of the
original program for no more than a small subset of program test cases. A mutant obtained by
substituting 0 for 1000 in the statement at line 4 would be valid, but not useful, since the mutant would
be distinguished from the program under test by all inputs and thus would not give any useful information
on the effectiveness of a test suite. Defining mutation operators that produce valid and useful mutations
is a nontrivial task.
Figure 1: Program transducer converts line endings among UNIX, DOS, and Macintosh conventions
1
2 /** Convert each line from standard input */
3 void transducer() {
4 #define BUFLEN 1000
5 char buf[BUFLEN]; /* accumulate line into this buffer */
6 int pos=0; /* Index for next character in buffer */
7
8 char inChar; /*Next character from input*/
9
10 int atCR = 0; /* 0="within line", 1="optional DOS LF" */
11
12 while ((inChar = getchar()) != EOF) {
13 switch(inChar) {
14 case LF:
15 if(atCR) { /* optional DOS LF */
16 atCR = 0;
17 } else { /* Encountered LF within line */
18 emit(buf,pos);
19 pos = 0;
20 }
21 break;
22 case CR:
23 emit(buf,pos);
24 pos=0;
25 atCR = 1;
26 break;
27 default:
28 if(pos >= BUFLEN-2) fail("Buffer overflow");
29 buf[pos++] = inChar;
30 } /* switch */
31 }
32 if(pos >0) {
33 emit(buf,pos);
34 }
35 }
Since mutants must be valid, mutation operators are syntactic patterns defined relative to particular
programming languages. Figure 2 shows some mutation operators for the C language. Constraints are
associated with mutation operators to guide selection of test cases likely to distinguish mutants from the
original program. For example, the mutation operator svr (scalar variable replacement) can be applied
only to variables of compatible type (to be valid), and a test case that distinguishes the mutant from the
original program must execute the modified statement in a state in which the original variable and its
substitute have different values.
Many of the mutation operators of Figure 2 can be applied equally well to other procedural languages, but in general
a mutation operator that produces valid and useful mutants for a given language may not apply to a
different language or may produce invalid or useless mutants for another language. For example, a
mutation operator that removes the friend keyword from the declaration of a C++ class would not be
applicable to Java, which does not include friend classes.
Figure 2: A sample set of mutation operators for the C language, with associated constraints to select
test cases that distinguish generated mutants from the original program.
ID    Operator                                    Description                                                   Constraint

Operand Modifications
scr   scalar for constant replacement             replace constant C1 with scalar variable X                    C1 ≠ X
acr   array for constant replacement              replace constant C with array reference A[I]                  C ≠ A[I]
scr   struct for constant replacement             replace constant C with struct field S                        C ≠ S
svr   scalar variable replacement                 replace scalar variable X with a scalar variable Y            X ≠ Y
csr   constant for scalar variable replacement    replace scalar variable X with a constant C                   X ≠ C
asr   array for scalar variable replacement       replace scalar variable X with an array reference A[I]        X ≠ A[I]
ssr   struct for scalar replacement               replace scalar variable X with a struct field S               X ≠ S
car   constant for array replacement              replace array reference A[I] with constant C                  A[I] ≠ C
sar   scalar for array replacement                replace array reference A[I] with scalar variable X           A[I] ≠ X
cnr   comparable array replacement                replace array reference A[I] with a comparable array reference
sar   struct for array replacement                replace array reference A[I] with a struct field S            A[I] ≠ S

Expression Modifications
aor   arithmetic operator replacement             replace arithmetic operator ψ with arithmetic operator φ      e1 ψ e2 ≠ e1 φ e2
lcr   logical connector replacement               replace logical connector ψ with logical connector φ          e1 ψ e2 ≠ e1 φ e2
ror   relational operator replacement             replace relational operator ψ with relational operator φ      e1 ψ e2 ≠ e1 φ e2
cpr   constant for predicate replacement          replace predicate with a constant value

Statement Modifications
sdl   statement deletion                          delete a statement
sca   switch case replacement                     replace the label of one case with another
ses   end block shift                             move } one statement earlier or later
➢ Fault-Based Adequacy Criteria
Given a program and a test suite T, mutation analysis consists of the following steps:
Select mutation operators: If we are interested in specific classes of faults, we may select a set of
mutation operators relevant to those faults.
Generate mutants: Mutants are generated mechanically by applying mutation operators to the original
program.
Distinguish mutants: Execute the original program and each generated mutant with the test cases in T.
A mutant is killed when it can be distinguished from the original program.
Figure 3 shows a sample of mutants for program Transduce, obtained by applying the mutation operators
in Figure 2. Test suite TS kills Mj, which can be distinguished from the original program by test cases 1D, 2U, 2D, and 2M. Mutants
Mi, Mk, and Ml are not distinguished from the original program by any test in TS. We say that mutants not
killed by a test suite are live. A mutant can remain live for two reasons:
• The mutant can be distinguished from the original program, but the test suite T does not contain a
test case that distinguishes them (i.e., the test suite is not adequate with respect to the mutant).
• The mutant cannot be distinguished from the original program by any test case (i.e., the mutant is
equivalent to the original program).
Given a set of mutants SM and a test suite T, the fraction of nonequivalent mutants killed by T
measures the adequacy of T with respect to SM.
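As an illustrative sketch (not part of the text), the following Java fragment computes this adequacy measure from a kill matrix; the class and field names are invented, and judging which mutants are equivalent is assumed to have been done separately.

public class MutationScore {
    // killed[i][j] is true if test j distinguishes mutant i from the original program.
    // equivalent[i] is true if mutant i has been judged equivalent to the original.
    static double adequacy(boolean[][] killed, boolean[] equivalent) {
        int nonEquivalent = 0, killedMutants = 0;
        for (int i = 0; i < killed.length; i++) {
            if (equivalent[i]) continue;              // equivalent mutants are excluded
            nonEquivalent++;
            for (boolean k : killed[i]) {
                if (k) { killedMutants++; break; }    // killed by at least one test in T
            }
        }
        return nonEquivalent == 0 ? 1.0 : (double) killedMutants / nonEquivalent;
    }

    public static void main(String[] args) {
        // Four mutants as in the Transduce example: one equivalent, one killed by a test.
        boolean[][] killed = { {false, false}, {true, false}, {false, false}, {false, false} };
        boolean[] equivalent = { true, false, false, false };
        System.out.printf("adequacy = %.0f%%%n", 100 * adequacy(killed, equivalent));  // 33%
    }
}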
Unfortunately, the problem of identifying equivalent mutants is undecidable in general, and we could err
either by claiming that a mutant is equivalent to the program under test when it is not or by counting some
equivalent mutants among the remaining live mutants.
The adequacy of the test suite TS evaluated with respect to the four mutants of Figure 3 is 25%. However,
we can easily observe that mutant Mi is equivalent to the original program (i.e., no input would distinguish
it). Conversely, mutants Mk and Ml seem to be nonequivalent to the original program: There should be at
least one test case that distinguishes each of them from the original program. Thus the adequacy of TS,
measured after eliminating the equivalent mutant Mi, is 33%.
Mutant Ml is killed by test case Mixed, which represents the unusual case of an input file containing both
DOS- and Unix-terminated lines. We would expect that Mixed would also kill Mk, but this does not
actually happen: Both Mk and the original program produce the same result for Mixed. This happens
because both the mutant and the original program fail in the same way. The use of a simple oracle for
checking the correctness of the outputs (e.g., checking each output against an expected output) would
reveal the fault. The test suite TS2 obtained by adding test case Mixed to TS would be 100% adequate
(relative to this set of mutants) after removing the fault.
Figure 3: A sample set of mutants for program Transduce generated with mutation
operators from Figure 2.

Mk   sdl   line 16   atCR = 0  →  (statement deleted)   not killed by any test case in TS
Ml   ssr   line 16   atCR = 0  →  pos = 0               killed only by test case Mixed

Test case Mixed: a mix of DOS and UNIX line ends in the same file
➢ Variations on Mutation Analysis
The approach described so far, in which mutants are killed only by differences in observable output, is known
as strong mutation. It can generate a number of mutants quadratic in the size of the program. Each mutant must be compiled
and executed with each test case until it is killed. The time and space required for compiling all mutants
and for executing all test cases for each mutant may be impractical.
The computational effort required for mutation analysis can be reduced by decreasing the number of
mutants generated and the number of test cases to be executed. Weak mutation analysis decreases the
number of tests to be executed by killing mutants when they produce a different intermediate state,
rather than waiting for a difference in the final result or observable program behavior.
A meta-mutant program is divided into segments containing original and mutated source code, with a
mechanism to select which segments to execute. Two copies of the meta-mutant are executed in tandem,
one with only original program code selected and the other with a set of live mutants selected. Execution
is paused after each segment to compare the program state of the two versions. If the state is equivalent,
execution resumes with the next segment of original and mutated code. If the state differs, the mutant is
marked as dead, and execution of original and mutated code is restarted with a new selection of live
mutants.
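A minimal sketch of the meta-mutant idea follows (the names are invented and it is not tied to any particular tool): every mutated location is guarded by a mutant selector, so a single compiled program can behave either as the original or as any chosen mutant, and intermediate results can be compared in the style of weak mutation.

public class MetaMutantDemo {
    // Which mutant is active; 0 selects the original code.
    static int activeMutant = 0;

    // Meta-mutant version of a single program location:
    // each mutation of "pos < len" is guarded by a mutant id.
    static boolean inRange(int pos, int len) {
        switch (activeMutant) {
            case 1:  return pos <= len;   // relational-operator mutant
            case 2:  return pos > len;    // relational-operator mutant
            default: return pos < len;    // original code
        }
    }

    public static void main(String[] args) {
        // Weak-mutation style check: compare the intermediate result of the original
        // and each live mutant on the same input, without running to completion.
        int pos = 10, len = 10;
        boolean original = inRange(pos, len);
        for (int m = 1; m <= 2; m++) {
            activeMutant = m;
            boolean mutated = inRange(pos, len);
            activeMutant = 0;
            System.out.println("mutant " + m + (mutated != original ? " killed" : " live"));
        }
    }
}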
If our purpose is only a statistical estimate of the extent to which a test suite distinguishes programs with
seeded faults from the original program, then a much smaller statistical sample of mutants is required. The
main limitations of statistical mutation analysis are that partial coverage is meaningful only to the extent that
the generated mutants are a valid statistical model of occurrence frequencies of actual faults, and that it is
limited to assessment rather than creation of test suites. To avoid reliance on this questionable
assumption, the target coverage should be 100% of the sample; statistical sampling may keep the sample
small enough to permit careful examination of equivalent mutants.
Counting fish: Lake Winnemunchie is inhabited by two kinds of fish, a native trout and an introduced
species of chub. The Fish and Wildlife Service wishes to estimate the populations to evaluate their efforts
to eradicate the chub without harming the population of native trout.
The population of chub can be estimated statistically as follows. 1000 chub are netted, their dorsal fins
are marked by attaching a tag, and then they are released back into the lake. Over the next weeks,
fishermen are asked to report the number of tagged and untagged chub caught. If 50 tagged chub and 300
untagged chub are caught, we can calculate
1000 / (untagged chub population) = 50 / 300

and thus there are about 6000 untagged chub remaining in the lake.
It may be tempting to also ask fishermen to report the number of trout caught and to perform a similar
calculation to estimate the ratio between chub and trout. However, this is valid only if trout and chub are
equally easy to catch, or if one can adjust the ratio using a known model of trout and chub vulnerability to fishing.
Counting residual faults: A similar procedure can be used to estimate the number of faults in a program:
Seed a given number S of faults in the program. Test the program with some test suite and count the
number of revealed faults. Measure the number of seeded faults detected, DS, and the number of
natural faults detected, DN. Estimate the total number of faults remaining in the program, assuming the test
suite is as effective at finding natural faults as it is at finding seeded faults, using the formula
S / (total natural faults) = DS / DN
If we estimate the number of faults remaining in a program by determining the proportion of seeded faults
detected, we must be wary of the pitfall of estimating trout population by counting chub. The seeded faults
are chub, the real faults are trout, and we must either have good reason for believing the seeded faults are
no easier to detect than real remaining faults, or else make adequate allowances for uncertainty.
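The arithmetic can be illustrated with a short sketch; the numbers below are invented for illustration only.

public class ResidualFaultEstimate {
    public static void main(String[] args) {
        int seeded = 100;           // S: faults seeded into the program
        int seededDetected = 20;    // DS: seeded faults revealed by the test suite
        int naturalDetected = 12;   // DN: natural faults revealed by the same suite

        // Assuming the suite is as effective on natural faults as on seeded faults:
        // S / totalNatural = DS / DN  =>  totalNatural = S * DN / DS
        double totalNatural = (double) seeded * naturalDetected / seededDetected;
        System.out.printf("estimated natural faults: %.0f, still undetected: %.0f%n",
                totalNatural, totalNatural - naturalDetected);
    }
}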
Fault-based testing is widely used for semiconductor and hardware system validation and evaluation. It is
used both for evaluating the quality of test suites and for evaluating fault tolerance.
Semiconductor testing has conventionally been aimed at detecting random errors in fabrication, rather
than design faults. Relatively simple fault models have been developed for testing semiconductor memory
devices, the prototypical faults being "stuck-at-0" and "stuck-at-1" (a gate, cell, or pin that produces the
same logical value regardless of inputs).
Test and analysis of logic device designs faces the same problems as test and analysis of software,
including the challenge of devising fault models. Hardware design verification also faces the added
problem that it is much more expensive to replace faulty devices that have been delivered to customers
than to deliver software patches. In evaluation of fault tolerance in hardware, the usual approach is to
modify the state or behavior rather than the system under test.
Chapter 17
Test Execution
Whereas test design, even when supported by tools, requires insight and creativity in similar measure to
other facets of software design, test execution must be sufficiently automated for frequent re-execution
with little human involvement.
➢ Overview
Designing tests is creative; executing them should be as mechanical as compiling the latest version of the
product, and indeed a product build is not complete until it has passed a suite of test cases. In many
organizations, a complete build-and-test cycle occurs nightly, with a report of success or problems ready
each morning.
The purpose of run-time support for testing is to enable frequent hands-free re-execution of a test suite.
A large suite of test data may be generated automatically from a more compact and abstract set of test case
specifications.
For unit and integration testing, and sometimes for system testing as well, the software under test may be
combined with additional scaffolding code to provide a suitable test environment. Executing a large
number of test cases is of little use unless the observed behaviors are classified as passing or failing.
The human eye is a slow, expensive, and unreliable instrument for judging test outcomes, so test
scaffolding typically includes automated test oracles.
A more general test case specification may designate many possible concrete test cases. There is no
clear, sharp line between test case design and test case generation. A rule of thumb is that, while test case
design involves judgment and creativity, test case generation should be a mechanical step.
Automatic generation of concrete test cases from more abstract test case specifications reduces the
impact of small interface changes in the course of development. Corresponding changes to the test suite
are still required with each program change, but changes to test case specifications are likely to be smaller
and more localized than changes to the concrete test cases.
Generating test cases that satisfy several constraints may be simple if the constraints are independent, but it
becomes more difficult to automate when multiple constraints apply to the same item. Some well-formed sets of
constraints have no solution at all (e.g., an even, positive integer greater than 2 that is not the sum of two primes).
Constraints that appear to be independent may not be.
General test case specifications that may require considerable computation to produce test data often arise
in model-based testing. For example, if a test case calls for program execution corresponding to a certain
traversal of transitions in a finite state machine model, the test data must trigger that traversal, which
may be quite complex if the model includes computations and semantic constraints.
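As a sketch of what such generation can involve, the Java fragment below turns a required traversal of a toy finite state machine into a concrete input string; the model, state names, and helper are invented for illustration and are far simpler than a realistic model.

import java.util.List;

public class FsmTestData {
    // A toy model of the transducer: each transition is triggered by one input character
    // (state names and transitions are invented for illustration).
    record Transition(String from, String to, char input) {}

    static final List<Transition> MODEL = List.of(
            new Transition("withinLine", "optionalLF", '\r'),   // CR seen: possible DOS line end
            new Transition("optionalLF", "withinLine", '\n'),   // LF completes a DOS line end
            new Transition("withinLine", "withinLine", 'a'));   // ordinary character

    // Turn a required traversal (a connected list of transitions) into concrete input data.
    static String inputFor(List<Transition> path) {
        StringBuilder data = new StringBuilder();
        String state = path.get(0).from();
        for (Transition t : path) {
            if (!t.from().equals(state))
                throw new IllegalArgumentException("path is not a connected traversal");
            data.append(t.input());
            state = t.to();
        }
        return data.toString();
    }

    public static void main(String[] args) {
        // Test data that exercises the DOS line-end traversal: 'a', CR, LF.
        String input = inputFor(List.of(MODEL.get(2), MODEL.get(0), MODEL.get(1)));
        System.out.println(input.replace("\r", "<CR>").replace("\n", "<LF>"));
    }
}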
➢ Scaffolding
During much of development, only a portion of the full system is available for testing. In modern
development methodologies, the partially developed system is likely to consist of one or more runnable
programs and may even be considered a version or prototype of the final system from very early in
construction, so it is possible at least to execute each new portion of the software as it is constructed.
Code developed to facilitate testing is called scaffolding. Scaffolding may include test drivers
(substituting for a main or calling program), test harnesses (substituting for parts of the deployment
environment), and stubs (substituting for functionality called or used by the software under test), in
addition to program instrumentation and support for recording and managing test execution.
A common estimate is that half of the code developed in a software project is scaffolding of some kind.
The amount of scaffolding that must be constructed with a software project can vary widely, and depends
both on the application domain and the architectural design and build plan, which can reduce cost by
exposing appropriate interfaces and providing necessary functionality in a rational order.
The purposes of scaffolding are to provide controllability to execute test cases and observability to
judge the outcome of test execution. Sometimes scaffolding is required simply to make a module
executable, but even in incremental development with immediate integration of each module, scaffolding
is often needed to provide sufficient control to drive the module through test cases and sufficient
observability to judge their outcomes.
Consider, for example, an interactive program that is normally driven through a graphical user interface.
Assume that each night the program goes through a fully automated and unattended cycle of integration,
compilation, and test execution. It is necessary to perform some testing through the interactive interface,
but it is neither necessary nor efficient to execute all test cases that way. Small driver programs,
independent of the graphical user interface, can drive each module through large test suites in a short time.
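A driver of this kind can be very small. The sketch below pushes a stand-in module through a batch of test cases without any GUI involvement; the module, its expected results, and all names are invented for illustration.

import java.util.List;

public class LineCounterDriver {
    // Module under test (a stand-in for a module normally reached through the GUI).
    static int countLineEnds(String text) {
        return text.split("\r\n|\r|\n", -1).length - 1;
    }

    public static void main(String[] args) {
        // Each test case: an input paired with its expected result.
        List<String> inputs    = List.of("", "one line\n", "a\r\nb\r\n");
        List<Integer> expected = List.of(0, 1, 2);

        int failures = 0;
        for (int i = 0; i < inputs.size(); i++) {
            int actual = countLineEnds(inputs.get(i));
            if (actual != expected.get(i)) {
                failures++;
                System.out.println("FAIL case " + i + ": expected "
                        + expected.get(i) + " but got " + actual);
            }
        }
        System.out.println(failures == 0 ? "all tests passed" : failures + " failures");
    }
}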
At least some level of generic scaffolding support can be used across a fairly wide class of applications.
Such support typically includes basic support for logging test execution and results in addition to a
standard interface for executing a set of test cases. Figure 1 illustrates use of generic test scaffolding in
the JFlex lexical analyzer generator.
Figure 1: Excerpt of JFlex 1.4.1 source code (a widely used open-source scanner generator) and
accompanying JUnit test cases. JUnit is typical of basic test scaffolding libraries, providing support
for test execution, logging, and simple result checking (assertEquals in the example). The illustrated
version of JUnit uses Java reflection to find and execute test case methods; later versions of JUnit
use Java annotation (metadata) facilities, and other tools use source code preprocessors or
generators.
package JFlex.tests;

import JFlex.IntCharSet;
import JFlex.Interval;
import junit.framework.TestCase;
...
public class CharClassesTest extends TestCase {
  ...
  public void testAdd1() {
    IntCharSet set = new IntCharSet(new Interval('a','h'));
    set.add(new Interval('o','z'));
    set.add(new Interval('A','Z'));
    set.add(new Interval('h','o'));
    assertEquals("{ ['A'-'Z'] ['a'-'z'] }", set.toString());
  }

  public void testAdd2() {
    IntCharSet set = new IntCharSet(new Interval('a','h'));
    set.add(new Interval('o','z'));
    set.add(new Interval('A','Z'));
    set.add(new Interval('i','n'));
    assertEquals("{ ['A'-'Z'] ['a'-'z'] }", set.toString());
  }
  ...
}
Fully generic scaffolding may be sufficient for small numbers of hand-written test cases. For larger
test suites and particularly for those that are generated systematically, writing each test case by hand is
impractical. The Java code expressing each test case in Figure 1 follows a simple pattern, and it would
not be difficult to write a small program to convert a large collection of input, output pairs into procedures
following the same pattern. A large suite of automatically generated test cases and a smaller set of hand-
written test cases can share the same underlying generic test scaffolding.
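The sketch below suggests what such a small generator could look like for the JFlex tests of Figure 1; the (input, expected output) pairs, the generated class name, and writing to standard output instead of a file are all invented for illustration.

import java.util.LinkedHashMap;
import java.util.Map;

public class TestCaseGenerator {
    public static void main(String[] args) {
        // (input, expected output) pairs; in practice these would be read from a data file.
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("'a','h'", "{ ['a'-'h'] }");
        cases.put("'o','z'", "{ ['o'-'z'] }");

        // Emit one JUnit-style test method per pair, all following the same pattern.
        StringBuilder src = new StringBuilder();
        src.append("public class GeneratedIntCharSetTest extends TestCase {\n");
        int n = 0;
        for (Map.Entry<String, String> c : cases.entrySet()) {
            src.append("  public void testGenerated").append(n++).append("() {\n")
               .append("    IntCharSet set = new IntCharSet(new Interval(")
               .append(c.getKey()).append("));\n")
               .append("    assertEquals(\"").append(c.getValue())
               .append("\", set.toString());\n  }\n");
        }
        src.append("}\n");
        System.out.print(src);   // would normally be written to a .java source file
    }
}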
The simplest kind of stub, sometimes called a mock, can be generated automatically by analysis of the
source code. A mock is limited to checking expected invocations and producing pre-computed results that
are part of the test case specification or were recorded in a prior execution. Depending on system build
order and the relation of unit testing to integration in a particular process, isolating the module under test
is sometimes considered an advantage of creating mocks, as compared to depending on other parts of the
system that have already been constructed.
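A hand-written version of such a mock might look like the following sketch; the PriceService interface and all names are invented, and real projects often generate equivalent mocks with a library instead of writing them by hand.

import java.util.HashMap;
import java.util.Map;

// Interface the module under test depends on (invented for illustration).
interface PriceService {
    int priceInCents(String sku);
}

// A simple mock: replays pre-computed results and records what was asked for.
class MockPriceService implements PriceService {
    private final Map<String, Integer> cannedResults = new HashMap<>();
    private final Map<String, Integer> invocations = new HashMap<>();

    void expect(String sku, int priceInCents) { cannedResults.put(sku, priceInCents); }

    @Override public int priceInCents(String sku) {
        invocations.merge(sku, 1, Integer::sum);
        Integer price = cannedResults.get(sku);
        if (price == null) throw new AssertionError("unexpected invocation: " + sku);
        return price;
    }

    void verifyCalledOnce(String sku) {
        if (invocations.getOrDefault(sku, 0) != 1)
            throw new AssertionError(sku + " was not called exactly once");
    }
}

public class MockDemo {
    public static void main(String[] args) {
        MockPriceService mock = new MockPriceService();
        mock.expect("CHIPMUNK-101", 129_900);

        // The module under test would receive the mock instead of the real service.
        int total = mock.priceInCents("CHIPMUNK-101");
        mock.verifyCalledOnce("CHIPMUNK-101");
        System.out.println("total = " + total);
    }
}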
➢ Test Oracles
It is little use to execute a test suite automatically if execution results must be manually inspected to apply
a pass/fail criterion. Relying on human intervention to judge test outcomes is not merely expensive, but
also unreliable: even a conscientious and hard-working person cannot maintain the attention required to
identify one failure in a hundred program executions, let alone one in ten thousand. That is a job for a computer.
Software that applies a pass/fail criterion to a program execution is called a test oracle.
Automated test oracles make it possible to rapidly classify a large number of test case executions and to
check behaviors that exceed human capacity in other ways, for example, checking real-time response against
latency requirements or dealing with voluminous output data in a machine-readable rather than human-
readable form.
A test oracle would classify every execution of a correct program as passing and would detect every
program failure. In practice, the pass/fail criterion is usually imperfect. A test oracle may apply a pass/fail
criterion that reflects only part of the actual program specification, and therefore accept some program
executions that it ought to reject. A test oracle may also give false alarms, rejecting an execution that it ought to accept.
False alarms in test execution are highly undesirable, both because of the direct expense of manually checking
them and because real failures may be overlooked among them. The best we can obtain is an oracle that detects deviations from
expectation that may or may not be actual failures.
A test case with a comparison-based oracle relies on predicted output that is either pre-computed as
part of the test case specification or can be derived in some way independent of the program under test.
Pre-computing expected test results is reasonable for a small number of relatively simple test cases, and
is still preferable to manual inspection of program results because the expense of producing predicted
results is incurred once and amortized over many executions of the test case.
Support for comparison-based test oracles is often included in a test harness program or testing
framework, as in Figure 2. A harness typically takes two inputs: (1) the input to the program under test
and (2) the predicted output. Frameworks for writing test cases as program code likewise provide
support for comparison-based oracles. The assertEquals method of JUnit, illustrated in Figure 1, is a
simple example of comparison-based oracle support.
Figure 2: A test harness with a comparison-based test oracle processes test cases consisting of
(program input, predicted output) pairs.
Comparison-based oracles are useful mainly for small, simple test cases, but sometimes expected outputs
can also be produced for complex test cases and large test suites. A related approach is to capture the
output of a trusted alternate version of the program under test. It is not even necessary that the alternative
implementation be more reliable than the program under test, as long as it is sufficiently different.
Oracles that check results without reference to a predicted output are often partial. They check
necessary but not sufficient conditions for correctness. For example, if the specification calls for finding
the optimum bus route according to some metric, a check that the result is a valid route is only a partial oracle
because it does not check optimality. Similarly, checking that a sort routine produces sorted output is
simple and cheap, but it is only a partial oracle because the output is also required to be a permutation of
the input.
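For instance, a partial oracle for a sort routine might combine the two checks just mentioned. The sketch below is illustrative only; all names are invented.

import java.util.Arrays;

public class SortOracle {
    // Partial oracle: a necessary condition only (output is in nondecreasing order).
    static boolean isSorted(int[] out) {
        for (int i = 1; i < out.length; i++)
            if (out[i - 1] > out[i]) return false;
        return true;
    }

    // Second partial check: the output is a permutation of the input.
    static boolean isPermutationOf(int[] in, int[] out) {
        int[] a = in.clone(), b = out.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        int[] input = {3, 1, 2};
        int[] output = {1, 2, 3};            // result of the sort routine under test
        System.out.println(isSorted(output) && isPermutationOf(input, output)
                ? "pass" : "fail");
    }
}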
➢ Self-Checks as Oracles
A program or module specification describes all correct program behaviors, so an oracle based on a
specification need not be paired with a particular test case. Instead, the oracle can be incorporated into the
program under test, so that it checks its own work, as in Figure 3. In general these self-checks are in the
form of assertions, similar to assertions used in symbolic execution and program verification but designed
to be checked during execution.
The run-time cost of assertion checks may be too high to tolerate in every execution, so most tools for assertion processing also provide
controls for activating and deactivating assertions. It is generally considered good design practice to make
assertions and self-checks be free of side-effects on program state. Side-effect free assertions are essential
when assertions may be deactivated, because otherwise suppressing assertion checking can introduce
program failures that appear only when one is not testing.
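Java's built-in assert statement illustrates both points: checks can be switched on and off at launch time, and they must be side-effect free so that behavior does not depend on whether they run. The class and invariant below are invented for illustration.

public class AssertionDemo {
    private int balance = 0;

    void deposit(int amount) {
        // Side-effect-free self-checks: evaluating the conditions does not change state,
        // so the program behaves identically whether or not assertions are enabled.
        assert amount > 0 : "deposit must be positive";
        balance += amount;
        assert balance >= 0 : "balance invariant violated";
    }

    public static void main(String[] args) {
        AssertionDemo account = new AssertionDemo();
        account.deposit(10);
        // Run as "java -ea AssertionDemo" to activate the checks;
        // without -ea the assert statements are skipped entirely.
        System.out.println("balance = " + account.balance);
    }
}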
Figure 3: When self-checks are embedded in the program, test cases need not include predicted
outputs.
Self-checks in the form of assertions embedded in program code are useful primarily for checking module
and subsystem-level specifications, rather than overall program behavior. Devising program assertions
that correspond in a natural way to specifications poses two main challenges: bridging the gap between
concrete execution values and abstractions used in specification, and dealing in a reasonable way
with quantification over collections of values.
Test execution necessarily deals with concrete values, while abstract models are very important in both
formal and informal specifications. The intended effect of an operation is described in terms of a
precondition (the state before the operation) and a postcondition (the state after the operation), relating
the concrete state to the abstract model. Consider again the specification of the get method of java.util.Map,
with pre- and postconditions expressed as the following Hoare triple.
(| ⟨k,v⟩ ∈ φ(dict) |)
o = dict.get(k)
(| o = v |)
φ is an abstraction function that constructs the abstract model type (sets of key, value pairs) from the
concrete data structure. φ is a logical association that need not be implemented when reasoning about
program correctness. To create a test oracle, however, it is useful to have an actual implementation of φ. For this
example, we might implement a special observer method that creates a simple textual representation of
the set of (key, value) pairs. Assertions used as test oracles can then correspond directly to the
specification. Implementing this mapping once and using it in several assertions also simplifies the
implementation of the oracles.
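A sketch of this idea for the Map example follows; the helper name phi and the use of a sorted textual representation are assumptions made for illustration, not part of java.util.Map.

import java.util.Map;
import java.util.TreeMap;

public class MapOracleDemo {
    // An implementation of the abstraction function φ: build a canonical textual
    // representation of the set of (key, value) pairs in the dictionary.
    static String phi(Map<String, String> dict) {
        return new TreeMap<>(dict).toString();   // sorted, so the representation is canonical
    }

    public static void main(String[] args) {
        Map<String, String> dict = new TreeMap<>();
        dict.put("k", "v");

        String before = phi(dict);          // abstract state before the operation
        String o = dict.get("k");           // operation under test

        // Postcondition from the Hoare triple: if <k,v> is in φ(dict), then o = v.
        assert "v".equals(o) : "postcondition violated: expected v, got " + o;
        // get() is an observer, so the abstract state must be unchanged.
        assert before.equals(phi(dict)) : "get() modified the dictionary";
        System.out.println("oracle checks passed (run with -ea to enable the assertions)");
    }
}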
In addition to an abstraction function, reasoning about the correctness of internal structures usually
involves structural invariants, that is, properties of the data structure that are preserved by all operations.
Structural invariants are good candidates for self-checks implemented as assertions. They pertain directly
to the concrete data structure implementation, and can be implemented within the module that
encapsulates that data structure. For example, if a dictionary structure is implemented as a red-black tree
or an AVL tree, the balance property is an invariant of the structure that can be checked by an assertion
within the module. Figure 4 illustrates an invariant check found in the source code of the Eclipse
programming environment.
The problem of quantification over large sets of values is a variation on the basic problem of program
testing, which is that we cannot exhaustively check all program behaviors. Instead, we select a tiny fraction
of possible program behaviors or inputs as representatives. The same tactic is applicable to quantification
in specifications. If we cannot fully evaluate the specified property, we can at least select some elements
to check. For example, as with test design, good samples require some insight into the problem, such as
recognizing that if the shortest path from A to C passes through B, it should be the concatenation of the
shortest path from A to B and the shortest path from B to C.
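For example, rather than quantifying over all triples of nodes, a self-check might test a consequence of the shortest-path property on a few sampled triples. The sketch below (invented names and data) checks the necessary condition that no computed shortest path is longer than a detour through an intermediate node.

public class SampledPropertyCheck {
    // dist[i][j] holds the shortest-path length computed by the routine under test.
    static boolean checkSampledTriple(int[][] dist, int a, int b, int c) {
        // Necessary condition: the path from a to c cannot be longer than going through b.
        return dist[a][c] <= dist[a][b] + dist[b][c];
    }

    public static void main(String[] args) {
        int[][] dist = { {0, 2, 5}, {2, 0, 3}, {5, 3, 0} };
        // Instead of quantifying over all triples, check a few sampled ones.
        System.out.println(checkSampledTriple(dist, 0, 1, 2) ? "ok" : "violation");
    }
}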
A final implementation problem for self-checks is that asserted properties sometimes involve values that
are either not kept in the program at all (so-called ghost variables) or values that have been replaced
("before" values). A specification of noninterference between threads in a concurrent program may use
ghost variables to track entry and exit of threads from a critical section. A run-time assertion system must
manage ghost variables and retained "before" values and must ensure that they have no side-effects outside
assertion checking.
Figure 4: A structural invariant checked by run-time assertions. Excerpted from the Eclipse
programming environment, version 3. © 2000, 2005 IBM Corporation; used under terms of the
Eclipse Public License v1.0.
package org.eclipse.jdt.internal.ui.text;

import java.text.CharacterIterator;
import org.eclipse.jface.text.Assert;

/**
 * A <code>CharSequence</code> based implementation of
 * <code>CharacterIterator</code>.
 * @since 3.0
 */
public class SequenceCharacterIterator implements CharacterIterator {
  ...
  private void invariant() {
    Assert.isTrue(fIndex >= fFirst);
    Assert.isTrue(fIndex <= fLast);
  }
  ...
  public SequenceCharacterIterator(CharSequence sequence, int first, int last)
      throws IllegalArgumentException {
    if (sequence == null)
      throw new NullPointerException();
    if (first < 0 || first > last)
      throw new IllegalArgumentException();
    if (last > sequence.length())
      throw new IllegalArgumentException();
    fSequence = sequence;
    fFirst = first;
    fLast = last;
    fIndex = first;
    invariant();
  }
  ...
  public char setIndex(int position) {
    if (position >= getBeginIndex() && position <= getEndIndex())
      fIndex = position;
    else
      throw new IllegalArgumentException();

    invariant();
    return current();
  }
  ...
}
➢ Capture and Replay
When it is difficult to specify predicted outputs or effective self-checks, one option is to capture in a log the
behavior of a run that a human tester has judged acceptable, and to replay that log as the oracle in subsequent
automated test runs. The savings from automated retesting with a captured log depends on how many build-and-test cycles we
can continue to use it in, before it is invalidated by some change to the program. Capturing events at a
more abstract level suppresses insignificant changes. For example, if we log only the actual pixels of
windows and menus, then changing even a typeface or background color can invalidate an entire suite of
execution logs.
A more fruitful approach is capturing input and output behavior at multiple levels of abstraction within
the implementation. We have noted the usefulness of a layer in which abstract input events are captured
in place of concrete events. In general, there is a similar abstract layer in graphical output, and much of
the capture/replay testing can work at this level. Small changes to a program can still invalidate a large
number of execution logs, but it is much more likely that an insignificant detail can either be ignored in
comparisons or, even better, the abstract input and output can be systematically transformed to reflect the
intended change. Further amplification of the value of a captured log can be obtained by varying the
logged events to obtain additional test cases. Creating meaningful and well-formed variations also depends
on the abstraction level of the log.
UNIT 7
1. What do you mean by Fault-Based testing? What are the assumptions made in Fault-Based
testing? Explain.
2. What is mutation analysis? Explain mutation analysis with an appropriate example.
3. Explain Fault-Based Adequacy criteria.
4. What are the different types of mutation analysis? Explain them.
5. What do you mean by test execution? Explain.
6. What is scaffolding? Explain generic and specific scaffolding with an appropriate example.
7. Explain Test oracles with neat diagrams.
8. Explain Self-Checks as oracles with neat diagrams.
9. What do you mean by capture and replay? Explain.
Testing and analysis activities occur throughout the development and evolution of software systems,
from early in requirements engineering through delivery and subsequent evolution. Quality depends on
every part of the software process, not only on software analysis and testing.
It is convenient to group these quality assurance activities under the rubric quality process. The
quality process provides a framework for selecting and arranging activities aimed at a particular goal,
while also considering interactions and trade-offs with other important goals.
All software development activities reflect constraints and trade-offs, and quality activities are
no exception. For example, high dependability is usually in tension with time to market, and in most
cases it is better to achieve a reasonably high degree of dependability on a tight schedule than to achieve
ultra-high dependability on a much longer schedule.
The quality process should be structured for completeness, timeliness, and cost- effectiveness.
Completeness means that appropriate activities are planned to detect each important class of faults. What
the important classes of faults are depends on the application domain, the organization, and the
technologies employed.
Cost-effectiveness means that, subject to the constraints of completeness and timeliness, one
chooses activities depending on their cost as well as their effectiveness. Cost must be considered over the
whole development cycle and product life, so the dominant factor is likely to be the cost of repeating an
activity through many change cycles.
A well-designed quality process balances several activities across the whole development process,
selecting and arranging them to be as cost-effective as possible, and to improve early visibility. For
example, one designs test cases at the earliest opportunity and uses both automated and manual static
analysis techniques on software artifacts that are produced before actual code.
Early visibility also motivates the use of proxy measures. For example, we know that the number
of faults in design or code is not a true measure of reliability. However, one may count faults uncovered
in design inspections as an early indicator of potential quality problems, because the alternative of waiting
to receive a more accurate estimate from reliability testing is unacceptable.
Quality goals can be achieved only through careful planning of activities that are matched to the
identified objectives. Planning is integral to the quality process and is elaborated and revised through the
whole project. It encompasses both an overall strategy for test and analysis, and more detailed test plans.
The overall analysis and test strategy identifies company- or project-wide standards that must be
satisfied: procedures for obtaining quality certificates required for certain classes of products, techniques
and tools that must be used, and documents that must be produced. Some companies develop and certify
procedures following international standards such as ISO 9000 or SEI Capability Maturity Model,
which require detailed documentation and management of analysis and test activities and well-defined
phases, documents, techniques, and tools.
The final analysis and test plan includes additional information that illustrates constraints, pass and
fail criteria, schedule, deliverables, hardware and software requirements, risks, and contingencies.
Constraints indicate deadlines and limits that may be derived from the hardware and software
implementation of the system under analysis and the tools available for analysis and testing. Pass and fail
criteria indicate when a test and analysis activity fails or succeeds, thus supporting monitoring of the
quality process.
➢ Quality Goals
Process visibility requires a clear specification of goals. A team that does not have a clear idea of the
difference between reliability and robustness has little chance of attaining either goal. Goals must be further
refined into a clear and reasonable set of objectives.
The relative importance of qualities and their relation to other project objectives varies. Time-to-
market may be the most important property for a mass market product, usability may be more prominent
for a Web based application, and safety may be the overriding requirement for a life-critical system.
Product qualities are the goals of software quality engineering, and process qualities are means
to achieve those goals. For example, development processes with a high degree of visibility are necessary
for creation of highly dependable products. The process goals with which software quality engineering is
directly concerned are often on the "cost" side of the ledger.
Software product qualities can be divided into those that are directly visible to a client and
those that primarily affect the software development organization. Reliability, for example, is
directly visible to the client. Maintainability primarily affects the development organization.
Properties that are directly visible to users of a software product, such as dependability, latency,
usability, and throughput, are called external properties. Properties that are not directly visible to end
users, such as maintainability, reusability, and traceability, are called internal properties.
The external properties of software can ultimately be divided into dependability and usefulness.
There is no precise way to distinguish these, but a rule of thumb is that when software is
not dependable, we say it has a fault, or a defect, or a bug, resulting in an undesirable behavior or failure.
It is quite possible to build systems that are very reliable, relatively free from hazards, and
completely useless. They may be unbearably slow, or have terrible user interfaces and unfathomable
documentation, or they may be missing several crucial features. How should these properties be
considered in software quality? One answer is that they are not part of quality at all unless they have
been explicitly specified, since quality is the presence of specified properties.
➢ Dependability Properties
The simplest of the dependability properties is correctness: A program or system is correct if it is
consistent with its specification. By definition, a specification divides all possible system behaviors into
two classes, successes and failures. All of the possible behaviors of a correct system are successes.
Reliability is a statistical approximation to correctness, expressed as the likelihood of correct behavior in
expected use. Particular measures of reliability can be used for different units of execution and different ways of
counting success and failure. Availability is an appropriate measure when a failure has some duration in
time. For example, a failure of a network router may make it impossible to use some functions of a local
area network until the service is restored; between initial failure and restoration we say the router is "down"
or "unavailable." The availability of the router is the time in which the system is "up" as a fraction of total
time. Thus, a network router that averages 1 hour of down time in each 24-hour period would have an
availability of 23/24, or 95.8%.
Mean time between failures (MTBF) is yet another measure of reliability, also using time as the
unit of execution. The hypothetical network switch that typically fails once in a 24-hour period and takes
about an hour to recover has a mean time between failures of 23 hours. Note that availability does not
distinguish between two failures of 30 minutes each and one failure lasting an hour, while MTBF does.
Software safety is an extension of the well-established field of system safety into software. Safety
is concerned with preventing certain undesirable behaviors, called hazards. It is quite explicitly not
concerned with achieving any useful behavior apart from whatever functionality is needed to prevent
hazards. Software safety is typically a concern in "critical" systems such as avionics and medical
systems.
Safety is best considered as a quality distinct from correctness and reliability for two reasons. First,
by focusing on a few hazards and ignoring other functionality, a separate safety specification can be much
simpler than a complete system specification, and therefore easier to verify. Second, while
a good system specification should rule out hazards, we cannot be confident that either specifications or
our attempts to verify systems are good enough to provide the degree of assurance we require for hazard
avoidance.
Software that gracefully degrades or fails "softly" outside its normal operating parameters is
robust. Software safety is a kind of robustness, but robustness is a more general notion that concerns not
only avoidance of hazards (e.g., data corruption) but also partial functionality under unusual situations.
Robustness, like safety, begins with explicit consideration of unusual and undesirable situations, and
should include augmenting software specifications with appropriate responses to undesirable events.
Quality analysis should be part of the feasibility study. The sidebar below shows an excerpt of the
feasibility study for the Chipmunk Web presence. The primary quality requirements are stated in terms of
dependability, usability, and security. Performance, portability and interoperability are typically not
primary concerns at this stage, but they may come into play when dealing with other qualities.
This document was prepared for the Chipmunk IT management team. It describes the results of a
feasibility study undertaken to advise Chipmunk corporate management whether to embark on a
substantial redevelopment effort to add online shopping functionality to the Chipmunk Computers' Web
presence.
Goals: The primary goal of a Web presence redevelopment is to add online shopping facilities. Marketing
estimates an increase of 15% over current direct sales within 24 months, and an additional 8% savings in
direct sales support costs from shifting telephone price inquiries to online price inquiries.
Architectural Requirements: The logical architecture will be divided into three distinct subsystems:
human interface, business logic, and supporting infrastructure. Each major subsystem must be
structured for phased development, with initial features delivered 6 months from inception, full features
at 12 months, and a planned revision at 18 months from project inception.
Quality Requirements:
• Dependability: With the introduction of direct sales and customer relationship management functions,
dependability of Chipmunk's Web services becomes business critical. A critical core of functionality will
be identified, isolated from less critical functionality in design and implementation, and subjected to the
highest level of scrutiny. We estimate that this will be approximately 20% of new development and
revisions, and that the V&V costs for those portions will be approximately triple the cost of V&V for
noncritical development.
• Usability: The new Web presence will be, to a much greater extent than before, the public face of
Chipmunk Computers.
• Security: Introduction of online direct ordering and billing raises a number of security issues. Some
of these can be avoided initially by contracting with one of several service companies that provide
secure credit card transaction services. Nonetheless, order tracking, customer relationship
management, returns, and a number of other functions that cannot be effectively outsourced raise
significant security and privacy issues.
➢ Analysis
Analysis techniques that do not involve actual execution of program source code play a prominent role in
overall software quality processes. Manual inspection techniques and automated analyses can be applied
at any development stage. They are particularly well suited to the early stages of specification and design,
where the lack of executability of many intermediate artifacts reduces the efficacy of testing.
Inspection, in particular, can be applied to essentially any document including requirements documents,
architectural and more detailed design documents, test plans and test cases, and program source
code. Inspection may also have secondary benefits, such as spreading good practices and instilling
shared standards of quality.
Inspection takes a considerable amount of time and requires meetings, which can become a scheduling
bottleneck. Moreover, re-inspecting a changed component can be as expensive as the initial inspection.
Automated static analyses are more limited in applicability but are selected when available because
substituting machine cycles for human effort makes them particularly cost-effective.
➢ Testing
Tests are executed when the corresponding code is available, but testing activities start earlier, as soon as
the artifacts required for designing test case specifications are available. Thus, acceptance and system
test suites should be generated before integration and unit test suites, even if executed in the opposite
order.
• Tests are specified independently from code and when the corresponding software specifications
are fresh in the mind of analysts and developers, facilitating review of test design.
• Also test cases may highlight inconsistencies and incompleteness in the corresponding software
specifications.
• Early design of test cases also allows for early repair of software specifications, preventing
specification faults from propagating to later stages in development.
• Finally, programmers may use test cases to illustrate and clarify the software specifications,
especially for errors and unexpected conditions.
At Chipmunk, developers are expected to perform functional and structural module testing before a
work assignment is considered complete and added to the project baseline.
➢ Improving the Process
The goal of quality process improvement is to find cost-effective countermeasures for classes of
faults that are expensive because they occur frequently, because the failures they cause are expensive, or
because, once detected, they are expensive to repair. Countermeasures may be either prevention or detection and
may involve either quality assurance activities or other aspects of software development.
The first part of a process improvement feedback loop is gathering sufficiently complete and
accurate raw data about faults and failures. A main obstacle is that data gathered in one project goes mainly
to benefit other projects in the future and may seem to have little direct benefit for the current project,
much less to the persons asked to provide the raw data.
The analysis step consists of tracing several instances of an observed fault or failure back to the human
error from which it resulted. The analysis also involves the reasons the fault was not detected and
eliminated earlier. This process is known as root cause analysis, but the ultimate aim is to find the most cost-
effective point at which to intervene, which is sometimes but not always the ultimate root cause.
➢ Organizational Factors
The quality process includes a wide variety of activities that require specific skills and attitudes and may
be performed by quality specialists or by software developers. Planning the quality process involves not
only resource management but also identification and allocation of responsibilities.
A poor allocation of responsibilities can lead to major problems in which pursuit of individual goals
conflicts with overall project success. For example, splitting responsibilities of development and quality-
control between a development and a quality team, and rewarding high productivity in terms of lines of
code per person-month during development may produce undesired results. The development team, not
rewarded to produce high-quality software, may attempt to maximize productivity to the detriment of
quality. The resources initially planned for quality assurance may not suffice if the initial quality of code
from the "very productive" development team is low. On the other hand, combining development and
quality control responsibilities in one undifferentiated team, while avoiding the perverse incentive of
divided responsibilities, can also have unintended effects: As deadlines near, resources may be shifted
from quality assurance to coding, at the expense of product quality.
Conflicting considerations support both the separation of roles (e.g., recruiting quality specialists), and
the mobility of people and roles (e.g., rotating engineers between development and testing tasks).
At Chipmunk, responsibility for delivery of the new Web presence is distributed among a development
team and a quality assurance team. Both teams are further articulated into groups. The quality assurance
team is divided into the analysis and testing group, responsible for the dependability of the system, and
the usability testing group, responsible for usability. Responsibility for security issues is assigned to the
infrastructure development group, which relies partly on external consultants for final tests based on
external attack attempts.
UNIT 6
1. Explain Validation and Verification with neat diagram.
2. Explain Degree of freedom with appropriate examples and diagram.
3. Write a note on varieties of software.
4. Explain Basic Principles in software testing in detail.
5. Explain a) The Quality process b) Planning and monitoring c) Quality goals.
6. Explain Dependability properties in detail in software testing.
7. What do you mean by analysis with respect to the Chipmunk IT management document? Explain.
8. Explain Improving the Process and Organizational factors.