DEPARTMENT OF COMMERCE
National Bureau of Standards

NBS Special Publication 700-2
Industrial Measurement Series

Measurement
Evaluation

J. Mandel and L. F. Nanni
The National Bureau of Standards¹ was established by an act of Congress on March 3, 1901. The
Bureau's overall goal is to strengthen and advance the nation's science and technology and facilitate
their effective application for public benefit. To this end, the Bureau conducts research and provides: (1) a
basis for the nation's physical measurement system, (2) scientific and technological services for industry and
government, (3) a technical basis for equity in trade, and (4) technical services to promote public safety.
The Bureau's technical work is performed by the National Measurement Laboratory, the National
Engineering Laboratory, the Institute for Computer Sciences and Technology, and the Institute for Materials
Science and Engineering.

¹Headquarters and Laboratories at Gaithersburg, MD, unless otherwise noted; mailing address
Gaithersburg, MD 20899.
²Some divisions within the center are located at Boulder, CO 80303.
³Located at Boulder, CO, with some elements at Gaithersburg, MD.
NBS Special Publication 700-2
Industrial Measurement Series
Measurement
Evaluation
J. Mandel
National Measurement Laboratory
National Bureau of Standards
Gaithersburg, Maryland 20899
and
L. F. Nanni
Federal University
Porto Alegre, Brazil
March 1986
FOREWORD

The links between NBS staff members and our industrial colleagues
have always been strong. Publication of this new Industrial
Measurement Series, aimed at those responsible for measurement in
industry, represents a strengthening of these ties.
The concept for the series stems from the joint efforts of the
National Conference of Standards Laboratories and NBS. Each volume
will be prepared jointly by a practical specialist and a member of the
NBS staff. Each volume will be written within a framework of
industrial relevance and need.
INTRODUCTION
J. Mandel
January, 1986
ABOUT THE AUTHORS
John Mandel
Luis F. Nanni
CONTENTS
Page
Foreword iii
Introduction iv
Random variables 1
Propagation of Errors 27
An example 27
The General Case 28
Sample Sizes and Compliance with Standards 30
An Example 30
General Procedure-Acceptance, Rejection, Risks 31
Inclusion of Between-Laboratory Variability 32
Transformation of Scale 33
Some Common Transformations 33
Robustness 33
Transformations of Error Structure 34
Presentation of Data and Significant Figures 35
An Example 35
General Recommendations 37
Tests of Significance 37
General Considerations 37
Alternative Hypotheses and Sample Size-The Concept of Power 38
An Example 39
Evaluation of Diagnostic Tests 40
Sensitivity and Specificity 41
Predictive Values-The Concept of Prevalence 41
Interpretation of Multiple Tests 42
A General Formula for Multiple Independent Tests 43
2. Quality Control 44
Other Types of Control Charts 59
Control Chart for Attributes-The P-Chart
Control Limits and Warning Limits 59
Control Charts for Number of Defects Per Unit-The C-Chart 60
The Poisson Distribution 61
Detecting Lack of Randomness 61
Rules Based on the Theory of Runs 61
Distribution of Points Around the Central Line 62
Interpreting Patterns of Variation in a Control Chart 62
Indication of Lack of Control 62
Patterns of Variation 62
The Control Chart as a Management Tool 63
References 64
Measurement
Evaluation
J. Mandel (principal author) and L. F. Nanni
Random variables
A frequency distribution is obtained by dividing the measurements in the population into group intervals and counting the num-
ber of measurements in each interval. Each interval is defined in terms of its
lower and upper limit, in the scale in which the measurement is expressed.
Since in practice one is always limited to a statistical sample, i.e., a finite
number of measurements, one can at best only approximate the frequency
distribution. Such an approximation is called a histogram. Figure 4.1 contains
a histogram of glucose values in serum measurements on a sample of 2,197
individuals. It is worth noting that the frequency tends to be greatest in the
vicinity of the mean and diminishes gradually as the distance from the mean
increases.

[Figure: Histogram for glucose in serum; the vertical axis shows frequency (0 to 400).]

The grouped data on which the histogram is based are given in
Table 4.1.
Random samples
The sample of individuals underlying the histogram in Table 4. 1 is rather
large. A large size, in itself, does not necessarily ensure that the histogram's
characteristics will faithfully represent those of the entire population. An ad-
ditional requirement is that the sample be obtained by a random selection
from the entire population. A random selection is designed to ensure that
each element of the population has an equal chance of being included in the
sample. A sample obtained from a random selection is called a random
sample, although, strictly speaking, it is not the sample but the method of
obtaining it that is random. Using the concept of a random sample, it is pos-
sible to envisage the population as the limit of a random sample of ever-in-
creasing size. When the sample size N
becomes larger and larger, the charac-
teristics of the sample approach those of the entire population. If the random
sample is as large as the sample used in this illustration, we may feel con-
fident that its characteristics are quite similar to those of the population.
Table 4.1. Glucose in Serum: Grouped Data

Midpoint (mg/dl)  Frequency      Midpoint (mg/dl)  Frequency
47.5              [illegible]    102.5             390
52.5              [illegible]    107.5             313
57.5              [illegible]    112.5             220
62.5              [illegible]    117.5             132
67.5              [illegible]    122.5             50
72.5              20             127.5             26
77.5              52             132.5             8
82.5              118            137.5             6
87.5              204            142.5             4
92.5              [illegible]    147.5             1
97.5              [illegible]

Total number of individuals: 2,197
Thus, upon inspection of Table 4.1, we may feel confident that the mean se-
rum glucose for the entire population is not far from 100.4 mg/dl. We also
may feel confident in stating that relatively very few individuals, say about 1
percent of the entire population, will have serum glucose values of less than
70 mg/dl. Our confidence in such conclusions (which, incidentally, can be
made more quantitative), however, would have been much less had all of the
available data consisted of a small sample, say on the order of five to 50 indi-
viduals. Two such sets of data are shown in Table 4.2. Each represents the
serum glucose of ten individuals from the population represented in Table
4.1. The mean glucose contents of these samples are 107.57 and 96.37 mg/dl,
respectively. If either one of these samples was all the information available
Table 4.2. Two Samples of Serum Glucose Values (mg/dl)

      Sample I          Sample II
 1     134.2        1     88.2
 2     119.6        2     82.0
 3      91.9        3     96.0
 4      96.6        4     94.1
 5     118.8        5     96.3
 6     105.2        6    108.8
 7     103.4        7    106.3
 8     112.1        8    101.1
 9      97.0        9     89.4
10      96.9       10    101.7
to us, what could we have concluded about the mean serum glucose of the
entire population? And, in that case, what could we have stated concerning
the percentage of the population having a serum glucose of less than 70
mg/dl?
In many cases, only two parameters are required, in the sense that these two parameters
contain practically all the pertinent information that is required for answer-
ing all useful questions about the population. In cases where more than two
parameters are needed, it is often possible to perform a mathematical opera-
tion, called a transformation of scale, on the measured values, which will
reduce the required number of parameters to two. The two parameters in
question are the mean and the standard deviation, measuring, respectively,
the location of the center of the population and its spread.
Sample estimates
Let x1, x2, . . . , xN represent a sample of N measurements belonging to
a single population. The sample mean is generally denoted by x̄ and defined
by

    x̄ = (x1 + x2 + . . . + xN)/N = Σxi/N    (4.1)

The sample variance is defined by

    s² = Σ(xi − x̄)²/(N − 1)    (4.2)

and the sample standard deviation by

    s = √s²    (4.3)

Table 4.2 contains, for each of the samples, the numerical values of x̄, s²,
and s.
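Equations 4.1 through 4.3 can be checked numerically. The following sketch (in Python, using the Sample I values of Table 4.2) is illustrative only; the function names are our own.

```python
import math

# Sample I of Table 4.2 (serum glucose, mg/dl)
sample_1 = [134.2, 119.6, 91.9, 96.6, 118.8, 105.2, 103.4, 112.1, 97.0, 96.9]

def mean(x):
    # Equation 4.1: x-bar = (x1 + x2 + ... + xN) / N
    return sum(x) / len(x)

def variance(x):
    # Equation 4.2: s^2 = sum of (xi - x-bar)^2, divided by N - 1
    xbar = mean(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

def std_dev(x):
    # Equation 4.3: s is the square root of the variance
    return math.sqrt(variance(x))

print(round(mean(sample_1), 2))     # 107.57, the mean quoted in the text
print(round(std_dev(sample_1), 2))  # 13.4; the text quotes s = 13.40
```

The N − 1 divisor in `variance` is the degrees-of-freedom correction discussed below.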
    MS = SS/DF    (4.5)
where MS stands for mean square, SS for sum of squares, and DF for de-
grees offreedom. The term "sum of squares" is short for "sum of squares of
deviations from the mean," which is, of course, a literal description of the
expression Σ(xi − x̄)², but it is also used to describe a more general concept,
which will not be discussed at this point. Thus, Equation 4.2 is a special case
of the more general Equation 4.5.
The reason for making the divisor N − 1 rather than the more obvious
N can be understood by noting that the N quantities

    x1 − x̄, x2 − x̄, . . . , xN − x̄

are not mutually independent.
Substituting for x̄ the value given by its definition (Equation 4.1), we obtain:

    Σ(xi − x̄) = Σxi − N x̄ = 0

This relation implies that if any (N − 1) of the quantities (xi − x̄) are giv-
en, the remaining one can be calculated without ambiguity. It follows that
while there are N independent measurements, there are only N − 1 indepen-
dent deviations from the mean. We express this fact by stating that the
sample variance is based on N − 1 degrees of freedom. This explanation pro-
vides at least an intuitive justification for using N − 1 as a divisor for the
calculation of s². When N is very large, the distinction between N and N − 1
becomes unimportant, but for reasons of consistency, we always define the
sample variance and the sample standard deviation by Equations 4.2 and
4.3.
Grouped data
When the data in a sample are given in grouped form, such as in Table
4.1, Equations 4.1 and 4.2 cannot be used for the calculation of the mean and
the variance. Instead, one must use different formulas that involve the mid-
points of the intervals (first column of Table 4.1) and the corresponding fre-
quencies (second column of Table 4.1).
Formulas for grouped data are given below. Let xi denote the midpoint
of the ith interval and fi the corresponding frequency, with N = Σfi.

    x̃ = Σfi xi / Σfi    (4.8)

    s² = Σfi (xi − x̃)² / (N − 1)    (4.9)

    s = √s²    (4.10)

To differentiate the regular average (Equation 4.1) of a set of xi values
from their "weighted average" (Equation 4.8), we use the symbol x̃ (x tilde)
for the latter.

The computations can be simplified by "coding" the data: (1) subtracting
a constant x0 from each midpoint, and (2) dividing by a constant c:

    u = (x − x0)/c    (4.11)

The weighted average ū is equal to (x̃ − x0)/c. Operation (1) alters neither the
variance nor the standard deviation. Operation (2) divides the variance by c²
and the standard deviation by c. Thus, "uncoding" is accomplished by multi-
plying the variance of u by c² and the standard deviation of u by c. The for-
mulas in Equations 4.8, 4.9, and 4.10 are illustrated in Table 4.3 with the data
from Table 4.1.
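The grouped-data formulas and the coding operation can be sketched as follows. The midpoints and frequencies below are hypothetical (Table 4.1 is only partially legible), and the function names are our own.

```python
# Hypothetical grouped data (midpoint, frequency) -- illustrative values,
# not the original entries of Table 4.1.
groups = [(85.0, 3), (95.0, 7), (105.0, 6), (115.0, 4)]

def grouped_mean(g):
    # Equation 4.8: x-tilde = sum(fi * xi) / sum(fi)
    return sum(f * x for x, f in g) / sum(f for _, f in g)

def grouped_variance(g):
    # Equation 4.9: s^2 = sum(fi * (xi - x-tilde)^2) / (N - 1)
    n = sum(f for _, f in g)
    xt = grouped_mean(g)
    return sum(f * (x - xt) ** 2 for x, f in g) / (n - 1)

# Coding (Equation 4.11): u = (x - x0) / c
x0, c = 100.0, 10.0
coded = [((x - x0) / c, f) for x, f in groups]

# "Uncoding": multiply the mean of u by c and add x0;
# multiply the variance of u by c squared.
assert abs(grouped_mean(coded) * c + x0 - grouped_mean(groups)) < 1e-6
assert abs(grouped_variance(coded) * c ** 2 - grouped_variance(groups)) < 1e-6
print(round(grouped_mean(groups), 1))  # 100.5 for these illustrative data
```

The two assertions verify the uncoding rules stated in the text.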
We now can better appreciate the difference between population param-
eters and sample estimates. Table 4.4 contains a summary of the values of
the mean, the variance, and the standard deviation for the population (in this
case, the very large sample N = 2,197 is assumed to be identical with the
population) and for the two samples of size 10.
[Table 4.3: illustration of Equations 4.8, 4.9, and 4.10 with the data of Table 4.1; entries illegible.]
The widely held, intuitive notion that the average of several measure-
ments is "better" than a single measurement can be given a precise meaning
by elementary statistical theory.
Let x1, x2, . . . , xN represent a sample of size N taken from a population
of mean μ and standard deviation σ.
Let x̄1 represent the average of the N measurements. We can visualize a
repetition of the entire process of obtaining the N results, yielding a new av-
erage x̄2. Continued repetition would thus yield a series of averages x̄1, x̄2,
. . . (Two such averages are given by the sets shown in Table 4.2.) These
averages generate a population of their own. The
Table 4.4. Population Parameters and Sample Estimates (Data of Tables 4.1 and 4.2)
variance of the population of averages can be shown to be smaller than that
of the population of single values, and, in fact, it can be proved mathemati-
cally that the following relation holds:
    Var(x̄) = Var(x)/N    (4.12)

or, in terms of standard deviations,

    σx̄ = σ/√N    (4.13)
This relation is known as the law of the standard error of the mean, an ex-
pression simply denoting the quantity σx̄. The term standard error refers to
the variability of derived quantities (in contrast to original measurements).
Examples are: the mean of individual measurements and the intercept or
the slope of a fitted line (see section on straight line fitting). In each case, the
derived quantity is considered a random variable with a definite distribution
function. The standard error is simply the standard deviation of this distribu-
tion.
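Equation 4.13 can be illustrated by simulation: drawing many samples of N = 10 from a normal population with the glucose parameters and comparing the scatter of the averages with σ/√N. This sketch is ours, not part of the original text.

```python
import random
import statistics

# Law of the standard error of the mean (Equation 4.13), checked by
# simulation with the glucose population figures mu = 100.42, sigma = 12.15.
random.seed(0)
mu, sigma, N = 100.42, 12.15, 10

averages = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(N))
    for _ in range(20000)
]

observed = statistics.pstdev(averages)   # scatter of the 20000 averages
predicted = sigma / N ** 0.5             # sigma / sqrt(N), Equation 4.13
print(round(observed, 2), round(predicted, 2))  # the two agree closely
```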
Systematic errors
A second observation concerns the important assumption of random-
ness required for the validity of the law of the standard error of the mean.
The N values must represent a random sample from the original population.
If, for example, systematic errors arise when going from one set of N meas-
urements to the next, these errors are not reduced by the averaging process.
An important example of this is found in the evaluation of results from differ-
ent laboratories. If each laboratory makes N measurements, and if the within-
laboratory replication error has a standard deviation of σ, the standard
deviation between the averages of the various laboratories will generally be
larger than σ/√N, because additional variability is generally found between
laboratories.
The mean and the standard deviation describe the location of the center of the population
and its spread. They fail to inform us, however, as to the exact way in which
the values are distributed around the mean. In particular, they do not tell us
whether the frequency or occurrence of values smaller than the mean is the
same as that of values larger than the mean, which would be the case for a
symmetrical distribution. A nonsymmetrical distribution is said to be skew,
and it is possible to define a parameter of skewness for any population. As in
the case of the mean and the variance, we can calculate a sample estimate of
the population parameter of skewness. We will not discuss this matter fur-
ther at this point, except to state that even the set of three parameters, mean,
variance, and skewness, is not always sufficient to completely describe a
population of measurements.
A normal distribution is completely characterized by two parameters: its
mean and its variance (or, alternatively, its mean and its standard deviation).
Let x be the result of some measuring process. Unlimited repetition of
the process would generate a population of values x1, x2, x3, . . . . If the fre-
quency distribution of this population is normal, with mean μ and standard
deviation σ, then the reduced variate corresponding to any value x is

    z = (x − μ)/σ    (4.14)
Then the corresponding z value will be given by

    z = (x − μ)/σ
Thus the z value simply expresses the distance from the mean, in units of
standard deviations.
Confidence intervals
Let x1, x2, . . . , xN represent a sample of size N from a population of
mean μ and standard deviation σ. In general, μ and σ are unknown, but they
can be estimated from the sample. We wish to construct, around the sample
mean, an interval that is likely to
include the true value (100.42). By making this interval long enough we can
always easily fulfill this requirement, depending on what we mean by "like-
ly." Therefore, we first express this qualification in a quantitative way by
stipulating the value of a confidence coefficient. Thus we may require that
the interval shall bracket the population mean "with 95 percent con-
fidence." Such an interval is then called a "95 percent confidence interval."
The case of known σ. We proceed as follows, assuming for the mo-
ment that although μ is unknown, the population standard deviation σ is
known. Consider the quantity

    z = (x̄ − μ)/(σ/√N)    (4.15)
By virtue of the central limit theorem, the variable x generally may be
considered to be normally distributed. The variable z then obeys the reduced
normal distribution. We can therefore assert, for example, that the probabili-
ty that
    −1.96 < (x̄ − μ)/(σ/√N) < 1.96    (4.16)

or, equivalently, that

    x̄ − 1.96 σ/√N < μ < x̄ + 1.96 σ/√N    (4.17)

is 95
percent. Such a confidence interval is said to be a "95 percent confidence
interval," or to have a confidence coefficient of 0.95. By changing 1.96 to
3.00 in Equation 4.17, we would obtain a 99.7 percent confidence interval.
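As a numerical sketch of Equation 4.17, the following computes a 95 percent confidence interval for μ under the illustrative assumption that σ is known; the numbers reuse the Sample I mean and the population σ of the glucose example.

```python
# Equation 4.17: a 95 percent confidence interval for mu when sigma is known.
# Illustrative numbers: the Sample I mean with the population sigma.
xbar, sigma, N = 107.57, 12.15, 10

half = 1.96 * sigma / N ** 0.5    # 1.96 standard errors
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))   # 100.04 115.1
```

Replacing 1.96 by another critical value zc gives the general interval of Equation 4.18.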
General formula for the case of known σ. More generally, from the
table of the reduced normal distribution, we can obtain the proper critical
value zc (to replace 1.96 in Equation 4.17) for any desired confidence
coefficient. The general formula becomes

    x̄ − zc σ/√N < μ < x̄ + zc σ/√N    (4.18)

and the length of this confidence interval is

    L = 2 zc σ/√N    (4.19)
The case of unknown σ. In general, σ is unknown and must be replaced
by its estimate s,
obtained from a sample of size N, from a normal population of mean μ and
standard deviation σ. It can be shown that the quantity

    t = (x̄ − μ)/(s/√N)    (4.20)

has a well-defined distribution, depending only on the degrees of freedom,
N − 1, with which s has been estimated. This distribution is known as Stu-
dent's t distribution with N − 1 degrees of freedom.
For σ unknown, it is still possible, therefore, to calculate confidence in-
tervals for the mean by substituting in Equation 4.18 s for σ and tc for zc.
The confidence interval is now given by

    x̄ − tc s/√N < μ < x̄ + tc s/√N    (4.21)

The critical value tc, for any desired confidence coefficient, is obtained
from a tabulation of Student's t distribution. Tables of Student's t values can
be found in standard references. The length of the confidence interval is

    L = 2 tc s/√N    (4.22)
For any given confidence coefficient, tc will be larger than zc, so that the
length of the interval given by Equation 4.22 is larger than that given by
Equation 4.19. This difference is to be expected, since the interval now must
take into account the uncertainty of the estimate s in addition to that of x̄.
Applying Equation 4.21 to the two samples shown in Table 4.2, and
choosing a 95 percent confidence coefficient (which, for 9 degrees of free-
dom, gives tc = 2.26), we obtain:

1) For the first sample:

    107.57 − 2.26 (13.40/√10) < μ < 107.57 + 2.26 (13.40/√10)

or

    97.99 < μ < 117.15

2) For the second sample:

    96.37 − 2.26 (8.40/√10) < μ < 96.37 + 2.26 (8.40/√10)

or

    90.37 < μ < 102.37
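The interval of Equation 4.21 for the second sample can be sketched as follows, with tc = 2.26 for 9 degrees of freedom as in the text:

```python
import math

# Equation 4.21 applied to the second sample of Table 4.2:
# x-bar +/- t_c * s / sqrt(N), with t_c = 2.26 (9 degrees of freedom).
xbar, s, N, t_c = 96.37, 8.40, 10, 2.26

half = t_c * s / math.sqrt(N)
lo, hi = xbar - half, xbar + half
print(round(lo, 2), round(hi, 2))   # 90.37 102.37
```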
If the number of degrees of freedom with which s is estimated is denoted
by ν, a confidence interval for σ is given by the formula:

    s √(ν/χ²U) < σ < s √(ν/χ²L)    (4.23)

In this formula, the quantities χ²U and χ²L are the appropriate upper and low-
er percentage points of a statistical distribution known as chi-square, for the
chosen confidence coefficient. These percentage points are found in several
references.
This formula can be illustrated by means of the two samples in Table
4.2. To calculate 95 percent confidence intervals for a (the population stand-
ard deviation), we locate the limits at points corresponding to the upper and
lower 2.5 percentage points (or the 97.5 percentile and the 2.5 percentile) of
chi-square. From the chi-square table we see that, for 9 degrees of freedom,
the 97.5 percentile is 19.02, and the 2.5 percentile is 2.70. The 95 percent
confidence interval in question is therefore:
1) For the first sample:

    13.40 √(9/19.02) < σ < 13.40 √(9/2.70)

or

    9.22 < σ < 24.46

2) For the second sample:

    8.40 √(9/19.02) < σ < 8.40 √(9/2.70)

or

    5.78 < σ < 15.34
Here again, both intervals bracket the population standard deviation 12.15,
but again the lengths of the intervals reflect the inadequacy of samples of
size 10 for a satisfactory estimation of the population standard deviation.
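The chi-square interval for σ can be sketched numerically, using the percentiles quoted in the text for 9 degrees of freedom:

```python
import math

# Confidence interval for sigma: s * sqrt(nu / chi-square) at the upper and
# lower percentage points. The percentiles 19.02 and 2.70 (nu = 9,
# 95 percent confidence) are the values quoted in the text.
s, nu = 8.40, 9                  # second sample of Table 4.2
chi2_upper, chi2_lower = 19.02, 2.70

lo = s * math.sqrt(nu / chi2_upper)
hi = s * math.sqrt(nu / chi2_lower)
print(round(lo, 2), round(hi, 2))   # 5.78 15.34
```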
Tolerance intervals
We inferred earlier that relatively very few individuals, say about 1 percent
of the population, have serum glucose values of less
than 70 mg/dl. This inference was reliable because of the large size of our
such as those shown in Table 4.2? Before answering this question, let us first
see how the inference from a very large sample (such as that of Table 4.1)
can be made quantitatively precise.
The reduced variate for our data is
    z = (x − μ)/σ = (x − 100.42)/12.15

For x = 70, this gives

    z = (70 − 100.42)/12.15 = −2.50
If we now assume that the serum glucose data are normally distributed (i.e.,
follow a Gaussian distribution), we read from the table of the normal distribu-
tion that the fraction of the population for which z is less than -2.50 is
0.0062, or 0.62 percent. This is a more precise value than the 1 percent esti-
mate we obtained from a superficial examination of the data.
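The calculation for the large sample can be sketched with the normal distribution in the Python standard library:

```python
from statistics import NormalDist

# Fraction of the population below 70 mg/dl, assuming a normal distribution
# with the population values mu = 100.42 and sigma = 12.15.
mu, sigma = 100.42, 12.15
z = round((70 - mu) / sigma, 2)   # reduced variate, Equation 4.14
frac = NormalDist().cdf(z)        # area under the normal curve left of z
print(z, round(frac, 4))          # -2.5 0.0062
```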
It is clear that if we attempted to use the same technique for the samples
of size 10 shown in Table 4.2, by substituting x̄ for μ and s for σ, we may
obtain highly unreliable values. Thus, the first sample gives a z value equal
to (70 − 107.57)/13.40 or −2.80, which corresponds to a fraction of the popu-
lation equal to 0.25 percent, and the second sample gives z = (70 − 96.37)/
8.40 = −3.14, which corresponds to a fraction of the population equal to
0.08 percent. It is obvious that this approach cannot be used for small sam-
ples. It is possible, however, to solve related problems, even for small sam-
ples. The statistical procedure used for solving these problems is called the
method of tolerance intervals.
For the first sample of Table 4.2 we have

    x̄ = 107.57,  s = 13.40

and the tolerance factor for 98 percent coverage is

    k = 3.053

Hence the tolerance interval that, on the average, will include 98 percent of
the population is

    x̄ ± ks = 107.57 ± (3.053)(13.40)

or

    66.7 to 148.5
We can compare this interval to the one derived from the population itself
(for all practical purposes, the large sample of 2,197 individuals may be con-
sidered as the population). Using the normal table, we obtain for a 98 per-
cent coverage

    100.42 ± (2.326)(12.15)

or

    72.2 to 128.7
The fact that the small sample gives an appreciably wider interval is due to
the uncertainties associated with the estimates x̄ and s.
For a more detailed discussion of tolerance intervals, see Proschan. Ta-
bles of coefficients for the calculation of tolerance intervals can be found in
Snedecor and Cochran and in Proschan.
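The tolerance-interval arithmetic for the first sample can be sketched as follows; the factor k = 3.053 is the value quoted in the text, taken from tolerance tables.

```python
# Tolerance interval x-bar +/- k*s for the first sample of Table 4.2.
# k = 3.053 gives, on the average, 98 percent coverage (from the text).
xbar, s, k = 107.57, 13.40, 3.053

lo, hi = xbar - k * s, xbar + k * s
print(round(lo, 1), round(hi, 1))   # 66.7 148.5
```

The interval is appreciably wider than the population-based interval 72.2 to 128.7, reflecting the uncertainty of x̄ and s.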
Arranged in increasing order, the ten values of the first sample of Table
4.2 are:

    x(1) = 91.9     x(6) = 105.2
    x(2) = 96.6     x(7) = 112.1
    x(3) = 96.9     x(8) = 118.8
    x(4) = 97.0     x(9) = 119.6
    x(5) = 103.4    x(10) = 134.2

The values x(1), x(2), . . . , x(N) are denoted as the first, second, . . . , Nth order
statistics of the sample. A general theorem states that, on the average, the frac-
tion of the population contained between the smallest and the largest value
of a sample of size 10 is (N − 1)/(N + 1) = 9/11. The meaning of the qualification "on
the average" is the same as in the discussion of tolerance intervals above.
Reasons for the central role of the normal distribution in statistical theo-
ry and practice have been given in the section on the normal distribution.
Many situations are encountered in data analysis for which the normal distri-
bution does not apply. Sometimes non-normality is evident from the nature
of the problem. Thus, in situations in which it is desired to determine wheth-
er a product conforms to a given standard, one often deals with a simple di-
chotomy: the fraction of the lot that meets the requirements of the standard,
and the fraction of the lot that does not meet these requirements. The statisti-
cal distribution pertinent to such a problem is the binomial (see section on
the binomial distribution).
In other situations, there is no a priori reason for non-normality, but the
data themselves give indications of a non-normal underlying distribution.
Thus, a problem of some importance is to "test for normality."
Tests of normality
scale is labeled in terms of coverages (from 0 to 100 percent), but graduated
in terms of the corresponding z values. To see the principle involved, divide the
abscissa of a plot of the normal curve into N + 1 segments such that the area
under the curve between any two successive division points is 1/(N + 1). The
division points will be z1, z2, . . . , zN, the values of which can be determined
from the normal curve. Table 4.5 lists the values 100/(N + 1), 200/(N + 1),
. . . , 100N/(N + 1), in percent, in column 1, and the corresponding normal z values in
column 2, for N = 10. According to the general theorem about order statis-
tics, the order statistics of a sample of size N = 10 "attempt" to accomplish
just such a division of the area into N + 1 equal parts. Consequently, the
order statistics tend to be linearly related to the z values. The order statistics
for the first sample of Table 4.2 are listed in column 3 of Table 4.5. A plot of
column 3 versus column 2 will constitute a "test for normality": if the data
are normally distributed, the plot will approximate a straight line. Further-
more, the intercept of this line (see the section on straight line fitting) will be
an estimate of the mean, and the slope of the line will be an estimate of the
standard deviation. For non-normal data, systematic departures from a
straight line should be noted. The use of normal probability paper obviates
the calculations involved in obtaining column 2 of Table 4.5, since the hori-
zontal axis is graduated according to z but labeled according to the values
100 i/(N + 1), expressed as percent. Thus, in using the probability paper, the ten
order statistics are plotted versus the numbers

    100/11, 200/11, . . . , 1000/11

We
have presented the procedure by means of a sample of size 10. One would
generally not attempt to use this method for samples of less than 30. Even
then, subjective judgment is required to determine whether the points fall
along a straight line.
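The plotting positions 100 i/(N + 1) and the corresponding z values of Table 4.5 can be generated as follows; `NormalDist.inv_cdf` from the Python standard library stands in for reading the normal table.

```python
from statistics import NormalDist

# Plotting positions and z values for a normal-probability plot, N = 10.
N = 10
percents = [100 * i / (N + 1) for i in range(1, N + 1)]                  # column 1
z_values = [NormalDist().inv_cdf(i / (N + 1)) for i in range(1, N + 1)]  # column 2

print([round(p, 1) for p in percents])   # 9.1, 18.2, ..., 90.9
# The z values are symmetric about zero, e.g. z_1 = -z_10; a plot of the
# order statistics against these z values is the "test for normality."
```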
Table 4.5. Test of Normality Using Order Statistics(a)

    Intercept = 107.6 = estimate of the mean

(a) The example is merely illustrative of the method. In practice one would never test normality on a sample
of size 10.
(b) Values of 100 i/(N + 1), where N = 10.

As an example of the use of the normal distribution, consider the fraction
of the population having serum glucose values greater than 110 mg/dl. The
corresponding reduced variate is

    z = (110 − 100.42)/12.15 = 0.79

and the normal table gives, for the fraction above this point,

    P = 0.215
Let p represent the fraction of individuals having the stated character-
istic (serum glucose greater than 110 mg/dl) in the sample of size N; and let
q = 1 − p. It is clear that for a relatively small, or even a moderately large
N, p will generally differ from P. In fact, p is a random variable with a well-
defined distribution, whose mean is

    E(p) = P    (4.24)

where the symbol E(p) represents the "expected value" of p, another name
for the population mean. Thus the population mean of the distribution of p is
equal to the parameter P. If p is taken as an estimate for P, this estimate will
therefore be unbiased.
Furthermore:
    Var(p) = P(1 − P)/N    (4.25)

Hence

    σp = √(P(1 − P)/N)    (4.26)
For N sufficiently large, this distribution can be closely
approximated by the normal distribution of the same mean and standard de-
viation. This enables us to easily solve practical problems that arise in con-
nection with the binomial. For example, returning to our sample of 100 indi-
viduals from the population given in Table 4.1, we have:
    E(p) = 0.215

    σp = √((0.215)(0.785)/100) = 0.0411
    √((0.18)(1 − 0.18)/100) = 0.038
as an estimate for σp. This would lead to the following approximate 95 per-
cent confidence interval for P:

    0.18 − (1.96)(0.038) < P < 0.18 + (1.96)(0.038)

or

    0.106 < P < 0.254
The above discussion gives a general idea about the uses and usefulness
of the binomial distribution. More detailed discussions will be found in two
general references.
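Equation 4.26 and the two numerical examples above can be sketched as:

```python
import math

# Equation 4.26 with P = 0.215 and N = 100 (the serum-glucose example) ...
P, N = 0.215, 100
sigma_p = math.sqrt(P * (1 - P) / N)
print(round(sigma_p, 4))    # 0.0411

# ... and with the observed p = 0.18 substituted as an estimate of P.
p = 0.18
estimate = math.sqrt(p * (1 - p) / N)
print(round(estimate, 3))   # 0.038
```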
Consider the data in Table 4.6, taken from a study of the hexokinase
method for determining serum glucose. For simplicity of exposition, Table
Table 4.6. Determination of Serum Glucose
                         Serum sample
Laboratory        A         B         C         D

[Individual entries largely illegible in the source.]
4.6 contains only a portion of the entire set of data obtained in this study.
Each of four laboratories made three replicate determinations on each of
four serum samples. It can be observed that, for each sample, the results
obtained by different laboratories tend to show greater differences than re-
sults obtained through replication in the same laboratory. This observation
can be made quantitative by calculating, for each sample, two standard de-
viations: the standard deviation ''within" laboratories and the standard de-
viation "between" laboratories. Within-laboratory precision is often re-
ferred to as repeatability and between-laboratory precision as reproduc-
ibility.
    s = 1.117

The calculations for all four serum samples are summarized in Table 4.7, in
which standard deviations are rounded to two decimal places.
It may be inferred from Table 4.7 that the standard deviations tend to increase with the glu-
cose concentration. To judge the accuracy of the method, reference values are needed; we
assume that such values have been established and are as follows:
Serum Sample Reference Value
A 40.8
B 76.0
C 133.4
D 204.1
The values given here as "reference values" are actually only tentative. We
will assume, however, in our present discussion, that they can be considered
to be free of systematic errors. Our task is to decide whether the values ob-
tained in our study are, within random experimental error, equal to these
reference values. The grand average value for sample A, 41.89 mg/dl, which
we denote by the symbol x̄, involves 12 individual determinations and four
laboratories. Its variance, therefore, can be estimated by the formula:

    s²x̄ = s²w/12 + s²b/4

where sw and sb are the within- and between-laboratory standard deviations
for the sample (for sample A, sb = 1.04).
Corresponding values for all four samples are shown in Table 4.8.
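The grand-average formula can be sketched numerically; the within-laboratory value sw below is hypothetical (the source is illegible at that point), while sb = 1.04 is the between-laboratory value quoted for sample A.

```python
import math

# Standard error of a grand average over 4 laboratories with 3 replicates
# each (12 determinations): s_xbar = sqrt(s_w^2 / 12 + s_b^2 / 4).
s_w = 0.90   # hypothetical within-laboratory standard deviation
s_b = 1.04   # between-laboratory standard deviation quoted for sample A

s_xbar = math.sqrt(s_w ** 2 / 12 + s_b ** 2 / 4)
print(round(s_xbar, 2))
```

Note that the between-laboratory term is divided only by the number of laboratories: replication within a laboratory does not reduce between-laboratory variability.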
It can be seen that, on the one hand, all four grand averages are larger
than the corresponding reference values but, on the other hand, the differ-
ences D are of the order of only one or two standard errors sx̄. One would
tentatively conclude that the method shows a positive systematic error (bias)
but, as has been pointed out above, the data are insufficient to arrive at defi-
nite conclusions.
Even at zero glucose concen-
tration, one finds a nonzero absorbance value. If one "corrected" the sub-
sequent two values for the blank, one would obtain 0.189 − 0.050 = 0.139,
and 0.326 − 0.050 = 0.276. If the "corrected" absorbance were proportion-
al to concentration (as required by Beer's law), these two corrected absor-
bances should be proportional to 50 and 100, i.e., in a ratio of 1 to 2. Ac-
tually, 0.139 is slightly larger than (0.276/2). We will assume that this is due
to random experimental error. Fitting a straight line to the data of Table 4.9
(see below) yields

    ŷ = 0.0516 + 0.0027571 x

    se = 0.0019
A general model
If α represents the true value of the "blank" and β the absorbance per
unit concentration, we have, according to Beer's law:

    E(y) = α + βx    (4.27)

where E(y) is the expected value for absorbance, i.e., the absorbance value
freed of experimental error. The observed absorbance y differs from E(y) by
a random error ε:

    y = E(y) + ε    (4.28)

This equation should hold for all x-values, i.e., x1, x2, . . . , xN, with the same
values of α and β. Hence

    yi = α + βxi + εi    (4.30)

where i = 1 to N.
The errors εi should, on the average, be zero, but each one departs from zero
by a random amount. We will assume that these random departures from
zero do not increase with the absorbance (in some cases, this assumption is
not valid) and that their distribution is Gaussian with standard deviation σe.
The object of the analysis is to estimate: (a) α and β, as well as the uncer-
tainties (standard errors) of these estimates; and (b) the standard deviation
of ε, i.e., σe.
Formulas for linear regression

The fitting process is known in the statistical literature as the "linear
regression of y on x." We will denote the estimates of α, β, and σe by α̂, β̂,
and se, respectively. The formulas involve the following three quantities:

    U = Σ(xi − x̄)²    (4.31)

    W = Σ(yi − ȳ)²    (4.32)

    P = Σ(xi − x̄)(yi − ȳ)    (4.33)

The estimates are given by

    β̂ = P/U,    α̂ = ȳ − β̂x̄    (4.34)

    se = √((W − P²/U)/(N − 2))    (4.35)

and the standard errors of the estimated coefficients by

    sβ̂ = se/√U,    sα̂ = se √(1/N + x̄²/U)    (4.36)

For the data of Table 4.9, the calculations result in the following values:
α̂ = 0.0516, sα̂ = 0.0010, β̂ = 0.0027571, sβ̂ = 0.0000036, se = 0.0019.
Since α̂ and β̂ are now available, we can calculate, for each x, a "calcu-
lated" (or "fitted") value, ŷ, given by the equation ŷ = α̂ + β̂x. This is, of
course, simply the ordinate of the point on the fitted line for the chosen value
of x.
The difference between the observed value y and the calculated value ŷ
is called a "residual." Table 4.9 also contains the values of ŷ and the resid-
uals, denoted by the symbol d.
It is important to observe that the quantity (W − P²/U), occurring in
Equation 4.35, is simply equal to Σd²i. Thus:

    se = √(Σd²i/(N − 2))    (4.37)

This formula, though mathematically equivalent to Equation 4.35, should be
used in preference to Equation 4.35, unless all calculations are carried out
with many significant figures. The reason for this is that the quantities di are
less affected by rounding errors than the quantity (W − P²/U).
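Equations 4.31 through 4.37 can be collected into a short sketch. The x, y data below are hypothetical (Table 4.9 itself is not reproduced here) and are chosen to lie exactly on a line, so the residuals d, and hence se, vanish.

```python
import math

# Linear regression of y on x (Equations 4.31 to 4.37) on hypothetical data.
x = [0.0, 50.0, 100.0, 150.0]
y = [0.052, 0.190, 0.328, 0.466]   # exactly linear: 0.052 + 0.00276 * x

N = len(x)
xbar, ybar = sum(x) / N, sum(y) / N
U = sum((xi - xbar) ** 2 for xi in x)                        # Equation 4.31
W = sum((yi - ybar) ** 2 for yi in y)                        # Equation 4.32
P = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))   # Equation 4.33

beta = P / U                   # slope estimate
alpha = ybar - beta * xbar     # intercept estimate (Equation 4.34)
d = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]       # residuals
s_e = math.sqrt(sum(di ** 2 for di in d) / (N - 2))          # Equation 4.37

print(round(alpha, 3), round(beta, 5))   # 0.052 0.00276
```

Using the residuals for se, rather than (W − P²/U), follows the rounding-error advice given above.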
When the standard deviation of y is not constant but varies with x, each
point can be given a weight

    wi = 1/σ²i    (4.38)

where σi is the standard deviation of yi. The weights wi are then used in the
regression calculations, leading to for-
mulas that are somewhat different from those given in this section. For fur-
ther details, two references can be consulted.
Propagation of errors
An example
As an example, consider the determination of glucose in serum, using
an enzymatic reaction sequence. The sequence generates a product, the opti-
cal absorbance of which is measured on a spectrophotometer. The proce-
dure consists of three steps: (a) apply the enzyme reaction sequence to a set
of glucose solutions of known concentrations, and establish in this way a cal-
ibration curve of "absorbance" versus "glucose concentration," (b) by use
of the same reaction sequences, measure the absorbance for the "un-
known," and (c) using the calibration curve, convert the absorbance for the
unknown into a glucose concentration.
It turns out that the calibration curve, for this sequence of reactions, is
a straight line:

    y = α + βx    (4.39)

where α and β are estimated from the calibration val-
ues. We will again use the data of Table 4.9 for illustration. Fitting a straight
line to these data, we obtain:

    ŷ = 0.0516 + 0.0027571 x,    se = 0.0019
Let us assume, at this point, that the uncertainty of the calibration line
is negligible. Then the only quantity affected by error is yu, the measured
absorbance for the unknown, and it is readily
seen from Equation 4.41 that the error of xu is equal to that of ȳu divided by
β̂. If we assume that the standard deviation of a single measured y-value is
0.0019 absorbance units, then the standard error of ȳu, the average of four
determinations, is

    0.0019/√4 = 0.00095
A more rigorous treatment would also take account of the uncertainty of the
calibration line.
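Steps (b) and (c) and the error statement above can be sketched as follows; the calibration coefficients and the four absorbance readings are hypothetical, and only σ_y = 0.0019 is taken from the text:

```python
# Converting an unknown's absorbance to concentration via the calibration
# line, treating the line itself as error-free.  alpha, beta, and the four
# absorbance readings below are hypothetical.
import math

alpha, beta = 0.002, 0.0163          # assumed fitted line y = alpha + beta*x

y_readings = [0.520, 0.523, 0.518, 0.521]   # four hypothetical readings
y_u = sum(y_readings) / len(y_readings)

x_u = (y_u - alpha) / beta           # inverted calibration line

sigma_y = 0.0019                     # sd of a single measured y-value
se_ybar = sigma_y / math.sqrt(4)     # standard error of the average of four
se_xu = se_ybar / beta               # error of x_u is that of y_u divided by beta
print(x_u, se_xu)
```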
The problem to be solved is the calculation of the standard devia-
tion of a calculated quantity y when the error in each measured value xᵢ is
unaffected by the errors in the other x's. For such independent errors in the meas-
ured values x₁, x₂, x₃, . . . , some simple rules can be applied. They are all
derived from the application of a general formula known as "the law of prop-
agation of errors," which is valid under very general conditions. The reader
is referred to Mandel for a general discussion of this formula.
S = W₁ − W₂   (4.44)

which can be written as

S = (1)W₁ + (−1)W₂

Hence

σ_S = √(σ_W₁² + σ_W₂²)   (4.45)

Note that in spite of the negative sign occurring in Equation 4.44, the vari-
ances of W₁ and W₂ in Equation 4.45 are added (not subtracted from each
other).
It is also of great importance to emphasize that Equation 4.43 is valid
only if the errors in the measurements x₁, x₂, x₃, . . . , are independent of
each other. Thus, if a particular element in chemical analysis was deter-
mined as the difference between 100 percent and the sum of the concentra-
tions found for all other elements, the error in the concentration for that
element would not be independent of the errors of the other elements, and
Equation 4.43 could not be used for any linear combination of the type of
Equation 4.42 involving the element in question and the other elements. But
in that case, Equations 4.42 and 4.43 could be used to evaluate the error vari-
ance for the element in question by considering it as the dependent variable
y. Thus, in the case of three other elements x₁, x₂, and x₃, we would have:
y = 100 − (x₁ + x₂ + x₃)   (4.46)

σ_y² = σ_x₁² + σ_x₂² + σ_x₃²   (4.47)

For a product y = x₁x₂ of quantities with independent errors, the law of
propagation of errors gives:

(σ_y/y)² = (σ_x₁/x₁)² + (σ_x₂/x₂)²   (4.48)
Equation 4.48 states that for products of independent errors, the squares of
the relative errors are additive.
The same law applies for ratios of quantities with independent errors.
Thus, when x₁ and x₂ have independent errors, and

y = x₁/x₂   (4.49)

we have

(σ_y/y)² = (σ_x₁/x₁)² + (σ_x₂/x₂)²   (4.50)
As an illustration, suppose that in a gravimetric analysis, the sample weight
is S, the weight of the precipitate is W, and the "conversion factor" is F.
Then:

y = 100 F W / S

The constants 100 and F are known without error. Hence, for this example,

(σ_y/y)² = (σ_S/S)² + (σ_W/W)²
If, for example, the coefficient of variation for S is 0.1 percent, and that for
W is 0.5 percent, we have:

σ_y/y = √((0.005)² + (0.001)²) = 0.0051

It is seen that in this case, the error of the sample weight S has a negligible
effect on the error of the "unknown" y.
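The addition of squared relative errors can be checked numerically. The sketch below compares the propagation formula with a Monte Carlo simulation under assumed normal errors; the nominal values of S, W, and F are hypothetical:

```python
# For y = 100*F*W/S the relative variances of S and W add (F and 100 are
# error-free).  A Monte Carlo simulation under assumed normal errors is
# compared with the propagation-of-errors result.
import math
import random

cv_S, cv_W = 0.001, 0.005            # coefficients of variation from the text

cv_y = math.sqrt(cv_S ** 2 + cv_W ** 2)   # propagation-of-errors result

random.seed(1)
S0, W0, F = 1.000, 0.250, 2.0        # hypothetical nominal values
ys = []
for _ in range(100_000):
    S = random.gauss(S0, cv_S * S0)
    W = random.gauss(W0, cv_W * W0)
    ys.append(100 * F * W / S)
mean_y = sum(ys) / len(ys)
sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in ys) / (len(ys) - 1))
print(cv_y, sd_y / mean_y)           # the two agree closely
```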
Logarithmic functions. — When the calculated quantity y is the natural
logarithm of the measured quantity x (we assume that x > 0):

y = ln x   (4.51)

σ_y = σ_x/x   (4.52)
Similarly, when y = log₁₀ x (4.53), the standard deviation is σ_y = 0.4343 σ_x/x,
since log₁₀ x = 0.4343 ln x.
An example
As an illustration, suppose that a standard requires that the mercury
content of natural water should not exceed 2 μg/l. Suppose, furthermore,
that the standard deviation of reproducibility of the test method (see section
on precision and accuracy, and Mandel), at the level of 2 μg/l, is 0.88 μg/l. If
subsamples of the water sample are sent to a number of laboratories and
each laboratory performs a single determination, we may wish to determine
the number of laboratories that should perform this test to ensure that we
can detect noncompliance with the standard. Formulated in this way, the
problem has no definite solution. In the first place, it is impossible to guaran-
tee unqualifiedly the detection of any noncompliance. After all, the decision
will be made on the basis of measurements, and measurements are subject to
experimental error. Even assuming, as we do, that the method is unbiased,
we still have to contend with random errors. Second, we have, so far, failed
to give precise meanings to the terms "compliance" and "noncompliance";
while the measurement in one laboratory might give a value less than 2 μg/l
of mercury, a second laboratory might report a value greater than 2 μg/l.
Fig. 4.2. Distribution of measurements of mercury in subsamples of a water sample
sent to N laboratories. (Horizontal axis: concentration of mercury, μg/l; regions
marked ACCEPTABLE and UNACCEPTABLE.)
the sample, as complying, whenever x̄ is less than 2.0, and reject it, as non-
complying, whenever x̄ is greater than 2.0. As a result of setting our risks at
5 percent, this implies that the areas A and B are each equal to 5 percent (see
Fig. 4.2). From the table of the normal distribution, we read that for a 5 per-
cent one-tailed area, the value of the reduced variate is 1.64. Hence:

(2.5 − 2.0)/(0.88/√N) = 1.64

(We could also state the requirement that (2.0 − 2.5)/(0.88/√N) = −1.64,
which is algebraically equivalent to the one above.) Solving for N, we find:
N = (z_c σ/d)²   (4.56)

where σ is the appropriate standard deviation, z_c is the value of the reduced
normal variate corresponding to the risk probability (5 percent in the above
example), and d is the departure (from the specified value) to which the cho-
sen risk probability applies.
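With the numbers of the mercury example (z_c = 1.64, σ = 0.88 μg/l, d = 2.5 − 2.0 = 0.5 μg/l), Equation 4.56 can be evaluated directly; rounding the result up to a whole number of laboratories is an added assumption:

```python
# Required number of laboratories, Equation 4.56, for the mercury example.
import math

z_c, sigma, d = 1.64, 0.88, 0.5
N = (z_c * sigma / d) ** 2           # Equation 4.56
print(N, math.ceil(N))               # fractional N rounded up to whole laboratories
```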
σ = √(σ_r²/N + σ_L²)   (4.57)

The term σ_L² must be included, since the laboratory mean may differ
from the true value by a quantity whose standard deviation is σ_L. Since the
between-laboratory component σ_L² is not divided by N, σ cannot be less
than σ_L no matter how many determinations are made in the single laborato-
ry. Therefore, the risks of false acceptance or false rejection of the sample
cannot be chosen at will. If in our case, for example, we had σ_r = 0.75 μg/l
and σ_L = 0.46 μg/l, the total σ cannot be less than 0.46. Considering the fa-
vorable case, μ = 1.5 μg/l, the reduced variate (see Fig. 4.2) is:

(2.0 − 1.5)/0.46 = 1.09
ing (as complying) a sample that is actually noncomplying. The conclusion to
be drawn from the above argument is that, in some cases, testing error will
make it impossible to keep the double risk of accepting a noncomplying prod-
uct and rejecting a complying product below a certain probability value. If,
as in our illustration, the purpose of the standard is to protect health, the
proper course of action is to set the specified value at such a level that, even
allowing for the between-laboratory component of test error, the risk of de-
claring a product as complying, when it is actually noncomplying, is low. If,
in our illustration, a level of 2.5 μg/l is such that the risk of false acceptance
of it (as complying) should be kept to 5 percent (and σ = 0.46 μg/l), then the
specification limit should be set at a value x such that:

(2.5 − x)/0.46 = 1.64
Transformation of scale
or even
y = √x   (4.60)

and

y = arcsin √x   (4.61)
Robustness
The reason given above for making a transformation of scale is the pres-
ence of skewness. Another reason is that certain statistical procedures are
valid only when the data are at least approximately normal. The procedures
may become grossly invalid when the data have a severely non-normal distri-
bution.
A statistical procedure that is relatively insensitive to non-normality in
the original data (or, more generally, to any set of specific assumptions) is
called "robust." Confidence intervals for the mean, for example, are quite
robust because, as a result of the central limit theorem, the distribution of
the sample mean x will generally be close to normality. On the other hand,
tolerance intervals are likely to be seriously affected by non-normality. We
have seen that nonparametric techniques are available to circumvent this dif-
ficulty.
Suppose that, for a particular type of measurement, tests of normality
on many sets of data always show evidence of non-normality. Since many
statistical techniques are based on the assumption of normality, it would be
advantageous to transform these data into new sets that are more nearly nor-
mal.
Fortunately, the transformations that reduce skewness also tend, in gen-
eral, to achieve closer compliance with the requirement of normality. There-
fore, transformations of the logarithmic type, as well as the square root and
arcsine transformations, are especially useful whenever a nonrobust analy-
sis is to be performed on a set of data that is known to be seriously non-
normal. The reader is referred to Mandel for further details regarding trans-
formations of scale.
A transformation of scale also changes the error structure of the data, and
transformations are, in fact, often used for
the purpose of making the experimental error more uniform over the entire
range of the measurements. Transformations used for this purpose are called
"variance-stabilizing" transformations. To understand the principle in-
volved, consider the data in Table 4.10, consisting of five replicate absor-
bance values at two different concentrations, obtained in the calibration of
spectrophotometers for the determination of serum glucose. At the higher
concentration level, the absorbance values are of course higher, but so is the
standard deviation of the replicate absorbance values. The ratio of the aver-
age absorbance values is 1.6435/0.1987 = 8.27. The ratio of the standard de-
viations is 0.0776/0.0127 = 6.11. Thus the standard deviation between repli-
cates tends to increase roughly in proportion to the level of the measure-
ment. We have here an example of "heterogeneity of variance." Let us now
examine the two sets of values listed in Table 4.10 under the heading "trans-
formed data." These are simply the logarithms to the base 10 of the original
absorbance values. This time, the standard deviations for the two levels are in
the proportion 0.0199/0.0288 = 0.69. Thus, the logarithmic transformation
essentially has eliminated the heterogeneity of variance. It has, in fact, "sta-
bilized" the variance. The usefulness of variance stabilizing transformations
is twofold: (a) a single number will express the standard deviation of error,
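The stabilizing effect of the logarithmic transformation can be demonstrated with two small sets of replicates. The values below are hypothetical, chosen only to mimic the behavior of the Table 4.10 data (scatter roughly proportional to level):

```python
# Variance-stabilizing transformation: replicates whose scatter grows with
# the level have roughly constant scatter after taking log10.
# The replicate values are hypothetical.
import math

def sd(vals):
    m = sum(vals) / len(vals)
    return math.sqrt(sum((v - m) ** 2 for v in vals) / (len(vals) - 1))

low  = [0.190, 0.205, 0.185, 0.210, 0.198]   # hypothetical low-level replicates
high = [1.55, 1.70, 1.58, 1.72, 1.66]        # hypothetical high-level replicates

ratio_raw = sd(high) / sd(low)
ratio_log = sd([math.log10(v) for v in high]) / sd([math.log10(v) for v in low])
print(ratio_raw, ratio_log)   # the log transform brings the ratio close to 1
```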
The law of propagation of errors (see that section) enables one to calcu-
late the number of significant figures in a calculated value. A useful rule of
thumb is to report any standard deviation or standard error with two signifi-
cant figures, and to report a calculated value with as many significant figures
as are required to reach the decimal position of the second significant digit of
its standard error.
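This rule of thumb is easy to mechanize. A sketch (the function name and its rounding conventions are choices made here, not part of the text):

```python
# Report the standard error with two significant figures, and round the
# value to the decimal position of the second significant digit of the error.
import math

def report(value, stderr):
    exp = math.floor(math.log10(abs(stderr)))   # place of first significant digit
    decimals = 1 - exp                          # place of second significant digit
    return round(value, decimals), round(stderr, decimals)

# applied to the percent Mn example discussed below in this section
print(report(13.809872, 0.0044))
```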
An example
Consider the volumetric determination of manganese in manganous cy-
clohexanebutyrate by means of a standard solution of sodium arsenite. The
formula leading to the desired value of percent Mn is

percent Mn = 100 (200/15) v t / w(mg)

where w is the weight of the sample, v the volume of reagent, and t the titer
of the reagent, and the factor 200/15 is derived from taking an aliquot of 15
ml from a total volume of 200 ml.
For a particular titration, the values and their standard errors are found
to be:

v = 23.67        σ_v = 0.0040
t = 0.41122      σ_t = 0.000015
200              σ = 0.0040
15               σ = 0.0040
w = 939.77       σ_w = 0.0060
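Assuming the formula percent Mn = 100 (200/15) v t / w and independent errors, so that the squared relative errors add, the value and its standard error can be checked numerically:

```python
# Percent Mn and its standard error by propagation of errors through a
# product/quotient: the squared relative errors of the factors add.
import math

v, sv = 23.67, 0.0040
t, st = 0.41122, 0.000015
a1, sa1 = 200.0, 0.0040     # total volume
a2, sa2 = 15.0, 0.0040      # aliquot
w, sw = 939.77, 0.0060

pct_mn = 100 * (a1 / a2) * v * t / w

rel_var = (sv / v) ** 2 + (st / t) ** 2 + (sa1 / a1) ** 2 \
        + (sa2 / a2) ** 2 + (sw / w) ** 2
se = pct_mn * math.sqrt(rel_var)
print(pct_mn, se)
```

The result reproduces the 13.809872 and 0.0044 quoted in the text.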
The values are reported as they are read on the balance or on the burettes
and pipettes; their standard errors are estimated on the basis of previous ex-
perience. The calculation gives:
Percent Mn = 13.809872
σ = 0.0044

On the basis of this standard deviation, we would report this result as
13.8099, with a standard error of 0.0044.
1 13.81 11 13.92
2 13.76 12 13.83
3 13.80 13 13.73
4 13.79 14 13.99
5 13.94 15 13.89
6 13.76 16 13.76
7 13.88 17 13.88
8 13.81 18 13.82
9 13.84 19 13.87
10 13.79 20 13.89
Average = x̄ = 13.838
s = 0.068
s_x̄ = 0.068/√20 = 0.015
standard deviation of the replicate values is 0.068; therefore, the standard
error of the mean is 0.068/√20 = 0.015. The final value reported for this
analysis would therefore be 13.838, with a standard error of 0.015.
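The summary statistics of the twenty replicate values can be verified directly:

```python
# Mean, standard deviation, and standard error of the 20 replicate
# determinations listed above.
import math

x = [13.81, 13.76, 13.80, 13.79, 13.94, 13.76, 13.88, 13.81, 13.84, 13.79,
     13.92, 13.83, 13.73, 13.99, 13.89, 13.76, 13.88, 13.82, 13.87, 13.89]
n = len(x)
mean = sum(x) / n
s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
se = s / math.sqrt(n)
print(round(mean, 3), round(s, 3), round(se, 3))
```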
General recommendations
It is good practice to retain, for individual measurements, more signifi-
cant figures than would result from calculations based on error propagation,
and to use this law only for reporting the final value. This practice enables
any interested person to perform whatever statistical calculations he desires
on the individually reported measurements. Indeed, the results of statistical
manipulations of data, when properly interpreted, are never affected by un-
necessary significant figures in the data, but they may be seriously impaired
by too much rounding.
The practice of reporting a measured value with a ± symbol followed
by its standard error should be avoided at all costs, unless the meaning of
the ± symbol is specifically and precisely stated. Some use the ± symbol to
indicate a standard error of the value preceding the symbol, others to in-
dicate a 95 percent confidence interval for the mean, others for the standard
deviation of a single measurement, and still others use it for an uncertainty
interval including an estimate of bias added to the 95 percent confidence in-
terval. These alternatives are by no means exhaustive, and so far no stand-
ard practice has been adopted. It is of the utmost importance, therefore, to
define the symbol whenever and wherever it is used.
It should also be borne in mind that the same measurement can have,
and generally does have, more than one precision index, depending on the
framework (statistical population) to which it is referred. For certain pur-
poses, this population is the totality of (hypothetical) measurements that
would be generated by repeating the measuring process over and over again
on the same sample in the same laboratory. For other purposes, it would be
the totality of results obtained by having the sample analyzed in a large num-
ber of laboratories. The reader is referred to the discussion in the section on
precision and accuracy.
Tests of significance
General considerations
A considerable part of the published statistical literature deals with sig-
nificance testing. Actually, the usefulness of the body of techniques classi-
fied under this title is far smaller than would be inferred from its prominence
in the literature. Moreover, there are numerous instances, both published
and unpublished, of serious misinterpretations of these techniques. In many
applications of significance testing, a "null-hypothesis" is formulated that
consists of a statement that the observed experimental result (for example,
the improvement resulting from the use of a drug compared to a placebo) is
not "real," but simply the effect of chance. This null-hypothesis is then sub-
jected to a statistical test and, if rejected, leads to the conclusion that the
beneficial effect of the drug is "real," i.e., not due to chance. A closer exami-
nation of the nature of the null-hypothesis, however, raises some serious
questions about the validity of the logical argument. In the drug-placebo
comparison, the null-hypothesis is a statement of equality of the means of
two populations, one referring to results obtained with the drug and the oth-
er with the placebo. All one infers from the significance test is a probability
statement regarding the observed (sample) difference, on the hypothesis that
the true difference between the population means is zero. The real question,
of course, is related not to the means of hypothetical populations but rather
to the benefit that any particular subject, selected at random from the rele-
vant population of patients, may be expected to derive from the drug.
Viewed from this angle, the usefulness of the significance test is heavily de-
pendent on the size of the sample, i.e., on the number of subjects included in
the experiment. This size will determine how large the difference between
the two populations must be, as compared to the spread of both popu-
lations, before the statistical procedure will pick it up with a reasonable prob-
ability. Such calculations are known as the determination of the "power" of
the statistical test of significance. Without indication of power, a test of sig-
nificance may be very misleading.
that show
better results with the drug as compared to the placebo. There-
fore, our acceptance of the greater effectiveness of the drug on the basis of
the data will involve risks of error. If the true situation is (a), we may wish to
have only a small probability of declaring the drug superior, say, a probabili-
ty of 10 percent. On the other hand, if the true situation is (b), we would
want this probability to be perhaps as high as 90 percent. These two probabil-
ities then allow us to calculate the required sample size for our experiment.
Using this sample size, we will have assurance that the power of our experi-
ment is sufficient to realize the stipulated probability requirements.
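A sketch of such a sample-size calculation, using the standard normal-theory approximation n = 2((z_α + z_β)σ/δ)² per group; the true difference δ and the σ used below are hypothetical choices, not values from the text:

```python
# Sample size from two stipulated probabilities: a 10 percent chance of
# declaring the drug superior when there is no true difference, and a
# 90 percent chance of declaring it superior when the true difference is
# delta.  delta and sigma are hypothetical.
import math
from statistics import NormalDist

alpha, power = 0.10, 0.90
delta, sigma = 2.0, 3.85             # assumed true difference and population sd

z_alpha = NormalDist().inv_cdf(1 - alpha)
z_beta = NormalDist().inv_cdf(power)
n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
print(math.ceil(n))                  # patients required in each group
```

Note how rapidly the required n grows as δ shrinks relative to σ, which is the "power" issue discussed above.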
An example
An illustration of this class of problems is shown in Table 4.12. The data
result from the comparison of two drugs, S (standard) and E (experimental),
for the treatment of a severe pulmonary disease. The data represent the re-
duction in blood pressure in the heart after administration of the drug. The
test most commonly used for such a comparison is Student's t test. In the
present case, the value found for t is 3.78, for DF = 142 (DF = number of
degrees of freedom). The probability of obtaining a value of 3.78 or larger by
pure chance (i.e., for equal efficacy of the two drugs) is less than 0.0002. The
test and the conclusion are of little value for the solution of the real problem
underlying this situation, as the following treatment shows. If we assume, as
a first approximation, that the standard deviation 3.85 is the "population pa-
rameter" cr, and that the means, 0.10 for S and 2.53 for E, are also popu-
lation parameters, then the probability of a single patient being better off
                               S       E
Number of patients             68      76
Average                        0.10    2.53
Standard deviation             3.15    4.28
Standard error of average      0.38    0.50
True mean                      μ_S     μ_E
Q = (2.53 − 0.10)/3.85 = 0.63
This can be readily understood by looking at Figure 4.3, in which the means
of two populations, S and E, are less than one standard deviation apart, so
that the curves show a great deal of overlap. There is no question that the
two populations are distinct, and this is really all the t test shows. But due to
the overlap, the probability is far from overwhelming that treatment E will
be superior to treatment S for a randomly selected pair of individuals. It can
be shown that this probability is that of a random normal deviate exceeding
the value −Q/√2 or, in our case, −0.45. This probability is approximately
67 percent.
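The probability statement can be reproduced with the normal distribution function:

```python
# Probability that a randomly chosen E patient does better than a randomly
# chosen S patient: the difference of two independent normals with common
# sigma has standard deviation sigma*sqrt(2), so the probability is
# Phi(Q/sqrt(2)) with Q = (2.53 - 0.10)/3.85.
import math
from statistics import NormalDist

Q = (2.53 - 0.10) / 3.85
p = NormalDist().cdf(Q / math.sqrt(2))
print(round(Q, 2), round(p, 2))
```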
Diagnostic tests of this type differ from quantitative types of tests in that
their outcome is characterized by a simple dichotomy into positive or negative
cases.
As an example, consider Table 4.13, representing data on the alpha-feto-
protein (AFP) test for the diagnosis of hepatocellular carcinoma. What do
these data tell us about the value of the AFP test for the diagnosis of this
disease?
Sensitivity = 90/107 = 0.8411 = 84.11%

Specificity = 2079/2118 = 0.9816 = 98.16%
                 Hepatocarcinoma
Test       Present     Absent      Total
+          90          39          129
−          17          2,079       2,096
Total      107         2,118       2,225
Table 4.14. Values for Alpha-Fetoprotein Tests Derived from Table 4.13
                 Hepatocarcinoma
Test       Present     Absent      Total
+          900         39          939
−          170         2,079       2,249
Total      1,070       2,118       3,118
Table 4.14 is obtained from Table 4.13 by multiplying each value in the "Present"
column by 10, and by leaving the values in the "Absent" column un-
changed. Table 4.14 leads to the same sensitivity and specificity values as
Table 4.13. However, the (PV+) value is now 900/939 = 95.85 percent.

It is seen that the (PV+) value depends not only on the sensitivity and
the specificity but also on the prevalence of the disease in the total popu-
lation. In Table 4.13, this prevalence is 107/2225 = 4.809 percent, whereas
in Table 4.14 it is 1070/3118 = 34.32 percent.

A logical counterpart of the (PV+) value is the predictive value of a neg-
ative test, or (PV−).

Predictive value of a negative test. — (PV−) is defined as the proportion
of subjects free of the disease among those showing a negative test. For the
data of Table 4.13, the (PV−) value is 2079/2096 = 99.19 percent, whereas
for Table 4.14, (PV−) = 2079/2249 = 92.44 percent. As is the case for
(PV+), the (PV−) value depends on the prevalence of the disease.
The following formulas relate (PV + ) and (PV-) to sensitivity, specifici-
ty, and prevalence of the disease. We denote sensitivity by the symbol SE,
specificity by SP, and prevalence by P; then:
(PV+) = 1 / [1 + (1 − SP)(1 − P) / (SE · P)]   (4.62)

(PV−) = 1 / [1 + (1 − SE) P / (SP (1 − P))]   (4.63)

For the data of Table 4.13:

(PV+) = 1 / [1 + (1 − 0.9816)(1 − 0.04809) / ((0.8411)(0.04809))] = 0.70
Apart from rounding errors, these values agree with those found by direct
inspection of the table.
of the disease is 4.809 percent. Then the probability that the patient suffers
from hepatocarcinoma is about 70 percent. On the basis of this result, the
patient now belongs to a subgroup of the total population in which the preva-
lence of the disease is 70 percent rather than the 4.8 percent applying to the
total population. Let us assume that a second test is available for the diag-
nosis of hepatocarcinoma, and that this second test is independent of the
AFP test. The concept of independence of two diagnostic tests is crucial for
the correct statistical treatment of this type of problem, but it seems to have
received little attention in the literature. Essentially, it means that in the
class of patients affected by the disease, the proportion of patients showing a
positive result for test B is the same, whether test A was positive or nega-
tive. A similar situation must hold for the class of patients free of the dis-
ease.
In making inferences from this second test for the patient in question,
we can start with a value of prevalence of the disease (P) of 70 percent, rath-
er than 4.8 percent, since we know from the result of the AFP test that the
patient belongs to the subgroup with this higher prevalence rate. As an illus-
tration, let us assume that the second test has a sensitivity of 65 percent and
a specificity of 90 percent and that the second test also is positive for this
patient. Then the new (PV+) value is equal to
(PV+) = 1 / [1 + (1 − 0.90)(1 − 0.70) / ((0.65)(0.70))] = 0.94
If, on the other hand, the second test turned out to be negative, then the
probability that the patient is free of disease would be:

(PV−) = 1 / [1 + (1 − 0.65)(0.70) / ((0.90)(1 − 0.70))] = 0.52
In that case, the two tests essentially would have contradicted each other,
and no firm diagnosis could be made without further investigations.
It can easily be shown that the order in which the independent tests are
carried out has no effect on the final (PV+) or (PV−) value. In fact, the fol-
lowing general formula can be derived that covers any number of indepen-
dent tests and their possible outcomes.
Denote by (SE)ᵢ and (SP)ᵢ the sensitivity and the specificity of the
ith test, where i = 1, 2, 3, . . . , N. Furthermore, define the symbols Aᵢ
and Bᵢ as follows:

Aᵢ = (SE)ᵢ         when the result of test i is +
Aᵢ = 1 − (SE)ᵢ     when the result of test i is −

Bᵢ = 1 − (SP)ᵢ     when the result of test i is +
Bᵢ = (SP)ᵢ         when the result of test i is −
If P is the prevalence rate of the disease before administration of any of
the tests, and P' is the probability that the subject has the disease after ad-
ministration of the tests, then:

P' = P(A₁A₂ · · · A_N) / [P(A₁A₂ · · · A_N) + (1 − P)(B₁B₂ · · · B_N)]   (4.64)
It is important to keep in mind that Equation 4.64 is valid only if all tests
are mutually independent in the sense defined above.
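Equation 4.64 can be sketched as a function that accumulates the products ΠAᵢ and ΠBᵢ over any sequence of independent tests; the numbers below are those of the AFP example:

```python
# Combining any number of mutually independent diagnostic tests.
# Each test contributes A_i and B_i as defined above.
def posterior(p, results):
    """results: list of (SE, SP, outcome), outcome True for +, False for -."""
    num = p
    den = 1.0 - p
    for se, sp, positive in results:
        a = se if positive else 1.0 - se          # A_i
        b = (1.0 - sp) if positive else sp        # B_i
        num *= a
        den *= b
    return num / (num + den)

# AFP test (+) at prevalence 4.809 percent, then a second independent
# test (+) with SE = 0.65, SP = 0.90:
afp = (0.8411, 0.9816, True)
second = (0.65, 0.90, True)
p1 = posterior(0.04809, [afp])              # about 0.70 after the AFP test
p2 = posterior(0.04809, [afp, second])      # after both tests
p2_rev = posterior(0.04809, [second, afp])  # order does not matter
print(round(p1, 2), round(p2, 2), round(p2_rev, 2))
```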
Quality Control
According to the ASQC, the control chart is "a graphical chart with
control limits and plotted values of some statistical measure for a series of
samples or subgroups. A central line is commonly shown."
The results of a laboratory test are plotted on the vertical axis, in units
of the test results, versus time, in hours, days, etc., plotted on the horizontal
axis. Since each laboratory test should be checked at least once a day, the
horizontal scale should be wide enough to cover a minimum of one month of
data. The control chart should be considered as a tool to provide a "real-
time" analysis and feedback for appropriate action. Thus, it should cover a
sufficient period of time to provide sufficient data to study trends, "runs"
above and below the central line, and any other manifestation of lack of ran-
domness (see section on detection of lack of randomness).
General considerations
W. A. Shewhart, in his pioneering work in 1939, developed the prin-
ciples of the control chart. They can be summarized, as was done by E. L.
Grant, as follows: "The measured quantity of a manufactured product is
always subject to a certain amount of variation as a result of chance. Some
stable 'System of Chance Causes' is inherent in any particular scheme of pro-
duction and inspection. Variation within this stable pattern is inevitable. The
reasons for variation outside this stable pattern may be discovered and cor-
rected." If the words "manufactured product" are changed to "laboratory
test," the above statement is directly applicable to the content of this
section.
We can think of the "measured quantity" as the concentration of a par-
ticular constituent in a patient's sample (for example, the glucose content of
a patient's serum). Under the "system of chance causes," this concentra-
tion, when measured many times under the same conditions, will fluctuate in
such a way as to generate a statistical distribution that can be represented by
a mathematical expression. This expression could be the normal distribu-
tion, for those continuous variables that are symmetrically distributed about
the mean value, or it could be some other suitable mathematical function ap-
plicable to asymmetrically or discretely distributed variables (see section on
non-normal distributions). Then, applying the known principles of probabili-
ty, one can find lower and upper limits, known as control limits, that will
define the limits of variation within "this stable pattern" for a given accept-
able tolerance probability. Values outside these control limits will be consid-
ered "unusual," and an investigation may be initiated to ascertain the rea-
sons for this occurrence.
Control limits
According to the ASQC, the control limits are the "limits on a control
chart that are used as criteria for action or for judging whether a set of data
does or does not indicate lack of control."
Probability limits. — If the distribution of the measured quantity is
known, then lower and upper limits can be found so that, on the average, a
predetermined percentage of the values (e.g., 95 percent, 99 percent) will fall
within these limits if the process is under control. The limits will depend on
the nature of the probability distribution. They will differ, depending on
whether the distribution of the measured quantity is symmetric, asymmetric
to the left or to the right, unimodal or bimodal, discrete or continuous, etc.
The obvious difficulty of finding the correct distribution function for
each measured quantity, and of determining the control limits for this distri-
bution, necessitates the use of procedures that are not overly sensitive to the
nature of the distribution function.
Three-sigma limits. — The three-sigma limits, most commonly used in in-
dustrial practice, are based on the following expression:
Control limits = Average of the measured quantity ± three standard
deviations of the measured quantity
The ''measured quantity" could be the mean of two or three replicate
determinations for a particular chemical test, the range of a set of replicate
tests, a proportion defective, a radioactive count, etc.
The range of three standard deviations around the mean, that is, a width
of six standard deviations, usually covers a large percentage of the distribu-
tion. For normally distributed variables, this range covers 99.7 percent of
the distribution (see section on the normal distribution). For non-normally
distributed variables, an indication of the percentage coverage can be ob-
tained by the use of two well-known inequalities:
1) Tchebycheff's Inequality. For any distribution (discrete or continuous,
symmetric or asymmetric, unimodal or bimodal, etc.) with a finite stand-
ard deviation, the interval mean ± Kσ covers a proportion of the popu-
lation of at least 1 − 1/K². Thus for K = 3, the coverage will be at least
1 − 1/9 = 8/9, or roughly 90 percent of the distribution.
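The inequality can be checked for a markedly non-normal case. For an exponential distribution with mean 1 (so σ = 1), the exact coverage of mean ± 3σ is 1 − e⁻⁴, which indeed exceeds the Tchebycheff bound of 8/9:

```python
# Tchebycheff bound versus exact coverage for an exponential distribution
# with mean 1 and sigma = 1:  P(|X - 1| <= 3) = P(X <= 4) = 1 - e^-4.
import math

K = 3
bound = 1 - 1 / K ** 2               # at least 8/9, roughly 0.889
coverage = 1 - math.exp(-4)          # exact coverage of mean +/- 3 sigma
print(bound, coverage, coverage >= bound)
```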
Variability between and within subgroups
The hypothesis σ_B = 0
In control charts for variables, the variability is partitioned into two
components: within and between subgroups. To this effect, the sequence of
measurements is divided into subgroups of n consecutive values each. The
variability within subgroups is estimated by first computing the average of
the ranges of all subgroups and dividing this average by a factor that depends
on n, which can be found in standard statistical tables. As an example, con-
sider the sequence: 10.2, 10.4, 10.1, 10.7, 10.3, 10.3, 10.5, 10.4, 10.0, 9.8,
10.4, 10.9. When divided into subgroups of four, we obtain the arrangement:
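The resulting estimate of the within-subgroup standard deviation can be sketched as follows (d₂ = 2.059 for n = 4, from the table given later in this section):

```python
# Within-subgroup standard deviation for the example sequence, estimated
# from the average subgroup range divided by d2 (d2 = 2.059 for n = 4).
seq = [10.2, 10.4, 10.1, 10.7, 10.3, 10.3, 10.5, 10.4, 10.0, 9.8, 10.4, 10.9]

subgroups = [seq[i:i + 4] for i in range(0, len(seq), 4)]
ranges = [max(g) - min(g) for g in subgroups]
r_bar = sum(ranges) / len(ranges)
s_w = r_bar / 2.059
print([round(r, 1) for r in ranges], round(r_bar, 2), round(s_w, 2))
```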
day of testing, and there may be more variability between days than within
days. The initial set of data (baseline data) is then used primarily to estimate
both the within- and the between-components of variability, and control lim-
its are calculated on the basis of both these components (see section on com-
putation of control limits). Data that are obtained subsequent to the baseline
period are then evaluated in terms of these control lines. From time to time,
the control lines are recalculated using all the data obtained up to that time,
eliminating, however, those data for which abnormal causes of variability
were found.
The general objectives of a control chart are: (a) to obtain initial esti-
mates for the key parameters, particularly means and standard deviations.
These are used to compute the central lines and the control lines for the con-
trol charts; (b) to ascertain when these parameters have undergone a radical
change, either for worse or for better. In the former case, modifications in
the control process are indicated; and (c) to determine when to look for as-
signable causes of unusual variations so as to take the necessary steps to
correct them or, alternatively, to establish when the process should be left
alone.
A daily review of the control chart should indicate whether the result-
ing product or service is in accordance with specifications.
A properly prepared control chart will tend to reflect any change in the
precision and accuracy of the results obtained. To avoid wasting time in
hunting for unnecessary sources of trouble, care should be taken to maintain
laboratory conditions and practices as uniform as possible. These include
sampling procedures, dilution techniques, aliquoting methods, storage meth-
ods, instrumental techniques, calculating procedures, etc.
Means and ranges should be computed for each run and plotted on separate
charts. Records should be accurately kept, using standard quality control
(QC) forms that are readily available. Any value that appears to be the result
of a blunder should be eliminated, and the source of the blunder carefully
noted. It is recommended that the number of runs or subgroups be at least 25
for the baseline period.
The same considerations apply to control charts of proportions and
counts, except that the number of observations for each subgroup is gener-
ally larger than the corresponding number used in a control chart of vari-
ables. Statistical procedures for determining the sample size, n, for the P-
chart or the C-chart can be found in the literature (see Duncan, pp. 345 and
361). In general, n should be large enough to provide a good chance of find-
ing one or more defectives in the sample.
Based on the initial set of data collected during the baseline period, trial
control limits can be determined using the procedure outlined in the section
on random samples. After plotting these limits in the initial control chart (see
section on the case otb 0), points that are outside or very near the limits
should be carefully examined, and if some valid reasons are found for their
erratic behavior, they should be eliminated and new control limits should be
computed. In general, it is better to start with control limits that are relative-
ly narrow in order to better detect future trends, shifts in mean values, and
some other types of irregularities. A common experience is that some initial
subgroups of data will not be under control but, in general, after some knowl-
edge is gained in the use of the control chart, the process will tend to reach a
state of statistical equilibrium. After this time period, one generally has an
adequate amount of data to produce realistic estimates of the mean and
standard deviations.
Two variable control charts should be kept, one for the average value,
and the other for the range of individual determinations in each subgroup. In
all cases in which a non-zero component for between-subgroups is known to
exist, the control limits for the chart of averages will be based on the "total"
standard deviation for subgroup averages.
If the subgroups are of size n, and if σ̂_w² and σ̂_b² represent the estimated
components of variance within subgroups and between subgroups, respec-
tively, then the "total standard deviation" for the averages of subgroups is

S_X̄ = √(σ̂_w²/n + σ̂_b²)   (4.65)
This quantity can also be obtained by directly calculating the standard devia-
tion of the subgroup averages in the baseline period.
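Both routes to this quantity can be sketched in Python. The variance components and subgroup averages below are illustrative assumptions, not data from the text:

```python
import statistics

# Sketch of Eq. (4.65): total standard deviation of subgroup averages
# from estimated variance components. The values are illustrative.

def total_sd_of_averages(s2_within, s2_between, n):
    """S_xbar = sqrt(s2_within / n + s2_between)."""
    return (s2_within / n + s2_between) ** 0.5

# The same quantity can be obtained directly as the standard deviation
# of the observed subgroup averages (illustrative values):
subgroup_averages = [392.0, 390.5, 394.2, 391.8, 393.6]
direct = statistics.stdev(subgroup_averages)

print(total_sd_of_averages(4.0, 9.0, 4))
print(round(direct, 2))
```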
The control chart of the ranges will be used to ascertain whether the
variability among individual readings within subgroups is consistent from
subgroup to subgroup. The limits for this chart will be based on the within-
subgroup standard deviation.
Using the available data for k subgroups, each of size n, we will have the
layout shown in Table 4.15. The standard deviation within subgroups can be
estimated from

S_w = R̄/d2   (4.66)

where

R̄ = ΣR_i/k   (4.67)

and the value of d2 can be obtained from standard control chart tables (see
Duncan, p. 927). Values of d2 for typical sample sizes are given in the fol-
lowing table:

n      d2
2      1.128
3      1.693
4      2.059
5      2.326
The value of S_w can be accurately determined by pooling the variances
from each subgroup (see section on precision and accuracy). However, the
above estimate, based on the average range, is sufficiently accurate if the
number of subgroups is large enough (say, 25 or more).
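The range-based estimate can be sketched directly; the function name is illustrative, and the d2 constants are the tabulated values quoted above:

```python
# Sketch of Eqs. (4.66)-(4.67): within-subgroup standard deviation from
# the average range, S_w = Rbar / d2.

D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def s_within_from_ranges(ranges, n):
    r_bar = sum(ranges) / len(ranges)  # Eq. (4.67): average range
    return r_bar / D2[n]               # Eq. (4.66)

# With the baseline value Rbar = 4.8 for duplicates (n = 2):
print(round(s_within_from_ranges([4.8] * 25, 2), 2))  # 4.26
```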
The standard deviation of the k sample averages is:

S_X̄ = √( Σ(X̄_i − X̄)² / (k − 1) )   (4.68)

where

X̄ = ΣX̄_i / k   (4.69)

Table 4.15. Layout for k subgroups of n determinations each

Subgroup   Determinations            Average   Range
1          X_11, X_12, ..., X_1n     X̄_1      R_1
2          X_21, X_22, ..., X_2n     X̄_2      R_2
3          X_31, X_32, ..., X_3n     X̄_3      R_3
...        ...                       ...       ...
k          X_k1, X_k2, ..., X_kn     X̄_k      R_k
                                     X̄        R̄
The between-subgroup component of variance is estimated as

σ̂_b² = S_X̄² − S_w²/n   (4.70)

and the total standard deviation for individual determinations is:

S_t = √(S_w² + σ̂_b²)   (4.71)
The control limits for the chart of averages are given by:

UCL_X̄ = X̄ + 3S_X̄   (4.72)

and

LCL_X̄ = X̄ − 3S_X̄   (4.73)

where UCL = upper control limit; LCL = lower control limit.
The warning limits are:

UWL_X̄ = X̄ + 2S_X̄   (4.74)

and

LWL_X̄ = X̄ − 2S_X̄   (4.75)

The control limits for the chart of ranges are:

UCL_R = D4 R̄   (4.76)

and

LCL_R = D3 R̄   (4.77)

where

D4 = 1 + 3 d3/d2   (4.78)

and

D3 = 1 − 3 d3/d2   (4.79)

(negative values of D3 are replaced by zero), and the values of d2, d3, D3,
and D4 are given in Natrella and Duncan. For n = 2, those values are
D4 = 3.267 and D3 = 0.
The warning limits for n = 2 are:

UWL_R = 2.512 R̄
LWL_R = 0

where

2.512 = 1 + 2 d3/d2 = 1 + 2(0.853)/1.128
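The limit formulas above can be collected into a short sketch. Helper names are illustrative; the constants d2 = 1.128 and d3 = 0.853 are the tabulated values for n = 2:

```python
# Sketch of the control and warning limits for the mean and range charts.

def xbar_limits(x_bar, s_xbar):
    """3-sigma control limits and 2-sigma warning limits for the mean chart."""
    return {"UCL": x_bar + 3 * s_xbar, "UWL": x_bar + 2 * s_xbar,
            "LWL": x_bar - 2 * s_xbar, "LCL": x_bar - 3 * s_xbar}

def range_limits(r_bar, d2=1.128, d3=0.853):
    """Control and warning limits for the range chart (n = 2)."""
    return {"UCL": (1 + 3 * d3 / d2) * r_bar,   # D4 * Rbar = 3.267 * Rbar
            "UWL": (1 + 2 * d3 / d2) * r_bar,   # 2.512 * Rbar
            "LWL": 0.0, "LCL": 0.0}

print(xbar_limits(392.4, 6.04))
print(range_limits(4.8))
```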
Initial data
The data in Table 4.16 represent 25 daily, duplicate determinations of a
cholesterol control, run on a single-channel Autoanalyzer I, 40 per hour. It
may appear strange that all 50 values are even. This is due to a stipulation in
the protocol that the measured values be read to the nearest even number.
The data cover a period of two months, with the analyzer run at a frequency
of three days per week. The two daily determinations were randomly located
within patient samples. The control consisted of 0.5 ml of sample extracted
with 9.5 ml of 99 percent reagent-grade isopropyl alcohol.
X̄ = 9810/25 = 392.4
R̄ = 120/25 = 4.8
S_X̄ = 6.04 (computed directly from the 25 daily averages)

The trial control and warning limits are therefore:

UCL_X̄ = 392.4 + 3(6.04) = 410.5   UWL_X̄ = 392.4 + 2(6.04) = 404.5
LWL_X̄ = 392.4 − 2(6.04) = 380.3   LCL_X̄ = 392.4 − 3(6.04) = 374.3
UCL_R = 3.267(4.8) = 15.7          UWL_R = 2.512(4.8) = 12.1
LCL_R = 0                          LWL_R = 0
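A quick check of the baseline arithmetic, using the column sums quoted in the text (sum of the daily means 9,810 and sum of ranges 120 over 25 days):

```python
# Sketch: baseline summary statistics for the cholesterol example.

k = 25
x_bar = 9810 / k   # grand average of the daily means
r_bar = 120 / k    # average daily range
print(x_bar, r_bar)  # 392.4 4.8
```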
Analysis of data
Figures 4.4 and 4.5 show the control charts for the mean and range of the
daily runs, together with their appropriate control limits.
The means of the daily runs appear to be under control. Only one point,
day 9, is above the warning limit, and all points appear to be randomly lo-
cated around the central line.
The control chart of the range shows two points out of control, days 5
and 14, and one point, day 12, on the upper warning limit.
Let us assume, for the purpose of illustration, that a satisfactory reason
was found for those two points to be out of control in the range chart, and
that it was decided to recompute new limits for both the X and the R charts
based on only 23 days of data.
The new values are: X̄ = 392.7, R̄ = 3.57, S_X̄ = 6.17, and N = 23.
Fig. 4.4. Control chart for the mean (two determinations per day), based on 25
daily duplicate determinations of a cholesterol control.
The corresponding limits are:

UCL_X̄ = 411.2   UWL_X̄ = 405.0
LWL_X̄ = 380.4   LCL_X̄ = 374.2
UCL_R = 11.7     UWL_R = 9.0
LCL_R = 0        LWL_R = 0

These values establish the final limits, based on the baseline period.
Fig. 4.5. Control chart for the range, based on 25 daily duplicate determinations of a
cholesterol control.
Additional data
Nineteen additional daily duplicate determinations were obtained after the
baseline period (days 26 to 44); their column sums are ΣX̄ = 7,533 and
ΣR = 82. These additional points can be evaluated with confidence in terms
of the baseline central line and control limits.

Fig. 4.6. Control chart for the mean, based on 19 additional data points, plotted
against the corrected control limits (UCL = 411.2, UWL = 405.0, X̄ = 392.7,
LWL = 380.4, LCL = 374.2).

If, in the example under discussion, the additional set
(days 26 to 44) was found to be satisfactorily consistent with the baseline
data, then it would be proper to extend the baseline period by this set, i.e., a
total of 25 + 19 = 44 points. However, we have already observed a number
of shortcomings in the additional set, and the proper action is to search for
the causes of these disturbances, i.e., "to bring the process under control."
This is of course not a statistical problem.
For the purpose of our discussion, we will assume that an examination
of the testing process has revealed faulty procedure starting with day 37.
Therefore, we will consider a shortened additional set, of days 26 through
36. The following table gives a comparison of the baseline set (corrected to
23 points as discussed previously) and the shortened additional set (11
points).
                              Baseline set   Additional set
Number of points, N                23              11
Average, X̄                       392.7           389.5
Average range, R̄                  3.57            4.73
Standard deviation, S_X̄           6.17            8.15
By using the F test, it is easily verified that the difference between
the two standard deviations is well within the sampling variability that may
be expected from estimates derived from samples of 23 and 11 points, respec-
tively.

Fig. 4.7. Control chart for the range, based on 19 additional data points, plotted
against the corrected control limits.

It is therefore legitimate to pool both sets to construct a new baseline.
This results in the following parameters: N = 34, X̄ = 391.7, R̄ = 3.95, and
S_X̄ = 6.93.
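The F comparison of the two standard deviations can be sketched as follows; the function name is illustrative, and in practice the ratio would be judged against a tabulated critical value:

```python
# Sketch of the F comparison: ratio of the two variance estimates,
# larger over smaller, with its degrees of freedom.

def f_ratio(s1, n1, s2, n2):
    """Return (F, numerator df, denominator df), larger variance on top."""
    v1, v2 = s1 ** 2, s2 ** 2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1

# Baseline set vs. shortened additional set from the text:
F, dfn, dfd = f_ratio(6.17, 23, 8.15, 11)
print(round(F, 2), dfn, dfd)  # 1.74 10 22
```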
The new control limits are:

For X̄:   UCL 412.5   UWL 405.6
          LCL 370.9   LWL 377.8

For R:    UCL 12.9    UWL 9.9
          LCL 0       LWL 0
Using these new parameters, it can be noted that the points correspond-
ing to days 37 through 44 may indicate a potential source of trouble in the
measuring process.
When runs consist of single determinations, a control chart for individual
measurements can be kept, based on the moving range. The steps to follow are:
1) Use a moving range of two successive determinations;
2) Compute R̄, the average of the k − 1 moving ranges;
3) Determine the control limits for X as

X̄ ± 3R̄/d2 = X̄ ± 2.66 R̄

4) The upper control limit for R is D4R̄ = 3.267 R̄. The lower control limit is
equal to zero.
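The moving-range steps can be sketched as follows; the data and function name are illustrative (2.66 = 3/d2 and 3.267 = D4 for n = 2):

```python
# Sketch of the individuals chart built from moving ranges of
# successive determinations.

def individuals_chart(x):
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    r_bar = sum(moving_ranges) / len(moving_ranges)  # average moving range
    x_bar = sum(x) / len(x)
    return {"UCL_x": x_bar + 2.66 * r_bar, "LCL_x": x_bar - 2.66 * r_bar,
            "UCL_R": 3.267 * r_bar, "LCL_R": 0.0}

print(individuals_chart([10, 12, 11, 13, 12]))
```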
The P-chart is used to control characteristics that are considered attributes
and are not necessarily quantitative in nature. To use this chart, it is only
necessary to count the number of entities that have a well-defined property,
such as being defective, having a certain type of disease, or having a glucose
content greater than a given value, and to translate this number into a
proportion. The data used in this chart are easy to handle,
and the cost of collection is normally not very high. In some instances, the
P-chart can do the job of several average and range charts, since the classifi-
cation of a "defective" element may depend on several quantitative charac-
teristics, each of which would require an individual set of average and range
charts for analysis.
The sample size for each subgroup will depend on the value of the pro-
portion P being estimated. A small value of P will require a fairly large
sample size in order to have a reasonable probability of finding one or more
"defectives" in the sample (see Duncan). In general, a value of n between
25 and 30 is considered adequate for the calculation of a sample proportion.
The control limits are:

UCL_p = p̄ + 3√( p̄(1 − p̄)/n )   (4.80)

LCL_p = p̄ − 3√( p̄(1 − p̄)/n )   (4.81)
The layout of the data is:

Sample    Sample   Number of Elements
Number    Size     Having a Certain      Proportion
                   Characteristic
1         n        x_1                   p_1
2         n        x_2                   p_2
3         n        x_3                   p_3
...       ...      ...                   ...
k         n        x_k                   p_k
Total              Σx_i                  Σp_i

where p_i = x_i/n.

Average proportion:

p̄ = Σp_i / k   (4.82)

  = Σx_i / nk   (4.83)
When the sample size does not remain constant from subgroup to sub-
group, the recommended procedure is to compute control limits using the
average sample size. However, when a point falls near the control limits thus
calculated, then the actual limits for this point, using its own sample size,
should be estimated before a conclusion is reached about its state of control.
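Equations (4.80) and (4.81), together with the average-sample-size convention for varying n, can be sketched as follows; the counts, sizes, and function name are illustrative:

```python
# Sketch of the P-chart limits, using the overall proportion as centre
# line and the average sample size when n varies from subgroup to subgroup.

def p_chart_limits(counts, sizes):
    p_bar = sum(counts) / sum(sizes)          # average proportion
    n_avg = sum(sizes) / len(sizes)           # average sample size
    sigma = (p_bar * (1 - p_bar) / n_avg) ** 0.5
    return {"p_bar": p_bar,
            "UCL": p_bar + 3 * sigma,
            "LCL": max(0.0, p_bar - 3 * sigma)}  # clipped at zero

print(p_chart_limits([5, 3, 4], [100, 100, 100]))
```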
The C-chart is used when the probability of occurrence of an event is
small, but the unit is large enough to make the average number of occur-
rences (number of defects) a measurable number. If the probability P of
occurrence of the event is small but the sample size n is large, then the
distribution of the number of occurrences c of this event tends to follow a
Poisson distribution with parameter nP = c′. The mean and standard
deviation of c are c′ and √c′, respectively.
The random variable c represents the number of defects per unit, the
number of radioactive counts in a given period of time, the number of bacteria
in a specified volume of liquid, etc.
Control limits. The upper and lower control limits are given by:

UCL_c = c̄ + 3√c̄   (4.87)

LCL_c = c̄ − 3√c̄   (4.88)

and the warning limits by:

UWL_c = c̄ + 2√c̄   (4.89)

LWL_c = c̄ − 2√c̄   (4.90)
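A minimal sketch of these C-chart limits, clipping negative lower limits at zero; the counts and function name are illustrative:

```python
# Sketch of Eqs. (4.87)-(4.90): C-chart limits from the Poisson model.

def c_chart_limits(counts):
    c_bar = sum(counts) / len(counts)  # average number of occurrences
    s = c_bar ** 0.5                   # Poisson standard deviation
    return {"UCL": c_bar + 3 * s, "UWL": c_bar + 2 * s,
            "LWL": max(0.0, c_bar - 2 * s), "LCL": max(0.0, c_bar - 3 * s)}

print(c_chart_limits([4, 9, 2, 5, 5]))
```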
One criterion for judging randomness is the occurrence of runs of points
above and below the central line. Denoting an observation above the central
line by a and one below it by b, a sequence of points might read:

aaababbbaab

Here we have six runs, of lengths 3, 1, 1, 3, 2, 1, respectively.
Another criterion for the definition of a run would be the property of
increase or decrease of successive observations. Such runs are called "runs
up and down." For example, the sequence 2, 1.7, 2.2, 2.5, 2.8, 2.0, 1.8, 2.6,
2.5 has three runs down and two runs up. In order of occurrence, the
lengths of the runs are 1, 3, 2, 1, 1.
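Both run definitions can be checked with a short sketch; the helper name is illustrative:

```python
# Sketch: lengths of runs of a's and b's (points above/below the central
# line) and of "runs up and down" from the signs of successive differences.

def run_lengths(seq):
    """Lengths of maximal runs of equal adjacent elements, in order."""
    lengths, prev = [], object()
    for x in seq:
        if lengths and x == prev:
            lengths[-1] += 1
        else:
            lengths.append(1)
        prev = x
    return lengths

print(run_lengths("aaababbbaab"))  # [3, 1, 1, 3, 2, 1]

data = [2, 1.7, 2.2, 2.5, 2.8, 2.0, 1.8, 2.6, 2.5]
signs = ["+" if b > a else "-" for a, b in zip(data, data[1:])]
print(run_lengths(signs))  # [1, 3, 2, 1, 1]
```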
Returning to runs above and below the central value, it is possible
through use of the theory of probability, and assuming that the probability is
one-half that an observation will fall above the central line (and, con-
sequently, one-half that it will fall below the central line), to determine the
probability distribution of the lengths of runs. Tables are available for sever-
al of these distributions (see Duncan, Chap. 6). Some rules of thumb based
on the theory of runs are very useful in pointing out some lack of random-
ness.
[Figure: schematic control charts illustrating typical non-random patterns,
cyclic variations and a shift in the average, each shown against the UCL,
central line, and LCL.]