2006 40 4Q&Q
DOI 10.1007/s11135-005-1100-y
Springer 2006
Abstract. Measurement plays a significant role in the Six Sigma program. Usually, a gauge
repeatability and reproducibility (GR&R) study needs to be conducted prior to the process
capability analysis to verify the accuracy of the measuring equipment and to help organizations improve their product and service quality. Therefore, how to ensure the quality of
measurement becomes an important task for quality practitioners. In performing the GR&R
study, most industries use the acceptance criteria of the Precision to Tolerance (P/T) ratio
as stipulated by QS9000. However, the adequacy of applying the same acceptance criteria
to different manufacturing processes is very questionable. In this paper, a statistical method
using the relationship between GR&R and process capability indices is proposed for evaluating the adequacy of the acceptance criteria of the P/T ratio. Finally, a comparative analysis has
also been performed to evaluate the accuracy of GR&R among three methods (ANOVA,
Classical GR&R, and Long Form). Hopefully, the results of this research can provide a
useful reference for quality practitioners in various industries.
Key words: Six sigma program, gauge repeatability and reproducibility, precision to tolerance
(P/T) ratio, ANOVA, classical GR&R method, long form method
1. Introduction
Gauge variability plays a key role in quality improvement for industry.
Only with a gauge of acceptable repeatability and reproducibility can the adequacy
of a product's measurement process be determined. Recently, gauge variability studies have been highly regarded by quality practitioners, as
QS9000 and the Six Sigma program have become fashionable requirements and management tools for manufacturing industries. Many companies are now pursuing
excellence in product and service quality through the Six Sigma program.
The implementation steps of a Six Sigma project, D-M-A-I-C, are listed as
follows:
1. Define: Define the processes and their performance standards.
2. Measure: Select CTQ (Critical To Quality) characteristics and verify
the accuracy of the measuring equipment.
JEH-NAN PAN
3. Analyze: Perform process capability analysis, set up baselines and identify sources of variation.
4. Improve: Discover key variable relationships and establish operating tolerances.
5. Control: Implement and maintain the process control.
The second step, having a sound measurement system, is a vital
part of the Six Sigma program. Good product quality can only be
achieved through an adequate measurement system. Therefore, making
sure that the performance of a measurement system is adequate becomes an
urgent task for practitioners. Gauge Repeatability and Reproducibility
(GR&R) analysis is part of the requirements of QS9000, which was initiated by
Ford and later widely accepted by auto manufacturers.
Prior to the development of QS9000, the three major auto manufacturers in the USA had their own quality systems. In order to follow the
trend of ISO international standards and to provide an international quality standard for auto suppliers, QS9000 and one of its six handbooks,
Measurement System Analysis (MSA), were developed
accordingly.
The GR&R study is performed according to the QS9000 standards
stated in MSA (AIAG, 1997) to decide the suitability of a gauge. Since
these standards are set primarily for auto manufacturers, whether the
acceptance criteria of the P/T ratio in these standards can be used for
other hardware manufacturers, e.g., the electronic and chemical industries,
is very questionable. Generally speaking, there are three known methods for the GR&R study. Mandel (1972) first used the expected mean
squares to find the total variation of measurement through the concept of ANOVA. Montgomery and Runger (1993a) proposed the Classical
GR&R method, which uses the idea of mean and range, i.e., $\bar{R}/d_2$ and
$R_{\bar{x}}/d_2$ ($d_2$ is an adjustment factor), to estimate the total variation of measurement of GR&R. This second method gives an estimate of the standard
deviation for GR&R. The last method, called Long Form, is used to estimate the total variation of measurement of GR&R and the value of the precision to tolerance (P/T) ratio. This method is specially designed for
quality practitioners without a statistical background. After discussing the
suitability of the acceptance criteria of the P/T ratio for GR&R and establishing a reasonable acceptance range of the P/T ratio for different industries, the accuracy of the three current methods (ANOVA, Classical GR&R
and Long Form) of GR&R analysis is also analyzed and
compared.
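$$\sigma^2_{gauge} = \sigma^2_{repeatability} + \sigma^2_{reproducibility}, \tag{2.1}$$
where $\sigma^2_{gauge}$ is the variability of the measurement process, $\sigma^2_{repeatability}$ the
repeatability, and $\sigma^2_{reproducibility}$ the reproducibility. Total variation is the sum of
product variation and the variability of the measurement process:
$$\sigma^2_{Total} = \sigma^2_{part} + \sigma^2_{gauge}, \tag{2.2}$$
where $\sigma^2_{Total}$ is the total variation, $\sigma^2_{part}$ the product variation, and $\sigma^2_{gauge}$ the
variability of the measurement process or gauge.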
According to Tsai's (1989) ANOVA model, the study is a two-factor designed
experiment under the same measurement conditions, where one factor is
the inspector, the other factor is the product, and both are random effects.
The model is:
$$y_{ijl} = \mu + P_i + O_j + (PO)_{ij} + R_{ijl}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p; \; l = 1, 2, \ldots, k \tag{2.3}$$
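Source | Sum of squares | Degrees of freedom | Mean square
Parts | SSP | n_P = n - 1 | MSP
Inspector | SSO | n_O = p - 1 | MSO
Parts*Inspector | SSPO | n_PO = (n - 1)(p - 1) | MSPO
Error | SSR | n_R = np(k - 1) | MSR
Total | SST | npk - 1 |

$$\hat{\sigma}^2_P = \frac{MSP - MSPO}{pk}, \qquad \hat{\sigma}^2_O = \frac{MSO - MSPO}{nk}, \qquad \hat{\sigma}^2_{PO} = \frac{MSPO - MSR}{k} \tag{2.4}$$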
If interaction between product and inspector does not exist, then the interaction sum of
squares and degrees of freedom should be added to the error term, and the error mean square recomputed accordingly.
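The ANOVA table and the variance components in (2.4) can be sketched in code as follows. This is a minimal illustration rather than the paper's own implementation: the function name `gauge_anova`, the nested-list data layout, the truncation of negative variance estimates at zero, and the mapping of the components onto repeatability (error variance) and reproducibility (inspector plus interaction variance) are all assumptions made for the example.

```python
from statistics import mean


def gauge_anova(data):
    """Two-factor random-effects ANOVA for a crossed GR&R study.

    data[i][j] holds the k replicate readings of part i by inspector j
    (n parts, p inspectors, k replicates; all must be at least 2).
    """
    n, p, k = len(data), len(data[0]), len(data[0][0])
    grand = mean([x for part in data for cell in part for x in cell])
    part_means = [mean([x for cell in part for x in cell]) for part in data]
    insp_means = [mean([x for part in data for x in part[j]]) for j in range(p)]
    cell_means = [[mean(cell) for cell in part] for part in data]

    # Sums of squares for parts, inspectors, interaction and error
    ssp = p * k * sum((m - grand) ** 2 for m in part_means)
    sso = n * k * sum((m - grand) ** 2 for m in insp_means)
    sspo = k * sum(
        (cell_means[i][j] - part_means[i] - insp_means[j] + grand) ** 2
        for i in range(n)
        for j in range(p)
    )
    ssr = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(n)
        for j in range(p)
        for x in data[i][j]
    )

    # Mean squares, using the degrees of freedom from the ANOVA table
    msp = ssp / (n - 1)
    mso = sso / (p - 1)
    mspo = sspo / ((n - 1) * (p - 1))
    msr = ssr / (n * p * (k - 1))

    # Variance components as in (2.4); truncating negative estimates at
    # zero is an assumption of this sketch, not a rule from the paper.
    var_part = max((msp - mspo) / (p * k), 0.0)
    var_insp = max((mso - mspo) / (n * k), 0.0)
    var_inter = max((mspo - msr) / k, 0.0)

    # Assumed mapping onto (2.1): repeatability = error variance,
    # reproducibility = inspector + interaction variance.
    repeatability = msr
    reproducibility = var_insp + var_inter
    return {
        "SSP": ssp, "SSO": sso, "SSPO": sspo, "SSR": ssr,
        "var_part": var_part,
        "var_repeatability": repeatability,
        "var_reproducibility": reproducibility,
        "var_gauge": repeatability + reproducibility,
    }
```

Calling `gauge_anova` on a list of n parts, each holding p lists of k readings, returns the sums of squares together with the estimated part and gauge variances; the square root of `var_gauge` is the gauge standard deviation used in the P/T ratio.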
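Montgomery and Runger (1993a, b) proposed another method, the Classical
Gauge Repeatability and Reproducibility study, in which the idea of mean
and range, i.e., $\bar{R}/d_2$ and $R_{\bar{x}}/d_2$ ($d_2$ is the adjustment factor; Appendix A tabulates $1/d_2$),
is used to estimate the repeatability and reproducibility: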
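$$\bar{R} = \frac{\sum_{j=1}^{p} R_j}{p} \tag{2.5}$$
$$\hat{\sigma}_{repeatability} = \frac{\bar{R}}{d_2}, \tag{2.6}$$
$$\hat{\sigma}_{reproducibility} = \frac{R_{\bar{x}}}{d_2}, \tag{2.7}$$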
(2.8)
(2.9)
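The range-based estimates in (2.5) to (2.7) can be illustrated with a short sketch. Several details here are assumptions rather than statements from the paper: $R_j$ is taken to be the average replicate range of inspector j, $R_{\bar{x}}$ the range of the inspector averages, and the standard control-chart constants $d_2$ for subgroup sizes 2 to 5 are used in place of the values tabulated in Appendix A. The function name `classical_grr` and the data layout are hypothetical.

```python
from statistics import mean

# Standard control-chart constants d2 for subgroup sizes 2 to 5
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def classical_grr(data):
    """Range-based GR&R estimates.

    data[i][j] holds the k replicate readings of part i by inspector j.
    Returns (sigma_repeatability, sigma_reproducibility, sigma_gauge).
    """
    n, p, k = len(data), len(data[0]), len(data[0][0])

    # Average replicate range per inspector, then averaged over inspectors (2.5)
    r_j = [
        mean([max(data[i][j]) - min(data[i][j]) for i in range(n)])
        for j in range(p)
    ]
    r_bar = sum(r_j) / p

    # Range of the inspector averages
    x_j = [mean([x for i in range(n) for x in data[i][j]]) for j in range(p)]
    r_x = max(x_j) - min(x_j)

    sigma_repeat = r_bar / D2[k]  # (2.6), subgroup size k (assumed)
    sigma_reprod = r_x / D2[p]    # (2.7), subgroup size p (assumed)

    # Combine the two components as in (2.1)
    sigma_gauge = (sigma_repeat ** 2 + sigma_reprod ** 2) ** 0.5
    return sigma_repeat, sigma_reprod, sigma_gauge
```

Because only ranges enter the calculation, this estimate carries no separate term for the part-by-inspector interaction, which is one way the ANOVA and range-based results can differ.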
The condition for using Classical GR&R to estimate repeatability and reproducibility is that all $R_j$ fall within the control limits of the R chart, which ensures that the measurement system is stable enough to be assessed. AIAG (1995, 1997) and
DataMyte (1989) described a method called Long Form, a standard
form designed by the three major automobile manufacturers in the USA. The
form uses the sample range method to estimate repeatability and reproducibility and is primarily designed for quality practitioners without a statistical background. The GR&R, and whether the measuring equipment is suitable, can
be determined through step-by-step procedures. The data are gathered
by inspectors and entered into a standard format, so that the repeatability,
reproducibility, and product variation can easily be estimated. All three
methods use the Precision to Tolerance (P/T) ratio shown in (2.10) as the acceptance criterion for evaluating the precision and accuracy of the measuring equipment.
$$P/T = \frac{5.15\,\hat{\sigma}_{gauge}}{\text{Tolerance}} \times 100\%, \tag{2.10}$$
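As a small worked illustration of (2.10), the ratio can be computed as below. The function name `pt_ratio` and the gauge standard deviation of 0.005 are hypothetical, and taking the tolerance as USL minus LSL is an assumption of this sketch.

```python
def pt_ratio(sigma_gauge, usl, lsl):
    """Precision-to-tolerance ratio of (2.10), returned in percent."""
    tolerance = usl - lsl  # assumed definition of the tolerance width
    if tolerance <= 0:
        raise ValueError("USL must exceed LSL")
    return 5.15 * sigma_gauge / tolerance * 100.0


# Hypothetical gauge standard deviation with specification limits 0.1 and 0.6
print(round(pt_ratio(0.005, usl=0.6, lsl=0.1), 2))
```

With these inputs the ratio is about 5.15%, i.e., the 5.15-sigma spread of the gauge takes up roughly one twentieth of the tolerance width.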
$$C_P = \frac{USL - LSL}{6\sigma}, \tag{2.11}$$
where USL is the upper specification limit, LSL the lower specification
limit, and $\sigma$ the standard deviation of the production process.
Kane (1986) discussed the statistical properties of the capability index $C_P$.
He pointed out that $C_P$ has a reference parameter, i.e., $\sigma_{Total}$, and that the
estimated value of this reference parameter is:
$$\hat{\sigma}_{Total} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}. \tag{2.12}$$
He proved that if the statistic in Equation (2.13) shown below follows a chi-square distribution, i.e.,
$$\frac{(n-1)\,\hat{\sigma}^2_{Total}}{\sigma^2_{Total}} \sim \chi^2_{n-1}. \tag{2.13}$$
(2.14)
(2.15)
Taking the square of the capability indices shown in Equation (2.15) and multiplying by the degrees of freedom $n-1$ gives
$$\frac{(n-1)\,C_P^2}{\hat{C}_P^2}, \tag{2.16}$$
One can prove that Equation (2.16) follows a chi-square distribution with
$n-1$ degrees of freedom, i.e.,
$$\frac{(n-1)\,C_P^2}{\hat{C}_P^2} \sim \chi^2_{n-1}. \tag{2.17}$$
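The estimate in (2.12) and the capability index in (2.11) can be sketched together as below. The function name `estimate_cp` and the sample readings are hypothetical; the sketch simply plugs the sample standard deviation (with the n - 1 denominator) into the definition of $C_P$.

```python
from statistics import stdev


def estimate_cp(sample, usl, lsl):
    """Estimate sigma_Total as in (2.12) and C_P as in (2.11).

    Returns (cp_hat, sigma_total_hat).
    """
    sigma_total = stdev(sample)  # sample standard deviation, n - 1 denominator
    cp_hat = (usl - lsl) / (6.0 * sigma_total)
    return cp_hat, sigma_total


# Hypothetical readings with specification limits 0.1 and 0.6
cp_hat, sigma_hat = estimate_cp([0.2, 0.3, 0.4], usl=0.6, lsl=0.1)
```

Since the estimated index is inversely proportional to the estimated standard deviation, the ratio of true to estimated squared indices equals the ratio of estimated to true variances, which is why the statistic in (2.16) inherits the chi-square behaviour of the variance estimate.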
$$\text{Observed } C_P = \frac{\text{Tolerance}}{6\,\sigma_{Total}}, \tag{3.1}$$
taken into account. The actual or true capability index, in contrast, relates only to
the product variation and can be written as:
$$\text{Actual } C_P = \frac{\text{Tolerance}}{6\,\sigma_{part}}, \tag{3.2}$$
where Actual $C_P$ is the actual or true capability index and $\sigma_{part}$ the product
variation.
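Listed below is the derivation of the relationship between the true capability
index, the observed capability index, and GR&R.
$$\begin{aligned}
\text{Actual } C_P &= \frac{\text{Tolerance}}{6\,\sigma_{part}} = \frac{\text{Tolerance}}{6\sqrt{\sigma^2_{Total} - \sigma^2_{gauge}}} \\
&= \frac{1}{6\sqrt{\left(\dfrac{\sigma_{Total}}{\text{Tolerance}}\right)^2 - \left(\dfrac{\sigma_{gauge}}{\text{Tolerance}}\right)^2}} \\
&= \frac{1}{6\sqrt{\left(\dfrac{1}{6\,\text{Observed } C_P}\right)^2 - \left(\dfrac{\sigma_{gauge}}{\text{Tolerance}}\right)^2}} \qquad \text{by (3.1)} \\
&= \frac{1}{6\sqrt{\left(\dfrac{1}{6\,\text{Observed } C_P}\right)^2 - \left(\dfrac{5.15\,\sigma_{gauge}}{5.15\,\text{Tolerance}}\right)^2}}
\end{aligned}$$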
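Let $5.15\,\sigma_{gauge}/\text{Tolerance}$ in the above equation be the P/T ratio; then the
relationship between the actual capability index, the observed capability index, and
the P/T ratio of GR&R can be described as:
$$\text{Actual } C_P = \frac{1}{6\sqrt{\left(\dfrac{1}{6\,\text{Observed } C_P}\right)^2 - \left(\dfrac{P/T}{5.15}\right)^2}} \tag{3.3}$$
or, equivalently,
$$\text{Actual } C_P = \frac{1}{\sqrt{\left(\dfrac{1}{\text{Observed } C_P}\right)^2 - \left(\dfrac{6\,P/T}{5.15}\right)^2}} \tag{3.4}$$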
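To make the relationship in (3.3) and (3.4) concrete, the sketch below evaluates both forms and can be used to confirm numerically that they agree. The function names and the example inputs (an observed index of 1.3 and a P/T ratio of 10%) are hypothetical; the P/T ratio is passed as a fraction, not a percentage.

```python
import math


def actual_cp(observed_cp, pt):
    """Actual C_P from the observed C_P and the P/T ratio, following (3.4).

    pt is a fraction (e.g. 0.10 for 10%).
    """
    radicand = (1.0 / observed_cp) ** 2 - (6.0 * pt / 5.15) ** 2
    if radicand <= 0:
        raise ValueError("P/T ratio too large for this observed C_P")
    return 1.0 / math.sqrt(radicand)


def actual_cp_form_33(observed_cp, pt):
    """The same quantity written as in (3.3)."""
    radicand = (1.0 / (6.0 * observed_cp)) ** 2 - (pt / 5.15) ** 2
    return 1.0 / (6.0 * math.sqrt(radicand))


# Hypothetical inputs: observed C_P of 1.3 and a P/T ratio of 10%
print(actual_cp(1.3, 0.10), actual_cp_form_33(1.3, 0.10))
```

For these inputs the actual index comes out near 1.32, slightly above the observed 1.3, and with a P/T ratio of zero the two indices coincide. The radicand check reflects that the formula is only defined while the gauge variation is smaller than the total variation.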
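$$ACP = \frac{1}{\sqrt{\left(\dfrac{1}{OCP}\right)^2 - \left(\dfrac{6\,P/T}{5.15}\right)^2}} \tag{3.5}$$
$$H_0: C_P = ACP = \frac{1}{\sqrt{\left(\dfrac{1}{OCP}\right)^2 - \left(\dfrac{6\,P/T}{5.15}\right)^2}}, \qquad H_a: C_P = OCP. \tag{3.6}$$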
We first define the type I error of $C_P$ caused by a measurement process
as below:
$\alpha$: the probability of a type I error, i.e., that $C_P$ is the true capability index but is
mistakenly regarded as the observed capability index. If the type I error $\alpha$ is
fixed, say at 0.01, then we can define the probability of accuracy as $1-\beta$,
where the type II error $\beta$ indicates that $C_P$ is the observed capability index
but is regarded as the true capability index. Given a fixed $\alpha$, our aim is to
have the type II error as small as possible, and the ideal situation is that the type
II error equals zero, i.e., OCP equals ACP. By the definitions of $\alpha$
and $1-\beta$, an increase in $1-\beta$ means that OCP is closer to ACP, i.e., with a
fixed $\alpha$, the probability $1-\beta$ is expected to be as high as possible, since the
observed capability index will then be closer to the actual or true capability
index.
According to the Neyman-Pearson Lemma (Hogg and Craig, 1995), the critical region can be expressed as below:
$$\phi = \begin{cases} 1, & \hat{C}_P < K \\ 0, & \text{otherwise} \end{cases} \tag{3.7}$$
where K is a constant larger than zero. Then the definition of $\alpha$ can be
expressed as:
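$$\begin{aligned}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ is true}) \\
&= P\left(\hat{C}_P < K \,\middle|\, C_P = \frac{1}{\sqrt{\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2}}\right) \\
&= P\left(\frac{1}{\hat{C}_P^2} > \frac{1}{K^2} \,\middle|\, C_P = \frac{1}{\sqrt{\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2}}\right) \\
&= P\left(\frac{(n-1)\,C_P^2}{\hat{C}_P^2} > \frac{(n-1)\,C_P^2}{K^2} \,\middle|\, C_P = \frac{1}{\sqrt{\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2}}\right) \\
&= P\left(\chi^2_{(n-1)} > \frac{(n-1)}{K^2\left[\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2\right]}\right).
\end{aligned} \tag{3.8}$$
Therefore,
$$\chi^2_{\alpha,(n-1)} = \frac{(n-1)}{K^2\left[\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2\right]}. \tag{3.9}$$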
Since K is a constant larger than zero, the critical value K can be written as
$$K = \sqrt{\frac{(n-1)}{\chi^2_{\alpha,(n-1)}\left[\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2\right]}}. \tag{3.10}$$
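Substituting Equation (3.10) into the following definition of $1-\beta$:
$$1-\beta = 1 - P(\text{accept } H_0 \mid H_a \text{ is true}) = 1 - P\left(\hat{C}_P > K \mid C_P = OCP\right),$$
then
$$\begin{aligned}
1-\beta &= 1 - P\left(\hat{C}_P > \sqrt{\frac{(n-1)}{\chi^2_{\alpha,(n-1)}\left[\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2\right]}} \,\middle|\, C_P = OCP\right) \\
&= 1 - P\left(\frac{1}{\hat{C}_P^2} < \frac{\chi^2_{\alpha,(n-1)}\left[\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2\right]}{(n-1)} \,\middle|\, C_P = OCP\right) \\
&= 1 - P\left(\frac{(n-1)\,C_P^2}{\hat{C}_P^2} < \chi^2_{\alpha,(n-1)}\, C_P^2 \left[\left(\frac{1}{OCP}\right)^2 - \left(\frac{6\,P/T}{5.15}\right)^2\right] \,\middle|\, C_P = OCP\right) \\
&= 1 - P\left(\chi^2_{(n-1)} < \chi^2_{\alpha,(n-1)}\left[1 - OCP^2\left(\frac{6\,P/T}{5.15}\right)^2\right]\right)
\end{aligned} \tag{3.11}$$
$$\chi^2_{1-\beta,(n-1)} = \chi^2_{\alpha,(n-1)}\left[1 - OCP^2\left(\frac{6\,P/T}{5.15}\right)^2\right]. \tag{3.12}$$
$$P/T = \frac{5.15}{6}\sqrt{\frac{1}{OCP^2}\left(1 - \frac{\chi^2_{1-\beta,(n-1)}}{\chi^2_{\alpha,(n-1)}}\right)}. \tag{3.13}$$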
Given a fixed $\alpha$, a predetermined $1-\beta$, and OCP, the point estimate of the
P/T ratio can be obtained from Equation (3.13), and the reasonable acceptance range of the P/T ratio can then be set as the point estimate of the P/T
value ± 5%. If the calculated P/T value of a GR&R study is less than the
point estimate of the P/T ratio given by Equation (3.13) minus 5%, the
measurement system is considered to be very adequate. If the calculated
P/T value of GR&R falls within the ± 5% range, then the measurement
system is still considered to be acceptable. Otherwise, the
measurement system is considered not acceptable.
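The three-way decision rule described above can be written as a short function. This is a sketch under stated assumptions: the function name `assess_measurement_system` is hypothetical, both arguments are in percent, and treating values exactly on a boundary of the ± 5% band as acceptable is a choice made here, since the text does not say how boundary values are handled.

```python
def assess_measurement_system(pt_calculated, pt_point, margin=5.0):
    """Classify a calculated P/T ratio against a point estimate.

    pt_calculated and pt_point are percentages; margin is the half-width
    of the acceptance band in percentage points.
    """
    if pt_calculated < pt_point - margin:
        return "very adequate"
    if pt_calculated <= pt_point + margin:  # boundary handling is assumed
        return "acceptable"
    return "not acceptable"


# A calculated ratio of 6.68% against a point estimate of 12%
print(assess_measurement_system(6.68, 12.0))
```

With a point estimate of 12%, any calculated ratio below 7% is classed as very adequate, which is the situation in the numerical example of Section 5.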
Equation (3.13) gives the reasonable acceptance criteria of GR&R
and also indicates that the testing results are influenced by the sample
size n. Therefore, the relationship between the P/T ratio and its respective sample
size for different production processes/capability indices is discussed for
$\alpha$ equal to 0.01 and $1-\beta$ equal to 0.98, 0.97, 0.96, 0.95.
When $\alpha$ equals 0.01 and $1-\beta$ equals 0.98, 0.97, 0.96, 0.95, all
four relationships in Figure 1 show that the curves become flatter as the
sample size increases, and all the curves become stable when the sample size
exceeds 100. Thus, we focus on discussing and selecting the critical baseline/point estimate of the P/T ratio for sample sizes less than
100. Based on the criteria derived in Equation (3.13), the relationship between
the P/T ratio, $1-\beta$ (with $\alpha$ = 0.01) and various process capability indices ($C_P$ =
1, 1.5, 2) is illustrated in Figure 2.
Figure 2 indicates that with $\alpha$ = 0.01, a larger P/T ratio leads to a
smaller $1-\beta$, and if the P/T ratio is less than 10%, then $1-\beta$ gets larger. Therefore, no matter what the process capability index is, a P/T ratio less than
10% is always desirable. When the P/T ratio exceeds 10%, the differences in
$1-\beta$ become very obvious for different $C_P$ indices. According to QS9000,
the measurement system is considered satisfactory if the P/T ratio is less
than 10%, and the measurement system is not acceptable if the P/T ratio
is greater than 30%. However, the verdict becomes very ambiguous if the P/T ratio
falls within the 10-30% range, in which a process requiring a higher capability index tends to have a smaller $1-\beta$. For example, if $C_P$ = 1.5 or 2, the measurement system is not acceptable since the P/T ratio leads to a smaller $1-\beta$.
But the measurement system may be considered acceptable if a lower process
capability index, for example $C_P$ = 1, is required, since it leads to a much
higher $1-\beta$. Therefore, it would be more reasonable if the acceptance criteria
of the P/T ratio were varied with different capability indices.
Critical baselines/point estimates of the P/T ratio can be obtained by
plugging various process capability indices $C_P$ ($C_P$ = 1.0 to 2.0) and $1-\beta$ ($1-\beta$ =
0.98 to 0.95) into Equation (3.13). Table II provides a useful reference for
quality practitioners when performing GR&R analysis.
As illustrated in the numerical example (see Section 5 for details), one
may select 10 parts, 4 inspectors and 3 replications (n = 10, p = 4 and k = 3)
for performing the GR&R study and get the width of the confidence interval
(Pan, 2004). If the target value of the process capability index equals 1.5 and
$1-\beta$ = 0.97, then the point estimate of the P/T ratio is 14%. Hence, the acceptable
Figure 1. The relationship of the P/T ratio and its respective sample size for different
process capability indices when $\alpha$ equals 0.01 and $1-\beta$ equals 0.98, 0.97, 0.96, 0.95.
Table II. The point estimates of P/T ratio under various combinations of $C_P$ and $1-\beta$
Target value of Cp | 0.98 | 0.97 | 0.96 | 0.95
1.0 | 16% | 20% | 23% | 25%
1.1 | 14% | 18% | 21% | 23%
1.2 | 13% | 17% | 19% | 21%
1.3 | 12% | 16% | 18% | 19%
1.4 | 11% | 15% | 16% | 18%
1.5 | 11% | 14% | 15% | 17%
1.6 | 10% | 13% | 14% | 16%
1.7 | 9% | 12% | 14% | 15%
1.8 | 9% | 11% | 13% | 14%
1.9 | 8% | 11% | 12% | 13%
2.0 | 8% | 10% | 12% | 12%
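For practitioners who prefer a lookup to recomputing Equation (3.13), the entries of Table II can be stored directly. The sketch below only transcribes the table; the names `TABLE_II`, `LEVELS` and `pt_point_estimate` are hypothetical, and no interpolation between tabulated values is attempted.

```python
# Point estimates of the P/T ratio (percent) from Table II.
# Keys are target C_P values; tuple entries follow the four
# probability levels in LEVELS, in the same order as the table columns.
LEVELS = (0.98, 0.97, 0.96, 0.95)
TABLE_II = {
    1.0: (16, 20, 23, 25),
    1.1: (14, 18, 21, 23),
    1.2: (13, 17, 19, 21),
    1.3: (12, 16, 18, 19),
    1.4: (11, 15, 16, 18),
    1.5: (11, 14, 15, 17),
    1.6: (10, 13, 14, 16),
    1.7: (9, 12, 14, 15),
    1.8: (9, 11, 13, 14),
    1.9: (8, 11, 12, 13),
    2.0: (8, 10, 12, 12),
}


def pt_point_estimate(target_cp, level):
    """Return the tabulated point estimate of the P/T ratio in percent."""
    row = TABLE_II[round(target_cp, 1)]
    return row[LEVELS.index(round(level, 2))]


print(pt_point_estimate(1.5, 0.97))
```

A target index of 1.5 at the 0.97 level returns 14, matching the value quoted in the text; values outside the tabulated grid raise an error rather than being extrapolated.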
Figure 2. The relationship of the P/T ratio and $1-\beta$ with different $C_P$ indices.
Figure 3. Comparison of the biases for three GR&R methods with various total measurement numbers ($\mu$ = 5, $\sigma^2$ = 0.5).
Step 7: Calculate the biases between the mean of samples and population
variations.
Suppose one selects a data set from the normal distribution with $\mu$ = 5
and $\sigma^2$ = 0.5 to compare the biases of estimation of the three different methods. The simulation results for different total measurement
numbers are shown in Figure 3.
Figure 3 shows that the bias of the ANOVA method is the smallest and that the biases
of the Classical GR&R and Long Form methods are quite close; when the total
measurement number increases, the biases of the three methods become closer to
each other. Therefore, using the ANOVA method to estimate the gauge
variability is better than using the other two methods, since the variation
of the interaction between products and inspectors can be estimated by the
ANOVA method. The result of the Classical GR&R method is close to that of
the Long Form method, i.e., the variations of measurements estimated by
these two methods have no significant differences, because both
methods use sample ranges to estimate the gauge repeatability and
reproducibility.
5. Numerical Example
One TFT-LCD manufacturer located in Taiwan produces high-resolution
microscopes. Among the manufacturing processes, the sealing process, in which
sealing gum is applied to two glasses, is a critical one. If too much gum is
applied, it will cause residual/splatter problems. On the other hand, the two
glasses cannot be properly sealed if too little gum is applied. Therefore, it is necessary to perform statistical process control (SPC), and the variability of measurement has to be analyzed prior to the SPC study; otherwise the SPC
result will be greatly influenced.
The following procedures are suggested to perform the GR&R study.
Step 1: Decide the product's quality characteristic, the gauge for measurement,
and the specifications of the quality characteristic. In this case, the product's
quality characteristic is the absorbing amount of gum for the microscope
and the specification of the quality characteristic is 0.1-0.6 mm.
Step 2: Decide the measurement parameters (sample size, number of inspectors, replicate measurements). The following allocation of parameters
n, p, and k is recommended for performing the GR&R study for the TFT-LCD
manufacturer (Pan, 2004).
Sample size (n): 10.
Number of inspectors (p): 4.
Replicate measurements (k): 3.
Step 3: Perform the actual measurement and collect data.
After deciding the measurement parameters, practitioners randomly select
10 samples and assign 4 inspectors to measure each sample three times.
The measurement data are shown in Table III.
Step 4: Estimate the GR&R of the measuring process.
The ANOVA, Classical GR&R, and Long Form methods can be used
to estimate the P/T ratio. The results of the GR&R analysis are summarized
in Table IV.
Table IV shows that the P/T ratio (6.68%) estimated by the ANOVA method
is the largest among the three methods, since it includes the variation of the interaction between product and inspector. Moreover, it is more
accurate than the others, since the sample variance obtained by the ANOVA
method is more efficient in estimating the population variation than the
sample range.
Step 5: Evaluate the adequacy of the measurement system.
The capability index of this product is set to be 1.3. Suppose that the
TFT-LCD manufacturer decides that $\alpha$ = 0.01 and $1-\beta$ = 0.98 are suitable; then by
looking up Table II or substituting the parameters $\alpha$, $1-\beta$, and $C_P$ into Equation (3.13), one will find that the point estimate of the acceptance criteria of the P/T ratio is 12% and the acceptable range is between 7 and 18%.
Table III. Measurement data
Sample | Inspector 1 | Inspector 2 | Inspector 3 | Inspector 4
1 | 0.286 0.275 0.266 | 0.254 0.253 0.247 | 0.267 0.270 0.265 | 0.267 0.259 0.267
2 | 0.275 0.276 0.274 | 0.267 0.270 0.281 | 0.278 0.274 0.270 | 0.270 0.263 0.274
3 | 0.267 0.253 0.262 | 0.263 0.278 0.274 | 0.270 0.270 0.264 | 0.270 0.270 0.267
4 | 0.278 0.259 0.260 | 0.262 0.267 0.279 | 0.266 0.267 0.267 | 0.268 0.258 0.270
5 | 0.245 0.247 0.244 | 0.244 0.247 0.238 | 0.252 0.247 0.253 | 0.252 0.248 0.248
6 | 0.267 0.274 0.263 | 0.263 0.257 0.273 | 0.267 0.267 0.263 | 0.267 0.274 0.267
7 | 0.278 0.278 0.281 | 0.279 0.290 0.291 | 0.292 0.308 0.290 | 0.300 0.296 0.296
8 | 0.288 0.279 0.279 | 0.274 0.282 0.281 | 0.278 0.281 0.278 | 0.274 0.281 0.278
9 | 0.248 0.250 0.256 | 0.252 0.255 0.259 | 0.252 0.250 0.252 | 0.256 0.256 0.252
10 | 0.256 0.264 0.263 | 0.256 0.263 0.257 | 0.254 0.264 0.259 | 0.259 0.259 0.252
Table IV. Results of the GR&R analysis by the three methods
Method | GR&R/Tolerance | Repeatability/Tolerance | Reproducibility/Tolerance
ANOVA | 6.68% | 5.36% | 3.98%
Classical GR&R | 5.66% | 5.49% | 1.38%
Long Form | 5.55% | 5.51% | 0.67%
Since all the P/T ratios estimated by the three methods (see Table IV) are less
than 7%, one can conclude that the precision and accuracy of this measurement
system are very adequate.
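As a consistency check on Table IV, the decomposition in (2.1) implies that the GR&R ratio should equal the root sum of squares of the repeatability and reproducibility ratios, because every ratio shares the same 5.15/Tolerance scaling. The sketch below applies this to the tabulated values; the names `TABLE_IV` and `combined_ratio` are hypothetical.

```python
import math

# (GR&R, repeatability, reproducibility) as percentages of tolerance, from Table IV
TABLE_IV = {
    "ANOVA": (6.68, 5.36, 3.98),
    "Classical GR&R": (5.66, 5.49, 1.38),
    "Long Form": (5.55, 5.51, 0.67),
}


def combined_ratio(repeatability, reproducibility):
    """Root sum of squares of the two component ratios, following (2.1)."""
    return math.hypot(repeatability, reproducibility)


for method, (grr, rep, repro) in TABLE_IV.items():
    print(method, round(combined_ratio(rep, repro), 2), grr)
```

For all three methods the recombined value agrees with the reported GR&R ratio to two decimal places, so the differences between the methods lie almost entirely in the reproducibility term.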
6. Conclusion
An adequate measurement system plays a very important role in quality
improvement in industry. This paper discusses the suitability of the existing
acceptance criteria of the P/T ratio for analyzing the gauge repeatability and
reproducibility to determine whether the precision and accuracy of measurement systems are acceptable. A reasonable acceptance range of the P/T
ratio has also been derived. Finally, the accuracy of three methods for
estimating the GR&R is compared. Several conclusions are summarized as
follows:
1. In performing the gauge repeatability and reproducibility (GR&R)
study, most industries use the acceptance criteria set by QS9000
to evaluate the adequacy of their measurement systems. It has been
illustrated that the acceptance criteria of measurement systems are very
ambiguous when the P/T ratio falls between 10 and 30%. Thus, it is suggested that the acceptance criteria of the P/T ratio be varied with the different
process capability indices stipulated by various industries.
2. This paper discusses the acceptance criteria of the P/T ratio with
$\alpha$ = 0.01. Table II summarizes the critical baseline/point estimate
of the P/T ratio under various combinations of process capability indices,
$\alpha$ and $1-\beta$. It provides a very useful reference for practitioners when
performing a GR&R study.
3. Among the three methods to estimate the GR&R, the ANOVA method
is the most accurate one, since it includes the variation of the interaction
between product and inspector, and it can be carried out with existing
statistical software packages such as Minitab and Statistica. Thus, it
is suggested that quality practitioners use the ANOVA method to perform
the GR&R study.
Appendix A
Values of 1/d2 (rows: m; columns: successive values of k, in source order)
m | 1/d2
2 | 0.709 | 0.781 | 0.813 | 0.826 | 0.840 | 0.855 | 0.862 | 0.885
3 | 0.524 | 0.552 | 0.565 | 0.571 | 0.575 | 0.581 | 0.581 | 0.592
4 | 0.446 | 0.465 | 0.472 | 0.474 | 0.476 | 0.481 | 0.481 | 0.485
5 | 0.403 | 0.417 | 0.420 | 0.422 | 0.424 | 0.426 | 0.427 | 0.429
6 | 0.375 | 0.385 | 0.388 | 0.389 | 0.391 | 0.392 | 0.392 | 0.395
7 | 0.353 | 0.361 | 0.364 | 0.365 | 0.366 | 0.368 | 0.368 | 0.370
8 | 0.338 | 0.344 | 0.346 | 0.347 | 0.348 | 0.348 | 0.350 | 0.351
9 | 0.325 | 0.331 | 0.332 | 0.333 | 0.334 | 0.334 | 0.336 | 0.337
10 | 0.314 | 0.319 | 0.322 | 0.323 | 0.323 | 0.324 | 0.324 | 0.325
References
Automotive Industry Action Group (AIAG). (1995). Production Part Approval Process,
AIAG Reference Manual, Southfield, MI.
Automotive Industry Action Group (AIAG). (1997). Measurement Systems Analysis, AIAG
Reference Manual, Southfield, MI.
Burdick, R. K. & Larsen, G. A. (1997). Confidence intervals on measures of variability in
R&R studies. Journal of Quality Technology 29(3): 261-273.
DataMyte Editing Group. (1989). DataMyte Handbook, 4th edn., DataMyte Corporation,
Chapter 6, pp. 17-25.
Hogg, R. V. & Craig, A. T. (1995). Introduction to Mathematical Statistics, New Jersey:
Prentice Hall, Inc.
Juran, J. M. (1974). Quality Control Handbook, 3rd edn., New York: McGraw-Hill Inc.
Kane, V. E. (1986). Process capability indices. Journal of Quality Technology 18: 41-52.
Majeske, K. D. & Andrews, R. W. (2002). Evaluating measurement systems and manufacturing processes using three quality measures. Quality Engineering 3: 243-251.
Mandel, J. (1972). Repeatability and reproducibility. Journal of Quality Technology 4(2):
74-85.
Montgomery, D. C. & Runger, G. C. (1993a). Gauge capability analysis and designed experiments. Part I: basic methods. Quality Engineering 6(1): 115-135.
Montgomery, D. C. & Runger, G. C. (1993b). Gauge capability analysis and designed experiments. Part II: experimental design models and variance component estimation. Quality
Engineering 6(2): 289-305.
Pan, J. N. (2004). Determination of optimal parameters for gauge repeatability and reproducibility study. International Journal of Quality and Reliability Management 21(6): 672-682.
Tsai, P. (1989). Variable gauge repeatability and reproducibility study using the analysis of
variance method. Quality Engineering 1(1): 107-115.