Walpole Ch-10 KZ

The document outlines key concepts in hypothesis testing, including: 1) Hypothesis testing involves taking a sample to provide evidence for or against a hypothesis, since examining the entire population is impractical. 2) Tests involve a null hypothesis (H0) and alternative hypothesis (H1). The goal is to either reject or fail to reject the null hypothesis. 3) Type I and Type II errors can occur. Sample size can be increased to reduce both error probabilities.

8/24/2023

Lecture 2: One- and Two-Sample Tests of Hypotheses

Bangladesh University of Eng. & Tech. Slide 1 of 108 Industrial &Production Engineering

Outline
• Statistical Hypotheses: General Concepts
• Testing a Statistical Hypothesis
• The Use of P-Values for Decision Making in Testing Hypotheses
• Single Sample: Tests Concerning a Single Mean
• Two Samples: Tests on Two Means
• Choice of Sample Size for Testing Means
• One Sample: Test on a Single Proportion
• Two Samples: Tests on Two Proportions
• One- and Two-Sample Tests Concerning Variances
• Goodness-of-Fit Test
• Test for Independence (Categorical Data)
• Test for Homogeneity

Statistical Hypotheses: General Concepts

• The truth or falsity of a statistical hypothesis is never known with
absolute certainty unless we examine the entire population.
• This, of course, would be impractical in most situations.
• Instead, we take a random sample from the population of interest
and use the data contained in this sample to provide evidence that
either supports or does not support the hypothesis.
• Evidence from the sample that is inconsistent with the stated
hypothesis leads to a rejection of the hypothesis.


The Role of Probability in Hypothesis Testing
• Rejection of a hypothesis implies that the sample
evidence refutes it.
• Put another way, rejection means that there is a small
probability of obtaining the sample information observed
when, in fact, the hypothesis is true.
• When the data analyst formalizes experimental evidence
on the basis of hypothesis testing, the formal statement
of the hypothesis is very important.


The Null and Alternative Hypotheses
• The structure of hypothesis testing will be formulated with the use of
the term null hypothesis, which refers to any hypothesis we wish to
test and is denoted by H0.
• The rejection of H0 leads to the acceptance of an alternative
hypothesis, denoted by H1.
• An understanding of the different roles played by the null hypothesis
(H0) and the alternative hypothesis (H1) is crucial to one’s
understanding of the rudiments of hypothesis testing.
• The alternative hypothesis H1 usually represents the question to be
answered or the theory to be tested, and thus its specification is
crucial.
• The null hypothesis H0 nullifies or opposes H1 and is often the
logical complement to H1.


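The Null and Alternative Hypotheses
• The analyst arrives at one of the two following conclusions:
– reject H0 in favor of H1 because of sufficient evidence in the data, or
– fail to reject H0 because of insufficient evidence in the data.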


Testing a Statistical Hypothesis


• A certain type of cold vaccine is known to be only 25% effective after
a period of 2 years. To determine if a new and somewhat more
expensive vaccine is superior in providing protection against the
same virus for a longer period of time, suppose that 20 people are
chosen at random and inoculated. If more than 8 of those receiving
the new vaccine surpass the 2-year period without contracting the
virus, the new vaccine will be considered superior to the one
presently in use. The requirement that the number exceed 8 is
somewhat arbitrary but appears reasonable in that it represents a
modest gain over the 5 people who could be expected to receive
protection if the 20 people had been inoculated with the vaccine
already in use. We are essentially testing the null hypothesis that the
new vaccine is equally effective after a period of 2 years as the one
now commonly used. The alternative hypothesis is that the new
vaccine is in fact superior.


Testing a Statistical Hypothesis


• This is equivalent to testing the hypothesis that the
binomial parameter for the probability of a success on a
given trial is p = 1/4 against the alternative that p > 1/4.
This is usually written as follows:
H0: p = 1/4,
H1: p > 1/4.


The Test Statistic


• The test statistic on which we base our decision is X, the
number of individuals in our test group who receive protection
from the new vaccine for a period of at least 2 years.
• The possible values of X, from 0 to 20, are divided into two
groups: those numbers less than or equal to 8 and those
greater than 8.
• All possible scores greater than 8 constitute the critical region.
• The last number that we observe in passing into the critical
region is called the critical value.


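Type I and Type II Errors
• Rejection of the null hypothesis when it is true is called a Type I error.
• Nonrejection of the null hypothesis when it is false is called a Type II error.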


The Probability of a Type I Error


• The probability of committing a Type I error, also called the level of
significance, is denoted by the Greek letter α.
• In our illustration, a Type I error will occur when more than 8
individuals inoculated with the new vaccine surpass the 2-year
period without contracting the virus and researchers conclude that
the new vaccine is better when it is actually equivalent to the one in
use.
• Hence, if X is the number of individuals who remain free of the virus
for at least 2 years,
α = P(Type I error) = P(X > 8 when p = 1/4) = 1 − P(X ≤ 8 when p = 1/4) = 1 − 0.9591 = 0.0409.
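The Type I error probability for the vaccine test (n = 20, reject H0 when X > 8, p = 1/4 under H0) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binom_pmf` and `upper_tail` are illustrative and not part of the lecture.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X > c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(c + 1, n + 1))


# Vaccine test: critical region X > 8, n = 20, p = 1/4 when H0 is true.
alpha = upper_tail(8, 20, 0.25)
print(f"alpha = P(X > 8 | p = 1/4) = {alpha:.4f}")
```

The printed value should agree with the tabulated 1 − 0.9591 up to rounding.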


The Probability of a Type II Error


• The probability of committing a Type II error, denoted by β, is
impossible to compute unless we have a specific alternative
hypothesis.
• If we test the null hypothesis that p = 1/4 against the alternative
hypothesis that p = 1/2, then we are able to compute the probability
of not rejecting H0 when it is false.
• We simply find the probability of obtaining 8 or fewer in the group
that surpass the 2-year period when p = 1/2:
β = P(Type II error) = P(X ≤ 8 when p = 1/2) = 0.2517.


The Probability of a Type II Error


• It is possible that the director of the testing program is willing to
make a Type II error if the more expensive vaccine is not
significantly superior.
• In fact, the only time he wishes to guard against the Type II error is
when the true value of p is at least 0.7.
• If p = 0.7, this test procedure gives
β = P(X ≤ 8 when p = 0.7) = 0.0051.
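The Type II error probabilities for the two specific alternatives discussed above (p = 1/2 and p = 0.7) can be reproduced with a short sketch. It assumes the same rule as before (fail to reject H0 when X ≤ 8, n = 20); the function name `binom_cdf` is an illustrative choice.

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


N, CRITICAL_VALUE = 20, 8

# beta = P(fail to reject H0 | specific alternative) = P(X <= 8 | p = p1)
beta_half = binom_cdf(CRITICAL_VALUE, N, 0.5)
beta_07 = binom_cdf(CRITICAL_VALUE, N, 0.7)

print(f"beta at p = 1/2: {beta_half:.4f}")
print(f"beta at p = 0.7: {beta_07:.4f}")
```

The farther the true p lies from the hypothesized 1/4, the smaller β becomes for this fixed rule.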


The Role of α, β, and Sample Size


• For a fixed sample size, a decrease in the probability of
one error will usually result in an increase in the
probability of the other error.
• Fortunately, the probability of committing both types of
error can be reduced by increasing the sample size.
• Consider the same problem using a random sample of
100 individuals. If more than 36 of the group surpass the
2-year period, we reject the null hypothesis that p = 1/4
and accept the alternative hypothesis that p > 1/4. The
critical value is now 36. All possible scores above 36
constitute the critical region, and all possible scores less
than or equal to 36 fall in the acceptance region.
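To see how both error probabilities shrink with the larger sample, the sketch below evaluates α (at p = 1/4) and β (at the alternative p = 1/2) for n = 100 with critical value 36. It shows the exact binomial sums alongside a normal approximation with continuity correction; the choice of the alternative p = 1/2 and the helper names are assumptions made for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(c: int, n: int, p: float) -> float:
    """Exact P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


def normal_cdf_approx(c: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= c) with a continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(c + 0.5)


n, c = 100, 36

alpha_exact = 1 - binom_cdf(c, n, 0.25)           # P(X > 36 | p = 1/4)
alpha_approx = 1 - normal_cdf_approx(c, n, 0.25)
beta_exact = binom_cdf(c, n, 0.5)                 # P(X <= 36 | p = 1/2)
beta_approx = normal_cdf_approx(c, n, 0.5)

print(f"alpha: exact {alpha_exact:.4f}, normal approx {alpha_approx:.4f}")
print(f"beta : exact {beta_exact:.4f}, normal approx {beta_approx:.4f}")
```

Both values come out well below the corresponding n = 20 figures, which is the point of the slide.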


Illustration with a Continuous Random Variable
• Consider the null hypothesis that the average weight of male
students in a certain college is 68 kilograms against the alternative
hypothesis that it is unequal to 68.
• A sample mean that falls close to the hypothesized value of 68
would be considered evidence in favor of H0.
• On the other hand, a sample mean that is considerably less than or
more than 68 would be evidence inconsistent with H0 and therefore
favoring H1.
• The sample mean is the test statistic in this case.


Illustration with a Continuous Random Variable
• A critical region for the test statistic might arbitrarily be chosen to be
the two intervals 𝑥̅ < 67 and 𝑥̅ > 69. The nonrejection region will
then be the interval 67 ≤ 𝑥̅ ≤ 69.
• Let us now use the decision criterion above to calculate the
probabilities of committing type I and type II errors when testing the
null hypothesis that μ = 68 kilograms against the alternative that μ ≠
68 kilograms.


Illustration with a Continuous Random Variable
• Assume the standard deviation of the population of
weights to be σ = 3.6.
• For large samples, we may substitute s for σ if no other
estimate of σ is available.
• Our decision statistic, based on a random sample of size
n = 36, will be X̄, the most efficient estimator of μ.
• From the Central Limit Theorem, we know that the
sampling distribution of X̄ is approximately normal with
standard deviation σX̄ = σ/√n = 3.6/6 = 0.6.


Illustration with a Continuous Random Variable
• The probability of committing a type I error, or the level of
significance of our test, is equal to the sum of the areas that have
been shaded in each tail of the distribution in the figure below.


Illustration with a Continuous Random Variable
• To reduce α, we have a choice of increasing the sample size or
widening the fail-to-reject region.
• Suppose that we increase the sample size to n = 64.
• Then σX̄ = 3.6/8 = 0.45.
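The effect of the sample size on α for the weight example can be sketched as follows. The code assumes the setup stated in the slides (μ0 = 68, σ = 3.6, nonrejection region 67 ≤ x̄ ≤ 69, X̄ approximately normal); the function name `type_i_error` is illustrative. Because it does not round z to two decimals, the values can differ slightly in the fourth decimal from table-based results.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 3.6               # population standard deviation
MU0 = 68.0                # mean under H0
LOWER, UPPER = 67.0, 69.0 # nonrejection region for the sample mean


def type_i_error(n: int) -> float:
    """alpha = P(xbar < 67) + P(xbar > 69) when mu = 68."""
    sampling_dist = NormalDist(MU0, SIGMA / sqrt(n))
    return sampling_dist.cdf(LOWER) + (1 - sampling_dist.cdf(UPPER))


for n in (36, 64):
    print(f"n = {n}: standard error = {SIGMA / sqrt(n):.2f}, alpha = {type_i_error(n):.4f}")
```

With the critical region held fixed, moving from n = 36 to n = 64 cuts α to roughly a quarter of its original value.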


Illustration with a Continuous Random Variable
• The reduction in α is not sufficient by itself to guarantee a
good testing procedure.
• We must also evaluate β for various alternative hypotheses.
• If it is important to reject H0 when the true mean is some value
μ ≥ 70 or μ ≤ 66, then the probability of committing a type II
error should be computed and examined for the alternatives μ
= 66 and μ = 70.
• Because of symmetry, it is only necessary to consider the
probability of not rejecting the null hypothesis that μ = 68
when the alternative μ = 70 is true.
• A type II error will result when the sample mean 𝑥̅ falls
between 67 and 69 when H1 is true.
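The Type II error probability for a specific alternative can be computed directly from the sampling distribution of the mean. This sketch assumes n = 64, σ = 3.6 and the nonrejection region 67 ≤ x̄ ≤ 69 from the slides; `type_ii_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 3.6
LOWER, UPPER = 67.0, 69.0  # nonrejection region for the sample mean


def type_ii_error(mu_true: float, n: int) -> float:
    """beta = P(67 <= xbar <= 69) when the true mean is mu_true."""
    sampling_dist = NormalDist(mu_true, SIGMA / sqrt(n))
    return sampling_dist.cdf(UPPER) - sampling_dist.cdf(LOWER)


beta_70 = type_ii_error(70.0, 64)
beta_66 = type_ii_error(66.0, 64)
print(f"beta at mu = 70: {beta_70:.4f}")
print(f"beta at mu = 66: {beta_66:.4f}")  # equal to the mu = 70 value by symmetry
```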

Illustration with a Continuous Random Variable
• If the true value of μ is the alternative μ = 66, the value of β will
again be 0.0132.
• For all possible values of μ < 66 or μ > 70, the value of β will be
even smaller when n = 64, and consequently there would be little
chance of not rejecting H0 when it is false.
• The probability of committing a Type II error increases rapidly when
the true value of μ approaches, but is not equal to, the hypothesized
value.
• Of course, this is usually the situation where we do not mind making
a Type II error.
• For example, if the alternative hypothesis μ = 68.5 is true, we do not
mind committing a Type II error by concluding that the true answer is
μ = 68.
• The probability of making such an error will be high when n = 64.

Important Properties of a Test of Hypothesis


The Power of a Test


• The power of a test can be computed as 1 − β.
• Often different types of tests are compared by
contrasting power properties.
• Consider the previous illustration, in which we were
testing H0 : μ = 68 and H1 : μ ≠ 68.
• As before, suppose we are interested in assessing the
sensitivity of the test.
• The test is governed by the rule that we do not reject H0
if 67 ≤ 𝑥̅ ≤ 69.
• We seek the capability of the test to properly reject H0
when indeed μ = 68.5.

The Power of a Test


• We have seen that the probability of a Type II error is given by β =
0.8661.
• Thus, the power of the test is 1 − 0.8661 = 0.1339.
• In a sense, the power is a more succinct measure of how sensitive
the test is for detecting differences between a mean of 68 and a
mean of 68.5.
• In this case, if μ is truly 68.5, the test as described will properly
reject H0 only 13.39% of the time.
• As a result, the test would not be a good one if it was important that
the analyst have a reasonable chance of truly distinguishing
between a mean of 68.0 (specified by H0) and a mean of 68.5.
• From the foregoing, it is clear that to produce a desirable power
(say, greater than 0.8), one must either increase α or increase the
sample size.
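The power figure above, and the sample size needed for a power of 0.8, can be explored with the sketch below. Two things are assumptions of this sketch rather than of the lecture: the sample-size search fixes α = 0.05 and recomputes the critical values for each n (with the region 67 ≤ x̄ ≤ 69 held fixed, a larger n would not raise the power at μ = 68.5), and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, MU0 = 3.6, 68.0
STD_NORMAL = NormalDist()


def power_fixed_region(mu_true: float, n: int, lower: float = 67.0, upper: float = 69.0) -> float:
    """Power = P(xbar < lower) + P(xbar > upper) when the true mean is mu_true."""
    dist = NormalDist(mu_true, SIGMA / sqrt(n))
    return dist.cdf(lower) + (1 - dist.cdf(upper))


def power_at_alpha(mu_true: float, n: int, alpha: float) -> float:
    """Power of the two-sided z-test whose critical values give level alpha."""
    se = SIGMA / sqrt(n)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    dist = NormalDist(mu_true, se)
    return dist.cdf(MU0 - z_crit * se) + (1 - dist.cdf(MU0 + z_crit * se))


def smallest_n(mu_true: float, alpha: float, target_power: float) -> int:
    """Smallest sample size whose level-alpha test reaches the target power."""
    n = 2
    while power_at_alpha(mu_true, n, alpha) < target_power:
        n += 1
    return n


print(f"power at mu = 68.5, n = 64, region 67..69: {power_fixed_region(68.5, 64):.4f}")
print(f"n needed for power 0.8 at alpha = 0.05: {smallest_n(68.5, 0.05, 0.8)}")
```

Detecting a half-kilogram shift with σ = 3.6 takes a sample several times larger than 64, which is why the slide calls the n = 64 test inadequate for this purpose.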

One- and Two-Tailed Tests


How Are the Null and Alternative Hypotheses Chosen?
• The null hypothesis H0 will often be stated using the equality sign.
• With this approach, it is clear how the probability of Type I error is
controlled.
• However, there are situations in which “do not reject H0” implies that
the parameter θ might be any value defined by the natural
complement to the alternative hypothesis.
• For example, in the vaccine example, where the alternative
hypothesis is H1: p > 1/4, it is quite possible that nonrejection of H0
cannot rule out a value of p less than 1/4.
• Clearly though, in the case of one-tailed tests, the statement of the
alternative is the most important consideration.
• Whether one sets up a one-tailed or a two-tailed test will depend on
the conclusion to be drawn if H0 is rejected.


How Are the Null and Alternative Hypotheses Chosen?
• The location of the critical region can be determined only after H1
has been stated.
• For example, in testing a new drug, one sets up the hypothesis that
it is no better than similar drugs now on the market and tests this
against the alternative hypothesis that the new drug is superior.
• Such an alternative hypothesis will result in a one-tailed test with the
critical region in the right tail.
• However, if we wish to compare a new teaching technique with the
conventional classroom procedure, the alternative hypothesis
should allow for the new approach to be either inferior or superior to
the conventional procedure.
• Hence, the test is two-tailed with the critical region divided equally
so as to fall in the extreme left and right tails of the distribution of our
statistic.

Example 1


Example 2


The Use of P-Values for Decision Making in Testing Hypotheses

• How does the use of p-values differ from classic hypothesis testing?


Single Sample: Tests Concerning a Single Mean
• Tests on a Single Mean (Variance Known)


• Thus, for the one-sided alternative H1: μ > μ0, rejection of H0 results when the computed
z = (x̄ − μ0)/(σ/√n) > zα.
• Obviously, if the alternative is H1: μ < μ0, the critical region is entirely
in the lower tail and thus rejection results from z < −zα.
• Although in a one-sided testing case the null hypothesis can be
written as H0 : μ ≤ μ0 or H0: μ ≥ μ0, it is usually written as H0: μ = μ0.
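A one-sided z-test with known σ can be sketched as below. The numbers (x̄ = 71.8, μ0 = 70, σ = 8.9, n = 100, α = 0.05) are illustrative values chosen for this sketch and are not taken from the slides; the function name `z_test_upper` is likewise hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(xbar: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Test H0: mu = mu0 against H1: mu > mu0 when sigma is known.

    Returns the z statistic, the critical value z_alpha, the one-sided
    P-value, and whether H0 is rejected at level alpha.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    return z, z_alpha, p_value, z > z_alpha


# Illustrative (assumed) summary statistics.
z, z_alpha, p_value, reject = z_test_upper(xbar=71.8, mu0=70.0, sigma=8.9, n=100)
print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

For the lower-tailed alternative H1: μ < μ0, the comparison becomes z < −zα and the P-value is the lower-tail area instead.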


Example 3


Example 4


Relationship to Confidence Interval Estimation
• The hypothesis-testing approach to statistical inference in this
lecture is very closely related to the confidence interval approach
studied in the previous lecture.
• Confidence interval estimation involves computation of bounds
within which it is “reasonable” for the parameter in question to lie.
• For the case of a single population mean μ with σ² known, the
structure of both hypothesis testing and confidence interval
estimation is based on the random variable
Z = (X̄ − μ)/(σ/√n).


Tests on a Single Sample (Variance Unknown)
• The application of Student t for both confidence intervals and
hypothesis testing is developed under the following assumptions.
– The random variables X1, X2, . . . , Xn represent a random sample from a normal
distribution with unknown μ and σ².
– Then the random variable √n(X̄ − μ)/S has a Student t-distribution with n − 1
degrees of freedom.
– The structure of the test is identical to that for the case of σ known, with the
exception that the value σ in the test statistic is replaced by the computed
estimate S and the standard normal distribution is replaced by a t-distribution.
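The one-sample t statistic described above can be sketched as follows. The summary values (x̄ = 42, s = 11.9, n = 12, μ0 = 46) are assumed for illustration, and the critical value 1.796 is the tabulated upper 5% point of the t-distribution with 11 degrees of freedom; the Python standard library has no t-distribution, so the critical value is entered by hand here.

```python
from math import sqrt


def t_statistic(xbar: float, s: float, n: int, mu0: float) -> float:
    """t = (xbar - mu0) / (s / sqrt(n)), referred to a t-distribution with n - 1 df."""
    return (xbar - mu0) / (s / sqrt(n))


# Illustrative (assumed) summary statistics for testing H0: mu = 46 vs H1: mu < 46.
t = t_statistic(xbar=42.0, s=11.9, n=12, mu0=46.0)

T_CRIT = 1.796  # tabulated t_{0.05} with 11 degrees of freedom
reject = t < -T_CRIT
print(f"t = {t:.3f}, reject H0 at the 0.05 level: {reject}")
```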


Example 5


Comment on the Single-Sample t-Test
• Comments regarding the normality assumption are worth
emphasizing at this point.
• We have indicated that when σ is known, the Central Limit Theorem
allows for the use of a test statistic or a confidence interval which is
based on Z, the standard normal random variable.
• Strictly speaking, of course, the Central Limit Theorem, and thus the
use of the standard normal distribution, does not apply unless σ is
known.
• In the development of the t-distribution, normality on X1, X2, . . . , Xn
was an underlying assumption.
• Thus, strictly speaking, the Student’s t-tables of percentage points
for tests or confidence intervals should not be used unless it is
known that the sample comes from a normal population.

• In practice, σ can rarely be assumed to be known.
• However, a very good estimate may be available from previous
experiments.
• Many statistics textbooks suggest that one can safely replace σ by s
in the test statistic z = (x̄ − μ0)/(σ/√n) when n ≥ 30 with a bell-shaped
population and still use the Z-tables for the appropriate critical region.
• The implication here is that the Central Limit Theorem is indeed
being invoked and one is relying on the fact that s ≈ σ.
• Obviously, when this is done, the results must be viewed as
approximate.
• Thus, a computed P-value (from the Z-distribution) of 0.15 may be
0.12 or perhaps 0.17, or a computed confidence interval may be a
93% confidence interval rather than a 95% interval as desired.
• Now what about situations where n ≤ 30?
• The user cannot rely on s being close to σ, and in order to take into
account the inaccuracy of the estimate, the confidence interval
should be wider or the critical value larger in magnitude.
• The t-distribution percentage points accomplish this but are correct
only when the sample is from a normal distribution.
• Of course, normal probability plots can be used to ascertain some
sense of the deviation from normality in a data set.
• For bell-shaped distributions of the random variables X1, X2, ..., Xn,
the use of the t-distribution for tests or confidence intervals is likely
to produce quite good results.

Two Samples: Tests on Two Means


• Tests concerning two means represent a set of very important
analytical tools for the scientist or engineer.
• Two independent random samples of sizes n1 and n2, respectively,
are drawn from two populations with means μ1 and μ2 and
variances σ1² and σ2².
• We know that the random variable
Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)
has a standard normal distribution.

• Here we are assuming that n1 and n2 are sufficiently large that the
Central Limit Theorem applies.
• Of course, if the two populations are normal, the statistic above has
a standard normal distribution even for small n1 and n2.
• Obviously, if we can assume that σ1 = σ2 = σ, the statistic above
reduces to
Z = [(X̄1 − X̄2) − (μ1 − μ2)] / (σ √(1/n1 + 1/n2)).
• The two-sided hypothesis on two means can be written generally as
H0: μ1 − μ2 = d0,
H1: μ1 − μ2 ≠ d0.

• Unknown But Equal Variances
– The more prevalent situations involving tests on two means are
those in which variances are unknown.
– If the scientist involved is willing to assume that both distributions
are normal and that σ1 = σ2 = σ, the pooled t-test (often called
the two-sample t-test) may be used.
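– The test statistic is given by the following test procedure: to test
H0: μ1 − μ2 = d0 against H1: μ1 − μ2 ≠ d0, compute
t = [(x̄1 − x̄2) − d0] / (sp √(1/n1 + 1/n2)), where sp² = [s1²(n1 − 1) + s2²(n2 − 1)] / (n1 + n2 − 2),
and reject H0 when t < −tα/2 or t > tα/2, with n1 + n2 − 2 degrees of freedom.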
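The pooled two-sample t statistic can be sketched as below. The summary statistics (n1 = 12, x̄1 = 85, s1 = 4; n2 = 10, x̄2 = 81, s2 = 5; d0 = 2) are assumed values for illustration, and 1.725 is the tabulated upper 5% point of the t-distribution with 20 degrees of freedom, used here for a one-sided alternative μ1 − μ2 > d0.

```python
from math import sqrt


def pooled_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Pooled-variance t statistic and its degrees of freedom.

    Assumes both populations are normal with a common, unknown variance.
    """
    df = n1 + n2 - 2
    sp = sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df)  # pooled std. deviation
    t = ((xbar1 - xbar2) - d0) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, sp, df


# Illustrative (assumed) summary statistics.
t, sp, df = pooled_t(85.0, 4.0, 12, 81.0, 5.0, 10, d0=2.0)

T_CRIT = 1.725  # tabulated t_{0.05} with 20 degrees of freedom
print(f"sp = {sp:.3f}, df = {df}, t = {t:.3f}, reject H0: {t > T_CRIT}")
```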



Example 6


Paired Observations
• A study of the two-sample t-test or confidence interval on the
difference between means should suggest the need for experimental
design.
• Recall the discussion of experimental units in Lecture 1, where it
was suggested that the conditions of the two populations (often
referred to as the two treatments) should be assigned randomly to
the experimental units.
• This is done to avoid biased results due to systematic differences
between experimental units.
• In other words, in hypothesis-testing jargon, it is important that any
significant difference found between means be due to the different
conditions of the populations and not due to the experimental units
in the study.

• For example, consider Exercise 9.40 in Section 9.9.
• The 20 seedlings play the role of the experimental units.
• Ten of them are to be treated with nitrogen and 10 with no nitrogen.
• It may be very important that this assignment to the “nitrogen” and
“no-nitrogen” treatments be random to ensure that systematic
differences between the seedlings do not interfere with a valid
comparison between the means.
• In Example 10.6, time of measurement is the most likely choice for
the experimental unit.
• The 22 pieces of material should be measured in random order.
• We need to guard against the possibility that wear measurements
made close together in time might tend to give similar results.

• Systematic (nonrandom) differences in experimental units are not
expected.
• However, random assignments guard against the problem.
• Testing of two means can be accomplished when data are in the
form of paired observations, as discussed in Lecture 1.
• In this pairing structure, the conditions of the two populations
(treatments) are assigned randomly within homogeneous units.
• Computation of the confidence interval for μ1 − μ2 in the situation
with paired observations is based on the random variable
T = (D̄ − μD)/(Sd/√n),
where D̄ and Sd are random variables representing the sample
mean and standard deviation of the differences of the observations
in the experimental units.
• As in the case of the pooled t-test, the assumption is that the
observations from each population are normal.
• This two-sample problem is essentially reduced to a one-sample
problem by using the computed differences d1, d2, . . . , dn. Thus, the
hypothesis reduces to
H0: μD = d0.
• The computed test statistic is then given by
t = (d̄ − d0)/(sd/√n).
• Critical regions are constructed using the t-distribution with n − 1
degrees of freedom.
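The reduction of the paired problem to a one-sample problem on the differences can be sketched as follows. The two small data lists are made-up values for illustration only; `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second, d0=0.0):
    """Paired t statistic based on the differences d_i = first_i - second_i.

    Returns (t, degrees of freedom). The differences are assumed to come
    from a normal population.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar, s_d = mean(diffs), stdev(diffs)  # sample mean and std. deviation of d_i
    return (d_bar - d0) / (s_d / sqrt(n)), n - 1


# Hypothetical measurements on the same five experimental units.
after = [12, 14, 10, 13, 16]
before = [10, 12, 9, 11, 13]

t, df = paired_t(after, before)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

The statistic is then compared with a tabulated t critical value for n − 1 degrees of freedom.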


Problem of Interaction in a Paired t-Test
• Not only will the case study that follows illustrate the use of the
paired t-test but the discussion will shed considerable light on the
difficulties that arise when there is an interaction between the
treatments and the experimental units in the paired t structure.
• There are some types of statistical tests in which the existence of
interaction results in difficulty.
• The paired t-test is one such example.
• In Lecture 1, the paired structure was used in the computation of a
confidence interval on the difference between two means, and the
advantage in pairing was revealed for situations in which the
experimental units are homogeneous.
• The pairing results in a reduction in σD, the standard deviation of a
difference Di = X1i − X2i.
• If interaction exists between treatments and experimental units, the
advantage gained in pairing may be substantially reduced.
• Thus, in Example 9.13 on page 293, the no interaction assumption
allowed the difference in mean TCDD levels (plasma vs. fat tissue)
to be the same across veterans.
• A quick glance at the data would suggest that there is no significant
violation of the assumption of no interaction.
• In order to demonstrate how interaction influences Var(D) and hence
the quality of the paired t-test, it is instructive to revisit the ith
difference given by Di = X1i − X2i = (μ1 − μ2) + (ε1 − ε2), where X1i and
X2i are taken on the ith experimental unit.
• If the pairing unit is homogeneous, the errors in X1i and in X2i should
be similar and not independent.

• We noted in Chapter 9 that the positive covariance between the
errors results in a reduced Var(D).
• Thus, the size of the difference in the treatments and the
relationship between the errors in X1i and X2i contributed by the
experimental unit will tend to allow a significant difference to be
detected.


What Conditions Result in Interaction?
• Let us consider a situation in which the experimental units are not
homogeneous.
• Rather, consider the ith experimental unit with random variables X1i
and X2i that are not similar.
• Let ε1i and ε2i be random variables representing the errors in the
values X1i and X2i, respectively, at the ith unit.
• Thus, we may write
X1i = μ1 + ε1i,
X2i = μ2 + ε2i.

• The errors with expectation zero may tend to cause the response
values X1i and X2i to move in opposite directions, resulting in a
negative value for Cov(ε1i, ε2i) and hence negative Cov(X1i, X2i).

• In fact, the model may be complicated even more by the fact that σ1²
= Var(ε1i) ≠ σ2² = Var(ε2i).
• The variance and covariance parameters may vary among the n
experimental units.
• Thus, unlike in the homogeneous case, Di will tend to be quite
different across experimental units due to the heterogeneous nature
of the difference in ε1 − ε2 among the units.
• This produces the interaction between treatments and units.
• In addition, for a specific experimental unit (see Theorem 4.9),
σD² = Var(D) = Var(ε1) + Var(ε2) − 2 Cov(ε1, ε2)
is inflated by the negative covariance term, and thus the advantage
gained in pairing in the homogeneous unit case is lost in the case
described here.
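The effect of the sign of the covariance on Var(D) can be illustrated with a few lines of arithmetic. The variances and covariances below are made-up numbers chosen only to show the direction of the effect.

```python
def var_of_difference(var1: float, var2: float, cov12: float) -> float:
    """Var(D) = Var(e1) + Var(e2) - 2 Cov(e1, e2) for D = X1 - X2."""
    return var1 + var2 - 2 * cov12


# Hypothetical error variances of 1.0 for each treatment.
homogeneous = var_of_difference(1.0, 1.0, 0.8)     # positive covariance
independent = var_of_difference(1.0, 1.0, 0.0)     # no covariance (unpaired case)
heterogeneous = var_of_difference(1.0, 1.0, -0.8)  # negative covariance

print(f"positive covariance: Var(D) = {homogeneous:.1f}")
print(f"zero covariance:     Var(D) = {independent:.1f}")
print(f"negative covariance: Var(D) = {heterogeneous:.1f}")
```

Pairing helps only when the covariance is positive; a negative covariance makes Var(D) larger than it would be with independent samples.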
• While the inflation in Var(D) will vary from case to case, there is a
danger in some cases that the increase in variance may neutralize
any difference that exists between μ1 and μ2.
• Of course, a large value of d̄ in the t-statistic may reflect a treatment
difference that overcomes the inflated variance estimate, sd².


Case Study

Tests Concerning Means

One Sample: Test on a Single Proportion
• Tests of hypotheses concerning proportions are required in many
areas.
• Politicians are certainly interested in knowing what fraction of the
voters will favor them in the next election.
• All manufacturing firms are concerned about the proportion of
defective items when a shipment is made.
• Gamblers depend on a knowledge of the proportion of outcomes
that they consider favorable.
• We will consider the problem of testing the hypothesis that the
proportion of successes in a binomial experiment equals some
specified value.

One Sample: Test on a Single Proportion
• We are testing the null hypothesis H0 that p = p0, where p is the
parameter of the binomial distribution.
• The alternative hypothesis may be one of the usual one-sided or
two-sided alternatives:

$$H_1\colon p < p_0, \qquad H_1\colon p > p_0, \qquad \text{or} \qquad H_1\colon p \neq p_0.$$

• The appropriate random variable on which we base our decision
criterion is the binomial random variable X, although we could just
as well use the statistic $\hat{p} = X/n$.
• Values of X that are far from the mean μ = np0 will lead to the
rejection of the null hypothesis.
• Because X is a discrete binomial variable, it is unlikely that a critical
region can be established whose size is exactly equal to a
prespecified value of α.

One Sample: Test on a Single Proportion
• For this reason, it is preferable, in dealing with small samples, to
base our decisions on P-values.
• To test the hypothesis

$$H_0\colon p = p_0, \qquad H_1\colon p < p_0,$$

• we use the binomial distribution to compute the P-value

$$P = P(X \le x \text{ when } p = p_0).$$

• The value x is the number of successes in our sample of size n.
• If this P-value is less than or equal to α, our test is significant at the
α level, and we reject H0 in favor of H1.

One Sample: Test on a Single Proportion
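• Similarly, to test the hypothesis

$$H_0\colon p = p_0, \qquad H_1\colon p > p_0,$$

• at the α-level of significance, we compute

$$P = P(X \ge x \text{ when } p = p_0)$$

• and reject H0 in favor of H1 if this P-value is less than or equal to α.
• Finally, to test the hypothesis

$$H_0\colon p = p_0, \qquad H_1\colon p \neq p_0,$$

• at the α-level of significance, we compute

$$P = 2P(X \le x \text{ when } p = p_0) \ \text{ if } x < np_0, \qquad P = 2P(X \ge x \text{ when } p = p_0) \ \text{ if } x > np_0$$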

One Sample: Test on a Single Proportion

• and reject H0 in favor of H1 if the computed P-value is less than or
equal to α.
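
The small-sample procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the `alternative` labels, and the sample values (x = 8 successes in n = 15 trials with p0 = 0.7) are assumptions made for the example. The two-sided case uses the usual rule of doubling the tail on the side of np0 where x falls, capped at 1.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_p_value(x, n, p0, alternative):
    """Exact P-value for H0: p = p0 based on x successes in n trials."""
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        # Double the tail on the side of n*p0 where x lies; if x equals n*p0
        # exactly (a case the doubling rule does not cover), the upper tail
        # is used here as an assumption, and the result is capped at 1.
        tail = lower if x < n * p0 else upper
        return min(1.0, 2.0 * tail)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Hypothetical sample: 8 successes in 15 trials, testing p0 = 0.7 two-sided.
alpha = 0.10
p_value = binomial_p_value(8, 15, 0.7, "two-sided")
print(round(p_value, 4), "reject H0" if p_value <= alpha else "fail to reject H0")
```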

Example 9

One Sample: Test on a Single Proportion
• We know that binomial probabilities can be obtained from the actual
binomial formula or from a table when n is small.
• For large n, approximation procedures are required.
• When the hypothesized value p0 is very close to 0 or 1, the Poisson
distribution, with parameter μ = np0, may be used.
• However, the normal curve approximation, with parameters μ = np0
and σ² = np0q0, is usually preferred for large n and is very accurate
as long as p0 is not extremely close to 0 or to 1.
• If we use the normal approximation, the z-value for testing p = p0 is
given by

$$z = \frac{x - np_0}{\sqrt{np_0 q_0}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}},$$

where q0 = 1 − p0 and $\hat{p} = x/n$.
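
A minimal sketch of the large-sample test follows. The function names and the sample values (x = 70 successes in n = 100 trials, p0 = 0.6) are hypothetical; the standard normal CDF is computed from `math.erf` so that no table lookup or external library is needed.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal random variable."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def proportion_z_test(x, n, p0):
    """Return (z, lower-tail P, upper-tail P) for H0: p = p0."""
    q0 = 1.0 - p0
    z = (x - n * p0) / sqrt(n * p0 * q0)
    lower = standard_normal_cdf(z)
    return z, lower, 1.0 - lower


# Hypothetical sample: 70 successes in 100 trials, testing p = 0.6 against p > 0.6.
z, p_lower, p_upper = proportion_z_test(70, 100, 0.6)
print(round(z, 2), round(p_upper, 4))
```

For a two-sided alternative, the P-value would be twice the smaller of the two tail areas.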

Example 10

Two Samples: Tests on Two Proportions
• Situations often arise where we wish to test the hypothesis that two
proportions are equal.
• For example, we might want to show evidence that the proportion of
doctors who are pediatricians in one state is equal to the proportion
in another state.
• A person may decide to give up smoking only if he or she is
convinced that the proportion of smokers with lung cancer exceeds
the proportion of nonsmokers with lung cancer.
• In general, we wish to test the null hypothesis that two proportions,
or binomial parameters, are equal.
• That is, we are testing p1 = p2 against one of the alternatives p1 < p2,
p1 > p2, or p1 ≠ p2.

Two Samples: Tests on Two Proportions
• Of course, this is equivalent to testing the null hypothesis that p1 −
p2 = 0 against one of the alternatives p1 − p2 < 0, p1 − p2 > 0, or p1 −
p2 ≠ 0.
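• The statistic on which we base our decision is the random variable
$\hat{P}_1 - \hat{P}_2$.
• Independent samples of sizes n1 and n2 are selected at random from
two binomial populations and the proportions of successes $\hat{P}_1$ and
$\hat{P}_2$ for the two samples are computed.
• In our construction of confidence intervals for p1 and p2 we noted, for
n1 and n2 sufficiently large, that the point estimator $\hat{P}_1 - \hat{P}_2$ was
approximately normally distributed with mean and variance

$$\mu_{\hat{P}_1 - \hat{P}_2} = p_1 - p_2, \qquad \sigma^2_{\hat{P}_1 - \hat{P}_2} = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}.$$

• Therefore, critical regions can be established by using the standard
normal variable

$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}}.$$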

Two Samples: Tests on Two Proportions

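• When H0 is true, we can substitute p1 = p2 = p and q1 = q2 = q
(where p and q are the common values) in the preceding formula for
Z to give the form

$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{pq\,(1/n_1 + 1/n_2)}}.$$

• To compute a value of Z, however, we must estimate the
parameters p and q that appear in the radical.
• Upon pooling the data from both samples, the pooled estimate of the
proportion p is

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2},$$

where x1 and x2 are the numbers of successes in the two samples.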

Two Samples: Tests on Two Proportions

• Substituting $\hat{p}$ for p and $\hat{q} = 1 - \hat{p}$ for q, the z-value for testing p1 = p2
is determined from the formula

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\,(1/n_1 + 1/n_2)}}.$$
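
The pooled z-value can be computed as in the sketch below. The function name and the counts (120 successes out of 200 in the first sample, 240 out of 500 in the second) are hypothetical values used only to exercise the formula.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-value for H0: p1 = p2, plus the pooled estimate of p."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)   # pooled estimate of the common p
    q_pooled = 1.0 - p_pooled
    standard_error = sqrt(p_pooled * q_pooled * (1.0 / n1 + 1.0 / n2))
    return (p1_hat - p2_hat) / standard_error, p_pooled


# Hypothetical counts: 120 of 200 in sample 1, 240 of 500 in sample 2.
z, p_pooled = two_proportion_z(120, 200, 240, 500)
print(round(z, 2), round(p_pooled, 4))
```

The resulting z is then compared with the standard normal critical value for the chosen alternative, or converted to a P-value.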

Example 11

One- and Two-Sample Tests Concerning Variances
• Here, we are concerned with testing hypotheses concerning
population variances or standard deviations.
• Engineers and scientists are confronted with studies in which they
are required to demonstrate that measurements involving products
or processes adhere to specifications set by consumers.
• The specifications are often met if the process variance is
sufficiently small.
• Attention is also focused on comparative experiments between
methods or processes, where inherent reproducibility or variability
must formally be compared.
• In addition, to determine if the equal variance assumption is violated,
a test comparing two variances is often applied prior to conducting a
t-test on two means.

One- and Two-Sample Tests Concerning Variances
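• Let us first consider the problem of testing the null hypothesis H0
that the population variance $\sigma^2$ equals a specified value $\sigma_0^2$ against
one of the usual alternatives $\sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, or $\sigma^2 \neq \sigma_0^2$.
• The appropriate statistic on which to base our decision is the
chi-squared statistic.
• Therefore, if we assume that the distribution of the population being
sampled is normal, the chi-squared value for testing $\sigma^2 = \sigma_0^2$ is given
by

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2},$$

• where n is the sample size, $s^2$ is the sample variance, and $\sigma_0^2$ is the value of $\sigma^2$ given
by the null hypothesis.
• If H0 is true, $\chi^2$ is a value of the chi-squared distribution with v = n − 1 degrees of
freedom.
• Critical regions at the α-level of significance, where $\chi^2_\alpha$ denotes the value with
area α to its right:
    $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ for $H_1\colon \sigma^2 \neq \sigma_0^2$;
    $\chi^2 < \chi^2_{1-\alpha}$ for $H_1\colon \sigma^2 < \sigma_0^2$;
    $\chi^2 > \chi^2_{\alpha}$ for $H_1\colon \sigma^2 > \sigma_0^2$.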
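
As a rough illustration of the single-variance test, the sketch below computes the chi-squared value and compares it with a critical value read from a table. The function name and the sample figures (n = 10, s = 1.2, σ0 = 0.9) are hypothetical, and the critical value 16.919 (upper 5% point for 9 degrees of freedom) is entered by hand as a table lookup rather than computed.

```python
def chi_squared_variance_stat(n, s_squared, sigma0_squared):
    """Return (chi-squared value, degrees of freedom) for H0: sigma^2 = sigma0^2."""
    return (n - 1) * s_squared / sigma0_squared, n - 1


# Hypothetical sample: n = 10 observations with s = 1.2, testing sigma0 = 0.9
# against the one-sided alternative sigma^2 > sigma0^2 at alpha = 0.05.
chi2, df = chi_squared_variance_stat(10, 1.2**2, 0.9**2)
critical_value = 16.919  # table value: upper 5% point, 9 degrees of freedom
reject_h0 = chi2 > critical_value
print(round(chi2, 3), df, reject_h0)
```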

Robustness of χ²-Test to Assumption of Normality
• We know that various tests depend, at least theoretically, on the assumption of
normality.
• In general, many procedures in applied statistics have theoretical underpinnings that
depend on the normal distribution.
• These procedures vary in the degree of their dependency on the assumption of
normality.
• A procedure that is reasonably insensitive to the assumption is called a robust
procedure (i.e., robust to normality).
• The χ²-test on a single variance is very nonrobust to normality (i.e., the practical
success of the procedure depends on normality).
• As a result, the P-value computed may be appreciably different from the actual
P-value if the population sampled is not normal.
• Indeed, it is quite feasible that a statistically significant P-value may not truly signal
H1: σ ≠ σ0; rather, a significant value may be a result of the violation of the normality
assumptions.
• Therefore, the analyst should approach the use of this particular χ²-test with caution.

Example 12

One- and Two-Sample Tests Concerning Variances
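• Now let us consider the problem of testing the equality of the variances $\sigma_1^2$
and $\sigma_2^2$ of two populations.
• That is, we shall test the null hypothesis H0 that $\sigma_1^2 = \sigma_2^2$ against one of the
usual alternatives

$$\sigma_1^2 < \sigma_2^2, \qquad \sigma_1^2 > \sigma_2^2, \qquad \text{or} \qquad \sigma_1^2 \neq \sigma_2^2.$$

• For independent random samples of sizes n1 and n2, respectively, from the
two populations, the f-value for testing $\sigma_1^2 = \sigma_2^2$ is the ratio

$$f = \frac{s_1^2}{s_2^2},$$

• where $s_1^2$ and $s_2^2$ are the variances computed from the two samples.
• If the two populations are approximately normally distributed and the null hypothesis
is true, according to Theorem 8.8 the ratio $f = s_1^2 / s_2^2$ is a value of the F-distribution
with v1 = n1 − 1 and v2 = n2 − 1 degrees of freedom.
• Critical regions at the α-level of significance, where $f_\alpha(v_1, v_2)$ denotes the value
with area α to its right:
    $f < f_{1-\alpha}(v_1, v_2)$ for $H_1\colon \sigma_1^2 < \sigma_2^2$;
    $f > f_{\alpha}(v_1, v_2)$ for $H_1\colon \sigma_1^2 > \sigma_2^2$;
    $f < f_{1-\alpha/2}(v_1, v_2)$ or $f > f_{\alpha/2}(v_1, v_2)$ for $H_1\colon \sigma_1^2 \neq \sigma_2^2$.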
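
The f-value and its degrees of freedom are straightforward to compute from raw data. The following sketch uses `statistics.variance`, which returns the sample variance with divisor n − 1; the function name and the two small data sets are made up for illustration.

```python
from statistics import variance


def variance_ratio(sample1, sample2):
    """Return (f, v1, v2) where f = s1^2 / s2^2 and v_i = n_i - 1."""
    s1_squared = variance(sample1)  # sample variance, divisor n1 - 1
    s2_squared = variance(sample2)  # sample variance, divisor n2 - 1
    return s1_squared / s2_squared, len(sample1) - 1, len(sample2) - 1


# Hypothetical measurements from two processes.
sample1 = [2.0, 4.0, 4.0, 6.0]
sample2 = [1.0, 2.0, 3.0, 4.0, 5.0]
f, v1, v2 = variance_ratio(sample1, sample2)
print(round(f, 4), v1, v2)
```

The computed f is then compared with tabulated values of the F-distribution for (v1, v2) degrees of freedom.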

Example 13

Goodness-of-Fit Test

• We consider a test to determine if a population has a specified
theoretical distribution.
• The test is based on how good a fit we have between the frequency
of occurrence of observations in an observed sample and the
expected frequencies obtained from the hypothesized distribution.
• To illustrate, we consider the tossing of a die.
• We hypothesize that the die is honest, which is equivalent to testing
the hypothesis that the distribution of outcomes is the discrete
uniform distribution

$$f(x) = \frac{1}{6}, \qquad x = 1, 2, \ldots, 6.$$

Goodness-of-Fit Test
• Suppose that the die is tossed 120 times and each outcome is
recorded.
• Theoretically, if the die is balanced, we would expect each face to
occur 20 times.

• By comparing the observed frequencies with the corresponding
expected frequencies, we must decide whether these discrepancies
are likely to occur as a result of sampling fluctuations and the die is
balanced or whether the die is not honest, and the distribution of
outcomes is not uniform.

Goodness-of-Fit Test
• It is common practice to refer to each possible outcome of an
experiment as a cell.
• In our illustration, we have 6 cells.
• The appropriate statistic on which we base our decision criterion for
an experiment involving k cells is defined by the following.
• A goodness-of-fit test between observed and expected frequencies
is based on the quantity

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$

• where $\chi^2$ is a value of a random variable whose sampling
distribution is approximated very closely by the chi-squared
distribution with v = k − 1 degrees of freedom.
• The symbols $o_i$ and $e_i$ represent the observed and expected
frequencies, respectively, for the ith cell.
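
The statistic is easy to evaluate directly. In the sketch below the function name is an assumption, and the observed counts for 120 tosses of a die are illustrative values (not taken from these slides); each face has an expected frequency of 20.

```python
def chi_squared_gof(observed, expected):
    """Return (chi-squared value, degrees of freedom) for k cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


# Illustrative counts for faces 1..6 in 120 tosses of a die.
observed = [20, 22, 17, 18, 19, 24]
expected = [20.0] * 6
chi2, df = chi_squared_gof(observed, expected)

critical_value = 11.070  # table value: upper 5% point, 5 degrees of freedom
print(round(chi2, 2), df, chi2 > critical_value)
```

Because the critical region lies in the right tail, only values of the statistic larger than the tabulated critical value lead to rejection.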

Goodness-of-Fit Test
• If the observed frequencies are close to the corresponding expected
frequencies, the χ²-value will be small, indicating a good fit.
• If the observed frequencies differ considerably from the expected
frequencies, the χ²-value will be large and the fit is poor.
• A good fit leads to the acceptance of H0, whereas a poor fit leads to
its rejection.
• The critical region will, therefore, fall in the right tail of the
chi-squared distribution.
• For a level of significance equal to α, we find the critical value $\chi^2_\alpha$
from Table A.5, and then $\chi^2 > \chi^2_\alpha$ constitutes the critical region.
• The decision criterion described here should not be used unless
each of the expected frequencies is at least equal to 5.
• This restriction may require the combining of adjacent cells, resulting
in a reduction in the number of degrees of freedom.

Critical Values: Chi-Squared Distribution

Goodness-of-Fit Test

• Since the computed value χ² = 1.7 is less than the critical value, we fail to reject H0.
• We conclude that there is insufficient evidence that the die is not
balanced.

Goodness-of-Fit Test

• As a second illustration, let us test the hypothesis that the frequency
distribution of battery lives given in Table 1.7 on page 23 may be
approximated by a normal distribution with mean μ = 3.5 and
standard deviation σ = 0.7.
• The expected frequencies for the 7 classes (cells), listed in Table
10.5, are obtained by computing the areas under the hypothesized
normal curve that fall between the various class boundaries.

Goodness-of-Fit Test

• For example, the z-values corresponding to the boundaries of the
fourth class are found by standardizing each boundary x:

$$z = \frac{x - \mu}{\sigma} = \frac{x - 3.5}{0.7}.$$

• It is customary to round these frequencies to one decimal.
• We have no reason to reject the null hypothesis, and we conclude that
the normal distribution with μ = 3.5 and σ = 0.7 provides a good fit
for the distribution of battery lives.
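
The expected frequency of a class under the hypothesized normal curve is the sample size times the area between the class boundaries. The sketch below shows that calculation using `math.erf` for the normal CDF. The function names are assumptions, and the class boundaries (2.95 and 3.45) and sample size (40) are placeholder values, not figures taken from these slides.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal random variable with mean mu and std. dev. sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def expected_class_frequency(lower, upper, mu, sigma, n):
    """Expected count in [lower, upper] for a sample of size n."""
    area = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
    return n * area


# Placeholder class boundaries and sample size; mu and sigma are the
# hypothesized values from the battery-life illustration.
e_class = expected_class_frequency(2.95, 3.45, mu=3.5, sigma=0.7, n=40)
print(round(e_class, 1))
```

Repeating the calculation for every class gives the expected frequencies that enter the chi-squared sum; classes whose expected frequency is below 5 would be combined with a neighbour first.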

Test for Independence (Categorical Data)
• The chi-squared test procedure just discussed can also be used to test the
hypothesis of independence of two variables of classification.
• Suppose that we wish to determine whether the opinions of the voting
residents of the state of Illinois concerning a new tax reform are
independent of their levels of income.
• Members of a random sample of 1000 registered voters from the state of
Illinois are classified as to whether they are in a low, medium, or high
income bracket and whether or not they favor the tax reform.
• The observed frequencies are presented in the table below, which is known
as a contingency table.

Test for Independence (Categorical Data)
• A contingency table with r rows and c columns is referred to as an r
× c table (“r × c” is read “r by c”).
• The row and column totals in Table 10.6 are called marginal
frequencies.
• Our decision to accept or reject the null hypothesis, H0, of
independence between a voter’s opinion concerning the tax reform
and his or her level of income is based upon how good a fit we have
between the observed frequencies in each of the 6 cells of Table 10.6
and the frequencies that we would expect for each cell under the
assumption that H0 is true.

Test for Independence (Categorical Data)
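• To find these expected frequencies, let us define the following
events:
    L: A person selected is in the low-income level.
    M: A person selected is in the medium-income level.
    H: A person selected is in the high-income level.
    F: A person selected is for the tax reform.
    A: A person selected is against the tax reform.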

Test for Independence (Categorical Data)
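• Now, if H0 is true and the two variables are independent, we should
have, for every combination of income level and opinion,

$$P(\text{income level} \cap \text{opinion}) = P(\text{income level})\,P(\text{opinion}),$$

for example, $P(\text{low income} \cap \text{for}) = P(\text{low income})\,P(\text{for})$.
• The expected frequencies are obtained by multiplying each cell
probability by the total number of observations.
• As before, we round these frequencies to one decimal.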
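
When the marginal probabilities are estimated from the row and column totals, the product rule above reduces to expected frequency = (row total × column total) / (grand total). The sketch below applies this to a 2 × 3 table and evaluates the chi-squared statistic with (r − 1)(c − 1) degrees of freedom, the standard count for an r × c table. The function names are assumptions, and the cell counts are illustrative values for 1000 voters, not figures reproduced from these slides.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_independence(observed):
    """Return (chi-squared value, degrees of freedom, expected counts)."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Illustrative counts: rows = for / against, columns = low / medium / high income.
observed = [
    [182, 213, 203],
    [154, 138, 110],
]
chi2, df, expected = chi_squared_independence(observed)
print(round(chi2, 2), df, [[round(e, 1) for e in row] for row in expected])
```

The same computation applies unchanged when the row or column totals are fixed in advance, since only the observed and expected cell counts enter the statistic.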

Test for Homogeneity


• When we tested for independence, a random sample of 1000 voters was
selected and the row and column totals for our contingency table were
determined by chance.
• Another type of problem to which the method of the previous section applies
is one in which either the row or column totals are predetermined.
• Suppose, for example, that we decide in advance to select 200 Democrats,
150 Republicans, and 150 Independents from the voters of the state of
North Carolina and record whether they are for a proposed abortion law,
against it, or undecided.

Test for Homogeneity


• Now, rather than test for independence, we test the hypothesis that
the population proportions within each row are the same.
• That is, we test the hypothesis that the proportions of Democrats,
Republicans, and Independents favoring the abortion law are the
same; the proportions of each political affiliation against the law are
the same; and the proportions of each political affiliation that are
undecided are the same.
• We are basically interested in determining whether the three
categories of voters are homogeneous with respect to their opinions
concerning the proposed abortion law.
• Such a test is called a test for homogeneity.

Example 14

Assignment-2

• Walpole Chapter 10:
• Problems 10.1, 10.3, 10.4, 10.15, 10.19, 10.21, 10.26,
10.27, 10.30, 10.31, 10.34, 10.36, 10.55, 10.61, 10.67,
10.73, 10.79, 10.80, 10.84, 10.86, 10.88.
