Lecture Notes 8A Testing of Hypothesis


Lecture Notes 8A – Testing of Hypothesis 1

Engr. Caesar Pobre Llapitan

1 Introduction
In this chapter we will study another method of inference-making: hypothesis testing. The
procedures to be discussed are useful in situations where we are interested in making a decision about
a parameter value, rather than obtaining an estimate of its value. It is often desirable to know whether
some characteristic of a population is larger than a specified value, or whether the obtained value of
a given parameter is less than a value hypothesized for the purpose of comparison.

2 Formulating Hypotheses
When we set out to test a new theory, we first formulate a hypothesis, or a claim, which we believe to
be true. For example, we may claim that the mean number of children born to urban women is less
than the mean number of children born to rural women.

Since the value of the population characteristic is unknown, the information provided by a sample
from the population is used to answer the question of whether or not the population quantity is larger
than the specified or hypothesized value. In statistical terms, a statistical hypothesis is a statement
about the value of a population parameter. The hypothesis that we try to establish is called the
alternative hypothesis and is denoted by Ha. Paired with the alternative hypothesis is the null
hypothesis, which is the "opposite" of the alternative hypothesis and is denoted by H0. In this way, the
null and alternative hypotheses, both stated in terms of the appropriate parameters, describe two
possible states of nature that cannot simultaneously be true. When a researcher begins to collect
information about the phenomenon of interest, he or she generally tries to present evidence that
lends support to the alternative hypothesis. As you will subsequently learn, we take an indirect
approach to obtaining support for the alternative hypothesis: Instead of trying to show that the
alternative hypothesis is true, we attempt to produce evidence to show that the null hypothesis is
false.

It should be stressed that researchers frequently put forward a null hypothesis in the hope that they
can discredit it. For example, consider an educational researcher who designed a new way to teach a
particular concept in science, and wanted to test experimentally whether this new method worked
better than the existing method. The researcher would design an experiment comparing the two
methods. Since the null hypothesis would be that there is no difference between the two methods, the
researcher would be hoping to reject the null hypothesis and conclude that the method he or she
developed is the better of the two.

The null hypothesis is typically a hypothesis of no difference, as in the above example where it is the
hypothesis that there is no difference between population means. That is why the word "null" is used
in "null hypothesis": it is the hypothesis of no difference.

Example 1 Formulate appropriate null and alternative hypotheses for testing the demographer's
theory that the mean number of children born to urban women is less than the mean number of
children born to rural women.

Solution The hypotheses must be stated in terms of a population parameter or parameters. We will
thus define

μ1 = Mean number of children born to urban women, and

μ2 = Mean number of children born to rural women.

The demographer wants to support the claim that μ1 is less than μ2; therefore, the null and alternative
hypotheses, in terms of these parameters, are

H0: (μ1 - μ2) = 0 (i.e., μ1 = μ2; there is no difference between the mean numbers of children born
to urban and rural women)

Ha: (μ1 - μ2) < 0 (i.e., μ1 < μ2; the mean number of children born to urban women is less than
that for rural women)

Example 2 For many years, cigarette advertisements have been required to carry the following
statement: "Cigarette smoking is dangerous to your health." But this warning is often located in
inconspicuous corners of the advertisements and printed in small type. Consequently, a researcher
believes that over 80% of those who read cigarette advertisements fail to see the warning. Specify the
null and alternative hypotheses that would be used in testing the researcher's theory.

Solution The researcher wants to make an inference about p, the true proportion of all readers of
cigarette advertisements who fail to see the warning. In particular, he wishes to collect evidence to
support the claim that p is greater than .80; thus, the null and alternative hypotheses are

H0: p = .80
Ha: p > .80

Observe that the statement of H0 in these examples, and in general, is written with an equality (=)
sign. In Example 2, you may have been tempted to write the null hypothesis as H0: p ≤ .80. However,
since the alternative of interest is that p > .80, any evidence that would cause you to reject the
null hypothesis H0: p = .80 in favor of Ha: p > .80 would also cause you to reject H0: p = p', for any value
of p' that is less than .80. In other words, H0: p = .80 represents the worst possible case, from the
researcher's point of view, when the alternative hypothesis is not correct. Thus, for mathematical
ease, we combine all possible situations for describing the opposite of Ha into one statement involving
equality.
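
The "worst possible case" argument can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the large-sample normal approximation for the sample proportion, an upper-tailed rejection rule of the kind developed in Section 4, and a hypothetical sample size n = 100 and significance level .05 that do not come from Example 2. The function name rejection_probability is likewise our own.

```python
from math import sqrt
from statistics import NormalDist


def rejection_probability(p_true, p0=0.80, alpha=0.05, n=100):
    """Approximate P(reject H0: p = p0 in favor of Ha: p > p0) when the true proportion is p_true.

    Assumes the sample proportion is approximately normal (large n).
    n = 100 and alpha = .05 are illustrative values, not taken from the notes.
    """
    std_normal = NormalDist()
    # Cutoff for the sample proportion, computed as if H0 (p = p0) were true.
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Approximate sampling distribution of the sample proportion under the true value.
    sampling = NormalDist(mu=p_true, sigma=sqrt(p_true * (1 - p_true) / n))
    return 1 - sampling.cdf(cutoff)


# The chance of (wrongly) rejecting H0 is largest at the boundary value p = .80.
for p in (0.80, 0.75, 0.70):
    print(f"true p = {p:.2f}: P(reject H0) = {rejection_probability(p):.4f}")
```

Under these assumptions the probability of rejecting H0 equals the chosen significance level when p = .80 and shrinks as the true p moves further below .80, which is why the single statement H0: p = .80 is enough to cover every value p ≤ .80.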

Example 3 A metal lathe is checked periodically by quality control inspectors to determine if it is
producing machine bearings with a mean diameter of .5 inch. If the mean diameter of the bearings is
larger or smaller than .5 inch, then the process is out of control and needs to be adjusted. Formulate
the null and alternative hypotheses that could be used to test whether the bearing production process
is out of control.

Solution We define the following parameter:

μ = True mean diameter (in inches) of all bearings produced by the lathe

If either μ > .5 or μ < .5, then the metal lathe's production process is out of control. Since we wish to
be able to detect either possibility, the null and alternative hypotheses would be

H0: μ = .5 (i.e., the process is in control)

Ha: μ ≠ .5 (i.e., the process is out of control)

An alternative hypothesis may hypothesize a change from H0 in a particular direction, or it may
merely hypothesize a change without specifying a direction. In Examples 1 and 2, the researcher is
interested in detecting a departure from H0 in one particular direction. In Example 1, the interest
focuses on whether the mean number of children born to urban women is less than the mean
number of children born to rural women. In Example 2, the interest focuses on whether the proportion
of cigarette advertisement readers who fail to see the warning is greater than .80. These two tests are
called one-tailed tests. In contrast, Example 3 illustrates a two-tailed test in which we are interested
in whether the mean diameter of the machine bearings differs in either direction from .5 inch, i.e.,
whether the process is out of control.

3 Types of errors for a Hypothesis Test


The goal of any hypothesis test is to make a decision. In particular, we will decide whether to reject
the null hypothesis, H0, in favor of the alternative hypothesis, Ha. Although we would like always to be
able to make a correct decision, we must remember that the decision will be based on sample
information, and thus we are liable to make one of two types of error, as defined in the
accompanying boxes.

Definition 1
A Type I error is the error of rejecting the null hypothesis when it is true. The probability of
committing a Type I error is usually denoted by α.

Definition 2
A Type II error is the error of accepting the null hypothesis when it is false. The probability of
making a Type II error is usually denoted by β.

The null hypothesis can be either true or false; further, we will conclude either to reject or
not to reject the null hypothesis. Thus, there are four possible situations that may arise in testing a
hypothesis (see Table 1).

Table 1 Conclusions and consequences for testing a hypothesis

                                               Conclusions
"State of Nature"              Do not reject Null Hypothesis   Reject Null Hypothesis
Null Hypothesis true           Correct conclusion              Type I error
Alternative Hypothesis true    Type II error                   Correct conclusion

The kind of error that can be made depends on the actual state of affairs (which, of course, is
unknown to the investigator). Note that we risk a Type I error only if the null hypothesis is rejected,
and we risk a Type II error only if the null hypothesis is not rejected. Thus, we may make no error, or
we may make either a Type I error (with probability α), or a Type II error (with probability β), but not
both. We don't know which type of error corresponds to actuality and so would like to keep the
probabilities of both types of errors small. There is an intuitively appealing relationship between the
probabilities for the two types of error: for a fixed sample size, as α increases, β decreases; similarly,
as β increases, α decreases. The only way to reduce α and β simultaneously is to increase the amount of
information available in the sample, i.e., to increase the sample size.
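
The trade-off between the two error probabilities can be made concrete with a small calculation. The following is a minimal sketch, not a general recipe: it anticipates the upper-tailed z test of Section 4, assumes a normal sampling distribution with known standard deviation, and uses hypothetical values (null mean 72, true mean 75, standard deviation 13) chosen purely for illustration. The function name type_ii_error is our own.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, n, mu0=72.0, mu_alt=75.0, sigma=13.0):
    """Probability of a Type II error for an upper-tailed z test of H0: mu = mu0.

    Assumes the true mean is the specific alternative mu_alt and that sigma is known.
    All default values are illustrative assumptions.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)              # boundary of the rejection region
    shift = (mu_alt - mu0) / (sigma / sqrt(n))   # distance between mu_alt and mu0 in standard errors
    # We fail to reject H0 whenever the standardized statistic stays below the critical value.
    return z.cdf(critical - shift)


for alpha in (0.10, 0.05, 0.01):
    print(f"n = 30,  alpha = {alpha:.2f}: beta = {type_ii_error(alpha, 30):.3f}")
print(f"n = 120, alpha = 0.05: beta = {type_ii_error(0.05, 120):.3f}")
```

With the sample size held at 30, lowering the Type I error probability raises the Type II error probability; increasing the sample size to 120 lowers the Type II error probability without changing the chosen Type I error probability.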

Example 4 Refer to Example 3. Specify what Type I and Type II errors would represent, in terms of
the problem.

Solution A Type I error is the error of incorrectly rejecting the null hypothesis. In our example, this
would occur if we conclude that the process is out of control when in fact the process is in control,
i.e., if we conclude that the mean bearing diameter is different from .5 inch, when in fact the mean is
equal to .5 inch. The consequence of making such an error would be that unnecessary time and effort
would be expended to repair the metal lathe.

A Type II error, that of accepting the null hypothesis when it is false, would occur if we conclude that
the mean bearing diameter is equal to .5 inch when in fact the mean differs from .5 inch. The practical
significance of making a Type II error is that the metal lathe would not be repaired, when in fact the
process is out of control.

The probability of making a Type I error (α) can be controlled by the researcher (how to do this will
be explained in Section 4). α is often used as a measure of the reliability of the conclusion and is called
the level of significance (or significance level) for a hypothesis test.

You may note that we have carefully avoided stating a decision in terms of "accept the null hypothesis
H0." Instead, if the sample does not provide enough evidence to support the alternative hypothesis Ha,
we prefer a decision "not to reject H0." This is because, if we were to "accept H0," the reliability of the
conclusion would be measured by β, the probability of Type II error. However, the value of β is not
constant, but depends on the specific alternative value of the parameter and is difficult to compute in
most testing situations.

In summary, we recommend the following procedure for formulating hypotheses and stating
conclusions.

Formulating hypotheses and stating conclusions


1. State the hypothesis you wish to support as the alternative hypothesis Ha.
2. The null hypothesis, H0, will be the opposite of Ha and will contain an equality sign.
3. If the sample evidence supports the alternative hypothesis, the null hypothesis will be rejected
and the probability of having made an incorrect decision (when in fact H0 is true) is α, a quantity
that can be manipulated to be as small as the researcher wishes.
4. If the sample does not provide sufficient evidence to support the alternative hypothesis, then
conclude that the null hypothesis cannot be rejected on the basis of your sample. In this situation,
you may wish to collect more information about the phenomenon under study.

Example 5 The logic used in hypothesis testing has often been likened to that used in the courtroom
in which a defendant is on trial for committing a crime.
a. Formulate appropriate null and alternative hypotheses for judging the guilt or innocence of the
defendant.
b. Interpret the Type I and Type II errors in this context.
c. If you were the defendant, would you want α to be small or large? Explain.

Solution
a. Under a judicial system, a defendant is "innocent until proven guilty." That is, the burden of proof
is not on the defendant to prove his or her innocence; rather, the court must collect sufficient
evidence to support the claim that the defendant is guilty. Thus, the null and alternative
hypotheses would be
H0: Defendant is innocent
Ha: Defendant is guilty

b. The four possible outcomes are shown in Table 2. A Type I error would be to conclude that the
defendant is guilty, when in fact he or she is innocent; a Type II error would be to conclude that
the defendant is innocent, when in fact he or she is guilty.

Table 2 Conclusions and consequences in Example 5

                                     Decision of Court
True State of Nature      Defendant is innocent   Defendant is guilty
Defendant is innocent     Correct decision        Type I error
Defendant is guilty       Type II error           Correct decision

c. Most would probably agree that the Type I error in this situation is by far the more serious. Thus,
we would want α, the probability of committing a Type I error, to be very small indeed.

A convention that is generally observed when formulating the null and alternative hypotheses of any
statistical test is to state H0 so that the possible error of incorrectly rejecting H0 (Type I error) is
considered more serious than the possible error of incorrectly failing to reject H0 (Type II error). In
many cases, the decision as to which type of error is more serious is admittedly not as clear-cut as that
of Example 5; experience will help to minimize this potential difficulty.

4 Rejection Regions
In this section we will describe how to arrive at a decision in a hypothesis-testing situation. Recall that
when making any type of statistical inference (of which hypothesis testing is a special case), we collect
information by obtaining a random sample from the populations of interest. In all our applications,
we will assume that the appropriate sampling process has already been carried out.

Example 6 Suppose we want to test the hypotheses

H0: μ = 72
Ha: μ > 72

What is the general format for carrying out a statistical test of hypothesis?

Solution The first step is to obtain a random sample from the population of interest. The information
provided by this sample, in the form of a sample statistic, will help us decide whether to reject the
null hypothesis. The sample statistic upon which we base our decision is called the test statistic.

The second step is to determine a test statistic that is reasonable in the context of a given hypothesis
test. For this example, we are hypothesizing about the value of the population mean μ. Since our best
guess about the value of μ is the sample mean x̄ (see Section 2), it seems reasonable to use x̄ as a
test statistic. We will learn how to choose the test statistic for other hypothesis-testing situations in
the examples that follow.

The third step is to specify the range of possible computed values of the test statistic for which the
null hypothesis will be rejected. That is, what specific values of the test statistic will lead us to reject
the null hypothesis in favor of the alternative hypothesis? These specific values are known collectively
as the rejection region for the test. For this example, we would need to specify the values of x̄ that
would lead us to believe that Ha is true, i.e., that μ is greater than 72. We will learn how to find an
appropriate rejection region in later examples.

Once the rejection region has been specified, the fourth step is to use the data in the sample to
compute the value of the test statistic. Finally, we make our decision by observing whether the
computed value of the test statistic lies within the rejection region. If in fact the computed value falls
within the rejection region, we will reject the null hypothesis; otherwise, we do not reject the null
hypothesis.

An outline of the hypothesis-testing procedure developed in Example 6 is given below.

Outline for testing a hypothesis


1. Obtain a random sample from the population(s) of interest.
2. Determine a test statistic that is reasonable in the context of the given hypothesis test.
3. Specify the rejection region, the range of possible computed values of the test statistic for
which the null hypothesis will be rejected.
4. Use the data in the sample to compute the value of the test statistic.
5. Observe whether the computed value of the test statistic lies within the rejection region. If so,
reject the null hypothesis; otherwise, do not reject the null hypothesis.

Recall that the null and alternative hypotheses will be stated in terms of specific population
parameters. Thus, in step 2 we decide on a test statistic that will provide information about the target
parameter.

Example 7 Refer to Example 1, in which we wish to test


H0: (μ1 - μ2) = 0
Ha: (μ1 - μ2) < 0

where μ1 and μ2 are the population mean numbers of children born to urban women and rural
women, respectively. Suggest an appropriate test statistic in the context of this problem.

Solution The parameter of interest is (μ1 - μ2), the difference between the two population means.
Therefore, we will use (x̄1 − x̄2), the difference between the corresponding sample means, as a basis
for deciding whether to reject H0. If the difference between the sample means, (x̄1 − x̄2), falls
greatly below the hypothesized value of (μ1 - μ2) = 0, then we have evidence that disagrees with the
null hypothesis. In fact, it would support the alternative hypothesis that (μ1 - μ2) < 0. Again, we are
using the point estimate of the target parameter as the test statistic in the hypothesis-testing
approach. In general, when the hypothesis test involves a specific population parameter, the test
statistic to be used is the conventional point estimate of that parameter.

In step 3, we divide all possible values of the test statistic into two sets: the rejection region and its
complement. If the computed value of the test statistic falls within the rejection region, we reject the
null hypothesis. If the computed value of the test statistic does not fall within the rejection region, we
do not reject the null hypothesis.

Example 8 Refer to Example 6. For the hypothesis test

H0: μ = 72
Ha: μ > 72

indicate which decision you may make for each of the following values of the test statistic:
a. x̄ = 110   b. x̄ = 59   c. x̄ = 73
Solution
a. If x̄ = 110, then much doubt is cast upon the null hypothesis. In other words, if the null
hypothesis were true (i.e., if μ is in fact equal to 72), then it is very unlikely that we would observe
a sample mean x̄ as large as 110. We would thus tend to reject the null hypothesis on the basis
of information contained in this sample.
b. Since the alternative of interest is μ > 72, this value of the sample mean, x̄ = 59, provides no
support for Ha. Thus, we would not reject H0 in favor of Ha: μ > 72, based on this sample.
c. Does a sample value of x̄ = 73 cast sufficient doubt on the null hypothesis to warrant its
rejection? Although the sample mean x̄ = 73 is larger than the null hypothesized value of μ = 72,
is this due to chance variation, or does it provide strong enough evidence to conclude in favor of
Ha? We think you will agree that the decision is not as clear-cut as in parts a and b, and that we
need a more formal mechanism for deciding what to do in this situation.

We now illustrate how to determine a rejection region that takes into account such factors as the
sample size and the maximum probability of a Type I error that you are willing to tolerate.

Example 9 Refer to Example 8. Specify completely the form of the rejection region for a test of
H0: μ = 72
Ha: μ > 72

at a significance level of α = .05.



Solution We are interested in detecting a directional departure from H0; in particular, we are
interested in the alternative that μ is greater than 72. Now, what values of the sample mean x̄
would cause us to reject H0 in favor of Ha? Clearly, values of x̄ that are "sufficiently greater" than
72 would cast doubt on the null hypothesis. But how do we decide whether a value such as x̄ = 73 is
"sufficiently greater" than 72 to reject H0? A convenient measure of the distance between x̄ and 72 is
the z-score, which "standardizes" the value of the test statistic x̄:

$$ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - 72}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 72}{s/\sqrt{n}} $$

The z-score is obtained by using the values of μ_x̄ and σ_x̄ (the mean and standard deviation of the
sampling distribution of x̄) that would be valid if the null
hypothesis were true, i.e., if μ = 72. The z-score then gives us a measure of how many standard
deviations the observed x̄ is from what we would expect to observe if H0 were true.

We examine Figure 1a and observe that the chance of obtaining a value of x̄ more than 1.645
standard deviations above 72 is only .05, when in fact the true value of μ is 72. We are assuming that
the sample size is large enough to ensure that the sampling distribution of x̄ is approximately
normal. Thus, if we observe a sample mean located more than 1.645 standard deviations above 72,
then either H0 is true and a relatively rare (with probability .05 or less) event has occurred, or Ha is
true and the population mean exceeds 72. We would tend to favor the latter explanation for obtaining
such a large value of x̄, and would then reject H0.

Figure 1 Location of rejection region of Example 9

In summary, our rejection region for this example consists of all values of z that are greater than 1.645
(i.e., all values of x̄ that are more than 1.645 standard deviations above 72). The value at the
boundary of the rejection region is called the critical value. The critical value 1.645 is shown in Figure
1b. In this situation, the probability of a Type I error (that is, deciding in favor of Ha when in fact H0 is
true) is equal to α = .05.
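
The claim that this rejection rule carries a Type I error probability of .05 can be checked by simulation. The sketch below is only an illustration under assumptions that are not part of Example 9: the population is taken to be normal with a known standard deviation of 13, samples have size n = 30, and the seed and number of trials are arbitrary. The function name simulated_type_i_rate is our own.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_type_i_rate(mu0=72.0, sigma=13.0, n=30, alpha=0.05, trials=10000, seed=1):
    """Fraction of samples drawn with H0 true (mu = mu0) for which z falls in the rejection region.

    sigma, n, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645 when alpha = .05
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which the null hypothesis is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z > critical:
            rejections += 1
    return rejections / trials


print(f"Simulated Type I error rate: {simulated_type_i_rate():.3f}")
```

Because every simulated sample comes from a population with mean 72, each rejection is a Type I error, and the observed fraction of rejections should be close to .05.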

Example 10 Specify the form of the rejection region for a test of

H0: μ = 72
Ha: μ < 72

at significance level α = .01.

Solution Here, we want to be able to detect the directional alternative that μ is less than 72; in this
case, it is "sufficiently small" values of the test statistic x̄ that would cast doubt on the null
hypothesis. As in Example 9, we will standardize the value of the test statistic to obtain a measure of
the distance between x̄ and the null hypothesized value of 72:

$$ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - 72}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 72}{s/\sqrt{n}} $$

This z-value tells us how many standard deviations the observed x̄ is from what would be expected
if H0 were true. Here, we have also assumed that n ≥ 30 so that the sampling distribution of x̄ will
be approximately normal. The appropriate modifications for small samples will be indicated in
Chapter 9.

Figure 2a shows us that, when in fact the true value of μ is 72, the chance of observing a value of x̄
more than 2.33 standard deviations below 72 is only .01. Thus, at significance level (probability of Type
I error) equal to .01, we would reject the null hypothesis for all values of z that are less than -2.33 (see
Figure 2b), i.e., for all values of x̄ that lie more than 2.33 standard deviations below 72.

Figure 2 Location of rejection region of Example 10

Example 11 Specify the form of the rejection region for a test of

H0: μ = 72
Ha: μ ≠ 72

where we are willing to tolerate a .05 chance of making a Type I error.

Solution For this two-sided (non-directional) alternative, we would reject the null hypothesis for
"sufficiently small" or "sufficiently large" values of the standardized test statistic

$$ z = \frac{\bar{x} - 72}{s/\sqrt{n}} $$

Now, from Figure 3a, we note that the chance of observing a sample mean x̄ more than 1.96
standard deviations below 72 or more than 1.96 standard deviations above 72, when in fact H0 is true,
is only α = .05. Thus, the rejection region consists of two sets of values: We will reject H0 if z is either
less than -1.96 or greater than 1.96 (see Figure 3b). For this rejection rule, the probability of a Type I
error is .05.

Figure 3 Location of rejection region of Example 11

The three previous examples all exhibit certain common characteristics regarding the rejection
region, as summarized in the following guidelines.

Guidelines for Step 3 of Hypothesis Testing


1. The value of α, the probability of a Type I error, is specified in advance by the researcher. It
can be made as small or as large as desired; typical values are α = .01, .02, .05, and .10. For a
fixed sample size, the size of the rejection region decreases as the value of α decreases (see
Figure 4). That is, for smaller values of α, more extreme departures of the test statistic from
the null hypothesized parameter value are required to permit rejection of H0.
2. For testing means or proportions, the test statistic (i.e., the point estimate of the target
parameter) is standardized to provide a measure of how great its departure is from the null
hypothesized value of the parameter. The standardization is based on the sampling
distribution of the point estimate, assuming H0 is true. (It is through standardization that the
rejection rule takes into account the sample sizes.)

$$ \text{Standardized test statistic} = \frac{\text{Point estimate} - \text{Hypothesized value}}{\text{Standard deviation of point estimate}} $$

3. The location of the rejection region depends on whether the test is one-tailed or two-tailed,
and on the pre-specified significance level, α.
a. For a one-tailed test in which the symbol ">" occurs in Ha, the rejection region consists of
values in the upper tail of the sampling distribution of the standardized test statistic. The
critical value is selected so that the area to its right is equal to α.
b. For a one-tailed test in which the symbol "<" appears in Ha, the rejection region consists of
values in the lower tail of the sampling distribution of the standardized test statistic. The
critical value is selected so that the area to its left is equal to α.
c. For a two-tailed test, in which the symbol "≠" occurs in Ha, the rejection region consists of
two sets of values. The critical values are selected so that the area in each tail of the
sampling distribution of the standardized test statistic is equal to α/2.
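
Guideline 3 can be sketched in a few lines of code. This is a minimal illustration that assumes the standardized test statistic is approximately standard normal (the large-sample case used throughout this section); the function names rejection_region and in_rejection_region and the tail labels are our own and do not come from any particular package.

```python
from statistics import NormalDist


def rejection_region(alpha, tail):
    """Return (lower, upper) critical z values; None marks a tail that is not used.

    tail is "upper" (Ha contains >), "lower" (Ha contains <) or "two" (Ha contains the not-equal sign).
    Assumes the standardized test statistic is approximately standard normal.
    """
    z = NormalDist()
    if tail == "upper":
        return (None, z.inv_cdf(1 - alpha))      # area alpha to the right
    if tail == "lower":
        return (z.inv_cdf(alpha), None)          # area alpha to the left
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)             # area alpha/2 in each tail
        return (-c, c)
    raise ValueError("tail must be 'upper', 'lower', or 'two'")


def in_rejection_region(z_value, region):
    """True if z_value lies beyond a critical value of the region."""
    lower, upper = region
    return (lower is not None and z_value < lower) or (upper is not None and z_value > upper)


print(rejection_region(0.05, "upper"))   # Example 9
print(rejection_region(0.01, "lower"))   # Example 10
print(rejection_region(0.05, "two"))     # Example 11
```

The three calls reproduce, up to rounding, the critical values 1.645, -2.33, and ±1.96 found in Examples 9, 10, and 11.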

Figure 4 Size of the upper-tail rejection region for different values of α

Steps 4 and 5 of the hypothesis-testing approach require the computation of a test statistic from the
sample information. Then we determine whether the standardized value of the test statistic lies within the
rejection region in order to make a decision about whether to reject the null hypothesis.

Example 12 Refer to Example 9. Suppose the following statistics were calculated based on a random
sample of n = 30 measurements: x̄ = 73, s = 13. Perform a test of
H0: μ = 72
Ha: μ > 72

at a significance level of α = .05.

Solution In Example 9, we determined the following rejection rule for the given value of α and the
alternative hypothesis of interest:
Reject H0 if z > 1.645.

The standardized test statistic, computed assuming H0 is true, is given by

$$ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - 72}{\sigma/\sqrt{n}} \approx \frac{\bar{x} - 72}{s/\sqrt{n}} = \frac{73 - 72}{13/\sqrt{30}} = .42 $$

Figure 5 Location of rejection region of Example 12

Since this value does not lie within the rejection region (shown in Figure 5), we fail to reject H0 and
conclude there is insufficient evidence to support the alternative hypothesis, Ha: μ > 72. (Note that we
do not conclude that H0 is true; rather, we state that we have insufficient evidence to reject H0.)
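
The computation in Example 12 can be reproduced with a short script. This is a minimal sketch of steps 3 to 5 of the outline for an upper-tailed, large-sample test of a mean; the function name upper_tail_z_test is our own, and the sample standard deviation s is used in place of the population standard deviation, as in the example.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(x_bar, s, n, mu0, alpha):
    """Large-sample test of H0: mu = mu0 against Ha: mu > mu0.

    Returns the standardized test statistic, the critical value, and the decision.
    """
    critical = NormalDist().inv_cdf(1 - alpha)   # step 3: boundary of the rejection region
    z = (x_bar - mu0) / (s / sqrt(n))            # step 4: standardized test statistic
    return z, critical, z > critical             # step 5: reject H0 only if z exceeds the critical value


z, critical, reject = upper_tail_z_test(x_bar=73, s=13, n=30, mu0=72, alpha=0.05)
print(f"z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
# prints: z = 0.42, critical value = 1.645, reject H0: False
```

The script agrees with the hand calculation: z = .42 does not exceed 1.645, so H0 is not rejected.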

Summary
In this chapter, we have introduced the logic and general concepts involved in the statistical
procedure of hypothesis testing. The techniques will be illustrated more fully with practical
applications in Chapter 9.

Exercises
1. A medical researcher would like to determine whether the proportion of males admitted to a
hospital because of heart disease differs from the corresponding proportion of females.
Formulate the appropriate null and alternative hypotheses and state whether the test is one-
tailed or two-tailed.
2. Why do we avoid stating a decision in terms of "accept the null hypothesis H0"?
3. Suppose it is desired to test
H0: μ = 65
Ha: μ ≠ 65
at significance level α = .02. Specify the form of the rejection region. (Hint: assume that the
sample size will be sufficient to guarantee the approximate normality of the sampling
distribution of x̄.)
4. Indicate the form of the rejection region for a test of
H0: (p1 − p2) = 0
Ha: (p1 − p2) > 0
Assume that the sample size will be appropriate to apply the normal approximation to the
sampling distribution of (p̂1 − p̂2), and that the maximum tolerable probability of
committing a Type I error is .05.
5. For each of the following rejection regions, determine the value of α, the probability of a Type I
error:
a) z < −1.96   b) z > 1.645   c) z < −2.58 or z > 2.58
