L18 Hypothesis Testing


Statistical Inference: Hypothesis Testing
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for Single Populations
Do tweets create an impact on share markets?

VP: HR Dilemma

• For example, the Vice President (HR) of a company wants to know the effectiveness of a training programme that the company has organized for all of its 70,000 employees based at 130 different locations in the country.
• Contacting all these employees with an effectiveness-measurement questionnaire is not feasible.
• So the Vice President (HR) takes a sample of 629 employees from all the different locations in the country.
• The result obtained is not a result for the entire population but only for the sample.
• The Vice President (HR) will then set up the assumption that "training has not enhanced efficiency" and will accept or reject this assumption through a well-defined statistical procedure known as hypothesis testing.
Learning Objectives
• Understand the hypothesis-testing procedure using one-tailed and two-tailed tests
• Understand the concepts of Type I and Type II errors in hypothesis testing
• Understand the procedure of hypothesis testing


Introduction to Hypothesis Testing
• A statistical hypothesis is an assumption about an unknown population parameter.
• Hypothesis testing is a well-defined procedure that helps us decide objectively whether to accept or reject the hypothesis based on the information available from the sample.
• In statistical analysis, we use probability to specify the level at which a researcher concludes that the observed difference between the sample statistic and the population parameter is not due to chance.
QUIZ
• Poll
• Q: A hypothesis is an assumption about an unknown population parameter.
• True/False
• Q: A statement made about a population for testing purposes is called:
a) Statistic
b) Hypothesis
c) Level of significance
d) Test statistic
Hypothesis Testing Procedure
Seven steps of hypothesis testing
Step 1: Set Null and Alternative Hypotheses
• The null hypothesis, generally denoted by H0 (H sub-zero), is the hypothesis that is tested for possible rejection under the assumption that it is true. Theoretically, a null hypothesis is set as "no difference" or the status quo and is considered true until it is proved wrong by the collected sample data.
• Symbolically, a null hypothesis is represented as: H0: μ = μ0, where μ0 is the hypothesized value of the population mean.
• The alternative hypothesis, generally denoted by H1 (H sub-one), is the logical opposite of the null hypothesis.
• Symbolically, an alternative hypothesis is represented as: H1: μ ≠ μ0


2008 Mumbai attacks – 26/11

Copyright© Dorling Kindersley India Pvt. Ltd


Statistical Inference: Hypothesis Testing for
Single Populations 13
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 14
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 15
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 16
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 17
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 18
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 19
Copyright© Dorling Kindersley India Pvt. Ltd
Statistical Inference: Hypothesis Testing for
Single Populations 20
QUIZ
• Poll
What would you think:
1) When you are applying for a job?
2) When you are writing a competitive exam?
3) When you are starting a company?

Do you go with the null hypothesis?

QUIZ
• Poll
Q: If we reject the null hypothesis, we conclude that
a) There is enough statistical evidence to infer that the alternative hypothesis is true.
b) There is not enough statistical evidence to infer that the alternative hypothesis is true.
c) None of the above
Step 2: Determine the Appropriate Statistical Test
• The type, quantity, and measurement level of the data provide a basis for deciding which statistical test to use.
Step 3: Set the Level of Significance
• The level of significance, generally denoted by α, is the probability of rejecting the null hypothesis when it is actually true.
• The level of significance is also known as the size of the rejection region or the size of the critical region.
• The levels of significance generally applied by researchers are 0.01, 0.05, and 0.10.
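Each of these significance levels corresponds to a critical value of the standard normal distribution, which is what the decision rule in Step 4 compares against. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def critical_z(alpha: float, tails: str = "two") -> float:
    """Return the positive critical z value for a significance level alpha.

    tails="two" splits alpha equally between both tails;
    tails="one" places all of alpha in a single tail.
    """
    if tails == "two":
        return NormalDist().inv_cdf(1 - alpha / 2)
    if tails == "one":
        return NormalDist().inv_cdf(1 - alpha)
    raise ValueError("tails must be 'one' or 'two'")

for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: two-tailed ±{critical_z(alpha):.3f}, "
          f"one-tailed {critical_z(alpha, 'one'):.3f}")
```

Rounded to three decimals, this gives ±2.576 / 2.326 for α = 0.01, ±1.960 / 1.645 for α = 0.05, and ±1.645 / 1.282 for α = 0.10, the values usually read from a z table.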
Type I and Type II Errors
When a researcher tests statistical hypotheses, there are four possible outcomes:
• H0 is true and is accepted: correct decision.
• H0 is true but is rejected: Type I error (probability α).
• H0 is false but is accepted: Type II error (probability β).
• H0 is false and is rejected: correct decision (probability 1 − β, the power of the test).
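One way to see that α really is the Type I error probability is to simulate many samples from a population in which H0 is true and count how often a two-tailed z test rejects it anyway. A rough sketch; the population values (μ0 = 100, σ = 15, n = 30) and the function name are hypothetical.

```python
import random
from statistics import NormalDist

def type_one_error_rate(mu0=100.0, sigma=15.0, n=30, alpha=0.05,
                        trials=20_000, seed=1):
    """Estimate how often a two-tailed z test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The population mean really is mu0, so H0 is true for every sample.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / trials

print(type_one_error_rate())  # close to alpha = 0.05
```

The estimated rejection rate settles near α because the rejection region was sized to hold exactly α of the sampling distribution under H0.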
Poll
Q: The probability of rejecting a null hypothesis when it is true is called
a) Level of significance
b) Type II error
c) Type I error
Poll
The level of significance can be viewed as the amount of risk that an analyst will accept when making a decision.
True/False
Step 4: Set the Decision Rule
Acceptance and rejection regions of the null hypothesis (two-tailed test)

The area under the normal curve is divided into two mutually exclusive regions: the acceptance region (where the null hypothesis is accepted) and the rejection region, or critical region (where the null hypothesis is rejected).
Two-Tailed Test of Hypothesis
• Let us consider the following null and alternative hypotheses:
H0: μ = μ0
H1: μ ≠ μ0
• A two-tailed test has rejection regions in both tails of the sampling distribution of the test statistic. This means a researcher will reject the null hypothesis if the computed sample statistic is significantly higher or significantly lower than the hypothesized population parameter (considering both tails, right as well as left).
Acceptance and rejection regions (alpha = 0.05)
One-Tailed Test of Hypothesis
Let us consider the following null and alternative hypotheses:
H0: μ = μ0
H1: μ < μ0 (left-tailed test) or H1: μ > μ0 (right-tailed test)

A one-tailed test has its rejection region in only one tail of the sampling distribution of the test statistic. In a left-tailed test, a researcher rejects the null hypothesis if the computed sample statistic is significantly lower than the hypothesized population parameter.

In a right-tailed test, a researcher rejects the null hypothesis if the computed sample statistic is significantly higher than the hypothesized population parameter.
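The three decision rules (two-tailed, left-tailed, right-tailed) can be written as a single function that compares a computed z statistic with the appropriate critical value. A minimal sketch; the function name `decide` and its `tail` labels are hypothetical.

```python
from statistics import NormalDist

def decide(z: float, alpha: float = 0.05, tail: str = "two") -> str:
    """Apply the z-test decision rule for the chosen type of test.

    tail="two":   reject H0 if |z| exceeds z(alpha/2)
    tail="left":  reject H0 if z falls below -z(alpha)
    tail="right": reject H0 if z exceeds z(alpha)
    """
    std = NormalDist()
    if tail == "two":
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < std.inv_cdf(alpha)
    elif tail == "right":
        reject = z > std.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return "reject H0" if reject else "accept H0"

print(decide(1.80))                # accept H0 (1.80 < 1.96)
print(decide(1.80, tail="right"))  # reject H0 (1.80 > 1.645)
```

The same z value can lead to different conclusions: putting all of α in one tail moves the critical value closer to the centre, which is why the direction of H1 must be fixed before the data are seen.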
Acceptance and rejection regions for a one-tailed (left) test (alpha = 0.05)
Acceptance and rejection regions for a one-tailed (right) test (alpha = 0.05)
Step 5: Collect the Sample Data
• At this stage, the data are collected and the appropriate sample statistics are computed.
• The first four steps should be completed before the data for the study are collected.
• It is not advisable to collect the data first and then decide on the stages of hypothesis testing.
Step 6: Analyse the Data
• In this step, the researcher computes the test statistic. This involves selecting an appropriate probability distribution for the particular test.
• Commonly used test statistics include z, t, F, and χ².
Step 7: Arrive at a Statistical Conclusion and Business Implication
• In this step, the researcher draws a statistical conclusion. A statistical conclusion is a decision to accept or reject the null hypothesis.
• Statisticians present the information obtained through the hypothesis-testing procedure to decision makers, who make decisions on the basis of this information. Ultimately, the decision maker decides whether a statistically significant result is also a substantive result that should be implemented to meet the organization's goals.
Hypothesis Testing for a Single Population Mean Using the Z Statistic
• The sample size is greater than or equal to 30 (n ≥ 30).
• The population has a normal distribution.
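For reference, the standard form of the one-sample z statistic used in this case, where x̄ is the sample mean, μ the hypothesized population mean, σ the population standard deviation, and n the sample size:

```latex
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
```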
Example
A marketing research firm conducted a survey 10 years ago and found that the average household income of a particular geographic region was Rs 10,000. Mr. Gupta, who recently joined the firm as vice president, has expressed doubts about the accuracy of these data. To verify the data, the firm has decided to take a random sample of 200 households, which yields a sample mean household income of Rs 11,000. Assume that the population standard deviation of household income is Rs 1200. Verify Mr. Gupta's doubts using the seven steps of hypothesis testing. Let α = 0.05 (5%).
Example (Solution)
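The computational part of the solution (Steps 6 and 7) can be sketched as follows. This assumes a two-tailed test (H0: μ = 10,000 vs H1: μ ≠ 10,000), since Mr. Gupta doubts the figure without saying whether income has risen or fallen.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the example (incomes in Rs).
mu0 = 10_000    # hypothesized mean household income
x_bar = 11_000  # sample mean
sigma = 1_200   # population standard deviation
n = 200
alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))           # about 11.79
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 (two-tailed)
decision = "reject H0" if abs(z) > z_crit else "accept H0"
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}: {decision}")
```

The computed z lies far inside the rejection region, so under these assumptions the sample supports Mr. Gupta's doubt that the old figure of Rs 10,000 still holds.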
Hypothesis Testing for a Single Population Mean Using the t Statistic (Case of a Small Random Sample, n < 30)
When a researcher draws a small random sample (n < 30) to estimate the population mean μ, the population standard deviation is unknown, and the population is normally distributed, the t test can be applied.
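The standard form of the one-sample t statistic replaces σ with the sample standard deviation s and has n − 1 degrees of freedom:

```latex
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{df} = n - 1
```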
Example
Royal Tyres has launched a new brand of tractor tyres and claims that under normal circumstances the average life of the tyres is 40,000 km. A retailer wants to test this claim and has taken a random sample of 8 tyres. He tests the life of the tyres under normal circumstances. The results obtained are presented in Table 10.4.
Example (Solution)
Figure: Computed and critical t values for Example 10.4
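When raw observations such as the eight tyre lives in Table 10.4 are available, the t statistic can be computed directly from them. A minimal sketch; `one_sample_t` is a hypothetical helper, and the Table 10.4 values are not reproduced here, so they would need to be supplied as the `data` argument with `mu0=40_000`.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, degrees of freedom) for a one-sample t test of H0: mu = mu0."""
    n = len(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    t = (mean(data) - mu0) / (s / sqrt(n))
    return t, n - 1
```

For 8 tyres the result has 7 degrees of freedom, and the computed t is then compared with the critical t value for 7 df at the chosen α, as in the figure above.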
Let's Do It!

A cable TV network company wants to provide modern facilities to its consumers. The company has five-year-old data which reveal that the average household income is Rs 120,000. Company officials believe that, owing to fast development in the region, the average household income might have increased. The company takes a random sample of 25 households to verify this assumption. From the sample, the average household income is calculated as Rs 125,000. From historical data, the standard deviation is obtained as Rs 1200. Use α = 0.05 to verify the finding.
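One possible way to work this exercise, sketched under stated assumptions: the test is right-tailed (H1: μ > 120,000, because officials expect income to have increased), and the historical standard deviation of Rs 1200 is treated as a known population value, so a z statistic is used. With n = 25 this also assumes the population is normal; if the standard deviation were treated as a sample estimate, a t test would apply instead.

```python
from math import sqrt
from statistics import NormalDist

# Right-tailed test: H0: mu = 120,000 vs H1: mu > 120,000 (incomes in Rs).
mu0, x_bar, sigma, n, alpha = 120_000, 125_000, 1_200, 25, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))    # 5000 / 240
z_crit = NormalDist().inv_cdf(1 - alpha) # one-tailed critical value, about 1.645
print(f"z = {z:.2f} vs {z_crit:.3f}:", "reject H0" if z > z_crit else "accept H0")
```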
Let's Do It!

During the economic boom, the average monthly income of software professionals touched Rs 75,000. A researcher is conducting a study on the impact of the 2008 economic recession. The researcher believes that the recession may have had an adverse impact on the average monthly salary of software professionals.
To verify this belief, the researcher has taken a random sample of 20 software professionals and computed their average income during the recession period. The average income of software professionals is computed as Rs 60,000.
The sample standard deviation is computed as Rs 3000. Use α = 0.10 to test whether the average income of software professionals is Rs 75,000 or has gone down, as indicated by the sample mean.
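A sketch of the computation, assuming a left-tailed t test (H1: μ < 75,000) because only a sample standard deviation is available and n = 20 < 30. The critical value 1.328 for 19 degrees of freedom at α = 0.10 (one-tailed) is taken from a standard t table rather than computed.

```python
from math import sqrt

# Left-tailed test: H0: mu = 75,000 vs H1: mu < 75,000 (incomes in Rs).
mu0, x_bar, s, n = 75_000, 60_000, 3_000, 20

t = (x_bar - mu0) / (s / sqrt(n))
df = n - 1

# t(0.10, 19) = 1.328 from a t table; the rejection region is t < -1.328.
t_crit = -1.328
print(f"t = {t:.2f} with {df} df:", "reject H0" if t < t_crit else "accept H0")
```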
Statistical Inference: Hypothesis Testing for Two Populations
Hypothesis Testing for the Difference Between Two Population Means Using the Z Statistic (Case of Large Random Samples, n1, n2 > 30, When the Population Standard Deviations Are Known)

When the sample sizes are large (n1, n2 > 30), the samples are independent (not related), and the population standard deviations are known, the Z statistic can be used to test the hypothesis about the difference between two population means.
Hypothesis Testing for the Difference Between Two Population Means Using the Z Statistic (Case of Large Random Samples, n1, n2 > 30)
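The standard form of the two-sample z statistic for independent samples with known population standard deviations σ1 and σ2 is shown below; under H0 the hypothesized difference μ1 − μ2 is usually 0.

```latex
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
```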
LET’S DO IT!

The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women?
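A sketch of one way to answer this, reading "likelihood" as the two-tailed p-value for H0: μ1 = μ2 (men as sample 1, women as sample 2). The helper `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, diff0=0.0):
    """z statistic for H0: mu1 - mu2 = diff0 with known population SDs."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((x1 - x2) - diff0) / se

# Men (sample 1) vs women (sample 2); concentrations in ppm.
z = two_sample_z(28, 33, 14.1, 9.5, 75, 50)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-tailed p-value = {p_two_tailed:.4f}")
```

Here z is about −2.37 and the p-value is roughly 0.018, so at α = 0.05 the hypothesis of equal means would be rejected, though not at α = 0.01.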
Your Turn!
Dominos wanted to test its claim about who can eat more slices of pizza at a pizza-eating festival, males or females. For this purpose, it randomly selected 22 males and 20 females. The average number of slices eaten by males was 450, with a standard deviation of 25 (from historical data), and the average number of slices eaten by females was 550, with a standard deviation of 20. On the basis of the samples taken for the study, estimate the difference in population means, taking 5% as the level of significance, and help Dominos check its claim that females eat more pizza slices than males.
Hypothesis Testing for the Difference Between Two Population Means Using the t Statistic (Case of Small Random Samples, n1, n2 < 30, When the Population Standard Deviations Are Unknown)

When the sample sizes are small (n1, n2 < 30), the samples are independent (not related), and the population standard deviations are unknown, the t statistic can be used to test the hypothesis about the difference between two population means.
Hypothesis Testing for the Difference Between Two Population Means Using the t Statistic (Case of Small Random Samples, n1, n2 < 30, When the Population Standard Deviations Are Unknown)
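A common textbook form of this test, assuming the two population variances are equal, pools the sample variances s1² and s2² and uses n1 + n2 − 2 degrees of freedom:

```latex
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
\text{df} = n_1 + n_2 - 2
```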
LET’S DO IT!
Anmol Constructions is a leading company in the construction sector in India. It wants to construct flats in Raipur and Dehradun, the capitals of the newly formed states of Chhattisgarh and Uttarakhand, respectively. The company wants to estimate the amount that customers are willing to spend on purchasing a flat in the two cities. It randomly selected 25 potential customers from Raipur and 27 from Dehradun and posed the question, "How much are you willing to spend on a flat?" The mean for Dehradun was 143.44 (variance = 203.64) and for Raipur it was 162.8 (variance = 273.08). On the basis of the samples taken for the study, estimate the difference in population means at the 95% confidence level.
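A sketch of the calculation with Dehradun as sample 1 and Raipur as sample 2, assuming equal population variances so the pooled form applies. The helper `pooled_t` is hypothetical, and t(0.025, 50) = 2.009 is taken from a standard t table.

```python
from math import sqrt

def pooled_t(x1, var1, n1, x2, var2, n2):
    """Pooled-variance t statistic for H0: mu1 = mu2 (equal variances assumed)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))              # standard error of x1 - x2
    return (x1 - x2) / se, se, df

t, se, df = pooled_t(143.44, 203.64, 27, 162.8, 273.08, 25)

t_crit = 2.009                 # t(0.025, 50) from a t table
diff = 143.44 - 162.8          # point estimate of mu1 - mu2
ci = (diff - t_crit * se, diff + t_crit * se)
print(f"t = {t:.2f} (df = {df}); 95% CI for mu1 - mu2: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval comes out at roughly (−27.94, −10.78); because it excludes 0, the data suggest customers in Raipur are willing to spend more than those in Dehradun, under the equal-variance assumption.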
CHI SQUARE TEST

• Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis.

• There are three types of chi-square tests:
– Goodness of fit
– Test of association or independence (measuring association between categorical data)
– Test of homogeneity
Chi Square Test of Association
The chi-square test of association was derived mathematically by Karl Pearson and is often known as Pearson's chi-square test of association.

Degrees of freedom = (number of rows − 1) × (number of columns − 1)
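The test of association computes an expected count for each cell from its row and column totals and sums (O − E)²/E over all cells. A minimal sketch; the helper name and the 2 × 2 table in the usage line are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence([[10, 20], [20, 10]])  # hypothetical counts
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The statistic is then compared with the chi-square table value for the computed degrees of freedom at the chosen α.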
Chi Square Table Value Case
QUESTION FOR PRACTICE
A company is concerned about increasing violent altercations between its employees. The numbers of violent incidents recorded by management during six randomly selected months are given below. Use α = 5% to determine whether the data fit a uniform distribution.

Month | No. of incidents
Jan | 55
Feb | 65
Mar | 68
Apr | 72
May | 78
June | 82
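A sketch of the goodness-of-fit calculation. Under H0 (a uniform distribution) every month has the same expected count, and for a goodness-of-fit test the degrees of freedom are k − 1 (number of categories minus one) rather than the rows-by-columns formula used for association. The critical value χ²(0.05, 5) = 11.070 is taken from a standard chi-square table.

```python
observed = {"Jan": 55, "Feb": 65, "Mar": 68, "Apr": 72, "May": 78, "June": 82}

# Uniform H0: each month expects the same share of the total (420 / 6 = 70).
expected = sum(observed.values()) / len(observed)
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1  # k - 1 = 5

chi2_crit = 11.070      # chi-square(0.05, 5) from a table
print(f"chi-square = {chi2:.3f}, df = {df}:",
      "reject H0" if chi2 > chi2_crit else "accept H0")
```

The computed value is about 6.657, below 11.070, so on these data the hypothesis of a uniform distribution would not be rejected at the 5% level.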
