Class 03 04 Confidence Interval, Hypothesis Testing
Confidence interval,
Hypothesis testing
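Sampling Distribution of the Means
• Central Limit Theorem: if $\bar{X}$ is the mean of a random sample of size n taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of
  $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
  as $n \to \infty$ is the standard normal distribution $N(0, 1)$.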
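A short simulation can make the theorem concrete. The sketch below uses assumptions of our own choosing, not values from the slides: a Uniform(0, 1) population, samples of size n = 30, 20,000 repetitions and a fixed seed. It checks that the sample means have a standard deviation close to $\sigma/\sqrt{n}$ and that roughly 95% of them fall within $\pm 1.96\,\sigma/\sqrt{n}$ of $\mu$.

```python
import math
import random
import statistics

def simulate_sample_means(n: int, reps: int, seed: int = 42) -> list[float]:
    """Draw `reps` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

n, reps = 30, 20_000                  # illustrative choices, not from the source
mu, sigma = 0.5, math.sqrt(1 / 12)    # mean and standard deviation of Uniform(0, 1)
se = sigma / math.sqrt(n)             # standard deviation of the mean predicted by the CLT

means = simulate_sample_means(n, reps)
observed_sd = statistics.stdev(means)
share_within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print(f"predicted sigma/sqrt(n) = {se:.4f}, observed SD of means = {observed_sd:.4f}")
print(f"share of means within mu +/- 1.96*sigma/sqrt(n) = {share_within:.3f}")
```

The uniform population is deliberately non-normal; any population with finite variance could be swapped in.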
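Theorem 2:
• Let $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ be the means of k samples, each of size n, drawn from an independent and identically distributed population with mean $\mu$ and standard deviation $\sigma$. From the central limit theorem we know that the sample means $\bar{X}_i$ follow (approximately, for large n) a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The variable $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ therefore follows a standard normal distribution, so
  $P\left(\bar{X} - |Z_{\alpha/2}|\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + |Z_{\alpha/2}|\,\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
• That is, the probability that the population mean takes a value between $\bar{X} - |Z_{\alpha/2}|\,\sigma/\sqrt{n}$ and $\bar{X} + |Z_{\alpha/2}|\,\sigma/\sqrt{n}$ is $1 - \alpha$.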
Confidence Interval for Population Mean when Population Standard Deviation is Known
• The absolute values of $Z_{\alpha/2}$ for various values of $\alpha$ are shown below:

  $\alpha$    $|Z_{\alpha/2}|$
  0.10        1.64
  0.05        1.96
  0.02        2.33
  0.01        2.58
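The table can be reproduced with Python's standard library, which is one way to get $|Z_{\alpha/2}|$ for an $\alpha$ not listed. This is a minimal sketch; `z_critical` is a hypothetical helper name. It takes the standard normal value whose upper-tail area is $\alpha/2$.

```python
from statistics import NormalDist

def z_critical(alpha: float) -> float:
    """|Z_{alpha/2}|: the standard normal value with upper-tail area alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.02, 0.01):
    print(f"alpha = {alpha:.2f}  |Z_alpha/2| = {z_critical(alpha):.4f}")
```

This prints 1.6449, 1.9600, 2.3263 and 2.5758; the table above rounds them to two decimals.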
(a) Calculate the 95% confidence interval for the population mean.
(b) What is the probability that the population mean is greater than 4.73
days?
(a) 95% confidence interval for the population mean: we know that $\bar{X} = 4.5$ and $\sigma = 1.2$, and thus the interval is $4.5 \pm 1.96 \times 1.2/\sqrt{n}$.
(b) Note that 4.73 is the upper limit of the 95% confidence interval from part (a); thus the probability that the population mean is greater than 4.73 is approximately 0.025.
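Both parts can be sketched in a few lines. The sample size n is not stated in this excerpt, so it is a parameter, and the usage line plugs in an assumed n = 100 purely for illustration. Part (b) mirrors the slide's reasoning by treating the mean as $N(\bar{X}, \sigma/\sqrt{n})$; under that reading the probability beyond the upper 95% limit is $\alpha/2 = 0.025$ whatever n is. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar: float, sigma: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """100(1 - alpha)% CI for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # |Z_{alpha/2}|, about 1.96 for alpha = 0.05
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def prob_mean_above(value: float, xbar: float, sigma: float, n: int) -> float:
    """P(mean > value), treating the mean as N(xbar, sigma / sqrt(n)) as in part (b)."""
    return 1 - NormalDist(mu=xbar, sigma=sigma / math.sqrt(n)).cdf(value)

# Hypothetical usage: xbar = 4.5 and sigma = 1.2 come from the example,
# but n = 100 is an assumed sample size (n is not given in this excerpt).
low, high = z_confidence_interval(4.5, 1.2, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"P(mean > upper limit) = {prob_mean_above(high, 4.5, 1.2, 100):.3f}")
```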
• William Gosset (Student, 1908) showed that if the population follows a normal distribution and the standard deviation is estimated from the sample, then the statistic
  $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
  follows a t-distribution with $(n - 1)$ degrees of freedom.
• Here S is the standard deviation estimated from the sample, and $S/\sqrt{n}$ is the estimated standard error of the mean. The t-distribution is very similar to the standard normal distribution: it is bell-shaped, and its mean, median, and mode are all equal to zero, as in the case of the standard normal distribution. The major difference is that the t-distribution has heavier tails than the standard normal distribution. However, as the degrees of freedom increase, the t-distribution converges to the standard normal distribution.
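• The $100(1 - \alpha)\%$ confidence interval for the population mean is therefore
  $\bar{X} - |t_{\alpha/2,n-1}|\,\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + |t_{\alpha/2,n-1}|\,\dfrac{S}{\sqrt{n}}$
• In the above equation, $t_{\alpha/2,n-1}$ is the value of t under the t-distribution with $(n - 1)$ degrees of freedom for which the cumulative probability $F(t) = \alpha/2$; its absolute value is the t-value with upper-tail probability $\alpha/2$.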
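Python's standard library has no t-distribution, so the sketch below computes $|t_{\alpha/2,n-1}|$ numerically: it integrates the t density with Simpson's rule and finds the quantile by bisection. In practice `scipy.stats.t.ppf` does this directly; the helpers here (`t_pdf`, `t_cdf`, `t_critical`) are hypothetical names for illustration. The loop also shows the critical value shrinking toward the normal 1.96 as the degrees of freedom grow.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """F(x) = 0.5 +/- integral of the density from 0 to |x| (Simpson's rule, even steps)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha: float, df: int) -> float:
    """|t_{alpha/2, df}|: positive t with upper-tail probability alpha/2, by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid          # F(mid) too small: the quantile lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for df in (5, 30, 69, 1000):
    print(f"df = {df:4d}: |t| = {t_critical(0.05, df):.3f}   (normal: {z:.3f})")
```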
• An online grocery store is interested in estimating the basket size (number of items ordered by a customer) of its customers so that it can optimize the size of the crates used for delivering grocery items. From a sample of 70 customers, the average basket size was estimated as 24 and the standard deviation estimated from the sample was 3.8. Calculate the 95% confidence interval for the mean basket size of a customer order.
Solution
We know that $\bar{X} = 24$, n = 70, S = 3.8, and $t_{0.025,69} = 1.995$.
Thus the 95% confidence interval for the mean basket size is
$24 \pm 1.995 \times \dfrac{3.8}{\sqrt{70}} = 24 \pm 0.91$, that is, (23.09, 24.91).
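The arithmetic of the solution can be checked directly. This sketch uses only the numbers given in the example, including the tabulated $t_{0.025,69} = 1.995$.

```python
import math

xbar, n, s = 24, 70, 3.8   # sample mean, sample size, sample standard deviation
t_crit = 1.995             # t_{0.025, 69}, as given in the solution

margin = t_crit * s / math.sqrt(n)   # |t_{alpha/2, n-1}| * S / sqrt(n)
low, high = xbar - margin, xbar + margin
print(f"95% CI for mean basket size: ({low:.2f}, {high:.2f})")
```

This prints `95% CI for mean basket size: (23.09, 24.91)`, matching the solution.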
10/16/2022 @TKMISHRA ML NITRKL 25
HYPOTHESIS TESTING
INTRODUCTION TO HYPOTHESIS TESTING
3. Identify the test statistic to be used for testing the validity of the null hypothesis. The test statistic allows us to calculate the evidence in support of the null hypothesis. The test statistic depends on the sampling distribution of the estimate being tested; for example, if the test is for a mean value, the mean is calculated from a large sample, and the population standard deviation is known, then the sampling distribution will be a normal distribution and the test statistic will be a Z-statistic (standard normal statistic).
HYPOTHESIS TESTING STEPS
4. Decide the criteria for rejection and retention of the null hypothesis. This is called the significance level, traditionally denoted by the symbol $\alpha$. The value of $\alpha$ depends on the context; values of 0.1, 0.05, and 0.01 are commonly used.
6. Take the decision to reject or retain the null hypothesis based on the p-value and the significance level $\alpha$. The null hypothesis is rejected when the p-value is less than $\alpha$ and retained when the p-value is greater than or equal to $\alpha$.
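Steps 3 to 6 can be sketched for the case described in step 3: a test on a mean with known population standard deviation, where the test statistic is a Z-statistic. The function name `one_sample_z_test` and the numbers in the usage line are hypothetical, chosen only to illustrate the decision rule "reject $H_0$ when p-value < $\alpha$".

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar: float, mu0: float, sigma: float, n: int,
                      alpha: float = 0.05, tail: str = "right") -> tuple[float, float, bool]:
    """One-sample Z-test with known population SD.

    tail: "right" (HA: mu > mu0), "left" (HA: mu < mu0) or "two" (HA: mu != mu0).
    Returns (z, p_value, reject_h0).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))   # step 3: test statistic
    std_normal = NormalDist()
    if tail == "right":
        p = 1 - std_normal.cdf(z)               # area to the right of z
    elif tail == "left":
        p = std_normal.cdf(z)                   # area to the left of z
    elif tail == "two":
        p = 2 * (1 - std_normal.cdf(abs(z)))    # both tails
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, p, p < alpha                      # step 6: reject iff p-value < alpha

# Hypothetical numbers, not from the slides: xbar = 105, mu0 = 100, sigma = 10, n = 16.
z, p, reject = one_sample_z_test(105, 100, 10, 16, alpha=0.05, tail="right")
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```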
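2. On average, people with a Ph.D. in analytics earn more than people with a Ph.D. in engineering.
   H0: $\mu_a \le \mu_e$
   HA: $\mu_a > \mu_e$
   where $\mu_a$ and $\mu_e$ are the average salaries of people with a Ph.D. in analytics and in engineering, respectively.
It is essential to have the equal sign in the null hypothesis statement.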
• Note that the value 2 is actually a value under the standard normal distribution, since it is calculated from $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
H0: $\mu_m \le 100{,}000$
HA: $\mu_m > 100{,}000$
where $\mu_m$ and $\mu_f$ are the average salaries of male and female MBA students, respectively, at the time of graduation.
In this case, the rejection region will be on both sides of the distribution: if the significance level is $\alpha$, the rejection region will be $\alpha/2$ on each side of the distribution. Since the rejection region lies on both sides of the distribution, it is a two-tailed test.
Solution
Z-statistic = $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
• The critical value in this case will depend on the significance level $\alpha$ and on whether the test is one-tailed or two-tailed:

  $\alpha$    Left-tailed test    Right-tailed test    Two-tailed test
  0.10        -1.28               1.28                 -1.64 and 1.64
  0.05        -1.64               1.64                 -1.96 and 1.96
  0.01        -2.33               2.33                 -2.58 and 2.58
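The critical-value table can be regenerated from the inverse standard normal CDF: a left-tailed test puts all of $\alpha$ in the lower tail, a right-tailed test puts it in the upper tail, and a two-tailed test splits it as $\alpha/2$ per side. A minimal sketch; `z_critical_values` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical_values(alpha: float) -> dict[str, tuple[float, ...]]:
    """Z-critical values for left-, right- and two-tailed tests at level alpha."""
    inv = NormalDist().inv_cdf
    return {
        "left": (inv(alpha),),                         # reject if Z < this value
        "right": (inv(1 - alpha),),                    # reject if Z > this value
        "two": (inv(alpha / 2), inv(1 - alpha / 2)),   # reject outside this pair
    }

for alpha in (0.10, 0.05, 0.01):
    crit = z_critical_values(alpha)
    print(alpha, {k: tuple(round(v, 2) for v in vals) for k, vals in crit.items()})
```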
16 16 30 37 25 22 19 35 27 32
34 28 24 35 24 21 32 29 24 35
28 29 18 31 28 33 32 24 25 22
21 27 41 23 23 16 24 38 26 28
• Since the Z-statistic value is 1.8132 and falls in the right tail, we first calculate the area under the standard normal distribution beyond 1.8132, which is approximately 0.0349.
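The right-tail area can be computed with the standard library; for a right-tailed test this area is the p-value. The significance level is not stated in this excerpt, so $\alpha = 0.05$ below is an assumption used only to show the decision step.

```python
from statistics import NormalDist

z = 1.8132                          # Z-statistic from the example
p_right = 1 - NormalDist().cdf(z)   # area to the right of z = right-tailed p-value
alpha = 0.05                        # assumed significance level (not stated in this excerpt)

print(f"right-tailed p-value = {p_right:.4f}")
print("reject H0" if p_right < alpha else "retain H0")
```

This should print a p-value of about 0.0349; for a two-tailed test the p-value would be twice this area.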
601 627 330 364 562 353 583 254 528 470
408 601 593 729 402 530 708 599 439 762
292 636 444 286 636 667 252 335 457 632
Specialization    Sample Size    Estimated Mean Salary (in Rupees)    Population Standard Deviation
Since the Z-statistic value is higher than the Z-critical value, we reject the null hypothesis.
Group                        Sample Size    Increase in Height (in cm) during the …    Standard Deviation
Do not drink health drink    80             6.3 cm                                     1.3 cm
Pooled variance is $S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
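The pooled variance combines the two sample variances, weighting each by its degrees of freedom. It is the variance estimate used in the equal-variance two-sample t-test, whose statistic $t = (\bar{X}_1 - \bar{X}_2)/\sqrt{S_p^2(1/n_1 + 1/n_2)}$ has $n_1 + n_2 - 2$ degrees of freedom. This is a minimal sketch with hypothetical function names; the second group's figures for the health-drink example are not in this excerpt, so the tests use made-up inputs.

```python
import math

def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """S_p^2 = ((n1 - 1) S1^2 + (n2 - 1) S2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_statistic(x1: float, s1: float, n1: int,
                       x2: float, s2: float, n2: int) -> float:
    """Two-sample t-statistic assuming equal population variances (df = n1 + n2 - 2)."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
```

For example, two hypothetical groups of size 11 with standard deviations 2 and 3 give $S_p^2 = (10 \cdot 4 + 10 \cdot 9)/20 = 6.5$.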
Couples with no Degree    120    10.1 years    2.4 years
Couples with Degree       100    9.5 years     3.1 years