Credit Sessions5 & 6


SESSIONS-5 & 6

CONFIDENCE INTERVAL
ESTIMATION
CONFIDENCE INTERVAL ESTIMATION

• Estimators and Their Properties
• Confidence Interval for the Population Mean µ When the Population Standard Deviation σ Is Known
• Confidence Intervals for µ When σ Is Unknown: The t Distribution
• Large-Sample Confidence Intervals for the Population Proportion P
• Sample Size Determination
LEARNING OBJECTIVES
After studying this module you should be able to:
• Explain why sample statistics are good estimators of population parameters
• Judge one estimator as better than another based on desirable properties of estimators
• Explain confidence intervals
• Compute confidence intervals for population means
• Compute confidence intervals for population proportions
• Compute minimum sample sizes needed for an estimation
Fuel Usage of “Ultra-Green” Cars
• A car manufacturer advertises that its new “ultra-green” car obtains an average of 100 mpg and, based on its fuel emissions, has earned an A+ rating from the Environmental Protection Agency.
• Pinnacle Research, an independent consumer advocacy firm, obtains a sample of 25 cars for testing purposes.
• Each car is driven the same distance in identical conditions in order to obtain the car’s mpg.
Fuel Usage of “Ultra-Green” Cars
The mpg for each “Ultra-Green” car is given below.

The data in this sample are needed to:
• Estimate with 90% confidence:
  - The mean mpg of all ultra-green cars.
  - The proportion of all ultra-green cars that obtain over 100 mpg.
• Determine the sample size needed to achieve a specified level of precision in the mean and proportion estimates.
Estimation…
Objective of estimation is to determine the
approximate value of a population parameter on the
basis of a sample statistic.

Two types of estimators:

 Point Estimator

 Interval Estimator
Confidence Interval
• Consider the following statements:
• $\bar{x} = 550$
  A single-valued estimate that conveys little information about the actual value of the population mean.
• We are 99% confident that μ is in the interval [449, 551]
  An interval estimate which locates the population mean within a narrow interval, with a high level of confidence.
• We are 90% confident that μ is in the interval [400, 700]
  An interval estimate which locates the population mean within a broader interval, with a lower level of confidence.
Point and Interval Estimates
• A point estimate is a single-valued estimate, a single element chosen from a sampling distribution.
  - It does not reflect the effects of larger sample sizes.
  - It conveys little information about the actual value of the population parameter or about the accuracy of the estimate.
• A confidence interval provides additional information about the variability of the estimate.
  - It expresses the amount of uncertainty associated with a point estimate of a population parameter.

[Figure: a line running from the lower confidence limit to the upper confidence limit, with the point estimate between them; the distance between the two limits is the width of the confidence interval.]
Confidence Interval
 A confidence interval or interval estimate is a range or
interval of numbers believed to include an unknown
population parameter.
 Associated with the interval is a measure of the
confidence we have that the interval does indeed
contain the parameter of interest.
A confidence interval or interval estimate has two
components:
A range or interval of values
An associated level of confidence
 Confidence interval is constructed as:
Point estimate ± Margin of error.
 Margin of error accounts for the variability of the
estimator and the desired confidence level of the interval
Confidence Interval Example
• In practice we take only one sample of size n.
• In practice we do not know µ, so we do not know whether the interval actually contains µ.
• However, we do know that 95% of the intervals formed in this manner will contain µ.
• Thus, based on the one sample we actually selected, we can be 95% confident that our interval contains µ (this is a 95% confidence interval).

Note: the 95% confidence level corresponds to the critical value $Z_{\alpha/2} = 1.96$.
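The long-run claim above (95% of intervals formed this way contain µ) can be checked by simulation. The following is a minimal sketch, not part of the original slides: the function name `coverage_rate`, the population values (µ = 100, σ = 15), and the number of trials are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(mu, sigma, n, conf=0.95, trials=5000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu.

    All argument values used below are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    # Critical value z_{alpha/2}, about 1.96 when conf = 0.95
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from a normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Does the interval x_bar +/- margin contain the true mean?
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage_rate(mu=100.0, sigma=15.0, n=25))
```

The printed fraction should land close to 0.95. Any single interval in the loop either contains µ or does not; the 95% describes the procedure across many samples.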


Types of Estimators
Confidence Interval or Interval Estimate
An interval or range of values believed to include the
unknown population parameter.
Associated with the interval is a measure of the
confidence we have that the interval does indeed
contain the parameter of interest.
Takes into consideration variation in sample statistics
from sample to sample
Based on observations from 1 sample
Gives information about closeness to unknown
population parameters
 Stated in terms of level of confidence
 e.g. 95% confident, 99% confident
 Can never be 100% confident
Confidence Level, (1 − α)
• The probability 1 − α is the confidence level, which is a measure of how frequently the interval will actually include µ.
• Suppose the confidence level is 95%.
  - Also written as (1 − α) = 0.95 (so α = 0.05).
• A relative frequency interpretation:
  - 95% of all the confidence intervals that can be constructed will contain the unknown true parameter.
• A specific interval either will contain or will not contain the true parameter.
Confidence Intervals
• Population Mean
  - σ Known
  - σ Unknown
• Population Proportion
Confidence Interval for Population Mean µ When Population S.D. σ Is Known
• If the population distribution is normal, the sampling distribution of the mean is normal.
• If the sample is sufficiently large, then regardless of the shape of the population distribution, the sampling distribution of the mean is approximately normal (Central Limit Theorem).

$\bar{X} \sim N(\mu,\ \sigma^{2}/n)$

$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
A (1 − α)100% Confidence Interval for µ
$z_{\alpha/2}$ is defined as the z value that cuts off a right-tail area of α/2 under the standard normal curve. (1 − α) is called the confidence coefficient, α is called the error probability, and (1 − α)100% is called the confidence level.

$P(Z > z_{\alpha/2}) = \alpha/2$
$P(Z < -z_{\alpha/2}) = \alpha/2$
$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$

(1 − α)100% confidence interval:

$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

[Figure: standard normal density f(z) with central area (1 − α) between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area α/2 in each tail.]
Confidence Interval for μ (σ Known)
• Assumptions
  - Population standard deviation σ is known
  - Population is normally distributed
  - If population is not normal, use large sample (CLT)
• Confidence interval estimate:

$\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

where $\bar{X}$ is the point estimate, $Z_{\alpha/2}$ is the normal distribution critical value for a probability of α/2 in each tail, and $\sigma/\sqrt{n}$ is the standard error.
General Formula
The general formula for all confidence intervals is:

Point Estimate ± (Critical Value)(Standard Error)

• Point Estimate is the sample statistic estimating the population parameter of interest
• Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level
• Standard Error is the standard deviation of the point estimate
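Finding the Critical Value, $Z_{\alpha/2}$
• Consider a 95% confidence interval: 1 − α = 0.95, so α = 0.05 and α/2 = 0.025 in each tail.
• $Z_{\alpha/2} = \pm 1.96$

[Figure: normal curve with area α/2 = 0.025 in each tail. In Z units the axis is marked −1.96, 0, and 1.96; in X units the corresponding points are the lower confidence limit, the point estimate, and the upper confidence limit.]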
Confidence Interval for μ (σ Known)
• Constructing a Confidence Interval for μ When σ Is Known
• Consider a standard normal random variable Z:

$P(-1.96 < Z < 1.96) = 0.95$
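The step from this probability statement to the interval formula is a short rearrangement. The sketch below substitutes $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$, as defined earlier in these slides, and isolates µ; it is a worked derivation added for clarity, not a slide from the original deck.

```latex
\begin{align*}
0.95 &= P(-1.96 < Z < 1.96) \\
     &= P\!\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) \\
     &= P\!\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\!\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)
\end{align*}
```

The random quantity in the last line is $\bar{X}$, not µ: the interval endpoints vary from sample to sample while µ stays fixed.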
95% Intervals around Sample Mean
• Approximately 95% of the intervals $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ around the sample mean can be expected to include the actual value of the population mean, µ. (When the sample mean falls within the 95% interval around the population mean.)
• 5% of such intervals around the sample mean can be expected not to include the actual value of the population mean. (When the sample mean falls outside the 95% interval around the population mean.)

[Figure: sampling distribution of the mean, f($\bar{x}$), with 95% of the area between $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$ and 2.5% in each tail. Below it, intervals drawn around individual sample means; those marked * do not cover µ.]
Critical Values of z and Levels of Confidence

(1 − α)    α/2      z_{α/2}
0.99       0.005    2.576
0.98       0.010    2.326
0.95       0.025    1.960
0.90       0.050    1.645
0.80       0.100    1.282

[Figure: standard normal density f(z) with central area (1 − α) between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area α/2 in each tail.]
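The table values can be reproduced from the inverse of the standard normal CDF. This is a small sketch using Python's standard library; the helper name `z_critical` is an illustrative assumption, not something defined in the slides.

```python
from statistics import NormalDist


def z_critical(conf):
    """Return z_{alpha/2}: the z value with right-tail area alpha/2."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    # Print confidence level, alpha/2, and the critical value
    for conf in (0.99, 0.98, 0.95, 0.90, 0.80):
        print(f"{conf:.2f}  {(1 - conf) / 2:.3f}  {z_critical(conf):.3f}")
```

Rounded to three decimals, the results should agree with the tabulated critical values.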
Confidence Interval of the Population Mean When σ Is Known
• Interpreting a Confidence Interval
  - Interpreting a confidence interval requires care.
  - Incorrect: The probability that µ falls in the interval is 0.95.
  - Correct: If numerous samples of size n are drawn from a given population, then 95% of the intervals formed by the formula $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ will contain µ.
  - Since there are many possible samples, the estimate will be right 95% of the time, thus giving us 95% confidence.
Example …
The Doll Computer Company makes its own
computers and delivers them directly to customers who
order them via the Internet.
To achieve its objective of speed, Doll makes each of
its five most popular computers and transports them to
warehouses from which it generally takes 1 day to
deliver a computer to the customer.
This strategy requires high levels of inventory that add
considerably to the cost.
To lower these costs the operations manager wants to
use an inventory model. He notes demand during lead
time is normally distributed and he needs to know the
mean to compute the optimum inventory level.
Example …
He observes 25 lead time periods and records the
demand during each period.
The manager would like a 95% confidence interval
estimate of the mean demand during lead time.
Assume that the manager knows that the standard
deviation is 75 computers.

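Example …
We need to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.
The parameter to be estimated is the population mean µ.
The confidence interval estimator is:

$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \times \dfrac{75}{\sqrt{25}} = 370.16 \pm 29.40$

where $\bar{x} = 370.16$ is calculated from the data, $z_{\alpha/2} = 1.96$, and σ = 75 and n = 25 are given.

The lower and upper confidence limits are 340.76 and 399.56.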
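The same arithmetic can be written as a small reusable function. This is a sketch under the slide's assumptions (σ known, normal sampling distribution); the name `z_interval` is hypothetical, and the inputs are the summary figures quoted above.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, conf=0.95):
    """Confidence interval for the mean when sigma is known.

    Returns (lower, upper) = x_bar -/+ z_{alpha/2} * sigma / sqrt(n).
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


if __name__ == "__main__":
    # Demand during lead time: x_bar = 370.16, sigma = 75, n = 25
    lower, upper = z_interval(370.16, 75, 25, conf=0.95)
    print(f"{lower:.2f} to {upper:.2f}")
```

The function uses the unrounded critical value (about 1.95996) rather than 1.96, so results can differ from hand calculations in the third decimal place.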


Example … Interpretation
• The estimate of the mean demand during lead time lies between 340.76 and 399.56; this interval can be used as input in developing an inventory policy.
• An estimator of this type is correct 95% of the time, which means that 5% of the time it will be incorrect.
• Incidentally, the media often refer to the 95% figure as “19 times out of 20,” which emphasizes the long-run aspect of the confidence level.
Confidence Interval of the Population Mean When σ Is Known - Example 1
• A sample of 25 cereal boxes yields a mean weight of 1.02 kg of cereal per box.
• Construct a 95% confidence interval of the mean weight of all cereal boxes.
• Assume that the weight is normally distributed with a population standard deviation of 0.03 kg.

Given n = 25, $\bar{x} = 1.02$ kg, α = (1 − 0.95) = 0.05, $z_{\alpha/2} = 1.96$, σ = 0.03.
• Substituting these values, we get

$\bar{x} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}} = 1.02 \pm 1.96 \times \dfrac{0.03}{\sqrt{25}} = 1.02 \pm 0.012$

• or, with 95% confidence, the mean weight of all cereal boxes falls between 1.008 and 1.032 kg.
Confidence Interval of the Population Mean When σ Is Known
• The Width of a Confidence Interval
  - Margin of error: $z_{\alpha/2}\,\sigma/\sqrt{n}$
  - Confidence interval width: $2\,z_{\alpha/2}\,\sigma/\sqrt{n}$
• The width of the confidence interval is influenced by the:
  - Sample size n.
  - Standard deviation σ.
  - Confidence level (1 − α)100%.
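To see how each of the three factors moves the width, the formula 2 z σ/√n can be evaluated for a few settings. This is an illustrative sketch; the function name `ci_width` and the particular values tried (based on the cereal-box figures, σ = 0.03 and n = 25) are assumptions made for the demonstration.

```python
import math
from statistics import NormalDist


def ci_width(sigma, n, conf=0.95):
    """Width of the z-interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / math.sqrt(n)


if __name__ == "__main__":
    base = ci_width(sigma=0.03, n=25, conf=0.95)
    print("baseline width:         ", round(base, 4))
    # Quadrupling n halves the width
    print("n = 100:                ", round(ci_width(0.03, 100, 0.95), 4))
    # Doubling sigma doubles the width
    print("sigma = 0.06:           ", round(ci_width(0.06, 25, 0.95), 4))
    # A higher confidence level widens the interval
    print("99% confidence:         ", round(ci_width(0.03, 25, 0.99), 4))
```

Because n enters through a square root, halving the width requires four times the sample size, which is why tight margins of error become expensive.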
EXAMPLE 2
• A machine produces components whose length has a standard deviation of 1.6 cm.
• A random sample of 64 parts is selected from the output, and this sample has a mean length of 90 cm.
• The customer will reject the part if it is either less than 88 cm or more than 92 cm.
• Does the 95% confidence interval for the true mean length of all the components produced ensure acceptance by the customer?
SOLUTION TO EXAMPLE 2
• The confidence interval is
  $\bar{X} - Z_{\alpha/2}\,(\sigma/\sqrt{n}) \le \mu \le \bar{X} + Z_{\alpha/2}\,(\sigma/\sqrt{n})$
• For a 95% confidence level, the Z value is 1.96 (from the standard normal distribution), where $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$.

[Figure: standard normal curve with central area 0.95 (0.475 on each side of 0) between −1.96 and 1.96, and area 0.025 in each tail.]

• 95% confidence interval for the population mean:
  $\bar{X} - (1.96)(\sigma/\sqrt{n}) \le \mu \le \bar{X} + (1.96)(\sigma/\sqrt{n})$
• Given $\bar{X} = 90$, $\sigma/\sqrt{n} = 1.6/\sqrt{64} = 1.6/8 = 0.2$
• Therefore, the 95% confidence interval is
  90 − (1.96)(0.2) ≤ μ ≤ 90 + (1.96)(0.2)
  89.61 ≤ μ ≤ 90.39
• With 95% confidence, the true value of the population mean length of components falls in the interval 89.61 ≤ μ ≤ 90.39.
• The entire interval lies within the customer's limits of 88 cm to 92 cm. Hence, the 95% confidence interval for the mean length is consistent with acceptance by the customer.
Confidence Interval - Concepts
• Problem 1:
  Suppose you are sampling from a population with population variance 1,000,000. You want the standard deviation of the sample mean to be at most 25. What is the minimum sample size you should use?
• Problem 2:
  For a fixed sample size, what is the value of the true population proportion P that maximizes the variance of the sample proportion p?
• Problem 3:
  The width of a 95% confidence interval for the population mean µ is 10 units. If everything else stays the same, how wide would a 90% confidence interval be for µ?
• Problem 4:
  Suppose you have a confidence interval based on a sample of size n. Using the same level of confidence, how
Do You Ever Truly Know σ?
• Probably not!
• In virtually all real-world business situations, σ is not known.
• If there is a situation where σ is known, then µ is also known (since to calculate σ you need to know µ).
• If you truly know µ, there would be no need to gather a sample to estimate it.
Confidence Interval for μ
(σ Unknown)
If the population standard deviation
σ is unknown, we can substitute the
sample standard deviation, S
This introduces extra uncertainty,
since S is variable from sample to
sample
So we use the t distribution instead
of the normal distribution
Degrees of Freedom (df)
Idea: the number of observations that are free to vary after the sample mean has been calculated.

Example: Suppose the mean of 3 numbers is 8.0.
• Let X1 = 7
• Let X2 = 8
• What is X3?

If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary).

Here, n = 3, so degrees of freedom = n − 1 = 3 − 1 = 2.
(2 values can be any numbers, but the third is not free to vary for a given mean.)
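The constraint in this example is just the definition of the mean solved for the last value: X3 = n × mean − (X1 + X2). A tiny sketch, with the hypothetical helper name `last_value`, makes the point that once n − 1 values and the mean are fixed, nothing is left to choose.

```python
def last_value(known_values, mean, n):
    """Value forced on the one remaining observation by the sample mean.

    known_values must hold exactly n - 1 observations.
    """
    if len(known_values) != n - 1:
        raise ValueError("need exactly n - 1 known values")
    # The n observations must sum to n * mean
    return n * mean - sum(known_values)


if __name__ == "__main__":
    # X1 = 7, X2 = 8, mean of three values = 8.0
    print(last_value([7, 8], mean=8.0, n=3))
```

Changing X1 or X2 changes the forced value of X3, but X3 itself never has any freedom, which is why only n − 1 = 2 degrees of freedom remain.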
Confidence Interval of the Population Mean µ When σ Is Unknown
• The t Distribution
  - If repeated samples of size n are taken from a normal population with a finite variance, then the statistic
    $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
    follows the t distribution with (n − 1) degrees of freedom.
• Degrees of freedom
  - The number of independent variates that make up the statistic.
  - They determine the broadness of the tails of the distribution; the fewer the degrees of freedom, the broader the tails.
Student’s t Distribution
If the population standard deviation, σ, is unknown, replace σ with the sample standard deviation, s. If the population is normal, the resulting statistic

$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$

has a t distribution with (n − 1) degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• The variance of t is greater than 1, but approaches 1 as the number of degrees of freedom increases.
• The t is flatter and has fatter tails than does the standard normal.

[Figure: density curves for the standard normal, t with df = 20, and t with df = 10.]
Student’s t Distribution
• Because the value of σ is unknown and S is used to estimate it, the values of t that are observed will be more variable than for Z.
• As the number of degrees of freedom increases, the t distribution gradually approaches the standard normal distribution until the two are virtually identical (since S becomes a better estimate of σ as the sample size gets larger).
• For large n (greater than or equal to 30), the t distribution loses its flatness and is closely approximated by the normal distribution, i.e. the normal approximation to t can be used when n is greater than or equal to 30.
• For estimating a population mean from a small sample, the t distribution is the best choice.
Student’s t Distribution
Note: t → Z as n increases.

t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.

[Figure: density curves centered at 0 for the standard normal (t with df = ∞), t with df = 13, and t with df = 5.]
Student’s t Table

        Upper Tail Area
df      .10      .05      .025
1       3.078    6.314    12.706
2       1.886    2.920    4.303
3       1.638    2.353    3.182

Let n = 3, so df = n − 1 = 2. With α = 0.10, α/2 = 0.05 in each tail, and the table gives t = 2.920.

The body of the table contains t values, not probabilities.
Selected t distribution values
With comparison to the Z value

Confidence    t            t            t            Z
Level         (10 d.f.)    (20 d.f.)    (30 d.f.)    (∞ d.f.)
0.80          1.372        1.325        1.310        1.28
0.90          1.812        1.725        1.697        1.645
0.95          2.228        2.086        2.042        1.96
0.99          3.169        2.845        2.750        2.58

Note: t → Z as n increases
The t Distribution

df     t0.100   t0.050   t0.025   t0.010   t0.005
---    ------   ------   ------   ------   ------
1      3.078    6.314    12.706   31.821   63.657
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
6      1.440    1.943    2.447    3.143    3.707
7      1.415    1.895    2.365    2.998    3.499
8      1.397    1.860    2.306    2.896    3.355
9      1.383    1.833    2.262    2.821    3.250
10     1.372    1.812    2.228    2.764    3.169
11     1.363    1.796    2.201    2.718    3.106
12     1.356    1.782    2.179    2.681    3.055
13     1.350    1.771    2.160    2.650    3.012
14     1.345    1.761    2.145    2.624    2.977
15     1.341    1.753    2.131    2.602    2.947
16     1.337    1.746    2.120    2.583    2.921
17     1.333    1.740    2.110    2.567    2.898
18     1.330    1.734    2.101    2.552    2.878
19     1.328    1.729    2.093    2.539    2.861
20     1.325    1.725    2.086    2.528    2.845
21     1.323    1.721    2.080    2.518    2.831
22     1.321    1.717    2.074    2.508    2.819
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
26     1.315    1.706    2.056    2.479    2.779
27     1.314    1.703    2.052    2.473    2.771
28     1.313    1.701    2.048    2.467    2.763
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.326    2.576

[Figure: t distribution with df = 10, density f(t). The area beyond ±1.372 is 0.10 in each tail; the area beyond ±2.228 is 0.025 in each tail.]

Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n − 1 degrees of freedom. For large degrees of freedom, the t distribution is approximated well by the Z distribution.
Confidence Interval for μ (σ Unknown)
• Assumptions
  - Population standard deviation is unknown
  - Population is normally distributed
  - If population is not normal, use large sample
• Use Student’s t Distribution
• Confidence interval estimate:

$\bar{X} \pm t_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$

(where $t_{\alpha/2}$ is the critical value of the t distribution with n − 1 degrees of freedom that cuts off an area of α/2 in each tail)
Confidence Interval of the Population Mean µ When σ Is Unknown - Example
• We need to estimate the mean mpg of all ultra-green cars. Use the sample information to construct a 90% confidence interval of the population mean.
• Assume that mpg follows a normal distribution.
• Solution: Since the population SD is not known, the sample SD has to be computed from the sample.
• Sample mean: 96.52, sample SD: 10.70, sample size: 25.
• As a result, the 90% confidence interval is

$\bar{x} \pm t_{\alpha/2,\,df}\,\dfrac{s}{\sqrt{n}} = 96.52 \pm 1.711 \times \dfrac{10.70}{\sqrt{25}} = 96.52 \pm 3.66$
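The t-based interval follows the same pattern as the z-based one, with s in place of σ and a t critical value in place of z. Python's standard library has no t quantile function, so this sketch takes the critical value as an argument looked up from the t table (1.711 for df = 24 and a right-tail area of 0.05). The function name `t_interval` is an illustrative assumption.

```python
import math


def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2} for n - 1 degrees of freedom;
    it is supplied by the caller rather than computed here.
    """
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


if __name__ == "__main__":
    # Ultra-green cars: x_bar = 96.52, s = 10.70, n = 25, t_{0.05, 24} = 1.711
    lower, upper = t_interval(96.52, 10.70, 25, t_crit=1.711)
    print(f"{lower:.2f} to {upper:.2f}")
```

For these inputs the limits work out to roughly 92.86 and 100.18, so the advertised 100 mpg falls just inside the 90% interval.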
Example 3
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of $\bar{x} = 10.37\%$ and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

The critical value of t for df = (n − 1) = (15 − 1) = 14 and a right-tail area of 0.025 is $t_{0.025} = 2.145$.

df     t0.100   t0.050   t0.025   t0.010   t0.005
---    ------   ------   ------   ------   ------
1      3.078    6.314    12.706   31.821   63.657
.      .        .        .        .        .
13     1.350    1.771    2.160    2.650    3.012
14     1.345    1.761    2.145    2.624    2.977
15     1.341    1.753    2.131    2.602    2.947
.      .        .        .        .        .

The corresponding confidence interval or interval estimate is:

$\bar{x} \pm t_{0.025}\,\dfrac{s}{\sqrt{n}} = 10.37 \pm 2.145 \times \dfrac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43,\ 12.31]$
Confidence Intervals for the Population Proportion, P
• P is the proportion of successes in the population, where success is defined by a particular outcome.
• p is the point estimator of the population proportion P.
• By the central limit theorem, p can be approximated by a normal distribution for large samples (i.e., nP > 5 and n(1 − P) > 5) with mean $\mu_p = P$ and standard deviation

$\sigma_p = \sqrt{\dfrac{P(1 - P)}{n}}$

• We will estimate this with sample data:

$\sqrt{\dfrac{p(1 - p)}{n}}$
Confidence Intervals for the Population Proportion, P
(1 − α)100% confidence interval for the population proportion P:

$p \pm Z_{\alpha/2}\sqrt{\dfrac{p(1 - p)}{n}}$

where
• $Z_{\alpha/2}$ is the standard normal value for the level of confidence desired
• p is the sample proportion
• n is the sample size
Note: must have nP > 5 and n(1 − P) > 5
Confidence Interval of the Population Proportion - Example:
We need to estimate the proportion of all ultra-green cars that obtain over 100 mpg.
Use the sample information to construct a 90% confidence interval of the population proportion.
Solution: Given n = 25, p = 7/25 = 0.28.
The normality condition is met, since np = 7 > 5 and n(1 − p) = 18 > 5.
90% confidence interval of the population proportion P:

$p \pm z_{\alpha/2}\sqrt{\dfrac{p(1 - p)}{n}} = 0.28 \pm 1.645\sqrt{\dfrac{0.28\,(1 - 0.28)}{25}} = 0.28 \pm 0.148$, or (0.132, 0.428)

With 90% confidence, the percentage of cars that obtain over 100 mpg is between 13.2% and 42.8%.
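The proportion interval, including the large-sample check, can be sketched as follows. The function name `proportion_interval` and the choice to raise an error when the np > 5 and n(1 − p) > 5 condition fails are illustrative assumptions; the condition itself is the one stated in the slides, evaluated with the sample proportion because P is unknown.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    # Large-sample condition from the slides, checked with p in place of P
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation condition not met")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


if __name__ == "__main__":
    # 7 of the 25 sampled ultra-green cars obtained over 100 mpg
    lower, upper = proportion_interval(7, 25, conf=0.90)
    print(f"{lower:.3f} to {upper:.3f}")
```

With only 25 cars the interval spans almost 30 percentage points, which motivates the sample-size question taken up later in the deck.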
Large-Sample Confidence Interval for the Population Proportion, P (Example-4)
A marketing research firm wants to estimate the share that foreign companies have in the American market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.

$p \pm z_{\alpha/2}\sqrt{\dfrac{pq}{n}} = 0.34 \pm 1.96\sqrt{\dfrac{(0.34)(0.66)}{100}} = 0.34 \pm (1.96)(0.04737) = 0.34 \pm 0.0928 = [0.2472,\ 0.4328]$

The firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
Selecting a Useful Sample Size
• Precision in interval estimates is implied by a low margin of error.
• A larger n reduces the margin of error for the interval estimates.
• How large should the sample size be for a given margin of error?
Determining Sample Size
• For the Mean
• For the Proportion
Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:
• How close do we want our sample estimate to be to the unknown parameter? (What is the desired bound, E?)
• What do we want the desired confidence level (1 − α) to be so that the distance between our estimate and the parameter is less than or equal to E?
• What is our estimate of the variance (or standard deviation) of the population in question?

For example, in a (1 − α) confidence interval for µ, $\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$, the bound is $E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$.
Sample Size and Standard Error
The sample size determines the bound of a statistic, since the standard error of a statistic shrinks as the sample size increases.

[Figure: two sampling distributions of a statistic, one for sample size n and a narrower one for sample size 2n, each marked with its standard error.]

Sampling Error
• The required sample size can be found to reach a desired margin of error (E) with a specified level of confidence (1 − α).
• The margin of error is also called sampling error:
  - the amount of imprecision in the estimate of the population parameter
  - the amount added to and subtracted from the point estimate to form the confidence interval
Determining Sample Size: For the Mean
The sampling error (margin of error) E is the amount added to and subtracted from the point estimate in the interval $\bar{X} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$:

$E = Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Now solve for n to get

$n = \dfrac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$
Determining Sample Size
To determine the required sample size for the mean, we must know:
• The desired level of confidence (1 − α), which determines the critical value, $Z_{\alpha/2}$
• The acceptable sampling error, E
• The standard deviation, σ

If σ is unknown
If unknown, σ can be estimated when using the required sample size formula:
• Use a value for σ that is expected to be at least as large as the true σ
• Select a pilot sample and estimate σ with the sample standard deviation, S
Determining Sample Size - Example
• We need to construct a 90% confidence interval of the mean mpg of all ultra-green cars in the Ultra Green Car problem.
• Suppose we would like to constrain the margin of error to within 2 mpg.
• The lowest mpg in the population is 76 mpg and the highest is 118 mpg.
• How large a sample do we need to compute the 90% confidence interval of the population mean?

Since σ is unknown, it is approximated from the range: σ ≈ range/4 = (118 − 76)/4 = 10.50.

$n = \dfrac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \dfrac{(1.645)^{2}(10.50)^{2}}{(2)^{2}} = 74.58$, which is rounded up to 75.
EXAMPLE-5
• A marketing manager of a fast food restaurant in a city wishes to estimate the average yearly amount that families spend on fast food restaurants.
• He wants the estimate to be within ± Rs. 100 with a confidence level of 99%.
• It is known from an earlier pilot study that the SD of the family expenditure on fast food restaurants is Rs. 500.
• How many families must be chosen for this problem?

Solution:
• Given σ = 500, E = ± 100
• Z = 2.58 for a 99% confidence level

$n = \dfrac{Z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \dfrac{(2.58)^{2}(500)^{2}}{(100)^{2}} = 166.41$, which is rounded up to 167 families so that the margin of error does not exceed Rs. 100.
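The sample-size formula for the mean is easy to mechanize. This sketch takes the critical value as an argument so that it reproduces the table values used in the slides (1.645, 1.96, 2.58); the function name `sample_size_mean` is an illustrative assumption. The result is always rounded up, because rounding a fractional n down would leave the margin of error slightly above the target E.

```python
import math


def sample_size_mean(z, sigma, E):
    """Minimum n so that z * sigma / sqrt(n) <= E, rounded up."""
    return math.ceil((z ** 2) * (sigma ** 2) / (E ** 2))


if __name__ == "__main__":
    # Ultra-green cars: 90% confidence, sigma ~ 10.50, E = 2 mpg
    print(sample_size_mean(z=1.645, sigma=10.50, E=2))
    # Fast food survey: 99% confidence, sigma = 500, E = 100
    print(sample_size_mean(z=2.58, sigma=500, E=100))
```

Because E is squared in the denominator, halving the acceptable margin of error roughly quadruples the required sample.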
Sample-Size Determination: Example-6
A marketing research firm wants to conduct a survey to estimate the average amount spent on entertainment by each person visiting a popular resort. The people who plan the survey would like to determine the average amount spent by all people visiting the resort to within $120, with 95% confidence. From past operation of the resort, an estimate of the population standard deviation is σ = $400. What is the minimum required sample size?

$n = \dfrac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}} = \dfrac{(1.96)^{2}(400)^{2}}{120^{2}} = 42.684 \approx 43$
Determining Sample Size: For the Proportion

$E = Z_{\alpha/2}\sqrt{\dfrac{P(1 - P)}{n}}$

Now solve for n to get

$n = \dfrac{Z_{\alpha/2}^{2}\,P(1 - P)}{E^{2}}$
Determining Sample Size
To determine the required sample size for the proportion, we must know:
• The desired level of confidence (1 − α), which determines the critical value, $Z_{\alpha/2}$
• The acceptable sampling error, E
• The true proportion of events of interest, P
  - P can be estimated with a pilot sample if necessary (or conservatively use 0.5 as an estimate of P)
Selecting a Useful Sample Size - Ultra Green Car problem
• We need to construct a 90% confidence interval of the proportion of all ultra-green cars that obtain over 100 mpg in..
• The margin of error should be no more than 0.10.
• How large a sample do we need for this analysis of the population proportion?

Using the conservative value P = 0.50:

$n = \dfrac{Z_{\alpha/2}^{2}\,P(1 - P)}{E^{2}} = \left(\dfrac{1.645}{0.10}\right)^{2} (0.50)(1 - 0.50) = 67.65$, which is rounded up to 68.
Sample-Size for Proportion: Example 7
• The manufacturers of a sports car want to estimate the proportion of people in a given income bracket who are interested in the model.
• The company wants to know the population proportion, P, to within 0.10 with 99% confidence.
• Current company records indicate that the proportion may be around 0.25. What is the minimum required sample size for this survey?

$n = \dfrac{z_{\alpha/2}^{2}\,P(1 - P)}{E^{2}} = \dfrac{(2.576)^{2}(0.25)(0.75)}{(0.10)^{2}} = 124.42$, which is rounded up to 125.
EXAMPLE-7
• A company manufacturing sports goods wants to estimate the proportion of cricket players among high school students in India.
• The company wants the estimate to be within ± 0.03 with a confidence level of 99%.
• A pilot study done earlier reveals that out of 80 school students, 36 students play cricket.
• What should be the sample size for this study?

Solution:
• Given P = 36/80 = 0.45, E = ± 0.03, and Z = 2.58 for a 99% confidence level
• Using $n = P(1 - P)\,Z^{2}/E^{2}$, we get $n = (0.45)(0.55)(2.58)^{2}/(0.03)^{2} = 1830.51$, which is rounded up to 1831.
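The proportion version of the sample-size formula can be sketched the same way as the one for the mean. As before, the critical value is passed in so the table values from the slides can be used directly, and the function name `sample_size_proportion` is an illustrative assumption. When no planning value for P is available, the default of 0.5 gives the largest (most conservative) sample size, since P(1 − P) peaks at P = 0.5.

```python
import math


def sample_size_proportion(z, E, p=0.5):
    """Minimum n so that z * sqrt(p * (1 - p) / n) <= E, rounded up.

    p is a planning value for the population proportion; 0.5 is the
    conservative default.
    """
    return math.ceil((z ** 2) * p * (1 - p) / (E ** 2))


if __name__ == "__main__":
    # Ultra-green cars: 90% confidence, E = 0.10, conservative p = 0.5
    print(sample_size_proportion(z=1.645, E=0.10))
    # Cricket survey: 99% confidence, E = 0.03, pilot estimate p = 0.45
    print(sample_size_proportion(z=2.58, E=0.03, p=0.45))
```

Comparing the two calls shows how strongly the margin of error drives the result: tightening E from 0.10 to 0.03 raises the required sample from dozens to well over a thousand.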
Problems
1. The director of a market research agency wishes to study the reach of a particular advertising campaign. He is concerned with the percentage of the target market that has seen at least a portion of the campaign. The director does not think that the figure will exceed 25%. What should be the sample size for this study if the director wishes the estimate to be within three percentage points of the true value and a 95% confidence level is specified?
Problems
2. The average travel time to reach the office, based on a random sample of 15 people working in a company, is 45 minutes with an S.D. of 9 minutes. Assuming normality, establish the 95% confidence interval for the mean travel time of everyone in the office.

3. A private bank offering an internet banking facility wants to know the proportion of customers who are satisfied with its service quality. The bank wants this estimate to be within 0.04 with a confidence level of 95%. A pilot study done earlier reveals that out of 120 customers, 90 are satisfied customers. What should be the sample size for a new comprehensive survey to ascertain the satisfaction level?
Ethical Issues
A confidence interval estimate (reflecting
sampling error) should always be included
when reporting a point estimate
The level of confidence should always be
reported
The sample size should be reported
An interpretation of the confidence
interval estimate should also be provided
Case Study: TATA TEA : An Indian Multinational Setting
Landmark
Introduction
• India is the largest producer of tea in the world.
• Tea companies are a major source of foreign exchange revenue for the country.
• 72% of the tea produced in India is consumed in the domestic market and 28% is exported.
• The consumption of tea is almost equal in rural and urban areas.
Introduction
• The government took an important step to promote the tea industry by amending the Tea Marketing Control Order (TMCO), 1984.
• It has also granted tea growers the option to sell tea through any channel.
• Earlier, there was a restriction requiring 75% of the products (subject to some exemptions) to be sold through public auctions.
Introduction
• Tata Tea, a part of the Tata group, was incorporated in 1964 as a joint venture between the Tata group and the United Kingdom-based James Finlay and Company.
• It is the second largest tea company in India.
• In December 1982, the Tata group acquired James Finlay.
• As a result, the company was rechristened “Tata Tea Ltd” from “Tata Finlay Ltd.”
Introduction
• Tata Tea has come a long way from being a pure plantation company to becoming an international branded tea company.
• Presently, Tata Tea and Hindustan Unilever are the top two tea manufacturing companies in India.
• Tata Tea’s strategic acquisition of British giant Tetley has made it the second biggest tea company in the world after Unilever.
About Case
• India: world’s second largest tea producer (it was the largest producer until 2007, the time when this case study was designed)
• 72% domestic consumption
• 28% exported
• Market share:
  HUL: 19%
  Tata Tea: 18%
Sales Figure Tata Tea
Year | Sales (in million rupees)
1995 | 3993.2
1996 | 5196.9
1997 | 6921.9
1998 | 8719
1999 | 8762
2000 | 9136.5
2001 | 8244.4
2002 | 7628.2
2003 | 7484.3
2004 | 7775.3
2005 | 8932.7
2006 | 9710.1
2007 | 10563.8
CAGR = [(Sales2007 / Sales1995)^(1/Yrs) − 1] × 100, with Yrs = 12 (1995 to 2007)
= 0.084447 × 100
≈ 8.44%
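The CAGR formula on this slide can be checked with a short Python sketch. The function name `cagr_percent` is hypothetical, and the period is assumed to be the 12 years between the 1995 and 2007 figures.

```python
def cagr_percent(first_value, last_value, years):
    """Compound annual growth rate, in percent, over `years` periods."""
    return ((last_value / first_value) ** (1 / years) - 1) * 100


# Sales in million rupees, taken from the table above
sales_1995 = 3993.2
sales_2007 = 10563.8
years = 2007 - 1995  # 12 annual growth periods (assumed)

print(f"CAGR = {cagr_percent(sales_1995, sales_2007, years):.4f}%")
```

Using the number of growth periods (12), not the number of data points (13), is what reproduces the 0.084447 shown on the slide.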
Production Graph
[Chart: sales in million rupees by year, 1995 to 2007; vertical axis from 0 to 12000]
Production Graph 2
[Chart: sales in million rupees by year, 1995 to 2007; vertical axis from 0 to 12000]
• Suppose Tata Tea decides to assess its employees’ job satisfaction on the basis of a brief survey that includes five questions.
• The responses of 150 randomly surveyed employees are shown in the table.
• Use α = 0.05 to estimate the population’s positive response to these questions on the basis of the sample responses.
Following data is given :
• The research team has randomly selected 150 employees, thus sample size n = 150
• Here the sample statistic is the sample proportion
• α = 0.05
• Sample proportion:
$$p = \dfrac{X}{n} = \dfrac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}, \qquad 0 \le p \le 1$$
• Since the sample size is large (150), the sample proportion p is approximately normally distributed.
Confidence Interval for the Population
Proportion:
• Upper and lower confidence limits for the population proportion are calculated with the formula
$$p \pm Z_{\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$$
• where
– Zα/2 is the standard normal value for the level of confidence desired
– p is the sample proportion
– n is the sample size
– α is the level of significance, given as α = 0.05
Sl. No | Question in the form of a statement | Yes | No | Sample proportion of positive responses
1 | I am proud to work for my company | 110 | 40 | 110/150 = 0.733
2 | HR policies for promotion are fair | 120 | 30 | 120/150 = 0.8
3 | Seniors are co-operative and helpful | 105 | 45 | 105/150 = 0.7
4 | I will leave my company in case a better opportunity arises | 25 | 125 | 25/150 = 0.167
5 | My company follows a fair compensation structure | 40 | 110 | 40/150 = 0.267
Question No | Sample proportion (p) | Calculation of confidence limits | Confidence limits for population proportion (P)
1 | 110/150 | 0.733 ± 1.96 √((0.733)(0.267)/150) | 0.663 ≤ P ≤ 0.804
2 | 120/150 | 0.8 ± 1.96 √((0.8)(0.2)/150) | 0.736 ≤ P ≤ 0.864
3 | 105/150 | 0.7 ± 1.96 √((0.7)(0.3)/150) | 0.627 ≤ P ≤ 0.773
4 | 25/150 | 0.167 ± 1.96 √((0.167)(0.833)/150) | 0.107 ≤ P ≤ 0.226
5 | 40/150 | 0.267 ± 1.96 √((0.267)(0.733)/150) | 0.196 ≤ P ≤ 0.337
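The interval calculations for the five survey questions can be sketched in Python as follows. This is an illustrative sketch only: the function name `proportion_ci` is hypothetical, Z = 1.96 is the 95% value used on the slide, and the intervals are computed from the raw counts, so the third decimal may differ slightly from values obtained by rounding p first.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion.

    Returns (lower, upper) for p +/- z * sqrt(p * (1 - p) / n).
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# "Yes" counts for questions 1 to 5, out of n = 150 employees
yes_counts = {1: 110, 2: 120, 3: 105, 4: 25, 5: 40}

for question, yes in yes_counts.items():
    lower, upper = proportion_ci(yes, 150)
    print(f"Q{question}: {lower:.3f} <= P <= {upper:.3f}")
```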
Analysis: Based on the survey conducted amongst 150 employees of Tata Tea, we can say with a 95% level of confidence that the percentage of positive responses among all employees of Tata Tea to each question would lie within the range shown below.
Question | Response | Percentage of total population
I am proud to work for my company | Yes | 66.3% to 80.4%
HR policies for promotion are fair | Yes | 73.6% to 86.4%
Seniors are co-operative and helpful | Yes | 62.7% to 77.3%
I will leave my company in case a better opportunity arises | Yes | 10.7% to 22.6%
My company follows a fair compensation structure | Yes | 19.6% to 33.7%
• Suppose the company has also decided to measure the job satisfaction levels of managers who have joined the organization in the last five years.
• The research team has randomly selected 15 managers.
• Researchers have used a five-point rating scale with 1 as “strongly disagree” and 5 as “strongly agree”.
• Assume that overall responses to the questions are normally distributed and that the data are on an interval scale.
• Responses are given in the table.
• Use α = 0.05 for estimating the population’s response to these questions on the basis of the sample responses.
Following data is given :
• The research team has randomly selected 15 managers, thus sample size n = 15
• It is assumed that all responses to the questions are normally distributed and that the data are on an interval scale
• α = 0.05
If the population standard deviation, σ , is unknown, replace σ with the
sample standard deviation, s. If the population is normal, the resulting
statistic has a t distribution with (n - 1) degrees of freedom.

100(1 − α)% CI for the population mean response:
$$\bar{X} \pm t_{\alpha/2}\,\dfrac{S}{\sqrt{n}}$$
(where tα/2 is the critical value of the t distribution with n − 1 degrees of freedom that cuts off an area of α/2 in each tail)
Sl. No | Question in the form of a statement | Mean (X̄) | SD (S) | Lower limit: X̄ − tα/2(S/√n) | Upper limit: X̄ + tα/2(S/√n)
1 | There are good opportunities for growth in my organization | 3.45 | 0.90 | 3.45 − 2.145(0.90/√15) = 2.95 | 3.45 + 2.145(0.90/√15) = 3.95
2 | My job matches my qualification and experience | 4.10 | 0.80 | 4.10 − 2.145(0.80/√15) = 3.66 | 4.10 + 2.145(0.80/√15) = 4.54
3 | The work environment is facilitative and supportive | 3.80 | 1.10 | 3.80 − 2.145(1.10/√15) = 3.19 | 3.80 + 2.145(1.10/√15) = 4.41
4 | Work culture of the organization is healthy | 3.85 | 0.75 | 3.85 − 2.145(0.75/√15) = 3.43 | 3.85 + 2.145(0.75/√15) = 4.27
5 | I am more satisfied than my batch mates who are working with other organizations | 4.20 | 0.95 | 4.20 − 2.145(0.95/√15) = 3.67 | 4.20 + 2.145(0.95/√15) = 4.73
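The t-based limits in this table can be reproduced with a short Python sketch. The function name `mean_ci` is hypothetical, and the critical value 2.145 (right-tail area 0.025, 14 degrees of freedom) is hard-coded from a t table instead of being computed.

```python
import math

# Critical value t(0.025) for df = n - 1 = 14, read from a t table
T_CRITICAL_DF14 = 2.145


def mean_ci(mean, sd, n, t_crit):
    """Confidence interval for a population mean when sigma is unknown.

    Returns (lower, upper) for mean +/- t_crit * sd / sqrt(n).
    """
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# (sample mean, sample SD) for questions 1 to 5, with n = 15 managers
summary = {
    1: (3.45, 0.90),
    2: (4.10, 0.80),
    3: (3.80, 1.10),
    4: (3.85, 0.75),
    5: (4.20, 0.95),
}

for question, (mean, sd) in summary.items():
    lower, upper = mean_ci(mean, sd, 15, T_CRITICAL_DF14)
    print(f"Q{question}: {lower:.2f} to {upper:.2f}")
```

With a sample this small, using 1.96 in place of 2.145 would make every interval too narrow, which is why the t distribution is used here.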
df | t0.100 | t0.050 | t0.025 | t0.010 | t0.005
1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657
2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925
3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841
4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604
5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032
6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707
7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499
8 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355
9 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250
10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169
11 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106
12 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055
13 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012
14 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977
15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947
16 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921
17 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898
18 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878
19 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861
20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845
21 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831
22 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819
23 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807
24 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797
25 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787
26 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779
27 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771
28 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763
29 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756
30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750
40 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704
60 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660
120 | 1.289 | 1.658 | 1.980 | 2.358 | 2.617
∞ | 1.282 | 1.645 | 1.960 | 2.326 | 2.576

The critical value of t for df = (n − 1) = (15 − 1) = 14 and a right-tail area of 0.025 is: t0.025 = 2.145

The corresponding confidence interval or interval estimate is:
$$\bar{x} \pm t_{0.025}\,\dfrac{s}{\sqrt{n}} = \bar{x} \pm 2.145\,\dfrac{s}{\sqrt{15}}$$

[Figure: t distribution with df = 14, f(t) plotted against t, with an area of 0.025 in each tail]
Analysis of the above data :
[Chart: 95% confidence intervals for the mean response to questions Q1 to Q5, plotted on the five-point scale]
1 = Strongly disagree  2 = Disagree  3 = Neutral  4 = Agree  5 = Strongly agree
Analysis: Based on the survey conducted amongst 15 managers of Tata Tea who have joined the organization in the last five years, we can say with a 95% level of confidence that the population’s average response to each question would lie within the range shown below.
Question | Range of average population response
There are good opportunities for growth in my organization | 2.95 to 3.95
My job matches my qualification and experience | 3.66 to 4.54
The work environment is facilitative and supportive | 3.19 to 4.41
Work culture of the organization is healthy | 3.43 to 4.27
I am more satisfied than my batch mates who are working with other organizations | 3.67 to 4.73
