Confidence Intervals (With Solutions)
Exercise 1
For a generic confidence interval, what is the relation between the confidence level and the margin of error?
Keeping everything else constant, decreasing the confidence level 100(1 - α)% decreases the margin of error.
As a consequence, a smaller sample size is sufficient to keep the length of the confidence interval fixed.
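The relation can be checked numerically. The following is a minimal sketch for the known-σ case, where the margin of error is $z_{\alpha/2}\,\sigma/\sqrt{n}$; the values σ = 10 and n = 50 and the function name `z_margin_of_error` are illustrative assumptions, not part of the exercise.

```python
from math import sqrt
from statistics import NormalDist

def z_margin_of_error(conf_level, sigma, n):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) when sigma is known."""
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    return z * sigma / sqrt(n)

# Hypothetical values: sigma = 10 and n = 50 are for illustration only.
for level in (0.80, 0.90, 0.95, 0.99):
    print(level, round(z_margin_of_error(level, 10, 50), 4))
```

The printed margins grow with the confidence level, which is the behaviour described in the answer.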
Exercise 2
Provide the definition of a confidence interval of level 100(1 - α)% for an unknown parameter θ of a population.
Suppose that, on the basis of the sample information, we can determine two random variables A and B such
that P(A < θ < B) = 1 - α, for 0 < α < 1. If a and b are the observed values of A and B for the sample
obtained, then the interval from a to b is the confidence interval at level 100(1 - α)% for the parameter θ.
The quantity 100(1 - α)% is the confidence level of the interval.
Exercise 3
Explain briefly the differences between the Student-t distribution and the Standard Normal distribution.
The Student-t distribution has a shape similar to that of the standard normal distribution: both have zero
mean and their density functions are symmetric with respect to the mean. However, the density function of a
Student-t distribution has a greater variability with respect to the standard normal distribution. The graph is
flatter and the area in the tails is greater (heavy tails) than that one of the standard normal distribution.
Let us consider a random sample of n observations with mean $\bar{X}$ and standard deviation S drawn from a
normally distributed population with mean µ. The variable
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
has a Student-t distribution with n - 1 degrees of freedom, since σ is unknown and is therefore estimated by S.
The uncertainty about σ causes the greater variability of T. As the degrees of freedom increase, the Student-t
distribution becomes closer and closer to the normal distribution until the two distributions are almost
identical. This happens because, as the sample size increases, S becomes a more reliable estimator of σ.
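The heavier tails can be seen in a small simulation. The sketch below is an illustration under assumed settings (standard normal population, 5000 replications, a fixed seed, and the hypothetical helper `tail_fraction`): it estimates how often |T| exceeds 1.96, which would happen about 5% of the time if T were standard normal.

```python
import random
from math import sqrt

def tail_fraction(n, cutoff=1.96, reps=5000, seed=1):
    """Estimate P(|T| > cutoff) for T = (Xbar - mu) / (S / sqrt(n)), mu = 0."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(x) / n
        s = sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))  # sample std. dev.
        t = xbar / (s / sqrt(n))
        if abs(t) > cutoff:
            exceed += 1
    return exceed / reps

small = tail_fraction(5)    # 4 degrees of freedom: clearly more than 5%
large = tail_fraction(100)  # 99 degrees of freedom: close to 5%
print(small, large)
```

With few degrees of freedom the estimated tail area is noticeably larger than 0.05, and it moves toward 0.05 as n grows.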
Exercise 4
A random sample of 90 vehicles on a given point on a highway is observed, yielding an average speed of
64.8 Km/h, and a standard deviation of 23.2 Km/h.
a) Construct the 95% confidence interval for the average speed of vehicles in that point of the highway.
b) If you were to increase the number of vehicles whose speed you measure, but not change any other
conditions, would the width of the confidence interval become larger or smaller? What if (keeping all
other conditions unchanged) you were to increase the level of confidence? Justify your answers.
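a) Let X be a random variable that indicates a car's speed at that point of the highway.
$\bar{x} = 64.8$, $s = 23.2$, $n = 90$, $1 - \alpha = 0.95$
The confidence interval for the population mean (level 95%) is:
$$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
$$64.8 - t_{89,0.025}\frac{23.2}{\sqrt{90}} < \mu < 64.8 + t_{89,0.025}\frac{23.2}{\sqrt{90}}$$
$$64.8 - 1.987\frac{23.2}{\sqrt{90}} < \mu < 64.8 + 1.987\frac{23.2}{\sqrt{90}}$$
$$59.941 < \mu < 69.659$$
If we use the normal approximation ($z_{0.025} = 1.96$ in place of $t_{89,0.025}$), the interval is slightly narrower:
$$60.007 < \mu < 69.593$$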
b) If the sample size increases, the variance of the sample mean decreases, so the confidence interval
becomes narrower. If the confidence level increases, α decreases and the critical value $t_{n-1,\alpha/2}$
increases; since the margin of error is proportional to this critical value, the confidence interval
becomes wider.
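The computation in a) and the sample-size effect in b) can be sketched in a few lines. The helper `mean_ci` is a hypothetical name; the critical values 1.987 and 1.96 are the table values used in the solution, passed in by hand rather than computed.

```python
from math import sqrt

def mean_ci(xbar, s, n, crit):
    """Interval xbar +/- crit * s / sqrt(n); crit is a t or z table value."""
    me = crit * s / sqrt(n)
    return xbar - me, xbar + me

t_low, t_high = mean_ci(64.8, 23.2, 90, 1.987)  # Student-t critical value
z_low, z_high = mean_ci(64.8, 23.2, 90, 1.96)   # normal approximation
print(round(t_low, 3), round(t_high, 3))
print(round(z_low, 3), round(z_high, 3))

# Quadrupling n (a hypothetical n = 360, same critical value) halves the width.
w90 = t_high - t_low
low4, high4 = mean_ci(64.8, 23.2, 360, 1.987)
print(round((high4 - low4) / w90, 3))
```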
Exercise 5
We want to know the average number of times, µ, that students take a particular exam before they pass it.
Considering a random sample of 150 students and recording the number of times each one of them takes the
exam before passing it, we obtained a sample average of 1.22, with a sample standard deviation equal to
0.35.
a) Determine the 90% Confidence Interval for µ.
b) Do we need to assume that the population is normally distributed in order to calculate the previous
interval? Justify your answer.
c) If we increase the confidence level to 99%, keeping everything else unchanged, what consequence
would this have on the interval?
a) First, we observe that, for n = 150, $t_{n-1,\alpha/2} = t_{149,\alpha/2} \approx z_{\alpha/2}$, so we can use the Normal distribution
(as an accurate approximation) when calculating the interval:
$$CI_{90\%} = \left[\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}\right] = \left[1.22 \pm 1.645\frac{0.35}{\sqrt{150}}\right] = [1.22 \pm 1.645 \cdot 0.0286] = [1.22 \pm 0.0470] = [1.1730; 1.2670]$$
b) We don’t need to assume normality. Given the large sample size, the Central Limit Theorem allows
us to calculate the interval without any assumptions on the population distribution.
c) Increasing the confidence level to 99%, keeping everything else unchanged, would increase the
margin of error, making the interval wider.
Exercise 6
You want to estimate the average cost, µ, of organic garbage disposal for Italian households. With this aim,
you interview a simple random sample of 5 household heads and you find the following costs (in Euro):
22.50 21.50 25.00 21.50 25.50
a) Say whether it is necessary to make any assumption on the population in order to calculate a
confidence interval for µ. If so, specify what are the necessary assumptions.
b) Calculate a 99% confidence interval for µ.
c) Does the interval you found contain µ with probability 99%? Justify your answer.
a) Given that the sample size is small, we need to assume that the random sample is extracted from a
normally distributed population with expected value µ.
b) Let X be a random variable that indicates the amount spent by a randomly selected resident for the
collection of organic waste.
n = 5
$$\bar{x} = \frac{22.50 + 21.50 + 25.00 + 21.50 + 25.50}{5} = 23.2$$
$$s^2 = \frac{(22.50 - 23.2)^2 + (21.50 - 23.2)^2 + (25.00 - 23.2)^2 + (21.50 - 23.2)^2 + (25.50 - 23.2)^2}{4} = 3.7$$
$s = \sqrt{3.7} = 1.9235$
$1 - \alpha = 0.99$, $\alpha/2 = 0.005$, $t_{4,0.005} = 4.604$
The 99% confidence interval for the population mean is:
$$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
$$23.2 - t_{4,0.005}\frac{1.9235}{\sqrt{5}} < \mu < 23.2 + t_{4,0.005}\frac{1.9235}{\sqrt{5}}$$
$$23.2 - 4.604\frac{1.9235}{\sqrt{5}} < \mu < 23.2 + 4.604\frac{1.9235}{\sqrt{5}}$$
$$19.24 < \mu < 27.16$$
c) No. The interval is fixed, so there are no probabilities involved. The interpretation of the confidence
level is: if one extracts, repeatedly and independently, a large number of samples of size n=5, then
the true value of the parameter µ will be contained in 99% of the random intervals created from
those samples.
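The repeated-sampling interpretation in c) can be illustrated by simulation. This is only a sketch under assumed values: the "true" mean 23.0 and standard deviation 2.0, the number of replications, the seed, and the helper name `coverage` are all hypothetical; the critical value 4.604 is the table value $t_{4,0.005}$ used above.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, t_crit, reps=20000, seed=0):
    """Fraction of simulated t-intervals (normal population) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s = sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
        me = t_crit * s / sqrt(n)
        if xbar - me < mu < xbar + me:
            hits += 1
    return hits / reps

# Hypothetical population: mu = 23.0, sigma = 2.0 (not given in the exercise).
rate = coverage(mu=23.0, sigma=2.0, n=5, t_crit=4.604)
print(rate)
```

Each simulated interval either contains µ or it does not; it is the long-run fraction of intervals that is close to 0.99.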
Exercise 7
The Director of a Bank wants to study the average amount of Euros, µ, withdrawn at a given cash dispenser.
On the basis of historical data, the standard deviation of the amount withdrawn is assumed to be 40 Euros. A
random sample of 100 withdrawals is analyzed and the sample average is found to be equal to 107 Euros.
a) Calculate the 95% confidence interval for µ.
b) Determine the minimum sample size that guarantees a margin of error not higher than 6, using the
same confidence level used in point a.
c) Explain the difference between the estimator and the estimate of a certain parameter.
a) We wish to build a CI for the mean in the case that the population variance is known:
$$CI_{0.95}(\mu) = \left(\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right)$$
$\bar{x} = 107$; $\sigma = 40$; $n = 100$; $z_{\alpha/2} = z_{0.025} = 1.96$
Substituting, we get:
$$CI_{0.95}(\mu) = \left(\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right) = \left(107 \pm 1.96 \cdot \frac{40}{\sqrt{100}}\right) = (99.16; 114.84)$$
b) We have to solve:
$$ME = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \le 6$$
Substituting the known quantities we get:
$$1.96 \cdot \frac{40}{\sqrt{n}} \le 6 \rightarrow \sqrt{n} \ge 1.96 \cdot \frac{40}{6} = 13.0667 \rightarrow n \ge (13.0667)^2 \rightarrow n \ge 170.7386$$
In order to have a margin of error not higher than 6, the sample size has to be at least 171.
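The sample-size rule in b) amounts to rounding $(z_{\alpha/2}\,\sigma / ME)^2$ up to the next integer. A minimal sketch, with `min_sample_size` as a hypothetical helper name and the table value 1.96 passed in by hand:

```python
from math import ceil

def min_sample_size(z, sigma, max_me):
    """Smallest integer n with z * sigma / sqrt(n) <= max_me (sigma known)."""
    return ceil((z * sigma / max_me) ** 2)

n_required = min_sample_size(1.96, 40, 6)
print(n_required)
```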
Exercise 8
The travelling time of an Intercity notte train from Torino to Palermo is distributed according to a normal distribution
with a standard deviation equal to 2 hours. A simple random sample of 16 travels is selected in order to estimate
the average travelling time µ. Build a 95% confidence interval for the average travelling time of this train.
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$z_{\alpha/2} = 1.96$
Exercise 9
In a firm, a Committee on health and safety at work is interested in estimating the average quantity µ of
water taken by the employees during a working day. For a sample of 6 employees the following quantities of
water taken during a working day (in liters) have been recorded:
1.52 1.85 1.34 1.98 2.07 2.34
a) Determine the 90% confidence interval for µ.
b) Suppose that σ = 0.24. How many employees would it be necessary to sample so that a 90%
confidence interval for µ has a length less than 0.2?
a) Let X = quantity of water taken by an employee during a working day (in liters).
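The 90% confidence interval for the population mean is given by:
$$\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$$
In this case, we have:
$$1.85 - t_{5,0.05}\frac{0.3673}{\sqrt{6}} < \mu < 1.85 + t_{5,0.05}\frac{0.3673}{\sqrt{6}}$$
$$1.85 - 2.015\frac{0.3673}{\sqrt{6}} < \mu < 1.85 + 2.015\frac{0.3673}{\sqrt{6}}$$
$$1.5479 < \mu < 2.1521$$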
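b) If we require the width of the 90% confidence interval for µ to be less than 0.2, this means we need
the ME to be less than 0.1.
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < 0.1 \rightarrow \sqrt{n} > \frac{\sigma \cdot z_{\alpha/2}}{0.1} = \frac{0.24 \cdot 1.645}{0.1} = 3.948 \rightarrow n > 15.59$$
So at least 16 employees are needed.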
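The sample mean and standard deviation used in part a) can be reproduced from the raw data. This is a sketch using the standard-library `statistics` module; the critical value 2.015 is the table value $t_{5,0.05}$ from the solution, entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

water = [1.52, 1.85, 1.34, 1.98, 2.07, 2.34]  # liters, from the exercise
n = len(water)
xbar = mean(water)
s = stdev(water)   # sample standard deviation (divides by n - 1)
t_crit = 2.015     # t_{5, 0.05} from a Student-t table
me = t_crit * s / sqrt(n)
ci = (xbar - me, xbar + me)
print(round(xbar, 2), round(s, 4), tuple(round(v, 4) for v in ci))
```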
Exercise 10
Assume that the monthly expenditure (in Euros) for the purchase of music-tracks on the web by young men
between 18 and 25 years old has a Normal distribution with mean µ and standard deviation equal to 5.8. For
a randomly extracted sample of 8 young men between 18 and 25 years old, we have recorded the following
monthly expenditures for the purchase of music-tracks on the web:
22.8 24.4 18.5 32.4 12.6 24.2 21.5 20.6
a) Determine a 90% confidence interval for µ.
b) Which elements determine the margin of error of a confidence interval of the type determined in
point a)?
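a) The confidence interval for the unknown population mean µ is given by $[\bar{X} \pm ME]$,
where $ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
$$\bar{x} = \frac{22.8 + 24.4 + 18.5 + 32.4 + 12.6 + 24.2 + 21.5 + 20.6}{8} = 22.125$$
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = z_{0.05}\frac{5.8}{\sqrt{8}} = 1.645\frac{5.8}{\sqrt{8}} = 3.3733$$
$$CI_{90\%} = \bar{X} \pm ME = 22.125 \pm 3.3733 = [18.7517, 25.4983]$$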
b) The margin of error depends on three factors: confidence level, sample size and variability of the
data, which in the specific case of point a) is summarized by the (known) standard deviation of the
population. The greater the sample size, the smaller the standard error of the sample mean, and
therefore the smaller the margin of error. The same effect is produced by a reduction of the standard
deviation which is at the numerator of the fraction. The opposite relation is verified for the desired
confidence level: the greater the required confidence level, the greater the value of the reliability
factor $z_{\alpha/2}$, and so the greater the margin of error.
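The known-σ interval of point a) can be sketched as follows. Here the critical value is computed with `statistics.NormalDist` instead of being read from a table, so the bounds can differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist, mean

spend = [22.8, 24.4, 18.5, 32.4, 12.6, 24.2, 21.5, 20.6]  # Euros, from the exercise
sigma = 5.8   # known population standard deviation
conf = 0.90
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.645
xbar = mean(spend)
me = z * sigma / sqrt(len(spend))
ci = (xbar - me, xbar + me)
print(round(xbar, 3), round(me, 4), tuple(round(v, 4) for v in ci))
```

The three inputs of `me` (the critical value, σ, and n) are exactly the three elements discussed in point b).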
Exercise 11
On a random sample of 15 music shops of an Italian region (Region A) the average weekly gain from the sale of Jazz
discs is equal to 160 Euros with standard deviation 46 Euros. Under the assumption of normality of the average
weekly gain,
a) construct a 90% confidence interval for the average weekly gain of the music shops in Region A;
b) does it make sense to say, on the basis of the result obtained in point a), that the average weekly gain is 168
Euros? Motivate your answer;
c) without modifying the sampling realization, if we want to obtain a confidence interval of width not greater
than 19, what can we do? (motivate the answer).
d) Explain the meaning of the statement “S2 is an unbiased estimator for σ2”.
c) 19 is a width smaller than the width of the previously determined interval. Since I cannot modify the
sample realization, I cannot act on n (nor s), so the only possible action is to diminish the confidence
level (increase the value of α ).
d) A point estimator $\hat{\theta}$ is said to be an unbiased estimator for the population parameter θ if its
expected value is equal to the value of the parameter itself. If $E(\hat{\theta}) = \theta$ holds, then $\hat{\theta}$ is an
unbiased estimator of θ. In this case
$$E(S^2) = E\left(\frac{\sum (X_i - \bar{X})^2}{n - 1}\right) = \sigma^2$$
Exercise 12
The owner of a bar assumes that the time spent by a customer in a seat follows a Normal distribution with
unknown mean µ and variance equal to 12.4.
For a random sample of 6 customers, the following times (in minutes) spent in the bar were collected:
16.1 14.1 14.8 8.2 14.3 8.1
a) Compute the standard error of the sample mean. What could you say about the results obtained?
b) Find a 99% confidence interval for µ.
a) The standard error of the sample mean is:
$$\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{12.4}{6}} = 1.4376$$
The standard error provides a measure of the variability of the estimator, that is, its inaccuracy. In
this case, it indicates that, if several different samples of the same size n were observed, the
corresponding observed values of the sample mean would deviate, on average, 1.4376 units from its
expected value, namely µ.
b) Since we assume that the variable “X= minutes spent in a bar” has a normal distribution, and the
variance of X is known, the confidence interval for µ at 99% is given by:
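$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\bar{x} = \frac{16.1 + 14.1 + 14.8 + 8.2 + 14.3 + 8.1}{6} = \frac{75.6}{6} = 12.6$$
$z_{\alpha/2} = z_{0.01/2} = z_{0.005} = 2.575$
So the confidence interval for µ at 99% is:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 12.6 \pm 2.575 \cdot 1.4376 = (8.8982, 16.3018)$$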
Exercise 13
A careful owner has to remodel his own apartment. In order to evaluate the costs he has to undertake, he has
investigated a sample of 80 building companies in Lombardia, obtaining the following results:
a) Determine the 95% Confidence Interval for the average cost µ.
b) Would it be helpful to know the total number of building companies in Lombardia? Briefly motivate your
answer.
c) The careful owner would like to estimate the proportion p of building companies that have a certain
certificate of conformity. How many companies would he need to extract in order to be sure that the 95%
confidence interval for p has a margin of error not greater than 0.03?
In order to be sure that the 95% confidence interval for p has a margin of error no greater than 0.03
he needs to extract at least 1068 companies.
Exercise 14
Let the sample $X_1, ..., X_n$ be extracted from a $N(\mu, \sigma^2)$ population with known variance $\sigma^2 = 16$.
a) Find the smallest sample size that will produce a 95% confidence interval for µ no wider than 2.
b) Now, suppose that your boss tells you that the company budget will only allow for a sample
size equal to 30. How wide would your 95% confidence interval be? What could you do to achieve
the desired width of 2 with such sample size?
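a) $$\text{Width} = 2 \cdot ME = 2 \cdot z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \le 2$$
$$2 \cdot 1.96 \cdot \frac{4}{\sqrt{n}} \le 2 \rightarrow \sqrt{n} \ge 4 \cdot 1.96 \rightarrow n \ge 7.84^2 \rightarrow n \ge 61.4656$$
So n ≥ 62.
b) n = 30.
$$\text{Width} = 2 \cdot z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 2 \cdot 1.96 \cdot \frac{4}{\sqrt{30}} = 2.8628$$
This is wider than 2, so in order to achieve a width of 2 with n = 30 you could increase α, that is, lower the confidence level.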
Exercise 15
A publisher is evaluating the average time µ required (in minutes of work) for a final revision of a
manuscript, before the publication. The time required follows a normal distribution with a standard
deviation equal to 840.
a) Find the minimum sample size needed to obtain a 95% confidence interval for µ with a margin of
error smaller than 610 minutes.
b) In a sample of 5 recently accepted manuscripts, the following times for the final revision have been
collected:
3450 6300 4300 5760 4893.
a) Since the population follows a normal distribution with known standard deviation:
$$ME = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.96\frac{840}{\sqrt{n}} \le 610 \Rightarrow n \ge \left(1.96\frac{840}{610}\right)^2 = 7.2847 \rightarrow n^* = 8$$
c) Yes, the length of the confidence interval would increase because $t_{n-1,\alpha/2} = t_{4,0.025} = 2.776 > z_{\alpha/2}$.
Exercise 16
A sample of 120 cyclists is extracted from the 400 participants of a bike competition. After having collected
the quantity of liquids drank (in liters) during the competition, the below output has been obtained:
Liquids drank
Mean               1.811
Median             1.987
Mode               2.123
Sample Variance    0.233
Kurtosis          -1.6220
Skewness          -0.3038
Range              1.249
Minimum            1.112
Maximum            2.362
Sum              217.3842
Count            120
a) Since the sample size n is not sufficiently small with respect to the total number of participants, that
is the population N (n is greater than the 5% of N), it is necessary to apply the finite population
correction factor in order to estimate the standard error of the sample mean
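$$\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} = \sqrt{\frac{0.2331}{120}}\sqrt{\frac{400 - 120}{400 - 1}} = 0.0441 \cdot 0.8377 = 0.0369$$
b) Let X be the quantity of liquids drunk (in liters) during the competition.
n = 120
$\bar{x} = 1.8115$
$1 - \alpha = 0.90$, $\alpha = 0.10$, $\alpha/2 = 0.05$, $t_{n-1,\alpha/2} \approx z_{\alpha/2} = z_{0.05} = 1.645$
The 90% confidence interval for the average quantity µ is:
$$\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
$$1.8115 - 1.645 \cdot 0.0369 < \mu < 1.8115 + 1.645 \cdot 0.0369$$
$$1.7508 < \mu < 1.8722$$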
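The finite population correction can be sketched as a small function. The name `fpc_standard_error` is hypothetical; the inputs are the sample variance 0.2331, n = 120 and N = 400 from the solution, and 1.645 is the table value used there.

```python
from math import sqrt

def fpc_standard_error(s2, n, N):
    """Standard error of the sample mean with the finite population correction."""
    return sqrt(s2 / n) * sqrt((N - n) / (N - 1))

se = fpc_standard_error(0.2331, 120, 400)
me = 1.645 * se
ci = (1.8115 - me, 1.8115 + me)
print(round(se, 4), tuple(round(v, 4) for v in ci))
```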
Exercise 17
In a small town library with 850 members, a random sample of 100 people have been asked about their use of social
networks. The 99% confidence interval for the average age of Twitter users was computed as being equal to (36; 38).
What were the average age and the standard deviation in the sample?
N = 850; n = 100; $CI_{99\%}(\mu) = [36, 38]$
A confidence interval for µ is a symmetric interval centered on the sample mean: $[\bar{x} - ME, \bar{x} + ME]$.
So we can see that $\bar{x} = 37$ and ME = 1 (the width of the interval is 2).
To find the standard error of the sample mean we have to use the finite population correction factor because
we have a finite population (N = 850) and the sample size n is greater than 5% of N. The margin of error is:
$$ME = t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
$t_{n-1,\alpha/2} = t_{99,0.005} \approx z_{0.005} = 2.575$ and $\frac{N - n}{N - 1} = \frac{850 - 100}{849} = 0.8834$, so we must have
$$\frac{s}{\sqrt{100}} \cdot \sqrt{0.8834} \cdot 2.575 = 1 \Rightarrow s = \frac{1}{2.575} \cdot \sqrt{\frac{100}{0.8834}} = 4.1318$$
So the average age in the sample is $\bar{x} = 37$ and the standard deviation is s = 4.1318.
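The same inversion can be written as a short function that recovers the sample mean and standard deviation from the interval bounds. The helper name `recover_mean_and_sd` is hypothetical, and the critical value 2.575 is the table value used in the solution.

```python
from math import sqrt

def recover_mean_and_sd(lower, upper, n, N, crit):
    """Invert xbar +/- crit * (s / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    xbar = (lower + upper) / 2          # the interval is centered on xbar
    me = (upper - lower) / 2            # half-width = margin of error
    fpc = sqrt((N - n) / (N - 1))       # finite population correction
    s = me * sqrt(n) / (crit * fpc)
    return xbar, s

xbar, s = recover_mean_and_sd(36, 38, 100, 850, 2.575)
print(xbar, round(s, 4))
```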
Exercise 18
In a 2007 survey, a sample of 1000 workers of a small town were asked which means of transportation they
were using to reach the work place. These were the results:
Calculate the 95% confidence interval for those that are NOT using a private car to reach the work place in
2007.
Given that the sample size is sufficiently large, the confidence interval for the population proportion is:
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}};\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$$
$$\hat{p} = \frac{1000 - 530}{1000} = 0.47$$
$z_{\alpha/2} = z_{0.025} = 1.96$
$$\left(0.47 - 1.96\sqrt{\frac{(0.47)(0.53)}{1000}};\ 0.47 + 1.96\sqrt{\frac{(0.47)(0.53)}{1000}}\right) = (0.4391; 0.5009)$$
Exercise 19
We want to estimate the percentage of “family companies” in a certain industry. In a random sample of 973
companies of such industry, 750 can be considered “family companies”.
a) Determine the 99% Confidence Interval for the proportion p of “family companies” in the industry.
b) Is it correct to say that “the calculated interval contains p with a probability of 99%”? Justify your
answer.
c) Give the definition of unbiased estimator for a parameter θ.
a) First we calculate the sample proportion: $\hat{p} = \frac{750}{973} = 0.7708$
Then, considering that n is sufficiently large, the interval is:
$$CI_{99\%} = \left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right] = \left[0.7708 \pm 2.575\sqrt{\frac{0.7708(1 - 0.7708)}{973}}\right] = \left[0.7708 \pm 2.575\sqrt{\frac{0.1767}{973}}\right]$$
$$= [0.7708 \pm 2.575 \cdot 0.0135] = [0.7708 \pm 0.0347] = [0.7361; 0.8055]$$
b) It is NOT correct. What we can say is that, although the true proportion, p, may or may not be in this
particular interval, 99% of intervals formed in this manner would contain the true proportion, p.
c) An estimator, T, of a parameter θ is unbiased if E(T) = θ, that is, if the expected value of the
estimator is equal to the parameter we want to estimate.
Exercise 20
In a recent report, the police has indicated that, in a random sample of 80 license suspensions on a Saturday night, 57
are related to excessive alcohol consumption. Let p be the proportion of alcohol-related license suspensions on
Saturdays. Based on the sample reported on by the police, compute a 90% confidence interval for p.
The general expression for the 90% confidence interval for the population proportion is:
$$CI_{0.90}(p) = \left(\hat{p} - z_{0.05}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}};\ \hat{p} + z_{0.05}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right), \text{ with } z_{0.05} = 1.645.$$
From the sample data we get:
$$\hat{p} = \frac{57}{80} = 0.7125$$
The confidence interval for the proportion of alcohol-related license suspensions on Saturdays is (observing that the
sample size is large enough):
$$CI_{0.90}(p) = \left(0.7125 - 1.645\sqrt{\frac{0.7125(1 - 0.7125)}{80}};\ 0.7125 + 1.645\sqrt{\frac{0.7125(1 - 0.7125)}{80}}\right) = (0.6293; 0.7957)$$
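The large-sample interval for a proportion used in Exercises 18 to 20 can be sketched as one function. The name `proportion_ci` is hypothetical, and the critical value is a table value supplied by the caller.

```python
from math import sqrt

def proportion_ci(successes, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    me = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

low, high = proportion_ci(57, 80, 1.645)  # Exercise 20, 90% level
print(round(low, 4), round(high, 4))
```

Calling `proportion_ci(470, 1000, 1.96)` or `proportion_ci(750, 973, 2.575)` reproduces the intervals of Exercises 18 and 19 in the same way.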
Exercise 21
A simple random sample of 1000 mobile users is selected and data are collected about the brand of their mobile and
their monthly mobile expenses in €. The following table shows the results:
Brand            Number of observations   Monthly expenses (€)
Nokia                     367                    25.53
Samsung                   215                    28.78
Sony Ericsson              99                    23.44
Apple                     159                    37.45
Others                    160                    24.01
a) Build a 90% confidence interval for the proportion of Sony Ericsson mobile users.
b) Is the width of the calculated interval larger than that for the proportion of Nokia users, considering the same
level of confidence? Justify your answer without any calculations.
a) The available data for this calculation are: n = 1000, $\hat{p} = 99/1000 = 0.099$, $z_{0.05} = 1.645$
$$\hat{p} \approx N\left[p;\ \frac{p(1 - p)}{n}\right] \text{ if } np(1 - p) > 5$$
The true proportion (p) is unknown, so we can use the sample proportion ($\hat{p}$), which is a good
approximation: 1000 · 0.099(1 - 0.099) = 89.199 > 5. So it is possible to use the Normal
approximation. The estimated standard error of the sample proportion is:
$$\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.0094$$
So the 90% confidence interval is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.099 \pm 1.645 \cdot 0.0094 \rightarrow (0.0835; 0.1145)$$
b) Considering the same level of confidence the width of the interval for the proportion of Nokia users
will be larger than the one calculated before, since the sample proportion of Nokia users is closer to
0.5 and, therefore, its estimated standard error is bigger.
Exercise 22
We want to verify whether the passengers like a recent modification to a train timetable. 1000 persons are interviewed
and 200 state that they are in favor of the new timetable.
a) Determine a 95% confidence interval for the percentage of passengers that are in favor of the new timetable.
b) Supposing that we want to obtain a confidence interval with a width not larger than 0.03, what is the minimal
necessary sample size?
a) The (asymptotic) 1 - α confidence interval for the percentage of passengers that are in favor of the
new timetable is:
$$CI_{1-\alpha}(p) = \left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$$
In this case $\hat{p} = \frac{200}{1000} = 0.2$, $1 - \alpha = 0.95 \Rightarrow \frac{\alpha}{2} = 0.025 \Rightarrow z_{\alpha/2} = 1.96$, so
$CI_{1-\alpha}(p) = [0.1752, 0.2248]$.
b) The width of the confidence interval for the population proportion p is $w = 2 \cdot z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$.
Since we don't know the value of the sample proportion before observing the sample, we set it equal
to 0.5. Therefore, if we want w ≤ 0.03, we need $n \ge \left(\frac{2 \cdot 1.96 \cdot 0.5}{0.03}\right)^2 = 4268.44$. So the sample size
should be at least 4269.
Exercise 23
We want to estimate the proportion p of parents who are satisfied by the service offered at the school
cafeteria in a certain municipality. In a random sample of 350 parents, 42 have declared to be satisfied.
a) Determine the 99% Confidence Interval for the proportion p
b) Determine the sample size necessary to reduce to 0.04 the margin of error for the interval considered
above.
a) The sample size is large (n = 350 > 50). Therefore, the CI is $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$.
Since $\hat{p} = \frac{42}{350} = 0.12$, 1 - α = 0.99, α = 0.01, α/2 = 0.005, the interval is
$$0.12 \pm z_{0.005}\sqrt{\frac{0.12(1 - 0.12)}{350}} = 0.12 \pm 2.575 \cdot 0.0174 = [0.0752, 0.1648]$$
b) Because the maximum value of the standard error of the sample proportion is reached when
$\hat{p} = 0.5$, we have that:
$$ME \le 0.04 \Rightarrow 2.575\sqrt{\frac{0.5(1 - 0.5)}{n}} \le 0.04 \Rightarrow n \ge \frac{2.575^2 \cdot 0.5 \cdot 0.5}{0.04^2} = \frac{1.6577}{0.0016} = 1036.0625$$
So the necessary sample size is equal to 1037 units.
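The worst-case sample-size rule for a proportion (setting $\hat{p} = 0.5$) can be sketched as below. The function name `min_n_for_proportion` is hypothetical; the critical values are the table values used in the solutions.

```python
from math import ceil

def min_n_for_proportion(z, max_me, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= max_me; p = 0.5 is the worst case."""
    return ceil(z ** 2 * p * (1 - p) / max_me ** 2)

n_needed = min_n_for_proportion(2.575, 0.04)  # Exercise 23 b)
print(n_needed)
```

The same call with `(1.96, 0.03)` gives the answer to Exercise 13 c), and with `(1.96, 0.015)`, that is, half of the width 0.03, the answer to Exercise 22 b).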
Exercise 24
In a supermarket, 200 randomly chosen customers who used a card payment method, are interviewed. The
below output is obtained by classifying the 200 customers by age and by the answer to the following
question: “Do you usually use debit or credit card as way of payment when shopping?”
Determine a 99% confidence interval for the proportion p of supermarket card-paying customers who use a
credit card.
A confidence interval is computed in accordance with the equation (considering that n is sufficiently large):
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
In order to compute $\hat{p}$ we need to work on marginal frequencies, since we do not require any distinction between
age classes. Therefore $\hat{p} = \frac{10 + 60}{200} = 0.35$.
Exercise 25
A bookshop needs to explore the buying behavior of its customers and the following data has been collected:
EXPENDITURE Average expenditure for books (in Euros)
NR number of purchased books
AGE customer’s age
a) The point estimate for the proportion of customers who spend between 20 and 30 Euros is:
$$\hat{p} = \frac{4}{7} = 0.5714$$
And the confidence interval for the proportion, with $z_{0.05} = 1.645$ (that is, a 90% confidence level), is:
$$CI_{0.90}(p) = \left(\hat{p} - z_{0.05}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}};\ \hat{p} + z_{0.05}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$$
$$= \left(0.5714 - 1.645\sqrt{\frac{0.5714(1 - 0.5714)}{7}};\ 0.5714 + 1.645\sqrt{\frac{0.5714(1 - 0.5714)}{7}}\right) = (0.2638; 0.879)$$
In order to construct the interval, we need to use the central limit theorem, to ensure that
$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
follows (approximately) a standard normal distribution. In this case, n does not appear large enough
for the application of the central limit theorem to hold (in particular $n\hat{p}(1 - \hat{p}) < 5$), so the interval
is not reliable.
b) Using the same confidence level, in order to have a width of the interval not greater than 0.4
(taking $\hat{p}(1 - \hat{p})$ at its maximum value 0.25):
$$2ME = 2 \cdot z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \le 0.4 \text{ with } z_{\alpha/2} = z_{0.05} = 1.645 \Rightarrow n \ge \frac{1.645^2 \cdot 0.25}{0.2^2} = 16.9127$$
So we would need a sample size of 17.
Exercise 26
A publisher has to decide whether he should offer books with a transparent cover in order to prevent
potential damages. A survey has been conducted on 228 readers with the result that 187 readers would like a
transparent cover on their books. Determine the 99% confidence interval for the proportion p of readers who
like a transparent cover on the books.
Taking into account that $\hat{p} = \frac{X}{n} = \frac{187}{228} = 0.8202$ and $n\hat{p}(1 - \hat{p}) = 228 \cdot 0.8202 \cdot 0.1798 = 33.6236 > 5$, the
distribution of the sample proportion $\hat{p}$ is approximately normal. Therefore the desired interval is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.8202 \pm 2.575\sqrt{\frac{0.8202 \cdot 0.1798}{228}} = (0.7547; 0.8857)$$
where $z_{\alpha/2} = z_{0.005} = 2.575$.
Exercise 27
Consider a sample of 6 tennis clubs. Data on the number of tennis courts (X) and number of members (Y)
are reported in the following table:
X 5 6 6 10 5 7
Y 145 182 140 280 178 257
State the necessary assumption in order to compute a 90% confidence interval for the population variance
$\sigma_Y^2$, and compute such interval.
We must assume that the number of members (Y) follows a normal distribution. In this case, the 90%
confidence interval for the population variance is given by:
$$CI_{90\%}(\sigma_Y^2) = \left(\frac{(n - 1)s_Y^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n - 1)s_Y^2}{\chi^2_{n-1,1-\alpha/2}}\right) = \left(\frac{(6 - 1) \cdot 3405.6}{11.07},\ \frac{(6 - 1) \cdot 3405.6}{1.145}\right) = \left(\frac{17028}{11.07},\ \frac{17028}{1.145}\right) = (1538.2114, 14871.6157)$$
where $\bar{y} = \frac{1}{n}\sum y_i = 197$, $s_Y^2 = \frac{n}{n - 1}\left(\frac{\sum y_i^2}{n} - \bar{y}^2\right) = 3405.6$,
$\chi^2_{n-1,\alpha/2} = \chi^2_{5,0.05} = 11.07$; $\chi^2_{n-1,1-\alpha/2} = \chi^2_{5,0.95} = 1.145$
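The interval for the variance can be reproduced from the raw data. In this sketch the chi-square critical values 11.07 and 1.145 are the table values quoted in the solution and are entered by hand, since the Python standard library has no chi-square quantile function.

```python
from statistics import mean, variance

members = [145, 182, 140, 280, 178, 257]  # Y values from the exercise
n = len(members)
y_bar = mean(members)
s2 = variance(members)   # sample variance (divides by n - 1)
chi2_upper = 11.07       # chi^2_{5, 0.05}, from a table
chi2_lower = 1.145       # chi^2_{5, 0.95}, from a table
ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
print(y_bar, s2, tuple(round(v, 4) for v in ci))
```

Note that the interval is not symmetric around $s_Y^2$, because the chi-square distribution is not symmetric.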
Exercise 28
The following table contains information about the horse power and fuel consumption of a sample of 11
cars:
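a) $\bar{x} = \frac{1}{n}\sum x_i = 157.273$, $s_X^2 = \frac{n}{n - 1}\left(\frac{\sum x_i^2}{n} - \bar{x}^2\right) = 681.818$
$\bar{y} = \frac{1}{n}\sum y_i = 7.0091$, $s_Y^2 = \frac{n}{n - 1}\left(\frac{\sum y_i^2}{n} - \bar{y}^2\right) = 1.3089$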
b) We must assume that both variables, X and Y are normally distributed.
Exercise 29
We want to estimate the difference µX – µY between the average expenses of graduate and undergraduate
school students for the lunch around the university. For this purpose a sample of 5 couples of students is
selected, each couple equal for gender, economic situation, geographic origin, and the money spent (in Euro)
for the lunch in one day is collected:
a) An interval estimator for a population parameter is a rule for determining (based on sample
information) a range or an interval that is likely to include the parameter.
An interval estimator for a generic parameter θ is a range between two (random) values (A and B)
which depend on sample information, such that P(A < θ < B) = 1 - α. The quantity (1 - α) is called
the confidence level of the interval (α between 0 and 1). The interval estimator calculated in this
manner is written as A < θ < B with probability 100(1 - α)%. After observing the sample, we say
that a < θ < b with 100(1 - α)% confidence.
b) The samples are paired, so we define the variable D = X - Y (where X denotes the expense of graduate
school students and Y the expense of undergraduate students). Then,
$$\bar{d} = \frac{(8 - 7.5) + (6.5 - 6.55) + (6.5 - 5.5) + (11.5 - 10) + (4 - 4.5)}{5} = 0.49$$
$$s_d = \sqrt{\frac{(0.5 - 0.49)^2 + (-0.05 - 0.49)^2 + (1 - 0.49)^2 + (1.5 - 0.49)^2 + (-0.5 - 0.49)^2}{4}} = 0.7987$$
$$CI_{99\%} = \left[\bar{d} \pm t_{n-1,\alpha/2}\frac{s_d}{\sqrt{n}}\right] = \left[0.49 \pm 4.604\frac{0.7987}{\sqrt{5}}\right] = [0.49 \pm 4.604 \cdot 0.3572] = [-1.1546; 2.1346]$$
Since the Confidence Interval crosses 0, we cannot say that the average expenses of graduate and
undergraduate school students for the lunch around the university are different.
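The paired-sample calculation can be sketched as follows. The two lists are read off the differences written out in the solution, and 4.604 is the table value $t_{4,0.005}$ entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

graduate = [8, 6.5, 6.5, 11.5, 4]          # X, taken from the worked differences
undergraduate = [7.5, 6.55, 5.5, 10, 4.5]  # Y, taken from the worked differences
d = [x - y for x, y in zip(graduate, undergraduate)]
d_bar = mean(d)
s_d = stdev(d)    # sample standard deviation of the differences
t_crit = 4.604    # t_{4, 0.005} from a Student-t table
me = t_crit * s_d / sqrt(len(d))
ci = (d_bar - me, d_bar + me)
print(round(d_bar, 2), round(s_d, 4), tuple(round(v, 4) for v in ci))
```

Working on the differences reduces the two-sample problem to a one-sample t interval, which is why only n - 1 = 4 degrees of freedom are used.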
Exercise 30
Two different machines (an old one and a new one) are used to fill bags with flour. There is a suspicion that
the new machine produces heavier bags. It is reasonable to assume that the weight, in grams, of the bags
being filled by the two machines are described by two normal random variables (call them X and Y) with
unknown expected values and with known variances both equal to 1200. Two samples of 20 and 25 bags are
extracted at random from among the bags produced by the old and the new machine, respectively.
a) What are the expected value and the variance of the difference between the two sample means?
How is such difference distributed?
b) Assuming that $\mu_X = \mu_Y$, determine the value k such that $P(\bar{X} - \bar{Y} < k) = 0.8$.
c) The 20 bags extracted from the old machine’s production have a mean weight of 1136 grams, while
the 25 bags extracted from the new machine’s production have an average weight of 1125 grams.
Construct the 90% confidence interval for the difference in the average weights of the bags
produced by the two machines.
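a) $E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_X - \mu_Y$
Since the two samples are independent,
$$V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y} = \frac{1200}{20} + \frac{1200}{25} = 60 + 48 = 108$$
And $\bar{X} - \bar{Y} \sim N(\mu_X - \mu_Y; 108)$
b) $P(\bar{X} - \bar{Y} < k) = 0.8$
$$P\left(\frac{\bar{X} - \bar{Y} - 0}{\sqrt{108}} < \frac{k - 0}{\sqrt{108}}\right) = 0.8$$
$$P\left(Z < \frac{k}{\sqrt{108}}\right) = 0.8 \rightarrow F(z) = 0.8 \rightarrow z = 0.84$$
$$k = \sqrt{108} \times 0.84 = 10.3923 \times 0.84 = 8.7295$$
c) Using $z_{\alpha/2} = z_{0.05} = 1.645$, the 90% confidence interval is
$$(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{108} = (1136 - 1125) \pm 1.645 \cdot 10.3923 = 11 \pm 17.0953 \rightarrow (-6.0953; 28.0953)$$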
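The three parts can be checked with `statistics.NormalDist`. This is a sketch; because the quantiles are computed rather than rounded to 0.84 and 1.645, the results differ slightly from the hand calculation in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

# a) variance of the difference of two independent sample means
var_diff = 1200 / 20 + 1200 / 25
sd_diff = sqrt(var_diff)

# b) k such that P(Xbar - Ybar < k) = 0.8 when mu_X = mu_Y
k = NormalDist(mu=0, sigma=sd_diff).inv_cdf(0.8)

# c) 90% interval for mu_X - mu_Y with known variances
z = NormalDist().inv_cdf(0.95)
diff = 1136 - 1125
ci = (diff - z * sd_diff, diff + z * sd_diff)
print(var_diff, round(k, 4), tuple(round(v, 4) for v in ci))
```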
Exercise 31
There are two donut stands in a public square. We randomly pick 30 people who bought donuts from stand
1 and 40 who bought them from stand 2. The 30 donuts from stand 1 have an average weight of 140.3 gr,
with a standard deviation of 16.4 gr., while the 40 donuts from stand 2 have an average weight of 149.2 gr,
with a standard deviation of 21.2 gr. After stating the necessary assumptions, build a 95% confidence
interval for the difference between the average weights of the donuts of the two stands.
First, we notice that the two samples are independent and both variances are unknown. Therefore, in order to
calculate the confidence interval for the difference between the two average weights we need to assume that
the two variances are equal.
The size of the samples is sufficiently large to apply the Central Limit Theorem, thus granting normality of
the distribution of the sample means without additional assumptions.
We calculate the 95% confidence interval as follows:
$$CI_{95\%}(\mu_X - \mu_Y) = \left[(\bar{x} - \bar{y}) - t_{n_X + n_Y - 2;\alpha/2} \cdot s_p \cdot \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}},\ (\bar{x} - \bar{y}) + t_{n_X + n_Y - 2;\alpha/2} \cdot s_p \cdot \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}\right]$$
Given the large number of degrees of freedom we can approximate the Student t-distribution by a normal
distribution:
$$t_{n_X + n_Y - 2;\alpha/2} = t_{68;0.025} \cong z_{0.025} = 1.96$$
Eventually, we get:
$$CI_{95\%}(\mu_X - \mu_Y) = \left[(140.3 - 149.2) \pm 1.96 \cdot 19.2995 \cdot \sqrt{\frac{1}{30} + \frac{1}{40}}\right] = [-18.0361; 0.2361]$$
Exercise 32
A researcher wants to demonstrate that a particular blood parameter is different for smokers and
non-smokers. Two random samples are selected: the first one of 10 smokers, the second one of 10 non-smokers.
The particular blood parameter is measured for both samples, with the following results:
              Sample mean   Sample standard deviation
Non-smokers      90.7                5.4
Smokers          87.2                4.8
Let µS - µNS be the difference between the average values of the blood parameter in the two populations of
smokers and non-smokers.
a) Which assumptions must we make in order to build a confidence interval for µS – µNS ?
b) Calculate a 95% confidence interval for the difference between the two means.
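a) The researcher must assume that the populations follow a normal distribution, with means µS and
µNS and common variance σ².
b) The confidence interval is
\[
CI_{1-\alpha}(\mu_S - \mu_{NS}) = \left( \bar{x}_S - \bar{x}_{NS} - t_{n_S+n_{NS}-2,\,\alpha/2} \sqrt{\frac{s_p^2}{n_S} + \frac{s_p^2}{n_{NS}}}\ ;\ \bar{x}_S - \bar{x}_{NS} + t_{n_S+n_{NS}-2,\,\alpha/2} \sqrt{\frac{s_p^2}{n_S} + \frac{s_p^2}{n_{NS}}} \right) = \bar{x}_S - \bar{x}_{NS} \pm ME,
\]
where
\[
ME = t_{n_S+n_{NS}-2,\,\alpha/2} \sqrt{\frac{s_p^2}{n_S} + \frac{s_p^2}{n_{NS}}}.
\]
\[
\bar{x}_S - \bar{x}_{NS} = 87.2 - 90.7 = -3.5
\]
\[
s_p^2 = \frac{(n_S-1)\,s_S^2 + (n_{NS}-1)\,s_{NS}^2}{n_S+n_{NS}-2} = \frac{9 \cdot 4.8^2 + 9 \cdot 5.4^2}{10 + 10 - 2} = 26.1
\]
\[
\alpha/2 = 0.025\ ;\quad t_{18,\,0.025} = 2.101\ ;\quad ME = 2.101 \sqrt{\frac{26.1}{10} + \frac{26.1}{10}} = 4.8002
\]
Hence, $CI_{0.95}(\mu_S - \mu_{NS}) = (-3.5 - 4.8002\ ;\ -3.5 + 4.8002) = (-8.3002\ ;\ 1.3002)$.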
Exercise 33
In order to verify that a fish-based diet can reduce cholesterol levels, the following experiment has been run: two
independent samples of 100 men each have been selected, with the same initial average level of cholesterol.
Individuals in the first group followed, for 6 months, a diet with a reduced amount of fish; individuals in the second
one, during the same period of time, followed a “fish-based diet”. Cholesterol levels, after the diet, for the two
samples are reported below:
                      Sample mean   Sample variance
"Reduced fish diet"   210.1         37.4
"Fish-based diet"     196.8         33.5
a) Which assumptions must we make in order to build a confidence interval for the difference
between the two means?
b) Calculate a 95% confidence interval for the difference between the two means.
a) It is necessary to assume that the samples come from populations that have means µPP (reduced fish
diet) and µMP (fish-based diet) and a common variance σ².
b) Given that the sample sizes are large, we can use the normal approximation to the Student-t
distribution, thus replacing the critical value $t_{n_{PP}+n_{MP}-2,\,\alpha/2}$ with $z_{\alpha/2}$. Therefore, the confidence interval
becomes:
\[
CI_{1-\alpha}(\mu_{PP} - \mu_{MP}) = (\bar{x}_{PP} - \bar{x}_{MP} - ME\ ;\ \bar{x}_{PP} - \bar{x}_{MP} + ME)\ ;\quad ME = z_{\alpha/2} \sqrt{\frac{s_p^2}{n_{PP}} + \frac{s_p^2}{n_{MP}}}
\]
\[
s_p^2 = \frac{(n_{PP}-1)\,s_{PP}^2 + (n_{MP}-1)\,s_{MP}^2}{n_{PP}+n_{MP}-2} = \frac{99 \cdot 37.4 + 99 \cdot 33.5}{100 + 100 - 2} = 35.45
\]
\[
\alpha/2 = 0.025\ ;\quad z_{0.025} = 1.96\ ;\quad ME = 1.96 \sqrt{\frac{35.45}{100} + \frac{35.45}{100}} = 1.96 \cdot 0.842 = 1.6504
\]
therefore $CI_{0.95}(\mu_{PP} - \mu_{MP}) = (13.3 - 1.6504\ ;\ 13.3 + 1.6504) = (11.6496\ ;\ 14.9504)$.
Exercise 34
The punctuality of the employees of a firm is controlled and, over two random samples, one of employees with
permanent contract (X) and the other one of freelancers (Y), the following values (in minutes) have been recorded: a
positive value denotes a late arrival while a negative value denotes an arrival in advance.
X -0.23 -0.83 0.63 1.26 0.7 0.01 -2.27
Y 0.94 -1.42 1.87 1.41 -0.34 0.65
Which assumptions about the two populations have to be made in order to be able to determine a confidence
interval for the difference of the means, µX - µY? Determine the 90% confidence interval for the difference of
the means, µX - µY.
The two populations are independent. Given the small number of observations, in order to determine a
confidence interval for the difference between the means it is necessary to assume that the two
populations are distributed according to Normal distributions, $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, with
unknown means and unknown but equal variances, $\sigma_X^2 = \sigma_Y^2 = \sigma^2$.
The $1-\alpha$ confidence interval for the difference between the means is:
\[
CI_{1-\alpha}(\mu_X - \mu_Y) = (\bar{x} - \bar{y}) \pm t_{n_X+n_Y-2,\,\alpha/2} \sqrt{\frac{s_p^2}{n_X} + \frac{s_p^2}{n_Y}}
\]
In our case,
\[
n_X = 7;\quad n_Y = 6;\quad \bar{x} = \frac{1}{7}\sum_{i=1}^{7} x_i = -0.1043;\quad \bar{y} = \frac{1}{6}\sum_{i=1}^{6} y_i = 0.5183;\quad \bar{x} - \bar{y} = -0.6226
\]
\[
s_X^2 = \frac{7}{6}\left[\frac{1}{7}\sum_{i=1}^{7} x_i^2 - \bar{x}^2\right] = 1.3822;\quad
s_Y^2 = \frac{6}{5}\left[\frac{1}{6}\sum_{i=1}^{6} y_i^2 - \bar{y}^2\right] = 1.4623;\quad
s_p^2 = \frac{(n_X-1)\,s_X^2 + (n_Y-1)\,s_Y^2}{n_X+n_Y-2} = 1.4186
\]
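The remaining step, plugging these quantities into the interval formula, can be sketched in Python directly from the raw data. The critical value $t_{11,\,0.05} = 1.796$ is an assumed table value (it is not quoted in the solution), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

x = [-0.23, -0.83, 0.63, 1.26, 0.7, 0.01, -2.27]  # permanent contract (X)
y = [0.94, -1.42, 1.87, 1.41, -0.34, 0.65]        # freelancers (Y)

n_x, n_y = len(x), len(y)
diff = mean(x) - mean(y)

# statistics.variance divides by n - 1, matching s_X^2 and s_Y^2 in the text.
sp2 = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)

# Assumed table value: Student-t, 11 degrees of freedom, upper tail 0.05.
T_CRIT = 1.796

margin = T_CRIT * sqrt(sp2 / n_x + sp2 / n_y)
lower, upper = diff - margin, diff + margin
print(f"diff = {diff:.4f}, sp2 = {sp2:.4f}, CI = ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the 90% interval comes out at roughly (-1.81, 0.57), which contains zero.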
Exercise 35
You want to evaluate the savings on heating expenses of an apartment due to the installation of new
windows. In order to do so, you randomly select 5 flats of the same size where the windows have just been
substituted. You find the results below by collecting the monthly heating expenses for each flat (in euro),
before and after the substitution of the windows:
Flat Expense before substitution Expense after substitution
1 180.72 120.42
2 140.56 96.46
3 175.34 184.24
4 168.56 118.16
5 163.87 123.77
a) Is it necessary to formulate some preliminary assumptions in order to determine the confidence
interval for the difference between means? If yes, please specify which assumptions.
b) Determine the 90% confidence interval for the difference between the average expenditure before
and after the windows’ substitution.
c) In a generic problem of interval estimation, define what is meant by “confidence level for an interval
estimate”.
a) Yes, we have to assume that the 5 pairs of dependent observations are extracted from two
populations with normal distributions.
b) We are dealing with paired samples, so we calculate the new variable D as the difference in expense
before and after the substitution.
Flat   Expense before substitution   Expense after substitution   Difference (d_i = x_i − y_i)
1      180.72                        120.42                       60.3
2      140.56                        96.46                        44.1
3      175.34                        184.24                       -8.9
4      168.56                        118.16                       50.4
5      163.87                        123.77                       40.1
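\[
\bar{d} = 37.2;\quad s_d^2 = 722.27;\quad s_d = 26.8751
\]
\[
1 - \alpha = 0.9;\quad \alpha/2 = 0.05;\quad t_{n-1,\,\alpha/2} = t_{4,\,0.05} = 2.132
\]
The 90% confidence interval is given by:
\[
\bar{d} - t_{n-1,\,\alpha/2} \frac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\,\alpha/2} \frac{s_d}{\sqrt{n}}
\]
\[
37.2 - t_{4,\,0.05} \frac{26.8751}{\sqrt{5}} < \mu_d < 37.2 + t_{4,\,0.05} \frac{26.8751}{\sqrt{5}}
\]
\[
11.5757 < \mu_d < 62.8243
\]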
c) The “confidence level for an interval estimate” (1 – α, with α between 0 and 1) represents the
degree of reliability of the interval. If we repeatedly extract samples from the population and
compute the interval from each one, the observed interval will contain the true value of the parameter
in 100(1 – α)% of cases, and will not contain it in the remaining 100α% of cases.
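The repeated-sampling reading of the confidence level in part c) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population parameters (`MU`, `SIGMA`), the sample size and the number of replications are hypothetical, and the interval used is the known-variance z-interval rather than the paired t-interval of part b), so that only the standard library is needed.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA, N = 50.0, 10.0, 25  # hypothetical normal population and sample size
ALPHA = 0.05                   # 95% confidence level
REPS = 5000                    # number of independent samples drawn

z = NormalDist().inv_cdf(1 - ALPHA / 2)
half_width = z * SIGMA / sqrt(N)  # sigma is treated as known, hence z and not t

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    centre = sum(sample) / N
    # Count the samples whose interval contains the true mean.
    if centre - half_width < MU < centre + half_width:
        hits += 1

coverage = hits / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The printed share should be close to 0.95: each individual interval either contains the true mean or does not, and the confidence level describes the long-run frequency over repeated samples.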
Exercise 36
The manager of a chain of real estate agents wants to know, among young couples, how many are willing to
buy an apartment in the next few years. To this purpose, he decides that it is appropriate to extract a sample
of young couples.
a) In order to obtain a 99% confidence interval for the proportion of young couples who intend to buy
an apartment in the next few years with a margin of error not larger than 0.03 (3%), what is the
minimum number of young couples that the manager must extract?
b) In an interview of 2000 young couples, randomly selected, 945 stated that they are intending to buy
an apartment in the next few years. Construct a 99% confidence interval for the proportion of young
couples intending to buy an apartment in the next few years.
A training course is now organized for some real estate agents of the chain, in order to increase the weekly
sales (in thousands of €). Taking a random sample of six agencies, the levels of sales in the week before and
in the week after the course have been measured, with the following results:
# agent   Sales before (x_i)   Sales after (y_i)
1         2.7                  2.7
2         2.5                  2.9
3         2.2                  2.3
4         2.1                  2.1
5         2.6                  2.9
6         2.7                  2.8
c) After listing the necessary assumptions, construct a 95% confidence interval for the difference
between the average sales.
a) In order to have a margin of error not larger than 0.03, we must solve for n the equation
\[
z_{0.005} \cdot \sqrt{\frac{0.5 \cdot 0.5}{n}} = 0.03
\]
and round up to find the minimum n:
\[
n = \frac{2.576^2 \cdot 0.25}{0.03^2} = \frac{6.6358 \cdot 0.25}{0.0009} = 1843.2778 \ \rightarrow\ n^* = 1844
\]
b) In order to construct a confidence interval, the condition $n \cdot \hat{p} \cdot (1 - \hat{p}) > 5$ must hold. In the exercise,
\[
\hat{p} = \frac{945}{2000} = 0.4725, \quad n \cdot \hat{p} \cdot (1 - \hat{p}) = 498.4875 > 5.
\]
\[
\begin{aligned}
CI_{99\%}(p) &= \left( \hat{p} \pm z_{0.005} \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right)
= \left( 0.4725 \pm 2.576 \cdot \sqrt{\frac{0.4725 \cdot 0.5275}{2000}} \right) \\
&= \left( 0.4725 \pm 2.576 \cdot \sqrt{\frac{0.2492}{2000}} \right)
= (0.4725 \pm 2.576 \cdot 0.0112)
= (0.4725 \pm 0.0288)
= (0.4437\ ;\ 0.5013)
\end{aligned}
\]
c) In order to construct a confidence interval for the difference of the means we have to assume that the
populations X = “level of weekly sales before the course” and Y = “level of weekly sales after the
course” have a Normal distribution.
We have to construct a confidence interval for the difference of the means, with dependent samples.
Let $d_i = x_i - y_i$; then the confidence interval is the following:
\[
CI_{95\%}(\mu_d) = \left( \bar{d} \pm t_{n-1;\,0.025} \cdot \frac{s_d}{\sqrt{n}} \right)
\]
# agent xi yi di= xi - yi
1 2.7 2.7 0
2 2.5 2.9 -0.4
3 2.2 2.3 -0.1
4 2.1 2.1 0
5 2.6 2.9 -0.3
6 2.7 2.8 -0.1
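\[
\bar{d} = -0.15
\]
\[
s_d = \sqrt{\frac{6}{5}\left[\frac{1}{6}\sum_{i=1}^{6} d_i^2 - \bar{d}^2\right]} = \sqrt{\frac{6}{5}\,[0.045 - 0.0225]} = \sqrt{\frac{6}{5} \cdot 0.0225} = \sqrt{0.027} = 0.1643
\]
\[
\begin{aligned}
CI_{95\%}(\mu_d) &= \left( -0.15 \pm 2.571 \cdot \frac{0.1643}{\sqrt{6}} \right) \\
&= (-0.15 \pm 2.571 \cdot 0.0671) \\
&= (-0.15 \pm 0.1725) \\
&= (-0.3225\ ;\ 0.0225)
\end{aligned}
\]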
Note: If the roles of the variables X and Y are reversed, the confidence interval is
(−0.0225; 0.3225).
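The three parts of this exercise can be reproduced with a short Python sketch. The critical values are the table values quoted in the solution, the data for part c) are taken from the table of differences above, and the variable names are illustrative.

```python
from math import ceil, sqrt

Z_CRIT = 2.576  # z_{0.005}, as used in the text
T_CRIT = 2.571  # t_{5, 0.025}, as used in the text

# a) Smallest n with z * sqrt(0.25 / n) <= 0.03 (worst case p = 0.5).
n_min = ceil(Z_CRIT**2 * 0.25 / 0.03**2)

# b) 99% interval for the proportion: 945 "yes" answers out of 2000.
p_hat = 945 / 2000
me_p = Z_CRIT * sqrt(p_hat * (1 - p_hat) / 2000)
ci_p = (p_hat - me_p, p_hat + me_p)

# c) Paired interval on d_i = x_i - y_i (sales before minus sales after).
before = [2.7, 2.5, 2.2, 2.1, 2.6, 2.7]
after = [2.7, 2.9, 2.3, 2.1, 2.9, 2.8]
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
s_d = sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
me_d = T_CRIT * s_d / sqrt(n)
ci_d = (d_bar - me_d, d_bar + me_d)

print(n_min)
print(tuple(round(v, 4) for v in ci_p))
print(tuple(round(v, 4) for v in ci_d))
```

Part a) uses p = 0.5 because the product p(1 − p) is largest there, so the resulting n is sufficient whatever the true proportion turns out to be.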
Exercise 37
The table below refers to a sample of 10 products sold by a food company:
PRODUCT   PRODUCT        MARKET        SALES (Y)            ADVERTISING CAMPAIGNS (X1)   PRICE (X2)
          CATEGORY (Q)   SUCCESS (Z)   (millions of euro)   (thousands of euro)          (euro)
1         A              HIGH          4                    475                          2
2         A              LOW           2.5                  350                          2.6
3         A              MEDIUM        3.1                  425                          2.4
4         A              MEDIUM        2.9                  325                          2.6
5         A              HIGH          5.5                  555                          1.4
6         B              LOW           1.6                  550                          2
7         B              MEDIUM        2.8                  440                          2.8
8         B              LOW           1.8                  310                          3.5
9         B              LOW           1.6                  220                          3.9
10        B              HIGH          7.2                  670                          1.8
Total                                  33                   4320                         25
We are interested in comparing the average of the variable PRICE (X2) between the two Product categories
represented by the variable Q. Calculate the 99% confidence interval for the difference between the average
prices of the two populations. Write down the assumptions needed to build the confidence interval. (HINT:
the sample variance for price in product category B is equal to 0.835).
Denoting with the subscripts A and B the quantities for the two product categories, we have:
\[
n_A = 5, \quad n_B = 5,
\]
\[
\bar{x}_{2,A} = \frac{2 + 2.6 + 2.4 + 2.6 + 1.4}{5} = 2.2, \quad \bar{x}_{2,B} = \frac{2 + 2.8 + 3.5 + 3.9 + 1.8}{5} = 2.8
\]
We need to find an estimate of the common variance of the two populations, that is $s_p^2$, the pooled
variance. To do so, we need the sample variance of Price (X2) for product category A, which is
\[
\begin{aligned}
s^2_{X_2,A} &= \frac{1}{n_A - 1} \sum_{i=1}^{5} (x_{2,A,i} - \bar{x}_{2,A})^2 = \frac{1}{n_A - 1} \left( \sum_{i=1}^{5} x_{2,A,i}^2 - n_A \cdot \bar{x}_{2,A}^2 \right) \\
&= \frac{1}{4} \left( (2^2 + 2.6^2 + 2.4^2 + 2.6^2 + 1.4^2) - 5 \cdot 2.2^2 \right) = 0.260
\end{aligned}
\]
and the one for product category B, which is given in the text: $s^2_{X_2,B} = 0.835$.
At this point we can compute the pooled variance:
\[
s_p^2 = \frac{(n_A - 1)\,s^2_{X_2,A} + (n_B - 1)\,s^2_{X_2,B}}{n_A + n_B - 2} = \frac{4 \cdot 0.26 + 4 \cdot 0.835}{8} = 0.5475
\]
\[
CI_{99\%}(\mu_A - \mu_B) = \left( (\bar{x}_{2,A} - \bar{x}_{2,B}) \pm t_{8,\,0.005} \sqrt{\frac{s_p^2}{n_A} + \frac{s_p^2}{n_B}} \right) = \left( -0.6 \pm 3.355 \sqrt{\frac{0.5475}{5} + \frac{0.5475}{5}} \right) = (-2.17,\ 0.97)
\]
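The same computation can be sketched in Python starting from the table rows, which also checks the value 0.835 given in the hint. The `records` list and the grouping dictionary are illustrative constructs, and the critical value is the table value used in the solution.

```python
from math import sqrt
from statistics import mean, variance

# (category, price) pairs copied from the table.
records = [
    ("A", 2.0), ("A", 2.6), ("A", 2.4), ("A", 2.6), ("A", 1.4),
    ("B", 2.0), ("B", 2.8), ("B", 3.5), ("B", 3.9), ("B", 1.8),
]

# Group the prices by product category.
prices = {}
for category, price in records:
    prices.setdefault(category, []).append(price)

a, b = prices["A"], prices["B"]
var_a, var_b = variance(a), variance(b)  # n - 1 denominator
sp2 = ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)

T_CRIT = 3.355  # t_{8, 0.005}, the table value used in the text
diff = mean(a) - mean(b)
margin = T_CRIT * sqrt(sp2 / len(a) + sp2 / len(b))
lower, upper = diff - margin, diff + margin

print(f"var_A = {var_a:.3f}, var_B = {var_b:.3f}, sp2 = {sp2:.4f}")
print(f"CI = ({lower:.2f}, {upper:.2f})")
```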
Exercise 38
Considering a sample of 8 business agents working for the same company, we recorded their gross annual
income in 2008 and 2011:
Agents INCOME 2011 INCOME 2008
1 38 39
2 42 40
3 35 35
4 48 46
5 51 46
6 50 45
7 38 36
8 36 36
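With $d_i$ the difference between the 2011 and the 2008 income of agent $i$:
\[
\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{15}{8} = 1.875
\]
\[
s_d^2 = \frac{n}{n-1} \left( \frac{\sum_{i=1}^{n} d_i^2}{n} - \bar{d}^2 \right) = \frac{8}{7} \left( \frac{63}{8} - (1.875)^2 \right) = 4.9821
\]
\[
s_d = \sqrt{s_d^2} = 2.2321
\]
\[
t_{n-1,\,\alpha/2} = t_{7,\,0.025} = 2.365
\]
Therefore,
\[
CI_{0.95} = \left[ 1.875 - 2.365\,\frac{2.2321}{\sqrt{8}}\ ;\ 1.875 + 2.365\,\frac{2.2321}{\sqrt{8}} \right] = [0.0086\ ;\ 3.7414]
\]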
In order to decrease the width of the interval you can decrease the level of confidence (increase α) or
increase n, thus decreasing the margin of error. This follows because the confidence interval for the
difference between the means is $\bar{d} \pm ME$, where $ME = t_{n-1,\,\alpha/2}\, \frac{s_d}{\sqrt{n}}$ is the margin of error.
c) Note that ME depends on three factors: the confidence level, the sample size and the variability of the
differences. The greater the sample size, the lower the standard error, and thus the smaller the margin of
error and the width of the interval. For a fixed standard error, instead, ME depends on the confidence
level: the smaller the confidence level, the smaller $t_{n-1,\,\alpha/2}$ will be, and consequently the narrower
the interval. The variability of the differences depends on the sample observations and thus
cannot be controlled.
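The dependence of ME on the confidence level and on the sample size can be made concrete with a minimal Python sketch. The value $t_{7,\,0.05} = 1.895$ is an assumed table value not quoted in the solution, the sample size n = 32 is hypothetical, and `margin_of_error` is an illustrative helper.

```python
from math import sqrt

S_D = 2.2321  # standard deviation of the differences, from the solution


def margin_of_error(crit, s_d, n):
    """ME = critical value * s_d / sqrt(n)."""
    return crit * s_d / sqrt(n)


# Effect of the confidence level, with n = 8 held fixed.
me_95 = margin_of_error(2.365, S_D, 8)  # t_{7, 0.025}, from the solution
me_90 = margin_of_error(1.895, S_D, 8)  # t_{7, 0.05}, assumed table value

# Effect of the sample size on the standard error alone
# (s_d is held fixed here purely for illustration).
se_n8 = S_D / sqrt(8)
se_n32 = S_D / sqrt(32)  # hypothetical larger sample

print(f"ME at 95%: {me_95:.4f}, ME at 90%: {me_90:.4f}")
print(f"standard error with n = 8: {se_n8:.4f}, with n = 32: {se_n32:.4f}")
```

Lowering the confidence level from 95% to 90% shrinks ME through the critical value, while quadrupling n halves the standard error. With a larger n the critical value would also decrease slightly, since the degrees of freedom grow.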