Chapter 8 Sampling and Estimation


TAR UC – C.P.U.S.

Foundation in Computing & Accounting – Statistics (FPMA1014)

Chapter 8: Sampling and Estimation

At the end of this chapter, students should be able to:

• understand the distinction between a sample and a population.
• understand how to select a random sample from a population.
• appreciate the benefits of choosing a random sample.
• recognize that a sample mean can be regarded as a random variable, and use the facts that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.
• use the fact that $\bar{X}$ has a normal distribution if X has a normal distribution.
• understand the meaning of the central limit theorem and be able to use it in calculations.

8.1 Sampling

A random sample is a subset taken from a population in such a way that every item or
object in the population has an equal chance of being selected.

| | Population | Sample |
|---|---|---|
| Definition | The complete collection of measurements, objects, or individuals under study. | A portion or subset taken from a population. [A sample frame is a list of the entire population from which items can be selected to form a sample.] |
| Descriptive measure | A parameter is a number that describes a population characteristic. | A statistic is a number that describes a sample characteristic. |
| Symbols | Population size = N (the number of elements in the population); population mean = µ (the average value in the population); population standard deviation = σ; population variance = $\sigma^2$ | Sample size = n (the number of elements in the sample); sample mean = $\bar{x}$ (the average value in the sample); sample standard deviation = s; sample variance = $s^2$ |
| Advantages | Accurate (precise); reliable | Saves resources (e.g. cost, time, manpower); faster output |
| Survey | Census (every member of the population is surveyed) | Poll (only a portion of the whole population is surveyed) |
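
The idea of selecting a simple random sample from a sample frame can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the original notes: the population of 500 student IDs, the marks attached to them, and the seed are all made-up assumptions.

```python
import random
import statistics

# Hypothetical population: marks of N = 500 students (illustrative data only).
rng = random.Random(2024)                 # fixed seed so the run is repeatable
frame = list(range(1, 501))               # sample frame: one ID per population member
marks = {sid: rng.gauss(60, 12) for sid in frame}

N = len(frame)                            # population size
n = 40                                    # sample size

# sample() draws without replacement; every ID has the same chance of selection.
sample_ids = rng.sample(frame, n)

mu = statistics.mean(list(marks.values()))                 # parameter: population mean
x_bar = statistics.mean([marks[s] for s in sample_ids])    # statistic: sample mean

print(f"N = {N}, n = {n}")
print(f"population mean = {mu:.2f}, sample mean = {x_bar:.2f}")
```

The sample mean changes from one random sample to the next, while the population mean stays fixed. This is why a statistic can be treated as a random variable.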

Note: Statistics is the science of designing studies, gathering data, and then classifying,
summarizing, interpreting, and presenting these data to support the decisions that are
needed. (Refer to the subject of study)

8.1.1 Sampling Distribution of Sample Mean and Sample Variance

• A sampling distribution is the probability distribution of a statistic obtained through a large number of samples drawn from a specific population.
• The sampling distribution of the sample means is the distribution of the arithmetic means of all the possible random samples of size n that could be selected from a given population.
• For the sampling distribution of the sample means:

  Case 1: If X is normally distributed, $X \sim N(\mu, \sigma^2)$, then $\bar{X}$ is also normally distributed, $\bar{X} \sim N(\mu, \sigma^2/n)$, so that $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.

  Case 2: If X is not normally distributed, then the Central Limit Theorem is applied, i.e. $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, provided that n is large (n ≥ 30). Thus $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.

• The standard deviation of the sampling distribution of sample means is called the standard error of the mean and is represented by $\sigma_{\bar{X}}$ or $s_{\bar{x}}$.
• The standard error of the mean (also called the standard error or sampling error), $s_{\bar{x}} = \sqrt{\sigma^2/n} = \sigma/\sqrt{n}$, measures the typical deviation of a sample mean from the mean of the population.

Central Limit Theorem

• The mean of the sampling distribution is equal to the population mean: $\mu_{\bar{x}} = \mu$.
• The standard deviation of the sampling distribution of means is equal to $\sigma/\sqrt{n}$: $s_{\bar{x}} = \sigma/\sqrt{n}$.
• The sampling distribution is asymptotically normal, i.e. approximately normal when n is sufficiently large [n ≥ 30].

Note: If the population is normally distributed, the sampling distribution is normal
regardless of the sample size.
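
The Central Limit Theorem can be checked with a small simulation. The sketch below is an illustration rather than part of the original notes. It assumes an exponential population with µ = 1 and σ² = 1 (clearly non-normal), samples of size n = 30, and 5000 repeated samples; all of these choices are arbitrary.

```python
import random
import statistics

rng = random.Random(1)    # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent draws from an exponential population (mu = 1, sigma^2 = 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 30            # size of each sample
repeats = 5000    # number of samples drawn

means = [sample_mean(n) for _ in range(repeats)]

mean_of_means = statistics.fmean(means)       # should be close to mu = 1
var_of_means = statistics.pvariance(means)    # should be close to sigma^2 / n = 1/30

print(f"mean of sample means     = {mean_of_means:.4f} (theory: 1.0000)")
print(f"variance of sample means = {var_of_means:.4f} (theory: {1 / n:.4f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed.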


Example 1
At a college, the masses of the male students can be modelled by a normal distribution
with mean mass 70 kg and standard deviation 5.6 kg.
a) Find the probability that a randomly chosen male student has mass more than 73 kg.
b) Find the probability that 6 randomly chosen male students have a mean mass of more
than 73 kg.
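
One possible numerical check of Example 1, sketched with Python's standard-library `statistics.NormalDist`. It follows the result $\bar{X} \sim N(\mu, \sigma^2/n)$ from Section 8.1.1; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 70.0, 5.6      # population mean and standard deviation (kg)

# a) one student: X ~ N(70, 5.6^2)
p_single = 1 - NormalDist(mu, sigma).cdf(73)

# b) mean of n = 6 students: X-bar ~ N(70, 5.6^2 / 6)
n = 6
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(73)

print(f"P(X > 73)     = {p_single:.4f}")
print(f"P(X-bar > 73) = {p_mean:.4f}")
```

The second probability is much smaller than the first because the standard error $\sigma/\sqrt{n}$ is smaller than σ, so sample means cluster more tightly around 70 kg than individual masses do.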


Example 2
a) The mass of a randomly chosen 15-year-old male student at a large secondary
school may be modelled by a normal distribution with mean 55 kg and standard
deviation 2.2 kg. Four students are chosen at random from this group. Calculate the
probability that the mean mass of the four students is
(i) less than 58 kg (ii) between 52 kg and 57.5 kg.
b) A second sample of size n is chosen from the 15-year-old male students. How
large does n have to be for there to be a 99.79% chance that the mean mass of the
sample is greater than 53 kg?
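
A hedged sketch of how Example 2 could be checked numerically. Part (b) is rearranged as: $P(\bar{X} > 53) \ge 0.9979$ requires $(55 - 53)/(2.2/\sqrt{n}) \ge z$ with Φ(z) = 0.9979, so $n \ge (2.2z/2)^2$. The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma = 55.0, 2.2          # population mean and standard deviation (kg)

# a) mean of 4 students: X-bar ~ N(55, 2.2^2 / 4)
x_bar_dist = NormalDist(mu, sigma / sqrt(4))
p_less_58 = x_bar_dist.cdf(58)
p_between = x_bar_dist.cdf(57.5) - x_bar_dist.cdf(52)

# b) smallest n with P(X-bar > 53) >= 0.9979
z = NormalDist().inv_cdf(0.9979)              # z such that Phi(z) = 0.9979
n_min = ceil((z * sigma / (mu - 53)) ** 2)    # round up to a whole number of students

print(f"(i)  P(X-bar < 58)        = {p_less_58:.4f}")
print(f"(ii) P(52 < X-bar < 57.5) = {p_between:.4f}")
print(f"b)   smallest n           = {n_min}")
```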


8.2 Estimation
This part looks at what you can deduce about a population from a sample. When you
have completed it you should:
• understand the term ‘unbiased’ with reference to an estimator of the mean or variance of a population, and be able to calculate unbiased estimates of the population mean and variance from a sample
• be able to determine a confidence interval for a population mean in the case when the population is normally distributed or where a large sample is used
• be able to determine, from a large sample, an approximate confidence interval for a population proportion.

Unbiased Estimator
• An estimate is a specific value or quantity obtained for a statistic such as the sample mean, sample percentage or sample variance.
• An estimator is any statistic (sample mean, sample percentage, sample variance) that is used to estimate a parameter.
• An unbiased estimator is one whose sampling distribution has a mean equal to the population parameter being estimated.
• Estimation is the entire process of using an estimator to produce an estimate of the parameter.
• There are two types of estimation:
  - point estimation
  - interval estimation


8.2.1 Point Estimation

• A point estimate is a single number, based on a random sample, used to estimate a population parameter; the process of estimating with a single number is known as point estimation.

• Point estimates of population parameters:

a) unbiased estimate of the population mean (µ), $\hat{\mu}$:

   $\hat{\mu} = \bar{x} = \frac{\sum x}{n}$

b) unbiased estimate of the population variance (σ²), $\hat{\sigma}^2$:

   $\hat{\sigma}^2 = \frac{n}{n-1}\, s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{1}{n-1}\left( \sum x^2 - \frac{(\sum x)^2}{n} \right)$,

   where $s^2$ is the variance of the sample (calculated with divisor n).


Example 3
a) Nine CDs were played and the playing time of each CD was recorded. The times, in
minutes, are given below.
49, 56, 55, 68, 61, 57, 61, 52, 63
Find the mean playing time of the nine CDs and the variance of the playing times.

b) A student was doing a project on the playing times of CDs. She wished to estimate
the mean playing time for CDs sold throughout the country and she wished also to
estimate the variance of playing time of CDs sold throughout the country. She took a
sample of nine CDs and recorded their playing times. The results are given below.
49, 56, 55, 68, 61, 57, 61, 52, 63
(i) Use the student’s data to estimate the mean playing time for CDs sold in the
country.
(ii) Use the student’s data to estimate the variance of the playing time of CDs sold in
the country.
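
The difference between parts (a) and (b) of Example 3 is the divisor: n when the nine CDs are the whole data set, n − 1 when they are a sample used to estimate a population variance. A minimal Python sketch using the standard-library `statistics` module (variable names are illustrative):

```python
import statistics

times = [49, 56, 55, 68, 61, 57, 61, 52, 63]   # playing times in minutes
n = len(times)

mean_time = statistics.mean(times)

# a) the nine CDs are the whole data set: divisor n
var_of_data = statistics.pvariance(times)

# b) the nine CDs are a sample: the unbiased estimate uses divisor n - 1
var_unbiased = statistics.variance(times)

# The same estimate from the summary form (sum x^2 - (sum x)^2 / n) / (n - 1)
sum_x = sum(times)
sum_x2 = sum(t * t for t in times)
var_from_sums = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(f"mean                      = {mean_time}")
print(f"variance of the nine CDs  = {var_of_data:.3f}")
print(f"unbiased variance (n - 1) = {var_unbiased:.3f}")
print(f"from summary sums         = {var_from_sums:.3f}")
```

The estimate of the population mean is the same number as the mean in part (a); only the variance changes, by the factor n/(n − 1).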


Example 4
Thirty oranges are chosen at random from a large box of oranges. Their masses, x
grams, are summarized by Σx = 3 033 and Σx² = 306 676. Find, to 4 significant figures,
unbiased estimates for the mean and variance of the mass of an orange in the box.

The oranges are packed in bags of 10 in a shop and the shopkeeper told customers that
most bags weigh more than a kilogram. Show that the shopkeeper’s statement is
correct.
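
A rough numerical sketch of Example 4. The estimates come straight from the summary formulas in Section 8.2.1. For the bag of 10, the sketch assumes (this is not stated in the question) that the masses of the oranges are independent and that the total mass of a bag is approximately normal with mean 10µ̂ and variance 10σ̂².

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 30, 3033, 306676

mean_hat = sum_x / n                                # unbiased estimate of the mean (g)
var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)       # unbiased estimate of the variance (g^2)

# Assumption: total mass of 10 independent oranges ~ N(10 * mean, 10 * variance)
bag = NormalDist(10 * mean_hat, sqrt(10 * var_hat))
p_over_1kg = 1 - bag.cdf(1000)

print(f"mean estimate     = {mean_hat:.4g} g")
print(f"variance estimate = {var_hat:.4g} g^2")
print(f"P(bag > 1000 g)   = {p_over_1kg:.4f}")
```

Under these assumptions the probability is well above one half, which is what "most bags weigh more than a kilogram" requires.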


8.2.2 Interval Estimation

• An interval estimate (confidence interval) is a pair of numbers between which the population parameter may be considered to lie; the process of estimating with a range of values is known as interval estimation.
• An interval estimate indicates the accuracy of an estimate.
• The level of confidence (confidence coefficient) refers to the level of certainty that the interval estimate is correct (i.e. the probability that the interval produced includes the population parameter being estimated).
• The critical value is the z-value used in the confidence interval. It depends on the level of confidence.
• An interval estimate of µ may be constructed in the following manner:

  $\bar{x} - z\sigma_{\bar{X}} \le \mu \le \bar{x} + z\sigma_{\bar{X}}$

  where
  $\bar{x} - z\sigma_{\bar{X}}$ is the lower limit of the estimate and $\bar{x} + z\sigma_{\bar{X}}$ is the upper limit of the estimate,
  $\bar{x}$ is the sample mean (and point estimator of µ),
  $\sigma_{\bar{X}}$ is the standard error of the mean,
  z is the standard normal value determined by the probability associated with the interval estimate.

8.2.2(a) Confidence Interval for Population Mean

• of a normal population with known variance σ²
• using a sample of any size, n large or small

Given a sample of size n from a normal population, a 100(1 – α)% confidence interval for the
population mean is given by

  $\left( \bar{x} - z\frac{\sigma}{\sqrt{n}},\ \bar{x} + z\frac{\sigma}{\sqrt{n}} \right)$

where $\bar{x}$ is the sample mean and the value of z is such that Φ(z) = 1 – 0.5α.
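
The condition Φ(z) = 1 – 0.5α can be solved numerically instead of reading z from tables. A minimal Python sketch using the standard-library `NormalDist.inv_cdf`; the helper name `critical_z` is an illustrative choice, not notation from these notes.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Return z such that Phi(z) = 1 - 0.5 * alpha, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - 0.5 * alpha)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

To three decimal places these are the values 1.645, 1.960 and 2.576 usually quoted in normal distribution tables.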

Interval estimate of population mean, µ:

a) a symmetric 90% confidence interval for µ is

  $\left( \bar{x} - 1.645\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.645\frac{\sigma}{\sqrt{n}} \right)$


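b) a symmetric 95% confidence interval for µ is

  $\left( \bar{x} - 1.960\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.960\frac{\sigma}{\sqrt{n}} \right)$

c) a symmetric 99% confidence interval for µ is

  $\left( \bar{x} - 2.576\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.576\frac{\sigma}{\sqrt{n}} \right)$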

Example 5
The lengths of nails produced by a machine are known to be distributed normally with
mean µ mm and standard deviation 0.7 mm. The lengths, in mm, of a random sample of
5 nails are 107.29, 106.56, 105.94, 106.99, 106.47.
a) Calculate a symmetric 95% confidence interval for µ, giving the end-points correct
to 1 decimal place.
b) Two hundred random samples of 5 nails are taken and a symmetric 95%
confidence interval for µ is calculated for each sample. Find the expected number of
intervals which do not contain µ.
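
A sketch of how Example 5 might be checked in Python, applying the known-variance interval $\bar{x} \pm z\sigma/\sqrt{n}$. For part (b) it uses the fact that each 95% interval misses µ with probability 0.05, so the expected number of misses in 200 intervals is 200 × 0.05. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, fmean

lengths = [107.29, 106.56, 105.94, 106.99, 106.47]   # sample of nail lengths (mm)
sigma = 0.7                                          # known population s.d. (mm)
confidence = 0.95

n = len(lengths)
x_bar = fmean(lengths)
z = NormalDist().inv_cdf(1 - 0.5 * (1 - confidence))   # Phi(z) = 1 - 0.5 * alpha
margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# b) expected number of the 200 intervals that fail to contain mu
expected_misses = 200 * (1 - confidence)

print(f"a) 95% CI for mu: ({lower:.1f}, {upper:.1f})")
print(f"b) expected number of intervals missing mu: {expected_misses:.0f}")
```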


Example 6
For a method of measuring the velocity of sound in air, the results of repeated
experiments are known to be distributed normally with standard deviation 6 m s⁻¹. A
number of measurements are made using this method, and from these measurements a
symmetric 95% confidence interval for the velocity of sound in air is calculated. Find
the width of this confidence interval for
a) 4 measurements,
b) 36 measurements.
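
The width of a symmetric confidence interval with known σ is $2z\sigma/\sqrt{n}$, so it does not depend on the sample mean. A short Python sketch for Example 6; the function name `ci_width` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, confidence=0.95):
    """Width of the symmetric confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - 0.5 * (1 - confidence))
    return 2 * z * sigma / sqrt(n)

sigma = 6.0   # standard deviation of the measurements (m/s)

width_4 = ci_width(sigma, 4)
width_36 = ci_width(sigma, 36)

print(f"a) n = 4:  width = {width_4:.2f} m/s")
print(f"b) n = 36: width = {width_36:.2f} m/s")
```

Taking 9 times as many measurements makes the interval 3 times narrower, since the width is proportional to $1/\sqrt{n}$.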


8.2.2(b) Confidence Interval for Population Mean

• of a normal or non-normal population with unknown variance σ²
• using a large sample, n

Given a large sample (n ≥ 30) from any population, a 100(1 – α)% confidence interval
for the population mean is given by

  $\left( \bar{x} - z\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + z\frac{\hat{\sigma}}{\sqrt{n}} \right)$

where $\bar{x}$ is the sample mean, the value of z is such that Φ(z) = 1 – 0.5α, and

  $\hat{\sigma}^2 = \frac{1}{n-1}\left( \sum x^2 - \frac{(\sum x)^2}{n} \right)$.
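Interval estimate of population mean, µ:

a) a symmetric 90% confidence interval for µ is

  $\left( \bar{x} - 1.645\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.645\frac{\hat{\sigma}}{\sqrt{n}} \right)$

b) a symmetric 95% confidence interval for µ is

  $\left( \bar{x} - 1.960\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 1.960\frac{\hat{\sigma}}{\sqrt{n}} \right)$

c) a symmetric 99% confidence interval for µ is

  $\left( \bar{x} - 2.576\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + 2.576\frac{\hat{\sigma}}{\sqrt{n}} \right)$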


Example 7
The contents of 140 bags of flour selected randomly from a large batch delivered to a
store are weighed and the results, w grams, summarized by Σw = 69 734 and
Σw² = 34 735 178.
a) Calculate unbiased estimates of the batch mean and variance of the mass of flour in
a bag.
b) Calculate a symmetric 95% confidence interval for the batch mean mass.
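
A minimal sketch of the large-sample procedure applied to Example 7, working from the summary totals. The helper `large_sample_ci` is a hypothetical name introduced for illustration; it combines the unbiased variance estimate with the interval $\bar{x} \pm z\hat{\sigma}/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(n, sum_x, sum_x2, confidence):
    """Return (mean estimate, unbiased variance estimate, lower limit, upper limit)."""
    x_bar = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    z = NormalDist().inv_cdf(1 - 0.5 * (1 - confidence))   # Phi(z) = 1 - 0.5 * alpha
    margin = z * sqrt(var_hat / n)                         # z * sigma-hat / sqrt(n)
    return x_bar, var_hat, x_bar - margin, x_bar + margin

x_bar, var_hat, lower, upper = large_sample_ci(140, 69734, 34735178, 0.95)

print(f"a) mean estimate = {x_bar:.1f} g, variance estimate = {var_hat:.3f} g^2")
print(f"b) 95% CI for the batch mean: ({lower:.1f}, {upper:.1f}) g")
```

The same function can be called with a different sample size, totals and confidence level, for instance 0.99 for a 99% interval.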


Example 8
Audio cassette tapes of a particular brand are claimed by the manufacturer to give, on
average, at least 60 minutes of playing time. After receiving some complaints, the
manufacturer’s quality control manager obtains a random sample of 64 tapes and
measures the playing time, t minutes, of each. The results are summarized by

Σt = 3953.28 and Σt² = 244 557.00.

a) Calculate a symmetric 99% confidence interval for the population mean playing
time of this brand of tape.
b) Does the confidence interval support the customers’ complaints? Give a reason for
your answer.
