
Statistics for Engineers

Statistics 509, Fall 2010

Professor Edsel A. Peña


E-Mail: pena@stat.sc.edu

November 16, 2010

Lecture 05: Sampling, Likelihoods, and Estimation

▶ Reprise: Distributions as Population Models
▶ Need to Discover Values of Parameters of Distributions
▶ Experimentation and Sampling
▶ Samples, Statistics, and Sampling Distributions
▶ Central Limit Theorem
▶ Sample Mean and Sample Median
▶ Sample Variance and Sample Standard Deviation
▶ Sample Covariance and Sample Correlation Coefficient
▶ Likelihood Function: Information about Parameters from Sample Data
▶ The Problem of Parameter Estimation
▶ Method of Moments Principle
▶ Maximum Likelihood Estimation Principle

Problem of Inference

▶ The probability distributions (both discrete and continuous) serve as population models.
▶ If we know the parameters of the probability distribution
modeling our population, then we know how the population
behaves.
▶ Thus, it is important to discover the values of the parameters
of our probability distributions.
▶ How do we discover their values?
▶ We do experiments or take samples that will provide us with sample data.

Sampling and Experiments

▶ Consider a population where the variable of interest is denoted by X and which is postulated to have a distribution F (x; 𝜃), with PMF p(x; 𝜃) if X is discrete or with PDF f (x; 𝜃) if X is continuous, where 𝜃 is a parameter.
▶ A random sample of size n from this population is a subset of size n taken in such a way that every possible subset of size n is equally likely.
▶ The sample could be taken either with or without replacement. However, in this course we will assume sampling with replacement, which makes the observations independent of each other.
▶ The sample variables will be denoted by X1 , X2 , . . . , Xn .

Joint PMF and Likelihood for Discrete Population

Let X1 , X2 , . . . , Xn be a random sample from a discrete PMF p(x; 𝜃). The joint PMF of (X1 , X2 , . . . , Xn ) is

p(x1 , x2 , . . . , xn ; 𝜃) = ∏_{i=1}^{n} p(xi ; 𝜃).

Given the sample values (x1 , x2 , . . . , xn ), if we view the joint PMF at these values as a function of 𝜃, we obtain what is called the likelihood function, denoted by L(𝜃):

L(𝜃) = L(𝜃; x1 , x2 , . . . , xn ) = p(x1 , x2 , . . . , xn ; 𝜃) = ∏_{i=1}^{n} p(xi ; 𝜃).

For a given 𝜃, L(𝜃) is the probability of observing the data (x1 , x2 , . . . , xn ) under that value of 𝜃.

Joint PDF and Likelihood for Continuous Population

Let X1 , X2 , . . . , Xn be a random sample from a continuous PDF f (x; 𝜃). The joint PDF of (X1 , X2 , . . . , Xn ) is

f (x1 , x2 , . . . , xn ; 𝜃) = ∏_{i=1}^{n} f (xi ; 𝜃).

Given the sample values (x1 , x2 , . . . , xn ), the likelihood function is

L(𝜃) = L(𝜃; x1 , x2 , . . . , xn ) = f (x1 , x2 , . . . , xn ; 𝜃) = ∏_{i=1}^{n} f (xi ; 𝜃).

For a given 𝜃, L(𝜃) represents the ‘likelihood’ of getting the data (x1 , x2 , . . . , xn ) under the value 𝜃.
Remark: The likelihood function is the function that contains
information about the parameter 𝜃 provided by the sample data.

Example: Sampling from Bernoulli Population
▶ Consider a population with only two values: 0 and 1, a
Bernoulli population.
▶ Denote by 𝜃 the proportion of 1s in this population. This is
the parameter of interest and it will be in Θ = (0, 1).
▶ We sample, with replacement, n units from this population.
The sample data is X1 , X2 , . . . , Xn where each Xi takes either
the value of 0 or 1. We write

X1 , X2 , . . . , Xn ∼ IID p(x; 𝜃) = 𝜃^x (1 − 𝜃)^{1−x} for x = 0, 1.

▶ The likelihood function is

L(𝜃) = ∏_{i=1}^{n} 𝜃^{xi} (1 − 𝜃)^{1−xi} = 𝜃^{T(x1 ,x2 ,...,xn )} (1 − 𝜃)^{n−T(x1 ,x2 ,...,xn )}

with T(x1 , x2 , . . . , xn ) = ∑_{i=1}^{n} xi .
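As a quick illustration, here is a minimal R sketch that plots this likelihood for a simulated 0/1 sample; the seed, sample size, and success probability are illustrative assumptions, not part of the slides.

# Bernoulli likelihood for a simulated 0/1 sample (illustrative values)
set.seed(1)
x <- rbinom(20, size = 1, prob = 0.3)        # hypothetical sample, n = 20
Tstat <- sum(x); n <- length(x)
theta <- seq(0.01, 0.99, by = 0.01)
L <- theta^Tstat * (1 - theta)^(n - Tstat)   # L(theta) = theta^T (1 - theta)^(n - T)
plot(theta, L, type = "l", xlab = "theta", ylab = "L(theta)")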

Example: Sampling from an Exponential Population
▶ Consider a study to determine the lifetime properties of an
electronic component (say, electric bulbs of a certain brand).
▶ Assume that the lifetime distribution of a component is
exponential with parameter 𝜆, that is, the pdf is
f (t; 𝜆) = 𝜆 exp(−𝜆t) for t > 0.
▶ We perform a life-testing experiment consisting of n = 50
components where we observe the lifetimes of these 50
components.
▶ The sample data will be

T1 , T2 , . . . , T50 ∼ IID f (t; 𝜆) = 𝜆 exp(−𝜆t)


▶ The likelihood function is

L(𝜆) = 𝜆^{50} exp{ −𝜆 ∑_{i=1}^{50} Ti }
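The maximizer of this likelihood can also be found numerically. The R sketch below does so on simulated lifetimes; the seed and the true rate used to generate the data are assumptions for illustration.

# Numerically maximizing the exponential log-likelihood (simulated data)
set.seed(2)
lifetimes <- rexp(50, rate = 2)                 # hypothetical lifetimes
loglik <- function(lam) 50 * log(lam) - lam * sum(lifetimes)
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
1 / mean(lifetimes)                             # the analytic maximizer, for comparison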

Example: Sampling from a Normal Population
▶ Normal Population with mean 𝜇 and variance 𝜎 2 .
▶ Random sample of size n:
X1 , X2 , . . . , Xn ∼ IID N(𝜇, 𝜎²)
▶ Likelihood Function:

L(𝜇, 𝜎²) = [1 / ((𝜎²)^{n/2} (2𝜋)^{n/2})] exp{ −(1/(2𝜎²)) ∑_{i=1}^{n} (xi − 𝜇)² }

▶ Note that with

x̄ = (1/n) ∑_{i=1}^{n} xi ,

we have

∑_{i=1}^{n} (xi − 𝜇)² = ∑_{i=1}^{n} (xi − x̄)² + n(x̄ − 𝜇)².

Sample Statistics

▶ Consider a random sample X1 , X2 , . . . , Xn .
▶ Any function of X1 , X2 , . . . , Xn , together possibly with known constants, is called a sample statistic.
▶ Unlike parameters, whose values we will usually not know, the values of sample statistics can be known or computed since we observe the values of the Xi 's.
▶ Sample statistics will be the basis of inference about the
parameters of the population distribution.

Common Examples of Sample Statistics

▶ Sample Mean: X̄ = (1/n) ∑_{i=1}^{n} Xi
▶ When the values of the Xi 's are either 0 or 1, the sample mean is the sample proportion.
▶ Sample Median: This is a value that divides the ordered data set into two equal parts.
▶ kth Sample Moment: M′_k = (1/n) ∑_{i=1}^{n} Xi^k
▶ Sample Variance:

S² = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)² = (1/(n − 1)) [ ∑_{i=1}^{n} Xi² − n(X̄)² ]

▶ Sample Standard Deviation: S = +√S²

Sample Covariance and Correlation

Definition
Let (X1 , Y1 ), (X2 , Y2 ), . . . , (Xn , Yn ) be a bivariate random sample
from a joint bivariate distribution. The sample covariance is
defined to be
SXY = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) = (1/(n − 1)) [ ∑_{i=1}^{n} Xi Yi − nX̄ Ȳ ].

The sample correlation coefficient is

r = SXY / (SX SY )

where SX and SY are the sample standard deviations of the Xi 's and Yi 's, respectively.

Commands in R to Compute Sample Statistics

▶ x below is the vector containing the values of x1 , x2 , . . . , xn ; y is another vector containing the values of y1 , y2 , . . . , yn .
▶ To Create a Histogram: hist(x)
▶ Sample Mean (X̄ ): mean(x)
▶ Sample Median: median(x)
▶ Sample Variance (S 2 ): var(x)
▶ Sample Standard Deviation (S): sd(x) or sqrt(var(x))
▶ Sample Covariance (SXY ): cov(x,y)
▶ Sample Correlation (r ): cor(x,y)
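To see these commands in action, here is a short, self-contained R session on simulated data; the seed, sample size, and distributions are purely illustrative.

# Trying the commands on simulated data (illustrative values)
set.seed(3)
x <- rnorm(100, mean = 20, sd = 5)
y <- 2 * x + rnorm(100)
mean(x); median(x); var(x); sd(x)   # sample mean, median, variance, std. deviation
cov(x, y); cor(x, y)                # sample covariance and correlation
all.equal(sd(x), sqrt(var(x)))      # S = +sqrt(S^2)
hist(x)                             # histogram of the x values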

Sampling Distributions of Sample Statistics

▶ Since the sample X1 , X2 , . . . , Xn consists of random variables, a sample statistic is also a random variable and hence has its own distribution function. Such a distribution function is called a sampling distribution.
▶ Thus, the sample mean, the sample median, and the sample variance each have their own sampling distribution, and each of these sampling distributions has its own mean and variance.
▶ For the sample mean, we will have the mean of the sample mean, denoted by 𝜇X̄ , and the variance of the sample mean, denoted by 𝜎²X̄ .
▶ The standard deviation of the sample mean, denoted 𝜎X̄ , is called the standard error of the sample mean.

Basic Results about Sampling Distributions
▶ Let X1 , X2 , . . . , Xn be a random sample from a population or distribution whose mean is 𝜇 and variance is 𝜎².
▶ The mean of the sample mean is

𝜇X̄ = E (X̄ ) = 𝜇.

▶ The variance of the sample mean is

𝜎²X̄ = Var (X̄ ) = 𝜎²/n.

▶ The standard error of the sample mean is

𝜎X̄ = SE (X̄ ) = 𝜎/√n.

▶ The mean of the sample variance is

𝜇S² = E (S²) = 𝜎².

Sampling Distribution of X̄ from Normal Population

Theorem
Let X1 , X2 , . . . , Xn be a random sample from a normal population/distribution with mean 𝜇 and variance 𝜎². Then the sampling distribution of the sample mean X̄ is normal with mean 𝜇X̄ = 𝜇 and variance 𝜎²X̄ = 𝜎²/n. That is,

X̄ ∼ N(𝜇, 𝜎²/n).

Equivalently,

Z = (X̄ − 𝜇)/(𝜎/√n) ∼ N(0, 1).

Chi-Square Distribution

Definition
A positive-valued random variable X is said to have a chi-squared distribution with k degrees of freedom if it has a gamma distribution with shape parameter 𝛼 = k/2 and rate parameter 𝜆 = 1/2. That is, its pdf is

f (x) = [1/(2^{k/2} Γ(k/2))] x^{k/2−1} exp(−x/2) for x ≥ 0.

We denote this by X ∼ 𝜒²_k . For such a random variable,

E (X ) = k and Var (X ) = 2k.

Sampling Distribution of S 2 under Normal Population

Theorem
Let X1 , X2 , . . . , Xn be a random sample from a normal
population/distribution with mean 𝜇 and variance 𝜎 2 . Let S 2 be
the sample variance. Then

V = (n − 1)S²/𝜎² ∼ 𝜒²_{n−1} .

Consequently, under a normal population,

E (S²) = 𝜎² and Var (S²) = 2𝜎⁴/(n − 1).
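The theorem is easy to check by simulation. The sketch below, with assumed values of n, 𝜇, and 𝜎, compares the simulated values of V with the chi-squared density with n − 1 degrees of freedom.

# Checking (n-1)S^2/sigma^2 ~ chi-square(n-1) by simulation (assumed settings)
set.seed(4)
n <- 10; sigma <- 5
v <- replicate(10000, (n - 1) * var(rnorm(n, mean = 20, sd = sigma)) / sigma^2)
c(mean(v), var(v))                         # should be near n - 1 = 9 and 2(n - 1) = 18
hist(v, freq = FALSE, breaks = 50)
curve(dchisq(x, df = n - 1), add = TRUE)   # overlay the chi-square(9) density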

Independence of X̄ and S 2 under Normality

Theorem
Let X1 , X2 , . . . , Xn be a random sample from a normal
population/distribution with mean 𝜇 and variance 𝜎 2 . Then the
sample mean X̄ and the sample variance S 2 are independent.

Remark: This implies that knowledge of the value of X̄ does not provide any additional information about the value of S²!

Central Limit Theorem

Theorem
Let X1 , X2 , . . . , Xn be a random sample from any population or
distribution with mean 𝜇 and variance 𝜎 2 . For large sample size n
(usually at least 30), the sampling distribution of the sample mean
X̄ is approximately normal with mean 𝜇X̄ = 𝜇 and variance
𝜎²X̄ = 𝜎²/n. In shorthand, for large n,

X̄ ∼ N(𝜇, 𝜎²/n) (approximately).

As such,

Z = (X̄ − 𝜇)/(𝜎/√n) ∼ N(0, 1) (approximately).

Importance: The normal distribution can be used to compute probabilities for the sample mean X̄ .
Simulated Demonstrations: Under Normality and the CLT

▶ Design of the Simulation Study.
▶ For different types of populations or distributions (Bernoulli,
Uniform, Exponential, Gamma, Normal) we will take samples
of size n and for each sample we compute the sample mean
(X̄ ) and the sample variance (S 2 ).
▶ We will choose n to be n = 2, n = 5, n = 10, n = 30, and
n = 100.
▶ For each combination of population type and sample size n,
we repeat the sampling process a total of MREPS = 10000
times. We then examine the empirical sampling distributions
of the sample mean and the sample variance.
▶ We examine the shape of their histograms (are they becoming more normal?), and their means and variances relative to what is expected under theory. A sketch of this simulation in R appears below.
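A minimal R version of this design, shown for the exponential population; the other populations are obtained by swapping the sampling line.

# Sketch of the simulation design (exponential population shown)
MREPS <- 10000
ns <- c(2, 5, 10, 30, 100)
results <- lapply(ns, function(n) {
  draws <- replicate(MREPS, {
    x <- rexp(n, rate = 1)   # swap in rbinom, runif, rgamma, or rnorm as needed
    c(mean = mean(x), var = var(x))
  })
  c(n = n,
    MeanOfSampMeans = mean(draws["mean", ]),
    VarOfSampMeans  = var(draws["mean", ]),
    MeanOfSampVars  = mean(draws["var", ]),
    VarOfSampVars   = var(draws["var", ]))
})
do.call(rbind, results)   # one row per sample size, as in the tables that follow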

For a Normal Population with 𝜇 = 20, 𝜎 2 = 25

n                        2.0000    10.0000    30.0000   100.0000
MeanOfSampMeans         19.9483    20.0024    19.9990    19.9952
PopnMean                20.0000    20.0000    20.0000    20.0000
VarOfSampMeans          12.3503     2.5105     0.8443     0.2498
TrueVarOfSampMean       12.5000     2.5000     0.8333     0.2500
MeanOfSampVars          24.9223    25.0382    25.0149    25.0041
PopnVar                 25.0000    25.0000    25.0000    25.0000
VarOfSampVars         1244.4987   137.0298    42.7614    12.7159

Sampling Distributions of X̄ under Normal Popn

[Figure: Histograms of the simulated sampling distribution of X̄ for n = 2, 10, 30, and 100 (Frequency versus sample mean).]

Sampling Distributions of S 2 under Normal Popn

[Figure: Histograms of the simulated sampling distribution of S² for n = 2, 10, 30, and 100 (Frequency versus sample variance).]

Independence of X̄ and S 2 under Normal Popn

[Figure: Scatterplots of the sample variance S² against the sample mean X̄ for n = 2, 10, 30, and 100 under a normal population.]

For a Bernoulli Population with 𝜃 = .5

n                      2.0000   10.0000   30.0000   100.0000
MeanOfSampMeans        0.5035    0.5003    0.5018     0.5003
PopnMean               0.5000    0.5000    0.5000     0.5000
VarOfSampMeans         0.1243    0.0250    0.0083     0.0024
TrueVarOfSampMean      0.1250    0.0250    0.0083     0.0025
MeanOfSampVars         0.2513    0.2499    0.2499     0.2500
PopnVar                0.2500    0.2500    0.2500     0.2500
VarOfSampVars          0.0625    0.0013    0.0001     0.0000

Sampling Distributions of X̄ for Bernoulli Popn

[Figure: Histograms of the simulated sampling distribution of X̄ for n = 2, 10, 30, and 100 under the Bernoulli population (Frequency versus sample mean).]

Sampling Distributions of S 2 for Bernoulli Popn

[Figure: Histograms of the simulated sampling distribution of S² for n = 2, 10, 30, and 100 under the Bernoulli population (Frequency versus sample variance).]

For a Uniform Population with (𝛼, 𝛽) = (0, 1)

n                      2.0000   10.0000   30.0000   100.0000
MeanOfSampMeans        0.5005    0.5000    0.5002     0.5006
PopnMean               0.5000    0.5000    0.5000     0.5000
VarOfSampMeans         0.0416    0.0084    0.0028     0.0008
TrueVarOfSampMean      0.0417    0.0083    0.0028     0.0008
MeanOfSampVars         0.0828    0.0830    0.0831     0.0833
PopnVar                0.0833    0.0833    0.0833     0.0833
VarOfSampVars          0.0098    0.0007    0.0002     0.0001

Sampling Distributions of X̄ under Uniform Popn

[Figure: Histograms of the simulated sampling distribution of X̄ for n = 2, 10, 30, and 100 under the uniform population (Frequency versus sample mean).]

Sampling Distributions of S 2 under Uniform Popn

[Figure: Histograms of the simulated sampling distribution of S² for n = 2, 10, 30, and 100 under the uniform population (Frequency versus sample variance).]

Non-Independence of X̄ and S 2 under Uniform Popn

[Figure: Scatterplots of the sample variance S² against the sample mean X̄ for n = 2, 10, 30, and 100 under a uniform population.]

For an Exponential Population with 𝜆 = 1

n                      2.0000   10.0000   30.0000   100.0000
MeanOfSampMeans        0.9958    0.9975    1.0011     0.9986
PopnMean               1.0000    1.0000    1.0000     1.0000
VarOfSampMeans         0.4830    0.0989    0.0329     0.0099
TrueVarOfSampMean      0.5000    0.1000    0.0333     0.0100
MeanOfSampVars         1.0014    0.9956    1.0067     0.9992
PopnVar                1.0000    1.0000    1.0000     1.0000
VarOfSampVars          4.9061    0.8075    0.2738     0.0796

Sampling Distributions of X̄ under Exponential Popn

[Figure: Histograms of the simulated sampling distribution of X̄ for n = 2, 10, 30, and 100 under the exponential population (Frequency versus sample mean).]

Sampling Distributions of S 2 under Exponential Popn

[Figure: Histograms of the simulated sampling distribution of S² for n = 2, 10, 30, and 100 under the exponential population (Frequency versus sample variance).]

For a Gamma Population with 𝛼 = .5, 𝜆 = 1

n                      2.0000   10.0000   30.0000   100.0000
MeanOfSampMeans        0.5067    0.5005    0.5010     0.4993
PopnMean               0.5000    0.5000    0.5000     0.5000
VarOfSampMeans         0.2551    0.0500    0.0167     0.0050
TrueVarOfSampMean      0.2500    0.0500    0.0167     0.0050
MeanOfSampVars         0.5093    0.5009    0.5003     0.5009
PopnVar                0.5000    0.5000    0.5000     0.5000
VarOfSampVars          2.1788    0.3430    0.1147     0.0352

Sampling Distributions of X̄ under Gamma Popn

[Figure: Histograms of the simulated sampling distribution of X̄ for n = 2, 10, 30, and 100 under the gamma population (Frequency versus sample mean).]

Sampling Distributions of S 2 under Gamma Popn

[Figure: Histograms of the simulated sampling distribution of S² for n = 2, 10, 30, and 100 under the gamma population (Frequency versus sample variance).]

The Problem of Parameter Estimation

▶ Statement of the Problem: Given sample data X1 , X2 , . . . , Xn from a population or distribution F (x; 𝜃), where 𝜃 belongs to a parameter space Θ, we would like to estimate a parametric function of 𝜃, such as the population mean or the population variance, based on the sample data.
▶ Ideally, we would like our estimate to be as close as possible
to the true value of the parametric function, whatever the
value of 𝜃 is.

A Special Case: Bernoulli Population

▶ Consider the problem of estimating the proportion of 1's ('successes') in a Bernoulli population. (For example, you would like to estimate the proportion of defective items being produced by a manufacturing process.)
▶ Recall that for a Bernoulli population the mean is 𝜇 = 𝜃 and the variance is 𝜎² = 𝜃(1 − 𝜃).
▶ A sample of size n from this population is

X1 , X2 , . . . , Xn ∼ IID Ber (𝜃).

▶ Note that the Xi 's will take 0/1 values.
▶ How do we estimate 𝜃 based on the sample data? If we can estimate 𝜃, then we can also estimate the variance.

The Method-of-Moments (MM) Approach

▶ The first population moment is 𝜇 = 𝜃.
▶ The first sample moment, which is the sample mean X̄ , has a mean that equals the first population moment.
▶ The idea, therefore, is that since this first sample moment will, on average, equal 𝜇, we can obtain an estimate of 𝜇 by equating it with the sample mean X̄ .
▶ Therefore, a method-of-moments (MM) estimate of 𝜇 = 𝜃 is the sample mean X̄ . We write

𝜃̂ = X̄ .

▶ To estimate the variance 𝜎² = 𝜃(1 − 𝜃), we substitute 𝜃̂ for 𝜃 to get

𝜎̂² = X̄ (1 − X̄ ).

The Maximum Likelihood (ML) Approach

▶ As discussed earlier, when sampling from a Bernoulli population, the likelihood function for 𝜃, given the sample data X1 , X2 , . . . , Xn , is

L(𝜃) = 𝜃^T (1 − 𝜃)^{n−T}

where T = ∑_{i=1}^{n} Xi .
▶ The likelihood function provides the probability of observing
the data given the value of 𝜃.
▶ The ML principle relies on the idea that the best estimate of 𝜃
is the value that will yield the largest possible likelihood.
▶ To obtain the estimate, we therefore maximize the likelihood
function, or equivalently, maximize the log-likelihood function.

ML Estimate for the Bernoulli Population

▶ The log-likelihood function for the Bernoulli population is

l (𝜃) = log L(𝜃) = T log(𝜃) + (n − T ) log(1 − 𝜃).

▶ The derivative of l (𝜃) with respect to 𝜃 is

l ′(𝜃) = dl/d𝜃 = T/𝜃 − (n − T)/(1 − 𝜃).

▶ Equating l ′(𝜃) to zero and solving for 𝜃 yields the maximizer

𝜃̂ = T/n = X̄ .

▶ For this Bernoulli population, the ML estimate coincides with the MM estimate.
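As a sanity check, the sketch below maximizes this log-likelihood numerically on simulated data and compares the result with X̄; the seed and the true 𝜃 used to simulate are illustrative assumptions.

# Numerical check that the Bernoulli MLE equals the sample mean
set.seed(5)
x <- rbinom(50, size = 1, prob = 0.4)       # hypothetical 0/1 sample
Tstat <- sum(x); n <- length(x)
loglik <- function(th) Tstat * log(th) + (n - Tstat) * log(1 - th)
optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum
mean(x)                                     # analytic MLE: T/n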

Remark About the Variance Estimate
▶ We have seen that for the Bernoulli population, the MM and ML estimate of the variance 𝜎² = 𝜃(1 − 𝜃) is

𝜎̂² = X̄ (1 − X̄ ).

▶ This estimate is related to the sample variance S² via

X̄ (1 − X̄ ) = ((n − 1)/n) S² = (1/n) ∑_{i=1}^{n} (Xi − X̄ )².

This follows from the fact that Xi = Xi² since each Xi takes the value 0 or 1.
▶ In practice, we usually use the sample variance, S 2 , to
estimate the variance since, on the average, it equals the
variance (we say that S 2 is unbiased for 𝜎 2 ); whereas
X̄ (1 − X̄ ) tends to underestimate 𝜎 2 .
Standard Error of the Estimate

▶ The MM or ML estimate of 𝜇 = 𝜃 is 𝜃̂ = X̄ . On average, this will equal the true value of 𝜃, that is, E (X̄ ) = 𝜃.
▶ As discussed earlier, the standard deviation of X̄ , called its standard error, is

SE (X̄ ) = √(𝜎²/n) = √(𝜃(1 − 𝜃)/n).
▶ This standard error represents the variability that is associated
with X̄ when we think of repeating the sampling process of
size n from this Bernoulli population and for each sample
computing the value of X̄ .

Estimate of the Standard Error

▶ Since we do not know the value of 𝜃 (that is why we are trying to estimate it), we also do not know the value of the standard error. We estimate it by substituting our estimate of the variance:

SÊ (X̄ ) = √( X̄ (1 − X̄ )/(n − 1) ).

▶ If the sampling distribution of X̄ is close to normal [justified by the Central Limit Theorem], which is the case when 𝜃 is neither too close to zero nor too close to one, we have

P{ 𝜃 − (1.96)SÊ (X̄ ) ≤ X̄ ≤ 𝜃 + (1.96)SÊ (X̄ ) } ≈ 0.95

Notion of a Confidence Interval
▶ This is equivalent to writing

P{ X̄ − (1.96)SÊ (X̄ ) ≤ 𝜃 ≤ X̄ + (1.96)SÊ (X̄ ) } ≈ 0.95

▶ That is, we will be approximately 95% confident that the value of 𝜃 is in the interval

[ X̄ − (1.96)SÊ (X̄ ), X̄ + (1.96)SÊ (X̄ ) ]

▶ Such an interval is called an approximate 95% confidence interval for 𝜃.
▶ The so-called estimated margin of error of X̄ is the quantity

MÊ (X̄ ) = (1.96)SÊ (X̄ ) = 1.96 √( X̄ (1 − X̄ )/(n − 1) ).

A Concrete Numerical Example for Bernoulli Population

▶ 𝜃: proportion of defective components produced by a process.
▶ X = 1 if a component is defective; X = 0 if a component is good.
▶ We take a random sample of size n = 100.
▶ Suppose 12 of these 100 sampled components are defective. That is,

T = ∑_{i=1}^{100} Xi = 12.

▶ The MM or ML estimate of 𝜃 is

𝜃̂ = X̄ = 12/100 = .12.

▶ An estimate of the variance 𝜎² is

𝜎̂² = (.12)(1 − .12) = .1056.

The Confidence Interval for Bernoulli Example

▶ The estimate of the standard error of X̄ is

SÊ (X̄ ) = √( (.12)(1 − .12)/(100 − 1) ) = .0327.

▶ The estimate of the margin of error of X̄ is

MÊ (X̄ ) = (1.96)(.0327) = .0641.

▶ An approximate 95% confidence interval for 𝜃 is therefore

[.12 − .0641, .12 + .0641] = [.0559, .1841].

▶ We would then say that we are approximately 95% confident that 𝜃 is between .0559 and .1841.
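For reference, the entire calculation takes a few lines in R, mirroring the numbers above:

# The Bernoulli example: estimate, standard error, margin of error, 95% CI
n <- 100; Tstat <- 12
theta_hat <- Tstat / n                                   # .12
se_hat <- sqrt(theta_hat * (1 - theta_hat) / (n - 1))    # approx .0327
me_hat <- 1.96 * se_hat                                  # approx .0641
c(theta_hat - me_hat, theta_hat + me_hat)                # approx [.0559, .1841]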

Measurement Model with Normal Errors
▶ Seek to measure a fixed value c, e.g., the speed of light; the
volume of a container.
▶ Measurement process not perfect; measured values are
contaminated with errors.
▶ Error Model:

Y = c + 𝜖,

where Y is the measurement, c is the true (but unknown) value, and 𝜖 is the error contamination, which is also not known.
▶ Assumption on the Error Term:
𝜖 ∼ N(0, 𝜎 2 )
where the variance 𝜎 2 is not known.
▶ To discover both c and 𝜎 2 , we make n measurements, so we
get the data Y1 , Y2 , . . . , Yn . Note that
Y1 , Y2 , . . . , Yn ∼ IID N(c, 𝜎²).
Least-Squares Approach: Estimating c

▶ A distance measure:

Q(c) = ∑_{i=1}^{n} 𝜖i² = ∑_{i=1}^{n} (Yi − c)²

▶ To get an estimate of c, we minimize Q(c). The first derivative of Q with respect to c is

dQ/dc = ∑_{i=1}^{n} 2(Yi − c)(−1) = −2( ∑_{i=1}^{n} Yi − nc ).

▶ Equating this to zero and solving for c, we obtain the estimator

ĉ = ( ∑_{i=1}^{n} Yi )/n = Ȳ .

Estimating the Variance 𝜎 2
▶ To get an estimator for the variance 𝜎², observe that 𝜎² = E [(Y − c)²], so that

E [ (1/n) ∑_{i=1}^{n} (Yi − c)² ] = 𝜎².

▶ If we knew c, we could then estimate 𝜎² using

(1/n) ∑_{i=1}^{n} (Yi − c)².

▶ However, since we do not know c, we cannot compute the above quantity. Thus we replace c by ĉ = Ȳ , which yields the estimator of 𝜎² given by

𝜎̂² = (1/n) ∑_{i=1}^{n} (Yi − Ȳ )².

Getting an Unbiased Estimator of 𝜎 2

▶ Interestingly, after this substitution of ĉ for c, the mean of 𝜎̂² is not 𝜎², but is instead (n − 1)𝜎²/n.
▶ To get an estimator whose mean is 𝜎², that is, an unbiased estimator, we multiply 𝜎̂² by n/(n − 1), and this leads to the sample variance S² as our estimator of 𝜎².
▶ Recall that the sample variance is

S² = (1/(n − 1)) ∑_{i=1}^{n} (Yi − Ȳ )² = (1/(n − 1)) [ ∑_{i=1}^{n} Yi² − nȲ ² ].

This is what we use in practice for estimating 𝜎 2 .

Method-of-Moments Approach
▶ Population Moments:

E (Y ) = c and E (Y ²) = 𝜎² + c²

▶ First two Sample Moments:

M′₁ = Ȳ and M′₂ = (1/n) ∑_{i=1}^{n} Yi²

▶ Equate population and sample moments:

c = Ȳ and 𝜎² + c² = (1/n) ∑_{i=1}^{n} Yi²

▶ Solving for c and 𝜎² (exercise!; the one-line algebra is sketched below), we get:

ĉ = Ȳ and 𝜎̂² = (1/n) ∑_{i=1}^{n} (Yi − Ȳ )²

▶ These are the same estimators as in the least-squares approach.
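For the record, the exercise is one substitution: putting c = Ȳ into the second equation and rearranging gives

𝜎̂² = (1/n) ∑_{i=1}^{n} Yi² − Ȳ ² = (1/n) ∑_{i=1}^{n} (Yi − Ȳ )²,

since expanding the square yields (1/n) ∑ (Yi − Ȳ )² = (1/n) ∑ Yi² − 2Ȳ · (1/n) ∑ Yi + Ȳ ² = (1/n) ∑ Yi² − Ȳ ².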
Maximum Likelihood Approach

▶ The likelihood function of (c, 𝜎²) is

L(c, 𝜎²) = [1/(2𝜋𝜎²)]^{n/2} exp{ −(1/(2𝜎²)) ∑_{i=1}^{n} (Yi − c)² }

▶ By maximizing this function or its logarithm with respect to c and 𝜎² [left as an exercise], we obtain again

ĉ = Ȳ and 𝜎̂² = (1/n) ∑_{i=1}^{n} (Yi − Ȳ )²

▶ Thus, the LS, MM, and ML estimators coincide under this normal measurement error model!

Example: Michelson’s Speed of Light Measurements
▶ See the website:
http://www.itl.nist.gov/div898/bayesian/datagall/michelso.dat

▶ Michelson’s 1879 Data: There were 101 observations (in millions of meters per second).
▶ A histogram of these 101 observations is on the next slide.
▶ Summary Statistics:

n = 101, Ȳ = 299.8524, S² = 0.006242667, S = 0.07901055

▶ Our estimate of the speed of light is therefore

ĉ = 299.8524 millions of meters per second.

▶ Estimate of the Standard Error of the Estimate:

SÊ (ĉ) = S/√n = 0.07901055/√101 = 0.007861843
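A sketch of how these numbers could be reproduced in R; note that the number of header lines to skip in the NIST file is an assumption here and should be checked against the actual file layout.

# Sketch: summary statistics for the Michelson data (file layout assumed)
url <- "http://www.itl.nist.gov/div898/bayesian/datagall/michelso.dat"
speed <- scan(url, skip = 25)       # 'skip = 25' is a hypothetical header length
n <- length(speed)
c(n = n, mean = mean(speed), var = var(speed), sd = sd(speed))
sd(speed) / sqrt(n)                 # estimated standard error of c-hat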

Histogram of Michelson’s Speed of Light Data

[Figure: Histogram of Michelson’s 1879 Speed of Light Data (Frequency versus SpeedOfLight, roughly 299.6 to 300.1 millions of meters per second).]

On Desirable Properties of Estimators

▶ An estimator T of 𝜃 is said to be unbiased if E (T ∣𝜃) = 𝜃 whatever 𝜃 is. That is, on the average it hits the ‘target.’ An unbiased estimator is usually preferred over a biased estimator, and an unbiased estimator is usually said to be accurate.
▶ Given two unbiased estimators of 𝜃, say T1 and T2 , the one
with a smaller variance is preferred over the other. We say
that if T1 has smaller variance than T2 , then T1 is more
precise.
▶ Theoretically, it turns out that ML estimators are usually the
ones that are unbiased and with small variance. Most
estimators we use are the ML estimators.
▶ Generally, MM estimators tend to be less precise than ML
estimators, though usually they are easier to compute.

Estimating Normal Mean: Battle of Three Estimators

▶ Population: Normal with mean 𝜇 = 20 and 𝜎 = 25.
▶ Sample Data: X1 , X2 , . . . , X10
▶ Estimator 1: Sample Mean, X̄ .
▶ Estimator 2: Sample Median, M, which is the value that
divides the arranged data into two equal parts.
▶ Estimator 3: Sample Midrange, MR, which is the average of
the smallest and largest observations.
▶ We take 10000 samples of size 10 each.
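A minimal R version of this comparison; the seed is an arbitrary choice, and swapping the sampling line for runif(10, 10, 20) gives the uniform-population battle of the later slides.

# 10000 samples of size 10 from N(20, sd = 25); compare three estimators of mu
set.seed(6)
est <- replicate(10000, {
  x <- rnorm(10, mean = 20, sd = 25)
  c(mean = mean(x), median = median(x), midrange = (min(x) + max(x)) / 2)
})
apply(est, 1, mean)   # all three are centered near mu = 20
apply(est, 1, var)    # under normality the sample mean has the smallest variance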

Performance of Three Estimators under Normal

[Figure: Boxplots and histograms of the sampling distributions of the sample mean, sample median, and sample midrange under the normal population.]

Estimating Uniform Mean: Battle of Three Estimators

▶ Population: Uniform[10, 20] with mean 𝜇 = 15.
▶ Sample Data: X1 , X2 , . . . , X10
▶ Estimator 1: Sample Mean, X̄ .
▶ Estimator 2: Sample Median, M, which is the value that
divides the arranged data into two equal parts.
▶ Estimator 3: Sample Midrange, MR, which is the average of
the smallest and largest observations.
▶ We take 10000 samples of size 10 each.

Sample Mean is Not the Best under Uniform

[Figure: Boxplots and histograms of the sampling distributions of the sample mean, sample median, and sample midrange under the Uniform[10, 20] population.]
