Professor Edsel A. Peña E-Mail: pena@stat.sc.edu Statistics for Engineers Statistics 509, Fall 2010
Lecture 05: Sampling, Likelihoods, and Estimation
Problem of Inference
Sampling and Experiments
Joint PMF and Likelihood for Discrete Population
Joint PDF and Likelihood for Continuous Population
Example: Sampling from Bernoulli Population
▶ Consider a population with only two values: 0 and 1, a
Bernoulli population.
▶ Denote by 𝜃 the proportion of 1s in this population. This is
the parameter of interest and it will be in Θ = (0, 1).
▶ We sample, with replacement, $n$ units from this population.
The sample data is $X_1, X_2, \ldots, X_n$, where each $X_i$ takes
the value 0 or 1. We write
Example: Sampling from an Exponential Population
▶ Consider a study to determine the lifetime properties of an
electronic component (say, electric bulbs of a certain brand).
▶ Assume that the lifetime distribution of a component is
exponential with parameter 𝜆, that is, the pdf is
f (t; 𝜆) = 𝜆 exp(−𝜆t) for t > 0.
▶ We perform a life-testing experiment on n = 50 components
and observe the lifetime of each one.
▶ The sample data will be
Example: Sampling from a Normal Population
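▶ Normal Population with mean $\mu$ and variance $\sigma^2$.
▶ Random sample of size $n$:
$$X_1, X_2, \ldots, X_n \ \text{IID} \ N(\mu, \sigma^2)$$
▶ Likelihood Function:
$$L(\mu, \sigma^2) = \frac{1}{(\sigma^2)^{n/2} (\sqrt{2\pi})^n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right\}$$
▶ Note that with
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i,$$
we have
$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$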
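The likelihood and the sum-of-squares identity above can be checked numerically. The following is a minimal sketch, not the lecture's own code: the function names, the seed, and the simulated data (a normal sample with mean 20 and standard deviation 5) are assumptions made for illustration.

```python
import math
import random


def normal_log_likelihood(xs, mu, sigma2):
    """Return log L(mu, sigma^2) for an IID normal sample xs."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum_sq / (2 * sigma2)


def sum_sq_both_sides(xs, mu):
    """Return (left, right) sides of the sum-of-squares identity."""
    n = len(xs)
    xbar = sum(xs) / n
    left = sum((x - mu) ** 2 for x in xs)
    right = sum((x - xbar) ** 2 for x in xs) + n * (xbar - mu) ** 2
    return left, right


random.seed(509)  # arbitrary seed, used only for reproducibility
sample = [random.gauss(20.0, 5.0) for _ in range(50)]  # hypothetical data
xbar = sum(sample) / len(sample)

left, right = sum_sq_both_sides(sample, mu=18.0)
print(left, right)  # the two sides agree up to rounding error

# For a fixed sigma^2, the identity implies the likelihood is largest at mu = xbar.
print(normal_log_likelihood(sample, xbar, 25.0) >= normal_log_likelihood(sample, 18.0, 25.0))
```

Working with the log-likelihood rather than the likelihood avoids numerical underflow when n is large.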
Sample Statistics
Common Examples of Sample Statistics
▶ Sample Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
▶ When the values of the $X_i$ are either 0 or 1, the sample mean
is the sample proportion.
▶ Sample Median: This is a value that divides the ordered data
set into two equal parts.
▶ $k$th Sample Moment: $M_k' = \frac{1}{n} \sum_{i=1}^{n} X_i^k$
▶ Sample Variance:
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - n(\bar{X})^2 \right]$$
▶ Sample Standard Deviation: $S = +\sqrt{S^2}$
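As an illustration of these definitions, here is a small Python sketch using the standard `statistics` module. The data values and the helper names `sample_moment` and `sample_variance_shortcut` are hypothetical, chosen only to show that the two forms of the sample variance agree.

```python
import math
import statistics


def sample_moment(xs, k):
    """k-th sample moment: the average of X_i^k."""
    return sum(x ** k for x in xs) / len(xs)


def sample_variance_shortcut(xs):
    """Sample variance via the shortcut form [sum X_i^2 - n * Xbar^2] / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations

xbar = statistics.mean(data)        # sample mean
med = statistics.median(data)       # sample median
s2_def = statistics.variance(data)  # sample variance, divisor n - 1
s2_short = sample_variance_shortcut(data)
s = math.sqrt(s2_def)               # sample standard deviation

print(xbar, med, s2_def, s2_short, s)
```

The shortcut form is convenient by hand, but in floating point it can lose accuracy when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.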
Sample Covariance and Correlation
Definition
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a bivariate random sample
from a joint bivariate distribution. The sample covariance is
defined to be
$$S_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y} \right].$$
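The two forms of the sample covariance can be compared directly. This is a minimal sketch with hypothetical paired data; the function names are assumptions, not part of the lecture.

```python
def sample_covariance(xs, ys):
    """Sample covariance from the definition, with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def sample_covariance_shortcut(xs, ys):
    """Sample covariance via [sum X_i Y_i - n * Xbar * Ybar] / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / (n - 1)


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical paired observations
ys = [2.0, 4.0, 6.0, 9.0]

s_xy = sample_covariance(xs, ys)
s_xy_short = sample_covariance_shortcut(xs, ys)
print(s_xy, s_xy_short)
```

Note that the covariance of a variable with itself, `sample_covariance(xs, xs)`, is its sample variance.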
Commands in R to Compute Sample Statistics
Sampling Distributions of Sample Statistics
Basic Results about Sampling Distributions
▶ Let $X_1, X_2, \ldots, X_n$ be a random sample from a population or
distribution whose mean is $\mu$ and variance is $\sigma^2$.
▶ The mean of the sample mean is
$$\mu_{\bar{X}} = E(\bar{X}) = \mu.$$
▶ The variance of the sample mean is
$$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}.$$
▶ The standard error of the sample mean is
$$\sigma_{\bar{X}} = SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$
▶ The mean of the sample variance is
$$\mu_{S^2} = E(S^2) = \sigma^2.$$
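These results can be illustrated by simulation. The sketch below draws many samples of size n = 10 from a normal population with mean 20 and variance 25 (the values used later in the lecture) and compares the empirical quantities with the formulas. The function name, seed, and number of replications are assumptions.

```python
import random
import statistics


def simulate_mean_and_variance(n, reps, mu=20.0, sigma=5.0, seed=509):
    """Return lists of sample means and sample variances over many samples."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # divisor n - 1
    return means, variances


means, variances = simulate_mean_and_variance(n=10, reps=5000)

avg_of_means = statistics.fmean(means)     # should be near mu = 20
var_of_means = statistics.variance(means)  # should be near sigma^2 / n = 2.5
avg_of_vars = statistics.fmean(variances)  # should be near sigma^2 = 25
print(avg_of_means, var_of_means, avg_of_vars)
```

The results hold for any population with finite variance, so `rng.gauss` could be swapped for another generator with the same mean and variance.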
Sampling Distribution of X̄ from Normal Population
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population/distribution with mean $\mu$ and variance $\sigma^2$. Then the
sampling distribution of the sample mean $\bar{X}$ is normal with mean
$\mu_{\bar{X}} = \mu$ and variance $\sigma_{\bar{X}}^2 = \sigma^2/n$. That is,
$$\bar{X} \sim N\left( \mu, \frac{\sigma^2}{n} \right).$$
Equivalently,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
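The standardization above turns any probability statement about the sample mean into one about a standard normal variable. As a hedged sketch (the function name and the numbers μ = 20, σ = 5, n = 25 are chosen only for illustration), Python's `statistics.NormalDist` can do the lookup:

```python
import math
from statistics import NormalDist


def prob_sample_mean_between(a, b, mu, sigma, n):
    """P(a <= Xbar <= b) when Xbar ~ N(mu, sigma^2 / n)."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    z_a = (a - mu) / se
    z_b = (b - mu) / se
    std_normal = NormalDist()  # N(0, 1)
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)


# With mu = 20, sigma = 5, n = 25, the standard error is 1, so the interval
# [19, 21] is one standard error on either side of the mean.
p = prob_sample_mean_between(19.0, 21.0, mu=20.0, sigma=5.0, n=25)
print(round(p, 4))
```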
Chi-Square Distribution
Definition
A positive-valued random variable $X$ is said to have a chi-squared
distribution with degrees-of-freedom $k$ if it has a gamma
distribution with shape parameter $\alpha = k/2$ and rate parameter
$\lambda = 1/2$ (equivalently, scale 2). That is, its pdf is
$$f(x) = \frac{1}{2^{k/2}\, \Gamma(k/2)}\, x^{k/2 - 1} \exp(-x/2) \quad \text{for } x \ge 0.$$
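The density can be evaluated with nothing more than the standard library. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the boundary point x = 0 is ignored, and a simple midpoint rule checks that the density integrates to about 1 and has mean about k.

```python
import math


def chi2_pdf(x, k):
    """Chi-squared pdf with k degrees of freedom, for x > 0."""
    if x <= 0:
        return 0.0  # the boundary x = 0 is ignored in this sketch
    log_f = ((k / 2 - 1) * math.log(x) - x / 2
             - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_f)


def midpoint_integral(g, upper, steps):
    """Midpoint-rule approximation of the integral of g over (0, upper)."""
    h = upper / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h


k = 4
total = midpoint_integral(lambda x: chi2_pdf(x, k), upper=80.0, steps=8000)
mean = midpoint_integral(lambda x: x * chi2_pdf(x, k), upper=80.0, steps=8000)
print(total, mean)  # total mass close to 1, mean close to k
```

Computing on the log scale with `math.lgamma` avoids overflow in the gamma function for large k.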
Sampling Distribution of S² under Normal Population
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population/distribution with mean $\mu$ and variance $\sigma^2$. Let $S^2$ be
the sample variance. Then
$$V = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Consequently, under a normal population,
$$E(S^2) = \sigma^2 \quad \text{and} \quad \mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}.$$
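A chi-squared variable with k degrees of freedom has mean k and variance 2k, so the theorem predicts that V has mean n − 1 and variance 2(n − 1). The simulation below is a sketch of that check; the sample size n = 5, the population parameters, the seed, and the function names are assumptions.

```python
import random


def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def simulate_v(n, reps, mu=20.0, sigma=5.0, seed=509):
    """Simulate V = (n - 1) S^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append((n - 1) * sample_variance(sample) / sigma ** 2)
    return values


n = 5
vs = simulate_v(n=n, reps=10000)
v_mean = sum(vs) / len(vs)  # should be near n - 1 = 4
v_var = sample_variance(vs)  # should be near 2(n - 1) = 8
print(v_mean, v_var)
```

Rescaling by σ²/(n − 1) recovers the stated mean and variance of S² itself.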
Independence of X̄ and S² under Normality
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal
population/distribution with mean $\mu$ and variance $\sigma^2$. Then the
sample mean $\bar{X}$ and the sample variance $S^2$ are independent.
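One informal way to see this result is to simulate many samples and look at the correlation between the sample mean and the sample variance. The sketch below contrasts a normal population with an exponential one. The names, seeds, and sample sizes are assumptions, and a correlation near zero is only consistent with independence; it does not prove it.

```python
import random


def mean_and_variance(xs):
    """Return the sample mean and the sample variance (divisor n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return xbar, sum((x - xbar) ** 2 for x in xs) / (n - 1)


def pearson_correlation(us, vs):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(us)
    ubar, vbar = sum(us) / n, sum(vs) / n
    suv = sum((u - ubar) * (v - vbar) for u, v in zip(us, vs))
    suu = sum((u - ubar) ** 2 for u in us)
    svv = sum((v - vbar) ** 2 for v in vs)
    return suv / (suu * svv) ** 0.5


def corr_mean_variance(draw, n, reps, seed):
    """Correlation between Xbar and S^2 across simulated samples."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        m, v = mean_and_variance([draw(rng) for _ in range(n)])
        means.append(m)
        variances.append(v)
    return pearson_correlation(means, variances)


corr_normal = corr_mean_variance(lambda rng: rng.gauss(20.0, 5.0), n=10, reps=4000, seed=1)
corr_expo = corr_mean_variance(lambda rng: rng.expovariate(1.0), n=10, reps=4000, seed=2)
print(corr_normal, corr_expo)  # near zero for normal, clearly positive for exponential
```

For symmetric populations such as the uniform, the correlation is also zero even though the two statistics are dependent, which is why a scatterplot is more informative than a single correlation.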
Central Limit Theorem
Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from any population or
distribution with mean $\mu$ and finite variance $\sigma^2$. For large sample
size $n$ (a common rule of thumb is $n \ge 30$), the sampling distribution of
the sample mean $\bar{X}$ is approximately normal with mean $\mu_{\bar{X}} = \mu$
and variance $\sigma_{\bar{X}}^2 = \sigma^2/n$. In shorthand, for large $n$,
$$\bar{X} \overset{\cdot}{\sim} N\left( \mu, \frac{\sigma^2}{n} \right).$$
As such,
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1).$$
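The theorem can be illustrated with a skewed population. The sketch below standardizes sample means from an exponential population with λ = 1 (mean 1, standard deviation 1) and checks how often Z falls within ±1.96, which would be about 95% under exact normality. The function name, seed, and replication count are assumptions.

```python
import math
import random


def standardized_means(draw, mu, sigma, n, reps, seed=509):
    """Return Z = (Xbar - mu) / (sigma / sqrt(n)) over many simulated samples."""
    rng = random.Random(seed)
    zs = []
    for _ in range(reps):
        xbar = sum(draw(rng) for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs


zs = standardized_means(lambda rng: rng.expovariate(1.0), mu=1.0, sigma=1.0, n=30, reps=5000)

coverage = sum(abs(z) <= 1.96 for z in zs) / len(zs)  # near 0.95 if Z is roughly N(0, 1)
z_mean = sum(zs) / len(zs)                            # near 0
print(coverage, z_mean)
```

Rerunning with a smaller n, such as 2 or 5, shows how the approximation degrades for a skewed population.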
For a Normal Population with μ = 20, σ² = 25
Sampling Distributions of X̄ under Normal Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample means, panels SMeans[, 1] to SMeans[, 4].]
Sampling Distributions of S² under Normal Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample variances, panels SVars[, 1] to SVars[, 4].]
Independence of X̄ and S² under Normal Popn
[Figure: scatterplot panels; the only surviving axis label is "Sample Variance".]
For a Binomial Population with 𝜃 = .5
Sampling Distributions of X̄ for Bernoulli Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample means, panels SMeans[, 1] to SMeans[, 4].]
Sampling Distributions of S² for Bernoulli Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample variances, panels SVars[, 1] to SVars[, 4].]
For a Uniform Population with (𝛼, 𝛽) = (0, 1)
Sampling Distributions of X̄ under Uniform Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample means, panels SMeans[, 1] to SMeans[, 4].]
Sampling Distributions of S² under Uniform Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample variances, panels SVars[, 1] to SVars[, 4].]
Non-Independence of X̄ and S² under Uniform Popn
[Figure: scatterplot panels; the only surviving axis label is "Sample Variance".]
For an Exponential Population with 𝜆 = 1
Sampling Distributions of X̄ under Exponential Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample means, panels SMeans[, 1] to SMeans[, 4].]
Sampling Distributions of S² under Exponential Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample variances, panels SVars[, 1] to SVars[, 4].]
For a Gamma Population with 𝛼 = .5, 𝜆 = 1
Sampling Distributions of X̄ under Gamma Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample means, panels SMeans[, 1] to SMeans[, 4].]
Sampling Distributions of S² under Gamma Popn
[Figure: four histograms (vertical axis: Frequency) of the simulated sample variances, panels SVars[, 1] to SVars[, 4].]
The Problem of Parameter Estimation
A Special Case: Bernoulli Population
The Method-of-Moments (MM) Approach
$$\hat{\theta} = \bar{X}.$$
The Maximum Likelihood (ML) Approach
$$L(\theta) = \theta^{T} (1 - \theta)^{n - T}$$
where $T = \sum_{i=1}^{n} X_i$.
▶ The likelihood function provides the probability of observing
the data given the value of 𝜃.
▶ The ML principle relies on the idea that the best estimate of 𝜃
is the value that will yield the largest possible likelihood.
▶ To obtain the estimate, we therefore maximize the likelihood
function, or equivalently, maximize the log-likelihood function.
ML Estimate for the Bernoulli Population
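▶ The log-likelihood is $l(\theta) = \log L(\theta) = T \log\theta + (n - T)\log(1 - \theta)$, with derivative
$$\frac{dl}{d\theta} = l'(\theta) = \frac{T}{\theta} - \frac{n - T}{1 - \theta}.$$
▶ Equating $l'(\theta)$ to zero, then solving for $\theta$, yields the maximizer
$$\hat{\theta} = \frac{T}{n} = \bar{X}.$$
▶ The ML estimate for this Bernoulli population coincides with
the MM estimate.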
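The closed-form maximizer can be cross-checked by brute force. The sketch below evaluates the Bernoulli log-likelihood on a grid of θ values and picks the largest. The counts T = 12 and n = 100 are borrowed from the numerical example later in the lecture; the function names and the grid resolution are assumptions.

```python
import math


def bernoulli_log_likelihood(theta, t, n):
    """log L(theta) = T log(theta) + (n - T) log(1 - theta), for 0 < theta < 1."""
    return t * math.log(theta) + (n - t) * math.log(1 - theta)


def grid_mle(t, n, grid_size=1000):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    grid = [i / grid_size for i in range(1, grid_size)]
    return max(grid, key=lambda theta: bernoulli_log_likelihood(theta, t, n))


t, n = 12, 100
theta_grid = grid_mle(t, n)  # numerical maximizer on the grid
theta_closed = t / n         # closed-form ML estimate, T / n
print(theta_grid, theta_closed)
```

A grid search is crude, but it makes the ML principle concrete: the estimate is whichever θ makes the observed data most probable.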
Remark About the Variance Estimate
▶ We have seen that for the Bernoulli population, the MM and
ML estimate of the variance $\sigma^2 = \theta(1 - \theta)$ is
$$\hat{\sigma}^2 = \bar{X}(1 - \bar{X}).$$
Estimate of the Standard Error
$$P\left\{ \theta - (1.96)\,\widehat{SE}(\bar{X}) \le \bar{X} \le \theta + (1.96)\,\widehat{SE}(\bar{X}) \right\} \approx 0.95$$
Notion of a Confidence Interval
▶ This is equivalent to writing
$$P\left\{ \bar{X} - (1.96)\,\widehat{SE}(\bar{X}) \le \theta \le \bar{X} + (1.96)\,\widehat{SE}(\bar{X}) \right\} \approx 0.95$$
A Concrete Numerical Example for Bernoulli Population
$$\hat{\theta} = \bar{X} = 12/100 = .12.$$
The Confidence Interval for Bernoulli Example
$$\widehat{ME}(\bar{X}) = (1.96)(.0327) = .0641.$$
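The arithmetic of this example can be reproduced in a few lines. This is a sketch under an assumption: the value .0327 appears to correspond to S/√n with the n − 1 divisor in the sample variance, which is inferred from the numbers rather than stated in the surviving text. The function name and its flag are hypothetical.

```python
import math


def bernoulli_interval(successes, n, z=1.96, use_sample_variance=True):
    """Return (theta_hat, se_hat, me_hat, (lower, upper)) for a Bernoulli sample."""
    theta_hat = successes / n
    var_hat = theta_hat * (1 - theta_hat)  # plug-in variance, divisor n
    if use_sample_variance:
        var_hat *= n / (n - 1)             # sample variance S^2, divisor n - 1
    se_hat = math.sqrt(var_hat / n)        # estimated standard error of Xbar
    me_hat = z * se_hat                    # estimated margin of error
    return theta_hat, se_hat, me_hat, (theta_hat - me_hat, theta_hat + me_hat)


theta_hat, se_hat, me_hat, (lower, upper) = bernoulli_interval(12, 100)
print(round(se_hat, 4), round(me_hat, 4), round(lower, 4), round(upper, 4))

# The plug-in version gives a slightly smaller standard error.
_, se_plugin, _, _ = bernoulli_interval(12, 100, use_sample_variance=False)
print(round(se_plugin, 4))
```

Without intermediate rounding the margin of error is about .0640; the .0641 above results from rounding the standard error to .0327 before multiplying by 1.96.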
Measurement Model with Normal Errors
▶ Seek to measure a fixed value c, e.g., the speed of light; the
volume of a container.
▶ Measurement process not perfect; measured values are
contaminated with errors.
▶ Error Model:
$$Y = c + \epsilon$$
where $Y$ is the measurement, $c$ is the true (but unknown) value,
and $\epsilon$ is the error contamination, which is also not known.
▶ Assumption on the Error Term:
$$\epsilon \sim N(0, \sigma^2)$$
where the variance $\sigma^2$ is not known.
▶ To discover both $c$ and $\sigma^2$, we make $n$ measurements, so we
get the data $Y_1, Y_2, \ldots, Y_n$. Note that
$$Y_1, Y_2, \ldots, Y_n \ \text{IID} \ N(c, \sigma^2).$$
Least-Squares Approach: Estimating c
▶ A distance measure:
$$Q(c) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (Y_i - c)^2$$
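Setting the derivative Q'(c) = −2 Σ (Y_i − c) to zero gives c = Ȳ, so the least-squares estimate is the sample mean. The sketch below checks this numerically on hypothetical measurements; the data and function name are assumptions.

```python
def q(c, ys):
    """Least-squares criterion Q(c): the sum of squared deviations from c."""
    return sum((y - c) ** 2 for y in ys)


ys = [9.8, 10.1, 10.4, 9.9, 10.3]  # hypothetical measurements
n = len(ys)
ybar = sum(ys) / n

# Q is no larger at ybar than at nearby values of c.
print(q(ybar, ys), q(ybar - 0.01, ys), q(ybar + 0.01, ys))

# Q(c) = Q(ybar) + n (c - ybar)^2, so ybar is the unique minimizer.
c = 10.5
print(q(c, ys), q(ybar, ys) + n * (c - ybar) ** 2)
```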
Estimating the Variance σ²
▶ To get an estimator for the variance $\sigma^2$, observe that
$\sigma^2 = E[(Y - c)^2]$, so that
$$E\left[ \frac{1}{n} \sum_{i=1}^{n} (Y_i - c)^2 \right] = \sigma^2.$$
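In practice c is unknown and must be replaced by Ȳ, which changes the expectation. The simulation below is a hedged sketch comparing three averages: squared deviations about the true c (divisor n), about Ȳ with divisor n, and about Ȳ with divisor n − 1. The parameter values c = 10, σ = 2, n = 5, the seed, and the function name are assumptions.

```python
import random


def average_estimates(c=10.0, sigma=2.0, n=5, reps=20000, seed=509):
    """Average three variance estimates over many simulated data sets."""
    rng = random.Random(seed)
    known_c = div_n = div_n1 = 0.0
    for _ in range(reps):
        ys = [rng.gauss(c, sigma) for _ in range(n)]
        ybar = sum(ys) / n
        ss_about_mean = sum((y - ybar) ** 2 for y in ys)
        known_c += sum((y - c) ** 2 for y in ys) / n  # uses the true c
        div_n += ss_about_mean / n                    # c replaced by ybar, divisor n
        div_n1 += ss_about_mean / (n - 1)             # c replaced by ybar, divisor n - 1
    return known_c / reps, div_n / reps, div_n1 / reps


avg_known_c, avg_div_n, avg_div_n1 = average_estimates()
# With sigma^2 = 4: roughly 4, roughly (n - 1)/n * 4 = 3.2, and roughly 4.
print(avg_known_c, avg_div_n, avg_div_n1)
```

The divisor-n version with Ȳ in place of c underestimates σ² by the factor (n − 1)/n; dividing by n − 1 instead removes that bias.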
Getting an Unbiased Estimator of σ²
Method-of-Moments Approach
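▶ Population Moments:
$$E(Y) = c \quad \text{and} \quad E(Y^2) = \sigma^2 + c^2$$
▶ First two Sample Moments:
$$M_1' = \bar{Y} \quad \text{and} \quad M_2' = \frac{1}{n} \sum_{i=1}^{n} Y_i^2$$
▶ Equate population and sample moments:
$$c = \bar{Y} \quad \text{and} \quad \sigma^2 + c^2 = \frac{1}{n} \sum_{i=1}^{n} Y_i^2$$
▶ Solving for $c$ and $\sigma^2$ (exercise!) we get:
$$\hat{c} = \bar{Y} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$
▶ Same estimators as in the least-squares approach.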
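The exercise can be checked numerically: solving the moment equations gives the second sample moment minus the squared first moment, which should equal the average squared deviation about Ȳ. A minimal sketch on hypothetical measurements (the data and function name are assumptions):

```python
def method_of_moments_normal(ys):
    """Solve c = M1' and sigma^2 + c^2 = M2' for (c, sigma^2)."""
    n = len(ys)
    m1 = sum(ys) / n                 # first sample moment
    m2 = sum(y * y for y in ys) / n  # second sample moment
    return m1, m2 - m1 ** 2


ys = [9.8, 10.1, 10.4, 9.9, 10.3]  # hypothetical measurements
c_hat, sigma2_hat = method_of_moments_normal(ys)

# Direct form of the same estimator: average squared deviation about Ybar.
direct = sum((y - c_hat) ** 2 for y in ys) / len(ys)
print(c_hat, sigma2_hat, direct)
```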
Maximum Likelihood Approach
Example: Michelson’s Speed of Light Measurements
▶ See the website:
http://www.itl.nist.gov/div898/bayesian/datagall/michelso.dat
Histogram of Michelson’s Speed of Light Data
[Figure: histogram of the SpeedOfLight measurements.]
On Desirable Properties of Estimators
Estimating Normal Mean: Battle of Three Estimators
Performance of Three Estimators under Normal
[Figure: panels comparing the simulated sampling distributions of SMean, SMedian, and SMidrange under a normal population; histogram panels have vertical axis Frequency.]
Estimating Uniform Mean: Battle of Three Estimators
Sample Mean is Not the Best under Uniform
[Figure: panels comparing the simulated sampling distributions of SMean, SMedian, and SMidrange under a uniform population; histogram panels have vertical axis Frequency.]
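The comparison of the three estimators can be sketched with a small simulation. The code below estimates the mean squared error of the sample mean, sample median, and sample midrange as estimators of the population center. The population parameters (normal with mean 20 and standard deviation 5, uniform on (10, 20)), the sample size n = 20, the seeds, and all names are assumptions, since the settings behind the original figures did not survive.

```python
import random
import statistics


def midrange(xs):
    """Average of the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2


def mse_of_estimators(draw, true_center, n, reps, seed):
    """Estimate the mean squared error of three estimators by simulation."""
    rng = random.Random(seed)
    estimators = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "midrange": midrange,
    }
    sq_err = {name: 0.0 for name in estimators}
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        for name, estimator in estimators.items():
            sq_err[name] += (estimator(sample) - true_center) ** 2
    return {name: total / reps for name, total in sq_err.items()}


mse_normal = mse_of_estimators(lambda rng: rng.gauss(20.0, 5.0), 20.0, n=20, reps=3000, seed=1)
mse_uniform = mse_of_estimators(lambda rng: rng.uniform(10.0, 20.0), 15.0, n=20, reps=3000, seed=2)
print(mse_normal)   # the sample mean has the smallest error under the normal
print(mse_uniform)  # the midrange has the smallest error under the uniform
```

All three estimators are unbiased for the center of a symmetric population, so the comparison comes down to variability, and the winner depends on the population's shape.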