Sampling Distributions of Statistics: Corresponds to Chapter 5 of Tamhane and Dunlop

Slides prepared by Elizabeth Newton (MIT), with some slides by Jacqueline Telford (Johns Hopkins University)
Sampling Distributions
Definitions and Key Concepts
A sample statistic used to estimate an unknown population parameter is called an estimate.
The discrepancy between the estimate and the true parameter value is known as sampling error.
A statistic is a random variable with a probability distribution, called the sampling distribution, which is generated by repeated sampling.
We use the sampling distribution of a statistic to assess the sampling error in an estimate.
Random Sample
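Definition 5.11, page 201, Casella and Berger: $X_1, \ldots, X_n$ are a random sample of size n from the population f(x) if they are mutually independent random variables and each has the same marginal pdf or pmf f(x), i.e., they are i.i.d.
How is this different from a simple random sample?
For mutual independence, the population must be very large or we must sample with replacement.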
Sample Mean and Variance
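Sample Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Sample Variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$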
How do the sample mean and variance vary in repeated samples of size n drawn from the population?
In general, it is difficult to find the exact sampling distribution. However, see the example of deriving the distribution when all possible samples can be enumerated (rolling 2 dice) in Sections 5.1 and 5.2. Note the errors on page 168.
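The two-dice enumeration mentioned above can be sketched in R as follows. This is an illustrative sketch, not the textbook's own table; the variable names (rolls, xbar, s2) are assumptions made for the example.

```r
# Enumerate all 36 equally likely samples (die1, die2) of size n = 2
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)

xbar <- rowMeans(rolls)       # sample mean of each possible sample
s2 <- apply(rolls, 1, var)    # sample variance (divisor n - 1) of each sample

# Exact sampling distributions: relative frequency of each attainable value
xbar.dist <- table(xbar) / nrow(rolls)
s2.dist <- table(s2) / nrow(rolls)
print(xbar.dist)
print(s2.dist)

# Averages over all possible samples
mean(xbar)   # equals the population mean of one die, 3.5
mean(s2)     # equals the population variance of one die, 35/12
```

Because every possible sample is listed, these are exact sampling distributions rather than simulation estimates.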
Properties of a sample mean and variance

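For a random sample from a population with mean $\mu$ and finite variance $\sigma^2$: $E(\bar{X}) = \mu$, $Var(\bar{X}) = \sigma^2/n$, and $E(S^2) = \sigma^2$.
See Theorem 5.2.2, page 268, Casella & Berger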
Distribution of Sample Means

If the i.i.d. r.v.'s are
Bernoulli
Normal
Exponential
the distributions of the sample means can be derived:
Sum of n i.i.d. Bernoulli(p) r.v.'s is Binomial(n, p)
Sum of n i.i.d. Normal($\mu$, $\sigma^2$) r.v.'s is Normal($n\mu$, $n\sigma^2$)
Sum of n i.i.d. Exponential($\lambda$) r.v.'s is Gamma($\lambda$, n)
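As a rough check of the exponential case, the sketch below simulates sums of n exponentials and compares their quantiles with those of a gamma distribution. It assumes $\lambda$ is a rate parameter and n the gamma shape; the values n = 5 and lambda = 2 are arbitrary choices for illustration.

```r
set.seed(10)
n <- 5          # number of exponentials in each sum (assumed value)
lambda <- 2     # exponential rate (assumed value)
n.sums <- 10000

# Each replicate is the sum of n i.i.d. Exponential(lambda) draws
sums <- replicate(n.sums, sum(rexp(n, rate = lambda)))

# Compare simulated quantiles with Gamma(shape = n, rate = lambda) quantiles
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
comparison <- rbind(simulated = quantile(sums, probs),
                    gamma = qgamma(probs, shape = n, rate = lambda))
print(round(comparison, 3))
```

Dividing the sums by n gives the sample means, whose distribution is then a rescaled gamma.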
Distribution of Sample Means

Generally, the exact distribution is difficult to calculate. What can be said about the distribution of the sample mean when the sample is drawn from an arbitrary population? In many cases we can approximate the distribution of the sample mean when n is large by a normal distribution. This is the famous Central Limit Theorem.
Central Limit Theorem

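Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from an arbitrary distribution with a finite mean $\mu$ and variance $\sigma^2$. As n goes to infinity, the sampling distribution of
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
converges to the N(0,1) distribution.

Sometimes this theorem is given in terms of the sums:
$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \to N(0,1)$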
Central Limit Theorem

Let $X_1, \ldots, X_n$ be a random sample from an arbitrary distribution with finite mean $\mu$ and variance $\sigma^2$. As n increases, $\bar{X}$ is approximately distributed as $N(\mu, \sigma^2/n)$.
What happens as n goes to infinity?
Variance of means from uniform distribution

[Figure: log10(variance) of sample means versus log10(sample.size), on log scales; sample size = 10 to 10^6, number of samples = 100.]
This graph was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.
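A plot of this kind could be produced along the following lines. This is a sketch, not the code behind the original graph: it assumes Uniform(0,1) data and stops at sample size 10^4 (rather than 10^6) so that it runs quickly.

```r
set.seed(1)
sample.size <- 10^(1:4)   # assumed range; the slide goes up to 10^6
n.samples <- 100          # number of samples drawn at each sample size

# Variance of the 100 sample means at each sample size
variance <- sapply(sample.size, function(n) {
  means <- replicate(n.samples, mean(runif(n)))
  var(means)
})

# Var(mean) = (1/12)/n for Uniform(0,1), so on log scales the points should
# fall near a line with slope -1 and intercept log10(1/12)
fit <- lm(log10(variance) ~ log10(sample.size))
print(coef(fit))
print(log10(1 / 12))

if (interactive()) plot(log10(sample.size), log10(variance))
```

The slope of about -1 answers the question on the previous slide: the variance of the sample mean shrinks to zero as n goes to infinity.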
Example: Uniform Distribution

$f(x \mid a, b) = 1/(b - a)$, $a \le x \le b$
$E X = (b + a)/2$
$Var X = (b - a)^2/12$
[Figure: sample generated with runif(500, min = 0, max = 10).]
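Standardized Means, Uniform Distribution: 500 samples, n=1
[Figure: number of samples=500, n=1]
Standardized Means, Uniform Distribution: 500 samples, n=2
[Figure: number of samples=500, n=2]
Standardized Means, Uniform Distribution: 500 samples, n=100
[Figure: number of samples=500, n=100]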
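The standardized means shown in these figures could be generated roughly as follows. The helper name standardized.means and the choice of Uniform(0, 10) are assumptions for illustration; the original plotting code is not shown on the slides.

```r
set.seed(2)

# Hypothetical helper: draw n.samples samples of size n from Uniform(a, b)
# and standardize each sample mean as (xbar - mu) / (sigma / sqrt(n))
standardized.means <- function(n, n.samples = 500, a = 0, b = 10) {
  mu <- (a + b) / 2
  sigma <- (b - a) / sqrt(12)
  xbar <- replicate(n.samples, mean(runif(n, min = a, max = b)))
  (xbar - mu) / (sigma / sqrt(n))
}

z <- lapply(c(1, 2, 100), standardized.means)
names(z) <- c("n=1", "n=2", "n=100")

# Histograms: flat for n = 1, triangular for n = 2, bell-shaped for n = 100
if (interactive()) {
  for (label in names(z)) hist(z[[label]], main = label)
}
sapply(z, sd)   # each should be close to 1
```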
QQ (Normal) plot of means of 500 samples of size 100 from uniform distribution
[Figure: normal QQ plot; horizontal axis: Quantiles of Standard Normal.]
Bootstrap sampling from the sample

Previous slides have shown results for means of 500 samples (of size 100) from a uniform distribution. The bootstrap takes just one sample of size 100 and then draws 500 samples (of size 100) with replacement from that sample.
x <- runif(100)  # the one observed sample of size 100
# one bootstrap mean; repeat this line 500 times for 500 bootstrap means
y <- mean(sample(x, 100, replace = TRUE))
Normal probability plot of sample of size 100 from exponential distribution
[Figure: normal probability plot; horizontal axis: Quantiles of Standard Normal.]
Normal probability plot of means of 500 bootstrap samples from sample of size 100 from exponential distribution
[Figure: normal probability plot; horizontal axis: Quantiles of Standard Normal.]
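The two exponential plots could be reproduced with something like the sketch below. It assumes a rate-1 exponential and uses replicate() to repeat the resampling 500 times; the original S-PLUS code is not shown on the slides.

```r
set.seed(3)
x <- rexp(100)   # one observed sample of size 100 (rate 1 assumed)

# 500 bootstrap means: resample x with replacement, 100 values each time
boot.means <- replicate(500, mean(sample(x, 100, replace = TRUE)))

# The raw sample is strongly skewed; the bootstrap means are much closer
# to a straight line on a normal probability plot
if (interactive()) {
  qqnorm(x); qqline(x)
  qqnorm(boot.means); qqline(boot.means)
}

# Bootstrap means center on the observed sample mean, with spread close
# to sd(x) / sqrt(100)
c(mean(boot.means), mean(x))
c(sd(boot.means), sd(x) / sqrt(100))
```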
Law of Large Numbers and Central Limit Theorem

Both are asymptotic results about the sample mean:
Law of Large Numbers (LLN) says that as $n \to \infty$, the sample mean converges to the population mean, i.e., $\bar{X} \to \mu$.
Central Limit Theorem (CLT) says that as $n \to \infty$, also the distribution converges to Normal, i.e., $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ converges to N(0,1).
Normal Approximation to the Binomial

A binomial r.v. is the sum of i.i.d. Bernoulli r.v.'s, so the CLT can be used to approximate its distribution.
Suppose that X is B(n, p). Then the mean of X is np and the variance of X is np(1 - p).
By the CLT, we have:
$\frac{X - np}{\sqrt{np(1 - p)}} \approx N(0,1)$
How large a sample, n, do we need for the approximation to be good?
Rule of Thumb: $np \ge 10$ and $n(1 - p) \ge 10$
For p = 0.5, np = n(1 - p) = n(0.5) = 10, so n should be 20. (symmetrical)

For p = 0.1 or 0.9, np or n(1 - p) = n(0.1) = 10, so n should be 100. (skewed)
See Figures 5.2 and 5.3 and Example 5.3, pp. 172-174
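The rule-of-thumb arithmetic can be written as a small R function. The name min.n is hypothetical, and the threshold of 10 is the one quoted on the slide.

```r
# Smallest n with n * p >= threshold and n * (1 - p) >= threshold
min.n <- function(p, threshold = 10) {
  # the small tolerance guards against floating-point error in 1 - p
  ceiling(threshold / min(p, 1 - p) - 1e-9)
}

sapply(c(0.5, 0.1, 0.9), min.n)   # 20 100 100
```

The further p is from 0.5, the more skewed the binomial and the larger the n needed before the normal curve fits well.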
Continuity Correction
See Figure 5.4 for motivation.
Exact Binomial Probability: $P(X \le 8) = 0.2517$
Normal approximation without Continuity Correction: $P(X \le 8) = 0.1867$
Normal approximation with Continuity Correction: $P(X \le 8.5) = 0.2514$ (much better agreement with exact calculation)
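The slide does not state n and p. The sketch below assumes n = 20 and p = 0.5, which reproduces the exact value 0.2517; the normal values differ slightly in later decimals from the slide's figures, which appear to come from a z-table with z rounded to two places.

```r
n <- 20    # assumed; not stated on the slide
p <- 0.5   # assumed; not stated on the slide
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

exact <- pbinom(8, size = n, prob = p)          # exact P(X <= 8)
no.cc <- pnorm(8, mean = mu, sd = sigma)        # normal approx., no correction
with.cc <- pnorm(8.5, mean = mu, sd = sigma)    # with continuity correction

round(c(exact = exact, no.cc = no.cc, with.cc = with.cc), 4)
```

Moving the cutoff from 8 to 8.5 accounts for the whole probability bar at X = 8, which is why the corrected value is so much closer.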
Sampling Distribution of the Sample Variance

There is no analog to the CLT for $S^2$ which gives an approximation for large samples for an arbitrary distribution.
The exact distribution for $S^2$ can be derived for X ~ i.i.d. Normal.
Chi-square distribution: For $\nu \ge 1$, let $Z_1, Z_2, \ldots, Z_\nu$ be i.i.d. N(0,1) and let $Y = Z_1^2 + Z_2^2 + \cdots + Z_\nu^2$. The p.d.f. of Y can be shown to be
$f(y) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, y^{\nu/2 - 1} e^{-y/2}$, $y \ge 0$
This is known as the $\chi^2$ distribution with $\nu$ degrees of freedom (d.f.), or $Y \sim \chi^2_\nu$.
See Figures 5.5 and 5.6, pp. 176-177 and Table A.5, p. 676
Distribution of the Sample Variance in the Normal Case

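If Z ~ N(0,1), then $Z^2 \sim \chi^2_1$.
It can be shown that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, or equivalently $S^2 \sim \frac{\sigma^2 \chi^2_{n-1}}{n-1}$, a scaled $\chi^2$.
$E(S^2) = \sigma^2$ ($S^2$ is an unbiased estimator)
$Var(S^2) = \frac{2\sigma^4}{n-1}$
See Result 2 (p. 179)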
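A simulation sketch of the normal-case result: sample variances from normal data, rescaled by (n-1)/sigma^2, should behave like a chi-square with n-1 d.f. The values n = 10 and sigma = 2 are arbitrary assumptions for illustration.

```r
set.seed(4)
n <- 10       # sample size (assumed value)
sigma <- 2    # population standard deviation (assumed value)

# 5000 sample variances, each from a normal sample of size n
s2 <- replicate(5000, var(rnorm(n, mean = 0, sd = sigma)))
scaled <- (n - 1) * s2 / sigma^2   # should follow chi-square with n - 1 d.f.

c(mean(s2), sigma^2)                   # unbiasedness: E(S^2) = sigma^2
c(var(s2), 2 * sigma^4 / (n - 1))      # Var(S^2) = 2 sigma^4 / (n - 1)
c(mean(scaled), n - 1)                 # chi-square mean equals its d.f.

if (interactive()) {
  hist(scaled, freq = FALSE, breaks = 40)
  curve(dchisq(x, df = n - 1), add = TRUE)
}
```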
Chi-square distribution
[Figure: chi-square density for df = 5, 10, 20, 30; horizontal axis: x.]
Chi-Square Distribution Interesting Facts

$E X = \nu$ (degrees of freedom)
$Var X = 2\nu$
Special case of the gamma distribution with scale parameter = 2, shape parameter = $\nu/2$.
Chi-square variate with $\nu$ d.f. is equal to the sum of the squares of $\nu$ independent unit normal variates.
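These facts can be checked numerically in R. The sketch below uses nu = 5 as an arbitrary example value.

```r
nu <- 5   # degrees of freedom (assumed example value)
x <- seq(0.5, 30, by = 0.5)

# Chi-square(nu) density equals the gamma density with shape nu/2, scale 2
same.density <- all.equal(dchisq(x, df = nu),
                          dgamma(x, shape = nu / 2, scale = 2))
print(same.density)

# Sum of squares of nu independent unit normals: mean near nu, variance near 2*nu
set.seed(8)
y <- replicate(10000, sum(rnorm(nu)^2))
c(mean(y), var(y))
```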
Student's t-Distribution
Consider a random sample $X_1, X_2, \ldots, X_n$ drawn from $N(\mu, \sigma^2)$.
It is known that $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is exactly distributed as N(0,1).
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ is NOT distributed as N(0,1).
A different distribution for each $\nu = n - 1$ degrees of freedom (d.f.).
T is the ratio of a N(0,1) r.v. and sq.rt.(independent $\chi^2$ divided by its d.f.); for derivation, see eqn 5.13, p. 180, and its messy p.d.f., eqn 5.14.
See Figure 5.7, Student's t p.d.f.'s for $\nu$ = 2, 10, and $\infty$, p. 180.
See Table A.4, t-distribution table, p. 675.
See Example 5.6, milk cartons, p. 181.
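A small simulation makes the difference between Z and T concrete. The sketch assumes n = 5 observations from N(10, 2^2); these values are illustrative only.

```r
set.seed(5)
n <- 5   # small sample, so nu = n - 1 = 4 d.f. (assumed value)

# T statistic: standardize the sample mean with S instead of sigma
t.stat <- replicate(10000, {
  x <- rnorm(n, mean = 10, sd = 2)
  (mean(x) - 10) / (sd(x) / sqrt(n))
})

# Using N(0,1) cutoffs, far more than 5% of T values fall in the tails
tail.normal <- mean(abs(t.stat) > qnorm(0.975))
# Using t cutoffs with n - 1 d.f., about 5% do
tail.t <- mean(abs(t.stat) > qt(0.975, df = n - 1))

c(normal.cutoff = tail.normal, t.cutoff = tail.t)
```

The heavier tails of T come from the extra variability of S in the denominator, which matters most when n is small.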
Student's t densities for df=1,100
[Figure: Student's pdf, df=1 & 100]
Student's t Distribution Interesting Facts

E X = 0, for $\nu > 1$
Var X = $\nu/(\nu - 2)$, for $\nu > 2$
Related to F distribution ($F_{1,\nu} = t_\nu^2$)
As $\nu$ tends to infinity, the t variate tends to unit normal
If $\nu = 1$ then the t variate is standard Cauchy
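The three distributional relations can be verified with R's built-in distribution functions. In this sketch nu = 7 is an arbitrary example, and df = 1e6 stands in for "infinity".

```r
nu <- 7   # degrees of freedom (assumed example value)

# F(1, nu) = t(nu)^2: the upper 5% point of F is the squared two-sided t cutoff
f.vs.t <- all.equal(qf(0.95, df1 = 1, df2 = nu), qt(0.975, df = nu)^2)

x <- seq(-5, 5, by = 0.5)
# t with 1 d.f. has the standard Cauchy density
t1.vs.cauchy <- all.equal(dt(x, df = 1), dcauchy(x))
# t with very many d.f. is numerically close to the unit normal
t.vs.normal <- max(abs(dt(x, df = 1e6) - dnorm(x)))

print(f.vs.t)
print(t1.vs.cauchy)
print(t.vs.normal)
```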
Cauchy Distribution for center=0, scale=1 and center=1, scale=2
[Figure: Cauchy pdf]
Cauchy Distribution Interesting Facts
Parameters: a = center, b = scale
Mean and variance do not exist (how could this be?)
a = median
Quartiles = a +/- b
Special case of Student's t with 1 d.f.
Ratio of 2 independent unit normal variates is a standard Cauchy variate
Should not be thought of as only a pathological case (Casella & Berger), as we frequently (when?) calculate ratios of random variables.
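The ratio-of-normals fact and the quartile fact can be illustrated together. This is a simulation sketch with an arbitrary seed and 10000 draws.

```r
set.seed(6)
# Ratio of two independent unit normal variates
ratio <- rnorm(10000) / rnorm(10000)

# Standard Cauchy (a = 0, b = 1): median a = 0, quartiles a +/- b = -1 and 1
q <- unname(quantile(ratio, c(0.25, 0.5, 0.75)))
rbind(simulated = q, cauchy = qcauchy(c(0.25, 0.5, 0.75)))

# The sample mean, by contrast, does not settle down as more draws are added
running.mean <- cumsum(ratio) / seq_along(ratio)
if (interactive()) plot(running.mean, type = "l")
```

The quartiles are stable because they exist for the Cauchy; the running mean keeps jumping because occasional near-zero denominators produce enormous ratios.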
Snedecor-Fisher's F-Distribution
Consider two independent random samples: $X_1, X_2, \ldots, X_{n_1}$ from $N(\mu_1, \sigma_1^2)$, and $Y_1, Y_2, \ldots, Y_{n_2}$ from $N(\mu_2, \sigma_2^2)$.
Then $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$ has an F-distribution with $n_1 - 1$ d.f. in the numerator and $n_2 - 1$ d.f. in the denominator.
F is the ratio of two independent $\chi^2$'s divided by their respective d.f.'s.
Used to compare sample variances.
See Table A.6, F-distribution, pp. 677-679
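A simulation sketch of the variance-ratio result. The sample sizes, means and standard deviations below are arbitrary assumptions for illustration.

```r
set.seed(7)
n1 <- 8; n2 <- 12            # sample sizes (assumed values)
sigma1 <- 3; sigma2 <- 2     # population standard deviations (assumed values)

# Ratio of sample variances, each scaled by its own population variance
f.stat <- replicate(10000, {
  x <- rnorm(n1, mean = 0, sd = sigma1)
  y <- rnorm(n2, mean = 5, sd = sigma2)
  (var(x) / sigma1^2) / (var(y) / sigma2^2)
})

# About 5% of the ratios should exceed the upper 5% point of F(n1 - 1, n2 - 1)
upper <- qf(0.95, df1 = n1 - 1, df2 = n2 - 1)
tail.prob <- mean(f.stat > upper)
print(tail.prob)
```

The population means play no role: only the variances enter the ratio.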
Snedecor's F Distribution
[Figure: F pdf for df2=40]
Snedecor's F Distribution Interesting Facts

Parameters v, w, referred to as degrees of freedom (df).
Mean = w/(w-2), for w > 2
Variance = $2w^2(v + w - 2) / (v(w - 2)^2(w - 4))$, for w > 4
As the d.f. v and w increase, the F variate tends to normal
Related also to Chi-square, Student's t, Beta and Binomial
Reference for distributions: Statistical Distributions, 3rd ed., by Evans, Hastings and Peacock, Wiley, 2000
Sampling Distributions - Summary

For a random sample from any distribution with finite mean and variance, the standardized sample mean converges to N(0,1) as n increases (CLT).
In the normal case, the standardized sample mean with S instead of sigma in the denominator ~ Student's t(n-1).
Sum of n squared independent unit normal variates ~ Chi-square(n)
In the normal case, the sample variance has a scaled Chi-square distribution.
In the normal case, the ratio of sample variances from two independent samples, each divided by its population variance, has an F distribution.
Sir Ronald A. Fisher (1890-1962)

Wrote the first books on statistical methods (1926 & 1936): "A student should not be made to read Fisher's books unless he has read them before."
George W. Snedecor (1882-1974)

Taught at Iowa State Univ., where he wrote a college textbook (1937): "Thank God for Snedecor; now we can understand Fisher." (named the distribution for Fisher)
Sampling Distributions for Order Statistics

Most sampling distribution results (except for the CLT) apply to samples from normal populations.
If the data do not come from a normal (or at least approximately normal) population, then statistical methods called distribution-free or non-parametric methods can be used (Chapter 14).
Non-parametric methods are often based on ordered data (called order statistics: $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$) or just their ranks.
If $X_1, \ldots, X_n$ are from a continuous population with cdf F(x) and pdf f(x), then the pdf of $X_{(j)}$ is:
$f_{(j)}(x) = \frac{n!}{(j-1)!\,(n-j)!}\, [F(x)]^{j-1}\, [1 - F(x)]^{n-j}\, f(x)$
The confidence intervals for percentiles can be derived using the order statistics and the binomial distribution.
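The standard order-statistic density can be coded directly and checked against a known case: for Uniform(0,1) data, the j-th order statistic of n follows a Beta(j, n - j + 1) distribution. The function name order.stat.pdf and its arguments are hypothetical, chosen for this sketch.

```r
# pdf of the j-th order statistic in a sample of size n, given the
# population cdf and pdf as functions (cdf.fun, pdf.fun)
order.stat.pdf <- function(x, j, n, cdf.fun, pdf.fun) {
  const <- factorial(n) / (factorial(j - 1) * factorial(n - j))
  const * cdf.fun(x)^(j - 1) * (1 - cdf.fun(x))^(n - j) * pdf.fun(x)
}

x <- seq(0.05, 0.95, by = 0.05)
j <- 3; n <- 10   # assumed example values

# For Uniform(0,1), X(j) ~ Beta(j, n - j + 1)
matches.beta <- all.equal(order.stat.pdf(x, j, n, punif, dunif),
                          dbeta(x, shape1 = j, shape2 = n - j + 1))
print(matches.beta)
```

Passing pnorm and dnorm (or any other continuous cdf/pdf pair) in place of punif and dunif gives the corresponding density for other populations.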