Sampling Distributions of Statistics: Corresponds to Chapter 5 of Tamhane and Dunlop
Slides prepared by Elizabeth Newton (MIT), with some slides by Jacqueline Telford (Johns Hopkins University)
Sampling Distributions
Definitions and Key Concepts
A sample statistic used to estimate an unknown population parameter is called an estimate.
The discrepancy between the estimate and the true parameter value is known as sampling error.
A statistic is a random variable with a probability distribution, called the sampling distribution, which is generated by repeated sampling.
We use the sampling distribution of a statistic to assess the sampling error in an estimate.
Random Sample
Definition 5.11, page 201, Casella and Berger.
How is this different from a simple random sample?
For mutual independence, the population must be very large, or sampling must be done with replacement.
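Sample Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Sample Variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$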
How do the sample mean and variance vary in repeated samples of size n drawn from the population?
In general, it is difficult to find the exact sampling distribution. However, see the example of deriving the distribution when all possible samples can be enumerated (rolling 2 dice) in Sections 5.1 and 5.2. Note the errors on page 168.
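The two-dice case can be enumerated directly. The following is a minimal sketch, assuming two fair six-sided dice and the n - 1 divisor for the sample variance; the variable names are illustrative and it is not claimed to reproduce the textbook's tables.

```r
# Enumerate all 36 equally likely outcomes of rolling two fair dice.
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)

# Sample mean and sample variance (n - 1 divisor) of each sample of size 2.
xbar <- rowMeans(rolls)
s2 <- apply(rolls, 1, var)

# Exact sampling distributions: relative frequency of each distinct value.
dist_mean <- table(xbar) / nrow(rolls)
dist_var <- table(s2) / nrow(rolls)
print(round(dist_mean, 4))
print(round(dist_var, 4))

# Expected values of the two statistics over all 36 samples.
mean_of_xbar <- sum(as.numeric(names(dist_mean)) * dist_mean)
mean_of_s2 <- mean(s2)
```

Here mean_of_xbar and mean_of_s2 can be compared with the population mean 3.5 and population variance 35/12 of a single fair die.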
[Figure: axes log10(variance) and log10(sample.size)]
This graph was created using S-PLUS(R) Software. S-PLUS(R) is a registered trademark of Insightful Corporation.
[Figure: runif(500, min = 0, max = 10)]
QQ (Normal) plot of means of 500 samples of size 100 from uniform distribution
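x <- runif(100)                             # original sample of size 100 from Uniform(0, 1)
y <- mean(sample(x, 100, replace = TRUE))   # mean of one bootstrap resample, drawn with replacement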
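The two lines above give a single bootstrap mean. Repeating the resampling step many times yields a bootstrap distribution of the mean. This is a sketch only: the seed, the choice of 500 replicates, and the names boot_means, se_boot and se_formula are assumptions made for illustration.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
x <- runif(100)    # the observed sample of size 100
B <- 500           # number of bootstrap resamples (assumed)

# Each replicate resamples x with replacement and takes the mean.
boot_means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))

# Bootstrap estimate of the standard error of the mean,
# next to the usual formula s / sqrt(n).
se_boot <- sd(boot_means)
se_formula <- sd(x) / sqrt(length(x))
print(c(bootstrap = se_boot, formula = se_formula))
```

Calling qqnorm(boot_means) afterwards would draw a normal Q-Q plot of the bootstrap means.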
Normal probability plot of means of 500 bootstrap samples from sample of size 100 from exponential distribution
[Figure axis: Quantiles of Standard Normal]
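For a random sample from a population with mean $\mu$ and finite variance $\sigma^2$, the Central Limit Theorem (CLT) says that as $n \to \infty$, the distribution of $\bar{X}$ also converges to Normal, i.e., $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ converges to N(0,1).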
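A small simulation can illustrate the statement. This sketch assumes samples of size 100 from a Uniform(0, 10) population and 500 repetitions; the seed and variable names are arbitrary.

```r
set.seed(2)                       # arbitrary seed
n <- 100                          # sample size
reps <- 500                       # number of repeated samples
mu <- 5                           # mean of Uniform(0, 10)
sigma <- 10 / sqrt(12)            # standard deviation of Uniform(0, 10)

# Sample mean of each repeated sample.
xbar <- replicate(reps, mean(runif(n, min = 0, max = 10)))

# Standardize the sample means as in the CLT.
z <- (xbar - mu) / (sigma / sqrt(n))

# If the normal approximation is good, z has mean near 0, sd near 1,
# and roughly 95% of its values fall within +/- 1.96.
prop_within <- mean(abs(z) < 1.96)
print(c(mean = mean(z), sd = sd(z), within_1.96 = prop_within))
```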
Suppose that X is B(n, p). Then the mean of X is np and the variance of X is np(1 - p).
By the CLT, we have: $\frac{X - np}{\sqrt{np(1-p)}} \approx N(0,1)$ for large n.
How large a sample, n, do we need for the approximation to be good?
Rule of Thumb: $np \ge 10$ and $n(1-p) \ge 10$
Continuity Correction
See Figure 5.4 for motivation.
Exact Binomial Probability: $P(X \le 8) = 0.2517$
Normal approximation without Continuity Correction: $P(X \le 8) \approx 0.1867$
Normal approximation with Continuity Correction: $P(X \le 8.5) \approx 0.2514$ (much better agreement with exact calculation)
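The three probabilities can be checked numerically. The slide does not state n and p; the sketch below assumes n = 20 and p = 0.5, which is consistent with the exact value 0.2517. The normal values may differ from the slide in the third or fourth decimal, presumably because the slide rounds z to two decimals before using a table.

```r
n <- 20                            # assumed; not stated on the slide
p <- 0.5                           # assumed; not stated on the slide
mu <- n * p                        # binomial mean
sigma <- sqrt(n * p * (1 - p))     # binomial standard deviation

exact <- pbinom(8, size = n, prob = p)              # exact P(X <= 8)
approx_plain <- pnorm(8, mean = mu, sd = sigma)     # no continuity correction
approx_cc <- pnorm(8.5, mean = mu, sd = sigma)      # with continuity correction

print(round(c(exact = exact, plain = approx_plain, corrected = approx_cc), 4))
```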
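Let $Z_1, \ldots, Z_\nu$ be i.i.d. N(0,1) and let $Y = Z_1^2 + \cdots + Z_\nu^2$. The distribution of Y is known as the $\chi^2$ distribution with $\nu$ degrees of freedom (d.f.), written $Y \sim \chi^2_\nu$.
See Figures 5.5 and 5.6, pp. 176-177, and Table A.5, p. 676.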
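The chi-square distribution can be illustrated by simulating sums of squared standard normal variates. This is a sketch with an arbitrary choice of 4 d.f., 10000 replicates and an arbitrary seed; it compares the simulated values with R's qchisq and with the known mean (the d.f.) and variance (twice the d.f.).

```r
set.seed(3)                        # arbitrary seed
nu <- 4                            # degrees of freedom (illustrative choice)
reps <- 10000                      # number of simulated chi-square variates

# Each row holds nu independent N(0,1) draws; y is the row sum of squares.
z <- matrix(rnorm(reps * nu), nrow = reps, ncol = nu)
y <- rowSums(z^2)

# Theory: mean nu and variance 2 * nu.
print(c(mean = mean(y), var = var(y)))

# Empirical 95th percentile next to the chi-square quantile function.
q_emp <- quantile(y, 0.95, names = FALSE)
q_theory <- qchisq(0.95, df = nu)
print(c(empirical = q_emp, theoretical = q_theory))
```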
[Figure: Chi-square distribution, plotted against x]
variates.
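Student's t-Distribution
Consider a random sample $X_1, X_2, \ldots, X_n$ drawn from $N(\mu, \sigma^2)$. It is known that $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$ is exactly distributed as N(0,1).
$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$ is NOT distributed as N(0,1). It has Student's t-distribution, a different distribution for each $\nu = n - 1$ degrees of freedom (d.f.).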
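The difference between the two standardized quantities shows up clearly for small n. The following sketch assumes n = 5, a standard normal population, and an arbitrary seed; it compares the simulated tail probability of the statistic that uses S with the N(0,1) and t tail probabilities.

```r
set.seed(4)                        # arbitrary seed
n <- 5                             # small sample size (illustrative choice)
reps <- 20000                      # number of simulated samples
mu <- 0                            # population mean (arbitrary)
sigma <- 1                         # population sd (arbitrary)

# Statistic with sigma replaced by the sample standard deviation S.
tstat <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

tail_emp <- mean(abs(tstat) > 1.96)        # simulated P(|T| > 1.96)
tail_norm <- 2 * pnorm(-1.96)              # what N(0,1) would give
tail_t <- 2 * pt(-1.96, df = n - 1)        # t with n - 1 d.f.
print(round(c(simulated = tail_emp, normal = tail_norm, t = tail_t), 4))
```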
[Figure: Cauchy pdf, plotted against x]
Parameters: a = center, b = scale
Mean and variance do not exist (how could this be?)
a = median
Quartiles = a ± b
Special case of Student's t with 1 d.f.
The ratio of 2 independent unit normal variates is a standard Cauchy variate
Should not be thought of as "only a pathological case" (Casella & Berger), as we frequently (when?) calculate ratios of random variables.
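Two of these properties can be checked by simulation: the ratio of independent unit normals has quartiles near a ± b = ±1, and its running mean does not settle down as more values are added. This is a sketch with an arbitrary seed and sample size.

```r
set.seed(5)                        # arbitrary seed
reps <- 100000                     # number of simulated ratios

# Ratio of two independent N(0,1) variates.
ratio <- rnorm(reps) / rnorm(reps)

# Quartiles and median next to those of the standard Cauchy (a = 0, b = 1).
q_emp <- quantile(ratio, c(0.25, 0.5, 0.75), names = FALSE)
q_theory <- qcauchy(c(0.25, 0.5, 0.75))
print(rbind(empirical = q_emp, theoretical = q_theory))

# Running mean at increasing sample sizes; occasional extreme values
# keep it from stabilizing the way it would for a normal sample.
running_mean <- cumsum(ratio) / seq_along(ratio)
print(running_mean[c(100, 1000, 10000, 100000)])
```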
Snedecor-Fisher's F-Distribution
Consider two independent random samples: $X_1, X_2, \ldots, X_{n_1}$ from $N(\mu_1, \sigma_1^2)$, and $Y_1, Y_2, \ldots, Y_{n_2}$ from $N(\mu_2, \sigma_2^2)$.
Then $\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2}$ has an F-distribution with $n_1 - 1$ d.f. in the numerator and $n_2 - 1$ d.f. in the denominator.
F is the ratio of two independent $\chi^2$'s, each divided by its respective d.f.
Used to compare sample variances.
See Table A.6, F-distribution, pp. 677-679.
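The distribution of the scaled variance ratio can be checked by simulation. This sketch assumes sample sizes 8 and 12 and arbitrary population standard deviations; it compares the simulated upper 5% tail with the quantile from R's qf.

```r
set.seed(6)                        # arbitrary seed
n1 <- 8                            # size of the X sample (illustrative)
n2 <- 12                           # size of the Y sample (illustrative)
sigma1 <- 2                        # sd of the X population (arbitrary)
sigma2 <- 3                        # sd of the Y population (arbitrary)
reps <- 20000                      # number of simulated pairs of samples

# Ratio of sample variances, each scaled by its population variance.
f <- replicate(reps, {
  x <- rnorm(n1, mean = 0, sd = sigma1)
  y <- rnorm(n2, mean = 0, sd = sigma2)
  (var(x) / sigma1^2) / (var(y) / sigma2^2)
})

# Upper 5% point of F with n1 - 1 and n2 - 1 d.f., and the fraction
# of simulated ratios that exceed it (should be close to 0.05).
q_theory <- qf(0.95, df1 = n1 - 1, df2 = n2 - 1)
tail_emp <- mean(f > q_theory)
print(c(q_theory = q_theory, tail_emp = tail_emp))
```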
[Figure: Snedecor's F distribution, plotted against x]
The confidence intervals for percentiles can be derived using the order statistics and the binomial distribution.
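For the median (the 50th percentile), the idea can be sketched as follows. The number of observations below the median is Bin(n, 0.5), so the interval between the k-th smallest and k-th largest order statistics covers the median with probability 1 - 2 P(B <= k - 1). The data, the seed, and the names k, coverage and ci are illustrative assumptions; this is one common construction, not necessarily the one used in the textbook.

```r
set.seed(7)                        # arbitrary seed
n <- 25                            # sample size (illustrative)
x <- sort(rexp(n))                 # illustrative data, sorted into order statistics
alpha <- 0.05                      # target: at least 95% confidence

# B = number of observations below the median, B ~ Bin(n, 0.5).
# [X_(k), X_(n-k+1)] covers the median exactly when k <= B <= n - k.
# Note: for very small n, qbinom can return 0 and no such interval exists.
k <- qbinom(alpha / 2, size = n, prob = 0.5)

# Exact coverage probability of the interval, by symmetry of Bin(n, 0.5).
coverage <- 1 - 2 * pbinom(k - 1, size = n, prob = 0.5)

ci <- c(lower = x[k], upper = x[n - k + 1])
print(k)
print(round(coverage, 4))
print(ci)
```

Because the binomial distribution is discrete, the achieved coverage is generally somewhat above the nominal 95% rather than equal to it.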