James Stock CH 1, 2, 3 Slides
Initial data analysis: compare districts with "small" and "large" class sizes:

Class size   Average score (Ȳ)   Standard deviation (s_Y)   n
Small        657.4               19.4                       238
Large        650.0               17.9                       182
1. Estimation
$\bar{Y}_{small} - \bar{Y}_{large} = \dfrac{1}{n_{small}}\sum_{i=1}^{n_{small}} Y_i - \dfrac{1}{n_{large}}\sum_{i=1}^{n_{large}} Y_i$
$= 657.4 - 650.0 = 7.4$
Is this a large difference in a real-world sense?
Standard deviation across districts = 19.1
Difference between the 60th and 75th percentiles of the test score distribution is 667.6 − 659.4 = 8.2
Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?
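As a sketch of this estimation step, the difference in group means is a one-liner; the data and the STR cutoff below are made up for illustration (the slides' actual data file is not shown here):

```python
import numpy as np

# Hypothetical stand-in for the district data: `scores` holds district
# average test scores, `str_` the student-teacher ratios.
rng = np.random.default_rng(0)
str_ = rng.uniform(14, 26, size=420)
scores = 700.0 - 2.3 * str_ + rng.normal(0.0, 15.0, size=420)

small = str_ < 20                    # illustrative "small class" cutoff
diff = scores[small].mean() - scores[~small].mean()
print(diff)                          # Ybar_small - Ybar_large
```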
2. Hypothesis testing
Difference-in-means test: compute the t-statistic,
$t = \dfrac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \dfrac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)}$   (remember this?)

where $SE(\bar{Y}_s - \bar{Y}_l)$ is the standard error of $\bar{Y}_s - \bar{Y}_l$ and

$s_s^2 = \dfrac{1}{n_s - 1}\sum_{i=1}^{n_s} (Y_i - \bar{Y}_s)^2$   (etc.)
Size    Ȳ       s_Y    n
small   657.4   19.4   238
large   650.0   17.9   182

$t = \dfrac{\bar{Y}_s - \bar{Y}_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \dfrac{657.4 - 650.0}{\sqrt{\dfrac{19.4^2}{238} + \dfrac{17.9^2}{182}}} = \dfrac{7.4}{1.83} = 4.05$
|t| > 1.96, so reject (at the 5% significance level) the null
hypothesis that the two means are the same.
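The arithmetic above is easy to reproduce from the summary statistics alone; a minimal Python sketch:

```python
import math

# Summary statistics from the table above (small vs. large STR districts)
ybar_s, s_s, n_s = 657.4, 19.4, 238
ybar_l, s_l, n_l = 650.0, 17.9, 182

se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)  # SE(Ybar_s - Ybar_l), about 1.83
t = (ybar_s - ybar_l) / se                   # about 4.05
print(se, t, abs(t) > 1.96)                  # True: reject H0 at the 5% level
```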
3. Confidence interval
A 95% confidence interval for the difference between the
means is,
$(\bar{Y}_s - \bar{Y}_l) \pm 1.96\,SE(\bar{Y}_s - \bar{Y}_l)$
$= 7.4 \pm 1.96 \times 1.83 = (3.8, 11.0)$
Two equivalent statements:
1. The 95% confidence interval for $\mu_s - \mu_l$ doesn't include 0;
2. The hypothesis that $\mu_s - \mu_l = 0$ is rejected at the 5% level.
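The interval above is equally quick to reproduce:

```python
import math

diff = 657.4 - 650.0                           # Ybar_s - Ybar_l = 7.4
se = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)  # about 1.83
print(diff - 1.96 * se, diff + 1.96 * se)      # about (3.8, 11.0)
```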
Population: the collection of all entities of interest (here, school districts).
We will think of populations as infinitely large (∞ is an approximation to "very big")
Random variable Y
Numerical summary of a random outcome (district average test score, district STR)
Population distribution of Y
The probabilities of different values of Y that occur in the population, for ex. Pr[Y = 650] (when Y is discrete);
or: the probabilities of sets of these values, for ex. Pr[640 ≤ Y ≤ 660] (when Y is continuous).
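For a continuous Y, such a probability can be computed once a distribution is assumed; a sketch assuming, purely for illustration, that Y ~ N(650, 19.4²) — the slides do not assert this population distribution:

```python
from scipy.stats import norm

# Hypothetical population distribution, for illustration only
y = norm(loc=650, scale=19.4)
print(y.cdf(660) - y.cdf(640))   # Pr[640 <= Y <= 660] under this assumption
```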
variance = $E(Y - \mu_Y)^2 = \sigma_Y^2$
Moments, ctd.
skewness = $\dfrac{E\left[(Y - \mu_Y)^3\right]}{\sigma_Y^3}$

kurtosis = $\dfrac{E\left[(Y - \mu_Y)^4\right]}{\sigma_Y^4}$
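These moments are easy to estimate from a sample; a sketch with scipy (note that scipy's `kurtosis` reports *excess* kurtosis by default, so `fisher=False` is needed to match the definition above):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)       # symmetric, normally distributed draws

print(skew(y))                    # near 0: no asymmetry
print(kurtosis(y, fisher=False))  # near 3 for a normal, as defined above
```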
The covariance is a measure of the linear association between X and Z; so is the correlation:
$corr(X,Z) = \dfrac{cov(X,Z)}{\sqrt{var(X)\,var(Z)}} = \dfrac{\sigma_{XZ}}{\sigma_X \sigma_Z} \equiv r_{XZ}$
−1 ≤ corr(X,Z) ≤ 1
corr(X,Z) = 1 means perfect positive linear association
corr(X,Z) = −1 means perfect negative linear association
corr(X,Z) = 0 means no linear association
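A sample analogue (made-up data) illustrates these bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
z = 0.8 * x + rng.normal(size=1_000)   # positive linear association by construction

print(np.corrcoef(x, z)[0, 1])         # sample correlation, strictly within (-1, 1)
```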
median($Y_1, \ldots, Y_n$)
The starting point is the sampling distribution of $\bar{Y}$
$\bar{Y} - \mu_Y = \dfrac{1}{n}\sum_{i=1}^{n} Y_i - \mu_Y = \dfrac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)$

so

$var(\bar{Y}) = E\left[\dfrac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)\right]^2$
$= E\left\{\left[\dfrac{1}{n}\sum_{i=1}^{n}(Y_i - \mu_Y)\right]\left[\dfrac{1}{n}\sum_{j=1}^{n}(Y_j - \mu_Y)\right]\right\}$
$= \dfrac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E\left[(Y_i - \mu_Y)(Y_j - \mu_Y)\right]$
$= \dfrac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} cov(Y_i, Y_j)$
$= \dfrac{1}{n^2}\sum_{i=1}^{n} \sigma_Y^2$   (the $Y_i$ are i.i.d., so $cov(Y_i, Y_j) = 0$ for $i \neq j$)
$= \dfrac{\sigma_Y^2}{n}$
$var(\bar{Y}) = \dfrac{\sigma_Y^2}{n}$
Implications:
1. $\bar{Y}$ is an unbiased estimator of $\mu_Y$ (that is, $E(\bar{Y}) = \mu_Y$)
2. $var(\bar{Y})$ is inversely proportional to n, so the sampling uncertainty shrinks as n grows
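A simulation sketch (not from the slides) confirms the formula: across many repeated samples, the variance of $\bar{Y}$ is close to $\sigma_Y^2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = 10.0, 25, 100_000

# `reps` independent samples of size n; one sample mean per row
ybars = rng.normal(0.0, sigma, size=(reps, n)).mean(axis=1)

print(ybars.var())    # close to 4.0
print(sigma**2 / n)   # sigma_Y^2 / n = 4.0
```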
$\dfrac{\bar{Y} - E(\bar{Y})}{\sqrt{var(\bar{Y})}}$   (the standardized $\bar{Y}$)
Other than its mean ($\mu_Y$) and variance ($\sigma_Y^2/n$), the exact distribution of $\bar{Y}$ is complicated and depends on the distribution of Y (the population distribution)
When n is large, the sampling distribution simplifies:
$\bar{Y}$ is consistent: $\bar{Y} \xrightarrow{\ p\ } \mu_Y$ (law of large numbers)
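Consistency is easy to see by simulation (a sketch with an arbitrary $\mu_Y$ = 5):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 5.0
for n in (10, 1_000, 100_000):
    print(n, rng.normal(mu, 10.0, size=n).mean())   # Ybar approaches mu as n grows
```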
$\bar{Y}$ is the least squares estimator of $\mu_Y$; $\bar{Y}$ solves

$\min_m \sum_{i=1}^{n} (Y_i - m)^2$

Setting the derivative with respect to m to zero:

$\dfrac{d}{dm}\sum_{i=1}^{n}(Y_i - m)^2 = \sum_{i=1}^{n}\dfrac{d}{dm}(Y_i - m)^2 = -2\sum_{i=1}^{n}(Y_i - m) = 0$

so $\sum_{i=1}^{n} Y_i = nm$, that is, $m = \dfrac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$.
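A numerical check of the least-squares property (hypothetical data): a generic one-dimensional optimizer recovers the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, size=200)     # made-up sample

# Minimize the sum of squared deviations over m
res = minimize_scalar(lambda m: np.sum((y - m) ** 2))
print(res.x, y.mean())                 # the two agree up to solver tolerance
```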
Hypothesis Testing
The hypothesis testing problem (for the mean): make a
provisional decision, based on the evidence at hand, whether
a null hypothesis is true, or instead that some alternative
hypothesis is true. That is, test
H0: E(Y) = μY,0 vs. H1: E(Y) > μY,0   (1-sided, >)
H0: E(Y) = μY,0 vs. H1: E(Y) < μY,0   (1-sided, <)
H0: E(Y) = μY,0 vs. H1: E(Y) ≠ μY,0   (2-sided)
p-value = $\Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right]$,

where $\bar{Y}^{act}$ is the value of $\bar{Y}$ actually computed from the data. Standardizing by $\sigma_{\bar{Y}}$,

p-value = $\Pr_{H_0}\left[\left|\dfrac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right| > \left|\dfrac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|\right]$

= probability under the left+right N(0,1) tails beyond $\left|\dfrac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|$,

where $\sigma_{\bar{Y}}$ = std. dev. of the distribution of $\bar{Y}$ = $\sigma_Y/\sqrt{n}$.
$s_Y^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ = sample variance of Y

Fact: if $Y_1, \ldots, Y_n$ are i.i.d. (and $E(Y^4) < \infty$), then $s_Y^2 \xrightarrow{\ p\ } \sigma_Y^2$
p-value = $\Pr_{H_0}\left[\left|\dfrac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right| > \left|\dfrac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|\right]$
$\cong \Pr_{H_0}\left[\left|\dfrac{\bar{Y} - \mu_{Y,0}}{s_Y/\sqrt{n}}\right| > \left|\dfrac{\bar{Y}^{act} - \mu_{Y,0}}{s_Y/\sqrt{n}}\right|\right]$   (large n)

so

p-value = $\Pr_{H_0}\left[\,|t| > |t^{act}|\,\right]$   ($\sigma_Y$ estimated)

where $t = \dfrac{\bar{Y} - \mu_{Y,0}}{s_Y/\sqrt{n}}$
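In large samples, then, the p-value is just the two-sided normal tail probability of the observed t-statistic; a sketch with made-up numbers:

```python
import math
from scipy.stats import norm

# Made-up numbers: observed mean, hypothesized mean, sample sd, sample size
ybar_act, mu_0, s_y, n = 52.1, 50.0, 10.0, 100

t_act = (ybar_act - mu_0) / (s_y / math.sqrt(n))   # = 2.1
p_value = 2 * norm.sf(abs(t_act))                  # left + right N(0,1) tails
print(t_act, p_value)                              # p about 0.036
```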
Degrees of freedom   5% t-distribution critical value
10                   2.23
20                   2.09
30                   2.04
60                   2.00
∞                    1.96
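These critical values can be reproduced with scipy (the degrees-of-freedom labels in the left column are inferred from standard t tables; they were lost in extraction):

```python
from scipy.stats import norm, t

for df in (10, 20, 30, 60):
    print(df, round(t.ppf(0.975, df), 2))   # 2.23, 2.09, 2.04, 2.0
print("inf", round(norm.ppf(0.975), 2))     # 1.96: the t critical value approaches N(0,1)
```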
Confidence Intervals
A 95% confidence interval for μY is an interval that contains the true value of μY in 95% of repeated samples.
Digression: What is random here? The values of Y1, …, Yn, and thus any functions of them, including the confidence interval. The confidence interval will differ from one sample to the next. The population parameter, μY, is not random; we just don't know it.
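The "repeated samples" idea can be simulated directly (a sketch with a known μY, so coverage can be counted):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 100, 10_000   # true mu known by construction

covered = 0
for _ in range(reps):
    y = rng.normal(mu, sigma, size=n)
    half = 1.96 * y.std(ddof=1) / np.sqrt(n)   # 1.96 * s_Y / sqrt(n)
    covered += y.mean() - half <= mu <= y.mean() + half

print(covered / reps)   # close to 0.95
```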
Summary:
From the two assumptions of:
(1) simple random sampling of a population, that is,
{Yi, i = 1, …, n} are i.i.d.
(2) 0 < E(Y⁴) < ∞
we developed, for large samples (large n):
Theory of estimation (sampling distribution of Ȳ)
Theory of hypothesis testing (large-n distribution of the t-statistic and computation of the p-value)
Theory of confidence intervals (constructed by inverting the test statistic)
Are assumptions (1) & (2) plausible in practice? Yes