least squares estimator. Weighted least squares.
Robust and clustered standard errors.
Jakub Mućk
SGH Warsaw School of Economics
Multiple regression
y = β0 + β1 x1 + β2 x2 + . . . + βK xK + ε (1)
- y is the (outcome) dependent variable;
- x1, x2, ..., xK is the set of independent variables;
- ε is the error term.
The dependent variable is explained by components that vary with the
independent variables and by the error term.
β0 is the intercept.
β1 , β2 , . . . , βK are the coefficients (slopes) on x1 , x2 , . . . , xK .
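A minimal sketch of how the coefficients in equation (1) might be estimated by least squares, assuming Python with NumPy. The helper name `ols`, the data, and the coefficient values 1, 2 and -3 are hypothetical and chosen only so the result is easy to check; the data are noise-free, so the estimates reproduce the coefficients.

```python
import numpy as np


def ols(X, y):
    """Return least squares coefficients for y = X @ beta + error.

    X must already contain a leading column of ones for the intercept.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 1.0 + 2.0 * x1 - 3.0 * x2

beta_hat = ols(X, y)  # ordered as (intercept, slope on x1, slope on x2)
```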
Assumptions of the least squares estimators I
Assumption #1: true DGP (data generating process):
y = Xβ + ε. (2)
E (ε) = 0, (3)
E(Xε) = 0. (7)
Assumptions of the least squares estimators II
rank(X) = K + 1 ≤ N. (8)
$\varepsilon \sim N(0, \sigma^2)$.
Gauss-Markov Theorem
The least squares estimator
Consequences of non spherical errors
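$$
\begin{aligned}
Var(\hat\beta^{OLS}) &= E\left[ (X'X)^{-1} X'\varepsilon \left( (X'X)^{-1} X'\varepsilon \right)' \right] \\
&= E\left[ (X'X)^{-1} X' \varepsilon\varepsilon' X (X'X)^{-1} \right] \\
&= (X'X)^{-1} X' E(\varepsilon\varepsilon') X (X'X)^{-1}
\end{aligned}
$$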
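As a check on the last line of the derivation, consider the spherical case. The form $E(\varepsilon\varepsilon') = \sigma^2 I$ is assumed here only for contrast, with X treated as fixed as in the derivation. The sandwich then collapses to the familiar expression:

```latex
E(\varepsilon\varepsilon') = \sigma^2 I
\;\Longrightarrow\;
Var(\hat\beta^{OLS})
  = (X'X)^{-1} X' (\sigma^2 I) X (X'X)^{-1}
  = \sigma^2 (X'X)^{-1} (X'X) (X'X)^{-1}
  = \sigma^2 (X'X)^{-1}.
```

With non-spherical errors the middle term does not cancel, which is why the usual formula $\sigma^2 (X'X)^{-1}$ no longer applies.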
- The least squares estimator is still unbiased and consistent, but it is no longer
  the best (efficient) linear unbiased estimator.
- Inconsistency of variance. The standard errors usually computed for the
  least squares estimator are unreliable.
- Confidence intervals and hypothesis tests that use these standard errors may
  be misleading.
- Visual inspection of residuals.
- Formal tests.
Dealing with non spherical errors
- (Feasible) Generalized Least Squares.
- Robust standard errors.
Special cases
- Heteroskedasticity of the error term.
- Serial correlation.
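- The simple linear model:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad var(\varepsilon_i) = \sigma^2, \tag{16}$$
the variance of the least squares estimator for β1:
$$var(\hat\beta_1^{LS}) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}. \tag{17}$$
- The simple linear model (with heteroskedasticity):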
Detecting Heteroskedasticity
[Figure: household food expenditure per week plotted against weekly household income.]
[Figure: squared residuals plotted against weekly household income.]
The Breusch-Pagan test I
H0 : α1 = α2 = . . . = αS = 0,
H1 : not all αj = 0.
The Breusch-Pagan test II
The null hypothesis is homoskedasticity, while the alternative is heteroskedasticity.
Note that for a linear function we have:
where νi is random.
The test statistic is based on the above regression (for a linear function), obtained
after substituting the squared least squares residuals $\hat\varepsilon_i^2$ for $\varepsilon_i^2$:
Finally, the test statistic based on the $R^2$ from the previous regression has
a chi-square distribution with S degrees of freedom:
$$\chi^2 = N R^2 \sim \chi^2_{(S)}. \tag{25}$$
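The N·R² statistic in (25) can be sketched in plain Python for the simple regression case with a single variance regressor (z = x, so S = 1). The data and helper names are hypothetical, and the 5% critical value 3.841 of the chi-square distribution with one degree of freedom is hard-coded to keep the example free of dependencies.

```python
def simple_ols(x, y):
    """Return (intercept, slope) from a least squares fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """Return R^2 from a least squares fit of y on x (with intercept)."""
    b0, b1 = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ssr / sst


def breusch_pagan_stat(x, y, z):
    """Return N * R^2 from regressing squared residuals on z."""
    b0, b1 = simple_ols(x, y)
    resid_sq = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    return len(y) * r_squared(z, resid_sq)


# Hypothetical data: the spread of y around the line grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.4, 7.5, 11.2, 10.6, 16.3, 13.8]

bp_stat = breusch_pagan_stat(x, y, x)  # here z = x, so S = 1
CHI2_5PCT_DF1 = 3.841                  # 5% critical value of chi-square(1)
reject_homoskedasticity = bp_stat > CHI2_5PCT_DF1
```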
The White test I
In the White test the explanatory variables x, their squares and cross-
products are used instead of z.
Example. In the linear model
E(y) = β0 + β1 x1 + β2 x2 . (26)
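For model (26), the variables used in place of z are therefore x1, x2, x1², x1·x2 and x2². A small sketch of building these regressors, assuming Python; `white_regressors` is a hypothetical helper and the two data rows are made up.

```python
from itertools import combinations_with_replacement


def white_regressors(rows):
    """Append squares and cross-products to each row (x1, ..., xK)."""
    expanded = []
    for row in rows:
        # All products x_j * x_k with j <= k: squares and cross-products.
        second_order = [a * b for a, b in combinations_with_replacement(row, 2)]
        expanded.append(list(row) + second_order)
    return expanded


# Two hypothetical observations of (x1, x2).
rows = [(1.0, 2.0), (3.0, 4.0)]
z = white_regressors(rows)  # columns: x1, x2, x1^2, x1*x2, x2^2
```

Regressing the squared residuals on these columns (plus a constant) and forming N·R² gives the White statistic, here with 5 degrees of freedom.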
The Goldfeld-Quandt test I
Splitting sample:
The Goldfeld-Quandt test II
Test statistic:
$$F = \frac{\hat\sigma_M^2 / \sigma_M^2}{\hat\sigma_F^2 / \sigma_F^2} \sim F_{(N_M - K_M,\; N_F - K_F)}. \tag{31}$$
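A minimal sketch of the statistic in (31) under the null of equal variances, assuming Python and a simple regression in each subsample. The subsamples M and F are hypothetical, and each variance estimate divides the sum of squared residuals by the number of observations minus the two estimated parameters.

```python
def residual_variance(x, y, n_params=2):
    """Return SSR / (N - n_params) from a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return ssr / (n - n_params)


# Hypothetical subsamples M and F.
x_m, y_m = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]
x_f, y_f = [1.0, 2.0, 3.0, 4.0], [1.0, 5.0, 1.0, 7.0]

var_m = residual_variance(x_m, y_m)
var_f = residual_variance(x_f, y_f)

# Under H0 the true variances are equal and cancel from the ratio in (31).
f_stat = var_m / var_f
df = (len(x_m) - 2, len(x_f) - 2)  # degrees of freedom of the F distribution
```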
Heteroskedasticity-Consistent Standard Errors
White's estimator for the variance helps avoid computing incorrect interval
estimates or incorrect values for test statistics in the presence of
heteroskedasticity, but it does not address the other implication of
heteroskedasticity: the least squares estimator is no longer best.
- But when the sample size is large, the variance of the least squares estimator may
  still be sufficiently small to get precise estimates.
- The robust standard error estimator does not require specifying a suitable variance
  function h().
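To illustrate the difference between the conventional and the robust variance, here is a sketch for the slope of a simple regression, assuming Python. It uses the basic White form (squared residuals plugged in for the unknown variances, with no small-sample correction), which is an assumption about the variant intended here; the data are hypothetical.

```python
import math


def slope_variances(x, y):
    """Return (conventional, heteroskedasticity-robust) variance of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Conventional formula: one common error variance for all observations.
    sigma2_hat = sum(e * e for e in resid) / (n - 2)
    conventional = sigma2_hat / sxx

    # White form: each observation contributes its own squared residual.
    robust = sum((d * e) ** 2 for d, e in zip(dev, resid)) / sxx ** 2
    return conventional, robust


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 5.0, 1.0, 7.0]

conventional_var, robust_var = slope_variances(x, y)
robust_se = math.sqrt(robust_var)
```

No variance function h() appears anywhere in the computation; only the least squares residuals are needed.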
Clustered standard errors
The general variance-covariance matrix of the error term will be block diagonal.
Denoting the group by g = 1, 2, ..., G, the variance can be estimated as:
$$Var(\hat\beta^{LS}) = (X'X)^{-1} \left( \sum_{g=1}^{G} x_g' \hat{e}_g \hat{e}_g' x_g \right) (X'X)^{-1}. \tag{35}$$
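The estimator in (35) might be computed along the following lines, assuming Python with NumPy. The function name, data and group labels are hypothetical, and no small-sample correction is applied (software packages usually add one).

```python
import numpy as np


def cluster_robust_cov(X, y, groups):
    """Return the cluster-robust covariance matrix of the OLS coefficients."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Middle term: sum over groups of x_g' e_g e_g' x_g.
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]  # x_g' e_g, one value per coefficient
        meat += np.outer(score, score)

    return XtX_inv @ meat @ XtX_inv


# Hypothetical data: four observations in two groups.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
groups = np.array([0, 1, 1, 0])
X = np.column_stack([np.ones_like(x), x])

cluster_cov = cluster_robust_cov(X, y, groups)
cluster_se = np.sqrt(np.diag(cluster_cov))  # (intercept, slope)
```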
Generalized Least Squares
GLS: known form of variance I
yi = β0 + β1 xi + εi (36)
or more generally
zi∗ = √ . (39)
GLS: known form of variance II
Therefore, the least squares estimator can be applied to the regression
based on the transformed variables.
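A sketch of least squares on transformed variables, assuming Python and a variance of the form var(εi) = σ² hi with known hi. Both this form and the choice hi = xi in the example are assumptions for illustration. Every variable, including the constant, is divided by the square root of hi, and least squares is then applied without an additional intercept.

```python
import math


def gls_by_transformation(x, y, h):
    """Estimate (b0, b1) by least squares on variables divided by sqrt(h)."""
    c = [1.0 / math.sqrt(hi) for hi in h]              # transformed constant
    xs = [xi / math.sqrt(hi) for xi, hi in zip(x, h)]  # transformed x
    ys = [yi / math.sqrt(hi) for yi, hi in zip(y, h)]  # transformed y

    # Normal equations for regressing ys on (c, xs) with no extra intercept.
    s_cc = sum(ci * ci for ci in c)
    s_cx = sum(ci * xi for ci, xi in zip(c, xs))
    s_xx = sum(xi * xi for xi in xs)
    s_cy = sum(ci * yi for ci, yi in zip(c, ys))
    s_xy = sum(xi * yi for xi, yi in zip(xs, ys))

    det = s_cc * s_xx - s_cx ** 2
    b0 = (s_xx * s_cy - s_cx * s_xy) / det
    b1 = (s_cc * s_xy - s_cx * s_cy) / det
    return b0, b1


# Hypothetical noise-free data from y = 1 + 2x, with h_i = x_i.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0_hat, b1_hat = gls_by_transformation(x, y, h=x)
```

Because the data are noise-free, the transformation leaves the estimates at the coefficients used to generate them; with noisy data it changes the weight each observation receives.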
GLS and grouped data I
Example: wages for female and male workers in the divided samples.
Splitting sample:
GLS and grouped data II
Unknown Form of Variance I
$$\ln \hat\varepsilon_i^2 = \alpha_1 + \alpha_2 z_1 + \dots + \alpha_S z_S + \nu_i \tag{49}$$
Serial correlation
Nature of serial correlation of the error term
Detecting serial correlation of the error term
The Lagrange multiplier test I
The Lagrange multiplier test allows correlations at more than one lag
to be tested jointly.
The AR(1) model for the error term:
et = ρ1 et−1 + νt (58)
yt = β0 + β1 xt + ρ1 et−1 + νt . (59)
yt = β0 + β1 xt + ρ1 êt−1 + νt , (60)
H0 : ρ1 = 0. (63)
The Lagrange multiplier test – testing higher order of autocorrelation
H0 : ρ1 = ρ2 = . . . = ρk = 0. (67)
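One common way to carry out this test is to regress the least squares residuals on xt and k lagged residuals and use T·R², compared with a chi-square distribution with k degrees of freedom. The sketch below assumes that form of the statistic, Python with NumPy, and pre-sample residuals set to zero; the function name and data are hypothetical.

```python
import numpy as np


def lm_autocorrelation_stat(x, y, lags=1):
    """Return T * R^2 from regressing residuals on x and lagged residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Step 1: least squares residuals from y_t = b0 + b1 * x_t + e_t.
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Step 2: lagged residuals, with pre-sample values set to zero (assumption).
    lagged = [np.concatenate([np.zeros(k), resid[:-k]]) for k in range(1, lags + 1)]
    Z = np.column_stack([X] + lagged)

    # Step 3: auxiliary regression and its R^2.
    gamma, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    ssr = np.sum((resid - Z @ gamma) ** 2)
    sst = np.sum((resid - resid.mean()) ** 2)
    return n * (1.0 - ssr / sst)


# Hypothetical time series with slowly changing deviations from the trend.
x = np.arange(1.0, 11.0)
y = np.array([2.5, 4.6, 6.8, 8.7, 10.4, 11.6, 13.5, 15.7, 18.4, 20.6])

lm_1 = lm_autocorrelation_stat(x, y, lags=1)  # H0: rho_1 = 0
lm_2 = lm_autocorrelation_stat(x, y, lags=2)  # H0: rho_1 = rho_2 = 0
```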
Estimation with Serially Correlated Errors
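The variance of the least squares estimator (in the simple regression model, i.e.,
$y_t = \beta_0 + \beta_1 x_t + e_t$):
$$var(\hat\beta_1) = \sum_t w_t^2 \, var(e_t) + \sum_t \sum_{s \neq t} w_t w_s \, cov(e_t, e_s), \tag{69}$$
where
$$w_t = \frac{x_t - \bar{x}}{\sum_t (x_t - \bar{x})^2}. \tag{70}$$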
If there is no serial correlation then the variance
$$var(\hat\beta_1) = \sum_t w_t^2 \, var(e_t), \tag{71}$$
$$cov(e_t, e_{t-k}) = \frac{\rho^k \sigma_\nu^2}{1 - \rho^2}, \qquad \rho_i = \rho^i. \tag{74}$$
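The covariance in (74) can be plugged into the general variance formula (69). A small sketch, assuming Python; the function names, the values of ρ and σν², and the regressor values are hypothetical.

```python
def ar1_autocov(rho, sigma_nu2, k):
    """Return cov(e_t, e_{t-k}) for a stationary AR(1) error, as in (74)."""
    return rho ** abs(k) * sigma_nu2 / (1.0 - rho ** 2)


def slope_variance(x, rho, sigma_nu2):
    """Return var(b1) from (69) when the errors follow an AR(1) process."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    w = [(xt - x_bar) / sxx for xt in x]
    # The double sum covers both the variance terms (t == s)
    # and the covariance terms (t != s).
    return sum(
        w[t] * w[s] * ar1_autocov(rho, sigma_nu2, t - s)
        for t in range(n)
        for s in range(n)
    )


gamma_0 = ar1_autocov(0.5, 3.0, 0)  # variance of e_t
gamma_1 = ar1_autocov(0.5, 3.0, 1)  # covariance at lag 1
gamma_2 = ar1_autocov(0.5, 3.0, 2)  # covariance at lag 2

x = [1.0, 2.0, 3.0, 4.0]
var_no_corr = slope_variance(x, rho=0.0, sigma_nu2=1.0)  # reduces to (71)
var_ar1 = slope_variance(x, rho=0.5, sigma_nu2=1.0)
```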
When the error term follows an AR(1) process, the simple regression can be expressed as:
yt = β0 + β1 xt + ρet−1 + νt . (75)
For the period t − 1 the error term can be expressed as:
where z ∈ {yt , xt , et }.
This transformation is called quasi-differencing.
To estimate ρ, we can use the sample autocorrelation of the residuals.
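Both steps can be sketched in a few lines, assuming Python. The estimator of ρ below (sum of products of adjacent residuals over the sum of squared residuals) is one common choice and is an assumption here, as is the standard quasi-differenced form z_t − ρ z_{t−1}, which drops the first observation. The numbers are made up.

```python
def estimate_rho(resid):
    """Return the first-order sample autocorrelation of the residuals."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def quasi_difference(z, rho):
    """Return z_t - rho * z_{t-1} for t = 2, ..., T (first observation dropped)."""
    return [z[t] - rho * z[t - 1] for t in range(1, len(z))]


# Hypothetical residuals that alternate in sign.
rho_hat = estimate_rho([1.0, -1.0, 1.0, -1.0])

# Hypothetical series transformed with rho = 0.5.
z_star = quasi_difference([1.0, 2.0, 4.0, 8.0], rho=0.5)
```

The same `quasi_difference` call would be applied to yt and xt before running least squares on the transformed series.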
By construction the error term is not autocorrelated: