Heteroskedasticity


Heteroskedasticity (Chapter 8 of Wooldridge's textbook)

Big Picture

In this lecture you will learn

1. Homoskedasticity and Heteroskedasticity

2. Robust Standard Error

3. Weighted Least Squares Estimator

Homoskedasticity

Consider a simple regression

y_i = β_0 + β_1 x_i + u_i,   i = 1, . . . , n   (1)

Homoskedasticity is the assumption that the variance of u is constant

var(u_i|x_i) = var(y_i|x_i) = σ² = constant   (Homoskedasticity)   (2)

Under the homoskedasticity assumption we can show that

1. the conventional standard error of β̂1 is


se(β̂_1) = √( σ² / TSS_x ) = √( σ² / ∑(x_i − x̄)² )   (3)

By default the Stata command reg y x uses formula (3) to compute the standard error, t value, p-value, etc. They are all wrong if homoskedasticity fails.
2. OLS is the best linear unbiased estimator (BLUE), a result called the Gauss-Markov
Theorem
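Formula (3) can be reproduced by hand. Below is a minimal NumPy sketch on simulated homoskedastic data (the data and variable names are hypothetical, not the textbook's):

```python
import numpy as np

# Simulated homoskedastic data (hypothetical example)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)  # constant error variance

# OLS estimates for y = b0 + b1*x + u
xbar, ybar = x.mean(), y.mean()
tss_x = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - ybar)) / tss_x
b0 = ybar - b1 * xbar

# Conventional standard error, formula (3): se = sqrt(sigma^2 / TSS_x),
# with sigma^2 estimated from the residuals (n - 2 degrees of freedom)
u_hat = y - b0 - b1 * x
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)
se_b1 = np.sqrt(sigma2_hat / tss_x)
print(b1, se_b1)
```

This is the same number Stata's reg y x reports for the slope, up to the residual-based estimate of σ².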
Heteroskedasticity

1. We can imagine a situation in which homoskedasticity (constant variance) is invalid (see the picture on the next slide). For instance, because rich people have more options, the dispersion of expenditure (variance of expenditure) can rise as income rises
2. Heteroskedasticity is present when the variance of the error term varies across observations

var(u_i|x_i) = var(y_i|x_i) = h(x_i) = σ_i² ≠ constant   (Heteroskedasticity)   (4)

Under heteroskedasticity we can show that


(a) the new standard error of the slope coefficient estimate is

se(β̂_1) = √( ∑(x_i − x̄)² σ_i² / (TSS_x)² )   (heteroskedasticity-robust standard error)   (5)

The Stata command reg y x, r uses formula (5) to compute the heteroskedasticity-robust standard error, t value, p-value, etc. They are correct whether or not homoskedasticity holds.
(b) OLS is no longer BLUE. Instead Weighted Least Squares (WLS) is BLUE
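In practice the unknown σ_i² in formula (5) is replaced by the squared residuals û_i² (White's estimator), which is what reg y x, r does. A minimal NumPy sketch on simulated heteroskedastic data (hypothetical example):

```python
import numpy as np

# Simulated data where the error variance grows with x: var(u|x) = x^2
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 5, size=n)
u = rng.normal(scale=x, size=n)   # heteroskedastic errors
y = 1.0 + 2.0 * x + u

xbar = x.mean()
tss_x = np.sum((x - xbar) ** 2)
b1 = np.sum((x - xbar) * (y - y.mean())) / tss_x
b0 = y.mean() - b1 * xbar
u_hat = y - b0 - b1 * x

# Conventional SE, formula (3) -- invalid under heteroskedasticity
se_conv = np.sqrt(np.sum(u_hat ** 2) / (n - 2) / tss_x)

# Robust SE, formula (5), with sigma_i^2 replaced by the squared residuals
se_robust = np.sqrt(np.sum((x - xbar) ** 2 * u_hat ** 2) / tss_x ** 2)
print(se_conv, se_robust)
```

The slope estimate b1 is the same in both cases; only the standard error changes.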
A Picture of Heteroskedasticity

Example 1

Remarks

1. We use Smoke data to illustrate heteroskedasticity

2. We regress cigs (number of cigarettes) onto lincome (log income), lcigpric (log price),
education, age, age squared and a dummy variable that equals one when there is a
smoking ban in restaurants

3. We find that neither income nor price matters (t values = 1.21, −0.13). This unexpected result raises a red flag: something may be wrong

4. The Stata command reg y x uses formula (3) to compute standard errors, t values, p-values and confidence intervals. They are all wrong if homoskedasticity fails. We are unsure whether there is heteroskedasticity, so we circle those numbers and put a big question mark

Example 1—continued

Informal Check I of Homoskedasticity

1. We next consider an informal check (eyeball econometrics) of the homoskedasticity assumption

2. We plot cigs against the smoking-ban dummy variable

3. We find that the conditional distribution of cigs when restaurn=0 is wider (has greater dispersion) than the conditional distribution when restaurn=1

4. This finding indicates that the variance depends at least on restaurn, so it is not constant. In short, it is very likely that for this problem homoskedasticity fails and heteroskedasticity is present

Example 1—continued

Informal Check II of Homoskedasticity

1. We compare the conditional standard deviation of cigs when restaurn=0 to that when restaurn=1

2. It's clear the two conditional standard deviations, 14.2149 and 11.8806, are different. Again, this finding suggests possible heteroskedasticity

3. This method is informal because we do not know whether that difference is statistically
significant
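This check is easy to reproduce. A sketch on simulated data whose group dispersions roughly mimic the slide's numbers (the figures below are assumptions, not the actual Smoke data):

```python
import numpy as np

# Hypothetical stand-in for the Smoke data: cigs split by the restaurn dummy
rng = np.random.default_rng(2)
cigs_ban0 = rng.normal(loc=9, scale=14.2, size=400)   # restaurn = 0
cigs_ban1 = rng.normal(loc=7, scale=11.9, size=200)   # restaurn = 1

# Conditional standard deviations, as in Informal Check II
sd0 = cigs_ban0.std(ddof=1)
sd1 = cigs_ban1.std(ddof=1)
print(sd0, sd1)
```

As the slide notes, eyeballing the two numbers cannot tell us whether the difference is statistically significant; that requires a formal test.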

Breusch-Pagan (BP) Test for Homoskedasticity

1. The BP test is the formal way to test the null hypothesis of homoskedasticity

2. The BP test involves three steps
(a) Step I: run original regression and save residual û

y = β0 + β1 x1 + . . . + βk xk + error term

(b) Step II: run an auxiliary regression given by

û² = c_0 + c_1 x_1 + . . . + c_k x_k + error term   (6)

Note that the dependent variable is the squared residual


(c) Step III: compute

BP = n · R² from the Step II regression ∼ χ²(k)   (7)

We reject homoskedasticity if the p-value is less than 0.05
3. Intuitively, we reject homoskedasticity when the auxiliary regression has a large R-squared, that is, when some of the x variables help explain the variance
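The three steps can be sketched in plain NumPy for a simple regression with one regressor (hypothetical data; in Stata, estat hettest automates this after reg):

```python
import numpy as np

# Simulated heteroskedastic data (hypothetical): var(u|x) = x^2
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)

# Step I: run the original regression and save the residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta

# Step II: auxiliary regression of squared residuals on the regressors
c, *_ = np.linalg.lstsq(X, u_hat ** 2, rcond=None)
fitted = X @ c
ss_res = np.sum((u_hat ** 2 - fitted) ** 2)
ss_tot = np.sum((u_hat ** 2 - (u_hat ** 2).mean()) ** 2)
r2_aux = 1 - ss_res / ss_tot

# Step III: BP = n * R^2, compared against chi2(k) with k = 1 here
bp = n * r2_aux
print(bp)   # compare against the chi2(1) 5% critical value, 3.84
```

With strongly heteroskedastic data like this, the statistic should comfortably exceed the critical value and homoskedasticity is rejected.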
Example 1—continued

Remarks

1. We use command predict uhat, r to save the residual of original regression

2. Next we generate squared residual, the dependent variable in the auxiliary regression

3. The auxiliary regression (6) indicates that age, age squared and restaurn are significant (t values = 4.48, −4.55, −2.36). So the variance changes when those variables change

4. The BP test statistic follows a chi-squared distribution with 6 degrees of freedom. For this problem it equals 32.25

5. The p value is less than 0.05, so we reject homoskedasticity

6. We conclude that for this problem homoskedasticity is invalid and heteroskedasticity is present

Road Map

1. If homoskedasticity is not rejected by the BP test, we can trust the results reported by the Stata command reg y x

2. If homoskedasticity is rejected, there are two options


(a) Option A: we still use OLS, but we must use the Stata command reg y x, r to report the heteroskedasticity-robust standard errors, t values, p values and confidence intervals
(b) Option B: we use WLS, which is more efficient than OLS in the presence of heteroskedasticity

Example 1—continued

Remarks

1. After we use the command reg y x, r, Stata reports heteroskedasticity-robust standard errors, t values, p values and confidence intervals

2. For instance, the standard error of the coefficient on lincome is 0.7277 based on formula (3), while the robust standard error based on formula (5) is 0.5960

3. We see t values, p values and confidence intervals change accordingly

4. Note that the coefficient estimates remain unchanged: heteroskedasticity has nothing to do with β̂ itself; it matters for var(β̂)

Weighted Least Squares (WLS)

1. We consider weighted least squares method if there is heteroskedasticity

2. Consider a simple regression

y_i = β_0 + β_1 x_i + u_i,   var(u_i|x_i) = h(x_i)   (8)

The idea of WLS is to transform the model so that the new error is homoskedastic

y_i/√h(x_i) = β_0 · (1/√h(x_i)) + β_1 · (x_i/√h(x_i)) + u_i/√h(x_i)   (9)

var( u_i/√h(x_i) | x_i ) = 1 = constant   (10)
3. WLS is just OLS applied to the transformed regression (9). Because the error is now homoskedastic, WLS is BLUE

4. The factor 1/√h(x_i) is called the weight. In practice we need to estimate h(x_i) before applying WLS; the details are in the textbook
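As a sketch of the transformation (9), assuming for illustration that the variance function h(x) = x² is known (in practice h must be estimated, as noted above; the data are simulated, not the textbook's):

```python
import numpy as np

# Simulated data with known variance function: var(u|x) = h(x) = x^2
rng = np.random.default_rng(4)
n = 400
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=x, size=n)

# Transform each variable by the weight 1/sqrt(h(x)), as in equation (9)
w = 1.0 / np.sqrt(x ** 2)
y_t = y * w
X_t = np.column_stack([w, x * w])   # transformed intercept and slope columns

# WLS = OLS on the transformed regression; the new error is homoskedastic
beta_wls, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
print(beta_wls)   # should be close to the true values [1.0, 2.0]
```

Because the transformed error has constant variance, OLS on the transformed model is BLUE, which is exactly the WLS estimator.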
Example 1—continued

Remarks

1. The details of obtaining WLS can be found in the textbook

2. In this case, log income becomes significant (t value = 2.96) after we apply WLS

3. This finding confirms that WLS can be more efficient (have smaller standard errors) than OLS in the presence of heteroskedasticity
