Heteroskedasticity
Big Picture
Homoskedasticity
$$y_i = \beta_0 + \beta_1 x_i + u_i, \quad i = 1, \ldots, n \tag{1}$$
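As a sketch in standard textbook notation, homoskedasticity for the model above can be stated as a constant conditional error variance. The symbol $\sigma^2$ and the unnumbered form are assumptions, since the slide's own display of this condition is not reproduced here:

$$\operatorname{var}(u_i \mid x_i) = \sigma^2 \quad \text{for all } i = 1, \ldots, n$$

Heteroskedasticity is the failure of this condition: the conditional variance of $u_i$ changes with $x_i$.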
Example 1
Remarks
2. We regress cigs (number of cigarettes) on lincome (log income), lcigpric (log price),
education, age, age squared and a dummy variable (restaurn) that equals one when there
is a smoking ban in restaurants
3. We find that neither income nor price matters (t values = 1.21, -0.13). This unexpected
result raises a red flag: something may be wrong
4. The Stata command reg y x uses formula (3) to compute the standard errors, t values,
p-values and confidence intervals. They are all wrong if homoskedasticity fails. We are
not yet sure whether heteroskedasticity is present, so we circle those numbers and put a
big question mark next to them
Example 1—continued
Informal Check I of Homoskedasticity
3. We find that the conditional distribution of cigs when restaurn=0 is wider (has
greater dispersion) than the conditional distribution when restaurn=1
4. This finding indicates that the variance depends at least on restaurn, so it is not
constant. In short, it is very likely that homoskedasticity fails for this problem and
heteroskedasticity is present
Example 1—continued
Informal Check II of Homoskedasticity
2. It’s clear that the two conditional standard deviations, 14.2149 and 11.8806, are
different. Again, this finding suggests possible heteroskedasticity
3. This method is informal because we do not know whether that difference is statistically
significant
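The informal check above can be sketched in a few lines of Python. This is only an illustration on simulated data: the function name `sd_by_group`, the variable names `y` and `d`, and the data-generating values are all hypothetical and are not the slides' data set or commands.

```python
import random
import statistics

def sd_by_group(y, d):
    """Return the sample standard deviation of y within each value of d."""
    groups = {}
    for yi, di in zip(y, d):
        groups.setdefault(di, []).append(yi)
    return {g: statistics.stdev(vals) for g, vals in groups.items()}

# Hypothetical simulated data (not the slides' data set):
# y is more dispersed when the dummy d equals 0 than when it equals 1.
random.seed(0)
n = 2000
d = [i % 2 for i in range(n)]
y = [random.gauss(8.0, 14.0 if di == 0 else 10.0) for di in d]

sds = sd_by_group(y, d)
print(sds)  # one conditional standard deviation per value of d
```

As in the slides, comparing the two numbers by eye says nothing about statistical significance; a formal test is still needed.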
Breusch-Pagan (BP) Test for Homoskedasticity
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \text{error term}$$
Remarks
2. Next we generate the squared residuals, which are the dependent variable in the
auxiliary regression
3. The auxiliary regression (6) indicates that age, age squared and restaurn are significant
(t values = 4.48, -4.55, -2.36), so the error variance changes when those variables change
4. Under the null hypothesis of homoskedasticity, the BP statistic follows a chi-squared
distribution with 6 degrees of freedom. For this problem the BP statistic is 32.25, which
exceeds the 5% critical value of 12.59, so homoskedasticity is rejected
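The steps above (fit the model, square the residuals, run the auxiliary regression) can be sketched for a single regressor as follows. This is a minimal illustration under stated assumptions: it uses the n·R² (LM) form of the statistic, which may differ from the version the slides' software reports, and the helper names `ols_simple` and `breusch_pagan_lm` and the simulated data are hypothetical. With one regressor in the auxiliary regression the reference distribution has 1 degree of freedom, not 6.

```python
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals, r_squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum(e ** 2 for e in resid)
    return b0, b1, resid, 1.0 - ssr / sst

def breusch_pagan_lm(x, y):
    """LM statistic n * R^2 from regressing the squared OLS residuals on x."""
    _, _, resid, _ = ols_simple(x, y)
    u2 = [e ** 2 for e in resid]           # dependent variable of the auxiliary regression
    _, _, _, r2_aux = ols_simple(x, u2)    # auxiliary regression of u^2 on x
    return len(x) * r2_aux

# Hypothetical simulated data: the error standard deviation grows with x.
random.seed(1)
n = 1000
x = [random.uniform(1.0, 5.0) for _ in range(n)]
y_het = [1.0 + 2.0 * xi + random.gauss(0.0, xi) for xi in x]

lm_het = breusch_pagan_lm(x, y_het)
print(lm_het)  # compare with the chi-squared(1) 5% critical value, 3.84
```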
Road Map
1. If the BP test does not reject homoskedasticity, we can trust the results reported by the
Stata command reg y x
Example 1—continued
Remarks
1. After running the command reg y x, r, Stata reports heteroskedasticity-robust standard
errors, t values, p-values and confidence intervals
2. For instance, the standard error of the coefficient on lincome is 0.7277 based on
formula (3), whereas the robust standard error based on formula (5) is 0.5960
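To make the contrast between the two kinds of standard error concrete, here is a minimal single-regressor sketch. Formulas (3) and (5) are not reproduced in this text, so the code assumes the usual textbook forms: the conventional formula σ̂²/SST_x and the White (HC0) formula Σ(x_i − x̄)²û_i²/SST_x². Statistical packages often apply an additional degrees-of-freedom adjustment, so their robust numbers need not match HC0 exactly. The function name and the three-point data set are hypothetical.

```python
import math

def slope_standard_errors(x, y):
    """Return (conventional_se, robust_se) for the OLS slope of y on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Conventional formula: valid only under homoskedasticity.
    sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)
    se_conv = math.sqrt(sigma2_hat / sxx)
    # White (HC0) formula: each squared residual is weighted by (x_i - xbar)^2.
    meat = sum(((xi - xbar) ** 2) * e ** 2 for xi, e in zip(x, resid))
    se_robust = math.sqrt(meat / sxx ** 2)
    return se_conv, se_robust

# Tiny hypothetical data set, chosen only so the arithmetic is easy to follow by hand.
se_conv, se_robust = slope_standard_errors([-1.0, 0.0, 1.0], [0.0, 0.0, 3.0])
print(se_conv, se_robust)
```

As in the lincome example, the robust standard error can be smaller or larger than the conventional one; neither direction is guaranteed.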
Weighted Least Squares (WLS)
The idea of WLS is to transform the model so that the new error is homoskedastic
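$$\frac{y_i}{\sqrt{h(x_i)}} = \beta_0 \frac{1}{\sqrt{h(x_i)}} + \beta_1 \frac{x_i}{\sqrt{h(x_i)}} + \frac{u_i}{\sqrt{h(x_i)}} \tag{9}$$

$$\operatorname{var}\!\left( \frac{u_i}{\sqrt{h(x_i)}} \,\middle|\, x_i \right) = 1 = \text{constant} \tag{10}$$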
3. WLS is just OLS applied to the transformed regression (9). Because the error is now
homoskedastic, WLS is BLUE
4. $1/\sqrt{h(x_i)}$ is called the weight. In practice we need to estimate $h(x_i)$ before
applying WLS. The details are in the textbook
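The transformation idea can be sketched directly: divide every term by the square root of h(x_i) and run OLS on the transformed variables, without adding a new intercept. This is an illustration only. It assumes h(x_i) is known, whereas in practice it has to be estimated; the function name `wls_simple` and the small data set are hypothetical.

```python
import math

def wls_simple(x, y, h):
    """WLS for y = b0 + b1*x + u when var(u|x) = h(x): OLS on the transformed data."""
    w = [1.0 / math.sqrt(hi) for hi in h]        # weights 1/sqrt(h(x_i))
    c_t = w                                      # transformed constant term
    x_t = [wi * xi for wi, xi in zip(w, x)]      # transformed regressor
    y_t = [wi * yi for wi, yi in zip(w, y)]      # transformed dependent variable
    # Normal equations for OLS of y_t on (c_t, x_t), with no additional intercept.
    s_cc = sum(c * c for c in c_t)
    s_cx = sum(c * xv for c, xv in zip(c_t, x_t))
    s_xx = sum(xv * xv for xv in x_t)
    s_cy = sum(c * yv for c, yv in zip(c_t, y_t))
    s_xy = sum(xv * yv for xv, yv in zip(x_t, y_t))
    det = s_cc * s_xx - s_cx ** 2
    b0 = (s_xx * s_cy - s_cx * s_xy) / det
    b1 = (s_cc * s_xy - s_cx * s_cy) / det
    return b0, b1

# Hypothetical example in which h(x) = x^2 is treated as known.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.8, 7.3, 8.6]
h = [xi ** 2 for xi in x]
b0, b1 = wls_simple(x, y, h)
print(b0, b1)
```

Setting every h(x_i) to 1 reproduces ordinary OLS, which is a quick way to sanity-check the transformation.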
Example 1—continued
Remarks
2. In this case, log income becomes significant (t value = 2.96) after we apply WLS
3. This finding is consistent with the fact that WLS can be more efficient (have smaller
standard errors) than OLS in the presence of heteroskedasticity