Heteroscedasticity

An important assumption of the classical linear regression model is that the disturbances ui are homoscedastic, that is, they all have the same variance. When the conditional variance of ui differs across observations, the disturbances are said to be heteroscedastic.
Note the subscript on σ², which indicates that the conditional variances of ui (= conditional
variances of Yi) are no longer constant.
To make the difference between homoscedasticity and heteroscedasticity clear, assume
that in the two-variable model
Yi = β1 + β2Xi + ui,
where Y represents savings and X represents income. Figures 1 and 2 show that as income
increases, savings on the average also increase. But in Figure 1 the variance of savings remains the
same at all levels of income, whereas in Fig. 2 it increases with income; that is, higher-income
families on average save more than lower-income families, but there is also more variability in
their savings.
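The contrast can be made concrete with a small simulation sketch (hypothetical numbers, not the data behind Figures 1 and 2): savings are drawn around the same regression line twice, once with a constant error standard deviation and once with one that grows with income.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
income = rng.uniform(10, 100, n)             # X: income (hypothetical units)

# Homoscedastic errors: the same sigma at every income level
u_homo = rng.normal(0, 2.0, n)
# Heteroscedastic errors: sigma grows with income
u_het = rng.normal(0, 0.05 * income, n)

savings_homo = 1.0 + 0.2 * income + u_homo
savings_het = 1.0 + 0.2 * income + u_het

# Compare the error spread among low- vs. high-income families
low, high = income < 40, income > 70
print(u_homo[low].std(), u_homo[high].std())   # roughly equal spread
print(u_het[low].std(), u_het[high].std())     # spread grows with income
```

The second pair of numbers reproduces the pattern of Fig. 2: the scatter of savings widens as income rises, even though the regression line itself is unchanged.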
Reasons for varying variances of ui
1. Following error-learning models: as people learn, their errors of behavior become
smaller over time, so σi² is expected to decrease. Ex: Fig. 3, which relates the number of typing
errors made in a given time period on a test to the hours put into typing practice. As the number
of hours of typing practice increases, the average number of typing errors and their variance
both decrease.
AEC 507/GMG/ Heteroscedasticity 1
Fig. 3: Illustration of heteroscedasticity
2. As incomes grow, people have more discretionary income and hence more scope for choice
about the disposition of their income. Hence, σi2 is likely to increase with income. Thus in the
regression of savings on income one is likely to find σi2 increasing with income because people
have more choices about their savings behavior. Similarly, companies with larger profits are
generally expected to show greater variability in their dividend policies than companies with
lower profits. Also, growth-oriented companies are likely to show more variability in their
dividend payout ratios than established companies.
3. As data collecting techniques improve, σi2 is likely to decrease. Thus, banks that have
sophisticated data processing equipment are likely to commit fewer errors in the monthly or
quarterly statements of their customers than banks without such facilities.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying
observation, or outlier, is an observation that is much different (either very small or very large)
in relation to the other observations in the sample. More precisely, an outlier is an observation
drawn from a different population than that generating the remaining sample observations. The inclusion or
exclusion of such an observation, especially if the sample size is small, can substantially alter
the results of regression analysis.
Fig.4: The relationship between stock prices and consumer prices
5. Another source of heteroscedasticity arises from violating Assumption 9 of CLRM, viz., that
the regression model is correctly specified. Heteroscedasticity may be due to the fact that some
important variables are omitted from the model. Thus, in the demand function for a
commodity, if we do not include the prices of commodities complementary to or competing
with the commodity in question (the omitted variable bias), the residuals obtained from the
regression may give the distinct impression that the error variance may not be constant. But if
the omitted variables are included in the model, that impression may disappear.
Ex: advertising impressions retained (Y) in relation to advertising expenditure (X). If we
regress Y on X only and examine the residuals from this regression, we see one pattern; but if
we regress Y on X and X², we see another pattern, as can be seen clearly from Fig. 5. We
have already seen that X² belongs in the model.
Fig.5: Residuals from the regression of (a) impressions on advertising expenditure and (b)
impressions on Adexp and Adexp².
6. Another source of heteroscedasticity is skewness in the distribution of one or more regressors
included in the model. Examples are economic variables such as income, wealth, and
education. It is well known that the distribution of income and wealth in most societies is
uneven, with the bulk of the income and wealth being owned by a few at the top.
7. Other sources of heteroscedasticity: As David Hendry notes, heteroscedasticity can also arise
because of (1) incorrect data transformation (e.g., ratio or first difference transformations) and
(2) incorrect functional form (e.g., linear versus log–linear models).
Note that the problem of heteroscedasticity is likely to be more common in cross-sectional
than in time series data. In cross-sectional data, one usually deals with members of a population at
a given point in time, such as individual consumers or their families, firms, industries, or
geographical subdivisions such as state, country, city, etc. Moreover, these members may be of
different sizes, such as small, medium, or large firms or low, medium, or high income. In time
series data, on the other hand, the variables tend to be of similar orders of magnitude because one
generally collects the data for the same entity over a period of time. Examples are GNP,
consumption expenditure, savings, or employment in the United States, say, for the period 1950 to
2000.
The following table gives data on compensation per employee in 10 nondurable goods
manufacturing industries, classified by the employment size of the firm or the establishment for
the year 1958. Also given in the table are average productivity figures for nine employment
classes. Although the industries differ in their output composition, Table shows clearly that on the
average large firms pay more than the small firms. As an example, firms employing one to four
employees paid on the average about $3396, whereas those employing 1000 to 2499 employees on
the average paid about $4843. But notice that there is considerable variability in earnings among the
various employment classes, as indicated by the estimated standard deviations of earnings. This can
be seen also from Fig. 6, which plots the standard deviation of compensation and average
compensation in each employment class. Thus it can be seen clearly that on average, the standard
deviation of compensation increases with the average value of compensation.
Table-1: Compensation per employee ($) in nondurable manufacturing industries according
to employment size of establishment, 1958
Fig.6: Standard deviation of compensation and mean compensation
OLS Estimation in the Presence of Heteroscedasticity
What happens to OLS estimators and their variances if we introduce heteroscedasticity by
letting E(ui2 ) = σi2 but retain all other assumptions of the classical model? To answer this question,
let us revert to the two-variable model:

Yi = β1 + β2Xi + ui

Applying the usual formula, the OLS estimator of β2 is

β̂2 = Σxiyi / Σxi²

where xi = Xi − X̄ and yi = Yi − Ȳ, but its variance is now given by

var(β̂2) = Σxi²σi² / (Σxi²)²

which is obviously different from the usual variance formula obtained under the assumption of
homoscedasticity, namely,

var(β̂2) = σ² / Σxi²
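The claim that β̂2 remains unbiased under heteroscedasticity while its variance follows the heteroscedastic formula Σxi²σi²/(Σxi²)² can be checked with a small Monte Carlo sketch (hypothetical data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 30, 20000
X = np.linspace(1, 10, n)
sigma_i = 0.3 * X                          # error s.d. grows with X
x = X - X.mean()                           # deviations from the mean

# Analytic variance of the OLS slope under heteroscedasticity:
# var(b2) = sum(x_i^2 * sigma_i^2) / (sum x_i^2)^2
var_analytic = (x**2 * sigma_i**2).sum() / (x**2).sum()**2

# Monte Carlo: draw many samples, estimate the slope by OLS each time
u = rng.normal(0.0, sigma_i, size=(reps, n))
Y = 1.0 + 2.0 * X + u
y_dev = Y - Y.mean(axis=1, keepdims=True)
b2 = (y_dev * x).sum(axis=1) / (x**2).sum()

print(b2.mean())                  # close to the true slope 2: still unbiased
print(b2.var(), var_analytic)     # the simulated and analytic variances agree
```

The mean of the simulated slopes stays at the true value, while their dispersion matches the heteroscedastic variance formula rather than the homoscedastic one.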
Moreover, it can be shown that, under certain conditions (called regularity conditions), β̂2 is
asymptotically normally distributed. Of course, what we have said about β̂2 also holds true for
other parameters of a multiple regression model. Granted that β̂2 is still linear, unbiased, and
consistent, is it "efficient" or "best"; that is, does it have minimum variance in the class of
unbiased estimators? The answer is no: β̂2 is no longer best. Then what is BLUE in the presence
of heteroscedasticity?
The answer is given in the following section.
The Method of Generalized Least Squares (GLS)
Why is the usual OLS estimator of β2 given above not best, although it is still unbiased?
Intuitively, we can see the reason from Table 1 above. As the table shows, there is considerable
variability in the earnings between employment classes. If we were to regress per-employee
compensation on the size of employment, we would like to make use of the knowledge that there is
considerable interclass variability in earnings. Ideally, we would like to devise the estimating
scheme in such a manner that observations coming from populations with greater variability are
given less weight than those coming from populations with smaller variability. Examining Table 1,
we would like to weight observations coming from employment classes 10–19 and 20–49 more
heavily than those coming from employment classes like 5–9 and 250–499, for the former are
more closely clustered around their mean values than the latter, thereby enabling us to estimate the
PRF more accurately.
Unfortunately, the usual OLS method does not follow this strategy and therefore does not
make use of the “information” contained in the unequal variability of the dependent variable Y,
say, employee compensation of Table 1: It assigns equal weight or importance to each observation.
But a method of estimation, known as generalized least squares (GLS), takes such information
into account explicitly and is therefore capable of producing estimators that are BLUE. To see
this, let us consider the two-variable model Yi = β1 + β2Xi + ui and assume that the
heteroscedastic variances σi² are known. Dividing both sides of the model by σi, we obtain

Yi/σi = β1(1/σi) + β2(Xi/σi) + (ui/σi)

which, for ease of exposition, we write as

Yi* = β1*X0i* + β2*Xi* + ui*   (with X0i = 1 for each i)
where the starred, or transformed, variables are the original variables divided by (the known) σi.
We use the notation β*1 and β*2, the parameters of the transformed model, to distinguish them from
the usual OLS parameters β1 and β2. What is the purpose of transforming the original model? To
see this, notice the following feature of the transformed error term ui*:

var(ui*) = E(ui*²) = E[(ui/σi)²] = (1/σi²)E(ui²) = σi²/σi² = 1

which is a constant; that is, the variance of the transformed disturbance term ui* is now
homoscedastic. Given the other assumptions of the classical model, the finding that ui* is
homoscedastic suggests that if we apply OLS to the transformed model it will produce estimators
that are BLUE. In short, the estimated β1* and β2* are now BLUE, not the OLS estimators β̂1
and β̂2.
The procedure of transforming the original variables in such a way that the transformed
variables satisfy the assumptions of the classical model and then applying OLS to them is known
as the method of Generalized Least Squares (GLS). In short, GLS is OLS on the transformed
variables that satisfy the standard least-squares assumptions. The estimators thus obtained are
known as GLS estimators, and it is these estimators that are BLUE.
Mechanics of estimating β1* and β2*

First, write down the SRF of the transformed model:

Yi/σi = β̂1*(1/σi) + β̂2*(Xi/σi) + (ûi/σi)

i.e., Yi* = β̂1*X0i* + β̂2*Xi* + ûi*

The difference between OLS and GLS lies in what is minimized. In OLS we minimize

Σûi² = Σ(Yi − β̂1 − β̂2Xi)²

an unweighted or (what amounts to the same thing) equally weighted RSS. In GLS we minimize a
weighted sum of residual squares,

Σwiûi² = Σwi(Yi − β̂1* − β̂2*Xi)²

with wi = 1/σi² acting as the weights.
To see the difference between OLS and GLS clearly, consider the following hypothetical
scattergram (Fig. 7).
Fig.7: Hypothetical scattergram
In the (unweighted) OLS, each ûi2 associated with points A, B, and C will receive the same
weight in minimizing the RSS. Obviously, in this case the ûi2 associated with point C will
dominate the RSS. But in GLS the extreme observation C will get relatively smaller weight than
the other two observations. As noted earlier, this is the right strategy, for in estimating the
population regression function (PRF) more reliably we would like to give more weight to
observations that are closely clustered around their (population) mean than to those that are widely
scattered about.
Since the above procedure minimizes a weighted RSS, it is appropriately known as weighted
least squares (WLS), and the estimators thus obtained are known as WLS estimators. But WLS
is just a special case of the more general estimating technique, GLS. In the context of
heteroscedasticity, one can treat the two terms WLS and GLS interchangeably.
Note that if wi = w, a constant for all i, β̂2* is identical with β̂2 and var(β̂2*) is identical
with the usual (i.e., homoscedastic) var(β̂2), which should not be surprising.
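The equivalence between "OLS on the σi-divided variables" and explicit WLS with weights wi = 1/σi² can be sketched numerically (made-up sample in which σi is assumed known, as the transformation requires):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.uniform(1, 10, n)
sigma_i = 0.5 * X                        # known error s.d. (assumed for illustration)
y = 1.0 + 2.0 * X + rng.normal(0, sigma_i)

# GLS: divide every variable (including the intercept column) by sigma_i,
# then apply ordinary least squares to the transformed data
A = np.column_stack([np.ones(n), X])
A_star = A / sigma_i[:, None]
y_star = y / sigma_i
beta_gls, *_ = np.linalg.lstsq(A_star, y_star, rcond=None)

# Equivalent WLS: solve the weighted normal equations with w_i = 1/sigma_i^2
w = 1.0 / sigma_i**2
beta_wls = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))

print(beta_gls)   # close to the true (1, 2)
print(beta_wls)   # numerically identical to beta_gls
```

The two routes solve the same normal equations, which is why the WLS and GLS labels can be used interchangeably in the heteroscedasticity context.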
Detection of Heteroscedasticity
Informal Methods
Nature of the Problem
In their study of family budgets, Prais and Houthakker found that the residual variance
around the regression of consumption on income increased with income. One now generally
assumes that in similar surveys one can expect unequal variances among the
disturbances. As a matter of fact, in cross-sectional data involving heterogeneous units,
heteroscedasticity may be the rule rather than the exception.
Graphical Method
Do the regression analysis on the assumption that there is no heteroscedasticity and then
do a postmortem examination of the squared residuals ûi² to see if they exhibit any systematic
pattern. Although the ûi² are not the same thing as the ui², they can be used as their proxies,
especially if the sample size is sufficiently large. An examination of the ûi² may reveal patterns
such as those shown in Fig.8.
shown in Fig.8.
Fig.8: Hypothetical patterns of estimated squared residuals.
In the above figure ûi2 are plotted against Ŷi, the estimated Yi from the regression line, the
idea being to find out whether the estimated mean value of Y is systematically related to the
squared residual.
In Figure (a) above there is no systematic pattern between the two variables, suggesting
that perhaps no heteroscedasticity is present in the data. Figures (b) to (e), however, exhibit
definite patterns. For instance, Figure (c) suggests a linear relationship, whereas Figures (d)
and (e) indicate a quadratic relationship between ûi² and Ŷi. Using such knowledge,
albeit informal, one may transform the data in such a manner that the transformed data do not
exhibit heteroscedasticity.
Instead of plotting ûi² against Ŷi, one may plot them against one of the explanatory
variables, especially if plotting ûi² against Ŷi results in a pattern similar to that in Figure (a). In
the case of the two-variable model, plotting ûi² against Ŷi is equivalent to plotting it against Xi.
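As a numerical complement to the plots, one can fit OLS to a (hypothetical) heteroscedastic sample and check whether ûi² co-moves with Ŷi; a clearly nonzero correlation plays the role of the systematic patterns in Figures (b) to (e):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
X = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * X + rng.normal(0, 0.4 * X)   # heteroscedastic by construction

# Fit OLS, then examine the squared residuals against the fitted values
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
u_sq = (y - fitted)**2

# A systematic pattern shows up as correlation between u_hat^2 and Y_hat
r = np.corrcoef(fitted, u_sq)[0, 1]
print(r)   # clearly positive here: the spread rises with the fitted values
```

This is only an informal check, like the plots themselves; the formal tests below put such patterns on a statistical footing.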
Formal Methods
Park Test
Park formalizes the graphical method by suggesting that σi² is some function of the explanatory
variable Xi. The functional form he suggested was

σi² = σ² Xi^β e^(vi)

or, in log form,

ln σi² = ln σ² + β ln Xi + vi

Since σi² is generally not known, Park suggests using ûi² as a proxy and running the regression

ln ûi² = ln σ² + β ln Xi + vi = α + β ln Xi + vi
First stage: run the OLS regression disregarding the heteroscedasticity question and
obtain ûi.
Second stage: run the regression of ln ûi² on ln Xi. If β turns out to be statistically
significant, it suggests that heteroscedasticity is present in the data; if not, we may accept the
assumption of homoscedasticity.
Although empirically appealing, Goldfeld and Quandt have argued that the error term vi may
not satisfy the OLS assumptions and may itself be heteroscedastic.
Illustration of Park approach
From data given in Table 1 run the following regression:
Yi = β1 + β2Xi + ui
where Y = average compensation in thousands of dollars, X = average productivity in thousands of
dollars, and i = i th employment size of the establishment. The results of the regression were as
follows:
The results reveal that the estimated slope coefficient is significant at the 5 percent level on the
basis of a one-tail t test. The equation shows that as labor productivity increases by, say, a dollar,
labor compensation on the average increases by about 23 cents.
The residuals obtained from this regression were then regressed on Xi as suggested, giving the following
results:
Inference: There is no statistically significant relationship between the two variables. Following
the Park test, one may conclude that there is no heteroscedasticity in the error variance.
Glejser Test
The Glejser test is similar in spirit to the Park test. After obtaining the residuals ˆui from
the OLS regression, Glejser suggests regressing the absolute values of ˆui on the X variable that is
thought to be closely associated with σi². In his experiments, Glejser used the following functional forms:

|ûi| = β1 + β2Xi + vi
|ûi| = β1 + β2√Xi + vi
|ûi| = β1 + β2(1/Xi) + vi
|ûi| = β1 + β2(1/√Xi) + vi
|ûi| = √(β1 + β2Xi) + vi
|ûi| = √(β1 + β2Xi²) + vi

where vi is the error term.
Continuing with the previous example, the absolute values of the residuals obtained from
the regression were regressed on average productivity (X), giving the following results:
Inference: Regression results revealed that there is no relationship between the absolute value of
the residuals and the regressor, average productivity. This reinforces the conclusion based on the
Park test.
Glejser has found that for large samples the first four of the preceding models give
generally satisfactory results in detecting heteroscedasticity. As a practical matter, therefore,
the Glejser technique may be used in large samples; in small samples it should be used strictly
as a qualitative device to learn something about heteroscedasticity.
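The first of those forms, |ûi| = β1 + β2Xi + vi, can be sketched the same way as the Park test (simulated data with heteroscedasticity built in, numpy only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * X + rng.normal(0, 0.5 * X)   # sigma_i proportional to X

# First, obtain the OLS residuals
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
abs_u = np.abs(y - A @ beta)

# Glejser: regress |u_hat| on X and t-test the slope
g, *_ = np.linalg.lstsq(A, abs_u, rcond=None)
resid = abs_u - A @ g
s2 = (resid**2).sum() / (n - 2)
se = np.sqrt(s2 * np.linalg.inv(A.T @ A)[1, 1])
t_stat = g[1] / se
print(g[1], t_stat)   # positive slope, large t: heteroscedasticity detected
```

A significant positive slope says that the typical size of the residuals grows with X, which is exactly what the Glejser test looks for.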