

Heteroscedasticity

An important assumption of the classical linear regression model (Assumption 4) is that the disturbances ui appearing in the population regression function are homoscedastic; that is, they all have the same variance. In the following sections, let us try to understand the nature of heteroscedasticity, its consequences, how to detect it, and the different remedial measures.
Nature of Heteroscedasticity
One of the important assumptions of the classical linear regression model is that the variance of each disturbance term ui, conditional on the chosen values of the explanatory variables, is some constant number equal to σ². This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Symbolically,
var(ui | Xi) = E(ui²) = σ²,   i = 1, 2, . . . , n
Diagrammatically, in the two-variable regression model homoscedasticity can be shown as

Fig.-1: Homoscedastic disturbances Fig.-2: Heteroscedastic disturbances


As Fig. 1 shows, the conditional variance of Yi (which is equal to that of ui), conditional upon the
given Xi, remains the same regardless of the values taken by the variable X. In contrast, Fig. 2
shows that the conditional variance of Yi increases as X increases. Here, the variances of Yi are not
the same. Hence, there is heteroscedasticity. Symbolically,
var(ui | Xi) = E(ui²) = σi²
Note the subscript on σ², which indicates that the conditional variances of ui (= conditional
variances of Yi) are no longer constant.
To make the difference between homoscedasticity and heteroscedasticity clear, assume
that in the two-variable model
Yi = β1 + β2Xi + ui,
where Y represents savings and X represents income. Figures 1 and 2 show that as income
increases, savings on the average also increase. But in Figure 1 the variance of savings remains the
same at all levels of income, whereas in Fig. 2 it increases with income, i.e., the higher income
families on the average save more than the lower-income families, but there is also more
variability in their savings.
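The contrast between Figures 1 and 2 can be reproduced in a small simulation. This is only an illustrative sketch, not from the text: the income range, coefficients, and error spreads below are all made-up numbers.

```python
# Sketch: simulate the savings-income relationship of Figures 1 and 2.
# All numeric choices here are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(42)
n = 500
income = rng.uniform(10, 100, n)            # hypothetical income levels

# Homoscedastic case: error spread does not depend on income.
u_homo = rng.normal(0, 5, n)
savings_homo = 2 + 0.3 * income + u_homo

# Heteroscedastic case: error spread grows with income.
u_hetero = rng.normal(0, 0.1 * income)      # sd proportional to income
savings_hetero = 2 + 0.3 * income + u_hetero

# Compare residual spread in the lower and upper parts of the income range.
low, high = income < 40, income > 70
def spread(u, mask):
    return u[mask].std()

print(spread(u_homo, low), spread(u_homo, high))      # roughly similar
print(spread(u_hetero, low), spread(u_hetero, high))  # second is larger
```

The homoscedastic errors show about the same spread at every income level, while the heteroscedastic errors fan out as income rises, exactly the pattern Fig. 2 depicts.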
Reasons for varying variances of ui
1. Error-learning models: as people learn, their errors of behavior become smaller over time. In this case, σi² is expected to decrease. Ex: Fig. 3 relates the number of typing errors made in a given time period on a test to the hours put into typing practice. As the number of hours of typing practice increases, the average number of typing errors as well as their variance decreases.

AEC 507/GMG/  Heteroscedasticity  1 
 
Fig. 3: Illustration of heteroscedasticity
2. As incomes grow, people have more discretionary income and hence more scope for choice
about the disposition of their income. Hence, σi² is likely to increase with income. Thus in the
regression of savings on income one is likely to find σi² increasing with income because people
have more choices about their savings behavior. Similarly, companies with larger profits are
generally expected to show greater variability in their dividend policies than companies with
lower profits. Also, growth oriented companies are likely to show more variability in their
dividend payout ratio than established companies.
3. As data collecting techniques improve, σi² is likely to decrease. Thus, banks that have
sophisticated data processing equipment are likely to commit fewer errors in the monthly or
quarterly statements of their customers than banks without such facilities.
4. Heteroscedasticity can also arise as a result of the presence of outliers. An outlying
observation, or outlier, is an observation that is much different (either very small or very large)
in relation to the observations in the sample. More precisely, an outlier is an observation from
a different population to that generating the remaining sample observations. The inclusion or
exclusion of such an observation, especially if the sample size is small, can substantially alter
the results of regression analysis.

Fig.4: The relationship between stock prices and consumer prices
5. Another source of heteroscedasticity arises from violating Assumption 9 of CLRM, viz., that
the regression model is correctly specified. Heteroscedasticity may be due to the fact that some
important variables are omitted from the model. Thus, in the demand function for a
commodity, if we do not include the prices of commodities complementary to or competing
with the commodity in question (the omitted variable bias), the residuals obtained from the
regression may give the distinct impression that the error variance may not be constant. But if
the omitted variables are included in the model, that impression may disappear.
Ex: advertising impressions retained (Y) in relation to advertising expenditure (X). If we regress Y on X only and observe the residuals from this regression, we will see one pattern, but if we regress Y on X and X², we will see another pattern, as can be seen clearly from Fig. 5. We have already seen that X² belongs in the model.

Fig.5: Residuals from the regression of (a) impressions on advertising expenditure and (b)
impressions on Adexp and Adexp².
6. Another source of heteroscedasticity is skewness in the distribution of one or more regressors
included in the model. Examples are economic variables such as income, wealth, and
education. It is well known that the distribution of income and wealth in most societies is
uneven, with the bulk of the income and wealth being owned by a few at the top.

7. Other sources of heteroscedasticity: As David Hendry notes, heteroscedasticity can also arise
because of (1) incorrect data transformation (e.g., ratio or first difference transformations) and
(2) incorrect functional form (e.g., linear versus log–linear models).
Note that the problem of heteroscedasticity is likely to be more common in cross-sectional
than in time series data. In cross-sectional data, one usually deals with members of a population at
a given point in time, such as individual consumers or their families, firms, industries, or
geographical subdivisions such as state, country, city, etc. Moreover, these members may be of
different sizes, such as small, medium, or large firms or low, medium, or high income. In time
series data, on the other hand, the variables tend to be of similar orders of magnitude because one
generally collects the data for the same entity over a period of time. Examples are GNP,
consumption expenditure, savings, or employment in the United States, say, for the period 1950 to
2000.
The following table gives data on compensation per employee in 10 nondurable goods manufacturing industries, classified by the employment size of the firm or establishment, for the year 1958. Also given in the table are average productivity figures for nine employment classes. Although the industries differ in their output composition, the table shows clearly that on average large firms pay more than small firms. For example, firms employing one to four employees paid on average about $3396, whereas those employing 1000 to 2499 employees paid on average about $4843. But notice that there is considerable variability in earnings among the various employment classes, as indicated by the estimated standard deviations of earnings. This can
various employment classes as indicated by the estimated standard deviations of earnings. This can
be seen also from Fig. 6, which plots the standard deviation of compensation and average
compensation in each employment class. Thus it can be seen clearly that on average, the standard
deviation of compensation increases with the average value of compensation.
Table-1: Compensation per employee ($) in nondurable manufacturing industries according
to Employment size of establishment, 1958

Fig.6: Standard deviation of compensation and mean compensation
OLS Estimation in the Presence of Heteroscedasticity
What happens to OLS estimators and their variances if we introduce heteroscedasticity by letting E(ui²) = σi² but retain all other assumptions of the classical model? To answer this question, let us revert to the two-variable model:

Yi = β1 + β2Xi + ui

Applying the usual formula, the OLS estimator of β2 is still

ˆβ2 = Σxiyi / Σxi²

where xi = Xi − X̄ and yi = Yi − Ȳ are deviations from the sample means,
but its variance is now given by the following expression:

var(ˆβ2) = Σxi²σi² / (Σxi²)²

which is obviously different from the usual variance formula obtained under the assumption of homoscedasticity, namely,

var(ˆβ2) = σ² / Σxi²

Of course, if σi² = σ² for each i, the two formulas will be identical.
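The collapse of the heteroscedastic variance formula to the usual one when σi² = σ² can be checked numerically. This is a sketch with made-up numbers (the X values and σ² below are arbitrary):

```python
# Numerical check: with sigma_i^2 = sigma^2 for every i, the formula
# sum(x_i^2 sigma_i^2) / (sum x_i^2)^2 equals sigma^2 / sum(x_i^2).
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
x = X - X.mean()                      # deviations from the mean
sigma2 = 1.5                          # common error variance (assumed)
sigma2_i = np.full_like(X, sigma2)    # sigma_i^2 = sigma^2 for each i

var_hetero = (x**2 * sigma2_i).sum() / (x**2).sum() ** 2
var_homo = sigma2 / (x**2).sum()

print(var_hetero, var_homo)           # the two values coincide
```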


As a matter of fact, to establish the unbiasedness of ˆβ2 it is not necessary that the
disturbances (ui) be homoscedastic. In fact, the variance of ui, homoscedastic or heteroscedastic,
plays no part in the determination of the unbiasedness property.
ˆβ2 is a consistent estimator despite heteroscedasticity; that is, as the sample size increases
indefinitely, the estimated β2 converges to its true value. Furthermore, it can also be shown that

under certain conditions (called regularity conditions), ˆβ2 is asymptotically normally distributed.
Of course, what we have said about ˆβ2 also holds true of other parameters of a multiple regression
model. Granted that ˆβ2 is still linear unbiased and consistent, is it “efficient” or “best”; that is,
does it have minimum variance in the class of unbiased estimators? The answer is no: ˆβ2 is no
longer best. Then what is BLUE in the presence of heteroscedasticity?
The answer is given in the following section.
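The two claims above, that ˆβ2 stays unbiased but loses its minimum-variance property under heteroscedasticity, can be illustrated with a small Monte Carlo sketch. The setup below (the grid of X values, the coefficients, and an error sd proportional to X) is an assumption for illustration; the weighted estimator it compares against anticipates the GLS idea of the next section.

```python
# Monte Carlo sketch: OLS remains unbiased under heteroscedasticity,
# but weighting each observation by 1/sigma_i^2 yields a smaller variance.
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(1, 10, 50)
beta1, beta2 = 1.0, 2.0
sigma_i = 0.5 * X                     # assumed-known, increasing error sd

ols_slopes, wls_slopes = [], []
for _ in range(2000):
    y = beta1 + beta2 * X + rng.normal(0, sigma_i)
    # OLS slope from deviations.
    x, yd = X - X.mean(), y - y.mean()
    ols_slopes.append((x * yd).sum() / (x * x).sum())
    # Weighted slope with w_i = 1/sigma_i^2 (weighted means, then slope).
    w = 1 / sigma_i**2
    xb = (w * X).sum() / w.sum()
    yb = (w * y).sum() / w.sum()
    wls_slopes.append((w * (X - xb) * (y - yb)).sum() / (w * (X - xb) ** 2).sum())

print(np.mean(ols_slopes), np.mean(wls_slopes))  # both close to the true 2
print(np.var(ols_slopes), np.var(wls_slopes))    # OLS variance is the larger
```

Both estimators center on the true slope, confirming unbiasedness, but the spread of the OLS slopes across replications is visibly larger: OLS is no longer best.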
The Method of Generalized Least Squares (GLS)
Why is the usual OLS estimator of β2 given above not best, although it is still unbiased?
Intuitively, we can see the reason from Table 1 above. As the table shows, there is considerable
variability in the earnings between employment classes. If we were to regress per-employee
compensation on the size of employment, we would like to make use of the knowledge that there is
considerable interclass variability in earnings. Ideally, we would like to devise the estimating
scheme in such a manner that observations coming from populations with greater variability are
given less weight than those coming from populations with smaller variability. Examining Table 1,
we would like to weight observations coming from employment classes 10–19 and 20–49 more
heavily than those coming from employment classes like 5–9 and 250–499, for the former are
more closely clustered around their mean values than the latter, thereby enabling us to estimate the
PRF more accurately.
Unfortunately, the usual OLS method does not follow this strategy and therefore does not
make use of the “information” contained in the unequal variability of the dependent variable Y,
say, employee compensation of Table 1: It assigns equal weight or importance to each observation.
But a method of estimation, known as generalized least squares (GLS), takes such information
into account explicitly and is therefore capable of producing estimators that are BLUE. To see how, let us consider the two-variable model:

Yi = β1 + β2Xi + ui
which for ease of algebraic manipulation we write as
Yi = β1X0i + β2Xi + ui
where X0i = 1 for each i.


Now assume that the heteroscedastic variances σi² are known. Dividing the model through by σi, we obtain

Yi/σi = β1(X0i/σi) + β2(Xi/σi) + (ui/σi)

which for ease of exposition we write as

Yi* = β*1X0i* + β*2Xi* + ui*
where the starred, or transformed, variables are the original variables divided by (the known) σi.
We use the notation β*1 and β*2, the parameters of the transformed model, to distinguish them from
the usual OLS parameters β1 and β2. What is the purpose of transforming the original model? To
see this, notice the following feature of the transformed error term ui*:
var(ui*) = E(ui*²) = E(ui/σi)² = (1/σi²)E(ui²) = σi²/σi² = 1
which is a constant; that is, the variance of the transformed disturbance term ui* is now homoscedastic. With the other assumptions of the classical model retained, the finding that ui* is homoscedastic suggests that if we apply OLS to the transformed model, it will produce estimators that are BLUE. In short, the estimated β*1 and β*2 are now BLUE, and not the OLS estimators ˆβ1 and ˆβ2.
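That dividing through by σi produces a homoscedastic disturbance with unit variance can be seen in a quick simulation. The σi values below are arbitrary; as in the text, they are assumed known.

```python
# Sketch: the transformed disturbance u_i* = u_i / sigma_i has variance 1
# regardless of how much the original sigma_i vary across observations.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
sigma_i = rng.uniform(0.5, 5.0, n)    # heteroscedastic sds, assumed known
u = rng.normal(0, sigma_i)            # var(u_i) = sigma_i^2 varies with i
u_star = u / sigma_i                  # transformed disturbance

print(u.var(), u_star.var())          # u varies widely; u* is close to 1
```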
The procedure of transforming the original variables in such a way that the transformed
variables satisfy the assumptions of the classical model and then applying OLS to them is known
as the method of Generalized Least Squares (GLS). In short, GLS is OLS on the transformed
variables that satisfy the standard least-squares assumptions. The estimators thus obtained are
known as GLS estimators, and it is these estimators that are BLUE.
Mechanics of estimating β*1 and β*2
First, write down the SRF of the transformed model:

Yi* = ˆβ*1X0i* + ˆβ*2Xi* + ûi*

To obtain the GLS estimators, minimize

Σûi*² = Σ(Yi* − ˆβ*1X0i* − ˆβ*2Xi*)²

i.e.,

Σwi(Yi − ˆβ*1 − ˆβ*2Xi)²

The GLS estimator of β*2 and its variance are then given by

ˆβ*2 = [(Σwi)(ΣwiXiYi) − (ΣwiXi)(ΣwiYi)] / [(Σwi)(ΣwiXi²) − (ΣwiXi)²]

var(ˆβ*2) = Σwi / [(Σwi)(ΣwiXi²) − (ΣwiXi)²]

where wi = 1/σi².
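The closed-form GLS slope above can be cross-checked against OLS on the transformed (starred) variables, which is how GLS was motivated. This is a sketch on made-up data with assumed-known σi:

```python
# Sketch: closed-form GLS slope with w_i = 1/sigma_i^2, checked against
# OLS on the starred variables X0i* = 1/sigma_i and Xi* = X_i/sigma_i.
import numpy as np

rng = np.random.default_rng(7)
X = np.linspace(1, 10, 40)
sigma_i = 0.3 * X                              # assumed-known error sds
y = 1 + 2 * X + rng.normal(0, sigma_i)

w = 1 / sigma_i**2
sw, swx, swy = w.sum(), (w * X).sum(), (w * y).sum()
swxy, swx2 = (w * X * y).sum(), (w * X**2).sum()

beta2_gls = (sw * swxy - swx * swy) / (sw * swx2 - swx**2)
var_beta2_gls = sw / (sw * swx2 - swx**2)

# Cross-check: OLS on the variables divided by sigma_i (two regressors,
# no separate intercept, since X0i* = 1/sigma_i plays that role).
A = np.column_stack([1 / sigma_i, X / sigma_i])
coef, *_ = np.linalg.lstsq(A, y / sigma_i, rcond=None)
print(beta2_gls, coef[1])                      # the two slopes agree
```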

Difference between OLS and GLS

In OLS we minimize an unweighted (or, what amounts to the same thing, equally weighted) residual sum of squares, whereas in GLS we minimize a weighted sum of squared residuals, with wi = 1/σi² acting as the weights.

To see the difference between OLS and GLS clearly, consider the following hypothetical scattergram (Fig. 7).

Fig.7: Hypothetical scattergram
In the (unweighted) OLS, each ûi2 associated with points A, B, and C will receive the same
weight in minimizing the RSS. Obviously, in this case the ûi2 associated with point C will
dominate the RSS. But in GLS the extreme observation C will get relatively smaller weight than
the other two observations. As noted earlier, this is the right strategy, for in estimating the
population regression function (PRF) more reliably we would like to give more weight to
observations that are closely clustered around their (population) mean than to those that are widely
scattered about.
Since the above procedure minimizes a weighted RSS, it is appropriately known as weighted
least squares (WLS), and the estimators thus obtained are known as WLS estimators. But WLS
is just a special case of the more general estimating technique, GLS. In the context of
heteroscedasticity, one can treat the two terms WLS and GLS interchangeably.
Note that if wi = w, a constant for all i, ˆβ*2 is identical with ˆβ2 and var (ˆβ*2) is identical
with the usual (i.e., homoscedastic) var (ˆβ2), which should not be surprising.
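This reduction of WLS to OLS under a constant weight can be verified numerically. A minimal sketch on made-up data; the helper `wls_slope` and the constant 7.3 are our own illustrative choices:

```python
# Sketch: with w_i = w (any constant), the WLS slope equals the OLS slope,
# because the common weight cancels in numerator and denominator.
import numpy as np

rng = np.random.default_rng(1)
X = np.linspace(1, 10, 30)
y = 1 + 2 * X + rng.normal(0, 1, 30)

def wls_slope(X, y, w):
    """Weighted least-squares slope with weights w."""
    xb, yb = (w * X).sum() / w.sum(), (w * y).sum() / w.sum()
    return (w * (X - xb) * (y - yb)).sum() / (w * (X - xb) ** 2).sum()

ols = wls_slope(X, y, np.ones_like(X))             # equal weights: plain OLS
wls_const = wls_slope(X, y, np.full_like(X, 7.3))  # w_i = w, any constant
print(ols, wls_const)                              # identical slopes
```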
Detection of Heteroscedasticity
Informal Methods
Nature of the Problem
Since Prais and Houthakker, in their study of family budgets, found that the residual variance around the regression of consumption on income increased with income, one now generally assumes that in similar surveys one can expect unequal variances among the disturbances. As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception.
Graphical Method
Do the regression analysis on the assumption that there is no heteroscedasticity and then do a post-mortem examination of the squared residuals ûi² to see if they exhibit any systematic pattern. Although the ûi² are not the same thing as the ui², they can be used as proxies, especially if the sample size is sufficiently large. An examination of the ûi² may reveal patterns such as those shown in Fig. 8.

Fig.8: Hypothetical patterns of estimated squared residuals.
In the above figure the ûi² are plotted against Ŷi, the estimated Yi from the regression line, the idea being to find out whether the estimated mean value of Y is systematically related to the squared residuals.
In Figure (a) one can see that there is no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) to (e), however, exhibit definite patterns. For instance, Figure (c) suggests a linear relationship, whereas Figures (d) and (e) indicate a quadratic relationship between ûi² and Ŷi. Using such knowledge, albeit informal, one may transform the data in such a manner that the transformed data do not exhibit heteroscedasticity.
Instead of plotting the ûi² against Ŷi, one may plot them against one of the explanatory variables, especially if plotting ûi² against Ŷi results in a pattern similar to that of Figure (a). In the case of the two-variable model, plotting ûi² against Ŷi is equivalent to plotting it against Xi.
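The post-mortem examination above can be sketched numerically. The data here are simulated with an error spread that grows with X (our assumption), and instead of drawing the plot we summarize the pattern with a correlation; in practice one would plot ûi² against Ŷi, e.g. with matplotlib.

```python
# Sketch of the graphical method: fit OLS, then relate the squared
# residuals to the fitted values to look for a systematic pattern.
import numpy as np

rng = np.random.default_rng(5)
X = np.linspace(1, 10, 200)
y = 1 + 2 * X + rng.normal(0, 0.5 * X)        # error sd grows with X

x, yd = X - X.mean(), y - y.mean()
b2 = (x * yd).sum() / (x * x).sum()
b1 = y.mean() - b2 * X.mean()
y_hat = b1 + b2 * X
u2 = (y - y_hat) ** 2                          # squared residuals

r = np.corrcoef(y_hat, u2)[0, 1]
print(r)   # positive: the residual spread rises with the fitted values
```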
Formal Methods
Park Test
Park formalizes the graphical method by suggesting that σi² is some function of the explanatory variable Xi. The functional form he suggested was

σi² = σ² Xi^β e^(vi)

or

ln σi² = ln σ² + β ln Xi + vi

where vi is the stochastic disturbance term.


Since σi² is generally not known, Park suggests using ûi² as a proxy and running the following regression:

ln ûi² = ln σ² + β ln Xi + vi
       = α + β ln Xi + vi
If β turns out to be statistically significant, it would suggest that heteroscedasticity is present in the data. If it turns out to be insignificant, we may accept the assumption of homoscedasticity. The Park test is thus a two-stage procedure:

• First stage: run the OLS regression disregarding the heteroscedasticity question and obtain the ûi.
• Second stage: run the regression of ln ûi² on ln Xi.
Although empirically appealing, the Park test has its problems: Goldfeld and Quandt have argued that the error term vi may not satisfy the OLS assumptions and may itself be heteroscedastic.
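The two-stage procedure can be sketched on simulated data where the error sd is proportional to Xi, so the test should flag heteroscedasticity. The data-generating numbers below are assumptions for illustration:

```python
# A sketch of the two-stage Park test. Stage 1: OLS, keep residuals.
# Stage 2: regress ln(u_hat^2) on ln(X) and t-test the slope.
import numpy as np

rng = np.random.default_rng(11)
n = 200
X = rng.uniform(1, 10, n)
y = 1 + 2 * X + rng.normal(0, X)              # sigma_i = X_i (strong case)

# Stage 1: OLS regression of y on X, save residuals.
x, yd = X - X.mean(), y - y.mean()
b2 = (x * yd).sum() / (x * x).sum()
u_hat = y - (y.mean() - b2 * X.mean()) - b2 * X

# Stage 2: Park regression on the logs.
lx, lu = np.log(X), np.log(u_hat**2)
lxd, lud = lx - lx.mean(), lu - lu.mean()
beta = (lxd * lud).sum() / (lxd * lxd).sum()
resid = lud - beta * lxd
se = np.sqrt(resid @ resid / (n - 2) / (lxd * lxd).sum())
t_stat = beta / se
print(beta, t_stat)   # a large t-ratio flags the heteroscedasticity
```

With σi proportional to Xi the Park slope should come out near 2 (since ln σi² = ln σ² + 2 ln Xi here) with a clearly significant t-ratio.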
Illustration of Park approach
From the data given in Table 1, run the following regression:
Yi = β1 + β2Xi + ui
where Y = average compensation in thousands of dollars, X = average productivity in thousands of
dollars, and i = i th employment size of the establishment. The results of the regression were as
follows:

The results reveal that the estimated slope coefficient is significant at the 5 percent level on the
basis of a one-tail t test. The equation shows that as labor productivity increases by, say, a dollar,
labor compensation on the average increases by about 23 cents.
The residuals obtained from this regression were then regressed on Xi as suggested above, giving the following results:

Inference: There is no statistically significant relationship between the two variables. Following
the Park test, one may conclude that there is no heteroscedasticity in the error variance.
Glejser Test
The Glejser test is similar in spirit to the Park test. After obtaining the residuals ûi from the OLS regression, Glejser suggests regressing the absolute values of ûi on the X variable that is thought to be closely associated with σi². Glejser used the following functional forms:

|ûi| = β1 + β2Xi + vi
|ûi| = β1 + β2√Xi + vi
|ûi| = β1 + β2(1/Xi) + vi
|ûi| = β1 + β2(1/√Xi) + vi
|ûi| = √(β1 + β2Xi) + vi
|ûi| = √(β1 + β2Xi²) + vi
where vi is the error term.


Illustration:

Continuing with the previous example, the absolute values of the residuals obtained from the regression were regressed on average productivity (X), giving the following results:

Inference: Regression results revealed that there is no relationship between the absolute value of
the residuals and the regressor, average productivity. This reinforces the conclusion based on the
Park test.
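The Glejser test with its simplest functional form, |ûi| = β1 + β2Xi + vi, can be sketched on simulated data. As before, the data-generating choices are assumptions made for illustration, and the helper `ols` is our own:

```python
# Sketch of the Glejser test: regress |u_hat| on X and t-test the slope.
import numpy as np

rng = np.random.default_rng(13)
n = 200
X = rng.uniform(1, 10, n)
y = 1 + 2 * X + rng.normal(0, X)              # error sd grows with X

def ols(X, y):
    """Return (intercept, slope) of a simple OLS regression."""
    x, yd = X - X.mean(), y - y.mean()
    b2 = (x * yd).sum() / (x * x).sum()
    return y.mean() - b2 * X.mean(), b2

b1, b2 = ols(X, y)
abs_u = np.abs(y - b1 - b2 * X)               # absolute OLS residuals

# Glejser regression of |u_hat| on X, with a t-test on the slope.
a1, a2 = ols(X, abs_u)
resid = abs_u - a1 - a2 * X
x = X - X.mean()
se = np.sqrt(resid @ resid / (n - 2) / (x * x).sum())
t_stat = a2 / se
print(a2, t_stat)   # a significantly positive slope flags heteroscedasticity
```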

Critics: Goldfeld and Quandt point out that the error term vi has some problems in that its expected value is nonzero, it is serially correlated, and, ironically, it is heteroscedastic. An additional difficulty with the Glejser method is that the following models are nonlinear in the parameters and therefore cannot be estimated with the usual OLS procedure:

|ûi| = √(β1 + β2Xi) + vi

and

|ûi| = √(β1 + β2Xi²) + vi

Glejser has found that for large samples the first four of the preceding models give generally satisfactory results in detecting heteroscedasticity. As a practical matter, therefore, the Glejser technique may be used for large samples, and in small samples strictly as a qualitative device to learn something about heteroscedasticity.
Spearman’s Rank Correlation Test
