L10.2_2023


1

Multicollinearity
What Happens If the Regressors
Are Correlated?
Introduction
• Assumption 8 of the classical linear regression model (CLRM) is
that there is no multicollinearity among the regressors included
in the regression model.

2
Introduction
We attempt to answer the following questions:

1. What is the nature of multicollinearity?

2. Is multicollinearity really a problem?

3. What are its practical consequences?

4. How does one detect it?

5. What remedial measures can be taken to alleviate the problem of multicollinearity?

3
The Nature of Multicollinearity
• The term multicollinearity is attributed to Ragnar Frisch.

• An exact linear relationship is said to exist among the regressors X1, X2, ..., Xk if the following condition is satisfied:

λ1X1 + λ2X2 + ··· + λkXk = 0

• where λ1, λ2, ..., λk are constants such that not all of them are zero
simultaneously.

4
The Ballentine view of multicollinearity

5
Multicollinearity
• Multicollinearity refers only to linear relationships among the X variables.

• It does not rule out nonlinear relationships among them.

• For example, consider the following regression model:

Yi = β1 + β2Xi + β3Xi² + β4Xi³ + ui

• Here Xi² and Xi³ are functionally related to Xi, but the relationship is nonlinear, so, strictly speaking, the no-multicollinearity assumption is not violated.

6
Multicollinearity
• Why does the classical linear regression model assume that there is no
multicollinearity among the X’s?

• If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.

• If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.

7
Multicollinearity
• There are several sources of multicollinearity:
• The data collection method employed
• Constraints on the model or in the population being sampled
• Model specification
• An over-determined model.

• An additional reason for multicollinearity, especially in time series data, may be that the regressors included in the model share a common trend, that is, they all increase or decrease over time.

8
Estimation in the Presence of Perfect
Multicollinearity

• Consider the following three-variable regression model:

Yi = β1 + β2X2i + β3X3i + ui

9
Multicollinearity
• Assume that X3i = λX2i, where λ is a nonzero constant
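• As a short sketch (not shown on the original slide), substituting X3i = λX2i into the standard deviation-form OLS formula for β̂2 shows why the estimate becomes indeterminate (lowercase letters denote deviations from sample means):

\[
\hat{\beta}_2
= \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}
       {\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}
= \frac{\lambda^2 \sum y_i x_{2i} \sum x_{2i}^2 - \lambda^2 \sum y_i x_{2i} \sum x_{2i}^2}
       {\lambda^2 \left(\sum x_{2i}^2\right)^2 - \lambda^2 \left(\sum x_{2i}^2\right)^2}
= \frac{0}{0},
\]

which is an indeterminate expression; the same happens for β̂3.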

10
Multicollinearity
• β̂2 gives the rate of change in the average value of Y as X2 changes by a unit,
holding X3 constant.

• But if X3 and X2 are perfectly collinear, there is no way X3 can be kept constant:
As X2 changes, so does X3 by the factor λ.
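• A minimal Stata sketch of what happens under perfect collinearity (illustrative only, with made-up data and hypothetical variable names) — Stata simply drops one of the collinear regressors:

clear
set obs 20
gen x2 = _n
gen x3 = 2*x2                    // perfect collinearity: X3 = λX2 with λ = 2
gen y  = 1 + 0.5*x2 + rnormal()  // an arbitrary data-generating process
reg y x2 x3                      // Stata reports: x3 omitted because of collinearity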

11
Estimation in the Presence of “High” but
“Imperfect” Multicollinearity
• Instead of exact multicollinearity, suppose we have x3i = λx2i + vi, where λ ≠ 0 and vi is a stochastic error term (with Σx2ivi = 0). In this case the regression coefficients can be estimated, although their variances will be large.

12
Multicollinearity: Theoretical Consequences of
Multicollinearity
• Even if multicollinearity is very high, as in the case of near multicollinearity, the
OLS estimators still retain the property of BLUE.

• According to Goldberger, exact micronumerosity (the counterpart of exact multicollinearity) arises when n, the sample size, is zero, in which case any kind of estimation is impossible.

• Near micronumerosity, like near multicollinearity, arises when the number of observations barely exceeds the number of parameters to be estimated.

13
Multicollinearity
• Even in the case of near multicollinearity the OLS estimators are unbiased.

• Second, collinearity does not disturb the property of minimum variance.

• Third, multicollinearity is essentially a sample (regression) phenomenon, that is, even if the X variables are not linearly related in the population, they may be so related in the particular sample at hand.

14
Practical Consequences of Multicollinearity
1. Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to
be statistically insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R2,
the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes
in the data.

15
Large Variances and Covariances of OLS
Estimators
• In the three-variable model Yi = β1 + β2X2i + β3X3i + ui, the following formulas apply:

var(β̂2) = σ² / [Σx2i² (1 − r23²)]

var(β̂3) = σ² / [Σx3i² (1 − r23²)]

cov(β̂2, β̂3) = −r23 σ² / [(1 − r23²) √(Σx2i² Σx3i²)]

• where r23 is the coefficient of correlation between X2 and X3.

16
Variance-Inflating Factor (VIF)
• The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as

VIF = 1 / (1 − r23²)

• so that var(β̂2) = [σ²/Σx2i²]·VIF and var(β̂3) = [σ²/Σx3i²]·VIF.

• VIF shows how the variance of an estimator is inflated by the presence of multicollinearity.
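• For example (illustrative numbers, not from the original slide): if r23 = 0.90, VIF = 1/(1 − 0.81) ≈ 5.3, so the variances are roughly 5 times what they would be with uncorrelated regressors; if r23 = 0.99, VIF ≈ 50.3.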

17
Regression model
• For the k-variable model Yi = β1 + β2X2i + ··· + βkXki + ui, we can also write the variance of the jth partial slope coefficient as

var(β̂j) = [σ² / Σxji²] · [1 / (1 − Rj²)] = [σ² / Σxji²] · VIFj

• where Rj² is the R² from the regression of Xj on the remaining regressors.

18
Multicollinearity
• The Effect of Increasing Collinearity on the 95% Confidence Interval

19
Tolerance (TOL)
• The inverse of the VIF is called tolerance (TOL). That is,

TOLj = 1 / VIFj = 1 − Rj²

• When Rj² = 1 (perfect collinearity), TOLj = 0; when Rj² = 0 (no collinearity), TOLj = 1.

20
Wider Confidence Intervals
• Because of the large standard errors, the confidence intervals for the relevant
population parameters tend to be larger

• The probability of accepting a false hypothesis (i.e., type II error) increases.

21
“Insignificant” t Ratios
• In cases of high collinearity the estimated standard errors increase dramatically,
thereby making the t values smaller.

• Therefore, in such cases, one will increasingly accept the null hypothesis that the
relevant true population value is zero.

22
A High R2 but Few Significant t Ratios
• Consider the k-variable linear regression model:

Yi = β1 + β2X2i + β3X3i + ··· + βkXki + ui

• In cases of high collinearity, it is possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t test.

• Yet the R2 in such situations may be so high, say, in excess of 0.9, that on the basis of the F test one can convincingly reject the hypothesis that β2 = β3 = ··· = βk = 0.

23
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• Consider the following data set

24
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• Estimate the regression model using the command: reg y x2 x3

Source SS df MS Number of obs = 5
F(2, 2) = 4.27
Model 8.10121951 2 4.05060976 Prob > F = 0.1899
Residual 1.89878049 2 .949390244 R-squared = 0.8101
Adj R-squared = 0.6202
Total 10 4 2.5 Root MSE = .97437

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .4463415 .1848104 2.42 0.137 -.3488336 1.241517
x3 .0030488 .0850659 0.04 0.975 -.3629602 .3690578
_cons 1.193902 .7736789 1.54 0.263 -2.134969 4.522774

25
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• We obtain the following multiple regression:

Ŷi = 1.1939 + 0.4463X2i + 0.0030X3i
se = (0.7737) (0.1848) (0.0851)
t = (1.54) (2.42) (0.04)   R² = 0.8101

26
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• With the new data set, after changing the 3rd and 4th values of X3:

Source SS df MS Number of obs = 5
F(2, 2) = 4.39
Model 8.14324324 2 4.07162162 Prob > F = 0.1857
Residual 1.85675676 2 .928378378 R-squared = 0.8143
Adj R-squared = 0.6286
Total 10 4 2.5 Root MSE = .96352

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .4013514 .272065 1.48 0.278 -.7692498 1.571953
x3 .027027 .1252281 0.22 0.849 -.5117858 .5658399
_cons 1.210811 .7480215 1.62 0.247 -2.007666 4.429288

27
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• With the new data set (after changing the 3rd and 4th values of X3), we obtain:

Ŷi = 1.2108 + 0.4014X2i + 0.0270X3i
se = (0.7480) (0.2721) (0.1252)
t = (1.62) (1.48) (0.22)   R² = 0.8143

• Even this small change in the data produces noticeable changes in the estimated coefficients and their standard errors.

28
Consequences of Micronumerosity

29
Example 2
• EXAMPLE Consumption Expenditure in Relation to Income and Wealth

30
Example 2
• EXAMPLE Consumption Expenditure in Relation to Income and Wealth

Source SS df MS Number of obs = 10
F(2, 7) = 92.40
Model 8565.55407 2 4282.77704 Prob > F = 0.0000
Residual 324.445926 7 46.349418 R-squared = 0.9635
Adj R-squared = 0.9531
Total 8890 9 987.777778 Root MSE = 6.808

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .9415373 .8228983 1.14 0.290 -1.004308 2.887383
x3 -.0424345 .0806645 -0.53 0.615 -.2331757 .1483067
_cons 24.77473 6.7525 3.67 0.008 8.807609 40.74186

31
Example 2
• EXAMPLE: Consumption Expenditure in Relation to Income and Wealth

Ŷi = 24.7747 + 0.9415X2i − 0.0424X3i
se = (6.7525) (0.8229) (0.0807)
t = (3.67) (1.14) (−0.53)   R² = 0.9635

• Income and wealth together explain about 96 percent of the variation in consumption expenditure, yet neither slope coefficient is individually statistically significant, and the wealth coefficient even carries an unexpected negative sign.

32
Graphical presentation
• Graphical presentation

33
Example

• Regressing X3 (wealth) on X2 (income) shows that the two regressors are almost perfectly collinear (R-squared = 0.9979):

Source SS df MS Number of obs = 10
F(1, 8) = 3849.02
Model 3427202.73 1 3427202.73 Prob > F = 0.0000
Residual 7123.27273 8 890.409091 R-squared = 0.9979
Adj R-squared = 0.9977
Total 3434326 9 381591.778 Root MSE = 29.84

x3 Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 10.19091 .1642623 62.04 0.000 9.81212 10.5697
_cons 7.545455 29.47581 0.26 0.804 -60.42589 75.5168

34
Example

• Regression of Y on X2 alone:

Source SS df MS Number of obs = 10
F(1, 8) = 202.87
Model 8552.72727 1 8552.72727 Prob > F = 0.0000
Residual 337.272727 8 42.1590909 R-squared = 0.9621
Adj R-squared = 0.9573
Total 8890 9 987.777778 Root MSE = 6.493

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .5090909 .0357428 14.24 0.000 .4266678 .591514
_cons 24.45455 6.413817 3.81 0.005 9.664256 39.24483

35
Example
• If instead of regressing Y on X2, we regress it on X3

Source SS df MS Number of obs = 10
F(1, 8) = 176.67
Model 8504.87666 1 8504.87666 Prob > F = 0.0000
Residual 385.123344 8 48.1404181 R-squared = 0.9567
Adj R-squared = 0.9513
Total 8890 9 987.777778 Root MSE = 6.9383

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x3 .0497638 .003744 13.29 0.000 .0411301 .0583974
_cons 24.41104 6.874097 3.55 0.007 8.559349 40.26274

36
Example
• If instead of regressing Y on X2, we regress it on X3, we obtain

Ŷi = 24.4110 + 0.0498X3i
se = (6.8741) (0.0037)
t = (3.55) (13.29)   R² = 0.9567

• Wealth, which was insignificant in the multiple regression, is now highly significant on its own.

• This result would suggest that a way out of extreme collinearity is to drop the collinear variable.

37
Detection of Multicollinearity
1. High R2 but few significant t ratios.

2. High pair-wise correlations among regressors.

• High zero-order correlations are a sufficient but not a necessary condition for the existence of multicollinearity, because it can exist even though the zero-order or simple correlations are comparatively low.

3. Examination of partial correlations.

38
Detection of Multicollinearity
4. Auxiliary regressions.
• Regress each Xi on the remaining X variables and compute the corresponding R2,
designated as R2i ; each one of these regressions is called an auxiliary regression

• If the computed F exceeds the critical Fi at the chosen level of significance, it is taken to mean that the particular Xi is collinear with other X’s.
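• For reference (not shown on the original slide), the F statistic here is the usual overall-significance F for the auxiliary regression: with R²i obtained from regressing Xi on the other m regressors, using n observations,

\[
F = \frac{R_i^2 / m}{(1 - R_i^2)/(n - m - 1)},
\]

which follows the F distribution with m and n − m − 1 degrees of freedom under the null hypothesis of no collinearity.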

39
Detection of Multicollinearity
• Klein’s rule of thumb suggests that multicollinearity may be a troublesome
problem only if the R2 obtained from an auxiliary regression is greater than the
overall R2, that is, that obtained from the regression of Y on all the regressors.

40
Detection of Multicollinearity
5. Eigenvalues and Condition Index.

• The condition number k, based on the eigenvalues of the (X′X) matrix, is defined as

k = maximum eigenvalue / minimum eigenvalue

• and the Condition Index (CI) is defined as

CI = √(maximum eigenvalue / minimum eigenvalue) = √k

41
Detection of Multicollinearity
• Rule of thumb: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity.

• Alternatively, if the CI (= √k) is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.

42
Detection of Multicollinearity
6. Tolerance and variance inflation factor.

• The larger the value of VIFj, the more “troublesome” or collinear the variable Xj.

• As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if R2j exceeds 0.90, that variable is said to be highly collinear.

• The closer TOLj is to zero, the greater the degree of collinearity of that variable
with the other regressors.

• On the other hand, the closer TOLj is to 1, the greater the evidence that Xj is not
collinear with the other regressors.
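• In Stata these diagnostics can be obtained right after fitting the model, for example (a sketch using the variable names from the earlier examples):

reg y x2 x3
estat vif        // reports the VIF for each regressor and 1/VIF, which is the tolerance (TOL)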

43
Detection of Multicollinearity
7. Scatterplot.
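• For example (a sketch, assuming the regressors of interest are x2 and x3 as in the earlier examples), a scatterplot matrix can be produced in Stata with:

graph matrix x2 x3    // pairwise scatterplots; near-linear point clouds suggest collinearity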

44
Remedial Measures
• Do Nothing Approach

45
Rule-of-Thumb Procedures
1. A priori information.

Suppose we consider the model:

Yi = β1 + β2X2i + β3X3i + ui

• where Y is consumption expenditure, X2 is income, and X3 is wealth.

• But suppose we believe that β3 = 0.10β2; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income.

• We can then run the following regression:

Yi = β1 + β2Xi + ui,  where Xi = X2i + 0.10X3i

• Once β̂2 is obtained, β̂3 follows from the postulated relationship as β̂3 = 0.10β̂2.
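• A minimal Stata sketch of this restricted regression (with the hypothetical variable names y, x2, x3 used above):

gen xstar = x2 + 0.10*x3    // impose the a priori restriction β3 = 0.10 β2
reg y xstar                 // the coefficient on xstar estimates β2; then β̂3 = 0.10*β̂2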

46
Rule-of-Thumb Procedures
2. Combining cross-sectional and time series data.

47
Rule-of-Thumb Procedures
3. Dropping a variable(s) and specification bias.

• But in dropping a variable from the model we may be committing a specification bias or specification error.

• Specification bias arises from incorrect specification of the model used in the
analysis.

• Thus, if economic theory says that income and wealth should both be included in the
model explaining the consumption expenditure, dropping the wealth variable would
constitute specification bias.

• Omitting a variable may seriously mislead us as to the true values of the parameters.

• OLS estimators are BLUE despite near collinearity.

48
Rule-of-Thumb Procedures
4. Transformation of variables.

• First difference form: if the model Yt = β1 + β2X2t + β3X3t + ut holds at every t, it also holds at t − 1, so subtracting gives

Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt,  where vt = ut − ut−1

• Another commonly used transformation in practice is the ratio transformation. Dividing the model through by X3t gives:

Yt/X3t = β1(1/X3t) + β2(X2t/X3t) + β3 + (ut/X3t)
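• A sketch of the first-difference regression in Stata, assuming a time variable t has been declared for the data set:

tsset t
reg D.y D.x2 D.x3, noconstant   // D. takes first differences; the intercept drops out of the differenced model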

49
Rule-of-Thumb Procedures
• Problems with first difference models

• For instance, the error term vt in transformed equation may not satisfy the
assumption that the disturbances are serially uncorrelated.

• The error term of the ratio model, (ut/X3t), will be heteroscedastic even if the original error term ut is homoscedastic.

50
Rule-of-Thumb Procedures
5. Additional or new data.

• In the three-variable model, var(β̂2) = σ² / [Σx2i² (1 − r23²)].

• As the sample size increases, Σx2i² will generally increase.

• Therefore, for any given r23, the variance of β̂2 will decrease, thus decreasing the standard error, which will enable us to estimate β2 more precisely.

51
Rule-of-Thumb Procedures
6. Reducing collinearity in polynomial regressions.

• If the explanatory variable(s) is expressed in the deviation form (i.e., deviation from the mean value), multicollinearity is substantially reduced.
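• A sketch of this centering in Stata for a quadratic term (illustrative variable names):

summarize x, meanonly
gen xc  = x - r(mean)     // deviation from the mean
gen xc2 = xc^2
reg y xc xc2              // xc and xc2 are far less correlated than x and x^2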

52
Rule-of-Thumb Procedures
7. Other methods of remedying multicollinearity.

• Multivariate statistical techniques such as factor analysis and principal components, or techniques such as ridge regression, are often employed to “solve” the problem of multicollinearity.
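• For instance (a sketch only, not a full treatment), principal components of a set of collinear regressors can be obtained in Stata with:

pca x1 x2 x3 x4 x5 x6     // the leading components can then be used as regressors in place of the original X's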

53
Is Multicollinearity Necessarily Bad?
• If the objective is prediction only, it may not pose any problem.

54
Example: The Longley Data

• The data set contains the dependent variable Y and six regressors, X1 to X6.

55
Example: The Longley Data

Source SS df MS Number of obs = 16
F(6, 9) = 330.29
Model 184172402 6 30695400.3 Prob > F = 0.0000
Residual 836424.056 9 92936.0062 R-squared = 0.9955
Adj R-squared = 0.9925
Total 185008826 15 12333921.7 Root MSE = 304.85

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x1 1.506187 8.491493 0.18 0.863 -17.7029 20.71528
x2 -.0358192 .033491 -1.07 0.313 -.1115811 .0399427
x3 -2.02023 .4883997 -4.14 0.003 -3.125067 -.915393
x4 -1.033227 .2142742 -4.82 0.001 -1.517949 -.548505
x5 -.0511041 .2260732 -0.23 0.826 -.5625172 .460309
x6 1829.151 455.4785 4.02 0.003 798.7875 2859.515
_cons 77270.12 22506.71 3.43 0.007 26356.41 128183.8

• The R2 value is very high, but quite a few variables are statistically insignificant (X1, X2, and X5), a classic symptom of multicollinearity.

56
Example
• Correlation matrix

x1 x2 x3 x4 x5 x6
x1 1
x2 0.9916 1
x3 0.6206 0.6043 1
x4 0.4647 0.4464 -0.1774 1
x5 0.9792 0.9911 0.6866 0.3644 1
x6 0.9911 0.9953 0.6683 0.4172 0.994 1

57
Example
• Correlation matrix

x1 x2 x3 x4 x5 x6
x1 1
x2 0.9916 1
x3 0.6206 0.6043 1
x4 0.4647 0.4464 -0.1774 1
x5 0.9792 0.9911 0.6866 0.3644 1
x6 0.9911 0.9953 0.6683 0.4172 0.994 1
• Several of these pair-wise correlations are quite high, suggesting that
there may be a severe collinearity problem.

• However, such pair-wise correlations may be a sufficient but not a necessary condition
for the existence of multicollinearity.

58
Example
• To shed further light on the nature of the multicollinearity problem, let us run
the auxiliary regressions.
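• As a sketch, the auxiliary regressions can be run in Stata by regressing each Xi on the remaining regressors, for example:

reg x1 x2 x3 x4 x5 x6
reg x2 x1 x3 x4 x5 x6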

Source SS df MS Number of obs = 16
F(5, 10) = 269.06
Model 173397.547 5 34679.5095 Prob > F = 0.0000
Residual 1288.89024 10 128.889024 R-squared = 0.9926
Adj R-squared = 0.9889
Total 174686.438 15 11645.7625 Root MSE = 11.353

Source SS df MS Number of obs = 16
F(5, 10) = 3575.03
Model 1.4811e+11 5 2.9621e+10 Prob > F = 0.0000
Residual 82856688.7 10 8285668.87 R-squared = 0.9994
Adj R-squared = 0.9992
Total 1.4819e+11 15 9.8794e+09 Root MSE = 2878.5

59
Example
• Finally, collecting the R2 values from the auxiliary regressions, we get the following table.

60
Example
• What “remedial” actions can we take?
• Consider the original model.
• First of all, we could express GNP not in nominal terms, but in real terms, which we
can do by dividing nominal GNP by the implicit price deflator.
• Second, since non-institutional population over 14 years of age grows over time
because of natural population growth, it will be highly correlated with time, the
variable X6 in our model.
• Therefore, instead of keeping both these variables, we will keep the variable X5 and
drop X6.
• Third, there is no compelling reason to include X3, the number of people
unemployed; perhaps the unemployment rate would have been a better measure of
labor market conditions.

61
Example
• To make these changes, first generate the RGNP (real GNP) variable.

• Estimate the regression model
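• A sketch of these commands in Stata, assuming (as in the Longley data used here) that x1 is the GNP implicit price deflator and x2 is nominal GNP; the base-100 scaling of the deflator is an assumption:

gen RGNP = (x2/x1)*100    // real GNP = nominal GNP deflated by the price index
reg y RGNP x4 x5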

Source SS df MS Number of obs = 16
F(3, 12) = 211.10
Model 181568361 3 60522787.1 Prob > F = 0.0000
Residual 3440464.6 12 286705.383 R-squared = 0.9814
Adj R-squared = 0.9768
Total 185008826 15 12333921.7 Root MSE = 535.45

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
RGNP 97.36499 17.91551 5.43 0.000 58.33046 136.3995
x4 -.6879658 .3222373 -2.13 0.054 -1.390061 .0141289
x5 -.2995369 .1417612 -2.11 0.056 -.608408 .0093342
_cons 65720.39 10624.8 6.19 0.000 42570.94 88869.85

62
Example
• Although the R2 value has declined slightly compared with the original R2, it is
still very high.

• Now all the estimated coefficients are individually statistically significant (at least at the 10 percent level), and the signs of the coefficients make economic sense.

63
64
