L10.2_2023


1

Multicollinearity
What Happens If the Regressors
Are Correlated?
Introduction
• Assumption 8 of the classical linear regression model (CLRM) is
that there is no multicollinearity among the regressors included
in the regression model.

2
Introduction
We attempt to answer the following questions:

1. What is the nature of multicollinearity?

2. Is multicollinearity really a problem?

3. What are its practical consequences?

4. How does one detect it?

5. What remedial measures can be taken to alleviate the problem of multicollinearity?

3
The Nature of Multicollinearity
• The term multicollinearity is attributed to Ragnar Frisch.

• An exact linear relationship is said to exist among the regressors X1, X2, ..., Xk if the following condition is satisfied:

λ1X1 + λ2X2 + ··· + λkXk = 0

• where λ1, λ2, ..., λk are constants such that not all of them are zero
simultaneously.

4
The Ballentine view of multicollinearity

5
Multicollinearity
• Multicollinearity refers only to linear relationships among the X variables.

• It does not rule out nonlinear relationships among them.

• For example, consider the following regression model:

Yi = β1 + β2Xi + β3Xi² + β4Xi³ + ui

• Here Xi² and Xi³ are functionally related to Xi, but the relationship is nonlinear, so, strictly speaking, the no-multicollinearity assumption is not violated.

6
Multicollinearity
• Why does the classical linear regression model assume that there is no
multicollinearity among the X’s?

• If multicollinearity is perfect, the regression coefficients of the X variables are indeterminate and their standard errors are infinite.

• If multicollinearity is less than perfect, the regression coefficients, although determinate, possess large standard errors (in relation to the coefficients themselves), which means the coefficients cannot be estimated with great precision or accuracy.

7
Multicollinearity
• There are several sources of multicollinearity:
• The data collection method employed
• Constraints on the model or in the population being sampled
• Model specification
• An over-determined model.

• An additional reason for multicollinearity, especially in time series data, may be that the regressors included in the model share a common trend, that is, they all increase or decrease over time.

8
Estimation in the Presence of Perfect
Multicollinearity

• Consider the following three-variable regression model:

Yi = β1 + β2X2i + β3X3i + ui

9
Multicollinearity
• Assume that X3i = λX2i, where λ is a nonzero constant
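• As a short sketch (not shown on the original slide), substituting X3i = λX2i into the standard deviation-form OLS formula for β̂2 shows why the estimate becomes indeterminate (lowercase letters denote deviations from sample means):

\[
\hat{\beta}_2
= \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}
       {\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}
= \frac{\lambda^2 \sum y_i x_{2i} \sum x_{2i}^2 - \lambda^2 \sum y_i x_{2i} \sum x_{2i}^2}
       {\lambda^2 \left(\sum x_{2i}^2\right)^2 - \lambda^2 \left(\sum x_{2i}^2\right)^2}
= \frac{0}{0},
\]

which is an indeterminate expression; the same happens for β̂3.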

10
Multicollinearity
• β̂2 gives the rate of change in the average value of Y as X2 changes by a unit,
holding X3 constant.

• But if X3 and X2 are perfectly collinear, there is no way X3 can be kept constant:
As X2 changes, so does X3 by the factor λ.
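• A minimal Stata sketch of what happens under perfect collinearity (illustrative only, with made-up data and hypothetical variable names) — Stata simply drops one of the collinear regressors:

clear
set obs 20
gen x2 = _n
gen x3 = 2*x2                    // perfect collinearity: X3 = λX2 with λ = 2
gen y  = 1 + 0.5*x2 + rnormal()  // an arbitrary data-generating process
reg y x2 x3                      // Stata reports: x3 omitted because of collinearity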

11
Estimation in the Presence of “High” but
“Imperfect” Multicollinearity
• Instead of exact multicollinearity, suppose we have x3i = λx2i + vi, where λ ≠ 0 and vi is a stochastic error term (with Σx2ivi = 0). In this case the regression coefficients can be estimated, although their variances will be large.

12
Multicollinearity: Theoretical Consequences of
Multicollinearity
• Even if multicollinearity is very high, as in the case of near multicollinearity, the
OLS estimators still retain the property of BLUE.

• According to Goldberger, exact micronumerosity (the counterpart of exact multicollinearity) arises when n, the sample size, is zero, in which case any kind of estimation is impossible.

• Near micronumerosity, like near multicollinearity, arises when the number of observations barely exceeds the number of parameters to be estimated.

13
Multicollinearity
• Even in the case of near multicollinearity the OLS estimators are unbiased.

• Second, collinearity does not disturb the property of minimum variance.

• Third, multicollinearity is essentially a sample (regression) phenomenon, that is, even if the X variables are not linearly related in the population, they may be so related in the particular sample at hand.

14
Practical Consequences of Multicollinearity
1. Although BLUE, the OLS estimators have large variances and covariances,
making precise estimation difficult.
2. Because of consequence 1, the confidence intervals tend to be much wider,
leading to the acceptance of the “zero null hypothesis” (i.e., the true population
coefficient is zero) more readily.
3. Also because of consequence 1, the t ratio of one or more coefficients tends to
be statistically insignificant.
4. Although the t ratio of one or more coefficients is statistically insignificant, R2,
the overall measure of goodness of fit, can be very high.
5. The OLS estimators and their standard errors can be sensitive to small changes
in the data.

15
Large Variances and Covariances of OLS
Estimators
• In the three-variable model Yi = β1 + β2X2i + β3X3i + ui, the following formulas apply:

var(β̂2) = σ² / [Σx2i² (1 − r23²)]

var(β̂3) = σ² / [Σx3i² (1 − r23²)]

cov(β̂2, β̂3) = −r23 σ² / [(1 − r23²) √(Σx2i² Σx3i²)]

• where r23 is the coefficient of correlation between X2 and X3.

16
Variance-Inflating Factor (VIF)
• The speed with which variances and covariances increase can be seen with the variance-inflating factor (VIF), which is defined as

VIF = 1 / (1 − r23²)

• so that var(β̂2) = [σ²/Σx2i²]·VIF and var(β̂3) = [σ²/Σx3i²]·VIF.

• VIF shows how the variance of an estimator is inflated by the presence of multicollinearity.
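• For example (illustrative numbers, not from the original slide): if r23 = 0.90, VIF = 1/(1 − 0.81) ≈ 5.3, so the variances are roughly 5 times what they would be with uncorrelated regressors; if r23 = 0.99, VIF ≈ 50.3.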

17
Regression model
• For the k-variable model Yi = β1 + β2X2i + ··· + βkXki + ui, we can also write the variance of the jth partial slope coefficient as

var(β̂j) = [σ² / Σxji²] · [1 / (1 − Rj²)] = [σ² / Σxji²] · VIFj

• where Rj² is the R² from the regression of Xj on the remaining regressors.

18
Multicollinearity
• The Effect of Increasing Collinearity on the 95% Confidence Interval

19
Tolerance (TOL)
• The inverse of the VIF is called tolerance (TOL). That is,

TOLj = 1 / VIFj = 1 − Rj²

• When Rj² = 1 (perfect collinearity), TOLj = 0; when Rj² = 0 (no collinearity), TOLj = 1.

20
Wider Confidence Intervals
• Because of the large standard errors, the confidence intervals for the relevant
population parameters tend to be larger

• The probability of accepting a false hypothesis (i.e., type II error) increases.

21
“Insignificant” t Ratios
• In cases of high collinearity the estimated standard errors increase dramatically,
thereby making the t values smaller.

• Therefore, in such cases, one will increasingly accept the null hypothesis that the
relevant true population value is zero.

22
A High R2 but Few Significant t Ratios
• Consider the k-variable linear regression model:

Yi = β1 + β2X2i + β3X3i + ··· + βkXki + ui

• In cases of high collinearity, it is possible to find that one or more of the partial slope coefficients are individually statistically insignificant on the basis of the t test.

• Yet the R2 in such situations may be so high, say, in excess of 0.9, that on the basis of the F test one can convincingly reject the hypothesis that β2 = β3 = ··· = βk = 0.

23
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• Consider the following data set

24
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• Estimate the regression model using the command: reg y x2 x3

Source SS df MS Number of obs = 5
F(2, 2) = 4.27
Model 8.10121951 2 4.05060976 Prob > F = 0.1899
Residual 1.89878049 2 .949390244 R-squared = 0.8101
Adj R-squared = 0.6202
Total 10 4 2.5 Root MSE = .97437

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .4463415 .1848104 2.42 0.137 -.3488336 1.241517
x3 .0030488 .0850659 0.04 0.975 -.3629602 .3690578
_cons 1.193902 .7736789 1.54 0.263 -2.134969 4.522774

25
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• We obtain the following multiple regression:

Ŷi = 1.1939 + 0.4463X2i + 0.0030X3i
se = (0.7737) (0.1848) (0.0851)
t = (1.54) (2.42) (0.04)   R² = 0.8101

26
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• With the new data set, after changing the 3rd and 4th values of X3:

Source SS df MS Number of obs = 5
F(2, 2) = 4.39
Model 8.14324324 2 4.07162162 Prob > F = 0.1857
Residual 1.85675676 2 .928378378 R-squared = 0.8143
Adj R-squared = 0.6286
Total 10 4 2.5 Root MSE = .96352

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .4013514 .272065 1.48 0.278 -.7692498 1.571953
x3 .027027 .1252281 0.22 0.849 -.5117858 .5658399
_cons 1.210811 .7480215 1.62 0.247 -2.007666 4.429288

27
Sensitivity of OLS Estimators and Their Standard
Errors to Small Changes in Data
• With the new data set (after changing the 3rd and 4th values of X3), we obtain:

Ŷi = 1.2108 + 0.4014X2i + 0.0270X3i
se = (0.7480) (0.2721) (0.1252)
t = (1.62) (1.48) (0.22)   R² = 0.8143

• Even this small change in the data produces noticeable changes in the estimated coefficients and their standard errors.

28
Consequences of Micronumerosity

29
Example 2
• EXAMPLE Consumption Expenditure in Relation to Income and Wealth

30
Example 2
• EXAMPLE Consumption Expenditure in Relation to Income and Wealth

Source SS df MS Number of obs = 10
F(2, 7) = 92.40
Model 8565.55407 2 4282.77704 Prob > F = 0.0000
Residual 324.445926 7 46.349418 R-squared = 0.9635
Adj R-squared = 0.9531
Total 8890 9 987.777778 Root MSE = 6.808

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .9415373 .8228983 1.14 0.290 -1.004308 2.887383
x3 -.0424345 .0806645 -0.53 0.615 -.2331757 .1483067
_cons 24.77473 6.7525 3.67 0.008 8.807609 40.74186

31
Example 2
• EXAMPLE: Consumption Expenditure in Relation to Income and Wealth

Ŷi = 24.7747 + 0.9415X2i − 0.0424X3i
se = (6.7525) (0.8229) (0.0807)
t = (3.67) (1.14) (−0.53)   R² = 0.9635

• Income and wealth together explain about 96 percent of the variation in consumption expenditure, yet neither slope coefficient is individually statistically significant, and the wealth coefficient even carries an unexpected negative sign.

32
Graphical presentation
• Graphical presentation

33
Example

• Regressing X3 (wealth) on X2 (income) shows that the two regressors are almost perfectly collinear (R-squared = 0.9979):

Source SS df MS Number of obs = 10
F(1, 8) = 3849.02
Model 3427202.73 1 3427202.73 Prob > F = 0.0000
Residual 7123.27273 8 890.409091 R-squared = 0.9979
Adj R-squared = 0.9977
Total 3434326 9 381591.778 Root MSE = 29.84

x3 Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 10.19091 .1642623 62.04 0.000 9.81212 10.5697
_cons 7.545455 29.47581 0.26 0.804 -60.42589 75.5168

34
Example

• Regression of Y on X2 alone:

Source SS df MS Number of obs = 10
F(1, 8) = 202.87
Model 8552.72727 1 8552.72727 Prob > F = 0.0000
Residual 337.272727 8 42.1590909 R-squared = 0.9621
Adj R-squared = 0.9573
Total 8890 9 987.777778 Root MSE = 6.493

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x2 .5090909 .0357428 14.24 0.000 .4266678 .591514
_cons 24.45455 6.413817 3.81 0.005 9.664256 39.24483

35
Example
• If instead of regressing Y on X2, we regress it on X3

Source SS df MS Number of obs = 10
F(1, 8) = 176.67
Model 8504.87666 1 8504.87666 Prob > F = 0.0000
Residual 385.123344 8 48.1404181 R-squared = 0.9567
Adj R-squared = 0.9513
Total 8890 9 987.777778 Root MSE = 6.9383

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x3 .0497638 .003744 13.29 0.000 .0411301 .0583974
_cons 24.41104 6.874097 3.55 0.007 8.559349 40.26274

36
Example
• If instead of regressing Y on X2, we regress it on X3, we obtain

Ŷi = 24.4110 + 0.0498X3i
se = (6.8741) (0.0037)
t = (3.55) (13.29)   R² = 0.9567

• Wealth, which was insignificant in the multiple regression, is now highly significant on its own.

• This result would suggest that a way out of extreme collinearity is to drop the collinear variable.

37
Detection of Multicollinearity
1. High R2 but few significant t ratios.

2. High pair-wise correlations among regressors.

• High zero-order correlations are a sufficient but not a necessary condition for the existence of multicollinearity, because it can exist even though the zero-order or simple correlations are comparatively low.

3. Examination of partial correlations.

38
Detection of Multicollinearity
4. Auxiliary regressions.
• Regress each Xi on the remaining X variables and compute the corresponding R2,
designated as R2i ; each one of these regressions is called an auxiliary regression

• If the computed F exceeds the critical Fi at the chosen level of significance, it is taken to mean that the particular Xi is collinear with other X’s.
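• For reference (not shown on the original slide), the F statistic here is the usual overall-significance F for the auxiliary regression: with R²i obtained from regressing Xi on the other m regressors, using n observations,

\[
F = \frac{R_i^2 / m}{(1 - R_i^2)/(n - m - 1)},
\]

which follows the F distribution with m and n − m − 1 degrees of freedom under the null hypothesis of no collinearity.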

39
Detection of Multicollinearity
• Klein’s rule of thumb suggests that multicollinearity may be a troublesome
problem only if the R2 obtained from an auxiliary regression is greater than the
overall R2, that is, that obtained from the regression of Y on all the regressors.

40
Detection of Multicollinearity
5. Eigenvalues and Condition Index.

• The condition number k, based on the eigenvalues of the (X′X) matrix, is defined as

k = maximum eigenvalue / minimum eigenvalue

• and the Condition Index (CI) is defined as

CI = √(maximum eigenvalue / minimum eigenvalue) = √k

41
Detection of Multicollinearity
• Rule of thumb: if k is between 100 and 1000 there is moderate to strong multicollinearity, and if it exceeds 1000 there is severe multicollinearity.

• Alternatively, if the CI (= √k) is between 10 and 30, there is moderate to strong multicollinearity, and if it exceeds 30 there is severe multicollinearity.

42
Detection of Multicollinearity
6. Tolerance and variance inflation factor.

• The larger the value of VIFj, the more “troublesome” or collinear the variable Xj.

• As a rule of thumb, if the VIF of a variable exceeds 10, which will happen if R2j exceeds 0.90, that variable is said to be highly collinear.

• The closer TOLj is to zero, the greater the degree of collinearity of that variable
with the other regressors.

• On the other hand, the closer TOLj is to 1, the greater the evidence that Xj is not
collinear with the other regressors.
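• In Stata these diagnostics can be obtained right after fitting the model, for example (a sketch using the variable names from the earlier examples):

reg y x2 x3
estat vif        // reports the VIF for each regressor and 1/VIF, which is the tolerance (TOL)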

43
Detection of Multicollinearity
7. Scatterplot.
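• For example (a sketch, assuming the regressors of interest are x2 and x3 as in the earlier examples), a scatterplot matrix can be produced in Stata with:

graph matrix x2 x3    // pairwise scatterplots; near-linear point clouds suggest collinearity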

44
Remedial Measures
• Do Nothing Approach

45
Rule-of-Thumb Procedures
1. A priori information.

Suppose we consider the model:

Yi = β1 + β2X2i + β3X3i + ui

• where Y is consumption expenditure, X2 is income, and X3 is wealth.

• But suppose we believe that β3 = 0.10β2; that is, the rate of change of consumption with respect to wealth is one-tenth the corresponding rate with respect to income.

• We can then run the following regression:

Yi = β1 + β2Xi + ui,  where Xi = X2i + 0.10X3i

• Once β̂2 is obtained, β̂3 follows from the postulated relationship as β̂3 = 0.10β̂2.
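• A minimal Stata sketch of this restricted regression (with the hypothetical variable names y, x2, x3 used above):

gen xstar = x2 + 0.10*x3    // impose the a priori restriction β3 = 0.10 β2
reg y xstar                 // the coefficient on xstar estimates β2; then β̂3 = 0.10*β̂2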

46
Rule-of-Thumb Procedures
2. Combining cross-sectional and time series data.

47
Rule-of-Thumb Procedures
3. Dropping a variable(s) and specification bias.

• But in dropping a variable from the model we may be committing a specification bias or specification error.

• Specification bias arises from incorrect specification of the model used in the
analysis.

• Thus, if economic theory says that income and wealth should both be included in the
model explaining the consumption expenditure, dropping the wealth variable would
constitute specification bias.

• Omitting a variable may seriously mislead us as to the true values of the parameters.

• OLS estimators are BLUE despite near collinearity.

48
Rule-of-Thumb Procedures
4. Transformation of variables.

• First difference form: if the model Yt = β1 + β2X2t + β3X3t + ut holds at every t, it also holds at t − 1, so subtracting gives

Yt − Yt−1 = β2(X2t − X2,t−1) + β3(X3t − X3,t−1) + vt,  where vt = ut − ut−1

• Another commonly used transformation in practice is the ratio transformation. Dividing the model through by X3t gives:

Yt/X3t = β1(1/X3t) + β2(X2t/X3t) + β3 + (ut/X3t)
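• A sketch of the first-difference regression in Stata, assuming a time variable t has been declared for the data set:

tsset t
reg D.y D.x2 D.x3, noconstant   // D. takes first differences; the intercept drops out of the differenced model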

49
Rule-of-Thumb Procedures
• Problems with first difference models

• For instance, the error term vt in transformed equation may not satisfy the
assumption that the disturbances are serially uncorrelated.

• The error term of the ratio model, (ut/X3t), will be heteroscedastic even if the original error term ut is homoscedastic.

50
Rule-of-Thumb Procedures
5. Additional or new data.

• In the three-variable model, var(β̂2) = σ² / [Σx2i² (1 − r23²)].

• As the sample size increases, Σx2i² will generally increase.

• Therefore, for any given r23, the variance of β̂2 will decrease, thus decreasing the standard error, which will enable us to estimate β2 more precisely.

51
Rule-of-Thumb Procedures
6. Reducing collinearity in polynomial regressions.

• If the explanatory variable(s) is expressed in the deviation form (i.e., deviation from the mean value), multicollinearity is substantially reduced.
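• A sketch of this centering in Stata for a quadratic term (illustrative variable names):

summarize x, meanonly
gen xc  = x - r(mean)     // deviation from the mean
gen xc2 = xc^2
reg y xc xc2              // xc and xc2 are far less correlated than x and x^2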

52
Rule-of-Thumb Procedures
7. Other methods of remedying multicollinearity.

• Multivariate statistical techniques such as factor analysis and principal components, or techniques such as ridge regression, are often employed to “solve” the problem of multicollinearity.
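• For instance (a sketch only, not a full treatment), principal components of a set of collinear regressors can be obtained in Stata with:

pca x1 x2 x3 x4 x5 x6     // the leading components can then be used as regressors in place of the original X's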

53
Is Multicollinearity Necessarily Bad?
• If the objective is prediction only, it may not pose any problem.

54
Example: The Longley Data

• The data set contains the dependent variable Y and six regressors, X1 to X6.

55
Example: The Longley Data

Source SS df MS Number of obs = 16
F(6, 9) = 330.29
Model 184172402 6 30695400.3 Prob > F = 0.0000
Residual 836424.056 9 92936.0062 R-squared = 0.9955
Adj R-squared = 0.9925
Total 185008826 15 12333921.7 Root MSE = 304.85

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
x1 1.506187 8.491493 0.18 0.863 -17.7029 20.71528
x2 -.0358192 .033491 -1.07 0.313 -.1115811 .0399427
x3 -2.02023 .4883997 -4.14 0.003 -3.125067 -.915393
x4 -1.033227 .2142742 -4.82 0.001 -1.517949 -.548505
x5 -.0511041 .2260732 -0.23 0.826 -.5625172 .460309
x6 1829.151 455.4785 4.02 0.003 798.7875 2859.515
_cons 77270.12 22506.71 3.43 0.007 26356.41 128183.8

• The R2 value is very high, but quite a few variables are statistically insignificant (X1, X2, and X5), a classic symptom of multicollinearity.

56
Example
• Correlation matrix

x1 x2 x3 x4 x5 x6
x1 1
x2 0.9916 1
x3 0.6206 0.6043 1
x4 0.4647 0.4464 -0.1774 1
x5 0.9792 0.9911 0.6866 0.3644 1
x6 0.9911 0.9953 0.6683 0.4172 0.994 1

57
Example
• Correlation matrix

x1 x2 x3 x4 x5 x6
x1 1
x2 0.9916 1
x3 0.6206 0.6043 1
x4 0.4647 0.4464 -0.1774 1
x5 0.9792 0.9911 0.6866 0.3644 1
x6 0.9911 0.9953 0.6683 0.4172 0.994 1
• Several of these pair-wise correlations are quite high, suggesting that
there may be a severe collinearity problem.

• However, such pair-wise correlations may be a sufficient but not a necessary condition
for the existence of multicollinearity.

58
Example
• To shed further light on the nature of the multicollinearity problem, let us run
the auxiliary regressions.
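• As a sketch, the auxiliary regressions can be run in Stata by regressing each Xi on the remaining regressors, for example:

reg x1 x2 x3 x4 x5 x6
reg x2 x1 x3 x4 x5 x6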

Source SS df MS Number of obs = 16
F(5, 10) = 269.06
Model 173397.547 5 34679.5095 Prob > F = 0.0000
Residual 1288.89024 10 128.889024 R-squared = 0.9926
Adj R-squared = 0.9889
Total 174686.438 15 11645.7625 Root MSE = 11.353

Source SS df MS Number of obs = 16
F(5, 10) = 3575.03
Model 1.4811e+11 5 2.9621e+10 Prob > F = 0.0000
Residual 82856688.7 10 8285668.87 R-squared = 0.9994
Adj R-squared = 0.9992
Total 1.4819e+11 15 9.8794e+09 Root MSE = 2878.5

59
Example
• Finally, collecting the R2 values from the auxiliary regressions, we get the following table.

60
Example
• What “remedial” actions can we take?
• Consider the original model.
• First of all, we could express GNP not in nominal terms, but in real terms, which we
can do by dividing nominal GNP by the implicit price deflator.
• Second, since non-institutional population over 14 years of age grows over time
because of natural population growth, it will be highly correlated with time, the
variable X6 in our model.
• Therefore, instead of keeping both these variables, we will keep the variable X5 and
drop X6.
• Third, there is no compelling reason to include X3, the number of people
unemployed; perhaps the unemployment rate would have been a better measure of
labor market conditions.

61
Example
• To make these changes, first generate the RGNP (real GNP) variable.

• Estimate the regression model
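• A sketch of these commands in Stata, assuming (as in the Longley data used here) that x1 is the GNP implicit price deflator and x2 is nominal GNP; the base-100 scaling of the deflator is an assumption:

gen RGNP = (x2/x1)*100    // real GNP = nominal GNP deflated by the price index
reg y RGNP x4 x5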

Source SS df MS Number of obs = 16
F(3, 12) = 211.10
Model 181568361 3 60522787.1 Prob > F = 0.0000
Residual 3440464.6 12 286705.383 R-squared = 0.9814
Adj R-squared = 0.9768
Total 185008826 15 12333921.7 Root MSE = 535.45

y Coef. Std. Err. t P>|t| [95% Conf. Interval]
RGNP 97.36499 17.91551 5.43 0.000 58.33046 136.3995
x4 -.6879658 .3222373 -2.13 0.054 -1.390061 .0141289
x5 -.2995369 .1417612 -2.11 0.056 -.608408 .0093342
_cons 65720.39 10624.8 6.19 0.000 42570.94 88869.85

62
Example
• Although the R2 value has declined slightly compared with the original R2, it is
still very high.

• Now all the estimated coefficients are individually statistically significant (at least at the 10 percent level), and the signs of the coefficients make economic sense.

63
64
