3SLS is a technique used to estimate systems of equations simultaneously when the error terms across equations are correlated. It combines 2SLS and SUR by using instrumental variables to account for endogeneity, while also modeling the covariance structure of the error terms like SUR. The example estimates a 3-equation system using 3SLS, showing the coefficients and statistics for each equation. It also stores the covariance matrix of the error terms in a new matrix called sig to examine the correlations across equations.
3SLS
3SLS is the combination of 2SLS and SUR.
It is used in a system of simultaneous equations, i.e. in each equation there are endogenous variables on both the left and right hand sides of the equation. THAT IS THE 2SLS PART.
But the error terms in the different equations are also correlated, and efficient estimation requires that we take account of this. THAT IS THE SUR (SEEMINGLY UNRELATED REGRESSIONS) PART.
Hence in the regression for the ith equation there are endogenous (Y) variables on the right hand side AND the error term is correlated with the error terms in the other equations.
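To fix ideas, here is a hypothetical two-equation system of this kind and the sort of reg3 call it would lead to. This is only a sketch: the names y1, y2, z1, z2, z3 are made up and are not the Klein variables used below.

* eq. 1:  y1 = a0 + a1*y2 + a2*z1 + u1
* eq. 2:  y2 = b0 + b1*y1 + b2*z2 + b3*z3 + u2
* y1 and y2 are endogenous, z1 z2 z3 are exogenous, and corr(u1,u2) is non-zero
reg3 (y1 y2 z1) (y2 y1 z2 z3), 3sls inst(z1 z2 z3)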
log using "g:summ1.log"
If you type the above then a log is created on drive g (on my computer this is the flash drive; on yours you may need to specify another drive).
The name summ1 can be anything, but use the suffix log so that Stata creates a plain-text log file.
At the end you can close the log by typing:
log close
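Two optional refinements to the log commands, as a sketch (see help log):

log using "g:summ1.log", replace text
* replace overwrites any existing log with that name; text forces a plain-text log
log off
* log off temporarily suspends logging; log on resumes it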
So open a log now and you will have a record of this session.

3SLS: Load Data
clear
use http://www.ats.ucla.edu/stat/stata/examples/greene/TBL16-2
THAT link no longer works, but the following does. In order to get the rest to work we also rename the variables:

webuse klein
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
*generate variables
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]

OLS Regression
regress c p p1 w
This regresses c on p, p1 and w (what this equation means is not so important).
Source     SS           df   MS            Number of obs = 21
Model      923.549937    3   307.849979    F(3, 17)      = 292.71
Residual    17.8794524  17     1.05173249  Prob > F      = 0.0000
Total      941.429389   20    47.0714695   R-squared     = 0.9810
                                           Adj R-squared = 0.9777
                                           Root MSE      = 1.0255

reg3
By the command reg3, Stata estimates a system of structural equations, where some equations contain endogenous variables among the explanatory variables. Estimation is via three-stage least squares (3SLS). Typically, the endogenous regressors are dependent variables from other equations in the system.
In addition, reg3 can also estimate systems of equations by seemingly unrelated regression (SURE), multivariate regression (MVREG), and equation-by-equation ordinary least squares (OLS) or two-stage least squares (2SLS).

2SLS Regression
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
This regresses c on p, p1 and w. The instruments (i.e. the predetermined or exogenous variables in this equation and the rest of the system) are t wg g yr p1 x1 k1.
This means that p and w (which are not included in the instruments) are endogenous.
The output is as before, but it confirms what the exogenous and endogenous variables are.
Exogenous variables: t wg g yr p1 x1 k1
Endogenous variables: c p w
Two-stage least-squares regression

Equation   Obs  Parms    RMSE      "R-sq"   F-Stat       P
c           21      3   1.135659   0.9767   225.93   0.0000

          Coef.      Std. Err.     t     P>|t|    [95% Conf. Interval]
p        .0173022    .1312046    0.13    0.897    -.2595153   .2941197
p1       .2162338    .1192217    1.81    0.087    -.0353019   .4677696
w        .8101827    .0447351   18.11    0.000     .7158      .9045654
_cons   16.55476     1.467979   11.28    0.000    13.45759   19.65192

2SLS Regression
ivreg c p1 (p w = t wg g yr p1 x1 k1)
This is an alternative command that does the same thing. Note that the endogenous variables on the right hand side of the equation are specified inside the parentheses, in (p w = ...), and the instruments follow the = sign.
The results are identical.

Instrumented: p w
Instruments: p1 t wg g yr x1 k1
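An aside, not part of the original handout: newer versions of Stata also have the ivregress command, which should give the same 2SLS estimates. Here only the excluded instruments go after the = sign, since the included exogenous regressor p1 (and the constant) are used as instruments automatically. A sketch (see help ivregress):

ivregress 2sls c p1 (p w = t wg g yr x1 k1)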
Instrumental variables (2SLS) regression

Source     SS           df   MS            Number of obs = 21
Model      919.504138    3   306.501379    F(3, 17)      = 225.93
Residual    21.9252518  17     1.28972069  Prob > F      = 0.0000
Total      941.429389   20    47.0714695   R-squared     = 0.9767
                                           Adj R-squared = 0.9726
                                           Root MSE      = 1.1357

3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
This format does two new things. First, it specifies all three equations in the system. Note that it has to do this, because it needs to calculate the covariances between the error terms, and for this it needs to know what the equations, and hence the errors, are.
Secondly, it says 3sls not 2sls. All three equations are printed out, which tells us what these equations look like.
Exogenous variables: t wg g yr p1 x1 k1
Endogenous variables: c p w i wp x
Three-stage least-squares regression

Let's compare the three sets of estimates of the consumption equation (table below). Look at the coefficient on p: it is significant under OLS (t = 2.12), essentially zero and insignificant under 2SLS (t = 0.13), and under 3SLS it moves part of the way back towards the OLS value (0.125, t = 1.16). The coefficient on w, by contrast, is large and highly significant under all three estimators.
Now, if OLS were badly biased by endogeneity, I would expect the 2SLS and 3SLS estimates to move away from OLS together. Instead, for p, 3SLS ends up between OLS and 2SLS, which is a little odd.
But we do not have many observations, so the estimates are imprecise. Perhaps that is partly why.
            3SLS                 2SLS                  OLS
      coefficient  t stat   coefficient  t stat   coefficient  t stat
p        0.125      1.16       0.017      0.13       0.193      2.12
p1       0.163      1.62       0.216      1.81       0.090      0.99
w        0.790     20.83       0.810     18.11       0.796     19.93
_cons   16.441     12.6       16.555     11.28      16.237     12.46
R2       0.98                  0.977                 0.981

3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
Now this command stores the variances and covariances between the error terms in a matrix I call sig.
You have used generate to generate variables, scalar to generate scalars. Similarly matrix produces a matrix.
e(Sigma) stores this variance-covariance matrix from the previous regression.
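Two small asides on the matrix command (sketches):

* you can type a small matrix in directly; \ separates the rows
matrix A = (1,2\3,4)
matrix list A
* and matrix list prints any stored matrix in one go, e.g.
matrix list sig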
3SLS Regression
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
display sig[1,1], sig[1,2], sig[1,3]
display sig[2,1], sig[2,2], sig[2,3]
display sig[3,1], sig[3,2], sig[3,3]
. display sig[3,1], sig[3,2], sig[3,3]
-.3852272 .19260612 .47642626

Putting the three display commands together, the stored matrix is

 1.04406    0.437848   -0.38523
 0.437848   1.383183    0.192606
-0.38523    0.192606    0.476426

sig[1,1] = 1.04406 is the variance of the 1st error term; sig[2,3] = 0.192606 is the covariance of the error terms from equations 2 and 3.

3SLS Regression
This matrix relates to the variance-covariance matrix Σ in the lecture. Hence 0.437848 relates to σ12 and of course σ21.

display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
Now this should give the correlation between the error terms from equations 1 and 2.
It is the formula Correlation(x, y) = σxy / (σx σy). When we do this we get:
. display sig[1,2]/( sig[1,1]^0.5* sig[2,2]^0.5)
.36435149

Let's check:
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
matrix sig=e(Sigma)
matrix cy= e(b)
generate rc=c-(cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4])
generate ri=i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
correlate ri rc
matrix cy= e(b) stores the coefficients from the regression in a (row) vector we call cy.
cy[1,1] is the first coefficient, the one on p, in the first equation.
cy[1,4] is the fourth coefficient in the first equation (the constant term).
cy[1,5] is the first coefficient, the one on p, in the second equation. Note this is cy[1,5] NOT cy[2,1].
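As an aside (a sketch), you can check this ordering, and refer to coefficients by name rather than by position:

* list e(b) with its column names (c:p, c:p1, c:w, c:_cons, i:p, ...)
matrix list e(b)
* refer to a coefficient by equation and variable name
display [c]_b[w]
* this is the coefficient on w in the c equation, i.e. the same number as cy[1,3]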
Thus cy[1,1]*p+ cy[1,2]*p1+ cy[1,3]*w+cy[1,4] is the predicted value of c from the first equation, so rc is actual minus predicted, i.e. the residual from the first equation. Similarly
i-(cy[1,5]*p+ cy[1,6]*p1+ cy[1,7]*k1+ cy[1,8])
is the actual minus the predicted value for the second equation, i.e. the error term (residual) ri from the 2nd equation. Finally, correlate ri rc prints out the correlation between the two error terms.
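An equivalent and less error-prone way to get these residuals is predict with the equation() option (a sketch, see help reg3 postestimation; rc2 and ri2 are just names I have chosen):

predict rc2, equation(c) residuals
predict ri2, equation(i) residuals
correlate ri2 rc2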
The correlation is 0.30, close to what we had before (0.364) but not the same. The main purpose of this class is to illustrate commands, so it is not too important. I think it could be because Stata is not calculating the e(Sigma) matrix by dividing by n-k, but just by n.
. correlate ri rc
(obs=21)

             ri        rc
   ri    1.0000
   rc    0.3011    1.0000

Click on Help (on the toolbar at the top of the screen, to the right). Click on Stata command. In the dialogue box type reg3.
Move down towards the end of the file and you get the following
Saved results
reg3 saves the following in e():

Scalars
  e(N)          number of observations
  e(k)          number of parameters
  e(k_eq)       number of equations
  e(mss_#)      model sum of squares for equation #
  e(df_m#)      model degrees of freedom for equation #
  e(rss_#)      residual sum of squares for equation #
  e(df_r)       residual degrees of freedom (small)
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(dfk2_adj)   divisor used with VCE when dfk2 specified
  e(ll)         log likelihood
  e(chi2_#)     chi-squared for equation #
  e(p_#)        significance for equation #
  e(ic)         number of iterations
  e(cons_#)     1 when equation # has a constant; 0 otherwise

Some important retrievables
  e(mss_#)      model sum of squares for equation #
  e(rss_#)      residual sum of squares for equation #
  e(r2_#)       R-squared for equation #
  e(F_#)        F statistic for equation # (small)
  e(rmse_#)     root mean squared error for equation #
  e(ll)         log likelihood
where # is a number, e.g. if # is 2 it refers to equation 2.
And
Matrices
  e(b)       coefficient vector
  e(Sigma)   Sigma hat matrix
  e(V)       variance-covariance matrix of the estimators
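Rather than looking these up in the help file, you can also list everything the last estimation command saved by typing:

ereturn list
* lists all the e() scalars, macros and matrices saved by reg3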
The Hausman Test Again
We looked at this with respect to panel data, but it is a general test that allows us to compare an equation estimated by two different techniques. Here we apply it to comparing OLS with 3SLS.
Below we run the three equations specifying ols and store the results as EQNols:
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), ols
est store EQNols
Then we run the three regressions specifying 3sls and store the results as EQN3sls.
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)
est store EQN3sls
Then we do the Hausman test:
hausman EQNols EQN3sls
The Results
. hausman EQNols EQN3sls

                ---- Coefficients ----
           (b)          (B)          (b-B)     sqrt(diag(V_b-V_B))
         EQNols       EQN3sls     Difference          S.E.
  p     .1929343     .1248904      .068044             .
  p1    .0898847     .1631439     -.0732592            .
  w     .7962188     .790081       .0061378         .0124993

  b = consistent under Ho and Ha; obtained from reg3
  B = inconsistent under Ha, efficient under Ho; obtained from reg3

  Test: Ho: difference in coefficients not systematic
        chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 0.06
        Prob>chi2 = 0.9963
        (V_b-V_B is not positive definite)

The table prints out the two sets of coefficients and their difference.
The Hausman test statistic is 0.06
The significance level is 0.9963
This is clearly very far from being significant at the 10% level. Hence it would appear that the coefficients from the two regressions are not significantly different.
If OLS were giving biased estimates that 3SLS corrects, they would be different.
Hence we would conclude that there is no endogeneity problem which requires techniques such as 2SLS or 3SLS.
But because the error terms do appear correlated, SUR is probably the appropriate technique, as it exploits this correlation to produce more efficient estimates than equation-by-equation OLS.
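To estimate the system by SUR you just change the option on reg3 (a sketch; the separate sureg command is an alternative):

reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), sure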
Tasks
1. Using the display command, e.g.
display e(mss_2)
print on the screen some of the retrievables from each regression (the above is the model sum of squares for the second equation).
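As a sketch of one way to do this for all three equations at once, you could loop over the equation numbers with forvalues (the wording of the labels is up to you):

forvalues j = 1/3 {
    display "Equation `j': R-squared = " e(r2_`j') "   RMSE = " e(rmse_`j')
}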
2. Let's look at the display command.
Type:
display "The residual sum of squares =" e(mss_2)
Tasks
display "The residual sum of squares =" e(mss_2), "and the R2 =" e(r2_2)
display _column(20) "The residual sum of squares =" e(mss_2), _column(50) "and the R2 =" e(r2_2)
display _column(20) "The residual sum of squares =" e(mss_2), _column(60) "and the R2 =" e(r2_2)
display _column(20) "The residual sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(5) e(r2_2)
display _column(20) "The residual sum of squares =" e(mss_2), _column(60) "and the R2 =" _skip(10) e(r2_2)
Tasks
Close log:
log close
And have a look at it in Word.
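Because we gave the log a .log suffix it is already plain text. If you had instead created a formatted .smcl log, you could convert it to text with the translate command before opening it in Word (a sketch; the file summ1.smcl is hypothetical):

translate "g:summ1.smcl" "g:summ1.log"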
The commands used in this session:

webuse klein
* in order to get the rest to work, rename the variables
rename consump c
rename capital1 k1
rename invest i
rename profits p
rename govt g
rename wagegovt wg
rename taxnetx t
rename wagepriv wp
generate x=totinc
generate w = wg+wp
generate k = k1+i
generate yr=year-1931
generate p1 = p[_n-1]
generate x1 = x[_n-1]
reg3 (c p p1 w), 2sls inst(t wg g yr p1 x1 k1)
reg3 (c p p1 w) (i p p1 k1) (wp x x1 yr), 3sls inst(t wg g yr p1 x1 k1)