Research Method: Lecture 7 (Ch14) Pooled Cross Sections and Simple Panel Data Methods

1
An independently pooled cross section
This type of data is obtained by sampling randomly from a population at different points in time (usually in different years).
You can pool the data from the different years and run regressions.
However, you usually include year dummies.
2
Panel data
Panel data are also cross-sectional data collected at different points in time.
However, panel data follow the same individuals over time.
You can do a bit more with panel data than with a pooled cross section.
You usually include year dummies here as well.
3
Pooling independent cross sections across time
As long as the data are collected independently, pooling them over time causes few problems.
However, the distribution of the independent variables may change over time. For example, the distribution of education changes over time.
To account for such changes, you usually need to include a dummy variable for each year (year dummies), except for one year that serves as the base year.
Often the coefficients on the year dummies are of interest.
4
Example 1
Suppose that you would like to see how the fertility rate changes over time after controlling for various characteristics.
The next slide shows the OLS estimates of the determinants of fertility over time (data: FERTIL1.dta).
The data are collected every other year.
The base year for the year dummies is 1972.
Note that the command on the next slide has no dummy for 1978 (y78), so any 1978 observations in the sample are grouped with the base year.
5
Dependent variable = number of kids per woman
. reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y80 y82 y84

      Source |         SS     df          MS     Number of obs =    1129
             |                                   F( 16,  1112) =   10.33
       Model | 399.265559     16  24.9540975     Prob > F      =  0.0000
    Residual | 2686.24374   1112  2.41568682     R-squared     =  0.1294
             |                                   Adj R-squared =  0.1169
       Total |  3085.5093   1128  2.73538059     Root MSE      =  1.5542

        kids |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |   -.1287556   .0183209   -7.03   0.000     -.164703   -.0928081
         age |     .535383   .1380659    3.88   0.000      .264484    .8062821
       agesq |   -.0058384    .001561   -3.74   0.000    -.0089013   -.0027756
       black |    1.077747   .1733806    6.22   0.000     .7375571    1.417937
        east |    .2180929   .1327211    1.64   0.101     -.042319    .4785049
    northcen |    .3616071   .1207846    2.99   0.003     .1246157    .5985984
        west |    .1989796   .1668093    1.19   0.233    -.1283168    .5262761
        farm |   -.0553556    .146947   -0.38   0.706    -.3436803    .2329692
    othrural |   -.1662171   .1751486   -0.95   0.343    -.5098761     .177442
        town |    .0825938    .124396    0.66   0.507    -.1614836    .3266712
      smcity |    .2092197   .1600797    1.31   0.191    -.1048727    .5233121
         y74 |     .301226   .1488953    2.02   0.043     .0090786    .5933735
         y76 |   -.0639849   .1556646   -0.41   0.681    -.3694143    .2414445
         y80 |    -.037886   .1598956   -0.24   0.813    -.3516171    .2758452
         y82 |   -.4892665   .1482989   -3.30   0.001    -.7802437   -.1982893
         y84 |   -.5112715   .1496524   -3.42   0.001    -.8049044   -.2176385
       _cons |   -7.844731   3.038574   -2.58   0.010    -13.80672   -1.882745
6
Holding the other factors fixed, a woman in 1982 has on average 0.49 fewer children than a woman in the base year.
A similar result is found for 1984 (0.51 fewer children).
The year dummies for 1982 and 1984 show significant drops in fertility over time.
7
Example 2
CPS78_85.dta has wage data collected in 1978 and 1985.
We estimate an earnings equation that includes education, experience, experience squared, a union dummy, a female dummy, and a year dummy for 1985.
To see whether the gender gap has changed over time, include the interaction between female and the 1985 dummy; that is, estimate the following.
8
Log(wage)=β0+β1(educ)
+β2(exper)+β3(expersq)+β4(Union)
+β5(female)
+β6(year85)
+β7(year85)(female)
You can check whether the gender wage gap in 1985 differs from the gap in the base year (1978) by testing whether β7 is equal to zero.
The gender gap in each period is given by:
- gender gap in the base year (1978) = β5
- gender gap in 1985 = β5 + β7
9
. reg lwage educ exper expersq union female y85 y85fem

      Source |         SS     df          MS     Number of obs =    1084
             |                                   F(  7,  1076) =  113.20
       Model | 135.328704      7   19.332672     Prob > F      =  0.0000
    Residual | 183.762464   1076  .170782959     R-squared     =  0.4241
       Total | 319.091167   1083   .29463635     Root MSE      =  .41326

       lwage |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
        educ |    .0833217   .0050646   16.45   0.000     .0733841    .0932594
       exper |    .0294761   .0035717    8.25   0.000     .0224679    .0364844
     expersq |   -.0003975   .0000776   -5.12   0.000    -.0005498   -.0002451
       union |     .205237   .0302943    6.77   0.000     .1457945    .2646795
      female |   -.3195333   .0366427   -8.72   0.000    -.3914324   -.2476341
         y85 |    .3530916   .0333324   10.59   0.000     .2876877    .4184954
      y85fem |    .0884046   .0513498    1.72   0.085    -.0123524    .1891616
       _cons |    .3522088   .0763137    4.62   0.000     .2024683    .5019493
The coefficient on the interaction term (y85)(female) is positive and significant at the 10% significance level (p = 0.085), so the gender gap appears to have narrowed over time.
gender gap in 1978 = -0.3195
gender gap in 1985 = -0.3195 + 0.0884 = -0.2311
10
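The arithmetic behind the two gender gaps can be checked with a few lines of Python. This is only a sketch: the three numbers are copied from the regression output above, the variable names are my own, and the conversion 100*(exp(gap) - 1) is the usual exact percentage interpretation of a difference in log wages.

```python
import math

# Coefficients copied from the lwage regression output above.
b_female = -0.3195333    # gender gap in log wage in 1978 (beta5)
b_y85fem = 0.0884046     # change in the gap from 1978 to 1985 (beta7)
se_y85fem = 0.0513498    # standard error of the interaction term

gap_1978 = b_female
gap_1985 = b_female + b_y85fem

# t statistic for H0: the gap did not change (beta7 = 0).
t_change = b_y85fem / se_y85fem

# Exact percentage wage difference implied by a gap in log points.
pct_1978 = 100 * (math.exp(gap_1978) - 1)
pct_1985 = 100 * (math.exp(gap_1985) - 1)

print(f"gap 1978: {gap_1978:.4f} log points ({pct_1978:.1f}%)")
print(f"gap 1985: {gap_1985:.4f} log points ({pct_1985:.1f}%)")
print(f"t statistic for the change: {t_change:.2f}")
```

The t statistic reproduces the 1.72 reported in the table, which is why the change is significant at the 10% level but not at the 5% level.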
Policy analysis with pooled cross sections: The difference-in-difference estimator
I explain a typical policy analysis with pooled cross section data, called difference-in-difference estimation, using an example.
11
Example: Effects of a garbage incinerator on housing prices
This example is based on studies of housing prices in North Andover, Massachusetts.
The rumor that a garbage incinerator would be built in North Andover began after 1978. The construction of the incinerator began in 1981.
You want to examine whether the incinerator affected housing prices.
12
Our hypothesis is the following.
Hypothesis: The price of houses located near the incinerator would fall relative to the price of more distant houses.
For illustration, define a house to be near the incinerator if it is within 3 miles.
So create the following dummy variable:
nearinc = 1 if the house is 'near' the incinerator
        = 0 otherwise
13
The most naïve analysis would be to run the following regression using only the 1981 data.
price = β0 + β1(nearinc) + u
where price is the real price (i.e., deflated using the CPI to express it in constant 1978 dollars).
Using KIELMC.dta, the result is the following.
. reg rprice nearinc if year==1981

      Source |         SS     df          MS     Number of obs =     142
             |                                   F(  1,   140) =   27.73
       Model | 2.7059e+10      1  2.7059e+10     Prob > F      =  0.0000
    Residual | 1.3661e+11    140   975815048     R-squared     =  0.1653
       Total | 1.6367e+11    141  1.1608e+09     Root MSE      =   31238

      rprice |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     nearinc |   -30688.27   5827.709   -5.27   0.000    -42209.97   -19166.58
       _cons |    101307.5   3093.027   32.75   0.000     95192.43    107422.6
But can we say from this estimation that the incinerator has negatively affected the housing price?
14
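To see this, estimate the same equation using the 1978 data. Note that this is before the rumor about building the incinerator began.

      Source |         SS     df          MS     Number of obs =     179
             |                                   F(  1,   177) =   15.74
       Model | 1.3636e+10      1  1.3636e+10     Prob > F      =  0.0001
       Total | 1.6696e+11    178   937979126     Root MSE      =   29432

      rprice |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     nearinc |   -18824.37   4744.594   -3.97   0.000    -28187.62   -9461.117
       _cons |    82517.23    2653.79   31.09   0.000     77280.09    87754.37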
Note that houses near the place where the incinerator was to be built were already cheaper than houses farther from the location.
So the negative coefficient simply means that the garbage incinerator was built in a location where housing prices are low.
15
Now, compare the two regressions.
Year 1978 regression

      Source |         SS     df          MS     Number of obs =     179
             |                                   F(  1,   177) =   15.74
       Model | 1.3636e+10      1  1.3636e+10     Prob > F      =  0.0001
             |                                   Adj R-squared =  0.0765
       Total | 1.6696e+11    178   937979126     Root MSE      =   29432

      rprice |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     nearinc |   -18824.37   4744.594   -3.97   0.000    -28187.62   -9461.117
       _cons |    82517.23    2653.79   31.09   0.000     77280.09    87754.37

Year 1981 regression

      Source |         SS     df          MS     Number of obs =     142
             |                                   F(  1,   140) =   27.73
       Model | 2.7059e+10      1  2.7059e+10     Prob > F      =  0.0000
       Total | 1.6367e+11    141  1.1608e+09     Root MSE      =   31238

      rprice |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     nearinc |   -30688.27   5827.709   -5.27   0.000    -42209.97   -19166.58
       _cons |    101307.5   3093.027   32.75   0.000     95192.43    107422.6

Compared to 1978, the price penalty for houses near the incinerator is greater in 1981.
Perhaps the increase in the price penalty in 1981 is caused by the incinerator.
This is the basic idea of the difference-in-difference estimator.
16
The difference-in-difference estimator in this example may be computed as follows. I will show you a more general case later on.
The difference-in-difference estimator:
δ̂1 = (coefficient for nearinc in 1981) - (coefficient for nearinc in 1978)
   = -30688.27 - (-18824.37) = -11863.90
So the estimate suggests that the incinerator decreased the price of nearby houses by $11,864 on average.
17
Note that, in this example, the coefficient for (nearinc) in 1978 is equal to
(average price of houses near the incinerator) - (average price of houses not near the incinerator).
This is because the regression includes only one dummy variable (just recall Ex. 1 of Homework 2).
Therefore, the difference-in-difference estimator δ̂1 in this example can be written as
$$\hat{\delta}_1 = \left(\overline{price}_{1981,near} - \overline{price}_{1981,far}\right) - \left(\overline{price}_{1978,near} - \overline{price}_{1978,far}\right)$$
This is the reason why the estimator is called the difference-in-difference estimator.
18
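The difference of differences can be reproduced from the numbers already on the slides. The sketch below (Python, names are my own) recovers the four group means from the two single-dummy regressions, using the fact that the intercept is the mean price of far houses and intercept plus slope is the mean price of near houses, and then applies the formula. "before" is the earlier regression and "after" is the 1981 regression.

```python
# Group means implied by the two single-dummy regressions above:
# intercept = mean price of far houses,
# intercept + nearinc coefficient = mean price of near houses.
far_before = 82517.23
near_before = 82517.23 + (-18824.37)
far_after = 101307.5
near_after = 101307.5 + (-30688.27)


def diff_in_diff(near_after, far_after, near_before, far_before):
    """(near - far) after the event minus (near - far) before it."""
    return (near_after - far_after) - (near_before - far_before)


delta1 = diff_in_diff(near_after, far_after, near_before, far_before)
print(round(delta1, 2))
```

The result agrees, up to rounding, with the coefficient on the interaction term in a pooled regression of rprice on nearinc, y81 and their product.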
Difference-in-difference estimator: More general case
The difference-in-difference estimator can be obtained by running the following single equation using the pooled sample.
price = β0 + β1(nearinc) + β2(year81) + δ1(year81)(nearinc) + u
The coefficient δ1 on the interaction term is the difference-in-difference estimator.
19
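. reg rprice nearinc y81 y81nrinc

      Source |         SS     df          MS     Number of obs =     321
             |                                   F(  3,   317) =   22.25
       Model | 6.1055e+10      3  2.0352e+10     Prob > F      =  0.0000
       Total | 3.5099e+11    320  1.0969e+09     Root MSE      =   30243

      rprice |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     nearinc |   -18824.37   4875.322   -3.86   0.000    -28416.45   -9232.293
         y81 |    18790.29   4050.065    4.64   0.000     10821.88    26758.69
    y81nrinc |    -11863.9   7456.646   -1.59   0.113    -26534.67    2806.867
       _cons |    82517.23    2726.91   30.26   0.000      77152.1    87882.36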
The coefficient on y81nrinc (-11863.9) is the difference-in-difference estimate. Note that it is not statistically significant at the 10% level (p = 0.113).
This form is more general since, in addition to the policy dummy (nearinc), you can include more variables that affect the housing price, such as the number of bedrooms. When you include more variables, δ̂1 cannot be expressed in the simple difference-in-difference format. However, the interpretation does not change, and therefore it is still called the difference-in-difference estimator.
20
Natural experiment (or quasi-experiment)
The difference-in-difference estimator is frequently used to evaluate the effect of governmental policy.
Often a governmental policy affects one group of people while it does not affect another group. This type of policy change is called a natural experiment.
For example, the change in the spousal tax deduction system in Japan, which took place in 1995, affected married couples but did not affect single people.
21
The group of people who are affected by the policy is called the treatment group. Those who are not affected by the policy are called the control group.
Suppose that you want to know how the change in the spousal tax deduction has affected the hours worked by women. Suppose you have pooled data on workers in 1994 and 1995.
The next slide shows the typical procedure you follow to conduct the difference-in-difference analysis.
22
Step 1: Create the treatment dummy such that
Dtreat = 1 if the person is affected by the policy change
       = 0 otherwise.
Step 2: Run the following regression.
(Hours worked) = β0 + β1 Dtreat + β2(year95) + δ1(year95)(Dtreat) + u
δ1 is the difference-in-difference estimator. It shows the effect of the policy change on women's hours worked.
23
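The two steps can be sketched in Python on a handful of made-up records. The hours below are hypothetical and are not from any real survey; all names are my own. Because the regression contains only the two dummies and their interaction, its OLS coefficients are simple functions of the four cell means, so no regression routine is needed for this illustration.

```python
from statistics import mean

# Hypothetical records (NOT real data): (Dtreat, year95, weekly hours worked).
records = [
    (1, 0, 30.0), (1, 0, 32.0),   # treatment group, 1994
    (1, 1, 26.0), (1, 1, 28.0),   # treatment group, 1995
    (0, 0, 40.0), (0, 0, 42.0),   # control group, 1994
    (0, 1, 41.0), (0, 1, 43.0),   # control group, 1995
]


def cell_mean(dtreat, year95):
    """Average hours for one group in one year."""
    return mean(h for d, y, h in records if d == dtreat and y == year95)


# Coefficients of the saturated regression expressed through cell means.
b_cons = cell_mean(0, 0)                  # control group in 1994
b_dtreat = cell_mean(1, 0) - b_cons       # treatment-control gap in 1994
b_year95 = cell_mean(0, 1) - b_cons       # change over time for the control group
delta1 = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

print("difference-in-difference estimate:", delta1)
```

With real data you would normally run the regression instead, since it also gives a standard error for δ1 and lets you add control variables.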
Two period panel data analysis
Motivation:
Remember the effects of the employee training grant on the scrap rate. You estimated the following model using the 1988 data.
$$\log(scrap) = \beta_0 + \beta_1 (grant) + \beta_2 \log(sales) + \beta_3 \log(employment) + v$$
. reg lscrap grant lsale lemploy if year==1988

      Source |         SS     df          MS     Number of obs =      50
             |                                   F(  3,    46) =    1.18
       Model |  6.8054029      3  2.26846763     Prob > F      =  0.3270
    Residual | 88.2852083     46  1.91924366     R-squared     =  0.0716
       Total | 95.0906112     49  1.94062472     Root MSE      =  1.3854

      lscrap |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
       grant |   -.0517781   .4312869   -0.12   0.905    -.9199137    .8163574
      lsales |   -.4548425   .3733152   -1.22   0.229    -1.206287    .2966021
     lemploy |    .6394289   .3651366    1.75   0.087     -.095553    1.374411
       _cons |    4.986779   4.655588    1.07   0.290    -4.384433    14.35799
You did not find evidence that receiving the grant reduces the scrap rate.
24
The reason why we did not find a significant effect is probably the endogeneity problem.
Companies with low-ability workers tend to apply for the grant, which creates a positive bias in the estimate. If you observed the average ability of the workers, you could eliminate the bias by including the ability variable. But since you cannot observe ability, you have the following situation.
$$\log(scrap) = \beta_0 + \beta_1 (grant) + \beta_2 \log(sales) + \beta_3 \log(employment) + \underbrace{(\beta_4\, ability + u)}_{v}$$
where ability is in the error term v. v = (β4 ability + u) is called the composite error term.
25
$$\log(scrap) = \beta_0 + \beta_1 (grant) + \beta_2 \log(sales) + \beta_3 \log(employment) + \underbrace{(\beta_4\, ability + u)}_{v}$$
Because ability and grant are correlated (negatively), this causes a bias in the coefficient for (grant).
We predicted the direction of the bias in the following way.
$$\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_4 \tilde{\delta}_1$$
- $\hat{\beta}_1$ (-): true effect of grant
- $\hat{\beta}_4$ (-): effect of ability on scrap rate
- $\tilde{\delta}_1$ (-): sign is determined by the correlation between ability and grant
- $\hat{\beta}_4 \tilde{\delta}_1$ (+): bias term
The true negative effect of grant is cancelled out by the bias term. Thus, the bias makes it difficult to find the effect.
26
Now you know that there is a bias. Is there anything we can do to correct for it?
When you have panel data, you can eliminate the bias.
I will explain the method using this example. I will generalize it later.
27
Eliminating bias using two period panel data
Now, go back to the equation.
$$\log(scrap) = \beta_0 + \beta_1 (grant) + \beta_2 \log(sales) + \beta_3 \log(employment) + \underbrace{(\beta_4\, ability + u)}_{v}$$
The grant was administered in 1988.
Suppose that you have panel data on firms for two periods, 1987 and 1988.
Further assume that the average ability of workers does not change over time. So (ability) is interpreted as the innate ability of workers, such as IQ.
28
When you have two period panel data, the equation can be written as:
$$\log(scrap)_{it} = \beta_0 + \beta_1 (grant)_{it} + \beta_2 \log(sales)_{it} + \beta_3 \log(employment)_{it} + \beta_5 (year88)_{it} + \underbrace{(\beta_4\, ability_i + u_{it})}_{v_{it}}$$
i is the index for the ith firm. t is the index for the period.
Since ability is constant over time, ability has only the i index.
Now, I will use a shorthand notation for β4(ability)i. Since (ability) is assumed constant over time, write β4(ability)i = ai. Then the above equation can be written as:
29
$$\log(scrap)_{it} = \beta_0 + \beta_1 (grant)_{it} + \beta_2 \log(sales)_{it} + \beta_3 \log(employment)_{it} + \beta_5 (year88)_{it} + \underbrace{(a_i + u_{it})}_{v_{it}}$$
- ai is called the fixed effect, or the unobserved effect. If you want to emphasize that it is the unobserved firm characteristic, you can call it the firm fixed effect as well.
- uit is called the idiosyncratic error.
- The bias in OLS occurs because the fixed effect is correlated with (grant).
- So if we can get rid of the fixed effect, we can eliminate the bias. This is the basic idea.
- In the next slide, I will show the procedure called first-differenced estimation.
30
First, for each firm, take the first difference. That is, compute the following.
$$\Delta \log(scrap)_{it} = \log(scrap)_{it} - \log(scrap)_{i,t-1}$$
It follows that
$$\begin{aligned}
\Delta \log(scrap)_{it} &= \beta_0 + \beta_1 (grant)_{it} + \beta_2 \log(sales)_{it} + \beta_3 \log(employment)_{it} + \beta_5 (year88)_{it} + (a_i + u_{it}) \\
&\quad - \left[\beta_0 + \beta_1 (grant)_{i,t-1} + \beta_2 \log(sales)_{i,t-1} + \beta_3 \log(employment)_{i,t-1} + \beta_5 (year88)_{i,t-1} + (a_i + u_{i,t-1})\right] \\
&= \beta_1 \Delta (grant)_{it} + \beta_2 \Delta \log(sales)_{it} + \beta_3 \Delta \log(employment)_{it} + \beta_5 \Delta (year88)_{it} + \Delta u_{it}
\end{aligned}$$
The last line is the first differenced equation.
31
So, by taking the first difference, you can eliminate the fixed effect.
$$\Delta \log(scrap)_{it} = \beta_1 \Delta (grant)_{it} + \beta_2 \Delta \log(sales)_{it} + \beta_3 \Delta \log(employment)_{it} + \beta_5 \Delta (year88)_{it} + \Delta u_{it}$$
If ∆uit is not correlated with ∆(grant)it, estimating the first differenced model using OLS will produce unbiased estimates. If we have controlled for enough time-varying variables, it is reasonable to assume that they are uncorrelated.
Note that this model does not have a constant.
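To see the mechanics in isolation, here is a small Python sketch on a made-up four-firm panel (it is not JTRAIN.dta). The assumptions are mine and deliberately unrealistic: there is one regressor x, no idiosyncratic error and no year dummy, and the firms with a large fixed effect are exactly the ones whose x switches from 0 to 1. Pooled OLS then picks up the fixed effect, while the regression on first differences without a constant recovers the true coefficient.

```python
# Hypothetical two-period panel (NOT JTRAIN.dta): y_it = TRUE_BETA * x_it + a_i.
# No idiosyncratic error and no year dummy, to keep the arithmetic exact.
TRUE_BETA = -0.5

# (fixed effect a_i, x in period 1, x in period 2).
# Firms with a large a_i are the ones that switch to x = 1 in period 2.
firms = [(4.0, 0, 1), (3.0, 0, 1), (1.0, 0, 0), (0.0, 0, 0)]


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def slope_through_origin(x, y):
    """Slope of a regression of y on x without a constant."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


pooled_x, pooled_y, dx, dy = [], [], [], []
for a_i, x1, x2 in firms:
    y1 = TRUE_BETA * x1 + a_i
    y2 = TRUE_BETA * x2 + a_i
    pooled_x += [x1, x2]
    pooled_y += [y1, y2]
    dx.append(x2 - x1)   # first difference of x
    dy.append(y2 - y1)   # first difference of y: a_i cancels out

pooled_slope = ols_slope(pooled_x, pooled_y)   # contaminated by a_i
fd_slope = slope_through_origin(dx, dy)        # equals TRUE_BETA here

print("pooled OLS slope:", pooled_slope)
print("first-difference slope:", fd_slope)
```

In this toy panel the pooled slope even has the wrong sign, which mirrors the story on the earlier slides where a positive bias hides a negative effect.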

Now, estimate this model using JTRAIN.dta
32
. **************************
. * Declare panel *
. **************************
. tsset fcode year
panel variable: fcode (strongly balanced)
time variable: year, 1987 to 1989
delta: 1 unit
. ******************************
. * Generate first differenced *
. * variables *
. ******************************
. gen difflscrap=lscrap-L.lscrap
(363 missing values generated)
. gen diffgrant=grant-L.grant
(157 missing values generated)
. gen difflsales=lsales-L.lsales
(226 missing values generated)
. gen difflemploy=lemploy-L.lemploy
(181 missing values generated)
. gen diffd88=d88-L.d88
(157 missing values generated)
. **********************
. * Run the regression *
. **********************
. reg difflscrap diffgrant difflsales difflemploy diffd88 if year<=1988, nocons
When you use the 'nocons' option, Stata omits the constant term.

      Source |         SS     df          MS     Number of obs =      47
             |                                   F(  4,    43) =    1.82
       Model | 2.71885438      4  .679713595     Prob > F      =  0.1428
       Total |   18.79382     47  .399868511     Root MSE      =  .61142

  difflscrap |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
   diffgrant |   -.3223172   .1879101   -1.72   0.093     -.701274    .0566396
  difflsales |   -.1733036    .365626   -0.47   0.638    -.9106586    .5640514
 difflemploy |    .0233784   .5064015    0.05   0.963    -.9978775    1.044634
     diffd88 |   -.0272418    .120639   -0.23   0.822    -.2705336    .2160501
Now the coefficient on the grant is negative and significant at the 10% level.
33
Note that, when you use this method in your
research, it is a good idea to tell your audience
what the potential fixed effect would be and
whether it is correlated with the explanatory
variables. In this example, unobserved ability is
potentially an important source of the fixed
effect.
Of course, one can never tell exactly what the fixed effect is, since it is the aggregate of all the unobserved effects. However, if you explain what is contained in the fixed effect, your audience can understand the potential direction of the bias and why you need to use the first-differenced method.
34
General case
The first differenced model in a more general situation can be written as follows.
yit = β0 + β1xit1 + β2xit2 + … + βkxitk + ai + uit
where ai is the fixed effect.
If ai is correlated with any of the explanatory variables, the estimated coefficients will be biased. So take the first difference to eliminate ai, then estimate the following model by OLS.
∆yit = β1∆xit1 + β2∆xit2 + … + βk∆xitk + ∆uit
35
Note that when you take the first difference, the constant term is also eliminated. So you should use the 'nocons' option in Stata when you estimate the model.
When some variables are time invariant, these variables are also eliminated. If the treatment variable does not change over time, you cannot use this method.
36
First differencing for more than two periods
You can use first differencing for more than two periods.
You just have to difference adjacent periods successively.
For example, suppose that you have 3 periods. Then for the dependent variable, you compute ∆yi2 = yi2 - yi1 and ∆yi3 = yi3 - yi2. Do the same for the x-variables. Then run the regression.
37
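Differencing adjacent periods is mechanical, as the following Python sketch shows. The function name, the firm labels and the numbers are hypothetical; the only point is that a unit observed in 3 periods contributes 2 differenced observations.

```python
def first_differences(panel):
    """panel maps each unit to its values ordered by period.

    Returns, for each unit, the differences between adjacent periods,
    so a unit observed in T periods yields T - 1 differences.
    """
    return {
        unit: [later - earlier for earlier, later in zip(values, values[1:])]
        for unit, values in panel.items()
    }


# Hypothetical values of y for two units over three periods.
y = {"firm_A": [2.0, 3.5, 3.0], "firm_B": [1.0, 1.0, 4.0]}
dy = first_differences(y)
print(dy)
```

The same function would be applied to each x-variable before running the regression on the differenced data.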
Exercise
The data set ezunem.dta contains city-level unemployment claim statistics for the state of Indiana. It also contains information about whether each city has an enterprise zone.
An enterprise zone is an area that encourages businesses and investment through reduced taxes and restrictions. Enterprise zones are usually created in economically depressed areas with the purpose of increasing economic activity and reducing unemployment.
38
Using the data ezunem.dta, you are asked to estimate the effect of enterprise zones on city-level unemployment claims. Use the log of unemployment claims as the dependent variable.
Ex1. First estimate the following model using OLS.
log(unemployment claims)it = β0 + β1(Enterprise zone)it + β(year dummies)it + vit
Discuss whether the coefficient on enterprise zone is biased. If you think it is biased, what is the direction of the bias?
Ex2. Estimate the model using the first difference method.
Did the result change? Was your prediction of the bias correct?
39
OLS results
. reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88

      Source |         SS     df          MS     Number of obs =     198
             |                                   F(  9,   188) =   11.44
       Model | 35.5700512      9  3.95222791     Prob > F      =  0.0000
       Total | 100.496279    197  .510133396     Root MSE      =  .58767

      luclms |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
          ez |   -.0387084   .1148501   -0.34   0.736    -.2652689     .187852
         d81 |   -.3216319   .1771882   -1.82   0.071    -.6711645    .0279007
         d82 |    .1354957   .1771882    0.76   0.445    -.2140369    .4850283
         d83 |   -.2192554   .1771882   -1.24   0.217     -.568788    .1302772
         d84 |   -.5970717   .1799355   -3.32   0.001    -.9520237   -.2421197
         d85 |   -.6216534   .1847186   -3.37   0.001     -.986041   -.2572658
         d86 |   -.6511313   .1847186   -3.52   0.001    -1.015519   -.2867437
         d87 |   -.9188151   .1847186   -4.97   0.000    -1.283203   -.5544275
         d88 |     -1.2575   .1847186   -6.81   0.000    -1.621887    -.893112
       _cons |    11.69439    .125291   93.34   0.000     11.44724    11.94155
40
First differencing
. reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons

      Source |         SS     df          MS
             |                                   F(  9,   167) =   41.31
       Model | 17.3537634      9  1.92819594     Prob > F      =  0.0000
       Total | 25.1496016    176  .142895463     Root MSE      =  .21606

   lagluclms |       Coef.  Std. Err.       t   P>|t|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
       lagez |   -.1818775   .0781862   -2.33   0.021    -.3362382   -.0275169
      lagd81 |   -.3216319    .046064   -6.98   0.000    -.4125748   -.2306891
      lagd82 |    .1354957   .0651444    2.08   0.039     .0068831    .2641083
      lagd83 |   -.2192554   .0797852   -2.75   0.007    -.3767731   -.0617378
      lagd84 |   -.5580256   .0945636   -5.90   0.000    -.7447196   -.3713315
      lagd85 |   -.5565765    .108961   -5.11   0.000    -.7716951   -.3414579
      lagd86 |   -.5860544   .1182979   -4.95   0.000    -.8196066   -.3525023
      lagd87 |   -.8537383   .1269499   -6.72   0.000    -1.104372   -.6031047
      lagd88 |   -1.192423   .1350488   -8.83   0.000    -1.459046   -.9257998
41
The do file used to generate the results.
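tsset city year
* Pooled OLS with year dummies
reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88
* First differences (stored under the lag* names used in the output above)
gen lagluclms = luclms - L.luclms
gen lagez = ez - L.ez
foreach v in d81 d82 d83 d84 d85 d86 d87 d88 {
    gen lag`v' = `v' - L.`v'
}
* First-differenced regression without a constant
reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons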
42
The assumptions for the first
difference method.
Assumption FD1: Linearity
For each i, the model is written as
yit=β0+β1xit1+…+βkxitk+ai+uit
43
Assumption FD2:
We have a random sample from the cross section.
Assumption FD3:
There is no perfect collinearity. In addition, each explanatory variable changes over time at least for some i in the sample.
44
Assumption FD4: Strict exogeneity
E(uit|Xi,ai) = 0 for each i,
where Xi is shorthand notation for 'all the explanatory variables for the ith individual in all time periods'.
This means that uit is uncorrelated with the current year's explanatory variables as well as with other years' explanatory variables.
45
The unbiasedness of first
difference method
Under FD1 through FD4, the estimated
parameters for the first difference method
are unbiased.
46
Assumption FD5: Homoskedasticity
Var(∆uit|Xi) = σ²
Assumption FD6: No serial correlation within the ith individual.
Cov(∆uit,∆uis) = 0 for t≠s
Note that FD2 assumes random sampling across different individuals, but does not assume randomness within each individual. So you need an additional assumption to rule out serial correlation.
47
