Research Method: Lecture 7 (Ch14) Pooled Cross Sections and Simple Panel Data Methods
Panel data
Panel data are cross-sectional data collected at
different points in time.
Unlike pooled cross sections, panel data follow the same
individuals over time.
You can do a bit more with panel data than with
pooled cross sections.
You usually include year dummies as
well.
Pooling independent cross
sections across time.
As long as the data are collected independently,
pooling them over time causes few
problems.
However, the distribution of the independent
variables may change over time. For example,
the distribution of education changes over time.
To account for such changes, you usually need
to include a dummy variable for each year (year
dummies), except for one year that serves as the base year.
Often the coefficients on the year dummies are of
interest.
Example 1
Suppose that you would like to see how the
fertility rate changes over time after
controlling for various characteristics.
The next slide shows the OLS estimates of the
determinants of fertility over time. (Data:
FERTIL1.dta)
The data are collected every other year.
The base year for the year dummies is
1972.
Dependent variable = number of kids per woman
. reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84
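The year dummies can also be created on the fly with Stata's factor-variable notation, which makes a joint test of the year effects easy. The sketch below assumes that FERTIL1.dta contains a numeric variable named year; the slides only show the pre-built dummies, so treat that name as an assumption.

```stata
* Assumption: FERTIL1.dta has a numeric variable called year
use FERTIL1.dta, clear

* i.year adds one dummy per survey year; the lowest year is the default base
reg kids educ age agesq black east northcen west farm othrural town smcity i.year

* Joint test that all year dummies are zero
testparm i.year
```

If the joint test rejects, fertility differs across years even after controlling for the other characteristics.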
Example 2
CPS78_85.dta has wage data collected in 1978
and 1985.
We estimate an earnings equation that
includes education, experience, experience
squared, a union dummy, a female dummy, and a
year dummy for 1985.
If you want to see whether the gender gap has
changed over time, include an interaction
between female and the 1985 dummy; that is, estimate
the following.
$$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,exper + \beta_3\,expersq + \beta_4\,union + \beta_5\,female + \beta_6\,year85 + \beta_7\,(year85 \times female)$$
You can check whether the gender wage gap in 1985 is
different from that in the base year (1978) by testing whether $\beta_7$
is equal to zero.
The gender gap in each period is given by:
- gender gap in the base year (1978) = $\beta_5$
- gender gap in 1985 = $\beta_5 + \beta_7$
. reg lwage educ exper expersq union female y85 y85fem
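After this regression, the 1985 gender gap and its standard error can be obtained directly instead of adding the two coefficients by hand. This is a minimal sketch; it assumes that y85fem was built as the product of y85 and female, which the slides do not show.

```stata
* Run immediately after the regression above.
* Assumption: y85fem equals y85*female, i.e. it was created with
*   gen y85fem = y85*female

* Did the gender gap change between 1978 and 1985?
* (H0: coefficient on y85fem = 0)
test y85fem

* Gender gap in 1985: coefficient on female plus coefficient on y85fem
lincom female + y85fem
```

The lincom output reports the sum of the two coefficients together with a standard error and confidence interval, which the regression table alone does not give.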
Example: Effects of garbage
incinerator on housing prices
This example is based on studies of
housing prices in North Andover,
Massachusetts.
The rumor that a garbage incinerator would
be built in North Andover began after
1978. Construction of the incinerator
began in 1981.
You want to examine whether the incinerator
affected housing prices.
Our hypothesis is that the incinerator lowers the price of nearby houses.
A first check is to regress the house price on the near-incinerator dummy
using 1981 data.
. reg rprice nearinc if year==1981
But can we say from this estimation that the incinerator has
negatively affected the housing price?
To see this, estimate the same equation using
1978 data. Note that this is before the rumor of
the incinerator began.
. reg rprice nearinc if year==1978
Note that even in 1978 the price of houses near the place where the
incinerator is to be built is lower than that of houses farther from the
location.
However, the negative coefficient on nearinc is
greater in magnitude in the 1981 regression.
The increase in the price gap between 1978 and
1981 is what can be attributed to the
incinerator. This is the basic idea of the
difference-in-difference estimator.
The difference-in-difference estimator in
this example may be computed as follows.
I will show you a more general case later
on.
$$rprice = \beta_0 + \beta_1\,nearinc + \beta_2\,year81 + \delta_1\,(year81 \times nearinc)$$
. reg rprice nearinc y81 y81nrinc
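To see why the coefficient on the interaction term is the difference-in-difference estimate, it helps to write out the four group means implied by this regression. The sketch assumes that y81nrinc is the product of y81 and nearinc, and it writes $\beta_0$, $\beta_1$, $\beta_2$ and $\delta_1$ for the intercept and the coefficients on nearinc, y81 and y81nrinc.

```latex
\[
\begin{aligned}
E[rprice \mid nearinc=0,\ y81=0] &= \beta_0 \\
E[rprice \mid nearinc=1,\ y81=0] &= \beta_0 + \beta_1 \\
E[rprice \mid nearinc=0,\ y81=1] &= \beta_0 + \beta_2 \\
E[rprice \mid nearinc=1,\ y81=1] &= \beta_0 + \beta_1 + \beta_2 + \delta_1 \\[4pt]
\underbrace{(\beta_1 + \delta_1)}_{\text{near-far gap in 1981}}
 - \underbrace{\beta_1}_{\text{near-far gap in 1978}} &= \delta_1
\end{aligned}
\]
```

So the estimate of $\delta_1$ equals the nearinc coefficient from the 1981-only regression minus the nearinc coefficient from the 1978-only regression.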
You did not find evidence that receiving the grant
reduces the scrap rate.
The reason why we did not find a significant
effect is probably an endogeneity
problem.
Companies with low-ability workers tend to
apply for the grant, which creates a positive bias
in the estimate. If you could observe the average
ability of the workers, you could eliminate the bias
by including the ability variable. But since you
cannot observe ability, you have the following
situation.
$$\log(scrap) = \beta_0 + \beta_1\,grant + \beta_2 \log(sales) + \beta_3 \log(employment) + \underbrace{(\beta_4\,ability + u)}_{v}$$
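The direction of the bias can be checked with the usual omitted-variable formula. The sketch below ignores the other regressors to keep the algebra short, writes $\beta_4$ for the coefficient on the unobserved ability term, and assumes that higher ability lowers the scrap rate; all three are simplifying assumptions rather than results from the data.

```latex
\[
\operatorname{plim}\,\hat{\beta}_1
  = \beta_1 + \beta_4\,\frac{\operatorname{Cov}(grant,\ ability)}{\operatorname{Var}(grant)}
\]
\[
\beta_4 < 0 \ \text{ and } \ \operatorname{Cov}(grant,\ ability) < 0
\;\Longrightarrow\;
\beta_4\,\frac{\operatorname{Cov}(grant,\ ability)}{\operatorname{Var}(grant)} > 0
\]
```

With a positive bias term, the estimated grant coefficient is pushed upward, so a truly negative effect of the grant can look small or insignificant.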
Eliminating bias using two
period panel data
Now, go back to the equation.
$$\log(scrap) = \beta_0 + \beta_1\,grant + \beta_2 \log(sales) + \beta_3 \log(employment) + \underbrace{(\beta_4\,ability + u)}_{v}$$
First, for each firm, take the first difference. That
is, compute the following.
So, by taking the first difference, you can
eliminate the fixed effect.
$$\Delta\log(scrap)_{it} = \beta_1\,\Delta grant_{it} + \beta_2\,\Delta\log(sales)_{it} + \beta_3\,\Delta\log(employment)_{it} + \beta_5\,\Delta year88_{it} + \Delta u_{it}$$
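The way first differencing removes the fixed effect can be written out for the two periods. This is a sketch under the assumption that the unobserved ability term is constant within a firm over time, so it can be collected into a firm effect $a_i$; here $i$ indexes firms, $t = 1, 2$ indexes periods, and year88 is the dummy for the second period.

```latex
\[
\begin{aligned}
\log(scrap)_{i2} &= \beta_0 + \beta_1\,grant_{i2} + \beta_2 \log(sales)_{i2}
  + \beta_3 \log(employment)_{i2} + \beta_5\,year88_{i2} + a_i + u_{i2} \\
\log(scrap)_{i1} &= \beta_0 + \beta_1\,grant_{i1} + \beta_2 \log(sales)_{i1}
  + \beta_3 \log(employment)_{i1} + \beta_5\,year88_{i1} + a_i + u_{i1} \\[4pt]
\Delta \log(scrap)_{i2} &= \beta_1\,\Delta grant_{i2} + \beta_2\,\Delta \log(sales)_{i2}
  + \beta_3\,\Delta \log(employment)_{i2} + \beta_5\,\Delta year88_{i2} + \Delta u_{i2}
\end{aligned}
\]
```

Both $\beta_0$ and $a_i$ appear unchanged in each period's equation, so they cancel when the first-period equation is subtracted from the second.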
. ******************************
. * Generate first differenced *
. * variables *
. ******************************
. gen difflscrap=lscrap-L.lscrap
(363 missing values generated)
. gen diffgrant=grant-L.grant
(157 missing values generated)
. gen difflsales=lsales-L.lsales
(226 missing values generated)
. gen difflemploy=lemploy-L.lemploy
(181 missing values generated)
. gen diffd88=d88-L.d88
(157 missing values generated)
. **********************
. * Run the regression *
. **********************
. reg difflscrap diffgrant difflsales difflemploy diffd88 if year<=1988, nocons
When you use the 'nocons' option, Stata omits the constant term.
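The L. lag operator used above only works after the data have been declared as a panel. The sketch below shows one way to do that, followed by an equivalent regression that uses Stata's D. difference operator instead of separately generated variables. The firm identifier name fcode is an assumption, since the slides do not show the panel setup.

```stata
* Assumption: fcode identifies the firm and year identifies the period
xtset fcode year

* D.x is shorthand for x - L.x, so no separate gen step is needed
reg D.lscrap D.grant D.lsales D.lemploy D.d88 if year<=1988, nocons
```

Using the operator avoids cluttering the data set with extra variables, while the explicit gen approach makes each differenced variable visible for inspection.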
Note that when you take the first difference,
the constant term is also eliminated.
So you should use the 'nocons' option in
Stata when you estimate the model.
When some variables are time invariant,
these variables are also eliminated. If the
treatment variable does not change
over time, you cannot use this method.
First differencing for more
than two periods.
You can use first differencing for more
than two periods.
You just have to difference adjacent
periods successively.
For example, suppose that you have 3
periods. Then for the dependent variable,
you compute $\Delta y_{i2} = y_{i2} - y_{i1}$ and $\Delta y_{i3} = y_{i3} - y_{i2}$.
Do the same for the x-variables. Then run the
regression.
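With three periods, each unit contributes two differenced observations. A sketch for a single regressor $x$, where $a_i$ denotes the unit fixed effect (the one-regressor model is a simplification chosen for illustration):

```latex
\[
\begin{aligned}
y_{it} &= \beta_0 + \beta_1 x_{it} + a_i + u_{it}, \qquad t = 1, 2, 3 \\[4pt]
\Delta y_{i2} &= \beta_1\,\Delta x_{i2} + \Delta u_{i2} \\
\Delta y_{i3} &= \beta_1\,\Delta x_{i3} + \Delta u_{i3}
\end{aligned}
\]
```

The two differenced equations are stacked and estimated together by OLS, so a sample of N units yields 2N observations in the regression.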
Exercise
The data set ezunem.dta contains city-level
unemployment claim statistics for the state of
Indiana. It also contains information
about whether the city has an enterprise zone or
not.
An enterprise zone is an area that
encourages business and investment through
reduced taxes and restrictions. Enterprise zones
are usually created in economically
depressed areas with the purpose of increasing
economic activity and reducing
unemployment.
Using the data ezunem.dta, you are asked to estimate the
effect of enterprise zones on city-level unemployment
claims. Use the log of unemployment claims as the
dependent variable.
First differencing
. reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
The do file used to generate the results.
reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88
reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
The assumptions for the first
difference method.
Assumption FD1: Linearity
$y_{it} = \beta_0 + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + a_i + u_{it}$
Assumption FD2: Random sampling
We have a random sample from the cross section.
Assumption FD3:
There is no perfect collinearity. In addition,
each explanatory variable changes over
time at least for some i in the sample.
Assumption FD4: Strict exogeneity
$E(u_{it} \mid X_i, a_i) = 0$ for each $t$
Assumption FD5: Homoskedasticity
$\mathrm{Var}(\Delta u_{it} \mid X_i) = \sigma^2$