Lect 1 18


Economics 1123

Introduction to Econometrics
Fall 2018

The statistical analysis of economic (and related) data

Faculty: Prof. James Stock
Littauer Center M27
Teaching fellows: Jim Reisinger (Head TF)
Justin Bloesch
Lauren Shani Russell
Dayea Oh
Kevin Carney
Prof. Stock’s OH: Mon 5-6, Wed 3-4, and by appointment;
schedule with Clare Dingwell (dingwell@fas.harvard.edu)
1-1
Administrative matters:
• Syllabus on Web site

• Ec1123 (spring, fall) or Ec1126 (spring)?

• Prerequisites and level:
o Stat 104 or equivalent (2 weeks of overlap with Stat 104)
o No economics prereq (but Ec 10 helps)
o Math: algebra; some calculus in class, but not on problem sets or exams

• Slides posted on Web site by 9pm the evening before lecture

• Textbook: Introduction to Econometrics by Stock and Watson, 3rd edition updated (all English editions OK)

1-2
Administrative ctd: Grading (see syllabus)
• Problem sets:
o distribution method: posted on course Web site (Canvas)
o Late PS policy: full (by deadline)/half (after deadline)/zero (after
solutions are posted)
o Electronic/hard copy submission: TBD
o Solutions posted after the first class following the due date (if due on Tuesday, posted after class Thursday)
o OK to work in groups on PS’s (good idea): 3 students max, but write up individually
o Total PS grade: Drop lowest of 1-7, 9, 10; 8 counts double and
can’t be dropped.
• Course grade:
o Problem Sets: 30%; Midterm: 25%; Final: 45%.
o Ec 1123 fall & spring use same curve
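The problem-set rule can be read as the following calculation. This is a hypothetical sketch, not the official grading code: it assumes each problem set is scored 0-100, that "8 counts double" means PS 8 fills two slots in a simple average, and it ignores the curve. The function names are illustrative only.

```python
from __future__ import annotations


def problem_set_grade(scores: dict[int, float]) -> float:
    """Average PS score under the (assumed) drop rule.

    `scores` maps PS number (1-10) to a 0-100 score. The lowest of
    PS 1-7, 9, 10 is dropped; PS 8 is never dropped and counts twice.
    """
    droppable = [scores[k] for k in (1, 2, 3, 4, 5, 6, 7, 9, 10)]
    kept = sorted(droppable)[1:]               # drop the single lowest droppable score
    weighted = kept + [scores[8], scores[8]]   # PS 8 counts double
    return sum(weighted) / len(weighted)


def course_grade(ps: float, midterm: float, final: float) -> float:
    """Weighted course score before any curve: 30% PS, 25% midterm, 45% final."""
    return 0.30 * ps + 0.25 * midterm + 0.45 * final


scores = {1: 90, 2: 80, 3: 100, 4: 70, 5: 90, 6: 85, 7: 95, 8: 88, 9: 60, 10: 100}
ps = problem_set_grade(scores)   # PS 9 (score 60) is the one dropped
print(ps, round(course_grade(ps, midterm=80, final=90), 2))   # 88.6 87.08
```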

1-3
Admin ctd:
• Sections
o online sectioning (Canvas) – Jim R
o Stats & STATA review session: 1pm Fri. Sept. 2
o Regular sections start next week (next Wed-Sunday).
o TODAY: fall athletes, please email Jim R. with scheduling input

• STATA and R
o Problem sets require STATA
o STATA is a means to an end, not an end in itself
o Intro to STATA handout is on the course Web site
o Use .do files, get started now before PS’s get harder
o STATA .do files will be included with lecture notes
o What about R?

1-4
Two Regression Studies

A. Doping and athletic performance

Matthew Romanczuk, Baseball’s Juiced Era: Steroids, Swinging for the Fences, or Both? (Harvard UG Thesis, 2009)

Data:
• MLB players and at-bat data, 1965-2008
• 6,723,291 at bats (data on outcome, park, conditions, home/away,
player, etc.)

Key dates: “Juiced Era” is 1994-2006

1-5
[Figure: HR per AB by Year (home runs per at bat)]

1-6
[Figure: K per AB by Year (strikeouts per at bat)]

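The per-year rates in these figures are ratios of counts. A minimal sketch, assuming each record is one at bat tagged with a year and a hypothetical outcome code ("HR" for a home run, "K" for a strikeout); the thesis data set's actual layout is not described here.

```python
from collections import Counter, defaultdict


def rates_by_year(at_bats):
    """Return {year: (HR per AB, K per AB)} from (year, outcome) records."""
    counts = defaultdict(Counter)
    for year, outcome in at_bats:
        counts[year]["AB"] += 1        # every record is one at bat
        counts[year][outcome] += 1     # tally the outcome code
    return {
        year: (c["HR"] / c["AB"], c["K"] / c["AB"])
        for year, c in sorted(counts.items())
    }


# Hypothetical toy records, not real MLB data
toy = [(1995, "HR"), (1995, "K"), (1995, "out"), (1995, "single"),
       (1996, "K"), (1996, "out")]
rates = rates_by_year(toy)
print(rates)   # {1995: (0.25, 0.25), 1996: (0.0, 0.5)}
```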

1-7
Data update:

[Figure: K per AB and HR per AB by year, 1965-2017]
1-8
B. What has happened to coal mining since 2008?

[Figure: Employment in coal mining, 2004-2018 (BLS)]

“We have ended the war on American energy — and we have ended the
war on beautiful, clean coal. We are now very proudly an exporter of
energy to the world.” (President Trump, State of the Union, 2018)
1-9
Other explanations?
[Figure: Employment in coal mining, 1986-2018]
[Figure: Monthly coal use for electricity and relative price of natural gas to coal, seasonally adjusted]

Candidate factors
• Regulations (war on coal)?
• Prices of competing fuels (natural gas)?
• Energy efficiency improvements (electricity demand reduction)?
• Other (exports, metallurgical coal, productivity gains, etc.)
1-10
Use multiple regression to decompose the sources of the change in coal use for electricity, 2009-2016
State-level data, aggregated up from power-plant- and mine-level data (“panel data”)
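One simple way to turn a regression into a decomposition is sketched below. It is an illustration under stated assumptions, not the method used in the study on this slide: it fits a single OLS regression (NumPy's least-squares solver) to pooled observations from two periods, then attributes the change in the fitted mean of y to each regressor as its slope times the change in that regressor's mean. The simulated "states" and regressors (standing in for, say, the gas-to-coal price ratio and electricity demand) are hypothetical.

```python
import numpy as np


def decompose_change(X0, y0, X1, y1):
    """Split the change in mean(y) between period 0 and period 1.

    Fits y = b0 + X b + u by OLS on the pooled data, then returns
    (slopes, contributions, unexplained), where contributions[j] =
    b_j * (mean of x_j in period 1 - mean of x_j in period 0).
    """
    X = np.vstack([X0, X1])
    y = np.concatenate([y0, y1])
    design = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = beta[1:]
    contributions = slopes * (X1.mean(axis=0) - X0.mean(axis=0))
    unexplained = (y1.mean() - y0.mean()) - contributions.sum()
    return slopes, contributions, unexplained


# Hypothetical simulated data: two regressors, 50 "states" per period
rng = np.random.default_rng(seed=1)
X0 = rng.normal(loc=[1.0, 5.0], scale=1.0, size=(50, 2))
X1 = rng.normal(loc=[0.5, 4.5], scale=1.0, size=(50, 2))
true_b = np.array([2.0, -3.0])
y0 = 10 + X0 @ true_b + rng.normal(scale=0.1, size=50)
y1 = 10 + X1 @ true_b + rng.normal(scale=0.1, size=50)

slopes, contrib, rest = decompose_change(X0, y0, X1, y1)
print("slopes:", slopes, "contributions:", contrib, "unexplained:", rest)
```

The contributions plus the unexplained remainder add up to the total change in the mean of y by construction; a real application would also have to argue that the slopes are causal.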

1-11
What about the 2017 turnaround in employment?

[Figure: U.S. Natural Gas Electric Power Price]

1-12
Material for today and part of Monday: Review of…
• Random variables, distributions, and conditional distributions
• Expectations and conditional expectations
• Random sampling as the source of sampling uncertainty
• Central Limit Theorem
• Learning about population distributions from data:
1. Estimation
2. Confidence intervals
3. Hypothesis testing
• Application to comparison of two means

Monday: regression with a single regressor

1-13
Example of the Central Limit Theorem: The sampling distribution of $\bar{Y}$ when $Y$ is Bernoulli (binary) with Pr(Y = 1) = .78 is, in large samples, approximately normal with mean $E(Y)$ and variance $\text{var}(\bar{Y}) = \sigma_Y^2/n$, i.e. $N(E(Y), \sigma_Y^2/n)$.
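A quick simulation makes this concrete. This is a minimal sketch, not from the lecture: the sample size n = 100 and the 20,000 simulated samples are arbitrary choices, and it uses the fact that for a Bernoulli variable $\sigma_Y^2 = p(1-p) = .78 \times .22 = .1716$.

```python
import numpy as np

p = 0.78        # Pr(Y = 1), as in the example
n = 100         # sample size (assumed for illustration)
reps = 20_000   # number of simulated samples (assumed)

rng = np.random.default_rng(seed=0)
# The number of successes in n Bernoulli(p) draws is Binomial(n, p),
# so dividing by n gives one draw of the sample mean Y-bar per sample.
ybar = rng.binomial(n, p, size=reps) / n

sigma2 = p * (1 - p)   # var(Y) for a Bernoulli variable
print("mean of Y-bar:", ybar.mean())   # close to E(Y) = 0.78
print("var of Y-bar: ", ybar.var())    # close to sigma2 / n
print("sigma_Y^2 / n:", sigma2 / n)    # about 0.001716
```

A histogram of `ybar` shows the bell shape; the normal approximation improves as n grows.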

1-14
Statistics Review: Empirical Example using STATA

Data set: U.T. teaching evaluations
n = 463 courses at U.T. Austin, academic years 2000-2002 (Source: Hamermesh and Parker (2005))

[Figure: Histogram of Course Evaluation Scores (y-axis: Density; x-axis: Average course rating, 2 to 5)]

1-15
Empirical question
Are course evaluation scores the same on average for male and female
instructors?

Let $\Delta$ = the population difference in mean scores, men – women:
$$\Delta = E(Y_m) - E(Y_w).$$

We are interested in:
1. Estimating $\Delta$ by the sample difference, $\hat{\Delta} = \bar{Y}_m - \bar{Y}_w$
2. Can we reject the hypothesis that male and female instructors have the same scores on average, i.e. that $\Delta = 0$?
3. Finding a 95% confidence interval for $\Delta$

1-16
STATA output: courseevaluation by sex of instructor
Lines beginning with "." are the commands you type in (shown in blue on the original slides).

. summarize courseevaluation if(female==0)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
courseeval~n |       268     4.06903    .5566518        2.1          5

. summarize courseevaluation if(female==1)

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
courseeval~n |       195    3.901026    .5388026        2.3        4.9

Question 1: Who has better evaluations, male or female instructors? What is the estimated difference ($\hat{\Delta}$) in evaluations?

Estimated difference = $\hat{\Delta} = \bar{Y}_m - \bar{Y}_w = 4.069 - 3.901 = 0.168$

1-17
Question 2: Can we reject the hypothesis that male and female
instructors have the same scores on average?

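To conduct this hypothesis test, compute the t-statistic testing the hypothesis that $\Delta = 0$:

$$t \;(\text{testing } \Delta = 0) = \frac{\bar{Y}_m - \bar{Y}_w}{SE(\bar{Y}_m - \bar{Y}_w)}$$

We need to compute the standard error of $\hat{\Delta}$, $SE(\hat{\Delta})$:

$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$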

1-18

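$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{0.5567^2}{268} + \frac{0.5388^2}{195}} = 0.0514$$

$$t \;(\text{testing } \Delta = 0) = \frac{\bar{Y}_m - \bar{Y}_w}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{0.168}{0.0514} = 3.27$$
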
Two methods to evaluate this t-statistic:
a) compare |t| to 1.96: since 3.27 > 1.96, reject the hypothesis of equal means at the 5% significance level
b) compute the p-value:
p-value = Pr(|z| > 3.27) = 0.0011 = 0.11%
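The same arithmetic can be reproduced from the summary statistics in the STATA `summarize` output above. A minimal Python sketch using only the standard library; the variable names are illustrative.

```python
import math

# Summary statistics from the STATA summarize output
n_m, mean_m, sd_m = 268, 4.06903, 0.5566518    # female == 0 (men)
n_w, mean_w, sd_w = 195, 3.901026, 0.5388026   # female == 1 (women)

delta_hat = mean_m - mean_w                        # estimated difference in means
se = math.sqrt(sd_m**2 / n_m + sd_w**2 / n_w)      # SE allowing unequal variances
t = delta_hat / se                                 # t-statistic for H0: difference = 0
print(f"{delta_hat:.3f} {se:.4f} {t:.2f}")         # 0.168 0.0514 3.27
```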
1-19
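area for z > 3.27 is about 0.00054 = 0.054% = Pr(z > 3.27)

p-value = Pr(|z| > 3.27) = 2 × Pr(z > 3.27) = 2 × 0.054% ≈ 0.11%

(STATA's ttest reports 0.0012 = 0.12% because it uses the Student t distribution with Satterthwaite degrees of freedom rather than the standard normal.)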
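The normal tail areas can be computed directly from the complementary error function, since Pr(z > t) = erfc(t/√2)/2 for a standard normal z. A minimal sketch using only Python's `math.erfc`; the function name is illustrative.

```python
from __future__ import annotations

import math


def normal_p_values(t: float) -> tuple[float, float]:
    """Return (Pr(z > t), Pr(|z| > |t|)) for a standard normal z."""
    one_sided = 0.5 * math.erfc(t / math.sqrt(2))
    two_sided = math.erfc(abs(t) / math.sqrt(2))
    return one_sided, two_sided


one, two = normal_p_values(3.27)
print(f"{one:.5f} {two:.4f}")   # 0.00054 0.0011
```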

1-20
Question 3: What is the 95% confidence interval for this difference?

95% confidence interval = $\hat{\Delta} \pm 1.96 \times SE(\hat{\Delta})$

$$\hat{\Delta} \pm 1.96 \times SE(\hat{\Delta}) = 0.168 \pm 1.96 \times 0.0514 = (0.067, 0.269)$$

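The interval arithmetic fits in a small helper (the name `ci95` is hypothetical). It uses the large-sample normal critical value 1.96, as on the slide.

```python
from __future__ import annotations


def ci95(estimate: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Large-sample 95% confidence interval: estimate +/- 1.96 * SE."""
    return estimate - z * se, estimate + z * se


lo, hi = ci95(0.168, 0.0514)
print(f"({lo:.3f}, {hi:.3f})")   # (0.067, 0.269)
```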
1-21
These calculations, done using ttest in STATA:

. ttest courseevaluation, by(female) unequal;

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     268     4.06903    .0340029    .5566518    4.002082    4.135978
       1 |     195    3.901026    .0385845    .5388026    3.824927    3.977125
---------+--------------------------------------------------------------------
combined |     463    3.998272    .0257868    .5548656    3.947598    4.048946
---------+--------------------------------------------------------------------
    diff |            .1680042    .0514292                .0669175    .2690909
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   3.2667
Ho: diff = 0                     Satterthwaite's degrees of freedom =  425.756

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9994         Pr(|T| > |t|) = 0.0012          Pr(T > t) = 0.0006
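To see where STATA's degrees of freedom and p-value come from, the sketch below recomputes them from the summary statistics. It assumes SciPy is installed and uses `scipy.stats.ttest_ind_from_stats` with `equal_var=False` (Welch's unequal-variance t test); the Satterthwaite degrees-of-freedom formula is written out by hand for comparison.

```python
from scipy import stats

n_m, mean_m, sd_m = 268, 4.06903, 0.5566518    # group 0 (female == 0)
n_w, mean_w, sd_w = 195, 3.901026, 0.5388026   # group 1 (female == 1)

a, b = sd_m**2 / n_m, sd_w**2 / n_w            # squared SEs of each group mean
df = (a + b) ** 2 / (a**2 / (n_m - 1) + b**2 / (n_w - 1))   # Satterthwaite df

res = stats.ttest_ind_from_stats(mean_m, sd_m, n_m, mean_w, sd_w, n_w,
                                 equal_var=False)
print(f"t = {res.statistic:.4f}, df = {df:.3f}, two-sided p = {res.pvalue:.4f}")
```

Because the p-value comes from a t distribution with about 425.8 degrees of freedom, it is slightly larger than the normal-approximation value of 0.0011, although with this many degrees of freedom the two distributions are very close.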

1-22
