Economics of Education Review: Dan Goldhaber, Stephanie Liddle, Roddy Theobald
ARTICLE INFO
Article history: Received 21 July 2012; Received in revised form 18 January 2013; Accepted 22 January 2013
JEL classification: I21; I22
Keywords: Teacher training; Value added; Teacher education

ABSTRACT
With teacher quality repeatedly cited as the most important schooling factor influencing student achievement, there has been increased interest in examining the efficacy of teacher training programs. This paper presents the results of research investigating the relationship between teachers who graduate from different training programs and student achievement on state reading and math tests. Using a novel methodology that allows teacher training effects to decay, we find that training institution indicators explain a statistically significant portion of the variation in student achievement in reading, but not in math. Moreover, there is evidence that graduates from some specific training programs are differentially effective at teaching reading relative to the average teacher trained out-of-state, and that these differences are large enough to be educationally meaningful.
© 2013 Elsevier Ltd. All rights reserved.
1. Teacher training and student achievement

The perceived lack of quality control within the teacher preparation system paints a discouraging picture of the system's prospects for improving the teacher workforce and has led to calls for reform that include closer monitoring of programs and holding them accountable for student achievement results.[1] Several states have moved in this policy direction; evaluating teacher training programs based, at least in part, on the student performance of their trainees has already emerged as an important education reform strategy in several states, such as Colorado, Louisiana, Texas, and Tennessee, and was a central tenet of the Race to the Top (RttT) grant competition.[2]

The value of teacher training is a hotly debated topic in education. Much of this debate is fueled by comparisons of teachers who hold either a traditional or alternative license.[3] Teacher training, however, often gets painted with a broad brush, despite the fact that there are over 2000 traditional teacher training programs in the United States. Rhetoric about teacher training aside, there exists relatively little quantitative information linking programs with the quality of their graduates, or how specific approaches to teacher preparation are related to the effectiveness of teachers in the field (National Research Council, 2010).[4]

* Corresponding author. Tel.: +1 206 547 5585; fax: +1 206 547 1641. E-mail addresses: dgoldhab@uw.edu (D. Goldhaber), skl0@uw.edu (S. Liddle), roddy@uw.edu (R. Theobald).
[1] For example, U.S. Secretary of Education Arne Duncan has said that "by almost any standard, many if not most of the nation's 1450 colleges and departments of education are doing a mediocre job of preparing teachers for the realities of the 21st century classroom" (Duncan, 2010). See also Cochran-Smith and Zeichner (2005), Crowe (2010), Levine (2006), NCATE (2010), and Teaching Commission (2006).
[2] The U.S. Department of Education is also currently working to change regulation of teacher training programs with an aim "to reduce input-based reporting elements that are not strong indicators of program effectiveness or safety and replace them with three categories of outcome-based measures [including] ... student growth of elementary and secondary school students taught by program graduates" (U.S. Department of Education, 2011).
[3] See, for instance, Darling-Hammond, Wise, and Klein (1999), Goldhaber and Brewer (2000), and Glazerman, Mayer, and Decker (2006). For a more thorough review of this literature, see Harris and Sass (2011).
http://dx.doi.org/10.1016/j.econedurev.2013.01.011
The not inconsiderable pushback against the notion that training programs should be held accountable in some way for student growth based estimates (e.g., value-added) of their graduates has focused on the shortage of research on measures of teacher training program effectiveness (Sawchuk, 2012).

Researchers have only recently begun using administrative databases to draw the link from teacher preparation programs to in-service teachers and then to student achievement in order to draw conclusions about the efficacy of teacher training programs (Boyd, Grossman, Lankford, Loeb, & Wyckoff, 2009; Henry et al., 2011; Koedel, Parsons, Podgursky, & Ehlert, 2012; Mihaly, McCaffrey, Sass, & Lockwood, 2012; Noell, Porter, Patt, & Dahir, 2008).

Boyd et al. (2009), the only published large-scale quantitative study focused on teacher training institutions, examines training programs for teachers employed in New York City. This study suggests that there is important variation in the effectiveness of teachers graduating from different programs and, moreover, that some program characteristics (e.g., timing of student teaching) predict program effectiveness. The difference between teachers from the average program and the program judged to be the most effective is about as large as the regression-adjusted difference between students who are eligible for free or reduced-price lunches and students who are not. This degree of variation is similar for both math and language arts. Furthermore, institutions that produce effective math teachers also tend to produce effective language arts teachers.

Mihaly et al. (2012) and Koedel et al. (2012) focus on the importance of empirical specification for interpreting both the point estimates and statistical significance of training program effects. Mihaly et al. focus on the implications of including fixed effects designed to account for (time-invariant) school context factors (e.g., an effective principal). In particular, they conclude that in order for models that employ school fixed effects to produce unbiased program estimates, data must meet two assumptions: identifiability and homogeneity.[5]

Koedel et al. (2012) provide a further caution, arguing that studies that fail to account for the clustering of student observations for the same teacher overstate the differences between training programs because sampling variability has been inappropriately attributed to training programs.

In this paper we present research on the relationship between teacher training and teacher effectiveness (i.e., teacher-classroom-year effects) that builds on the existing literature on teacher training institutions in several ways. First, we utilize a two stage model that accounts for the clustering issue identified by Koedel et al., and test the implications of data restrictions inherent in fixed effects specifications, consistent with Mihaly et al. Second, we attempt to disentangle the contribution of training programs toward teacher effectiveness from the influence of teacher selection into particular programs. Finally, unlike prior research, our model allows for the possibility that training program effects decay the longer a teacher is in the workforce.[6] Allowing for the possibility that training program effects decay is an important feature of our model because it is quite unlikely that the impact of a teacher's training one year after the receipt of a teaching credential is the same as the impact 10 or 20 years after the receipt of a credential. This is particularly important in the context of using student achievement for training program accountability purposes. One way to address the likelihood that training program effects decay with teacher experience is to include only novice teachers in an assessment of training programs, but doing so reduces the reliability of the training program estimates and almost certainly guarantees statistically insignificant findings for smaller programs. The decay feature of our model allows all teachers in the teacher workforce to contribute toward program estimates, which dramatically increases sample sizes and raises the possibility that these estimates could be used for program accountability even when some programs send relatively few graduates into the workforce each year.

We investigate training programs in Washington State and find that the majority of state-accredited programs produce teachers who cannot be statistically distinguished from teachers who were credentialed outside of the state. The finding that there is relatively little variation in training program graduate effects may lead some to conclude that teacher training is ineffective, but we caution against this conclusion given that we are not comparing the program indicators to a "no-training" counterfactual (so our findings are not inconsistent with a hypothesis that programs uniformly improve the teaching skills of prospective teachers). But what one can conclude from our findings is that there are few bold experiments or radically different models for training and selection amongst the 21 programs we investigate, at least none that show up in markedly different value-added estimates for training program graduates. Whether this is good or bad clearly depends on one's perspective on the current state of teacher training.

That said, there are a small number of programs whose graduates can be distinguished from teachers trained out-of-state, and the magnitudes of these differences are educationally meaningful.

[4] As we note below, it is difficult, if not impossible, to definitively assess the causal impact of training institutions on teacher candidates since the effectiveness of in-service teachers is likely to depend on both their individual attributes as well as what they learned while being trained.
[5] Identifiability refers to the connectedness of training programs and schools and the representation of teachers from different preparation programs in the same schools. Homogeneity refers to the assumption that program estimates for "highly centralized" schools (those with teachers from four or more preparation programs) are not significantly different from those for less connected schools.
[6] Henry et al. (2011) hint at the fact that program effects decay by mentioning that "the influence of colleagues, formal and informal professional development, and trial and error experiences within the classroom may significantly reduce the influence of teacher preparation that occurred 10 or 20 years earlier" (p. 7), but they do not investigate this possibility in their analysis.
The point estimates, for example, suggest that the regression-adjusted difference between teachers who received a credential from the program with the lowest performing teachers and those who received a credential from the program with the highest performing teachers is about 12% of a standard deviation in math and 19% in reading. In math, this difference is 1.5 times larger than the regression-adjusted difference in performance between students eligible for free or reduced-price lunches and those who are not; in reading the difference is 2.3 times larger. So, while the bulk of our findings contribute to the growing literature demonstrating that observable teacher characteristics are only weakly correlated with teacher effectiveness, the striking differences in the effectiveness of teachers from programs at the tails of the distribution in Washington State hint at the potential of teacher training to influence student achievement.

2. Conceptual framework, analytic approach, and data

We posit a conceptual model in which the effectiveness of teacher j at time t, τ_jt, is assumed to be dependent on four components: (1) individual specific (time invariant) teaching ability, φ_j; (2) the match between teachers and their schools and districts, η_jk; (3) a non-linear function f of their experience in the labor market, Exp_jt; and (4) the quality of the teacher training they received, γ_jp. This program effect decays at a rate λ according to a decay function g of teacher experience:

\tau_{jt} = \varphi_j + \eta_{jk} + f(Exp_{jt}) + g(\lambda \, Exp_{jt}) \, \gamma_{jp}    (1)

In the first stage of our two-stage approach, we estimate teacher-classroom-year effects based on student learning as measured by standardized tests.[7] We utilize the following value-added model:

A_{ijst} = A_{i(t-1)} \, \alpha + X_{it} \, \beta + \tau_{jt} + \varepsilon_{ijst}    (2)

In (2), i represents students, j represents teachers, s represents subject area (math or reading), and t represents the school year. Student achievement, A_ijst, is regressed against: prior student achievement in math and reading, A_i(t-1); a vector of student and family background characteristics (e.g., sex, race/ethnicity, disability, special-ed status, free or reduced-price lunch status), X_it; and teacher-classroom-year effects, τ_jt.[8]

While the model above is a commonly used methodology to derive teacher-classroom-year effects, there is no universally accepted estimation specification for this purpose (NRC, 2010), and empirically derived program estimates involve making a number of strong assumptions about the nature of student learning.[9] In particular, in (2), the teacher-classroom-year effects implicitly include any school- or district-level factors (e.g., the effectiveness of principals, the curriculum of a school district) that influence student achievement; we discuss this issue at greater length below.

In the second stage, estimated teacher effectiveness at time t (τ̂_jt) is assumed to depend on experience level at time t (Exp_jt), time-invariant teacher characteristics (T_j), time-varying classroom characteristics (C_jt), and the training that was received while at program P: [...]
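To make the first stage concrete, the following sketch fits Eq. (2) on simulated data and recovers the teacher-classroom-year effects as the coefficients on teacher-year indicator variables. The data, column names, and coefficient values are illustrative assumptions for exposition only; they are not the authors' data, code, or exact specification.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated student-year records: a prior test score, two background controls,
# and an identifier for the teacher-classroom-year in which the student sits.
n_teacher_years, students_per = 50, 20
n = n_teacher_years * students_per
df = pd.DataFrame({
    "teacher_year": np.repeat(np.arange(n_teacher_years), students_per),
    "prior_score": rng.normal(size=n),
    "frl": rng.integers(0, 2, size=n),   # free or reduced-price lunch status
    "sped": rng.integers(0, 2, size=n),  # special education status
})
true_tau = rng.normal(scale=0.15, size=n_teacher_years)
df["score"] = (0.6 * df["prior_score"] - 0.1 * df["frl"] - 0.1 * df["sped"]
               + true_tau[df["teacher_year"]] + rng.normal(scale=0.5, size=n))

# Stage 1 (Eq. (2)): regress current achievement on prior achievement, student
# controls, and a full set of teacher-classroom-year indicators; with no global
# intercept, each indicator's coefficient is that teacher-year's effect tau_jt.
X = np.column_stack([
    df[["prior_score", "frl", "sped"]].to_numpy(dtype=float),
    pd.get_dummies(df["teacher_year"]).to_numpy(dtype=float),
])
beta, *_ = np.linalg.lstsq(X, df["score"].to_numpy(), rcond=None)
tau_hat = beta[3:]  # estimated teacher-classroom-year effects
print("correlation with true effects:", round(np.corrcoef(tau_hat, true_tau)[0, 1], 2))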
A key difference between our model and models in the existing literature on training program effects is that we assume program effects decay (all training program effects decay at the same exponential rate λ) during the time that teachers are in the labor market.[10]

The rate of decay is defined by λ, and our choice of decay function (g in Eq. (1)) is ubiquitous both in the physical sciences (e.g., Karro, Peifer, Hardison, Kollmann, & Grunberg, 2008; Wennmalm & Sanford, 2007) and the economics literature (e.g., Bechtold & DeWitt, 1988; Obizhaeva & Wang, 2013; Padmanabhan & Vrat, 1990). Note that the effectiveness of a totally novice teacher (Exp_jt = 0) is influenced solely by individual teaching talent and by training program, but assuming a positive value of λ, the influence of training programs on teacher effectiveness diminishes the longer that a teacher is in the field and is constant across all programs.[11] If training program effects do not decay, then λ = 0.[12] By allowing teacher training effects to decay, we can include teachers of all experience levels in our analytic sample, not just recent program graduates.

Because the dependent variable is an estimate from model (2) and ordinary least squares regression may produce standard errors that are too small, we use a generalized least squares approach that accounts for the uncertainty in the dependent variable by weighting observations in proportion to the reliability of each individual estimated teacher-year effect (Aaronson et al., 2007; Borjas & Sueyoshi, 1994; Koedel & Betts, 2007).[13]

The challenge in interpreting the results from model (3) is that we do not control for the first two components of our conceptual model (1), pre-service teaching ability (φ_j) or teacher/school/district match (η_jk). These effects are likely to be confounded with training program effects due to the non-random selection of teacher candidates into training programs and the non-random assignment of teachers into schools and districts. Thus, we run variants of model (3) that attempt to control for these confounders by including various measures of institutional selectivity or individual pre-service ability, as measured by the tests that prospective teachers take prior to entering a training program. The variables we include, however, are unlikely to completely control for pre-service teaching ability. Thus, even in this "selectivity decay" model it is appropriate to consider the training program effects to be a combination of any effect of training combined with the influence of time-invariant teaching talent that is correlated with selection into training programs.

With this caveat in mind, we note that while a combined training-selection effect may be a less desirable measure for policy makers, it may still prove useful for administrators. While public policy makers may be interested in isolating training from selection effects to learn what types of training are efficacious, or possibly to reward particular programs for the improvement in teacher candidates, the primary concern for principals and district administrators is that they put quality people in front of their students. Thus, the question of whether our findings are driven by selection or training is probably not terribly relevant for practitioners.

We also estimate variants of model (3) that include district or school fixed effects. These models account for time-invariant differences across schools and districts, but it is not totally clear that fixed effects models will yield unbiased estimates of mean program effects. The reason is that in a fixed-effects model, the estimates are based solely on within district or school differences in teacher effectiveness, and some of the differences between programs may be related to systematic sorting across different types of districts or schools. Imagine, for instance, that there are large differences between programs, but schools tend to employ teachers of a similar effectiveness level. In this case, a school that employs teachers that are average in effectiveness, from multiple programs, would tend to have some of the least effective teachers from the best training programs and the most effective teachers from the worst training programs, and thus the within school comparison would tend to show little difference between the programs. In other words, some of the true differences between programs help explain the sorting of teachers across schools, so the within school comparisons lead to a washing out of the program estimates.[14] Therefore, while the school and district fixed effects models improve upon model (3) by separating the program effect from school- or district-level confounders, they introduce a comparison-group issue that makes interpretation of the results difficult.[15]

[10] When we estimate (3) without allowing for the possibility of decay (λ = 0) and limit the analysis to teachers at similar levels of the experience distribution, we find that the weighted standard deviation of program effects is far larger for teachers with 0–5 years of experience (0.029 in math and 0.033 in reading) than for teachers with more than 12 years of experience (0.015 in math and 0.018 in reading). This suggests that the program indicators hold less explanatory power for more experienced teachers, and motivates our incorporation of decay in (3).
[11] The closest analog we have found in the education production function literature is a value-added model proposed by Harris and Sass (2006) that allows for geometric decay in the impact of schooling inputs over time. We use exponential decay because the model fit is marginally better than with geometric decay, but the correlation between the program estimates using exponential and geometric decay was over 0.99 in every specification.
[12] This model implicitly assumes that decay is a function of experience in the labor market as opposed to the time since a teacher received her training; however, we could easily replace Exp_jt in the exponential term with a measure of time since training. The correlation between program effects in models that use experience and models that use time since training is greater than 0.99.
[13] The correlations between these model results and those from unweighted, or OLS, models are all above 0.97 in both subjects. Previous work in Florida (Mihaly et al., 2012) estimates program effects with the felsdvregdm command (Mihaly, Lockwood, McCaffrey, & Sass, 2010), which allows them to estimate program effects relative to the average program in the state. However, this command does not allow for the non-linear specification in model (3), so we include program dummies and estimate program effects relative to out-of-state teachers. Nonetheless, we find that the correlation between the program estimates from our model when λ = 0 (no decay) and the program estimates using felsdvregdm is over 0.9 in both math and reading (results available upon request).
[14] The analogous issue arises when estimating individual teacher effectiveness and making decisions of whether or not to include school level fixed effects. We thank Jim Wyckoff for his insights on this matter (personal communication, August 2011).
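The second-stage specification, Eq. (3), is not reproduced in this extraction, so the sketch below only illustrates the flavor of the approach as it is described above: estimated teacher-year effects are related to experience and a program indicator whose contribution is damped by exp(-λ·Exp_jt), with observations weighted according to the reliability of each first-stage estimate. The single-program setup, the functional form for experience, and all variable names are simplifying assumptions.

import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Simulated teacher-year records: years of experience, an indicator for being a
# graduate of one illustrative in-state program (vs. out-of-state), a first-stage
# effect estimate, and the reliability of that estimate.
n = 400
exp_yrs = rng.integers(0, 30, size=n).astype(float)
program = rng.integers(0, 2, size=n).astype(float)
reliability = rng.uniform(0.3, 0.9, size=n)
true_lambda, gamma_prog, b_exp = 0.05, 0.10, 0.01
noise_sd = 0.1 / np.sqrt(reliability)  # less reliable estimates are noisier
tau_hat = (b_exp * np.log1p(exp_yrs)
           + np.exp(-true_lambda * exp_yrs) * gamma_prog * program
           + rng.normal(scale=noise_sd))

# Second stage: non-linear least squares in which the program effect decays at
# rate lam; passing sigma proportional to 1/sqrt(reliability) weights each
# observation by the reliability of its first-stage estimate.
def stage2(X, a0, a_exp, gamma, lam):
    e, p = X
    return a0 + a_exp * np.log1p(e) + np.exp(-lam * e) * gamma * p

params, _ = curve_fit(stage2, (exp_yrs, program), tau_hat,
                      p0=[0.0, 0.01, 0.05, 0.05],
                      sigma=1.0 / np.sqrt(reliability))
print(dict(zip(["const", "experience", "program", "lambda"], params.round(3))))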
We use multiple years of data to improve the stability of the estimates, but there are two issues that arise from our use of multiple years of data. First, model (3) assumes that programs do not change over time. This may be plausible over the short term given well-documented resistance to change within academic departments (e.g., Perlmutter, 2005; Summers, 2012), but it is unlikely that the no change assumption would hold over the decades that cover the time span for which we have teachers in our sample. Thus, though not reported here, we estimate additional models that include interactions between program graduate cohorts and programs to explore this issue. Second, we do not account for the possible non-random attrition of teachers from different programs. If teachers from different programs attrite from the workforce at different rates, this will bias our program estimates, since these estimates are pooled over years. We do not address this issue explicitly in the model, but we explore the potential that non-random attrition might lead to biased program effects in the data section below.

3. Data

The data for this study are derived primarily from five administrative databases prepared by Washington State's Office of Superintendent of Public Instruction (OSPI): the Washington State S-275 personnel report, the Washington State Credentials database, the Core Student Record System (CSRS), the Comprehensive Education Data and Research System (CEDARS), and the Washington Assessment of Student Learning (WASL) database.

The S-275 contains information from Washington State's personnel-reporting process; it includes a record of all certified employees in school districts and educational service districts (ESDs), their place(s) of employment, and annual compensation levels. It also includes gender, race/ethnicity, highest degree earned, and experience, all of which are used as control variables in model (3).

The Washington State Credentials database contains information on the licensure/certification status of all teachers in Washington, including when and where teachers obtained their initial teaching certificates.[16],[17] This database also includes teachers' test scores on the Washington Educator Skills Test-Basic, or WEST-B, a standardized test that all teachers must pass prior to entering a teacher training program.[18]

Information on teachers in the S-275 and the Washington State Credentials database can be linked to students via the state's CSRS, CEDARS, and WASL databases. The CSRS includes information on individual students' backgrounds, including gender, race/ethnicity, free or reduced-price lunch, migrant, and homeless statuses, as well as participation in the following programs: home-based learning, learning disabled, gifted/highly capable, limited English proficiency (LEP), and special education, for the 2005–2006 to 2008–2009 school years. All of these variables are included as student controls in model (2). In 2009–2010, CEDARS replaced the CSRS database. It contains all individual student background characteristics but, in addition, includes a direct link (a unique course ID within schools) between teachers and students. The WASL database includes achievement outcomes on the WASL, an annual state assessment of math and reading given to students in grades 3 through 8 and grade 10.

Like every state, Washington has requirements for initial entry into the teacher workforce, but unlike a number of states, Washington's standards are relatively stringent in the sense that it has not relied on alternative routes, such as Teach for America, as a significant source of new teachers (National Council on Teacher Quality, 2007). The great majority of the state's teachers are trained at one of the 21 state-approved programs.[19] There is, however, clearly heterogeneity in the selectivity of the programs preparing teachers. For instance, the University of Washington Seattle (UW Seattle) is considered the flagship university in the state, and in 2009 the 75th percentile composite SAT score of incoming UW freshmen was about 1330. Nearly every other program in the state had lower 75th percentile SAT scores, ranging between 1070 and 1290.[20] And a few accredited programs do not require applicants to submit admissions test results in order to be considered for admission.[21]

[15] We also attempt to account for the non-random distribution of programs within schools by estimating a school-level model (Hanushek, Rivkin, & Taylor, 1996) that regresses the average achievement of students in a school on school characteristics (e.g., enrollment, percent of students in each gender and racial/ethnic category, percent of students eligible for free or reduced-price lunches, average class size) and the percent of teachers in that school who come from each training program. The results of this specification should be robust to any non-random sorting of teachers within schools.
[16] From this database, we identify the institution from which a teacher received his or her first teaching certificate, which may or may not be where a teacher did his or her undergraduate work. OSPI's coding schema for first-issue teaching certificates (i.e., what we call "initial" certificates) has changed over time. We code all initial certificates to account for these historical changes.
[17] The "recommending agency" variable in these data identifies the college/university that did all of the legal paperwork to get an individual issued a teaching certificate. Thus, while it is likely that the recommending institution was also the institution where teachers were trained, the variable itself does not necessarily mean that the person graduated from the recommending agency.
[18] Since August 2002, candidates of teacher preparation programs in Washington State have been required to meet the minimum passing scores on all three subtests (reading, mathematics, and writing) of the WEST-B as a prerequisite for admission to a teacher preparation program approved by the PESB. The same is also required of out-of-state teachers seeking a Washington State residency certificate. This test is designed to reflect knowledge and skills described in textbooks, the Washington Essential Academic Learning Requirements (EALRs), curriculum guides, and certification standards.
[19] In each year from 2006–2007 to 2008–2009, at least 95% of teachers in Washington State were certified via traditional training programs; the remaining teachers were certified through alternative training programs within PESB-accredited institutions of higher education (https://title2.ed.gov/Title2STRC/Pages/ProgramCompleters.aspx).
[20] The one exception is the University of Puget Sound, whose composite 75th percentile SAT score was 1340.
[21] Heritage University has an open enrollment policy. City University and Antioch University both focus on adult learning and bachelor's degree completion, suggesting less stringent entrance requirements.
Table 1. Experience levels of teachers from different training programs.
To account for this heterogeneity we use institution-level data from The College Board, which includes annual (since 1990) measures of selectivity based on the high school grades, standardized test scores, and admissions rates of incoming freshmen.[22]

We combine the data sources described above to create a unique dataset that links teachers to their schools and, in some cases, their students in grades 3 through 6 for the 2005–2006 to 2009–2010 school years in both math and reading. Due to data limitations, not all students in grades 3 through 6 across these five school years can be linked to their teachers. This is largely due to the fact that, until recently, the state has only kept records of the names of individuals who proctored the state assessment to students, not necessarily the students' classroom teacher.[23] Across all these grades and years we were able to match about 70% of students to their teachers and estimate value-added models (VAMs) of teacher effectiveness.[24]

Our analytic sample includes 8718 teachers (17,715 teacher-years) for whom we can estimate VAMs and for whom we know their initial teacher training program as being either from one of 20 state accredited teacher preparation programs, or from outside of the state.[25] Table 1 gives the number of teachers in the sample from each training program by experience level. Unlike previous studies, our analytic sample contains both novice and experienced teachers from each program, which is possible because of the decay methodology described in the previous section.

[22] Year-to-year Pearson correlations for each of these selectivity measures are typically high, i.e., above 0.90.
[23] The proctor of the state assessment was used as the teacher–student link for the data used for analysis for the 2005–2006 to 2008–2009 school years. The assessment proctor is not intended to and does not necessarily identify the subject-matter teacher of a student. The "proctor name" might be another classroom teacher, a teacher specialist, or an administrator. We take additional measures to reduce the possibility of inaccurate matches by limiting our analyses to elementary school data, where most students have only one primary teacher, and by only including matches where the listed proctor is reported (in the S-275) as being a certified teacher in the student's school, is listed as 1.0 FTE in that school, and is endorsed to teach elementary education. For the 2009–2010 school year, we are able to check the accuracy of these proctor matches using the state's new Comprehensive Education Data and Research System (CEDARS), which matches students to teachers through a unique course ID. Our proctor match agrees with the student's teacher in the CEDARS system for about 95% of students in both math and reading.
[24] For the 2005–2006 to 2008–2009 school years, where the proctor name was used as the student–teacher link, student-to-teacher match rates vary by year and grade, with higher match rates in earlier years and lower grades.
[25] Consistent with Constantine et al. (2009), we define teacher preparation programs as those from which new teachers must complete all their certification requirements before beginning to teach. Lesley University produced its first teachers in 2009–2010. So, although it is an accredited institution in Washington State, our observation window precludes it from being included in our analysis.
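The matching rules in footnote 23 amount to a simple join-and-filter. The sketch below shows one way such a filter could look in pandas, with entirely hypothetical column names and toy records; it is meant only to illustrate the logic, not to reproduce the authors' data processing.

import pandas as pd

# Proctor-based student-teacher links and a simplified stand-in for the S-275
# personnel file. Keep a link only if the listed proctor is a certified teacher
# in the student's school, is 1.0 FTE there, and is endorsed in elementary ed.
links = pd.DataFrame({
    "student_id": [1, 2, 3],
    "school_id": [10, 10, 11],
    "proctor_cert_id": ["A", "B", "C"],
})
staff = pd.DataFrame({
    "proctor_cert_id": ["A", "B", "C"],
    "school_id": [10, 10, 11],
    "is_certified_teacher": [True, True, False],
    "fte": [1.0, 0.6, 1.0],
    "elem_endorsed": [True, True, True],
})

merged = links.merge(staff, on=["proctor_cert_id", "school_id"], how="inner")
mask = (merged["is_certified_teacher"]
        & (merged["fte"] == 1.0)
        & merged["elem_endorsed"])
valid = merged[mask]
print(valid[["student_id", "proctor_cert_id"]])  # only student 1 survives the filter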
Table 2. Means of selected student characteristics in 2008–2009 for unrestricted and restricted samples.
If program effects were constrained to be the same for both novice and experienced teachers, we would be forced to make one of two choices: keep all teachers and weight them all equally, despite the fact that some programs (e.g., Antioch, City, and UW-Bothell) have far more novice teachers than other programs; or, as in previous studies, only keep novice teachers, despite the very small sample sizes that would result for some programs. However, by allowing program effects to decay, we are able to keep all 8718 teachers in our sample, since each teacher's contribution to the program estimates is weighted appropriately based on the time he or she has been in the workforce.[26]

These teachers are linked to 291,422 students (388,670 student-years) who have valid WASL scores in both reading and math for at least two consecutive years. Table 2 reports selected student characteristics for the 2008–2009 school year for an unrestricted sample of students, i.e., those in grades 4–6 who have a valid WASL math or reading score but could not be matched to a teacher, and for our restricted analytic sample described above. t-Tests show that while nearly all of these differences are statistically significant, none of them are very large.[27]

Before moving to the main results in the following section, we note that we investigate the potential that non-random attrition from the teacher workforce might lead to biased program effects in two ways. First, we compare time-invariant teacher characteristics, by training program, for new teachers in 2006–2007 to the characteristics of those teachers who remained in the workforce until 2009–2010. Approximately 5% of the within-program differences by sex, race/ethnicity, and licensure score are statistically significant at the 0.05 level, suggesting that the within-program attrition is approximately random.[28] Second, there is evidence in the literature of differential teacher attrition by effectiveness (Winters, Dixon, & Greene, 2012; Goldhaber, Gross, & Player, 2011), so we compare the average teacher effectiveness of new teachers in 2006–2007, by program, to the average teacher effectiveness of those teachers who remain in our sample until 2009–2010. We find few significant differences by program. But we also caution readers that these results are based on small sample sizes. For example, only two programs placed more than 30 teachers into 4th–6th grade classrooms in Washington State during 2006–2007, and only one of those programs retained just over 50% of their teachers until 2009–2010. Nonetheless, results from both of these tests suggest we ought not be overly concerned that non-random attrition in our sample will bias our program effects.[29]

4. Results

Prior to describing our findings on teacher training programs, a few peripheral findings warrant brief notice. Table 3 shows the coefficient estimates from the first stage models used to generate the teacher-classroom-year effects. As is typically found in the literature, there are significant differences between student subgroups in achievement. And while not reported, we also estimated models that allow for non-linear relationships between [...]

[26] Nonetheless, Table 1 shows that there are relatively small sample sizes for some of the programs in our analytic sample. As a robustness check, we run our base model only for programs with at least 100 graduates in the sample, and find that the correlations between the program estimates for these 15 programs and the estimates reported in Table 4 are above 0.999 in both math and reading, demonstrating that the estimates for the larger programs are not affected by the presence of several small programs.
[27] Comparisons for other years reveal similar results.
[28] Importantly, there is no evidence that teachers with higher or lower licensure scores are disproportionately leaving the profession within the years of our study.
[29] Whether comparing observable teacher characteristics or teachers' value-added estimates over time, t-tests for samples that include all teachers are largely consistent with those including only new teachers.
Table 4. Program estimates and standard errors for various model specifications (math and reading). Reported program coefficients and standard errors are multiplied by 100. All models include teacher and classroom covariates (i.e., gender, race/ethnicity, degree level, experience, class size). Selectivity decay models also include measures of selectivity (i.e., freshman admissions rates, composite SAT scores, and percent of freshmen with GPA > 3.0). Statistical significance: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
[...] "standard deviation in the teacher effect for new teachers" (p. 429). This could possibly be due to different clustering decisions, but beyond this, we can only speculate as to why we find less heterogeneity in our program estimates than Boyd et al. It is possible these differences result from the fact that New York has more training programs (i.e., 30) than does Washington, that New York training programs draw potential teachers from a different distribution, or that training programs in Washington are more tightly regulated and are therefore more similar to one another.

There is little change in the number of significant program coefficients moving from the no decay (columns 1 and 4 for math and reading, respectively) to decay (columns 2 and 5) specifications, but there are some marked changes in their magnitudes. For instance, the average absolute values of the coefficient estimates in math and reading in the no decay specifications are 0.016 and 0.020, respectively, whereas the corresponding averages in the decay specifications are 0.027 and 0.030. The change in magnitude is to be expected given that the interpretation of the training program coefficients is different in the decay specification.
Fig. 1. Decay curves and half-life for decay and selectivity decay models.
Specifically, the program indicators in the no decay model are the estimated effects on student achievement of having a teacher who received a credential from a specific training program, regardless of when he/she received the credential, whereas the program coefficients in the decay models are the estimated effects for first-year teachers who get their teaching credentials from different programs.[36]

In both math and reading the magnitude of the estimated decay parameter is about 0.05, suggesting that training program effects do decay as teachers gain workforce experience. This has important implications for how we think about the influence of training programs on teacher quality, as is illustrated in Fig. 1, which shows how the effects of training programs for teachers with varying workforce experience decay over time based on estimates of the decay parameter (λ) from both our decay and selectivity decay models.[37] These results clearly suggest that teacher training should not be thought of as invariant to a teacher's workforce experience, as the initial influence of a training program on a first-year teacher decays to about 78% for a 5th-year teacher, 60% for a 10th-year teacher, and so on. Specifically, the "half-life"[38] of teacher training effects is estimated to be between 12.9 and 13.7 years in math and between 11.3 and 15.5 years in reading, depending on model specification.[39]

We attempt to account for selection into training programs in our models with the inclusion of various measures of institutional selectivity: the composite (math and verbal) 75th percentile score on the SAT for incoming freshmen, the percent of incoming freshmen whose high school grade point average was above 3.0, and admissions rates for incoming freshmen (i.e., total admitted/total applied).[40] Columns 3 and 6 in Table 4 provide the individual training program estimates from these selectivity models.

There are a handful of notable changes in individual program estimates moving from our decay models (columns 2 and 5) to those with institutional selectivity controls (columns 3 and 6). In math, for example, the coefficient for Eastern Washington University turns negative and increases in magnitude.

[36] Implicit in our estimates is the notion that the training programs themselves are not changing over time. Though not reported here, we estimate additional models that include interactions between program graduate cohorts and programs to explore this issue and find some evidence of change for some training programs, but it is somewhat speculative. Also, since earlier studies have focused only on novice teachers in large training programs, we run our base model only for teachers with three or fewer years of experience and from programs with at least 50 such teachers in the sample. The correlation between these program estimates and the estimates for the full sample of teachers from these programs is 0.67 in both math and reading. These differences, however, conflate programmatic change with changes to the sample of teachers considered, so we do not discuss them further. For more details, see the working paper version of this paper (Goldhaber, Liddle, & Theobald, 2012).
[37] We note that Fig. 1 is somewhat speculative given that we observe each teacher for five years at most.
[38] The estimate of the half-life is ln(2)/λ.
[39] The individual program estimates share modest positive correlations with institutional SAT scores (0.4 in math and 0.2 in reading). That said, there are a few surprises. For example, while not significant at the 95% level, both UW Tacoma and UW Bothell, which have composite 75th percentile SAT scores of 1120 and 1150, respectively (the fourth and fifth lowest of all institutions requiring SATs in 2009), have graduated some of the most effective math and reading teachers in the state. In contrast, more selective institutions, such as Gonzaga University, with a composite 75th percentile SAT score of 1270 (the fourth highest), have graduated teachers who are significantly less effective in reading compared to out-of-state teachers or graduates from the majority of other in-state programs.
[40] Since most training programs are part of four-year institutions, we subtract four years from a teacher's certification date to approximate their entry year into their certifying institution. Teachers who were certified out of state (22%), entered school before 1990 (31%), or graduated from institutions that don't report selectivity data (9%) are missing institutional selectivity data, but are included in the regression with a dummy indicator for missingness.
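As a quick numerical check on the decay figures quoted above, plugging the reported decay parameter of roughly λ = 0.05 into the exponential decay multiplier reproduces the "78% by year 5, 60% by year 10" pattern and implies a half-life of about 13.9 years via footnote 38's ln(2)/λ, in the same ballpark as the 12.9–13.7 years reported for math. The value of λ and the interpretation below are taken from the text; the snippet itself is only illustrative.

import numpy as np

lam = 0.05                      # approximate decay parameter reported in the text
for years in (0, 5, 10, 20):
    # share of the initial program effect remaining after `years` of experience
    print(years, round(np.exp(-lam * years), 3))
# prints 1.0 at year 0, about 0.779 at year 5, 0.607 at year 10, 0.368 at year 20

half_life = np.log(2) / lam     # footnote 38: half-life = ln(2)/lambda
print(round(half_life, 1))      # about 13.9 years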
Table 5. Program estimates from robustness models (math and reading). Reported program coefficients and standard errors are multiplied by 100. All models include teacher and classroom covariates (i.e., gender, race/ethnicity, degree level, experience, class size). District FE decay models include district fixed effects. School FE decay models include school fixed effects. Statistical significance: † p < 0.10; * p < 0.05; ** p < 0.01; *** p < 0.001.
The coefficients for St. Martin's University and Evergreen State College become increasingly negative and statistically significant. And in reading, the coefficients for Seattle Pacific University and Western Washington University turn negative with smaller magnitudes, the latter losing significance. However, overall there is little evidence that the inclusion of selectivity controls has an impact on the estimates. Indeed, the Pearson correlation coefficients between the program estimates of the decay model and the program estimates from the selectivity model are 0.95 for math and 0.97 for reading.

There are several potential explanations for the finding that our selectivity results differ little from the decay results. First, it may be that selection into training programs is not, at least for the workforce as a whole, a significant mechanism influencing differences between teacher training programs. Second, our controls are for the selectivity of the institutions, not the training programs themselves, and, regardless, may be a poor proxy for individuals' pre-service capacity to teach.[41]

[41] In fact, F-tests show that together these three selectivity measures only significantly improve model fit for reading (at the 0.10 level). Moreover, the coefficient estimates from the selectivity variables themselves do not provide a consistently strong argument that program selectivity is an important factor in predicting teacher effectiveness.
Fig. 2. Program effects for various model specifications. The program effects across specifications are directionally robust for programs at the tails of the
distribution, but not near the center.
We attempt to better account for individual pre-service capacity/selection into programs by adding an additional control for selection: a teacher's average score on the three subtests of the WEST-B.[42] The WEST-B coefficients are positive for both subjects and statistically significant in math (marginally so in reading), indicating that higher-scoring teachers tend to be more effective teachers.[43] However, the inclusion of this test does little to change the program estimates.[44]

A final possibility is that the program estimates reported in Table 4 are biased by the sorting of teachers and students into classrooms (Rothstein, 2010). For instance, were it the case that teachers from particularly effective training institutions were systematically more likely to be teaching students who are difficult to educate (in ways that are not accounted for by the control variables reported in Table 3), and vice versa, we might expect an attenuation of the program coefficients. This possibility is explored in the next sub-section.

4.2. Robustness checks: accounting for non-random sorting of teachers and students

We assess the potential bias associated with the non-random sorting of teachers and students by estimating three additional models that include either (1) school-level aggregates, (2) district fixed effects, or (3) school fixed effects. These results are reported in columns 1 through 3 (for math) and columns 4 through 6 (for reading) of Table 5. The school aggregation models regress average teacher effectiveness (equally weighted by teachers) in a school against the share of teachers who hold a credential from each training program, plus the other covariates in model (3) aggregated to the building level. The argument for models with building-level aggregation is that they wash out the potential for within-school sorting of teachers to classrooms. Alternatively, the argument for fixed effects specifications is that they account for time-invariant district or school factors such as curriculum or the effectiveness of principals.[45]

As described in Mihaly et al. (2012), there are two related data issues that arise when estimating fixed effects specifications of Eq. (3): whether schools and programs are "connected" enough in a sample such that estimates of program effects can be based on within school (or district) variation in teacher effectiveness (identifiability); and whether "highly centralized" schools (those with teachers from four or more preparation programs) are significantly different from less centralized schools such that the findings may not generalize to the full sample of schools (homogeneity). We follow Mihaly et al.'s (2012) procedures for testing for identifiability and homogeneity and conclude there is minimal concern about either in our data (at least relative to the situation they investigate in Florida). Identifiability does not appear to be a serious concern, as only [...] of the schools in our sample (which employed about 5% of all teachers) were staffed with teachers from a single training program.[46] We cannot rule out homogeneity issues, as t-tests comparing the characteristics of students and teachers in schools with teachers from one training program to those in schools with teachers from four or more training programs reveal that there are a few significant differences, but these differences are relatively small.[47]

[42] Since individuals are allowed to take the exam more than once and scores may be contingent on the number of times the test is taken, we average teachers' first scores on the three WEST-B subtests.
[43] The WEST-B coefficients are 0.019 for math and 0.011 for reading.
[44] Of our full sample of 8718 teachers, only 1469 (16.8%) have WEST-B scores, largely because the majority of teachers (roughly 85%) were enrolled in a program before the WEST-B requirement was put into effect. The correlation between models with and without the WEST-B test, for the subsample of teachers that have WEST-B scores, is over 0.99 for each subject.
[45] The district and school fixed effects specifications allow for program effects to decay, but the school aggregation models do not permit this, as decay is specified by the experience level of individual teachers. For correlations across models, see Table 6.
[46] This contrasts sharply with Mihaly et al.'s sample of Florida teachers, wherein 54.1% of schools had teachers from a single training institution.
[47] [Students in schools with] teachers from four or more training programs are significantly more likely to meet state math standards and to be Black, Asian, and/or bilingual. Teachers in schools with only one training program are marginally (at the 10% significance level) more likely to have higher WEST-B scores. The t-tests [...]
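A sketch of the school-level aggregation described above, under simplifying assumptions (simulated teacher effects, no additional covariates, hypothetical program names): average estimated teacher effectiveness in each building is regressed on the within-building shares of teachers credentialed by each program, with out-of-state as the omitted reference category.

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Simulated teachers: school, training program, and an estimated effect.
n = 600
teachers = pd.DataFrame({
    "school": rng.integers(0, 60, size=n),
    "program": rng.choice(["A", "B", "C", "out_of_state"], size=n),
    "tau_hat": rng.normal(scale=0.1, size=n),
})

# Building-level aggregation: mean effectiveness and program shares per school.
school_mean = teachers.groupby("school")["tau_hat"].mean()
shares = pd.crosstab(teachers["school"], teachers["program"], normalize="index")

# Regress school-average effectiveness on program shares (out-of-state omitted
# as the reference category), plus an intercept.
X = np.column_stack([np.ones(len(shares)),
                     shares[["A", "B", "C"]].to_numpy()])
coef, *_ = np.linalg.lstsq(X, school_mean.loc[shares.index].to_numpy(), rcond=None)
print(dict(zip(["intercept", "A", "B", "C"], coef.round(3))))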
As is readily apparent from scanning the results across columns in Table 4, most of the program coefficients from our fixed effects regressions are far smaller than those reported in our decay models and are less likely to be statistically significant. For instance, the average absolute values of the program indicators in the decay models are 0.03 for both math and reading. These values attenuate to 0.01 for math and 0.02 for reading in both the district and school fixed effect models, and F-tests of the program indicators are not jointly significant in either math or reading in the fixed effect models. This finding may simply reflect the fact that there is little true difference between teachers associated with the program from which they received their teaching credential after accounting for selection into district, school, or classroom. But there are also at least two good reasons to doubt that these are the right specifications. In particular, in the case of the fixed effects models the decay parameter is estimated to be negative and marginally significant in math, and insignificant in the case of reading. This finding is seemingly implausible as it implies, in the case of math, that the effects of training programs increase with teacher experience.

Moreover, while F-tests of the school and district fixed effects suggest that they improve the fit of the model, other measures of model fit do not: the BIC,[48] which penalizes a model for each new parameter added to the model, is lower (and [...]) [...]. There really is no way, outside of an experiment, to know which [...]

Table 6. Pearson|Spearman correlations of program estimates from various model specifications (no decay, decay, selectivity decay, school aggregates, district FE decay, and school FE decay; math and reading).

[48] Specifically, BIC = -2 ln(L) + k ln(n), where L is the likelihood of the model, k is the number of free parameters, and n is the sample size.
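For completeness, the criterion in footnote 48 can be illustrated with a small example that assumes Gaussian errors, so the log-likelihood can be computed from residuals; a model that adds many parameters for a negligible improvement in fit is penalized through the k·ln(n) term and ends up with the higher (worse) BIC. The numbers below are purely illustrative.

import numpy as np

def gaussian_bic(residuals, k):
    # BIC = -2 ln(L) + k ln(n), with L the Gaussian likelihood evaluated at the
    # maximum-likelihood error variance and k the number of free parameters.
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + k * np.log(n)

rng = np.random.default_rng(3)
y = rng.normal(size=500)
resid_small = y - y.mean()          # 1-parameter model
resid_big = resid_small * 0.999     # barely better fit, but 60 extra parameters
print(gaussian_bic(resid_small, k=1), gaussian_bic(resid_big, k=61))
# The second model's tiny improvement in fit is swamped by the k*ln(n) penalty.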
Despite the individual differences in some program Washington State is an excellent setting for this study
indicators between model specifications, it turns out that because it has few alternative certification routes and few
there is a fair amount of agreement across many, but not schools in which every teacher comes from a single teacher
all, of the specifications. As we show in Table 6 the Pearson training program, an important condition for identifiability
correlations of the estimates from all specifications in (Mihaly et al., 2012).
Table 4, the school aggregation model, and district fixed In general our findings suggest that where teachers are
effects specifications in Table 5 are all over 0.50 within credentialed explains only a small portion of the overall
subject. By contrast, the correlation between the estimates variation in the effectiveness of in-service teachers. This is
from the school fixed effects models and the models now a common finding in the educational productivity
without fixed effects are far smaller, particularly so in math literature; it appears that the best assessments of teachers
where the correlation is not statistically significant. These are those based on actual classroom performance rather
findings are reiterated graphically in Fig. 2, which shows than pre- or in-service credentials. That said, the
the distributions of program effect estimates for all model differential in the effectiveness of the teachers creden-
specifications sorted by the program effect from the decay tialed by various programs is meaningful. For instance, the
model in descending order. In general, we see that regression-adjusted difference between teachers who
directionality of the programs estimates is robust in the receive a credential from the least and most effective
tails of the distribution (where effects are further from zero programs is estimated to be 3.9–13.4% of a standard
and sometimes statistically significant—at least in read- deviation in math and 9.2–22% of a standard deviation in
ing), but not in the middle (where effects bounce around reading.50 To put this in context, in decay models with no
zero and are not statistically significant). fixed effects, the average expected difference in student
Lastly, it is worth noting that cross-subject, within performance between having a math teacher from the
model specification, correlations are in the neighborhood most effective program and the least effective program at
of 0.4–0.5, suggesting that programs producing teachers least 1.5 times the regression-adjusted difference be-
who are effective at teaching math tend to also produce tween students who are eligible for free or reduced-price
teachers who are effective at teaching reading. lunches and those who are not (bFRPL = 0.076). For
reading, this difference is over two times the difference
5. Discussion and conclusions between students with learning disabilities and those
U.S. teacher policy is on the cusp of significant changes. Much of this focuses on in-service teachers and the linking of student growth measures to the evaluation of teachers. But there is increasing interest in relating student outcomes to teacher training with the hope that identified differences in program effects ultimately lead to more effective selection and training practices for pre-service teachers. The results presented in this paper represent one of the first statewide efforts to connect individual teacher training institutions to student achievement and explore the extent to which model specification influences estimated results, and the first that allows all teachers in the state to contribute to program estimates by allowing program effects to decay with teacher experience.49
49 It is important to note several caveats about the analyses presented here. First, the samples used to detect differential program estimates for student subgroups and programmatic change over time were relatively small, so it is conceivable that effects do exist but their magnitude is too small to detect with the sample at hand. Second, our analyses are focused entirely on elementary schools and teachers; it is conceivable that comparable analyses would yield results that look quite different at the secondary level. Third, students' outcomes on standardized tests are only one measure of student learning, so it is possible that value-added approaches miss key aspects of what different training institutions contribute to teacher candidates. Finally, our decay models assume that a teacher's performance decays to the average performance of out-of-state teachers of the same experience level. What might be more plausible, however, is that teacher performance decays to the average performance of teachers at the same school (teacher "acculturation"). Unfortunately, our specification of decay requires program effects to be measured relative to a reference category, so we cannot modify our model to allow teacher effects to decay to an average teacher, either across the workforce or within the same school. In the future it will be possible to estimate more flexible specifications that allow for differential types of decay, but this is currently infeasible given that we only have four years of value-added data for teachers in our sample.
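To make the reference-category constraint in this caveat concrete, the sketch below shows one illustrative way decaying program indicators could be constructed, with out-of-state teachers as the omitted group; the functional form, variable names, and decay rate are assumptions chosen for illustration, not the specification estimated in this paper.

# Illustrative sketch only: program indicators whose weight decays with
# teacher experience toward an out-of-state reference category. The
# functional form and decay rate are assumptions, not the paper's model.
import numpy as np
import pandas as pd

teachers = pd.DataFrame({
    "program":    ["A", "B", "out_of_state", "A"],  # credentialing program
    "experience": [0, 3, 7, 10],                    # years of teaching experience
})

delta = 0.25  # assumed decay rate
weight = np.exp(-delta * teachers["experience"])    # 1 at entry, approaches 0 with experience

# Program dummies (out-of-state teachers are the omitted reference group),
# each scaled by the decay weight; these would enter a value-added model
# alongside student, classroom, and experience controls.
dummies = pd.get_dummies(teachers["program"]).drop(columns="out_of_state")
decaying_indicators = dummies.mul(weight, axis=0)
print(decaying_indicators)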
We find that the program from which a teacher is credentialed explains only a small portion of the overall variation in the effectiveness of in-service teachers. This is now a common finding in the educational productivity literature; it appears that the best assessments of teachers are those based on actual classroom performance rather than pre- or in-service credentials. That said, the differential in the effectiveness of the teachers credentialed by various programs is meaningful. For instance, the regression-adjusted difference between teachers who receive a credential from the least and most effective programs is estimated to be 3.9–13.4% of a standard deviation in math and 9.2–22% of a standard deviation in reading.50 To put this in context, in decay models with no fixed effects, the average expected difference in student performance between having a math teacher from the most effective program and one from the least effective program is at least 1.5 times the regression-adjusted difference between students who are eligible for free or reduced-price lunch and those who are not (β_FRPL = 0.076). For reading, this difference is over two times the difference between students with learning disabilities and those without (β_disability = 0.083). Moreover, this same difference is larger than the estimated difference between a first-year teacher and a teacher with five or more years of experience by one and a half times in math (γ_0 = 0.074) and more than two times in reading (γ_0 = 0.086).
50 These differences across models for math and reading, respectively, are: 7.3 and 12.7 (no decay), 11.5 and 19.2 (decay), 13.4 and 22.0 (selectivity decay), 3.8 and 12.1 (district FE decay), and 4.4 and 9.2 (school FE decay).
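As a quick arithmetic check of the comparisons above, the snippet below reproduces the ratios from the coefficients reported in the text and the decay-model differences in footnote 50, expressed as fractions of a standard deviation.

# Back-of-the-envelope check of the magnitude comparisons in the text, using
# the decay-model (no fixed effects) program differences from footnote 50 and
# the reported regression coefficients; all values are in student-test SD units.
program_gap_math, program_gap_reading = 0.115, 0.192         # footnote 50, decay model
frpl_gap, disability_gap = 0.076, 0.083                       # FRPL and learning-disability gaps
experience_gap_math, experience_gap_reading = 0.074, 0.086    # first-year vs. 5+ years of experience

print(round(program_gap_math / frpl_gap, 2))                  # ~1.5 times the FRPL gap (math)
print(round(program_gap_reading / disability_gap, 2))         # ~2.3 times the disability gap (reading)
print(round(program_gap_math / experience_gap_math, 2))       # ~1.6 times the experience gap (math)
print(round(program_gap_reading / experience_gap_reading, 2)) # ~2.2 times the experience gap (reading)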
While we find that programs credentialing teachers who are more effective in math are generally also credentialing teachers who are more effective in reading, there is clear evidence throughout that training program indicators are far more predictive of teacher effectiveness in reading than in math. This may suggest that teacher training tends to focus more on the skills teachers need to teach reading than on those needed to teach math.

There is no doubt that evaluating teacher training programs based on the value-added estimates of the teachers they credential is controversial. It is true that the value-added program estimates do not provide any direct guidance on how to improve teacher preparation programs. However, it may not be possible to move policy toward explaining why we see these program estimates until those estimates are first quantified. Moreover, it is certainly the case that some of the policy questions that merit investigation (e.g., Do we see program change with new initiatives? Do we see differences between programs within endorsement area? How much of the difference between programs is driven by selection versus training?) require additional data. Some of these questions could be addressed with larger samples (in this case it is merely a matter of time), but other questions require additional information about individual programs, teacher candidate selection processes, and so forth. The collection of this kind of data, along with systematic analysis, would provide a path toward evidence-based reform of the pre-service portion of the teacher pipeline.

Acknowledgments

We gratefully acknowledge the use of confidential data from Washington State supplied by the Office of Superintendent of Public Instruction (OSPI). This research was made possible in part by a grant from the Carnegie Corporation of New York and has benefited from helpful comments from Joe Koski, John Krieg, Dale Ballou, Jim Wyckoff, Steve Rivkin, Duncan Chaplin, Margit McGuire, Jon Wakefield, Cory Koedel, and Richard Murnane. Finally, we wish to thank Jordan Chamberlain for editorial assistance. The statements made and views expressed in this paper do not necessarily reflect those of the University of Washington Bothell, Washington State, or the Carnegie Corporation. Any and all errors are solely the responsibility of the authors.
References

Aaronson, D., Barrow, L., & Sanders, W. (2007). Teachers and student achievement in the Chicago Public High Schools. Journal of Labor Economics, 25(1), 95–135.
Bechtold, S., & DeWitt, S. (1988). Optimal work–rest scheduling with exponential work-rate decay. Management Science, 4, 547–552.
Borjas, G. J., & Sueyoshi, G. T. (1994). A two-stage estimator for probit models with structural group effects. Journal of Econometrics, 64, 165–182.
Boyd, D., Grossman, P., Lankford, H., Loeb, S., & Wyckoff, J. (2009). Teacher preparation and student achievement. Educational Evaluation and Policy Analysis, 31(4), 416–440.
Branch, G., Hanushek, E., & Rivkin, S. (2012, January). Estimating the effect of leaders on public sector productivity: The case of school principals. CALDER Working Paper 66.
Chingos, M., & Peterson, P. (2011). It's easier to pick a good teacher than to train one: Familiar and new results on the correlates of teacher effectiveness. Economics of Education Review, 30(3), 449–465.
Clotfelter, C. T., Glennie, E., Ladd, H. F., & Vigdor, J. L. (2006a). Teacher bonuses and teacher retention in low performing schools: Evidence from the North Carolina $1,800 teacher bonus program. Public Finance Review, 36(1), 63–87.
Clotfelter, C., Ladd, H., & Vigdor, J. (2006b). Teacher–student matching and the assessment of teacher effectiveness. The Journal of Human Resources, 41(4), 778–820.
Cochran-Smith, M., & Zeichner, K. M. (Eds.). (2005). Studying teacher education: The report of the AERA panel on research and teacher education. Washington, DC: The American Educational Research Association.
Constantine, J., Player, D., Silva, T., Hallgren, K., Grider, M., & Deke, J. (2009). An evaluation of teachers trained through different routes to certification, final report (NCEE 2009-4043). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.
Crowe, E. (2010). Measuring what matters: A stronger accountability model for teacher education. Washington, DC: Center for American Progress.
Darling-Hammond, L., Wise, A., & Klein, S. (1999). A license to teach: Raising standards for teaching. San Francisco, CA: Jossey-Bass.
Duncan, A. (2010). Teacher preparation: Reforming the uncertain profession. Education Digest, 75(5), 13–22.
Glazerman, S., Mayer, D., & Decker, P. (2006). Alternative routes to teaching: The impacts of Teach For America on student achievement and other outcomes. Journal of Policy Analysis and Management, 25(1), 75–96.
Goldhaber, D., & Brewer, D. (2000). Does teacher certification matter? High school teacher certification status and student achievement. Educational Evaluation and Policy Analysis, 22(2), 129–145.
Goldhaber, D. (2007). Everybody's doing it, but what does teacher testing tell us about teacher effectiveness? Journal of Human Resources, 42(4), 765–794.
Goldhaber, D., Gross, B., & Player, D. (2011). Teacher career paths, teacher quality, and persistence in the classroom: Are public schools keeping their best? Journal of Public Policy and Management, 30, 57–87.
Goldhaber, D., Liddle, S., & Theobald, R. (2012). The gateway to the profession: Teacher preparation programs based on student achievement. CEDR Working Paper no. 2012-4. Seattle, WA: Center for Education Data & Research (CEDR), University of Washington.
Goldhaber, D., & Hansen, M. (in press). Is it just a bad class? Assessing the long-term stability of estimated teacher performance. Economica.
Hanushek, E. A., Rivkin, S. G., & Taylor, L. L. (1996). Aggregation and the estimated effects of school resources. National Bureau of Economic Research.
Harris, D., & Sass, T. (2006). Value-added models and the measurement of teacher quality. Unpublished manuscript.
Harris, D., & Sass, T. (2011). Teacher training, teacher quality and student achievement. Journal of Public Economics, 95(7), 798–812.
Henry, G. T., Thompson, C. L., Bastian, K. C., Kershaw, D. C., Purtell, K. M., & Zulli, R. A. (2011). Does teacher preparation affect student achievement? Chapel Hill, NC: Carolina Institute for Public Policy Working Paper, version dated February 7, 2011.
Jackson, C. K. (in press). Match quality, worker productivity, and worker mobility: Direct evidence from teachers. Review of Economics and Statistics.
Jackson, C. K., & Bruegmann, E. (2009). Teaching students and teaching each other: The importance of peer learning for teachers. American Economic Journal: Applied Economics, 1, 85–108.
Karro, J., Peifer, M., Hardison, R., Kollmann, M., & Grunberg, H. (2008). Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Molecular Biology and Evolution, 25(2), 362–374.
Koedel, C., & Betts, J. R. (2007). Re-examining the role of teacher quality in the educational production function. University of Missouri Working Paper.
Koedel, C., Parsons, E., Podgursky, M., & Ehlert, M. (2012). Teacher preparation programs and teacher quality: Are there real differences across programs? University of Missouri Working Paper.
Levine, A. (2006). Educating school teachers. Washington, DC: The Education Schools Project.
McCaffrey, D., Sass, T., Lockwood, J. R., & Mihaly, K. (2009). The intertemporal variability of teacher effect estimates. Education Finance and Policy, 4, 572–606.
Mihaly, K., Lockwood, J. R., McCaffrey, D., & Sass, T. (2010). Centering and reference groups for estimates of fixed effects: Modifications to felsdvreg. The Stata Journal, 10(1), 82–103.
Mihaly, K., McCaffrey, D., Sass, T., & Lockwood, J. R. (2012). Where you come from or where you go? Distinguishing between school quality and the effectiveness of teacher preparation program graduates. CALDER Working Paper.
National Council for Accreditation of Teacher Education. (2010). Transforming teacher education through clinical practice: A national strategy to prepare effective teachers. Washington, DC.
National Council on Teacher Quality. (2007). State teacher policy yearbook: Washington state summary. Washington, DC.
National Research Council. (2010, April). Better data on teacher preparation could aid efforts to improve education. The National Academies Office of News and Public Information. Accessed from: http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=12882.
Nichols, A. (2008). fese: Stata module calculating standard errors for fixed effects. http://ideas.repec.org/c/boc/bocode/s456914.html.
Noell, G. H., Porter, B. A., Patt, R. M., & Dahir, A. (2008). Value-added assessment of teacher preparation in Louisiana: 2004–2005 to 2006–2007. Retrieved from: http://www.laregentsarchive.com/Academic/TE/2008/Final%20Value-Added%20Report%20(12.02.08).pdf.
Obizhaeva, A., & Wang, J. (2013). Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets, 16(1), 1–32.
Padmanabhan, G., & Vrat, P. (1990). An EOQ model for items with stock dependent consumption rate and exponential decay. Engineering Costs and Production Economics, 18(3), 241–246.
Perlmutter, D. (2005, October). We want change: No we don't. The Chronicle of Higher Education.
Rivkin, S., Hanushek, E., & Kain, J. (2005). Teachers, schools, and academic achievement. Econometrica, 73(2), 417–458.
Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review, 94(2), 247–252.
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175–214.
Rubin, D., Stuart, E., & Zanutto, E. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29, 67–101.
Sawchuk, S. (2012, April). Deadlocked negotiators fail to reach consensus on teacher-prep rules. Education Week Teacher Beat.
Summers, L. H. (2012, January). What you (really) need to know. The New York Times.
The Teaching Commission. (2006). Teaching at risk: Progress & potholes. New York, NY.
Todd, P. E., & Wolpin, K. I. (2003). On the specification and estimation of the production function for cognitive achievement. Economic Journal, 113, F3–F33.
U.S. Department of Education, Office of Postsecondary Education. (2011). Preparing and credentialing the nation's teachers: The secretary's eighth report on teacher quality; based on data provided for 2008, 2009 and 2010. Washington, DC.
Wennmalm, S., & Sanford, S. (2007). Studying individual events in biology. Annual Review of Biochemistry, 76, 419–446.
Winters, M., Dixon, B., & Greene, J. (2012). Observed characteristics and teacher quality: Impacts of sample selection on a value added model. Economics of Education Review, 31(1), 19–32.