EdPsychSpecialIssueReplicationAcceptedVersion PDF
This is not the version of record. Accepted for publication in Educational Psychologist.
Author Note
The authors acknowledge and appreciate the constructive criticisms and suggestions for
Johns Hopkins University, 5801 Smith Avenue, Suite 400, Baltimore, MD 21209. Email:
jplucker@jhu.edu
REPLICATION IN EDUCATIONAL PSYCHOLOGY 2
Abstract
Replication is a key activity in scientific endeavors. Yet explicit replications are rare in many fields,
including education and psychology. In this paper, we discuss the relevance and value of replication in
educational psychology and analyze challenges regarding the role replications can and should play in
research. These challenges include philosophical, methodological, professional, and utility concerns
about replication in education and the social sciences more broadly. Finally, we discuss strategies that
Replication is Important for Educational Psychology: Recent Developments and Key Issues
previous results, serving as a de facto reliability check on previous research. Informing stakeholders
about which results can be repeated, and in what circumstances, is the chief value that
replications contribute to research and the public at large. Successful replication of research on positive
outcomes associated with a reading intervention, for example, provides educators and policymakers
with confidence that can justify the investment of scarce public resources in implementation of that
intervention. Conversely, a sensational research result that cannot be replicated provides information to
tends to occur in one of two forms, direct or conceptual replications. When attempting a direct
replication, researchers are attempting to follow the original study’s methods as closely as possible in an
effort to arrive at similar results. The goal of a direct replication is not a thumbs-up/thumbs-down
decision; rather, as Simons (2014) notes, “The end result is not a judgment of whether a single
replication attempt succeeded or failed—it is a robust estimate of the size and reliability of the original
finding” (p. 76). In contrast, the purpose of a conceptual replication is to examine the theoretical
soundness of a particular finding or set of findings, with less focus on repeating exact methods from the
original study. Conceptual replications purposefully alter factors such as participant demographics,
operationalization of dependent variables, or study context (see Schmidt, 2009, Figure 1). Although
there is considerable debate about the value of direct versus conceptual replication (see Simons, 2014;
Stroebe & Strack, 2014) – a debate we explore in greater depth later in this paper – these two categories
are the most common and straightforward way of thinking about types of replication.
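To make concrete the idea that direct replication yields an estimate rather than a verdict, the following minimal sketch pools an original effect with two direct replications. It assumes a fixed-effect, inverse-variance weighted average, chosen here only for illustration and not prescribed by Simons (2014); the function name pooled_effect and all effect sizes and standard errors are hypothetical.

```python
import math


def pooled_effect(effects, std_errors):
    """Return the fixed-effect (inverse-variance) pooled estimate and its standard error.

    Illustrative assumption: every study estimates the same underlying effect.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and of equal length")
    # More precise studies (smaller standard errors) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return estimate, pooled_se


# Hypothetical standardized effects: one small original study, two larger replications.
effects = [0.60, 0.20, 0.20]
std_errors = [0.20, 0.10, 0.10]

estimate, pooled_se = pooled_effect(effects, std_errors)
# Approximate 95% interval under a normal approximation.
ci_low = estimate - 1.96 * pooled_se
ci_high = estimate + 1.96 * pooled_se
print(f"pooled estimate = {estimate:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

In this made-up case the pooled estimate sits much closer to the replications than to the original, because the larger studies carry more weight. A random-effects model would be the more cautious choice if the studies were expected to differ in their true effects.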
Regardless of form, replication is rare, perhaps far too rare. In a series of studies, we have found
low replication rates in the published research bases of psychology, education, special education, gifted
education, and criminology, ranging from 0.13% in education to 1.07% in psychology (Makel & Plucker,
2014b, 2015; Makel et al., 2016; Makel et al., 2012; Pridemore et al., 2018). Other researchers, using
slightly different methods, have arrived at roughly similar rates (Lemons et al., 2016; McNeely &
Warner, 2015). Although we have not specifically examined the presence of replication studies within
educational psychology, the field’s major journals were included in the education study (Makel &
foundational activity within the field yet rarely occurs, it is fair to question whether the field’s impact is
While exploring issues related to replication over the past decade, we often found ourselves in
faculty meetings, conference sessions, and casual conversations with colleagues who asked questions
about the idea of replication (philosophy of replication), strategies for conducting replication studies
replication), and replication’s utility in the field of educational psychology (utility of replication). In this
paper, we attempt to summarize researchers’ current understandings in all four areas, with the caveat
that researchers have not yet answered all of the concerns and questions about replication successfully.
In each section, we provide summaries and analyses of the most recent thinking and research on
replication, in addition to an examination of questions that have yet to be answered. Our hope is that
this paper equips readers with the perspective to form a deeper understanding of how replication should
factor into their own work as well as the field more broadly.1
Philosophy of Replication
1 After we had written most of this paper, we noted the unintended similarity in structure to Zwaan et al. (2018),
who used potential concerns about replication to organize their comments. Although our framing is unique, we
acknowledge the overlap in general structure.
Some educational psychologists have questioned the philosophical basis for replication.
However, the rationale for replication research has strong epistemological foundations related to the
nature of scientific knowledge. Indeed, Collins (1985) called replication the Supreme Court of science.
Schmidt (2017) is more direct, noting that, “... a single observation cannot be trusted,” and that
“replication … is capable of transforming an observation into a fact, or piece of knowledge” (p. 236,
emphasis in original). Several scholars have recently argued replication plays an important role in theory
building and theory assessment (Guest & Martin, 2021; Irvine, 2021; van Rooij & Baggio, 2021).
Perhaps the most cynical framing of the epistemological value of replication can be drawn from
Planck’s observation that science advances one funeral at a time. In his 1949 autobiography, he wrote
that, “a new scientific truth does not triumph by convincing its opponents and making them see the
light, but rather because its opponents eventually die and a new generation grows up that is familiar
with it” (p. 33). More to the point, he observed, “An important scientific innovation rarely makes its way
by gradually winning over and converting its opponents” (p. 97), an idea reinforced by both philosophers
of science (e.g., Kuhn, 2012) and empirical evidence (Azoulay et al., 2019).
But need this historical reality be destiny? Although Azoulay et al. also note the potential value
of having eminent gatekeepers control the flow and dominance of ideas, especially within nascent fields,
educational psychology is a mature field. Regular and planned replication is one tool that can help
education science self-correct more quickly. Why relegate advances in educational psychology to the
passage of time when we have open science tools at our disposal, replication first among them, that can
help advance the field’s knowledge faster and more efficiently? Doing so will allow the field to be more
transparent and democratic about what we know, what we do not know, and when the field is heading
intentionally strawperson questions. First, to examine constraints on generality (Simons et al., 2017) one
can ask, “If we conducted every study with undergrads, could we generalize those results to all students,
of all ages, in all schools?” Second, to determine the need for some replication, one can ask, “Can we
assume with a reasonable degree of confidence that we know when results would generalize across
specific contexts, ages, and cultures, or do we need to collect data to be certain?” This establishes
scenarios where replications add value. Third, to establish confidence in the reliability of existing
research, one could ask, “Do you believe that the existing academic publication process is 100%
error-free?” This helps demonstrate the fallibility of published findings. A fourth question, to determine
confidence in the field’s body of research on a specific topic, is: “Do you think having more confidence in
how research on a given topic generalizes, in which settings and for which students, would help you
Although we acknowledge that epistemological questions remain about replications and how to
appropriately interpret them and decide when to use them (see, e.g., Gervais, 2021), these questions have
evolved over the past generation from “Do replications have value?” to “When do we need replications
and how can we structure them to provide maximum value?” This is a non-trivial, philosophical advance
that helps the field provide greater value via real world application.
The field has provided value for many decades, so why is an enhanced focus on replication needed
now? Calls for replication are not new, and problems in other, related fields suggest educational
psychology could learn from errors in those fields rather than waiting to suffer from them itself. For
example, concerns about replicability have been raised about the research base in a broad range of
fields, from economics to medicine to the life sciences (see Zwaan et al., 2018). Within the social
sciences, psychology has long been considered to have a replication crisis (Pashler & Harris, 2012;
Rosenthal, 1969; Schlosberg, 1951). This has gained attention in recent years due to suspicions of
research misconduct by several prominent researchers. Whether one believes this situation to be
important or overblown, a good crisis should never be wasted. In this vein, Vazire (2018) has reframed
the replicability crisis into a credibility revolution that embraces several methodological strategies,
The presence of a credibility crisis is certainly applicable to education research. The first author
once testified before a state senate education committee and was surprised to hear senators
sarcastically noting that researchers can make their studies say whatever they want. From an empirical
perspective, Merk and Rosman (2019) found evidence that student-teachers held a “smart but evil”
stereotype about education researchers, “as the authors of scientific studies … are perceived not only as
less benevolent, with less integrity, but also as having more expertise in contrast to practitioners. This is
an intriguing finding, as it suggests that student-teachers hold a kind of distrust in scientists” (p. 6). We
do not believe such perceptions by relevant stakeholders help the scientific endeavor in educational
psychology. An increase in credibility among stakeholders is one possible pathway toward gaining
Several scholars have argued that most psychological studies are de facto replication studies,
given that they investigate similar theoretical constructs using different methods than earlier studies; in
other words, most papers are conceptual replications (Smith et al., 2017; Stroebe & Strack, 2014). Chhin
et al. (2018), in a study of IES-funded projects, found no direct replications but concluded that nearly
replications. But failing to label them as such (and take advantage of open science strategies such as
preregistration) disrupts the research process, making it harder for consumers of research to know what
has and has not been replicated (Hodges, 2015; Reich, 2021/this issue).
For these and other reasons, Simons (2014) argued that direct replications are the only way to
verify the reliability of results, a position that is attracting growing levels of support (e.g., Machery,
2019; Nosek & Errington, 2020). Hüffmeier et al. (2016) offer a more nuanced typology involving exact
replications (direct replication conducted by same researchers), close replications (also direct but by
different researchers), constructive (also direct but a similar study modified in a small number of ways to
assess robustness of original effect), conceptual under lab conditions (conceptual attempt to study
theory), and conceptual under field conditions (also conceptual and attempting to study robustness of
theoretical effect). An advantage of the Hüffmeier et al. approach2 is that even if one assumes most
empirical studies are conceptual replications, this typology stresses the importance of systematic and
replication without earlier forms of supportive, direct replication are not adding meaningfully to a field’s
research base (see Zwaan et al., 2018, for a similar argument). Similarly, Irvine (2021) argued that even
the best conceptual replications have a low capacity for theoretical payoff in most circumstances.
The importance of direct replication in no way implies that conceptual replications lack value, as
they help assess generalizability and establish boundary conditions for empirical effects. Comparing the
relative merit of each type of replication should not be about which is better or more important. Rather,
the question is which form of replication is most useful for a given effect at a given time. If a finding is wholly novel,
direct replication may be more useful prior to conceptual replication. If a finding has been observed
That said, conceptual replications may be more informative when they are conducted
systematically through purposefully altering a single variable rather than through changing many
variables. If the population, independent variables, and outcome variables are all changed, making
strong conclusions about replicability of original studies becomes complicated if not impossible. In a
related vein, haphazard conceptual replications may provide little value. For example, is there
a theoretical rationale for assuming there may be a difference in how left-handed and right-handed
students respond to a math tutoring intervention? If not, such a conceptual replication may not be worth
pursuing.
2 See, in particular, Table 1 and Figure 1 in Hüffmeier et al. (2016).
Regardless of how educational psychologists feel about conceptual versus direct replication (or
any other classification system for replication), they should explicitly state their intent; when they fail to
state their intent to replicate a theoretical position or empirical finding, it becomes difficult for the field
to move forward. Explicit intent can be easily systematized, with authors including the following
language in their papers: “We are attempting to [directly/conceptually] replicate the methods used by
[citation] in their study of [key concepts or interventions].” In addition, direct replications should state,
“We kept all methods as similar to the original as possible, with the following exceptions.” Conceptual
replications should state, “Our study methods differ from the study to be replicated in the following
ways.” Such language would add substantial clarity with minimal length and should be included in both
Because children are unique and, therefore, have unique experiences, concerns about
generalizability (or the lack thereof) are important given the variability of students and their contexts.
A belief that no research finding can generalize may be the most extreme version of this concern. We
have encountered such views, often sociocultural in nature (see Turner & Nolen, 2015), but do not find
them compelling. There are numerous educational findings that have been replicated and generalized
across contexts, including the results of sociocultural inquiry (e.g., Coalition for Psychology in Schools
and Education, 2015). Anyone who lacked confidence in generalizability would likely have to believe that
educational psychology should consist only of case studies or action research. Moreover, from this
perspective, publishing these case studies and action research would have little value beyond
descriptive biography because they would not inform practice in other contexts.
Methodology of Replication
Meta-analysis and replication address research quality issues and are complementary processes,
but they have distinct purposes and therefore address research quality in different ways (Patall,
2021/this issue; Valentine, 2019; Williams et al., 2017). Meta-analyses synthesize previous research,
whereas replications seek to verify whether previous research findings are reproducible and, therefore,
accurate. Wide variance in construct definition, instrumentation, sampling, and data analysis, among
other factors, can result in a diverse pool of studies within a meta-analysis, none of which may have
Moreover, meta-analyses and replications solve different problems. Meta-analyses help solve
the problem of heterogeneous results (which may be driven by moderators such as using different
samples or measures). Replications, in contrast, help assess (and address) experimenter bias. Namely, if a
researcher has a bias (e.g., wants to find a specific result, will be rewarded if certain results are
obtained), meta-analyzing multiple studies from the same lab will amplify the bias. From this
Some may view replication as applying only to experimental research. From this perspective,
replication may be viewed as “other people’s problem” by researchers who do not conduct
experimental research. However, replicability and reliability of results are important across all empirical
approaches to study design and data collection. Across a range of disciplines, plentiful examples are
available of quantitative, non-experimental research being subject to replication attempts (e.g., Kanai,
2016; Piffer, 2019). For example, Ebersole et al. (2016), as part of a Many Labs collaboration, attempted
to replicate both experimental and correlational effects in social psychology across undergraduate
research pools to determine whether variations in participation in research across the course of a
semester produced significantly different effects, finding little evidence they did. Assessing whether the
timing of an event influences consequences is of great relevance to all of education and educational
psychology. Ironically, we once had a descriptive study (on replication, no less!) replicated by a team
that was studying the same issues at the same time for the same journal special issue (see Lemons et al.,
Regarding qualitative methods, most readers will not be surprised that this is an area of
considerable debate, with some strong, negative views of the importance of replication (and even the
concept of replicability) to qualitative research (Pratt et al., 2020). However, perspectives are emerging
focusing on the clear communication of methods to facilitate replicable qualitative work (Anczyk et al.,
The application of replication to qualitative research is relatively recent, and therefore the
unanswered questions in this area of scholarship are numerous and important. For example,
does replication apply at all to studies using grounded theory or critical race theory? How can qualitative
research approach independent replication when individual subjectivity or background personal context
plays a central role in many approaches to qualitative interpretation? How can qualitative replication
help build or assess theory? Developing answers to these questions will help inform when and how
“perhaps we should no longer think in terms of qualitative–quantitative divides but rather in terms of
more-less replicable distinctions, and do all that is possible to document all choices and decisions made
throughout a study to enable others to replicate our work” (p. 100). This recommendation is applicable
to all forms of research within educational psychology and the learning sciences. If a research result
(whether quantitative or qualitative) is so narrow and fragile that it can never be found again (even by
the same research team), that result would be of little use to practitioners and policymakers.
Conversely, for example, a series of case studies that all found evidence, hypothetically speaking, that
We often encounter colleagues who note that increased use of replication within educational
psychology will not automatically lead to huge increases in the quality of the field’s research. Indeed,
increased use of replication is a necessary but insufficient strategy for improving research quality (see
also Nosek et al., 2021; Schmidt, 2017). In recent years, a range of strategies have been suggested and
refined for improving how research is conducted and communicated, many falling under the banner of
open science. These wide-ranging approaches include preregistration of hypotheses (i.e., publicly
sharing a study’s methods before the study is conducted), open data, and meta-analysis, among many
others (Chambers, 2017; Cook et al., 2021/this issue; Crüwell et al., 2019; Gehlbach & Robinson,
2021/this issue; Makel & Plucker, 2017; Nosek et al., 2015; Reich, 2021/this issue; Spellman, 2015). To
facilitate these practices, web services are available that allow for posting of preregistrations, data
sharing, and posting of pre-prints (e.g., https://osf.io/, https://edarxiv.org/). Many of these approaches
For example, a recent issue of AERA Open was dedicated to publishing registered reports and
included replications (Reich et al., 2020), with the vast majority being educational psychology research.
The studies were primarily inferential and experimental in nature but included descriptive work (e.g.,
Peters et al., 2019). In the ensuing commentaries (see Reich et al., 2020), many of the authors noted the
example, Merk and Rosman (2020) noted that preregistration requires more detailed methods sections
in papers, and that the discussion section was “more honest and vivid as we could, for example, give
sharper opinions” (p. 1). Another benefit is that the detailed method sections typically found in
preregistered studies facilitate future replication attempts. Any progress toward heightened research
credibility, regardless of whether one refers to it as “open science” or some other term, is a positive
development for educational psychologists and the educators and families whom our work benefits.
Development and testing of theory may be an even more important component of improving
research rigor, but that does not mean that replication is irrelevant (Wentzel, 2021/this issue). Some
have proposed that replications are part of the process of strengthening the empirical portion that
informs the cycle of theory development and assessment (van Rooij & Baggio, 2021). Irvine (2021)
argued that without sufficiently advanced theory, replications may have limited value but that they can
serve as a tool to improve theory. van Rooij and Baggio (2021) share the vivid example of knowing
apples fall from trees (a replicable effect) but needing the theory of gravity to explain the effect and
provide true understanding. In addition, they argued that the development of theory is not only the
ultimate priority but also the foundation of all future empirical efforts (cf. Eronen & Bringmann, 2021).
Regardless of whether researchers believe replications can benefit the field, they may wonder whether
conducting replications will further their own careers.3 Sufficient incentives exist within education to
allay this concern: Replications are cited, unsuccessful replications can be well received, and external
3 In economics, this is called the tragedy of the commons: Replication may be a public good (benefitting all), but
individuals may act as free riders, benefitting when others perform replications but not acting to conduct
replications themselves.
As shown in Makel and Plucker’s (2014b) study of replication in the Top 100 education journals
as ranked by impact factor, the median number of citations for articles being replicated was 31 (range
from 1-7,644) while the median citation count for the replicating paper was 5 (range from 0-135).
Although the replicating papers were cited far less often, the median citation count for those papers
was higher than the impact factor for all of the 100 journals at that time.
Some researchers may be concerned that conducting unsuccessful replications will give one the
reputation of being a critic within their field, which could cause problems when going up for promotion,
being considered for fellowships and other honors, and other aspects of a profession where peer
assessment is valued and important. Although failed replication attempts tend to capture media
attention and be noted on social media, replications in the social sciences tend to be confirmatory.
Successful replication rates tend to be over 65% in education, psychology, economics, and special
education, with some rates in excess of 80% (Camerer et al., 2016; Klein et al., 2014; Lemons et al.,
2016; Makel & Plucker, 2014a; Makel et al., 2012; Makel et al., 2016; cf. Camerer et al., 2018). Although
self-replications tend to be more successful than third-party replications (Makel et al., 2012; Makel &
Plucker, 2014b), replications are still often successful even when conducted by third-party researchers (71% vs.
54%, respectively; Makel & Plucker, 2014b, Table 2). But we note that these estimates are built on
replications that were generally not preregistered. Replicability estimates of preregistered replications
are often lower (e.g., 35%; OSC, 2015), which may result from the many ways success of a replication
Replication research is also increasingly attractive to external funders. Howe and Perfors (2018)
note that “grant agencies greatly value novelty, but they even more greatly value reliable science; a
novel finding can have a long-term impact only if it is true” (p. 25). For this reason, it is not surprising
that the National Science Foundation and the Institute of Education Sciences (IES) now have regular
funding competitions for replication research. Replications have become such a prevalent part of
funding projects that these agencies have jointly published companion guidelines on conducting
replication research (NSF & IES, 2018). This type of action also suggests a cultural shift is occurring,
making replications both a public good for the field and good for one’s career, too.
Finally, replication research may lower opportunity costs within the busy careers of educational
psychologists. Consider some worst-case scenarios: A researcher spends years working on a concept
that, over time, others cannot replicate. Or a group of graduate students devote their research time
to pursuing topics that appear enticing but ultimately fail to replicate. Neither situation is good for one’s
career, but contrast those scenarios with that of an early career researcher who conducts a series of
replications on foundational studies on a particular construct, some successful and some not, that help
improve theory on that construct and create more effective interventions. The latter scenario is of
Replication is Pro-Innovation
Researchers work in communities that reward creativity and innovation. Some may be
concerned that replications distract or detract from creative contributions. Under no circumstances is
this accurate. This concern is often raised or implied by journal editors, who note that they are resistant
to publishing replication papers because they want their journal to focus on creative additions to the
research literature. The attraction to the shiny new object in research is well-known (e.g., Fanelli, 2010;
Hodges, 2015; Howe & Perfors, 2018; Makel, 2014), but this attitude confuses novelty with creativity.
uniqueness, they also require usefulness or utility (e.g., Plucker et al., 2004; Simonton, 2012).
If an idea cannot be replicated, arguing that the idea is useful, especially in an applied field such
as educational psychology, is a difficult case to make (Makel & Plucker, 2014a). Furthermore, given that
innovation can be conceptualized as creativity taken to scale, a finding that cannot be replicated,
regardless of one’s chosen definition of replication, can never inform innovative practice. At best, it can
only misinform practice and mislead educators and policymakers. An irreplicable research result within
educational psychology is neither creative nor innovative; a replicable result is likely to be creative, and
A related aspect involves the primacy effect as applied to research, in which the first study
published on a topic is assumed to be the most valuable. Gelman (2017, 2018) has proposed a
time-reversal heuristic, a thought experiment in which researchers consider how their evaluation of a
theoretical effect changes if an unsupportive direct replication were published first, and the original,
exploratory, supportive study were published second. Most people would be skeptical of the second
study’s results, when in fact both should be considered equally when evaluating the research on the
effect. The time-reversal heuristic would have us consider usefulness before novelty, an admittedly
Is conducting a replication an aggressive act? Not necessarily, given that replication attempts
can be perceived as a form of flattery, in that a researcher’s colleagues are paying attention to their
work. In a field with little to no replication, it is human nature to find any attempt to replicate your work
to be suspicious if not adversarial. Until the research culture changes to embrace replication and other
open science strategies, replication will always have at least a tinge of an adversarial feel. But
researchers can influence the degree to which replication is viewed as constructive versus aggressive.
For example, compare the reactions of two distinct sets of authors whose work has recently
been the subject of unsuccessful replication attempts. In the first, a study of the impact of human-like
avatars on decision-making in a technology context was replicated with mixed and generally negative
results (Simmons & Nelson, 2020). The authors of the original study responded politely, but their
response included several paragraphs exploring why the replication was likely flawed.
However, in the other example, Stafford (2018) authored a study that did not find evidence of
stereotype threat among chess players. A replication of the study found considerable evidence of
stereotype threat (Smerdon et al., 2020). The author of the original study acknowledged the failed
replication, noted ways in which the replication improved on his original study, provided even-handed
analysis on the possible causes for the divergent results, and even defended the replication authors
against subsequent criticisms of the replication (Stafford, 2020). This commitment to building an
accurate research base rather than reflexively defending one’s personal investment in their research is
Many researchers have proposed ways to make replication less adversarial, from systematic
approaches involving changing research culture to specific replication strategies. Gernsbacher (2018)
has pointed to reciprocal replications – in which teams of researchers attempt to replicate each other’s
work – as one path forward. Tierney et al. (2020) proposed a creative destruction approach that
conceptualizes replications as the act of replacing original results with revised results that are more
powerful or more precise. Regardless of the approach, having more frequent replications in educational
psychology will help make them less of an aberration, more difficult to interpret as a personal attack,
more of a key aspect of the educational psychology enterprise, and more successful in improving
As the preceding sections suggest, the conventional wisdom on the value of replication within
educational psychology is changing, albeit slowly and unevenly. There are reasons to be optimistic about
changing attitudes within the field toward replication and open science practices (see Cook et al.,
2021/this issue; Mellor, 2021/this issue). For example, editors from some specialty journals in and
related to the field have published editorials endorsing and implementing open science practices (e.g.,
Adelson & Matthews, 2019; Hodges, 2015; Spector et al., 2015), and journals that feature work from the
field, such as AERA Open, The British Journal of Educational Psychology, Exceptional Children, and
Journal of Educational Psychology, accept registered reports. But to our knowledge few other
educational psychology journals have taken similar steps, and their acceptance of replications is less clear.
When all of the field’s journal editors state unequivocal support for replication and open science
approaches to research, many of the professional concerns surrounding replication will dissipate.
Increasing calls for innovation in how we teach research methods at the graduate level will also
hopefully result in lasting culture change (Gernsbacher, 2018; Spector et al., 2015). For example, Kochari
and Ostarek (2018) have called for making direct replications a required research activity for doctoral
students, and Yeo-Teh and Tang (2020) have suggested a research ethics and integrity course for
doctoral students that emphasizes the responsible conduct of research, including issues such as
preregistration and other open science practices. Using graduate education as a major intervention
point allows educational psychologists to prepare the next generation of innovators as opposed to
slowly adopting and adapting new research practices after everyone has already entered the field.
Utility of Replication
We once led a replication workshop at a federal agency where the participants questioned
whether it was possible to determine if a replication attempt was successful. However, in the absence of
replication, researchers routinely make judgments about the accuracy of collective bodies of research,
dealing with variables in study quality and methodology, but with little information about whether a
given finding is replicable. If we can determine success and failure of original studies, we can determine
success of replications.
Simple heuristics such as comparing whether p-values are similar have their limitations,
including ignoring whether the magnitude of the effect is the same (Valentine, 2019). But this limitation
is not unique to replications; it holds true for original research as well. Several approaches that could be
taken to assess replication success have been proposed, each with its strengths and limitations (Schauer
& Hedges, 2020), including interpreting confidence intervals (Jacob et al., 2019; Zwaan et al., 2018), a
combination of sample and effect sizes (Simonsohn, 2015), and a replication Bayes factor (Zwaan et al.,
2018). Others have noted that more than one replication may often be needed for unambiguous
interpretation of effects (Hedges & Schauer, 2019) or that reproducible results may not lead to a
“convergence to scientific truth” (Devezer et al., 2019, p. 17) because of research and statistical
We have no issue with attempts to create objective criteria for evaluating replication results but
offer a broad approach that is more direct: Do the results give the reader more or less confidence in the
validity of the original findings? More objective and statistical approaches, such as those mentioned
above, may inform the answer to this question, but in the end stakeholders will make subjective
design, magnitude of effects) about the results of any study, and replications are no different. Just as a
specific p value should not automatically equate to action, neither should any other generic statistic. A more
interpretive approach allows educators to consider local context when evaluating, for example, whether
efficacy research on a specific intervention has been replicated. For example, the replications may be
program but not sufficiently supportive to expand the intervention to every school in a large district.
Meehl (1978) similarly emphasized the importance of using relevant theory and other background
knowledge as part of any assessment process and not to rely solely on formal statistical comparison. As
with original findings, we believe context issues such as theory, measurement validity, and relevant
previous findings all need to play a vital role in interpreting replication success.
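To make the contrast between the simple p-value heuristic and an interval-based comparison concrete, the following is a minimal sketch rather than a recommended procedure. It assumes two-group studies summarized by Cohen's d, uses a common large-sample normal approximation for the standard error of d, and relies on invented effect sizes and sample sizes. The function names (summarize_effect, compare_studies) are hypothetical and do not come from any of the cited sources. Consistent with the argument above, the output is meant to inform a reader's judgment, not to replace it.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution


def summarize_effect(d, n1, n2, level=0.95):
    """Approximate SE, confidence interval, and two-sided p-value for Cohen's d.

    Uses a large-sample normal approximation; this is an illustrative
    assumption, not the method of any particular cited paper.
    """
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z_crit = _NORMAL.inv_cdf(1 - (1 - level) / 2)
    p = 2 * (1 - _NORMAL.cdf(abs(d) / se))
    return {"d": d, "se": se, "ci": (d - z_crit * se, d + z_crit * se), "p": p}


def compare_studies(original, replication, alpha=0.05):
    """Contrast the p-value heuristic with a confidence-interval check."""
    low, high = replication["ci"]
    return {
        # Heuristic 1: do both studies cross the same significance threshold?
        "both_significant": original["p"] < alpha and replication["p"] < alpha,
        # Heuristic 2: is the original estimate inside the replication's CI?
        "original_in_replication_ci": low <= original["d"] <= high,
    }


# Hypothetical numbers: a small original study with a large effect and
# a much larger replication with a small effect.
original = summarize_effect(d=0.80, n1=20, n2=20)
replication = summarize_effect(d=0.15, n1=400, n2=400)
result = compare_studies(original, replication)
print(result)
```

With these invented inputs, both studies fall below the .05 threshold, yet the original estimate lies well outside the replication's confidence interval. The two heuristics therefore point in different directions, which is why the magnitude of the effect, and the context in which it was observed, needs to enter the interpretation.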
Interestingly, researchers have spent considerable time debating the flip-side of this coin, the
definitions of “failed” replications. Again, the issue of context is especially relevant to this discussion, as
one can usually explain away a failed or mixed replication attempt as being due to differences in subtle
context or hidden moderators. That may be the case, but the authors of the original study then bear
responsibility for not adequately describing key contextual factors in the success of the original
context is unlikely to be of great use to practitioners who attempt to use the intervention. Gelman
(2018) argues similarly, noting that “various concerns about the difficulty of replication should, in fact,
be interpreted as arguments in favor of replication … if effects can vary by context, this provides more
reason why replication is necessary for scientific progress” (p. 19, emphasis in original). In other words,
context happens, and finding interventions that are useful across contextual differences (or, more to the
point, are useful only within certain, definable contexts) is a goal of educational psychology.
In a related vein, some may wonder whether replications “fail” because the replicators simply
are not as skilled as the researchers involved with the original study. But there is no evidence that
replications fail because replicators lack experience or expertise (Nosek, 2020; Protzko & Schooler,
2020). Moreover, in an applied field like educational psychology, it is important to know whether special
skill is required to elicit a particular effect. What if not all teachers or schools have this elusive trait?
Should they expect effects or not? Those are the questions practitioners want and need answered. If
researchers can develop even marginally informative practices, they would be immediately useful to
Assessing the need for replication is not about a particular replication rate, although we believe
1% is too low. Rather than focusing on replication rates, other criteria must be considered. If one were
to ask how many people should be on anti-cholesterol medication, the answer would not be a
percentage of the population. Instead, the answer would be based on the proportion of individuals who
met specified criteria (e.g., age, cholesterol levels, family history of cardiovascular disease) associated
with problems and benefits relevant to anti-cholesterol medication. The percentage of individuals who
meet those criteria may vary over time and place, but it is those contextual factors that determine
appropriateness and value. Similarly, Irvine (2021) argued, "as the current state of knowledge informs
what counts as a good replication, what counts as a good replication can change" (p. 8). For these
reasons, we find a percent target for educational psychology to be a less-than-ideal lens through which
Rather, we favor a focus on the use of the following criteria for determining whether studies
should be replicated. From an empirical perspective, highly cited studies should be prioritized for
replication. If a study on a new construct or intervention is cited several dozen or several hundred times
in its first year after publication, those citations can be interpreted as the field voting with its feet, so to
speak, about the importance of the study. Regarding qualitative evidence, a set of research findings that
are about to be scaled up for broad implementation or are being included in textbooks and course
Viewed conceptually, a replication adds value if it helps build theory (Guest & Martin, 2021),
narrow or assess a given theory (Irvine, 2021), or assess whether an explanation has particular
boundaries on effects (van Rooij & Baggio, 2021). In particular, high rates of replication may be needed
for studies characterized by weak theory, substantial variability in results across contexts, or theory that predicts
many complex factors affecting outcomes. In the end, the question about replication prevalence boils
down to whether a field has replicated its most important studies or needs them as part of the cycle of
theory development, rather than whether researchers have replicated a specific percent of the field’s
empirical output.
Regarding more frequent use of replication within educational psychology, we need to “Make it
so” (Picard, 2366). An ideal educational system is informed by research evidence that practitioners and
policymakers are confident will lead to desired outcomes. We struggle to see any realistic path toward
this goal that does not include greater use of replication in educational psychology research. The idea
that science is self-correcting is well-established, but such correction does not occur magically. Science is
self-correcting when scientists are self-correcting. Replication attempts help inform the practices in
which we should have confidence and the practices that need correcting. Frequent replication attempts
will help accomplish this more quickly than traditional, largely replication-free approaches to research.
To make replication more common within the field, we see two challenges of adaptation:
people and systems. People challenges (like those mentioned above) often stem from individuals
needing to admit there is a problem. Educational psychology is not alone in needing to change behaviors
to live up to norms (see Vazire, 2018). Psychology, even fMRI research, suffers from reliability and
replicability problems (Elliott et al., 2020). Another people challenge to replications is researchers giving
undue trust and credit to original studies simply because they were published first (again, Gelman’s
time-reversal heuristic).
System challenges that prevent greater replication prevalence include things as fundamental as
the incentive structure within higher education and the broader domain of academic research (Mellor,
2021/this issue). What gets published and funded as well as what helps get people hired, promoted, and
honored needs rethinking. Or perhaps more accurately, perceptions of what gets rewarded matter, and
those perceptions are still largely misaligned with replication and other strategies for improving
educational psychology research. Additionally, undergraduate and graduate methods courses – as well
as courses for pre-service teachers and administrators – need to include replication and its importance.
Another system-level step involves educational psychology journals and professional organizations
adopting the Transparency and Openness Promotion (TOP) Guidelines (Nosek et al., 2015), which
include standards on replication. And venerable groups within the field, such as Division 15 of the
American Psychological Association, should offer awards for high-quality replication attempts
The current discussion and debate about improved research methodology in educational
psychology and other fields should in no way give the impression that adoption of open science
methods is the sole solution for improving the field’s impact. The best studies, even with huge sample
sizes, impeccable design and measurement, and cutting-edge analysis techniques, will not be useful if
based on weak theoretical foundations. Theory matters (Gehlbach & Robinson, 2021/this issue; Smith et
al., 2017; Wentzel, 2021/this issue), and having carefully constructed theories that build on theories and
research of the past serves as the foundation for all we do as applied psychologists (Vartanian, 2017). The
growing discussion of replication’s role in theory development and assessment (e.g., Guest & Martin,
2021; Irvine, 2021; van Rooij & Baggio, 2021) will likely have a major impact on replication use. For
example, making decisions about when replications are needed and what types of replication should be
used may be informed by what they contribute to the development of a particular theory.
Making gains on both people and system challenges to increase use of replication will likely
occur in stages. Developing momentum behind changing cultural norms will require coalition-building
and sustained effort of many within their research communities and departments. To achieve this
culture change, several universities in the United Kingdom have jointly agreed to appoint research
quality officers (Munafò, 2019), sending a clear message across institutions and stakeholder groups
about the need to act collectively to increase research quality and impact. We see no reason why
Conclusion
Replication is a necessary cornerstone for effective scientific endeavors, yet explicit replications
are too rare within educational psychology. In concert with other open science strategies, increased use
of strategically planned and timed replications would improve the overall quality and value of
educational psychology research. Of course, the field needs to explore several aspects of replication to
help maximize its potential benefits, such as how replication efforts can best inform theory
development, the extent to which replication can be applied to various forms of qualitative research,
and how replication can be most effectively incentivized, among other topics. But as Leppink (2017)
noted in an examination of replication across all types of research methodologies, the goal of replication
is to “allow us to work together towards stronger conclusions and implications for future research and
practice” (p. 100). This point is well-taken, and we see it as a fitting goal for educational psychologists,
too. By collaborating and working together to improve the quality of our research, educational
psychologists will create positive outcomes for both research and practice in education and learning.
Replication – in conjunction with strong theoretical foundations and the widespread use of other open
science practices – will help us achieve this goal. These collaborations will help us understand existing
work better, conduct future work more efficiently and effectively, and provide greater value to
References
Adelson, J. L., & Matthews, M. S. (2019). Gifted Child Quarterly’s commitment to transparency,
https://doi.org/10.1177/0016986218824675
Anczyk, A., Grzymała-Moszczyńska, H., Krzysztof-Świderska, A., & Prusak, J. (2019). The replication crisis
and qualitative research in the psychology of religion. The International Journal for the
Azoulay, P., Fons-Rosen, C., & Graff Zivin, J. S. (2019). Does science advance one funeral at a time?
Camerer, C. F., Dreber, A., Forsell, E., Wu, H. (2016). Evaluating replicability of laboratory experiments in
Camerer, C. F., Dreber, A., Holzmeister, F., et al. (2018). Evaluating the replicability of social science
experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2, 637–
644. https://doi.org/10.1038/s41562-018-0399-z
Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A
Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture of
Chhin, C. S., Taylor, K. A., & Wei, W. S. (2018). Supporting a culture of replication: An examination of
education and special education research grants funded by the Institute of Education Sciences.
Coalition for Psychology in Schools and Education. (2015). Top 20 principles from psychology for preK-12
Crüwell, S., van Doorn, J., Etz, A., Makel, M. C., Moshontz, H., Niebaum, J., Orben, A., Parsons, S., &
2604/a000387
Devezer, B., Nardin, L. G., Baumgaertner, B., & Buzbas, E. O. (2019). Scientific discovery in a
model-centric framework: Reproducibility, innovation, and epistemic diversity. PLoS ONE 14(5):
e0216125. https://doi.org/10.1371/journal.pone.0216125
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., ... & Brown,
E. R. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via
https://doi.org/10.1016/j.jesp.2015.10.012
Elliott, M. L., Knodt, K. R., Ireland, D., Morris, M. L., Poulton, R., Ramrakha, S., Sison, M. L., Moffitt, T. E.,
Caspi, A., & Hariri, A. R. (2020). What is the test-retest reliability of common task-fMRI measures? New
10.1177/0956797620916786
Eronen, M. I., & Bringmann, L. F. (2021). The theory crisis in psychology: How to move forward.
Fanelli, D. (2010). Do pressures to publish increase scientists' bias? An empirical support from US States
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0010271
Gehlbach, H., & Robinson, C. D. (2021/this issue). From old school to open science: The implications of
new research norms for educational psychology and beyond. Educational Psychologist, 56(2),
###-###.
Gelman, A. (2017). Beyond “power pose”: Using replication failures and a better understanding of data
https://statmodeling.stat.columbia.edu/2017/10/18/beyond-power-pose-using-replication-
failures-better-understanding-data-collection-analysis-better-science/.
Gelman, A. (2018). Don't characterize replications as successes or failures. Behavioral and Brain
Gernsbacher, M. A. (2018). Three ways to make replication mainstream. Behavioral and Brain Sciences,
Gorgolewski, K. J., Nichols, T., Kennedy, D. N., Poline, J. B., & Poldrack, R. A. (2018). Making replication
Guest, O. & Martin, A. (2021). How computational modeling can force theory building in psychological
Hedges, L. V., & Schauer, J. M. (2019). More than one replication study is needed for unambiguous tests
https://doi.org/10.3102/1076998619852953
DOI:10.1007/s11528-015-0862-2
Hoogeveen, S., Wagenmakers, E., Kay, A., & van Elk, M. (2019, September 27). Compensatory control
https://doi.org/10.31234/osf.io/vqu2x
Howe, P. D., & Perfors, A. (2018). An argument for how (and why) to incentivise replication. Behavioral
Hüffmeier, J., Mazei, J., & Schultze, T. (2016). Reconceptualizing replication as a sequence of different
https://doi.org/10.1016/j.jesp.2015.09.009
Irvine, E. (2021). The role of replication studies in theory building. Perspectives on Psychological Science.
https://doi.org/10.1177/1745691620970558
Jacob, R. T., Doolittle, F., Kemple, J., & Somers, M.-A. (2019). A framework for learning from null results.
Kanai, R. (2016). Open questions in conducting confirmatory replication studies: Commentary on Boekel
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahnik, S., et al. (2014). Investigating variation in
Kochari, A. R., & Ostarek, M. (2018). Introducing a replication-first rule for PhD projects (commentary
on Zwaan et al., ‘Making replication mainstream’). Behavioral and Brain Sciences, 41, 28.
doi:10.1017/S0140525X18000730.
Kuhn, T. S. (2012). The structure of scientific revolutions (4th ed.). University of Chicago Press.
Lemons, C. J., King, S. A., Davidson, K. A., Berryessa, T. L., Gajjar, S. A., & Sacks, L. H. (2016). An
inadvertent concurrent replication. Same roadmap, different journey. Remedial and Special
developments, and the need for replication. Journal of Taibah University Medical Sciences,
12(2), 97-101.
Makel, M. C. (2014). The empirical march: Making science better at self-correction. Psychology of
Makel, M. C., & Plucker, J. A. (2014a). Creativity is more than novelty: Reconsidering replication as a
Makel, M. C., & Plucker, J. A. (2014b). Facts are more important than novelty: Replication in the
Makel, M. C., & Plucker, J. A. (2015). An introduction to replication research in gifted education: Shiny
and new is not the same as useful. Gifted Child Quarterly, 59, 157-164. DOI:
10.1177/0016986215578747.
Makel, M. C., & Plucker, J. A. (Eds.). (2017). Toward a more perfect psychology: Improving trust,
Makel, M. C., Plucker, J. A., Freeman, J., Lombardi, A., Simonsen, B., & Coyne, M. (2016). Replication of
special education research: Necessary but far too rare. Remedial and Special Education, 37, 205-
Makel, M. C., Plucker, J. A., & Hegarty, C. B. (2012). Replications in psychology research: How often do
10.1177/1745691612460688.
Makel, M. C., Smith, K. N., McBee, M. T., Peters, S. J., & Miller, E. M. (2019). A path to greater credibility:
https://journals.sagepub.com/doi/full/10.1177/2332858419891963
McNeely, S., & Warner, J. J. (2015). Replication in criminology: Necessary practice. European Journal of
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of
Mellor, D. T. (2021/this issue). Changing norms in research culture to value transparency over novelty.
Merk, S., & Rosman, T. (2019). Smart but evil? Student-teachers’ perception of educational researchers’
Merk, S., & Rosman, T. (2020). Reflections on the registered report process for “Smart but evil? Student-
1. DOI: 2332858420918158.
Morris, P. A., Connors, M., Friedman-Krauss, A., McCoy, D. C., Weiland, C., Feller, A., Page, L., Bloom, H.,
& Yoshikawa, H. (2018). New findings on impact variation from the Head Start Impact Study:
Informing the scale-up of early childhood programs. AERA Open, 4(2), 1-16. DOI:
2332858418769287. Available at
https://journals.sagepub.com/doi/pdf/10.1177/2332858418769287
Munafò, M. (2019). Raising research quality will require collective action. Nature, 576(7786), 183-183.
doi: 10.1038/d41586-019-03750-7
NSF and IES (2018). A Supplement to the Common Guidelines for Education Research and Development.
Nosek, B. A. [BrianNosek]. (2020, November 13). Summary press release [Twitter thread]. Retrieved
from https://twitter.com/BrianNosek/status/1327296776865525761
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, C. D., Chambers,
G., Chin, G., Christensen, M., Contestabile, A., Dafoe, E., Eich, J., Freese, R., Glennerster, D.,
Goroff, D. P., Green, B., Hesse, M., Humphreys, J., ... & Yarkoni, T. (2015). Promoting an open
https://doi.org/10.1371/journal.pbio.3000691
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Almenberg, A. D., ... & Vazire, S.
Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined.
Patall, E. A. (2021/this issue). Implications of the open science era for educational psychology research
Peters, S. J., Rambo-Hernandez, K., Makel, M. C., Matthews, M., & Plucker, J. A. (2019). The effect of
local norms on racial and ethnic representation in gifted education. AERA Open, 5(2), 1-18. DOI:
10.1177/2332858419848446.
https://journals.sagepub.com/doi/full/10.1177/2332858419848446
Piffer, D. (2019). Evidence for recent polygenic selection on educational attainment and intelligence
inferred from Gwas hits: A replication of previous findings using recent data. Psych, 1(1), 55-75.
https://doi.org/10.3390/psych1010005
Plucker, J. A., Beghetto, R. A., & Dow, G. T. (2004). Why isn't creativity more important to educational
Pratt, M. G., Kaplan, S., & Whittington, R. (2020). Editorial essay: The tumult over transparency:
Pridemore, W. A., Makel, M. C., & Plucker, J. A. (2018). Replication in criminology and the social
032317-091849
Protzko, J., & Schooler, J. W. (2020). No relationship between researcher impact and replication effect:
https://doi.org/10.7717/peerj.8014
Reich, J. (2021/this issue). Pre-Registration and Registered Reports. Educational Psychologist, 56(2), ###-
###.
Reich, J., Gehlbach, H., & Albers, C. (2020, May). AERA Open special topic on preregistered reports. AERA
Open. https://journals.sagepub.com/page/ero/collections/registered-reports
Rosenthal, R. (1969). On not so replicated experiments and not so null results. Journal of Consulting and
Schauer, J. M., & Hedges, L. V. (2020). Reconsidering statistical methods for assessing replication.
https://doi.org/10.1037/met0000302
Schindler, C., Veja, C., Hocker, J., Kminek, H., & Meier, M.
doi:10.1037/h0056148
Schmidt, S. (2017). Replication. In M. C. Makel & J. A. Plucker (Eds.), Toward a more perfect psychology:
Improving trust, accuracy, and transparency in research (pp. 215-232). American Psychological
Association.
Simmons, J., & Nelson, L. (2020, May 20). Do human-like products inspire more holistic judgments?
http://datacolada.org/87
Simons, D. J. (2014). The value of direct replication. Perspectives on Psychological Science, 9(1), 76-80.
https://doi.org/10.1177/1745691613514755
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on Generality (COG): A proposed addition to
https://doi.org/10.1177/1745691617708630
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results.
https://doi.org/10.1177/0956797614567341
Simonton, D. K. (2012). Taking the U.S. Patent Office criteria seriously: A quantitative three-criterion
creativity definition and its implications. Creativity Research Journal, 24(2-3), 97-106.
Smerdon, D., Hu, H., McLennan, A., von Hippel, W., & Albrecht, S. (2020). Female chess players show
0956797620924051.
Smith, J. K., Smith, L. F., & Smith, B. K. (2017). The reproducibility crisis in psychology: Attack of the
clones or phantom menace? In M. C. Makel & J. A. Plucker (Eds.), Toward a more perfect
Spector, J.M., Johnson, T.E., & Young, P.A. (2015). An editorial on replication studies and scaling up
https://doi.org/10.1007/s11423-014-9364-3
Spellman, B. A. (2015). A short (personal) future history of revolution 2.0. Perspectives on Psychological
Stafford, T. [TomStafford]. (2020, May 20). Is stereotype threat in chess real after all? [Twitter thread]
Steinhardt, I. (2020). Learning open science by doing open science. A reflection of a qualitative research
https://content.iospress.com/download/education-for-information/efi190308?id=education-
for-information%2Fefi190308
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on
Tierney, W., Hardy, J. H., Ebersole, C. R., Leavitt, K., Viganola, D., Clemente, E. G., Gordon, M., Dreber, A.,
Johannesson, M., Pfeiffer, T., Hiring Decisions Forecasting Collaboration, & Uhlmann, E. L.
(2020). Creative destruction in science. Organizational Behavior and Human Decision Processes,
Turner, J. C., & Nolen, S. B. (2015). Introduction: The relevance of the situative perspective in
10.1080/00461520.2015.1075404
Valentine, J. (2019). Expecting and learning from null results. Educational Researcher, 48(9), 611-613.
https://doi.org/10.3102/0013189X19891440
van Rooij, I. & Baggio, G. (2021). Theory before the test: How to build high-verisimilitude explanatory
https://doi.org/10.1177/1745691620970604
Wentzel, K. R. (2021/this issue). Open science reforms: Strengths, challenges and future directions.
Williams, R. T., Polanin, J. R., & Pigott, T. D. (2017). Meta-analysis and reproducibility. In M. C. Makel & J.
A. Plucker (Eds.), Toward a more perfect psychology (pp. 255-270). American Psychological
Association.
Yeo-Teh, N. S. L., & Tang, B. L. (2020). Research ethics courses as a vaccination against a toxic research
Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication mainstream. Behavioral