REVIEW ARTICLE
The Effectiveness of Corrective Feedback
in SLA: A Meta-Analysis
Shaofeng Li
Michigan State University
Introduction
Corrective feedback in second language acquisition (SLA) refers to the responses to a learner's nontargetlike L2 production. The past decade has seen a rapid growth of empirical research on its effectiveness.
An earlier version of this article was presented at the Annual Conference of the American As-
sociation of Applied Linguistics in Denver in March 2009. I would like to thank Susan Gass for
her generous help in all phases of this study. I am indebted to Shawn Loewen, Patti Spinner, Fred
Oswald, and Luke Plonsky for their help with various aspects of the project. My thanks also go
to the anonymous reviewers and Scott Jarvis, the associate editor of Language Learning, who
provided invaluable comments on the previous versions of this article.
Correspondence concerning this article should be addressed to Shaofeng Li, A-742 Wells Hall,
Michigan State University, East Lansing, MI 48824. Internet: lishaofe@msu.edu
DOI: 10.1111/j.1467-9922.2010.00561.x
Background
Corrective Feedback in SLA
The effectiveness of corrective feedback is mainly attributable to the negative
evidence it entails. According to Gass (1997), language learners have access to
two types of input: positive evidence and negative evidence. Positive evidence
informs the learner of what is acceptable in the target language and contains
“the set of well-formed sentences to which learners are exposed” (p. 36).
Negative evidence, in contrast, provides the learner with information about
the incorrectness of an L2 form or utterance and is often realized through the
provision of corrective feedback in response to the learner’s nontargetlike L2
production. The distinction between the two types of input gives rise to the
question of whether both types of evidence are necessary or if exposure to
positive evidence is the only necessary condition for L2 learning.
2004); (b) recasts facilitate SLA (Han, 2002; Mackey & Philp, 1998); (c) dif-
ferent types of feedback have differential impact on SLA—explicit feedback is
more effective than implicit feedback (Ellis et al., 2006) and prompts work bet-
ter than recasts (Ammar & Spada, 2006; Lyster, 2004)—and (d) the occurrence
of uptake varies in different contexts (Lyster & Mori, 2006) and is constrained
by the characteristics of feedback (Loewen & Philp, 2006).
Although corrective feedback has generally been found to benefit L2 acquisition, providing sound pedagogical guidance to L2 practitioners and identifying research directions for L2 researchers requires an accurate and holistic picture of how effective it is across different studies, how
different feedback types impact L2 learning differently, and how the effective-
ness of corrective feedback is mediated by the idiosyncratic methodological
features of primary studies. To answer these questions, it is necessary to con-
duct a comprehensive research synthesis, and meta-analysis has been shown to
be one of the most effective tools for research synthesis.
The following sections discuss what previous meta-analyses (Mackey & Goo, 2007; Norris & Ortega, 2000; Russell & Spada, 2006) have revealed about the effectiveness of corrective feedback.
Norris and Ortega (2000) meta-analyzed the empirical studies published
between 1980 and 1998 on the effectiveness of L2 instructional treatments. In
the analysis, L2 instruction was identified as either focus on form or focus on
forms, depending on whether the teaching of linguistic forms was embedded in
meaningful communication or was implemented in the absence of context, and
as either explicit instruction or implicit instruction, depending on how much
learner attention was drawn to linguistic forms. In general, it was found that the
average effect size for focus-on-form treatments was slightly larger than that
for focus-on-forms treatments and that explicit instruction had substantially
larger effect sizes than implicit instruction.
Within the broad domains of focus on form and focus on forms, Norris
and Ortega (2000) also investigated the effect sizes of some subgroups of
studies, including the two groups of studies examining the efficacy of recasts
and metalinguistic feedback, respectively. The average effect size for recasts
was d = 0.81, and for metalinguistic feedback, it was d = 0.96, suggesting a
larger effect for the latter, a more explicit type of feedback. It must be noted that, due to the small number of included studies that investigated recasts, the confidence interval was wide (–0.43 to 2.05) and included zero, precluding any firm conclusion that the true effect differed from zero. In addition to recasts and metalinguistic
feedback, other feedback types such as clarification were also included in the
analysis, but because their effect sizes were not separately calculated (either
because of the focus of the analysis or because of the small number of related
primary studies contributing effect sizes), exactly how effective they were is
not known.
Russell and Spada’s (2006) analysis specifically examined the effectiveness
of corrective feedback. The included studies were published between 1988
and 2003. It encompassed both studies examining oral feedback and those
examining written feedback; it also investigated corrective feedback (be it oral
or written) provided to written errors (i.e., grammar errors in L2 writing). It was
found that the mean effect size for all treatments was 1.16 and that overall oral
feedback had a smaller effect size than written feedback, although both effect
sizes were large. Because the number of studies included in the analysis was relatively small (k = 15), the meta-analysts cautioned against generalizing the findings, especially when it comes to the effect sizes of the subgroups.
Additionally, due to the lack of primary research on the effects of individual
feedback types, the meta-analysts did not distinguish between feedback types
or carry out separate analyses for them.
This meta-analysis sought to fill these gaps and answer the remaining questions by taking the following steps: (a) It included unpublished dissertations to minimize publication bias; (b) it excluded studies investigating feedback following errors in the learner's written production, on the assumption that those studies involve different constructs; (c) it identified variables that had not been dealt with in previous analyses, such as feedback delivery mode and publication type; (d) it examined corrective feedback as the sole construct so that a clearer picture of this type of L2 instruction could be obtained; (e) it utilized both the fixed-effects model and the random-effects model to show a more comprehensive picture of the topic under investigation; and (f) it followed the principle of "one study, one effect size" as much as possible to minimize sample size inflation and nonindependence.
Research Context
Descriptive studies (Lyster & Mori, 2006; Sheen, 2004) have shown that the occurrence and uptake of corrective feedback were very different across research or instruction settings, but experimental studies have not singled out this variable as an independent variable. Research context can be divided into foreign language (FL) and second language (SL). A foreign language context is one where the learner studies a language that is not the primary language of the linguistic community (e.g., an L1 Korean speaker learning English in Korea); a second language context is one in which the learner's target language is the primary language of the linguistic community (e.g., an L1 Korean speaker learning English in the United States). Because the dynamics of these two contexts are different, the effects of feedback are likely to differ.
Research Setting
Feedback studies have been conducted in both the laboratory and the classroom. In the laboratory, distraction is minimized and instructional interventions can be better implemented than in the classroom. Classroom feedback studies are mostly described as quasi-experimental because distractor variables cannot be easily or entirely controlled. In light of the differences between the laboratory and the classroom, there is reason to believe that the effects of feedback may not be the same across the two settings.
Task Type
Feedback can be provided in communicative activities (focus-on-form activi-
ties), in which linguistic forms are attended to in meaningful communication; it
can also be provided in mechanical drills (focus-on-forms activities), in which
the primary focus is on linguistic forms and in which feedback is supplied on
an item-by-item basis. Feedback provided in these two remarkably different
task types may lead to different learning outcomes.
Mode of Delivery
Mode of delivery refers to whether feedback is provided through the computer
or in face-to-face communication. Sagarra (2007) stated that feedback provided
through the computer is more salient; one anonymous reviewer pointed out that
computerized feedback might be more consistent. Therefore, the possibility
exists that the mode of delivery may impact the effects of feedback. To date,
no empirical research has been done to compare the two modes of delivery.
Outcome Measure
As previous meta-analyses showed (Mackey & Goo, 2007; Norris & Ortega, 2000), primary researchers used varied test formats to measure the effects of
L2 instruction, and outcome measure did mediate the effects of instruction.
Determining how it would impact the effects of corrective feedback or how the
effects of feedback are reflected by different test types is one of the objectives
of this meta-analysis.
Publication Type
It is generally believed that studies with significant findings are more likely to
be published. This is called publication or availability bias in meta-analysis—“a
tendency on the part of researchers, reviewers, and editors to submit, accept, and
publish studies that report statistically significant results consistent with theo-
retical or previously established empirical expectations" (Cornell & Mulrow, 1999).
Length of Treatment
In feedback research, the duration of treatment ranges from 15 min (Chen, 1996)
to a semester (Tomasello & Herron, 1989). Although the impact of this variable
must be investigated together with other variables such as the complexity of
linguistic structure, the intensity of feedback (Norris & Ortega, 2000), learner
differences, and so on, it is interesting to examine whether treatment length
alone has any influence on the effects of feedback.
Age
Descriptive research (Mackey, Oliver, & Philp, 1997; Oliver, 2000) has indi-
cated that children were different from adult learners in the way they responded
to and used feedback. Primary studies on corrective feedback have been con-
ducted with adult or child L2 learners, but no study has examined age as an
independent variable. This meta-analysis seeks to determine if learners’ age
mediates the effectiveness of corrective feedback.
As discussed, these so-called methodological or learner characteristics that
potentially affect the effectiveness of corrective feedback have not been inves-
tigated in primary research. However, they may become independent variables
if they have a substantial effect. That being the case, the findings of primary
studies must be reinterpreted in conjunction with these factors. Although these
variables have not been examined in primary studies, their impact can be identi-
fied by performing a meta-analysis in which the effect sizes generated by these
studies are compared and synthesized.
This meta-analysis seeks to answer the following research questions:
1. What is the overall effect of corrective feedback on L2 learning?
2. Do different feedback types impact L2 learning differently?
3. Does the effectiveness of corrective feedback persist over time?
4. What are the moderator variables for the effectiveness of corrective
feedback?
Method
Identifying Primary Studies
The following steps were taken to locate related primary studies. First, two
commonly used electronic databases in the fields of applied linguistics and
education, LLBA and ERIC, were searched. The key words and combination
of key words that were used include corrective feedback, feedback, implicit
feedback, explicit feedback, negative evidence, negative feedback, error cor-
rection, negotiation, recasts, metalinguistic feedback, prompts, clarification,
second language acquisition/learning, foreign language education/learning,
focus on form, focus on forms, and form-focused instruction. Second, both elec-
tronic and manual searches were performed for the current and back issues of
some widely cited journals in SLA and applied linguistics, including, but not
limited to, Language Learning, Studies in Second Language Acquisition, Ap-
plied Linguistics, The Modern Language Journal, TESOL Quarterly, Foreign
Language Annals, Language Teaching Research, System, The Canadian Mod-
ern Language Review, International Review of Applied Linguistics, Computer
Assisted Language Learning, and Language Learning and Technology. Third,
state-of-the-art articles (e.g., Ellis & Sheen, 2006; Felix, 2005; Nicholas et al.,
2001) and edited books, course books, and book chapters related to corrective
feedback (e.g., Doughty & Long, 2003; Gass, 2003; Gass & Selinker, 2001;
Long, 2007; Mackey, 2007), as well as their reference sections, were scanned
for potential sources of primary research. Fourth, the reference sections of the
published meta-analyses associated with corrective feedback were carefully
examined.
Finally, in order to minimize availability bias or the “file-drawer” problem
(the fact that some fugitive literature might be tucked away in researchers’
file cabinets), this meta-analysis included Ph.D. dissertations. The existence
of availability bias is evidenced by Rosenthal’s (1984) finding that averaged
d values yielded by theses and dissertations were at least 40% less than those
from other sources. Because of the possible presence of availability bias, experts
in meta-analysis (Hunter & Schmidt, 2004; Konstantopoulos & Hedges, 2004;
Lipsey & Wilson, 2001) have been calling for the inclusion of unpublished
studies in meta-analyses.
Initially, the researcher considered obtaining as much “fugitive” literature
as possible, including conference presentations, manuscripts in press, and so
on, but due to the difficulty involved in retrieving those materials, it was
decided that only Ph.D. dissertations would be included. In light of the fact that
most dissertations are carefully designed and provide detailed information on
research methodology and statistical analyses, it is justified to include them in a
meta-analysis. The electronic database ProQuest Dissertations and Theses was
utilized to search for dissertations. The key words used in search of published
studies were also used to search for dissertations. After related dissertations
were identified, they were requested and obtained through the InterLibrary Loan
service at Michigan State University.
Inclusion/Exclusion Criteria
A study must have had the following characteristics to be included in this
meta-analysis:
1. One of the independent variables was corrective feedback, either in the
form of recasts, metalinguistic feedback, explicit correction, negotiation
(clarification request, confirmation check, elicitation, and repetition) and
so on, or a combination of different feedback types.
2. Feedback was delivered either face-to-face or via the computer.
3. It was experimental or quasi-experimental and had a control group or a
group that could be considered a comparison group (i.e., no feedback
treatment or least amount of feedback treatment) so that learning effects
after treatment could be observed by comparing the gains of experiment
groups and those of the control or comparison group.
4. The effect of feedback could be disentangled from the effects of other
treatments. This specification made it possible to include studies in which
instructional intervention included feedback as well as other instructional
types. For instance, one study that was included in this analysis but excluded
from Mackey and Goo’s (2007) study is by Lyster (2004), who examined
four conditions: FFI (form-focused instruction) + recasts, FFI + prompts,
FFI-only, and control. It was excluded because the researchers argued
that the FFI + recasts and FFI + prompts groups involved two types
of instruction—FFI and feedback—so it was difficult to tease them out.
However, if the FFI-only group, instead of the control group that received
no treatment, serves as the comparison group, any effect based on the
comparison between this group and the FFI + recasts or FFI + prompts
group must be due to the presence or absence of feedback.
5. The dependent variable measured the learning of an L2 feature, be it
morphosyntactic, lexical, or phonological.
6. It was published in English.
7. It utilized statistical analyses that investigated mean differences. Although
it is possible to convert one effect size index to another (e.g., from r to d and
vice versa), meta-analyzing studies that use different effect size measures
usually does not generate interpretable results (Lipsey & Wilson, 2001).
8. It examined the effect of corrective feedback on either child L2 learners or adult L2 learners (following Mackey & Goo, 2007).
A study was excluded from this analysis for the following reasons:
Coding
Because of the cumbersome, complicated, and important nature of coding for
meta-analysis, the creation of a coding scheme was a cyclic process that in-
volved repeated modifications and revisions. At first, 20% of the retrieved
studies were examined to generate a preliminary scheme identifying the inde-
pendent and dependent variables and methodological features, which were then
categorized and given labels that applied to as many studies as possible. The
coding protocols of previous meta-analyses were also consulted in the estab-
lishment of the preliminary scheme. To ensure coding reliability, the primary
studies went through a total of five rounds of coding. At the completion of the
third round, a second coder (a meta-analyst) coded 11 out of the 33 retrieved
studies independently, including 6 dissertations and 5 published articles. The
second coder was asked to pay particular attention to high-inference variables
such as feedback type and outcome measure. The agreement rate was 98%, and
differences were resolved through discussion. Fourth and fifth rounds of coding
were performed to make sure that all of the data were coded in compliance with
the protocol both coders agreed upon.
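As an illustration only, the agreement rate reported above amounts to the following computation; the function and the data layout are hypothetical, since the original coding was done by hand rather than in code.

```python
def agreement_rate(coder_a, coder_b):
    """Proportion of parallel coding decisions on which two coders agree."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Example: agreement on 3 of 4 coding decisions yields 0.75.
print(agreement_rate(["recast", "explicit", "GJT", "lab"],
                     ["recast", "explicit", "GJT", "classroom"]))
```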
Feedback Type
Partly because of the variety of feedback types investigated by primary re-
searchers and partly because of the different ways to operationalize the same
feedback types in different studies, the coding of feedback type posed a great
challenge. On the one hand, the categories must be general enough to encom-
pass as many studies as possible; on the other hand, the categories must be
specific enough to maintain the unique features of primary studies. In this
meta-analysis, two schemes were developed with regard to feedback type.
First, feedback was identified as reported and defined in primary studies and
the original labels were maintained. This scheme makes it possible to compare
the effects of the most frequent corrective strategies in the data set. In coding
feedback type, care was taken to make sure that the categories that were used
were consistent across studies regardless of the idiosyncrasy involved when
primary researchers labeled the types of feedback they investigated. Feedback
types were categorized and labeled according to the scheme developed by
Lyster and his colleagues (Lyster, 1998, 2001, 2004; Lyster & Mori, 2006;
Lyster & Ranta, 1997). Recasts refer to partial or complete reformulation of the
learner’s erroneous utterance; explicit correction is defined as the provision of
the correct form while clearly indicating that the learner’s utterance is wrong;
metalinguistic feedback refers to metalinguistic comments or information about
the learner’s utterance; elicitation refers to the interlocutor’s (teacher or native
speaker) attempt to elicit a reformulation from the learner by asking questions
such as “How do we say this in English?”; clarification requests ask the learner
to clarify his/her utterance through questions such as “Pardon me?” or “I don’t
understand”; repetition is a move where the interlocutor repeats the learner’s
ill-formed utterance.
The second scheme coded feedback types into implicit feedback and explicit
feedback. Where it was impossible to classify certain feedback types in terms
of explicitness/implicitness, such as when a feedback type was operationalized
as containing both implicit and explicit feedback (such as “prompts” in Lyster,
2004, which included clarification, elicitation, repetition, and metalinguistic
feedback), their original labels were maintained and they were not included for
analysis when the implicit versus explicit comparison was made. Although the
explicitness of feedback varies along a continuum and even the same feedback
type, such as recasts, can vary in explicitness, it is generally agreed that recasts
are toward the implicit end and explicit correction and metalinguistic feedback
are at the explicit end (Ellis et al., 2006; Lyster, 1998). Corrective feedback
in the form of clarification and elicitation was classified as implicit (Carroll &
Swain, 1993). Consequently, in this meta-analysis, implicit feedback included
recasts, negotiation (clarification requests, elicitation, and repetition), and any
type of feedback that was not intended to overtly draw the learner’s attention to
his/her erroneous production; explicit feedback included metalinguistic feed-
back, explicit correction, and any feedback type that overtly indicated that the
learner’s L2 output was not acceptable (such as “explicit hypothesis rejection”
in Carroll & Swain, 1993). The implicit versus explicit dichotomy is necessary
because it has been argued that explicit feedback is superior to implicit feedback
in SLA because the former is more salient (Carroll & Swain, 1993; Ellis et al.,
2006).
As one anonymous reviewer pointed out, the boundary between explicit and
implicit feedback cannot be easily drawn and there are different ways to clas-
sify feedback types. For instance, Lyster and Ranta (1997) argued that feedback
types should be categorized according to whether learner repair is encouraged:
Recasts and explicit correction supply the correct form and therefore do not
encourage learner repair, whereas prompts (which include metalinguistic feed-
back, elicitation, clarification, and repetition) withhold the target form and en-
courage self-correction. Loewen and Nabei (2007) pointed out that recasts and
explicit correction could be labeled “other repair” and prompts “self-repair.” It
would be interesting to meta-analyze the effectiveness of prompts in comparison
with recasts and explicit correction. However, there have been only two studies
that investigated the effects of prompts compared with recasts. Most primary
researchers have examined the effects of individual feedback types and opera-
tionalized and discussed the results in terms of the explicitness/implicitness of
the feedback. Therefore, the “explicit versus implicit” scheme was used in this
meta-analysis.
Outcome Measure
Following Norris and Ortega (2000), measures of treatment effect were coded
as metalinguistic judgments (or grammaticality judgment tests [GJTs]) if learners
were required to make a judgment on the grammaticality of some target struc-
tures; as selected responses if learners were asked to choose the correct answer
among several alternatives; as constrained constructed responses if learners
were required to produce the tested forms in tasks where the use of the tar-
get structure was essential; and as free constructed responses if learners were
required to produce the target language without many constraints.
Timing of Posttests
Following Keck, Iberri-Shea, Tracy-Ventura, and Wa-Mbaleka (2006), a test
was defined as an immediate posttest if it was taken less than 7 days after the
treatment, as a short-term delayed posttest if it was administered 8–29 days
after the treatment, and as a long-term delayed posttest if it happened 30 days
or later after the treatment. In cases in which the posttesting time frame of a
primary study did not match the scheme of this meta-analysis, it was coded
to fit into the scheme. For instance, in Bationo (1991), the first and second
posttests were both administered within 7 days after the treatment, but only the
first was included and coded as “Post 1.”
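To make the cutoffs concrete, the coding rule can be sketched as follows; this is an illustrative reconstruction (the text leaves day 7 itself unassigned, and it is folded into the immediate category here):

```python
def code_posttest_timing(days_after_treatment: int) -> str:
    """Posttest timing categories, following Keck et al. (2006)."""
    if days_after_treatment <= 7:      # "less than 7 days"; day 7 folded in here
        return "immediate"
    elif days_after_treatment <= 29:   # 8-29 days after treatment
        return "short-term delayed"
    else:                              # 30 days or later
        return "long-term delayed"

print(code_posttest_timing(3))    # immediate
print(code_posttest_timing(14))   # short-term delayed
print(code_posttest_timing(45))   # long-term delayed
```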
Measures of Proficiency
As in Keck et al.’s study (2006; also see Thomas, 1994), a proficiency measure
was coded as an impressionistic judgment if the participants’ proficiency level
was based on the researcher’s personal evaluation; as institutional status if
learners’ proficiency was assessed on the basis of their enrollment in a language
class or program; as in-house assessment if a placement test or a test created
by the researcher was used; and as a standardized test if the participants’
proficiency was calibrated according to their performance on an established
test such as TOEFL or the ACTFL Proficiency Guidelines. Because of the high
degree of heterogeneity in primary researchers’ use of proficiency measures,
this variable was not included in the moderator analyses.
Length of Treatment
Three categories were identified as far as length of treatment is concerned.
If the duration of a treatment was 50 min or less, it was coded as a “short
treatment”; if it was between 60 and 120 min, it was considered “medium”; if
it was over 120 min, it was considered “long.” It should be noted that the cutoff
points for the length of treatment were arbitrary. In Norris and Ortega’s (2000)
meta-analysis, four categories of treatment length were identified: brief (less
than 1 hr), short (over 1 hr but less than 2 hr), medium (from 3 to 6 hr), and
long (over 7 hr). In this meta-analysis, three, rather than four, categories were
created due to the relatively small sample size; the boundaries of each category
were delineated in the way that better fit with the distribution of the studies in
the data set in terms of duration of treatment.
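In the same illustrative spirit, the treatment-length rule reduces to the following sketch (the 51–59 min gap between the stated boundaries is folded into the medium category here):

```python
def code_treatment_length(minutes: int) -> str:
    """Treatment-length categories; cutoffs are arbitrary, as noted above."""
    if minutes <= 50:      # 50 min or less
        return "short"
    elif minutes <= 120:   # 60-120 min; the unstated 51-59 min gap lands here
        return "medium"
    else:                  # over 120 min
        return "long"
```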
Task Type
Tasks that involved meaningful communication were coded as “communica-
tive.” Such tasks include information gap, jigsaw, decision making, and so on,
the focus of which is on fulfilling a task rather than linguistic forms per se.
Tasks that focused on linguistic features and that required the learner to engage
in mechanical practice were coded as “drill.” An example of such a task is when
a learner is required to answer discrete questions by using the target structure
to be learned, followed by corrective feedback about the answers. Tasks that
did not fit in either category were recorded as “miscellaneous,” such as when a
task contained both drills and communicative activities.
Learners’ Age
Learners’ age was coded as follows. For studies that reported participants’
average age, the original information was recorded; for studies that reported
participants’ enrollment at school, such as “university students,” “freshmen,”
and so on, their age was estimated (e.g., 12 for "sixth graders" and 18 for
“freshmen”); for studies that reported a narrow range such as “18–20,” the
median (19) was taken as the average age; for studies that did not provide any
related information or provided a wide range such as “18–55,” they were coded
as such and were not included when the age effect was investigated. Because
of the lack of studies dealing with child L2 learners (n = 3), which makes it
difficult to determine the differential effects of corrective feedback on child
and adult learners as separate groups, in this meta-analysis learners’ age was
investigated as a continuous moderator variable.
Analysis
All the analyses were performed by using professional meta-analysis software
called Comprehensive Meta-Analysis (CMA; Borenstein, Hedges, Higgins, &
Rothstein, 2005), which has been developed by a group of experts from the
United States and the United Kingdom. An almost “all-purpose meta-analysis
program” (Hunter & Schmidt, 2004, p. 466) and “probably the most sophisti-
cated stand-alone package for meta-analysis” (Littell, Corcoran, & Pillai, 2008,
p. 146), it has many features that allow users to perform analyses that would
otherwise be impossible, such as calculating effect sizes based on different data
formats, yielding results for both the fixed-effects and random-effects models,
using Q-tests to detect significant moderators, plotting availability bias, and
so on. Ever since the program was developed, it has been used in numerous
meta-analyses in various academic fields (e.g., Juffer & van IJzendoorn, 2007; LeBauer & Treseder, 2008; Richardson & Rothstein, 2008) and has proven to be an effective meta-analytic tool.
Fixed-Effects Versus Random-Effects Models
There are two models of meta-analysis that are based on different assumptions:
fixed-effects (FE) models and random-effects (RE) models (Cornell & Mulrow,
1999; Hedges, 1994; Hunter & Schmidt, 2004; Raudenbush, 1994). FE models
are based on the assumption that the population effect size is the same in
all the studies included in the meta-analysis and any variation between the
studies is attributable to sampling variability. RE models allow variation of the
true population effect in the included studies, and the variation results from
heterogeneous factors. Choosing between the FE and RE models is not easy
and often “involves a considerable amount of subjective judgment” (Cooper &
Hedges, 1994, p. 526). Because the two models generate somewhat different
results, reporting the estimates of either model alone would be misleading.
Therefore, in this meta-analysis, the results from both models are reported
(e.g., Patall, Cooper, & Robinson, 2008; Shadish, 1992) to present a more
comprehensive picture of the included studies.
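In standard notation (a sketch that the article itself does not spell out, with $d_i$ the effect size contributed by study $i$ and $v_i$ its sampling variance), the two models differ only in how studies are weighted:

\hat{\theta}_{FE} = \frac{\sum_i w_i d_i}{\sum_i w_i}, \qquad w_i = \frac{1}{v_i}

\hat{\theta}_{RE} = \frac{\sum_i w_i^{*} d_i}{\sum_i w_i^{*}}, \qquad w_i^{*} = \frac{1}{v_i + \hat{\tau}^2}

where \hat{\tau}^2 is the estimated between-study variance (e.g., the DerSimonian–Laird estimator). Under the FE assumption, \hat{\tau}^2 = 0, so the two estimates coincide when no between-study heterogeneity is detected.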
Effect Size Calculation
The following contrasts were identified for effect size calculation:
1. If a study had a control group and the only difference between the control
group and experiment groups is the presence or absence of corrective
feedback, effect sizes were calculated by comparing each treatment group
with the control group.
2. If a study had a control group that did not receive any treatment and
the experiment groups received another type of instruction in addition to
corrective feedback such that the effect of feedback could not be singled out
by comparing the control group and the experiment groups, effect sizes
were calculated by using the one experiment group as the comparison
group that differed from the other experiment groups only by the presence
or absence of feedback (e.g. Lyster, 2004; Mackey & Philp, 1998).
3. If a study had no control group, effect sizes were calculated by using the
group that received the least amount of feedback as the comparison group
that provided baseline data.
Equations used in effect size calculation were selected based on the data formats reported in primary studies. For studies that reported group means and standard deviations, Cohen's d was calculated as

d = \frac{\text{Mean difference}}{\text{Pooled } SD}, \quad (1)
where mean difference refers to the difference between the mean of the ex-
periment group and that of the control group (in cases where there were no
pretest scores) or between the mean change score of the experiment group and
that of the control group (in cases where both pretest and posttest scores were
reported). The pooled standard deviation (SD) was calculated based on the
standard deviations of the experiment and control means. For studies (n = 2 in
this meta-analysis) that reported only t values or F values (n = 1), Equations 2
and 3 were used, respectively:
d = \frac{t}{\sqrt{\text{Harmonic } N / 2}}, \qquad \text{Harmonic } N = \frac{2 N_1 N_2}{N_1 + N_2}, \quad (2)

d = \sqrt{\frac{F (N_1 + N_2)}{N_1 N_2}}. \quad (3)

For studies that reported their results as frequencies, from which an odds ratio could be derived, Equation 4 was used:

d = \frac{\sqrt{3} \, \log(\text{odds ratio})}{\pi}, \quad (4)
where odds ratio refers to the ratio of the odds of an event occurring in the
experiment group compared to the control group.
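To make Equations 1–4 concrete, here is a minimal sketch of the conversions in code. The function names are hypothetical, and Equation 1 assumes the usual df-weighted pooling of the two standard deviations, which the text does not spell out.

```python
import math

def d_from_means(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Equation 1: mean difference divided by the pooled SD."""
    pooled_sd = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    return (mean_e - mean_c) / pooled_sd

def d_from_t(t, n1, n2):
    """Equation 2: d from an independent-samples t value."""
    harmonic_n = 2 * n1 * n2 / (n1 + n2)
    return t / math.sqrt(harmonic_n / 2)

def d_from_f(f, n1, n2):
    """Equation 3: d from a one-way F value with numerator df = 1."""
    return math.sqrt(f * (n1 + n2) / (n1 * n2))

def d_from_odds_ratio(odds_ratio):
    """Equation 4: d from an odds ratio via the logit method."""
    return math.sqrt(3) * math.log(odds_ratio) / math.pi
```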
Including multiple effect sizes derived from the same sample inflates the total sample size and yields nonindependent estimates, which "can render the statistical results highly suspect" (Lipsey & Wilson, 2001, p. 105). Where there are multiple effect sizes from one study, the suggested solution is to randomly pick one or take the average.
In this meta-analysis, the principle of “one study, one effect size” was con-
sistently adhered to as much as possible in the various analyses. This was done
by averaging multiple effect sizes that tapped into the same construct or that re-
lated to the same independent variable. More specifically, the following moves
were performed. First, when the general analysis was conducted regarding the
overall effect of corrective feedback, each study contributed the average of
all of the effect sizes related to different feedback types. All of the subsequent
analyses that aimed to identify moderator variables were performed in the same
manner—that is, by including the average effect size from each study. Second,
if a study included multiple feedback types, multiple outcome measures, and/or
multiple target structures, priority was given to feedback type (as it is the pri-
mary focus of the study)—that is, the effect sizes based on different dependent
variables and/or different structures were averaged for each feedback type (e.g.,
Ellis, 2007). As a consequence, one study might contribute several effect sizes,
each related to one type of feedback. However, when a separate analysis was
conducted for a certain feedback type, only one effect size from each study
was entered into the analysis. Third, if a study presented results for the separate
parts of a test (such as listening, writing, reading, etc.) as well as a global score
that combined the discrete scores, the global score was used to calculate effect
sizes. Fourth, if the same results were reported in two studies, effect sizes were
extracted from only one study report (e.g., Ellis, 2007; Ellis et al., 2006).
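A minimal sketch of this "one study, one effect size" reduction (names and data layout hypothetical):

```python
from statistics import mean

def one_effect_per_study(effects_by_study):
    """Average the multiple effect sizes contributed by each study."""
    return {study: mean(ds) for study, ds in effects_by_study.items()}

# A study with three effect sizes contributes their average to the analysis.
print(one_effect_per_study({"Study A": [0.4, 0.6, 0.8], "Study B": [1.1]}))
```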
Outliers
Because the sample size of this meta-analysis is relatively small (which is the
case for most meta-analyses in SLA), the presence of extreme values may have
a substantial impact on the results. Preliminary analysis of the data showed that
this was indeed the case: When outliers were included, the average effect size
of the overall effect of feedback on immediate posttests was 0.70 under the
FE model and 0.88 under the RE model. Without outliers, the mean effect size
became 0.61 (FE) and 0.64 (RE). To ensure the robustness of results, outliers
were excluded in this meta-analysis. The detection of outliers was performed
through the following procedure. The effect sizes (or averaged sizes if one study
contributed more than one effect size) contributed by the primary studies under
an independent variable or moderator variable were transformed into z-scores.
Any effect size whose z-score exceeded 2.0 in absolute value (regardless of whether it was positive or negative) was eliminated from the analysis. The procedure was repeated for each independent and moderator variable analyzed.
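A minimal sketch of this z-score screen (a hypothetical implementation; the published analyses were run in CMA):

```python
def drop_outliers(effect_sizes, cutoff=2.0):
    """Remove effect sizes whose z-score exceeds the cutoff in absolute value."""
    n = len(effect_sizes)
    m = sum(effect_sizes) / n
    sd = (sum((d - m) ** 2 for d in effect_sizes) / (n - 1)) ** 0.5
    return [d for d in effect_sizes if abs((d - m) / sd) <= cutoff]

print(drop_outliers([0.5, 0.6, 0.7, 0.8, 0.9, 5.0]))  # the extreme 5.0 is removed
```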
Analysis Procedure
After effect sizes were calculated based on the raw data derived from the study
reports, they were weighted according to their sample sizes when analyses were
performed; that is, effect sizes based on larger sample sizes carried more weight
in the analysis. The weighted effect sizes served as the dependent variable, and
the independent variables included feedback in general, feedback types, and
the timing of posttests. Moderator variables were some of the coded features
listed in Table 1. All moderator analyses were performed based on the data
associated with immediate posttests.
To determine whether the mean effect sizes were significantly different from
zero, confidence intervals were calculated. A confidence interval that includes zero indicates that the null hypothesis that the effect of a certain treatment is zero cannot be rejected at the p < .05 level. The width of an
interval indicates the robustness of the effect: Narrower intervals indicate more
robust results. With regard to the magnitude of effect size, 0.20 is considered
a small effect, 0.50 indicates a medium effect, and 0.80 suggests a large effect
(Cohen, 1988). In order to determine whether there were significant differences
between different feedback types and whether the coded learner characteristics
and methodological features (which served as categorical moderator variables)
as reported by primary studies were significant moderators of the effectiveness
of feedback, Q-tests were performed. A significant between-group Q value in-
dicates that the differences between the levels under the independent variable
are significant. To ascertain whether age and year of publication, the two con-
tinuous moderator variables, were predictors of the magnitude of effect size,
the corresponding data were subjected to meta-regression analyses.
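The core computations can be sketched as follows under the FE model; this is an illustrative reconstruction (the published analyses were run in CMA), with the between-group Q compared to a chi-square distribution with (number of groups − 1) degrees of freedom:

```python
import math

def fe_summary(ds, variances):
    """Inverse-variance weighted mean effect size with a 95% CI (FE model)."""
    ws = [1 / v for v in variances]
    mean_d = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    return mean_d, (mean_d - 1.96 * se, mean_d + 1.96 * se)

def q_between(groups):
    """Between-group Q: dispersion of group means around the grand mean.

    `groups` maps a moderator level to (effect sizes, sampling variances).
    """
    stats = []
    for ds, vs in groups.values():
        w = sum(1 / v for v in vs)                  # total weight in the group
        m = sum(d / v for d, v in zip(ds, vs)) / w  # weighted group mean
        stats.append((m, w))
    grand = sum(m * w for m, w in stats) / sum(w for _, w in stats)
    return sum(w * (m - grand) ** 2 for m, w in stats)
```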
Results
The Research Synthesis
A total of 33 studies (see Appendix A for the effect sizes each study con-
tributed, the standard error, and the 95% confidence interval [CI] of each effect
size) published between 1988 and 2007 were included in the analysis. (Data
collection was completed by September 2008.) These studies involved a total
of 1,773 L2 learners. Among the 33 studies, 22 were published articles and 11
were Ph.D. dissertations. The frequency distribution of the included published studies and Ph.D. dissertations from 1988 through 2007, in 5-year intervals, is
displayed in Figure 1. As shown, there has been a rapid growth in the number of
studies (both published articles and dissertations) on corrective feedback since
1997, indicating an increased interest in its role in SLA. In order to ascertain
whether availability bias was present (i.e., whether the retrieved studies tended
to be those with significant results), a funnel plot was created that plots the
effect sizes (immediate effects) contributed by primary studies against pre-
cision, the inverse of standard error (Figure 2). In a funnel plot, studies with
large sample sizes, because of their smaller sampling error and higher precision
values, appear toward the apex of the graph and tend to cluster near the mean
effect size. Studies with small sample sizes have greater sampling error and
lower precision values, so they tend to appear toward the bottom of the graph
16
14
12
10
Frequency
Dissertation
8
Published
6
0
1988-1992 1993-1997 1998-2002 2003-2007
Year
5
Precision
0
-3 -2 -1 0 1 2 3
Effect Size
and are dispersed across a range of values. If there is no availability bias, the
studies will be symmetrically distributed around the mean; if availability bias
is present, small studies will be concentrated on the right side of the mean.
This would mean that small-scale studies with greater sampling error (lower
precision values) and lower effect sizes are missing from the data.
The funnel plot of this meta-analysis shows the following patterns. First,
in general, larger sample studies (those with higher precision values) were
evenly distributed around the mean and appeared toward the upper part of the
funnel. Second, at the bottom of the plot, there were only a few effect sizes
and there were more effect sizes on the right side of the mean than on the left
side. This indicates that there was a lack of small-scale studies in the data, and
studies with small sample sizes and small effect sizes were not available. In
short, studies with medium and large sample sizes were well represented in the
data, but small-sample studies with small effect sizes were underrepresented.
A trim-and-fill analysis was performed to search for the missing values that
would change the mean effect size if these values were imputed. It was found
that under the FE model, four values were missing on the left side of the plot
and imputing these values would change the mean effect size from 0.61 (95%
CI = 0.51, 0.71) to 0.56 (95% CI = 0.46, 0.66); under the RE model, five values
should be added to the left side to make the plot symmetrical, and imputing
these values would change the mean effect size from 0.64 (95% CI = 0.47,
0.81) to 0.53 (95% CI = 0.34, 0.72).
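For readers who want to reproduce a display like Figure 2, a minimal sketch with toy values standing in for the per-study effect sizes and standard errors:

```python
import matplotlib.pyplot as plt

# Toy stand-ins for the per-study effect sizes and standard errors.
d = [0.2, 0.5, 0.6, 0.7, 0.9, 1.4]
se = [0.40, 0.35, 0.20, 0.15, 0.30, 0.45]

precision = [1 / s for s in se]    # precision = the inverse of the standard error
plt.scatter(d, precision)
plt.axvline(0.61, linestyle="--")  # FE mean effect size reported above
plt.xlabel("Effect Size")
plt.ylabel("Precision")
plt.show()
```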
Tables 2 and 3 summarize the learner and methodological characteristics
of the included studies (see Appendix B for the information related to each
included study). According to the reported information, the average age of
the participants of these studies was 19.6. Most of the studies recruited adult
L2 learners and only three studies investigated child learners. Around half
of the 33 studies involved L1 speakers of English and L2 learners of English.
Approximately two thirds of the studies were conducted with university students
in low-level language classes. As found in previous meta-analyses (Keck et al.,
2006; Mackey & Goo, 2007), institutional status was the predominant measure
of proficiency. Almost 55% of the studies took place in the laboratory, and
nearly 80% of them were conducted in foreign language contexts. Whereas
in 27 studies corrective feedback was provided in the face-to-face mode, only
6 investigated the effectiveness of feedback as it was delivered through the
computer. Corrective feedback was provided by native speakers in 16 studies,
by teachers in 11 studies, and by computers in 6 studies. In terms of duration of
treatment, the most frequent were treatments that lasted 50 min or less, followed
by treatments that lasted more than 2 hr and those that were 1–2 hr long. As far as
the type of outcome measure is concerned, free constructed responses were used
in 12 studies, constrained constructed responses in 19 studies, grammaticality
judgment tests in 8 studies, and selected responses in 2 studies. Finally, with
respect to the type of instruction activity where corrective feedback was made
available to learners, more than 60% of the studies involved communicative
activities and nearly 30% of them involved mechanical drills in which feedback
was provided in discrete-item practice.
Further discussion on the experimental treatments and control conditions
in the included studies is necessary prior to the quantitative analysis. Typi-
cally, studies conducted in the laboratory involved dyadic interaction between
a native speaker and a nonnative speaker, and learners’ nontargetlike produc-
tion was followed by the corrective moves under investigation (e.g., Mackey
& Philp, 1998). Furthermore, as will also be shown in the Discussion section,
instructional interventions in most lab-based studies lasted less than 50 min. In
classroom-based studies and studies carried out with small groups (three to four
learners), feedback was intended to be directed toward the whole class or group
although only one learner’s erroneous production was responded to (e.g., Ellis
et al., 2006). Unlike studies conducted in the laboratory, classroom- or group-
based studies mostly used longer treatments. In terms of the “communica-
tive versus drill” division, learners involved in the former condition received
Table 2 Included studies by learner characteristics
Note. In each column, the numbers on the left are derived from the FE model and those on the right are based on the RE model.
Figure 3 Implicit and explicit feedback: Change of mean effect sizes over time.
The long-term effect of clarification was 0.53 under the FE model and 0.55 under the RE model. However, under the RE model, the effect size was not significant, as the confidence interval crossed zero. Q-test results showed that there was
no significant difference between different feedback types or between different
time points where a particular feedback type is concerned. However, the patterns
that emerged seem interesting and deserve further discussion, especially given
that some of the patterns were also obtained by previous meta-analyses.
Recall that corrective feedback was also coded as implicit versus explicit
to answer the question of whether explicit feedback was more effective than
implicit feedback in facilitating SLA. Implicit feedback refers to any correc-
tive move that does not overtly inform the learner of the unacceptability of
his/her erroneous production; explicit feedback, in contrast, draws the learner’s
attention to the error he/she commits. Table 4 shows that under both models,
explicit feedback worked better than implicit feedback on both immediate and
short-delayed posttests. However, on long-delayed posttests, implicit feedback
produced a larger effect size than explicit feedback, indicating that the effect of
implicit feedback was more enduring (see Figure 3 for a graphic display of the
effects of the two feedback types). Once again, although Q-tests showed that
these differences were not significant, they are noteworthy.
Moderator Variables
In order to ascertain whether the effectiveness of corrective feedback was mod-
erated by learner characteristics and methodological features, separate analy-
ses were performed for the effect sizes associated with immediate posttests.
Q-statistics were used to determine if a certain variable was a significant moderator. The analyzed moderator variables include research context, research
setting, task type, publication type, outcome measure, treatment length, mode
of delivery, interlocutor type, learners’ age, target language, and year of pub-
lication. The results for these variables, except for those for age and year of
publication, appear in Table 5. Two separate meta-regression analyses were
performed to determine whether age and year of publication were significant
predictors of the efficacy of corrective feedback. The following results were
obtained.
• Research context. The mean effect size associated with the studies con-
ducted in FL contexts was significantly larger than that associated with
studies conducted in SL contexts under the FE model, Q(1) = 4.5, p < .05,
indicating that corrective feedback was more effective in FL contexts than
in SL contexts. However, the difference was not significant under the RE
model, Q(1) = 1.3, p = .25.
• Research setting. Significant differences were found among the three con-
ditions (lab, class, and group) under this variable under both models—FE:
Q(2) = 31.3, p < .01; RE: Q(2) = 7.9, p < .05. Follow-up pairwise compar-
isons indicated that lab-based studies generated a significantly larger effect
than classroom-based studies—FE: Q(1) = 24.2, p < .01; RE: Q(1) = 6.6,
p < .05—or group-based studies—FE: Q(1) = 16.8, p < .01; RE: Q(1) =
3.7, p < .01. No significant difference was found between classroom-based
and group-based studies—FE: Q(1) = 2.1, p = .14; RE: Q(1) = 7.9, p =
.65.
• Task type. The mean effect size generated by mechanical drills was signifi-
cantly larger than that generated by communicative activities under the FE
model, Q(1) = 6.1, p < .05, but the difference was not significant under
the RE model, Q(1) = 2.2, p = .34.
• Mode of delivery. Computer-delivered feedback (which is provided by an
interlocutor through online communication programs or is embedded in the
computer) and face-to-face feedback did not differ substantially in affecting
L2 development—FE: Q(1) = 0.1, p = .77; RE: Q(1) = 0.1, p = .91.
• Outcome measure. There were no significant differences among the mean
effect sizes associated with the three outcome measures—FE: Q(1) = 3.3,
p = .35; RE: Q(1) = 1.7, p = .63. However, under the RE model, studies adopting free constructed responses seemed to show a larger effect (by around 0.15 and 0.3 standard deviation units) than constrained constructed responses and GJTs, which might deserve more attention.

Table 5 Moderator analysis: Means and Q-statistics based on immediate effects

Moderator        k    Mean d         SE             95% CI lower    95% CI upper    Q
Duration                                                                            23.9**/3.9
  <50 min       11    1.154/1.020    0.117/0.234    0.925/0.561     1.383/1.479
  60–120 min     7    0.461/0.502    0.128/0.171    0.209/0.166     0.713/0.837
  >120 min       9    0.499/0.553    0.087/0.196    0.327/0.169     0.671/0.937
Interlocutor                                                                        17.6**/4.6
  NS            14    0.997/0.975    0.103/0.218    0.795/0.548     1.198/1.403
  T             12    0.412/0.474    0.080/0.143    0.305/0.194     0.618/0.753
  Comp           3    0.828/0.886    0.178/0.242    0.478/0.410     1.178/1.361
L2                                                                                  1.3/0.5
Note. In each column, the numbers on the left are derived from the FE model and those on the right are based on the RE model. ∗∗p < .01.
• Publication type. Published studies did not show a larger effect than Ph.D.
dissertations; in fact, the mean effect size for dissertations was larger than
that yielded by published articles. However, the difference was not signif-
icant under either model—FE: Q(1) = 0.6, p = .43; RE: Q(1) = 0.4, p =
.56.
• Treatment length. Significant differences were found among the three
groups of studies in this category under the FE model but not under the
RE model—FE: Q(2) = 23.9, p < .01; RE: Q(2) = 3.9, p = .26. Pair-
wise comparisons revealed that short treatments (50 min or less) produced
a substantially larger mean effect size than treatments of medium length
(60–120 min)—FE: Q(1) = 16.0, p < .01; under the RE model, the differ-
ence approaches significance, Q(1) = 3.2, p = .07. Short treatments also
produced significantly larger effects than long treatments (over 120 min)—FE: Q(1) = 20.1, p < .01. There was no significant difference between
medium-length treatments and long treatments.
• Interlocutor type. The Q-tests revealed that there were significant differ-
ences among the three groups of studies (computer, native speaker, and
teacher) under the FE model, Q(2) = 17.6, p < .01; but based on the RE
model, the differences were nonsignificant, Q(2) = 4.6, p = .09. Pairwise
analyses showed that feedback provided by native-speaker interlocutors
was significantly more effective than feedback provided by teachers—FE:
Q(1) = 16.9, p < .01; under the RE model, the difference bordered on
significance, Q(1) = 3.7, p = .05. The difference between computerized
feedback (which is embedded in the computer and does not involve an
interlocutor) and teacher-provided feedback approached significance—FE:
Q(1) = 16.9, p = .06. No significant difference was found between com-
puterized feedback and feedback provided by native speakers.
• Target language. Effect sizes were calculated only for studies examining
L2 English, L2 French, and L2 Spanish, as the number of studies related
to other L2s was not large enough for analysis. It was found that studies
related to L2 English yielded larger effect sizes than those investigating
L2 French or L2 Spanish, although the differences were not significant
under either model.
Discussion
This meta-analysis sought to determine the effectiveness of corrective feedback
in L2 learning and to identify the moderator variables for its effectiveness. It
was found that overall feedback showed a medium effect and the effect was
maintained over time. In general, the effects found in this meta-analysis are
smaller than in previous analyses. In Russell and Spada’s (2006) analysis, the
studies that examined oral feedback, which is the focus of this meta-analysis,
yielded a mean effect size of 0.91; Mackey and Goo (2007) performed, among
others, a separate analysis comparing studies examining interaction with or
without feedback, and the mean effect size for the studies with feedback was
0.71 (a near-large effect). The variation among these three meta-analyses in
terms of the magnitude of effect size is attributable to their different inclusion
criteria and to the exclusion of outliers from this analysis. As for inclusion
criteria, Russell and Spada’s (2006) meta-analysis included studies published
before 2003, and some of the studies published before 2003 were included in
the current meta-analysis but not in theirs. Additionally, their analysis included
studies that examined corrective feedback in L2 writing and excluded studies
this moderator variable can be merged with the “research context” variable.
However, this somewhat “unfortunate” coincidence implies that more studies
are needed that examine different interlocutor-context combinations: different
interlocutors (teacher vs. native speaker vs. computer) in the same context (lab
or classroom) or the same interlocutor in different contexts. Other potential
areas of research related to this topic include age and/or gender of interlocutor,
relationship between the learner and the interlocutor, and so forth. Furthermore,
it should be noted that Sagarra’s study (2007), which investigated the effects
of computerized recasts, was excluded as an outlier when the analysis was
performed. Initial analyses showed that including the study would have made
computerized feedback the most effective feedback type as far as interlocutor
type is concerned. Computerized feedback is salient and consistent and can be
delivered visually or both aurally and visually; one would expect it to show a
larger effect than other interlocutor types. However, this hypothesis needs to be
empirically tested.
Finally, to ascertain if the effectiveness of feedback depends to some extent
on the target language to be learned, effect sizes were calculated for the three
most frequent L2s in the data set. It was found that feedback provided in
the learning of L2 English was slightly more effective than that provided
in the learning of L2 French and L2 Spanish although the differences were
not significant. An explanation was sought through the cross-tabulation of
the results related to other independent variables. Examination of the learner
characteristics of these studies showed that out of the 13 L2 English studies, 9
were conducted with ESL/EFL learners in intensive language programs. Among
the nine L2 French studies, seven involved learners enrolled in university
language classes, one examined immersion students at an elementary school,
and one investigated high school students. The L2 Spanish learners in the
three primary studies were all students taking language classes at universities.
Language students at intensive training programs typically receive 4–5 hr of
instruction every day and might therefore be more sensitive and receptive to
corrective feedback than students in university language classes or students in
immersion programs.
It should be noted that some of the results as interpreted above are not
statistically significant. However, explanations were sought and speculations
were attempted because the results showed some interesting and noteworthy
patterns. It is hoped that these tentative speculations can provide some directions
for future research. For instance, the results about the differential effects of
implicit and explicit feedback did not reach statistical significance. However,
because the obtained patterns can be woven into the theoretical framework of
implicit and explicit knowledge (DeKeyser, 2003; Ellis, 2005) and there has
been a heated debate over which feedback type is more effective (Long, 2007;
Lyster, 2004), some explanations were attempted to account for the findings.
Conclusion
In response to the mushrooming of empirical research on the effectiveness
of corrective feedback in L2 learning, this meta-analysis was undertaken to
present a summative description of previous findings by investigating the mag-
nitude of related effect sizes across primary studies. It was intended to be
an update and complement to previous meta-analyses that are related to cor-
rective feedback in one way or another. To achieve this purpose, a series of
methodological moves were taken. These moves include establishing a dif-
ferent set of inclusion/exclusion criteria to sharpen the study focus and min-
imize publication bias, presenting the results from both the FE and the RE
model, using Q-tests to detect group differences and identify moderator vari-
ables, controlling for sample size inflation, and so on. The introduction of
these moves was expected to make the results more robust and trustworthy,
which might, in turn, provide useful information and reference for interested
L2 researchers and educators. By performing these moves, it was also hoped
that this meta-analysis can provide some methodological implications for SLA
meta-analysts.
This meta-analysis explored some issues that have not been investigated, or have
been insufficiently investigated, in previous meta-analyses. It revealed that explicit
feedback worked better than implicit feedback in the short term and that the effects
of implicit feedback did not fade, and even increased, in the long term. It also
identified some significant moderators, such as research context, research setting,
task type, treatment length, and interlocutor type. The results concerning these
moderators, as well as those concerning variables that were nonsignificant
moderators but generated noteworthy patterns, were discussed at length and
interpretations were offered.
This analysis identified the following issues to be addressed in future re-
search. First, the presence of availability bias in this meta-analysis shows that
more research is needed on corrective feedback. As far as specific feedback types
are concerned, although there is a relatively large body of research on recasts,
less attention has been paid to explicit correction and metalinguistic feedback,
and even less to negotiation moves such as clarification and elicitation, which
makes the comparison of effect sizes across feedback types difficult. For
instance, effect sizes were not calculated for the immediate and short-term
effects of clarification, the short- and long-term effects for explicit correc-
tion, or the long-term effect of metalinguistic feedback simply because there
were not sufficient related studies. The unbalanced representation of individual
feedback types in the data set, in turn, limits the conclusion concerning the
differential effects of different feedback types. Second, the fact that primary
researchers operationalized feedback, particularly specific feedback types, in
different ways poses a great challenge to meta-analysts when they try to dis-
entangle the effects of different varieties of feedback. Researchers are therefore
urged to be more consistent in defining and operationalizing different types of
feedback. Third, it was found that a few studies
did not provide learners’ pretest scores or did not measure learners’ knowledge
about the target structures prior to the instructional treatments. This leaves it
unclear to what extent the obtained effects were attributable to the treatments.
The categorization of the meta-analyzed studies according to their learner
characteristics and methodological features revealed some gaps to be filled, and
the existence of these gaps affected the identification of moderator variables.
Specifically, more research is needed that involves child learners, that inves-
tigates speakers and learners of languages other than English, that involves
language learners of higher proficiency (as most of the studies are about begin-
ners), that is conducted in L2 contexts, and that is implemented in the computer
mode. There is also a dearth of research examining the variables that moderate
the effects of corrective feedback on SLA, such as age, gender, proficiency,
L1 transfer, culture, complexity of the target structure, or interlocutor type, to
name only a few. This suggests that, now that the effect of corrective feedback
has been established, researchers should embark on the mission of investigating
the factors constraining its effectiveness.
Revised version accepted 26 January 2009
Notes
1 Keck et al.’s meta-analysis (2006), which is about the effectiveness of task-based
interaction in SLA, also included several studies that involved negative feedback
and that were included in Mackey and Goo (2007) and this meta-analysis.
2 It must be pointed out that although ProQuest is a global database, most available
dissertations and theses in this database are from academic institutions in North
America (the United States or Canada).
3 One anonymous reviewer pointed out that the fact that the software can handle only
one independent variable at a time might be a drawback because the independent
variables might be correlated. However, addressing the correlations between
independent variables does not seem to be the norm in meta-analysis
because the results would be hard to interpret. Correlations between independent
variables are more of a concern when metaregression analyses are performed, which
explains why a correlation analysis was conducted when the two continuous
moderators were subjected to metaregression analysis. Additionally, the data
cross-tabulation as reported in the Discussion section is at least a partial, if not
perfect, solution if it is indeed a concern.
4 The current version of the software allows only one independent variable to be
included in a metaregression analysis. Therefore, two separate analyses were
performed, for age and publication year, respectively.
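As a rough illustration of what such a single-moderator metaregression computes, the sketch below fits a weighted least-squares line of effect size on one continuous moderator at a time, with inverse-variance weights. The numbers are hypothetical, and the actual analyses were run in the meta-analysis software, which may use a different estimation routine.

```python
# A minimal sketch, under hypothetical data, of a metaregression with a
# single continuous moderator, as in Notes 3-4 (one predictor at a time).
import numpy as np

d = np.array([0.61, 0.84, 0.29, 1.04, 0.17])    # hypothetical effect sizes
se = np.array([0.30, 0.29, 0.26, 0.34, 0.27])   # their standard errors
age = np.array([12.0, 19.0, 21.0, 22.0, 25.0])  # one moderator at a time

w = 1 / se**2                                   # inverse-variance weights
X = np.column_stack([np.ones_like(age), age])   # intercept + moderator
# Weighted least squares: beta = (X'WX)^(-1) X'W d
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * d))
print(f"intercept = {beta[0]:.3f}, slope = {beta[1]:.4f} per year of age")
```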
5 A total of 18 dissertations were retrieved, but only 11 were included in the analysis.
The rest did not meet the inclusion criteria. Additionally, a few dissertations
could not be retrieved even though their bibliographic information was obtained,
because the source libraries listed them as noncirculating items.
6 O’Relly’s (1999) study was published, but the dissertation the article was based on
was also retrieved. Effect sizes were calculated according to the data provided in the
dissertation because it was more detailed.
7 The moderator variables did not include learners’ L1, academic status, proficiency,
or proficiency measure because the results could not be meaningfully interpreted. In
many studies, especially those that involved ESL learners, learners’ L1s were
mixed. With regard to academic status, it is difficult to categorize learners in
intensive language programs or language schools. As far as proficiency and
proficiency measures are concerned, there was too much variation in defining
proficiency levels and in the use of proficiency measures by primary researchers.
Therefore, these variables were not analyzed.
8 One anonymous reviewer expressed a concern about the inclusion of studies
involving child learners in the analysis. Further examination of the data showed that
excluding the three studies involving child L2 learners slightly lowered the mean
effect size (on immediate posttests) from 0.61 (FE)/0.64 (RE) to 0.58 (FE)/0.61
(RE). However, the difference was below 0.03 standard deviation units.
Therefore, including the three studies did not seem to be a cause for concern.
Additionally, because age is a moderator variable in this meta-analysis, including
studies in different age groups would make the results related to this variable more
convincing.
9 The length of treatment does not provide any information on the amount/intensity of
feedback provided to the learner, although in most cases longer treatment contains
more feedback. Primary researchers usually do not report how much feedback was
provided in the treatment; they report only how long the treatment lasted. However,
the amount/intensity of feedback is certainly a question that needs to be addressed
in future research.
References
Note. Studies included in the current meta-analysis are marked with an asterisk.
*Ammar, A., & Spada, N. (2006). One size fits all? Recasts, prompts, and L2 learning.
Studies in Second Language Acquisition, 28, 543–574.
*Ayoun, D. (2001). The role of negative and positive feedback in the second language
acquisition of the passé composé and imparfait. Modern Language Journal, 85,
226–243.
*Bationo, B. (1991). The effects of three forms of immediate feedback on learning
intellectual skills in a foreign language computer-based tutorial. Unpublished
doctoral dissertation. The University of Toledo, Toledo, OH.
*Bell-Corrales, M. (2001). The role of negative feedback in second language
instruction. Unpublished doctoral dissertation. University of Florida, Gainesville.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive
Meta-Analysis (Version 2.2.027) [Computer software]. Englewood, NJ: Biostat.
Carpenter, H., Jeon, S., MacGregor, D., & Mackey, A. (2006). Learners’
interpretations of recasts. Studies in Second Language Acquisition, 28, 209–236.
*Carroll, S., & Swain, M. (1993). Explicit and implicit negative feedback: An
empirical study of the learning of linguistic generalizations. Studies in Second
Language Acquisition, 15, 357–386.
*Carroll, S., Swain, M., & Roberge, Y. (1992). The role of feedback in adult second
language acquisition: Error correction and morphological generalizations. Applied
Psycholinguistics, 13, 173–198.
*Chen, H. (1996). A study of the effect of corrective feedback on foreign language
learning: American students learning Chinese classifiers. Unpublished doctoral
dissertation. University of Pennsylvania, Philadelphia.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Hillsdale, NJ: Erlbaum.
Cooper, H., & Hedges, L. (1994). Potentials and limitations of research synthesis. In
H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp.
521–530). New York: Russell Sage Foundation.
Cornell, J., & Mulrow, C. (1999). Meta-analysis. In H. J. Adèr & G. J. Mellenbergh
(Eds.), Research methodology in the social, behavioural and life sciences
(pp. 285–323). London: Sage.
*DeKeyser, R. (1993). The effect of error correction on L2 grammar knowledge and
oral proficiency. Modern Language Journal, 77, 501–514.
DeKeyser, R. (2003). Implicit and explicit learning. In C. Doughty & M. Long (Eds.),
Handbook of second language acquisition (pp. 313–348). Malden, MA: Blackwell.
Doughty, C., & Long, M. (Eds.). (2003). Handbook of second language acquisition.
Malden, MA: Blackwell.
*Loewen, S., & Nabei, T. (2007). Measuring the effects of oral corrective feedback on
L2 knowledge. In A. Mackey (Ed.), Conversational interaction in second language
acquisition (pp. 361–377). New York: Oxford University Press.
Loewen, S., & Philp, J. (2006). Recasts in the adult English L2 classroom:
Characteristics, explicitness, and effectiveness. Modern Language Journal, 90,
536–556.
*Long, M., Inagaki, S., & Ortega, L. (1998). The role of negative feedback in SLA:
Models and recasts in Japanese and Spanish. Modern Language Journal, 82,
357–371.
Long, M. H. (2007). Problems in SLA. Mahwah, NJ: Erlbaum.
Lyster, R. (1998). Negotiation of form, recasts, and explicit correction in relation to
error types and learner repair in immersion classrooms. Language Learning, 48,
183–218.
Lyster, R. (2001). Negotiation of form, recasts, and explicit correction in relation to
error types and learner repair in immersion classrooms. Language Learning,
51(Suppl. 1), 265–301.
*Lyster, R. (2004). Differential effects of prompts and recasts in form-focused
instruction. Studies in Second Language Acquisition, 26, 399–432.
Lyster, R., & Mori, H. (2006). Interactional feedback and instructional counterbalance.
Studies in Second Language Acquisition, 28, 269–300.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in
Second Language Acquisition, 19, 37–66.
*Macheak, T. (2002). Learner vs. instructor correction in adult second language
acquisition: Effects of oral feedback type on the learning of French grammar.
Unpublished doctoral dissertation. Purdue University, West Lafayette, IN.
Mackey, A. (Ed.) (2007). Conversational interaction in SLA: A collection of empirical
studies. New York: Oxford University Press.
Mackey, A., Gass, S. M., & McDonough, K. (2000). How do learners perceive
interactional feedback? Studies in Second Language Acquisition, 22,
471–497.
Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and
research synthesis. In A. Mackey (Ed.), Conversational interaction in SLA: A
collection of empirical studies (pp. 408–452). New York: Oxford University Press.
*Mackey, A., & Oliver, R. (2002). Interactional feedback and children’s L2
development. System, 30, 459–477.
Mackey, A., Oliver, R., & Leeman, J. (2003). Interactional input and the incorporation
of feedback: An exploration of NS-NNS and NNS-NNS adult and child dyads.
Language Learning, 53, 35–66.
Mackey, A., Oliver, R., & Philp, J. (1997). Patterns of interaction in NNS-NNS
conversation. Paper presented at Second Language Research Forum, East Lansing,
MI.
*Mackey, A., & Philp, J. (1998). Conversational interaction and second language
development: Recasts, responses, and red herrings? Modern Language Journal, 82,
338–356.
*McDonough, K. (2005). Identifying the impact of negative feedback and learners’
response on ESL question development. Studies in Second Language Acquisition,
27, 79–103.
*McDonough, K. (2007). Interactional feedback and the emergence of simple past
activity verbs in L2 English. In A. Mackey (Ed.), Conversational interaction in
second language acquisition (pp. 323–338). New York: Oxford University Press.
Nabei, T., & Swain, M. (2002). Learner awareness of recasts in classroom interaction:
A case study of an adult EFL student’s second language learning. Language
Awareness, 11, 43–63.
Nagata, N. (1993). Intelligent computer feedback for second language instruction.
Modern Language Journal, 77, 330–339.
Nicholas, H., Lightbown, P., & Spada, N. (2001). Recasts as feedback to language
learners. Language Learning, 51, 719–758.
Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis
and quantitative meta-analysis. Language Learning, 50, 417–528.
Norris, J., & Ortega, L. (Eds.) (2006). Synthesizing research on language learning and
teaching. Amsterdam: Benjamins.
Oliver, R. (2000). Age differences in negotiation and feedback in classroom and
pairwork. Language Learning, 50, 119–151.
Oliver, R., & Mackey, A. (2003). Interactional context and feedback in child ESL
classrooms. Modern Language Journal, 87, 519–533.
*O’Relly, L. (1999). The effect of focused versus unfocused communication tasks on
the development of linguistic competence during negotiated interaction.
Unpublished doctoral dissertation. University of South Florida, Tampa.
Paniagua, D. (1985). A study of overt versus covert error correction in foreign language
teaching. Unpublished doctoral dissertation. University of Texas at Austin.
Panova, I., & Lyster, R. (2002). Patterns of corrective feedback and uptake in an adult
ESL classroom. TESOL Quarterly, 36, 573–595.
Patall, E., Cooper, H., & Robinson, J. (2008). The effects of choice on intrinsic
motivation and related outcomes: A meta-analysis of research findings.
Psychological Bulletin, 134, 270–300.
Philp, J. (2003). Constraints on “noticing the gap”: Nonnative speakers’ noticing of
recasts in NS-NNS interaction. Studies in Second Language Acquisition, 25,
99–126.
Pica, T. (1988). Interlanguage adjustments as an outcome of NS-NNS negotiated
interaction. Language Learning, 38, 45–73.
Raudenbush, S. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.),
The handbook of research synthesis (pp. 301–322). New York: Russell Sage
Foundation.
*Révész, A. (2007). Focus on form in task-based language teaching: Recasts, task
complexity, and L2 learning. Unpublished doctoral dissertation. Teachers College,
Columbia University, New York.
Richardson, K. M., & Rothstein, H. R. (2008). Effects of occupational stress
management intervention programs: A meta-analysis. Journal of Occupational
Health Psychology, 13, 69–93.
*Roig-Torres, T. (1992). Error correction in the natural approach classroom: A
contrastive study. Unpublished doctoral dissertation. University of Pittsburgh,
Pennsylvania.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA:
Sage.
Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for second
language acquisition: A meta-analysis of the research. In J. Norris & L. Ortega
(Eds.), Synthesizing research on language learning and teaching (pp. 131–164).
Amsterdam: Benjamins.
Sachs, R., & Suh, B. (2007). Textually enhanced recasts, learner awareness, and L2
outcomes in synchronous computer-mediated interaction. In A. Mackey (Ed.),
Conversational interaction in second language acquisition (pp. 324–338). New
York: Oxford University Press.
*Sagarra, N. (2007). From CALL to face-to-face interaction: The effect of
computer-delivered recasts and working memory on L2 development. In A. Mackey
(Ed.), Conversational interaction in second language acquisition (pp. 229–248).
New York: Oxford University Press.
*Sauro, S. (2007). A comparative study of recasts and metalinguistic feedback through
computer mediated communication on the development of L2 knowledge and
production accuracy. Unpublished doctoral dissertation. University of
Pennsylvania, Philadelphia.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11, 129–158.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language
instruction (pp. 3–32). Cambridge: Cambridge University Press.
Schwartz, B. (1993). On explicit and negative data effecting and affecting competence
and linguistic behavior. Studies in Second Language Acquisition, 15, 147–163.
Shadish, W. (1992). Do family and marital psychotherapies change what people do? A
meta-analysis of behavioral outcomes. In Cook et al. (Eds.), Meta-analysis for
explanation: A casebook (pp. 129–208). New York: Russell Sage Foundation.
*Sheen, Y. (2007). The effects of corrective feedback, language aptitude, and learner
attitudes on the acquisition of English articles. In A. Mackey (Ed.), Conversational
interaction in second language acquisition (pp. 301–322). New York: Oxford
University Press.
Sheen, Y. H. (2004). Corrective feedback and learner uptake in communicative
classrooms across instructional settings. Language Teaching Research, 8,
263–300.
Appendix A (a)

Primary studies | N (b) | Feedback type | Implicit/explicit (c) | Timing of posttests | Effect size | Standard error | 95% CI lower | 95% CI upper
— | — | Recasts | I | 1 | 1.044 | 0.337 | 0.384 | 1.705
 | | | | 2 | 0.412 | 0.320 | −0.215 | 1.038
Carroll, Swain, & Roberge (1992) | 60, 19 | Explicit correction | E | 1 | 0.871 | 0.333 | 0.218 | 1.524
 | | | | 2 | 0.593 | 0.325 | −0.045 | 1.230
Chen (1996) | 9, 9 | Explicit rejection | E | 1 | 0.012 | 0.461 | −0.892 | 0.916
 | | | | 3 | 0.096 | 0.461 | −0.808 | 1.000
 | 10 | Metalinguistic (e) | E | 1 | 4.873 | 0.921 | 3.068 | 6.677
 | | | | 3 | 1.280 | 0.504 | 0.291 | 2.269
DeKeyser (1993) | 19, 16 | Explicit | E | 1 | 0.124 | 0.115 | −0.542 | 0.789
Ellis (2007) | 12, 12 | Metalinguistic | E | 1 | 0.552 | 0.453 | −0.335 | 1.440
 | | | | 2 | 0.890 | 0.467 | −0.025 | 1.805
 | 10 | Recasts | I | 1 | 0.633 | 0.459 | −0.267 | 1.532
 | | | | 2 | 0.196 | 0.433 | −0.652 | 1.045
Ellis et al. (2006) | 12, 12 | Metalinguistic | E | 1 | −0.005 | 0.429 | −0.846 | 0.836
 | | | | 2 | 0.290 | 0.440 | −0.573 | 1.152
 | 10 | Recasts | I | 1 | −0.422 | 0.437 | −1.280 | 0.435
 | | | | 2 | −0.400 | 0.451 | −1.283 | 0.483
Han (2002) | 4, 4 | Recasts (f) | I | 1 | 2.285 | 0.915 | 0.492 | 4.078
 | | | | 3 | 1.171 | 0.765 | −0.328 | 2.671
Herron & Tomasello (1988) | 16, 16 | Clarification & metalinguistic | E | 1 | 0.750 | 0.365 | 0.033 | 1.467
Herron (1991) | 13, 12 | Explicit correction | E | 1 | 1.605 | 0.460 | 0.703 | 2.510
 | | | | 2 | 1.669 | 0.465 | 0.758 | 2.580
Hino (2006) | 9, 9 | Clarification | I | 1 | 0.201 | 0.461 | −0.702 | 1.103
 | | | | 2 | 0.153 | 0.460 | −0.749 | 1.055
 | 10 | Metalinguistic | E | 1 | 1.364 | 0.510 | 0.365 | 2.364
Loewen & Erlam (2006) | 11, 12 | Metalinguistic | E | 1 | 0.025 | 0.457 | −0.870 | 0.920
 | | | | 2 | −0.585 | 0.466 | −1.498 | 0.328
 | 8 | Recasts | I | 1 | 0.041 | 0.473 | −0.887 | 0.968
 | | | | 2 | −0.400 | 0.451 | −1.283 | 0.483
Loewen & Nabei (2007) | 10 | Clarification | I | 1 | 0.243 | 0.401 | −0.543 | 1.029
 | 8 | Metalinguistic | E | 1 | 0.358 | 0.421 | −0.468 | 1.184
 | 7 | Recasts | I | 1 | 0.471 | 0.368 | −0.250 | 1.192
 | 31 | Combined | I | 1 | 0.356 | 0.385 | −0.397 | 1.111
Long, Inagaki, & Ortega (1998) | 7, 8 | Recasts | I | 1 | 0.561 | 0.527 | −0.472 | 1.595
Lyster (2004) | 45, 56 | Prompts | NA | 1 | 1.030 | 0.315 | 0.417 | 1.652
 | | | | 3 | 0.942 | 0.312 | 0.330 | 1.553
 | 47 | Recasts | I | 1 | 0.506 | 0.308 | −0.098 | 1.110
 | | | | 3 | 0.485 | 0.308 | −0.119 | 1.089
McDonough (2007) | 27 | Clarification | I | 3 | 1.422 | 0.328 | 0.779 | 2.065
 | 21 | Recasts | I | 3 | 0.824 | 0.286 | 0.263 | 1.385
 | 26 | Combined | I | 3 | 1.122 | 0.307 | 0.520 | 1.726
McDonough (2005) | 15 | Clarification | I | 3 | 0.649 | 0.516 | −0.362 | 1.662
 | 15 | Repetition | I | 3 | 1.255 | 0.509 | 0.256 | 2.254
 | 15 | Repetition & clarification | I | 3 | 0.000 | 0.592 | −1.161 | 1.161
 | 15 | Combined | I | 3 | 0.476 | 0.554 | −0.610 | 1.562
Macheak (2002) | 11 | Elicitation | I | 1 | −0.205 | 0.431 | −1.050 | 0.641
 | 13 | | | 3 | 0.214 | 0.431 | −0.631 | 1.058
 | 11 | Recasts | I | 1 | −0.586 | 0.422 | −1.414 | 0.242
 | | | | 3 | −0.160 | 0.412 | −0.967 | 0.646
 | | Combined | I | 1 | −0.372 | 0.426 | −1.207 | 0.463
 | | | | 3 | −0.018 | 0.418 | −0.839 | 0.802
Mackey & Oliver (2002) | 22 | Clarification & recasts | I | 1 | 1.082 | 0.528 | 0.047 | 2.116
Sauro (2007) | 7, 8 | Metalinguistic | E | 1 | 0.716 | 0.517 | −0.296 | 1.729
 | | | | 2 | 0.947 | 0.528 | −0.088 | 1.981
 | 8 | Recasts | I | 1 | 0.554 | 0.529 | −0.482 | 1.590
 | | | | 2 | 0.842 | 0.541 | −0.219 | 1.903
Sheen (2007) | 26, 26 | Metalinguistic | E | 1 | 0.470 | 0.276 | −0.071 | 1.011
 | | | | 3 | 0.611 | 0.279 | 0.065 | 1.157
 | 28 | Recasts | I | 1 | 0.161 | 0.273 | −0.374 | 0.695
 | | | | 3 | 0.295 | 0.274 | −0.242 | 0.831
Takashima (1995) | 27, 34 | Clarification | I | 1 | 0.168 | 0.258 | −0.338 | 0.674
 | | | | 2 | 0.289 | 0.259 | −0.219 | 0.797
 | | | | 3 | 0.290 | 0.259 | −0.218 | 0.799
Tomasello & Herron (1989) | 16, 16 | Metalinguistic | E | 1 | 1.000 | 0.375 | 0.269 | 1.739
 | | | | 2 | 0.767 | 0.366 | 0.049 | 1.485

a. The displayed effect sizes are the ones that were used in the analysis. In the case of multiple effect sizes for a feedback type or target structure, they were averaged.
b. This column lists the number of participants in the experimental and control groups of each study; the last number of each cell relates to the control group. These numbers refer to the participants involved in the feedback groups that contribute effect sizes to this meta-analysis, so they may not correspond to the total number of participants of each study.
c. I = implicit feedback; E = explicit feedback; NA = not applicable.
d. In cases in which multiple implicit or explicit feedback types were examined in a primary study, they are combined to generate an average value that is used in the “explicit versus implicit” analysis. That value is not necessarily the average of all the effect sizes of a primary study.
e–h. Outliers; e and f were not included in the analysis of the most frequent feedback types as reported in primary studies; g was also not included in analyzing the overall immediate effects of feedback; h was excluded from all analyses.
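For readers reconstructing entries of this kind, the three rightmost columns are related in the standard way: Cohen's d is the mean difference divided by the pooled standard deviation, its standard error follows a common large-sample approximation, and the confidence intervals in the table are consistent with d ± 1.96 × SE. A minimal sketch with illustrative inputs (not taken from any primary study):

```python
# A minimal sketch of Cohen's d, a conventional large-sample standard
# error, and the 95% CI as d +/- 1.96*SE; all input values are made up.
import math

def cohens_d_with_ci(m_e, m_c, sd_e, sd_c, n_e, n_c):
    """Return (d, se, ci_lower, ci_upper) for two independent groups."""
    sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    d = (m_e - m_c) / sd_pooled
    se = math.sqrt((n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c)))
    return d, se, d - 1.96 * se, d + 1.96 * se

d, se, lo, hi = cohens_d_with_ci(m_e=7.9, m_c=6.1, sd_e=2.0, sd_c=2.1,
                                 n_e=20, n_c=19)
print(f"d = {d:.3f}, SE = {se:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```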
Appendix B

Primary studies | Age | Context | L2 | Mode | Setting | Interlocutor | Outcome measure | Publication type | Treatment length (min) | Task type
Ammar & Spada (2006) | 12 | FL | English | FF | Class | Teacher | FCR | Published | >120 | Com
Ayoun (2001) | 21 | FL | French | Com | Lab | Computer | CCR | Published | >120 | Drill
Bationo (1991) | 22 | FL | French | Com | Lab | Computer | CCR | Dissertation | 60–120 | Drill
Bell-Corrales (2001) | 21 | FL | Spanish | FF | Class | Teacher | CCR | Dissertation | 60–120 | Com
Carroll & Swain (1993) | NR | SL | English | FF | Lab | NS | CCR | Published | NR | Drill
Carroll et al. (1992) | 21 | SL | French | FF | Lab | NS | CCR | Published | NR | Drill
Chen (1996) | 21 | FL | Chinese | Com | Lab | Computer | CCR | Dissertation | <50 | Drill
DeKeyser (1993) | NR | FL | French | FF | Class | Teacher | CCR/RCR | Published | >120 | Com

h. Com = computer; Var = (A) variety.