REVIEW ARTICLE
The Effectiveness of Corrective Feedback
in SLA: A Meta-Analysis
Shaofeng Li
Michigan State University
Introduction
Corrective feedback in second language acquisition (SLA) refers to the responses to a learner's nontargetlike L2 production. The past decade has seen a rapid growth of empirical research on its effectiveness.
An earlier version of this article was presented at the Annual Conference of the American As-
sociation of Applied Linguistics in Denver in March 2009. I would like to thank Susan Gass for
her generous help in all phases of this study. I am indebted to Shawn Loewen, Patti Spinner, Fred
Oswald, and Luke Plonsky for their help with various aspects of the project. My thanks also go
to the anonymous reviewers and Scott Jarvis, the associate editor of Language Learning, who
provided invaluable comments on the previous versions of this article.
Correspondence concerning this article should be addressed to Shaofeng Li, A-742 Wells Hall,
Michigan State University, East Lansing, MI 48824. Internet: lishaofe@msu.edu
DOI: 10.1111/j.1467-9922.2010.00561.x
Background
Corrective Feedback in SLA
The effectiveness of corrective feedback is mainly attributable to the negative
evidence it entails. According to Gass (1997), language learners have access to
two types of input: positive evidence and negative evidence. Positive evidence
informs the learner of what is acceptable in the target language and contains
“the set of well-formed sentences to which learners are exposed” (p. 36).
Negative evidence, in contrast, provides the learner with information about
the incorrectness of an L2 form or utterance and is often realized through the
provision of corrective feedback in response to the learner’s nontargetlike L2
production. The distinction between the two types of input gives rise to the
question of whether both types of evidence are necessary or if exposure to
positive evidence is the only necessary condition for L2 learning.
2004); (b) recasts facilitate SLA (Han, 2002; Mackey & Philp, 1998); (c) dif-
ferent types of feedback have differential impact on SLA—explicit feedback is
more effective than implicit feedback (Ellis et al., 2006) and prompts work bet-
ter than recasts (Ammar & Spada, 2006; Lyster, 2004)—and (d) the occurrence
of uptake varies in different contexts (Lyster & Mori, 2006) and is constrained
by the characteristics of feedback (Loewen & Philp, 2006).
Although corrective feedback has generally been found to benefit L2 acquisition, providing sound pedagogical guidance to L2 practitioners and identifying research directions for L2 researchers requires an accurate and holistic picture of how effective it is across different studies, how
different feedback types impact L2 learning differently, and how the effective-
ness of corrective feedback is mediated by the idiosyncratic methodological
features of primary studies. To answer these questions, it is necessary to con-
duct a comprehensive research synthesis, and meta-analysis has been shown to
be one of the most effective tools for research synthesis.
The following sections discuss what previous meta-analyses (Mackey & Goo, 2007; Norris & Ortega, 2000; Russell & Spada, 2006) have revealed about the effectiveness of corrective feedback.
Norris and Ortega (2000) meta-analyzed the empirical studies published
between 1980 and 1998 on the effectiveness of L2 instructional treatments. In
the analysis, L2 instruction was identified as either focus on form or focus on
forms, depending on whether the teaching of linguistic forms was embedded in
meaningful communication or was implemented in the absence of context, and
as either explicit instruction or implicit instruction, depending on how much
learner attention was drawn to linguistic forms. In general, it was found that the
average effect size for focus-on-form treatments was slightly larger than that
for focus-on-forms treatments and that explicit instruction had substantially
larger effect sizes than implicit instruction.
Within the broad domains of focus on form and focus on forms, Norris
and Ortega (2000) also investigated the effect sizes of some subgroups of
studies, including the two groups of studies examining the efficacy of recasts
and metalinguistic feedback, respectively. The average effect size for recasts
was d = 0.81, and for metalinguistic feedback, it was d = 0.96, suggesting a
larger effect for the latter, a more explicit type of feedback. It must be noted that, due to the small number of included studies that investigated recasts, the confidence interval was wide (–0.43 to 2.05) and included zero, precluding any firm conclusion that the true effect differed from zero. In addition to recasts and metalinguistic
feedback, other feedback types such as clarification were also included in the
analysis, but because their effect sizes were not separately calculated (either
because of the focus of the analysis or because of the small number of related
primary studies contributing effect sizes), exactly how effective they were is
not known.
Russell and Spada’s (2006) analysis specifically examined the effectiveness
of corrective feedback. The included studies were published between 1988
and 2003. It encompassed both studies examining oral feedback and those
examining written feedback; it also investigated corrective feedback (be it oral
or written) provided to written errors (i.e., grammar errors in L2 writing). It was
found that the mean effect size for all treatments was 1.16 and that overall oral
feedback had a smaller effect size than written feedback, although both effect
sizes were large. Because the number of studies included in the analysis was relatively small (k = 15), the meta-analysts cautioned against generalizing the findings, especially when it comes to the effect sizes of the subgroups.
Additionally, due to the lack of primary research on the effects of individual
feedback types, the meta-analysts did not distinguish between feedback types
or carry out separate analyses for them.
This meta-analysis sought to fill these gaps and answer the remaining questions by taking the following steps: (a) It included unpublished dissertations to minimize publication bias; (b) it excluded studies investigating feedback following errors in the learner's written production, on the assumption that those studies involve different constructs; (c) it identified variables that had not been dealt with in previous analyses, such as feedback delivery mode and publication type; (d) it examined corrective feedback as the sole construct so that a clearer picture of this type of L2 instruction could be obtained; (e) it utilized both the fixed-effects model and the random-effects model to show a more comprehensive picture of the topic under investigation; and (f) it followed the principle of "one study, one effect size" as much as possible to minimize sample size inflation and nonindependence.
Research Context
Descriptive studies (Lyster & Mori, 2006; Sheen, 2004) have shown that the occurrence and uptake of corrective feedback were very different across research or instruction settings, but experimental studies have not singled out this variable as an independent variable. Research context can be divided into foreign language (FL) and second language (SL). A foreign language context is one where the learner studies a language that is not the primary language of the linguistic community (e.g., an L1 Korean speaker learning English in Korea); a second language context is one in which the learner's target language is the primary language of the linguistic community (e.g., an L1 Korean speaker learning English in the United States). Because the dynamics of these two contexts are different, the effects of feedback are likely to differ.
Research Setting
Feedback studies have been conducted in both the laboratory and the classroom. In the laboratory, distraction is minimized and instructional interventions can be better implemented than in the classroom. Classroom feedback studies are mostly described as quasi-experimental because distractor variables cannot be easily or entirely controlled. In light of the differences between the laboratory and the classroom, there is reason to believe that the effects of feedback may not be the same across the two settings.
Task Type
Feedback can be provided in communicative activities (focus-on-form activi-
ties), in which linguistic forms are attended to in meaningful communication; it
can also be provided in mechanical drills (focus-on-forms activities), in which
the primary focus is on linguistic forms and in which feedback is supplied on
an item-by-item basis. Feedback provided in these two remarkably different
task types may lead to different learning outcomes.
Mode of Delivery
Mode of delivery refers to whether feedback is provided through the computer
or in face-to-face communication. Sagarra (2007) stated that feedback provided
through the computer is more salient; one anonymous reviewer pointed out that
computerized feedback might be more consistent. Therefore, the possibility
exists that the mode of delivery may impact the effects of feedback. To date,
no empirical research has been done to compare the two modes of delivery.
Outcome Measure
As previous meta-analyses showed (Mackey & Goo, 2007; Norris & Ortega, 2000), primary researchers used varied test formats to measure the effects of
L2 instruction, and outcome measure did mediate the effects of instruction.
Determining how it would impact the effects of corrective feedback or how the
effects of feedback are reflected by different test types is one of the objectives
of this meta-analysis.
Publication Type
It is generally believed that studies with significant findings are more likely to
be published. This is called publication or availability bias in meta-analysis—“a
tendency on the part of researchers, reviewers, and editors to submit, accept, and
publish studies that report statistically significant results consistent with theo-
retical or previously established empirical expectations" (Cornell & Mulrow, 1999).
Length of Treatment
In feedback research, the duration of treatment ranges from 15 min (Chen, 1996)
to a semester (Tomasello & Herron, 1989). Although the impact of this variable
must be investigated together with other variables such as the complexity of
linguistic structure, the intensity of feedback (Norris & Ortega, 2000), learner
differences, and so on, it is interesting to examine whether treatment length
alone has any influence on the effects of feedback.
Age
Descriptive research (Mackey, Oliver, & Philp, 1997; Oliver, 2000) has indi-
cated that children were different from adult learners in the way they responded
to and used feedback. Primary studies on corrective feedback have been con-
ducted with adult or child L2 learners, but no study has examined age as an
independent variable. This meta-analysis seeks to determine if learners’ age
mediates the effectiveness of corrective feedback.
As discussed, these so-called methodological or learner characteristics that
potentially affect the effectiveness of corrective feedback have not been inves-
tigated in primary research. However, they may become independent variables
if they have a substantial effect. That being the case, the findings of primary
studies must be reinterpreted in conjunction with these factors. Although these
variables have not been examined in primary studies, their impact can be identi-
fied by performing a meta-analysis in which the effect sizes generated by these
studies are compared and synthesized.
This meta-analysis seeks to answer the following research questions:
1. What is the overall effect of corrective feedback on L2 learning?
2. Do different feedback types impact L2 learning differently?
3. Does the effectiveness of corrective feedback persist over time?
4. What are the moderator variables for the effectiveness of corrective
feedback?
Method
Identifying Primary Studies
The following steps were taken to locate related primary studies. First, two
commonly used electronic databases in the fields of applied linguistics and
education, LLBA and ERIC, were searched. The key words and combination
of key words that were used include corrective feedback, feedback, implicit
feedback, explicit feedback, negative evidence, negative feedback, error cor-
rection, negotiation, recasts, metalinguistic feedback, prompts, clarification,
second language acquisition/learning, foreign language education/learning,
focus on form, focus on forms, and form-focused instruction. Second, both elec-
tronic and manual searches were performed for the current and back issues of
some widely cited journals in SLA and applied linguistics, including, but not
limited to, Language Learning, Studies in Second Language Acquisition, Ap-
plied Linguistics, The Modern Language Journal, TESOL Quarterly, Foreign
Language Annals, Language Teaching Research, System, The Canadian Mod-
ern Language Review, International Review of Applied Linguistics, Computer
Assisted Language Learning, and Language Learning and Technology. Third,
state-of-the-art articles (e.g., Ellis & Sheen, 2006; Felix, 2005; Nicholas et al.,
2001) and edited books, course books, and book chapters related to corrective
feedback (e.g., Doughty & Long, 2003; Gass, 2003; Gass & Selinker, 2001;
Long, 2007; Mackey, 2007), as well as their reference sections, were scanned
for potential sources of primary research. Fourth, the reference sections of the
published meta-analyses associated with corrective feedback were carefully
examined.
Finally, in order to minimize availability bias or the “file-drawer” problem
(the fact that some fugitive literature might be tucked away in researchers’
file cabinets), this meta-analysis included Ph.D. dissertations. The existence
of availability bias is evidenced by Rosenthal’s (1984) finding that averaged
d values yielded by theses and dissertations were at least 40% less than those
from other sources. Because of the possible presence of availability bias, experts
in meta-analysis (Hunter & Schmidt, 2004; Konstantopoulos & Hedges, 2004;
Lipsey & Wilson, 2001) have been calling for the inclusion of unpublished
studies in meta-analyses.
Initially, the researcher considered obtaining as much “fugitive” literature
as possible, including conference presentations, manuscripts in press, and so
on, but due to the difficulty involved in retrieving those materials, it was
decided that only Ph.D. dissertations would be included. In light of the fact that
most dissertations are carefully designed and provide detailed information on
research methodology and statistical analyses, it is justified to include them in a
meta-analysis. The electronic database ProQuest Dissertations and Theses was
utilized to search for dissertations. The key words used in search of published
studies were also used to search for dissertations. After related dissertations
were identified, they were requested and obtained through the InterLibrary Loan
service at Michigan State University.
Inclusion/Exclusion Criteria
A study must have had the following characteristics to be included in this
meta-analysis:
1. One of the independent variables was corrective feedback, either in the
form of recasts, metalinguistic feedback, explicit correction, negotiation
(clarification request, confirmation check, elicitation, and repetition) and
so on, or a combination of different feedback types.
2. Feedback was delivered either face-to-face or via the computer.
3. It was experimental or quasi-experimental and had a control group or a
group that could be considered a comparison group (i.e., no feedback
treatment or least amount of feedback treatment) so that learning effects
after treatment could be observed by comparing the gains of experiment
groups and those of the control or comparison group.
4. The effect of feedback could be disentangled from the effects of other
treatments. This specification made it possible to include studies in which
instructional intervention included feedback as well as other instructional
types. For instance, one study that was included in this analysis but excluded
from Mackey and Goo’s (2007) study is by Lyster (2004), who examined
four conditions: FFI (form-focused instruction) + recasts, FFI + prompts,
FFI-only, and control. It was excluded because the researchers argued
that the FFI + recasts and FFI + prompts groups involved two types
of instruction—FFI and feedback—so it was difficult to tease them out.
However, if the FFI-only group, instead of the control group that received
no treatment, serves as the comparison group, any effect based on the
comparison between this group and the FFI + recasts or FFI + prompts
group must be due to the presence or absence of feedback.
5. The dependent variable measured the learning of an L2 feature, be it
morphosyntactic, lexical, or phonological.
6. It was published in English.
7. It utilized statistical analyses that investigated mean differences. Although
it is possible to convert one effect size index to another (e.g., from r to d and
vice versa), meta-analyzing studies that use different effect size measures
usually does not generate interpretable results (Lipsey & Wilson, 2001).
8. It examined the effect of corrective feedback on either child L2 learners or adult L2 learners (following Mackey & Goo, 2007).
A study was excluded from this analysis for the following reasons:
Coding
Because of the cumbersome, complicated, and important nature of coding for
meta-analysis, the creation of a coding scheme was a cyclic process that in-
volved repeated modifications and revisions. At first, 20% of the retrieved
studies were examined to generate a preliminary scheme identifying the inde-
pendent and dependent variables and methodological features, which were then
categorized and given labels that applied to as many studies as possible. The
coding protocols of previous meta-analyses were also consulted in the estab-
lishment of the preliminary scheme. To ensure coding reliability, the primary
studies went through a total of five rounds of coding. At the completion of the
third round, a second coder (a meta-analyst) coded 11 out of the 33 retrieved
studies independently, including 6 dissertations and 5 published articles. The
second coder was asked to pay particular attention to high-inference variables
such as feedback type and outcome measure. The agreement rate was 98%, and
differences were resolved through discussion. Fourth and fifth rounds of coding
were performed to make sure that all of the data were coded in compliance with
the protocol both coders agreed upon.
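As an illustration only, the agreement rate reported above amounts to the following computation; the function and the data layout are hypothetical, since the original coding was done by hand rather than in code.

```python
def agreement_rate(coder_a, coder_b):
    """Proportion of parallel coding decisions on which two coders agree."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Example: agreement on 3 of 4 coding decisions yields 0.75.
print(agreement_rate(["recast", "explicit", "GJT", "lab"],
                     ["recast", "explicit", "GJT", "classroom"]))
```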
Feedback Type
Partly because of the variety of feedback types investigated by primary re-
searchers and partly because of the different ways to operationalize the same
feedback types in different studies, the coding of feedback type posed a great
challenge. On the one hand, the categories must be general enough to encom-
pass as many studies as possible; on the other hand, the categories must be
specific enough to maintain the unique features of primary studies. In this
meta-analysis, two schemes were developed with regard to feedback type.
First, feedback was identified as reported and defined in primary studies and
the original labels were maintained. This scheme makes it possible to compare
the effects of the most frequent corrective strategies in the data set. In coding
feedback type, care was taken to make sure that the categories that were used
were consistent across studies regardless of the idiosyncrasy involved when
primary researchers labeled the types of feedback they investigated. Feedback
types were categorized and labeled according to the scheme developed by
Lyster and his colleagues (Lyster, 1998, 2001, 2004; Lyster & Mori, 2006;
Lyster & Ranta, 1997). Recasts refer to partial or complete reformulation of the
learner’s erroneous utterance; explicit correction is defined as the provision of
the correct form while clearly indicating that the learner’s utterance is wrong;
metalinguistic feedback refers to metalinguistic comments or information about
the learner’s utterance; elicitation refers to the interlocutor’s (teacher or native
speaker) attempt to elicit a reformulation from the learner by asking questions
such as “How do we say this in English?”; clarification requests ask the learner
to clarify his/her utterance through questions such as “Pardon me?” or “I don’t
understand”; repetition is a move where the interlocutor repeats the learner’s
ill-formed utterance.
The second scheme coded feedback types into implicit feedback and explicit
feedback. Where it was impossible to classify certain feedback types in terms
of explicitness/implicitness, such as when a feedback type was operationalized
as containing both implicit and explicit feedback (such as “prompts” in Lyster,
2004, which included clarification, elicitation, repetition, and metalinguistic
feedback), their original labels were maintained and they were not included for
analysis when the implicit versus explicit comparison was made. Although the
explicitness of feedback varies along a continuum and even the same feedback
type, such as recasts, can vary in explicitness, it is generally agreed that recasts
are toward the implicit end and explicit correction and metalinguistic feedback
are at the explicit end (Ellis et al., 2006; Lyster, 1998). Corrective feedback
in the form of clarification and elicitation was classified as implicit (Carroll &
Swain, 1993). Consequently, in this meta-analysis, implicit feedback included
recasts, negotiation (clarification requests, elicitation, and repetition), and any
type of feedback that was not intended to overtly draw the learner’s attention to
his/her erroneous production; explicit feedback included metalinguistic feed-
back, explicit correction, and any feedback type that overtly indicated that the
learner’s L2 output was not acceptable (such as “explicit hypothesis rejection”
in Carroll & Swain, 1993). The implicit versus explicit dichotomy is necessary
because it has been argued that explicit feedback is superior to implicit feedback
in SLA because the former is more salient (Carroll & Swain, 1993; Ellis et al.,
2006).
As one anonymous reviewer pointed out, the boundary between explicit and
implicit feedback cannot be easily drawn and there are different ways to clas-
sify feedback types. For instance, Lyster and Ranta (1997) argued that feedback
types should be categorized according to whether learner repair is encouraged:
Recasts and explicit correction supply the correct form and therefore do not
encourage learner repair, whereas prompts (which include metalinguistic feed-
back, elicitation, clarification, and repetition) withhold the target form and en-
courage self-correction. Loewen and Nabei (2007) pointed out that recasts and
explicit correction could be labeled “other repair” and prompts “self-repair.” It
would be interesting to meta-analyze the effectiveness of prompts in comparison
with recasts and explicit correction. However, there have been only two studies
that investigated the effects of prompts compared with recasts. Most primary
researchers have examined the effects of individual feedback types and opera-
tionalized and discussed the results in terms of the explicitness/implicitness of
the feedback. Therefore, the “explicit versus implicit” scheme was used in this
meta-analysis.
Outcome Measure
Following Norris and Ortega (2000), measures of treatment effect were coded
as metalinguistic judgments (or grammaticality judgment tests [GJTs]) if learners
were required to make a judgment on the grammaticality of some target struc-
tures; as selected responses if learners were asked to choose the correct answer
among several alternatives; as constrained constructed responses if learners
were required to produce the tested forms in tasks where the use of the tar-
get structure was essential; and as free constructed responses if learners were
required to produce the target language without many constraints.
Timing of Posttests
Following Keck, Iberri-Shea, Tracy-Ventura, and Wa-Mbaleka (2006), a test
was defined as an immediate posttest if it was taken less than 7 days after the
treatment, as a short-term delayed posttest if it was administered 8–29 days
after the treatment, and as a long-term delayed posttest if it happened 30 days
or later after the treatment. In cases in which the posttesting time frame of a
primary study did not match the scheme of this meta-analysis, it was coded
to fit into the scheme. For instance, in Bationo (1991), the first and second
posttests were both administered within 7 days after the treatment, but only the
first was included and coded as “Post 1.”
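To make the cutoffs concrete, the coding rule can be sketched as follows; this is an illustrative reconstruction (the text leaves day 7 itself unassigned, and it is folded into the immediate category here):

```python
def code_posttest_timing(days_after_treatment: int) -> str:
    """Posttest timing categories, following Keck et al. (2006)."""
    if days_after_treatment <= 7:      # "less than 7 days"; day 7 folded in here
        return "immediate"
    elif days_after_treatment <= 29:   # 8-29 days after treatment
        return "short-term delayed"
    else:                              # 30 days or later
        return "long-term delayed"

print(code_posttest_timing(3))    # immediate
print(code_posttest_timing(14))   # short-term delayed
print(code_posttest_timing(45))   # long-term delayed
```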
Measures of Proficiency
As in Keck et al.’s study (2006; also see Thomas, 1994), a proficiency measure
was coded as an impressionistic judgment if the participants’ proficiency level
was based on the researcher’s personal evaluation; as institutional status if
learners’ proficiency was assessed on the basis of their enrollment in a language
class or program; as in-house assessment if a placement test or a test created
by the researcher was used; and as a standardized test if the participants’
proficiency was calibrated according to their performance on an established
test such as TOEFL or the ACTFL Proficiency Guidelines. Because of the high
degree of heterogeneity in primary researchers’ use of proficiency measures,
this variable was not included in the moderator analyses.
Length of Treatment
Three categories were identified as far as length of treatment is concerned.
If the duration of a treatment was 50 min or less, it was coded as a “short
treatment”; if it was between 60 and 120 min, it was considered “medium”; if
it was over 120 min, it was considered “long.” It should be noted that the cutoff
points for the length of treatment were arbitrary. In Norris and Ortega’s (2000)
meta-analysis, four categories of treatment length were identified: brief (less
than 1 hr), short (over 1 hr but less than 2 hr), medium (from 3 to 6 hr), and
long (over 7 hr). In this meta-analysis, three, rather than four, categories were
created due to the relatively small sample size; the boundaries of each category
were delineated in the way that better fit with the distribution of the studies in
the data set in terms of duration of treatment.
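In the same illustrative spirit, the treatment-length rule reduces to the following sketch (the 51–59 min gap between the stated boundaries is folded into the medium category here):

```python
def code_treatment_length(minutes: int) -> str:
    """Treatment-length categories; cutoffs are arbitrary, as noted above."""
    if minutes <= 50:      # 50 min or less
        return "short"
    elif minutes <= 120:   # 60-120 min; the unstated 51-59 min gap lands here
        return "medium"
    else:                  # over 120 min
        return "long"
```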
Task Type
Tasks that involved meaningful communication were coded as “communica-
tive.” Such tasks include information gap, jigsaw, decision making, and so on,
the focus of which is on fulfilling a task rather than linguistic forms per se.
Tasks that focused on linguistic features and that required the learner to engage
in mechanical practice were coded as “drill.” An example of such a task is when
a learner is required to answer discrete questions by using the target structure
to be learned, followed by corrective feedback about the answers. Tasks that
did not fit in either category were recorded as “miscellaneous,” such as when a
task contained both drills and communicative activities.
Learners’ Age
Learners’ age was coded as follows. For studies that reported participants’
average age, the original information was recorded; for studies that reported
participants’ enrollment at school, such as “university students,” “freshmen,”
and so on, their age was estimated (e.g., 12 for "sixth graders" and 18 for
“freshmen”); for studies that reported a narrow range such as “18–20,” the
median (19) was taken as the average age; for studies that did not provide any
related information or provided a wide range such as “18–55,” they were coded
as such and were not included when the age effect was investigated. Because
of the lack of studies dealing with child L2 learners (n = 3), which makes it
difficult to determine the differential effects of corrective feedback on child
and adult learners as separate groups, in this meta-analysis learners’ age was
investigated as a continuous moderator variable.
Analysis
All the analyses were performed by using professional meta-analysis software
called Comprehensive Meta-Analysis (CMA; Borenstein, Hedges, Higgins, &
Rothstein, 2005), which has been developed by a group of experts from the
United States and the United Kingdom. An almost “all-purpose meta-analysis
program” (Hunter & Schmidt, 2004, p. 466) and “probably the most sophisti-
cated stand-alone package for meta-analysis” (Littell, Corcoran, & Pillai, 2008,
p. 146), it has many features that allow users to perform analyses that would
otherwise be impossible, such as calculating effect sizes based on different data
formats, yielding results for both the fixed-effects and random-effects models,
using Q-tests to detect significant moderators, plotting availability bias, and
so on. Ever since the program was developed, it has been used in numerous
meta-analyses in various academic fields (e.g., Juffer & van IJzendoorn, 2007; LeBauer & Treseder, 2008; Richardson & Rothstein, 2008) and has proven to be an effective meta-analytic tool.
Fixed-Effects Versus Random-Effects Models
There are two models of meta-analysis that are based on different assumptions:
fixed-effects (FE) models and random-effects (RE) models (Cornell & Mulrow,
1999; Hedges, 1994; Hunter & Schmidt, 2004; Raudenbush, 1994). FE models
are based on the assumption that the population effect size is the same in
all the studies included in the meta-analysis and any variation between the
studies is attributable to sampling variability. RE models allow variation of the
true population effect in the included studies, and the variation results from
heterogeneous factors. Choosing between the FE and RE models is not easy
and often “involves a considerable amount of subjective judgment” (Cooper &
Hedges, 1994, p. 526). Because the two models generate somewhat different
results, reporting the estimates of either model alone would be misleading.
Therefore, in this meta-analysis, the results from both models are reported
(e.g., Patall, Cooper, & Robinson, 2008; Shadish, 1992) to present a more
comprehensive picture of the included studies.
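In standard notation (a sketch that the article itself does not spell out, with $d_i$ the effect size contributed by study $i$ and $v_i$ its sampling variance), the two models differ only in how studies are weighted:

\hat{\theta}_{FE} = \frac{\sum_i w_i d_i}{\sum_i w_i}, \qquad w_i = \frac{1}{v_i}

\hat{\theta}_{RE} = \frac{\sum_i w_i^{*} d_i}{\sum_i w_i^{*}}, \qquad w_i^{*} = \frac{1}{v_i + \hat{\tau}^2}

where \hat{\tau}^2 is the estimated between-study variance (e.g., the DerSimonian–Laird estimator). Under the FE assumption, \hat{\tau}^2 = 0, so the two estimates coincide when no between-study heterogeneity is detected.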
Effect Size Calculation
The following contrasts were identified for effect size calculation:
1. If a study had a control group and the only difference between the control
group and experiment groups is the presence or absence of corrective
feedback, effect sizes were calculated by comparing each treatment group
with the control group.
2. If a study had a control group that did not receive any treatment and
the experiment groups received another type of instruction in addition to
corrective feedback such that the effect of feedback could not be singled out
by comparing the control group and the experiment groups, effect sizes
were calculated by using the one experiment group as the comparison
group that differed from the other experiment groups only by the presence
or absence of feedback (e.g. Lyster, 2004; Mackey & Philp, 1998).
3. If a study had no control group, effect sizes were calculated by using the
group that received the least amount of feedback as the comparison group
that provided baseline data.
Equations used in effect size calculation were selected based on the data formats reported in primary studies. For studies that reported group means and standard deviations, Cohen's d was calculated as

d = \frac{\text{Mean difference}}{\text{Pooled } SD}, \quad (1)
where mean difference refers to the difference between the mean of the ex-
periment group and that of the control group (in cases where there were no
pretest scores) or between the mean change score of the experiment group and
that of the control group (in cases where both pretest and posttest scores were
reported). The pooled standard deviation (SD) was calculated based on the
standard deviations of the experiment and control means. For studies (n = 2 in
this meta-analysis) that reported only t values or F values (n = 1), Equations 2
and 3 were used, respectively:
d = \frac{t}{\sqrt{\text{Harmonic } N / 2}}, \qquad \text{Harmonic } N = \frac{2 N_1 N_2}{N_1 + N_2}, \quad (2)

d = \sqrt{\frac{F (N_1 + N_2)}{N_1 N_2}}. \quad (3)

For studies that reported their results as frequencies, from which an odds ratio could be derived, Equation 4 was used:

d = \frac{\sqrt{3} \, \log(\text{odds ratio})}{\pi}, \quad (4)
where odds ratio refers to the ratio of the odds of an event occurring in the
experiment group compared to the control group.
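To make Equations 1–4 concrete, here is a minimal sketch of the conversions in code. The function names are hypothetical, and Equation 1 assumes the usual df-weighted pooling of the two standard deviations, which the text does not spell out.

```python
import math

def d_from_means(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Equation 1: mean difference divided by the pooled SD."""
    pooled_sd = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    return (mean_e - mean_c) / pooled_sd

def d_from_t(t, n1, n2):
    """Equation 2: d from an independent-samples t value."""
    harmonic_n = 2 * n1 * n2 / (n1 + n2)
    return t / math.sqrt(harmonic_n / 2)

def d_from_f(f, n1, n2):
    """Equation 3: d from a one-way F value with numerator df = 1."""
    return math.sqrt(f * (n1 + n2) / (n1 * n2))

def d_from_odds_ratio(odds_ratio):
    """Equation 4: d from an odds ratio via the logit method."""
    return math.sqrt(3) * math.log(odds_ratio) / math.pi
```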
Including multiple effect sizes derived from the same sample inflates the total sample size and yields nonindependent estimates, which "can render the statistical results highly suspect" (Lipsey & Wilson, 2001, p. 105). Where there are multiple effect sizes from one study, the suggested solution is to randomly pick one or take the average.
In this meta-analysis, the principle of “one study, one effect size” was con-
sistently adhered to as much as possible in the various analyses. This was done
by averaging multiple effect sizes that tapped into the same construct or that re-
lated to the same independent variable. More specifically, the following moves
were performed. First, when the general analysis was conducted regarding the
overall effect of corrective feedback, each study contributed the average of
all of the effect sizes related to different feedback types. All of the subsequent
analyses that aimed to identify moderator variables were performed in the same
manner—that is, by including the average effect size from each study. Second,
if a study included multiple feedback types, multiple outcome measures, and/or
multiple target structures, priority was given to feedback type (as it is the pri-
mary focus of the study)—that is, the effect sizes based on different dependent
variables and/or different structures were averaged for each feedback type (e.g.,
Ellis, 2007). As a consequence, one study might contribute several effect sizes,
each related to one type of feedback. However, when a separate analysis was
conducted for a certain feedback type, only one effect size from each study
was entered into the analysis. Third, if a study presented results for the separate
parts of a test (such as listening, writing, reading, etc.) as well as a global score
that combined the discrete scores, the global score was used to calculate effect
sizes. Fourth, if the same results were reported in two studies, effect sizes were
extracted from only one study report (e.g., Ellis, 2007; Ellis et al., 2006).
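A minimal sketch of this "one study, one effect size" reduction (names and data layout hypothetical):

```python
from statistics import mean

def one_effect_per_study(effects_by_study):
    """Average the multiple effect sizes contributed by each study."""
    return {study: mean(ds) for study, ds in effects_by_study.items()}

# A study with three effect sizes contributes their average to the analysis.
print(one_effect_per_study({"Study A": [0.4, 0.6, 0.8], "Study B": [1.1]}))
```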
Outliers
Because the sample size of this meta-analysis is relatively small (which is the
case for most meta-analyses in SLA), the presence of extreme values may have
a substantial impact on the results. Preliminary analysis of the data showed that
this was indeed the case: When outliers were included, the average effect size
of the overall effect of feedback on immediate posttests was 0.70 under the
FE model and 0.88 under the RE model. Without outliers, the mean effect size
became 0.61 (FE) and 0.64 (RE). To ensure the robustness of results, outliers
were excluded in this meta-analysis. The detection of outliers was performed
through the following procedure. The effect sizes (or averaged sizes if one study
contributed more than one effect size) contributed by the primary studies under
an independent variable or moderator variable were transformed into z-scores.
Any effect size whose z-score exceeded 2.0 in absolute value (regardless of whether it was positive or negative) was eliminated from the analysis. The procedure was repeated for each independent and moderator variable analyzed.
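A minimal sketch of this z-score screen (a hypothetical implementation; the published analyses were run in CMA):

```python
def drop_outliers(effect_sizes, cutoff=2.0):
    """Remove effect sizes whose z-score exceeds the cutoff in absolute value."""
    n = len(effect_sizes)
    m = sum(effect_sizes) / n
    sd = (sum((d - m) ** 2 for d in effect_sizes) / (n - 1)) ** 0.5
    return [d for d in effect_sizes if abs((d - m) / sd) <= cutoff]

print(drop_outliers([0.5, 0.6, 0.7, 0.8, 0.9, 5.0]))  # the extreme 5.0 is removed
```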
Analysis Procedure
After effect sizes were calculated based on the raw data derived from the study
reports, they were weighted according to their sample sizes when analyses were
performed; that is, effect sizes based on larger sample sizes carried more weight
in the analysis. The weighted effect sizes served as the dependent variable, and
the independent variables included feedback in general, feedback types, and
the timing of posttests. Moderator variables were some of the coded features
listed in Table 1. All moderator analyses were performed based on the data
associated with immediate posttests.
To determine whether the mean effect sizes were significantly different from
zero, confidence intervals were calculated. A confidence interval that includes zero indicates that the null hypothesis that the effect of a certain treatment is zero cannot be rejected at the p < .05 level. The width of an
interval indicates the robustness of the effect: Narrower intervals indicate more
robust results. With regard to the magnitude of effect size, 0.20 is considered
a small effect, 0.50 indicates a medium effect, and 0.80 suggests a large effect
(Cohen, 1988). In order to determine whether there were significant differences
between different feedback types and whether the coded learner characteristics
and methodological features (which served as categorical moderator variables)
as reported by primary studies were significant moderators of the effectiveness
of feedback, Q-tests were performed. A significant between-group Q value in-
dicates that the differences between the levels under the independent variable
are significant. To ascertain whether age and year of publication, the two con-
tinuous moderator variables, were predictors of the magnitude of effect size,
the corresponding data were subjected to meta-regression analyses.
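The core computations can be sketched as follows under the FE model; this is an illustrative reconstruction (the published analyses were run in CMA), with the between-group Q compared to a chi-square distribution with (number of groups − 1) degrees of freedom:

```python
import math

def fe_summary(ds, variances):
    """Inverse-variance weighted mean effect size with a 95% CI (FE model)."""
    ws = [1 / v for v in variances]
    mean_d = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
    se = math.sqrt(1 / sum(ws))
    return mean_d, (mean_d - 1.96 * se, mean_d + 1.96 * se)

def q_between(groups):
    """Between-group Q: dispersion of group means around the grand mean.

    `groups` maps a moderator level to (effect sizes, sampling variances).
    """
    stats = []
    for ds, vs in groups.values():
        w = sum(1 / v for v in vs)                  # total weight in the group
        m = sum(d / v for d, v in zip(ds, vs)) / w  # weighted group mean
        stats.append((m, w))
    grand = sum(m * w for m, w in stats) / sum(w for _, w in stats)
    return sum(w * (m - grand) ** 2 for m, w in stats)
```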
Results
The Research Synthesis
A total of 33 studies (see Appendix A for the effect sizes each study con-
tributed, the standard error, and the 95% confidence interval [CI] of each effect
size) published between 1988 and 2007 were included in the analysis. (Data
collection was completed by September 2008.) These studies involved a total
of 1,773 L2 learners. Among the 33 studies, 22 were published articles and 11
were Ph.D. dissertations. The frequency distribution of the included published studies and Ph.D. dissertations from 1988 through 2007, in 5-year intervals, is
displayed in Figure 1. As shown, there has been a rapid growth in the number of
studies (both published articles and dissertations) on corrective feedback since
1997, indicating an increased interest in its role in SLA. In order to ascertain
whether availability bias was present (i.e., whether the retrieved studies tended
to be those with significant results), a funnel plot was created that plots the
effect sizes (immediate effects) contributed by primary studies against pre-
cision, the inverse of standard error (Figure 2). In a funnel plot, studies with
large sample sizes, because of their smaller sampling error and higher precision
values, appear toward the apex of the graph and tend to cluster near the mean
effect size. Studies with small sample sizes have greater sampling error and
lower precision values, so they tend to appear toward the bottom of the graph
16
14
12
10
Frequency
Dissertation
8
Published
6
0
1988-1992 1993-1997 1998-2002 2003-2007
Year
5
Precision
0
-3 -2 -1 0 1 2 3
Effect Size
and are dispersed across a range of values. If there is no availability bias, the
studies will be symmetrically distributed around the mean; if availability bias
is present, small studies will be concentrated on the right side of the mean.
This would mean that small-scale studies with greater sampling error (lower
precision values) and lower effect sizes are missing from the data.
The funnel plot of this meta-analysis shows the following patterns. First,
in general, larger sample studies (those with higher precision values) were
evenly distributed around the mean and appeared toward the upper part of the
funnel. Second, at the bottom of the plot, there were only a few effect sizes
and there were more effect sizes on the right side of the mean than on the left
side. This indicates that there was a lack of small-scale studies in the data, and
studies with small sample sizes and small effect sizes were not available. In
short, studies with medium and large sample sizes were well represented in the
data, but small-sample studies with small effect sizes were underrepresented.
A trim-and-fill analysis was performed to search for the missing values that
would change the mean effect size if these values were imputed. It was found
that under the FE model, four values were missing on the left side of the plot
and imputing these values would change the mean effect size from 0.61 (95%
CI = 0.51, 0.71) to 0.56 (95% CI = 0.46, 0.66); under the RE model, five values
should be added to the left side to make the plot symmetrical, and imputing
these values would change the mean effect size from 0.64 (95% CI = 0.47,
0.81) to 0.53 (95% CI = 0.34, 0.72).
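For readers who want to reproduce a display like Figure 2, a minimal sketch with toy values standing in for the per-study effect sizes and standard errors:

```python
import matplotlib.pyplot as plt

# Toy stand-ins for the per-study effect sizes and standard errors.
d = [0.2, 0.5, 0.6, 0.7, 0.9, 1.4]
se = [0.40, 0.35, 0.20, 0.15, 0.30, 0.45]

precision = [1 / s for s in se]    # precision = the inverse of the standard error
plt.scatter(d, precision)
plt.axvline(0.61, linestyle="--")  # FE mean effect size reported above
plt.xlabel("Effect Size")
plt.ylabel("Precision")
plt.show()
```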
Tables 2 and 3 summarize the learner and methodological characteristics
of the included studies (see Appendix B for the information related to each
included study). According to the reported information, the average age of
the participants of these studies was 19.6. Most of the studies recruited adult
L2 learners and only three studies investigated child learners. Around half
of the 33 studies involved L1 speakers of English and L2 learners of English.
Approximately two thirds of the studies were conducted with university students
in low-level language classes. As found in previous meta-analyses (Keck et al.,
2006; Mackey & Goo, 2007), institutional status was the predominant measure
of proficiency. Almost 55% of the studies took place in the laboratory, and
nearly 80% of them were conducted in foreign language contexts. Whereas
in 27 studies corrective feedback was provided in the face-to-face mode, only
6 investigated the effectiveness of feedback as it was delivered through the
computer. Corrective feedback was provided by native speakers in 16 studies,
by teachers in 11 studies, and by computers in 6 studies. In terms of duration of
treatment, the most frequent were treatments that lasted 50 min or less, followed
by treatments that lasted more than 2 hr and those that were 1–2 hr long. As far as
the type of outcome measure is concerned, free constructed responses were used
in 12 studies, constrained constructed responses in 19 studies, grammaticality
judgment tests in 8 studies, and selected responses in 2 studies. Finally, with
respect to the type of instruction activity where corrective feedback was made
available to learners, more than 60% of the studies involved communicative
activities and nearly 30% of them involved mechanical drills in which feedback
was provided in discrete-item practice.
Further discussion on the experimental treatments and control conditions
in the included studies is necessary prior to the quantitative analysis. Typi-
cally, studies conducted in the laboratory involved dyadic interaction between
a native speaker and a nonnative speaker, and learners’ nontargetlike produc-
tion was followed by the corrective moves under investigation (e.g., Mackey
& Philp, 1998). Furthermore, as will also be shown in the Discussion section,
instructional interventions in most lab-based studies lasted less than 50 min. In
classroom-based studies and studies carried out with small groups (three to four
learners), feedback was intended to be directed toward the whole class or group
although only one learner’s erroneous production was responded to (e.g., Ellis
et al., 2006). Unlike studies conducted in the laboratory, classroom- or group-
based studies mostly used longer treatments. In terms of the “communica-
tive versus drill” division, learners involved in the former condition received
Table 2 Included studies by learner characteristics
Note. In each column, the numbers on the left are derived from the FE model and those on the right are based on the RE model.
Figure 3 Implicit and explicit feedback: Change of mean effect sizes over time.
The long-term effect of clarification was 0.53 under the FE model and 0.55 under the RE model. However, under the RE model, the effect size was not significant, as the confidence interval crossed zero. Q-test results showed that there was
no significant difference between different feedback types or between different
time points where a particular feedback type is concerned. However, the patterns
that emerged seem interesting and deserve further discussion, especially given
that some of the patterns were also obtained by previous meta-analyses.
Recall that corrective feedback was also coded as implicit versus explicit
to answer the question of whether explicit feedback was more effective than
implicit feedback in facilitating SLA. Implicit feedback refers to any correc-
tive move that does not overtly inform the learner of the unacceptability of
his/her erroneous production; explicit feedback, in contrast, draws the learner’s
attention to the error he/she commits. Table 4 shows that under both models,
explicit feedback worked better than implicit feedback on both immediate and
short-delayed posttests. However, on long-delayed posttests, implicit feedback
produced a larger effect size than explicit feedback, indicating that the effect of
implicit feedback was more enduring (see Figure 3 for a graphic display of the
effects of the two feedback types). Once again, although Q-tests showed that
these differences were not significant, they are noteworthy.
Moderator Variables
In order to ascertain whether the effectiveness of corrective feedback was mod-
erated by learner characteristics and methodological features, separate analy-
ses were performed for the effect sizes associated with immediate posttests.
Q-statistics were used to determine if a certain variable was a significant moderator. The analyzed moderator variables include research context, research
setting, task type, publication type, outcome measure, treatment length, mode
of delivery, interlocutor type, learners’ age, target language, and year of pub-
lication. The results for these variables, except for those for age and year of
publication, appear in Table 5. Two separate meta-regression analyses were
performed to determine whether age and year of publication were significant
predictors of the efficacy of corrective feedback. The following results were
obtained.
• Research context. The mean effect size associated with the studies con-
ducted in FL contexts was significantly larger than that associated with
studies conducted in SL contexts under the FE model, Q(1) = 4.5, p < .05,
indicating that corrective feedback was more effective in FL contexts than
in SL contexts. However, the difference was not significant under the RE
model, Q(1) = 1.3, p = .25.
• Research setting. Significant differences were found among the three con-
ditions (lab, class, and group) under this variable under both models—FE:
Q(2) = 31.3, p < .01; RE: Q(2) = 7.9, p < .05. Follow-up pairwise compar-
isons indicated that lab-based studies generated a significantly larger effect
than classroom-based studies—FE: Q(1) = 24.2, p < .01; RE: Q(1) = 6.6,
p < .05—or group-based studies—FE: Q(1) = 16.8, p < .01; RE: Q(1) =
3.7, p < .01. No significant difference was found between classroom-based
and group-based studies—FE: Q(1) = 2.1, p = .14; RE: Q(1) = 7.9, p =
.65.
• Task type. The mean effect size generated by mechanical drills was signifi-
cantly larger than that generated by communicative activities under the FE
model, Q(1) = 6.1, p < .05, but the difference was not significant under
the RE model, Q(1) = 2.2, p = .34.
• Mode of delivery. Computer-delivered feedback (which is provided by an
interlocutor through online communication programs or is embedded in the
computer) and face-to-face feedback did not differ substantially in affecting
L2 development—FE: Q(1) = 0.1, p = .77; RE: Q(1) = 0.1, p = .91.
• Outcome measure. There were no significant differences among the mean
effect sizes associated with the three outcome measures—FE: Q(1) = 3.3,
p = .35; RE: Q(1) = 1.7, p = .63. However, under the RE model, studies adopting free constructed responses seemed to show a larger effect (by around 0.15 and 0.3 standard deviation units) than constrained constructed responses and GJTs, which might deserve more attention.

Table 5 Moderator analysis: Means and Q-statistics based on immediate effects

Moderator        k    Mean d         SE             95% CI lower    95% CI upper    Q
Duration                                                                            23.9**/3.9
  <50 min       11    1.154/1.020    0.117/0.234    0.925/0.561     1.383/1.479
  60–120 min     7    0.461/0.502    0.128/0.171    0.209/0.166     0.713/0.837
  >120 min       9    0.499/0.553    0.087/0.196    0.327/0.169     0.671/0.937
Interlocutor                                                                        17.6**/4.6
  NS            14    0.997/0.975    0.103/0.218    0.795/0.548     1.198/1.403
  T             12    0.412/0.474    0.080/0.143    0.305/0.194     0.618/0.753
  Comp           3    0.828/0.886    0.178/0.242    0.478/0.410     1.178/1.361
L2                                                                                  1.3/0.5
Note. In each column, the numbers on the left are derived from the FE model and those on the right are based on the RE model. ∗∗p < .01.
• Publication type. Published studies did not show a larger effect than Ph.D.
dissertations; in fact, the mean effect size for dissertations was larger than
that yielded by published articles. However, the difference was not signif-
icant under either model—FE: Q(1) = 0.6, p = .43; RE: Q(1) = 0.4, p =
.56.
• Treatment length. Significant differences were found among the three
groups of studies in this category under the FE model but not under the
RE model—FE: Q(2) = 23.9, p < .01; RE: Q(2) = 3.9, p = .26. Pair-
wise comparisons revealed that short treatments (50 min or less) produced
a substantially larger mean effect size than treatments of medium length
(60–120 min)—FE: Q(1) = 16.0, p < .01; under the RE model, the differ-
ence approaches significance, Q(1) = 3.2, p = .07. Short treatments also
produced significantly larger effects than long treatments (over 120 min)—FE: Q(1) = 20.1, p < .01. There was no significant difference between
medium-length treatments and long treatments.
• Interlocutor type. The Q-tests revealed that there were significant differ-
ences among the three groups of studies (computer, native speaker, and
teacher) under the FE model, Q(2) = 17.6, p < .01; but based on the RE
model, the differences were nonsignificant, Q(2) = 4.6, p = .09. Pairwise
analyses showed that feedback provided by native-speaker interlocutors
was significantly more effective than feedback provided by teachers—FE:
Q(1) = 16.9, p < .01; under the RE model, the difference bordered on
significance, Q(1) = 3.7, p = .05. The difference between computerized
feedback (which is embedded in the computer and does not involve an
interlocutor) and teacher-provided feedback approached significance—FE:
Q(1) = 16.9, p = .06. No significant difference was found between com-
puterized feedback and feedback provided by native speakers.
• Target language. Effect sizes were calculated only for studies examining
L2 English, L2 French, and L2 Spanish, as the number of studies related
to other L2s was not large enough for analysis. It was found that studies
related to L2 English yielded larger effect sizes than those investigating
L2 French or L2 Spanish, although the differences were not significant
under either model.
Discussion
This meta-analysis sought to determine the effectiveness of corrective feedback
in L2 learning and to identify the moderator variables for its effectiveness. It
was found that overall feedback showed a medium effect and the effect was
maintained over time. In general, the effects found in this meta-analysis are
smaller than in previous analyses. In Russell and Spada’s (2006) analysis, the
studies that examined oral feedback, which is the focus of this meta-analysis,
yielded a mean effect size of 0.91; Mackey and Goo (2007) performed, among
others, a separate analysis comparing studies examining interaction with or
without feedback, and the mean effect size for the studies with feedback was
0.71 (a near-large effect). The variation among these three meta-analyses in
terms of the magnitude of effect size is attributable to their different inclusion
criteria and to the exclusion of outliers from this analysis. As for inclusion
criteria, Russell and Spada’s (2006) meta-analysis included studies published
before 2003, and some of the studies published before 2003 were included in
the current meta-analysis but not in theirs. Additionally, their analysis included
studies that examined corrective feedback in L2 writing and excluded studies
this moderator variable can be merged with the “research context” variable.
However, this somewhat “unfortunate” coincidence implies that more studies
are needed that examine different interlocutor-context combinations: different
interlocutors (teacher vs. native speaker vs. computer) in the same context (lab
or classroom) or the same interlocutor in different contexts. Other potential
areas of research related to this topic include age and/or gender of interlocutor,
relationship between the learner and the interlocutor, and so forth. Furthermore,
it should be noted that Sagarra’s study (2007), which investigated the effects
of computerized recasts, was excluded as an outlier when the analysis was
performed. Initial analyses showed that including the study would have made
computerized feedback the most effective feedback type as far as interlocutor
type is concerned. Computerized feedback is salient and consistent and can be
delivered visually or both aurally and visually; one would expect it to show a
larger effect than other interlocutor types. However, this hypothesis needs to be
empirically tested.
Finally, to ascertain if the effectiveness of feedback depends to some extent
on the target language to be learned, effect sizes were calculated for the three
most frequent L2s in the data set. It was found that feedback provided in
the learning of L2 English was slightly more effective than that provided
in the learning of L2 French and L2 Spanish although the differences were
not significant. An explanation was sought through the cross-tabulation of
the results related to other independent variables. Examination of the learner
characteristics of these studies showed that out of the 13 L2 English studies, 9
were conducted with ESL/EFL learners in intensive language programs. Among
the nine L2 French studies, seven involved learners enrolled in university
language classes, one examined immersion students at an elementary school,
and one investigated high school students. The L2 Spanish learners in the
three primary studies were all students taking language classes at universities.
Language students at intensive training programs typically receive 4–5 hr of
instruction every day and might therefore be more sensitive and receptive to
corrective feedback than students in university language classes or students in
immersion programs.
It should be noted that some of the results as interpreted above are not
statistically significant. However, explanations were sought and speculations
were attempted because the results showed some interesting and noteworthy
patterns. It is hoped that these tentative speculations can provide some directions
for future research. For instance, the results about the differential effects of
implicit and explicit feedback did not reach statistical significance. However,
because the obtained patterns can be woven into the theoretical framework of
implicit and explicit knowledge (DeKeyser, 2003; Ellis, 2005) and there has
been a heated debate over which feedback type is more effective (Long, 2007;
Lyster, 2004), some explanations were attempted to account for the findings.
Conclusion
In response to the mushrooming of empirical research on the effectiveness
of corrective feedback in L2 learning, this meta-analysis was undertaken to
present a summative description of previous findings by investigating the mag-
nitude of related effect sizes across primary studies. It was intended to be
an update and complement to previous meta-analyses that are related to cor-
rective feedback in one way or another. To achieve this purpose, a series of
methodological moves were taken. These moves include establishing a dif-
ferent set of inclusion/exclusion criteria to sharpen the study focus and min-
imize publication bias, presenting the results from both the FE and the RE
model, using Q-tests to detect group differences and identify moderator vari-
ables, controlling for sample size inflation, and so on. The introduction of
these moves was expected to make the results more robust and trustworthy,
which might, in turn, provide useful information and reference for interested
L2 researchers and educators. By performing these moves, it was also hoped
that this meta-analysis can provide some methodological implications for SLA
meta-analysts.
This meta-analysis explored some issues that have not been investigated, or have
been insufficiently investigated, in previous meta-analyses. It revealed that explicit
feedback worked better than implicit feedback in the short term and that the effects
of implicit feedback did not fade, and even increased, in the long term. It also
identified some significant moderators, such as research context, research setting,
task type, treatment length, and interlocutor type. The results concerning these
moderators, as well as those concerning variables that were nonsignificant
moderators but generated noteworthy patterns, were discussed at length and
interpretations were offered.
This analysis identified the following issues to be addressed in future re-
search. First, the presence of availability bias in this meta-analysis shows that
more research is needed on corrective feedback. As far as specific feedback types
are concerned, although there is a relatively large body of research on recasts,
less attention has been paid to explicit correction and metalinguistic feedback,
and even less to negotiation moves such as clarification and elicitation, which
makes the comparison of effect sizes across feedback types difficult. For
instance, effect sizes were not calculated for the immediate and short-term
effects of clarification, the short- and long-term effects for explicit correc-
tion, or the long-term effect of metalinguistic feedback simply because there
were not sufficient related studies. The unbalanced representation of individual
feedback types in the data set, in turn, limits the conclusion concerning the
differential effects of different feedback types. Second, the fact that primary
researchers operationalized feedback, particularly specific feedback types, in
different ways poses a great challenge to meta-analysts when they try to dis-
entangle the effects of different varieties of feedback. Researchers are therefore
urged to be more consistent in defining and operationalizing different types of
feedback. Third, it was found that a few studies
did not provide learners’ pretest scores or did not measure learners’ knowledge
about the target structures prior to the instructional treatments. This leaves it
unclear to what extent the obtained effects were attributable to the treatments.
The categorization of the meta-analyzed studies according to their learner
characteristics and methodological features revealed some gaps to be filled, and
the existence of these gaps affected the identification of moderator variables.
Specifically, more research is needed that involves child learners, that inves-
tigates speakers and learners of languages other than English, that involves
language learners of higher proficiency (as most of the studies are about begin-
ners), that is conducted in L2 contexts, and that is implemented in the computer
mode. There is also a dearth of research examining the variables that moderate
the effects of corrective feedback on SLA, such as age, gender, proficiency,
L1 transfer, culture, complexity of the target structure, or interlocutor type, to
name only a few. This suggests that, now that the effect of corrective feedback
has been established, researchers should embark on the mission of investigating
the factors constraining its effectiveness.
Revised version accepted 26 January 2009
Notes
1 Keck et al.’s meta-analysis (2006), which is about the effectiveness of task-based
interaction in SLA, also included several studies that involved negative feedback
and that were included in Mackey and Goo (2007) and this meta-analysis.
2 It must be pointed out that although ProQuest is a global database, most available
dissertations and theses in this database are from academic institutions in North
America (the United States or Canada).
3 One anonymous reviewer pointed out that the fact that the software can handle only
one independent variable at a time might be a drawback because the independent
variables might be correlated. However, addressing the correlations between
independent variables does not seem to be the norm in meta-analysis
because the results would be hard to interpret. Correlations between independent
variables are more of a concern when metaregression analyses are performed, which
explains why a correlation analysis was conducted when the two continuous
moderators were subjected to metaregression analysis. Additionally, the data
cross-tabulation as reported in the Discussion section is at least a partial, if not
perfect, solution if it is indeed a concern.
4 The current version of the software allows only one independent variable to be
included in a metaregression analysis. Therefore, two separate analyses were
performed, for age and publication year, respectively.
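As a rough illustration of what such a single-moderator metaregression computes, the sketch below fits a weighted least-squares line of effect size on one continuous moderator at a time, with inverse-variance weights. The numbers are hypothetical, and the actual analyses were run in the meta-analysis software, which may use a different estimation routine.

```python
# A minimal sketch, under hypothetical data, of a metaregression with a
# single continuous moderator, as in Notes 3-4 (one predictor at a time).
import numpy as np

d = np.array([0.61, 0.84, 0.29, 1.04, 0.17])    # hypothetical effect sizes
se = np.array([0.30, 0.29, 0.26, 0.34, 0.27])   # their standard errors
age = np.array([12.0, 19.0, 21.0, 22.0, 25.0])  # one moderator at a time

w = 1 / se**2                                   # inverse-variance weights
X = np.column_stack([np.ones_like(age), age])   # intercept + moderator
# Weighted least squares: beta = (X'WX)^(-1) X'W d
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * d))
print(f"intercept = {beta[0]:.3f}, slope = {beta[1]:.4f} per year of age")
```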
5 A total of 18 dissertations were retrieved, but only 11 were included in the analysis.
The rest did not meet the inclusion criteria. Additionally, a few dissertations
could not be retrieved even though their bibliographic information was obtained,
because the source libraries listed them as noncirculating items.
6 O’Relly’s (1999) study was published, but the dissertation the article was based on
was also retrieved. Effect sizes were calculated according to the data provided in the
dissertation because it was more detailed.
7 The moderator variables did not include learners’ L1, academic status, proficiency,
or proficiency measure because the results could not be meaningfully interpreted. In
many studies, especially those that involved ESL learners, learners’ L1s were
mixed. With regard to academic status, it is difficult to categorize learners in
intensive language programs or language schools. As far as proficiency and
proficiency measures are concerned, there was too much variation in defining
proficiency levels and in the use of proficiency measures by primary researchers.
Therefore, these variables were not analyzed.
8 One anonymous reviewer expressed a concern about the inclusion of studies
involving child learners in the analysis. Further examination of the data showed that
excluding the three studies involving child L2 learners slightly lowered the mean
effect size (on immediate posttests) from 0.61 (FE)/0.64 (RE) to 0.58 (FE)/0.61
(RE). However, the difference was below 0.03 standard deviation units.
Therefore, including the three studies did not seem to be a cause for concern.
Additionally, because age is a moderator variable in this meta-analysis, including
studies in different age groups would make the results related to this variable more
convincing.
9 The length of treatment does not provide any information on the amount/intensity of
feedback provided to the learner, although in most cases longer treatment contains
more feedback. Primary researchers usually do not report how much feedback was
provided in the treatment; they report only how long the treatment lasted. However,
the amount/intensity of feedback is certainly a question that needs to be addressed
in future research.
References
Note. Studies included in the current meta-analysis are marked with an asterisk.
*Ammar, A., & Spada, N. (2006). One size fits all? Recasts, prompts, and L2 learning.
Studies in Second Language Acquisition, 28, 543–574.
*Ayoun, D. (2001). The role of negative and positive feedback in the second language
acquisition of the passé composé and imparfait. Modern Language Journal, 85,
226–243.
*Bationo, B. (1991). The effects of three forms of immediate feedback on learning
intellectual skills in a foreign language computer-based tutorial. Unpublished
doctoral dissertation. The University of Toledo, Toledo, OH.
*Bell-Corrales, M. (2001). The role of negative feedback in second language
instruction. Unpublished doctoral dissertation. University of Florida, Gainesville.
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2005). Comprehensive
Meta-Analysis (Version 2.2.027) [Computer software]. Englewood, NJ: Biostat.
Carpenter, H., Jeon, S., MacGregor, D., & Mackey, A. (2006). Learners’
interpretations of recasts. Studies in Second Language Acquisition, 28, 209–236.
*Carroll, S., & Swain, M. (1993). Explicit and implicit negative feedback: An
empirical study of the learning of linguistic generalizations. Studies in Second
Language Acquisition, 15, 357–386.
*Carroll, S., Swain, M., & Roberge, Y. (1992). The role of feedback in adult second
language acquisition: Error correction and morphological generalizations. Applied
Psycholinguistics, 13, 173–198.
*Chen, H. (1996). A study of the effect of corrective feedback on foreign language
learning: American students learning Chinese classifiers. Unpublished doctoral
dissertation. University of Pennsylvania, Philadelphia.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Hillsdale, NJ: Erlbaum.
Cooper, H., & Hedges, L. (1994). Potentials and limitations of research synthesis. In
H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp.
521–530). New York: Russell Sage Foundation.
Cornell, J., & Mulrow, C. (1999). Meta-analysis. In H. J. Adèr & G. J. Mellenbergh
(Eds.), Research methodology in the social, behavioural and life sciences
(pp. 285–323). London: Sage.
*DeKeyser, R. (1993). The effect of error correction on L2 grammar knowledge and
oral proficiency. Modern Language Journal, 77, 501–514.
DeKeyser, R. (2003). Implicit and explicit learning. In C. Doughty & M. Long (Eds.),
Handbook of second language acquisition (pp. 313–348). Malden, MA: Blackwell.
Doughty, C., & Long, M. (Eds.). (2003). Handbook of second language acquisition.
Malden, MA: Blackwell.
*Loewen, S., & Nabei, T. (2007). Measuring the effects of oral corrective feedback on
L2 knowledge. In A. Mackey (Ed.), Conversational interaction in second language
acquisition (pp. 361–377). New York: Oxford University Press.
Loewen, S., & Philp, J. (2006). Recasts in the adult English L2 classroom:
Characteristics, explicitness, and effectiveness. Modern Language Journal, 90,
536–556.
*Long, M., Inagaki, S., & Ortega, L. (1998). The role of negative feedback in SLA:
Models and recasts in Japanese and Spanish. Modern Language Journal, 82,
357–371.
Long, M. H. (2007). Problems in SLA. Mahwah, NJ: Erlbaum.
Lyster, R. (1998). Negotiation of form, recasts, and explicit correction in relation to
error types and learner repair in immersion classrooms. Language Learning, 48,
183–218.
Lyster, R. (2001). Negotiation of form, recasts, and explicit correction in relation to
error types and learner repair in immersion classrooms. Language Learning,
51(Suppl. 1), 265–301.
*Lyster, R. (2004). Differential effects of prompts and recasts in form-focused
instruction. Studies in Second Language Acquisition, 26, 399–432.
Lyster, R., & Mori, H. (2006). Interactional feedback and instructional counterbalance.
Studies in Second Language Acquisition, 28, 269–300.
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake. Studies in
Second Language Acquisition, 19, 37–66.
*Macheak, T. (2002). Learner vs. instructor correction in adult second language
acquisition: Effects of oral feedback type on the learning of French grammar.
Unpublished doctoral dissertation. Purdue University, West Lafayette, IN.
Mackey, A. (Ed.) (2007). Conversational interaction in SLA: A collection of empirical
studies. New York: Oxford University Press.
Mackey, A., Gass, S. M., & McDonough, K. (2000). How do learners perceive
interactional feedback? Studies in Second Language Acquisition, 22,
471–497.
Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and
research synthesis. In A. Mackey (Ed.), Conversational interaction in SLA: A
collection of empirical studies (pp. 408–452). New York: Oxford University Press.
*Mackey, A., & Oliver, R. (2002). Interactional feedback and children’s L2
development. System, 30, 459–477.
Mackey, A., Oliver, R., & Leeman, J. (2003). Interactional input and the incorporation
of feedback: An exploration of NS-NNS and NNS-NNS adult and child dyads.
Language Learning, 53, 35–66.
Mackey, A., Oliver, R., & Philp, J. (1997). Patterns of interaction in NNS-NNS
conversation. Paper presented at Second Language Research Forum, East Lansing,
MI.
*Mackey, A., & Philp, J. (1998). Conversational interaction and second language
development: Recasts, responses, and red herrings? Modern Language Journal, 82,
338–356.
*McDonough, K. (2005). Identifying the impact of negative feedback and learners’
response on ESL question development. Studies in Second Language Acquisition,
27, 79–103.
*McDonough, K. (2007). Interactional feedback and the emergence of simple past
activity verbs in L2 English. In A. Mackey (Ed.), Conversational interaction in
second language acquisition (pp. 323–338). New York: Oxford University Press.
Nabei, T., & Swain, M. (2002). Learner awareness of recasts in classroom interaction:
A case study of an adult EFL student’s second language learning. Language
Awareness, 11, 43–63.
Nagata, N. (1993). Intelligent computer feedback for second language instruction.
Modern Language Journal, 77, 330–339.
Nicholas, H., Lightbown, P., & Spada, N. (2001). Recasts as feedback to language
learners. Language Learning, 51, 719–758.
Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis
and quantitative meta-analysis. Language Learning, 50, 417–528.
Norris, J., & Ortega, L. (Eds.) (2006). Synthesizing research on language learning and
teaching. Amsterdam: Benjamins.
Oliver, R. (2000). Age differences in negotiation and feedback in classroom and
pairwork. Language Learning, 50, 119–151.
Oliver, R., & Mackey, A. (2003). Interactional context and feedback in child ESL
classrooms. Modern Language Journal, 87, 519–533.
*O’Relly, L. (1999). The effect of focused versus unfocused communication tasks on
the development of linguistic competence during negotiated interaction.
Unpublished doctoral dissertation. University of South Florida, Tampa.
Paniagua, D. (1985). A study of overt versus covert error correction in foreign language
teaching. Unpublished doctoral dissertation. University of Texas at Austin.
Panova, I., & Lyster, R. (2002). Patterns of corrective feedback and uptake in an adult
ESL classroom. TESOL Quarterly, 36, 573–595.
Patall, E., Cooper, H., & Robinson, J. (2008). The effects of choice on intrinsic
motivation and related outcomes: A meta-analysis of research findings.
Psychological Bulletin, 134, 270–300.
Philp, J. (2003). Constraints on “noticing the gap”: Nonnative speakers’ noticing of
recasts in NS-NNS interaction. Studies in Second Language Acquisition, 25,
99–126.
Pica, T. (1988). Interlanguage adjustments as an outcome of NS-NNS negotiated
interaction. Language Learning, 38, 45–73.
Raudenbush, S. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.),
The handbook of research synthesis (pp. 301–322). New York: Russell Sage
Foundation.
*Révész, A. (2007). Focus on form in task-based language teaching: Recasts, task
complexity, and L2 learning. Unpublished doctoral dissertation. Teachers College,
Columbia University, New York.
Richardson, K. M., & Rothstein, H. R. (2008). Effects of occupational stress
management intervention programs: A meta-analysis. Journal of Occupational
Health Psychology, 13, 69–93.
*Roig-Torres, T. (1992). Error correction in the natural approach classroom: A
contrastive study. Unpublished doctoral dissertation. University of Pittsburgh,
Pennsylvania.
Rosenthal, R. (1984). Meta-analytic procedures for social research. Beverly Hills, CA:
Sage.
Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for second
language acquisition: A meta-analysis of the research. In J. Norris & L. Ortega
(Eds.), Synthesizing research on language learning and teaching (pp. 131–164).
Amsterdam: Benjamins.
Sachs, R., & Suh, B. (2007). Textually enhanced recasts, learner awareness, and L2
outcomes in synchronous computer-mediated interaction. In A. Mackey (Ed.),
Conversational interaction in second language acquisition (pp. 324–338). New
York: Oxford University Press.
*Sagarra, N. (2007). From CALL to face-to-face interaction: The effect of
computer-delivered recasts and working memory on L2 development. In A. Mackey
(Ed.), Conversational interaction in second language acquisition (pp. 229–248).
New York: Oxford University Press.
*Sauro, S. (2007). A comparative study of recasts and metalinguistic feedback through
computer mediated communication on the development of L2 knowledge and
production accuracy. Unpublished doctoral dissertation. University of
Pennsylvania, Philadelphia.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11, 129–158.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language
instruction (pp. 3–32). Cambridge: Cambridge University Press.
Schwartz, B. (1993). On explicit and negative data effecting and affecting competence
and linguistic behavior. Studies in Second Language Acquisition, 15, 147–163.
Shadish, W. (1992). Do family and marital psychotherapies change what people do? A
meta-analysis of behavioral outcomes. In Cook et al. (Eds.), Meta-analysis for
explanation: A casebook (pp. 129–208). New York: Russell Sage Foundation.
*Sheen, Y. (2007). The effects of corrective feedback, language aptitude, and learner
attitudes on the acquisition of English articles. In A. Mackey (Ed.), Conversational
interaction in second language acquisition (pp. 301–322). New York: Oxford
University Press.
Sheen, Y. H. (2004). Corrective feedback and learner uptake in communicative
classrooms across instructional settings. Language Teaching Research, 8,
263–300.
Appendix A (a)

Primary studies | N (b) | Feedback type | Implicit/explicit (c) | Timing of posttests | Effect size | Standard error | 95% CI lower | 95% CI upper
— | — | Recasts | I | 1 | 1.044 | 0.337 | 0.384 | 1.705
 | | | | 2 | 0.412 | 0.320 | −0.215 | 1.038
Carroll, Swain, & Roberge (1992) | 60, 19 | Explicit correction | E | 1 | 0.871 | 0.333 | 0.218 | 1.524
 | | | | 2 | 0.593 | 0.325 | −0.045 | 1.230
Chen (1996) | 9, 9 | Explicit rejection | E | 1 | 0.012 | 0.461 | −0.892 | 0.916
 | | | | 3 | 0.096 | 0.461 | −0.808 | 1.000
 | 10 | Metalinguistic (e) | E | 1 | 4.873 | 0.921 | 3.068 | 6.677
 | | | | 3 | 1.280 | 0.504 | 0.291 | 2.269
DeKeyser (1993) | 19, 16 | Explicit | E | 1 | 0.124 | 0.115 | −0.542 | 0.789
Ellis (2007) | 12, 12 | Metalinguistic | E | 1 | 0.552 | 0.453 | −0.335 | 1.440
 | | | | 2 | 0.890 | 0.467 | −0.025 | 1.805
 | 10 | Recasts | I | 1 | 0.633 | 0.459 | −0.267 | 1.532
 | | | | 2 | 0.196 | 0.433 | −0.652 | 1.045
Ellis et al. (2006) | 12, 12 | Metalinguistic | E | 1 | −0.005 | 0.429 | −0.846 | 0.836
 | | | | 2 | 0.290 | 0.440 | −0.573 | 1.152
 | 10 | Recasts | I | 1 | −0.422 | 0.437 | −1.280 | 0.435
 | | | | 2 | −0.400 | 0.451 | −1.283 | 0.483
Han (2002) | 4, 4 | Recasts (f) | I | 1 | 2.285 | 0.915 | 0.492 | 4.078
 | | | | 3 | 1.171 | 0.765 | −0.328 | 2.671
Herron & Tomasello (1988) | 16, 16 | Clarification & metalinguistic | E | 1 | 0.750 | 0.365 | 0.033 | 1.467
Herron (1991) | 13, 12 | Explicit correction | E | 1 | 1.605 | 0.460 | 0.703 | 2.510
 | | | | 2 | 1.669 | 0.465 | 0.758 | 2.580
Hino (2006) | 9, 9 | Clarification | I | 1 | 0.201 | 0.461 | −0.702 | 1.103
 | | | | 2 | 0.153 | 0.460 | −0.749 | 1.055
 | 10 | Metalinguistic | E | 1 | 1.364 | 0.510 | 0.365 | 2.364
Loewen & Erlam (2006) | 11, 12 | Metalinguistic | E | 1 | 0.025 | 0.457 | −0.870 | 0.920
 | | | | 2 | −0.585 | 0.466 | −1.498 | 0.328
 | 8 | Recasts | I | 1 | 0.041 | 0.473 | −0.887 | 0.968
 | | | | 2 | −0.400 | 0.451 | −1.283 | 0.483
Loewen & Nabei (2007) | 10 | Clarification | I | 1 | 0.243 | 0.401 | −0.543 | 1.029
 | 8 | Metalinguistic | E | 1 | 0.358 | 0.421 | −0.468 | 1.184
 | 7 | Recasts | I | 1 | 0.471 | 0.368 | −0.250 | 1.192
 | 31 | Combined | I | 1 | 0.356 | 0.385 | −0.397 | 1.111
Long, Inagaki, & Ortega (1998) | 7, 8 | Recasts | I | 1 | 0.561 | 0.527 | −0.472 | 1.595
Lyster (2004) | 45, 56 | Prompts | NA | 1 | 1.030 | 0.315 | 0.417 | 1.652
 | | | | 3 | 0.942 | 0.312 | 0.330 | 1.553
 | 47 | Recasts | I | 1 | 0.506 | 0.308 | −0.098 | 1.110
 | | | | 3 | 0.485 | 0.308 | −0.119 | 1.089
McDonough (2007) | 27 | Clarification | I | 3 | 1.422 | 0.328 | 0.779 | 2.065
 | 21 | Recasts | I | 3 | 0.824 | 0.286 | 0.263 | 1.385
 | 26 | Combined | I | 3 | 1.122 | 0.307 | 0.520 | 1.726
McDonough (2005) | 15 | Clarification | I | 3 | 0.649 | 0.516 | −0.362 | 1.662
 | 15 | Repetition | I | 3 | 1.255 | 0.509 | 0.256 | 2.254
 | 15 | Repetition & clarification | I | 3 | 0.000 | 0.592 | −1.161 | 1.161
 | 15 | Combined | I | 3 | 0.476 | 0.554 | −0.610 | 1.562
Macheak (2002) | 11 | Elicitation | I | 1 | −0.205 | 0.431 | −1.050 | 0.641
 | 13 | | | 3 | 0.214 | 0.431 | −0.631 | 1.058
 | 11 | Recasts | I | 1 | −0.586 | 0.422 | −1.414 | 0.242
 | | | | 3 | −0.160 | 0.412 | −0.967 | 0.646
 | | Combined | I | 1 | −0.372 | 0.426 | −1.207 | 0.463
 | | | | 3 | −0.018 | 0.418 | −0.839 | 0.802
Mackey & Oliver (2002) | 22 | Clarification & recasts | I | 1 | 1.082 | 0.528 | 0.047 | 2.116
Sauro (2007) | 7, 8 | Metalinguistic | E | 1 | 0.716 | 0.517 | −0.296 | 1.729
 | | | | 2 | 0.947 | 0.528 | −0.088 | 1.981
 | 8 | Recasts | I | 1 | 0.554 | 0.529 | −0.482 | 1.590
 | | | | 2 | 0.842 | 0.541 | −0.219 | 1.903
Sheen (2007) | 26, 26 | Metalinguistic | E | 1 | 0.470 | 0.276 | −0.071 | 1.011
 | | | | 3 | 0.611 | 0.279 | 0.065 | 1.157
 | 28 | Recasts | I | 1 | 0.161 | 0.273 | −0.374 | 0.695
 | | | | 3 | 0.295 | 0.274 | −0.242 | 0.831
Takashima (1995) | 27, 34 | Clarification | I | 1 | 0.168 | 0.258 | −0.338 | 0.674
 | | | | 2 | 0.289 | 0.259 | −0.219 | 0.797
 | | | | 3 | 0.290 | 0.259 | −0.218 | 0.799
Tomasello & Herron (1989) | 16, 16 | Metalinguistic | E | 1 | 1.000 | 0.375 | 0.269 | 1.739
 | | | | 2 | 0.767 | 0.366 | 0.049 | 1.485

a. The displayed effect sizes are the ones that were used in the analysis. In the case of multiple effect sizes for a feedback type or target structure, they were averaged.
b. This column lists the number of participants in the experimental and control groups of each study; the last number of each cell relates to the control group. These numbers refer to the participants involved in the feedback groups that contribute effect sizes to this meta-analysis, so they may not correspond to the total number of participants of each study.
c. I = implicit feedback; E = explicit feedback; NA = not applicable.
d. In cases in which multiple implicit or explicit feedback types were examined in a primary study, they are combined to generate an average value that is used in the “explicit versus implicit” analysis. That value is not necessarily the average of all the effect sizes of a primary study.
e–h. Outliers; e and f were not included in the analysis of the most frequent feedback types as reported in primary studies; g was also not included in analyzing the overall immediate effects of feedback; h was excluded from all analyses.
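For readers reconstructing entries of this kind, the three rightmost columns are related in the standard way: Cohen's d is the mean difference divided by the pooled standard deviation, its standard error follows a common large-sample approximation, and the confidence intervals in the table are consistent with d ± 1.96 × SE. A minimal sketch with illustrative inputs (not taken from any primary study):

```python
# A minimal sketch of Cohen's d, a conventional large-sample standard
# error, and the 95% CI as d +/- 1.96*SE; all input values are made up.
import math

def cohens_d_with_ci(m_e, m_c, sd_e, sd_c, n_e, n_c):
    """Return (d, se, ci_lower, ci_upper) for two independent groups."""
    sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    d = (m_e - m_c) / sd_pooled
    se = math.sqrt((n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c)))
    return d, se, d - 1.96 * se, d + 1.96 * se

d, se, lo, hi = cohens_d_with_ci(m_e=7.9, m_c=6.1, sd_e=2.0, sd_c=2.1,
                                 n_e=20, n_c=19)
print(f"d = {d:.3f}, SE = {se:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```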
Appendix B

Primary studies | Age | Context | L2 | Mode | Setting | Interlocutor | Outcome measure | Publication type | Treatment length (min) | Task type
Ammar & Spada (2006) | 12 | FL | English | FF | Class | Teacher | FCR | Published | >120 | Com
Ayoun (2001) | 21 | FL | French | Com | Lab | Computer | CCR | Published | >120 | Drill
Bationo (1991) | 22 | FL | French | Com | Lab | Computer | CCR | Dissertation | 60–120 | Drill
Bell-Corrales (2001) | 21 | FL | Spanish | FF | Class | Teacher | CCR | Dissertation | 60–120 | Com
Carroll & Swain (1993) | NR | SL | English | FF | Lab | NS | CCR | Published | NR | Drill
Carroll et al. (1992) | 21 | SL | French | FF | Lab | NS | CCR | Published | NR | Drill
Chen (1996) | 21 | FL | Chinese | Com | Lab | Computer | CCR | Dissertation | <50 | Drill
DeKeyser (1993) | NR | FL | French | FF | Class | Teacher | CCR/RCR | Published | >120 | Com

h. Com = computer; Var = (A) variety.