James Dean Brown

THE WASHBACK EFFECT OF LANGUAGE TESTS
JAMES DEAN BROWN
University of Hawai'i
Introduction
Numerous definitions of the concept of washback have been offered in the language teaching literature. For instance, Shohamy, Donitsa-Schmidt, and Ferman (1996, p. 298) define washback very simply as "the connections between testing and learning." Gates (1995, p. 101) defines it as "the influence of testing on teaching and learning." Shohamy (1992, p. 513) went further when she defined washback as "the utilization of external language tests to affect and drive foreign language learning...this phenomenon is the result of the strong authority of external testing and the major impact it has on the lives of test takers." Messick (1996, p. 241) provided an even more elaborate definition when he wrote, "washback, a concept prominent in applied linguistics, refers to the extent to which the introduction and use of a test influences language teachers and learners to do things they would not otherwise do that promote or inhibit language learning."
One source of confusion about washback is that several different terms are used to refer to the connections between testing and learning. In the general educational literature the concept is referred to as backwash. Elsewhere the concept has been referred to as test impact, measurement-driven instruction, curriculum alignment, and test feedback. More humorously, at a meeting at Educational Testing Service, Dan Douglas once referred to the concept as the bogwash effect, a variant that I hope will endure.
A number of authors have linked washback to test validity. As Alderson and Wall (1993a, p. 116) point out, "some writers have even gone so far as to suggest that a test's validity should be measured by the degree to which it has had a beneficial influence on teaching." Messick (1996, p. 241) discusses "the concept of washback as an instance of the consequential aspect of construct validity, linking positive washback to so-called authentic and direct assessments and, more basically, to the need to minimize construct underrepresentation and construct-irrelevant difficulty in the test." Morrow (1986) refers to something he calls washback validity, the relationship between a test and the related curriculum. Frederiksen and Collins (1989) discuss a similar concept but refer to it as systemic validity. Weir (1990, p. 27) suggests that communicative language testing could have a strong washback effect on communicative language teaching and, in fact, that such a washback effect would be directly linked to the construct validity of the tests.
University of Hawai'i Working Papers in ESL, Vol. 16, No. 1, Fall 1997, pp. 27-45.
Because the first definition at the top of this paper (from Shohamy et al, 1996) is both adequate and parsimonious, it is very attractive. However, that definition does not explicitly include the link between washback and validity. Hence the working definition of washback that I will use in this paper is a slightly expanded version of the one provided by Shohamy et al (1996): the connections between language testing and learning, and the consequences of those connections.
I will continue to explore the concept of backwash by addressing a number of questions: Does washback exist? What factors affect the impact of washback? What are the negative aspects of washback? How can we promote positive washback? What directions might future research on the washback effect take?
Does the Washback Effect Exist?

Quite reasonably, one of the main points of the article by Alderson and Wall (1993a) is that insufficient evidence was found in the literature at that time for the existence of washback. Indeed, their title asked that question directly: Does washback exist? They point to a great deal of literature that makes assertions about washback, but little actual empirical research into the existence and nature of washback. Similarly, Watanabe (1996b, p. 208) reports having found over 500 assertions about the impact of university entrance examinations in Japan, but only 10 empirical studies.
Alderson and Wall (1993a) point to four studies that empirically addressed the issue of washback in the past. Westdorp (1982) in the Netherlands investigated "the validity of objections to the introduction of multiple-choice tests into the assessment of mother tongue and foreign language education" and found that complaints based on assumed washback effects were not justified. Hughes (1988) in Turkey claimed a positive washback effect for an English proficiency test screening students for an English-medium university in Istanbul. Khaniya (1990) created a beneficial washback effect in Nepal and studied it. Alderson and Wall (1993a) note that the Westdorp (1982) report shows very little washback effect, while the other two studies are incomplete or inadequate in some way and lack investigations into "what actually changed in class" (p. 126). Another study they cite is Wall and Alderson (1993), which investigated the classroom impact of changing English examinations in Sri Lanka. That study found very little classroom impact. They suggest that this outcome may have resulted from teachers not understanding what is appropriate for preparing students for examinations. They also suggest that the examination "does not determine how teachers teach, however much it might influence what they teach" (Alderson & Wall, 1993a, p. 127).
More recently, Alderson and Hamp-Lyons (1996) conducted a study of TOEFL preparation classes in the United States. They found that such classes are substantially different from non-TOEFL classes. TOEFL classes had more test-taking, more teacher talking time, less turn taking, less time spent on pair work, more references to TOEFL, more metalanguage, more routinized talk, and less laughter. They found that the picture was more complex when the teachers' TOEFL and non-TOEFL classes were analyzed separately and that the ideas expressed about washback in the literature are too simplistic.
Shohamy, Donitsa-Schmidt, and Ferman (1996) investigated washback in Israel by examining the effects of an Arabic as a second language (ASL) examination for grades 7-9 and an EFL oral test for grade 12. They found the following effects for the ASL (p. 301):
1. Teachers stopped teaching new material and turned to reviewing material
2. Teachers replaced class textbooks with worksheets that were identical to previous years' tests
3. The activities were all "testlike"
4. Review sessions were added to regular class hours
5. The atmosphere in the class was tense
6. Teachers and students were highly motivated to master the material
7. When the test was over the above activities stopped
They found quite different effects for the EFL oral test (p. 301):
1. Experienced teachers spent more time on teaching oral language
2. Experienced teachers used only activities identical to the ones on the test
3. Novice teachers tried out additional oral language activities
Watanabe (1992) examined washback in Japan and found that, if it exists, it broadens the range of strategies that students will use, an effect that persists one year later. Later, Watanabe (1996a) investigated washback in the classrooms of two teachers at yobiko (private schools that prepare students for university entrance examinations) in Japan. His goal was to study the use of exam-induced translation in class. He concluded that translation on exams affects some teachers and not others, depending on personal beliefs, educational background, and past learning experiences. He also felt that claims of washback may be exaggerated and somewhat inconsistent. Watanabe (1996b) cited a number of other related papers, Ariyoshi and Senba (1983), Fujita (1992), and Saito et al (1984), with bearing on the issue of washback. Unfortunately, I have been unable to obtain those papers at the time of this writing.
Wall (1996) spent four years in a Sri Lankan EFL project evaluating a new national examination. Similar to what was reported in Wall and Alderson (1993), Wall (1996) reports:
The main findings...were that the examination had had considerable impact on the content of English lessons and on the way teachers design their classroom tests (some of this was positive and some negative), but it had had little to no impact on the methodology they used in the classroom or on the way they marked their pupils' test performance.
Wall found the language testing advice of Hughes (1989), Shohamy (1992), and Bailey (1996) and the general education recommendations of Heyneman and Ransom (1990) and Kellaghan and Greaney (1992) helpful in improving curriculum (pp. 350-351).
All in all, the empirical studies to date indicate that the washback effect does exist in various forms in various places, but also that the issue is not a straightforward one that conforms neatly to the popular notions of the effects of examinations on language learning.
What Factors Affect the Impact of Washback?

Given the complexity of the issues involved in washback, there must be a number of factors that impact on it. Gates (1995, p. 101) outlines two ways in which washback may vary: it can range from positive to negative washback and from strong to weak washback. The question raised in this section is this: what factors cause tests to have positive or negative, weak or strong washback effects? Gates suggests (pp. 102-103) that the following factors affect the impact of washback:
1. Prestige
2. Accuracy
3. Transparency
4. Utility
5. Monopoly
6. Anxiety
7. Practicality
Alderson and Hamp-Lyons (1996, p. 296) argue that the amount and type of washback depend on the extent to which:
1. The test has status (and level of stakes)
2. The test is counter to current teaching practices
3. Teachers and textbook writers think about appropriate methods of test preparation
4. Teachers and textbook writers are willing and able to innovate
Shohamy et al (1996, pp. 299-300, 314-315) say that the degree of impact is influenced by:
1. Status of the subject-matter (language)
2. Low vs. high stakes
3. Nature of the test (purpose)
4. Format of the test (more anxiety from oral vs. written test; novel vs. familiar format; etc.)
5. Use to which the scores will be put
6. Skills being tested: "It may very well be that in multiskilled tests, test-takers may feel more confident since they can compensate for a lack of proficiency in one skill by high proficiency in the other"
Shohamy et al (1996, p. 303) also suggest ways of judging the impact of tests on curriculum:
1. Classroom activities and time allotment
2. The extent to which the test has generated new teaching materials
3. The degree to which students and parents are aware of the existence and content of the test
4. Perceived effects of test results
5. The extent to which the test has changed the prestige and position of the areas tested
6. Perception of test quality and importance
7. Impact of test on promoting learning
8. How the various language inspectors view the role, status, and impact of the test
Table 1 summarizes the factors that affect the impact of washback. Surprisingly, very little overlap exists between the lists provided by the authors cited above. Only two of the 24 items listed in the table were mentioned in two articles. Notice, in Table 1, that I have organized the factors into four categories: prestige factors, test factors, people factors, and curriculum factors.

Table 1
Factors That Affect the Impact of Washback
Prestige Factors
1. Prestige or status of the test (Gates, 1995; Alderson & Hamp-Lyons, 1996)
2. Status of the subject-matter of the test (Shohamy et al, 1996)
3. Perception of quality and importance of the test (Shohamy et al, 1996)
4. Degree to which a test has a monopoly on assessment (Gates, 1995)
5. Level of stakes (Alderson & Hamp-Lyons, 1996; Shohamy et al, 1996)
6. Use to which the scores will be put (Shohamy et al, 1996)
Test Factors
1. Nature of the test (purpose) (Shohamy et al, 1996)
2. Format of the test (Shohamy et al, 1996)
3. Skills being tested (Shohamy et al, 1996)
4. Accuracy of the scores (Gates, 1995)
5. Utility of the test and its results (Gates, 1995)
6. Practicality of the test (Gates, 1995)
People Factors
1. The degree to which students and parents are aware of the existence and content of the test (Shohamy et al, 1996)
2. How students and parents perceive effects of test results (Shohamy et al, 1996)
3. Student anxiety (Gates, 1995)
4. How the various language inspectors view the role, status, and impact of the test (Shohamy et al, 1996)
5. The extent to which the test has changed the prestige and position of the areas tested (Shohamy et al, 1996)
6. Transparency of the information provided by the test to teachers, students, parents, etc. (Gates, 1995)
Curriculum Factors
1. Match of test to current teaching practices (Alderson & Hamp-Lyons, 1996)
2. Effect of test on promoting learning (Shohamy et al, 1996)
3. Types of classroom activities and time allotted for each (Shohamy et al, 1996)
4. The extent to which the test has generated new teaching materials (Shohamy et al, 1996)
5. The extent to which teachers and textbook writers think about appropriate methods of test preparation (Alderson & Hamp-Lyons, 1996)
6. The extent to which teachers and textbook writers are willing and able to innovate (Alderson & Hamp-Lyons, 1996)

What Are the Negative Aspects of Washback?

Many language educators believe that tests have negative washback effects on the learning and teaching of languages. These beliefs are often based on assumptions like those pointed out by Watanabe as underlying the washback effects of university entrance examinations in Japan (1996a, p. 319):
1. A substantial number of the questions in university entrance examinations require grammar and translation skills (GT)
2. This is why so much GT is employed in the classroom
3. Unless the content of these exams changes, nothing will change in the teaching of EFL
4. Corollary: "...if the exam begins to use other types of test questions, then teachers will use methods other than GT." (p. 319)
Such thinking may often underlie beliefs in the negative washback effects of tests. But more generally, what are the negative washback effects that language educators perceive? Alderson and Hamp-Lyons (1996), in their study of TOEFL preparation, consider the following to be negative results of washback (p. 280):
1. Unnatural teaching
2. Students being taught inappropriate language-learning and language-using strategies
3. Students being taught "TOEFLese"
4. Courses that raise TOEFL scores without providing students with the English they will need in language interaction or in the college or university courses they are entering
5. Students taking TOEFL courses instead of "real" English courses
They also found four main themes in the washback literature (p. 291), which could be construed as negative washback effects:
1. Narrowing the curriculum
2. Lost instructional time
3. Reduced emphasis on skills that require complex thinking or problem-solving
4. Test score 'pollution', or increases in test scores without an accompanying rise in ability in the construct being tested
Bailey (1996, pp. 264-265) discusses ten choices that students might make due to washback (those with an asterisk might be construed as negative choices):
1. *Practicing items similar in format to those on the test
2. *Studying vocabulary and grammar rules
3. Participating in interactive language practice (e.g., target language conversations)
4. Reading widely in the target language
5. Listening to non-interactive language (radio, television, etc.)
6. *Applying test-taking strategies
7. *Enrolling in test-preparation courses
8. Requesting guidance in their studying and feedback on their performance
9. *Enrolling in, requesting, or demanding additional (unscheduled) test-preparation classes or tutorials (in addition to or in lieu of other language classes)
10. *Skipping language classes to study for the test
Shohamy et al (1996), in investigating the washback effect in Israel, found the following negative effects for a national Arabic as a second language test for grades 7-9 before it was administered (p. 301):
1. Teachers stopped teaching new material and turned to reviewing material
2. Teachers replaced class textbooks with worksheets that were identical to previous years' tests
3. The activities were all "testlike"
4. Review sessions were added to regular class hours
5. The atmosphere in the class was tense
However, according to the authors, once the test had been administered, the above activities stopped.
Table 2 summarizes the negative effects of washback. Again, surprisingly, very little overlap exists between the lists given above. Only two of the 16 factors listed in the table were mentioned in two articles. Notice, in Table 2, that I have organized the factors affecting washback into four categories: teaching factors, course content factors, course characteristic factors, and class time factors.

Table 2
Negative Aspects of Washback
Teaching Factors
1. Teachers narrow the curriculum (Alderson & Hamp-Lyons, 1996)
2. Teachers stop teaching new material and turn to reviewing material (Shohamy et al, 1996)
3. Teachers replace class textbooks with worksheets identical to previous years' tests (Shohamy et al, 1996)
4. Unnatural teaching (Alderson & Hamp-Lyons, 1996)
Course Content Factors
1. Students being taught "examination-ese" (Alderson & Hamp-Lyons, 1996)
2. Students practicing "testlike" items similar in format to those on the test (Bailey, 1996; Shohamy et al, 1996)
3. Students applying test-taking strategies in class (Bailey, 1996)
4. Students studying vocabulary and grammar rules [to the exclusion of other aspects of language] (Bailey, 1996)
Course Characteristic Factors
1. Students being taught inappropriate language-learning and language-using strategies (Alderson & Hamp-Lyons, 1996)
2. Reduced emphasis on skills that require complex thinking or problem-solving (Alderson & Hamp-Lyons, 1996)
3. Courses that raise examination scores without providing students with the English they will need in language interaction or in the college or university courses they are entering; also called test score 'pollution' (Alderson & Hamp-Lyons, 1996)
4. The tense atmosphere in the class (Shohamy et al, 1996)
Class Time Factors
1. Enrolling in, requesting, or demanding additional (unscheduled) test-preparation classes or tutorials (in addition to or in lieu of other language classes) (Alderson & Hamp-Lyons, 1996; Bailey, 1996)
2. Review sessions added to regular class hours (Shohamy et al, 1996)
3. Skipping language classes to study for the test (Bailey, 1996)
4. Lost instructional time (Alderson & Hamp-Lyons, 1996)

How Can We Promote Positive Washback?

As pointed out above, Shohamy et al (1996), in their investigation of the washback effect of an EFL oral test in Israel, found the following positive washback effects (p. 301):
1. Experienced teachers spent more time on teaching oral language
2. Experienced teachers used only activities identical to the ones on the test
3. Novice teachers tried out additional oral language activities
Clearly, tests can and sometimes do produce positive washback effects. Gates (1995, p. 101) suggests that "teachers might reasonably want to determine the type of washback that flows from a given test." Teachers should want to know this so that they can limit negative washback effects and promote positive washback. A number of authors have addressed this latter issue.
Hughes (1989) provides an entire chapter discussing the following ways of promoting beneficial backwash:
1. Test the abilities whose development you want to encourage
2. Sample widely and unpredictably
3. Use direct testing
4. Make testing criterion-referenced
5. Base achievement tests on objectives
6. Ensure the test is known and understood by students and teachers
7. Where necessary, provide assistance to teachers
Heyneman and Ransom (1990, p. 112) suggest three strategies for improving test content so as to create more positive washback effects:
1. Use more open-ended items (as opposed to selected-response items like multiple-choice)
2. Test higher-level cognitive skills
3. Authorities should provide feedback to teachers and others (teacher trainers, curriculum developers, inspectors, education officers, and head teachers) so meaningful change can be effected
Kellaghan and Greaney (1992, p. 3), in reviewing World Bank research in 14 countries in Africa, suggest the following to lessen negative washback on classroom teaching (as summarized in Wall, 1996, p. 337):
1. Examinations should reflect the full curriculum, not merely a limited aspect of it.
2. Higher-order cognitive skills should be assessed to ensure they are taught.
3. Skills to be tested should not be limited to academic areas; they should also relate to out-of-school tasks.
4. A variety of examination formats should be used, including written, oral, aural, and practical.
5. In evaluating published examination results and national rankings, account should be taken of factors other than teaching effort.
6. Detailed, timely feedback should be provided to schools on levels of pupils' performance and areas of difficulty in public examinations.
7. Predictive validity studies of public examinations should be conducted. (This is to see whether selected exams are fulfilling their purpose.)
8. The professional competence of examination authorities needs improvement, especially in test design.
9. Each examination board should have a research capacity. (This is to investigate, among other things, the impact of examinations on teaching.)
10. Examination authorities should work closely with curriculum organizations and with educational administrators.
11. Regional professional networks should be developed to initiate exchange programmes and to share common interests and concerns.
Bailey (1996, pp. 268-269) suggests that we could promote beneficial washback by incorporating the following into our tests:
1. Language learning goals
2. Authenticity
3. Learner autonomy and self-assessment
4. Detailed score reporting
She (p. 275) also lists other criteria likely to promote beneficial washback:
1. Test-takers, teachers, administrators, and curriculum designers should understand the purpose of the test
2. Results must be believable to test takers and score users
3. Test takers must find the results credible and fair
4. The test should measure what the programs intend to teach
Bailey concludes (p. 275) that a test will promote beneficial washback to the extent that:
1. It is based on sound theoretical principles
2. It uses authentic tasks and texts
3. Test takers buy into the assessment process
Wall (1996), in reviewing the literature, lists the desirable characteristics in language testing as being the following (p. 334):
1. Direct testing
2. Criterion-referencing
3. Authentic texts
4. Tasks
Drawing on other authors, Wall further suggests improving the washback effect by doing the following (pp. 334-335):
1. Teachers and students should understand the tests for which they are preparing (Hughes, 1989)
2. Teachers should receive help so they understand the tests (Hughes, 1989)
3. Schools should receive feedback from testers (Shohamy, 1992)
4. Teachers and administrators should be involved in different phases of the testing process because they are the people who will have to make changes (Shohamy, 1992)
Table 3 summarizes the ways suggested in the literature to promote positive washback effects. Again, surprisingly little overlap exists between the lists given above. Only five of the 28 items listed in the table were mentioned in two articles. Notice, in Table 3, that I have organized the factors that promote positive washback into four categories: test design factors, test content factors, logistic factors, and interpretation and analysis factors.
Table 3
Promoting Beneficial Backwash
Test Design Factors
1. Sample widely and unpredictably (Hughes, 1989)
2. Design tests to be criterion-referenced (Hughes, 1989; Wall, 1996)
3. Design the test to measure what the programs intend to teach (Bailey, 1996)
4. Base the test on sound theoretical principles (Bailey, 1996)
5. Base achievement tests on objectives (Hughes, 1989)
6. Use direct testing (Hughes, 1989; Wall, 1996)
7. Foster learner autonomy and self-assessment (Bailey, 1996)
Test Content Factors
1. Test the abilities whose development you want to encourage (Hughes, 1989)
2. Use more open-ended items (not selected-response items like multiple-choice) (Heyneman & Ransom, 1990)
3. Make examinations reflect the full curriculum, not a limited part (Kellaghan & Greaney, 1992)
4. Assess higher-order cognitive skills to ensure they are taught (Heyneman & Ransom, 1990; Kellaghan & Greaney, 1992)
5. Use a variety of examination formats, including written, oral, aural, and practical (Kellaghan & Greaney, 1992)
6. Do not limit skills to be tested to academic areas (they should also relate to out-of-school tasks) (Kellaghan & Greaney, 1992)
7. Use authentic tasks and texts (Bailey, 1996; Wall, 1996)
Logistic Factors
1. Ensure that test-takers, teachers, administrators, and curriculum designers understand the purpose of the test (Bailey, 1996; Hughes, 1989)
2. Make sure language learning goals are clear (Bailey, 1996)
3. Where necessary, provide assistance to teachers to help them understand the tests (Hughes, 1989)
4. Provide feedback to teachers and others so meaningful change can be effected (Heyneman & Ransom, 1990; Shohamy, 1992)
5. Provide detailed and timely feedback to schools on levels of pupils' performance and areas of difficulty in public examinations (Kellaghan & Greaney, 1992)
6. Make sure teachers and administrators are involved in different phases of the testing process because they are the people who will have to make changes (Shohamy, 1992)
7. Provide detailed score reporting (Bailey, 1996)
Interpretation and Analysis Factors
1. Make sure results are believable, credible, and fair to test takers and score users (Bailey, 1996)
2. Consider factors other than teaching effort in evaluating published examination results and national rankings (Kellaghan & Greaney, 1992)
3. Conduct predictive validity studies of public examinations (This is to see whether selected exams are fulfilling their purpose) (Kellaghan & Greaney, 1992)
4. Improve the professional competence of examination authorities, especially in test design (Kellaghan & Greaney, 1992)
5. Ensure that each examination board has a research capacity (In order to investigate, among other things, the impact of examinations on teaching) (Kellaghan & Greaney, 1992)
6. Have examination authorities work closely with curriculum organizations and with educational administrators (Kellaghan & Greaney, 1992)
7. Develop regional professional networks to initiate exchange programs and to share common interests and concerns (Kellaghan & Greaney, 1992)
What Directions Might Future Research on Washback Effect Take?

Earlier, I showed how the literature supports the existence of washback effects in various situations, but also reveals that the issues involved are far from simple. Many authors have listed the factors they think affect the impact of washback, factors they think are the negative aspects of washback, and factors they think promote positive washback. However, as I was summarizing these three aspects of the literature in Tables 1-3, I realized that very little overlap exists among the lists of different authors. In other words, little agreement was found about what factors affect washback, what the negative aspects are, and what we can do to promote positive washback. Clearly, much more research is needed in this important area of language testing, especially research that can clarify the above three issues.
Alderson and Wall (1993, pp. 120-121) suggest 15 hypotheses that should be investigated in this regard:
1. A test will influence teaching
2. A test will influence learning
3. A test will influence what teachers teach
4. A test will influence how teachers teach
5. A test will influence what learners learn
6. A test will influence how learners learn
7. A test will influence the rate and sequence of teaching
8. A test will influence the rate and sequence of learning
9. A test will influence the degree and depth of teaching
10. A test will influence the degree and depth of learning
11. A test will influence the attitudes to the content, method, etc. of teaching and learning
12. Tests that have important consequences will have washback
13. Tests that do not have important consequences will have no washback
14. Tests will have washback on all learners and teachers
15. Tests will have washback effects for some learners and teachers, but not for others
Watanabe (1996b) suggests five research questions:
1. Does washback exist?
2. What evidence enables us to say washback exists or does not exist?
3. If washback exists, what is its nature (i.e., positive or negative)?
4. If washback does not exist, why not?
5. If washback exists, under what conditions?
Bailey (1996, pp. 276-277) offers questions for research within a language program:
1. Do the participants understand the purpose(s) of the test and the intended use(s) of the results?
2. Are the results provided in a clear, informative and timely fashion?
3. Are the results perceived as believable and fair by the participants?
4. Does the test measure what the program intends to teach?
5. Is the test based on clearly articulated goals and objectives?
6. Is the test based on sound theoretical principles which have current credibility in the field?
7. Does the test utilize authentic texts and authentic tasks?
8. Are the participants invested in the assessment processes?
Shohamy et al (1996, p. 298) raised questions of their own:
1. Is introducing changes through tests effective?
2. Can the introduction of tests per se cause real improvement in learning and teaching?
3. How are test results used by teachers, students and administrators?
Table 4 summarizes some questions that future research on the washback effect might
profitably address. Again, surprisingly little overlap appears to exist between the lists
given by various authors. Only one ofthe 3l questions listed in the table were mentioned
in two articles. Notice, in Table 4, that I have organized the questions for future research
into three categories: general questions, detailed questions, and program related questions.
Conclusion
In this article, I reviewed several definitions ofwashback. Then, I set out to answer a
number ofquestions and found the following:
l. Does washback exist? The answer was a qualified yes. The literature supports the
notion that washback exists in various places in various ways. Clearly, the issue is
a complex one that warrants considerably more research.
2. What factors affect the impact ofwashback? I identified a total of24 factors in
the literature that have an impact on washback. These 24 included prestige factors,
test factors, people factors, and curriculum factors.
L ll'hat are the negative aspects ofwashback? | fovnd a total of 16 factors in the
literature that seem to be negative aspects ofwashback. These 16 included
teaching factors, course content factors, course characteristic factors, and class
time factors.
4. How can we promote positive washback? I unearthed a total of 28 factors in the
literature that promote positive washback. These 28 included test design factors,
test content factors, logistic factors, and interpretation and analysis factors.
5. What directions might future research on the washback effect take? I also compiled
a list of 31 general, detailed, and program-related research questions that I found in
the literature, questions that need to be answered.
Answers to the five questions addressed in this article are important if we are to
use tests responsibly for making important decisions about our students. As Watanabe
(1996) put it, "a large amount of time, money and energy is spent on entrance exams
every year at individual, school and national levels. In order to make the best use of
such an investment, we need to be empirical, rational and well informed."

Table 4
Directions for Future Research on Washback

General Questions
1. Does washback exist? (Alderson & Wall, 1993a; Watanabe, 1996b)
2. What evidence enables us to say washback exists or does not exist? (Watanabe, 1996b)
3. If washback exists, what is its nature (i.e., positive or negative)? (Watanabe, 1996b)
4. If washback does not exist, why not? (Watanabe, 1996b)
5. If washback exists, under what conditions? (Watanabe, 1996b)
6. Is introducing changes through tests effective? (Shohamy et al., 1996)
7. Can the introduction of tests per se cause real improvement in learning and teaching? (Shohamy et al., 1996)
8. How are test results used by teachers, students and administrators? (Shohamy et al., 1996)

Detailed Questions
1. Will a test influence teaching? (Alderson & Wall, 1993a)
2. Will a test influence learning? (Alderson & Wall, 1993a)
3. Will a test influence what teachers teach? (Alderson & Wall, 1993a)
4. Will a test influence how teachers teach? (Alderson & Wall, 1993a)
5. Will a test influence what learners learn? (Alderson & Wall, 1993a)
6. Will a test influence how learners learn? (Alderson & Wall, 1993a)
7. Will a test influence the rate and sequence of teaching? (Alderson & Wall, 1993a)
8. Will a test influence the rate and sequence of learning? (Alderson & Wall, 1993a)
9. Will a test influence the degree and depth of teaching? (Alderson & Wall, 1993a)
10. Will a test influence the degree and depth of learning? (Alderson & Wall, 1993a)
11. Will a test influence the attitudes to the content, method, etc. of teaching and learning? (Alderson & Wall, 1993a)
12. Will tests that have important consequences have more washback? (Alderson & Wall, 1993a)
13. Will tests that do not have important consequences have no washback? (Alderson & Wall, 1993a)
14. Will tests have washback on all learners and teachers? (Alderson & Wall, 1993a)
15. Will tests have washback effects for some learners and teachers, but not for others? (Alderson & Wall, 1993a)

Program-Related Questions
1. Do the participants understand the purpose(s) of the test and the intended use(s) of the results? (Bailey, 1996)
2. Are the results provided in a clear, informative and timely fashion? (Bailey, 1996)
3. Are the results perceived as believable and fair by the participants? (Bailey, 1996)
4. Does the test measure what the program intends to teach? (Bailey, 1996)
5. Is the test based on clearly articulated goals and objectives? (Bailey, 1996)
6. Is the test based on sound theoretical principles which have current credibility in the field? (Bailey, 1996)
7. Does the test utilize authentic texts and authentic tasks? (Bailey, 1996)
8. Are the participants invested in the assessment processes? (Bailey, 1996)
REFERENCES
Alderson, J. C., & Hamp-Lyons, L. (1996). TOEFL preparation courses: A study of washback. Language Testing, 13, 280-297.
Alderson, J. C., & Wall, D. (1993a). Does washback exist? Applied Linguistics, 14, 115-129.
Alderson, J. C., & Wall, D. (1993b). Examining washback: The Sri Lankan impact study. Language Testing, 10, 41-69.
Ariyoshi, H., & Senba, K. (1983). Daigaku nyushi junbi kyoiku ni kansuru kenkyu [A study on preparatory teaching for the university entrance examinations]. Fukuoka Kyoiku Daigaku Kiyo, 33, 1-21.
Bailey, K. M. (1996). Working for washback: A review of the washback concept in language testing. Language Testing, 13, 257-279.
Berwick, R., & Ross, S. (1989). Motivation and matriculation: Are students still alive after exam hell? JALT Journal, 11(2), 193-210.
Buck, G. (1988). Testing listening comprehension in Japanese university entrance examinations. JALT Journal, 10, 15-42.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27-32.
Fujita, T. (1992). Readiness model of optimal input: A comparison between Japanese high school and non-high school students of English. Sophia Linguistica, 31, 122-143.
Gates, S. (1995). Exploiting washback from standardized tests. In J. D. Brown & S. O. Yamashita (Eds.), Language testing in Japan (pp. 101-106). Tokyo: Japanese Association for Language Teaching.
Green, D. Z. (1985). Developing measures of communicative proficiency: A test for French immersion students in grades 9 and 10. In P. C. Hauptman, R. LeBlanc, & M. B. Wesche (Eds.), Second language performance testing (pp. 215-227). Ottawa, Canada: University of Ottawa Press.
Hart, D., Lapkin, S., & Swain, M. (1997). Communicative language tests: Perks and perils. Evaluation and Research in Education, 1, 83-93.
Heyneman, S. P., & Ransom, A. W. (1990). Using examinations and testing to improve educational quality. Educational Policy, 177-192.
Hughes, A. (1988). Introducing a needs-based test of English language proficiency into an English-medium university in Turkey. In A. Hughes (Ed.), Testing English for university study (ELT Document No. 127). Modern English Publications.
Hughes, A. (1989). Testing for language teachers. Cambridge: Cambridge University Press.
Jones, R. L. (1985). Second language performance testing: An overview. In P. C. Hauptman, R. LeBlanc, & M. B. Wesche (Eds.), Second language performance testing (pp. 15-24). Ottawa: University of Ottawa Press.
Kellaghan, T., & Greaney, V. (1992). Using examinations to improve education: A study of fourteen African countries. Washington, DC: The World Bank.
Khaniya, T. R. (1990). The washback effect of a textbook-based test. Edinburgh Working Papers in Applied Linguistics, 1, 48-59.
Law, G. (1994). College entrance exams and team teaching in high school English classrooms. In M. Wada & T. Cominos (Eds.), Studies in team teaching (pp. 90-102). Tokyo: Kenkyusha.
Law, G. (1995). Ideologies of English language education in Japan. JALT Journal, 17(2), 213-224.
Mehrens, W. A., & Kaminsky, J. (1989). Methods for improving standardized test scores: Fruitful, fruitless, or fraudulent? Educational Measurement: Issues and Practice, 8, 14-22.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational Researcher, 18, 5-11.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13, 241-256.
Miller, M. D., & Legg, S. M. (1993). Alternative assessment in a high-stakes environment. Educational Measurement: Issues and Practice, 12, 9-15.
Morrow, K. (1986). The evaluation of tests of communicative performance. In M. Portal (Ed.), Innovations in language testing. London: NFER/Nelson.
Morrow, K. (1991). Evaluating communicative tests. In S. Anivan (Ed.), Current developments in language testing (pp. 111-118). Singapore: Regional Language Centre.
Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229-258.
Powers, D. E. (1993). Coaching for the SAT: A summary of the summaries and an update. Educational Measurement: Issues and Practice, 12, 24-30.
Reader, I. (1986). Language teaching in Britain and Japan: A personal view. JALT Journal, 7(2), 113-136.
Rohlen, T. Japan's high schools. Berkeley, CA: University of California Press.
Saito, T., Ariata, S., & Nasu, I. (1984). Taki-sentaku tesuto ga igaku kyoiku ni oyobosu eikyo [Effects of multiple-choice questions on medical education]. Nihon igaku kyoiku shinko zaidan kenkyu josei ni yoru kenkyu hokoku sho.
Shohamy, E. (1992). Beyond performance testing: A diagnostic feedback testing model for assessing foreign language learning. Modern Language Journal, 76(4), 513-521.
Shohamy, E. (1995). Performance assessment in language testing. Annual Review of Applied Linguistics, 15, 188-211.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited: Washback effect over time. Language Testing, 13, 298-317.
Swain, M. (1984). Large-scale communicative testing: A case study. In S. J. Savignon & M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185-201). Reading, MA: Addison-Wesley.
Swain, M. (1985). Large-scale communicative testing. In Y. P. Lee, C. Y. Y. Fok, R. Lord, & G. Low (Eds.), New directions in language testing. Hong Kong: Pergamon Press.
Wall, D. (1996). Introducing new tests into traditional systems: Insights from general education and from innovation theory. Language Testing, 13, 334-354.
Wall, D., & Alderson, J. C. (1993). Examining washback: The Sri Lankan impact study. Language Testing, 10, 41-69.
Wall, D., & Alderson, J. C. (1996). Examining washback: The Sri Lankan impact study. In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 194-221). Clevedon, UK: Multilingual Matters.
Watanabe, Y. (1992). Washback effects of college entrance examinations on language learning strategies. JACET Bulletin, 23, 175-194.
Watanabe, Y. (1996a). Does grammar translation come from the entrance examination? Preliminary findings from classroom-based research. Language Testing, 13, 318-333.
Watanabe, Y. (1996b). Investigating washback in Japanese EFL classrooms: Problems and methodology. Australian Review of Applied Linguistics, 208-239.
Watson-Gegeo, K. A. (1988). Ethnography in ESL: Defining the essentials. TESOL Quarterly, 22(4), 575-592.
Weir, C. J. (1990). Communicative language testing. Hemel Hempstead, UK: Prentice Hall.
Wesdorp, H. (1982). Backwash effects of language testing in primary and secondary education. Unpublished manuscript. Stichting Centrum voor Onderwijsonderzoek van de Universiteit van Amsterdam, The Netherlands.
James Dean Brown
Department of ESL
1890 East-West Road
University of Hawai'i
Honolulu, Hawai'i 96822
e-mail: brownj@hawaii.edu
