James Dean Brown
University of Hawai'i
Introduction
Numerous definitions of the concept of washback have been offered in the language
teaching literature. For instance, Shohamy, Donitsa-Schmidt, and Ferman (1996, p. 298)
define washback very simply as "the connections between testing and learning." Gates
(1995, p. 101) defines it as "the influence of testing on teaching and learning." Shohamy
(1992, p. 513) went further when she defined washback as "the utilization of external
language tests to affect and drive foreign language learning...this phenomenon is the result
of the strong authority of external testing and the major impact it has on the lives of test
takers." Messick (1996, p. 241) provided an even more elaborate definition when he
wrote, "washback, a concept prominent in applied linguistics, refers to the extent to which
the introduction and use of a test influences language teachers and learners to do things
they would not otherwise do that promote or inhibit language learning."
One source of confusion about washback is that several different terms are used to
refer to the connections between testing and learning. In the general educational literature
the concept is referred to as backwash. Elsewhere the concept has been referred to as test
impact, measurement-driven instruction, curriculum alignment, and test feedback. More
humorously, at a meeting at Educational Testing Service, Dan Douglas once referred to
the concept as the bogwash effect, a variant that I hope will endure.
A number of authors have linked washback to test validity. As Alderson and Wall
(1993a, p. 116) point out, "some writers have even gone so far as to suggest that a test's
validity should be measured by the degree to which it has had a beneficial influence on
teaching." Messick (1996, p. 241) discusses "the concept of washback as an instance of the
consequential aspect of construct validity, linking positive washback to so-called authentic
and direct assessments and, more basically, to the need to minimize construct
underrepresentation and construct-irrelevant difficulty in the test." Morrow (1986) refers
to something he calls washback validity, the relationship between a test and the related
curriculum. Frederiksen and Collins (1989) discuss a similar concept but refer to it as
systemic validity. Weir (1990, p. 27) suggests that communicative language testing could
have a strong washback effect on communicative language teaching and, in fact, that such
a washback effect would be directly linked to the construct validity of the tests.
University of Hawai'i Working Papers in ESL, Vol. 16, No. 1, Fall 1997, pp. 27-45.
Because the first definition at the top of this paper (from Shohamy et al., 1996) is both
adequate and parsimonious, it is very attractive. However, that definition does not
explicitly include the link between washback and validity. Hence the working definition of
washback that I will use in this paper is a slightly expanded version of the one provided by
Shohamy et al. (1996): the connections between language testing and learning, and the
consequences of those connections.
I will continue to explore the concept of backwash by addressing a number of
questions: Does washback exist? What factors affect the impact of washback? What are
the negative aspects of washback? How can we promote positive washback? What
directions might future research on washback effect take?
More recently, Alderson and Hamp-Lyons (1996) conducted a study in the United
States of TOEFL preparation classes. They found that such classes are substantially
different from non-TOEFL classes. TOEFL classes had more test-taking, more teacher
talking time, less turn taking, less time spent on pair work, more references to TOEFL,
more metalanguage, more routinized talk, and less laughter. They found that the picture
was more complex when the teachers' TOEFL and non-TOEFL classes were analyzed
separately and that the ideas expressed about washback in the literature are too
simplistic.
and an EFL oral test for grade 12. They found the following effects
for the ASL (p. 301):
1. Teachers stopped teaching new material and turned to reviewing material
2. Teachers replaced class textbooks with worksheets that were identical to previous
years' tests
3. The activities were all "testlike"
4. Review sessions were added to regular class hours
5. The atmosphere in the class was tense
6. Teachers and students were highly motivated to master the material
7. When the test was over the above activities stopped
They found quite different effects for the EFL oral test (p. 301):
1. Experienced teachers spent more time on teaching oral language
2. Experienced teachers used only activities identical to the ones on the test
3. Novice teachers tried out additional oral language activities
Watanabe (1992) examined washback in Japan and found that, if it exists, it broadens
the range of strategies that students will use, an effect that persists one year later. Later,
Watanabe (1996a) investigated washback in the classrooms of two yobiko teachers in
Japan. His goal was to study the use of exam-induced translation in class. He concluded
that translation on exams affects some teachers and not others depending on personal
beliefs, educational background, and past learning experiences. He also felt that
claims of washback may be exaggerated and somewhat inconsistent. Watanabe (1996b)
cited a number of other related papers, Ariyoshi and Senba (1983), Fujita (1992), and
Saito et al. (1984), with bearing on the issue of washback. Unfortunately, I have been
unable to obtain those papers at the time of this writing.
Wall (1996) spent four years in a Sri Lankan EFL project evaluating a new national
examination. Similar to what was reported in Wall and Alderson (1993), Wall (1996)
reports:
The main findings...were that the examination had had considerable impact on the
content of English lessons and on the way teachers design their classroom tests (some
of this was positive and some negative), but it had had little to no impact on the
methodology they used in the classroom or on the way they marked their pupils' test
performance.
Wall found the language testing advice of Hughes (1989), Shohamy (1992), and Bailey
(1996) and the general education recommendations of Heyneman and Ransom (1990) and
Kellaghan and Greaney (1992) helpful in improving curriculum (pp. 350-351).
All in all, the empirical studies to date indicate that the washback effect does exist in
various forms in various places, but also that the issue is not a straightforward one that
conforms neatly to the popular notions of the effects of examinations on language
learning.
Shohamy et al. (1996, pp. 299-300, 314-315) say that the degree of impact is
influenced by:
1. Status of the subject-matter (language)
2. Low vs. high stakes
3. Nature of the test (purpose)
4. Format of the test (more anxiety from oral vs. written test; novel vs. familiar
format; etc.)
5. Use to which the scores will be put
6. Skills being tested: "It may very well be that in multiskilled tests, test-takers may
feel more confident since they can compensate for a lack of proficiency in one skill
by high proficiency in the other"
Shohamy et al. (1996, p. 303) also suggest ways of judging the impact of tests on
curriculum:
1. Classroom activities and time allotment
2. The extent to which the test has generated new teaching materials
3. The degree to which students and parents are aware of the existence and content of
the test
4. Perceived effects of test results
5. The extent to which the test has changed the prestige and position of the areas
tested
6. Perception of test quality and importance
7. Impact of test on promoting learning
8. How the various language inspectors view the role, status, and impact of the test
Table 1 summarizes the factors that affect the impact of washback. Surprisingly, very
little overlap exists between the lists provided by the authors cited above. Only two of the
24 items listed in the table were mentioned in two articles. Notice, in Table 1, that I have
organized the factors into four categories: prestige factors, test factors, people factors,
and curriculum factors.
Table 1
Factors That Affect the Impact of Washback
Prestige Factors
1. Prestige or status of the test (Gates, 1995; Alderson & Hamp-Lyons, 1996)
2. Status of the subject-matter of the test (Shohamy et al., 1996)
3. Perception of quality and importance of the test (Shohamy et al., 1996)
Table 2
Negative Aspects of Washback
Teaching Factors
1. Teachers narrow the curriculum (Alderson & Hamp-Lyons, 1996)
2. Teachers stopped teaching new material and turned to reviewing material (Shohamy et
al., 1996)
3. Teachers replaced class textbooks with worksheets identical to previous years'
tests (Shohamy et al., 1996)
4. Unnatural teaching (Alderson & Hamp-Lyons, 1996)
Course Content Factors
1. Students being taught "examination-ese" (Alderson & Hamp-Lyons,
1996)
2. Students practicing "testlike" items similar in format to those on the test (Bailey,
1996; Shohamy et al., 1996)
3. Students applying test-taking strategies in class (Bailey, 1996)
4. Students studying vocabulary and grammar rules [to the exclusion of other aspects
of language] (Bailey, 1996)
Course Characteristic Factors
1. Students being taught inappropriate language-learning and language-using
strategies (Alderson & Hamp-Lyons, 1996)
2. Reduced emphasis on skills that require complex thinking or problem-solving
(Alderson & Hamp-Lyons, 1996)
3. Courses that raise examination scores without providing students with the English
they will need in language interaction or in the college or university courses they
are entering; also called test score "pollution" (Alderson & Hamp-Lyons,
1996)
4. The tense atmosphere in the class (Shohamy et al., 1996)
Class Time Factors
1. Enrolling in, requesting or demanding additional (unscheduled) test-preparation
classes or tutorials (in addition to or in lieu of other language classes) (Alderson &
Hamp-Lyons, 1996; Bailey, 1996)
2. Review sessions added to regular class hours (Shohamy et al., 1996)
3. Skipping language classes to study for the test (Bailey, 1996)
4. Lost instructional time (Alderson & Hamp-Lyons, 1996)
6. Ensure test is known and understood by students and teachers
7. Where necessary provide assistance to teachers
Heyneman and Ransom (1990, p. 112) suggest three strategies for improving test
content so as to create more positive washback effects:
1. Use more open-ended items (as opposed to selected-response items like
multiple-choice)
2. Test higher-level cognitive skills
3. Authorities should provide feedback to teachers and others (teacher trainers,
curriculum developers, inspectors, education officers, and head teachers) so
meaningful change can be effected
Kellaghan and Greaney (1992, p. 3), in reviewing World Bank research in 14 countries
in Africa, suggest the following to lessen negative washback on classroom teaching (as
summarized in Wall, 1996, p. 337):
1. Examinations should reflect the full curriculum, not merely a limited aspect of it.
2. Higher-order cognitive skills should be assessed to ensure they are taught.
3. Skills to be tested should not be limited to academic areas; they should also relate to
out-of-school tasks.
4. A variety of examination formats should be used, including written, oral, aural,
and practical.
5. In evaluating published examination results and national rankings, account should
be taken of factors other than teaching effort.
6. Detailed, timely feedback should be provided to schools on levels of pupils'
performance and areas of difficulty in public examinations.
7. Predictive validity studies of public examinations should be conducted. (This is to
see whether selected exams are fulfilling their purpose.)
8. The professional competence of examination authorities needs improvement,
especially in test design.
9. Each examination board should have a research capacity. (This is to investigate,
among other things, the impact of examinations on teaching.)
10. Examination authorities should work closely with curriculum organizations and
with educational administrators.
11. Regional professional networks should be developed to initiate exchange
programmes and to share common interests and concerns.
Bailey (1996, pp. 268-269) suggests that we could promote beneficial washback by
incorporating the following into our tests:
1. Language learning goals
2. Authenticity
THE WASHBACK EFFECT OF TESTS ON LANGUAGE EDUCATION
have organized the factors that promote positive washback into four categories: test
design factors, test content, logistic factors, as well as interpretation and analysis factors.
Table 3
Promoting Beneficial Backwash
1. Do the participants understand the purpose(s) of the test and the intended use(s) of
the results?
2. Are the results provided in a clear, informative and timely fashion?
3. Are the results perceived as believable and fair by the participants?
4. Does the test measure what the program intends to teach?
5. Is the test based on clearly articulated goals and objectives?
6. Is the test based on sound theoretical principles which have current credibility in
the field?
7. Does the test utilize authentic texts and authentic tasks?
8. Are the participants invested in the assessments processes?
Shohamy et al. (1996, p. 298) raised questions of their own:
1. Is introducing changes through tests effective?
2. Can the introduction of tests per se cause real improvement in learning and
teaching?
3. How are test results used by teachers, students and administrators?
Table 4 summarizes some questions that future research on the washback effect might
profitably address. Again, surprisingly little overlap appears to exist between the lists
given by various authors. Only one of the 31 questions listed in the table was mentioned
in two articles. Notice, in Table 4, that I have organized the questions for future research
into three categories: general questions, detailed questions, and program-related questions.
Conclusion
In this article, I reviewed several definitions of washback. Then, I set out to answer a
number of questions and found the following:
1. Does washback exist? The answer was a qualified yes. The literature supports the
notion that washback exists in various places in various ways. Clearly, the issue is
a complex one that warrants considerably more research.
2. What factors affect the impact of washback? I identified a total of 24 factors in
the literature that have an impact on washback. These 24 included prestige factors,
test factors, people factors, and curriculum factors.
3. What are the negative aspects of washback? I found a total of 16 factors in the
literature that seem to be negative aspects of washback. These 16 included
teaching factors, course content factors, course characteristic factors, and class
time factors.
Table 4
Directions for Future Research on Washback
General Questions
1. Does washback exist? (Alderson & Wall, 1993; Watanabe, 1996b)
2. What evidence enables us to say washback exists or does not exist? (Watanabe, 1996b)
3. If washback exists, what is its nature (i.e., positive or negative)? (Watanabe, 1996b)
4. If washback does not exist, why not? (Watanabe, 1996b)
5. If washback exists, under what conditions? (Watanabe, 1996b)
6. Is introducing changes through tests effective? (Shohamy et al., 1996)
7. Can the introduction of tests per se cause real improvement in learning and teaching?
(Shohamy et al., 1996)
8. How are test results used by teachers, students and administrators? (Shohamy et al., 1996)
Detailed Questions
1. Will test influence teaching (Alderson & Wall, 1993)
2. Will test influence learning (Alderson & Wall, 1993)
3. Will test influence what teachers teach (Alderson & Wall, 1993)
4. Will test influence how teachers teach (Alderson & Wall, 1993)
5. Will test influence what learners learn (Alderson & Wall, 1993)
6. Will test influence how learners learn (Alderson & Wall, 1993)
7. Will test influence the rate and sequence of teaching (Alderson & Wall, 1993)
8. Will test influence the rate and sequence of learning (Alderson & Wall, 1993)
9. Will test influence the degree and depth of teaching (Alderson & Wall, 1993)
10. Will test influence the degree and depth of learning (Alderson & Wall, 1993)
11. Will test influence the attitudes to the content, method, etc. of teaching and learning
(Alderson & Wall, 1993)
12. Will tests that have important consequences have more washback (Alderson & Wall, 1993)
13. Will tests that do not have important consequences have no washback (Alderson & Wall,
1993)
14. Will tests have washback on all learners and teachers (Alderson & Wall, 1993)
15. Will tests have washback effects for some learners and teachers, but not for others
(Alderson & Wall, 1993)
Program Related Questions
1. Do the participants understand the purpose(s) of the test and the intended use(s) of the
results? (Bailey, 1996)
2. Are the results provided in a clear, informative and timely fashion? (Bailey, 1996)
3. Are the results perceived as believable and fair by the participants? (Bailey, 1996)
4. Does the test measure what the program intends to teach? (Bailey, 1996)
5. Is the test based on clearly articulated goals and objectives? (Bailey, 1996)
6. Is the test based on sound theoretical principles which have current credibility in the
field? (Bailey, 1996)
7. Does the test utilize authentic texts and authentic tasks? (Bailey, 1996)
8. Are the participants invested in the assessments processes? (Bailey, 1996)
REFERENCES
Berwick, R., & Ross, S. (1989). Motivation and matriculation: Are students still alive
after exam hell? JALT Journal, 11(2), 193-210.
Buck, G. (1988). Testing listening comprehension in Japanese university entrance
examinations. JALT Journal, 10, 15-42.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing.
Educational Researcher, 18(9), 27-32.
Fujita, T. (1992). Readiness model of optimal input: A comparison between Japanese high
school and non-high school students of English. Sophia Linguistica, 31, 122-143.
Gates, S. (1995). Exploiting washback from standardized tests. In J. D. Brown & S. O.
Yamashita (Eds.), Language testing in Japan (pp. 101-106). Tokyo: Japanese
Association for Language Teaching.
Centre.
Moss, P. A. (1992). Shifting conceptions of validity in educational measurement:
Implications for performance assessment. Review of Educational Research, 62(3),
229-258.
Powers, D. E. (1993). Coaching for the SAT: A summary of the summaries and an
update. Educational Measurement: Issues and Practice, 12, 24-30.
Reader, I. (1986). Language teaching in Britain and Japan: A personal view. JALT
Journal, 7(2), 113-136.
Rohlen, T. Japan's high schools. Berkeley, CA: University of California Press.
Saito, T., Ariata, S., & Nasu, I. (1984). Taki-sentaku tesuto ga igaku kyoiku ni oyobosu
eikyo (Effects of multiple-choice questions on medical education). Nihon igaku kyoiku
shinko zaidan kenkyu josei ni yoru kenkyu hokoku sho.
Shohamy, E. (1992). Beyond performance testing: A diagnostic feedback testing model
for assessing foreign language learning. Modern Language Journal, 76(4), 513-521.
Shohamy, E. (1995). Performance assessment in language testing. Annual Review of
Applied Linguistics, 15, 188-211.
Shohamy, E., Donitsa-Schmidt, S., & Ferman, I. (1996). Test impact revisited:
washback
James Dean Brown
Department of ESL
1890 East-West Road
University of Hawai'i
Honolulu, Hawai'i 96822
e-mail: brownj@hawaii.edu