Coxhead 2000 2001 A New Academic Wordlist
is collaborating with JSTOR to digitize, preserve and extend access to TESOL Quarterly
Representation
Organization
Size
The term running words (or tokens) refers to the total number of word forms in a text,
whereas the term individual words (types) refers to each different word in a text, irrespective
of how many times it occurs.
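The distinction between tokens and types can be illustrated with a short Python sketch. The tokenizer used here (a case-insensitive regular expression over runs of letters) and the function name `tokens_and_types` are assumptions made for illustration; they are not the counting procedure used for the Academic Corpus.

```python
import re

def tokens_and_types(text):
    """Return (number of tokens, number of types) for a text.

    Assumption: a token is any run of letters (with internal apostrophes),
    compared case-insensitively. Real corpus tools may tokenize differently.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return len(tokens), len(set(tokens))

sample = "The data support the concept, and the concept fits the data."
n_tokens, n_types = tokens_and_types(sample)
print(n_tokens, n_types)  # 11 running words, 6 different words
```

In the sample sentence, the word form "the" contributes four tokens but only one type.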
Word Selection
Research Questions
TABLE 1
Note. Words in italics are the most frequent form in that family occurring in the Academic
Corpus.
6. How does the AWL compare with the UWL (Xue & Nation, 1984)?
METHODOLOGY
Discipline
The corpus analysis programme Range (Heatley & Nation, 1996) was
used to count and sort the words in the Academic Corpus. This
programme counts the frequency of words in up to 32 files at a time and
records the number of files in which each word occurs (range) and the
frequency of occurrence of the words in total and in each file.
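The kind of count described above can be sketched in a few lines of Python. This is not the Range programme itself: the function name `frequency_and_range`, the simple tokenizer, and the two toy texts are hypothetical, and the texts are passed in as strings rather than read from files.

```python
import re
from collections import Counter

def frequency_and_range(texts):
    """Count words across several texts.

    `texts` maps a (hypothetical) file name to its contents. Returns a dict
    mapping each word to (total frequency, range, per-file frequencies),
    where range is the number of files in which the word occurs.
    """
    per_file = {
        name: Counter(re.findall(r"[a-z]+", body.lower()))
        for name, body in texts.items()
    }
    total = Counter()
    for counts in per_file.values():
        total.update(counts)
    return {
        word: (
            total[word],
            sum(1 for counts in per_file.values() if word in counts),
            {name: counts[word] for name, counts in per_file.items()},
        )
        for word in total
    }

texts = {
    "arts.txt": "The research data were analysed.",
    "law.txt": "The court reviewed the data.",
}
result = frequency_and_range(texts)
print(result["data"])  # total frequency, range, per-file counts
```

A word with a high total frequency but a low range is concentrated in a few files, which is why both figures are recorded.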
Words were selected for the AWL based on three criteria:
Description
The first research question asked which lexical items beyond the first
2,000 in West's (1953) GSL occur frequently across a range of academic
texts. In the Academic Corpus, 570 word families met the criteria for
inclusion in the AWL (see Appendix A). Some of the most frequent word
families in the AWL are analyse, concept, data, and research. Some of the
least frequent are convince, notwithstanding, ongoing, persist, and whereby.
The second question was whether the lexical items selected for the
AWL occur with different frequencies in arts, commerce, law, and
science texts. The list appears to be slightly advantageous for commerce
students, as it covers 12.0% of the commerce subcorpus. The coverage of
arts and of law is very similar (9.3% and 9.4%, respectively), and the
coverage of science is the lowest among the four disciplines (9.1%). The
3.0% difference between the coverage of the commerce subcorpus and
the coverage of the other three subcorpora may result from the presence
of key lexical items such as economic, export, finance, and income, which
occur with very high frequency in commerce texts. (See Appendix B for
excerpts from texts in each section of the Academic Corpus.)
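Coverage figures such as these express the share of running words in a text that belong to a word list. A minimal sketch of that calculation follows, assuming a flat set of word forms rather than full word families; the function name `coverage`, the toy list, and the toy sentence are illustrative only and are not drawn from the Academic Corpus.

```python
import re

def coverage(text, word_list):
    """Percentage of the tokens in `text` that appear in `word_list`.

    Assumption: `word_list` is a flat set of word forms. A real word list
    groups related forms into families, so each form would first be mapped
    to its family.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in word_list)
    return 100.0 * hits / len(tokens)

toy_list = {"economic", "export", "finance", "income"}
toy_text = "Export income rose as finance costs fell across the economic zone"
print(round(coverage(toy_text, toy_list), 1))  # 4 of 11 tokens
```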
The words in the AWL occur in a wide range of the subject areas in the
Academic Corpus. Of the 570 word families in the list, 172 occur in all 28
subject areas, and 263 (172 + 91) occur in 27 or more subject areas (see
Table 3). In total, 67% of the word families in the AWL occur in 25 or
more of the 28 subject areas, and 94% occur in 20 or more.
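The cumulative figures in this paragraph can be checked against the counts listed in Table 3. The sketch below simply sums those counts; the helper names are hypothetical, and 570 is the size of the AWL as stated in the text.

```python
# Number of AWL word families occurring in exactly N subject areas,
# as listed in Table 3 (subject areas -> word families).
families_by_areas = {
    28: 172, 27: 91, 26: 58, 25: 62, 24: 43, 23: 43, 22: 33,
    21: 20, 20: 15, 19: 9, 18: 9, 17: 5, 16: 5, 15: 4,
}
TOTAL_FAMILIES = 570  # size of the AWL as stated in the text

def families_in_at_least(min_areas):
    """Word families occurring in `min_areas` or more subject areas."""
    return sum(n for areas, n in families_by_areas.items() if areas >= min_areas)

def share_in_at_least(min_areas):
    """The same count as a rounded percentage of the whole list."""
    return round(100 * families_in_at_least(min_areas) / TOTAL_FAMILIES)

print(families_in_at_least(27))  # 263
print(share_in_at_least(25))     # 67
print(share_in_at_least(20))     # 94
```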
Evaluation
TABLE 3
Word families   Subject areas   Word families   Subject areas
172             28              20              21
91              27              15              20
58              26              9               19
62              25              9               18
43              24              5               17
43              23              5               16
33              22              4               15
Note. Total s
1,000 wor
2,550 word families, and all but 12 of those in the GSL occur in the
Academic Corpus.
The AWL, the first 1,000 words of the GSL (West, 1953), and the
second 1,000 words of the GSL cover the arts, commerce, and law
subcorpora similarly but in very different patterns (see Table 5). The first
1,000 words of the GSL account for fewer of the word families in the
commerce subcorpus than in the arts and law subcorpora, but this lower
coverage of commerce is balanced by the AWL's higher coverage of this
discipline. On the other hand, the AWL's coverage of the arts and law
subcorpora is lower than its coverage of the commerce subcorpus, but
the GSL's coverage of arts and law is slightly higher than its coverage of
commerce. The AWL's coverage of the science subcorpus is 9.1%, which
indicates that the list is also extremely useful for science students. The
GSL, in contrast, is not quite as useful for science students as it is for arts,
commerce, and law students.
TABLE 4
Coverage of In Academic
Coverage of Anoth
A frequency-based
should be expected
covers a different collection of similar texts. To establish whether the
AWL maintains high coverage over academic texts other than those in
the Academic Corpus, I compiled a second corpus of academic texts in
English, using the same criteria and sources to select texts and dividing
them into the same four disciplines. This corpus comprised approximately
678,000 tokens (82,000 in arts, 53,000 in commerce, 143,000 in
law, and 400,000 in science) representing 32,539 types of lexical items.
This second corpus was made up of texts that had met the criteria for
inclusion in the Academic Corpus but were not included either because
they were collected too late or because the subject area they belonged to
was already complete.
The AWL's coverage of the second corpus is 8.5% (see Table 6), and
all 570 word families in the AWL occur in the second corpus. The GSL's
coverage of the second corpus (66.2%) is consistent with its coverage of
the science section of the Academic Corpus (65.7%). The overall
coverage of the second corpus by both the AWL and the GSL (7
seems to be partly the result of the large proportion of science texts it
contains.
To establish that the AWL is truly an academic word list rather than a
general-service word list, I developed a collection of 3,763,733 running
words of fiction texts. The collection consisted of 50 texts from Project
Gutenberg's (http://www.gutenberg.net) collection of texts that were
written more than 50 years ago and are thus in the public domain. The
TABLE 6
Coverage of the Academic Corpus and the Second Corpus of Academic Texts by the
Academic Word List and the General Service List (West, 1953) (%)
fact that th
important,
tion was to f
fiction texts
of lexical ite
The AWL ac
collection, m
Corpus. The
word famil
writing (see
that the wo
collection, a
However, an
Of the AWL
occur with m
additional 86
TABLE 7
Occurrence of the AWL Word Families in the Academic Corpus and the Fiction Collection
In Academic Corpus                                           Word families
Four or more times as frequently as in fiction collection    380
Three times as frequently as in fiction collection           34
Twice as frequently as in fiction collection                 52
Less than twice as frequently as in fiction collection       52
Less frequently than in fiction collection                   22
Total                                                        570
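A comparison of this kind depends on relative rather than raw frequencies, because the two collections differ in size. The sketch below shows one way such a classification could be computed. The function name, the per-million normalisation, the exact bucket boundaries, the extra category for families absent from fiction, and the example counts and corpus sizes are all assumptions made for illustration; they are not the procedure or the data of the study.

```python
def compare_frequency(ac_count, ac_size, fiction_count, fiction_size):
    """Classify a word family by how much more often it occurs in the
    academic corpus than in the fiction collection.

    Counts are normalised to occurrences per million tokens so that
    corpora of different sizes can be compared. The bucket boundaries
    are an assumption about how the table's rows might be defined.
    """
    ac_rate = 1_000_000 * ac_count / ac_size
    fiction_rate = 1_000_000 * fiction_count / fiction_size
    if fiction_rate == 0:
        return "not in fiction collection"  # hypothetical extra category
    ratio = ac_rate / fiction_rate
    if ratio >= 4:
        return "four or more times as frequently"
    if ratio >= 3:
        return "three times as frequently"
    if ratio >= 2:
        return "twice as frequently"
    if ratio >= 1:
        return "less than twice as frequently"
    return "less frequently"

# Hypothetical counts in corpora of 1 and 2 million tokens.
print(compare_frequency(500, 1_000_000, 200, 2_000_000))
print(compare_frequency(300, 1_000_000, 400, 2_000_000))
```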
The UWL (Xue & Nation, 1984), created through the amalgamation
of four existing word lists, contains 836 word families consisting of 3,70
types and covers 8.5% of the Learned and Scientific sections of the LOB
corpus of written British English (Johansson, 1978) and the parallel
Wellington corpus of written English (Bauer, 1993). It covers 9.8% of the
Academic Corpus, slightly less than the 10.0% coverage of the corpus by
the AWL. Therefore, the AWL, though smaller, gives a better return on
learning, as students would need to learn only 570 word families instead
of 836 for the same coverage of academic texts.
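The return on learning can be made concrete by dividing coverage by list size. The measure below (percentage points of coverage per 100 word families) is an illustrative assumption rather than a statistic reported in the study; the input figures are those given in the text.

```python
def coverage_per_100_families(coverage_percent, n_families):
    """Percentage points of corpus coverage gained per 100 word families."""
    return 100 * coverage_percent / n_families

awl = coverage_per_100_families(10.0, 570)  # AWL figures from the text
uwl = coverage_per_100_families(9.8, 836)   # UWL figures from the text
print(round(awl, 2), round(uwl, 2))  # 1.75 1.17
```

On this measure, each 100 word families learned from the AWL yields roughly half as much coverage again as 100 families from the UWL.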
The overlap between the AWL and the UWL is 51%, with 435 word
families occurring in both. This leaves 401 word families occurring only
in the UWL and 135 word families occurring only in the AWL. The
explanation for the large number of word families occurring in the UWL
but not in the AWL lies in the criteria for including word families in the
AWL: Members of a word family had to occur at least 100 times in the
Academic Corpus. Approximately 150 of the word families that are only
in the UWL occurred in the Academic Corpus fewer than 50 times, or only
once in more than 174 pages of 400 words, and therefore would not have
been included in the AWL. Other words in the UWL did not meet the
range criterion for the AWL.
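The comparison of the two lists is a matter of set arithmetic over their headwords. The sketch below illustrates the operations on two small toy sets, which are hypothetical and do not reflect the actual membership of either list, and then checks that the counts reported above are consistent with one another.

```python
def overlap_summary(list_a, list_b):
    """Return (shared, only_a, only_b) counts for two sets of word families."""
    shared = list_a & list_b
    return len(shared), len(list_a - list_b), len(list_b - list_a)

# Toy headword sets, chosen only to illustrate the set operations.
toy_a = {"analyse", "concept", "data", "research"}
toy_b = {"concept", "data", "ratio", "theory", "valid"}
print(overlap_summary(toy_a, toy_b))  # (2, 2, 3)

# The counts reported in the text follow the same arithmetic.
AWL_SIZE, UWL_SIZE, SHARED = 570, 836, 435
print(UWL_SIZE - SHARED, AWL_SIZE - SHARED)  # 401 135
```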
The UWL contains more than 133 word families that do not occur in
all four sections of the Academic Corpus (Table 8). Thus students could
learn these words but might rarely or never encounter them in academic
texts. Although the UWL contains useful words for students to learn, as
shown by the 9.8% coverage of the Academic Corpus, the AWL is smaller,
has a higher coverage of academic texts, and covers a far wider range of
subject areas.
CONCLUSION
The Academic Word List includes 570 word families that constitute a
specialised vocabulary with good coverage of academic texts, regardless
of the subject area. It accounts for 10% of the total tokens in the
Academic Corpus, and more than 94% of the words in the list occur in
20 or more of the 28 subject areas of the Academic Corpus. Thes
The AWL is the result of a corpus-based study. Such studies create lists,
concordances, or data concerning the clustering of linguistic items in
coherent, purposeful texts. The use of this research method, however,
does not imply that language teaching and learning should rely on
decontextualised methods. Instead, the AWL might be used to set
vocabulary goals for EAP courses, construct relevant teaching materials,
and help students focus on useful vocabulary items.
The AWL will be most valuable in setting goals for EAP courses. This
study has identified vocabulary to include in teaching and learning
materials, but there remains a need to design tests to diagnose whether
learners know this vocabulary and whether attempts to teach and learn
have been successful. Such tests exist for the UWL (Nation, 1983);
similar tests based on the AWL are under development.
The UWL and one of its predecessors, the American University Word List
(Praninskas, 1972), served as the basis for course books specifically
designed to teach academic vocabulary (Farid, 1985; Valcourt & Wells,
1999; Yorkey, 1981). It is hoped that authors will undertake to write
similar books based on the AWL. In addition, a useful direction for
materials development would be the design of texts that provide optimal
conditions for meeting and learning academic vocabulary. This initiative
might involve adapting academic texts so that the density of unknow
TABLE 9
Future Research
ACKNOWLEDGMENTS
THE AUTHOR
REFERENCES
Atkins, S., Clear, J., & Ostler, N. (1992). Corpus design criteria. Literary and Linguistic
Computing, 7, 1-16.
Bauer, L. (1993). Manual of information to accompany the Wellington Co
New Zealand English. Wellington, New Zealand: Victoria University
Bauer, L., & Nation, I. S. P. (1993). Word families. International Journal o
6, 253-279.
Biber, D. (1989). A typology of English texts. Linguistics, 27, 3-43.
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic
Computing, 8, 243-257.
Biber, D., Conrad, S., & Reppen, R. (1994). Corpus-based issue
linguistics. Applied Linguistics, 15, 169-189.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investi
structure and use. Cambridge: Cambridge University Press.
Buckle, R., Kim, K., & Hall, V. B. (1994). Dating New Zealand business
Working Paper 6). Wellington, New Zealand: Victoria University of
Campion, M., & Elley, W. (1971). An academic vocabulary list. Well
Zealand Council for Educational Research.
Cohen, A., Glasman, H., Rosenbaum-Cohen, P. R., Ferrara, J., & Fine, J
Reading English for specialised purposes: Discourse analysis and t
standard informants. In P. Carrell, J. Devine, & D. Eskey (Eds.), In
approaches to second language reading (pp. 152-167). Cambridge: Cambridge
University Press.
Collins COBUILD dictionary (2nd ed.). (1995). London: HarperCollins.
Corson, D. (1997). The learning and use of academic English words.
Learning, 47, 671-718.
Coxhead, A. J. (1998). An academic word list (English Language Institute
Publication No. 18). Wellington, New Zealand: Victoria University of Wellington.
Sutarsyah, C., Nation, P., & Kennedy, G. (1994). How useful is EAP vocabulary for
ESP? RELC Journal, 25(2), 34-50.
Thorndike, E., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York:
Teachers College Press.
Valcourt, G., & Wells, L. (1999). Mastery: A University Word List reader. Ann Arbor: The
University of Michigan Press.
West, M. (1953). A general service list of English words. London: Longman, Green.
APPENDIX A
APPENDIX B
I. Introduction
Dating the turning points and duration of business cycles has long been assoc
construction of aggregate reference cycle indexes, and their associated leading,
lagging indicators. This was along lines originally developed by Burns and Mitch
subsequently by colleagues at the National Bureau of Economic Research (NB
(1990). More recently, identifying the turning points and duration of business c
an important aspect of two further areas of business cycle research: the evaluation
and associated empirical business cycle models, e.g. King and Plosser (1994), Sim
and the analysis of the time varying characteristics of business cycles, e.g
Rudebusch (1992), Watson (1994).
The American educator Maxine Greene (1984) has written of the relationsh
students and teachers: