Vocabulary Increase and Collocation

Haiyan Men
Vocabulary
Increase and
Collocation
Learning
A Corpus-Based Cross-sectional Study of
Chinese Learners of English
Vocabulary Increase and Collocation Learning
Haiyan Men
Vocabulary Increase
and Collocation Learning
A Corpus-Based Cross-sectional Study
of Chinese Learners of English
123
Haiyan Men
Shanghai Sanda University
Shanghai
China
ISBN 978-981-10-5821-9 ISBN 978-981-10-5822-6 (eBook)

DOI 10.1007/978-981-10-5822-6
Jointly published with Shanghai Jiao Tong University Press, Shanghai
Not for sale outside the Mainland of China (Not for sale in Hong Kong SAR, Macau SAR, and Taiwan,
and all countries, except the Mainland of China)
Library of Congress Control Number: 2017947487
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018
This work is subject to copyright. All rights are reserved by the Publishers, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publishers, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publishers nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publishers remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature

The registered company is Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Foreword
I am pleased to have the opportunity to recommend this monograph study of the

development of collocations in L2. It offers an extremely well-grounded perspective
on a vital area of language learning that remains central to the research endeavours
of large numbers of scholars at least since the 1980s.
Dr. Men’s investigations go well beyond the themes of a great deal of earlier
work on L2 collocation acquisition, where much of the existing literature has
explored the question of L1–L2 collocational congruence, or otherwise. However,
as Dr. Men points out here, the congruence factor may be of restricted applicability
when considering scenarios where the L1 and the L2 share relatively few directly
congruent items anyway, as it seems to be the case for Chinese learners of L2
English studied here. She finds they may well struggle with collocations of those
that are L1–L2 congruent. She offers the finding, important for future research, that
increasing vocabulary size actually gives L2 learners difficulties, by inhibiting the
learning of collocations, at least temporarily.
The discovery that learners may get worse before they get better has long
intrigued researchers in first-language acquisition. Leading scholars such as Jean
Berko and Melissa Bowerman found with noun morphology and verb argument
structure that there is a U-shaped curve in the development of these patterns.
Dr. Men’s results point to a similar process taking place in L2 collocation learning.
A strong point of this study is the attention it gives to the properties of the target
language collocations to be acquired. Dr. Men is thus able to relate her analysis
of these properties to learning performance. The greater difficulty of verb + noun
collocations versus noun + noun collocations is related to synonym density in
English, she shows. The novelty of Dr. Men’s work is that she insightfully links
learners’ challenges with the features of the linguistic landscape they must traverse
in the domain of collocations.
v
vi Foreword
Extensive consideration is also given to pedagogical aspects that arise from

considering how learners can better be helped with collocation learning. Having
contributed greatly to our understanding of why L2 collocation learning involves
serious problems, the book then offers practical suggestions for how they might be
solved. I find this altogether a very satisfying conclusion to an original and carefully
detailed study, and I expect it to form a reference point for researchers in this field
for years to come.
August 2016 Richard Ingham

Mannheim University
Germany
Acknowledgements
This book is based upon my Ph.D. dissertation, Vocabulary Increase and

Collocation Learning: A Corpus-Based Cross-sectional Study of Chinese Learners
of English, accepted by Birmingham City University in March 2015. I would like to
extend my sincere gratitude to my supervisor, Prof. Richard Ingham, whose rig-
orous academic training and thought-provoking guidance have always been trea-
sures for me in the writing process and beyond. I would also like to thank him for
his endless patience in revising the draft chapters, and my conference abstracts.
Richard stimulated my interest in linguistic research and imparted knowledge on
how to do research.
I am very much indebted to two other supervisors, Dr. Ursula Lutzky and Prof.
Antoinette Renouf, for their invaluable comments and assistance, and for their
conscientious proofreading of the drafts.
I am particularly grateful to Prof. Yang Huizhong for not only his lively and
stimulating instruction, but also for his constant critical comments on my doctoral
thesis. I owe my debt to him for his care for both my academic study and life
overseas several years ago.
At an institutional level, my thanks are due to Shanghai Sanda University for
providing financial support for this doctoral study, and for providing funds for me
to attend conferences related to this topic. I also wish to thank Editor Jin Ying’ai
and her team in introducing this book to Shanghai Jiao Tong University Press and
Springer and making the publication possible.
Finally, my gratitude is reserved for my parents and friends. Special thanks are
given to my husband, Wang Shuai, for his unselfish help and unfailing support I
have received while working on this book.
vii
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 General Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aims of the Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 The Shape of the Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 The Notion of Collocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 The Importance of Collocation . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 The Pervasiveness of Phraseological Tendency . . . . . . . 9
2.1.2 The Importance of Collocation for L2 Learners. . . . . . . 11
2.1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 The Notion of Collocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Collocation Previously Approached . . . . . . . . . . . . . . . . 14
2.2.2 Collocation Defined in This Study . . . . . . . . . . . . . . . . 25
2.2.3 Collocations Classified in This Study . . . . . . . . . . . . . . 27
2.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Collocation Studies in Second-Language Learner English . . . . . . . . 35
3.1 Methodologies Adopted in L2 Collocation Studies . . . . . . . . . . . 35
3.1.1 Elicitation Data-Based Collocation Studies . . . . . . . . . . 36
3.1.2 Spontaneous Data-Based Collocation Studies . . . . . . . . 39
3.2 Previous Findings from L2 Collocation Research . . . . . . . . . . . . 42
3.2.1 Forms of Collocation Deficiency: Overuse, Underuse
and Misuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.2 The Role of Learners’ L1 . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.3 Collocation Lag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
ix
x Contents
4 Research Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.1 Research Purpose and Questions . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 The Selection of Verb + Noun, Adjective + Noun
and Noun + Noun Collocations . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 The Learner Corpus—CLEC . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Collocation Dictionaries for Reference . . . . . . . . . . . . . . . . . . . . 64
4.5 The Reference Corpus—BNC . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Software for Retrieval and Analysis . . . . . . . . . . . . . . . . . . . . . . 66
4.7 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7.1 Tagging and Reliability Check . . . . . . . . . . . . . . . . . . . 66
4.7.2 Investigation of Verb + Noun Collocations . . . . . . . . . . 68
4.7.3 Investigation of Adjective + Noun and Noun + Noun
Collocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5 Chinese Learners’ Production of Verb + Noun Collocations . . .... 77
5.1 Overall Analyses (1): General Patterns of VN Collocations
Produced by L2 Learners . . . . . . . . . . . . . . . . . . . . . . . . . . .... 77
5.1.1 Overall Tokens of Collocations . . . . . . . . . . . . . . . .... 77
5.1.2 Overall Types of Collocations and Collocation
Frequency Distribution . . . . . . . . . . . . . . . . . . . . . .... 79
5.1.3 Collocation Misuses . . . . . . . . . . . . . . . . . . . . . . . .... 83
5.1.4 Synopsis of Overall Analyses (1) . . . . . . . . . . . . . .... 85
5.2 Overall Analyses (2): Between-Group Comparisons
of Delexical and Lexical VN Collocations . . . . . . . . . . . . . .... 86
5.2.1 Between-Group Comparisons of Well-Formed
DeLexVN and LexVN Collocations . . . . . . . . . . . .... 88
5.2.2 Between-Group Comparisons of Erroneous
DeLexVN and LexVN Collocations . . . . . . . . . . . .... 89
5.2.3 Synopsis of Overall Analyses (2) . . . . . . . . . . . . . .... 91
5.3 Overall Analyses (3): Verb Growth and Collocation Errors .... 92
5.4 Synopsis of the Overall Analyses of Verb + Noun
Collocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 93
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 95
6 Verb Increase and the Production of Verb + Noun
Collocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 97
6.1 Detailed Analyses—Verb Increase and Collocation Uses . . .... 97
6.1.1 Analysis of VN Collocations Within Synsets
Identified at the ST2 and ST6 Levels . . . . . . . . . . .... 98
6.1.2 Analysis of VN Collocations Within Synsets
Identified at the ST2, ST5 and ST6 Levels . . . . . . .... 103
6.2 Synopsis of Detailed Analyses of Verb Increase
and Collocation Uses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 108
Contents xi
6.3 An Alternative Explanation: New Nouns

and Collocation Uses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 109
6.3.1 New Nouns in the ST6 VN Collocations
(New as Compared with the ST2 Level) . . . . . . . . .... 110
6.3.2 New Nouns in the ST6 VN Collocations
(New as Compared with the ST5 Level)
and New Nouns in the ST5 VN Collocations
(New as Compared with the ST2 Level) . . . . . . . . .... 114
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 116
7 Chinese Learners’ Performance on English Adjective + Noun
and Noun + Noun Collocations . . . . . . . . . . . . . . . . . . . . . . . . . .... 117
7.1 Analyses of Adjective + Noun Collocations . . . . . . . . . . . . .... 117
7.2 Analyses of Noun + Noun Collocations . . . . . . . . . . . . . . . .... 118
7.3 Synopsis of the Analyses of Adjective + Noun
and Noun + Noun Collocations . . . . . . . . . . . . . . . . . . . . . .... 122
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .... 122
8 Comparison and Interpretation of Learners’ Performance
on the Three Types of Collocations . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.1 Collocation Errors in the Three Types of Collocations . . . . . . . . 123
8.2 Vocabulary Growth and Collocation Errors . . . . . . . . . . . . . . . . 125
8.3 Synsets and Collocation Production . . . . . . . . . . . . . . . . . . . . . . 126
8.4 Synopsis of the Findings in This Chapter . . . . . . . . . . . . . . . . . . 131
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
9 The Role of L1 in Collocation Learning . . . . . . . . . . . . . . . . . . . . . . 133
9.1 The Notion of Congruence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.2 Within-group Comparison of Well-formed and Erroneous
Congruent and Non-congruent VN Collocations . . . . . . . . . . . . . 137
9.3 Between-group Comparison of Well-formed and Erroneous
Congruent and Non-congruent VN Collocations . . . . . . . . . . . . . 141
9.4 Within-group Comparison of Positive and Negative L1
Influence with VN and aN Collocations . . . . . . . . . . . . . . . . . . . 145
9.5 Synopsis of Findings in This Chapter . . . . . . . . . . . . . . . . . . . . . 147
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
10 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
10.2 Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
10.2.1 Theoretical Implications. . . . . . . . . . . . . . . . . . . . . . . . . 155
10.2.2 Pedagogical Implications . . . . . . . . . . . . . . . . . . . . . . . . 160
10.3 Limitations and Ways Forward . . . . . . . . . . . . . . . . . . . . . . . . . . 165
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
xii Contents
Appendix A: Erroneous VN Collocations Produced by the Three

Levels of Learners (Types) . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Appendix B: Well-Formed and Erroneous VN Collocations
in the 16 Synsets (ST2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Appendix C: Well-Formed and Erroneous VN Collocations
in the 16 Synsets (ST6) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Appendix D: Frequencies of Well-Formed and Erroneous VN
Collocation Types in the 16 Synsets (ST2 and ST6) . . . . . . 183
Appendix E: Well-Formed and Erroneous VN Collocations
in the 16 Synsets (ST5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Appendix F: Frequencies of Well-Formed and Erroneous VN
Collocation Types in the 16 Synsets
(ST2, ST5 and ST6) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Appendix G: Adjective Categories in the ST2 and ST6 AN
Collocation Databases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Appendix H: Well-Formed and Erroneous Congruent
and Non-congruent Collocations in the ST6 (Types) . . . . . 193
Appendix I: Well-Formed Congruent and Non-congruent VN
Collocations in the ST2 and ST6 (Types) . . . . . . . . . . . . . . . 195
Appendix J: Erroneous Congruent and Non-congruent VN
Collocations in the ST2 and ST6 (Types) . . . . . . . . . . . . . . . 197
Appendix K: Positive and Negative Transfer Between VN
and AN Collocations in the ST2 (Types) . . . . . . . . . . . . . . . 199
Appendix L: Positive and Negative Transfer Between VN
and AN Collocations in the ST6 (Types) . . . . . . . . . . . . . . . 201
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Abbreviations
Corpora
BNC British National Corpus

CLEC Chinese Learner English Corpus
ICLE International Corpus of Learner English
Dictionaries
Bilingual dictionaries:
NCCED New Century Chinese-English Dictionary
OALECD Oxford Advanced Learner’s English-Chinese Dictionary (7th Edition)
English dictionaries:
BBI The BBI Combinatory Dictionary of English: Your Guide to
Collocations and Grammar (3rd Edition)
COBUILD Collins COBUILD English Dictionary (2nd Edition)
OCDSE Oxford Collocations Dictionary for Students of English (2nd Edition)
ODSA Oxford Dictionary of Synonyms and Antonyms
Chinese dictionaries:
CCD Contemporary Chinese Dictionary (5th Edition)
xiii
xiv Abbreviations
Other Abbreviations
AN Adjective + noun
DeLexVN Delexical verb + noun
EFL English as a foreign language
ELT English language teaching
ESL English as a second language
EVCA English Verb Classes and Alternations
FL Foreign language
FLT Foreign language teaching
L1 First language
L2 Second language
LexVN Lexical verb + noun
NN Noun + noun
NNSs Non-native speakers
NSs Native speakers
POS Part of speech
SLA Second language acquisition
VN Verb + noun
Chapter 1
Introduction
1.1 General Background
The past decades have seen a dramatic increase in studies on collocations in

second-language acquisition. Several reasons account for such an increase. The first
is a general one, associated with a growing body of research on collocation as a
linguistic phenomenon per se in native-speaker language. Initiated from the Firthian
tradition of looking for word meanings through syntagmatic relations between
words and a search for a lexical theory complementary to grammatical theory (Firth
1957; Halliday 1966; Sinclair 1966), collocation has been a thriving and on-going
field of linguistic enquiry (cf. Hoey 2005; Moon 1998; Renouf and Sinclair 1991;
Sinclair 1991, 2004; Stubbs 1996, 2001). Linguistic investigations into collocations
have provided extensive evidence that native-speaker texts are on the most part
formulaic (e.g. Altenberg 1998; Biber et al. 1999; Cowie 1991, 1992; Howarth
1998). It follows that this phraseological tendency for meanings to be created
through conventionalised word combinations underlying proficient performance
requires L2 learners to have a good command of collocations.
The observation that collocation learning is of central importance for L2
learners’ idiomatic control of that language constitutes another motive for a
growing interest in L2 collocation learning. The importance of collocation
knowledge for L2 learners has been long and widely recognised. Idiomaticity is
identified as key to the attainment of native-like proficiency. As Pawley and Syder
(1983: 191) acknowledged, “fluent and idiomatic control of a language rests to a
considerable extent on knowledge of a body of ‘sentence stems’ which are ‘insti-
tutionalised’ or ‘lexicalised’’’. Mastery of collocations not only facilitates idiomatic
production, but also promotes the efficiency of language comprehension in general
and the comprehension of lexical semantics of individual words. Failing to use
native-like expressions may not only “divert the reader’s attention from content to
form” (Howarth 1998: 174), but also cause “an impression of brusqueness,
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 1
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_1
2 1 Introduction
disrespect or arrogance” (Wray 2002: 143), and “may sound rather bookish and
pedantic to a native speaker” (Channell 1994: 21).
In the meantime, arbitrarily restricted co-occurrence of word combinations
abound in the English language. For example, blond is perfect for modifying hair,
but not door or dress (Palmer 1981: 76f), and we “do the cooking” but “make
dinner” (Fox 1998: 33). Learning to construct word combinations that are cus-
tomarily used by native speakers is one of the most difficult tasks for even the most
proficient language learners (Pawley and Syder 1983). This phraseological defi-
ciency in second-language learners was realised as early as in the 1930s and has
been extensively discussed ever since. Palmer (1933) noted that when forming such
combinations (e.g. to ask a question, to do a favour), which are not-rule-governed
word combinations, learners may produce expressions such as *to make a question,
*to perform a favour. A great deal of previous research has found that collocation
learning constitutes a problematic domain for L2 learners even at fairly high pro-
ficiency levels. Studies in this field are generally devoted to a description of L2
learners’ difficulties with collocations. The overall picture that emerges from pre-
vious L2 collocation research is that apart from learners’ better receptive knowledge
of collocations (e.g. Biskup 1990; Gyllstad 2005; Marton 1977), collocation pro-
duction poses great problems. Overall, L2 learners’ “building material is individual
bricks rather than prefabricated sections” (Kjellmer 1991:124). They are found to
operate more on the “open choice principle” than the “idiom principle” and use
fewer collocations compared with native-speaker counterparts. In addition to
insufficient uses of collocation, overuse, underuse and misuse of certain colloca-
tions are frequently reported in learners’ writings (e.g. Granger 1998; Howarth
1996; Laufer and Waldman 2011).
Apart from the difficulties L2 learners encounter in the production of colloca-
tions, phraseological knowledge is believed to lag behind grammar and lexis and
constitutes the “last and most challenging hurdle in attaining near native-like flu-
ency” (Spottl and McCarthy 2004: 191). In comparison with learners’ general
vocabulary knowledge, knowledge of collocations is rather weak as Bahns and
Eldaw (1993) found that collocation errors were more than twice than errors with
lexical words. When collocation knowledge is compared among learners at different
levels, it was reported that collocation performance did not improve as the advanced
and the intermediate learners produced significantly more erroneous collocations
than the basic learners (Laufer and Waldman 2011). Similarly, in another more
recent study exploring the collocational competence of two groups of Nigerian
advanced speakers of English as a second language, Obukadeta (2014) discovered
that the participants who had been living/studying in the UK for up to 15 years
were less proficient in terms of their knowledge of collocations than the other group
which had never lived or studied outside Nigeria. These studies indicate a collo-
cation lag, which means that collocational knowledge does not develop alongside
learners’ general level of English proficiency.
Given the difficulties learners are confronted with and the lag in collocational
knowledge, investigating collocations in an L2 is a continuing concern within the
field of second-language learning and teaching. However, research in this field is
1.1 General Background 3
still in its early stage since there is not an overall theory accounting for how
collocations are acquired by L2 learners (Gitsaki 1999). Knowing how collocations
are acquired and produced can provide valuable insight into how they are best
taught. Although extensive research has been carried out on the learning and
production of L2 collocations, much of the research up to now has been descriptive
in nature. It remains unclear what factor(s) is/are associated with the lag in collo-
cation knowledge.
Therefore, the importance of phraseological knowledge in both language pro-
duction and comprehension, and its acquisition as a problematic territory for L2
learners provide sufficient justification for further research in this area. The aim of
this study is to fill the gap by examining factors associated with collocation lag.
1.2 Aims of the Study
The major task of second-language collocation research is to discover what it means

for L2 learners to acquire a collocation, how they learn it, and what problems they
encounter in acquiring a collocation. Previous research has provided a compre-
hensive description of how L2 learners use collocations and what problems they
encounter in using them. Yet little is known with regard to factors responsible for
the stagnant development of collocation knowledge. Hence, this study is intended
to investigate factors that are associated with this collocation lag. By examining the
factor(s) that is/are responsible for the lag in collocation knowledge, we can better
understand the process of collocation learning. Furthermore, more knowledge on
how collocation is acquired can further shed light on how collocations are best
learnt and taught.
As regards which factor(s) may be responsible for collocation lag, we hypoth-
esise that vocabulary growth is an inhibiting force in collocation learning. So a
general question is asked: is vocabulary growth an inhibiting factor in the learning
of collocations by L2 learners? This question needs to be further divided into
detailed questions by taking particular types of collocations as examples. For this
purpose, one most important and frequent type of collocation—verb + noun
(henceforth: VN) collocations is first selected.
The relationship between verb increase and the production of VN collocations is
examined among verb + noun collocations produced by different levels of learners.
At a macro level, verb increase is measured in terms of the development from
delexical to lexical verbs. Verbs are divided into two categories according to the
semantic contents they take: delexical verbs (do, make, take, have, give and get) and
lexical verbs (acquire, fulfil, perform, etc.). Accordingly, VN collocations are
divided into delexical verb + noun (DeLexVN) collocations and lexical verb + noun
(LexVN) collocations. At a micro level, the growth of verbs is measured in syn-
onym sets (Fellbaum 1998) in specific VN collocations. The following develop-
mental patterns with regard to the increase of verbs from delexical to lexical verbs
are hypothesised, such that lower levels of L2 learners make more errors with
4 1 Introduction
delexical verbs and higher levels make more errors with lexical verbs in VN
collocations.
The hypothesis on the developmental patterns is closely linked with the general
hypothesis that verb increase is a hindrance in collocation acquisition. More pre-
cisely, at lower stages of L2 development, due to their limited mastery of verbs,
learners resort to delexical verbs to collocate with a noun instead of a specific
lexical verb. As their verb vocabulary grows, they have more access to lexical verbs
and tend to make more collocation errors with lexical verbs, because the increase in
synonymous verbs allows more chances of incorrect verb choices. The increase in
lexical verbs and the subsequent occurrences of errors with lexical verbs suggest
that vocabulary growth impedes collocation acquisition. To test whether the growth
of verb vocabulary constitutes an inhibiting force in collocation learning, the
relationship between vocabulary growth and collocation development has to be
viewed locally in specific VN collocations, through locating the semantic domains
of verbs in collocations where there is an increase in verbs and examining whether
the increase in these verbs subsequently leads to collocation errors.
Based on the two hypotheses, the following research questions will be
addressed:
1. What developmental patterns appear in the verb + noun collocations produced
by L2 learners, in terms of delexical verb and lexical verb + noun collocations?
a. Is there a tendency towards increasing use of lexical verb + noun collocations
with rising proficiency?
b. Is there a tendency towards increasing errors with lexical verb + noun collo-
cations and decreasing errors with delexical verb + noun collocations with rising
proficiency?
2. Within specific semantic domains of the verbs in verb + noun collocations used
by all levels of learners, is there a tendency for these verbs, as they increasingly
occur at the higher levels, to be associated with collocation errors?
Research questions 1b and 2 are interrelated as they bear the relation of the
whole and a part. They are both concerned with the increase of lexical verbs and the
production of verb + noun collocations at the higher levels. Research question 1b
addresses the relationship between the overall increase of lexical verbs at the higher
levels on the whole and the increasing/decreasing trend of verb + noun collocation
errors associated with lexical verbs; the scope of research concerning verb increase
and collocation errors is further narrowed down in research question 2, which is
aimed at a detailed investigation of the increase of lexical verbs within particular
semantic domains. Through confining the verb increase into semantic domains, the
study sets out to examine if verb increase in a semantic domain is a factor asso-
ciated with the lag in verb + noun collocational knowledge. The particular focus on
verb increase in semantic domains in research question 2 is built on the belief that
learners may be confused with semantically related words (e.g. acquire and obtain)
rather than words falling in different semantic domains (e.g. acquire and change) in
producing verb + noun collocations.
1.2 Aims of the Study 5
Furthermore, two other common types of collocations—adjective + noun and

noun + noun collocations—will be examined, so as to compare the results with
verb + noun collocations. Similar research questions on verb + noun collocations
will be addressed with regard to adjective + noun and noun + noun collocations.
Specifically, we focus on the following questions:
3. Are adjective + noun and noun + noun collocations produced by Chinese L2
learners at the same accuracy level as verb + noun collocations? If not, what
patterns do they follow?
4. Within specific semantic domains of the adjectives in adjective + noun collo-
cations and nouns in noun + noun collocations used by all levels of learners, is
there a tendency for these adjectives/nouns, as they increasingly occur at the
higher levels, to be associated with collocation errors?
As another field of enquiry, this book will investigate the role of L1 in the
production of congruent and non-congruent L2 collocations. Congruent colloca-
tions refer to collocations whose word elements in one language have direct
word-for-word translational equivalence in another language; if word elements in
one collocation do not share direct word-for-word translational equivalence
between two languages, then it is considered as a non-congruent collocation
(Nesselhauf 2005; Wolter and Gyllstad 2011). Previous studies show that congruent
collocations are much easier than non-congruent ones (e.g. Bahns 1993; Nesselhauf
2005), and non-congruent collocations once acquired, are processed independently
of the learners’ L1 (Yamashita and Jiang 2010; Wolter and Gyllstad 2011). In light
of these findings, we set out to examine Chinese L2 learners’ performance on
congruent and non-congruent collocations, with the aim to test whether congruent
collocations are easier than non-congruent ones for them, and whether
non-congruent collocations, once acquired, are less prone to errors.
The present research is therefore an empirical study of the phraseological per-
formance (verb + noun collocations in particular) of Chinese learners of English
across different proficiency levels. This investigation will provide insight into how
collocations are acquired by EFL learners, a question which has not yet been
addressed but is of central importance for a comprehensive understanding of
second-language collocation learning. The study will shed light on the key question
that researchers in SLA attempt to answer, expressed as “What is acquired? What is
not acquired? Why so?” by Gries (2008: 407).
The objective of the research project was an empirical study of the use of
collocations by Chinese L2 learners. However, in order to describe nonnative
phraseological competence, it is first necessary to establish native-speaker norms in
this regard. The native speaker and nonnative speaker dichotomy is contentious.
Davies (2003: 1) introduces the common-sense concept of native speakers, referring
to “people who have a special control over a language, insider knowledge about
“their” language” and are the “models we appeal to for the “truth” about the
language”. However, as Davies (2003) acknowledges, he can more easily define
what a nonnative speaker is than a native speaker, and he even argues that the native
6 1 Introduction
speaker is a myth. Those who have two native-speaking parents, both preferably
monolingual, and are raised in a native-speaking community, can still not be def-
initely defined as native speakers of that language, since other social factors like
mobility and the rise of new Englishes are at play (Davies 2003). As English is
becoming a lingua franca, and an increasing number of proficient academics whose
first language is not English enter English academia (Hyland 2006), it is even
harder to define what a native speaker is. The theoretical aspects of the
native-speaker construct will not be addressed in this study.
Nevertheless, for the investigation and description of learner interlanguage,
language learning goals in terms of native-speaker norms need to be set. Two
widely used English collocation dictionaries and the British National Corpus, a
collection of the texts in British English, were taken as a kind of target norm for L2
English learners. Language forms produced by L2 learners that conform to the
norm were regarded as well-formed, and those that deviate from the norm were
viewed as erroneous.
1.3 The Shape of the Study
The book is divided into ten chapters. Chaps. 2 and 3 discuss previous theoretical
and empirical studies on (L2) collocations. Chap. 2 highlights the importance of
collocation and clarifies the notion of collocation. The significance of collocations
is discussed in terms of their prevalence in native-speaker texts and their importance
for a fluent and idiomatic control of English for L2 learners. Then the notion of
collocation is examined on the basis of previous different approaches, and a defi-
nition and classification applied in this study are presented. Chap. 3 reviews pre-
vious L2 collocation studies. In this chapter, the methodologies commonly adopted
in L2 collocation studies are first addressed, with a view to introducing the
methodology that has been more and more widely used in the analysis of collo-
cations in learner corpora; then major findings of previous L2 collocation studies
are discussed. Chap. 4 presents the detailed design of the present cross-sectional
study of Chinese EFL learners’ collocation performance. The learner corpus chosen
for such an investigation, the types of collocations targeted, the sources of reference
in extracting these collocations, and the procedures for collocation extraction and
analyses are introduced. Chaps. 5, 6, 7, 8 and 9 contain a detailed analysis of the
data. In Chap. 5, the overall picture of Chinese L2 learners’ performance in
verb + noun collocations is depicted, with the main focus on the developmental
patterns of collocation production from delexical verb + noun to lexical verb +
noun collocations. Chap. 6 is devoted to an investigation of the relationship
between verb increase in specific synonym sets and collocation uses associated with
verbs in these synsets. Moreover, an alternative explanation, i.e. the learning of new
nouns in collocation production, is made in order to see whether the acquisition of
new nouns is responsible for a lag in collocation. Chap. 7 goes on to explore
learners’ performance on two other important and frequent types of collocations,
1.3 The Shape of the Study 7
i.e. adjective + noun and noun + noun collocations. It aims at a corroboration of

findings from verb + noun collocations. Chap. 8 presents and compares learners’
performance on verb + noun, adjective + noun and noun + noun collocations. In
Chap. 9, cross-linguistic influence in the production and learning of L2 collocations
is investigated. It addresses the role of L1 in learners’ performance on congruent
and non-congruent collocations, and the role of L1 in different word-class collo-
cations such as verb + noun and adjective + noun collocations. Chap. 10, finally,
concludes the study by summarising the findings of this study, discussing both
theoretical and pedagogical implications for effective collocation learning,
acknowledging the limitations of the present study and putting forward proposals
for future research.
References
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent

word-combinations. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and applications
(pp. 101–122). Oxford: Oxford University Press.
Bahns, J. (1993). Lexical collocations: A contrastive view. ELT Journal, 47(1), 56–63.
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1),
101–114.
Biber, D., Johansson, S., Leech, G., et al. (1999). Longman grammar of spoken and written
English. Harlow: Longman.
Biskup, D. (1990). Some remarks on combinability: Lexical collocations (pp. 31–44). In J. Arabski
(Ed.), Foreign language acquisition papers. Katowice: Uniwersytet Slaski.
Channell, J. (1994). Vague language. Oxford: Oxford University Press.
Cowie, A. P. (1991). Multiword Units in Newspaper Language. In S. Granger (Ed.), Perspectives
on the English lexicon: A tribute to Jacques Van Roey (pp. 101–116). Louvain-la-Neuve:
Cahiers de I’Institut de Linguistique de Louvain.
Cowie A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud,
H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). London: Macmillan.
Davies, A. (2003). The native speaker: Myth and reality (Bilingual Education and Bilingualism).
Clevedon: Multilingual Matters.
Fellbaum, C. (1998). WordNet: An electronic lexical Database. Cambridge, MA: MIT Press.
Firth, J. R. (1957). Papers in linguistics 1934–1951. London: Oxford University Press.
Fox, G. (1998). Using corpus data in the classroom. In B. Tomlinson (Ed.), Materials
Development in Language Teaching (pp. 25–43). Cambridge: Cambridge University Press.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of
collocational knowledge. San Francisco: International Scholars Publications.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: collocations and formulae.
In A. P. Cowie (Ed.), Phraseology: Theory, Analysis, and Applications(pp. 145–160). Oxford:
Oxford University Press.
Gries, S. (2008). Corpus-based method in analyses of second language acquisition data.
In P. Robinson, N. C. Ellis (Eds.), Handbook of cognitive linguistics and second language
acquisition (pp. 406–431). New York: Routledge.
Gyllstad, H. (2005). Words that go together well: Developing test formats for measuring learner
knowledge of English collocations (Vol. 5, pp. 1–31). In F. Heinat, E. Klingval (Eds.), The
Department of English in Lund: Working Papers in Linguistics.
8 1 Introduction
Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. E. Bazell, J. C. Catford, M. A. K.

Halliday, et al. (Eds.), In memory of J. R. Firth (pp. 148–162). London: Longman.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Howarth, P. (1996). Phraseology in English academic writing: Some implications for language
learning and dictionary making. Tübingen: Niemeyer.
Howarth, P. (1998). The phraseology of learners’ academic writing. A. P. Cowie (Ed.),
Phraseology: Theory, analysis and applications (pp. 161–186). Oxford: Oxford University
Press.
Hyland, K. (2006). “The ‘other’ English: thoughts on EAP and academic writing”. The European
English Messenger, 15(2), 34–38.
Kjellmer, G. A. (1991). Mint of phrases. In K. Aijmer, B. Altenberg (Eds.), English corpus
linguistics (pp. 111–127). Studies in Honour of Jan Svartvik. London: Longman.
Laufer, B., & Waldman, T. (2011). Verb-Noun collocations in second language writing: a corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Marton, W. (1977). Foreign vocabulary learning as problem No. 1 of language teaching at the
advanced level. Interlanguage Studies Bulletin, 2, 33–57.
Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach. Oxford:
Clarendon Press.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Obukadeta, P. (2014). L2 collocations: A problematic linguistic phenomenon? In Talk given at the
Birmingham English Language Postgraduate Conference, University of Birmingham, March
7, 2014.
Palmer, F. R. (1981). Semantics (2nd ed.). Cambridge: Cambridge University Press.
Palmer, H. E. (1933). Second Interim Report on English Collocations. Tokyo: Kaitakusha.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and
nativelike fluency. In J. C. Richards, R. W. Schmidt (Eds.), Language and Communication
(pp. 191–226). London: Longman.
Renouf, A., & Sinclair, J. (1991). Collocational frameworks in English. In K. Aijmer, B. Altenberg
(Eds.), English corpus linguistics: Studies in honour of Jan Svartvik (pp. 128–143). London:
Longman.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sinclair, J. (1966). Beginning the study of lexis. In C. E. Bazell, J. Catford, M. A. K. Halliday,
et al. (Eds.), In Memory of J. R. Firth (pp. 410–430). London: Longman.
Sinclair, J. (2004). Trust the text. London: Routledge.
Spottl, C., & M. Mccarthy (2004). Comparing knowledge of formulaic sequences across L1, L2,
L3 and L4. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use
(pp. 191–225). Amsterdam: Benjamins.
Stubbs, M. (1996). Text and corpus analysis: Computer-assisted studies of language and culture.
Oxford: Blackwell.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell.
Wolter, B., & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the influence
of L1 intralexical knowledge. Applied Linguistics, 32(4), 430–449.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Yamashita, J., & Jiang, N. (2010). L1 Influence on the acquisition of L2 collocations:
Japanese ESL Users and EFL learners acquiring English collocations. TESOL Quarterly, 44(4),
647–668.
Chapter 2
The Notion of Collocation
Collocation not only plays a crucial role in language production and comprehen-
sion, but also functions as a key indicator of L2 learners’ overall proficiency in the
field of second language acquisition. This chapter briefly clarifies the notion of
collocation before presenting in the next chapter the reviews of L2 collocation
studies. It begins by highlighting the importance of collocation for both native
speakers and L2 learners (Sect. 2.1). The second section (Sect. 2.2) proceeds to
discuss the different approaches to collocation, and develops a definition adopted in
this study. Finally, how collocations were previously classified and the classifica-
tion of collocation used in the present study are presented.
2.1 The Importance of Collocation
2.1.1 The Pervasiveness of Phraseological Tendency
In the process of speech or text production, complete freedom of choice of a single

word is rare and rather there is a phraseological tendency where meanings are
created through word combinations (Sinclair 2004: 29). What Sinclair refers to by
word combinations are collocations and other features of idiomaticity like fixed
expressions, idioms, etc. The phraseological nature of language has long been
recognised, as “language does not expect us to build everything starting with
lumber, nails, and blueprint, and rather it provides us with an incredibly large
number of prefabs” (Bolinger 1976: 1). Research on word combinations has
accumulated extensive evidence for this phraseological tendency, either in written
or spoken language (e.g. Altenberg 1998; Biber et al. 1999; Cowie 1991, 1992;
Howarth 1998a; Kjellmer 1994; Nattinger and DeCarrico 1992; Pawley and Syder
1983; Renouf and Sinclair 1991; Sinclair 1991; Stubbs 2001; inter alia).
DOI 10.1007/978-981-10-5822-6_2
10 2 The Notion of Collocation
A considerable proportion of prefabs has been identified in various kinds of

genres of texts. Kjellmer (1987) used a straightforward way to measure the col-
locational density of two short samples of texts in the Brown corpus and discovered
that a large proportion of the text was made up of collocational elements.1 In
examining journalistic writings (news stories and editorials), Cowie (1992: 1)
concluded that “journalistic prose draws very heavily on verb–noun collocations
that are already well-established and widely known”, as his studies revealed a
collocational density as high as more than 40% (Cowie 1991, 1992). In addition to
journalistic writings, this collocational density has also been found in academic
writings, in which 41% of the verb–noun combinations were found to be con-
ventional collocations (restricted collocations and idioms) (Howarth 1996).
Furthermore, in general English writings, still a significant number of fixed phrases
and idioms can be found, as reported by Moon (1998) in her examination of the
Oxford Hector Pilot Corpus and Birmingham Collection of English Text. An
interesting way of proving the strength of phraseological tendency was introduced
by Stubbs (2001), who counted the frequency of attested phraseological units of the
word forms beginning with the letter f in a 1000-word sample.2 It was found that all
the 47 words with an initial f were in recognisable phrases, which confirmed the
ubiquitous presence of phraseological units. In all, as Howarth (1998a: 171)
summarised, “there is in native writing an identifiable core of collocational
conventionality”.
In the meantime, the phenomenon that natural language is made up of a large
proportion of word clusters is not manifested in written discourse alone. It shows an
even stronger tendency in the spoken language. As early as the 1980s, working on
data from conversational talk, Pawley and Syder (1983: 215) estimated that “by far
the largest part of the English speaker’s lexicon consists of complex lexical items
including several hundred thousand lexicalised sentence stems”. Similarly,
Jackendoff (1997) collected the data from an American television game show Wheel
of Fortune and discovered a high ratio of collocations, idioms and prefabricated
phrases. Switching to a different perspective, Altenberg investigated recurrent word
combinations retrieved from the London-Lund Corpus of Spoken English, and
reported that “over 80% of the words in the corpus form part of a recurrent
word-combination in one way or another” (Altenberg 1998: 102). This figure shows
an outstandingly high proportion of word combinations, but it encompasses a whole
range of recurrent word combinations, a large proportion of which are of little
phraseological interest (e.g. the the, and the, in a, out of the) (ibid.). The inclusion
of these word sequences is owing to the automatic retrieval method adopted by
Altenberg, whose calculation of the percentage of word clusters is based on the
1
Kjellmer’s recognition of collocation is based on his definition of collocation as a grammatically
well-structured sequence occurring more than once (1987: 133). So, more collocations were
counted than collocations defined in the present study (cf. Sect. 2.2.2).
2
The 1000-word sample was compiled from a 10,000-headword database, which recorded the most
frequent content words in the Cobuild (1995) database. So there were no function words beginning
with an f (e.g. for).
2.1 The Importance of Collocation 11
inclusion of any continuous string of words occurring more than once in identical
form. In a similar vein, Biber et al. (1999) identified many lexical bundles (recurrent
expressions) in a large corpus. Unlike Altenberg (1998), they set a fairly high
threshold level for what qualifies as a lexical bundle—lexical sequences occurring
at least ten times per million words and at the same time across at least five different
texts in a register. Even with such a high cut-off point between lexical bundles and
casual lexical co-occurrences, they discovered a large proportion of lexical bundles:
45% in conversation and 21% in academic prose.
Both the written and spoken languages of native speakers thus exhibit a strong
phraseological tendency. The spoken language has been found to consist of a
greater proportion of recurrent word combinations than the written language. One
reason provided by Biber et al. (1999) is that the spoken language involves a
considerable amount of repetitions, which increases the potential proportion of
clusters. Another underlying reason explaining why the spoken language is more
formulaic might be the time constraints imposed on speakers. Speakers usually do
not have enough time to coin novel expressions as they do in writing. This is the
case with journalistic reporting, where the intense pressures and time constraints on
reporters require them to use a great many familiar ready-made expressions (Cowie
1992). Hence, there is an unavoidably larger occurrence of formulaic language use
in spoken than written production. In all, word combinations make up a very high
proportion in both the written and spoken performance of native speakers. This
phenomenon demonstrates the block-like nature of language and facilitates the
inference that “when we speak or write it is therefore often more apposite to say that
we move from one cluster to the next than to say that we move from one word to
the next” (Kjellmer 1994: ix). The clusters, or multiple-word units, are stored in the
psychological lexicon and are believed by Kelly and Stone (1975) to be at least as
numerous as single words.
Therefore, as to learners of a second or foreign language,3 the existence of a
large number of word combinations underlying proficient performance requires
them to be empowered with this phraseological competence. Phraseological
knowledge is naturally of central importance to fluent and idiomatic control of the
language for L2 learners as well. The next section moves on to discuss the sig-
nificance of this phraseological competence for L2 learners.
2.1.2 The Importance of Collocation for L2 Learners
The importance of collocational knowledge for L2 learners has been long and
widely recognised (e.g. Cowie 1992; Fox 1998; Kjellmer 1991; Lee and Liu 2009;
Lewis 2000; Meara 1984; Palmer 1933; Pawley and Syder 1983; Wray 2002; Yorio
3
In this research, the terms second and foreign language are used interchangeably, referring to any
language learned after one’s native language, although they are differentiated by Richards and
Schmidt (2010: 224f) in terms of whether the language is used as a medium of instruction in
schools or widely used in a country as a medium of communication by the government, media, etc.
1989). In this section, its significance is briefly summarised from two perspectives:
for native-like production and for efficient comprehension.
2.1.2.1 Phraseological Knowledge Is Important for Native-like

Production
Knowledge of collocations is of the same importance as knowledge of grammar. It

is considered key to native-like production, as is claimed by Fox (1998: 33):
when even very good learners of the language speak or write English, the effect is often
slightly odd. There is nothing that is obviously wrong, but somehow native speakers know
that they would not express themselves in quite that way. … The problem is often one of
collocation.
Here, the oddness of expressions produced by learners is not concerned with the
inappropriateness of grammar, but with the co-selected word combinations. To
know a language not only requires the knowledge of appropriate rules to generate
grammatically well-formed utterances of that language, but also knowledge of
which of these grammatical utterances are native-like (Biber et al. 1999; Wray
2002: 143). Failing to appropriately use these lexicalised expressions, as has been
pointed in Chap. 1, may even divert the reader’s attention from content to form
(Howarth 1998a: 174). As Cowie (1992: 10) acknowledged, it is impossible to
perform at a native-like level without knowledge of an appropriate range of mul-
tiword units. Therefore, the significance of phraseological knowledge for L2
learners should in no way be downplayed.
A good command of phraseological knowledge helps attain the goal of
native-like production through promoting fluency. A store of formulaic units in the
mental lexicon plays a key role in reducing the processing effort en route to language
production (cf. Hunston and Francis 2000: 271). Unlike the creative side of language
production, in which individual words are combined one by one according to
grammatical rules, the agglomeration of words into clusters constitutes one single
choice and thus saves much processing time (cf. Sinclair 1987: 320). Jackendoff’s
analogy between fixed word combinations and chunking in music well illustrates the
role of prefabricated units in promoting fluency, as he maintained that:
any musician can attest the fact that one of the tricks to playing fast is to make larger and
larger passages form simplex units from the point view of awareness—to “chunk” the input
and output. This suggests that processing speed is linked not so much to the gross measure
of information processed as to the number of highest-level units that must be treated
serially. Otherwise, chunking wouldn’t help. (Jackendoff 1983: 125)
2.1.2.2 Phraseological Knowledge Is Beneficial for Efficient

Comprehension
Knowing a wide range of multiword units not only facilitates native-like produc-
tion, but also contributes to efficient comprehension on the part of L2 learners.
2.1 The Importance of Collocation 13
Hunston and Francis (2000: 270–271) argued that storing a large number of mul-
tiword units in the mental lexicon, learners can understand the meaning of text
without having to pay attention to every word. This is beneficial for enhancing both
the reading and listening efficiency. They further pointed out that knowledge of
phraseological patterns can help L2 learners reconstruct the meanings even if they
mishear some words in speech. At a micro-level, knowledge of co-occurring word
combinations contributes to successful comprehension of the semantics of each
constituent. For example, through a corpus-based analysis of the collocations with
affect/influence, Lee and Liu (2009) exemplified how the use of collocations pro-
vides a solid conceptual grounding of the target word for L2 learners in grasping the
lexical semantics of the two words.
In sum, in the process of striving for native-like language production, phrase-
ological knowledge is, on the one hand, important for L2 learners’ idiomatic and
fluent production; on the other, it helps promote the efficiency of language com-
prehension in general and the comprehension of lexical semantics of individual
words. Collocation is thus recognised by Lewis (2000: 45) as “the most powerful
force in the creation and comprehension of all naturally-occurring texts”.
2.1.3 Summary
In this section, we have placed collocation within the context of formulaic language
and reviewed its importance for both native speakers and L2 learners. First, the
ubiquitousness of formulaic language in either spoken or written language has long
been acknowledged and verified in previous studies. Given the pervasiveness of
conventionalised word combinations, it follows that L2 learners have to gain a good
command of them in order to achieve native-like proficiency. A good control of
formulaic language not only facilitates idiomatic production, but also promotes
efficient language comprehension. Collocation is one of the most important and
frequent aspects of formulaic language and constitutes the target of this study. The
next section will be devoted to a deeper discussion of the nature of collocations.
2.2 The Notion of Collocation
Given the abundance of terminology in the field of phraseology (cf. the various
expressions mentioned in Sect. 2.1, e.g. collocations, fixed expressions, idioms,
prefabs, complex lexical items, multiword units, etc.),4 a clarification of which of
4
Kjellmer (1994: xi) listed various terms referring to clusters of words: expressions, fixed com-
binations, formula units, formulas, larger-than-word units, lexical phrases, lexicalised sentence
stems, multiword lexical units (MLU), multiple-word units, patterned speech, patterns, phrases,
these aspects of formulaic language forms the object of this study is in order. The
present study will focus on the most common manifestations of formulaic language
—collocation.5 Yet as Bahns (1993: 57) admitted, “regrettably, collocation is a term
which is used and understood in many different ways”. So the primary aim of this
section is to summarise previous definitions and classifications and develop a
definition and classification of collocation in order to identify those word combi-
nations in learner English.
2.2.1 Collocation Previously Approached
Collocation, which refers to syntagmatic lexical relations in a language, can be

traced back to as early as the 1930s. Palmer (1933: title page) defined the collo-
cation as “a succession of two or more words that must be learnt as an integral
whole and not pieced together from its component parts”. Examples of such a
definition by Palmer are to strike while the iron’s hot, thank you and to commit
suicide. Though Palmer used “collocation” as an umbrella term to generally refer to
all “comings-together-of-words”, he is believed to be the first to use collocation in
its present-day sense. Yet collocation approached by Palmer is mainly pedagogi-
cally oriented, and it is not clear from his definition what kind of co-selecting
relationship between two or more words can qualify them as a collocation (and thus
be learnt as an integral whole). These classifying criteria were later developed by
Russian phraseologists like Vinogradov. Based on an analytical framework of
descriptive categories and regarding collocations as a type of word combinations
with a degree of inseparability or fixedness, collocation approached this way is
termed the phraseological approach (Nesselhauf 2004) [or “significance oriented
approach” termed by Herbst (1996: 380)], which shall be discussed in detail in this
section.
The notion of collocation formally came into being in the 1950s when Firth,
commonly accredited as the father of collocation, viewed collocation from a purely
linguistic standpoint and put forward the notion of collocation through the cele-
brated dictum: “you shall know a word by the company it keeps” (Firth 1957: 179).
This statement has been endorsed by many linguists and also been scientifically
validated through corpus-based studies as we shall see below. Definitions of col-
location in various forms following Firth are called the Firthian approach [or
“statistically oriented approach” called by Herbst (1996: 380); the “frequency-based
approach” by Nesselhauf (2004)].
(Footnote 4 continued)
prefabricated speech, ready-made utterances, recurrent combinations, stock phrases and word-like
units. See also the terms to describe the phraseological phenomenon in Wray (2002: 9).
5
Fellbaum (2007: 8) distinguished collocation, a linguistic phenomenon, from collocations,
specific lexical instances resulting from collocation that are part of the lexicon. No differentiation
is attempted in this study.
2.2 The Notion of Collocation 15
Collocation has also been psychologically envisaged (Aitchison 2003). In what

follows, we shall present three main approaches to collocation: the psychological
approach; the Firthian approach and the phraseological approach. The psycholog-
ical approach will be briefly discussed and the latter two will be more elaborated
since the Firthian approach sets a trend for lexical studies in corpus-based research
and the phraseological approach concentrates primarily on the classifying criteria of
collocations, which is particularly useful for collocation studies in the field of
second language acquisition.6
2.2.1.1 The Psychological Approach
Collocation involves strong associations between words. This association can be

frozen into one type of the meanings of a word, defined as the collocative meaning,
which “consists of the associations a word acquires on account of the meanings of
words which tend to occur in its environment” (Leech 1974: 20). Leech (ibid: 20)
gave the example of pretty and handsome, which have the similar meaning of
“good looking”, but can be differentiated by the range of nouns with which they
take, e.g. (handsome) man, and (pretty) woman. This definition of collocation,
concerned with the (collocative) meaning of a word through association with its
likely-to-occur collocates, is viewed as a “psychological” or “associative” definition
(Partington 1998: 15). The associative tendency of words is so strong that in the
mental lexicon the number of collocations is inferred by Kelly and Stone (1975) and
Pawley and Syder (1983) to be as many as single words. The claim that a word
strongly associates with other words is not only evidenced through the large
existence of clusters as discussed in Sect. 2.1.1, but also verified through word
association tests in the field of psycholinguistics. Aitchison (2003: 86) reported that
the second commonest type of response to stimulus words in a test is collocation.7
For example, water, sea, shaker and lake were among the top ten commonest
responses to the word salt, which shows that words are stored in the mental lexicon
in connection with their collocates. Tongue slips, according to Aitchison, constitute
another interesting form of evidence that words are linked with their collocates in
the mental lexicon, as there are cases when “people sometimes start out with one
phrase and then get ‘derailed’ on to a familiar routine, as in Hungarian restaurant
6
This classification of the different approaches to collocation is similar to previous collocation
reviews. For example, definitions of collocation have been neatly summarised by Partington
(1998) into “textual”, “psychological” or “associative”, and “statistical” ones, whilst Handl (2008)
classifies previous definitions into four categories: text-oriented, association-oriented, statistically
oriented and semantically oriented. Herbst (1996) distinguishes three approaches in collocation:
“statistically oriented approach”, “significance oriented approach” and “text-oriented approach”.
Nesselhauf (2004) summarises the approaches of collocation as the “frequency-based” and
“phraseological” approach.
7
According to Aitchison (2003: 86), the commonest type of response to stimulus words is coor-
dination, e.g. salt with pepper, butterfly with moth.
for ‘Hungarian rhapsody’” (Aitchison 2003: 91). Words in a collocational rela-

tionship are believed to be stored in a single remembered set from which they can
be retrieved (cf. Greenbaum 1974: 80).
Therefore, the collocating relationship of words is psychologically real and takes
a major position in the mental lexicon of language users. Yet these associating
bonds between words stored in the mental lexicon of native speakers might be quite
different from those in a L2 learner, whose mental lexicon, as Meara (1984: 232)
put it, is “in general more loosely organised than the native speaker’s lexicon”.
When it comes to the actual use of collocation, the associative bond in the mental
lexicon may work well for native speakers, but probably poses difficulty for non-
native speakers. For example, when both a native speaker and an L2 English learner
are asked to express strong coffee, the collocate strong can be easily associated with
coffee by native speakers, but learners might use other collocates like powerful
rather than strong. It thus requires consciousness and efforts for English learners to
build up that associative bond between words.
The associative power between words in a syntagmatic relation helps the pre-
diction of the co-occurring words to a greater or lesser extent, as the word bonsai
has a strong prediction for tree and spick for span, but pill-box cannot be forecast by
letter (Crystal 1997; Stubbs 2001: 29). The realisation of predictable word com-
binations in texts is collocation––two or more words that tend to co-occur (Lewis
2000: 73). So combining the psycholinguistic phenomenon of collocation and its
realisation in texts, collocation is defined by Hoey (2005: 5) as a psychological
association between words and “evidenced by their more frequent occurrence
together in corpora more often than is explicable in terms of random distribution”.
Next follows a discussion of this text-based study of collocation.
2.2.1.2 The Firthian Approach
Firth’s statement that “you shall know a word by the company it keeps” is well
exemplified by the co-occurring words dark night, where he claimed “one of the
meanings of night is its collocability with dark, and of dark, …, collocation with
night” (1957: 196). The meaning by collocation, as Firth argued, is “an abstraction
at the syntagmatic level and is not directly concerned with the conceptual or idea
approach to the meaning of words” (ibid: 196). Firth put forward a significant
conception as to the realisation of meaning by its instantiations with co-occurring
words. At a time when “the idea the language is based on a system of rules
determining the interpretation of its infinitely many sentences is by no means
novel” (Chomsky 1965: v), Firth’s meaning by collocation was fresh. This con-
ception has become a substantial and new impetus in observable-text-based and
later computer-assisted studies on collocation and established the British traditions
in text analysis (Stubbs 1996). Taking inspiration from Firth’s definition of collo-
cation, the Firthians have conducted studies of word co-occurrences based on real
language in use, and proposed other definitions.
Sinclair is the main inheritor and innovator of the Firthian approach, along with
other linguists who followed the British tradition and viewed collocations primarily
as a syntagmatic relation between words in texts (Halliday 1966; Hoey 1991, 2005;
Kjellmer 1987, 1994; Lewis 2000; Moon 1998; Sinclair 1966, 1987, 1991, 2004;
Stubbs 1996, 2001; etc.). Given the abundance of studies on collocation in this
trend, these studies are summarised in two subsections: one discusses the notion of
collocation in word sense recognition and differentiation, and the expansion of the
notion of collocation to other aspects, such as colligation, semantic prosody and
semantic preference; the other discusses frequency-based studies of collocation
focusing on defining and recognising collocations based on word frequencies.
These two lines of studies are however not mutually exclusive and only discussed
separately for the purpose of stressing their differences.
Text-Oriented Studies on Collocation
Collocation refers to a patterning of language with tendencies of lexical items to

co-occur (Sinclair 1966). These co-occurring tendencies of words are instantiated in
texts, a fact which induces a text-oriented definition of collocation by Sinclair
(1991: 170) as “the occurrence of two or more words within a short space of each
other in a text”. Based on this definition, word combinations are considered to be in
collocational relationships as long as they are within a short space of each other.
Thus, the strong associative bond between words is not retained in Sinclair’s
definition, but this definition constitutes the precondition to the study of collocation
in texts, since the occurrence of two or more words specifies the forms of all
collocations and a text forms the basic medium where collocations are homed and
recognised. Further specification to what counts as a short space of each other is
developed:
We may use the term node to refer to an item whose collocations we are studying, and we
may define a span as the number of lexical items on each side of a node that we consider
relevant to that node. Items in the environment set by the span we will call collocates.
(Sinclair 1966: 415)8
Through a computer-based study, Jones and Sinclair (1974) discovered that

significant collocates usually fall in a span of 4:4, that is, four words to the left and
four words to the right of the node (cf. Sinclair 1991: 170). Based on this
text-oriented study of collocation, the notion of collocation has been utilised by
Sinclair for the study of lexis, lexis and grammar and the expansion of the concept
of collocation to other aspects. Collocational information is useful for word sense
recognition. For instance, with the evidence of collocates like per, average, pop-
ulation, economic profitability, gradual, sharp, slowly, the sense of the node word
decline is recognised as a reduction in size, whilst another sense of decline—
deterioration—is supported by words like sad, suffered (Sinclair 1991). This sense
8
Words in boldface are quoted in their original forms.
differentiation method based on actual language use has revolutionised lexical

research and has been widely followed in lexicography, with the Collins COBUILD
English Dictionary as a typical example.
Collocational evidence also contributes to the combining of lexis and grammar,
as illustrated by Sinclair in the word yield (1991: 56). He found 33 corpus instances
of yield showing the sense of “give way”, realised by yield used as an intransitive
verb; 30 cases meant “produce”, realised as a noun. In 15 cases, yield was used as a
transitive verb, meaning “lead to”. One might argue that words of different parts of
speech (structure) naturally have different meanings (sense), as the senses of the
polysemous word drink are quite different depending on its use as a verb (take in
liquids) and a noun (any liquid suitable for drinking).9 Yet what is significant in
Sinclair’s demonstration is a bottom-up approach, i.e. word sense is recognised
through word co-occurrences by using authentic language data (this method is
widely embraced by Hoey 2005; Renouf 1987; Stubbs 1996, 2001; Teubert 2010;
inter alia). So essentially he approaches the study of lexis in a data-driven fashion.
The close correlation between sense and structure as revealed through colloca-
tional information is also strengthened by the fact that even different word forms of
the same lemma have quite different collocational behaviour.10 In a 130-million
corpus, Stubbs (1996: 172) found that for the lemma educate, the most frequent
word form is education and it collocates primarily with terms denoting institutions
(e.g. further, higher, secondary, university). The base form educate, on the other
hand, collocates with synonymous verbs such as enlighten, entertain, inform, etc.
Moreover, different word forms can enter similar collocations and this therefore
induces the definition of collocation as a relationship between lexemes (Halliday
1966; Sinclair 1991). For example, according to Halliday (1966: 151), “he argued
strongly, I don’t deny the strength of his argument, his argument was strengthened
by other factors” would all be considered instances of the same collocation as
strong argument (cf. Greenbaum 1974: 80).
The notion of collocation has further been expanded to more abstract levels, such
as colligation, semantic prosody and preference. There are syntactic constraints on a
word’s selection of its co-occurring words: these constraints are called colligation
(Firth 1957). Colligation refers to the co-occurrence of grammatical choices (Hoey
2005; Sinclair 1996, 1998, 2004). Compared with collocations, which are directly
observable in texts, colligations are not so directly observable and involve
abstractions based on generalisations about the behaviour of the word in question
(Stubbs 2001: 88). Consequence, for example, tends to co-occur with the prepo-
sition of (Hoey 2005). Besides the grammatical constraints on the collocates of a
word, a word can also co-occur with positive or negative groupings of words and as
a result it is presented with a certain semantic prosody, defined by Louw (1993:
157) as “the consistent aura of meaning with which the form is imbued by its
9
Explanations of drink are quoted from WordNet.
10
Lemma refers to the composite set of word forms. For example, the lemma give refers to the
forms of give, gives, given, gave and giving (Sinclair 1991: 41–42).
collocates”. For example, the phrase set in primarily co-occurs with an unpleasant
state of affairs and has a negative prosody (Sinclair 1991: 68, for semantic prosody,
cf. Sinclair 2003; Stubbs 1995a, b, 1996, 2001). Other words would usually col-
locate with a certain semantic preference, as the naked eye collocates with verbs and
adjectives indicating visibility and the word unemployment usually collocates with
the semantic set of statistics (Sinclair 1991: 33; Stubbs 1995b: 254). Semantic
preference is therefore an abstraction of the semantic orientations over the collo-
cates of the node word.
Frequency-Based Studies on Collocation
Besides the loose treatment of collocation as co-occurring words within a set span,
other researchers reserve the notion of collocation for statistically significant
co-occurring words and define collocation as “the relationship a lexical item has
with items that appear with greater than random probability in its (textual) context”
(Hoey 1991: 7, 2005; cf. Greenbaum 1974; Moon 1998; Sinclair et al. 2004; Stubbs
2001). The higher the probability is, the more likely for a word combination to be a
collocation. Significant collocations are quantitatively identified by using statistical
formulae (cf. Church et al. 1991; Church and Hanks 1990; Church and Hindle
1990; McEnery and Wilson 1996; McEnery et al. 2006; Stubbs 1995a). Within the
field of frequency-based definition of collocation, some definitions purely rely on
frequency, as Moon (1998: 26) considered a collocation as that which “typically
denotes frequently repeated or statistically significant co-occurrences, whether or
not there are any special semantic bonds between collocating items”. Yet frequency
alone is not a reliable criterion for identifying meaningful collocations. Other
researchers add a grammatical standard as well as with frequency and define col-
location as recurring sequences of items that are grammatically well formed (cf.
Johansson and Hofland 1989: 95; Kjellmer 1987: 133, 1994: xiv). According to
Kjellmer (1994: xv), sequences that have no or only a very distant grammatical
relationship are excluded. For example, instances like but too, day but, however in
the, night he would not be considered as collocations even if the frequency criterion
was satisfied. Instead, by me, in April, of the Government all qualify as collocations
(ibid: xiv). However, even though the definition incorporates grammatical
well-formedness, it is not sufficient to distinguish between combinations formed on
the basis of grammatical rules (e.g. by me, in April) and collocations of phraseo-
logical value (e.g. make a decision, strong argument). The approach to identifying
collocations that are of phraseological value is the phraseological approach, which
will be illustrated in the following section.
2.2.1.3 The Phraseological Approach
In Aspects of the Theory of Syntax, Chomsky (1965: 190f) distinguished two types
of word relations: a close construction (as decide on a boat in the sense of choose
the boat) and a loose association (as decide on a boat meaning decide while on a
boat). This distinction is much the same as collocations and free combinations,
where close construction refers to collocation (e.g. the verb decide occurring
together with the particle on to mean choose), which represents a unit, and loose
association resembles free word combinations, which are constructed on the basis
of grammatical rules. The phraseological approach is concerned with the defining
criteria of collocation and demarcating it from other types of word combinations.
The phraseological approach, in contrast with the psychological and Firthian
approaches, concerns itself with classifying schemes of phraseological units
according to their varying degrees of fixedness. Russian phraseologists such as
Vinogradov (1947, cited in Cowie 1998: 4f) established three categories of word
combinations: “phraseological fusions” (e.g. spill the beans), “phraseological uni-
ties” (e.g. blow off steam) and “phraseological combinations” (e.g. meet the
demand). Different phraseologists adopt slightly different classifications with dif-
ferent terminology, as summarised in the table below:
As Table 2.1 shows, word combinations are generally divided into idioms,
collocations and free combinations, which are on a spectrum from the most fixed to
the most free. What is also revealed through the above table is that generally there
are two separate directions of interests distinguishing the Russian phraseologists
(Vinogradov and Amosova) from other phraseologists like Aisenstadt, Cowie and
Howarth. The Russian scholars start from the idiomatic spectrum to delineate the
specific phraseological zone and are preoccupied with the distinction between id-
ioms and collocations. Their classifying criteria will not be elaborated here, since
what is more challenging and significant for L2 learning is the distinction between
collocations and free word combinations, and to “identify at what point language
users are manipulating expressions as wholes rather than composing them
Table 2.1 Classifications of English word combinations

Researchers Classifications of word combinations
Vinogradov (1947) Phraseological Phraseological Phraseological
fusion unity combination
Amosova (1963) Idiom Idiom (not Phraseme, or
differentiated) phraseoloid
Aisenstadt (1979) Idiom Restricted Free word
collocation combination
Cowie (1981) Pure idiom Figurative Restricted Open
idiom collocation collocation
Nattinger and Idiom Collocation Free
DeCarrico (1992) combination
Howarth (1996) Pure idiom Figurative Restricted Free
idiom collocation collocation
Howarth (1998a, b) Pure idiom Figurative Restricted Free
idiom collocation combination
These classifications are partly cited in Cowie (1998: 7), and largely modified in this research. See
also Howarth (1996: 34) for a summary of the terminologies
according to generative rules” (Howarth 1996: 31). Therefore, special attention is

paid to the distinction between collocations and free combinations.
As is acknowledged by Cowie (1998: 5), phraseological combination, or re-
stricted collocation, is the most interesting and yet most difficult to delimit. It is
difficult to delineate because collocation is located in the fuzzy zone between free
combinations and idioms. Previous researchers distinguish the three types of word
combinations: idioms, collocations and free combinations in terms of semantic
transparency, semantic specialisation of one element in the combination and
commutability/substitution of one of the elements.
Semantic Transparency/Opacity
Semantic transparency/opacity is measured in terms of whether the meaning of the

whole combination can be deduced from the meaning of the individual elements.
This criterion is well suited to the differentiation of idioms and non-idiomatic
expressions (Hausmann 1989). For the former, the semantics of the whole com-
bination is opaque in that their meanings are not made up of the sum of their
constituents (e.g. kick the bucket, spill the beans). Yet the meanings of
non-idiomatic word combinations, namely collocations and free combinations, are
easily derivable from their constituents. For example, the meaning is transparent in
both commit a crime as a collocation and control the crime as a free combination).
So collocations can be differentiated from idioms by applying the criterion of
semantic opacity. But this criterion fails to demarcate collocations from free word
combinations, given that both types are transparent in meaning. Criteria central to
the distinction of collocations and free combinations are: one element used in its
specialised sense and the degree of commutability of either of the constituents (cf.
Aisenstadt 1979; Cowie 1981, 1992, 1998; Howarth 1996; 1998a).
Specialised Senses of One Element
Phraseologists distinguish collocations from free combinations in terms of the

senses/meanings of the constituents, and claim that for a word combination to
qualify as a collocation, either of the elements must have a specialised meaning.
What they mean by specialised meaning are figurative senses (as pay in pay one’s
respects, adopt in adopt a policy), technical senses (as obtain in obtain a warrant)
and delexical senses (as make in make a decision) (Aisenstadt 1979; Cowie 1991,
1992, 1998; Howarth 1998a, b; Moon 1998). The requirement for either of the
elements to have a specialised meaning is meant to exclude free combinations, for
which both elements are used in their literal senses (e.g. bake bread, cut cheese).
However, it is not always easy to discern whether the sense of one element is
specialised. Take the collocations used by Howarth (1998a: 170) as an example:
(1) Figurative: require qualifications

(2) Delexical: give evidence of
(3) Technical: obtain a warrant.
A note of caution is due here concerning the senses of the three verbs (require,
give, and obtain). In example (1), the verb require in require qualifications may not
be used in the claimed figurative sense; rather, it is in its literal sense “to ask”/“to
request”; so is the verb give in example (2), which is a delexical verb but means
provide in the context of give evidence; In example (3), though the whole com-
bination is used in a technical text, obtain is used in its original meaning—get.
So the figurative, delexical and technical senses complicate the categorisation
process and are not universally reliable. Meanwhile, the application of this criterion
in delimiting collocations in turn excludes a large number of real collocations, such
as commit a crime, for which no specialised senses are involved and both of the
elements are used in their literal senses. So the criterion that either of the con-
stituents must have a specialised sense does not qualify as a defining criterion in the
definition proposed later in this study.
Commutability/Substitutability
Unlike free combinations which are subject to free substitution of either element
without a consequent alteration in the meaning of the other, collocations are
restricted in the commutability of either element (Aisenstadt 1979; Cowie 1992;
Howarth 1996, 1998a). Aisenstadt (1979: 73) illustrated restricted commutability in
the following two examples:
(4) shrug one’s shoulders
shrug something off
shrug something away
shrug one’s shoulders
square one’s shoulders
hunch one’s shoulders
(5) make a decision
take a decision
have a look
give a look
take a look.
In example (4), both shrug and shoulders are restricted to a number of
co-occurring words and neither of them can be substituted; In example (5), there is
a restricted commutability on the verbs, as decision is limited in alternative verb
collocates: make/take, and look in verbs such as have/give/take. Aisenstadt
attempted to demarcate collocations according to the restricted substitutability of
word constituents. Yet on the one hand, commutability itself is a vague criterion
and depends much on the conceivability of a human mind. With shrug one’s
shoulders for example, shoulders can have a rather wide set of verbs to go with, as
in straighten one’s shoulders, wash one’s shoulders, look at one’s shoulders, rub
one’s shoulders, scratch one’s shoulders (Nesselhauf 2005: 27). This is also the
case with decision, which can co-occur with a variety of verbs, such as reach a
decision, come to a decision, postpone a decision, criticise a decision, explain a
decision (ibid: 27). On the other hand, commutability can also be restricted in free
combinations like wash the glass, since substituting the verb clean for wash slightly
alters the original sense and the same applies to replacing the noun glass with cup.
So what qualifies the two combinations as collocations is the fact that the word
shoulders has a rather restricted set of co-occurring words with the sense of “shrug”
in shrug one’s shoulders (probably only the verb, i.e. shrug) and decision has a
restricted number of verbs with the sense of “make” in make a decision
(make/take/reach, etc.). The notion of the given sense was adopted by Cowie (1992:
5f) in his commutation tests to demarcate restricted collocations. The commutability
of the verb is tested through whether it is the only verb or one of a set of syn-
onymous verbs used in the appropriate sense in relation to a given noun (e.g. verbs
are commutable in abandon/give up a cherished principle, but verbs are not
commutable in run a deficit).
A comprehensive classification of collocations on the basis of commutability
was established by Howarth (1996: 102) in his categorisation of verb + noun
collocations from the most free to the most restricted (from L1 to L5), as sum-
marised in Table 2.2.
From L1 to L5, restrictedness of collocations is scaled from a slight degree of
restriction of one element to complete restriction of both elements and this
restriction is explained by the number of synonyms either element can take. For
example, for L1 collocations, nouns are subject to free substitution whilst restriction
is placed on the verbs because of the limited number of synonymous verbs. When
neither element permits substitution, i.e. with no synonym in the given sense, the
word combination is the most restricted collocation (L5), such as curry favour.
However, this classifying scheme complicates the differentiation between col-
locations and free combinations once the notion of synonyms is introduced. Like
Table 2.2 Howarth’s categorisation of collocations into five levels of restrictedness

Verb Noun Examples
L1 Some restriction Free substitution Adopt/accept/agree to a
proposal/suggestion, etc.
L2 Some Some Introduce/table/bring forward a bill/an
substitution substitution amendment
L3 Some Complete Pay/take heed
substitution restriction
L4 Complete Some Give the appearance/impression
restriction substitution
L5 Complete Complete Curry favour
restriction restriction
the notion of commutability, the number of synonyms is also subject to the con-
ceivability of a human mind. With the examples in L3 for an example, the com-
bination pay heed is considered as a restricted collocation in the sense that heed is
completely restricted in its substitution. Yet according to the Oxford Dictionary of
Synonyms and Antonyms, attention is in a synonymous relationship with heed and
pay attention is an acceptable English collocation. The example of give
appearance/impression classified in L4 has the same problem, as the verb give can
be replaced by make/leave given that make appearance/impression or leave
appearance/impression are expressions with similar meanings.11 So the judging on
the number of synonyms requires a good deal of subjectivity. As with Cowie’s
commutation test in which verbs are measured in terms of the number of synonyms
they have, it is hard to find synonyms for verbs even in free combinations such as
open the door (?unblock, ?unlock). In cases where no synonyms are found, it can
just as well be a free combination rather than a restricted collocation, e.g. drink
one’s tea (Nesselhauf 2005). Commutability is not a clear criterion for differenti-
ating collocations from free combinations.
In this section, the notion of collocation has been first introduced in the domain
of psychological studies, with collocation viewed as psycholinguistic lexical
associations. Another field in which collocation has been researched is the
text/frequency-based studies of collocation. Much text/frequency-based research
focuses on the collocational relationship between words, the extension of the notion
of collocation to more abstract levels, such as colligation, semantic prosody,
semantic preference and the identification of significant collocations. However, the
Firthian approach is based on linear co-occurrence of items and takes little account
of the syntactic and semantic statements that are essential in treating collocations
(Greenbaum 1970: 10). In addition, the span established for identifying colloca-
tions–four words each side—is insufficient to account for certain common collo-
cations (e.g. collect stamps in the following examples):
(6) They collect many things, but chiefly stamps.
(7) They collect many things, though their chief interest is in collecting coins. We,
however, are only interested in stamps (Greenbaum 1970: 11).
So the frequency-based approach, although it can identify significant colloca-
tions of statistical value, cannot incorporate all the collocations of phraseological
value (like collecting stamps in the above examples). Moreover, the Firthian tra-
dition is preoccupied with collocation as a linguistic phenomenon per se and is not
concerned with demarcating collocations from other types of word combinations.
As discussed above, the notion of restriction inevitably forms part of accounting for
what a collocation is and this restriction distinguishes it from other forms of lexical
co-occurrences (e.g. free word combinations and idioms). Measured frequency
of co-occurring words is not a significant measure of collocational restriction
11
The verb collocates of impression—make and leave—are listed with reference to a collocation
dictionary––Oxford Collocations Dictionary for Students of English (2nd Edition).
(Cowie 1998: 226; Greenbaum 1974: 83); the phraseologists on the other hand have
proposed categorisation frameworks of word combinations. The separation of
collocations from free combinations is of essential importance in the investigation
of collocations used by L2 learners, since that constitutes the first step in examining
what is phraseological rather than what is free (cf. Howarth 1996). However, even
with the widely adopted defining features in demarcating collocations within the
phraseological approach, a clear borderline between free combinations and collo-
cations still cannot be set. The next sections, then, continue to examine the defi-
nition of collocations within the phraseological approach, attempting to develop a
usable categorisation of collocation and discussing previous classifications of
collocations.
2.2.2 Collocation Defined in This Study
Since this study is situated in the field of second language acquisition, aiming at
measuring nonnative phraseological performance, the approach taken to collocation
is mainly phraseological, in order to delimit collocations from idioms and free
combinations in learners’ English writings. Collocation within this approach has
been defined by previous researchers in more or less the same way, adopting the
criteria of semantic transparency/opacity, specialised sense of one element and
commutability (see definitions summarised in Table 2.3).
As demonstrated in the previous section, collocations can be distinguished from
idioms by applying the criterion of semantic transparency, i.e. the former are rel-
atively transparent in meaning (e.g. make a decision) and not as opaque as idioms
(e.g. kick the bucket). Another criterion—the specialised senses required of at least
one element of a word combination—is weaved, since certain collocations with
both elements used in their literal senses are excluded otherwise (e.g. commit a
crime, answer questions). As for the criterion of commutability measured in terms
of the number of synonyms an element can take, though it contributes to the
identification of the restrictedness in collocations, it operates more or less at an
intuitive level.
Therefore, there is a need for a clear definition using terminology which avoids
blurring the notion of collocations. What is commonly acknowledged is that there is
restricted commutability in either of the constituent words in a collocation. In other
words, either of the two elements has a limited set of words with which to co-occur
(Cowie 1981, 1998; Howarth 1996). For example, the noun stir in the given context
of “make a stir” has a limited set of verbs: cause/create/make a stir (Cowie 1981:
228). Or a verb in a given context has a limited set of nouns (e.g. pay one’s
respects/a compliment/court) (Cowie 1998: 216). The limited set of verbs/nouns is
termed collocational range, referring to the number of co-occurring words a word
can take (see also Cowie 1981, 1998; Granger 1998; Handl 2008; Leech 1974: 20;
Nesselhauf 2003; Philip 2007). In Greenbaum’s (1974: 80) words, the notion of
collocational range is exemplified by turn on, which “collocates with (among other
Table 2.3 Definitions of collocations and demarcating criteria adopted

Author Definitions Criteria
Aisenstadt “Combinations of two or more words used in Semantic transparency;
(1979: 71) one of their regular, non-idiomatic meanings, commutability
following certain structural patterns, and
restricted in their commutability not only by
grammatical and semantic valency (like the
components of so-called free
word-combinations), but also by usage”
Aisenstadt “A type of word combination consisting of two Semantic transparency;
(1981: 54) or more words, unidiomatic in meaning, commutability
following certain structural patterns, restricted
in commutability not only by semantics, but
also by usage, belonging to the sphere of
collocations”
Van Roey “The linguistic phenomenon whereby a given Commutability
(1990: 46) vocabulary item prefers the company of
another item rather than its ‘synonyms’ because
of constraints which are not on the level of
syntax or conceptual meaning but on that of
usage”
Howarth “Combinations in which one component is Specialised sense of
(1996: 47) used in its literal meaning, while the other is one element;
used in a specialised sense. The specialised commutability
meaning of one element can be figurative, (collocability);
delexical or in some way technical and is an semantic transparency
important determinant of limited collocability (semantically
at the other. These combinations are, however, motivated)
fully motivated”
Nesselhauf “Combinations in which at least one element Specialised sense of
(2005: 25) has a non-literal meaning (and at least one a one element;
literal one) and in which commutability is commutability
arbitrarily restricted, but some commutability is
possible”
Laufer and “Habitually occurring lexical combinations that Semantic transparency;
Waldman are characterised by restricted co-occurrence of commutability
(2011: 648) elements and relative transparency in meaning”
items) light, gas, radio, and TV…These items and others we might add to them
constitute the COLLOCATIONAL RANGE of turn on”.
Collocational range is used as a criterion for distinguishing phraseological units
in that elements in collocations have a restricted range of co-occurring words. With
the example of commit a crime, commit has a restricted range of nouns, such as
crime, wrongdoing, murder, and thus commit a crime qualifies as a collocation.
Combinations with both elements having a wide/unrestricted range of co-occurring
words are free word combinations, e.g. want a book, for which the verb want can
occur with, a car, money, peace, etc., and the noun book can occur with have, buy,
read, take, etc.
Table 2.4 A framework for demarcating collocations

Verb Noun Examples
Free combination − − Want a book (car, money, peace, etc.)
Want/have/buy, etc. a book
Collocation + − Pay/take heed; make/take decision
− + Commit a crime/murder; shrug one’s shoulders
+ + Curry favour; curry/court favour
Idiom ++ ++ Call the shots; face the music
Note “−” means the word in question has a wide range of co-occurring words; “+”represents a
restricted range of words; underlined words in the Examples column are those words whose ranges
are considered
Therefore, this study utilises two essential defining criteria in defining colloca-
tions, namely, semantic transparency and the range of co-occurring words.
Collocations are then defined as combinations of two or more words which are
characterised by a restricted range of co-occurrence in at least one of their con-
stituent words and by relative transparency in meaning.
Based on this definition, we propose that word combinations with both elements
taking a wide range of co-occurring words are classified as free combinations;
combinations in which either of or both elements have a restricted range of
co-occurring words, and also are transparent in meaning, are categorised as col-
locations; combinations with both elements having a very restricted range of
co-occurring words and being opaque in meaning are viewed as idioms (see
Table 2.4 for a detailed illustration).
According to this framework, want a book is a free combination since both the
verb and noun have an unlimited range of co-occurring words. Call the shots is an
idiom with both the verb and noun having a very restricted range of words and
being semantically opaque. Both free combinations and idioms are disregarded in
this study. The focus is on collocations such as pay heed, commit a crime and curry
favour. This framework is a simplified version of Howarth’s categorisation of
collocations into five levels of restrictedness and Nesselhauf’s five groups of
combinatory possibilities of verbs in verb–noun combinations (cf. Howarth 1996:
102; Nesselhauf 2005: 30).
2.2.3 Collocations Classified in This Study
Different approaches to collocations result in different classifications. This section

concerns itself with a brief presentation of previous classifications, especially those
that are relevant to the present study.
Based on the strength of associations between words, Aitchison (2003: 91)
distinguished three types of collocations from the loosely to the most strongly
associated: words that are optionally, yet commonly associated (e.g. fresh-faced
youths), words with habitual connections or clichés (wide awake) and words frozen
into a fixed order or “freezes” (knife and fork). This framework of collocation
classification resembles that of Howarth’s categorisation of collocations from the
least to the most restricted. The difference lies in the criteria they adopted, namely
the strength of association by Aitchison, as opposed to the analytical method of
semantic commutability by Howarth. The common denominator is that both
acknowledge the degree of fixedness in collocations. If words are strongly asso-
ciated, they tend to co-occur more often than would be expected in texts. This leads
to the classification of collocations in frequency-based studies, where collocations
are classified into significant and casual ones (cf. Jones and Sinclair 1974; Sinclair
1987, 1991; Sinclair et al. 2004). Moon (1998: 27) made a distinction among
collocations based on the constraints where collocation arises. The simplest col-
locations are semantically constrained and represent co-occurrence of the referents
in the real world (strawberry jam); the second kind is constrained both
lexico-grammatically and semantically and “arises where one word requires asso-
ciation with a member of a certain class or category of item” (rancid butter); the
third type is syntactically constrained and arises where a word requires comple-
mentation with a specified particle (too—to).
Collocations to be classified in this study are neither based on the psychological
approach or frequency-based approach. As discussed in Sect. 2.2.2, the definition
of collocation is phraseological, and thus its classification is not meant to be based
on restrictedness of combinations (cf. Howarth 1996, 1998a, b; Nesselhauf 2005);
rather it is broadly based on the word classes of its constituents, since the study
aims at investigating L2 learners’ performance with regard to certain types of
collocations and its relationship with vocabulary growth.
According to the syntactic structures of collocations, Hausmann (1989: 1010)
divided collocations into the following six types:
• adjective + noun (heavy smoker)
• (subject-) noun + verb (storm rage)
• noun + noun (lemon tree)
• adverb + adjective (deeply disappointed)
• verb + adverb (criticise severely)
• verb + (object-) noun (stand a chance).
Similar classifications were also proposed by Benson (1985) and Benson et al.
(2010), in whose classifications, collocations were further divided into grammatical
and lexical collocations. A grammatical collocation, according to Benson et al.
(2010), is a phrase consisting of a dominant word and a preposition or a gram-
matical structure; lexical collocations resemble those in Hausmann’s classifications,
which consist of nouns, adjectives, verbs and adverbs. In the classification put
forward by Benson et al. (2010), verb + noun collocations were further divided into
CA collocations (collocations containing a verb denoting creation/activation with a
noun) and EN collocations (collocations containing a verb denoting eradication/
nullification with a noun) (see Table 2.5 for classifications of lexical collocations).
Table 2.5 Classifications of lexical collocations by Benson et al. (2010)

Types Examples
Verb + noun/pronoun (or prepositional phrase); with Come to an agreement, make an
the verb denoting creation and/or activation impression, compose music
Verb + noun; with the verb denoting eradication Reject an appeal, lift a blockade,
and/or nullification break a code
Adjective + noun Strong tea, warm regards, reckless
abandon
Noun + verb Adjectives modify, alarms go off, bees
buzz
Noun + of + noun A herd of buffalo, a pack of dogs, a
bouquet of flowers
Adverb + adjective Deeply absorbed, strictly accurate,
sound asleep
Verb + adverb Affect deeply, amuse thoroughly,
argue heatedly
Among the lexical collocations categorised by Hausmann (1989) and Benson

et al. (2010), verb + noun, adjective + noun and noun + noun collocations fall into
the domain of this study, with the exception that verb + noun collocations are not
divided into CA and EN ones.12 As part of the aim of this research is to investigate
the growth of vocabulary produced by Chinese EFL learners, i.e. the growth of
verbs from delexical to lexical ones, verb + noun collocations are accordingly
further divided into delexical verb + noun and lexical verb + noun collocations.
Delexical verbs, also known as light verbs, such as make, have, or take, are
commonly defined as those verbs “whose semantic content is ‘light’ (or has little
lexical meaning), as opposed to ‘heavy’ (or lexically more specified), and much of
the semantic content is obtained from its arguments” (Miyamoto 2000: 12).
Although the total number of delexical verbs in the English language is small, they
have very high frequency of occurrence and are the commonest words (Sinclair and
Fox 1990: 147). Examples of light verb + noun constructions are make progress,
have a discussion and take a bath, where the main semantic content is provided not
by the verbs, but by the following nouns. The verbs are semantically general and
the object nouns are semantically specific (Algeo 1995). Most delexical structures
(a delexical verb followed by a noun group) can be replaced by an analogous single
word verb (e.g. give advice = advise), though some cannot be replaced by a single
verb, e.g. give evidence, give birth.
In this study, delexical verbs are used in a broad sense and no differentiation is
made regarding the semantic contents they carry in constructions like give advice
and give evidence (cf. Wang 2011). The six most common delexical verbs targeted
are do, give, have, make, take and get (Chi Man-lai et al. 1994; Kaszubski 2000;
Sinclair and Fox 1990; Wang 2011). All verb + noun collocations with the above
six verbs are considered as delexical verb + noun collocations.
12
Reasons for focusing the three types of collocations will be given in Chap. 4.
2.2.4 Summary
This chapter has concerned itself with developing the definition of collocation and
introducing its classifications. We have reviewed three different approaches to
collocation: the psychological approach, which views collocation as psychological
association in the mental lexicon, the Firthian approach, regarding collocation as
words in syntagmatic relations in texts and the phraseological approach, aiming at
demarcating collocation and distinguishing it from other types of word
co-occurrences like free combinations and idioms. The phraseological approach is
mainly followed in this study, since an empirical study on collocations in learner
language requires a categorisation framework allowing them to be separated from
idioms and free combinations. Based on the criteria of semantic transparency,
specialised senses of words and commutability commonly adopted by phraseolo-
gists in demarcating collocation, a slightly refined definition of collocation is
proposed: collocations are combinations of two or more words which are charac-
terised by a restricted range of co-occurrence in at least one of their constituent
words and by relative transparency in meaning. In addition, based on collocation
classifications proposed by Hausmann (1989) and Benson et al. (2010), three types
of lexical collocations: verb + noun, adjective + noun and noun + noun colloca-
tions will be examined in this study. Having defined what is meant by a collocation,
the next chapter moves on to discuss previous studies on collocation learning by L2
learners.
References
Aisenstadt, E. (1979). Collocability restrictions in dictionaries. In R. R. K. Hartmann (Ed.),

Dictionaries and their users: Papers from the 1978 B.A.A.L. seminar on lexicography (pp. 71–
74). Exeter: University of Exeter.
Aisenstadt, E. (1981). Restricted collocations in English lexicology and lexicography. ITL: Review
of Applied Linguistics, 53, 53–61.
Aitchison, J. (2003). Words in the mind: An introduction to the mental lexicon (3rd ed.). Oxford:
Blackwell.
Algeo, J. (1995). Having a look at the expanded predicate. In B. Aarts & C. F. Meyer (Eds.), The
verb in contemporary English: Theory and description (pp. 203–217). Cambridge: Cambridge
University Press.
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent
word-combinations. In A. P. Cowie (Ed.), Phraseology: Theory, analysis and applications
Amosova, N. N. (1963). Osnovui anglijskoy frazeologii. In A. P. Cowie (Ed.) (1998),
Phraseology: Theory, analysis and applications. Oxford: Oxford University Press.
Benson, M. (1985). Collocations and idioms. In R. Ilson (Ed.), Dictionaries, lexicography and
language learning (pp. 61–68). Oxford: Published in association with the British Council by
Pergamon.
Benson, M., Benson, E., & Ilson, R. (2010). The BBI combinatory dictionary of English: Your
guide to collocations and grammar (3rd ed.). Amsterdam: Benjamins.
References 31
Biber, D., Johansson, S., Leech, G., et al. (1999). Longman grammar of spoken and written
English. Harlow: Longman.
Bolinger, D. (1976). Meaning and memory. Forum Linguisticum, 1, 1–14.
Chi Man-lai, A., Wong Piu-yiu, K., & Wong Chau-ping, M. (1994). Collocational problems
amongst ESL learners: A corpus-based study. In L. Flowerdew & A. K. Tong (Eds.), Entering
text (pp. 157–165). Hong Kong: University of Science and Technology.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Church, K., Gale, W., Hanks, P., et al. (1991). Using statistics in lexical analysis. In U. Zernik
(Ed.), Lexical acquisition: Exploring on-line resources to build a lexicon (pp. 115–164).
Hillsdale: Erlbaum.
Church, K., & Hanks, P. (1990). Word association norms, mutual information, and lexicography.
Computational Linguistics, 16, 22–29.
Church, K., & Hindle, D. (1990). Collocational constraints and corpus-based linguistics. In
Working Notes of the AAAI Symposium: Text-Based Intelligent Systems.
Cowie, A. P. (1981). The treatment of collocations and idioms in learners’ dictionaries. Applied
Linguistics, 2(3), 223–235.
Cowie, A. P. (1991). Multiword units in newspaper language. In S. Granger (Ed.), Perspectives on
the English lexicon: A tribute to Jacques Van Roey (pp. 101–116). Louvain-la-Neuve: Cahiers
de I’Institut de Linguistique de Louvain.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud
& H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). London: Macmillan.
Cowie, A. P. (1998). Phraseology: Theory, analysis and applications. Oxford: Oxford University
Press.
Crystal, D. (1997). A dictionary of linguistics and phonology (4th ed.). Oxford: Blackwell.
Fellbaum, C. (2007). Idioms and collocations: Corpus-based linguistic and lexicographic studies.
London: Continuum.
Firth, J. R. (1957). Papers in linguistics 1934–1951. London: Oxford University Press.
Fox, G. (1998). Using corpus data in the classroom. In B. Tomlinson (Ed.), Materials development
in language teaching (pp. 25–43). Cambridge: Cambridge University Press.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae.
In A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 145–160). Oxford:
Greenbaum, S. (1970). Verb-intensifier collocations in English: An experimental approach. The
Hague: Mouton.
Greenbaum, S. (1974). Some verb-intensifier collocations in American and British English.
American Speech, 49(1, 2), 79–89.
Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. E. Bazell, J. C. Catford, M. A. K.
Handl, S. (2008). Essential collocations for learners of English: The role of collocational direction
and weight. In F. Meunier & S. Granger (Eds.), Phraseology in foreign language learning and
teaching (pp. 43–65). Amsterdam: Benjamins.
Hausmann, F. J. (1989). Le dictionnaire de collocations. In F. J. Hausmann, O. Reichmann, H.
E. Wiegand, et al. (Eds.), Wörterbücher: ein internationales Handbuch zur Lexicographie.
Dictionaries. Dictionnaires (pp. 1010–1019). Berlin: De Gruyter.
Herbst, T. (1996). What are collocations: Sandy beaches or false teeth? English Studies, 77(4),
379–393.
Hoey, M. (1991). Patterns of lexis in Text. Oxford: Oxford University Press.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Howarth, P. (1998a). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.),
Press.
Howarth, P. (1998b). Phraseology and second language proficiency. Applied Linguistics, 19(1),
24–44.
Hunston, S., & Francis, G. (2000). Pattern grammar. A corpus-driven approach to the lexical
grammar of English. Amsterdam: Benjamins.
Jackendoff, R. (1983). Semantics and cognition. Massachusetts: MIT Press.
Jackendoff, R. (1997). Twistin’ the night away. Language, 73, 543–559.
Johansson, S., & Hofland, K. (1989). Frequency analysis of English vocabulary and grammar.
Oxford: Oxford University Press.
Jones, S., & Sinclair, J. (1974). English lexical collocations: A study in computational linguistics.
Cahiers de Lexicologie, 23(2), 15–61.
Kaszubski, P. (2000). Selected aspects of lexicon, phraseology and style in the writing of Polish
advanced learners of English: A contrastive, corpus-based approach. [2011-10-13]. http://main.
amu.edu.pl/ przemka/rsearch.html.
Kelly, E. F., & Stone, P. J. (1975). Computer recognition of English word senses. Amsterdam:
North-Holland Publishing Company.
Kjellmer, G. (1987). Aspects of English collocations. In W. Meijs (Ed.), Corpus Linguistics and
Beyond: Proceedings of the Seventh International Conference on English Language Research
on Computerized Corpora (pp. 133–140). Amsterdam: Rodopi.
Kjellmer, G. (1991). A mint of phrases. In K. Aijmer & B. Altenberg (Eds.), English corpus
linguistics. Studies in honour of Jan Svartvik (pp. 111–127). London: Longman.
Kjellmer, G. (1994). A dictionary of English collocations: Based on the Brown corpus. Oxford:
Clarendon Press.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus
Lee, C. Y., & Liu, J. S. (2009). Effects of collocation information on learning lexical semantics for
near synonym distinction. Computational Linguistics and Chinese Language Processing, 14
(2), 205–220.
Leech, G. (1974). Semantics. Harmondsworth: Penguin.
Lewis, M. (2000). Teaching collocation: Further developments in the lexical approach. London:
Language Teaching Publications.
Louw, B. (1993). Irony in the text or insincerity in the writer? The diagnostic potential of semantic
prosodies. In M. Backer, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology
McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh University Press.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. London: Routledge.
Meara, P. (1984). The study of lexis in interlanguage. In A. Davies, C. Criper, & A. P. R.
HOWATT (Eds.), Interlanguage (pp. 225–235). Edinburgh: Edinburgh University Press.
Miyamoto, T. (2000). The Light Verb Construction in Japanese: The role of the verbal noun.
Amsterdam: Benjamins.
Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach. Oxford:
Clarendon Press.
Nattinger, J. R., & Decarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2004). What are collocations? In D. Allerton, N. Nesselhauf, & P. Skandera
(Eds.), Phraseological units: Basic concepts and their application (pp. 1–21). Basel: Schwabe.
Palmer, H. E. (1933). Second interim report on English collocations. Tokyo: Kaitakusha.
Partington, A. (1998). Patterns and meanings: Using corpora for English language research and
teaching. Amsterdam: Benjamins.
References 33
nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication
Philip, G. (2007). Decomposition and delexicalisation in learners’ collocational (mis)behaviour. In
Online Proceedings of Corpus Linguistics. [2014-01-12]. http://ucrel.lancs.ac.uk/publications/
cl2007/paper/170_Paper.pdf.
Renouf, A. (1987). Lexical resolution. In W. Meijs (Ed.), Corpus Linguistics and Beyond:
Proceedings of the Seventh International Conference on English Language Research on
Computerized Corpora (pp. 121–131). Amsterdam: Rodopi.
Renouf, A., & Sinclair, J. (1991). Collocational frameworks in English. In K. Aijmer & B.
Altenberg (Eds.), English corpus linguistics. Studies in honour of Jan Svartvik (pp. 128–143).
London: Longman.
Richards, J. C., & Schmidt, R. (2010). Longman dictionary of language teaching and applied
linguistics (4th ed.). London: Longman.
Sinclair, J. (1966). Beginning the study of lexis. In C. E. Bazell, J. C. Catford, & M. A. K.
Sinclair, J. (1987). Collocation: A progress report. In R. Steele & T. Threadgold (Eds.), Language
topics: Essays in honour of Michael Halliday (Vol. 2, pp. 319–331). Amsterdam: Benjamins.
Sinclair, J. (1995). Collins COBUILD English Dictionary. (2nd. ed.) London: HarperCollins.
Sinclair, J. (1996). The search for units of meaning. Textus, 9(1), 75–106.
Sinclair, J. (1998). The lexical item. In E. Weigand (Ed.), Contrastive lexical semantics (pp. 1–24).
Sinclair, J. (2003). Reading concordances: An introduction. London: Pearson Education Limited.
Sinclair, J. (2004). Trust the text. London: Routledge.
Sinclair, J., & Fox, G. (1990). Collins COBUILD English grammar. London: Collins.
Sinclair, J., Jones, S., & Daley, R. (2004). English collocation studies: The OSTI report. London:
Continuum.
Stubbs, M. (1995a). Collocations and semantic profiles: On the cause of the trouble with
quantitative study. Functions of Language, 2(1), 23–55.
Stubbs, M. (1995b). Corpus evidence for norms of lexical collocation. In G. Cook & B. Seidlhofer
(Eds.), Principle & practice in applied linguistics: Studies in honour of H. G. Widdowson
Stubbs, M. (1996). Text and corpus analysis: Computer-assisted studies of language and culture.
Oxford: Blackwell.
Teubert, W. (2010). Meaning, discourse and society. Cambridge: Cambridge University Press.
Van Roey, J. (1990). French-English contrastive lexicology: An introduction. Louvain-la-Neuve:
Peeters.
Vinogradov, V. V. (1947). Ob osnovnuikh tipakh frazeologicheskikh edinits v russkom yazuike.
In A. P. Cowie (Ed.) (1998), Phraseology: Theory, analysis and applications. Oxford: Oxford
University Press.
Wang, D. (2011). Language transfer and the acquisition of English light verb + noun collocations
by Chinese learners. Chinese Journal of Applied Linguistics, 11(2), 107–125.
Yorio, C. A. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam
& L. K. Obler (Eds.), Bilingualism across the lifespan (pp. 55–72). Cambridge: Cambridge
University Press.
Chapter 3
Collocation Studies in Second-Language
Learner English
Similar to collocation studies discussed in the previous chapter, the past decades
have also seen a large volume of studies on collocation learning in an L2. The
purpose of this chapter is to review previous collocation studies in learner English
to date. It begins by addressing the methodologies commonly adopted in L2 col-
location studies, with a view to introducing the methodology employed in this
study; Sect. 3.2 presents major findings of previous research in this field.
A review of the studies on L2 collocation learning is inevitably combined with
studies on other prefabricated forms of language (cf. Granger’s (1998a) study on
collocations and formulae), since collocation as a linguistic phenomenon belongs to
a larger umbrella term—“formulaic language”—and in practice collocations are not
always carefully delimited from other types of word combinations (Nesselhauf
2005: 3). Therefore, in this literature survey, research on L2 learners’ knowledge of
restricted combinations of words (e.g. formulae, formulaic sequences, routines, etc.)
is briefly reviewed, with the main focus on studies of collocations produced by L2
learners.
3.1 Methodologies Adopted in L2 Collocation Studies
In the investigation of L2 learners’ collocation knowledge, studies are generally

based on two categories of data: elicitation data and production data (Fan 2009:
112; Nesselhauf 2005: 4). Production data in this sense stands in contrast to the
elicited data type, referring exclusively to naturally occurring data or spontaneous
data (Penke and Rosenbach 2007: 10). Yet confusion may arise out of the over-
lapping between the two types, since elicitation data encompass the type of pro-
duction data elicited from L2 learners in translation tasks (compared with learners’
introspective data elicited in intuition judgment tasks); at the same time production
data also include production data of the elicited type (cf. Ellis 1994: 670; Granger
1998b: 4). In order to avoid confusion, Penke and Rosenbach’s distinction between
DOI 10.1007/978-981-10-5822-6_3
36 3 Collocation Studies in Second-Language Learner English
elicited and spontaneous data is adopted and previous L2 collocation studies are
considered as either elicitation data- or spontaneous data-based.
3.1.1 Elicitation Data-Based Collocation Studies
Elicitation tasks designed to assess second language learners’ phraseological

production/comprehension include translation tasks (e.g. Bahns and Eldaw 1993;
Biskup 1990, 1992; Farghal and Obeidat 1995; Hasselgren 1994; Irujo 1993;
Marton 1977), blank filling tasks (Farghal and Obeidat 1995; Hoffman and
Lehmann 2000; Scarcella 1979; Zhang 1993), cloze (Al-Zahrani 1998; Bahns and
Eldaw 1993; Schmitt et al. 2004) and word-combination tests (Bonk 2001;
Channell 1981; Granger 1998a; Gyllstad 2005; Siyanova and Schmitt 2008; Wolter
and Gyllstad 2011; Yamashita and Jiang 2010).
A great advantage of this approach is that researchers can directly observe and
analyse L2 learners’ collocation production/comprehension of a set of pre-selected
collocations. For instance, in eliciting L2 learners’ production data of a particular
type, Bahns and Eldaw (1993) chose 15 English verb–noun collocations whose
German equivalents would be likely to be translated into the respective English
collocations and designed both a translation test and a cloze test to measure German
L2 learners’ active knowledge of these collocations. Farghal and Obeidat (1995)
administered both blank filling and translation tests to two groups of Arabic learners
of English at two proficiency levels, aiming to investigate their productive
knowledge of 22 common English collocations on topics such as food, clothes,
colour and weather. Irujo (1993) confined the study of the production of English
idioms in translation tests by 12 bilingual native speakers of Spanish to three types
of idioms according to their similarity between Spanish and English: exact
equivalents, similar ones and totally different idioms. Hoffman and Lehmann (2000)
concentrated on 55 adjective–noun and noun–noun collocations strongly associated
in the BNC and designed a gap filling task to investigate native and non-native
speakers’ familiarity with these collocations. With the collocations pre-selected,
researchers can directly observe learners’ performance on them and eliminate the
risk of obtaining unnecessary data.
Moreover, in testing L2 learners’ receptive knowledge through eliciting their
introspective data, elicitation techniques such as acceptability judgments possess a
distinct advantage which cannot be outweighed by other methods (e.g. spontaneous
data). Word-combination tests designed to tap into L2 learners’ intuition about
possible collocations are usually in this form. In these tasks, learners were asked to
choose possible collocates of pre-determined words (Channell 1981; Granger
1998a; Gyllstad 2005) or judge the acceptability of certain collocations (Siyanova
and Schmitt 2008). The two kinds of recognition tests correspond to the test formats
for measuring learners’ receptive knowledge of English verb + noun phrase
(NP) collocations by Gyllstad (2005), namely, COLLMATCH and COLLEX.
In COLLMATCH, learners were presented with a number of grids consisting of
3.1 Methodologies Adopted in L2 Collocation Studies 37
verbs and NP objects, and then asked to indicate which verbs can combine with
which nouns. The other test (COLLEX) involved testees to choose the correct
collocation among two lexical combinations: one correct and one
pseudo-collocation (e.g. pay a visit and do a visit). Unlike translation, blank filling
and cloze tasks, whose main advantage is to test the productive collocational
knowledge of L2 learners, these elicitation tasks afford a direct assessment of L2
learners’ receptive collocational knowledge.
A tight control over what is elicited from L2 learners enables direct comparisons
of collocational evidence based on unified criteria. Comparisons can be made
between different elicitation tasks, between collocation performance of different
participants and between L2 learners’ receptive and productive collocation
knowledge. For example, different elicitation tasks administered to the same group
of learners can reveal different strategies L2 learners adopt in producing colloca-
tions. Bahns and Eldaw (1993) compared the collocation production of the same
levels of learners in two tasks and reported that subjects did not perform signifi-
cantly better in a translation task than a cloze task, even though they were able to
paraphrase the target collocations in translation sentences but not in cloze sen-
tences. Findings showed that the difference between the number of correctly
translated collocations and the number of correct collocates in the cloze task did not
reach statistical significance (ibid: 106). The explanation proposed by Bahns and
Eldaw (1993) was that collocations were not easy to be paraphrased.
By controlling the set of collocations to be elicited, comparisons can also be
made between performances of different groups of participants, i.e. between
learners at different proficiency levels, learners of different L1 backgrounds and
learners in contrast with native speakers. Different collocation performances and
strategies in producing the same collocations were observed by Farghal and Obeidat
(1995) in their two groups of subjects: junior and senior English major L2 learners
(Group A) and language teachers of English (non-native speakers) (Group B). Both
groups were found to be seriously deficient in collocations (ibid: 315). With regard
to different strategies adopted by the two groups, they noted that Group B partic-
ipants resorted to paraphrasing as a strategy (25.1%) in the translation task more
than Group A (3.8%) in the blank filling test. The two groups are not comparable,
however, since in the translation task administered to Group B, more freedom of
paraphrasing is allowed than in the blank filling test given by Group A, so it could
have been more effective if the two groups had been given similar elicitation tests.
Additionally, though there was a higher percentage of paraphrasing strategy
adopted by Group B, the target translations of collocations were not found to be
satisfactory. Examples of such paraphrases of collocations are food little fat for light
food, does not change for fast color (ibid: 325). The finding of this unnaturalness in
the produced collocations caused by paraphrasing in translation tests is consistent
with Bahns and Eldaw’s (1993) finding that some collocations cannot be readily
paraphrased.
Besides offering a comparison between learners of different proficiencies, col-
location performances of learners with different L1 backgrounds can be compared
in elicitation-based studies. Ina translation task, Biskup (1992) reported that the
German learners were risk-takers and produced more variant collocations whereas
Polish learners produced more restricted collocations. Meanwhile, in terms of the
comparison between learners and native speakers, L2 learners’ receptive knowledge
was found to be poorer than native speakers as in the judgment tasks conducted by
Siyanova and Schmitt (2008) and Granger (1998a).
Elicitation tasks also enable a comparison between L2 learners’ receptive and
productive knowledge over predefined test materials. Biskup (1990) discovered a
striking difference in Polish learners’ performances on L2-L1 and L1-L2 translation
tasks: their answers were 100% correct in the former but they had great difficulty in
the latter. Similar results were obtained by Marton (1977) in a pre-treatment and
post treatment Polish–English translation test. The studies conducted by both
Biskup and Marton suggest that learners’ productive knowledge of collocation lags
far behind their receptive knowledge. On the one hand, their apparent ease with
collocation comprehension rests on the high degree of semantic transparency in
collocations; on the other, translation from L2 to L1 is always easier than the other
way round because L2 words are initially associated to L1 translation equivalent to
access meaning whereas translation from L1 to L2 words requires concept medi-
ation, which leads to a stronger lexical association from L2 to L1 than that from L1
to L2 (Kroll and Stewart 1994).
It is evident from the above discussion that elicitation data-based studies on L2
learners’ collocation performance possess several advantages difficult to obtain with
other types of data. Variables affecting subjects’ production can be clearly and
systematically controlled with the result that certain collocations happening “to
occur very rarely or not at all unless specifically elicited” (Yip 1995: 9) are elicited
by the researchers. Meanwhile, by targeting the same set of collocations in elici-
tation tasks to different learners, comparisons can be performed between variables
researchers aim to investigate. However, one of the limitations of elicitation
data-based studies lies in generalisability of the research findings to the broader
language proficiency of the participants. As Bahns and Eldaw (1993: 108)
acknowledged, 15 collocations tested in the translation tasks were too small a
sample from which a hypothesis is generalised. Generalisability is also affected by
the artificiality of an experimental situation that “may lead learners to produce
language which differs widely from the type of language they would use naturally”
(Granger 1998b: 5). Take for example the following sentences in the blank-filling
and cloze tests designed by Schmitt et al. (2004) for tapping into learners’
knowledge of formulaic sequences:
(1) With reg_ to giving directions, you must know phrases like ‘Turn right at the
corner’. (concerning this certain thing) (answer: regard)
(2) The economy is sure to improve ___c__
a. in the long period
b. over a long time
c. in the long run
d. over a long space

e. I DON’T KNOW
Learners may produce the right kind of answer but avoid using the target
expression (e.g. with regard to) in real language production. In sentence (1), they
may well use other expressions on which they are more confident, such as about. In
(2), they may choose C but produce over a long time in their writing, given the
close semantic similarity of these expressions. So there may be a gap between
learners’ elicited performance and their natural production. There are also cases
where learners “tend to evaluate non-standard forms as bad and distance themselves
from them when asked explicitly, while still using such forms actively” (Penke and
Rosenbach 2007: 12). In other words, L2 learners may recognise the co-occurrence
of the two or more words as an appropriate collocation, but may neglect its use in
actual production. That is because “language use involves choices, and each time
we construct an utterance we have to select from our available resources in such a
way as to convey what we want to say” (Ellis 1987: 5). The discrepancy between
learners’ linguistic knowledge and communicative use is acknowledged by
Widdowson (1979: 197), who pointed out that in the process of converting the
former to the latter errors take place. For instance, the past tense forms may be
“known” but not “used” (Willis 2010: 10). As for collocations, the adjectival
collocates of amplifiers such as highly were reported to be better known than
actually produced (Granger 1998a). In Granger’s study, learners marked more
adjectives with the adverb highly in an elicitation test than they actually produced in
writings, a phenomenon which was “somewhat paradoxical when considered in the
light of evidence that learners underuse highly in their writing” (ibid: 153). The
discrepancy between receptive and productive collocation knowledge is also
endorsed by Gyllstad (2005: 27), who despite finding that the most advanced
Swedish university learners performed almost as well as native speakers on
receptive tasks, recognised that the two groups would behave differently through a
test of productive knowledge.
Thus, it is insufficient to use a few elicitation tasks such as questionnaires in the
hope of contributing to a comprehensive and systematic linguistic description (Pu
2010). The criticisms levelled against elicited data–based studies are addressed by
the analysis of learner language on the basis of natural language use data, or
third-person observed data (in the terminology of Stubbs (2001) and Widdowson
(2000)). Basing the description of L2 learners’ collocation acquisition on sponta-
neous data is gaining acceptance as it contributes to a fuller picture of learners’
collocation performance and learning.
3.1.2 Spontaneous Data-Based Collocation Studies
Unlike L2 collocation studies based on elicitation techniques, in which learners are

asked to produce or recognise a specific set of collocations pre-selected for
investigation, spontaneous data-based L2 collocation studies focus on L2 learners’

natural production of collocations either in conversational or written texts. Whether
a set of collocations is pre-selected and then elicited from learners or not is a radical
difference between the two data types. Spontaneous data collected in second lan-
guage acquisition in most cases involves elicitation with a very limited degree of
control in essays (where only the topic is given) or oral interviews (where the
interviewer introduces one or a few topics) (Nesselhauf 2005: 40). Such control
affects the topics or the time limit: otherwise the oral or written production of the
learners is spontaneous and free (cf. Granger 2002: 8). Based on observations of
learners’ authentic use of English, much concrete evidence on learners’ phraseo-
logical performance has been accumulated, especially on their written performance
(Ädel and Erman 2012; Durrant and Schmitt 2009; Fan 2009; Granger 1998a;
Howarth 1996, 1998a, b; Hsu 2007; Kaszubski 2000; Laufer and Waldman 2011;
Li and Schmitt 2010; Lorenz 1999; Martelli 2006; Men 2010; Nesselhauf 2005;
Siyanova and Schmitt 2008; Yorio 1989).1
One advantage of spontaneous data–based L2 collocation studies is that learners’
free production performance can be examined in large quantities, which elicitation
data does not provide. These large quantities of data are compiled into a corpus,
defined by Sinclair (1991: 171) as “a collection of naturally-occurring language
text, chosen to characterise a state or variety of a language”. In the context of SLA,
the data collected are freely produced learner language, so the corpus in use is a
learner corpus—defined by Granger (2002: 7) as: “electronic collections of
authentic FL/SL textual data assembled according to explicit design criteria for a
particular SLA/FLT purpose. They are encoded in a standardised and homogeneous
way and documented as to their origin and provenance”.
The availability and sophistication of computers greatly facilitate learner corpus
research. Studies of L2 collocations based on spontaneous data have almost
exclusively utilised learner corpora, whether self-assembled corpora (usually of
smaller size and more specific to research requirements) or public ones (of larger
size and fulfiling more varied research purposes). Studies using self-assembled
corpora include those by Yorio (1989), Howarth (1996, 1998a, b), Hsu (2007), Li
and Schmitt (2010). The merits of these learner corpora are that they are less time-
and energy-consuming to compile than public learner corpora, and they enable
researchers to focus on a particular type of learners. The disadvantages of
self-assembled data include questions over the generalisability of their results based
on a limited number of participants and their replicability given the unavailability of
these corpora to the public.
Another trend in investigating L2 collocation uses is to carry out studies based
on public learner corpora (e.g. Ädel and Erman 2012; Granger 1998a; Laufer and
Waldman 2011; Lorenz 1999; Martelli 2006; Nesselhauf 2003, 2005; Siyanova and
1
Studies of collocations in L2 speech are very rare, although there are studies of other forms of
phraseological performance in speech (for example, Aijmer 2009; Crossley and Salsbury 2011; De
Cock 2011; De Cock et al. 1998; Foster 2001). Lexical bundles, formulaic sequences and routines
are their main study foci.
Schmitt 2008; Zhang and Gao 2006).2 Many learner corpus-based collocation
studies use the sub-corpora of the well-known learner corpus, the International
Corpus of Learner English (ICLE). For example, Granger (1998a) investigated the
uses of collocations and formulae by French learners of English in the French
sub-corpus of ICLE; the study of verb–noun collocations produced by
German-speaking learners was undertaken by Nesselhauf (2005) by using the
German sub-corpus of ICLE, and Siyanova and Schmitt (2008) focused on the
production of adjective–noun collocations by Russian learners using the Russian
sub-corpus of ICLE. Other non-ICLE-based collocation studies include those by
Zhang and Gao (2006), Laufer and Waldman (2011). The large amount of learner
data recorded in these large-scale learner corpora allows for a description of L2
collocation production that is as comprehensive as possible. What is more, utilising
public corpora enables studies of the same corpora to be easily comparable and
replicable (Penke and Rosenbach 2007: 11).
With the development of corpus analysis techniques, another advantage of
learner corpus-based collocation studies is that data can be (semi-) automatically
extracted and processed. As in Howarth’s (1996, 1998a, b) study, the
machine-readable corpus is useful for a rapid check of the original context for
pre-extracted collocations. Nesselhauf (2005) performed a similar automatic anal-
ysis to check whether all instances of verbs that were found to be restricted had
actually been spotted by the manual analysis. Granger (1998a) used text-retrieval
software (TACT) to automatically retrieve all the words ending in ly from the NS
and NNS corpora and then manually sorted them according to pre-defined semantic
and syntactic criteria. In the process of extraction of verb + noun collocations,
Laufer and Waldman (2011) created concordances of the 220 most frequent
pre-generated nouns in an NS corpus, and manually identified the verbs to go with
these nouns in the NNS corpus. Yet whether these 220 most frequent nouns are also
frequent in the NNS corpus is in question. So a full picture of learners’ collocation
production would be neglected by selecting a set of predetermined words for
retrieval. To get a fairly comprehensive picture of L2 collocation uses, collocations
in this study are not confined to certain predefined node words. Furthermore, dif-
fering from some of the collocation studies where collocations are manually
identified (Howarth 1996, 1998a, b; Li and Schmitt 2010; Nesselhauf 2005;
Siyanova and Schmitt 2008), collocations will be semi-automatically extracted in
this study with the aid of text retrieval software.
Spontaneous data-based collocation studies also have certain disadvantages,
insofar as only productive knowledge rather than receptive knowledge can be
investigated; infrequent features are hard to examine even in fairly large corpora
2
The corpora in the studies of Laufer and Waldman (2011) contain 300,000 words, composed of
argumentative and descriptive essays by native speakers of Hebrew of different levels. The corpora
compiled by Lorenz (1999) record German learners’ argumentative writing totaling 200,000
words. Though it is not known whether these two corpora are publicly available, the large
quantities of data from multiple learners manifest great advantages compared with a corpus of
several essays of a limited number of L2 learners.
since they occur rarely unless specifically elicited. In other words, only the per-
formance of learners is investigated but not their competence (Granger 1998b;
Nesselhauf 2005; Yip 1995). However, learners’ performance can be taken as
indicating their phraseological competence, since as acknowledged by Ellis (1994:
13): “learners’ mental knowledge is not open to direct inspection; it can only be
inferred by examining samples of their performance”.
To summarise the methods used in investigation of L2 collocation knowledge,
previous research commonly explores two types of data: elicited and spontaneous
data, each possessing distinctive advantages in answering particular research
questions. Though elicitation-based studies enable direct observation and analysis
of L2 learners’ collocation production/comprehension of a set of pre-selected col-
locations, criticisms are levelled in terms of their naturalness and generalisability.
Spontaneous data-based studies examine L2 learners’ natural production of collo-
cations through using large learner corpora. On the one hand, the use of a large
learner corpus enables a comprehensive description of real language use; on the
other, the development of computer software greatly enhances efficiency in
retrieving and analysing collocations in learner corpora. The point of departure of
this investigation is thus learner corpus-based, using the publicly available Chinese
Learner English Corpus (CLEC) . It aims to examine Chinese English learners’
productive knowledge rather than receptive knowledge. It also sets out to (semi-)
automatically extract all the collocations within a syntactic category (e.g. verb +
noun collocations) instead of focusing only on a number of pre-determined collo-
cations. In this way some disadvantages of spontaneous data-based studies can be
overcome.
3.2 Previous Findings from L2 Collocation Research
Based on the two data types discussed in the previous section, collocation in an L2
has been extensively studied. Previous studies are varied in nature, as seen from a
wide range of differing task types, learner types and collocation types. Their
heterogeneity makes it difficult to compare the results of past studies (Paquot and
Granger 2012: 131). However, the overall picture that emerges through previous L2
collocation research is that collocation production constitutes a particular prob-
lematic domain in SLA, even for learners at an advanced level, compared with their
better receptive collocation knowledge (e.g. Biskup 1990; Gyllstad 2005; Marton
1977). Most importantly, L2 collocation studies indicate a collocation lag, where
collocation knowledge lags far behind the development of syntax and lexis. This
deficiency in collocation learning was recognised as early as the 1930s by Palmer
(1933) and is strongly upheld in later studies. In this section, major findings of prior
studies are presented.
3.2 Previous Findings from L2 Collocation Research 43
3.2.1 Forms of Collocation Deficiency: Overuse, Underuse

and Misuse3
3.2.1.1 Overuse and Underuse
Explorations of L2 learners’ collocational performance through quantitative com-

parisons of native and non-native uses [a methodological approach called
Contrastive Interlanguage Analysis by Granger (2002: 11f)] reveal that L2 learners
operate more on the “open choice principle” than the “idiom principle”, using fewer
collocations than their native-speaker counterparts. In addition to insufficient col-
location uses, they are found to overuse and underuse certain collocations (e.g. Ädel
and Erman 2012; Cobb 2003; De Cock et al. 1998; Durrant and Schmitt 2009;
Foster 2001; Granger 1998a; Howarth 1996, 1998a, b; Laufer and Waldman 2011;
Lorenz 1999; Kaszubski 2000; Yorio 1989). In an investigation of the verb–noun
collocations produced by both native and non-native speakers of English, Laufer
and Waldman (2011) found a far lower number of verb–noun collocations produced
by L2 learners (5.9%) compared with their native-speaker counterparts (10%).
Similar results were reported by Howarth (1998a) and Granger (1998a). In
Howarth’s (1998a) study, the percentage of conventional verb–noun collocations
was 25% among the writings by L2 learners, compared with 38% for
native-speakers. Likewise, Granger (1998a) observed that NNSs used significantly
fewer intensifying adverbs ending in ly (e.g. completely, highly) in terms of both
types and tokens. As to the use of lexical bundles (e.g. this can be seen, there seems
to be), L2 learners also manifested much lower use (60) compared with native
speaker peers, whose writing showed a considerably large number of lexical bun-
dles (130) (Ädel and Erman 2012). Failing to make a wide use of native-like
expressions often results in a lack of diversity in writing, which leads to a sense of
foreignness and even oddness in NNSs’ writings.
The lack of diversified use of collocations is also characterised by overuse and
underuse of certain collocations. In the spoken production of formulaic sequences
and routines by L2 learners, both De Cock et al. (1998) and Foster (2001) dis-
covered an overuse of some vagueness tags (e.g. and so on) and a highly significant
underuse of other vagueness tags (e.g. sort of thing, stuff like that). Turning to
written performance, Yorio (1989) recorded that in the writings of 25 ESL students,
students produced a far lower proportion of ‘idiomatic’ phrasal verbs (e.g. bring up)
than natives. Through analysing the adjective—intensifier combinations amongst
the writings of intermediate and advanced learners with L1 German, Lorenz (1999)
concluded that learners underused more restricted collocations and overused col-
locations that are less restricted. The type of overused collocations are always
linked to lexical combinations in learners’ L1 (e.g. Granger 1998a; Kaszubski
3
The findings of L2 collocation studies were neatly summarised into overuse, underuse and misuse
by Laufer and Waldman (2011) and Paquot and Granger (2012) in their review of L2 collocation
studies. This broad summarisation is employed in the present study.
2000). In Granger’s (1998a) study, the widely used combinations (e.g. closely
linked, deeply rooted) typically had a close translation equivalent in learners’ L1
French, but combinations non-congruent with their L1 were underused (e.g. com-
binations with highly, which is relatively much less frequent in French) (1998a:
148f). The same pattern is discerned in the use of discourse frames by French
learners of English. They were reported to massively overuse the active voice
frames which correspond to the uses of sentence introductory phrases in French
(e.g. We can see that…).
As Cobb (2003: 408) pointed out, what distinguishes L2 learners from NSs “is
the small number of precasts4 advanced learners have at their disposal, and the
extent to which these are used and overused”. The underlying reason for the
overuse and underuse phenomena that emerge in L2 learners’ collocation uses is
that learners tend to “‘cling on’ to certain fixed phrases and expressions which they
feel confident in using” (Granger 1998a: 156). These fixed phrases and expressions
become their ‘safe bets’ (ibid: 148), ‘islands of reliability’ (Dechert 1983: 184),
even referred to cutely as ‘lexical teddy bears’ (Hasselgren 1994: 237) or ‘collo-
cational teddy bears’ (Nesselhauf 2005: 69). Therefore, learners’ heavy reliance on
familiar collocations leads to overuse and avoidance of those which they are unsure
in using leads to underuse. These non-native features of L2 collocation production
are in fact not surprising since in the process of interlanguage development,
overuses and underuses of collocations are unavoidable phenomena, as is the case
with the use of grammatical structures or lexis. Additionally, the comparison of
collocational uses between NSs and NNSs will inevitably reveal less diversified
uses in learners since L2 learners not attaining native-like proficiency naturally
cannot reach a level on a par with NSs. This is where Contrastive Interlanguage
Analysis encounters criticism, to the effect that there tends to be an oversimplified
generalisation of learners’ overuse and underuse when their language is in direct
comparison with native speakers’ (Li 2009: 16). In other words, overuse and
underuse is hardly a specific problem of collocation. What is more important in L2
collocation studies is to investigate the forms of misuses and find the underlying
difficulties confronted with collocation learning.
3.2.1.2 Collocation Misuse
Previous L2 collocation studies report a large proportion of inappropriate uses of

collocations. Nesselhauf (2005) investigated the verb–noun collocations in a corpus
of writings by advanced German-speaking learners of English, and one of her
principal findings was that approximately one third of the collocations were
unacceptable or questionable. She concluded that advanced learners had consid-
erable difficulties in selecting the correct verbs in verb–noun collocations. The
proportion of erroneous collocations is supported by Laufer and Waldman (2011),
‘Precast’ refers to prefabricated chunks by Cobb (2003).

4
in whose study learners at three proficiency levels produced about a third of

erroneous verb–noun collocations. An even larger proportion of errors were iden-
tified in a gap filling task where learners were asked to supply the correct collocates
of frequent adjective–noun or noun–noun combinations, and L2 learners achieved
an average accuracy of only 34% (Hoffman and Lehmann 2000).
Although L2 learners produce a large proportion of collocation errors, past
studies indicate that not all types of collocations pose equal problems to L2 learners.
They experience greater difficulty in producing verb + noun collocations by L2
learners than with other types of collocations, e.g. adjective + noun collocations. In
a cross-sectional study on the development of Greek ESL learners’ collocation
knowledge, Gitsaki (1999) discovered a developmentally determined acquisition
order: adjective–noun collocations were “easy” and “early acquired” type of col-
locations and verb–noun collocations, were the “difficult” and “late acquired” ones.
In Siyanova and Schmitt’s (2008) study on the production of adjective–noun col-
locations by Russian advanced learners of English, they noted that a large per-
centage of learners’ collocations were appropriate (75.3% were attested at least
once in the BNC). Among the appropriate collocations, around 45% of the collo-
cations were not only appropriate, but also frequent and strongly associated English
word combinations. Only one-quarter of the combinations were not attested in the
BNC. However, the absence of these collocations in the BNC does not mean these
collocations are erroneous since they “found evidence that many of these were, in
fact, appropriate as well” (ibid: 437). When L2 learners’ performance in AN col-
locations was compared with that of native speakers, very little difference was
found between the use of appropriate collocations. Moreover, not only good per-
formance on adjective + noun collocations has been observed in previous research,
an improvement of collocational knowledge with rising proficiency is identified.
Gitsaki’s (1999) study showed that learners’ accuracy on adjective–noun colloca-
tions (e.g. sore throat, marine life, heavy drinker in Gitsaki’s study) increased as
they became more and more proficient. Likewise, the better command of adjective–
noun collocations with rising proficiency was also found in Zhang and Chen’s
(2006) study. In a cross-sectional study to test both the receptive and productive
knowledge of English adjective–noun collocations among three groups of EFL
learners at different proficiency levels, higher level subjects were found to have
obviously better command of AN collocations than lower levels in the acceptability
judgment tasks and translation exercises. Results indicate learners’ adjective–noun
collocational knowledge develops along with the rise in language proficiency.
In spite of learners’ good performance on adjective + noun collocations, col-
locations in general undoubtedly pose great learning difficulties even for proficient
L2 learners. Most researchers are in agreement that this learning difficulty involves
the arbitrary restrictions in word combinations. Findings arising out of studies
investigating L2 learners’ erroneous collocations and the degree of restriction in a
word combination suggest that combinations with a medium degree of restriction
are more prone to errors than more restricted combinations (Martelli 2006;
Nesselhauf 2003, 2005). Nesselhauf (2003) for example, found that more restricted
collocations like pay attention and run a risk were less prone to errors than
combinations where a verb takes a wider range of nouns (verbs like exert, perform,
reach). It was further suggested that more restricted collocations were learnt as
wholes whereas the less restricted ones were used creatively (ibid: 233).
Some researchers point out that the relative infrequency of individual colloca-
tions in input is a problem for L2 collocation learning. Henriksen (2013: 49) argues
that “collocations are more low-frequent than the words that make up the collo-
cations, and learners therefore mostly lack sufficient exposure to collocations”.
Exposure to collocations is good for the learning of a second language and for L2
collocations as well, and the lab-based study of collocation learning by Durrant and
Schmitt (2010) has confirmed that frequent input helps the learning of collocations.
A large amount of collocation input is a contributor in collocation learning, as
language input is beneficial for the learning of other L2 aspects, but it is not
sufficient. L2 learners do not pay attention to collocational relationships between
words even when they encounter collocations (Wray 2002). Unlike collocation
acquisition by native speakers, L2 learners are influenced by their mother tongue in
both collocation learning and production, and the influence of learners’ L1 is a
significant factor commonly identified as linked to (mostly erroneous) collocation
production in L2 collocation studies. The next section will discuss the role of
learners’ mother tongue in the learning of L2 collocations.
3.2.2 The Role of Learners’ L1
Past studies have demonstrated that L1 plays an important role in L2 collocation

learning, though no empirical evidence has been gathered to compare L1 influence
on collocations with its influence on other aspects of acquisition (e.g. phonology,
syntax and morphology) through quantifying and comparing the percentage of
interference errors.5 Studies of the influence of L1 on L2 collocation acquisition
generally fall into two major areas: those focusing on L2 learners’ collocations in
terms of L1-induced inappropriate uses and those exploring the potential influence
of L1 in terms of L1 and L2 congruent and non-congruent collocations.
3.2.2.1 L1 Influence in Terms of L1-Induced Inappropriate

Collocation Uses
Many collocation studies into L2 learners’ collocation performance discovered

traces of L1 in erroneous collocations (e.g. Biskup 1992; Farghal and Obeidat 1995;
Martelli 2006). Biskup (1992) observed the translation performance on collocations
5
There is evidence that for advanced L2 learners, L1 influence plays a marginal role in the
acquisition of word formation devices (Olshtain 1987), but L1 is believed to play a larger role in
lexis.
by Polish and German learners and compared their collocation errors in terms of
cross-linguistic influence. She found that for Polish learners of English, the errors
were loan translations or extension of L2 meaning on the basis of the L1 words,
whereas German learners tended to produce errors resulting from assumed formal
similarity, e.g. to crack nuts as *to crunch nuts. Biskup (1992) interpreted these two
types of L1 influence as the perceived differences between languages on the part of
learners: the Polish learners saw a distance between Polish and English and thus did
not assume much formal similarity, whilst German learners assumed more formal
similarity between their mother tongue and English. Farghal and Obeidat (1995)
analysed the tendencies of lexical simplification that learners followed in two
elicitation tasks: blank filling and a translation task. Four strategies that learners
adopted in producing collocations were distinguished: synonymy, avoidance,
transfer and paraphrasing, among which transfer took up 9.9 and 12.9% of all
attempted collocations among two groups of learners. However, some caution is
needed here since “avoidance” as a strategy is a complex phenomenon, and it is not
clear whether subjects in Farghal and Obeidat’s study knew the target collocations
but preferred avoiding them and used other forms instead. One example given is
light food,6 for which learners produced soft food, little food, quick meal, etc. To
call this strategy avoidance rather than transfer is questionable since avoidance is
one manifestation of language transfer (Ellis 1994). So the percentage of L1 transfer
might, make up an even larger proportion in Farghal and Obeidat’s data. L1
influence was also confirmed in Martelli’s (2006) study in which it had a relevant
role in the generation of wrong lexical collocations. However, unlike Farghal and
Obeidat (1995), the proportion of L1-induced errors was not quantified in Martelli’s
study.
The traces of L1 in erroneous collocation uses have been investigated and
quantified, with findings showing that L1-influenced errors make up a large amount
of errors even for learners at advanced levels. In the erroneous uses of verb–noun
collocations by Chinese learners of English with different proficiency levels, Zhang
and Gao (2006) noticed a varied proportion of L1-influenced errors, from nearly
one third to more than a half. Likewise, L1 influenced errors were most frequent in
Nesselhauf’s studies, where L1 influence occurred in about half of the non-native
collocations (2003, 2005). A higher percentage of L1 induced errors—over 60% of
those produced by intermediate and advanced learners were identified by Laufer
and Waldman (2011) and the number of L1-induced errors was not found to
decrease over time. Apart from L1-transfer errors, another consequence of heavy
reliance on their mother tongue in collocation production is the overuse of certain
collocations that are similar between two languages and underuse of patterns that
are mismatched in two languages (cf. Sect. 3.2.1.1).
6
Light food is actually not a target-like collocation. Light meal is a native-like one.
3.2.2.2 The Role of L1 in Terms of L1 and L2 Collocation (Non)

Congruence
Another line taken by past studies of the role of L1 in L2 collocation acquisition

examines the potential influence of L1 lexical combinations on the learning of L2
collocations from the perspective of (non)congruence. Collocations in an L2 are
either congruent, with direct translation equivalents, or non-congruent, without
direct translation equivalents with learners’ L1. It is maintained that L1 congruency
may facilitate, and non-congruency hinder, the acquisition of L2 collocations
(Bahns 1993; Philip 2007; Wolter 2006; Wolter and Gyllstad 2011; Yamashita and
Jiang 2010). Wolter (2006) provided a theoretical account for how learners’
already-established L1 lexical and conceptual knowledge might influence the
building of word connections in the L2 lexicon, and argued that for L2 learners, the
assimilation of new L2 words into paradigmatic hierarchical connections (e.g. dog-
animal, dog-terrier, dog-cat) is easier than the process of building syntagmatic
connections between L2 words (e.g. small room), since the latter requires
restructuring of the existing L1 network. For example, Japanese learners of English
have to restructure their network when encountering a small room since in their L1
the concept of ‘a small room’ is expressed as a narrow room. Learners’ L1 lexical
network “acts as an integrated set of ‘placeholders’ for L2 lexical items” (Wolter
2006: 743) and this L1 knowledge can be useful for acquisition of L2 collocations
that are similar to the L1 and be a hindrance if discrepancies occur. Bahns (1993)
holds the same view as Wolter (2006). Illustrating two types of congruent and
non-congruent collocations in German and English, he made a strong argument that
among the tens of thousands of collocations over which L2 learners should have
command, only the ones which are non-congruent with learners’ L1 should be
taught. The validity of this argument which ignores congruent collocations and
focuses exclusively on non-congruent ones is not flawless and will be discussed
below in Chap. 9.
The potential influence of L1 on the learning of L2 collocations at a psy-
cholinguistic level is not only theoretically predicted but also explored in experi-
ments through psycholinguistic techniques. Yamashita and Jiang (2010)
administered a real-time (online) phrase-acceptability judgment task to a group of
native speakers of English, Japanese English as a second language (ESL) users, and
Japanese English as a foreign language (EFL) learners. They were tested on both
congruent and non-congruent collocations. Subjects were asked to read a stimulus
presented on a computer screen and make a judgment about its acceptability by
pressing a Yes or No button on a keyboard as quickly as possible. Results showed
that both EFL and ESL learners made more errors with non-congruent collocations
than congruent collocations, and EFL learners reacted more slowly to
non-congruent collocations than to congruent ones. Another principal finding is the
indication that once non-congruent collocations are stored in memory, they are
processed autonomously without word-by-word mediation of the L1. In a similar
vein, Wolter and Gyllstad (2011) administered a primed Lexical Decision Task to a
group of L1 Swedish learners of English and a group of English native speakers
serving as controls on collocations in three conditions: congruent collocations,

non-congruent collocations and unrelated items for baseline data. Their aim was to
investigate whether collocational priming7 occurs, and whether L1 knowledge
influences how L2 collocations are processed. Results demonstrated that words
prime their collocates, as evidenced by the significant differences in reaction times
between collocations and unrelated items for the native speaker group. For NNSs,
interesting findings arise with regard to responses to congruent and non-congruent
collocations. Firstly, congruent collocations received more primings than
non-congruent collocations and the latter were responded to more slowly than
congruent ones. Secondly, for non-congruent collocations, no significant difference
of error rates was found with those of congruent collocations, and it was suggested
that once non-congruent collocation “is recognised as a legitimate collocation in the
L2, it becomes stored as such psychologically and when the first word in the
collocation is observed the second word of the collocation is anticipatorily acti-
vated” (Wolter and Gyllstad 2011: 442). Furthermore, another principal finding
suggesting the active role of L1 is that a ‘dual-activation’ was found for learners
performing tasks entirely in an L2: when an L2 word was activated, it stimulated
not only the L2 word’s collocates, but also the L1 translation equivalent and its L1
collocate.
Both the studies conducted by Yamashita and Jiang (2010), Wolter and Gyllstad
(2011) indicate a considerable influence of learners’ L1 on the processing of L2
collocations in the mental lexicon. It is widely acknowledged that L1 plays a role in
L2 acquisition, since late learners have already acquired a well-developed L1
lexical and conceptual network. Yet collocation acquisition seems highly suscep-
tible to L1 influence (Paquot and Granger 2012: 140). This can be accounted for
with the existing models of the structure of bilingual memory. One of the influential
models about the bilingual memory of late learners is proposed by Kroll and
Stewart (1994). According to their Revised Hierarchical Model (see Fig. 3.1), there
are asymmetric links between lexical representations and between lexical repre-
sentations and concepts. Though the lexical and conceptual links in both L1 and L2
lexica and the lexical links between L1 and L2 are bidirectional, they differ in
strength, i.e. the lexical link from L2 to L1 is assumed to be stronger than the lexical
link from L1 to L2, and the link from L1 to conceptual memory is assumed to be
stronger than the link from L2 to conceptual memory (Kroll and Stewart 1994:
158). The RHM reflects the consequences of L2 acquisition in late learners, who, on
the one hand, possess a fully developed L1 lexicon and their associated concepts (so
the stronger link between L1 words and concepts) and on the other, L2 words
access meaning via L1 translation equivalents to (so a strong lexical link from L2 to
L1). For words that have no corresponding translations, a very different process
may be involved (Jiang 2002).
7
Collocation priming is “the tendency for an activated word to accelerate the subsequent recog-
nition of a collocate” (Wolter and Gyllstad 2011: 431).
Fig. 3.1 Kroll and Stewart’s

(1994) Revised Hierarchical
Model
In the meantime, the RHM proposes that the organisation of bilingual memory
changes with rising L2 proficiency, in the form of developing the ability to con-
ceptually process L2 words directly, without mediation of L1 translation equiva-
lents. However, studies show that even for proficient learners, the L1 translation
equivalent is activated when processing the L2 word for meaning access (Thierry
and Wu 2007).8 Thus, if the acquisition of L2 lexical knowledge is initially clinging
onto L1 lexical/conceptual networks as the RHM predicts, it seems highly likely
that in the production process, L1 is firstly activated prior to the production of L2
words (cf. the ‘dual-activation’ in Wolter and Gyllstad (2011) discussed above).
That is where L1 transfer begins and yet not all features of L1 are activated and
transferred to the L2. According to Jiang’s (2000) psychological model of L2
vocabulary acquisition for late bilinguals (see Fig. 3.2), a majority of L2 words
fossilise at the first language lemma mediation stage, when the lemma information
of the L1 (containing semantic and syntactic information) is copied into the L2
lexical entry whilst lexical information at the lexeme level (containing morpho-
logical and phonological/orthographic specifications) is stored in the L2 lexical
entry.
The semantics and syntax of an L2 are quite likely to be influenced by learners’
L1. Considering the nature of collocations—the arbitrary co-selection of word
combinations, collocations are primarily word combinations representing syntactic
8
Kroll et al. (2010) argue that proficient bilinguals may access the translation equivalent after they
understand the meaning of the L2 word. The exploration of intricacies of this debate on whether
highly proficient bilinguals access meaning for L2 words through the mediation of their L1 is
beyond the scope of the present study and won’t be discussed further.
Fig. 3.2 Jiang’s model of fossilised L2 lexical knowledge (cited in Wolter and Gyllstad 2011:
446)
and semantic relationships between lexical items. In other words, collocation is

positioned at the part of the shadowy area between grammar and meaning (Nation
1990: 38).Therefore, collocations are highly susceptible to L1 influence.
However, the influence of learners’ L1 in the learning of collocations in terms of
(non)congruence has not been fully examined through the production of colloca-
tions. One common feature of the studies by Yamashita and Jiang (2010) and
Wolter and Gyllstad (2011) is that they tap L2 learners’ receptive collocation
knowledge by conducting psycholinguistic experiments. As discussed in
Sect. 3.1.1, there is always a gap between learners’ receptive knowledge and their
natural production. In their studies, L2 learners were more likely to accept collo-
cations similar to their L1 as legitimate collocations than those that are different
from their L1 patterns, but it is not clear yet whether they at the same time produce
congruent collocations with more accuracy than non-congruent ones. If so, Bahns’s
(1993) claim that only non-congruent collocations need to be taught appears valid.
However, Nesselhauf’s (2003, 2005) studies of L2 learners’ production of con-
gruent and non-congruent collocations challenge Bahns’s claim. 11% of the con-
gruent collocations produced by learners in her studies were found to be erroneous
and the percentage of erroneous non-congruent collocations among all the
non-congruent ones was 42% (Nesselhauf 2003). Similar ratios were obtained in
another study: 17% for the erroneous congruent and 42% for the erroneous
non-congruent collocations (Nesselhauf 2005). These figures show that although
congruent collocations are easier than non-congruent ones for L2 learners, they also
pose problems; on the other hand, not all non-congruent ones pose problems, since
nearly a half of them are not particularly problematic for L2 learners. However,
these figures are to be treated with caution since, according to Nesselhauf (2003,
2005), congruence is not only measured on the level of content words, but also on
the level of grammatical words. For example, a combination such as participate in
an event is considered non-congruent since for the German equivalent an einem
Ereignis teilnehmen, German an and English in are not considered translation
equivalents. Including grammatical items in defining (non-)congruence would
naturally lead to more non-congruent collocations since “items in closed gram-

matical classes normally behave differently across languages” (Salkie 2002: 56). In
addition, it seems paradoxical that phrasal verbs were treated as a single word and
differences of particles (e.g. up, over) were disregarded. So if congruence is only
measured on content-word-level, more non-congruent collocations in her studies
would be classified into congruent ones, which leads to a modification in the
interpretation of her findings.
3.2.3 Collocation Lag
Apart from the deficiencies in collocation uses uncovered by previous studies,

collocation knowledge has also been measured against the knowledge of lexis and
learners’ proficiency, with the general findings showing a collocation lag in SLA
acquisition. Schmitt and Carter (2004: 13) pointed out that L2 learners’ formulaic
language tends to lag behind other aspects. Wray (2002: 182) conveyed the same
impression that “by the time the learner has achieved a reasonable command of the
L2 lexicon and grammar, the formulaic sequences appear to lag behind”. Empirical
evidence has been gathered so as to measure learners’ collocation knowledge by
comparison with other aspects of language learning. For example, Bahns and Eldaw
(1993) analysed learners’ translations in terms of collocation errors and errors with
lexis in order to determine whether learners’ knowledge of collocations was on the
same level with their knowledge of general vocabulary. Collocation errors were
reported to be more than twice as frequent as general lexical errors. They concluded
that learners’ “knowledge of general vocabulary far outstrips their knowledge of
collocations” (ibid: 108). In a different vein, learners in Barfield (2007) were asked
to report on their knowledge of individual words and on combinations of these
words, and it was found that learners reported better knowledge of the former than
the latter (Barfield 2007, cited in Laufer and Waldman 2011). Irujo (1993) exam-
ined the use of idioms produced by 12 bilingual native speakers of Spanish, whose
English was virtually indistinguishable from a native speaker’s, and discovered a
deficiency of their knowledge of idioms relative to their general production with
few grammatical or lexical errors.
A deficiency in collocation knowledge among learners at very advanced levels is
also indicative of a lag in collocation knowledge. Farghal and Obeidat (1995)
observed that language teachers of English who had a minimum of five to ten years’
teaching experience were seriously deficient in collocations. In Nesselhauf’s (2005)
study, students with 10–17 years’ English learning produced a similar proportion of
erroneous collocations as students who had only studied English for 5–10 years. It
was summarised that the “length of a learner’s exposure to English in
English-speaking countries was shown to probably have a slight effect on collo-
cational accuracy, whereas the number of years a learner had undergone classroom
teaching was shown to have no effect” (2005: 237). A comparative study on col-
location performance across learners at three proficiency levels conducted by
Laufer and Waldman (2011) showed a much stronger collocation lag, as the
advanced and the intermediate learners produced significantly more erroneous
collocations than the basic learners. Similar results were obtained in Obukadeta’s
study as discussed in Chap. 1. Despite the heterogeneity in L2 collocation studies,
these results clearly indicate a noteworthy lack of positive correlation between
general language proficiency and collocation knowledge.
The question whether collocation knowledge can be related with general pro-
ficiency is not of crucial importance here, since on the one hand, it is difficult to
establish a clear link between language proficiency and phraseological competence,
the former of which is usually loosely measured in terms of the number of years of
English instruction for research purposes (Paquot and Granger 2012: 137); on the
other, it is evident that collocation lags behind other aspects of L2 knowledge and
‘may floor even the proficient non-native’ (Wray 2000: 463). Thus, what is cen-
trally important is to investigate what factors are associated with collocation lag.
The poor phraseological performance even for learners at an advanced level is
explained in terms of lacking awareness of collocational relationship between
words, i.e. learners do not pay attention to collocational relationships, and collo-
cations “are initially seen as compositional combinations of words rather than as a
phenomenon of co-selection” (Philip 2007: 3). Studies testing learners’ intuition
about collocations that are frequent in the L2 show a weak sense of collocational
relationships (Channell 1981; Granger 1998a; Siyanova and Schmitt 2008). For
example, in examining the collocational competence of a group of eight advanced
learners who were asked to mark the acceptable collocates of adjectives from a list
of nouns, Channell (1981: 120) found that “learners fail to realise the potential even
of words they know well, because they only use them in a limited number of
collocations of which they are sure”. To test whether French learners of English had
an underdeveloped sense of what constitutes a significant collocation, Granger
(1998a) used a word-combination test in which subjects were asked to choose all
the adjectives which collocated with 11 amplifiers ending in ly and functioning as
modifiers (e.g. highly, bitterly). Learners marked over 100 fewer frequent collo-
cations than the native speakers, providing clear evidence of learners’ weak sense of
collocations compared with that of native-speakers (Granger 1998a: 152). In a
similar vein, participants in Siyanova and Schmitt (2008) rated native-like collo-
cations as far less frequent, and atypical collocations as more frequent than those by
NSs. The ignorance of collocating relationships in language input naturally leads to
production which is “subject to whatever interlanguage rules the learner is operating
under” (Yorio 1989: 62). A typical illustration of this process from inability to
recognise a collocation to utilising interlanguage rules is given by Wray (2002:
209):
… the adult language learner, on encountering major catastrophe, would break it down into
a word meaning ‘big’ and a word meaning ‘disaster’ and store the words separately,
without any information about the fact they went together. When the need arose in the
future to express the idea again, they would have no memory of major catastrophe as the
pairing originally encountered, and any pairing of words with the right meaning would
seem equally possible: major, big, large, important, considerable, and so on, with catas-
trophe, disaster, calamity, mishap, tragedy, and the like.
Wray’s explanation of the way learners treat and produce collocations is con-
sistent with Wolter’s (2006: 746) claim that “the process of building syntagmatic
connections between words in an L2 appears to be considerably harder than the
process for building paradigmatic connections”. Furthermore, the acquisition of
new words may interfere with the production of collocations in the selection of
appropriate collocates from a set of related words. One illustration of semantic
relatedness is synonymy and the use of synonyms has been identified as the most
frequent strategy adopted in producing collocations (Farghal and Obeidat 1995;
Irujo 1993). Thus it seems that L2 learners’ vocabulary size is closely linked with
their collocation learning, though most probably in a negative way. The study
conducted by Gyllstad (2005: 1), for example suggests that “learners with large
vocabularies have a better receptive command of verb + NP collocations than
learners with smaller vocabularies”. Yet there is still unclarity about the relationship
between vocabulary size and the production of collocations, which is important for
an understanding of L2 collocation learning. There is to date still a paucity of
research into the relationship between vocabulary increase and collocation
production.
Therefore, in order to uncover the underlying factors inhibiting L2 learners’
collocation learning, this study seeks to investigate the relationship between
vocabulary growth and collocation learning, particularly the increase in vocabulary
in a set of semantically related words.
3.3 Summary
This chapter has reviewed the growing body of research that has been undertaken in
the past several decades into collocation learning by L2 learners. A wide range of
data types has been utilised in past research, including elicitation and spontaneous
data, or a mixture of the two types. With each data type possessing unique
advantages, learner corpora are gaining more and more popularity in terms of either
naturalness of learner language or large quantities. These distinctive advantages of
learner corpora will be further explored, as the present study will utilise a corpus of
written English produced by Chinese EFL learners, in order to investigate their
collocation performance and vocabulary increase.
Past research into L2 collocation studies has covered many types of learners,
including learners of different mother tongues or different proficiency groups. Their
research foci are different types of collocations, i.e. verb + noun collocations,
adjective + noun collocations, lexical phrase, lexical bundles, routines, etc. Given
this heterogeneity in L2 collocation research, direct comparisons of research find-
ings are difficult, but there emerges a general picture for the learning of collocations
by L2 learners, i.e. collocation learning poses special difficulty for L2 learners, as
3.3 Summary 55
evidenced by deficiencies of collocation overuse, underuse and misuse even for

learners at advanced levels. As pointed out in Chap. 1, collocation knowledge is
believed to lag behind grammar and lexis and constitutes the “last and most
challenging hurdle in attaining near native-like fluency” (Spottl and McCarthy
2004: 191).
In general, past studies of L2 collocation acquisition have been concerned with a
description of learners’ collocation performance and in providing evidence of the
problems confronted with L2 learners. In this sense, research into second language
collocation learning is still at an early stage. What brings about, or contributes to,
the collocation lag still remains unclear. The point of departure of this research is
postulating that vocabulary increase can be associated with collocation lag, and the
study thus aims to test this prediction through a developmental study of collocation
learning by Chinese EFL learners.
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and
non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31
(2), 81–92.
Aijmer, K. (2009). “So er I just sort of I dunno I think it’s just because…”: A corpus study of “I
don’t know” and “dunno” in learner spoken English. In A. H. Jucker, D. Schreier, & M. Hundt
(Eds.), Corpora: Pragmatics and discourse (pp. 151–166). Amsterdam: Rodopi.
Al-Zahrani, M. S. (1998). Knowledge of English lexical collocations among male Saudi college
students majoring in English at a Saudi university. Ann Arbor, MI: UMI.
101–114.
Barfield, A. (2007). An exploration of second language collocation knowledge and development.
Swansea: University of Swansea.
Biskup, D. (1990). Some remarks on combinability: Lexical collocations. In J. Arabski (Ed.),
Foreign language acquisition papers (pp. 31–44). Katowice: Uniwersytet Slaski.
Biskup, D. (1992). L1 influence on learners’ renderings of English collocations: A Polish/German
empirical study. In P. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics
(pp. 85–93). London: Macmillan.
Bonk, W. (2001). Testing ESL learners’ knowledge of collocations. In T. Hudson & J. D. Brown
(Eds.), A focus on language test development: Expanding the language proficiency construct
across a variety of tests (pp. 133–142). Technical Report 21. Honolulu: University of Hawai’i,
Second Language Teaching and Curriculum Center.
Channell, J. (1981). Applying semantic theory to vocabulary teaching. ELT Journal, 35(2),
115–122.
Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three
European studies. Canadian Modern Language Review, 59(3), 393–423.
Crossley, S. A., & Salsbury, T. (2011). The development of lexical bundle accuracy and
production in English second language speakers. International Review of Applied Linguistics in
Teaching, 49, 1–26.
Dechert, H. W. (1983). How a story is done in a second language. In C. Faerch & G. Kasper
(Eds.), Strategies in interlanguage communication (pp. 175–196). London: Longman.
De Cock, S. (2011). Preferred patterns of use of positive and negative evaluative adjectives in
native and learner speech: An ELT perspective. In A. Frankenberg-Garcia, L. Flowerdew, & G.
Aston (Eds.), New trends in corpora and language learning (pp. 198–212). London:
Continuum.
De Cock, S., Granger, S., Leech, G., et al. (1998). An automated approach to the phrasicon of EFL
learners. In S. Granger (Ed.), Learner English on computer (pp. 67–79). London: Longman.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of
collocations? International Review of Applied Linguistics, 47, 157–177.
Durrant, P., & Schmitt, N. (2010). Adult learners’ retention of collocations from exposure. Second
Language Research, 26(2), 163–188.
Ellis, R. (1987). Second language acquisition in context. Hertfordshire: Prentice Hall.
Ellis, R. (1994). The study of second language acquisition. Oxford: Oxford University Press.
Fan, M. (2009). An exploratory study of collocational use by ESL students—A task based
approach. System, 37(1), 110–123.
Farghal, M., & Obeidat, H. (1995). Collocations: A neglected variable in EFL. International
Review of Applied Linguistics in Language Teaching, 33(4), 315–331.
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language
production of native and non-native speakers. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Researching pedagogic tasks: Second language learning, teaching and testing (pp. 75–93).
Harlow: Longman.
Granger, S. (1998a). Prefabricated patterns in advanced EFL writing: Collocations and formulae.
Granger, S. (1998b). The computer learner corpus: A versatile new source of data for SLA
research. In S. Granger (Ed.), Learner English on computer (pp. 3–18). London: Longman.
Granger, S. (2002). A bird’s eye view of learner corpus research. S. Granger, J. Hung, & S.
Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign
language teaching (pp. 3–33). Amsterdam: Benjamins.
Gyllstad, H. (2005). Words that go together well: Developing test formats for measuring learner
knowledge of English collocations. In F. Heinat & E. Klingval (Eds.), The Department of
English in Lund: Working papers in linguistics (Vol. 5, pp. 1–31).
Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into the ways
Norwegian students cope with English vocabulary. International Journal of Applied
Henriksen, B. (2013). Research on L2 learners’ collocational competence and development—A
progress report. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition,
knowledge and use: New perspectives on assessment and corpus analysis (pp. 29–56). Eurosla.
[2014-03-10]. http://www.eurosla.org/monographs/EM02/EM02tot.pdf.
Hoffman, S., & Lehmann, H. M. (2000). Collocational evidence from the British National Corpus.
In J. M. Kirk (Ed.), Corpora Galore: Analyses and Techniques in Describing English. Papers
from the Nineteenth International Conference on English Language Research on
Computerised Corpora (ICAME 1998) (pp. 17–32). Amsterdam: Rodopi.
Press.
24–44.
Hsu, J. (2007). Lexical collocations and their relation to the online writing of Taiwanese college
English majors and non-English majors. Electronic Journal of Foreign Language Teaching, 4
(2), 192–209.
References 57
Irujo, S. (1993). Steering clear: Avoidance in the production of idioms. International Review of
Applied Linguistics in Language Teaching, 31(3), 205–219.
Jiang, N. (2000). Lexical representation and development in a second language. Applied
Jiang, N. (2002). Form-meaning mapping in vocabulary acquisition in a second language. Studies
in Second Language Acquisition, 24, 617–637.
advanced learners of English: A contrastive, corpus-based approach. [2011-10-13]. http://main.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming:
Evidence for asymmetric connections between bilingual memory representations. Journal of
Memory and Language, 33, 149–174.
Kroll, J. F., Van Hell, J. G., Tokowicz, N., et al. (2010). The Revised Hierarchical Model: A
critical review and assessment. Bilingualism: Language and Cognition, 13(3), 373–381.
Li, J., & Schmitt, N. (2010). The development of collocation use in academic texts by advanced L2
learners: A multiple case study approach. In D. Wood (Ed.), Perspectives on formulaic
language: Acquisition and communication (pp. 23–46). London: Continuum.
Li, W. Z. (2009). A critical review of CIA. CAFLEC, 127, 13–17.
Lorenz, G. (1999). Adjective intensification—Learners versus native speakers: A corpus study of
argumentative writing. Amsterdam: Rodopi.
Martelli, A. (2006). A corpus based description of English lexical collocations used by Italian
advanced learners. In E. Corino, C. Marello, & C. Onesti (Eds.), Proceedings XII EURALEX
International Congress (pp. 1005–1011). Alessandria: Edizioni dell’Orso.
Marton, W. (1977). Foreign vocabulary learning as problem no. 1 of language teaching at the
advanced level. Interlanguage Studies Bulletin, 2, 33–57.
Men, H. Y. (2010). A corpus-based analysis of Chinese EFL learners’ adverb/adjective collocation
errors in English writing. Internet Fortune, (4), 108–109.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston, MA: Heinle & Heinle.
Olshtain, E. (1987). The acquisition of new word formation processes in second language
acquisition. Studies in Second Language Acquisition, 9(2), 221–231.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of
Applied Linguistics, 32, 130–149.
Penke, M., & Rosenbach, A. (2007). What counts as evidence in linguistics: The case of
innateness. Amsterdam: Benjamins.
Online Proceedings of Corpus Linguistics. [2014-01-12]. http://ucrel.lancs.ac.uk/publications/
Pu, J. Z. (2010). Corpora and unified language studies. Journal of PLA University of Foreign
Languages, 33(2), 41–44.
Salkie, R. (2002). Two types of translation equivalence. In B. Altenberg & S. Granger (Eds.), Lexis
in contrast: Corpus-based approaches (pp. 51–71). Amsterdam: Benjamins.
Scarcella, R. (1979). Watch up!: A study of verbal routines in adults second language
performance. Working Papers on Bilingualism, 19, 79–88.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt
(Ed.), Formulaic sequences: Acquisition, processing and use (pp. 1–22). Amsterdam:
Benjamins.
Schmitt, N., Dörnyei, Z., & Adolphs, S., et al. (2004). Knowledge and acquisition of formulaic
sequences: A longitudinal study. In N. Schmitt (Ed.), Formulaic sequences: Acquisition,
processing and use (pp. 55–86). Amsterdam: Benjamins.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A
multi-study perspective. Canadian Modern Language Review, 64(3), 429–458.
Spottl, C., & Mccarthy, M. (2004). Comparing knowledge of formulaic sequences across L1, L2,
L3 and L4. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use
Thierry, G., & Wu, Y. J. (2007). Brain potentials reveal unconscious translation during
foreign-language comprehension. Proceedings of the National Academy of Sciences of the
United States of America, 12530–12535.
Widdowson, H. G. (1979). Explorations in applied linguistics. Oxford: Oxford University Press.
Widdowson, H. G. (2000). On the limitations of linguistics applied. Applied Linguistics, 21(1),
3–25.
Willis, D. (2010). Three reasons why. In S. Hunston & D. Oakey (Eds.), Introducing applied
linguistics: Concepts and skills (pp. 6–11). London: Routledge.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1
lexical/conceptual knowledge. Applied Linguistics, 27(4), 741–747.
Wray, A. (2000). Formulaic sequences in second language teaching: Principles and practice.
Applied Linguistics, 21(4), 463–489.
Yamashita, J., & Jiang, N. (2010). L1 influence on the acquisition of L2 collocations:
Japanese ESL users and EFL learners acquiring English collocations. TESOL Quarterly, 44(4),
647–668.
Yip, V. (1995). Interlanguage and learnability: From Chinese to English. Amsterdam: Benjamins.
Yorio, C. A. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam
& L. K. Obler (Eds.), Bilingualism across the lifespan (pp. 55–72). Cambridge: Cambridge
University Press.
Zhang, W. Z., & Chen, S. C. (2006). EFL learners’ acquisition of English adjective-noun
collocations—A quantitative study. Foreign Language Teaching and Research, 38(4),
251–258.
Zhang, X. (1993). English collocations and their effect on the writing of native and non-native
college freshmen. Indiana: Indiana University of Pennsylvania.
Zhang, Y., & Gao, Y. (2006). A CLEC-based study of collocation acquisition by Chinese English
language learners. CELEA Journal, 29(4), 28–35.
Chapter 4
Research Design
The main focus of this corpus-based research is to examine the relationship between
vocabulary increase and collocation uses by Chinese learners of English, with the
aim to test whether vocabulary increase is associated with the collocation lag in the
field of second language acquisition. This chapter outlines the design of such a
cross-sectional study. Section 4.1 briefly mentions research purpose and research
questions; Sect. 4.2 presents the types of collocations to be targeted in learners’
writings and justifies why verb + noun, adjective + noun and noun + noun collo-
cations are chosen rather than other types of collocations; Sect. 4.3 introduces the
learner corpora used: Chinese Learner English Corpus; Sect. 4.4 explains the
selection of two collocation dictionaries to be referenced; Sect. 4.5 briefly intro-
duces the British National Corpus as a native speaker reference corpus to check the
acceptability and appropriateness of combinations; Sect. 4.6 lists the software
adopted for automatic data collection, for the creation of databases, and for sta-
tistical analyses. The main procedure of the study is presented in Sect. 4.7, followed
by a summary of this study design (Sect. 4.8).
4.1 Research Purpose and Questions
The aim of this study is to examine the relationship between vocabulary growth and
the production of L2 collocations. As was discussed in Chap. 1, verb + noun
collocations are targeted and the growth of verbs is examined from two perspec-
tives: from delexical verbs to lexical verbs and the increase in verbs within a
synonym set. The study sets out to answer the following questions regarding VN
collocations:
DOI 10.1007/978-981-10-5822-6_4
60 4 Research Design
1. What developmental patterns appear in the verb + noun collocations produced

by L2 learners, in terms of delexical verb and lexical verb + noun collocations?
a. Is there a tendency towards increasing use of lexical verb + noun colloca-
tions with rising proficiency?
b. Is there a tendency towards increasing errors with lexical verb + noun col-
locations, and decreasing delexical verb + noun collocation errors with ris-
ing proficiency?
2. Within specific semantic domains of the verbs in verb + noun collocations used
by all levels of learners, is there a tendency for these verbs, as they increasingly
occur at the higher levels, to be associated with collocation errors?
In addition to targeting the most frequent type of collocations (verb + noun
collocations), two other frequent types of collocations, i.e. adjective + noun
(AN) and noun + noun (NN) collocations will be further investigated in the same
way as VN collocations, in order to compare with the findings from VN colloca-
tions. Questions concerning these two types of collocations are:
3. Are adjective + noun and noun + noun collocations produced by Chinese L2
learners at the same accuracy level as verb + noun collocations? If not, what
patterns do they follow?
4. Within specific semantic domains of the adjectives in adjective + noun collo-
cations and nouns in noun + noun collocations used by all levels of learners, is
there a tendency for these adjectives/nouns, as they increasingly occur at the
higher levels, to be associated with collocation errors?
As stated in Chap. 1, the role of L1 will also be examined in the learning of L2
collocations by Chinese learners. This study will test whether congruent colloca-
tions are easier than non-congruent ones and whether non-congruent collocations
once acquired, are less prone to errors.
Thus the design of this study covers Chinese learners’ production of VN col-
locations, as well as their production of AN and NN collocations. The hypotheses
concerning verb + noun collocations, adjective + noun and noun + noun colloca-
tions are interlinked as they contribute to testing the vocabulary growth factor in
collocation learning, whilst the hypothesis regarding L1 influence is a separate
enquiry. For this reason, the following sections present the design for investigation
of VN, AN and NN collocations, whilst the detailed procedure for analysing L1
influence is given separately in Chap. 9.
4.2 The Selection of Verb + Noun, Adjective + Noun

and Noun + Noun Collocations
As was presented in Sect. 2.2.3, lexical collocations are divided into six types:
adjective + noun; (subject-) noun + verb; noun + noun; adverb + adjective; verb +
adverb; verb + (object-) noun (Benson et al. 2010; Hausmann 1989). Among these,
4.2 The Selection of Verb + Noun, Adjective + Noun and Noun + Noun Collocations 61
verb + noun collocations were primarily targeted because they are the most fre-
quent and important (Benson et al. 2010; Howarth 1996, 1998a) and at the same
time constitute a frequent source of difficulty for L2 learners (Bahns and Eldaw
1993; Benson 1985; Biskup 1990; Cowie 1991, 1992; Gitsaki 1999; Howarth
1998a, b; Nesselhauf 2005; Palmer 1933). Furthermore, verb + noun collocations
are the most frequent type of collocation errors in the learner corpus we are going to
investigate (1572 out of 2940 tokens of collocation errors made by six levels of
Chinese learners of English).
However, in VN collocations, verbs and nouns were not given equal weight in
this study. The focus was on verbs, based on the assumption that it is the nouns
(nodes) that determine the verbs to go with.1 In both speech and writing, words are
produced and arranged in a linear sequence, which makes the production process
seem as if the preceding words select the following ones (verbs/adjectives select the
following nouns). In fact it is the noun where language users generally start from
when forming ideas (Cowie 1998; McIntosh et al. 2009) and “the most important
kind of collocations sought by a writer or translator is the one based on the noun, for
it is the noun that sets the semantic context of the sentence” (Kozlowska and
Dzierzanowska 1988: 8). With the case of lingering doubt as an example, doubt
selects an acceptable adjective—lingering but not loitering (Cowie 1998: 222–223).
The selection of the verb or the adjective by the noun is reflected in the organisation
of collocation dictionaries like Selected English Collocations (Kozlowska and
Dzierzanowska 1988), where nouns are the headwords. Therefore, verbs in VN
collocations produced by L2 learners will be investigated since the appropriate verb
has to be chosen to collocate with the noun previously selected (e.g. acquire
knowledge but not *grasp knowledge, play a role but not *occupy a role).2
In addition to verb + noun collocations, two other types of collocations were
briefly examined in the same way as VN collocations to compare with findings
from analyses of VN collocations. They are adjective + noun and noun + noun
collocations, which are the top two most frequent word combinations used by
native speakers of English (Johansson and Hofland 1989). They are not only fre-
quent, but also susceptible to error in L2 collocation performance. Adjective + noun
and noun + noun collocation errors are the second and third most frequent types of
collocation errors according to the error analysis of Chinese Learner English Corpus
(henceforth referred to as CLEC) (Gui and Yang 2003). Yet these error types have
scarcely been focused on in previous L2 collocation studies (except in the studies
1
Similar approach is taken by Bahns and Eldaw (1993: 103), by whom the noun was viewed as the
node and a verb as collocate. Howarth (1998a) on the contrary cast doubts on the direction of
selection from the noun to the verb through the erroneous collocations *the contrast is drawn and
*place weight on, and speculated it is the other way around. However, the two examples are just
indicative that nouns (contrast and weight) are preselected, but the collocating verbs went wrong
(draw and place).
2
In fact, in the process of the extraction of verb + noun collocations, we found that wrong choices
of the nouns were rather rare and there were only a few instances such as *solve the question. It
will be exemplified in Sect. 4.7.2.
on AN collocations conducted by Martelli 2006; Siyanova and Schmitt 2008). The

investigation of these two types of collocations was aimed at finding out whether
the findings with learners’ use of these two types are in line with the findings on
verb + noun collocations.
4.3 The Learner Corpus—CLEC
The present study focuses on the developmental patterns of Chinese EFL learners’
use of collocations and thus requires a longitudinal corpus of the performance of
Chinese learners over an extended period of time, or else a corpus of the perfor-
mance of Chinese learners at different proficiency levels in English, i.e. an
apparent-time approach to the study of language development. A longitudinal
corpus would be much preferred, but it is unfortunately unavailable at the time of
the research. However, a corpus of the written performance of learners at different
proficiency levels makes an apparent-time study possible. This study made use of
the one-million-word Chinese Learner English Corpus, a computerised textual
database of writings by Chinese learners at five different levels of proficiency.
CLEC is homogenous in the sense that all learners are Chinese learners of English;
at the same time it is heterogeneous because it represents learners at different
developmental stages. The apparent-time design assumes that the performance of
different age groups of learners at different proficiency levels is indicative of suc-
cessive stages of development. The inclusion of learners of five learning stages is a
distinct advantage of CLEC, which cannot be outweighed by other published
learner corpora. Other widely used and more recent corpora recording the written
performance of Chinese learners include Spoken and Written English Corpus of
Chinese Learners (SWECCL) (Wen et al. 2008), the Chinese sub-corpus of ICLE
(Granger et al. 2009) and the British Academic Written English Corpus (BAWE)
(Nesi 2011). SWECCL documents the spoken and written data of English major
university students and it records the writings of English majors of four grades. The
Chinese sub-corpus of ICLE contains argumentative essays written by higher
intermediate to advanced Chinese learners of English at universities, and BAWE
contains texts from proficient Chinese undergraduate students studying in several
UK universities. Though these learner corpora are newer and some of them are
larger than CLEC, they only cover learners of a fixed or limited range of proficiency
levels. Learners below the university level are not targeted in these corpora. CLEC,
therefore, is chosen for study.
We next provide a brief introduction to CLEC. The Chinese Learner English
Corpus is a one-million word collection of compositions produced by Chinese
learners of English at five developmental phases: high school students (coded by the
developers as ST2, approximately corresponding to the upper levels of secondary
school students in the UK), non-English major university students of lower grades
(ST3) and higher grades (ST4), and English majors of lower grades (the first and
second years, ST5) and higher grades (the third and fourth years, ST6) (Gui and
4.3 The Learner Corpus—CLEC 63
Yang 2003). The writings of the five groups of learners were recorded in separate
files, each approximating 200,000 words. For the convenience of discussion about
the five sub-corpora, it is preferable to achieve uniformity between learner types and
the corresponding files of their writings. Therefore, the files were named as follows:
• ST2: high school students; the written performance by ST2 learners;
• ST3: first and second-year non-English major university students; the written
performance by ST3 learners;
• ST4: third and fourth-year non-English major university students; the written
performance by ST4 learners;
• ST5: first and second-year English majors; the written performance by ST5
learners;
• ST6: third- and fourth-year English majors; the written performance by ST6
learners.
The project was undertaken by teachers of English in various universities across
three cities (Guangzhou, Shanghai, Xinxiang) in China, so the learners targeted in
CLEC were from several middle schools/universities rather than from one particular
institution. The sub-corpora of ST2, ST5 and ST6 are made up of learners’ free
compositions, whereas ST3 and ST4 consist of timed writings for tests (the national
general English proficiency tests: Band 4 and Band 6). In terms of text types, the
assignments for the ST2 group were mainly narrative (probably it is still early for
Chinese middle school students to develop argumentative writing skills), and their
writings were not confined to one topic. The ST3 and ST4 sub-corpora contain
argumentative essays. For the texts produced by ST5 learners, they are not confined
to one or two topics and are partly argumentative and partly narrative. The ST6
sub-corpora contain argumentative essays, and most of the topics are what the ICLE
suggested in their data collection process, which are “should euthanasia be legalised
in China?”, “Crime does not pay.”, “the abolition of prison systems”, “the value of
university degrees” and “Should a man/woman’s financial reward be commensurate
with their contribution to the society they live in?”, etc.
Not all the five sub-corpora were examined in this research. As noted earlier, the
sub-corpora of ST2, ST5 and ST6 are made up of learners’ free compositions,
whereas ST3 and ST4 consist of timed writings for tests. Only the ST2, ST5 and
ST6 learner files were used, as they contain the same data type—free compositions.
So it is assumed they did not undergo the time and mental pressure in timed writing
for tests and they could turn to referencing tools for help in the writing process. The
three groups of learner data are thus homogenous. It is furthermore generally
assumed that the quantity of formal English instruction learners receive is indicative
of their proficiency in English. Thus, it is possible to conduct an apparent-time
study on Chinese EFL learners’ collocation performance on the basis of the years of
instruction they get. In light of the components of CLEC, a clear dividing line in
English proficiency is observable, i.e. pre-university Chinese EFL learners (ST2)
and university-level learners (ST3, ST4, ST5 and ST6). Likewise, university-level
students can be further divided into non-English majors (ST3 and ST4) and English
majors (ST5 and ST6). However, it cannot be claimed that the proficiency of ST3,
ST4, ST5 and ST6 learners is in a continuum because of the difference in majors
(non-English major vs. English major). It is possible that some non-English majors
are better than English majors in the overall English performance. Consequently,
the intensity of English instruction is not used as a criterion to distinguish the
proficiency level of non-English and English majors, even though the latter might
be better than the former in general.
For these reasons, three groups of learners were examined in this study: ST2—
pre-university high school students, categorised as the “basic” level, ST5—English
majors of lower grades as the “intermediate” level, and ST6—English majors of
higher grades as the “advanced” level. The classification of the levels, as discussed
above, is based on the years of English instruction learners receive, and is mainly
adopted for straightforward comparison. The ST2 learners had at least 3–5 years’
classroom English instruction in China; the ST5 group had at least 6–7 years and
the ST6 learners had English instruction for at least 8–9 years.
4.4 Collocation Dictionaries for Reference
As stated in Chap. 2, the approach taken to collocation in this study is mainly

phraseological, and also frequency-based. For this reason, two collocation dic-
tionaries were selected so as to check on well-formed and erroneous collocations
produced by Chinese learners. One is non-corpus-based—the BBI Combinatory
Dictionary of English (3rd Edition) (Benson et al. 2010) (henceforth: the BBI),
while the other is corpus-based—Oxford Collocations Dictionary for Students of
English (2nd Edition) (McIntosh et al. 2009) (henceforth: the OCDSE).3 The BBI as
a collocation dictionary represents the phraseological approach and the OCDSE the
frequency-based approach. The BBI includes both grammatical and lexical collo-
cations and has a neatly presented organisation for the noun entries (e.g.
verb + object noun, adjective + noun and noun + noun collocations). It has been
widely consulted in studies conducted by Cowie (1992), Gitsaki (1999), Howarth
(1996), Laufer and Waldman (2011), and Nesselhauf (2004). However, for the BBI,
some common collocations have escaped the intuitions of the authors (Klotz 2003:
58) (e.g. do sport, give a comment which are absent in the BBI, but recorded in the
OCDSE). Collocations from the OCDSE are retrieved from a large reliable database
of actual language use—the Oxford English Corpus of over two billion words. It is
preferable to other corpus-based lists of word combinations, e.g. the Frequency
Analysis of English Vocabulary and Grammar (Vol. 2) (Johansson and Hofland
1989) and A Dictionary of English Collocations (Kjellmer 1994). Word combi-
nations in these two books are, respectively, based on one million words of the
LOB and Brown corpora whose limited size “poses problems, particular for the
3
The use of the two dictionaries in attesting collocations was also endorsed by Siyanova and
Schmitt (2008).
4.4 Collocation Dictionaries for Reference 65
study for word combinations” (Johansson and Hofland 1989: 14) and is often
inadequate for the lexicologist (Kjellmer 1994: xiii).4 So the OCDSE has been
chosen for its wide coverage. Therefore, both dictionaries are consulted in the
process of extracting collocations. If a collocation is included in either of the two
dictionaries, it is considered as a well-formed collocation.
4.5 The Reference Corpus—BNC
The British National Corpus was chosen to serve as a benchmark for measuring the
appropriateness of learners’ production of collocations which failed to be attested in
collocation dictionaries. As a general corpus representing as wide a range of
modern British English as possible, the 100-million-word corpus contains over
4000 written texts and transcripts of speech in British English (McEnery et al.
2006). It has to be noted that the BNC only covers British English of the late
twentieth century. However, creativity and productivity are two of the design
features of human language, by which it means that humans are able to construct
understandable linguistic forms, some of which have even not been used before.
Language is constantly changing, with the gradual emergence of neologisms. Some
of these new constructions and interpretation of words and expressions are accepted
by the language community and acquire status in the language stock. By the same
token, new combinations of words, i.e. collocations, are constantly coined and
gaining acceptance. Yet the BNC is not timely updated to include new forms of
language use, and its limited size means that it fails to cover a wider range of
English language uses. Considering the ever-changing nature of language, few
corpora can include all collocations, which makes the recognition of appropriate/
inappropriate word combinations difficult. However, relative to the creative use of
language, there is always a conventional core in any language, which remains stable
and usually becomes the learning target of language learners. Attested in a large
corpus, conventional expressions have the most frequent occurrences and creative
expressions are usually on the bottom of the frequency list. In published dic-
tionaries, conventional language uses are prioritised and recorded. Therefore, in this
study collocation dictionaries were taken as the criterion for locating conventional
English collocations. As was discussed in Chapter One, conventional language uses
are set both as the norm for L2 learners and as the criterion for judging the
appropriateness of learners’ interlanguage. If a collocation is listed in the two
dictionaries, it was recognised as correct (cf. Sect. 4.4). If it is not included in the
dictionaries, this research adopted the online version of the BNC—BYU-BNC,5
with the aim of checking whether it is acceptable or not. The BNC was also used to
4
For example, commit a crime, a commonly accepted collocation, is not included in Johansson and
Hofland’s book.
5
http://corpus.byu.edu/bnc/ [Accessed 10 March 2012].
locate the target collocating word if an appropriate one was not found in collocation
dictionaries. For example, in our learner database, create + poem, was viewed as
incorrect as it was not recognised as a conventional collocation in the dictionaries,
nor was recorded in the BNC, though it is understandable in the English language.
The detailed procedure for identifying well-formed and erroneous collocations by
using collocation dictionaries and the BNC will be shown in Sect. 4.7.2.
4.6 Software for Retrieval and Analysis
Data extraction has been greatly facilitated by the increasing sophistication and
availability of computers. To allow highly efficient and labour-saving data collec-
tion and analyses, the following types of software were used: AntConc 3.2.4w and
Wordsmith 5.0 to perform the function of concordancing and word-list generation;
EditPad Pro 7 and PowerGREP 4 to automatically collect words of a particular part
of speech and word combinations through regular expressions; Microsoft Office
Excel 2010 to help create databases of collocations and perform the function of
computing and graphing; and finally GraphPad Prism (Version 6.04) was used to
carry out statistical analyses.
Verb + noun collocations were semi-automatically collected, i.e. via an auto-
matic generation of all the verbs in concordances and a manual identification of
verb + noun collocations (cf. Sect. 4.7.2 for detailed explanation). For the retrieval
of other words or word combinations, the following regular expressions were used
in PowerGREP 4:
• For the retrieval of verbs: (\w+)_VV\w+
• For the retrieval of nouns: (\w+)_NN[12]\s|(\w+)_NN\s
• For the retrieval of adjectives: (\w+)_J\w+
• For the retrieval of AN combinations: (\w+_J\w+\s)((\w+_NN[12]\s)|(\w+_NN
\s))
• For the retrieval of NN combinations: (\w+(_NN[12]\s)|(_NN\s))(\w+(_NN[12]
\s)|(_NN\s)).
4.7 Procedure
4.7.1 Tagging and Reliability Check
When originally compiled, CLEC was error tagged into 61 types of error. However,
it was decided not to base the present study on the error-tagged version, which was
found to be inadequate in the following respects: first, well-formed collocations
relevant to the present study were not identified and tagged in CLEC; secondly,
error-tagging in CLEC was faulty as some erroneous collocations were missed out
4.7 Procedure 67
while some well-formed ones were included; thirdly, the error-tagging targeted
erroneous word combinations of all types (including problematic collocations,
erroneous free combinations, colligation errors, etc.) rather than exclusively col-
location errors (cf. Zhang and Gao 2006). For the purposes of the present research,
we rectified the above problems by applying the following procedure: all the error
tags were firstly removed and then the clean corpus was part-of-speech tagged,
followed by a reliability check. Collocations were finally semi-automatically
extracted with reference to the two widely used collocation dictionaries discussed
above.
4.7.1.1 POS Tagging
The total size of the three sub-corpora (ST2, ST5 and ST6) amounts to over
600,000 words. It would have been an impossibly large undertaking to extract
collocations manually by looking through the corpora word by word. For verb +
noun collocation extraction in this research, therefore, the starting point was to
locate all the verbs and then manually sorted out VN collocations (the justification
of this collection method will be given in Sect. 4.7.2). For this purpose, the corpora
were first automatically part-of-speech tagged using the online tagging service
developed by University Centre for Computer Corpus Research on Language at
Lancaster University.6 The current standard tagset—CLAWS7 was used for its
richness in detailed subdivisions of word types.
4.7.1.2 Reliability Check
After the POS tagging, a reliability check was performed. CLAWS is thought to
achieve a consistent accuracy of 96–97% and even 98.3% for the tagging of some
portions of the BNC (Garside 1987, 1996). Those figures are obtained through
tagging the texts of native speakers, although the accuracy rate varies according to
text types. For the tagging of learner language, a lower degree of accuracy is
generally believed to be achieved, since tagging learner language is complicated by
instances of grammatical and morphological errors. Prior to the retrieval of word
combinations, a reliability check was carried out on a sample of over 1000 words in
the ST6 file.
A straightforward and commonly used way of checking tagging validity is to
locate how many tagging errors occur in a sample of texts. Thus two pieces of
writings with a total of 1311 words from the tagged ST6 corpus were randomly
selected. After word-by-word examination, 29 words were found to be incorrectly
tagged. So the tagging reliability was 97.8% [(1311 − 29)/1311*%], a fairly high
accuracy rate and very much in line with the accuracy rate on native speaker texts.
6
http://ucrel.lancs.ac.uk/claws/trial.html [Accessed 1 March 2013].
Additionally, a further check was conducted on an approximately equivalent

sample of texts in the ST2, with the aim to find out whether CLAWS gave an
equally high reliability rate for texts produced by much lower proficiency levels. So
a sample with a total of 1336 words was chosen. The accuracy rate was—perhaps
surprisingly—as high as that found in ST6: 97.9% [(1336 − 28)/1336*%].
Therefore, CLAWS achieved a very high rate of accuracy in the learner texts.
Even within the wrongly tagged words, most of the errors concerned other word
classes rather than verbs. Examples of the very few tagging errors for verbs are:
(1) Euthanasia_NP1,_, or_CC mercy_NN1 killing_NN1,_, means_NN helping_
VVG to_TO hasten_VVI the_AT death_NN1 of_IO a_AT1 person_NN1
who_PNQS is_VBZ badly_RR suffering_VVG ._.
(2) In_II China_NP1,_, suicide_NN1 is_VBZ legal_JJ,_, which_DDQ means_
VVZ,_, people_NN are_VBR legal_JJ to_TO kill_VVI themselves_PPX2 in_II
a_AT1 helpless_JJ condition_NN1,_, so_RR what_DDQ we_PPIS2 conside_
NN1 is_VBZ only_RR whether_CSW it_PPH1 is_VBZ legal_JJ to_TO
end_VVI the_AT life_NN1 of_IO a_AT1 incurable_JJ patient_NN1 ._.
In both cases, means and conside (a spelling error for consider) were tagged as
nouns rather than verbs. Altogether among the 29 tagging errors in the ST6 sample,
5 verbs were wrongly tagged for other word classes instead of as verbs, making up
only 0.38% (5/1311*%) of all the tags. Calculated against the total number of verbs
(276 in total) in the sample, the rate of wrongly tagged verbs was 1.8% (5/276*%).
Therefore, the POS tagging was taken to be at least as reliable as CLAWS generally
is, and the fact that a very small number of verbs were missed out was not prob-
lematic for the overall analyses.
4.7.2 Investigation of Verb + Noun Collocations
In the semi-automatic extraction of VN collocations in the ST2, ST5 and ST6 files,
only collocations of a verb and a noun as its object were counted (e.g. make a
contribution, acquire knowledge). VN combinations can be easily retrieved with
regular expressions performed by PowerGREP. However, there are varied positions
of the nouns as the objects of the verbs. Verbs and nouns are not confined to the
immediate linear sequence: verb + (modifiers) + noun (e.g. make a plan). As in the
examples given by Greenbaum (1970: 10) (cf. Sect. 2.2.1), a collocational rela-
tionship can even transcend a sentence. The following categories of the noun’s
varied positions relative to concordanced verbs were also examined, e.g. the noun
used before the verb in a passive voice (great progress has been made; such
problems would be solved) and in attributive clauses (life pays everyone in different
ways for the contribution he makes to the society, *she can use the knowledge she
had learned in the new job). In these examples above, VN collocations were
subsequently retrieved: make a plan, made progress, solve problems, makes con-
tribution and *learned knowledge. So an automatic extraction of verb + noun
4.7 Procedure 69
combinations within a specified span would not only leave out some combinations,
but also “yield a great deal of unusable material, the sifting of which would
probably be even more time-consuming than the manual extraction of all verb-noun
combinations from the corpus” (Nesselhauf 2005: 43).
Therefore, taking the different proximities of verb + noun collocations into
consideration, VN combinations were not automatically retrieved but rather a
semi-automatic approach was applied: all the verb tags (except the copular be and
modal verbs) were searched, followed by a manual extraction of the nouns as the
collocates of the verbs (phrasal verb + noun patterns, e.g. put on weight were
disregarded).7 Although nouns select other lexical words, they were not first
searched because a large number of irrelevant information would be extracted (e.g.
adjective + noun, (subject) noun + verb, preposition + noun). As is acknowledged
by Howarth (1996: 78), “searches based on the verb would more sharply focus the
searches on the desired patterns”.8
Verbs were classified into eight main categories in CLAWS: VV0 (base form,
e.g. work), VVD (past tense, e.g. worked), VVG (-ing participle, e.g. working),
VVGK (-ing participle catenative, e.g. going in be going to), VVI (the infinitive
form, e.g. it will work…), VVN (past participle, e.g. worked), VVNK ((past par-
ticiple catenative, e.g. bound in be bound to) and VVZ (the present tense form, e.g.
works).9 Considering catenative verbs are followed by to infinitives rather than
nouns, they were disregarded. The remaining 6 forms of lexical verbs were
examined. In addition, verbs of do and have were separately tagged in the CLAWS
and they fell into the category of delexical verbs to be searched. Altogether 18 verb
taggers (six forms of verbs + six forms of do and six forms of have), totaling
87,957 tokens, were examined in concordances generated by AntConc. Next comes
the manual extraction of well-formed and erroneous VN collocations.
4.7.2.1 Extracting Well-Formed and Erroneous Collocations
A collocation was taken to be well-formed when it was found either in the BBI or in
the OCDSE.10 Collocations which were not listed in the two dictionaries, but were
7
This method for identifying verb + noun collocations has also been adopted by Howarth (1996,
1998a, b).
8
This method of taking verbs as the starting point in the extraction is different from the study by
Laufer and Waldman (2011), in which nouns were searched as node words. They started from a set
of 220 pre-selected nouns and proceeded with the identification of verb collocates. The way of
choosing a limited number of frequent nouns would inevitably leave out many verb + noun
collocations. This study performed an exclusive extraction.
9
http://ucrel.lancs.ac.uk/claws7tags.html [Accessed 1 March 2013].
10
A question is whether word combinations (open the door) that are found in the learner corpus
and also included in either the BBI or OCDSE should be listed in my database. Combinations like
open the door were not included, since they are viewed as free combinations based on our
definition of collocations. The two dictionaries were only used to attest well-formedness.
attested in the BBI well-formed

or OCDSE collocation
VN
attested in disregarded
collocation
the BNC
not attested in the
BBI or OCDSE
not attested erroneous
in the BNC collocation
Fig. 4.1 Procedures for identifying well-formed and erroneous VN collocations
attested in the BNC, were disregarded, for the association was too loose to be
viewed as a collocation (e.g. ?give pressure, ?eat tea). A VN collocation was
viewed as erroneous (e.g. *do + problem) when it was neither listed in the two
dictionaries, nor found in the BNC. Wider contexts of the concordanced verbs were
checked in cases of ambiguity. The target verbs for the erroneous collocations were
supplied by consulting the BBI, the OCDSE, or the BNC (e.g. acquire knowledge
but not *learn knowledge, seize time but not *grasp time). In only a few cases
where neither source supplied the target verb for an erroneous collocation, a native
speaker was consulted. As stated above, the following cases were not considered
when identifying verb + noun collocation errors:
• Colligation errors with verb + noun collocations were disregarded, considering
what is of central importance in this study is to examine the (in)correct choices
of verbs, not the (in)correct choices of grammatical forms. Therefore, errors in
phrasal verbs, determiners, prepositions and the number of nouns were not
counted (instances of such errors are *make one’s mind, *hunt a job, and *give
some advices).
• Errors involving free verb + noun combinations were also eliminated (e.g.
*abandon prisons, *build heroes).
• Errors involving the wrong choices of the nouns were eliminated (e.g. *earn his
life, ?*solve questions).
The following chart presents a summary of the procedure adopted for identifying
well-formed and erroneous VN collocations (Fig. 4.1).
4.7.2.2 Creating Databases for All the Collocations Produced

by the Three Levels of Learners
Excel software was used as a tool for storing all the collocations as well-formed
ones (further divided into delexical verb + noun collocations and lexical verb +
noun collocations) and erroneous ones (further divided into erroneous delexical
verb + noun collocations and erroneous lexical verb + noun collocations), with
information on collocation tokens and types recorded. For the convenience of
4.7 Procedure 71
study, verbs and nouns in collocations were lemmatised and articles, determiners
and adjectives in between them were not recorded. For example, the following
collocations were viewed as instantiations of the same collocation (make + plan):
make a plan, makes a plan, make plans, made plans, etc. One advantage of getting
the verbs and nouns in VN collocations lemmatised and recorded into separate and
parallel columns was to facilitate subsequent automatic analyses.
The procedure described above was required for the investigation of the
developmental patterns of VN collocations in terms of delexical verb + noun and
lexical verb + noun collocations. Answering the next question about the relation-
ship between vocabulary increase and collocation acquisition required the imple-
mentation of the following steps:
4.7.2.3 Classifying the Lexical Verbs in VN Collocations into Synonym

Sets
As a first step in examining the relationship between verb increase in a semantic

domain and collocation uses, lexical verbs in well-formed and erroneous colloca-
tions were classified into synonym sets or synsets, in WordNet parlance. Words are
synonyms “if they have a significant similar semantic content” (Saint-Dizier and
Viegas 1995: 18). It is widely acknowledged that no exact synonyms exist in a
language and many words are in a loose relationship of synonymy with varied
degrees of meaning overlap (Palmer 1981). For the classification of verbs in VN
collocations, one major criterion is the semantic content a word carries. In this
sense, Levin’s (1993) classification of semantically coherent verbs was adopted. For
instance, verbs like compose and create in the ST2 database and words in the ST6
database like build, form, and draw were categorised into verbs of creation.
However, it was not sufficient to base synonym classifications exclusively on
Levin’s classifications, since verbs grouped into one class by Levin were primarily
“syntactic” synonyms with semantically coherent bond. Verbs of one semantic
category were not exhaustively listed by Levin (1993), e.g. the verb establish,
which semantically belongs to “verbs of creation” but was listed in the category of
“verbs with predicative complement”. Therefore, other referencing sources were
consulted, namely the Oxford Dictionary of Synonyms and Antonyms (henceforth
ODSA) (Spooner 2005) and WordNet (the web interface),11 a large on-line word
reference system in which verbs are organised into synonym sets (cf. Fellbaum
2010; Miller 1995; Miller et al. 1990). In the Wordnet system, synonymy is defined
as the many-to-one mappings of word forms and concepts. Words representing the
same concept are called synonyms (e.g. boot and trunk). Synonyms are grouped
into unordered sets, viz. synsets (Fellbaum 2010: 232). One advantage of using the
WordNet is that like Levin’s classification of verbs, it displays the larger semantic
domain to which a verb belongs. In sum, three referencing sources were applied in
11
http://wordnetweb.princeton.edu/perl/webwn [Accessed 10 May 2013].
classifying verbs in VN collocations into synsets, i.e. English Verb Classes and
Alternations (henceforth EVCA) (Levin 1993), the ODSA and WordNet. If verbs are
co-listed in at least one of the three dictionaries, they were placed in the same set.
Given that verbs are polysemous, their synonyms were located specifically
within the sense of the verb in a VN collocation. For example, the verb discharge
has 11 senses (accordingly, 11 synsets) as listed in WordNet and 5 synsets in the
ODSA. In the VN collocation—discharge + duty produced by ST2 learners, only
synonyms of the sense of discharge were recorded (e.g. complete). Similarly, ac-
quire in acquire + knowledge was grouped under ‘learn’ verbs instead of being
placed into the synset of “obtaining”.
Classifying verbs into synsets was not confined to verbs under the same entry
covered in the referencing sources. Instead, if two verbs were not listed as syn-
onyms, but they had a shared synonym, the three verbs were grouped in a synset.
For example, fix and place were not listed as synonyms in either the ODSA or
WordNet, but they were, respectively, in a synonymous relationship with attach, so
they were placed in the same synset. Similarly, all three words were synonyms of
put and they were gathered in one synset. Accordingly, new verbs were gradually
accumulated in the synset of “verbs of putting”.
Besides the above criterion of synonym classification, two more loose criteria
were adopted, i.e. context and foreign-language equivalents. Words are defined as
synonyms if they both fit in a particular context (Palmer 1981; Saint-Dizier and
Viegas 1995). These synonym pairs are context-dependent synonyms. Based on
this criterion, the verbs lead and live were synonyms in the given context of
lead/live + life. Another standard was a cross-linguistic one. According to Benson
et al. (1986: 204), one of the definitions of synonymy is foreign-language equiv-
alent. Thus in the data sets of the verbs, wear and dress were synonyms in the sense
that they both share one Chinese translation equivalent (chuan), though they behave
quite differently in English.
In all, the criteria of classifying verbs in VN collocations in this study include
semantic similarity (synonyms), context-dependent synonyms and foreign-language
equivalents. Three referencing sources were applied: the EVCA, the ODSA and
WordNet. There was no hierarchy in applying these criteria and resources and
instead there were alternatives. For the convenience of study, each synset was given
a name according to the classification given by the EVCA, e.g. verbs of creation
(compose, create, build, etc.), and verbs of obtaining (achieve, earn, receive, etc.)In
cases where there was no umbrella term for the semantic set, a representative verb
was used to cover the synset, e.g. fulfil verbs incorporating verbs like fulfil, ac-
complish, apply, etc.
Finally, verb + noun collocations with the verbs falling into the synsets classi-
fied were investigated. The purpose was to find out whether there are more VN
collocation errors in higher levels within synsets where there is an increase of verbs,
and whether these errors are more associated with new verbs than old verbs.
4.7 Procedure 73
4.7.3 Investigation of Adjective + Noun and Noun + Noun

Collocations
In this study, adjectives in adjective + noun collocations only refer to attributive

adjectives, e.g. bright colour. Predicative adjectives, although they can form a
collocational relationship with the noun subject (e.g. the colour is bright), were not
considered for the convenience of automatic retrieval of attributive adjec-
tive + noun combinations. As is classified in the POS tagset of CLAWS 7, nouns
subdivided into 21 categories, e.g. common nouns (singular, plural and neutral for
number), nouns of titles, locative nouns, temporal nouns, proper nouns, etc. Only
the most frequent and important common nouns were targeted in the extraction of
AN and NN combinations, since nouns of titles, locative nouns, temporal nouns and
proper nouns do not fall into collocational relationships with adjectives.
The analysis of AN and NN collocations followed the same methodology as VN
collocations, except for the retrieval process. Due to the constant position of nouns in
the AN and NN combinations, where the noun is directly adjacent to the preceding
adjectives or nouns, data were automatically collected by PowerGREP. However,
unlike the extraction of verb + noun collocations, identifying AN and NN collo-
cations from the retrieved combinations meant distinguishing them from compounds
first. A compound noun is “a fixed expression which is made up of more than
one word and which functions in the clause as a noun (Sinclair and Fox 1990: 25).
To demarcate a AN or NN collocation from an adjective + noun or noun + noun
compound (e.g. high school, news bulletin)is not easy, since they both involve
frequent co-occurrence of word constituents, though compounds are much more
fixed and mutually predicted and function as one unit of meaning.
Distinguishing AN and NN collocations from compounds was not within the scope
of the present investigation. Instead, recognition of AN and NN collocations were
carried out with reference to the two dictionaries—the BBI and the OCDSE. If an AN
or NN combination was listed in either of the two collocation dictionaries, it was
taken as a collocation (rather than a compound).
4.8 Summary
This chapter has described the design of this cross-sectional investigation of

Chinese learners’ collocation written performance. Two major points were covered:
the relevant materials utilised and the procedure involved. A summary of the design
is presented in Table 4.1.
The next chapter reports the findings of this large scale study of Chinese
learners’ production of English verb + noun, adjective + noun and noun + noun
collocations, which used the methodology presented above.
Table 4.1 A brief summary of the design of the study

Materials The learner corpus CLEC
Groups of learners ST2, ST5 and ST6 learners
Types of collocations verb + noun, adjective + noun and noun + noun
collocations
Software AntConc 3.2.4w, Wordsmith 5.0, EditPad Pro 7,
PowerGREP 4, Microsoft Office Excel 2010 and
GraphPad Prism (Version 6.04)
Referencing The BBI (3rd Edition) and the OCDSE (2nd Edition)
dictionaries
Reference corpus The BNC
Procedure Processing of the a. Part-of-speech tagging of the three sub-corpora
sub-corpora b. Reliability check (a high tagging reliability)
Investigation into VN a. Extracting well-formed and erroneous collocations
collocations b. Creating collocation databases
c. Classifying lexical verbs in VN collocations into
synsets
Investigation into AN Automatic extraction of AN and NN combinations;
and NN collocations The same methods as analyses of VN collocations
were adopted.
References
101–114.
Benson, M. (1985). Collocations and idioms. In R. Ilson (Ed.), Dictionaries, lexicography and
language learning (pp. 61–68). Oxford: Published in association with the British Council by
Pergamon.
Benson, M., Benson, E., & Ilson, R. (1986). Lexicographic description of English. Amsterdam:
Benjamins.
Benson, M., Benson, E., & Ilson, R. (2010). The BBI combinatory dictionary of English: Your
guide to collocations and grammar (3rd ed.). Amsterdam: Benjamins.
Biskup, D. (1990). Some remarks on combinability: Lexical collocations. In J. Arabski (Ed.),
Foreign language acquisition papers (pp. 31–44). Katowice: Uniwersytet Slaski.
Cowie, A. P. (1991). Multiword units in newspaper language. In S. Granger (Eds.), Perspectives
on the English lexicon: A tribute to Jacques Van Roey (pp. 101–116). Louvain-la-Neuve:
Cahiers de I’Institut de Linguistique de Louvain.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud
& H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). London: Macmillan.
Cowie, A. P. (1998). Phraseology: Theory, analysis and applications. Oxford: Oxford University
Press.
Fellbaum, C. (2010). Wordnet. In R. Poli, M. Healy, & A. Kameas (Eds.), Theory and applications
of ontology: Computer applications (pp. 231–243). London: Springer.
Garside, R. (1987). The CLAWS word-tagging system. In R. Garside, G. Leech, & G. Sampson
(Eds.), The computational analysis of English: A corpus-based approach (pp. 30–41). London:
Longman.
References 75
Garside, R. (1996). The robust tagging of unrestricted text: The BNC experience. In J. Thomas &
M. Short (Eds.), Using corpora for language research: Studies in the honour of Geoffrey Leech
Granger, S., Dagneaux, E., Meunier, F., et al. (2009). International corpus of learner English (V2).
Louvain: Presses Universitaires de Louvain.
Greenbaum, S. (1970). Verb-intensifier collocations in English: An experimental approach. The
Hague: Mouton.
Gui, S. C., & Yang, H. Z. (2003). Chinese learner English corpus. Shanghai: Shanghai Foreign
Language Education Press.
Hausmann, F. J. (1989). Le dictionnaire de collocations. In F. J. Hausmann, O. Reichmann,
H. E. Wiegand, et al. (Eds.), Wörterbücher: ein internationales Handbuch zur Lexicographie.
Dictionaries. Dictionnaires (pp. 1010–1019). Berlin: De Gruyter.
Press.
24–44.
Johansson, S., & Hofland, K. (1989). Frequency analysis of English vocabulary and grammar.
Oxford: Oxford University Press.
Kjellmer, G. (1994). A dictionary of English collocations: Based on the Brown corpus. Oxford:
Clarendon Press.
Klotz, M. (2003). Oxford collocations dictionary for students of English. International Journal of
Lexicography, 26(1), 57–61.
Kozlowska, C. D., & Dzierzanowska, H. (1988). Selected English collocations. Warszawa: PWN.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago:
University of Chicago Press.
Martelli, A. (2006). A corpus based description of English lexical collocations used by Italian
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. London: Routledge.
McIntosh, C., Francis, B., & Poole, R. (2009). Oxford collocations dictionary for students of
English (2nd ed.). Oxford: Oxford University Press.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38
(11), 39–41.
Miller, G. A., Beckwith, R., Fellbaum, C., et al. (1990). Introduction to WordNet: An on-line
lexical database. International Journal of Lexicography, 3(4), 235–244.
Nesi, H. (2011). BAWE: An introduction to a new resource. In A. Frankenberg-Garcia, L.
Flowerdew, & G. Aston (Eds.), New trends in corpora and language learning (pp. 213–228).
London: Continuum.
Nesselhauf, N. (2004). What are collocations? In D. Allerton, N. Nesselhauf, & P. Skandera
(Eds.), Phraseological units: Basic concepts and their application (pp. 1–21). Basel: Schwabe.
Palmer, F. R. (1981). Semantics (2nd ed.). Cambridge: Cambridge University Press.
Saint-Dizier, P., & Viegas, E. (1995). Computational lexical semantics. Cambridge: Cambridge
University Press.
Spooner, A. (2005). Oxford dictionary of synonyms and antonyms. Oxford: Oxford University
Press.
Wen, Q. F., Liang, M. C., & Yan, X. Q. (2008). Spoken and written English corpus of Chinese
learners (2nd ed.). Beijing: Foreign Language Teaching and Research Press.
Chapter 5
Chinese Learners’ Production
of Verb + Noun Collocations
This chapter enquires into the relationship between vocabulary increase and col-
location used by L2 learners seen from the overall perspective of the growth from
delexical verbs to lexical verbs. As part of the investigation, it presents the overall
analyses of VN collocations produced by all the three levels of learners. In addition
to the comparison of our findings with previous ones, the learning of verb + noun
collocations by L2 learners is analysed from the following perspectives: the overall
results and general patterns of verb + noun collocations produced by the three
proficiency levels (Sect. 5.1), the developmental patterns of delexical verb + noun
(abbreviated: DeLexVN) and lexical verb + noun (LexVN) collocations (Sect. 5.2),
the comparison between overall verb growth and VN collocation errors (Sect. 5.3),
followed by a summary of these three overall analyses (Sect. 5.4).
5.1 Overall Analyses (1): General Patterns of VN

Collocations Produced by L2 Learners
5.1.1 Overall Tokens of Collocations
Altogether 5068 instances of collocations (including both well-formed and erro-

neous ones) were extracted (ST2: 1579; ST5: 1660 and ST6: 1829). In terms of
absolute frequencies, the number of collocations produced by each level of learners
shows a gradual increase. However, there was no proportionate increase in collo-
cation production of the likewise gradually increasing sizes of each file were taken
into consideration (ST2: 208,088; ST5: 214,510; ST6: 226,106). This means that
there was no quantitative development in VN collocational knowledge as learners
become more proficient. Direct comparison of these results with previous studies,
as regards the numbers of VN collocations relative to the size of learner corpora is
difficult, given the varied methods they employed in counting verb + noun
DOI 10.1007/978-981-10-5822-6_5
78 5 Chinese Learners’ Production of Verb + Noun Collocations
collocations and the foci on learners at different proficiency levels. Instead of an

exhaustive retrieval of all the possible verb + noun collocations, Laufer and
Waldman (2011) calculated VN collocations starting from a predetermined set of
nouns with high frequencies in a native-speaker corpus and reported fewer collo-
cations (852) in their corpus of advanced learners (approximating the size of our
ST6 file). Howarth’s (1996, 1998a, b) studies, instead, presented a manual and
exhaustive analysis of all the VN collocations and reported around 1000 colloca-
tions in about 25,000 words of L2 academic writing. Compared with the proportion
of collocations extracted from the three files in this study, there was a much higher
percentage of collocations in Howarth’s study, a discrepancy probably due to two
factors: the definition of collocations and the focus on learners at different profi-
ciency levels. In Howarth’s studies, free and restricted collocations and idioms were
all included, which naturally resulted in a larger number. Furthermore, Howarth’s
subjects were full-time postgraduates of Linguistics and English Language
Teaching studying in an English-speaking country, and thus the proportions of
collocations produced by these highly proficient learners were unsurprisingly
higher.
In light of the homogeneity of learners’ proficiency level and collocation
retrieval methods, Nesselhauf’s (2005) data provides a level of comparison in terms
of the production of VN collocations by advanced learners. Compared with the
2082 VN collocations found by Nesselhauf (2005) in her investigation of the
around 150,000-word-writing by advanced German learners of English, the number
of collocations produced by our advanced learners (ST6)—1829, approximates to
the number of VN collocations in her findings. Yet there was a difference, as the
proportion of collocations out of the total words in her study was 1.7 times as high
as the proportion of collocations in the ST6 database. The main reason for this gap
in collocation percentages was that the types of VN collocations were rather
restrictedly defined in this study. As is illustrated in Sect. 4.7.2, only verbs with
nouns as objects were included, whilst eight other syntactic patterns which verb–
noun collocations fell into were counted by Nesselhauf (e.g. go to prison, fall in
love with somebody) (cf. Nesselhauf 2005: 68). So there was no major discrepancy
between the proportion of VN collocations obtained in this study and that reported
in previous L2 VN collocation research.
Though in general it was difficult to compare the number of VN collocations
relative to learner corpus size due to the heterogeneity of L2 VN collocation studies,
it can be seen from the quantities of collocation production that L2 learners pro-
duced far fewer VN collocations as compared to the percentage of VN collocations
produced by native speakers of English as reported in other studies. When the
proportion of verb + object–noun collocations produced by NSs was calculated out
of the total verb–noun combinations, the proportion was found to be as high as over
40% (Cowie 1991, 1992; Howarth 1996). If VN collocation proportion is counted
out of the total corpus size, Howarth (1996, 1998a, b) extracted over 5000 target
collocations out of a native-speaker corpus of only about 240,000 words, a
5.1 Overall Analyses (1): General Patterns of VN Collocations … 79
proportion 2.5 times as high as that obtained through our L2 learner data.1 This
quantitative discrepancy in terms of collocation uses has been widely acknowl-
edged and empirically tested (cf. Sect. 3.2.1.1). That learners used fewer colloca-
tions compared with NSs on the one hand, shows a poorer sense of collocations; on
the other, it demonstrates the greater use of an “open choice principle” than of an
“idiom principle” (Sinclair 1991) by L2 learners. This “open choice principle’’ is
further manifested through a non-diversified production of collocation types, dis-
cussed below.
5.1.2 Overall Types of Collocations and Collocation

Frequency Distribution
The numbers of collocation types produced by the three groups of Chinese EFL
learners were: 285 (ST2), 344 (ST5) and 441 (ST6).2 The overall number of types
(1070) was found to be rather low compared with tokens (5068). That means on
average one collocation was produced 5 times. But the frequency was not so evenly
distributed if we take a closer look at the distribution of collocation frequencies over
the overall types in each learner group. Figure 5.1 presents the distribution of
collocation frequencies over the 285 types of collocations in the ST2 database.
As is shown in Fig. 5.1, a predominant number of collocations had a frequency
less than 10, with 5 collocations having a frequency over 50. It can be seen that an
overwhelming number of collocations occurred fewer than 5 times, and in fact,
most of these occurred only once. The same pattern of the frequency distribution
across types was also found in the ST5 and ST6 databases. The fact that a majority
of the types of collocations were produced less than 5 times demonstrates a varied
use of collocations, which further acts as some sign of phraseological competence.
However, taking the total tokens into account, this varied use is at the same time
accompanied by an overuse of a small number of collocation types.
A further look into the distribution of collocation tokens over their types
revealed a huge overuse of a limited number of collocations in all three groups of
learners. Table 5.1 shows the types of collocations that were divided into three
groups according to frequencies: those with a frequency of 5 or lower ( 5), those
between 5 and 10 (5–10) and those with a frequency of 10 or more ( 10).
As is shown in the above table, a majority of collocations in the three levels
occurred less than 5 times (cf. Fig. 5.1) and only a small proportion of them had a
frequency more than 10. However, this small proportion of collocation produced
1
The proportion of VN collocations in native speaker data is in fact much higher, since Howarth
(1996, 1998a, b) started from the most frequent verb lemmas (with a frequency of 10 or more) and
then proceeded to extract their noun collocates, which means that there still exist a large number of
collocations with less frequent verbs.
2
Collocations like made plans, made a plan, make a plan were regarded in this study as instan-
tiations of one collocation type “make + plan”.
90
80
70
60
Frequency
50
40
30
20
10
0
0 50 100 150 200 250 300
Collocation types
Fig. 5.1 VN collocation frequencies distributed over collocation types in the ST2
Table 5.1 VN collocations Learners 5 5–10 10

divided into three frequency
groups ST2 230 16 39
ST5 277 27 40
ST6 381 20 40
more than 10 times made up a majority of the overall collocation tokens. Taking the
ST2 data, for example 39 types of collocations were used for 1052 times, making
up 67% of the total 1579 collocations. The same trend for heavy reliance on a
limited number of collocation types was revealed in the ST5 and ST6 databases as
well (see Fig. 5.2). Though there was a slight decrease in the proportion of col-
locations occurring more than 10 times among the overall tokens from ST2 to ST5
and ST6, similar distribution patterns were observed in the ST5 and ST6 collocation
databases, i.e. less than 14% collocation types made up more than half of the
collocations produced by Chinese EFL learners at three proficiency levels.
On the one hand, the finding obtained in this study mirrors those of previous
studies which have identified the phenomenon of phraseological overuse on the part
of L2 learners (Ädel and Erman 2012; Granger 1998a; Lorenz 1999; Kaszubski
2000; etc. cf. Sect. 3.2.1.1). On the other, the finding of an overuse of a restricted
number of collocations by L2 learners was reported in a different way here.
ST2 ST5 ST6

230 (≤5) 277 (≤5) 381(≤5)
16 (5-10) 27 (5-10) 20 (5-10)
39 (≥10) 40 (≥10)
40 (≥10)
Fig. 5.2 The frequency distribution of VN collocation types in ST2, 5 & 6 databases
A majority of studies find an overuse in L2 collocation data when comparing the

uses of a fixed type of formulaic sequences by NNSs with those produced by NSs
(e.g. Ädel and Erman 2012; Altenberg and Granger 2001; Cobb 2003; De Cock
et al. 1998; Foster 2001; Granger 1998a; Lorenz 1999; Kaszubski 2000). As dis-
cussed in Sect. 3.2.1.1, it is not surprising that NNSs, still in the process of
interlanguage development, are characterised by an overuse of a limited number of
collocations, and the attainment of diversified collocation uses, as with the attain-
ment of diversified use of grammatical structures or lexis, is incremental. The
finding of collocation overuse revealed in this study is, however, not based on
comparisons of non-native and native data. So a rather limited range of collocations
occurring more than 10 times in our data were “overused” in the sense that they
occurred more frequently than the other types of collocations. Nesselhauf (2005)
also identified a set of VN collocations that were most frequently produced but
unfortunately her data were not quantified, and the proportion of such an overuse is
not known. One striking advantage of examining L2 learners’ overuse without
comparison with NSs data is that characteristics of collocation uses specific to L2
learners can be authentically investigated.
VN collocations that were used more than 10 times by at least two learner levels
include: make + progress, make + use, make + friend, make + mistake, take +
part, take + care, do + homework, do + exercise, do + good, do + harm, have +
dinner, answer + question, lead + life, live + life, pay + attention, play + role,
sing + song, solve + problem, spend + time, try + best, wear + clothes and
*learn + knowledge. The heavy use of a restricted number of collocations by
Chinese L2 learners shows the phenomenon of “collocational teddy bears”
(Nesselhauf 2005: 69) as discussed in Sect. 3.2.1.1. A general feature of the above
expressions is that they are very frequently used in everyday native-speaker
English. The necessity to use these collocations for communicative purposes may
thus facilitate fluent and repeated uses. So a heavy use of certain collocations is not
a non-native phenomenon, rather it is a natural characteristic of everyday use.
However, though these overused collocations are required in everyday English use,
heavy reliance on these collocations, as pointed out in Chap. 1, can add a rigid
flavour to learners’ writings (Channell 1994: 21).
Whether this heavy use is determined by communicative purposes or it is a result
of clinging on to “collocational teddy bears” is not crucial here. What is more
important is why L2 learners have such a generally good command of the above
VN collocations.3 Previous studies show that what is common with overused
collocations is that they bear a great resemblance with learners’ L1 combinations
(cf. Granger 1998a; Kaszubski 2000). That may suggest an easier acquisition of L2
collocations which are similar to L1 combinations than those differing from L1
patterns. Yet a closer look at the above collocations from a Chinese perspective
shows that not all of them are L1-equivalent word combinations, such as delexical
3
Good performance is observed with the exception of *learn + knowledge.
400
350
300
ST2
Coll. types
250
200
ST5
150 ST6
100
50
0
≤5 5-10 ≥10
Coll. frequencies
Fig. 5.3 Between-group comparison of VN collocations within three frequency groups
verb combinations: make + progress, make + use, take + part, take + care, etc.4
So a necessity for communicative purposes and the frequent input for L2 learners of
these overused collocations may well account for such a heavy use. The role of L1
in collocation acquisition will be extensively discussed in Chap. 9.
Though there exists a necessity for the frequent use of collocations in everyday
English, as some sign of fluent and idiomatic control of a language, a varied use of
collocation types is also needed. As stated in Sect. 3.2.1.1, the difference of
phraseological uses between NSs and NNSs rests in the diversified types produced.
Turning to the types of VN collocations produced by all three levels, a
between-group comparison of diversification in collocation production was con-
ducted (see Fig. 5.3).
As is shown in Fig. 5.3 (cf. the corresponding numerical data in Table 5.1), for
collocations that were used 5 times or less, there was a clear increase in collocation
types in the ST6 level compared with the two lower levels. For collocations with a
frequency more than 5, there was not such a growing trend. It shows that despite a
heavy use of a rather small number of most frequent collocations at all levels, there
was no increase in the types of these frequent collocations across the three profi-
ciency levels, but an increase in far less frequent collocations. That there was an
increase in collocation types with the rise of proficiency is consistent with the
findings by Gitsaki (1999) and Zhang (1993), who reported that more proficient L2
learners produced more varied collocations than the less proficient L2 learners. It is
unsurprising in the sense that more collocations are learned as learners receive more
English instruction. So it becomes more important to investigate the quality of their
collocation production in terms of misuses.
4
The Chinese equivalent expressions for make + progress, make + use, take + part, take + care
are qude jinbu (literally: gain progress), shiyong (use), canjia (participate), zhaogu (care), among
which the last three are Chinese verb lexemes.
Table 5.2 Well-formed and erroneous VN collocations in the three levels of learners (types)
Learners Well-formed collocations Erroneous collocations Total
ST2 221 64 (22%) 285
ST5 300 44 (13%) 344
ST6 348 93 (21%) 441
Notes ST2 and ST5: p = 0.0020**
ST5 and ST6: p = 0.0024**
ST2 and ST6: p = 0.7121 ns
**indicate “very significant” and “ns” suggests “not significant” [The threshold significance
level is set as 0.05 by the GraphPad Prism. In the meantime, symbols used by Prism suggesting
the level of significance were also adopted, and these symbols together with the p values are:
****(p < 0.0001): extremely significant; ***(0.0001 < p < 0.001): extremely significant;
**(0.001 < p < 0.01): very significant; *(0.01 < p < 0.05): significant; ns (p > 0.05): not
significant (cited from GraphPad statistics guide: http://www.graphpad.com/guides/prism/6/
statistics/index.htm?extremely_significant_results.htm) (Accessed 10 June 2013)]
5.1.3 Collocation Misuses
Qualitative deviation is another feature manifested in L2 learners’ collocation

production. As with findings from previous studies, numerous collocation misuses
were also uncovered in this study. The following are some of non-native-like
(erroneous) VN collocations: *lay + role, *learn + knowledge, *conduct + crime,
*attend + military service, *do + problem, and*have + progress, etc. Compared
with the uneven distribution of well-formed collocations where a restricted number
were overused, the type/token ratio of erroneous collocations was very high (0.57
for ST2 and ST6 learners and 0.4 for the ST5 level), with only three collocations
wrongly used on more than 10 occasions (*learn + knowledge, *release + pain,
*release + burden).Thus, the percentage of collocation errors was calculated in
terms of collocation types in order to avoid statistical bias caused by uneven dis-
tribution of tokens. The well-formed and erroneous collocation types produced by
the three levels of learners are presented numerically in Table 5.2. To facilitate
comparison, the proportion of erroneous collocations is given out of the total
number of collocations.
As shown in the above table, nearly one fourth of the collocations in the ST2 and
ST6 databases are erroneous and a rather smaller proportion of collocations (13%)
are erroneous in the ST5 database. It is interesting to note that ST5 learners, in the
middle level, produced the fewest collocation errors among the three levels. In
addition, in order to see whether there is a statistical difference between the pro-
portions of well-formed and erroneous collocations and learner levels, Fisher’s test
was performed on the above data.5 Results show that ST5 learners—first and
second year English majors produced very significantly fewer VN collocation errors
5
Fisher’s test can give an exact P value and works fine with small sample sizes. Considering the
small number of collocation types, Fisher’s test was adopted.
than pre-university middle school students and third and fourth year English majors
(ST2 and ST5: p = 0.0020; ST5 and ST6: p = 0.0024). That collocation errors
made by ST5 learners are the fewest was also found by Zhang and Gao (2006) in
their analysis of the original error-tagged CLEC. Though they did not investigate
correct collocations, they gave the numbers of problematic verb–noun collocations
produced by Chinese EFL learners and figures showed that ST5 learners produced
the smallest number of errors compared with ST2 and ST6. Taking into other types
of erroneous word combinations, viz. noun/noun, noun/verb, adjective/noun,
verb/adverb and adverb/adjective combinations, ST5 was also found to produce the
smallest numbers of errors in the three levels under discussion (cf. Zhang and Gao
2006: 32). Therefore, in terms of the frequency of erroneous collocations, the ST5
level is not consistent from the levels of ST2 to ST6. ST5 learners received either
one to two more years’ English instruction than ST2 learners, and they receive one
to two fewer years’ instruction than ST6 learners. At the intermediate level, ST5
learners exhibit a higher competence than both the lower and higher levels of
learners. The question that arises here is that why the middle level outperforms the
other two levels. This phenomenon has also been noted by Zareva and Wolter
(2012) in L2 word association studies where the intermediate group produced
the highest percentage of collocational responses, higher than the advanced
group. They further noticed that “the same class (paradigmatic) connections
become more prominent as the proficiency of L2 learners of English increases to an
advanced level” (Zareva and Wolter 2012: 59–60). Therefore, in a broad sense, a
reverse relationship is suggested between vocabulary growth in the paradigmatic
relations and the collocation performance in the syntagmatic relations. To put the
matter in simple terms, the more words L2 learners learn, the more they make errors
(as seen from the comparison of error ratios between ST5 and ST6). Moreover, a
sufficient vocabulary size is undoubtedly important for correct collocation pro-
duction (as seen from the comparison of error ratios between ST2 and ST5). This
relationship between vocabulary growth and collocation is at the heart of this study
and will be elaborated in Chap. 6.
Returning to the present to the percentages of erroneous VN collocations pro-
duced by L2 learners, the result is similar to other studies of L2 VN collocation
production, given the heterogeneity in studies in this field. In Nesselhauf’s (2005)
investigation into the verb—noun collocations in a corpus of writings by advanced
German-speaking learners of English, approximately one third of the collocations
were found to be unacceptable or questionable. This proportion was endorsed by
Laufer and Waldman (2011), who reported about a third erroneous VN collocations
among all the collocations L2 learners produced. It should be noted that there were
higher proportions of VN collocation errors in the above studies because more types
of errors were included. For Nesselhauf, verb–noun errors include errors of all
elements, e.g. verbs, phrasal verbs, nouns, determiners, etc. An example given is the
collocation—come to the conclusion that and errors in any of these elements were
considered (Nesselhauf 2005: 71). Similarly, errors involving nouns were counted
in the study conducted by Laufer and Waldman (2011). The closest point to turn to
is the study carried out by Howarth (1996), who found a fourth of the verb–noun
collocations his subjects produced were erroneous. Therefore, our finding in terms
of the proportion of erroneous collocations is similar to previous L2 VN collocation
studies.
Turning to collocation errors as related to L2 proficiency, statistical analysis
shows that there was a significant difference between the numbers of erroneous
collocations and learner types, viz. ST2 versus ST5, ST5 versus ST6, though no
significant relationship was found between the ST2 and ST6 learners. However, the
data revealed a persistent proportion of collocation misuses in the two levels (22%
in the ST2 level and 21% in ST6 level). This suggests an overall lag in collocation
ability, with no sign of decrease in errors with rising proficiency. That there is no
decrease in errors accords with the finding of the cross-sectional study conducted by
Laufer and Waldman (2011), who found a third erroneous collocation produced by
learners at three proficiency levels. Thus, now we have again uncovered a defi-
ciency in the L2 acquisition of collocations, and it becomes important to identify
the factor(s) contributing to this lag.
5.1.4 Synopsis of Overall Analyses (1)
This section presents the overall analyses of all the verb + noun collocations pro-
duced by Chinese EFL learners at three proficiency levels. Overall results were
discussed in connection with prior findings in L2 VN collocation studies. In short,
this study identified both a quantitative and qualitative deficiency long acknowl-
edged in the collocation performance by L2 learners. Among the approximately
600,000 words of text analysed, only about 5000 collocations were retrieved, fol-
lowing the criteria set out in the last chapter, thus revealing a quantitative dis-
crepancy in terms of collocation uses and again manifesting a preference for the
“open-choice principle” on the part of L2 learners. This quantitative discrepancy
has also been shown through the small number of collocation types (1070) com-
pared with 5000 collocation tokens. These figures indicate weak collocational links
in the mental lexicon of L2 learners, which corroborates findings from word
association tests that L2 learners produced significantly fewer collocational
responses than native speakers did (cf. Fitzpatrick 2006).
In addition, in terms of collocation misuses, it was found that collocation poses
problems at all levels, shown by nearly a quarter of all the collocations produced
being erroneous. Furthermore, data obtained from this cross-sectional study yields
some interesting points: firstly, collocation overuse as reported in previous studies
through comparisons of NS and NNS corpora (Ädel and Erman 2012; Cobb 2003;
De Cock et al. 1998; Durrant and Schmitt 2009; Foster 2001; Granger 1998a) is
uncovered from a non-comparison perspective in this study. Through analysing the
distribution of collocation tokens over types, this study showed that a small pro-
portion of common collocations make up a majority of the overall collocation
tokens. This distribution is quite like word frequency distribution in a corpus of
natural language, viz. “a small number of words tend to make up a very large
portion of any normal text” (Milton 2009: 46). Secondly, between-group compar-
isons of collocation data revealed some general developmental patterns, viz. the
overall number of collocations does not increase with the rise of proficiency (both
in terms of tokens and types) but there is a more diversified collocation uses as
learners advance to higher levels. Despite the good signs of a development in
collocational competence as proficiency rises, there was no decrease in collocation
misuses, as collocation errors were persistent even at the ST6 level, depicting a
general lag in collocation knowledge. This indicates that collocational knowledge
does not improve with the advances of L2 proficiency and the stagnant of collo-
cational knowledge has long been endorsed (e.g. Bahns and Eldaw 1993; Laufer
and Waldman 2011). The next section, therefore, moves on to continue the dis-
cussion of this lag through examining collocations classified into delexical verb and
lexical verb + noun collocations produced by the three levels.
5.2 Overall Analyses (2): Between-Group Comparisons

of Delexical and Lexical VN Collocations
Vocabulary increase was first broadly measured in terms of the development from
delexical verbs to lexical verbs, viz. from very general to more specific verbs in
meanings, and then measured locally with reference to particular synsets. One of
the main hypotheses is with regard to the increase in verbs from delexical to lexical
verbs in collocation production: it is hypothesised that in VN collocation production
L2 learners at lower levels make more errors using delexical verbs, whilst those at
higher levels make more errors with lexical verbs. If this hypothesis is upheld, it
means that the learning of verbs, progressing from delexical to lexical verbs, does
not ensure better collocation competence, even though the growth of lexical verbs
provides more opportunities for L2 learners to be specific in choosing the right verb
to collocate with a noun in specific VN collocations.
As was pointed out in Sect. 2.2.3, the six commonest delexical verbs targeted are
do, give, have, make, take and get. Examples of well-formed and erroneous
delexical verb + noun collocations are: give + comment, make + money, take +
nap, *give + meeting, *take + joke, and *do + game. Examples of well-formed
and erroneous and lexical verb + noun collocations are achieve + aim, claim +
right, impose + burden, *ensure + law, *implement + act, and *teach + knowl-
edge. All the well-formed and erroneous DeLexVN and LexVN collocations in the
three databases were numerically tabulated in Table 5.3 and graphically presented
in Fig. 5.4.
Firstly, overall developmental patterns were revealed in the well-formed and
erroneous DeLexVN and LexVN collocations. As is clearly shown in Fig. 5.4, for
collocations that are correctly produced, there is with rising proficiency a clear
increase in lexical verb +noun collocations, and a decrease in delexical verb + noun
collocations. In general, it can be interpreted to the effect that the learning of more
5.2 Overall Analyses (2): Between-Group Comparisons … 87
Table 5.3 Well-formed and Learners Well-formed Erroneous

erroneous VN collocations in collocations collocations
the three levels of learners
DeLexVN LexVN DeLexVN LexVN
(tokens)
ST2 871 596 36 76
ST5 790 768 30 72
ST6 568 1097 34 130
1,200
1,000
Coll. freq.
800
Well-formed DeLexVN
600
Well-formed LexVN
400 Erroneous DeLexVN
200 Erroneous LexVN
0
ST2 ST5 ST6
Learner types
Fig. 5.4 Well-formed and erroneous VN collocations in the three levels of learners (tokens)
lexical verbs leads to L2 learners’ better production of VN collocations. This is a

good sign of general vocabulary development as well as a development in collo-
cational performance. However, it is interesting to note that at the same time L2
learners make more errors with LexVN collocations as proficiency rises, with
DeLexVN collocation errors remaining the same in quantity. Investigation of col-
location types rather than tokens in the three levels displayed the same develop-
mental pattern (see Fig. 5.5), and revealed a clear trend for the increase in erroneous
lexical verb + noun collocations from the ST5 to the ST6 level.
The above figures show the overall trend on both the correct and erroneous uses
of LexVN and DeLexVN collocations among different levels of learners. In
250
200
Coll. types
150 Well-formed DeLexVN

Well-formed LexVN
100
Erroneous DeLexVN
50 Erroneous LexVN
0
ST2 ST5 ST6
Learner types
Fig. 5.5 Well-formed and erroneous VN collocations in the three levels of learners (types)
Table 5.4 Well-formed VN Learners DeLexVN LexVN DeLexVN/LexVN

collocations produced by the
three levels of learners ST2 871 596 1.5:1
(tokens) ST5 790 768 1:1
ST6 568 1097 0.5:1
Notes ST2 and ST5: v2 = 22.57, p < 0.0001****
ST5 and ST6: v2 = 90.20, p < 0.0001****
ST2 and ST6: v2 = 199.3, p < 0.0001****
addition to a comparison of frequencies, further statistical analyses were performed

to see if these differences reached statistical significance. Analyses were subse-
quently carried out from two perspectives: comparisons of well-formed LexVN and
DeLexVN collocations in the three databases and comparisons of erroneous col-
locations produced by these three levels of learners.
5.2.1 Between-Group Comparisons of Well-Formed

DeLexVN and LexVN Collocations
Table 5.4 below shows the overall tokens of well-formed delexical and lexical
verb + noun collocations produced by the three levels of learners, and also gives
the information regarding the ratio of DeLexVN collocations divided by LexVN
collocations.
From this table, we can see that for ST2 learners, DeLexVN collocations were
used 1.5 times as often as LexVN collocations. For ST6 learners, this ratio dropped
sharply to 0.5, meaning that the DeLexVN collocations produced by the ST6 level
were only 0.5 times the number of LexVN collocations. Yet the ratio for the ST5
level was 1:1, indicating that they produced roughly equal numbers of DeLexVN
and LexVN collocations. These ratios demonstrate a clear growth in the production
of lexical verbs by ST6 learners.
Next, pairwise comparisons between the three groups were made using chi-
square test with Yate’s correction.6 A significant relationship was found between
the numbers of delexical verb + noun/lexical verb + noun collocations and learners
at different proficiency levels. More specifically, ST5 learners produced very sig-
nificantly more LexVN collocations than ST2 counterparts (v2 = 22.57,
p < 0.0001); Similarly, ST6 learners produced very significantly more LexVN
collocations than the ST5 level (v2 = 90.20, p < 0.0001); when the comparison was
made between ST2 and ST6 learners, the ST6 level produced very significantly
more LexVN collocations (v2 = 199.3, p < 0.0001). These statistical analyses,
together with the trend analyses presented in Fig. 5.4, indicate that Chinese EFL
6
This was chosen since the “Yates’ continuity correction is designed to make the chi-square
approximation better” (http://graphpad.com/guides/prism/6/statistics/index.htm?stat_chi-square_
or_fishers_test.htm) [Accessed 10 June 2013].
Table 5.5 Well-formed VN Learners DeLexVN LexVN DeLexVN/LexVN

three levels of learners (types) ST2 119 102 1.2:1
ST5 134 166 0.8:1
ST6 118 230 0.5:1
Notes ST2 and ST5: p = 0.0416*
ST5 and ST6: p = 0.0060**
ST2 and ST6: p < 0.0001****
learners’ production of lexical verb + noun collocations increases with rising

proficiency.
The above analyses were based on the overall collocation tokens. As has been
noted earlier in Sect. 5.1.2, there was an uneven distribution of collocation tokens
among types, so it is also necessary to consider the frequency of collocation types.
Table 5.5 presents the types of well-formed DeLexVN and LexVN collocations
in the three databases and the ratios of DeLexVN collocations divided by LexVN
collocations.
A similar analysis procedure was followed as in the above analyses of tokens.
Firstly, in terms of ratios, the numbers of delexical verb + noun collocations
decreased gradually as compared with those of lexical verb + noun collocations
from the levels of ST2 to ST6, with the ST5 in the middle level. This confirmed to
the trend as revealed in the analysis of collocation tokens. Secondly, statistical
analyses were performed. Considering that the number of types produced by
learners in this study was small in size, Fisher’s test was used instead of chi-square
test. Statistical significance was revealed between the production of LexVN col-
locations and learner types (ST2 and ST5: p = 0.0416; ST5 and ST6: p = 0.0060;
ST2 and ST6: p < 0.0001). Like the comparison of tokens between ST2 and ST6, a
highly significant difference was found in terms of collocation types, which means
that Chinese EFL learners’ production of lexical verb + noun collocations reliably
increases with rising proficiency. The next section goes on to examine the growth
trends for erroneous LexVN and DeLexVN collocations.
5.2.2 Between-Group Comparisons of Erroneous DeLexVN

and LexVN Collocations
The analyses of well-formed delexical verb + noun and lexical verb + noun col-
locations showed a clear trend towards an increase in the production of lexical
verb + noun collocations and decrease in delexical verb + noun collocations as L2
learners’ proficiency rises. That indicates a growing collocational competence with
the learning of more lexical verbs or nouns. The production of more LexVN col-
locations by more proficient learners seems unsurprising. It is natural that less
proficient learners tend to resort to general words rather than words of specific
meanings as constrained by limited vocabulary. However, as the learning of more
Table 5.6 Erroneous VN Learners DeLexVN LexVN

three levels of learners ST2 36 76
(tokens) ST5 30 72
ST6 34 130
Notes ST2 and ST5: p = 0.7671 ns
ST5 and ST6: p = 0.1398 ns
ST2 and ST6: p = 0.0354*
180
160
140
120
Coll. freq.
100
LexVN
80
60
DeLexVN
40
20
0
ST2 ST5 ST6
Learner types
Fig. 5.6 Erroneous VN collocations produced by the three levels of learners (tokens)
lexical verbs facilitates better verb choices (e.g. take + attitude, have + attention,
solve + problem in the ST2, but adopt + attitude, attract/catch + attention and
solve/resolve/tackle + problem in the ST6), it is not always facilitative since at the
same time more lexical verb + noun collocations were found to be incorrectly used
as learners become more proficient (see Table 5.6 and Fig. 5.6).
As is shown in the above graph, there was no increase in erroneous delexical
verb + noun collocations but there was an increase in erroneous lexical verb +
noun collocations from the ST2 to the ST6 level. Fisher’s test further showed a
strong trend for increasing lexical verb + noun errors in the ST6 level as compared
with the ST2 level (p = 0.0354), though no significant difference was found
between ST2 and ST5 levels, and ST5 and ST6levels. As was discussed earlier in
Sect. 5.1.3, the ST5 level stood out in terms of lower number of errors, than with
ST2 and ST6 levels. Analyses of erroneous collocation types were also carried
out and results showed no significant difference between groups, although there
was an increase in lexical verb + noun errors from the ST5 to the ST6 level (see
Appendix A).
5.2.3 Synopsis of Overall Analyses (2)
This section is concerned with quantitative analyses of the production of

well-formed and erroneous delexical verb + noun and lexical verb + collocations
by the three groups of learners, with the aim to test whether in VN collocation
learning lower levels of L2 learners make more errors with delexical verbs and
higher levels make more errors with lexical verbs. This prediction was upheld. It
was found that there was an increase in the ratios of erroneous lexical verb + noun
collocations from the ST2 level to the ST6 level and ST6 learners produced sig-
nificantly more errors with lexical verb + noun collocations than ST2 learners.
Although for all groups of learners, there were more LexVN collocation errors than
DeLex VN collocation errors, ST2 learners produced significantly more DeLexVN
errors compared with the ST6 level and the latter produced significantly more
LexVN errors.
In addition to the between-group comparisons of erroneous VN collocations,
well-formed ones were also analysed. Results showed that there was a gradual
increase in lexical verb + noun collocations and a decrease in delexical verb +
noun collocations among the three levels of learners. A significant relationship was
found between the numbers of LexVN collocations and learner groups, indicating a
significantly higher production of lexical verb + noun collocations with the rise of
proficiency. Combining this with the results obtained through analyses of erroneous
VN collocations, we can see, on the one hand, that the increase in lexical verbs
facilitates diversified verb + noun collocation production, and on the other, that
more errors are involved with lexical verb + noun collocations, even though the
increase in lexical verbs means more and better choices for L2 learners in choosing
the “right” verbs for the noun collocates. Therefore, in addition to the finding of a
poorer performance on the part of learners in lexical verb + noun collocations as
measured through the increase of verbs from delexical to lexical verbs in collo-
cation production, there is a need for a detailed analysis of the relationship between
lexical verb increase and collocation misuses. This detailed analysis will be per-
formed in Chap. 6.
Another interesting finding in this section was the position of ST5 learners. In
terms of overall collocation errors out of the total number of VN collocations, they
produced the fewest erroneous collocations among all levels (proportion of errors in
the ST5: 13%; ST2: 22%; ST6: 21%). Meanwhile, if the errors are divided into
erroneous DeLex VN and LexVN collocations, the ST5 level is still out of line with
the other two levels in terms of the ratio of DeLexVN to LexVN errors. However, in
terms of the well-formed VN collocations, the number of collocations produced by
ST5 learners is around the average of those produced by ST2 and ST6 learners
(ST5: 1558; ST2: 1467; ST6: 1665). The same is true for the ratios of well-formed
DeLexVN to LexVN collocations in the ST5 level (in terms of tokens: ST5: 1:1;
ST2: 1.5:1; ST6: 0.5:1; in terms of types: ST5: 0.8:1; ST2: 1.2:1; ST6: 0.5:1). We
can see that ST5 is in the transitional stage. Even though the ST5 level does not
show a significant difference between the two levels in terms of DeLexVN and
LexVN collocation errors, it is consistent with the trend for increasing use in
LexVN and decreasing use in DeLexVN collocations. As shown in Fig. 5.4, there is
a downward trend in delexical verb + noun collocations and an upward trend in
lexical verb + noun collocations. So ST5 is just at the place for the trend to be
monotonic. There is a slight progression in English proficiency between ST5 and
ST6 learners since ST5 is just lower than ST6, who have not spent much more time
on exposure in English than the ST5. However, there are significant differences
between the ST2 and ST6 level both in the number of well-formed and erroneous
DeLexVN collocations and LexVN collocations. The sharp difference between the
lowest level (ST2) and the highest level (ST6) makes the comparison between these
two levels more noteworthy. Based on this observation, comparisons in the fol-
lowing sections were mainly carried out between these two levels.
5.3 Overall Analyses (3): Verb Growth and Collocation

Errors
Up to this point, it can be seen that there is a significant increase in lexical verb
collocations, either correctly or incorrectly produced as learners proceed to the
advanced level. Before turning to the detailed analyses in Chap. 6 of verb increase
in a particular semantic set and VN collocations associated with these verbs, an
overall analysis was performed on the growth rate of all the verbs and nouns used
by the three groups. The aim of this quantitative analysis is to get a panoramic view
of vocabulary increase and its relationship with error growth. The aim is to compare
the growth rate of lexical verbs and the rates of collocation errors, in order to see
globally the interconnection of these two rates. Seen through the above finding that
there was a gradual increase in lexical verb + noun collocations both in terms of
tokens and types, it is predicted that there is a considerable increase in lexical verbs
and/or nouns.
Verbs and nouns were automatically retrieved through regular expressions per-
formed by PowerGREP and then lemmatised by Wordsmith (cf. Sect. 4.6). Examples
of lemmatised verbs are go (lemma), including goes, going, gone, went; legalise
(lemma), including legalised/legalized, legalises/legalizes, legalising/legalizing.7
Altogether the frequencies of lemmatised verbs/nouns used by the three groups,
their growth rates and growth rates of lexical verb + noun collocations are pre-
sented in Table 5.7.
The above table illuminates two interesting aspects. From the perspective of the
quantities of lemmatised verbs and nouns, the transitional stage of ST5 was further
7
The reason why words were lemmatised before their growth rate was calculated is that the various
forms of one word should be viewed as one word to avoid repetitive calculation. If all verb forms
were included, the four forms of the verb legalise were instead counted as four words in the ST6.
But as a matter of fact, for learners they’ve learnt only one verb—legalise.
5.3 Overall Analyses (3): Verb Growth and Collocation Errors 93
Table 5.7 Growth rates of lemmatised verbs, nouns and LexVN collocations
Verbs Nouns Well-formed Well-formed Erroneous Erroneous
(tokens) (types) (tokens) (types)
ST2 1473 3345 596 102 76 41
ST5 1771 3857 768 166 72 27
ST6 2049 4199 1097 230 130 71
Growth rates 39% 26% 84% 125% 71% 73%
(ST2–ST6)
confirmed (cf. Sect. 5.2.3). To be more specific, there is a gradual increase in verbs
and nouns from the ST2 to ST6, and the ST5 is again in mid-position allowing the
upward trend to be monotonic. It is not surprising that L2 learners learn more and
more verbs and nouns with increasing exposure to English. Another interesting point
we can observe here is regarding the growth rates of verbs and nouns in comparison
with the growth rates of well-formed and erroneous LexVN collocations. With the
learning of more verbs and nouns, the possibilities of combining them into
well-formed collocations increase as well (growth rates of well-formed collocations:
84% and 125%). This can be interpreted to the effect that with the increase in lexical
verbs in learners’ overall vocabulary, there are more chances for them to locate the
right lexical verbs to produce correct verb + noun collocations. At the same time, the
learning of more nouns means more diversified combinations of lexical verbs into
well-formed VN collocations. However, the chances that this vocabulary growth (for
verbs: 39% and for nouns: 26%) may lead to errors increase as well, seen through the
high growth rates of erroneous lexical verb + noun collocations (71% in terms of
tokens and 73% in terms of types). On the whole, these data suggest that the more
learners learn, the more chance there is they will make mistakes.
In addition, Table 5.7 also shows a higher growth rate of verbs (39%) than
nouns (26%). So the worsening collocation performance in lexical verb + noun
collocations can be inferred as more linked to verb increments than noun incre-
ments. In what follows, detailed analysis of verb increase was conducted, in order
to see collocation errors that are linked to verbs with an increase in a given syn-
onym set. Furthermore, the fact that nouns increase 26% from the ST2 to ST6
suggests that learning nouns also plays a role in the production of VN collocations.
Considering this, a case study is conducted to consider the ratios of collocation
errors associated with the learning of new nouns (see Sect. 6.3).
5.4 Synopsis of the Overall Analyses of Verb + Noun

Collocations
Sections 5.1, 5.2 and 5.3 set out to quantitatively examine the production of
verb + noun collocations by three levels of Chinese EFL learners. Unlike most
previous L2 collocation studies, this study started with an exhaustive extraction of
both the well-formed and erroneous VN collocations, and these collocations were
further divided into delexical verb and lexical verb collocations with a view to
looking into verb vocabulary growth, from delexical verbs to lexical verbs. An
apparent-time design was adopted, assuming that the performance of different age
groups of learners at different proficiency level is indicative of a continuous
developmental process. The above three sections presented the overall results
obtained through general quantitative analyses and findings were as follows:
• Results of the overall collocations (only around 5000) out of over 600,000
words of text support the findings of the ‘open-choice principle’ employed by
L2 learners in language production by comparison with NSs. Though it is
difficult to compare the number of VN collocations relative to the size of
writings by L2 learners due to the heterogeneity of L2 VN collocation studies,
compared with previous findings with regard to NS performance, L2 learners
produced a far smaller number of collocations. In addition, the overall number
of collocation types (1070) is rather low compared with tokens. Findings
showed a heavy use of a rather small number of collocation types, i.e. less than
14% types of collocations taking up more than half of the collocations produced
by Chinese EFL learners at all proficiency levels. These figures together indicate
poor L2 phraseological competence, as collocations are sparsely and repeti-
tiously used by L2 learners.
• Collocation overuse, as has been widely recognised, was discovered in this
study and yet in a different way. Unlike previous studies, the finding of overuse
was not based on comparisons of native and non-native data. Instead, com-
parisons of collocation production were performed within learner data in this
study. A rather limited type of collocations were “overused” in the sense that
they occurred more frequently than other collocation types, making up more
than half of all collocation tokens. Therefore, learners’ collocation interlanguage
is characterised by a small number of frequent collocations making up large
portions of all collocation uses.
• Collocation misuses were found at all levels, with varying percentages. In
general, nearly a quarter of the collocations produced by L2 learners were
erroneous. The quantity of errors was analysed with regard to L2 proficiency,
and statistical analysis showed that there was no significant difference between
the numbers of erroneous collocations and learners of the ST2 and ST6 level.
However, there were a persistent proportion of collocation misuses with the rise
of proficiency (22% in the ST2 level and 21% in ST6 level). Rather than a
decrease in errors as L2 learners’ proficiency rises, collocation misuses remain
at the same level, which indicates a lag in collocation acquisition.
• In terms of the production of delexical verb + noun and lexical verb + collo-
cations by the three groups of learners, an upward trend for well-formed lexical
verb and downward trend for delexical verb collocations were found from the
ST2 to the ST6 level. Further statistical analyses confirmed this trend by
showing a significant increase in well-formed lexical verb + noun collocations
with the rise of proficiency.
5.4 Synopsis of the Overall Analyses of Verb + Noun Collocations 95
• Although there was a significant increase in well-formed lexical verb colloca-

tions, which indicates a rising collocational competence at the ST6 level, there
were also significantly more lexical verb collocation errors at the ST6 level
compared with the ST2 learners. It was observed that there was an increase in
the ratios of erroneous lexical verb + noun collocations from the ST2 level to
the ST6 level and lower levels of learners made more errors with delexical verbs
and higher levels made more errors with lexical verbs. ST6 learners produced
significantly more errors with lexical verb + noun collocations than ST2
learners in terms of collocation tokens.
• The growth rates of verbs and nouns were measured and then compared with the
growth rates of lexical verb + noun collocations. Results indicated that with the
learning of more verbs and nouns, the possibilities of combining them into
well-formed collocations increased, but errors increased sharply as well. The
huge increase of lexical verb collocation errors in the ST6 level as compared with
the ST2 can be considered as an association more with the general growth in
verbs than the nouns, given the more rapid growth of the verbs than the nouns.
In sum, from a developmental perspective, there emerges a complex develop-
mental pattern of VN collocation knowledge, a quantitative progression but qual-
itative degradation in the development of Chinese L2 learners’ knowledge of
collocations. From the perspective of well-formed collocations, there is an
increasing and more diversified production of collocations with the rise of profi-
ciency. In spite of this positive sign of development, the percentage of collocation
errors are still high at the ST6 level. They produced significantly more lexical verb
errors than the ST2 level, suggesting a poorer performance on the part of learners in
lexical verb + noun collocations as measured through the increase in verbs from
delexical to lexical verbs in collocation production. General verb increase was
found from the lowest to the highest level. It seems the more verbs/nouns learners
acquire, the more they make errors. So there is a need for a detailed analysis of the
relationship between lexical verb increase and collocation misuses. The next
chapter will, therefore, be concerned with analysing the collocations of verbs as
classified into synonym sets.
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and
non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31
(2), 81–92.
Altenberg, B., & Granger, S. (2001). The grammatical and lexical patterning of MAKE in native
and non-native student writing. Applied Linguistics, 22(2), 173–195.
101–114.
Channell, J. (1994). Vague Language. Oxford: Oxford University Press.
Cowie, A. P. (1991). Multiword units in newspaper language. In S. Granger (Ed.), Perspectives on

the English lexicon: A Tribute to Jacques Van Roey (pp. 101–116). Louvain-la-Neuve: Cahiers
de I’Institut de Linguistique de Louvain.
Cowie, A. P. (1993). Multiword Lexical Units and Communicative Language Teaching.
In P. Arnaud, & H. Bejoint (Eds.), Vocabulary and Applied Linguistics (pp. 1–12). London:
Macmillan.
De Cock, S., Granger, S., Leech, G., et al. (1998). An automated approach to the phrasicon of EFL
learners. In S. Granger (Ed.), Learner English on Computer (pp. 67–79). London: Longman.
Durrant, P., Schmitt, N. (2009). To what extent do native and non-native writers make use of
collocations?. International Review of Applied Linguistics, 47, 157–177.
Fitzpatrick, T. (2006). Habits and rabbits: Word associations and the L2 lexicon. EUROSLA
Yearbook, 6, 121–145.
production of native and non-native speakers. In M. Bygate, P. Skehan, M. Swain (Eds.),
Harlow: Longman.
In A. P. Cowie (Eds.), Phraseology: Theory, analysis, and applications (pp. 145–160). Oxford:
Howarth, P. (1998a). The phraseology of learners’ academic writing. In A. P. Cowie (Eds.),
Press.
24–44.
Kaszubski, P. (2000). Selected aspects of lexicon, phraseology and style in the writing of polish
advanced learners of English: A contrastive, corpus-based approach, [2011-10-13]. http://main.
Lorenz, G. (1999). Adjective intensification—learners versus native speakers: A corpus study of
argumentative writing. Amsterdam: Rodopi.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol: Multilingual
matters.
Zareva, A., & Wolter, B. (2012). The ‘promise’ of three methods of word association analysis to
L2 lexical research. Second Language Research, 28(1), 41–67.
Zhang, X. (1993). English collocations and their effect on the writing of native and non-native
college freshmen. Indiana: Indiana University of Pennsylvania.
Chapter 6
Verb Increase and the Production
of Verb + Noun Collocations
The main research goal of this study is to answer the question whether, within
specific semantic domains of the verbs occurring in verb + noun collocations
produced by all levels of learners, there are more chances of these verbs in higher
levels to lead to collocational errors than they form the correct ones. Thus, Sect. 6.1
presents detailed analyses of VN collocations where verbs were classified into
synsets, aiming to investigate the relationship between verb increase and colloca-
tion errors. Section 6.2 gives a summary of the detailed analyses of verbs in synsets
and learners’ collocation performance with these verbs; Section 6.3 looks into
whether there are other factors accounting for a lag in collocation, i.e. the acqui-
sition of new nouns.
6.1 Detailed Analyses—Verb Increase

and Collocation Uses
The groups of learners first targeted were the ST2 and ST6, i.e. the lowest and the
highest levels. The reasons why the ST5 level was not included as the first step in
analysis are as follows: first, the ST5 level, as the transitional stage, produced the
fewest erroneous collocations as compared with the other two levels. Results from
statistical tests showed that they produced very significantly fewer VN collocation
errors than both the ST2 and ST6 groups of learners (cf. Sect. 5.1.3). Second,
analysis of erroneous lexical verb collocations in the ST5 file yielded the same
result: they produced the fewest erroneous lexical verb collocations and no sig-
nificant relationship was found between the ST5 level and the other two levels with
regard to the number of LexVN errors produced. However, there was a strong trend
for increasing LexVN errors at the ST6 level as compared with the ST2 level
(cf. Sect. 5.2.2). Therefore, the increase in lexical verbs was sharper in the ST6
level as compared with the ST2 level than as compared with the ST5 level. So we
DOI 10.1007/978-981-10-5822-6_6
98 6 Verb Increase and the Production of Verb + Noun Collocations
started by classifying verbs in VN collocations in the ST6 level into synsets, then
classified the lexical verbs in VN collocations in the ST2 level into synsets, and
compared VN collocations within these synsets between the two levels. Finally,
verbs in the VN collocations in the ST5 level were added for general comparison
with the other two levels, in order to see whether there is a consistent trend in
learners’ collocation performance within these synsets.
As was presented in Sect. 4.7.2, the criteria for classifying verbs in VN collo-
cations were semantic similarity (synonyms), context-dependent synonyms and
foreign-language equivalents. Three sources for determining semantic similarity
were referenced: the EVCA, the ODSA and WordNet. Verbs in the VN collocations
in the ST2 and ST6 databases were classified into synsets such as, verbs of creation
(e.g. compose, create, build), and verbs of obtaining (e.g. achieve, earn, receive),
etc. Verb + noun collocations produced by Chinese L2 learners were limited in
quantity (cf. Sect. 5.1.2), so the verbs in collocations were found to be infrequent.
Due to the rather infrequent uses of verbs in collocations produced by EFL learners
with proficiency ranging from the basic level to the advanced level, only a limited
number of synsets that occurred in both databases were obtained (see Table 6.1 for
the 16 synsets classified).
As is shown in the above table, there was an increase in verbs in the first 12
synsets, but the increase varied in different synsets. More dramatic verb increase at
the ST6 level was found in the semantic sets of verbs of creation, fulfil verbs, verbs
of putting and settle verbs than other synsets like verbs of obtaining, learn verbs,
verbs of transfer of a message, keep verbs, follow verbs, play verbs, change verbs
and break verbs. But for the last four synsets, there was no such increase in the
quantity of verbs in the ST6 level. The proliferation of verbs in the higher level is a
natural process as learners learn more words with more instruction they receive.
The more verbs in a semantic field learners learn, the more specific they can be in
expressing meanings. However, as is pointed out by Wolter (2006), L2 learning is
not merely restricted to expanding vocabulary size: the depth of vocabulary
knowledge is of equal importance and one measure of vocabulary depth is the
learning of syntagmatic connections between words. But as will be shown below,
greater specificity was acquired at a loss for L2 learners. In the following detailed
discussion of both the well-formed and erroneous VN collocations of these verbs in
the 12 synsets, we set out to examine whether these verbs lead to more errors than
they form correct collocations.
6.1.1 Analysis of VN Collocations Within Synsets Identified

at the ST2 and ST6 Levels
Both well-formed and erroneous VN collocations involving verbs in the16 synsets

in Table 6.1 were, respectively, recorded at the ST2 and ST6 levels of learners. For
classifying erroneous collocations, errors involving both the wrongly used verbs
6.1 Detailed Analyses—Verb Increase and Collocation Uses 99
Table 6.1 Synsets occurring both in ST2 and ST6 VN collocation databases
Synsets Verbs
ST2 ST6 No.
1 Verbs of Compose, create, Arouse, chart, build, draft, draw, 7
creation draw, hold, enact, establish, form, hold, launch,
launch, raise, set publish, raise, set, stir
2 “Fulfil” verbs Discharge, fulfil Accomplish, apply, carry out, 10
commit, conduct, enforce, exercise,
exert, fulfil, implement, perform,
realise
3 Verbs of Achieve, earn, Achieve, catch, earn, gain, grasp, 2
obtaining gain, gather, reach, receive, seize
grasp, receive
4 Verbs of Lay Attach, fix, impose, lay, place, put 5
putting
5 “Settle” verbs Settle, solve Charge, settle, solve, resolve, tackle, 4
undertake
6 “Learn” verbs Know, learn, Acquire, learn, master, study 1
study
7 Verbs of Teach, tell Impart, instruct, teach, tell 2
transfer of a
message
8 “Keep” verbs Hold, keep Hold, keep, maintain 1
9 “Follow” verbs Follow, obey Adopt, follow, obey 1
10 “Play” verbs Play Act, play 1
11 “Change” verbs Change Change, shift 1
12 “Break” verbs Break Break, violate 1
13 “Live” verbs Lead, live Lead, live 0
14 “Wear” verbs Dress, wear Dress, wear 0
15 “Drive” verbs Drive, ride Drive −1
16 “Pay” verbs Devote, pay Pay −1
Note The ‘No.’ column represents the number of verbs at the ST6 level that are more than verbs at
the ST2 level
and the target verbs falling in the synsets were included. In other words, errors
included not only the verbs within the synsets that were inappropriately produced,
but also verbs that should be produced but not. For example, *create (com-
pose) + song was classified as a collocation error in the synset of verbs of creation,
since the wrongly used verb (create) and the target verb (compose) have the
semantics of creation. In addition, *make (compose) + poem was also counted as a
collocation error falling in the synset of verbs of creation, given that the target verb
create was in the semantic field of creation. Detailed classification of well-formed
and erroneous collocations of the verbs within these synsets in ST2 and ST6 is
provided in Appendices B and C respectively. Analyses were performed on L2
learners’ collocation performance in synsets with a verb increase (i.e. the first 12
Table 6.2 Frequency of ST2 ST6

well-formed and erroneous
WFC EC WFC EC
VN collocations in the 16
verb synsets (ST2 and ST6) Synsets with a verb increase 39 22 126 65
Synsets with no verb 11 2 7 2
increase
Notes ‘WFC’ stands for well-formed verb + noun collocations;
‘EC’ for erroneous verb + noun collocations
synsets in Table 6.1) and synsets with no increase in verbs in the ST6 level (i.e. the
last 4 synsets). Table 6.2 presents the total number of collocation types within two
different kinds of synsets (for detailed information about the frequencies in each
synset see Appendix D). The frequency of tokens was not considered in the fol-
lowing analyses so as to avoid skewing the overall results, since there was an
unbalanced distribution of tokens within a limited range of collocation types
(cf. Sect. 5.1.2).
Within the 12 synsets where there was an increase in verbs in the ST6 level (e.g.
synsets of verbs of creation, fulfil verbs, etc.), well-formed VN collocations
increased dramatically from 39 to 126 in frequencies. However, there was also an
increase in collocation errors from 22 in the lowest level to 65 in the highest level.
In contrast, among the 4 synsets where no increases in verbs were found in the ST6
level (e.g. synsets of live verbs, wear verbs, drive verbs and pay verbs), collocation
errors remained constant from the ST2 to the ST6 (2 types of errors in total in each
level). In terms of proportions, the percentage of erroneous collocations involving
verbs in synsets with a verb increase out of the total number of collocations pro-
duced by ST2 learners was 36% (22/(39 + 22)), and for ST6 learners, the per-
centage was 34% (65/(126 + 65)). The percentage of collocation errors that ST6
learners made in synsets with a verb increase was roughly the same as that in the
ST2 level. This finding indicates a lag in collocational knowledge for more profi-
cient learners. More precisely, even though ST6 learners were more advanced and
acquired more lexical verbs, they were as likely to make verb + noun collocation
errors as much less proficient learners (ST2 learners). There was no sign of an
improving competence on VN collocations with the rise of proficiency.
Not only was a lag found in learners’ collocation performance in synsets with an
increase in verbs, the occurrence of collocation errors involving these synsets in the
ST6 was found to be more limited to elaborated synsets than it was in the ST2 level.
The total number of erroneous collocations produced by the two groups of learners,
respectively, were 64 (ST2) and 93 (ST6) (cf. Table 5.2). So the proportion of
collocation errors associated with verbs in the 12 synsets in ST2 was 34% (22/64).
For ST6 learners, the ratio was twice as high as that of ST2 learners −70% (65/93).
An increase in erroneous collocations in these synsets was found. Again, this gross
analysis of collocation errors out of the total number of errors indicated that the
more verbs that were learned by higher levels, the more collocation errors were
produced.
What has been found up to now supports the general prediction that verb
increase is a factor responsible for the stagnant development of collocational
knowledge. However, caution is needed here, since collocation errors in the above
analyses include both verbs that are old, i.e. verbs produced by the ST2 level in VN
collocations, and verbs that are new in the ST6 level, i.e. newly learned verbs that
were not found in ST2 VN collocation databases. The 65 collocation errors pro-
duced by ST6 learners involve both errors with old verbs and new verbs. For
example, given that the verb draw in verbs of creation has been used by the lower
level (e.g. draw + conclusion), it was considered as an already acquired verb for
learners at higher levels. Similarly, conduct was not used by ST2 learners in VN
collocations but was present in the ST6 level, so it was considered as a new verb. In
the calculation of erroneous verb + noun collocations in Table 6.2, errors involving
both the old verb (*make (draw) + conclusion) and the new verb (*conduct
(commit) + crime) were included. Therefore, ST6 learners’ collocation performance
on old verbs and new verbs should be distinguished, in order to look at whether
new verbs are associated with more errors than they form correct collocations.
In the process of distinguishing errors associated with old verbs and new verbs in
the ST6 level, the following criteria were adopted: if errors involved new verbs (e.g.
*publish (enact) + law, publish was a new verb in the ST6), they were put in the
category of errors with new verbs; if the error involved old verbs, but the target verb
was a new verb (e.g. make (conduct) + exam, conduct was a new verb in the ST6),
it was classified as errors with new verbs; if the error involved old verbs (e.g. *draw
(formulate) + theory, draw is an old verb for the ST6 level), but the target verb was
not a new verb in the synsets identified, it was considered as an error with old verbs.
Following these criteria, VN collocations associated with the verbs in the synsets
identified were divided into those with old verbs and new verbs. Examples of
well-formed VN collocations associated with old verbs are: launch + war, set +
fire; examples of well-formed VN collocations associated with new verbs are:
chart + course, draft + law; examples of collocation errors associated with old
verbs are *make (draw) + conclusion, *take (launch) + career; examples of errors
with new verbs are *arouse (cause) + trouble, *take (conduct) + survey. The fre-
quency information of collocation errors, divided into errors with old and new verbs
in the 12 synonym sets, is tabulated in Table 6.3.
As is shown in Table 6.3, the overall number of well-formed collocations with
old verbs and new verbs did not show an increase, but errors involving new verbs
increased sharply. The error percentage associated with new verbs out of the
number of their collocation uses is 41%, while that of old verbs is only 25%. Apart
from a comparison in percentages, further statistical analysis was performed.
Fisher’s test revealed a significant difference between old and new verbs in terms of
the number of erroneous collocations (p = 0.0216; see Table 6.4). In other words,
collocation errors involving new verbs are significantly more likely than errors with
old verbs.
Turning now to the synsets where L2 learners had more problems with new
verbs than with old verbs, it becomes clear from Fig. 6.1 that errors with new verbs
falling into the semantic domains of verbs of creation, fulfil verbs, verbs of
Table 6.3 VN collocation production involving old and new verbs at the ST6 level
Synsets Old verbs New verbs
Verbs WFC EC Verbs WFC EC
1 Verbs of Draw, 13 4 Arouse, chart, build, 6 11
creation hold, draft, enact, establish,
launch, form, publish, stir
raise, set
2 “Fulfil” Fulfil 3 1 Accomplish, apply, 23 16
verbs carry out, conduct,
enforce, exercise,
exert, implement,
perform, realise
3 Verbs of Achieve, 18 2 Catch, reach, seize 6 6
obtaining earn,
gain,
grasp,
receive
4 Verbs of Lay 2 2 Attach, fix, impose, 12 6
putting place, put
5 “Settle” Settle, 2 0 Charge, resolve, 4 1
verbs solve tackle, undertake
6 “Learn” Learn, 0 4 Acquire, master 1 1
verbs study
7 Verbs of Teach, 3 1 Impart, instruct 1 2
transfer of tell
a message
8 “Keep” Hold, 12 2 Maintain 3 0
verbs keep
9 “Follow” Obey, 4 1 Adopt 4 0
verbs follow
10 “Play” Play 2 2 Act 0 1
verbs
11 “Change” Change 1 1 Shift 1 0
verbs
12 “Break” Break 3 1 Violate 2 0
verbs
Total 63 21 63 44
84 107
ER 25% 41%
Notes ‘WFC’ stands for well-formed verb + noun collocations; ‘EC’ for erroneous verb + noun
collocations; ‘ER’ represents the ratio of the errors out of all the collocations examined in the
column
Table 6.4 Collocation uses Well-formed coll. Erroneous coll. Total

involving old verbs and new
verbs at the ST6 level Old verbs 63 21 84
New verbs 63 44 107
Note p = 0.0216 * Sig.
Fig. 6.1 Collocation errors involving old and new verbs in the ST6 synsets
obtaining, verbs of putting, settle verbs and verbs of transfer of a message occurred
more often than with old verbs. This result is strongly linked with synset classifi-
cation in which there is a proliferation of verbs just within the six synsets (cf.
Table 6.1). Therefore, verb increase as an inhibiting factor on target-like L2 col-
location performance is again supported.
6.1.2 Analysis of VN Collocations Within Synsets Identified

at the ST2, ST5 and ST6 Levels
The previous section has addressed the relationship between verb increase and
collocation errors in the lowest (ST2) and highest (ST6) levels and quantitative
analysis shows: (1) errors in the synsets with more verbs learnt in the ST6 level than
the ST2 level increased with rising proficiency; (2) errors with new verbs were
significantly more likely than with old verbs. In this section, the focus shifts to the
middle level (ST5), so as to see whether there is a consistent trend in the ST5 level
in terms of both verb increase in the 12 synsets and collocation uses linked with
these verb synsets. It is predicted that the performance of ST5 learners is consistent,
i.e. compared with the ST2 level there is an increase in verbs in the synsets and at
the same time an increase in collocation errors associated with the verbs in the
synsets, and there are fewer verbs and collocation errors, compared with the ST6
level.
All the lexical verbs in the verb + noun collocations in the ST5 database were
classified into synsets following the same procedure and criteria applied in the
analyses of the other two levels. Table 6.5 lists the classification of verbs in the ST5
level together with the verbs in synsets identified in the ST2 and ST6 levels.
Table 6.5 Verb synsets classified from ST2, ST5 and ST6 VN collocation databases
Types Synsets Verbs
ST2 ST5 ST6
1 Verbs of Compose, Arouse, build, conduct, Arouse, chart, build,
creation create, draw, draw, establish, form, draft, draw, enact,
hold, launch, hold, launch, produce, establish, form, hold,
raise, set publish, raise, set launch, publish, raise,
set, stir
2 “Fulfil” Discharge, Apply, enforce, fulfil, Accomplish, apply,
verbs fulfil perform, practice carry out, commit,
conduct, enforce,
exercise, exert, fulfil,
implement, perform,
realise
3 Verbs of Achieve, Achieve, catch, earn, Achieve, catch, earn,
obtaining earn, gain, gain, grasp, reach, gain, grasp, reach,
gather, receive, seize receive, seize
grasp,
receive
4 Verbs of Lay Attach, lay, place, put, Attach, fix, impose, lay,
putting set place, put
5 “Settle” Settle, solve Resolve, solve Charge, settle, solve,
verbs resolve, tackle,
undertake
6 “Learn” Know, learn, Learn, master, study Acquire, learn, master,
verbs study study
7 Verbs of Teach, tell Teach, tell Impart, instruct, teach,
transfer of tell
a message
8 “Keep” Hold, keep Hold, keep Hold, keep, maintain
verbs
9 “Follow” Follow, obey Adopt, follow, obey Adopt, follow, obey
verbs
10 “Play” Play Play Act, play
verbs
11 “Change” Change Change Change, shift
verbs
12 “Break” Break Break, violate Break, violate
verbs
13 “Live” Lead, live Lead, live Lead, live
verbs
14 “Wear” Wear, dress Wear Wear, dress
verbs
15 “Drive” Drive, ride Ride Drive
verbs
16 “Pay” Devote, pay Pay Pay
verbs
In terms of the variety of verbs falling in the first 12 synsets, the ST5 level falls
in the middle between the lower level (ST2) and the higher level (ST6). In each of
the 12 synsets, they produced verbs no more than the higher level and no less than
the lower level. A clear and consistent trend for verb increase is shown from the
above table. More and more verbs were learned in the semantic domains of verbs of
creation, fulfil verbs, verbs of obtaining and verbs of putting. On the whole, the
quantity of verbs in the ST5 level is more like the verbs produced by ST2 learners,
since in the 6 synsets of settle verbs, learn verbs, verbs of transfer of a message,
keep verbs, play verbs, change verbs and live verbs, the ST5 level shows no
increase in verbs compared with the lowest level.
Collocations with the verbs in the 16 synsets in the ST5 level were divided into
correct and erroneous uses, as is shown in Appendix E. Then similar analyses of
well-formed and erroneous collocation uses associated with verbs in the 12 synsets
were performed on the ST5 data (see the numerical presentation of collocation uses
associated with the 12 synsets in the three levels in Table 6.6, and for the detailed
frequency information, see Appendix F).
For the last four synsets in which higher levels did not produce more verbs than
the lower levels, no increase in well-formed and erroneous VN collocations was
found from the lowest to the highest level (well-formed collocations: from 11 to 6
to 7; errors: from 2 to 1 to 2). In contrast, the overall figures for well-formed and
erroneous VN collocations in each proficiency group showed that within the synsets
where there was a verb increase (the first 12 synsets), more and more well-formed
collocations (from 39, to 61 and then to 126 types) were produced; at the same time
errors increased as well from the ST2 to the ST6 level (from 22 to 65 types). There
was not an error increase from the ST2 to the ST5 level, which may be due to the
slight verb increase in synsets in the ST5 level compared with the ST2 level. In
addition, in terms of proportions, the percentage of collocation errors involving the
12 synsets in the ST5 was the lowest: 27% (for ST2: 36%; ST6: 34%) (see
Table 6.7). When the numbers of errors are placed into the bigger context of the
total collocation errors in each level, there is a clear increase in the proportions of
errors involving verbs in the 12 synsets.
As Table 6.7 reveals, the ratios of errors out of VN collocations associated with
the synsets identified did not show much decrease with rising proficiency, indi-
cating a general lag in collocational knowledge. In addition, out of the total number
Table 6.6 Well-formed and erroneous VN collocations in the 16 verb synsets (ST2, ST5 and
ST6)
ST2 ST5 ST6
WFC EC Total WFC EC Total WFC EC Total
The first 12 39 22 61 61 22 83 126 65 191
synsets
The last 4 11 2 13 6 1 7 7 2 9
synsets
Note ‘WFC’ stands for well-formed VN collocations, and ‘EC’ for erroneous VN collocations
Table 6.7 Proportions of VN collocation errors associated with the 12 synsets with a verb
increase
Levels Errors in synsets Total errors Total colls. in synsets ER1 (%) ER2 (%)
ST2 22 64 61 34 36
ST5 22 44 83 50 27
ST6 65 93 191 70 34
Notes ‘colls.’ stands for collocations; ‘ER1’ represents the ratio of errors in synsets out of the total
number of collocation errors in each level; ‘ER2’ represents the ratio of errors in synsets out of the
total number of both well-formed and erroneous collocations involving the 12 synsets
of VN collocation errors, collocation errors within the 12 synsets increased

markedly in proportions. Again this trend conforms to the trend that has been
discussed earlier, i.e. verb + noun collocation errors produced by the learners
became more and more limited to the synsets with an increase in verbs, as learners’
proficiency rose. If these errors are localised in each synset (see Fig. 6.2 for the
depiction), a clear error increase can be seen at the ST6 level, which encompasses
the most verbs in the 12 synsets. This increase is most manifested in the first four
synsets, which see the greatest increase in verbs from the ST2 to the ST6 levels
(cf. Table 6.1).
Up to this point, synsets that were particularly problematic for L2 learners
emerge. As can be seen from the above figure, verbs of creation, fulfil verbs, verbs
of obtaining and verbs of putting pose particular problems for advanced Chinese
learners of English. Previous research has shown that they are also susceptible to
Fig. 6.2 VN collocation errors with the verbs in the twelve synsets across the three levels
errors for advanced German-speaking learners of English. Examining the erroneous

VN collocations produced by German advanced learners of English, Nesselhauf
(2005) reported similar semantic verb groups where L2 learners had particular
problems. As in this study, the verbs wrongly used included those inappropriately
produced and those that were intended but not produced. One group of semantically
related verbs susceptible to error is what we classified here as verbs of obtaining,
though Nesselhauf did not provide an umbrella term. However, she gave a list of
the verbs: achieve, reach, acquire, obtain and gain. Both learners in her and our
studies had considerable difficulty with the right collocational choice. For example,
*reach + recognition was wrongly used instead of receive + recognition; and
*get + conclusion instead of reach + conclusion. Another verb group with which
Nesselhauf reported learners’ difficulty was verbs meaning “carrying something
out”, which in this study were the fulfil verbs encompassing the largest number of
errors for the ST6 learners. Both the German and Chinese learners were confused in
choosing the correct verbs, e.g. accomplish and conduct were produced instead of
commit in collocation with crime; delexical verbs like do and make were also
wrongly chosen for the noun crime; fulfil + plan was produced rather than imple-
ment + plan; and *implement + act was wrongly used instead of perform + act,
etc. In addition, Nesselhauf reported that the verb group comprising verbs such as
create, establish, set and set up also posed particular problems for learners and this
group of verbs was here classified as verbs of creation. Examples of errors within
this synset are *publish + law instead of enact + law, *stir + consciousness instead
of raise + consciousness, *raise + discussion instead of arouse + discussion.
In general, ST6 learners struggled in choosing the right lexical verbs from sets of
semantically related verbs; and in a few cases delexical verbs were used instead of
lexical verbs, e.g. *make + exam (conduct), *take + survey (conduct),
*build + regulation (enact), *build + tie (establish), *make + conclusion (draw),
*take + military service (perform). However, ST2 learners were more likely to use
delexical verbs instead of lexical verbs within the synsets classified in this study.
For example, among the 22 VN collocation errors produced by ST2 learners in the
synsets, a third were found to be linked with delexical verbs, e.g. *give + meeting,
*make + result, *do + problem, *get + lesson, etc.
Meanwhile, in the following cases of collocations, the erroneous verbs were
observed to share certain phonological resemblance to the target verbs, e.g. the
wrongly used verb—arise—and the target—arouse—in collocation with discus-
sion, draw and draft followed by law. These instances can be seen as a result of
phonological interference, which is consistent with word association studies
showing that learners’ responses tended to have a phonological similarity to
stimulus words whilst native speakers produced associations based on syntagmatic
and paradigmatic relations (Meara 1978).
Even in synsets where no errors were found in the ST2 level, e.g. fulfil verbs,
verbs of putting, keep verbs, change verbs and break verbs, this does not neces-
sarily mean a full acquisition of verbs in these sets (e.g. verbs like discharge, fulfil,
lay). Instead, it indicates the limited vocabulary that ST2 learners have. Although a
limited vocabulary may lead to success in choosing the right verbs for a noun in
certain collocations, it may cause imprecision in collocation production when the

need for using other collocations arises. However, with the expansion of vocabulary
size, more chances of incorrect production arise as learners are faced with more
difficulty choosing the right verb from a set of semantically related verbs.
6.2 Synopsis of Detailed Analyses of Verb Increase

Through the above-detailed analyses of verb increase within a certain semantic set
and their relationship with collocation performance, the development of learners’
VN knowledge was observed to stagnate. On the one hand, the percentages of
errors involving the 12 synsets identified from learners’ production of VN collo-
cations remained roughly constant (36% at the ST2 level and 34% at the ST6 level)
with rising proficiency. In addition, errors with new verbs in the ST6 level were
significantly more likely to be made than old verbs. On the other, the ratios of
collocation errors with the verbs in the synsets out of the total number of errors
increased successively from the lowest to the highest level (ST2: 34%; ST5: 50%
and ST6: 70%). As learners proceeded to more advanced levels, the occurrence of
collocation errors became more and more limited to synsets with a verb increase.
Verb classes most susceptible to errors were verbs of creation, fulfil verbs, verbs of
obtaining and verbs of putting, where there was a considerable increase in the
number of verbs. Thus, we consider that the increase of verbs in a particular
semantic domain is an inhibiting factor for the learning of VN collocations.
There is a view that collocations defeat even the most proficient English learners
because of the arbitrary restrictedness in collocations (Nesselhauf 2003, 2005).
Collocations are not just semantically motivated, but also involve arbitrarily
restricted selection. For example, blonde hair in English is felicitous, but *blonde
paint is not and auburn hair is used to describe women, but not men (Schmitt and
Carter 2004: 14). The restrictedness nature of collocation has been considered as the
most important factor correlating with learners’ difficulties with collocation pro-
duction (Nesselhauf 2003, 2005).
In light of our findings, what poses great difficulties for L2 learners is to dis-
tinguish among a group of semantically related verbs (e.g. perform vs. implement,
conduct vs. commit, etc.). In the erroneous collocation *conduct + crime instead of
commit + crime, it can be seen that they both share the semantic features of “carry
out something”, but differ from each other in the sense that “commit” denotes
“doing something illegal or bad”, but “conduct” means “organising something and
carry it out”.1 Likewise, in the following collocation error: *implement + act, the
learner who has made such an error may know a partial meaning of the verb
1
The meanings of the two verbs were quoted from Collins COBUILD Advanced Learner’s English
Dictionary (2006).
6.2 Synopsis of Detailed Analyses of Verb Increase and Collocation Uses 109
implement (i.e. the semantic component as “carrying out something”) but not its
complete semantics. This partial acquisition has led to the learner’s incorrect belief
that the erroneous verb implement can be combined with the noun act. However,
implement means more than that. The misused verb (implement) and the target verb
(perform) both belong to the semantic field of fulfil verbs and verbs in this set have
a small number of semantic features in common, but they are distinguished by
specific meanings. Both verbs have the semantic component “carry out something”,
but implement is distinguished from perform in that it implies: “to ensure what has
been planned is done” (e.g. implement a plan). For the verb perform, it simply
suggests “doing a (usually) complicated task or action”. So it is inferred that when a
learner has an incomplete command of implement, i.e. only the semantic compo-
nent as “carrying out something”, but not its distinctive feature of “ensuring
something that has been planned is completed”, collocation errors like imple-
ment + act are likely to be made. Therefore, seen from the erroneous VN collo-
cations produced by Chinese L2 learners, only a fraction of verb semantics was
acquired by learners, but not its distinguishing features from a set of semantically
related verbs. In this sense, acquisition of verb semantics is important for successful
learning of collocations.
6.3 An Alternative Explanation: New Nouns

The research hypothesis tested above was that verb increase is the main factor
responsible for the stagnant L2 collocation development. Accordingly, it is pre-
dicted that other factors, e.g. the acquisition of new nouns is not the main inhibiting
factor in collocation performance. Our prediction is thus that in the majority of new
nouns produced by higher levels of learners, learners produce correct VN collo-
cations. The prediction that noun increase is not the inhibiting factor would be
further confirmed if the percentage of new nouns in erroneous collocations
remained constant within the levels of ST5 and ST6.2
In order to examine whether collocation lag is a result of new noun acquisition,
specifically, whether collocation errors are made because learners learn a large
proportion of new nouns, the empirical requirement was to identify newly acquired
nouns by L2 learners. It was not feasible to see if a noun was new or old through
asking the learner him/herself at the time of their writing. The identification of new
nouns at a higher level was therefore implemented by examining nouns produced
by lower levels of learners. New nouns were those which were not used by lower
levels (i.e. with no occurrences in the file at lower levels) whilst old nouns referred
to those that were both used by the two groups of learners (i.e. with occurrences in
2
Given the constraint of locating new nouns in the lowest ST2 level, the proportion of collocation
errors where new nouns occur cannot be obtained.
both the files of the two groups). Taking the ST2 and ST6 learners for the purposes
of illustration, the nouns only occurring in the ST6 file were assumed to be newly
acquired nouns by ST6 learners and old ones were those that occurred in both ST6
and ST2 sub-corpora.3
The search for new nouns was performed automatically. With the ST2 and ST6
as examples, procedures involving the identification of new nouns in the colloca-
tions produced by ST6 learners were as follows:
• Store all the nouns in the VN collocations produced by ST6 learners in a text
file;
• Generate a list of all the nouns in the ST2;
• Use Wordsmith (the Match function) to delete all the matched nouns between
the wordlist of nouns in ST6 collocations and the nouns in ST2, and get a list of
new nouns in the ST6.
Once the above procedure had been carried out, analyses of new nouns were first
performed at the ST6 level (new nouns as compared with the ST2 level), then new
nouns (new nouns as compared with the ST5 level) in the ST6 were analysed and
finally new nouns in the ST5 level (new as compared with the ST2 level) were
analysed.
6.3.1 New Nouns in the ST6 VN Collocations (New

as Compared with the ST2 Level)
Altogether there were 264 nouns in the ST6 VN collocations, out of which 72 were
new nouns that did not appear in the ST2 file, and 192 old nouns that were used
both by ST2 and ST6 learners. A further categorisation was carried out among the
new nouns that occurred in the erroneous verb + noun collocations and correct
collocations. The results are presented in Table 6.8.
As is shown in the above table, 25% of the overall number of new nouns in
ST6 VN collocations were in erroneous collocations, which means that three-
quarters of these newly acquired nouns were in correct collocations. It then becomes
interesting to see whether collocation errors occur because of a lack of new verbs for
newly acquired nouns or whether it is just a matter of learners’ inability to associate
the new nouns with already acquired verbs. If the first case, the inhibiting role of the
new nouns will be manifested, as learners encounter two difficulties: newly acquired
nouns and a lack of appropriate verbs. If the collocating verbs for the new nouns are
3
This can only be assumptions, since it might be as well that some new nouns in ST6 were actually
acquired by the ST2 but not used (e.g. button, decision) or among old nouns in the ST6 and ST2,
some were idiosyncratic uses by one learner (e.g. criminal, principle) and not acquired by general
ST2 learners. These cases did exist but were rare, so assumptions were on the whole justified. In
addition, since groups of learners were targeted, the individual differences between learners could
not be spotted.
6.3 An Alternative Explanation: New Nouns and Collocation Uses 111
Table 6.8 New nouns and New nouns Old nouns Total
old nouns in ST6 VN
collocations (new nouns as Erroneous coll. 18 (25%) 42 60
compared with ST2) Correct coll. 54 (75%) 150 204
Total 72 (100%) 192 264
already acquired but not correctly used with the new nouns, we can infer learners’
split learning of collocations into individual words, rather than association of a
newly acquired noun with an old verb. The 18 new nouns and their collocating verbs
were analysed in the erroneous verb + noun collocations produced by ST6 learners,
shown in Table 6.9.4 Table 6.9 also provides information concerning whether or not
the erroneous and the target verb collocates of the new nouns are present in the ST2
file. If the target verb (e.g. impose) was not present in the ST2 file (signalled by a
minus symbol “−”), this verb was considered as a new verb for the ST6 learner.
Otherwise it was an old verb (e.g. play).
To see whether ST6 learners wrongly used a new verb instead of an old verb to
collocate with a new noun, or they used an old verb instead of another old/new verb
for the new noun, the 27 collocation pairs of the 18 new nouns in ST6 erroneous
collocations displayed in the above table were further classified into three cate-
gories, according to whether the erroneous and target verbs were already learnt by
ST2 learners.
6.3.1.1 New Nouns in the Erroneous VN Collocations Where

the Target Verb Was Absent in Lower Levels of Learners
(11 VN Collocations)
Instances of this category are *give + burden, *lay + burden, *release + burden,
*surpass + advantage, *break + regulation, *build + regulation, *take + survey,
*do + threat, *impose + threat, *conduct + murder, and *cause + imagination.
That the target verbs for these erroneous collocations (e.g. impose, relieve, outweigh,
etc.) were not used by ST2 learners suggests that they were new to ST6 learners or
had not been fully acquired yet. With *give + burden as an example, the target verb
impose was not used by ST2 learners, suggesting impose may be a new verb to the
ST6 learner who had acquired a new noun—burden. Thus, learners were found to
make collocation errors as such by using a verb they had already acquired (in the
case of burden, they used give and lay). Among the 11 collocations where collo-
cation errors may take place as a result of a lack of new verbs, 8 of the erroneous
verbs in the VN collocations are old verbs with an appearance in the ST2 file,
meaning they may have been already acquired by ST6 learners. This is natural given
that L2 learners have not acquired the target new verbs and have to use old verbs
instead. Learners acquire a new noun, but they may not acquire the collocating verbs,
4
Among the 18 nouns, there is a noun phrase—military service, which was regarded as one noun
for the convenience of analysis.
Table 6.9 18 New nouns in ST6 erroneous VN collocations and their verb collocates (new nouns
as compared with ST2)
Nouns Erroneous Target verbs Erroneous verbs Target verbs
verbs in ST2 in ST2
Burden Give Impose + −
Burden Lay Impose + −
Burden Release Relieve − −
Role Lead Play + +
Role Lay Assign + +
Role Act Play + +
Role Serve Play + +
Consciousness Stir Raise − +
Disadvantage Surpass Outweigh + −
Regulation Break Violate + −
Regulation Build Enact + −
Survey Take Conduct + −
Threat Do Pose + −
Threat Impose Pose − −
Murder Conduct Commit − −
Prejudice Reflect Hold − +
Prejudice Cast Hold − +
Treaty Draw Sign + +
Chat Make Have + +
Competence Exert Demonstrate − +
Imagination Cause Excite + −
Load Pull Carry + +
Mercy Cast Have − +
Recognition Reach Receive + +
Military Attend Perform + +
service
Military Take Perform + +
service
Measure Make Take + +
Note ‘+’ means that the verb appears in the ST2 sub-corpus, and ‘−’ represents an absence of the
verb in the ST2 sub-corpus
and collocation errors may thus occur. In the other three instances where new verbs
had not been acquired, newly acquired verbs were incorrectly used with the new
nouns, i.e. *release + burden, *impose + threat, and *conduct + murder.5
5
The target verbs for the three new nouns (relieve, pose and commit) and new erroneous verbs
(release, impose, conduct) all share partial phonological resemblance, which could be seen as
phonological interferences and the target verbs may have been known by learners but not fully
acquired yet.
In all, the 11 types of erroneous collocations in ST6 learner group can be viewed
as a lack of new collocating verbs with newly acquired nouns.
6.3.1.2 New Nouns in the Erroneous VN Collocations Where

the Target Verb Was Present in Lower Levels of Learners
but the Erroneous Verbs Was Absent (5 VN Collocations)
As is shown in Table 6.9, this category includes *stir + consciousness instead

of raise + consciousness, *reflect/cast + prejudice instead of hold + prejudice,
*exert + competence instead of demonstrate + competence and *cast + mercy
instead of have + mercy. That the target verbs were used in lower levels of learners
suggests that the verbs might be already known to ST6 learners, and yet they used
newly acquired verbs to collocate with the newly acquired nouns.
6.3.1.3 New Nouns in the Erroneous VN Collocations Where Both

the Erroneous and Target Verbs Were Present in Lower
Levels of Learners (11 VN Collocations)
This type arises when learners acquire a new noun, and misuse a known verb with
another already acquired verb. Erroneous collocations involving misuses of old
verbs include *lead + role (correct verb: play), *lay + role (assign), *act + role
(play), *serve + role (play), *draw + treaty (sign), *make + chat (have), *pull +
load (carry), *reach + recognition (receive), *attend + military service (perform),
*take + military service (perform), *make + measure (take). Errors of this type can
be inferred as a split learning of VN collocations, i.e. new nouns were learnt in an
isolated way instead of being learnt in collocational relationships with already learnt
verbs.
In all, the percentage of new nouns that are linked to collocation errors in the
ST6 is 25%, which means that among 100 nouns that are newly acquired by L2
learners, learners correctly find a collocating verb in 75 of the cases. Even in errors
involving the wrong choices of verbs for the newly acquired nouns, less than a half
of the errors involving new nouns (41%: 11/27) arise when learners do not know
the new target verb collocates (as is illustrated in the Category a.). However, in
more than half of the cases (59%: 16/27), errors arise when the target verb may
have already been acquired by ST6 learners for the newly acquired nouns (as is
illustrated in Categories b and c). These figures can be interpreted to the effect that
collocation errors with new nouns occur even though learners in most cases do not
lack the verbs for newly acquired nouns. For example, either new verbs are misused
instead of an old verb (e.g. *stir + consciousness instead of raise, *reflect +
prejudice instead of hold) or another old verb is misused instead of another old
target verb (e.g. *lead + role instead of play, *make + measure instead of take). On
the one hand, that learners use newly acquired verbs to combine with newly
acquired nouns can be viewed as boldness in collocation learning, i.e. they are
experimenting with verbs that have been newly learnt. On the other hand, the fact
that L2 learners fail to associate nouns with known verbs in collocations suggests
the learning of new nouns in isolation, instead of being learnt as prefabricated
chunks with the already acquired verbs. This finding supports Wray’s (2002) claim
of a split learning of collocations into individual items, or the inability to pay
attention to collocational relationships between words on the part of L2 learners.
Therefore, the role played by new nouns in the collocation lag in ST6 learners
can be viewed as no more than a minor one, influencing a limited percentage of new
nouns in erroneous collocations (25%). In most cases where collocation errors with
new nouns occur, it is because of an inability to associate the new nouns with
already acquired verbs. On this basis, collocation lag is not a result of newly
acquired nouns.
6.3.2 New Nouns in the ST6 VN Collocations (New

as Compared with the ST5 Level) and New Nouns
in the ST5 VN Collocations (New as Compared
with the ST2 Level)
It should be recalled that ST2 learners are senior middle school students, repre-
senting the lowest level in the CLEC corpus, and ST6 learners are third and
fourth-year university English majors, representing the highest proficiency level
within the corpus. Between these two levels of proficiency, there is the ST5 level of
first and second-year English majors. Therefore, in order to ensure the continuity of
between-group comparisons, VN collocation performance of ST5 learners was
taken into consideration as well. Following the same procedure as in the analysis of
ST6 as compared with the ST2 level, learners’ collocation performance in terms of
new nouns in the ST6 (new as compared with the ST5 level) and ST5 (new as
compared with the ST2 level) was analysed. Frequencies are presented in
Tables 6.10 and 6.11.
Table 6.10 New and old New nouns Old nouns Total
nouns in ST6 VN collocations
(new nouns as compared with Erroneous coll. 5 (14%) 55 60
ST5) Correct coll. 31 (86%) 173 204
Total 36 (100%) 228 264
Table 6.11 New and old New nouns Old nouns Total
nouns in ST5 VN collocations
(new nouns as compared with Erroneous coll. 6 (11%) 28 34
ST2) Correct coll. 48 (89%) 165 213
Total 54 (100%) 193 247
As a generalisation from Tables 6.8, 6.10 and 6.11, it becomes clear that the
percentages of new nouns in erroneous collocations are low and remain roughly
constant at the two higher levels. The percentage of new nouns in erroneous col-
locations in ST5 is 11% and the ratio is 14% for ST6 learners. So again the
prediction that the acquisition of new nouns is not an inhibiting factor in collocation
performance is upheld. It is interesting that if the new nouns at the ST6 level are
identified as new with reference to the nouns produced by the ST2 learners, the
percentage is 25%, exactly the sum of 11 and 14%. It follows that when the nouns
in the highest level (ST6) are compared with those produced by the lowest level
(ST2), there are more new nouns obtained than are compared with a lower level
(ST5). The three percentages (11, 14 and 25%) suggest two significant points: the
proficiency of the three levels of learners is continuously developing (as manifested
through a gradual increase in newly acquired nouns); the percentages of new nouns
in erroneous collocations appear to describe a coherent trend, supporting the
validity of the analysis method in this study.
Turning now to the detailed analyses of new nouns in erroneous VN colloca-
tions, new nouns in erroneous collocations in the ST6 (new as compared with the
ST5 level) and those in the ST5 (new as compared with the ST2 level) were
analysed (see Tables 6.12 and 6.13).
From the above tables, it can be inferred that in nearly all the cases of new nouns
in erroneous collocations, ST5 and ST6 learners may know the verb but fail in
using them with the new nouns (except *make + offence, *do + regulation). Even
when they do not lack the collocating verb for a new noun, errors still arise, which
again reveal a split learning of collocations.
In conclusion, the research hypothesis that it is the increase in verbs that is
mainly responsible for the stagnant collocation performance was upheld. In this
section we attempted an alternative explanation to see if the learning of newly
acquired nouns is also a factor responsible for the collocation lag. It was found that
the occurrence of new nouns is not the main factor responsible for stagnant
Table 6.12 New nouns in ST6 VN erroneous collocations and their verb collocates (new nouns
Nouns Erroneous Target verbs Erroneous verbs in Target verbs in
verbs ST5 ST5
Treaty Draw Sign + +
Competence Exert Demonstrate − +
Load Pull Carry + +
Recognition Reach Receive + +
Military Attend Perform + +
service
Military Take Perform + +
service
Note ‘+’ means that the verb appears in the ST5, and ‘−’ represents an absence of the verb in the
ST5
Table 6.13 New nouns in ST5 VN erroneous collocations and their verb collocates (new nouns
Nouns Erroneous Target Erroneous verbs in Target verbs in
verbs verbs ST2 ST2
Role Do Play + +
Role Act Play + +
Role Occupy Play + +
Role Lay Play + +
Drum Hit Beat + +
Eyebrow Frown Raise − +
Offence Make Commit + −
Regulation Do Enact + −
Utmost Make Do + +
Note ‘+’ means that the verb appears in the ST2, and ‘−’ represents an absence of the verb in the
ST2
development of collocation performance of L2 learners. The percentages of new

nouns in erroneous collocations are rather low (11 and 13%) and these figures
remain roughly constant at both higher levels. In addition, even though new nouns
are used in erroneous collocations, it was found that it is mainly not due to a lack of
new collocating verbs, since the target verbs may have been acquired. Even though
the target verbs have already been acquired, L2 learners fail to associate them with
the new nouns in collocation, which suggests the learning of new nouns in isolation,
rather than in chunks.
References
Meara, P. (1978). Learners’ word associations in French. Interlanguage Studies Bulletin, 3(2),
192–211.
Benjamins.
Chapter 7
Chinese Learners’ Performance
on English Adjective + Noun
In the learning of verb + noun collocations by Chinese learners of English, verb

increase in a set of semantically related verbs has been shown above to pose great
difficulties for learners. There is a lag in collocational knowledge as observed in
learners’ performance on verb + noun collocations. In light of this finding, this
chapter will now explore two other important and frequent types of collocations:
adjective + noun and noun + noun collocations, in order to investigate the inhibiting
factor of vocabulary growth in collocation learning.
For the convenience of comparison, analyses of AN and NN collocations were
performed only on the ST2 and ST6 data, given that the ST5 data show a certain
inconsistency in quantity of collocation errors in VN collocations (cf. Sect. 5.1.3).
Learners’ performance on AN collocations was firstly presented in Sect. 7.1, fol-
lowed by analyses of their production of NN collocations (Sect. 7.2).
7.1 Analyses of Adjective + Noun Collocations
Examples of frequent adjective + noun collocations produced by ST2 and ST6

learners are (collocations with a frequency over 10): active part, best wishes, civil
war, rapid progress, associate professor, developed countries, developing coun-
tries, economic development, environmental protection, living condition, natural
resources, etc. Erroneous AN collocations in the two databases are (with the target
adjectives given in brackets):*beautiful supper (delicious), *deep language (rich),
*heavy sufferings (great), *large laughter (loud), *large voice (loud), *light river
(clean), *sharp match (fierce), *classical song (classic), *clear welcome (warm),
*feminine movement (feminist), etc.
The overall number of well formed and erroneous AN collocations is presented
in Tables 7.1 (for tokens) and 7.2 (for types).
It is noteworthy that there are not so many instances of erroneous AN collo-
cations in the corpus. Only 17 cases were identified in the collocations produced by
DOI 10.1007/978-981-10-5822-6_7
118 7 Chinese Learners’ Performance on English Adjective + Noun …
Table 7.1 AN collocations Well-formed coll. Erroneous coll. Total

produced by ST2 and ST6
learners (tokens) ST2 337 (95%) 17 (5%) 354 (100%)
ST6 986 (99%) 15 (1%) 1001 (100%)
Note v2 = 10.99, p = 0.0009 *** extremely Sig.
Table 7.2 AN collocations Well-formed coll. Erroneous coll. Total

produced by ST2 and ST6
learners (types) ST2 100 (88%) 14 (12%) 114 (100%)
ST6 217 (96%) 9 (4%) 226 (100%)
Note v2 = 7.01, p = 0.0081 **
very Sig.
the lower level learners and 15 instances by the higher level. Apart from this, the
two tables reveal two important aspects. Firstly, it is evident that the ratios of AN
collocation errors are not high for both groups of learners in terms of either tokens
or types. These ratios are 12% for ST2 learners and 4% for ST6 learners. In contrast
to the proportion of verb + noun collocation errors (for ST2, 22%, and for ST6,
21%, cf. Sect. 5.1.3), they are much lower. Secondly, learners at the higher level
produced significantly more well-formed collocations than the lower level, and
there was a significant error decrease. The decrease in AN collocation errors stands
in sharp contrast to the production of VN collocations, for which the proportion of
errors does not show a clear decrease.
Therefore, based on quantitative analyses, the data show that Chinese learners do
not seem to have great difficulties with adjective + noun collocations. In addition,
their knowledge of AN collocations improves with rising proficiency. These results
corroborate the findings from previous studies of L2 learners’ learning of AN
collocations (e.g. Gitsaki 1999; Siyanova and Schmitt 2008; Zhang and Chen 2006;
cf. Sect. 3.2.1.2). Adjective—noun collocations have been identified as “easy” and
“early acquired” type of collocations (Gitsaki 1999) and more proficient learners
had better command of AN collocations than lower levels (Zhang and Chen 2006).
7.2 Analyses of Noun + Noun Collocations
Examples of frequent noun + noun collocations produced by ST2 and ST6 learners
are art festival, basketball match, book shop, fire fighter, swimming pool, crime
rate, etc.1 Erroneous NN collocations include *artist festival (art festival),
1
In the creation of the ST6 NN collocation database, two collocations—mercy killing (which has
been used for 255 times) and prison system (107 times)—were deleted so as to avoid statistical
skewedness. The high occurrence of these two collocations is because they are topic-related: two
of the topics given to the English majors are “the legalisation of euthanasia in China” and “the
abolition of prisons”.
7.2 Analyses of Noun + Noun Collocations 119
*homehold duties (household duties), and *scientist book (science book), etc.
Altogether there are 82 and 46 instances of erroneous noun + noun combinations in
each database. A detailed look into the errors involving noun + noun combinations
shows that not all of these errors are collocation errors, i.e. errors associated with
the wrong choices in words of the same word class. Instead, a large proportion of
them is colligation errors, i.e. errors linked with wrong choices in grammatical
categories. Unlike collocations referring to the co-occurrence of word combina-
tions, colligation refers to the co-occurrence of grammatical choices. So a collo-
cation error is an error with wrong lexical selections (but the word class is correct),
e.g. *learn knowledge rather than acquire knowledge. A colligation error is an error
with word classes of the words in a word combination, e.g. *industry city rather
than the industrial city. Table 7.3 presents the NN colligation errors in the ST2 and
ST6 databases.
Table 7.3 Noun + noun colligation errors produced by ST2 and ST6 learners
ST2 ST6
Errors Target words Errors Target words
Flowers exhibition Flower Electricity lamps Electric
Foreigner teacher Foreign Environment consciousness Environmental
Happiness family Happy Families members Family
Industry city Industrial Feminism movement Feminist
Interest book Interesting Feudalism society Feudalist
Socialism country Socialist Globe economy Global
Socialism reformation Socialist Heat debate Heated
Sport ground Sports Heat topic Heated
Sport meeting Sports Importance step Important
Sports trousers Sport Industry revolution Industrial
Summer’s holiday Summer Limit recourses Limited
History’s test History Medicine fee Medical
Math’s test Math Nationality defence National
Freedom life Free Nature process Natural
People computer Personal Scenery spots Scenic
Socialism construction Socialist
Socialism countries Socialist
Society evolution Social
Society factor Social
Society problem Social
Society wealth Social
Examples sentences Example
Capitalism countries Capitalist
Science way Scientific
Economy development Economic
Economy growth Economic
Stars hotel Star
In most of the colligation errors, learners wrongly use a noun instead of its
adjectival form, e.g. foreigner teacher rather than foreign teacher, electricity lamps
rather than electric lamps, environment consciousness instead of environmental
consciousness, etc. In English, both a noun and adjective can function as the
modifiers of the following noun, and this grammatical feature seems to be baffling
learners in choosing which to collocate with the nouns that follow. Another factor
of the misuses of nouns for adjective modifiers may be a cross-linguistic one; a
noun modifier before another noun is very common in the Chinese syntax. What is
more interesting among these errors is that sometimes learners are conscious of the
typical grammatical feature of English and try to use adjectival modifiers such as
possessives before the nouns. As the examples in Table 7.3 show, summer’s hol-
iday, history’s test and math’s test are indications of an awareness of adjectival
modifiers. Yet the over generalisation leads to colligation errors.
In terms of the quantities of these errors, learners do not seem to get better with
NN colligations as their proficiency level rises. On the contrary, there is a wors-
ening performance in noun + noun colligations. Tables 7.4 and 7.5 present the
frequencies of colligation errors and non-colligation errors in the two proficiency
groups.
From the two tables, it can be seen that in percentage terms, 83% of the NN
combination errors ST6 learners made are colligation errors, whilst the percentage
of ST2 learners is 43%. Fisher’s test on the frequencies of tokens and types showed
a significant difference between learner types in terms of the production of colli-
gation errors. ST6 learners made significantly more errors with noun + noun col-
ligations than ST2 learners. This could be that the influence of the noun + noun
structure in the L1 Chinese is very persistent even at higher levels. We shall turn to
the L1 influence on the learning of collocations by L2 learners in Chap. 9.
Besides the NN colligation errors, there are also a few cases where the entire
expression does not make sense in English and should be a noun or a totally new
expression, i.e. *smile sound for laughter, *hill-medicine for yam, *football door
Table 7.4 Colligation and non-colligation NN errors in the ST2 and ST6 levels (tokens)
NN colligation errors NN non-colligation errors Total
ST2 34 (43%) 46 (57%) 80 (100%)
ST6 38 (83%) 8 (17%) 46 (100%)
Note p < 0.0001 **** extremely Sig.
Table 7.5 Colligation and non-colligation NN errors in the ST2 and ST6 levels (types)
NN colligation errors NN non-colligation errors Total
ST2 15 (41%) 22 (59%) 37 (100%)
ST6 27 (77%) 8 (23%) 35 (100%)
Note p = 0.0020 ** very Sig.
7.2 Analyses of Noun + Noun Collocations 121
for goal, *book table for desk, *mother school for alma mater, *warning clock for
alarm bell, *psychology doctor for psychiatrist and *song words for lyric. Such
unacceptable expressions are the direct word-by-word rendering of the Chinese
characters into English. This may be a strategy adopted by L2 learners as they turn
to the direct translation of the Chinese expression (shanyao) when they have not
acquired the English word (yam).
With the colligation errors and errors of the entire expression excluded, the
remaining 16 NN combinations in the ST2 database are erroneous noun + noun
collocations. Collocation errors produced by ST2 learners are: *artist festival (art
festival), *basketball ground (basketball court), *football court (ground), *football
movement (football games), *hand master (head master), *hand teacher (head
teacher), *heart illness (heart disease), *homehold duties (household duties), *saw
materials (raw materials), *scientist book (science book), *speech match (speech
contest), *pity girl (poor girl), *middle night (mid-night), *singers match (singing
match), *end game (final game), *beauty match (beauty contest). The 7 NN col-
location errors in the ST6 database are: *feminine movement (feminist movement),
*feminism women (feminist women) *graduation certification (graduation certifi-
cate), *life standard (living standard), *mountain slides (land slides), and *song
star (music star). The overall number of well-formed and erroneous NN colloca-
tions is presented in Tables 7.6 (for tokens) and 7.7 (for types).
As is revealed from the above tables, the proportions of erroneous noun + noun
collocations are very low: 7% for ST2 and 2% for ST6 learners. Statistical analyses
showed that higher level learners made very significantly fewer errors than the
lower group in the use of NN collocations. This indicates a better command of NN
collocations as learners’ proficiency rises. Comparing this finding with the pro-
duction of AN collocations, it shows a similar pattern, i.e. better performance on
NN collocations was observed with the rise of L2 proficiency.
Table 7.6 Noun + noun Learners Well-formed coll. Erroneous coll. Total
collocations in the ST2 and
ST6 levels (tokens) ST2 618 (95%) 31 (5%) 649 (100%)
ST6 792 (99.2%) 6 (0.8%) 798 (100%)
Note v2 = 21.68, p < 0.0001 ****
extremely Sig.
Table 7.7 Noun + noun Learners Well-formed coll. Erroneous coll. Total
collocations in the ST2 and
ST6 levels (types) ST2 202 (93%) 16 (7%) 218 (100%)
ST6 307 (98%) 6 (2%) 313 (100%)
Note v2 = 8.20, p = 0.00425 **
very Sig.
7.3 Synopsis of the Analyses of Adjective + Noun

This chapter expands the analyses of Chinese learners’ collocation performance to

two other frequent types of collocations: adjective + noun and noun + noun col-
locations. A good performance was observed in the production of both AN and NN
collocations by Chinese EFL learners at two proficiency levels. There was a very
low proportion of AN collocation errors (ST2: 12%; ST6: 4%) and NN collocation
errors (ST2: 7%; ST6: 2%), although the ratios of AN collocations errors were more
than those of NN collocation errors. Meanwhile, learners’ knowledge of AN and
NN collocations improves with rising proficiency, as there were significantly more
well-formed AN and NN collocations at the higher level. The picture emerging
from the adjective + noun and noun + noun collocations is different from learners’
performance on verb + noun collocations. The next chapter will compare in detail
Chinese learners’ production of the three types of collocations and make possible
interpretations accounting for such differences.
References
Gitsaki, C. (1999). Second language lexical acquisition: a study of the development of

Zhang, W. Z. & Chen, S. C. (2006). EFL learners’ acquisition of English Adjective-Noun
collocations—A quantitative study. Foreign Language Teaching and Research, 38(4),
251–258.
Chapter 8
Comparison and Interpretation
of Learners’ Performance on the Three
Types of Collocations
This chapter presents and compares the results obtained from the analyses of
Chinese learners’ verb + noun, adjective + noun and noun + noun collocations.
Section 8.1 performs a comparison of erroneous collocations among the three types
of collocations; Sect. 8.2 analyses and compares the overall growth of verbs,
adjectives and nouns and relates them to the production of collocations; Sect. 8.3
analyses in detail the synset density of the three-word classes and offers interpre-
tations of the differing performances on the VN, AN and NN collocations.
8.1 Collocation Errors in the Three Types of Collocations
Table 8.1 presents the overall percentages of collocation errors among the pro-
duction of verb + noun, adjective + noun and noun + noun collocations by
Chinese EFL learners at the basic and advanced levels.
A comparison of the ratios of erroneous collocations among the three types of
collocations at each proficiency level (e.g. in the ST2 level, 22% for VN, 12% for
AN and 7% for NN) shows that Chinese L2 learners, irrespective of their profi-
ciencies, performed best in NN collocations, followed by AN collocations, and
performed worst in VN collocations. A cross-group comparison of the ratios
demonstrates a varied developmental pattern, i.e. no clear decrease in collocation
errors with the rise of proficiency in the production of VN collocations, but a
decrease in errors in the production of AN and NN collocations. Combining this
result with the findings from statistical tests, no significant relationship was found
between learner levels and erroneous VN collocations (cf. Sect. 5.1.3), suggesting
that there is no significant decrease in VN collocation errors. However, there were
very significantly fewer AN and NN collocation errors at the ST6 level than the ST2
level, suggesting an improvement in the acquisition of these two collocation types.
DOI 10.1007/978-981-10-5822-6_8
124 8 Comparison and Interpretation of Learners’ Performance on …
Table 8.1 Error ratios of VN, AN, and NN collocations produced by ST2 and ST6 learners
Learners Verb + noun coll. (%) Adjective + noun coll. (%) Noun + noun coll. (%)
ST2 22 12 7
ST6 21 4 2
Therefore, a varied collocation performance by L2 learners was observed.

Simply put, there is a stagnant development of VN collocation knowledge but
improved AN and NN collocation performance. VN collocations have long been
acknowledged as difficult for L2 learners. As for the acquisition of AN and NN
collocations, Philip (2007: 1) claims that they are easier to learn because
many noun + noun and adjective + noun collocates reflect linguistically what can be
observed in the real world, and so their lexical co-occurrence seems natural and relatively
unproblematic. However, as language moves away from the concrete and observable
towards the abstract and, apparently, arbitrary, the likelihood of a collocate to be deemed
‘logical’ by a learner diminishes, as does its accurate recycling in free production.
This claim holds water in that many noun + noun and adjective + noun colloca-
tions in our databases are concrete lexical co-occurrences that can be observed, i.e.
air conditioner, bus stop, football ground, blue sky, full moon, heavy rain, etc.
However, the observable property of certain collocations does not guarantee an easy
acquisition because of the arbitrariness in lexical co-occurrences. For example,
heavy rain is concrete and observable, but it does not rule out the possible com-
binations of *dense rain, *strong rain, or *powerful rain. Learners were found to
make errors with concrete NN collocations as well, such as *basketball ground,
*mountain slides, *light river, etc. At the same time, many abstract AN and NN
collocations were correctly produced, i.e. environment protection, labour force,
sales volume, cheap trick, deep sorrow, etc. So the observable feature of adjec-
tive + noun and noun + noun collocations as proposed by Philip (2007) cannot
account for learners’ relative ease with these two types of collocations.
Another possible explanation of the relative difficulty in acquiring VN collo-
cations and relative ease in acquiring AN and NN collocations is based on the
observation that words of different parts of speech differ in their tendency to cluster,
i.e. singular nouns and base forms of verbs are highly collocational while adjectives
and adverbs are not (Kjellmer 1990). This could be interpreted to mean that verbs
and nouns are more collocational and therefore bring greater learning burden for
foreign language learners. In other words, the collocational density of verb + noun
collocations poses more problems for learners than AN and NN collocations. This
could account for the overall learning burden of VN collocations, but to account for
the different performance on the three types of collocations, we argue for the
vocabulary growth factor and predict that the synonym densities of adjectives and
nouns are lower than those of verbs, thus resulting in better performance in the AN
and NN collocations and worse performance in VN collocations. We shall discuss
this in Sects. 8.2 and 8.3.
8.2 Vocabulary Growth and Collocation Errors 125
8.2 Vocabulary Growth and Collocation Errors
Through adopting a similar approach in examining the relationship between the

growth of lexical verbs and the growth of verb + noun collocation errors (cf.
Sect. 5.3), we calculated the growth rates of all the adjectives and nouns used by the
three levels of learners (including adjectives and nouns in sentences like “a healthy
diet is very important for people”), aiming to build a panorama of vocabulary
growth and erroneous collocations.
As is shown from Table 8.2, the numbers of adjectives and nouns produced by
ST5 learners occupy a middle position, confirming again a continuous rise in
proficiency from the ST2 to the ST6 levels. At the same time, there is a dramatic
increase in adjective uses from the ST2 to ST6 level (95%), but a clear decrease in
AN collocation errors. A less dramatic increase in nouns and a more marked
decrease in NN collocation errors are also displayed in the above table. Therefore,
in comparison with the VN data, different trends emerge with regard to performance
on verb + noun, adjective + noun and noun + noun collocations, i.e. there is an
increase in VN collocation errors and a decrease in AN and NN collocation errors
(see Fig. 8.1 for graphic depiction).
If the error increase is viewed in association with the increase of vocabulary,
Fig. 8.1 shows that VN collocation errors increase as learners acquire more and
more verbs and nouns, but AN and NN collocation errors decrease even though
learners acquire more adjectives and nouns. Thus, we see that vocabulary growth
plays a different role depending on word class-specific collocations. More specif-
ically, the increase in learners’ overall vocabulary in verbs has a negative corre-
lation with VN collocation learning, whereas the expansion of adjective and noun
vocabulary contributes to the learning of AN and NN collocation. The detailed
analysis of verb increase within certain synonym sets confirmed a lag in VN col-
locational knowledge, as collocation errors involving new verbs were significantly
more likely than errors with old verbs. The proportion of errors in synsets where
new verbs had been acquired increased markedly. So verb increase is an inhibiting
Table 8.2 Growth rates of adjectives, nouns and collocation errors

Lemmatised Lemmatised Tokens of Types of Tokens of Types of
adjectives nouns AN coll. AN coll. NN coll. NN coll.
errors errors errors errors
ST2 1175 3345 17 14 31 16
ST5 1850 3857 – – – –
ST6 2287 4199 15 9 6 6
Growth 95% 26% −12% −36% −81% −63%
rate (ST2
to ST6)
Note The numbers of lemmatised adjectives and nouns are type frequencies
NN coll. error
Nouns
AN coll. error
Adjectives
VN coll. error
Verbs
-100% -50% 0% 50% 100% 150%

Growth rates (type frequencies)
Fig. 8.1 Overall growth rates of the verbs, adjectives and nouns and collocation errors
factor in VN collocation acquisition. However, with the increase in adjectives and

nouns, AN/NN collocation errors in contrast decreased. Therefore, following the
procedures adopted in a detailed analysis of verb synsets, grouping adjectives and
nouns into synsets and examining collocation uses within these sets was required.
Analyses of adjective and noun synsets will be performed in the following section
in order to investigate in detail the vocabulary factor.
8.3 Synsets and Collocation Production
The classification of the adjectives and nouns in the databases into synonym sets
was performed with reference to the WordNet and the ODSA. These classifications
were much more difficult than the classification of verbs into synsets since adjec-
tives and nouns were more diversified than verbs in the collocation databases.
Starting from the highest level where more adjectives are produced in AN collo-
cations, adjectives listed in either of the reference sources as synonyms were
recorded as one synset. Accordingly, only seven synsets with a rather limited
number of adjectives were categorised by using the same reference works as with
verbs. They are adjectives describing broadness (broad, full, wide), adjectives
denoting keenness (keen, sharp), “deadly” adjectives (deadly, fatal, lethal), “clean”
adjectives (clean, clear, light), “distant” adjectives (distant, remote), “dense”
adjectives (dense, heavy) and “daily” adjectives (daily, everyday). Adjectives in the
ST2 collocation databases are more diversified and only “daily” adjectives (daily,
everyday) were identified.
The difficulty in grouping adjectives in synsets may be attributed to the types
of adjectives that were frequently used by learners, e.g. classifying adjectives
8.3 Synsets and Collocation Production 127
Table 8.3 Numbers of POS Unique Synsets Total word-sense

words, synsets and senses in Strings pairs
WordNet
Noun 117,798 82,115 146,312
Verb 11,529 13,767 25,047
Adjective 21,479 18,156 30,002
Adverb 4481 3621 5580
Totals 155,287 117,659 206,941
This table was quoted from http://wordnet.princeton.edu/wordnet/
man/wnstats.7WN.html#toc2 [Accessed10 May 2014]
(e.g. academic, annual).1 A large number of adjectives used by ST2 and ST6
learners in AN collocations are of a classifying nature (e.g. among the 142 types of
adjectives in the ST6 AN collocation database, 70 are classifying ones, cf.
Appendix G). Classifying adjectives, as their name suggests, function to group
nouns into different categories, so these adjectives themselves are too broad in
scope to be categorised in synsets. For example, synonyms of capitalist, chemical,
domestic, and solar have only a few synonyms as referenced in WordNet.
As it is difficult to classify adjectives into different synsets, the nouns produced
in the NN collocations, such as air, alarm, art are also difficult to group. So, in
general, the synonym density of adjectives and nouns in the databases is lower than
that of verbs, which maybe the reason why AN and NN collocations were more
accurate than VN collocations. Studies of the synsets of verbs, adjectives and nouns
have confirmed such a decrease in synonym density. According to the statistics
published online for WordNet 3.0 database, the ratios of synsets as compared to the
total number of verbs, adjectives and nouns, respectively, are 1.19 for verbs, 0.85
for adjectives and 0.70 for nouns (see Table 8.3 for the raw statistics cited from
WordNet).
Therefore, based on the statistics from the WordNet, in general, there are more
synsets for verbs than adjectives and more synsets for adjectives than nouns. That
decreasing synonym density of the three-word classes was also verified through
computational analysis of WordNet (cf. Kamps et al. 2004; Tufis and Stefanescu
2011). In the graphs drawn by Kamps et al. (2004: 1116) through collecting all
words in the WordNet, and relating words that can be synonymous, they observed a
giant component: in the verb-subgraph there is a component of size 6365 (or 57%
of all verbs); in the adjective-subgraph there is a component of size 5427 (or 25% of
all adjectives) and in the noun-subgraph there is a connected component of size
10,922 (or 10% of all nouns). These figures show that more verbs than adjectives
and more adjectives than nouns are related in synsets.
1
Adjectives are classified into 4 categories (Sinclair and Fox 1990: 63f): qualitative adjectives
(which identify qualities someone or something has, e.g. happy, and intelligent), classifying
adjectives (which identify someone or something as a member of class, e.g. financial and intel-
lectual), colour adjectives (identifying the colour of something, e.g. blue and green) and
emphasising adjectives (which are used to emphasise feelings, e.g. complete and absolute).
However, synset analysis of words in WordNet is very general. A more directly

focused study relevant to learner data in our research was conducted, through
narrowing down the investigation of the density of the three-word classes into our
databases. A straightforward and simple way to measure synonym density is to
compute the average number of synonyms for each verb, adjective and noun.
Considering a large number of words in learner data, a random sampling of words
in the ST2 databases was adopted. The sampling procedure is as follows: altogether
80 lexical verbs, 60 adjectives and 278 nouns in the VN, AN and NN collocations
were identified and then listed alphabetically. Every four verbs, every three
adjectives and every fourteen nouns were chosen in the three wordlists to form the
sample in this case study. In all, 60 words were selected, i.e. a random selection of
20 verbs in VN collocations, a random selection of 20 adjectives in AN collocations
and a random selection of 20 nouns in NN collocations. These words are:
• 20 verbs in VN collocations: answer, break, catch, comb, create, discharge,
earn, follow, grasp, kick, lead, obey, pass, play, remember, see, show, sow,
teach, wear
• 20 adjectives in AN collocations: blue, botanical, capitalist, classical, common,
crisp, deep, double, fair, firm, founding, glib, happy, historic, living, low, nat-
ural, political, public, strong
• 20 nouns in NN collocations: ball, break, center, colour, diamond, fashion,
gambling, head, lab, light, name, party, police, program, restaurant, sentence,
steel, telephone, trip, world.
Synonyms of those 60 words were located in WordNet. However, not all the
synonyms for each word are counted. WordNet provides the synonyms for each
word according to their different senses since different senses of the word lead to
different synonyms. As Table 8.3 shows, the average polysemy figure of verbs is
the highest—2.17 (25,047/11,529), much more than the average polysemy of
adjectives: 1.40 (30,002/21,479) and of nouns: 1.24 (14,6312/117,798).2 So the
inclusion of all synonyms irrespective of word senses will naturally add more
synonyms for verbs than adjectives and nouns. For example, 59 senses are listed for
the verb break, but for the adjective botanical, there is only 1 sense. Therefore, only
the synonyms of the particular sense of the word in question are taken into con-
sideration. The sense of a word is determined by its following collocates. Take the
verb answer for example. WordNet software (version 2.1) yielded the following 10
senses for this verb:3
2
See also the statistics on the WordNet website: http://wordnet.princeton.edu/wordnet/man/
wnstats.7WN.html#toc2 (Accessed 8 April 2014).
3
For the convenience of checking the synonyms of the verb in question, Wordnet (2.1) was
referenced instead of the web interface, as in the software different senses of the verb are num-
bered, and synonyms are neatly listed.
Sense 1
The answer, reply, respond—(reply or respond to; “She didn’t want to answer.”; “answer
the question”; “We answered that we would accept the invitation.”)
=> state, say, tell—express in words; “He said that he wanted to marry her.”; “Tell me what
is bothering you.”; “state your opinion”; “state your name”)
Sense 2
answer—(give the correct answer or solution to; “answer a question”; “answer the riddle”)
=> solve, work out, figure out, puzzle out, lick, work—(find the solution to (a problem or
question) or understand the meaning of; “Did you solve the problem?”; “Work out your
problems with the boss.”; “This unpleasant situation isn’t going to work itself out.”; “Did
you get it?”; “Did you get my meaning?”; “He could not work the math problem.”)
Sense 3
answer—(respond to a signal; “answer the door”; “answer the telephone”)
=> react, respond—(show a response or a reaction to something)
Sense 4
answer, resolve—(understand the meaning of; “The question concerning the meaning of
life cannot be answered.”)
=> solve, work out, figure out, puzzle out, lick, work—(find the solution to (a problem or
question) or understand the meaning of; “did you solve the problem?”; “Work out your
problems with the boss.”; “this unpleasant situation isn’t going to work itself out.”; “did
you get it?”; “Did you get my meaning?”; “He could not work the math problem.”)
Sense 5
answer—(give a defence or refutation of (a charge) or in (an argument); “The defendant
answered to all the charges of the prosecution.”)
=> refute, rebut—(overthrow by argument, evidence, or proof; “The speaker refuted his
opponent’s arguments.”)
Sense 6
answer—(be liable or accountable; “She must answer for her actions.”)
=> be—(have the quality of being; (copula, used with an adjective or a predicate noun);
“John is rich.”; “This is not a good answer.”)
Sense 7
suffice, do, answer, serve—(be sufficient; be adequate, either in quality or quantity; “A few
words would answer.”; “This car suits my purpose well.”; “Will $100 do?”; “A ‘B’ grade
doesn’t suffice to get me into medical school.”; “Nothing else will serve.”)
=> satisfy, fulfil, fulfil, live up to—(fulfil the requirements or expectations of)
Sense 8
answer—(match or correspond; “The drawing of the suspect answers to the description the
victim gave”)
=> match, fit, correspond, check, jibe, gibe, tally, agree—(be compatible, similar or con-
sistent; coincide in their characteristics; “The two stories don’t agree in many details.”;
“The handwriting checks with the signature on the check.”; “The suspect’s fingerprints
don’t match those on the gun.”)
Sense 9
answer—(be satisfactory for; meet the requirements of or serve the purpose of; “This may
answer her needs.”)
=> meet, satisfy, fill, fulfil, fulfil—(fill or meet a want or need)
Sense 10
answer—(react to a stimulus or command; “The steering of my new car answers to the
slightest touch.”)
=> react, respond—(show a response or a reaction to something)
In the particular collocation answer + question produced by ST2 learners, the

senses of answer are “reply/respond to, or give the correct answer or solution to, or
resolve”, corresponding to Senses 1, 2 and 4. Therefore, the number of synonyms
for an answer is 9, which includes a reply to, respond to, solve, work out, figure out,
puzzle out, lick, work and resolve. Troponyms (e.g. state, say, tell in Sense 1) which
denotes the manner of answering a question is not counted as synsets for an answer,
whilst hypernyms or superordinates (e.g. solve, work out, figure out, puzzle out,
lick, work in Sense 2) are included.
The same procedure was performed on the verb break. In the collocations
break + rule and break + record, Senses 6 and 14 correspond to the senses of a
break in the two-word combinations (see the following senses extracted from
WordNet software [version 2.1]) and accordingly 16 synonyms were located.
Sense 6
transgress, offend, infract, violate, go against, breach, break—(act in disregard of laws,
rules, contracts, or promises; “offend all laws of humanity”; “violate the basic laws or
human civilization”; “break a law”; “break a promise”)
=> disrespect—(show a lack of respect for)
Sense 14
better, break—(surpass in excellence; “She bettered her own record”; “break a record”)
=> surpass, outstrip, outmatch, outgo, exceed, outdo, surmount, outperform—(be or do
something to a greater degree; “Her performance surpasses that of any other student I
know.”; “She outdoes all other athletes.”; “This exceeds all my expectations.”; “This car
outperforms all others in its class.”)
Altogether the 60 words and their number of synonyms are presented in the fol-
lowing Table 8.4.
As the above table reveals, the synonyms for verbs (113) far outnumber those for
adjectives (57), and the synonyms for adjectives outnumber those for nouns (13).
The result confirms the findings from computational studies of synsets in WordNet,
viz. the synonym density for verbs, adjectives and nouns is on a decreasing scale. In
the light of the fact that verbs have more synsets than adjectives and adjectives have
more synsets than nouns, we get a better understanding of why L2 learners perform
worse on verb + noun collocations and better on the adjective + noun and
noun + noun collocations. Since there are more synonyms for verbs, the more verbs
in a synset learners acquire, the more likely they are to make collocation errors. In
Table 8.4 Selected words in the learner databases and the number of synonyms
Verbs No. of syns. Adjectives No. of syns. Nouns No. of syns.
Answer 9 Blue 2 Ball 0
Break 16 Botanical 1 Break 4
Catch 5 Capitalist 1 Center 0
Comb 6 Classical 1 Colour 0
Create 2 Common 4 Diamond 0
Discharge 8 Crisp 6 Fashion 3
Earn 9 Deep 6 Gambling 0
Follow 3 Double 2 Head 2
Grasp 7 Fair 7 Lab 0
Kick 1 Firm 2 Light 0
Lead 2 Founding 0 Name 0
Obey 3 Glib 1 Party 0
Pass 4 Happy 9 Police 0
Play 6 Historic 2 Program 1
Remember 6 Living 0 Restaurant 0
See 7 Low 7 Sentence 0
Show 4 Natural 2 Steel 0
Sow 9 Political 0 Telephone 1
Teach 3 Public 1 Trip 2
Wear 3 Strong 3 World 0
Total 113 Total 57 Total 13
Note ‘No. of syns.’ is short for the number of synonyms
contrast, the fewer synonyms for adjectives and nouns may explain why learners
seldom made errors in choosing the right collocates, although they were confused
with the grammatical forms of words and made colligation errors in NN
collocations.
8.4 Synopsis of the Findings in This Chapter
This chapter has been concerned with a comparison of learners’ performance on

verb + noun, adjective + noun and noun + noun collocations. Two overall findings
have emerged with regard to the three collocation types: a relatively poorer per-
formance on VN collocations was observed in Chinese EFL learners but better
performance was found on AN and NN collocations. Moreover, learners’ knowl-
edge of VN collocations did not increase with rising proficiency but their knowl-
edge of AN and NN collocations improved as they proceeded to higher levels.
A possible reason for such a difference was explored with regard to the overall
density of the synonyms for the three syntactic categories of words as collocators:
verbs, adjectives and nouns. Computational analyses both of the synsets of the
three-word classes in WordNet and in a case study revealed that verbs generally
have more synonyms than adjectives, while adjectives have more synonyms than
nouns. Therefore, the better performance on AN and NN collocations can be
accounted for through the lower density in synonyms. In this regard, the analyses of
AN and NN collocations in this chapter consolidate the prediction that vocabulary
growth is an inhibiting factor in collocation acquisition. To be more specific, for
word classes where there is little increase in a synonym set, collocation errors are
seldom made (as for adjective and nouns in AN and NN collocations); where there
are increases in words in synsets, chances of errors subsequently increase (as for
verbs in VN collocations).
References
Kamps, J., Marx, M., & Mokken, R. J., et al. (2004). Using WordNet to measure semantic
orientation of adjectives. In Proceedings of LREC-04, 4th International Conference on
Language Resources and Evaluation. (Vol. 4, pp. 1115–1118).
Kjellmer, G. (1990). Patterns of collocability. In J. Arts, W. Meijs (Eds.), Theory and practice in
corpus linguistics (pp. 163–178). Amsterdam: Rodopi.
Online Proceedings of Corpus Linguistics, [2014-01-12]. http://ucrel.lancs.ac.uk/publications/
Tufis, D., & Stefanescu, D. (2011). An Osgoodian perspective on WordNet. In Speech Technology
and Human-Computer Dialogue (SpeD), 2011 6th Conference on (pp. 1–8), IEEE.
Chapter 9
The Role of L1 in Collocation Learning
As already reviewed in Sect. 3.2.2, the L1 of the learner plays a considerable role in
the learning and production of L2 collocations. On the one hand, previous empirical
collocation studies into L2 learners’ collocation performance show that
L1-influenced errors make up a large proportion of errors even at advanced levels
(Laufer and Waldman 2011; Nesselhauf 2005). For example, Laufer and Waldman
(2011) reported that over 60% of the verb + noun collocation errors produced by
intermediate and advanced learners were L1-induced. This proportion is not found
to decrease over time. A heavy reliance on their mother tongue in collocation
production is also manifested through the overuse of certain collocations that are
linked to lexical combinations in the L1and the underuse of collocations that are
mismatched between the two languages (e.g. Granger 1998a, b; Kaszubski 2000).
On the other hand, research exploring the psychological reality of L2 collocation
learning in terms of L1 and L2 congruence and non-congruence found that L2
learners perform better on congruent collocations than non-congruent ones in
lexical decision tasks. It is also suggested that non-congruent collocations stored in
memory are processed autonomously without word-by-word mediation of the L1.
However, there is still a paucity of investigation into whether L2 learners produce
congruent collocations with more accuracy than non-congruent ones, and whether
non-congruent collocations, once learnt, are less susceptible to errors. Therefore, we
set out in this chapter to investigate the role of L1 in L2 collocation learning in
terms of the production of congruent and non-congruent collocations. It is worth-
while for this study to provide a point of comparison with existing research in terms
of the influence of L1 lexical network on L2 collocation learning. In addition, little
research evidence has been provided with regard to the role of the learners’ mother
tongue in different types of collocations, e.g. verb + noun and adjective + noun
This chapter has been written into a paper titled Cross-linguistic Influence on the Production of
L2 Collocations: A Corpus-based Study of Chinese EFL Learners’ Collocation Learning. The
paper was given at the 9th Newcastle upon Tyne Postgraduate Conference in Linguistics, 4
April, 2014.
DOI 10.1007/978-981-10-5822-6_9
134 9 The Role of L1 in Collocation Learning
collocations. So the second goal of this chapter is to compare learner performance

with regard to congruent and non-congruent VN and AN collocations in order to
see if there are differences in terms of cross-linguistic influence. It is hypothesised
that:
• L2 learners perform better with congruent collocations than non-congruent
collocations in collocation production.
• Non-congruent collocations that are correctly used by learners at lower levels
are not wrongly used by learners at higher levels.
• The L1 plays a different role in verb + noun and adjective + noun collocations.
To test the above hypotheses, the notion of congruence in classifying L2 col-
locations was first clarified (Sect. 9.1). Then within-group comparison of
well-formed and erroneous congruent and non-congruent VN collocations was
carried out to see whether congruent collocations were produced with a higher
accuracy than non-congruent ones (Sect. 8.2); Next, between-group comparison on
the well-formed and erroneous congruent and non-congruent VN collocations was
performed in order to see whether, from a developmental perspective,
non-congruent collocations that were used correctly at lower levels were not
wrongly used by higher levels (Sect. 9.3); Finally, within-group comparison of
positive and negative L1 influence with verb + noun and adjective + noun collo-
cations was made to examine the role of the L1 in the learning of different types of
collocations (Sect. 9.4). The last section (Sect. 9.5) presents a summary of this
chapter.
9.1 The Notion of Congruence
Congruence between L1 and L2 is approached from the perspective of cross-

linguistic translation equivalence (e.g. Bahns 1993; Marton 1977; Nesselhauf 2003,
2005; Philip 2007; Wolter and Gyllstad 2011; Yamashita and Jiang 2010). Yet as
Nesselhauf (2005: 221) acknowledges, the notion of congruence is difficult to
grasp. In contrastive or translation studies, translation equivalence refers to the
correspondence of forms or constructions between the source language and the
target language (Johansson 2007: 23). Here, translation equivalence used in
determining collocation congruence in two languages is measured at a word-for-
word level. In other words, if word elements in a collocation in one language have
direct word-for-word translational equivalence in another language, then it is
considered as a congruent collocation; otherwise, it is a non-congruent collocation
(Nesselhauf 2005; Wolter and Gyllstad 2011). An example of congruent collocation
given by Wolter and Gyllstad (2011) gives an answer, which can be rendered on a
word-for-word basis in Swedish as ge ett svar. For non-congruent collocations, they
give the example of pay a visit, for which the literal Swedish translation would be
*
betala ett besÖk, which is infelicitous.
9.1 The Notion of Congruence 135
A difficulty in capturing the notion of congruence is deciding how to measure

whether two words are “direct translation equivalents” in two languages. For
example, the English strong tea is idiomatically translated into Chinese as nong
cha. Yet nong cha is literally *dense tea when directly translated into English. In
this case it is not certain whether the English collocation strong tea shares “direct”
translation equivalence with the Chinese expression or not. Therefore, in order to
measure direct translational equivalence, the strategy of “back translation” is
adopted, where the translated word in question is translated back into English out of
context, and we see if it can be translated to the English word in question
(Altenberg and Granger 2002: 17; Nesselhauf 2005: 221). At the same time, to
reduce human judgment and make the classification feasible, the classification of
congruent and non-congruent collocations was approached in this study with the
assistance of two bilingual dictionaries. The procedure was as follows:
First, congruence is only considered at the level of content words;1
Next, the noun in the VN collocation is translated into Chinese;
Thirdly, the meaning of the verb is looked up in a bilingual dictionary [Oxford
Advanced Learners’ English–Chinese Dictionary (7th Edition)] (OALECD) with
reference to the meaning of the following noun, and then the Chinese meaning is
located;
Fourthly, we check if the Chinese verb translation together with the Chinese
noun makes a felicitous sequence. If not, then it is non-congruent collocation; if so,
we take the fifth step:
Finally, if the sequence obtained is a felicitous Chinese sequence, a check is
made as to whether, in the back translation process, the Chinese verb in question
would out of context be readily translated into the English word in question, with
reference to a Chinese–English bilingual dictionary—New Century Chinese–
English Dictionary (NCCED). If so, it is a congruent collocation; if not, it is
considered as a non-congruent one.
The following three collocations exemplify the detailed procedures:
Example 1: MAKE + CONTRIBUTION
Analysis:
• The Chinese meaning of CONTRIBUTION is gongxian.
• The sense of MAKE in the OALECD in the context of
MAKE + CONTRIBUTION is ‘create’ and in Chinese zuo.
• The Chinese sequence zuo gongxian is felicitous.
• The Chinese verb zuo in the NCCED can be readily translated into make in
English.
Conclusion: MAKE + CONTRIBUTION is a congruent collocation.
1
Grammatical words usually behave differently across languages. That is especially true of the
Chinese language, which has fewer words functioning as prepositions and they are used less
frequently than in English. The Chinese does not have articles (Cross and Papp 2008: 68; LÜ
2002). So grammatical words are disregarded and only content words of verbs and nouns in our
database were considered.
Example 2: TAKE + NOTE

• The Chinese meaning of NOTE is biji.
• The sense of TAKE in the OALECD in the context of TAKE + NOTE is “write
down” and in Chinese ji.
• The Chinese sequence ji biji is felicitous.
• The Chinese verb ji in the NCCED cannot be readily translated as take in
English, or rather it is literally translated as write down.
Conclusion: TAKE + NOTE is a non-congruent collocation.
Example 3: PAY + VISIT
• The Chinese meaning of VISIT is canguan.
• The sense of PAY in the bilingual dictionary in the context of PAY + VISIT is
“used with some nouns to show that you are giving or doing the thing men-
tioned” and there is no equivalence in Chinese.
Conclusion: PAY + VISIT is a non-congruent collocation.
Using the above procedure, all the well-formed collocations were classified into
congruent and non-congruent collocations. For the erroneous collocations, the
target collocation (e.g. acquire knowledge but not *learn knowledge) was classified
as congruent or non-congruent. Next, the erroneous verb was judged together with
the noun collocate to see if it is a word-for-word equivalent of a Chinese expres-
sion. If so, it was classified as a Chinese transfer error. Otherwise, it was taken as a
non-transfer error.
Altogether, six tags were designed to classify well-formed and erroneous
collocations:
• (I, W): non-congruent and well-formed collocations;
• (C, W): congruent and well-formed collocations;
• (I, N): non-congruent and negative transfer errors;
• (C, N): congruent and negative transfer errors;
• (I, N-): non-congruent and non-negative transfer errors;
• (C, N-): congruent and non-negative transfer errors;
In testing the three hypotheses proposed in this chapter, both quantitative and
qualitative analyses were performed, with statistical comparisons carried out along
three dimensions: (a) within-group comparison of well-formed and erroneous
congruent and non-congruent VN collocations; (b) between-group comparison of
the well-formed and erroneous uses of congruent and non-congruent VN colloca-
tions; (c) within-group comparison of positive and negative L1 influence in
verb + noun and adjective + noun collocations. The next three sections accordingly
present the results of these three groups of comparisons.
9.2 Within-group Comparison of Well-formed and Erroneous Congruent … 137
9.2 Within-group Comparison of Well-formed

and Erroneous Congruent and Non-congruent VN
Collocations
Well-formed congruent collocations are the collocations tagged as (C, W);

well-formed non-congruent ones are (I, W). Erroneous congruent collocations
include collocations tagged as (C, N) and (C, N-), whilst erroneous non-congruent
collocations are (I, N) and (I, N-).
First of all, a within-group comparison of the frequencies of well-formed and
erroneous congruent and non-congruent collocations was performed on the group of
ST2 learners by using chi-square tests. The results are shown numerically in
Table 9.1 (for the frequency of collocation types, see Table 9.2). Table 9.1 presents
the tokens of well-formed and erroneous congruent collocations and as well
well-formed and erroneous non-congruent collocations used by ST2 learners, along
with the percentages of erroneous collocations constituting out of the total number
of congruent and non-congruent collocations. Though erroneous collocations did
not make up a high proportion of total collocations produced, a chi-square test with
Yate’s correction was performed and a significant difference was found between the
number of erroneous collocations and collocation types as congruent or
non-congruent—that is, there were significantly more errors with congruent col-
locations than non-congruent collocations (v2 = 45.39, p < 0.0001).
When collocation types are considered, both the percentages of erroneous col-
locations among congruent collocations (30.4%) and among non-congruent collo-
cations (11.1%) increase compared with collocation tokens presented in Table 9.1.
Following a similar statistical analysis using Fisher’s test, a similar result was
obtained: there are significantly more errors with congruent collocations and more
Table 9.1 Well-formed and erroneous congruent and non-congruent collocations in the ST2
(tokens)
Types Congruent coll. (%) Non-congruent coll. (%) Total
Well-formed coll. 727 (88.7) 740 (97.5) 1467
Erroneous coll. 93 (11.3) 19 (2.5) 112
Total 820 (100) 759 (100) 1579
Note v2 = 45.39, p < 0.0001 **** extremely Sig.
(types)
Well-formed coll. 117 (69.6) 104 (88.9) 221
Erroneous coll. 51 (30.4) 13(11.1) 64
Total 168 (100) 117 (100) 285
Note v2 = 13.59, p = 0.0002 *** extremely Sig.
well-formed non-congruent collocations produced by ST2 learners (v2 = 13.59,

p = 0.0002). This result runs counter to what Nesselhauf (2005) observed, where
the percentage of collocation errors among congruent collocation tokens was
around 17% and among non-congruent collocations was 42%. The underlying
reason for such a substantial difference in the results obtained may be the different
criteria adopted in categorising the two types of collocations; namely, congruent
and non-congruent. In Nesselhauf’s data set, congruence was measured at the levels
of phrasal verbs, prepositions and nouns in verb + noun combinations (2005: 222).
That may have led to the classification of more non-congruent collocations which
otherwise might be congruent ones. According to Salkie (2002: 56), items in closed
grammatical classes normally behave differently across languages. Subsequently,
the likelihood of collocation errors with grammatical words (prepositions, the
number of the noun) increases due to their often greatly differing use between
languages. In this study, congruence was only measured at the level of content
words, with colligations disregarded. As was pointed out in Chap. 4, only the verbs
that were correctly or wrongly used were targeted, and combinations where a noun
was wrongly used were not included in the present study.
A difference in L2 collocation performance depending on varying L1 back-
grounds has been reported in previous L2 collocation studies (e.g. Biskup 1992;
Wang and Shaw 2008). Thus, another reason accounting for a difference between
Nesselhauf’s (2005) finding and ours may lie in the language background of the L2
learners targeted. In her study, the L1 of the learners is German, which, together
with the English language, belongs to Indo-European languages, whilst Chinese is
one category of Sino-Tibetan languages. Thus, more differences would be expected
in Chinese than German when compared with English. Based on this observation,
Chinese learners of English may encounter more difficulties in producing congruent
collocations than German learners of English.
The above result was obtained for the collocations used by the ST2 learner
group. As the same procedure was employed in the data of ST6 learners, similar
outcomes were produced when the frequencies of well-formed and erroneous
collocations among congruent and non-congruent collocations were compared, i.e.
the chi-square test shows that there were significantly more errors among congruent
collocations than non-congruent collocations, as a statistically significant relation-
ship was found between the number of erroneous collocations and whether collo-
cations were congruent and non-congruent (see Table 9.3 for token frequencies and
Appendix H for type information).
(tokens)
Well-formed coll. 1047 (88.1) 618 (96.4) 1665
Erroneous coll. 141 (11.9) 23 (3.6) 164
Total 1188 (100) 641 (100) 1829
The statistical analyses of collocations produced by ST2 and ST6 learners show
that congruent collocations pose more difficulties than non-congruent ones. The
greater difficulty with congruent collocations seems to contradict findings from
psycholinguistic experiments conducted by Yamashita and Jiang (2010), Wolter
and Gyllstad (2011), according to whose findings a group of highly proficient
language learners both processed L1–L2 collocations (i.e. congruent collocations)
than L2-only collocations (i.e. non-congruent collocations) with faster reaction
times and recognised the former with higher receptive scores. So L2 collocational
links in the mental lexicon of L2 learners are likely to be mediated by their L1 and
thus congruent collocations gain more legitimacy in the mental lexicon. This is
perhaps true in the sense of collocational storage and recognition, but a different
picture emerges when this link is activated in the production process.
A detailed analysis was further performed on erroneous congruent collocations,
aiming at discovering why congruent collocations seem to pose more difficulties for
Chinese learners of English. Erroneous collocations in the ST2 learner group were
identified. A detailed investigation into the 93 instances of erroneous congruent
collocations in ST2 revealed that a large proportion of the errors were a result of
‘partial congruence’ between the two languages. Congruent collocations are easier
for L2 learners in locating the appropriate verb (e.g. answer) with the preselected
noun (e.g. question) through direct rendering from their mother tongue. As in the
case of answer + question, the corresponding English verb—answer–is a
one-to-one match with the Chinese verb—huida (although reply to and respond to
are synonyms of answer in this sense and are also in collocational relationship with
question, they are not believed to be an exact match for huida, which is instead
rendered in Chinese as huifu). However, the problem for L2 learners is that
one-to-one correspondence in languages is not prevalent. As shown in the errors in
Table 9.4 below, there exists ‘differentiation’ between Chinese and English,
Table 9.4 Erroneous congruent collocations attributable to ‘differentiation’

Chinese words English words Error tokens
zuo (做) make, do (exercise), compose (poem), play (game) 5
kan (看) see, read (book) 3
chuan (穿) dress, wear (clothing) 1
shuo (说) say, tell (joke) 2
ting (听) hear, listen to (music) 3
dakai (打开) turn on, open (book) 1
canjia (参加) attend, participate in (activity) 1
biaoyan (表演) play, perform (play) 3
chengren (承认) concede, accept, admit (mistake) 2
chuangzuo (创作) create, compose (poem) 4
zhuazhu (抓住) catch, seize (time) 1
Total 26
Note Nouns in brackets are given as the noun collocates of the target English verbs
meaning that the native language has one form, whereas the target language has two
or more forms (Gass and Selinker 2008: 100). For example, there is a one-to-many
correspondence in the following forms and subsequently errors are induced by such
mismatches.
Additionally, another notable type of congruent collocation error can be attrib-
uted to ‘coalescing’, referring to the opposite of ‘differentiation’ where the native
language has more than one form corresponding to only one form of the target
language (Gass and Selinker 2008: 100f). Unlike errors attributable to differentia-
tion, where learners seem to have difficulties choosing the right form from several
possible forms in the L2, in cases of coalescing, what happens is that L2 learners
have to know that among several expressions in their native language (e.g. huode
zhishi (literal translation: acquire knowledge), xuexi zhishi: literal translation: *learn
knowledge), only one expression (e.g. huode zhishi: English translation: acquire
knowledge) corresponds to the L2 expression (e.g. acquire knowledge). Examples
of errors attributable to coalescing are shown in Table 9.5.
In Table 9.5, there are many-to-one correspondences between the Chinese and
English language and only the first sequence in the left column corresponds cor-
rectly to the sequences in English. For instance, the concept of acquire knowledge
can be expressed in Chinese in at least three forms: acquire knowledge/*earn
knowledge, *learn knowledge and *grasp knowledge. *learn knowledge has been
commonly produced by Chinese learners and negative transfer subsequently occurs.
*
learn knowledge, *grasp knowledge together with *teach knowledge are the
commonest Chinese expressions when expressing the concepts of acquire knowl-
edge and impart knowledge.
It can be seen that 66.7% ((26 + 36)/93 100) of the erroneous congruent
collocations are caused by a partial congruence (or in Nesselhauf’s (2005) termi-
nology, ‘partial non-congruence’), where one-to-many or many-to-one correspon-
dences occur in the native and target languages and only one equivalent of several
expressions that can be used in one language is acceptable in another. These two
factors are found to be responsible for the susceptibility to errors for congruent
Table 9.5 Erroneous congruent collocations attributable to ‘coalescing’

Chinese sequences English sequences Error tokens
gei jianyi/yuanyin (给建议/原因) give advice/reason 3
shuo jianyi/yuanyin (说建议/原因)
huode zhishi (获得知识) acquire knowledge 27
xuexi zhishi (学习知识)
zhangwo zhishi (掌握知识)
chuanshou zhishi (传授知识) impart knowledge 5
jiao zhishi (教知识)
fuxi zhishi (复习知识) review knowledge 1
jiyi zhishi (记忆知识)
Total 36
collocations. Just as Farghal and Obeidat (1995: 323) pointed out, reliance on L1
does not “always result in positive transfer since the one-to-one correspondence
hypothesis holds in only few cases”. Relying on the L1 by L2 learners commonly
leads to negative transfer.
9.3 Between-group Comparison of Well-formed

and Erroneous Congruent and Non-congruent VN
Collocations
Tables 9.6 and 9.7. below present the results of well-formed congruent and
non-congruent collocation tokens and erroneous ones produced by ST2 and ST6
learners (for types, see Appendices I and J).
Likewise, statistical analyses were conducted on the data in both tables above.
Two different results were obtained: a significant relationship between well-formed
non-congruent collocations and learner groups was found, whilst for erroneous
non-congruent collocations, no statistical significance was observed. Significantly
more non-congruent collocations were correctly used by ST2 learners than by the
ST6 group (both in terms of tokens and types). As is shown in Fig. 9.1, there is an
Table 9.6 Well-formed Types ST2 (%) ST6 (%) Total

congruent and non-congruent
VN collocations in ST2 and Congruent coll. 727 (49.6) 1047 (62.9) 1774
ST6 (tokens) Non-congruent coll. 740 (50.4) 618 (37.1) 1358
Total 1467 (100) 1665 (100) 3132
Table 9.7 Erroneous Types ST2 (%) ST6 (%) Total

congruent and non-congruent
VN collocations in ST2 and Congruent coll. 93 (83.0) 141 (85.8) 234
ST6 (tokens) Non-congruent coll. 19 (17.0) 23 (14.2) 42
Total 112 (100) 164 (100) 276
Note p = 0.50 ns
Fig. 9.1 Well-formed 1,200

congruent and non-congruent 1,000
collocation tokens in the ST2
and ST6 800
Freq. 600
Congruent coll.
400 Non-congruent coll.
200
0
ST2 ST6
Learner types
Fig. 9.2 Erroneous 160

congruent and non-congruent 140
collocation tokens in the ST2 120
and ST6 100
Freq. 80
Congruent coll.
60
40 Non-congruent coll.
20
0
ST2 ST6
Learner types
increase in congruent collocation uses and decrease in non-congruent collocation

uses as learners’ proficiency grows. The chances of erroneous congruent and
non-congruent collocations increase as well but not sharply. Additionally, the
increase in erroneous congruent collocations is sharper than of non-congruent ones
as is presented in Fig. 9.2.
Since there are no direct translation equivalents between non-congruent collo-
cations in learners’ L1 and L2, it would be expected that they are more difficult for
learners and more susceptible to errors. However, as the data reveals, the number of
errors with non-congruent collocations is significantly lower than errors with
congruent ones (cf. Tables 9.1, 9.2 and 9.3). It could be that non-congruent col-
locations are avoided in production, and when there is a need to express the con-
cept, paraphrases are used.2 For example, in Farghal and Obeidadt’s data, subjects
tended to paraphrase light food as food little fat, heavy drinker as drinks too much.
However, Table 9.1 indicates that there are more non-congruent collocations that
are correctly used than congruent ones, so these Chinese learners showed no sign of
such an avoidance strategy.
Another reason that non-congruent collocations are less prone to errors may be
that they are acquired as wholes and, once acquired, they are less prone to errors.
This claim is supported by psycholinguistic evidence that once stored in memory,
L2 collocations, like formulaic sequences, are processed independently of the L1
(Conklin and Schmitt 2008; Jiang and Nekrasova 2007; Wolter and Gyllstad 2011;
Yamashita and Jiang 2010). For the present research, evidence was gathered in
order to empirically test their claim: to investigate developmentally whether the
non-congruent collocations that were correctly used by ST2 learners were not
wrongly used by ST6 learners.
The steps taken in examining the uses of non-congruent collocations were as
follows:
• Check if the well-formed non-congruent collocations in the ST2 were also
correctly used by ST6 learners;
• Check the overlapping non-congruent collocations that were wrongly produced
by ST6 learners and the well-formed non-congruent ones in the ST2. Results
2
See also the paraphrasing strategy employed by L2 learners in Bahns and Eldaw (1993) and
Farghal and Obeidadt (1995).
9.3 Between-group Comparison of Well-formed and Erroneous … 143
showed that among the 104 types of well-formed non-congruent collocations in

the ST2, 33 types are also correctly used by ST6 learners (the others being
absent in the ST6 data). Among the erroneous uses of collocations by ST6
learners, all of them except two cases3 were absent from the ST2 database of
well-formed non-congruent collocations, indicating that erroneous
non-congruent collocations at the ST6 level were not produced in large numbers
at the ST2 level. This result functions as evidence in support of our claim of
holistic learning of non-congruent collocations. To be more specific, if a
majority of errors with non-congruent collocations at the ST6 level have been
correctly used in the ST2 level, non-congruent collocations may not be learnt
holistically and are prone to errors. What we found is that non-congruent col-
location errors in the ST6 were not frequently found in the database of
well-formed non-congruent ones produced by ST2 learners. In sum, though
there are not many overlapping uses of non-congruent collocations, our data
show that non-congruent collocations that are correctly used by lower levels of
learners are not susceptible to errors at higher levels.
A detailed look at the non-congruent collocations in our data gives two striking
categories of non-congruence: lexical non-congruence and structural
non-congruence.4 The former means that the English collocation and the Chinese
expression share the same verb + noun structure, but the verb in question does not
have a direct translation equivalent in both languages; structural non-congruence
means that the English VN collocation does not correspond to a VN sequence in the
Chinese, rather the Chinese uses a single lexeme. Take + pill is an example of
lexically non-congruent collocation, since the Chinese equivalent chi + yao is also
a verb + noun structure, yet the Chinese verb chi literally corresponds to eat in
English (*eat + pill). Other examples of this type include file + suit (Chinese:
tiqi + susong), reach + conclusion (Chinese: dechu + jielun); Make + fun, on the
other hand, is a structurally non-congruent collocation, in which the translation
equivalent in Chinese is quxiao, a single verb lexeme as listed in the Contemporary
Chinese Dictionary (the 5th Edition 2005) (henceforth: CCD). Take + bath
(Chinese: xizao), set + fire (Chinese: shenghuo) also fall into the category of
structurally non-congruent collocations.
Altogether there are 310 lexically non-congruent collocations and 429 struc-
turally non-congruent collocations that are correctly used in the ST2. For ST6
learners, the number of tokens of well-formed lexically non-congruent collocations
is 161 and of structurally non-congruent collocations, 457. Structurally
3
The two non-congruent collocations that were correctly used by the ST2 level but wrongly used at
the ST6 level are *make (draw) + conclusion and *put (pay) + attention, both of which are due to
the Chinese transfer.
4
Nesselhauf (2005: 222) distinguishes two types of non-congruence: lexical and non-lexical
non-congruence, the former of which corresponds to the sense in the present study whilst the latter,
referring specifically to elements other than lexical words (e.g., prepositions) that are not congruent
between languages.
non-congruent collocations make up a predominant proportion of the total

non-congruent collocations that are successfully produced by L2 learners. As Wang
(2011) reported in his study of Chinese learners’ learning of light verb + noun
collocations, learners seldom made errors and seemed to have little difficulty in
acquiring collocations where there are structural differences between the forms of
expression between Chinese and English.
Past studies of L2 collocation acquisition suggest that learners do not pay
attention to collocation relationships and thus collocations are seen as composi-
tional combinations of words rather than as a phenomenon of co-selection (Laufer
and Waldman 2011; Philip 2007; Wray 2002). Yet seen through the better per-
formance in non-congruent collocations by L2 learners, special attention might be
actually paid when the fully salient non-congruent collocations are encountered in
learning. Thus these collocations might be “restructured” in their mental lexicon
and memorised as wholes, bypassing the route of the L1. Wolter (2006: 743–744)
argues for this restructuring process by explaining that “in some cases their existing
L1 lexical/conceptual network will suffice, and slotting L2 lexical items into the
network will be fairly straightforward. In other cases, however, the network itself
will need fundamental restructuring in order to accommodate divergent properties”.
Our finding that non-congruent collocations are less likely to be at fault than
congruent collocations and are less prone to errors once acquired by lower levels of
learners empirically supports findings from psycholinguistic studies, e.g. once an
L2-only collocation (i.e. non-congruent collocation) “is recognised as a legitimate
collocation in the L2, it becomes stored as such psychologically and when the first
word in the collocation is observed the second word of the collocation is antici-
patorily activated” (Wolter and Gyllstad 2011: 442; also Yamashita and Jiang
2010). So we argue that non-congruent collocations are learnt holistically, and are
not as prone to compositionality as the learning of congruent collocations.
Considering learners’ relative ease in acquiring the non-congruent collocations,
it might be argued that non-congruent collocations are more “noticed” than con-
gruent ones. As pointed out by Schmidt (1990: 129), “noticing is the necessary and
sufficient condition for converting input to intake”. What make them more
noticeable are the divergent features between the target language and their mother
tongue, e.g. when there is no word-for-word translation equivalent of the L2 col-
locations in their L1. In this sense, non-congruent collocations receive more per-
ceptual salience [as James (1996) called it] than congruent ones. In addition, if the
non-congruent collocations are processed and stored holistically, the significantly
more non-congruent collocations at lower levels of learning and more congruent
collocations at higher levels of learning (see Table 9.6) conform to the model
proposed by Krashen and Scarcella (1978) and also Wray (2002): in early stages of
L2 learning, formulaic sequences are memorised as wholes but subsequently
“language development proceeds analytically, in the ‘one word at a time’ fashion”
(Krashen and Scarcella 1978: 297). This analytic approach leads to an analysis of a
formulaic sequence in terms of individual words, one which does not retain word
co-occurring information.
9.4 Within-group Comparison of Positive and Negative … 145
9.4 Within-group Comparison of Positive and Negative L1

Influence with VN and aN Collocations
Adjective + noun collocations produced by ST2 and ST6 learners were classified
into congruent ones and non-congruent ones following the same procedure with
verb + noun collocations. Examples of congruent AN collocations are: active part,
absolute truth, bad luck, blue sky, etc. Non-congruent collocations include active
volcano, narrow escape, promissory note, heavy smoke, etc. In this section, the role
of the L1 in the two types of collocations (VN and AN collocations) is examined in
order to see whether its influence is proportionate in different word-class
collocations.
It is assumed that congruent collocations are correctly used owing to L1 positive
transfer, though this assumption is somewhat arbitrary and speculative, since within
the learners’ “black box”, it is not clear whether congruent collocations are stored
and produced wholly without mediation through their L1. However, one may
justify this assumption, on the basis that learners take less reaction time and make
fewer errors in responding to congruent collocations than non-congruent ones,
which suggests that the former are stored in the learners’ mental lexicon via L1
mediation (cf. Wolter and Gyllstad 2011; Yamashita and Jiang 2010). So the col-
locations which are a result of positive transfer are collocations tagged as (C, W),
and negative transfers are collocations tagged as (I, N) and (C, N).
L1 influence was first measured in the VN collocations produced by ST2 and
ST6 learners. The overall number of L1-influenced collocations among the VN
collocations produced by ST2 learners was 801 (calculated as the overall colloca-
tions tagged as (C, W), (I, N) and (C, N)), which makes up 51% of all the collo-
cations (1,578 tokens of collocations). Similarly, 66% collocations in the ST6 level
were either positively or negatively influenced by the Chinese. In general over 50%
of the collocations produced by Chinese learners may be traced to the influence of
their L1: this figure was also reported by Wang (2011), in whose investigation of
Chinese college students’ acquisition of English light verb + noun collocations,
61.84% of the subjects’ production of L2 light verb + noun collocations were
positively or negatively transferred from Chinese.
Next, an analysis was performed on the negative transfer of VN collocations
between ST2 and ST6 learners (see Table 9.8 below for a numeric presentation). It
shows a decrease in transfer errors in the ST6 level (from 66 to 29%). In addition,
according to statistical analysis, ST6 learners made significantly more non-transfer
errors than ST2 learners, which suggests a weakening L1 influence on the pro-
duction of L2 collocations with learners’ rising proficiency. Yet the influence of the
Table 9.8 Transfer and Types ST2 (%) ST6 (%) Total
non-transfer VN collocation
errors produced by ST2 and Transfer errors 74 (66) 48 (29) 122
ST6 learners Non-transfer errors 38 (34) 116 (71) 154
Total 112 (100) 164 (100) 276
L1 is still strong as nearly one-third of the errors are L1-induced ones in the
advanced level.
For AN collocations in the two groups of learners, the L1 influence was found to
be larger than in VN collocations: 95% for ST2 learners and 96% for the ST6
group. Compared with the percentages obtained above in VN collocations, the L1
seems to play a bigger role in AN collocations. In the following analysis, we will
investigate whether its role is allocated proportionately in positive and negative
transfer in the two types of collocation.
Tables 9.9 and 9.10 present, respectively, the numbers of VN and AN collo-
cations due to positive transfer and negative transfer in the two groups of learners
(for types, see Appendices K and L).
The two tables reveal that in both groups there is significantly more positive
transfer in AN collocations and more negative transfer in VN collocations, irre-
spective of the tokens and types examined. That there is more negative transfer in
VN collocations than AN collocations produced by L2 learners has also been found
by Parastuti et al. (2009). They investigated the collocations used by Indonesian
English learners of English and reported that among all the negative transfer errors,
the percentage of negative transfer for the verb (creation + activation) + noun
collocations was the largest—54.24%, and the second largest was adjective + noun
collocations—18.64%.
That AN collocations are less error-prone may be because they are an early
acquired type of collocations whereas VN collocations are found to be the most
difficult collocations acquired by L2 learners (Gitsaki 1999). So it may be the
relative ease with AN collocations that enables a reduction in the negative L1
transfer, while the relative difficulty with VN collocations increases the possibility
of L1 interference. Another explanation may relate to the degree of congruence
Table 9.9 Positive and Types VN coll. AN coll. Total

negative transfer in VN and
AN collocations in the ST2 Positive transfer 727 (91%) 263 (97%) 990
(tokens) (C, W)
Negative transfer 74 (9%) 8 (3%) 82
((I, N) and (C, N))
Total 801 (100%) 271 (100%) 1072
Note p = 0.0005 *** extremely Sig.
Table 9.10 Positive and Types VN coll. AN coll. Total

negative transfer in VN and (%) (%)
AN collocations in the ST6
(tokens) Positive transfer 1047 (96) 960 (99.8) 2007
(C, W)
Negative transfer 48 (4) 2 (0.2) 50
((I, N) and (C, N))
Total 1095 (100) 962 (100) 2057
9.4 Within-group Comparison of Positive and Negative … 147
between AN and VN collocations. Of the AN collocations both in ST2 and ST6

databases, 78 and 97%, respectively, are congruent collocations (e.g. blue sky:
Chinese: lan tian). So one possible explanation is that between Chinese and English
adjectives correspondences are more often one-to-one, leading to more successful
learning of AN collocations; whilst ‘differentiation’ or ‘coalescing’ normally exists
between the verbs in two languages, thus causing more collocation errors.
9.5 Synopsis of Findings in This Chapter
The findings are summarised with regard to the three hypotheses proposed earlier in
this chapter, namely:
Hypothesis 1. L2 learners perform better in congruent collocations than
non-congruent collocations.
Hypothesis 2. Non-congruent collocations that are correctly used by learners at
lower levels are not wrongly used by learners at higher levels.
Hypothesis 3. The L1 plays a different role in verb + noun and adjective + noun
collocations.
For Hypothesis 1, we found that there were more congruent collocations than
non-congruent collocations that were correctly used by both groups in either tokens
or types (except for the tokens in the ST2). However, there were significantly more
errors with congruent collocations than non-congruent collocations. So this
hypothesis is rejected in light of the data showing that congruent collocations
actually posed more difficulties than non-congruent ones. Further, detailed analysis
was performed on erroneous congruent collocations so as to locate the factors
inhibiting the correct production of congruent collocations. It is found that a large
proportion of the errors can be attributed to ‘partial congruence’ between their
mother tongue and English, as in the forms of “differentiation” and “coalescing”.
With regard to Hypothesis 2, between-group comparisons on the well-formed
and erroneous uses of congruent and non-congruent verb + noun collocations were
conducted. This hypothesis is upheld as for non-congruent collocations that were
correctly produced by learners of lower levels, they were seldom wrongly used by
higher levels.
Concerning hypothesis 3, within-group comparisons of positive and negative L1
influence with verb + noun and adjective + noun collocations were carried out.
Statistical analysis revealed that there was significantly more positive transfer in
AN collocations and more negative transfer in VN collocations, irrespective of
learner types. That indicates that the L1 plays a different role in word-class-specific
collocations.
In conclusion, our findings regarding the cross-linguistic influence in the
learning and production of L2 collocations hold significant implications which can
be explored in connection with previous SLA theory. Discussion of these impli-
cations will be presented in Chap. 10.
References
Altenberg, B., & Granger, S. (2002). Recent trends in cross-linguistic lexical studies. In B.
Altenberg & S. Granger (Eds.), Lexis in contrast: Corpus-based approaches (pp. 3–48).
Biskup, D. (1992). L1 influence on learners’ renderings of English collocations: A Polish/German
empirical study. In P. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 85–
93). London: Macmillan.
Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than
nonformulaic language by native and nonnative speakers? Applied Linguistics, 29(1), 72–89.
Cross, J., & Papp, S. (2008). Creativity in the use of verb + noun combinations by Chinese
learners of English. In G. Gilquin, S. Papp, & M. B. Diez-Bedmar (Eds.), Linking up
contrastive and learner corpus research (pp. 57–81). Amsterdam: Rodopi.
Dictionary Office, Institute of Linguistics, Chinese Academy of Social Sciences. (2005).
Contemporary Chinese Dictionary. (5th ed.) Beijing: The Commercial Press.
Gass, S. M., & Selinker, L. (2008). Second language acquisition: An introductory course. London:
Routledge, 2008
Granger, S. (1998b). The computer learner corpus: A versatile new source of data for SLA
research. In S. Granger (Ed.), Learner english on computer (pp. 3–18). London: Longman.
James, C. (1996). A cross-linguistic approach to language awareness. Language Awareness, 5(3–
4), 138–148.
Jiang, N., & Nekrasova, T. M. (2007). The processing of formulaic sequences by second language
speakers. Modern Language Journal, 91(3), 433–445.
Johansson, S. (2007). Seeing through multilingual corpora: On the use of corpora in contrastive
studies. Amsterdam: Benjamins.
advanced learners of English: a contrastive, corpus-based approach[J/OL]. (2011-10-13). http://
main.amu.edu.pl/ przemka/rsearch.html.
Krashen, S., & Scarcella, R. (1978). On routines and patterns in language acquisition and
performance. Language Learning, 28(2), 283–300.
Laufer, B., & Waldman, T. (2011). Verb-Noun collocations in second language writing: A corpus
Lü, S. X. (2002). Collection of LÜ Shuxiang (Vol. 5): 800 Words in Modern Chinese. Liaoning:
Liaoning Education Press.
Marton, W. (1977). Foreign vocabulary learning as problem No. 1 of language teaching at the
advanced level. Interlanguage Studies. Bulletin, 2, 33–57.
Parastuti, A., Said, M., & Wawan, W. (2009). The negative transfers of English collocations
written by the students of Gunadarma University[OL]. (2014-01-11). http://www.gunadarma.
ac.id/library/articles/graduate/letters/2009/Artikel_10604015.pdf.
Philip, G. (2007). Decomposition and delexicalisation in learners’ collocational (mis)behaviour.
Online Proceedings of Corpus Linguistics[OL]. (2014-01-12). http://ucrel.lancs.ac.uk/
publications/cl2007/paper/170_Paper.pdf.
References 149
Salkie, R. (2002). Two types of translation equivalence. In B. Altenberg & S.Granger (Eds.), Lexis
in contrast: Corpus-based approaches (pp. 51–71). Amsterdam: Benjamins.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–150.
Wang, D. (2011). Language transfer and the acquisition of English light Verb + Noun collocations
by Chinese learners. Chinese Journal of Applied Linguistics, 11(2), 107–125.
Wang, Y., & Shaw, P. (2008). Transfer and universality: Collocation use in advanced Chinese and
Swedish learner English. ICAME Journal, 32, 201–232.
Yamashita, J., & Jiang, N. (2010). L1 Influence on the acquisition of L2 Collocations:
647–668.
Chapter 10
Summary and Conclusions
The final chapter begins by summarising the key findings reported in Chaps. 5, 6, 7,
8 and 9. Then theoretical and pedagogical implications for L2 collocation learning
are discussed with a view to findings revealed in this study (Sect. 10.2). The book
concludes by mentioning the limitations of the present study and suggesting ways
forward in further research into L2 learners’ collocation learning (Sect. 10.3).
10.1 Summary
The mastery of collocations is a key indicator of second language learners’ overall

proficiency in the field of second language acquisition. The importance of collo-
cational knowledge to the attainment of native-like fluency has been widely
acknowledged (e.g. Palmer 1933; Pawley and Syder 1983; Wray 2002). Yet col-
location acquisition poses great problems even for fairly proficient L2 learners.
Much L2 collocation research has been devoted to an investigation into learners’
knowledge and use of collocations, with the finding that there are both quantitative
(e.g. overuse and underuse) and qualitative deficiencies (e.g. misuse) in their col-
location production. Apart from the deficiencies in collocation uses uncovered by
previous studies, collocation is believed to be acquired late and lag behind other
aspects of SLA (Henriksen 2013; Schmitt and Carter 2004). However, studies
identifying the factor(s) associated with the lag of collocational knowledge have
been few and our research fills this gap through investigating the relationship
between vocabulary growth and the learning of collocations by L2 learners.
A cross-sectional study of the collocation performance of Chinese learners of
English at three proficiency levels was conducted. Learners’ collocation perfor-
mance was investigated through their production of three frequent and important
lexical collocations: verb + noun, adjective + noun and noun + noun collocations,
with the main focus on the most difficult type of collocations—verb + noun col-
locations. Before pulling together the detailed results, an overall picture of
DOI 10.1007/978-981-10-5822-6_10
152 10 Summary and Conclusions
collocation production by Chinese EFL learners at the three proficiency levels is

presented.
Corroborating findings from previous L2 collocation studies (cf. Sect. 3.2.1),
both quantitative and qualitative deficiencies were identified in Chinese L2 learners’
performance in verb + noun collocations. Quantitative deficiency was exemplified
in the small number of VN collocations extracted from three sub-corpora (only
about 5000 tokens and 1070 types of collocations retrieved from writings of
approximately 600,000 words). That there was an insufficient use of collocations by
L2 learners suggests that learners more often combine individual words creatively
in language production rather than making use of formulaic language, which further
indicates L2 learners’ poor sense and command of prefabricated units in language
acquisition and production (Foster 2001; Laufer and Waldman 2011; Kjellmer
1991; Wray 2002). Another aspect of quantitative deficiency lies in the heavy use of
a limited range of collocation types.1 Heavy reliance on a small number of collo-
cations, together with a small quantity of collocations produced, indicated a poor
phraseological competence among Chinese learners.
In addition to a quantitative deficiency in collocation production, learners’
problems with collocation learning have been identified through the large propor-
tion of collocation misuses. Nearly a quarter of VN collocations were recognised as
erroneous collocations. Successful production of VN collocations not only posed
problems for all levels of learners, but there was no sign of improving collocation
performance with rising proficiency. On the one hand, there was no sign of
increasing collocation production from the ST2 to the ST6 level (cf. Sect. 5.1.1).
On the other, in terms of the erroneous collocations produced, though there was a
significant decrease in collocation errors from the ST2 to the ST5 level, errors
significantly increased again from the ST5 to the ST6 level. This uneven devel-
opmental path was attributed to the learning of more verbs at the ST6 level, which
inhibited the acquisition of collocations (cf. Chap. 6). As the overall trend predicts,
collocational knowledge did not improve since learners at the ST6 level, despite
advancing L2 proficiency, made about the same proportion of errors as the lowest
ST2 learners.
The study went on further to explore the role of the verb increase in this col-
location lag. Verb increase was firstly measured broadly in terms of the develop-
ment of verbs from delexical to lexical verbs. Investigation into delexical
verb + noun and lexical verb + noun collocations revealed that there was a gradual
increase in the production of lexical verb + noun collocations and decrease in
delexical verb + noun collocations among the three levels. A significant relation-
ship was found between the numbers of well-formed lexical verb + noun collo-
cations and learner levels, indicating a significantly higher production of lexical
verb + noun collocations with the rise of proficiency. However, in erroneous lexical
verb + noun collocations, there was a dip first from the ST2 to the ST5 level,
1
For example, 14% collocation types were used more than 10 times by learners at the lowest level,
making up 67% of all the collocations retrieved.
10.1 Summary 153
followed by a sharp increase from the ST5 level to the ST6 level. ST6 learners
produced significantly more lexical verb + noun collocation errors than ST2
learners, indicating poorer performance in lexical verb + noun collocations than
delexical verb + noun collocations. This means that with more lexical verbs learnt,
the chances of these lexical verbs leading to collocation errors increased as well.
When the lexical verbs in both well-formed and erroneous lexical verb + noun
collocations produced by all levels of learners were arranged into synonym sets, it
was found that collocation errors were seldom made where there was no growth in
verb synsets. However, there was an increase in collocation errors in synsets with a
verb increase. As learners proceeded to more advanced levels, the occurrence of
collocation errors was found to become more and more limited to synsets with verb
increases. Verb classes most susceptible to errors were verbs of creation, fulfil
verbs, verbs of obtaining and verbs of putting, where there was a considerable
increase in the number of verbs at the higher level. A marked lag in learners’
knowledge of VN collocations was observed, as more proficient learners produced
the same proportion of errors as learners of lower levels in terms of the synsets
identified. When verbs in these sets were divided into new and old verbs, errors
with new verbs at the ST6 level were significantly more likely to be made than
errors with old verbs. Therefore, we conclude that the increase in verbs in a par-
ticular semantic domain is an inhibiting factor for the learning of collocations: it
was suggested that learners may only have an incomplete command of the
semantics of the new verb, i.e. the basic meaning of that verb is acquired but not its
distinguishing features as distinctive from a set of semantically related verbs. This
study suggests that acquisition of verb semantics is important for successful
learning of L2 collocations.
In addition to verb growth as an inhibiting factor in collocation learning, further
analysis was performed to examine whether newly acquired nouns were also a
factor responsible for the lag. Results showed that the percentage of new nouns in
erroneous collocations produced by higher levels was rather low—around 13%, a
figure which remained roughly constant at both ST5 and ST6 levels. That means in
a majority of newly acquired nouns, VN colocations were target-like. Even though
new nouns were used in erroneous collocations, it was found that this was not
mainly due to a shortfall of new verbs collocating with the newly acquired nouns; in
fact the target verbs may have been acquired (e.g. *stir + consciousness instead of
raise + consciousness, *reflect/cast + prejudice instead of hold + prejudice). So
the occurrence of new nouns is not an inhibiting factor for the stagnant development
of L2 learners’ collocational knowledge.
This study has also investigated learners’ performance in two other frequent
types of collocations: adjective + noun and noun + noun collocations. Better per-
formance was discovered in the production of adjective + noun and noun + noun
collocations than verb + noun collocations. A comparison of the ratios of erroneous
collocations among the three types of collocations showed that L2 learners, irre-
spective of proficiency level, performed best on noun + noun collocations, fol-
lowed by adjective + noun collocations, and performed worst on verb + noun
collocations. Not only was a better performance observed on AN and NN
collocations produced by the three levels of learners, but there was also a clear
progression overall in collocational knowledge with regard to these two types of
collocations. However, learners’ knowledge of VN collocations, lagged, as we saw.
This finding of differing collocation performance depending on category type
has contributed to answering the question raised by Siyanova and Schmitt (2008:
453)—whether other types of L2 collocations (e.g. verb–noun, verb–adverb) would
be produced at a similar level as adjective + noun collocations. The answer to their
question is negative as Chinese L2 learners had much better command of noun +
noun collocations than adjective + noun collocations, and better command of
adjective + noun collocations than verb + noun collocations.
The present study attempted to account for such differing performance in dif-
ferent types of collocations in terms of vocabulary growth within synonym sets.
Classifying verbs into synsets was found to be more natural than adjectives and
nouns in the collocation databases. Combining with synonym analyses of the words
in WordNet, and a study of the synonym density of randomly selected verbs,
adjectives and nouns used by learners, it was discovered that synonym density of
the three types of words was on a decreasing scale. That semantic property may
account for L2 learners’ better performance in AN and NN collocations and worse
performance in VN collocations. In this regard, the prediction that vocabulary
growth is an inhibiting factor in collocation acquisition is again upheld.
As an important factor that cannot be ignored in L2 acquisition, the role of L1 in
collocation learning was also examined in this study. Contrary to Bahns’s (1993)
claim (cf. Chap. 3) that only collocations which are non-congruent with learners’
L1 collocations need to be taught to learners, we found that congruent collocations
were more prone to errors for Chinese learners of English. That was because cases
of one-to-one correspondence between the two languages are few and partial
congruence is common between the two languages, i.e. differentiation (one-to-many
correspondence) and coalescing (many-to-one correspondence). As for
non-congruent collocations, it was found once they were acquired, they were sel-
dom susceptible to errors. L1 was also found to play a different role depending on
the types of collocations, as we observed that there was more negative transfer for
verbs in verb + noun collocations and more positive transfer for adjectives in
adjective + noun collocations.
The main findings of our research are: (a) vocabulary growth was identified as a
factor responsible for the stagnant collocation performance in verb + noun collo-
cations; (b) learners performed differently in verb + noun, adjective + noun and
noun + noun collocations, with verb + noun collocations the most difficult to
acquire, and noun + noun collocations the easiest; (c) learners’ L1 played a dif-
ferent role depending on the types of collocations, i.e. more negative transfer in
verb + noun collocations and positive transfer in adjective + noun collocations.
These findings contribute to a more comprehensive understanding of collocation
learning by second language learners. Most importantly, our research has identified
the vocabulary growth factor as an inhibiting force in collocation acquisition, i.e.
the learning of new semantically related verbs in a synset leads to more collocation
errors. In this regard, it goes beyond previous L2 collocation studies by providing
10.1 Summary 155
an explanation for the stagnant development of collocation knowledge which has

been widely reported (cf. Chap. 3). Put simply, the more words they learn, the more
likely learners are to make collocation errors. Furthermore, the misuses of
semantically related words in collocations by L2 learners indicate what has been
acquired of the semantics of the erroneous verb and what has been not. In the case
of *implement + act, the core meaning of implement—“to carry out/do”—was
acquired but not its distinguishing semantic property—“to ensure that what has
been planned is done”. In this regard, our finding has contributed to illuminating an
aspect of second language acquisition—the linguistic target of learning. Through
the misuses of semantically related verbs in synsets in verb + noun collocations, we
found that learners failed to produce the correct collocation when they only
hadacquired the core meanings of a verb but not the distinctive semantic properties
of that verb. Therefore, collocation acquisition requires complete acquisition of the
semantics of a word.
Additionally, different from previous studies uncovering learners’ difficulties
with non-congruent collocations either in the production process (cf. Nesselhauf
2005), or in the collocation recognition process (cf. Yamashita and Jiang 2010;
Wolter and Gyllstad 2011), our finding in terms of L1 influence shows that con-
gruent collocations deserve more attention than non-congruent collocations for
Chinese L2 learners. It also finds that non-congruent collocations once acquired, are
less prone to errors, which, from the production perspective, verifies previous
findings obtained from psycholinguistic experiments. In light of these, this study
contributes to a more comprehensive understanding of the L1 influence on the
learning of L2 collocations. In the next section, theoretical and pedagogical
implications for L2 collocation learning will be discussed based on our findings.
10.2 Implications
10.2.1 Theoretical Implications
Although our data are based on L2 learners’ production, indicating that inferences
drawn from this study about psycholinguistic aspects in learners’ mental lexicon are
tentative, the findings in terms of the vocabulary growth factor and the role of L1 in
L2 learners’ collocation learning hold theoretical implications for the lexical
organisation in the mental lexicon of L2 learners of English and the cross-linguistic
influence in L2 collocation learning.
10.2.1.1 Implications for Lexical Organisation in the L2 Mental

Lexicon
The finding that learners misuse semantically similar words in collocation pro-
duction indicates that words are primarily semantically linked in the L2 lexicon.
Most words are stored in the mental lexicon via the establishment of semantic
associations and semantically similar words are stored nearby (e.g. Channell 1988;
Howarth 1998; Wolter 2001; Zareva and Wolter 2012). Language production
involves the selection of appropriate words according to the meaning to be con-
veyed. Psycholinguistic evidence has been gathered “in favor of a psycholinguistic
model in which words with like meanings are ‘close together’ in accessing terms”
(Channell 1988: 90; cf. Albert and Obler 1978). So in the production of word
combinations, a choice among the alternatives of a group of semantically related
words has to be made. The clustering of words with similar meanings thus produces
an interference effect in selecting the right words. Even native speakers encounter
semantic interference in producing the target collocations. This interference effect is
observed in the mis-collocations produced by native speakers (either intentionally
or unintentionally). Evidence concerning native speakers’ collocation misuse is
sparse in the literature. Howarth (1998a) is among the few who investigated both
the NSs’ and NNSs’ phraseological errors. He proposes two types of plausible
explanations to account for the occurrences of lexical mis-collocations produced by
NSs: collocational overlaps and blends. In *draw a contrast, the error can be seen as
the result of filling in a collocational gap within a partially overlapping cluster:
draw a distinction, make a distinction, make a contrast but not *draw a contrast. In
*
place weight, the error arises out of a blending of two pairs of collocations: place
emphasis and attach weight. Approaching these errors from the perspective of the
semantics of the erroneous verb and the target verb, we get the generalisation that
they are semantically related verbs, belonging to the synsets identified in the present
study. The verbs of draw and make are in the same set denoting verbs of creation.
Place and attach are in the same set of verbs of putting. For other L1 collocation
errors given by Howarth, such as *reach a justice, the verb reach and the target verb
achieve fall in the same set of verbs of obtaining (cf. Sect. 6.1). For another set of
verbs listed by Howarth, e.g. compile, draw up, make, produce and write, they are
semantically related as verbs of creation and it follows that collocation errors are
made by native speakers through wrongly selecting one of them (e.g. compile) to
collocate with a noun (e.g. memorandum).
Similar to the semantic interference for native speakers in selecting the right
word from a set of semantically similar words, L2 learners encounter the same
interference with the expansion of their vocabulary. However, there is a funda-
mental difference in the semantic interference effect between NSs and NNSs. Native
speakers may deviate from standard collocational forms either deliberately or
unintentionally and in fact there are only a very small number of erroneous col-
locations produced by NSs (Howarth 1998a). However, for L2 learners, the
semantic interference effect is stronger owing to an incomplete acquisition of the
semantics of the semantically related words. As the empirical evidence obtained in
the data shows, learners’ collocation production gets worse when new semantically
related words are learnt (e.g. *concede + mistake rather than admit + mistake). As
Chap. 6 reports, verbs in synsets increased dramatically with the rise of proficiency,
so did the increase in collocation errors. Misuses of verbs such as conduct, commit,
10.2 Implications 157
accomplish, enforce, implement and perform are believed to be caused by the partial
acquisition of the core meanings (i.e. “carry out” or “do”) but not their distinctive
meanings.
That semantic confusion increases with the rise of proficiency has been found in
several studies (e.g. Llach 2011; Ringbom 1987, 2001). In a developmental study of
the kinds of lexical errors that appeared in the written production of young Spanish
learners at two different stages, Llach (2011) observed a statistically significant
increase in semantic lexical errors at the higher proficiency level, which included
calques (literal translation of the word from the L1 to the L2) and semantic con-
fusion [the confusion of semantically related words, e.g. * my bedroom is great
(great for huge or big)]. These results support that “well-developed lexicons are
dominated by paradigmatic associative connections” (Zareva and Wolter 2012: 60).
Psycholinguistic research into the lexical organisation of L2 learners’ mental dic-
tionary indicates that “the same class (paradigmatic) connections become more
prominent as the proficiency of L2 learners of English increases to an advanced
level” (ibid: 59).2 Through word association tasks, Zareva and Wolter found that
with the advance of proficiency, NNSs’ lexicon becomes more paradigmatically
dominated like NSs’. Synonymy is an important paradigmatic response and L2
learners’ mental lexicon becomes organised more like a thesaurus in which words
with similar meanings are stored together (Meara 1978; Zareva and Wolter 2012).
Therefore, the more proficient learners become, the larger this thesaurus is, and the
more semantic interference they are confronted with in choosing the right word
from a set of semantically related words.
10.2.1.2 Implications for Cross-linguistic Influence in L2 Collocation

Learning
L2 learners are not only confronted with the semantic interference from vocabulary
increase along the paradigmatic relations, their L1 lexical network exercises a
considerable influence over the learning and production of collocations. Thus
another inference about cross-linguistic influence can be considered from this study.
Firstly, a higher error rate with congruent collocations than non-congruent ones,
even for advanced learners, suggests a consistent role of the L1 in producing
collocations even for proficient NNSs. As has been discussed in Chap. 3, the active
role of the L1 in collocation production is confirmed in psycholinguistic experi-
ments where a ‘dual-activation’ takes place: an L2 word stimulates not only its
collocates, but also its L1 translation equivalent and L1 collocate (Wolter and
Gyllstad 2011). Given that “even for advanced L2 learners, the L1 continues to be
active even when performing tasks entirely in the L2” (ibid: 443), it seems that apart
2
Paradigmatic relations between words refer to words of the same lexical class that can substitute
for another in a syntactic string (e.g., synonyms, antonyms, meronyms, hyponyms, etc.) (Zareva
and Wolter 2012: 44).
from the receptive process in primed lexical decision tasks, in the actual collocation
production process, L2 learners’ L1 still plays a predominant role through inter-
fering and mediating collocation production. The significant traces of the L1 in L2
collocations, and the large number of transfer errors well attest the active role of L1
on the production side.
The consistent role of L1 in collocation production may be closely linked with
the asymmetric cross-language connections. As the Revised Hierarchical Model
(Kroll and Stewart 1994; cf. Chap. 3) predicts, the link from L1 to conceptual
memory is assumed to be stronger than the link from L2 to conceptual memory, and
the lexical link from L2 to L1 is assumed to be stronger than the lexical link from
L1 to L2. Then it seems highly likely that in the production process, L1 is firstly
activated prior to the production of L2 words (see also the ‘dual-activation’ in
Wolter and Gyllystad (2011). Yet not all aspects of the L1 are easily activated in
producing an L2. As is shown in Jiang’s (2000) model, the lemma information
(containing semantic and syntactic information) of the L1 is copied into the L2
lexical entry. This is a stage called L1 lemma mediation stage where a majority of
L2 words fossilise at this stage (Jiang 2000). Thus based on Jiang’s model, L2
lexical information at the lemma level is in turn most likely to be influenced by the
L1. For producing L2 collocations, which are word combinations representing
syntactic and semantic relationships between lexical items, the L1 thus plays the
most significant role for L2 learners. For example, acquire knowledge as a word
combination involves both the semantics of acquire and knowledge and the syn-
tactic information of acquire as a transitive verb and knowledge as an uncountable
noun. With the storage of L1 semantics and syntax at the lemma level (e.g. for both
the L2 words acquire and knowledge), production of word combinations involving
semantics and syntax in the L2 (e.g. acquire knowledge) is easily mediated through
L1 semantics and syntax and thus L1 interference occurs in L2 collocation pro-
duction. Then L2 lexical combinations are the most susceptible to L1 influence
compared to other aspects of language acquisition (e.g. morphology and phonol-
ogy). Additionally, syntactic and phonological constructions are always finite
compared with L2 lexical combinations. So building syntagmatic connections
between words in an L2 is complicated by the infinite number of collocations, as
well as by the influence from L1 collocational knowledge (cf. Wolter 2006).
Reliance on the L1 lexical network underlies the large number of fortuitous
well-formed collocations that share direct translation equivalents between the L1
and L2, but at the same time the occurrence of erroneous congruent collocations. As
discussed in Sect. 9.2, types of mismatches like ‘differentiation’ and ‘coalescing’
make direct copying of L1 word combinations to L2 collocations error-prone. For
collocations that have no direct translation equivalent between languages, the
shared conceptual system may be the same, but features of word combinations
differ (cf. Kroll et al. 2010). Both the empirical data in the present study and the
experimental data obtained by Yamashita and Jiang (2010) and Wolter and Gyllstad
(2011) suggest that once these non-congruent collocations are acquired, they are
processed independently of the L1. It is speculated that with regard to the acqui-
sition process for non-congruent collocations, no direct access is gained in the
process of exploiting the existing L1 corresponding lexical network, so new con-

nections between words in the L2 will have to be made and thus a restructuring
process begins to accommodate the idiosyncrasies between languages (Wolter
2006: 745). This restructuring may take the form of a “noticing process” and may
accordingly contribute to the memory of these non-congruent collocations as
holistic units. Thus in the production process, these lexical combinations are
directly produced from the concept to L2 collocations, without mediation of
learners’ L1. Congruent collocations, however, become non-salient in the learning
process and are mediated through the translation of their L1. So in the production
process, these collocations are mediated through their L1 and thus either positive or
negative transfer takes place. As learners’ proficiency rises, the link between L2 and
the concept is stronger, resulting in the far fewer non-transfer errors in higher level
learners (see Fig. 10.1 below for a summary).
The above figure synthesizes empirical findings of the present study and also
conforms to findings from psychological experiments conducted by Yamashita and
Jiang (2010) and Wolter and Gyllstad (2011). According to the model,
non-congruent collocations that are acquired enjoy a separate and direct route to the
production of L2 without word-for-word mediation of the L1. The other colloca-
tions are likely to be mediated through a one-to-one translation equivalent in the L1
(though this is especially the case for lower levels, it is probably also the case for
proficient NNSs). This is the claim made by Wolter (2006: 743) that learners might
bypass the L2 acquisition process through relying on their L1 lexical network when
there is a marked overlap between the L1 and L2 lexical networks. For the
non-acquired collocations, learners tend to resort to word-for-word translation from
the L1 and as a result infelicitous collocations are produced. Where there happens to
be a one-to-one correspondence in the L2 (e.g. answer + question), the congruent
collocation translated from the L1 to L2 is a well-formed one; where there are
mismatches [e.g. one-to-many correspondence (“differentiation”) and many-to-one
correspondence (“coalescing”)], collocation errors are likely to occur.
In conclusion, one general inference drawn from this study is that two factors
pose particular problems for L2 learners in learning L2 collocations: semantically
related words in the L2 and learners’L1 lexical network. Learners encounter much
L1 L2
congruent and non-congruent acquired non-congruent

collocations collocations
concepts
Fig. 10.1 Processes for the production of congruent and non-congruent collocations by L2
learners
greater difficulties in producing collocations than native speakers, since on the one
hand, their L1 lexical/conceptual knowledge has a consistent influence on how
learners structure connections between words in an L2 (Wolter 2006); on the other,
expanding paradigmatic relations of words (e.g. synonymy relations) in their
vocabulary in the course of L2 acquisition means that more and more words are
stored in the L2 mental lexicon, thus exerting interfering forces in the word
selection process. This is one of the major implications drawn from this study. Our
results show that learners are confronted with a dilemma: on the one hand, their
production of collocations is characterised with a limited number of collocation
types, indicating an inadequate mastery of vocabulary; on the other, the increase in
vocabulary in turn inhibits the learning of collocations. In other words, the growth
of vocabulary in the paradigmatic relations [i.e. sets in the terminology of Carter
and McCarthy (1988: 210)] enables learners to have more varied choices but at the
same time produces an interfering effect in learning collocations. Words within sets
are in relationships of synonymy, antonymy, hyponymy, etc. Sets are believed to be
“powerful organising principles, and have a strong psychological reality for lan-
guage users and learners” (ibid: 211). In this regard, the synonym sets identified in
the learner data are not only the organising principle for semantically related words,
but also the interfering factor in selecting the appropriate word to collocate with
another word. Therefore, it is important for learners to acquire not only the shared
semantic element of a word in a group of semantically related words, but also to
acquire the distinguishing semantic contents of that word in order to differentiate it
from its synonyms. The next section will be devoted to a discussion of pedagogical
implications for collocation learning mainly in terms of a full mastery of word
semantics.
10.2.2 Pedagogical Implications
10.2.2.1 Acquisition of Verb Semantics
How collocations are learnt and what factors interfere with the learning of collo-
cations can shed much light on how collocations are best taught and learnt. Findings
in the present research hold a number of important pedagogical implications.
First, verb + noun collocations deserve special attention compared with AN and
NN collocations since they are more error-prone. Second, within verb + noun
collocations, it is the verb that poses more problems than the noun for L2 learners
(Granger 2014; Nesselhauf 2005). So verbs deserve more attention in vocabulary
learning. As is discussed in the previous section, the misuse of verbs with semantic
relatedness (e.g. implement and perform, admit and concede) in collocations means
that it is important to fully acquire the semantics of verbs: both their basic properties
(“semantic markers” in the parlance of Katz and Fodor 1963) and the features that
distinguish them from its semantically related words (“distinguishers”, Katz and
Fodor 1963).
An efficient learning of verb semantics has to take into account how the
semantics of words is approached. A traditional view of word semantics is to break
down word meaning into a number of abstract components or semantic features and
identify “those features that will distinguish the meaning of any one word from
every other that might … compete for a place in the same semantic territory”
(Cowie 2009: 57). This approach to the meaning of a word is known as
Componential Analysis. CA has long been used to describe and distinguish words
with semantic relatedness. For example, Rudzka et al. (1981, 1985) presented
words in sets whose members have similar meanings and distinguished them
through componential grids (and collocational grids as well). With the example of
admit and concede, which occur in our database as *concede + mistake (admit), the
componential grid given by Rudzka et al. (1985: 171) is as follows:
The semantic marker of both admit and concede is “accept as true or valid”, but
the distinguishers of admit are “to confess or to acknowledge or allow to enter” and
of concede are “to give up or give away to opponent” (ibid.). This way of con-
trasting the semantic features of semantically related words works well for linguistic
analysis but is not suitable as a language-teaching tool, as the features are always
abstract (Carter and McCarthy 1988). As one semantic component of admit—“to
confess”, the word confess might be more complex to understand than admit.
Meanwhile, the decontextualised presentation of meaning components in words or
phrases (e.g. “accept as true or valid” for both admit and concede) makes it hard for
L2 learners to comprehend.
So the acquisition of verb semantics is better aided through a contextualised
display of its meaning. The learning of word semantics in contexts has been widely
advocated (Cobb 2003; Hanks 1996; Hoey 2000; Laufer 2006). One macro context
for learning the semantics of a word is its co-text, as the full sentence definition of
the headword adopted by the Collins COBUILD English Dictionary (1995). The
definition provides “much of the context necessary for the meaning of the word in
use in the language, dependent on its environment, to be properly appreciated”
(Barnbrook 2007: 190). For example, the Cobuild dictionary defines implement and
perform in the following way:
Implement: If you implement something such as a plan, you ensure that what has
been planned is done.
Perform: When you perform a task or action, especially a complicated one, you do
it.
Seeing through the meanings of the two verbs, implement implies more than
“carrying out/do something”; it also incorporates the meaning of “carrying out what
has been planned”. When learners are presented with the definition of implement,
the possibility of them making errors like *implement + act found in this study may
be reduced.
Another way beneficial for the learning of verb semantics, especially for learning
the semantics of semantically related words is through their collocates (cf. Carter
and McCarthy 1988; Lee and Liu 2009; Xiao and McEnery 2006). Collocations
contribute to the understanding of the concept of a word through defining its
semantic area (Brown 1974; Nattinger 1988). The learning of lexical semantics and
learning of collocations are mutually beneficial and inseparable. Knowing a word
involves knowing which words it usually collocates with, and to know the collo-
cational behaviour of a word is one type of word knowledge necessary for a
complete acquisition of that word (cf. Nation 1990: 31). Learning word meanings
through collocates contributes to the comprehension of the semantics of that word
and an acquisition of the semantics in turn helps define its co-occurring words. The
inseparability of the learning of semantics and collocational behaviour is best
manifested in Lewis’s (1997: 97) view that “the real definition of a word is a
combination of its referential meaning and its collocational field”. For semantic
sets, display of overlapping collocates and of collocates exclusive to a particular
word is helpful for learners to both learn the common meaning of a group of words,
and the distinguishing features of each word. Such an approach to learning
semantically related words has been advocated by Rudzka et al. (1981, 1985). The
following is an example of their presentation of collocational grids for synonymous
words.
Collocational grids like this may not only help learners get the common
meanings of verbs in a semantic field, but also the different nuances of meanings. It
is beneficial for learners to identify the distinguishing meanings through the indi-
vidually tailored collocates. This way of learning may be much better than a
decontextualised word learning, i.e. memorising the meanings of words in word
lists or through translation equivalents in the L1. Learning and teaching in word
lists would unavoidably lead learners to believe that the collocates of synonymous
words in a list share many collocates (Hoey 2000), which further leads to the
semantic confusion in producing collocations (e.g. the misleading belief that im-
plement and perform share the collocate act). Likewise, learning words through
translation equivalents leads to the same problem of assuming similar collocates of
semantically related words. As Meara (1982) points out, learning vocabulary does
not just involve pairing L2 words and L1 meanings as the end state of learning that
word. With the words perform and implement as an example, they are translated
into the same word in Chinese according to the Oxford Advanced Learners’
English-Chinese Dictionary, but the distinguishing features are lost in the Chinese
translation equivalent. So with the same translation equivalent, the collocational
behaviour of semantically related words is highly likely to be believed as the same
by L2 learners. Psycholinguistic studies have found that L2 learners have same
translation pairs stored nearby in the mental lexicon and have difficulties distin-
guishing their meanings (e.g. hat-cap, problem-question) (Jiang 2002). Therefore, it
is far from enough to acquire the semantics of an L2 word on the basis of its
translation equivalent. Instead, both a full sentence definition of the verb and words
in syntagmatic relations with the verb can be presented to L2 learners for a com-
plete acquisition of its semantics.
10.2.2.2 Consciousness-Raising
The finding of a stagnant development in collocational knowledge as learners

advance to a higher level clearly indicates that collocations deserve more attention
from both L2 learners and foreign language teachers. But what is often the case is
that L2 learners do not pay attention to collocational relationships between words,
as collocations are largely semantically transparent (Bahns and Eldaw 1993; Laufer
and Waldman 2011; Martelli 2006; Nesselhauf 2003, 2005; Wray 2002). The
transparent nature of collocations means that they usually pose no problems for
comprehension, but in the production process, semantic interference from a set of
semantically similar words leaves learners in a state of not knowing which one to
choose to form an appropriate word combination. A lack of awareness of collo-
cations is not only inferred from a large number of collocation errors identified in
this study, but also confirmed through Nesselhauf’s (2005) finding of no
improvement in collocation competence for learners with more exposure to English.
That neither the length of a learner’s exposure to English, nor the use of a dictionary
seemed to have a significant effect on the overall number of collocations and the
number of collocation errors suggests an unawareness of the collocation phe-
nomenon on the part of L2 learners (Nesselhauf 2005).
An awareness of collocations, on the contrary, facilitates collocation learning.
This can be inferred from this study that learners perform better on non-congruent
collocations than congruent ones. Non-congruent collocations in the L2 do not have
word-for-word translation equivalents in the L1, thus more conscious learning may
be involved. In other words, non-congruent collocations may be salient to L2

learners and thus trigger deeper processing. Congruent collocations, on the con-
trary, are accessed directly in the L1 and require less processing. According to the
framework of human memory (Craik and Lockhart 1972), greater “depth of pro-
cessing” (i.e. conception of a series of processing stages in the perceptual analysis
of stimuli) implies more semantic or cognitive analysis. Accordingly, deeper levels
of analysis lead to longer lasting and stronger traces. Therefore, a deeper processing
means more chances of a non-congruent collocation to be stored permanently.
Another factor leading to a lack of awareness of collocational relationships may
be due to the traditional teaching of vocabulary as single lexical items in English
instruction practices (Farghal and Obeidat 1995; Siyanova and Schmitt 2008). One
consequence of focusing on single words is the combining of individual words
completely on the basis of syntactic rules in text production, operating on a prin-
ciple called the “open choice principle” by Sinclair (1987). It is necessary for
learners’ attention to be diverted from single lexical items to habitual word com-
binations. Only after collocations are paid attention to, can they have the chance to
become acquired, since “noticing is the necessary and sufficient condition for
converting input to intake” (Schmidt 1990: 129). However, as discussed earlier, the
transparent nature of collocations means they often pass unnoticed by learners
themselves in language input, thus collocations should be made explicit to learners
in teaching materials and in classroom activities.
In the process of raising learners’ awareness of collocations, one aspect arising
from our results should not be ignored—the influence from learners’ mother ton-
gue. Considering the persistent role L1 plays even at advanced levels, it is useful for
learners to be aware of L1-L2 differences in learning collocations. As James (1996:
147) proposes, translation is a particularly good way to raise L2 learners’
cross-linguistic awareness. The teaching/learning of L2 collocation with reference
to L1 collocational patterns has been previously suggested (e.g. Bahns 1993; Chi
Man-lai et al. 1994; Granger 1998; Martelli 2006; Nesselhauf 2003, 2005; Xiao and
McEnery 2006). Different from the view that collocations with no translation
equivalents in the mother tongue shall be paid particular attention to (cf. Bahns
1993; Nesselhauf 2003), this study showed that congruent collocations posed more
difficulties in production, since one-to-one correspondences between languages are
few. Therefore, special attention needs to be paid to congruent collocations in
which differentiation (one-to-many correspondence from the L1 to the L2) and
coalescing (many-to-one correspondence from the native language to the target
language) exist (cf. Chap. 9). The focus on congruent collocations requires both
materials writers and teachers to be familiar with the cross-linguistic differences
between collocations before diverting learners’ attention to particular collocations.
With acquire knowledge as an example, teachers can place special emphasis on the
verb acquire as a collocating verb for knowledge, rather than other verbs such as
learn, study, master, although these verbs are legitimate verb collocates for the
Chinese word combination with knowledge. In the meantime, learners can be
encouraged to apply a “back translation” strategy in learning collocations. When
they encounter acquire knowledge, they can translate the whole collocation into the
mother tongue (as they will always unconsciously do in learning an L2), but this is
not the end state. It would be beneficial for learners to translate the previously learnt
word combinations from their mother tongue back to English, without looking at
the English translation, in order to be more aware of the cross-linguistic differences
and ultimately to be conscious of the appropriate L2 collocation. This contrastive
analysis of collocations can be facilitating from a purely psycholinguistic per-
spective: collocation learning requires not only noticing, but also more “cognitive
depth” (Craik and Lockhart 1972). Therefore, collocations may ultimately enter the
long-term memory since more retention is gained in the learning process. In
instructional practices, contrastive analysis of collocations in terms of L1-L2
similarities/differences has been proved by Laufer and Girsai (2008) to be more
effective than teaching methods ignoring these cross-linguistic similarities and
differences between two languages.
10.3 Limitations and Ways Forward
One limitation of this study lies in the learner corpus adopted for data collection and
analysis. Results obtained in the study are based on a corpus of the English writings
by learners of one mother tongue—the Chinese. So all the generalisations made in
this research are on the basis of data restricted to one learner type. Researching
collocation performance by learners speaking other L1s would have been more
rewarding, as comparisons can be made between collocation performances by
learners of different mother tongues. With data obtained from more learner types,
both collocations use typical to one individual learner type and collocation patterns
common to learners of various L1s can be found.
A further limitation regarding the learner corpus adopted is concerned with the
properties of the learner data. The Chinese Learner English Corpus is a collection of
writings by learners at different learning stages. So it is cross-sectional rather than
longitudinal. A longitudinal learner corpus would help us to arrive at more definite
conclusions regarding L2 learners’ collocational development and the factor of
vocabulary growth in collocation learning. Yet due to the unavailability of a learner
corpus at the time of beginning this research, a corpus recording the writings by
learners of different proficiency levels was used.
Nonetheless, despite adopting a quasi-longitudinal corpus, there is a clear dif-
ferentiation in proficiency levels. The ST2, ST5 and ST6 learner groups (viz.
middle school students, English majors of lower grades and English majors of
higher grades), which are assumed to be in a continuous development based on the
years of English instruction they received, were found to be in a continuous
developmental stage. Several indicators show a continuous rise in proficiency, e.g.
the continuous increase in lexical verb + noun collocations, the increase in the
number of the overall verbs, adjectives and nouns produced by each level of
learners, etc.
Therefore, the investigation into L2 learners’ collocational development and the

process of how collocations are acquired is far from complete. In future research,
more longitudinal studies are needed to identify patterns of phraseological devel-
opment in L2 learners, and to compare them with those based on quasi-longitudinal
data (Paquot and Granger 2012: 143). In addition, with regard to the pedagogical
implications raised in the previous section, experiments can be conducted to test if
acquisition of verb semantics as proposed in this study contributes to the learning of
collocations in an efficient way in classrooms.
Another area of future enquiry lies in the acquisition of congruent and
non-congruent collocations. One of our findings is that learners at lower levels
produced significantly more non-congruent collocations than congruent ones. This
finding indicates that at early stages of language learning, collocations are learnt as
holistic units. However, this claim needs to be further verified through looking at
the collocation performance of learners at earlier stages than the ST2 level targeted
in this study, in order to get a more comprehensive picture. The recording of
learners’ written performance at beginners’ and basic levels is scarce and needs to
be included, in order to get a fuller picture of L2 learners’ production of
collocations.
In addition, our finding that non-congruent collocations receive more perceptual
salience than congruent collocations in learning and are less susceptible to errors
than congruent collocations needs to be further examined in classroom experiments.
In this sense, results obtained from spontaneous data in SLA research may be
complemented by experimental and intuitional data to capture aspects of compe-
tence as well as performance, and to validate results on learners’ use of word
combinations (cf. Cross and Papp 2008: 77).
Despite all these limitations discussed above, the findings revealed through a
cross-sectional study of the collocations produced by Chinese learners of English
contribute to a clearer understanding of the process of second language acquisition,
and to a more successful collocation teaching and learning through providing
pedagogical implications.
References
Albert, M., & Obler, L. K. (1978). The bilingual brain. New York: Academic Press.
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1), 101–
114.
Barnbrook, G. (2007). Sinclair on collocation. International Journal of Corpus Linguistics, 12(2),
183–199.
Brown, D. (1974). Advanced vocabulary teaching: The problem of collocation. RELC Journal, 5
(2), 1–11.
Carter, R., & McCarthy, M. (1988). Vocabulary and language teaching. London: Longman.
Channell, J. (1988). Psycholinguistic considerations in the study of L2 vocabulary acquisition.
In R. Carter & M. Mccarthy (Eds.), Vocabulary and language teaching (pp. 83–94). London:
Longman.
References 167
Chi, M. -L. A., Wong, P. -Y. K., & Wong, C. -P. M. (1994). Collocational problems amongst ESL
learners: A corpus-based study. In L. Flowerdew & A. K. Tong (Eds.), Entering text (pp. 157–
165). Hong Kong: University of Science and Technology.
Cowie, A. P. (2009). Semantics. Oxford: Oxford University Press.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research.
Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.
Cross, J., & Papp, S. (2008). Creativity in the use of verb + noun combinations by Chinese
learners of English. In G. Gilquin, S. Papp, & M. B. Diez-Bedmar (Eds.), Linking up
contrastive and learner corpus research (pp. 57–81). Amsterdam: Rodopi.
production of native and non-native speakers. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Harlow: Longman.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: collocations and formulae.
Granger, S. (2014). John Sinclair’s idiom principle: An inspiration for learner corpus research.
Talk given at 2014 Annual John Sinclair Lecture, Birmingham, 8 May 2014.
Hanks, P. (1996). Contextual dependency and lexical sets. International Journal of Corpus
Henriksen, B. (2013). Research on L2 learners’ collocational competence and development—A
progress report. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition,
knowledge and use: New perspectives on assessment and corpus analysis (pp. 29–56). Eurosla.
(2014-03-10). http://www.eurosla.org/monographs/EM02/EM02tot.pdf.
Hoey, M. A. (2000). World beyond collocation: New perspectives on vocabulary teaching. In M.
Lewis (Ed.), Teaching collocation: Further developments in the lexical approach (pp. 224–
245). Hove: Language Teaching Publications.
Hornby, A. S. (2009). Oxford advanced learner’s English-Chinese dictionary. (7th ed.) Beijing:
The Commercial Press.
Howarth, P. (1998). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.),
Press.
James, C. (1996). A cross-linguistic approach to language awareness. Language Awareness, 5(3–
4), 138–148.
Jiang, N. (2000). Lexical representation and development in a second language. Applied
Jiang, N. (2002). Form-meaning mapping in vocabulary acquisition in a second language. Studies
in Second Language Acquisition, 24, 617–637.
Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39(2), 170–210.
Kjellmer, G. A. (1991). Mint of phrases. K. Aijmer & B. Altenberg (Eds.), English corpus
linguistics. Studies in Honour of Jan Svartvik (pp. 111–127). London: Longman.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming:
evidence for asymmetric connections between bilingual memory representations. Journal of
Memory and Language, 33, 149–174.
Kroll, J. F., Van Hell, J. G., & Tokowicz, N., et al. (2010). The revised hierarchical model: A
critical review and assessment. Bilingualism: Language and Cognition, 13(3), 373–381.
Laufer, B. (2006). Comparing focus on form and focus on forms in second language vocabulary
learning. The Canadian Modern Language Review, 63(1), 149–166.
Laufer, B., & Girsai, N. (2008). Form-focused instruction in second language vocabulary learning:
A case for contrastive analysis and translation. Applied Linguistics, 29(4), 694–716.
Laufer, B., & Waldman, T. (2011). Verb-Noun collocations in second language writing: A corpus
Lee, C. Y., & Liu, J. S. (2009). Effects of collocation information on learning lexical semantics for
near synonym distinction. Computational Linguistics and Chinese Language Processing, 14
(2), 205–220.
Lewis, M. (1997). Implementing the lexical approach. Hove: Language Teaching Publications.
Llach, M. P. A. (2011). Lexical errors and accuracy in foreign language writing. Bristol:
Multilingual Matters.
Martelli, A. A. (2006). Corpus based description of English lexical collocations used by Italian
Meara, P. (1978). Learners’ word associations in French. Interlanguage Studies Bulletin, 3(2),
192–211.
Meara, P. (1982). Word associations in a foreign language. Nottingham Linguistic Circular, 11(2),
29–38.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston, Mass: Heinle & Heinle.
Nattinger, J. (1988). Some current trends in vocabulary teaching. In R. Carter & M. Mccarthy
(Eds.), Vocabulary and language teaching (pp. 62–80). London: Longman.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of
Applied Linguistics, 32, 130–149.
nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication
Ringbom, H. (1987). The role of the first language in foreign language learning. Clevedon:
Multilingual Matters.
Ringbom, H. (2001). Lexical transfer in L3 production. In J. Cenoz, B. Hufeisen, & U. Jessner
(Eds.), Cross-linguistic influence in third language acquisition: Psycholinguistic perspectives
(pp. 59–68). Clevedon: Multilingual Matters.
Rudzka, B., Channell, J., Putseys, Y., et al. (1981). The words you need. London: Macmillan.
Rudzka, B., Channell, J., Putseys, Y., et al. (1985). More words you need. London: Macmillan.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–150.
Benjamins.
Sinclair, J. (1987). Collocation: A progress report. In R. Steele & T. Threadgold (Eds.), Language
topics: Essays in honour of Michael Halliday (Vol. 2, pp. 319–331). Amsterdam: Benjamins.
Sinclair, J. (1995). Collins COBUILD English dictionary. (2nd. ed.) London: HarperCollins.
Wolter, B. (2001). Comparing the L1 and L2 mental lexicon: A depth of individual word
knowledge model. Studies in Second Language Acquisition, 23(1), 41–69.
References 169
Xiao, R., & Mcenery, T. (2006). Collocation, semantic prosody, and near synonymy: A cross
linguistic perspective. Applied Linguistics, 27(1), 103–129.
Yamashita, J., & Jiang, N. (2010). L1 Influence on the acquisition of L2 collocations:
647–668.
Zareva, A., & Wolter, B. (2012). The ‘promise’ of three methods of word association analysis to
L2 lexical research. Second Language Research, 28(1), 41–67.
Appendix A
Erroneous VN Collocations Produced
by the Three Levels of Learners (Types)
Learners DeLexVN LexVN

ST2 23 41
ST5 17 27
ST6 22 71
Notes ST2 and ST5: p = 0.8404 ns
ST5 and ST6: p = 0.1038 ns
ST2 and ST6: p= 0.1080 ns
DOI 10.1007/978-981-10-5822-6
Appendix B
Well-Formed and Erroneous VN
Collocations in the 16 Synsets (ST2)
Synsets Well-formed coll. Erroneous coll.

Verbs Nouns Freq. Verbs Nouns Freq.
Verbs of Compose Poem 1 Make Poem 2
creation (compose)
Draw Picture 1 Create Poem 2
(compose)
Draw Conclusion 1 Create Song 2
(compose)
Hold Party 7 Make Environment 1
(create)
Hold Meeting 10 Give Meeting 1
(hold)
Hold Game 1 Raise Shout 2
(give)
Hold Contest 1
Hold Festival 1
Hold Ceremony 2
Launch War 1
Set Fire 2
Set Example 1
“Fulfil” verbs Discharge Duty 1
Fulfil Wish 1
Verbs of Achieve Victory 1 Make Result 1
obtaining (achieve)
Achieve Result 1 Earn Knowledge 1
(acquire)
Earn Money 1 Grasp Knowledge 3
(acquire)
Gain Knowledge 1
Gather Strength 1
Receive Letter 2
Receive Degree 3
(continued)
DOI 10.1007/978-981-10-5822-6
174 Appendix B: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST2)
(continued)
Verbs of Lay Foundation 1
putting
“Settle” verbs Settle Problem 2 Do Problem 4
(solve)
Solve Problem 2
“Learn” verbs Learn Knowledge 16
(acquire)
Get Lesson 1
(learn)
Study Knowledge 5
(acquire)
Know Knowledge 2
(acquire)
Verbs of Teach Lesson 1 Teach Knowledge 2
transfer of a (impart)
message Tell Lie 3 Take (tell) Joke 1
Tell Story 3 Say (tell) Joke 1
Tell Knowledge 3
(impart)
Tell (give) Advice 2
“Keep” verbs Hold Breath 1
Keep Record 7
Keep Pace 1
Keep Balance 1
Keep Secret 2
Keep Promise 1
“Follow” Obey Rule 2 Obey Fact 1
verbs (face)
Follow Advice 1 Observe Law 1
(obey)
“Play” verbs Play Part 1 Play Play 3
(perform)
“Change” Change Mind 6
verbs
“Break” verbs Break Rule 2
Break Record 3
“Live” verbs Lead Life 3
Live Life 4
“Wear” verbs Wear Clothes 17 Dress Clothing 1
(wear)
(continued)
Appendix B: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST2) 175
(continued)
“Drive” verbs Drive Motorcycle 1 Ride Bus 1
(drive)
Drive Bus 1
Drive Car 1
Ride Bike 5
“Pay” verbs Devote Attention 1
Pay Visit 2
Pay Respect 1
Pay Attention 23
Note Verbs in brackets are the target verbs for erroneous VN collocations
Appendix C

Verbs of Arouse Concern 2 Arise (arouse) Discussion 1
creation Chart Course 1 Arouse (cause) Trouble 1
Draft Law 1 Build (enact) Regulation 1
Draw Conclusion 11 Build (establish) Tie 1
Establish Relationship 1 Draw (draft) Law 2
Form Habit 1 Make (draw) Conclusion 1
Hold Meeting 2 Draw (formulate) Theory 2
Hold Conference 3 Draw (draft) Treaty 1
Launch War 1 Put forth (enact) Law 2
Raise Consciousness 1 Put forward Law 1
(enact)
Raise Objection 1 Set (enact) Law 1
Raise Question 3 Publish (enact) Law 2
Raise Issue 4 Take (launch) Career 2
Raise Argument 1 Stir (raise) Consciousness 1
Raise Alarm 1 Raise (arouse) Discussion 1
Set Goal 1
Set Example 4
Set Fire 1
Enact Law 1
(continued)
DOI 10.1007/978-981-10-5822-6
178 Appendix C: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST6)
(continued)
“Fulfil” Apply Principle 1 Accomplish Crime 2
verbs (commit)
Enforce Policy 1 Carry out Value 1
(realise)
Enforce Law 2 Ensure (enforce) Law 1
Exercise Power 2 Carry on Law 1
(enforce)
Exercise Judgment 1 Exert Ability 1
(demonstrate)
Exercise Right 1 Exert Competence 1
(demonstrate)
Exert Influence 3 Fulfil Ability 1
(demonstrate)
Fulfil Role 1 Implement Act 1
(perform)
Fulfil Ambition 1 Attend (perform) Military 1
service
Fulfil Wish 1 Take (perform) Military 1
service
Implement Principle 1 Carry (perform) Function 1
Implement Law 1 Make (conduct) Exam 1
Implement Policy 1 Take (conduct) Survey 2
Perform Military 2 Conduct Murder 2
service (commit)
Perform Act 1 Conduct Crime 3
(commit)
Perform Function 1 Make (commit) Crime 1
Realise Value 2 Do (commit) Crime 4
Realise Dream 6
Realise Goal 1
Conduct Survey 2
Commit Crime 120
Commit Homicide 2
Commit Suicide 17
Commit Offence 1
Commit Murder 2
Commit Act 1
(continued)
Appendix C: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST6) 179
(continued)
Verbs of Achieve Aim 1 Receive (achieve) Success 1
obtaining Achieve Dream 1 Cause (catch) Attention 1
Achieve Goal 11 Reach (catch) Attention 1
Achieve Purpose 2 Meet (earn) Praise 1
Achieve Success 3 Get (reach) Conclusion 1
Catch Attention 1 Approach (reach) Conclusion 1
Earn Money 27 Reach (receive) Recognition 1
Earn Living 7 Receive Operation 1
(undergo)
Gain Knowledge 4
Gain Victory 1
Grasp Opportunity 2
Reach Agreement 1
Reach Goal 3
Reach Target 2
Reach Conclusion 1
Receive Award 1
Receive Training 3
Receive Education 24
Receive Treatment 3
Receive Attention 1
Receive Reward 2
Receive Punishment 5
Receive Warning 1
Seize Opportunity 2
Verbs of Attach Importance 8 Give (impose) Burden 1
putting Fix Eye 2 Do (impose) Punishment 1
Impose Fine 1 Impose (pose) Threat 1
Impose Burden 1 Lay (impose) Burden 1
Impose Punishment 2 Lay (cast) Eye 1
Lay Emphasis 2 Lay (assign) Role 1
Lay Foundation 1 Give (put) End 1
Place Emphasis 1 Put (pay) Attention 3
Put Value 5
Put Emphasis 3
Put Hope 1
Put End 19
Put Blame 2
Put Priority 1
(continued)
180 Appendix C: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST6)
(continued)
“Settle” Solve Problem 35 Charge (tackle) Problem 1
verbs Solve Dispute 2
Resolve Problem 3
Tackle Problem 1
Undertake Duty 1
Undertake Task 2
“Learn” Acquire Knowledge 4 Learn (acquire) Knowledge 11
verbs Have (learn) Lesson 1
Get (learn) Lesson 1
Master (acquire) Knowledge 2
Study (acquire) Knowledge 3
Verbs of Teach Lesson 1 Teach (impart) Knowledge 2
transfer of Tell Truth 1 Instruct Idea 1
a message (communicate)
Tell Story 5 Push (impart) Knowledge 1
Impart Knowledge 1
“Keep” Hold Opinion 3 Reflect (hold) Prejudice 1
verbs Hold Position 3 Cast (hold) Prejudice 1
Hold Belief 2
Hold Post 1
Hold View 5
Hold Attitude 3
Keep Watch 1
Keep Distance 2
Keep Eye 3
Keep Balance 6
Keep Promise 2
Keep Pace 1
Maintain Order 3
Maintain Balance 1
Maintain Dignity 1
(continued)
Appendix C: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST6) 181
(continued)
“Follow” Obey Law 7 Obey (adhere to) Principle 1
verbs Obey Rule 2
Follow Principle 1
Follow Rule 1
Adopt Attitude 4
Adopt Method 2
Adopt Policy 3
Adopt Law 1
“Play” Play Role 54 Serve (play) Role 1
verbs Play Part 3 Act (play) Role 1
Lead (play) Role 1
“Change” Shift Focus 2 Change Criminal 1
verbs (rehabilitate)
Change Mind 1
“Break” Break Law 26 Break (violate) Regulation 1
verbs Break Rule 2
Break Promise 1
Violate Regulation 1
Violate Law 6
“Live” Lead Life 37
verbs Live Life 30
“Wear” Wear Clothes 1 Dress (wear) Clothing 1
verbs
“Drive” Drive Car 2
verbs
“Pay” Pay Attention 53 Pay (give) Praise 1
verbs Pay Heed 25
pay respect 3
Appendix D
Frequencies of Well-Formed
and Erroneous VN Collocation Types
in the 16 Synsets (ST2 and ST6)
Types Synsets ST2 ST6

WFC EC WFC EC
1. Verbs of creation 12 6 19 15
2. “Fulfill” verbs 2 0 26 17
3. Verbs of obtaining 7 3 24 8
4. Verbs of putting 1 0 14 8
5. “Settle” verbs 2 1 6 1
6. “Learn” verbs 0 4 1 5
7. Verbs of transfer of a message 3 5 4 3
8. “Keep” verbs 6 0 15 2
9. “Follow” verbs 2 2 8 1
10. “Play” verbs 1 1 2 3
11. “Change” verbs 1 0 2 1
12. “Break” verbs 2 0 5 1
13. “Live” verbs 2 0 2 0
14. “Wear” verbs 1 1 1 1
15. “Drive” verbs 4 1 1 0
16. “Pay” verbs 4 0 3 1
Notes WFC stands for well-formed VN collocations; EC for erroneous VN collocations
DOI 10.1007/978-981-10-5822-6
Appendix E

Verbs of Arouse Admiration 1 Raise Discussion 1
creation (arouse)
Build Building 4 Hold Race 3
(stage)
Conduct Experiment 2 Hold Match 2
(stage)
Draw Conclusion 6 Do (enact) Law 2
Draw Picture 1 Do (enact) Regulation 2
Establish Relationship 2 Play Dance 1
(perform)
Form Habit 7
Hold Party 6
Hold Meeting 2
Hold Debate 1
Launch Campaign 3
Perform Play 1
Produce Effect 2
Publish Book 7
Raise Question 3
Set Fire 1
“Fulfil” Apply Principle 2 Fulfil Plan 1
verbs (implement)
Enforce Law 1 Practice Policy 1
(implement)
Perform Operation 1
(continued)
DOI 10.1007/978-981-10-5822-6
186 Appendix E: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST5)
(continued)
Verbs of Achieve Purpose 3 Catch Chance 3
obtaining (seize)
Achieve Aim 3 Catch Opportunity 1
(seize)
Achieve Success 2 Grasp Skill 1
(acquire)
Earn Money 9
Earn Salary 2
Earn Living 2
Gain Knowledge 6
Gain Independence 2
Reach Agreement 2
Reach Goal 3
Receive Letter 42
Seize Opportunity 1
Verbs of Attach Importance 4 Put (turn) Ear 1
putting Lay Stress 1
Place Emphasis 2
Put Emphasis 2
Put Stress 1
Put End 2
Set Foot 1
“Settle” Resolve Problem 1 Do (solve) Problem 5
verbs Solve Problem 45
“Learn” Learn Lesson 3 Learn Knowledge 25
verbs (acquire)
Master Skill 3 Study Knowledge 6
(acquire)
Verbs of Teach Lesson 2 Teach Knowledge 7
transfer of (impart)
a message Tell Story 14 Have (tell) Joke 1
Tell Lie 4
Tell Truth 2
Tell Joke 3
“Keep” Hold Opinion 2
verbs Keep Touch 1
(continued)
Appendix E: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST5) 187
(continued)
“Follow” Adopt Method 1 Obey Method 2
verbs (adopt)
Adopt Policy 1
Follow Instruction 2
Obey Law 1
Obey Rule 2
“Play” Play Role 23 Act (play) Role 2
verbs Play Part 4 Occupy Role 1
(play)
Do (play) Role 1
Lay (play) Role 1
“Change” Change Mind 2
verbs
“Break” Break Law 1
verbs Break Rule 1
Break Record 1
Violate Rule 2
“Live” Live Life 17 Make (live) Life 1
verbs Lead Life 13
“Wear” Wear Clothes 11
verbs
“Drive” Ride Bike 16
verbs
“Pay” Pay Attention 25
verbs Pay Respect 1
Appendix F
Frequencies of Well-Formed
and Erroneous VN Collocation Types
in the 16 Synsets (ST2, ST5 and ST6)
Types Synsets ST2 ST5 ST6

WFC EC WFC EC WFC EC
1. Verbs of creation 12 6 16 6 19 15
2. “Fulfil” verbs 2 0 3 2 26 17
3. Verbs of obtaining 7 3 12 3 24 8
4. Verbs of putting 1 0 7 1 14 8
5. “Settle” verbs 2 1 2 1 6 1
6. “Learn” verbs 0 4 2 2 1 5
7. Verbs of transfer of a message 3 5 5 2 4 3
8. “Keep” verbs 6 0 2 0 15 2
9. “Follow” verbs 2 2 5 1 8 1
10. “Play” verbs 1 1 2 4 2 3
11. “Change” verbs 1 0 1 0 2 1
12. “Break” verbs 2 0 4 0 5 1
13. “Live” verbs 2 0 2 1 2 0
14. “Wear” verbs 1 1 1 0 1 1
15. “Drive” verbs 4 1 1 0 1 0
16. “Pay” verbs 4 0 2 0 3 1
DOI 10.1007/978-981-10-5822-6
Appendix G
Adjective Categories in the ST2 and ST6
AN Collocation Databases
ST6 ST2
Qualitative (69): Active, adverse, bad, breaking, (34): Active, bad, bright, cheap,
adjectives bright, broad, clean, clear, close, classical, close, common, correct,
common, controversial, convincing, crisp, dark, deep, fair, fast, firm, foul,
dark, deadly, deaf, deep, dense, fresh, full, glib, good, great, happy,
distant, effective, fair, fatal, fertile, hard, heavy, high, long, loose, loud,
fierce, fresh, full, good, great, guilty, low, open, popular, rapid, soft, strong,
hard, heated, heavy, high, hot, warm
infectious, irresistible, keen, key,
leading, lethal, light, long, mass,
narrow, near, nice, polluted, practical,
primary, primitive, privileged,
professional, promising, rapid,
remote, rural, scientific, sharp, small,
solid, sore, strong, torrential,
unexpected, urban, urgent, vicious,
warm, weak, wide
Classifying (70): Academic, annual, arable, (25): Boiled, British, botanical,
adjectives armed, associate, atomic, biochemical, capitalist, civil, closing, criminal,
bodily, boiling, broken, capitalist, daily, developed, developing, double,
chemical, compulsory, consequential, everyday, extracurricular, final, foster,
corporal, criminal, cultural, curable, founding, historic, living, Lunar,
daily, developed, developing, military, natural, physical, political,
domestic, economic, electric, public
endangered, environmental, ethical,
everyday, feminist, financial, final,
five-star, flared, foreign, human,
illegal, incurable, individual,
industrial, initial, international,
juvenile, latest, liberal, literal, living,
medical, middle, military, monetary,
moral, naked, native, national, natural,
nuclear, personal, physical, plastic,
political, presidential, promissory,
public, racial, sexual, social, solar,
spoiled, territorial, top
(continued)
DOI 10.1007/978-981-10-5822-6
192 Appendix G: Adjective Categories in the ST2 and ST6 AN Collocation Databases
(continued)
ST6 ST2
Emphasising (1): Absolute (1): Blue
adjectives
Colour (2): Black, blue (0)
adjectives
Appendix H
Well-Formed and Erroneous Congruent
and Non-congruent Collocations
in the ST6 (Types)

Well-formed coll. 221 (75) 127 (87) 348
Erroneous coll. 74 (25) 19 (13) 93
Total 295 (100) 146 (100) 441
Note v2 = 7.84, p = 0.0051 **
very Sig.
DOI 10.1007/978-981-10-5822-6
Appendix I
Well-Formed Congruent
and Non-congruent VN Collocations
in the ST2 and ST6 (Types)
Types ST2 (%) ST6 (%) Total

Congruent coll. 117 (53) 221 (64) 336
Non-congruent coll. 104 (47) 127 (36) 231
Total 221 (100) 348 (100) 569
Note v2 = 6.42, 0.01 < p < 0.05 *
Sig.
DOI 10.1007/978-981-10-5822-6
Appendix J
Erroneous Congruent and Non-congruent
VN Collocations in the ST2 and ST6
(Types)
Types ST2 (%) ST6 (%) Total

Congruent coll. 51 (80) 74 (80) 125
Non-congruent coll. 13 (20) 19 (20) 32
Total 64 (100) 93 (100) 157
Note v2 = 0.00, p > 0.05 ns
DOI 10.1007/978-981-10-5822-6
Appendix K
Positive and Negative Transfer
Between VN and AN Collocations
in the ST2 (Types)
Types VN coll. (%) AN coll. (%) Total

Positive transfer (C, W) 117 (75) 80 (92) 197
Negative transfer(I, N and C, N) 40 (25) 7 (8) 47
Total 157 (100) 87 (100) 244
Note p = 0.0007 *** extremely Sig.
DOI 10.1007/978-981-10-5822-6
Appendix L
Positive and Negative Transfer
Between VN and AN Collocations
in the ST6 (Types)
Types VN coll. (%) AN coll. (%) Total

Positive transfer (C, W) 221 (89) 204 (99) 425
Negative transfer(I, N and C, N) 28 (11) 2 (1) 30
Total 249 (100) 206 (100) 455
DOI 10.1007/978-981-10-5822-6
References
Brown, R. A. (1973). First language: The early stages. Cambridge, MA: Harvard University
Press.
Hanania, E., & Gradman, H. (1977). Acquisition of English structures: A case study of an adult
native speaker of Arabic in an English speaking environment. Language Learning, 27(1), 75–
91.
Hui, Y. A. (2002). New century Chinese-English dictionary. Beijing: Foreign Language Teaching
and Research Press.
Lombard, R. J. (1997). Non-native speaker collocations: A corpus-driven characterization from
the writing of native speakers of Mandarin (Mandarin Chinese). Ann Arbor, MI: UMI.
Men, H. (2014). L1 influence on the production of L2 collocations: A corpus-based study of
Chinese EFL learners’ collocation acquisition. Talk given at the 9th Newcastle upon Tyne
Postgraduate Conference in Linguistics, 4 April, 2014.
Peters, A. (1977). Language learning strategies. Language, 53(3), 560–573.
Schmidt, R. W. (1983). Interaction, acculturation, and the acquisition of communicative
competence: A case study of an adult. In N. Wolfson &E. Judd (Eds.), Sociolinguistics and
language acquisition (pp. 137–174). Rowley, MA: Newbury House.
Scott, M. (2004). WordSmith Tools (Version 4.0). Oxford: Oxford University Press.
Wong-Fillmore, L. (1976). The second time around: Cognitive and social strategies in language
acquisition. Stanford: Stanford University.
DOI 10.1007/978-981-10-5822-6
Index
A Differentiation, 17, 18, 21, 23, 29, 139, 140,

Acceptability, 36, 45, 48, 59 147, 154, 158, 159, 164, 165
Adjective + noun collocations. See AN Distinguishers, 160, 161
collocations Dual-activation, 49, 50, 157, 158
Apparent-time design, 62, 94
F
B Free word combinations, 20, 21, 24, 26
BBI, 64, 69, 73 Frequency-based approach, 14, 24, 28, 64
Bilingual memory, 49, 50
BNC. See British National Corpus I
Idiomaticity, 1, 9
C Idiom principle, 2, 43, 79
Chinese Learner English Corpus. See CLEC Idioms, 9, 10, 13, 20, 21, 24, 25, 27, 30, 36, 52,
Coalescing, 140, 147, 154, 158, 159, 164 78
Colligation, 17, 18, 24, 67, 70, 119–121, 131,
138 L
Collocability, 16 L1 influence, 46, 47, 49, 51, 60, 120, 134, 136,
Collocation acquisition, 4, 39, 46, 48, 49, 55, 145–147, 155, 158 See also L1 transfer
71, 82, 94, 126, 132, 144, 151, 154 L1 transfer, 47, 50, 146 See also L1 influence
Collocational grids, 161, 162 Lemma, 18, 50, 92, 158
Collocational range, 25, 26 Lexeme, 50, 143
Collocation lag, 2, 3, 42, 52, 53, 55, 59, 109, Lexically non-congruent collocations, 143
114, 115, 152 Lexical teddy bears, 44
Commutability, 21–25, 28, 30 See also Lexical verb + noun collocations. See LexVN
Substitutability collocations
Componential grid, 161
Compositionality, 144 M
Concordances, 41, 66, 69 Multiword units, 12, 13
Concordancing, 66
Congruence, 48, 51, 133–135, 138, 146 N
Cross-sectional study, 6, 45, 59, 85, 151, 166 NN collocations. See noun + noun collocations
Non-congruence, 133, 143
D
Delexical verb + noun collocations. See O
DeLexVN collocations Open choice principle, 2, 43, 79, 94, 164
DOI 10.1007/978-981-10-5822-6
206 Index
P S
Partial congruence, 139, 140, 147, 154 Salient, 144, 164
Partial non-congruence, 140 Semantic interference, 156, 157, 163
Perceptual salience, 144, 166 Semantic marker, 160, 161
Phraseological approach, 14, 15, 19, 20, 25, 30, Sematic relatedness, 54, 160, 161
64 Semantic transparency, 21, 25, 27, 30, 38
POS tagging, 67, 68 Structurally non-congruent collocations, 143
Prefabricated units, 12, 152 Substitutability, 22 See also Commutability
Prefabs, 9 See also Prefabricated units Synonym density, 127, 128, 130, 154
Priming, 49 Synonym set, 3, 6, 59, 71, 93, 95, 125, 126,
Psychological approach, 15, 28, 30 132, 153, 154, 160
Q T
Qualitative deficiency, 85 Translation equivalents, 48–51, 135, 142, 158,
Quantitative deficiency, 152 163, 164
R V
Recurrent word combinations, 10, 11 Verb + noun collocations. See VN collocations
Reliability check, 67
Restricted collocation, 10, 21, 23, 24, 38, 43, W
45, 78 WordNet, 71, 72, 98, 126–128, 130, 132, 154
Revised Hierarchical Model, 49, 158

Vocabulary Increase and Collocation

Uploaded by

Copyright:

Available Formats

Vocabulary Increase and Collocation

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Vocabulary Increase and Collocation

Uploaded by

Copyright:

Available Formats

Haiyan Men

ISBN 978-981-10-5821-9 ISBN 978-981-10-5822-6 (eBook)

Library of Congress Control Number: 2017947487

Printed on acid-free paper

This Springer imprint is published by Springer Nature

I am pleased to have the opportunity to recommend this monograph study of the

Extensive consideration is also given to pedagogical aspects that arise from

August 2016 Richard Ingham

This book is based upon my Ph.D. dissertation, Vocabulary Increase and

6.3 An Alternative Explanation: New Nouns

Appendix A: Erroneous VN Collocations Produced by the Three

BNC British National Corpus

1.1 General Background

The past decades have seen a dramatic increase in studies on collocations in

1.2 Aims of the Study

The major task of second-language collocation research is to discover what it means

Furthermore, two other common types of collocations—adjective + noun and

1.3 The Shape of the Study

i.e. adjective + noun and noun + noun collocations. It aims at a corroboration of

Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent

Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. E. Bazell, J. C. Catford, M. A. K.

2.1 The Importance of Collocation

2.1.1 The Pervasiveness of Phraseological Tendency

In the process of speech or text production, complete freedom of choice of a single

A considerable proportion of prefabs has been identiﬁed in various kinds of

2.1.2 The Importance of Collocation for L2 Learners

2.1.2.1 Phraseological Knowledge Is Important for Native-like

Knowledge of collocations is of the same importance as knowledge of grammar. It

2.1.2.2 Phraseological Knowledge Is Beneﬁcial for Efﬁcient

2.2 The Notion of Collocation

2.2.1 Collocation Previously Approached

Collocation, which refers to syntagmatic lexical relations in a language, can be

Collocation has also been psychologically envisaged (Aitchison 2003). In what

2.2.1.1 The Psychological Approach

Collocation involves strong associations between words. This association can be

for ‘Hungarian rhapsody’” (Aitchison 2003: 91). Words in a collocational rela-

2.2.1.2 The Firthian Approach

Text-Oriented Studies on Collocation

Collocation refers to a patterning of language with tendencies of lexical items to

Through a computer-based study, Jones and Sinclair (1974) discovered that

differentiation method based on actual language use has revolutionised lexical

Frequency-Based Studies on Collocation

2.2.1.3 The Phraseological Approach

Table 2.1 Classiﬁcations of English word combinations

according to generative rules” (Howarth 1996: 31). Therefore, special attention is

Semantic transparency/opacity is measured in terms of whether the meaning of the

Specialised Senses of One Element

Phraseologists distinguish collocations from free combinations in terms of the

(1) Figurative: require qualiﬁcations

Table 2.2 Howarth’s categorisation of collocations into ﬁve levels of restrictedness

2.2.2 Collocation Deﬁned in This Study

Table 2.3 Deﬁnitions of collocations and demarcating criteria adopted

Table 2.4 A framework for demarcating collocations

2.2.3 Collocations Classiﬁed in This Study

Different approaches to collocations result in different classiﬁcations. This section

Table 2.5 Classiﬁcations of lexical collocations by Benson et al. (2010)

Among the lexical collocations categorised by Hausmann (1989) and Benson

Aisenstadt, E. (1979). Collocability restrictions in dictionaries. In R. R. K. Hartmann (Ed.),

3.1 Methodologies Adopted in L2 Collocation Studies

In the investigation of L2 learners’ collocation knowledge, studies are generally

Table 5.1 VN collocations Learners 5 5–10 10