Vocabulary Increase and Collocation Learning
A Corpus-Based Cross-sectional Study
of Chinese Learners of English
Haiyan Men
Shanghai Sanda University
Shanghai
China
Not for sale outside the Mainland of China (Not for sale in Hong Kong SAR, Macau SAR, and Taiwan,
and all countries, except the Mainland of China)
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018
This work is subject to copyright. All rights are reserved by the Publishers, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publishers, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publishers nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publishers remain neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
Contents
1 Introduction
1.1 General Background
1.2 Aims of the Study
1.3 The Shape of the Study
References
2 The Notion of Collocation
2.1 The Importance of Collocation
2.1.1 The Pervasiveness of Phraseological Tendency
2.1.2 The Importance of Collocation for L2 Learners
2.1.3 Summary
2.2 The Notion of Collocation
2.2.1 Collocation Previously Approached
2.2.2 Collocation Defined in This Study
2.2.3 Collocations Classified in This Study
2.2.4 Summary
References
3 Collocation Studies in Second-Language Learner English
3.1 Methodologies Adopted in L2 Collocation Studies
3.1.1 Elicitation Data-Based Collocation Studies
3.1.2 Spontaneous Data-Based Collocation Studies
3.2 Previous Findings from L2 Collocation Research
3.2.1 Forms of Collocation Deficiency: Overuse, Underuse and Misuse
3.2.2 The Role of Learners’ L1
3.2.3 Collocation Lag
3.3 Summary
References
4 Research Design
4.1 Research Purpose and Questions
4.2 The Selection of Verb + Noun, Adjective + Noun and Noun + Noun Collocations
4.3 The Learner Corpus—CLEC
4.4 Collocation Dictionaries for Reference
4.5 The Reference Corpus—BNC
4.6 Software for Retrieval and Analysis
4.7 Procedure
4.7.1 Tagging and Reliability Check
4.7.2 Investigation of Verb + Noun Collocations
4.7.3 Investigation of Adjective + Noun and Noun + Noun Collocations
4.8 Summary
References
5 Chinese Learners’ Production of Verb + Noun Collocations
5.1 Overall Analyses (1): General Patterns of VN Collocations Produced by L2 Learners
5.1.1 Overall Tokens of Collocations
5.1.2 Overall Types of Collocations and Collocation Frequency Distribution
5.1.3 Collocation Misuses
5.1.4 Synopsis of Overall Analyses (1)
5.2 Overall Analyses (2): Between-Group Comparisons of Delexical and Lexical VN Collocations
5.2.1 Between-Group Comparisons of Well-Formed DeLexVN and LexVN Collocations
5.2.2 Between-Group Comparisons of Erroneous DeLexVN and LexVN Collocations
5.2.3 Synopsis of Overall Analyses (2)
5.3 Overall Analyses (3): Verb Growth and Collocation Errors
5.4 Synopsis of the Overall Analyses of Verb + Noun Collocations
References
6 Verb Increase and the Production of Verb + Noun Collocations
6.1 Detailed Analyses—Verb Increase and Collocation Uses
6.1.1 Analysis of VN Collocations Within Synsets Identified at the ST2 and ST6 Levels
6.1.2 Analysis of VN Collocations Within Synsets Identified at the ST2, ST5 and ST6 Levels
6.2 Synopsis of Detailed Analyses of Verb Increase and Collocation Uses
Corpora
Dictionaries
Bilingual dictionaries:
NCCED New Century Chinese-English Dictionary
OALECD Oxford Advanced Learner’s English-Chinese Dictionary (7th Edition)
English dictionaries:
BBI The BBI Combinatory Dictionary of English: Your Guide to
Collocations and Grammar (3rd Edition)
COBUILD Collins COBUILD English Dictionary (2nd Edition)
OCDSE Oxford Collocations Dictionary for Students of English (2nd Edition)
ODSA Oxford Dictionary of Synonyms and Antonyms
Chinese dictionaries:
CCD Contemporary Chinese Dictionary (5th Edition)
Other Abbreviations
AN Adjective + noun
DeLexVN Delexical verb + noun
EFL English as a foreign language
ELT English language teaching
ESL English as a second language
EVCA English Verb Classes and Alternations
FL Foreign language
FLT Foreign language teaching
L1 First language
L2 Second language
LexVN Lexical verb + noun
NN Noun + noun
NNSs Non-native speakers
NSs Native speakers
POS Part of speech
SLA Second language acquisition
VN Verb + noun
Chapter 1
Introduction
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 1
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_1
disrespect or arrogance” (Wray 2002: 143), and “may sound rather bookish and
pedantic to a native speaker” (Channell 1994: 21).
In the meantime, arbitrarily restricted co-occurrences of word combinations
abound in the English language. For example, blond is perfect for modifying hair,
but not door or dress (Palmer 1981: 76f), and we “do the cooking” but “make
dinner” (Fox 1998: 33). Learning to construct word combinations that are
customarily used by native speakers is one of the most difficult tasks for even the most
proficient language learners (Pawley and Syder 1983). This phraseological
deficiency in second-language learners was recognised as early as the 1930s and has
been extensively discussed ever since. Palmer (1933) noted that when forming
combinations that are not rule-governed (e.g. to ask a question, to do a favour),
learners may produce expressions such as *to make a question,
*to perform a favour. A great deal of previous research has found that collocation
learning constitutes a problematic domain for L2 learners even at fairly high
proficiency levels. Studies in this field are generally devoted to a description of L2
learners’ difficulties with collocations. The overall picture that emerges from
previous L2 collocation research is that, although learners’ receptive knowledge
of collocations is comparatively good (e.g. Biskup 1990; Gyllstad 2005; Marton 1977),
collocation production poses great problems. Overall, L2 learners’ “building material is
individual bricks rather than prefabricated sections” (Kjellmer 1991: 124). They are found to
operate more on the “open choice principle” than the “idiom principle” and use
fewer collocations than their native-speaker counterparts. In addition to
insufficient use of collocations, overuse, underuse and misuse of certain
collocations are frequently reported in learners’ writing (e.g. Granger 1998; Howarth
1996; Laufer and Waldman 2011).
Apart from the difficulties L2 learners encounter in the production of
collocations, phraseological knowledge is believed to lag behind grammar and lexis and
constitutes the “last and most challenging hurdle in attaining near native-like
fluency” (Spottl and McCarthy 2004: 191). In comparison with learners’ general
vocabulary knowledge, knowledge of collocations is rather weak: Bahns and
Eldaw (1993) found that collocation errors were more than twice as frequent as errors with
lexical words. When collocation knowledge was compared among learners at different
levels, collocation performance was reported not to improve, as the advanced
and the intermediate learners produced significantly more erroneous collocations
than the basic learners (Laufer and Waldman 2011). Similarly, in a more
recent study exploring the collocational competence of two groups of Nigerian
advanced speakers of English as a second language, Obukadeta (2014) discovered
that the participants who had been living or studying in the UK for up to 15 years
were less proficient in terms of their knowledge of collocations than the other group,
which had never lived or studied outside Nigeria. These studies indicate a
collocation lag, which means that collocational knowledge does not develop alongside
learners’ general level of English proficiency.
Given the difficulties learners are confronted with and the lag in collocational
knowledge, investigating collocations in an L2 is a continuing concern within the
field of second-language learning and teaching. However, research in this field is
still in its early stages, since there is no overall theory accounting for how
collocations are acquired by L2 learners (Gitsaki 1999). Knowing how collocations
are acquired and produced can provide valuable insight into how they are best
taught. Although extensive research has been carried out on the learning and
production of L2 collocations, much of the research up to now has been descriptive
in nature. It remains unclear which factors are associated with the lag in
collocation knowledge.
Therefore, the importance of phraseological knowledge in both language
production and comprehension, together with the difficulty its acquisition poses for L2
learners, provides sufficient justification for further research in this area. The aim of
this study is to fill this gap by examining factors associated with collocation lag.
delexical verbs and higher levels make more errors with lexical verbs in VN
collocations.
The hypothesis on the developmental patterns is closely linked with the general
hypothesis that verb increase is a hindrance to collocation acquisition. More
precisely, at lower stages of L2 development, due to their limited mastery of verbs,
learners resort to delexical verbs to collocate with a noun instead of a specific
lexical verb. As their verb vocabulary grows, they have more access to lexical verbs
and tend to make more collocation errors with lexical verbs, because the increase in
synonymous verbs allows more chances of incorrect verb choices. The increase in
lexical verbs and the subsequent occurrence of errors with lexical verbs suggest
that vocabulary growth impedes collocation acquisition. To test whether the growth
of verb vocabulary constitutes an inhibiting force in collocation learning, the
relationship between vocabulary growth and collocation development has to be
viewed locally in specific VN collocations, by locating the semantic domains
of verbs in collocations where there is an increase in verbs and examining whether
the increase in these verbs subsequently leads to collocation errors.
Based on the two hypotheses, the following research questions will be
addressed:
1. What developmental patterns appear in the verb + noun collocations produced
   by L2 learners, in terms of delexical verb and lexical verb + noun collocations?
   a. Is there a tendency towards increasing use of lexical verb + noun collocations
      with rising proficiency?
   b. Is there a tendency towards increasing errors with lexical verb + noun
      collocations and decreasing errors with delexical verb + noun collocations with
      rising proficiency?
2. Within specific semantic domains of the verbs in verb + noun collocations used
   by all levels of learners, is there a tendency for these verbs, as they increasingly
   occur at the higher levels, to be associated with collocation errors?
Research questions 1b and 2 are interrelated, standing in a whole-part relation.
Both are concerned with the increase of lexical verbs and the
production of verb + noun collocations at the higher levels. Research question 1b
addresses the relationship between the overall increase of lexical verbs at the higher
levels and the increasing or decreasing trend of verb + noun collocation
errors associated with lexical verbs. The scope of research concerning verb increase
and collocation errors is further narrowed down in research question 2, which is
aimed at a detailed investigation of the increase of lexical verbs within particular
semantic domains. By confining verb increase to semantic domains, the
study sets out to examine whether verb increase in a semantic domain is a factor
associated with the lag in verb + noun collocational knowledge. The particular focus on
verb increase in semantic domains in research question 2 is built on the belief that,
in producing verb + noun collocations, learners are more likely to be confused by
semantically related words (e.g. acquire and obtain) than by words falling in
different semantic domains (e.g. acquire and change).
speaker is a myth. Those who have two native-speaking parents, both preferably
monolingual, and are raised in a native-speaking community, can still not be definitely
defined as native speakers of that language, since other social factors like
mobility and the rise of new Englishes are at play (Davies 2003). As English is
becoming a lingua franca, and an increasing number of proficient academics whose
first language is not English enter English academia (Hyland 2006), it is even
harder to define what a native speaker is. The theoretical aspects of the
native-speaker construct will not be addressed in this study.
Nevertheless, for the investigation and description of learner interlanguage,
language learning goals in terms of native-speaker norms need to be set. Two
widely used English collocation dictionaries and the British National Corpus, a
collection of texts in British English, were taken as a kind of target norm for L2
English learners. Language forms produced by L2 learners that conform to the
norm were regarded as well-formed, and those that deviate from the norm were
viewed as erroneous.
1.3 The Shape of the Study
The book is divided into ten chapters. Chaps. 2 and 3 discuss previous theoretical
and empirical studies on (L2) collocations. Chap. 2 highlights the importance of
collocation and clarifies the notion of collocation. The significance of collocations
is discussed in terms of their prevalence in native-speaker texts and their importance
for a fluent and idiomatic control of English for L2 learners. Then the notion of
collocation is examined on the basis of different previous approaches, and the
definition and classification applied in this study are presented. Chap. 3 reviews
previous L2 collocation studies. In this chapter, the methodologies commonly adopted
in L2 collocation studies are first addressed, with a view to introducing the
methodology that has been more and more widely used in the analysis of
collocations in learner corpora; then major findings of previous L2 collocation studies
are discussed. Chap. 4 presents the detailed design of the present cross-sectional
study of Chinese EFL learners’ collocation performance. The learner corpus chosen
for such an investigation, the types of collocations targeted, the sources of reference
in extracting these collocations, and the procedures for collocation extraction and
analyses are introduced. Chaps. 5, 6, 7, 8 and 9 contain a detailed analysis of the
data. In Chap. 5, the overall picture of Chinese L2 learners’ performance in
verb + noun collocations is depicted, with the main focus on the developmental
patterns of collocation production from delexical verb + noun to lexical verb +
noun collocations. Chap. 6 is devoted to an investigation of the relationship
between verb increase in specific synonym sets and collocation uses associated with
verbs in these synsets. Moreover, an alternative explanation, i.e. the learning of new
nouns in collocation production, is examined in order to see whether the acquisition of
new nouns is responsible for a lag in collocation. Chap. 7 goes on to explore
learners’ performance on two other important and frequent types of collocations,
Chapter 2
The Notion of Collocation
Collocation not only plays a crucial role in language production and
comprehension, but also functions as a key indicator of L2 learners’ overall proficiency in the
field of second language acquisition. This chapter briefly clarifies the notion of
collocation before presenting in the next chapter the reviews of L2 collocation
studies. It begins by highlighting the importance of collocation for both native
speakers and L2 learners (Sect. 2.1). The second section (Sect. 2.2) proceeds to
discuss the different approaches to collocation, and develops a definition adopted in
this study. Finally, how collocations were previously classified and the
classification of collocation used in the present study are presented.
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 9
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_2
1 Kjellmer’s recognition of collocation is based on his definition of collocation as a grammatically
well-structured sequence occurring more than once (1987: 133). So, more collocations were
counted than collocations defined in the present study (cf. Sect. 2.2.2).
2 The 1000-word sample was compiled from a 10,000-headword database, which recorded the most
frequent content words in the Cobuild (1995) database. So there were no function words beginning
with an f (e.g. for).
inclusion of any continuous string of words occurring more than once in identical
form. In a similar vein, Biber et al. (1999) identified many lexical bundles (recurrent
expressions) in a large corpus. Unlike Altenberg (1998), they set a fairly high
threshold level for what qualifies as a lexical bundle—lexical sequences occurring
at least ten times per million words and at the same time across at least five different
texts in a register. Even with such a high cut-off point between lexical bundles and
casual lexical co-occurrences, they discovered a large proportion of lexical bundles:
45% in conversation and 21% in academic prose.
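The frequency-and-dispersion threshold described above can be made concrete with a short sketch. The Python code below is only an illustration and is not the procedure used by Biber et al. (1999): the function name find_lexical_bundles, the whitespace tokenisation, the default bundle length of four words and the toy texts in the usage note are all assumptions introduced here.

```python
from collections import Counter, defaultdict


def find_lexical_bundles(texts, n=4, min_per_million=10.0, min_texts=5):
    """Return n-word sequences that pass a frequency and a dispersion threshold.

    texts: list of strings, each treated as one text from a single register.
    A sequence is kept if it occurs at least `min_per_million` times per
    million running words AND appears in at least `min_texts` different texts.
    Tokenisation is a plain lowercase whitespace split (an assumption).
    """
    counts = Counter()            # sequence -> total occurrences
    text_ids = defaultdict(set)   # sequence -> ids of texts containing it
    total_tokens = 0

    for text_id, text in enumerate(texts):
        tokens = text.lower().split()
        total_tokens += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            counts[gram] += 1
            text_ids[gram].add(text_id)

    if total_tokens == 0:
        return {}

    bundles = {}
    for gram, count in counts.items():
        per_million = count * 1_000_000 / total_tokens
        if per_million >= min_per_million and len(text_ids[gram]) >= min_texts:
            # value: (raw frequency, number of different texts)
            bundles[" ".join(gram)] = (count, len(text_ids[gram]))
    return bundles
```

With a toy input such as five copies of "i do not know what to say" and two copies of "on the other hand it is late", the sequence "i do not know" passes both tests, whereas "on the other hand" fails the five-text dispersion requirement. In so small a sample the per-million threshold is trivially met, so only the dispersion criterion does any filtering; on a corpus of realistic size both criteria would matter.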
Both the written and spoken languages of native speakers thus exhibit a strong
phraseological tendency. The spoken language has been found to consist of a
greater proportion of recurrent word combinations than the written language. One
reason provided by Biber et al. (1999) is that the spoken language involves a
considerable amount of repetition, which increases the potential proportion of
clusters. Another underlying reason why the spoken language is more
formulaic might be the time constraints imposed on speakers. Speakers usually do
not have enough time to coin novel expressions as they do in writing. This is the
case with journalistic reporting, where the intense pressures and time constraints on
reporters require them to use a great many familiar ready-made expressions (Cowie
1992). Hence, formulaic language unavoidably occurs more often
in spoken than in written production. In all, word combinations make up a very high
proportion of both the written and spoken performance of native speakers. This
phenomenon demonstrates the block-like nature of language and supports the
inference that “when we speak or write it is therefore often more apposite to say that
we move from one cluster to the next than to say that we move from one word to
the next” (Kjellmer 1994: ix). The clusters, or multiple-word units, are stored in the
psychological lexicon and are believed by Kelly and Stone (1975) to be at least as
numerous as single words.
Therefore, for learners of a second or foreign language,3 the existence of a
large number of word combinations underlying proficient performance requires
them to be equipped with this phraseological competence. Phraseological
knowledge is naturally of central importance to fluent and idiomatic control of the
language for L2 learners as well. The next section moves on to discuss the
significance of this phraseological competence for L2 learners.
2.1.2 The Importance of Collocation for L2 Learners
The importance of collocational knowledge for L2 learners has long been
widely recognised (e.g. Cowie 1992; Fox 1998; Kjellmer 1991; Lee and Liu 2009;
Lewis 2000; Meara 1984; Palmer 1933; Pawley and Syder 1983; Wray 2002; Yorio
1989). In this section, its significance is briefly summarised from two perspectives:
for native-like production and for efficient comprehension.
3 In this research, the terms second and foreign language are used interchangeably, referring to any
language learned after one’s native language, although they are differentiated by Richards and
Schmidt (2010: 224f) in terms of whether the language is used as a medium of instruction in
schools or widely used in a country as a medium of communication by the government, media, etc.
Here, the oddness of expressions produced by learners is not concerned with the
inappropriateness of grammar, but with the co-selected word combinations. To
know a language requires not only knowledge of the appropriate rules to generate
grammatically well-formed utterances of that language, but also knowledge of
which of these grammatical utterances are native-like (Biber et al. 1999; Wray
2002: 143). Failing to use these lexicalised expressions appropriately, as has been
pointed out in Chap. 1, may even divert the reader’s attention from content to form
(Howarth 1998a: 174). As Cowie (1992: 10) acknowledged, it is impossible to
perform at a native-like level without knowledge of an appropriate range of
multiword units. Therefore, the significance of phraseological knowledge for L2
learners should in no way be downplayed.
A good command of phraseological knowledge helps attain the goal of
native-like production through promoting fluency. A store of formulaic units in the
mental lexicon plays a key role in reducing the processing effort en route to language
production (cf. Hunston and Francis 2000: 271). Unlike the creative side of language
production, in which individual words are combined one by one according to
grammatical rules, the agglomeration of words into clusters constitutes one single
choice and thus saves much processing time (cf. Sinclair 1987: 320). Jackendoff’s
analogy between fixed word combinations and chunking in music well illustrates the
role of prefabricated units in promoting fluency, as he maintained that:
any musician can attest the fact that one of the tricks to playing fast is to make larger and
larger passages form simplex units from the point view of awareness—to “chunk” the input
and output. This suggests that processing speed is linked not so much to the gross measure
of information processed as to the number of highest-level units that must be treated
serially. Otherwise, chunking wouldn’t help. (Jackendoff 1983: 125)
Knowing a wide range of multiword units not only facilitates native-like
production, but also contributes to efficient comprehension on the part of L2 learners.
Hunston and Francis (2000: 270–271) argued that, by storing a large number of
multiword units in the mental lexicon, learners can understand the meaning of a text
without having to pay attention to every word. This is beneficial for enhancing both
reading and listening efficiency. They further pointed out that knowledge of
phraseological patterns can help L2 learners reconstruct the meanings even if they
mishear some words in speech. At a micro-level, knowledge of co-occurring word
combinations contributes to successful comprehension of the semantics of each
constituent. For example, through a corpus-based analysis of the collocations with
affect/influence, Lee and Liu (2009) exemplified how the use of collocations
provides a solid conceptual grounding of the target word for L2 learners in grasping the
lexical semantics of the two words.
In sum, in the process of striving for native-like language production,
phraseological knowledge is, on the one hand, important for L2 learners’ idiomatic and
fluent production; on the other, it helps promote the efficiency of language
comprehension in general and the comprehension of lexical semantics of individual
words. Collocation is thus recognised by Lewis (2000: 45) as “the most powerful
force in the creation and comprehension of all naturally-occurring texts”.
2.1.3 Summary
In this section, we have placed collocation within the context of formulaic language
and reviewed its importance for both native speakers and L2 learners. First, the
ubiquity of formulaic language in both spoken and written language has long
been acknowledged and verified in previous studies. Given the pervasiveness of
conventionalised word combinations, it follows that L2 learners have to gain a good
command of them in order to achieve native-like proficiency. A good control of
formulaic language not only facilitates idiomatic production, but also promotes
efficient language comprehension. Collocation is one of the most important and
frequent aspects of formulaic language and constitutes the target of this study. The
next section will be devoted to a deeper discussion of the nature of collocations.
2.2 The Notion of Collocation
Given the abundance of terminology in the field of phraseology (cf. the various
expressions mentioned in Sect. 2.1, e.g. collocations, fixed expressions, idioms,
prefabs, complex lexical items, multiword units, etc.),4 a clarification of which of
these aspects of formulaic language forms the object of this study is in order. The
present study will focus on the most common manifestation of formulaic language:
collocation.5 Yet as Bahns (1993: 57) admitted, “regrettably, collocation is a term
which is used and understood in many different ways”. So the primary aim of this
section is to summarise previous definitions and classifications and develop a
definition and classification of collocation in order to identify those word
combinations in learner English.
4 Kjellmer (1994: xi) listed various terms referring to clusters of words: expressions, fixed
combinations, formula units, formulas, larger-than-word units, lexical phrases, lexicalised sentence
stems, multiword lexical units (MLU), multiple-word units, patterned speech, patterns, phrases,
prefabricated speech, ready-made utterances, recurrent combinations, stock phrases and word-like
units. See also the terms to describe the phraseological phenomenon in Wray (2002: 9).
5 Fellbaum (2007: 8) distinguished collocation, a linguistic phenomenon, from collocations,
specific lexical instances resulting from collocation that are part of the lexicon. No differentiation
is attempted in this study.
6 This classification of the different approaches to collocation is similar to previous collocation
reviews. For example, definitions of collocation have been neatly summarised by Partington
(1998) into “textual”, “psychological” or “associative”, and “statistical” ones, whilst Handl (2008)
classifies previous definitions into four categories: text-oriented, association-oriented, statistically
oriented and semantically oriented. Herbst (1996) distinguishes three approaches to collocation:
“statistically oriented approach”, “significance oriented approach” and “text-oriented approach”.
Nesselhauf (2004) summarises the approaches to collocation as the “frequency-based” and
“phraseological” approaches.
7 According to Aitchison (2003: 86), the commonest type of response to stimulus words is
coordination, e.g. salt with pepper, butterfly with moth.
Firth’s statement that “you shall know a word by the company it keeps” is well
exemplified by the co-occurring words dark night, where he claimed “one of the
meanings of night is its collocability with dark, and of dark, …, collocation with
night” (1957: 196). The meaning by collocation, as Firth argued, is “an abstraction
at the syntagmatic level and is not directly concerned with the conceptual or idea
approach to the meaning of words” (ibid: 196). Firth put forward a significant
conception as to the realisation of meaning by its instantiations with co-occurring
words. At a time when “the idea that a language is based on a system of rules
determining the interpretation of its infinitely many sentences is by no means
novel” (Chomsky 1965: v), Firth’s meaning by collocation was fresh. This
conception gave substantial new impetus to observable-text-based and,
later, computer-assisted studies on collocation and established the British tradition
in text analysis (Stubbs 1996). Taking inspiration from Firth’s definition of
collocation, the Firthians have conducted studies of word co-occurrences based on real
language in use, and proposed other definitions.
Sinclair is the main inheritor and innovator of the Firthian approach, along with
other linguists who followed the British tradition and viewed collocations primarily
as a syntagmatic relation between words in texts (Halliday 1966; Hoey 1991, 2005;
Kjellmer 1987, 1994; Lewis 2000; Moon 1998; Sinclair 1966, 1987, 1991, 2004;
Stubbs 1996, 2001; etc.). Given the abundance of studies on collocation in this
tradition, these studies are summarised in two subsections: one discusses the notion of
collocation in word sense recognition and differentiation, and the expansion of the
notion of collocation to other aspects, such as colligation, semantic prosody and
semantic preference; the other discusses frequency-based studies of collocation
focusing on defining and recognising collocations based on word frequencies.
These two lines of study are, however, not mutually exclusive and are discussed
separately only to stress their differences.
8 Words in boldface are quoted in their original forms.
9 Explanations of drink are quoted from WordNet.
10 Lemma refers to the composite set of word forms. For example, the lemma give refers to the
forms of give, gives, given, gave and giving (Sinclair 1991: 41–42).
collocates”. For example, the phrase set in primarily co-occurs with an unpleasant
state of affairs and has a negative prosody (Sinclair 1991: 68, for semantic prosody,
cf. Sinclair 2003; Stubbs 1995a, b, 1996, 2001). Other words would usually col-
locate with a certain semantic preference, as the naked eye collocates with verbs and
adjectives indicating visibility and the word unemployment usually collocates with
the semantic set of statistics (Sinclair 1991: 33; Stubbs 1995b: 254). Semantic
preference is therefore an abstraction of the semantic orientations over the collo-
cates of the node word.
Besides the loose treatment of collocation as co-occurring words within a set span,
other researchers reserve the notion of collocation for statistically significant
co-occurring words and define collocation as “the relationship a lexical item has
with items that appear with greater than random probability in its (textual) context”
(Hoey 1991: 7, 2005; cf. Greenbaum 1974; Moon 1998; Sinclair et al. 2004; Stubbs
2001). The higher the probability, the more likely a word combination is to be a
collocation. Significant collocations are quantitatively identified by using statistical
formulae (cf. Church et al. 1991; Church and Hanks 1990; Church and Hindle
1990; McEnery and Wilson 1996; McEnery et al. 2006; Stubbs 1995a). Among
frequency-based definitions of collocation, some rely purely on
frequency: Moon (1998: 26) considered a collocation as that which “typically
denotes frequently repeated or statistically significant co-occurrences, whether or
not there are any special semantic bonds between collocating items”. Yet frequency
alone is not a reliable criterion for identifying meaningful collocations. Other
researchers add a grammatical criterion alongside frequency and define collocation
as recurring sequences of items that are grammatically well formed (cf.
Johansson and Hofland 1989: 95; Kjellmer 1987: 133, 1994: xiv). According to
Kjellmer (1994: xv), sequences that have no or only a very distant grammatical
relationship are excluded. For example, instances like but too, day but, however in
the, night he would not be considered as collocations even if the frequency criterion
was satisfied. Instead, by me, in April, of the Government all qualify as collocations
(ibid: xiv). However, even though the definition incorporates grammatical
well-formedness, it is not sufficient to distinguish between combinations formed on
the basis of grammatical rules (e.g. by me, in April) and collocations of phraseo-
logical value (e.g. make a decision, strong argument). The approach to identifying
collocations that are of phraseological value is the phraseological approach, which
will be illustrated in the following section.
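To make the idea of co-occurrence with “greater than random probability” concrete, the following is a minimal sketch of one such statistic, pointwise mutual information computed over a fixed span. It is an illustration under stated assumptions rather than the formula used by any of the studies cited above: the toy corpus, the function names and the choice of counting collocates only to the right of the node are all hypothetical.

```python
import math
from collections import Counter

def cooccurrence_counts(tokens, span=4):
    """Count (node, collocate) pairs where the collocate occurs
    within `span` tokens to the right of the node."""
    pair_counts = Counter()
    for i, node in enumerate(tokens):
        for j in range(i + 1, min(i + span + 1, len(tokens))):
            pair_counts[(node, tokens[j])] += 1
    return pair_counts

def pmi(x, y, tokens, span=4):
    """Pointwise mutual information: log2 of the observed co-occurrence
    probability over the probability expected if x and y were independent."""
    n = len(tokens)
    word_counts = Counter(tokens)
    joint = cooccurrence_counts(tokens, span)[(x, y)]
    if joint == 0:
        return float("-inf")  # the pair never co-occurs within the span
    return math.log2(joint * n / (word_counts[x] * word_counts[y]))

# Invented toy corpus, for illustration only.
tokens = ("strong tea and strong argument but "
          "powerful car and strong tea").split()

print(pmi("strong", "tea", tokens, span=1))
print(pmi("strong", "argument", tokens, span=1))
print(pmi("powerful", "tea", tokens, span=1))
```

In this toy data, strong tea (two occurrences) and strong argument (one occurrence) receive the same score of about 1.87, because the measure rewards the rarity of argument. This is one way of seeing why a statistical score on its own does not settle whether a combination has phraseological value.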
In Aspects of the Theory of Syntax, Chomsky (1965: 190f) distinguished two types
of word relations: a close construction (as decide on a boat in the sense of choose
the boat) and a loose association (as decide on a boat meaning decide while on a
boat). This distinction is much the same as that between collocations and free combinations,
where close construction refers to collocation (e.g. the verb decide occurring
together with the particle on to mean choose), which represents a unit, and loose
association resembles free word combinations, which are constructed on the basis
of grammatical rules. The phraseological approach is concerned with the defining
criteria of collocation and demarcating it from other types of word combinations.
The phraseological approach, in contrast with the psychological and Firthian
approaches, concerns itself with classification schemes for phraseological units
according to their varying degrees of fixedness. Russian phraseologists such as
Vinogradov (1947, cited in Cowie 1998: 4f) established three categories of word
combinations: “phraseological fusions” (e.g. spill the beans), “phraseological uni-
ties” (e.g. blow off steam) and “phraseological combinations” (e.g. meet the
demand). Different phraseologists adopt slightly different classifications with dif-
ferent terminology, as summarised in the table below:
As Table 2.1 shows, word combinations are generally divided into idioms,
collocations and free combinations, which are on a spectrum from the most fixed to
the most free. What is also revealed through the above table is that generally there
are two separate directions of interests distinguishing the Russian phraseologists
(Vinogradov and Amosova) from other phraseologists like Aisenstadt, Cowie and
Howarth. The Russian scholars start from the idiomatic end of the spectrum to delineate the
specific phraseological zone and are preoccupied with the distinction between
idioms and collocations. Their classification criteria will not be elaborated here, since
what is more challenging and significant for L2 learning is the distinction between
collocations and free word combinations, and to “identify at what point language
users are manipulating expressions as wholes rather than composing them
Semantic Transparency/Opacity
Commutability/Substitutability
Unlike free combinations which are subject to free substitution of either element
without a consequent alteration in the meaning of the other, collocations are
restricted in the commutability of either element (Aisenstadt 1979; Cowie 1992;
Howarth 1996, 1998a). Aisenstadt (1979: 73) illustrated restricted commutability in
the following two examples:
(4) shrug one’s shoulders
shrug something off
shrug something away
shrug one’s shoulders
square one’s shoulders
hunch one’s shoulders
(5) make a decision
take a decision
have a look
give a look
take a look.
In example (4), both shrug and shoulders are restricted to a number of
co-occurring words and neither of them can be substituted; in example (5), there is
a restricted commutability on the verbs, as decision is limited in alternative verb
collocates: make/take, and look in verbs such as have/give/take. Aisenstadt
attempted to demarcate collocations according to the restricted substitutability of
word constituents. Yet on the one hand, commutability itself is a vague criterion
and depends largely on what a speaker can conceive of. With shrug one’s
shoulders for example, shoulders can have a rather wide set of verbs to go with, as
in straighten one’s shoulders, wash one’s shoulders, look at one’s shoulders, rub
one’s shoulders, scratch one’s shoulders (Nesselhauf 2005: 27). This is also the
case with decision, which can co-occur with a variety of verbs, such as reach a
decision, come to a decision, postpone a decision, criticise a decision, explain a
decision (ibid: 27). On the other hand, commutability can also be restricted in free
combinations like wash the glass, since substituting the verb clean for wash slightly
alters the original sense and the same applies to replacing the noun glass with cup.
So what qualifies the two combinations as collocations is the fact that the word
shoulders has a rather restricted set of co-occurring words with the sense of “shrug”
in shrug one’s shoulders (probably only the verb, i.e. shrug) and decision has a
restricted number of verbs with the sense of “make” in make a decision
(make/take/reach, etc.). The notion of the given sense was adopted by Cowie (1992:
5f) in his commutation tests to demarcate restricted collocations. The commutability
of the verb is tested through whether it is the only verb or one of a set of syn-
onymous verbs used in the appropriate sense in relation to a given noun (e.g. verbs
are commutable in abandon/give up a cherished principle, but verbs are not
commutable in run a deficit).
A comprehensive classification of collocations on the basis of commutability
was established by Howarth (1996: 102) in his categorisation of verb + noun
collocations from the most free to the most restricted (from L1 to L5), as sum-
marised in Table 2.2.
From L1 to L5, restrictedness of collocations is scaled from a slight degree of
restriction of one element to complete restriction of both elements and this
restriction is explained by the number of synonyms either element can take. For
example, for L1 collocations, nouns are subject to free substitution whilst restriction
is placed on the verbs because of the limited number of synonymous verbs. When
neither element permits substitution, i.e. with no synonym in the given sense, the
word combination is the most restricted collocation (L5), such as curry favour.
However, this classification scheme complicates the differentiation between collocations
and free combinations once the notion of synonyms is introduced. Like
the notion of commutability, the number of synonyms depends on what a speaker
can conceive of. Take the examples in L3: the combination
pay heed is considered a restricted collocation in the sense that heed is
completely restricted in its substitution. Yet according to the Oxford Dictionary of
Synonyms and Antonyms, attention is in a synonymous relationship with heed and
pay attention is an acceptable English collocation. The example of give
appearance/impression classified in L4 has the same problem, as the verb give can
be replaced by make/leave given that make appearance/impression or leave
appearance/impression are expressions with similar meanings.11 So judging the
number of synonyms involves a good deal of subjectivity. The same holds for Cowie’s
commutation test, in which verbs are measured in terms of the number of synonyms
they have: it is hard to find synonyms for verbs even in free combinations such as
open the door (?unblock, ?unlock). Where no synonyms are found, the combination can
just as well be a free combination rather than a restricted collocation, e.g. drink
one’s tea (Nesselhauf 2005). Commutability is therefore not a clear criterion for
differentiating collocations from free combinations.
In this section, the notion of collocation was first introduced in the domain
of psychological studies, with collocation viewed as psycholinguistic lexical
associations. Another field in which collocation has been researched is the
text/frequency-based studies of collocation. Much text/frequency-based research
focuses on the collocational relationship between words, the extension of the notion
of collocation to more abstract levels (such as colligation, semantic prosody and
semantic preference) and the identification of significant collocations. However, the
Firthian approach is based on linear co-occurrence of items and takes little account
of the syntactic and semantic statements that are essential in treating collocations
(Greenbaum 1970: 10). In addition, the span established for identifying collocations,
four words on each side, is insufficient to account for certain common
collocations (e.g. collect stamps in the following examples):
(6) They collect many things, but chiefly stamps.
(7) They collect many things, though their chief interest is in collecting coins. We,
however, are only interested in stamps (Greenbaum 1970: 11).
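The limitation of a fixed span can be checked mechanically on example (6). The sketch below is illustrative only: the tokeniser (lower-cased alphabetic strings, with punctuation not counted as tokens) and the function names are assumptions made here, not part of Greenbaum’s account.

```python
import re

def tokenize(text):
    """Split text into lower-cased word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def within_span(tokens, node, collocate, span=4):
    """Return True if `collocate` occurs within `span` tokens on either
    side of any occurrence of `node`."""
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        if collocate in left or collocate in right:
            return True
    return False

# Example (6) from the text.
tokens = tokenize("They collect many things, but chiefly stamps.")

print(within_span(tokens, "collect", "stamps", span=4))  # False
print(within_span(tokens, "collect", "stamps", span=5))  # True
```

Under these assumptions stamps is the fifth word to the right of collect, so a span of four words on each side misses the pair, while widening the span by one word captures it.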
So the frequency-based approach, although it can identify significant colloca-
tions of statistical value, cannot incorporate all the collocations of phraseological
value (like collecting stamps in the above examples). Moreover, the Firthian tra-
dition is preoccupied with collocation as a linguistic phenomenon per se and is not
concerned with demarcating collocations from other types of word combinations.
As discussed above, the notion of restriction inevitably forms part of accounting for
what a collocation is and this restriction distinguishes it from other forms of lexical
co-occurrences (e.g. free word combinations and idioms). Measured frequency
of co-occurring words is not a significant measure of collocational restriction
(Cowie 1998: 226; Greenbaum 1974: 83); the phraseologists on the other hand have
proposed categorisation frameworks of word combinations. The separation of
collocations from free combinations is of essential importance in the investigation
of collocations used by L2 learners, since that constitutes the first step in examining
what is phraseological rather than what is free (cf. Howarth 1996). However, even
with the widely adopted defining features in demarcating collocations within the
phraseological approach, a clear borderline between free combinations and collocations
still cannot be set. The next sections, then, continue to examine the definition
of collocations within the phraseological approach, attempting to develop a
usable categorisation of collocation and discussing previous classifications of
collocations.
11 The verb collocates of impression, make and leave, are listed with reference to a collocation
dictionary, the Oxford Collocations Dictionary for Students of English (2nd Edition).
Since this study is situated in the field of second language acquisition, aiming at
measuring nonnative phraseological performance, the approach taken to collocation
is mainly phraseological, in order to delimit collocations from idioms and free
combinations in learners’ English writings. Collocation within this approach has
been defined by previous researchers in more or less the same way, adopting the
criteria of semantic transparency/opacity, specialised sense of one element and
commutability (see definitions summarised in Table 2.3).
As demonstrated in the previous section, collocations can be distinguished from
idioms by applying the criterion of semantic transparency, i.e. the former are rel-
atively transparent in meaning (e.g. make a decision) and not as opaque as idioms
(e.g. kick the bucket). Another criterion, the specialised senses required of at least
one element of a word combination, is waived, since certain collocations with
both elements used in their literal senses would otherwise be excluded (e.g. commit a
crime, answer questions). As for the criterion of commutability measured in terms
of the number of synonyms an element can take, though it contributes to the
identification of the restrictedness in collocations, it operates more or less at an
intuitive level.
Therefore, there is a need for a clear definition using terminology which avoids
blurring the notion of collocations. What is commonly acknowledged is that there is
restricted commutability in either of the constituent words in a collocation. In other
words, either of the two elements has a limited set of words with which to co-occur
(Cowie 1981, 1998; Howarth 1996). For example, the noun stir in the given context
of “make a stir” has a limited set of verbs: cause/create/make a stir (Cowie 1981:
228). Or a verb in a given context has a limited set of nouns (e.g. pay one’s
respects/a compliment/court) (Cowie 1998: 216). The limited set of verbs/nouns is
termed collocational range, referring to the number of co-occurring words a word
can take (see also Cowie 1981, 1998; Granger 1998; Handl 2008; Leech 1974: 20;
Nesselhauf 2003; Philip 2007). In Greenbaum’s (1974: 80) words, the notion of
collocational range is exemplified by turn on, which “collocates with (among other
items) light, gas, radio, and TV…These items and others we might add to them
constitute the COLLOCATIONAL RANGE of turn on”.
Collocational range is used as a criterion for distinguishing phraseological units
in that elements in collocations have a restricted range of co-occurring words. With
the example of commit a crime, commit has a restricted range of nouns, such as
crime, wrongdoing, murder, and thus commit a crime qualifies as a collocation.
Combinations with both elements having a wide/unrestricted range of co-occurring
words are free word combinations, e.g. want a book, in which the verb want can
occur with a car, money, peace, etc., and the noun book can occur with have, buy,
read, take, etc.
Therefore, this study uses two essential criteria in defining collocations,
namely, semantic transparency and the range of co-occurring words.
Collocations are then defined as combinations of two or more words which are
characterised by a restricted range of co-occurrence in at least one of their con-
stituent words and by relative transparency in meaning.
Based on this definition, we propose that word combinations with both elements
taking a wide range of co-occurring words are classified as free combinations;
combinations in which either or both elements have a restricted range of
co-occurring words, and which are also transparent in meaning, are categorised as
collocations; combinations with both elements having a very restricted range of
co-occurring words and being opaque in meaning are viewed as idioms (see
Table 2.4 for a detailed illustration).
According to this framework, want a book is a free combination since both the
verb and noun have an unlimited range of co-occurring words. Call the shots is an
idiom with both the verb and noun having a very restricted range of words and
being semantically opaque. Both free combinations and idioms are disregarded in
this study. The focus is on collocations such as pay heed, commit a crime and curry
favour. This framework is a simplified version of Howarth’s categorisation of
collocations into five levels of restrictedness and Nesselhauf’s five groups of
combinatory possibilities of verbs in verb–noun combinations (cf. Howarth 1996:
102; Nesselhauf 2005: 30).
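The three-way framework described above can be restated as a small decision rule. The sketch below is only one possible reading of the prose definition: the `Range` labels, the `classify` function and the treatment of cases the text does not mention (returned as `None`) are assumptions introduced here for illustration, and the range judgement for the noun crime is assumed rather than taken from the text.

```python
from enum import Enum

class Range(Enum):
    """Hypothetical labels for the range of co-occurring words of an element."""
    WIDE = "wide"
    RESTRICTED = "restricted"
    VERY_RESTRICTED = "very restricted"

def classify(range_a, range_b, transparent):
    """Classify a two-element combination by collocational range and
    semantic transparency, following the prose definition."""
    if range_a is Range.WIDE and range_b is Range.WIDE:
        return "free combination"
    if transparent:
        # At least one element has a restricted (or very restricted) range.
        return "collocation"
    if range_a is Range.VERY_RESTRICTED and range_b is Range.VERY_RESTRICTED:
        return "idiom"
    return None  # opaque but not fully restricted: not covered by the text

# want a book: both elements have a wide range.
print(classify(Range.WIDE, Range.WIDE, True))
# commit a crime: commit has a restricted range (range of crime assumed wide).
print(classify(Range.RESTRICTED, Range.WIDE, True))
# call the shots: both elements very restricted, opaque in meaning.
print(classify(Range.VERY_RESTRICTED, Range.VERY_RESTRICTED, False))
```

The rule makes explicit that transparency, not the degree of restriction alone, separates the most restricted collocations (such as curry favour) from idioms.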
youths), words with habitual connections or clichés (wide awake) and words frozen
into a fixed order or “freezes” (knife and fork). This framework of collocation
classification resembles Howarth’s categorisation of collocations from the
least to the most restricted. The difference lies in the criteria they adopted, namely
the strength of association by Aitchison, as opposed to the analytical method of
semantic commutability by Howarth. The common denominator is that both
acknowledge the degree of fixedness in collocations. If words are strongly asso-
ciated, they tend to co-occur more often than would be expected in texts. This leads
to the classification of collocations in frequency-based studies, where collocations
are classified into significant and casual ones (cf. Jones and Sinclair 1974; Sinclair
1987, 1991; Sinclair et al. 2004). Moon (1998: 27) made a distinction among
collocations based on the constraints under which collocation arises. The simplest
collocations are semantically constrained and represent co-occurrence of the referents
in the real world (strawberry jam); the second kind is constrained both
lexico-grammatically and semantically and “arises where one word requires asso-
ciation with a member of a certain class or category of item” (rancid butter); the
third type is syntactically constrained and arises where a word requires comple-
mentation with a specified particle (too—to).
Collocations to be classified in this study are based on neither the psychological
approach nor the frequency-based approach. As discussed in Sect. 2.2.2, the definition
of collocation is phraseological, and thus its classification is not meant to be based
on restrictedness of combinations (cf. Howarth 1996, 1998a, b; Nesselhauf 2005);
rather it is broadly based on the word classes of its constituents, since the study
aims at investigating L2 learners’ performance with regard to certain types of
collocations and its relationship with vocabulary growth.
According to the syntactic structures of collocations, Hausmann (1989: 1010)
divided collocations into the following six types:
• adjective + noun (heavy smoker)
• (subject-) noun + verb (storm rage)
• noun + noun (lemon tree)
• adverb + adjective (deeply disappointed)
• verb + adverb (criticise severely)
• verb + (object-) noun (stand a chance).
Similar classifications were also proposed by Benson (1985) and Benson et al.
(2010), in whose classifications, collocations were further divided into grammatical
and lexical collocations. A grammatical collocation, according to Benson et al.
(2010), is a phrase consisting of a dominant word and a preposition or a gram-
matical structure; lexical collocations resemble those in Hausmann’s classifications,
which consist of nouns, adjectives, verbs and adverbs. In the classification put
forward by Benson et al. (2010), verb + noun collocations were further divided into
CA collocations (collocations containing a verb denoting creation/activation with a
noun) and EN collocations (collocations containing a verb denoting eradication/
nullification with a noun) (see Table 2.5 for classifications of lexical collocations).
12 Reasons for focusing on the three types of collocations will be given in Chap. 4.
2.2.4 Summary
This chapter has concerned itself with developing the definition of collocation and
introducing its classifications. We have reviewed three different approaches to
collocation: the psychological approach, which views collocation as psychological
association in the mental lexicon, the Firthian approach, regarding collocation as
words in syntagmatic relations in texts and the phraseological approach, aiming at
demarcating collocation and distinguishing it from other types of word
co-occurrences like free combinations and idioms. The phraseological approach is
mainly followed in this study, since an empirical study on collocations in learner
language requires a categorisation framework allowing them to be separated from
idioms and free combinations. Based on the criteria of semantic transparency,
specialised senses of words and commutability commonly adopted by phraseolo-
gists in demarcating collocation, a slightly refined definition of collocation is
proposed: collocations are combinations of two or more words which are charac-
terised by a restricted range of co-occurrence in at least one of their constituent
words and by relative transparency in meaning. In addition, based on collocation
classifications proposed by Hausmann (1989) and Benson et al. (2010), three types
of lexical collocations (verb + noun, adjective + noun and noun + noun collocations)
will be examined in this study. Having defined what is meant by a collocation,
the next chapter moves on to discuss previous studies on collocation learning by L2
learners.
References
Biber, D., Johansson, S., Leech, G., et al. (1999). Longman grammar of spoken and written
English. Harlow: Longman.
Bolinger, D. (1976). Meaning and memory. Forum Linguisticum, 1, 1–14.
Chi Man-lai, A., Wong Piu-yiu, K., & Wong Chau-ping, M. (1994). Collocational problems
amongst ESL learners: A corpus-based study. In L. Flowerdew & A. K. Tong (Eds.), Entering
text (pp. 157–165). Hong Kong: University of Science and Technology.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Church, K., Gale, W., Hanks, P., et al. (1991). Using statistics in lexical analysis. In U. Zernik
(Ed.), Lexical acquisition: Exploring on-line resources to build a lexicon (pp. 115–164).
Hillsdale: Erlbaum.
Church, K., & Hanks, P. (1990). Word association norms, mutual information, and lexicography.
Computational Linguistics, 16, 22–29.
Church, K., & Hindle, D. (1990). Collocational constraints and corpus-based linguistics. In
Working Notes of the AAAI Symposium: Text-Based Intelligent Systems.
Cowie, A. P. (1981). The treatment of collocations and idioms in learners’ dictionaries. Applied
Linguistics, 2(3), 223–235.
Cowie, A. P. (1991). Multiword units in newspaper language. In S. Granger (Ed.), Perspectives on
the English lexicon: A tribute to Jacques Van Roey (pp. 101–116). Louvain-la-Neuve: Cahiers
de l’Institut de Linguistique de Louvain.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud
& H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). London: Macmillan.
Cowie, A. P. (1998). Phraseology: Theory, analysis and applications. Oxford: Oxford University
Press.
Crystal, D. (1997). A dictionary of linguistics and phonetics (4th ed.). Oxford: Blackwell.
Fellbaum, C. (2007). Idioms and collocations: Corpus-based linguistic and lexicographic studies.
London: Continuum.
Firth, J. R. (1957). Papers in linguistics 1934–1951. London: Oxford University Press.
Fox, G. (1998). Using corpus data in the classroom. In B. Tomlinson (Ed.), Materials development
in language teaching (pp. 25–43). Cambridge: Cambridge University Press.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: Collocations and formulae.
In A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 145–160). Oxford:
Oxford University Press.
Greenbaum, S. (1970). Verb-intensifier collocations in English: An experimental approach. The
Hague: Mouton.
Greenbaum, S. (1974). Some verb-intensifier collocations in American and British English.
American Speech, 49(1, 2), 79–89.
Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. E. Bazell, J. C. Catford, M. A. K.
Halliday, et al. (Eds.), In memory of J. R. Firth (pp. 148–162). London: Longman.
Handl, S. (2008). Essential collocations for learners of English: The role of collocational direction
and weight. In F. Meunier & S. Granger (Eds.), Phraseology in foreign language learning and
teaching (pp. 43–65). Amsterdam: Benjamins.
Hausmann, F. J. (1989). Le dictionnaire de collocations. In F. J. Hausmann, O. Reichmann, H.
E. Wiegand, et al. (Eds.), Wörterbücher: ein internationales Handbuch zur Lexikographie.
Dictionaries. Dictionnaires (pp. 1010–1019). Berlin: De Gruyter.
Herbst, T. (1996). What are collocations: Sandy beaches or false teeth? English Studies, 77(4),
379–393.
Hoey, M. (1991). Patterns of lexis in text. Oxford: Oxford University Press.
Hoey, M. (2005). Lexical priming: A new theory of words and language. London: Routledge.
Howarth, P. (1996). Phraseology in English academic writing: Some implications for language
learning and dictionary making. Tübingen: Niemeyer.
Howarth, P. (1998a). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.),
Phraseology: Theory, analysis and applications (pp. 161–186). Oxford: Oxford University
Press.
Howarth, P. (1998b). Phraseology and second language proficiency. Applied Linguistics, 19(1),
24–44.
Hunston, S., & Francis, G. (2000). Pattern grammar. A corpus-driven approach to the lexical
grammar of English. Amsterdam: Benjamins.
Jackendoff, R. (1983). Semantics and cognition. Massachusetts: MIT Press.
Jackendoff, R. (1997). Twistin’ the night away. Language, 73, 543–559.
Johansson, S., & Hofland, K. (1989). Frequency analysis of English vocabulary and grammar.
Oxford: Oxford University Press.
Jones, S., & Sinclair, J. (1974). English lexical collocations: A study in computational linguistics.
Cahiers de Lexicologie, 23(2), 15–61.
Kaszubski, P. (2000). Selected aspects of lexicon, phraseology and style in the writing of Polish
advanced learners of English: A contrastive, corpus-based approach. [2011-10-13]. http://main.
amu.edu.pl/ przemka/rsearch.html.
Kelly, E. F., & Stone, P. J. (1975). Computer recognition of English word senses. Amsterdam:
North-Holland Publishing Company.
Kjellmer, G. (1987). Aspects of English collocations. In W. Meijs (Ed.), Corpus Linguistics and
Beyond: Proceedings of the Seventh International Conference on English Language Research
on Computerized Corpora (pp. 133–140). Amsterdam: Rodopi.
Kjellmer, G. (1991). A mint of phrases. In K. Aijmer & B. Altenberg (Eds.), English corpus
linguistics. Studies in honour of Jan Svartvik (pp. 111–127). London: Longman.
Kjellmer, G. (1994). A dictionary of English collocations: Based on the Brown corpus. Oxford:
Clarendon Press.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Lee, C. Y., & Liu, J. S. (2009). Effects of collocation information on learning lexical semantics for
near synonym distinction. Computational Linguistics and Chinese Language Processing, 14
(2), 205–220.
Leech, G. (1974). Semantics. Harmondsworth: Penguin.
Lewis, M. (2000). Teaching collocation: Further developments in the lexical approach. London:
Language Teaching Publications.
Louw, B. (1993). Irony in the text or insincerity in the writer? The diagnostic potential of semantic
prosodies. In M. Baker, G. Francis, & E. Tognini-Bonelli (Eds.), Text and technology
(pp. 157–176). Amsterdam: Benjamins.
McEnery, T., & Wilson, A. (1996). Corpus linguistics. Edinburgh: Edinburgh University Press.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. London: Routledge.
Meara, P. (1984). The study of lexis in interlanguage. In A. Davies, C. Criper, & A. P. R.
Howatt (Eds.), Interlanguage (pp. 225–235). Edinburgh: Edinburgh University Press.
Miyamoto, T. (2000). The Light Verb Construction in Japanese: The role of the verbal noun.
Amsterdam: Benjamins.
Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach. Oxford:
Clarendon Press.
Nattinger, J. R., & Decarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2004). What are collocations? In D. Allerton, N. Nesselhauf, & P. Skandera
(Eds.), Phraseological units: Basic concepts and their application (pp. 1–21). Basel: Schwabe.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Palmer, H. E. (1933). Second interim report on English collocations. Tokyo: Kaitakusha.
Partington, A. (1998). Patterns and meanings: Using corpora for English language research and
teaching. Amsterdam: Benjamins.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and
nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication
(pp. 191–226). London: Longman.
Philip, G. (2007). Decomposition and delexicalisation in learners’ collocational (mis)behaviour. In
Online Proceedings of Corpus Linguistics. [2014-01-12]. http://ucrel.lancs.ac.uk/publications/
cl2007/paper/170_Paper.pdf.
Renouf, A. (1987). Lexical resolution. In W. Meijs (Ed.), Corpus Linguistics and Beyond:
Proceedings of the Seventh International Conference on English Language Research on
Computerized Corpora (pp. 121–131). Amsterdam: Rodopi.
Renouf, A., & Sinclair, J. (1991). Collocational frameworks in English. In K. Aijmer & B.
Altenberg (Eds.), English corpus linguistics. Studies in honour of Jan Svartvik (pp. 128–143).
London: Longman.
Richards, J. C., & Schmidt, R. (2010). Longman dictionary of language teaching and applied
linguistics (4th ed.). London: Longman.
Sinclair, J. (1966). Beginning the study of lexis. In C. E. Bazell, J. C. Catford, & M. A. K.
Halliday, et al. (Eds.), In memory of J. R. Firth (pp. 410–430). London: Longman.
Sinclair, J. (1987). Collocation: A progress report. In R. Steele & T. Threadgold (Eds.), Language
topics: Essays in honour of Michael Halliday (Vol. 2, pp. 319–331). Amsterdam: Benjamins.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sinclair, J. (1995). Collins COBUILD English dictionary (2nd ed.). London: HarperCollins.
Sinclair, J. (1996). The search for units of meaning. Textus, 9(1), 75–106.
Sinclair, J. (1998). The lexical item. In E. Weigand (Ed.), Contrastive lexical semantics (pp. 1–24).
Amsterdam: Benjamins.
Sinclair, J. (2003). Reading concordances: An introduction. London: Pearson Education Limited.
Sinclair, J. (2004). Trust the text. London: Routledge.
Sinclair, J., & Fox, G. (1990). Collins COBUILD English grammar. London: Collins.
Sinclair, J., Jones, S., & Daley, R. (2004). English collocation studies: The OSTI report. London:
Continuum.
Stubbs, M. (1995a). Collocations and semantic profiles: On the cause of the trouble with
quantitative study. Functions of Language, 2(1), 23–55.
Stubbs, M. (1995b). Corpus evidence for norms of lexical collocation. In G. Cook & B. Seidlhofer
(Eds.), Principle & practice in applied linguistics: Studies in honour of H. G. Widdowson
(pp. 245–256). Oxford: Oxford University Press.
Stubbs, M. (1996). Text and corpus analysis: Computer-assisted studies of language and culture.
Oxford: Blackwell.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell.
Teubert, W. (2010). Meaning, discourse and society. Cambridge: Cambridge University Press.
Van Roey, J. (1990). French-English contrastive lexicology: An introduction. Louvain-la-Neuve:
Peeters.
Vinogradov, V. V. (1947). Ob osnovnuikh tipakh frazeologicheskikh edinits v russkom yazuike.
In A. P. Cowie (Ed.) (1998), Phraseology: Theory, analysis and applications. Oxford: Oxford
University Press.
Wang, D. (2011). Language transfer and the acquisition of English light verb + noun collocations
by Chinese learners. Chinese Journal of Applied Linguistics, 11(2), 107–125.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Yorio, C. A. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam
& L. K. Obler (Eds.), Bilingualism across the lifespan (pp. 55–72). Cambridge: Cambridge
University Press.
Chapter 3
Collocation Studies in Second-Language
Learner English
As with the collocation studies discussed in the previous chapter, the past decades
have seen a large volume of studies on collocation learning in an L2. The
purpose of this chapter is to review previous collocation studies in learner English.
It begins by addressing the methodologies commonly adopted in L2 collocation
studies, with a view to introducing the methodology employed in this
study; Sect. 3.2 presents major findings of previous research in this field.
A review of the studies on L2 collocation learning is inevitably combined with
studies on other prefabricated forms of language (cf. Granger’s (1998a) study on
collocations and formulae), since collocation as a linguistic phenomenon belongs to
a larger umbrella term—“formulaic language”—and in practice collocations are not
always carefully delimited from other types of word combinations (Nesselhauf
2005: 3). Therefore, in this literature survey, research on L2 learners’ knowledge of
restricted combinations of words (e.g. formulae, formulaic sequences, routines, etc.)
is briefly reviewed, with the main focus on studies of collocations produced by L2
learners.
elicited and spontaneous data is adopted and previous L2 collocation studies are
considered as either elicitation data- or spontaneous data-based.
verbs and NP objects, and then asked to indicate which verbs can combine with
which nouns. The other test (COLLEX) required testees to choose the correct
collocation from two lexical combinations: one correct collocation and one
pseudo-collocation (e.g. pay a visit and do a visit). Unlike translation, blank filling
and cloze tasks, whose main advantage is to test the productive collocational
knowledge of L2 learners, these elicitation tasks afford a direct assessment of L2
learners’ receptive collocational knowledge.
A tight control over what is elicited from L2 learners enables direct comparisons
of collocational evidence based on unified criteria. Comparisons can be made
between different elicitation tasks, between collocation performance of different
participants and between L2 learners’ receptive and productive collocation
knowledge. For example, different elicitation tasks administered to the same group
of learners can reveal the different strategies L2 learners adopt in producing collocations.
Bahns and Eldaw (1993) compared the collocation production of learners at the same
level in two tasks and reported that subjects did not perform significantly
better in a translation task than in a cloze task, even though they were able to
paraphrase the target collocations in translation sentences but not in cloze sen-
tences. Findings showed that the difference between the number of correctly
translated collocations and the number of correct collocates in the cloze task did not
reach statistical significance (ibid: 106). The explanation proposed by Bahns and
Eldaw (1993) was that collocations were not easy to paraphrase.
By controlling the set of collocations to be elicited, comparisons can also be
made between performances of different groups of participants, i.e. between
learners at different proficiency levels, learners of different L1 backgrounds and
learners in contrast with native speakers. Different collocation performances and
strategies in producing the same collocations were observed by Farghal and Obeidat
(1995) in their two groups of subjects: junior and senior English major L2 learners
(Group A) and language teachers of English (non-native speakers) (Group B). Both
groups were found to be seriously deficient in collocations (ibid: 315). With regard
to different strategies adopted by the two groups, they noted that Group B partic-
ipants resorted to paraphrasing as a strategy (25.1%) in the translation task more
than Group A (3.8%) in the blank filling test. The two groups are not comparable,
however, since the translation task administered to Group B allows more freedom of
paraphrasing than the blank filling test given to Group A; the comparison would
have been more informative if the two groups had been given similar elicitation tests.
Additionally, though there was a higher percentage of paraphrasing strategy
adopted by Group B, the target translations of collocations were not found to be
satisfactory. Examples of such paraphrases of collocations are food little fat for light
food, does not change for fast color (ibid: 325). The finding of this unnaturalness in
the produced collocations caused by paraphrasing in translation tests is consistent
with Bahns and Eldaw’s (1993) finding that some collocations cannot be readily
paraphrased.
Besides offering a comparison between learners of different proficiencies, col-
location performances of learners with different L1 backgrounds can be compared
in elicitation-based studies. In a translation task, Biskup (1992) reported that the
German learners were risk-takers and produced more variant collocations whereas
Polish learners produced more restricted collocations. Meanwhile, in terms of the
comparison between learners and native speakers, L2 learners’ receptive knowledge
was found to be poorer than that of native speakers, as in the judgment tasks conducted by
Siyanova and Schmitt (2008) and Granger (1998a).
Elicitation tasks also enable a comparison between L2 learners’ receptive and
productive knowledge over predefined test materials. Biskup (1990) discovered a
striking difference in Polish learners’ performances on L2-L1 and L1-L2 translation
tasks: their answers were 100% correct in the former but they had great difficulty in
the latter. Similar results were obtained by Marton (1977) in a pre-treatment and
post-treatment Polish–English translation test. The studies conducted by both
Biskup and Marton suggest that learners’ productive knowledge of collocation lags
far behind their receptive knowledge. On the one hand, their apparent ease with
collocation comprehension rests on the high degree of semantic transparency in
collocations; on the other, translation from L2 to L1 is always easier than the other
way round because L2 words are initially associated with their L1 translation equivalents to
access meaning, whereas translation from L1 to L2 requires concept mediation,
which leads to a stronger lexical association from L2 to L1 than from L1
to L2 (Kroll and Stewart 1994).
It is evident from the above discussion that elicitation data-based studies on L2
learners’ collocation performance possess several advantages difficult to obtain with
other types of data. Variables affecting subjects’ production can be clearly and
systematically controlled, so that collocations which happen “to
occur very rarely or not at all unless specifically elicited” (Yip 1995: 9) can be elicited
by the researchers. Meanwhile, by targeting the same set of collocations in elici-
tation tasks to different learners, comparisons can be performed between variables
researchers aim to investigate. However, one of the limitations of elicitation
data-based studies lies in the generalisability of the research findings to the broader
language proficiency of the participants. As Bahns and Eldaw (1993: 108)
acknowledged, the 15 collocations tested in the translation task were too small a
sample from which to generalise. Generalisability is also affected by
the artificiality of an experimental situation that “may lead learners to produce
language which differs widely from the type of language they would use naturally”
(Granger 1998b: 5). Take for example the following sentences in the blank-filling
and cloze tests designed by Schmitt et al. (2004) for tapping into learners’
knowledge of formulaic sequences:
(1) With reg_ to giving directions, you must know phrases like ‘Turn right at the
corner’. (concerning this certain thing) (answer: regard)
(2) The economy is sure to improve ___c__
a. in the long period
b. over a long time
c. in the long run
3.1 Methodologies Adopted in L2 Collocation Studies 39
1
Studies of collocations in L2 speech are very rare, although there are studies of other forms of
phraseological performance in speech (for example, Aijmer 2009; Crossley and Salsbury 2011; De
Cock 2011; De Cock et al. 1998; Foster 2001). Lexical bundles, formulaic sequences and routines
are their main study foci.
Schmitt 2008; Zhang and Gao 2006).2 Many learner corpus-based collocation
studies use the sub-corpora of the well-known learner corpus, the International
Corpus of Learner English (ICLE). For example, Granger (1998a) investigated the
uses of collocations and formulae by French learners of English in the French
sub-corpus of ICLE; the study of verb–noun collocations produced by
German-speaking learners was undertaken by Nesselhauf (2005) by using the
German sub-corpus of ICLE, and Siyanova and Schmitt (2008) focused on the
production of adjective–noun collocations by Russian learners using the Russian
sub-corpus of ICLE. Collocation studies not based on ICLE include those by
Zhang and Gao (2006) and Laufer and Waldman (2011). The large amount of learner
data recorded in these large-scale learner corpora allows for a description of L2
collocation production that is as comprehensive as possible. What is more, utilising
public corpora enables studies of the same corpora to be easily comparable and
replicable (Penke and Rosenbach 2007: 11).
With the development of corpus analysis techniques, another advantage of
learner corpus-based collocation studies is that data can be (semi-) automatically
extracted and processed. As in Howarth’s (1996, 1998a, b) study, the
machine-readable corpus is useful for a rapid check of the original context for
pre-extracted collocations. Nesselhauf (2005) performed a similar automatic anal-
ysis to check whether all instances of verbs that were found to be restricted had
actually been spotted by the manual analysis. Granger (1998a) used text-retrieval
software (TACT) to automatically retrieve all the words ending in ly from the NS
and NNS corpora and then manually sorted them according to pre-defined semantic
and syntactic criteria. In the process of extraction of verb + noun collocations,
Laufer and Waldman (2011) created concordances of the 220 most frequent
pre-generated nouns in an NS corpus, and manually identified the verbs to go with
these nouns in the NNS corpus. Yet it is open to question whether these 220 nouns are also
frequent in the NNS corpus. Selecting a set of predetermined words for retrieval
may therefore fail to give a full picture of learners’ collocation production.
To get a fairly comprehensive picture of L2 collocation uses, collocations
in this study are not confined to certain predefined node words. Furthermore, dif-
fering from some of the collocation studies where collocations are manually
identified (Howarth 1996, 1998a, b; Li and Schmitt 2010; Nesselhauf 2005;
Siyanova and Schmitt 2008), collocations will be semi-automatically extracted in
this study with the aid of text retrieval software.
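The retrieve-then-sort procedure described above can be illustrated with a minimal sketch. It does not reproduce the software or the procedure of any of the studies cited; the function name ly_concordance, the context width and the sample sentence are assumptions introduced purely for illustration. The sketch retrieves every word form ending in ly together with its immediate context, leaving the semantic and syntactic sorting to the analyst.

```python
import re


def ly_concordance(text, width=30):
    """Return (word, left context, right context) for every word form ending in -ly."""
    hits = []
    for match in re.finditer(r"\b[A-Za-z]+ly\b", text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((match.group(0).lower(), left, right))
    return hits


# Hypothetical sample text, not taken from any learner corpus.
sample = ("The two issues are closely linked and the habit is deeply rooted. "
          "They rely on it.")
hits = ly_concordance(sample)
for word, left, right in hits:
    # Keyword-in-context layout: left context right-aligned, node word in brackets.
    print(f"{left:>30} [{word}] {right}")
```

In this sample the verb rely is retrieved alongside closely and deeply, which suggests why automatic retrieval by word ending needs to be followed by manual sorting.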
Spontaneous data-based collocation studies also have certain disadvantages,
insofar as only productive knowledge rather than receptive knowledge can be
investigated; infrequent features are hard to examine even in fairly large corpora
2
The corpora in the studies of Laufer and Waldman (2011) contain 300,000 words, composed of
argumentative and descriptive essays by native speakers of Hebrew of different levels. The corpora
compiled by Lorenz (1999) record German learners’ argumentative writing totaling 200,000
words. Though it is not known whether these two corpora are publicly available, the large
quantities of data from multiple learners manifest great advantages compared with a corpus of
several essays of a limited number of L2 learners.
since they occur rarely unless specifically elicited. In other words, only the per-
formance of learners is investigated but not their competence (Granger 1998b;
Nesselhauf 2005; Yip 1995). However, learners’ performance can be taken as
indicating their phraseological competence, since as acknowledged by Ellis (1994:
13): “learners’ mental knowledge is not open to direct inspection; it can only be
inferred by examining samples of their performance”.
To summarise the methods used in investigation of L2 collocation knowledge,
previous research commonly explores two types of data: elicited and spontaneous
data, each possessing distinctive advantages in answering particular research
questions. Though elicitation-based studies enable direct observation and analysis
of L2 learners’ collocation production/comprehension of a set of pre-selected col-
locations, criticisms are levelled in terms of their naturalness and generalisability.
Spontaneous data-based studies examine L2 learners’ natural production of collo-
cations through using large learner corpora. On the one hand, the use of a large
learner corpus enables a comprehensive description of real language use; on the
other, the development of computer software greatly enhances efficiency in
retrieving and analysing collocations in learner corpora. The point of departure of
this investigation is thus learner corpus-based, using the publicly available Chinese
Learner English Corpus (CLEC). It aims to examine Chinese English learners’
productive knowledge rather than receptive knowledge. It also sets out to (semi-)
automatically extract all the collocations within a syntactic category (e.g. verb +
noun collocations) instead of focusing only on a number of pre-determined collo-
cations. In this way some disadvantages of spontaneous data-based studies can be
overcome.
Based on the two data types discussed in the previous section, collocation in an L2
has been extensively studied. Previous studies are varied in nature, as seen from a
wide range of differing task types, learner types and collocation types. Their
heterogeneity makes it difficult to compare the results of past studies (Paquot and
Granger 2012: 131). However, the overall picture that emerges through previous L2
collocation research is that collocation production constitutes a particularly problematic
domain in SLA, even for learners at an advanced level, compared with their
better receptive collocation knowledge (e.g. Biskup 1990; Gyllstad 2005; Marton
1977). Most importantly, L2 collocation studies indicate a collocation lag, where
collocation knowledge lags far behind the development of syntax and lexis. This
deficiency in collocation learning was recognised as early as the 1930s by Palmer
(1933) and is strongly upheld in later studies. In this section, major findings of prior
studies are presented.
3.2 Previous Findings from L2 Collocation Research 43
3
The findings of L2 collocation studies were neatly summarised into overuse, underuse and misuse
by Laufer and Waldman (2011) and Paquot and Granger (2012) in their review of L2 collocation
studies. This broad summarisation is employed in the present study.
2000). In Granger’s (1998a) study, the widely used combinations (e.g. closely
linked, deeply rooted) typically had a close translation equivalent in learners’ L1
French, but combinations non-congruent with their L1 were underused (e.g. com-
binations with highly, which is relatively much less frequent in French) (1998a:
148f). The same pattern is discerned in the use of discourse frames by French
learners of English. They were reported to massively overuse the active voice
frames which correspond to the uses of sentence introductory phrases in French
(e.g. We can see that…).
As Cobb (2003: 408) pointed out, what distinguishes L2 learners from NSs “is
the small number of precasts4 advanced learners have at their disposal, and the
extent to which these are used and overused”. The underlying reason for the
overuse and underuse phenomena that emerge in L2 learners’ collocation uses is
that learners tend to “‘cling on’ to certain fixed phrases and expressions which they
feel confident in using” (Granger 1998a: 156). These fixed phrases and expressions
become their ‘safe bets’ (ibid: 148), ‘islands of reliability’ (Dechert 1983: 184),
even referred to cutely as ‘lexical teddy bears’ (Hasselgren 1994: 237) or ‘collo-
cational teddy bears’ (Nesselhauf 2005: 69). Therefore, learners’ heavy reliance on
familiar collocations leads to overuse, while their avoidance of collocations they are unsure of
leads to underuse. These non-native features of L2 collocation production
are in fact not surprising since in the process of interlanguage development,
overuses and underuses of collocations are unavoidable phenomena, as is the case
with the use of grammatical structures or lexis. Additionally, the comparison of
collocational uses between NSs and NNSs will inevitably reveal less diversified
uses in learners since L2 learners not attaining native-like proficiency naturally
cannot reach a level on a par with NSs. This is where Contrastive Interlanguage
Analysis encounters criticism, to the effect that there tends to be an oversimplified
generalisation of learners’ overuse and underuse when their language is in direct
comparison with native speakers’ (Li 2009: 16). In other words, overuse and
underuse are hardly a problem specific to collocation. What is more important in L2
collocation studies is to investigate the forms of misuse and identify the underlying
difficulties that learners confront in collocation learning.
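Overuse and underuse are relative notions: they become visible only once learner and native-speaker frequencies are put on a common scale. A minimal sketch of such a comparison follows. The function names, the normalisation per 10,000 words and the counts are hypothetical and are not taken from any of the studies reviewed; published studies typically add a significance test, which is omitted here.

```python
def per_10k(count, corpus_size):
    """Frequency of an item normalised to occurrences per 10,000 words."""
    return count * 10_000 / corpus_size


def usage_ratio(learner_count, learner_size, native_count, native_size):
    """Learner rate divided by native rate: above 1 suggests overuse, below 1 underuse."""
    native_rate = per_10k(native_count, native_size)
    if native_rate == 0:
        raise ValueError("item does not occur in the native-speaker corpus")
    return per_10k(learner_count, learner_size) / native_rate


# Hypothetical counts for one word combination in two corpora of different sizes.
ratio = usage_ratio(learner_count=40, learner_size=200_000,
                    native_count=10, native_size=100_000)
label = "overuse" if ratio > 1 else "underuse" if ratio < 1 else "same rate"
print(f"ratio = {ratio:.2f} ({label})")
```

A ratio of this kind is purely descriptive; as noted above, a less diversified learner profile is to be expected whenever learner language is compared directly with native-speaker language.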
combinations where a verb takes a wider range of nouns (verbs like exert, perform,
reach). It was further suggested that more restricted collocations were learnt as
wholes whereas the less restricted ones were used creatively (ibid: 233).
Some researchers point out that the relative infrequency of individual colloca-
tions in input is a problem for L2 collocation learning. Henriksen (2013: 49) argues
that “collocations are more low-frequent than the words that make up the collo-
cations, and learners therefore mostly lack sufficient exposure to collocations”.
Exposure benefits the learning of L2 collocations as it does second-language learning
in general: the lab-based study of collocation learning by Durrant and
Schmitt (2010) has confirmed that frequent input helps the learning of collocations.
A large amount of collocation input thus contributes to collocation learning, as
language input does to the learning of other aspects of an L2, but it is not
sufficient. L2 learners do not pay attention to collocational relationships between
words even when they encounter collocations (Wray 2002). Moreover, unlike native
speakers, L2 learners are influenced by their mother tongue in
both collocation learning and production, and L1 influence is a
significant factor commonly linked to (mostly erroneous) collocation
production in L2 collocation studies. The next section will discuss the role of
learners’ mother tongue in the learning of L2 collocations.
5
There is evidence that for advanced L2 learners, L1 influence plays a marginal role in the
acquisition of word formation devices (Olshtain 1987), but L1 is believed to play a larger role in
lexis.
by Polish and German learners and compared their collocation errors in terms of
cross-linguistic influence. She found that for Polish learners of English, the errors
were loan translations or extension of L2 meaning on the basis of the L1 words,
whereas German learners tended to produce errors resulting from assumed formal
similarity, e.g. to crack nuts as *to crunch nuts. Biskup (1992) interpreted these two
types of L1 influence as the perceived differences between languages on the part of
learners: the Polish learners saw a distance between Polish and English and thus did
not assume much formal similarity, whilst German learners assumed more formal
similarity between their mother tongue and English. Farghal and Obeidat (1995)
analysed the tendencies of lexical simplification that learners followed in two
elicitation tasks: blank filling and a translation task. Four strategies that learners
adopted in producing collocations were distinguished: synonymy, avoidance,
transfer and paraphrasing, among which transfer accounted for 9.9% and 12.9% of all
attempted collocations in the two groups of learners. However, some caution is
needed here since “avoidance” as a strategy is a complex phenomenon, and it is not
clear whether subjects in Farghal and Obeidat’s study knew the target collocations
but preferred avoiding them and used other forms instead. One example given is
light food,6 for which learners produced soft food, little food, quick meal, etc. To
call this strategy avoidance rather than transfer is questionable since avoidance is
one manifestation of language transfer (Ellis 1994). So the percentage of L1 transfer
might make up an even larger proportion in Farghal and Obeidat’s data. L1
influence was also confirmed in Martelli’s (2006) study, in which it played a relevant
role in the generation of wrong lexical collocations. However, unlike in Farghal and
Obeidat (1995), the proportion of L1-induced errors was not quantified in Martelli’s
study.
The traces of L1 in erroneous collocation uses have been investigated and
quantified, with findings showing that L1-influenced errors make up a large proportion
of errors even for learners at advanced levels. In the erroneous uses of verb–noun
collocations by Chinese learners of English with different proficiency levels, Zhang
and Gao (2006) noticed a varied proportion of L1-influenced errors, from nearly
one third to more than a half. Likewise, L1-influenced errors were the most frequent type in
Nesselhauf’s (2003, 2005) studies, where L1 influence occurred in about half of the non-native
collocations. A higher percentage of L1-induced errors (over 60% of
the errors produced by intermediate and advanced learners) was identified by Laufer
and Waldman (2011), and the number of L1-induced errors was not found to
decrease over time. Apart from L1-transfer errors, another consequence of learners’ heavy
reliance on their mother tongue in collocation production is the overuse of certain
collocations that are similar in the two languages and the underuse of patterns that
are mismatched in the two languages (cf. Sect. 3.2.1.1).
6
Light food is actually not a target-like collocation. Light meal is a native-like one.
7
Collocation priming is “the tendency for an activated word to accelerate the subsequent recog-
nition of a collocate” (Wolter and Gyllstad 2011: 431).
In the meantime, the RHM proposes that the organisation of bilingual memory
changes with rising L2 proficiency, in the form of developing the ability to con-
ceptually process L2 words directly, without mediation of L1 translation equiva-
lents. However, studies show that even for proficient learners, the L1 translation
equivalent is activated when processing the L2 word for meaning access (Thierry
and Wu 2007).8 Thus, if the acquisition of L2 lexical knowledge initially clings
onto L1 lexical/conceptual networks as the RHM predicts, it seems highly likely
that in the production process, the L1 is first activated prior to the production of L2
words (cf. the ‘dual-activation’ in Wolter and Gyllstad (2011) discussed above).
That is where L1 transfer begins, yet not all features of the L1 are activated and
transferred to the L2. According to Jiang’s (2000) psychological model of L2
vocabulary acquisition for late bilinguals (see Fig. 3.2), a majority of L2 words
fossilise at the first language lemma mediation stage, when the lemma information
of the L1 (containing semantic and syntactic information) is copied into the L2
lexical entry whilst lexical information at the lexeme level (containing morpho-
logical and phonological/orthographic specifications) is stored in the L2 lexical
entry.
The semantics and syntax of an L2 are quite likely to be influenced by learners’
L1. Considering the nature of collocations—the arbitrary co-selection of word
combinations, collocations are primarily word combinations representing syntactic
8
Kroll et al. (2010) argue that proficient bilinguals may access the translation equivalent after they
understand the meaning of the L2 word. The exploration of intricacies of this debate on whether
highly proficient bilinguals access meaning for L2 words through the mediation of their L1 is
beyond the scope of the present study and won’t be discussed further.
Fig. 3.2 Jiang’s model of fossilised L2 lexical knowledge (cited in Wolter and Gyllstad 2011:
446)
Laufer and Waldman (2011) showed a much stronger collocation lag, as the
advanced and the intermediate learners produced significantly more erroneous
collocations than the basic learners. Similar results were obtained in Obukadeta’s
study as discussed in Chap. 1. Despite the heterogeneity in L2 collocation studies,
these results clearly indicate a noteworthy lack of positive correlation between
general language proficiency and collocation knowledge.
The question of whether collocation knowledge can be related to general proficiency
is not of crucial importance here, since on the one hand, it is difficult to
establish a clear link between language proficiency and phraseological competence,
the former of which is usually loosely measured in terms of the number of years of
English instruction for research purposes (Paquot and Granger 2012: 137); on the
other, it is evident that collocation lags behind other aspects of L2 knowledge and
‘may floor even the proficient non-native’ (Wray 2000: 463). Thus, what is cen-
trally important is to investigate what factors are associated with collocation lag.
The poor phraseological performance of even advanced learners has been
explained in terms of a lack of awareness of collocational relationships between
words, i.e. learners do not pay attention to collocational relationships, and collocations
“are initially seen as compositional combinations of words rather than as a
phenomenon of co-selection” (Philip 2007: 3). Studies testing learners’ intuition
about collocations that are frequent in the L2 show a weak sense of collocational
relationships (Channell 1981; Granger 1998a; Siyanova and Schmitt 2008). For
example, in examining the collocational competence of a group of eight advanced
learners who were asked to mark the acceptable collocates of adjectives from a list
of nouns, Channell (1981: 120) found that “learners fail to realise the potential even
of words they know well, because they only use them in a limited number of
collocations of which they are sure”. To test whether French learners of English had
an underdeveloped sense of what constitutes a significant collocation, Granger
(1998a) used a word-combination test in which subjects were asked to choose all
the adjectives which collocated with 11 amplifiers ending in ly and functioning as
modifiers (e.g. highly, bitterly). Learners marked over 100 fewer frequent collo-
cations than the native speakers, providing clear evidence of learners’ weak sense of
collocations compared with that of native speakers (Granger 1998a: 152). In a
similar vein, the L2 learners in Siyanova and Schmitt (2008) rated native-like collocations
as far less frequent, and atypical collocations as more frequent, than
NSs did. This failure to notice collocational relationships in language input naturally leads to
production which is “subject to whatever interlanguage rules the learner is operating
under” (Yorio 1989: 62). A typical illustration of this process from inability to
recognise a collocation to utilising interlanguage rules is given by Wray (2002:
209):
… the adult language learner, on encountering major catastrophe, would break it down into
a word meaning ‘big’ and a word meaning ‘disaster’ and store the words separately,
without any information about the fact they went together. When the need arose in the
future to express the idea again, they would have no memory of major catastrophe as the
pairing originally encountered, and any pairing of words with the right meaning would
seem equally possible: major, big, large, important, considerable, and so on, with catas-
trophe, disaster, calamity, mishap, tragedy, and the like.
Wray’s explanation of the way learners treat and produce collocations is con-
sistent with Wolter’s (2006: 746) claim that “the process of building syntagmatic
connections between words in an L2 appears to be considerably harder than the
process for building paradigmatic connections”. Furthermore, the acquisition of
new words may interfere with the production of collocations in the selection of
appropriate collocates from a set of related words. One illustration of semantic
relatedness is synonymy, and the use of synonyms has been identified as the most
frequent strategy adopted in producing collocations (Farghal and Obeidat 1995;
Irujo 1993). Thus it seems that L2 learners’ vocabulary size is closely linked with
their collocation learning, though most probably in a negative way. The study
conducted by Gyllstad (2005: 1), for example, suggests that “learners with large
vocabularies have a better receptive command of verb + NP collocations than
learners with smaller vocabularies”. Yet the relationship between vocabulary size
and the production of collocations, which is important for an understanding of L2
collocation learning, remains unclear, and to date there is a paucity of
research into the relationship between vocabulary increase and collocation
production.
Therefore, in order to uncover the underlying factors inhibiting L2 learners’
collocation learning, this study seeks to investigate the relationship between
vocabulary growth and collocation learning, particularly the increase in vocabulary
in a set of semantically related words.
3.3 Summary
This chapter has reviewed the growing body of research that has been undertaken in
the past several decades into collocation learning by L2 learners. A wide range of
data types has been utilised in past research, including elicitation and spontaneous
data, or a mixture of the two types. Although each data type has its own
advantages, learner corpora are gaining more and more popularity owing to the
naturalness of the learner language they record and the large quantities of data
they provide. These distinctive advantages of
learner corpora will be further explored, as the present study will utilise a corpus of
written English produced by Chinese EFL learners, in order to investigate their
collocation performance and vocabulary increase.
Past research into L2 collocations has covered many types of learners,
including learners of different mother tongues and different proficiency groups. The
research foci have been different types of collocations, such as verb + noun collocations,
adjective + noun collocations, lexical phrases, lexical bundles and routines. Given
this heterogeneity in L2 collocation research, direct comparisons of research findings
are difficult, but a general picture of the learning of collocations
by L2 learners emerges, i.e. collocation learning poses special difficulty for L2 learners, as
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and
non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31
(2), 81–92.
Aijmer, K. (2009). “So er I just sort of I dunno I think it’s just because…”: A corpus study of “I
don’t know” and “dunno” in learner spoken English. In A. H. Jucker, D. Schreier, & M. Hundt
(Eds.), Corpora: Pragmatics and discourse (pp. 151–166). Amsterdam: Rodopi.
Al-Zahrani, M. S. (1998). Knowledge of English lexical collocations among male Saudi college
students majoring in English at a Saudi university. Ann Arbor, MI: UMI.
Bahns, J. (1993). Lexical collocations: A contrastive view. ELT Journal, 47(1), 56–63.
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1),
101–114.
Barfield, A. (2007). An exploration of second language collocation knowledge and development.
Swansea: University of Swansea.
Biskup, D. (1990). Some remarks on combinability: Lexical collocations. In J. Arabski (Ed.),
Foreign language acquisition papers (pp. 31–44). Katowice: Uniwersytet Slaski.
Biskup, D. (1992). L1 influence on learners’ renderings of English collocations: A Polish/German
empirical study. In P. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics
(pp. 85–93). London: Macmillan.
Bonk, W. (2001). Testing ESL learners’ knowledge of collocations. In T. Hudson & J. D. Brown
(Eds.), A focus on language test development: Expanding the language proficiency construct
across a variety of tests (pp. 133–142). Technical Report 21. Honolulu: University of Hawai’i,
Second Language Teaching and Curriculum Center.
Channell, J. (1981). Applying semantic theory to vocabulary teaching. ELT Journal, 35(2),
115–122.
Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three
European studies. Canadian Modern Language Review, 59(3), 393–423.
Crossley, S. A., & Salsbury, T. (2011). The development of lexical bundle accuracy and
production in English second language speakers. International Review of Applied Linguistics in
Teaching, 49, 1–26.
Dechert, H. W. (1983). How a story is done in a second language. In C. Faerch & G. Kasper
(Eds.), Strategies in interlanguage communication (pp. 175–196). London: Longman.
De Cock, S. (2011). Preferred patterns of use of positive and negative evaluative adjectives in
native and learner speech: An ELT perspective. In A. Frankenberg-Garcia, L. Flowerdew, & G.
Aston (Eds.), New trends in corpora and language learning (pp. 198–212). London:
Continuum.
De Cock, S., Granger, S., Leech, G., et al. (1998). An automated approach to the phrasicon of EFL
learners. In S. Granger (Ed.), Learner English on computer (pp. 67–79). London: Longman.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of
collocations? International Review of Applied Linguistics, 47, 157–177.
Durrant, P., & Schmitt, N. (2010). Adult learners’ retention of collocations from exposure. Second
Language Research, 26(2), 163–188.
Ellis, R. (1987). Second language acquisition in context. Hertfordshire: Prentice Hall.
Ellis, R. (1994). The study of second language acquisition. Oxford: Oxford University Press.
Fan, M. (2009). An exploratory study of collocational use by ESL students—A task based
approach. System, 37(1), 110–123.
Farghal, M., & Obeidat, H. (1995). Collocations: A neglected variable in EFL. International
Review of Applied Linguistics in Language Teaching, 33(4), 315–331.
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language
production of native and non-native speakers. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Researching pedagogic tasks: Second language learning, teaching and testing (pp. 75–93).
Harlow: Longman.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of
collocational knowledge. San Francisco: International Scholars Publications.
Granger, S. (1998a). Prefabricated patterns in advanced EFL writing: Collocations and formulae.
In A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 145–160). Oxford:
Oxford University Press.
Granger, S. (1998b). The computer learner corpus: A versatile new source of data for SLA
research. In S. Granger (Ed.), Learner English on computer (pp. 3–18). London: Longman.
Granger, S. (2002). A bird’s eye view of learner corpus research. In S. Granger, J. Hung, & S.
Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign
language teaching (pp. 3–33). Amsterdam: Benjamins.
Gyllstad, H. (2005). Words that go together well: Developing test formats for measuring learner
knowledge of English collocations. In F. Heinat & E. Klingval (Eds.), The Department of
English in Lund: Working papers in linguistics (Vol. 5, pp. 1–31).
Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into the ways
Norwegian students cope with English vocabulary. International Journal of Applied
Linguistics, 4(2), 237–258.
Henriksen, B. (2013). Research on L2 learners’ collocational competence and development—A
progress report. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition,
knowledge and use: New perspectives on assessment and corpus analysis (pp. 29–56). Eurosla.
[2014-03-10]. http://www.eurosla.org/monographs/EM02/EM02tot.pdf.
Hoffman, S., & Lehmann, H. M. (2000). Collocational evidence from the British National Corpus.
In J. M. Kirk (Ed.), Corpora Galore: Analyses and Techniques in Describing English. Papers
from the Nineteenth International Conference on English Language Research on
Computerised Corpora (ICAME 1998) (pp. 17–32). Amsterdam: Rodopi.
Howarth, P. (1996). Phraseology in English academic writing: Some implications for language
learning and dictionary making. Tübingen: Niemeyer.
Howarth, P. (1998a). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.),
Phraseology: Theory, analysis and applications (pp. 161–186). Oxford: Oxford University
Press.
Howarth, P. (1998b). Phraseology and second language proficiency. Applied Linguistics, 19(1),
24–44.
Hsu, J. (2007). Lexical collocations and their relation to the online writing of Taiwanese college
English majors and non-English majors. Electronic Journal of Foreign Language Teaching, 4
(2), 192–209.
Irujo, S. (1993). Steering clear: Avoidance in the production of idioms. International Review of
Applied Linguistics in Language Teaching, 31(3), 205–219.
Jiang, N. (2000). Lexical representation and development in a second language. Applied
Linguistics, 21(1), 47–77.
Jiang, N. (2002). Form-meaning mapping in vocabulary acquisition in a second language. Studies
in Second Language Acquisition, 24, 617–637.
Kaszubski, P. (2000). Selected aspects of lexicon, phraseology and style in the writing of Polish
advanced learners of English: A contrastive, corpus-based approach. [2011-10-13]. http://main.
amu.edu.pl/ przemka/rsearch.html.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming:
Evidence for asymmetric connections between bilingual memory representations. Journal of
Memory and Language, 33, 149–174.
Kroll, J. F., Van Hell, J. G., Tokowicz, N., et al. (2010). The Revised Hierarchical Model: A
critical review and assessment. Bilingualism: Language and Cognition, 13(3), 373–381.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Li, J., & Schmitt, N. (2010). The development of collocation use in academic texts by advanced L2
learners: A multiple case study approach. In D. Wood (Ed.), Perspectives on formulaic
language: Acquisition and communication (pp. 23–46). London: Continuum.
Li, W. Z. (2009). A critical review of CIA. CAFLEC, 127, 13–17.
Lorenz, G. (1999). Adjective intensification—Learners versus native speakers: A corpus study of
argumentative writing. Amsterdam: Rodopi.
Martelli, A. (2006). A corpus based description of English lexical collocations used by Italian
advanced learners. In E. Corino, C. Marello, & C. Onesti (Eds.), Proceedings XII EURALEX
International Congress (pp. 1005–1011). Alessandria: Edizioni dell’Orso.
Marton, W. (1977). Foreign vocabulary learning as problem no. 1 of language teaching at the
advanced level. Interlanguage Studies Bulletin, 2, 33–57.
Men, H. Y. (2010). A corpus-based analysis of Chinese EFL learners’ adverb/adjective collocation
errors in English writing. Internet Fortune, (4), 108–109.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston, MA: Heinle & Heinle.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Olshtain, E. (1987). The acquisition of new word formation processes in second language
acquisition. Studies in Second Language Acquisition, 9(2), 221–231.
Palmer, H. E. (1933). Second interim report on English collocations. Tokyo: Kaitakusha.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of
Applied Linguistics, 32, 130–149.
Penke, M., & Rosenbach, A. (2007). What counts as evidence in linguistics: The case of
innateness. Amsterdam: Benjamins.
Philip, G. (2007). Decomposition and delexicalisation in learners’ collocational (mis)behaviour. In
Online Proceedings of Corpus Linguistics. [2014-01-12]. http://ucrel.lancs.ac.uk/publications/
cl2007/paper/170_Paper.pdf.
Pu, J. Z. (2010). Corpora and unified language studies. Journal of PLA University of Foreign
Languages, 33(2), 41–44.
Salkie, R. (2002). Two types of translation equivalence. In B. Altenberg & S. Granger (Eds.), Lexis
in contrast: Corpus-based approaches (pp. 51–71). Amsterdam: Benjamins.
Scarcella, R. (1979). Watch up!: A study of verbal routines in adults second language
performance. Working Papers on Bilingualism, 19, 79–88.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt
(Ed.), Formulaic sequences: Acquisition, processing and use (pp. 1–22). Amsterdam:
Benjamins.
Schmitt, N., Dörnyei, Z., Adolphs, S., et al. (2004). Knowledge and acquisition of formulaic
sequences: A longitudinal study. In N. Schmitt (Ed.), Formulaic sequences: Acquisition,
processing and use (pp. 55–86). Amsterdam: Benjamins.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A
multi-study perspective. Canadian Modern Language Review, 64(3), 429–458.
Spottl, C., & McCarthy, M. (2004). Comparing knowledge of formulaic sequences across L1, L2,
L3 and L4. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing and use
(pp. 191–225). Amsterdam: Benjamins.
Stubbs, M. (2001). Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell.
Thierry, G., & Wu, Y. J. (2007). Brain potentials reveal unconscious translation during
foreign-language comprehension. Proceedings of the National Academy of Sciences of the
United States of America, 12530–12535.
Widdowson, H. G. (1979). Explorations in applied linguistics. Oxford: Oxford University Press.
Widdowson, H. G. (2000). On the limitations of linguistics applied. Applied Linguistics, 21(1),
3–25.
Willis, D. (2010). Three reasons why. In S. Hunston & D. Oakey (Eds.), Introducing applied
linguistics: Concepts and skills (pp. 6–11). London: Routledge.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1
lexical/conceptual knowledge. Applied Linguistics, 27(4), 741–747.
Wolter, B., & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the influence
of L1 intralexical knowledge. Applied Linguistics, 32(4), 430–449.
Wray, A. (2000). Formulaic sequences in second language teaching: Principles and practice.
Applied Linguistics, 21(4), 463–489.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Yamashita, J., & Jiang, N. (2010). L1 influence on the acquisition of L2 collocations:
Japanese ESL users and EFL learners acquiring English collocations. TESOL Quarterly, 44(4),
647–668.
Yip, V. (1995). Interlanguage and learnability: From Chinese to English. Amsterdam: Benjamins.
Yorio, C. A. (1989). Idiomaticity as an indicator of second language proficiency. In K. Hyltenstam
& L. K. Obler (Eds.), Bilingualism across the lifespan (pp. 55–72). Cambridge: Cambridge
University Press.
Zhang, W. Z., & Chen, S. C. (2006). EFL learners’ acquisition of English adjective-noun
collocations—A quantitative study. Foreign Language Teaching and Research, 38(4),
251–258.
Zhang, X. (1993). English collocations and their effect on the writing of native and non-native
college freshmen. Indiana: Indiana University of Pennsylvania.
Zhang, Y., & Gao, Y. (2006). A CLEC-based study of collocation acquisition by Chinese English
language learners. CELEA Journal, 29(4), 28–35.
Chapter 4
Research Design
The main focus of this corpus-based research is to examine the relationship between
vocabulary increase and collocation use by Chinese learners of English, with the
aim of testing whether vocabulary increase is associated with the collocation lag in the
field of second language acquisition. This chapter outlines the design of such a
cross-sectional study. Section 4.1 briefly states the research purpose and research
questions; Sect. 4.2 presents the types of collocations to be targeted in learners’
writings and justifies why verb + noun, adjective + noun and noun + noun collocations
are chosen rather than other types of collocations; Sect. 4.3 introduces the
learner corpus used, the Chinese Learner English Corpus; Sect. 4.4 explains the
selection of two collocation dictionaries to be referenced; Sect. 4.5 briefly introduces
the British National Corpus as a native speaker reference corpus to check the
acceptability and appropriateness of combinations; Sect. 4.6 lists the software
adopted for automatic data collection, for the creation of databases, and for statistical
analyses. The main procedure of the study is presented in Sect. 4.7, followed
by a summary of this study design (Sect. 4.8).
The aim of this study is to examine the relationship between vocabulary growth and
the production of L2 collocations. As was discussed in Chap. 1, verb + noun
collocations are targeted and the growth of verbs is examined from two perspec-
tives: from delexical verbs to lexical verbs and the increase in verbs within a
synonym set. The study sets out to answer the following questions regarding VN
collocations:
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 59
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_4
4.2 The Selection of Verb + Noun, Adjective + Noun and Noun + Noun Collocations
As was presented in Sect. 2.2.3, lexical collocations are divided into six types:
adjective + noun; (subject-) noun + verb; noun + noun; adverb + adjective; verb +
adverb; verb + (object-) noun (Benson et al. 2010; Hausmann 1989). Among these,
verb + noun collocations were primarily targeted because they are the most frequent
and important (Benson et al. 2010; Howarth 1996, 1998a) and at the same
time constitute a frequent source of difficulty for L2 learners (Bahns and Eldaw
1993; Benson 1985; Biskup 1990; Cowie 1991, 1992; Gitsaki 1999; Howarth
1998a, b; Nesselhauf 2005; Palmer 1933). Furthermore, verb + noun collocations
are the most frequent type of collocation errors in the learner corpus we are going to
investigate (1572 out of 2940 tokens of collocation errors made by six levels of
Chinese learners of English).
However, in VN collocations, verbs and nouns were not given equal weight in
this study. The focus was on verbs, based on the assumption that it is the nouns
(nodes) that determine the verbs to go with.1 In both speech and writing, words are
produced and arranged in a linear sequence, which makes the production process
seem as if the preceding words select the following ones (verbs/adjectives select the
following nouns). In fact, it is the noun that language users generally start from
when forming ideas (Cowie 1998; McIntosh et al. 2009), and “the most important
kind of collocations sought by a writer or translator is the one based on the noun, for
it is the noun that sets the semantic context of the sentence” (Kozlowska and
Dzierzanowska 1988: 8). Taking lingering doubt as an example, doubt
selects an acceptable adjective, lingering but not loitering (Cowie 1998: 222–223).
The selection of the verb or the adjective by the noun is reflected in the organisation
of collocation dictionaries like Selected English Collocations (Kozlowska and
Dzierzanowska 1988), where nouns are the headwords. Therefore, verbs in VN
collocations produced by L2 learners will be investigated since the appropriate verb
has to be chosen to collocate with the noun previously selected (e.g. acquire
knowledge but not *grasp knowledge, play a role but not *occupy a role).2
In addition to verb + noun collocations, two other types of collocations were
briefly examined in the same way as VN collocations to compare with findings
from analyses of VN collocations. They are adjective + noun and noun + noun
collocations, which are the top two most frequent word combinations used by
native speakers of English (Johansson and Hofland 1989). They are not only fre-
quent, but also susceptible to error in L2 collocation performance. Adjective + noun
and noun + noun collocation errors are the second and third most frequent types of
collocation errors according to the error analysis of Chinese Learner English Corpus
(henceforth referred to as CLEC) (Gui and Yang 2003). Yet these error types have
scarcely been focused on in previous L2 collocation studies (except in the studies
1 A similar approach is taken by Bahns and Eldaw (1993: 103), who viewed the noun as the
node and the verb as the collocate. Howarth (1998a), on the contrary, cast doubt on the direction of
selection from the noun to the verb, citing the erroneous collocations *the contrast is drawn and
*place weight on, and speculated that it is the other way around. However, the two examples merely
indicate that the nouns (contrast and weight) were preselected but the collocating verbs (draw and
place) were wrongly chosen.
2 In fact, in the process of extracting verb + noun collocations, we found that wrong choices
of nouns were rather rare and there were only a few instances such as *solve the question. This
will be exemplified in Sect. 4.7.2.
4.3 The Learner Corpus—CLEC
The present study focuses on the developmental patterns of Chinese EFL learners’
use of collocations and thus requires a longitudinal corpus of the performance of
Chinese learners over an extended period of time, or else a corpus of the perfor-
mance of Chinese learners at different proficiency levels in English, i.e. an
apparent-time approach to the study of language development. A longitudinal
corpus would have been much preferred, but none was available at the time of
the research. However, a corpus of the written performance of learners at different
proficiency levels makes an apparent-time study possible. This study made use of
the one-million-word Chinese Learner English Corpus, a computerised textual
database of writings by Chinese learners at five different levels of proficiency.
CLEC is homogeneous in the sense that all learners are Chinese learners of English;
at the same time it is heterogeneous because it represents learners at different
developmental stages. The apparent-time design assumes that the performance of
different age groups of learners at different proficiency levels is indicative of successive
stages of development. The inclusion of learners at five learning stages is a
distinct advantage of CLEC, one that other published learner corpora do not
offer. Other widely used and more recent corpora recording the written
performance of Chinese learners include the Spoken and Written English Corpus of
Chinese Learners (SWECCL) (Wen et al. 2008), the Chinese sub-corpus of ICLE
(Granger et al. 2009) and the British Academic Written English Corpus (BAWE)
(Nesi 2011). SWECCL documents the spoken and written data of English-major
university students, and its written component covers English majors of four grades. The
Chinese sub-corpus of ICLE contains argumentative essays written by higher
intermediate to advanced Chinese learners of English at universities, and BAWE
contains texts from proficient Chinese undergraduate students studying in several
UK universities. Though these learner corpora are newer and some of them are
larger than CLEC, they cover learners of only a fixed or limited range of proficiency
levels. Learners below the university level are not targeted in these corpora. CLEC
was therefore chosen for this study.
We next provide a brief introduction to CLEC. The Chinese Learner English
Corpus is a one-million-word collection of compositions produced by Chinese
learners of English at five developmental phases: high school students (coded by the
developers as ST2, approximately corresponding to the upper levels of secondary
school students in the UK), non-English major university students of lower grades
(ST3) and higher grades (ST4), and English majors of lower grades (the first and
second years, ST5) and higher grades (the third and fourth years, ST6) (Gui and
Yang 2003). The writings of the five groups of learners were recorded in separate
files, each approximating 200,000 words. For the convenience of discussion about
the five sub-corpora, it is preferable to achieve uniformity between learner types and
the corresponding files of their writings. Therefore, the files were named as follows:
• ST2: high school students; the written performance by ST2 learners;
• ST3: first- and second-year non-English major university students; the written
performance by ST3 learners;
• ST4: third- and fourth-year non-English major university students; the written
performance by ST4 learners;
• ST5: first- and second-year English majors; the written performance by ST5
learners;
• ST6: third- and fourth-year English majors; the written performance by ST6
learners.
The project was undertaken by teachers of English in various universities across
three cities (Guangzhou, Shanghai, Xinxiang) in China, so the learners targeted in
CLEC were from several middle schools/universities rather than from one particular
institution. The sub-corpora of ST2, ST5 and ST6 are made up of learners’ free
compositions, whereas ST3 and ST4 consist of timed writings for tests (the national
general English proficiency tests: Band 4 and Band 6). In terms of text types, the
assignments for the ST2 group were mainly narrative (probably because it is still too early for
Chinese middle school students to develop argumentative writing skills), and their
writings were not confined to one topic. The ST3 and ST4 sub-corpora contain
argumentative essays. The texts produced by ST5 learners are not confined
to one or two topics and are partly argumentative and partly narrative. The ST6
sub-corpus contains argumentative essays, and most of the topics are those that the ICLE
suggested in its data collection process, such as “Should euthanasia be legalised
in China?”, “Crime does not pay”, “the abolition of prison systems”, “the value of
university degrees” and “Should a man/woman’s financial reward be commensurate
with their contribution to the society they live in?”.
Not all five sub-corpora were examined in this research. As noted earlier, the
sub-corpora of ST2, ST5 and ST6 are made up of learners’ free compositions,
whereas ST3 and ST4 consist of timed writings for tests. Only the ST2, ST5 and
ST6 learner files were used, as they contain the same data type, free compositions.
It is therefore assumed that these learners were not subject to the time and mental
pressure of timed writing for tests and could turn to reference tools for help in the
writing process. The
three groups of learner data are thus homogeneous. It is furthermore generally
assumed that the quantity of formal English instruction learners receive is indicative
of their proficiency in English. Thus, it is possible to conduct an apparent-time
study on Chinese EFL learners’ collocation performance on the basis of the years of
instruction they have received. In light of the components of CLEC, a clear dividing line in
English proficiency is observable, i.e. pre-university Chinese EFL learners (ST2)
and university-level learners (ST3, ST4, ST5 and ST6). Likewise, university-level
students can be further divided into non-English majors (ST3 and ST4) and English
majors (ST5 and ST6). However, it cannot be claimed that the proficiency of ST3,
ST4, ST5 and ST6 learners forms a continuum because of the difference in majors
(non-English major vs. English major). It is possible that some non-English majors
are better than English majors in overall English performance. Consequently,
the intensity of English instruction is not used as a criterion to distinguish the
proficiency levels of non-English and English majors, even though the latter might
be better than the former in general.
For these reasons, three groups of learners were examined in this study: ST2—
pre-university high school students, categorised as the “basic” level, ST5—English
majors of lower grades as the “intermediate” level, and ST6—English majors of
higher grades as the “advanced” level. The classification of the levels, as discussed
above, is based on the years of English instruction learners receive, and is mainly
adopted for straightforward comparison. The ST2 learners had at least 3–5 years’
classroom English instruction in China; the ST5 group had at least 6–7 years and
the ST6 learners had English instruction for at least 8–9 years.
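The selection of sub-corpora described above can be summarised in a small data structure. The following Python sketch is only an illustration: the class name SubCorpus, its field names and the label strings are assumptions introduced here, whereas the sub-corpus codes, data types and proficiency labels are those given in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubCorpus:
    """One CLEC sub-corpus (field names are hypothetical, for illustration)."""
    code: str       # code assigned by the CLEC developers, e.g. "ST2"
    learners: str   # learner group as described in this section
    data_type: str  # "free composition" or "timed test writing"


SUBCORPORA = [
    SubCorpus("ST2", "high school students", "free composition"),
    SubCorpus("ST3", "first- and second-year non-English majors", "timed test writing"),
    SubCorpus("ST4", "third- and fourth-year non-English majors", "timed test writing"),
    SubCorpus("ST5", "first- and second-year English majors", "free composition"),
    SubCorpus("ST6", "third- and fourth-year English majors", "free composition"),
]

# Proficiency labels adopted in this study for the three selected groups.
LEVELS = {"ST2": "basic", "ST5": "intermediate", "ST6": "advanced"}


def select_for_study(subcorpora):
    """Keep only the sub-corpora that share the same data type (free compositions)."""
    return [s for s in subcorpora if s.data_type == "free composition"]


selected = select_for_study(SUBCORPORA)
for s in selected:
    print(s.code, LEVELS[s.code], "-", s.learners)
```

Filtering on a single attribute makes the homogeneity criterion explicit: ST3 and ST4 are excluded only because their texts are timed test writings.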
3 The use of the two dictionaries in attesting collocations was also endorsed by Siyanova and
Schmitt (2008).
4.4 Collocation Dictionaries for Reference
study for word combinations” (Johansson and Hofland 1989: 14) and is often
inadequate for the lexicologist (Kjellmer 1994: xiii).4 The OCDSE was therefore
chosen for its wide coverage, and both dictionaries were consulted in the
process of extracting collocations. If a collocation is included in either of the two
dictionaries, it is considered a well-formed collocation.
The British National Corpus was chosen to serve as a benchmark for measuring the
appropriateness of learners’ production of collocations which failed to be attested in
collocation dictionaries. As a general corpus representing as wide a range of
modern British English as possible, the 100-million-word corpus contains over
4000 written texts and transcripts of speech in British English (McEnery et al.
2006). It has to be noted that the BNC only covers British English of the late
twentieth century. However, creativity and productivity are two of the design
features of human language, which means that humans are able to construct
understandable linguistic forms, some of which have never been used before.
Language is constantly changing, with the gradual emergence of neologisms. Some
of these new constructions and interpretations of words and expressions are accepted
by the language community and acquire status in the language stock. By the same
token, new combinations of words, i.e. collocations, are constantly being coined and
gaining acceptance. Yet the BNC is not updated to include new forms of
language use, and its limited size means that it cannot cover the full range of
English language uses. Considering the ever-changing nature of language, few
corpora can include all collocations, which makes the recognition of appropriate/
inappropriate word combinations difficult. However, alongside the creative use of
language, there is always a conventional core in any language, which remains stable
and usually becomes the learning target of language learners. As attested in large
corpora, conventional expressions occur most frequently, whereas creative
expressions are usually at the bottom of the frequency list. In published dictionaries,
conventional language uses are prioritised and recorded. Therefore, in this
study collocation dictionaries were taken as the criterion for locating conventional
English collocations. As was discussed in Chap. 1, conventional language uses
are set both as the norm for L2 learners and as the criterion for judging the
appropriateness of learners’ interlanguage. If a collocation was listed in either of the two
dictionaries, it was recognised as correct (cf. Sect. 4.4). If it was not included in the
dictionaries, the online version of the BNC (BYU-BNC)5 was consulted
to check whether it was acceptable. The BNC was also used to
locate the target collocating word if an appropriate one was not found in collocation
dictionaries. For example, in our learner database, create + poem was viewed as
incorrect as it was not recognised as a conventional collocation in the dictionaries,
nor was it recorded in the BNC, though it is understandable in the English language.
The detailed procedure for identifying well-formed and erroneous collocations by
using collocation dictionaries and the BNC will be shown in Sect. 4.7.2.
4 For example, commit a crime, a commonly accepted collocation, is not included in Johansson and
Hofland’s book.
5 http://corpus.byu.edu/bnc/ [Accessed 10 March 2012].
Data extraction has been greatly facilitated by the increasing sophistication and
availability of computers. To allow highly efficient and labour-saving data collection
and analysis, the following software was used: AntConc 3.2.4w and
Wordsmith 5.0 for concordancing and word-list generation;
EditPad Pro 7 and PowerGREP 4 to automatically collect words of a particular part
of speech and word combinations through regular expressions; Microsoft Office
Excel 2010 to create databases of collocations and to perform
computing and graphing; and GraphPad Prism (Version 6.04) to
carry out statistical analyses.
Verb + noun collocations were semi-automatically collected, i.e. via an auto-
matic generation of all the verbs in concordances and a manual identification of
verb + noun collocations (cf. Sect. 4.7.2 for detailed explanation). For the retrieval
of other words or word combinations, the following regular expressions were used
in PowerGREP 4:
• For the retrieval of verbs: (\w+)_VV\w+
• For the retrieval of nouns: (\w+)_NN[12]\s|(\w+)_NN\s
• For the retrieval of adjectives: (\w+)_J\w+
• For the retrieval of AN combinations: (\w+_J\w+\s)((\w+_NN[12]\s)|(\w+_NN\s))
• For the retrieval of NN combinations: (\w+(_NN[12]\s)|(_NN\s))(\w+(_NN[12]\s)|(_NN\s)).
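As a rough illustration, the search patterns listed above can be tried with Python's re module instead of PowerGREP. The tagged sentence is an invented sample in word_TAG format, and the noun, AN and NN patterns are written in a compact form (NN[12]?) that is assumed to match the same tags as the alternations listed; this is a sketch under those assumptions, not the study's actual workflow.

```python
import re

# Invented sample in word_TAG format with CLAWS7-style tags (an assumption).
tagged = ("We_PPIS2 make_VV0 great_JJ progress_NN1 in_II "
          "computer_NN1 science_NN1 ._.")

# Compact rewrites of the listed patterns; NN[12]? covers NN, NN1 and NN2.
VERB = re.compile(r"(\w+)_VV\w+")
NOUN = re.compile(r"(\w+)_NN[12]?\s")
ADJ = re.compile(r"(\w+)_J\w+")
AN = re.compile(r"(\w+)_J\w+\s(\w+)_NN[12]?\s")
NN = re.compile(r"(\w+)_NN[12]?\s(\w+)_NN[12]?\s")

verbs = VERB.findall(tagged)       # lexical verbs
nouns = NOUN.findall(tagged)       # common nouns
adjectives = ADJ.findall(tagged)   # adjectives
an_pairs = AN.findall(tagged)      # adjective + noun sequences
nn_pairs = NN.findall(tagged)      # noun + noun sequences

print(verbs, nouns, adjectives, an_pairs, nn_pairs)
```

Each pattern captures only the word form, so the tag itself is dropped from the results; the trailing \s means a noun at the very end of a file would need a final space or newline to be found.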
4.7 Procedure
When originally compiled, CLEC was error tagged into 61 types of error. However,
it was decided not to base the present study on the error-tagged version, which was
found to be inadequate in the following respects: first, well-formed collocations
relevant to the present study were not identified and tagged in CLEC; secondly,
error-tagging in CLEC was faulty as some erroneous collocations were missed out
while some well-formed ones were included; thirdly, the error-tagging targeted
erroneous word combinations of all types (including problematic collocations,
erroneous free combinations, colligation errors, etc.) rather than exclusively col-
location errors (cf. Zhang and Gao 2006). For the purposes of the present research,
we rectified the above problems by applying the following procedure: all the error
tags were first removed and then the clean corpus was part-of-speech tagged,
followed by a reliability check. Collocations were finally semi-automatically
extracted with reference to the two widely used collocation dictionaries discussed
above.
The total size of the three sub-corpora (ST2, ST5 and ST6) amounts to over
600,000 words. It would have been an impossibly large undertaking to extract
collocations manually by looking through the corpora word by word. For verb +
noun collocation extraction in this research, therefore, the starting point was to
locate all the verbs and then manually sort out VN collocations (the justification
for this collection method will be given in Sect. 4.7.2). For this purpose, the corpora
were first automatically part-of-speech tagged using the online tagging service
developed by the University Centre for Computer Corpus Research on Language at
Lancaster University.6 The current standard tagset, CLAWS7, was used for its
richness in detailed subdivisions of word types.
After the POS tagging, a reliability check was performed. CLAWS is thought to
achieve a consistent accuracy of 96–97% and even 98.3% for the tagging of some
portions of the BNC (Garside 1987, 1996). Those figures were obtained through
tagging the texts of native speakers, although the accuracy rate varies according to
text types. For the tagging of learner language, a lower degree of accuracy is
generally believed to be achieved, since tagging learner language is complicated by
instances of grammatical and morphological errors. Prior to the retrieval of word
combinations, a reliability check was carried out on a sample of over 1000 words in
the ST6 file.
A straightforward and commonly used way of checking tagging validity is to
locate how many tagging errors occur in a sample of texts. Thus two pieces of
writing with a total of 1311 words from the tagged ST6 corpus were randomly
selected. After word-by-word examination, 29 words were found to be incorrectly
tagged. So the tagging reliability was 97.8% [(1311 − 29)/1311 × 100%], a fairly high
accuracy rate and very much in line with the accuracy rate on native speaker texts.
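The reliability figure can be reproduced with a few lines of Python. The function name is hypothetical; only the two counts (1311 words checked, 29 mistagged) come from the text.

```python
def tagging_accuracy(total_words: int, mistagged: int) -> float:
    """Return the percentage of correctly tagged words in a checked sample."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return (total_words - mistagged) / total_words * 100


# Counts reported for the ST6 sample.
accuracy = tagging_accuracy(1311, 29)
print(f"{accuracy:.1f}%")  # prints 97.8%
```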
6
http://ucrel.lancs.ac.uk/claws/trial.html [Accessed 1 March 2013].
In the semi-automatic extraction of VN collocations in the ST2, ST5 and ST6 files,
only collocations of a verb and a noun as its object were counted (e.g. make a
contribution, acquire knowledge). VN combinations can be easily retrieved with
regular expressions performed by PowerGREP. However, there are varied positions
of the nouns as the objects of the verbs. Verbs and nouns are not confined to the
immediate linear sequence: verb + (modifiers) + noun (e.g. make a plan). As in the
examples given by Greenbaum (1970: 10) (cf. Sect. 2.2.1), a collocational rela-
tionship can even transcend a sentence. The following categories of the noun’s
varied positions relative to concordanced verbs were also examined, e.g. the noun
used before the verb in a passive voice (great progress has been made; such
problems would be solved) and in attributive clauses (life pays everyone in different
ways for the contribution he makes to the society, *she can use the knowledge she
had learned in the new job). From the examples above, the following VN collocations were
subsequently retrieved: make a plan, made progress, solve problems, makes contribution
and *learned knowledge. So an automatic extraction of verb + noun
combinations within a specified span would not only leave out some combinations,
but also “yield a great deal of unusable material, the sifting of which would
probably be even more time-consuming than the manual extraction of all verb-noun
combinations from the corpus” (Nesselhauf 2005: 43).
Therefore, taking the different proximities of verb + noun collocations into
consideration, VN combinations were not automatically retrieved but rather a
semi-automatic approach was applied: all the verb tags (except the copular be and
modal verbs) were searched, followed by a manual extraction of the nouns as the
collocates of the verbs (phrasal verb + noun patterns, e.g. put on weight were
disregarded).7 Although nouns select other lexical words, they were not searched
first because a large amount of irrelevant material would be extracted (e.g.
adjective + noun, (subject) noun + verb, preposition + noun). As is acknowledged
by Howarth (1996: 78), “searches based on the verb would more sharply focus the
searches on the desired patterns”.8
Verbs were classified into eight main categories in CLAWS: VV0 (base form,
e.g. work), VVD (past tense, e.g. worked), VVG (-ing participle, e.g. working),
VVGK (-ing participle catenative, e.g. going in be going to), VVI (the infinitive
form, e.g. it will work…), VVN (past participle, e.g. worked), VVNK (past participle
catenative, e.g. bound in be bound to) and VVZ (the present tense form, e.g.
works).9 Since catenative verbs are followed by to-infinitives rather than
nouns, they were disregarded. The remaining six forms of lexical verbs were
examined. In addition, the verbs do and have are tagged separately in CLAWS,
and they fell into the category of delexical verbs to be searched. Altogether 18 verb
tags (six forms of lexical verbs, six forms of do and six forms of have), totalling
87,957 tokens, were examined in concordances generated by AntConc. The next step was
the manual extraction of well-formed and erroneous VN collocations.
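The set of 18 tags can be generated rather than typed out, which makes the selection explicit. The sketch below assumes the CLAWS7 prefixes VV (lexical verbs), VD (do) and VH (have) combined with the six non-catenative suffixes; the tagged sentence and the helper name are invented for illustration.

```python
import re
from collections import Counter

# Six forms kept in the study: base, past, -ing, infinitive, past participle, -s.
FORMS = ["0", "D", "G", "I", "N", "Z"]
# CLAWS7 prefixes for lexical verbs, do and have.
PREFIXES = {"lexical": "VV", "do": "VD", "have": "VH"}
VERB_TAGS = [prefix + form for prefix in PREFIXES.values() for form in FORMS]


def count_verb_tags(tagged_text: str) -> Counter:
    """Count tokens carrying one of the 18 searched verb tags."""
    tags = re.findall(r"\w+_([A-Z0-9]+)", tagged_text)
    return Counter(tag for tag in tags if tag in VERB_TAGS)


# Invented sample; the catenative going_VVGK and am_VBM are not counted.
sample = ("I_PPIS1 have_VH0 done_VDN my_APPGE homework_NN1 and_CC "
          "am_VBM going_VVGK to_TO make_VVI plans_NN2")
counts = count_verb_tags(sample)
print(len(VERB_TAGS), dict(counts))
```

Summing the counter over a whole tagged corpus would give a token total comparable to the figure reported above, although the concordancing in the study was done in AntConc.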
A collocation was taken to be well-formed when it was found either in the BBI or in
the OCDSE.10 Collocations which were not listed in the two dictionaries, but were
7
This method for identifying verb + noun collocations has also been adopted by Howarth (1996,
1998a, b).
8
This method of taking verbs as the starting point in the extraction is different from the study by
Laufer and Waldman (2011), in which nouns were searched as node words. They started from a set
of 220 pre-selected nouns and proceeded with the identification of verb collocates. The way of
choosing a limited number of frequent nouns would inevitably leave out many verb + noun
collocations. This study performed an exclusive extraction.
9
http://ucrel.lancs.ac.uk/claws7tags.html [Accessed 1 March 2013].
10
A question is whether word combinations (open the door) that are found in the learner corpus
and also included in either the BBI or OCDSE should be listed in my database. Combinations like
open the door were not included, since they are viewed as free combinations based on our
definition of collocations. The two dictionaries were only used to attest well-formedness.
[Fig. 4.1 Flowchart: a VN collocation not attested in the BBI or OCDSE is disregarded if attested in the BNC, and treated as an erroneous collocation if not attested in the BNC]
attested in the BNC, were disregarded, for the association was too loose to be
viewed as a collocation (e.g. ?give pressure, ?eat tea). A VN collocation was
viewed as erroneous (e.g. *do + problem) when it was neither listed in the two
dictionaries, nor found in the BNC. Wider contexts of the concordanced verbs were
checked in cases of ambiguity. The target verbs for the erroneous collocations were
supplied by consulting the BBI, the OCDSE, or the BNC (e.g. acquire knowledge
but not *learn knowledge, seize time but not *grasp time). In the few cases
where none of these sources supplied the target verb for an erroneous collocation, a native
speaker was consulted. As stated above, the following cases were not considered
when identifying verb + noun collocation errors:
• Colligation errors in verb + noun collocations were disregarded, since
the central concern of this study is the (in)correct choice
of verbs, not the (in)correct choice of grammatical forms. Therefore, errors in
phrasal verbs, determiners, prepositions and the number of nouns were not
counted (instances of such errors are *make one’s mind, *hunt a job, and *give
some advices).
• Errors involving free verb + noun combinations were also eliminated (e.g.
*abandon prisons, *build heroes).
• Errors involving the wrong choices of the nouns were eliminated (e.g. *earn his
life, ?*solve questions).
The following chart presents a summary of the procedure adopted for identifying
well-formed and erroneous VN collocations (Fig. 4.1).
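The decision procedure summarised above can be written as a small function. This is a minimal sketch: the two lookup sets are invented stand-ins for the BBI/OCDSE entries and for BNC attestation, and the manual exclusions (free combinations, colligation errors and noun errors) are not modelled.

```python
def classify_vn(verb, noun, dictionary_entries, bnc_entries):
    """Classify a lemmatised verb + noun pair following the three-way procedure."""
    pair = (verb, noun)
    if pair in dictionary_entries:   # listed in the BBI or the OCDSE
        return "well-formed"
    if pair in bnc_entries:          # not in the dictionaries but attested in the BNC
        return "disregarded"
    return "erroneous"               # found in neither source


# Hypothetical lookup data built from examples mentioned in the text.
dictionary_entries = {("acquire", "knowledge"), ("make", "progress")}
bnc_entries = {("give", "pressure")}

for verb, noun in [("acquire", "knowledge"), ("give", "pressure"), ("learn", "knowledge")]:
    print(verb, noun, classify_vn(verb, noun, dictionary_entries, bnc_entries))
```

The order of the checks matters: dictionary listing takes priority, so a pair found in both sources is counted as well-formed.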
Excel software was used as a tool for storing all the collocations as well-formed
ones (further divided into delexical verb + noun collocations and lexical verb +
noun collocations) and erroneous ones (further divided into erroneous delexical
verb + noun collocations and erroneous lexical verb + noun collocations), with
information on collocation tokens and types recorded. For the convenience of
study, verbs and nouns in collocations were lemmatised and articles, determiners
and adjectives in between them were not recorded. For example, the following
collocations were viewed as instantiations of the same collocation (make + plan):
make a plan, makes a plan, make plans, made plans, etc. One advantage of
lemmatising the verbs and nouns in VN collocations and recording them in separate,
parallel columns was that it facilitated subsequent automatic analyses.
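To show how lemmatised columns support automatic counting of types and tokens, here is a minimal sketch. The lemma table is a toy, hypothetical lookup covering only the forms in the example; how lemmatisation was actually carried out in the study is not specified here.

```python
from collections import Counter

# Toy lemma table (hypothetical); unlisted forms are treated as their own lemma.
LEMMAS = {"makes": "make", "made": "make", "making": "make", "plans": "plan"}


def lemma(word: str) -> str:
    """Return the lemma of a word form, falling back to the lower-cased form."""
    return LEMMAS.get(word.lower(), word.lower())


def collocation_type(verb_form: str, noun_form: str) -> str:
    """Build the type label, e.g. 'make + plan', from the two recorded columns."""
    return f"{lemma(verb_form)} + {lemma(noun_form)}"


# Verb and noun columns as extracted (articles and adjectives already dropped).
extracted = [("make", "plan"), ("makes", "plan"), ("make", "plans"),
             ("made", "plans"), ("made", "progress")]

type_counts = Counter(collocation_type(v, n) for v, n in extracted)
n_types = len(type_counts)            # distinct collocation types
n_tokens = sum(type_counts.values())  # all occurrences
print(type_counts, n_types, n_tokens)
```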
The procedure described above was required for the investigation of the
developmental patterns of VN collocations in terms of delexical verb + noun and
lexical verb + noun collocations. Answering the next question about the relation-
ship between vocabulary increase and collocation acquisition required the imple-
mentation of the following steps:
11
http://wordnetweb.princeton.edu/perl/webwn [Accessed 10 May 2013].
classifying verbs in VN collocations into synsets, i.e. English Verb Classes and
Alternations (henceforth EVCA) (Levin 1993), the ODSA and WordNet. If verbs were
co-listed in at least one of the three sources, they were placed in the same set.
Given that verbs are polysemous, their synonyms were located specifically
within the sense of the verb in a VN collocation. For example, the verb discharge
has 11 senses (accordingly, 11 synsets) as listed in WordNet and 5 synsets in the
ODSA. In the VN collocation discharge + duty produced by ST2 learners, only
synonyms of the relevant sense of discharge were recorded (e.g. complete). Similarly,
acquire in acquire + knowledge was grouped under ‘learn’ verbs instead of being
placed into the synset of “obtaining”.
Classifying verbs into synsets was not confined to verbs under the same entry
covered in the referencing sources. Instead, if two verbs were not listed as syn-
onyms, but they had a shared synonym, the three verbs were grouped in a synset.
For example, fix and place were not listed as synonyms in either the ODSA or
WordNet, but they were, respectively, in a synonymous relationship with attach, so
they were placed in the same synset. Similarly, all three words were synonyms of
put and they were gathered in one synset. Accordingly, new verbs were gradually
accumulated in the synset of “verbs of putting”.
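Grouping verbs through shared synonyms amounts to finding connected components in a graph of synonym pairs. The sketch below uses a simple union-find structure for this; it is one possible way to automate the grouping, not the procedure the study reports, and the pairs are taken from the examples in the text.

```python
def merge_synsets(synonym_pairs):
    """Group verbs so that any two verbs linked by a chain of synonym pairs share a set."""
    parent = {}

    def find(verb):
        # Follow parent links to the representative, compressing the path on the way.
        parent.setdefault(verb, verb)
        while parent[verb] != verb:
            parent[verb] = parent[parent[verb]]
            verb = parent[verb]
        return verb

    for first, second in synonym_pairs:
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    groups = {}
    for verb in parent:
        groups.setdefault(find(verb), set()).add(verb)
    return list(groups.values())


# fix and place are linked only through attach; put joins via fix.
pairs = [("fix", "attach"), ("place", "attach"), ("put", "fix"), ("lead", "live")]
synsets = merge_synsets(pairs)
print(synsets)
```

Because the grouping is transitive, long chains of near-synonyms can pull loosely related verbs into one set, so a manual check of each resulting synset would still be needed.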
Besides the above criterion of synonym classification, two looser criteria
were adopted, i.e. context and foreign-language equivalents. Words are defined as
synonyms if they both fit in a particular context (Palmer 1981; Saint-Dizier and
Viegas 1995). These synonym pairs are context-dependent synonyms. Based on
this criterion, the verbs lead and live were synonyms in the given context of
lead/live + life. Another standard was a cross-linguistic one. According to Benson
et al. (1986: 204), one of the definitions of synonymy is foreign-language equiv-
alent. Thus in the data sets of the verbs, wear and dress were synonyms in the sense
that they both share one Chinese translation equivalent (chuan), though they behave
quite differently in English.
In all, the criteria of classifying verbs in VN collocations in this study include
semantic similarity (synonyms), context-dependent synonyms and foreign-language
equivalents. Three referencing sources were applied: the EVCA, the ODSA and
WordNet. These criteria and resources were not applied in any hierarchy;
rather, they served as alternatives. For the convenience of study, each synset was given
a name according to the classification given by the EVCA, e.g. verbs of creation
(compose, create, build, etc.), and verbs of obtaining (achieve, earn, receive, etc.). In
cases where there was no umbrella term for the semantic set, a representative verb
was used to cover the synset, e.g. fulfil verbs incorporating verbs like fulfil,
accomplish, apply, etc.
Finally, verb + noun collocations with the verbs falling into the synsets classi-
fied were investigated. The purpose was to find out whether there are more VN
collocation errors in higher levels within synsets where there is an increase of verbs,
and whether these errors are more associated with new verbs than old verbs.
4.8 Summary
References
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1),
101–114.
Benson, M. (1985). Collocations and idioms. In R. Ilson (Ed.), Dictionaries, lexicography and
language learning (pp. 61–68). Oxford: Published in association with the British Council by
Pergamon.
Benson, M., Benson, E., & Ilson, R. (1986). Lexicographic description of English. Amsterdam:
Benjamins.
Benson, M., Benson, E., & Ilson, R. (2010). The BBI combinatory dictionary of English: Your
guide to collocations and grammar (3rd ed.). Amsterdam: Benjamins.
Biskup, D. (1990). Some remarks on combinability: Lexical collocations. In J. Arabski (Ed.),
Foreign language acquisition papers (pp. 31–44). Katowice: Uniwersytet Slaski.
Cowie, A. P. (1991). Multiword units in newspaper language. In S. Granger (Ed.), Perspectives
on the English lexicon: A tribute to Jacques Van Roey (pp. 101–116). Louvain-la-Neuve:
Cahiers de l’Institut de Linguistique de Louvain.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud
& H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). London: Macmillan.
Cowie, A. P. (1998). Phraseology: Theory, analysis and applications. Oxford: Oxford University
Press.
Fellbaum, C. (2010). Wordnet. In R. Poli, M. Healy, & A. Kameas (Eds.), Theory and applications
of ontology: Computer applications (pp. 231–243). London: Springer.
Garside, R. (1987). The CLAWS word-tagging system. In R. Garside, G. Leech, & G. Sampson
(Eds.), The computational analysis of English: A corpus-based approach (pp. 30–41). London:
Longman.
Garside, R. (1996). The robust tagging of unrestricted text: The BNC experience. In J. Thomas &
M. Short (Eds.), Using corpora for language research: Studies in the honour of Geoffrey Leech
(pp. 167–180). London: Longman.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of
collocational knowledge. San Francisco: International Scholars Publications.
Granger, S., Dagneaux, E., Meunier, F., et al. (2009). International corpus of learner English (V2).
Louvain: Presses Universitaires de Louvain.
Greenbaum, S. (1970). Verb-intensifier collocations in English: An experimental approach. The
Hague: Mouton.
Gui, S. C., & Yang, H. Z. (2003). Chinese learner English corpus. Shanghai: Shanghai Foreign
Language Education Press.
Hausmann, F. J. (1989). Le dictionnaire de collocations. In F. J. Hausmann, O. Reichmann,
H. E. Wiegand, et al. (Eds.), Wörterbücher: ein internationales Handbuch zur Lexicographie.
Dictionaries. Dictionnaires (pp. 1010–1019). Berlin: De Gruyter.
Howarth, P. (1996). Phraseology in English academic writing: Some implications for language
learning and dictionary making. Tübingen: Niemeyer.
Howarth, P. (1998a). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.),
Phraseology: Theory, analysis and applications (pp. 161–186). Oxford: Oxford University
Press.
Howarth, P. (1998b). Phraseology and second language proficiency. Applied Linguistics, 19(1),
24–44.
Johansson, S., & Hofland, K. (1989). Frequency analysis of English vocabulary and grammar.
Oxford: Oxford University Press.
Kjellmer, G. (1994). A dictionary of English collocations: Based on the Brown corpus. Oxford:
Clarendon Press.
Klotz, M. (2003). Oxford collocations dictionary for students of English. International Journal of
Lexicography, 16(1), 57–61.
Kozlowska, C. D., & Dzierzanowska, H. (1988). Selected English collocations. Warszawa: PWN.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago:
University of Chicago Press.
Martelli, A. (2006). A corpus based description of English lexical collocations used by Italian
advanced learners. In E. Corino, C. Marello, & C. Onesti (Eds.), Proceedings XII EURALEX
International Congress (pp. 1005–1011). Alessandria: Edizioni dell’Orso.
McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: An advanced
resource book. London: Routledge.
McIntosh, C., Francis, B., & Poole, R. (2009). Oxford collocations dictionary for students of
English (2nd ed.). Oxford: Oxford University Press.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38
(11), 39–41.
Miller, G. A., Beckwith, R., Fellbaum, C., et al. (1990). Introduction to WordNet: An on-line
lexical database. International Journal of Lexicography, 3(4), 235–244.
Nesi, H. (2011). BAWE: An introduction to a new resource. In A. Frankenberg-Garcia, L.
Flowerdew, & G. Aston (Eds.), New trends in corpora and language learning (pp. 213–228).
London: Continuum.
Nesselhauf, N. (2004). What are collocations? In D. Allerton, N. Nesselhauf, & P. Skandera
(Eds.), Phraseological units: Basic concepts and their application (pp. 1–21). Basel: Schwabe.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Palmer, F. R. (1981). Semantics (2nd ed.). Cambridge: Cambridge University Press.
Palmer, H. E. (1933). Second interim report on English collocations. Tokyo: Kaitakusha.
Saint-Dizier, P., & Viegas, E. (1995). Computational lexical semantics. Cambridge: Cambridge
University Press.
Sinclair, J., & Fox, G. (1990). Collins COBUILD English grammar. London: Collins.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A
multi-study perspective. Canadian Modern Language Review, 64(3), 429–458.
Spooner, A. (2005). Oxford dictionary of synonyms and antonyms. Oxford: Oxford University
Press.
Wen, Q. F., Liang, M. C., & Yan, X. Q. (2008). Spoken and written English corpus of Chinese
learners (2nd ed.). Beijing: Foreign Language Teaching and Research Press.
Zhang, Y., & Gao, Y. (2006). A CLEC-based study of collocation acquisition by Chinese English
language learners. CELEA Journal, 29(4), 28–35.
Chapter 5
Chinese Learners’ Production
of Verb + Noun Collocations
This chapter enquires into the relationship between vocabulary increase and
collocation use by L2 learners, seen from the overall perspective of the growth from
delexical verbs to lexical verbs. As part of the investigation, it presents the overall
analyses of VN collocations produced by all the three levels of learners. In addition
to the comparison of our findings with previous ones, the learning of verb + noun
collocations by L2 learners is analysed from the following perspectives: the overall
results and general patterns of verb + noun collocations produced by the three
proficiency levels (Sect. 5.1), the developmental patterns of delexical verb + noun
(abbreviated: DeLexVN) and lexical verb + noun (LexVN) collocations (Sect. 5.2),
the comparison between overall verb growth and VN collocation errors (Sect. 5.3),
followed by a summary of these three overall analyses (Sect. 5.4).
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 77
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_5
proportion 2.5 times as high as that obtained through our L2 learner data.1 This
quantitative discrepancy in terms of collocation use has been widely acknowledged
and empirically tested (cf. Sect. 3.2.1.1). That learners used fewer collocations
than NSs shows, on the one hand, a poorer sense of collocations; on
the other, it demonstrates a greater use of the “open choice principle” than of the
“idiom principle” (Sinclair 1991) by L2 learners. This “open choice principle” is
further manifested through a non-diversified production of collocation types,
discussed below.
The numbers of collocation types produced by the three groups of Chinese EFL
learners were: 285 (ST2), 344 (ST5) and 441 (ST6).2 The overall number of types
(1070) was found to be rather low compared with tokens (5068). That means that, on
average, each collocation type was produced about 5 times. However, a closer look at
the distribution of collocation frequencies over the types in each learner group
shows that frequency was far from evenly distributed. Figure 5.1 presents the distribution of
collocation frequencies over the 285 types of collocations in the ST2 database.
As is shown in Fig. 5.1, the great majority of collocations had a frequency
of less than 10, with 5 collocations having a frequency of over 50. An
overwhelming number of collocations occurred fewer than 5 times and, in fact,
most of these occurred only once. The same pattern of frequency distribution
across types was also found in the ST5 and ST6 databases. The fact that a majority
of the collocation types were produced fewer than 5 times demonstrates a varied
use of collocations, which can be taken as some sign of phraseological competence.
However, taking the total tokens into account, this varied use is at the same time
accompanied by an overuse of a small number of collocation types.
A further look into the distribution of collocation tokens over their types
revealed a huge overuse of a limited number of collocations in all three groups of
learners. Table 5.1 shows the types of collocations that were divided into three
groups according to frequencies: those with a frequency of 5 or lower (≤5), those
between 5 and 10 (5–10) and those with a frequency of 10 or more (≥10).
As is shown in the above table, a majority of collocations in the three levels
occurred fewer than 5 times (cf. Fig. 5.1) and only a small proportion of them had a
frequency of more than 10. However, this small proportion of collocations produced
1
The proportion of VN collocations in native speaker data is in fact much higher, since Howarth
(1996, 1998a, b) started from the most frequent verb lemmas (with a frequency of 10 or more) and
then proceeded to extract their noun collocates, which means that there still exist a large number of
collocations with less frequent verbs.
2
Collocations like made plans, made a plan, make a plan were regarded in this study as instan-
tiations of one collocation type “make + plan”.
Fig. 5.1 VN collocation frequencies distributed over collocation types in the ST2 (x-axis: collocation types; y-axis: frequency)
more than 10 times made up a majority of the overall collocation tokens. Taking the
ST2 data as an example, 39 types of collocations were used 1052 times, making
up 67% of the total 1579 collocation tokens. The same trend for heavy reliance on a
limited number of collocation types was revealed in the ST5 and ST6 databases as
well (see Fig. 5.2). Though there was a slight decrease in the proportion of
collocations occurring more than 10 times among the overall tokens from ST2 to ST5
and ST6, similar distribution patterns were observed in the ST5 and ST6 collocation
databases, i.e. fewer than 14% of collocation types made up more than half of the
collocations produced by Chinese EFL learners at the three proficiency levels.
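The proportions quoted in this section follow directly from the reported counts, as the short calculation below shows. All figures are taken from the text; the variable names are illustrative only.

```python
# Collocation types per level and overall tokens, as reported.
types_by_level = {"ST2": 285, "ST5": 344, "ST6": 441}
total_types = sum(types_by_level.values())
total_tokens = 5068

# Average number of occurrences per collocation type across the three levels.
mean_tokens_per_type = total_tokens / total_types

# ST2: types used more than 10 times, their tokens, and all ST2 tokens.
st2_frequent_types, st2_frequent_tokens, st2_tokens = 39, 1052, 1579
share_of_types = st2_frequent_types / types_by_level["ST2"] * 100
share_of_tokens = st2_frequent_tokens / st2_tokens * 100

print(total_types, round(mean_tokens_per_type, 1))       # 1070 4.7
print(round(share_of_types, 1), round(share_of_tokens))  # 13.7 67
```

In other words, roughly one type in seven accounts for two thirds of the ST2 tokens, which is the concentration the paragraph describes.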
On the one hand, the finding obtained in this study mirrors those of previous
studies which have identified the phenomenon of phraseological overuse on the part
of L2 learners (Ädel and Erman 2012; Granger 1998a; Lorenz 1999; Kaszubski
2000; etc. cf. Sect. 3.2.1.1). On the other, the finding of an overuse of a restricted
number of collocations by L2 learners was reported in a different way here.
Fig. 5.2 The frequency distribution of VN collocation types in ST2, 5 & 6 databases
3
Good performance is observed with the exception of *learn + knowledge.
[Figure: number of collocation types (y-axis) by collocation frequency band (≤5, 5–10, ≥10; x-axis) for ST2, ST5 and ST6]
verb combinations: make + progress, make + use, take + part, take + care, etc.4
So a necessity for communicative purposes and the frequent input for L2 learners of
these overused collocations may well account for such a heavy use. The role of L1
in collocation acquisition will be extensively discussed in Chap. 9.
Though there exists a necessity for the frequent use of collocations in everyday
English, as some sign of fluent and idiomatic control of a language, a varied use of
collocation types is also needed. As stated in Sect. 3.2.1.1, the difference of
phraseological uses between NSs and NNSs rests in the diversified types produced.
Turning to the types of VN collocations produced by all three levels, a
between-group comparison of diversification in collocation production was con-
ducted (see Fig. 5.3).
As is shown in Fig. 5.3 (cf. the corresponding numerical data in Table 5.1), for
collocations that were used 5 times or fewer, there was a clear increase in collocation
types in the ST6 level compared with the two lower levels. For collocations with a
frequency of more than 5, there was no such growing trend. This shows that despite a
heavy use of a rather small number of most frequent collocations at all levels, there
was no increase in the types of these frequent collocations across the three profi-
ciency levels, but an increase in far less frequent collocations. That there was an
increase in collocation types with the rise of proficiency is consistent with the
findings by Gitsaki (1999) and Zhang (1993), who reported that more proficient L2
learners produced more varied collocations than the less proficient L2 learners. It is
unsurprising in the sense that more collocations are learned as learners receive more
English instruction. So it becomes more important to investigate the quality of their
collocation production in terms of misuses.
4
The Chinese equivalent expressions for make + progress, make + use, take + part, take + care
are qude jinbu (literally: gain progress), shiyong (use), canjia (participate), zhaogu (care), among
which the last three are Chinese verb lexemes.
Table 5.2 Well-formed and erroneous VN collocations in the three levels of learners (types)
Learners   Well-formed collocations   Erroneous collocations   Total
ST2        221                        64 (22%)                 285
ST5        300                        44 (13%)                 344
ST6        348                        93 (21%)                 441
Notes
ST2 and ST5: p = 0.0020**
ST5 and ST6: p = 0.0024**
ST2 and ST6: p = 0.7121 ns
** indicates “very significant” and “ns” indicates “not significant”. The threshold significance
level is set at 0.05 by GraphPad Prism. The symbols used by Prism to indicate
the level of significance were also adopted; these symbols and their p values are:
****(p < 0.0001): extremely significant; ***(0.0001 < p < 0.001): extremely significant;
**(0.001 < p < 0.01): very significant; *(0.01 < p < 0.05): significant; ns (p > 0.05): not
significant (cited from the GraphPad statistics guide: http://www.graphpad.com/guides/prism/6/statistics/index.htm?extremely_significant_results.htm [Accessed 10 June 2013]).
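For readers who wish to re-run the between-level comparisons, the sketch below implements a two-sided Fisher's exact test in plain Python on the counts in Table 5.2. It sums the probabilities of all tables that are no more likely than the observed one; whether GraphPad Prism uses exactly this two-sided rule is an assumption, so small differences from the reported p values are possible.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# (well-formed, erroneous) collocation types per level, from Table 5.2.
counts = {"ST2": (221, 64), "ST5": (300, 44), "ST6": (348, 93)}
comparisons = [("ST2", "ST5"), ("ST5", "ST6"), ("ST2", "ST6")]
p_values = {pair: fisher_exact_two_sided(*counts[pair[0]], *counts[pair[1]])
            for pair in comparisons}

for pair, p in p_values.items():
    print(pair, f"p = {p:.4f}")
```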
5
Fisher’s test can give an exact P value and works fine with small sample sizes. Considering the
small number of collocation types, Fisher’s test was adopted.
than pre-university middle school students and third and fourth year English majors
(ST2 and ST5: p = 0.0020; ST5 and ST6: p = 0.0024). That collocation errors
made by ST5 learners are the fewest was also found by Zhang and Gao (2006) in
their analysis of the original error-tagged CLEC. Though they did not investigate
correct collocations, they gave the numbers of problematic verb–noun collocations
produced by Chinese EFL learners and figures showed that ST5 learners produced
the smallest number of errors compared with ST2 and ST6. Taking into account other types
of erroneous word combinations, viz. noun/noun, noun/verb, adjective/noun,
verb/adverb and adverb/adjective combinations, ST5 was also found to produce the
smallest numbers of errors in the three levels under discussion (cf. Zhang and Gao
2006: 32). Therefore, in terms of the frequency of erroneous collocations, the ST5
level does not fit a consistent trend from ST2 to ST6. ST5 learners had received
one to two more years of English instruction than ST2 learners, and one
to two fewer years of instruction than ST6 learners. At the intermediate level, ST5
learners exhibit a higher competence than both the lower and higher levels of
learners. The question that arises here is why the middle level outperforms the
other two levels. This phenomenon has also been noted by Zareva and Wolter
(2012) in L2 word association studies where the intermediate group produced
the highest percentage of collocational responses, higher than the advanced
group. They further noticed that “the same class (paradigmatic) connections
become more prominent as the proficiency of L2 learners of English increases to an
advanced level” (Zareva and Wolter 2012: 59–60). Therefore, in a broad sense, an
inverse relationship is suggested between vocabulary growth in paradigmatic
relations and collocation performance in syntagmatic relations. To put the
matter in simple terms, the more words L2 learners learn, the more errors they make
(as seen from the comparison of error ratios between ST5 and ST6). At the same time, a
sufficient vocabulary size is undoubtedly important for correct collocation pro-
duction (as seen from the comparison of error ratios between ST2 and ST5). This
relationship between vocabulary growth and collocation is at the heart of this study
and will be elaborated in Chap. 6.
Returning to the percentages of erroneous VN collocations produced
by L2 learners, the result is similar to those of other studies of L2 VN collocation
production, allowing for the heterogeneity of studies in this field. In Nesselhauf’s (2005)
investigation into the verb–noun collocations in a corpus of writings by advanced
German-speaking learners of English, approximately one third of the collocations
were found to be unacceptable or questionable. This proportion was corroborated by
Laufer and Waldman (2011), who reported that about a third of the VN collocations
L2 learners produced were erroneous. It should be noted that there were
higher proportions of VN collocation errors in the above studies because more types
of errors were included. For Nesselhauf, verb–noun errors include errors in all
elements, e.g. verbs, phrasal verbs, nouns, determiners, etc. An example given is the
collocation come to the conclusion that, where errors in any of these elements were
considered (Nesselhauf 2005: 71). Similarly, errors involving nouns were counted
in the study conducted by Laufer and Waldman (2011). The closest point of comparison
is the study carried out by Howarth (1996), who found a fourth of the verb–noun
5.1 Overall Analyses (1): General Patterns of VN Collocations … 85
collocations his subjects produced were erroneous. Therefore, our finding in terms
of the proportion of erroneous collocations is similar to previous L2 VN collocation
studies.
Turning to collocation errors as related to L2 proficiency, statistical analysis
shows that the numbers of erroneous collocations differed significantly between
learner types, viz. ST2 versus ST5 and ST5 versus ST6, though no
significant difference was found between the ST2 and ST6 learners. Indeed, the
data revealed a persistent proportion of collocation misuses at these two levels (22%
at the ST2 level and 21% at the ST6 level). This suggests an overall lag in collocation
ability, with no sign of a decrease in errors with rising proficiency. That there is no
decrease in errors accords with the finding of the cross-sectional study conducted by
Laufer and Waldman (2011), who found that a third of the collocations produced by
learners at three proficiency levels were erroneous. Thus, we have again uncovered a
deficiency in the L2 acquisition of collocations, and it becomes important to identify
the factor(s) contributing to this lag.
This section presents the overall analyses of all the verb + noun collocations pro-
duced by Chinese EFL learners at three proficiency levels. Overall results were
discussed in connection with prior findings in L2 VN collocation studies. In short,
this study identified both a quantitative and qualitative deficiency long acknowl-
edged in the collocation performance by L2 learners. Among the approximately
600,000 words of text analysed, only about 5000 collocations were retrieved, fol-
lowing the criteria set out in the last chapter, thus revealing a quantitative dis-
crepancy in terms of collocation uses and again manifesting a preference for the
“open-choice principle” on the part of L2 learners. This quantitative discrepancy
has also been shown through the small number of collocation types (1070) com-
pared with 5000 collocation tokens. These figures indicate weak collocational links
in the mental lexicon of L2 learners, which corroborates findings from word
association tests that L2 learners produced significantly fewer collocational
responses than native speakers did (cf. Fitzpatrick 2006).
In addition, in terms of collocation misuses, it was found that collocation poses
problems at all levels, as shown by nearly a quarter of all the collocations produced
being erroneous. Furthermore, data obtained from this cross-sectional study yield
some interesting points: firstly, collocation overuse as reported in previous studies
through comparisons of NS and NNS corpora (Ädel and Erman 2012; Cobb 2003;
De Cock et al. 1998; Durrant and Schmitt 2009; Foster 2001; Granger 1998a) is
uncovered from a non-comparison perspective in this study. By analysing the
distribution of collocation tokens over types, this study showed that a small
proportion of common collocations makes up a majority of the overall collocation
tokens. This distribution closely resembles the word frequency distribution in a corpus of
natural language, viz. “a small number of words tend to make up a very large
portion of any normal text” (Milton 2009: 46). Secondly, between-group comparisons
of collocation data revealed some general developmental patterns, viz. the
overall number of collocations does not increase with the rise of proficiency (in
terms of either tokens or types), but collocation use becomes more diversified as
learners advance to higher levels. Despite these signs of development in
collocational competence as proficiency rises, there was no decrease in collocation
misuses: collocation errors persisted even at the ST6 level, pointing to a
general lag in collocation knowledge. This indicates that collocational knowledge
does not keep pace with advances in L2 proficiency, and this stagnation of
collocational knowledge has long been recognised (e.g. Bahns and Eldaw 1993; Laufer
and Waldman 2011). The next section, therefore, continues the discussion
of this lag by examining collocations classified into delexical verb and
lexical verb + noun collocations produced by the three levels.
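The token-over-type distribution described above can be checked with a short calculation. The following is a minimal sketch, not the procedure used in the study: the function name is hypothetical and the frequencies are invented purely for illustration (only the collocations themselves are taken from examples in this chapter).

```python
from collections import Counter

def share_of_types_covering(tokens_by_type, target=0.5):
    """Fraction of collocation types (most frequent first) needed to
    account for at least `target` of all collocation tokens."""
    counts = sorted(tokens_by_type.values(), reverse=True)
    total = sum(counts)
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running >= target * total:
            return rank / len(counts)
    return 1.0

# Invented frequencies, for illustration only (not corpus data).
toy_counts = Counter({
    "make + money": 50,
    "give + comment": 20,
    "take + nap": 10,
    "achieve + aim": 5,
    "claim + right": 5,
    "impose + burden": 5,
    "solve + problem": 3,
    "adopt + attitude": 2,
})

share = share_of_types_covering(toy_counts)
print(f"{share:.1%} of types cover half of all tokens")
```

With real counts, a small returned share would correspond to the pattern reported here, where a few common collocation types account for most tokens.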
Vocabulary increase was first broadly measured in terms of the development from
delexical verbs to lexical verbs, viz. from verbs that are very general in meaning to
more specific ones, and then measured locally with reference to particular synsets. One of
the main hypotheses concerns the increase in verbs from delexical to lexical
verbs in collocation production: it is hypothesised that in VN collocation production
L2 learners at lower levels make more errors using delexical verbs, whilst those at
higher levels make more errors with lexical verbs. If this hypothesis is upheld, it
means that the learning of verbs, progressing from delexical to lexical verbs, does
not ensure better collocation competence, even though the growth of lexical verbs
provides more opportunities for L2 learners to be specific in choosing the right verb
to collocate with a noun in specific VN collocations.
As was pointed out in Sect. 2.2.3, the six commonest delexical verbs targeted are
do, give, have, make, take and get. Examples of well-formed and erroneous
delexical verb + noun collocations are: give + comment, make + money, take +
nap, *give + meeting, *take + joke, and *do + game. Examples of well-formed
and erroneous lexical verb + noun collocations are achieve + aim, claim +
right, impose + burden, *ensure + law, *implement + act, and *teach + knowledge.
All the well-formed and erroneous DeLexVN and LexVN collocations in the
three databases were numerically tabulated in Table 5.3 and graphically presented
in Fig. 5.4.
Firstly, overall developmental patterns were revealed in the well-formed and
erroneous DeLexVN and LexVN collocations. As is clearly shown in Fig. 5.4, for
collocations that are correctly produced, there is a clear increase with rising proficiency
in lexical verb + noun collocations, and a decrease in delexical verb + noun
collocations. In general, it can be interpreted to the effect that the learning of more
5.2 Overall Analyses (2): Between-Group Comparisons … 87
Fig. 5.4 Well-formed and erroneous VN collocations in the three levels of learners (tokens)
[Chart not reproduced. x-axis: learner types (ST2, ST5, ST6); y-axis: collocation frequency; recoverable legend entries: well-formed DeLexVN, well-formed LexVN, erroneous DeLexVN]
Fig. 5.5 Well-formed and erroneous VN collocations in the three levels of learners (types)
[Chart not reproduced. x-axis: learner types (ST2, ST5, ST6); y-axis: collocation types]
Table 5.4 below shows the overall tokens of well-formed delexical and lexical
verb + noun collocations produced by the three levels of learners, and also gives
the information regarding the ratio of DeLexVN collocations divided by LexVN
collocations.
From this table, we can see that for ST2 learners, DeLexVN collocations were
used 1.5 times as often as LexVN collocations. For ST6 learners, this ratio dropped
sharply to 0.5, meaning that ST6 learners produced only half as many DeLexVN
collocations as LexVN collocations. The ratio for the ST5
level lay in between at 1:1, indicating that they produced roughly equal numbers of DeLexVN
and LexVN collocations. These ratios demonstrate a clear growth in the production
of lexical verbs by ST6 learners.
Next, pairwise comparisons between the three groups were made using the
chi-square test with Yates’ correction.6 A significant relationship was found between
the numbers of delexical verb + noun/lexical verb + noun collocations and learners
at different proficiency levels. More specifically, ST5 learners produced very
significantly more LexVN collocations than their ST2 counterparts (χ² = 22.57,
p < 0.0001); similarly, ST6 learners produced very significantly more LexVN
collocations than the ST5 level (χ² = 90.20, p < 0.0001); when the comparison was
made between ST2 and ST6 learners, the ST6 level produced very significantly
more LexVN collocations (χ² = 199.3, p < 0.0001). These statistical analyses,
together with the trend analyses presented in Fig. 5.4, indicate that Chinese EFL
6
This was chosen since the “Yates’ continuity correction is designed to make the chi-square
approximation better” (http://graphpad.com/guides/prism/6/statistics/index.htm?stat_chi-square_
or_fishers_test.htm) [Accessed 10 June 2013].
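For readers who wish to see how a Yates-corrected chi-square statistic is obtained for one pairwise comparison, a minimal sketch follows. It uses the standard 2 × 2 formula rather than the GraphPad software cited in the footnote; the function names are hypothetical, and the counts in the usage lines are invented placeholders, since the underlying cell counts are not reproduced in this section.

```python
from math import erfc, sqrt

def chi_square_yates(a, b, c, d):
    """Yates-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / (row1 * row2 * col1 * col2)

def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))

# Placeholder counts (NOT the study's data): rows are two learner groups,
# columns are DeLexVN and LexVN collocation tokens.
stat = chi_square_yates(60, 40, 40, 60)
print(f"chi-square = {stat:.2f}, p = {p_value_1df(stat):.4f}")
```

Substituting the actual DeLexVN and LexVN counts of two learner groups for the placeholders would give the statistic for that pair.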
The analyses of well-formed delexical verb + noun and lexical verb + noun col-
locations showed a clear trend towards an increase in the production of lexical
verb + noun collocations and decrease in delexical verb + noun collocations as L2
learners’ proficiency rises. That indicates a growing collocational competence with
the learning of more lexical verbs or nouns. The production of more LexVN
collocations by more proficient learners seems unsurprising. It is natural that less
proficient learners, constrained by a limited vocabulary, tend to resort to general
words rather than words with specific meanings. However, although the learning of more
Fig. 5.6 Erroneous VN collocations produced by the three levels of learners (tokens)
[Chart not reproduced. x-axis: learner types (ST2, ST5, ST6); y-axis: collocation frequency; legend entries: LexVN, DeLexVN]
lexical verbs facilitates better verb choices (e.g. take + attitude, have + attention,
solve + problem in the ST2, but adopt + attitude, attract/catch + attention and
solve/resolve/tackle + problem in the ST6), it is not always facilitative since at the
same time more lexical verb + noun collocations were found to be incorrectly used
as learners become more proficient (see Table 5.6 and Fig. 5.6).
As is shown in the above graph, there was no increase in erroneous delexical
verb + noun collocations, but there was an increase in erroneous lexical verb +
noun collocations from the ST2 to the ST6 level. Fisher’s test further showed a
strong trend for increasing lexical verb + noun errors at the ST6 level as compared
with the ST2 level (p = 0.0354), though no significant difference was found
between the ST2 and ST5 levels or between the ST5 and ST6 levels. As was discussed earlier in
Sect. 5.1.3, the ST5 level stood out for its lower number of errors compared with the
ST2 and ST6 levels. Analyses of erroneous collocation types were also carried
out, and the results showed no significant difference between groups, although there
was an increase in lexical verb + noun errors from the ST5 to the ST6 level (see
Appendix A).
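The mechanics of Fisher's exact test on a 2 × 2 table can be sketched as follows. This is an illustrative implementation of the usual two-sided test (summing the probabilities of all tables no more likely than the observed one), not the software used in the study. The section does not state which contingency table lies behind the reported p = 0.0354, so the input below, the well-formed and erroneous LexVN token counts for ST2 and ST6 from Table 5.7, is an assumption chosen only for illustration and should not be expected to reproduce that value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)

# Illustrative input (an assumption, see above): erroneous vs. well-formed
# LexVN tokens at ST2 (76, 596) and ST6 (130, 1097), from Table 5.7.
p = fisher_exact_two_sided(76, 596, 130, 1097)
print(f"two-sided p = {p:.4f}")
```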
LexVN collocation errors, it is consistent with the trend for increasing use in
LexVN and decreasing use in DeLexVN collocations. As shown in Fig. 5.4, there is
a downward trend in delexical verb + noun collocations and an upward trend in
lexical verb + noun collocations. ST5 thus sits at the midpoint that keeps both trends
monotonic. There is only a slight progression in English proficiency between ST5 and
ST6 learners, since ST6 learners have not had much more exposure to English
than ST5 learners. However, there are significant differences
between the ST2 and ST6 levels in the numbers of both well-formed and erroneous
DeLexVN and LexVN collocations. The sharp difference between the
lowest level (ST2) and the highest level (ST6) makes the comparison between these
two levels more noteworthy. Based on this observation, comparisons in the
following sections were mainly carried out between these two levels.
Up to this point, it can be seen that there is a significant increase in lexical verb
collocations, whether correctly or incorrectly produced, as learners proceed to the
advanced level. Before turning to the detailed analyses in Chap. 6 of verb increase
in a particular semantic set and the VN collocations associated with these verbs, an
overall analysis was performed on the growth rate of all the verbs and nouns used
by the three groups. The aim of this quantitative analysis is to get a panoramic view
of vocabulary increase and its relationship with error growth, comparing
the growth rate of lexical verbs with the rates of collocation errors in order to see
globally how the two are interconnected. Given the above finding that
there was a gradual increase in lexical verb + noun collocations both in terms of
tokens and types, it is predicted that there is a considerable increase in lexical verbs
and/or nouns.
Verbs and nouns were automatically retrieved through regular expressions per-
formed by PowerGREP and then lemmatised by Wordsmith (cf. Sect. 4.6). Examples
of lemmatised verbs are go (lemma), including goes, going, gone, went; legalise
(lemma), including legalised/legalized, legalises/legalizes, legalising/legalizing.7
Altogether the frequencies of lemmatised verbs/nouns used by the three groups,
their growth rates and growth rates of lexical verb + noun collocations are pre-
sented in Table 5.7.
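The effect of lemmatisation on the counts can be illustrated with a toy example. The sketch below is an assumption-laden stand-in for the PowerGREP and WordSmith workflow named above: the lookup table is hypothetical and contains only the forms cited in the text, whereas a real lemma list would be far larger.

```python
from collections import Counter

# Hypothetical form-to-lemma table built only from the examples in the text.
LEMMA_OF = {
    "goes": "go", "going": "go", "gone": "go", "went": "go",
    "legalised": "legalise", "legalized": "legalise",
    "legalises": "legalise", "legalizes": "legalise",
    "legalising": "legalise", "legalizing": "legalise",
}

def lemmatise(word):
    """Map a word form to its lemma; unknown forms are returned lower-cased."""
    form = word.lower()
    return LEMMA_OF.get(form, form)

def count_lemmas(words):
    """Count occurrences per lemma instead of per surface form."""
    return Counter(lemmatise(w) for w in words)

sample = ["legalise", "legalized", "legalises", "legalizing", "went", "go"]
lemma_counts = count_lemmas(sample)
print(len(set(sample)), "distinct forms ->", len(lemma_counts), "lemmas")
```

Counting lemmas rather than forms is what prevents the various forms of legalise from inflating the verb total.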
The above table illuminates two interesting aspects. From the perspective of the
quantities of lemmatised verbs and nouns, the transitional stage of ST5 was further
7
The reason why words were lemmatised before their growth rate was calculated is that the various
forms of one word should be viewed as one word to avoid repetitive calculation. If all verb forms
were included, the four forms of the verb legalise would instead be counted as four words in the ST6.
But as a matter of fact, learners have learnt only one verb: legalise.
5.3 Overall Analyses (3): Verb Growth and Collocation Errors 93
Table 5.7 Growth rates of lemmatised verbs, nouns and LexVN collocations
Level | Verbs | Nouns | Well-formed (tokens) | Well-formed (types) | Erroneous (tokens) | Erroneous (types)
ST2 | 1473 | 3345 | 596 | 102 | 76 | 41
ST5 | 1771 | 3857 | 768 | 166 | 72 | 27
ST6 | 2049 | 4199 | 1097 | 230 | 130 | 71
Growth rates (ST2–ST6) | 39% | 26% | 84% | 125% | 71% | 73%
confirmed (cf. Sect. 5.2.3). To be more specific, there is a gradual increase in verbs
and nouns from the ST2 to ST6, and the ST5 is again in mid-position allowing the
upward trend to be monotonic. It is not surprising that L2 learners learn more and
more verbs and nouns with increasing exposure to English. Another interesting point
we can observe here is regarding the growth rates of verbs and nouns in comparison
with the growth rates of well-formed and erroneous LexVN collocations. With the
learning of more verbs and nouns, the possibilities of combining them into
well-formed collocations increase as well (growth rates of well-formed collocations:
84% and 125%). This can be interpreted to the effect that with the increase in lexical
verbs in learners’ overall vocabulary, there are more chances for them to locate the
right lexical verbs to produce correct verb + noun collocations. At the same time, the
learning of more nouns means more diversified combinations of lexical verbs into
well-formed VN collocations. However, the chances that this vocabulary growth (for
verbs: 39% and for nouns: 26%) may lead to errors increase as well, seen through the
high growth rates of erroneous lexical verb + noun collocations (71% in terms of
tokens and 73% in terms of types). On the whole, these data suggest that the more
learners learn, the more chances there are that they will make mistakes.
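The growth rates in the last row of Table 5.7 follow from the ST2 and ST6 figures as (ST6 − ST2) / ST2. A minimal sketch of that arithmetic is given below; the dictionary and function names are illustrative, and the numbers are those of Table 5.7.

```python
# (ST2, ST6) values taken from Table 5.7.
TABLE_5_7 = {
    "verbs": (1473, 2049),
    "nouns": (3345, 4199),
    "well-formed LexVN (tokens)": (596, 1097),
    "well-formed LexVN (types)": (102, 230),
    "erroneous LexVN (tokens)": (76, 130),
    "erroneous LexVN (types)": (41, 71),
}

def growth_rate(start, end):
    """Relative growth from `start` to `end`, e.g. 0.39 for 39%."""
    return (end - start) / start

for label, (st2, st6) in TABLE_5_7.items():
    print(f"{label}: {growth_rate(st2, st6):.0%}")
```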
In addition, Table 5.7 also shows a higher growth rate for verbs (39%) than for
nouns (26%). So the worsening collocation performance in lexical verb + noun
collocations can be inferred to be linked more closely to verb increments than to noun
increments. In what follows, a detailed analysis of verb increase is conducted, in order
to see the collocation errors that are linked to verbs showing an increase within a given
synonym set. Furthermore, the fact that nouns increase by 26% from the ST2 to the ST6
suggests that learning nouns also plays a role in the production of VN collocations.
Considering this, a case study is conducted to consider the ratios of collocation
errors associated with the learning of new nouns (see Sect. 6.3).
Sections 5.1, 5.2 and 5.3 set out to quantitatively examine the production of
verb + noun collocations by three levels of Chinese EFL learners. Unlike most
previous L2 collocation studies, this study started with an exhaustive extraction of
both the well-formed and erroneous VN collocations, and these collocations were
further divided into delexical verb and lexical verb collocations with a view to
looking into verb vocabulary growth, from delexical verbs to lexical verbs. An
apparent-time design was adopted, assuming that the performance of different age
groups of learners at different proficiency levels is indicative of a continuous
developmental process. The above three sections presented the overall results
obtained through general quantitative analyses and findings were as follows:
• Results of the overall collocations (only around 5000) out of over 600,000
words of text support the findings of the ‘open-choice principle’ employed by
L2 learners in language production by comparison with NSs. Though it is
difficult to compare the number of VN collocations relative to the size of
writings by L2 learners due to the heterogeneity of L2 VN collocation studies,
compared with previous findings with regard to NS performance, L2 learners
produced a far smaller number of collocations. In addition, the overall number
of collocation types (1070) is rather low compared with tokens. Findings
showed a heavy use of a rather small number of collocation types, i.e. fewer than
14% of collocation types accounting for more than half of the collocations produced
by Chinese EFL learners at all proficiency levels. These figures together indicate
poor L2 phraseological competence, as collocations are sparsely and repeti-
tiously used by L2 learners.
• Collocation overuse, as has been widely recognised, was discovered in this
study and yet in a different way. Unlike previous studies, the finding of overuse
was not based on comparisons of native and non-native data. Instead, com-
parisons of collocation production were performed within learner data in this
study. A rather limited set of collocation types was “overused” in the sense that
they occurred more frequently than other collocation types, making up more
than half of all collocation tokens. Therefore, learners’ collocation interlanguage
is characterised by a small number of frequent collocations making up large
portions of all collocation uses.
• Collocation misuses were found at all levels, with varying percentages. In
general, nearly a quarter of the collocations produced by L2 learners were
erroneous. The quantity of errors was analysed with regard to L2 proficiency,
and statistical analysis showed that there was no significant difference in
the numbers of erroneous collocations between learners at the ST2 and ST6 levels.
However, there was a persistent proportion of collocation misuses with the rise
of proficiency (22% at the ST2 level and 21% at the ST6 level). Rather than a
decrease in errors as L2 learners’ proficiency rises, collocation misuses remain
at the same level, which indicates a lag in collocation acquisition.
• In terms of the production of delexical verb + noun and lexical verb + noun
collocations by the three groups of learners, an upward trend for well-formed lexical
verb collocations and a downward trend for delexical verb collocations were found from the
ST2 to the ST6 level. Further statistical analyses confirmed this trend by
showing a significant increase in well-formed lexical verb + noun collocations
with the rise of proficiency.
5.4 Synopsis of the Overall Analyses of Verb + Noun Collocations 95
References
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and
non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31
(2), 81–92.
Altenberg, B., & Granger, S. (2001). The grammatical and lexical patterning of MAKE in native
and non-native student writing. Applied Linguistics, 22(2), 173–195.
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1),
101–114.
Channell, J. (1994). Vague Language. Oxford: Oxford University Press.
Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three
European studies. Canadian Modern Language Review, 59(3), 393–423.
The main research goal of this study is to answer the question whether, within
specific semantic domains of the verbs occurring in verb + noun collocations
produced by all levels of learners, these verbs are more likely at higher
levels to lead to collocational errors than to form correct collocations. Thus, Sect. 6.1
presents detailed analyses of VN collocations where verbs were classified into
synsets, aiming to investigate the relationship between verb increase and collocation
errors. Section 6.2 gives a summary of the detailed analyses of verbs in synsets
and learners’ collocation performance with these verbs. Section 6.3 looks into
whether there are other factors accounting for a lag in collocation, i.e. the
acquisition of new nouns.
The groups of learners first targeted were the ST2 and ST6, i.e. the lowest and the
highest levels. The reasons why the ST5 level was not included as the first step in
analysis are as follows: first, the ST5 level, as the transitional stage, produced the
fewest erroneous collocations as compared with the other two levels. Results from
statistical tests showed that they produced very significantly fewer VN collocation
errors than both the ST2 and ST6 groups of learners (cf. Sect. 5.1.3). Second,
analysis of erroneous lexical verb collocations in the ST5 file yielded the same
result: they produced the fewest erroneous lexical verb collocations and no sig-
nificant relationship was found between the ST5 level and the other two levels with
regard to the number of LexVN errors produced. However, there was a strong trend
for increasing LexVN errors at the ST6 level as compared with the ST2 level
(cf. Sect. 5.2.2). Therefore, the increase in lexical verbs at the ST6 level
was sharper relative to the ST2 level than relative to the ST5 level. So we
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 97
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_6
98 6 Verb Increase and the Production of Verb + Noun Collocations
started by classifying verbs in VN collocations in the ST6 level into synsets, then
classified the lexical verbs in VN collocations in the ST2 level into synsets, and
compared VN collocations within these synsets between the two levels. Finally,
verbs in the VN collocations in the ST5 level were added for general comparison
with the other two levels, in order to see whether there is a consistent trend in
learners’ collocation performance within these synsets.
As was presented in Sect. 4.7.2, the criteria for classifying verbs in VN collo-
cations were semantic similarity (synonyms), context-dependent synonyms and
foreign-language equivalents. Three sources for determining semantic similarity
were referenced: the EVCA, the ODSA and WordNet. Verbs in the VN collocations
in the ST2 and ST6 databases were classified into synsets such as verbs of creation
(e.g. compose, create, build) and verbs of obtaining (e.g. achieve, earn, receive).
Verb + noun collocations produced by Chinese L2 learners were limited in
quantity (cf. Sect. 5.1.2), so the verbs in collocations were found to be infrequent.
Due to the rather infrequent uses of verbs in collocations produced by EFL learners
with proficiency ranging from the basic level to the advanced level, only a limited
number of synsets that occurred in both databases were obtained (see Table 6.1 for
the 16 synsets classified).
As is shown in the above table, there was an increase in verbs in the first 12
synsets, but the increase varied in different synsets. More dramatic verb increase at
the ST6 level was found in the semantic sets of verbs of creation, fulfil verbs, verbs
of putting and settle verbs than other synsets like verbs of obtaining, learn verbs,
verbs of transfer of a message, keep verbs, follow verbs, play verbs, change verbs
and break verbs. But for the last four synsets, there was no such increase in the
quantity of verbs in the ST6 level. The proliferation of verbs at the higher level is a
natural process, as learners learn more words the more instruction they receive.
The more verbs in a semantic field learners learn, the more specific they can be in
expressing meanings. However, as is pointed out by Wolter (2006), L2 learning is
not merely a matter of expanding vocabulary size: the depth of vocabulary
knowledge is of equal importance, and one measure of vocabulary depth is the
learning of syntagmatic connections between words. As will be shown below,
greater specificity was acquired at a cost for L2 learners. In the following detailed
discussion of both the well-formed and erroneous VN collocations of these verbs in
the 12 synsets, we set out to examine whether these verbs lead to more errors than
correct collocations.
Table 6.1 Synsets occurring both in ST2 and ST6 VN collocation databases
   | Synsets | Verbs (ST2) | Verbs (ST6) | No.
1 | Verbs of creation | Compose, create, draw, hold, launch, raise, set | Arouse, chart, build, draft, draw, enact, establish, form, hold, launch, publish, raise, set, stir | 7
2 | “Fulfil” verbs | Discharge, fulfil | Accomplish, apply, carry out, commit, conduct, enforce, exercise, exert, fulfil, implement, perform, realise | 10
3 | Verbs of obtaining | Achieve, earn, gain, gather, grasp, receive | Achieve, catch, earn, gain, grasp, reach, receive, seize | 2
4 | Verbs of putting | Lay | Attach, fix, impose, lay, place, put | 5
5 | “Settle” verbs | Settle, solve | Charge, settle, solve, resolve, tackle, undertake | 4
6 | “Learn” verbs | Know, learn, study | Acquire, learn, master, study | 1
7 | Verbs of transfer of a message | Teach, tell | Impart, instruct, teach, tell | 2
8 | “Keep” verbs | Hold, keep | Hold, keep, maintain | 1
9 | “Follow” verbs | Follow, obey | Adopt, follow, obey | 1
10 | “Play” verbs | Play | Act, play | 1
11 | “Change” verbs | Change | Change, shift | 1
12 | “Break” verbs | Break | Break, violate | 1
13 | “Live” verbs | Lead, live | Lead, live | 0
14 | “Wear” verbs | Dress, wear | Dress, wear | 0
15 | “Drive” verbs | Drive, ride | Drive | −1
16 | “Pay” verbs | Devote, pay | Pay | −1
Note The ‘No.’ column gives the number of verbs at the ST6 level minus the number of verbs at
the ST2 level
and the target verbs falling in the synsets were included. In other words, errors
included not only the verbs within the synsets that were inappropriately produced,
but also verbs that should have been produced but were not. For example, *create
(compose) + song was classified as a collocation error in the synset of verbs of creation,
since the wrongly used verb (create) and the target verb (compose) both have the
semantics of creation. In addition, *make (compose) + poem was also counted as a
collocation error falling in the synset of verbs of creation, given that the target verb
compose was in the semantic field of creation. Detailed classification of well-formed
and erroneous collocations of the verbs within these synsets in ST2 and ST6 is
provided in Appendices B and C respectively. Analyses were performed on L2
learners’ collocation performance in synsets with a verb increase (i.e. the first 12
synsets in Table 6.1) and synsets with no increase in verbs in the ST6 level (i.e. the
last 4 synsets). Table 6.2 presents the total number of collocation types within two
different kinds of synsets (for detailed information about the frequencies in each
synset see Appendix D). The frequency of tokens was not considered in the fol-
lowing analyses so as to avoid skewing the overall results, since there was an
unbalanced distribution of tokens within a limited range of collocation types
(cf. Sect. 5.1.2).
Within the 12 synsets where there was an increase in verbs in the ST6 level (e.g.
synsets of verbs of creation, fulfil verbs, etc.), well-formed VN collocations
increased dramatically from 39 to 126 in frequencies. However, there was also an
increase in collocation errors from 22 in the lowest level to 65 in the highest level.
In contrast, among the 4 synsets where no increases in verbs were found in the ST6
level (e.g. synsets of live verbs, wear verbs, drive verbs and pay verbs), collocation
errors remained constant from the ST2 to the ST6 (2 types of errors in total in each
level). In terms of proportions, the percentage of erroneous collocations involving
verbs in synsets with a verb increase out of the total number of collocations pro-
duced by ST2 learners was 36% (22/(39 + 22)), and for ST6 learners, the per-
centage was 34% (65/(126 + 65)). The percentage of collocation errors that ST6
learners made in synsets with a verb increase was roughly the same as that in the
ST2 level. This finding indicates a lag in collocational knowledge for more profi-
cient learners. More precisely, even though ST6 learners were more advanced and
acquired more lexical verbs, they were as likely to make verb + noun collocation
errors as much less proficient learners (ST2 learners). There was no sign of
improving competence in VN collocations with the rise of proficiency.
Not only was a lag found in learners’ collocation performance in synsets with an
increase in verbs; collocation errors at the ST6 level were also found to be more
concentrated in these elaborated synsets than they were at the ST2 level.
The total numbers of erroneous collocations produced by the two groups of learners
were 64 (ST2) and 93 (ST6) respectively (cf. Table 5.2). So the proportion of
collocation errors associated with verbs in the 12 synsets in ST2 was 34% (22/64).
For ST6 learners, the ratio was twice as high as that of ST2 learners: 70% (65/93).
An increase in erroneous collocations in these synsets was thus found. Again, this gross
analysis of collocation errors out of the total number of errors indicated that the
more verbs that were learned at higher levels, the more collocation errors were
produced.
6.1 Detailed Analyses—Verb Increase and Collocation Uses 101
What has been found up to now supports the general prediction that verb
increase is a factor responsible for the stagnant development of collocational
knowledge. However, caution is needed here, since collocation errors in the above
analyses include both verbs that are old, i.e. verbs produced by the ST2 level in VN
collocations, and verbs that are new in the ST6 level, i.e. newly learned verbs that
were not found in ST2 VN collocation databases. The 65 collocation errors
produced by ST6 learners involve errors with both old verbs and new verbs. For
example, given that the verb draw in verbs of creation had been used by the lower
level (e.g. draw + conclusion), it was considered an already acquired verb for
learners at higher levels. By contrast, conduct was not used by ST2 learners in VN
collocations but was present in the ST6 level, so it was considered a new verb. In
the calculation of erroneous verb + noun collocations in Table 6.2, errors involving
both the old verb (*make (draw) + conclusion) and the new verb (*conduct
(commit) + crime) were included. Therefore, ST6 learners’ collocation performance
on old verbs and new verbs should be distinguished, in order to see whether new
verbs are more strongly associated with errors than with correct collocations.
In the process of distinguishing errors associated with old verbs and new verbs in
the ST6 level, the following criteria were adopted: if an error involved a new verb
(e.g. *publish (enact) + law, where publish was a new verb in the ST6), it was put in
the category of errors with new verbs; if an error involved an old verb, but the target
verb was a new verb (e.g. *make (conduct) + exam, where conduct was a new verb in
the ST6), it was also classified as an error with new verbs; if an error involved an
old verb (e.g. *draw (formulate) + theory, where draw is an old verb for the ST6
level), and the target verb was not a new verb in the synsets identified, it was
considered an error with old verbs.
Following these criteria, VN collocations associated with the verbs in the synsets
identified were divided into those with old verbs and new verbs. Examples of
well-formed VN collocations associated with old verbs are: launch + war, set +
fire; examples of well-formed VN collocations associated with new verbs are:
chart + course, draft + law; examples of collocation errors associated with old
verbs are *make (draw) + conclusion, *take (launch) + career; examples of errors
with new verbs are *arouse (cause) + trouble, *take (conduct) + survey. The fre-
quency information of collocation errors, divided into errors with old and new verbs
in the 12 synonym sets, is tabulated in Table 6.3.
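The three criteria for separating errors with old verbs from errors with new verbs can be sketched as a small decision function. This is only an illustration: the function name, the set name and the reduced list of new verbs are assumptions made for the example, not part of the original procedure, which was applied to the full synset lists.

```python
# New verbs at the ST6 level: a hand-picked subset of the "New verbs" column
# of Table 6.3, used here purely for illustration (hypothetical constant).
NEW_VERBS_ST6 = frozenset({"publish", "enact", "conduct", "arouse", "chart", "draft"})


def classify_error(erroneous_verb, target_verb, new_verbs=NEW_VERBS_ST6):
    """Return "new" or "old" following the three criteria described above."""
    # Criterion 1: the verb the learner actually used is a new verb.
    if erroneous_verb in new_verbs:
        return "new"
    # Criterion 2: an old verb was used, but the target verb is a new verb.
    if target_verb in new_verbs:
        return "new"
    # Criterion 3: neither the verb used nor the target verb is new.
    return "old"


# (verb used, target verb) pairs taken from the examples in the text.
examples = [("publish", "enact"), ("make", "conduct"), ("draw", "formulate")]
labels = [classify_error(used, target) for used, target in examples]
print(labels)
```

Checking the erroneous verb first and the target verb second mirrors the order of the criteria; an error counts as an error with old verbs only when both checks fail.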
As is shown in Table 6.3, old verbs and new verbs yielded the same overall number
of well-formed collocations (63 each), but errors were about twice as frequent with
new verbs (44) as with old verbs (21). The error percentage associated with new
verbs out of the number of their collocation uses is 41%, while that of old verbs is
only 25%. Apart from a comparison in percentages, further statistical analysis was
performed. Fisher’s test revealed a significant difference between old and new
verbs in terms of the number of erroneous collocations (p = 0.0216; see Table 6.4).
In other words, collocations involving new verbs are significantly more likely to be
erroneous than those involving old verbs.
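For readers who wish to check the comparison, a two-sided Fisher's exact test on the totals of Table 6.3 can be sketched in plain Python. The book does not say which software produced p = 0.0216, so the implementation below (summing the probabilities of all tables no more likely than the observed one) is an assumption about the procedure, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = pmf(a)
    # Sum the probabilities of all tables that are no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return sum(
        p for p in (pmf(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Rows: old verbs, new verbs; columns: well-formed (WFC), erroneous (EC).
p_value = fisher_exact_two_sided(63, 21, 63, 44)
print(f"p = {p_value:.4f}")
```

A value below 0.05 is in line with the significant difference reported in the text.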
Turning now to the synsets where L2 learners had more problems with new
verbs than with old verbs, it becomes clear from Fig. 6.1 that errors with new verbs
falling into the semantic domains of verbs of creation, fulfil verbs, verbs of
102 6 Verb Increase and the Production of Verb + Noun Collocations
Table 6.3 VN collocation production involving old and new verbs at the ST6 level
Synsets | Old verbs | WFC | EC | New verbs | WFC | EC
1 Verbs of creation | Draw, hold, launch, raise, set | 13 | 4 | Arouse, chart, build, draft, enact, establish, form, publish, stir | 6 | 11
2 “Fulfil” verbs | Fulfil | 3 | 1 | Accomplish, apply, carry out, conduct, enforce, exercise, exert, implement, perform, realise | 23 | 16
3 Verbs of obtaining | Achieve, earn, gain, grasp, receive | 18 | 2 | Catch, reach, seize | 6 | 6
4 Verbs of putting | Lay | 2 | 2 | Attach, fix, impose, place, put | 12 | 6
5 “Settle” verbs | Settle, solve | 2 | 0 | Charge, resolve, tackle, undertake | 4 | 1
6 “Learn” verbs | Learn, study | 0 | 4 | Acquire, master | 1 | 1
7 Verbs of transfer of a message | Teach, tell | 3 | 1 | Impart, instruct | 1 | 2
8 “Keep” verbs | Hold, keep | 12 | 2 | Maintain | 3 | 0
9 “Follow” verbs | Obey, follow | 4 | 1 | Adopt | 4 | 0
10 “Play” verbs | Play | 2 | 2 | Act | 0 | 1
11 “Change” verbs | Change | 1 | 1 | Shift | 1 | 0
12 “Break” verbs | Break | 3 | 1 | Violate | 2 | 0
Total | | 63 | 21 | | 63 | 44
Total (WFC + EC) | | 84 | | | 107 |
ER | | 25% | | | 41% |
Notes ‘WFC’ stands for well-formed verb + noun collocations; ‘EC’ for erroneous verb + noun
collocations; ‘ER’ represents the ratio of the errors out of all the collocations examined in the
column
Fig. 6.1 Collocation errors involving old and new verbs in the ST6 synsets
obtaining, verbs of putting, settle verbs and verbs of transfer of a message occurred
more often than with old verbs. This result is strongly linked with synset classifi-
cation in which there is a proliferation of verbs just within the six synsets (cf.
Table 6.1). Therefore, verb increase as an inhibiting factor on target-like L2 col-
location performance is again supported.
The previous section has addressed the relationship between verb increase and
collocation errors in the lowest (ST2) and highest (ST6) levels and quantitative
analysis shows: (1) errors in the synsets with more verbs learnt in the ST6 level than
the ST2 level increased with rising proficiency; (2) errors with new verbs were
significantly more likely than with old verbs. In this section, the focus shifts to the
middle level (ST5), so as to see whether there is a consistent trend in the ST5 level
in terms of both verb increase in the 12 synsets and collocation uses linked with
these verb synsets. It is predicted that the performance of ST5 learners is consistent,
i.e. compared with the ST2 level there is an increase in verbs in the synsets and at
the same time an increase in collocation errors associated with the verbs in the
synsets, and there are fewer verbs and collocation errors, compared with the ST6
level.
All the lexical verbs in the verb + noun collocations in the ST5 database were
classified into synsets following the same procedure and criteria applied in the
analyses of the other two levels. Table 6.5 lists the classification of verbs in the ST5
level together with the verbs in synsets identified in the ST2 and ST6 levels.
Table 6.5 Verb synsets classified from ST2, ST5 and ST6 VN collocation databases
Types | Synsets | Verbs in ST2 | Verbs in ST5 | Verbs in ST6
1 | Verbs of creation | Compose, create, draw, hold, launch, raise, set | Arouse, build, conduct, draw, establish, form, hold, launch, produce, publish, raise, set | Arouse, chart, build, draft, draw, enact, establish, form, hold, launch, publish, raise, set, stir
2 | “Fulfil” verbs | Discharge, fulfil | Apply, enforce, fulfil, perform, practice | Accomplish, apply, carry out, commit, conduct, enforce, exercise, exert, fulfil, implement, perform, realise
3 | Verbs of obtaining | Achieve, earn, gain, gather, grasp, receive | Achieve, catch, earn, gain, grasp, reach, receive, seize | Achieve, catch, earn, gain, grasp, reach, receive, seize
4 | Verbs of putting | Lay | Attach, lay, place, put, set | Attach, fix, impose, lay, place, put
5 | “Settle” verbs | Settle, solve | Resolve, solve | Charge, settle, solve, resolve, tackle, undertake
6 | “Learn” verbs | Know, learn, study | Learn, master, study | Acquire, learn, master, study
7 | Verbs of transfer of a message | Teach, tell | Teach, tell | Impart, instruct, teach, tell
8 | “Keep” verbs | Hold, keep | Hold, keep | Hold, keep, maintain
9 | “Follow” verbs | Follow, obey | Adopt, follow, obey | Adopt, follow, obey
10 | “Play” verbs | Play | Play | Act, play
11 | “Change” verbs | Change | Change | Change, shift
12 | “Break” verbs | Break | Break, violate | Break, violate
13 | “Live” verbs | Lead, live | Lead, live | Lead, live
14 | “Wear” verbs | Wear, dress | Wear | Wear, dress
15 | “Drive” verbs | Drive, ride | Ride | Drive
16 | “Pay” verbs | Devote, pay | Pay | Pay
In terms of the variety of verbs falling in the first 12 synsets, the ST5 level falls
in the middle between the lower level (ST2) and the higher level (ST6). In each of
the 12 synsets, ST5 learners produced no more verbs than the higher level and no
fewer than the lower level. A clear and consistent trend for verb increase is shown
in Table 6.5. More and more verbs were learned in the semantic domains of verbs of
creation, fulfil verbs, verbs of obtaining and verbs of putting. On the whole, the
quantity of verbs in the ST5 level is closer to that produced by ST2 learners,
since in the synsets of settle verbs, learn verbs, verbs of transfer of a message,
keep verbs, play verbs, change verbs and live verbs, the ST5 level shows no
increase in verbs compared with the lowest level.
Collocations with the verbs in the 16 synsets in the ST5 level were divided into
correct and erroneous uses, as is shown in Appendix E. Then similar analyses of
well-formed and erroneous collocation uses associated with verbs in the 12 synsets
were performed on the ST5 data (see the numerical presentation of collocation uses
associated with the 12 synsets in the three levels in Table 6.6, and for the detailed
frequency information, see Appendix F).
For the last four synsets in which higher levels did not produce more verbs than
the lower levels, no increase in well-formed and erroneous VN collocations was
found from the lowest to the highest level (well-formed collocations: from 11 to 6
to 7; errors: from 2 to 1 to 2). In contrast, the overall figures for well-formed and
erroneous VN collocations in each proficiency group showed that within the synsets
where there was a verb increase (the first 12 synsets), more and more well-formed
collocations (from 39, to 61 and then to 126 types) were produced; at the same time
errors increased as well from the ST2 to the ST6 level (from 22 to 65 types). There
was not an error increase from the ST2 to the ST5 level, which may be due to the
slight verb increase in synsets in the ST5 level compared with the ST2 level. In
addition, in terms of proportions, the percentage of collocation errors involving the
12 synsets in the ST5 was the lowest: 27% (for ST2: 36%; ST6: 34%) (see
Table 6.7). When the numbers of errors are placed into the bigger context of the
total collocation errors in each level, there is a clear increase in the proportions of
errors involving verbs in the 12 synsets.
As Table 6.7 reveals, the ratios of errors out of VN collocations associated with
the synsets identified did not show much decrease with rising proficiency, indi-
cating a general lag in collocational knowledge. In addition, out of the total number
Table 6.6 Well-formed and erroneous VN collocations in the 16 verb synsets (ST2, ST5 and ST6)
Synsets | ST2 WFC | ST2 EC | ST2 Total | ST5 WFC | ST5 EC | ST5 Total | ST6 WFC | ST6 EC | ST6 Total
The first 12 synsets | 39 | 22 | 61 | 61 | 22 | 83 | 126 | 65 | 191
The last 4 synsets | 11 | 2 | 13 | 6 | 1 | 7 | 7 | 2 | 9
Note ‘WFC’ stands for well-formed VN collocations, and ‘EC’ for erroneous VN collocations
Table 6.7 Proportions of VN collocation errors associated with the 12 synsets with a verb increase
Levels | Errors in synsets | Total errors | Total colls. in synsets | ER1 (%) | ER2 (%)
ST2 | 22 | 64 | 61 | 34 | 36
ST5 | 22 | 44 | 83 | 50 | 27
ST6 | 65 | 93 | 191 | 70 | 34
Notes ‘colls.’ stands for collocations; ‘ER1’ represents the ratio of errors in synsets out of the total
number of collocation errors in each level; ‘ER2’ represents the ratio of errors in synsets out of the
total number of both well-formed and erroneous collocations involving the 12 synsets
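The two error ratios in Table 6.7 follow directly from the raw counts. The short sketch below recomputes them; the dictionary layout and key names are illustrative assumptions, while the figures are those given in the table.

```python
# Raw counts from Table 6.7 (key names are illustrative).
table_6_7 = {
    "ST2": {"errors_in_synsets": 22, "total_errors": 64, "colls_in_synsets": 61},
    "ST5": {"errors_in_synsets": 22, "total_errors": 44, "colls_in_synsets": 83},
    "ST6": {"errors_in_synsets": 65, "total_errors": 93, "colls_in_synsets": 191},
}


def percent(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


ratios = {}
for level, row in table_6_7.items():
    # ER1: errors in the 12 synsets out of all collocation errors at the level.
    er1 = percent(row["errors_in_synsets"], row["total_errors"])
    # ER2: errors in the 12 synsets out of all collocations in those synsets.
    er2 = percent(row["errors_in_synsets"], row["colls_in_synsets"])
    ratios[level] = (er1, er2)
    print(f"{level}: ER1 = {er1}%, ER2 = {er2}%")
```

ER1 rises from level to level while ER2 stays roughly flat, which is the contrast the discussion relies on.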
Fig. 6.2 VN collocation errors with the verbs in the twelve synsets across the three levels
Through the detailed analyses above of verb increase within particular semantic sets
and its relationship with collocation performance, the development of learners’
VN knowledge was observed to stagnate. On the one hand, the percentages of
errors involving the 12 synsets identified from learners’ production of VN
collocations remained roughly constant (36% at the ST2 level and 34% at the ST6
level) with rising proficiency. In addition, errors with new verbs in the ST6 level
were significantly more likely than errors with old verbs. On the other, the ratios of
collocation errors with the verbs in the synsets out of the total number of errors
increased successively from the lowest to the highest level (ST2: 34%; ST5: 50%
and ST6: 70%). As learners proceeded to more advanced levels, the occurrence of
collocation errors became more and more limited to synsets with a verb increase.
Verb classes most susceptible to errors were verbs of creation, fulfil verbs, verbs of
obtaining and verbs of putting, where there was a considerable increase in the
number of verbs. Thus, we consider that the increase of verbs in a particular
semantic domain is an inhibiting factor for the learning of VN collocations.
There is a view that collocations defeat even the most proficient English learners
because of the arbitrary restrictedness in collocations (Nesselhauf 2003, 2005).
Collocations are not just semantically motivated, but also involve arbitrarily
restricted selection. For example, blonde hair in English is felicitous, but *blonde
paint is not, and auburn hair is used to describe women, but not men (Schmitt and
Carter 2004: 14). The restricted nature of collocation has been considered the
most important factor correlating with learners’ difficulties with collocation
production (Nesselhauf 2003, 2005).
In light of our findings, what poses great difficulties for L2 learners is to dis-
tinguish among a group of semantically related verbs (e.g. perform vs. implement,
conduct vs. commit, etc.). In the erroneous collocation *conduct + crime instead of
commit + crime, it can be seen that they both share the semantic features of “carry
out something”, but differ from each other in the sense that “commit” denotes
“doing something illegal or bad”, but “conduct” means “organising something and
carry it out”.1 Likewise, in the following collocation error: *implement + act, the
learner who has made such an error may know a partial meaning of the verb
1
The meanings of the two verbs were quoted from Collins COBUILD Advanced Learner’s English
Dictionary (2006).
6.2 Synopsis of Detailed Analyses of Verb Increase and Collocation Uses 109
implement (i.e. the semantic component as “carrying out something”) but not its
complete semantics. This partial acquisition has led to the learner’s incorrect belief
that the erroneous verb implement can be combined with the noun act. However,
implement means more than that. The misused verb (implement) and the target verb
(perform) both belong to the semantic field of fulfil verbs and verbs in this set have
a small number of semantic features in common, but they are distinguished by
specific meanings. Both verbs have the semantic component “carry out something”,
but implement is distinguished from perform in that it implies: “to ensure what has
been planned is done” (e.g. implement a plan). For the verb perform, it simply
suggests “doing a (usually) complicated task or action”. So it is inferred that when a
learner has an incomplete command of implement, i.e. only the semantic component
“carrying out something”, but not its distinctive feature of “ensuring
something that has been planned is completed”, collocation errors like
*implement + act are likely to be made. Therefore, judging from the erroneous VN
collocations produced by Chinese L2 learners, learners acquired only a fraction of
the semantics of a verb, but not the features that distinguish it from a set of
semantically related verbs. In this sense, acquisition of verb semantics is important
for successful learning of collocations.
The research hypothesis tested above was that verb increase is the main factor
responsible for the stagnant L2 collocation development. Accordingly, it is
predicted that other factors, e.g. the acquisition of new nouns, are not the main
inhibiting factors in collocation performance. Our prediction is thus that for the
majority of new nouns produced by higher levels of learners, learners produce
correct VN collocations. The prediction that noun increase is not the inhibiting
factor would be further confirmed if the percentage of new nouns in erroneous
collocations remained constant within the levels of ST5 and ST6.2
In order to examine whether collocation lag is a result of new noun acquisition,
specifically, whether collocation errors are made because learners learn a large
proportion of new nouns, the empirical requirement was to identify newly acquired
nouns by L2 learners. It was not feasible to see if a noun was new or old through
asking the learner him/herself at the time of their writing. The identification of new
nouns at a higher level was therefore implemented by examining nouns produced
by lower levels of learners. New nouns were those which were not used by lower
levels (i.e. with no occurrences in the file at lower levels), whilst old nouns were
those that were used by both groups of learners (i.e. with occurrences in
2
Given the constraint of locating new nouns in the lowest ST2 level, the proportion of collocation
errors where new nouns occur cannot be obtained.
both the files of the two groups). Taking the ST2 and ST6 learners for the purposes
of illustration, the nouns only occurring in the ST6 file were assumed to be newly
acquired nouns by ST6 learners and old ones were those that occurred in both ST6
and ST2 sub-corpora.3
The search for new nouns was performed automatically. With the ST2 and ST6
as examples, procedures involving the identification of new nouns in the colloca-
tions produced by ST6 learners were as follows:
• Store all the nouns in the VN collocations produced by ST6 learners in a text
file;
• Generate a list of all the nouns in the ST2;
• Use Wordsmith (the Match function) to delete all the matched nouns between
the wordlist of nouns in ST6 collocations and the nouns in ST2, and get a list of
new nouns in the ST6.
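The matching step amounts to a set difference between two noun lists. The study itself used the Match function in Wordsmith; the sketch below substitutes a plain Python set operation for it, and the function name and the toy noun lists are hypothetical (the nouns are borrowed from examples in this chapter only to make the illustration concrete).

```python
def find_new_nouns(higher_level_nouns, lower_level_nouns):
    """Return nouns used at the higher level but absent from the lower level."""
    lower = {noun.lower() for noun in lower_level_nouns}
    higher = {noun.lower() for noun in higher_level_nouns}
    # Deleting every matched noun leaves only the "new" nouns.
    return sorted(higher - lower)


# Toy lists standing in for the ST6 collocation nouns and the ST2 noun list.
st6_collocation_nouns = ["burden", "survey", "principle", "criminal", "treaty"]
st2_nouns = ["principle", "criminal", "fire"]

new_nouns = find_new_nouns(st6_collocation_nouns, st2_nouns)
old_nouns = sorted(set(st6_collocation_nouns) - set(new_nouns))
print(new_nouns)
print(old_nouns)
```

As in the original procedure, a noun counts as old as soon as it occurs anywhere in the lower-level list, regardless of how often or by how many learners it was used.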
Once the above procedure had been carried out, analyses of new nouns were first
performed at the ST6 level (new nouns as compared with the ST2 level), then new
nouns (new nouns as compared with the ST5 level) in the ST6 were analysed and
finally new nouns in the ST5 level (new as compared with the ST2 level) were
analysed.
Altogether there were 264 nouns in the ST6 VN collocations, out of which 72 were
new nouns that did not appear in the ST2 file, and 192 old nouns that were used
both by ST2 and ST6 learners. A further categorisation was carried out among the
new nouns that occurred in the erroneous verb + noun collocations and correct
collocations. The results are presented in Table 6.8.
As is shown in Table 6.8, 25% of the overall number of new nouns in
ST6 VN collocations were in erroneous collocations, which means that
three-quarters of these newly acquired nouns were in correct collocations. It then
becomes interesting to see whether collocation errors occur because of a lack of new
verbs for newly acquired nouns or whether it is just a matter of learners’ inability
to associate the new nouns with already acquired verbs. In the first case, the
inhibiting role of the new nouns will be manifested, as learners encounter two
difficulties: newly acquired nouns and a lack of appropriate verbs. If the
collocating verbs for the new nouns are
3
These can only be assumptions, since it may well be that some new nouns in ST6 were actually
acquired by ST2 learners but not used (e.g. button, decision), or that among old nouns in the ST6 and ST2,
some were idiosyncratic uses by one learner (e.g. criminal, principle) and not acquired by ST2
learners in general. These cases did exist but were rare, so the assumptions were on the whole justified. In
addition, since groups of learners were targeted, the individual differences between learners could
not be identified.
6.3 An Alternative Explanation: New Nouns and Collocation Uses 111
Table 6.8 New nouns and old nouns in ST6 VN collocations (new nouns as compared with ST2)
Collocations | New nouns | Old nouns | Total
Erroneous coll. | 18 (25%) | 42 | 60
Correct coll. | 54 (75%) | 150 | 204
Total | 72 (100%) | 192 | 264
already acquired but not correctly used with the new nouns, we can infer learners’
split learning of collocations into individual words, rather than association of a
newly acquired noun with an old verb. The 18 new nouns and their collocating verbs
were analysed in the erroneous verb + noun collocations produced by ST6 learners,
shown in Table 6.9.4 Table 6.9 also provides information concerning whether or not
the erroneous and the target verb collocates of the new nouns are present in the ST2
file. If the target verb (e.g. impose) was not present in the ST2 file (signalled by a
minus symbol “−”), this verb was considered as a new verb for the ST6 learner.
Otherwise it was an old verb (e.g. play).
To see whether ST6 learners wrongly used a new verb instead of an old verb to
collocate with a new noun, or used an old verb instead of another old/new verb
for the new noun, the 27 collocation pairs of the 18 new nouns in ST6 erroneous
collocations displayed in Table 6.9 were further classified into three categories,
according to whether the erroneous and target verbs were already learnt by
ST2 learners.
Instances of this category are *give + burden, *lay + burden, *release + burden,
*surpass + advantage, *break + regulation, *build + regulation, *take + survey,
*do + threat, *impose + threat, *conduct + murder, and *cause + imagination.
That the target verbs for these erroneous collocations (e.g. impose, relieve, outweigh,
etc.) were not used by ST2 learners suggests that they were new to ST6 learners or
had not been fully acquired yet. With *give + burden as an example, the target verb
impose was not used by ST2 learners, suggesting impose may be a new verb to the
ST6 learner who had acquired a new noun—burden. Thus, learners were found to
make collocation errors as such by using a verb they had already acquired (in the
case of burden, they used give and lay). Among the 11 collocations where collo-
cation errors may take place as a result of a lack of new verbs, 8 of the erroneous
verbs in the VN collocations are old verbs with an appearance in the ST2 file,
meaning they may have been already acquired by ST6 learners. This is natural given
that L2 learners have not acquired the target new verbs and have to use old verbs
instead. Learners acquire a new noun, but they may not acquire the collocating verbs,
4
Among the 18 nouns, there is a noun phrase—military service, which was regarded as one noun
for the convenience of analysis.
Table 6.9 18 New nouns in ST6 erroneous VN collocations and their verb collocates (new nouns as compared with ST2)
Nouns | Erroneous verbs | Target verbs | Erroneous verbs in ST2 | Target verbs in ST2
Burden | Give | Impose | + | −
Burden | Lay | Impose | + | −
Burden | Release | Relieve | − | −
Role | Lead | Play | + | +
Role | Lay | Assign | + | +
Role | Act | Play | + | +
Role | Serve | Play | + | +
Consciousness | Stir | Raise | − | +
Disadvantage | Surpass | Outweigh | + | −
Regulation | Break | Violate | + | −
Regulation | Build | Enact | + | −
Survey | Take | Conduct | + | −
Threat | Do | Pose | + | −
Threat | Impose | Pose | − | −
Murder | Conduct | Commit | − | −
Prejudice | Reflect | Hold | − | +
Prejudice | Cast | Hold | − | +
Treaty | Draw | Sign | + | +
Chat | Make | Have | + | +
Competence | Exert | Demonstrate | − | +
Imagination | Cause | Excite | + | −
Load | Pull | Carry | + | +
Mercy | Cast | Have | − | +
Recognition | Reach | Receive | + | +
Military service | Attend | Perform | + | +
Military service | Take | Perform | + | +
Measure | Make | Take | + | +
Note ‘+’ means that the verb appears in the ST2 sub-corpus, and ‘−’ represents an absence of the
verb in the ST2 sub-corpus
and collocation errors may thus occur. In the other three instances where new verbs
had not been acquired, newly acquired verbs were incorrectly used with the new
nouns, i.e. *release + burden, *impose + threat, and *conduct + murder.5
5
The target verbs for the three new nouns (relieve, pose and commit) and the new erroneous verbs
(release, impose, conduct) share a partial phonological resemblance, which could be seen as
phonological interference; the target verbs may have been known by learners but not fully
acquired yet.
In all, the 11 types of erroneous collocations in the ST6 learner group can be
attributed to a lack of new collocating verbs for newly acquired nouns.
This type arises when learners acquire a new noun and misuse one known verb in
place of another already acquired verb. Erroneous collocations involving misuses of
old verbs include *lead + role (correct verb: play), *lay + role (assign), *act + role
(play), *serve + role (play), *draw + treaty (sign), *make + chat (have), *pull +
load (carry), *reach + recognition (receive), *attend + military service (perform),
*take + military service (perform), *make + measure (take). Errors of this type can
be interpreted as a split learning of VN collocations, i.e. new nouns were learnt in
isolation instead of being learnt in collocational relationships with already learnt
verbs.
In all, the percentage of new nouns that are linked to collocation errors in the
ST6 is 25%, which means that among 100 nouns that are newly acquired by L2
learners, learners correctly find a collocating verb in 75 of the cases. Even in errors
involving the wrong choices of verbs for the newly acquired nouns, fewer than half
of the errors involving new nouns (41%: 11/27) arise when learners do not know
the new target verb collocates (as illustrated in Category a). However, in
more than half of the cases (59%: 16/27), errors arise when the target verb may
have already been acquired by ST6 learners for the newly acquired nouns (as
illustrated in Categories b and c). These figures can be interpreted to the effect that
collocation errors with new nouns occur even though learners in most cases do not
lack the verbs for newly acquired nouns. For example, either new verbs are misused
instead of an old verb (e.g. *stir + consciousness instead of raise, *reflect +
prejudice instead of hold) or another old verb is misused instead of another old
target verb (e.g. *lead + role instead of play, *make + measure instead of take). On
the one hand, that learners use newly acquired verbs to combine with newly
acquired nouns can be viewed as boldness in collocation learning, i.e. they are
experimenting with verbs that have been newly learnt. On the other hand, the fact
that L2 learners fail to associate nouns with known verbs in collocations suggests
the learning of new nouns in isolation, instead of being learnt as prefabricated
chunks with the already acquired verbs. This finding supports Wray’s (2002) claim
of a split learning of collocations into individual items, or the inability to pay
attention to collocational relationships between words on the part of L2 learners.
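The three-way classification of the 27 collocation pairs can be reproduced from the ‘+’ and ‘−’ columns of Table 6.9. The sketch below is an illustration only: the data are transcribed from the table, but the category labels and the function name are assumptions introduced here, with True standing for ‘+’ and False for ‘−’.

```python
from collections import Counter

# (noun, erroneous verb, target verb, erroneous verb in ST2?, target verb in ST2?)
PAIRS = [
    ("burden", "give", "impose", True, False),
    ("burden", "lay", "impose", True, False),
    ("burden", "release", "relieve", False, False),
    ("role", "lead", "play", True, True),
    ("role", "lay", "assign", True, True),
    ("role", "act", "play", True, True),
    ("role", "serve", "play", True, True),
    ("consciousness", "stir", "raise", False, True),
    ("disadvantage", "surpass", "outweigh", True, False),
    ("regulation", "break", "violate", True, False),
    ("regulation", "build", "enact", True, False),
    ("survey", "take", "conduct", True, False),
    ("threat", "do", "pose", True, False),
    ("threat", "impose", "pose", False, False),
    ("murder", "conduct", "commit", False, False),
    ("prejudice", "reflect", "hold", False, True),
    ("prejudice", "cast", "hold", False, True),
    ("treaty", "draw", "sign", True, True),
    ("chat", "make", "have", True, True),
    ("competence", "exert", "demonstrate", False, True),
    ("imagination", "cause", "excite", True, False),
    ("load", "pull", "carry", True, True),
    ("mercy", "cast", "have", False, True),
    ("recognition", "reach", "receive", True, True),
    ("military service", "attend", "perform", True, True),
    ("military service", "take", "perform", True, True),
    ("measure", "make", "take", True, True),
]


def categorise(erroneous_in_st2, target_in_st2):
    """Assign a pair to one of three categories (labels are illustrative)."""
    if not target_in_st2:
        return "target verb absent from ST2"
    if not erroneous_in_st2:
        return "new verb used instead of old target verb"
    return "old verb used instead of old target verb"


counts = Counter(categorise(err, tgt) for _, _, _, err, tgt in PAIRS)
target_known = len(PAIRS) - counts["target verb absent from ST2"]
for label, n in counts.items():
    print(f"{label}: {n} ({round(100 * n / len(PAIRS))}%)")
print(f"target verb present in ST2: {target_known} of {len(PAIRS)}")
```

The first category corresponds to the 11 pairs in which the target verb may not yet have been acquired; the other two together make up the 16 pairs in which the target verb was already attested in the ST2 file.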
Therefore, the role played by new nouns in the collocation lag in ST6 learners
can be viewed as no more than a minor one, influencing a limited percentage of new
nouns in erroneous collocations (25%). In most cases where collocation errors with
new nouns occur, it is because of an inability to associate the new nouns with
already acquired verbs. On this basis, collocation lag is not a result of newly
acquired nouns.
It should be recalled that ST2 learners are senior middle school students, repre-
senting the lowest level in the CLEC corpus, and ST6 learners are third and
fourth-year university English majors, representing the highest proficiency level
within the corpus. Between these two levels of proficiency, there is the ST5 level of
first and second-year English majors. Therefore, in order to ensure the continuity of
between-group comparisons, VN collocation performance of ST5 learners was
taken into consideration as well. Following the same procedure as in the analysis of
ST6 as compared with the ST2 level, learners’ collocation performance in terms of
new nouns in the ST6 (new as compared with the ST5 level) and ST5 (new as
compared with the ST2 level) was analysed. Frequencies are presented in
Tables 6.10 and 6.11.
Table 6.10 New and old nouns in ST6 VN collocations (new nouns as compared with ST5)
Collocations | New nouns | Old nouns | Total
Erroneous coll. | 5 (14%) | 55 | 60
Correct coll. | 31 (86%) | 173 | 204
Total | 36 (100%) | 228 | 264
Table 6.11 New and old nouns in ST5 VN collocations (new nouns as compared with ST2)
Collocations | New nouns | Old nouns | Total
Erroneous coll. | 6 (11%) | 28 | 34
Correct coll. | 48 (89%) | 165 | 213
Total | 54 (100%) | 193 | 247
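The shares of new nouns occurring in erroneous collocations in Tables 6.8, 6.10 and 6.11 can be recomputed from the cell counts. The following is a minimal sketch; the dictionary keys are descriptive labels chosen for this illustration, and the counts are those printed in the three tables.

```python
# (new nouns in erroneous collocations, new nouns in correct collocations)
new_noun_counts = {
    "ST6 vs ST2 (Table 6.8)": (18, 54),
    "ST6 vs ST5 (Table 6.10)": (5, 31),
    "ST5 vs ST2 (Table 6.11)": (6, 48),
}

error_shares = {}
for comparison, (erroneous, correct) in new_noun_counts.items():
    total_new = erroneous + correct
    # Share of new nouns that occur in erroneous collocations, in per cent.
    error_shares[comparison] = round(100 * erroneous / total_new)
    print(f"{comparison}: {erroneous}/{total_new} = {error_shares[comparison]}%")
```

Each percentage has its own denominator (72, 36 and 54 new nouns respectively), which should be kept in mind when the three figures are compared.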
As a generalisation from Tables 6.8, 6.10 and 6.11, it becomes clear that the
percentages of new nouns in erroneous collocations are low and remain roughly
constant at the two higher levels. The percentage of new nouns in erroneous col-
locations in ST5 is 11% and the ratio is 14% for ST6 learners. So again the
prediction that the acquisition of new nouns is not an inhibiting factor in collocation
performance is upheld. It is interesting that if the new nouns at the ST6 level are
identified as new with reference to the nouns produced by the ST2 learners, the
percentage is 25%, which equals the sum of 11% and 14%. It follows that when the
nouns in the highest level (ST6) are compared with those produced by the lowest
level (ST2), more new nouns are obtained than when they are compared with a lower
level (ST5). The three percentages (11, 14 and 25%) suggest two significant points: the
proficiency of the three levels of learners is continuously developing (as manifested
through a gradual increase in newly acquired nouns); the percentages of new nouns
in erroneous collocations appear to describe a coherent trend, supporting the
validity of the analysis method in this study.
Turning now to the detailed analyses of new nouns in erroneous VN colloca-
tions, new nouns in erroneous collocations in the ST6 (new as compared with the
ST5 level) and those in the ST5 (new as compared with the ST2 level) were
analysed (see Tables 6.12 and 6.13).
From Tables 6.12 and 6.13, it can be inferred that in nearly all the cases of new nouns
in erroneous collocations, ST5 and ST6 learners may know the target verb but fail to
use it with the new nouns (except *make + offence, *do + regulation). Even
when they do not lack the collocating verb for a new noun, errors still arise, which
again reveals a split learning of collocations.
In conclusion, the research hypothesis that it is the increase in verbs that is
mainly responsible for the stagnant collocation performance was upheld. In this
section we attempted an alternative explanation to see if the learning of newly
acquired nouns is also a factor responsible for the collocation lag. It was found that
the occurrence of new nouns is not the main factor responsible for stagnant
Table 6.12 New nouns in ST6 VN erroneous collocations and their verb collocates (new nouns as compared with ST5)
Nouns | Erroneous verbs | Target verbs | Erroneous verbs in ST5 | Target verbs in ST5
Treaty | Draw | Sign | + | +
Competence | Exert | Demonstrate | − | +
Load | Pull | Carry | + | +
Recognition | Reach | Receive | + | +
Military service | Attend | Perform | + | +
Military service | Take | Perform | + | +
Note ‘+’ means that the verb appears in the ST5, and ‘−’ represents an absence of the verb in the
ST5
116 6 Verb Increase and the Production of Verb + Noun Collocations
Table 6.13 New nouns in ST5 VN erroneous collocations and their verb collocates (new nouns
as compared with ST2)
Nouns        Erroneous verbs   Target verbs   Erroneous verbs in ST2   Target verbs in ST2
Role         Do                Play           +                        +
Role         Act               Play           +                        +
Role         Occupy            Play           +                        +
Role         Lay               Play           +                        +
Drum         Hit               Beat           +                        +
Eyebrow      Frown             Raise          −                        +
Offence      Make              Commit         +                        −
Regulation   Do                Enact          +                        −
Utmost       Make              Do             +                        +
Note: ‘+’ means that the verb appears in the ST2, and ‘−’ represents an absence of the verb
in the ST2
References
Meara, P. (1978). Learners’ word associations in French. Interlanguage Studies Bulletin, 3(2),
192–211.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt
(Ed.), Formulaic sequences: Acquisition, processing and use (pp. 1–22). Amsterdam:
Benjamins.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1
lexical/conceptual knowledge. Applied Linguistics, 27(4), 741–747.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Chapter 7
Chinese Learners’ Performance on English Adjective + Noun and Noun + Noun Collocations
the lower level learners and 15 instances by the higher level. Apart from this, the
two tables reveal two important aspects. Firstly, the ratios of AN collocation
errors are not high for either group of learners in terms of either tokens or
types. These ratios are 12% for ST2 learners and 4% for ST6 learners, much lower
than the proportions of verb + noun collocation errors (22% for ST2 and 21% for
ST6; cf. Sect. 5.1.3). Secondly, learners at the higher level produced
significantly more well-formed collocations than those at the lower level, and
there was a significant decrease in errors. This decrease in AN collocation errors
stands in sharp contrast to the production of VN collocations, for which the
proportion of errors does not show a clear decrease.
Therefore, based on quantitative analyses, the data show that Chinese learners do
not seem to have great difficulties with adjective + noun collocations. In addition,
their knowledge of AN collocations improves with rising proficiency. These results
corroborate the findings from previous studies of L2 learners’ learning of AN
collocations (e.g. Gitsaki 1999; Siyanova and Schmitt 2008; Zhang and Chen 2006;
cf. Sect. 3.2.1.2). Adjective + noun collocations have been identified as an “easy”
and “early acquired” type of collocation (Gitsaki 1999), and more proficient
learners have been found to have a better command of AN collocations than learners
at lower levels (Zhang and Chen 2006).
7.2 Analyses of Noun + Noun Collocations
Examples of frequent noun + noun collocations produced by ST2 and ST6 learners
are art festival, basketball match, book shop, fire fighter, swimming pool and
crime rate.¹ Erroneous NN collocations include *artist festival (art festival),
*homehold duties (household duties) and *scientist book (science book).
Altogether there are 82 and 46 instances of erroneous noun + noun combinations in
the ST2 and ST6 databases, respectively. A detailed look into the errors involving
noun + noun combinations shows that not all of these errors are collocation errors,
i.e. errors associated with the wrong choice of words within the same word class.
Instead, a large proportion of them are colligation errors, i.e. errors linked with
wrong choices of grammatical category. Whereas collocation refers to the
co-occurrence of words, colligation refers to the co-occurrence of grammatical
choices. So a collocation error is an error of lexical selection in which the word
class is correct, e.g. *learn knowledge rather than acquire knowledge. A colligation
error is an error in the word class of a word in a combination, e.g. *industry city
rather than industrial city. Table 7.3 presents the NN colligation errors in the ST2
and ST6 databases.
¹ In the creation of the ST6 NN collocation database, two collocations, mercy killing
(used 255 times) and prison system (107 times), were deleted so as to avoid
statistical skew. These two collocations occur so often because they are
topic-related: two of the topics given to the English majors are “the legalisation
of euthanasia in China” and “the abolition of prisons”.
Table 7.3 Noun + noun colligation errors produced by ST2 and ST6 learners
ST2                                      ST6
Errors                  Target words     Errors                      Target words
Flowers exhibition      Flower           Electricity lamps           Electric
Foreigner teacher       Foreign          Environment consciousness   Environmental
Happiness family        Happy            Families members            Family
Industry city           Industrial       Feminism movement           Feminist
Interest book           Interesting      Feudalism society           Feudalist
Socialism country       Socialist        Globe economy               Global
Socialism reformation   Socialist        Heat debate                 Heated
Sport ground            Sports           Heat topic                  Heated
Sport meeting           Sports           Importance step             Important
Sports trousers         Sport            Industry revolution         Industrial
Summer’s holiday        Summer           Limit recourses             Limited
History’s test          History          Medicine fee                Medical
Math’s test             Math             Nationality defence         National
Freedom life            Free             Nature process              Natural
People computer         Personal         Scenery spots               Scenic
                                         Socialism construction      Socialist
                                         Socialism countries         Socialist
                                         Society evolution           Social
                                         Society factor              Social
                                         Society problem             Social
                                         Society wealth              Social
                                         Examples sentences          Example
                                         Capitalism countries        Capitalist
                                         Science way                 Scientific
                                         Economy development         Economic
                                         Economy growth              Economic
                                         Stars hotel                 Star
In most of the colligation errors, learners wrongly use a noun instead of its
adjectival form, e.g. *foreigner teacher rather than foreign teacher, *electricity
lamps rather than electric lamps, and *environment consciousness instead of
environmental consciousness. In English, both nouns and adjectives can function as
modifiers of a following noun, and this grammatical feature seems to baffle
learners when they choose which form to combine with the noun that follows. Another
factor behind the misuse of nouns for adjectival modifiers may be cross-linguistic:
a noun modifier before another noun is very common in Chinese syntax. What is
more interesting among these errors is that learners are sometimes conscious of this
typical grammatical feature of English and try to use adjectival modifiers such as
possessives before the nouns. As the examples in Table 7.3 show, *summer’s holiday,
*history’s test and *math’s test indicate an awareness of adjectival modifiers. Yet
the overgeneralisation leads to colligation errors.
In terms of the quantities of these errors, learners do not seem to get better with
NN colligations as their proficiency level rises. On the contrary, their performance
on noun + noun colligations worsens. Tables 7.4 and 7.5 present the frequencies of
colligation errors and non-colligation errors in the two proficiency groups.
From the two tables, it can be seen that, in percentage terms (tokens), 83% of the
NN combination errors made by ST6 learners are colligation errors, whilst the
percentage for ST2 learners is 43%. Fisher’s test on the frequencies of tokens and
types showed a significant difference between the learner groups in terms of the
production of colligation errors: colligation errors account for a significantly
larger share of NN errors among ST6 learners than among ST2 learners. This may be
because the influence of the noun + noun structure in the L1 Chinese is very
persistent even at higher levels. We shall turn to the L1 influence on the learning
of collocations by L2 learners in Chap. 9.
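As a rough check on this comparison, a two-sided Fisher’s exact test can be recomputed from the cell counts in Tables 7.4 and 7.5. The sketch below uses only the Python standard library; the function name fisher_exact_two_sided is an illustrative assumption, and since the study does not state which software or which two-sided definition was used, the resulting p-values may differ slightly from those reported.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Rows: ST2, ST6; columns: colligation errors, non-colligation errors.
tokens = (34, 46, 38, 8)  # Table 7.4
types = (15, 22, 27, 8)   # Table 7.5

p_tokens = fisher_exact_two_sided(*tokens)
p_types = fisher_exact_two_sided(*types)
print(f"tokens: p = {p_tokens:.6f}")
print(f"types:  p = {p_types:.4f}")
```

Under these assumptions the token comparison should fall below the 0.0001 threshold and the type comparison should be of the same order as the reported value.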
Table 7.4 Colligation and non-colligation NN errors in the ST2 and ST6 levels (tokens)
       NN colligation errors   NN non-colligation errors   Total
ST2    34 (43%)                46 (57%)                    80 (100%)
ST6    38 (83%)                8 (17%)                     46 (100%)
Note: p < 0.0001 ****, extremely significant
Table 7.5 Colligation and non-colligation NN errors in the ST2 and ST6 levels (types)
       NN colligation errors   NN non-colligation errors   Total
ST2    15 (41%)                22 (59%)                    37 (100%)
ST6    27 (77%)                8 (23%)                     35 (100%)
Note: p = 0.0020 **, very significant
Besides the NN colligation errors, there are also a few cases where the entire
expression does not make sense in English and should be replaced by a single noun or
a totally different expression, namely *smile sound for laughter, *hill-medicine for
yam, *football door for goal, *book table for desk, *mother school for alma mater,
*warning clock for alarm bell, *psychology doctor for psychiatrist and *song words
for lyrics. Such unacceptable expressions are direct word-by-word renderings of the
Chinese characters into English. This may be a strategy adopted by L2 learners: they
turn to the direct translation of the Chinese expression (e.g. shanyao) when they
have not acquired the English word (yam).
With the colligation errors and errors of the entire expression excluded, the
remaining 16 NN combinations in the ST2 database are erroneous noun + noun
collocations. The collocation errors produced by ST2 learners are: *artist festival
(art festival), *basketball ground (basketball court), *football court (football
ground), *football movement (football games), *hand master (head master), *hand
teacher (head teacher), *heart illness (heart disease), *homehold duties (household
duties), *saw materials (raw materials), *scientist book (science book), *speech
match (speech contest), *pity girl (poor girl), *middle night (mid-night), *singers
match (singing match), *end game (final game) and *beauty match (beauty contest).
The 6 NN collocation errors in the ST6 database are: *feminine movement (feminist
movement), *feminism women (feminist women), *graduation certification (graduation
certificate), *life standard (living standard), *mountain slides (land slides) and
*song star (music star). The overall numbers of well-formed and erroneous NN
collocations are presented in Tables 7.6 (for tokens) and 7.7 (for types).
As Tables 7.6 and 7.7 show, the proportions of erroneous noun + noun collocations
are very low: in terms of types, 7% for ST2 and 2% for ST6 learners. Statistical
analyses showed that higher level learners made very significantly fewer errors than
the lower group in the use of NN collocations. This indicates a better command of NN
collocations as learners’ proficiency rises. This finding parallels the pattern
observed for AN collocations, i.e. performance improves with the rise of L2
proficiency.
Table 7.6 Noun + noun collocations in the ST2 and ST6 levels (tokens)
Learners   Well-formed coll.   Erroneous coll.   Total
ST2        618 (95%)           31 (5%)           649 (100%)
ST6        792 (99.2%)         6 (0.8%)          798 (100%)
Note: χ² = 21.68, p < 0.0001 ****, extremely significant
Table 7.7 Noun + noun collocations in the ST2 and ST6 levels (types)
Learners   Well-formed coll.   Erroneous coll.   Total
ST2        202 (93%)           16 (7%)           218 (100%)
ST6        307 (98%)           6 (2%)            313 (100%)
Note: χ² = 8.20, p = 0.00425 **, very significant
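The test statistics reported for Tables 7.6 and 7.7 appear to be consistent with a 2 × 2 chi-square test with Yates’ continuity correction. This is an inference from the figures rather than something the text states, so the sketch below should be read as one plausible reconstruction. The function names are illustrative, and the p-value uses the standard identity for one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).

```python
from math import erfc, sqrt


def chi2_yates(a, b, c, d):
    """Chi-square statistic with Yates' correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Rows: ST2, ST6; columns: well-formed, erroneous NN collocations.
chi2_tokens = chi2_yates(618, 31, 792, 6)  # Table 7.6
chi2_types = chi2_yates(202, 16, 307, 6)   # Table 7.7

print(f"tokens: chi2 = {chi2_tokens:.2f}, p = {p_value_df1(chi2_tokens):.2e}")
print(f"types:  chi2 = {chi2_types:.2f}, p = {p_value_df1(chi2_types):.4f}")
```

Dropping the `- n / 2` term gives the uncorrected Pearson statistic, which is somewhat larger for both tables.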
This chapter presents and compares the results obtained from the analyses of
Chinese learners’ verb + noun, adjective + noun and noun + noun collocations.
Section 8.1 compares erroneous collocations across the three types of collocations;
Sect. 8.2 analyses and compares the overall growth of verbs, adjectives and nouns
and relates it to the production of collocations; Sect. 8.3 analyses in detail the
synset density of the three word classes and offers interpretations of the differing
performances on the VN, AN and NN collocations.
Table 8.1 presents the overall percentages of collocation errors in the production
of verb + noun, adjective + noun and noun + noun collocations by Chinese EFL
learners at the basic and advanced levels.
A comparison of the ratios of erroneous collocations among the three types of
collocations at each proficiency level (e.g. in the ST2 level, 22% for VN, 12% for
AN and 7% for NN) shows that Chinese L2 learners, irrespective of their proficiency,
performed best in NN collocations, followed by AN collocations, and performed worst
in VN collocations. A cross-group comparison of the ratios demonstrates a varied
developmental pattern, i.e. no clear decrease in collocation errors with the rise of
proficiency in the production of VN collocations, but a decrease in errors in the
production of AN and NN collocations. The statistical tests support this result: no
significant relationship was found between learner levels and erroneous VN
collocations (cf. Sect. 5.1.3), suggesting that there is no significant decrease in
VN collocation errors. However, there were very significantly fewer AN and NN
collocation errors at the ST6 level than at the ST2 level, suggesting an improvement
in the acquisition of these two collocation types.
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 123
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_8
124 8 Comparison and Interpretation of Learners’ Performance on …
Table 8.1 Error ratios of VN, AN, and NN collocations produced by ST2 and ST6 learners
Learners Verb + noun coll. (%) Adjective + noun coll. (%) Noun + noun coll. (%)
ST2 22 12 7
ST6 21 4 2
This claim holds water in that many noun + noun and adjective + noun collocations
in our databases are concrete lexical co-occurrences that can be observed, e.g.
air conditioner, bus stop, football ground, blue sky, full moon and heavy rain.
However, the observable property of certain collocations does not guarantee easy
acquisition, because lexical co-occurrence is arbitrary. For example, heavy rain is
concrete and observable, but this does not rule out possible combinations such as
*dense rain, *strong rain or *powerful rain. Learners were found to make errors with
concrete NN collocations as well, such as *basketball ground, *mountain slides and
*light river. At the same time, many abstract AN and NN collocations were correctly
produced, e.g. environment protection, labour force, sales volume, cheap trick and
deep sorrow. So the observable feature of adjective + noun and noun + noun
collocations as proposed by Philip (2007) cannot account for learners’ relative ease
with these two types of collocations.
Another possible explanation of the relative difficulty in acquiring VN
collocations and the relative ease in acquiring AN and NN collocations is based on
the observation that words of different parts of speech differ in their tendency to
cluster, i.e. singular nouns and base forms of verbs are highly collocational while
adjectives and adverbs are not (Kjellmer 1990). This could be interpreted to mean
that verbs and nouns are more collocational and therefore impose a greater learning
burden on foreign language learners. In other words, the collocational density of
verb + noun collocations poses more problems for learners than that of AN and NN
collocations. This could account for the overall learning burden of VN collocations,
but to account for the different performance on the three types of collocations, we
argue for the vocabulary growth factor and predict that the synonym densities of
adjectives and nouns are lower than those of verbs, thus resulting in better
performance in the AN and NN collocations and worse performance in VN collocations.
We shall discuss this in Sects. 8.2 and 8.3.
8.2 Vocabulary Growth and Collocation Errors 125
[Figure omitted; plotted series: Verbs, VN coll. error, Adjectives, AN coll. error, Nouns,
NN coll. error]
Fig. 8.1 Overall growth rates of the verbs, adjectives and nouns and collocation errors
8.3 Synsets and Collocation Production
The classification of the adjectives and nouns in the databases into synonym sets
was performed with reference to WordNet and the ODSA. These classifications were
much more difficult than the classification of verbs into synsets, since adjectives
and nouns were more diversified than verbs in the collocation databases. Starting
from the highest level, where more adjectives are produced in AN collocations,
adjectives listed in either of the reference sources as synonyms were recorded as
one synset. Using the same reference works as for verbs, only seven synsets, each
with a rather limited number of adjectives, could be identified. They are adjectives
describing broadness (broad, full, wide), adjectives denoting keenness (keen,
sharp), “deadly” adjectives (deadly, fatal, lethal), “clean” adjectives (clean,
clear, light), “distant” adjectives (distant, remote), “dense” adjectives (dense,
heavy) and “daily” adjectives (daily, everyday). Adjectives in the ST2 collocation
database are more diversified, and only the “daily” adjectives (daily, everyday)
were identified.
The difficulty in grouping adjectives into synsets may be attributed to the types
of adjectives that were frequently used by learners, e.g. classifying adjectives
(e.g. academic, annual).¹ A large number of adjectives used by ST2 and ST6
learners in AN collocations are of a classifying nature (e.g. among the 142 types of
adjectives in the ST6 AN collocation database, 70 are classifying ones, cf.
Appendix G). Classifying adjectives, as their name suggests, function to group
nouns into different categories, so these adjectives themselves are too broad in
scope to be categorised in synsets. For example, capitalist, chemical, domestic and
solar have only a few synonyms as referenced in WordNet.
Just as it is difficult to classify adjectives into different synsets, the nouns
produced in the NN collocations, such as air, alarm and art, are also difficult to
group. So, in general, the synonym density of adjectives and nouns in the databases
is lower than that of verbs, which may be the reason why AN and NN collocations were
more accurate than VN collocations. Studies of the synsets of verbs, adjectives and
nouns have confirmed such a decrease in synonym density. According to the statistics
published online for the WordNet 3.0 database, the ratios of the number of synsets
to the total number of words are 1.19 for verbs, 0.85 for adjectives and 0.70 for
nouns (see Table 8.3 for the raw statistics cited from WordNet).
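The three ratios can be reproduced from the per-class counts of unique strings and synsets. The figures in the sketch below are those listed in the WordNet 3.0 statistics documentation and are assumed here to correspond to the raw statistics of Table 8.3, which is not reproduced in this excerpt; the dictionary and function names are illustrative.

```python
# Unique strings and synsets per part of speech, as listed in the
# WordNet 3.0 statistics documentation (assumed to match Table 8.3).
WORDNET_3_0 = {
    "verbs": {"unique_strings": 11529, "synsets": 13767},
    "adjectives": {"unique_strings": 21479, "synsets": 18156},
    "nouns": {"unique_strings": 117798, "synsets": 82115},
}


def synset_ratio(counts):
    """Number of synsets per unique word form for one part of speech."""
    return counts["synsets"] / counts["unique_strings"]


ratios = {pos: round(synset_ratio(c), 2) for pos, c in WORDNET_3_0.items()}
print(ratios)
```

Note that nouns have by far the largest absolute number of synsets; it is only relative to the number of word forms that the ordering verbs > adjectives > nouns emerges.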
Therefore, based on the statistics from WordNet, in general, there are more synsets
per word for verbs than for adjectives, and more for adjectives than for nouns. This
decreasing synonym density across the three word classes was also verified through
computational analysis of WordNet (cf. Kamps et al. 2004; Tufis and Stefanescu
2011). Kamps et al. (2004: 1116) drew graphs by collecting all words in WordNet and
relating words that can be synonymous, and observed a giant component in each: in
the verb-subgraph there is a component of size 6365 (or 57% of all verbs); in the
adjective-subgraph there is a component of size 5427 (or 25% of all adjectives); and
in the noun-subgraph there is a connected component of size 10,922 (or 10% of all
nouns). These figures show that proportionally more verbs than adjectives, and more
adjectives than nouns, are related in synsets.
¹ Adjectives are classified into 4 categories (Sinclair and Fox 1990: 63f): qualitative
adjectives (which identify qualities someone or something has, e.g. happy and
intelligent), classifying adjectives (which identify someone or something as a member
of a class, e.g. financial and intellectual), colour adjectives (identifying the colour
of something, e.g. blue and green) and emphasising adjectives (which are used to
emphasise feelings, e.g. complete and absolute).
² See also the statistics on the WordNet website:
http://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html#toc2 (Accessed 8 April 2014).
³ For the convenience of checking the synonyms of the verb in question, WordNet (2.1) was
referenced instead of the web interface, as in the software different senses of the verb
are numbered, and synonyms are neatly listed.
Sense 1
answer, reply, respond -- (reply or respond to; “She didn’t want to answer.”; “answer
the question”; “We answered that we would accept the invitation.”)
=> state, say, tell -- (express in words; “He said that he wanted to marry her.”; “Tell me
what is bothering you.”; “state your opinion”; “state your name”)
Sense 2
answer -- (give the correct answer or solution to; “answer a question”; “answer the riddle”)
=> solve, work out, figure out, puzzle out, lick, work -- (find the solution to (a problem or
question) or understand the meaning of; “Did you solve the problem?”; “Work out your
problems with the boss.”; “This unpleasant situation isn’t going to work itself out.”; “Did
you get it?”; “Did you get my meaning?”; “He could not work the math problem.”)
Sense 3
answer -- (respond to a signal; “answer the door”; “answer the telephone”)
=> react, respond -- (show a response or a reaction to something)
Sense 4
answer, resolve -- (understand the meaning of; “The question concerning the meaning of
life cannot be answered.”)
=> solve, work out, figure out, puzzle out, lick, work -- (find the solution to (a problem or
question) or understand the meaning of; “did you solve the problem?”; “Work out your
problems with the boss.”; “this unpleasant situation isn’t going to work itself out.”; “did
you get it?”; “Did you get my meaning?”; “He could not work the math problem.”)
Sense 5
answer -- (give a defence or refutation of (a charge) or in (an argument); “The defendant
answered to all the charges of the prosecution.”)
=> refute, rebut -- (overthrow by argument, evidence, or proof; “The speaker refuted his
opponent’s arguments.”)
Sense 6
answer -- (be liable or accountable; “She must answer for her actions.”)
=> be -- (have the quality of being; (copula, used with an adjective or a predicate noun);
“John is rich.”; “This is not a good answer.”)
Sense 7
suffice, do, answer, serve -- (be sufficient; be adequate, either in quality or quantity; “A few
words would answer.”; “This car suits my purpose well.”; “Will $100 do?”; “A ‘B’ grade
doesn’t suffice to get me into medical school.”; “Nothing else will serve.”)
=> satisfy, fulfill, fulfil, live up to -- (fulfil the requirements or expectations of)
Sense 8
answer -- (match or correspond; “The drawing of the suspect answers to the description the
victim gave”)
=> match, fit, correspond, check, jibe, gibe, tally, agree -- (be compatible, similar or
consistent; coincide in their characteristics; “The two stories don’t agree in many details.”;
“The handwriting checks with the signature on the check.”; “The suspect’s fingerprints
don’t match those on the gun.”)
Sense 9
answer -- (be satisfactory for; meet the requirements of or serve the purpose of; “This may
answer her needs.”)
=> meet, satisfy, fill, fulfill, fulfil -- (fill or meet a want or need)
Sense 10
answer -- (react to a stimulus or command; “The steering of my new car answers to the
slightest touch.”)
=> react, respond -- (show a response or a reaction to something)
The 60 words and their numbers of synonyms are presented in Table 8.4.
Table 8.4 Selected words in the learner databases and the number of synonyms
Verbs       No. of syns.   Adjectives   No. of syns.   Nouns        No. of syns.
Answer      9              Blue         2              Ball         0
Break       16             Botanical    1              Break        4
Catch       5              Capitalist   1              Center       0
Comb        6              Classical    1              Colour       0
Create      2              Common       4              Diamond      0
Discharge   8              Crisp        6              Fashion      3
Earn        9              Deep         6              Gambling     0
Follow      3              Double       2              Head         2
Grasp       7              Fair         7              Lab          0
Kick        1              Firm         2              Light        0
Lead        2              Founding     0              Name         0
Obey        3              Glib         1              Party        0
Pass        4              Happy        9              Police       0
Play        6              Historic     2              Program      1
Remember    6              Living       0              Restaurant   0
See         7              Low          7              Sentence     0
Show        4              Natural      2              Steel        0
Sow         9              Political    0              Telephone    1
Teach       3              Public       1              Trip         2
Wear        3              Strong       3              World        0
Total       113            Total        57             Total        13
Note: ‘No. of syns.’ is short for the number of synonyms
As Table 8.4 reveals, the synonyms for verbs (113) far outnumber those for
adjectives (57), and the synonyms for adjectives outnumber those for nouns (13).
The result confirms the findings from computational studies of synsets in WordNet,
viz. the synonym density for verbs, adjectives and nouns is on a decreasing scale.
In the light of the fact that verbs have more synsets than adjectives and adjectives
have more synsets than nouns, we get a better understanding of why L2 learners
perform worse on verb + noun collocations and better on adjective + noun and
noun + noun collocations. Since there are more synonyms for verbs, the more verbs in
a synset learners acquire, the more likely they are to make collocation errors. In
contrast, the smaller number of synonyms for adjectives and nouns may explain why
learners seldom made errors in choosing the right collocates, although they were
confused by the grammatical forms of words and made colligation errors in NN
collocations.
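The column totals in Table 8.4 can be checked, and the contrast between the three word classes made more concrete, with a short tally. The sketch below simply re-enters the counts from the table in row order; the per-word means and the counts of words without any listed synonym are derived here for illustration and are not reported in the study.

```python
from statistics import mean

# Synonym counts per word, copied from Table 8.4 in row order.
SYNONYM_COUNTS = {
    "verbs": [9, 16, 5, 6, 2, 8, 9, 3, 7, 1, 2, 3, 4, 6, 6, 7, 4, 9, 3, 3],
    "adjectives": [2, 1, 1, 1, 4, 6, 6, 2, 7, 2, 0, 1, 9, 2, 0, 7, 2, 0, 1, 3],
    "nouns": [0, 4, 0, 0, 0, 3, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 0],
}

summary = {
    pos: {
        "total": sum(counts),
        "mean_per_word": round(mean(counts), 2),
        "words_without_synonyms": sum(1 for n in counts if n == 0),
    }
    for pos, counts in SYNONYM_COUNTS.items()
}

for pos, stats in summary.items():
    print(pos, stats)
```

With 20 words per class, the totals of 113, 57 and 13 correspond to roughly 5.7, 2.9 and 0.7 synonyms per word.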
verbs, adjectives and nouns. Computational analyses of the synsets of the three word
classes in WordNet, together with a case study, revealed that verbs generally have
more synonyms than adjectives, while adjectives have more synonyms than nouns.
Therefore, the better performance on AN and NN collocations can be accounted for by
the lower density of synonyms. In this regard, the analyses of AN and NN
collocations in this chapter consolidate the prediction that vocabulary growth is an
inhibiting factor in collocation acquisition. To be more specific, for word classes
where there is little increase within a synonym set, collocation errors are seldom
made (as for adjectives and nouns in AN and NN collocations); where there are
increases in words within synsets, the chances of errors subsequently increase (as
for verbs in VN collocations).
References
Kamps, J., Marx, M., Mokken, R. J., et al. (2004). Using WordNet to measure semantic
orientation of adjectives. In Proceedings of LREC-04, 4th International Conference on
Language Resources and Evaluation (Vol. 4, pp. 1115–1118).
Kjellmer, G. (1990). Patterns of collocability. In J. Aarts & W. Meijs (Eds.), Theory and
practice in corpus linguistics (pp. 163–178). Amsterdam: Rodopi.
Philip, G. (2007). Decomposition and delexicalisation in learners’ collocational (mis)behaviour. In
Online Proceedings of Corpus Linguistics, [2014-01-12].
http://ucrel.lancs.ac.uk/publications/cl2007/paper/170_Paper.pdf.
Sinclair, J., & Fox, G. (1990). Collins COBUILD English grammar. London: Collins.
Tufis, D., & Stefanescu, D. (2011). An Osgoodian perspective on WordNet. In Speech Technology
and Human-Computer Dialogue (SpeD), 2011 6th Conference on (pp. 1–8). IEEE.
Chapter 9
The Role of L1 in Collocation Learning
As already reviewed in Sect. 3.2.2, the L1 of the learner plays a considerable role in
the learning and production of L2 collocations. On the one hand, previous empirical
studies of L2 learners’ collocation performance show that L1-influenced errors make
up a large proportion of errors even at advanced levels (Laufer and Waldman 2011;
Nesselhauf 2005). For example, Laufer and Waldman (2011) reported that over 60% of
the verb + noun collocation errors produced by intermediate and advanced learners
were L1-induced, and this proportion was not found to decrease over time. A heavy
reliance on the mother tongue in collocation production is also manifested through
the overuse of certain collocations that are linked to lexical combinations in the
L1 and the underuse of collocations that are mismatched between the two languages
(e.g. Granger 1998a, b; Kaszubski 2000).
On the other hand, research exploring the psychological reality of L2 collocation
learning in terms of L1 and L2 congruence and non-congruence found that L2
learners perform better on congruent collocations than on non-congruent ones in
lexical decision tasks. It is also suggested that non-congruent collocations stored
in memory are processed autonomously without word-by-word mediation of the L1.
However, there is still a paucity of investigation into whether L2 learners produce
congruent collocations with more accuracy than non-congruent ones, and whether
non-congruent collocations, once learnt, are less susceptible to errors. Therefore,
we set out in this chapter to investigate the role of the L1 in L2 collocation
learning in terms of the production of congruent and non-congruent collocations.
This study thus provides a point of comparison with existing research on the
influence of the L1 lexical network on L2 collocation learning. In addition, little
research evidence has been provided with regard to the role of the learners’ mother
tongue in different types of collocations, e.g. verb + noun and adjective + noun
This chapter has been developed into a paper titled Cross-linguistic Influence on the
Production of L2 Collocations: A Corpus-based Study of Chinese EFL Learners’ Collocation
Learning. The paper was presented at the 9th Newcastle upon Tyne Postgraduate Conference
in Linguistics, 4 April 2014.
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 133
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6_9
¹ Grammatical words usually behave differently across languages. That is especially true of
Chinese, which has fewer words functioning as prepositions and uses them less frequently
than English does. Chinese does not have articles (Cross and Papp 2008: 68; Lü 2002). So
grammatical words are disregarded, and only the content words (verbs and nouns) in our
database were considered.
Table 9.1 Well-formed and erroneous congruent and non-congruent collocations in the ST2
(tokens)
Types               Congruent coll. (%)   Non-congruent coll. (%)   Total
Well-formed coll.   727 (88.7)            740 (97.5)                1467
Erroneous coll.     93 (11.3)             19 (2.5)                  112
Total               820 (100)             759 (100)                 1579
Note: χ² = 45.39, p < 0.0001 ****, extremely significant
Table 9.2 Well-formed and erroneous congruent and non-congruent collocations in the ST2
(types)
Types               Congruent coll. (%)   Non-congruent coll. (%)   Total
Well-formed coll.   117 (69.6)            104 (88.9)                221
Erroneous coll.     51 (30.4)             13 (11.1)                 64
Total               168 (100)             117 (100)                 285
Note: χ² = 13.59, p = 0.0002 ***, extremely significant
Table 9.3 Well-formed and erroneous congruent and non-congruent collocations in the ST6
(tokens)
Types               Congruent coll. (%)   Non-congruent coll. (%)   Total
Well-formed coll.   1047 (88.1)           618 (96.4)                1665
Erroneous coll.     141 (11.9)            23 (3.6)                  164
Total               1188 (100)            641 (100)                 1829
Note: χ² = 33.97, p < 0.0001 ****, extremely significant
9.2 Within-group Comparison of Well-formed and Erroneous Congruent … 139
The statistical analyses of collocations produced by ST2 and ST6 learners show
that congruent collocations pose more difficulties than non-congruent ones. The
greater difficulty with congruent collocations seems to contradict findings from
the psycholinguistic experiments conducted by Yamashita and Jiang (2010) and Wolter
and Gyllstad (2011), in which highly proficient language learners both processed
L1–L2 collocations (i.e. congruent collocations) with faster reaction times than
L2-only collocations (i.e. non-congruent collocations) and recognised the former
with higher receptive scores. On this account, L2 collocational links in the mental
lexicon of L2 learners are likely to be mediated by their L1, and thus congruent
collocations gain more legitimacy in the mental lexicon. This is perhaps true in the
sense of collocational storage and recognition, but a different picture emerges when
this link is activated in the production process.
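The size of the gap between congruent and non-congruent collocations can be read directly off the counts in Tables 9.1 to 9.3. The following sketch recomputes the error rates as percentages; the dictionary layout and the function name error_rate are illustrative choices rather than part of the study.

```python
# (erroneous, total) counts for each collocation category, from Tables 9.1 to 9.3.
TABLES = {
    "ST2 tokens (Table 9.1)": {"congruent": (93, 820), "non-congruent": (19, 759)},
    "ST2 types (Table 9.2)": {"congruent": (51, 168), "non-congruent": (13, 117)},
    "ST6 tokens (Table 9.3)": {"congruent": (141, 1188), "non-congruent": (23, 641)},
}


def error_rate(erroneous, total):
    """Percentage of collocations in a category that are erroneous."""
    return 100 * erroneous / total


rates = {
    label: {kind: round(error_rate(*cell), 1) for kind, cell in cells.items()}
    for label, cells in TABLES.items()
}

for label, by_kind in rates.items():
    print(f"{label}: congruent {by_kind['congruent']}%, "
          f"non-congruent {by_kind['non-congruent']}%")
```

In all three tables the error rate for congruent collocations is several times that for non-congruent ones, which is the pattern the significance tests confirm.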
A detailed analysis was further performed on erroneous congruent collocations,
aiming at discovering why congruent collocations seem to pose more difficulties for
Chinese learners of English. Erroneous collocations in the ST2 learner group were
identified. A detailed investigation into the 93 instances of erroneous congruent
collocations in ST2 revealed that a large proportion of the errors were a result of
‘partial congruence’ between the two languages. Congruent collocations make it easier
for L2 learners to locate the appropriate verb (e.g. answer) for the preselected
noun (e.g. question) through direct rendering from their mother tongue. In the
case of answer + question, the English verb answer is a one-to-one match with the
Chinese verb huida (although reply to and respond to are synonyms of answer in
this sense and also collocate with question, they are not believed to be an exact
match for huida and are instead rendered in Chinese as huifu). However, the
problem for L2 learners is that
one-to-one correspondence in languages is not prevalent. As shown in the errors in
Table 9.4 below, there exists ‘differentiation’ between Chinese and English,
meaning that the native language has one form, whereas the target language has two
or more forms (Gass and Selinker 2008: 100). For example, there is a one-to-many
correspondence in the following forms and subsequently errors are induced by such
mismatches.
Additionally, another notable type of congruent collocation error can be attrib-
uted to ‘coalescing’, referring to the opposite of ‘differentiation’ where the native
language has more than one form corresponding to only one form of the target
language (Gass and Selinker 2008: 100f). Unlike errors attributable to differentia-
tion, where learners seem to have difficulties choosing the right form from several
possible forms in the L2, in cases of coalescing, what happens is that L2 learners
have to know that among several expressions in their native language (e.g. huode
zhishi, literally acquire knowledge, and xuexi zhishi, literally *learn knowledge),
only one expression (here huode zhishi) corresponds to the L2 expression (here
acquire knowledge). Examples
of errors attributable to coalescing are shown in Table 9.5.
In Table 9.5, there are many-to-one correspondences between Chinese and
English, and only the first sequence in the left column corresponds correctly
to the sequence in English. For instance, the concept of acquire knowledge
can be expressed in Chinese in at least three forms, rendered literally as acquire
knowledge/*earn knowledge, *learn knowledge and *grasp knowledge. *Learn
knowledge has been commonly produced by Chinese learners, and negative transfer
thus occurs. *Learn knowledge and *grasp knowledge, together with *teach
knowledge, correspond to the commonest Chinese expressions for the concepts of
acquire knowledge and impart knowledge.
It can be seen that 66.7% ((26 + 36)/93 × 100) of the erroneous congruent
collocations are caused by a partial congruence (or in Nesselhauf’s (2005) termi-
nology, ‘partial non-congruence’), where one-to-many or many-to-one correspon-
dences occur in the native and target languages and only one equivalent of several
expressions that can be used in one language is acceptable in another. These two
factors are found to be responsible for the susceptibility to errors for congruent
collocations. Just as Farghal and Obeidat (1995: 323) pointed out, reliance on L1
does not “always result in positive transfer since the one-to-one correspondence
hypothesis holds in only few cases”. L2 learners’ reliance on the L1 therefore commonly
leads to negative transfer.
Tables 9.6 and 9.7 below present the results of well-formed congruent and
non-congruent collocation tokens and erroneous ones produced by ST2 and ST6
learners (for types, see Appendices I and J).
Likewise, statistical analyses were conducted on the data in both tables above.
Two different results were obtained: a significant relationship between well-formed
non-congruent collocations and learner groups was found, whilst for erroneous
non-congruent collocations, no statistical significance was observed. Significantly
more non-congruent collocations were correctly used by ST2 learners than by the
ST6 group (both in terms of tokens and types). As is shown in Fig. 9.1, there is an
2 See also the paraphrasing strategy employed by L2 learners in Bahns and Eldaw (1993) and
Farghal and Obeidat (1995).
3 The two non-congruent collocations that were correctly used at the ST2 level but wrongly used at
the ST6 level are *make (draw) + conclusion and *put (pay) + attention, both of which are due to
transfer from Chinese.
4 Nesselhauf (2005: 222) distinguishes two types of non-congruence, lexical and non-lexical.
The former corresponds to the sense used in the present study, whilst the latter
refers specifically to elements other than lexical words (e.g., prepositions) that are not congruent
between languages.
Adjective + noun collocations produced by ST2 and ST6 learners were classified
into congruent ones and non-congruent ones following the same procedure as for
verb + noun collocations. Examples of congruent AN collocations are: active part,
absolute truth, bad luck, blue sky, etc. Non-congruent collocations include active
volcano, narrow escape, promissory note, heavy smoke, etc. In this section, the role
of the L1 in the two types of collocations (VN and AN collocations) is examined in
order to see whether its influence is proportionate in different word-class
collocations.
It is assumed that congruent collocations are correctly used owing to L1 positive
transfer, though this assumption is somewhat arbitrary and speculative, since within
the learners’ “black box”, it is not clear whether congruent collocations are stored
and produced wholly without mediation through their L1. However, one may
justify this assumption, on the basis that learners take less reaction time and make
fewer errors in responding to congruent collocations than non-congruent ones,
which suggests that the former are stored in the learners’ mental lexicon via L1
mediation (cf. Wolter and Gyllstad 2011; Yamashita and Jiang 2010). So the col-
locations which are a result of positive transfer are collocations tagged as (C, W),
and negative transfers are collocations tagged as (I, N) and (C, N).
L1 influence was first measured in the VN collocations produced by ST2 and
ST6 learners. The overall number of L1-influenced collocations among the VN
collocations produced by ST2 learners was 801 (calculated as the total number of
collocations tagged as (C, W), (I, N) and (C, N)), which makes up 51% of all the
collocations (1,578 tokens). Similarly, 66% of the collocations at the ST6 level
were either positively or negatively influenced by Chinese. In general, over 50%
of the collocations produced by Chinese learners may be traced to the influence of
their L1. A comparable figure was reported by Wang (2011), in whose investigation
of Chinese college students’ acquisition of English light verb + noun collocations
61.84% of the subjects’ L2 light verb + noun collocations were positively or
negatively transferred from Chinese.
Next, an analysis was performed on the negative transfer of VN collocations
between ST2 and ST6 learners (see Table 9.8 below for a numeric presentation). It
shows a decrease in the proportion of transfer errors at the ST6 level (from 66 to 29%).
In addition, according to statistical analysis, ST6 learners made significantly more
non-transfer errors than ST2 learners, which suggests a weakening L1 influence on the
production of L2 collocations with learners’ rising proficiency. Yet the influence of the
L1 is still strong, as nearly one-third of the errors at the advanced level are L1-induced.
Table 9.8 Transfer and non-transfer VN collocation errors produced by ST2 and ST6 learners
Types ST2 (%) ST6 (%) Total
Transfer errors 74 (66) 48 (29) 122
Non-transfer errors 38 (34) 116 (71) 154
Total 112 (100) 164 (100) 276
Note p < 0.0001 **** extremely Sig.
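The column percentages in Table 9.8 and the order of magnitude of the reported p-value can be checked from the raw counts. The note to the table does not say which test was used, so the sketch below substitutes a two-proportion z-test (equivalent to an uncorrected 2 × 2 chi-square test) purely as an illustration. All variable names are hypothetical.

```python
import math

# Error counts from Table 9.8 (VN collocations).
transfer = {"ST2": 74, "ST6": 48}
non_transfer = {"ST2": 38, "ST6": 116}

totals = {g: transfer[g] + non_transfer[g] for g in transfer}
# Share of transfer errors among all errors in each group, in per cent.
transfer_share = {g: 100 * transfer[g] / totals[g] for g in transfer}

# Two-proportion z-test: an assumed stand-in for the unnamed test in the text.
p1 = transfer["ST2"] / totals["ST2"]
p2 = transfer["ST6"] / totals["ST6"]
pooled = (transfer["ST2"] + transfer["ST6"]) / (totals["ST2"] + totals["ST6"])
se = math.sqrt(pooled * (1 - pooled) * (1 / totals["ST2"] + 1 / totals["ST6"]))
z = (p1 - p2) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

for g in ("ST2", "ST6"):
    print(f"{g}: {transfer_share[g]:.0f}% transfer errors out of {totals[g]}")
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

The shares round to 66% and 29%, as in the table, and the p-value from this stand-in test falls well below 0.0001.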
For AN collocations in the two groups of learners, the L1 influence was found to
be larger than in VN collocations: 95% for ST2 learners and 96% for the ST6
group. Compared with the percentages obtained above in VN collocations, the L1
seems to play a bigger role in AN collocations. In the following analysis, we will
investigate whether its role is allocated proportionately in positive and negative
transfer in the two types of collocation.
Tables 9.9 and 9.10 present, respectively, the numbers of VN and AN collo-
cations due to positive transfer and negative transfer in the two groups of learners
(for types, see Appendices K and L).
The two tables reveal that in both groups there is significantly more positive
transfer in AN collocations and more negative transfer in VN collocations, irre-
spective of the tokens and types examined. That there is more negative transfer in
VN collocations than AN collocations produced by L2 learners has also been found
by Parastuti et al. (2009). They investigated the collocations used by Indonesian
learners of English and reported that among all the negative transfer errors,
the percentage for verb (creation + activation) + noun collocations was the
largest (54.24%), followed by adjective + noun collocations (18.64%).
That AN collocations are less error-prone may be because they are an early
acquired type of collocation, whereas VN collocations are found to be the most
difficult collocations for L2 learners to acquire (Gitsaki 1999). So it may be the
relative ease with AN collocations that enables a reduction in the negative L1
transfer, while the relative difficulty with VN collocations increases the possibility
of L1 interference. Another explanation may relate to the degree of congruence
The findings are summarised with regard to the three hypotheses proposed earlier in
this chapter, namely:
Hypothesis 1. L2 learners perform better in congruent collocations than
non-congruent collocations.
Hypothesis 2. Non-congruent collocations that are correctly used by learners at
lower levels are not wrongly used by learners at higher levels.
Hypothesis 3. The L1 plays a different role in verb + noun and adjective + noun
collocations.
For Hypothesis 1, we found that there were more congruent collocations than
non-congruent collocations that were correctly used by both groups in either tokens
or types (except for the tokens in the ST2). However, there were significantly more
errors with congruent collocations than non-congruent collocations. So this
hypothesis is rejected in light of the data showing that congruent collocations
actually posed more difficulties than non-congruent ones. Further detailed analysis
was performed on erroneous congruent collocations so as to locate the factors
inhibiting the correct production of congruent collocations. It was found that a large
proportion of the errors can be attributed to ‘partial congruence’ between the learners’
mother tongue and English, in the forms of ‘differentiation’ and ‘coalescing’.
With regard to Hypothesis 2, between-group comparisons on the well-formed
and erroneous uses of congruent and non-congruent verb + noun collocations were
conducted. This hypothesis is upheld: non-congruent collocations that were
correctly produced by learners at lower levels were seldom wrongly used by
learners at higher levels.
Concerning Hypothesis 3, within-group comparisons of positive and negative L1
influence with verb + noun and adjective + noun collocations were carried out.
Statistical analysis revealed that there was significantly more positive transfer in
AN collocations and more negative transfer in VN collocations, irrespective of
learner types. That indicates that the L1 plays a different role in word-class-specific
collocations.
In conclusion, our findings regarding the cross-linguistic influence in the
learning and production of L2 collocations hold significant implications which can
be explored in connection with previous SLA theory. Discussion of these impli-
cations will be presented in Chap. 10.
References
Altenberg, B., & Granger, S. (2002). Recent trends in cross-linguistic lexical studies. In B.
Altenberg & S. Granger (Eds.), Lexis in contrast: Corpus-based approaches (pp. 3–48).
Amsterdam: Benjamins.
Bahns, J. (1993). Lexical collocations: A contrastive view. ELT Journal, 47(1), 56–63.
Biskup, D. (1992). L1 influence on learners’ renderings of English collocations: A Polish/German
empirical study. In P. Arnaud & H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 85–
93). London: Macmillan.
Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than
nonformulaic language by native and nonnative speakers? Applied Linguistics, 29(1), 72–89.
Cross, J., & Papp, S. (2008). Creativity in the use of verb + noun combinations by Chinese
learners of English. In G. Gilquin, S. Papp, & M. B. Diez-Bedmar (Eds.), Linking up
contrastive and learner corpus research (pp. 57–81). Amsterdam: Rodopi.
Dictionary Office, Institute of Linguistics, Chinese Academy of Social Sciences. (2005).
Contemporary Chinese Dictionary. (5th ed.) Beijing: The Commercial Press.
Farghal, M., & Obeidat, H. (1995). Collocations: A neglected variable in EFL. International
Review of Applied Linguistics in Language Teaching, 33(4), 315–331.
Gass, S. M., & Selinker, L. (2008). Second language acquisition: An introductory course. London:
Routledge.
Gitsaki, C. (1999). Second language lexical acquisition: A study of the development of
collocational knowledge. San Francisco: International Scholars Publications.
Granger, S. (1998a). Prefabricated patterns in advanced EFL writing: Collocations and formulae.
In A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 145–160). Oxford:
Oxford University Press.
Granger, S. (1998b). The computer learner corpus: A versatile new source of data for SLA
research. In S. Granger (Ed.), Learner English on computer (pp. 3–18). London: Longman.
James, C. (1996). A cross-linguistic approach to language awareness. Language Awareness, 5(3–
4), 138–148.
Jiang, N., & Nekrasova, T. M. (2007). The processing of formulaic sequences by second language
speakers. Modern Language Journal, 91(3), 433–445.
Johansson, S. (2007). Seeing through multilingual corpora: On the use of corpora in contrastive
studies. Amsterdam: Benjamins.
Kaszubski, P. (2000). Selected aspects of lexicon, phraseology and style in the writing of Polish
advanced learners of English: A contrastive, corpus-based approach[J/OL]. (2011-10-13).
http://main.amu.edu.pl/ przemka/rsearch.html.
Krashen, S., & Scarcella, R. (1978). On routines and patterns in language acquisition and
performance. Language Learning, 28(2), 283–300.
Laufer, B., & Waldman, T. (2011). Verb-Noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Lü, S. X. (2002). Collection of LÜ Shuxiang (Vol. 5): 800 Words in Modern Chinese. Liaoning:
Liaoning Education Press.
Marton, W. (1977). Foreign vocabulary learning as problem No. 1 of language teaching at the
advanced level. Interlanguage Studies Bulletin, 2, 33–57.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Parastuti, A., Said, M., & Wawan, W. (2009). The negative transfers of English collocations
written by the students of Gunadarma University[OL]. (2014-01-11).
http://www.gunadarma.ac.id/library/articles/graduate/letters/2009/Artikel_10604015.pdf.
Philip, G. (2007). Decomposition and delexicalisation in learners’ collocational (mis)behaviour.
Online Proceedings of Corpus Linguistics[OL]. (2014-01-12).
http://ucrel.lancs.ac.uk/publications/cl2007/paper/170_Paper.pdf.
Salkie, R. (2002). Two types of translation equivalence. In B. Altenberg & S. Granger (Eds.), Lexis
in contrast: Corpus-based approaches (pp. 51–71). Amsterdam: Benjamins.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–150.
Wang, D. (2011). Language transfer and the acquisition of English light Verb + Noun collocations
by Chinese learners. Chinese Journal of Applied Linguistics, 11(2), 107–125.
Wang, Y., & Shaw, P. (2008). Transfer and universality: Collocation use in advanced Chinese and
Swedish learner English. ICAME Journal, 32, 201–232.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1
lexical/conceptual knowledge. Applied Linguistics, 27(4), 741–747.
Wolter, B., & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the influence
of L1 intralexical knowledge. Applied Linguistics, 32(4), 430–449.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Yamashita, J., & Jiang, N. (2010). L1 influence on the acquisition of L2 collocations:
Japanese ESL users and EFL learners acquiring English collocations. TESOL Quarterly, 44(4),
647–668.
Chapter 10
Summary and Conclusions
The final chapter begins by summarising the key findings reported in Chaps. 5, 6, 7,
8 and 9. Then theoretical and pedagogical implications for L2 collocation learning
are discussed in light of the findings of this study (Sect. 10.2). The book
concludes by mentioning the limitations of the present study and suggesting ways
forward in further research into L2 learners’ collocation learning (Sect. 10.3).
10.1 Summary
1 For example, 14% of collocation types were used more than 10 times by learners at the lowest level,
making up 67% of all the collocations retrieved.
followed by a sharp increase from the ST5 level to the ST6 level. ST6 learners
produced significantly more lexical verb + noun collocation errors than ST2
learners, indicating poorer performance in lexical verb + noun collocations than
delexical verb + noun collocations. This means that with more lexical verbs learnt,
the chances of these lexical verbs leading to collocation errors increased as well.
When the lexical verbs in both well-formed and erroneous lexical verb + noun
collocations produced by all levels of learners were arranged into synonym sets, it
was found that collocation errors were seldom made where there was no growth in
verb synsets. However, there was an increase in collocation errors in synsets with a
verb increase. As learners proceeded to more advanced levels, the occurrence of
collocation errors was found to become more and more limited to synsets with verb
increases. Verb classes most susceptible to errors were verbs of creation, fulfil
verbs, verbs of obtaining and verbs of putting, where there was a considerable
increase in the number of verbs at the higher level. A marked lag in learners’
knowledge of VN collocations was observed, as more proficient learners produced
the same proportion of errors as learners of lower levels in terms of the synsets
identified. When verbs in these sets were divided into new and old verbs, errors
with new verbs at the ST6 level were significantly more likely to be made than
errors with old verbs. Therefore, we conclude that the increase in verbs in a par-
ticular semantic domain is an inhibiting factor for the learning of collocations: it
was suggested that learners may only have an incomplete command of the
semantics of the new verb, i.e. the basic meaning of that verb is acquired but not the
features that distinguish it from a set of semantically related verbs. This
study suggests that acquisition of verb semantics is important for successful
learning of L2 collocations.
In addition to verb growth as an inhibiting factor in collocation learning, further
analysis was performed to examine whether newly acquired nouns were also a
factor responsible for the lag. Results showed that the percentage of new nouns in
erroneous collocations produced by higher levels was rather low—around 13%, a
figure which remained roughly constant at both ST5 and ST6 levels. That means that
for the majority of newly acquired nouns, VN collocations were target-like. Even where
new nouns were used in erroneous collocations, it was found that this was not
mainly due to a shortfall of new verbs collocating with the newly acquired nouns; in
fact the target verbs may have been acquired (e.g. *stir + consciousness instead of
raise + consciousness, *reflect/cast + prejudice instead of hold + prejudice). So
the occurrence of new nouns is not a factor responsible for the stagnant development
of L2 learners’ collocational knowledge.
This study has also investigated learners’ performance in two other frequent
types of collocations: adjective + noun and noun + noun collocations. Better per-
formance was discovered in the production of adjective + noun and noun + noun
collocations than verb + noun collocations. A comparison of the ratios of erroneous
collocations among the three types of collocations showed that L2 learners, irre-
spective of proficiency level, performed best on noun + noun collocations, fol-
lowed by adjective + noun collocations, and performed worst on verb + noun
collocations. Not only was a better performance observed on AN and NN
collocations produced by the three levels of learners, but there was also a clear
progression overall in collocational knowledge with regard to these two types of
collocations. However, learners’ knowledge of VN collocations lagged, as we saw.
This finding of differing collocation performance depending on category type
has contributed to answering the question raised by Siyanova and Schmitt (2008:
453)—whether other types of L2 collocations (e.g. verb–noun, verb–adverb) would
be produced at a similar level as adjective + noun collocations. The answer to their
question is negative as Chinese L2 learners had much better command of noun +
noun collocations than adjective + noun collocations, and better command of
adjective + noun collocations than verb + noun collocations.
The present study attempted to account for such differing performance in dif-
ferent types of collocations in terms of vocabulary growth within synonym sets.
Classifying verbs into synsets was found to be more natural than classifying adjectives
and nouns in the collocation databases. Through synonym analyses of the words
in WordNet, combined with a study of the synonym density of randomly selected verbs,
adjectives and nouns used by learners, it was discovered that the synonym density of
the three types of words was on a decreasing scale. That semantic property may
account for L2 learners’ better performance in AN and NN collocations and worse
performance in VN collocations. In this regard, the prediction that vocabulary
growth is an inhibiting factor in collocation acquisition is again upheld.
As an important factor that cannot be ignored in L2 acquisition, the role of L1 in
collocation learning was also examined in this study. Contrary to Bahns’s (1993)
claim (cf. Chap. 3) that only collocations which are non-congruent with learners’
L1 collocations need to be taught to learners, we found that congruent collocations
were more prone to errors for Chinese learners of English. That was because cases
of one-to-one correspondence between the two languages are few and partial
congruence is common between the two languages, i.e. differentiation (one-to-many
correspondence) and coalescing (many-to-one correspondence). As for
non-congruent collocations, it was found that once they were acquired, they were
seldom susceptible to errors. L1 was also found to play a different role depending on
the types of collocations, as we observed that there was more negative transfer for
verbs in verb + noun collocations and more positive transfer for adjectives in
adjective + noun collocations.
The main findings of our research are: (a) vocabulary growth was identified as a
factor responsible for the stagnant collocation performance in verb + noun collo-
cations; (b) learners performed differently in verb + noun, adjective + noun and
noun + noun collocations, with verb + noun collocations the most difficult to
acquire, and noun + noun collocations the easiest; (c) learners’ L1 played a different
role depending on the type of collocation, i.e. more negative transfer in
verb + noun collocations and more positive transfer in adjective + noun collocations.
These findings contribute to a more comprehensive understanding of collocation
learning by second language learners. Most importantly, our research has identified
the vocabulary growth factor as an inhibiting force in collocation acquisition, i.e.
the learning of new semantically related verbs in a synset leads to more collocation
errors. In this regard, it goes beyond previous L2 collocation studies by providing
10.2 Implications
Although our data are based on L2 learners’ production, so that inferences
drawn from this study about psycholinguistic aspects of learners’ mental lexicon are
tentative, the findings in terms of the vocabulary growth factor and the role of L1 in
L2 learners’ collocation learning hold theoretical implications for the lexical
organisation in the mental lexicon of L2 learners of English and the cross-linguistic
influence in L2 collocation learning.
The finding that learners misuse semantically similar words in collocation pro-
duction indicates that words are primarily semantically linked in the L2 lexicon.
Most words are stored in the mental lexicon via the establishment of semantic
associations and semantically similar words are stored nearby (e.g. Channell 1988;
Howarth 1998; Wolter 2001; Zareva and Wolter 2012). Language production
involves the selection of appropriate words according to the meaning to be con-
veyed. Psycholinguistic evidence has been gathered “in favor of a psycholinguistic
model in which words with like meanings are ‘close together’ in accessing terms”
(Channell 1988: 90; cf. Albert and Obler 1978). So in the production of word
combinations, a choice among the alternatives of a group of semantically related
words has to be made. The clustering of words with similar meanings thus produces
an interference effect in selecting the right words. Even native speakers encounter
semantic interference in producing the target collocations. This interference effect is
observed in the mis-collocations produced by native speakers (either intentionally
or unintentionally). Evidence concerning native speakers’ collocation misuse is
sparse in the literature. Howarth (1998a) is among the few who investigated both
the NSs’ and NNSs’ phraseological errors. He proposes two types of plausible
explanations to account for the occurrences of lexical mis-collocations produced by
NSs: collocational overlaps and blends. In *draw a contrast, the error can be seen as
the result of filling in a collocational gap within a partially overlapping cluster:
draw a distinction, make a distinction, make a contrast but not *draw a contrast. In
*place weight, the error arises out of a blending of two pairs of collocations: place
emphasis and attach weight. Approaching these errors from the perspective of the
semantics of the erroneous verb and the target verb, we get the generalisation that
they are semantically related verbs, belonging to the synsets identified in the present
study. The verbs draw and make are in the same set, verbs of creation.
Place and attach are in the same set, verbs of putting. For other native-speaker
collocation errors given by Howarth, such as *reach a justice, the verb reach and the
target verb achieve fall in the same set of verbs of obtaining (cf. Sect. 6.1). Another
set of verbs listed by Howarth (compile, draw up, make, produce and write) are
semantically related as verbs of creation, and it follows that collocation errors are
made by native speakers through wrongly selecting one of them (e.g. compile) to
collocate with a noun (e.g. memorandum).
Similar to the semantic interference for native speakers in selecting the right
word from a set of semantically similar words, L2 learners encounter the same
interference with the expansion of their vocabulary. However, there is a funda-
mental difference in the semantic interference effect between NSs and NNSs. Native
speakers may deviate from standard collocational forms either deliberately or
unintentionally and in fact there are only a very small number of erroneous col-
locations produced by NSs (Howarth 1998a). However, for L2 learners, the
semantic interference effect is stronger owing to an incomplete acquisition of the
semantics of the semantically related words. As the empirical evidence obtained in
the data shows, learners’ collocation production gets worse when new semantically
related words are learnt (e.g. *concede + mistake rather than admit + mistake). As
Chap. 6 reports, verbs in synsets increased dramatically with the rise of proficiency,
and so did collocation errors. Misuses of verbs such as conduct, commit,
accomplish, enforce, implement and perform are believed to be caused by the partial
acquisition of the core meanings (i.e. “carry out” or “do”) but not their distinctive
meanings.
That semantic confusion increases with the rise of proficiency has been found in
several studies (e.g. Llach 2011; Ringbom 1987, 2001). In a developmental study of
the kinds of lexical errors that appeared in the written production of young Spanish
learners at two different stages, Llach (2011) observed a statistically significant
increase in semantic lexical errors at the higher proficiency level, which included
calques (literal translation of the word from the L1 to the L2) and semantic
confusion [the confusion of semantically related words, e.g. *my bedroom is great
(great for huge or big)]. These results support the view that “well-developed lexicons are
dominated by paradigmatic associative connections” (Zareva and Wolter 2012: 60).
Psycholinguistic research into the lexical organisation of L2 learners’ mental dic-
tionary indicates that “the same class (paradigmatic) connections become more
prominent as the proficiency of L2 learners of English increases to an advanced
level” (ibid: 59).2 Through word association tasks, Zareva and Wolter found that
with the advance of proficiency, NNSs’ lexicon becomes more paradigmatically
dominated like NSs’. Synonymy is an important paradigmatic response and L2
learners’ mental lexicon becomes organised more like a thesaurus in which words
with similar meanings are stored together (Meara 1978; Zareva and Wolter 2012).
Therefore, the more proficient learners become, the larger this thesaurus is, and the
more semantic interference they are confronted with in choosing the right word
from a set of semantically related words.
L2 learners are not only confronted with semantic interference from vocabulary
increase along paradigmatic relations; their L1 lexical network also exercises a
considerable influence over the learning and production of collocations. Thus
another inference, about cross-linguistic influence, can be drawn from this study.
Firstly, a higher error rate with congruent collocations than non-congruent ones,
even for advanced learners, suggests that the L1 plays a consistent role in the
production of collocations. As has been discussed in Chap. 3, the active
role of the L1 in collocation production is confirmed in psycholinguistic experi-
ments where a ‘dual-activation’ takes place: an L2 word stimulates not only its
collocates, but also its L1 translation equivalent and L1 collocate (Wolter and
Gyllstad 2011). Given that “even for advanced L2 learners, the L1 continues to be
active even when performing tasks entirely in the L2” (ibid: 443), it seems that, apart
from the receptive process in primed lexical decision tasks, the L1 still plays a
predominant role in the actual production of collocations by interfering with and
mediating that production. The significant traces of the L1 in L2 collocations and
the large number of transfer errors attest to the active role of the L1 on the
production side.
2 Paradigmatic relations between words refer to words of the same lexical class that can substitute
for one another in a syntactic string (e.g., synonyms, antonyms, meronyms, hyponyms, etc.) (Zareva
and Wolter 2012: 44).
The consistent role of L1 in collocation production may be closely linked with
the asymmetric cross-language connections. As the Revised Hierarchical Model
(Kroll and Stewart 1994; cf. Chap. 3) predicts, the link from L1 to conceptual
memory is assumed to be stronger than the link from L2 to conceptual memory, and
the lexical link from L2 to L1 is assumed to be stronger than the lexical link from
L1 to L2. It therefore seems highly likely that, in the production process, the L1 is
activated before L2 words are produced (see also the ‘dual-activation’ in
Wolter and Gyllstad 2011). Yet not all aspects of the L1 are easily activated in
producing an L2. As shown in Jiang’s (2000) model, the lemma information
(containing semantic and syntactic information) of the L1 is copied into the L2
lexical entry. This is the L1 lemma mediation stage, at which a majority of
L2 words fossilise (Jiang 2000). Thus, based on Jiang’s model, L2
lexical information at the lemma level is in turn most likely to be influenced by the
L1. For producing L2 collocations, which are word combinations representing
syntactic and semantic relationships between lexical items, the L1 thus plays the
most significant role for L2 learners. For example, acquire knowledge as a word
combination involves both the semantics of acquire and knowledge and the syn-
tactic information of acquire as a transitive verb and knowledge as an uncountable
noun. With the storage of L1 semantics and syntax at the lemma level (e.g. for both
the L2 words acquire and knowledge), production of word combinations involving
semantics and syntax in the L2 (e.g. acquire knowledge) is easily mediated through
L1 semantics and syntax, and thus L1 interference occurs in L2 collocation
production. L2 lexical combinations are therefore more susceptible to L1 influence
than other aspects of language acquisition (e.g. morphology and phonology).
Additionally, syntactic and phonological constructions are finite in number,
whereas L2 lexical combinations are not. So building syntagmatic connections
between words in an L2 is complicated by the infinite number of collocations, as
well as by the influence of L1 collocational knowledge (cf. Wolter 2006).
Reliance on the L1 lexical network underlies the large number of fortuitously
well-formed collocations that have direct translation equivalents in the L1
and L2, but also the occurrence of erroneous congruent collocations. As
discussed in Sect. 9.2, types of mismatches like ‘differentiation’ and ‘coalescing’
make direct copying of L1 word combinations to L2 collocations error-prone. For
collocations that have no direct translation equivalent between languages, the
shared conceptual system may be the same, but features of word combinations
differ (cf. Kroll et al. 2010). Both the empirical data in the present study and the
experimental data obtained by Yamashita and Jiang (2010) and Wolter and Gyllstad
(2011) suggest that once these non-congruent collocations are acquired, they are
processed independently of the L1. It is speculated that with regard to the acqui-
sition process for non-congruent collocations, no direct access is gained in the
Fig. 10.1 Processes for the production of congruent and non-congruent collocations by L2 learners (diagram with nodes L1, L2 and concepts; not reproduced)
greater difficulties in producing collocations than native speakers, since on the one
hand, their L1 lexical/conceptual knowledge has a consistent influence on how
learners structure connections between words in an L2 (Wolter 2006); on the other,
expanding paradigmatic relations of words (e.g. synonymy relations) in their
vocabulary in the course of L2 acquisition means that more and more words are
stored in the L2 mental lexicon, thus exerting interfering forces in the word
selection process. This is one of the major implications drawn from this study. Our
results show that learners are confronted with a dilemma: on the one hand, their
production of collocations is characterised by a limited number of collocation
types, indicating an inadequate mastery of vocabulary; on the other, the increase in
vocabulary in turn inhibits the learning of collocations. In other words, the growth
of vocabulary in the paradigmatic relations [i.e. sets in the terminology of Carter
and McCarthy (1988: 210)] enables learners to have more varied choices but at the
same time produces an interfering effect in learning collocations. Words within sets
are in relationships of synonymy, antonymy, hyponymy, etc. Sets are believed to be
“powerful organising principles, and have a strong psychological reality for lan-
guage users and learners” (ibid: 211). In this regard, the synonym sets identified in
the learner data are not only the organising principle for semantically related words,
but also the interfering factor in selecting the appropriate word to collocate with
another word. Therefore, it is important for learners to acquire not only the
semantic element that a word shares with a group of semantically related words, but
also the distinguishing semantic content that differentiates it
from its synonyms. The next section will be devoted to a discussion of pedagogical
implications for collocation learning mainly in terms of a full mastery of word
semantics.
How collocations are learnt and what factors interfere with the learning of collo-
cations can shed much light on how collocations are best taught and learnt. Findings
in the present research hold a number of important pedagogical implications.
First, verb + noun collocations deserve more attention than AN and
NN collocations, since they are more error-prone. Second, within verb + noun
collocations, it is the verb that poses more problems than the noun for L2 learners
(Granger 2014; Nesselhauf 2005), so verbs deserve more attention in vocabulary
learning. As discussed in the previous section, the misuse of semantically related
verbs (e.g. implement and perform, admit and concede) in collocations means
that it is important to acquire the semantics of verbs fully: both their basic properties
(“semantic markers” in the parlance of Katz and Fodor 1963) and the features that
distinguish them from semantically related words (“distinguishers”, Katz and
Fodor 1963).
Efficient learning of verb semantics has to take into account how the
semantics of words is approached. A traditional approach to word semantics is to break
down word meaning into a number of abstract components or semantic features and
identify “those features that will distinguish the meaning of any one word from
every other that might … compete for a place in the same semantic territory”
(Cowie 2009: 57). This approach to the meaning of a word is known as
Componential Analysis. CA has long been used to describe and distinguish words
with semantic relatedness. For example, Rudzka et al. (1981, 1985) presented
words in sets whose members have similar meanings and distinguished them
through componential grids (and collocational grids as well). With the example of
admit and concede, which occur in our database as *concede + mistake (admit), the
componential grid given by Rudzka et al. (1985: 171) is as follows:
The semantic marker of both admit and concede is “accept as true or valid”, but
the distinguishers of admit are “to confess or to acknowledge or allow to enter” and
of concede are “to give up or give away to opponent” (ibid.). This way of con-
trasting the semantic features of semantically related words works well for linguistic
analysis but is not suitable as a language-teaching tool, as the features are always
abstract (Carter and McCarthy 1988). For instance, the word confess, which appears in
one semantic component of admit (“to confess”), might be harder to understand than
admit itself. Moreover, the decontextualised presentation of meaning components in words or
phrases (e.g. “accept as true or valid” for both admit and concede) makes them hard for
L2 learners to comprehend.
So the acquisition of verb semantics is better aided by a contextualised
display of meaning. The learning of word semantics in context has been widely
advocated (Cobb 2003; Hanks 1996; Hoey 2000; Laufer 2006). One macro context
for learning the semantics of a word is its co-text, as in the full-sentence definitions of
headwords adopted by the Collins COBUILD English Dictionary (1995). The
definition provides “much of the context necessary for the meaning of the word in
use in the language, dependent on its environment, to be properly appreciated”
(Barnbrook 2007: 190). For example, the Cobuild dictionary defines implement and
perform in the following way:
Implement: If you implement something such as a plan, you ensure that what has
been planned is done.
Perform: When you perform a task or action, especially a complicated one, you do
it.
Comparing the meanings of the two verbs shows that implement implies more than
“carrying out/doing something”; it also incorporates the meaning of “carrying out what
has been planned”. When learners are presented with the definition of implement,
they may be less likely to make errors like *implement + act, which was found in this
study.
Another beneficial way of learning verb semantics, especially
the semantics of semantically related words, is through their collocates (cf. Carter
and McCarthy 1988; Lee and Liu 2009; Xiao and McEnery 2006). Collocations
contribute to the understanding of the concept of a word through defining its
semantic area (Brown 1974; Nattinger 1988). The learning of lexical semantics and
learning of collocations are mutually beneficial and inseparable. Knowing a word
involves knowing which words it usually collocates with, and to know the collo-
cational behaviour of a word is one type of word knowledge necessary for a
complete acquisition of that word (cf. Nation 1990: 31). Learning word meanings
through collocates contributes to the comprehension of the semantics of that word
and an acquisition of the semantics in turn helps define its co-occurring words. The
inseparability of the learning of semantics and collocational behaviour is best
manifested in Lewis’s (1997: 97) view that “the real definition of a word is a
combination of its referential meaning and its collocational field”. For semantic
sets, displaying both the overlapping collocates and the collocates exclusive to a particular
word helps learners to learn both the common meaning of a group of words
and the distinguishing features of each word. Such an approach to learning
semantically related words has been advocated by Rudzka et al. (1981, 1985). The
following is an example of their presentation of collocational grids for synonymous
words.
Collocational grids like this may help learners grasp not only the common
meanings of verbs in a semantic field, but also their different nuances of meaning. It
is beneficial for learners to identify the distinguishing meanings through the
collocates specific to each verb. This way of learning may be much better than
decontextualised word learning, i.e. memorising the meanings of words in word
lists or through translation equivalents in the L1. Learning and teaching with word
lists would unavoidably lead learners to believe that synonymous
words in a list share many collocates (Hoey 2000), which in turn leads to
semantic confusion in producing collocations (e.g. the misleading belief that implement
and perform share the collocate act). Likewise, learning words through
translation equivalents leads to the same problem of assuming similar collocates of
semantically related words. As Meara (1982) points out, learning vocabulary does
not just involve pairing L2 words and L1 meanings as the end state of learning that
word. With the words perform and implement as an example, they are translated
into the same word in Chinese according to the Oxford Advanced Learners’
English-Chinese Dictionary, but the distinguishing features are lost in the Chinese
translation equivalent. So with the same translation equivalent, the collocational
behaviour of semantically related words is highly likely to be believed as the same
by L2 learners. Psycholinguistic studies have found that L2 learners have same
translation pairs stored nearby in the mental lexicon and have difficulties distin-
guishing their meanings (e.g. hat-cap, problem-question) (Jiang 2002). Therefore, it
is far from enough to acquire the semantics of an L2 word on the basis of its
translation equivalent. Instead, both a full sentence definition of the verb and words
in syntagmatic relations with the verb can be presented to L2 learners for a com-
plete acquisition of its semantics.
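As a rough illustration of the collocational grid idea discussed above, the following Python sketch separates the collocates shared by a set of semantically related verbs from those attested with only one verb. The verb + noun pairs are the well-formed ST6 combinations listed in Appendix C for implement, enforce and perform. The function names (collocational_grid, shared_and_exclusive, render) and the plain-text layout are assumptions made for illustration only; they do not reproduce the grids of Rudzka et al. (1981, 1985).

```python
from collections import defaultdict

# Well-formed verb + noun pairs copied from Appendix C (ST6 learners).
PAIRS = [
    ("implement", "principle"), ("implement", "law"), ("implement", "policy"),
    ("enforce", "policy"), ("enforce", "law"),
    ("perform", "military service"), ("perform", "act"), ("perform", "function"),
]


def collocational_grid(pairs):
    """Map each noun to the set of verbs it is attested with."""
    grid = defaultdict(set)
    for verb, noun in pairs:
        grid[noun].add(verb)
    return dict(grid)


def shared_and_exclusive(grid):
    """Split nouns into those shared by several verbs and those tied to one verb."""
    shared = {noun: sorted(verbs) for noun, verbs in grid.items() if len(verbs) > 1}
    exclusive = defaultdict(list)
    for noun, verbs in grid.items():
        if len(verbs) == 1:
            exclusive[next(iter(verbs))].append(noun)
    return shared, {verb: sorted(nouns) for verb, nouns in exclusive.items()}


def render(grid, verbs):
    """Render a plain-text grid: '+' marks an attested combination, '.' an unattested one."""
    lines = []
    for noun in sorted(grid):
        marks = " ".join("+" if verb in grid[noun] else "." for verb in verbs)
        lines.append(f"{noun:<18}{marks}")
    return "\n".join(lines)


grid = collocational_grid(PAIRS)
shared, exclusive = shared_and_exclusive(grid)
print(render(grid, ["implement", "enforce", "perform"]))
```

On this small sample, law and policy are shared by implement and enforce, whereas act, function and military service are attested only with perform. A display of this kind is what might discourage a combination such as *implement + act, although the absence of a pairing in so small a sample is of course not evidence that the pairing is unacceptable.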
10.2.2.2 Consciousness-Raising
mother tongue (as they will always unconsciously do in learning an L2), but this is
not the end state. It would be beneficial for learners to translate the previously learnt
word combinations from their mother tongue back to English, without looking at
the English translation, in order to be more aware of the cross-linguistic differences
and ultimately to be conscious of the appropriate L2 collocation. This contrastive
analysis of collocations can be facilitative from a purely psycholinguistic
perspective: collocation learning requires not only noticing, but also more “cognitive
depth” (Craik and Lockhart 1972). Collocations processed in this way may ultimately enter
long-term memory, since more is retained in the learning process. In
instructional practice, contrastive analysis of collocations in terms of L1-L2
similarities and differences has been shown by Laufer and Girsai (2008) to be more
effective than teaching methods that ignore these cross-linguistic similarities and
differences.
One limitation of this study lies in the learner corpus adopted for data collection and
analysis. The results are based on a corpus of English writing
by learners of one mother tongue, Chinese, so all the generalisations made in
this research rest on data restricted to one learner type. Researching
collocation performance by learners speaking other L1s would have been more
rewarding, as comparisons could then be made between the collocation performances of
learners of different mother tongues. With data obtained from more learner types,
both collocation use typical of one individual learner type and collocation patterns
common to learners of various L1s could be found.
A further limitation regarding the learner corpus adopted is concerned with the
properties of the learner data. The Chinese Learner English Corpus is a collection of
writings by learners at different learning stages. So it is cross-sectional rather than
longitudinal. A longitudinal learner corpus would help us to arrive at more definite
conclusions regarding L2 learners’ collocational development and the factor of
vocabulary growth in collocation learning. Yet because no longitudinal learner
corpus was available when this research began, a corpus recording the writings of
learners at different proficiency levels was used.
Nonetheless, although the corpus is only quasi-longitudinal, there is a clear
differentiation in proficiency levels. The ST2, ST5 and ST6 learner groups (viz.
middle school students, English majors of lower grades and English majors of
higher grades), which were assumed to form a developmental sequence based on the
years of English instruction they had received, were indeed found to be on a continuous
developmental path. Several indicators show a continuous rise in proficiency, e.g.
the continuous increase in lexical verb + noun collocations, the increase in the
number of the overall verbs, adjectives and nouns produced by each level of
learners, etc.
References
Albert, M., & Obler, L. K. (1978). The bilingual brain. New York: Academic Press.
Bahns, J. (1993). Lexical collocations: A contrastive view. ELT Journal, 47(1), 56–63.
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1), 101–
114.
Barnbrook, G. (2007). Sinclair on collocation. International Journal of Corpus Linguistics, 12(2),
183–199.
Brown, D. (1974). Advanced vocabulary teaching: The problem of collocation. RELC Journal, 5
(2), 1–11.
Carter, R., & McCarthy, M. (1988). Vocabulary and language teaching. London: Longman.
Channell, J. (1988). Psycholinguistic considerations in the study of L2 vocabulary acquisition.
In R. Carter & M. McCarthy (Eds.), Vocabulary and language teaching (pp. 83–94). London:
Longman.
Chi, M. -L. A., Wong, P. -Y. K., & Wong, C. -P. M. (1994). Collocational problems amongst ESL
learners: A corpus-based study. In L. Flowerdew & A. K. Tong (Eds.), Entering text (pp. 157–
165). Hong Kong: University of Science and Technology.
Cobb, T. (2003). Analyzing late interlanguage with learner corpora: Quebec replications of three
European studies. Canadian Modern Language Review, 59(3), 393–423.
Cowie, A. P. (2009). Semantics. Oxford: Oxford University Press.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research.
Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684.
Cross, J., & Papp, S. (2008). Creativity in the use of verb + noun combinations by Chinese
learners of English. In G. Gilquin, S. Papp, & M. B. Diez-Bedmar (Eds.), Linking up
contrastive and learner corpus research (pp. 57–81). Amsterdam: Rodopi.
Farghal, M., & Obeidat, H. (1995). Collocations: A neglected variable in EFL. International
Review of Applied Linguistics in Language Teaching, 33(4), 315–331.
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language
production of native and non-native speakers. In M. Bygate, P. Skehan, & M. Swain (Eds.),
Researching pedagogic tasks: Second language learning, teaching and testing (pp. 75–93).
Harlow: Longman.
Granger, S. (1998). Prefabricated patterns in advanced EFL writing: collocations and formulae.
In A. P. Cowie (Ed.), Phraseology: Theory, analysis, and applications (pp. 145–160). Oxford:
Oxford University Press.
Granger, S. (2014). John Sinclair’s idiom principle: An inspiration for learner corpus research.
Talk given at 2014 Annual John Sinclair Lecture, Birmingham, 8 May 2014.
Hanks, P. (1996). Contextual dependency and lexical sets. International Journal of Corpus
Linguistics, 1(1), 75–98.
Henriksen, B. (2013). Research on L2 learners’ collocational competence and development—A
progress report. In C. Bardel, C. Lindqvist, & B. Laufer (Eds.), L2 vocabulary acquisition,
knowledge and use: New perspectives on assessment and corpus analysis (pp. 29–56). Eurosla.
(2014-03-10). http://www.eurosla.org/monographs/EM02/EM02tot.pdf.
Hoey, M. (2000). A world beyond collocation: New perspectives on vocabulary teaching. In M.
Lewis (Ed.), Teaching collocation: Further developments in the lexical approach (pp. 224–
245). Hove: Language Teaching Publications.
Hornby, A. S. (2009). Oxford advanced learner’s English-Chinese dictionary (7th ed.). Beijing:
The Commercial Press.
Howarth, P. (1998). The phraseology of learners’ academic writing. In A. P. Cowie (Ed.),
Phraseology: Theory, analysis and applications (pp. 161–186). Oxford: Oxford University
Press.
James, C. (1996). A cross-linguistic approach to language awareness. Language Awareness, 5(3–
4), 138–148.
Jiang, N. (2000). Lexical representation and development in a second language. Applied
Linguistics, 21(1), 47–77.
Jiang, N. (2002). Form-meaning mapping in vocabulary acquisition in a second language. Studies
in Second Language Acquisition, 24, 617–637.
Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39(2), 170–210.
Kjellmer, G. (1991). A mint of phrases. In K. Aijmer & B. Altenberg (Eds.), English corpus
linguistics: Studies in honour of Jan Svartvik (pp. 111–127). London: Longman.
Kroll, J. F., & Stewart, E. (1994). Category interference in translation and picture naming:
evidence for asymmetric connections between bilingual memory representations. Journal of
Memory and Language, 33, 149–174.
Kroll, J. F., Van Hell, J. G., Tokowicz, N., et al. (2010). The revised hierarchical model: A
critical review and assessment. Bilingualism: Language and Cognition, 13(3), 373–381.
Laufer, B. (2006). Comparing focus on form and focus on forms in second language vocabulary
learning. The Canadian Modern Language Review, 63(1), 149–166.
Laufer, B., & Girsai, N. (2008). Form-focused instruction in second language vocabulary learning:
A case for contrastive analysis and translation. Applied Linguistics, 29(4), 694–716.
Laufer, B., & Waldman, T. (2011). Verb-Noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Lee, C. Y., & Liu, J. S. (2009). Effects of collocation information on learning lexical semantics for
near synonym distinction. Computational Linguistics and Chinese Language Processing, 14
(2), 205–220.
Lewis, M. (1997). Implementing the lexical approach. Hove: Language Teaching Publications.
Llach, M. P. A. (2011). Lexical errors and accuracy in foreign language writing. Bristol:
Multilingual Matters.
Martelli, A. (2006). A corpus based description of English lexical collocations used by Italian
advanced learners. In E. Corino, C. Marello, & C. Onesti (Eds.), Proceedings XII EURALEX
International Congress (pp. 1005–1011). Alessandria: Edizioni dell’Orso.
Meara, P. (1978). Learners’ word associations in French. Interlanguage Studies Bulletin, 3(2),
192–211.
Meara, P. (1982). Word associations in a foreign language. Nottingham Linguistic Circular, 11(2),
29–38.
Nation, I. S. P. (1990). Teaching and learning vocabulary. Boston, Mass: Heinle & Heinle.
Nattinger, J. (1988). Some current trends in vocabulary teaching. In R. Carter & M. McCarthy
(Eds.), Vocabulary and language teaching (pp. 62–80). London: Longman.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223–242.
Nesselhauf, N. (2005). Collocations in a learner corpus. Amsterdam: Benjamins.
Palmer, H. E. (1933). Second interim report on English collocations. Tokyo: Kaitakusha.
Paquot, M., & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of
Applied Linguistics, 32, 130–149.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and
nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication
(pp. 191–226). London: Longman.
Ringbom, H. (1987). The role of the first language in foreign language learning. Clevedon:
Multilingual Matters.
Ringbom, H. (2001). Lexical transfer in L3 production. In J. Cenoz, B. Hufeisen, & U. Jessner
(Eds.), Cross-linguistic influence in third language acquisition: Psycholinguistic perspectives
(pp. 59–68). Clevedon: Multilingual Matters.
Rudzka, B., Channell, J., Putseys, Y., et al. (1981). The words you need. London: Macmillan.
Rudzka, B., Channell, J., Putseys, Y., et al. (1985). More words you need. London: Macmillan.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied
Linguistics, 11(2), 129–150.
Schmitt, N., & Carter, R. (2004). Formulaic sequences in action: An introduction. In N. Schmitt
(Ed.), Formulaic sequences: Acquisition, processing and use (pp. 1–22). Amsterdam:
Benjamins.
Sinclair, J. (1987). Collocation: A progress report. In R. Steele & T. Threadgold (Eds.), Language
topics: Essays in honour of Michael Halliday (Vol. 2, pp. 319–331). Amsterdam: Benjamins.
Sinclair, J. (1995). Collins COBUILD English dictionary (2nd ed.). London: HarperCollins.
Siyanova, A., & Schmitt, N. (2008). L2 learner production and processing of collocation: A
multi-study perspective. Canadian Modern Language Review, 64(3), 429–458.
Wolter, B. (2001). Comparing the L1 and L2 mental lexicon: A depth of individual word
knowledge model. Studies in Second Language Acquisition, 23(1), 41–69.
Wolter, B. (2006). Lexical network structures and L2 vocabulary acquisition: The role of L1
lexical/conceptual knowledge. Applied Linguistics, 27(4), 741–747.
Wolter, B., & Gyllstad, H. (2011). Collocational links in the L2 mental lexicon and the influence
of L1 intralexical knowledge. Applied Linguistics, 32(4), 430–449.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Xiao, R., & McEnery, T. (2006). Collocation, semantic prosody, and near synonymy: A
cross-linguistic perspective. Applied Linguistics, 27(1), 103–129.
Yamashita, J., & Jiang, N. (2010). L1 Influence on the acquisition of L2 collocations:
Japanese ESL users and EFL learners acquiring English collocations. TESOL Quarterly, 44(4),
647–668.
Zareva, A., & Wolter, B. (2012). The ‘promise’ of three methods of word association analysis to
L2 lexical research. Second Language Research, 28(1), 41–67.
Appendix A
Erroneous VN Collocations Produced
by the Three Levels of Learners (Types)
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 171
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6
Appendix B
Well-Formed and Erroneous VN
Collocations in the 16 Synsets (ST2)
(continued)
Synsets | Well-formed coll.: Verbs | Nouns | Freq. | Erroneous coll.: Verbs | Nouns | Freq.
Verbs of putting | Lay | Foundation | 1 | | |
“Settle” verbs | Settle | Problem | 2 | Do (solve) | Problem | 4
“Settle” verbs | Solve | Problem | 2 | | |
“Learn” verbs | | | | Learn (acquire) | Knowledge | 16
“Learn” verbs | | | | Get (learn) | Lesson | 1
“Learn” verbs | | | | Study (acquire) | Knowledge | 5
“Learn” verbs | | | | Know (acquire) | Knowledge | 2
Verbs of transfer of a message | Teach | Lesson | 1 | Teach (impart) | Knowledge | 2
Verbs of transfer of a message | Tell | Lie | 3 | Take (tell) | Joke | 1
Verbs of transfer of a message | Tell | Story | 3 | Say (tell) | Joke | 1
Verbs of transfer of a message | | | | Tell (impart) | Knowledge | 3
Verbs of transfer of a message | | | | Tell (give) | Advice | 2
“Keep” verbs | Hold | Breath | 1 | | |
“Keep” verbs | Keep | Record | 7 | | |
“Keep” verbs | Keep | Pace | 1 | | |
“Keep” verbs | Keep | Balance | 1 | | |
“Keep” verbs | Keep | Secret | 2 | | |
“Keep” verbs | Keep | Promise | 1 | | |
“Follow” verbs | Obey | Rule | 2 | Obey (face) | Fact | 1
“Follow” verbs | Follow | Advice | 1 | Observe (obey) | Law | 1
“Play” verbs | Play | Part | 1 | Play (perform) | Play | 3
“Change” verbs | Change | Mind | 6 | | |
“Break” verbs | Break | Rule | 2 | | |
“Break” verbs | Break | Record | 3 | | |
“Live” verbs | Lead | Life | 3 | | |
“Live” verbs | Live | Life | 4 | | |
“Wear” verbs | Wear | Clothes | 17 | Dress (wear) | Clothing | 1
“Drive” verbs | Drive | Motorcycle | 1 | Ride (drive) | Bus | 1
“Drive” verbs | Drive | Bus | 1 | | |
“Drive” verbs | Drive | Car | 1 | | |
“Drive” verbs | Ride | Bike | 5 | | |
“Pay” verbs | Devote | Attention | 1 | | |
“Pay” verbs | Pay | Visit | 2 | | |
“Pay” verbs | Pay | Respect | 1 | | |
“Pay” verbs | Pay | Attention | 23 | | |
Note: Verbs in brackets are the target verbs for erroneous VN collocations
Appendix C
Well-Formed and Erroneous VN
Collocations in the 16 Synsets (ST6)
178 Appendix C: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST6)
(continued)
Synsets Well-formed coll. Erroneous coll.
Verbs Nouns Freq. Verbs Nouns Freq.
“Fulfil” Apply Principle 1 Accomplish Crime 2
verbs (commit)
Enforce Policy 1 Carry out Value 1
(realise)
Enforce Law 2 Ensure (enforce) Law 1
Exercise Power 2 Carry on Law 1
(enforce)
Exercise Judgment 1 Exert Ability 1
(demonstrate)
Exercise Right 1 Exert Competence 1
(demonstrate)
Exert Influence 3 Fulfil Ability 1
(demonstrate)
Fulfil Role 1 Implement Act 1
(perform)
Fulfil Ambition 1 Attend (perform) Military 1
service
Fulfil Wish 1 Take (perform) Military 1
service
Implement Principle 1 Carry (perform) Function 1
Implement Law 1 Make (conduct) Exam 1
Implement Policy 1 Take (conduct) Survey 2
Perform Military 2 Conduct Murder 2
service (commit)
Perform Act 1 Conduct Crime 3
(commit)
Perform Function 1 Make (commit) Crime 1
Realise Value 2 Do (commit) Crime 4
Realise Dream 6
Realise Goal 1
Conduct Survey 2
Commit Crime 120
Commit Homicide 2
Commit Suicide 17
Commit Offence 1
Commit Murder 2
Commit Act 1
(continued)
Appendix C: Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST6) 179
(continued)
Synsets Well-formed coll. Erroneous coll.
Verbs Nouns Freq. Verbs Nouns Freq.
Verbs of Achieve Aim 1 Receive (achieve) Success 1
obtaining Achieve Dream 1 Cause (catch) Attention 1
Achieve Goal 11 Reach (catch) Attention 1
Achieve Purpose 2 Meet (earn) Praise 1
Achieve Success 3 Get (reach) Conclusion 1
Catch Attention 1 Approach (reach) Conclusion 1
Earn Money 27 Reach (receive) Recognition 1
Earn Living 7 Receive Operation 1
(undergo)
Gain Knowledge 4
Gain Victory 1
Grasp Opportunity 2
Reach Agreement 1
Reach Goal 3
Reach Target 2
Reach Conclusion 1
Receive Award 1
Receive Training 3
Receive Education 24
Receive Treatment 3
Receive Attention 1
Receive Reward 2
Receive Punishment 5
Receive Warning 1
Seize Opportunity 2
Verbs of Attach Importance 8 Give (impose) Burden 1
putting Fix Eye 2 Do (impose) Punishment 1
Impose Fine 1 Impose (pose) Threat 1
Synsets       Well-formed coll.               Erroneous coll.
              Verbs      Nouns         Freq.  Verbs                   Nouns       Freq.
              Impose     Burden        1      Lay (impose)            Burden      1
              Impose     Punishment    2      Lay (cast)              Eye         1
              Lay        Emphasis      2      Lay (assign)            Role        1
              Lay        Foundation    1      Give (put)              End         1
              Place      Emphasis      1      Put (pay)               Attention   3
              Put        Value         5
              Put        Emphasis      3
              Put        Hope          1
              Put        End           19
              Put        Blame         2
              Put        Priority      1
“Settle”      Solve      Problem       35     Charge (tackle)         Problem     1
verbs         Solve      Dispute       2
              Resolve    Problem       3
              Tackle     Problem       1
              Undertake  Duty          1
              Undertake  Task          2
“Learn”       Acquire    Knowledge     4      Learn (acquire)         Knowledge   11
verbs                                         Have (learn)            Lesson      1
                                              Get (learn)             Lesson      1
                                              Master (acquire)        Knowledge   2
                                              Study (acquire)         Knowledge   3
Verbs of      Teach      Lesson        1      Teach (impart)          Knowledge   2
transfer of   Tell       Truth         1      Instruct (communicate)  Idea        1
a message     Tell       Story         5      Push (impart)           Knowledge   1
              Impart     Knowledge     1
“Keep”        Hold       Opinion       3      Reflect (hold)          Prejudice   1
verbs         Hold       Position      3      Cast (hold)             Prejudice   1
              Hold       Belief        2
              Hold       Post          1
              Hold       View          5
              Hold       Attitude      3
              Keep       Watch         1
              Keep       Distance      2
              Keep       Eye           3
              Keep       Balance       6
              Keep       Promise       2
              Keep       Pace          1
              Maintain   Order         3
              Maintain   Balance       1
              Maintain   Dignity       1
“Follow”      Obey       Law           7      Obey (adhere to)        Principle   1
verbs         Obey       Rule          2
              Follow     Principle     1
              Follow     Rule          1
              Adopt      Attitude      4
              Adopt      Method        2
              Adopt      Policy        3
              Adopt      Law           1
“Play”        Play       Role          54     Serve (play)            Role        1
verbs         Play       Part          3      Act (play)              Role        1
                                              Lead (play)             Role        1
“Change”      Shift      Focus         2      Change (rehabilitate)   Criminal    1
verbs         Change     Mind          1
“Break”       Break      Law           26     Break (violate)         Regulation  1
verbs         Break      Rule          2
              Break      Promise       1
              Violate    Regulation    1
              Violate    Law           6
“Live”        Lead       Life          37
verbs         Live       Life          30
“Wear”        Wear       Clothes       1      Dress (wear)            Clothing    1
verbs
“Drive”       Drive      Car           2
verbs
“Pay”         Pay        Attention     53     Pay (give)              Praise      1
verbs         Pay        Heed          25
              Pay        Respect       3
Note: Verbs in brackets are the target verbs for erroneous VN collocations.
Appendix D
Frequencies of Well-Formed and Erroneous VN Collocation Types in the 16 Synsets (ST2 and ST6)
© Springer Nature Singapore Pte Ltd. and Shanghai Jiao Tong University Press 2018 183
H. Men, Vocabulary Increase and Collocation Learning,
DOI 10.1007/978-981-10-5822-6
Appendix E
Well-Formed and Erroneous VN Collocations in the 16 Synsets (ST5)
(continued)
Synsets       Well-formed coll.               Erroneous coll.
              Verbs      Nouns         Freq.  Verbs                   Nouns       Freq.
Verbs of      Achieve    Purpose       3      Catch (seize)           Chance      3
obtaining     Achieve    Aim           3      Catch (seize)           Opportunity 1
              Achieve    Success       2      Grasp (acquire)         Skill       1
              Earn       Money         9
              Earn       Salary        2
              Earn       Living        2
              Gain       Knowledge     6
              Gain       Independence  2
              Reach      Agreement     2
              Reach      Goal          3
              Receive    Letter        42
              Seize      Opportunity   1
Verbs of      Attach     Importance    4      Put (turn)              Ear         1
putting       Lay        Stress        1
              Place      Emphasis      2
              Put        Emphasis      2
              Put        Stress        1
              Put        End           2
              Set        Foot          1
“Settle”      Resolve    Problem       1      Do (solve)              Problem     5
verbs         Solve      Problem       45
“Learn”       Learn      Lesson        3      Learn (acquire)         Knowledge   25
verbs         Master     Skill         3      Study (acquire)         Knowledge   6
Verbs of      Teach      Lesson        2      Teach (impart)          Knowledge   7
transfer of   Tell       Story         14     Have (tell)             Joke        1
a message     Tell       Lie           4
              Tell       Truth         2
              Tell       Joke          3
“Keep”        Hold       Opinion       2
verbs         Keep       Touch         1
“Follow”      Adopt      Method        1      Obey (adopt)            Method      2
verbs         Adopt      Policy        1
              Follow     Instruction   2
              Obey       Law           1
              Obey       Rule          2
“Play”        Play       Role          23     Act (play)              Role        2
verbs         Play       Part          4      Occupy (play)           Role        1
                                              Do (play)               Role        1
                                              Lay (play)              Role        1
“Change”      Change     Mind          2
verbs
“Break”       Break      Law           1
verbs         Break      Rule          1
              Break      Record        1
              Violate    Rule          2
“Live”        Live       Life          17     Make (live)             Life        1
verbs         Lead       Life          13
“Wear”        Wear       Clothes       11
verbs
“Drive”       Ride       Bike          16
verbs
“Pay”         Pay        Attention     25
verbs         Pay        Respect       1
Note: Verbs in brackets are the target verbs for erroneous VN collocations.
Appendix F
Frequencies of Well-Formed and Erroneous VN Collocation Types in the 16 Synsets (ST2, ST5 and ST6)
Appendix G
Adjective Categories in the ST2 and ST6 AN Collocation Databases
Qualitative adjectives
ST6 (69): Active, adverse, bad, breaking, bright, broad, clean, clear, close, common, controversial, convincing, dark, deadly, deaf, deep, dense, distant, effective, fair, fatal, fertile, fierce, fresh, full, good, great, guilty, hard, heated, heavy, high, hot, infectious, irresistible, keen, key, leading, lethal, light, long, mass, narrow, near, nice, polluted, practical, primary, primitive, privileged, professional, promising, rapid, remote, rural, scientific, sharp, small, solid, sore, strong, torrential, unexpected, urban, urgent, vicious, warm, weak, wide
ST2 (34): Active, bad, bright, cheap, classical, close, common, correct, crisp, dark, deep, fair, fast, firm, foul, fresh, full, glib, good, great, happy, hard, heavy, high, long, loose, loud, low, open, popular, rapid, soft, strong, warm

Classifying adjectives
ST6 (70): Academic, annual, arable, armed, associate, atomic, biochemical, bodily, boiling, broken, capitalist, chemical, compulsory, consequential, corporal, criminal, cultural, curable, daily, developed, developing, domestic, economic, electric, endangered, environmental, ethical, everyday, feminist, financial, final, five-star, flared, foreign, human, illegal, incurable, individual, industrial, initial, international, juvenile, latest, liberal, literal, living, medical, middle, military, monetary, moral, naked, native, national, natural, nuclear, personal, physical, plastic, political, presidential, promissory, public, racial, sexual, social, solar, spoiled, territorial, top
ST2 (25): Boiled, British, botanical, capitalist, civil, closing, criminal, daily, developed, developing, double, everyday, extracurricular, final, foster, founding, historic, living, Lunar, military, natural, physical, political, public

Emphasising adjectives
ST6 (1): Absolute
ST2 (1): Blue

Colour adjectives
ST6 (2): Black, blue
ST2 (0)
Appendix H
Well-Formed and Erroneous Congruent and Non-congruent Collocations in the ST6 (Types)
Appendix I
Well-Formed Congruent and Non-congruent VN Collocations in the ST2 and ST6 (Types)
Appendix J
Erroneous Congruent and Non-congruent VN Collocations in the ST2 and ST6 (Types)
Appendix K
Positive and Negative Transfer Between VN and AN Collocations in the ST2 (Types)
Appendix L
Positive and Negative Transfer Between VN and AN Collocations in the ST6 (Types)
Index
P
Partial congruence, 139, 140, 147, 154
Partial non-congruence, 140
Perceptual salience, 144, 166
Phraseological approach, 14, 15, 19, 20, 25, 30, 64
POS tagging, 67, 68
Prefabricated units, 12, 152
Prefabs, 9. See also Prefabricated units
Priming, 49
Psychological approach, 15, 28, 30

Q
Qualitative deficiency, 85
Quantitative deficiency, 152

R
Recurrent word combinations, 10, 11
Reliability check, 67
Restricted collocation, 10, 21, 23, 24, 38, 43, 45, 78
Revised Hierarchical Model, 49, 158

S
Salient, 144, 164
Semantic interference, 156, 157, 163
Semantic marker, 160, 161
Semantic relatedness, 54, 160, 161
Semantic transparency, 21, 25, 27, 30, 38
Structurally non-congruent collocations, 143
Substitutability, 22. See also Commutability
Synonym density, 127, 128, 130, 154
Synonym set, 3, 6, 59, 71, 93, 95, 125, 126, 132, 153, 154, 160

T
Translation equivalents, 48–51, 135, 142, 158, 163, 164

V
Verb + noun collocations. See VN collocations

W
WordNet, 71, 72, 98, 126–128, 130, 132, 154