Lexical Bundles and L1 Transfer Effects 2013
Magali Paquot
Université catholique de Louvain
1. Introduction
The last ten years have witnessed a remarkable boom in the number of studies that
examine learners’ use of lexical bundles, i.e. “recurrent expressions, regardless of
their idiomaticity, and regardless of their structural status” (Biber et al. 1999: 990).
These repeated sequences of words may be grammatically complete (by contrast,
on the other hand) or incomplete (the nature of the, is based on the). They may be
composed of clause segments (e.g. I don’t know what) or be parts of phrases (e.g.
the use of). They are “conventionalized building blocks that are used as convenient
routines in language production” (Altenberg 1998: 122) and typically function as
referential markers (e.g. the end of the), text organizers (e.g. on the basis of, for
example), stance markers (e.g. it is possible to) or interactional discourse markers
(e.g. or something like that, thank you so much) (Biber et al. 2003). Corpus-query
tools often provide an option that retrieves repeated sequences of words of a given
To answer the research questions, the study is grounded in Jarvis’s (2000) uni-
fied framework for the study of L1 influence (Section 2). Section 3 describes the
learner and native corpus data used. In Section 4, the different methodological
steps required are summarised. Section 5 offers the results of the analysis of trans-
fer effects on French EFL learners’ use of lexical bundles, and Section 6 provides
answers to the research questions in the light of the preceding sections. Section 7
contains concluding remarks.
Transfer studies have too often fallen into the trap of making a case for L1 influ-
ence on the sole argument that the structure exists in the L1, thus relying exclusive-
ly on “‘shot-in-the-dark’ post hoc interpretive guesses which pass for explanations”
(Lightbown 1984: 245). To remedy this situation, Jarvis (2000) puts forward a uni-
fied framework for the study of L1 influence which is premised on the following
operationalizable definition of the construct of ‘transfer’:
L1 influence refers to any instance of learner data where a statistically significant
correlation (or probability-based relation) is shown to exist between some fea-
tures of learners’ IL performance and their L1 background. (Jarvis 2000: 252)
This definition of L1 influence translates into a list of at least three potential sources
of evidence that transfer studies should consider together when presenting a
case for or against L1 influence:
i. Effect 1: Intra-L1-group homogeneity in learners’ IL performance is found
when learners who speak the same first language behave as a group with
respect to a specific second language (L2) feature. To illustrate this first L1
effect, Jarvis (2000) uses Selinker’s (1992) finding according to which Hebrew-speaking
learners of English as a group tend to produce sentences in which
adverbs are placed before the object (e.g. I like very much movies).
ii. Effect 2: Inter-L1-group heterogeneity in learners’ IL performance is found
when “comparable learners of a common L2 who speak different L1s diverge
in their IL performance” (Jarvis 2000: 254). To illustrate, Jarvis (2000) refers
to a number of studies reported by Ringbom (1987) that have shown that
Finnish-speaking learners are more likely than their Swedish-speaking coun-
terparts to omit English articles and prepositions. Jarvis (2000) argues that
“this type of evidence strengthens the argument for L1 influence because it
essentially rules out developmental and universal factors as the cause of the
3. Data
The learner corpus data used for the present study come from the first version of
the International Corpus of Learner English (ICLE) (Granger et al. 2002). ICLE
texts share a number of learner and task variables, which were used as corpus-
design criteria. All the learners are young adults who study English as a Foreign
Language (EFL) at university. They are all in their second, third, or fourth year
and their proficiency level has commonly been described as advanced although
learner groups differ in proficiency (Granger et al. 2009: 12). Learner productions
share many task variables, notably for medium (writing), genre (academic es-
say), field (general English rather than English for Specific Purposes) and length
(between 500 and 1,000 words). Other variables differ. A majority of the learner
texts are argumentative, but the essays cover a wide range of topics (e.g. the death
penalty, euthanasia). Learner texts also differ in task conditions.
The focus of this study is on French learner writing, but other ICLE sub-corpora
were also used as comparable corpora to test for inter-L1-group heterogeneity
in learners’ interlanguage (IL) performance (cf. Section 2). Table 1
provides a breakdown of the ten ICLE sub-corpora used. Learner essays in each
sub-corpus were carefully selected in an attempt to control for a number of task
variables which may affect learner productions (cf. Kroll 1990, Ädel 2008): all
the texts are untimed argumentative essays, potentially written with the help of
reference tools. Although essays written without the help of reference tools would
arguably have been more representative of what advanced EFL learners can pro-
duce, untimed essays with reference tools are used as they represent the majority
of learner texts in ICLE.
To evaluate Effect 3, several corpora of French writing were used. The 1.6 billion
word frWaC was consulted via the Sketch Engine (Kilgarriff & Kosem 2012) to
assess the frequency and prototypicality of specific word combinations in French
for general purposes. It was also deemed necessary to query smaller but more
comparable corpora of expert and student writing to control for text type and
levels of writing expertise:
i. The humanities component of the online Scientext corpus, i.e. a 3,431,531
word corpus of French published articles, theses and proceedings in linguis-
tics, psychology, education and natural language processing.
ii. The Corpus de Dissertations Françaises (CODIF), i.e. a 92,832 word corpus of
argumentative essays written by French-speaking students on similar topics
to ICLE-FR.
Spot-checks were also sometimes made in the 100 million word British National
Corpus and the 2 billion word English corpus ukWaC to check the lexicogram-
matical and distributional properties of English word combinations and hence
identify possible intralingual contrasts (Jarvis 2010). The two corpora were que-
ried via the Sketch Engine (Kilgarriff & Kosem 2012).
4. Methodology
The methodology used involves several steps which are described here. Sec-
tion 4.1 covers the extraction of lexical bundles from ICLE texts. Section 4.2
provides the procedures and statistical tests used to operationalize Jarvis’s (2000)
unified framework on learner corpus data. Section 4.3 describes the method used
to rule out topic influence and the rationale behind this extra step.
The focus of the study is on potential transfer effects on French EFL learners’ use
of bundles with lexical verbs. Lexical bundles of 3 words were first extracted from
the ICLE French sub-corpus with the help of the computer software WordSmith
Tools 5 (Scott 2008). A minimum frequency threshold of 5 occurrences was ad-
opted. The resulting list was filtered manually and the bundles that included a
lexical verb were selected for further analysis. A Perl program was then used to
retrieve relative frequencies per 100 words for each of the selected bundles in the
1,641 learner texts that make up the ten learner corpora.
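The extraction and normalisation steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the WordSmith Tools procedure or the Perl program used in the study: the tokenizer, the function names (`tokenize`, `extract_bundles`, `relative_frequency`) and the decision to count bundles text by text, so that no bundle spans two essays, are all assumptions made for the example. The manual selection of bundles containing a lexical verb is not modelled.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase alphabetic words,
    # optionally with one internal straight apostrophe (e.g. "don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def ngrams(tokens, n=3):
    # All contiguous sequences of n tokens, joined with single spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_bundles(texts, n=3, min_freq=5):
    # Count n-word sequences text by text and keep those that reach
    # the minimum frequency threshold in the corpus as a whole.
    counts = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return {bundle: c for bundle, c in counts.items() if c >= min_freq}

def relative_frequency(bundle, text, per=100):
    # Occurrences of the bundle in one text, normalised per `per` words.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    n = len(bundle.split())
    hits = sum(1 for gram in ngrams(tokens, n) if gram == bundle)
    return hits * per / len(tokens)
```

Computing one relative frequency per text, rather than one pooled figure per sub-corpus, is what makes the subsequent comparison of group means possible.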
the difference is. A post-hoc test must then be conducted to pinpoint the learner
population(s) responsible for the significant difference. As the objective here is to
evaluate Effects 1 and 2, the comparisons of interest are those between the French
learner corpus and the other ICLE sub-corpora. Dunnett’s test is considered
the most powerful post-hoc test whenever one group is compared with each of the
other groups (Howell 1997: 380–381) and is therefore used in this study.1 When
lexical bundles display significant differences in use between the French learner
group and at least half of the other learner populations as revealed by Dunnett’s
tests, there is a strong case for intra-L1-group homogeneity and inter-L1-group
heterogeneity. The criterion that over half of the comparisons must be
significant is arbitrary and probably relatively conservative.
It is, however, used in this exploratory study to validate the methodology. All sta-
tistical tests were performed with R (R Core Team 2012).
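The logic of this step can be illustrated with a short Python sketch. It is not the R code used in the study. The function `f_ratio` computes the one-way ANOVA F ratio from per-text relative frequencies grouped by learner corpus, and `meets_criterion` applies the decision rule to the p values of Dunnett's comparisons, which are assumed to have been obtained beforehand from a statistics package. The function names, the corpus codes in the usage note and the significance level of 0.05 are assumptions made for the example.

```python
def f_ratio(groups):
    # One-way ANOVA F ratio: between-group mean square divided by
    # within-group mean square. `groups` is a list of lists of per-text
    # relative frequencies, one list per learner corpus.
    # Note: raises ZeroDivisionError if there is no within-group variance.
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def meets_criterion(dunnett_p_values, alpha=0.05):
    # `dunnett_p_values` maps each comparison corpus to the p value of its
    # comparison with the French learner corpus (computed elsewhere).
    # alpha=0.05 is an assumed significance level.
    significant = sum(1 for p in dunnett_p_values.values() if p < alpha)
    # At least half of the comparisons must be significant.
    return significant, 2 * significant >= len(dunnett_p_values)
```

With nine comparison corpora, "at least half" and "over half" both amount to five or more significant comparisons, e.g. `meets_criterion({"SP": 0.01, "IT": 0.2, ...})` returns the count together with a True/False decision.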
While the first two effects readily lend themselves to automatic and quanti-
tative evaluation, intra-L1-group congruity between French learners’ L1 and IL
performance does not. Assessing this third effect requires a more qualitative ap-
proach. First, the use of each lexical bundle was carefully analysed in ICLE-FR.
The next steps consisted in identifying the potential French “equivalent” of each
lexical bundle in context, describing its use in French L1 and comparing learners’
L1 and IL patterns of use.
Learner texts in ICLE are varied in topic, and there is no single topic that is evenly
distributed across the 10 sub-corpora used in this study. Topic variability must
however be addressed as lexical bundles are particularly prone to this factor
(Cortes 2004) and the ICLE French sub-corpus is characterised by a strong bias
towards just one topic (“Europe 92: loss of sovereignty or birth of a nation?”). This
topic was selected by c. 40% of all the French learners, and more than 70% of all
the texts about Europe 92 in ICLE are to be found in the French component. As
the issue of topic variability could not be addressed a priori, it is dealt with just
before intra-L1-group congruity between French learners’ L1 and IL performance
(Effect 3) is tested. To rule out topic influence, the ICLE in-built corpus query tool
is used to analyse the distribution by essay prompt of all the bundles that display
intra-L1-group homogeneity and inter-L1-group heterogeneity (Effects 1 and 2).
If a lexical bundle only appears in French learners’ essays discussing the creation
and future of Europe and in no other ICLE text, this provides a strong indication
that topic is a much more likely explanation than L1 influence.
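The topic check just described amounts to a simple filter, sketched below in Python. This is an illustration of the decision rule only, not of the ICLE in-built query tool: the data structure (one `(L1 code, essay prompt)` pair per text in which a bundle occurs), the function names and the labels `"FR"` and `"Europe 92"` are assumptions made for the example.

```python
def is_topic_bound(occurrences, l1, prompt):
    # True if the bundle occurs at least once and every text in which it
    # occurs was written by the given L1 group on the given prompt.
    occurrences = list(occurrences)
    return bool(occurrences) and all(o == (l1, prompt) for o in occurrences)

def split_by_topic(bundle_occurrences, l1="FR", prompt="Europe 92"):
    # Separate bundles explained by topic from those retained for the
    # analysis of L1/IL congruity (Effect 3).
    topic_bound, retained = [], []
    for bundle, occurrences in bundle_occurrences.items():
        if is_topic_bound(occurrences, l1, prompt):
            topic_bound.append(bundle)
        else:
            retained.append(bundle)
    return topic_bound, retained
```

A bundle that also appears under another prompt, or in another sub-corpus, is retained, since topic alone can no longer account for its distribution.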
5. Results
This section presents the results obtained from the transfer study. The extrac-
tion procedure outlined in Section 4.1 made it possible to identify 273 bundles
with a lexical verb in the French learner corpus, which were submitted to further
analysis.
An R script was written to assess Effects 1 and 2 for the 273 lexical bundles under
study. The ANOVA test identified 87 lexical bundles that present significant dif-
ferences in use among the ten learner corpora. Among these, 34 bundles (12.45%)
display significant differences in use between the French learner group and at least
half of the other learner populations as revealed by Dunnett’s tests, thus showing
both intra-L1-group homogeneity and inter-L1-group heterogeneity. Table 2 lists
the 34 bundles, their F ratio and p value, as well as the number of learner popula-
tions from which the French learner group differs significantly in its use of each
lexical bundle.
Table 2. (continued)
Bundle               F       p          Number of significant learner corpus comparisons
to be found          5.206   5.32e-07   7
to build a           4.274   1.67e-05   9
to create a          2.788   0.00302    6
to go further        2.485   0.00809    6
to know whether      2.85    0.00246    8
wait and see         4.699   3.52e-06   9
want to create       3.011   0.00143    8
was considered as    2.421   0.00991    6
we can say           3.192   0.000774   6
we can wonder        2.669   0.00446    6
we may wonder        3.338   0.000469   9
we must not          2.606   0.00549    8
will be allowed      3.261   0.000612   8
will be needed       3.299   0.000536   9
will be united       3.328   0.000484   9
will keep its        3.309   0.000518   9
would say that       3.696   0.000134   8
An analysis of the 34 significant bundles in the 1,641 learner texts and their dis-
tribution by essay prompt reveals that 14 lexical bundles only appear in ICLE-
FR essays that discuss the creation and future of Europe. These bundles are keep
its own, keep their own, say that Europe, to build a, wait and see, will be needed,
will be united, will keep its, will be allowed, does it mean, going to become, want
to create, *loose their identity, and to create a. The influence of topic is visible in
the selection of content words (e.g. say that Europe, want to create) as well as in
tense preferences (e.g. will be allowed, will be united, will keep its) (Examples (1)
and (2)).
(1) Europe will be united against USA and Japan. (ICLE-FR)
(2) Each country will keep its own identity, currency, institutions and constitu-
tion. (ICLE-FR)
The influence of topic was ruled out for the remaining 20 lexical bundles as they
were found in essays covering a range of prompts (cf. Table 3).
The simplest way to test Effect 3 is to check whether there are equivalent lexical
bundles in French. Before doing so, however, a quick scan of concordance lines
for the 20 remaining lexical bundles (Table 3) showed that some regrouping of
embedded word sequences was possible (sometimes making up longer and more
syntactically complete bundles such as I would say that or pinpointing shorter but
more salient word combinations, e.g. considered as). Intra-L1-group congruity
between learners’ L1 and IL performance was consequently evaluated for fifteen
lexical bundles (see Table 4). L1/IL equivalence in form was found for a majority
of the English lexical bundles; equivalence in meaning or function was established
for the four lexical bundles involving the first person plural pronoun we. Table 4
also provides the most frequent corresponding bundles in French as identified in
frWaC for each of the fifteen longer, syntactically complete or more salient lexical
bundles. Small capitals are used to represent lemmas rather than word forms. The
extent of the correspondence between the English and French lexical bundles is
discussed in Section 6.
Table 4. Lexical bundles and their most frequent equivalent forms in French
English lexical bundles    Most frequent equivalent bundles in French
be tempted to              être tenté/es de
considered as              considéré/es comme
deeply rooted in           profondément enraciné/es dans
I would say that           je dirais que
is to know whether         est de savoir si
not forget that            pas oublier que
role to play               rôle à jouer
speak of                   parler de
take the example           prendre l’exemple
to be found                être trouvé/es
to go further              aller plus loin
we can say                 on peut dire
we can wonder              on peut se demander
we must not                il ne faut pas
we may wonder              on peut se demander
6. Discussion
This section addresses the research questions guiding the study by discussing the
results provided in Section 5. The combination of the three effects investigated in
Section 5 points to a firm conclusion of L1 transfer for the twenty lexical bundles
for which topic influence was ruled out (Table 3). This represents as much as
58.8% of the lexical bundles that set the French learners apart from at least 5 other
learner populations (Section 5.1). Thus, to answer RQ1, over half of French
learners’ idiosyncratic use of lexical bundles with verbs can be attributed to L1
influence.
A close look at the lexical bundles and their equivalent forms in French helps
identify four major types of transfer effect found in French EFL learners’ use of
recurrent word sequences, thus addressing RQ2: (i) transfer of collocational and
colligational preferences, (ii) transfer of syntactic constructions, (iii) transfer of
functions and discourse conventions and (iv) transfer of L1 frequency.
French EFL learners’ preference for the construction consider + as mirrors the
use of French considérer, which is typically followed by the preposition comme
when introducing adjective or noun phrases (Examples (5) and (6)). In frWaC,
for example, considérer + comme + ADJECTIVE has a relative frequency of
11.5 pmw while the structure without the preposition appears with a relative fre-
quency of 2 pmw.
(5) Il est également considéré comme le fondateur de l’abbaye de Malmédy en
Belgique. (frWaC)
(“He is also considered the founder of the abbey of Malmedy in Belgium.”)
(6) La nature a longtemps été considérée comme une réserve plutôt que comme
un patrimoine. (frWaC)
(“Nature has long been considered a reserve rather than a heritage.”)
Among the lexical bundles that distinguish the French learner population from
the other learner groups, several include to-infinitive constructions. As illustrated
in Example (7), French learners use the lexical bundle to go further although it is
not very frequent in English (0.9 pmw in ukWaC). By contrast, the French con-
gruent bundle aller plus loin is relatively frequent (8.9 pmw in frWaC).
(7) Nevertheless the Americans decided to go further and were the first who
wanted to stop Hussein and his army. (ICLE-FR)
The lexical bundle to be found appears in several ICLE sub-corpora but it is most
frequent in the French learner sub-corpus where it is almost always preceded by
a noun phrase (NP) + the verb be (Examples (8) and (9)). This larger frame cor-
responds to French NP + être + à trouver, which is itself a lexical realisation of
the frequent French structure NP + être + à + VERB (over 20 pmw in frWaC).
The meaning of this French construction is more commonly expressed with the
modal verb should in English and the most frequent bundles that exemplify this
structure in frWaC include dossiers sont à retirer (“forms should be picked up”),
candidatures sont à adresser (“applications should be sent to”), précautions sont à
prendre (“precautions should be taken”), règles sont à respecter (“rules should be
followed”), and supplément est à payer (“extra charge should be paid”).
(8) The real problem is to be found in the fact that women who wish to have a job,
also desire to have a family life. (ICLE-FR)
(9) Another example is to be found between the French and the Italian vine grow-
ers: […]. (ICLE-FR)
There are only two sentences where to be found is not used with the verb be and
they both feature the combination a balance has to be found (Examples (10) and
(11)). Tellingly, the choice of have to in these two sentences is consistent with the
preferred expression of modality in the congruent phrase in French: un équilibre
doit/devra être trouvé is nearly twice as frequent as un équilibre est à trouver in frWaC (41
vs. 23 occurrences).
(10) A balance has thus to be found. (ICLE-FR)
(11) And a balance between the two orientations has to be found. (ICLE-FR)
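Raw counts such as these can be related to the per-million-word (pmw) figures quoted elsewhere in this section with a one-line normalisation. The Python sketch below is illustrative only: the function name `per_million` is hypothetical, and the corpus size is the approximate figure of 1.6 billion words given for frWaC in Section 3, so the resulting values are themselves approximate.

```python
# Approximate size of frWaC in words, as stated in Section 3 (an approximation).
FRWAC_SIZE = 1_600_000_000

def per_million(raw_count, corpus_size):
    # Relative frequency per million words (pmw).
    return raw_count * 1_000_000 / corpus_size

# Raw counts reported in the text for the two French variants.
modal_pmw = per_million(41, FRWAC_SIZE)   # un équilibre doit/devra être trouvé
be_to_pmw = per_million(23, FRWAC_SIZE)   # un équilibre est à trouver
ratio = 41 / 23                           # relative preference for the modal variant
```

Both variants are rare in absolute terms (well under 0.1 pmw under the assumed corpus size), so it is the ratio between them, roughly 1.8 to 1, that carries the argument rather than either frequency on its own.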
Similarly, the lexical bundle role to play is always introduced by the verb have
in ICLE-FR (Example (12)) and this larger word combination is congruent with
avoir un rôle à jouer, which is the most frequent lexical realisation of the French
phrases, i.e. nous sommes/serions tentés de and on est/serait tenté de (see below for
a discussion of EFL learners’ use of modal verbs). The pronouns nous and on are
very frequent in French academic writing. The first person plural pronoun nous
(“we”) is commonly used to involve the reader in the argument or guide them
through the research process. Such cases of inclusive we are often the subjects
of procedural verbs (nous avons procédé à, “we conducted”; nous avons repéré,
“we identified”) and metadiscursive verbs (e.g. nous aborderons, “we will discuss”;
nous montrerons, “we will show”) (Tutin 2010: 38). It may also be found when
an argumentative dimension is introduced with an opinion verb (e.g. penser,
“think”) or a verb of questioning (e.g. se demander, “wonder”). With these verbs,
however, the indefinite pronoun on is much more frequent,2 especially with the
modal verb pouvoir (e.g. on peut admettre, “we can admit”; on peut se demander,
“we may wonder”) (Tutin 2010: 23).
In the Scientext corpus, the verb parler (“speak”) is often used in introductory
phrases but it is actually found three times as often with the indefinite pronoun
on as with the personal pronoun subject nous (“we”) and is modified by pouvoir
(“can”) in 10% of the cases (Example (17)). In the CODIF, by contrast, the two
patterns are equally frequent and the more frequent use of nous may perhaps be
interpreted as a feature of novice writing. When compared to expert writers, for
example, French doctoral students have been reported to use more instances of
the first person plural subject pronoun nous in their published research articles (Fløttum
& Thue Vold 2010: 46).
(17) Dans ce cas, on peut parler d’ellipse métonymique. (Scientext)
(“In this case we can speak of metonymic ellipsis.”)
These findings help explain French EFL learners’ idiosyncratic use of the lexical
bundle speak of as an effect of their mother tongue. French learners often use the
verb with the first person plural pronoun we and a modal verb (cf. Example (18)),
a pattern that is not common in English academic writing (1.2 pmw in the aca-
demic component of the BNC).
(18) We cannot speak of a loss of national identity […] (ICLE-FR)
French EFL learners’ overuse of lexical bundles including modal verbs is the re-
sult of a highly complex interplay of factors. This may, to some extent, simply be a
feature of novice writing: both L1 and L2 English student writers are reported to
rely extensively on modal verbs to convey statements with an appropriate degree
of doubt and certainty (Hyland & Milton 1997). L2 learners, however, appear to
depend far more heavily on these devices (e.g. Dagneaux 1995, Granger & Rayson
1998, Aijmer 2002, McKenny 2010) and to have incomplete mastery of the Eng-
lish modal system (Thewissen 2013).
The difficulties EFL learners face in using modal verbs may be reinforced by
interlingual factors as previously reported in the literature for other learner popu-
lations. Neff et al. (2003: 216), for example, attribute Spanish and Italian EFL learn-
ers’ erroneous use of the modal verb can in an epistemic sense to a mapping of the
more hypothetical meaning of the Spanish modal verb poder and the Italian modal
verb potere into their L2 English. An unnecessary use of modal verbs may also
be associated with transfer of writing conventions from the L1. Neff et al. (2004)
explain Spanish learners’ overuse of we must by the fact that the Spanish modal
verb deber can mean either must or should and that debemos (“we should” or “we
must”) + reporting verb is often used as a way of adding a further proposition to be
considered by the reader (e.g. debemos tener en cuenta, “we should/must take into
account”; debemos recordar, “we should/must remember”; debemos reconocer, “we
should/must recognize”; debemos aceptar, “we should/must accept”).
The data analysed for this study contained more examples of transfer of writ-
ing conventions. One of the most striking is French EFL learners’ overuse of we
can say (Example (19)), a lexical bundle which is not frequent in English academ-
ic writing (0.2 pmw) but is a translational equivalent of both nous pouvons dire
and on peut dire in French (0.3 and 1.3 pmw in Scientext). These two phrases are,
among other things, used to introduce the outcome of reasoning or put forward
a conclusion in French academic writing (Example (20)) and on peut dire is even
more frequent in French for general purposes (6 pmw in frWaC).
(19) In conclusion we can say that the birth of an economic nation would be
favourable. (ICLE-FR)
(20) Dans cette optique, on peut dire qu’il existe des genres plus ou moins codi-
fiés…. (Scientext)
(“In this perspective, we can say that there are genres which are more or less
codified….”)
Similarly, the lexical bundle we may wonder is absent from the academic compo-
nent of the British National Corpus and the modal verb can is awkward in we can
wonder (Example (21)). However, both lexical bundles are used by French EFL
learners with the meaning and function of the French introductory phrase on
peut se demander (Example (22)).
(21) But we can wonder what a prison is and what its function is in our society.
(ICLE-FR)
(22) La question de la compositionnalité sémantique […] et on peut se demander si
elle présente un intérêt particulier pour le traitement automatique des langues.
(Scientext)
(“The question of semantic compositionality […] and we can wonder whether
it is of particular significance for automatic language processing.”)
Learners also use modal verbs of obligation and necessity more than L1 writ-
ers and tend to adopt a more direct and emphatic style of persuasion (Hinkel
2002: 110). However, the use of must, should and have to seems to vary widely
across different L1 learner populations and reflects at least partly cultural conven-
tions (Hinkel 1995). The lexical bundle we must not sets the French learners apart
from all the other learner groups except the Swedes. It is used in sequences such
as we must not be pessimistic, we must not forget, we must not lose sight of, and we
must not neglect, to “influence the reader by emotional appeal” (Ädel 2006: 78),
persuade them that certain events are desirable, and present the writer and the
reader as a team in ICLE-FR (Example (23)).
(23) But we must not forget that books used to be written for only a small part of
the total population. (ICLE-FR)
(24) Cependant, il ne faut pas oublier que les données recueillies auprès des sta-
giaires sont uniquement déclaratives. (Scientext)
The formally equivalent structure nous ne devons pas is not used in French aca-
demic writing; neither is the corresponding structure with indefinite on, i.e. on
ne doit pas. To express a negative obligation, French writers rather resort to the
impersonal structure il ne faut pas (Example (24)) but this discourse strategy is
more typical of general than of academic language (20 pmw in frWaC vs. 1.2
in Scientext). It seems quite probable that French EFL learners’ use of the bundle
we must not is an attempt at expressing negative obligation and translating il
ne faut pas, a pattern which is also more frequent in French texts produced by
novice writers than expert writers. Interestingly, the larger bundle we must not
forget that is the only sequence that is repeated in ICLE-FR (5 occ.) and il ne faut
pas oublier que is also the only lexical bundle that is used repeatedly in Scientext.
As illustrated in Example (25), French EFL learners also made use of structures
involving the modal verb should as functionally equivalent patterns to il ne faut
pas oublier.
(25) We should not forget that there are many sorts of criminals, ranging from the
accidental criminals and small fry to the hardened ones, the ones “beyond
redemption”. (ICLE-FR)
The remaining occurrences of the lexical bundle not forget that in ICLE-FR are
used with the first person plural imperative form let us, as are a majority of occurrences
of the bundle take the example. There is no lexically equivalent form to English
let us in French. Equivalence is however found at the morphological level as
French makes use of an inflectional suffix to mark the first person plural imperative
form. Paquot (2008) compares the use of let us in ICLE-FR with that of first
person plural imperative verbs in CODIF and finds that the rhetorical and or-
ganisational functions fulfilled by let us in French EFL learner writing can be
paralleled with the very frequent use of first person plural imperative verbs in
French student writing to organize discourse and interact with the reader (see
also Paquot 2010: 189–191). Imperative forms that are repeated in ICLE-FR of-
ten have translational equivalents that are found in CODIF (e.g. let us take the
example of, “prenons l’exemple de”; let us consider, “considérons”; let us hope,
“espérons”; let us examine, “examinons”; let us take, “prenons”; let us (not/never)
forget, “oublions/n’oublions pas que”; let us think, “pensons”). This generalized
overuse of the first person plural imperative in French EFL learner writing as a
rhetorical strategy does not conform to English academic writing conventions
but rather to French academic style.
Lastly, the use of the lexical bundle I would say is also idiosyncratic in ICLE-
FR. As shown in Example (26), the bundle is most often used in phraseological
‘cascades’, “collocational patterns which extend from a node to a collocate and
on again to another node (in other words, chains of shared collocates)” (Gledhill
2000: 212), with an adverbial phrase such as in conclusion or to conclude to intro-
duce a conclusion.
(26) In conclusion, I would say that television has actually replaced religion in our
western civilization. (ICLE-FR)
The French bundle je dirais appears in Scientext but it is not very frequent (0.38
pmw); the bundle, however, seems to be more typical of informal French and is
quite common in frWaC (5.3 pmw). The use of the first person pronoun je has
long been discouraged in French academic writing but it is used in disciplines
such as linguistics (cf. Fløttum 2003, Fløttum et al. 2006), where its use has in-
creased significantly between 1980 and 2000 in research articles (Gjesdal 2003,
quoted in Fløttum et al. 2006: 115). The lexical bundle je dirais is not found in
CODIF but it does occur with a relative frequency of 10 per 10,000 words in
the Corpus d’Apprenants du Français Langue Maternelle (CAFLaM), i.e. a newly
compiled corpus of argumentative texts produced by French-speaking first year
university students (Bolly 2008). Example (27) shows that EFL learners’ use of
longer sequences and phraseological cascades may also be transfer-related as the
bundle je dirais is also often introduced by discourse markers such as en conclu-
sion (“in conclusion”) or pour conclure (“to conclude”) in French-speaking novice
writing.
(27) En conclusion, je dirais qu’il existe un équilibre à trouver entre conformisme
et différence. (CAFLaM)
(“In conclusion, I would say that a balance should be found between conform-
ism and difference.”)
It may thus be argued that French EFL learners’ use of I would say is the result of
a combination of L1-related factors, i.e. the relative and increasing tolerance of je
in the discipline they are studying, the high frequency of je dirais in general lan-
guage, and French-speaking novice writers’ reliance on phraseological cascades
including je dirais to conclude their argumentative essays.
7. Conclusion
Transfer effects on French learners’ use of 3-word sequences with lexical verbs
do not seem to generate obvious errors, at least at the intermediate to advanced
proficiency levels represented in the French component of the International
Corpus of Learner English. Rather, they are more visible in the learners’ selec-
tion of unmarked word combinations whose translational equivalents are deep-
ly entrenched in French speakers’ mental lexicon because these sequences are
The transfer effects identified in this study are thus best described as “transfer of
primings” (Hoey 2005: 183). Mental primings for (at least frequent or core) L1
words and word strings are most probably superimposed on the primings for
their translation equivalent forms in the foreign language.
The direct pedagogical implication is that EFL teaching needs to counter the
default and sometimes misleading L1-related primings in EFL learners’ mental
lexicons. Awareness-raising activities focusing on similarities and differences
between the mother tongue and the foreign language are clearly needed. They
should not be restricted to “helping learners focus on errors typically committed
by learners from a particular L1” (Hegelheimer & Fisher 2006: 259) but should
also raise learners’ awareness of more subtle differences such as the collocational
preferences and distributional properties of similar words in the two languages.
This recommendation stands in sharp contrast to Bahns’s (1993: 56) claim that
collocations which are direct translation equivalents do not need to be taught.
Learners have no way of knowing which collocations are congruent in the mother
tongue and the foreign language; moreover, the differences between the colloca-
tions in L1 and L2 may lie in aspects of use rather than form or meaning.
Primings are also sensitive to the textual, generic and social contexts in
which a lexical item is encountered. Hoey (2005: 10) illustrates this with the word
research, which is primed in the mind of academic language users to occur with
recent in academic discourse and news reports of research but is not primed to
occur in other text types or other contexts. A direct implication of Hoey’s theory
of lexical priming is that academic-like word combinations in the first language
cannot be assumed to be primed in the mental lexicon of novice native writers
who may have had little contact with academic texts in their L1. While many
of the French lexical bundles examined here proved to be relatively frequent in
French academic writing, some of them are indeed primed more strongly in gen-
eral language. This is particularly true of two sequences, i.e. on peut dire and il
ne faut pas, and calls for a more systematic deconstruction of the concept of L1
frequency in future research.
Many learner corpus-based studies, however, have fallen into the trap of
claiming L1 influence on the basis that the structure exists in the first language
without further investigation of L1 empirical data. In Douglas’s (2001: 451) words,
“the point here is not that these methods are faulty or that the interpretations are
invalid, but only that little or no evidence is provided for either quality [reliabil-
ity and validity]”. As shown in this study, formal similarity between L1 and L2
word combinations does not necessarily make the word combination in the first
language a strong candidate for transfer into the foreign language. Other factors
intervene and L1 frequency proved to contribute to transferability in a signifi-
cant way. The impact of L1 frequency is most apparent when different languages
are compared with the help of corpus data. As a consequence, this study also
lends support to the detection-based approach to transfer first outlined in Jarvis
(2010). The method is based on the premises that it is possible to identify the first
language of a learner on the basis of their use of specific features of the target
language and that these idiosyncrasies can serve as useful indicators of cross-
linguistic influence (Jarvis 2012).
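The logic of the detection-based approach can be illustrated with a minimal
Python sketch. It is not the classifier used in the work cited: a simple
nearest-centroid rule is swapped in for brevity, the function names
(bundle_profile, centroid, predict_l1) are hypothetical, tokenisation is
naive, and the training texts are invented toy data rather than learner
corpus material.

```python
import math


def bundle_profile(text, bundles):
    """Frequency of each bundle per 1,000 words (naive whitespace tokenisation)."""
    tokens = text.lower().split()
    trigrams = [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
    total = max(len(tokens), 1)
    return [1000 * trigrams.count(b) / total for b in bundles]


def centroid(profiles):
    """Mean profile of a group of texts (one value per bundle)."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def predict_l1(text, centroids, bundles):
    """Assign the text to the L1 group whose centroid is closest (Euclidean)."""
    profile = bundle_profile(text, bundles)
    return min(centroids, key=lambda l1: math.dist(profile, centroids[l1]))


# Invented toy data, for illustration only (assumption, not from the study).
bundles = ["i would say", "it is true"]
training = {
    "French": ["i would say that it matters", "so i would say no"],
    "Other": ["it is true that it matters", "well it is true"],
}
centroids = {
    l1: centroid([bundle_profile(t, bundles) for t in texts])
    for l1, texts in training.items()
}
guess = predict_l1("in conclusion i would say yes", centroids, bundles)
```

If such a classifier identifies learners' L1 well above chance, the features
it relies on become candidates for closer inspection as possible transfer
effects; the classification itself does not establish L1 influence.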
Transfer effects were indeed pinpointed for twenty 3-word lexical bundles
which were further analysed as part of fifteen longer strings. This represents 7.3%
of all the 3-word sequences that appear at least 5 times in the French learner corpus
and c. 60% of the bundles that set the French learners apart from at least 5 other
learner populations. These figures are already quite high but they certainly under-
estimate the impact of the first language. The criterion according to which French
learners’ use of a given lexical bundle has to differ from that of at least five other
learner groups is very conservative. L1 influence may be obscured when the effects
of the mother tongue of different L1 learner populations coincide to produce the
same IL behaviour and this is certainly not a rare phenomenon (Jarvis 2000).
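The two thresholds mentioned above (at least 5 occurrences in the learner
corpus, and a difference from at least five other learner groups) can be
made concrete with a minimal Python sketch. It is not the procedure used in
the study: the tokenisation is deliberately naive, the function names are
hypothetical, the toy corpus is invented, and the comparison with other
learner groups is reduced to a list of booleans standing in for the outcome
of the statistical tests.

```python
from collections import Counter
import re


def three_word_sequences(text):
    """Return all contiguous 3-word sequences in a text (naive tokenisation)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def recurrent_bundles(texts, min_freq=5):
    """Keep the 3-word sequences that occur at least `min_freq` times overall."""
    counts = Counter()
    for text in texts:
        counts.update(three_word_sequences(text))
    return {seq: n for seq, n in counts.items() if n >= min_freq}


def sets_group_apart(differs_from_group, min_groups=5):
    """True if a bundle differs from at least `min_groups` other learner groups.

    `differs_from_group` holds one boolean per comparison group; each value
    stands in for the result of a significance test (an assumption here).
    """
    return sum(differs_from_group) >= min_groups


# Invented toy corpus, for illustration only.
toy_texts = ["in conclusion i would say that it is true"] * 5
bundles = recurrent_bundles(toy_texts, min_freq=5)
```

Because sets_group_apart only counts the groups from which the French
learners differ, a bundle that several L1 groups overuse for their own
L1-related reasons can fall below the threshold, which is why the criterion
is described as conservative.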
More generally, the study has also brought to light the considerable potential
of a corpus-driven approach to track L1 influence on learner language. Transfer
studies have often investigated “bits and pieces of learners’ language chosen for
analysis because they caught the researcher’s eye, seemed to exhibit some
systematicity, confirmed some intuition one had about SLA, or had been found
interesting in L1 acquisition” (Lightbown 1984: 245). As De Cock (2004: 227)
puts it, the lexical bundle approach represents “corpus linguistic methodology
at its most heuristic, i.e. as a raw discovery procedure”. Coupled with
Jarvis’s (2000) framework and appropriate statistical tests, it proved most
useful for extracting, fully automatically, a number of word combinations that
deserved further analysis and, consequently, for identifying transfer effects
that until now have been little documented in the SLA literature. Lexical
transfer has too often been narrowed
down to transfer of form/meaning mappings and the third aspect of word knowl-
edge, i.e. use, has rarely been investigated in all its complexity. Further research is
clearly needed. Lexical bundles of different sizes, and bundles built around
word classes other than verbs, should be a promising place to start.
Notes
* I would like to thank Sylviane Granger, Victoria Hasko and two anonymous reviewers for
their valuable comments and constructive suggestions for improvement. I acknowledge the
financial support of the Fonds de la Recherche Scientifique (FNRS).
1. The use of parametric tests may be criticized as the data used in this study is not normally
distributed. According to Howell (1997), proponents of parametric tests
“argue, however, that the assumptions normally cited as being required of parametric tests are
overly restrictive in practice and that the parametric tests are remarkably unaffected by viola-
tions of distribution assumptions” (Howell 1997: 646, see also Rietveld et al. 2004: 360). More-
over, parametric tests are said to be more powerful than non-parametric tests: they require
fewer observations than do non-parametric tests and are more likely “to lead to rejection of a
false null hypothesis” (Howell 1997: 646) than are their corresponding non-parametric tests.
This advantage seems to be maintained “even when the distribution assumptions are violated
to a moderate degree” (ibid).
2. The French indefinite pronoun on is much more frequent and stylistically very different
from the English one: it can refer to one or more people, be substituted for all personal pro-
nouns and “has an unclear enunciative status (i.e. relation to speaker or locator and receiver)”
(Fløttum et al. 2006: 113).
3. Paquot (2008) has also shown that the distribution of let us in the interlanguage of French,
Spanish and Dutch learners parallels that of first person plural imperative structures in the
three languages.
References
Gjesdal, A. M. 2003. L’emploi du Pronom ‘on’ dans les Articles de Recherche. Une étude diachro-
nique et qualitative. MA dissertation. Bergen: University of Bergen.
Gledhill, C. 2000. Collocations in Science Writing. Tuebingen: Gunter Narr Verlag.
Granger, S. 1998. “Prefabricated patterns in advanced EFL writing: Collocations and formulae”.
In A. Cowie (Ed.), Phraseology: Theory, Analysis and Applications. Oxford: Oxford Univer-
sity Press, 145–160.
Granger, S., Dagneaux, E. & Meunier, F. 2002. The International Corpus of Learner English.
Handbook and CD-ROM. Louvain-la-Neuve: Presses Universitaires de Louvain.
Granger, S., Dagneaux, E., Meunier, F. & Paquot, M. 2009. The International Corpus of Learner
English. Handbook and CD-ROM (Version 2). Louvain-la-Neuve: Presses Universitaires
de Louvain.
Granger, S. & Rayson, P. 1998. “Automatic profiling of learner texts”. In S. Granger (Ed.), Learn-
er English on Computer. London/New York: Addison Wesley Longman, 119–131.
Groom, N. 2009. “Effects of second language immersion on second language collocational
development”. In A. Barfield & H. Gyllstad (Eds.), Researching Collocations in Another
Language. Basingstoke: Palgrave Macmillan, 21–33.
Hegelheimer, V. & Fisher, D. 2006. “Grammar, writing, and technology: A sample technol-
ogy-supported approach to teaching grammar and improving writing for ESL learners”.
CALICO Journal, 23 (2), 257–279.
Hinkel, E. 1995. “The use of modal verbs as a reflection of cultural values”. TESOL Quarterly,
29 (2), 325–343.
Hinkel, E. 2002. Second Language Writers’ Text: Linguistic and Rhetorical Features. London:
Lawrence Erlbaum Associates.
Hoey, M. 2005. Lexical Priming: A New Theory of Words and Language. London/New York:
Routledge.
Howell, D. 1997. Statistical Methods for Psychology. Belmont: Wadsworth.
Hyland, K. & Milton, J. 1997. “Qualifications and certainty in L1 and L2 students’ writing”.
Journal of Second Language Writing, 6 (2), 183–205.
Jarvis, S. 2000. “Methodological rigor in the study of transfer: Identifying L1 influence in the
interlanguage lexicon”. Language Learning, 50 (2), 245–309.
Jarvis, S. 2010. “Comparison-based and detection-based approaches to transfer research”.
In L. Roberts, M. Howard, M. Ó Laoire & D. Singleton (Eds.), EUROSLA Yearbook 10.
Amsterdam: John Benjamins, 169–192.
Jarvis, S. 2012. “The detection-based approach: An overview”. In S. Jarvis & S. Crossley (Eds.),
Approaching Language Transfer through Text Classification: Explorations in the Detection-
Based Approach. Bristol: Multilingual Matters, 1–33.
Juknevičienė, R. 2009. “Lexical bundles in learner language: Lithuanian learners vs. native
speakers”. Kalbotyra, 61 (3), 61–72.
Kellerman, E. 1978. “Giving learners a break: Native language intuitions as a source of predic-
tions about transferability”. Working Papers on Bilingualism, 15, 59–92.
Kilgarriff, A. & Kosem, I. 2012. “Corpus tools for lexicographers”. In S. Granger & M. Paquot
(Eds.), Electronic Lexicography. Oxford: Oxford University Press, 31–56.
Kroll, B. 1990. “What does time buy? ESL student performance on home vs. class composi-
tions”. In B. Kroll (Ed.), Second Language Writing. Cambridge: Cambridge University
Press, 140–154.
Tutin, A. 2010. “Dans cet article, nous souhaitons montrer que… Lexique verbal et position-
nement de l’auteur dans les articles en sciences humaines. Enonciation et rhétorique dans
l’écrit scientifique”. Lidil, Revue de Linguistique et de didactique des langues, 41, 15–40.
Available at: http://lidil.revues.org/index3040.html (accessed June 2013).
Author’s address
Magali Paquot
Centre for English Corpus Linguistics
Institut Langage et Communication
Université catholique de Louvain
Place Blaise Pascal 1, bte L3.03.31
1348, Louvain-la-Neuve
Belgium
magali.paquot@uclouvain.be