CL2015 AbstractBook PDF
Abstract Book
Edited by
Federica Formato and Andrew Hardie
Lancaster: UCREL
Table of contents
Plenaries
When an uptight register lets its hair down: The historical development of grammatical complexity
features in specialist academic writing
Douglas Biber
Exploring the interface of language and literature from a corpus linguistic point of view?
Michaela Mahlberg
Papers
Semantic tagging and Early Modern collocates
Marc Alexander; Alistair Baron; Fraser Dallachy; Scott Piao; Paul Rayson; Stephen Wattam
Does Corpus Size Matter? Exploring the potential of a small simplified corpus in improving
language learners' writing quality
Wael Hamed Alharbi
Muslim and Christian attitudes towards each other in southwest Nigeria: using corpus tools to
explore language use in ethnographic surveys
Clyde Ancarno; Insa Nolte
Tracing verbal aggression over time, using the Historical Thesaurus of English
Dawn Archer; Beth Malory
Longest-commonest match
Vít Baisa; Adam Kilgarriff; Pavel Rychlý; Miloš Jakubíček
All the news that's fit to share: Investigating the language of most shared news stories
Monika Bednarek; James Curran; Tim Dwyer; Fiona Martin; Joel Nothman
Tagging and searching the bilingual public notices from 19th-century Luxembourg
Rahel Beyer
Panel: A linguistic taxonomy of registers on the searchable web: Distribution, linguistic descriptions,
and automatic register identification
Doug Biber; Jesse Egbert; Mark Davies
Forward-looking statements in CSR reports: a comparative analysis of reports in English, Italian and
Chinese
Marina Bondi; Yu Danni
Depictions of strikes as battle and war in articles in, and comments to, a South African online
newspaper, with particular reference to the period following the Marikana massacre of Aug. 2012
Richard Bowker; Sally Hunt
Situating academic discourses within broader discursive domains: the case of legal academic writing
Ruth Breeze
A corpus analysis of discursive constructions of the Sunflower Student Movement in the English
language Taiwanese press
Andrew Brindle
An examination of learner success in UCLanESB's B1 and C1 speaking exams in accordance with the
Common European Framework of Reference for Languages.
Shelley Byrne
I crave the indulgence of a discriminating public to a Work: effective interaction between female
authors and their readership in Late Modern scientific prefaces and works
Begoña Crespo
Using corpora in the field of Augmentative and Alternative Communication (AAC) to provide visual
representations of vocabulary use by non-speaking individuals
Russell Cross
Changing Climates: a cross-country comparative analysis of discourses around climate change in the
news media
Carmen Dayrell; John Urry; Marcus Müller; Caimotto Maria Cristina; Tony McEnery
The politics of please in British and American English: a corpus pragmatics approach
Rachele De Felice; M. Lynne Murphy
Designing and implementing a multilayer annotation system for (dis)fluency features in learner and
native corpora
Amandine Dumont
Explaining Delta, or: How do distance measures for authorship attribution work?
Stefan Evert; Thomas Proisl; Christof Schöch; Fotis Jannidis; Steffen Pielström; Thorsten Vitt
Institutional sexism and sexism in institutions: the case of Ministra and Ministro in Italy
Federica Formato
Learners' use of modal verbs with the extrinsic meanings possibility and prediction
Kazuko Fujimoto
Analysing the RIP corpus: the surprising phraseology of Irish online death notices
Federico Gaspari
A golden keyword can open any corpus: theoretical and methodological issues in keyword extraction
Federico Gaspari; Marco Venuti
The methodological explanation of synerging CL and SFL in (critical) Discourse studies: A case study
of the discursive representation of Chinese dream
Hang Su
Twitter rape threats and the discourse of online misogyny (DOOM): From discourses to networks
Claire Hardaker; Mark McGlashan
A text analysis by the use of frequent multi-word sequences: D. H. Lawrence's Lady Chatterley's
Lover
Reiko Ikeo
A phraseological approach to the shift from the were-subjunctive to the was-subjunctive: Examples of
as it were and as it was
Ai Inoue
Examining Malaysian Sports News Discourse: A Corpus-Based Study of Gendered Key Words
Habibah Ismail
Doing well by talking good? Corpus Linguistic Analysis of Corporate Social Responsibility (CSR)
Sylvia Jaworska; Anupam Nanda
Can you give me a few pointers? Helping learners notice and understand tendencies of words and
phrases to occur in specific kinds of environment.
Stephen Jeaco
Panel: Researching small and specialised Corpora in the age of big data
Alison Johnson
Julian Barnes' The Sense of an Ending and its Italian translation: a corpus stylistics comparison
Jane Helen Johnson
All our items are pre-owned and may have musty odor: A corpus linguistic analysis of item
descriptions on eBay
Andrew Kehoe; Matt Gee
The Asian Corpus of English (ACE): Suggestions for ELT Policy and Pedagogy
Andy Kirkpatrick; Wang Lixun
Tweet all about it: Public views on the UN's HeForShe campaign
Róisín Knight
Doing the naughty or having it done to you: agent roles in erotic writing
Alon Lischinsky
Increasing speed and consistency of phonetic transcription of spoken corpora using ASR technology
David Lukes
Quite + ADJ seen through its translation equivalents: A contrastive corpus-based study
Michaela Martinkova
Twitter rape threats and the Discourse of Online Misogyny (DOOM): using corpus-assisted
community analysis (COCOA) to detect abusive online discourse communities
Mark McGlashan; Claire Hardaker
A corpus-based investigation of Techno-Optimism in the U.S. National Intelligence Council's Global
Trends Reports
Jamie McKeown
Competition between accuracy and complexity in the L2 development of the English article system: A
learner corpus study
Akira Murakami
Should I say hearing-impaired or d/Deaf? A corpus analysis of divergent discourses representing the
d/Deaf population in America
Lindsay Nickels
Designing English Teaching Activities Based On Popular Music Lyrics From A Corpus Perspective
Maria Claudia Nunes Delfino
Some methodological considerations when using an MD-CADS approach to track changes in social
attitudes towards sexuality over time: The case of sex education manuals for British teenagers,
1950-2014
Lee Oakley
Citizens and migrants: the representation of immigrants in the UK primary legislation and
administration information texts (2007-2011)
Pascual Pérez-Paredes
Using Wmatrix to classify open response survey data in the social sciences: observations and
recommendations
Gill Philip; Lorna J. Philip; Alistair E. Philip
Integrating Corpus Linguistics and GIS for the Study of Environmental Discourse
Robert Poole
A corpus-based discourse analytical approach to analysing frequency and impact of deviations from
formulaic legal language by the ICTY
Amanda Potts
Recycling and replacement as self repair strategies in Chinese and English conversations
Lihong Quan
Linguistic features, L1, and assignment type: What's the relation to writing quality?
Randi Reppen; Shelley Staples
Investigating the Great Complement Shift: a case study with data from COHA
Juhani Rudanko
Developing ELT coursebooks with corpora: the case of Sistema Mackenzie de Ensino
Andrea Santos
The notion of Europe in German, French and British election manifestos. A corpus linguistic
approach to political discourses on Europe since 1979
Ronny Scholz
Life-forms, Language and Links: Corpus evidence of the associations made in discourse about
animals
Alison Sealey
Teaching Near-Synonyms More Effectively -- A case study of happy words in Mandarin Chinese
Juan Shao
Tracing changes of political discourse: the case of seongjang (growth) and bokji (welfare) in South
Korean newspapers
Seoin Shin
Analyzing the conjunctive relations in the Turkish and English pedagogical texts: A Hallidayan
approach
Meliha R. Simsek
A corpus based discourse analysis of representations of mental illness and mental health in the British
Press
Gillian Smith
Do you like him? I don't dislike him. Stance expression and hedging strategies in female
characters of Downton Abbey. A case study.
Anna Stermieri; Cecilia Lazzeretti
Relative clause constructions as criterial features for the CEFR levels: Comparing oral/written
learner corpora vs. textbook corpora
Yuka Takahashi; Yukio Tono
Linguistic feature extraction and evaluation using machine learning to identify criterial grammar
constructions for the CEFR levels
Yukio Tono
Size isn't everything: Rediscovering the individual in corpus-based forensic authorship attribution
David Wright
Illuminating President Obama's argumentation for sustaining the Status Quo, 2009-2012
Rachel Wyman
Nativeness or expertise: Native and non-native novice writers' use of formulaic sequences
Nicole Ziegler
Posters
The development of an Arabic corpus-informed list of formulaic sequences for language pedagogy
Ayman Alghamdi
A Review of Semantic Search Methods To Retrieve Knowledge From The Quran Corpus
Mohammad Alqahtani; Eric Atwell
A contrastive analysis of Spanish-Arabic hedges and boosters use in persuasive academic writing
Anastasiia Andrusenko
Fit for lexicography? Extracting Italian Word Combinations from traditional and web corpora
Sara Castagnoli; Francesca Masini; Malvina Nissim
Semantic relation annotation for biomedical text mining based on recursive directed graph
Bo Chen; Chen Lyu; Xioaohui Liang
Have you developed your entrepreneurial skills? Looking back to the development of a skills-oriented
Higher Education
Maria Fotiadou
The comparative study of the image of national minorities living in Central Europe
Milena Hebal-Jezierska
A resource for the diachronic study of scientific English: Introducing the Royal Society Corpus
Ashraf Khamis; Stefania Degaetano-Ortlieb; Hannah Kermes; Jörg Knappen; Noam Ordan; Elke Teich
Structuring a CMC corpus of political tweets in TEI: corpus features, ethics and workflow
Julien Longhi; Ciara R. Wigham
Patterns of parliamentary discourse during critical events: the example of anti-terrorist legislation
Rebecca Mckee
Textual patterns and fictional worlds: Comparing the linguistic depiction of the African natives in
Heart of Darkness and in two Italian translations
Lorenzo Mastropierro
Gender and e-recruitment: a comparative analysis between job adverts published for the German
and Italian labour markets
Chiara Nardone
Media reverberations on the Red Line: Syria, Metaphor and Narrative in the news
Ben O'Loughlin; Federica Ferrari
Mono-collocates: How fixed Multi-Word Units with OF or TO indicate diversity of use in different
corpora
Michael TL Pace-Sigge
Streamlining corpus-linguistics in Higher and adult education: the TELL-OP strategic partnership
Pascual Pérez-Paredes
Multi-functionality and syntactic position of discourse markers in political conversations: The case of
you know, then and so in English and yan in Arabic.
Ben Chikh Saliha
Descriptive ethics on social media from the perspective of ideology as defined within systemic
functional linguistics
Ramona Statache; Svenja Adolphs; Christopher James Carter; Ansgar Koene; Derek Mcauley; Claire
O'Malley; Elvira Perez; Tom Rodden
Contrastive Analysis " the Relative clauses based on Parallel corpus of Japanese and English"
Kazuko Tanabe
Synthetism and analytism in the Celtic languages: Applying some newer typological indicators based
on rank-frequency statistics
Andrew Wilson; Róisín Knight
Automatic Analysis and Modelling for Dialogue Translation Based on Parallel Corpus
Xiaojun Zhang; Longyue Wang; Qun Liu
Absence of Prepositions in Time Adverbials: Comparison of '*day' tokens in Brown and LOB
corpora
Shunji Yamazaki
Plenaries
Douglas Biber
Northern Arizona University
douglas.biber@nau.edu
Sylviane Granger
Université catholique de Louvain
sylviane.granger@uclouvain.be
References
Crossley, S. & Salsbury, T.L. (2011). The development of
lexical bundle accuracy and production in English
second language speakers. IRAL - International Review
of Applied Linguistics in Language Teaching 49(1), 126.
Granger, S. & Bestgen, Y. (2014). The use of collocations
by intermediate vs. advanced non-native writers: A
bigram-based study. IRAL - International Review of
Applied Linguistics in Language Teaching 52(3), 2292
252.
Granger, S. & Paquot, M. (2010). Customising a general
EAP dictionary to meet learner needs. In Granger, S. &
Paquot, M. (eds.) eLexicography in the 21st century:
New
challenges,
new
applications.
Presses
universitaires de Louvain: Louvain-la-Neuve, 87-96.
Granger, S. & Paquot, M. (forthcoming). Electronic
lexicography goes local. Design and structures of a
needs-driven
online
academic
writing
aid.
Lexicographica
Leech, G. (1998). Preface: Learner corpora: what they are
and what can be done with them. In Granger, S. (ed.)
Learner English on Computer. Addison Wesley
Longman: London & New York.
Paquot, M. & Granger, S. (2012). Formulaic language in
learner corpora. Annual Review of Applied Linguistics,
32, 130-149.
Sinclair J. (2004). Trust the Text Language, corpus and
discourse. London: Routledge.
Acknowledgements
Parts of the presentation are derived from research
done for the CLiC Dickens project which is
supported by the UK Arts and Humanities Research
Council Grant Reference AH/K005146/1.
Reading
Alan Partington
University of Bologna
alanscott.partington@unibo.it
* serendipity: "The faculty of making happy and
unexpected discoveries by accident" (OED), i.e.
finding out things you didn't even know you
were searching for: e.g. "[he] warned that readers
were in danger of losing the serendipity of
finding a book they did not know they wanted
because of the growth in online book sales"
(SiBol 13).
(Non)obviousness
Non-obviousness in CADS
References
Baker, P., C. Gabrielatos and A. McEnery. 2013.
Discourse Analysis and Media Attitudes: The
Representation of Islam in the British Press.
Cambridge: Cambridge University Press.
Biber, D., S. Johansson, G. Leech, S. Conrad and E.
Finegan. 1999. Longman Grammar of Spoken and
Written English. London: Longman.
Duguid, A. 2007. Men at work: how those at Number 10
construct their working identity. In Discourse,
Ideology and Specialized Communication, G. Garzone
& S. Sarangi (eds), 453-484. Bern: Peter Lang.
Friginal, E. and J. Hardy
2014. Corpus-based
Sociolinguistics. New York: Routledge.
Honderich, T. 2005. The Oxford Companion
Philosophy. Oxford: Oxford University Press.
to
Papers
Semantic tagging and Early Modern collocates
Marc Alexander
University of Glasgow
marc.alexander@glasgow.ac.uk
Alistair Baron
Lancaster University
a.baron@lancaster.ac.uk
Fraser Dallachy
University of Glasgow
fraser.dallachy@glasgow.ac.uk
Scott Piao
Lancaster University
s.piao@lancaster.ac.uk
Paul Rayson
Lancaster University
p.rayson@lancaster.ac.uk
Stephen Wattam
Lancaster University
s.wattam@lancaster.ac.uk
Introduction
The corpus
http://www.gla.ac.uk/samuels/
http://ucrel.lancs.ac.uk/vard/
http://ucrel.lancs.ac.uk/wmatrix/
http://corpus.byu.edu/
Semantic annotation
Methodology
http://ucrel.lancs.ac.uk/usas/
Conclusion
http://www.gla.ac.uk/metaphor/
Acknowledgements
Wael Alharbi
Yanbu University College
whmalh@gmail.com
Introduction
Background
Research questions:
RQ1: What are the attitudes of students towards
the corpus? Do they change over the three phases?
RQ2: Do the quantity and success of the queries
change over time?
Methods
www.learningenglish.voanews.com
Conclusion
Introduction
But the perspectives of designers and
commissioning organisations are only part of the
story. The other parts involve users who interact
with the visualisation. During the course of the focus
groups, two issues emerged which have implications
for the ways that corpora are effectively visualised.
The first centred around user intentions, or the
expectations that users had for the visualisation.
What motivation did they have to interact with this
visualisation? In some cases, it was professional or
research interest: several participants were working
in media or communications, for example. Others
had personal experience of being a migrant in the
UK and connected with the subject matter. These
kinds of intentions and potential audiences are
important for linguists to consider as they visualise
their analysis because they can impact how the
subsequent presentation is interpreted.
Secondly, participants raised the issue of
(dis)trust, especially with this particular visualisation
of media texts. It was apparent that the political
importance of immigration, particularly in some of
the regions that had experienced recent migration,
impacted the reception of the visualisation and the
underlying corpus methods. For example, despite the
presence of explanatory text about the methods used,
as well as their comprehensiveness and limitations
(Allen 2014), some participants expressed
scepticism over the intentions of the visualisation:
the fact it was a corpus of media outputs suggested
to some that it was automatically politically biased
and motivated to counter negative portrayals. Yet
this was not universally felt: others pointed out that
features like the breadth of data and academic
branding communicated a sense of trust.
These issues of user intentions and trust
exemplify how reception of visualisations is affected
Conclusion
Acknowledgements
The author would like to acknowledge the collective
contributions of the Seeing Data team and advisory
board in the development, fieldwork, and analysis
stages.
References
Allen, W. (2014). Does Comprehensiveness Matter?
Reflections on Analysing and Visualising UK Press
Portrayals of Migrant Groups. Paper presented at the
Computation + Journalism Symposium, Columbia
University, New York City.
Allen, W., & Blinder, S. (2013). Migration in the News:
Portrayals of Immigrants, Migrants, Asylum Seekers
and Refugees in National British Newspapers, 2010 to
2012. Migration Observatory Report. University of
Oxford: COMPAS.
Di Cristofaro, M. (2013). Visualizing Chunking and
Collocational Networks: A Graphical Visualization of
Words Networks. Paper presented at the Corpus
Linguistics Conference 2013, Lancaster University.
Gabrielatos, C., & Baker, P. (2008). Fleeing, Sneaking,
Flooding: A Corpus Analysis of Discursive
Constructions of Refugees and Asylum Seekers in the
UK Press, 1996-2005. Journal of English Linguistics,
36(1), 5-38.
Kirk, A. (2012). Data Visualization: A Successful Design
Process. Packt Publishing Ltd.
Kirk, A. (2014). Recognising the Intent of a Visualisation.
http://seeingdata.org/getting-gist-just-enough/
Abdulrahman AlOsaimy
University of Leeds
scama@leeds.ac.uk
Eric Atwell
University of Leeds
e.s.atwell@leeds.ac.uk
Introduction
Work
Feature   Possible Values                              Applied to
Gender    Male/Female                                  Nominals & Subj. of verb
Number    Sing./Dual/Plural                            Nominals & Subj. of verb
Case      nominative, accusative, genitive             Nominals
state     Definite or Not                              Nominals
Person    First, Second, Third                         Verbs
voice     active, passive                              Verbs
aspect    perfective, imperative, imperfective         Verbs
mood      indicative, subjunctive, jussive, energetic  imperfective verbs
Challenges
Introductions in Engineering Lectures
Siân Alsop
Coventry University
alsops@uni.coventry.ac.uk
Hilary Nesi
Coventry University
hilary.nesi@coventry.ac.uk
Figure 1: A visualisation of the occurrence and duration of selected discourse functions in the ELC
Clyde Ancarno
clyde.ancarno@kcl.ac.uk
Insa Nolte
University of Birmingham
m.i.nolte@bham.ac.uk
Laurence Anthony
Waseda University
anthony@waseda.jp
Paul Baker
Lancaster University
j.p.baker@lancaster.ac.uk
Introduction
Validation experiments
LL threshold (0.001)
Rank  Key Types  Key Tokens
1     Islam      Islam
2     Islam      Islam
3     Islam      Islam
4     Islam      Islam
5     Islam      Islam
6     Islam      Islam
7     Islam      Football
8     Islam      Obituary
9     Islam      Islam
10    Football   Islam
11    Obituary   Islam
12    Islam      Football
13    Review     Science
14    Football   Review
15    Science    Islam
16    Tennis     Tennis
17    Football   Football
18    Football   Art
19    Football   Football
20    Art        Football
Table I: ProtAnt analysis of newspaper articles
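The heading "LL threshold (0.001)" in Table I suggests that key words were selected with a log-likelihood test at a significance level of p < 0.001. The calculation itself is not shown here, so the following is only a minimal sketch, assuming the widely used two-cell form of the log-likelihood keyness statistic and the chi-square critical value of 10.83 for one degree of freedom. The function names and the example frequencies are hypothetical.

```python
import math

# Chi-square critical value for 1 degree of freedom at p < 0.001 (assumed cut-off).
CRITICAL_P_001 = 10.83


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-cell log-likelihood for one word: target corpus A vs reference corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_key(freq_a, freq_b, size_a, size_b, critical=CRITICAL_P_001):
    """True if the word passes the threshold and is relatively more frequent in A."""
    overused = freq_a / size_a > freq_b / size_b
    return overused and log_likelihood(freq_a, freq_b, size_a, size_b) >= critical


# Hypothetical counts: a word occurring 100 times in a 10,000-token text
# and 50 times in a 10,000-token reference corpus.
print(round(log_likelihood(100, 50, 10_000, 10_000), 2))
print(is_key(100, 50, 10_000, 10_000))
```

Words that pass such a threshold would count as key for a text; how ProtAnt then ranks texts by their key types or key tokens is not specified in the surviving fragment.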
The second experiment was designed to see if
ProtAnt was able to correctly identify prototypical
texts in a small corpus of longer novels. Following a
similar design to that used in experiment 1, 10
versions of the novel Dracula were compared
against five versions of the novel Frankenstein, and
5 other randomly selected novels. Again, results
revealed that the ProtAnt analysis could rank almost
Conclusion
Dawn Archer
dearcher@uclan.ac.uk
Bethan Malory
University of Central Lancashire
bmccarthy2@uclan.ac.uk
A. Eda Özkan
Mersin University
aedaozkan@mersin.edu.tr
Bülent Özkan
Mersin University
ozkanbulent@mersin.edu.tr
Introduction
Population Sample
Novel
96
Poem
68
7*
Tale
49
4*
Essay-Critics
44
3*
Theatre
Memoir
Research
Conversation-InterviewArticle
Humour
35
21
20
1*
18
1*
10
1*
Letter
1*
Biography
Diary
Various Types
30
TOTAL
*Anthological works
14
403
421
2
3
4
Life
Culture-ArtHealth
Essay
Economy-Finance,
World-Live,
Weather Forecast ,
Sports
Technology,
Education, Tabloid
Press
Health, Book,
Cinema, Theatre
Column
%10
%10
%20
100
100
Research Questions
%
22,80
2
16,15
2
11,63
8
10,45
1
8,313
4,988
4,750 %6
0
4,275
3,325
2,375
0,950
0,950
0,237
7,125
18*
100
Adjectives                      f
amansz                          436
azgn                            371
ayakl                           359
ahap                            336
anlalmaz                        307
adl                             303
asrlk                           285
ayn                             278
ar, ateli                       272
altm                            258
apayr                           244
alayc                           243
ailevi                          241
alt                             239
ayrcalkl                        233
ak                              217
alayl                           208
altn                            193
avantajl                        185
agresif                         182
ahlakl                          181
arka                            180
aslsz                           177
ackl, alelade                   175
arkeolojik, artistik, atl       170
alt                             163
anlk                            162
acayip, adaletsiz               160
Afgan, arsz, asl, azl           157
akli, allm, ani, aylk           154
ayr                             153
akll, aptal                     151
art                             150
adaletli                        145
ar                              143
arbal, analitik                 140
alafranga                       139
akamki                          137
astronomik                      134
acmasz                          129
altnc, antidemokratik           128
anlamsz                         127
ait, aynal                      124
akl banda                       123
aydnlk                          121
acl                             118
ak sak, akademik                117
asil                            116
abuk sabuk, altn sars           115
aptalca                         113
alakasz, aldatc, asabi          112
akllca                          111
ahenkli, anlaml                 110
alamakl, ayrntl                 108
aktif, anlayl                   107
alml                            104
ask                             103
alaturka                        100
LAYERS
News etc.
B- INTERNET TEXTS
SUB LAYERS
Politics,
%
%60
%40
12
f
591
221
812
%
73
27
100
Adjective
1      ar
1      ak
1      azgn
1      ayakl
4      acemi, amatr, ana, azametli
7      alt, alayl, aylk, ac, aksak, ak, agresif
36     ayn, ateli, altn, arsz, alafranga, aydnlk, aktif
78     alayc, ailevi, atl, ani, astronomik, acl, akademik
462    amansz, ahap, anlalmaz, adl, asrlk, altm, apayr
Total  591
-
30
ar
aina
akn
atl
atlgan
ayakl
aylk
ayrk
ayrks
ayrk
azgn
Adjectives
aksak
alacal
albenili
albenisiz
aldatc
alt
altn
amatr
ameliyatl
ana
anadan doma
angaje
anmsatc
arabal
aral
Arap
arzal
arzasz
arkasz
armut
art
ask suratl
astarsz
aalayc
ar
atak
atl
avu
ayakl
aydnlatc
aylk
ayn
ayrtrc
azgn
ak pak
akademik
akc
aklc
akkan
aksak
aksi
aktif
alafranga
alayc
alayl
albenili
alengirli
alevli
alkoll
allahsz
alt
altn
amatr
amiyane
ampirik
ana
anadan doma
ani
anlaml
anonim
antiseptik
ar
arzal
arzi
aristokrat
arkal
arsz
astronomik
aa
akn
atak
ateli
atl
avu dolusu
ayakl
aydn
aydnlk
aygn baygn
ayl
aylk
aylkl
ayn
ayrk
azametli
azgn
Conclusion
Acknowledgement
This study is based on a national research project,
numbered TÜBİTAK-SOBAG-109K104 and
titled "Collocations of Adjectives in Turkey
Turkish: A Corpus-Based Application". We
appreciate the contributions of TÜBİTAK.
from L1 traditions.
Socio-pragmatic transfer from L1 to L2 may also
occur in the Korean L2 writers' preferred linguistic
choices within each sub-category of metadiscourse
markers: their strong preference for obligation
modal verbs among attitude markers might be
related to the pragmatic function of these verbs as
hedges in Korean discourse. Also, rhetorical questions
among engagement devices, and the modal verb 'would'
in hedged expressions, which often work as indirect or
politeness strategies in Korean spoken discourse,
are more frequently employed by Korean
L2 writers. A lack of register awareness might
also be problematic. Pedagogical L2 writing resources
should teach Korean learners alternative
strategies for both genre-specific and culture-specific
devices in the written academic community.
Vít Baisa
vit.baisa@sketchengine.co.uk
Vít Suchomel
Lexical Computing Ltd., Masaryk Univ
vit.suchemel@sketchengine.co.uk
Adam Kilgarriff
Lexical Computing Ltd.
adam.kilgarriff@sketchengine.co.uk
Miloš Jakubíček
Lexical Computing Ltd., Masaryk Univ
milos.jakubicek@sketchengine.co.uk
http://www.wordreference.com
http://usingenglish.com
http://www.linguee.com/
17
https://www.wordnik.com
18
http://en.bab.la
19
While it was tempting to say the "E" in the acronym should be
for English, we decided against it, as we envisage offering
SkELL for other languages (SkELL-it, SkELL-de, etc).
16
500
200
500
all
all
200
https://en.wikipedia.org
Tokens used
millions
21
20
Tokens (= words
+ punctuation)
millions
1,600
530
1,600
105
112
340
Acknowledgment
References
Baroni, M., & Bernardini, S. (2004, May). BootCaT:
Bootstrapping Corpora and Terms from the Web. In
LREC.
Baroni, M., Kilgarriff, A., Pomikálek, J., & Rychlý, P.
(2006). WebBootCaT: instant domain-specific corpora
to support human translators. In Proceedings of EAMT
(pp. 247-252).
Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., &
Suchomel, V. (2013). The TenTen Corpus Family. In
Proc. Int. Conf. on Corpus Linguistics.
Kilgarriff, A., Rychlý, P., Smrž, P., & Tugwell, D.
(2004). The Sketch Engine. In Proceedings of
EURALEX (Vol. 6). Lorient, France. Pp 105-116.
Kilgarriff, A., Husák, M., McAdam, K., Rundell, M., &
Rychlý, P. (2008, July). GDEX: Automatically finding
good dictionary examples in a corpus. In Proceedings
of EURALEX (Vol. 8).
Longest-commonest match
Vít Baisa
Lexical Computing Ltd.
Masaryk Univ
vit.baisa@sketchengine.co.uk
Adam Kilgarriff
Lexical Computing Ltd.
adam.kilgarriff@sketchengine.co.uk
Pavel Rychlý
Lexical Computing Ltd.
Masaryk Univ
pavel.rychly@sketchengine.co.uk
Miloš Jakubíček
Lexical Computing Ltd.
Masaryk Univ
Milos.jakubicek@sketchengine.co.uk
Introduction
The prospects for automatically identifying two-word
multiwords in corpora have been explored in
depth, and there are now well-established methods
in widespread use. (We use 'multiwords' as a
cover-all term to include collocations, colligations,
idioms, set phrases etc.) But many multiwords are
of more than two words, and research into methods
for finding items of three and more words has been
less successful (as discussed in the penultimate
section below).
We present an algorithm, called longest-commonest
match, for identifying candidate multiwords of more
than two words.
Example
Compare Tables 1 and 2. Both are automatically
generated reports on the collocational behaviour of
the English verb 'fly'.
Table 2 is an improvement on the input data as
shown in Table 1, as it immediately shows:
• two set phrases: 'as the crow flies', 'off to a
  flying start'
• 'sortie' occurs as object within the noun
  phrase 'operational sorties' (a military
  expression), which is generally in the past tense
• 'flying saucers' and 'flying insects' are salient. The
  previous level of analysis, in which 'saucer'
  was analysed as object of 'fly', and 'insect' as
  subject, left far more work for the analyst to
  do, including unpacking parsing errors
• 'sparks' go with the base form of the verb
                               L-C match                  %
                               flying saucers             52.3
                               as the crow flies          89.2
kite         376    8.33
sortie       283    8.17       flew operational sorties   47.3
spark        256    8.02       sparks fly                 40.6
aircraft     799    7.84       aircraft flying            40.8
plane        527    7.57
airline      297    7.39       airlines fly               30.0
start        980    7.24       off to a flying start      64.8
helicopter   214    7.24       helicopter flying          29.9
bird         917    7.08
insect       245    6.93       flying insects             82.0
pilot        350    6.68
Table 2: As Table 1, but with longest-commonest
match. % is the percentage of the hits (column 2)
that the l-c match accounts for.
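The % column of Table 2 can be illustrated with a small sketch. This is not the authors' algorithm, whose description is not reproduced in this section; it is a hypothetical greedy procedure that starts from the commonest minimal span covering both words of a collocation and extends it left or right while the extended phrase still accounts for an assumed minimum share of all hits. The function name `lc_match`, the input format and the `min_share` threshold are all assumptions made for illustration.

```python
from collections import Counter

def lc_match(hits, min_share=0.25):
    """Greedy illustration of a longest-commonest match.

    hits: list of (tokens, i, j), where tokens is a tokenised concordance
    line and i <= j are the positions of the two collocating words.
    Returns (phrase, percentage of all hits the phrase accounts for).
    """
    if not hits:
        return "", 0.0
    total = len(hits)
    # Commonest minimal span that contains both words.
    spans = Counter(tuple(tokens[i:j + 1]) for tokens, i, j in hits)
    best, _ = spans.most_common(1)[0]
    current = [(t, i, j) for t, i, j in hits if tuple(t[i:j + 1]) == best]
    while True:
        # Commonest neighbouring token on each side of the current phrase.
        left = Counter(t[s - 1] for t, s, e in current if s > 0)
        right = Counter(t[e + 1] for t, s, e in current if e + 1 < len(t))
        options = []
        if left:
            word, count = left.most_common(1)[0]
            options.append((count, "L", word))
        if right:
            word, count = right.most_common(1)[0]
            options.append((count, "R", word))
        if not options:
            break
        count, side, word = max(options)
        # Stop once the extended phrase would cover too few of the hits.
        if count / total < min_share:
            break
        if side == "L":
            current = [(t, s - 1, e) for t, s, e in current
                       if s > 0 and t[s - 1] == word]
            best = (word,) + best
        else:
            current = [(t, s, e + 1) for t, s, e in current
                       if e + 1 < len(t) and t[e + 1] == word]
            best = best + (word,)
    return " ".join(best), 100 * len(current) / total
```

Called on a handful of invented concordance lines for 'crow' and 'fly', such a function would return the extended phrase together with its share of the hits, which corresponds to the last two columns of Table 2.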
Algorithm
Related work
Current status
Acknowledgement
This work has been partly supported by the Ministry
of Education of the Czech Republic within the
LINDAT-Clarin project LM2010013 and by the
Czech-Norwegian Research Programme within the
HaBiT Project 7F14047.
References
Church, K. W., Hanks, P. 1989. Word association norms,
mutual information, and lexicography. Proc 27th
ACL, Vancouver, Canada. Pp. 76-83.
Daudaravičius, V., Marcinkevičienė, R. 2004. Gravity
counts for the boundaries of collocations. Int Jnl of
Corpus Linguistics 9(2) pp. 321-348.
Triangulating methodological
approaches (panel)
Paul Baker
Lancaster University
p.baker@lancaster.ac.uk
Jesse Egbert
Brigham Young University
jesse_egbert@byu.edu
Tony McEnery
Lancaster University
a.mcenery@lancaster.ac.uk
Amanda Potts
Lancaster University
a.potts@lancaster.ac.uk
Bethany Gray
Iowa State University
begray@iastate.edu
Introduction
Triangulation
Research question
Analytical methods
References
Baker, P. (2014). Using Corpora to Analyse Gender.
London: Bloomsbury.
Baker, P. (2015). Does Britain need any more foreign
doctors? Inter-analyst consistency and corpus-assisted
(critical) discourse analysis. In M. Charles, N. Groom
Michael Barlow
University of Auckland
mi.barlow@auckland.ac.nz
Vaclav Brezina
Lancaster University
v.brezina@lancaster.ac.uk
Introduction
Method
Corpus data
Gender: 32 M; 32 F
Age: A (14-34): 24; B (35-54): 27; C (55+): 13
Socio-econ. status: AB: 14; C1: 16; C2: 17; DE: 13; UU: 4 (MC: 30; WC: 30)
Region: different regions across the UK
Results
[Figure: frequency per 1000 bigrams by position in utterance (1st, 2nd, Final), shown separately for female (f) and male (m) speakers, for the bigrams 'oh I', 'no I', 'I think', 'is it', 'I mean', 'you know', 'do you', 'and I', 'well I' and 'I know']
Discussion
References
Barlow, M. (2014) WordSkew, Athelstan: Houston.
Brezina, V., & Meyerhoff, M. (2014). Significant or
random?: a critical review of sociolinguistic
generalisations based on large corpora. International
Journal of Corpus Linguistics, 19(1), 1-28.
Gries, S. T. (2006). Exploring variability within and
between corpora: some methodological considerations.
Corpora, 1(2), 109-151.
Monika Bednarek
University of Sydney
Monika.Bednarek@sydney.edu.au
James Curran
University of Sydney
james.r.curran@sydney.edu.au
Tim Dwyer
University of Sydney
timothy.dwyer@sydney.edu.au
Fiona Martin
University of Sydney
fiona.martin@sydney.edu.au
Joel Nothman
University of Sydney
joel.nothman@gmail.com
Introduction
Corpus
Analyses
http://likeable.share-wars.com/
Acknowledgments
This paper is an output of the Australian Research
Council Linkage Project grant Sharing News Online:
Analysing the Significance of a Social Media
Phenomenon [LP 140100148].
References
Bednarek, M. and Caple, H. 2012a. News discourse.
London/New York: Continuum.
Bednarek, M. and Caple, H. 2012b. Value Added:
Language, image and news value. Discourse, Context
& Media 1: 103-113.
Bednarek, M. and Caple, H. 2014. Why do news values
matter? Towards a new methodological framework for
analyzing news discourse in Critical Discourse
Analysis and beyond. Discourse & Society 25 (2):
135-158.
Potts, A., Bednarek, M. and Caple, H. in press. How can
computer-based methods help researchers to
investigate news values in large datasets? A corpus
linguistic study of the construction of newsworthiness
in the reporting on Hurricane Katrina. Discourse &
Communication.
Tobias.J.Bernaisch@anglistik.uni-giessen.de
stgries@linguistics.ucsb.edu
References
Bernaisch, T., Gries, S.T. and Mukherjee, J. 2014. The
dative alternation in South Asian English(es):
modelling predictors and predicting prototypes.
English World-Wide 35(1): 7-31.
Bernaisch, T., Koch, C., Mukherjee, J. and Schilk, M.
2011. Manual for the South Asian Varieties of English
(SAVE) Corpus: compilation, cleanup process, and
details on the individual components. Giessen: Justus
Liebig University.
Bresnan, J. and Hay, J. 2008. Gradient grammar: an
effect of animacy on the syntax of give in New Zealand
and American English. Lingua 118: 245-259.
Gries, S.T. 2003. Towards a corpus-based identification
of prototypical instances of constructions. Annual
Review of Cognitive Linguistics 1: 1-27.
Cinzia Bevitori
University of Bologna
cinzia.bevitori@unibo.it
The Corpus
Years       Epoch                          Presidents                  No. Addresses   No. Tokens
1790-1864   Up to last Civil War address   Washington to Lincoln       76              550,791
1865-1916   Before WW I                    Johnson A. to Wilson        52              647,817
1917-1945   Up to end of WW II             Wilson to Roosevelt F. D.   28              152,566
1946-1989   Cold War                       Truman to Bush G. H.        47              292,878
1990-2014   End of Cold War to present     Bush G. H. to Obama         25              152,089
Figure 1. Relative frequency of God (per hundred
tokens) across historical segments
Still, searching for 'God' is only one part of the
story. Word meanings may not be (and, indeed,
frequently are not) stable over time and this, I
believe, represents a great challenge to the corpus
analyst attempting to combine quantitative and
qualitative investigation of distinctive rhetorical
structures over time (see also Bevitori 2015). In fact,
close reading of most of the 18th and 19th century
SoU addresses shows that there are numerous
variants of the name 'God', which are difficult to
retrieve through concordances alone
(Bayley and Bevitori 2014).
However, looking at the texts first can point to
possible search terms which can be further explored
through the software (e.g. Providence, Supreme,
Being, Divine Blessing, etc.). In particular, the
analysis of the lemma bless* across the same
historical segments affords a useful
complementary perspective. There are 271
occurrences of bless* in the whole corpus (Figure
2), corresponding to 0.015 per hundred tokens.
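The normalised figure can be checked with a short calculation. The sketch below assumes the per-segment token counts given in the corpus table above (550,791; 647,817; 152,566; 292,878; 152,089); the helper name `per_hundred` is introduced here for illustration only.

```python
def per_hundred(count, tokens):
    """Relative frequency expressed per hundred tokens."""
    return 100 * count / tokens

# Token counts of the five historical segments, as listed in the corpus table.
segment_tokens = [550_791, 647_817, 152_566, 292_878, 152_089]
total_tokens = sum(segment_tokens)

# 271 occurrences of bless* in the whole corpus.
bless_rate = per_hundred(271, total_tokens)
print(round(bless_rate, 3))
```

The same helper can be applied to each segment separately to obtain the per-segment rates plotted in the figures.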
'bless'
Conclusion
References
Anthony, L. 2011. AntConc (Version 3.2.4w) [Computer
Software] Tokyo, Japan: Waseda University. Available
online at http://www.laurenceanthony.net/
Baker, P. 2006. Using Corpora in Discourse Analysis.
London: Continuum.
Baker, P. 2011. Times may change but we'll always have
money: a corpus driven examination of vocabulary
change in four diachronic corpora. In Journal of
English Linguistics, 39: 65-88.
Baker, P., Gabrielatos, C., KhosraviNik, M.,
Krzyzanowski, M., McEnery, T. and Wodak, R. 2008,
A useful synergy? Combining critical discourse
analysis and corpus linguistics to examine discourses
of refugees and asylum seekers in the UK press, in
Discourse and Society, 19(3): 273-306.
Bayley, P. and Bevitori, C. 2011. Addressing the
Congress: Language change from Washington to
Obama (1790-2011). Unpublished Paper given at
Clavier 11 International Conference , Tracking
Language Change in Specialised and Professional
Genres, University of Modena and Reggio Emilia,
Modena, 24-26 November 2011.
Bayley, P. and Bevitori, C. 2014. In search for meaning:
what corpora can/cannot tell. A diachronic case study
of the State of the Union Addresses (1790-2013). In
Miller, D. R., Bayley, P., Bevitori, C., Fusari, S. and
Luporini, A. Ticklish trawling: The limits of corpus
assisted meaning analysis. In Alsop, S. and Gardner,
S. (eds). Proceedings of ESFLCW 2013. Language in a
Digital Age: Be Not Afraid of Digitality. 01-03 July
2013. Coventry University: Coventry, UK.
Bayley, P. and Bevitori C. , 2015. Two centuries of
security: Semantic variation in the State of the Union
Introduction
Conclusion
References
Beyer, R., Gilles, P., Moliner, O. and Ziegler, E. 2014.
Sprachstandardisierung unter
Mehrsprachigkeitsbedingungen: Das Deutsche in
Luxemburg im 19. Jahrhundert. Jahrbuch für
germanistische Sprachgeschichte 5: 283-298.
Gilles, P. and Ziegler, E. 2013. The Historical
Luxembourgish Bilingual Affichen Database. In P.
Bennett, M. Durrell, S. Scheible and R.J. Whitt (eds.)
New methods in Historical Corpus Linguistics.
Tübingen: Narr. 127-138.
Jesse Egbert
Brigham Young
University
Douglas.Biber
@nau.edu
Jesse_Egbert
@byu.edu
Mark Davies
Brigham Young University
Mark_Davies@byu.edu
Introduction
The corpus used for the study was extracted from the
'General' component of the Corpus of Global Web-based
English (GloWbE; see http://corpus2.byu.edu/glowbe/).
The GloWbE corpus contains c. 1.9 billion words in
1.8 million web documents, collected in
November-December 2012 by using the results of
Google searches of highly frequent English 3-grams
(i.e., the most common 3-grams occurring in COCA;
e.g., 'is not the', 'and from the'). 800-1000 links were
saved for each n-gram (i.e., 80-100 Google results
pages), minimizing the bias from the preferences built
into Google searches. Many previous web-as-corpus
studies have used similar methods with n-grams as
search engine seeds (see, e.g., Baroni & Bernardini,
2004; Baroni et al., 2009; Sharoff, 2005; 2006). It is
important to acknowledge that no Google search is
truly random. Thus, even searches on 3-grams
consisting of function words (e.g., 'is not the') will to
some extent be processed based on choices and
predictions built into the Google search engine.
However, selecting hundreds of documents for each
of these n-grams, which consist of function words
rather than content words, minimizes that influence.
To create a representative sample of web pages to
be analyzed in our project, we randomly extracted
53,424 URLs from the GloWbE Corpus. This
sample, comprising web pages from five geographic
regions (United States, United Kingdom, Canada,
Australia, and New Zealand), represents a large
sample of web documents collected from the full
spectrum of the searchable Web. Because the
ultimate objective of our project is to describe the
lexico-grammatical characteristics of web
documents, any page with fewer than 75 words of text
was excluded from this sample.
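The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: pages are assumed to be available as a mapping from URL to extracted text, words are counted as whitespace-separated strings, and the names `word_count` and `sample_pages` are hypothetical.

```python
import random
import re

MIN_WORDS = 75  # pages with fewer words of text are excluded

def word_count(text):
    """Count whitespace-separated strings (an assumed definition of 'word')."""
    return len(re.findall(r"\S+", text))

def sample_pages(pages, k, seed=0):
    """Randomly pick up to k URLs whose text meets the minimum length.

    pages: dict mapping URL -> extracted page text.
    """
    eligible = sorted(url for url, text in pages.items()
                      if word_count(text) >= MIN_WORDS)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(eligible, min(k, len(eligible)))
```

Filtering before sampling, as here, keeps the requested sample size stable; filtering after sampling would match the order described in the text more literally but yields fewer pages than requested.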
To create the actual corpus of documents used for
our study, we downloaded the web documents
associated with those URLs using HTTrack
For the first study, we employed a bottom-up
user-based investigation of a large, representative corpus
of web documents. Instead of relying on individual
expert coders, we recruited typical end-users of the
Web for our register coding, with each document in
the corpus coded by four different raters. End-users
identified basic situational characteristics of each web
document, coded in a hierarchical manner. Those
situational characteristics lead to general register
categories, which eventually lead to lists of specific
sub-registers. By working through a hierarchical
decision tree, users were able to identify the register
category of most internet texts with a high degree of
reliability.
The approach we have adopted here makes it
possible to document the register composition of the
searchable web. Narrative registers are found to be
the most prevalent, while Opinion and Informational
Description/Explanation registers are also found to
be extremely common.
One of the major
innovations of the approach adopted here is that it
permits an empirical identification of hybrid
documents, which integrate characteristics from
multiple general register categories (e.g.,
opinionated-narrative). These patterns are described
and illustrated through sample internet documents.
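One way to picture how four ratings per document could yield both single-register and hybrid classifications is sketched below. The decision thresholds are assumptions made for illustration (agreement of at least three raters gives a single register; an even two-two split gives a hybrid); the source does not state the rules actually used, and the function name `classify` is hypothetical.

```python
from collections import Counter

def classify(ratings):
    """Combine four raters' general-register labels into one outcome.

    Assumed rules: 3 or 4 matching labels -> that register;
    a 2-2 split -> hybrid of the two; anything else -> no agreement.
    """
    counts = Counter(ratings).most_common()
    top_label, top_n = counts[0]
    if top_n >= 3:
        return top_label
    if top_n == 2 and len(counts) == 2:
        # Two registers with two votes each: treat as a hybrid document.
        return "+".join(sorted(label for label, _ in counts))
    return "no agreement"
```

Under these assumed rules, a document rated Narrative by two coders and Opinion by the other two would be recorded as a Narrative+Opinion hybrid.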
Study 2:
Comprehensive lexicogrammatical description of web registers
References
Baroni, M. and Bernardini, S. 2004. BootCaT:
Bootstrapping corpora and terms from the web.
Proceedings of LREC 2004, Lisbon: ELDA. 1313-1316.
Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E.
2009. The WaCky wide web: A collection of very
large linguistically processed web-crawled corpora.
Language Resources and Evaluation 43 (3): 209-226.
Biber, D. 1988. Variation across Speech and Writing.
Cambridge: Cambridge University Press.
Kilgarriff, A. and Grefenstette, G. 2003. Introduction to
the special issue on the Web as corpus. Computational
Linguistics, 29:333-347.
Sharoff, S. 2005. Creating general-purpose corpora using
automated search engine queries. In M. Baroni and S.
Bernardini, (Eds.), WaCky! Working papers on the
Web as Corpus. Gedit, Bologna.
Sharoff, S. 2006. Open-source corpora: Using the net to
fish for linguistic data. International Journal of Corpus
Linguistics, 11(4), 435-462.
Randi Reppen
Northern Arizona
University
Douglas.biber
@nau.edu
Randi.reppen
@nau.edu
Erin Schnur
Northern Arizona
University
Romy Ghanem
Northern Arizona
University
Erin.Schnur
@nau.edu
Rg634
@nau.edu
References
Brezina, V. and D. Gablasova. 2013. Is there a core
general vocabulary? Introducing the New
General Service List. Applied Linguistics. 1-23
Coxhead, Averil. 2000. A new academic word list.
Felix Bildhauer
Freie Universität
felix.bildhauer@fu-berlin.de
Arne Zeschel
Institut für Deutsche Sprache
zeschel@ids-mannheim.de
Introduction
(1) A:
    B: a. Ich denke, dass sie morgen kommen.
       b. Ich denke, sie kommen morgen.
       c. Ich denke morgen.
       'I think [(that) they will come] tomorrow.'
(2) a. Ich denke,
          dass sie morgen kommen.
          sie kommen morgen.
          morgen.
       Ich bezweifle,
          dass sie morgen kommen.
          ?? sie kommen morgen.
          *morgen.
       'I doubt [(that) they will come] tomorrow.'
Since they are not licensed across the board,
fragmentary complement clauses like (1c) cannot be
accounted for by unspecific appeals to recoverability
in context alone. How can the contrasts in (2a-c) be
explained, then? We explore the possibility that the
types of permitted ellipses can be predicted from
governing verbs' preference for particular kinds of
non-elliptical complement clauses. For instance,
wissen 'know' most commonly combines with
wh-clauses among its sentential complements. And
though ungrammatical in the ellipsis in (2.b), it
works well with sluices (Ross 1969) such as (3):
(3)
Corpus study
We explore these issues in a combined
corpus-linguistic and interactional study of 25 CTPs from
different semantic classes using samples of 500
attestations each. The data is taken from the German
national conversation corpus FOLK (Deppermann &
Hartung 2011) and a subset of the DECOW2012
web corpus (Schäfer & Bildhauer 2012) containing
quasi-spontaneous CMC data.
Before coding the full set of 25 x 500 = 12,500
samples, we conducted a pilot study with seven
verbs from three semantic classes:
EPISTEMIC STATUS
denken 'to think'
wissen 'to know'
bezweifeln 'to doubt'
PROPOSITIONAL ATTITUDE
fürchten 'to fear'
befürchten 'to fear'
SOURCE OF KNOWLEDGE
hören 'to hear'
merken 'to notice'
Results
Outlook
Since the results of the pilot study point in the
expected direction, the study is currently being expanded to
the full set of 25 CTPs (12,500 data points). The
expanded version comprises five different verbs
from each of five semantic classes, including a greater
number of wh-compatible types. In a first step, we repeat
the procedure outlined above for the total dataset.
Next, we zoom in on the actual usage patterns of the
elliptical utterances thus identified by investigating a
variety of their morphosyntactic, semantic, deictic,
information-structural and sequential context
properties. We close with a brief discussion of theoretical
options for modelling our findings within a
surface-oriented, construction-based approach to grammar:
what is the theoretical status of structures (i.e. our
fragmentary complement clauses) that are apparent
variants of other constructions (i.e. full syntactic
complementation patterns), in particular if these
structures are not merely different in form but also
show a more restricted distribution?
References
Deppermann, A. and Hartung, M. 2011. Was gehört in
ein nationales Gesprächskorpus? Kriterien, Probleme
Danni Yu
University of
Modena and Reggio
Emilia
marina.bondi
@unimore.it
dannimail
@foxmail.com
References
Bhatia, A. 2012. The CSR report - the hybridization of a
'confused' genre (2007-2011) research article.
IEEE Transactions on Professional Communication
55(3): 221-228.
Bhatia, A. 2013. International genre, local flavour:
analysis of PetroChina's Corporate and Social
Responsibility Report. In Revista Signos. Estudios de
lingüística. 46(83): 307-331.
Bondi, M. Forthcoming. The future in reports:
prediction, commitment and legitimization in CSR.
Pragmatics and society.
Catenaccio, P. 2012. Understanding CSR Discourse:
Insights from Linguistics and Discourse Analysis.
Milano: Brossura.
Fuoli, M. 2012. Assessing Social Responsibility: A
quantitative analysis of Appraisal in BP's and IKEA's
social reports. Discourse & Communication 6(1): 55-81.
Fuoli, M. and Paradis, C. 2014. A model of trust-repair
discourse. Journal of Pragmatics 74: 52-69.
Hunston, S. 2008. Starting with the small words:
Patterns, lexis and semantic sequences. In
International Journal of Corpus Linguistics 13/3: 271-295.
Malavasi, D. 2012. The necessary balance between
sustainability and economic success, an analysis of
Fiat's and Toyota's Corporate Social Responsibility
Reports. In P. Heynderickx et al. (eds.) The Language
Factor in International Business. Bern: Peter Lang,
247-264.
Sinclair, J.M. 2004. Trust the Text. Language, Corpus
and Discourse. London: Routledge.
Wang, D. 2013. Applying Corpus Linguistics in
Discourse Analysis. In Studies in Literature and
Language 6(2): 35-39.
Sally Hunt
Rhodes University,
Grahamstown
r.bowker@
ru.ac.za
s.hunt@
ru.ac.za
References
Bhaskar, R. (1993) Dialectic: The Pulse of Freedom.
London: Verso
Donnelly, L. (2011) Pay wars sideline job creation in
Mail & Guardian, 22 July,
http://mg.co.za/article/2011-07-22-pay-wars-sidelinejob-creation (Accessed 06.08.13)
Fairclough, N. (2001) Language and Power. 2nd edition.
Abingdon: Routledge
Fairclough, N. (2010) Critical Discourse Analysis: The
Critical Study of Language. 2nd ed. Harlow: Longman
Fanon, F. (1963) The Wretched of the Earth. Trans.
Farrington, C. New York: Grove Press
Gramsci, A. (1971) Selections from the Prison Notebooks.
Trans. Hoare, Q. & Nowell-Smith, G. London:
Lawrence & Wishart
Lakoff, G. & Johnson, M. (1980) Metaphors we live by.
Chicago: University of Chicago Press
Letsoalo, M. (2011) SA hit by strike fever in Mail &
Guardian, 15 July, http://mg.co.za/article/2011-07-15sa-hit-by-strike-fever (Accessed 06.08.13)
Machin, D. & Mayr, A. (2012) How to Do Critical
Discourse Analysis. London: Sage
Mail & Guardian www.mg.co.za
Peirce, C. S. (1955) Philosophical Writing of Peirce. Ed.
Buchler, J. New York: Dover
SAPA (2012) Battle lines drawn as municipal wage
negotiations begin in Mail & Guardian, 22 May,
http://mg.co.za/article/2012-05-22-municipal-wage26
Introduction
Results
Discussion
Acknowledgements
References
Giannoni, D., 2011. Mapping academic values in the
disciplines: a corpus-based approach. Bern: Peter
Lang.
Hyland, K., 2004. Disciplinary discourses: social
interactions in academic writing. Ann Arbor:
University of Michigan.
Collocations in context:
A new perspective on collocation
networks
Vaclav Brezina
Lancaster University
Tony McEnery
Lancaster University
v.brezina
@lancaster.ac.uk
a.mcenery
@lancaster.ac.uk
Stephen Wattam
Lancaster University
s.wattam@lancaster.ac.uk
Method
Text     Tokens    Date
Yates    43,016    1699
Walker   63,515    1711
Anon     4,201     1740
Penn     9,800     1745
TOTAL    120,532
Table 1: Society for the Reformation of Manners
Corpus
The study uses GraphColl, a new tool developed by
the authors, which builds collocation networks on
the fly and gives the user full control over the
process of identifying collocations. Our starting
node (i.e. the word which we searched for first) was
'swearing'. The procedure consisted of the
following steps:
1 Replication of McEnery's (2006) study using the MI2
association measure.
2 Checking the results with log likelihood, another
association measure, which looks at the
evidence in the data against the null hypothesis.
3 Adding directionality as another dimension of
the collocational relationship using the directional
association measure Delta P (Gries, 2013).
4 Adding dispersion with Cohen's d (Brezina, in
preparation).
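The first three measures named in these steps can be sketched from a 2x2 contingency table. This is a textbook formulation under stated assumptions, not GraphColl's implementation: expected frequencies are taken from whole-corpus marginals without any correction for collocation window size, and the function name `association` and its parameters are hypothetical. Step 4 is not covered.

```python
import math

def association(o11, node_freq, coll_freq, n):
    """MI2, log likelihood and Delta P from a 2x2 contingency table.

    o11: co-occurrences of node and collocate (must be > 0);
    node_freq, coll_freq: corpus frequencies of the two words; n: corpus size.
    """
    o12 = node_freq - o11             # node without the collocate
    o21 = coll_freq - o11             # collocate without the node
    o22 = n - node_freq - coll_freq + o11
    r1, r2 = node_freq, n - node_freq
    c1, c2 = coll_freq, n - coll_freq
    observed = [o11, o12, o21, o22]
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    mi2 = math.log2(o11 ** 2 / expected[0])
    log_likelihood = 2 * sum(o * math.log(o / e)
                             for o, e in zip(observed, expected) if o > 0)
    # Delta P is directional: the two values generally differ.
    delta_p_coll_given_node = o11 / r1 - o21 / r2
    delta_p_node_given_coll = o11 / c1 - o12 / c2
    return {
        "MI2": mi2,
        "LL": log_likelihood,
        "DeltaP(collocate|node)": delta_p_coll_given_node,
        "DeltaP(node|collocate)": delta_p_node_given_coll,
    }
```

Because the two Delta P values are computed from different conditional probabilities, they capture the directionality introduced in step 3: a collocate may be strongly predicted by the node without predicting it in return.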
Conclusion
References
Algina, J., Keselman, H., & Penfield, R. D. (2005). An
Alternative to Cohen's Standardized Mean Difference
Effect Size: A Robust Parameter and Confidence
Interval in the Two Independent Groups Case.
Psychological methods, 10(3), 317.
Introduction
Data
Findings
'safeguard', 'protect', 'defend', 'support', 'democracy',
'future', 'sovereignty' and 'independence'. Such
findings appear to indicate that The China Post
constructs the protests in terms of the damage they
may cause to the status quo and economic stability
of the island, whereas the discursive strategy of the
Taipei Times presents the protests as a struggle to
protect and defend the sovereignty and
independence of the island as well as to safeguard
the democratic process. The word 'students' is
also one of the most frequent words in both corpora.
In The China Post, its collocates include 'storm',
'evict', 'urge' and 'demand'; in the
Taipei Times, however, 'support', 'occupy' and
'participate' are collocates, indicating that
while one newspaper focuses on the violent nature
of the demonstrations, the other emphasises
solidarity with the students. Such discursive
constructions are further perpetuated when extended
frequency lists are analysed. A frequent word in the
China Post corpus is 'police', with a focus on the
violence which occurred between the protesters and
police during the protests, thus emphasising the
anti-social nature of the demonstrations. Such a discourse
of violence is absent from the Taipei Times data; a
high-frequency word there is 'Sunflower', which not only
functions as a nominalisation strategy associated
with hope, but also associates the protest movement
with the Wild Lily movement, a student movement
in 1990 which marked a turning point in Taiwan's
transition to pluralistic democracy.
Following the study of frequency, keywords of
the corpora were analysed, firstly using enTenTen
(2012) as a reference corpus.
The China Post Corpus               The Taipei Times Corpus
keyword       score   freq.         keyword      score   freq.
DPP           1889    321           KMT          1753    398
KMT           1517    225           DPP          1107    288
pact          996     461           Sunflower    781     438
Kuomintang    719     103           Taiwanese    774     352
Taiwan        685     806           Taiwan       638     1150
Taipei        533     158           pact         631     447
protesters    491     454           Taipei       613     278
Tsai          487     78            Tsai         318     78
Sunflower     467     171           protesters   299     423
Jiang         430     168           Jiang        246     142
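How a keyword score of this kind relates focus and reference frequencies can be illustrated with one common formulation, the ratio of smoothed per-million frequencies. The source does not say how the scores in the table were computed, so the formula, the smoothing constant and the function name `keyness` below are assumptions made for illustration.

```python
def keyness(focus_count, focus_size, ref_count, ref_size, smoothing=1.0):
    """Ratio of smoothed per-million frequencies (assumed scoring method).

    focus_count/focus_size: frequency and token count in the study corpus;
    ref_count/ref_size: the same for the reference corpus.
    """
    focus_fpm = focus_count * 1_000_000 / focus_size
    ref_fpm = ref_count * 1_000_000 / ref_size
    # The constant avoids division by zero and damps very rare words.
    return (focus_fpm + smoothing) / (ref_fpm + smoothing)
```

A word that is frequent in a small newspaper corpus but rare in a large general reference corpus receives a high score, which is why party names and names of politicians dominate both lists.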
Conclusion
References
Baker, P. 2006. Using Corpora in Discourse Analysis.
London: Continuum.
Costelloe, L. 2014. Discourses of sameness: Expressions
of nationalism in newspaper discourse on French urban
violence in 2005. In Discourse & Society 2014, Vol.
25(3) 315-340.
Fairclough, N. 1995. Media Discourse. London: Edward
Arnold.
Fukuda, M. 2014. Japan-China-Taiwan Relations after
Taiwan's Sunflower Movement. Asia Pacific Bulletin.
Number 264.
Kuo, S. 2007. Language as Ideology. Analyzing
Quotations in Taiwanese News Discourse. In Journal
of Asian Pacific Communication. 17:2. 281-301.
Mautner, G. 2008. Analyzing Newspaper, Magazines and
other Print Media. In R. Wodak & M. Krzyzanowski
(eds.) Qualitative Discourse Analysis in the Social
Sciences. New York: Palgrave Macmillan.
Rawnsley, G. D. 2004. Treading a Fine Line:
Democratisation and the Media in Taiwan. In
Parliamentary Affairs. Vol. 57, Issue 1. 209-222.
van Dijk, T. 1988. News as Discourse. Hillsdale, NJ:
Lawrence Erlbaum.
van Dijk, T. 1991. Racism and the Press. London:
Routledge.
References
Alderson, J. C. 2007. The CEFR and the need for more
research. The Modern Language Journal. 91 (4): 659-663.
Alderson, J. C., Figueras, N., Kuijper, H., Nold, G.,
Takala, S. and Tardieu, C. 2006. Analysing tests of
reading and listening in relation to the common
European framework of reference: The experience of
the Dutch CEFR construct project. Language
Assessment Quarterly: An International Journal. 3 (1):
3-30.
Cambridge English Profile Corpus. (n.d.). English Profile:
CEFR for English. Available online at
http://www.englishprofile.org/index.php/corpus
Calbert Graham
University of
Cambridge
apc38@cam.ac.uk
crg29@cam.ac.uk
Paula Buttery
University of
Cambridge
Michael McCarthy
University of
Cambridge
pjb48@cam.ac.uk
mactoft@cantab.net
Overview
Automated assessment
https://sat.ilexir.co.uk
Acknowledgements
This work has been funded by Cambridge English
Language Assessment. We thank Nick Saville and
Ted Briscoe for their guidance. We thank Francis
Nolan, Kate Knill, Rogier van Dalen, and Ekaterina
Kochmar for their help. And we gratefully
acknowledge the support of Alan Little, Barbara
Lawn-Jones and Luca Savino.
References
Abney, S. & S. Bird (2010). The Human Language
Project: Building a Universal Corpus of the World's
Languages. Proceedings of the 48th Annual Meeting of
the Association for Computational Linguistics.
Association for Computational Linguistics.
Andersen, Ø., H. Yannakoudakis, F. Barker, & T. Parish
(2013). Developing and testing a self-assessment and
tutoring system. Proceedings of the Eighth Workshop
on Innovative Use of NLP for Building Educational
Applications. Association for Computational
Linguistics.
Burstein, J. 2003. The e-rater scoring engine:
automated essay scoring with natural language
Introduction
Data
Methodology
Preliminary findings
References
Ädel, A. and Römer, U. 2012. Research on advanced
student writing across disciplines and levels:
Introducing the Michigan corpus of upper-level student
papers. International Journal of Corpus Linguistics
17 (1): 3-34.
Anthony, L. 2014. AntConc (Version 3.4.3) [Computer
Software] Tokyo, Japan: Waseda University. Available
from http://www.laurenceanthony.net/
Bestgen, Y. and Granger, S. 2014. Quantifying the
development of phraseological competence in L2
English writing: An automated approach. Journal of
Second Language Writing 26: 28-41.
Biber, D. 2009. A corpus-driven approach to formulaic
language in English: Multi-word patterns in speech and
writing. International Journal of Corpus Linguistics
14 (3): 275-311.
Biber, D., Conrad, S. and Cortes, V. 2004. If you look at
. . . : Lexical bundles in university teaching and
textbooks. Applied Linguistics 25 (3): 371-405.
Chen, Y.-H. and Baker, P. 2014. Investigating criterial
discourse features across second language
development: Lexical bundles in rated learner essays,
CEFR B1, B2 and C1. Applied Linguistics, 1-33.
Cortes, V. 2004. Lexical bundles in published and
student disciplinary writing: Examples from history
and biology. English for Specific Purposes 23 (4):
397-423.
Cortes, V. 2013. The purpose of this study is to:
Connecting lexical bundles and moves in research
article introductions. Journal of English for Academic
Purposes 12 (1): 33-43.
Li, J. and Schmitt, N. 2009. The acquisition of lexical
phrases in academic writing: A longitudinal case
study. Journal of Second Language Writing 18 (2):
85-102.
Ortega, L. and Iberri-Shea, G. 2005. Longitudinal
research in second language acquisition: Recent trends
and future directions. Annual Review of Applied
Linguistics 25: 26-45.
Staples, S., Egbert, J., Biber, D. and McClair, A. 2013.
Formulaic sequences and EAP writing development:
Lexical bundles in the TOEFL iBT writing section.
Journal of English for Academic Purposes 12 (3): 214-225.
Valeriya Vinogradova
Université Paris 13
Sorbonne Paris Cité
emmanuel.cartier@lipn.univ-paris13.fr
valeriya.vinogradova@gmail.com
Computational Models of the Distributional Hypothesis
Linguistic motivation for linguistic preprocessing
Semantic
Corpus
System Architecture
Sentence simplification
Steps 1 and 2: Adverbials and subordinate clauses
^((?:en|dans|à|sur|selon|pour|chez|par).{5,150}?)\t, \/PONCT\t/
DEFINIENDUM\t, \/PONCT\t((?:en|dans|à|sur|selon|pour|chez|par).{5,150}?)\t, \/PONCT
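The first pattern can be tried out with a small sketch. Everything beyond the pattern's general shape is an assumption: the token format (tab-separated `form/TAG` items with the comma tagged `PONCT` and no space before the tag), the inclusion of `à` in the preposition list, and the function name `strip_initial_adverbial` are all hypothetical and may differ from the system described here.

```python
import re

# Assumed preposition list; "à" is a guess at a character lost in extraction.
PREPOSITIONS = r"(?:en|dans|à|sur|selon|pour|chez|par)"

# Sentence-initial adverbial: a preposition, 5 to 150 further characters,
# then a comma token. Assumes tab-separated "form/TAG" tokens.
INITIAL_ADVERBIAL = re.compile(
    r"^(" + PREPOSITIONS + r".{5,150}?)\t,/PONCT\t",
    re.IGNORECASE,
)

def strip_initial_adverbial(tagged_sentence):
    """Remove one sentence-initial adverbial and its comma, if present."""
    return INITIAL_ADVERBIAL.sub("", tagged_sentence, count=1)

example = "En/P\t2010/NC\t,/PONCT\tle/DET\tchat/NC\tdort/V"
simplified = strip_initial_adverbial(example)
```

The lazy quantifier stops at the first comma token, so only the shortest leading adverbial is removed and any later commas in the sentence are left untouched.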
10 Results
The linguistic preprocessing greatly improves the
extraction process, as shown in Table 1.
References
Baroni M. and Alessandro Lenci, 2010. Distributional
Memory: A General Framework for Corpus-Based
Semantics. Computational Linguistics 36(4):673-721
Béchet N., Cellier P., Charnois T., and Crémilleux B.,
2012. Discovering linguistic patterns using sequence
mining. In Alexander F. Gelbukh, editor, 13th
International Conference on Intelligent Text
Processing and Computational Linguistics, CICLing
2012, volume 7181 of Lecture Notes in Computer
Science, pages 154-165. Springer, 2012.
Blacoe W. and Mirella Lapata. 2012. A Comparison of
Vector-based Representations for Semantic
Composition. In Proceedings of the 2012 Joint
Conference on Empirical Methods in Natural
Language Processing and Computational Natural
Language Learning, pages 546-556, Jeju Island, Korea,
July. Association for Computational Linguistics.
Anna Čermáková
ICNC, Charles University in Prague
anna.cermakova@ff.cuni.cz
Lucie Chlumská
ICNC, Charles University in Prague
lucie.chlumska@ff.cuni.cz
Introduction
References
Braun, S. (2007). Integrating corpus work into secondary
education: From data-driven learning to needs-driven
corpora. ReCALL 19(03), 307-328.
Carter, R. (1998). Orders of reality: CANCODE,
communication, and culture. ELT Journal 52, 43-56.
Cook, G. (1998). The uses of reality: A reply to Ronald
Carter. ELT Journal 52, 57-63.
Cook, G. (2001). 'The philosopher pulled the lower jaw of
the hen'. Ludicrous invented sentences in language
teaching. Applied Linguistics 22, 366-387.
Chambers, A. (2007). Popularising corpus consultation by
language learners and teachers. In E. Hidalgo et al.
(Eds.), Corpora in the foreign language classroom, 3-16.
Rodopi.
Gardner, D. (2004). Vocabulary Input through Extensive
Reading: A Comparison of Words Found in Children's
Narrative and Expository Reading Materials. Applied
Linguistics 25(1), 1-37.
Hunston, S. (1995). Grammar in teacher education: The
role of a corpus. Language Awareness 4(1), 15-31.
Hunt, P. (Ed.) (1992). Literature for Children:
Contemporary Criticism. London and New York:
Routledge.
Knowles, M. & Malmkjær, K. (1996). Language and
Control in Children's Literature. London and New
York: Routledge.
Mauranen, A. & Kujamäki, P. (Eds.) (2004). Translation
Universals. Do they exist? Amsterdam/Philadelphia:
John Benjamins.
Oxford English Dictionary for Schools (2006). Ed. by R.
Allen. Oxford: OUP.
Puurtinen, T. (2003). Genre-specific Features of
Translationese? Linguistic Differences between
Translated and Non-translated Finnish Children's
Literature. Literary and Linguistic Computing 18(4),
389-406.
Sealey, A. (2000). Childly language: children, language,
and the social world. Harlow: Longman.
Sealey, A. & Thompson, P. (2007). Corpus, Concordance,
Classification: Young Learners in the L1 Classroom.
Language Awareness 16(3), 208-223.
Sealey, A. & Thompson, P. (2004). 'What do you call the
dull words?' Primary school children using
corpus-based approaches to learn about language. English in
Education 38 (1), 80-91.
Stubbs, M. (2002). On text and corpus analysis: A reply
to Borsley and Ingham. Lingua 112 (1), 7-11.
Thompson, P. & Sealey, A. (2007). Through children's
eyes? Corpus evidence of the features of children's
literature. International Journal of Corpus Linguistics
12 (1), 1-23.
Van Dijk, T. A. (1981). Discourse studies and education.
Applied Linguistics, 2(1), 1-26.
Wall, B. (1991). The Narrator's Voice: the dilemma of
children's fiction. London: Macmillan.
Wild, K., Kilgarriff, A., & Tugwell, D. (2013). The
Oxford Childrens Corpus: Using a Childrens Corpus
in
Lexicography.
International
Journal
of
Lexicography, 26(2), 190218.
Widdowson, H. (2000). On the limitations of linguistics
applied. Applied Linguistics 21, 325.
80
References
Baker, P. and McEnery, T. 2005. A corpus-based
approach to discourses of refugees and asylum seekers
References
Argamon, Shlomo; Koppel, Moshe; Fine, Jonathan;
Shimoni, Anat. 2003. Gender, Genre, and Writing
Style in Formal Written Texts. Text , 23/3.
Besnier, Niko. 1994. Involvement in linguistic practice:
An Ethnographic Appraisal. Journal of Pragmatics 22:
279-299.
Biber, Douglas. 1988. Variation across Speech and
Writing. Cambridge, UK: Cambridge University Press.
Bradbury-Jones, Caroline; Irvine, Fiona; Sambrook,
Sally. 2007. Unity and Detachment: A Discourse
Analysis of Doctoral Supervision. International
Journal of Qualitative Methods: 81-96.
Lakoff, Robin T., 1990. Talking power: The politics of
language in our lives. New York: Basic Books.
Narrog, Heiko. 2012, Modality, Subjectivity, and
Semantic Change: A Cross-Linguistic Perspective.
Oxford: Oxford University Press.
Prelli, Lawrence J. 1989. The rhetorical construction of
scientific ethos. In: Herbert W. Simon (ed.), Rhetoric
in the human sciences. London: Sage.
82
Introduction
To provide a large corpus against which client-generated
utterances could be matched, the Corpus of
Contemporary American English (Davies, 2008) was
used. It was chosen not only because it provides a
very large database, far larger than any currently
available in the field of AAC, but also because it
includes frequency data and grammatical tagging
based on the CLAWS system (Garside, 1987). Both
word frequency and syntax (mainly in the area of
morphology) are important pieces of information
when monitoring the performance of an aided
communicator (Binger, 2008; Binger & Light,
2008). Furthermore, such information can inform
educational and clinical intervention programs
(Cross, 2013).
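As a minimal sketch of matching a client-generated utterance against corpus frequency data, the lookup below profiles each word of an utterance. The frequency values are invented toy numbers, not figures from the Corpus of Contemporary American English, and the function name is hypothetical.

```python
import re

# Hypothetical toy frequency list (word -> occurrences per million words).
# These numbers are illustrative only and are not taken from any real corpus.
TOY_FREQUENCIES = {"i": 9000.0, "want": 700.0, "to": 25000.0, "jump": 30.0}


def profile_utterance(utterance, frequencies):
    """Pair each word of the utterance with its frequency, or None if unlisted."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    return [(token, frequencies.get(token)) for token in tokens]


if __name__ == "__main__":
    for word, freq in profile_utterance("I want to jump!", TOY_FREQUENCIES):
        print(word, freq)
```

Words missing from the list come back with None, which makes low-frequency or unlisted vocabulary easy to spot in a client's output.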
Another feature of the database is that words are
lemmatized, providing a level of analysis that has
implications for teaching vocabulary as word
sets rather than individual lexical items. For
example, if a client demonstrates the use of "jump",
"jumps", "jumped", "walks", and "walking", teaching
Next Steps
pedagogically value.
References
Arnott, J. L., & Alm, N. (2013). Towards the
improvement of Augmentative and Alternative
Communication through the modelling of
conversation. Computer Speech & Language, 27(6),
1194-1211.
Ball, L. J., & Lasker, J. (2013). Teaching Partners to
Support Communication for Adults with Acquired
Communication Impairment. Perspectives on
Augmentative and Alternative Communication, 22(1),
4-15.
Binger, C. (2008). Grammatical Morpheme Intervention
Issues for Students Who Use AAC. Perspectives on
Augmentative and Alternative Communication, 17(2),
62-68.
Binger, C., & Light, J. (2008). The morphology and
syntax of individuals who use AAC: research review
and implications for effective practice. Augmentative
and Alternative Communication, 24(2), 123-138.
Cross, R. T. (2010). Developing Evidence-Based Clinical
Resources Embedding. In Hazel Roddam and Jemma
Skeat Evidence-Based Practice in Speech and
Language Therapy (pp. 114-121): John Wiley & Sons,
Ltd.
Cross, R. T. (2012). Using AAC device-generated data to
develop therapy sessions. Paper presented at the
American Speech Hearing and Language Association
Annual Convention, Atlanta, GA.
Cross, R. T. (2013). The Value and Limits of Automated
Data Logging and Analysis in AAC Devices. Paper
presented at the ASHA Convention, Chicago, IL.
Davies, M. (2008-). The Corpus of Contemporary
American English: 425 million words, 1990-present.
Available online at http://corpus.byu.edu/coca
Garside, R. (1987). The CLAWS Word-tagging System.
In R. Garside, G. Leech & G. Sampson (Eds.), The
Computational Analysis of English: A Corpus-based
Approach (pp. 30-41). London: Longman.
Lesher, G. W., Moulton, B. J., Rinkus, G., &
Higginbotham, D. J. (2000). A Universal Logging
Format for Augmentative Communication. Paper
presented at the 2000 CSUN Conference, Los Angeles.
http://www.csun.edu/cod/conf/2000/proceedings/0088Lesher.htm
Miller, J., & Chapman, R. (1983). SALT: Systematic
Analysis of Language Transcripts. San Diego: College
Hills Press.
Romich, B. A., Hill, K. J., Seagull, A., Ahmad, N.,
Strecker, J., & Gotla, K. (2003). AAC Performance
Report Tool: PERT. Paper presented at the
Rehabilitation Engineering Society of North America
(RESNA) 2003 Annual Conference, Arlington, VA.
Travis, J., & Geiger, M. (2010). The effectiveness of the
Picture Exchange System (PECS) for children with
c.dayrell@lancaster.ac.uk
John Urry
Lancaster University
j.urry@lancaster.ac.uk
Marcus Müller
University of Heidelberg
marcus.mueller@gs.uni-heidelberg.de
Maria Cristina Caimotto
University of Turin
mariacristina.caimotto@unito.it
Tony McEnery
Lancaster University
a.mcenery@lancaster.ac.uk
Introduction
Results
Final Remarks
Acknowledgements
r.defelice@ucl.ac.uk
M. Lynne Murphy
University of Sussex
m.l.murphy@sussex.ac.uk
Claire Dembry
Cambridge University Press
cdembry@cambridge.org
Robbie Love
Lancaster University
r.m.love@lancaster.ac.uk
Introduction
Pilot study
Scaling up
Conclusion
Introduction
Some findings
Conclusion
References
Baker, P. 2006. Using Corpora in Discourse Analysis.
London: Continuum
Becker, G and Nachtigall, R.D. 1991. Ambiguous
responsibility in the doctor-patient relationship: The
case of infertility, In: Social Science & Medicine, 32:
8, pp 875-885,
Fox, NJ. 2006. Health Identities: From Expert Patient to
Resisting Consumer. In: Health, 10 (4): 461-479
Greil, A, Blevins-Slauson, K and McQuillan, J. 2010. The
experience of infertility: A review of recent literature.
In: Sociology of Health & Illness 32:1 pp. 140162
Letherby, G. 2002. Challenging Dominant Discourses:
identity and change and the experience of 'infertility'
and 'involuntary childlessness. In: Journal of Gender
Studies, 11:3 pp. 277-288
Sunderland, J. 2004. Gendered discourses. Basingstoke:
Palgrave Macmillan.
Thompson, C. 2007. Making parents: the ontological
choreography
of
reproductive
technologies.
Cambridge, Mass. ; London : MIT
Introduction
Previous studies
Methodology
Initial conclusions
Introduction
Design
Implementation
Table: (Dis)fluency annotation system. Features of the LINDSEI perceptive transcription (empty pause, filled pause, truncated word, foreign word, lengthening, false start, repetition, connector) are set against the annotation categories (unfilled pause, in ms, 3 subcategories; filled pause; truncated word, 3 subcategories; foreign word; lengthening; restart, 5 subcategories; discourse marker; editing term), with tags such as <FP>, <UP>, <RS>, <P> and <SP>. Examples from FR009, FR010 and FR011 (footnote 32): "I'd been (0.720) planning to to go"; "something to do with er politics"; "politics or (0.820) relations internationales"; "in a (0.280) well in a real (0.750) town".
32 Each interview in LINDSEI is identified by a specific code:
FR corresponds to the interviewee's mother tongue (here
French), and the three-figure number (001 to 050) refers to the
50 learners.
Figure: token-by-token annotation of example utterances (including FR027 and FR011), with words and timed pauses aligned to tags such as <FP>, <UP>, <DM>, <L>, <S>, <N>, <P>, <R0 ... R1>, <RS+T> and <SP>.
Dag Elgesem
University of Bergen
dag.elgesem@uib.no
Andrew Salway
Uni Research
andrew.salway@uni.no
Introduction
Method
https://code.google.com/p/justext/
Main findings
Acknowledgements
This research was supported by a grant from the
Research Council of Norway's VERDIKT program
(NTAP, project 213401). We are very grateful to
Knut Hofland for his role in creating the corpus
analysed here.
Corpus statistics: key issues and controversies (panel)
Stefan Evert
FAU Erlangen-Nürnberg
stefan.evert@fau.de
Gerold Schneider
University of Zürich
Gschneid@es.uzh.ch
Vaclav Brezina
Lancaster University
v.brezina@lancaster.ac.uk
stgries@linguistics.ucsb.edu
Jefrey Lijffijt
University of Bristol
jefrey.lijffijt@bristol.ac.uk
Paul Rayson
Lancaster University
p.rayson@lancaster.ac.uk
Sean Wallis
University College London
s.wallis@ucl.ac.uk
Andrew Hardie
Lancaster University
a.hardie@lancaster.ac.uk
Motivation
Speakers
Visualisation
stefan.evert@fau.de
Thomas Proisl
FAU Erlangen-Nürnberg, Germany
thomas.proisl@fau.de
Christof Schöch
University of Würzburg, Germany
christof.schoech@uni-wuerzburg.de
Fotis Jannidis
University of Würzburg, Germany
fotis.jannidis@uni-wuerzburg.de
Steffen Pielström
University of Würzburg, Germany
pielstroem@biozentrum.uni-wuerzburg.de
Thorsten Vitt
University of Würzburg, Germany
thorsten.vitt@uni-wuerzburg.de
Introduction
Previous work
Current research
References
adriano.ferraresi@unibo.it
Silvia Bernardini
University of Bologna
silvia.bernardini@unibo.it
Maja Miličević
University of Belgrade
m.milicevic@fil.bg.ac.rs
Introduction
Background: collocations in interpreting/translation
Results
Method
Coeff.    SE       Z        p
 0.612    0.241    2.539    <.05
-1.697    0.307   -5.526    <.001
-0.995    0.256   -3.885    <.001
 1.505    0.384    3.917    <.001
 0.069    0.221    0.312    ns
-1.128    0.246   -4.582    <.001
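The Z values reported above appear consistent with Wald statistics, that is, each coefficient divided by its standard error. That reading is an assumption, since the abstract does not state how Z or p was computed. The sketch below checks it using a two-sided p-value from the standard normal distribution; the function names are hypothetical.

```python
import math


def wald_z(coefficient, standard_error):
    """Wald statistic: the estimate divided by its standard error."""
    return coefficient / standard_error


def two_sided_p(z):
    """Two-sided p-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # (coefficient, standard error) pairs copied from the table above.
    rows = [(0.612, 0.241), (-1.697, 0.307), (0.069, 0.221)]
    for coeff, se in rows:
        z = wald_z(coeff, se)
        print(f"{coeff:7.3f} {se:6.3f} z={z:7.3f} p={two_sided_p(z):.4f}")
```

Small differences in the third decimal place are expected, because the published coefficients and standard errors are themselves rounded.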
References
Baroni, M., Bernardini, S., Ferraresi, A. and Zanchetta, E.
2009. The Wacky Wide Web: A collection of very
large linguistically processed web-crawled corpora.
Language Resources and Evaluation 43 (3): 209226.
Bates, D. 2005. Fitting linear models in R: Using the
lme4 package. R News 5: 27-30.
Bernardini, S., Ferraresi, A. and Milievi, M.
Provisionally accepted. From EPIC to EPTIC
Exploring simplification in interpreting and translation
from an intermodal perspective. Target.
Dayrell, C. 2007. A quantitative approach to compare
108
Introduction
Conclusion
Federica Formato
Lancaster University
federicaformato.ac@gmail.com
Introduction
Data
24102
10147
8079
5879
20443
7396
7331
5716
24508
11086
9985
3437
70715
Results
                      Monti     Letta     Renzi
Unmarked forms   AF   4096      4964      3963
                 %    91.38     88.94     89.72
Marked forms     AF   365       604       442
                 %    8.14      10.82     10.00
                 AF   21        20        12
                 %    0.46      0.35      0.27
Figure 1 Trends in the use of sub-categories of unmarked and marked forms divided by newspapers (RC, CS, LS) and across governments (Monti, Letta, Renzi); vertical axis from 0.00% to 80.00%
Conclusions
francesca.frontini@ilc.cnr.it
Mohamed Amine Boukhaled
Labex OBVIL
mohamed.boukhaled@lip6.fr
References
Baker, P. (2008). Sexed Texts: Language, Gender and
Sexuality. London: Equinox.
Baker, P. (2014). Using corpora to analyze gender.
London & New York: Bloomsbury.
Formato, F. (2014) Language use and gender in the Italian
Parliament. PhD thesis, Lancaster University.
Retrieved
from
http://www.research.lancs.ac.uk/portal/en/publications/
language-use-and-gender-in-the-italianparliament(12ab6d96-d35e-4062-962835036d8fadad).html
Fusco, F. (2012) La lingua e il femminile nella
lessicografia italiana. Tra stereotipi e (in)visibilit.
Alessandria: Edizioni dellOrso.
McEnery, T., & Wilson, A. (2001). Corpus Linguistics.
Edinburgh: Edinburgh University Press.
Mills, S. (2008). Language and Sexism. Cambridge:
Cambridge University Press.
Robustelli, C. (2012). Luso del genere femminile
nellitaliano contemporaneo: teoria, prassi e proposte.
In Cortellazzo, M. (Ed.), Politicamente o
Linguisticamente Corretto? Maschile e Femminile: Usi
Correnti della Denominazione di Cariche e Professioni
(pp.
1-18).
Retrieved
from
http://ec.europa.eu/translation/italian/rei/meetings/docu
ments/decima_giornata_rei_novembre_2010_it.pdf.
114
jean-gabriel.ganascia@lip6.fr
http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
Raisonneur    Counterpart
Chrysalde     Arnolphe
Ariste        Sganarelle
Cléante       Orgon
Phylinte      Alceste
Béralde       Argan
Preliminary conclusions
Acknowledgements
This work was supported by French state funds
managed by the ANR within the Investissements
d'Avenir programme under reference ANR-11-IDEX-0004-02,
as well as by a scholarship from the
Fondation Maison Sciences de l'Homme, Paris.
antonio.fruttaldo@unina.it
Introduction
Introduction
Methodology
38 Biber et al. (1999: 485) categorize modal verbs into two types
according to their meanings: intrinsic and extrinsic. These
two types are also called deontic and epistemic, respectively.
39 Zemach and Islam (2011) is a new edition of Zemach and
Islam (2005); the content is much the same, with some
descriptions updated. The 2005 edition was used for the students in
2009 and 2010, and the 2011 edition for those in 2012.
Conclusion
Acknowledgements
I am deeply grateful to Professor Geoffrey Leech for
his valuable comments, suggestions and warm
encouragement. I pray for my great mentor
Professor Geoffrey Leech's eternal happiness and
peace. I am also very grateful to Professor
Willem Hollmann for his helpful comments and
suggestions. All errors and inadequacies are my
own.
I would also like to express my deep gratitude to
Professor Sylviane Granger for her kind permission
to use the Louvain Corpus of Native English Essays
44
References
fukham@yahoo.com
Dana Gablasova
Lancaster University
d.gablasova@lancaster.ac.uk
Vaclav Brezina
Lancaster University
v.brezina@lancaster.ac.uk
Introduction
Method
Procedure
        CAND INT        CAND DISC       EX-INT          EX-DISC
        Freq.   %       Freq.   %       Freq.   %       Freq.   %
        16      39.0    57      54.8    35      68.6    65      71.4
        18      43.9    33      31.7    10      19.6    9       9.9
        7       17.1    14      13.5    6       11.8    17      18.7
Total   41      100     104     100     51      100     91      100
Conclusion
Sheena Gardner
Coventry University
sheena.gardner@coventry.ac.uk
Douglas Biber
Northern Arizona University
douglas.biber@nau.edu
Hilary Nesi
Coventry University
h.nesi@coventry.ac.uk
Introduction
Level 1   Level 2   Level 3   Level 4
255       229       160       80
188       206       120       205
181       154       156       133
216       198       170       207
1.4   1.4   1.5   2.0
More Impersonal
5.1   5.6   5.7   6.3
Less overtly Persuasive
2.7   2.8   3.0   3.2
More Elaborated
12.7  13.9  14.7  17.2
1     2     3     4
Less Narrative
More Informational
LEVEL (Year)
5.9   6.2   6.4   5.5
2.1   3.0   3.0   3.7
5.7   6.5   5.7   4.4
2.3   1.3   1.5   1.2
Impersonal
Elaborated
13.4  15.3  15.6  13.4
Non Narrative
AH    SS    LS    PS
Informational
Disciplinary Group
5.5   6.2   5.7   6.5
Level     N texts   Factor 1     Factor 2     Factor 3     Factor 4
Level 1   795       -2.0349 A     1.5277 A     0.9677 A     0.1515 A
Level 2   754       -0.6158 B     0.4411 B     0.3448       0.3219 A
Level 3   589        0.1279 B    -0.484 C     -0.0385       0.1356 A
Level 4   598        3.3557 C    -2.1104      -1.6833      -0.7409
BA
Group   N texts   Factor 1        Factor 2        Factor 3        Factor 4
AH      654        0.8968525 B     2.8767694 A     4.6292360 A    -1.3375604 C
SS      698        4.7346048       0.9786615 B     1.6961467 B     0.2192967 B
LS      611       -0.9857905      -1.0196053      -2.6694000       0.3171914 BA
PS      773       -5.8237468      -3.1525827      -4.0513920       0.7918961 A
Federico Gaspari
UniStraDA
gaspari@unistrada.it
Marco Venuti
University of Catania
mvenuti@unict.it
Keyword-related issues in corpus linguistics
Related work
Starting from the data used in two earlier corpus-based
variationist phraseological studies (Gaspari
2013, 2014), we evaluate the previous results by
comparing them with those obtained with the
approach suggested by Gabrielatos and Marchi
(2012). Our main aim is to test their approach to
keyword identification, starting from the results of
the previous analysis. In other words, we want to
show to what extent the new approach contributes
to a different interpretation and to a finer-grained
analysis of differences and similarities across our
corpora.
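The approach of Gabrielatos and Marchi (2012) is understood here as centring on an effect-size measure, %DIFF: the percentage difference between a word's normalised frequency in the study corpus and in the reference corpus. That reading, the function names and the handling of a zero reference frequency are assumptions made for this sketch, not details given in the abstract.

```python
import math


def normalised_frequency(raw_count, corpus_size, per=1_000_000):
    """Frequency per `per` words (per million by default)."""
    return raw_count / corpus_size * per


def percent_diff(count_study, size_study, count_ref, size_ref):
    """%DIFF = (NF_study - NF_ref) * 100 / NF_ref."""
    nf_study = normalised_frequency(count_study, size_study)
    nf_ref = normalised_frequency(count_ref, size_ref)
    if nf_ref == 0:
        # Assumption of this sketch: report an unbounded difference
        # instead of substituting a small constant for the zero.
        return math.inf if nf_study > 0 else 0.0
    return (nf_study - nf_ref) * 100 / nf_ref


if __name__ == "__main__":
    # Invented counts for two corpora of 1.3 million words each.
    print(percent_diff(130, 1_300_000, 65, 1_300_000))
```

Unlike a significance score, this value does not grow with corpus size, so it can be used to rank candidate keywords by how large the frequency difference is.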
Our analyses are based on two corpora. The first
includes official biographical profiles and award
motivations of Nobel Prize winners between 1901
and 2013, while the second consists of maiden
speeches (i.e. speeches delivered by new members
of the British Parliament when they address the
House for the first time) between 1983 and 2011.
Each corpus contains around 1.3 million words, with
male and female components, representing forms of
Institutional Discourse (Drew and Heritage 1992),
and displaying features of established genres
together with the evaluation of personal
achievements (official biographical profiles of Nobel
Prize winners and motivations of Nobel Prize
awards) and the expression of personal style (maiden
speeches).
Concluding remarks
References
Archer, D. (ed.) 2009. What's in a Word-List?
Investigating Word Frequency and Keyword
Extraction. Farnham: Ashgate.
Baker, P. 2004. Querying Keywords: Questions of
Difference, Frequency, and Sense in Keywords
Analysis. Journal of English Linguistics 32 (4): 346-359.
Baker, P. 2011. Times may change, but we will always
References
Baker, P. et al. 2008. A Useful Methodological Synergy?
Combining Critical Discourse Analysis and Corpus
Linguistics to examine Discourses of Refugees and
Asylum seekers in the UK Press. Discourse and
Society 19 (3): 273-306.
Baker, P. 2006. Using Corpora in Discourse Analysis:
London and New York: Continuum.
Capelli, G. 2006. Sun, Sea, Sex and unspoilt Countryside:
How the English Language makes Tourists out of
Readers: Pari, Pari Publishing.
Dann, G. M. S. 1996. The Language of Tourism: A
Sociolinguistic Perspective: Oxon, CAB International.
Fairclough, N. and Wodak, R. 1997. Critical Discourse
Analysis in T. A. van Dijk (Ed) Discourse as Social
Interaction: London: Sage Publications, pp 258-284.
Fina, M. E. 2011. What a TripAdvisor Corpus can tell us
about Culture: The Journal of Intercultural Mediation
and Communication, Vl4 pp 59-80.
Methods of characterizing
discontinuous lexical frames:
Quantitative measurements of
predictability and variability
Douglas Biber
Northern Arizona University
douglas.biber@nau.edu
Joe Geluso
Iowa State University
jgeluso@iastate.edu
Bethany Gray
Iowa State University
begray@iastate.edu
Introduction
Methods
Register  Pattern  % most frequent filler  P(filler | frame)  Type-token ratio  % of frame with unique filler
ACAD      1*34     19%                     0.11               0.41              30%
ACAD      12*4     14%                     0.10               0.43              31%
CONV      1*34     46%                     0.29               0.20              14%
CONV      12*4     44%                     0.29               0.19              12%
References
Biber, D. 2009. A corpus-driven approach to formulaic
language in English. International Journal of Corpus
Linguistics 14 (3): 275-311.
Biber, D., Johansson, S., Leech, G., Conrad, S. and
Finegan, E. 1999. Longman grammar of spoken and
written English. London: Longman.
Butler, C. 1998. Collocational frameworks in Spanish.
International Journal of Corpus Linguistics 3 (1): 1-32.
Eeg-Olofsson, M. and Altenberg, B. 1994.
Discontinuous recurrent word combinations in the
London-Lund Corpus. In U. Fries, G. Tottie, and P.
Schneider (eds.) Creating and using English language
corpora. Papers from the Fourteenth International
Conference on English Language Research on
Computerized Corpora, Zürich 1993. Amsterdam:
Rodopi.
Fletcher, W. 2003/2004/2011: online. Phrases in English.
Available at: http://phrasesinenglish.org/ (accessed
January 2015).
Gray, B. and Biber, D. 2013. Lexical frames in academic
Nicholas A. Lester
University of California, Santa Barbara
nicholas.a.lester@gmail.com
stgries@gmail.com
Stefanie Wulff
University of Florida
swulff@ufl.edu
Introduction
Extension 1: Surprisal
Extension 3: MuPDAR
Initial results
References
Gries, St.Th. and Adelman, A.S. 2014. Subject
realization in Japanese conversation by native and non-native speakers: exemplifying a new paradigm for
learner corpus research. Yearbook of Corpus
Linguistics and Pragmatics 2014: New empirical and
theoretical paradigms. Cham: Springer.
Gries, St.Th. and Deshors, S.C. 2014. Using regressions
Jack Grieve
Aston University
j.grieve1@aston.ac.uk
Andrea Nini
Aston University
a.nini1@aston.ac.uk
Diansheng Guo
University of South Carolina
guod@mailbox.sc.edu
Alice Kasakoff
University of South Carolina
kasakoff@mailbox.sc.edu
Introduction
The corpus
Methods
Results
Figure 4: Relative frequency of nf over time
Visual inspection of the examples above and of the
other scatterplots suggests that the emergence of new
words follows an S-shaped curve of diffusion
(Rogers, 2003), whereas the decline of old words
follows a steadier, almost linear decrease.
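The contrast between the two trajectories can be pictured with a toy model. The sketch below is purely illustrative: it assumes a logistic function for the S-shaped rise and a straight line for the decline, with invented parameter values, and it is not fitted to the data behind the scatterplots.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """S-shaped growth: slow start, rapid middle phase, levelling off at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def linear_decline(t, start, slope):
    """Steady decrease from `start`, floored at zero."""
    return max(0.0, start - slope * t)


# Hypothetical time axis (e.g. days) and hypothetical parameters.
time_points = list(range(0, 101, 20))
rising = [logistic(t, ceiling=1.0, rate=0.1, midpoint=50) for t in time_points]
falling = [linear_decline(t, start=1.0, slope=0.01) for t in time_points]

for t, up, down in zip(time_points, rising, falling):
    print(f"t={t:3d}  rising={up:.3f}  falling={down:.3f}")
```

In the logistic series the step-to-step gains are largest around the midpoint, whereas the linear series loses the same amount at every step.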
Discussion
Conclusions
Acknowledgements
This research is funded by the Economic and Social
Research Council, the Arts and Humanities Research
Council, and JISC in the United Kingdom and by
the Institute of Museum and Library Services in the
United States, as part of the Digging into Data
Challenge.
References
Doyle, G. (2014) Mapping dialectal variation by querying
social media, In Proceedings of the 14th Conference of
the European Chapter of the Association for
Computational Linguistics.
Eisenstein, J., O'Connor, B., Smith, N. and Xing, E.
(2012) Mapping the geographical diffusion of new
words, arXiv:1210.5268 [cs.CL], pp. 113, Available
from: http://arxiv.org/abs/1210.5268 (Accessed 13
June 2014).
Labov, W. (1995) Principles of Linguistic Change.
Volume I: Internal Factors, Oxford, Blackwell.
Rogers, E. M. (2003) Diffusion of Innovations, New
York, Free Press.
Smith, A. and Brenner, J. (2012) Twitter use 2012, Pew
Introduction
Context
experiences.
I use two corpora: a small, focused corpus (166
texts, 108,643 words) of news texts reporting on
Lucy Meadows between October 2012 and October
2013, and a reference corpus (7000 texts, 3,954,808
words) of news texts sampled from the same time
period. The gendered pronouns she and her emerged
as key terms. By examining gendered pronouns
when used to refer to Meadows, I found that he was
overwhelmingly used before Meadows' death,
when she had already expressed her intention to live
and work full-time as female. Media reporting of
Meadows' transition appears to dismiss her gender
identity in favour of presenting her as the sex she
was assigned at birth. This finding appears to
reinforce observations by Serrano and Trans Media
Watch.
               he    she
Before death   124   20
After death    38    451
Table 1: Pronoun use before and after Meadows' death
However, as Table 1 shows, female pronouns were
overwhelmingly used after her death. This is
probably due to several factors, not least campaigns
for improved reporting on trans issues by activists.
The data indicates that, while tabloid
misgendering is an issue, the situation is
complicated by use of direct and indirect quotations.
Direct quotations account for 65 of the 124
occurrences of he before death and 10 of the 38
occurrences of he after death. Of these, repetition
accounted for a considerable percentage of
occurrences: there were 33 occurrences of a single
sentence ("he's not only in the wrong body...he's in
the wrong job") from an article by Richard
Littlejohn, a columnist for the Daily Mail.
However, not all of these repetitions were uncritical
reproductions of Littlejohn's writing. Instead,
journalists were criticising Littlejohn but, in doing
so, were also reproducing transphobic text.
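The proportions behind these observations follow directly from the counts in Table 1 and the quotation counts reported above. The short sketch below simply recomputes them; the variable and function names are illustrative only.

```python
# Counts as reported in Table 1.
counts = {
    "before": {"he": 124, "she": 20},
    "after": {"he": 38, "she": 451},
}
# Occurrences of "he" that fall inside direct quotations, as reported in the text.
direct_quote_he = {"before": 65, "after": 10}


def share(part, whole):
    """Percentage of `whole` accounted for by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of "he" among the two pronouns in each period.
he_share = {period: share(c["he"], c["he"] + c["she"]) for period, c in counts.items()}
# Share of the "he" occurrences that come from direct quotations.
quoted_share = {period: share(direct_quote_he[period], counts[period]["he"]) for period in counts}

print(he_share)
print(quoted_share)
```

On these figures, he makes up 86.1% of the two pronouns before death and 7.8% after it, and direct quotations account for 52.4% and 26.3% of the occurrences of he respectively.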
Conclusions
Leveson, B. (2012). An inquiry into the culture, practices
and ethics of the press. London: The Stationery Office.
McNeil, J., Bailey, L., Ellis, S., Morton, J. and Regan, M.
(2012). The Trans Mental Health Study. Retrieved
from
http://www.gires.org.uk/assets/MedproAssets/trans_mh_study.pdf
Serrano, J. (2007). Whipping Girl: a transsexual woman
on sexism and the scapegoating of femininity. Berkeley:
Seal Press.
Trans Media Watch. (2011). The British Press and the
Transgender Community: Submission to The Leveson
Inquiry into the culture, practice and ethics of the
press. Retrieved from
http://www.levesoninquiry.org.uk/wpcontent/uploads/2012/02/Submission-by-Trans-MediaWatch.pdf
Trans Media Watch. (2013). Trans Media Watch
responds to Chelsea Manning coming out. Retrieved
from
http://www.transmediawatch.org/Documents/Press_Release-20130822.pdf
References
Baker, P., (2005). Public Discourses of Gay Men.
London: Routledge
Baker, P. (2008). Sexed Texts: Language, gender and
sexuality. London: Continuum.
Baker, P. (2014). Using Corpora to Analyze Gender.
London: Bloomsbury.
Krehely, J. (2013). Pvt. Chelsea E. Manning Comes Out,
Deserves Respectful Treatment by Media and
Officials.
Retrieved
from
http://www.hrc.org/blog/entry/pvt.-chelsea-e.manning-comes-out-deserves-respectful-treatment-by48
Introduction
Lexical Selection
Zooniverse data
Study of but I
Figure 2: Most frequent L1 and R1 items co-occurring with I in dark matter corpus
Conclusion
References
Dawkins, R. 1976. The Selfish Gene. Oxford: Oxford
University Press.
Hadikin, G. 2014. Lexical Selection and the Evolution of
Language Units. Manuscript submitted for publication.
Michael Handford
Tokyo University
mjahandford@gmail.com
References
Baker, M. (1995). Corpora in Translation Studies: An
Overview and Some Suggestions for Future Research.
Target, 7 (2), 223-243.
Baker, P. (2006). Using Corpora in Discourse Analysis.
London: Continuum.
Benwell, B. & Stokoe, E. (2006). Discourse and Identity.
Edinburgh: Edinburgh University Press.
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus
Linguistics: Investigating Language Structure & Use.
Cambridge: Cambridge University Press.
Bucholtz, M., & Hall, K. (2005). Identity and interaction:
A sociocultural linguistic approach. Discourse Studies,
7 (4-5), 585-614.
Collier, M. J., & Thomas, M. (1988). Cultural Identity:
An Interpretive Perspective. In Y. Y. Kim & W. B.
Gudykunst (Eds.), Theories in Intercultural
Communication (pp. 99-120). Newbury Park, CA:
Sage.
Connor, U., Ruiz-Garrido, M., Rozycki, W., Goering, E.,
Kinney, E., & Koehler, J. (2008). Patient-directed
medicine labeling: Text differences between the United
States and Spain. Communication & Medicine, 5 (2),
117-132.
Dervin, F. (2012). Cultural identity, representation and
Othering. In J. Jackson (Ed), Routledge Handbook of
Intercultural Communication (pp. 181-194).
Abingdon: Routledge.
Fahey, M. (2005). Speech acts as intercultural danger
zones: a cross cultural comparison of the speech act of
apologizing in Irish and Chilean soap operas. Journal
of Intercultural Communication, 8, 1404-1634.
Gee, J.P. (2005). An Introduction to Discourse Analysis.
Abingdon: Routledge.
Handford, M. (2010). The Language of Business
Meetings. Cambridge: Cambridge University Press.
Handford, M. (2014). Cultural identities in international,
interorganisational meetings: a corpus-informed
discourse analysis of indexical we, Language and
Intercultural Communication, (14) 1, 41-58.
Handford, M. (2015). Corpus Linguistics. In Zhu Hua
(Ed.) Research Methods in Intercultural
Communication. Oxford: Wiley-Blackwell.
Hofstede, G. (1991). Culture and Organisations. New
York: McGraw-Hill.
Holliday, A. (1999). Small Cultures. Applied Linguistics,
20 (2), 237-264.
Koester, A. J. (2004). Relational sequences in workplace
Introduction
Collocation analysis
Type   Token
66     1278
10     169
8      95
6      91
Transitivity analysis
Participant roles   No.
Ca./Idd.            243
Att./Idr.           45
Goal                159
Actor               54
Phenomenon          61
Senser              2
Verbiage            33
Sayer               4
Ca. stands for Carrier, Idd. for Identified, Att. for Attribute,
and Idr. for Identifier.
Participant roles   No.
Ca./Idd.            198
Att./Idr.           43
Goal                136
Actor               22
Phenomenon          30
Senser              0
Verbiage            19
Sayer               2
Conclusion
Acknowledgements
This study is supported by China Scholarship
Council (No. 2012[3024]) and The Ministry of
Education, P. R. China (No. 14YJCZH148).
References
Alvaro J. 2013. Discursive representations of a dissident:
The case of Liu Xiaobo in China's English press.
Discourse & Society 24(3): 289-314.
Baker, P. 2006. Using corpora in discourse analysis.
London: Continuum.
Baker, P. 2010. Sociolinguistics and corpus linguistics.
Edinburgh: Edinburgh University Press.
Baker, P., et al. 2008. A useful methodological synergy?
Combining critical discourse analysis and corpus
linguistics to examine discourses of refugees and
asylum seekers in the UK press. Discourse & Society
19(3): 273-306.
Baker, P., Gabrielatos, C. and McEnery, T. 2013a.
Sketching Muslims: A corpus driven analysis of
representations around the word 'Muslim' in the British
press 1998-2009. Applied Linguistics 34(3): 255-278.
Baker, P., Gabrielatos, C. and McEnery, T. 2013b.
Discourse analysis and media attitudes. Cambridge:
CUP.
Baker, P. (ed.). 2015. Special issue in Discourse &
Communication 9(2).
Dunmire, P. 2009. 9/11 changed everything: An
intertextual analysis of the Bush doctrine. Discourse &
Society 20(2): 195-222.
Fairclough, N. 1995. Critical discourse analysis: The
critical study of language. London: Longman.
Fairclough, N. 2003. Analysing discourse: Textual
analysis for social research. London: Routledge.
Halliday, M. A. K. and Matthiessen, C. M. I. 2004. An
introduction to functional grammar. 3rd edition.
London: Edward Arnold.
Hart, C. 2014. Discourse, grammar and ideology:
Functional and cognitive perspectives. London:
Bloomsbury.
Kilgarriff, A., et al. 2004. The sketch engine. In:
Proceedings of Euralex, pp. 105-116.
McEnery, T and Hardie, A. 2012. Corpus linguistics:
Method, theory and practice. Cambridge: CUP.
Mark McGlashan
Lancaster University
c.hardaker@lancaster.ac.uk
m.mcglashan@lancaster.ac.uk
Introduction
Research questions
Data
Scope
Acknowledgements
This work was supported by the Economic and
Social Research Council [grant number
ES/L008874/1].
References
CPS. 2013. 'CPS authorises charges in Twitter-related
cases.' Available online at
http://www.cps.gov.uk/news/latest_news/stella_creasy_mp_caroline_criado-perez/
(Accessed 30 September 2014)
Criado-Perez, C. 2013. 'We Need Women on British
Banknotes.' Available online at
http://www.change.org/en-GB/petitions/we-needwomen-on-british-banknotes
(Accessed 10 August 2013)
Zappavigna, M. 2014. 'Enacting identity in microblogging
through ambient affiliation.' Discourse &
Communication no. 8 (2): 209-228.
Yasemin Bayyurt
Boğaziçi University
bayyurty@boun.edu.tr
including samples coming from non-native pre-service
English language teachers in Turkey, and
then to use this learner corpus both in creating
teaching materials and in implementing data-driven
learning in undergraduate courses which are part of
the curriculum of the English Language Teacher
training programs in Turkey.
The corpus created for this project is a specialised
Turkish English Exam Corpus (TEEC), which has
been compiled by a research team at Middle East
Technical University (METU), Ankara, Turkey. The
corpus consists of 1,914 Linguistics and ELT exam
papers (955,483 words) written under timed
conditions, with no access to reference materials,
by students at the Foreign Language Education
(FLE) department at METU between
January 2005 and December 2012. Only exam
papers were included in the corpus since the aim was
to collect spontaneous data, which are more
realistic representations of the English of the
pre-service English language teachers (Ellis 2001;
Selinker 1972). Since the aim in creating this corpus
was to identify the characteristics of the English
used by pre-service English language teachers (who
were also advanced learners of English), it was
decided that the corpus would be more useful to
potential users if it were tagged for features such as
orthography, punctuation and grammar (i.e., word
formation, agreement, tense, mood, word order) as
well as discoursal, pragmatic and rhetorical
characteristics. The corpus was annotated
using the EXMARaLDA Partitur-Editor (EXMARaLDA:
Extensible Markup Language for Discourse
Annotation; http://exmaralda.org/).
The analysis of the learners' errors
comprised three stages: (1) identification and
isolation, (2) supplying the target form and (3)
classification of the problem. The classification of
the identified problems was done
using the scheme devised by Dulay et al. (1982),
which includes categories such as omission,
addition, misformation and misordering.
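One way to picture the three-stage procedure is as a record that is filled in step by step. The sketch below is hypothetical: the class and field names are invented for illustration and do not reflect the actual METU TEEC annotation format. The four categories are those of the Dulay et al. (1982) surface-strategy taxonomy, and the example sentence is made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    """Categories of the Dulay et al. (1982) surface-strategy taxonomy."""
    OMISSION = "omission"
    ADDITION = "addition"
    MISFORMATION = "misformation"
    MISORDERING = "misordering"


@dataclass
class ErrorAnnotation:
    erroneous_form: str                       # stage 1: identified and isolated span
    target_form: Optional[str] = None         # stage 2: supplied target form
    category: Optional[ErrorCategory] = None  # stage 3: classification of the problem

    def is_complete(self) -> bool:
        """True once all three stages have been carried out."""
        return self.target_form is not None and self.category is not None


# Invented learner sentence with a missing article.
annotation = ErrorAnnotation(erroneous_form="He is teacher")
annotation.target_form = "He is a teacher"
annotation.category = ErrorCategory.OMISSION
print(annotation.is_complete(), annotation.category.value)
```

Keeping the target form separate from the category means that an annotator can complete stage 2 without yet committing to a classification.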
The first course where the usefulness of the
METU TEEC was tested was The English Lexicon
(TEL) course. TEL is one of the compulsory courses in the
curriculum of the English Language Teaching
Programs in Turkey and its main goal is to present,
discuss and analyze topics that are difficult for
native speakers of Turkish learning English. By
focusing on those problematic topics the course aims
not only to equip students with tools that will help
them do in-depth analyses of the linguistic data
coming from non-native speakers of English but also
to assist them in improving their own knowledge of
the target language. Since the compilation of the
METU TEEC, the topics included in the course
outline and the teaching methodology followed in
References
Aktas, T. 2005. Yabanci Dil Ogretiminde Iletisimsel Yeti.
Journal of Language and Linguistic Studies, 1(1), 89-100.
Bayyurt, Y. 2010. Author positioning in academic
writing. In S. Zyngier and V. Viana (Eds), Avaliações
e Perspectivas: Mapeando os Estudos Empíricos na
Área de Humanas (Appraisals and Perspectives:
Mapping Empirical Studies in the Humanities) (pp.
163-184). Rio de Janeiro: The Federal University of
Rio de Janeiro.
Bayyurt, Y. 2012. Proposing a model for English
language education in the Turkish socio-cultural
context. In Yasemin Bayyurt and Yeşim Bektaş-Çetinkaya (Eds.), Research Perspectives on Teaching
and Learning English in Turkey: Policies and
Practices (pp. 301-312). Berlin: Peter Lang.
Doğançay-Aktuna, S. 1998. The spread of English in
Turkey and its current sociolinguistic profile. Journal
of Multilingual and Multicultural Development, 19 (1),
23-39.
Dulay, H. C., Burt, M. K. and Krashen, S. 1982.
Language Two. New York: Oxford University Press.
Ellis, R. 2001. The Study of Second Language
Acquisition. Oxford: Oxford University Press.
Hatipoğlu, Ç. 2010. Summative Evaluation of an
Undergraduate English Language Testing and
Evaluation Course by Future English Language
Teachers. English Language Teacher Education and
Development (ELTED), 13 (Winter 2010), 40-51.
Hatipoğlu, Ç. 2013. First Stage in The Construction Of
METU Turkish English Exam Corpus (METU TEEC).
Boğaziçi University Journal of Education, 30 (1), 5-23.
Isik, A. 2008. Yabanci Dil Egitimimizdeki Yanlislar
Nereden Kaynaklaniyor? Journal of Language and
Linguistics, 4(2), 15-26.
Kızıldağ, A. 2009. Teaching English in Turkey:
Dialogues with teachers about the challenges in public
primary schools. International Electronic Journal of
Elementary Education, 1 (3), 188-201.
Oguz, E. 1999. İlköğretimde Yabancı Dil (İngilizce)
Öğretimi Sorunları (The Problems of foreign language
(English) teaching in elementary schools).
Unpublished Master Thesis. Kocaeli University:
Kocaeli, Turkey.
Background
http://www.bbc.co.uk/news/world-us-canada-27562917
http://abclocal.go.com/three/kabc/kabc/My-TwistedWorld.pdf
Methodology
Analysis
Collocates with:   MI value
women              8.53
girl(s)            8.43
girl(s)            8.36
girlfriend(s)      8.30
girl(s)            7.74
women              7.71
women              7.21
girl(s)            6.90
girl(s)            6.56
girl(s)            6.45
girl(s)            6.29
girlfriend(s)      6.27
girl(s)            6.23
women              5.40
boy(s)             8.60
men                7.65
boy(s)             6.83
men                6.49
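For readers unfamiliar with the measure, the MI values above can be read against the standard pointwise formulation, MI = log2(observed / expected), where the expected co-occurrence frequency is derived from the frequencies of the two words and the corpus size. The sketch below assumes this simple version without a window-size correction, which may differ in detail from the calculation performed by the concordancing software; the function name and all counts are invented.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Pointwise mutual information of a node word and a collocate (log base 2)."""
    if pair_freq <= 0:
        raise ValueError("MI is undefined when the pair never co-occurs")
    # Co-occurrences expected if the two words were distributed independently.
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)


# Invented counts: the pair occurs 8 times, the node 400 times and the
# collocate 20 times in a corpus of one million tokens.
mi = mutual_information(pair_freq=8, node_freq=400, collocate_freq=20, corpus_size=1_000_000)
print(round(mi, 2))
```

Because the score is a logarithm of a ratio, each additional point of MI corresponds to a doubling of the observed frequency relative to the expected one, and rare collocates with few co-occurrences can score very highly.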
Conclusion
References
Anthony, L. 2014. AntConc (Version 3.4.3w) [Computer
Software]. Tokyo, Japan: Waseda University.
Available at: http://www.laurenceanthony.net/
Caldas-Coulthard, C. & Moon, R. 2010. Curvy, Hunky,
Kinky: Using corpora as tools for critical analysis.
Discourse and Society, 21(2), pp. 99-133.
Durrant, P. & Doherty, A. 2010. Are high frequency
collocations psychologically real? Investigating the
thesis of collocational priming. Corpus Linguistics and
Linguistic Theory, 6(2), pp. 125-155.
Ferraresi, A., Zanchetta, E., Baroni, M., & Bernardini, S.
2008. Introducing and evaluating ukWaC, a very large
web-derived corpus of English. In Proceedings of the
WAC4 Workshop at LREC 2008, Marrakech, Morocco.
Herdağdelen, A. & Baroni, M. 2011. Stereotypical gender
actions can be extracted from web text. Journal of the
American Society for Information Science and
Technology, 62(9), pp. 1741-1749.
Horvath, M., Hegarty, P., Tyler, S. & Mansfield, S.
2012. "Lights on at the end of the party": Are lads'
mags mainstreaming dangerous sexism? British
Journal of Psychology, 103, pp. 454-471.
Hunston, S. 2002. Corpora in applied linguistics.
Cambridge: Cambridge University Press.
Jane, E. 2014. Back to the kitchen, cunt: speaking the
unspeakable about online misogyny. Continuum, 28(4),
pp. 558-570.
Marcotte, A. 2014, May 30. 4 myths about sex and
women that prop up the new misogyny. Retrieved
from: http://www.salon.com/2014/05/30/4_myths_about_sex_and_women_that_prop_up_the_new_misogyny_partner/
Pearce, M. 2008. Investigating the collocational
behaviour of MAN and WOMAN in the BNC using
Sketch Engine. Corpora, 3(1), pp. 1-29.
Rayson, P. 2008. From key words to key semantic
domains. International Journal of Corpus Linguistics,
13(4), pp. 519-549.
Rayson, P. 2009. Wmatrix: a web-based corpus
processing environment, Computing Department,
Lancaster University. Available at:
http://ucrel.lancs.ac.uk/wmatrix/
Wilson, A. & Rayson, P. 1993. Automatic content
Introduction
References
Arnon, I. and Snider, N. 2010. More than words:
Frequency effects for multi-word phrases. Journal of
Memory and Language 62 (1): 67-82.
Ashcraft, M. H. and Radvansky, G. A. 2010. Cognition
(5th edn.). London: Pearson.
Harley, T. A. 2008. The psychology of language: From
data to theory (3rd edn.). New York: Psychology
Press.
Huang, P. Y., Wible, D. and Ko, H. W. 2012. Frequency
effects and transitional probabilities in L1 and L2
speakers' processing of multiword expressions. In S.
Th. Gries and D. Divjak (eds.). Frequency effects in
language learning and processing. Berlin: De Gruyter
Mouton.
Richard Bowker
Rhodes University
s.hunt@ru.ac.za
r.bowker@ru.ac.za
References
Andersen, G. 2011. Corpora as lexicographical basis:
The case of anglicisms in Norwegian. Studies in
Variation, Contacts and Change in English (VARIENG) 6.
http://www.helsinki.fi/varieng/journal/volumes/06/andersen/
Kilgarriff, A., Rundell, M. & Uí Dhonnchadha, E. 2006.
Efficient Corpus Development for Lexicography:
Building the New Corpus for Ireland. Language
Resources and Evaluation 40(2): 127-152.
Hanks, P. 2010. Compiling a monolingual dictionary for
native speakers. Lexikos 20: 580-598.
Hunt, S.A. and Bowker, R. 2013. SAE11: a new member
of the family. Paper presented at Corpus Linguistics
2013 at Lancaster, UK, 22 - 26 July 2013.
Jeffrey, C. and Van Rooy, B. 2004. Emphasiser now in
colloquial South African English. World
Englishes 23(2): 269-280.
Krishnamurthy, R. 2000. Size matters: Creating
dictionaries from the world's largest corpus. In
Proceedings of KOTESOL 2000: Casting the Net:
Diversity in Language Learning, Taegu, Korea: 169-180.
Lass, R. 1995. South African English. In R. Mesthrie,
ed. Language and Social History: Studies in South
African Sociolinguistics. Cape Town: David Phillip,
89-106.
Rossouw, R. & Van Rooy, B. 2012. Diachronic changes
in modality in South African English. English World-Wide 33(1): 1-26.
Wasserman, R. and Van Rooy, B. 2014. The
Development of Modals of Obligation and Necessity in
White South African English through Contact with
Afrikaans. Journal of English Linguistics 42(1): 31-50.
she knew, don't know, he knew, didn't know, I knew, total
Connie, Mellors, Clifford, Mrs Bolton, others, total
28 33 27 15 24 14 12 43 34 16 110
References
Imao, Y. 2011. Mac OS X no konkodansa CasualConc:
kihontekina tsukaikata to
yoreikensaku tsuru to shiteno oyorei [CasualConc, a
concordance software for Mac OS X: basic functions
and how they can be utilized as a research tool].
Gaikokugo Kyoiku Media Gakkai Kansai shibu
Mesodorogi kenkyu bukai 2011nendo Ronshu [Journal
of Methodology SIG, Kansai Chapter, Japan
Association for Language Education and Technology
Kansai chapter 2011], 121-178.
Kilgarriff, A., Rychly, P., Smrz, P. & Tugwell, D. 2004.
The Sketch Engine. Proceedings of Euralex, 105-16.
Lawrence, D. H. 1960 [1928]. Lady Chatterley's Lover.
London: Penguin Books.
Introduction
Research objectives
Theoretical frameworks and methodological
References
Baker, P. 2006. Using Corpora in Discourse Analysis.
London: Continuum.
Bednarek, M. and Caple, H. 2012. 'Value added':
Language, image and news values. Discourse,
Context, Media 1: 103-113.
Behind the Lines, NHS special report. 2011. Available
at: http://www.nhs.uk/news/2011/11November/Documents/hope_and_hype_1.0.pdf
Boutron Y.A., Bafeta A., Marroun I., Charles P. et al.,
2012. Misrepresentation of Randomized Controlled
Trials in Press Releases and News Coverage: A Cohort
Study. PLoS Med 9(9): e1001308.
Goldacre, B. 2011. How far should we trust health
reporting? The Guardian, 11th June.
Calsamiglia, H. and van Dijk, T. A. 2004. Popularization
Discourse and Knowledge about the Genome.
Discourse and Society, 15 (4).
Galtung, J. and Ruge, M. 1973. Structuring and selecting
news. In J. Young and S. Cohen (eds.), The
Manufacture of News: Social Problems, Deviance and
the Mass Media (pp. 62-72). London: Constable.
Fairclough, N. 2006. Discourse and Social Change.
Cambridge: Polity Press.
Hunston, S. and Thompson, G. 2000. Evaluation in Text:
authorial stance and the construction of discourse.
Oxford: Oxford University Press.
Hyland, K. 2010. Constructing proximity: Relating to
readers in popular and professional science. Journal of
English for Academic Purposes 9: 116-127.
Partington, A. 2010. Modern Diachronic Corpus-Assisted Discourse Studies (MD-CADS) on UK
newspapers: an overview of the project. Corpora 5
(2): 83-108.
Ransohoff, D. F. and Ransohoff, R. M. 2001.
Sensationalism in the media: When scientists and
journalists may be complicit collaborators. Effective
Clinical Practice, 4: 185-188.
Suhardja, I. 2009. The Discourse of 'Distortion' and
Health and Medical News Reports: A Genre Analysis
Perspective. Ph.D. Thesis, University of Edinburgh.
Introduction
It has been generally believed that the were-subjunctive
used in the phraseological unit as it
were is strictly prohibited from being substituted
with the was-subjunctive; however, as examples (1)
and (2) show, as it was is observed in
contemporary English.
(1) MORGAN: Will Justin Bieber have that, do
you think? Is it inevitable?
D. OSMOND: He's got it now. He's got it
now. You know, that kind of success at that
age can really bite you in the shorts, as it was,
the proverbial shorts.
MORGAN: What would you say to him?
(Corpus of Contemporary American English
(COCA), 2011)
(2) The journal had been intended as the perfect
Austenesque birthday gift for my vintage-obsessed younger cousin. I'd found it lying
alongside a worn copy of Pride and Prejudice
in a quirky antiques shop down on South
Congress and simply couldn't pass it up,
hobnobbing, as it was, with greatness.
(COCA, 2012)
As it was in (1) is used to give an example. In (2), it
is used to compare the fact that the author found a
worn copy of Pride and Prejudice to hobnobbing
with greatness.
The purpose of the study is to show descriptively,
from a phraseological perspective, that as it were
is changing into as it was. In addition, based on the
data collected from corpora, this study explains in
detail the actual behaviour of as it was and its
relationship with as it were.
Phraseology
Previous research on the were-subjunctive
References
Greenbaum, S. and Whitcut, J. 1988. Longman guide to
English usage. London: Longman.
Verginica Barbu Mititelu
Romanian Institute for Artificial Intelligence,
Romanian Academy
elena@racai.ro
vergi@racai.ro
VISL, http://beta.visl.sdu.dk/visl/about/
http://www.maltparser.org/
http://www.iula.upf.edu/recurs01_tbk_uk.htm
Romanian
acl
advcl
advmod
agc
amod
appos
aux
auxpass
cc
compound
conj
correl
dblclitic
dep
det
dislocated
dobj
foreign
goeswith
iobj
list
mark
mwe
name
discourse
neg
nmod
parataxis
passmark
pmod
pobj
poss
possclitic
post
pred
prep
punct
reflclitic
remnant
reparandum
root
sc
secobj
spe
IULA
MOD
BYAG
SPEC
MOD
AUX
COORD
CONJ
UD
acl
advcl
advmod
agc
amod
appos
aux
auxpass
cc
compound
conj
unknown dep
SPEC
det
dislocated
DO
dobj
foreign
goeswith
IO
iobj
list
mark
mwe
name
discourse
NEG
neg
MOD
nmod
parataxis
PASSM
MOD
OBLC
poss
PRD,
ATR
COMP
PUNCT
case
punct
remnant
reparandum
root
SUBJ
nsubj,
csubj,
subj
csubjpass
voc
VOC
vocative
xcomp
OPRD
xcomp
Table 1. Inventories of relations: Romanian, IULA, UD
Metric     Score
LAS        0.216
LA         0.417
UAS        0.514
AnyRight   0.715
Table 2. Evaluation results for 100 corrected sentences
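The four scores in Table 2 are token-level proportions. Assuming the usual definitions (UAS: correct head; LA: correct label; LAS: correct head and label; AnyRight: correct head or label), AnyRight equals UAS + LA - LAS, which agrees with the reported figures (0.514 + 0.417 - 0.216 = 0.715). The sketch below computes the scores for a toy sentence; the function name and the data are invented, and this is not the evaluation tool's own code.

```python
def attachment_scores(gold, predicted):
    """Compute UAS, LA, LAS and AnyRight from (head, label) pairs, one pair per token."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and of equal length")
    head_ok = label_ok = both_ok = any_ok = 0
    for (gold_head, gold_label), (pred_head, pred_label) in zip(gold, predicted):
        head_right = gold_head == pred_head
        label_right = gold_label == pred_label
        head_ok += head_right
        label_ok += label_right
        both_ok += head_right and label_right
        any_ok += head_right or label_right
    n = len(gold)
    return {"UAS": head_ok / n, "LA": label_ok / n, "LAS": both_ok / n, "AnyRight": any_ok / n}


# Toy four-token sentence: each token is (index of head, relation label).
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
scores = attachment_scores(gold, pred)
print(scores)
```

In the toy sentence two tokens are fully correct, one has only the head right and one has only the label right, so UAS and LA are both 0.75, LAS is 0.5 and AnyRight is 1.0.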
Habibah Ismail
University of Sydney
hism4614@uni.sydney.edu.au
Acknowledgements
This paper is supported by the Sectorial Operational
Programme Human Resources Development (SOP
HRD), financed from the European Social Fund and
by the Romanian Government under the contract
number SOP HRD/159/1.5/S/136077.
References
Arias, B., Bel, N., Fomicheva, M., Larrea, I., Lorente, M.,
Marimon, M., Mila, A., Vivaldi, J. and Padro, M.
2014. Boosting the creation of a treebank, In
Proceedings of LREC 2014, Reykjavik, Iceland
Bick J. and Greavu, A. 2010. A Grammatically
Annotated Corpus of Romanian Business Texts, in
Multilinguality and Interoperability in Language
Processing with Emphasis on Romanian, Editura
Academiei Romane, p. 169-183.
Hristea, F. and Popescu, M. 2003. A Dependency Grammar
Approach to Syntactic Analysis with Special Reference
to Romanian, in F. Hristea şi M. Popescu (coord.),
Building Awareness in Language Technology,
Bucureşti, Editura Universităţii din Bucureşti, p. 9-16.
Ion, R., Irimia, E., Ştefănescu, D. and Tufiş, D. 2012.
ROMBAC: The Romanian Balanced Annotated
Corpus. In Proceedings of LREC 2012, Istanbul,
Turkey.
Marimon, M. and Bel, N. 2014. "Dependency structure
annotation in the IULA Spanish LSP Treebank". In
Language Resources and Evaluation. Amsterdam:
Springer Netherlands. ISSN 1574-020X
Mel'čuk, I. A. Dependency syntax: theory and practice,
Albany, State University Press of New York, 1987.
Nilsson, J., and Nivre, J. 2008. MaltEval: An Evaluation
and Visualization Tool for Dependency Parsing, In
Proceedings of LREC 2008, Marrakech, Morocco.
Nivre, J. and Hall, J. 2005. Maltparser: A language-independent system for data-driven dependency
parsing, In Proceedings of the 4th Workshop on
Treebanks and Linguistic Theories (TLT), pages 137-148.
Perez, C.-A. 2014. Resurse lingvistice pentru prelucrarea
limbajului natural, PhD thesis, Al. I Cuza
University, Iasi.
Tesnière, L. Éléments de syntaxe structurale, Paris,
Klincksieck, 1959
Introduction
References
Aull, L. L., & Brown, D. W. 2013. Fighting Words: A
Corpus Analysis of Gender Representations in Sports
Reportage. Corpora 8(1): 27-52.
Baker, P. 2014. Using Corpora to Analyze Gender.
London: Bloomsbury Publishing.
Bernstein, A. 2002. Is It Time for a Victory Lap?:
Changes in the Media Coverage of Women in Sport.
International Review for the Sociology of Sport 37(3-4): 415-428.
Caple, H. 2013. Competing for Coverage: Exploring
Emerging Discourses on Female Athletes in the
Australian Print Media. English Text Construction
6(2): 271-294.
Eastman, S. T., & Billings, A. C. 2000. Sportscasting and
Sports Reporting: The Power of Gender Bias. Journal
of Sport and Social Issues 24(2): 192-213.
Hardin, M., Chance, J., Doss, J. E., & Hardin, B. 2002.
Olympic Photo Coverage Fair to Female Athletes.
Newspaper Research Journal 23(2,3): 64-78.
Jones, D. 2006. The Representation of Female Athletes in
Online Images of Successive Olympic Games. Pacific
Journalism Review 12(1): 108-129.
King, C. 2007. Media Portrayals of Male and Female
Athletes: A Text and Picture Analysis of British
National Newspaper Coverage of the Olympic Games
Since 1984. International Review for the Sociology of
Sport 42: 187-199.
Markula, P. 2009. Introduction. In P. Markula (Ed.),
Olympic Women and the Media: International
Perspectives (pp. 1-29). Basingstoke: Palgrave
MacMillan.
McDowell, J., & Schaffner, S. 2011. Football, It's a
Man's Game: Insult and Gendered Discourse in The
Gender Bowl. Discourse & Society 22(5): 547-564.
Vincent, J., Imwold, C., Masemann, V., & Johnson, J. T.
2002. A comparison of selected Serious and
Popular British, Canadian, and United States
newspaper coverage of female and male athletes
competing in the centennial olympic games: Did
Sylvia Jaworska
Reading University
s.jaworska@reading.ac.uk
Anupam Nanda
Reading University
a.nanda@reading.ac.uk
Introduction
References
Cho, H., Roberts, R. and Pattens, D. 2010. The language
of US corporate environmental disclosure,
Accounting, Organizations and Society 35 (4): 431-443.
Griffin, J. and Mahon, J. 1997. The Corporate Social
Performance and Corporate Financial Performance
Debate: Twenty-Five Years of Incomparable
Research, Business and Society 36 (1): 5-31.
Henry, E. 2008. Are investors influenced by how
earnings press releases are written?, Journal of
Business Communication 45 (4): 363-407.
Li, F. 2008. Annual report readability, current earnings,
and earnings persistence, Journal of Accounting and
Economics 45: 221-247.
Lischinsky, A. 2011. The discursive construction of a
responsible corporate self. In A.E. Sjölander and J.
Gunnarsson Payne (eds.) Tracking discourses: Politics,
identity and social change. Lund: Nordic Academic
Press: 257-285.
Representations of Multilingualism in
Public Discourse in Britain: combining
corpus approaches with an attitude
survey
Sylvia Jaworska
Reading University
Christiana
Themistocleous
Reading University
s.jaworska@reading.ac.uk
c.themistocleous@reading.ac.uk
Introduction
Research aims
Research methodology
Results
Examples of keywords
schools, education, school,
teaching, learn, learning
English, French, Welsh, Gaelic,
Spanish, German, Italian
children, pupils, teachers,
students, Canadians, bilinguals
Quebec, Canada, Wales, France,
European, Britain
bilingual, language, languages,
multilingual, bilingualism
foreign, fluent, ethnic, fluently
speak, speaking, says
London, Bangor
dyslexia, dyslexic, deaf
Internet, KGB
Examples of keywords
school, learning, schools, learn,
primary, education
English, French, Welsh, Gaelic,
Spanish, Catalan, Irish
children, pupils, speakers, people,
graduates, parents, immigrants
EU, Wales, UK, France
language, languages, bilingual,
multilingual, bilingualism
foreign, fluent, native, cultural
speak, says, speaking, translation,
spoken
London, Beijing
online, website, signs
Examples of keywords
school, schools, learning, primary
(school),
English, French, Spanish,
Mandarin, German, Flemish,
children, pupils, speakers,
immigrants, Bialystok
EU, Malta, UK, Belgium
language, languages, bilingual,
bilingualism
foreign, fluent,
speaking, speak, says
Brussels, Manchester
Alzheimers, dementia, brain,
cognitive
References
Baker, P. and McEnery, T. 2005. A corpus-based
approach to discourses of refugees and asylum seekers
in UN and newspaper texts. Journal of Language and
Politics 4 (2): 197-226.
Baker, P., Gabrielatos, C. and McEnery, T. 2013.
Discourse analysis and media attitudes. Cambridge:
Cambridge University Press.
Blackledge, A. 2004. Constructions of identity in
political discourse in multilingual Britain. In A.
Pavlenko and A. Blackledge (eds.) Negotiations of
Identity in Multilingual Contexts. Clevedon:
Multilingual Matters, 68-92.
Ensslin, A. and Johnson, S. 2006. Language in the news:
investigating representations of Englishness using
WordSmith Tools. Corpora 1 (2): 153-185.
Gabrielatos, C. and Baker, P. 2008. Fleeing, Sneaking,
Flooding: A Corpus Analysis of Discursive
Constructions of Refugees and Asylum Seekers in the
UK Press, 1996-2005. Journal of English Linguistics
36: 5-38.
Hardt-Mautner, G. 1995. Only connect: critical discourse
analysis and corpus linguistics. UCREL Technical
Paper 6. Lancaster: University of Lancaster.
Kelly-Holmes, H. 2012. Multilingualism in the Media. In:
M. Martin-Jones, A. Blackledge and A. Creese (eds.)
Routledge Handbook on Multilingualism. London and
New York: Routledge, 333-346.
Kelly-Holmes, H. and Milani, T. 2011. Thematising
multilingualism in the media. Journal of Language
and Politics 10 (4): 467-489.
References
Bernardini, S. (2004). "Corpora in the classroom: An
overview and some reflections on future
developments". In J. M. Sinclair How to Use Corpora
in Language Teaching. Amsterdam: John Benjamins:
15-36.
Cobb, T. (1999). "Giving learners something to do with
concordance output". ITMELT '99 Conference. Hong
Kong.
Hoey, M. (2005). Lexical Priming: A New Theory of
Words and Language. London, Routledge.
Hoey, M. and M. B. O'Donnell (2008). "Lexicography,
Introduction
The research questions and methodologies
Concordance
." Cross-examined byMR. HORRY. Did you not tell anybody about it? Not till
. Cross-examined byMR. HORRY. Have you not been something else here
. Cross-examined byMR. HORRY. Did you not find some flour bags also? No.
. Cross-examined byMR. HORRY. Did you not form your belief when you
References
Cameron, L., Deignan, A. 2003. Combining large and
small corpora to investigate tuning devices around
References
Adolphs, S. and Carter, R. 2002. Point of view and
semantic prosodies in Virginia Woolf's To the
Lighthouse. Poetica, 58: 7-20.
Barnes, J. 2011. The Sense of an Ending. London:
Vintage.
http://amsacta.cib.unibo.it/00002678/
Johnson, J.H. 2010. A corpus-assisted study of
parere/sembrare in Grazia Deledda's Canne al Vento
and La Madre. Constructing point of view in the
Source Texts and their English translations, in J.
Douthwaite and K. Wales (eds.), Stylistics and Co.
(unlimited) the range, methods and applications of
stylistics, Textus XXIII: 283-302.
Johnson, J.H. 2011. The use of deictic reference in
identifying point of view in Grazia Deledda's Canne al
Vento and its translation into English, in Target
23(1): 62-76.
Johnson, J.H. 2014. ...like reeds in the wind. Exploring
simile in the English translations of Grazia Deledda
using corpus stylistics. In D.R. Miller and E. Monti
(eds.) Tradurre Figure/ Translating Figurative
Language. Bologna: Bononia University Press.
Leech, G.N. 1965. This bread I break: Language and
Interpretation, Review of English Literature, 6.2
London: Longmans, Green.
Style. A
Palgrave
assisted
literary
Amelia Joulain-Jay
Lancaster University
a.t.joulain@lancaster.ac.uk
Introduction
No    Schema
1.1   NOUN_PHRASE ATTRIBUTIVE_PREPOSITION COUNTRY
1.2   COUNTRY=VERB_COMPLEMENT
1.3   COUNTRY PREDICATE
1.4   COUNTRY to VERB
1.5   NOUN_PHRASE between COUNTRY and COUNTRY
2.1   NOUN_PHRASE LOCATIONAL_PREPOSITION COUNTRY
2.2   VERB (LOCATIONAL_PREPOSITION) COUNTRY
2.3   In COUNTRY
2.4   ADJECTIVE LOCATIONAL_PREPOSITION COUNTRY
2.5   INSTITUTION of COUNTRY
3.1   (TROUP/VENUE) (TOWN) COUNTRY
3.2   COUNTRY DATE
3.3   COUNTRY NUMBER
3.4   COUNTRY NOUN
Table 2. Phraseologies occurring with Russia and France in The Era (1840-1899)
Distinctions between the representations of the
countries are, however, noticeable in the details of
the specific personifying and locational
phraseologies associated with each country. When
Russia occurs in a personifying phraseology, it is
overwhelmingly (in three quarters of cases) in a
NOUN_PHRASE ATTRIBUTIVE_PREPOSITION COUNTRY
configuration, e.g. "the complaints of the enormous
intrigues of Russia are becoming universal" (The
Era, 23/10/1842). In contrast, France appears within
a more diverse range of phraseologies. The most
common, occurring in about a third of cases, is
COUNTRY PREDICATE, e.g. "France was going to war"
(The Era, 09/1/1859). This result suggests a subtle
difference in the amount of agency (or, to put it
another way, the ability to exert power) assigned to
the two countries.
In terms of locational phraseologies, in over half
the cases Russia occurs in an INSTITUTION of
COUNTRY pattern, e.g. "the emperor of Russia most
respectfully solicits from the public an Inspection
of his extensive stock of watches" (The Era,
13/5/1849), whereas France tends to occur either in
NOUN_PHRASE LOCATIONAL_PREPOSITION COUNTRY
patterns, e.g. "[the ship] brings (...) 297
passengers for England and France" (The Era,
06/6/1858), or in VERB (LOCATIONAL_PREPOSITION)
COUNTRY
Acknowledgement
This research is part of the ERC-funded Spatial
Humanities: Texts, GIS and Places project at
Lancaster University
(http://www.lancaster.ac.uk/fass/projects/spatialhum.wordpress/).
Figure 1. Raw and relative frequencies per year of Russia and France in The Era (1838-1900).
Figure 2. Raw frequency per year for France, France co-occurring with war within 20 words, and France
co-occurring with words tagged G3 within 20 words in The Era (1838-1900).
Figure 3. Raw frequency per year for Russia, Russia co-occurring with war within 20 words, and Russia
co-occurring with words tagged G3 within 20 words in The Era (1838-1900).
References
Brake, L. and Demoor, M. (eds.) 2009. Dictionary of
Nineteenth-Century Journalism in Great Britain and
Ireland. London: Academic Press and the British
Library.
Andrew Kehoe
Birmingham City
University
Matt Gee
Birmingham City
University
andrew.kehoe@bcu.ac.uk
matt.gee@bcu.ac.uk
Introduction
Corpus composition
http://www.ebay.co.uk/
Linguistic variation
Summary
References
eBay Inc. 2014. eBay Marketplace Fast Facts At-A-Glance (Q3 2014) Shareholders Report:
http://bit.ly/eBayInc2014
Kehoe, A. & M. Gee. 2007. New corpora from the web:
making web text more text-like. In P. Pahta, I.
Taavitsainen, T. Nevalainen and J. Tyrkkö (eds.)
Towards Multimedia in Corpus Studies, electronic
publication, University of Helsinki:
http://www.helsinki.fi/varieng/journal/volumes/02/kehoe_gee/
Schmid, H. 1994. Probabilistic Part-of-Speech Tagging
Using Decision Trees. Proceedings of International
Conference on New Methods in Language Processing,
Manchester, UK.
Scott, M. 1997. PC Analysis of Key Words and Key
Key Words. System 25 (1), 1-13.
63
http://www.terapeak.com/
Introduction
Purposes
Methodology
Evaluative meanings     Adjectives    No.
Judgement
  +Normality            Normal        1
  -Normality            Unusual       2
  +Capacity                           67
  -Capacity                           194
  +Tenacity                           110
  -Tenacity                           147
  +Veracity                           53
  -Veracity                           46
  +Propriety                          158
  -Propriety                          215
Affect
  Security                            31
  Insecurity                          24
  Satisfaction                        7
  Dissatisfaction                     29
  Happiness                           6
  Unhappiness                         2
  Inclination                         13
  Disinclination                      16
Appreciation
  +Composition                        27
  -Composition                        27
  +Reaction                           13
  -Reaction                           22
  +Valuation                          13
  -Valuation                          2
Others                                14
Grand Total                           1239
Table 1
Most instances of this pattern are in the present
simple tense (747 instances), followed by the past
simple tense (491 instances). The pattern never
occurs in the future tense. This indicates that it is
used to refer to current events and, in some cases,
past events.
In terms of grammatical subjects, a third person
human subject is most frequent (634 instances),
followed by first person (313 instances) and second
person pronouns (184 instances). The pattern is
therefore most often used to talk about the
behaviors or characteristics of other people or of
the speaker.
In terms of engagement, the phraseological
pattern is more frequently oriented to heterogloss
(715 instances) than to monogloss (524 instances).
That is, it is more strongly associated with
References
Gries, S. T. (2006). Corpus-based methods and cognitive
semantics: The many senses of to run. In S. T. Gries
& A. Stefanowitsch (Eds.), Corpora in Cognitive
Linguistics: Corpus-Based Approaches to Syntax and
Lexis (pp. 57–100). Berlin/New York: Mouton de
Gruyter.
Hoey, M. 2007. Lexical priming and literary creativity.
In M. Hoey, M. Mahlberg, M. Stubbs, & W. Teubert
(Eds.), Text, discourse and corpora: Theory and
analysis (pp. 7–29). London and New York:
Continuum.
Hunston, S., & Francis, G. 1999. Pattern grammar: A
corpus-driven approach to the lexical grammar of
English. Amsterdam and Philadelphia: John Benjamins
Publishing.
Kennedy, G. 2003. Structure and Meaning in English: A
Guide for Teachers. London: Pearson Education
Limited.
Martin, J. R., & White, P. R. R. (2005). The language of
evaluation. Palgrave Macmillan: Great Britain.
Ondřej Herman
Lexical Computing Ltd., Masaryk University
Adam.kilgarriff@sketchengine.co.uk
Ondrej.herman@sketchengine.co.uk
Jan Bušta
Lexical Computing Ltd., Masaryk University
Vojtěch Kovář
Lexical Computing Ltd., Masaryk University
Jan.busta@sketchengine.co.uk
Vojtech.kovar@sketchengine.co.uk
Miloš Jakubíček
Lexical Computing Ltd., Masaryk University
Milos.jakubicek@sketchengine.co.uk
Introduction
Neologisms
from simple.
There are some lexical cues that speakers often
use when introducing a word for the first time:
"so-called", "defined as", "known as". In writing, the
language user might put the new item in single or
double quotation marks. One kind of corpus
strategy for identifying neologisms looks for items
that are marked in these ways. An implemented
system for English, which shows these methods to
be strikingly useful, is presented by Paryzek (2008).
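The cue-based strategy described above can be illustrated with a minimal sketch. It is not Paryzek's system: the cue list is limited to the three phrases named in the text, and the function name and both regular expressions are assumptions made for illustration only.

```python
import re

# Cue phrases taken from the examples in the text; a real system would use more.
CUE_PATTERN = re.compile(
    r"\b(?:so-called|defined as|known as)\s+(?:an?\s+|the\s+)?([A-Za-z][A-Za-z-]*)",
    re.IGNORECASE,
)
# A single word enclosed in single or double quotation marks (straight or curly).
# The lookarounds stop apostrophes inside words (e.g. "don't") from matching.
QUOTE_PATTERN = re.compile(r"(?<!\w)[\"'“‘]([A-Za-z][A-Za-z-]*)[\"'”’](?!\w)")


def neologism_candidates(text):
    """Return the lower-cased words introduced by a cue phrase or set in quotes."""
    found = [m.group(1).lower() for m in CUE_PATTERN.finditer(text)]
    found += [m.group(1).lower() for m in QUOTE_PATTERN.finditer(text)]
    return sorted(set(found))


sample = ('The so-called selfie is everywhere; some call it a "usie" '
          "when taken in groups, also known as a groupie.")
print(neologism_candidates(sample))
```

The output is only a candidate list: established words that happen to follow a cue phrase will also be returned, so a filter against a reference word list would probably be needed in practice.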
The approach is extended for Swedish by
Stenetorp (2010) who starts from lists of neologisms
from the Swedish Academy and Swedish Language
Council, and develops a supervised machine
learning system which finds features of neologisms
vs. non-neologisms, and can then classify new items
as neologism-like or not. Stenetorp uses a very large
corpus of documents each with a time stamp, as do
we.
O'Donovan and O'Neil (2008) present the system
in use at Chambers Harrap at the time for identifying
neologisms to add to the dictionary. This is of
particular interest as it is a system which, in contrast
to the academic ones, is used in earnest by a
publisher. One component of the software suite
builds a large time-stamped corpus; another, the
word-tracking component (based on Eiken 2006)
identifies items which have recently jumped up in
relative frequency; and a third, echoing the third of
our criteria above, promotes higher-frequency items
so they will appear higher in the lists that
lexicographers are asked to monitor.
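As a rough illustration of the word-tracking idea (items whose relative frequency has recently jumped, with higher-frequency items promoted), the sketch below compares two time slices. It is not the Chambers Harrap implementation; the smoothing constant, the minimum-count threshold and all names are assumptions.

```python
from collections import Counter


def per_million(count, total):
    """Relative frequency expressed per million tokens."""
    return 1_000_000 * count / total if total else 0.0


def rising_words(earlier_tokens, recent_tokens, min_recent_count=2, smoothing=1.0):
    """Rank words by how far their relative frequency rose in the recent slice.

    Returns (word, ratio, recent_count) tuples. Words below min_recent_count are
    dropped, and ties in ratio are broken in favour of the more frequent word.
    """
    earlier, recent = Counter(earlier_tokens), Counter(recent_tokens)
    n_earlier, n_recent = len(earlier_tokens), len(recent_tokens)
    scored = []
    for word, count in recent.items():
        if count < min_recent_count:
            continue
        # Smoothing keeps the ratio finite for words unseen in the earlier slice.
        ratio = (per_million(count, n_recent) + smoothing) / (
            per_million(earlier[word], n_earlier) + smoothing
        )
        scored.append((word, ratio, count))
    scored.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return scored


earlier = "the cat sat on the mat the cat".split()
recent = "the selfie cat took a selfie the selfie".split()
ranking = rising_words(earlier, recent)
print([word for word, _, _ in ranking])
```

With realistic corpus sizes the smoothing constant and the threshold would both need tuning, since they determine how strongly rare items are discounted.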
Gabrielatos et al. (2012) present an approach to
diachronic analysis similar to ours, but focusing on
one specific sub-issue: what are the most useful
time-slices to break the data set up into. There is
usually a trade-off between data sparsity, arguing for
fewer, fatter time-slices, and delicacy of analysis,
which may require thinner ones. We plan to
integrate the lessons from their paper into the
options available in Diacran.
Acknowledgments
This work has been partly supported by the Ministry
of Education of the Czech Republic within the
LINDAT-Clarin project LM2010013 and by the
Czech-Norwegian Research Programme within the
HaBiT Project 7F14047.
References
Davies, M. 2009. The 385+ million word Corpus of
Contemporary American English (1990–2008+):
Design, architecture, and linguistic insights.
International Journal of Corpus Linguistics, 14(2),
159-190.
Eiken, U. C., Liseth, A. T., Witschel, H. F., Richter, M.,
Wang Lixun
Hong Kong Institute
of Education
a.kirkpatrick@griffith.edu.au
lixun@ied.edu.hk
References
ACE. 2014. The Asian Corpus of English. Director: Andy
Kirkpatrick; Researchers: Wang Lixun, John Patkin,
Sophiann Subhan. http://corpus.ied.edu.hk/ace/
(accessed on 30 December 2014).
Archibald, A., Cogo, A. and Jenkins, J. (eds.) 2011.
Latest trends in ELF research. Newcastle, UK:
Cambridge Scholars Publishing.
Firth, A. 1996. The discursive accomplishment of
normality: On lingua franca English and conversation
analysis. Journal of Pragmatics 26: 237–259.
Hall, C. J., Schmidtke, D. and Vickers, J. 2013.
Countability in world Englishes. World Englishes
32(1): 1-22.
House, J. 2003. English as a lingua franca: A threat to
multilingualism?. Journal of Sociolinguistics 7(4):
556-578.
Kirkpatrick, A. 2007. The communicative strategies of
ASEAN speakers of English as a lingua franca. In D.
Prescott (ed.) English in Southeast Asia: Literacies,
literatures and varieties (pp. 121-139). Newcastle,
UK: Cambridge Scholars Publishing.
Kirkpatrick, A and Subhan, S. 2014. Non-standard or new
standards or errors? The use of inflectional marking for
Introduction
Methodology
Findings
support*
Instances of
men as
agents of the
keyword
573
Instances of
women as
agents of the
keyword
55
commit*
39
invit*
pledge
96
stand*
288
55
engag*
20
perceive
Keyword
search term
address*
Conclusion
References
Baker, P. 2008. Sexed texts. London: Equinox Publishing
Ltd.
Baker, P. 2014. Using corpora to analyze gender. London:
Bloomsbury Academic.
Baker, P. and McEnery, T. Forthcoming. Who benefits
when discourse gets democratised? Analysing a
Twitter corpus around the British Benefits Street
debate.
Burr, V. 1995. An introduction to social constructionism.
London: Routledge.
Gamson, W. and Modigliani, A. 1989. Media discourse
and public opinion on nuclear power: A constructionist
approach. American Journal of Sociology, Vol. 95, pp.
1-37.
McCarthy, A. 2014. Sorry privileged white ladies, but
Emma Watson isn't a 'game changer' for feminism.
Retrieved
from
http://www.huffingtonpost.com/xojane-/emma-watsonfeminism_b_5884246.html, Accessed 16th December
2014.
Ansgar Koene
University of Nottingham
Svenja Adolphs
University of Nottingham
ansgar.koene@nottingham.ac.uk
svenja.adolphs@nottingham.ac.uk
Elvira Perez
University of Nottingham
University of Nottingham
Elvira.perez@nottingham.ac.uk
psxcc@nottingham.ac.uk
Ramona Statche
University of Nottingham
Claire O'Malley
University of Nottingham
Ramona.statche@nottingham.ac.uk
Claire.omalley@nottingham.ac.uk
Tom Rodden
University of Nottingham
Derek McAuley
University of Nottingham
Tom.rodden@nottingham.ac.uk
Derek.mcauley@nottingham.ac.uk
Introduction
BAAL guidelines
Conclusion
Acknowledgements
This work forms part of the CaSMa project at the
University of Nottingham, HORIZON Digital
Economy Research Institute, supported by ESRC
grant ES/M00161X/1. For more information about
the CaSMa project, see http://casma.wp.horizon.ac.uk/.
References
British Association for Applied Linguistics, 2006,
Recommendations on Good Practice in Applied
Linguistics.
Available
online
at
http://www.baal.org.uk/dox/goodpractice_full.pdf
British Psychological Society, 2013. Ethics Guidelines for
Internet-mediated Research. INF206/1.2013. Leicester.
Bruckman, A., 2002. Ethical Guidelines for Research
Online. http://www.cc.gatech.edu/~asb/ethics/
Luger, E., 2013. Consent for all: Revealing the hidden
complexity of terms and conditions. Proceedings of the
SIGCHI conference on Human factors in computing
systems, 2687-2696.
Kosinski, M., Stillwell, D. and Graepel, T., 2013. Private
traits and attributes are predictable from digital
records of human behavior. PNAS 110 (15): 5802–5805.
Conceptualization of KNOWLEDGE
in the official educational discourse of
the Republic of Serbia
Milena Kostic
milenakostic09@gmail.com
67 http://casma.wp.horizon.ac.uk/
Results
ENTITY
BEING
PLANT
HUMAN
VALUABLE
OBJECT
OBJECT
INSTRUMENT
PRODUCT
BUILDING
SUBSTANCE
SOIL
LIQUID
ENERGY
SPACE
WATER
Conclusion
References
Andriessen, D. 2006. On the metaphorical nature of
intellectual capital: a textual analysis, Journal of
Intellectual Capital, 7(1), 93–110.
Andriessen, D. 2008. Stuff or love? How metaphors direct
our efforts to manage knowledge in organizations,
Knowledge Management Research & Practice, 6, 5–12.
Andriessen, D. and Van Den Boom, M. 2009. In Search
of Alternative Metaphors for Knowledge; Inspiration
from Symbolism. Electronic Journal of Knowledge
Management, 7(4), 397–404.
Bratianu, C. and Andriessen, D. 2008. Knowledge as
Energy: A Metaphorical Analysis. 9th European
Conference on Knowledge Management, Southampton
Solent University, Southampton, UK.
Dragićević, R. 2010. Leksikologija srpskog jezika.
Beograd: Zavod za udžbenike.
Charteris-Black, J. 2004. Corpus Approaches to Critical
Metaphor Analysis. Palgrave Macmillan.
Johnson, M. 1987. The Body in the Mind. The Bodily
Basis of Meaning, Imagination, and Reason, Chicago:
The University of Chicago Press.
Kalra, M.B. and Baveja, B. 2012. Teacher Thinking about
Knowledge, Learning and Learners: A Metaphor
Analysis. Social and Behavioral Sciences 55, 317–326.
Klikovac, D. 2004. Metafore u mišljenju i jeziku,
Beograd: XX vek.
Klikovac, D. 2006. Semantika predloga: Studija iz
kognitivne lingvistike, Beograd: Filološki fakultet (2.
izdanje).
Klikovac, D. 2008. Jezik i moć, Beograd: XX vek.
Kövecses, Z. 2002. Metaphor. A Practical Introduction.
Oxford: Oxford University Press.
Kövecses, Z. 2010. Metaphor: A Practical Introduction,
Oxford: OUP (2nd ed.).
Lakoff, G. and Johnson, M. 1980. Metaphors We Live By,
Chicago: University of Chicago Press.
Lakoff, G. 1987. Women, Fire, and Dangerous Things:
What Categories Reveal about the Mind. Chicago:
University of Chicago Press.
Introduction
Recency effects
Variation or change-in-progress?
References
Bybee, J. and Thompson, S. 1997. Three frequency
effects in syntax. Proceedings of the Twenty-Third
Annual Meeting of the Berkeley Linguistics Society:
General Session and Parasession on Pragmatics and
Grammatical Structure, 378-88.
Bybee, J. 2010. Language, Usage and Cognition.
Cambridge: Cambridge University Press.
Hooper, J.B. 1976. Word frequency in lexical diffusion
and the source of morpho-phonological change. In W.
Christie (ed.) Current progress in historical linguistics.
Amsterdam: North Holland, 96-105.
Wortschatz-Portal Universität Leipzig.
<http://wortschatz.uni-leipzig.de>.
1998-2014.
DeReWo <http://www.ids-mannheim.de/derewo>,
Institut für Deutsche Sprache, Programmbereich
Korpuslinguistik, Mannheim, Deutschland, 2013.
Elizaveta Kuzmenko
National Research
University
Higher School
of Economics
akutuzov@hse.ru
lizaku77@gmail.com
Olga Vinogradova
National Research University
Higher School of Economics
Andrey Kutuzov
National Research
University
Higher School
of Economics
olgavinogr@gmail.com
Learner corpora are mainly useful when
error-annotated. However, human annotation is
subject to the influence of various factors. This paper
describes an experiment evaluating inter-rater
agreement on hierarchical annotation in one specific
learner corpus. The main problem we address is how
to take into account the distances between categories
at different levels of our hierarchy, so that partial
agreement can be computed.
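One simple way to give partial credit for tags that sit close together in a tree is sketched below. This is an illustrative assumption, not the measure used in the study: each tag is represented as its path from the root, similarity is the shared path prefix divided by the depth of the deeper tag, and the tag names are hypothetical. The result is raw observed agreement and is not corrected for chance.

```python
from itertools import combinations


def tag_similarity(tag_a, tag_b):
    """Share of the deeper tag's path that both tags have in common (0.0 to 1.0).

    Tags are non-empty tuples giving the path from the root of the hierarchy.
    """
    shared = 0
    for a, b in zip(tag_a, tag_b):
        if a != b:
            break
        shared += 1
    return shared / max(len(tag_a), len(tag_b))


def mean_pairwise_agreement(annotations):
    """Average tag similarity over all annotator pairs and all error spans.

    annotations holds one list per error span, with one tag path per annotator.
    """
    scores = [
        tag_similarity(a, b)
        for span_tags in annotations
        for a, b in combinations(span_tags, 2)
    ]
    return sum(scores) / len(scores)


# Hypothetical tag paths, for illustration only.
tense = ("Grammar", "Verb", "Tense")
voice = ("Grammar", "Verb", "Voice")
spelling = ("Spelling",)

spans = [[tense, voice, tense], [spelling, spelling, tense]]
print(round(mean_pairwise_agreement(spans), 4))
```

A chance-corrected coefficient with a distance function of this kind (for example a weighted variant of Krippendorff's alpha) would be the natural next step, but that goes beyond this sketch.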
The corpus in question is the Russian
Error-Annotated Learner English Corpus (hereafter
REALEC; see http://realec.org). It comprises nearly
800 pieces of students' writing (225 thousand word
tokens). Our students are mostly native speakers of
Russian, and they write essays in English as part of
their general English course. Teachers mark the
essays and annotate them according to the error
classification scheme (Kuzmenko and Kutuzov 2014).
More than 10 thousand errors have already been
annotated manually.
The REALEC error annotation scheme consists of 4
layers: error type, error cause, linguistic 'damage'
caused by the error, and the impact of the error on
general understanding of the text. The first layer of
the annotation scheme in turn consists of 151
categories organized into a tree-like structure.
Annotators choose a specific tag for the error they
have spotted, or apply one of the general categories
in accordance with the instructions provided.
For our inter-rater reliability experiment, 30 student
essays (7,000 word tokens in total) were chosen. An
experienced ESL instructor outlined error spans
without marking exact error categories (520 spans in
total). After that, 8 annotators were asked to assign
error categories to these error spans using the
REALEC annotation scheme. All of them received
identical guidelines. They could change the area of
the error, or leave the marked span unannotated if
69
http://realec.org
Mistakes made by the annotators
  % among cases of inconsistency: 33%
  Solution and course of action to be taken: Improving guidelines and training annotators

The same correction with different tagging
  % among cases of inconsistency: 32%
  Solution and course of action to be taken: Adding guidelines on whether to apply one or both of the tags that rightfully describe the error

Multiple tags from mutually exclusive areas
  % among cases of inconsistency: 15%
  Solution and course of action to be taken: Training annotators to decide on which tags rightfully describe an error in difficult cases

Several corrections similarly close to the original text
  % among cases of inconsistency: 13%
  Example: "Particularly distinguished New Brunswick showing" > "New Brunswick is particularly marked off showing" with tags Choice among synonyms + Absence of a component in clause or sentence + Standard word order; or "New Brunswick is particularly distinguished showing" with Choice among synonyms + Voice form + Standard word order; or "Particularly distinguished was New Brunswick showing" with Absence of a component in clause or sentence + Emphatic shift.
  Solution and course of action to be taken: If the approaches suggested are equal in their proximity to the author's intention, any of them can be applied.
References
Artstein, R., and Poesio, M. Inter-coder agreement for
computational linguistics. Computational Linguistics
34.4 (2008): 555-596.
Craggs, R., and Wood, M. A categorical annotation
scheme for emotion in the linguistic content of
dialogue. Affective dialogue systems. Springer Berlin
Heidelberg, 2004. 89-100.
Geertzen, J., and Bunt, H. Measuring annotator
agreement in a complex hierarchical dialogue act
annotation scheme. Proceedings of the 7th SIGdial
Workshop on Discourse and Dialogue. Association for
Computational Linguistics, 2006.
Krippendorff, K. Content analysis: An introduction to its
methodology. Sage, 2012.
Kuzmenko, E, and Kutuzov, A. Russian Error-Annotated
Learner English Corpus: a Tool for Computer-Assisted
Language Learning. NEALT Proceedings Series Vol.
22: 87, 2014
Leech, G. Adding linguistic annotation. In: Developing
linguistic corpora : a guide to good practice. Oxbow
Books, Oxford, pp. 17-29, 2005
Passonneau, R. J. Applying reliability metrics to coreference annotation. arXiv preprint cmp-lg/9706011
(1997).
Chris Ryder
University of Reading
j.v.laws@reading.ac.uk
c.s.ryder@reading.ac.uk
Introduction
Prefixes (90)
27%
42%
31%
0%
Suffixes (110)
10%
44%
43%
3%
Methodology
Results
Prefixes
177
96
Suffixes
163
141
Totals
340
237
Conclusions
References
Bauer, L., Lieber, R. & I. Plag. 2013. The Oxford
Reference Guide to English Morphology. Oxford:
Oxford University Press.
Davies, M. (2012). BYU-BNC [Based on the British
National Corpus from Oxford University Press].
Available at <http://corpus.byu.edu/bnc>.
Dixon, R.M.W. 2014. Making new words: morphological
derivation in English. Oxford: Oxford University
Press.
Greenberg, J. H. (1966). Some universal of grammar with
particular reference to the order of meaningful
elements. In J. H. Greenberg (ed), Universals of
Language, 2nd edition, Cambridge, Mass: MIT Press.
Laws, J. V. & C. Ryder. 2014. Getting the measure of
derivational morphology in adult speech: A corpus
analysis using MorphoQuantics. University of
Reading: Language Studies Working Papers, 6. pp. 3–17.
Lehrer, A. (1998). Scapes, holics, and thons: the
semantics of English combining forms. American
Speech 73, 3-28.
Marchand, H. 1969. The categories and types of
present-day English word-formation. Second edition.
München: C.H. Beck.
Minkova, D. and Stockwell, R. 2009. English words:
history and structure. Second edition. Cambridge:
Cambridge University Press.
Prćić, T. (2005). Prefixes vs initial combining forms in
English: a lexicographic perspective. International
Journal of Lexicography 18, 313-334.
Prćić, T. (2008). Suffixes vs final combining forms in
English: a lexicographic perspective. International
Journal of Lexicography 21, 1-22.
Stein, G. 2007. A dictionary of English affixes: their
function and meaning. Munich: Lincom Europa.
References
Attwood, F. (2010). Porn studies: from social problem to
cultural practice. In F. Attwood (Ed.), porn.com:
Making Sense of Online Pornography (pp. 1–13).
New York: Peter Lang.
Carter, R. (1997). Investigating English Discourse:
Language, Literacy, Literature. London: Routledge.
Dhaenens, F., Bauwel, S. V., & Biltereyst, D. (2008).
Slashing the Fiction of Queer Theory: Slash Fiction,
Queer Reading, and Transgressing the Boundaries of
Screen Studies, Representations, and Audiences.
Journal of Communication Inquiry, 32(4), 335–347.
doi:10.1177/0196859908321508
Dwyer, R. A. (2007). Terms of Endearment? Power and
Vocatives in BDSM Erotica (Master's thesis).
University of Edinburgh, Edinburgh.
Robbie Love
Lancaster
University
Claire Dembry
Cambridge University
Press
r.m.love@lancaster.ac.uk
cdembry@cambridge.org
Introduction
conventions throughout.
Love (2014) reports on a methodological pilot
study in advance of the construction of the Spoken
BNC2014, using sound recordings from an earlier
project. Love's findings suggest that speaker
identification appears to be largely unproblematic
for recordings which contain fewer than four
speakers, but that recordings with four or more
speakers are increasingly likely to prove difficult for
transcribers. As data collection for the corpus proper
progressed, it became clear that a substantial number
of recordings (approximately 20%) contain four or
more speakers. We therefore decided to revisit the
issue in greater detail, using recordings actually
collected to form part of the Spoken BNC2014.
Speaker identification is important because it is
the speaker ID codes in the corpus that allow users
to carry out sociolinguistic investigations,
comparing the language of speakers according to
demographic metadata, such as gender, age, or
socio-economic status (see for instance Baker 2014;
Xiao and Tao 2007; McEnery and Xiao 2004). It has
been shown that making sociolinguistic
generalisations based on corpus data is something
that is easy to do badly (Brezina and Meyerhoff
2014). If we had reason to believe that a
substantial number of speaker identifications in the
corpus might be inaccurate, there would be further
worrying implications for the reliability of existing
and future studies which depend upon dividing
spoken corpora according to categories of
demographic metadata. This being the case, it is
essential for us to attempt to estimate the likely
extent of faulty speaker identification in a corpus
such as the Spoken BNC2014.
Method
Results
Inter-rater agreement regarding speaker
identification was alarmingly low, especially in
light of the high degree of certainty noted above.
Accuracy: more transcribers failed than succeeded
in replicating the speaker ID coding of the
manufactured transcript to anywhere near 100%.
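The two quantities reported here, accuracy against a reference transcript and agreement between transcribers, can be computed along the following lines. This is a sketch under stated assumptions, not the procedure used in the study: it assumes every transcript has the same turn segmentation and the same inventory of speaker IDs, and the data and function names are invented for illustration.

```python
from itertools import combinations


def accuracy(gold, coded):
    """Proportion of turns whose speaker ID matches the reference transcript."""
    if len(gold) != len(coded):
        raise ValueError("transcripts must have the same number of turns")
    return sum(g == c for g, c in zip(gold, coded)) / len(gold)


def mean_pairwise_agreement(codings):
    """Average proportion of turns on which two transcribers assign the same ID."""
    pairs = list(combinations(codings, 2))
    return sum(accuracy(a, b) for a, b in pairs) / len(pairs)


# Invented speaker ID codings for five turns of one recording.
gold = ["S1", "S2", "S3", "S4", "S1"]
transcriber_1 = ["S1", "S2", "S3", "S4", "S1"]
transcriber_2 = ["S1", "S3", "S3", "S4", "S2"]
transcriber_3 = ["S1", "S2", "S4", "S3", "S1"]

for coded in (transcriber_1, transcriber_2, transcriber_3):
    print(accuracy(gold, coded))
print(round(mean_pairwise_agreement(
    [transcriber_1, transcriber_2, transcriber_3]), 4))
```

Raw proportions of this kind do not correct for chance agreement, which matters most when one speaker dominates a recording.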
Conclusion
Acknowledgements
We are grateful for the assistance of Samantha Owen,
Laura Grimes, Imogen Dickens and Sarah Grieves at
Cambridge University Press, and the freelance
transcribers who worked on the corpus. The research
presented in this paper was supported by the ESRC
Centre for Corpus Approaches to Social Science,
ESRC grant reference ES/K002155/1.
References
Baker, P. 2014. Using corpora to analyse gender.
London: Bloomsbury.
Brezina, V., & Meyerhoff, M. 2014. Significant or
random? A critical review of sociolinguistic
generalisations based on large corpora. International
Journal of Corpus Linguistics, 19(1), 1-28.
doi:10.1075/ijcl.19.1.01bre
Leech, G. 1993. 100 million words of English. English
Today, 9-15. doi:10.1017/S0266078400006854
Love, R. 2014. Methodological issues in the compilation
of spoken corpora: the Spoken BNC2014 pilot study.
Lancaster University: unpublished MA dissertation.
McEnery, A., & Xiao, Z. 2004. Swearing in modern
British English: the case of fuck in the BNC. Language
and Literature, 13(3), 235-268.
doi:10.1177/0963947004044873
Xiao, R., & Tao, H. 2007. A corpus-based
sociolinguistic study of amplifiers in British English.
Sociolinguistic Studies, 1 (2), 241–273.
Uwe Springmann
CIS, Ludwig-Maximilians-Universität München
anke.luedeling@rz.hu-berlin.de
springmann@cis.uni-muenchen.de
Introduction
http://korpling.german.hu-berlin.de/ridges/index_en.html;
RIDGES stands for Register in Diachronic German Science.
The corpus is deeply annotated (in a multi-layer format) and
freely available under the CC-BY license. It can be downloaded
in several formats as well as queried through the ANNIS search
tool (Krause and Zeldes, 2014).
71
In transcribing a text, one has to take many decisions with
respect to diplomaticity. For the specific decisions in RIDGES,
see the manual on the homepage.
72 http://finereader.abbyy.com/
73 https://code.google.com/p/tesseract-ocr/
74 https://github.com/tmbdev/ocropy
75 https://code.google.com/p/cistern/
OCR
correction
annotator 1
40
21
annotator 2
80
28
annotator 3
60
35
average/page
30
14
Summary
References
Breuel, T. M. 2008. The OCRopus open source OCR
system. Electronic Imaging 2008.
Breuel, T. M., Ul-Hasan, A., Al-Azawi, M. A & Shafait,
F. 2013. High-performance OCR for printed English
and Fraktur using LSTM networks. In 12th
International Conference on Document Analysis and
Recognition (ICDAR), 683-687.
Claridge, C. 2008. Historical corpora. In A. Lüdeling
& M. Kytö (eds.) Corpus Linguistics. An International
Handbook, Berlin: Mouton de Gruyter, 242–259.
Hochreiter, S. and Schmidhuber, J. 1997. Long
short-term memory. Neural computation, 9(8), 1735-1780.
Krause, T. and Zeldes, A. 2014. ANNIS3: A new
architecture for generic corpus query and
visualization. Digital Scholarship in the Humanities.
Reddy, S. & Crane, G. 2006. A document recognition
system for early modern Latin. Chicago Colloquium
on Digital Humanities and Computer Science: What
Do You Do With A Million Books, Chicago, IL.
Rydberg-Cox, J. A. 2009. Digitizing Latin incunabula:
Challenges, methods, and possibilities. Digital
Humanities Quarterly, 3(1).
Springmann, U., Najock, D., Morgenroth, H., Schmid, H.,
Gotscharek, A. & Fink, F. 2014. OCR of historical
printings of Latin texts: problems, prospects,
progress. Proceedings of the First International
Conference on Digital Access to Textual Cultural
Heritage, 71-75.
Vobl, T., Gotscharek, A., Reffle, U., Ringlstetter, C.
& Schulz, K. U. 2014. PoCoTo - an open source system
for efficient interactive postcorrection of OCRed
historical texts. Proceedings of the First International
Conference on Digital Access to Textual Cultural
Heritage, 57-61.
76
Introduction
Forced alignment
Additional benefits
77 This makes sense for Czech, where the grapheme-to-phoneme
correspondences are relatively straightforward. English would
rather use a reference pronouncing dictionary such as CMUdict
(Weide 1998) for AmE or BEEP (Robinson 1997) for BrE.
Acknowledgements
References
Bisani, M. and Ney, H. 2008. Joint-sequence models for
grapheme-to-phoneme conversion. Speech
Communication 50: 434-451.
Brocki, Ł., Koržinek, D. and Marasek, K. 2014.
Challenges in Processing Real-Life Speech Corpora.
Presentation at Practical Applications of Language
Corpora (PALC 2014). Łódź, Poland.
Jiang, H. 2005. Confidence measures for speech
recognition: A survey. Speech Communication 45:
455-470.
Kessler, B. 2007. Word Similarity Metrics and
Multilateral Comparison. In Proceedings of Ninth
Meeting of the ACL Special Interest Group in
Computational Morphology and Phonology. Prague,
Czech Republic: Association for Computational
Linguistics.
Kopřivová, M., Klimešová, P., Goláňová, H. and Lukeš,
D. 2014. Mapping Diatopic and Diachronic Variation
in Spoken Czech: the ORTOFON and DIALEKT
corpora. In N. Calzolari et al. (eds.) Proceedings of
the Ninth International Conference on Language
Resources and Evaluation (LREC '14). Reykjavik,
Iceland: European Language Resources Association
(ELRA). Available online at
http://www.lrec-conf.org/proceedings/lrec2014/pdf/252_Paper.pdf
Labov, W. 1963. The Social Motivation of a Sound
Change. Word 19: 273-309.
Machač, P. and Skarnitzl, R. 2009. Principles of Phonetic
Segmentation. Prague, Czech Republic: Nakladatelství
Epocha.
McMahon, A.M.S. 1994. Understanding Language
Antti Arppe
University of Alberta
caelan@ualberta.ca
arppe@ualberta.ca
References
Cosh, C. (April 3, 2012). Don't call them tarsands.
Maclean's. Retrieved from
http://www.macleans.ca/news/canada/oil-by-any-other-name/
Heylen, Kris, Thomas Wielfaert, and Dirk Speelman
(2013). Tracking Immigration Discourse through
Time: A Semantic Vector Space Approach to Discourse
Analysis.
Jaspaert, Koen, et al. (2011). Does framing work? An
empirical study of Simplifying Models for sustainable
food production. Cognitive Linguistics 22.3, 459-490.
Levin, I. P., & Gaeth, G. J. (1988). How consumers are
affected by the framing of attribute information before
and after consuming the product. Journal of consumer
research, 374-378.
Levin, I. P., Schneider, S. L., & Gaeth, G. J. (1998). All
frames are not created equal: A typology and critical
analysis of framing effects. Organizational behavior
and human decision processes, 76, 149-188.
Magnusson, C., Arppe, A., Eklund, T., Back, B.,
Vanharanta, H., & Visa, A. (2005). The language of
quarterly reports as an indicator of change in the
company's financial status. Information &
Management, 42, 561-574.
Peirsman, Y., Heylen, K., & Geeraerts, D. (2010).
Michaela Martinková
Palacký University
michaela.martinkova@upol.cz
Introduction
(2)
Discussion of findings
79
81
References
Aijmer, K. and Altenberg, B. 2002. Zero translations
and cross-linguistic equivalence: evidence from the
English-Swedish Parallel Corpus. In: L. E. Breivik
and A. Hasselgren (eds), From the COLT's mouth ...
and others. Language corpora studies in honour of
Anna-Brita Stenström. Amsterdam: Rodopi.
Biber, D. et al. 2007. Longman Grammar of Spoken and
Written English. Pearson Education ESL.
Desagulier, G. Quite new methods for a rather old issue:
Visualizing constructional idiosyncrasies with
multivariate statistics. Available at
http://www2.univ-paris8.fr/desagulier/home/handout_ICAME_33.pdf
Diehl, H. 2005. Quite as a degree modifier of verbs.
Nordic Journal of English Studies 4 (1), 11-34.
Ghesquière, L. 2012. On the development of noun
intensifying quite. Paper presented at ICAME 33
conference.
Johansson, S. 2007. Seeing through multilingual
corpora. In R. Facchinetti (ed.) Corpus Linguistics 25
Years On. Amsterdam/New York: Rodopi.
Levshina, N. 2014. Geographic variation of quite + ADJ
in twenty national varieties of English: A pilot study.
In: A. Stefanowitsch (ed.), Yearbook of the German
Cognitive Linguistics Association 2 (1), 109-126.
Ocelák, R. 2013. Sémantické škály a skalární
modifikátory v češtině. Slovo a slovesnost 74 (2),
110-134.
Palacios Martínez, I. M. 2009. Quite Frankly, I'm Not
Quite Sure That it is Quite the Right Colour. A
Corpus-Based Study of the Syntax and Semantics of
Quite in Present-Day English. English Studies 90 (2),
180-213.
Paradis, C. 1997. Degree modifiers of adjectives in
spoken British English (Lund Studies in English 92).
Lund: Lund University Press.
Paradis, C. 2008. Configurations, construals and change:
expressions of DEGREE. English Language and
Linguistics 12, 317-343.
Quirk et al. 1985. Comprehensive Grammar of the
English Language. London: Longman.
Czech National Corpus - InterCorp. Institute of the Czech
National Corpus FF UK, Praha.
Markéta Janebová
Palacký University
michaela.martinkova@upol.cz
marketa.janebova@upol.cz
Introduction
Data
fiction
Presseurope
Europarl
size: 15,038,876
f/ipm.: 51/3.39
Discussion of findings
Table 2 suggests a difference between fiction,
82
fiction
Presseurope
Europarl
Cz TTs
source
source
known
unknown
64.6%
35.4%
Cz STs
source
known
61.4%
45%
source
unknown
38.6%
55%
Looking Ahead
References
Czech National Corpus - InterCorp, Institute of the Czech
National Corpus, Prague. <http://www.korpus.cz>.
Fronek, J. 2000. Velký česko-anglický slovník. LEDA.
Gast, V. 2012. Contrastive Linguistics: Theories and
Methods.
<http://www.personal.uni-jena.de/~mu65qev/papdf/contr_ling_meth.pdf>
Gast, V. and Levshina, N. 2014. Motivating w(h)-clefts
in English and German: A hypothesis-driven parallel
corpus study.
<http://www.personal.uni-jena.de/~mu65qev/papdf/gast_levshina_subm.pdf>
Grepl, M. 2002. Reprodukce prvotních výpovědí. In: P.
Karlík et al. (eds.), Encyklopedický slovník češtiny.
Prague: Nakladatelství Lidové noviny.
Hirschová, M. and Schneiderová, S. 2012. Evidenciální
výrazy v českých publicistických textech (případ
údajně-údajný). [online].
<http://www.ujc.cas.cz/miranda2/export/sitesavcr/data.avcr.cz/humansci/ujc/vyzkum/gramatika-akorpus/proceedings-2012/konferencniprispevky/HirschovaMilada_SchneiderovaSona.pdf>
Hoffmannová, J. and Kolářová, I. 2007. Slovo prý/prej:
možnosti jeho funkční a sémantické diferenciace. In:
F. Štícha and J. Šimandl (eds.), Gramatika a korpus
2005. Praha: ÚJČ AV ČR.
Diana McCarthy
Theoretical and Applied Linguistics, Univ. Cambridge
Adam Kilgarriff
Lexical Computing Ltd.
diana@dianamccarthy.co.uk
adam.kilgarriff@sketchengine.co.uk
Miloš Jakubíček
Lexical Computing Ltd., Masaryk University
Siva Reddy
Univ. Edinburgh
milos.jakubicek@sketchengine.co.uk
siva.reddy@ed.ac.uk
Verb Supersense    Verbs of
body
consumption
communication
...
Noun Supersense    Nouns denoting
act                acts or actions
animal             animals
artifact           man-made objects
...
[Figure: supersense word sketch for fly (verb); UKWaC super
sensed freq = 22,610 (61.1 per million). Grammatical relations
shown: intransframe, mwe, transframe, ne_subject_of, caternative.
Patterns listed include animal.n_*motion.v, artifact.n_*motion.v,
communication.n_*motion.v, group.n_*motion.v, act.n_*motion.v,
time.n_*motion.v, person.n_*motion.v, 0_*motion.v,
fly_by_motion.v, fly_on_motion.v, fly_start_motion.v,
fly_colours_act.n, person.n_*motion.v_artifact.n, *_motion.v and
*motion.v_motion.v. Example line: "spectacle of a Harris hawk
flying around the building".]
Formalism
References
Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998,
August). The Berkeley FrameNet Project. Proc. ACL.
Baroni, M. and Lenci, A. 2010. Distributional Memory: A
general framework for corpus-based semantics.
Computational Linguistics 36(4): 673-721.
Ciaramita, M. and Altun, Y. 2006. Broad-coverage sense
disambiguation and information extraction with a
supersense sequence tagger. Proc EMNLP, Sydney,
Australia: pp 594-602.
Ciaramita, M. and Johnson, M. 2003. Supersense Tagging
of Unknown Nouns in WordNet. In Proceedings of
EMNLP 2003.
Erk, K. 2007. A simple, similarity-based model for
selectional preferences. Proc. ACL 2007. Prague,
Czech Republic, 2007.
Fellbaum, C. (ed.) 1998. WordNet: An Electronic
Lexical Database. MIT Press, Cambridge.
Hanks, P. W. 2013. Lexical Analysis: a theory of norms
and exploitations. MIT Press.
Kilgarriff, A. 1997. Foreground and Background
Lexicons and Word Sense Disambiguation for
Information Extraction. Proc Workshop on Lexicon-driven
Information Extraction, Frascati, Italy.
Kilgarriff, A., Rychlý, P., Smrž, P., Tugwell, D. 2004.
The Sketch Engine. Proc. EURALEX. pp. 105-116.
Kipper, K., Korhonen, A., Ryant, N., & Palmer, M. 2006.
Extending VerbNet with novel verb classes. Proc.
LREC.
McCarthy, D. and Carroll, J. 2003. Disambiguating nouns,
verbs and adjectives using automatically acquired
selectional preferences, Computational Linguistics,
29(4). pp 639-654.
Navigli, R. and Ponzetto, S. 2012. BabelNet: The
Automatic Construction, Evaluation and Application of
a Wide-Coverage Multilingual Semantic Network.
Artificial Intelligence, 193, Elsevier, pp. 217-250.
Resnik, P. 1993. Selection and Information: A Class-Based
Approach to Lexical Relationships. Ph.D. thesis,
University of Pennsylvania, Philadelphia, PA.
Rychlý, P. 2008. A Lexicographer-Friendly Association
Score. Proc. RASLAN workshop, Brno, Czech
Republic.
Schulze, B. M., & Christ, O. 1994. The CQP user's
manual. Universität Stuttgart, Stuttgart.
Socher, R. 2014. Recursive Deep Learning for Natural
Language Processing and Computer Vision,
PhD Thesis, Computer Science Department, Stanford
University
Claire Hardaker
Lancaster University
m.mcglashan@lancaster.ac.uk
c.hardaker@lancaster.ac.uk
Introduction
Acknowledgements
This work was supported by the Economic and
Social Research Council [grant number
ES/L008874/1].
We would like to thank: Steve Wattam for his
tireless help, engineering wizardry and endless
cynicism; Uday Avadhanam for initiating him into
the (horribly complex) world of R; Tony McEnery
for all his help and patience.
References
Anthony, L. 2014. AntConc (Version 3.4.3.) [Computer
Software]. Tokyo, Japan: Waseda University.
Available from http://www.antlab.sci.waseda.ac.jp/
Baker, P. 2006. Using Corpora in Discourse Analysis.
London: Continuum.
Scott, J. 2013. Social Network Analysis. 3rd Ed. London:
Sage.
Zappavigna, M. 2012. Discourse of Twitter and social
media. London: Continuum.
Conclusions
The global community currently faces
unprecedented challenges in critical areas (e.g. food
production, resource depletion, energy availability)
with rhetoricians quick to portray technological
innovation as the panacea to avoid systemic
collapse. At times technology has arguably helped
deliver the species from peril but the extent to which
we can unquestioningly rely on technological
innovation or the extent to which it actually
increases risk of catastrophe is worryingly
overlooked in popular discourse (Wright 2005;
Taleb 2012).
Huesemann and Huesemann (2011) go so far as to
boldly claim that humanity has seduced itself with
ideological techno-optimistic notions of salvation.
They define techno-optimism in terms of a set of
core beliefs:
efficiency gains will solve the major
problems of our day;
continued economic growth is
environmentally sustainable;
military investment will ensure global peace
and secure access to scarce resources for
industrialised nations;
Preliminary results
Subcategory of Techno-Optimism    Agreement Level (%)
Efficiency Gains                  0.92
Infinite Growth                   0.97
Medical Improvements              0.97
Military Investment               0.78
Biofuels and Nuclear Power        0.84
GM Crops                          0.93
Material Happiness                1.0
Technological Imperative          0.92
Overall Agreement level           0.89
Table 1: Techno-Optimist Macro-Propositional
Agreement in the NIC Corpus
The high level of macro-propositional agreement
suggests that techno-optimism was highly prevalent
in the work of the NIC. In relation to Efficiency
Gains, Infinite Growth and Material Happiness this
disclosed a status quo bias (i.e. business as usual) in
the work of the NIC. Radical possibilities such as
the development of a seasonal economic system or
localized economies were not given consideration.
Interestingly whilst Military Investment was the
largest sub-category it also had the highest level of
macro-propositional disagreement. This was largely
driven by consideration of the challenges posed by
new technologies to traditional power structures. In
the remaining subcategories disagreement was
mostly driven by consideration of technical and
commercial constraints imposed upon the process of
innovation diffusion.
Stance Marker                 Items Per 1000 Words    % of Total Use
Uncertainty Marker (Hedge)    12.4                    0.76
Certainty Marker (Booster)    1.2                     0.08
Attitude Marker               2.5                     0.16
Total                         16.1                    100
Stance Marker                 Items Per 1000 Words    % of Total Use    Difference (% of Total Use: NIC and Hyland study)
Uncertainty Marker (Hedge)    14.5                    0.59              +0.18
Certainty Marker (Booster)    5.8                     0.24              -0.16
Attitude Marker               4.2                     0.17              -0.02
Total                         24.5                    100
Summary
References
Commoner, B. 1971. The Closing Circle: Nature, Man,
and Technology. New York: Alfred A. Knopf.
Dickson, D. 1975. The Politics of Alternative
Technology. New York: American Management
Association.
Eagleton, T. 1991. Ideology. An Introduction. London:
Verso.
Huesemann, M. and Huesemann, J. 2011. TECHNO-FIX:
Why Technology Won't Save Us or the
Environment. Gabriola Island: New Society
Publishers.
Evgeniya Smolovskaya
National Research University HSE
Evgeniy Mescheryakova
National Research University HSE
esmolovskaya@hse.ru
eimescheryakova@hse.ru
Olesya Kisselev
Pennsylvania State University
Ekaterina Rakhilina
National Research University HSE
ovk103@psu.edu
erakhilina@hse.ru
Introduction
theoretical issues of error identification,
categorization and explanation of error source.
These two lines of work are not entirely independent
of each other; in fact, they feed into one another,
ideally resulting in the creation of a unified,
automated, and comprehensive error tagging system.
Error analysis of the texts in the Russian Learner
Corpus has been thus far attempted from these two
perspectives. Klyachko et al. (2013) tested a
protocol for automated error identification, which
consisted of comparing lists of bi- and tri-grams
found in the learner corpus with lists of bi- and
tri-grams found in a native corpus. This approach was
found to be fairly successful in identifying such
errors as noun-adjective agreement and prepositional
and verbal government. However, it comes with
certain limitations: for instance, it provided far less
accurate results for discontiguous structures
than for contiguous strings (possibly due to the
size and characteristics of the baseline corpus) and,
more importantly, left a large repertoire of
non-grammatical structures out of its scope.
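The n-gram comparison described above can be illustrated with a minimal sketch. It is not the protocol of Klyachko et al. (2013): the function names, the `min_count` threshold and the transliterated toy sentences are all assumptions made for illustration, and a real baseline would be drawn from a large native corpus. The sketch flags every learner bi- or tri-gram that is unattested in the reference lists.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_reference(sentences, sizes=(2, 3)):
    """Count bi- and tri-grams in a (hypothetical) native reference corpus."""
    reference = {n: Counter() for n in sizes}
    for sentence in sentences:
        for n in sizes:
            reference[n].update(ngrams(sentence, n))
    return reference


def flag_unattested(sentence, reference, min_count=1):
    """List learner n-grams seen fewer than min_count times in the reference."""
    flagged = []
    for n, counts in reference.items():
        for gram in ngrams(sentence, n):
            if counts[gram] < min_count:
                flagged.append(gram)
    return flagged


# Toy data (transliterated, invented): the learner sentence contains a
# noun-adjective agreement error ("interesnaya knigu").
native = [
    ["ya", "chitayu", "interesnuyu", "knigu"],
    ["ona", "chitaet", "novuyu", "knigu"],
]
learner = ["ya", "chitayu", "interesnaya", "knigu"]
candidates = flag_unattested(learner, build_reference(native))
```

Because only contiguous strings are compared, a sketch of this kind shares the limitation noted above: discontiguous structures are not captured unless gapped n-grams are added.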
Another approach, discussed in this paper, begins
with manual annotation of a sample of learner texts.
The annotators first read and tag deviant forms using
tagging software developed for the project (see the
illustration of the program interface below, Figure
1). Importantly, the error tags include
information about the source of an error (calque,
semantic extension, etc.), in addition to
information about the structural property of an error
(e.g. lexical, aspectual, morphological).
Those erroneous structures that reach a frequency
threshold that reliably points to a systematic rather
than a random nature of these errors are then
examined and grouped according to structural and
functional properties. To illustrate how this approach
works we refer to examples below:
(1)
(2)
(Russian National Corpus)
You think it was easy to drop everything and fly
here?
(Russian National Corpus)
It is hard to say / when they will finish building.
Conclusions
References
Alsufieva, A., Kisselev, O. and Freels, S. 2012. Results
2012: Using Flagship Data to Develop a Russian
Learner Corpus of Academic Writing. Russian
Language Journal, 62: 79-105.
Bonch-Osmolovskaya, A. 2006. Dativnyj subject v
russkom yazyke: korpusnoe issledovanie.
Unpublished PhD thesis, Moscow State University.
Granger, S. 1998. Learner English on Computer. Addison
Wesley Longman, London and New York.
Ladygina, A. 2014. Russkie heritazhnye konstruktsii:
korpusnoe issledovanie. Unpublished MA thesis,
Moscow State University.
Klyachko, E., Arkhangelskiy, T., Kisselev, O. and
Rakhilina, E. 2013. Automatic error detection in
Russian learner language. Conference presentation,
CL2013.
References
Calsamiglia, H. and van Dijk T.A. 2004. Popularization
Discourse and Knowledge about the Genome.
Discourse & Society 15 (4), Special issue Genetic and
genomic discourses at the dawn of the 21st century,
guest-edited by B. Nerlich, R. Dingwall, P. Martin:
369-389.
George, S. 1994. An Awkward Partner. Britain in the
European Community. Oxford: Oxford University Press.
Lakoff, G. and Johnson, M. 1980. Metaphors we live by.
Chicago: University of Chicago Press.
Milizia, D. 2014a. In, out, or half way? The European
attitude in the speeches of British leaders. Lingue e
Linguaggi 11: 157-175.
Milizia, D. 2014b. Specialized discourse vs popularized
discourse: the UK and the European Union. Paper
presented at the University of Catania, Italy, 2nd
International Conference, Language and Diversity:
Discourse and Translation, 9-11 October.
Milizia, D. 2014c. A bilingual comparable analysis: the
European Union in the speeches of British and Italian
leaders. Paper presented at the University of Milan,
Italy, CLAVIER 14: LSP 20-21 November.
Musolff, A. 2004. Metaphor and Political Discourse.
New York: Palgrave Macmillan.
Musolff, A. 2000. Political imagery of Europe: A house
without exit doors? Journal of Multilingual and
Multicultural Development 21 (3): 216-229.
Scott, M. 2012. WordSmith Tools 6.0. Lexical Analysis
Software Limited.
Semino, E. 2002. A sturdy baby or a derailing train?
Metaphorical representations of the euro in British and
Italian newspapers. Text 22 (1): 107-139.
Jean-Gabriel Ganascia
UPMC, LIP6, Paris
jean-gabriel.ganascia@lip6.fr
Introduction
Adjectives
English
French
ressembler à, sembler, rappeler, faire l'effet de,
faire penser à, faire songer à, donner l'impression de,
avoir l'air de, verb + plus que, verb + moins que,
être/devenir espèce/type/genre/sorte de
identique à, tel, semblable à, pareil à, similaire à,
analogue à, égal à, comparable à
Results
www.gutenberg.org
beq.ebooksgratuits.com
Acknowledgement
This work was supported by French state funds
managed by the ANR within the Investissements
d'Avenir programme under the reference
ANR-11-IDEX-0004-02.
References
Bouverot, D. 1969. Comparaison et métaphore. Le
Français Moderne 37(2): 132-147.
Israel, M., Riddle Harding, J. and Tobin, V. 2004. On
Simile. Language, Culture and Mind. Stanford: CSLI
Publications.
Goatly, A. 1997. The Language of Metaphors. London
and New York: Routledge.
Leech, G. and Short, M. 2007. Style in Fiction: A
Linguistic Introduction to English Fictional Prose.
Harlow: Pearson Longman.
Schmid, H. 1994. Probabilistic part-of-speech tagging
using decision trees. Proceedings of the International
Conference on New Methods in Language Processing:
Background
Corpus
Analysis
L2 accuracy development.
References
Barr, D. J. (2008). Analyzing visual world eyetracking
data using multilevel logistic regression. Journal of
Memory and Language, 59(4), 457-474.
doi:10.1016/j.jml.2007.09.002
Gries, S. T. (in press). The most underused statistical
method in corpus linguistics: Multi-level (and
mixed-effects) models. Corpora. Retrieved from
http://www.linguistics.ucsb.edu/faculty/stgries/research/ToApp_STG_MultilevelModelingInCorpLing_Corpora.pdf
Jaeger, T. F. (2008). Categorical data analysis: Away
from ANOVAs (transformation or not) and towards
logit mixed models. Journal of Memory and Language,
59(4), 434-446. doi:10.1016/j.jml.2007.11.007
Housen, A., Kuiken, F., & Vedder, I. (2012a).
Complexity, accuracy and fluency: Definitions,
measurement and research. In A. Housen, F. Kuiken,
& I. Vedder (Eds.), Dimensions of L2 performance and
proficiency: Complexity, accuracy and fluency in SLA
(pp. 1-20). Amsterdam: John Benjamins.
Housen, A., Kuiken, F., & Vedder, I. (2012b).
Dimensions of L2 performance and proficiency:
Complexity, accuracy and fluency in SLA.
Amsterdam: John Benjamins.
Susan Nacey
Hedmark University College
susan.nacey@hihm.no
Introduction
Categorization of metaphor translations
1
2
3
4
5
6
7
Abbreviation
MM
M1 M2
MS
M/S S + gloss
MP
M
M M + gloss
3
4
5
6
7
8
9
10
TT ID tag
(NEST_Opp_)
Translation
strategy
002en.s32
M M (-)
References
003en.s37
M M (L1) +
gloss
004en.s37
M1 M2
005en.s37
MM
(-)
007en.s37
M1 M2
008en.s30
MM
(-/+)
010en.s36
M1 M2
011en.s39
MM
(-/+)
014en.s43
MM
(-)
MP
159en.s38
in
learner
English.
Mariko Abe
Chuo University
abe.127@g.chuo-u.ac.jp
Yuichiro Kobayashi
Toyo University
kobayashi0721@gmail.com
Introduction
Research Methodology
Results
Hong
Kong
5.05
4.19
3.85
5.57
5.61
Taiwan
Korea
Japan
4.78
5.02
6.34
7.29
7.81
2.97
3.25
3.24
4.94
4.39
7.91
7.49
8.54
8.06
6.19
The present study examined the use of prompt-affected
lexical bundles in argumentative essays
produced by four Asian learner groups. Our
quantitative analyses revealed that among these
learner groups, more recycling of prompt wordings
was found in the essays produced by Japanese L2
learners. Also, unlike the other learner groups, Hong
Kong L2 learners did not reuse the whole wording of
the prompt in their essays at all. Furthermore, it was
found that L2 learners tended to reuse parts or the
whole wording of the given writing prompt
regardless of their English proficiency.
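One simple way to quantify recycling of prompt wording is the share of an essay's word n-grams that also occur in the prompt. The following is a minimal sketch of that idea, not the procedure used in the study: the function names, the choice of trigrams, the whitespace tokenisation and the example prompt and essay are all assumptions made for illustration.

```python
def word_ngrams(text, n):
    """Lower-case, whitespace-tokenise and return contiguous word n-grams."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def prompt_recycling_rate(prompt, essay, n=3):
    """Proportion of essay n-grams that also appear in the writing prompt."""
    prompt_grams = set(word_ngrams(prompt, n))
    essay_grams = word_ngrams(essay, n)
    if not essay_grams:
        return 0.0  # essay shorter than n words: nothing to compare
    reused = sum(1 for gram in essay_grams if gram in prompt_grams)
    return reused / len(essay_grams)


# Invented prompt and essay fragment, for illustration only.
prompt = "it is important for college students to have a part-time job"
essay = "i think it is important for students to work"
rate = prompt_recycling_rate(prompt, essay)
```

A fuller treatment would strip punctuation, identify recurrent bundles across the whole corpus, and normalise by essay length before comparing learner groups.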
We assume that recurrent use of lexical bundles,
whether prompt-induced or not, is relevant to
the development of lexical cohesion in L2 writing.
Thus, further corpus-based research is necessary to
obtain a better understanding of lexical cohesive
links produced by L2 learners. Further research can
also offer pedagogical implications for L2 writing
instructors to develop more informed courses and
materials, enabling their students to produce more
cohesive written discourse in a second language.
Discussion
Conclusion
Acknowledgements
This research was supported by Grants-in-Aid for
Scientific Research Grant Numbers 24320101 and
26370703.
References
Abe, M., Kobayashi, Y. and Narita, M. 2013. Using
multivariate statistical techniques to analyze the
writing of East Asian learners of English. In S.
Ishikawa (ed.), Learner corpus studies in Asia and the
world - Vol.1. Kobe: School of Language and
Communication, Kobe University.
Granger, S. 1998. Learner English on computer. New
York: Addison Wesley Longman, Inc.
Granger, S., Gilquin, G. and Meunier, F. (eds.) 2013.
Twenty years of learner corpus research - looking back,
moving ahead: Proceedings of the first learner corpus
research conference (LCR 2011). Louvain-la-Neuve:
Eva Hajičová
Charles University in
Prague
nedoluzko@ufal.mff.cuni.cz
hajicova@ufal.mff.cuni.cz
in
69583
21529
% of
total
tfa=t
100%
30%
48054
70%
37606
54%
6755
10%
3709
5%
876
1%
Table 1
In conclusion, the analysis of anaphoric relations
from the point of view of their co-existence with one
aspect of information structure, namely the feature
of contextual boundness, has revealed that the
majority of contextually bound noun groups without
anaphoric links represent contextual relations of a
kind different from anaphoric and basic bridging
relations. Although several types of bridging
relations are annotated in PDT, they cannot cover all
kinds of textual cohesive interdependencies; our
inquiry has pointed out one of the directions for
further investigations.
Acknowledgement
We gratefully acknowledge support from the Grant
Agency of the Czech Republic (grant P406/12/0658
Coreference, discourse relations and information
structure in a contrastive perspective).
References
Bejček E., Hajičová E., Hajič J. et al. (2013): Prague
Dependency Treebank 3.0. Data/software, Univerzita
Karlova v Praze, MFF, ÚFAL, Prague, Czech
Republic, http://ufal.mff.cuni.cz/pdt3.0/
Leech G. (1974), Semantics. Harmondsworth: Penguin
Books.
Introduction
Concluding remarks
References
Baker, P. (2006). Using Corpora in Discourse Analysis.
London: Bloomsbury Academic.
Beauchamp-Pryor, K. (2011). Impairment, cure and
identity: where do I fit in? Disability & Society,
26(1), 5-17. doi:10.1080/09687599.2011.529662
Pascual Pérez-Paredes
Universidad de
Murcia
yolanda.noguera@upct.es
pascualf@um.es
Introduction
Methodology
References
Aston, Guy and Burnard, Lou. The BNC handbook:
exploring the British National Corpus with SARA.
Edinburgh University Press, 1998.
Bloggs, J.F. and Brown, Q.V. 2004. The very complicated
nature of corpus linguistics. Anytown: Anytown
University Press.
Biber, Douglas, Conrad, Susan, and Reppen, Hat, H.H.
2006. A classic research thesis with a very long title:
and a subtitle. Unpublished PhD thesis, University of
Anytown.
McEnery, Tony, and Wilson, Andrew. 2001. Corpus
Linguistics, 2nd ed. Edinburgh University Press
Partington, Alan. 1998. Patterns and Meanings, 1998.
Amsterdam: Benjamins,
Sinclair, John. 1991. Corpus, Concordance, Collocation.
Oxford UP.
Smith, X. 2003. Some thoughts on submitting abstracts
to conferences. In J. Jones and F.Farmer (eds.) All
about conferences. London:
Introduction
Methodology
Findings
1
2
3
References
Berber Sardinha, T. Teaching Grammar and Corpora. In:
Chapelle, C. A. (Ed.) The Encyclopedia of Applied
                   No. of Tokens
1950s     10       7,410
1960s     13       20,050
1970s     12       12,610
1980s     11       12,162
1990s     19       23,885
2000s     16       13,793
>2014     7        3,292
TOTAL     88       93,202
Acknowledgements
I would like to thank the Economic and Social
Research Council (Grant number: ES/J50001X/1)
for funding the wider doctoral project, of which this
research is a part.
References
Baker, P. 2005. Public Discourses of Gay Men. London:
Routledge.
Baker, P. 2014. Using Corpora to Analyze Gender.
London: Bloomsbury.
Chirrey, D. 2007. Women Like Us: Mediating and
contesting identity in Lesbian advice literature. In, H.
Sauntson and S. Kyratzis (eds.) Language, Sexualities
and Desires: Cross-Cultural Perspectives.
Basingstoke: Palgrave Macmillan, pp. 223-244.
Chirrey, D. 2012. Reading the Script: An analysis of
script formulation in coming out advice texts. Journal
of Language and Sexuality 1 (1): 35-58.
Jewitt, C. and Oyama, R. 2001. Visual Meaning: A
Social Semiotic Approach. In, T. van Leeuwen and C.
Jewitt (eds.) Handbook of Visual Analysis. London:
Sage, pp. 134-156.
Partington, A., Duguid, A., and Taylor, C. 2013. Patterns
and Meanings in Discourse: Theory and practice in
corpus-assisted discourse studies (CADS). Amsterdam:
John Benjamins.
Wilmot, M. and Naidoo, D. 2014. Keeping Things
Straight: The representation of sexualities in life
orientation textbooks. Sex Education 14 (3): 32
Òscar Bladas
University of
Barcelona
a.oboyle@qub.ac.uk
o.bladas@alumni.ub.edu
References
Michael Pace-Sigge
University of Eastern Finland
michael.pace-sigge@uef.fi
Tomasello, M. 2005. The ultra-social animal.
European Journal
Introduction
Research
Conclusions
References
Barbaresi, A. 2012. German Political Speeches. Corpus
and Visualization. Second release, 03/05/12. Available
online at: http://purl.org/corpus/german-speeches (last
accessed 21/11/14).
Bewer, F. 2004. Der Erwerb des Artikels als
Genus-Anzeiger im deutschen Erstspracherwerb. ZAS Papers
in Linguistics 33 (2): 87-140.
CHILDES Corpus, German. Available online at:
http://childes.psy.cmu.edu/data/germanic/german (last
accessed 21/11/14).
European Parliament Proceedings Parallel Corpus
1996-2011. German Source Release. Available online at:
http://www.statmt.org/europarl/ (last accessed
21/11/14).
Fries, N. 2001. Ist Deutsch eine schwere Sprache? Am
Beispiel des Genus-Systems. Berlin: Humboldt
Universität. Available online at
http://www2.rz.hu-berlin.de/linguistik/institut/syntax/docs/fries_ds_2000.pdf
(last accessed 18/11/14)
Jantunen, J. H. and Brunni, S. 2013. Morphology, lexical
priming and second language acquisition: a
corpus-study on learner Finnish. In S. Granger, G. Gilquin
and F. Meunier (eds) Twenty Years of Learner Corpus
Research: Looking back, Moving ahead. Corpora and
Language in Use Proceedings 1. Louvain-la-Neuve.
Presses universitaires de Louvain. 235-245.
Helbig, G. and Buscha, J. 1984. Deutsche Grammatik.
Ein Handbuch für den Ausländerunterricht. Leipzig:
VEB Verlag Enzyklopädie Leipzig.
Hoey, M. 2005. Lexical Priming. A new theory of words
and language. London: Routledge.
Hoey, M. and Shao, J. (forthcoming). 'English and
Chinese - two languages explained by the same
theory? The odd case of a psycholinguistic theory that
generates corpus-linguistic hypotheses for two
Introduction
Outline of Paper
The Corpus
Conclusion
References
Charteris-Black, Jonathan. (2014). Analysing Political
Speeches: Rhetoric, discourse and metaphor. London:
Palgrave-Macmillan.
Hanks, Patrick (2004). The Syntagmatics of Metaphor
in International Journal of Lexicography 17:3.
Hanks, Patrick. (2013). Lexical Analysis. London: John
Benjamins.
Hoey, Michael (2005). Lexical Priming. London:
Routledge.
Philip, Gill (2011). Colouring Meaning: Collocation and
Connotation in Figurative Language. Amsterdam:
John Benjamins.
Introduction
http://ec.europa.eu/eurostat/statistics-explained/index.php/Migration_and_migrant_population_statistics#Further_Eurostat_information
95 http://www.theguardian.com/uk/2013/jan/13/immigration-british-society-biggest-problem
Research methodology: the LADEX corpus
Acknowledgements
Research funded by FFI2011-30214 Lenguaje de
la Administración Pública en el ámbito de la
extranjería: estudio multilingüe e implicaciones
culturales (LADEX). Spanish Ministry of Economy
and Competitiveness (Ministerio de Economía y
Competitividad).
References
Baker, P. 2005. Public discourses of gay men. London:
Routledge.
Baker, P. & McEnery, T. 2005. A corpus-based
approach to discourses of refugees and asylum seekers
in UN and newspaper texts. Journal of Language and
Politics, 4, 197-226.
Baker, P. et al. 2008. A useful methodological synergy?
Combining critical discourse analysis and corpus
linguistics to examine discourses of refugees and
asylum seekers in the UK press. Discourse & Society,
19: 273-306.
Baker, P., Gabrielatos, C. & McEnery, T. 2013.
Discourse analysis and media attitudes. Cambridge:
Lorna J. Philip
University of
Aberdeen
gill.philip@unimc.it
l.philip@abdn.ac.uk
Alistair E. Philip
Chartered clinical psychologist
aephilip@waitrose.com
Introduction
Word frequency and conceptual centrality
Manual intervention
Concluding remarks
References
Philip, L.J. and Macmillan, D.C. 2005. Exploring values,
context and perceptions in contingent valuation
studies: the CV Market Stall technique and willingness
to pay for wildlife conservation. Journal of
Environmental Planning and Management 48 (2): 257-274.
Philip, G., Philip L.J., and Philip, A.E. 2014. Learning as
conceptual acquisition: A pilot project for measuring
learning outcomes in higher education. Paper
presented at AELCO-SCOLA, Badajoz (Spain), 15-18
October 2014.
Pragglejaz Group. 2007. MIP: A method for identifying
metaphorically used words in discourse. Metaphor
and Symbol 22 (1): 1-39.
Robert Poole
University of Arizona
repoole@email.arizona.edu
References
Robert Poole
University of Arizona
repoole@email.arizona.edu
References
Ädel, A. 2010. Using corpora to teach academic writing:
Challenges for the direct approach. In M. Campoy-Cubillo,
B. Bellés-Fortuño, and M. Gea-Valor (eds.)
Corpus-based approaches to English language
teaching. London: Continuum.
Boulton, A. 2010. Learning outcomes from corpus
consultation. In M. Jaén, F. Valverde, and M. Pérez
(eds.) Exploring new paths in language pedagogy.
London: Equinox.
Chambers, A. 2005. Integrating corpus consultation in
language studies. Language learning and technology
9 (2): 111-125.
Conrad, S. 2000. Will corpus linguistics revolutionize grammar
teaching in the 21st century? TESOL Quarterly 34
(3): 548-560.
Charles, M. 2011. Using hands-on concordancing to
teach rhetorical functions: evaluation and implications
for EAP writing. In A. Frankenberg-Garcia, L.
Flowerdew, and G. Aston (eds.) New trends in corpora
and language learning. London: Continuum.
Flowerdew, L. 2003. A combined corpus and systemic-functional analysis of the problem-solution pattern in a
student and professional corpus of technical writing.
TESOL Quarterly 37 (3): 489-511.
Flowerdew, J., & Wan, A. 2010. The linguistic and the
contextual in applied genre analysis: The case of the
company audit report. ESP English for Specific
Purposes 29 (2): 78-93.
Henry, A. 2007. Evaluating language learners' response
to web-based, data-driven, genre teaching materials.
English for Specific Purposes 26: 462-484.
Leech, G. 1997. Teaching and language corpora: a
convergence. In A. Wichmann, S. Fligelstone, T.
McEnery, and G. Knowles (eds.) Teaching and
language corpora. London: Longman.
McCarthy, M. & Carter, R. 1994. Language as discourse.
London: Longman.
Reinhardt, J. 2010. The potential of corpus-informed L2
pedagogy. Studies in Hispanic and lusophone
linguistics 3 (1): 239-251.
Römer, U. 2010. Using general and specialized corpora
in English language teaching: Past, present, and
future. In M. Campoy-Cubillo, B. Bellés-Fortuño,
and M. Gea-Valor (eds.) Corpus-based approaches to
English language teaching. London: Continuum.
Introduction
Study background
Data
Methods
Discussion
Acknowledgements
References
International Tribunal for the Prosecution of Persons
Responsible for Serious Violations of International
Humanitarian Law Committed in the Territory of the
Former Yugoslavia since 1991. Updated Statute of the
References
Bada, E. (2010). Repetitions as vocalized fillers and self-repairs in English and French interlanguages. Journal
of Pragmatics, 42, 1680-1688.
Chui, K. (1996). Organization of repair in Chinese
conversation. Text, 16, 343-372.
Fox, B., et al. 2009. A cross-linguistic investigation of the
site of initiation in same-turn self-repair. In Sidnell, J.
(Ed.), Conversation Analysis: Comparative
Perspectives (pp. 60-103). Cambridge: Cambridge
University Press.
Fox, B., Maschler, Y., Uhmann, S. 2010. A cross-linguistic
study of self-repair: Evidence from English,
German and Hebrew. Journal of Pragmatics, 42, 2487-2505.
Gilquin, G., De Cock, S., Granger, S. 2010. Louvain
International Database of Spoken English
Interlanguage. Belgium: Presses Universitaires de
Louvain.
Huang, H., Tanangkingsing, M. 2005. Repair in verb-initial
languages. Language and Linguistics, 6(4),
575-597.
Nemeth, Z. 2012. Recycling and replacement repairs as
self-initiated same-turn self-repair strategies in
Hungarian. Journal of Pragmatics, 44, 2022-2034.
Rieger, C.L., 2003. Repetitions as self-repair strategies in
English and German conversations. Journal of
Pragmatics, 35, 47-68.
Schegloff, E., Jefferson, G., Sacks, H. 1977. The
preference for self-correction in the organization of
repair in conversation. Language, 53, 361-382.
Wouk, F. 2005. The syntax of repair in Indonesian.
Discourse Studies, 7: 237-258.
Shelley Staples
Purdue University
staples0@purdue.edu
References
Biber, D. 1988. Variation across speech and writing.
Cambridge: CUP.
Biber, D. 2006. University language: A corpus-based
study of spoken and written registers. Philadelphia, PA:
John Benjamins Publishing.
Biber, D. and Gray, B. 2013. Discourse characteristics of
writing and speaking task types on the TOEFL
iBT Test: A lexico-grammatical analysis. TOEFL iBT
Research Report 19. Princeton, NJ: Educational
Testing Service.
Daniel Ross
University of Illinois at Urbana-Champaign
djross3@illinois.edu
Introduction
Outlook
References
Carden, G. & Pesetsky, D. 1977. Double-Verb
Constructions, Markedness, and a Fake Co-ordination.
Chicago Linguistics Society 13: 8282.
Culicover, P.W. & Jackendoff, R. 1997. Semantic
subordination despite syntactic coordination. Linguistic
Inquiry 28(2): 195-217.
Lieven, E., Salomo, D. & Tomasello, M. 2009. Two-year-old
children's production of multiword utterances: A
usage-based analysis. Cognitive Linguistics 20(3): 481-507.
Lind, Å. 1983. The variant forms try and/try to. English
Studies 5: 550-563.
References
Allerton, D. 1988. Infinitivitis in English. In Klegraf
and D. Nehls, eds., Essays on the English Language
and Applied Linguistics on the Occasion of Gerhard
Nickel's 60th Birthday. Heidelberg: Julius Groos
Verlag. 11-23.
Bolinger, D. 1968. Entailment and the Meaning of
Structures. Glossa 2, 119-127.
Duffley, P. 2000. Gerund versus Infinitive as
Complement of Transitive Verbs in English: the
Problems of Tense and Control. Journal of English
Linguistics, 28, 221-248.
Rohdenburg, G. 2006. The Role of Functional Constraints
in the Evolution of the English Complementation
System. In: C. Dalton-Puffer, D. Kastovsky, N. Ritt,
and H. Schendle, eds., Syntax, Style and Grammatical
Norms: English from 1500-2000. Bern: Peter Lang,
143-166.
Rudanko, J. 2011. Changes in Complementation in
British and American English. Basingstoke: Palgrave
Macmillan.
Vosberg, U. 2003. The Role of Extractions and Horror
Aequi in the Evolution of -ing Complements with
Research Questions
Methodology
Results
... Dummy auf die[ACC]/der[DAT] Windschutzscheibe des Pkw aufschlug.
'The Technical Inspection Organisation has determined that the dummy had hit
the windscreen of the car.'
29% of the respondents report a preference for ACC,
as opposed to 36% of the participants preferring
DAT.
Overall, the results of this case study indicate that
although case marking in prepositional complements
of intransparent verbs is highly motivated, the
variation cannot be attributed to a single functional
opposition between path focus and endpoint focus.
On the contrary, a diverse range of lexico-semantic factors seems to be in play, which suggests
that with these verbs, the ACC-DAT opposition
might serve only a local, verb-specific function
that differs widely from verb to verb. The
complexity and subtlety of these functions could
explain why there seems to be considerable
disagreement among native speakers with regard to
the most acceptable case marking in a given context.
References
Duden. 2006. Die Grammatik (7th ed.). Mannheim:
Dudenverlag.
Duden. 2007. Richtiges und gutes Deutsch (6th ed.).
Mannheim: Dudenverlag.
Lakoff, G. 1987. Women, fire, and dangerous things:
What categories reveal about the mind. University of
Chicago Press.
Olsen, S. 1996. Pleonastische Direktionale. In G.
Harras and M. Bierwisch (eds.) Wenn die Semantik
arbeitet. Klaus Baumgärtner zum 65. Geburtstag.
Tübingen: Niemeyer.
Rys, J., Willems, K., De Cuypere, L. Akkusativ und
Dativ nach Wechselpräpositionen im Deutschen. Eine
Korpusanalyse von versinken, versenken, einsinken
und einsenken in. In I. Doval and B. Lübke (eds.)
Raumlinguistik und Sprachkontrast. Neue Beiträge zu
spatialen Relationen im Deutschen, Englischen und
Spanischen. München: Iudicium Verlag.
Smith, M. B. 1995. Semantic motivation vs. arbitrariness
in grammar: toward a more general account of the
dative/accusative contrast with German two-way
prepositions. In: I. Rauch and G. Carr (eds.) Insights
in Germanic linguistics I: Methodology in transition.
Berlin/New York: Mouton de Gruyter.
Willems, K. 2011. The semantics of variable case
marking (Accusative/Dative) after two-way
prepositions in German locative constructions.
Towards a constructionist approach. Indogermanische
Forschungen 116: 324-366.
Willems, K. / Rys, J. / De Cuypere, L. In press. Case
alternation in argument structure constructions with
Dag Elgesem
University of Bergen
andrew.salway@uni.no
Dag.Elgesem@infomedia.uib.no
Kjersti Fløttum
University of Bergen
kjersti.flottum@if.uib.no
Introduction
Background
Overview of method
Main findings
References
Acknowledgements
This research was supported by grants from The
Research Council of Norway's SAMKUL program
(LINGCLIM, project 220654) and VERDIKT
program (NTAP, project 213401). We are very
grateful to Knut Hofland and Lubos Steskal for their
roles in creating the corpora analysed here.
Introduction
Using Corpora
Challenges
Concluding remarks
References
Ausubel, D.P. 2000. The Acquisition and Retention of
Knowledge: A Cognitive View. Dordrecht:
Springer Science+Business Media.
Berber Sardinha, A. P. 2004. Lingüística de Corpus.
Barueri, SP: Manole.
Burton, G. 2012. Corpora and coursebooks: destined to
be strangers forever? Corpora 7 (1): 91-108.
Carter, R. 1998. Orders of reality: Cancode,
communication and culture. In: ELT Journal. Oxford:
Oxford University Press, v.52, n.1, jan/1998, p.43-56.
Gavioli, L. 2005. Exploring Corpora for ESP Learning.
John Benjamins Publishing. Studies in Corpus
Linguistics, Vol.21.
Johns, T. 1991. Should you be persuaded: two
samples of data-driven learning materials. In
T. Johns and P. King (eds.) Classroom Concordancing.
ELR Journal 4: 1-16. University of Birmingham.
Mishan, F. 2005. Designing Authenticity into
Language Learning Materials. Bristol: Intellect Books.
McCarthy, M. 2008. Language Teaching 41 (4): 563-574.
Tomlinson, B. 2003. Developing Materials for Language
Teaching. London: Continuum.
Tribble, C. and Jones, G. 1997. Concordances in the
classroom. A resource guide for teachers. Houston:
Athelstan Publications.
Xiao, R. and McEnery, T. 2005. Corpora and language
education. Manuscript. Available at:
http://www.corpus4u.org/archive/index.php/t-75.html.
Roland Schäfer
Freie Universität Berlin
Samuel Reichert
Freie Universität Berlin
roland.schaefer@fu-berlin.de
samuel.reichert@fu-berlin.de
Introduction
(1) ein    Becher   Wein
    a.NOM  cup.NOM  wine.NOM
    'a cup of wine'
(2) ein    Becher   des      Weines
    a.NOM  cup.NOM  the.GEN  wine.GEN
    'a cup of the wine'
AP, such as in (3), where (3a) and (3b) have identical meanings.
(3) a. ein    Becher   leckerer   Wein
       a.NOM  cup.NOM  tasty.NOM  wine.NOM
       'a cup of tasty wine'
    b. ein    Becher   leckeren   Weines
       a.NOM  cup.NOM  tasty.GEN  wine.GEN
In this section, we describe the data source, the sampling procedure as well as the annotation scheme.
We used the deWaC Web corpus (Baroni et al.
2009). The choice was motivated by its size (roughly
1.63 billion tokens), and by the fact that it contains
texts in diverse registers, including non-standard
variation. We took separate samples for embedded
nouns in the three grammatical genders: 1,450 observations of masculine, 1,845 observations of
neuter, and 1,719 observations of feminine embedded noun tokens. The reason for using separate samples is that German nouns show many case syncretisms, to the effect that only the masculine singular NP of the form [AP N] still differentiates between the four cases of German. In the neuter and
the feminine singular, nominative and accusative are
conflated. In the feminine singular, dative and genitive are conflated as well, effectively making the
feminine system a two case system. We therefore focus here on the results for masculine and neuter
nouns for reasons of greater clarity, although we did
also look at feminine nouns, and the results pointed
Odds ratio    p
0.14          < 0.001 ***
1.37          < 0.1
1.37          < 0.05 *
4.36          < 0.001 ***
1.03          < 0.001 ***
0.05          < 0.001 ***

Odds ratio    p
0.03          < 0.001 ***
0.99          > 0.1
2.5           < 0.001 ***
14.48         < 0.001 ***
2.27          < 0.001 ***
0.01          < 0.001 ***
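
As a reading aid for the odds ratios reported above, the following minimal Python sketch shows how an odds ratio relates to a regression coefficient and to a change in probability. It assumes the values come from a binary logistic regression, which the surviving text does not state explicitly; the function names and the 0.5 baseline are illustrative only.

```python
import math


def odds_ratio_to_coefficient(odds_ratio: float) -> float:
    """Return the logit-scale coefficient (natural log of the odds ratio)."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Multiply the baseline odds by the odds ratio and convert back to a probability."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Illustrative only: a hypothetical baseline probability of 0.5 for the modelled outcome.
for reported in (0.14, 1.37, 4.36):
    print(reported, odds_ratio_to_coefficient(reported), apply_odds_ratio(0.5, reported))
```

An odds ratio above 1 raises the probability of the modelled outcome relative to the baseline, one below 1 lowers it, and one close to 1 (such as the reported 0.99 and 1.03) leaves it nearly unchanged.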
References
Anttila, A. and Fong, V. 2000. The Partitive Constraint
in Optimality Theory. Journal of Semantics 17(4):
281-314.
Barker, C. 1998. Partitives, double genitives and anti-uniqueness.
Natural Language and Linguistic Theory
16(4): 679-717.
Baroni, M. and Bernardini, S. and Ferraresi, A. and
Zanchetta, E. 2009. The WaCky Wide Web: A
Collection of Very Large Linguistically Processed
Web-Crawled Corpora. Language Resources and
Evaluation 43(3): 209-226.
Eisenberg, P. (author) and Fuhrhop, N. (collaborator).
2013. Grundriss der deutschen Grammatik: Das Wort.
Stuttgart: Metzler.
Fahrmeir, L. and Kneib, T. and Lang, S. and Marx, B.
2013. Regression: Models, Methods and Applications.
Berlin: Springer.
Hentschel, E. 1993. Flexionsverfall im Deutschen? Die
Kasusmarkierung bei partitiven Genitiv-Attributen.
Zeitschrift für Germanistische Linguistik 21(3): 320-333.
Vos, H. M. 1999. A grammar of partitive constructions.
Tilburg : Tilburg University.
References
Introduction
Preliminary results
Conclusion
References
Alison Sealey
Lancaster University
a.sealey@lancaster.ac.uk
This paper draws on findings emerging from a three-year research project, funded by the Leverhulme
Trust, into the characteristics of discourse about
animals [104]. The overall aim of this project is to
investigate patterns in the way animals are
discursively represented, not only because this may
be intrinsically interesting, but also because of the
light it can potentially shed on the relationship
between discourse and reality. That is: a wide range
of discourse analytic studies (including some from
CDA perspectives, some using corpus methods and
some both) have explored how various social groups
are represented in language (e.g. Baker 2006; Baker
et al. 2008; Baker et al. 2013; Caldas-Coulthard and
Moon 2010; Gabrielatos and Baker 2008; Litosseliti
and Sunderland 2002; Partington 2004; Partington et
al. 2004). Such research must engage with the
possibility of reflexivity, in that the people described
may themselves respond to and sometimes
contribute to these discursive representations. This
project, by contrast, focuses on language about
living, sentient beings that, since they lack human
linguistic resources, do not participate directly in the
production of discourse. Thus, patterns in the
language used to represent animals and what they do
are a product of both the objective characteristics of
the creatures and the way discourse is used to
convey human perceptions of and stances towards
them.
Animals feature in human experience and
discourse as: objects of observation, study or
entertainment (in the wild, in laboratories, in zoos);
companions; tools (for transport and/or work);
commodities (for meat, other edible products, fur
and clothes); competitors (with each other and with
humans, in sport, as quarry in hunting, racing,
fighting); and creatures 'out of place' (pests/vermin) (see
DeMello 2012; Herzog 2010; Ingold 1988). These
are not mutually exclusive categories: creatures
hunted for sport, such as game birds or fish, may
then be eaten; creatures regarded as pests or vermin
may be executed clinically (e.g. by fumigation) or
hunted down in sporting rituals (e.g. foxes).
[104] People, products, pests and pets: the discursive
representation of animals (RPG 2013 063)
References
Data
Research questions
Findings
References
Divjak, D. 2006. Ways of intending: Delineating and
structuring near synonyms. In S. Th. Gries & A.
Stefanowitsch (Eds.), Corpora in Cognitive
Linguistics: Corpus-based Approaches to Syntax and
Lexis. Berlin/New York: Mouton de Gruyter, 19-56.
Edmonds, P. & Hirst, G. 2002. Near synonyms and
lexical choice. In Computational Linguistics, 28 (2),
105-144.
Granger, S. & Meunier, F. (Eds.) 2008. Phraseology: An
Interdisciplinary Perspective. Amsterdam: John
Benjamins.
Sketch
Engine.
s.sharoff@leeds.ac.uk
Introduction
Annotation scheme
Inter-annotator agreement
        A1    A3    A4    A5    A6    A7    A8    A9    A11   A12   A13   A14   A15
Part1   0.91  0.79  0.97  0.69  0.69  0.98  0.86  0.89  0.80  0.88  0.93  0.90  0.59
Part2   0.80  0.95  1.00  0.97  0.91  0.99  0.78  1.00  0.94  1.00  0.89  0.91  0.90
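
To make the two annotation rounds easier to compare, the agreement scores above can be summarised with a short Python sketch. The 0.8 cut-off is an assumption borrowed from common practice in reliability analysis, not a threshold stated in the abstract, and the variable names are illustrative.

```python
from statistics import mean

# Category labels and agreement scores copied from the table above.
CATEGORIES = ["A1", "A3", "A4", "A5", "A6", "A7", "A8",
              "A9", "A11", "A12", "A13", "A14", "A15"]
AGREEMENT = {
    "Part1": [0.91, 0.79, 0.97, 0.69, 0.69, 0.98, 0.86,
              0.89, 0.80, 0.88, 0.93, 0.90, 0.59],
    "Part2": [0.80, 0.95, 1.00, 0.97, 0.91, 0.99, 0.78,
              1.00, 0.94, 1.00, 0.89, 0.91, 0.90],
}
THRESHOLD = 0.8  # assumed cut-off for acceptable agreement


def summarise(scores, threshold=THRESHOLD):
    """Return the mean, the minimum, and the categories scoring below the threshold."""
    low = [cat for cat, score in zip(CATEGORIES, scores) if score < threshold]
    return {"mean": round(mean(scores), 2), "min": min(scores), "below_threshold": low}


summary = {part: summarise(scores) for part, scores in AGREEMENT.items()}
for part, stats in summary.items():
    print(part, stats)
```

Under these assumptions the summary shows which categories fall short in each round, so they can be inspected individually.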
References
Adamzik, K. (1995). Textsorten - Texttypologie. Eine
kommentierte Bibliographie. Nodus, Münster.
Baroni, M., Bernardini, S., Ferraresi, A., and Zanchetta,
E. (2009). The WaCky wide web: a collection of very
large linguistically processed web-crawled corpora.
Language Resources and Evaluation, 43(3):209-226.
Biber, D. (1988). Variation Across Speech and Writing.
Cambridge University Press.
Forsyth, R. and Sharoff, S. (2014). Document
dissimilarity within and across languages: a
benchmarking study. Literary and Linguistic
Computing, 29:6-22.
Halliday, M. (1992). Language as system and language as
instance: The corpus as a theoretical construct. In
Svartvik, J., editor, Directions in corpus linguistics:
proceedings of Nobel Symposium 82 Stockholm,
volume 65, pages 61-77. Walter de Gruyter.
Kaufman, L., and Rousseeuw, P. J. (2009). Finding
groups in data: an introduction to cluster analysis,
John Wiley & Sons.
Krippendorff, K. (2004). Reliability in content analysis:
Some common misconceptions and recommendations.
Human Communication Research, 30(3).
Kučera, H. and Francis, W. N. (1967). Computational
analysis of present-day American English. Brown
University Press, Providence.
Lee, D. (2001). Genres, registers, text types, domains, and
styles: clarifying the concepts and navigating a path
through the BNC jungle. Language Learning and
Technology, 5(3):37-72.
Santini, M., Mehler, A., and Sharoff, S. (2010). Riding
the rough waves of genre on the web. In Mehler, A.,
Sharoff, S., and Santini, M., editors, Genres on the
Web: Computational Models and Empirical Studies.
Springer, Berlin/New York.
Corpus
Introduction
                         Number of articles   Number of words
seongjang (growth)       7,879                3,242,207
baljeon (development)    8,515                3,535,063
bokji (welfare)          4,996                1,861,783
bunbae (distribution)    608                  365,321
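
The four sub-corpora above differ considerably in size, so a per-article figure can help when comparing them. The Python sketch below derives mean article length from the reported counts; this derived measure is an illustration and is not a statistic reported in the abstract.

```python
# (number of articles, number of words) per search term, copied from the table above.
CORPUS = {
    "seongjang (growth)": (7879, 3242207),
    "baljeon (development)": (8515, 3535063),
    "bokji (welfare)": (4996, 1861783),
    "bunbae (distribution)": (608, 365321),
}


def words_per_article(articles: int, words: int) -> float:
    """Mean article length in words."""
    return words / articles


lengths = {term: words_per_article(*counts) for term, counts in CORPUS.items()}
longest = max(lengths, key=lengths.get)

for term, length in lengths.items():
    print(f"{term}: {length:.1f} words per article")
```

On these counts the smallest sub-corpus has the longest articles on average, which is worth bearing in mind when raw frequencies are compared across the four terms.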
Conclusion
References
Anthony, L. 2014. AntConc (Version 3.4.3w) [Computer
Software]. Tokyo, Japan: Waseda University.
Available
from
http://www.laurenceanthony.net/software/antconc/.
Baker, P. 2006. Using Corpora in Discourse Analysis.
London: Continuum.
Baker, P., Gabrielatos, C., Khosravinik, M.,
Krzyzanowski, M., McEnery, T. and Wodak, R. 2008.
"A useful methodological synergy? Combining critical
discourse analysis and corpus linguistics to examine
discourses of refugees and asylum seekers in the UK
press". Discourse and Society 19(3), 273-306.
malliday@gmail.com
Introduction
Method
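TR (122 sentences)
Relation  | COH f | COH % | INTERD f | INTERD %
+         |       | 1.63  | -        |
=         | 37    | 30.32 | 13       | 10.65
X         | 30    | 24.59 | 47       | 38.52
items     | 69    | 56.55 | 60       | 49.18
sentences | 122   | 100   | 122      | 100

EN (58 sentences)
Relation  | COH f | COH % | COH exemplars | INTERD f | INTERD % | INTERD exemplars
+         |       |       |               |          | 1.72     | which (1)
=         |       | 1.72  | but (1)       | 15       | 25.86    |
X         |       | 1.72  | otherwise (1) | 18       | 31.03    |
items     |       | 3.44  |               | 34       | 58.62    |
sentences | 58    | 100   |               | 58       | 100      |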
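TR (46 sentences)
Relation  | COH f | COH % | COH exemplars                     | INTERD f | INTERD % | INTERD exemplars
+         | 1     | 2.17  | bunun yanı sıra (1)               | 0        | 0        |
=         | 7     | 15.21 | da (6), bile (1)                  | 10       | 21.73    | ve (3), ancak (1), yüksel-ErEk (5)
X         | 7     | 15.21 | daha sonra (3), bunun üzerine (1) | 6        | 13.04    | için (3), iken (2), -mEk üzere (1)
items     | 15    | 32.60 |                                   | 16       | 34.78    |
sentences | 46    | 100   |                                   | 46       | 100      |

EN (55 sentences)
Relation  | COH f | COH % | COH exemplars         | INTERD f | INTERD % | INTERD exemplars
+         | 0     | 0     |                       | 9        | 16.36    | where (4), consider-ed (5)
=         | 3     | 5.45  | but (1), however (2)  | 22       | 40       | and (13), but (6), or (2)
X         | 4     | 7.27  | then (1), finally (1) | 20       |          |
items     | 7     | 12.72 |                       | 51       | 92.72    |
sentences | 55    | 100   |                       | 55       | 100      |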
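TR (77 sentences)
Relation  | COH f | COH % | COH exemplars                       | INTERD f | INTERD % | INTERD exemplars
+         | 5     | 6.49  | söz gelimi (3), hatta (2)           | 0        | 0        |
=         | 19    | 24.67 | ne var ki (3), benzer biçimde (1)   | 20       | 25.97    | olsun...olsun (1), -mAk yerine (1)
X         | 22    | 28.57 | dolayısıyla (1), başta (2), sonra (1) | 21     | 27.27    | için (8), -Ir gibi (1), -sE dE (4)
items     | 46    | 59.74 |                                     | 41       | 53.24    |
sentences | 77    | 100   |                                     | 77       | 100      |

EN (46 sentences)
Relation  | COH f | COH % | COH exemplars                      | INTERD f | INTERD % | INTERD exemplars
+         | 0     | 0     |                                    | 8        | 17.39    | which (1), resembl-ing (7)
=         | 6     | 13.04 | actually (1), also (1), though (1) | 11       | 23.91    | and (9), instead of (1)
X         | 7     | 15.21 | therefore (2), then (1)            | 16       | 34.78    | to save (8), only when (1), so (2)
items     | 13    | 28.26 |                                    | 35       | 76.08    |
sentences | 46    | 100   |                                    | 46       | 100      |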
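
The percentages in the COH and INTERD frequency tables above appear to be each frequency divided by the number of sentences, cut off (not rounded) after two decimals. That reading is an inference from the figures, not something the abstract states. A minimal Python sketch of the check, with an illustrative function name:

```python
def truncated_percentage(frequency: int, sentences: int) -> float:
    """Percentage of sentences, truncated (not rounded) to two decimal places."""
    # Integer arithmetic avoids floating-point surprises before the final division.
    return (frequency * 10000 // sentences) / 100


# Frequencies 5, 19, 22 and 46 out of 77 sentences, as reported for the Turkish COH counts.
reported = {5: 6.49, 19: 24.67, 22: 28.57, 46: 59.74}
for frequency, percent in reported.items():
    print(frequency, truncated_percentage(frequency, 77), percent)

# Rounding would give a different figure for 7 out of 46 sentences than the reported 15.21.
print(truncated_percentage(7, 46), round(100 * 7 / 46, 2))
```

Truncation explains why some reported values sit slightly below what conventional rounding would give.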
Conclusion
McNamara 2007).
References
Bloor, T. and Bloor, M. 1995. The functional analysis of
English. New York: Oxford University Press Inc.
Chapman, L. J. 1982. A study in reading development: A
comparison of the ability of 8, 10 and 13 year old
children to perceive cohesion in their school texts.
The Annual Meeting of the United Kingdom Reading
Association, 19-23.07.1982, Newcastle upon Tyne,
England.
Gillian Smith
Lancaster University
g.smith6@lancaster.ac.uk
Introduction
Literature Review
Methodology
Results
Conclusion
References
Bilić, B. & Georgaca, E. (2007). Representations of
Mental Illness in Serbian Newspapers: A Critical
Discourse Analysis. Qualitative Research in
Psychology, 4(1-2), 167-186.
Bloor, M. & Bloor, T. (2007). The Practice of Critical
Discourse Analysis. London: Hodder Education.
Byrne, P. (2000). Stigma of mental illness and ways of
diminishing it. Advances in Psychiatric Treatment, 6,
65-72.
Coverdale, J., Nairn, R., and Claasen, D. (2002).
Depictions of mental illness in print media: a
prospective national sample. Australian and New
Zealand Journal of Psychiatry, 36, 697-700.
Hallam, A. (2002). Media influences on mental health
policy: long-term effects of the Clunis and Silcock
cases. International Review of Psychiatry, 14, 26-33.
Harper, S. (2009). Madness, power and the media: class,
gender and race in popular representations of mental
distress. Basingstoke: Palgrave.
Nairn, R., Coverdale, J., & Claasen, D. (2001). From
source material to news story in New Zealand print
media: a prospective study of the stigmatizing
processes in depicting mental illness. Australian and
New Zealand Journal of Psychiatry, 35, 654-659.
Nawková, L., Nawka, A., Adámková, T., Rukavina, T.V.,
Holcnerová, P., Kuzman, M.R., ... Raboch, J. (2012).
The picture of mental health/illness in the printed
media in three Central European countries. Journal of
Health Communication, 17(1), 22-40.
Nexis: News Search. (2014). Retrieved from
http://www.lexisnexis.com/uk/nexis/search/loadForm.do?formID=GB01NBSimplSrch&random0.7450459500123497.
Date accessed: 20th October 2014.
Olstead, R. (2002). Contesting the text: Canadian media
depictions of the conflation of mental illness and
criminality. Sociology of Health and Illness, 24, 621-643.
Stuart, H. (2003). Stigma and daily news: evaluation of a
newspaper intervention. Canadian Journal of
Psychiatry, 48, 651-656.
A Multi-Dimensional Comparison of
Oral Proficiency Interviews to
Conversation, Academic and
Professional Spoken Registers
Shelley Staples
Purdue University
slstaples@purdue.edu
Jesse Egbert
Brigham Young University
Jesse_Egbert@byu.edu
Geoffrey T. LaFlair
University of Kentucky
gtl7@nau.edu
Introduction
Methods
Conclusion
Acknowledgements
This research was partially funded by a Cambridge
Michigan Language Assessment SPAAN grant.
References
Al Surmi, M. 2012. Authenticity and TV shows: A
multidimensional analysis perspective. TESOL
Quarterly, 46 (4), 671-694.
Biber, D. 1988. Variation across speech and writing.
Cambridge: CUP.
Biber, D. 2006. University language: A corpus-based
study of spoken and written registers. Philadelphia,
PA: John Benjamins Publishing.
Biber, D., Johansson, S., Leech, G., Conrad, S., and
Finegan, E. 1999. The Longman grammar of spoken
and written English. London: Pearson.
Friginal, E. 2009. The language of outsourced call
centers: a corpus-based study of cross-cultural
interaction. Amsterdam: John Benjamins.
Kane, M.T. 2013. Validating the interpretations and uses
of test scores. Journal of Educational Measurement, 50
(1), 1-73.
Kasper, G. and Ross, S.J. 2007. Multiple questions in oral
proficiency interviews. Journal of Pragmatics, 39,
2045-2070.
LaFlair, G., Egbert, J., and Staples, S. 2014, September.
Comparing oral proficiency interviews to academic
and professional spoken registers. Paper presented at
the meeting of AACL, Flagstaff, AZ.
LaFlair, G., Staples, S. and Egbert, J. 2015. Variability in
Cecilia Lazzeretti
University of Modena and Reggio Emilia
anna.stermieri@unimore.it
cecilia.lazzeretti@unimore.it
References
Baena, R. and Byker, C. 2014. Dialects of nostalgia:
Downton Abbey and English identity. National
Identities, (ahead of print) 1-11.
Bednarek, M. 2012. "Constructing 'Nerdiness':
Characterisation in The Big Bang Theory".
Multilingua: Journal of Cross-Cultural and
Interlanguage Communication 31 (2): 199-229.
Bednarek, M. 2011. "Expressivity and televisual
characterization". Language and Literature 20(1): 3-21.
Culpeper, J. 2001. Language and characterisation: people
in plays and other texts. London: Longman.
Chiaro, D. 2000. "The British will use tag questions,
won't they? The case of Four Weddings and a
Funeral". Tradurre il Cinema. Trieste: Università degli
Studi di Trieste: 27-39.
Lakoff, R. 1975. Language and Woman's Place. New
York: Harper & Row.
Partington, A., Duguid, A. and Taylor, C. 2013. Patterns
and meanings in discourse. Theory and Practice in
Corpus-Assisted Discourse Studies (CADS).
Amsterdam: John Benjamins.
Weintraub, W. 1989. Verbal Behavior in Everyday
Life. New York: Springer.
Weintraub, W. 2003. Verbal Behaviour and Personality
Assessment. In Post, J.M. (ed) The Psychological
Assessment of Political leaders, The University of
Michigan Press: University of Michigan: 137-153.
p.supanfai1@lancaster.ac.uk
Introduction
Method
Results
Collocate types/tokens
             Positive   Negative   Neutral
/kreecay/    2/35       5/91       9/142
/khykt/      8/316      44/1,744   20/1,587
/chp/        4/61       26/456     34/2,315
Table 1: The results of the collocate analysis
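
One way to read Table 1 is to convert the token counts into shares per evaluative category, so that the three words can be compared despite their different frequencies. The Python sketch below does this; the share measure is an illustration rather than the authors' own statistic, and the word labels are copied as they appear in the table.

```python
# (types, tokens) per evaluative category, copied from Table 1.
COLLOCATES = {
    "/kreecay/": {"positive": (2, 35), "negative": (5, 91), "neutral": (9, 142)},
    "/khykt/": {"positive": (8, 316), "negative": (44, 1744), "neutral": (20, 1587)},
    "/chp/": {"positive": (4, 61), "negative": (26, 456), "neutral": (34, 2315)},
}


def token_shares(counts):
    """Share of collocate tokens falling into each evaluative category."""
    total = sum(tokens for _types, tokens in counts.values())
    return {label: tokens / total for label, (_types, tokens) in counts.items()}


shares = {word: token_shares(counts) for word, counts in COLLOCATES.items()}
for word, by_label in shares.items():
    print(word, {label: round(share, 3) for label, share in by_label.items()})
```

A high negative share would be consistent with a negative semantic prosody, although the concordance analysis described in the text is still needed to interpret individual patterns.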
The concordance analysis, under Sinclair's approach,
shows that in many cases, /kreecay/ is used on its
own, that is, with only the colligations that are
common to all verbs and without any easily
classifiable pragmatic function beyond the core
literal meaning of '(be) considerate (of)'. However,
/kreecay/ also appears in some fixed patterns of
consistent combinations of collocation, colligation,
and semantic preference, which we can describe as
extended units of meaning in Sinclair's sense, as
follows:
/c/, /yk/, or /yk c/ + [verb group] +
([object/adverb]) + (/t/) + /k kreecay/ +
([person])
This unit has a pragmatic function/ semantic
prosody of refraining from performing an
action due to consideration for someone.
[complete sentence-unit expressing
imposition of hearer on speaker] + /my t
kreecay/ + /n/, /ly/, or /rk/
The unit has a pragmatic function/ semantic
prosody of reduction of imposition:
specifically, the speaker asserts to the hearer
that the previously described imposition is
not, in fact, an imposition on them.
[action inconsiderate to another] + /ya/,
/bp/, or /dooy/ + /my kreecay/ +
([person])
The unit has a pragmatic function/ semantic
prosody that expresses disapproval of
behaviour.
Similarly to /kreecay/, /khykt/ and /chp/ are
both used independently, with very general
colligations and semantic preferences but no clear
extended pragmatic function, but they also appear in
some patterns that can legitimately be considered
extended lexical units.
Acknowledgements
References
Aroonmanakun, W. 2007. Creating the Thai National
Corpus, MANUSYA: Journal of Humanities 13: 4-17.
Bednarek, M. 2008. Semantic preference and semantic
prosody re-examined, Corpus Linguistics and
Linguistic Theory 4 (2): 119-39.
Ebeling, S. O. 2014. Cross-linguistic semantic prosody:
the case of commit, signs of and utterly and their
Norwegian correspondences, Oslo Studies in
Language 6 (1): 161-79.
Hardie, A. (forthcoming). A dual sort-and-filter strategy for
statistical analysis of collocation, keywords, and
lockwords.
Hunston, S. 2007. Semantic prosody revisited,
International Journal of Corpus Linguistics 12 (2):
249-68.
Louw, W. E. 1993. Irony in the text or insincerity in the
writer? The diagnostic potential of semantic
prosodies, in M. Baker, G. Francis and E. Tognini-Bonelli
(eds.) Text and Technology: In Honour of John
Sinclair, pp. 157-76. Amsterdam: John Benjamins.
Partington, A. 1998. Patterns and Meaning: Using
Corpora for English Language Research and
Teaching. Amsterdam/Philadelphia: John Benjamins.
Partington, A. 2004. Utterly content in each other's
company: semantic prosody and semantic preference,
International Journal of Corpus Linguistics 9 (1): 131-56.
Partington, A. 2014. Evaluative prosody, in Aijmer, K.
and Rühlemann, C. (eds.) A Handbook of Corpus
Pragmatics, pp. 279-303. Cambridge University Press.
Sinclair, J. 2004. Trust the Text: Language, Corpus and
Yukio Tono
Tokyo University of
Foreign Studies
takahashi.yuka.m0@tufs.ac.jp
y.tono@tufs.ac.jp
Introduction
                  Mode   Samples                             Corpus size (sample n)
                  WR     Junior & senior high (all grades)   669,304 (10,038)
NICT JLE Corpus   SP     Adult                               2,000,000 (1,281)
Table 1. Learner corpora used in the study
Results
References
Comrie, B. and Keenan, E. 1979. Noun phrase
accessibility revisited. Language 55: 649-664.
Ellis, R. 2008. The Study of Second Language Acquisition.
Oxford: Oxford University Press.
Hamilton, R. 1994. Is implicational generalization
unidirectional and maximal? Evidence from
relativization instruction in a second language.
v.tantucci@lancaster.ac.uk
Introduction
Aspectual discontinuity and evidentiality: Diachronic evidence
The earliest usages of V-guo as an experiential
perfect, during the Táng dynasty (618-907 A.D.),
are attested to be limited to its co-occurrence with
animate subjects, mental verbs or verbs profiling the
syntactic subject's personal experience in the past
(cf. Cao 1995). However, Tantucci (2013: 224-225,
2015) observes that during the Qīng dynasty
(1636-1912 A.D.) V-guo underwent a further
stage of semantic and grammatical reanalysis, as it
started to co-occur with dummy subjects, in
subjectless or impersonal constructions with a new
interpersonal evidential (IE) meaning. More
specifically, when functioning as an IE, V-guo is
no longer employed as an aspectual marker of past
experience, but is rather used to problematize the
reliability of P as a piece of knowledge shared by the
SP/W together with a general 3rd party in society,
paraphrasable as: 'it is known that P'.
I argue that the semasiological shift from past
experience to shared knowledge is triggered precisely
by the inherently discontinuous aspectual
structure of V-guo, which, depending on the
context, the textual environment and the degree of
grammaticalization of guo as a particle, functions
as a bridging element from a mere aspectual to a
new evidential reading. Compare the two examples
below:
pragmatics of V-guo and specific text types
References
At that time, the river still used to flow.
(Moyse-Faurie 1993: 210)
Charlotte Taylor
University of Sussex / Lancaster University
charlotte.taylor@sussex.ac.uk
Introduction
Mock politeness
Conclusions
References
Baker, P. 2014. Using Corpora to Analyze Gender.
London & New York: Bloomsbury.
Colston, H. L., & Lee, S. Y. 2004. Gender differences in
verbal irony use. Metaphor and Symbol, 19(4), 289-306.
Culpeper, J. 2011. Impoliteness: Using Language to
Cause Offence. Cambridge University Press.
Dress, M.L., R.J. Kreuz, K.E. Link and G.M. Caucci.
2008. Regional Variation in the Use of Sarcasm.
Journal of Language and Social Psychology 27: 71.
Gibbs, R. W. 2000. Irony in talk among friends.
Metaphor & Symbol 15 (1&2): 5-27.
Gullick, D. & Lancaster University 2010. Collocational
Network Explorer (CONE). Available from
https://code.google.com/p/collocation-networkexplorer/
Haugh, M. 2014. Jocular mockery as interactional
practice in everyday Anglo-Australian conversation.
Australian Journal of Linguistics 34, 1: 76-99.
Ivanko, S. L., Pexman, P. M., & Olineck, K. M. 2004.
How sarcastic are you? Individual differences and
verbal irony. Journal of Language and Social
References
Baker, P. 2006. Using Corpora in Discourse Analysis.
London/New York: Continuum.
Bednarek, M. and Caple, H. 2012a. News Discourse.
London/New York: Continuum.
Bednarek, M. and Caple, H. 2012b. 'Value added':
Language, image and news values. Discourse,
Context, Media 1: 103-113.
Bednarek, M. and Caple, H. 2014. Why do news values
matter? Towards a new methodological framework for
analyzing news discourse in Critical Discourse
Analysis and beyond. Discourse & Society 25 (2): 135-158.
Bell, A. 1991. The Language of News Media. Oxford:
Blackwell.
Bouvier, G. 2012. How Facebook users select identity
categories for self-presentation. Journal of
Multicultural Discourses 7 (1): 37-57.
Boyd, D. and Ellison, N. 2008. Social Network Sites:
Definition, History, and Scholarship. Journal of
Computer-Mediated Communication 13 (1): 210-230.
DeAndrea, D. C., Shaw, A. S., and Levine, T. R. 2010.
Online language: The role of culture in selfexpression and self-construal on Facebook. Journal of
Language and Social Psychology 29 (4): 425-442.
Dwyer, T. 2010. Media Convergence. Maidenhead:
University Open Press.
Ellison, N., Steinfield, C. and Lampe, C. 2007. The
Benefits of Facebook "Friends:" Social Capital and
College Students' Use of Online Social Networking
Sites. Journal of Computer-Mediated Communication
12 (4), 1143-1168.
Fowler, R. 1991. Language in the News: Discourse and
Ideology in the Press. London/New York: Routledge.
Galtung, J. & Ruge, M. 1965. The structure of foreign
news: The presentation of the Congo, Cuba and Cyprus
crises in four Norwegian newspapers. Journal of
Peace Research 2 (1): 64- 90.
Gries,
St.
Th.
2008.
Dispersions
and
adjusted
331
Yukio Tono
Tokyo University of Foreign Studies
y.tono@tufs.ac.jp
Background
CEFRlevel
A1 to C2
Textbook
Corpus
Skills
Corpus
size
2,800,000
(95 titles)
All skills
Mode
Samples
WR
Junior &
senior
high (all
grades)
Adult
SP
WR/SP
Junior
High 3rd
Senior
High 1-3
Corpus size
(sample n)
670,000
(10,036)
2,000,000
(1,281)
100,000
(2,000)
2,500,000
(30,000)
GTECfS
WR
Writing
Corpus
Table 2: Learner corpora used for the study
Feature extraction
Machine learning
1-1
1-2
1-3
1-4
1-5
1-6
1-10
1-11
1-12
1-13
1-14
1-15
1-16
1-17
1-18
1-19
1-20
Useful attributes
A1 vs. A2
A1:
A2:
A1 vs B1
A1: It is PP
B1: be able to; how NP+VP;
Present perfect continuous
A1 vs B2
A1:
B2:
He [She] is NP;
He [She] is ADJP
how NP+VP
V + it + ADJ to do
References
Aleksandar Trklja
University of Exeter
1
Introduction
Research aims
Findings
concurrentielle>.
Formulaic expressions can be found in texts from
both CJEU and REF, but CJEU judgments have a
higher degree of formulaicity. For example, on
average 46% of the text of CJEU judgments in
English consists of repeated expressions that are
at least five words long. The figure for UK
judgments is 37% and for Irish judgments 39%. In
the German version of these CJEU judgments,
repeated expressions make up on average 37% of
the text, compared with 33% and 23% in judgments
produced by the German and Austrian Constitutional
Courts respectively. Repeated expressions also tend
to be longer on average in CJEU judgments (60 words
in the English, French and German versions of CJEU
judgments, against 30 words in the REF corpus).
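A figure such as "46% of the text consists of repeated expressions at least five words long" could be approximated along the following lines. This is only a minimal sketch: the function name, the whitespace tokenisation and the choice to count every token that falls inside a 5-gram occurring more than once in the collection are assumptions made for illustration, not the procedure used in the study.

```python
from collections import Counter


def repeated_ngram_coverage(docs, n=5):
    """Share of tokens lying inside an n-gram that occurs more than once.

    Assumption: tokens are whitespace-separated and lower-cased; longer
    repeated stretches are captured as overlapping n-grams.
    """
    tokenised = [d.lower().split() for d in docs]

    # Count every n-gram across the whole collection.
    counts = Counter(
        tuple(toks[i:i + n])
        for toks in tokenised
        for i in range(len(toks) - n + 1)
    )

    covered = 0
    total = 0
    for toks in tokenised:
        mask = [False] * len(toks)
        for i in range(len(toks) - n + 1):
            if counts[tuple(toks[i:i + n])] > 1:
                # Mark all tokens of the repeated n-gram as covered.
                for j in range(i, i + n):
                    mask[j] = True
        covered += sum(mask)
        total += len(toks)

    return covered / total if total else 0.0


if __name__ == "__main__":
    sample = [
        "it must be borne in mind that x",
        "it must be borne in mind that y",
    ]
    print(repeated_ngram_coverage(sample))
```

Marking tokens rather than counting n-grams avoids double counting where repeated n-grams overlap, so the result can be read directly as a proportion of running text.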
In addition to this direct type of repetition, there are
also semantic repetitions realized through
synonym-like expressions. These phrases are
identified in a parallel corpus of CJEU judgments.
We assume that items from language A that
correspond to items from language B and are used
in the same context have the same function. Thus, in
our context we find that <it must be borne in mind>,
<it must be recalled>, <it must be pointed out>, <it
should be noted> and <it should be observed> are
interchangeable and therefore have the same
function, because they meet the above criteria. The
distinction usually made in the literature
(e.g. Coates, 1983) between a stronger (<must>) and
a weaker (<should>) notion of necessity is ignored here.
Similarly, according to the dictionaries consulted
and the results yielded by the Sketch Engine tools
Thesaurus and Sketch-Diff, the verbs <bear in mind>,
<recall>, <point out>, <note> and <observe> do not
generally tend to be used as synonyms in English.
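The correspondence criterion described above can be sketched as a grouping step over aligned phrase pairs. The sketch below is hypothetical: the function name is invented, the French counterpart in the usage example is made-up toy data, and it covers only the "corresponds to the same item in language B" criterion, not the check that the expressions occur in the same context.

```python
from collections import defaultdict


def equivalence_sets(aligned_pairs):
    """Group language-A expressions that share a language-B counterpart.

    aligned_pairs: iterable of (expression_in_A, expression_in_B) tuples,
    assumed to come from an already aligned parallel corpus.
    Returns {expression_in_B: sorted list of A expressions}, keeping only
    counterparts that are shared by at least two A expressions.
    """
    by_target = defaultdict(set)
    for source, target in aligned_pairs:
        by_target[target].add(source)
    return {t: sorted(s) for t, s in by_target.items() if len(s) > 1}


if __name__ == "__main__":
    # Toy data for illustration only.
    pairs = [
        ("it must be recalled", "il convient de rappeler"),
        ("it should be noted", "il convient de rappeler"),
        ("the court", "la cour"),
    ]
    print(equivalence_sets(pairs))
```

Each resulting group is only a candidate set of functionally equivalent expressions; whether its members really share a context would still have to be checked in the concordance lines.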
Conclusion
References
Martin Warren
Hong Kong Polytechnic University
martin.warren@polyu.edu.hk
Introduction
Data
Findings
Implications
Acknowledgements
The research described in this paper was
substantially supported by a grant from the Research
Grants Council of the Hong Kong Special
Administrative Region (Project No. PolyU
5440/13H).
Background
ready makes the lexica and some of the other resources editable, will be adopted, but a further aim is
to improve these options by also exposing more of
the underlying grammar and pragmatic inferencing
model, even to linguists with relatively little computational experience. This should enable them to create suitably
customised large-scale corpora of pragmatically
annotated written language, and to analyse them
on a variety of levels, in a simple and
efficient manner.
Maria Lehl
Tonguesten
valentin.werner@uni-bamberg.de
maria@tonguesten.com
Introduction
Stylistic analysis
106
http://www.rebeats.tv
NLP annotation
https://gate.ac.uk/wiki/twitter-postagger.html
Integrating corpus linguistics and language learning
Mode
Modality
Act type
Audio act
Audio
Silence
Verbal
Text
chat
Kinesics
Communicative
gestures
Kinesics
Mimics
Kinesics
Extracommunicative
gestures
Coverbal
Nonverbal
Explanation
Verbal act in the full duplex audio channel
Interval between two audio acts greater than three seconds
Message entered into the text chat window
Gestures seen in the webcam recordings (iconic, metaphoric, deictic, beat, emblem, communicative action)
Facial expressions seen in the webcam recordings and their functions (e.g. surprise, happiness, incomprehension)
For example, scratching forehead, pushing hair behind ear, playing with pen.
Perspectives
Acknowledgements
This research was supported by the Ulysses
programme funded jointly by the Irish Research
Councils and the French Ministry of Foreign Affairs.
Research questions
Method
Implications
Rachel Wyman
King's College London
rachel.wyman@kcl.ac.uk
Introduction
[Figure 1: diagram with the labels GOAL, CIRCUMSTANCES, MEANS-GOAL and VALUES (adapted from Fairclough & Fairclough 2012: 48)]
The way in which the speaker represents the nation's
current circumstances and values enters into his/her
claim. However, these representations are not
always accurate. False representations may enter
into discourse, resulting in flawed narratives based
on arguments that do not stand up to critical
evaluation. Yet they still form the premises for
arguments about how to respond to political
problems with action. For example, Obama's
argument for Wall Street reform depends on his
depiction of Wall Street as responsible for causing
the 2008 financial crisis: Wall Street reform will
result in the return to a fair system. However, the
U.S. government regulates the financial system and is
therefore implicated in its reckless behavior.
Yet Obama's narrative succeeded and the Wall
Street Reform Bill was passed, demonstrating how
Research questions
Jiajin Xu
Beijing Foreign Studies University
xujiajin@bfsu.edu.cn
Maocheng Liang
Beijing Foreign Studies University
liangmaocheng@bfsu.edu.cn
The experiment
Marco Polo    Crown B-C    t score    p value
3.77          3.75         .56        .57
6.25          6.17         .89        .38
1.58          1.70         -6.25      .00
1.71          1.97         -8.53      .00
571.32        563.89       7.242      .00
370.27        378.32       -2.79      .01
405.50        411.83       -2.41      .02
430.51        428.15       1.37       .17
xuhai1101@gdufs.edu.cn
Richard Xiao
Lancaster University
r.xiao@lancaster.ac.uk
Vaclav Brezina
Lancaster University
v.brezina@lancaster.ac.uk
Introduction
Publicly available Chinese learner corpora
108
http://202.112.195.192:8060/hsk/login.asp
Written Corpus:
http://www.globalhuayu.com/corpus3/Search.aspx ; Spoken
Corpus: http://www.globalhuayu.com/corpus5/Default.aspx
110
HSK is a Chinese proficiency public test, similar to IELTS or
TOEFL.
109
Acknowledgements
This project was supported by a grant from the British
Academy International Partnership & Mobility 2013
Scheme.
Background
This Study
Applications
Selinker (1971) proposed that interlanguage is
systematic and dynamic throughout the stages of
second language acquisition. In other words, the
interlanguage which the learner has constructed is
portrayed as an internally consistent system, and the
process of development from one stage to the next is
ordered and regular (Ellis 1985: 118). It has become
increasingly acknowledged, however, that
interlanguage is also variable. It is believed that the
variation in a learner's language can be a part of the
learning process as well as a result of contextual
alterations. Systematicity and variability are two
reconcilable features of learner language.
Contextual variability, the second type of
variability identified in interlanguage, is evident
when the language user varies his use of linguistic
forms according to the linguistic environment. A
full account of the situational and contextual
variability in interlanguage therefore requires
studying how the linguistic environment constrains
the operation of interlanguage rules at different
stages of development in different contexts. The
interlanguage of learner productions in various
societies, as a linguistic system in its own right,
therefore offers a valuable resource for studying how
learners vary systematically owing to the linguistic,
situational and psycholinguistic factors that are
imposed on them in different ethnic groups. This is
worth attempting given the advent of large corpora of
learner written texts and exponentially increasing
computing power. Hundt and Mukherjee (2011: 2)
argued that it is high time learner Englishes and
second-language varieties are described and
compared on an empirical basis in order to draw
conceptual and theoretical conclusions with regard
to their form, function and acquisition. They note
that such descriptive studies and comparisons were
not possible on a large scale 20 years ago, as the
relevant ESL (e.g. the International Corpus of
English, ICE) and EFL (e.g. the International Corpus
of Learner English, ICLE) computerized corpora
have only become available in recent decades. With
this purpose in mind, the researcher compares
corpora of written English texts by Chinese English
learners and learners from 11 other countries (i.e.,
References
Biewer, C. (2011). Modal auxiliaries in second language
varieties of English: A learner's perspective. In J.
Mukherjee & M. Hundt (Eds.), Exploring Second-Language Varieties of English and Learner Englishes:
Bridging a Paradigm Gap. Amsterdam: John
Benjamins Publishing, pp. 7-33.
Ellis, R. (1985). Sources of variability in interlanguage.
Applied Linguistics, 6(2): 118-31.
Gilquin, G. & Granger, S. (2011). From EFL to ESL:
Evidence from the International Corpus of Learner
English. In J. Mukherjee & M. Hundt (Eds.), Exploring
Second-Language Varieties of English and Learner
Englishes: Bridging a Paradigm Gap. Amsterdam:
John Benjamins Publishing, pp. 55-78.
Mukherjee, J. & Hundt, M. (Eds.) (2011). Exploring Second-Language Varieties of English and Learner Englishes:
Bridging a Paradigm Gap. Amsterdam: John
Benjamins Publishing.
Nicole Ziegler
University of Hawaii at Manoa
nziegler@hawaii.edu
Acknowledgements
References
del, A. and Erman, B. 2012. Recurrent word
combinations in academic writing by native and nonnative speakers of English: A lexical bundles
approach. English for Specific Purposes 31: 81-92.
Biber, D. and Barbieri, F. 2007. Lexical bundles in
university spoken and written registers. English for
Specific Purposes 26: 263-286.
Biber, D., and Conrad, S. 1999. Lexical bundles in
conversation and academic prose. In H. Hasselgard
and S. Oksefjell (eds.), Out of Corpora. Studies in
honour of Stig Johansson. Amsterdam: Rodopi.
Biber, D., Conrad, S., and Cortes, V. 2004. If you look
atLexical bundles in university teaching and
359
360
Posters
Methodology
Research Questions
Eric Atwell
University of Leeds
scmmal@leeds.ac.uk
E.S.Atwell@leeds.ac.uk
Amália Mendes
Centro de Linguística da Universidade de Lisboa
sandra.antunes@clul.ul.pt
amalia.mendes@clul.ul.pt
Introduction
Corpus constitution
Inf.    Age    Texts    Words      Proficiency
129     22     323      57.377     Int. (34%)
65      24     142      21.610     Elem. (41%)
52      29     139      21.200     Elem. (57%)
246     25     604      100.195    ------
111
http://www.clul.ul.pt/en/research-teams/547
Data analysis
Conclusion
[Figure: bar chart; categories: South America, North America, Africa, Middle East, Asia, European, Local UK, Local IT; series: IT - Rai Uno, Rainews24; EN - BBC One; EU - Euronews]
Sub-corpora                         Comparable         Parallel
Components                          IT       EN        IT     EN
Num. of texts                       120      80        390    390
Text average length (approx.)116    8,000    6,000     250    250
Table 1: Multi-modal corpus final composition.
112
http://it.euronews.com/notizie/telegiornale/
Although the corpus only includes the English and Italian
versions, the channel provides rolling news in thirteen different
languages, making translation one of its flagship features.
115
E.g., off-screen narrating voices or visual support items such
as slides and pictures.
116
The average text lengths have been calculated on the word
count of a two-week transcribed sample.
117
With reference to speakers, area and topic of the reported
news and for audio and visual (or both) events that took place on
the screen.
114
Tokens
Search item f %
IT:
EN:
ucra* ukra*
BBC
5,888
60,117
RAI
11,754
96.321
Euronews
EN
3,123
12,522
1.15
1.22
1.6
Euronews 3,717
13,007 1,45
IT
Table 2: Types, tokens and ucra*/ukra* frequency
counts119 of the four data sets.
[Figure: bar chart; categories: Crime, Banks, Croatia, Belgium, Denmark, Bosnia, Poland, Hungary, Russia, Germany, Ukraine, France, Spain, Greece; series: IT - Rai Uno, Rainews24; EN - BBC One; EU - Euronews]
References
118
119
Software used: AntConc 3.4.3,
http://www.laurenceanthony.net/software.html
Keith Barrs
Hiroshima Shudo University
Keithbarrs@hotmail.com
Methodology
and non-catachrestic and (2) used within the sketch-diff function to analyse their behaviour. An
additional finding related to this issue was that many
distinct English words have become homographs in
Japanese due to differences in the phonologies of the
two languages (e.g. bath and bus are both
represented as バス, basu, in katakana). This creates
problems for the corpus analysis because it is very
difficult to isolate the different meanings
represented by one written form, which complicates the
interpretation of the data.
These findings suggest that in order to build a
sufficiently large database of loanwords for a large-scale
corpus-based study, it is necessary to start with a
number of candidates far beyond the target number for the
analysis. This is because of the current limitations of
corpus software in the analysis of homographic and
polysemous words. These limitations can greatly increase the
workload and timescale of the research, and are
therefore an important consideration
when conducting corpus-based studies of loanwords.
nbelkabe7@ub.edu
120
Introduction
http://ugriw.net
Structure
Functionality
Novel
Short story
Tale
Play (theatre)
Poem
Lyrics
Proverb
Newspaper article
Biography
Academic
Textbook
Magazine
Speech
Interview
Blog
Internet page
Comment
Year if date unknown
Man/Woman/Unknown
If text is translation
Written/Spoken
Linguistic area
Text title
if text is translation
if text is translation
spelling has been revised
in this text entry
Table 2: Metadata
A global lexicometry search displays a summary of
the contents of the corpus in terms of the number of
words for each genre of text, together with the
highest frequencies. For each genre of text, the
following are displayed:
- number of texts
- number of words
- percentage of the number of words with respect to the total number of words in the whole corpus
- the most frequent word, together with its frequency
- relative index of the frequency, expressed as a percentage value; that is to say, the proportion the word represents with respect to the total words in that type of text
- number of distinct words
- relative index of the number of distinct words, expressed as a percentage value, i.e.
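The per-genre summary listed above might be computed roughly as follows. This is a sketch under stated assumptions, not the tool's implementation: the function and key names are invented, tokens are taken to be whitespace-separated, and because the source breaks off before defining the relative index of distinct words, it is assumed here to be distinct words divided by the total words of the genre.

```python
from collections import Counter, defaultdict


def genre_summary(texts):
    """Summarise a corpus per genre.

    texts: iterable of (genre, text) pairs (hypothetical input format).
    Returns {genre: dict of the statistics listed in the abstract}.
    """
    tokens_by_genre = defaultdict(list)
    n_texts = Counter()
    for genre, text in texts:
        tokens_by_genre[genre].extend(text.lower().split())
        n_texts[genre] += 1

    corpus_total = sum(len(toks) for toks in tokens_by_genre.values())

    summary = {}
    for genre, toks in tokens_by_genre.items():
        freq = Counter(toks)
        top_word, top_freq = freq.most_common(1)[0] if freq else (None, 0)
        n = len(toks)
        summary[genre] = {
            "texts": n_texts[genre],
            "words": n,
            # Share of the whole corpus taken up by this genre.
            "pct_of_corpus": 100 * n / corpus_total if corpus_total else 0.0,
            "top_word": top_word,
            "top_freq": top_freq,
            # Share of the genre's words accounted for by its top word.
            "top_pct": 100 * top_freq / n if n else 0.0,
            "distinct": len(freq),
            # Assumed definition: distinct words / total words in the genre.
            "distinct_pct": 100 * len(freq) / n if n else 0.0,
        }
    return summary


if __name__ == "__main__":
    demo = [("Novel", "a b a"), ("Poem", "c")]
    for genre, stats in genre_summary(demo).items():
        print(genre, stats)
```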
375
References
Atkins et al. (1992) Corpus design criteria. In Literary &
Linguist Computing, 7 (1). Oxford: Oxford University
Press
Garside, R., Leech, G. N., and McEnery, T. (1997)
Corpus annotation: linguistic information from
computer text corpora. London: Longman
Greenberg, J. (1955) Studies in African Linguistic
Classification. New Haven
Hardie, A. (2012) CQPweb - combining power, flexibility
and usability in a corpus analysis tool. International
Journal of Corpus Linguistics 17 (3): 380409.
Mammeri, M. (1974) Tajerrumt n Tmazight (Grammar of
Tamazight). Alger: Bouchene.
McEnery, T. and Hardie, A. (2012) Corpus Linguistics:
Method, Theory and Practice. Cambridge Textbooks in
Linguistics. Cambridge:Cambridge University Press.
Sinclair, J. M. (1991) Corpus, Concordance, Collocation.
Oxford: Oxford University Press.
Prescriptive-descriptive disjuncture:
Rhetorical organisation of research
abstracts in information science
John Blake
Japan Advanced Institute
of Science and Technology
johnb@jaist.ac.jp
Introduction
Prescriptive advice
Aim
Method
Results
Discussion
Nina Horáková
The International
School of Prague
rbohat@isp.cz
nhorakova@isp.cz
Beata Rödlingová
The International
School of Prague
Róbert Bohát
The International
School of Prague
brodlingova@isp.cz
Introduction
Background
Conclusion
Christian Bentz
University of
Cambridge
apc38@cam.ac.uk
cb696@cam.ac.uk
Calbert Graham
University of
Cambridge
Paula Buttery
University of
Cambridge
crg29@cam.ac.uk
pjb48@cam.ac.uk
Overview
Crowdsourcing corpora
Recordings
Acknowledgements
This work has been funded by Cambridge English
Language Assessment, CrowdFlower and Crowdee.
We thank Tim Polzehl of Technische Universität
Berlin for his help in designing and publishing the
Crowdee jobs. We thank Wil Stevens of CrowdFlower
for his assistance with the
transcription/annotation jobs.
123
Further information at:
http://apc38.user.srcf.net/outreach/#crowd
124
http://sldr.org/ortolang-000913, http://sldr.org/ortolang000914
francesca.masini
@unibo.it
Malvina Nissim
University of Groningen
m.nissim@rug.nl
Introduction
Acknowledgements
This research is carried out within the CombiNet
project (Word Combinations in Italian), funded by
the Italian Ministry of Education, University and
Research.125
References
Atkins, B.T.S and Rundell, M. 2008. The Oxford Guide to
Practical Lexicography. Oxford: Oxford University
Press.
Baroni, M., Bernardini, S., Comastri, F., Piccioni, L.,
Volpi, A. and Aston, G. 2004. Introducing the La
125
http://combinet.humnet.unipi.it/.
Introduction
Research methodology
FB webpage
1st for
Immigration-UK
Visa Experts
Global Visa
Support
UK Visa and
Work Permit
USA Visa
Experience
Topic
No. of
words
852
Years
2,860
4,627
7,849
USA Visa
2,004
Experiences,
Questions and
Confessions
Tot.
17,192
Table1: Breakdown of The ELF WebIn Corpus.
Main findings
Conclusions
References
Anthony, L., 2014. AntConc 3.2.4w, Tokyo, Japan:
Waseda University. Available online at
http://www.antlab.sci.waseda.ac.jp/ .
(eds.) Cogo, A., Archibald, A. and J. Jenkins, 2011.
Latest trends in ELF research. Cambridge: Cambridge
Scholars Publishing.
Grabher, G., J. Maintz, 2006. Learning in personal
networks: collaborative knowledge production in
virtual forums. Working Papers Series, Centre on
Organizational Innovation, Columbia University: 1-12.
Jenkins, J., 2007. English as a Lingua Franca: Attitude
and Identity. New York: Oxford University Press.
Lee, C. K. M., (2002). Literacy practices in
computer-mediated communication in Hong Kong. The Reading
Matrix (2): 1-25.
Lin, H., L., Qiu, 2013. Two sites, two voices: linguistic
differences between Facebook status updates and
Tweets. Lecture Notes in Computer Science (8024):
432-440.
Maldonado, G. J., Mora, M., García, S., P., Edipo, 2001.
Personality, sex and computer-mediated
communication through the Internet. Anuario de
Psicología, Vol. 32 (2): 51-62.
Mauranen, A., 2007. Hybrid Voices: English as the
Lingua Franca of Academics. Language and
Discipline Perspectives on Academic Discourse, K.
Flottum (ed.), Newcastle, UK: Cambridge Scholars
Publishing: 243-59.
Pérez-Sabater, C., 2012. The Linguistics of Social
Networking: A Study of Writing Conventions on
Facebook. Available online at
http://www.linguistik-online.de/56_12/perez-sabater.html
Seidlhofer, B., 2001. Closing a conceptual gap: the case
for a description of English as a lingua franca.
Chen Lyu
Wuhan University
bochen@
whu.edu
lvchen1989@
whu.edu
Xiaohui Liang
Wuhan University
1504719992@qq.com
Introduction
Annotation with recursive directed graph
Conclusion
Acknowledgements
Supported by the National Natural Science
Foundation of China (61202193, 61202304), the
Major Projects of Chinese National Social Science
Foundation (11&ZD189), and the Chinese
Postdoctoral Science Foundation (2013M540593,
2014T70722).
Beom-Il Kang
Yonsei University
amancio.choi
@gmail.com
Kangbeomil
@gmail.com
linguistic processing, i.e. segmentation and POS-tagging,
of the texts extremely time-consuming and
labour-intensive. To resolve these problems, a
corpus processing pipeline has been developed in
Python and Java. It is composed of several
processing modules, from the initial data collection
through data standardisation to the final production of the
structured corpus database. In particular, separate
segmentation and tagging algorithms have been developed
for the Chinese character sequences and the Korean
character sequences, and the two sets of results
are then combined.
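The division of labour described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the Unicode ranges, the function names `split_by_script` and `process`, and the two segmenter callables are assumptions introduced purely for illustration.

```python
import re

# Assumed Unicode ranges (not taken from the source): CJK Unified Ideographs
# for Chinese characters; Hangul Syllables, Hangul Jamo and Hangul
# Compatibility Jamo for Korean characters.
HANJA = r"\u4e00-\u9fff"
HANGUL = r"\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f"

RUN_PATTERN = re.compile(
    rf"(?P<hanja>[{HANJA}]+)"
    rf"|(?P<hangul>[{HANGUL}]+)"
    rf"|(?P<other>[^{HANJA}{HANGUL}]+)"
)


def split_by_script(text):
    """Split text into (script, run) pairs, keeping the original order."""
    return [(m.lastgroup, m.group()) for m in RUN_PATTERN.finditer(text)]


def process(text, segment_hanja, segment_hangul):
    """Send each run to its own (hypothetical) segmenter and merge the output."""
    tokens = []
    for script, run in split_by_script(text):
        if script == "hanja":
            tokens.extend(segment_hanja(run))
        elif script == "hangul":
            tokens.extend(segment_hangul(run))
        else:
            tokens.append(run)  # punctuation, spaces, Latin text: passed through
    return tokens
```

Calling `split_by_script` on a mixed-script eojeol returns the Chinese-character and Korean-character runs separately, so each can be handled by its own segmentation and tagging routine before the results are recombined in order.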
For the Chinese character sequences, a lexicon-based
segmentation algorithm has been developed,
based on a lexicon list of more than 300,000
entries. This algorithm extracts all n-grams of the
Chinese characters and matches them against the
lexicon list. For Korean, the old Korean characters
and pre-modern orthographic variants have been
normalized into modern forms to improve the automatic
processing performance of the segmenter and tagger.
Instead of using a special tool like VARD2 (Baron
and Rayson 2009; Hendrickx and Marquilhas 2011),
a spelling normalization module has been developed
based on pre-modern character sequence rules, as the
latter operates on Korean faster and more flexibly.
Firstly, about 1,650 backbone normalization rules
were written by examining all the character
sequences of the 10,000 most frequent eojeols
occurring more than 100 times. Next, all the uni-,
bi-, and trigrams of character sequences of all the
eojeols were extracted from the Hallym corpus and the
modern Korean corpus (Sejong Corpus; see note 128)
respectively. Then their presence/absence and
relative frequencies were compared. All in
all, about 2,500 sequence normalisation rules have
been written and applied to the entire corpus data.
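The two n-gram steps described above (matching Chinese-character n-grams against a lexicon, and comparing character n-grams between the historical and the modern corpus) can be sketched roughly as follows. This is an illustrative sketch only: the function names, the `max_len` limit and the ranking of candidates are assumptions, not details given by the authors.

```python
from collections import Counter


def char_ngrams(s, n_max=3):
    """All character n-grams of s for n = 1..n_max (uni-, bi-, trigrams)."""
    return [s[i:i + n] for n in range(1, n_max + 1)
            for i in range(len(s) - n + 1)]


def lexicon_matches(chars, lexicon, max_len=4):
    """Return (start, end, form) for every substring of chars in the lexicon.

    max_len is an assumed cap on entry length, used only to bound the search.
    """
    hits = []
    for i in range(len(chars)):
        for j in range(i + 1, min(len(chars), i + max_len) + 1):
            if chars[i:j] in lexicon:
                hits.append((i, j, chars[i:j]))
    return hits


def relative_freqs(eojeols, n_max=3):
    """Relative frequency of each character n-gram over a list of eojeols."""
    counts = Counter(g for e in eojeols for g in char_ngrams(e, n_max))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def rule_candidates(historical, modern, n_max=3):
    """N-grams attested in the historical data but absent from the modern data,
    most frequent first; these would be candidates for normalisation rules."""
    hist = relative_freqs(historical, n_max)
    mod = relative_freqs(modern, n_max)
    absent = {g: f for g, f in hist.items() if g not in mod}
    return sorted(absent, key=lambda g: (-absent[g], g))
```

In this sketch the candidate list would still need manual inspection before any rule is written; how overlapping lexicon matches are resolved into a single segmentation is not specified in the text and is left open here.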
Thirdly, this is the first large Korean
corpus that is fully TEI-XML compliant.
Previously built Korean corpora adopted idiosyncratic data
formats, motivated by linguistic characteristics of
Korean such as the presence of the 'eojeol', which make
them incompatible with standard formats.
Table 1 and Table 2 show the basic statistics
of the resulting corpus.
unit                             counts
sentence                         953,541
eojeol (a sentence component)    11,131,841
morpheme (or word)               24,481,836
Table 1: The statistics of the text structure units
word class
(common)
noun
(lexical) verb
tokens
7,266,89
types
2,727,35
17,663
226,257
0
682,155
5,986
969,242
10,227
Table 2: The statistics of major word classes
adjective
adverb
For further research, we plan to carry out several
macro-analyses using text-mining techniques, beginning with
topic modelling, to draw the conceptual map of this
historical period. The macro-analyses could
reveal intellectual structure hidden from the
researcher's naked eye and, coupled with corpus
linguistics techniques, will help researchers find
their way through enormous amounts of textual data. They
will also make it possible to compare the socio-political
and ideological evolution of certain concepts
synchronically and diachronically, between Korea and
other Asian countries and between Korea and Western
countries. In addition to the macro-analysis, the
segmentation and POS tagging results will be
post-edited and refined, in particular for
the Chinese character sequences. The post-editing
can further improve the algorithms for automatic
segmentation and tagging of the pre-modern
Korean language.
References
Baron, A. and Rayson, P. 2009. Automatic
standardization of texts containing spelling variation:
How much training data do you need? In Proceedings of
Corpus Linguistics 2009. University of Liverpool,
Liverpool.
Hendrickx, I. and Marquilhas, R. 2011. From Old Texts
to Modern Spellings: An Experiment in Automatic
Normalisation. Journal for Language Technology and
Computational Linguistics 26(2): 65-76.
Koselleck, Reinhard. 2002. The Practice of Conceptual
History: Timing History, Spacing Concepts. Translated
by Todd Samuel Presner. Stanford: Stanford
University Press.
Koselleck, Reinhard. 1975. The Temporalisation of
Concepts. Unpublished paper, Paris. Available at
http://www.jyu.fi/yhtfil/redescriptions/Yearbook%201
997/Koselleck%201997.pdf
Sangseok, Yim. 2008. The Formation of the Korean and
Chinese mixed-up style in the 20th Korean language.
Seoul: Jisik-Sanup Publications Co., Ltd.
128
https://ithub.korean.go.kr/user/main.do
http://www.sejong.or.kr/
References
Biber, D., Connor, U. and Upton, T. 2007. Discourse on
the move: using corpus analysis to describe discourse
structure. Amsterdam: John Benjamins.
Forsyth, R.S. and Sharoff, S. 2014. Document
dissimilarity within and across languages: a
benchmarking study. Literary and Linguistic
Computing, 29 (1): 6-22.
Hoey, M. 2001. Textual interaction: an introduction to
written discourse analysis. London: Routledge.
Mauranen, A. 2010. Features of English as a lingua
franca in academia. Helsinki English Studies 6: 6-28.
HES Special issue on English as a Lingua Franca.
Swales, J. M. 1990. Genre analysis: English in academic
and research settings. Cambridge: Cambridge
University Press.
Vercruysse, N. and Proteasa, V. 2012. Transparency
Tools across the European Higher Education Area.
The Flemish Ministry for Education and Training.
Michaela Mahlberg
University of
Nottingham
johan.dejoode
@nottingham.ac.uk
michaela.mahlberg
@nottingham.ac.uk
Peter Stockwell
University of Nottingham
peter.stockwell@nottingham.ac.uk
Background
Pilot study
Work in progress
Acknowledgement
The CLiC Dickens project is supported by the UK
Arts and Humanities Research Council Grant
Reference AH/K005146/1.
References
Mahlberg, M., & Smith, C. 2012. Dickens, the Suspended
Quotation and the Corpus. Language and Literature,
21(1), 51-65. doi:10.1177/0963947011432058
Stockwell, P. 2002. Cognitive Poetics: An Introduction.
Hoboken: Routledge.
Culpeper, J. 2001. Language and Characterisation. People
in Plays and Other Texts. Harlow: Pearson Education.
Introduction
http://avalon.law.yale.edu/subject_menus/inaug.asp
http://www.repubblica.it/2009/01/sezioni/esteri
131
http://iipdigital.usembassy.gov/iipdigital-en/index.html
132
http://www.ilsole24ore.com/
133
https://www.project-syndicate.org/
134
http://globalvoicesonline.org/
130
Results
Type
Appraisal
groups
Targets
Modifiers
EN
Political
News
TED
Political
News
TED
Political
News
TED
624
236
349
486
254
341
599
221
288
519
194
326
411
203
292
510
191
246
551
197
297
437
244
323
542
214
264
IT
RU
Category
EN
Appraisal
groups
Targets
Modifiers
Appraisal
groups
Targets
Modifiers
Appraisal
groups
Targets
Modifiers
IT
RU
Positi
ve
744
Negat
ive
440
Neutr
al
2
Ambi
guous
23
165
294
723
200
189
345
477
178
0
215
481
13
247
299
736
146
141
362
334
143
0
186
467
10
284
382
124
178
400
120
151
367
Type of words    Frequency    Percentage
Agreeing         590          69.57%
Disagreeing      244          28.77%
Ambiguous        14           1.65%

Agreeing         454          69.63%
Disagreeing      190          29.18%
Ambiguous        7            1.07%

Agreeing         152          66.67%
Disagreeing      71           31.14%
Ambiguous        5            2.19%
Conclusions
References
Argamon, S., Bloom, K., Esuli, A. and Sebastiani, F.
Automatically Determining Attitude Type and Force
for Sentiment Analysis. In Zygmunt Vetulani & Hans
Uszkoreit (eds.) Human Language Technology.
Challenges of the Information Society,
Springer-Verlag: 218-231.
Bloom, K. and Argamon, S. 2009. Automated learning
of appraisal extraction patterns. Language and
Computers 71: 249-260.
Cettolo, M., Girardi, C. and Federico, M. 2012. Wit3:
Web inventory of transcribed and translated talks. In
Proceedings of the 16th Conference of the European
Association for Machine Translation (EAMT), Trento,
Italy.
Di Bari, M., Sharoff, S. and Thomas, M. 2014. Multiple
views as aid to linguistic annotation error analysis. In
Proceedings of the 8th Linguistic Annotation
Workshop (LAW VIII), ACL SIGANN Workshop held in
conjunction with Coling 2014, Dublin, Ireland.
Available
online
at
http://www.aclweb.org/anthology/W14-4912.
Di Bari M., Sharoff S. and Thomas M., 2013. SentiML:
functional annotation for multilingual sentiment
analysis. In Proceedings of the 1st International
Workshop on Collaborative Annotations in Shared
Environment: metadata, vocabularies and techniques
in the Digital Humanities, held in conjunction with
DocEng 2013, Florence, Italy. Available online at
http://dl.acm.org/citation.cfm?doid=2517978.2517994.
Harris, Z. 1954. Distributional structure. Word, 10, 146-162.
135
http://corpus.leeds.ac.uk/marilena/SentiML
Hunston, S. 2010. Corpus approaches to evaluation
Phraseology and Evaluative Language. In Routledge
Advances in Corpus Linguistics, Taylor & Francis.
Iulia Drghici
University of Bucharest
alicadus@gmail.com
136
139
2 Background
Academic writing for research publication takes
place around the globe, involving, according to a
recent account, 5.5 million scholars, 2,000
publishers and 17,500 research/higher education
institutions (Lillis and Curry 2010). Universities
worldwide are striving to increase the quantity,
quality and impact of their research publications.
This endeavor applies to research students as well
as faculty members, with international publication
increasingly becoming a requirement for graduation
at PhD and even Master's degree level. For many
advanced academic writers, however, English is not
their first language, and so they need additional help
in developing their skills in writing for publication.
Yet the training support offered to such
writers tends to be sporadic in most jurisdictions.
This is particularly the case in Hong Kong, which is
the focus of this presentation (Kwan, 2010).
Corpus
linguistics
pedagogy
and
language
References
Bianchi, F., & Pazzaglia, R. (2007). Student writing of
research articles in a foreign language: Metacognition
and corpora. In R. Facchinetti (Ed.), Corpus linguistics
25 years on (pp. 259-287). New York: Rodopi.
linguistics:
Davies, M. (2013). Google Scholar and COCA-Academic:
Two very different approaches to
examining academic English. Journal of English for
Academic Purposes, 12(3), 155-165.
Flowerdew, J. (2009). Corpora in language teaching. In
Long, M. H. & Doughty, C.J. (Eds.). The handbook of
language teaching. (pp. 327-350). Oxford: Wiley-Blackwell.
Example
      Corpus A       Corpus B       Corpus C
      (2000)         (2007)         (2015)
1     careers        careers        careers
2     students       students       students
3     work           information    your
4     information    career         career
5     employer       graduate       your
6     service        service        skills
7     discipline     work           information
8     employers      skills         work
9     graduate       graduates      experience
10    application    graduate       university
Table 1- Keyword Analysis: Corpus A, B & C
Discussion
References
Fairclough, N. (1993) Critical Discourse Analysis and
the Marketization of Public Discourse: The
Universities, Discourse & Society, 4(2), pp. 133-168.
Fairclough, N. (2015) Language and power. 3rd edn.
Oxon: Routledge.
Internet Archive: Wayback Machine (no date). Available
at: http://archive.org/web/ (Accessed: 15 January
2015).
Mautner, G. (2005) The Entrepreneurial University: A
discursive profile of a higher education buzzword,
Critical Discourse Studies, 2(2), pp. 95-120.
Mautner, G. (2009) Corpora and Critical Discourse
Analysis, in Baker, P. (ed.) Contemporary corpus
linguistics. London; New York: Continuum, pp. 32-46.
Tomlinson, S. (2005) Education in a post-welfare society.
2nd edn. Maidenhead: Open University Press
alf@via-rs.net
mariafinatto@gmail
.com
References
Bako, M., Doliski, I., Duda, J., Hebal-Jezierska, M.,
Collocation Images of Hungarians in Slavonic
Languages, Practical Applications of Linguistic
Research. In. A. Obrbska (ed.) Practical Applications
of Linguistic Research. d.
Baker, P. 2010. Sociolinguistics and Corpus Linguistics.
Edinburgh.
Baker, P., Gabrielatos, C., Khosravinik, M., Krzyżanowski,
M., McEnery, T., Wodak, R. 2008. A Useful
Methodological Synergy? Combining Critical
Discourse Analysis and Corpus Linguistics to Examine
Discourses of Refugees and Asylum Seekers in the UK
Press. Discourse & Society 19(3): 273-305.
Bartmiński, J. (ed.) 1999. Językowy obraz świata. Lublin.
Duszak, A., Fairclough, N. 2008. Krytyczna analiza
dyskursu. Interdyscyplinarne podejście do komunikacji
społecznej. Kraków: Universitas.
Chlewiński, Z., Kurcz, I. 1992. Stereotypy i uprzedzenia.
Warszawa.
Český národní korpus, www.korpus.cz
Čermák, F., Šulc, M. 1996. Kolokace. Praha.
Gabrielatos, C., Baker, P. 2008. Fleeing, sneaking,
flooding: A corpus analysis of discursive constructions
of refugees and asylum seekers in the UK Press
1996-2005. Journal of English Linguistics 36(1), 5-38.
Glynn, D., Fischer, K. 2010. Quantitative Methods in
Cognitive Semantics: Corpus-Driven Approaches.
Berlin/NewYork
Hebal-Jezierska, M. 2011. Kolokační obrazy některých
lexémů patřících do sémantického pole cizinec v
českém tisku (s metodologickými úvahami). In. F.
Čermák (eds.) Korpusová lingvistika Praha 2011,
InterCorp. Praha, 109-121.
Hebal-Jezierska, M. 2012. The image of a lexeme based
on the analysis of collocations. In. P. Pęzik (ed.)
Corpus Data across Languages and Disciplines. Peter
Lang, Frankfurt am Main, Berlin, Bern, Bruxelles, New
York, Oxford, Warszawa, Wien, 183-192.
Hebal-Jezierska, M. 2012. Wizerunki kolokacyjne
mniejszości narodowych żyjących w Republice
Czeskiej i Republice Słowacji. A talk from the
conference Tertium: Słowo w kontekście. Kraków.
Haan, H., Scholz, S., Stereotyp, Identität und Geschichte:
die Funktion von Stereotypen in gesellschaftlichen
Diskursen.
Hunston, S., Francis, G. 2000. Pattern Grammar,
Amsterdam/Philadelphia.
and
Meanings,
References
Adolphs, S. and Carter, R. 2007. Beyond the word: New
challenges in analysing corpora of spoken English.
European Journal of English Studies 11(2): 133-146.
Adolphs, S. and Knight, D. 2008. Analysing Discourse
Markers: A Multi-Modal Approach. In the British
Association for Applied Linguistics Annual Conference
(BAAL 2008), University of Swansea.
Aijmer, K. 2013. Analysing modal adverbs as modal
particles and discourse markers. In L. Degand, B.
Cornillie, & P. Pietrandrea (eds.) Discourse markers
and modal particles: Categorization and description.
Amsterdam: John Benjamins.
Aijmer, K. & Simon-Vandenbergen, A. 2006. Pragmatic
markers in contrast. Amsterdam: Elsevier.
Allen, L. Q. 1999. Functions of nonverbal
communication in teaching and learning a foreign
language. The French Review 72 (3): 469-480.
Allwood, J. 2008. Multimodal Corpora. In A. Lüdeling,
and M. Kytö (eds) Corpus Linguistics: An
international handbook. Berlin: Mouton de Gruyter.
Andersen, G. 2001. Pragmatic markers and
sociolinguistic variation: A relevance-theoretic
approach to the language of adolescents. Amsterdam:
John Benjamins.
Bavelas, J. B. 1994. Gestures as part of speech:
Methodological implications. Research on Language
& Social Interaction 27 (3): 201-221.
Blakemore, D. 1987. Semantic Constraints on Relevance.
Oxford: Blackwell.
Blakemore, D. 1989. Denial and contrast: A relevance
theoretic analysis of but. Linguistics and philosophy
12(1): 15-37.
Blakemore, D. 2002. Relevance and linguistic meaning:
The semantics and pragmatics of discourse markers.
Cambridge: Cambridge University Press.
Birdwhistell, R. L. 1970. Kinesics and context: Essays on
body motion communication. Philadelphia: University
of Pennsylvania Press.
Brinton, L. J. 1996. Pragmatic markers in English:
Grammaticalization and discourse functions. Berlin:
Walter de Gruyter.
Carter, R. and McCarthy, M. 2006. Cambridge grammar
of English: a comprehensive guide: spoken and written
English grammar and usage. Cambridge: Cambridge
University Press.
Cassell, J., McNeill, D. and McCullough, K. 1999.
Speech-gesture mismatches: Evidence for one
underlying representation of linguistic and
nonlinguistic information. Pragmatics & Cognition
7(1): 1-33.
Ferré, G. 2011. Multimodal analysis of discourse
markers donc, alors and en fait in conversational
French. In Actes de ICPhS XVII: 671-674.
Fraser, B. 1990. An approach to discourse markers.
Journal of Pragmatics 14: 383-395.
Stefania Degaetano-Ortlieb
Saarland University
ashraf.khamis@uni-saarland.de
s.degaetano@mx.uni-saarland.de
Hannah Kermes
Saarland University
Jörg Knappen
Saarland University
h.kermes@mx.uni-saarland.de
j.knappen@mx.uni-saarland.de
Noam Ordan
Saarland University
Elke Teich
Saarland University
noam.ordan@uni-saarland.de
e.teich@mx.uni-saarland.de
References
Baron, A. and Rayson, P. 2008. VARD 2: A tool for
dealing with spelling variation in historical corpora.
Postgraduate Conference in Corpus Linguistics 2008,
May 22. Birmingham, UK: Aston University.
Biber, D., Finegan, E. and Atkinson, D. 1994. ARCHER
and its challenges: Compiling and exploring A
Representative Corpus of Historical English Registers.
In U. Fries, P. Schneider and G. Tottie (eds.), Creating
and using English language corpora, 1-14.
Amsterdam/New York: Rodopi.
Degaetano-Ortlieb, S., Kermes, H., Lapshinova-Koltunski,
E. and Teich, E. 2013. SciTex - a diachronic
corpus for analyzing the development of scientific
registers. In P. Bennett, M. Durrell, S. Scheible and R.
J. Whitt (eds.), New methods in historical corpus
linguistics: Corpus linguistics and interdisciplinary
perspectives on language (CLIP), vol. 3. Tübingen:
Narr.
Evert, S. and Hardie, A. 2011. Twenty-first century
corpus workbench: Updating a query architecture for
the new millennium. In Proceedings of the Corpus
Linguistics 2011 Conference. Birmingham, UK.
Moskowich, I. and Crespo, B. 2007. Presenting the
Coruña Corpus: A collection of samples for the
historical study of English scientific writing. In J.
Pérez-Guerra, D. González-Álvarez, J. L. Bueno
Alonso and E. Rama-Martínez (eds.), Of varying
language and opposing creed: New insights into Late
Modern English, 341-357. Bern: Peter Lang.
Nesi, H. 2011. BAWE: An introduction to a new
resource. In A. Frankenberg-Garcia, L. Flowerdew and
Michal Křen
Institute of the Czech National Corpus
Charles University
michal.kren@ff.cuni.cz
Background
Design of SYN2015
http://ucnk.ff.cuni.cz/english/struktura.php
Technical enhancements
Conclusion
Acknowledgement
The corpus design, compilation and annotation are a
result of team work carried out during the
implementation of the Czech National Corpus
project (LM2011023) funded by the Ministry of
Education, Youth and Sports of the Czech Republic
within the framework of Large Research,
Development and Innovation Infrastructures.
References
Biber, D. 1993. Representativeness in Corpus Design.
Literary and Linguistic Computing 8 (4): 243-257.
Hnátková, M., Křen, M., Procházka, P. and Skoumalová,
H. 2014. The SYN-series corpora of written Czech.
In Proceedings of LREC 2014. Reykjavík: ELRA,
160-164. Available online at
http://www.lrec-conf.org/proceedings/lrec2014/pdf/294_Paper.pdf
Králík, J. and Šulc, M. 2005. The Representativeness of
Czech Corpora. International Journal of Corpus
Linguistics 10 (3): 357-366.
Křen, M. 2013. Odraz jazykových změn v synchronních
korpusech. Prague: NLN.
Machálek, T. and Křen, M. 2013. Query interface for
diverse corpus types. In Natural Language
Processing, Corpus Linguistics, E-learning.
Lüdenscheid: RAM Verlag, 166-173.
Rychlý, P. 2007. Manatee/Bonito - A Modular Corpus
Manager. In 1st Workshop on Recent Advances in
Slavonic Natural Language Processing. Brno:
Masaryk University, 65-70.
144
http://www.korpus.cz/
References
Archer, D. 2005. Questions and answers in the English
courtroom (1640-1760): a sociopragmatic analysis.
Amsterdam/Philadelphia: John Benjamins Publishing
Company.
Brown, P. & Levinson, S. 1978. Universals in language
usage: politeness phenomena. In E. N. Goody, (ed),
Questions and Politeness: Strategies in Social
Interaction. Cambridge: Cambridge University Press.
Clayman, S. & Heritage, J. 2002. The News Interview.
Journalistic and Public Figures on the Air. Cambridge:
Cambridge University Press.
Clayman, S. 2010. Questions in Broadcast Journalism. In
Freed, A. F. & Ehrlich, S. (eds) Why Do You Ask?
The Function of Questions in Institutional Discourse.
New York: Oxford University Press, pp. 256-278.
Spencer-Oatey, H. 1992. Cross-Cultural Politeness:
British and Chinese Conceptions of the Tutor-Student
relationship. Unpublished Ph.D. thesis. Lancaster
University.
Ciara R. Wigham
Université Lumière
Lyon 2 - ICAR
Julien.Longhi@
u-cergy.fr
ciara.wigham@
univ-lyon2.fr
References
CoMeRe Repository (2014). Repository for the CoMeRe
corpora [website], http://hdl.handle.net/11403/comere
Burnard, L. & Bauman, S. (2013). TEI P5: Guidelines for
electronic text encoding and interchange. TEI
consortium, tei-c.org.
http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf
Chanier, T., Poudat, C., Sagot, B., Antoniadis, G.,
Wigham, C.R., Hriba, L., Longhi, J. & Seddah, D.
(2014). The CoMeRe corpus for French: structuring
and annotation heterogeneous CMC genres, in
Beißwenger, M., Oostdijk, N., Storrer, A & van den
Heuvel, H. Building and Annotating Corpora of
Computer-Mediated Discourse: Issues and Challenges
at the Interface of Corpus and Computational
Linguistics, Journal of Language Technology and
Computational Linguistics (special issue). pp1-31.
http://www.jlcl.org/2014_Heft2/Heft2-2014.pdf
Djemili S., Longhi J., Marinica C., Kotzinos D. & Sarfati
G.-E. (2014). What does Twitter have to say about
ideology , Konvens 2014 - Workshop proceedings
vol. 1 (NLP 4 CMC: Natural Language Processing for
Core
Metadata
Initiative.
References
Baker, Paul. 2004. "'Unnatural acts' Discourses of
homosexuality within the House of Lords debates on
gay male law reform." Journal of Sociolinguistics 8
(1): 88-106.
Baker, Paul. 2009. "'The question is, how cruel is it?'
Keywords, Fox Hunting and the House of Commons."
In What's in a Word-list? Investigating word frequency
and keyword extraction, edited by Dawn Archer,
125-36. Surrey: Ashgate Publishing Ltd.
Baker, Paul, Costas Gabrielatos, and Tony McEnery.
2013. Discourse analysis and media attitudes: the
representation of Islam in the British press. Cambridge;
New York: Cambridge University Press.
Mansbridge, Jane. 1999. "Should Blacks Represent
Blacks and Women Represent Women? A Contingent
Yes." The Journal of Politics 61 (3): 628-57.
Phillips, Anne. 1995. The Politics of Presence. New
York: Oxford University Press.
Pitkin, Hanna Fenichel. 1967. The concept of
representation. Berkeley: University of California
Press.
Rationale
Literature
Research Questions
Design of Study
Significance of Results
References
Clark, E., & Paran, A. (2007). The employability of
non-native-speaker teachers of EFL: A UK survey. System,
35(4), 407-430.
Lorenzo Mastropierro
University of Nottingham
lorenzo.mastropierro
@nottingham.ac.uk
1 Introduction
Methodology
3 Analysis
References
Conclusion
1 Introduction
While reviewing the literature relating to corpora
and educational materials, I have identified a bias in
pedagogical corpus-linguistic research. The majority
of the research appears to focus on research for
educational materials (FEM), and fewer studies
focus on research on educational materials
(OEM). This is interesting because research has
found intermittent or no use of corpora by creators
of educational materials. Several reasons are given
for this, including lack of familiarity with corpora
and insufficient computer skills (Burton, 2012). One
could also explain this underdevelopment, as
Meunier (2002: 123) did, by recognizing that
learner corpora research is still in its infancy.
While some people can be shown how to use
corpora to inform their educational materials, it
appears that this will not be an option in every
case, particularly where appropriate corpora are not
known or not available, as is the case for some
minority or small-community languages. Should
corpus analyses of educational materials be more
widely conducted, we may build on the valuable
materials currently in existence, rather than starting
from scratch.
In conducting this study I will illustrate some
research methods available to corpus linguists who
seek to evaluate written texts currently being used in
education (thereby conducting OEM research),
with the aim of relating the materials to the CEFR.
This paper will also show initial findings from
my research.
The examples I will use are based on the CEFR as
applied to Irish, but some of the methods of analysis
could apply to a wide range of languages. The
Irish-language interpretation of the CEFR is chiefly
realised by Teastas Eorpach na Gaeilge (European
Certificate of Irish), or TEG. The sample materials
provided by TEG for each CEFR level and the
relevant syllabi will be used as a baseline against
which the additional educational materials will be
related to the CEFR.
Research of this type requires the consideration of
multiple language features, from syntax to lexicon,
and from grammar to discourse; a fact which is
corroborated in Council of Europe (2009). The
Manual goes on to state that it is not a blueprint, but
research
on
3 Initial results
Sample lessons provided by TEG at A1 level include
the images below and encourage the teacher to
introduce the word siopadóireacht (shopping) as
an activity, siopa (shop) as a place of work, and a
lesson called liosta siopadóireachta (shopping
list).
Shopping as an activity - A shop as a place of
work - Write a shopping list
References
Biber, D. (1995) Dimensions of Register Variation: A
cross-linguistic comparison. Cambridge University
Press, 1995
Burton, G. (2009) Corpora and Coursebooks: destined to
be strangers forever?
Council of Europe (2009) Manual for Relating Language
Examinations to the Common European Framework of
Reference for Languages: Learning, teaching,
assessment.
http://www.coe.int/t/dg4/linguistic/source/manualrevisi
on-proofread-final_en.pdf (downloaded 15 Dec, 2014)
Coxhead, A. (2000). A New Academic Word List.
TESOL Quarterly, Vol. 34, No. 2 (Summer, 2000), pp.
213-238 http://www.jstor.org/stable/3587951
Meunier, F. (2002) The pedagogical value of native and
learner corpora in EFL grammar teaching.
le
with
mama
mammy
Hypertextualizer: Quotation
Extraction Software
Jiří Milička
Charles University,
Prague
Petr Zemánek
Charles University,
Prague
jiri@milicka.cz
petr.zemanek@
ff.cuni.cz
Introduction
The Software
Acknowledgements
References
Kolak, O. and Schilit, B. N. 2008. Generating Links by
Mining Quotations. In HT '08: Proceedings of the
nineteenth ACM conference on Hypertext and
hypermedia. New York.
Pareti, S., O'Keefe, T., Konstas, I., Curran, J. R. and
Koprinska, I. 2013. Automatically Detecting and
Attributing Indirect Quotations. In Proceedings of the
2013 Conference on Empirical Methods in Natural
Language Processing. Seattle.
Paul, W., Fernandes, D., Motta, E. and Milidiú, R. L.
2011. Quotation Extraction for Portuguese. In
Proceedings of the 8th Brazilian Symposium in
Information and Human Language Technology.
Cuiabá.
Pouliquen, B., Steinberger, R. and Best, C. 2007.
Automatic Detection Of Quotations in Multilingual
News. In Proceedings of Recent Advances in Natural
Language Processing 2007. Borovets.
Zemánek, P. and Milička, J. 2014a. Quotations,
Relevance and Time Depth: Medieval Arabic
Literature in Grids and Networks. In: 14th Conference
of the European Chapter of the Association for
Computational Linguistics. 2014. Available online at
http://aclweb.org/anthology//W/W14/W14-09.pdf
Zemánek, P. and Milička, J. 2014b. Ranking Search
Results for Arabic Diachronic Corpora. Google-like
search engine for (non)linguists. In Proceedings of
CITALA 2014 (5th International Conference on Arabic
Language Processing, Oujda). Available online at
http://www.citala.org/papers/paper_29.pdf
References
Braun, F 2000. Leitfaden zur geschlechtergerechten
Formulierung. Mehr Frauen in die Sprache. Kiel:
Ministerium für Justiz, Frauen, Jugend und Familie des
Landes Schleswig-Holstein.
Ben O'Loughlin
Royal Holloway,
University of London
Federica Ferrari
University
of Bologna
Ben.OLoughlin
@rhul.ac.uk
federica.ferrari
10@unibo.it
References
Pascual Pérez-Paredes
University of
Murcia.
carlos.ordonana
@um.es
Pascualf
@um.es
References
Paradis, J. 2011. Individual differences in child English
second language acquisition: Comparing child-internal
and child-external factors. Linguistic Approaches to
Bilingualism, 1,3, 213-237.
Introduction:
The Investigation
References
Francis, G. 1993. A Corpus-Driven Approach to
Grammar Principles, Methods and Examples. In:
Baker, Mona, Francis, Gill, and Tognini-Bonelli,
Elena, eds. Text and Technology : In Honour of John
Sinclair. Amsterdam, NLD: John Benjamins
Publishing Company, 1993.
Hoey, M. 2005. Lexical Priming. London: Routledge.
Pace-Sigge, M. (forthcoming) The Function and Use of
TO and OF in Multi-Word Units. Hounslow: Palgrave
Macmillan.
Sinclair, J. (Editor-in-Chief) et al. 1990. Collins Cobuild
Grammar. London: Collins.
Sinclair, J. [1992] 2004. Trust the text. Language, corpus
and discourse. London: Routledge.
Stubbs, M. (1996). Text and Corpus Analysis. Computer-Assisted
Analysis of Language and Culture. Oxford:
Basil Blackwell.
Conclusion
Streamlining corpus-linguistics in
Higher and adult education: the
TELL-OP strategic partnership
Pascual Pérez-Paredes
Universidad de Murcia
pascualf@um.es
Introduction
Aims
Acknowledgements
Transforming European Learner Language into
Learning Opportunities 2014-1-ES01-KA203-004782,
a KA200 Higher Education Strategic Partnership,
funded by the OAPEE and the EU.
Begoña Crespo García
Universidade da Coruña
luis.pcastelo@udc.es
bcrespo@udc.es
References
Aguado-Jiménez, P., Pérez-Paredes, P., & Sánchez, P.
(2012). Exploring the use of multidimensional analysis
of learner language to promote register awareness.
System, 40(1), 90-103.
Boulton, A., & Pérez-Paredes, P. (2014). Editorial:
Researching uses of corpora for language
teaching and learning. ReCALL, 26, 121-127.
Conole, G. 2013. Designing for Learning in an Open
World. Explorations in the Learning Sciences,
Instructional Systems and Performance Technologies,
Vol. 4. Springer.
Kinshuk, Huang, Ronghuai (Eds.). 2015. Ubiquitous
Learning Environments and Technologies. Lecture
Notes in Educational Technology. Springer.
Pérez-Paredes, P. 2010. Corpus Linguistics and Language
Education in Perspective: Appropriation and the
Possibilities Scenario. In T. Harris & M. Moreno Jaén
(Eds.), Corpus Linguistics in Language Teaching (pp.
53-73). Peter Lang.
Pérez-Paredes, P., & Sánchez Tornel, M. (2009).
Understanding e-skills in the Foreign Language
Teaching context: Skills, strategies and computer
expertise. In R. Marriott & P. Torres (Eds.),
Handbook of Research on E-Learning Methodologies
for Language Acquisition (pp. 1-22). IGI Global.
Acknowledgements
The research reported here has been funded by
the Consellería de Educación e Ordenación
Universitaria (I2C plan, reference number
Pre/2011/096, co-funded 80% by the European
Social Fund) and the Ministerio de Economía y
Competitividad (MINECO), grant number
FFI2013-42215-P. These grants are hereby gratefully
acknowledged.
References
Biezma, María. 2011. Conditional inversion and
givenness. Proceedings of SALT 21: 552-571.
Carter-Thomas, Shirley & Elizabeth Rowley-Jolivet.
2008. If-conditionals in medical discourse: from
theory to disciplinary practice. Journal of English for
Introduction
Koteyko (2014).
should be banned.
In these editorials neither the full veil nor the veil or
headscarf is defined as a religious symbol.
The problem identified in the frame elements of the
ABC editorials is the need to ban any kind of veil. They
state that the burqa is a garment of cultural
significance and subsequently deny it. As for
the evaluation, there is frequent mention of the false
liberalism of those who want to allow its use and of the
fallacies of those who defend the veil and yet
defend banning Christian symbols in public spaces. ABC
grounds its argument in the rule of law, democratic
principles and legality. In the first ABC editorial,
one of the reasons given for banning its use is the rejection of
shelters, but that argument was subsequently abandoned
and is not repeated in the remaining editorials. The
necessary integration of Muslims is mentioned.
The ABC editorials refer to
multiculturalism as the excuse used by those
who want to allow its use. The ABC editorials
put forward more solutions than those offered
by El País and El Periódico.
The solution for ABC is clear: the headscarf must be
banned, both the full veil and the Islamic headscarf.
The issue in the editorials of La Vanguardia is
that the headscarf should be banned.
Only one La Vanguardia text defines the Muslim veil
as a religious symbol, equivalent to wearing a cross;
this frame element does not appear
in the remaining texts.
In the evaluation, it is emphasized that the authorities
should take action on the matter and not leave it
in the hands of the municipalities. It stresses the
need for Muslims to integrate, as do El
Periódico, ABC and, to a lesser extent, El País. It also refers to
the ambiguous sense as a point of reference for knowing
why or why not to prohibit, an element that is also
mentioned in other newspapers. It refers to the
concept of multiculturalism, in this case with a
positive value, because in Europe we live in a
multicultural society.
As for solutions, it proposes being flexible with the
headscarf that leaves the face uncovered and prohibiting the
wearing of the full veil.
Conclusions
References
Baker, P., Gabrielatos, C., KhosraviNik, M.,
Krzyzanowski, M., McEnery, T., and R. Wodak 2008.
A useful methodological synergy? Combining critical
discourse analysis and corpus linguistics to examine
discourses of refugees and asylum seekers in the UK
press. Discourse & Society, 19: 273-306.
conversations.
Strongly enough, these pragmatic units have
undergone a pragmaticalization process, which gives
them a set of inferential meanings; this process
contributes to the acquisition of a variety of
pragmatic functions in different contexts. The
markers you know and yan then and so have
interpersonal and interactional purposes: they soften
the force of an illocutionary act and derive relevant
inferences of implicit meanings.
In both Arabic and English, speakers are found to
use them in a variety of pragmatic contexts: to
perform self-correction, reformulation, hedging and
mitigation of face-threatening acts, to hold a turn, and
to request the involvement and cooperation of
interactants.
References
Blakemore, D. (2002). Relevance And Linguistic
Meaning: The Semantics And Pragmatics Of Discourse
Markers. Cambridge: Cambridge University Press.
Brinton, L. J. (1996). Pragmatic Markers In English:
Grammaticalization And Discourse Functions.
Herndon: Walter De Gruyter.
Brown, P., Levinson, S. C. (1987). Politeness: Some
Universals In Language Usage. Cambridge: Cambridge
University Press.
Dostie, G. (2004). Pragmaticalisation Et Marqueurs
Discursifs: Analyse Sémantique Et Traitement
Lexicographique. Bruxelles: Duculot.
Erman, B. (1987). Pragmatic Expressions In English: A
Study Of "You Know", "You See" And "I Mean"
In Face-To-Face Conversation. Doctoral Dissertation
At The University Of Stockholm. Stockholm.
Erman, B. (2001). Pragmatic Markers Revisited With A
Focus On "You Know" In Adult And Adolescent Talk.
Journal Of Pragmatics 33: 1337-1359. Elsevier Science B.V.
Kerbrat-Orecchioni, C. (1990). Les Interactions Verbales
I. Paris: Armand Colin.
Kerbrat-Orecchioni, C. (1992). Les Interactions Verbales
II. Paris: Armand Colin.
Kerbrat-Orecchioni, C. (1994). Les Interactions Verbales
III. Paris: Armand Colin.
Leech, G. 1975. A Communicative Grammar Of English.
London: Longman.
Leech, G. (1983). Principles Of Pragmatics. London New
York: Longman.
Levinson, S. (1983). Pragmatics. Cambridge: Cambridge
University Press.
Schiffrin, D. (1987). Discourse Markers. New York:
Cambridge University Press.
Andrew Salway
Uni Research, Bergen
andrew.salway@uni.no
Knut Hofland
Uni Research, Bergen
knut.hofland@uni.no
Introduction
Approach
3 Current status
We have harvested all posts from 5563 English-language
blogs, 2088 French-language blogs and
128 Norwegian blogs; approximately 9.7 × 10^6 blog
posts and 5.9 × 10^9 words. Processing is underway to
extract: the text content of each post, comments,
date, data about article links (links from the main
text of the post to any other site), blog roll links, and
other links.
Next we will assess corpus quality, and clean
where necessary, using techniques such as character
and n-gram distribution to check topic, language and
duplicates (cf. Biemann et al. 2013). Features
particular to a blog corpus will also be checked,
including the distribution of dates, and the network
structure generated automatically from link data.
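As a rough illustration of how character n-grams can flag duplicates during cleaning, the following is a minimal sketch and not the project's actual pipeline. The function names, the 5-gram size, the 0.8 threshold and the sample posts are all assumptions made for this example.

```python
from itertools import combinations


def char_ngrams(text: str, n: int = 5) -> set[str]:
    """Return the set of character n-grams of a lower-cased, whitespace-normalised text."""
    normalised = " ".join(text.lower().split())
    if len(normalised) < n:
        return {normalised} if normalised else set()
    return {normalised[i:i + n] for i in range(len(normalised) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(posts: dict[str, str], threshold: float = 0.8,
                    n: int = 5) -> list[tuple[str, str]]:
    """Return pairs of post ids whose n-gram sets overlap at or above the threshold."""
    grams = {pid: char_ngrams(text, n) for pid, text in posts.items()}
    return [
        (p, q)
        for p, q in combinations(sorted(grams), 2)
        if jaccard(grams[p], grams[q]) >= threshold
    ]


# Hypothetical sample posts, invented for illustration only.
posts = {
    "post-1": "Climate change is discussed widely on blogs.",
    "post-2": "Climate  change is discussed widely on blogs!",
    "post-3": "A completely unrelated post about football results.",
}
print(near_duplicates(posts))
```

The pairwise comparison is quadratic in the number of posts, so a corpus of this size would presumably need a hashing-based approximation instead; the sketch only shows the underlying similarity measure.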
We expect that all processing and validation will
be complete by July 2015. Our plan is to make the
corpora available to researchers for download and
for online analysis in the Corpuscle system [145].
Acknowledgements
This work is supported by the RCN's VERDIKT
program. We are grateful to Dag Elgesem, Kjersti
Fløttum, Anje Müller Gjesdal and Lubos Steskal for
input on corpus design, and especially to Øystein
Reigem for his work on normalisation, deduplication
[145] http://clarino.uib.no/korpuskel/page
References
Biemann, C., Bildhauer, F., Evert, S., Goldhahn, D.,
Quasthoff, U., Schäfer, R., Simon, J., Swiezinski, L.
and Zesch, T. 2013. Scalable Construction of High-Quality
Web Corpora. Journal for Language Technology and
Computational Linguistics 28(2): 23-59.
Kehoe, A. and Gee, M. 2012. Reader comments as an
aboutness indicator in online texts: introducing the
Birmingham Blog Corpus. Studies in Variation,
Contacts and Change in English 12. Online at
www.helsinki.fi/varieng/series/volumes/12/kehoe_gee/
Salway, A., Touileb, S. and Hofland, K. 2013. Applying
Corpus Techniques to Climate Change Blogs. In A.
Hardie and R. Love (eds.) Corpus Linguistics 2013
Abstract Book. Available online at
http://ucrel.lancs.ac.uk/cl2013/doc/CL2013ABSTRACT-BOOK.pdf
Svenja Adolphs
University of Nottingham
Ramona.statche@nottingham.ac.uk
svenja.adolphs@nottingham.ac.uk
Ansgar Koene
University of Nottingham
University of Nottingham
psxcc@nottingham.ac.uk
ansgar.koene@nottingham.ac.uk
Derek McAuley
University of Nottingham
Claire O'Malley
University of Nottingham
Derek.mcauley@nottingham.ac.uk
Claire.omalley@nottingham.ac.uk
Elvira Perez
University of Nottingham
Tom Rodden
University of Nottingham
Elvira.perez@nottingham.ac.uk
Tom.rodden@nottingham.ac.uk
Acknowledgement
This work forms part of the CaSMa project at the
University of Nottingham, HORIZON Digital
Economy Research institute, supported by ESRC
grant ES/M00161X/1.
References
Martin, J. R. and White, P. R. R., 2005. The Language of
Evaluation. Appraisal in English. Palgrave Macmillan.
Eggins, S., 2004. An Introduction to Systemic Functional
Linguistics. 2nd ed. New York London: Continuum.
Contrastive analysis of
the Relative clauses based on Parallel
corpus of Japanese and English
Kazuko Tanabe
Japan Women's University
tanabeka@fc.jwu.ac.jp
kara no dappi o
from P casting off ACC
hitsuyoo ga aru.
need NOM exist
English:
They must do away with their mentality of
depending on the government.
"Isogu hitsuyoo" in the Japanese sentence seems to
be translated as "must" in English.
(3) hoshin(principle)
Japanese:
Senta de genchi kunren o kaishi shi-tai hoshin da.
Center at training ACC start want principle copula
English:
The ministry also intends to start training at the
Japanese made center.
"Kaishi shi-tai hoshin" is presumed to be
translated as "intend to".
In conclusion, the Japanese fact-S construction
does not tend to be translated into a relative
clause construction in English. The English
meaning of verbs or auxiliaries usually reflects
the connotation of the Japanese fact-S
construction.
smtwardo@gmail.com
ab6667@coventry.ac.uk
References
Rysiewicz J., Foreign Language Aptitude Test - Polish
(FLAT-PL), General characteristics, description,
analysis, statistics and test administration procedures,
Poznań 2011, retrieved on March 3rd, 2014,
https://www.academia.edu/1744649/Foreign_Language_Aptitude_Test_-_Polish_FLATPL_Test_Uzdolnien_do_Nauki_Jezykow_Obcych__TUNJO_
Introduction
Corpora
Corpus                            Word count
Mathematics                       265,959
Natural Science (NS)              279,899
Social & Political Science (SPS)  280,095
Table 2: Corpora used in the study with word counts
As can be seen from Table 2, the corpora used in
this study are relatively small. However, since they
contain either the entire textbook in the case of
Mathematics or a significant proportion of the
extracts that students are expected to read while
studying the SPS and NS courses, we can be fairly
certain that findings based on these corpora are
representative of the type of language that first year
undergraduate students at the university will meet on
these compulsory courses.
Method
Findings
References
Biber, D., Johansson, S., Leech, G., Conrad, S., Finegan,
E. 1999. Longman Grammar of Spoken and Written
English. Harlow: Longman.
Charles, M. 2007. Argument or evidence: Disciplinary
variation in the use of the "Noun that" pattern in stance
construction. English for Specific Purposes, 26:
203-218.
Hunston, S. 2008. Starting with the small words:
Patterns, lexis and semantic sequences. International
Journal of Corpus Linguistics 13: 271-295
Hunston, S. 2011. Corpus approaches to evaluation:
phraseology and evaluative language. New York:
Routledge.
Hunston, S. and Francis, G. 2000. Pattern grammar.
Amsterdam/Philadelphia: John Benjamins.
References
Baker, P. (2006). Using Corpora in Discourse Analysis.
London: Continuum.
Baker, P., Gabrielatos, C., KhosraviNik, M.,
Krzyżanowski, M., McEnery, T., & Wodak, R. (2008).
A useful methodological synergy? Combining critical
discourse analysis and corpus linguistics to examine
discourses of refugees and asylum seekers in the UK
press. Discourse & Society, 19(3), 273-306.
Barnard-Wills, D. (2011). UK news media discourses of
surveillance. The Sociological Quarterly, 52(4),
548-567.
Black, I. (2013, June 10). NSA spying scandal: What we
have learned. The Guardian. Retrieved from
http://www.theguardian.com/world/2013/jun/10/nsaspying-scandal-what-we-have-learned
Lyon, D. (2004). Globalizing surveillance: Comparative
and sociological perspectives. International Sociology,
19(2), 135-149.
MacDonald, M. N., & Hunter, D. (2013a). The discourse
of Olympic security: London 2012. Discourse &
Society, 24(1), 66-88.
MacDonald, M. N., & Hunter, D. (2013b). Security,
population and governmentality: UK counter-terrorism
discourse (2007-2011). Critical Approaches to
Discourse Analysis Across Disciplines, 7(1), 123-140.
Mahlberg, M. (2014). Corpus linguistics and discourse
analysis. In K. P. Schneider & A. Barron (Eds.),
Pragmatics of Discourse (pp. 215-238). Berlin: De
Gruyter Mouton.
McEnery, T. (2009). Keywords and moral panics: Mary
Whitehouse and media censorship. In D. Archer (Ed.),
What's in a Word-list? Investigating Word Frequency
and Keyword Extraction (pp. 93-124). Farnham:
Ashgate.
Williams, R. (1983). Keywords: A Vocabulary of Culture
and Society (2nd ed.). London: Fontana.
Zurawski, N. (2007). Einleitung [Introduction]. In N.
Zurawski (Ed.), Surveillance Studies: Perspektiven
eines Forschungsfeldes [Perspectives of a research
field] (pp. 7-24). Opladen: Barbara Budrich.
Róisín Knight
Lancaster University
a.wilson@lancaster.ac.uk
r.knight1@lancaster.ac.uk
Introduction
Theory
legomena, the main underlying feature of the
rank-frequency-based indicators, necessarily leads to a
change in the type-token relationship overall.
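To make the link between hapax legomena and the type-token relationship concrete, here is a minimal sketch that counts both for a token list. It assumes naive whitespace tokenisation and an invented sample sentence; it is not the set of indicators used in the study.

```python
from collections import Counter


def type_token_stats(tokens: list[str]) -> dict[str, float]:
    """Count tokens, types and hapax legomena, and derive two simple ratios."""
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # A hapax legomenon is a type that occurs exactly once.
    n_hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "hapax": n_hapax,
        "ttr": n_types / n_tokens if n_tokens else 0.0,
        "hapax_share_of_types": n_hapax / n_types if n_types else 0.0,
    }


# Invented sample sentence, split on whitespace (an assumption of this sketch).
tokens = "the cat sat on the mat and the dog sat too".split()
stats = type_token_stats(tokens)
print(stats)
```

Because every hapax contributes exactly one type and one token, a text with proportionally more hapax legomena will, other things being equal, show a higher type-token ratio.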
Data
Results
Conclusion
an interesting story.
References
Greenberg, J.H. 1960. A quantitative approach to the
morphological typology of languages. International
Journal of American Linguistics, 26: 178-194.
maylywong@hku.hk
Conclusion
References
Baker, P. and McEnery, T. 2014. Find the doctors of
death: press representation of foreign doctors working
in the NHS, a corpus-based approach. In A. Jaworski
and N. Coupland (eds.) The discourse reader (3rd ed.)
(pp. 465-480). London and New York: Routledge.
Fowler, R. 1991. Language in the news: discourse and
ideology in the press. London: Routledge.
Hart, C. 2013a. Constructing contexts through grammar:
cognitive models and conceptualisation in British
Vincentwang0229@gmail.com
Qun Liu
Dublin City University
Qliu@computing.dcu.ie
http://sslmitdev-online.sslmit.unibo.it/corpora/corpora.php
http://cass.lancs.ac.uk/?page_id=1386
Master Long: The scabbard is so beautiful.
Governor Yu: It's beautiful but dangerous. Once you see
it tainted with blood, its beauty is hard to admire.
Master Long: It must be exciting to be a fighter, to be
totally free!
Governor Yu: Fighters have rules too: friendship, trust,
integrity. Without rules, we wouldn't survive for long.
Acknowledgements
This research is supported by Science
Foundation Ireland (Grant 12/CE/I2267) as part of
the ADAPT Centre (www.adaptcentre.ie) at Dublin
City University, Ireland. It is also supported by the
HUAWEI TECHNOLOGIES Co., LTD and
National Social Science Foundation of China (Grant
10CYY006) as part of Shaanxi Normal University, China.
References
Haspelmath, M. 2001. "The European linguistic area:
standard average European". Language Typology and
Language Universals (1):1492-1510.
Huang, C.T. 1989. "Pro-drop in Chinese: A Generalized
Control Theory". In Jaeggli, O. and K. J. Safir (eds.)
The Null Subject Parameter. London: Kluwer
Academic Publisher.
References
Algeo, John (1988). British and American grammatical
differences. International Journal of Lexicography 1:
1-31.
Celce-Murcia, M. and D. Larsen-Freeman. (1999). The
Grammar Book. Boston: Heinle-Heinle.
Quirk, R. and S. Greenbaum. (1973). A University
Grammar of English. Essex: Longman.
Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J.
(1985). A Comprehensive Grammar of the English
Language. London: Longman.
Sonoda, Kenji. (2002). Omission of Prepositions in Time
Adverbials in Present-day Spoken AmE. Bulletin of
Nagasaki University School of Health Sciences.
15(2):19-25.
Svetlana Dzhakupova
National Research University
Higher School of Economics
natalia.zevakhina@gmail.com
Svetlanads@yandex.ru
Elmira Mustakimova
National Research University
Higher School of Economics
egmustakimova_2@edu.hse.ru
Introduction
http://web-corpora.net/CoRST/
Error classification
Acknowledgments
The results of the project "Corpus studies of
language variation: from deviations to linguistic
norm", carried out within the framework of the
Basic Research Program at the National Research
University Higher School of Economics (HSE) in
2015, are presented in this work.
References
Arkhangelsky T. 2012. Les Crocodiles 2.6. [Software].
Moscow.
Glovinskaya, M. 2000. Aktivnye processy v grammatike
// Russkiy yazyk kontsa XX stoletiya. M. (Active
processes in grammar // The Russian language in the
end of the 20th century. Moscow).
Rakhilina, E. 2014. Stepeni sravneniya v svete russkoy
grammatiki oshibok // Sbornik k 10-letiyu NRC. M.
(Comparative degrees in the view of error Russian
grammar // Collection towards the tenth anniversary of
NRC. Moscow).
Segalovich I., Titov V. 1997-2014. MyStem. [Software].
Available from https://tech.yandex.ru/mystem/
Dace Znotiņa
University of Latvia
Inga Znotiņa
Liepaja University, Ventspils University College
Daceznotina@gmail.com
inga.s.znotina@gmail.com
Current research
In the current research, 50 pictures (static, single-frame)
were used: photographs and drawings
(documentary, art). 20 participants described them
verbally, producing 1000 short texts. Content
analysis (categories: character, event/action, time,
space, world knowledge, emotion/immersion) of
these texts followed the 4 previously described steps (~6
hours for each 20 minutes of spoken text). The
results of the corpus analysis contain the frequencies
of narrative elements and the relations between them
(the mental structure of the static visual stimuli).
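The kind of output described above, frequencies of narrative elements and the relations between them, could be tabulated along the following lines. This is only a sketch: the coded texts are invented, and treating a "relation" as co-occurrence of two categories within one text is an assumption of the example, not the authors' stated procedure.

```python
from collections import Counter
from itertools import combinations

# The six content-analysis categories named in the abstract.
CATEGORIES = (
    "character", "event/action", "time",
    "space", "world knowledge", "emotion/immersion",
)


def summarise(coded_texts: list[list[str]]) -> tuple[Counter, Counter]:
    """Count how many texts contain each category and each pair of categories."""
    freq: Counter = Counter()
    pairs: Counter = Counter()
    for cats in coded_texts:
        # Count each category once per text, in a stable alphabetical order.
        present = sorted(set(cats))
        freq.update(present)
        pairs.update(combinations(present, 2))
    return freq, pairs


# Hypothetical coding of three short descriptions (not data from the study).
coded_texts = [
    ["character", "event/action"],
    ["character", "space", "time"],
    ["character", "event/action", "emotion/immersion"],
]
freq, pairs = summarise(coded_texts)
print(freq.most_common())
print(pairs.most_common())

# 50 pictures described by 20 participants give the 1000 texts mentioned above.
assert 50 * 20 == 1000
```

Counting each category once per text keeps long descriptions from dominating the frequencies; counting every mention instead would be an equally defensible choice.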
Conclusions
Acknowledgements
This work was partly funded by the European Social
Fund, project "Doktora studiju attīstība Liepājas
Universitātē" (grant No. 2009/0127/1DP/
1.1.2.1.2./09/IPIA/VIAA/018).
References
Flanagan, J. 2008. Knowing More Than We Can Tell: The
Cognitive Structure of Narrative Comprehension. In
Partial Answers: Journal of Literature and the History
of Ideas, Volume 6, Number 2, June 2008: 323-245.
Herman, D. 2009. Basic Elements of Narrative. Oxford:
Wiley-Blackwell.
McEnery, T., Wilson, A. 2001. Corpus Linguistics. An
Introduction. Second Edition. Edinburgh: Edinburgh
University Press.
Sanford, A.J. and Emmott, C. 2012. Mind, Brain and
Narrative. Cambridge: Cambridge University Press.
Introduction
Data collection
aforementioned metainformation categories.
Annotation should include error annotation, part-of
speech annotation, and syntactic sentence type
annotation.
All changes and additions to the corpus are
described in the News section of the corpus's
website.
Acknowledgements
Future plans