Articles and conf. papers by Maria Jose Marin Perez
Corpora, 2022
The phenomenon of immigration and its depiction in media texts have been examined profusely within the field of corpus-based discourse analysis (Gabrielatos and Baker, 2008; Baker et al., 2013; Blinder and Allen, 2016). This research seeks to present it as reflected in a corpus of 600 judicial decisions issued by Spanish courts in 2016 and 2017. The analysis was motivated by the recent rise of extreme right-wing parties in Europe, which dehumanise immigrants and portray them as a threat to the welfare state. On a first approach, the results appear to dissociate immigration and crime, since a considerable percentage of the keywords obtained (c. 20%) revolves around three major topoi, namely family, territory/access and legal punishment, and shows no evidence of any major offences or crimes amongst the top-ranking lexicon. The study of the collocate networks of the keywords (KWs) within the category legal punishment confirms our initial perception: out of 21 collocates, only the word delito (crime) itself collocates with terms referring to typified crimes such as violencia (violence). In parallel, the data were triangulated using the text-classification software UMTextStats (García-Díaz et al., 2018). The results of this second analysis confirm our initial observations.
International Journal of Language & Law (JLL), 2017
In spite of the plethora of possibilities offered by Corpus Linguistics for the study of legal English, corpus-based research on this English variety is not as fruitful as that dedicated to other branches of ESP. The present research could be regarded as an introduction to major issues related to the design and compilation of a legal corpus, such as the application of appropriate sampling strategies to ensure its representative value. This study also examines the implementation of Automatic Term Recognition (ATR) methods for the analysis of legal terminology and the automatic deployment of collocate networks. The first section explores the controversial issue of establishing the ideal size for a specialised corpus, applying the type/term ratio to a corpus of judicial decisions, the BLaRC, used as reference. Section 3 describes the assessment of different ATR methods. Out of five different methods, Drouin’s (2003) TermoStat is found to be, and is recommended as, the most efficient one in legal term mining. Finally, sections 4 and 5 demonstrate the practicality of collocate networks (Williams, 1998; 2001) in their capacity to reveal lexico-grammatical patterns which provide plenty of information for the study of legal text. A case study of the sub-technical legal term party using Lancsbox, designed by Brezina, McEnery & Wattam (2015), is presented in section 5.2, where its general and specialised contexts are examined. Such scrutiny brings to the foreground interesting data such as the relevance of marriages of convenience in a collection of judicial decisions.
ESP Today, 2019
Two collections of Spanish and British judicial decisions related to the search terms migration, immigration and their Spanish equivalents, a genre popularly regarded as an example of objectivity, were examined in search of evidence of the use of evaluative vocabulary, which appears to be considerably significant judging by the number of such lexical items found in both corpora. This research thus introduces a contrastive corpus-based study of two legal corpora through the replication of the appraisal theory model. The frequency lists from both corpora, obtained using the software Lancsbox (Brezina et al., 2015), were compared by examining and classifying the vocabulary items amongst the top 2,500 types in the lists using the taxonomy provided by appraisal theory. The findings show that the British dataset contains a greater proportion of evaluative vocabulary, particularly as regards the category affect within the appraisal system. Such findings could be related to the very nature of its legal system, where the law is said to be judge-made, leaving greater freedom for the expression of stance, as opposed to the Spanish system, which is codified and may somehow constrain legal actors in the way in which they convey their attitude towards the propositional content of legal texts.
Quaderns de Filología Inglesa: Estudis Lingüístics, 2017
The features of specialised languages have been extensively described by scholars in the literature. Amongst them, Enrique Alcaraz's work stands out as an exhaustive and comprehensive description of EPAP at all linguistic levels: lexical, syntactic, semantic and pragmatic. This research aims to provide a bottom-up assessment of his description on a lexical level through the implementation of corpus-based techniques on two specialised corpora of legal and telecommunications English. The results support Alcaraz's portrayal as regards term usage, the relevance of sub-technical vocabulary, the peculiarities of Latin single and multi-word terms in legal English and the significant presence and usage of abbreviations in telecommunications English.
Miscelánea: A Journal of English and American Studies, 2017
This research presents a data-driven experiment in the legal English field in which FLAX, an open-source self-learning online platform, is assessed as regards its efficacy in helping a group of non-native legal English undergraduates (divided into an experimental and a control group) to use legal terminology more consistently, amongst other language items. The experimental group were instructed to resort only to FLAX and to exploit all the functionalities it offers. Conversely, the control group could access any information source at hand except for the learning platform to complete the same task. Two learner corpora were gathered and analysed on a lexical and pragmatic level for the evaluation of term usage and distribution, lexical diversity, lexical fundamentality and the use of discourse markers. The results display a tendency on the part of the experimental group towards a more consistent usage of legal terminology, which also appears to be better distributed than the terms in the non-FLAX corpus. In contrast, and on average, the lexicon in the FLAX-based corpus tends to be slightly more basic. Concerning the use of MD markers, the experimental group appears to use a greater number of evidentials, endophoric and interactional markers, though only marginally.
Procedia: Social and Behavioural Sciences, 2015
The present research explores the impact that cognates, that is, words which share formal and often semantic features in the L1 and the L2, may have on the understanding and acquisition of legal English terminology. To that end, a DDL experiment was carried out using two corpora: the BLaRC, an 8.85 million-word collection of judicial decisions issued by British courts, and the LACELL, a general English corpus of 21 million words. 56 first-year Spanish Law students were asked to translate 12 legal terms, 10 of which were English/Spanish cognates. The results showed that, as expected, the higher the students' proficiency level (they were administered a level test prior to the experiment), the higher their rate of success in providing correct answers. This was so both for the general and specialised fields, proving that partial semantic equivalence between cognates did pose certain difficulties in their understanding even for the higher level groups.
ASp la revue du GERAS, 2014
This article suggests a new approach for the semantic description of “sub-technical terms” applying Cantos and Sánchez’s (2001) Lexical Constellation model, which allows for a visual representation of the path they follow towards specialisation. Various authors highlight the use of sub-technical terms as an outstanding feature of the legal lexicon. This study attests that almost half of the terms in a legal corpus of 8.85 million words overlap with general English vocabulary lists such as West’s (1953) General Service List (GSL) or the 3,000 most common words of English in the British National Corpus (BNC). Therefore, given the relevance of these terms within this ESP variety and the lack of proposals for their analysis, an attempt is made to account for such a complex phenomenon by adopting a semantic and a corpus-based perspective. This is achieved by employing a qualitative and quantitative methodology to describe the process of specialisation of the sub-technical legal term “charge”, used to exemplify our model of analysis.
Terminology, 2016
One of the most remarkable features of the legal English lexicon is the use of sub-technical vocabulary, that is, words frequently shared by the general and specialised fields which either retain a legal meaning in general English or acquire a specialised one in the legal context. As testing has shown, almost 50% of the terms extracted from BLaRC, an 8.85m word legal corpus, were found amongst the most frequent 2,000 word families of West’s (1953) GSL, Coxhead’s (2000) AWL or the BNC (2007), hence the relevance of this type of vocabulary in this English variety. Owing to their peculiar statistical behaviour in both contexts, it is particularly problematic to identify them and measure their termhood based on such parameters as their frequency or distribution in the general and specialised environments. This research proposes a novel termhood measuring method intended to objectively quantify this lexical phenomenon through the application of Williams’ (2001) lexical network model, which incorporates contextual information to compute the level of specialisation of sub-technical terms.
FOR BIBLIOGRAPHIC REFERENCE: Marín Pérez, M.J. (in the press, 2016). “Measuring the Degree of Specialisation of Sub-Technical Legal Terms through Corpus Comparison: a Domain-Independent Method”. Terminology, 22(1). John Benjamins.
corpus design. The available corpora do not satisfy our needs, as we intend to establish the core vocabulary of this ESP branch, so we have opted for creating our own, BLaRC: British Law Report Corpus. The selected genre is law reports (judicial decisions), as they stand at the very basis of common law. The purpose of this paper is thus to show and justify the decisions we have made in its process of compilation and design. KEYWORDS: legal corpus, ESP, common law, representativeness, corpus size, word target.
Journal of English Studies, 2012
The aim of this paper is to describe and justify the structure and design criteria used to create a legal English corpus of judicial decisions. The authors, lecturers of this ESP variety, decided to engage in specific corpus design due to the small variety of teaching materials and corpora available. Judicial decisions are essential wheels in the legal machinery of common law systems and, precisely because of that fact, they are fundamental as a legal genre. This is why we intend to compile a 6m word legal corpus of UK judicial decisions in order to establish the core vocabulary of the genre and use it for further linguistic analysis and the elaboration of didactic materials.
Corpora, 9 (1), 2014
Specialised texts are characterised, amongst other features, by the presence of terminology which conveys domain-specific concepts essential for the specialist interested in analysing such texts. Automatic term recognition (ATR) methods are employed to identify those terms automatically, especially given the large size of corpora nowadays. However, they tend to concentrate on the identification of multi-word terms (MWTs), neglecting single-word terms (SWTs) to a certain extent. This might be related to the greater number of the former found in fields such as biomedicine. As far as legal English is concerned, however, testing has shown that SWTs represent 65.22% of the items in the specialised glossary employed for the evaluation of the ATR methods examined herein. This article presents the evaluation of five single-word term recognition methods, namely Chung’s (2003), Drouin’s (2003), Kit and Liu’s (2008), Keywords (2008) and TF-IDF (term frequency-inverse document frequency), which were tested on the United Kingdom Supreme Court Corpus (UKSCC), a 2.6 million-word legal corpus designed and compiled for that purpose. The results indicate that Drouin’s TermoStat software performs best, achieving 73.45% precision on the top 2,000 candidate terms.
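The precision figures reported in this abstract rest on comparing a ranked list of candidate terms with a reference glossary. A minimal Python sketch of that comparison follows; the function names and the toy data are hypothetical and chosen for illustration, and the exact matching rules used in the article (lemmatisation, case handling) are not stated in the abstract, so simple lowercase matching is assumed here.

```python
def precision_at_n(candidates, gold_terms, n):
    """Share of the top-n ranked candidate terms found in the gold standard."""
    top = candidates[:n]
    if not top:
        return 0.0
    # Assumption: a candidate counts as a true term on an exact lowercase match.
    hits = sum(1 for term in top if term.lower() in gold_terms)
    return hits / len(top)


def cumulative_precision(candidates, gold_terms, step):
    """Precision measured at growing cut-offs (step, 2*step, ...)."""
    return [
        (n, precision_at_n(candidates, gold_terms, n))
        for n in range(step, len(candidates) + 1, step)
    ]


# Toy data, invented for illustration only.
ranked_candidates = ["appellant", "court", "tort", "table", "claimant", "house"]
glossary = {"appellant", "court", "tort", "claimant"}

print(precision_at_n(ranked_candidates, glossary, 4))  # 0.75
print(cumulative_precision(ranked_candidates, glossary, 2))
```

Measuring precision at successive cut-offs shows how quickly a method's ranking degrades as more candidates are accepted, which is why a single cut-off such as the top 2,000 is usually reported alongside the method name.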
This doctoral thesis aims at identifying and analysing the specialised vocabulary in BLaRC (the British Law Report Corpus), an ad hoc legal corpus of British Law Reports of 8.85 million words, which is described and justified in detail in chapter 2. In order to do so, ten different ATR methods are implemented on a 2.6 million word corpus, UKSCC (the United Kingdom Supreme Court Corpus), extracted from the main one to facilitate their implementation and validation process. Chapter 3 is devoted to the evaluation of such ATR methods as regards the precision levels achieved in term identification by each of them. Average precision is calculated through the automatic comparison of the lists of candidate terms (CTs) produced by each method with a gold standard, that is, an electronic legal glossary of 10,088 entries, also compiled for this research. Cumulative precision is measured following the same procedure so as to observe and compare the way it evolves as the number of identified terms increases. As a result, Terminus 2.0 (Nazar & Cabré, 2012) and TermoStat (Drouin, 2003), the best performing techniques, are selected with the aim of implementing them on BLaRC. After doing so, the validated lists of both single and multi-word legal terms extracted from it are offered in section 3.2.4. Chapter 3 ends with the proposal of some activities aimed at illustrating the varied applications and uses of specialised corpora and vocabulary lists in ESP teaching. Owing to the relevance of sub-technical vocabulary as a major component of the legal lexicon, a quantitative method is proposed in chapter 4 to measure its degree of specialisation based on the context of usage of this type of words. Williams’ (2001) lexical network model is applied to a set of general, highly specialised and sub-technical words in order to observe and compare the number and frequency of their collocates and co-collocates both in BLaRC, the specialised corpus, and LACELL, the general one. The observation of the data obtained leads to the formulation of the algorithm Sub-Tech, which allows the words analysed to be placed along a continuum of specialisation depending on the data obtained after the implementation of Williams’ model. Finally, with the purpose of describing sub-technical vocabulary from a semantic perspective, Cantos and Sánchez’s (2001) lexical constellation model is applied to analyse the semantic features of the shared terms trial, charge and battery, resulting in a much clearer picture of the process undergone by sub-technical words from general usage to specialisation. The application of this model in combination with the quantitative method described above may be regarded as a first step towards a better understanding of a lexical phenomenon which, to the best of our knowledge, has not been explored in depth to date.
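The counting of collocates and co-collocates mentioned in this abstract can be illustrated with a short Python sketch. It is not the lexical network model itself or the Sub-Tech algorithm: raw co-occurrence counts within a fixed window stand in for whatever association measure and thresholds the thesis applies, and the function names, the window size and the toy sentence are assumptions made for the example.

```python
from collections import Counter


def collocates(tokens, node, span=5):
    """Count the words occurring within `span` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = max(0, i - span)
            # Window on both sides of the node word, excluding the node itself.
            window = tokens[left:i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def co_collocates(tokens, node, span=5, top=3):
    """For the `top` most frequent collocates of `node`, count their own collocates."""
    first_level = collocates(tokens, node, span)
    return {word: collocates(tokens, word, span)
            for word, _ in first_level.most_common(top)}


# Toy sentence, invented for illustration only.
text = "the charge was dropped and the charge was heard".split()
print(collocates(text, "charge", span=1))
print(co_collocates(text, "charge", span=1, top=1))
```

Running the same functions on a specialised and a general corpus and comparing the size of the resulting networks is one way to picture the contrast between BLaRC and LACELL that the thesis describes.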
Legal terminology presents certain traits which may interfere with its automatic detection, such as its significant presence in everyday language. Thus, this research explores the levels of precision achieved by five single and multi-word term recognition methods on a pilot legal corpus of 2.6 million words. A comparison is carried out with the results presented by Marín (2014a). Once the most effective single and multi-word term recognition method is singled out, it is applied to the reference corpus, BLaRC, with the aim of producing a reliable list of legal terms which might be exploited in areas such as English for Specific Purposes (ESP) instruction, Applied Linguistics or Terminology.
Miscelánea: A Journal of English and American Studies, 2014
The use of language corpora in second language (SL) instruction dates back to the late 1980s, when Johns (1986) coined the term data-driven learning (DDL). DDL is based on the use of concordance lines extracted from corpora, which learners examine with the aim of eliciting the rules of the language governing them. Scholars highlight the usefulness of specialised corpora in ESP (English for Specific Purposes) teaching due to their authentic and current character. However, in the area of legal English, the number and availability of these corpora is limited. Likewise, experiments with DDL methods in this area are almost non-existent (Boulton, 2011). Owing to this methodological void, this article presents a proposal to exploit the legal term inventories extracted automatically from BLaRC, an 8.85 million-word legal English corpus. The proposal consists of the design of four different corpus-based activities which focus on the morphological, lexical, semantic and syntactic levels of the language, as well as a pedagogical research method for their future implementation.
English for Specific Purposes World, 2014
Getting to know the terms in a specialised text certainly contributes to the understanding of the text itself. Their identification is essential for that reason and, owing to the large size of specialised corpora nowadays, the use of automatic term recognition (ATR) methods is fundamental when trying to extract the most characteristic terms in a given domain. However, these methods are not 100% effective and they must be validated before resorting to them, so that the precision levels achieved are high enough for specialists to draw reliable conclusions on this type of vocabulary. This article presents the assessment of four different ATR methods on two specialised corpora of legal and telecommunication English. The methods selected, TF-IDF (term frequency-inverse document frequency), C-value (Frantzi and Ananiadou, 1999), TermoStat (Drouin, 2003) and Terminus 2.0 (Nazar and Cabré, 2012), were evaluated in terms of precision. The aim of this evaluation is to compare the results obtained in all cases and to conclude whether there exists a certain degree of domain-dependence as regards each of these methods.
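Of the four methods named in this abstract, TF-IDF is the simplest to illustrate. The Python sketch below uses one common textbook formulation (relative term frequency multiplied by the logarithm of the inverse document frequency); the weighting variant actually used in the article is not given in the abstract, so this choice, the function name and the toy documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return {term: highest tf-idf score across documents} for tokenised documents."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = {}
    for tokens in documents:
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


# Toy documents, invented for illustration only.
docs = [
    "the court dismissed the appeal".split(),
    "the tribunal allowed the appeal".split(),
    "the signal reached the antenna".split(),
]
ranking = sorted(tf_idf(docs).items(), key=lambda kv: kv[1], reverse=True)
print(ranking[:5])
```

Words that occur in every document, such as "the" in the toy data, receive a score of zero, which is the property that lets the measure push function words down a candidate-term ranking.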
Journal of Second Language Teaching and Research, 2014
The introduction of legal English as a compulsory subject in the curriculum of the Law Degrees taught at Spanish universities, due to the implementation of the Bologna Reform, has led to the design of syllabuses intended to enable students to become proficient users of this English variety for both academic and professional purposes. This paper presents a corpus-based proposal for the grading of materials for the teaching of legal vocabulary which can be extrapolated to other varieties of English for Specific and Academic Purposes (ESAP). In order to exemplify it, a sample list of 33 crime nouns (obtained from the legal English textbooks consulted) has been examined in terms of their frequency, keyness and text range values in an ad hoc legal corpus of 2.6 million words, UKSCC. After doing so, Chung’s (2003) automatic term recognition (ATR) method has been applied so as to establish their level of specialisation. Our proposal relies on the assumption that the information obtained after taking these different parameters into consideration might help the ESAP instructor to rank the vocabulary inventories obtained from specialised corpora, so that the materials derived from them can be graded according to the students’ needs.
Procedia - Social and Behavioral Sciences, Oct 2013
Automatic term recognition (ATR) methods help to identify the most representative terms in a corpus automatically, saving time and making it possible to manage large amounts of data that could not be dealt with manually. This paper presents the evaluation of two ATR methods implemented on a 2.6 million-word legal corpus designed and compiled ad hoc: Keywords (Scott, 2008) and Chung's method (2003). Both techniques have been assessed as regards precision and recall. The results clearly show that Keywords is, by far, the more efficient of the two, recognising 62% true terms out of the 2,000 items evaluated in this study.
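Keyword extraction of the kind evaluated in this abstract compares how often a word occurs in a study corpus with how often it occurs in a reference corpus. The Python sketch below computes a log-likelihood keyness score, a statistic commonly offered by keyword tools; whether this statistic and these settings match those used in the paper is an assumption, and the function name and the frequencies are invented for illustration.

```python
import math


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness of a word given its frequency in two corpora."""
    total = freq_study + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)

    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Invented frequencies in two corpora of 1,000 tokens each.
print(log_likelihood(100, 10, 1000, 1000))  # strongly over-represented word
print(log_likelihood(10, 10, 1000, 1000))   # same relative frequency
```

The score only measures how far the two frequencies diverge; comparing the relative frequencies is still needed to tell whether a word is over- or under-represented in the study corpus.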
The implementation of the Bologna Reform at Spanish universities has brought about major changes, amongst which the use of foreign languages has become a key issue within the concept of a European area of higher education. Moreover, internationalisation is one of the fundamental goals of the “Campus Mare Nostrum” formed by the Universities of Murcia and Cartagena. Hence, a language of communication is required to develop this concept in the fields of research and teaching, and English appears to be the major vehicular language or lingua franca. This paper explores the possibilities offered by specialised language corpora for the planning and elaboration of didactic materials to teach English as a specialised language within the fields of Telecommunications Engineering and Law. Two specific corpora will be studied: TEC (Telecommunication Engineering English Corpus) and BLaRC (British Law Report Corpus), designed, compiled and analysed for research purposes. Special attention will be paid to such questions as frequency rates or keyness, amongst others, as determining factors to identify the core vocabulary of both specialised languages, a point of departure for the creation of new didactic materials.
Servicio de Publicaciones de la Universidad de Murcia, Feb 21, 2014
This paper presents the proposal of two activities aimed at fostering the acquisition of legal terminology using a DDL (data-driven learning) methodology. The number of pedagogical experiments carried out in this ESP branch is very small (Boulton, 2010). This gap must therefore be filled by designing and implementing activities which use legal corpora as their source of information. The activities presented in section 4 of this study are based on BLaRC, an 8.85 million-word corpus of British judicial decisions designed and compiled by Marín (2013). A review of the literature is also offered in sections 1 and 2, which show the pros and cons of the use of this type of teaching methods, according to the authors cited, and the methodological void existing in the area. This study ends with the proposal of an experimental research design for the implementation of the activities, which are conceived to act as support to already existing materials such as textbooks.
Online corpora by Maria Jose Marin Perez
We are particularly concerned with closing the gap in language teacher training where competencies in materials development are still dominated by print-based proprietary course book publications, which in many cases do not reflect salient findings from the research into domain-specific language. We are also concerned with the growing gap in language teaching practitioner competencies for understanding important issues of copyright and licensing that are changing rapidly in the context of digital and web literacy developments. These key issues are being largely ignored in the informal language teaching practitioner discussions, both by experienced and new language tutors, and in the formal research into teaching and materials development practices.
1st author Camino Rea Rizzo
2nd author María José Marín
Internationalization is one of the main aims of today’s Spanish universities. According to the Strategy for the Internationalization of Spanish Universities 2015-2020 (MECD), internationalization is defined as the process of integrating an international, intercultural and global dimension into the aims, functions and development of Higher Education. The teaching offer of universities is one of the foundations of internationalization, which is considered in three stages: i) development of mobility, ii) internationalization of degree programmes and iii) institutional internationalization.
In 2010 the Technical University of Cartagena (UPCT), through the International Excellence Campus Mare Nostrum 37/38, took on the challenge of promoting teaching in English and taking measures to provide teachers with the necessary resources to teach content subjects in English as part of the official offer in degree and master programmes. Since then, different lines of action have been developed for this purpose: certification of English proficiency, training and incentivization.
Consistent with this policy, the UPCT’s internationalization plan (2017/2020) seeks to increase the number of bilingual programmes (English and Spanish) by promoting the offer of at least one bilingual itinerary per University Centre, in which a minimum of 50% of the credits are available in English. The groundwork for bilingual teaching or itineraries is found, at an early stage, within the Campus Mare Nostrum subprogramme, which defines the objective of planning subjects, designing teaching materials and performing the preparatory actions required to implement teaching in English in the following academic year. Precisely in this preparatory stage, the teachers involved in the project identified an important gap: they lacked a language tool to resort to when they needed to use the so-called classroom language and the vocabulary related to a university context, such as horas de tutoría, convocatoria de examen, aula virtual, etc. Hence, the language tool was envisioned as a working tool and a language resource for teaching in English which, in turn, facilitates communication and understanding between teachers and students. The tool covers three broad categories: i) classroom language, composed of four subcategories (informal talks among colleagues and students; conversations between teacher and student in the classroom; conversations during a personal tutorial; and conversations about administrative topics); ii) the basic mathematical language shared by technical subjects, divided into solving equations; notation and connectors; coordinate axes and graphical representation; and operations and functions; and iii) the specialized technical language of the content subjects available in English (20 specializations).
This work describes the development of the language tool from its beginning in 2015/16 as a compilation of language samples in a linguistic corpus, through the parameters governing its systematization into a database, to its final outcome, available on the webpage http://inglesuniversitario.upct.es
Traditionally, there has been a tendency to describe legal language on the basis of the author’s knowledge and intuition or of a small amount of language samples (Mellinkoff 1963; Alcaraz 1994; Tiersma 1999; Borja 2000). The features described in such works often display a top-down characterisation of legalese constructed upon a deductive approach. Nonetheless, there is a current tendency towards portraying legalese from a totally different angle, that of corpus linguistics (CL), which allows for a bottom-up depiction of language using large amounts of text as evidence (Author 2012; Biel & Engberg 2013; Goźdź-Roszkowski & Pontrandolfo 2014; Breeze 2015).
The possibilities offered by CL for the study of legal language are varied. This research will concentrate on two of them: on the one hand, automatic term recognition (ATR) and, on the other, the use of collocational networks, which might unveil data that may otherwise remain unnoticed. As regards the former, the evaluation of five different ATR methods will be carried out within the legal field to find out which one is most efficient at identifying the keywords in an 8.5-million-word legal corpus of judicial decisions, as illustrated by Author (2014). As for the latter, a case study of the sub-technical legal term party will be presented. The legal corpus used for this study was processed using the software Lancsbox (Brezina et al. 2015) which, amongst other uses, is capable of obtaining the collocational networks of terms automatically and displaying them visually, graphically representing the strength of the links existing between a word and its collocates and co-collocates and thus expanding the context of study. Such scrutiny revealed certain patterns such as the relevance of marriages of convenience in our collection of judicial decisions.
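The idea of expanding from a node word to its collocates and then to their co-collocates can be sketched as follows. This is only an illustration under stated assumptions, not the procedure Lancsbox applies: the association measure (a simple mutual information score over a fixed window), the thresholds, and the function names `collocates` and `network` are all hypothetical choices.

```python
import math
from collections import Counter


def collocates(tokens, node, span=2, min_freq=2):
    """Return {collocate: MI score} for words within `span` tokens of `node`.

    MI is computed as log2(observed / expected), where the expected
    co-occurrence frequency assumes the words are independently distributed.
    """
    freq = Counter(tokens)
    n = len(tokens)
    co_occurrences = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                co_occurrences[tokens[j]] += 1

    scores = {}
    for word, observed in co_occurrences.items():
        if observed < min_freq:
            continue  # discard rare pairings, which inflate MI
        expected = freq[node] * freq[word] * 2 * span / n
        scores[word] = math.log2(observed / expected)
    return scores


def network(tokens, node, depth=2, **kwargs):
    """Collect weighted edges from the node to collocates and co-collocates."""
    edges = {}
    frontier, seen = [node], {node}
    for _ in range(depth):
        next_frontier = []
        for word in frontier:
            for coll, score in collocates(tokens, word, **kwargs).items():
                edges[(word, coll)] = score
                if coll not in seen:
                    seen.add(coll)
                    next_frontier.append(coll)
        frontier = next_frontier
    return edges


# Hypothetical toy text, not drawn from the legal corpus.
sample = ("the party to the marriage appealed and "
          "the other party to the contract appealed").split()
first_order = collocates(sample, "party", span=1, min_freq=2)
edges = network(sample, "party", depth=2, span=1, min_freq=2)
```

On a real corpus the edge weights would determine how strongly each link is drawn, and following the graph outwards from party is how a pattern such as marriages of convenience could come into view.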
However, after processing two Spanish and British legal corpora of roughly 3 million words each, it appears that the occurrence of vocabulary items capable of expressing attitudinal stance in both text collections is more than mere chance, particularly in English. Yet, could such presence be somehow quantified? Can differing legal systems condition the way in which legal actors express appraisal?
This research presents a corpus-based contrastive analysis of two corpora of judicial decisions about immigration. After identifying the most frequent 2,500 words from both text collections, the Appraisal Theory model (White, 1999; Martin 2003; Eggins and Slade, 1997; Rothery & Stenglin, 2000; and Kaltenbacher, 2006) was implemented for their categorisation. The results indicated that the British corpus displayed a higher proportion of evaluative vocabulary items (33% more) than the Spanish dataset; the proportion was even greater for those terms belonging to the category affect (10% against 19% in the British corpus). These and other findings could be explained in relation to our initial hypothesis, whereby civil law legal systems, where the law is codified, might inhibit the expression of emotion as opposed to common law systems, where the law is said to be judge-made, thus leaving greater room for subjectivity.
The experience consisted of a study of the benefits that the use of this type of self-learning method may bring. To that end, the group of informants was divided into two subgroups, an experimental one and a control one, so that the methodological treatment could be applied to the experimental group. While the control group carried out the assigned task by consulting web pages, books, etc. (as they had done in previous years as part of the end-of-term project), the experimental group used only the resources available on the platform, following to the letter the video tutorials prepared by Professor Fitzgerald and completing the activities designed for the platform.
Despite the small number of informants, the results were satisfactory, showing that a considerable percentage of the subjects in the experimental group were able to use the specialised terminology more often and more accurately than the control group, and had also structured their written work more clearly. On the other hand, the control group was, in general, more creative, considerably broadening its area of research and showing greater knowledge of highly specialised language.
The texts of the presentations were gathered and analysed on a lexical and pragmatic level for the evaluation of term usage and distribution, lexical diversity, lexical fundamentality and the use of discourse markers. Although a larger sample of texts would be advisable to reach sounder conclusions in this respect, the results display a tendency on the part of the experimental group towards a more consistent usage of the terminology, which also appears to be better distributed than the terms in the non-FLAX corpus. In contrast and on average, the lexicon in the FLAX-based corpus tends to be slightly more basic. Concerning the use of meta-discourse markers, though marginally, the experimental group appears to utilise a greater number of evidentials, endophoric and interactional markers.
years thanks to the pioneering work by Johns [1], who coined the term data-driven learning (DDL,
henceforth); Sinclair [2], who developed the concept further; or Boulton [3], amongst others. DDL
teaching methods promote language study based on the observation of concordances, that is,
examples of the authentic use of keywords in context (KWC), which are retrieved from a linguistic
corpus by running software programs specifically designed to that end, such as Wordsmith [4].
According to the literature on the subject, there exist arguments for and against the use of corpora in
language teaching and there has been a fairly small number of pedagogical experiments in English for
specific purposes (ESP) [3], particularly in the field of telecommunication English. This work suggests
two activities for teaching terminology within this area applying DDL-based methodology, together with
a pedagogical experimental model for its future implementation. Such activities are not intended to
substitute any other teaching material like course books; they are rather envisaged as a supplement to
language exposure and/or reinforcement of terminology. The language samples of this specialized
variety are stored in the Telecommunication English Corpus (TEC) [5], designed and compiled ad hoc
for language research owing to the scarcity of technical corpora available.
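As an illustration of the concordance lines on which DDL activities rely, the sketch below retrieves a keyword together with a fixed number of words on either side. It is a simplified stand-in for a dedicated concordancer such as Wordsmith, not a description of it; the function name `concordance`, the whitespace tokenisation and the sample sentence are assumptions.

```python
def concordance(tokens, keyword, width=4):
    """Return (left context, keyword, right context) for each occurrence.

    Matching is case-insensitive; `width` is the number of tokens kept
    on each side of the keyword.
    """
    lines = []
    target = keyword.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sentence, not taken from the TEC.
text = ("The antenna transmits the signal and "
        "the receiver decodes the signal").split()
hits = concordance(text, "signal", width=2)

# Print the lines with the keyword aligned in a central column.
for left, kw, right in hits:
    print(f"{left:>20}  {kw}  {right}")
```

Aligning the keyword in a central column is what lets learners scan the recurrent patterns to its left and right.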
Consequently, this research reports on the bilingual degree in Telecommunication at the UPCT within the framework of CLIL and why the specialized corpus brings a considerable advantage to that situation.
Abstract: The features of specialised languages have been extensively described by scholars in the literature. Amongst them, Enrique Alcaraz's work stands out as an exhaustive and comprehensive description of EPAP at all linguistic levels: lexical, syntactic, semantic and pragmatic. This research aims to provide a bottom-up assessment of his description on a lexical level through the implementation of corpus-based techniques on two specialised corpora of legal and telecommunications English. The results support Alcaraz's portrayal as regards term usage, the relevance of sub-technical vocabulary, the peculiarities of Latin single and multi-word terms in legal English and the significant presence and usage of abbreviations in telecommunications English. Keywords: ESP; legal English; telecommunications English; corpus linguistics.