
Anabela Barreiro

INESC-ID, IST, Universidade de Lisboa, Human Language Technologies, Post-Doc

INESC-ID, Spoken Language Systems Lab (L2F), Post-Doc


Post-doctoral scientist at INESC-ID Lisboa with a PhD in Linguistics. Works in machine translation and paraphrasing applied to authoring aids, text production and revision, and cross-language tasks. Participated in the R&D projects Eagles, Linguateca, PT-Star, Speedial, and MT4M. Principal investigator of the eSPERTo project, funded by the Portuguese NSF. Developed commercial MT systems at Logos Corporation (USA). Experienced in multilingual linguistic resources and NLP tools. Management Committee (MC) member of the COST Action enetCollect (Combining Language Learning with Crowdsourcing Techniques).
Supervisors: Belinda Maia, Adam Meyers, Luísa Coheur, and Isabel Trancoso
Address: Rua Alves Redol, 9


Books by Anabela Barreiro

One Book, Two Language Varieties

by Isabel Garcez, Anabela Barreiro, and Tanara Zingano Kuhn

Springer, 2020

This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression, in the European (EP) and Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records in a database all pairs of paraphrastic units resulting from the alignment task. The corpus consists of the children's book Os Livros Que Devoraram o Meu Pai (The Books that Devoured My Father) by the Portuguese author Afonso Cruz and its Brazilian adaptation. The main goal of this work is to gather equivalent phrasal expressions and different syntactic constructions that convey the same meaning in EP and BP, and thereby contribute to optimising the editorial processes required to adapt texts, although the results are applicable to any type of editorial process. This study provides a scientific basis for future computational work on editing, proofreading and converting text to and from any variety of Portuguese, namely for use in a paraphrasing system with a variety adaptation functionality, even in the case of literary texts. We address cases that are "challenging" from a literary point of view, looking for alternatives that do not tamper with the rich imagery of the original version.

Paraphrastic Variance between European and Brazilian Portuguese

This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds toda a gente vs todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"), which are extremely relevant to high-quality paraphrasing. The variants were manually aligned in the e-PACT corpus using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit, and the resulting set of units constitutes a subset of the Gold-CLUE-Paraphrases. Building a larger dataset of paraphrastic contrasts among the distinct varieties of Portuguese is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question answering, summarization, and machine translation, among others. These paraphrastic units are the first resource of their kind for Portuguese to be made available to the scientific community for research purposes.
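As a rough illustration of what converting text from one variety into another can involve, the sketch below applies one lexical contrast cited above (toda a gente vs todo o mundo) and one constructional contrast, rewriting [estar a + V-Inf] as a gerund. It is a simplified, hypothetical Python sketch, not the authors' methodology or the CLUE resources: the function name `ep_to_bp`, the substitution table, and the list of estar forms are assumptions. It also keeps the auxiliary estar rather than switching to ficar, as in the paper's example estive a observar vs fiquei observando.

```python
import re

# Hypothetical EP -> BP lexical contrasts (only the one cited in the abstract).
EP_TO_BP = {"toda a gente": "todo o mundo"}

# Assumed, non-exhaustive forms of "estar" that can head [estar a + V-Inf].
ESTAR = r"(estou|estás|está|estamos|estão|estive|esteve|estava|estavam)"
# estar + "a" + infinitive, split into stem and regular ending (-ar, -er, -ir).
PROGRESSIVE = re.compile(r"\b" + ESTAR + r"\s+a\s+(\w+)(ar|er|ir)\b")
GERUND_ENDING = {"ar": "ando", "er": "endo", "ir": "indo"}


def ep_to_bp(text: str) -> str:
    """Apply the lexical table, then rewrite 'estar a + infinitive' as 'estar + gerund'."""
    for ep, bp in EP_TO_BP.items():
        text = text.replace(ep, bp)
    return PROGRESSIVE.sub(
        lambda m: f"{m.group(1)} {m.group(2)}{GERUND_ENDING[m.group(3)]}", text
    )


print(ep_to_bp("toda a gente está a dormir"))  # todo o mundo está dormindo
```

Irregular infinitives such as pôr are not covered; a resource-driven system would take these forms from a dictionary rather than from suffix rules.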

Por Outras Palavras POP@PROPOR2018 -- Primeiro Seminário sobre Ferramentas e Recursos Linguísticos para Parafraseamento em Português (In Other Words: First Workshop on Tools and Linguistic Resources for Paraphrasing in Portuguese)

Por Outras Palavras

This volume contains the papers presented at POP@PROPOR2018: POP - Por Outras Palavras (In Other Words), the 1st workshop on Tools and Linguistic Resources for Paraphrasing in Portuguese, held on 24 September 2018 in Canela (RS), Brazil. The workshop aimed to bring together linguists and researchers working in Natural Language Processing who were interested in discussing new ideas on the development and use of paraphrase-oriented linguistic resources for Portuguese with real-world applications. Paraphrases are extremely important in human communication, both in language production and in comprehension, and they play an increasingly important role in research activities and projects. Several linguistic experiments have shown the feasibility of using paraphrastic resources in a wide variety of software applications, because they make it possible to recognize and generate equivalent ways of expressing the same content, allowing systems to offer users suggestions for saying and writing the same thing or idea in other words, and to increase fluency, creativity and stylistic diversity. At the current stage of development, paraphrasing systems require linguistic knowledge and context-sensitive "intelligence" to "understand" and recognize a wide variety of expressions. For Portuguese, the usefulness of paraphrastic resources has already been explored in application scenarios such as a dialogue system, to increase the linguistic knowledge of an intelligent virtual agent, in summarization and simplification tools, and in tools aimed at achieving higher-quality machine translation.
However, more research is needed to ensure the long-term feasibility and success of a paraphrasing system in the areas of text production and revision, namely in developing and improving online authoring platforms, by building interactive programs that help students of Portuguese as a foreign language produce different but equivalent sentences, or that assist native students in producing and revising their texts. In proposing the POP workshop, we wanted to (i) bring together researchers interested in the field of paraphrasing, with a special focus on Portuguese, to learn and share information on the topic; (ii) gather a set of good-quality papers that discuss the latest trends in the area and help improve the state of the art of paraphrasing in Portuguese; (iii) exchange ideas and disseminate best practices to help foster research in this area; (iv) foster a convergence of research efforts towards a consensual definition of scientific methods, and encourage international cooperation in order to achieve common strategies that respond to current technological needs; (v) discuss new methodologies, such as neural networks, and learn how to combine them with linguistic efforts; (vi) discuss future challenges and exchange information on scientific and technological aspects; (vii) encourage and strengthen the creation of parallel paraphrase corpora for Portuguese, as datasets for collecting paraphrastic alignment resources to train and test paraphrasing systems; and (viii) identify funding sources to further boost research, support innovation and develop this essential enabling technology. The Program Committee was composed of 22 ...

Contractions: To Align or Not to Align, That Is the Question

This paper presents a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed, and a decision was made as to whether each contraction should be maintained or decomposed in the alignment. Decomposition was required when the two concatenated words, i.e., the preposition and the determiner or pronoun, belong to two separate translation alignment pairs (e.g., [no seio de] [a União Europeia] | [within] [the European Union]). Most contractions required decomposition when positioned at the end of a multiword unit. By contrast, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in its frozen part (e.g., [no que diz respeito a] | [with regard to] or [além disso] | [in addition]).
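The decomposition rule described above can be sketched over pre-segmented alignment units. In this hypothetical Python sketch (the function name `split_final_contractions` and the partial contraction table are assumptions, not the CLUE4Translation tooling), a contraction that closes a unit is split into preposition and article, and the article moves to the start of the following unit; contractions elsewhere in a unit are left intact. A real implementation would also need to disambiguate forms such as nos, which can be a pronoun rather than em + os.

```python
# Assumed, partial table of Portuguese preposition + article contractions.
CONTRACTIONS = {
    "no": ("em", "o"), "na": ("em", "a"), "nos": ("em", "os"), "nas": ("em", "as"),
    "do": ("de", "o"), "da": ("de", "a"), "dos": ("de", "os"), "das": ("de", "as"),
    "ao": ("a", "o"), "aos": ("a", "os"), "à": ("a", "a"), "às": ("a", "as"),
    "pelo": ("por", "o"), "pela": ("por", "a"),
}


def split_final_contractions(units):
    """Decompose a contraction that ends an alignment unit.

    The preposition stays in the current unit and the article is moved to the
    start of the next unit. Contractions at the beginning or in the middle of a
    unit (its frozen part) are kept, as is any contraction in the last unit.
    """
    result = [list(unit) for unit in units]  # copy, so the input is not modified
    for i in range(len(result) - 1):
        if result[i] and result[i][-1] in CONTRACTIONS:
            preposition, article = CONTRACTIONS[result[i][-1]]
            result[i][-1] = preposition
            result[i + 1].insert(0, article)
    return result


print(split_final_contractions([["no", "seio", "da"], ["União", "Europeia"]]))
# [['no', 'seio', 'de'], ['a', 'União', 'Europeia']]
```

Note that the initial no in no seio de is untouched, matching the observation that contractions in the frozen part of a multiword unit are maintained.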

Parafraseamento Automático de Registo Informal em Registo Formal na Língua Portuguesa -- Automated Paraphrasing of Portuguese Informal into Formal Language

by Anabela Barreiro and Isabel Garcez

This paper presents the automated process of paraphrasing Portuguese constructions typical of informal or spoken language and converting them into formal written language. We illustrate this process with examples extracted from the e-PACT corpus that involve the placement of clitic pronouns in verbal compound contexts. Our task consists in paraphrasing and normalizing, among others, constructions such as vou-lhe/posso-lhe fazer uma surpresa into vou/posso fazer-lhe uma surpresa 'lit: I will/can to him/her make a surprise / I will/can make to him/her a surprise; I will/can make him/her a surprise', where the clitic pronoun lhe migrates from an enclitic position immediately after the first verb of the verbal compound to an enclitic position after the main verb, which is the verb responsible for selecting that pronominal argument. The first verb is either an auxiliary verb or a volitive verb, e.g., querer 'want'. This is a standard revision procedure in European Portuguese. Cases like this represent linguistic phenomena in which language students and language users in general get confused or "stumble". The paper focuses on the standard language in which the observed phenomena occur, describes examples of interest found in the corpus, and presents an automatic solution, based on the application of very generic transformational grammars, for normalizing the informal syntactic inadequacies found in the researched structures into standard structures typical of formal or professional writing.
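The clitic migration described in the abstract can be approximated with a single regular-expression rewrite. The paper's solution uses generic transformational grammars; the Python sketch below is a hypothetical stand-in, with an assumed, non-exhaustive list of first verbs and dative or reflexive clitics, and a crude heuristic (a word ending in -r) for identifying the infinitive main verb. Clitics such as o/a, which change the infinitive form (fazê-lo), are deliberately excluded.

```python
import re

# Assumed, non-exhaustive lists for illustration.
FIRST_VERBS = ["vou", "vais", "vai", "vamos", "vão",
               "posso", "podes", "pode", "podemos", "podem",
               "quero", "queres", "quer", "queremos", "querem"]
# Clitics that attach to an infinitive without changing it (unlike o/a -> lo/la).
CLITICS = ["lhe", "lhes", "me", "te", "nos", "vos"]

# first verb + hyphen + clitic + whitespace + word ending in -r (taken as the infinitive)
CLIMBED_CLITIC = re.compile(
    r"\b(" + "|".join(FIRST_VERBS) + r")-(" + "|".join(CLITICS) + r")\s+(\w+r)\b",
    re.IGNORECASE,
)


def normalize_clitic(sentence: str) -> str:
    """Move the clitic from the first verb to an enclitic position on the main verb."""
    return CLIMBED_CLITIC.sub(r"\1 \3-\2", sentence)


print(normalize_clitic("Vou-lhe fazer uma surpresa."))  # Vou fazer-lhe uma surpresa.
```

Because the rule only fires on a hyphenated clitic attached to a listed first verb, already normalized sentences pass through unchanged.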

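Alinhamentos Parafrásticos entre o Português Europeu e o Português do Brasil: Construções de Predicados Verbais com o Pronome Clítico 'lhe' -- Paraphrastic Alignments between European and Brazilian Portuguese: Verbal Predicate Constructions with the Clitic Pronoun 'lhe'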

This paper presents the alignment of constructions containing verbal predicates with the clitic lhe in the European Portuguese (EP) and Brazilian Portuguese (BP) varieties, as in the sentences Já lhe arrumaram a bagagem vs Sua bagagem está seguramente guardada, where the proclisis of the dative lhe in EP contrasts with the possessive pronoun sua in BP. We selected several contrastive paraphrase pairs, such as clitic pronouns in proclisis and enclisis, and pronouns occurring in the presence of relative pronouns and negation adverbs, among other constructions, to illustrate this linguistic phenomenon. Some differences correspond to real contrasts between the two varieties of Portuguese, while others represent purely stylistic choices. The contrastive variants were aligned manually to establish a gold standard set, and the typology was designed so that it can be extended in the future and made publicly available. The paraphrase pairs were aligned in the e-PACT corpus using the CLUE-Aligner tool. This research was carried out within the scope of the eSPERTo project.

Contributos para o Aumento de Qualidade na Língua Digital -- Contributions to Increasing Quality in Digital Language

Language technologies have entered the everyday life of ordinary citizens in several areas: social communication and the media, the teaching and learning of the mother tongue and of foreign languages, cultural dissemination, health, interpersonal relations on social networks, and international relations, among others. This process of linguistic globalization represents both a challenge and an opportunity for speakers of Portuguese. This paper describes some tools and linguistic resources developed at INESC-ID that aim to increase the quality of Portuguese in natural language applications: eSPERTo, CLUE-Aligner, e-PACT, and the CLUE Alignment Guidelines. eSPERTo is a paraphrase generation platform for text rewriting and a translation aid tool. CLUE-Aligner is an interactive aligner that enables the annotation and extraction of multiword lexical units and other phrasal units in parallel translation corpora and in paraphrastic texts. e-PACT is a corpus of aligned paraphrases of multiword lexical units and expressions, built from translations from English into the European and Brazilian varieties of Portuguese. The CLUE Alignment Guidelines are guidelines for aligning paraphrases and translation units, developed from parallel corpora of the European and Brazilian varieties of Portuguese and from parallel corpora of English and Romance languages. In addition to targeting linguistic quality, these tools also aim at the internationalization of the Portuguese language. 1 Introduction: Despite significant advances in language technologies, especially over the last decade, there are still gaps in the quality of digital language¹, an area where ... ¹ Digital language is any language that passes through computers and electronic tools.

Port4NooJ: Portuguese Linguistic Module and Bilingual Resources for Machine Translation

In this paper we present version 0.1 of Port4NooJ, the open source NooJ Portuguese linguistic module, which integrates a bilingual extension for Portuguese-English machine translation (MT4NooJ) that is still a work in progress. We first explain the motivation behind this work and then describe the main components of the module, particularly the electronic dictionaries, the rules that formalize and document Portuguese inflectional and derivational descriptions, and the different types of grammar: morphological, disambiguation, syntactic-semantic, multiword expression and translation grammars. We explain how the different components interact and show how these linguistic resources, dictionaries and grammars, are applied to text. We also present the methodology and the results driven by the new characteristics of this module.

Generating Paraphrases of Human Intransitive Adjective Constructions with Port4NooJ

This paper details the integration into Port4NooJ of 15 lexicon-grammar tables describing the distributional properties of 4,248 human intransitive adjectives. The properties described in these tables enable the recognition and generation of adjectival constructions in which the adjective has a predicative function. These properties also establish semantic relationships between adjective, noun and verb predicates, enabling new paraphrasing capabilities, which were described in NooJ grammars. The new dictionary of human intransitive adjectives, created by merging the information in those tables with the Port4NooJ homograph adjectives, comprises 5,177 entries. The enhanced Port4NooJ is being used in eSPERTo, a NooJ-based paraphrase generation platform.

eSPERTo's Paraphrastic Knowledge applied to Question-Answering and Summarization

This paper reports our first attempt at integrating eSPERTo's paraphrastic engine, which is based on the NooJ platform, into two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo's base resources and the modifications to these resources that were necessary to produce the paraphrases required to feed both systems. Although the improvements observed in both scenarios are not significant, we present a detailed error analysis to guide further improvements in future experiments.

Integrating the Lexicon-Grammar of Predicate Nouns with Support Verb fazer into Port4NooJ

by Anabela Barreiro and Lucília Chacoto

This paper describes the ongoing process of integrating approximately 3,000 predicate nouns into Port4NooJ, the Portuguese module for NooJ. Integrating these resources enables us to further extend the paraphrastic capabilities of the eSPERTo paraphrasing system, developed within the project of the same name. The integrated predicate nouns co-occur with the support verb fazer ('do' or 'make'), and their syntactic and distributional properties are formalized in lexicon-grammar tables. These lexicon-grammar tables resulted in a standalone dictionary of predicate noun constructions and a few new grammars that can be used in paraphrase analysis and generation.

e-PACT: eSPERTo Paraphrase Aligned Corpus of EN-EP/BP Translations

This paper presents e-PACT, a corpus of paraphrase-aligned European and Brazilian Portuguese texts sampled from the translations of two English literary books by David Lodge available in the COMPARA corpus. We used the sentence-aligned e-PACT corpus as a baseline to annotate semantically equivalent multiwords, phrases, and expressions between the two varieties of Portuguese. The annotation task followed a set of guidelines, the CLUE4Paraphrasing Alignment Guidelines, and the pairs of paraphrastic units found in the corpus, the Gold CLUE4Paraphrasing, were annotated with an alignment tool called CLUE-Aligner. All of these resources (the e-PACT corpus, the CLUE4Paraphrasing Alignment Guidelines, the Gold CLUE4Paraphrasing pairs of paraphrastic units, and the CLUE-Aligner tool) were developed within the scope of the eSPERTo project.

Make it simple with paraphrases: automated paraphrasing for authoring aids and machine translation

This book presents a novel scientific approach to improving machine translation by paraphrasing support verb constructions with semantically equivalent verbs (e.g., make a presentation of/present). The author demonstrates that this strategy has a positive impact on machine translation. The study is reproducible, extendable to other linguistic phenomena, and can be successfully applied to natural language processing applications with different purposes. The author shows how paraphrases can be employed efficiently by authoring aids to help simplify and clarify texts, with clear benefits for linguistic quality assurance in text processing. While addressing and providing a solution for a specific linguistic problem, this book presents a comprehensive theoretical background and an exposition of conceptual problems that will interest natural language processing professionals, linguists, translators, and students. Written in simple language, this book will be easily understood by non-specialists with an interest in natural language.
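The core idea, replacing a support verb construction with a semantically equivalent verb, can be shown with a toy English rewrite. This Python sketch is not the book's system: the two-entry lexicon, the restriction to the support verb make followed by an article, and the function name `paraphrase_svc` are all illustrative assumptions.

```python
import re

# Hypothetical mini-lexicon: predicate noun ->
# (verb forms keyed by the tense of "make", preposition absorbed by the verb).
PREDICATE_NOUNS = {
    "presentation": ({"base": "present", "3sg": "presents", "past": "presented"}, "of"),
    "decision": ({"base": "decide", "3sg": "decides", "past": "decided"}, None),
}
SUPPORT_VERB_TENSE = {"make": "base", "makes": "3sg", "made": "past"}

# support verb + article + noun, optionally followed by the preposition "of"
SVC = re.compile(r"\b(make|makes|made)\s+(?:a|an|the)\s+(\w+)(?:\s+(of)\b)?")


def paraphrase_svc(text: str) -> str:
    """Replace known 'make + article + predicate noun (+ of)' constructions with one verb."""
    def replace(m):
        support, noun, prep = m.group(1), m.group(2), m.group(3)
        entry = PREDICATE_NOUNS.get(noun)
        if entry is None:
            return m.group(0)  # not a listed predicate noun: leave the text unchanged
        forms, absorbed_prep = entry
        verb = forms[SUPPORT_VERB_TENSE[support]]
        if prep is not None and prep != absorbed_prep:
            return f"{verb} {prep}"  # keep a preposition the verb does not absorb
        return verb
    return SVC.sub(replace, text)


print(paraphrase_svc("They made a presentation of the results."))  # They presented the results.
```

Keeping the tense of the support verb on the substituted verb is what makes the rewrite meaning-preserving; an unlisted noun (e.g., made a cake of) is left alone.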

Papers by Anabela Barreiro

Large Language Models and OpenLogos: An Educational Case Scenario

by Andrijana Pavlova and Anabela Barreiro

Open Research Europe, 2024

Large Language Models (LLMs) offer advanced text generation capabilities, sometimes surpassing human abilities. However, using them without proper expertise poses significant challenges, particularly in educational contexts. This article explores different facets of natural language generation (NLG) in education, assessing its advantages and disadvantages, particularly with respect to LLMs. It addresses concerns about the opacity of LLMs and the potential bias in their generated content, advocating for transparent solutions. Accordingly, it examines the feasibility of integrating OpenLogos expert-crafted resources into language generation tools used for paraphrasing and translation. In the context of the Multi3Generation COST Action (CA18231), we have emphasized the importance of incorporating OpenLogos into language generation processes, as well as the need for clear guidelines and ethical standards for generative models with multilingual, multimodal, and multitasking capabilities. The Multi3Generation initiative strives to advance NLG research for societal welfare, including its educational applications. It promotes inclusive models inspired by the Logos Model that prioritize transparency, human control, the preservation of language principles and meaning, and acknowledgment of the expertise of resource creators. We envision a scenario in which OpenLogos contributes significantly to inclusive AI-supported education. The article also explores ethical considerations and limitations related to implementing AI in education, highlighting the importance of a balanced approach consistent with traditional educational principles. Ultimately, it encourages educators to adopt innovative tools and methodologies that foster dynamic learning environments and support linguistic development and growth.

Automated Paraphrasing for Authoring Aids and Machine Translation

SPIDER: A System for Paraphrasing in Document Editing and Revision — Applicability in Machine Translation Pre-editing

Lecture Notes in Computer Science, 2011

... Anabela Barreiro ... Lambert Academic Publishing (2011), ISBN 978-3-8383-8565-5. Barreiro, A., ...

EVAL - Evaluation of Machine Translation at FLUP

Multi3Generation: Multitask, Multilingual, Multimodal Language Generation

HAL (Le Centre pour la Communication Scientifique Directe), Jun 1, 2022

This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action, Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This "metapaper" will serve as the reference for citing the Action in future publications. It presents the Action's objectives and challenges, and links to the outcomes achieved.

Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

Prefácio - POP - Por Outras Palavras (Preface: POP, In Other Words)

Linguamática, 2019

This volume contains the papers presented at POP -- Por Outras Palavras (In Other Words), the 1st workshop on Tools and Linguistic Resources for Paraphrasing in Portuguese, held on 24 September 2018 in Canela (RS), Brazil. The workshop aimed to bring together linguists and researchers working in Natural Language Processing who were interested in discussing new ideas on the development and use of paraphrase-oriented linguistic resources for Portuguese with real-world applications. Paraphrases are extremely important in human communication, both in language production and in comprehension, and they play an increasingly important role in research activities and projects. Several linguistic experiments have shown the feasibility of using paraphrastic resources in a wide variety of software applications, because they make it possible to recognize and generate equivalent ways of expressing the same content, allowing systems to provide the user...

One Book, Two Language Varieties

by Isabel Garcez, Anabela Barreiro, and Tanara Zingano Kuhn

springer, 2020

This paper presents a comparative study of alignment pairs, either contrasting expressions or sty... more This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression in the European (EP) and the Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which allows to record all pairs of paraphrastic units resulting from the alignment task in a database. The corpus used was a children's literature book Os Livros Que Devoraram o Meu Pai (The Books that Devoured My Father) by the Portuguese author Afonso Cruz and the Brazilian adaptation of this book. The main goal of the work presented here is to gather equivalent phrasal expressions and different syntactic constructions, which convey the same meaning in EP and BP, and contribute to the optimisation of editorial processes compulsory in the adaptation of texts, but which are suitable for any type of editorial process. This study provides a scientific basis for future work in the area of editing, proofreading and converting text to and from any variety of Portuguese from a computational point of view, namely to be used in a paraphrasing system with a variety adaptation functionality, even in the case of a literary text. We contemplate "challenging" cases, from a literary point of view, looking for alternatives that do not tamper with the imagery richness of the original version.

Paraphrastic Variance between European and Brazilian Portuguese

This paper presents a methodology to extract a paraphrase database for the European and Brazilian... more This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese, and discusses a set of paraphrastic categories of multiwords and phrasal units, such as the compounds toda a gente vs todo o mundo "everybody" or the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"), which are extremely relevant to high quality paraphrasing. The variants were manually aligned in the e-PACT corpus, using the CLUE-Aligner tool. The methodology, inspired in the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit and constitutes a subset of the Gold-CLUE-Paraphrases. 1 The construction of a larger dataset of paraphrastic contrasts among the distinct varieties of the Portuguese language is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them, making it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of its kind for Portuguese to become available to the scientific community for research purposes.

Por Outras Palavras POP@PROPOR2018 -- Primeiro Seminário sobre Ferramentas e Recursos Linguísticos para Parafraseamento em Português

Por Outras Palavras

Este volume contém os trabalhos apresentados no POP@PROPOR2018: POP-Por Outras Pala-vras, o 1 o s... more Este volume contém os trabalhos apresentados no POP@PROPOR2018: POP-Por Outras Pala-vras, o 1 o seminário sobre Ferramentas e Recur-sos Linguísticos para Parafraseamento em Por-tuguês, realizado a 24 de Setembro de 2018 em Canela (RS), Brasil. O seminário teve como objetivo reunir investigadores linguistas e que trabalham ná area do Processamento de Lin-guagem Natural interessados em discutir novas idéias sobre o desenvolvimento e uso de recursos linguísticos orientados para pararafraseamento em português com aplicações do mundo real. As paráfrases são extremamente importan-tes na comunicação humana, tanto na produção como na compreensão da linguagem, e assumem um papel cada vez mais importante em ativi-dades e projetos de investigação. Diversas ex-periências linguísticas mostraram a viabilidade de usar recursos parafrásticos numa ampla va-riedade de aplicações de software, pois permi-tem reconhecer e gerar formas equivalentes de expressar o mesmo conteúdo, permitindo que os sistemas forneçam ao utilizador sugestões para dizer e escrever a mesma coisa / ideia por ou-tras palavras, aumentar a fluência, a criativi-dade e a diversidade estilística. No atual estágio de desenvolvimento, os sistemas de parafrase-amento exigem conhecimento linguístico e "in-teligência"sensível ao contexto para "compreen-der"e reconhecer uma ampla variedade de ex-pressões. Para o português, a utilidade dos re-cursos parafrásticos já foi explorada em cenários aplicativos, como um sistema de diálogo, para au-mentar o conhecimento linguístico de um agente virtual inteligente, em ferramentas de suma-rização e simplificação e também em ferramentas que visam obter tradução automática de quali-dade superior. 
No entanto, ´ e necessária mais in-vestigação para a viabilidade e sucesso de um sis-tema de parafraseamento a longo prazo nasáreasnasáreas de produção e revisão de texto, nomeadamente no desenvolvimento e melhoria de plataformas de autoria online, desenvolvendo programas in-terativos para ajudar os estudantes de português como língua estrangeira a produzir frases dife-rentes mas equivalentes ou até para estudantes nativos, para os auxiliar nas tarefas de produção e revisão dos seus textos. Ao propor o seminário POP, queríamos (i) reunir investigadores com interesse no campo das paráfrases, e com especial enfoque no português, para aprender e partilhar informação sobre o tema; (ii) reunir um conjunto de artigos de boa qualidade que discutam asúltimasas´asúltimas tendências náná area e contribuam para melhorar o estado da arte das paráfrases em português; (iii) trocar ideias e disseminar as melhores práticas para ajudar a fomentar a investigação nestá area; (iv) fomen-tar uma convergência de esforços de investigação para uma definição consensual dos métodos ci-entíficos, e incentivar a cooperação internacio-nal, a fim de alcançar estratégias comuns que respondamàsrespondamàs necessidades tecnológicas atuais; (v) discutir novas metodologias, como redes neu-rais, etc., e aprender a combinar essas metodolo-gias com esforços linguísticos; (vi) discutir desa-fios futuros e trocar informação sobre aspetos ci-entíficos e tecnológicos; (vii) incentivar e reforçar a criação de corpora paralelos de paráfrases para o português como conjuntos de dados para a co-leta de recursos de alinhamento parafrástico para treino e teste de sistemas de parafraseamento; e (viii) localizar fontes de financiamento para impulsionar ainda mais a investigação, apoiar a inovação e desenvolver esta tecnologia capaci-tante essencial. O Comitê do Programa era composto por 22 1

Contractions: To Align or Not to Align, That Is the Question

This paper presents a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed. For each contraction, a decision was then made as to whether it should be maintained or decomposed in the alignment. Decomposition was required whenever the two concatenated words (the preposition and the determiner or pronoun) belong to two separate translation alignment pairs (e.g., [no seio de] [a União Europeia] | [within] [the European Union]). Most contractions required decomposition when positioned at the end of a multiword unit. Conversely, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in its frozen part (e.g., [no que diz respeito a] | [with regard to] or [além disso] | [in addition]).
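The decomposition rule above can be illustrated with a small Python sketch. It is not the CLUE-Aligner implementation. The contraction table is a hypothetical and deliberately partial list. A contraction is split only when an alignment boundary falls right before it, so its preposition closes the left unit and its determiner opens the right unit. Contractions inside a unit, such as the initial "no" in "no seio de", are left intact.

```python
# Hypothetical, partial table of Portuguese preposition + determiner contractions.
CONTRACTIONS = {
    "no": ("em", "o"), "na": ("em", "a"), "nos": ("em", "os"), "nas": ("em", "as"),
    "do": ("de", "o"), "da": ("de", "a"), "dos": ("de", "os"), "das": ("de", "as"),
    "ao": ("a", "o"), "à": ("a", "a"),
}


def split_segment(tokens, boundary):
    """Split `tokens` into two alignment units at index `boundary`.

    If the first token of the right-hand unit is a contraction, its preposition
    is moved to the end of the left unit and its determiner stays on the right.
    Contractions elsewhere (the frozen part of a multiword unit) are not touched.
    """
    left, right = list(tokens[:boundary]), list(tokens[boundary:])
    if right and right[0].lower() in CONTRACTIONS:
        preposition, determiner = CONTRACTIONS[right[0].lower()]
        left.append(preposition)
        right = [determiner] + right[1:]
    return left, right
```

For example, `split_segment(["no", "seio", "da", "União", "Europeia"], 2)` returns `(["no", "seio", "de"], ["a", "União", "Europeia"])`, matching the [no seio de] [a União Europeia] pair above.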

Parafraseamento Automático de Registo Informal em Registo Formal na Língua Portuguesa -- Automated Paraphrasing of Portuguese Informal into Formal Language

by Anabela Barreiro and Isabel Garcez

This paper presents the automation of paraphrasing Portuguese constructions typical of informal or spoken language into constructions typical of formal written language. We illustrate this automation process with examples extracted from the e-PACT corpus involving the placement of clitic pronouns in verbal compounds. Among other constructions, our task consists of paraphrasing and normalizing vou-lhe/posso-lhe fazer uma surpresa into vou/posso fazer-lhe uma surpresa ('lit: I will/can to him/her make a surprise / I will/can make to him/her a surprise; I will/can make him/her a surprise'). The clitic pronoun lhe migrates from an enclitic position immediately after the first verb of the verbal compound to an enclitic position after the main verb, which is the verb responsible for selecting that pronominal argument. The first verb is either an auxiliary or a volitive verb, e.g., querer 'want'. This is a standard revision procedure in European Portuguese. Cases like this represent linguistic phenomena where language students and language users in general get confused or "stumble". The paper focuses on the standard language in which the observed phenomena occur, and describes examples of interest found in the corpus. It also presents an automatic solution that normalizes the informal syntactic inadequacies found in the researched structures into standard structures typical of formal or professional writing, by applying very generic transformational grammars.
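The clitic repositioning described above can be approximated with a regular-expression rewrite. This is a minimal sketch, not the authors' NooJ transformational grammars. The auxiliary/volitive list is hypothetical and incomplete, and infinitives are detected crudely by their -ar/-er/-ir/-ôr ending. Anything that does not fit the pattern is left unchanged.

```python
import re

# Hypothetical, deliberately small list of auxiliary/volitive forms (ir, poder, querer).
AUXILIARIES = {
    "vou", "vais", "vai", "vamos", "vão",
    "posso", "podes", "pode", "podemos", "podem",
    "quero", "queres", "quer", "queremos", "querem",
}

# AUX-lhe(s) followed by an infinitive (crude test: a word ending in -ar/-er/-ir/-ôr).
PATTERN = re.compile(r"\b(\w+)-(lhes?)\s+(\w+[aeiouô]r)\b", re.IGNORECASE)


def normalize_clitic(sentence: str) -> str:
    """Move 'lhe'/'lhes' from after the auxiliary to after the main verb."""
    def repl(match):
        aux, clitic, infinitive = match.groups()
        if aux.lower() not in AUXILIARIES:
            return match.group(0)  # not an auxiliary/volitive verb: leave as is
        return f"{aux} {infinitive}-{clitic}"
    return PATTERN.sub(repl, sentence)
```

With these assumptions, `normalize_clitic("vou-lhe fazer uma surpresa")` yields `"vou fazer-lhe uma surpresa"`. A sentence such as "deu-lhe um livro" is untouched, because no infinitive follows the clitic.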

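Alinhamentos Parafrásticos entre o Português Europeu e o Português do Brasil: Construções de Predicados Verbais com o Pronome Clítico 'lhe' -- Paraphrastic Alignments between European and Brazilian Portuguese: Verbal Predicate Constructions with the Clitic Pronoun 'lhe'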

This paper presents the alignment of constructions containing verbal predicates with the clitic lhe in the European Portuguese (EP) and Brazilian Portuguese (BP) varieties, as in the sentences Já lhe arrumaram a bagagem | Sua bagagem está seguramente guardada. Here, the proclisis of the dative lhe in EP contrasts with the possessive pronoun sua in BP. To illustrate this linguistic phenomenon, we selected several contrastive paraphrase pairs, among them clitic pronouns in proclisis and enclisis, and pronouns occurring alongside relative pronouns and negation adverbs. Some differences correspond to real contrasts between the two varieties of Portuguese, while others represent purely stylistic choices. The contrastive variants were manually aligned to establish a gold standard set. The resulting typology was designed so that it can be extended in the future and made publicly available. The paraphrase pairs were aligned in the e-PACT corpus using the CLUE-Aligner tool. This research was carried out within the scope of the eSPERTo project.

Contributos para o Aumento de Qualidade na Língua Digital -- Contributions to Increasing Quality in Digital Language

Language technologies have entered the everyday life of ordinary citizens in many areas. These include social communication and the media, the teaching and learning of the language and of foreign languages, cultural dissemination, health, interpersonal relations on social networks, and international relations. This process of linguistic globalization represents both a challenge and an opportunity for speakers of Portuguese. This paper describes some linguistic tools and resources developed at INESC-ID that aim to increase the quality of Portuguese in natural language applications: eSPERTo, CLUE-Aligner, e-PACT, and the CLUE Alignment Guidelines. eSPERTo is a paraphrase generation platform for text rewriting, as well as a translation aid tool. CLUE-Aligner is an interactive aligner for annotating and extracting multiword lexical units and other phrasal units in parallel corpora of translations and in paraphrastic texts. e-PACT is a corpus of aligned paraphrases of multiword lexical units and expressions, built from translations of English into the European and Brazilian variants of Portuguese. The CLUE Alignment Guidelines are guidelines for aligning paraphrases and translation units. They were developed from parallel corpora of the European and Brazilian variants of Portuguese, and from parallel corpora of English and Romance languages. Beyond linguistic quality, these tools also aim to support the internationalization of the Portuguese language. 1 Introduction. Despite significant advances in language technologies, especially in the last decade, quality gaps are still found in digital language (footnote 1: digital language is any language that passes through computers and electronic tools), an area where...

Port4NooJ: Portuguese Linguistic Module and Bilingual Resources for Machine Translation

In this paper we present version 0.1 of Port4NooJ, the open-source NooJ Portuguese linguistic module. It integrates a bilingual extension for Portuguese-English machine translation (MT4NooJ), which is still a work in progress. We first explain the motivation behind this work and then describe the main components of the module. These are the electronic dictionaries, the rules that formalize and document Portuguese inflectional and derivational descriptions, and the different types of grammar: morphological, disambiguation, syntactic-semantic, multiword expression, and translation grammars. We explain how the different components interact and show how these linguistic resources (dictionaries and grammars) are applied to text. Finally, we present the methodology and the results driven by the new characteristics of this module.

Generating Paraphrases of Human Intransitive Adjective Constructions with Port4NooJ

This paper details the integration into Port4NooJ of 15 lexicon-grammar tables describing the distributional properties of 4,248 human intransitive adjectives. The properties described in these tables enable the recognition and generation of adjectival constructions in which the adjective has a predicative function. These properties also establish semantic relationships between adjective, noun, and verb predicates, allowing new paraphrasing capabilities that were described in NooJ grammars. The new dictionary of human intransitive adjectives comprises 5,177 entries. It was created by merging the information in those tables with the Port4NooJ homograph adjectives. The enhanced Port4NooJ is being used in eSPERTo, a NooJ-based paraphrase generation platform.

eSPERTo's Paraphrastic Knowledge applied to Question-Answering and Summarization

This paper reports our first attempt at integrating eSPERTo's paraphrastic engine, which is based on the NooJ platform, into two application scenarios: a conversational agent and a summarization system. We briefly describe eSPERTo's base resources. We also describe the modifications these resources needed in order to produce the paraphrases required to feed both systems. Although the improvement observed in both scenarios is not significant, we present a detailed error analysis to guide the improvement of these results in future experiments.

Integrating the Lexicon-Grammar of Predicate Nouns with Support Verb fazer into Port4NooJ

by Anabela Barreiro and Lucília Chacoto

This paper describes the ongoing process of integrating approximately 3,000 predicate nouns into Port4NooJ, the Portuguese module for NooJ. Integrating these resources lets us further extend the paraphrastic capabilities of the eSPERTo paraphrasing system, developed in the scope of the project of the same name. The integrated predicate nouns co-occur with the support verb fazer ('do' or 'make'), and their syntactic and distributional properties are formalized in lexicon-grammar tables. These tables resulted in a standalone dictionary of predicate noun constructions and a few new grammars that can be used in paraphrase analysis and generation.

e-PACT: eSPERTo Paraphrase Aligned Corpus of EN-EP/BP Translations

This paper presents e-PACT, a corpus of paraphrase-aligned European and Brazilian Portuguese sampled from the translations of two English literary books by David Lodge, available in the COMPARA corpus. We used the e-PACT sentence-aligned corpus as a baseline to annotate semantically equivalent multiwords, phrases, and expressions between the two variants of Portuguese. The annotation task followed a set of guidelines, the CLUE4Paraphrasing Alignment Guidelines. The pairs of paraphrastic units found in the corpus, the Gold CLUE4Paraphrasing, were annotated using an alignment tool called CLUE-Aligner. All of these resources were developed in the scope of the eSPERTo project: the e-PACT corpus, the CLUE4Paraphrasing Alignment Guidelines, the Gold CLUE4Paraphrasing pairs of paraphrastic units, and the CLUE-Aligner tool.

Make it simple with paraphrases: automated paraphrasing for authoring aids and machine translation

This book presents a novel scientific approach to improving machine translation by paraphrasing support verb constructions with semantically equivalent verbs (e.g., make a presentation of / present). The author demonstrates that this strategy has a positive impact on machine translation. The study is reproducible and extendable to distinct linguistic phenomena, and it has been successfully applied to different-purpose natural language processing applications. The author exemplifies how authoring aids can efficiently employ paraphrases to help simplify and clarify texts, with clear benefits for linguistic quality assurance in text processing. While addressing and solving a specific linguistic problem, the book also presents a comprehensive theoretical background and an exposition of conceptual problems. These will interest natural language processing professionals, linguists, translators, and students. Written in simple language, this book will be easily understood by non-specialists who have an interest in natural language.
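The core idea of replacing a support verb construction with a single equivalent verb can be shown with a lookup-and-substitute sketch. The paraphrase list is a hypothetical, hand-built example and not the book's resource. Only base (uninflected) forms are handled. A real system would also need to carry tense and agreement over from the support verb to the new verb, which is omitted here.

```python
import re

# Hypothetical mini-lexicon: support verb construction (base form) -> equivalent verb.
# Each target keeps whatever preposition the single verb still needs.
SVC_PARAPHRASES = [
    ("make a presentation of", "present"),
    ("give an explanation of", "explain"),
    ("make a decision", "decide"),
]


def paraphrase_svc(sentence: str) -> str:
    """Replace each base-form support verb construction with its single-verb paraphrase."""
    for source, target in SVC_PARAPHRASES:
        # Allow any whitespace between the words of the construction.
        pattern = r"\b" + r"\s+".join(map(re.escape, source.split())) + r"\b"
        sentence = re.sub(pattern, target, sentence)
    return sentence
```

For instance, `paraphrase_svc("We will make a presentation of the results.")` gives `"We will present the results."`. Inflected forms such as "made a presentation of" are deliberately not matched by this sketch.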

Large Language Models and OpenLogos: An Educational Case Scenario

by Andrijana Pavlova and Anabela Barreiro

Open Research Europe, 2024

Large Language Models (LLMs) offer advanced text generation capabilities, sometimes surpassing human abilities. However, their use without proper expertise poses significant challenges, particularly in educational contexts. This article explores different facets of natural language generation (NLG) within the educational realm, assessing its advantages and disadvantages, particularly concerning LLMs. It addresses concerns regarding the opacity of LLMs and the potential bias in their generated content, advocating for transparent solutions. Therefore, it examines the feasibility of integrating OpenLogos expert-crafted resources into language generation tools used for paraphrasing and translation. In the context of the Multi3Generation COST Action (CA18231), we have been emphasizing the significance of incorporating OpenLogos into language generation processes, and the need for clear guidelines and ethical standards in generative models involving multilingual, multimodal, and multitasking capabilities. The Multi3Generation initiative strives to progress NLG research for societal welfare, including its educational applications. It promotes inclusive models inspired by the Logos Model, prioritizing transparency, human control, preservation of language principles and meaning, and acknowledgment of the expertise of resource creators. We envision a scenario where OpenLogos can contribute significantly to inclusive AI-supported education. Ethical considerations and limitations related to AI implementation in education are explored, highlighting the importance of maintaining a balanced approach consistent with traditional educational principles. Ultimately, the article advocates for educators to adopt innovative tools and methodologies to foster dynamic learning environments that facilitate linguistic development and growth.

Automated Paraphrasing for Authoring Aids and Machine Translation

SPIDER: A System for Paraphrasing in Document Editing and Revision — Applicability in Machine Translation Pre-editing

Lecture Notes in Computer Science, 2011

... Anabela Barreiro ... Lambert Academic Publishing (2011) ISBN 978-3-8383-8565-5 Barreiro, A., ...

EVAL - Evaluation of Machine Translation at FLUP

Multi3Generation: Multitask, Multilingual, Multimodal Language Generation

HAL (Le Centre pour la Communication Scientifique Directe), Jun 1, 2022

This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action, Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This "metapaper" will serve as the reference for citations of the Action in future publications. It presents the Action's objectives and challenges, and provides links to its achieved outcomes.

Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

Prefácio - POP - Por Outras Palavras -- Preface: POP - Por Outras Palavras (In Other Words)

Linguamática, 2019

This volume contains the papers presented at POP - Por Outras Palavras (In Other Words), the 1st workshop on Linguistic Tools and Resources for Paraphrasing in Portuguese, held on 24 September 2018 in Canela (RS), Brazil. The workshop aimed to bring together linguists and researchers working in Natural Language Processing who are interested in discussing new ideas on the development and use of paraphrase-oriented linguistic resources for Portuguese with real-world applications. Paraphrases are extremely important in human communication, in both language production and comprehension, and play an increasingly important role in research activities and projects. Several linguistic experiments have shown that paraphrastic resources can be used in a wide variety of software applications. These resources make it possible to recognize and generate equivalent ways of expressing the same content, allowing systems to offer the user...

EP–BP Paraphrastic Alignments Verbal Predicate Constructions with the Clitic Pronoun lhe

This paper presents the alignment of verbal predicate constructions with the clitic pronoun lhe in the European (EP) and Brazilian (BP) varieties of Portuguese, as in the sentences Já lhe arrumaram a bagagem | Sua bagagem está seguramente guardada "His baggage is safely stowed away". Here, the EP dative proclisis lhe contrasts with the BP possessive pronoun sua. To illustrate the linguistic phenomenon, we selected several paraphrastic contrasts, among them proclisis and enclisis, and clitic pronouns co-occurring with relative pronouns and negation-type adverbs. Some differences correspond to real contrasts between the two Portuguese varieties, while others represent purely stylistic choices. The contrasting variants were manually aligned to constitute a gold standard dataset. A typology was also established, to be further enlarged and made publicly available. The paraphrastic alignments were performed in the e-PACT corpu...

Linguistic Resources Overview

Port4NooJ is a set of linguistic resources developed in the NooJ linguistic environment for the automated processing of Portuguese. The resources integrate a bilingual extension and can also be used for Portuguese-to-English machine translation. They comprise the electronic dictionaries and the inflectional and derivational rules that formalize and document Portuguese morphological structure. They also include the different types of grammar: morphological (for contracted forms), disambiguation, syntactic-semantic, multiword expression, and translation grammars. The interaction between the different components and the application of the linguistic resources to text are explained throughout this document.

Workshop Beyond Translation Memories: New Tools for Translators

Paraphrasing Emotions in Portuguese

Communications in Computer and Information Science, 2021

Port4NooJ v3.0: Integrated Linguistic Resources for Portuguese NLP

This paper introduces Port4NooJ v3.0, the latest version of the Portuguese module for NooJ, highlights its main features, and details its three main new components: (i) a lexicon-grammar based dictionary of 5,177 human intransitive adjectives, together with a set of local grammars that use the distributional properties of those adjectives for paraphrasing; (ii) a polarity dictionary with 9,031 entries for sentiment analysis; and (iii) a set of priority dictionaries and local grammars for named entity recognition. These new components were derived and/or adapted from publicly available resources. The Port4NooJ v3.0 resource is innovative in terms of the specificity of the linguistic knowledge it incorporates. The dictionary is bilingual Portuguese-English, and the semantico-syntactic information assigned to each entry validates the linguistic relation between the terms in both languages. These characteristics, which cannot be found in any other public resource for Portuguese, make it a valuable...
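To make the role of a polarity dictionary concrete, here is a minimal lexicon-based scoring sketch in Python. It does not use Port4NooJ's actual format: Port4NooJ entries are NooJ dictionary entries. The words and polarity values below are hypothetical. The sketch simply sums the polarity of every known token and ignores negation, intensifiers and multiword expressions, which a grammar-based approach would handle.

```python
import re

# Hypothetical mini polarity dictionary: lemma -> polarity value.
POLARITY = {"bom": 1, "excelente": 2, "mau": -1, "péssimo": -2, "simpático": 1}


def polarity_score(text: str) -> int:
    """Sum the polarity values of all tokens found in the dictionary."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(POLARITY.get(token, 0) for token in tokens)
```

For example, "O filme foi excelente, mas o final foi mau." scores 2 + (-1) = 1 under these assumed values.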

Machine Translation of Non-Contiguous Multiword Units

Proceedings of the Workshop on Discontinuous Structures in Natural Language Processing, 2016

Linguistic resources for paraphrase generation in Portuguese: a lexicon-grammar approach

Language Resources and Evaluation, 2022

This paper presents a new linguistic resource for the generation of paraphrases in Portuguese, based on the lexicon-grammar framework. The resource components include: (i) a lexicon-grammar based dictionary of 2,100 predicate nouns co-occurring with the support verb ser de ‘be of’, such as in ser de uma ajuda inestimável ‘be of invaluable help’; (ii) a lexicon-grammar based dictionary of 6,000 predicate nouns co-occurring with the support verb fazer ‘do’ or ‘make’, such as in fazer uma comparação ‘make a comparison’; and (iii) a lexicon-grammar based dictionary of about 5,000 human intransitive adjectives co-occurring with the copula verbs ser and/or estar ‘be’, such as in ser simpático ‘be kind’ or estar entusiasmado ‘be enthusiastic’. A set of local grammars exploits the properties described in these linguistic resources, enabling a variety of text transformation tasks for paraphrasing applications. The paper highlights the different complementary and synergistic components and integration ...

Paraphrastic Variance between European and Brazilian Portuguese

Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

This paper presents a methodology to extract a paraphrase database for the European and Brazilian varieties of Portuguese. It also discusses a set of paraphrastic categories of multiwords and phrasal units that are extremely relevant to high-quality paraphrasing. Examples include the compounds toda a gente vs todo o mundo "everybody" and the gerundive constructions [estar a + V-Inf] vs [ficar + V-Ger] (e.g., estive a observar vs fiquei observando "I was observing"). The variants were manually aligned in the e-PACT corpus using the CLUE-Aligner tool. The methodology, inspired by the Logos Model, focuses on a semantico-syntactic analysis of each paraphrastic unit. The resulting pairs constitute a subset of the Gold-CLUE-Paraphrases. A larger dataset of paraphrastic contrasts among the varieties of Portuguese is indispensable for variety adaptation, i.e., for dealing with the cultural, linguistic and stylistic differences between them. Such a dataset makes it possible to convert texts (semi-)automatically from one variety into another, a key function in paraphrasing systems. This topic represents an interesting new line of research with valuable applications in language learning, language generation, question-answering, summarization, and machine translation, among others. The paraphrastic units are the first resource of their kind for Portuguese to be made available to the scientific community for research purposes.
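Variety adaptation of this kind can be sketched as the application of aligned EP-to-BP pairs plus one structural rule. The sketch below is hypothetical and not the authors' system. The lexical pair and the auxiliary mapping come only from the two examples in the abstract, toda a gente → todo o mundo and estive a observar → fiquei observando. The gerund rule covers only regular -ar/-er/-ir infinitives: drop the final -r and add -ndo. Irregular verbs such as pôr are not handled.

```python
import re

# Hypothetical EP -> BP lexical pairs (taken from the examples in the abstract).
LEXICAL_PAIRS = {"toda a gente": "todo o mundo"}

# Hypothetical auxiliary mapping for [estar a + V-Inf] -> [ficar + V-Ger].
AUX_EP_TO_BP = {"estive": "fiquei", "esteve": "ficou"}


def gerund(infinitive: str) -> str:
    """Regular gerund formation: observar -> observando, comer -> comendo."""
    if infinitive.endswith(("ar", "er", "ir")):
        return infinitive[:-1] + "ndo"
    return infinitive  # irregular or unknown: leave unchanged


def ep_to_bp(sentence: str) -> str:
    """Apply the lexical pairs, then rewrite 'AUX a V-Inf' as 'AUX' V-Ger'."""
    for ep, bp in LEXICAL_PAIRS.items():
        sentence = sentence.replace(ep, bp)

    def repl(match):
        return f"{AUX_EP_TO_BP[match.group(1)]} {gerund(match.group(2))}"

    aux_alternatives = "|".join(AUX_EP_TO_BP)
    return re.sub(rf"\b({aux_alternatives}) a (\w+[aei]r)\b", repl, sentence)
```

Under these assumptions, `ep_to_bp("estive a observar a rua")` returns `"fiquei observando a rua"`, and `ep_to_bp("toda a gente sabe")` returns `"todo o mundo sabe"`.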

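Projetos sobre Tradução Automática do Português no Laboratório de Sistemas de Língua Falada do INESC-ID -- Projects on Machine Translation of Portuguese at the Spoken Language Systems Laboratory of INESC-ID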

Linguamática, 2014

Language technologies, and machine translation applications in particular, have the potential to help break down linguistic and cultural barriers. They make an important contribution to the globalization and internationalization of Portuguese by allowing linguistic content to be shared from and into this language. This article presents the machine translation research carried out at the Spoken Language Systems Laboratory of INESC-ID, namely automatic speech translation, microblog translation, and the creation of a hybrid machine translation system. We focus on the hybrid system, which combines linguistic knowledge (in particular semantico-syntactic knowledge) with statistical knowledge in order to increase translation quality.

The Lexicon-Grammar of Predicate Nouns with ser de in Port4NooJ

Communications in Computer and Information Science

This paper continues previous efforts to integrate complementary lexicon-grammars in order to expand the paraphrastic capabilities of Port4NooJ, the Portuguese module of NooJ (Silberztein 2016). We describe the integration of the lexicon-grammar of 2,085 predicate nouns that co-occur with the support verb ser de 'be of' in European Portuguese, as in O Pedro é de uma coragem extraordinária 'Peter is of an extraordinary courage'. These nouns were studied, classified and formalized by Baptista (2005b). This led to a 20% increase in the number of predicate nouns. We also extended previously created paraphrasing grammars, such as those that paraphrase symmetric predicates and those that substitute one support verb for another. Furthermore, we created new grammars to paraphrase negative constructions, appropriate noun constructions, adjectival constructions, and manner sub-clauses. The paraphrastic capabilities acquired have been integrated into the eSPERTo system.

One Book, Two Language Varieties

This paper presents a comparative study of alignment pairs, either contrasting expressions or stylistic variants of the same expression, in the European (EP) and Brazilian (BP) varieties of Portuguese. The alignments were collected semi-automatically using the CLUE-Aligner tool, which records all pairs of paraphrastic units resulting from the alignment task in a database. The corpus was the children's literature book Os Livros Que Devoraram o Meu Pai (The Books that Devoured My Father) by the Portuguese author Afonso Cruz, together with its Brazilian adaptation. The main goal of this work is to gather equivalent phrasal expressions and different syntactic constructions that convey the same meaning in EP and BP. A second goal is to help optimise the editorial processes that are compulsory when adapting texts; the results also apply to any other type of editorial process. This study provides a scientific basis for future work in the area of editing, proofread...
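Recording aligned paraphrastic units in a database, as CLUE-Aligner is described as doing, can be pictured with a small SQLite sketch. The table name, columns and category labels are hypothetical; CLUE-Aligner's actual schema is not described here. The UNIQUE constraint combined with `INSERT OR IGNORE` keeps each distinct EP/BP pair once, however many times it is aligned in the corpus. The example pair is taken from elsewhere on this page, not from the book.

```python
import sqlite3


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical table of EP/BP alignment pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS alignment_pair (
               id INTEGER PRIMARY KEY,
               ep_unit TEXT NOT NULL,
               bp_unit TEXT NOT NULL,
               category TEXT,
               UNIQUE (ep_unit, bp_unit))"""
    )
    return conn


def record_pair(conn, ep_unit, bp_unit, category=None):
    """Store an aligned pair; duplicates are silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO alignment_pair (ep_unit, bp_unit, category) VALUES (?, ?, ?)",
        (ep_unit, bp_unit, category),
    )
    conn.commit()


def bp_variants(conn, ep_unit):
    """Return all BP units aligned with a given EP unit."""
    rows = conn.execute(
        "SELECT bp_unit FROM alignment_pair WHERE ep_unit = ? ORDER BY bp_unit", (ep_unit,)
    )
    return [row[0] for row in rows]
```

For instance, after `record_pair(conn, "toda a gente", "todo o mundo", "compound")`, `bp_variants(conn, "toda a gente")` returns `["todo o mundo"]`.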
