CV (in Russian) by Elena Yagunova
Lande D., Yagunova E. Dynamic frequency features as the basis for the structural description of diverse linguistic objects // CEUR Workshop Proceedings Volume 934, 2012, Pages 150-159 ISSN: 1613-0073
The paper presents an approach to studying the dynamic frequency features of words for diverse linguistic objects; the purpose of the approach is to describe heterogeneous dynamic objects ranging from individual texts to the news text flow. Four groups of words are used, extracted on the basis of their dynamic frequency response (both global and local), and each group has a clearly distinct physical and linguistic nature. These groups correspond to diverse linguistic characteristics of objects in terms of object structure and language features.
In this paper we propose an approach to primary business-media analysis for subsequent information extraction. We examine how business events are represented by looking at the part-of-speech (POS) distribution across tagged n-grams. Two Russian business-media corpora, Russian Business Consulting (RBC) and Commersant, are analyzed, and it is shown that they differ not only in style and thematic coverage but also in the range of contexts for the words that mark business events. Purchase, merger and ownership events are examined more closely, and it is shown that in both corpora they are represented mostly by noun phrases rather than verb phrases.
Papers by Elena Yagunova
As part of our ParaPhraser project on the identification and classification of Russian paraphrases, we have collected a corpus of more than 8000 sentence pairs annotated as precise paraphrases, loose paraphrases or non-paraphrases. The corpus is annotated via crowdsourcing by naive native Russian speakers; however, from an expert's point of view, our complex paraphrase detection model can be more successful at predicting the paraphrase class than a naive native speaker.
In this paper we present a new Russian paraphrase corpus derived from a social network news feed and conduct its primary analysis. Most media agencies post their news reports on their social network pages, and the headlines of these posts are often the same as those of the corresponding news articles on the agencies' official websites. Sometimes, however, the two headlines differ, and in such cases the social network headline can be considered a compression or a paraphrase of the original one. In other words, such social network news feeds are a rich source of textual entailment and, as this paper shows, of various linguistic phenomena, e.g., irony, presupposition and attention-attracting markers. We collect such headline pairs and use them to construct the Russian social network news feed paraphrase corpus. We test the paraphrase detection model trained on the other existing Russian paraphrase corpus, ParaPhraser.r...
Computación y Sistemas
In this paper, we construct paraphrase graphs for news text collections (clusters). Our aims are, first, to show that the paraphrase graph construction method can be used to identify news clusters and, second, to analyze and compare stylistically different news collections. Our news collections include dynamic, static and combined (dynamic and static) texts, and their paraphrase graphs reflect their main characteristics. We also automatically extract the most informationally important linked fragments of news texts; these fragments characterize news texts as either informative (conveying information) or publicistic (aiming to affect readers emotionally).
Communications in Computer and Information Science
The paper describes the results of the First Russian Paraphrase Detection Shared Task held in St. Petersburg, Russia, in October 2016. Research on paraphrase extraction, detection and generation has been developing successfully for a long time, whereas interest in the problem within the Russian computational linguistics community has surged only recently. We try to bridge this gap by introducing the ParaPhraser.ru project, dedicated to collecting a Russian paraphrase corpus, and by organizing a Paraphrase Detection Shared Task that uses this corpus as training data. The participants applied a wide variety of techniques to paraphrase detection, from rule-based approaches to deep learning, and the results reflect the following tendencies: the best scores are obtained by traditional classifiers combined with fine-grained linguistic features; however, complex neural networks, shallow methods and purely technical methods also demonstrate competitive results.
Communications in Computer and Information Science, 2016
This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Because it is collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time, and paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, available at paraphraser.ru. At the moment there are 5181 annotated sentence pairs, 4758 of which are included in the corpus. The annotation process is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.
2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT), 2015
In this paper we analyze and compare different types of sentence similarity measures applied to the problem of sentential paraphrase identification. We work with Russian, and all experiments are conducted on the Russian paraphrase corpus that we have collected (and are still collecting) from news headlines. Apart from the similarity measures, we also analyze the corpus itself. As a result, we disprove the assumption that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases. We also offer recommendations on applying different similarity measures to identifying paraphrases derived from news texts.
Lecture Notes in Computer Science, 2015
This paper deals with the task of sentential paraphrase identification. We work with Russian, but our approach can be applied to any other language with rich morphology and free word order. As part of our ParaPhraser.ru project, we construct a paraphrase corpus and then experiment with supervised methods of paraphrase identification. In this paper we focus on low-level string, lexical and semantic features, which, unlike complex deep features, do not introduce informational noise and can serve as a solid basis for developing an effective paraphrase identification system. The experimental results show that the features introduced in this paper improve on paraphrase identification models based solely on standard low-level features or on the optimized matrix metric used for corpus construction.
This paper presents an experimental study of the nature of terminological collocations and of the possibility of ranking them, which depends on their degree of coherence. We propose combining two different approaches to the study of terminological collocations: calculation and experiments with informants. The proposed approach is particularly relevant for scientific areas that still lack precise terminology.
Lecture Notes in Computer Science, 2016
In this paper we consider the information extraction task for a restaurant recommendation system. We develop an information extraction system intended to gather restaurant aspects from users' reviews and pass them to the recommendation module. Since many restaurant aspects are subjective, our task can also be called sentiment analysis, or opinion mining. We therefore present an aspect-based approach to sentiment analysis of restaurant reviews for e-tourism recommender systems. The analyzed frames are service and food quality, cuisine, price level, noise level, etc.; in this paper we focus on service quality, cuisine type and food quality. As part of the preprocessing phase, we propose a method for analyzing a corpus of Russian reviews (as part of information extraction). Its importance is shown in the experimental phase, where the application of machine learning techniques to aspect extraction is analyzed: the information obtained during corpus analysis improves system performance. We conduct experiments with several feature sets and classifiers and show that using resources learnt from the corpus improves the models. Naive Bayes appears to be the best choice for sentiment classification, while Logistic Regression and SVM are best at deciding whether a review is relevant to a particular aspect.
Polibits
In this paper we describe a recommender system that takes into account the venue categories a user is interested in to form a tourist itinerary through a city. The system focuses on user preferences regarding venue aspects, and we develop techniques for extracting such aspects, in particular from review corpora. User preferences are used to weigh the aspects associated with particular sights and restaurants. The filtered venues, along with time restrictions, are then submitted to the recommender system. We discuss a lightweight ontology that describes the restaurant and sightseeing domains and allows comparative analysis of venues to enhance the search for relevant ones. The resulting system performs automated planning of tourist itineraries, flexible sight search, and analysis of venue aspects extracted from reviews in Russian.
examples from the heterogeneous corpus of the STIDS 2013 conference proceedings.