Text Processing
Recent papers in Text Processing
Problem statement: This study proposed a multimodal integration method at the concept level to investigate information from multiple modalities. The multimodal data was represented as two separate lists of concepts which were extracted from... more
Dzongkha, the national language of Bhutan, is written continuously and does not mark word boundaries. Dzongkha word segmentation is one of the fundamental problems and a prerequisite that needs to be solved before more advanced... more
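To make the segmentation problem concrete, here is a minimal sketch of greedy longest-match segmentation against a word list. This is a common baseline for unspaced scripts and is used here purely as an illustration; the listing does not say which method the paper uses, and the function name `max_match` and the toy English lexicon are assumptions.

```python
def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation of unspaced text against a lexicon."""
    longest = max(map(len, lexicon), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words


# Toy lexicon (hypothetical). The greedy choice of "cats" blocks "cat" + "sat".
toy_lexicon = {"the", "cat", "cats", "sat", "at"}
segments = max_match("thecatsat", toy_lexicon)
```

The toy input shows why a purely greedy matcher is only a starting point: once it commits to the longest word, it cannot revisit that choice.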
In this paper, we discuss how Artificial Intelligence (AI) techniques might be brought to bear in automatically recognizing "creative reasoning" in student e-discussions. An AI-based graph-matching algorithm was used to find instances of... more
This longitudinal study is focused on the development of learning strategies in low-, average-, and high-achieving children from third to fifth grade (i.e., from age 9-10 to age 11-12). Children's comprehension and learning of expository... more
This paper reports on the application of the Text Attribution Tool (TAT) to profiling the authors of Arabic emails. The TAT system has been developed for the purpose of language-independent author profiling and has now been trained on two... more
In this paper we present a general approach to string matching based on multiple sliding text-windows, and show how it can be applied to some of the most efficient algorithms for the problem based on nondeterministic automata and... more
This paper analyzes free online programs for sentiment analysis which can, on the basis of their algorithms, give a positive, negative or neutral opinion of a text. At the beginning of the paper sentiment analysis programs and techniques... more
India is a country with many spoken languages, and there is a need to digitize books and documents and to convert their text into speech. This paper envisages the design and implementation of Text... more
This work presents two paradigms of text processing (the WYSIWYG vs. WYSIWYM principles) and shows concretely which programs can be used to write and publish (as PDF, among other formats) scholarly papers.
The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained... more
This workshop was conceived with the aim of bringing together the different computational linguistic subcommunities which model language predominantly by way of theoretical syntax, either in the form of a particular theory (e.g. CCG,... more
Advances in computational linguistics and discourse processing have made it possible to automate many language- and text-processing mechanisms. We have developed a computer tool called Coh-Metrix, which analyzes texts on over 200 measures... more
Support vector machines (SVMs) appeared in the early nineties as optimal margin classifiers in the context of Vapnik's statistical learning theory. Since then SVMs have been successfully applied to real-world data analysis problems, often... more
XML documents are used to exchange data. Data exchange implies the transformation of the original data to a different structure. Often such transformations need to be adapted to some specific situation, like the rendering to non-standard... more
The processing of reggaeton songs demands an active engagement from the listener due to high level of fragmentation in the lyrics, which stems from variability in word stress and prosodic segmentation. Changes in word stress challenge the... more
N-grams have been widely investigated for a number of text processing and retrieval applications. This article examines the performance of the digram and trigram term conflation techniques in the context of Arabic free text retrieval. It... more
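As a rough illustration of n-gram term conflation, the sketch below compares two terms by the character digrams or trigrams they share. Scoring the overlap with the Dice coefficient is an assumption made for this example, as are the function names; the listing does not state which similarity measure the article evaluates.

```python
def char_ngrams(term: str, n: int) -> set[str]:
    """Return the set of distinct character n-grams in term."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over shared character n-grams (hypothetical helper)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # a term shorter than n has no n-grams to compare
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

Terms whose score exceeds a chosen threshold would be conflated, that is, treated as variants of the same index term; passing `n=3` switches from digrams to trigrams.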
This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common data structure, together with an arbitrary number of... more
We present the first comprehensive review of research into hemianopic dyslexia since Mauthner's original description of 1881. We offer an explanation of the reading impairment in patients with unilateral homonymous visual field disorders... more
The 8th International Conference on Signal Processing and Pattern Recognition (SIPR 2022) is a forum for presenting new advances and research results in the fields of Digital Processing and Pattern Recognition. The conference will bring... more
This paper describes a custom implementation of a regular expression preprocessor written in PHP. It extends the regular expression functionality by allowing users to define named segments. These segments include custom character classes,... more
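The idea of named segments can be sketched as a small preprocessing step that expands references before the pattern reaches the regular expression engine. The sketch is in Python rather than PHP, and the `{{name}}` reference syntax, the `expand_segments` function, and the sample segments are all assumptions for illustration; the paper's actual syntax is not shown in this listing.

```python
import re

# Hypothetical syntax: a named segment is referenced as {{name}} in a pattern.
_SEGMENT_REF = re.compile(r"\{\{(\w+)\}\}")


def expand_segments(pattern: str, segments: dict[str, str], depth: int = 10) -> str:
    """Replace every {{name}} reference with its definition, recursively."""
    if depth == 0:
        raise ValueError("segment definitions nest too deeply or are cyclic")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in segments:
            raise KeyError(f"undefined segment: {name}")
        # Wrap in a non-capturing group so quantifiers apply to the whole segment.
        return "(?:" + expand_segments(segments[name], segments, depth - 1) + ")"

    return _SEGMENT_REF.sub(substitute, pattern)


# Sample definitions (hypothetical): a custom character class and a segment built on it.
segments = {
    "hex": "[0-9A-Fa-f]",
    "byte": "{{hex}}{{hex}}",
}
color = re.compile(expand_segments("#{{byte}}{3}", segments))
```

Here `color` matches strings such as `#1a2B3c`. Because the expansion happens on the pattern text, the underlying engine needs no changes.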
public class StockQuote {
    public static void main(String[] args) {
        String name = "http://finance.yahoo.com/q?s=";
        In in = new In(name + args[0]);
How different cultures react and respond to a crisis depends predominantly on a society's norms and political will to combat the situation. Often, the decisions made are necessitated by events, social pressure, or the need of the hour,... more
In this paper we present an approach to automatic authorship attribution dealing with real-world (or unrestricted) text. Our method is based on the computational analysis of the input text using a text-processing tool. Besides the style... more
[Table of Contents] http://www.cup.es/us/catalogue/catalogue.asp?isbn=9780521833356&ss=toc Chapt. 38. Inferences during discourse comprehension in Korean (pp. 474-483) - Soyoung Kim Suh, Jung-Mo Lee and Jae-Ho Lee - Publishers:... more
This paper first introduces a newly-recorded high quality Romanian speech corpus designed for speech synthesis, called "RSS", along with Romanian front-end text processing modules and HMM-based synthetic voices built from the corpus. All... more
One of our studies compares the nature of Moore's and Hopcroft's algorithms. This gives some new insight into both algorithms. As we shall see, these algorithms are quite different both in behavior and in complexity. In... more
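For readers unfamiliar with the first of these algorithms, the sketch below shows Moore-style partition refinement for minimizing a complete deterministic finite automaton. It is a textbook-style illustration, not code from the paper; the function name, the argument layout, and the requirement that `delta` be a total dictionary keyed by `(state, symbol)` are assumptions.

```python
def moore_minimize(states, alphabet, delta, accepting):
    """Return the equivalence classes of a complete DFA by Moore-style refinement.

    delta maps (state, symbol) -> state and must be total.
    """
    # Initial partition: accepting vs. non-accepting states.
    block_of = {q: int(q in accepting) for q in states}
    while True:
        # A state's signature is its current block plus the blocks of its successors.
        signature = {
            q: (block_of[q],) + tuple(block_of[delta[q, a]] for a in alphabet)
            for q in states
        }
        ids = {}
        new_block_of = {q: ids.setdefault(signature[q], len(ids)) for q in states}
        if len(ids) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = new_block_of
    classes = {}
    for q in states:
        classes.setdefault(block_of[q], set()).add(q)
    return list(classes.values())


# Toy automaton (hypothetical): states 1 and 2 are both accepting and behave alike.
toy_delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
toy_classes = moore_minimize([0, 1, 2], ["a"], toy_delta, {1, 2})
```

Each round recomputes the signature of every state. Hopcroft's algorithm refines the same kind of partition but drives the work from a list of splitter blocks instead, which is where the two differ in behavior and running time.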
Automatic document classification, due to its various applications in data mining and information technology, is one of the important topics in computer science. Classification plays a vital role in many information management and retrieval... more
The assumption of traditional character educators that children build moral literacy from reading or hearing moral stories is challenged based on research findings. First, research in text comprehension indicates that readers do not... more
This paper introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on... more