Tweet Segmentation and Its Application To
ABSTRACT:
Twitter can be very useful for arranging a time and place to get together; it works like a
conference call conducted through text messages. It has attracted millions of users who share and
disseminate up-to-date information. A targeted Twitter stream is usually constructed by filtering
tweets with predefined selection criteria. However, many applications in Information Retrieval (IR)
and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets,
whose sentences often contain grammatical and spelling errors. In this paper, we propose a novel
framework for tweet segmentation in batch mode, called HybridSeg. By splitting tweets into
meaningful segments, the semantic or contextual information is well preserved and can be easily
extracted by downstream applications. The segmentation model and named entity identification are
combined in a Named Entity Recognition (NER) system that evaluates the sentences of a tweet and
gives corrections for them.
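To make the idea of splitting a tweet into meaningful segments concrete, the following is a minimal sketch and not the HybridSeg algorithm itself. It assumes a hypothetical table of phrase scores (`PHRASE_SCORE`), a hypothetical maximum segment length (`MAX_LEN`), and made-up default scores; it then picks the segmentation with the highest total score by dynamic programming.

```python
from functools import lru_cache

# Hypothetical phrase scores: higher means the words belong together more strongly.
# A real system would derive these from corpus statistics; the values here are made up.
PHRASE_SCORE = {
    ("new", "york"): 3.0,
    ("new", "york", "city"): 4.5,
    ("marathon",): 1.0,
}
MAX_LEN = 3  # assumed upper bound on segment length, in words


def score(segment):
    """Score one candidate segment (a tuple of lowercase words)."""
    if segment in PHRASE_SCORE:
        return PHRASE_SCORE[segment]
    # Assumed defaults: unknown single words are neutral, unknown phrases are penalized.
    return 0.5 if len(segment) == 1 else -1.0


def segment_tweet(words):
    """Return the highest-scoring split of `words` into consecutive segments."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def best(i):
        # Best (total score, segments) for the suffix words[i:].
        if i == len(words):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(i + MAX_LEN, len(words)) + 1):
            seg = words[i:j]
            rest_score, rest = best(j)
            candidates.append((score(seg) + rest_score, (seg,) + rest))
        return max(candidates, key=lambda c: c[0])

    return [" ".join(seg) for seg in best(0)[1]]


if __name__ == "__main__":
    print(segment_tweet("I love New York City marathon".split()))
```

With these toy scores the phrase "new york city" is kept as one segment because its score exceeds the sum of the scores of any finer split.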
EXISTING SYSTEM:
Because of the limited length of a tweet (i.e., 140 characters) and the absence of restrictions on
writing style, tweets often contain grammatical errors, misspellings, and informal abbreviations.
The error-prone and short nature of tweets often makes word-level language models for tweets
less reliable. For example, given the tweet “I call her, no answer. Her phone in the bag, she
dancing.”, there is no clue to its true theme once word order is disregarded (i.e., under a
bag-of-words model).
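The loss of word order can be illustrated with a short sketch. The `bag_of_words` helper and the scrambled sentence are assumptions made for illustration only; the point is that two texts with different meanings map to the same bag of words.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each word, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


tweet = "I call her, no answer. Her phone in the bag, she dancing."
# Hypothetical rearrangement of the same words with a different meaning.
scrambled = "She call the bag, no phone. Her answer in her, I dancing."

if __name__ == "__main__":
    # Both sentences yield the same word counts, so the model cannot tell them apart.
    print(bag_of_words(tweet) == bag_of_words(scrambled))
```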
We propose and evaluate two segment-based NER algorithms. Both algorithms are
unsupervised and take tweet segments as input. One algorithm exploits the co-occurrence
of named entities in targeted Twitter streams by applying a random walk (RW), under the
assumption that named entities are more likely to co-occur. The other algorithm utilizes the
Part-of-Speech (POS) tags of the constituent words in segments: segments that are likely to
be noun phrases are considered named entities.
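The random walk idea in the first algorithm can be sketched as follows. This is an illustration under stated assumptions and not the exact formulation used in this work: the graph construction (one edge per pair of segments sharing a tweet), the damping factor, the iteration count, and the function name `random_walk_scores` are all hypothetical choices. Segments that co-occur with many other segments end up with higher scores.

```python
from collections import defaultdict
from itertools import combinations


def random_walk_scores(tweets, damping=0.85, iterations=50):
    """Score segments by a damped random walk on their co-occurrence graph.

    `tweets` is a list of tweets, each given as a list of segment strings.
    """
    # Edge weight = number of tweets in which both segments appear.
    weight = defaultdict(lambda: defaultdict(float))
    nodes = set()
    for segments in tweets:
        unique = sorted(set(segments))
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            weight[a][b] += 1.0
            weight[b][a] += 1.0

    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}

    scores = {s: 1.0 / n for s in nodes}
    for _ in range(iterations):
        # Teleportation term: with probability (1 - damping) jump to a random node.
        new_scores = {s: (1.0 - damping) / n for s in nodes}
        for s in nodes:
            total = sum(weight[s].values())
            if total == 0:
                # A segment with no neighbours spreads its score uniformly.
                for t in nodes:
                    new_scores[t] += damping * scores[s] / n
            else:
                for t, w in weight[s].items():
                    new_scores[t] += damping * scores[s] * w / total
        scores = new_scores
    return scores


if __name__ == "__main__":
    toy_stream = [
        ["obama", "white house"],
        ["obama", "white house", "lol"],
        ["obama", "congress"],
        ["lol"],
    ]
    for segment, value in sorted(random_walk_scores(toy_stream).items(), key=lambda kv: -kv[1]):
        print(f"{segment}: {value:.3f}")
```

In the toy stream, "obama" co-occurs with every other segment and therefore receives the highest score; a threshold or top-k cut on these scores would be one possible way to select entity candidates.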
SOFTWARE REQUIREMENTS:
HARDWARE REQUIREMENTS: