Icaiccit 719
Abstract— Sentiment Analysis is a text classification technique used to identify the mind-set or feelings expressed in a text in different forms such as negative, positive, favorable, unfavorable, thumbs up, thumbs down, etc. The majority of the content available on social media websites is in English. With the advent of technology and access to the Internet in a highly populated country like India, people tend to share their views in the language they are most comfortable with. As a result, users share their opinions in code-mixed languages such as Hindi mixed with English. Opinions in the form of tweets or comments are available all over social media, and these views and comments can be analyzed for various purposes. This paper proposes a model that finds the percentage of Hinglish text present in the text retrieved from various social media platforms. Depending on this percentage, the user can keep or discard the text for analysis. The model attains an accuracy of 83%, which may increase further as more words are added to the dataset used to classify the comments.

Keywords— Sentiment Analysis, social media, opinions, Hinglish text, Detection of Hinglish text

I. INTRODUCTION
Sentiment Analysis has become one of the most interesting research topics in the field of Artificial Intelligence. People express their views on social platforms such as Facebook, Twitter, YouTube and Instagram, and a huge amount of data is generated every minute of the day. These views or sentiments can be in any language, and the relevant information must be extracted from them. Businesses can use this ever-growing volume of data in their decision-making process and to improve their policies in the future. People also give feedback about the quality of a product, which can be extracted and analyzed to improve that product. Therefore, if analyzed correctly, the text available on social media platforms can prove useful in many ways.

There are two main categories of textual information: facts and opinions. Facts are objective in nature, whereas opinions are usually subjective. Today, the Web is a place where everyone can post or share reviews about various issues and products. Social media affects users' points of view and hence their decisions, and it has become a crucial part of digital marketing. Sentiment analysis is closely related to opinion mining: it interprets the user's view by finding the polarity of the text, that is, by determining whether the text expresses positive, negative or neutral emotions about a product or issue. Customers nowadays read the reviews of a product on various social media platforms before buying it. If monitored constantly, customer reviews can help in evaluating customer loyalty, keeping track of customer sentiment and analyzing the impact of the marketing activities related to a product.

Sentiment Analysis can be done using two approaches: the Machine Learning Approach and the Lexicon Based Approach. The Lexicon Based Approach is a dictionary-based approach: the data obtained after cleaning is divided into words, which are compared with the words in the dictionaries. A polarity is assigned to each word and the overall polarity is calculated, which classifies the text as positive, negative or neutral with respect to a product or issue. In the Machine Learning Approach, the data is preprocessed using various preprocessing techniques and then fed to different classification algorithms, which classify the data into polarities, i.e. Positive, Negative and Neutral.

Almost everything is available on the internet, and most households have access to it and its services. A variety of languages are spoken and written in India, of which the most commonly used are Hindi and English. Much work has been done on sentiment analysis of individual languages, but considerably less on code-mixed languages. A code-mixed script can combine any two languages, such as Hindi-English (Hinglish), Spanish-English (Spanglish), or any other indigenous Indian language mixed with English. This research work focuses on sentiments written in Hinglish. Its objectives are: to identify the percentage of Hinglish text present in the text to be analyzed; to create a model that finds the most frequently used words in Hinglish texts; to create a domain-specific lexicon (Cyberbullying) to calculate the polarity of given Hinglish words; to classify the sentiments using various machine learning algorithms; and to compare the results of the lexicon-based approach with those of the machine learning algorithms.

As technology has advanced, the majority of the population has gained access to Internet services, and people tend to spend more time on social media platforms consuming content of their choice. They also express their point of view on this content in the language they find convenient. Earlier, people used a single or Unicode language as a mode of sharing their opinions regarding some
Fig. 4 Classification of Comments

D. Step-4 Corpus Creation
A corpus is created by combining the following available corpora of Hinglish text: CMUHinglishDoG, HinglishNorm, hinglish-corpus and Hinglish-TOP-Dataset. This corpus helps to classify the text. Figure 5 shows a glimpse of the corpus used.

$$\text{Percentage of Hinglish text} = \frac{\text{Number of Hinglish words in the text}}{\text{Total number of words in the text}} \times 100$$

F. Step-6 Evaluation
Further, the manual tags (the actual class) are compared with the predicted class to determine the level of accuracy. Figure 6 shows the calculated values of accuracy, recall and F1-score.
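The comparison of manual tags with predicted classes in Step-6 can be illustrated with a minimal Python sketch. It assumes that the manual tags and the model's predictions are available as two equal-length lists of labels; the function name `evaluate` and the label names "hinglish" and "english" are hypothetical, and the sketch is not the authors' evaluation code. Accuracy is the fraction of matching labels, and precision, recall and F1-score are computed per class from true positive, false positive and false negative counts.

```python
def evaluate(actual, predicted):
    """Compare manual tags (actual) with predicted labels.

    Returns (accuracy, per_class), where per_class maps each label to its
    precision, recall and F1-score.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labelled item is required")

    pairs = list(zip(actual, predicted))
    # Accuracy: share of items whose predicted label equals the manual tag.
    accuracy = sum(a == p for a, p in pairs) / len(pairs)

    per_class = {}
    for label in sorted(set(actual) | set(predicted)):
        tp = sum(a == label and p == label for a, p in pairs)  # correctly predicted as label
        fp = sum(a != label and p == label for a, p in pairs)  # wrongly predicted as label
        fn = sum(a == label and p != label for a, p in pairs)  # label missed by the model
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class


if __name__ == "__main__":
    # Hypothetical manual tags and model predictions for five comments.
    manual_tags = ["hinglish", "english", "hinglish", "hinglish", "english"]
    predictions = ["hinglish", "hinglish", "hinglish", "english", "english"]
    accuracy, per_class = evaluate(manual_tags, predictions)
    print(f"accuracy = {accuracy:.2f}")
    for label, scores in per_class.items():
        print(label, {k: round(v, 2) for k, v in scores.items()})
```

Taking the unweighted mean of the per-class values (macro-averaging) gives a single recall or F1 figure for the whole classifier; scikit-learn's `classification_report` reports the same quantities if a library is preferred.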
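The abstract describes finding the percentage of Hinglish text in each retrieved text so that the user can decide whether to keep it for analysis. A minimal sketch follows, assuming the percentage is the share of word tokens found in a Hinglish vocabulary (which could be built from a corpus such as the one created in Step-4), multiplied by 100. The sample vocabulary, the tokenizing regular expression, the 50% threshold and the function names `hinglish_percentage` and `keep_for_analysis` are all hypothetical illustrations, not the paper's implementation.

```python
import re

# Hypothetical sample vocabulary; in practice it would be built from a
# Hinglish corpus rather than hard-coded.
SAMPLE_VOCAB = {"yaar", "bahut", "accha", "hai", "nahi", "kya"}


def hinglish_percentage(text, vocab):
    """Return the share of word tokens in `text` found in `vocab`, times 100."""
    # Lowercase and keep runs of letters/apostrophes as tokens (an assumed,
    # deliberately simple tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hinglish_count = sum(1 for token in tokens if token in vocab)
    return hinglish_count / len(tokens) * 100


def keep_for_analysis(text, vocab, threshold=50.0):
    """Keep the text only if its Hinglish percentage reaches `threshold`."""
    return hinglish_percentage(text, vocab) >= threshold


if __name__ == "__main__":
    comments = [
        "Yaar this phone is bahut accha",
        "The battery life is great",
    ]
    for comment in comments:
        pct = hinglish_percentage(comment, SAMPLE_VOCAB)
        keep = keep_for_analysis(comment, SAMPLE_VOCAB)
        print(f"{pct:5.1f}%  keep={keep}  {comment}")
```

A plain vocabulary lookup counts every listed word as Hinglish, so words spelled the same in English and romanized Hindi (for example "main") can be miscounted; this is one reason the coverage of the word list affects the result.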