

A Study of Sentiment Analysis: Concepts, Techniques, and Challenges

Chapter · January 2019


DOI: 10.1007/978-981-13-6459-4_16


All content following this page was uploaded by R Lakshman Naik on 06 January 2020.



A Study of Sentiment Analysis:
Concepts, Techniques, and Challenges

Ameen Abdullah Qaid Aqlan, B. Manjula and R. Lakshman Naik

Abstract Sentiment analysis (SA) is the process of extensively exploring data stored on the Web to identify and categorize the views expressed in a piece of text. The intended outcome of this process is to assess the author's attitude toward a particular topic, movie, product, etc., as positive, negative, or neutral. This study illustrates different techniques in the SA approach for extracting and analyzing sentiments associated with positive, negative, or neutral polarity on a selected topic. Social networks can be a useful source of information and data for SA, which has become important in many areas of business, politics, and thought. This study therefore presents a comprehensive overview of the most important studies in this field, from the earliest to the most recent ones up to 2017. Its main aim is to give a full picture of SA techniques, their classification, and the methods they use. We also give a brief overview of big data techniques and their relation to and use in the SA field, because the recent period has witnessed a remarkable development in the use of big data tools (Hadoop) for collecting data and reviews from social networks for analysis.

Keywords Big data · Classification · Challenges · Sentiment analysis · Social media · Twitter

1 Introduction

Nowadays, most people express their feelings and opinions and share their experiences over the Internet and social networks. This leads to massive amounts of data being communicated over the Internet. Most of these data become useful once analyzed; for example, most industrial companies and election campaigns rely on learning people's opinions from communication sites and determining whether they are positive, negative, or neutral. SA emerged because of this huge exchange of information on the Internet. The idea of SA was first proposed by Nasukawa [1]. SA is an application of natural language processing (NLP) [2] that analyzes the opinions, feelings, and reactions that people and writers express on social networking sites and business sites about products and services. Sentiment analysis is a broad field for many researchers and is also called opinion mining, because it helps to classify ideas and opinions as positive, negative, or neutral. SA is a textual study that is widely applied to reviews and surveys on the Internet and social media. It processes responses and customer feedback on commercial sites to determine whether customers accept or reject a product; this helps a company improve its sales, as it reveals customers' choices. With the explosion of opinions on social networking sites, new ideas were generated by systems, politicians, psychologists, manufacturers, and researchers for analyzing these opinions in order to make the best decisions. Sentiment analysis achieves high efficiency by using NLP, statistics, and machine learning approaches to extract and identify the sentiment content of a text unit.

A. A. Q. Aqlan (B) · B. Manjula
Department of Computer Science, Kakatiya University, Warangal 506009, Telangana, India
e-mail: ameenaqlan218@gmail.com
R. Lakshman Naik
Department of Information Technology, Kakatiya University, Warangal 506009, Telangana, India

© Springer Nature Singapore Pte Ltd. 2019
N. Chaki et al. (eds.), Proceedings of International Conference on Computational Intelligence and Data Engineering, Lecture Notes on Data Engineering and Communications Technologies 28, https://doi.org/10.1007/978-981-13-6459-4_16

Fig. 1 Sentiment analysis process steps

2 Sentiment Analysis

Sentiment analysis is becoming very important for studying the opinions that grow ever faster on social media and other sites. In recent years there has been a huge explosion of information on communication sites, in air traffic, and in alternative markets. This volume of information cannot be controlled and analyzed in the traditional way, so scientists and researchers have developed highly efficient techniques to deal with it. SA is required to process the data and determine its polarity so that the right decision can be made. SA involves five steps to process data: data collection, text preparation, sentiment detection, sentiment classification, and presentation of output [3], as shown in Fig. 1.

2.1 Data Collection

Data collection is the first step in sentiment analysis. Data are collected from sources such as user groups, Twitter, Facebook, blogs, and commercial websites such as amazon.com and alibaba.com. These data cannot be analyzed using traditional methods such as scanning, text analysis, and language processing, which are used for extraction and classification. Wei Xu [4] and Tapan Kumar [5] used a method based on the Twitter Streaming API for the task of paraphrasing and gathering tweet data.

2.2 Text Preparation

Text preparation examines the data before they are analyzed. Some reviews and conversations on communication sites contain offensive and inappropriate words, so the data are examined and prepared to make the analysis results more reliable. This process identifies content that is not relevant to the analysis and removes it. The objective is to remove spam and inappropriate reviews before the data are sent to automated analysis. Part-of-speech (POS) techniques can be used for text preparation before analysis [6, 7].
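The filtering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of [6, 7]: the block list `OFFENSIVE_WORDS`, the regular expressions, and the function names are hypothetical, and no POS tagging is performed.

```python
import re

# Hypothetical block list; a real system would use a curated resource.
OFFENSIVE_WORDS = {"idiot", "stupid"}

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")
TOKEN_PATTERN = re.compile(r"[a-z']+")


def prepare_text(review: str) -> list[str]:
    """Lowercase a review, strip URLs and @mentions, and split it into word tokens."""
    text = review.lower()
    text = URL_PATTERN.sub(" ", text)
    text = MENTION_PATTERN.sub(" ", text)
    return TOKEN_PATTERN.findall(text)


def is_acceptable(tokens: list[str]) -> bool:
    """Reject empty reviews and reviews that contain a block-listed word."""
    return bool(tokens) and not any(t in OFFENSIVE_WORDS for t in tokens)


def prepare_corpus(reviews: list[str]) -> list[list[str]]:
    """Return the token lists of the reviews that survive filtering."""
    prepared = (prepare_text(r) for r in reviews)
    return [tokens for tokens in prepared if is_acceptable(tokens)]
```

For example, `prepare_corpus(["Great phone!", "You idiot", "http://spam.example"])` keeps only the first review, because the second contains a block-listed word and the third is empty once its URL is removed.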

2.3 Sentiment Detection

Sentiment detection is the process of finding the sentiment expressed in a review by using machine learning or NLP techniques; it is also called opinion mining (OM) or sentiment analysis. Sentiment detection consists of examining phrases and sentences extracted from reviews and ideas. All sentences containing self-expressions such as beliefs, opinions, and abuse are retained. Many studies in this field have proposed different detection methods; for example, in one recent study, Lakshmish Kaushik [8] proposes a system for automatic sentiment detection in natural audio streams using POS techniques.

2.4 Sentiment Classification

Sentiment classification is the task of extracting text and classifying it according to the polarity of the opinion it contains (Pang 2002), e.g., positive or negative, good or bad, like or dislike. Sentiment classification comprises multiple techniques, which fall into three main groups: the machine learning approach, the hybrid techniques approach, and the lexicon-based approach [9, 10]. At present, the Naive Bayes technique and support vector machines (SVMs) are the most popular techniques for sentiment classification. These techniques can improve the accuracy of tweet classification; for example, Ankur Goel [11] uses Naive Bayes for high-speed sentiment analysis of tweets. Sentiment analysis has attracted a great deal of research, and many applications and improvements have appeared in recent years. In this study, we clarify most of the research in this area. The articles surveyed cover most of the divisions and classifications in the SA field. Sentiment classification techniques are discussed with emphasis on their details, related points, and originating references. Figure 2 illustrates the techniques used in sentiment classification up to 2017.

Fig. 2 Sentiment classification techniques

2.4.1 Lexicon-Based Approach

Many words are used to express sentiment: positive words describe desired things, while negative words describe undesired things. The lexicon-based approach therefore relies mainly on finding an opinion lexicon, which is used for text analysis. There are two methods within the lexicon-based approach: the corpus-based approach and the dictionary-based approach.
Corpus-Based Approach: The corpus-based approach starts with a seed list of opinion words and then finds other opinion words in a large corpus, together with their orientations. In other words, most methods rely on grammatical patterns, or on patterns of co-occurrence with the seed list of opinion words, to find other words in a large corpus. The method of Hatzivassiloglou [12] is one of the most important representatives of the corpus-based approach. It starts from a seed list and uses it with a wide range of linguistic constraints to identify additional words and their orientations. The corpus-based approach is implemented in two different ways, the statistical approach and the semantic approach, as described below.
i. Statistical approach: It is used in many applications related to the field of SA. The best known of these detects the manipulation of reviews by conducting a statistical test of randomness called the runs test [13].
ii. Semantic approach: It assigns sentiment values to words and relies on several principles to calculate the affinity and similarity between different words. The basis of these principles is that words that are semantically close receive similar sentiment values, with closeness determined using, for example, WordNet [14].
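The runs test mentioned under the statistical approach can be illustrated with a short sketch. The source does not describe how [13] applies the test, so the following is only the standard Wald-Wolfowitz runs test with its normal approximation, applied to an assumed binary sequence (for instance, `True` for a positive review and `False` for a negative one, in posting order). The function name is hypothetical.

```python
import math


def runs_test_z(sequence: list[bool]) -> float:
    """Return the Wald-Wolfowitz runs-test z statistic for a binary sequence."""
    n1 = sum(sequence)            # number of True values
    n2 = len(sequence) - n1       # number of False values
    if n1 == 0 or n2 == 0:
        raise ValueError("both outcomes must occur at least once")
    n = n1 + n2
    # A run is a maximal block of identical consecutive values.
    runs = 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if variance <= 0:
        raise ValueError("sequence is too short for the normal approximation")
    return (runs - expected) / math.sqrt(variance)
```

A z value far from zero (for example, beyond ±1.96 at the 5% level) suggests that the order of the reviews is not random: a strongly negative value indicates clustering, such as a burst of consecutive positive reviews, and a strongly positive value indicates unusually regular alternation.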
Dictionary-Based Approach: A comprehensive strategy for the dictionary-based approach has been presented in [15, 16]. In this well-known strategy, a small group of opinion words with known orientations is hand-picked. This group is then grown by searching well-known resources such as a thesaurus [17] or WordNet [18] for all their synonyms and antonyms. The newly found words are added to the seed list, and the next iteration begins. This iterative process stops only when no new words are found.
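The iterative expansion just described can be sketched as follows. This is a minimal illustration, not the implementation of [15, 16]: the dictionaries `SYNONYMS` and `ANTONYMS` are hypothetical toy data standing in for a thesaurus or WordNet lookup, and the assumption that synonyms inherit a word's polarity while antonyms receive the opposite polarity is the usual convention rather than something stated in the text.

```python
# Toy thesaurus standing in for WordNet or an online dictionary (hypothetical data).
SYNONYMS = {
    "good": {"fine", "nice"},
    "nice": {"pleasant"},
    "bad": {"poor"},
    "poor": {"inferior"},
}
ANTONYMS = {"good": {"bad"}, "bad": {"good"}}


def expand_lexicon(seeds: dict[str, int]) -> dict[str, int]:
    """Grow a {word: polarity} lexicon until an iteration adds no new word."""
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for word in frontier:
            polarity = lexicon[word]
            # Synonyms keep the polarity; antonyms get the opposite one.
            related = [(s, polarity) for s in SYNONYMS.get(word, ())]
            related += [(a, -polarity) for a in ANTONYMS.get(word, ())]
            for new_word, new_polarity in related:
                if new_word not in lexicon:
                    lexicon[new_word] = new_polarity
                    next_frontier.append(new_word)
        frontier = next_frontier
    return lexicon
```

Starting from the single seed `{"good": 1}`, the loop reaches "fine", "nice", and "bad" in the first iteration, "pleasant" and "poor" in the second, and "inferior" in the third, and then stops because no new words are found.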

2.4.2 Machine Learning Approach

The machine learning approach is used to solve text classification problems that involve syntactic or linguistic features, whereas the lexicon-based approach extracts sentiment from text using a sentiment lexicon, a collection of known and pre-compiled sentiment terms. Machine learning algorithms are divided into reinforcement learning [17], unsupervised learning, and supervised learning.
Reinforcement Learning Technique: It is concerned with how to make optimal decisions, and it differs from its counterpart, unsupervised learning. This technique has been applied to improve the efficiency of text classification, which makes it important and prominent.
Unsupervised Approach: It is a type of machine learning algorithm used in most cases to draw inferences from data sets consisting of input data without labeled responses. It is used when it is impossible to obtain labeled training documents.
Supervised Learning: It is a type of machine learning approach that uses a data set, called the training data set, to make predictions. This data set contains input data as well as response values. Supervised learning methods make use of a large number of labeled training documents.

Probabilistic Classifiers: These use mixture models for classification. The mixture model assumes that each class is a component of the mixture. Each mixture component is a generative model that gives the probability of sampling a particular term for that component; such classifiers are therefore called generative classifiers.
i. Maximum Entropy Classifier: The maximum entropy classifier is a kind of classifier commonly used in NLP, speech, and data-processing problems. Maximum entropy is a technique for estimating probability distributions; it is widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that, without external knowledge, the most uniform distribution consistent with the known constraints should be preferred.
ii. Bayesian Network classifier: A Bayesian network is defined over a set of variables, each of which takes a limited set of mutually exclusive states. Whereas the Naïve Bayes classifier assumes that the features are independent, the opposite extreme assumption is that all the features are fully dependent. This leads to the Bayesian network model, a directed acyclic graph whose nodes represent random variables and whose edges represent conditional dependencies.
iii. Naïve Bayes: Naïve Bayes is currently the most popular method for text classification. The Naïve Bayes classifier computes the posterior probability of a class based on the distribution of words in the document.
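To make the posterior computation concrete, here is a small sketch of a multinomial Naïve Bayes classifier. It is an illustrative implementation under common assumptions (add-one smoothing, log probabilities, words unseen in training ignored), not the setup of any study cited here; the class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_posteriors(self, tokens: list[str]) -> dict[str, float]:
        """Return the unnormalised log P(class | tokens) for every class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                if token in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denominator)
            scores[label] = score
        return scores

    def predict(self, tokens: list[str]) -> str:
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)
```

Trained on four toy documents, two labeled "pos" (`["good", "great", "film"]`, `["great", "acting"]`) and two labeled "neg" (`["bad", "boring", "film"]`, `["poor", "plot"]`), the model assigns `["great", "film"]` to "pos", because "great" is far more frequent in the positive documents.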
Rule-based Classification: Rule-based classification refers to any scheme that builds its classification from IF-THEN rules.
Linear classifier: A linear classifier makes its decision based on the value of a linear combination of an object's characteristics. These characteristics are also known as feature values and are typically presented to the machine as a vector of real-valued numbers called a feature vector. Linear classifiers are divided into two methods:
i. Neural Network: A neural network is a family of algorithms that recognize the relationships inherent in sets of data using a process similar to the way the human mind works.
ii. Support vector machine: An SVM is a machine learning algorithm that analyzes data sets for classification and regression analysis and processes the data automatically.
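The decision rule of a linear classifier, as defined above, can be written in a few lines. This sketch only shows how a decision follows from a linear combination of feature values; the weights, the bias, and the two-element feature vector are hypothetical and are not learned from data, as they would be by an SVM or a neural network.

```python
def linear_score(features: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Return the linear combination w . x + b of a feature vector."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(features: list[float], weights: list[float], bias: float = 0.0) -> str:
    """Decide the class from the sign of the linear score."""
    return "positive" if linear_score(features, weights, bias) >= 0 else "negative"


# Hypothetical feature vector: [count of positive words, count of negative words].
WEIGHTS = [1.0, -1.0]
```

With these assumed weights, a text with three positive words and one negative word, `classify([3, 1], WEIGHTS)`, is labeled "positive", while `classify([0, 2], WEIGHTS)` is labeled "negative".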
Decision Tree Classifiers (DTC): These are used for classification. A DTC aims to divide large data into small groups that are easy to handle; it uses the values of the attributes and features of the data to predict a discrete class label. It is a fairly simple technique and is widely used in the sentiment analysis field.

2.4.3 Hybrid Techniques Approach

The hybrid techniques approach is a combination of multiple computational techniques that provides greater advantages than the individual techniques and improves sentiment (data) analysis. This approach is convenient for many users because it combines two or more technologies, and it can therefore give better results than the individual methods.

2.5 Presentation of Output

The main objective of analyzing a huge amount of data is to convert unstructured text into useful information and then to display it through charts such as line graphs and bar graphs.

3 Background

SA is the contextual mining of text; it identifies sentences and subjective information in order to classify opinions according to polarity. Sentiment analysis studies people's feelings, opinions, assessments, and attitudes toward services, issues, events, and organizations [17]. SA is not only applied to commercial product reviews; it can also be applied to all types of social communication sites and to stock markets. Three topics fall under the umbrella of sentiment analysis: emotion detection, building resources, and transfer learning.
Emotion detection is a recent field of research that is closely related to SA. The aim of SA is to detect positive, negative, or neutral feelings in text, whereas emotion detection aims to detect and recognize the types of feelings expressed in text, such as anger, disgust, fear, happiness, sadness, and surprise. Building resources refers to creating lexicons, i.e., vocabularies used to express opinions, annotated with positive, negative, or neutral polarity. Transfer learning is the transfer of knowledge from one learned task to a new task in machine learning. Text is classified according to the following criteria. The first criterion is the polarity of the sentiment (positive, negative, or neutral). The second criterion is the polarity of the outcome, which applies to most political articles and to medical facilities managing disease data, as follows [19]:
• Agree or disagree, e.g., in political debates [20];
• Good or bad [21];
• Pros and cons, i.e., positive or negative [22].
Figure 3 shows the steps of sentiment classification and related work in SA.

Fig. 3 Knowledge discovery and pattern recognition architecture

3.1 Knowledge Discovery and Pattern Recognition Architecture

Social networking sites are numerous and full of data; some of these data are important and credible, while others are useless. Reliable data are usually found on cultural sites or shopping sites, because customers there may have experience in dealing with commercial sites. The previous sections illustrated the kinds of techniques and classifications within the field of sentiment analysis and how data are extracted and manipulated to reach reliable results. This section describes the knowledge discovery and pattern recognition architecture shown in Fig. 3.

3.1.1 Social Network

The Internet is the natural environment and the main source for most of the information and ideas shared by users. It is a rich source of sentiment information. From the many ideas and articles we followed, we found that people prefer to publish their content through various online social media, such as forums, microblogs, or online social networking sites. Choosing the data source is the first step in SA. All communication sites are a fertile environment for data collection; most of the data are useful, but there are often abusive data that are not useful, and these are excluded automatically during analysis.

3.1.2 Data Collection

An application program interface (API) is the proposed means of extracting and downloading the data. It supports searching by hashtags, main keywords, and other filters simultaneously [23]. APIs are widely used by researchers and interested companies to collect reviews, but other applications, such as Hadoop Flume, can now also be used to collect data, as in Shirahatti [21].

3.1.3 Natural Language Processing

Natural Language Processing: NLP deals with the processing and application of all human languages, whether written or spoken. This is the main purpose of NLP. NLP works with the parts of a text, which include verbs, adjectives, and nouns.
The collected data may be unorganized and need further processing with one of the algorithms described here (part of speech or n-gram). Part-of-speech tagging divides a sentence into words and assigns each word a category according to its use and function; in English, for example, these categories are noun, pronoun, verb, adverb, adjective, conjunction, preposition, and interjection.
N-gram: An n-gram is simply a sequence of tokens. In the context of computational linguistics, these tokens are usually words, though they can be characters or subsets of characters. The n simply refers to the number of tokens. The term n-gram is used both for the word sequence itself and for the predictive model that assigns it a probability. N-gram analysis divides a sentence into parts by counting tokens; for example, “Friday will be holiday” contains four words and is itself a 4-gram, and “tomorrow is holiday” contains three words and is itself a 3-gram. Named-entity recognition (NER) is used here to divide comments or tweets into smaller parts, each containing two words.
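The n-gram definition above translates directly into code. The following sketch assumes word tokens obtained by whitespace splitting; the function name `ngrams` is chosen here for illustration.

```python
def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous sequence of n tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "Friday will be holiday".split()
bigrams = ngrams(tokens, 2)
# bigrams == [("Friday", "will"), ("will", "be"), ("be", "holiday")]
```

The same four-word sentence yields four 1-grams, three 2-grams, two 3-grams, and a single 4-gram, which is the whole sentence.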

3.1.4 Preprocessing

Preprocessing relies mainly on finding an opinion lexicon, which is used for text analysis; it classifies words as positive, negative, or neutral.
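As a minimal sketch of this step, the lookup below labels single words and whole token lists with an opinion lexicon. The lexicon `OPINION_LEXICON` is a hypothetical six-word example, and summing word polarities to label a text is one simple convention among many, not a method prescribed by the source.

```python
# Hypothetical miniature opinion lexicon: +1 for positive, -1 for negative words.
OPINION_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}


def label(score: int) -> str:
    """Map a numeric polarity score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_word(word: str) -> str:
    """Words missing from the lexicon are treated as neutral."""
    return label(OPINION_LEXICON.get(word.lower(), 0))


def classify_text(tokens: list[str]) -> str:
    """Label a text by the sum of the polarities of its words."""
    return label(sum(OPINION_LEXICON.get(t.lower(), 0) for t in tokens))
```

For instance, "great screen but poor battery and bad camera" contains one positive and two negative lexicon words, so `classify_text` labels it "negative".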

3.1.5 Feature Identifier

Feature identifier: It identifies and classifies the entities referred to in tweets. In the final stage, an SVM classifier is trained to obtain each tweet's label.

3.1.6 Classifier Approach

Classifier Approach: It is used to analyze data sets for classification and regression analysis. In the final stage, it classifies each opinion as positive, negative, or neutral.

3.1.7 Result Analyzation

Result Analyzation: Sentiment analysis commonly uses several rating levels to express the abundance of feelings or vice versa. Sentiment is often evaluated with stars; some websites ask users to rate their material using stars. The star ratings commonly used are as follows [24, 25]:
• Positive: +2 or 5 stars
• Rational positive: +1 or 4 stars
• Neutral: 0 or 3 stars
• Rational negative: –1 or 2 stars
• Negative: –2 or 1 star
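The rating scale above amounts to a simple mapping in which the numeric score equals the number of stars minus three. A short sketch, with hypothetical function names, makes the correspondence explicit.

```python
# Labels taken from the list above, keyed by numeric score.
RATING_LABELS = {
    2: "positive",
    1: "rational positive",
    0: "neutral",
    -1: "rational negative",
    -2: "negative",
}


def stars_to_score(stars: int) -> int:
    """Convert a 1-5 star rating to a score between -2 and +2."""
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("stars must be an integer from 1 to 5")
    return stars - 3


def stars_to_label(stars: int) -> str:
    """Convert a 1-5 star rating to its polarity label."""
    return RATING_LABELS[stars_to_score(stars)]
```

A four-star review therefore has score +1 and the label "rational positive".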

4 Related Work

The purpose of this study is to give a clear picture of most techniques in the field of sentiment analysis so that new researchers can benefit from it. Having described many analysis techniques, we now review some recent studies in this area; this paper also covers a wide range of sentiment classification techniques and approaches in the SA field. Lexicon-based techniques have been applied to data extracted and collected from social networks such as Twitter [25] and Facebook. In one study, the Graph API was used to collect and load all the target data for analysis; all words that do not carry an emotional value or feature were examined, and a list of words was then created and analyzed for use in all cases. The lexicon-based approach showed positive results, predicting the sentiment behind a Facebook status post with high efficiency.
The machine learning approach is not limited to the analysis of social media data; it has also been used to recognize a driver's state while driving. One study sought to generate rules describing the drivers' cognitive deviations directly from a driving simulation environment, using the drivers' eye movements recorded with a simulator device [26].
The dictionary-based approach is used with high efficiency in the field of SA. Seongik Park built a thesaurus lexicon characterized by clarity and credibility [27]. The lexicon was built by gathering thesauruses from three online dictionaries based on the seed words, and only words that could be trusted were stored in it, in order to improve the reputation and credibility of the thesaurus lexicon and establish it as a prominent lexicon.
The dictionary-based approach (DBA) was used to build the thesaurus lexicon. The purpose was to increase the availability of tweets and reviews for sentiment classification without the need for human resources. However, the accuracy obtained increased only slightly.
Ishtiaq Ahsan built a methodology for reviewing opinions and detecting spam through well-known learning methods (active learning and supervised learning), using all the data, including real and fabricated reviews, and obtained very promising results across several different experiments [21]. The results showed that the detection method is very effective. This technique is convenient for many users because it combines two or more technologies, and it can therefore give better results than the individual methods.
Seongik Park [21] uses the dictionary-based approach to build a lexicon for sentiment classification and uses three online dictionaries rich in vocabulary to collect thesauruses based on the seed words in order to improve the reliability of the lexicon. Thesauruses are collections of antonyms and synonyms that expand the lexicon with more vocabulary. Because he focused only on lexicon building, the results improved only slightly.

5 Big Data

Massive amounts of data are stored on communication sites, and these data are increasing rapidly every year. Most of the data stored on the Internet have been produced in recent years, and most of them become useful only when processed. Business, political, and social sites are the main sources of data growth and the largest data incubators.
Big data can be defined as the collection and storage of huge amounts of information for eventual processing. The concept gained momentum in the early 2000s, when Doug Laney articulated what is now the mainstream definition of big data: high volume, high velocity, and high variety. High volume means that large amounts of information are collected from different sources, such as commerce, social media, and other communication channels, and then stored, for example in Hadoop. High velocity means that data must be retrieved, distributed to slave nodes, and processed at very high speed. High variety means that all varieties of data are processed: structured, unstructured, and semi-structured.
Big data involves various tools, including Hadoop, data science, MongoDB, data mining, Teradata, and Python. Hadoop is an open-source big data framework for storing and processing high volumes of data at very high speed. Because Hadoop addresses all of these characteristics of big data, it can be described as a big data framework. The Hadoop framework includes two main modules: Hadoop MapReduce and the Hadoop Distributed File System.
• MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm involves two important tasks, map and reduce. The map task takes a set of input data and converts it into another set of data. The reduce task takes the output of a map as its input and produces the final output values; the reduce task is always performed after the map task.
• The Hadoop Distributed File System (HDFS) is designed to store huge data sets reliably. HDFS creates several replicas of the data for reliability and places them on compute nodes around the cluster.

Fig. 4 Data science terms
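The map and reduce tasks described above can be imitated on a single machine to show the data flow. The sketch below is plain Python, not Hadoop code: it counts words across documents, and the intermediate `shuffle` step, which groups values by key between the two tasks, is made explicit. All function names are chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: convert one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Group all values that share a key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine the values of one key into a single output value."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Calling `word_count(["good phone", "good battery bad screen"])` counts "good" twice and every other word once. In a real Hadoop job the map and reduce functions would run in parallel on different nodes, with input and output stored in HDFS.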
Data Science: It is very suitable for data analysis, so it can boost sentiment analysis. Data science deals with structured, unstructured, and semi-structured data. In simple terms, it incorporates statistics, mathematics, analysis, signal processing, natural language processing, etc. Data science is an umbrella that covers several terms, as illustrated in Fig. 4.

5.1 Big Data Tools in Sentiment Analysis

Using modern big data programs is currently very important for sentiment analysis. The use of big data for sentiment analysis is very appropriate and still at an early stage of growth. Most of the comments available on social media are unstructured, so Hadoop technology can be used, because Hadoop can deal with structured, unstructured, and semi-structured data. One research study determined diabetes awareness among different segments of the population using Hadoop MapReduce [21]. Monu Kumar [28] extracts and collects data from social networks and analyzes them using big data techniques. Hadoop MapReduce is an effective big data processing technique, so Hadoop is a priority choice for data collection and analysis. Shirahatti [21] collects comments and reviews from Twitter using the streaming tool Flume and then processes them with Hadoop MapReduce. The processing time was much lower than that of previous methods.

6 Sentiment Analysis Challenges

Researchers in this field face several constraints and challenges, which come in the form of vague sentences or words that are difficult to interpret. These constraints are an obstacle to analyzing the targeted data and may lead to unreliable outcomes. Different types of obstacles challenge sentiment analysis that uses known questionnaires, simple questionnaires, or role-based questionnaires [22], whereas clearly described events are readily accepted and are likely to attract good, high-quality comments. Some types of challenges are listed below.

i. Success or failure of one side: In many daily events, expressions and phrases are used to describe the win or loss of one side. For example: Brazil defeated Argentina 2-1; the Supreme Court ruled in favor of early marriage; the regime's army suppressed a popular uprising. A person who supports Brazil, early marriage, and the ruling regime will consider these events positive, but a person who supports Argentina, marriage over the age of eighteen, and the popular uprising will consider them negative. We also note that reporting an event as the win of one party (or as the loss of another) does not mean that the speaker expresses a negative or positive opinion toward either party. For example, when Sevilla defeated Real Madrid 2-1 in the Spanish League in 2017, news outlets around the world mostly reported that Real Madrid lost to Sevilla rather than that Sevilla beat Real Madrid. This does not mean that the speakers or those involved in the broadcast reports expressed a negative opinion of Real Madrid; rather, the royal team was their focus, because they thought that Real Madrid would not lose easily, having gone forty matches unbeaten before the Sevilla match.

ii. Precisely understanding what is meant by an opinion: Sometimes it is difficult to understand the exact meaning of a sentence. Consider, for example, “Glad the disciplinary committee exposed player Noor's steroids.” The target of the opinion is unclear: does the speaker mean Noor, Noor's doping, or the fact that Noor's doping was revealed? In this case, one can conclude that the speaker has a negative view of Noor's doping and probably of Noor in general.
iii. Neutral reports of events: When an announcer broadcasts a report, the speaker would need to signal his or her emotional state before describing the events or situations; otherwise, it is unclear whether the report should be considered emotional about the developments or whether it presupposes that the speaker is in a positive or negative emotional state (happy, angry, cheerful, sad).
iv. Detection abusive opinions and counterfeit reviews: On the Web, there is a
lot of spam information which contains spam and abusive reviews for sentiment
classification; it is unacceptable to process data with a presence of fake data
because this reduces the reliability of the results; we should initially identify
unwanted messages and remove them and then the processing, These steps we
can do through reviewer [29].
v. The focus on one domain: This challenge is a major obstacle to sentiment
analysis because it depends mainly on a limited nature from sentiment analysis
word; this may lead to focus on only one topic. For example, we may find in one
domain several features and good performance; at the same time, these features
may be very bad in some other domains [30].
vi. Difficult acquisition of opinion mining software: Opinion mining software is
very expensive and, for now, can be bought only by governments and large
organizations. These prices exceed people’s expectations, put the software out of
reach of the average person, and will remain a major obstacle for anyone wanting to
obtain it. Therefore, this software should be made available to all categories of
society without exception so that everyone benefits from it.
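
The preprocessing step described in challenge iv (identify unwanted messages, remove them, and only then process the data) can be illustrated with a minimal sketch. It is not the method of [29]: the two heuristics used here (a review that contains a link, or that repeats an earlier review), the function name `filter_reviews`, and the sample reviews are all assumptions chosen for illustration.

```python
import re

# Hypothetical heuristic: a review that contains a link is treated as spam.
URL_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)


def normalize(text):
    """Lowercase the text and collapse whitespace so repeated copies match."""
    return " ".join(text.lower().split())


def filter_reviews(reviews):
    """Return the reviews that pass the spam heuristics, in original order."""
    seen = set()
    kept = []
    for review in reviews:
        key = normalize(review)
        if not key:
            continue  # an empty review carries no opinion
        if URL_PATTERN.search(key):
            continue  # link-bearing reviews are treated as promotional spam
        if key in seen:
            continue  # duplicate of a review that was already kept
        seen.add(key)
        kept.append(review)
    return kept


# Illustrative data only; these reviews are not taken from any study.
sample_reviews = [
    "Great phone, the battery lasts two days.",
    "great phone,  the battery lasts two days.",
    "Buy cheap watches at www.example.com",
    "The screen cracked after one week.",
]
clean_reviews = filter_reviews(sample_reviews)
```

Here `clean_reviews` keeps only the first and last reviews. Only this filtered list would then be passed to a sentiment classifier. A real spam detector would need far richer evidence than these two rules.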

7 Conclusion

This study focuses mainly on an overview of the different techniques used in the
field of SA. Drawing on thirty-eight papers published up to 2017, we discussed the
importance of opinions and comments on Web sites and how to extract them through
certain techniques. We noted the important techniques in this area, among which
Naive Bayes and SVM are the machine learning algorithms most commonly used to solve
the sentiment classification problem. Many researchers have extended the scope of
this field. The aim of this study is to give an overview of these improvements and
to summarize the categories of articles according to the different sentiment
analysis approaches. As a contribution, this study takes a brief look at the use of
big data (Hadoop tools) in the sentiment analysis field. The use of big data is a
fairly new line of study in the SA field. Therefore, our future work will focus on
using big data techniques (Hadoop) in sentiment analysis to give more effective and
accurate results. This paper will be useful for new researchers and for anyone who
wishes to join this field. In this study, we covered the techniques, including the
most famous ones, in one paper and illustrated different techniques in the SA
approach for extracting and analyzing sentiments associated with positive, negative,
or neutral polarity on the selected topic.
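
As an illustration of the Naive Bayes approach to sentiment classification mentioned above, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not drawn from any of the surveyed papers: the function names, the whitespace tokenizer, and the four-sentence training set are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split on whitespace after lowercasing (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_texts):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return {
        "class_doc_counts": class_doc_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
        "total_docs": len(labeled_texts),
    }


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, float("-inf")
    for label, doc_count in model["class_doc_counts"].items():
        # Log prior: share of training documents that carry this label.
        score = math.log(doc_count / model["total_docs"])
        total_words = sum(model["word_counts"][label].values())
        for word in tokenize(text):
            if word not in model["vocabulary"]:
                continue  # words never seen in training are ignored
            count = model["word_counts"][label][word]
            # Add-one smoothing avoids a zero probability for unseen pairs.
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Illustrative training data only.
training_data = [
    ("good great film", "positive"),
    ("great acting good story", "positive"),
    ("bad boring film", "negative"),
    ("boring story bad acting", "negative"),
]
model = train_naive_bayes(training_data)
```

With this toy model, `predict(model, "good story")` returns `"positive"` and `predict(model, "bad acting")` returns `"negative"`. A practical system would train on a large labeled corpus and add preprocessing such as negation handling.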

References

1. Nasukawa Y (2003) Sentiment analysis: capturing favorability using natural language
processing, IBM Almaden Research Center, CA 95120, https://doi.org/10.1145/945645.945658
2. Mohey D (2016) A survey on sentiment analysis challenges. J King Saud Univ Eng,
https://doi.org/10.1016/j.jksues.2016.04.002
3. Alessia D (2015) Approaches, tools and applications for sentiment analysis implementation.
Int J Comput Appl 125(3)
4. Xu W, Ritter A, Grishman R (2013) Gathering and generating paraphrases from twitter with
application to normalization
5. Hazra TK (2015) Mitigating the adversities of social media through real time tweet extraction
system, IEEE, https://doi.org/10.1109/iemcon.2015.7344483
6. Semih Y (2014) Tagging accuracy analysis on part-of-speech taggers. J Comput Commun
2:157–162, https://doi.org/10.4236/jcc.2014.24021
7. El-Din DM (2015) Online paper review analysis. Int J Adv Comput Sci Appl 6(9)
8. Kaushik L (2013) Sentiment extraction from natural audio streams, IEEE,
https://doi.org/10.1109/icassp.2013.6639321
9. Vaghela VB (2016) Analysis of various sentiment classification techniques. Int J Comput Appl
140(3)
10. Biltawi M (2016) Sentiment classification techniques for Arabic language a survey, IEEE,
https://doi.org/10.1109/iacs.2016.7476075
11. Goel A (2016) Real time sentiment analysis of tweets using naive bayes, IEEE,
https://doi.org/10.1109/ngct.2016.7877424
12. Hu M, Liu B (2004) Mining and summarizing customer reviews, Seattle, Washington, USA,
https://doi.org/10.1145/1014052.1014073
13. Kim S-M (2004) Determining the sentiment of opinions, ACM Digital Library,
https://doi.org/10.3115/1220355.1220555
14. Mohammad S (2009) Generating high-coverage semantic orientation lexicons from overtly
marked words and a thesaurus. In: Conference on empirical methods in natural language
processing, pp 599–608
15. Miller GA (1993) Introduction to WordNet: an on-line lexical database
16. Hatzivassiloglou V, McKeown R (1998) Predicting the semantic orientation of adjectives, New
York, N.Y. 10027, USA
17. Medhat W (2014) Sentiment analysis algorithms and applications a survey. Ain Shams Eng J
(Elsevier B.V.), 5(4):1093–1113
18. Kim S-M (2004) Determining the sentiment of opinions, International Journal,
doi=10.1.1.68.1034
19. Pang B, Lee L (2008) Opinion mining and sentiment analysis.
https://doi.org/10.1561/1500000011
20. Niu Y (2005) Analysis of polarity information in medical text, PMC Journal
21. Park S (2016) Building thesaurus lexicon using dictionary based approach for sentiment
classification, IEEE, https://doi.org/10.1109/sera.2016.7516126
22. Ramsingh J (2016) Data analytic on diabetic awareness with Hadoop streaming using map
reduce in Python, IEEE, https://doi.org/10.1109/icaca.2016.7887979
23. Kim S-M, Hovy E (2006) Automatic identification of pro and con reasons in online reviews,
ACM Digital Library
24. Trupthi M (2017) Sentiment analysis on twitter using streaming API, IEEE,
https://doi.org/10.1109/iacc.2017.0186
25. Cambria E, Hussain A (2015) Group Using Lexicon Based Approach. Springer J,
https://doi.org/10.1007/978-3-319-23654-4
26. Akter S (2016) Sentiment analysis on Facebook group using lexicon based approach, IEEE,
https://doi.org/10.1109/ceeict.2016.7873080
27. Yoshizawa A (2016) Machine-learning approach to analysis of driving simulation data, IEEE,
https://doi.org/10.1109/icci-cc.2016.7862067
28. Istiaq Ahsan MN (2016) An ensemble approach to detect review spam using hybrid machine
learning technique, IEEE, https://doi.org/10.1109/iccitechn.2016.7860229
29. Kumar M (2016) Analyzing Twitter sentiments through big data, IEEE,
https://doi.org/10.1109/sysmart.2016.7894530
30. Abhinandan P, Shirahatti (2015) Sentiment analysis on Twitter data using Hadoop. Int J Eng
Res Gen Sci 3(6)
