Kartik-20CS46 Report
On
SENTIMENT ANALYSIS: OPINION MINING
Session: 2023-24
SUBMITTED TO:                          SUBMITTED BY:
Mr. H.R. Choudhary                     Kartik Yadav
Department of CSE                      20EEACS047
                                       7th Semester
CERTIFICATE
Place: Ajmer
ACKNOWLEDGEMENT
I would like to express my sincere thanks to the Principal, Dr. Rekha Mehra,
for her valuable support.
Kartik Yadav
20EEACS047
TABLE OF CONTENTS
LIST OF FIGURES
ABSTRACT
The goal of this report is to classify review data into sentiments (positive or
negative) by applying different supervised machine learning classifiers to data
collected for different Indian political parties, and to show which political party
the public views most favourably. We also conclude which classifier gives the
highest accuracy during classification.
CHAPTER 1 INTRODUCTION
Sentiment analysis provides some insight into what the most important issues are,
at least from the perspective of customers. Because sentiment analysis can be
automated, decisions can be based on a significant amount of data rather than
on plain intuition, which isn't always right.
It can be done at document, phrase and sentence level. At the document level, a summary
of the entire document is taken first and then it is analyzed to determine whether the
sentiment is positive, negative or neutral. At the phrase level, the phrases in a sentence
are analyzed to check their polarity. At the sentence level, each sentence is classified into a
particular class to provide its sentiment. Sentiment analysis has various
applications. It is used to derive the opinions of people on social media by analyzing
the feelings or thoughts they provide in the form of text. Sentiment analysis
is domain centered, i.e. the results of one domain cannot be applied to other domains.
Sentiment analysis is used in many real-life scenarios: to get reviews about any
product or movie, to gauge the financial outlook of a company, and for prediction or
marketing.
Using machine learning techniques and natural language processing, we can extract
the subjective information of a document and classify it according to its polarity:
positive, neutral or negative. It is a really useful analysis, since we can
determine the overall opinion about a product on sale, or make predictions about the stock
market for a given company: if most people think positively about it, its stock
price may well increase, and so on.
Effective business strategies can be built from the results of sentiment and emotion
analysis. Identifying clear emotions establishes a transparent meaning of the text, which
potentially develops customer relationships and motivation and raises consumer
expectations towards a brand, service or product.
Just as with other data related to customer experience, emotion data is used to create
strategies that improve the business's customer relationship management (CRM).
Sentiment analysis software can be used with a company's data collection,
data classification, data analytics and data visualization initiatives to uncover the
hidden sentiment in text, which can help the company find areas of
improvement as well as the changes that can grow the business through
customer satisfaction.
Twitter is a microblogging platform where anyone can read or write short
messages called tweets. The amount of data accumulated on Twitter is very
large; it is unstructured and written in natural language. Twitter sentiment
analysis is the process of accessing tweets on a particular topic and predicting the
sentiment of these tweets as positive, negative or neutral with the help of different
machine learning algorithms.
1.2 Motivation to Work
1.3.1 Objective
The main objective of this work is to perform sentiment analysis on text from
social media platforms, so that people's opinions about products, services, policies
etc. can be extracted from these online platforms. To achieve this objective we build
a classifier based on supervised learning and perform live sentiment analysis on data
collected for different political parties.
1.3.2 Methodology
The sentiment analysis of Twitter data is an emerging field that needs much more
attention. We use Tweepy, a Twitter API client, to stream live tweets. The user
chooses a keyword based on his interest, and tweets containing that keyword are
collected and stored in a csv file. We then turn this into a labeled dataset using
TextBlob, setting the sentiment field of each tweet accordingly; with that, our raw
(unpreprocessed) training set is ready. Next we preprocess the tweets to clean them
and remove unwanted text and characters. We then train our classifier by fitting the
training data to it, after which we predict results on an unseen test set, which gives
us the accuracy with which the classifier predicts the outcomes. Finally, we present
our results pictorially, since pictures are the easiest way for a reader to take in
the information.
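As a rough sketch of the labelling step, the snippet below scores each collected tweet with TextBlob and fills in the sentiment field. The file name and column names here are illustrative assumptions, not necessarily the exact ones used in the project.

import csv
from textblob import TextBlob

# tweets.csv is a placeholder for the csv produced by the streaming step
with open("tweets.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

for row in rows:
    # polarity ranges from -1 (most negative) to +1 (most positive)
    polarity = TextBlob(row["SentimentText"]).sentiment.polarity
    row["Sentiment"] = 1 if polarity >= 0 else 0

with open("labeled_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["ItemId", "Sentiment", "SentimentText"])
    writer.writeheader()
    writer.writerows(rows)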
Extraction of Data
Tweets based on a keyword of the user's choice are collected using a well-known
Twitter API client, Tweepy, and stored in a csv file. The dataset collected for
sentiment analysis contains tweets matching a keyword, e.g. cybertruck. For
emotion analysis, a dataset of tweets expressing various emotions, downloaded
from Kaggle, is used. Since the two models are trained using supervised learning
and work on different parameters, different datasets are used for each.
To extract opinions, data is first selected and extracted from Twitter in
the form of tweets. After selecting the dataset, the tweets are cleaned
of emoticons and unnecessary punctuation marks, and a database is created
to store the data in a specific transformed structure: all the
transformed tweets are in lowercase and are divided into the different parts of
a tweet in specific fields. The steps adopted for this transformation
are described in the next subsections.
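A minimal sketch of this collection step, assuming Tweepy v4 and placeholder credentials (older Tweepy versions name the search method api.search instead of api.search_tweets):

import csv
import tweepy

# Placeholder credentials; substitute real API keys
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

keyword = "cybertruck"  # keyword of the user's choice
with open("cyber.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["ItemId", "SentimentText"])
    # collect 200 matching tweets and store them row by row
    for i, status in enumerate(tweepy.Cursor(api.search_tweets, q=keyword, lang="en").items(200)):
        writer.writerow([i, status.text])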
Fig: 1.3.2.1.1 Extraction of Data
Processing of Data:
Conversion to lowercase:
To maintain uniformity, all tweets are converted to lowercase, which helps avert
inconsistencies in the data. Python strings provide a method, lower(), for converting
text to lowercase.
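For example:

tweet = "LOVED the new Cybertruck!"
print(tweet.lower())  # prints: loved the new cybertruck!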
Tokenization:
Tokenization is the process of splitting text into tokens before transforming it into
vectors, for example a document into paragraphs or a sentence into words. It also
makes it easier to filter out unnecessary tokens. In this case we tokenize the reviews
into words.
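For example, with NLTK (the punkt tokenizer data must be downloaded once):

from nltk.tokenize import word_tokenize

# nltk.download('punkt') is required on first use
print(word_tokenize("loved the new cybertruck"))
# ['loved', 'the', 'new', 'cybertruck']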
Stemming and Lemmatization:
Sentences are narrated in different tenses and in singular and plural forms, so most
words carry endings such as -ing, -ed, -es and -ies. Extracting the root word therefore
suffices for identifying the sentiment behind the text.
Base forms are the skeleton of grammar: stemming and lemmatization reduce
inflectional and derivational forms to a common base form.
Example: "cats" is reduced to "cat", while a stemmer reduces "ponies" to "poni".
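The two differ in how aggressively they normalize, as this small comparison shows (WordNet data must be downloaded once):

from nltk.stem.snowball import SnowballStemmer
from nltk.stem import WordNetLemmatizer

stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()       # nltk.download('wordnet') on first use
print(stemmer.stem("ponies"))          # poni  (crude truncation)
print(lemmatizer.lemmatize("ponies"))  # pony  (dictionary base form)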
Feature Extraction:
Text data demands a special step before you can train a model. After tokenization,
words are encoded as integers or floating-point values so they can be fed as input to
machine learning algorithms. This practice is described as vectorization or feature
extraction. The scikit-learn library offers a TF-IDF vectorizer to convert text into
word-frequency vectors.
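A minimal sketch with scikit-learn's TF-IDF vectorizer, using toy documents (get_feature_names_out assumes scikit-learn 1.0 or later):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["loved the truck", "hated the truck"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)          # sparse TF-IDF matrix
print(vectorizer.get_feature_names_out())   # the learned vocabulary
print(X.shape)                              # (2, number_of_terms)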
Fitting Data to Classifier and Predicting Test Data:
After feature extraction, the training data is fitted to a suitable classifier. Once the
classifier is trained, we predict the results for the test data and compare the original
values with the values returned by the classifier.
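A self-contained sketch of this step, with made-up toy data standing in for the labelled tweets:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

texts = ["loved it", "great truck", "hated it", "awful truck"] * 25
labels = [1, 1, 0, 0] * 25  # toy sentiment labels

X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3)
clf = MultinomialNB().fit(X_train, y_train)          # fit the training data
print(accuracy_score(y_test, clf.predict(X_test)))   # compare to true labels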
Result Analysis:
Here the accuracies of the different classifiers are compared and the classifier with
the highest accuracy is chosen. Factors such as F-score, mean and variance are also
taken into account when judging the classifiers.
Visual Representation:
Our final results are plotted as pie charts, with fields such as positive, negative and
neutral in the case of sentiment analysis, and happy, sad, joy etc. in the case of
emotion analysis. Pictorial representation conveys information with little effort on
the reader's part, which is why it was chosen.
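A sketch of the plotting step, with illustrative counts:

import matplotlib.pyplot as plt

counts = {"Positive": 120, "Negative": 160, "Neutral": 40}  # illustrative numbers
plt.pie(list(counts.values()), labels=list(counts.keys()),
        autopct="%1.1f%%", startangle=140)
plt.axis("equal")
plt.show()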
1.3.2.2 System Architecture:
Naive Bayes Algorithm:
The Naive Bayes algorithm is based on the well-known Bayes theorem, which is
mathematically represented as

P(A|B) = [P(B|A) * P(A)] / P(B)

where A and B are events:
P(A|B) is the likelihood of event A given that event B is true and has happened,
known as the posterior probability.
P(A) is the likelihood of event A being true, known as the prior probability.
P(B|A) is the likelihood of event B happening given that A was true, known as the
likelihood.
P(B) is the likelihood of event B happening, known as the evidence.
This is a classification method that relies on Bayes' theorem with strong (naive)
independence assumptions between the features. A Naive Bayes classifier assumes
that the presence of a particular feature in a class is unrelated to the presence of any
other feature. For instance, a fruit might be considered an apple if its color is red,
its shape is round and it measures approximately three inches in breadth. Even if
these features depend upon one another or upon the presence of other features, a
Naive Bayes classifier treats all of them as contributing independently to the
probability that this fruit is an apple. Alongside its simplicity, Naive Bayes is known
to outperform even highly sophisticated classification methods. Bayes' theorem
provides a way of computing the posterior probability P(a|b) from p(a), p(b) and
p(b|a) as follows:

p(a|b) = [p(b|a) * p(a)] / p(b)

where p(a|b) is the posterior probability of class a given predictor b, and p(b|a) is the
likelihood, i.e. the probability of predictor b given class a.
The prior probability of class a is denoted p(a), and the prior probability of predictor b
is denoted p(b).
Naive Bayes is widely used in the task of classifying texts into multiple classes and
was recently utilized for sentiment analysis classification.
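A toy numeric application of the rule, with made-up probabilities: suppose 60% of tweets are positive, the word "great" appears in 30% of positive tweets, and "great" appears in 20% of all tweets.

# p(positive | "great") = p("great" | positive) * p(positive) / p("great")
p_a = 0.6          # prior p(positive)
p_b_given_a = 0.3  # likelihood p("great" | positive)
p_b = 0.2          # evidence p("great")
print(p_b_given_a * p_a / p_b)  # approximately 0.9, the posterior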
Fig: 1.3.2.2.2 Naive Bayes Classifier
1.4 Goal of Report
With the emergence of social networking, many websites have evolved in the past
decade, such as Twitter, Facebook, Tumblr, etc. Twitter is one website which is widely
used all over the world; according to Twitter, around 200 billion tweets are posted
every year. Twitter allows people to express their thoughts, feelings, emotions,
opinions, reviews, etc. about any topic in natural language within 140 characters.
Python is a standard high-level programming language well suited to NLP. For
processing natural language data, Python offers the Natural Language Toolkit
(NLTK). NLTK provides a large number of corpora, which help in training classifiers,
and it supports the whole NLP methodology, such as tokenizing, part-of-speech
tagging, stemming, lemmatizing, parsing and performing sentiment analysis on
given datasets.
It is a challenging task to deal with a large dataset, but with the use of NLTK we can
easily classify our data and obtain more accurate results from the different classifiers.
The goal of this thesis is to perform sentiment analysis on reviews from social
media platforms and surveys. Public opinions of political parties are mined from Twitter
and then classified into sentiments, positive or negative, using supervised
machine learning classifiers. These results tell us about the reviews and
opinions people hold of these political parties.
To achieve this goal, a module is created which can perform live sentiment analysis.
With it, users can observe the trend of any live trending topic,
depicted by two sentiment categories (positive and negative) in live graphs. The
accuracy and reliability of the module can further be checked with the help of various
machine learning classifiers.
In this thesis we work on different political parties because politics plays a vital role
in our country, and winning an election is quite different from how the winning party
performs afterwards.
CHAPTER 2 LITERATURE REVIEW
"What other people think” has always been an important piece of information for
most of us during the decision-making process. The Internet and the Web have now
(among other things) made it possible to find out about the opinions and experiences
of those in the vast pool of people that are neither our personal acquaintances nor
well-known professional critics — that is, people we have never heard of. And
conversely, more and more people are making their opinions available to strangers via
the Internet. The interest that individual users show in online opinions about products
and services, and the potential influence such opinions wield, is something that is
driving force for this area of interest. And there are many challenges involved in this
process which needs to be walked all over inorder to attain proper outcomes out of
them.
O. Almatrafi, S. Parack and B. Chavan [11]
These researchers proposed a system based on location. According to them,
sentiment analysis is carried out with natural language processing (NLP) and
machine learning algorithms to extract sentiment from text units originating in a
particular location. They study various applications of location-based sentiment
analysis using a data source from which data can easily be extracted by location.
On Twitter, every tweet carries a location field which can easily be accessed by a
script, so tweets from particular locations can be collected to identify trends and
patterns. Their research covers the Indian general elections of 2014: they mined
600,000 tweets, collected over a period of 7 days, for two political parties. They
applied a supervised machine learning approach, the Naive Bayes algorithm, to
build a classifier which classifies tweets as either positive or negative. They
identified the thoughts and opinions of users towards the two political parties in
different locations and plotted their findings on a map of India using a Python
library.
other classifiers, and it generalized well on unseen examples.
Five datasets were considered to compare the various approaches. In the bag-of-words
approach, each sentence in the dataset was represented by a feature vector composed of
Boolean attributes, one for each word that occurs in the sentence: if a word occurs in a
given sentence, its corresponding attribute is set to 1; otherwise it is set to 0. N-grams
are defined as sequences of words of length n. N-grams can be
used for catching syntactic patterns in text and may include important text features
such as negations, e.g. "not happy". Negation is an important feature for the analysis
of emotion in text because it can totally change the expressed emotion of a sentence.
The author notes that some research studies in sentiment analysis have claimed that
N-gram features improve performance beyond the bag-of-words (BOW) approach.
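The contrast is easy to see with scikit-learn's CountVectorizer, used here purely as an illustration: binary=True yields the Boolean attributes described above, and ngram_range=(1, 2) adds bigrams such as "not happy".

from sklearn.feature_extraction.text import CountVectorizer

docs = ["i am happy", "i am not happy"]
vec = CountVectorizer(binary=True, ngram_range=(1, 2))
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())
# includes unigrams ('happy', 'not', ...) and bigrams ('not happy', ...)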
2.2.3 Classification of Emotions from Text using SVM-based Opinion Mining:
SVM classification using quadratic programming was used. The steps included
preparing the dataset, annotating it with predefined emotions using NLP, preparing
the database matrices of test emotions and training emotions, classifying the training
set with a support vector machine using a quadratic programming algorithm,
computing the prediction of the support vector machine using a kernel function and
its parameters, and finally computing the accuracy of the classification. The basic
idea of SVM is to find the optimal hyperplane separating two classes of
pre-classified data with the largest margin. Once this hyperplane is determined, it is
used to classify data into the two classes based on which side of it they fall. By
applying appropriate transformations to the data space before computing the
separating hyperplane, SVM can be extended to cases where the boundary between
the two classes is non-linear. Finally, on classifying the dataset, superior results
were obtained.
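As a small illustration of this idea, scikit-learn's SVC with a linear kernel stands in for the quadratic-programming solver described in the paper, and the four labelled sentences are toy data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import svm

texts = ["so happy today", "full of joy", "feeling very sad", "sad and gloomy"]
emotions = ["happiness", "happiness", "sadness", "sadness"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = svm.SVC(kernel="linear").fit(X, emotions)   # separating hyperplane
print(clf.predict(vec.transform(["feeling gloomy"])))  # expected: ['sadness']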
CHAPTER 3 DESIGN
3.2.2 Sequence Diagram:
Fig: 3.2.2.2 Sequence Diagram for Sentiment Analysis(II)
3.2.3 Activity Diagram:
3.2.4 Collaboration Diagram:
4.1 Software Requirements
The following software and modules need to be installed for successful
execution of the project:
● Anaconda
● Spyder
● Jupyter Notebook
● NLTK
● scikit-learn
● Matplotlib
● Tweepy
● pandas
● NumPy
● TextBlob
● vaderSentiment
● csv
● re (regular expressions)
● Windows
# Imports assumed from context; the report's listing begins mid-script.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import re

analyzer = SentimentIntensityAnalyzer()
hacking_tweets = r"C:\Users\saile\OneDrive\Desktop\cyber.csv"
COLS = ['ItemId', 'Sentiment', 'SentimentText']
start_date = '2019-10-01'
end_date = '2019-10-31'

# Happy emoticons
emoticons_happy = set([
    ':-)', ':)', ';)', ':o)', ':]', ':3', ':c)', ':>', '=]', '8)', '=)', ':}',
    ':^)', ':-D', ':D', '8-D', '8D', 'x-D', 'xD', 'X-D', 'XD', '=-D', '=D',
    '=-3', '=3', ':-))', ":'-)", ":')", ':*', ':^*', '>:P', ':-P', ':P', 'X-P',
    'x-p', 'xp', 'XP', ':-p', ':p', '=p', ':-b', ':b',
    '>:)', '>;)', '>:-)', '<3'
])

# Sad emoticons
emoticons_sad = set([
    ':L', ':-/', '>:/', ':S', '>:[', ':@', ':-(', ':[', ':-||', '=L', ':<',
    ':-[', ':-<', '=\\', '=/', '>:(', ':(', '>.<',
    ":'-(", ":'(", ':\\', ':-c', ':c', ':{', '>:\\', ';('
])

# Emoji patterns (left commented out, as in the original script)
'''emoji_pattern = re.compile("["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    u"\U00002702-\U000027B0"
    u"\U000024C2-\U0001F251"
    "]+", flags=re.UNICODE)'''

# Combine sad and happy emoticons
emoticons = emoticons_happy.union(emoticons_sad)

# Method clean_tweets(): after Tweepy preprocessing, stray colons remain
# once mentions are removed; this strips stop words from a tweet.
def clean_tweets(tweet):
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(tweet)
    # Minimal completion: the rest of the function was truncated in the report
    return ' '.join(w for w in word_tokens if w not in stop_words)
import pandas as pd
import numpy as np
import re
import nltk
import matplotlib.pyplot as plt
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

train_data = pd.read_csv(r'C:\Users\saile\OneDrive\Desktop\cyber.csv')

# Peek at 50 random tweets from the training data
rand_indexs = np.random.randint(1, len(train_data), 50).tolist()
train_data["SentimentText"][rand_indexs]

# Find which text emoticons occur in the corpus and how often
tweets_text = train_data.SentimentText.str.cat()
emos = set(re.findall(r" ([xX:;][-']?.) ", tweets_text))
emos_count = []
for emo in emos:
    emos_count.append((tweets_text.count(emo), emo))
sorted(emos_count, reverse=True)

HAPPY_EMO = r" ([xX;:]-?[dD)]|:-?[\)]|[;:][pP]) "
# SAD_EMO is used below but its definition was missing from the listing;
# this frowning-emoticon pattern is an assumed reconstruction.
SAD_EMO = r" (:'?[\(/|]) "

# most_used_words() was also referenced without a definition; a minimal
# reconstruction consistent with how it is used:
def most_used_words(text):
    tokens = word_tokenize(text)
    freq = nltk.FreqDist(tokens)
    return [word for word, _ in freq.most_common()]

most_used_words(train_data.SentimentText.str.cat())[:100]

# Keep the 1000 most frequent words that are not stop words
mw = most_used_words(train_data.SentimentText.str.cat())
most_words = []
for w in mw:
    if len(most_words) == 1000:
        break
    if w in stopwords.words("english"):
        continue
    else:
        most_words.append(w)
sorted(most_words)

from nltk.stem.snowball import SnowballStemmer
from nltk.stem import WordNetLemmatizer

def stem_tokenize(text):
    stemmer = SnowballStemmer("english")
    return [stemmer.stem(token) for token in word_tokenize(text)]

def lemmatize_tokenize(text):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(token) for token in word_tokenize(text)]

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.pipeline import Pipeline

class TextPreProc(BaseEstimator, TransformerMixin):
    def __init__(self, use_mention=False):
        self.use_mention = use_mention

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Removing links
        X = X.str.replace(r"https?://\S*", "", regex=True)
        # Replace repeated letters with only two occurrences,
        # e.g. heeeelllloooo => heelloo
        X = X.str.replace(r"(.)\1+", r"\1\1", regex=True)
        # Mark emoticons as happy or sad
        X = X.str.replace(HAPPY_EMO, " happyemoticons ", regex=True)
        X = X.str.replace(SAD_EMO, " sademoticons ", regex=True)
        X = X.str.lower()
        return X
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn import tree

sentiments = train_data['Sentiment']
tweets = train_data['SentimentText']

vectorizer = TfidfVectorizer(tokenizer=lemmatize_tokenize, ngram_range=(1, 2))
pipeline = Pipeline([
    ('text_pre_processing', TextPreProc(use_mention=True)),
    ('vectorizer', vectorizer),
])

learn_data, test_data, sentiments_learning, sentiments_test = train_test_split(
    tweets, sentiments, test_size=0.3)
learning_data = pipeline.fit_transform(learn_data)

lr = LogisticRegression()
bnb = BernoulliNB()
mnb = MultinomialNB()
clf = tree.DecisionTreeClassifier()
clf1 = RandomForestClassifier(n_estimators=10)

models = {
    'logistic regression': lr,
    'bernoulliNB': bnb,
    'multinomialNB': mnb,
}

# 10-fold cross-validation of each model on the training split
for model in models.keys():
    scores = cross_val_score(models[model], learning_data, sentiments_learning,
                             scoring="f1", cv=10)
    print("===", model, "===")
    print("scores = ", scores)
    print("mean = ", scores.mean())
    print("variance = ", scores.var())
    models[model].fit(learning_data, sentiments_learning)

# Grid search over preprocessing and vectorizer parameters
from sklearn.model_selection import GridSearchCV

grid_search_pipeline = Pipeline([
    ('text_pre_processing', TextPreProc()),
    ('vectorizer', TfidfVectorizer()),
    ('model', MultinomialNB()),
])
params = [
    {
        'text_pre_processing__use_mention': [True, False],
        'vectorizer__max_features': [1000, 2000, 5000, 10000, 20000, None],
        'vectorizer__ngram_range': [(1, 1), (1, 2)],
    },
]
grid_search = GridSearchCV(grid_search_pipeline, params, cv=5, scoring='f1')
grid_search.fit(learn_data, sentiments_learning)
print(grid_search.best_params_)

mnb.fit(learning_data, sentiments_learning)
testing_data = pipeline.transform(test_data)
mnb.score(testing_data, sentiments_test)

# Data to plot
pos = 124
neg = 166
labels = 'Positive', 'Negative'
sizes = [pos, neg]
colors = ['gold', 'lightcoral']
explode = (0.1, 0)  # explode 1st slice

# Plot
plt.pie(sizes, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()
import pandas as pd
import numpy as np
import nltk
import re
import itertools
import time
import matplotlib.pyplot as plt
import os

start_time = time.time()

data = pd.read_csv(r'C:\Users\saile\OneDrive\Desktop\project\text_emotion.csv')
sizes1 = []

from nltk.stem.wordnet import WordNetLemmatizer
lem = WordNetLemmatizer()

def cleaning(text):
    txt = str(text)
    # Strip links
    txt = re.sub(r"http\S+", "", txt)
    if len(txt) == 0:
        return 'no text'
    else:
        txt = txt.split()
        # Drop @mentions (replaces the fragile np.delete loop in the listing)
        txt = [w for w in txt if not w.startswith('@')]
        if len(txt) == 0:
            return 'no text'
        else:
            words = txt[0]
            for k in range(len(txt) - 1):
                words += " " + txt[k + 1]
            txt = words
            # Keep word characters only
            txt = re.sub(r'[^\w]', ' ', txt)
            if len(txt) == 0:
                return 'no text'
            else:
                # Collapse runs of a repeated character to at most two
                txt = ''.join(''.join(s)[:2] for _, s in itertools.groupby(txt))
                txt = txt.replace("'", "")
                txt = nltk.tokenize.word_tokenize(txt)
                for j in range(len(txt)):
                    txt[j] = lem.lemmatize(txt[j], "v")
                if len(txt) == 0:
                    return 'no text'
                else:
                    return txt

data['content'] = data['content'].map(lambda x: cleaning(x))
data = data.reset_index(drop=True)

# Join the cleaned tokens back into a single string per row
for i in range(len(data)):
    words = data.content[i][0]
    for j in range(len(data.content[i]) - 1):
        words += ' ' + data.content[i][j + 1]
    data.content[i] = words

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn import svm
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    data.content, data.sentiment, test_size=0.25, random_state=0)
x_train = x_train.reset_index(drop=True)
x_test = x_test.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
y_test = y_test.reset_index(drop=True)

vectorizer = TfidfVectorizer(min_df=3, max_df=0.9)
train_vectors = vectorizer.fit_transform(x_train)
test_vectors = vectorizer.transform(x_test)

model = svm.SVC(kernel='linear')
model.fit(train_vectors, y_train)
predicted_sentiment = model.predict(test_vectors)

report = classification_report(y_test, predicted_sentiment, output_dict=True)
df = pd.DataFrame(report).transpose()
df.to_csv('classification_report.csv', index=False)

sizes = df['support'].tolist()
for i in range(13):
    # assumed fix: append the per-class support counts
    # (the report's listing appended the loop index i here)
    sizes1.append(int(sizes[i]))

labels = ['Anger', 'Boredom', 'Empty', 'Enthusiasm', 'Fun', 'Happiness', 'Hate',
          'Love', 'Neutral', 'Relief', 'Sadness', 'Surprise', 'Worry']
colors = ['Red', 'yellowgreen', 'lightcoral', 'orange', 'gold', 'purple', 'black',
          'pink', 'brown', 'green', 'blue', 'maroon', 'bluegreen']
plt.pie(sizes1, explode=None, labels=labels, colors=None,
        autopct='%1.1f%%', shadow=True, startangle=None)
plt.axis('equal')
plt.show()

predicted_sentiments = []
for s in range(len(predicted_sentiment)):
    predicted_sentiments.append(predicted_sentiment[s])

prediction_df = pd.DataFrame({'Content': x_test,
                              'Emotion_predicted': predicted_sentiment,
                              'Emotion_actual': y_test})
prediction_df.to_csv('emotion_recognizer_svm.csv', index=False)

elapsed_time = time.time() - start_time
print("processing time:", elapsed_time, "seconds")
Table 4.4.3 Sample Cleaned Data

Processed Tweet: think, habit, lie, even, don't, need, tell, angry

Review: :( “@EW: How awful. Police: Driver kills 2, injures 23 at #SXSW http://t.co/8GmFiOuZbS”
Processed Tweet: sad, awful, police, driver, kills, injures
Output
Below are the results of the sentiment and emotion analysis, presented to the user as
pie charts drawn with Matplotlib.
CHAPTER 5 CONCLUSION & FUTURE WORK
In future work, we aim to handle emoticons and to dive deeper into emotion analysis
in order to detect idiomatic statements. We will also explore richer linguistic analysis
such as parsing and semantic analysis.
Some future scope that can be built on our research work:
● A parser can be embedded into the system to improve results.
● A web-based application can be made for our work.
● The system can be improved to deal with sentences that have multiple
meanings.
● The classification categories can be increased to obtain finer-grained results.
● Work can begin on more languages, such as Hindi, Spanish and Arabic, to
bring sentiment analysis to more local audiences.
6. REFERENCES
[1] E. Haddi, X. Liu and Y. Shi, "The Role of Text Pre-processing in Sentiment
Analysis", Procedia Computer Science, vol. 17, 2013.
url: https://doi.org/10.1016/j.procs.2013.05.005
[2] S. M. Mohammad, "Sentiment Analysis: Detecting Valence, Emotions, and
Other Affectual States from Text", National Research Council Canada, Ottawa,
ON, Canada, 15 April 2016.
url: https://doi.org/10.1016/B978-0-08-100508-8.00009-6
[3] H. Tang, S. Tan and X. Cheng, "A survey on sentiment detection of reviews",
Expert Systems with Applications, vol. 36, no. 7, pp. 10760-10773, 2009.
url: https://doi.org/10.1016/j.eswa.2009.02.063
[4] B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment classification
using machine learning techniques", Proc. 2002 Conference on Empirical
Methods in Natural Language Processing (EMNLP), 2002.
url: https://doi.org/10.48550/arXiv.cs/020500
[5] T. Wilson, J. Wiebe and P. Hoffmann, "Recognizing contextual polarity in
phrase-level sentiment analysis", Proc. Human Language Technology Conference
and Conference on Empirical Methods in Natural Language Processing
(HLT/EMNLP), pp. 347-354, 2005.
url: https://aclanthology.org/H05-1044.pdf
[6] "Support Vector Machines" [Online],
http://scikit-learn.org/stable/modules/svm.html#svm-classification,
accessed Jan 2016.
[7] H. Wang, D. Can, F. Bar and S. Narayana, "A system for real-time Twitter
sentiment analysis of 2012 U.S. presidential election cycle", Proc. ACL 2012
System Demonstrations, pp. 115-120, 2012.
[8] B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment classification
using machine learning techniques", Proc. ACL-02 Conference on Empirical
Methods in Natural Language Processing, vol. 10, pp. 79-86, 2002.
[9] B. Pang and L. Lee, "Opinion Mining and Sentiment Analysis", Foundations
and Trends in Information Retrieval, vol. 2(1-2), pp. 1-135, 2008.
[10] E. Loper and S. Bird, "NLTK: the Natural Language Toolkit", Proc. ACL-02
Workshop on Effective Tools and Methodologies for Teaching Natural Language
Processing and Computational Linguistics, vol. 1, pp. 63-70, 2002.
[11] O. Almatrafi, S. Parack and B. Chavan, "Application of location-based
sentiment analysis using Twitter for identifying trends towards Indian general
elections 2014", Proc. 9th International Conference on Ubiquitous Information
Management and Communication, 2015.
[12] L. Jiang, M. Yu, M. Zhou, X. Liu and T. Zhao, "Target-dependent Twitter
sentiment classification", Proc. 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies, vol. 1, pp. 151-160,
2011.
[13] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou and P. Li, "User-level sentiment
analysis incorporating social networks", Proc. 17th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 1397-1405, 2011.
[14] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and
Opinion Mining", vol. 10, pp. 1320-1326, 2010.
[15] B. Sun and T. Y. V. Ng, "Analyzing Sentimental Influence of Posts on Social
Networks", Proc. 2014 IEEE 18th International Conference on Computer
Supported Cooperative Work in Design, 2014.
[16] A. Go, R. Bhayani and L. Huang, "Twitter sentiment classification using
distant supervision", CS224N Project Report, Stanford, pp. 1-12, 2009.
[17] A. Barhan and A. Shakhomirov, "Methods for Sentiment Analysis of Twitter
Messages", Proc. 12th Conference of FRUCT Association, 2012.
[18] T. C. Peng and C. C. Shih, "An Unsupervised Snippet-based Sentiment
Classification Method for Chinese Unknown Phrases without using Reference
Word Pairs", IEEE/WIC/ACM Int. Conf. on Web Intelligence and Intelligent
Agent Technology, vol. 3, pp. 243-248, 2010.