International Journal of Pure and Applied Mathematics
Volume 119 No. 15 2018, 3509-3514
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue
Sentiment Analysis of Online Food Reviews using
Customer Ratings
Sasikala P#1, L. Mary Immaculate Sheela*2
# Department of Computer Science, Mother Teresa Women's University, Kodaikanal, India. 1shakthesasi@gmail.com
* Pentecost University College, Accra, Ghana. 2drsheela09@gmail.com
Abstract— The growth of web contributes a huge quantity
of user-created content such as customer feedback,
opinions and reviews. Sentiment analysis on the web addresses
the problem of aggregating web data and extracting
opinions from it. Studying customer opinions helps
to determine how people feel about a product and how it
is received in the market. Various commercial tools are
available for sentiment analysis. In this paper, we propose
a system which classifies reviews on a scale of 1 to 5
based on the sentiments in the words. The groups of words
used to decide the rating of the reviews are displayed
as a word cloud.
Keywords— Customer reviews, Machine Learning, Opinion
mining, Natural Language Processing, Sentiment Analysis.
I. INTRODUCTION
"What others think?" is always important information in a
decision-making process. Every day people discuss various
products on social media sites. The web and its associated
distribution services provide information services, such as
online services in which data objects are linked together to
facilitate interactive access. Web pages may not have a
predefined schema or pattern, so it is difficult for computers
to understand their semantic meaning. Companies want to
determine how their audience communicates in order to
find the important information that drives business. Sentiment
analysis is the automated mining of opinions and feelings from
content through Natural Language Processing (NLP). It
categorizes the opinions in the given content or documents as
"positive", "negative" or "neutral" [1].
Sentiment analysis can be performed at three levels: aspect
level, document level, and sentence level. Document-level
sentiment analysis considers the entire document as a single
topic and classifies its sentiment as positive or negative. At
the sentence level, the sentiment expressed in each sentence
is classified. Since sentences are parts of documents, there is
not much difference between the sentence and document
levels. To obtain detailed opinions or sentiments, the document
has to be processed at the aspect level. Aspect-level sentiment
analysis determines the sentiment conveyed towards
each aspect of the given target entities. The proposed model
handles sentiment polarity classification, which is a
fundamental problem of sentiment classification. Online
data have several drawbacks that hinder the sentiment analysis
task. The first is that anybody can post their own content,
and the quality or impact of their comments is not assured.
Online spammers may post fake opinions [4]. The second is
that the polarity of reviews may be unavailable or cannot be
ascertained. The dataset used in this paper is around 3000 food
reviews collected from Amazon. Each post on Amazon is
inspected and verified by the company before it gets posted.
Each review has a rating from 1 to 5 stars, which can be used
to identify the sentiment polarity.
II. BACKGROUND AND LITERATURE REVIEW
Sentiment analysis techniques fall into machine learning,
lexicon-based and hybrid approaches. Machine learning
techniques are implemented as supervised classification. The
lexicon-based approach depends on a collection of opinion
terms. The hybrid approach combines the lexicon-based and
machine learning approaches. Computational techniques
involving languages are called natural language processing.
Most existing research focuses on mining data available online
on the web. Before the World Wide Web, information or
opinions about products were collected manually through
surveys.
Fang and Zhan [8] proposed a work on product reviews
collected from Amazon to identify negation phrases.
Sentence-level and review-level classification was
performed on data collected from February to April 2014.
Aashutosh Bhatt et al. [3] used reviews of the iPhone 5
extracted from the Amazon website and suggested a rule-based
extraction of product features for sentiment analysis. POS
tagging is applied to every sentence and the results
are shown in charts. Ahmad Kamal [2] used supervised and
rule-based techniques to mine opinions from online
product reviews.
III. RESEARCH DESIGN AND METHODOLOGY
Sentiment classification requires selecting and extracting text
features. Feature selection in sentiment analysis involves
collecting information from reviews on the web and performing
the following steps.
Data Preparation: The data preparation step pre-processes
the data and removes all non-textual information and tags.
Pre-processing cleans the data by removing
information such as the review date and the name of the
reviewer, which is not required for sentiment analysis.
Review Analysis: Finds parts of speech (POS) such as adjectives
and counts their presence and frequency.
Sentiment Classification: Classifies the extracted words as
positive or negative.
The architectural view of the system is given in Fig. 1.
Fig. 1 Architecture of the proposed system.
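The data preparation step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout, the field names (reviewer, date, score, text) and the sample review are hypothetical and do not come from the dataset used in this paper.

```python
import html
import re

# Hypothetical raw record; the field names are assumptions for illustration.
raw_review = {
    "reviewer": "A. Customer",
    "date": "2012-10-26",
    "score": 5,
    "text": "Great taffy!<br /><br />Soft &amp; chewy, arrived quickly.",
}

TAG_PATTERN = re.compile(r"<[^>]+>")


def prepare(record):
    """Keep only the fields needed for sentiment analysis and strip markup."""
    text = TAG_PATTERN.sub(" ", record["text"])  # remove tags such as <br />
    text = html.unescape(text)                   # decode entities such as &amp;
    text = re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace
    # The reviewer name and review date are dropped on purpose.
    return {"score": record["score"], "text": text}


cleaned = prepare(raw_review)
print(cleaned)
```

Only the score and the cleaned text survive, which mirrors the removal of the review date and reviewer name described in the Data Preparation step.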
The subjectivity of each review sentence is detected using a
part-of-speech (POS) tagger [6]. The POS tagger is used to
filter out words that do not carry any sentiment.
A collection of nouns (dog, restaurant, cat, etc.), verbs (is, are,
would, etc.) and adjectives (hot, beautiful, fragrant, etc.) is
used to decide the subjectivity of the sentence. Frequently
used features are identified and stored as a vector of text. These
features are converted to lower case.
All punctuation and stop words are then removed. The
word frequency matrix is created in the next step. Opinion
words with positive, negative or neutral sentiment polarity are
extracted and assigned a score value. Opinion words are
mined from the summary and displayed as a word cloud. The best
reviews are those with a score of 5 or 4 stars, average
reviews are those with a score of 3 stars, and the worst reviews
are those with a score of 2 stars or 1 star.
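The normalisation and grouping steps in this paragraph might look like the following Python sketch. The stop-word list is a tiny assumed sample, the example sentence is made up, and the function names are hypothetical; it is not the implementation used in this work.

```python
import string
from collections import Counter

# A small illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "but", "this", "i"}


def tokenize(text):
    """Lower-case the text, strip punctuation and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [word for word in no_punct.split() if word not in STOP_WORDS]


def review_group(stars):
    """Group a review by its star rating: 4-5 best, 3 average, 1-2 worst."""
    if stars >= 4:
        return "best"
    if stars == 3:
        return "average"
    return "worst"


tokens = tokenize("The taffy is great! Wonderful, tasty taffy.")
frequencies = Counter(tokens)  # word -> number of occurrences
print(tokens)
print(frequencies.most_common(1))
print(review_group(5), review_group(3), review_group(1))
```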
Procedure ScoreOrientation(score t, summary si)
begin
    for each opinion word op in summary si
        if (t > 3)
            si orientation = positive
        else
            si orientation = negative
    endfor;
end
Fig. 2 Prediction of score orientation.
The procedure in Fig. 2 is used to determine the score
orientation and sentiment polarity. The results are displayed as
word clouds for score ratings from 1 to 5. R is a popular tool
for statistical analysis using data frames [7]. It is an
open-source tool that runs on various operating systems.
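One possible reading of the procedure in Fig. 2, written in Python rather than R, is sketched below. The function name and the returned structure are assumptions made for illustration. As in Fig. 2, a score above 3 is treated as positive and every other score, including 3, as negative.

```python
def score_orientation(score, opinion_words):
    """Give each opinion word of a summary the orientation implied by the score.

    Mirrors Fig. 2: scores above 3 are positive, all other scores are negative.
    """
    orientation = "positive" if score > 3 else "negative"
    return {word: orientation for word in opinion_words}


print(score_orientation(5, ["great", "taffy"]))
print(score_orientation(2, ["cough", "medicine"]))
```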
IV. EXPERIMENTAL SETUP AND RESULT EVALUATION
In this section we assess some of the design choices
explained in the preceding sections. The dataset consists of
about 500,000 fine food reviews posted by more than 200,000
customers about more than 70,000 food products [5]. For the
experiments, the latest 3000 fine food reviews are
used.
A. Sentiment Computation using scores
The products are rated on a five-star scale, as shown in
Fig. 3. A score of 1 indicates very negative, 2 is negative,
3 is neutral, 4 is positive and 5 is very positive.
The reviews are extracted from the Amazon website and stored
in an SQLite database. A supervised machine learning technique
is planned to group the subjectivity words in the review
sentences.
Fig. 3 Histogram of score
TABLE I
SUMMARY OF USER PROVIDED SCORE
Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
1.000    4.000    5.000    4.146    5.000    5.000
The above table summarises the user-provided scores; the mean
score is 4.146.
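A six-value summary of this kind (minimum, quartiles, median, mean, maximum) can be reproduced with a few lines of Python. The ratings below are hypothetical and are not the paper's data. The "inclusive" quantile method is chosen on the assumption that it corresponds to the default quantile definition in R.

```python
import statistics

# Hypothetical star ratings, not the paper's data.
scores = [5, 4, 5, 1, 5, 3, 5, 4, 2, 5]


def six_number_summary(values):
    """Return min, quartiles, median, mean and max of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.mean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


summary = six_number_summary(scores)
for name, value in summary.items():
    print(f"{name:8s} {value:.3f}")
```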
Fig. 4 Histogram of average sentiment
TABLE II
SUMMARY OF AVERAGE SENTIMENT
Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
1.06600  0.04845  0.20960  0.21800  0.37710  2.60300
The histogram of average sentiment is shown in Fig. 4.
The 3000 fine food reviews are used to find the average
sentiment. The mean of the average sentiment is 0.21800, as
shown in Table II. There is a large difference between the
score and the average sentiment.
TABLE III
TOP BEST REVIEWS BY SENTIMENT SCORE
S.no  Summary
1     "I liked getting this one for my twin 10-month olds because
      it has such interesting ingredients, particularly the zucchini
      and the garbanzo beans. I like the fact that it's vegetarian
      yet I believe it has a complete protein since it also has brown
      rice. This list of ingredients looks excellent and it's a green
      color naturally.<br /><br />The babies liked it. They have
      liked everything that has carrots.<br /><br />I have the
      subscription for this one as I really like the ingredients and
      the fact that I'm giving the babies food that is naturally
      green.<br /><br />Ingredients per the label: Water,
      organic corn, organic zucchini, organic green beans,
      organic brown rice, organic carrots, organic garbanzo
      beans, organic canola oil.<br /><br />This has 70 calories,
      8% protein, 40% vitamin A, 10% vitamin C, 2% calcium and
      2% iron. I've been informed that the percentage of daily
      needs met is based on an adult."
2     "This is the very best marinade you can buy as far as I am
      concerned---of course, everyone to his or her own tastes, but
      all in all, I think anyone will like it---I give it five stars..."
3     "My partner is very happy with the tea, and is feeling much
      better since starting to drink it.<br />She has been drinking
      it both hot (normal) and iced (chilled) and likes the
      refreshing nature of it."
4     "I ordered this for a friend at South Carolina for Christmas,
      i want her to try some Italian chocolates, lol!<br /><br
      />The item was arrived earlier than expected, but,
      nevertheless, it was well packed and only melts on her
      mouth, she totally loves it.<br /><br />Three different taste
      and flavors of Ferreros guarantees to give three different
      yummy joy, Mmmmm...
5     "Excellent shipment and product as well , looking forward
      to purchase again , thanks<br />Excellent shipment and
      product as well , looking forward to purchase again ,
      thanks<br />Excellent shipment and product as well ,
      looking forward to purchase again , thanks"
The top best reviews by sentiment score are listed in the above
table. The text in the Summary field is the best reviews fetched
from the dataset. To extract the opinion words, the above text
has to be pre-processed: punctuation, special symbols like <br>
and stop words have to be removed. The top worst review
comments are presented in TABLE IV below, for example "Being a
salt-free product is why I purchased this, but the chips are
quite greasy." The word "greasy" is used as the opinion word to
decide the sentiment polarity. The sentiment or opinion of each
term in the summary is analysed.
TABLE IV
TOP WORST REVIEW COMMENTS BY SENTIMENT SCORE
S.no  Summary
1     "Being a salt-free product is why I purchased this, but the
      chips are quite greasy."
2     "My husband and I were very disappointed in this coffee,
      very weak, watery cup of coffee. A definite waste of $13.00."
3     "Here is a nutritional breakdown:<br /><br />Calories:
      100<br />Calories from Fat: 30<br />Total Fat: 3g<br
      />Saturated Fat, Trans Fat, Poly and Mono: 0g<br
      />Cholesterol: 0mg<br />Sodium: 160mg - 7%<br
      />Potassium 180mg - 5%<br />Total Carbohydrates 15g 5%<br
      />Dietary Fiber 1g - 4%<br />Sugars 2g - 4%<br
      />Protein 1g<br />Calcium 2%<br />Iron 2%<br /><br
      />For Weightwatcher folks this breaksdown to 2 Points......."
4     "If you are ordering cookies, beware, the 'cookies' arrived
      in crumbles, so, if you want a box of stale crumbs order
      away :)"
5     "No doubt about it this is just the right combination of
      Lobster, scampi, prawns with a hint of Dry white wine & a
      dash of brandy finish this soup to perfection.<br
      />Ingredients<br />Water, Lobster (4%), Cod (3.5%),
      Scampi (2.5%), Concentrated Tomato Paste, Modified
      Cornflour, White Wine, Prawns (1.5%), Skimmed Milk
      Powder, Butterfat, Double Cream, Salt, Yeast Extract,
      Sugar, Shrimp Powder, Fish Powder, Vegetable Oil,
      Vegetable Extracts, Stabiliser (Polyphosphates), Brandy,
      Concentrated Lemon Juice, Spices, Herb and Spice Extracts
      with celery.<br />*No Artificial Colours<br />*No
      Artificial Flavours<br />*No Artificial Preservatives<br
      /><br />Information<br />Gluten Free<br />Contains:
      MILK, FISH, SHELLFISH, CELERY."
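The idea of scoring each term of a review and averaging the result can be illustrated with a small lexicon lookup in Python. The lexicon, its weights and the choice to average over all words are assumptions made for this sketch; the paper does not specify the lexicon or the exact averaging rule behind Table II.

```python
import re

# Hypothetical opinion lexicon with made-up polarity weights (assumption).
LEXICON = {
    "excellent": 1.0, "happy": 0.75, "liked": 0.5,
    "greasy": -0.75, "weak": -0.5, "disappointed": -1.0, "stale": -0.75,
}


def average_sentiment(text):
    """Mean lexicon weight over all words; words outside the lexicon count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(word, 0.0) for word in words) / len(words)


review = ("Being a salt-free product is why I purchased this, "
          "but the chips are quite greasy.")
print(average_sentiment(review))
```

Under these assumed weights the single opinion word "greasy" pulls the average below zero, which is the behaviour the worst-review example relies on.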
B. Word Clouds
Opinion words are visualized in Fig. 5 with sizes
relative to their counts. This word cloud
is created in R using text mining packages.
A frequency matrix is constructed before generating the word
cloud.
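A frequency matrix and count-proportional word sizes can be sketched in Python as follows. The short summaries, the font-size range and the helper names are assumptions for illustration; the word clouds in this paper were produced with R packages, not with this code.

```python
import re
from collections import Counter

# Short review summaries used as sample documents (illustrative input).
docs = ["Great taffy", "Nice Taffy", "Wonderful, tasty taffy",
        "Healthy Dog Food", "Good Quality Dog Food"]


def terms(text):
    """Split a document into lower-case alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


vocabulary = sorted({term for doc in docs for term in terms(doc)})
# Frequency matrix: one row per document, one column per vocabulary term.
matrix = [[Counter(terms(doc))[term] for term in vocabulary] for doc in docs]
# Column sums give the total count of each term across all documents.
totals = {term: sum(row[i] for row in matrix) for i, term in enumerate(vocabulary)}


def font_size(count, max_count, smallest=10, largest=40):
    """Scale a count linearly into a font size (the size range is arbitrary)."""
    return smallest + (largest - smallest) * count / max_count


max_count = max(totals.values())
sizes = {term: font_size(count, max_count) for term, count in totals.items()}
print(sorted(sizes.items(), key=lambda item: -item[1])[:3])
```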
Fig. 5 Word cloud of sentiment words
The word cloud shows the scores on a 5-point scale ranging
from very negative to very positive.
Fig. 6 Word clouds of scores
C. Sentiment Score Prediction
This prediction model focuses on sentiment polarity (positive
or negative) instead of scores. Predictive techniques
such as Naïve Bayes and regression can be used to test the data.
The table below shows the data split into training and test
sets.
TABLE VI
TEST AND TRAINING OF DATA SET USING SCORES
      Score     Summary
0     5         Good Quality Dog Food
1     1         Not as Advertised
2     4         "Delight" says it all
3     2         Cough Medicine
4     5         Great taffy
5     4         Nice Taffy
6     5         Great! Just as good as the expensive brands!
7     5         Wonderful, tasty taffy
8     5         Yay Barley
9     5         Healthy Dog Food
10    5         The Best Hot Sauce in the World
The score in the second field of Table VI is
converted into positive or negative using the score value. All
values above a score of 3 are predicted as positive and values
less than 3 are predicted as negative. The result is shown in
TABLE VII.
TABLE VII
TEST AND TRAINING OF DATA SET USING SENTIMENT POLARITY
      Score     Summary
0     Positive  Good Quality Dog Food
1     Negative  Not as Advertised
2     Positive  "Delight" says it all
3     Negative  Cough Medicine
4     Positive  Great taffy
5     Positive  Nice Taffy
6     Positive  Great! Just as good as the expensive brands!
7     Positive  Wonderful, tasty taffy
8     Positive  Yay Barley
9     Positive  Healthy Dog Food
10    Positive  The Best Hot Sauce in the World
After performing stemming and pruning of the data, further
machine learning algorithms can be applied to predict the
expected results.
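As a rough illustration of how the scored summaries of Table VI could feed a Naïve Bayes polarity classifier, the Python sketch below converts scores to polarity labels, trains on the first eight rows and predicts the remaining three. The from-scratch multinomial model with add-one smoothing, the 8/3 split and the decision to leave a score of exactly 3 unlabeled are assumptions of this sketch; with only eleven rows it demonstrates the mechanics and says nothing about accuracy.

```python
import math
import re
from collections import Counter, defaultdict

# (score, summary) pairs copied from Table VI.
rows = [
    (5, "Good Quality Dog Food"), (1, "Not as Advertised"),
    (4, '"Delight" says it all'), (2, "Cough Medicine"),
    (5, "Great taffy"), (4, "Nice Taffy"),
    (5, "Great! Just as good as the expensive brands!"),
    (5, "Wonderful, tasty taffy"), (5, "Yay Barley"),
    (5, "Healthy Dog Food"), (5, "The Best Hot Sauce in the World"),
]


def polarity(score):
    """Scores above 3 are Positive, below 3 Negative; 3 itself is left unlabeled."""
    if score > 3:
        return "Positive"
    if score < 3:
        return "Negative"
    return None


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect class and word counts for a multinomial Naive Bayes model."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, text in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokens(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


labelled = [(polarity(s), t) for s, t in rows if polarity(s) is not None]
train_set, test_set = labelled[:8], labelled[8:]  # arbitrary split for illustration
model = train(train_set)
for label, text in test_set:
    print(f"{text!r}: predicted {predict(model, text)}, actual {label}")
```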
V. CONCLUSION AND FUTURE WORK
Sentiment analysis is the process of identifying the feeling
expressed in a text or document. We proposed a
methodology for mining food reviews based on scores
combined with existing text analysis packages. The proposed
system produced good results using the score
ratings. A limitation of this system is that it works well only
for open sentiments such as ratings or scores. The results were
not promising for hidden sentiments. In future work,
prediction-based methods will be implemented alongside the
existing approach. More features will be extracted to handle
implicit sentiment analysis.
REFERENCES
[1] Anjali Ganesh Jivani, "A Comparative Study of Stemming
    Algorithms", International Journal of Computer Technology and
    Applications, Vol 2 (6), 1930-38, ISSN: 2229-6093.
[2] Ahmad Kamal, "Subjectivity Classification using Machine
    Learning Techniques for Mining Feature-Opinion Pairs from
    Web Opinion Sources", International Journal of Computer Science
    Issues (IJCSI), Volume 10 Issue 5, 2013, pp 191-200.
[3] Aashutosh Bhatt, Ankit Patel, Harsh Chheda, Kiran Gawande,
    "Amazon Review Classification and Sentiment Analysis", IJCSIT,
    Vol. 6 (6), 2015, 5107-5110.
[4] Liu B (2014), "The science of detecting fake reviews",
    http://content26.com/blog/bing-liu-the-science-of-detecting-fakereviews/.
[5] J. McAuley and J. Leskovec, "From amateurs to connoisseurs:
    modeling the evolution of user expertise through online reviews",
    WWW, 2013.
[6] Roth D, Zelenko D (1998), "Part of speech tagging using a
    network of linear separators", In: Coling-Acl, The 17th
    International Conference on Computational Linguistics, 1136–1142.
[7] Tierney, L. (2005), "Some notes on the past and future of
    Lisp-Stat", Journal of Statistical Software, 13 (9), 1–15.
[8] Xing Fang and Justin Zhan, "Sentiment analysis using product
    review data", Journal of Big Data, 2015. DOI: 10.1186/s40537-015-0015-2.