sentimental_analysis[1]
OBJECTIVE
This project focuses on estimating the polarity of the sentiment
evoked by a text entered through an input box. It implements an
algorithm that automatically classifies text as positive, negative or
neutral. Sentiment analysis determines whether the attitude of the
public towards a subject of interest is positive, negative or neutral.
The result is represented in the form of a pie chart.
PROBLEM STATEMENT
To provide a sentiment analysis system for classifying customer
reviews, which helps to analyze information where opinions are highly
unstructured and are either positive or negative.
EXISTING SYSTEM
The volume of user-generated opinions on social media such as
Facebook, Twitter, review sites, etc. is growing rapidly. These
opinions can be tapped and used as business intelligence for purposes
such as marketing, prediction, etc. Generally, sentiment analysis is
used to find out the attitude of the author towards some topic.
However, social networking sites do not implement sentiment analysis
themselves. Some surveys depend on a static sentiment word dataset to
determine the sentiment. We therefore require a proper solution for
finding the polarity of microblogs.
PROPOSED SYSTEM
We will collect the unstructured data through the text box. The data
is then converted to lower case and processed as follows.
Pre-processing
Before the feature extractor can use the reviews to build a feature
vector, the review text goes through a pre-processing step in which the
following steps are taken. These steps convert the plain text of the
review into processable elements with more information added that can
be utilized by the feature extractor. For all these steps, third-party
tools were used that are specialized to handle the unique nature of
review text.
PyCharm is an integrated development environment (IDE) used in
computer programming, specifically for the Python language. It is
developed by the Czech company JetBrains. It provides code analysis, a
graphical debugger and an integrated unit tester.
STEPS INVOLVED
Step 1: Tokenization
Tokenization is the process of converting text as a string into
processable elements called tokens. In the context of a review, these
elements can be words, emoticons, URL links, hashtags or punctuation.
For example, the text “an insanely awsum….” is broken into “an”,
“insanely”, “awsum”…. These elements are often separated by a space.
On the other hand, hashtags with “#” preceding the tag need to be
retained, since a word used as a hashtag may have a different sentiment
value than the same word used regularly in the text.
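The tokenization step described above can be sketched with a single regular expression. This is only an illustration, not the tokenizer used in the project: the pattern, the name TOKEN_PATTERN and the function tokenize are assumptions, and the emoticon rule covers only a few common forms.

```python
import re

# Hypothetical pattern: URLs first, then hashtags, emoticons, words, and
# finally any other single non-space symbol (punctuation).
TOKEN_PATTERN = re.compile(
    r"https?://\S+"          # URL links
    r"|#\w+"                 # hashtags, kept together with the leading '#'
    r"|[:;=][-']?[()DPp]"    # a few common emoticons such as :) ;-) :D
    r"|\w+(?:'\w+)?"         # words, optionally with an apostrophe (don't)
    r"|[^\w\s]"              # any remaining punctuation mark
)

def tokenize(text):
    """Split a review string into a list of tokens."""
    return TOKEN_PATTERN.findall(text)

tokens = tokenize("an insanely awsum place :) #yummy")
print(tokens)
```

Because the hashtag alternative is tried before the plain-word alternative, "#yummy" stays a single token instead of being split into "#" and "yummy".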
Step 2: Parts of Speech Tags
Parts of Speech (POS) tags are characteristics of a word in a sentence
based on the grammatical categories of the words of a language. This
information is essential for sentiment analysis, as words may have a
different sentiment value depending on their POS tag. For example, a
word like “good” as a noun contains no sentiment, whereas “good” as an
adjective carries positive sentiment. Each token extracted in the last
step is assigned a POS tag.
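The effect of the POS tag on sentiment can be illustrated with a small lookup table. This is a toy sketch under stated assumptions: the lexicon SENTIMENT_BY_POS, its scores and the function score_tokens are hypothetical, and the tags are written by hand rather than produced by a tagger.

```python
# Hypothetical, hand-made lexicon keyed by (word, POS tag); the scores are
# illustrative only and do not come from the project's dataset.
SENTIMENT_BY_POS = {
    ("good", "ADJ"): 1,    # "the food was good" -> positive
    ("good", "NOUN"): 0,   # "for the common good" -> no sentiment
    ("bad", "ADJ"): -1,
}

def score_tokens(tagged_tokens):
    """Sum the sentiment of (word, tag) pairs; unknown pairs count as 0."""
    return sum(SENTIMENT_BY_POS.get((word.lower(), tag), 0)
               for word, tag in tagged_tokens)

# Tags are written by hand here; in practice a POS tagger would assign them.
adjective_use = [("the", "DET"), ("food", "NOUN"), ("was", "VERB"), ("good", "ADJ")]
noun_use = [("for", "ADP"), ("the", "DET"), ("common", "ADJ"), ("good", "NOUN")]
print(score_tokens(adjective_use))
print(score_tokens(noun_use))
```

The same word "good" contributes a positive score only when it is tagged as an adjective.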
MATPLOTLIB LIBRARY
Matplotlib is a data visualization tool that helps users view large
amounts of data in a more comprehensible way. Companies use
Matplotlib to simplify complex data so they can determine growth
patterns and solve problems.
Matplotlib can create static, animated, and interactive visualizations.
Users can customize the visual style and layout, export to many file
formats, and embed in JupyterLab and Graphical User Interfaces.
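Since the objective is to present the sentiment shares as a pie chart, a minimal Matplotlib sketch of such a chart may help. The function name plot_sentiment_pie and the counts are assumptions made for illustration; they are not taken from the project's results.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

def plot_sentiment_pie(counts):
    """Draw a pie chart from a {label: count} dictionary and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    ax.set_title("Share of predicted sentiments")
    return fig

# Hypothetical counts, for illustration only
fig = plot_sentiment_pie({"Positive": 12, "Negative": 8})
```

In a notebook the figure is displayed by the cell itself; elsewhere it can be written to a file with fig.savefig.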
PANDAS LIBRARY
The Pandas library is a popular, open-source tool for data analysis and
manipulation in Python.
Pandas is a high-level tool that can help with data wrangling, or the
process of transforming data into a structured and quality format for
analysis. It can help with tasks like sorting, removing irrelevant values,
and restructuring data sets. Pandas also has tools for reading and writing
data in different formats, such as CSV, Excel, and SQL.
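As a small illustration of reading tab-separated data with Pandas, the sketch below loads three reviews from an in-memory string instead of a file. The variable names raw and sample are assumptions; csv.QUOTE_NONE is the named constant for the value 3, which tells the parser to treat quote characters as ordinary text.

```python
import csv
import io
import pandas as pd

# Three tab-separated rows in memory, standing in for a .tsv file
raw = "Review\tLiked\nA great touch.\t1\nService sucks.\t0\nHighly recommended.\t1\n"

sample = pd.read_csv(io.StringIO(raw), delimiter="\t", quoting=csv.QUOTE_NONE)

# Count how many reviews carry each label
label_counts = sample["Liked"].value_counts().to_dict()
print(sample.shape)
print(label_counts)
```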
SCIKIT-LEARN LIBRARY
Scikit-learn, also known as sklearn, is a free, open-source Python
library that provides tools for machine learning and statistical
modeling:
● Classification: identifying which category an object belongs to, as
in spam detection or image recognition
● Regression: predicting a continuous value, for example with linear
regression
● Clustering: grouping similar objects, for example with K-means
(including K-means++ initialization)
● Dimensionality reduction: reducing the number of features to consider
● Model selection and evaluation: comparing, validating and choosing
models and their parameters
● Data preprocessing: feature extraction and normalization, including
Min-Max normalization
VECTORIZER LIBRARY
The vectorizers library aims to provide a set of easy-to-use tools for
turning various kinds of unstructured sequence data into vectors.
Because its classes follow the scikit-learn transformer API, any of the
vectorizer classes can be trivially integrated into existing sklearn
workflows or pipelines. In this project, the reviews are vectorized
with scikit-learn's CountVectorizer.
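To make the idea of turning text into vectors concrete, here is a minimal sketch with scikit-learn's CountVectorizer on a tiny made-up corpus. The two documents are assumptions for illustration and are not part of the review dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Tiny made-up corpus, for illustration only
docs = ["great food great price", "bad food"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs).toarray()

# vocabulary_ maps each word to its column index (columns are sorted alphabetically)
vocabulary = {word: int(index) for word, index in vectorizer.vocabulary_.items()}
print(vocabulary)
print(counts.tolist())
```

Each row is one document and each column one word, so the first row records that "great" occurs twice in the first document.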
● Stopped by during the late May bank holiday off Rick Steve recommendation and loved it. 1
● The selection on the menu was great and so were the prices. 1
● The potatoes were like rubber and you could tell they had been made up ahead of time being kept under
a warmer. 0
● A great touch. 1
● The cashier had no care what so ever on what I had to say it still ended up being wayyy overpriced. 0
● I was disgusted because I was pretty sure that was human hair. 0
● Highly recommended. 1
● So they performed. 1
● #NAME? 0
● This hole in the wall has great Mexican street tacos, and friendly staff. 1
● Took an hour to get our food only 4 tables in restaurant my food was Luke warm, Our sever was running
around like he was totally overwhelmed. 0
● The worst was the salmon sashimi. 0
● Also there are combos like a burger, fries, and beer for 23 which is a decent deal. 1
● seems like a good quick place to grab a bite of some familiar pub food, but do yourself a favor and look
elsewhere. 0
● The only redeeming quality of the restaurant was that it was very inexpensive. 1
● Poor service, the waiter made me feel like I was stupid every time he came to the table. 0
● Service sucks. 0
● There is not a deal good enough that would drag me into that establishment again. 0
● Hard to judge whether these sides were good because we were grossed out by the melted styrofoam and
didn't want to eat it for fear of getting sick. 0
● On a positive note, our server was very attentive and provided great service. 1
● Frozen pucks of disgust, with some of the worst people behind the register. 0
● The only thing I did like was the prime rib and dessert section. 1
● My side Greek salad with the Greek dressing was so tasty, and the pita and hummus was very refreshing.
1
● We ordered the duck rare and it was pink and tender on the inside with a nice char on the outside. 1
● He came running after us when he realized my husband had left his sunglasses on the table. 1
● They have horrible attitudes towards customers, and talk down to each one when customers don't enjoy
their food. 0
● The Heart Attack Grill in downtown Vegas is an absolutely flat-lined excuse for a restaurant. 0
[]
import numpy as np
import pandas as pd
import nltk
import re
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import sklearn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
[]
data = pd.read_csv('/content/drive/MyDrive/Restaurant_Reviews.tsv',
delimiter='\t' , quoting=3)
[]
data.shape
(1000, 2)
[]
data.columns
Index(['Review', 'Liked'], dtype='object')
[]
data.head()
[]
data.info()
[]
data.describe()
[]
import matplotlib.pyplot as plt
from wordcloud import WordCloud
[]
combined_text = " ".join(data['Review'])  # combine all review text into one string
wordcloud=WordCloud(width=800,height=400,background_color='white'
).generate(combined_text)
[]
#Plot the word cloud
plt.figure(figsize=(10,6))
plt.imshow(wordcloud,interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Reviews')
plt.show()
# The larger a word appears, the more often it occurs in the dataset.
[]
from collections import Counter
[]
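targeted_words = ['good', 'great', 'amazing', 'bad', 'not bad']
all_text = " ".join(data['Review']).lower()
all_words = all_text.split()  # flatten the reviews into a single list of words
word_counts = Counter(all_words)
# Single words come from the Counter; a phrase such as 'not bad' never appears
# as one whitespace-separated token, so it is counted in the joined text instead.
target_word_count = {word: (all_text.count(word) if ' ' in word else word_counts[word])
                     for word in targeted_words}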
[]
#Plotting
plt.figure(figsize=(8,6))
plt.bar(list(target_word_count.keys()), list(target_word_count.values()),
        color=['blue', 'green', 'orange', 'red', 'black'])
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.title('Frequency of specific words in reviews')
plt.show()
[]
corpus = []
stop_words = set(stopwords.words('english'))  # build the stop-word set once, not per review
ps = PorterStemmer()
for i in range(len(data)):
    # Keep letters only, convert to lower case, then split into words
    review = re.sub(pattern='[^a-zA-Z]', repl=' ', string=data['Review'][i])
    review_words = review.lower().split()
    # Drop stop words and reduce the remaining words to their stems
    review = [ps.stem(word) for word in review_words if word not in stop_words]
    corpus.append(' '.join(review))
[]
corpus[:1500]
[]
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=1500)  # keep the 1500 most frequent words
X = cv.fit_transform(corpus).toarray()
y = data.iloc[:, 1].values  # the 'Liked' column
[]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
[]
X_train.shape,X_test.shape,y_train.shape,y_test.shape
[]
from sklearn.naive_bayes import MultinomialNB
classifier =MultinomialNB()
classifier.fit(X_train, y_train)
[]
y_pred = classifier.predict(X_test)
y_pred
[]
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
score1 = accuracy_score(y_test, y_pred)
score2 = precision_score(y_test, y_pred)
score3 = recall_score(y_test, y_pred)
print("---------SCORES--------")
print("Accuracy score is {}%".format(round(score1*100,3)))
print("Precision score is {}%".format(round(score2*100,3)))
print("Recall score is {}%".format(round(score3*100,3)))
[]
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
[]
cm
[]
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.figure(figsize=(10,6))
sns.heatmap(cm, annot=True, cmap="YlGnBu",
            xticklabels=['Negative','Positive'], yticklabels=['Negative','Positive'])
plt.xlabel('Predicted values')
plt.ylabel('Actual Values')
[]
best_accuracy = 0.0
alpha_val = 0.0
# Try smoothing values alpha = 0.1, 0.2, ..., 1.0 and keep the most accurate one
for i in np.arange(0.1, 1.1, 0.1):
    temp_classifier = MultinomialNB(alpha=i)
    temp_classifier.fit(X_train, y_train)
    temp_y_pred = temp_classifier.predict(X_test)
    score = accuracy_score(y_test, temp_y_pred)
    print("Accuracy score for alpha={} is {}%".format(round(i, 1), round(score*100, 3)))
    if score > best_accuracy:
        best_accuracy = score
        alpha_val = i
print('----------------------------------------------------')
print("The Best Accuracy Score is {}% with alpha value as {}".format(round(best_accuracy*100, 2), round(alpha_val, 1)))
[ ]
classifier =MultinomialNB(alpha=0.2)
classifier.fit(X_train, y_train)
[]
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
def predict_sentiment(sample_review):
    # Apply the same cleaning as for the training corpus: letters only, lower case
    sample_review = re.sub(pattern='[^a-zA-Z]', repl=' ', string=sample_review)
    sample_review = sample_review.lower()
    sample_review_words = sample_review.split()
    sample_review_words = [word for word in sample_review_words
                           if word not in set(stopwords.words('english'))]
    ps = PorterStemmer()
    final_review = [ps.stem(word) for word in sample_review_words]
    final_review = ' '.join(final_review)
    # Vectorize with the fitted CountVectorizer and classify (1 = positive, 0 = negative)
    temp = cv.transform([final_review]).toarray()
    return classifier.predict(temp)
[]
sample_review = 'The food is really bad.'
if predict_sentiment(sample_review):
    print("Positive review")
else:
    print("Negative review")
[]
sample_review = 'The food was absolutely wonderful, from preparation to presentation, very pleasing.'
if predict_sentiment(sample_review):
    print("This is a Positive review")
else:
    print("This is a Negative review")
REVIEW OUTLINE OF THE PROJECT
■ OUTLINE:-
● Problem Statement
● Proposed System/Solution
● System Development Approach
● Algorithm & Deployment
● Advantages
● Applications
● Result
● Conclusion
● Future Scope
● References
■ PROBLEM STATEMENT
● Deployment:
We developed a user-friendly interface that provides real-time
predictions for customer reviews. The solution should be deployed on a
scalable and reliable platform, considering factors like server
infrastructure, response time, and user accessibility. After testing
the algorithm and the interface, we deployed the model.
● Evaluation:
We evaluated the complete algorithm and checked all the metrics:
accuracy, precision and recall.
● Result:
The accuracy score was 78.5%
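The three metrics named above follow directly from the counts in a confusion matrix. The sketch below shows the arithmetic; the function name classification_scores and the counts passed to it are hypothetical and are not the project's actual confusion matrix.

```python
def classification_scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of the reviews predicted positive, how many were positive
    recall = tp / (tp + fn)      # of the truly positive reviews, how many were found
    return accuracy, precision, recall

# Hypothetical counts, not the project's actual confusion matrix
accuracy, precision, recall = classification_scores(tp=45, fp=15, fn=5, tn=35)
print(accuracy, precision, recall)
```

With these counts the classifier finds most positive reviews (high recall) but also labels some negative ones as positive (lower precision), which is why the three values are reported separately.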
■ SYSTEM APPROACH
● Prediction Process:
Naive Bayes is a simple but surprisingly powerful probabilistic machine
learning algorithm used for predictive modeling and classification
tasks. It learns, from the training data, how likely each feature is to
occur in each class. It calculates the probability of a text belonging
to each sentiment class based on word frequencies. Then, it assigns the
class with the highest probability.
We also checked the model with real-time examples, and it classified
them correctly.
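The prediction process can be sketched in plain Python to show where the probabilities come from. This is a simplified illustration of multinomial Naive Bayes with additive smoothing, not the scikit-learn implementation used in the project; the function names and the four training sentences are assumptions.

```python
import math
from collections import Counter

def train_naive_bayes(documents, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    vocabulary = {word for doc in documents for word in doc.split()}
    log_prior, log_likelihood = {}, {}
    for label in set(labels):
        class_docs = [doc for doc, lab in zip(documents, labels) if lab == label]
        log_prior[label] = math.log(len(class_docs) / len(documents))
        word_counts = Counter(word for doc in class_docs for word in doc.split())
        total = sum(word_counts.values()) + alpha * len(vocabulary)
        # Additive (Laplace) smoothing keeps unseen words from giving zero probability
        log_likelihood[label] = {word: math.log((word_counts[word] + alpha) / total)
                                 for word in vocabulary}
    return log_prior, log_likelihood

def predict(text, log_prior, log_likelihood):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for label in log_prior:
        scores[label] = log_prior[label] + sum(
            log_likelihood[label][word]
            for word in text.split() if word in log_likelihood[label])
    return max(scores, key=scores.get)

# Tiny made-up training set: 1 = positive, 0 = negative
docs = ["great food", "great service", "bad food", "terrible service"]
labels = [1, 1, 0, 0]
log_prior, log_likelihood = train_naive_bayes(docs, labels)
print(predict("great food", log_prior, log_likelihood))
print(predict("terrible food", log_prior, log_likelihood))
```

The parameter alpha plays the same role as the alpha value tuned for MultinomialNB: larger values smooth the word probabilities more strongly.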
■ ADVANTAGES:-
● Brand analysis.
● New product perception.
● Finding the best option.
● Review-related analysis.
● Support in decision making.
● Prediction and trend analysis.
■ RESULT:-
The machine learning model produced accurate results and was also
tested on real-time examples.
It proved very useful for analysing sentiment. Here are some of the
outputs we obtained:
sample_review = 'The food is really bad.'
if predict_sentiment(sample_review):
    print("Positive review")
else:
    print("Negative review")
★ Negative review
★ ---------SCORES--------
The accuracy score was 78.5%
The precision value was 76.5%
The recall value was 78.641%
■ CONCLUSION