Interim Project - Sentiment Analysis of Movie
Interim report on
Submitted By
Group No. 8 Batch: APR- 2019 Location: Bengaluru
Group Members
Saurav Suman – BABAPR19053
Neha Tiwary – BABAPR19057
Divya Thomas – BABAPR19018
Anurag Kedia – BABAPR19011
Peehu – BABAPR19071
Research Supervisor
Mr. Deepak Sharma
Contents
1. Introduction
2. Scope, Objective & Problem Statement
3. Data Source and Description
a) Data Source
b) Data Description
4. Data Pre-processing
a) Data Cleaning
b) Creation of Word Corpuses
c) Extraction and Tokenization
d) DTM (Document Term Matrix)
5. Exploratory data analysis of the data
a) Visualization of retweets
b) Plot of most frequent words in the text
c) Word Cloud of the keywords of the tweets
d) Word Cloud for Account from which most retweets originate
e) Word Cloud for the Location from where most of the tweets originate
6. Modelling Approach
a. Techniques and software to be used
b. Model Building
c. Comparison chart for accuracy & AUC value of all models for pre-release datasets
d. Comparison chart for accuracy & AUC value of all models for post-release datasets
7. Recommendations & Applications
8. Challenges and Limitations
9. References and Bibliography
10. Appendix
1. Introduction:
Movies are among the most popular forms of entertainment. The film industry produces many movies every year, but only a few achieve major success and rank highly. Movie revenue depends on many factors, such as the cast, the production budget, critics' reviews, ratings and the release year. Because so many factors are involved, no single formula predicts how much revenue a particular movie will generate.
However, by analysing the revenues generated by previous movies, a model can be built to predict the expected revenue of a particular movie.
Movies are both a major source of entertainment and a significant business. If the success of a movie can be predicted accurately, investors can plan for higher profit; and if the prediction indicates a low chance of success, they can improve the content of the movie to increase its revenue. Models and mechanisms can therefore be used to predict the success of a movie.
Such predictions help the business significantly. Stakeholders such as actors, producers, directors and event companies can use them to make more informed decisions before the movie is released.
Social media platforms such as Twitter, YouTube and Facebook are used by millions of people every day to share content and comments on all kinds of subjects. Businesses have a strong interest in tapping into these huge data sources to extract information that might improve their decision-making.
For example, predictive models derived from social media may help filmmakers make more profitable decisions. Movies are a topic of considerable interest to social media users of all backgrounds.
Sentiment analysis aims to uncover a person's attitude towards a particular topic from written text. Other terms used for this research area include “opinion mining” and “subjectivity detection”. It uses natural language processing and machine learning techniques to find statistical and/or linguistic patterns in the text that reveal attitudes.
It has gained popularity in recent years because of its immediate applicability in business, for example in summarizing feedback from product reviews, discovering collaborative recommendations, or assisting election campaigns. The focus of our project is the analysis of sentiment in short online comments.
We expect a short comment to express a person's opinion on a movie succinctly and directly. We focus on two important properties of text: 1. subjectivity, whether the style of the sentence is subjective or objective; 2. polarity, whether the person expresses a positive or negative opinion. We use statistical methods to capture the elements of subjective style and the sentence polarity. Statistical analysis is done at the sentence level. We apply machine learning techniques to classify sets of messages. We are interested in the following questions:
1. To what extent can we extract subjectivity and polarity from short comments? Which features extracted from the raw text have the greatest influence on the classification?
2. Which machine learning techniques are suitable for this purpose?
2. Scope, Objective & Problem Statement
Problem Statement 1
Based on Twitter data, what are people's sentiments before and after a movie's release?
Objective 1:
Identify the hashtags related to the movie that help gather more tweets about it for the sentiment analysis.
Problem Statement 2
Movie rating categorization based on the polarity of the tweets.
Objective 2:
Using the polarity score, categorize tweets into rating levels such as bad, average, good and excellent before the movie's release, and predict the overall rating of the movie from the Twitter data.
After the release of the movie, we collected a fresh set of tweets and built the model again to check whether its predictions give the same result. This helps us conclude whether people's expectations before the release match the sentiment observed after the release.
To meet our scope, we have built a model that can predict people's sentiment for multiple movies, which helps others compare different movies based on the sentiment expressed on Twitter.
The model can also predict people's sentiment from tweets supplied for different time frames (for example, tweets from 7 or 15 days before and after a movie's release).
We analyse the data and classify the emotions into the following bins according to the score obtained for each tweet:
No of stars Rating
Table 1
“Score” is the variable which we get from the sent.value parameter after passing the tweets
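As a rough illustration of the binning step, the sketch below maps a numeric score to a rating label with base R's cut(). The cut points and the function name score_to_rating are hypothetical, chosen only for illustration. The labels follow the rating levels named in Objective 2, and the actual bins are the ones defined in Table 1.

```r
# Hypothetical cut points, for illustration only; Table 1 defines the real bins.
score_to_rating <- function(score,
                            breaks = c(-Inf, -0.5, 0.5, 1.5, Inf),
                            labels = c("bad", "average", "good", "excellent")) {
  # cut() assigns each score to the interval it falls in.
  # Intervals are right-closed by default: (-Inf, -0.5], (-0.5, 0.5], ...
  cut(score, breaks = breaks, labels = labels)
}

scores  <- c(-1.2, 0, 0.8, 2.5)   # made-up tweet scores
ratings <- score_to_rating(scores)
print(data.frame(score = scores, rating = ratings))
```

Keeping the breaks and labels as arguments makes it easy to swap in the real bins without changing the function.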
3. Data Source and Description
a) Data Source
Twitter API
Twitter is a microblogging service launched in 2006, currently with more than 550 million users. The status messages that users create are called tweets. The public timeline of Twitter displays tweets from users worldwide and is an extensive source of real-time information.
The original concept behind microblogging was to provide personal status updates. Today, however, tweets cover almost everything, from current political affairs to personal experiences.
Movie reviews, travel experiences, current events and more add to the list. Tweets (and microblogs in general) differ from reviews in their basic structure. While reviews are characterized by formal text patterns and present the summarized thoughts of their authors, tweets are more casual and limited in length (280 characters; 140 before November 2017). Tweets offer companies an additional avenue to gather feedback.
Sentiment analysis of products, movie reviews and similar content helps customers decide before making a purchase or planning to watch a movie. Enterprises use it to research public opinion of their company and products, or to analyse customer satisfaction.
Organizations use this information to gather feedback about newly released products, which helps improve further design. Because Twitter data is easily available and has high impact, we use only Twitter data as our main dataset.
Scraping of data from Twitter:
We can access Twitter data through the public API provided by Twitter. The API accepts only authenticated requests, which must be signed with valid credentials. Twitter provides authentication keys through which we can extract tweets. A few steps need to be followed to create the authentication keys:
2. Manage Application
After finishing the entire process, we get the unique keys. These unique keys are required to collect tweets from Twitter. The unique keys are:
• Consumer key
• Access token
b) Data Description
We use data from 7 and 15 days before and after the movie's release. A sample dataset for the movie “Tanhaji” is given in the appendix. We use the text column of the dataset for our sentiment analysis. Other columns such as source, retweet_count, hashtags, created_at, etc. are used to get an overview of the dataset.
The first phase was data acquisition. Here we chose Twitter as our data source.
The second phase was data cleaning. After scraping the data, we cleaned it, mainly to handle the unavailability of some features.
The third phase was data integration and transformation, in which we classified some features and created a corpus of the text data.
The fourth phase was sentiment analysis of the tweets. We use the get_sentiment function to get the score of each tweet and then build the DTM of the tweets, which also has a column with the score value.
In the fifth phase, the dataset is divided into two parts: the training dataset contains 70% and the testing dataset 30% of the total data.
The sixth phase is result and analysis, where we run our data through different classification models and check the accuracy and AUC value.
As the saying goes in analytics, “Higher the accuracy, better the result”.
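The 70/30 split of the fifth phase can be sketched in base R as follows. The data frame tweets and its values are made up for illustration, and the fixed seed is an assumption added for reproducibility; the split used in the project may have been produced differently.

```r
set.seed(42)  # fixed seed so the split is reproducible (assumption)

# Toy stand-in for the scored tweet table (hypothetical data)
tweets <- data.frame(id = 1:10,
                     score = c(-2, -1, 0, 0, 1, 1, 2, 0, -1, 3))

n <- nrow(tweets)
# Draw 70% of the row indices at random, without replacement
train_idx <- sample(seq_len(n), size = round(0.7 * n))

train_set <- tweets[train_idx, ]    # 70% used to fit the models
test_set  <- tweets[-train_idx, ]   # remaining 30% held out for evaluation

cat("train rows:", nrow(train_set), " test rows:", nrow(test_set), "\n")
```

Because the test rows are selected with a negative index, no tweet can appear in both sets.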
Flowchart: Start; Data Processing, Feature Engineering; Data Visualization; Splitting the dataset into train and test datasets; Model Generation Using Machine Learning Algorithm; End
4. Data Pre-processing
a) Data Cleaning
Tweets containing both positive and negative emoticons were not taken into account. The list of positive emoticons used for labeling the training set includes :), :-), : ), :D, and =), while the list of negative emoticons consists of :(, :-(, and : (. Inevitably, this simplification results in partially correct or noisy labeling. The emoticons were stripped out of the training data so that the classifier learns from other features that describe the tweets.
The tweets were manually labeled based on their sentiment, regardless of the presence of emoticons in the tweets. As the Twitter community has created its own language to post messages, we explore the unique properties of this language to better define the feature space.
• Removal of HTML tags and the “@” symbol. Removing the hyperlinks often present in tweets also reduces the vocabulary size.
• Removal of tweet URLs.
• Removal of escaped Unicode sequences such as <U+A>.
• Removal of punctuation marks. The basic approach is to remove everything that is not a standard number or letter. Bear in mind that punctuation is sometimes useful, for example in web addresses, where it often defines the address. The removal of punctuation should therefore be tailored to the specific problem. In our case, we remove all punctuation.
• Removal of unhelpful terms. Many words are used frequently but are meaningful only within a sentence. These are called stop words; examples are ‘the’, ‘is’, ‘at’, and ‘which’. These words are unlikely to improve our ability to understand sentiment, so we remove them to reduce the size of the data.
• Conversion of all words to lowercase, so that the same word is not counted as different because of its case.
• Removal of numbers or digits.
• Removal of blank spaces from both the beginning and the end of the tweet.
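The cleaning steps listed above can be sketched as a single base R function. The function name clean_tweet, the regular expressions, the four-word stop list and the example tweet are all assumptions made for illustration; they are not the project's actual cleaning code.

```r
clean_tweet <- function(x) {
  x <- gsub("<U\\+[0-9A-Fa-f]+>", " ", x, perl = TRUE)      # escaped Unicode, e.g. <U+0001F449>
  x <- gsub("<[^>]+>", " ", x, perl = TRUE)                 # HTML tags
  x <- gsub("(https?://|www\\.)\\S+", " ", x, perl = TRUE)  # URLs
  x <- tolower(x)                                           # same word, same case
  x <- gsub("[^a-z\\s]", " ", x, perl = TRUE)               # punctuation (incl. @, #) and digits
  stop_words <- c("the", "is", "at", "which")               # illustrative stop list only
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)                   # stop words as whole words
  x <- gsub("\\s+", " ", x, perl = TRUE)                    # collapse repeated spaces
  trimws(x)                                                 # leading/trailing blanks
}

# Made-up example tweet
raw <- "The movie is GREAT!!! <U+0001F449> Watch it at https://example.com/x #Tanhaji 2020"
print(clean_tweet(raw))
```

URLs are removed before punctuation on purpose: once the punctuation is gone, a web address can no longer be recognised as one.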
In addition to Twitter-specific text preprocessing, other standard preprocessing steps were performed to define the feature space for tweet feature vector construction. These include text tokenization, removal of stop words, stemming, N-gram construction (concatenating 1 to N stemmed words appearing consecutively) and using a minimum word frequency for feature space reduction.
The resulting terms were used as features in the construction of TF-IDF feature vectors representing the documents (tweets). TF-IDF stands for term frequency-inverse document frequency, a feature weighting scheme in which the weight reflects how important a word is to a document within a document collection.
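To make the weighting concrete, the sketch below computes TF-IDF from a small matrix of term counts in base R. The counts are made up, and the formula used (term frequency normalised by document length, multiplied by log(N / document frequency)) is one common variant, assumed here; text-mining packages may apply slightly different normalisations.

```r
# Made-up term counts: rows are documents (tweets), columns are terms
counts <- matrix(c(1, 1, 0,
                   2, 0, 0,
                   0, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("d1", "d2", "d3"),
                                 c("great", "movie", "boring")))

tf  <- counts / rowSums(counts)     # term frequency within each document
df  <- colSums(counts > 0)          # number of documents containing each term
idf <- log(nrow(counts) / df)       # rarer terms get a larger weight

tfidf <- sweep(tf, 2, idf, "*")     # multiply each column by its idf
print(round(tfidf, 3))
```

In this toy example "boring" occurs in only one document, so it receives a higher weight there than "great", which occurs in two.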
b) Creation of Word Corpuses
A positive word corpus contains the positive words commonly used in tweets; a negative word corpus is created in the same way.
A word corpus contains a large number of words, since tweets are written by people around the world in their own styles.
We therefore had to consider all possible words for each word corpus, especially because the analysis covers both Hollywood and Bollywood movies.
Tweet    Positive Words    Negative Words
When this tweet is analysed, the positive word corpus is compared against all its tokens; it finds the words “special” and “montage” and assigns a polarity count. The tweet is then passed through the negative word corpus, which finds the word “clichés” and assigns a polarity count. All other words are ignored by the word corpuses since they have neutral polarity.
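The matching described above can be sketched in base R. The two word lists, the function name polarity_counts and the example tweet are hypothetical (the original example tweet is not reproduced here), and “clichés” is spelled without the accent to keep the code ASCII-only.

```r
# Illustrative word corpuses; real ones would be far larger
positive_words <- c("special", "montage", "great", "love")
negative_words <- c("cliches", "boring", "worst")

polarity_counts <- function(tweet) {
  # Tokenize: drop punctuation, lowercase, split on whitespace
  tokens <- strsplit(tolower(gsub("[[:punct:]]", " ", tweet)), "\\s+")[[1]]
  tokens <- tokens[tokens != ""]
  pos <- sum(tokens %in% positive_words)   # matches in the positive corpus
  neg <- sum(tokens %in% negative_words)   # matches in the negative corpus
  # Tokens found in neither corpus are treated as neutral and ignored
  c(positive = pos, negative = neg, polarity = pos - neg)
}

tweet <- "A special montage, but full of cliches."   # made-up tweet
print(polarity_counts(tweet))
```

Taking the difference of the two counts is only one simple way to combine them into a single polarity value.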
d) DTM (Document Term Matrix)
A document-term matrix (or term-document matrix) is a mathematical matrix that describes the frequency of terms occurring in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value of each entry in the matrix; one such scheme is tf-idf.
A document-term matrix tracks the frequency of each term in each document. It starts with the bag-of-words representation of the documents and then, for each document, counts the number of times each term occurs. Term count is a common metric to use in a document-term matrix.
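A minimal sketch of building a document-term matrix of term counts in base R is given below. The three short documents are made up and assumed to be already cleaned; dedicated text-mining packages provide the same structure in sparse form for real corpora.

```r
# Made-up, already-cleaned documents
docs <- c(doc1 = "tanhaji great movie",
          doc2 = "great great action",
          doc3 = "tanhaji trailer")

tokens <- strsplit(docs, " ", fixed = TRUE)   # bag of words per document
terms  <- sort(unique(unlist(tokens)))        # vocabulary = matrix columns

# Count every vocabulary term in every document
dtm <- t(vapply(tokens, function(tok) {
  as.integer(table(factor(tok, levels = terms)))
}, integer(length(terms))))
colnames(dtm) <- terms

print(dtm)
```

Each row sums to the number of tokens in that document, which is a quick sanity check on the counts.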
5. Exploratory data analysis of the data
NOTE: Here, all the outputs have been considered for the movie “Tanhaji”.
a) Visualization of retweets
Here we present the nature of the tweets we collected. The donut chart below shows the proportion of the tweets.
N-gram plots: Bag of Words ignores the semantic context of the review and concentrates primarily on the frequency of each word. To overcome that, we also tried n-gram modelling, in which we created unigrams, bigrams and a mixture of both. While unigrams are more or less equivalent to the bag-of-words approach, bigrams provide more contextual information about the review text.
Plot of frequency distribution of uni-grams: models of this type assign probabilities to single words.
Plot of frequency distribution of bi-grams: models of this type assign probabilities to sequences of two words.
Plot of frequency distribution of tri-grams: models of this type assign probabilities to sequences of three words.
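The frequency counts behind such n-gram plots can be sketched in base R. The helper make_ngrams and the token vector are hypothetical, written only to show the idea; the plots in this report were produced with the project's own tooling.

```r
# Build all n-grams (n consecutive tokens joined by a space)
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Made-up token sequence from cleaned tweets
tokens <- c("watch", "tanhaji", "in", "theatres", "watch", "tanhaji")

bigram_freq <- sort(table(make_ngrams(tokens, 2)), decreasing = TRUE)
print(bigram_freq)
```

With n = 1 the function returns the tokens themselves, which is why unigram counts coincide with bag-of-words counts.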
d) Word Cloud for Account from which most retweets originate
This Word Cloud shows the accounts from which most of the retweets originate.
e) Word Cloud for the Location from where most of the tweets originate
6. Modelling Approach
a. Techniques and software to be used
Excel
R/Python:
Sentiment prediction has been an active area of research in recent times and is a challenging task, especially in morphologically rich languages. The task requires us to classify a given sentence as either "Positive" or "Negative". To do this, we tried out multiple machine learning-based methods.
In this project, we apply data mining techniques and machine learning algorithms, using several feature extraction techniques from text mining, and examine their relevance to our problem.
We will develop a methodology based on the historical and current data available from the data source, i.e. Twitter, to understand viewer sentiment for any movie.
We could predict the rating of a given movie based only on its summary, but quite often reviews are available along with the summary and can be used to improve the prediction capability of our models. Using the sentiment of these reviews as a prior along with the summary aids the task at hand.
To generate sentiment priors, we used the sentiment classification model mentioned in the previous section. Predicting the rating is in principle a regression problem whose output is a single floating-point value between 0.0 and 5.0. To simplify the task, we convert it into a classification problem by rounding the true rating to the nearest integer, which gives us a five-class classification problem.
The following are the key variables used in our analysis:
• Text (tweets)
• Re-tweet count
• Screen name
• Date of tweet/re-tweet
• favorite_count
• hashtags
• retweet_favorite_count
• verified
Of the above variables, Text (tweets) and hashtags are the most important for the analysis; the other variables contribute to the final insight into the textual data.
Sentiment analysis refers to the use of natural language processing, text analysis and computational
linguistics to extract and identify subjective information in source materials.
We provide both pre-release and post-release tweets for EDA and for generating polarity from the tweets. Here, we use get_nrc_sentiment(tweets) to find the sentiment of all the tweets passed.
The get_sentiments function returns a tibble, so to take a look at what is included as “positive” and
“negative” sentiment
Figure 11: Sentiment graph from tweets generated
b. Model Building
We passed both the pre-release and post-release datasets to the models to meet our objective of finding people's sentiment and comparing it before and after the movie's release.
For this, we built several classification models to predict the different rating levels.
CART Model
“CART: Classification and Regression Trees” is a machine learning algorithm for classification and regression. The CART algorithm works by recursively partitioning the training dataset so that the partitions become as pure as possible in the target class. Every node in the tree corresponds to a set of records that is split by a test on a selected feature.
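A single partitioning step of this kind can be sketched in base R: the code below searches one numeric feature for the threshold that yields the purest two partitions, using Gini impurity as the purity measure (the usual choice for CART classification, assumed here). The helpers gini and best_split and the data are hypothetical; a full tree would apply the same search recursively to each partition and over all features.

```r
# Gini impurity of a set of class labels: 1 - sum of squared class shares
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Find the threshold on feature x that minimises the weighted impurity
best_split <- function(x, y) {
  v <- sort(unique(x))
  thresholds <- (head(v, -1) + tail(v, -1)) / 2   # midpoints between distinct values
  weighted <- vapply(thresholds, function(thr) {
    left  <- y[x <= thr]
    right <- y[x >  thr]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  }, numeric(1))
  list(threshold = thresholds[which.min(weighted)],
       impurity  = min(weighted))
}

# Made-up sentiment scores and rating labels
score <- c(-2, -1, -1, 0, 1, 2, 3)
label <- c("bad", "bad", "bad", "good", "good", "good", "good")

cat("impurity before split:", round(gini(label), 3), "\n")
str(best_split(score, label))
```

In this toy data the split separates the two classes perfectly, so the weighted impurity after the split drops to zero.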
Below is the output of the CART tree, which represents the different sentiments of people towards the movie “Tanhaji”.
Figure 12: Decision Tree
Performance is evaluated using the confusion matrix, a specific table layout that allows visualization of the performance of an algorithm.
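As an illustration, a confusion matrix and the accuracy derived from it can be computed in base R as sketched below. The actual and predicted labels are made up and do not correspond to any result reported in this section.

```r
rating_levels <- c("bad", "average", "good")

# Made-up true and predicted rating levels for six test tweets
actual    <- factor(c("good", "bad", "good",    "average", "bad",  "good"),
                    levels = rating_levels)
predicted <- factor(c("good", "bad", "average", "average", "good", "good"),
                    levels = rating_levels)

# Rows: predicted class, columns: actual class
conf_mat <- table(Predicted = predicted, Actual = actual)

# Accuracy = correctly classified tweets (the diagonal) / all tweets
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
cat(sprintf("Accuracy: %.2f%%\n", 100 * accuracy))
```

The off-diagonal cells show which rating levels are confused with each other, which a single accuracy figure hides.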
Confusion matrix for the CART model on the pre-release tweets (test dataset), which gives an accuracy of 71.86%.
Confusion matrix for the CART model on the post-release tweets, which gives an accuracy of 55.2%.
The Random Forest classifier introduces two types of randomness: the first with respect to the data and the second with respect to the features. It uses the concepts of bagging and bootstrapping.
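The two sources of randomness can be sketched in base R for a single tree of the forest. The feature names, the number of rows and the seed are made up, and using the square root of the number of features as the subset size is a common default for classification that is assumed here rather than taken from the project.

```r
set.seed(1)   # fixed seed for reproducibility (assumption)

n_rows   <- 8                                     # made-up number of training tweets
features <- c("great", "boring", "love",
              "worst", "action", "songs")         # hypothetical DTM columns

# Randomness in the data: bootstrap sample of rows, drawn with replacement
boot_rows <- sample(seq_len(n_rows), size = n_rows, replace = TRUE)
oob_rows  <- setdiff(seq_len(n_rows), boot_rows)  # rows never drawn (out-of-bag)

# Randomness in the features: random subset of columns tried at a split
m_try          <- floor(sqrt(length(features)))
split_features <- sample(features, m_try)

cat("bootstrap rows:", boot_rows, "\n")
cat("out-of-bag rows:", oob_rows, "\n")
cat("features tried at this split:", split_features, "\n")
```

Bagging repeats this for every tree and combines the trees' predictions by majority vote; the out-of-bag rows of each tree can serve as a built-in validation set.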
As Random Forest is a combination of decision trees, it involves several hyperparameters:
• Number of trees to construct for the forest
• Number of features to select at random
• Depth of each tree
Setting all these hyperparameters manually is time consuming and does not guarantee good results. Each hyperparameter has its own importance and influence on the output prediction. Two measures of importance are given for each variable in the random forest.
The first measure is based on how much the accuracy decreases when the variable is excluded. The second measure is based on the decrease in Gini impurity when the variable is chosen to split a node.
Confusion matrix for the Random Forest model on the pre-release tweets, which gives an accuracy of 93.26%.
Confusion matrix for the Random Forest model on the post-release tweets, which gives an accuracy of 91.66%.
Figure 17: Random Forest Confusion Matrix for post-release tweets
Figure 18: Naïve Bayes Model Confusion Matrix for pre-release tweets
Confusion matrix for the Naïve Bayes model on the post-release tweets, which gives an accuracy of 30.4%.
Figure 19: Naïve Bayes Model Confusion Matrix for post-release tweets
Confusion matrix for the SVM model on the pre-release tweets, which gives an accuracy of 73.4%.
Confusion matrix for the SVM model on the post-release tweets, which gives an accuracy of 72.25%.
c. Comparison chart for accuracy & AUC value of all models for pre-release datasets
Table 3
d. Comparison chart for accuracy & AUC value of all models for post-release datasets
Table 4
Another task in sentiment analysis is subjectivity/objectivity identification, which focuses on classifying a given text (usually a sentence) into one of two classes: objective or subjective. Because the subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (for example, a news article quoting people's opinions), this problem can sometimes be more difficult than polarity classification.
References and Bibliography
10. Appendix
Below are the attached sample dataset and data definition file for the movie “Tanhaji”.
## The following objects are masked from 'package:twitteR':
##
## id, location
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(forestmangr)
library(tidytext)
library(slam)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
##
## annotate
library(RWeka)
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
#setwd("E:\Study\Capstone")
#write.csv(tanhaji_all,"tanhaji_all.csv")
setwd("E:\\Study\\Capstone")
head(movie_tweets_data)
## X.1 X user_id status_id created_at screen_name
## 1 66439 66439 2.340511e+09 1.212471e+18 2020-01-01 20:31:06 bjadams156
## 2 66440 66440 2.340511e+09 1.212471e+18 2020-01-01 20:31:06 bjadams156
## 3 60459 60459 4.021178e+08 1.212553e+18 2020-01-02 01:54:20 A_Jay_FanNepal
## 4 56533 56533 1.148293e+18 1.212556e+18 2020-01-02 02:07:49 MizarPradyum
## 5 55158 55158 1.126794e+18 1.212557e+18 2020-01-02 02:12:43 ABHI_ADholic04
## 6 56467 56467 1.148293e+18 1.212558e+18 2020-01-02 02:13:47 MizarPradyum
##
text
## 1 Asking this only to @ajaydevgn fans ..in KRK style !!\nHow many of you feel that
#RohitShetty is making full chuti*a of Devgn on the name of <U+0001F449> mere bade Hain
bhaiya re !!\n#TanhajiTheUnsungWarrior \n#Tanhaji #BhujThePrideOfIndia\nAjay fans ke
alawa koi aur yahan text Kiya to UMKB <U+0001F923>
## 2 Asking this only to @ajaydevgn fans ..in KRK style !!\nHow many of you feel that
#RohitShetty is making full chuti*a of Devgn on the name of <U+0001F449> mere bade Hain
bhaiya re !!\n#TanhajiTheUnsungWarrior \n#Tanhaji #BhujThePrideOfIndia\nAjay fans ke
alawa koi aur yahan text Kiya to UMKB <U+0001F923>
## 3
@AbhishekDudhai6 @NishantADHolic_ @ajaydevgn After #Tanhaji cannot wait to watch this
## 4
Dhokebaaj of the decade<U+0001F449> @deepikapadukone ... She betrayed the person who has
given her back to back 4 hits(LAK , Race2, Cocktail, Aarakshan) by clashing her movie
#Chhapaak with SAIF Sir's #Tanhaji
## 5
@krish_is_Devil @MizarPradyum Thanks bhai , watch it in 3D for better experience
#TanhajiTheUnsungWarrior #Tanhaji
## 6
40 k completed on BMS #TanhajiTheUnsungWarrior #Tanhaji
## source display_text_width reply_to_status_id reply_to_user_id
## 1 Twitter for iPhone 140 NA NA
## 2 Twitter for iPhone 140 NA NA
## 3 Twitter for Android 104 NA NA
## 4 Twitter for Android 140 NA NA
## 5 Twitter for Android 84 1.212438e+18 1.207668e+18
## 6 Twitter for Android 76 NA NA
## reply_to_screen_name is_quote is_retweet favorite_count retweet_count
## 1 <NA> FALSE TRUE 0 58
## 2 <NA> FALSE TRUE 0 58
## 3 <NA> FALSE TRUE 0 4
## 4 <NA> FALSE TRUE 0 32
## 5 krish_is_Devil FALSE FALSE 4 0
## 6 <NA> FALSE TRUE 0 2
## quote_count reply_count hashtags symbols
## 1 NA NA RohitShetty NA
## 2 NA NA RohitShetty NA
## 3 NA NA Tanhaji NA
## 4 NA NA <NA> NA
## 5 NA NA c("TanhajiTheUnsungWarrior", "Tanhaji") NA
## 6 NA NA c("TanhajiTheUnsungWarrior", "Tanhaji") NA
## urls_url urls_t.co urls_expanded_url media_url media_t.co media_expanded_url
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA>
## media_type ext_media_url ext_media_t.co ext_media_expanded_url ext_media_type
## 1 <NA> <NA> <NA> <NA> NA
## 2 <NA> <NA> <NA> <NA> NA
## 3 <NA> <NA> <NA> <NA> NA
## 4 <NA> <NA> <NA> <NA> NA
## 5 <NA> <NA> <NA> <NA> NA
## 6 <NA> <NA> <NA> <NA> NA
## mentions_user_id
## 1 c("368435117", "65659343")
## 2 c("368435117", "65659343")
## 3 c("110915886", "1109687428778999808", "2955267019", "65659343")
## 4 c("3853108342", "101695592")
## 5 c("1207667977618890752", "1148293149174820864")
## 6 1126794046528077824
## mentions_screen_name lang
## 1 c("teamrb_", "ajaydevgn") en
## 2 c("teamrb_", "ajaydevgn") en
## 3 c("racquel_erika", "AbhishekDudhai6", "NishantADHolic_", "ajaydevgn") en
## 4 c("ClassySaifian", "deepikapadukone") en
## 5 c("krish_is_Devil", "MizarPradyum") en
## 6 ABHI_ADholic04 en
## quoted_status_id quoted_text quoted_created_at quoted_source
## 1 NA <NA> <NA> <NA>
## 2 NA <NA> <NA> <NA>
## 3 NA <NA> <NA> <NA>
## 4 NA <NA> <NA> <NA>
## 5 NA <NA> <NA> <NA>
## 6 NA <NA> <NA> <NA>
## quoted_favorite_count quoted_retweet_count quoted_user_id quoted_screen_name
## 1 NA NA NA <NA>
## 2 NA NA NA <NA>
## 3 NA NA NA <NA>
## 4 NA NA NA <NA>
## 5 NA NA NA <NA>
## 6 NA NA NA <NA>
## quoted_name quoted_followers_count quoted_friends_count quoted_statuses_count
## 1 <NA> NA NA NA
## 2 <NA> NA NA NA
## 3 <NA> NA NA NA
## 4 <NA> NA NA NA
## 5 <NA> NA NA NA
## 6 <NA> NA NA NA
## quoted_location quoted_description quoted_verified retweet_status_id
## 1 <NA> <NA> NA 1.212400e+18
## 2 <NA> <NA> NA 1.212400e+18
## 3 <NA> <NA> NA 1.212353e+18
## 4 <NA> <NA> NA 1.212400e+18
## 5 <NA> <NA> NA NA
## 6 <NA> <NA> NA 1.212430e+18
##
retweet_text
## 1 Asking this only to @ajaydevgn fans ..in KRK style !!\nHow many of you feel that
#RohitShetty is making full chuti*a of Devgn on the name of <U+0001F449> mere bade Hain
bhaiya re !!\n#TanhajiTheUnsungWarrior \n#Tanhaji #BhujThePrideOfIndia\nAjay fans ke
alawa koi aur yahan text Kiya to UMKB <U+0001F923>
## 2 Asking this only to @ajaydevgn fans ..in KRK style !!\nHow many of you feel that
#RohitShetty is making full chuti*a of Devgn on the name of <U+0001F449> mere bade Hain
bhaiya re !!\n#TanhajiTheUnsungWarrior \n#Tanhaji #BhujThePrideOfIndia\nAjay fans ke
alawa koi aur yahan text Kiya to UMKB <U+0001F923>
## 3
@AbhishekDudhai6 @NishantADHolic_ @ajaydevgn After #Tanhaji cannot wait to watch this
## 4
Dhokebaaj of the decade<U+0001F449> @deepikapadukone ... She betrayed the person who has
given her back to back 4 hits(LAK , Race2, Cocktail, Aarakshan) by clashing her movie
#Chhapaak with SAIF Sir's #Tanhaji
## 5
<NA>
## 6
40 k completed on BMS #TanhajiTheUnsungWarrior #Tanhaji
## retweet_created_at retweet_source retweet_favorite_count
## 1 2020-01-01 15:48:21 Twitter for Android 357
## 2 2020-01-01 15:48:21 Twitter for Android 357
## 3 2020-01-01 12:42:00 Twitter for Android 6
## 4 2020-01-01 15:48:40 Twitter for Android 51
## 5 <NA> <NA> NA
## 6 2020-01-01 17:45:27 Twitter for Android 11
## retweet_retweet_count retweet_user_id retweet_screen_name
## 1 58 3.684351e+08 teamrb_
## 2 58 3.684351e+08 teamrb_
## 3 4 1.109159e+08 racquel_erika
## 4 32 3.853108e+09 ClassySaifian
## 5 NA NA <NA>
## 6 2 1.126794e+18 ABHI_ADholic04
## retweet_name retweet_followers_count
## 1 REAL BOXOFFICE 16590
## 2 REAL BOXOFFICE 16590
## 3 forever aamirian 831
## 4 <U+2614><U+FE0F>CLASSY SAIFIAN<U+2614> 1770
## 5 <NA> NA
## 6 ABHI_Tanhaji04 570
## retweet_friends_count retweet_statuses_count retweet_location
## 1 1 5215 New Delhi, India
## 2 1 5215 New Delhi, India
## 3 72 11616 India
## 4 32 21547
## 5 NA NA <NA>
## 6 603 13246
##
retweet_description
## 1
Typing the truth below ..<U+0001F447><U+0001F64F>
## 2
Typing the truth below ..<U+0001F447><U+0001F64F>
## 3 only aamir khan rock my
world...i love u aamir khan for life..i know one day i will see aamir khan up close &
personal
## 4 ur Fav Actor mi8 b a bigger<U+2B50><U+FE0F>but definitely not a better INSAAN than
SAIF SIR..Sir is epitome of Class,Royalness& humbleness.(Fan Account of Megastar SAIF
Sir).
## 5
<NA>
## 6
Tanhaji on 10 jan , watch it in 3d
## retweet_verified place_url place_name place_full_name place_type country
## 1 FALSE <NA> <NA> <NA> <NA> <NA>
## 2 FALSE <NA> <NA> <NA> <NA> <NA>
## 3 FALSE <NA> <NA> <NA> <NA> <NA>
## 4 FALSE <NA> <NA> <NA> <NA> <NA>
## 5 NA <NA> <NA> <NA> <NA> <NA>
## 6 FALSE <NA> <NA> <NA> <NA> <NA>
## country_code geo_coords coords_coords bbox_coords
## 1 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 2 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 3 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 4 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 5 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 6 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## status_url name
## 1 https://twitter.com/bjadams156/status/1212471332127948801 Brandon Adams
## 2 https://twitter.com/bjadams156/status/1212471332127948801 Brandon Adams
## 3 https://twitter.com/A_Jay_FanNepal/status/1212552672995356672 ADFnepal
## 4 https://twitter.com/MizarPradyum/status/1212556070024953856 Ajay Devgn fan
## 5 https://twitter.com/ABHI_ADholic04/status/1212557301426589698 ABHI_Tanhaji04
## 6 https://twitter.com/MizarPradyum/status/1212557570491047936 Ajay Devgn fan
## location
## 1 Winston-Salem,NC
## 2 Winston-Salem,NC
## 3 kathmandu, Nepal
## 4 Indiana, USA
## 5
## 6 Indiana, USA
## description url
## 1 I have a twin sister.I graduated from Glenn High School in 2013. <NA>
## 2 I have a twin sister.I graduated from Glenn High School in 2013. <NA>
## 3 Movie Lovers - @ajayDevgn - #maidaan #Tanhaji #bhuj #RRR <NA>
## 4 welcome to ajay devgn kingdom <NA>
## 5 Tanhaji on 10 jan , watch it in 3d <NA>
## 6 welcome to ajay devgn kingdom <NA>
## protected followers_count friends_count listed_count statuses_count
## 1 FALSE 1171 5002 24 64508
## 2 FALSE 1171 5002 24 64508
## 3 FALSE 4957 554 28 73249
## 4 FALSE 180 461 0 23128
## 5 FALSE 570 603 1 13246
## 6 FALSE 180 461 0 23128
## favourites_count account_created_at verified profile_url
## 1 62563 2014-02-13 03:26:57 FALSE <NA>
## 2 62563 2014-02-13 03:26:57 FALSE <NA>
## 3 16149 2011-10-31 15:40:10 FALSE <NA>
## 4 30028 2019-07-08 18:09:56 FALSE <NA>
## 5 4616 2019-05-10 10:20:10 FALSE <NA>
## 6 30028 2019-07-08 18:09:56 FALSE <NA>
## profile_expanded_url account_lang
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
## 6 <NA> NA
## profile_banner_url
## 1 <NA>
## 2 <NA>
## 3 https://pbs.twimg.com/profile_banners/402117759/1474271309
## 4 https://pbs.twimg.com/profile_banners/1148293149174820864/1575377431
## 5 https://pbs.twimg.com/profile_banners/1126794046528077824/1576308416
## 6 https://pbs.twimg.com/profile_banners/1148293149174820864/1575377431
## profile_background_url
## 1 http://abs.twimg.com/images/themes/theme1/bg.png
## 2 http://abs.twimg.com/images/themes/theme1/bg.png
## 3 http://abs.twimg.com/images/themes/theme1/bg.png
## 4 <NA>
## 5 <NA>
## 6 <NA>
## profile_image_url
## 1 http://pbs.twimg.com/profile_images/538421341054312451/1MyROBLw_normal.jpeg
## 2 http://pbs.twimg.com/profile_images/538421341054312451/1MyROBLw_normal.jpeg
## 3 http://pbs.twimg.com/profile_images/1111984479063531521/Vk48l5bv_normal.jpg
## 4 http://pbs.twimg.com/profile_images/1215181148839530496/NUdgSpAR_normal.jpg
## 5 http://pbs.twimg.com/profile_images/1208985936597409792/H6J4Ls-9_normal.jpg
## 6 http://pbs.twimg.com/profile_images/1215181148839530496/NUdgSpAR_normal.jpg
tail(movie_tweets_data)
## X.1 X user_id status_id created_at screen_name
## 174346 66582 142 1.214896e+18 1.216941e+18 2020-01-14 04:32:37 Chintan64138110
## 174347 66576 136 1.158960e+18 1.216941e+18 2020-01-14 04:32:41 Gajendr16824337
## 174348 66448 8 1.387810e+08 1.216941e+18 2020-01-14 04:32:43 onenonlymahakal
## 174349 66443 3 1.163468e+18 1.216941e+18 2020-01-14 04:32:44 shri0944
## 174350 66441 1 1.669707e+09 1.216941e+18 2020-01-14 04:32:45 pruthwirajdeo
## 174351 66442 2 7.057414e+17 1.216941e+18 2020-01-14 04:32:45 Aryaman111
##
text
## 174346 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174347 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174348 As per the reliable
sources #TanhajiTheUnsungWarrior will be declared national film of India by our home
minister @AmitShah in next parliament session.\n\nCongratulations @ajaydevgn @omraut
<U+0001F389><U+0001F389>#Tanhaji
## 174349 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174350 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174351 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## source display_text_width reply_to_status_id
## 174346 Twitter for Android 140 NA
## 174347 Twitter for Android 140 NA
## 174348 Twitter for Android 140 NA
## 174349 Twitter Web App 140 NA
## 174350 Twitter for Android 140 NA
## 174351 Twitter for Android 140 NA
## reply_to_user_id reply_to_screen_name is_quote is_retweet favorite_count
## 174346 NA <NA> FALSE TRUE 0
## 174347 NA <NA> FALSE TRUE 0
## 174348 NA <NA> FALSE TRUE 0
## 174349 NA <NA> FALSE TRUE 0
## 174350 NA <NA> FALSE TRUE 0
## 174351 NA <NA> FALSE TRUE 0
## retweet_count quote_count reply_count hashtags symbols
## 174346 25 NA NA Tanhaji NA
## 174347 25 NA NA Tanhaji NA
## 174348 14 NA NA TanhajiTheUnsungWarrior NA
## 174349 25 NA NA Tanhaji NA
## 174350 25 NA NA Tanhaji NA
## 174351 25 NA NA Tanhaji NA
## urls_url urls_t.co urls_expanded_url media_url media_t.co
## 174346 <NA> <NA> <NA> <NA> <NA>
## 174347 <NA> <NA> <NA> <NA> <NA>
## 174348 <NA> <NA> <NA> <NA> <NA>
## 174349 <NA> <NA> <NA> <NA> <NA>
## 174350 <NA> <NA> <NA> <NA> <NA>
## 174351 <NA> <NA> <NA> <NA> <NA>
## media_expanded_url media_type ext_media_url ext_media_t.co
## 174346 <NA> <NA> <NA> <NA>
## 174347 <NA> <NA> <NA> <NA>
## 174348 <NA> <NA> <NA> <NA>
## 174349 <NA> <NA> <NA> <NA>
## 174350 <NA> <NA> <NA> <NA>
## 174351 <NA> <NA> <NA> <NA>
## ext_media_expanded_url ext_media_type mentions_user_id
## 174346 <NA> NA 2924521080
## 174347 <NA> NA 2924521080
## 174348 <NA> NA 2754072768
## 174349 <NA> NA 2924521080
## 174350 <NA> NA 2924521080
## 174351 <NA> NA 2924521080
## mentions_screen_name lang quoted_status_id quoted_text quoted_created_at
## 174346 davidfrawleyved en NA <NA> <NA>
## 174347 davidfrawleyved en NA <NA> <NA>
## 174348 RoninADfannn en NA <NA> <NA>
## 174349 davidfrawleyved en NA <NA> <NA>
## 174350 davidfrawleyved en NA <NA> <NA>
## 174351 davidfrawleyved en NA <NA> <NA>
## quoted_source quoted_favorite_count quoted_retweet_count quoted_user_id
## 174346 <NA> NA NA NA
## 174347 <NA> NA NA NA
## 174348 <NA> NA NA NA
## 174349 <NA> NA NA NA
## 174350 <NA> NA NA NA
## 174351 <NA> NA NA NA
## quoted_screen_name quoted_name quoted_followers_count
## 174346 <NA> <NA> NA
## 174347 <NA> <NA> NA
## 174348 <NA> <NA> NA
## 174349 <NA> <NA> NA
## 174350 <NA> <NA> NA
## 174351 <NA> <NA> NA
## quoted_friends_count quoted_statuses_count quoted_location
## 174346 NA NA <NA>
## 174347 NA NA <NA>
## 174348 NA NA <NA>
## 174349 NA NA <NA>
## 174350 NA NA <NA>
## 174351 NA NA <NA>
## quoted_description quoted_verified retweet_status_id
## 174346 <NA> NA 1.216941e+18
## 174347 <NA> NA 1.216941e+18
## 174348 <NA> NA 1.216796e+18
## 174349 <NA> NA 1.216941e+18
## 174350 <NA> NA 1.216941e+18
## 174351 <NA> NA 1.216941e+18
##
retweet_text
## 174346 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174347 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174348 As per the reliable
sources #TanhajiTheUnsungWarrior will be declared national film of India by our home
minister @AmitShah in next parliament session.\n\nCongratulations @ajaydevgn @omraut
<U+0001F389><U+0001F389>#Tanhaji
## 174349 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174350 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 174351 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## retweet_created_at retweet_source retweet_favorite_count
## 174346 2020-01-14 04:30:55 Twitter Web App 99
## 174347 2020-01-14 04:30:55 Twitter Web App 99
## 174348 2020-01-13 18:55:21 Twitter for Android 23
## 174349 2020-01-14 04:30:55 Twitter Web App 99
## 174350 2020-01-14 04:30:55 Twitter Web App 99
## 174351 2020-01-14 04:30:55 Twitter Web App 99
## retweet_retweet_count retweet_user_id retweet_screen_name
## 174346 25 2924521080 davidfrawleyved
## 174347 25 2924521080 davidfrawleyved
## 174348 14 2754072768 RoninADfannn
## 174349 25 2924521080 davidfrawleyved
## 174350 25 2924521080 davidfrawleyved
## 174351 25 2924521080 davidfrawleyved
## retweet_name
## 174346 Dr David Frawley
## 174347 Dr David Frawley
## 174348 THE WHITE WOLF..!! #Tanhaji <U+0001F6A9><U+0001F6A9>
## 174349 Dr David Frawley
## 174350 Dr David Frawley
## 174351 Dr David Frawley
## retweet_followers_count retweet_friends_count retweet_statuses_count
## 174346 307220 153 23992
## 174347 307220 153 23992
## 174348 1106 150 14186
## 174349 307220 153 23992
## 174350 307220 153 23992
## 174351 307220 153 23992
## retweet_location
## 174346 Santa Fe, NM USA
## 174347 Santa Fe, NM USA
## 174348
## 174349 Santa Fe, NM USA
## 174350 Santa Fe, NM USA
## 174351 Santa Fe, NM USA
##
retweet_description
## 174346 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 174347 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 174348
## 174349 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 174350 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 174351 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## retweet_verified place_url place_name place_full_name place_type country
## 174346 TRUE <NA> <NA> <NA> <NA> <NA>
## 174347 TRUE <NA> <NA> <NA> <NA> <NA>
## 174348 FALSE <NA> <NA> <NA> <NA> <NA>
## 174349 TRUE <NA> <NA> <NA> <NA> <NA>
## 174350 TRUE <NA> <NA> <NA> <NA> <NA>
## 174351 TRUE <NA> <NA> <NA> <NA> <NA>
## country_code geo_coords coords_coords bbox_coords
## 174346 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 174347 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 174348 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 174349 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 174350 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 174351 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## status_url
## 174346 https://twitter.com/Chintan64138110/status/1216941161698349057
## 174347 https://twitter.com/Gajendr16824337/status/1216941181264777217
## 174348 https://twitter.com/onenonlymahakal/status/1216941189032640512
## 174349 https://twitter.com/shri0944/status/1216941193944190976
## 174350 https://twitter.com/pruthwirajdeo/status/1216941197249282050
## 174351 https://twitter.com/Aryaman111/status/1216941195357753345
## name location
## 174346 Chintan Kumar
## 174347 Gajendra Singh Shekhawat
## 174348 Nirmal Aj fan <U+0001F1EE><U+0001F1F3> Indore, India
## 174349 Shri (<U+0936><U+094D><U+0930><U+0940>) Bengaluru, India
## 174350 pruthwiraj
## 174351 Vivek
##
description
## 174346
Rashtravaadi | name changed for security reason |
## 174347
Friendly
## 174348
movie n cricket maniac,shiv bhakt n believer
## 174349 Proud Hindian (Hindu+Indian) \n<U+0C9C><U+0CC8>
<U+0CB6><U+0CCD><U+0CB0><U+0CC0> <U+0CB0><U+0CBE><U+0CAE><U+0CCD>
<U+0001F6A9>\n<U+0CAD><U+0CBE><U+0CB0><U+0CA4><U+0CCD> <U+0CAE><U+0CBE><U+0CA4><U+0CBE>
<U+0C95><U+0CBF> <U+0C9C><U+0CC8>
<U+0001F1EE><U+0001F1F3>\n<U+0CB5><U+0C82><U+0CA6><U+0CC7>
<U+0CAE><U+0CBE><U+0CA4><U+0CB0><U+0C82> <U+0001F1EE><U+0001F1F3>
## 174350
## 174351
## 174346 659 2020-01-08 13:06:03 FALSE <NA>
## 174347 2785 2019-08-07 04:35:56 FALSE <NA>
## 174348 15036 2010-04-30 15:30:39 FALSE <NA>
## 174349 1824 2019-08-19 15:10:40 FALSE <NA>
## 174350 1378 2013-08-14 06:24:51 FALSE <NA>
## 174351 16825 2016-03-04 13:07:19 FALSE <NA>
## profile_expanded_url account_lang
## 174346 <NA> NA
## 174347 <NA> NA
## 174348 <NA> NA
## 174349 <NA> NA
## 174350 <NA> NA
## 174351 <NA> NA
## profile_banner_url
## 174346 <NA>
## 174347 <NA>
## 174348 https://pbs.twimg.com/profile_banners/138781014/1495135789
## 174349 https://pbs.twimg.com/profile_banners/1163468236496617474/1578248233
## 174350 <NA>
## 174351 <NA>
## profile_background_url
## 174346 <NA>
## 174347 <NA>
## 174348 http://abs.twimg.com/images/themes/theme1/bg.png
## 174349 <NA>
## 174350 http://abs.twimg.com/images/themes/theme1/bg.png
## 174351 <NA>
##
profile_image_url
## 174346
http://pbs.twimg.com/profile_images/1214897021569470464/PsmQvyxI_normal.jpg
## 174347
http://pbs.twimg.com/profile_images/1158960946838044673/8WZueqx5_normal.jpg
## 174348
http://pbs.twimg.com/profile_images/688317640897638403/ELaY-ZEX_normal.jpg
## 174349
http://pbs.twimg.com/profile_images/1163468598603444224/Ia-OmyqY_normal.jpg
## 174350
http://pbs.twimg.com/profile_images/378800000293680962/f80a13a608555e74bc6c43f883e9eb03_normal.jpeg
## 174351
http://pbs.twimg.com/profile_images/1207229736100884480/s4fAGDXh_normal.jpg
Creating the variables for the pre- and post-release dates of the movie
# Edit the Release date of the movie in MM/DD/YYYY format
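The code for this step is not shown in the output above, so the following is only a minimal sketch of how the release-date variables could be set up. The variable names (`release_date_string`, `release_date`, `pre_end`, `post_start`) are hypothetical, and the date 10 January 2020 is an assumption taken from the tweets shown in this report ("Tanhaji on 10 jan").

```r
# Hypothetical variable names; not taken from the original script.
# Release date in MM/DD/YYYY format (10 Jan 2020 is assumed from the tweets shown).
release_date_string <- "01/10/2020"

# Parse the MM/DD/YYYY string into a Date object.
release_date <- as.Date(release_date_string, format = "%m/%d/%Y")

# Last day treated as pre-release and first day treated as post-release.
pre_end <- release_date - 1
post_start <- release_date
```

Keeping the date in a single variable means that only one line needs editing when the analysis is repeated for another movie.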
Filter the pre-release data from the dataset
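As an illustration of this filtering step, the sketch below splits a small toy data frame on the release date. It is an assumption about how the split could be done, not the original code: `toy_tweets`, `tweet_day`, `pre_toy` and `post_toy` are hypothetical names, the toy rows stand in for `movie_tweets_data`, and `created_at` is assumed to be stored as text in `YYYY-MM-DD HH:MM:SS` form, as it appears in the printed output.

```r
# Toy stand-in for the full tweet data frame (hypothetical rows and names).
toy_tweets <- data.frame(
  status_id = c("a", "b", "c"),
  created_at = c("2020-01-01 15:48:21",
                 "2020-01-09 23:59:27",
                 "2020-01-14 04:32:45"),
  stringsAsFactors = FALSE
)

# Assumed release date, given in MM/DD/YYYY format.
release_date <- as.Date("01/10/2020", format = "%m/%d/%Y")

# Take the calendar day from the timestamp text (first 10 characters).
tweet_day <- as.Date(substr(toy_tweets$created_at, 1, 10))

# Tweets dated before the release day are pre-release; the rest are post-release.
pre_toy <- toy_tweets[tweet_day < release_date, ]
post_toy <- toy_tweets[tweet_day >= release_date, ]
```

Comparing on the calendar day rather than the full timestamp avoids time-zone conversions, which matches the pre-release output in this report ending on 2020-01-09.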
## 2 <NA> <NA> <NA>
## 3 <NA> <NA> <NA>
## 4 <NA> <NA> <NA>
## 5 <NA> <NA> <NA>
## 6 <NA> <NA> <NA>
## media_url media_t.co
## 1 <NA> <NA>
## 2 http://pbs.twimg.com/media/ENSf8oHUUAI7mcr.jpg https://t.co/qSsZ0HMA7R
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## media_expanded_url media_type
## 1 <NA> <NA>
## 2 https://twitter.com/Meena_Iyer/status/1212770073535864832/photo/1 photo
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## ext_media_url ext_media_t.co
## 1 <NA> <NA>
## 2 http://pbs.twimg.com/media/ENSf8oHUUAI7mcr.jpg https://t.co/qSsZ0HMA7R
## 3 <NA> <NA>
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## ext_media_expanded_url
## 1 <NA>
## 2 https://twitter.com/Meena_Iyer/status/1212770073535864832/photo/1
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## ext_media_type mentions_user_id mentions_screen_name lang quoted_status_id
## 1 NA 2390513293 PuneTimesOnline en NA
## 2 NA 143098087 Meena_Iyer en NA
## 3 NA 497770148 RajiniFC en NA
## 4 NA 139639456 BreakingViews4u en NA
## 5 NA 497770148 RajiniFC en NA
## 6 NA 565560313 PradeepBastola en NA
## quoted_text quoted_created_at quoted_source quoted_favorite_count
## 1 <NA> <NA> <NA> NA
## 2 <NA> <NA> <NA> NA
## 3 <NA> <NA> <NA> NA
## 4 <NA> <NA> <NA> NA
## 5 <NA> <NA> <NA> NA
## 6 <NA> <NA> <NA> NA
## quoted_retweet_count quoted_user_id quoted_screen_name quoted_name
## 1 NA NA <NA> <NA>
## 2 NA NA <NA> <NA>
## 3 NA NA <NA> <NA>
## 4 NA NA <NA> <NA>
## 5 NA NA <NA> <NA>
## 6 NA NA <NA> <NA>
## quoted_followers_count quoted_friends_count quoted_statuses_count
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## quoted_location quoted_description quoted_verified retweet_status_id
## 1 <NA> <NA> NA 1.212779e+18
## 2 <NA> <NA> NA 1.212770e+18
## 3 <NA> <NA> NA 1.212737e+18
## 4 <NA> <NA> NA 1.212821e+18
## 5 <NA> <NA> NA 1.212737e+18
## 6 <NA> <NA> NA 1.212581e+18
##
retweet_text
## 1 With just days to the release
of #TanhajiTheUnsungWarriror, the makers are making the most of the time to promote the
movie. @ajaydevgn @TanhajiFilm @omraut @SharadK7 @itsKajolD #SaifAliKhan #Tanhaji
\nhttps://t.co/ewbaCNXKq2
## 2
8 days to go #Tanhaji https://t.co/qSsZ0HMA7R
## 3 Pan-Indian movie #Darbar overtakes Telugu biggie
#SarileruNekkevvaru and Hindi biggie #Tanhaji to become the most awaited movie! Let's hit
100K interests before the end of this weekend, folks! 200K before release!
<U+0001F60E><U+0001F918> https://t.co/1djPtbZBo5
## 4 #TANHAJI rated 15 by British censor board #BBFC\nRunning time 130m 30s\n#BBFCInsight
strong violence, bloody images\n#TanhajiTheUnsungWarrior a historical action drama set in
17th century in which a Maratha warrior embarks on a mission to recapture a hill fortress
taken by Mughal
## 5 Pan-Indian movie #Darbar overtakes Telugu biggie
#SarileruNekkevvaru and Hindi biggie #Tanhaji to become the most awaited movie! Let's hit
100K interests before the end of this weekend, folks! 200K before release!
<U+0001F60E><U+0001F918> https://t.co/1djPtbZBo5
## 6
Tanhaji in hindi belts and Darbar in South to have huge box office openings as per
interest shown in BMS both have 40.4K and more. Chhapak to start on a dull note
everything depends on WOM.#Tanhaji
## retweet_created_at retweet_source retweet_favorite_count
## 1 2020-01-02 16:52:00 TweetDeck 137
## 2 2020-01-02 16:18:12 Twitter for iPhone 205
## 3 2020-01-02 14:06:41 Twitter Web App 410
## 4 2020-01-02 19:39:22 Twitter Web App 18
## 5 2020-01-02 14:06:41 Twitter Web App 410
## 6 2020-01-02 03:48:21 Twitter for Android 3
## retweet_retweet_count retweet_user_id retweet_screen_name
## 1 37 2390513293 PuneTimesOnline
## 2 41 143098087 Meena_Iyer
## 3 172 497770148 RajiniFC
## 4 5 139639456 BreakingViews4u
## 5 172 497770148 RajiniFC
## 6 2 565560313 PradeepBastola
## retweet_name retweet_followers_count retweet_friends_count
## 1 Pune Times 67861 1261
## 2 Meena Iyer 4414 103
## 3 Rajinikanth Fans <U+0001F918> 56964 275
## 4 Breaking Movies 23424 1709
## 5 Rajinikanth Fans <U+0001F918> 56964 275
## 6 Pradeep Bastola 143 51
## retweet_statuses_count retweet_location
## 1 16903 Pune, India
## 2 2243 India
## 3 29618
## 4 84747 FB.com/BreakMovies
## 5 29618
## 6 1458 Nepal
##
retweet_description
## 1 Official handle of Pune Times. Follow for news about the
city and updates from Bollywood and the Marathi entertainment industry
## 2 Influencer, CEO, Ajay Devgn FFilms, Ex-EDITOR Bombay Times and DNA
After Hours, Author Khullam Khulla. Retweets are not endorsements.
## 3 Rajini FC is a tribute to Superstar Rajinikanth handled by Team Rajinists. Updates
on Thalaivar Rajnikanth's next films and more. #Darbar #Thalaivar168
## 4
!!!!! Engine out, completely !!!!!
## 5 Rajini FC is a tribute to Superstar Rajinikanth handled by Team Rajinists. Updates
on Thalaivar Rajnikanth's next films and more. #Darbar #Thalaivar168
## 6 Tweets are personal, RTs are not
endorsement. I love my country, sports, movies and Medical Science.
## retweet_verified place_url place_name place_full_name place_type country
## 1 TRUE <NA> <NA> <NA> <NA> <NA>
## 2 TRUE <NA> <NA> <NA> <NA> <NA>
## 3 FALSE <NA> <NA> <NA> <NA> <NA>
## 4 FALSE <NA> <NA> <NA> <NA> <NA>
## 5 FALSE <NA> <NA> <NA> <NA> <NA>
## 6 FALSE <NA> <NA> <NA> <NA> <NA>
## country_code geo_coords coords_coords bbox_coords
## 1 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 2 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 3 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 4 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 5 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 6 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## status_url
## 1 https://twitter.com/FanOfAjayDevgn1/status/1212890025676816384
## 2 https://twitter.com/FanOfAjayDevgn1/status/1212890414467837953
## 3 https://twitter.com/theeejay_muc/status/1212890536119545858
## 4 https://twitter.com/Tanhaji_25Dec19/status/1212891789150961664
## 5 https://twitter.com/Gopinat38606021/status/1212896540500512768
## 6 https://twitter.com/AdiansNepal/status/1212900453937111040
## name location
## 1 <U+0001F31F>Fan Of Ajay Devgn<U+0001F31F>
## 2 <U+0001F31F>Fan Of Ajay Devgn<U+0001F31F>
## 3 Theeejay Germany
## 4 Sagar New Delhi, India
## 5 Gopinath
## 6 ADF NEPAL Kathmandu
##
description
## 1
TANHAJI \ni love Ajay sir
## 2
TANHAJI \ni love Ajay sir
## 3
Hi, here's is Theeejay! Main Twitter account: @theeejay
## 4
## 5
## 6 @ajaydevgn fan club Nepal. Die hard fan of the King of intensity & versatility, Two
Time national award winner & the real action hero. undisputed king of clash<U+0001F4AA>
## url protected followers_count friends_count listed_count statuses_count
## 1 <NA> FALSE 200 73 0 14911
## 2 <NA> FALSE 200 73 0 14911
## 3 <NA> FALSE 79 52 0 599
## 4 <NA> FALSE 1477 1091 11 26284
## 5 <NA> FALSE 354 713 2 30727
## 6 <NA> FALSE 101 265 0 3182
## favourites_count account_created_at verified profile_url
## 1 28093 2018-10-23 12:31:40 FALSE <NA>
## 2 28093 2018-10-23 12:31:40 FALSE <NA>
## 3 490 2019-10-15 15:47:25 FALSE <NA>
## 4 31826 2010-01-01 05:57:11 FALSE <NA>
## 5 38991 2019-04-28 16:27:36 FALSE <NA>
## 6 1282 2017-12-23 14:08:01 FALSE <NA>
## profile_expanded_url account_lang
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
## 6 <NA> NA
## profile_banner_url
## 1 https://pbs.twimg.com/profile_banners/1054711956664434688/1555838130
## 2 https://pbs.twimg.com/profile_banners/1054711956664434688/1555838130
## 3 https://pbs.twimg.com/profile_banners/1184133591908921345/1576187331
## 4 https://pbs.twimg.com/profile_banners/100915145/1571653346
## 5 <NA>
## 6 https://pbs.twimg.com/profile_banners/944570289953824768/1514039933
## profile_background_url
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 http://abs.twimg.com/images/themes/theme1/bg.png
## 5 <NA>
## 6 http://abs.twimg.com/images/themes/theme1/bg.png
## profile_image_url
## 1 http://pbs.twimg.com/profile_images/1194577883181699073/S4UiPutd_normal.jpg
## 2 http://pbs.twimg.com/profile_images/1194577883181699073/S4UiPutd_normal.jpg
## 3 http://pbs.twimg.com/profile_images/1187238869311381504/AeBQsav4_normal.jpg
## 4 http://pbs.twimg.com/profile_images/1164493859386085376/cgxrrBl5_normal.png
## 5 http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png
## 6 http://pbs.twimg.com/profile_images/1212555422906769411/uw-yV2xx_normal.jpg
tail(pre_tweets_data)
## X.1 X user_id status_id created_at
## 67488 173184 106744 7.084906e+17 1.215422e+18 2020-01-09 23:56:04
## 67489 173181 106741 1.210608e+18 1.215422e+18 2020-01-09 23:56:21
## 67490 173168 106728 7.936731e+17 1.215422e+18 2020-01-09 23:56:58
## 67491 173167 106727 1.938105e+08 1.215422e+18 2020-01-09 23:57:10
## 67492 173166 106726 8.528880e+17 1.215423e+18 2020-01-09 23:58:32
## 67493 173165 106725 2.986419e+09 1.215423e+18 2020-01-09 23:59:27
## screen_name
## 67488 vk9378
## 67489 DeepakK98376858
## 67490 dev4Ind
## 67491 Ornawalla
## 67492 KrishnamitraHKJ
## 67493 vishalmellark
##
text
## 67488 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 67489
"Bharpoor Entertainer, Lajawab Film #Tanhaji".- Journalist #TanhajiReview\n\nBest Of
Luck & Congratulations In Advance !!!\n@ajaydevgn and Fans !!!
#TanhajiTheUnsungWarrior https://t.co/0oV0N3TDpn
## 67490 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 67491 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 67492
Me:- 2morrow my a/c will be @verified & we both will go to see #Tanhaji also
#Boycott_Chhapaak\n\nTwitter :-
<U+0001F447><U+0001F447><U+0001F937><U+0001F3FB><U+200D><U+2642><U+FE0F><U+0001F648><U+0001F602> https://t.co/eQm6hUrwYm
## 67493
#Thalaiva the day 1 business of #Darbar > #Tanhaji + #Chapaak more than double it is
actually if the worldwide fig <U+0001F60D> one sun .. one moon .. one star ..
superstar ..
## source display_text_width reply_to_status_id
## 67488 Twitter for Android 140 NA
## 67489 Twitter for Android 143 NA
## 67490 Twitter for iPhone 140 NA
## 67491 Twitter for iPhone 140 NA
## 67492 Twitter for Android 133 NA
## 67493 Twitter for iPhone 142 NA
## reply_to_user_id reply_to_screen_name is_quote is_retweet favorite_count
## 67488 NA <NA> FALSE TRUE 0
## 67489 NA <NA> FALSE TRUE 0
## 67490 NA <NA> FALSE TRUE 0
## 67491 NA <NA> FALSE TRUE 0
## 67492 NA <NA> FALSE TRUE 0
## 67493 NA <NA> FALSE TRUE 0
## retweet_count quote_count reply_count
## 67488 10287 NA NA
## 67489 144 NA NA
## 67490 10287 NA NA
## 67491 10287 NA NA
## 67492 38 NA NA
## 67493 2 NA NA
## hashtags symbols urls_url urls_t.co
## 67488 c("OneWordReview", "Tanhaji", "Tanhaji") NA <NA> <NA>
## 67489 c("Tanhaji", "TanhajiReview") NA <NA> <NA>
## 67490 c("OneWordReview", "Tanhaji", "Tanhaji") NA <NA> <NA>
## 67491 c("OneWordReview", "Tanhaji", "Tanhaji") NA <NA> <NA>
## 67492 c("Tanhaji", "Boycott_Chhapaak") NA <NA> <NA>
## 67493 c("Thalaiva", "Darbar", "Tanhaji", "Chapaak") NA <NA> <NA>
## urls_expanded_url media_url media_t.co media_expanded_url media_type
## 67488 <NA> <NA> <NA> <NA> <NA>
## 67489 <NA> <NA> <NA> <NA> <NA>
## 67490 <NA> <NA> <NA> <NA> <NA>
## 67491 <NA> <NA> <NA> <NA> <NA>
## 67492 <NA> <NA> <NA> <NA> <NA>
## 67493 <NA> <NA> <NA> <NA> <NA>
## ext_media_url ext_media_t.co ext_media_expanded_url ext_media_type
## 67488 <NA> <NA> <NA> NA
## 67489 <NA> <NA> <NA> NA
## 67490 <NA> <NA> <NA> NA
## 67491 <NA> <NA> <NA> NA
## 67492 <NA> <NA> <NA> NA
## 67493 <NA> <NA> <NA> NA
## mentions_user_id mentions_screen_name lang
## 67488 99642673 taran_adarsh en
## 67489 1610358128 iSKsCombat_ en
## 67490 99642673 taran_adarsh en
## 67491 99642673 taran_adarsh en
## 67492 c("1064198326990458880", "63796828") c("IamPurn", "verified") en
## 67493 1135988009453547520 Justano84979866 en
## quoted_status_id quoted_text quoted_created_at quoted_source
## 67488 NA <NA> <NA> <NA>
## 67489 NA <NA> <NA> <NA>
## 67490 NA <NA> <NA> <NA>
## 67491 NA <NA> <NA> <NA>
## 67492 NA <NA> <NA> <NA>
## 67493 NA <NA> <NA> <NA>
## quoted_favorite_count quoted_retweet_count quoted_user_id
## 67488 NA NA NA
## 67489 NA NA NA
## 67490 NA NA NA
## 67491 NA NA NA
## 67492 NA NA NA
## 67493 NA NA NA
## quoted_screen_name quoted_name quoted_followers_count
## 67488 <NA> <NA> NA
## 67489 <NA> <NA> NA
## 67490 <NA> <NA> NA
## 67491 <NA> <NA> NA
## 67492 <NA> <NA> NA
## 67493 <NA> <NA> NA
## quoted_friends_count quoted_statuses_count quoted_location
## 67488 NA NA <NA>
## 67489 NA NA <NA>
## 67490 NA NA <NA>
## 67491 NA NA <NA>
## 67492 NA NA <NA>
## 67493 NA NA <NA>
## quoted_description quoted_verified retweet_status_id
## 67488 <NA> NA 1.215295e+18
## 67489 <NA> NA 1.215298e+18
## 67490 <NA> NA 1.215295e+18
## 67491 <NA> NA 1.215295e+18
## 67492 <NA> NA 1.215317e+18
## 67493 <NA> NA 1.215396e+18
##
retweet_text
## 67488 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 67489
"Bharpoor Entertainer, Lajawab Film #Tanhaji".- Journalist #TanhajiReview\n\nBest Of
Luck & Congratulations In Advance !!!\n@ajaydevgn and Fans !!!
#TanhajiTheUnsungWarrior https://t.co/0oV0N3TDpn
## 67490 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 67491 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 67492
Me:- 2morrow my a/c will be @verified & we both will go to see #Tanhaji also
#Boycott_Chhapaak\n\nTwitter :-
<U+0001F447><U+0001F447><U+0001F937><U+0001F3FB><U+200D><U+2642><U+FE0F><U+0001F648><U+0001F602> https://t.co/eQm6hUrwYm
## 67493
#Thalaiva the day 1 business of #Darbar > #Tanhaji + #Chapaak more than double it is
actually if the worldwide fig <U+0001F60D> one sun .. one moon .. one star ..
superstar ..
## retweet_created_at retweet_source retweet_favorite_count
## 67488 2020-01-09 15:31:45 Twitter for iPad 49234
## 67489 2020-01-09 15:42:11 Twitter Web App 312
## 67490 2020-01-09 15:31:45 Twitter for iPad 49234
## 67491 2020-01-09 15:31:45 Twitter for iPad 49234
## 67492 2020-01-09 16:58:50 Twitter for Android 10
## 67493 2020-01-09 22:12:14 Twitter for iPhone 7
## retweet_retweet_count retweet_user_id retweet_screen_name
## 67488 10287 9.964267e+07 taran_adarsh
## 67489 144 1.610358e+09 iSKsCombat_
## 67490 10287 9.964267e+07 taran_adarsh
## 67491 10287 9.964267e+07 taran_adarsh
## 67492 38 1.064198e+18 IamPurn
## 67493 2 1.135988e+18 Justano84979866
##
retweet_name
## 67488
taran adarsh
## 67489
Sardar Singh
## 67490
taran adarsh
## 67491
taran adarsh
## 67492
Nipun<U+0001F1EE><U+0001F1F3><U+0001F441><U+FE0F><U+0001F443><U+0001F441><U+FE0F><U+0001F6A9>
## 67493 <U+0930><U+093E><U+0927><U+0947> <U+092E><U+094B><U+0939><U+0928>
<U+0915><U+0947> <U+092B><U+093C><U+0948><U+0928>
## retweet_followers_count retweet_friends_count retweet_statuses_count
## 67488 3741490 168 34910
## 67489 1858 1342 5520
## 67490 3741490 168 34910
## 67491 3741490 168 34910
## 67492 8873 6668 17149
## 67493 3 2 265
##
retweet_location
## 67488
Mumbai, India
## 67489
Punjab, India
## 67490
Mumbai, India
## 67491
Mumbai, India
## 67492 <U+0938><U+093E><U+0930><U+0947> <U+091C><U+0939><U+0949> <U+0938><U+0947>
<U+0905><U+091A><U+094D><U+091B><U+093E><U+0001F449><U+092D><U+093E><U+0930><U+0924>
<U+092A><U+094D><U+092F><U+093E><U+0930><U+093E>
## 67493
##
retweet_description
## 67488
Movie critic | Biz analyst | Influencer | Instagram: https://t.co/4hvdbhvfkX
## 67489
If you even dream of beating me, you better wake up and apologise!!! .\nOnly God Of
Bollywood @BeingSalmanKhan Matters And Rules!!! \n#SalmanKhan Fan
## 67490
Movie critic | Biz analyst | Influencer | Instagram: https://t.co/4hvdbhvfkX
## 67491
Movie critic | Biz analyst | Influencer | Instagram: https://t.co/4hvdbhvfkX
## 67492
#<U+0932><U+0947><U+0916><U+0915><U+270D><U+FE0F>#<U+0930><U+093E><U+0937><U+094D><U+091F><U+094D><U+0930><U+0935><U+093E><U+0926><U+0940>,#TaxPayer,TwitsIn<U+2764>
#LovesNature<U+0001F331>,flwdBy<U+0001F449>@SDPachauri
@bhavsarhardiik,@Real_anuj,@shwait_malik @caopmishra @zammit_marc<U+0001F60D> Sis-
@Saritasidh
## 67493
Here only for Salman Khan ..
## retweet_verified place_url place_name place_full_name place_type country
## 67488 TRUE <NA> <NA> <NA> <NA> <NA>
## 67489 FALSE <NA> <NA> <NA> <NA> <NA>
## 67490 TRUE <NA> <NA> <NA> <NA> <NA>
## 67491 TRUE <NA> <NA> <NA> <NA> <NA>
## 67492 FALSE <NA> <NA> <NA> <NA> <NA>
## 67493 FALSE <NA> <NA> <NA> <NA> <NA>
## country_code geo_coords coords_coords bbox_coords
## 67488 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 67489 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 67490 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 67491 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 67492 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 67493 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## status_url
## 67488 https://twitter.com/vk9378/status/1215422014736875521
## 67489 https://twitter.com/DeepakK98376858/status/1215422085570281472
## 67490 https://twitter.com/dev4Ind/status/1215422241980088320
## 67491 https://twitter.com/Ornawalla/status/1215422292391419904
## 67492 https://twitter.com/KrishnamitraHKJ/status/1215422634613071873
## 67493 https://twitter.com/vishalmellark/status/1215422867610861569
## name location
## 67488 Vikas Kumar <U+0001F1EE><U+0001F1F3> Agartala, India
## 67489 Deepak Kumar
## 67490 devp India
## 67491 P B India
## 67492 Krishnamitra Jauhar
## 67493 buttercakeluv
##
description
## 67488
## 67489
Student
## 67490
VandeMatram
## 67491
Liberals are bunch of Chutiyas. A Proud Hindu. Supporter of Truth. No Bullshit, Just
State the Facts. NaMo Fan. No Poverty. Respect <U+270A>.
## 67492 <U+0938><U+0943><U+0937><U+094D><U+091F><U+093F> <U+0939><U+0948>
<U+0939><U+0930><U+093F> <U+092E><U+0928><U+094D><U+0926><U+093F><U+0930>
<U+092E><U+0947><U+0930><U+093E>, <U+0927><U+094D><U+092F><U+093E><U+0928>
<U+0939><U+0948> <U+0938><U+091A><U+094D><U+091A><U+0940>
<U+092A><U+0942><U+091C><U+093E><U+0964>
<U+0938><U+092C><U+092E><U+0947><U+0902> <U+0915><U+0943><U+0937><U+094D><U+0923>
<U+0928><U+093F><U+0939><U+093E><U+0930><U+0942><U+0901>
'<U+091C><U+094C><U+0939><U+0930>',<U+092D><U+093E><U+0935> <U+0928>
<U+0930><U+093E><U+0916><U+0942><U+0901> <U+0926><U+0942><U+091C><U+093E><U+0964><U+0964>
\n<U+0001F449>My tweets are\nin<U+0001F449><U+2665><U+FE0F>Likes<U+0001F448>
## 67493
I apologize in advance. snapchat : vishalmellark | instagram : buttercakeluv
## url protected followers_count friends_count listed_count statuses_count
## 67488 <NA> FALSE 378 1358 10 89474
## 67489 <NA> FALSE 18 472 0 189
## 67490 <NA> FALSE 101 234 3 11762
## 67491 <NA> FALSE 37 105 0 13768
## 67492 <NA> FALSE 11013 513 23 214923
## 67493 <NA> FALSE 117 159 0 24693
## favourites_count account_created_at verified profile_url
## 67488 137120 2016-03-12 03:11:37 FALSE <NA>
## 67489 1015 2019-12-27 17:07:01 FALSE <NA>
## 67490 18285 2016-11-02 04:36:34 FALSE <NA>
## 67491 12866 2010-09-22 18:38:41 FALSE <NA>
## 67492 5150 2017-04-14 14:15:19 FALSE <NA>
## 67493 33882 2015-01-17 03:36:10 FALSE <NA>
## profile_expanded_url account_lang
## 67488 <NA> NA
## 67489 <NA> NA
## 67490 <NA> NA
## 67491 <NA> NA
## 67492 <NA> NA
## 67493 <NA> NA
## profile_banner_url
## 67488 https://pbs.twimg.com/profile_banners/708490600933367808/1457753503
## 67489 https://pbs.twimg.com/profile_banners/1210607948184997888/1577769005
## 67490 <NA>
## 67491 <NA>
## 67492 https://pbs.twimg.com/profile_banners/852887997448155137/1528385874
## 67493 https://pbs.twimg.com/profile_banners/2986419235/1576170519
## profile_background_url
## 67488 http://abs.twimg.com/images/themes/theme1/bg.png
## 67489 <NA>
## 67490 <NA>
## 67491 http://abs.twimg.com/images/themes/theme5/bg.gif
## 67492 <NA>
## 67493 http://abs.twimg.com/images/themes/theme1/bg.png
## profile_image_url
## 67488 http://pbs.twimg.com/profile_images/1100769543012605952/bpjVOp0C_normal.jpg
## 67489 http://pbs.twimg.com/profile_images/1211877019077640192/4c7pFtKC_normal.jpg
## 67490 http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png
## 67491 http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png
## 67492 http://pbs.twimg.com/profile_images/1004746850199527425/33n7gVGL_normal.jpg
## 67493 http://pbs.twimg.com/profile_images/1213529784237486080/RtX3Uiat_normal.jpg
# Remove replies
movie_tweets_original <- subset(movie_tweets_original,
is.na(movie_tweets_original$reply_to_status_id))
# favorite_count
movie_tweets_original <- movie_tweets_original %>% arrange(-favorite_count)
movie_tweets_original[1,5]
## [1] "2020-01-08 08:31:48"
#retweet_count
movie_tweets_original <- movie_tweets_original %>% arrange(-retweet_count)
movie_tweets_original[1,5]
## [1] "2020-01-09 15:31:45"
# dataset containing only the retweets and one containing only the replies.
Create a separate data frame containing the number of original tweets, retweets, and replies
# Adding columns
movie_data$fraction = movie_data$count / sum(movie_data$count)
movie_data$percentage = movie_data$count / sum(movie_data$count) * 100
movie_data$ymax = cumsum(movie_data$fraction)
movie_data$ymin = c(0, head(movie_data$ymax, n=-1))
ggplot(movie_data, aes(ymax=ymax, ymin=ymin,
xmax=4, xmin=3, fill=Type_of_Tweet)) +
geom_rect() +
coord_polar(theta="y") +
xlim(c(2, 4)) +
theme_void() +
theme(legend.position = "right")
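The donut chart places each segment using cumulative fractions. The code that builds `movie_data` is not shown in this report, so the sketch below is only one way it could look: the counts are made-up placeholders, and only the column names `Type_of_Tweet` and `count` are taken from the plotting code.

```r
# Hypothetical counts, for illustration only (not the project's figures)
movie_data <- data.frame(
  Type_of_Tweet = c("Original", "Retweets", "Replies"),
  count = c(120, 300, 80)
)

# Share of each tweet type, as a fraction and as a percentage
movie_data$fraction <- movie_data$count / sum(movie_data$count)
movie_data$percentage <- movie_data$fraction * 100

# Each segment runs from ymin to ymax on the unit interval
movie_data$ymax <- cumsum(movie_data$fraction)
movie_data$ymin <- c(0, head(movie_data$ymax, n = -1))

print(movie_data)
```

Because `ymin` of one row equals `ymax` of the previous row, the rectangles drawn by `geom_rect()` tile the full circle once `coord_polar(theta = "y")` is applied.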
SHOW THE SOURCE DEVICES FROM WHERE THE TWEETS ARE PUBLISHED
SHOW THE MOST FREQUENT WORDS FOUND IN THE TWEETS
SHOW THE MOST FREQUENTLY USED HASHTAGS
SHOW THE ACCOUNTS FROM WHICH MOST RETWEETS ORIGINATE
set.seed(1234)
wordcloud(pre_tweets_data$retweet_screen_name, min.freq=200, scale=c(2, .5),
random.order=FALSE, rot.per=0.25,
colors=brewer.pal(8, "Dark2"))
## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(corpus, function(x) tm::removeWords(x,
## tm::stopwords())): transformation drops documents
set.seed(1234)
#wordcloud(tweets_data$retweet_location, min.freq=500,colors=brewer.pal(8, "Dark2"))
set.seed(1234)
#wordcloud(tweets_data$retweet_location, min.freq=500,colors=brewer.pal(8, "Dark2"))
ew_sentiment <- get_nrc_sentiment(tweets)
sentimentscores<-data.frame(colSums(ew_sentiment[,]))
names(sentimentscores) <- "Score"
sentimentscores <- cbind("sentiment"=rownames(sentimentscores),sentimentscores)
rownames(sentimentscores) <- NULL
ggplot(data=sentimentscores,aes(x=sentiment,y=Score))+
geom_bar(aes(fill=sentiment),stat = "identity")+
# theme_minimal() comes first; applied last it would reset legend.position
theme_minimal()+
theme(legend.position="none")+
xlab("Sentiments")+ylab("Scores")+
ggtitle("Total sentiment based on scores")
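The reshaping step turns per-tweet emotion counts into one row per emotion. The sketch below reproduces that step on a small hand-made table that stands in for the output of `get_nrc_sentiment`; the three emotion columns and their values are assumptions chosen for illustration.

```r
# Toy stand-in for the per-tweet emotion counts (one row per tweet)
ew_sentiment <- data.frame(
  anger = c(0, 1, 0),
  joy   = c(2, 0, 1),
  trust = c(1, 1, 0)
)

# Column totals become the score of each emotion
sentimentscores <- data.frame(
  sentiment = names(ew_sentiment),
  Score = unname(colSums(ew_sentiment)),
  stringsAsFactors = FALSE
)

print(sentimentscores)
```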
pre_tweets_data$text = gsub("https(.*)*$", "", pre_tweets_data$text) # remove the tweet URL and everything after it
pre_tweets_data$text = gsub("www[[:alnum:][:punct:]]*","", tolower(pre_tweets_data$text)) # lower-case, then remove www links
pre_tweets_data$text = gsub("<.*?>", "", pre_tweets_data$text) # remove html tags
pre_tweets_data$text = gsub("@\\w+", "", pre_tweets_data$text) # remove @mentions
pre_tweets_data$text = gsub("[[:punct:]]", "", pre_tweets_data$text) # remove punctuation
pre_tweets_data$text = gsub("\r?\n|\r", " ", pre_tweets_data$text) # replace line breaks with a space
pre_tweets_data$text = gsub("[[:digit:]]", " ", pre_tweets_data$text) # replace digits with a space
pre_tweets_data$text = gsub("[ |\t]{2,}", " ", pre_tweets_data$text) # collapse runs of spaces and tabs
pre_tweets_data$text = gsub("^ ", "", pre_tweets_data$text) # remove a blank space at the beginning
pre_tweets_data$text = gsub(" $", "", pre_tweets_data$text) # remove a blank space at the end
head(pre_tweets_data$text)
## [1] "with just days to the release of tanhajitheunsungwarriror the makers are making
the most of the time to promote the movie saifalikhan tanhaji"
## [3] "panindian movie darbar oveakes telugu biggie sarilerunekkevvaru and hindi biggie
tanhaji to become the most awaited movie lets hit k interests before the end of this
weekend folks k before release"
## [4] "tanhaji rated by british censor board bbfc running time m s bbfcinsight strong
violence bloody images tanhajitheunsungwarrior a historical action drama set in th
century in which a maratha warrior embarks on a mission to recapture a hill foress taken
by mughal"
## [5] "panindian movie darbar oveakes telugu biggie sarilerunekkevvaru and hindi biggie
tanhaji to become the most awaited movie lets hit k interests before the end of this
weekend folks k before release"
## [6] "tanhaji in hindi belts and darbar in south to have huge box office openings as
per interest shown in bms both have k and more chhapak to sta on a dull note everything
depends on womtanhaji"
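The cleaning steps can also be wrapped in a single reusable function. The sketch below is a variant under stated assumptions, not the report's exact pipeline: the name `clean_tweet` is hypothetical, and its URL pattern removes only the link itself rather than everything that follows it.

```r
# Hypothetical helper: normalise one or more tweets for text mining
clean_tweet <- function(x) {
  x <- gsub("https?://[^[:space:]]+", "", x)  # drop URLs
  x <- tolower(x)                             # lower-case everything
  x <- gsub("@\\w+", "", x)                   # drop @mentions
  x <- gsub("[[:punct:]]", "", x)             # drop punctuation, including '#'
  x <- gsub("[[:digit:]]", " ", x)            # replace digits with a space
  x <- gsub("[[:space:]]+", " ", x)           # collapse whitespace runs
  trimws(x)                                   # strip leading/trailing blanks
}

cleaned <- clean_tweet("Loved #Tanhaji!! 5 stars @ajaydevgn https://t.co/abc")
print(cleaned)
```

Keeping the steps in one function makes it easier to apply the same cleaning to the pre-release and post-release tweet sets.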
n_grams_plot <- function(n, data) {
options(mc.cores=1)
# plot the n-gram frequencies held in ngrams_matrix (columns: word, freq)
ggplot(ngrams_matrix, aes(x=word, y=freq)) +
geom_bar(stat="identity", fill="pink", colour="black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("n-grams") +
ylab("Frequency")
}
n_grams_plot(n=1, data=sub_blogs_Corpus)
Plot of frequency distribution of 2-gram
n_grams_plot(n=2, data=sub_blogs_Corpus)
Plot of frequency distribution of 3-gram
n_grams_plot(n=3, data=sub_blogs_Corpus)
Plot of frequency distribution of 4-gram
n_grams_plot(n=4, data=sub_blogs_Corpus)
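The part of `n_grams_plot` that builds the frequency table is not reproduced in this report. As a rough stand-in, the base-R sketch below counts n-grams by splitting on whitespace. `ngram_freq` is a hypothetical helper, and the report itself may well have used a tokenizer package instead.

```r
# Hypothetical helper: frequency table of word n-grams, most frequent first
ngram_freq <- function(texts, n) {
  grams <- unlist(lapply(strsplit(texts, "[[:space:]]+"), function(words) {
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(word = character(0), freq = integer(0)))
  }
  freq <- sort(table(grams), decreasing = TRUE)
  data.frame(word = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

sample_text <- c("tanhaji is superb", "tanhaji is epic")
bigrams <- ngram_freq(sample_text, n = 2)
print(bigrams)
```

The result has the `word` and `freq` columns that the plotting code expects, so it could be passed to `ggplot()` in the same way.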
Create the corpus, then derive a rating for the movie from the sentiment scores given by the syuzhet package
corpus_tw = Corpus(VectorSource(pre_tweets_data$text))
sparse_tw.df = as.data.frame(as.matrix(sparse_tw))
colnames(sparse_tw.df) = make.names(colnames(sparse_tw.df))
Classify the tweets based on the scores provided by get_sentiment function into 5 categories.
#sparse_tw.df$Polarity = category_sentiment
#table(sparse_tw.df$Polarity)
#category_sentiment <- ifelse(sent.value < 0, "Bad", ifelse(sent.value == 0 , "Ignore",
# ifelse(sent.value > 0 & sent.value < 0.5, "Average",
# ifelse(sent.value >=0.5 & sent.value < 1 ,"Good",
# ifelse(sent.value >= 1 & sent.value < 1.5 ,"Very Good","Excellent")))))
#sparse_tw.df$Polarity = category_sentiment
#table(sparse_tw.df$Polarity)
category_sentiment <- ifelse(sent.value < 0, 1, ifelse(sent.value == 0 , "Ignore",
ifelse(sent.value > 0 & sent.value < 0.5, 2,
ifelse(sent.value >=0.5 & sent.value < 1 ,3,
ifelse(sent.value >= 1 & sent.value < 1.5 ,4,5)))))
sparse_tw.df$Polarity = category_sentiment
table(sparse_tw.df$Polarity)
##
## 1 2 3 4 5 Ignore
## 8345 6124 14411 12434 15069 11110
sparse_tw_new.df <- filter(sparse_tw.df, Polarity != "Ignore")
table(sparse_tw_new.df$Polarity)
##
## 1 2 3 4 5
## 8345 6124 14411 12434 15069
ggplot(data=sparse_tw_new.df, aes(Polarity, fill=Polarity))+ geom_bar()
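The nested `ifelse` calls map a sentiment score to a rating from 1 to 5, with exact zeros set aside as "Ignore". The same thresholds can be expressed with `cut()`, as sketched below; `rate_sentiment` is a hypothetical name and the input scores are invented for illustration.

```r
# Hypothetical helper: same thresholds as the nested ifelse() version
rate_sentiment <- function(score) {
  # Left-closed bins: [-Inf,0) -> 1, [0,0.5) -> 2, [0.5,1) -> 3, [1,1.5) -> 4, [1.5,Inf) -> 5
  bins <- cut(score, breaks = c(-Inf, 0, 0.5, 1, 1.5, Inf),
              labels = FALSE, right = FALSE)
  rating <- as.character(bins)
  rating[score == 0] <- "Ignore"  # neutral tweets are excluded from the rating
  rating
}

scores <- c(-0.4, 0, 0.25, 0.5, 1.2, 1.5, 3)
ratings <- rate_sentiment(scores)
print(table(ratings))
```

Listing the break points in one vector makes the boundaries easier to audit than five nested conditions.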
Working on it
BUILD CLASSIFICATION MODELS AND PREDICT FOR PRE RELEASE ANALYSIS
We will fit several classification models and compare their accuracy and performance. Polarity
is the dependent variable.
prop.table(table(sparse_tw_new.df$Polarity))
##
## 1 2 3 4 5
## 0.1480056 0.1086143 0.2555912 0.2205275 0.2672614
library(caTools)
##
## Attaching package: 'caTools'
## The following object is masked from 'package:RWeka':
##
## LogitBoost
set.seed(777)
prop.table(table(train_data$Polarity))
##
## 1 2 3 4 5
## 0.1482857 0.1080000 0.2491429 0.2268571 0.2677143
prop.table(table(test_data$Polarity))
##
## 1 2 3 4 5
## 0.1480000 0.1080000 0.2493333 0.2273333 0.2673333
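The class proportions in the train and test sets are nearly identical, which is what a stratified split produces. The splitting code is not shown in this report, so the base-R sketch below is an assumption about how such a split can be made. `stratified_split` is a hypothetical helper, the labels are toy data, and the 0.7 ratio is illustrative.

```r
# Hypothetical helper: TRUE marks rows assigned to the training set
stratified_split <- function(labels, ratio = 0.7) {
  in_train <- logical(length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    n_train <- round(ratio * length(idx))
    # sample.int() picks positions within idx, which stays safe for one-row classes
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

set.seed(777)
polarity <- rep(c("1", "2", "3"), times = c(100, 50, 50))  # toy labels
in_train <- stratified_split(polarity, ratio = 0.7)
train_labels <- polarity[in_train]
test_labels <- polarity[!in_train]

print(prop.table(table(train_labels)))
print(prop.table(table(test_labels)))
```

Sampling within each class keeps the Polarity distribution the same on both sides of the split, so accuracy on the test set is not distorted by a shifted class mix.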
#CART Diagram
prp(movie_cart_model, extra=2)
Predict and Evaluate the Performance of CART train data
## 4 6 0 77 250 8
## 5 0 0 73 31 297
# Test accuracy
accuracy_cart_test_pre = sum(diag(confusion_matrix_cart))/sum(confusion_matrix_cart)
accuracy_cart_test_pre
## [1] 0.7186667
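Accuracy here is the share of observations on the diagonal of the confusion matrix. The toy example below (labels invented for illustration) shows that calculation next to the majority-class baseline, which is the figure a model has to beat to be useful.

```r
# Toy labels, for illustration only
actual    <- factor(c(1, 1, 2, 2, 3, 3, 3), levels = 1:3)
predicted <- factor(c(1, 2, 2, 2, 3, 3, 1), levels = 1:3)

# Rows are the true classes, columns the predicted classes
confusion <- table(actual, predicted)

# Model accuracy: correctly classified cases over all cases
accuracy <- sum(diag(confusion)) / sum(confusion)

# Baseline accuracy: always predicting the most frequent class
baseline <- max(table(actual)) / length(actual)

print(confusion)
print(c(accuracy = accuracy, baseline = baseline))
```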
Comparison of all the performance measures of the CART model on the train and test datasets
results_cart_train_pre = data.frame(accuracy_cart_train_pre,
as.numeric(roc_obj_cart_train_pre$auc))
names(results_cart_train_pre) = c("ACCURACY", "AUC-ROC" )
results_cart_test_pre =
data.frame(accuracy_cart_test_pre,as.numeric(roc_obj_cart_test_pre$auc) )
names(results_cart_test_pre) = c("ACCURACY", "AUC-ROC")
Build Random Forest Model
# Load Library
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
set.seed(777)
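# Make predictions:
# predict.randomForest() takes `newdata`; with `data=` the argument is ignored and
# the out-of-bag predictions for the training set are returned
predict_rf_train_pre = predict(movie_rf_model, data=train_data,type="response")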
Predict and Evaluate the Performance of Random Forest on test data
# Make predictions:
predict_rf_test_pre = predict(movie_rf_model, newdata=test_data,type="response")
#Variable importance:
#varImpPlot(movie_rf_model,main='Variable Importance Plot: Movie ',type=2)
varImpPlot(movie_rf_model,sort=TRUE,type=NULL, class=NULL, scale=TRUE,cex=.8)
##
## Call:
## multiclass.roc.default(response = as.numeric(train_data$Polarity), predictor =
as.numeric(predict_rf_train_pre), quiet = TRUE)
##
## Data: as.numeric(predict_rf_train_pre) with 5 levels of
as.numeric(train_data$Polarity): 1, 2, 3, 4, 5.
## Multi-class area under the curve: 0.9585
#Test data - Plot ROC curve
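# The original call passed predict_cart_test_pre here, so the output recorded below
# (AUC 0.8389) is the CART test AUC; rerun with the Random Forest predictions
roc_obj_rf_test_pre <-
multiclass.roc(as.numeric(test_data$Polarity),as.numeric(predict_rf_test_pre),quiet=TRUE)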
roc_obj_rf_test_pre
##
## Call:
## multiclass.roc.default(response = as.numeric(test_data$Polarity), predictor =
as.numeric(predict_cart_test_pre), quiet = TRUE)
##
## Data: as.numeric(predict_cart_test_pre) with 5 levels of
as.numeric(test_data$Polarity): 1, 2, 3, 4, 5.
## Multi-class area under the curve: 0.8389
Comparison of all the performance measures of the Random Forest model on the train and test datasets
results_rf_train_pre = data.frame(accuracy_rf_train_pre,
as.numeric(roc_obj_rf_train_pre$auc))
names(results_rf_train_pre) = c("ACCURACY", "AUC-ROC" )
results_rf_test_pre = data.frame(accuracy_rf_test_pre,as.numeric(roc_obj_rf_test_pre$auc)
)
names(results_rf_test_pre) = c("ACCURACY", "AUC-ROC")
set.seed(123)
library(e1071)
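# Make predictions:
# predict.svm() takes `newdata`; with `data=` it falls back to the fitted values
# for the data the model was trained on
predict_svm_train_pre = predict(movie_svm_model, data=train_data, decision.values=TRUE)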
## predict_svm_train_pre
## 1 2 3 4 5
## 1 272 10 181 7 49
## 2 1 116 203 0 58
## 3 0 0 857 1 14
## 4 3 0 220 524 47
## 5 0 1 105 1 830
# Training accuracy:
accuracy_svm_train_pre = sum(diag(confusion_matrix_svm))/sum(confusion_matrix_svm)
accuracy_svm_train_pre
## [1] 0.7425714
# Make predictions:
predict_svm_test_pre = predict(movie_svm_model, newdata=test_data,decision.values=TRUE)
library(pROC)
## multiclass.roc.default(response = as.numeric(test_data$Polarity), predictor =
as.numeric(predict_svm_test_pre), quiet = TRUE)
##
## Data: as.numeric(predict_svm_test_pre) with 5 levels of
as.numeric(test_data$Polarity): 1, 2, 3, 4, 5.
## Multi-class area under the curve: 0.8154
Comparison of all the performance measures of the SVM model on the train and test datasets
results_svm_train_pre = data.frame(accuracy_svm_train_pre,
as.numeric(roc_obj_svm_train_pre$auc))
names(results_svm_train_pre) = c("ACCURACY", "AUC-ROC" )
results_svm_test_pre =
data.frame(accuracy_svm_test_pre,as.numeric(roc_obj_svm_test_pre$auc) )
names(results_svm_test_pre) = c("ACCURACY", "AUC-ROC")
set.seed(777)
# Make predictions:
predict_nb_train_pre = predict(movie_nb_model, train_data, type = "class")
# Make predictions:
predict_nb_test_pre = predict(movie_nb_model,newdata = test_data, type = "class")
confusion_matrix_nb <- table(test_data$Polarity, predict_nb_test_pre)
confusion_matrix_nb
## predict_nb_test_pre
## 1 2 3 4 5
## 1 155 36 5 26 0
## 2 10 126 3 23 0
## 3 43 77 92 162 0
## 4 20 39 7 275 0
## 5 79 109 29 109 75
# Test accuracy:
accuracy_nb_test_pre = sum(diag(confusion_matrix_nb))/sum(confusion_matrix_nb)
accuracy_nb_test_pre
## [1] 0.482
AUC-ROC Curve for Naive Bayes model on Train and Test dataset
library(pROC)
Comparison of all the performance measures of the Naive Bayes model on the train and test datasets
results_nb_train_pre = data.frame(accuracy_nb_train_pre,
as.numeric(roc_obj_nb_train_pre$auc))
names(results_nb_train_pre) = c("ACCURACY", "AUC-ROC" )
results_nb_test_pre = data.frame(accuracy_nb_test_pre,as.numeric(roc_obj_nb_test_pre$auc)
)
names(results_nb_test_pre) = c("ACCURACY", "AUC-ROC")
row.names(df_fin) = c('Naive Bayes Train Pre', 'Naive Bayes Test Pre')
df_fin
## ACCURACY AUC-ROC
## Naive Bayes Train Pre 0.4831429 0.7284567
## Naive Bayes Test Pre 0.4820000 0.7202593
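How `df_fin` is assembled is not shown in this report. One plausible way, sketched below, is to stack the one-row result frames with `rbind()`. The values are the Naive Bayes figures printed above; the `gap` variable is an addition for illustration.

```r
# One-row result frames, using the Naive Bayes figures printed in the report
results_nb_train_pre <- data.frame(ACCURACY = 0.4831429, `AUC-ROC` = 0.7284567,
                                   check.names = FALSE)
results_nb_test_pre <- data.frame(ACCURACY = 0.4820000, `AUC-ROC` = 0.7202593,
                                  check.names = FALSE)

# Stack the rows and label them
df_fin <- rbind(results_nb_train_pre, results_nb_test_pre)
row.names(df_fin) <- c("Naive Bayes Train Pre", "Naive Bayes Test Pre")

# Train-minus-test accuracy: a rough check for overfitting
gap <- df_fin["Naive Bayes Train Pre", "ACCURACY"] -
  df_fin["Naive Bayes Test Pre", "ACCURACY"]

print(round(df_fin, 2))
print(gap)
```

A small gap, as here, suggests the model generalises about as well as it fits, even though its overall accuracy is low.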
Comparing all four models (CART, Random Forest, SVM and Naive Bayes) on their performance: accuracy and
AUC-ROC
row.names(df_fin) = c('CART Train Pre', 'CART Test Pre',
'Random Forest Train Pre', 'Random Forest Test Pre',
'SVM Train Pre', 'SVM Test Pre',
'Naive Bayes Train Pre', 'Naive Bayes Test Pre')
#round(df_fin,2)
#install.packages("kableExtra")
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
print("Model Performance Comparison Metrics ")
## [1] "Model Performance Comparison Metrics "
kable(round(df_fin,2)) %>%
kable_styling(c("striped","bordered"))
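                         ACCURACY  AUC-ROC
CART Train Pre               0.72     0.84
CART Test Pre                0.72     0.84
Random Forest Train Pre      0.94     0.96
Random Forest Test Pre       0.93     0.84
SVM Train Pre                0.74     0.81
SVM Test Pre                 0.73     0.82
Naive Bayes Train Pre        0.48     0.73
Naive Bayes Test Pre         0.48     0.72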
#TanhajiTheUnsungWarrior Media Screening report is EPIC , my friend who is attending the
screening in mumbai saying its best film of @ajaydevgn career. #Tanhaji
## 5
WHAT A FILM YAAAR!<U+0001F525><U+0001F525>\nThe climax had the heart-beat racing like
never before. This is how Historic WAR films should be made - Bollywood's Best so far!
@OmRaut & @AjayDevgn TAKE A BOW<U+0001F64F>\n \n#Tanhaji #TanhajiReview
## 6 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## source display_text_width reply_to_status_id reply_to_user_id
## 1 Twitter for Android 128 NA NA
## 2 Twitter for Android 140 NA NA
## 3 Twitter for Android 140 NA NA
## 4 Twitter for Android 140 NA NA
## 5 Twitter for Android 140 NA NA
## 6 Twitter for Android 140 NA NA
## reply_to_screen_name is_quote is_retweet favorite_count retweet_count
## 1 <NA> FALSE TRUE 0 17
## 2 <NA> FALSE TRUE 0 24
## 3 <NA> FALSE TRUE 0 10287
## 4 <NA> FALSE TRUE 0 948
## 5 <NA> FALSE TRUE 0 631
## 6 <NA> FALSE TRUE 0 10287
## quote_count reply_count hashtags symbols
## 1 NA NA Tanhaji NA
## 2 NA NA Tanhaji NA
## 3 NA NA c("OneWordReview", "Tanhaji", "Tanhaji") NA
## 4 NA NA TanhajiTheUnsungWarrior NA
## 5 NA NA <NA> NA
## 6 NA NA c("OneWordReview", "Tanhaji", "Tanhaji") NA
## urls_url urls_t.co urls_expanded_url media_url media_t.co media_expanded_url
## 1 <NA> <NA> <NA> <NA> <NA> <NA>
## 2 <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA>
## 4 <NA> <NA> <NA> <NA> <NA> <NA>
## 5 <NA> <NA> <NA> <NA> <NA> <NA>
## 6 <NA> <NA> <NA> <NA> <NA> <NA>
## media_type ext_media_url ext_media_t.co ext_media_expanded_url ext_media_type
## 1 <NA> <NA> <NA> <NA> NA
## 2 <NA> <NA> <NA> <NA> NA
## 3 <NA> <NA> <NA> <NA> NA
## 4 <NA> <NA> <NA> <NA> NA
## 5 <NA> <NA> <NA> <NA> NA
## 6 <NA> <NA> <NA> <NA> NA
## mentions_user_id mentions_screen_name lang quoted_status_id quoted_text
## 1 3086592157 OpinionsRP en NA <NA>
## 2 1337106955 ajay36mittal en NA <NA>
## 3 99642673 taran_adarsh en NA <NA>
## 4 146937987 SumitkadeI en NA <NA>
## 5 142231741 Nilzrav en NA <NA>
## 6 99642673 taran_adarsh en NA <NA>
## quoted_created_at quoted_source quoted_favorite_count quoted_retweet_count
## 1 <NA> <NA> NA NA
## 2 <NA> <NA> NA NA
## 3 <NA> <NA> NA NA
## 4 <NA> <NA> NA NA
## 5 <NA> <NA> NA NA
## 6 <NA> <NA> NA NA
## quoted_user_id quoted_screen_name quoted_name quoted_followers_count
## 1 NA <NA> <NA> NA
## 2 NA <NA> <NA> NA
## 3 NA <NA> <NA> NA
## 4 NA <NA> <NA> NA
## 5 NA <NA> <NA> NA
## 6 NA <NA> <NA> NA
## quoted_friends_count quoted_statuses_count quoted_location quoted_description
## 1 NA NA <NA> <NA>
## 2 NA NA <NA> <NA>
## 3 NA NA <NA> <NA>
## 4 NA NA <NA> <NA>
## 5 NA NA <NA> <NA>
## 6 NA NA <NA> <NA>
## quoted_verified retweet_status_id
## 1 NA 1.215320e+18
## 2 NA 1.215349e+18
## 3 NA 1.215295e+18
## 4 NA 1.215203e+18
## 5 NA 1.215344e+18
## 6 NA 1.215295e+18
##
retweet_text
## 1
Movie #Tanhaji is not holiday release so comparing its advance with holiday releases only
show ur jealous soul.
## 2
I can see a certain honesty in the early reviews of #Tanhaji which are coming out!! This
is so bloody rare in these times! And the kind of things people are saying,i am now so so
excited to catch it at the earliest! #TanhajiReview
## 3 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## 4
#TanhajiTheUnsungWarrior Media Screening report is EPIC , my friend who is attending the
screening in mumbai saying its best film of @ajaydevgn career. #Tanhaji
## 5
WHAT A FILM YAAAR!<U+0001F525><U+0001F525>\nThe climax had the heart-beat racing like
never before. This is how Historic WAR films should be made - Bollywood's Best so far!
@OmRaut & @AjayDevgn TAKE A BOW<U+0001F64F>\n \n#Tanhaji #TanhajiReview
## 6 #OneWordReview...\n#Tanhaji: SUPERB.\nRating:
<U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F><U+2B50><U+FE0F>\nDrama, emotions,
conflict, action, VFX, #Tanhaji is an enthralling experience... Electrifying climax...
Top notch direction... #Ajay, #Kajol, #Saif in super form... Get ready for 2020’s first
<U+20B9> <U+0001F4AF>cr+ film. #TanhajiReview https://t.co/N9TwWsWazd
## retweet_created_at retweet_source retweet_favorite_count
## 1 2020-01-09 17:08:44 Twitter for Android 39
## 2 2020-01-09 19:05:16 Twitter for Android 44
## 3 2020-01-09 15:31:45 Twitter for iPad 49234
## 4 2020-01-09 09:24:04 Twitter for iPhone 3490
## 5 2020-01-09 18:47:04 Twitter for Android 2152
## 6 2020-01-09 15:31:45 Twitter for iPad 49234
## retweet_retweet_count retweet_user_id retweet_screen_name retweet_name
## 1 17 3086592157 OpinionsRP Ash
## 2 24 1337106955 ajay36mittal JabTakHaiCinema
## 3 10287 99642673 taran_adarsh taran adarsh
## 4 948 146937987 SumitkadeI Sumit kadel
## 5 631 142231741 Nilzrav N J
## 6 10287 99642673 taran_adarsh taran adarsh
## retweet_followers_count retweet_friends_count retweet_statuses_count
## 1 1526 193 16988
## 2 187 440 12324
## 3 3741490 168 34910
## 4 87103 88 20466
## 5 5055 959 56183
## 6 3741490 168 34910
## retweet_location
## 1 Seven Heaven
## 2
## 3 Mumbai, India
## 4 Kolkata, West Bengal.
## 5 India/ UAE
## 6 Mumbai, India
##
retweet_description
## 1
......
## 2 Engineer,MBA in Finance but defined by my love for
Movies,Acting,Singing,Dancing,Music,Cricket!\nExtremely Occasional Blog 'FILMALAYA' at
following link:
## 3 Movie
critic | Biz analyst | Influencer | Instagram: https://t.co/4hvdbhvfkX
## 4 Film Trade analyst | Critic | Influencer | Youtube channel -
https://t.co/CaHFAF2LD5 . For work related query email me at - Sumitkadel21@yahoo.com
## 5 Office Manager | Janta's Movie Reviewer | #Chhapaak Review: TODAY | All Things
Humor, Films & 90s Bollywood | Fun RTs | RATIONALIST | Gujju |Food & Freedom
## 6 Movie
critic | Biz analyst | Influencer | Instagram: https://t.co/4hvdbhvfkX
## retweet_verified place_url place_name place_full_name place_type country
## 1 FALSE <NA> <NA> <NA> <NA> <NA>
## 2 FALSE <NA> <NA> <NA> <NA> <NA>
## 3 TRUE <NA> <NA> <NA> <NA> <NA>
## 4 TRUE <NA> <NA> <NA> <NA> <NA>
## 5 FALSE <NA> <NA> <NA> <NA> <NA>
## 6 TRUE <NA> <NA> <NA> <NA> <NA>
## country_code geo_coords coords_coords bbox_coords
## 1 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 2 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 3 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 4 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 5 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 6 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## status_url
## 1 https://twitter.com/AJAY_sardar_/status/1215423099513888770
## 2 https://twitter.com/Vastha421/status/1215423225473064960
## 3 https://twitter.com/RAMUKUM01606330/status/1215423336399826944
## 4 https://twitter.com/PARVATHINP/status/1215423430809411585
## 5 https://twitter.com/Ankit_patel_AP/status/1215423488988598273
## 6 https://twitter.com/NanoMIndia/status/1215423485943541760
##
name
## 1
AJAY
## 2
Vastha42
## 3
RAMU KUMAR
## 4
PARVATHI P
## 5
Ankit<U+2694>
## 6 <U+092E><U+093F><U+091F><U+094D><U+091F><U+0940> <U+0915><U+093E>
<U+092E><U+093E><U+0927><U+094B>
## location
## 1
## 2
## 3 Kuwait
## 4
## 5 follows you
## 6 Navi Mumbai, India
##
description
## 1 bhakt of AJAY
DEVGN, MSDhoni nd MODI JI!!\n<U+0001F1EE><U+0001F1F3><U+0001F1EE><U+0001F1F3>
## 2
## 3 FROM..VIL..MUSAHARI..
POST...KAIL GHAR..PS..BARHARIYA..DST...SIWAN...BIHAR.. LIVE..IN.. KUWAIT.. CITY..
## 4
India is my country.. Bharath Mata ki Jai..
## 5
MovieLover ... \n\n\n\n@ajaydevgn\n\n\n\n... SportLover \n\n@msdhoni
## 6 Staunch follower of Sanatana Dharma,I support Hindutva. \nTotally believe in One
Nation-One Rule. let's unite against terror and everything which is NOT indian.
## url protected followers_count friends_count listed_count statuses_count
## 1 <NA> FALSE 56 349 0 2232
## 2 <NA> FALSE 16 143 0 1781
## 3 <NA> FALSE 18 101 0 273
## 4 <NA> FALSE 846 1329 5 33709
## 5 <NA> FALSE 216 306 0 14024
## 6 <NA> FALSE 2098 4198 1 42894
## favourites_count account_created_at verified profile_url
## 1 7087 2018-09-10 05:17:12 FALSE <NA>
## 2 2021 2020-01-06 06:05:54 FALSE <NA>
## 3 7099 2017-09-13 09:37:22 FALSE <NA>
## 4 27070 2009-11-08 14:28:26 FALSE <NA>
## 5 22156 2017-04-13 16:42:53 FALSE <NA>
## 6 44266 2017-11-20 07:44:18 FALSE <NA>
## profile_expanded_url account_lang
## 1 <NA> NA
## 2 <NA> NA
## 3 <NA> NA
## 4 <NA> NA
## 5 <NA> NA
## 6 <NA> NA
## profile_banner_url
## 1 https://pbs.twimg.com/profile_banners/1039019940592803841/1578548036
## 2 <NA>
## 3 https://pbs.twimg.com/profile_banners/907901003567063040/1552758484
## 4 https://pbs.twimg.com/profile_banners/88429125/1465792930
## 5 https://pbs.twimg.com/profile_banners/852562745870491648/1571653398
## 6 https://pbs.twimg.com/profile_banners/932514924160364544/1578215611
## profile_background_url
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 http://abs.twimg.com/images/themes/theme1/bg.png
## 5 <NA>
## 6 <NA>
## profile_image_url
## 1 http://pbs.twimg.com/profile_images/1163969231710371840/_dPMB5kq_normal.jpg
## 2 http://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png
## 3 http://pbs.twimg.com/profile_images/1106975563229605888/GsW5lHOn_normal.jpg
## 4 http://pbs.twimg.com/profile_images/835309518246420480/lLeww3af_normal.jpg
## 5 http://pbs.twimg.com/profile_images/1215169883228401664/9nfFxm-W_normal.jpg
## 6 http://pbs.twimg.com/profile_images/1201882496511688705/-XcIJvSA_normal.jpg
tail(post_tweets_data)
## X.1 X user_id status_id created_at screen_name
## 105848 66582 142 1.214896e+18 1.216941e+18 2020-01-14 04:32:37 Chintan64138110
## 105849 66576 136 1.158960e+18 1.216941e+18 2020-01-14 04:32:41 Gajendr16824337
## 105850 66448 8 1.387810e+08 1.216941e+18 2020-01-14 04:32:43 onenonlymahakal
## 105851 66443 3 1.163468e+18 1.216941e+18 2020-01-14 04:32:44 shri0944
## 105852 66441 1 1.669707e+09 1.216941e+18 2020-01-14 04:32:45 pruthwirajdeo
## 105853 66442 2 7.057414e+17 1.216941e+18 2020-01-14 04:32:45 Aryaman111
##
text
## 105848 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105849 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105850 As per the reliable
sources #TanhajiTheUnsungWarrior will be declared national film of India by our home
minister @AmitShah in next parliament session.\n\nCongratulations @ajaydevgn @omraut
<U+0001F389><U+0001F389>#Tanhaji
## 105851 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105852 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105853 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## source display_text_width reply_to_status_id
## 105848 Twitter for Android 140 NA
## 105849 Twitter for Android 140 NA
## 105850 Twitter for Android 140 NA
## 105851 Twitter Web App 140 NA
## 105852 Twitter for Android 140 NA
## 105853 Twitter for Android 140 NA
## reply_to_user_id reply_to_screen_name is_quote is_retweet favorite_count
## 105848 NA <NA> FALSE TRUE 0
## 105849 NA <NA> FALSE TRUE 0
## 105850 NA <NA> FALSE TRUE 0
## 105851 NA <NA> FALSE TRUE 0
## 105852 NA <NA> FALSE TRUE 0
## 105853 NA <NA> FALSE TRUE 0
## retweet_count quote_count reply_count hashtags symbols
## 105848 25 NA NA Tanhaji NA
## 105849 25 NA NA Tanhaji NA
## 105850 14 NA NA TanhajiTheUnsungWarrior NA
## 105851 25 NA NA Tanhaji NA
## 105852 25 NA NA Tanhaji NA
## 105853 25 NA NA Tanhaji NA
## urls_url urls_t.co urls_expanded_url media_url media_t.co
## 105848 <NA> <NA> <NA> <NA> <NA>
## 105849 <NA> <NA> <NA> <NA> <NA>
## 105850 <NA> <NA> <NA> <NA> <NA>
## 105851 <NA> <NA> <NA> <NA> <NA>
## 105852 <NA> <NA> <NA> <NA> <NA>
## 105853 <NA> <NA> <NA> <NA> <NA>
## media_expanded_url media_type ext_media_url ext_media_t.co
## 105848 <NA> <NA> <NA> <NA>
## 105849 <NA> <NA> <NA> <NA>
## 105850 <NA> <NA> <NA> <NA>
## 105851 <NA> <NA> <NA> <NA>
## 105852 <NA> <NA> <NA> <NA>
## 105853 <NA> <NA> <NA> <NA>
## ext_media_expanded_url ext_media_type mentions_user_id
## 105848 <NA> NA 2924521080
## 105849 <NA> NA 2924521080
## 105850 <NA> NA 2754072768
## 105851 <NA> NA 2924521080
## 105852 <NA> NA 2924521080
## 105853 <NA> NA 2924521080
## mentions_screen_name lang quoted_status_id quoted_text quoted_created_at
## 105848 davidfrawleyved en NA <NA> <NA>
## 105849 davidfrawleyved en NA <NA> <NA>
## 105850 RoninADfannn en NA <NA> <NA>
## 105851 davidfrawleyved en NA <NA> <NA>
## 105852 davidfrawleyved en NA <NA> <NA>
## 105853 davidfrawleyved en NA <NA> <NA>
## quoted_source quoted_favorite_count quoted_retweet_count quoted_user_id
## 105848 <NA> NA NA NA
## 105849 <NA> NA NA NA
## 105850 <NA> NA NA NA
## 105851 <NA> NA NA NA
## 105852 <NA> NA NA NA
## 105853 <NA> NA NA NA
## quoted_screen_name quoted_name quoted_followers_count
## 105848 <NA> <NA> NA
## 105849 <NA> <NA> NA
## 105850 <NA> <NA> NA
## 105851 <NA> <NA> NA
## 105852 <NA> <NA> NA
## 105853 <NA> <NA> NA
## quoted_friends_count quoted_statuses_count quoted_location
## 105848 NA NA <NA>
## 105849 NA NA <NA>
## 105850 NA NA <NA>
## 105851 NA NA <NA>
## 105852 NA NA <NA>
## 105853 NA NA <NA>
## quoted_description quoted_verified retweet_status_id
## 105848 <NA> NA 1.216941e+18
## 105849 <NA> NA 1.216941e+18
## 105850 <NA> NA 1.216796e+18
## 105851 <NA> NA 1.216941e+18
## 105852 <NA> NA 1.216941e+18
## 105853 <NA> NA 1.216941e+18
##
retweet_text
## 105848 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105849 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105850 As per the reliable
sources #TanhajiTheUnsungWarrior will be declared national film of India by our home
minister @AmitShah in next parliament session.\n\nCongratulations @ajaydevgn @omraut
<U+0001F389><U+0001F389>#Tanhaji
## 105851 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105852 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## 105853 Just saw #Tanhaji, a brilliant, well-acted and inspiring film. Shivaji and the
Marathas created the basis for India's Independence Movement and show the true warrior
spirit of India. Don't understand why Shiv Sena is not highlighting it. Would have made
Bal Thackeray proud.
## retweet_created_at retweet_source retweet_favorite_count
## 105848 2020-01-14 04:30:55 Twitter Web App 99
## 105849 2020-01-14 04:30:55 Twitter Web App 99
## 105850 2020-01-13 18:55:21 Twitter for Android 23
## 105851 2020-01-14 04:30:55 Twitter Web App 99
## 105852 2020-01-14 04:30:55 Twitter Web App 99
## 105853 2020-01-14 04:30:55 Twitter Web App 99
## retweet_retweet_count retweet_user_id retweet_screen_name
## 105848 25 2924521080 davidfrawleyved
## 105849 25 2924521080 davidfrawleyved
## 105850 14 2754072768 RoninADfannn
## 105851 25 2924521080 davidfrawleyved
## 105852 25 2924521080 davidfrawleyved
## 105853 25 2924521080 davidfrawleyved
## retweet_name
## 105848 Dr David Frawley
## 105849 Dr David Frawley
## 105850 THE WHITE WOLF..!! #Tanhaji <U+0001F6A9><U+0001F6A9>
## 105851 Dr David Frawley
## 105852 Dr David Frawley
## 105853 Dr David Frawley
## retweet_followers_count retweet_friends_count retweet_statuses_count
## 105848 307220 153 23992
## 105849 307220 153 23992
## 105850 1106 150 14186
## 105851 307220 153 23992
## 105852 307220 153 23992
## 105853 307220 153 23992
## retweet_location
## 105848 Santa Fe, NM USA
## 105849 Santa Fe, NM USA
## 105850
## 105851 Santa Fe, NM USA
## 105852 Santa Fe, NM USA
## 105853 Santa Fe, NM USA
##
retweet_description
## 105848 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 105849 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 105850
## 105851 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 105852 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## 105853 Pandit Vamadeva Shastri, Vedacharya, author of over forty books, Yoga, Ayurveda
and Vedanta, D.Litt., Padma Bhushan recipient, views personal
## retweet_verified place_url place_name place_full_name place_type country
## 105848 TRUE <NA> <NA> <NA> <NA> <NA>
## 105849 TRUE <NA> <NA> <NA> <NA> <NA>
## 105850 FALSE <NA> <NA> <NA> <NA> <NA>
## 105851 TRUE <NA> <NA> <NA> <NA> <NA>
## 105852 TRUE <NA> <NA> <NA> <NA> <NA>
## 105853 TRUE <NA> <NA> <NA> <NA> <NA>
## country_code geo_coords coords_coords bbox_coords
## 105848 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 105849 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 105850 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 105851 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 105852 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## 105853 <NA> c(NA, NA) c(NA, NA) c(NA, NA, NA, NA, NA, NA, NA, NA)
## status_url
## 105848 https://twitter.com/Chintan64138110/status/1216941161698349057
## 105849 https://twitter.com/Gajendr16824337/status/1216941181264777217
## 105850 https://twitter.com/onenonlymahakal/status/1216941189032640512
## 105851 https://twitter.com/shri0944/status/1216941193944190976
## 105852 https://twitter.com/pruthwirajdeo/status/1216941197249282050
## 105853 https://twitter.com/Aryaman111/status/1216941195357753345
## name location
## 105848 Chintan Kumar
## 105849 Gajendra Singh Shekhawat
## 105850 Nirmal Aj fan <U+0001F1EE><U+0001F1F3> Indore, India
## 105851 Shri (<U+0936><U+094D><U+0930><U+0940>) Bengaluru, India
## 105852 pruthwiraj
## 105853 Vivek
##
description
## 105848
Rashtravaadi | name changed for security reason |
## 105849
Friendly
## 105850
movie n cricket maniac,shiv bhakt n believer
## 105851 Proud Hindian (Hindu+Indian) \n<U+0C9C><U+0CC8>
<U+0CB6><U+0CCD><U+0CB0><U+0CC0> <U+0CB0><U+0CBE><U+0CAE><U+0CCD>
<U+0001F6A9>\n<U+0CAD><U+0CBE><U+0CB0><U+0CA4><U+0CCD> <U+0CAE><U+0CBE><U+0CA4><U+0CBE>
<U+0C95><U+0CBF> <U+0C9C><U+0CC8>
<U+0001F1EE><U+0001F1F3>\n<U+0CB5><U+0C82><U+0CA6><U+0CC7>
<U+0CAE><U+0CBE><U+0CA4><U+0CB0><U+0C82> <U+0001F1EE><U+0001F1F3>
## 105852
## 105853
# Remove retweets
movie_tweets_original <- post_tweets_data[post_tweets_data$is_retweet==FALSE, ]
# Remove replies
movie_tweets_original <- subset(movie_tweets_original,
is.na(movie_tweets_original$reply_to_status_id))
Find the most popular original tweet by sorting on favorite_count (i.e. the number of likes) or retweet_count (i.e. the number of retweets).
# favorite_count
movie_tweets_original <- movie_tweets_original %>% arrange(-favorite_count)
movie_tweets_original[1,5]
## [1] "2020-01-11 10:05:20"
#retweet_count
movie_tweets_original <- movie_tweets_original %>% arrange(-retweet_count)
movie_tweets_original[1,5]
## [1] "2020-01-11 10:05:20"
# Keep one dataset containing only the retweets and one containing only the replies.
Create a separate data frame containing the number of original tweets, retweets, and replies
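The counting step described above can be sketched in base R as follows. This is only an illustration on a toy data frame: `tweets`, `movie_retweets`, `movie_replies`, `movie_original` and `tweet_counts` are hypothetical names, and only the `is_retweet` and `reply_to_status_id` columns from the report's data are assumed.

```r
# Toy stand-in for post_tweets_data (assumption: only the two columns used below).
tweets <- data.frame(
  is_retweet         = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE),
  reply_to_status_id = c(NA, NA, NA, "1216941161698349057", NA, NA),
  stringsAsFactors   = FALSE
)

# Retweets only.
movie_retweets <- tweets[tweets$is_retweet == TRUE, ]

# Replies only: a reply carries the id of the status it answers.
movie_replies <- tweets[!is.na(tweets$reply_to_status_id), ]

# Original tweets: neither a retweet nor a reply.
movie_original <- tweets[tweets$is_retweet == FALSE &
                           is.na(tweets$reply_to_status_id), ]

# One row per category, ready to be plotted as a bar or donut chart.
tweet_counts <- data.frame(
  category = c("Original", "Retweets", "Replies"),
  count    = c(nrow(movie_original), nrow(movie_retweets), nrow(movie_replies)),
  stringsAsFactors = FALSE
)
print(tweet_counts)
```

Keeping the three counts in one small data frame makes it straightforward to pass them to a plotting function afterwards.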
SHOW THE SOURCE DEVICES FROM WHERE THE TWEETS ARE PUBLISHED
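One possible way to tabulate the publishing devices is sketched below in base R. The vector `source_col` is a hypothetical stand-in for the `source` column of the tweet data, filled with the two client names that appear in the output above. The report's own plotting code is not reproduced here.

```r
# Hypothetical sample of the `source` column.
source_col <- c("Twitter for Android", "Twitter for Android", "Twitter Web App",
                "Twitter for Android", "Twitter for Android", "Twitter Web App")

# Count tweets per source and sort from most to least frequent.
source_counts <- as.data.frame(table(source = source_col),
                               stringsAsFactors = FALSE)
source_counts <- source_counts[order(-source_counts$Freq), ]

# Share of each source as a percentage of all tweets.
source_counts$share <- round(100 * source_counts$Freq / sum(source_counts$Freq), 1)
print(source_counts)
```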
SHOW THE MOST FREQUENT WORDS FOUND IN THE TWEETS
## Selecting by n
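A minimal base R sketch of a word-frequency count is given below. It does not reproduce the report's own pipeline (the "Selecting by n" message suggests dplyr's top_n was used there). The sample `texts` are shortened from cleaned tweets shown in this report, and the `stop_words` list is a small hypothetical one.

```r
# Shortened examples of cleaned tweet text.
texts <- c("movie tanhaji is not holiday release",
           "onewordreview tanhaji superb rating",
           "tanhaji is a superb movie")

# Hypothetical stop-word list; a real analysis would use a fuller one.
stop_words <- c("is", "a", "not", "the")

# Split on anything that is not a lowercase letter, then drop blanks and stop words.
words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
words <- words[nchar(words) > 0 & !(words %in% stop_words)]

# Frequency table, most frequent first; keep the top three words.
word_freq <- sort(table(words), decreasing = TRUE)
top_words <- head(as.data.frame(word_freq, stringsAsFactors = FALSE), 3)
print(top_words)
```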
SHOW THE ACCOUNTS FROM WHICH MOST RETWEETS ORIGINATE
set.seed(1234)
wordcloud(post_tweets_data$retweet_screen_name, min.freq=200, scale=c(2, .5),
random.order=FALSE, rot.per=0.25,
colors=brewer.pal(8, "Dark2"))
## Warning in tm_map.SimpleCorpus(corpus, tm::removePunctuation): transformation
## drops documents
## Warning in tm_map.SimpleCorpus(corpus, function(x) tm::removeWords(x,
## tm::stopwords())): transformation drops documents
SHOW THE LOCATIONS FROM WHICH MOST TWEETS ORIGINATE
set.seed(1234)
#wordcloud(tweets_data$retweet_location, min.freq=500,colors=brewer.pal(8, "Dark2"))
ew_sentiment <- get_nrc_sentiment(tweets)
sentimentscores <- data.frame(colSums(ew_sentiment[,]))
names(sentimentscores) <- "Score"
sentimentscores <- cbind("sentiment"=rownames(sentimentscores), sentimentscores)
rownames(sentimentscores) <- NULL
# theme_minimal() is applied before theme() so that it does not reset legend.position
ggplot(data=sentimentscores, aes(x=sentiment, y=Score)) +
  geom_bar(aes(fill=sentiment), stat="identity") +
  theme_minimal() +
  theme(legend.position="none") +
  xlab("Sentiments") + ylab("Scores") +
  ggtitle("Total sentiment based on scores")
CLEANING THE DATA
head(post_tweets_data$text)
## [1] "movie tanhaji is not holiday release so comparing its advance with holiday
releases only show ur jealous soul"
## [2] "i can see a ceain honesty in the early reviews of tanhaji which are coming out
this is so bloody rare in these times and the kind of things people are sayingi am now so
so excited to catch it at the earliest tanhajireview"
## [3] "onewordreview tanhaji superb rating cr film tanhajireview"
options(mc.cores=1)
# plots
ggplot(ngrams_matrix, aes(x=word, y=freq)) +
  geom_bar(stat="identity", fill="pink", colour="black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("n-grams") + ylab("Frequency")
}
n_grams_plot(n=1, data=sub_blogs_Corpus)
Plot of frequency distribution of 2-gram
n_grams_plot(n=2, data=sub_blogs_Corpus)
Plot of frequency distribution of 3-gram
n_grams_plot(n=3, data=sub_blogs_Corpus)
Plot of frequency distribution of 4-gram
n_grams_plot(n=4, data=sub_blogs_Corpus)
Create Corpus and define the rating of the movie based on the score given by get_sentiment function
corpus_tw = Corpus(VectorSource(post_tweets_data$text))
sparse_tw.df = as.data.frame(as.matrix(sparse_tw))
colnames(sparse_tw.df) = make.names(colnames(sparse_tw.df))
#category_sentiment <- ifelse(sent.value < 0, "Bad",
#                      ifelse(sent.value >= 0 & sent.value < 0.5, "Average",
#                      ifelse(sent.value >= 0.5 & sent.value < 1, "Good",
#                      ifelse(sent.value >= 1 & sent.value < 1.5, "Very Good", "Excellent"))))
#sparse_tw.df$Polarity = category_sentiment
Classify the tweets based on the scores provided by get_sentiment function into 5 categories.
sparse_tw.df$Polarity = category_sentiment
table(sparse_tw.df$Polarity)
##
## 1 2 3 4 5 Ignore
## 8942 8203 17346 15614 24895 30853
table(sparse_tw_new.df$Polarity)
##
## 1 2 3 4 5
## 8942 8203 17346 15614 24895
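The binning into five classes and the removal of the "Ignore" class can be sketched as below. Several points are assumptions rather than the report's actual rule: the cut points are borrowed from the commented-out thresholds above, a score of exactly 0 is treated as "Ignore", and `sent_value`, `polarity_df` and `polarity_new_df` are hypothetical names with made-up scores.

```r
# Made-up sentiment scores standing in for the get_sentiment() output.
sent_value <- c(-1.2, 0, 0.25, 0.75, 1.1, 2.4, 0, -0.4)

# Left-closed bins [a, b): <0, [0, 0.5), [0.5, 1), [1, 1.5), >= 1.5 (assumed cut points).
rating <- cut(sent_value,
              breaks = c(-Inf, 0, 0.5, 1, 1.5, Inf),
              labels = c("1", "2", "3", "4", "5"),
              right  = FALSE)

# Assumption: tweets with a score of exactly 0 carry no sentiment and are ignored.
category_sentiment <- ifelse(sent_value == 0, "Ignore", as.character(rating))

polarity_df <- data.frame(score = sent_value, Polarity = category_sentiment,
                          stringsAsFactors = FALSE)

# Drop the ignored rows and store Polarity as a 5-level factor for modelling.
polarity_new_df <- polarity_df[polarity_df$Polarity != "Ignore", ]
polarity_new_df$Polarity <- factor(polarity_new_df$Polarity,
                                   levels = as.character(1:5))
print(table(polarity_new_df$Polarity))
```

Using `cut()` instead of nested `ifelse()` calls keeps the bin edges in one place, which makes them easier to adjust.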
ggplot(data=sparse_tw_new.df, aes(Polarity, fill=Polarity))+ geom_bar()
We will use different classification models and check their accuracy and performance. Polarity
will be the dependent variable.
prop.table(table(sparse_tw_new.df$Polarity))
##
## 1 2 3 4 5
## 0.1192267 0.1093733 0.2312800 0.2081867 0.3319333
Split the data into 70-30 ratio for Train and Test
library(caTools)
set.seed(777)
train_data = subset(model_data, spl == TRUE)
test_data = subset(model_data, spl == FALSE)
prop.table(table(train_data$Polarity))
##
## 1 2 3 4 5
## 0.1114286 0.1088571 0.2397143 0.2117143 0.3282857
prop.table(table(test_data$Polarity))
##
## 1 2 3 4 5
## 0.1113333 0.1086667 0.2400000 0.2120000 0.3280000
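The near-identical class proportions in the Train and Test sets are what a stratified split produces. The sketch below shows the idea in base R on toy labels. It is not the caTools call used in the report, and `polarity`, `train_labels` and `test_labels` are hypothetical names. The toy class counts only roughly mirror the proportions printed above.

```r
set.seed(777)

# 100 toy labels with class shares close to those of the Polarity variable.
polarity <- factor(rep(1:5, times = c(12, 11, 23, 21, 33)))

# Within each class, flag about 70% of the rows as training rows.
spl <- rep(FALSE, length(polarity))
for (lvl in levels(polarity)) {
  idx <- which(polarity == lvl)
  n_train <- round(0.7 * length(idx))
  spl[idx[sample.int(length(idx), n_train)]] <- TRUE
}

train_labels <- polarity[spl]
test_labels  <- polarity[!spl]

# Class proportions stay close to each other in both parts.
print(round(prop.table(table(train_labels)), 3))
print(round(prop.table(table(test_labels)), 3))
```

Sampling inside each class, rather than over all rows at once, is what keeps the minority classes represented in both parts.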
#CART Diagram
prp(movie_cart_model, extra=2)
## predict_cart_train_post
## 1 2 3 4 5
## 1 102 6 0 0 282
## 2 0 150 0 0 231
## 3 1 2 181 1 654
## 4 1 0 8 392 340
## 5 0 0 6 2 1141
# Accuracy of the CART model on the training data
accuracy_cart_train_post = sum(diag(confusion_matrix_cart))/sum(confusion_matrix_cart)
accuracy_cart_train_post
## [1] 0.5617143
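As a worked check, the accuracy can be recomputed from the confusion matrix printed above (rows are actual classes, columns are predicted classes). The sketch also derives two quantities that are not in the original code: the majority-class baseline and the per-class recall. The names `baseline` and `recall` are introduced here for illustration only.

```r
# CART confusion matrix on the training data, copied from the output above.
confusion_matrix_cart <- matrix(
  c(102,   6,   0,   0,  282,
      0, 150,   0,   0,  231,
      1,   2, 181,   1,  654,
      1,   0,   8, 392,  340,
      0,   0,   6,   2, 1141),
  nrow = 5, byrow = TRUE,
  dimnames = list(actual = as.character(1:5), predicted = as.character(1:5))
)

# Accuracy: correctly classified tweets (diagonal) over all tweets.
accuracy <- sum(diag(confusion_matrix_cart)) / sum(confusion_matrix_cart)

# Majority-class baseline: accuracy of always predicting the largest class.
baseline <- max(rowSums(confusion_matrix_cart)) / sum(confusion_matrix_cart)

# Per-class recall (sensitivity): share of each actual class predicted correctly.
recall <- diag(confusion_matrix_cart) / rowSums(confusion_matrix_cart)

print(round(accuracy, 7))
print(round(baseline, 7))
print(round(recall, 3))
```

The diagonal sums to 1966 out of 3500 tweets, which gives the 0.5617143 reported above. The per-class recall shows that most errors come from tweets of classes 1 to 4 being predicted as class 5.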
library(pROC)
as.numeric(predict_cart_test_post), quiet = TRUE)
##
## Data: as.numeric(predict_cart_test_post) with 5 levels of
as.numeric(test_data$Polarity): 1, 2, 3, 4, 5.
## Multi-class area under the curve: 0.5985
Comparison of all the performance measures of the CART model on the Train and Test datasets
results_cart_train_post = data.frame(accuracy_cart_train_post,
as.numeric(roc_obj_cart_train_post$auc))
names(results_cart_train_post) = c("ACCURACY", "AUC-ROC" )
results_cart_test_post =
data.frame(accuracy_cart_test_post,as.numeric(roc_obj_cart_test_post$auc) )
names(results_cart_test_post) = c("ACCURACY", "AUC-ROC")
# Load Library
library(randomForest)
set.seed(777)
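# Make predictions:
# Note: predict.randomForest() takes `newdata`, not `data`; because `newdata` is not
# supplied here, the call returns the out-of-bag predictions for the training rows.
predict_rf_train_post = predict(movie_rf_model, data=train_data,type="response")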
## predict_rf_train_post
## 1 2 3 4 5
## 1 302 4 60 4 20
## 2 13 311 37 6 14
## 3 11 6 782 9 31
## 4 5 2 39 654 41
## 5 10 1 26 13 1099
# Accuracy on the training data:
accuracy_rf_train_post = sum(diag(confusion_matrix_rf))/sum(confusion_matrix_rf)
accuracy_rf_train_post
## [1] 0.8994286
# Make predictions:
predict_rf_test_post = predict(movie_rf_model, newdata=test_data,type="response")
#Variable importance:
#varImpPlot(movie_rf_model,main='Variable Importance Plot: Movie ',type=2)
varImpPlot(movie_rf_model,sort=TRUE,type=NULL, class=NULL, scale=TRUE,cex=.8)
AUC-ROC Curve for Random Forest on Train and Test dataset
Comparison of all the performance measures of the Random Forest model on the Train and Test datasets
results_rf_train_post = data.frame(accuracy_rf_train_post,
as.numeric(roc_obj_rf_train_post$auc))
names(results_rf_train_post) = c("ACCURACY", "AUC-ROC" )
results_rf_test_post =
data.frame(accuracy_rf_test_post,as.numeric(roc_obj_rf_test_post$auc) )
names(results_rf_test_post) = c("ACCURACY", "AUC-ROC")
set.seed(123)
library(e1071)
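# Make predictions:
# Note: `data` is not an argument of predict.svm(); with `newdata` missing, the call
# returns the fitted values for the data the model was trained on.
predict_svm_train_post = predict(movie_svm_model, data=train_data, decision.values=TRUE)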
# Make predictions:
predict_svm_test_post = predict(movie_svm_model, newdata=test_data,decision.values=TRUE)
# Accuracy on the test data:
accuracy_svm_test_post = sum(diag(confusion_matrix_svm))/sum(confusion_matrix_svm)
accuracy_svm_test_post
## [1] 0.7093333
library(pROC)
Comparison of all the performance measures of the SVM model on the Train and Test datasets
results_svm_train_post = data.frame(accuracy_svm_train_post,
as.numeric(roc_obj_svm_train_post$auc))
names(results_svm_train_post) = c("ACCURACY", "AUC-ROC" )
results_svm_test_post =
data.frame(accuracy_svm_test_post,as.numeric(roc_obj_svm_test_post$auc) )
names(results_svm_test_post) = c("ACCURACY", "AUC-ROC")
Build Naive Bayes Model
set.seed(777)
# Make predictions:
predict_nb_test_post = predict(movie_nb_model,newdata = test_data, type = "class")
AUC-ROC Curve for Naive Bayes model on Train and Test dataset
library(pROC)
#library(ROCR)
#Train data - Plot ROC curve
roc_obj_nb_train_post <-
multiclass.roc(as.numeric(train_data$Polarity), as.numeric(predict_nb_train_post), quiet=TRUE)
roc_obj_nb_train_post
##
## Call:
## multiclass.roc.default(response = as.numeric(train_data$Polarity), predictor =
as.numeric(predict_nb_train_post), quiet = TRUE)
##
## Data: as.numeric(predict_nb_train_post) with 5 levels of
as.numeric(train_data$Polarity): 1, 2, 3, 4, 5.
## Multi-class area under the curve: 0.6423
#Test data - Plot ROC curve
roc_obj_nb_test_post <-
multiclass.roc(as.numeric(test_data$Polarity), as.numeric(predict_nb_test_post), quiet=TRUE)
roc_obj_nb_test_post
##
## Call:
## multiclass.roc.default(response = as.numeric(test_data$Polarity), predictor =
as.numeric(predict_nb_test_post), quiet = TRUE)
##
## Data: as.numeric(predict_nb_test_post) with 5 levels of
as.numeric(test_data$Polarity): 1, 2, 3, 4, 5.
## Multi-class area under the curve: 0.6245
Comparison of all the performance measures of the Naive Bayes model on the Train and Test datasets
results_nb_train_post = data.frame(accuracy_nb_train_post,
as.numeric(roc_obj_nb_train_post$auc))
names(results_nb_train_post) = c("ACCURACY", "AUC-ROC" )
results_nb_test_post =
data.frame(accuracy_nb_test_post,as.numeric(roc_obj_nb_test_post$auc) )
names(results_nb_test_post) = c("ACCURACY", "AUC-ROC")
Comparison of the best models, LDA and SVM, on their performance: Accuracy, Sensitivity,
Specificity and AUC-ROC
#install.packages("kableExtra")
library(kableExtra)
print("Model Performance Comparison Metrics ")
## [1] "Model Performance Comparison Metrics "
kable(round(df_fin,2)) %>%
kable_styling(c("striped","bordered"))
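A comparison table of this kind could be assembled as sketched below. The layout of `df_fin` is not shown in this section, so `comparison` is a hypothetical stand-in. It contains only the figures printed in this section for the four models evaluated here, and every metric not reported above is left as NA rather than guessed.

```r
# Only values printed in this section are filled in; the rest stay NA.
comparison <- data.frame(
  model          = c("CART", "Random Forest", "SVM", "Naive Bayes"),
  train_accuracy = c(0.5617143, 0.8994286, NA, NA),
  test_accuracy  = c(NA, NA, 0.7093333, NA),
  train_auc      = c(NA, NA, NA, 0.6423),
  test_auc       = c(0.5985, NA, NA, 0.6245),
  stringsAsFactors = FALSE
)

# Round the numeric columns to two decimals, as done for df_fin above.
numeric_cols <- sapply(comparison, is.numeric)
comparison[numeric_cols] <- round(comparison[numeric_cols], 2)
print(comparison)
```

Keeping the model name in its own character column avoids passing non-numeric data to `round()`.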