PANAS-t: A Psychometric Scale for Measuring
Sentiments on Twitter
Pollyanna Gonçalves⋆, Fabrício Benevenuto⋆†, Meeyoung Cha‡

⋆ Computer Science Department, Federal University of Minas Gerais, Brazil
† Computer Science Department, Federal University of Ouro Preto, Brazil
‡ Graduate School of Culture Technology, KAIST, Korea
Abstract—Online social networks have become a major communication platform where people share their thoughts and opinions about any topic in real time. The short text updates people post in these networks contain emotions and moods which, when measured collectively, can unveil the public mood at the population level, with exciting implications for businesses, governments, and societies. There is therefore an urgent need for solid methods that accurately measure moods from large-scale social media data. In this paper, we propose PANAS-t, which measures sentiments from short text updates in Twitter based on a well-established psychometric scale, PANAS (Positive and Negative Affect Schedule). We test the efficacy of PANAS-t over 10 notable real-world events drawn from 1.8 billion tweets and demonstrate that it can efficiently capture the expected sentiments of a wide variety of issues spanning tragedies, technology releases, political debates, and healthcare.
Index Terms—Twitter, sentiment analysis, public emotion,
public mood, psychometric scales, PANAS.
I. INTRODUCTION
Online social networks (OSNs) like Facebook and Twitter have become an important communication platform, where people share their thoughts and opinions about any topic in a collaborative manner and in real time. As of 2012, Facebook had over one billion active users, one-seventh of the world population, and Twitter similarly had over 400 million registered users, collectively producing hundreds of millions of status updates every day [36]. Given the scale and richness of these networks, the potential for mining the data within OSNs and utilizing observations from such data is tremendous, and OSN data have been a gold mine for scholars in fields like linguistics, sociology, and psychology who are looking for real-time language data to analyze [25].
The massive-scale, detailed human lifelog data found in OSNs have important implications for businesses, governments, and societies. The following areas of research demonstrate how useful observations mined from OSN data can be. First, social media data can be used to find resonance of important real-time debates and breaking news. As more people are seamlessly connected to the Web and OSN sites by mobile devices, they participate in delivering and propagating prominent and urgent information such as political uprisings [21], [9], natural disasters [32], and the outbreak of epidemics [15]. Second, social media data can be used not only to understand current trends but also to predict future trends such as movie sales [2], political elections [33], [12], [28], and the stock market [7].
The key features of OSN data that enable these applications are its immediacy and immensity, which make the development of new methods for large-scale, real-time collection and analysis of OSN data crucial. One such important development is inferring sentiments in OSNs. Recent work has shown that people's real-time moods can be gauged at a global level, instead of relying on questionnaires and other laborious and time-consuming methods of data collection [14]. Measuring sentiments from unstructured OSN data can not only broaden our understanding of human nature, but also help us comprehend how, when, and why individuals' feelings fluctuate according to various social and economic events.
While sentiment analysis in OSNs is receiving great attention, existing work on measuring sentiments from OSN data has focused on extracting opinions (not feelings) for marketing purposes [30] and on finding correlations of moods with some other factor such as happiness [13] or stock prices [7]. Most research on inferring moods from social media text has directly employed existing natural language processing tools like LIWC (Linguistic Inquiry and Word Count) [31], PANAS (Positive and Negative Affect Schedule) [35], [34], ANEW (Affective Norms for English Words) [23], and the Profile of Mood States (POMS) [7], which were developed for more traditional styles of writing, such as formal articles that use proper language, not for unstructured and less formal OSN data. Relatively little attention has been paid to developing solid methods for adjusting existing natural language processing tools to specific types of OSN data.
In this paper, we use a well-established psychometric scale, PANAS, to measure sentiments from short text updates in Twitter and propose PANAS-t, an eleven-sentiment psychometric scale adapted to the context of Twitter. PANAS-t contains positive and negative mood states and is suitable for measuring sentiments about any sort of event in Twitter. To establish PANAS-t, we used empirical data from a unique dataset containing 1.8 billion tweets. We used these data to compute normalization scores for each sentiment, so that any increase or decrease in positive or negative moods over time can be measured relative to the presence of the overall sentiments in the dataset. This approach makes PANAS-t simple and practical to use on large amounts of data and even for real-time analysis.
To validate our approach, we extracted 10 notable real-world events spanning tragedies, technology releases, political debates, and healthcare from the 3.5 years' worth of Twitter data, and demonstrated that PANAS-t can effectively capture the mood fluctuations during these events. The 10 events studied include the 2008 US presidential election, the death of singer Michael Jackson, and natural disasters like the 2010 earthquake in Haiti. Our qualitative evaluation offers strong evidence that PANAS-t correctly captures the expected sentiments for the analyzed events.
The remainder of this paper is organized as follows. Section II surveys existing approaches to measuring sentiments from text. Section III details how PANAS-t works, and Section IV describes the Twitter dataset. Section V provides experimental evidence that our approach is able to capture public mood from tweets associated with noteworthy events. Finally, Section VI concludes the paper and offers directions for future work.
II. RELATED WORK
With the growth of social networking on the Web, sentiment analysis and opinion mining have become a subject of study for many researchers. In this section, we survey different techniques used to measure sentiments from online text and describe related work that studied sentiments in Twitter.
Several methodologies have been used by researchers to extract sentiment from online text. An overview of a number of these approaches is presented in Pang and Lee's survey [30], which covers several methods that use Natural Language Processing (NLP) techniques for sentiment analysis, i.e., techniques by which subjective properties of text are inferred using statistical methods. Those methods are usually suitable for constructing sentiment-aware and opinion-mining Web applications, which analyze the feedback of consumers or users about a particular product or service [3], [1].
Chesley et al. [10] utilized verbs and adjectives extracted from Wikipedia to classify blog text into three categories: objective, subjective-positive, or subjective-negative. The verb classes used in the paper can express objectivity and polarity (i.e., a positive or negative opinion), and the polarity of adjectives can be drawn from their entries in the online dictionary, with the two verb classes expressing polarity reaching accuracy rates near 90%. More recently, Pak and Paroubek [29] used recognition of grammatical structures to determine whether a tweet is subjective. They demonstrated that superlative adjectives, first-person verbs, and personal pronouns are often used for expressing emotions and opinions, whereas comparative adjectives and common and proper nouns are strong indicators of objective text.
Other approaches to extracting sentiment from online text rely on machine learning, a technique in which algorithms learn a classification model from a set of previously labeled data and then apply the acquired knowledge to classify new text into sentiment categories. In [6], the authors use Support Vector Machine (SVM) and Multinomial Naive Bayes (MNB) classifiers to test whether the brevity of microblog posts gives any advantage in classifying sentiment, and in fact find that short documents express sentiment more compactly and explicitly than long ones. In [16], the authors use a Random Walk (RW)-based model and compare it with SVM to predict bias in user opinions. Although these approaches are applicable in several scenarios, supervised learning techniques require manual intervention to pre-classify training data, which may be infeasible for massive-scale social media data.
Another line of research on extracting sentiments from online text measures a happiness index from text [14]. Dodds and Danforth [13] proposed a method that computes the level of happiness of an unstructured text. They showed that the happiness index inferred from song lyrics trended downward from the 1960s to the mid-1990s while remaining stable within genres, whereas that of blogs steadily increased from 2005 to 2009. While providing new insights, one drawback of this approach is that the proposed happiness index has a single scale and does not provide any richer categorization of sentiments, which is the focus of this work.
Miyoshi and Nakagami [27] propose a method to estimate the semantic orientation of Japanese reviews of target products. The authors selected words that can possibly change the semantic orientation of a text and then judged whether the review of a product can be considered desirable or not. To evaluate their approach, they analyzed 1,400 Japanese reviews of electronic products, such as LCDs and MP3 players, separating them into positive and negative reviews.
There are two studies that are more closely related to our goals. Kim et al. [23] proposed a method for detecting emotions using Affective Norms for English Words (ANEW), a dataset that contains normative emotional ratings for 1,034 English words. Each word in the ANEW dataset is associated with a rating of 1–9 along each of three dimensions: valence, arousal, and dominance. Based on these scales, the authors examined sample tweets about celebrity deaths and found ANEW to be a promising tool to mine Twitter data. Another study [7] utilized the Profile of Mood States (POMS), a psychological rating scale consisting of 65 adjectives that measure six mood states: tension, depression, anger, vigor, fatigue, and confusion. The authors applied this scale to identify sentiments in a sample of tweets and evaluate the mood of users in relation to market fluctuations and events like political elections in the United States.
This paper builds upon the above efforts and adopts a different psychometric scale, PANAS (Positive and Negative Affect Schedule) [35], [34], to achieve new contributions. First, compared to machine learning-based or other dictionary-based approaches, PANAS contains a well-balanced set of both positive and negative affects. This makes PANAS suitable for analyzing people's reactions not only to crisis events such as celebrity deaths and natural disasters, but also to amusing events that incur positive emotions. Second, whereas existing work tested sentiment extraction on sample data, we use the complete data gathered from Twitter to test the idea, which allows us to perform appropriate normalization to adjust PANAS for Twitter.
III. PANAS-t: AFFECT MEASURE FOR TWITTER
Our approach to measuring sentiments in Twitter is rooted in a well-known psychometric scale, PANAS. We begin by describing PANAS-x, a popular expanded version of PANAS, which we utilize, and then describe the normalization steps that we take to adapt the psychometric scale to Twitter.
A. The PANAS and PANAS-x Scales
The original PANAS consists of two 10-item mood scales and was developed by Watson and Clark [35] to provide brief measures of PA (Positive Affect) and NA (Negative Affect). Respondents are asked to rate, on a 5-point scale, the extent to which they have experienced each particular emotion within a specified time period (typically the past week). Since the development of the test, the words appearing in the checklist have broadly tapped the affective lexicon. Later, the same authors developed an expanded version comprising 60 items. The expanded version, called PANAS-x, measures not only the two original higher-order scales (PA and NA), but also 11 specific affects: Fear, Sadness, Guilt, Hostility, Shyness, Fatigue, Surprise, Joviality, Self-Assurance, Attentiveness, and Serenity.
Table I summarizes the word composition of the PANAS-x scale [34]. The negative affect includes words like "afraid," "scared," and "nervous," while the fatigue affect state includes words like "sleepy," "tired," and "sluggish." The items in PANAS-x have been validated extensively and are known to have a strong relationship with POMS categories, with convergent correlations above 0.85. In addition, PANAS-x has been shown to outperform POMS, because the items in PANAS-x tend to be less highly correlated with one another and thus show better discriminant validity. For instance, the mean correlation among the PANAS-x Fear, Hostility, Sadness, and Fatigue scales was 0.45, significantly lower than the mean correlation (0.60) among the corresponding POMS scales.

Table I
Item composition of the PANAS-x scales.

General Dimension Scales
  Negative Affect (10): afraid, scared, nervous, jittery, irritable, hostile, guilty, ashamed, upset, distressed.
  Positive Affect (10): active, alert, attentive, determined, enthusiastic, excited, inspired, interested, proud, strong.

Basic Negative Emotion Scales
  Fear (6): afraid, scared, frightened, nervous, jittery, shaky.
  Hostility (6): angry, hostile, irritable, scornful, disgusted, loathing.
  Guilt (6): guilty, ashamed, blameworthy, angry at self, disgusted with self, dissatisfied with self.
  Sadness (5): sad, blue, downhearted, alone, lonely.

Basic Positive Emotion Scales
  Joviality (8): happy, joyful, delighted, cheerful, excited, enthusiastic, lively, energetic.
  Self-assurance (6): proud, strong, confident, bold, daring, fearless.
  Attentiveness (4): alert, attentive, concentrating, determined.

Other Affective States
  Shyness (4): shy, bashful, sheepish, timid.
  Fatigue (4): sleepy, tired, sluggish, drowsy.
  Serenity (3): calm, relaxed, at ease.
  Surprise (3): amazed, surprised, astonished.

Note: the number of terms comprising each scale is shown in parentheses.
The authors also validated that individual trait scores on the PANAS-x scales (a) are stable over time, (b) show significant convergent and discriminant validity when correlated with peer judgments, (c) are highly correlated with corresponding measures of aggregated state affect, and (d) are strongly and systematically related to measures of personality and emotionality [34]. Given these qualities, we choose to adopt PANAS-x for analyzing short text updates from online social media.
B. Adjusting PANAS-x for Twitter
Tweets expressing certain sentiments may appear more frequently than others, leading to a bias toward, or dominance of, a small set of sentiments in OSN data. Thus, in order to tell whether tweets expressing a specific type of sentiment have increased or decreased for a given event (e.g., a celebrity death or natural disaster), we first need to know what kinds of sentiments appear during "typical" or non-event periods. Unfortunately, it is hard or impossible to determine which dates would be classified as such. One natural baseline is to aggregate sentiments over a long period of time and take the proportion of each type of sentiment as the baseline. By comparing the proportion of tweets that contain a specific sentiment during a given event against this baseline, one can tell how sentiments have changed relative to their presence in the entire dataset.
We now describe how to compute the baselines for comparison. We assume each normalized tweet can be mapped to a single sentiment. When a tweet contains any of the adjectives in Table I, we associate the corresponding sentiment s as the main sentiment of the tweet. In case none of the sentiment words in Table I appear in a tweet, we cannot infer the sentiment for that tweet; this limitation is common to most other sentiment tools described in the related work. In case of a tie, where more than one sentiment can be found in a single tweet, we choose the first sentiment that appears in the tweet (based on the location of the adjectives) as the major sentiment of that tweet, although such ties are very rare and hence negligible for the analysis.
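As an illustration, the following Python sketch implements this first-appearing-adjective rule; the function name and the small excerpt of the Table I lexicon are our own, for illustration only, and do not reproduce the authors' implementation.

# Sketch of the PANAS-t sentiment-assignment rule: the earliest
# lexicon adjective in the normalized tweet determines its single
# sentiment. The lexicon below is a small excerpt of Table I,
# shown for illustration only.
LEXICON = {
    "fear": {"afraid", "scared", "frightened", "nervous", "jittery", "shaky"},
    "sadness": {"sad", "blue", "downhearted", "alone", "lonely"},
    "fatigue": {"sleepy", "tired", "sluggish", "drowsy"},
    "joviality": {"happy", "joyful", "delighted", "cheerful", "excited"},
}

def assign_sentiment(tokens):
    """Return the sentiment of the earliest matching adjective, or None."""
    for tok in tokens:                      # scan the tweet left to right
        for sentiment, words in LEXICON.items():
            if tok in words:
                return sentiment            # ties are broken by position
    return None                             # no Table I word: no inference

print(assign_sentiment(["i", "am", "so", "tired", "and", "sad"]))  # fatigue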
The baseline sentiment can then be calculated as follows. Let T be the entire set of normalized tweets and T_s the subset of these tweets related to sentiment s. The baseline value for each sentiment, α_s, is defined as the number of tweets carrying sentiment s divided by the total number of normalized tweets in our dataset:

    α_s = |T_s| / |T|                                    (1)
Table II shows the baseline values for all 11 sentiments in PANAS-x, computed over the 3.5 years' worth of Twitter data that we describe in detail in the next section. Some sentiments occur orders of magnitude more frequently than others: tweets expressing fatigue occur nearly 32 times more frequently than tweets expressing shyness. This skew in frequency indicates that normalization is needed to capture the effective change of a given sentiment, because treating any increase in the number of fatigue and shyness tweets equally would result in under-estimation and over-estimation of these sentiments, respectively. The inherent skew in sentiments therefore reinforces that a proper normalization specific to the OSN is necessary.
Sentiment (s)      Baseline (α_s)
Fear               0.0063791
Sadness            0.0086279
Guilt              0.0021756
Hostility          0.0018225
Shyness            0.0007608
Fatigue            0.0240757
Surprise           0.0084612
Joviality          0.0182421
Self-assurance     0.0036012
Attentiveness      0.0008997
Serenity           0.0022914

Table II
Fraction of tweets for each sentiment in the entire dataset.
Given the baseline sentiment values in Table II, we can
now compute the relative increase or decrease in sentiments
for a particular sample of tweets as follows. Let S be the set of tweets under study (e.g., tweets about a natural disaster) and S_s the subset of these tweets related to sentiment s. We define β_s as the relative occurrence of sentiment s in the sample S and compute it as:

    β_s = |S_s| / |S|                                    (2)

Finally, we define the PANAS-t score as an eleven-dimensional sentiment vector, where the PANAS-t score function P(s) for sentiment s is computed as:

    P(s) = −(α_s − β_s) / α_s    if β_s ≤ α_s
    P(s) =  (β_s − α_s) / β_s    otherwise               (3)
The value of P(s) varies between −1 and 1 for each sentiment s. An event with P(fear) = 0 shows no increase or decrease in the sentiment fear compared with the entire dataset of tweets posted as of 2009; a positive value of 0.3 would mean an increase of 30%, and so on. Our strategy for computing the PANAS-t score is simple and allows comparing both the increase and the decrease of each type of sentiment relative to an unbiased baseline. More importantly, Table II provides a baseline for comparison against any kind of tweet sample. For instance, one could easily crawl tweet samples using the Twitter API and normalize the sentiment scores found with our baselines.
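To make the computation concrete, here is a minimal Python sketch of Equations (2) and (3), using a few of the α_s baselines from Table II; the function and variable names, and the sample counts in the usage line, are our own hypothetical illustrations.

# Minimal sketch of the PANAS-t score, Equations (2) and (3).
# BASELINES holds a few of the alpha_s values from Table II.
BASELINES = {"fear": 0.0063791, "sadness": 0.0086279, "fatigue": 0.0240757}

def panas_t_score(sample_counts, sample_size, baselines=BASELINES):
    """Map each sentiment s to a score P(s) in [-1, 1] for a tweet sample.

    sample_counts: dict sentiment -> tweets in the sample with it (|S_s|)
    sample_size:   total number of normalized tweets in the sample (|S|)
    """
    scores = {}
    for s, alpha in baselines.items():
        beta = sample_counts.get(s, 0) / sample_size    # Eq. (2)
        if beta <= alpha:                               # Eq. (3)
            scores[s] = -(alpha - beta) / alpha         # decrease: negative
        else:
            scores[s] = (beta - alpha) / beta           # increase: positive
    return scores

# e.g., a hypothetical 10,000-tweet sample about a disaster
print(panas_t_score({"fear": 300, "sadness": 120}, 10_000))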
C. Most popular words of PANAS-t
Having seen that the baseline levels of sentiments in tweets are skewed, we quantify which words of the PANAS-t scales appear most frequently in the dataset. Table III shows the frequency of each adjective over the entire Twitter data. Even within a given sentiment, certain adjectives are used more frequently to express feelings. The most popular adjective is "sleepy" in the fatigue category (appearing over 8.0 million times), followed by "happy" in the joviality category (appearing over 3.8 million times). Other popular words include "tired", "excited", "sad", "amazed", "alone", and "surprised", which all appear more than 1 million times. However, certain words in the PANAS-x scales are rarely used in Twitter to express moods, such as "downhearted" in the sadness category and "blameworthy" in the guilt category. We may expect that not all words in PANAS-x appear frequently in OSNs, because the PANAS-x scale was originally designed for a different environment (i.e., intrusive surveys), in which a respondent marks, on a scale from 1 to 5, how well each of these words describes her mood state. Despite this difference between PANAS-x and PANAS-t, Section V presents a number of situations in which PANAS-t accurately captures the expected mood states of populations for a number of noteworthy events.
IV. TWITTER DATASET
The dataset used in this work includes extensive data from a
previous measurement study that included a complete snapshot
of the Twitter social network and the complete history of
tweets posted by all users as of August 2009 [8]. More
specifically, the dataset contains 54,981,152 users who had
1,963,263,821 follow links among themselves and posted
1,755,925,520 tweets (as of August 2009). Out of all users,
nearly 8% of the accounts were set as private, which implies
that only their friends could view their links and tweets. We
ignore these users in our analysis.
This dataset is appropriate for the purpose of this work for
the following reasons. First, the dataset contains all users with
accounts created before August 2009. Thus, it is not based
on sampling techniques that can introduce bias towards some
characteristics of the users. Second, this dataset contains all
tweets of these users, which is essential for measuring the
increase or decrease of a certain sentiment related to tweets
of a specific event. Thus, this dataset uniquely allows us to
normalize the presence of sentiments in a sample of tweets relative to the inherent sentiment levels in Twitter.

Self-assurance: proud (762,990), strong (596,376), daring (295,047), confident (95,858), bold (90,101), fearless (20,084)
Guilt: ashamed (492,371), guilty (324,446), angry at self (7,873), disgusted with self (2,853), dissatisfied with self (61), blameworthy (19)
Hostility: angry (483,937), irritable (268,546), disgusted (220,470), loathing (72,330), hostile (12,614), scornful (7,516)
Surprise: amazed (2,758,114), surprised (1,050,164), astonished (19,047)
Attentiveness: alert (209,062), concentrating (123,725), determined (96,616), attentive (5,456)
Fatigue: sleepy (8,043,591), tired (3,486,574), sluggish (19,938), drowsy (18,435)
Fear: scare (1,649,193), nervous (668,867), afraid (515,224), shaky (173,142), frightened (75,260), jittery (12,791)
Joviality: happy (3,802,662), excited (3,170,837), delighted (117,074), lively (43,552), enthusiastic (34,323), energic (22,159), joyful (21,663), cheerful (19,178)
Shyness: shy (320,611), timid (13,521), sheepish (6,850), bashful (2,556)
Sadness: sad (2,765,458), alone (1,096,592), lonely (15,858), blue (987), downhearted (286)
Serenity: at ease (1,030,236), relaxed (737,668), calm (258,576)

Table III
Frequency of each term of PANAS-t in the entire dataset.
A. Data cleaning steps
In order to analyze only those tweets that likely express individuals' feelings, we take into account only tweets that contain explicit statements of their author's mood states, by matching the following expressions in tweets: "I'm", "I am", "I", "am", "feeling", "me", and "myself". A similar approach was used in [7] for finding correlations between Twitter moods and stock prices. In total, we found 479,356,536 tweets that match these patterns, which corresponds to about 27% of the entire dataset of tweets.
Once we found the set of candidate tweets that may contain emotions and moods, we further cleaned the data as follows. We first applied common language processing steps such as case-folding, stemming, and removal of stop words, URLs, and common verb forms. We then separated individual terms using whitespace as the delimiter and removed commas, dashes, and other non-alphanumeric characters. For example, the tweet "I am so scared about swine flu" turns into the following set of terms: [I, am, scare, swine, flu]. In the remainder of this paper, we use the above normalization and analyze a total of 479,356,536 normalized tweets.
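A rough Python sketch of this filtering and cleaning pipeline follows; the regular expressions, the tiny stop-word list, and the naive suffix-stripping stemmer are our own simplifications, since the paper does not specify the exact tools used.

import re

# Expressions signaling an explicit statement of the author's mood.
MOOD_MARKERS = re.compile(r"\b(i'm|i am|i|am|feeling|me|myself)\b", re.I)

def normalize(tweet):
    """Case-fold, strip URLs/punctuation, drop stop words, crude stemming."""
    if not MOOD_MARKERS.search(tweet):
        return None                                # keep only mood tweets
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)       # non-alphanumerics
    stop = {"so", "about", "the", "a", "an", "and", "or"}  # tiny stop list
    tokens = [t for t in text.split() if t not in stop]
    # naive stemming for illustration: "scared" -> "scare"
    return [t[:-1] if t.endswith("ed") else t for t in tokens]

print(normalize("I am so scared about swine flu"))
# -> ['i', 'am', 'scare', 'swine', 'flu']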
V. EVALUATION OF PANAS-t
In order to evaluate the extent to which PANAS-t can accurately measure the sentiments of Twitter users, we need ground truth data against which to compare our results. Such ground truth is difficult to obtain because sentiments are subjective by nature. In this paper, we consider a few strategies to perform this evaluation. First, we evaluate a set of popular events for which the associated sentiments are expected or easy to verify. Second, we test whether PANAS-t captures differences in sentiment across geographical regions. Third, we show that the baseline values computed for PANAS-t remain useful for measuring sentiments in a dataset of tweets collected in a different period.
A. Testing across popular real-world events
We picked nine events that were widely reported to have been covered by Twitter¹. These events, summarized in Table IV, span topics related to tragedies, product and movie releases, politics, health, and sports. To extract tweets relevant to these events, we first identified a set of keywords describing each topic by consulting news websites, blogs, Wikipedia, and informed individuals. Given the selected list of keywords, we identified the topics by searching for these keywords in the tweet dataset. We limited the duration of each event because popular keywords are typically hijacked by spammers after a certain time [5], [11]. Table IV also displays the keywords used and the total number of tweets for each topic.
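As a sketch of this extraction step (the helper below and its inputs are illustrative assumptions, not the authors' code):

from datetime import date

def extract_event(tweets, keywords, start, end):
    """Select tweets mentioning any event keyword within the event window.

    tweets: iterable of (date, text) pairs; keywords: event search terms.
    """
    keywords = [k.lower() for k in keywords]
    return [(d, t) for d, t in tweets
            if start <= d <= end and any(k in t.lower() for k in keywords)]

# e.g., the AirFrance event window from Table IV
airfrance = extract_event(
    tweets=[(date(2009, 6, 2), "So sad about the airfrance crash")],
    keywords=["victims", "passengers", "a330", "447", "crash", "airfrance"],
    start=date(2009, 6, 1), end=date(2009, 6, 6))
print(len(airfrance))  # -> 1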
In order to test how accurately PANAS-t can measure sentiment fluctuations, we calculated the PANAS-t scores for all events and present them as Kiviat (radar) charts. In each Kiviat chart, radial lines starting at the central value −1 represent each sentiment, up to the maximum value of 1 [22].

¹ Top Twitter trends: http://tinyurl.com/yb4965e
Event | Duration | Description (example keywords) | # Tweets
H1N1 | Mar 1 – Jul 31, 2009 | Disease outbreak (tamiflu, outbreak, influenza, pandemia, pandemic, h1n1, swine, world health organization) | 335,969
AirFrance | Jun 1–6, 2009 | Plane crash (victims, passengers, A330, 447, crash, airplane, airfrance) | 29,765
US-Elec | Nov 2–6, 2008 | US presidential election (clinton, biden, palin, vote, mccain, democrat, republican, obama) | 185,477
Obama | Jan 18–22, 2009 | Presidential inauguration speech (barack obama, white house, presidential, inauguration) | 43,015
Michael Jackson | Jun 25–30, 2009 | Death of celebrity (rip, mj, michael jackson, death, king of pop, overdose, drugs, heart attack, conrad murray) | 56,259
Susan-Boyle | Apr 11–16, 2009 | Appearance of a new celebrity (susan boyle, I dreamed a dream, britain's got talent) | 7,142
HarryPotter | Jul 13–17, 2009 | Release of a movie (harry potter, half-blood prince, rowling) | 194,356
Olympics | Aug 6–26, 2008 | Beijing Olympics (olympics, medals, china, beijing, sports, peking, sponsor) | 12,815
Samoa | Sep 28 – Oct 4, 2009 | Natural disaster (tsunami, samoa islands, tonga, earthquake) | 23,881
Haiti | Jan 11–17, 2010 | Natural disaster (haiti, earthquake, richter, port au prince, jacmel, leogane) | 236,096

Table IV
Summary of events that were analyzed.
In Figure 1, we plot the eleven sentiments in one Kiviat chart per event.
The first event we examine is H1N1, the worldwide outbreak of the H1N1 influenza. The marking date, March 1, 2009, was the day the influenza was declared by the World Health Organization (WHO) as a global pandemic. To identify the event, we searched for a number of keywords, including "pandemic" and "swine", and found a total of 335,969 relevant tweets during the five-month period. Figure 1(a) shows the sentiment scores of this event based on the PANAS-t scales. It demonstrates that the emotional state of Twitter users increased in attentiveness (P(s) = 0.8774) and fear (P(s) = 0.6768) in the days just after the announcement. Indeed, these two feelings are the ones most expected for this event, as people were both attentive to precautions and afraid of a global pandemic.
The second event is AirFrance, the tragic crash of an airplane on June 1st, 2009, which caused a great commotion in Twitter. Air France Flight 447 was a scheduled commercial flight from Rio de Janeiro to Paris; it crashed into the Atlantic Ocean, killing all 216 passengers. As expected, the crash caused sad emotions towards those who died and also fear that something similar might happen again. Figure 1(b) shows the Kiviat chart for this event. As expected, fear (P(s) = 0.72914) and sadness (P(s) = 0.6992) were the two most predominant feelings in the tweets associated with this event.
The third event is US-Elec, covering tweets related to the US presidential election. With the election, many voters might feel apprehensive and even excited about the power of choice given to them. Our results show sentiments in this direction: Figure 1(c) shows that users' feelings of self-assurance (P(s) = 0.6741), joviality (P(s) = 0.4277), and fear (P(s) = 0.3072) increased when the election results came out.
The fourth event, Obama, covers President Barack Obama's inauguration speech, which received wide attention in Twitter. As reported in [17], the majority of Americans were more confident in the improvement of the country after viewing President Barack Obama's inauguration speech. Our analysis of the mood of Twitter users on the day of Obama's speech shows a particularly large increase in self-assurance (P(s) = 0.7980), followed by surprise (P(s) = 0.5802) and joviality (P(s) = 0.5227). Despite all the positive manifestations regarding the election of Obama, we also see a positive, though lower, value for sadness (P(s) = 0.1789), which might represent tweets from Barack Obama's opponents. Figure 1(d) shows that the feelings measured with PANAS-t are in agreement with the ones reported in [17].
The fifth Kiviat chart, Michael-Jackson, concerns the death of singer Michael Jackson. According to the Daily Mail [4], nine of the ten most popular topics in Twitter were dedicated to the event the day after his death. In Figure 1(e), we can see an increase in sadness (P(s) = 0.4055), fear (P(s) = 0.5676), shyness (P(s) = 0.4055), guilt (P(s) = 0.1616), and surprise (P(s) = 0.0810). It is interesting to note that, in addition to the feelings expected for a sudden death, like sadness and fear, we also see an increase in guilt. This may be explained by the fact that many speculated about who or what killed Michael Jackson, and fans and critics blamed the high stress caused by paparazzi and the media for the celebrity's death. Therefore, some Twitter users felt guilty about his death and expressed this feeling in their tweets.
The next event we analyze is Susan-Boyle, whose appearance as a contestant on the TV show Britain's Got Talent had an incredible repercussion in the media. Global interest was triggered by the contrast between her powerful voice singing "I Dreamed a Dream" from the musical Les Misérables and her plain appearance on stage.

Figure 1. Events and the feelings associated with them using PANAS-t: (a) H1N1; (b) AirFrance; (c) US-Elec; (d) Obama; (e) MJ-death; (f) Susan-Boyle; (g) Harry-Potter; (h) Olympics-begin; (i) Olympics-end.

The contrast of the audience's
first impression of her with the standing ovation she received during and after her performance led to an immediate viral spread over social networks and huge attention from the global media. Figure 1(f) shows that the sentiments expressed in Twitter around Susan Boyle's first appearance are surprise (P(s) = 0.9066), followed by self-assurance (P(s) = 0.4751) and guilt (P(s) = 0.1367). The high surprise factor could also explain why Susan Boyle's video went viral on the Internet. People also felt self-assured, as it is encouraging to see a woman successfully facing an audience that was laughing at her. Finally, guilt is also expected, as the event exposed a misplaced prejudice based on appearance.
The last two charts shown in the figure relate to the Olympics games held in the summer of 2008 in Beijing, China. For this event, we show two Kiviat charts: one drawn from the sentiments at the beginning of the event and the other from the sentiments at its end. Figure 1(h) is based on sentiments from the day of the opening ceremony on August 8, when people felt surprise (P(s) = 0.7024), attentiveness (P(s) = 0.4621), and joviality (P(s) = 0.3298). By the end of the event, on August 24th, these feelings had decreased, whereas sadness increased from P(s) = 0.1222 on that day to P(s) = 0.5245 on the next day, as shown in Figure 1(i).
The seventh event we studied is Harry-Potter, the release of the movie "Harry Potter and the Half-Blood Prince". Figure 1(g) shows that the main feelings associated with it are joviality (P(s) = 0.6355), surprise (P(s) = 0.4926), and sadness (P(s) = 0.2056), which matches descriptions by critics who said the movie would leave the audience "pleased, amused, excited, scared, infuriated, delighted, sad, surprised, and thoughtful."
B. Testing across different geographical regions
In order to evaluate whether PANAS-t can effectively capture subtle sentiment differences across geographical areas, we take the example of the popular H1N1 event and examine how sentiments about the event fluctuated over time in two different regions: the USA and Europe.
To give further context on the H1N1 event, we start by describing its impact on society. The H1N1 influenza, also known to the public as the "swine flu", killed as many as half a million people in 2009. The World Health Organization (WHO) declared it the first global pandemic since the 1968 Hong Kong flu, which caused great concern among the world population. Later, WHO issued several warnings and precautions to be taken by governments and the public, putting the entire population in a state of worldwide alert against the disease.
In this section we compare the fluctuations in users' moods about H1N1 in two different locations. More specifically, we want to verify how US and European Twitter users felt about the event and quantify the differences in public mood across geographic regions.

In examining the difference in sentiments across North America and Europe, we focus only on English tweets. Sentiments in Europe are therefore limited to tweets originating from Europe and written in English. To be consistent in language representativeness, we limited our focus to tweets originating from the following regions in Europe: Ireland, the Kingdom of the Netherlands, Malta, and the United Kingdom. To do this we used a database collected and used in [24], in which the authors used a large Twitter dataset to map unique user ids to locations.
The sparkline charts in Figures 2(a) and 2(b) present the fluctuations of four of the major sentiments related to the event on the PANAS-t scales for Europe and the USA, respectively. The charts are marked with five dates that indicate days of important announcements made by WHO. In March, Mexican authorities began picking up cases of what WHO called an "influenza-like illness." This event led European users to show an increase in the feeling of surprise (P(s) = 0.8730), but the same did not happen with users in the US.
In April, the first case of H1N1 in the United States was confirmed and WHO issued a health advisory on the outbreak of "influenza-like illness in the United States and Mexico"; the charts show a similar increase in fear in both locations, P(s) = 0.7401 for Europe and P(s) = 0.6154 for the US. We also see an increase in attentiveness, but this trend holds only for Europe (P(s) = 0.4423).
In June, WHO declared the new strain of swine-origin H1N1 a pandemic, causing an increase in fear (P(s) = 0.5385) as well as in attentiveness, both in the US (P(s) = 0.3491) and among users from Europe (P(s) = 0.3174). In July, 26,089 new cases of H1N1 were confirmed in Europe by WHO, which led to a further increase in fear (P(s) = 0.4887), mainly among European users.
On the last marked date, in August, WHO announced the most affected countries and deaths, located mainly in Europe and the Americas [19]. In this period, European users showed an increase in hostility (P(s) = 0.2542), whereas users in the US showed an increase in fear (P(s) = 0.4112). These variations in the degree of sentiments expressed over time show that PANAS-t can effectively capture the dynamics of people's moods across different geographical regions.
Figure 2. Public mood for H1N1 over 2009 in (a) Europe and (b) the US.

C. Testing across different time periods

The baseline values computed for PANAS-t in Table II are based on longitudinal data, 3.5 years' worth of tweets posted between 2006 and mid-2009, and represent a rather stable
base sentiment of Twitter users. Therefore, these baseline values can be used to detect the feelings of Twitter users in much later time periods (beyond mid-2009). Here, we use a different Twitter dataset, collected by [26], that contains tweets posted between the end of 2009 and the end of 2010, and extract the tweets associated with the last two events in Table IV: Samoa and Haiti.
The 2009 Samoa Islands tsunami was caused by a submarine earthquake of magnitude 8.1 that took place in the Samoan Islands on September 29th, the largest earthquake of 2009. The tsunami it generated caused substantial damage and loss of life in Samoa, American Samoa, and Tonga. More than 189 people were killed, including children, which caused a great commotion around the world and generated a state of alert in neighboring coastal countries [18]. Figure 3(a) shows the Kiviat chart for the mood of users on the day of the tsunami and the day after, with dominant feelings of fear (P(s) = 0.9280), attentiveness (P(s) = 0.9932), hostility (P(s) = 0.8451), surprise (P(s) = 0.6528), and sadness (P(s) = 0.6483).
A similar tragic event happened three months later in another part of the world. The 2010 Haiti earthquake was a catastrophic natural disaster that caused severe damage in Port-au-Prince and the nearby region, killing at least 250,000 people. Figure 3(b) shows that feelings of hostility (P(s) = 0.9280), attentiveness (P(s) = 0.3678), surprise (P(s) = 0.4576), and sadness (P(s) = 0.3975) increased. We also see an increase in shyness and guilt. After this event the world's eyes were focused on the disaster and people around the world offered help to Haiti [20]. As the poverty and precarious situation of the Haitian people was unveiled in the news, it is possible that this situation generated the increase in these two feelings among Twitter users. This finding demonstrates that PANAS-t is stable and can effectively represent the sentiments of tweets gathered much later in time.

Figure 3. Feelings expressed by Twitter users for (a) the tsunami in the Samoa Islands and (b) the earthquake in Haiti.
VI. CONCLUSIONS
In this paper, we presented PANAS-t, an eleven-sentiment psychometric scale adapted to the context of Twitter. PANAS-t is based on the expanded version of the well-known Positive and Negative Affect Schedule (PANAS-x). Using empirical data from a unique Twitter dataset containing 1.8 billion tweets, we computed normalization scores for each sentiment. We conducted a three-step evaluation. We first applied PANAS-t to 10 notable events that were widely discussed in Twitter. We next examined whether PANAS-t captures sentiment differences across geographical regions. We finally showed that our method can be applied to a different dataset covering a later time period. These results provide strong evidence that PANAS-t can accurately capture positive and negative sentiments about events in Twitter.
The normalized sentiment scores provided in this paper allow anyone to easily use PANAS-t, making it simple and practical to apply to large amounts of data and even to real-time analysis. We hope that this psychometric scale will be used by researchers to create tools for government agencies or companies interested in improving their products using social networks. From the research perspective, our method allows one to comprehend how, when, and why individuals' feelings fluctuate according to social and economic events.
Despite the new opportunities our work brings, there are several limitations. First, the tweets we examined do not represent everyone who expressed sentiments in Twitter. We only focused on tweets that explicitly contained "I am feeling"-style expressions, although other tweets may contain emotions as well. Indeed, separating emotional content from informational content remains an important challenge in social media analysis.
Second, one criticism of sentiment analysis is that it takes a
naive view of emotional states, assuming that personal moods
can simply be divined from word selection. This might seem
particularly perilous on a medium like Twitter, where sarcasm
and other playful uses of language may subvert the surface
meaning of a tweet. Deeper linguistic analysis should be
explored to provide “a richer and a more nuanced view” of
how people present themselves to the world.
We expect that in the future more applications will utilize sentiment analysis for specific vocabularies, especially in a dynamic environment like Twitter, to understand people's moods. Thus, we plan to combine other techniques, such as machine learning, to dynamically incorporate sentiments into PANAS-t according to context.
REFERENCES
[1] E. M. Airoldi, X. Bai, R. Padman, Markov blankets and meta-heuristic
search: Sentiment extraction from unstructured text, Lecture Notes in
Computer Science 3932 (Advances in Web Mining and Web Usage
Analysis) (2006) 167–187.
[2] S. Asur, B. A. Huberman, Predicting the future with social media, CoRR
abs/1003.5699.
[3] A. Aue, M. Gamon, Customizing sentiment classifiers to new domains:
A case study, in: Proceedings of Recent Advances in Natural Language
Processing (RANLP), 2005.
[4] C. Bates, How michael jackson’s death shut down twitter, brought chaos
to google...and ’killed off’ jeff goldblum, http://bit.ly/16e6eM, accessed
January, 2012.
[5] F. Benevenuto, G. Magno, T. Rodrigues, V. Almeida, Detecting spammers on twitter, in: Proceedings of the Annual Collaboration, Electronic
messaging, Anti-Abuse and Spam Conference (CEAS), 2010.
[6] A. Bermingham, A. F. Smeaton, Classifying sentiment in microblogs:
is brevity an advantage?, in: Proceedings of the 19th ACM international
conference on Information and knowledge management, 2010.
[7] J. Bollen, A. Pepe, H. Mao, Modeling public mood and emotion:
Twitter sentiment and socio-economic phenomena, in: Proceedings of
the International AAAI Conference on Weblogs and Social Media
(ICWSM), 2011.
[8] M. Cha, H. Haddadi, F. Benevenuto, K. P. Gummadi, Measuring User
Influence in Twitter: The Million Follower Fallacy, in: Proceedings
of the International AAAI Conference on Weblogs and Social Media
(ICWSM), 2010.
[9] M. Cheong, V. C. Lee, A microblogging-based approach to terrorism
informatics: Exploration and chronicling civilian sentiment and response
to terrorism events via twitter, Information Systems Frontiers 13 (2011)
45–59.
[10] P. Chesley, B. Vincent, L. Xu, R. Srihari, Using verbs and adjectives to
automatically classify blog sentiment, in: AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006.
[11] S. Chhabra, A. Aggarwal, F. Benevenuto, P. Kumaraguru, Phi.sh/$ocial:
The phishing landscape through short urls, in: Proceedings of the 8th
Annual Collaboration, Electronic messaging, Anti-Abuse and Spam
Conference (CEAS), 2011.
[12] N. A. Diakopoulos, D. A. Shamma, Characterizing debate performance
via aggregated twitter sentiment, in: Proceedings of the 28th international conference on Human factors in computing systems, 2010.
[13] P. Dodds, C. Danforth, Measuring the happiness of large-scale written
expression: Songs, blogs, and presidents, Journal of Happiness Studies
11 (2010) 441–456.
[14] S. A. Golder, M. W. Macy, Diurnal and seasonal mood vary with work,
sleep, and daylength across diverse cultures, Science 333 (6051) (2011)
1878–1881.
[15] J. Gomide, A. Veloso, W. M. Jr., V. Almeida, F. Benevenuto, F. Ferraz,
M. Teixeira, Dengue surveillance based on a computational model of
spatio-temporal locality of twitter, in: Proceedings of the ACM Web
Science Conference (WebSci), 2011.
[16] P. H. C. Guerra, A. Veloso, W. Meira, Jr, V. Almeida, From bias to
opinion: a transfer-learning approach to real-time sentiment analysis, in:
Proceedings of the ACM SIGKDD Conference on Knowledge Discovery
and Data Mining (KDD), 2011.
[17] HCD, Confidence levels increase among democrats and independents, decrease among republicans after viewing obama’s press conference, http://www.hcdi.net/news/MediacurvesRelease.cfm?M=276, accessed January 15, 2012.
[18] Fear for new tsunami after earthquake hits sumatra, http://digitaljournal.com/article/279868.
[19] Map of affected countries and deaths as of 23 august 2009, http://www.who.int/csr/don/2009_08_28/en/index.html.
[20] Operation compassion responds to haitian earthquake victims, http://www.operationcompassion.org/2010/01/operation-compassion-responds-to-haitian-earthquake-victims/.
[21] US confirms it asked Twitter to stay open to help Iran protesters, http://tinyurl.com/klv36p.
[22] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, 1st ed., John Wiley and Sons, INC, 1991.
[23] B. E. Kim, S. Gilbert, Detecting sadness in 140 characters: Sentiment
analysis and mourning michael jackson on twitter, Web Ecology 03
(2009) 1–15.
[24] J. Kulshrestha, F. Kooti, A. Nikravesh, K. P. Gummadi, Geographic
dissection of the twitter network, in: Proceedings of the International
AAAI Conference on Weblogs and Social Media (ICWSM), 2012.
[25] D. Lazer, A. Pentland, L. Adamic, S. Aral, A. lászló Barabási, D. Brewer,
N. Christakis, N. Contractor, J. Fowler, M. Gutmann, T. Jebara, G. King,
M. Macy, D. Roy, M. V. Alstyne, Computational social science, Science
323 (5915) (2009) 721–723.
[26] P. J. Michael, M. Dredze, You are what you tweet: Analyzing twitter
for public health, in: Proceedings of the International AAAI Conference
on Weblogs and Social Media (ICWSM), 2011.
[27] T. Miyoshi, Y. Nakagami, Sentiment classification of customer reviews
on electric products, International Symposium in Information Technology (ITSim) (2007) 2028–2033.
[28] B. O’Connor, R. Balasubramanyan, B. R. Routledge, N. A. Smith, From
tweets to polls: Linking text sentiment to public opinion time series,
in: Proceedings of the International AAAI Conference on Weblogs and
Social Media (ICWSM), 2010.
[29] A. Pak, P. Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, in: N. Calzolari (Conference Chair), K. Choukri, B. Maegaard, J. Mariani,
J. Odijk, S. Piperidis, M. Rosner, D. Tapias (eds.), Proceedings of the
Seventh International Conference on Language Resources and Evaluation (LREC), European Language Resources Association (ELRA), 2010.
[30] B. Pang, L. Lee, Opinion mining and sentiment analysis, Foundations
and Trends in Information Retrieval 2 (2008) 1–135.
[31] J. W. Pennebaker, M. R. Mehl, K. G. Niederhoffer, Psychological
aspects of natural language use: Our words, ourselves, Annual Review
of Psychology 54 (2003) 547–577.
[32] T. Sakaki, M. Okazaki, Y. Matsuo, Earthquake shakes twitter users:
real-time event detection by social sensors, in: Proceedings of the
International Conference on World Wide Web (WWW), 2010.
[33] A. Tumasjan, T. O. Sprenger, P. G. Sandner, I. M. Welpe, Predicting
elections with twitter: What 140 characters reveal about political sentiment, in: Proceedings of the International AAAI Conference on Weblogs
and Social Media (ICWSM), 2010.
[34] D. Watson, L. A. Clark, The PANAS-X: Manual for the positive and
negative affect schedule-Expanded Form, University of Iowa, 1994.
[35] D. Watson, L. A. Clark, A. Tellegen, Development and validation of brief
measures of positive and negative affect: the PANAS scales, Journal of
Personality and Social Psychology 54 (1988) 1063–1070.
[36] K. Wickre, Celebrating twitter7, http://blog.twitter.com/2013/03/
celebrating-twitter7.html, accessed on May 16, 2013.