Socio-Ethnic Ingredients of Social Network Communities


Poster Presentation CSCW 2017, February 25–March 1, 2017, Portland, OR, USA


Tushar Maheshwari
IIIT-Sri City, Chittoor
Sri City, AP, India, 517588
tushar.m14@iiits.in

Aishwarya N. Reganti
IIIT-Sri City, Chittoor
Sri City, AP, India, 517588
aishwarya.r14@iiits.in

Tanmoy Chakraborthy
University of Maryland
College Park, USA, 20742
tanchak@umiacs.umd.edu

Amitava Das
IIIT-Sri City, Chittoor
Sri City, AP, India, 517588
amitava.das@iiits.in

Abstract
In network science, a community is considered to be a group of nodes densely connected internally and sparsely connected externally. Detecting and analyzing communities from social networks has attracted immense attention over the last decade. However, the semantic interpretation of a community is hardly studied. In this paper, we attempt to understand whether individuals in a community possess similar personalities, values and ethical background. To this end, we collect a Twitter Values corpus, extract the network communities and propose automatic models to determine the personality and values, considered here as the ethnicity, of individuals. Various experiments are performed to understand the characteristics or blend of characteristics of individuals within a community.

Author Keywords
Personality; Values; Social community; Twitter.

ACM Classification Keywords
E.1 [Data Structures]: Graphs and Networks

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
CSCW ’17 Companion, February 25 - March 01, 2017, Portland, OR, USA
ACM 978-1-4503-4688-7/17/02.
http://dx.doi.org/10.1145/3022198.3026322

Introduction
Detecting and analyzing dense groups or communities from social and information networks has attracted immense attention over the last decade [3]. Since community detection is an ill-defined problem, several heuristics were proposed to detect communities from the network structure [2]. However, the semantic interpretation of a community, i.e., the behavior of individuals in a community, has hardly been addressed. This paper presents a psycholinguistic study to understand the behavior of individuals forming communities in social networks. We use two psychological models to identify the behavior of individuals thoroughly. The Personality model (Table 2) is used to understand the characteristics or blend of characteristics at the individual level, whereas the values and ethics (Table 1) models are used to understand and analyze interpersonal dynamics of societal sentiment.

Table 1: Values Description.
Values Type            Schwartz Values
Achievement (AC)       The value here comes from setting goals and then achieving them
Benevolence (BE)       They are very philanthropic, they seek to help others and provide general welfare
Conformity (CO)        This category of people obey clear rules and structures
Hedonism (HE)          Hedonists are those who simply enjoy themselves
Power (PO)             The ability to control others
Security (SE)          Those who seek security value health and safety to a greater degree
Self-direction (SD)    Individuals who are self-directed enjoy being independent and are outside the control of others
Stimulation (ST)       It is closely related to hedonism, nevertheless the goals are slightly different
Tradition (TR)         A traditionalist respects practices of the past, doing things blindly because they are customary
Universalism (UN)      Individuals who are universal seek social justice and tolerance for all

Table 2: Big-5 Personality Traits.
Personality Type         Description
Openness (O)             imaginative and insightful and have wide interests
Conscientiousness (C)    organized, thorough, and planned
Extroversion (E)         talkative, energetic, and assertive
Agreeableness (A)        sympathetic, kind, and affectionate
Neuroticism (N)          tense, moody, and anxious

Corpus
To start with, we asked a very fundamental question: whether social media is a good proxy of the original society or not. We grounded our corpus collection on the conclusion drawn by [1] that people, in general, do not use virtually desired/bluffed social media profiles to promote an idealized virtual identity.

Facebook Personality Corpus
In recent years, there has been a lot of research activity on automated identification of various personality traits of an individual from their language usage and behaviour in social media. One milestone in this area is the “Workshop and Shared Task on Computational Personality Recognition” (WCPR) in 2013. They released a Facebook corpus, which consists of 10,000 Facebook status updates of 250 users and their Facebook network properties, labeled with personality traits. Personality and Schwartz Values models support fuzzy membership, which means that anyone having an Open personality can have an Agreeable nature as well, and similarly that someone with Power orientation can also have Achievement orientation. To understand this notion, the fuzzy membership statistics in the Facebook Personality corpus are reported in Figure 1. On a careful analysis one can clearly observe how each Personality trait overlaps with other Personality traits. It can be inferred from the visualisation that very few extroverts have a neurotic nature, but many of them are positively oriented towards the Conscientiousness (C) trait. This indeed makes sense, as extrovert people are outgoing and would like to mingle in different circles of the society.

Figure 1: Personality Values fuzziness for Facebook corpus using Circos Visualisation.

Twitter Values Corpus
The standard method for any psychological data collection is through self-assessment tests, popularly known as psychometric tests. Self-assessments were obtained using a fifty-item long male/female version of the Portrait Values Questionnaire (PVQ). We crowd-sourced the data using the Amazon Mechanical Turk (AMT) service. A 50-item PVQ questionnaire was given to people and we requested them to (i) answer the PVQ questions honestly and (ii) provide their Twitter ids so that their tweets could be crawled. However, we faced several


challenges while working with Twitter and therefore a number of iterations, human interventions and personal communications had to be done to resolve all these issues. In the end, 367 unique users’ data were gathered, containing PVQ answers along with the users’ tweets. The highest number of tweets for a user was quite high (15K) and the lowest number of tweets per user was only 100 (average is 1,608). We also ensured that participants are native English speakers from various cultures and ethnic backgrounds. Schwartz Values fuzziness is shown in Figure 2. In this figure, for example, the fuzzy membership of the ACY oriented people is represented by outgoing red bands. The width of each outgoing band from ACY represents the degree of membership of ACY with other Values classes. Similarly, we can observe that there are 10 incoming bands of 10 different colours towards ACY, indicating the membership of each class in ACY. In each class there is a self-arc which represents membership of each class with itself (i.e., 100%). Such overlapping nature of psychological classes makes the computational classification problem much more challenging than the classical sentiment analysis problem.

Figure 2: Schwartz Values fuzziness for Twitter Values corpus using Circos Visualisation. The intricate structure of the Circos figure rightly signifies how Values are strongly connected with each other at societal level.

Twitter Community Corpus
We use the Twitter network, released by SNAP¹ (nodes: 81,306, edges: 1,768,149). Users are distinguished by their Twitter ids. This dataset has been widely used in community detection studies [5]. We further enriched the dataset by crawling the tweets of each user, required for our Personality and Values models. The original dataset had 18,021 users and 5,038 communities. We considered 1,562 ground-truth communities, after discarding all the communities having fewer than 5 constituent members. We further discarded all the non-existent accounts and those users who had fewer than 100 tweets. In total, we downloaded the tweet data of 6,768 users. The highest (resp. lowest) number of tweets for a user was 3,641 (resp. 100), with an average number of tweets per user of 2,406. Note that we use the Facebook Personality corpus and the Twitter Values corpus for designing the Personality and Values models, respectively, and use them to analyze the community structure present in the Twitter community dataset.

Personality and Values Models
The state-of-the-art sentiment analysis systems analyze any fragment of text in isolation. However, in order to design the Big Five personality model and Schwartz model classifiers, psycholinguistic analysis is required. Here, we discuss various features and methods used for the automatic Personality and Values identification. Our models are inspired by the research reports published in the Workshop and Shared Task on Computational Personality Recognition [4]. We experiment with several machine learning algorithms such as Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Simple Logistic Regression (LR), and Random Forest (RF). Among them SVM (with linear kernel) turns out to be the best method.

Linguistic Features: We use three different psycholinguistic lexicon features: Linguistic Inquiry Word Count (LIWC²), Harvard General Inquirer³ and the MRC⁴ psycholinguistic database. In addition, the Sensorial lexicon Sensicon⁵ was used.

Non-Linguistic Features: Facebook network properties, including network size, betweenness centrality, density and transitivity, are provided as a part of the released Facebook personality corpus [4]. For the Twitter Values corpus, the total number of tweets or messages of one user, the total number of likes, the average time difference between two tweets/messages, the total number of favourites and re-tweets on all the tweets/messages by one user, and their in-degree and out-degree centrality scores on the network of friends and followers are used as features. The Personality and Values models achieve average F-scores of 0.78 and 0.70 respectively with SVM. Class-wise F-scores for both systems are reported in Figure 3.

Figure 3: Performance of SVM for (a) Personality (b) Values models. [Plots of per-class F-Score on a 0 to 1 axis: panel (a) over O, C, E, A, N; panel (b) over AC, BE, CO, HE, PO, SE, SD, ST, TR, UN.]

¹ http://snap.stanford.edu/
² http://www.liwc.net/
³ http://www.wjh.harvard.edu/~inquirer/
⁴ http://www.psych.rl.ac.uk/
⁵ https://hlt-nlp.fbk.eu/technologies/sensicon

Improving Community Detection
So far, we have analyzed the Values and Personality of individuals. At this stage we are interested in applying such psycholinguistic models to some real-world problems. We utilize our models in the community detection problem, where the research question is: given a network of individuals with their Values and Personality traits provided a priori, can we discover a more accurate community structure compared to the one obtained using only network information? Traditionally, when network communities need to be detected, there are two possible sources of information one can use: the network structure, and the attributes of nodes. Typically the existing algorithms have only focused on the network information, and tend to ignore the node attributes and the overall textual information. Here, we use the state-of-the-art algorithm CESNA [5], which considers both the network structure and the node attributes (Personality and Values in our case) to detect communities. CESNA allows us to control the weights of network and node features. We consider the Twitter network and three sets of features: the network information, the Personality features and the Values features. We run CESNA with the Values and Personality feature sets separately along with network information. We feed the results obtained from our SVM-based models directly into the feature vector. We report the accuracy of CESNA for the individual feature settings separately in Table 3. Finally, we combine all these features together, with equal weights given to node attributes and network information, and observe that the accuracy is improved due to the addition of Personality and Values features. This indicates that appropriate additional information related to nodes can significantly aid the performance of community detection.

Table 3: Results of CESNA for different feature sets. Overall, we observe that considering the node attributes along with the network features always improves the performance as opposed to considering only the network information. When all the features are considered together, CESNA achieves 7%, 11.41%, 9.23% and 9.75% performance gain in terms of the standard community evaluation metrics NMI, ARI, PU and F-score respectively, compared to the case with only the network information.
Sl. No   Feature                     NMI    ARI    PU     F-score
(i)      Network information         0.57   0.61   0.65   0.41
(ii)     (i) + Values feature        0.57   0.61   0.66   0.42
(iii)    (i) + Personality feature   0.59   0.64   0.69   0.44
(iv)     All                         0.61   0.68   0.71   0.45

Conclusion
This work unfolds the semantic interpretation of communities present in social networks in terms of the Personality and Values of individuals. We also showed how it can be leveraged to detect communities more accurately. Our future direction would be to examine the demographic psycholinguistic variance of social network communities.

References
[1] M. D. Back, J. M. Stopfer, S. Vazire, S. Gaddis, S. C. Schmukle, B. Egloff, and S. D. Gosling. 2010. Facebook profiles reflect actual personality, not self-idealization. Psychological Science 21 (2010), 372–374.
[2] Tanmoy Chakraborty, Ayushi Dalmia, Animesh Mukherjee, and Niloy Ganguly. 2016. Metrics for Community Analysis: A Survey. CoRR abs/1604.03512 (2016).
[3] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3-5 (2010), 75–174.
[4] Dejan Markovikj, Sonja Gievska, Michal Kosinski, and David Stillwell. 2013. Mining facebook data for predictive personality modeling. In ICWSM 2013, Boston, MA, USA.
[5] Jaewon Yang, Julian J. McAuley, and Jure Leskovec. 2014. Community Detection in Networks with Node Attributes. CoRR abs/1401.7267 (2014).
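As an illustration of the fuzzy membership statistics discussed for the Facebook Personality corpus, the sketch below computes, for every ordered pair of traits, the share of users carrying the first trait who also carry the second. It is a minimal sketch and not the procedure used to draw the Circos figures: the function name `trait_overlap` and the toy labels are hypothetical, and the only assumption is that each user is annotated with a set of trait codes.

```python
from collections import defaultdict


def trait_overlap(user_traits):
    """Return {(a, b): fraction of users with trait a who also have trait b}.

    user_traits maps a user id to the set of trait codes assigned to that user.
    The pair (a, a) always gets 1.0, which corresponds to the self-arc.
    """
    counts = defaultdict(int)  # users carrying each trait
    joint = defaultdict(int)   # users carrying both traits of an ordered pair
    for traits in user_traits.values():
        for a in traits:
            counts[a] += 1
            for b in traits:
                joint[(a, b)] += 1
    return {(a, b): n / counts[a] for (a, b), n in joint.items()}


# Hypothetical toy labels, not taken from the corpus.
toy_users = {
    "u1": {"O", "A"},
    "u2": {"E", "C"},
    "u3": {"E", "C", "O"},
    "u4": {"N"},
}
overlap = trait_overlap(toy_users)
print(overlap[("E", "C")])           # share of extroverts who are also conscientious
print(overlap.get(("E", "N"), 0.0))  # pairs that never co-occur are absent from the dict
```

The matrix is asymmetric by construction, since the share of Open users who are Extroverted need not equal the share of Extroverted users who are Open.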
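The filtering applied to the Twitter community dataset can be sketched as follows. The thresholds (at least 5 members per community, at least 100 tweets per user) come from the text; the function name, the data layout and the toy values are assumptions made for illustration, and a user missing from the tweet-count table is treated as a non-existent account.

```python
def filter_communities(communities, tweet_counts, min_members=5, min_tweets=100):
    """Apply the two filtering steps in the order described in the text.

    communities  : dict mapping a community id to a set of user ids
    tweet_counts : dict mapping a user id to the number of crawled tweets;
                   users absent from this dict are treated as non-existent
    """
    # Step 1: discard communities with fewer than min_members members.
    kept = {
        cid: members
        for cid, members in communities.items()
        if len(members) >= min_members
    }
    # Step 2: discard non-existent accounts and users with too few tweets.
    return {
        cid: {user for user in members if tweet_counts.get(user, 0) >= min_tweets}
        for cid, members in kept.items()
    }


# Hypothetical toy data, not taken from the SNAP dataset.
toy_communities = {
    "c1": {"a", "b", "c", "d", "e"},
    "c2": {"x", "y"},
}
toy_tweet_counts = {"a": 150, "b": 99, "c": 100, "d": 3000, "x": 500, "y": 500}

filtered = filter_communities(toy_communities, toy_tweet_counts)
print(filtered)
```

In this sketch a community is kept even if the second step leaves it with fewer than five users; whether such communities were re-filtered is not stated in the text.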
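The class-wise and average F-scores reported for the Personality and Values models can be computed along the following lines when each user may carry several labels. This is a hedged sketch: it assumes a one-vs-rest F1 per class and an unweighted (macro) average, which the text does not specify, and the function name and toy labels are hypothetical.

```python
def per_class_f1(gold, predicted, classes):
    """One-vs-rest F1 for each class over parallel lists of label sets."""
    scores = {}
    for c in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if c in g and c in p)      # true positives
        fp = sum(1 for g, p in pairs if c not in g and c in p)  # false positives
        fn = sum(1 for g, p in pairs if c in g and c not in p)  # false negatives
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy labels, not taken from either corpus.
gold = [{"O", "E"}, {"E"}, {"N"}, {"O"}]
predicted = [{"O"}, {"E", "N"}, {"N"}, {"E"}]

scores = per_class_f1(gold, predicted, ["O", "E", "N"])
macro_f1 = sum(scores.values()) / len(scores)
print(scores, macro_f1)
```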
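The performance gains quoted for Table 3 are relative improvements of the "All" setting over the network-only baseline. The sketch below recomputes them from the two-decimal table entries, so the last digits may differ slightly from the reported figures, which were presumably derived from unrounded scores. The dictionary layout and function name are illustrative assumptions.

```python
# Rows (i) and (iv) of Table 3, as printed (rounded to two decimals).
TABLE3 = {
    "network": {"NMI": 0.57, "ARI": 0.61, "PU": 0.65, "F-score": 0.41},
    "all": {"NMI": 0.61, "ARI": 0.68, "PU": 0.71, "F-score": 0.45},
}


def relative_gain(baseline, improved):
    """Percentage gain of each metric relative to the baseline value."""
    return {
        metric: 100.0 * (improved[metric] - baseline[metric]) / baseline[metric]
        for metric in baseline
    }


gains = relative_gain(TABLE3["network"], TABLE3["all"])
for metric, gain in gains.items():
    print(f"{metric}: {gain:.2f}%")
```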

