Socio-Ethnic Ingredients of Social Network Communities
Author Keywords
Personality; Values; Social community; Twitter.

ACM Classification Keywords
E.1 [Data Structures]: Graphs and Networks

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Copyright is held by the owner/author(s).
CSCW ’17 Companion, February 25 - March 01, 2017, Portland, OR, USA
ACM 978-1-4503-4688-7/17/02.
http://dx.doi.org/10.1145/3022198.3026322

Introduction
Detecting and analyzing dense groups or communities from social and information networks has attracted immense attention over the last decade [3]. Since commu-
Poster Presentation CSCW 2017, February 25–March 1, 2017, Portland, OR, USA
challenges while working with Twitter, and therefore a number of iterations, human interventions and personal communications were needed to resolve all these issues. In the end, data for 367 unique users were gathered, containing PVQ answers along with the users' tweets. The highest number of tweets for a user was quite high (15K) and the lowest number of tweets per user was only 100 (the average is 1,608). We also ensured that participants were native English speakers from various cultures and ethnic backgrounds. Schwartz Values fuzziness is shown in Figure 2. In this figure, for example, the fuzzy membership of the ACY-oriented people is represented by outgoing red bands. The width of each outgoing band from ACY represents the degree of membership of ACY with other Values classes. Similarly, we can observe that there are 10 incoming bands of 10 different colours towards ACY, indicating the membership of each class in ACY. In each class there is a self-arc which represents the membership of each class with itself (i.e., 100%). Such overlapping nature of psychological classes makes the computational classification problem much more challenging than the classical sentiment analysis problem.

Figure 2: Schwartz Values fuzziness for Twitter Values corpus using Circos Visualisation. The intricate structure of the Circos figure rightly signifies how Values are strongly connected with each other at societal level.

Twitter Community Corpus
We use the Twitter network released by SNAP¹ (nodes: 81,306, edges: 1,768,149). Users are distinguished by their Twitter ids. This dataset has been widely used in community detection studies [5]. We further enriched the dataset by crawling the tweets of each user, which are required for our Personality and Values models. The original dataset had 18,021 users and 5,038 communities. We considered 1,562 ground-truth communities, after discarding all the communities having fewer than 5 constituent members. We further discarded all the non-existent accounts and those users who had fewer than 100 tweets. In total, we downloaded tweet data for 6,768 users. The highest (resp. lowest) number of tweets for a user was 3,641 (resp. 100), with an average of 2,406 tweets per user. Note that we use the Facebook Personality corpus and the Twitter Values corpus for designing the Personality and Values models, respectively, and use them to analyze the community structure present in the Twitter community dataset.

Personality and Values Models
State-of-the-art sentiment analysis systems analyze any fragment of text in isolation. However, in order to design the Big Five personality model and the Schwartz model classifier, psycholinguistic analysis is required. Here, we discuss various features and methods used for automatic Personality and Values identification. Our models are inspired by the research reports published in the Workshop and Shared Task on Computational Personality Recognition [4]. We experiment with several machine learning algorithms such as Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Simple Logistic Regression (LR), and Random Forest (RF). Among them, SVM (with linear kernel) turns out to be the best method.

Linguistic Features: We use three different psycholinguistic lexicon features: Linguistic Inquiry and Word Count (LIWC²), Harvard General Inquirer³ and the MRC⁴ psycholinguistic database. In addition, the sensorial lexicon Sensicon⁵ was used.

Non-Linguistic Features: Facebook network properties, including network size, betweenness centrality, density and transitivity, are provided as a part of the released Facebook personality corpus [4]. For the Twitter Values corpus, the following are used as features: the total number of tweets/messages of one user, the total number of likes, the average time difference between two tweets/messages, the total number of favourites and re-tweets on all the tweets/messages by one user, and the user's in-degree and out-degree centrality scores on the network of friends and followers. The Personality and Values models achieve average F-scores of 0.78 and 0.70, respectively, with SVM. Class-wise F-scores for both systems are reported in Figure 3.

Figure 3: Performance of SVM for (a) Personality (b) Values models. [Bar charts of F-Score (0 to 1) per class; panel (a): O, C, E, A, N; panel (b): AC, BE, CO, HE, PO, SE, SD, ST, TR, UN.]

¹ http://snap.stanford.edu/
² http://www.liwc.net/
³ http://www.wjh.harvard.edu/~inquirer/
⁴ http://www.psych.rl.ac.uk/
⁵ https://hlt-nlp.fbk.eu/technologies/sensicon

Improving Community Detection
So far, we have analyzed the Values and Personality of individuals. At this stage, we are interested in applying such psycholinguistic models to real-world problems. We utilize our models in the community detection problem, where the research question is: given a network of individuals with their Values and Personality traits provided a priori, can we discover a more accurate community structure compared to the one obtained using only network information? Traditionally, when network communities need to be detected, there are two possible sources of information one can use: the network structure, and the attributes of nodes. Existing algorithms have typically focused only on the network information, and tend to ignore the node attributes and the overall textual information. Here, we use the state-of-the-art algorithm CESNA [5], which considers both the network structure and the node attributes (Personality and Values in our case) to detect communities. CESNA allows us to control the weights of network and node features. We consider the Twitter network and three sets of features: the network information, the Personality features and the Values features. We run CESNA with the Values and Personality feature sets separately, along with network information. We feed the results obtained from our SVM-based models directly into the feature vector. We report the accuracy of CESNA for the individual feature settings separately in Table 3. Finally, we combine all these features together, with equal weights given to node attributes and network information, and observe that the accuracy is improved due to the addition of Personality and Values features. This indicates that appropriate additional information related to nodes can significantly aid the performance of community detection.

Sl. No  Feature                    NMI   ARI   PU    F-score
(i)     Network information        0.57  0.61  0.65  0.41
(ii)    (i) + Values feature       0.57  0.61  0.66  0.42
(iii)   (i) + Personality feature  0.59  0.64  0.69  0.44
(iv)    All                        0.61  0.68  0.71  0.45

Table 3: Results of CESNA for different feature sets. Overall, we observe that considering the node attributes along with the network features always improves the performance as opposed to considering only the network information. When all the features are considered together, CESNA achieves 7%, 11.41%, 9.23% and 9.75% performance gain in terms of the standard community evaluation metrics NMI, ARI, PU and F-score, respectively, compared to the case with only the network information.

Conclusion
This work unfolds a semantic interpretation of communities present in social networks in terms of the Personality and Values of individuals. We also showed how this interpretation can be leveraged to detect communities more accurately. Our future direction would be to examine the demographic psycholinguistic variance of social network communities.

References
[1] M. D. Back, J. M. Stopfer, S. Vazire, S. Gaddis, S. C. Schmukle, B. Egloff, and S. D. Gosling. 2010. Facebook profiles reflect actual personality, not self-idealization. Psychological Science 21 (2010), 372–374.
[2] Tanmoy Chakraborty, Ayushi Dalmia, Animesh Mukherjee, and Niloy Ganguly. 2016. Metrics for Community Analysis: A Survey. CoRR abs/1604.03512 (2016).
[3] Santo Fortunato. 2010. Community detection in graphs. Physics Reports 486, 3-5 (2010), 75–174.
[4] Dejan Markovikj, Sonja Gievska, Michal Kosinski, and David Stillwell. 2013. Mining facebook data for predictive personality modeling. In ICWSM 2013, Boston, MA, USA.
[5] Jaewon Yang, Julian J. McAuley, and Jure Leskovec. 2014. Community Detection in Networks with Node Attributes. CoRR abs/1401.7267 (2014).
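
Supplementary sketch (Table 3 gains). The caption of Table 3 reports performance gains of the all-features setting over the network-only setting, but the formula is not stated. The short Python sketch below assumes the gain is the relative improvement, (all - network-only) / network-only, and recomputes it from the rounded values printed in Table 3. The dictionary and function names are illustrative only and do not come from the paper.

```python
# Values copied from Table 3: (network information only, all features).
table3 = {
    "NMI": (0.57, 0.61),
    "ARI": (0.61, 0.68),
    "PU": (0.65, 0.71),
    "F-score": (0.41, 0.45),
}


def relative_gain(baseline, combined):
    """Relative improvement of `combined` over `baseline`, in percent.

    Assumption: this is the gain definition behind the caption; the paper
    does not spell it out.
    """
    return 100.0 * (combined - baseline) / baseline


# Gain per metric, rounded to two decimals for display.
gains = {
    metric: round(relative_gain(base, combined), 2)
    for metric, (base, combined) in table3.items()
}

for metric, gain in gains.items():
    print(f"{metric}: {gain:.2f}%")
```

Under this assumption the recomputed gains are roughly 7.0%, 11.5%, 9.2% and 9.8%, which is close to the reported 7%, 11.41%, 9.23% and 9.75%. The small differences would be consistent with the reported gains having been computed from unrounded scores.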
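
Supplementary sketch (corpus filtering). The Twitter Community Corpus section describes two filtering steps: communities with fewer than 5 members are discarded, and then non-existent accounts and users with fewer than 100 tweets are discarded. The following Python sketch illustrates those two steps on toy data. The data layout (a dict of community id to member set, and a dict of user id to crawled tweet count, where a missing key stands for a non-existent account) and the function name are assumptions made for illustration. Whether communities were re-checked for size after user removal is not stated in the text, so the sketch does not do so.

```python
MIN_COMMUNITY_SIZE = 5  # threshold stated in the text
MIN_TWEETS = 100        # threshold stated in the text


def filter_corpus(communities, tweet_counts):
    """Apply the two filtering steps described in the text.

    communities  : dict mapping community id -> set of user ids (assumed layout)
    tweet_counts : dict mapping user id -> number of crawled tweets; a user
                   missing from this dict is treated as a non-existent account
    Returns (filtered communities, set of retained users).
    """
    # Step 1: drop communities with fewer than MIN_COMMUNITY_SIZE members.
    kept = {
        cid: set(members)
        for cid, members in communities.items()
        if len(members) >= MIN_COMMUNITY_SIZE
    }
    # Step 2: drop non-existent accounts and users with too few tweets.
    users = {
        user
        for members in kept.values()
        for user in members
        if tweet_counts.get(user, 0) >= MIN_TWEETS
    }
    # Restrict each remaining community to the retained users.
    kept = {cid: members & users for cid, members in kept.items()}
    return kept, users


# Toy data, not taken from the corpus.
communities = {
    "c1": {1, 2, 3, 4, 5},
    "c2": {6, 7},                # too small, discarded in step 1
    "c3": {1, 2, 8, 9, 10, 11},
}
tweet_counts = {1: 150, 2: 99, 3: 3641, 4: 100, 5: 2406,
                8: 500, 9: 120, 10: 100}  # user 11 has no account

kept, users = filter_corpus(communities, tweet_counts)
print(sorted(kept), sorted(users))
```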