SNA Project Presentation

Fake News Spreader Detection
Group 9
Group Members:
Anupam Raj (IIT2020034)
Arjun Yadav (IIT2020120)
Kaustubh Kale (IIT2020129)
Satyam Gupta (IIT2020143)
Medha Tiwari (IEC2020063)
INTRODUCTION

Fake news spreader detection is the process of identifying individuals or organizations that intentionally propagate misinformation, disinformation, or propaganda to mislead the public. Fake news has become a prevalent issue in today's society, particularly with the rise of social media and the internet. It can have serious consequences, including the spread of hate speech, the erosion of trust in public institutions, and the undermining of democratic processes.

The detection of fake news spreaders involves several techniques and approaches, including natural language processing, machine learning, social network analysis, and fact-checking. These methods can be used to analyze the content of news articles, social media posts, and other forms of online communication to identify patterns and indicators of fake news spreading behavior.
Fake news and purpose of detecting fake news spreaders

Fake news is false information published by news outlets to mislead consumers.
● Fake news has several significant negative effects on civil society.
● First, people may accept deliberate lies as truths; the likelihood of accepting fake news as true increases after repeated exposure.
● Second, fake news may change the way people respond to legitimate news. When people are inundated with fake news, the line between fake news and real news becomes less clear.
● Finally, the prevalence of fake news has the potential to break the trustworthiness of the entire news ecosystem.
Related Work / Literature Review

● Buda and Bolonyai used an ensemble learning technique to detect fake news spreaders. They used n-grams and statistical features, since content-based detection is faster than context-based or knowledge-based detection.
● Anastasia Giachanou et al. focused on the problem of differentiating between users who tend to share fake news (spreaders) and those who tend to check the factuality of articles (checkers). To this end, they first collected articles that had been manually annotated by experts as fake or fact, and then proposed the CheckerOrSpreader model, which is based on a CNN. CheckerOrSpreader incorporates the linguistic patterns and the personality traits of the users, inferred from their posts, to decide whether a user is a potential spreader or checker.
● In 2002, unigrams and bigrams were used with Naive Bayes, maximum entropy classification, and support vector machines to classify sentiment in movie data.
● In 2004, support vector machines and Naive Bayes were used to determine whether movie reviews were positive or negative.
Related Work / Literature Review (Contd.)

● In 2022, Rath and Salecha used different flavours of GCN with trust-based strategies to detect fake news spreaders.
● They analysed both network structure and node activity to decide which node is more likely to spread a news item, using properties such as trustingness, trustworthiness and believability.
● Out of GCNtop, GCNact, SArandGEtop, SArandGEact, SAtopGEtop, SAtopGEact, SAactGEtop and SAactGEact, SAtopGEtop performed best, with accuracies of 93.7% on F (false news spreading), 83.4% on T (true news spreading) and 61.6% on F ∪ T.
● They also included bot filtration using a bot detection model proposed by Kudugunta and Ferrara, and found that after bot filtration the performance increased by 2.8% for SArandGEact, 1% for SAtopGEact, 4.6% for SAactGEtop and 3.6% for SAactGEact.
● Anastasia Giachanou used the CheckerOrSpreader model, which uses a CNN to classify a user as a potential fact checker or a potential fake news spreader.
Methodology-1: N-grams

1. Experimented with a number of machine learning models based on word n-grams extracted from the text.
2. Specifically, investigated the performance of:
a. regularized logistic regression (LR),
b. random forests (RF),
c. XGBoost classifiers (XGB),
d. linear support vector machines (SVM).
For all four models, we ran an extensive grid search combined with five-fold cross-validation to find the optimal text preparation method, vectorization technique and modeling parameters.
3. We tested the same parameters for:
a. English data
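The grid search with five-fold cross-validation described above could be set up roughly as follows. This is a minimal sketch, not the project's actual code: the toy documents, the choice of `TfidfVectorizer`, and the small parameter grid are all assumptions made for illustration, and only logistic regression is shown (the other three models would be swapped into the `clf` step in the same way).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-in for per-user documents (all tweets of one user joined together).
# Labels: 1 = fake news spreader, 0 = not. Illustrative data only.
fake = [f"shocking secret they hide from you number {i}" for i in range(5)]
real = [f"city council publishes budget report number {i}" for i in range(5)]
docs = fake + real
labels = [1] * 5 + [0] * 5

# Vectorizer and classifier are tuned together in one pipeline.
pipeline = Pipeline([
    ("vec", TfidfVectorizer(lowercase=True)),   # TF-IDF is an assumption
    ("clf", LogisticRegression(max_iter=1000)),
])

# Small hypothetical grid; the toy corpus is too small for min_df values of 3 to 10.
param_grid = {
    "vec__ngram_range": [(1, 1), (2, 2), (1, 2)],  # unigrams, bigrams, both
    "vec__min_df": [1, 2],
    "clf__C": [0.1, 1.0, 10.0],                    # regularization strength
}

# cv=5 gives five-fold cross-validation for every parameter combination.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(docs, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the vectorizer inside the pipeline means the n-gram vocabulary is rebuilt on each training fold, so the held-out fold does not leak into feature selection.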
Contd:
These are the hyperparameters we tried with five-fold cross-validation to find the best set for each of the four models.
Contd:
1. Investigated two text cleaning methods for all models. The first method (M1) removed all non-alphanumeric characters (except #) from the text, while the second method (M2) removed most non-alphanumeric characters (except #) but kept emoticons and emojis. Both methods converted the text to lower case.
2. For the vectorization of the corpus, experimented with a number of parameters: tested different word n-gram ranges (unigrams, bigrams, unigrams and bigrams) and different values for the minimum overall document frequency (3, 4, 5, 6, 7, 8, 9, 10) of the word n-grams included as features.
3. The accuracy of our model was approximately 5% lower on the test set than in cross-validation (70% vs. 74% for the English dataset).
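The two cleaning methods and the document-frequency filter described above can be sketched in plain Python. The function names, the small emoticon set, and the Unicode ranges used to recognize emojis are assumptions chosen for illustration; the original implementation may differ.

```python
from collections import Counter

# Characters kept by both methods (ASCII letters, digits and '#').
KEEP = set("abcdefghijklmnopqrstuvwxyz0123456789#")
# Hypothetical, deliberately small emoticon list (already lower-cased).
EMOTICONS = {":)", ":(", ":d", ";)", ":p", ":-)", ":-("}


def is_emoji(ch):
    """Rough emoji test based on two common Unicode ranges (an assumption)."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def clean_m1(text):
    """M1: lower-case and drop every non-alphanumeric character except '#'."""
    chars = (ch if ch in KEEP or ch.isspace() else " " for ch in text.lower())
    return " ".join("".join(chars).split())


def clean_m2(text):
    """M2: like M1, but keep whole emoticon tokens and emoji characters."""
    out = []
    for token in text.lower().split():
        if token in EMOTICONS:
            out.append(token)
        else:
            out.append("".join(
                ch if (ch in KEEP or is_emoji(ch)) else " " for ch in token))
    return " ".join(" ".join(out).split())


def word_ngrams(text, n_min, n_max):
    """All word n-grams of the text with n_min <= n <= n_max."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def vocabulary(docs, ngram_range=(1, 2), min_df=3):
    """Keep n-grams that occur in at least min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(word_ngrams(doc, *ngram_range)))  # count each doc once
    return sorted(gram for gram, count in df.items() if count >= min_df)


print(clean_m1("Breaking!!! #News :) 😀 Wow"))
print(clean_m2("Breaking!!! #News :) 😀 Wow"))
print(vocabulary(["fake news here", "fake news there",
                  "fake news again", "real story"]))
```

The only difference between the two cleaners is what survives: M1 discards the emoticon and the emoji, while M2 keeps both as separate tokens that can later become n-gram features.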
Methodology-2: Stacking-ensemble

● Refitted the four submodels with the cross-validated hyperparameters five times on different chunks of the original training data (each consisting of tweets from 240 users). This was done to prevent the ensemble from overfitting on the training data.
● The predictions these five models gave for the 60 remaining users were appended to the training data of the ensemble model, so this training set consisted of predictions for all 300 users in the training data.
● Used these constructed training and test sets to find the best ensemble among the following three methods:
1. majority voting
2. linear regression of predicted probabilities
3. logistic regression model [best result: an accuracy of 70% for the English dataset]
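The construction of the ensemble training set from out-of-fold predictions, and the three ways of combining them, might look like the sketch below. It rests on several assumptions: the data is synthetic (`make_classification`), only three submodels are used (XGBoost is left out to avoid an extra dependency), and scikit-learn's `cross_val_predict` stands in for the manual refitting on 240-user chunks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in for the feature vectors of 300 users (not the real data).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

submodels = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
    SVC(kernel="linear", probability=True, random_state=0),
]

# Five folds over 300 users: each refit trains on 240 users and predicts 60.
folds = KFold(n_splits=5, shuffle=True, random_state=0)

# One column per submodel: out-of-fold probability of the positive class,
# so every user is scored by a model that never saw that user in training.
meta_X = np.column_stack([
    cross_val_predict(model, X, y, cv=folds, method="predict_proba")[:, 1]
    for model in submodels
])

# 1. Majority voting over thresholded submodel predictions.
#    (With four submodels, ties would need an explicit tie-break rule.)
votes = (meta_X >= 0.5).astype(int)
majority_pred = (votes.sum(axis=1) * 2 > votes.shape[1]).astype(int)

# 2. Linear regression of predicted probabilities, thresholded at 0.5.
linear = LinearRegression().fit(meta_X, y)
linear_pred = (linear.predict(meta_X) >= 0.5).astype(int)

# 3. Logistic regression meta-model.
logistic = LogisticRegression().fit(meta_X, y)
logistic_pred = logistic.predict(meta_X)

for name, pred in [("majority", majority_pred),
                   ("linear", linear_pred),
                   ("logistic", logistic_pred)]:
    print(name, round(float((pred == y).mean()), 3))
```

The accuracies printed here are computed on the same 300 rows the meta-models were fitted on, so they are optimistic; a fair comparison of the three methods needs submodel predictions on a separate test set.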
Dataset

● We experimented with the English language only.
● The training data consisted of 300 XML files, each containing the Twitter feed of one Twitter account. Each feed consists of the last 100 tweets.
● We used 200 XML files for testing.
● https://zenodo.org/record/4039435#.ZFVGAnZBy3B
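Reading one user's feed from such an XML file could be done along these lines. The element names (`author`, `documents`, `document`) and the inline sample are assumptions about the file layout, so they should be checked against the downloaded files before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed layout: one <document> per tweet.
SAMPLE = """<author lang="en">
  <documents>
    <document><![CDATA[Breaking: #URL# you won't believe this]]></document>
    <document><![CDATA[RT #USER#: good morning]]></document>
  </documents>
</author>"""


def load_feed(xml_text):
    """Return the list of tweet texts found in one user's XML feed."""
    root = ET.fromstring(xml_text)
    return [doc.text or "" for doc in root.iter("document")]


tweets = load_feed(SAMPLE)
# One document per user: all tweets joined, ready for cleaning and vectorizing.
user_document = " ".join(tweets)

print(len(tweets))
print(user_document)
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`, with one call per XML file in the training or test folder.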
Results

● The average accuracy of the Support Vector Machine model was 55.66%.
● The average accuracy of the Random Forest model was 55.0%.
● The average accuracy of the Logistic Regression model was 55.66%.
● The average accuracy of the XGBoost model was 56.0%.
● The average accuracy of the Ensemble model, which combines all of the above models, was 71.5%.
Future Work

● First, we implemented the proposed classification model.
● Most of the research papers that we read reported better accuracy in predicting whether a user is a fake news spreader for Spanish than for other languages such as English. One explanation we found is that Spanish users generally put more emotion in their tweets, using emoticons or other means.
● Another promising direction for achieving higher accuracy in profiling fake news spreaders is to develop software that can determine with high accuracy whether a single tweet should be considered fake news. It would be interesting to investigate how such software would perform in this task.
Conclusions

● From this project we learned why it is important to identify fake news spreaders and what consequences they can have.
● We also learned about the various machine learning models used to identify fake news spreaders.
● By implementing the project, we found that models such as SVM, XGBoost, Random Forest and Logistic Regression gave roughly the same accuracy, but combining them all in an ensemble model increased the accuracy significantly.
References

1. https://docs.google.com/spreadsheets/d/1PfGhAAZ_ottsrL-ma4ssBDVnaBxJA61JNfCoG-jNmrc/edit#gid=0&range=F10
2. https://github.com/pan-webis-de/buda20/tree/main/paper
3. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
4. https://zenodo.org/record/4039435#.ZFVGAnZBy3B
5. https://dl.acm.org/doi/abs/10.1145/3137597.3137600
6. https://www.sciencedirect.com/science/article/pii/S0306457318306794
