Project5 Fake Text Detection
Abhishek Verma, Aditya Gupta, Aman Dixit, Maulik Singhal, Prakhar Pradhan
190042, 190061, 190103, 190489, 190618
AE, EE, BSBE, ME, CE
{abhivrm, adigup, amandx, smaulik, prakharp}@iitk.ac.in
Abstract
This study investigates how to distinguish machine-generated text from human-generated text. We present several approaches to building ML-based models, including graph neural networks (GNN) and fine-tuning the pre-trained model RoBERTa. We propose the BERT score as an evaluation metric along with perplexity and burstiness. We introduce sentiment as a semantic feature to make the model more robust, and we tune the dataset to make the model less prone to adversarial attacks. These findings can help improve existing models.

2 Motivation
With the growing complexity of AI technologies and ML algorithms, we have the opportunity to use the same technology to find the truth. This is just the start of a future in which every second job may be done by artificial intelligence, so we must start early and evolve accordingly to stay competitive. Humans have evolved for about 300 thousand years to reach this point, so we expect that current advances in machine learning algorithms cannot yet match the complexity of human speech or actions. We take advantage of this small difference and use the same technology and algorithms used by GPTs to stay ahead in this race.
8 Error Analysis
1. Logistic Regression (LR) - We implemented LR using all characteristic features.
(a) Text vectorization
i. BOW, CBOW
ii. TF-IDF
iii. One-hot Encoding
iv. Word2Vec
(b) Word Embedding Types
(c) Frequency-based
i. BOW, TF-IDF, GloVe
(d) Prediction-based
i. Word2Vec
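To make the combination of text vectorization and logistic regression listed above concrete, here is a minimal pure-Python sketch. It is not the implementation used in this project. The tokenizer, the smoothed IDF formula, the learning rate, the epoch count, and the four toy sentences with their labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def fit_tfidf(docs):
    """Build a vocabulary index and smoothed IDF weights from the documents."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Assumed smoothing variant: idf = log((1 + N) / (1 + df)) + 1
    idf = [math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab]
    return index, idf


def bow_vector(text, index):
    """Bag-of-words: raw term counts over the fitted vocabulary."""
    vec = [0.0] * len(index)
    for tok in tokenize(text):
        if tok in index:  # out-of-vocabulary tokens are ignored
            vec[index[tok]] += 1.0
    return vec


def tfidf_vector(text, index, idf):
    """TF-IDF: term counts scaled by IDF, then L2-normalised."""
    vec = [c * w for c, w in zip(bow_vector(text, index), idf)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability that feature vector x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Made-up toy data: label 1 = machine-generated, 0 = human-written.
train_texts = [
    "as an ai language model i cannot provide that",
    "in conclusion it is important to note that",
    "lol that movie was so bad i left early",
    "ugh my train got delayed again this morning",
]
train_labels = [1, 1, 0, 0]

index, idf = fit_tfidf(train_texts)
X_train = [tfidf_vector(t, index, idf) for t in train_texts]
w, b = train_logreg(X_train, train_labels)
train_preds = [int(predict_proba(x, w, b) >= 0.5) for x in X_train]
```

Swapping `tfidf_vector` for `bow_vector` gives the plain BOW variant with the same classifier. Prediction-based embeddings such as Word2Vec would replace the vectorization step with vectors learned from a corpus, which this sketch does not cover.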
References