
Fake News Detection Using Machine Learning: Bachelor of Technology

This document discusses detecting fake news using machine learning. It begins by explaining how social media has become a major source of news consumption but also a platform for widespread fake news. Fake news can negatively impact society by undermining the credibility of real news sources, intentionally spreading misinformation, and influencing political opinions. The document proposes using machine learning methods to automatically detect fake news spread on social media in order to mitigate these negative effects.


Fake news detection using machine learning

A Technical Mini Project report submitted in partial fulfillment of the
requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
COMPUTER SCIENCE & ENGINEERING

Submitted By
B.L.V.S.SRI PAVANI (18A21B0505)

K.V. SANDHYA (18A21B0523)

K.G.N.V.S.D.PAVANI (18A21B0515)

CH.MANJU BHARGAVI (18A21B0507)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


SWARNANDHRA COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
Accredited by National Board of Accreditation, AICTE, New Delhi
(Approved by A.I.C.T.E & Affiliated to JNTU Kakinada)
An ISO 9001:2000 certified institution
Seetharampuram, Narsapur-534 280, West Godavari (Dt.),A.P

SWARNANDHRA COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
(Accredited by NBA and NAAC with ‘A’ Grade (CGPA 3.32/4))
(Approved by AICTE, Autonomous with JNTU Kakinada)
Seetharampuram, Narsapur-534 280

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the mini project report entitled “FAKE NEWS DETECTION USING
MACHINE LEARNING” is being submitted by B.L.V.S.SRI PAVANI (18A21B0505), K.V.
SANDHYA (18A21B0523), K.G.N.V.S.D.PAVANI (18A21B0515) and CH.MANJU BHARGAVI
(18A21B0507) in partial fulfillment of the requirements for the award of the degree of “Bachelor of
Technology in COMPUTER SCIENCE AND ENGINEERING” during the academic year 2021-2022, and it
has been found worthy of acceptance according to the requirements of the autonomous institution.

Mini Project Guide

Mr.P.Srinivas Rao

Asst.Professor

Mini Project Co-Ordinator: Dr.A.Jegatheesan, Professor/CSE

Head of the Department: Dr.P.Srinivasulu, M.Tech., Ph.D., Professor & Head/CSE

ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of every task during my
dissertation would be incomplete without the mention of the people who made it possible. I
consider it my privilege to express my gratitude and respect to all who guided, inspired and helped
me in the completion of my mini project.

I extend my heartfelt gratitude to the Almighty for giving me strength in proceeding with this
mini project report on "FAKE NEWS DETECTION USING MACHINE LEARNING".

I owe my special thanks to Dr.S.Ramesh Babu, M.Tech., Ph.D., Secretary and
Correspondent, Swarnandhra College of Engineering and Technology, Seetharampuram, for
providing the necessary arrangements to carry out this seminar.

I would like to express my sincere thanks to Dr.P.Srinivasulu, M.Tech., Ph.D., Professor
& Head, Department of Computer Science and Engineering, for valuable suggestions at the time
of need.

I would like to express my profound sense of gratitude to the Mini Project Coordinator,
Dr.A.Jegatheesan, Professor, Department of Computer Science and Engineering, for his
consistent encouragement and earnest support to complete it successfully.

B.L.V.S.SRI PAVANI (18A21B0505)


K.V. SANDHYA (18A21B0523)
K.G.N.V.S.D.PAVANI (18A21B0515)
CH.MANJU BHARGAVI (18A21B0507)

FAKE NEWS DETECTION USING MACHINE LEARNING

ABSTRACT:
In our modern era, where the internet is ubiquitous, everyone relies on various online resources
for news. News spreads rapidly among millions of users within a very short span of time. The
spread of fake news has far-reaching consequences, such as the creation of biased opinions that
can sway election outcomes. Moreover, spammers use appealing news headlines to generate revenue
via clickbait. For some years, mostly since the rise of social media, fake news has become a
societal problem, on some occasions spreading further and faster than true information. In this
paper I evaluate the performance of the Attention Mechanism for fake news detection on two
datasets, one containing traditional online news articles and the second one news from various
sources. I compare the results on both datasets, and the results of the Attention Mechanism against
LSTMs and traditional machine learning methods. The comparison shows that the Attention Mechanism
does not work as well as expected. In addition, I made changes to the original Attention Mechanism
paper by using word2vec embeddings, which proves to work better in this particular case.

TABLE OF CONTENTS

CHAPTER NO TITLE PAGE NO


CERTIFICATE i
ACKNOWLEDGEMENT ii
ABSTRACT iii
LIST OF FIGURES iv
1. Introduction 1
2. System overview 2
2.1. About the system
2.2. Existing System
2.3. Proposed System
2.4. System Requirements
3. System Implementation & Results 4
3.1. System Architecture
3.2. Data Flow Diagram/System Flow Diagram
3.3. UML Diagrams
3.3.1 Use case Diagram
3.3.2 Class Diagram
3.3.3 Sequence Diagram
3.3.4 Activity Diagram
3.4. Frame Work
4. Conclusion and Future Enhancements 11
5. References 12
6. Appendix 13
6.1 A.1.Source Code

LIST OF FIGURES
S.NO Name of the Figure Page No
1 System Overview for fake news detection 2

2 System Architecture for fake news detection 4

3 Data flow Diagram for fake news detection 5

4 Usecase Diagram for fake news detection 6

5 Class Diagram for fake news detection 7

6 Sequence Diagram for fake news detection 8

7 Activity Diagram for fake news detection 9

8 Frame Work for fake news detection 10

CHAPTER-1
INTRODUCTION

As an increasing amount of our lives is spent interacting online through social media platforms, more
and more people tend to seek out and consume news from social media instead of traditional news
organizations. The reasons for this change in consumption behavior are inherent in the nature of
these platforms: it is often more timely and less expensive to consume news on social media than
through traditional journalism, such as newspapers or television, and it is easier to share and
discuss the news with friends or other readers on social media. For instance, 62 percent of U.S.
adults got news on social media in 2016, while in 2012 only 49 percent reported seeing news on
social media. It was also found that social media now outperforms television as the major news
source. Despite the benefits provided by social media, the quality of news on social media is lower
than that of traditional news organizations. Because it is inexpensive to provide news online and
much faster and easier to propagate it through social media, large volumes of fake news, i.e., news
articles with intentionally false information, are produced online for a variety of purposes, such as
financial and political gain. It was estimated that over 1 million tweets were related to the fake
news story “Pizzagate” by the end of the presidential election. Given the prevalence of this new
phenomenon, “fake news” was even named the word of the year by the Macquarie Dictionary in 2016.

The extensive spread of fake news can have a significant negative impact on individuals and society.
First, fake news can break the authenticity balance of the news ecosystem. For instance, the most
popular fake news was spread even more widely on Facebook than the most popular genuine mainstream
news during the 2016 U.S. presidential election. Second, fake news intentionally persuades consumers
to accept biased or false beliefs. Fake news is typically used by propagandists to convey political
messages or exert influence. For instance, some reports show that Russia has created fake accounts
and social bots to spread false stories. Third, fake news changes the way people interpret and
respond to real news. For instance, some fake news was created just to trigger people's distrust and
confuse them, impeding their ability to differentiate what is true from what is not. To help mitigate
the negative effects caused by fake news, to the benefit of both the public and the news ecosystem,
it is crucial that we develop methods to automatically detect fake news broadcast on social media.

CHAPTER-2
SYSTEM OVERVIEW

2.1 ABOUT THE SYSTEM

The fake news detection using machine learning project aims to develop an online tool to
detect fake news. The entire project has been developed with machine learning technology in
mind. The data analyzer collects data from various news resources. The system maintains the
complete set of fake and real data. Moreover, if any general user wants to know whether a piece
of news is real or fake, he or she can take the help of this tool.

2.2 EXISTING SYSTEM
Google has noted that users can spot misinformation by using its multiple online tools for
fact-checking.

Logically is a free mobile app and browser extension. It provides fact and image verification services.

Claim Buster is an online tool for instant fact-checking.

4S4U is an initiative of CI, Andhra Pradesh Police Department, that tells us whether a piece of news
is fake or real based on our news request.

2.3 PROPOSED SYSTEM


The task of classifying news manually requires in-depth knowledge of the domain and the expertise to
identify anomalies in the text. The data we used in our work was collected from the World Wide Web
and contains news articles from various domains. To reduce the spread of fake news,
identifying the key elements involved in the spread of news is an important step. Graph theory and
machine learning techniques can be employed to identify the key sources involved in the spread of
fake news. Likewise, real-time fake news identification in videos is another possible future direction.

2.4 SYSTEM REQUIREMENTS


• Frontend Web Technologies: HTML, CSS, JavaScript
• Framework: Django
• Tools: PyCharm, VSCode, Microsoft Excel
• Database: MySQL
• Libraries: NumPy, scikit-learn (sklearn), Pandas (Python)

CHAPTER-3
SYSTEM IMPLEMENTATION AND RESULTS

3.1 SYSTEM ARCHITECTURE


Static Search: The architecture of the static part of the fake news detection system is quite simple
and follows the basic machine learning process flow. The system design is shown in the figure below
and is self-explanatory. The main processes in the design are shown in the figure.

Figure: System architecture for fake news detection

Dynamic Search: The second search field of the site asks for specific keywords to be searched on the
net. It then outputs the percentage probability of those terms actually being present in an article,
or in a similar article that references those keywords.

URL Search: The third search field of the site accepts a specific website domain name, which the
implementation looks up in our true sites database and our blacklisted sites database. The true sites
database holds the domain names that regularly provide proper and authentic news, and the blacklisted
sites database holds those that do not. If the site isn’t found in either database, the
implementation doesn’t classify the domain; it simply states that the news aggregator does not exist.
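
The URL search lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names `extract_domain` and `classify_domain`, the two example sets standing in for the database tables, and the returned strings are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the "true sites" and "blacklisted sites" tables.
TRUE_SITES = {"example-news.org"}
BLACKLISTED_SITES = {"example-hoax.net"}


def extract_domain(url_or_domain):
    """Return the lower-cased host name without a leading 'www.'."""
    text = url_or_domain.strip().lower()
    # urlparse only fills in the host when a '//' separator is present.
    if "//" not in text:
        text = "//" + text
    host = urlparse(text).hostname or ""
    return host[4:] if host.startswith("www.") else host


def classify_domain(url_or_domain, true_sites=TRUE_SITES,
                    blacklisted_sites=BLACKLISTED_SITES):
    """Look the domain up in both lists; report 'unknown' if it is in neither."""
    domain = extract_domain(url_or_domain)
    if domain in true_sites:
        return "trusted"
    if domain in blacklisted_sites:
        return "blacklisted"
    return "unknown"
```

Normalizing the input first means that a full article URL and a bare domain name resolve to the same entry, so the lists only need to store one form of each site.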

3.2 DATA FLOW DIAGRAM

Data Flow Diagram for Fake News Detection

A data flow diagram (DFD) maps out the flow of information for any process or system. It uses
defined symbols like rectangles, circles and arrows, plus short text labels, to show data inputs,
outputs, storage points and the routes between each destination. Data flowcharts can range from
simple, even hand-drawn process overviews, to in-depth, multi-level DFDs that dig progressively
deeper into how the data is handled. They can be used to analyze an existing system or model a
new one. Like all the best diagrams and charts, a DFD can often visually “say” things that would
be hard to explain in words, and they work for both technical and nontechnical audiences, from
developer to CEO. That’s why DFDs remain so popular after all these years. While they work
well for data flow software and systems, they are less applicable

3.3 UML Diagram (Use Case, Class Diagram, Sequence Diagram,
Activity Diagram)
USE CASE DIAGRAM

Use Case diagram for Fake News Detection

To model a system, the most important aspect is to capture its dynamic behavior. Dynamic
behavior means the behavior of the system when it is running/operating.
Static behavior alone is not sufficient to model a system; dynamic behavior is more
important than static behavior. In UML, there are five diagrams available to model the dynamic
nature of a system, and the use case diagram is one of them. Because the use case
diagram is dynamic in nature, there must be some internal or external factors that drive the
interaction.
These internal and external agents are known as actors. Use case diagrams consist of actors,
use cases and their relationships. The diagram is used to model the system/subsystem of an
application. A single use case diagram captures a particular functionality of a system.
Hence, to model the entire system, a number of use case diagrams are used.

CLASS DIAGRAM

Class Diagram for Fake News Detection

Class diagram is a static diagram. It represents the static view of an application. Class diagram
is not only used for visualizing, describing, and documenting different aspects of a system but
also for constructing executable code of the software application.
Class diagram describes the attributes and operations of a class and also the constraints imposed
on the system. The class diagrams are widely used in the modeling of object oriented systems
because they are the only UML diagrams, which can be mapped directly with object-oriented
languages.
Class diagram shows a collection of classes, interfaces, associations, collaborations, and
constraints. It is also known as a structural diagram.

SEQUENCE DIAGRAM

Sequence Diagram for Fake News Detection

The sequence diagram represents the flow of messages in the system and is also termed an
event diagram. It helps in envisioning several dynamic scenarios. It portrays the communication
between any two lifelines as a time-ordered sequence of events, showing how these lifelines take
part at run time. In UML, a lifeline is drawn as a vertical dashed line descending from its
participant, whereas the message flow is drawn as horizontal arrows between lifelines, ordered
from the top of the page to the bottom. The diagram can also show iterations and branching.

ACTIVITY DIAGRAM

Activity Diagram for fake News Detection


The activity diagram is another important diagram in UML for describing the dynamic aspects of the
system. An activity diagram is basically a flowchart that represents the flow from one activity to
another. An activity can be described as an operation of the system. The control flow is
drawn from one operation to another. This flow can be sequential, branched, or concurrent.
Activity diagrams deal with all types of flow control by using different elements such as fork,
join, etc.

3.4 FRAME WORK

This advanced Python project on detecting fake news deals with fake and real news. Using
sklearn, we build a TfidfVectorizer on our dataset. Then, we initialize a
PassiveAggressiveClassifier and fit the model. In the end, the accuracy score and the confusion
matrix tell us how well our model fares.

The fake news Dataset


We’ll call the dataset used for this Python project news.csv. This dataset has a shape of
7796×4. The first column identifies the news, the second and third are the title and text, and the
fourth column has labels denoting whether the news is REAL or FAKE.

CHAPTER-4
CONCLUSION AND FUTURE ENHANCEMENTS

The main contribution of this project is support for the idea that machine learning could be useful
in a novel way for the task of classifying fake news. Our findings show that after much
pre-processing of a relatively small dataset, a simple algorithm is able to pick up on a diverse set
of potentially subtle language patterns that a human may (or may not) be able to detect. Many of
these language patterns are intuitively useful in a human's manner of classifying fake news.
Likewise, our model looks for indefinite or inconclusive words, referential words, and evidence
words as patterns that characterize real news. Even if a human could detect these patterns, a human
is not able to store as much information as an ML model, and therefore may not understand the
complex relationships between the detection of these patterns and the decision for classification.
Furthermore, the model seems to be relatively unfazed by the exclusion of certain “giveaway”
topic words in the training set, as it is able to pick up on trigrams that are less specific to a given
topic, if need be. As such, this seems to be a really good start on a tool that would be useful to
augment humans' ability to detect fake news.

Through the work done in this project, we have shown that machine learning certainly does have
the capacity to pick up on sometimes subtle language patterns that may be difficult for humans to
pick up on. The next steps involved in this project come in three different aspects. The first
aspect that could be improved in this project is augmenting and increasing the size of the dataset.
We feel that more data would be beneficial in ridding the model of any bias based on specific
patterns in the source. There is also the question of whether or not the size of our dataset is
sufficient.

CHAPTER-5
REFERENCES

References:

1. Kushal Agarwalla, Shubham Nandan, Varun Anil Nair, D. Deva Hema, “Fake News Detection
using Machine Learning and Natural Language Processing,” International Journal of Recent
Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6, March 2019

2. https://www.kaggle.com/

3. https://data-flair.training/

CHAPTER-6
APPENDIX

STEPS FOR DETECTING FAKE NEWS WITH PYTHON


Follow the steps below to detect fake news and complete your first advanced Python project.

1. Make necessary imports:


import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

2. Now, let’s read the data into a DataFrame, and get the shape of the data and
the first 5 records.

# Read the data
df = pd.read_csv(r'C:\Users\project\Desktop\news.csv')
# Get shape and head (print() is needed when this runs as a script)
print(df.shape)
print(df.head())

3. And get the labels from the Data Frame.

# DataFlair - Get the labels
labels = df.label
print(labels.head())

4. Split the dataset into training and testing sets.

# DataFlair - Split the dataset (80% training, 20% testing, fixed seed for repeatability)
x_train, x_test, y_train, y_test = train_test_split(df['text'], labels, test_size=0.2, random_state=7)
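
To make the split step concrete, the following standard-library sketch shows the general idea behind a seeded hold-out split: shuffle the row indices with a fixed seed, then set aside 20% for testing. The function name `holdout_split` is hypothetical, and the sketch does not reproduce the exact shuffling order that `train_test_split` produces; it only illustrates why fixing the seed makes the split repeatable.

```python
import math
import random


def holdout_split(texts, labels, test_size=0.2, seed=7):
    """Shuffle indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    # This sketch rounds the held-out count up.
    n_test = math.ceil(test_size * len(indices))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(texts, train_idx), pick(texts, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Toy data (made up for illustration): even-numbered articles are REAL.
texts = [f"article {i}" for i in range(10)]
labels = ["REAL" if i % 2 == 0 else "FAKE" for i in range(10)]
x_train, x_test, y_train, y_test = holdout_split(texts, labels)
```

Because texts and labels are selected with the same index lists, every article keeps its own label after the shuffle.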

5. Let’s initialize a TfidfVectorizer with stop words from the English language and
a maximum document frequency of 0.7 (terms with a higher document frequency
will be discarded). Stop words are the most common words in a language that are
to be filtered out before processing the natural language data. And a
TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF
features.

# DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
# DataFlair - Fit and transform train set, transform test set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)
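
The effect of stop-word removal, the max_df cut-off and TF-IDF weighting can be seen on a toy corpus. The sketch below uses only the standard library and is not the scikit-learn implementation: the function name `tfidf`, the five-word stop list and the four example sentences are assumptions made for illustration. The weight follows the smoothed form idf = ln((1 + n) / (1 + df)) + 1 with L2-normalized rows, which is the default scikit-learn documents for TfidfVectorizer.

```python
import math
from collections import Counter

# Tiny illustrative stop list; scikit-learn's English list is far longer.
STOP_WORDS = {"the", "a", "is", "of", "in"}


def tfidf(docs, max_df=0.7):
    """Return (vocabulary, rows) where each row maps term -> TF-IDF weight."""
    tokenized = [[w for w in d.lower().split() if w not in STOP_WORDS]
                 for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Drop terms whose document frequency exceeds the max_df fraction.
    vocab = sorted(t for t, c in df.items() if c / n <= max_df)
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights = {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t in vocab if tf[t]}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return vocab, rows


docs = [
    "the senator signed the bill",
    "shocking senator secret exposed",
    "senator visits school",
    "aliens signed the bill",
]
vocab, rows = tfidf(docs)
```

Here "senator" appears in three of the four documents (a document frequency of 0.75), so it is discarded by max_df=0.7, while rarer and more distinctive words such as "aliens" are kept.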

6. Next, we’ll initialize a PassiveAggressiveClassifier, an online-learning linear classifier.
We’ll fit this on tfidf_train and y_train.

Then, we’ll predict on the test set from the TfidfVectorizer and calculate the
accuracy with accuracy_score() from sklearn.metrics.
# DataFlair - Initialize a PassiveAggressiveClassifier
# max_iter replaces the older n_iter argument, which recent scikit-learn releases no longer accept
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, y_train)
# DataFlair - Predict on the test set and calculate accuracy
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score*100,2)}%')
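
For readers curious about what a passive-aggressive learner does internally, here is a minimal sketch of the binary PA-I update rule from the passive-aggressive literature, written with plain Python lists. It is not scikit-learn's implementation: the names `pa_fit` and `pa_predict`, the two-feature toy data, and the +1/-1 label encoding (standing in for REAL/FAKE) are assumptions for illustration.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pa_fit(samples, labels, epochs=5, C=1.0):
    """Train a binary PA-I linear model; labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss
            sq_norm = dot(x, x)
            if loss == 0.0 or sq_norm == 0.0:
                continue  # passive: the margin is already satisfied
            tau = min(C, loss / sq_norm)  # aggressive: capped step size
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w


def pa_predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if dot(w, x) >= 0 else -1


# Toy feature vectors (made up): the first feature signals +1, the second -1.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
w = pa_fit(X, y)
predictions = [pa_predict(w, x) for x in X]
```

The model leaves its weights untouched when an example is classified with enough margin (passive) and otherwise moves them just far enough to correct the mistake, limited by C (aggressive), which is where the name comes from.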

7. We got an accuracy of 92.82% with this model. Finally, let’s print out a
confusion matrix to gain insight into the number of false and true negatives and
positives.

# DataFlair - Build confusion matrix (rows: true labels, columns: predicted labels)
print(confusion_matrix(y_test, y_pred, labels=['FAKE','REAL']))
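
The confusion matrix can be read more easily with a small worked example. The sketch below builds a 2×2 matrix by hand, using the same convention as scikit-learn (rows are true labels, columns are predicted labels, in the order FAKE, REAL), and derives accuracy, precision and recall for the FAKE class. The function name `confusion` and the eight made-up predictions are assumptions for illustration and are not results from the project.

```python
def confusion(y_true, y_pred, labels=("FAKE", "REAL")):
    """Count (true label, predicted label) pairs; rows are true labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels and predictions for eight articles.
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE", "REAL", "FAKE"]

matrix = confusion(y_true, y_pred)
(fake_as_fake, fake_as_real), (real_as_fake, real_as_real) = matrix

accuracy = (fake_as_fake + real_as_real) / len(y_true)
# Of the articles flagged FAKE, how many really were fake?
precision_fake = fake_as_fake / (fake_as_fake + real_as_fake)
# Of the truly fake articles, how many did the model catch?
recall_fake = fake_as_fake / (fake_as_fake + fake_as_real)
```

Accuracy alone hides which kind of error dominates; the off-diagonal cells show separately how many fake articles slipped through as REAL and how many real articles were wrongly flagged as FAKE.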
