Fake News Detection Using Machine Learning: Bachelor of Technology
BACHELOR OF TECHNOLOGY
COMPUTER SCIENCE & ENGINEERING
Submitted By
B.L.V.S.SRI PAVANI (18A21B0505)
K.G.N.V.S.D.PAVANI (18A21B0515)
SWARNANDHRA COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
(Accredited by NBA and NAAC with ‘A’ Grade (CGPA 3.32/4))
(Approved by AICTE, Autonomous with JNTU Kakinada)
Seetharampuram, Narsapur-534 280
CERTIFICATE
This is to certify that the mini project report entitled “FAKE NEWS DETECTION USING
MACHINE LEARNING” is being submitted by B.L.V.S.SRI PAVANI (18A21B0505), K.V.
SANDHYA (18A21B0523), K.G.N.V.S.D.PAVANI (18A21B0515), CH.MANJUBHARGAVI
(18A21B0507) in partial fulfillment for the award of the degree of “Bachelor of Technology in
COMPUTER SCIENCE AND ENGINEERING” during the academic year 2021-2022 and it has been
found worthy of acceptance according to the requirements of the autonomous institution.
Mr.P.Srinivas Rao
Asst.Professor
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of every task during my
dissertation would be incomplete without the mention of the people who made it possible. I
consider it my privilege to express my gratitude and respect to all who guided, inspired and helped me in
the completion of my mini project.
I extend my heartfelt gratitude to the Almighty for giving me strength in proceeding with this
mini project report on "FAKE NEWS DETECTION USING MACHINE LEARNING".
I owe my special thanks to Dr.S.Ramesh Babu, M.Tech., Ph.D., Secretary and
Correspondent, Swarnandhra College of Engineering and Technology, Seetharampuram, for
providing the necessary arrangements to carry out this seminar.
FAKE NEWS DETECTION USING MACHINE LEARNING
ABSTRACT:
In our modern era, where the internet is ubiquitous, everyone relies on various online resources
for news. News spreads rapidly among millions of users within a very short span of time. The
spread of fake news has far-reaching consequences, such as the creation of biased opinions that
affect election outcomes. Moreover, spammers use appealing news headlines to generate revenue via
clickbait. For some years, mostly since the rise of social media, fake news has become a
societal problem, on some occasions spreading more widely and faster than true information. In this
paper I evaluate the performance of the Attention Mechanism for fake news detection on two
datasets, one containing traditional online news articles and the second one news from various
sources. I compare the results on both datasets, and the results of the Attention Mechanism against
LSTMs and traditional machine learning methods. The results show that the Attention Mechanism does not
work as well as expected. In addition, I modified the approach of the original Attention Mechanism paper
by using word2vec embeddings, which proves to work better in this particular case.
LIST OF FIGURES
S.No  Name of the Figure
1     System Overview for fake news detection
CHAPTER-1
INTRODUCTION
As an increasing amount of our lives is spent interacting online through social media platforms, more
and more people tend to seek out and consume news from social media instead of traditional news
organizations. The reasons for this change in consumption behavior are inherent in the
nature of these social media platforms: it is often more timely and less expensive to consume news
on social media than through traditional journalism, such as newspapers or television; and it is easier to
share, comment on, and discuss the news with friends or other readers on social media. For
instance, 62 percent of U.S. adults got news on social media in 2016, while in 2012 only 49 percent
reported seeing news on social media. It was also found that social media now outperforms
television as the major news source. Despite the benefits provided by social media, the quality
of news on social media is lower than that of traditional news organizations. Because it is
inexpensive to provide news online and much faster and easier to propagate it through social media, large
volumes of fake news, i.e., news articles with intentionally false information, are produced
online for a variety of purposes, such as financial and political gain. It was estimated that over 1
million tweets were associated with the fake news story “Pizzagate” by the end of the presidential election. Given
the prevalence of this new phenomenon, “fake news” was even named the word of the year by the
Macquarie Dictionary in 2016. The extensive spread of fake news can have a serious negative
impact on individuals and society. First, fake news can break the authenticity balance of the news
ecosystem. For instance, the most popular fake news was even more widely spread on
Facebook than the most popular genuine mainstream news during the 2016 U.S. presidential election.
Second, fake news intentionally persuades consumers to accept biased or false beliefs. Fake
news is typically manipulated by propagandists to convey political messages or influence. For instance,
some reports show that Russia has created fake accounts and social bots to spread false stories. Third,
fake news changes the way people interpret and respond to real news. For instance, some fake news is
created just to trigger people's distrust and confuse them, impeding their ability to
differentiate what is true from what is not. To help mitigate the negative effects caused by fake news,
both to benefit the general public and the news ecosystem, it is crucial that we develop
methods to automatically detect fake news spread on social media.
CHAPTER-2
SYSTEM OVERVIEW
The fake news detection using machine learning project aims to develop an online tool to
detect fake news. The entire project has been developed with machine
learning technology in mind. The data analyzer collects data from various news
sources. The system maintains the complete set of fake and real data. Moreover, if any
general user wants to know whether a news item is real or fake, they can use this tool.
2.2 EXISTING SYSTEM
Google has noted that users can spot misinformation by using its multiple online fact-checking tools.
Logically is a free mobile app and browser extension. It provides fact and image verification services.
Claim Buster is an online tool for instant fact-checking.
4S4U is an initiative of CI, Andhra Pradesh Police Department, that tells us whether a news item is fake or real
based on our news request.
CHAPTER-3
SYSTEM IMPLEMENTATION AND RESULTS
Dynamic Search: The second search field of the site asks for specific keywords to be searched on the
net, and then outputs the percentage probability of those terms actually being
present in an article, or in a similar article that references those keywords.
URL Search: The third search field of the site accepts a specific website domain name, which the
implementation looks up in our true sites database and our blacklisted sites database. The true sites
database holds the domain names that regularly provide proper and authentic news, and the blacklisted
sites database holds those that do not. If
the site is not found in either database, the implementation does not classify the domain; it
simply states that the news aggregator does not exist.
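The URL Search lookup can be sketched roughly as follows. This is only an illustration under stated assumptions: the two databases are represented by in-memory sets with made-up domain names, and the helper names normalize_domain and classify_domain are hypothetical, not taken from the project code.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the two databases; the real contents are not
# listed in this report, so reserved ".example" domains are used instead.
TRUE_SITES = {"trusted-news.example", "daily-report.example"}
BLACKLISTED_SITES = {"hoax-headlines.example"}


def normalize_domain(user_input: str) -> str:
    """Reduce a URL or bare domain to a lowercase host name without 'www.'."""
    text = user_input.strip()
    # urlparse only recognises the host when a scheme or '//' is present.
    if "//" not in text:
        text = "//" + text
    host = urlparse(text).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host


def classify_domain(user_input: str) -> str:
    """Look the domain up in both sets; unknown domains are not classified."""
    domain = normalize_domain(user_input)
    if domain in TRUE_SITES:
        return "real"
    if domain in BLACKLISTED_SITES:
        return "fake"
    return "news aggregator does not exist"


print(classify_domain("https://www.trusted-news.example/story/1"))
print(classify_domain("hoax-headlines.example"))
print(classify_domain("unknown-site.example"))
```

A set lookup keeps the check independent of any trained model, which matches the description above: a domain that appears in neither list is reported as unknown rather than guessed.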
3.2 DATA FLOW DIAGRAM
A data flow diagram (DFD) maps out the flow of information for any process or system. It uses
defined symbols like rectangles, circles and arrows, plus short text labels, to show data inputs,
outputs, storage points and the routes between each destination. Data flowcharts can range from
simple, even hand-drawn process overviews, to in-depth, multi-level DFDs that dig progressively
deeper into how the data is handled. They can be used to analyze an existing system or model a
new one. Like all the best diagrams and charts, a DFD can often visually “say” things that would
be hard to explain in words, and they work for both technical and nontechnical audiences, from
developer to CEO. That’s why DFDs remain so popular after all these years. While they work
well for data flow software and systems, they are less applicable
3.3 UML Diagram (Use Case, Class Diagram, Sequence Diagram,
Activity Diagram)
USE CASE DIAGRAM
To model a system, the most important aspect is to capture its dynamic behavior. Dynamic
behavior means the behavior of the system when it is running/operating.
Static behavior alone is not sufficient to model a system; dynamic behavior is more
important than static behavior. In UML, there are five diagrams available to model the dynamic
nature of a system, and the use case diagram is one of them. Because the use case
diagram is dynamic in nature, there must be some internal or external factors that drive the
interaction.
These internal and external agents are known as actors. Use case diagrams consist of actors,
use cases and their relationships. The diagram is used to model the system/subsystem of an
application. A single use case diagram captures a particular functionality of a system.
Hence, to model the entire system, a number of use case diagrams are used.
CLASS DIAGRAM
A class diagram is a static diagram. It represents the static view of an application. A class diagram
is used not only for visualizing, describing, and documenting different aspects of a system but
also for constructing executable code of the software application.
A class diagram describes the attributes and operations of a class and also the constraints imposed
on the system. Class diagrams are widely used in the modeling of object-oriented systems
because they are the only UML diagrams that can be mapped directly to object-oriented
languages.
A class diagram shows a collection of classes, interfaces, associations, collaborations, and
constraints. It is also known as a structural diagram.
SEQUENCE DIAGRAM
The sequence diagram represents the flow of messages in the system and is also termed an
event diagram. It helps in envisioning several dynamic scenarios. It portrays the communication
between any two lifelines as a time-ordered sequence of events in which these lifelines take part
at run time. In UML, a lifeline is drawn as a vertical dashed line that extends down the page, whereas
a message is drawn as a horizontal arrow between lifelines. A sequence diagram can also show
iterations as well as branching.
ACTIVITY DIAGRAM
3.4 FRAMEWORK
This advanced Python project deals with detecting fake and real news. Using
sklearn, we build a TfidfVectorizer on our dataset. Then, we initialize a
PassiveAggressiveClassifier and fit the model. In the end, the accuracy score and the confusion matrix tell us how
well our model fares.
CHAPTER-4
CONCLUSION AND FUTURE ENHANCEMENTS
The main contribution of this project is support for the idea that machine learning could be useful
in a novel way for the task of classifying fake news. Our findings show that after much
pre-processing of a relatively small dataset, a simple algorithm is able to pick up on a diverse set of
potentially subtle language patterns that a human may (or may not) be able to detect. Many of
these language patterns are intuitively useful in a human's manner of classifying fake news.
Likewise, our model looks for indefinite or inconclusive words, referential words, and evidence
words as patterns that characterize real news. Even if humans could detect these patterns, they
are not able to store as much information as an ML model, and therefore may not understand the
complex relationships between the detection of these patterns and the classification decision.
Furthermore, the model seems to be relatively unfazed by the exclusion of certain “giveaway”
topic words in the training set, as it is able to pick up on trigrams that are less specific to a given
topic, if need be. As such, this seems to be a really good start on a tool that would be useful to
augment humans' ability to detect fake news.
Through the work done in this project, we have shown that machine learning certainly does have
the capacity to pick up on sometimes subtle language patterns that may be difficult for humans to
detect. The next steps involved in this project come in three different aspects. The first
aspect that could be improved in this project is augmenting and increasing the size of the dataset.
We feel that more data would be beneficial in ridding the model of any bias based on specific
patterns in the source. There is also a question as to whether or not the size of our dataset is
sufficient.
CHAPTER-5
REFERENCES
References:
1. Kushal Agarwalla, Shubham Nandan, Varun Anil Nair, D. Deva Hema, “Fake News Detection
using Machine Learning and Natural Language Processing,” International Journal of Recent
Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-7, Issue-6, March 2019
2. https://www.kaggle.com/
3. https://data-flair.training/
CHAPTER-6
APPENDIX
2. Now, let’s read the data into a DataFrame, and get the shape of the data and
the first 5 records.
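A minimal sketch of this step is given below. The dataset file itself is not reproduced in this report, so the example reads a small in-memory CSV instead; the column names (title, text, label) and the label values are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file. The column names and
# the REAL/FAKE label values are assumptions, not taken from the report.
csv_data = io.StringIO(
    "title,text,label\n"
    "Headline A,Body of the first article,REAL\n"
    "Headline B,Body of the second article,FAKE\n"
    "Headline C,Body of the third article,REAL\n"
)

# With a real dataset, a file path would be passed to read_csv instead.
df = pd.read_csv(csv_data)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # up to the first 5 records

# Keep the labels in a separate Series for the later steps.
labels = df["label"]
```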
4. Split the dataset into training and testing sets.
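The split can be sketched with train_test_split from sklearn.model_selection. The toy texts and labels below are placeholders, and the test_size and random_state values are illustrative assumptions rather than the settings used in this project.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the article texts and their labels (assumed values).
texts = [f"article body number {i}" for i in range(10)]
labels = ["REAL", "FAKE"] * 5

# test_size and random_state are illustrative choices, not values from the report.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=7
)

print(len(x_train), len(x_test))  # 8 2
```

Fixing random_state makes the split reproducible, so that accuracy figures can be compared between runs.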
5. Let’s initialize a TfidfVectorizer with stop words from the English language and
a maximum document frequency of 0.7 (terms that appear in more than 70% of the documents
will be discarded). Stop words are the most common words in a language, which are
filtered out before processing the natural language data. A
TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF
features.
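The effect of both settings can be seen on a tiny example. The sentences below are invented for illustration; only the stop_words and max_df arguments come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented training and test sentences; "news" appears in every training
# document so that the max_df filter has something to discard.
x_train = [
    "news report senate passes budget after long debate",
    "news flash miracle pill melts fat overnight",
    "news update council approves transit plan",
    "news alert aliens secretly run the election",
]
x_test = ["senate debate on the transit plan"]

tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)

# Learn the vocabulary on the training set only, then reuse it for the test set.
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

print(tfidf_train.shape, tfidf_test.shape)
print("news" in tfidf_vectorizer.vocabulary_)  # False: in more than 70% of documents
print("the" in tfidf_vectorizer.vocabulary_)   # False: English stop word
```

Calling transform (not fit_transform) on the test set keeps the test documents from influencing the learned vocabulary and IDF weights.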
6. Next, we’ll initialize a PassiveAggressiveClassifier and fit it on
tfidf_train and y_train.
Then, we’ll predict on the test set from the TfidfVectorizer and calculate the
accuracy with accuracy_score() from sklearn.metrics.
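from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
#DataFlair - Initialize a PassiveAggressiveClassifier
# max_iter replaces the older n_iter argument, which current scikit-learn no longer accepts
pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)
#DataFlair - Predict on the test set and calculate accuracy
y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')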
7. We got an accuracy of 92.82% with this model. Finally, let’s print out a
confusion matrix to gain insight into the number of false and true negatives and
positives.
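A small sketch of this last step follows. The true and predicted labels are made-up values chosen so the matrix is easy to read, and the label names FAKE and REAL are assumptions; the project's actual predictions are not reproduced here.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels for six test articles.
y_test = ["FAKE", "REAL", "FAKE", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE", "FAKE"]

# Rows are the true classes and columns the predicted classes,
# both in the order given by `labels`.
cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
print(cm)
# [[2 1]
#  [1 2]]
```

With this label order, the first row counts the FAKE articles (2 caught, 1 missed) and the second row counts the REAL articles (1 wrongly flagged, 2 correctly accepted).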