Fake News Detection Assignment
IBM Assignment
Problem Statement:
Social media platforms have become battlegrounds for information, where misinformation and
harmful content proliferate, eroding trust and sometimes leading to real-world harm. In response to
this pressing challenge, there is a need for an AI-powered solution, referred to as the "Truth
Detector," to combat misinformation and promote a healthier online space.
Spotting Fakes: Analyzing text and various forms of media (such as images and videos) to
identify potential misinformation. This includes detecting common tactics and formats used
by creators to spread false information.
Critical Thinking: Going beyond keyword analysis, the Truth Detector must delve deeper into
the context, sentiment, and credibility of information sources to avoid mistakenly flagging
legitimate content as false.
Fairness and Bias Mitigation: It is imperative to mitigate biases in both data and algorithms to ensure
that the Truth Detector operates impartially. The goal is to uphold freedom of expression
while safeguarding users from harm caused by misinformation.
Additionally, the design should incorporate mechanisms for users to provide feedback and appeal
flagged content. This fosters transparency and trust in the system, empowering users to participate
in the moderation process and contribute to a more reliable online environment.
The ultimate aim of the Truth Detector is to serve as a champion in the fight against misinformation
on social media, promoting trust, credibility, and safety in online interactions. Are you ready to
unleash the power of AI and build a healthier digital space?
Approach:
(Text-based Fact-Checker)
The code begins by importing NLTK, requests, numpy, and sklearn, which provide natural language
processing, HTTP requests, numerical computation, and similarity calculation, respectively. Next, it
loads pre-trained GloVe word embeddings from a file (glove.6B.50d.txt). These embeddings represent
each word as a 50-dimensional vector that captures its semantic meaning. Two preprocessing functions
then prepare input text for analysis. The preprocess(sentence) function tokenizes the input sentence,
converts words to lowercase, and removes stopwords and non-alphabetic tokens. The
get_sentence_embedding(sentence) function computes the embedding of a sentence as the mean of the
GloVe embeddings of its words, returning None when none of the words has an embedding. Finally,
semantic_similarity(sentence1, sentence2) measures the semantic similarity between two sentences as
the cosine similarity of their embeddings, using the cosine_similarity function from sklearn.
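As a minimal sketch of the sentence-embedding and cosine-similarity steps described above, the snippet below uses a tiny hand-made embedding table (toy_embeddings, with made-up values) in place of the GloVe file, and plain Python lists in place of numpy arrays and sklearn's cosine_similarity, so that it runs without downloads. The function names are hypothetical and the snippet illustrates the idea rather than reproducing the assignment's code.

```python
import math

# Hypothetical 3-dimensional embeddings standing in for the GloVe vectors.
toy_embeddings = {
    "earth": [0.9, 0.1, 0.0],
    "planet": [0.8, 0.2, 0.1],
    "round": [0.1, 0.9, 0.0],
    "flat": [0.1, -0.8, 0.1],
}

def sentence_embedding(words, embeddings):
    """Average the vectors of the words that have an embedding."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def toy_similarity(words1, words2, embeddings=toy_embeddings):
    """Return the cosine similarity, or 0 when a sentence has no known words."""
    e1 = sentence_embedding(words1, embeddings)
    e2 = sentence_embedding(words2, embeddings)
    if e1 is None or e2 is None:
        return 0
    return cosine(e1, e2)

same = toy_similarity(["earth", "round"], ["planet", "round"])
opposite = toy_similarity(["earth", "round"], ["earth", "flat"])
print(same, opposite)
```

With these toy values the first pair scores higher than the second, which is the behaviour the similarity check relies on.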
The code accesses Google Generative AI (Gemini) through the ChatGoogleGenerativeAI class from the
langchain_google_genai module. It initializes an instance of the class (llm) with the desired model and
Google API key, which is then used to send queries to Gemini and read its responses. Synonyms for
"false" and "true" are retrieved from WordNet and stored in two lists (synonyms and synonymsT);
synonymsT also receives speaker pronouns and affirmative terms such as "yes". Two fact-checking
functions are then defined: fact_checker(q) and Combine_Fact_Checker(q).
The former checks the validity of a statement using the synonyms of "false" and "true" and, if
necessary, invokes Gemini for clarification. The latter combines an external fact-checking API with
fact_checker() to determine the truthfulness of the provided information: it queries the Google Fact
Check Tools API with the user's statement and falls back to fact_checker() when the API response is
insufficient. Lastly, exception handling catches errors raised during API calls or while processing
input statements, so that failures are reported instead of crashing the program.
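The decision step of the combined checker can be sketched as a pure function over the JSON returned by the Fact Check Tools API (a "claims" list whose items carry "claimReview" entries with a "textualRating"). The function name verdict_from_response and the NEGATIVE_MARKERS values are assumptions made for illustration, as is the lowercasing of ratings; the assignment's own list of negative ratings is not shown.

```python
# Hypothetical set of ratings treated as "false"; the assignment's own
# negative_marker list is not shown, so these values are assumptions.
NEGATIVE_MARKERS = {"false", "mostly false", "pants on fire", "misleading"}

def verdict_from_response(resp):
    """Return "False Information", "True Information", or None (no usable data)."""
    ratings = [
        review["textualRating"].strip().lower()
        for claim in resp.get("claims", [])
        for review in claim.get("claimReview", [])
        if "textualRating" in review
    ]
    if not ratings:
        return None  # the caller should fall back to the internal fact checker
    # Same rule as the listing: "false" only if every rating is a negative one.
    if set(ratings).issubset(NEGATIVE_MARKERS):
        return "False Information"
    return "True Information"

sample = {"claims": [{"text": "...", "claimReview": [{"textualRating": "False"}]}]}
print(verdict_from_response(sample))
print(verdict_from_response({}))
```

Returning None for an empty response keeps the fallback decision with the caller, which mirrors the fall-back-to-fact_checker() behaviour described above.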
Declarations:
(Text-based Fact-Checker)
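1. Libraries Imported:
NLTK, requests, numpy, sklearn, and langchain_google_genai.
2. Function Definitions:
a. preprocess(sentence)
Description: Tokenizes the sentence, converts words to lowercase, and removes stopwords.
Input: sentence (string) - The input sentence.
Output: List of preprocessed words.
b. get_sentence_embedding(sentence)
Description: Computes the sentence embedding as the mean of the GloVe embeddings of its words.
Input: sentence (string) - The input sentence.
Output: Sentence embedding (numpy array), or None if no word has an embedding.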
c. semantic_similarity(sentence1, sentence2)
Description: Calculates the cosine similarity between the embeddings of two sentences.
Inputs: sentence1, sentence2 (strings) - The two input sentences.
Output: Similarity score (float). Cosine similarity lies between -1 and 1; 0 is returned when either
sentence has no embedding.
d. gemini(q)
Description: Utilizes the Google Generative AI to generate responses for the given query.
Input: q (string) - The query.
Output: Generated response as a string.
e. fact_checker(q)
Description: Checks the validity of a statement by using synonyms of "false" and "true",
and if needed, invokes the Gemini AI for clarification.
Input: q (string) - The statement or question to be fact-checked.
Output: Prints whether the information is true or false.
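f. Combine_Fact_Checker(q)
Description: Combines the Google Fact Check Tools API with fact_checker() to determine the
truthfulness of the provided information.
Input: q (string) - The statement or question to be fact-checked.
Output: Prints whether the information is true or false.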
4. External APIs:
Google Fact Check Tools API: Used for fetching fact-checking information based on user
queries.
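5. Main Functionality:
Combine_Fact_Checker() serves as the main interface for fact-checking. It takes a statement or
question as input, combines the fact-checking methods, and prints whether the information is
true or false.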
6. Error Handling:
Exception handling is implemented to capture errors that may occur during API calls or
processing of input statements.
(DeepFake Detection)
gradio: Gradio is a library that allows you to quickly create UIs for your machine learning models. It's
particularly useful for building web-based interfaces.
torch: PyTorch is an open-source machine learning library used for tasks such as natural language
processing and computer vision.
facenet_pytorch: This library provides pre-trained PyTorch models for face detection (MTCNN,
Multi-task Cascaded Convolutional Networks) and face recognition (InceptionResnetV1).
numpy: NumPy is a library for numerical computing with Python, providing support for large, multi-
dimensional arrays and matrices, along with a collection of mathematical functions to operate on
these arrays.
PIL (Python Imaging Library): PIL is a library for opening, manipulating, and saving many different
image file formats.
cv2 (OpenCV): OpenCV is a library of programming functions mainly aimed at real-time computer
vision. It provides tools for image processing and computer vision.
pytorch_grad_cam: PyTorch-Grad-CAM is a library for visualizing the regions of an image that a CNN
focuses on while making a decision.
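Model Used:
The code uses the InceptionResnetV1 model pre-trained on the VGGFace2 face recognition dataset.
It is configured as a classifier with a single output (num_classes=1), and its weights are loaded
from a checkpoint so that it scores a face as real or fake.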
Functions Defined:
predict(input_image:Image.Image): This function takes an input image, detects faces using MTCNN,
preprocesses the detected face, generates class activation maps (CAM) using GradCAM, performs
inference using the pre-trained InceptionResnetV1 model, and returns the predicted class ("real" or
"fake") along with the confidence scores and the face with explainability (highlighted regions
indicating decision-making areas).
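The confidence scores returned by predict come from a single model output passed through a sigmoid. A minimal sketch of that mapping follows; it uses Python's math module instead of torch.sigmoid so that it runs without PyTorch, and the function name confidences_from_logit is hypothetical.

```python
import math

def confidences_from_logit(logit):
    """Map a single raw model output to 'real' and 'fake' confidence scores."""
    fake = 1.0 / (1.0 + math.exp(-logit))  # sigmoid of the raw output
    return {"real": 1.0 - fake, "fake": fake}

scores = confidences_from_logit(2.0)
label = max(scores, key=scores.get)  # class with the higher confidence
print(label, scores)
```

A positive raw output therefore leans towards "fake" and a negative one towards "real", with the two scores always summing to 1.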
The gr.Interface class is used to create a simple web-based UI for the predict function. It takes the
function (predict), defines the input and output components of the UI (input image, predicted class
label, and the image with explainability), and launches the interface.
The launch() method is called on the interface object; it starts a local web server and serves the
Gradio interface so that it can be opened in a browser.
Step-by-Step Explanation:
(Text-based Fact-Checker)
Importing Libraries:
The code begins by importing necessary libraries such as NLTK, requests, numpy, and sklearn.
These libraries provide functionalities for natural language processing, HTTP requests,
numerical computations, and similarity calculations.
Loading Word Embeddings:
The code loads pre-trained GloVe word embeddings from a file (glove.6B.50d.txt). These
embeddings represent each word as a dense 50-dimensional vector that captures its
semantic meaning.
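Each line of a GloVe text file holds a word followed by its vector components, separated by spaces. The sketch below parses that layout from an in-memory sample instead of the real file; the function name load_embeddings and the sample values are made up for illustration, and plain float lists stand in for numpy arrays.

```python
import io

def load_embeddings(lines):
    """Parse lines of the form 'word v1 v2 ... vn' into a dict of float lists."""
    embeddings = {}
    for line in lines:
        values = line.split()
        if len(values) < 2:
            continue  # skip blank or malformed lines
        embeddings[values[0]] = [float(v) for v in values[1:]]
    return embeddings

# Two made-up lines standing in for the real file contents.
sample_file = io.StringIO("news 0.1 -0.2 0.3\nfake 0.4 0.5 -0.6\n")
embeddings = load_embeddings(sample_file)
print(embeddings["news"])
```

Passing an open file object for glove.6B.50d.txt instead of the StringIO sample would work the same way, since both yield one line per iteration.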
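Preprocessing Functions:
preprocess(sentence): Tokenizes the input sentence, converts words to lowercase, and removes
stopwords.
get_sentence_embedding(sentence): Computes the sentence embedding as the mean of the GloVe
embeddings of the words in the sentence.
Gemini Integration:
The code uses the ChatGoogleGenerativeAI class from the langchain_google_genai module
to interact with the Google Generative AI (Gemini). It initializes an instance of the class (llm)
with the desired model and Google API key.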
Synonym Retrieval:
Synonyms for "false" and "true" are obtained using WordNet and stored in lists (synonyms
and synonymsT respectively). Additionally, speaker pronouns and affirmative terms are
included in the synonymsT list.
Fact-Checking Functions:
fact_checker(q): Checks the validity of a statement using synonyms of "false" and "true",
and if necessary, invokes Gemini AI for clarification.
Combine_Fact_Checker(q): Combines multiple fact-checking methods, including an
external fact-checking API and the fact_checker() function, to check the truthfulness of
the provided information.
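The listing of fact_checker is incomplete, so its exact matching rule is not known. As one plausible illustration of synonym-based checking, the sketch below labels a free-text answer by the first token that appears in a "true" or "false" word list. The word lists, the function name classify_answer, and the first-match rule are all assumptions; the assignment builds its lists from WordNet.

```python
# Hypothetical word lists; the assignment builds its lists from WordNet.
FALSE_WORDS = {"false", "incorrect", "untrue", "wrong", "no"}
TRUE_WORDS = {"true", "correct", "yes", "right"}

def classify_answer(answer):
    """Label a free-text answer by the first token found in either word list."""
    cleaned = answer.lower().replace(",", " ").replace(".", " ")
    for token in cleaned.split():
        if token in FALSE_WORDS:
            return "False Information"
        if token in TRUE_WORDS:
            return "True Information"
    return None  # undecided: a caller could ask the language model for clarification

print(classify_answer("No, that claim is incorrect."))
print(classify_answer("Yes, this is true."))
```

Returning None for an undecided answer leaves room for the "invoke Gemini for clarification" step instead of forcing a verdict.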
External APIs:
The code queries the Google Fact Check Tools API to fetch fact-checking information based
on user queries. It handles cases where the API response is insufficient by falling back to
internal fact-checking methods.
Main Functionality:
The Combine_Fact_Checker() function serves as the main interface for fact-checking. It takes
a statement or question as input, combines various fact-checking methods, and prints
whether the information is true or false.
Error Handling:
Exception handling is implemented to catch errors that may occur during API calls or
processing of input statements. This ensures graceful handling of errors and prevents
program crashes.
Code:
(Text-based Fact-Checker)
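import nltk
import requests
import numpy as np
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize
from sklearn.metrics.pairwise import cosine_similarity
from langchain_google_genai import ChatGoogleGenerativeAI

# Load the pre-trained GloVe vectors into a word -> vector dictionary.
word_embeddings = {}
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        embedding = np.asarray(values[1:], dtype='float32')
        word_embeddings[word] = embedding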
stop_words = set(stopwords.words('english'))

def preprocess(sentence):
    # Tokenize, lowercase, and drop stopwords and non-alphabetic tokens.
    tokens = nltk.word_tokenize(sentence)
    words = [word.lower() for word in tokens if word.isalpha() and word.lower() not in stop_words]
    return words
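def get_sentence_embedding(sentence):
    words = preprocess(sentence)
    # Keep only the words that have a GloVe vector.
    word_vectors = [word_embeddings[word] for word in words if word in word_embeddings]
    if len(word_vectors) > 0:
        sentence_embedding = np.mean(word_vectors, axis=0)
        return sentence_embedding
    else:
        return None

def semantic_similarity(sentence1, sentence2):
    embedding1 = get_sentence_embedding(sentence1)
    embedding2 = get_sentence_embedding(sentence2)
    if embedding1 is not None and embedding2 is not None:
        similarity_score = cosine_similarity([embedding1], [embedding2])[0][0]
        return similarity_score
    else:
        return 0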
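# llm is the ChatGoogleGenerativeAI instance described earlier; its
# initialization (model name and API key) is not part of this listing.
def gemini(q):
    response = llm.invoke([q])
    return response.content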
# Collect WordNet synonyms of "false".
synonyms = []
for syn in wordnet.synsets("false"):
    for i in syn.lemmas():
        synonyms.append(i.name())
synonyms = list(set(synonyms))
print(synonyms)

# Collect WordNet synonyms of "true", plus speaker pronouns and "yes".
synonymsT = []
for syn in wordnet.synsets("true"):
    for i in syn.lemmas():
        synonymsT.append(i.name())
synonymsT = list(set(synonymsT))
speaker_pronouns = ['i', 'me', 'myself', 'we', 'us', 'ourselves', "i'm", "i've", "i'd", "i'll", "my", "mine"]
synonymsT += speaker_pronouns
synonymsT.append("yes")
print(synonymsT)
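def fact_checker(q):
    # NOTE: the conditions and the Gemini call are missing from this listing.
    # Only the surviving lines are shown; `...` marks each gap.
    try:
        q_token = word_tokenize(q)
        if ...:
            if ...:
                print("True Information")
            else:
                print("False Information")
        else:
            ...
    except Exception as e:
        print(e)  # assumed; the handler body is missing from this listing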
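# API_KEY: your Google API key (the hardcoded key has been removed from this listing).
# negative_marker: the ratings treated as "false"; its definition is not shown in this listing.
def Combine_Fact_Checker(q):
    if len(q) == 0:
        return
    try:
        # Passing params lets requests URL-encode the query.
        resp = requests.get(
            "https://factchecktools.googleapis.com/v1alpha1/claims:search",
            params={"pageSize": 3, "query": q, "key": API_KEY},
        ).json()
        # Assumed condition: no claims returned, so fall back to the internal checker.
        if not resp.get("claims"):
            fact_checker(q.lower())
        else:
            textual_ratings = []
            for claim in resp["claims"]:
                for review in claim.get("claimReview", []):
                    textual_ratings.append(review['textualRating'])
            if set(textual_ratings).issubset(set(negative_marker)):
                print("False Information")
            else:
                print("True Information")
    except Exception as e:
        print(e)  # assumed; the handler body is missing from this listing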
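(DeepFake Detection)
import gradio as gr
import torch
import torch.nn.functional as F
from facenet_pytorch import MTCNN, InceptionResnetV1
import numpy as np
from PIL import Image
import cv2
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image
import warnings
warnings.filterwarnings("ignore")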
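DEVICE = 'cpu'

# Face detector.
mtcnn = MTCNN(
    select_largest=False,
    post_process=False,
    device=DEVICE
).to(DEVICE).eval()

# Binary real/fake classifier built on InceptionResnetV1.
model = InceptionResnetV1(
    pretrained="vggface2",
    classify=True,
    num_classes=1,
    device=DEVICE
)

# CHECKPOINT_PATH is a placeholder: the checkpoint file name is missing from this listing.
checkpoint = torch.load(CHECKPOINT_PATH, map_location=torch.device(DEVICE))
model.load_state_dict(checkpoint['model_state_dict'])
model.to(DEVICE)
model.eval()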
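def predict(input_image:Image.Image):
    """Predict the label of the input_image"""
    # NOTE: lines marked "reconstructed" are missing from the original listing and
    # were inferred from the description of predict(); treat them as assumptions.
    try:
        face = mtcnn(input_image)
        if face is None:
            raise Exception('No face detected')  # reconstructed
        face = face.unsqueeze(0)  # reconstructed: add the batch dimension
        face = F.interpolate(face, size=(256, 256), mode='bilinear', align_corners=False)  # reconstructed, assumed size
        # Keep a uint8 copy of the face for plotting.
        prev_face = face.squeeze(0).permute(1, 2, 0).cpu().detach().int().numpy()  # reconstructed
        prev_face = prev_face.astype('uint8')
        face = face.to(DEVICE)
        face = face.to(torch.float32)
        face = face / 255.0  # reconstructed: scale pixels to [0, 1]
        face_image_to_plot = face.squeeze(0).permute(1, 2, 0).cpu().detach().numpy()  # reconstructed
        # Grad-CAM heatmap computed on the last layer of block8.branch1.
        target_layers=[model.block8.branch1[-1]]
        cam = GradCAM(model=model, target_layers=target_layers)  # reconstructed
        targets = [ClassifierOutputTarget(0)]
        grayscale_cam = cam(input_tensor=face, targets=targets)  # reconstructed
        grayscale_cam = grayscale_cam[0, :]
        visualization = show_cam_on_image(face_image_to_plot, grayscale_cam, use_rgb=True)  # reconstructed
        face_with_mask = cv2.addWeighted(prev_face, 1, visualization, 0.5, 0)  # reconstructed
        with torch.no_grad():
            output = torch.sigmoid(model(face).squeeze(0))
            real_prediction = 1 - output.item()
            fake_prediction = output.item()
            confidences = {
                'real': real_prediction,
                'fake': fake_prediction
            }
        return confidences, face_with_mask  # reconstructed
    except Exception as e:
        print(e)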
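# The two gr.Image components and their labels are assumptions; only
# gr.Label(label="Class") survives in the original listing.
interface = gr.Interface(
    fn=predict,
    inputs=[
        gr.Image(label="Input Image", type="pil"),
    ],
    outputs=[
        gr.Label(label="Class"),
        gr.Image(label="Face with Explainability"),
    ],
).launch()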