Fake Reviews Detector Project (132,133)


A D PATEL INSTITUTE OF TECHNOLOGY

DEPARTMENT OF INFORMATION TECHNOLOGY


SUBJECT NAME & COURSE CODE: ARTIFICIAL INTELLIGENCE (202044503)

Fake Reviews Detector Project


1. Introduction
The Fake Reviews Detector project classifies a given review as real or fake using machine
learning. It applies natural language processing (NLP) to clean the review text, converts the
cleaned text into TF-IDF features, and trains a Logistic Regression classifier on those features.
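
As a rough illustration of the classification step, logistic regression scores a review by taking a weighted sum of its features and passing the result through the sigmoid function; a probability above 0.5 maps to the fake label (1). This is a minimal sketch, not the project's trained model: the weights, bias, and feature values are hypothetical, and the helper name `predict_fake_probability` is an assumption made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_fake_probability(features, weights, bias):
    """Weighted sum of the features plus a bias, squashed by the sigmoid."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


# Hypothetical values for illustration only (not learned from the dataset).
weights = [1.5, -2.0, 0.5]
bias = -0.25
features = [0.6, 0.1, 0.3]

probability = predict_fake_probability(features, weights, bias)
label = 1 if probability > 0.5 else 0  # 1 = fake, 0 = real
print(f"P(fake) = {probability:.3f}, predicted label = {label}")
```

Training amounts to choosing the weights and bias so that these probabilities match the labelled reviews as closely as possible.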

2. Dataset Description
The dataset used for this project contains two columns: 'review' and 'label'. The 'review' column
contains the text of the review, while the 'label' column indicates whether the review is real (0) or
fake (1).
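
Before training, it can help to confirm that the file really has the two columns described above and that every label is 0 or 1. The following is a minimal sketch using an in-memory CSV with made-up rows; the helper name `summarize_labels` and the sample data are assumptions, not part of the project.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for the real CSV file (illustration only).
SAMPLE_CSV = """review,label
"Great phone, battery lasts two days.",0
"Best product ever!!! Buy now!!!",1
"Stopped working after a week.",0
"""


def summarize_labels(csv_file):
    """Validate the expected columns and count reviews per label."""
    reader = csv.DictReader(csv_file)
    missing = {"review", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    counts = Counter()
    # The header is line 1, so data rows start at line 2.
    for row_number, row in enumerate(reader, start=2):
        if row["label"] not in {"0", "1"}:
            raise ValueError(f"Row {row_number}: unexpected label {row['label']!r}")
        counts[int(row["label"])] += 1
    return counts


counts = summarize_labels(io.StringIO(SAMPLE_CSV))
print("Real (0):", counts[0], "| Fake (1):", counts[1])
```

A strongly unbalanced count would be a reason to read the accuracy figure with caution, since a model can score well by always predicting the majority class.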

3. Code Overview
The code is structured into several sections: data preprocessing, feature extraction, model
training, evaluation, and prediction.
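
For the feature extraction stage, the code in Section 4 relies on scikit-learn's TfidfVectorizer. The sketch below reproduces the idea in plain Python under that vectorizer's default settings as I understand them (raw term counts, smoothed IDF, L2 normalization); the function name `tfidf_vectors` and the three sample documents are hypothetical, and tokenization is simplified to whitespace splitting.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, rows) where each row holds L2-normalized TF-IDF weights."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rarer terms get larger values.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocabulary, rows


# Made-up documents for illustration only.
docs = ["great product great price", "terrible product", "great service"]
vocabulary, rows = tfidf_vectors(docs)
for term, weight in zip(vocabulary, rows[0]):
    print(f"{term:10s} {weight:.3f}")
```

Terms that occur in many reviews receive a lower weight than terms concentrated in a few, which is what lets the classifier focus on distinctive wording.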

4. Code Snippet
import pandas as pd
import numpy as np
import re
import nltk
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

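# Build the stop-word set and the stemmer once instead of on every call
stop_words = set(stopwords.words('english'))
ps = PorterStemmer()

def preprocess_review(review):
    # Replace every non-letter character with a space
    review = re.sub('[^a-zA-Z]', ' ', review)
    # Lowercase the text and split it into words
    review = review.lower().split()
    # Drop stop words and stem the remaining words
    review = [ps.stem(word) for word in review if word not in stop_words]
    return ' '.join(review)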

# Load your dataset
df = pd.read_csv('reviews.csv')  # Ensure this path is correct

Enrollment Numbers : 12202080601132 , 12202080601133
Names : Patel Vaibhavi , Rana Vairagi

# Preprocess reviews
df['processed_review'] = df['review'].apply(preprocess_review)

# Feature extraction
tfidf = TfidfVectorizer(max_features=5000)
X = tfidf.fit_transform(df['processed_review']).toarray()
y = df['label']

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Print evaluation metrics
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

# Predicting a sample review
def predict_review(review_text):
    processed_review = preprocess_review(review_text)
    review_tfidf = tfidf.transform([processed_review]).toarray()
    return model.predict(review_tfidf)

# Example usage for prediction
review = "This product is absolutely amazing! I love it!"
prediction = predict_review(review)
print("Prediction (0 = Real, 1 = Fake):", prediction[0])
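
One point worth noting about the script above: the TF-IDF vectorizer is fitted on the whole dataset before the train/test split, so its vocabulary and IDF statistics also see the test reviews. A possible alternative, sketched here under assumptions, is to wrap the vectorizer and the classifier in a scikit-learn Pipeline that is fitted on the training split only. The toy reviews and labels are made up for illustration, and the stemming and stop-word steps are omitted to keep the sketch short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Made-up toy data for illustration; the real project reads its own CSV file.
reviews = [
    "battery lasts long and screen is sharp",
    "arrived late but works as described",
    "decent value, the strap feels cheap",
    "sound quality is fine for the price",
    "best product ever buy now amazing amazing",
    "amazing amazing perfect love it buy now",
    "perfect best ever must buy amazing deal",
    "love love love best purchase ever buy now",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = real, 1 = fake

# Split the raw text first, keeping both classes in each split.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    reviews, labels, test_size=0.25, random_state=42, stratify=labels
)

# The vectorizer inside the pipeline is fitted on the training texts only.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000)),
    ("clf", LogisticRegression()),
])
pipeline.fit(train_texts, train_labels)

predictions = pipeline.predict(test_texts)
print("Held-out predictions:", list(predictions))
```

The pipeline also keeps the TF-IDF matrix sparse, which avoids the memory cost of calling `.toarray()` on a large vocabulary.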

5. Expected Outputs
The expected outputs include:
- Accuracy of the model.
- A classification report detailing precision, recall, and F1-score.
- A confusion matrix showing, for each class, how many reviews were classified correctly and incorrectly.
- Predictions for sample reviews indicating whether they are real or fake.
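
To make the relationship between these outputs concrete, the sketch below derives accuracy, precision, recall, and F1-score from the four confusion-matrix counts, treating fake (1) as the positive class. The label lists are made up for illustration and are not results from the project; the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion counts and metrics with 1 (fake) as the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, flagged as fake
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real, kept as real
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real, flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denominator if denominator else 0.0,
    }


# Made-up labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Precision answers how many flagged reviews were actually fake, while recall answers how many fake reviews were caught; the two can diverge sharply even when accuracy looks high.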

6. Conclusion
The Fake Reviews Detector project showcases the application of machine learning and natural
language processing to tackle the problem of fake reviews. By using this model, businesses can
better identify and mitigate the impact of deceptive reviews.
