Enron-Spam-Detection-using-NLP-ML

This project uses the Enron email dataset to build a spam detection model with classical machine learning techniques. Emails are classified from their subject lines and message bodies, and both the Logistic Regression and Multinomial Naive Bayes classifiers reach 90–91% accuracy.

Dataset Overview

Original dataset size: 33,716 emails

Fields: Message_ID, Date, Subject, Message, Label

Message_ID and Date were retained for reference but not used in modeling.

Duplicates were removed based on Subject and Message content after cleaning.

Preprocessing Steps

✅ Subject Column: Removed non-alphabetic characters.

Removed duplicate entries based on cleaned text.

✅ Message Column: Applied the same cleaning as for Subject (non-alphabetic characters removed).

Duplicate messages removed after cleaning.

Final dataset size: 23,575 unique entries
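The cleaning and de-duplication steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the regular expression, the lowercasing, the helper names `clean_text` and `clean_and_dedupe`, and the choice to de-duplicate on the cleaned (Subject, Message) pair are all assumptions, and the three sample emails are made up.

```python
import re


def clean_text(text):
    """Replace runs of non-alphabetic characters with a space, then trim and lowercase.

    Lowercasing is an assumption; the README only mentions removing
    non-alphabetic characters.
    """
    letters_only = re.sub(r"[^A-Za-z]+", " ", text or "")
    return letters_only.strip().lower()


def clean_and_dedupe(rows):
    """Clean Subject and Message, keeping the first row seen for each cleaned pair."""
    seen = set()
    unique_rows = []
    for row in rows:
        subject = clean_text(row.get("Subject"))
        message = clean_text(row.get("Message"))
        key = (subject, message)
        if key in seen:
            continue  # same cleaned text already kept
        seen.add(key)
        unique_rows.append({**row, "Subject": subject, "Message": message})
    return unique_rows


# Made-up sample rows; the first two become identical after cleaning.
emails = [
    {"Subject": "Win $$$ now!!!", "Message": "Click here: http://x.y", "Label": "spam"},
    {"Subject": "win now", "Message": "click here http x y", "Label": "spam"},
    {"Subject": "Meeting at 3pm", "Message": "See you in room 12.", "Label": "ham"},
]
cleaned = clean_and_dedupe(emails)
```

Duplicates are detected after cleaning, so two emails that differ only in punctuation or digits collapse into one entry.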

Feature Extraction

Used CountVectorizer from scikit-learn to convert text into a Bag-of-Words representation.

Applied separately to both Subject and Message columns.
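As a rough sketch of this step, one CountVectorizer can be fitted per column so that each field gets its own vocabulary. The sample texts and variable names below are illustrative assumptions, and default CountVectorizer settings are used because the README does not state which parameters the notebook sets.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up cleaned text standing in for the Subject and Message columns.
subjects = ["win money now", "meeting agenda", "win a free prize"]
messages = [
    "click here to claim money",
    "agenda attached for the meeting",
    "claim your free prize now",
]

# One vectorizer per column, so vocabularies are learned independently.
subject_vectorizer = CountVectorizer()
message_vectorizer = CountVectorizer()

# Each result is a sparse document-term matrix of word counts.
X_subject = subject_vectorizer.fit_transform(subjects)
X_message = message_vectorizer.fit_transform(messages)

print(X_subject.shape, X_message.shape)
print(sorted(subject_vectorizer.vocabulary_))
```

With default settings, CountVectorizer lowercases text and ignores single-character tokens, so the word "a" in the third subject does not appear in the vocabulary.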

🤖 Models Used

Multinomial Naive Bayes

Logistic Regression

Both models were evaluated individually on:

Subject features

Message features

🏆 Achieved 90–91% accuracy on both fields using both models.
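A minimal sketch of how both classifiers might be trained and scored on a single text field is shown below. The toy data, the `evaluate` helper, the 75/25 stratified split, and the `random_state` are assumptions for illustration; the README does not describe the notebook's split or hyperparameters, and accuracy on this tiny sample says nothing about the reported results.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up examples: 1 = spam, 0 = ham.
texts = [
    "win free money now",
    "claim your free prize today",
    "cheap loans click here",
    "free offer limited time",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch schedule next week",
    "project status update attached",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def evaluate(texts, labels, model):
    """Split, vectorize (fitting on the training part only), train, and return accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=42, stratify=labels
    )
    vectorizer = CountVectorizer()
    X_train_bow = vectorizer.fit_transform(X_train)
    X_test_bow = vectorizer.transform(X_test)  # reuse the training vocabulary
    model.fit(X_train_bow, y_train)
    return accuracy_score(y_test, model.predict(X_test_bow))


models = {
    "MultinomialNB": MultinomialNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}
results = {name: evaluate(texts, labels, model) for name, model in models.items()}
print(results)
```

Calling `evaluate` once with Subject text and once with Message text would mirror the per-field evaluation described above.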

🚀 Future Improvements

Combine both Subject and Message features

Try TF-IDF instead of CountVectorizer

Experiment with deep learning models (e.g. LSTM, BERT)
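The first two ideas could be explored together along these lines. This is only a sketch of one possible approach, not something the project implements: it swaps in TfidfVectorizer for each column and stacks the two sparse matrices side by side with `scipy.sparse.hstack`. The sample data and the `predict` helper are hypothetical.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Made-up examples: 1 = spam, 0 = ham.
subjects = ["win free money", "monday meeting agenda", "claim your prize", "project status update"]
messages = [
    "click here to claim your cash",
    "agenda attached for review",
    "free prize waiting click now",
    "status report attached",
]
labels = [1, 0, 1, 0]

# Separate TF-IDF vocabularies for each field.
subject_tfidf = TfidfVectorizer()
message_tfidf = TfidfVectorizer()
X_subject = subject_tfidf.fit_transform(subjects)
X_message = message_tfidf.fit_transform(messages)

# Concatenate columns so each email has subject features followed by message features.
X_combined = hstack([X_subject, X_message]).tocsr()

model = LogisticRegression(max_iter=1000)
model.fit(X_combined, labels)


def predict(subject, message):
    """Vectorize one email with the fitted vectorizers and return the predicted label."""
    features = hstack(
        [subject_tfidf.transform([subject]), message_tfidf.transform([message])]
    ).tocsr()
    return int(model.predict(features)[0])


print(predict("free prize", "click now"))
```

Keeping the two vectorizers separate lets a word carry different weight depending on whether it appears in the subject or the body.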
