CPP Report





“Email Spam Detection”

Submitted By :
Kartik Shinde [22]

Shreyash Jangam [17]

Pream Dongare [09]

Under The Guidance Of

Mrs. S. K. Kawale


In Partial Fulfilment Of


PUNE – 411041









Mrs. S. K. Kawale (Project Guide)
Mrs. P. V. Javkar (Project Co-Ordinator)
Mrs. A. V. Kurkute (Head Of Dept.)
Dr. (Mrs.) M. S. Jadhav (Principal)

"The real spirit of achieving a goal is through the way of excellence and austere
discipline." The satisfaction and euphoria that accompany the successful
completion of any task would be incomplete without mentioning the people who
made it possible and whose support was a constant source of encouragement that
crowned our efforts with success.

We express our overwhelming gratitude to our respected guide,
Mrs. S. K. Kawale, for her constant encouragement and valuable guidance during
the completion of this work. She is a true guide who guided us with moral
values. We are happy that we could work under her thoughtful guidance.

We are very thankful to our Computer Technology coordinator,
Mrs. P. V. Javkar, and the Head of Department, Mrs. A. V. Kurkute, for their
support. We also wish to record our gratitude and thanks to
Mrs. S. K. Kawale for her enthusiastic guidance and help in the successful
completion of the seminar.

We are deeply indebted to our principal, Dr. (Mrs.) Mrunalini S. Jadhav, and
would like to express our sincere thanks to her.

Finally, we express our honest and sincere gratitude to all other staff members
of the computer department and to our colleagues who directly or indirectly
encouraged us, helped us, and critiqued us in the accomplishment of this work.

Group Members:

Kartik Shinde
Shreyash Jangam
Pream Dongare
Email spam has become a major problem: with the rapid growth in the number
of internet users, email spam is also increasing. People use it for illegal
and unethical conduct, phishing, and fraud, sending malicious links through
spam emails that can harm our systems and sneak into them. Creating a fake
profile and email account is easy for spammers; they pretend to be genuine
people in their spam emails and target people who are not aware of these
frauds. It is therefore necessary to identify fraudulent spam mails. This
project identifies such spam using machine learning techniques. This paper
discusses the machine learning algorithms and applies them to our data
sets.

Title

Acknowledgement

Abstract

1. Introduction and Background

1.1 Introduction

1.2 Background

2. Literature Survey and Problem Definition

2.1 Literature Survey

2.2 Problem Definition

2.3 Specification

3. Proposed Methodology

3.1 Proposed Methodology

3.2 Action Plan

4. References and Bibliography

4.1 Papers

4.2 Books

4.3 Websites

1. Introduction and Background

1.1 Introduction
Email spam, or electronic mail spam, refers to "using email to send unsolicited emails or
advertising emails to a group of recipients. Unsolicited emails mean the recipient has not granted
permission for receiving those emails." The use of spam emails has been increasing over the last
decade, and spam has become a major nuisance on the internet. Spam wastes storage and time and
slows message delivery. Automatic email filtering may be the most effective method of detecting
spam, but nowadays spammers can easily bypass these spam filtering applications. Several years ago,
most spam could be blocked manually because it came from certain email addresses. In this project, a
machine learning approach is used for spam detection. The major approaches adopted for spam filtering
include "text analysis, white and blacklists of domain names, and community-based techniques". Text
analysis of the contents of mails is a widely used method for detecting spam. Many solutions
deployable on the server and client sides are available. Naïve Bayes is one of the most well-known
algorithms applied in these procedures. However, rejecting mails based essentially on content
analysis can be a difficult issue in the event of false positives: users and organizations generally
would not want any legitimate messages to be lost. The blacklist approach was probably the earliest
technique adopted for filtering spam. The technique is to accept all mails except those from
explicitly blacklisted domains or email addresses. As newer domains keep entering the category of
spamming domain names, this technique tends to no longer work so well.
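
The list-based approach described above (accept every mail except those from explicitly listed domains or email addresses) can be sketched in a few lines of Python. This is only an illustrative sketch: the domain names, the addresses, and the function names `sender_domain` and `is_blacklisted` are hypothetical and are not taken from the project.

```python
# Hypothetical blacklist entries, used only for illustration.
# A real deployment would load these from a maintained list.
BLACKLISTED_DOMAINS = {"spam-offers.example", "cheap-pills.example"}
BLACKLISTED_ADDRESSES = {"winner@lottery.example"}


def sender_domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def is_blacklisted(address: str) -> bool:
    """Accept all mail except mail from explicitly blacklisted senders."""
    normalized = address.strip().lower()
    return (
        normalized in BLACKLISTED_ADDRESSES
        or sender_domain(normalized) in BLACKLISTED_DOMAINS
    )


if __name__ == "__main__":
    for sender in ["alice@college.example", "promo@spam-offers.example"]:
        verdict = "rejected" if is_blacklisted(sender) else "accepted"
        print(sender, "->", verdict)
```

The sketch also shows the weakness noted above: a spammer who registers a new domain that is not yet on the list passes the check unchanged.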

What are the benefits of Email Spam Detection?

• Spam is best known for spreading viruses and scams to unwitting people
across the internet, but it can also cause plenty of problems for the modern
business. This is why effective spam filtering, like Securence spam filtering, is an
important part of running a successful business in the 21st century. Here are just
a few reasons why spam filtering is important, not only for keeping you safe
from viruses but also for helping your company be more effective and successful.

• The average office worker receives roughly 121 emails per day, half of which are
estimated to be spam. But even at 60 emails a day, it is easy to lose important
communications in the sheer number coming in. This is one of the
lesser-known benefits of spam filtering: it simply
streamlines your inbox. With less garbage coming into your inbox, you can
go through your emails more effectively and stay in touch with those
who matter.

• Spam filtering protects against malware, viruses, and other forms of malicious attacks.
• Every day, someone falls prey to a phishing scam, a particular kind of spam-based
scheme in which someone thinks they are getting a legitimate email and
ends up divulging credit card information.
1.2 Background

Email has become the most important medium of communication; through
internet connectivity, a message can be delivered anywhere in the world. More than
270 billion emails are exchanged daily, and about 57% of these are spam.
Spam emails, also known as non-self, affect or steal personal information, such as
banking and other financial details, or otherwise cause harm to an individual, a
corporation, or a group of people. Besides advertising, they may contain links to
phishing or malware-hosting websites set up to steal confidential information. Spam
is a serious issue that is not just annoying to end users but also financially
damaging and a security risk.

Hence this system is designed to detect unsolicited and unwanted emails and block
them, reducing the volume of spam messages, which would be of great benefit to
individuals as well as to companies. In the future, this system can be implemented
using different algorithms, and more features can be added to the existing
system.

Email Spam Detection Database

2. Literature Survey and Problem Definition

2.1 Literature Survey

Bo Yu and Zong-ben Xu (2008) performed a comparative analysis of content-based
spam classification using four machine learning algorithms: Naïve Bayesian (NB),
Neural Network (NN), Support Vector Machine (SVM), and Relevance Vector
Machine (RVM). The analysis was performed with different training datasets and
feature selections. The results demonstrated that the NN algorithm is not good
enough to be used as a tool for spam rejection, and that SVM and RVM perform
better than the NB classifier. Despite its slow learning, RVM is still a better
algorithm than SVM for spam classification, with less execution time and fewer
relevance vectors.

Tiago A. Almeida and Akebo Yamakami (2010) performed a comparative analysis of
content-based filtering for spam. Their paper discussed seven modified versions
of the Naïve Bayes classifier and compared the results with a linear Support
Vector Machine on six large, open datasets. The results demonstrated that SVM,
Boolean NB, and Basic NB are the best algorithms for spam detection; however,
SVM achieved an accuracy rate higher than 90% on almost all of the datasets
used.

Loredana Firte, Camelia Lemnaru and Rodica Potolea (2010) performed a
comparative analysis of a spam detection filter using the k-NN algorithm and a
resampling approach. Their paper uses the k-NN algorithm to classify spam emails
on a predefined dataset, using features selected from the content and properties
of the emails. The datasets were resampled to an appropriate size and positive
distribution to make the algorithm efficient for feature selection.

Ms. D. Karthika Renuka, Dr. T. Hamsapriya, Mr. M. Raja Chakkaravarthi and
Ms. P. Lakshmisurya (2011) performed a comparative analysis of spam
classification based on supervised learning using several machine learning
techniques. The comparison covered three classification algorithms: Naïve
Bayes, J48, and Multilayer Perceptron (MLP). The results demonstrated high
accuracy for MLP but high time consumption, while Naïve Bayes was less accurate
than MLP but fast in execution and learning. The accuracy of Naïve Bayes was
enhanced using FBL feature selection, combining filtered Bayesian learning with
Naïve Bayes. The modified Naïve Bayes showed an accuracy of 91%.
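
The studies above compare classifiers mainly by accuracy, and false positives (legitimate mail marked as spam) matter just as much in practice. The following minimal sketch shows how both figures can be computed from a list of true labels and a list of predicted labels. The function names and the five example labels are hypothetical and are not results from any of the cited papers.

```python
def confusion_counts(actual, predicted, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as spam."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # spam correctly flagged
        elif guess == positive:
            fp += 1  # legitimate mail wrongly flagged as spam
        elif truth == positive:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate mail correctly accepted
    return tp, fp, tn, fn


def accuracy(actual, predicted):
    """Fraction of messages whose predicted label matches the true label."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(actual, predicted):
    """Fraction of legitimate messages that were wrongly flagged as spam."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    # Made-up labels, only to exercise the functions.
    actual = ["spam", "ham", "ham", "spam", "ham"]
    predicted = ["spam", "ham", "spam", "spam", "ham"]
    print("accuracy:", accuracy(actual, predicted))
    print("false positive rate:", false_positive_rate(actual, predicted))
```

Reporting the false positive rate next to accuracy makes it visible when a classifier reaches a high accuracy only by discarding legitimate mail.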
2.2 Problem Definition

Email Spam Detection was developed for a simple reason: by
detecting unsolicited and unwanted emails, we can prevent spam messages from creeping
into the user's inbox, thereby improving the user experience. Emails are sent through a spam

2.3 Specification

1. Email Spam Detection is developed using machine learning (ML) and Python.

2. Study the syntax and functions of Python.
3. Hardware Requirements: Intel i5 processor, 8 GB RAM.
3. Proposed Methodology

3.1 Proposed Methodology

An e-mail detection method is proposed for the detection of spam. In the
system, four predictive machine learning classifiers, implemented in Python,
were used with various data partitions for training and testing of the models.
Additionally, different hyperparameter values were used in the models. The
system obtained good results.
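
Training and testing with "various data partitions" can be illustrated as follows. This is a minimal sketch under stated assumptions: the report does not give the partition ratios, so the 80/20, 70/30, and 60/40 splits, the fixed seed, the helper name `train_test_split`, and the placeholder messages are all hypothetical.

```python
import random


def train_test_split(samples, test_fraction, seed=42):
    """Shuffle the samples reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Placeholder labelled messages standing in for a real dataset.
    messages = [(f"message {i}", "spam" if i % 4 == 0 else "ham") for i in range(20)]
    for test_fraction in (0.2, 0.3, 0.4):  # assumed partitions, not from the report
        train, test = train_test_split(messages, test_fraction)
        print(f"test fraction {test_fraction}: {len(train)} train, {len(test)} test")
```

Each classifier would then be trained on the `train` part and scored on the `test` part, once per partition and per hyperparameter setting.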

3.2 Action Plan

Sr. No. | Details of Activity | Planned Start Date | Planned Finish Date | Name of Responsible Team Member
01 | Research papers on Email Spam Detection | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
02 | Study the concept of Email Spam Detection data | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
03 | Study the details of required software | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
04 | Advantages and disadvantages of the project | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
05 | Application and future scope of the project for further use | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
06 | Cost estimations | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
07 | Prepare the project | | | Kartik Shinde, Shreyash Jangam, Pream Dongare
4. References and Bibliography

4.1 Papers
1. When we receive a message in the inbox, that message is exported to the dataset. The
message is then classified as spam or not spam using the Naïve Bayes classifier. Before a
received message can be classified, the model has to be trained, which is explained in
the section below. This concept includes an information system.

2. When we receive a message in the inbox, that message is exported to the dataset as shown
below. The message is then classified as spam or not spam.
3. In this system, to solve the problem of spam, a spam classification system is created to
distinguish spam from non-spam. Since spammers may send spam messages many times, it is
difficult to identify them manually every time, so our proposed system uses several
strategies to detect spam. The proposed solution not only identifies spam words
but also identifies the IP address of the system from which the spam message was
sent, so that the next time a spam message is sent from the same system, our proposed
system directly identifies it as blacklisted based on the IP address. An information system
offers a litany of benefits that help to make the process of managing
4. The exported message is classified as spam or not spam using Bayes' theorem and the Naïve
Bayes classifier, following all the steps discussed above and computing the probability of
words in spam and ham messages. The figures below show
messages detected as spam and ham.

5. If "Urgent! Please call 09062703810" is a message exported from the inbox to the
dataset, then, based on the trained dataset and using Bayes' theorem and the Naïve Bayes
classifier, the message is detected as spam, as shown below.
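
The steps listed above (train on labelled messages, compare word probabilities in spam and ham using Bayes' theorem, and blacklist the sender's IP address once spam is detected) can be sketched as follows. This is a minimal illustration and not the project's actual code: the four training messages, the IP addresses, the class names, and the use of add-one (Laplace) smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny made-up training set; a real system would train on a large labelled dataset.
TRAINING_MESSAGES = [
    ("Urgent! You have won a prize, call now", "spam"),
    ("Please call this number to claim your free reward", "spam"),
    ("Are we still meeting for lunch today", "ham"),
    ("I will call you after the lecture", "ham"),
]


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Word-count Naive Bayes with add-alpha smoothing (assumes both classes are trained)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()
        self.vocabulary = set()

    def train(self, labelled_messages):
        for text, label in labelled_messages:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.message_counts[label] += 1
            self.vocabulary.update(tokens)

    def log_score(self, text, label):
        """log P(label) plus the sum of log P(word | label) over known words."""
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        denominator = (sum(self.word_counts[label].values())
                       + self.alpha * len(self.vocabulary))
        for token in tokenize(text):
            if token in self.vocabulary:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][token] + self.alpha)
                                  / denominator)
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


class SpamDetector:
    """Combines the classifier with a blacklist of IPs that already sent spam."""

    def __init__(self, spam_filter):
        self.spam_filter = spam_filter
        self.blacklisted_ips = set()

    def check(self, text, sender_ip):
        if sender_ip in self.blacklisted_ips:
            return "blacklisted"  # known spam source, no classification needed
        label = self.spam_filter.predict(text)
        if label == "spam":
            self.blacklisted_ips.add(sender_ip)
        return label


if __name__ == "__main__":
    spam_filter = NaiveBayesSpamFilter()
    spam_filter.train(TRAINING_MESSAGES)
    detector = SpamDetector(spam_filter)
    # The IP addresses are documentation-range placeholders.
    print(detector.check("Urgent! Please call 09062703810", "203.0.113.7"))
    print(detector.check("Are we meeting after the lecture", "203.0.113.7"))
```

Scores are summed as logarithms so that multiplying many small word probabilities does not underflow; the smoothing constant keeps a word that appears in only one class from forcing the other class's probability to zero.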

4.2 Books
1. Thashina Sultana, K A Sapnaz, "Email Spam Detection: A Complete Guide",
IGERT Independent Publishing Platform, May 20, 2020.

4.3 Websites
1. Email based Spam Detection – IJERT

2. E-mail spam detection - Machine Learning: End-to-End guide for Java developers [Book]
