

E-MAIL SPAM DETECTION

USING
AI AND MACHINE LEARNING
A PROJECT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

BACHELOR OF TECHNOLOGY
IN

COMPUTER SCIENCE AND ENGINEERING


BY

SANJAY KUMAR 1728442

AVNISH MISHRA 1630316

MANOJ KUMAR 1630241

Under the Guidance Of

Ms. CK Raina

HOD of CSE department

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


ADESH INSTITUTE OF TECHNOLOGY

GHARUAN, PUNJAB, INDIA


UNDERTAKING

I declare that the work presented in this project titled “Email spam detection”, submitted to the Department of Computer Science, Faculty of Engineering & Technology, Punjab Technical University, Bareilly for the award of the Bachelor of Technology degree in Computer Science, is my original work. I have not plagiarized this work or submitted it for the award of any other degree. If this undertaking is found to be incorrect, I accept that my degree may be unconditionally withdrawn.

Month, Year

Place

________________________________

(Student Name)
CERTIFICATE

Certified that the work contained in the project titled “Email spam detection”, by Sanjay Kumar, Avnish Mishra, and Manoj Kumar, has been carried out under my supervision and that this work has not been submitted elsewhere for a degree.

Prof. CK Raina

Dept. of Computer Science & Technology

Adesh Institute of Technology


ACKNOWLEDGEMENT

The success and final outcome of this project required a great deal of guidance and assistance
from many people, and we are extremely privileged to have received it throughout the project.
All that we have done is due to such supervision and assistance, and we would not forget to
thank them. We respect and thank Ms. CK Raina for giving us the opportunity to do this
project work and for the support and guidance that enabled us to duly complete the project.
We are extremely thankful to her for providing such support and guidance despite her busy
schedule managing corporate affairs.

We owe our deep gratitude to our project guide, Ms. CK Raina, who took a keen interest in our
project work and guided us throughout, providing all the information necessary for developing
a good system. We will not forget her encouragement and, moreover, her timely support and
guidance until the completion of our project work. We heartily thank our internal project guide
for guidance and suggestions during this project work. We are thankful for, and fortunate to
have received, constant encouragement, support and guidance from all the teaching staff,
which helped us complete our project work successfully. We would also like to extend our
sincere regards to all the laboratory staff for their timely support.

SANJAY KUMAR
AVNISH MISHRA
MANOJ KUMAR
ABSTRACT
E-mail spam is a persistent problem for every e-mail user and continues to be a problem on the
Internet. Spam is unsolicited e-mail, typically an advertisement for a company or product or a
carrier for a virus, that arrives in the recipient's mailbox without consent. Spam e-mails often
consist of many copies of the same message, commercial advertisements, or other irrelevant
content such as pornographic material. Spam filtering techniques are used to protect the
mailbox from such mail. In this project we build a simple e-mail classifier using the Naïve
Bayes classifier, a simple and efficient machine learning algorithm based on Bayes' theorem
of probability that is well suited to problems such as spam classification. The classifier
requires a training data set; here we use the Enron Spam dataset to classify spam and
non-spam (ham) mail. Feature extraction is used to derive features from the mail text. The
goal is to increase the accuracy of the system.
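
The idea behind the classifier can be illustrated with a minimal sketch. The code below is not the project's implementation: it assumes a bag-of-words tokenizer, a multinomial Naïve Bayes model with add-one (Laplace) smoothing, and a handful of made-up training mails standing in for the Enron Spam dataset. The names `tokenize`, `NaiveBayesSpamFilter`, and `TOY_MAILS` are hypothetical and chosen only for this illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        """Record the tokens of one labelled mail ('spam' or 'ham')."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(token | label) over the mail.

        Assumes at least one training mail exists for each label.
        """
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for token in tokenize(text):
            count = self.word_counts[label][token]
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def classify(self, text):
        """Pick the label with the higher log score."""
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


# Made-up training mails; a real run would read the Enron Spam dataset instead.
TOY_MAILS = [
    ("win a free prize now", "spam"),
    ("cheap offer click now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("please review the project report", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
for text, label in TOY_MAILS:
    spam_filter.train(text, label)

print(spam_filter.classify("free offer now"))             # spam
print(spam_filter.classify("project meeting on monday"))  # ham
```

Scores are summed in log space rather than multiplied as raw probabilities, which avoids numerical underflow on long mails. With only four training mails the result is purely illustrative and says nothing about accuracy on real data.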

List of Figures

Figure No. Title Page No.

1. Spam filter process 14

2. Class diagram 16

3. Use Case diagram 17

4. Sequence diagram 18

5. Feature Extraction Method 19

6. The Proposed Methodology 21


7. Process for e-mail spam filtering based on the Naïve Bayes algorithm 22

8. Type of Spam mail 24

9. Type of Ham mail 25

10. Interface 25

11. Tokenized mail data stored in a file by the classifier 25

12. Words classified as Ham/Spam 26

Table of Contents
1. INTRODUCTION………………………………………………………………………1

1.1 TYPES OF SPAM ………………………………………………………….....2

1.1.1 Email Spam………………………………………………………….2

1.1.2 Instant messenger spam…………………………………………...…2

1.1.3 Unsolicited text messages………………………………………..….2

1.1.4 Social networking spam…………………………………………..…2

1.2 PROBLEMS OF SPAM……………………………………………….....……3

1.2.1 Viruses………………………………………………………….....…3

1.2.2 Server problems…………………………………………………...…3

1.2.3 Hacking and Phishing……………………………………………..…3

1.2.4 Productivity threats………………………………………………..…4

1.2.5 Blank spam emails and forwarding spam emails …………………....4

1.3 TYPES OF SPAM E-MAIL FILTERS………………………………………....4

1.3.1 Challenge-Response spam filter……………………………………….4

1.3.2 Rule based scan filtering system………………………………………5

1.3.3 Global black lists spam filter…………………………………………..6

1.3.4 Bayesian Analysis…………………………………………………....6

1.3.5 Permission based spam filter………………………………………...6

2. SRS………………………………………………………………………………………7

2.1 INTRODUCTION……………………………………………………………...7

2.1.1 Purpose……………………………………………………………...7

2.1.2 Scope………………………………………………………………..7

2.1.3 Technologies to be used……………………………………….……7

2.1.4 Overview………………………………………………….……...…7

2.2 OVERALL DESCRIPTION………………………………………………….8

2.2.1 Goals of proposed system…………………………………….…….8

2.2.2 Project Requirements………………………………………….……8

2.2.3 User Characteristics…………………………………………………9

2.2.4 Constraints…………………………………………………………..9

2.3 FEASIBILITY STUDY……………………………………………………….9

2.3.1 Technical feasibility…………………………………………………9

2.3.1.1 Front-end selection……………………………………….10

2.3.1.2 Back-end Selection……………………………………….10

2.3.2 Economical feasibility………………………………………………11

2.3.3 Operational Feasibility………………………………………………11

2.3.4 Schedule feasibility…………………………………………….........11

2.4 DESIGN…………………………………………………….………………..12

2.4.1 Objective………………………………………………….…………12

2.4.2 Scope………………………………………………………….…….12

2.4.3 Advantages……………………………………………………….....12

2.5 REQUIREMENTS………………………………………….…………….......12

2.5.1 Functional Requirements…………………………....………………12

2.5.2 Non-Functional Requirements………………………………………12

2.5.3 Performance Requirements…………………………………….........12

2.5.4 Safety Requirements………………………………………….……..13

2.5.5 Security Requirements…………………………………………...….13

2.5.6 Project Requirements……………………………………………..…13

2.6 SOFTWARE INTERFACE…………………………………………………...13

3. ARCHITECTURE DIAGRAM………………………………………………………..14

3.1 Proposed system……………………………………………………………….14

3.2 Spam filter process…………………………………………………………….14

3.3 Spam filtering algorithm………………………………………………………15

3.4 Class diagram………………………………………………………………….16

3.5 Use case diagram………………………………………………………….......17

3.6 Sequence diagram……………………………………………………………..18

4. PROJECT METHODOLOGIES………………………………………………………..19

4.1 The Feature Extraction……………………………………………………...…19

4.2 The Proposed Methodology………………………………………………….20

4.3 Naïve Bayes classifier……………………………………………………….21

4.4 Pre-processing……………………………………………………………......22

4.5 Experimental Setup ………………………………………………………….23

4.5.1 Print all the directories and files…………………………………....23

4.5.2 Print only files in the Ham and Spam folder…………………….…23

4.5.3 Read all the files in the Ham and Spam folders……………………23

4.5.4 Prepare our data for the Naive Bayes filter…………………………23

4.5.5 Create the Test/Train data, call the Naive Bayes Filter……………..23

5. SCREENSHOTS…………………………………………………………………..….24

6. CONCLUSION……………………………………………………………………………27

7. FUTURE SCOPE……………….…………………………………………………………27

8. REFERENCES…………………………………………………………………………….28
