Part 3 description

CHAPTER 1
INTRODUCTION
Phishing is a simple yet complex mechanism that escalates threats to the security of the Internet community. With little information about the victim, the attacker can produce a believable and personalized email or webpage. It is also hard to catch attackers, as most of them hide their location and work in almost complete anonymity. Even with advanced technology and excellent security software, users can become victims of this scheme, because attackers can use a huge number of methods to lure users into their phishing schemes. A report by Forbes has highlighted that US businesses lose approximately $500 million to phishing attacks every year.
Phishing is defined as an attack that lures users to a fake webpage that masquerades as a legitimate website and aims to make them disclose personal data or credentials. The largest phishing campaigns use spam emails to direct users to fake web pages, relying on impersonation techniques such as email spoofing and Domain Name System (DNS) spoofing, as well as social engineering. In addition, a phishing website tries to mimic the legitimate source by numerous methods, such as embedding important content imported directly from the legitimate website and using similar keywords that refer to the target, including the title, images, and links.
A study by Hassan et al. raises concerns about how successfully current methods detect and filter phishing webpages or emails. Phishing can be considered a semantic attack that easily tricks users through deceptive semantic techniques. The phases of the phishing vector, especially through emails, are Lure, Hook, and Catch. Two mechanisms are suggested to defend against this phishing vector: developing awareness programmes and deploying detection and filtering systems. Awareness programmes are designed to educate users through phishing-defence training. For the deployment of technical defences against phishing, one can apply two-factor authentication in a robust, secure email system, use disguised executable file detection, analyse and detect executable files transferred via emails, and add another layer of security by warning a user when abnormal data are detected in the header source code, such as in a spoofed email.
Objectives of the project:
Phishing is an attempt to obtain confidential information about a user or an organization. It is an act of impersonating a credible webpage to lure users into exposing sensitive data, such as usernames, passwords and credit card information. It has cost the online community and various stakeholders hundreds of millions of dollars. There is a need to detect and predict phishing, and machine learning classification is a promising approach to do so. However, it may take several phases to identify and tune the effective features from the dataset before the selected classifier can be trained to identify phishing sites correctly. This project presents the performance of two feature selection techniques, Feature Selection by Omitting Redundant Features (FSOR) and Feature Selection by Filtering Method (FSFM), applied to the 'Phishing Websites' dataset from the University of California, Irvine, and evaluates phishing webpage detection with three machine learning techniques: Random Forest (RF), Multilayer Perceptron (MLP) and Naive Bayes (NB). The classification performance of these algorithms is further refined using the subsets of features chosen by the feature selection methods. The experimental results show that the optimized Random Forest classifier (RFPT) with FSFM feature selection achieves the highest performance among all the techniques.
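The filtering idea behind FSFM can be illustrated with a small sketch. The exact scoring function used by FSFM is not described here, so this example assumes an information-gain filter: each feature is scored on its own against the class label and only the top k features are kept for the classifiers (RF, MLP, NB). It also assumes features are coded as -1, 0 or 1, as is commonly done with the UCI 'Phishing Websites' dataset. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction obtained by splitting the samples on one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def filter_select(rows, labels, k):
    """Score every feature independently and return the indices of the k best."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((information_gain(column, labels), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest gain first; ties broken by index
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
    rows = [[1, 0, 1], [1, 1, 1], [-1, 0, 1], [-1, 1, 1]]
    labels = [1, 1, -1, -1]
    print(filter_select(rows, labels, 1))  # [0]
```

Because a filter scores features without training a classifier, it is cheap to run before any of the three learners; the selected column indices would then be used to slice the dataset.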
CHAPTER 2
LITERATURE SURVEY
Phishing – challenges and solutions
Phishing is a major threat to all Internet users and is difficult to trace or defend against, since it does not present itself as obviously malicious. In today's society, everything is put online, and the safety of personal credentials is at risk. Phishing is one of the oldest and easiest ways of stealing information from people, and it is used to obtain a wide range of personal details. It also follows a fairly simple approach: send an email, use the email to direct the victim to a site, and have the site steal the information.
Anomaly Based Web Phishing Page Detection
Many anti-phishing schemes have recently been proposed in the literature. Despite all
those efforts, the threat of phishing attacks is not mitigated. One of the main reasons
is that phishing attackers have the adaptability to change their tactics with little cost.
In this paper, we propose a novel approach, which is independent of any specific
phishing implementation. Our idea is to examine the anomalies in Web pages, in
particular, the discrepancy between a Web site's identity and its structural features
and HTTP transactions. It demands neither user expertise nor prior knowledge of
the Web site. The evasion of our phishing detection entails high cost to the
adversary. As shown by the experiments, our phishing detector functions with low
miss rate and low false-positive rate.
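The idea of checking a page for a discrepancy between its claimed identity and its structure can be sketched with two simple signals. This is not the detector from the cited paper; it is a simplified illustration that assumes (a) a page whose links mostly point to other hosts is anomalous, and (b) a page claiming to be a brand whose name is absent from its own host name is anomalous. The function names and the brand "ExampleBank" are hypothetical, and a real system would compare registered domains (for example via the Public Suffix List) rather than raw host names.

```python
from urllib.parse import urljoin, urlparse


def host_of(url):
    """Lower-cased host name of a URL with any leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def foreign_link_ratio(page_url, hrefs):
    """Fraction of a page's links that point to a host other than the page's own."""
    if not hrefs:
        return 0.0
    own = host_of(page_url)
    absolute = [urljoin(page_url, h) for h in hrefs]  # resolve relative links first
    foreign = [u for u in absolute if host_of(u) != own]
    return len(foreign) / len(absolute)


def identity_mismatch(page_url, claimed_brand):
    """True when the brand a page claims to represent does not appear in its host."""
    return claimed_brand.lower() not in host_of(page_url)


if __name__ == "__main__":
    page = "http://secure-login.example.net/index.html"
    links = ["https://www.examplebank.com/help", "https://www.examplebank.com/privacy",
             "/submit.php", "https://examplebank.com/"]
    print(foreign_link_ratio(page, links))          # 0.75
    print(identity_mismatch(page, "ExampleBank"))   # True
```

A cloned page typically keeps the original site's links while its form posts to the attacker's own host, which is why both signals fire on the example.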
Off-the-Hook: An Efficient and Usable Client-Side Phishing Prevention Application
Phishing is a major problem on the Web. Despite the significant attention it has received over the years, there has been no definitive solution. While state-of-the-art solutions have reasonably good performance, they suffer from several drawbacks, including the potential to compromise user privacy, difficulty in detecting phishing websites whose content changes dynamically, and reliance on features that are too dependent on the training data. To address these limitations, we present a new approach for detecting phishing webpages in real time as they are visited by a browser. It relies on modeling inherent phisher limitations stemming from the constraints they face while building a webpage. Consequently, the implementation of our approach, Off-the-Hook, exhibits several notable properties, including high accuracy, brand independence, good language independence, speed of decision, resilience to dynamic phish and resilience to evolution in phishing techniques. Off-the-Hook is implemented as a fully client-side browser add-on, which preserves user privacy. In addition, Off-the-Hook identifies the target website that a phishing webpage is attempting to mimic and includes this target in its warning. We evaluated Off-the-Hook in two different user studies. Our results show that users prefer Off-the-Hook warnings to Firefox warnings.
Comparative analysis of features based machine learning approaches for phishing detection
Machine learning based anti-phishing techniques are based on various features extracted from different sources. These features differentiate a phishing website from a legitimate one. Features are taken from various sources, such as a website's URL, page content, search engine results, digital certificate and traffic, to classify it as phishing or non-phishing. Websites are declared phishing sites if their heuristic design matches the predefined rules. The accuracy of
the anti-phishing solution depends on features set, training data and machine
learning algorithm. This paper presents a comprehensive analysis of Phishing
attacks, their exploitation, some of the recent machine learning based approaches for
phishing detection and their comparative study. It provides a better understanding of
the phishing problem, current solution space in machine learning domain, and scope
of future research to deal with Phishing attacks efficiently using machine learning
based approaches.
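URL-based features of the kind described above can be extracted with simple rules. The sketch below is modelled loosely on the URL rules commonly described for the UCI 'Phishing Websites' dataset, but the thresholds (54 and 75 characters, dot counts) and the encoding 1 = legitimate, 0 = suspicious, -1 = phishing are assumptions for illustration, not values confirmed by this report. All function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

# Encoding assumed for this sketch: 1 = legitimate, 0 = suspicious, -1 = phishing.


def ip_as_host(url):
    """-1 if the host is a raw IP address instead of a domain name."""
    try:
        ipaddress.ip_address(urlparse(url).hostname or "")
        return -1
    except ValueError:
        return 1


def url_length(url, short=54, long_=75):
    """Long URLs can hide the suspicious part of the address."""
    if len(url) < short:
        return 1
    return 0 if len(url) <= long_ else -1


def at_symbol(url):
    """Text before '@' in the authority part is treated as credentials, not the host."""
    return -1 if "@" in url else 1


def dash_in_host(url):
    """Dashes are often used to build look-alike domains (e.g. bank-secure.example)."""
    return -1 if "-" in (urlparse(url).hostname or "") else 1


def subdomain_depth(url):
    """Count dots in the host after dropping 'www.': deep nesting is suspicious."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    dots = host.count(".")
    if dots <= 1:
        return 1
    return 0 if dots == 2 else -1


RULES = (ip_as_host, url_length, at_symbol, dash_in_host, subdomain_depth)


def extract_features(url):
    """Apply every rule and return the feature vector for one URL."""
    return [rule(url) for rule in RULES]


if __name__ == "__main__":
    print(extract_features("https://www.example.com/"))  # [1, 1, 1, 1, 1]
```

Each rule is a plain function, so the same list can feed either a rule-based verdict (for example, flag the site if any rule returns -1) or a feature vector for a machine learning classifier.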
Using Case-Based Reasoning for Phishing Detection
Many classification techniques have been devised and used to combat phishing threats, but none of them is able to efficiently identify web phishing attacks, due to the continuous change and short life cycle of phishing websites. In this paper, we introduce a Case-Based Reasoning (CBR) Phishing Detection System (CBR-PDS). It relies on the CBR methodology as its core part. The proposed system is highly adaptive and dynamic, as it can easily adapt to detect new phishing attacks with a relatively small data set, in contrast to other classifiers that need to be heavily trained in advance. We test our system using different scenarios on a balanced set of 572 phishing and legitimate URLs. Experiments show that the accuracy of CBR-PDS exceeds 95.62%, and that it significantly enhances classification accuracy with a small set of features and limited data sets.
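The core CBR cycle (retrieve a similar past case, reuse its solution, retain the new case) can be sketched in a few lines. This is a generic illustration, not CBR-PDS itself: the similarity measure (share of matching feature values) and the class name are assumptions, and CBR-PDS may use different retrieval and adaptation rules.

```python
class CaseBase:
    """Minimal case-based reasoner over fixed-length feature vectors."""

    def __init__(self):
        self.cases = []  # stored (features, label) pairs

    def retain(self, features, label):
        """Retain: store a solved case so it can be reused later."""
        self.cases.append((tuple(features), label))

    @staticmethod
    def similarity(a, b):
        """Share of positions with equal values (1 minus normalised Hamming distance)."""
        return sum(x == y for x, y in zip(a, b)) / len(a)

    def classify(self, features):
        """Retrieve the most similar stored case and reuse its label."""
        if not self.cases:
            raise ValueError("the case base is empty")
        _, label = max(self.cases, key=lambda case: self.similarity(features, case[0]))
        return label


if __name__ == "__main__":
    cb = CaseBase()
    cb.retain([1, 1, 1, -1], "legitimate")
    cb.retain([-1, -1, 1, 1], "phishing")
    verdict = cb.classify([-1, -1, -1, 1])
    print(verdict)  # phishing
    cb.retain([-1, -1, -1, 1], verdict)  # once confirmed, the new case is kept too
```

Adapting to a new attack only requires retaining a few labelled cases, with no retraining step, which matches the adaptivity the abstract attributes to the CBR approach.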
Information leakage preventive training
Phishing is an attempt to obtain private or confidential information such as usernames, passwords, and financial details. It is often carried out for malicious reasons by disguising oneself as a trustworthy entity in an electronic communication such as email. The chances of obtaining confidential or personal information are higher when websites are combined with emails in launching phishing attacks. Universiti Kebangsaan Malaysia (UKM) experienced phishing email attacks in 2016. Besides technology that focuses on email security, the safety awareness programme meant to educate users, especially UKM staff, needs to be enhanced to reduce the risk of theft of personal data, university confidential information and research data. A simulation in a real environment can give staff a true picture of the serious impact of phishing attacks. The objectives of the simulation are to measure UKM staff's security awareness and to educate them. We designed a spear phishing simulation procedure in collaboration with the Faculty of Information Science and Technology (FTSM), the Information Technology Center, the Bursary Department and the Department of Registrar, UKM. The simulation was conducted from 11 to 13 January 2017, using 553 email addresses identified from five different faculties. A total of 209 respondents (38%) entered their official IDs (captured) and passwords (not captured). The difference in the number of respondents between science and technology (S&T) faculties and non-S&T faculties indicated that security awareness is at a worrying level. The high percentage of responses among the management and professional group is also alarming. This simulation is the first such exercise in UKM, and it helps to increase awareness and provide education about cyber security.
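As a quick check, the 38% figure is the ratio of respondents who entered their IDs to the email addresses targeted in the simulation:

```latex
\text{response rate} = \frac{209}{553} \approx 0.378 \approx 38\%
```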
Impact of security awareness training on phishing click-through rates
In this paper we study the impact that security awareness training has on whether people click on malicious links contained in phishing emails. Phishing is a criminal
activity in which social engineering techniques and technology are used to obtain
personal information without one's consent. Currently, anti-phishing techniques
have little academic backing, usually only statistics and testimonials from security
organizations. This paper aims to provide an educational standard by which the
usefulness of internet security awareness and anti-phishing techniques can be
compared in the future.
Identity Theft – Empirical evidence from a Phishing exercise
Identity theft is an emerging threat in our networked world and more individuals
and companies fall victim to this type of fraud. User training is an important part of
ICT security awareness; however, IT management must know and identify where to
direct and focus these awareness training efforts. A phishing exercise was
conducted in an academic environment as part of an ongoing information security
awareness project where system data or evidence of users’ behavior was
accumulated. Information security culture is influenced by, amongst other aspects, the behavior of users. This paper presents the findings of this phishing experiment
where alarming results on the staff behavior are shown. Educational and awareness
activities pertaining to email environments are of utmost importance to manage the
increased risks of identity theft.
Enhancing Anti-phishing by a Robust Multi-Level Authentication Technique (EARMAT)
Phishing is a kind of social engineering attack in which experienced persons or entities fool novice users into sharing sensitive information such as usernames, passwords and credit card numbers through spoofed emails, spam, and Trojan hosts. The proposed scheme is based on designing a secure two-factor authentication web application that prevents phishing attacks instead of relying on phishing detection methods and user experience. The proposed method guarantees that users are authenticated to services, such as online banking or e-commerce websites, in a very secure manner. The proposed system uses a mobile phone as a software token that plays the role of the second factor in the user authentication process. The web application generates a session-based one-time password (OTP) and, after notifying the user through the Google Cloud Messaging (GCM) service, delivers it securely to the mobile application. After the user confirms, the mobile software completes the authentication process by encrypting the received OTP with its own private key and sending it back to the server through a secure mechanism that is transparent to the user. Once the server decrypts the received OTP and mutually authenticates the client, it automatically authenticates the user's web session. We implemented a prototype system of our authentication protocol that consists of an Android application, a Java-based web server and GCM connectivity for both of them. Our evaluation results indicate the viability of the authentication protocol for securing web application authentication against various types of threats.
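The OTP round trip can be sketched with the Python standard library. Two substitutions should be stated plainly: this sketch uses an HMAC over a key shared between the server and the enrolled phone instead of the private-key encryption EARMAT describes, and it omits GCM delivery entirely (the OTP is simply returned). The class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def device_respond(device_key, otp, user_confirmed):
    """Mobile side: after the user confirms, bind the OTP to the device key."""
    if not user_confirmed:
        return None
    return hmac.new(device_key, otp.encode(), hashlib.sha256).digest()


class AuthServer:
    """Server side: issue single-use, per-session OTPs and verify device responses."""

    def __init__(self, device_key):
        self.device_key = device_key  # shared with the enrolled device (assumption)
        self.pending = {}             # session id -> outstanding OTP

    def issue_otp(self, session_id):
        otp = f"{secrets.randbelow(10**6):06d}"  # six-digit one-time password
        self.pending[session_id] = otp
        return otp  # in EARMAT this is pushed to the phone, not shown in the browser

    def verify(self, session_id, tag):
        otp = self.pending.pop(session_id, None)  # pop: each OTP can be used only once
        if otp is None or tag is None:
            return False
        expected = hmac.new(self.device_key, otp.encode(), hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)  # constant-time comparison


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    server = AuthServer(key)
    otp = server.issue_otp("session-1")
    tag = device_respond(key, otp, user_confirmed=True)
    print(server.verify("session-1", tag))  # True
    print(server.verify("session-1", tag))  # False: replaying the response fails
```

Because the OTP is bound to one session and deleted on first use, a response captured by a phishing page cannot be replayed later, which is the property the scheme relies on.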
Disguised executable files in spear-phishing emails: Detecting the point of entry in advanced persistent threat
Advanced Persistent Threat (APT) is one of the most serious types of cyber attacks; it is a newer and more complex version of the multi-step attack. Within the APT life cycle, the most common technique used to gain a point of entry is spear-phishing emails, which may contain disguised executable files. This paper presents the disguised executable file detection (DeFD) module, which aims at detecting disguised exe files transferred over network connections. Detection is based on a comparison between the MIME type of the transferred file and the file name extension. This module was experimentally evaluated, and the results show successful detection of disguised executable files.
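The comparison DeFD performs, the real type of a file versus what its name claims, can be sketched by sniffing the first bytes of the content. The signature list and the set of executable extensions below are assumptions for illustration (a real detector would use a full content-sniffing library such as libmagic), and the function names are hypothetical.

```python
import os

# Magic numbers checked by this sketch (assumption: a tiny subset of real signatures).
SIGNATURES = [
    (b"MZ", "application/x-dosexec"),         # DOS/Windows executables
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]

# Extensions that honestly announce a Windows executable (assumed list).
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".scr", ".com", ".sys"}


def sniff_mime(content):
    """Guess the real MIME type of a file from its first bytes."""
    for magic, mime in SIGNATURES:
        if content.startswith(magic):
            return mime
    return "application/octet-stream"


def is_disguised_executable(filename, content):
    """True if the content is an executable but the file name claims otherwise."""
    if sniff_mime(content) != "application/x-dosexec":
        return False
    extension = os.path.splitext(filename.lower())[1]
    return extension not in EXECUTABLE_EXTENSIONS


if __name__ == "__main__":
    print(is_disguised_executable("invoice.pdf", b"MZ\x90\x00"))  # True
    print(is_disguised_executable("setup.exe", b"MZ\x90\x00"))    # False
```

Checking content rather than trusting the extension is what catches an attachment such as "invoice.pdf" that is really a Windows program.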
CHAPTER 3
SYSTEM ANALYSIS
3.1 Existing System
A cyber attack can use phishing in its delivery phase. The attack starts when the attacker learns about the target organization, either through its web pages or any downloaded materials. Then, the attacker puts malicious code into a delivery vehicle, such as a fake webpage or an attachment. In the case of a fake webpage, the attacker clones the targeted official webpage, including several input fields (e.g., text boxes, images). The attachment and the link to the fake webpage can also be sent to users through email to attract thousands of victims. In addition, phishing links and fake web pages can be spread with the aid of blogs, forums and so forth.
3.1.1 Disadvantages
1. Lower accuracy.
3.2 Proposed System
The machine learning approach is selected to predict whether a website is legitimate or phishing, based on a dataset of extracted features. Some extracted features have the same influence on classifier accuracy when predicting phishing sites and are therefore considered redundant. Classification performance was optimized by determining the most effective features among all the extracted features.
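The exact FSOR criterion is not given in this section. One simple reading of "redundant" is a feature that is constant, or that carries exactly the same values as a feature already kept, so removing it cannot change what a classifier sees. A minimal sketch under that assumption (function names hypothetical):

```python
def omit_redundant(rows):
    """Return indices of features to keep, dropping constant and duplicate columns."""
    kept, seen = [], set()
    for j, column in enumerate(zip(*rows)):  # column j = feature j of every sample
        if len(set(column)) == 1:  # constant: cannot separate the classes
            continue
        if column in seen:         # identical to a column that was already kept
            continue
        seen.add(column)
        kept.append(j)
    return kept


def project(rows, indices):
    """Keep only the selected feature columns of every row."""
    return [[row[j] for j in indices] for row in rows]


if __name__ == "__main__":
    rows = [[1, 1, 0, -1], [1, 1, 0, 1], [-1, -1, 0, 1]]
    keep = omit_redundant(rows)
    print(keep)                 # [0, 3]
    print(project(rows, keep))  # [[1, -1], [1, 1], [-1, 1]]
```

A stricter version would also drop features whose removal leaves cross-validated classifier accuracy unchanged, which is closer to the "same influence on classifier accuracy" wording but requires training a model for every candidate subset.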
3.2.1 Advantages
1. Higher accuracy.
3.3 Process Model Used With Justification
[Figure: umbrella model of the SDLC. Core stages: Requirements Gathering; Analysis & Preparation; Design; Code; Unit Test; Assessment; Integration & System Test; Acceptance Testing; Delivery/Installation. Umbrella activities: Document Control; Business Requirement Documentation (Feasibility Study, Team Formation, Project Specification); Training.]
Fig 3.1 SDLC (Umbrella model)
SDLC stands for Software Development Life Cycle. It is a standard process used by the software industry to develop good software.
3.4 Stages in SDLC
• Requirement Gathering
• Analysis
• Designing
• Coding
• Testing
• Maintenance
3.4.1 Requirements Gathering stage
The requirements gathering process takes as its input the goals identified in the
high-level requirements section of the project plan. Each goal will be refined into a
set of one or more requirements. These requirements define the major functions of
the intended application, define operational data areas and reference data areas, and
define the initial data entities. Major functions include critical processes to be
managed, as well as mission critical inputs, outputs and reports. A user class
hierarchy is developed and associated with these major functions, data areas, and
data entities. Each of these definitions is termed a Requirement. Requirements are
identified by unique requirement identifiers and, at minimum, contain a requirement
title and textual description.
Fig.3.2 Gathering requirements
These requirements are fully described in the primary deliverables for this stage: the
Requirements Document and the Requirements Traceability Matrix (RTM). The
requirements document contains complete descriptions of each requirement,
including diagrams and references to external documents as necessary. Note that
detailed listings of database tables and fields are not included in the requirements
document.
The title of each requirement is also placed into the first version of the RTM,
along with the title of each goal from the project plan. The purpose of the RTM is to
show that the product components developed during each stage of the software development lifecycle are formally connected to the components developed in prior stages.
In the requirements stage, the RTM consists of a list of high-level requirements, or goals, by title, with a listing of associated requirements for each goal, listed by requirement title. In this hierarchical listing, the RTM shows that each requirement developed during this stage is formally linked to a specific product goal. In this format, each requirement can be traced to a specific product goal, hence the term requirements traceability.
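The requirements-stage RTM described above is essentially a two-way mapping between goals and requirements. The sketch below is one possible in-memory representation; the class name and the example goal and requirement titles are hypothetical, and later stages would extend the same idea with links to design elements, artifacts and test cases.

```python
from collections import defaultdict


class TraceabilityMatrix:
    """Requirements-stage RTM: every requirement is linked to one product goal."""

    def __init__(self):
        self.goal_to_reqs = defaultdict(list)  # goal title -> requirement titles
        self.req_to_goal = {}                  # requirement title -> goal title

    def link(self, goal, requirement):
        self.goal_to_reqs[goal].append(requirement)
        self.req_to_goal[requirement] = goal

    def trace(self, requirement):
        """Goal a requirement traces back to, or None if it is not linked."""
        return self.req_to_goal.get(requirement)

    def untraced(self, requirements):
        """Requirements that are not yet formally linked to any goal."""
        return [r for r in requirements if r not in self.req_to_goal]

    def report(self):
        """Hierarchical listing: each goal followed by its requirements."""
        lines = []
        for goal, reqs in self.goal_to_reqs.items():
            lines.append(goal)
            lines.extend("  " + r for r in reqs)
        return "\n".join(lines)


if __name__ == "__main__":
    rtm = TraceabilityMatrix()
    rtm.link("Detect phishing websites", "REQ-1 Extract URL features")
    rtm.link("Detect phishing websites", "REQ-2 Classify a URL")
    print(rtm.report())
    print(rtm.untraced(["REQ-1 Extract URL features", "REQ-3 Export report"]))
```

The `untraced` check is the practical payoff: any requirement it returns has no goal behind it and should either be linked or questioned.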
The outputs of the requirements definition stage include the requirements document,
the RTM, and an updated project plan.
• Feasibility Study is about identifying the problems in a project.
• Team Formation determines the number of staff required to handle a project; individual modules are assigned as tasks to the employees working on that project.
• Project Specification describes the possible inputs submitted to the server and the corresponding outputs, along with the reports maintained by the administrator.
3.4.2 Analysis Stage
The planning stage establishes a bird's eye view of the intended software product,
and uses this to establish the basic project structure, evaluate feasibility and risks
associated with the project, and describe appropriate management and technical
approaches.
The most critical section of the project plan is a listing of high-level product
requirements, also referred to as goals. All of the software product requirements to
be developed during the requirements definition stage flow from one or more of
these goals. The minimum information for each goal consists of a title and textual
description, although additional information and references to external documents
may be included. The outputs of the project planning stage are the configuration management plan, the quality assurance plan, and the project plan and schedule, with a detailed listing of scheduled activities for the upcoming Requirements stage and high-level estimates of effort for the later stages.
Fig.3.3 Analysis
3.4.3 Designing Stage
The design stage takes as its initial input the requirements identified in the approved
requirements document. For each requirement, a set of one or more design elements
will be produced as a result of interviews, workshops, and/or prototype efforts.
Design elements describe the desired software features in detail, and generally
include functional hierarchy diagrams, screen layout diagrams, tables of business
rules, business process diagrams, pseudo code, and a complete entity-relationship
diagram with a full data dictionary. These design elements are intended to describe
the software in sufficient detail that skilled programmers may develop the software
with minimal additional input.
When the design document is finalized and accepted, the RTM is updated to show
that each design element is formally associated with a specific requirement. The
outputs of the design stage are the design document, an updated RTM, and an
updated project plan.
Fig.3.4 Designing
3.4.4 Development (Coding) Stage
The development stage takes as its primary input the design elements described in
the approved design document. For each design element, a set of one or more
software artifacts will be produced. Software artifacts include but are not limited to
menus, dialogs, and data management forms, data reporting formats, and specialized
procedures and functions. Appropriate test cases will be developed for each set of
functionally related software artifacts, and an online help system will be developed
to guide users in their interactions with the software.
The RTM will be updated to show that each developed artifact is linked to a
specific design element, and that each developed artifact has one or more
corresponding test case items. At this point, the RTM is in its final configuration.
The outputs of the development stage include a fully functional set of software that
satisfies the requirements and design elements previously documented, an online
help system that describes the operation of the software, an implementation map
that identifies the primary code entry points for all major system functions, a test
plan that describes the test cases to be used to validate the correctness and
completeness of the software, an updated RTM, and an updated project plan.
Fig.3.5 Coding
3.4.5 Integration & Test Stage
During the integration and test stage, the software artifacts, online help, and test data
are migrated from the development environment to a separate test environment. At
this point, all test cases are run to verify the correctness and completeness of the
software. Successful execution of the test suite confirms a robust and complete
migration capability. During this stage, reference data is finalized for production use
and production users are identified and linked to their appropriate roles. The final
reference data (or links to reference data source files) and production user list are
compiled into the Production Initiation Plan.
The outputs of the integration and test stage include an integrated set of software,
an online help system, an implementation map, a production initiation plan that
describes reference data and production users, an acceptance plan which contains
the final suite of test cases, and an updated project plan.
Fig.3.6 Testing
3.4.6 Installation & Acceptance Test
During the installation and acceptance stage, the software artifacts, online help, and
initial production data are loaded onto the production server. At this point, all test
cases are run to verify the correctness and completeness of the software. Successful
execution of the test suite is a prerequisite to acceptance of the software by the
customer.
After customer personnel have verified that the initial production data load is
correct and the test suite has been executed with satisfactory results, the customer
formally accepts the delivery of the software.
Fig.3.7 Maintenance
The primary outputs of the installation and acceptance stage include a production
application, a completed acceptance test suite, and a memorandum of customer
acceptance of the software. Finally, the PDR enters the last of the actual labor data
into the project schedule and locks the project as a permanent project record. At this
point the PDR "locks" the project by archiving all software items, the
implementation map, the source code, and the documentation for future reference.
3.4.7 Maintenance
The outer rectangle represents the maintenance of a project. The maintenance team starts with a study of the requirements and the documentation; later, employees are assigned work and undergo training in their assigned category. This life cycle has no end: it continues like an umbrella (there is no ending point to the umbrella's sticks).
CHAPTER 4
SOFTWARE REQUIREMENT SPECIFICATION
4.1. Overall Description
A Software Requirements Specification (SRS) is a complete description of the behaviour of a software system to be developed. It includes a set of use cases that describe all the interactions the users will have with the software. In addition to use cases, the SRS also contains non-functional requirements. Non-functional requirements are requirements that impose constraints on the design or implementation (such as performance engineering requirements, quality standards, or design constraints).
System requirements specification: a structured collection of information that embodies the requirements of a system. A business analyst (BA), sometimes titled system analyst, is responsible for analyzing the business needs of their clients and stakeholders to help identify business problems and propose solutions. Within the systems development lifecycle domain, the BA typically performs a liaison function between the business side of an enterprise and the information technology department or external service providers. Projects are subject to three sorts of requirements:
• Business requirements describe in business terms what must be delivered or accomplished to provide value.
• Product requirements describe properties of a system or product (which could be one of several ways to accomplish a set of business requirements).
• Process requirements describe activities performed by the developing organization.
Preliminary investigation examines project feasibility, that is, the likelihood that the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational and economic feasibility of adding new modules and debugging the old running system. Any system is feasible given unlimited resources and infinite time. There are three aspects in the feasibility study portion of the preliminary investigation:
• ECONOMIC FEASIBILITY
A system that can be developed technically, and that will be used if installed, must still be a good investment for the organization. In the economic feasibility study, the development cost of creating the system is evaluated against the ultimate benefit derived from the new system. Financial benefits must equal or exceed the costs. The system is economically feasible: it does not require any additional hardware or software. Since the interface for this system is developed using the existing resources and technologies available at NIC, expenditure is nominal and economic feasibility is assured.
• OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems that meet the organization's operating requirements. The operational feasibility aspects of the project are an important part of the project implementation. This system is designed in accordance with the above-mentioned issues. The management issues and user requirements have been taken into consideration beforehand, so user resistance that could undermine the possible application benefits is not expected. The well-planned design should ensure optimal utilization of the computer resources and help improve performance.
• TECHNICAL FEASIBILITY
Earlier, no system existed to cater to the needs of the 'Secure Infrastructure Implementation System'. The current system developed is technically feasible. It is a web-based user interface for the audit workflow at NIC-CSD, and thus it provides easy access to the users. The database's purpose is to create, establish and maintain a workflow among various entities in order to facilitate all concerned users in their various capacities or roles. Permissions are granted to users based on their specified roles. Therefore, it provides the technical guarantee of accuracy, reliability and security.
4.2. External Interface Requirements
User Interface
The user interface of this system is a user-friendly Python graphical user interface.
Hardware Interfaces
The interaction between the user and the console is achieved through Python's capabilities.
Software Interfaces
The required software is Python.
Operating Environment
Windows XP.
HARDWARE REQUIREMENTS:
• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Key Board - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA
SOFTWARE REQUIREMENTS:
• Operating System - Windows 7/8
• Programming Language - Python
CHAPTER 5
IMPLEMENTATION
5.1 Python
Python is a general-purpose language. It has a wide range of applications, from web development (Django and Bottle) and scientific and mathematical computing (Orange, SymPy, NumPy) to desktop graphical applications and games (Pygame, Panda3D). The syntax of the language is clean, and the code is relatively short. It's fun to work in Python because it allows you to think about the problem rather than focusing on the syntax.
5.1.1 History of Python:
Python is a fairly old language created by Guido van Rossum. Its design began in the late 1980s, and it was first released in February 1991.
Why Python was created?
In the late 1980s, Guido van Rossum was working in the Amoeba distributed operating system group. He wanted to use an interpreted language like ABC (ABC has a simple, easy-to-understand syntax) that could access the Amoeba system calls. So, he decided to create a language that was extensible. This led to the design of a new language, which was later named Python.
Why the name Python?
No. It wasn't named after a dangerous snake. Van Rossum was a fan of the British comedy series "Monty Python's Flying Circus", which aired from 1969 to 1974, and the name "Python" was adopted from that series.
5.1.2 Features of Python:
A simple language which is easier to learn
Python has a very simple and elegant syntax. It's much easier to read and write Python programs than programs in languages such as C++, Java and C#. Python makes programming fun and allows you to focus on the solution rather than the syntax.
If you are a beginner, it's a great choice to start your journey with Python.
Free and open-source
You can freely use and distribute Python, even for commercial use. Not only can you use and distribute software written in it, you can even make changes to Python's source code.
Python has a large community constantly improving it in each iteration.
Portability
You can move Python programs from one platform to another and usually run them without any changes.
It runs on almost all platforms, including Windows, macOS and Linux.
Extensible and Embeddable
Suppose an application requires high performance. You can easily combine pieces
of C/C++ or other languages with Python code.
This will give your application high performance as well as scripting capabilities
which other languages may not provide out of the box.
A high-level, interpreted language
Unlike in C/C++, you don't have to manage memory manually; Python allocates memory
for you and reclaims unused objects through garbage collection.
Likewise, when you run Python code, the interpreter automatically translates it into
a lower-level form (bytecode, in the standard CPython implementation) and executes it.
You don't need to worry about any lower-level operations.
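The translation step described above can be observed directly. This is a small sketch assuming the standard CPython interpreter: compile() turns source text into a code object containing bytecode, the dis module lists its instructions (the exact listing differs between Python versions), and eval() executes it.

```python
import dis
import types

source = "a + b"

# compile() turns source text into a code object holding bytecode.
code = compile(source, "<example>", "eval")
print(isinstance(code, types.CodeType))  # prints True

# dis prints the bytecode instructions the interpreter will execute.
dis.dis(code)

# The interpreter then executes the bytecode, here with a=2 and b=3.
result = eval(code, {"a": 2, "b": 3})
print(result)  # prints 5
```

In everyday use none of these steps is needed: running a .py file performs the compilation and execution automatically.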
Large standard libraries to solve common tasks
Python has a large standard library, which makes a programmer's life much
easier since you don't have to write all the code yourself. For example, the sqlite3
module lets you work with an SQLite database without installing anything extra. Need
to connect to a MySQL database on a Web server? Third-party packages such as
MySQLdb (installed separately, not part of the standard library) can be used with
import MySQLdb.
The standard library is well tested and widely used, so you can generally rely on it
not to break your application.
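As a sketch of the standard library in action, the snippet below uses the built-in sqlite3 module with an in-memory database, so nothing needs to be installed or written to disk. The table name, columns, and URLs are hypothetical examples, not data from this project.

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: URLs labelled as phishing (1) or legitimate (0).
cur.execute("CREATE TABLE urls (url TEXT, is_phishing INTEGER)")
cur.executemany(
    "INSERT INTO urls VALUES (?, ?)",  # "?" placeholders avoid SQL injection
    [
        ("https://example.com/login", 0),
        ("http://examp1e-secure-login.test/verify", 1),
    ],
)
conn.commit()

# Read back only the rows labelled as phishing.
rows = cur.execute("SELECT url FROM urls WHERE is_phishing = 1").fetchall()
print(rows)  # prints [('http://examp1e-secure-login.test/verify',)]
conn.close()
```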
Object-oriented
Everything in Python is an object. Object-oriented programming (OOP) helps you
solve complex problems intuitively.
With OOP, you can divide a complex problem into smaller parts by
creating objects.
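A short sketch of both ideas: even an integer is an object with methods, and a class can bundle data with the operations on it. The Website class and its methods are hypothetical illustrations, not part of the project's code.

```python
from urllib.parse import urlparse

# Even a plain integer is an object with its own methods.
print(isinstance(42, object), (42).bit_length())  # prints True 6


class Website:
    """Hypothetical class bundling a URL with checks that apply to it."""

    def __init__(self, url):
        self.url = url
        self.parts = urlparse(url)  # split into scheme, host, path, ...

    def uses_https(self):
        return self.parts.scheme == "https"

    def hostname(self):
        return self.parts.hostname or ""


site = Website("https://www.example.com/account")
print(site.uses_https(), site.hostname())  # prints True www.example.com
```

Each new check becomes one more method, so the problem is split into small, self-contained pieces.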
5.1.3 Advantages of Python:
1. Simple Elegant Syntax
Programming in Python is fun. Python code is easy to understand and write.
Why? The syntax feels natural. Take this source code as an example:
a = 2
b = 3
total = a + b  # add the two numbers (named total to avoid hiding the built-in sum)
print(total)  # prints 5
2. Not overly strict
You don't need to declare the type of a variable in Python, and you don't need to
add a semicolon at the end of each statement.
Python does, however, enforce some good practices, such as proper indentation, because
indentation defines code blocks. These small things can make learning much easier for beginners.
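The two points above can be seen in a few lines: a name is never declared with a type and can later refer to a value of a different type, and indentation, not braces or semicolons, marks where a block begins and ends. The function describe is a hypothetical example.

```python
# No type declaration: the same name can refer to values of different types.
value = 10
print(type(value).__name__)  # prints int
value = "ten"
print(type(value).__name__)  # prints str


# No semicolons; the indented lines form the bodies of the if and else.
def describe(n):
    if n % 2 == 0:
        return "even"
    else:
        return "odd"


print(describe(7))  # prints odd
```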
3. Expressiveness of the language
Python allows you to write programs with greater functionality in fewer lines
of code. For example, a Tic-tac-toe game with a graphical interface and a smart
computer opponent can be written in fewer than 500 lines of code. This is just one
example; you will be amazed how much you can do with Python once you learn
the basics.
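As a small illustration of this expressiveness, the sketch below extracts a few simple lexical properties of a URL in about ten lines and filters a list with a one-line list comprehension. The function url_features and its feature names are hypothetical; they are not the feature set used by the classifiers in this project.

```python
from urllib.parse import urlparse


def url_features(url):
    """Hypothetical lexical features of a URL (illustration only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "length": len(url),
        "dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
    }


urls = ["https://example.com", "http://login.example.com.evil.test/@verify"]
print(url_features(urls[1]))
# prints {'length': 42, 'dots_in_host': 4, 'has_at_symbol': True, 'uses_https': False}

# Keep only the URLs containing an "@" symbol, in a single line.
flagged = [u for u in urls if url_features(u)["has_at_symbol"]]
print(flagged)  # prints ['http://login.example.com.evil.test/@verify']
```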
4. Great Community and Support
Python has a large supporting community. There are numerous active forums online
which can be handy if you are stuck.
CHAPTER 6
EXPECTED RESULTS
Classifier               Processing Time (s)   Accuracy (%)
Random Forest            15                    96.98
Multilayer Perceptron    945                   96.32
Naive Bayes              1                     92.94
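The trade-off in the Expected Results table can be made explicit with a little arithmetic. The sketch below only reuses the three reported figures; it does not train or time any classifier.

```python
# Figures copied from the Expected Results table: (processing time in s, accuracy in %).
results = {
    "Random Forest": (15, 96.98),
    "Multilayer Perceptron": (945, 96.32),
    "Naive Bayes": (1, 92.94),
}

# Rank classifiers by accuracy, highest first.
ranked = sorted(results, key=lambda name: results[name][1], reverse=True)
print(ranked)  # prints ['Random Forest', 'Multilayer Perceptron', 'Naive Bayes']

# Compare the two most accurate models.
rf_time, rf_acc = results["Random Forest"]
mlp_time, mlp_acc = results["Multilayer Perceptron"]
speedup = mlp_time / rf_time          # 945 / 15
gain = round(rf_acc - mlp_acc, 2)     # difference in percentage points
print(f"Random Forest is {speedup:.0f}x faster and {gain} points more accurate")
# prints Random Forest is 63x faster and 0.66 points more accurate
```

On these figures, Random Forest is both the most accurate and far faster than the Multilayer Perceptron, while Naive Bayes is the fastest at a cost of about four percentage points of accuracy.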
CHAPTER 7
REFERENCES
[1] I. Vayansky and S. Kumar, “Phishing – challenges and solutions,” Computer
Fraud & Security, pp. 15-20, 2018.
[2] Phishing Activity Trends Report – 1st Quarter 2018. Available online:
https://docs.apwg.org/reports/apwg_trends_report_q1_2018.pdf (accessed on: 1
February 2019).
[3] Y. Pan and X. Ding, “Anomaly-based web phishing page detection,” in Proc.
22nd ACSAC, IEEE, Miami, FL, USA, pp. 381-392, 2006.
[4] S. Marchal, G. Armano, T. Grondahl, K. Saari, N. Singh, and N. Asokan,
“Off-the-Hook: An Efficient and Usable Client-Side Phishing Prevention Application,”
IEEE Transactions on Computers, vol. 66, no. 10, pp. 1717-1733, 2017.
[5] A. K. Jain and B. B. Gupta, “Comparative analysis of features based machine
learning approaches for phishing detection,” in Proc. INDIACom, IEEE, New
Delhi, India, 2016.
[6] H. Y. A. Abutair and A. Belghith, “Using Case-Based Reasoning for Phishing
Detection,” Procedia Computer Science, vol. 109, pp. 281-288, 2017.
[7] N. A. Bakar, M. Mohd, and R. Sulaiman, “Information leakage preventive
training,” in Proc. 6th ICEEI, IEEE, Langkawi, Malaysia, 2018.
[8] A. Carella, M. Kotsoev, and T. M. Truta, “Impact of security awareness training
on phishing click-through rates,” in Proc. IEEE Big Data, Boston, MA, USA,
2017.
[9] T. Steyn, H. Kruger, and L. Drevin, “Identity theft – empirical evidence from a
phishing exercise,” in New Approaches for Security, Privacy and Trust in Complex
Environments, H. Venter, M. Eloff, L. Labuschagne, J. Eloff, and R. von Solms, Eds.
Boston, MA, USA: Springer, vol. 232, pp. 193-203, 2007.
[10] A. Yasin and A. Abuhasan, “Enhancing anti-phishing by a robust multi-level
authentication technique,” IAJIT, vol. 15, pp. 990-999, 2018.
[11] I. Ghafir, V. Prenosil, M. Hammoudeh, F. J. Aparicio-Navarro, K. Rabie, and
A. Jabban, “Disguised executable files in spear-phishing emails: Detecting the point
of entry in advanced persistent threat,” in Proc. ICFNDS'18, ACM, Amman,
Jordan, 2018.
[12] B. Opazo, D. Whitteker, and C. C. Shing, “Email trouble: Secrets of spoofing, the
dangers of social engineering, and how we can help,” in Proc. 13th International Conference
on Natural Computation (ICNC-FSKD), IEEE, Guilin, China, pp. 2812-2817, 2017.
[13] W. Harrop and A. Matteson, “Cyber Resilience: A Review of Critical National
Infrastructure and Cyber-Security Protection Measures Applied in the UK and
USA,” in Current and Emerging Trends in Cyber Operations: Policy, Strategy and
Practice, George Washington University, USA: Springer, 2015.
[14] W. Ali, “Phishing website detection based on supervised machine learning
with wrapper features selection,” International Journal of Advanced Computer
Science and Applications, vol. 8, pp. 72-78, 2017.
[15] K. Firdous, B. Al-Otaibi, A. Al-Qadi, and N. Al-Dossari, “Hybrid client side
phishing websites detection approach,” International Journal of Advanced
Computer Science and Applications, vol. 5, pp. 132-140, 2014.