Journal
The proposed model is a machine learning model for anomaly detection. Anomaly detection is an important technique for recognizing fraud, suspicious activity, network intrusion, and other abnormal events that may have great significance but are difficult to detect. The model is built by applying standard data science techniques, beginning with variable identification, that is, identifying the dependent and independent variables. The data is then visualised to gain insight into it. The model is built from a previously collected dataset on which the algorithms learn and are trained; several algorithms are used so that they can be compared. The performance metrics are calculated and compared.

Advantages:

1. Anomaly detection can be automated using machine learning.

2. Performance metrics are compared in order to obtain the better model.

Existing System:

The authors of the existing system were the first to apply contrastive self-supervised learning to the anomaly detection problem of attributed networks. Their framework, CoLa, mainly consists of three components: contrastive instance pair sampling, a GNN-based contrastive learning model, and multi-round sampling-based anomaly score computation. The model captures the relationship between each node and its neighbouring structure and uses an anomaly-related objective to train the contrastive learning model. The authors believe that the framework opens a new opportunity to extend self-supervised learning and contrastive learning to more graph anomaly detection applications. The multi-round scores predicted by the contrastive learning model are further used to evaluate the abnormality of each node with statistical estimation. The framework has two phases: the training phase and the inference phase. In the training phase, the contrastive learning model is trained with sampled instance pairs in an unsupervised fashion. After that, the anomaly score for each node is obtained in the inference phase.

Disadvantages:

1. The performance is not good, and the method becomes complicated for other networks.

2. Performance metrics such as recall and F1 score are not reported, and machine learning algorithms are not compared.

The KDD Cup99 data set contains normal traffic and 22 attack types, described by 41 features. Every generated traffic pattern ends with a label, either 'normal' or a type of 'attack', for later analysis. A variety of attacks enter the network over a period of time, and they are classified into the following four main classes.

➢ Denial of Service (DoS)
➢ User to Root (U2R)
➢ Remote to User (R2L)
➢ Probing

Denial of Service is a class of attacks where an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, denying legitimate users access to a machine. A DoS attack can be launched in different ways:

➢ by abusing the computer's legitimate features
➢ by targeting the implementation bugs
➢ by exploiting the misconfiguration of the systems

DoS attacks are classified based on the services that an attacker renders unavailable to legitimate users.

User to Root:

In a User to Root attack, an attacker starts with access to a normal user account on the system and gains root access. Regular programming mistakes and environment assumptions give an attacker the opportunity to exploit the vulnerability of root access.

Remote to User:

In a Remote to User attack, an attacker sends packets to a machine over a network and exploits the machine's vulnerability to illegally gain local access as a user. There are different types of R2L attacks, and the most common attack in this class is done by using social engineering.

Probing:

Probing is a class of attacks where an attacker scans a network to gather information in order to find known vulnerabilities.
An attacker with a map of machines and services that are available on a network can manipulate the information to look for exploits. There are different types of probes: some of them abuse the computer's legitimate features and some of them use social engineering techniques. This class of attacks is the most common because it requires very little technical expertise.

Summary:

This chapter outlines the structure of the dataset used in the proposed work. The various kinds of features, such as discrete and continuous features, are studied with a focus on their role in the attack. The attacks are classified with a brief introduction to each. The next chapter discusses the clustering and classification of the data with a view to machine learning.

Table: Attack Types Grouped to Respective Class

DoS              R2L               U2R               Probe
Back             FTP write         Loadmodule        Ipsweep
Neptune          Multihop          Perl              Nmap
Land             Phf               Rootkit           Satan
Pod              Spy               Buffer overflow   Portsweep
Smurf            Warezclient       Ps                Mscan
Teardrop         Warezmaster       Sql attack        Saint
Apache2          Imap              Xterm
Mail bomb        Guess password
Process table    Http tunnel
UDP storm        Named
                 Sendmail
                 Snmpget attack
                 Snmp guess
                 Worm
                 Xlock
                 Xsnoop

Table: Description of Attacks

Type of Attack   Description
back             Denial of service attack against Apache web server where a client requests a URL containing many backslashes
neptune          SYN flood denial of service on one or more ports
dict             Guess passwords for a valid user using simple variants of the account name over a telnet connection
eject            Buffer overflow using the eject program on Solaris. Leads to a user->root transition if successful
ffb              Buffer overflow using the ffbconfig UNIX system command leads to root shell
format           Buffer overflow using the fdformat UNIX system command leads to root shell
ftp-write        Remote FTP user creates .rhost file in world writable anonymous FTP directory and obtains local login
guest            Try to guess password via telnet for guest account
syslog           Denial of service for the syslog service; connects to port 514 with unresolvable source IP
warez            User logs into anonymous FTP site and creates a hidden directory
land             Denial of service where a remote host is sent a UDP packet with the same source and destination
pod              Denial of service ping of death
smurf            Denial of service ICMP echo reply flood
teardrop         Denial of service where mis-fragmented UDP packets cause some systems to reboot
multihop         Multi-day scenario in which a user first breaks into one machine
phf              Exploitable CGI script which allows a client to execute arbitrary commands on a machine with a mis-configured web server
spy              Multi-day scenario in which a user breaks into a machine with the purpose of finding important information, where the user tries to avoid detection. Uses several different exploit methods to gain access
warezclient      Users downloading illegal software which was previously posted via anonymous FTP by the warezmaster
warezmaster      Anonymous FTP upload of Warez (usually illegal copies of copyrighted software) onto FTP server
imap             Remote buffer overflow using imap port leads to root shell
loadmodule       Non-stealthy loadmodule attack which resets IFS for a normal user and creates a root shell
perl             Perl attack which sets the user id to root in a perl script and creates a root shell
rootkit          Multi-day scenario where a user installs one or more components of a rootkit
ipsweep          Surveillance sweep performing either a port sweep or ping on multiple host addresses
nmap             Network mapping using the nmap tool. Mode of exploring the network will vary; options include SYN
satan            Network probing tool which looks for well-known weaknesses. Operates at three different levels. Level 0 is light
portsweep        Surveillance sweep through many ports to determine which services are supported on a single host

III. OBJECTIVE

➢ Apply the fundamental concepts of machine learning to an available dataset, then evaluate and interpret the results and justify the interpretation based on the observed dataset.

➢ Create notebooks that serve as computational records and document the thought process, and analyse the data set to investigate whether a network connection is attacked or not.

➢ Evaluate and analyse the statistical and visualized results to find the standard patterns for all regiments.

Project Goals:

➢ Exploration data analysis of variable identification
• Import the required library packages
• Analyze the general properties
• Find duplicate and missing values
• Check unique and count values

➢ Uni-variate data analysis
• Rename, add and drop data
• Specify data types

➢ Exploration data analysis of bi-variate and multi-variate
• Plot pairplot, heatmap, bar chart and histogram diagrams

➢ Comparing algorithms to predict the result
• Based on the best accuracy

Scope:

The scope of this project is to investigate a dataset of network connection attacks (the KDD records) for the medical sector using machine learning techniques, in order to identify whether a network connection is attacked or not.

IV. FEASIBILITY STUDY

Data Wrangling:

This section of the report loads the data, checks it for cleanliness, and then trims and cleans the given dataset for analysis. Each step is documented carefully and the cleaning decisions are justified.
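The wrangling steps described above (load the data, check for duplicates and missing values, clean) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual notebook: it uses only the Python standard library so that it runs anywhere, the three-column sample and the names SAMPLE, ATTACK_CLASS, load_rows and clean_rows are hypothetical, and the attack-to-class mapping copies only a few entries from the attack grouping table.

```python
import csv
import io

# Hypothetical sample in CSV form: two features plus the label column.
# Real KDD Cup99 records carry 41 features; only the shape of the steps matters here.
SAMPLE = """duration,protocol_type,label
0,tcp,normal
0,tcp,normal
2,udp,teardrop
5,tcp,
1,icmp,smurf
3,tcp,nmap
"""

# A few entries taken from the attack grouping table (not the full list).
ATTACK_CLASS = {
    "normal": "normal",
    "smurf": "DoS",
    "teardrop": "DoS",
    "rootkit": "U2R",
    "phf": "R2L",
    "nmap": "Probe",
}


def load_rows(text):
    """Parse CSV text into a list of dicts, one per connection record."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Drop rows with missing values and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None or value.strip() == "" for value in row.values()):
            continue  # at least one value is missing
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(row)
    return cleaned


rows = load_rows(SAMPLE)
cleaned = clean_rows(rows)

# Attach the coarse class; labels absent from the mapping are marked "unknown".
for row in cleaned:
    row["attack_class"] = ATTACK_CLASS.get(row["label"], "unknown")

print(len(rows), "rows loaded,", len(cleaned), "rows kept")
```

With pandas, which the paper lists among its libraries, the same checks would usually be written with DataFrame methods; the plain-Python version is used here only to keep the sketch dependency-free.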
The data set collected for prediction is split into a training set and a test set. Generally, a 7:3 ratio is applied to split the training set and the test set. The data models created using the Random Forest, logistic regression, Decision Tree and Support Vector Classifier (SVC) algorithms are trained on the training set, and prediction on the test set is carried out based on the resulting test accuracy.

Preprocessing:

The collected data might contain missing values that may lead to inconsistency. To gain better results, the data needs to be preprocessed so as to improve the efficiency of the algorithm. Outliers have to be removed and variable conversion also needs to be done.

Requirements are the basic constraints that are required to develop a system. Requirements are collected while designing the system. The following are the requirements that are to be discussed.

1. Functional requirements

2. Non-Functional requirements

3. Environment requirements

A. Hardware requirements

B. Software requirements
Building the classification model:

For predicting whether a network connection is attacked, a Random Forest prediction model is effective for the following reasons:
• It provides better results in classification problems.
• It copes well with outliers, irrelevant variables, and a mix of continuous, categorical and discrete variables.
• It produces an out-of-bag error estimate, which has proven to be unbiased in many tests, and it is relatively easy to tune.

Construction of a Predictive Model:

Machine learning requires gathering a large amount of past data. Data gathering must provide sufficient historical and raw data. Raw data cannot be used directly; it must first be pre-processed. After pre-processing, a suitable algorithm and model are chosen. The model is then trained and tested so that it works and predicts correctly with minimum error. The model is tuned from time to time to improve its accuracy.

[Figure: predictive model construction workflow, starting from data gathering]

Functional requirements:

The software requirements specification is a technical specification of requirements for the software product. It is the first step in the requirements analysis process. It lists the requirements of a particular software system. This project relies on libraries such as sk-learn, pandas, numpy, matplotlib and seaborn.

Non-Functional Requirements:

1. Process of functional steps

2. Problem definition

3. Preparing data

4. Evaluating algorithms

5. Improving results

6. Predicting the result
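The steps listed above (preparing data, evaluating algorithms, predicting the result) can be illustrated with a small, library-free sketch. It assumes the 7:3 train/test split used in this paper and scores predictions with accuracy, precision, recall and F1. The helper names split_7_3 and binary_metrics, the toy records and the two sets of predictions are hypothetical; in practice scikit-learn's train_test_split and its metric functions would normally be used instead.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle a copy of the records and split them 70% train / 30% test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def binary_metrics(y_true, y_pred, positive="attack"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data: ten labelled connections (hypothetical).
records = [(i, "attack" if i % 2 else "normal") for i in range(10)]
train, test = split_7_3(records)

# Hypothetical predictions from two models for the same four true labels.
y_true = ["attack", "attack", "normal", "normal"]
model_a = ["attack", "normal", "normal", "normal"]
model_b = ["attack", "attack", "attack", "normal"]

scores = {name: binary_metrics(y_true, pred)
          for name, pred in (("model_a", model_a), ("model_b", model_b))}
best = max(scores, key=lambda name: scores[name]["f1"])
print(len(train), len(test), best)
```

The two toy models reach the same accuracy but differ in recall and F1, which is why this sketch ranks them by F1. Ranking by accuracy, as the project goals state, only requires a different key in the final comparison.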
1. Software Requirements:

➢ Download and install Anaconda and get the most useful packages for machine learning in Python.

➢ Load a dataset and understand its structure using statistical summaries and data visualization.

➢ Build machine learning models, pick the best, and build confidence that the accuracy is reliable.

4. Visualizing the dataset.

5. Evaluating some algorithms.

6. Making some predictions.

II. SYSTEM ARCHITECTURE

Class Diagram:

[Figure: class diagram; surviving labels: Training Dataset, Testing Dataset]
➢ In the case of a regression problem, each tree in the forest predicts a value for Y (the output) for a new record. The final value is calculated by taking the average of the values predicted by all the trees in the forest. In the case of a classification problem, each tree in the forest predicts the category to which the new record belongs, and the new record is assigned to the category that wins the majority vote.

Support Vector Machines:

A support vector machine is a classifier that categorizes the data set by setting an optimal hyperplane between the data. This classifier was chosen because it is very versatile in the number of different kernel functions that can be applied, and the model can yield a high predictability rate. Support Vector Machines are perhaps one of the most popular and talked about machine learning algorithms. They were extremely popular around the time they were developed in the 1990s and continue to be the go-to method for a high-performing algorithm with little tuning. Two aspects are worth noting:
• The many names used to refer to support vector machines.
• The representation used by SVM when the model is actually stored on disk.

Tkinter supports a range of Tcl/Tk versions, built either with or without thread support. The official Python binary release bundles Tcl/Tk 8.6 threaded. See the source code for the _tkinter module for more information about supported versions.

Tkinter is not a thin wrapper, but adds a fair amount of its own logic to make the experience more pythonic. The Python documentation concentrates on these additions and changes, and refers to the official Tcl/Tk documentation for details that are unchanged.

Tkinter is a Python binding to the Tk GUI toolkit. It is the standard Python interface to the Tk GUI toolkit, and is Python's de facto standard GUI. Tkinter is included with standard GNU/Linux, Microsoft Windows and macOS installs of Python.

The name Tkinter comes from Tk interface. Tkinter was written by Steen Lumholt and Guido van Rossum, and later revised by Fredrik Lundh.

Tkinter is free software released under a Python license.

As with most other modern Tk bindings, Tkinter is implemented as a Python wrapper around a complete Tcl interpreter embedded in the Python interpreter. Tkinter calls are translated into Tcl commands, which are fed to this embedded interpreter, thus making it possible to mix Python and Tcl in a single application.
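The embedded-interpreter design described above can be observed directly. The following is a minimal sketch, assuming a Python build that includes tkinter. tkinter.Tcl() creates a Tcl interpreter without initialising Tk, so no window or display is needed. The helper name tcl_eval is hypothetical, and it returns None when Tcl support is unavailable.

```python
try:
    import tkinter
except ImportError:  # some minimal Python installs ship without Tk support
    tkinter = None


def tcl_eval(script):
    """Run a Tcl script in an embedded interpreter and return its result as a string.

    Returns None if tkinter or a usable Tcl installation is not available.
    """
    if tkinter is None:
        return None
    try:
        interp = tkinter.Tcl()  # Tcl interpreter only; Tk is not loaded
    except tkinter.TclError:
        return None
    return interp.eval(script)


# Python hands a Tcl command to the embedded interpreter and reads the result back.
result = tcl_eval("expr {6 * 7}")
print(result)
```

Widget calls made through Tkinter follow the same path: each Python call becomes a Tcl command evaluated by this interpreter.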
II. CONCLUSION
ACKNOWLEDGMENT