
ANOMALY DETECTION IN NETWORK TRAFFIC
ANOMALY DETECTION USING ML
Muhamad Nizam Azmi
Mataram University
WHY ANOMALY DETECTION?
Anomaly Detection Overview

A crucial task in identifying deviations from normal behavior
Applications in fraud detection, industrial maintenance, and cybersecurity

Challenges

High-dimensional data
Large-scale distributed systems
Vast amounts of data

Advancements in Machine Learning

Powerful tools for pattern recognition
Effective in complex and high-dimensional datasets

NEXT: PROJECT
Distinguishing between normal and anomalous behaviors

Focus: Unsupervised Learning Algorithms

No need for labeled attack data
Ideal for real-world applications with scarce labeled data
3 TYPES OF ANOMALY

Point anomaly
Contextual anomaly
Collective anomaly

Project focus: point anomaly detection

COMPARISON OF VARIOUS MACHINE LEARNING ALGORITHMS

Aim: Identify the most effective techniques for different anomaly detection tasks
Provide insights and guidelines for future applications in cybersecurity, industrial monitoring, and beyond

AdaBoost    Naive Bayes    Gradient Boosting
Logistic Regression    K-Nearest Neighbors (KNN)    SVM
Random Forest    Decision Tree    Neural Network (NN)


WHY CHOOSE THESE ALGORITHMS?

AdaBoost: chosen for its ability to enhance the performance of simple models, making it effective in scenarios where the data may have a lot of noise or complex patterns.
Naive Bayes: selected for its simplicity, speed, and effectiveness in handling large datasets, making it suitable for real-time anomaly detection tasks.
Gradient Boosting: chosen for its high predictive accuracy and its ability to handle a variety of data types and distributions, which is crucial for detecting subtle anomalies.
Random Forest: chosen for its high accuracy, its ability to handle large datasets with higher dimensionality, and its robustness against overfitting.
Logistic Regression: selected for its interpretability and effectiveness in binary classification problems, making it useful for understanding and explaining anomaly detection results.
K-Nearest Neighbors (KNN): chosen for its simplicity and its ability to perform well with small to medium-sized datasets, particularly when the data is not linearly separable.
SVM: selected for its robustness in high-dimensional spaces and its effectiveness in cases where the anomaly classes are not linearly separable.
Neural Network (NN): chosen for its flexibility and its ability to model complex, non-linear relationships in data, which is essential for accurately detecting anomalies in high-dimensional datasets.
These algorithms collectively provide a robust framework for identifying anomalies, leveraging their individual strengths to enhance the accuracy and reliability of the anomaly detection system.
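A minimal sketch of how such a comparison could be run with scikit-learn, assuming the KDD features have already been encoded and split into X_train, X_test, y_train, y_test (the variable names and hyperparameters are illustrative, not taken from the slides):

# Minimal sketch: compare the nine classifiers on an already preprocessed,
# label-encoded KDD split. X_train, X_test, y_train, y_test are assumed to exist.
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

models = {
    "AdaBoost": AdaBoostClassifier(),
    "Naive Bayes": GaussianNB(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": LinearSVC(),                      # linear SVM keeps training tractable on ~500k rows
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Decision Tree": DecisionTreeClassifier(),
    "Neural Network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=200),
}

for name, model in models.items():
    model.fit(X_train, y_train)              # train on the 5-class labels (dos/normal/probe/r2l/u2r)
    pred = model.predict(X_test)
    print(f"{name:20s} acc={accuracy_score(y_test, pred):.4f} "
          f"macro-F1={f1_score(y_test, pred, average='macro'):.4f}")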
PREVIOUS RESEARCH

Evaluation of Distributed ML Algorithms for Anomaly Detection

Astekin, M., Zengin, H., and Sözer, H. (2018) compared distributed machine learning algorithms for system log analysis. They focused on scalability and efficiency, highlighting the strengths of certain algorithms in handling large datasets.

Metaheuristics and Machine Learning for Anomaly Detection in Big Data


Cavallaro, C., Cutello, V., Pavone, M., and Zito, F. (2023) reviewed the use of
metaheuristics combined with machine learning for anomaly detection.
Their study showed improved detection accuracy and adaptability to
different datasets.

Industrial Anomaly Detection with Neural Network Architectures


Siegel, B. (2020) compared neural network architectures for detecting
industrial anomalies. The study focused on real-time detection capabilities
and accuracy, discussing implementation challenges and solutions.

ETC.
SUMMARY OF PREVIOUS RESEARCH
DATASET INFORMATION
Purpose: The KDD Cup 1999 dataset is used to build a predictive model to distinguish
between "bad" connections (intrusions or attacks) and "good" (normal) connections in
a computer network. It aims to protect the network from unauthorized users, including
potential insiders.

Background: The dataset is based on the 1998 DARPA Intrusion Detection Evaluation
Program, managed by MIT Lincoln Labs. The program's objective was to evaluate
research in intrusion detection using a standard set of data that includes various
intrusions simulated in a military network environment.

DATA COLLECTION
Environment: Simulated a typical U.S. Air Force LAN with multiple simulated attacks.
Duration: Data was collected over nine weeks (seven weeks for training, two weeks for testing).
Data Size:
Training data: 4 gigabytes of compressed binary TCP dump data, resulting in about five million connection records.
Test data: Around two million connection records.
Connection Records: Each connection is a sequence of TCP packets between a source IP address and a target IP address, labeled as either normal or a specific type of attack.
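As a rough sketch of the loading step (the file name comes from the public KDD Cup 1999 release and is an assumption, not stated in the slides), the 10% training subset can be read with pandas using the 41 feature names from kddcup.names plus the attack label:

# Minimal sketch: load the KDD Cup 1999 10% training subset into a DataFrame.
import pandas as pd

columns = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

# The raw file is comma-separated; labels carry a trailing '.' (e.g. "smurf.").
df = pd.read_csv("kddcup.data_10_percent", names=columns)
df["label"] = df["label"].str.rstrip(".")
print(df.shape)          # about 494,021 rows and 42 columns for the 10% subset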
TYPES OF ATTACKS

DOS: Denial of Service, e.g., SYN flood
R2L: Remote to Local, e.g., guessing passwords
U2R: User to Root, e.g., buffer overflow attacks
Probing: e.g., port scanning
RESULT: PROTOCOL TYPE DISTRIBUTION

ICMP: The most frequent protocol type, with over 250,000 occurrences.
TCP: The second most common protocol type, with over 150,000 occurrences.
UDP: The least frequent protocol type, with fewer than 50,000 occurrences.
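A small exploratory sketch of how counts like these could be reproduced, assuming the df DataFrame from the loading step above:

# Count connections per protocol type, mirroring the distribution reported here.
protocol_counts = df["protocol_type"].value_counts()
print(protocol_counts)     # expected order on the 10% subset: icmp, tcp, udp

# Same idea for the logged_in flag shown on the next slide.
print(df["logged_in"].value_counts())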
RESULT: LOGGED_IN FLAG

0: Not logged in
1: Successfully logged in

The number of connections that were not logged in (0) is significantly higher than the number that were successfully logged in (1).
RESULT: ATTACK CATEGORIES

dos: Denial of Service attacks - 391,458 instances
normal: Normal traffic (no attack) - 97,278 instances
probe: Surveillance and probing attacks - 4,107 instances
r2l: Remote to Local attacks - 1,126 instances
u2r: User to Root attacks - 52 instances
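These five categories come from grouping the individual attack labels. A hedged sketch using the standard KDD Cup 1999 grouping (assumed to match the one used here) to reproduce the counts above:

# Collapse individual attack names into the five categories used in the slides.
attack_map = {
    "normal": "normal",
    # dos
    "back": "dos", "land": "dos", "neptune": "dos", "pod": "dos",
    "smurf": "dos", "teardrop": "dos",
    # probe
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    # r2l
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    # u2r
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
}

df["category"] = df["label"].map(attack_map)
print(df["category"].value_counts())   # dos 391458, normal 97278, probe 4107, r2l 1126, u2r 52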
RESULT: FEATURES REMOVED DUE TO HIGH CORRELATION

num_root: Removed due to high correlation with num_compromised (correlation = 0.9938).
srv_serror_rate: Removed due to high correlation with serror_rate (correlation = 0.9984).
srv_rerror_rate: Removed due to high correlation with rerror_rate (correlation = 0.9947).
dst_host_srv_serror_rate: Removed due to high correlation with srv_serror_rate (correlation = 0.9993).
dst_host_serror_rate: Removed due to high correlation with rerror_rate (correlation = 0.9870).
dst_host_rerror_rate: Removed due to high correlation with srv_rerror_rate (correlation = 0.9822).
dst_host_srv_rerror_rate: Removed due to high correlation with rerror_rate (correlation = 0.9852).
dst_host_same_srv_rate: Removed due to high correlation with dst_host_srv_count (correlation = 0.9737).
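A hedged sketch of this kind of correlation filter (the 0.97 threshold and the exact pairwise procedure are assumptions, not taken from the slides):

# Drop one feature from every numeric pair whose absolute Pearson correlation
# exceeds the chosen threshold.
import numpy as np

corr = df.select_dtypes(include=[np.number]).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # keep upper triangle only
to_drop = [col for col in upper.columns if (upper[col] > 0.97).any()]
print("dropping:", to_drop)
df = df.drop(columns=to_drop)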
DECISION TREE RESULT
RANDOM FOREST RESULT
SVM RESULT
KNN RESULT
LOGISTIC REGRESSION RESULT
NEURAL NETWORK RESULT
GRADIENT BOOSTING RESULT
NAIVE BAYES RESULT
ADABOOST RESULT
ROC CURVE & FEATURE IMPORTANCES RESULT
DOS (Class 0): AUC (Area Under the Curve) = 1.00
Normal (Class 1): AUC = 1.00
Probe (Class 2): AUC = 0.99
R2L (Class 3): AUC = 0.97
U2R (Class 4): AUC = 0.82

High performance:
Classes 0 and 1 have perfect AUC scores of 1.00, indicating that the model separates these classes from the rest almost flawlessly.
Class 2 also demonstrates very high performance with an AUC of 0.99.

Moderate performance:
Class 3 has an AUC of 0.97, showing strong performance with few misclassifications.
Class 4 has a lower AUC of 0.82, indicating room for improvement in distinguishing this class from the others.
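A minimal sketch of how per-class AUC values like these could be computed one-vs-rest with scikit-learn; the fitted model name rf_model is an assumption, since the slide does not state which classifier produced the curves:

# One-vs-rest ROC AUC per class for a fitted model that exposes predict_proba.
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

classes = rf_model.classes_                          # e.g. ['dos', 'normal', 'probe', 'r2l', 'u2r']
y_score = rf_model.predict_proba(X_test)             # shape (n_samples, n_classes)
y_true = label_binarize(y_test, classes=classes)     # one binary column per class

for i, cls in enumerate(classes):
    auc = roc_auc_score(y_true[:, i], y_score[:, i])
    print(f"{cls:7s} AUC = {auc:.2f}")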
ROC CURVE & FEATURE IMPORTANCES RESULT

Dominant feature: The srv_count feature overwhelmingly dominates the feature importances, indicating it has a critical impact on the model's predictions.
Other features: Although the remaining features contribute less, they still hold importance for the model, affecting specific aspects of its predictions.
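A short sketch of how such a ranking could be produced from an impurity-based importance attribute, assuming the fitted rf_model and the DataFrame X_train from the earlier sketches:

# Rank features by impurity-based importance, as in the chart summarized above.
import pandas as pd

importances = pd.Series(rf_model.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))   # srv_count expected near the top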
CONCLUSION
Effectiveness of Machine Learning Algorithms:
This project successfully demonstrated that various machine learning
algorithms such as ADABOOST, Naive Bayes, Gradient Boosting, Logistic
Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM),
Random Forest, Decision Tree, and Neural Networks (NN) are capable of
detecting anomalies in large and complex datasets.

Recommendations for Further Development:


For future research, it is recommended to apply advanced data
augmentation techniques and feature engineering to further enhance
model performance. Additionally, combining multiple algorithms or using
ensemble approaches can help improve the accuracy and robustness of
anomaly detection.
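As an illustration of the ensemble direction suggested above (not part of the reported experiments), a soft-voting combination of three of the stronger models might look like this:

# Soft-voting ensemble over three classifiers; averages predicted probabilities.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("gb", GradientBoostingClassifier()),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))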
REFERENCES
Astekin, M., Zengin, H., & Sözer, H. (2018). Evaluation of Distributed Machine Learning Algorithms for Anomaly Detection from Large-Scale System Logs: A Case Study. Proceedings of the IEEE International Conference on Big Data, 862-1967.
Cavallaro, C., Cutello, V., Pavone, M., & Zito, F. (2023). Discovering anomalies in big data: a review focused on the application of metaheuristics and machine learning techniques. Frontiers in Big Data, 6.
Shabat, G., Segev, D., & Averbuch, A. (2017). Uncovering Unknown Unknowns in Financial Services Big Data by Unsupervised Methodologies: Present and Future Trends. Proceedings of Machine Learning Research, 71, 8-19.
Siegel, B. (2020). Industrial Anomaly Detection: A Comparison of Unsupervised Neural Network Architectures. IEEE Sensors Journal, 4(8), 1-4.
Zoppi, T., Ceccarelli, A., & Bondavalli, A. (2020). Into the Unknown: Unsupervised Machine Learning Algorithms for Anomaly-Based Intrusion Detection. Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 50200, 44.

THANK YOU FOR
YOUR ATTENTION
