Intrusion Detection of Imbalanced Network Traffic Based On Machine Learning and Deep Learning


Received November 27, 2020, accepted December 27, 2020, date of publication December 30, 2020, date of current version January 13, 2021.

Digital Object Identifier 10.1109/ACCESS.2020.3048198

Intrusion Detection of Imbalanced Network Traffic Based on Machine Learning and Deep Learning
LAN LIU 1, PENGCHENG WANG 1, JUN LIN 2, AND LANGZHOU LIU 1
1 School of Electronic and Information Engineering, Guangdong Polytechnic Normal University, Guangzhou 510655, China
2 China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou 510610, China

Corresponding author: Jun Lin (linjun@ceprei.com)

This work was supported in part by the National Natural Science Foundation of China under Grant 61972104, in part by the Special Project
for Research and Development in Key Areas of Guangdong Province under Grant 2019B010121001, and in part by the Special Fund for
Science and Technology Innovation Strategy of Guangdong Province under Grant 2020A0332.

ABSTRACT In imbalanced network traffic, malicious cyber-attacks can often hide in large amounts of
normal data. They exhibit a high degree of stealth and obfuscation in cyberspace, making it difficult for a Network
Intrusion Detection System (NIDS) to ensure the accuracy and timeliness of detection. This paper studies
machine learning and deep learning for intrusion detection in imbalanced network traffic. It proposes a novel
Difficult Set Sampling Technique (DSSTE) algorithm to tackle the class imbalance problem. First, the
Edited Nearest Neighbor (ENN) algorithm divides the imbalanced training set into a difficult set and an
easy set. Next, the KMeans algorithm compresses the majority samples in the difficult set to reduce
the majority. The continuous attributes of the minority samples in the difficult set are then zoomed in and out
to synthesize new samples and increase the number of minority samples. Finally, the easy set, the compressed
majority samples of the difficult set, and the minority samples of the difficult set together with their
augmented samples are combined to make up a new training set.
The algorithm reduces the imbalance of the original training set and provides targeted data augmentation for the
minority class that needs to be learned. It enables the classifier to learn the differences better in the training stage
and improves classification performance. To verify the proposed method, we conduct experiments on the
classic intrusion dataset NSL-KDD and the newer and more comprehensive intrusion dataset CSE-CIC-IDS2018.
We use classical classification models: Random Forest (RF), Support Vector Machine (SVM), XGBoost,
Long Short-Term Memory (LSTM), AlexNet, and Mini-VGGNet. We compare against 24 other methods; the
experimental results demonstrate that our proposed DSSTE algorithm outperforms the other methods.

INDEX TERMS IDS, imbalanced network traffic, machine learning, deep learning, CSE-CIC-IDS2018.
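
The DSSTE steps outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a sample counts as "difficult" when most of its k nearest neighbours carry a different label (the usual ENN editing rule), that the majority samples in the difficult set are replaced by KMeans centroids, and that "zooming" multiplies the continuous attributes by fixed factors. All function names, the naive KMeans initialisation, the factors (0.9, 1.1), and the toy data are hypothetical.

```python
import math
from collections import Counter

def neighbour_labels(i, X, y, k):
    """Labels of the k nearest neighbours of sample i (Euclidean, excluding i)."""
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: math.dist(X[i], X[j]))
    return [y[j] for j in others[:k]]

def split_easy_difficult(X, y, k=3):
    """ENN-style rule (assumed): a sample is difficult if most neighbours disagree."""
    easy, difficult = [], []
    for i in range(len(X)):
        labels = neighbour_labels(i, X, y, k)
        agree = sum(1 for lab in labels if lab == y[i])
        (easy if 2 * agree > len(labels) else difficult).append(i)
    return easy, difficult

def kmeans_centroids(points, n_clusters, n_iter=10):
    """Plain Lloyd iterations; the centroids stand in for the original points."""
    centroids = [list(p) for p in points[:n_clusters]]  # naive init (assumption)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def zoom(sample, continuous_idx, factors):
    """One synthetic copy per factor, scaling only the continuous attributes."""
    copies = []
    for f in factors:
        s = list(sample)
        for j in continuous_idx:
            s[j] = s[j] * f
        copies.append(s)
    return copies

def dsste_sketch(X, y, majority_label, continuous_idx,
                 k=3, n_clusters=2, factors=(0.9, 1.1)):
    easy, difficult = split_easy_difficult(X, y, k)
    new_X = [list(X[i]) for i in easy]          # the easy set is kept as is
    new_y = [y[i] for i in easy]
    majority = [X[i] for i in difficult if y[i] == majority_label]
    if majority:                                # compress the difficult majority
        cents = kmeans_centroids(majority, min(n_clusters, len(majority)))
        new_X += cents
        new_y += [majority_label] * len(cents)
    for i in difficult:                         # augment the difficult minority
        if y[i] != majority_label:
            for s in [list(X[i])] + zoom(X[i], continuous_idx, factors):
                new_X.append(s)
                new_y.append(y[i])
    return new_X, new_y

# Toy data: a clean majority grid plus a line where the two labels alternate.
grid = [[i, j] for i in range(4) for j in range(3)]
line = [[50 + i, 0] for i in range(6)]
X = grid + line
y = [0] * len(grid) + [0, 1, 0, 1, 0, 1]
new_X, new_y = dsste_sketch(X, y, majority_label=0,
                            continuous_idx=[0], n_clusters=1)
print(Counter(y), Counter(new_y))
```

On this toy data the class counts move from 15 majority and 3 minority samples to 13 and 9: the grid is classified as easy and kept, the three majority samples on the line collapse into one centroid, and each minority sample on the line is kept along with two zoomed copies.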

The associate editor coordinating the review of this manuscript and approving it for publication was Emre Koyuncu.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
7550 VOLUME 9, 2021

I. INTRODUCTION

With the rapid development and wide application of 5G, IoT, Cloud Computing, and other technologies, network scale and real-time traffic have become more complex and massive, and cyber-attacks have also become complex and diverse, bringing significant challenges to cyberspace security. As the second line of defense behind the firewall, the Network Intrusion Detection System (NIDS) needs to accurately identify malicious network attacks, provide real-time monitoring and dynamic protection measures, and formulate strategies.

James Anderson first proposed the concept of intrusion detection in 1980, and some scholars then applied machine learning methods to intrusion detection [1]. However, due to the limited computer storage and computing power at that time, machine learning failed to attract attention. With the rapid development of computers and the emergence and promotion of Artificial Intelligence (AI) and other technologies, many scholars have applied machine learning methods to network security and have achieved certain results [2]–[4].

In real cyberspace, normal activities occupy the dominant position, so most traffic data are normal traffic; only a few are malicious cyber-attacks, resulting in a high imbalance of categories. In the highly imbalanced and redundant network

traffic data, intrusion detection is facing tremendous pressure. Cyber-attacks can hide in a large amount of normal traffic. Therefore, machine learning algorithms cannot fully learn the distribution of the minority categories and easily misclassify them [5].

Since LeCun et al. [6] proposed the theory of Deep Learning as an essential subfield of machine learning, deep learning has shown excellent performance in Computer Vision (CV) [7] and Natural Language Processing (NLP) [8]. Intrusion detection technology based on deep learning has been widely studied in academia and industry. Deep learning mines the potential features of high-dimensional data through training models and converts network traffic anomaly detection problems into classification problems [9]. By training on a large number of data samples, adaptive learning of the difference between normal behavior and abnormal behavior effectively enhances the real-time performance of intrusion processing. However, in the multi-class classification of network traffic, class imbalance still affects classification performance.

Faced with imbalanced network traffic data, we propose a novel Difficult Set Sampling Technique (DSSTE) algorithm to tackle the class imbalance problem in network traffic. This method effectively reduces the imbalance and makes the classification model learn difficult samples more effectively. We use classic machine learning and deep learning algorithms to verify it on two benchmark datasets. The specific contributions are as follows.

(1) We use the classic NSL-KDD and the up-to-date CSE-CIC-IDS2018 as benchmark datasets and conduct detailed analysis and data cleaning.
(2) This work proposes a novel DSSTE algorithm, reducing the majority samples and augmenting the minority samples in the difficult set, tackling the class imbalance problem in intrusion detection so that the classifier learns the differences better in training.
(3) The classification models are Random Forest (RF), Support Vector Machine (SVM), XGBoost, Long Short-Term Memory (LSTM), AlexNet, and Mini-VGGNet. For comparison with other methods, we divide the experiments into 30 methods.

The rest of this article is organized as follows. The second section mainly introduces the related work on intrusion detection and class imbalance research. The third section introduces our proposed DSSTE algorithm and the machine learning and deep learning algorithms. The fourth section analyzes the benchmark datasets and presents the experiments. Finally, the paper concludes in the fifth section.

II. RELATED WORKS
A. INTRUSION DETECTION SYSTEM (IDS)

In the research of network intrusion detection based on machine learning, scholars mainly distinguish normal network traffic from abnormal network traffic by dimensionality reduction, clustering, and classification, to realize the identification of malicious attacks [10], [11].

Pervez proposed a new method for feature selection and classification merging of the multi-class NSL-KDD Cup99 dataset using Support Vector Machine (SVM) and discussed the classification accuracy of classifiers under different feature dimensions [12]. Shiraz studied some new technologies to improve the classification performance of the CANN intrusion detection method and evaluated their performance on the NSL-KDD Cup99 dataset [13]. He used the K Farthest Neighbor (KFN) and the K Nearest Neighbor (KNN) to classify the data and used the Second Nearest Neighbor (SNN) of the data when the nearest and farthest neighbors have the same class label. The results show that the CANN detection rate is improved and the false alarm rate is reduced, or that the same performance is provided. Bhattacharya proposed a machine learning model based on hybrid Principal Component Analysis (PCA)-Firefly [14]. The dataset used was an open dataset collected from Kaggle. First, the model performs one-hot encoding to transform the IDS dataset, then uses the hybrid PCA-Firefly algorithm to reduce the dimension, and finally the XGBoost algorithm classifies the reduced dataset.

In recent years, with its powerful ability of automatic feature extraction, deep learning has made remarkable achievements in the fields of Computer Vision (CV), Autonomous Driving (AD), and Natural Language Processing (NLP). Many scholars apply deep learning to intrusion detection for traffic classification, which has become a hot spot of current research. Deep learning mines the potential characteristics of high-dimensional data through a training model and transforms network traffic anomaly detection into a classification problem [15]. Through training on a large number of data samples, adaptive learning of the difference between normal network traffic and abnormal network traffic effectively enhances real-time intrusion processing.

Torres et al. [16] first converted network traffic characteristics into a series of characters and then used a Recurrent Neural Network (RNN) to learn their temporal characteristics, which were further used to detect malicious network traffic. Wang et al. [17] proposed a malicious software traffic classification algorithm based on a Convolutional Neural Network (CNN). By mapping the traffic characteristics to pixels, a network traffic image is generated, and the image is used as the input of the CNN to realize traffic classification. Staudemeyer and Shamsinejad [13] proposed an intrusion detection algorithm based on Long Short-Term Memory (LSTM), which detects DoS attacks and probe attacks with unique time series in the KDD Cup99 dataset. Kwon et al. [18] carried out relevant research on deep learning models, focusing on data simplification, dimension reduction, classification, and other technologies, and proposed a Fully Convolutional Network (FCN) model. A comparison with traditional machine learning technology shows that the FCN model is useful for network traffic analysis. Tama et al. [19] proposed an anomaly-based IDS based on a two-stage meta-classifier, which uses a hybrid feature selection method to obtain accurate feature representations.