Review Article: Network Attacks Detection Methods Based On Deep Learning Techniques: A Survey

Hindawi

Security and Communication Networks


Volume 2020, Article ID 8872923, 17 pages
https://doi.org/10.1155/2020/8872923

Review Article
Network Attacks Detection Methods Based on Deep Learning
Techniques: A Survey

Yirui Wu, Dabao Wei, and Jun Feng


College of Computer and Information, Hohai University, Nanjing, China

Correspondence should be addressed to Jun Feng; fengjun@hhu.edu.cn

Received 7 May 2020; Revised 26 June 2020; Accepted 20 July 2020; Published 28 August 2020

Academic Editor: Xiaolong Xu

Copyright © 2020 Yirui Wu et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the development of fifth-generation networks and artificial intelligence technologies, new threats and challenges have emerged for wireless communication systems, especially in cybersecurity. In this paper, we review attack detection methods that leverage the strengths of deep learning techniques. Specifically, we first summarize the fundamental problems of network security and attack detection and introduce several successful related applications built on deep learning structures. Based on a categorization of deep learning methods, we pay special attention to attack detection methods built on different kinds of architectures, such as autoencoders, generative adversarial networks, recurrent neural networks, and convolutional neural networks. Afterwards, we present some benchmark datasets with descriptions and compare the performance of representative approaches to show the current state of attack detection methods with deep learning structures. Finally, we summarize this paper and discuss some ways to improve the performance of attack detection using deep learning structures.

1. Introduction

The continuous development and extensive use of the Internet benefit numerous network users in many respects. Meanwhile, network security has become much more important as networks are widely used. Network security is closely related to computers, networks, programs, various data, and so forth, where the purpose of defense is to prevent unauthorized access and modification [1]. However, the growing number of Internet-connected systems in finance, e-commerce, and the military makes them targets of network attacks, resulting in substantial risk and damage. It is therefore necessary to provide effective strategies to detect and defend against attacks and maintain network security. Furthermore, different kinds of attacks usually need to be handled in different ways. Identifying different kinds of network attacks thus becomes the main challenge to be solved in the domain of network security, especially for attacks never seen before.

Over the past several years, researchers have used various kinds of machine learning methods to classify network attacks without prior knowledge of their detailed characteristics. However, traditional machine learning methods are not capable of providing distinctive feature descriptors for the problem of attack detection, due to their limited model complexity. Recently, machine learning has made a great breakthrough by simulating the human brain with neural network structures, which are named deep learning methods for their general architecture of deep layers that solve complicated problems. Among these successful applications, Google's AlphaGo is one of the most outstanding examples, playing the game of "Go" with the strength of a typical kind of deep learning structure, that is, convolutional neural networks.

Since deep learning is complex in its original structures and domain-oriented applications, this paper is written to explain it to readers who aim to study network security using deep learning methods. There already exists a large body of previous work on attack detection using deep learning techniques. Among them, several literature reviews [2–8] have been conducted to gather ideas on applying deep learning to attack detection, which
is the foundation of our paper. For example, Berman et al. [5] provide extensive reading resources describing the basic knowledge and development history of deep learning methods and their corresponding applications in attack detection. Unlike the complete view of this specific domain offered by Berman et al. [5], Apruzzese et al. [4] focus on explaining attack detection methods related to intrusion detection, malware analysis, and spam detection. Wickramasinghe et al. [7] mainly review deep learning methods for security in Internet of Things settings, offering a clear view of various kinds of cyberattacks and the corresponding detection techniques. Afterwards, Aleesa et al. [3] review and analyze the research status of intrusion detection systems based on deep learning across four major databases. They also offer a systematic literature review of the relevant articles selected with the keywords "deep learning", "invasion", and "attack", which provides a wide range of background resources for researchers. Regarding datasets as significantly important to intrusion detection, Ferrag et al. [6] describe 35 well-known network datasets and divide them into seven categories. They introduce seven representative models and evaluate and compare their efficiency in terms of accuracy and false alarm rate on real traffic datasets, that is, CSE-CIC-IDS2018 and Bot-IoT.

In fact, each of the above review papers has its own emphasis, such as security applications, attack types, datasets, or databases. Unlike these works, we build our paper on the basis of deep learning models, paying special attention to attack detection methods built on different kinds of deep learning architectures. Furthermore, we offer a fair comparison and our own analysis of the performance of representative approaches on benchmark datasets. We believe our paper offers a more understandable reading resource for readers who are interested in how different deep learning architectures affect the area of attack detection.

In this paper, we attempt to build a basis for future research through a thorough literature review of deep learning approaches in the field of attack detection. More specifically, we first summarize the fundamental problems, classify the previous methods, and review useful methods for beginners. Then, we briefly introduce the great progress of deep learning techniques in cybersecurity. By replacing traditional machine learning methods with deep learning structures, researchers have proposed many novel algorithms that greatly improve performance in terms of higher detection rate and lower false alarm rate. Afterwards, we compare and analyze the performance of some representative deep learning approaches on benchmark datasets. Finally, we summarize the problems to be solved and future directions for deep learning methods to improve attack detection.

We organize the rest of our work as follows. Section 2 focuses on the concepts of attack detection and cyber applications through an introduction to the research background. Section 3 offers overviews of different deep learning methods for attack detection, which are categorized into unsupervised and supervised methods with different structures. Section 4 presents datasets and analyzes performance comparisons of a number of deep learning methods. Section 5 provides discussion and conclusions based on the current foundations and presents several ideas for future research.

2. Brief Introduction to Attack Detection

To provide an overview of effective attack detection based on deep learning techniques, it is essential to introduce some background knowledge. We thus first give a brief introduction to the concepts of attack detection, which offers a basic understanding for new learners. Afterwards, we briefly present successful applications for cybersecurity.

2.1. Developing Process of Attack Detection. Attacks can be recognized as attempts to bypass the security policies of a system, giving attackers easier access to obtain or modify information, or even to destroy the system. As technologies for wireless communication systems develop, increasingly frequent network attacks pose serious threats to network security, especially to the security of wireless communication systems, due to the open nature of wireless channels. Since we are now in the era of machine learning and big data [9], cybersecurity in wireless communication systems is important for users to protect networks, computers, and data from attacks. There are various kinds of attacks on cyber systems, such as flooding, distributed denial of service, abnormal packet attacks, and spoofing.

To deal with such threats to cybersecurity, researchers have proposed many solutions [10]. Among them, attack detection is one of the most effective, offering a complete and dynamic security mechanism to monitor, prevent, and resist attacks. Specifically, attack detection collects information by monitoring the network, system status, behavior, and system usage, so that it can automatically detect unauthorized use by system users and attacks by external attackers on the system.

In recent years, machine learning has developed at incredible speed. Among different machine learning methods, deep learning structures construct artificial neural networks that simulate the interconnected neurons of human brains, which brings distinctive power to solve complicated problems. Researchers have thus adopted various deep learning methods for attack detection, with significant achievements. However, many problems remain unsolved due to the limitations of deep learning methods. It is therefore essential to summarize how former methods use deep learning to detect attacks, which could bring new ideas for future developments.

2.2. Applications of Attack Detection Using Deep Learning Structures. Since deep learning shows great potential in constructing security applications, it has been widely used in cybersecurity [11]. There are numerous related applications, such as malware, intrusion, phishing, and spam detection, and
traffic analysis [12]. We believe these successful application examples could help analyze users' requirements in light of the innovation brought by deep learning structures. Thus, we present some typical applications to demonstrate the practicability of deep learning methods, and we believe these applications can be implemented in domains such as multimedia handling [13], signal processing [14], and so on [15].

2.2.1. Intrusion Detection. An intrusion detection system can detect malicious activities by collecting and analyzing network behavior, security logs, and other information available on the network and on connected computers [16]. Essentially, an intrusion detection system checks for abnormal behaviors that violate the system security policy and for signs of attack in the system, and it can protect the system with real-time responses. In traditional system settings, an intrusion detection system works as a reasonable, active, and efficient supplement to the firewall, which acts as a passive means of defense against attacks. Traditional intrusion detection systems were first built on misuse detection technology, which mainly extracts characteristics or rules of intrusion behavior. After the appearance of anomaly detection technology based on traditional machine learning models, intrusion detection systems evolved to build probabilistic statistical models of normal behavior, which can analyze behaviors with large deviations from them and raise alarms. However, such systems may produce unsatisfactory results, due to their limited capability in defining the problem space and the complexity of modeling malicious activities.

To further overcome the shortcomings of traditional machine learning methods, deep learning technology is used to analyze network packets, which progressively shifts the mainstream idea of intrusion detection from a blacklist model to a whitelist model. A new deep learning model for network intrusion detection systems (NIDS) is proposed by Shone et al. [17], which analyzes network traffic with a nonsymmetric deep autoencoder. Based on the LSTM algorithm, Vinayakumar et al. [18] design a system call modeling approach with an ensemble method for anomaly intrusion detection systems. System call modeling helps capture the semantics of each call and its relationships with other calls, while the ensemble method mainly focuses on reducing the false alarm rate in accordance with IDS design. Currently, a mature intrusion detection system can detect many kinds of attacks with the strength of deep learning structures.

2.2.2. Malware Detection. Malware is designed to degrade the performance of, or exploit vulnerabilities in, a computer, server, or computer network. In extreme situations, malware can destroy the entire system. Malware must first be implanted into the target computer. Afterwards, it can execute code, scripts, active content, and other software, either automatically or following orders from whoever planted it. Such software or code can take the form of computer viruses, worms, Trojans, spyware, adware, and other malicious code.

We divide malware detection methods into two categories, that is, signature-based and anomaly-based detection. Traditional antivirus software falls into the first category, detecting malicious files based on file signatures. However, slightly modified malicious code can bypass such detection, leading to a large number of false negatives. Later, sandbox and virtual machine technologies appeared to detect the dynamic behaviors of viruses, which can be regarded as major progress from static detection to dynamic analysis, greatly improving the ability to detect unknown malicious code.

For example, in [19], Saxe discusses a four-layer deep network application in which PE metadata features can serve as the computed input features. The authors also propose the eXpose neural network, which takes raw short strings as input and extracts features for classification with character-level embeddings. Because its features are extracted automatically, eXpose performs better than baseline methods based on manual feature extraction. Pascanu et al. [20] use an echo state network to extract information through random temporal projection, max pooling for nonlinear sampling of the data, and logistic regression for the final classification.

2.2.3. Domain Generation Algorithms. Domain generation algorithms (DGAs) are popular malware tools for creating large numbers of domain names to maintain communication with command-and-control (C2) servers. The constantly changing domain names make it difficult to block malicious domains with standard technologies such as blacklisting or sinkholing. DGAs are often used in various network attacks, such as spam, personal data theft, and DDoS attacks.

By applying deep learning technologies, DGA detection can identify malicious domain names from the perspective of syntax analysis. Specifically, such novel algorithms can not only compare word frequencies with those of normal domain names using n-gram methods but also compare the probabilities of character combinations with those of normal domain names using HMM methods. Moreover, they can analyze the entropy, consonant letters, and other characteristics of domain names, which are fed into an LSTM for abnormal classification. Due to the slow speed and poor performance of traditional technologies, Feng et al. [21] provide a deep learning method that helps distinguish DGA domains from non-DGA domains. In [22], the advantages of feeding raw domain names, without feature extraction, into an LSTM network are also discussed.

3. Deep Learning Methods for Attack Detection

Considering the current deep learning methods for attack detection [23] and following the categorization of previous works [24, 25], we roughly divide them into three categories, that is, unsupervised (e.g., autoencoder (AE), deep belief network (DBN), and generative adversarial network (GAN)), supervised (e.g., deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN)), and other hybrid methods; we show
the details of the categorization in Figure 1. There also exist other classification criteria. For example, Berman et al. [5] review related deep learning methods according to attack types and focus on how deep learning is used for various attacks. Moreover, Al-Garadi et al. [2] offer a comprehensive view of deep learning methods based on cybersecurity applications.

Adopting different kinds of deep learning algorithms brings different advantages to attack detection methods. Supervised learning based methods often achieve high accuracy, due to the amount of information provided by manually labeled samples. Without sufficient knowledge from labeled data, unsupervised learning based methods generally perform worse. However, manual labeling is time-consuming, especially for complex attacks. There are even cases that cannot be described by a simple label, due to the inherent complexity of real-world network attacks. Therefore, unsupervised learning based methods have the obvious advantage of working without prior knowledge of attacks. Hybrid methods reduce the number of required training samples while maintaining relatively high performance, which makes them suitable for various attack situations. However, they are generally complex in structure and costly in computing time, which prevents their wide usage.

3.1. Unsupervised Learning for Attack Detection

3.1.1. Autoencoder Based Methods for Attack Detection. Let us first introduce the architecture of AE, which can be regarded as a data compression algorithm with a neural network structure. It first compresses the input into a feature space representation and then reconstructs the output from that representation. Since AE is a typical representation learning algorithm, it is widely used for dimensionality reduction and outlier detection. Researchers in cybersecurity also adopt AE to represent abnormal behaviors in its compressed feature space, which brings the advantage of dynamic representation for unknown categories of attacks.

To extract informative feature descriptors from original network traffic data, Zhang et al. [26] propose to detect network intrusion by stacking dilated convolutional AEs (DCAEs), which successfully combines self-taught learning and representation learning. Specifically, the original network traffic data is first transformed into a vector in a preprocessing step. During unsupervised training, the DCAEs learn a hierarchical feature representation from a large number of unlabeled samples. Afterwards, the backpropagation algorithm and a few labeled instances are used to fine-tune and improve the feature description ability learned from the unlabeled instances. Using original network traffic and unsupervised pretraining makes their model more adaptive and flexible in dealing with complicated raw data.

Following the idea of facilitating intrusion detection with AE models, Shone et al. [17] propose a nonsymmetric deep AE (NDAE) for unsupervised feature learning, which successfully reduces the computational cost of analysis by combining AE with shallow learning. Specifically, NDAE has an additional encoding stage compared with a typical AE, which reduces complexity and improves the accuracy of the model. We show this structure in Figure 2, where we can observe its hierarchical feature extractor. At the end of their proposed NDAE, they apply a random forest to recognize abnormal situations with the help of the feature representation learned by the NDAEs. To evaluate their model, the authors implemented their code on GPU and evaluated it on KDDCup 99 and NSL-KDD, achieving promising results compared with other methods.

Since AE is capable of learning latent representations of unknown attacks, Yousefi-Azar et al. [27] propose to learn feature representations with an AE structure for different cybersecurity applications, which consists of two training stages, that is, pretraining and fine-tuning. The former stage is designed to search for an appropriate starting point for the fine-tuning stage. After the parameters are determined in the pretraining stage, the fine-tuning stage converges to offer a feature description of the input data. Their proposed feature learning scheme can greatly reduce feature dimensions, thus significantly minimizing storage requirements. Experimental results show that their feature representation can be used in many domains and achieves remarkable results compared with previous works.

Since collected raw network data can be unbalanced in distribution, Farahnakian et al. [28] utilize a deep stacked autoencoder to focus on important and informative feature representations, thus constructing classification models to detect abnormal behaviors. Specifically, their proposed network consists of 4 AEs in sequential order, which are trained in a greedy layerwise fashion. Experimental results on the KDDCup 99 dataset show that it achieves high accuracy for anomaly detection, that is, 94.71%, even with unbalanced data.

To construct a flexible system for detecting intrusion attacks, Javaid et al. [29] utilize a sparse AE and a softmax-regression layer for construction and self-taught learning (STL) for the training process. Specifically, their proposed STL can be divided into two steps: the sparse AE is first used for unsupervised feature learning, and softmax regression is then used for classification after feature extraction. In fact, STL can largely improve the learning ability of the constructed network when facing unknown attacks, where new categories of attacks can be incrementally analyzed at runtime without the trouble of training from scratch.

Following this idea, Papamartzivanos et al. [30] present a more powerful approach with the MAPE-K framework, which constructs a misuse intrusion detection system with scalable, self-adaptive, and autonomous characteristics. It can extract generalized features for problem reconstruction, even when facing unknown environments and using unlabeled data. They believe their proposed method works well by grasping the nature of various attacks, and they further design experiments to show that their method can deal with new situations without manually updating the training set.
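The AE-based detectors above share one mechanism: a model fitted mostly to normal traffic reconstructs normal records well and unfamiliar ones poorly, so reconstruction error can serve as an anomaly score. The following is a minimal sketch of that idea, not any surveyed author's method. It assumes a linear autoencoder with a one-unit bottleneck trained by plain batch gradient descent, two hypothetical correlated traffic features, and a hypothetical threshold of three times the worst training error; the detectors cited above use deeper nonlinear networks and real traffic features.

```python
def train_linear_autoencoder(data, lr=0.05, epochs=3000):
    """Fit encoder w and decoder v so that v * (w . x) approximates x.

    The single-unit bottleneck forces the model to keep only the dominant
    direction of the training data (here, hypothetical "normal" traffic).
    """
    dim = len(data[0])
    w = [0.3] + [0.1] * (dim - 1)  # encoder weights, fixed non-zero start
    v = [0.3] + [0.1] * (dim - 1)  # decoder weights
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_v = [0.0] * dim
        for x in data:
            z = sum(wi * xi for wi, xi in zip(w, x))   # encode: 1-D code
            r = [vi * z - xi for vi, xi in zip(v, x)]  # reconstruction minus input
            v_dot_r = sum(vi * ri for vi, ri in zip(v, r))
            for k in range(dim):
                # Gradients of the mean squared reconstruction error.
                grad_v[k] += 2.0 * r[k] * z / n
                grad_w[k] += 2.0 * v_dot_r * x[k] / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


def reconstruction_error(x, w, v):
    """Squared distance between x and its reconstruction v * (w . x)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sum((vi * z - xi) ** 2 for vi, xi in zip(v, x))


# Hypothetical normal records: two strongly correlated features plus jitter.
normal = []
for k in range(21):
    t = -1.0 + 0.1 * k
    jitter = 0.05 if k % 2 == 0 else -0.05
    normal.append((t, t + jitter))

w, v = train_linear_autoencoder(normal)

# Assumed decision rule: flag records 3x worse than the worst normal record.
threshold = 3.0 * max(reconstruction_error(x, w, v) for x in normal)


def is_anomaly(x):
    return reconstruction_error(x, w, v) > threshold


print(is_anomaly((0.55, 0.55)))  # False: follows the learned correlation
print(is_anomaly((1.0, -1.0)))   # True: breaks the learned correlation
```

No attack labels are used during training, which mirrors the advantage of unsupervised methods discussed earlier; the price is that the threshold has to be chosen by some assumed rule.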

[Diagram: deep learning for attack detection splits into unsupervised methods (autoencoder (AE), deep belief network (DBN), generative adversarial network (GAN)), supervised methods (deep neural network (DNN), convolutional neural network (CNN), recurrent neural network (RNN)), and hybrid methods.]

Figure 1: Categorization of the current deep learning methods for attack detection.

[Diagram: left, a typical autoencoder (inputs X1 … Xn, one hidden layer h1, outputs X′1 … X′n; one encode step and one decode step); right, a nonsymmetric deep autoencoder (inputs X1 … Xn, hidden layers h1 and h2, outputs X′1 … X′n; three encode steps).]

Figure 2: Network structure of Shone et al. [17], which is a novel AE structure designed with nonsymmetric multiple hidden layers.
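To make the contrast in Figure 2 concrete, the sketch below builds an untrained symmetric AE (6 → 4 → 3 → 4 → 6) and an NDAE-style nonsymmetric stack (6 → 4 → 3 → 6) and runs one forward pass. The layer sizes, sigmoid units, random initialization, and sample record are hypothetical choices for illustration, not Shone et al.'s configuration, and no training or random forest stage is included. It only shows how replacing the mirrored decoder with a single output layer removes parameters while the last code h2 stays available as features for a downstream classifier.

```python
import math
import random


def dense_layer(n_in, n_out, rng):
    """Hypothetical dense layer: (weight matrix n_out x n_in, bias vector)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build(sizes, rng):
    """Chain dense layers so that sizes[i] feeds sizes[i + 1]."""
    return [dense_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(x, layers):
    """Return the activations of every layer (sigmoid units), input included."""
    activations = [list(x)]
    for weights, biases in layers:
        prev = activations[-1]
        activations.append([
            1.0 / (1.0 + math.exp(-(sum(w * p for w, p in zip(row, prev)) + b)))
            for row, b in zip(weights, biases)
        ])
    return activations


def count_params(layers):
    """Number of weights plus biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
n_features = 6  # hypothetical number of normalized traffic features

# Symmetric AE: the decoder mirrors the encoder (6 -> 4 -> 3 -> 4 -> 6).
symmetric = build([n_features, 4, 3, 4, n_features], rng)
# NDAE-style: stacked encodings, output produced directly from h2 (6 -> 4 -> 3 -> 6).
nonsymmetric = build([n_features, 4, 3, n_features], rng)

record = [0.2, 0.9, 0.1, 0.4, 0.7, 0.3]  # one hypothetical normalized record
acts = forward(record, nonsymmetric)
codes, recon = acts[2], acts[-1]  # h2 codes and reconstruction of the record

print(count_params(symmetric), count_params(nonsymmetric))  # 89 67
print(len(codes), len(recon))  # 3 6
```

In the pipeline described above, vectors like `codes` would be handed to a classifier such as a random forest after training; that step is deliberately left out of this sketch.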

Feature extraction is one of the major issues to address in attack detection. Regarding AE as a structure for information compression and feature generation, utilizing AE brings the advantages of automatic and dynamic feature construction, resulting in high accuracy for detecting the predefined attacks existing in datasets. Facing various and unknown attacks, which are the main characteristics of cybersecurity, researchers have emphasized self-learning strategies to make AE more powerful.

3.1.2. Deep Belief Network Based Methods for Attack Detection. A deep belief network (DBN) can be divided into two parts, that is, several layers of restricted Boltzmann machines (RBMs) trained as unsupervised learning networks and one layer of backpropagation neural network (BPNN or BP). Essentially, an RBM is a stochastic generative neural network, that is, an undirected graphical model composed of a layer of visible neurons and a layer of hidden neurons. Due to these natural characteristics of RBMs, a DBN can be trained effectively layer by layer.

Early on, Gao et al. [31] focus on dealing with big raw data and apply a deep belief network to construct an intrusion detection system. In their paper, they try different DBN models by adjusting parameters such as the number of layers and hidden layers. They find that the best parameter setting is a four-layer DBN model, which achieves better performance than other machine learning methods on the KDDCup 99 dataset.

Afterwards, Ding et al. [32] represent malware as opcode sequences and use a DBN to detect malware, where they use unsupervised learning to pretrain a multilayer generative model that helps the DBN avoid overfitting. We show its structure in Figure 3, where we can observe that the DBN works as the classifier in the whole workflow, with steps of RBM training and BP fine-tuning. With the help of additional unlabeled data, their proposed DBN achieves an accuracy as high as 96%, outperforming three traditional artificial intelligence models, that is, SVM, kNN, and decision tree. However, their method is not validated with other metrics.

Since the behavioral characteristics of ad hoc networks have brought great challenges to network security, Tan et al. [33] propose a DBN-based intrusion detection structure for ad hoc networks. Their proposed DBN model contains 6 modules: a wireless monitoring node for data fetching, a data fusion module to fuse useful data and remove redundancy, a DBN training module and a DBN intrusion module to train the model and identify whether there is an intrusion, respectively, and a response module that presents the results of the proposed model to users. Experimental results show that their proposed method reaches 97.6% accuracy, making
it suitable for implementation in intrusion detection applications.

[Diagram: PE files pass through a PE parser (PE file unpacking, decompiling the PE file, extracting opcode n-grams), feature selection (selecting opcode n-grams, constructing feature vectors), and a DBN classifier (RBM training, BP fine-tuning).]

Figure 3: Workflow of the opcode malware detection approach proposed by Ding et al. [32], which consists of three major components: PE (Portable Executable) parser, feature extractor, and malware detection module. Note that the DBN is the core classifier of the malware detection module.

To explore the capabilities of DBN for detecting intrusion attacks, Alom et al. [34] propose an effective platform to analyze intrusion attempts in network traffic. Their system first uses digital encoding and standardization to select features and then uses a DBN to classify network intrusions by assigning a class label to each feature vector. According to their experiments and analysis, their system can not only detect attacks but also accurately identify and classify network activities based on limited, incomplete, and nonlinear data sources.

Many attempts have been made to use DBN for intrusion detection. However, many problems remain unsolved, such as redundant information and a tendency to become trapped in local optima. To solve these problems, Zhao et al. [35] propose to detect intrusion attacks by combining the strengths of DBN and a probabilistic neural network (PNN). First, they reduce the original input data to a low-dimensional representation by utilizing the nonlinear describing capability of DBN, while the DBN maintains the basic characteristics of the original data in the representation. Second, a particle swarm optimization algorithm is used to reduce the number of hidden nodes in every layer. Third, the PNN is introduced to classify the low-dimensional information. Their experiments on the KDDCup 99 dataset show that they have solved the above problems to a certain extent.

Regarding real-time attack detection as the biggest challenge of intrusion detection, Alrawashdeh and Purdy [36] propose an anomaly detection method based on DBN, which consists of only a one-hidden-layer RBM and a fine-tuning layer constructed from a logistic regression classifier. Their simple DBN design achieves instant running and the best performance (reported as 97.7% accuracy and 8 s CPU time per instance) when tested on the KDDCup 99 dataset. Their method makes it possible to implement deep learning methods for attack detection on platforms with low computational resources, such as drones, cell phones, and personal computers, which greatly expands the usage scenarios of such methods.

Because traditional intrusion detection approaches have difficulty dealing with high-speed network data and currently cannot detect unknown attacks, Zhang et al. [37] propose a network attack detection model integrating flow calculation and deep learning, which comprises two parts: a real-time detection algorithm based on frequent patterns and a classification algorithm based on DBN and SVM. Sliding-window stream data processing enables real-time detection, and the DBN-SVM algorithm improves classification accuracy. Several groups of comparative experiments are carried out on the CICIDS2017 dataset, and the method's real-time detection efficiency is higher than that of traditional algorithms.

3.1.3. Generative Adversarial Network Based Methods for Attack Detection. Due to its ability to discover the inherent patterns of data and generate new samples, the generative adversarial network (GAN) is one of the most promising unsupervised learning methods proposed in recent years. The main inspiration of GAN comes from the idea of a zero-sum game. When applied to deep neural networks, a game is played continuously between a generator G and a discriminator D until G is capable of learning a representation of the distribution of actual data. G imitates, models, and learns the distribution characteristics of real data as closely as possible, while the task of D is to distinguish whether an input comes from the real data or from the output of G. Through the continuous competition between these two internal models, the generation ability of G and the discrimination ability of D can both be greatly improved.

Even though GAN is new in concept and hard to train, researchers have successfully built several attack detection applications with it as the basic structure. For instance, Erpek et al. [38] propose a GAN-based approach to detect jamming attacks on wireless communications and defend against them based on collected information about the attacks. Specifically, their model consists of a transmitter, a receiver, and a jammer. The transmitter adopts a pretrained classifier to predict the current channel state and decide whether to transmit based on the latest sensing results, while the jammer collects channel states and ACKs to construct a classifier that can predict the next transmission and successfully block it. The jammer uses the classification score to control its power under an average power constraint. Afterward, a GAN is designed to work as part of the jammer, which cuts down the data collection time by adding synthetic samples.

Using machine learning technology to perform phishing detection, that is, detecting URLs of fake web addresses, is popular due to its high effectiveness and real-time response. However, an adversary may bypass a URL classification algorithm by modifying URL components. To solve this problem,

AlEroud and Karabatis [39] propose to generate URL-based phishing examples with the generator of a GAN. These examples are then fed to the discriminator, that is, a black-box phishing detector. In their proposed GAN model, whose structure is shown in Figure 4, the generator network generates perturbed versions of real phishing examples and converts them into adversarial examples. The discriminator network works as a phishing detector and learns to classify both generated and real examples. The generator parameters and weights are updated with information passed back from the discriminator. After testing on a public phishing dataset, their experimental results show that the proposed GAN succeeds in generating a large number of unknown phishing examples that evade detection.

GAN is not often used in the attack detection field. In fact, GAN is still developing rapidly in terms of structures, algorithms, and so forth. At present, GANs have shown promising results in many domains. This leads us to believe that using this technique to synthesize attack attempts is quite significant for creating defensive mechanisms. Such novel defensive mechanisms can further accomplish a number of tasks, such as preventing zero-day phishing attempts, detecting opinion spam, and detecting intrusion attacks. Therefore, we think there exists a broad research space for connecting GAN structures with the attack detection field.

3.2. Supervised Learning for Attack Detection

3.2.1. Deep Neural Network Based Methods for Attack Detection. DNN is also known as a multilayer perceptron with multiple hidden layers. Such a multilayer design makes it possible to express complex functions with fewer parameters, which makes DNN capable of facilitating feature extraction and representation learning. Essentially, there exist three categories of layers in a DNN. Generally speaking, the first layer is regarded as the input layer, the last layer as the output layer, and the middle layers as hidden layers.

To provide a solution to the network security problem, Tang et al. [40] propose a DNN model to perform flow-based anomaly detection. Their first attempt at applying DNN to network security results in a relatively simple DNN, which is composed of one input layer, three hidden layers, and one output layer. Experiments carried out on the NSL-KDD dataset show that the proposed DNN model can detect zero-day attacks and outperforms other machine learning methods.

To enhance the ability of DNN, Li et al. [41] propose a novel network structure called HashTran-DNN to classify Android malware. Its architecture design is shown in Figure 5. Its main innovation lies in transforming input samples with hash functions that preserve locality characteristics. After transforming the input data, HashTran-DNN uses an AE to perform a denoising task, so that the DNN classifier can obtain locality information in the latent space for better performance. The experimental results show that HashTran-DNN can effectively defend against four special testing attacks, whereas a standard DNN fails to detect all of these attacks.

Challenges arise from the fact that malicious attacks are constantly varying and occur at very large volumes, which requires scalable solutions. To meet this challenge, Vinayakumar et al. [18] propose a DNN structure with a scalable and hybrid design. It can monitor network traffic and host-level events in real time and proactively warn of possible network attacks. Specifically, their proposed framework adopts a scalable computing architecture, a text representation method, and DNNs to meet the requirement of processing big data. The nonlinear activation functions of the DNNs help improve the performance of their model.

For network administrators, it is an urgent task to prevent the invasion of malicious hackers and to keep network systems and computers in a safe and normal operating state. Peng et al. [42] propose a network intrusion detection method based on deep learning. It uses a deep neural network to extract features of network monitoring data and a BP neural network to classify intrusion types. The method is evaluated on the KDDCup 99 dataset. The results show that it achieves an accuracy of 95.45%, a significant improvement over traditional machine learning methods.

3.2.2. Convolutional Neural Network Based Methods for Attack Detection. CNN involves convolution computation and a deep structure, and it is one of the representative and most commonly used techniques in the deep learning domain. Specifically, CNN uses a variant of the multilayer perceptron design that requires minimal preprocessing. The basic structure of CNN is composed of input and output layers and multiple hidden layers, which include convolution, pooling, and fully connected layers. Compared with other classification algorithms, CNN uses relatively little preprocessing and does not depend on hand-crafted features based on prior knowledge. These are its main advantages.

Convolutional neural networks have been applied to the network security field with much promising progress. For example, Kolosnjaji et al. [43] construct a neural network with convolutional and recurrent network layers, which obtains classification features to model a malware detection system. Through their proposed method, they obtain a hierarchical feature extraction architecture. It combines the advantages of convolution operations from convolutional layers and sequence modeling from recurrent layers. Afterwards, Kolosnjaji et al. [44] further extend it to incorporate features derived from the headers of Portable Executable files. This achieves remarkable accuracy and recall when the data are fused.

To detect attack indicators in advance, Saxe and Berlin [19] propose the eXpose neural network. It takes raw short strings as input and extracts features for classification using character-level embeddings. It is noted that their original inputs cover a wide and complicated range that is hard for algorithms to deal with. Owing to its self-extracted feature design, eXpose is superior to baseline methods based on

manual feature extraction. However, it achieves a decrease in false alarm rate compared with these baselines. This indicates that the automatic feature extraction process in CNN is not robust and reasonable enough, since it may introduce extra or even noisy information from the original inputs.

Figure 4: Overview of steps for the GAN model proposed by AlEroud and Karabatis [39]. [Diagram: a dataset with phishing and legitimate URLs provides labeled data. URL features (tensors) of legitimate/benign, phishing, and adversarial URLs are fed to the discriminator (phishing detector). The generator maps noise to adversarial URLs and receives feedback from the discriminator.]

Figure 5: Architecture design of HashTran-DNN model proposed by Li et al. [41]. [(a) HashTran-DNN architecture, which contains a new “hashing layer”: input layer, hashing layer MH = {H1; ...; HL}, first hidden layer, latent space representations, and a decoder and classifier trained with an adjusted training procedure. Second panel: features extracted from benign and malicious files are mapped by the hash functions to hash value representations and passed through the hidden layer to the decoder and classifier. At test time, an input is either rejected as an adversarial example or classified.]

Malicious web shell detection is an important means to protect network security. Aiming at the analysis of HTTP requests, Zhang et al. [46] propose a malicious request detection approach based on word2vec representation and CNN. This is the first attempt to combine “word2vec” and CNN in the malicious detection domain. Specifically, they first use the “word2vec” tool to represent each word obtained from the HTTP request as a feature vector. Then, they represent the web request as a fixed-size matrix by concatenating these vectors. Finally, they build the web shell classification model on a CNN structure. Several groups of experiments are carried out, and the proposed method performs best when compared with relevant classical classifiers.
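To make the representation step of Zhang et al. [46] more concrete, the Python sketch below turns an HTTP request into a fixed-size matrix of word vectors and applies one convolution filter with max pooling. It is only an illustration under stated assumptions. A trained word2vec model is replaced by a hypothetical hash-based `embed` function. The sizes `EMBED_DIM` and `MAX_WORDS`, the tokenizer, and the toy kernel are our own choices, not those of the original work.

```python
import hashlib
import re

EMBED_DIM = 8   # assumed vector size (real word2vec models typically use far more)
MAX_WORDS = 16  # assumed number of tokens kept per request (rows of the matrix)


def tokenize(request: str) -> list[str]:
    """Split an HTTP request into alphanumeric words and single punctuation tokens."""
    return re.findall(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]", request)


def embed(word: str) -> list[float]:
    """Hypothetical stand-in for a trained word2vec lookup: a deterministic
    pseudo-vector derived from a SHA-256 hash of the word, values in [-1, 1]."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [digest[i] / 127.5 - 1.0 for i in range(EMBED_DIM)]


def request_to_matrix(request: str) -> list[list[float]]:
    """Stack word vectors into a MAX_WORDS x EMBED_DIM matrix,
    truncating long requests and zero-padding short ones."""
    rows = [embed(w) for w in tokenize(request)[:MAX_WORDS]]
    rows += [[0.0] * EMBED_DIM for _ in range(MAX_WORDS - len(rows))]
    return rows


def conv1d_max(matrix: list[list[float]], kernel: list[list[float]]) -> float:
    """Slide one filter spanning len(kernel) consecutive word vectors over the
    matrix, apply ReLU, then global max pooling (a common text-CNN pattern)."""
    k = len(kernel)
    scores = []
    for start in range(len(matrix) - k + 1):
        s = sum(kernel[i][j] * matrix[start + i][j]
                for i in range(k) for j in range(EMBED_DIM))
        scores.append(max(0.0, s))
    return max(scores)


if __name__ == "__main__":
    matrix = request_to_matrix("GET /shell.php?cmd=cat%20/etc/passwd HTTP/1.1")
    kernel = [[0.1] * EMBED_DIM for _ in range(3)]  # toy filter spanning 3 words
    print(len(matrix), len(matrix[0]), conv1d_max(matrix, kernel))
```

The example request yields a 16 x 8 matrix because it is truncated to its first 16 tokens; shorter requests are zero-padded instead. A real detector would learn many such filters and feed the pooled values to a classification layer.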
To achieve robust performance in attack detection with a CNN structure, Wang et al. [45] present an end-to-end encrypted traffic classification method based on a one-dimensional CNN. In this method, feature extraction, feature selection, and the classifier are integrated into an end-to-end framework. Its detailed workflow is shown in Figure 6. During the training phase, the proposed 1D-CNN directly learns the relationship between automatically extracted features and the predicted output labels. In their experiments, they adopt the ISCX VPN-nonVPN traffic dataset for verification and achieve better performance than the latest methods in 11 of 12 evaluation measurements. Such promising results are attributed to a robust and informative traffic data representation and to fine-tuning steps that improve model ability. Regarding network traffic data as a two-dimensional image, Wang et al. [47] further propose a new CNN-based traffic analysis approach. They test their algorithm on the USTC-TFC2016 flow dataset and report an average classification accuracy as high as 99%.
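The byte-level inputs used by Wang et al. [45, 47] can be sketched as follows. This is a minimal illustration, assuming each flow is truncated or zero-padded to a hypothetical `FLOW_LEN` of 784 bytes. The same vector can then be fed to a 1D convolution (as in the 1D-CNN of [45]) or reshaped into a 28 x 28 grey-scale image (as in the image view of [47]). The length, the normalization, and the helper names are assumptions for illustration, not details confirmed by the text above.

```python
FLOW_LEN = 784  # assumed fixed flow length in bytes (784 = 28 * 28)
SIDE = 28       # side of the square "image" used for the 2D variant


def flow_to_vector(payload: bytes) -> list[float]:
    """Truncate or zero-pad raw flow bytes to FLOW_LEN and scale each byte to [0, 1]."""
    data = payload[:FLOW_LEN].ljust(FLOW_LEN, b"\x00")
    return [b / 255.0 for b in data]


def vector_to_image(vec: list[float]) -> list[list[float]]:
    """Reshape the 1D vector row by row into a SIDE x SIDE grey-scale image."""
    return [vec[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]


def conv1d(signal: list[float], kernel: list[float], stride: int = 1) -> list[float]:
    """'Valid' 1D convolution as computed in CNN layers (cross-correlation, no kernel flip)."""
    k = len(kernel)
    return [
        sum(kernel[i] * signal[start + i] for i in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


if __name__ == "__main__":
    # Hypothetical flow: a TLS-like record header followed by filler bytes.
    flow = bytes([0x16, 0x03, 0x01]) + b"A" * 1000
    vec = flow_to_vector(flow)       # input for a 1D-CNN
    img = vector_to_image(vec)       # input for a 2D-CNN
    feature_map = conv1d(vec, [0.04] * 25, stride=2)
    print(len(vec), len(img), len(img[0]), len(feature_map))  # prints: 784 28 28 380
```

Fixing the input length is what allows raw traffic of arbitrary size to be processed by a network with a fixed input layer. The reshape in `vector_to_image` changes only the layout, not the information.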
Figure 6: Workflow of the traffic analysis approach proposed by Wang et al. [45], which consists of three parts: preprocess, training, and testing phases.

To cope with diverse attacks in wireless network traffic and improve the detection of malicious intrusions in wireless networks, Yang and Wang [48] propose an intrusion detection method based on an improved convolutional neural network. It is named the ICNN-based Wireless Network Intrusion Detection Model. They first preprocess the network traffic data and then model the data using a CNN. The CNN abstracts low-level intrusion traffic data into high-level features and automatically extracts sample features. It optimizes network parameters through the stochastic gradient descent algorithm until the model converges. The results on KDDTest+ show that the detection accuracy is 8.82% and 0.51% higher than that of LeNet-5 and DBN, respectively, while the false positive rate is also lower. It also has a clear advantage over previous methods.

Low-rate denial of service (LDoS) attacks degrade the performance of network services, and it is difficult to distinguish their attack behavior from normal traffic. Thus, Tang et al. [49] propose a new LDoS attack detection method based on multifeature fusion and a convolutional neural network (CNN). They calculate features and fuse them into a feature map that describes the state of the network. The CNN model is then used to classify feature maps and detect those that include LDoS attacks. Experiments carried out on the NS2 simulation platform and a test-bed show that the proposed method can effectively detect LDoS attacks with an accuracy of 97.1%.

3.2.3. Recurrent Neural Network Based Methods for Attack Detection. The outputs of DNN and CNN only consider the influence of the current input, without considering information from previous and future time steps. Therefore, they achieve significant performance on classification or recognition tasks without time-varying characteristics. To handle time-dependent data, RNN is proposed as a special category of neural network structures, which is designed with a “memory” function to retain previous content. In fact, such a design coincides with the idea that “human cognition is based on past experience and memory.” RNN is thus good at dealing with time-series information. However, there are still some problems in the structural design of RNN, such as vanishing or exploding gradients, which lead to a failure to remember or model long-term dependencies. Therefore, researchers developed LSTM and GRU with gate designs and memory cells. These designs successfully preserve long-term relationships by letting important components of the information flow pass through.

Early on, Staudemeyer [50] proposes to consider the time-series characteristics of known malicious behavior and network traffic, which may improve the accuracy of attack detection algorithms. To confirm this, the author implements LSTM for intrusion detection, based on the excellent ability of LSTM to model long-term dependencies. The network consists of four memory blocks, each of which contains two cells, and it keeps a balance between computational cost and detection performance. The experimental results indicate that the proposed LSTM model is better than previously published methods, since LSTM can learn to backtrack and correlate consecutive connection records in a time-varying manner.

Later, Krishnan and Raajan [51] apply RNN to the task of attack classification. Their model is constructed as a savvy self-learning based intrusion detection system with an RNN structure. During the experiments, their proposed intrusion detection system could filter

attacks but failed to identify false positives. Compared with the baseline methods, their proposed method improves measurements such as classification accuracy and time consumption.

Similarly, Yin et al. [52] explore utilizing RNN for intrusion detection, in a system named RNN-IDS, and evaluate RNN-IDS on both binary and multiclass classification. In fact, an RNN has one-way information flow from the input units to the hidden units, and also from the previous hidden unit to the current one. The hidden units can be regarded as storage units that store end-to-end useful information for classification. Using the NSL-KDD dataset, they test whether parameters such as the number of neurons have an impact on RNN-IDS. Compared with previous works such as ANN, random forest, and SVM, RNN-IDS has an advantage in classification performance with high accuracy.

LSTM solves the long-term dependency problem and overcomes the vanishing gradient problem during training. Building on this, Kim et al. [54] apply an LSTM architecture to intrusion detection, where the hidden layer size and the learning rate are set to 80 and 0.01, respectively, after experiments. Compared with Staudemeyer [50], the constructed LSTM model has a higher false detection rate when trained on the KDDCup 99 dataset. Following the trend of applying LSTM to attack detection, Le et al. [55] build an LSTM classifier to detect intrusions as well. Their aim is to find the most suitable optimizer for the gradient descent optimization of LSTM. They compare six widely used optimization methods, that is, Adagrad, Adadelta, RMSprop, Adam, Adamax, and Nadam, and find that LSTM with the Nadam optimizer is the most effective.

To reduce the high false alarm rate of the former methods, Kim et al. [53] propose a system-call analysis method developed for anomaly-based host intrusion detection systems. As shown in Figure 7, their method consists of two modules. The front-end module, that is, the system call language models, uses an LSTM structure to model the time-varying characteristics of system calls in various environments. The back-end module uses a set of ensemble and threshold-based classifiers to predict anomalies from the information passed from the front-end module.

GRU is a variant of LSTM. A conventional GRU classifier uses a softmax function as the final output layer and a cross-entropy function to calculate its loss. Based on the GRU structure, Agarap [56] proposes a novel network for binary classification in the attack detection field, which takes a total of 21 features as model inputs. Specifically, a linear support vector machine (SVM) is introduced to replace the softmax function of the GRU model. This achieves relatively better results than the traditional GRU-softmax network on public datasets, owing to faster convergence and better classification ability.

3.3. Other Deep Learning Methods for Attack Detection. In this subsection, we focus on the hybrid category of attack detection methods, which are designed to integrate the advantages of different deep learning structures.

Early in 2015, Li et al. [57] apply an AE- and DBN-based hybrid deep learning method for malicious code detection. Specifically, they adopt an AE to reduce the dimensionality of the original data and to focus on the main and important features. Afterward, they use a DBN-based learning model, which consists of multilayer RBMs and a layer of BPNN, to detect malicious code. Each RBM layer is trained in an unsupervised manner and the BP layer in a supervised manner, and the optimal hybrid model is finally obtained by fine-tuning the whole network. Experiments show that the detection accuracy of their hybrid network is higher than that of previous DBN-based networks.

Later in 2017, Ludwig [58] employs an ensemble network to classify various types of attacks. In fact, ensemble learning classifies targets with multiple classifiers and merges their results to form robust outputs. To distinguish between normal and abnormal behaviors, the proposed method fuses AE, BNN, DNN, and extreme learning machine for better performance. The proposed ensemble method brings promising results and achieves more accurate performance than a single classifier for the detection task.

Following the idea of fusing classifiers to obtain better results, Li et al. [59] propose an ensemble structure in 2018 to enhance the robustness of neural networks for malware detection. The network is shown in Figure 8. More specifically, a group of neural networks is trained in the training stage, and each classifier is paired with its own components, such as input transformation and semantics preservation. In the test stage, the labels of samples are determined by voting among the different classifiers. Their proposed ensemble framework was applied to the AICS 2019 challenge and achieved good performance in both accuracy and recall.

In order to detect network attacks effectively, Liu et al. [60] propose an end-to-end detection method in 2019. Based on deep learning, they propose two payload classification models: PL-CNN and PL-RNN. The models learn feature representations from the original payload without feature engineering, which enables end-to-end detection. At the same time, they design a data preprocessing method that keeps enough information while remaining efficient. The accuracies of the two proposed methods are 99.36% and 99.98%, respectively, on the DARPA 1998 dataset. The proposed methods support the use of network data flows for effective end-to-end attack detection, which addresses a practical problem.

Most recently in 2019, Zhang et al. [61] do not hand-design flow features but directly extract the original data information for analysis. To learn the temporal and spatial characteristics of flows simultaneously, they propose a new network intrusion detection model called deep network. It integrates an improved LeNet-5 and an LSTM neural network structure. The performance of the network is evaluated on the CICIDS2017 and CTU datasets, which contain a large amount of traffic and relatively new types of attacks. The experimental results show that the performance of the network model is better than other

network intrusion detection models, and it can achieve the best detection accuracy.

Figure 7: Structure design of Kim et al. [53] for intrusion detection system. [System call language models LM1, LM2, ..., LMm, trained on normal training data, each feed a thresholding classifier Cf1, Cf2, ..., Cfm. An ensemble classifier Cf combines their outputs to label a query sequence as normal or abnormal.]

Figure 8: Workflow of the hybrid model proposed by [59]. [Ensemble fen. Training: preprocessed training data pass through transformation and semantics-preservation modules (encoder and decoder). Each module produces transformed data for its own classifier f1, ..., fl, which is trained with a loss function against the target labels (forward and gradient paths shown). Testing: a sample x is fed to classifiers f1, ..., fl, and the output y is obtained by voting.]

4. Comparisons and Analysis

4.1. Public Datasets. Many public datasets are popular for evaluating and comparing the efficiency and effectiveness of different attack detection methods. Among them, we describe two famous benchmark datasets, that is, KDDCup 99 and NSL-KDD, which are widely used in academic research to evaluate the ability to detect attacks.

4.1.1. KDDCup 99 Dataset. Despite some drawbacks, such as containing a great deal of redundant training and testing data, the KDDCup 99 dataset is famous in the field of cybersecurity. It includes both labeled training data and unlabeled test data, which correspond to seven and two weeks of data, respectively, originating from the DARPA'98 IDS evaluation program [62]. Five categories of labels are contained in the dataset, namely, normal, DoS (denial of service), Probe (probing), R2L (remote to local), and U2R (user to root). Normal refers to normal traffic instances. DoS is an attack in which the attacker tries to make the target machine stop providing services or resource access. Probe represents surveillance and probing. R2L refers to unauthorized access from a remote machine to a local one. U2R refers to unauthorized access to local superuser (root) privileges by a local unprivileged user. Table 1 lists 22 different attacks contained in KDDCup 99, which can be categorized into these four attack types.

In the KDDCup 99 dataset, each record has 41 features in total, including basic features, content features, and traffic features, as shown in Table 2. The basic features are obtained from TCP/IP connections and describe basic characteristics of a connection. The content features are extracted from the data content and can be used to detect U2R and R2L attacks. These attacks are usually hidden in packet payloads and do not show abnormal patterns in a single packet or a normal connection. Meanwhile, traffic features refer to values accumulated over a window of 100 connections. It is noted that 7 features are symbolic and 34 features are continuous.

4.1.2. NSL-KDD Dataset. NSL-KDD is famous as a new development of the KDDCup 99 dataset, which was created to

reduce the shortcomings of the previous dataset. Specifically, it removes redundant records from the training and test data to achieve a more accurate detection rate. It also officially fixes the number of records in both the training and test data. Moreover, the number of records selected from each difficulty level group is inversely proportional to the percentage of records of that group in the original KDD dataset. Hence, evaluations and comparisons among different learning technologies become more effective and clearer.

The NSL-KDD and KDDCup 99 datasets are similar in structure, and both are divided into the four attack types mentioned before. The NSL-KDD dataset is divided into two parts, KDDTrain+ and KDDTest+. The specific number of records for each attack type is shown in Table 3. It is noted that there are 17 attack types in KDDTest+ that do not appear in KDDTrain+. This interesting setting imitates a real-life network environment with unknown attacks and makes NSL-KDD more challenging than the KDDCup 99 dataset. We believe that only learning methods built on a realistic theoretical basis, that is, analyzing the inherent characteristics of attack behaviors, would achieve promising results on NSL-KDD.

Table 1: Category of 22 different attacks contained by KDDCup 99.
Class label Attack name
DoS back, land, neptune, pod, smurf, teardrop.
Probe ipsweep, nmap, portsweep, satan.
R2L ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, warezmaster.
U2R buffer_overflow, loadmodule, perl, rootkit.

Table 2: Feature set for each instance in KDDCup 99 dataset.
No. Feature Type
1 duration Continuous
2 protocol_type Symbolic
3 service Symbolic
4 flag Symbolic
5 src_bytes Continuous
6 dst_bytes Continuous
7 land Symbolic
8 wrong_fragment Continuous
9 urgent Continuous
10 hot Continuous
11 num_failed_logins Continuous
12 logged_in Symbolic
13 num_compromised Continuous
14 root_shell Continuous
15 su_attempted Continuous
16 num_root Continuous
17 num_file_creations Continuous
18 num_shells Continuous
19 num_access_files Continuous
20 num_outbound_cmds Continuous
21 is_host_login Symbolic
22 is_guest_login Symbolic
23 count Continuous
24 srv_count Continuous
25 serror_rate Continuous
26 srv_serror_rate Continuous
27 rerror_rate Continuous
28 srv_rerror_rate Continuous
29 same_srv_rate Continuous
30 diff_srv_rate Continuous
31 srv_diff_host_rate Continuous
32 dst_host_count Continuous
33 dst_host_srv_count Continuous
34 dst_host_same_srv_rate Continuous
35 dst_host_diff_srv_rate Continuous
36 dst_host_same_src_port_rate Continuous
37 dst_host_srv_diff_host_rate Continuous
38 dst_host_serror_rate Continuous
39 dst_host_srv_serror_rate Continuous
40 dst_host_rerror_rate Continuous
41 dst_host_srv_rerror_rate Continuous

Table 3: Records distribution in training and test data [63].
Class KDDTrain+ KDDTest+
DoS 45927 7458
Probe 11656 2421
R2L 995 2754
U2R 52 200

4.2. Measurements. In this subsection, we describe 7 measurements: accuracy (ACC), precision (PR), recall (RE, also called true positive rate, TPR), false negative rate (FNR), false positive rate (FPR), true negative rate (TNR), and F1-score (FS). Firstly, we define several terms. True positive (TP) and false negative (FN) refer to attack data that are correctly classified as attack or misclassified as normal, respectively. False positive (FP) and true negative (TN) refer to normal data that are classified as attack or normal, respectively. Afterwards, we define the measurements as follows:

$$
\begin{aligned}
\mathrm{ACC} &= \frac{TP + TN}{TP + FN + TN + FP}, \\
\mathrm{PR} &= \frac{TP}{TP + FP}, \\
\mathrm{RE} &= \frac{TP}{TP + FN}, \\
\mathrm{FNR} &= \frac{FN}{TP + FN}, \\
\mathrm{FPR} &= \frac{FP}{FP + TN}, \\
\mathrm{TNR} &= \frac{TN}{TN + FP}, \\
\mathrm{FS} &= \frac{2 \cdot \mathrm{PR} \cdot \mathrm{RE}}{\mathrm{PR} + \mathrm{RE}},
\end{aligned}
\tag{1}
$$

where:
- ACC is the proportion of correctly classified data among all data.
- PR is the proportion of data predicted as attacks that are actual attacks.
- TPR or RE is the proportion of actual attacks that are correctly predicted as attacks.
- FNR is the proportion of attack data misclassified as normal among all attack data.
- FPR, also called the false alarm rate (FAR), is the proportion of benign events incorrectly classified as attacks.
- TNR is the proportion of normal data correctly classified among all normal data.
- F1-score is the harmonic mean of PR and RE, representing balanced performance in both precision and recall.

4.3. Comparisons and Performance Analysis. In Table 4, we offer detailed statistics on attack detection results achieved by various methods listed in Section 3. Most of these methods are designed to perform network intrusion detection and malware detection. Among the many measurements, we select accuracy, precision, F1-score, and FPR for evaluation, since most of the listed methods use these measurements in their experiments. We must emphasize that the performance comparisons are imbalanced, since different authors adopt different datasets, measurements, and settings. However, Table 4 can still provide much information by roughly comparing different deep learning methods for attack detection.

From Table 4, we notice that the mean performances of different categories of attack detection methods vary. In the authors’ opinion, DBN, LSTM, CNN, and AE rank in descending order of detection performance. Meanwhile, the results of hybrid methods are inconsistent, since their performance is highly related to the ensembled classifiers. DBN achieves the highest performance, owing to its inherent property of multiple layers in dealing with large quantities of unlabeled data. LSTM may achieve higher performance than CNN by exploiting temporal properties for more precise modeling. AE may struggle with large amounts of unlabeled data when it lacks enough prior knowledge or enough layers to describe the embedded complexity.

Essentially, it is interesting to point out that RBMs and AEs are popular in intrusion detection because they can be pretrained with unlabeled data and fine-tuned with only a small amount of labeled data. We regard the ACC values achieved by the listed methods as the primary evaluation index due to its completeness. The best performance on the KDDCup 99 dataset, that is, 99.8% by Kim et al. [53], is higher than that on the NSL-KDD dataset, that is, 98.3% by Javaid et al. [29]. This indicates that the NSL-KDD dataset is much more difficult than the KDDCup 99 dataset due to the unknown instances in its test set. Another interesting point is that all the listed CNN-based methods avoid the KDDCup 99 and NSL-KDD datasets. Their small number of samples could not demonstrate the distinctive power of CNN for generating feature descriptors with abundant information. Meanwhile, other deep learning methods, especially unsupervised learning methods, could cope with the shortage of sufficient training samples.

We can observe that the performance of AE-based methods is uneven, and most of the improved AE-based methods clearly perform better than traditional AE-based methods. This is because the AE structure might lose important information during the compression process, whereas improved AEs can better capture the important and informative parts of input data with additional designs. Similarly, LSTM-based and GRU-based methods outperform RNN-based methods, owing to their gates and memory cells. In fact, such designs bring the capability of maintaining long-term information, thus better modeling long-term relationships.

Due to the large number of DBN- and RNN-based (e.g., LSTM and GRU) methods for attack detection proposed by researchers, we regard DBN- and RNN-based methods as typical unsupervised and supervised algorithms, respectively. We further compare them to show the advantages and disadvantages of both groups. Essentially, RNN can remember information from the last several time steps and apply it in the calculation for the current unit. This introduces temporal information that helps achieve more accurate classification. However, RNN is only a powerful structure when sufficient training instances are available, whereas attack data, especially unknown attacks, are hard to obtain. Meanwhile, DBN is capable of automatically discovering feature patterns from input data. Moreover, the unsupervised DBN is less likely to overfit than supervised methods due to its pretraining procedure, in which DBN learns inherent descriptions of abnormal behaviors or attacks from unlabeled data. This generative ability makes DBN, a typical unsupervised learning method, fit the real environment of network security. Last but not least, DBN is easy to train, fast to converge, and low in running time, owing to having fewer hidden layers than deep structures such as CNN. Therefore, we think unsupervised learning methods could produce better classification results than supervised learning methods, especially when facing small, imbalanced, or redundant datasets.

5. Summary

Deep learning uses cascaded layers in a hierarchical structure to perform data processing, which has produced significant results in the domains of unsupervised feature learning and pattern recognition. Inspired by the performance of deep learning methods, we believe deep learning is important for the field of network security, and we therefore review current deep learning methods for attack detection. We analyze recent methods, classify them according to different deep learning techniques, and compare the performance of the most representative methods.

Over the past few years, research on how to apply deep learning methods to attack detection has made great progress. However, many problems still exist. Firstly, it is challenging to modify deep learning methods as real-time
14 Security and Communication Networks

Table 4: Quantitative evaluation of listed attack detection methods using different deep learning structures, where ID, MD, and TI represent
network intrusion detection, malware detection, and traffic identification, respectively.
DL Method Usage Dataset ACC (%) PR (%) FPR (%) FS
Convolutional AE Yu et al. [26] ID CTU-UNB — 98.44 — 0.980
Sparse AE Javaid et al. [29] ID NSL-KDD 98.30 — — 0.990
AE Pamartzivanos et al. [30] ID KDDCup 99 77.99 80.00 — —
SAE Farahnakian and Heikkonen [28] ID KDDCup 99 94.71 94.53 0.42 —
AE Shone et al. [17] ID NSL-KDD 89.22 92.97 10.78 0.910
Sparse AE Shone et al. [17] ID KDDCup 99 97.85 99.99 2.15 0.980
AE Aygun and Yavuz [64] ID NSL-KDD 93.62 91.39 — 0.938
Denoising AE Aygun and Yavuz [64] ID NSL-KDD 94.35 94.26 — 0.940
Sparse AE Gharic et al. [65] ID NSL-KDD 96.45 95.56 — 0.965
AE Yousefi-Azar et al. [27] ID, MD NSL-KDD 83.34 — — —
DBN Gao et al. [31] ID KDDCup 99 93.49 92.33 0.76 —
DBN Ding et al. [32] MD Netflow 96.10 — — —
DBN Qu et al. [66] ID NSL-KDD 95.25 — — —
DBN Tan et al. [33] ID Netflow 97.60 — 0.90 —
DBN Alom et al. [34] ID 40% NSL-KDD 97.50 — — —
DBN Zhao et al. [35] ID KDDCup 99 99.14 93.25 0.62 —
DBN Alrawashdeh and Purdy [36] ID 10% KDDCup 99 97.90 97.81 2.10 0.975
DNN Tang et al. [40] ID NSL-KDD 91.70 83.00 — —
DNN Vinayakumar et al. [18] ID, MD KDDCup 99 93.00 99.00 0.95
DNN Wang et al. [42] ID KDDCup 99 95.45 — — —
CNN Kolosnjaji et al. [43] MD Netflow — 93.00 — 0.920
CNN Saxe and Berlin [19] MD Netflow 92.00 — 0.10 —
CNN Wang et al. [45] ID ISCX — 97.30 — 0.960
CNN Wang et al. [47] TI Netflow 99.41 — — —
CNN Tang et al. [49] ID NS2 simulation 97.1 — — —
CNN Yang and Wang [48] ID KDDCup 99 95.36 95.55 0.76 0.930
LSTM Staudemyer [50] ID 10% KDDCup 99 93.85 — 1.62 —
RNN Krishnan and Raajan [51] ID KDDCup 99 77.55 84.60 — 0.730
RNN Yin et al. [52] ID NSL-KDD 83.28 — — —
LSTM Kim et al. [54] ID 10% KDDCup 99 96.93 98.80 10.00 —
LSTM Le et al. [55] ID KDDCup 99 97.54 98.95 9.98 —
LSTM Kim et al. [53] ID KDDCup 99 99.80 — 5.50 —
GRU Agarap [56] ID Netflow 84.15 — — —
Ensemble Ludwig [58] ID NSL-KDD 92.50 93.00 0.92 —
AE, DBN Li et al. [57] ID KDDCup 99 92.10 — 1.58 —
DCNN Naseer et al. [67] ID NSL-KDD 85.00 — — 0.980
PL-CNN Liu et al. [60] ID DARPA1998 99.36 90.56 — 0.910
PL-RNN Liu et al. [60] ID DARPA1998 99.98 99.98 — 0.990

classifiers for attack detection. Most previous works only reduce the feature dimension to lower the computation cost during feature extraction. Secondly, most deep learning techniques are best suited to image analysis and pattern recognition, so how to classify network traffic reasonably with deep learning techniques remains an interesting issue. Thirdly, classification results generally improve as more data are involved in the experiments [68]; however, most attack detection problems lack sufficient data. Therefore, combining supervised and unsupervised learning may provide better performance, as many trials have shown. Moreover, with the development of IoT [69], fog, cloud [70], and big data technologies, how to use them to improve the effectiveness of deep-learning-based attack detection remains an open and interesting question.

According to the above analysis, we believe that this overview will benefit researchers who aim to improve the accuracy of attack detection, and that it will provide guidance and directions for further research in this field.

Data Availability

The data used to support the findings of this study were supplied by Dabao Wei under license and so cannot be made freely available. Requests for access to these data should be made to Yirui Wu (wuyirui@hhu.edu.cn).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key R&D Program of China under Grant 2018YFC0407901, the Fundamental Research Funds for the Central Universities under Grant B200202177, the Natural Science Foundation of China under Grant 61702160, and the Natural Science Foundation of Jiangsu Province under Grant BK20170892.

References

[1] S. Aftergood, “Cybersecurity: the cold war online,” Nature, vol. 547, no. 7661, pp. 30–31, 2017.
[2] M. A. Al-Garadi, A. Mohamed, A. Al-Ali, X. Du, and M. Guizani, “A survey of machine and deep learning methods for internet of things (IoT) security,” 2018, http://arxiv.org/abs/11023.
[3] A. Aleesa, B. Zaidan, A. Zaidan, and N. M. Sahar, “Review of intrusion detection systems based on deep learning techniques: coherent taxonomy, challenges, motivations, recommendations, substantial analysis and future directions,” Neural Computing and Applications, pp. 1–32, Springer, Berlin, Germany, 2019.
[4] G. Apruzzese, M. Colajanni, L. Ferretti, A. Guido, and M. Marchetti, “On the effectiveness of machine and deep learning for cyber security,” in Proceedings of 2018 10th International Conference on Cyber Conflict (CyCon), IEEE, Tallinn, Estonia, pp. 371–390, June 2018.
[5] D. Berman, A. Buczak, J. Chavis, and C. Corbett, “A survey of deep learning methods for cyber security,” Information, vol. 10, no. 4, p. 122, 2019.
[6] M. A. Ferrag, L. Maglaras, S. Moschoyiannis, and H. Janicke, “Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study,” Journal of Information Security and Applications, vol. 50, p. 102419, 2020.
[7] C. S. Wickramasinghe, D. L. Marino, K. Amarasinghe, and M. Manic, “Generalization of deep learning for cyber-physical system security: a survey,” in Proceedings of IECON 2018-44th Annual Conference of the IEEE Industrial Electronics Society, IEEE, Washington, DC, USA, pp. 745–751, October 2018.
[8] Y. Xin, L. Kong, Z. Liu et al., “Machine learning and deep learning methods for cybersecurity,” IEEE Access, vol. 6, pp. 35365–35381, 2018.
[9] X. Xu, C. He, Z. Xu, L. Qi, S. Wan, and M. Z. A. Bhuiyan, “Joint optimization of offloading utility and privacy for edge computing enabled IoT,” IEEE Internet of Things Journal, vol. 7, no. 4, pp. 2622–2629, 2020.
[10] X. Xu, Q. Liu, X. Zhang, J. Zhang, L. Qi, and W. Dou, “A blockchain-powered crowdsourcing method with privacy preservation in mobile environment,” IEEE Transactions on Computational Social Systems, vol. 6, no. 6, pp. 1407–1419, 2019.
[11] X. Xu, X. Liu, Z. Xu, F. Dai, X. Zhang, and L. Qi, “Trust-oriented IoT service placement for smart cities in edge computing,” IEEE Internet of Things Journal, vol. 7, 2019.
[12] X. Xu, Y. Chen, X. Zhang, Q. Liu, X. Liu, and L. Qi, “A blockchain-based computation offloading method for edge computing in 5G networks,” John Wiley & Sons, Hoboken, NJ, USA, 2019.
[13] C. Wang, Z. Chen, K. Shang, and H. Wu, “Label-removed generative adversarial networks incorporating with k-means,” Neurocomputing, vol. 361, pp. 126–136, 2019.
[14] T. Meng, K. Wolter, H. Wu, and Q. Wang, “A secure and cost-efficient offloading policy for mobile cloud computing against timing attacks,” Pervasive and Mobile Computing, vol. 45, pp. 4–18, 2018.
[15] X. Li and H. Wu, “Spatio-temporal representation with deep neural recurrent network in MIMO CSI feedback,” CoRR abs/1908.07934, 2019.
[16] R. Vinayakumar, K. Soman, and P. Poornachandran, “Evaluating effectiveness of shallow and deep networks to intrusion detection system,” in Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), IEEE, Udupi, India, pp. 1282–1289, September 2017.
[17] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[18] R. Vinayakumar, M. Alazab, K. P. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, “Deep learning approach for intelligent intrusion detection system,” IEEE Access, vol. 7, pp. 41525–41550, 2019.
[19] J. Saxe and K. Berlin, “eXpose: a character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys,” 2017, http://arxiv.org/abs/1702.08568.
[20] R. Pascanu, J. W. Stokes, H. Sanossian, M. Marinescu, and A. Thomas, “Malware classification with recurrent networks,” in Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1916–1920, Queensland, Australia, April 2015.
[21] Z. Feng, C. Shuo, and W. Xiaochuan, “Classification for DGA-based malicious domain names with deep learning architectures,” in Proceedings of 2017 Second International Conference on Applied Mathematics and Information Technology, London, UK, January 2017.
[22] J. Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, “Predicting domain generation algorithms with long short-term memory networks,” 2016, http://arxiv.org/abs/1611.00791.
[23] M. Z. Alom, T. M. Taha, C. Yakopcic et al., “The history began from AlexNet: a comprehensive survey on deep learning approaches,” 2018, CoRR abs/1803.01164.
[24] E. Aminanto and K. Kim, “Deep learning in intrusion detection system: an overview,” in Proceedings of 2016 International Research Conference on Engineering and Technology (2016 IRCET), Higher Education Forum, Seoul, South Korea, January 2016.
[25] L. Deng, “A tutorial survey of architectures, algorithms, and applications for deep learning,” APSIPA Transactions on Signal and Information Processing, vol. 3, 2014.
[26] Y. Yu, J. Long, and Z. Cai, “Network intrusion detection through stacking dilated convolutional autoencoders,” Security and Communication Networks, vol. 2017, Article ID 4184196, 10 pages, 2017.
[27] M. Yousefi-Azar, V. Varadharajan, L. Hamey, and U. Tupakula, “Autoencoder-based feature learning for cyber security applications,” in Proceedings of 2017 International Joint Conference on Neural Networks (IJCNN), IEEE, San Diego, CA, USA, pp. 3854–3861, June 2017.
[28] F. Farahnakian and J. Heikkonen, “A deep auto-encoder based approach for intrusion detection system,” in Proceedings of 2018 20th International Conference on Advanced Communication Technology (ICACT), IEEE, Chuncheon, South Korea, pp. 178–183, July 2018.
[29] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies (formerly BIONETICS), pp. 21–26, New York, NY, USA, December 2016.
[30] D. Papamartzivanos, F. Gomez Marmol, and G. Kambourakis, “Introducing deep learning self-adaptive misuse network intrusion detection systems,” IEEE Access, Piscataway, NJ, USA, 2019.
[31] N. Gao, L. Gao, Q. Gao, and H. Wang, “An intrusion detection model based on deep belief networks,” in Proceedings of 2014 Second International Conference on Advanced Cloud and Big Data, IEEE, Huangshan, China, pp. 247–252, November 2014.
[32] Y. Ding, S. Chen, and J. Xu, “Application of deep belief networks for opcode based malware detection,” in Proceedings of 2016 International Joint Conference on Neural Networks (IJCNN), pp. 3901–3908, Vancouver, BC, Canada, July 2016.
[33] Q.-s. Tan, W. Huang, and Q. Li, “An intrusion detection method based on DBN in ad hoc networks,” in Proceedings of Wireless Communication and Sensor Network: International Conference on Wireless Communication and Sensor Network (WCSN), World Scientific, Wuhan, China, pp. 477–485, December 2015.
[34] M. Z. Alom, V. Bontupalli, and T. M. Taha, “Intrusion detection using deep belief networks,” in Proceedings of 2015 National Aerospace and Electronics Conference (NAECON), pp. 339–344, Dayton, OH, USA, June 2015.
[35] G. Zhao, C. Zhang, and L. Zheng, “Intrusion detection using deep belief network and probabilistic neural network,” in Proceedings of 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), Taipei, Taiwan, December 2017.
[36] K. Alrawashdeh and C. Purdy, “Toward an online anomaly intrusion detection system based on deep learning,” in Proceedings of 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 195–200, Anaheim, CA, USA, December 2016.
[37] H. Zhang, Y. Li, Z. Lv, A. K. Sangaiah, and T. Huang, “A real-time and ubiquitous network attack detection based on deep belief network and support vector machine,” IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 3, pp. 790–799, 2020.
[38] T. Erpek, Y. E. Sagduyu, and Y. Shi, “Deep learning for launching and mitigating wireless jamming attacks,” IEEE Transactions on Cognitive Communications and Networking, vol. 5, no. 1, pp. 2–14, 2018.
[39] A. AlEroud and G. Karabatis, “Bypassing detection of URL-based phishing attacks using generative adversarial deep neural networks,” in Proceedings of the Sixth International Workshop on Security and Privacy Analytics, pp. 53–60, New Orleans, LA, USA, March 2020.
[40] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, “Deep learning approach for network intrusion detection in software defined networking,” in Proceedings of 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), IEEE, Reims, France, pp. 258–263, October 2016.
[41] D. Li, R. Baral, T. Li, H. Wang, Q. Li, and S. Xu, “HashTran-DNN: a framework for enhancing robustness of deep neural networks against adversarial malware samples,” 2018, http://arxiv.org/abs/1809.06498.
[42] W. Peng, X. Kong, G. Peng, X. Li, and Z. Wang, “Network intrusion detection based on deep learning,” in Proceedings of 2019 International Conference on Communications, Information System and Computer Engineering (CISCE), pp. 431–435, Haikou, China, July 2019.
[43] B. Kolosnjaji, A. Zarras, G. Webster, and C. Eckert, “Deep learning for classification of malware system call sequences,” in Proceedings of Australasian Joint Conference on Artificial Intelligence, pp. 137–149, Springer, Hobart, Australia, December 2016.
[44] B. Kolosnjaji, G. Eraisha, G. Webster, A. Zarras, and C. Eckert, “Empowering convolutional networks for malware classification and analysis,” in Proceedings of 2017 International Joint Conference on Neural Networks (IJCNN), pp. 3838–3845, San Diego, CA, USA, June 2017.
[45] W. Wang, M. Zhu, J. Wang, X. Zeng, and Z. Yang, “End-to-end encrypted traffic classification with one-dimensional convolution neural networks,” in Proceedings of 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), pp. 43–48, Taipei, Taiwan, June 2017.
[46] M. Zhang, B. Xu, S. Bai, S. Lu, and Z. Lin, “A deep learning method to detect web attacks using a specially designed CNN,” in Proceedings of 24th International Conference on Neural Information Processing, pp. 828–836, Guangzhou, China, November 2017.
[47] W. Wang, M. Zhu, X. Zeng, X. Ye, and Y. Sheng, “Malware traffic classification using convolutional neural network for representation learning,” in Proceedings of 2017 International Conference on Information Networking (ICOIN), Da Nang, Vietnam, January 2017.
[48] H. Yang and F. Wang, “Wireless network intrusion detection based on improved convolutional neural network,” IEEE Access, vol. 7, pp. 64366–64374, 2019.
[49] D. Tang, L. Tang, W. Shi, S. Zhan, and Q. Yang, “MF-CNN: a new approach for LDoS attack detection based on multi-feature fusion and CNN,” Mobile Networks and Applications, pp. 1–18, Springer, Berlin, Germany, 2020.
[50] R. C. Staudemeyer, “Applying long short-term memory recurrent neural networks to intrusion detection,” South African Computer Journal, vol. 56, no. 1, pp. 136–154, 2015.
[51] R. B. Krishnan and N. Raajan, “An intellectual intrusion detection system model for attacks classification using RNN,” International Journal of Pharmaceutical Technology and Biotechnology, vol. 8, no. 4, pp. 23157–23164, 2016.
[52] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[53] G. Kim, H. Yi, J. Lee, Y. Paek, and S. Yoon, “LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems,” 2016, http://arxiv.org/abs/1611.01726.
[54] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, “Long short term memory recurrent neural network classifier for intrusion detection,” in Proceedings of 2016 International Conference on Platform Technology and Service (PlatCon), pp. 1–5, Jeju, Korea, February 2016.
[55] T. Le, J. Kim, and H. Kim, “An effective intrusion detection classifier using long short-term memory with gradient descent optimization,” in Proceedings of 2017 International Conference on Platform Technology and Service (PlatCon), pp. 1–6, Jeju, Korea, February 2017.
[56] A. F. M. Agarap, “A neural network architecture combining gated recurrent unit (GRU) and support vector machine (SVM) for intrusion detection in network traffic data,” in Proceedings of the 2018 10th International Conference on Machine Learning and Computing, ACM, Macau, China, February 2018.
[57] Y. Li, R. Ma, and R. Jiao, “A hybrid malicious code detection method based on deep learning,” International Journal of Security and Its Applications, vol. 9, no. 5, pp. 205–216, 2015.
[58] S. A. Ludwig, “Intrusion detection of multiple attack classes using a deep neural net ensemble,” in Proceedings of 2017 IEEE Symposium Series on Computational Intelligence (SSCI), IEEE, Honolulu, HI, USA, November 2017.
[59] D. Li, Q. Li, Y. Ye, and S. Xu, “Enhancing robustness of deep neural networks against adversarial malware samples: principles, framework, and AICS'2019 challenge,” 2018, http://arxiv.org/abs/1812.08108.
[60] H. Liu, B. Lang, M. Liu, and H. Yan, “CNN and RNN based payload classification methods for attack detection,” Knowledge-Based Systems, vol. 163, pp. 332–341, 2019.
[61] Y. Zhang, X. Chen, L. Jin, X. Wang, and D. Guo, “Network intrusion detection: based on deep hierarchical network and original flow data,” IEEE Access, vol. 7, pp. 37004–37016, 2019.
[62] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, “The 1999 DARPA off-line intrusion detection evaluation,” Computer Networks, vol. 34, no. 4, pp. 579–595, 2000.
[63] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proceedings of 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, IEEE, Ottawa, Canada, pp. 1–6, July 2009.
[64] R. C. Aygun and A. G. Yavuz, “Network anomaly detection with stochastically improved autoencoder based models,” in Proceedings of 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), pp. 193–198, New York, NY, USA, June 2017.
[65] M. Gharib, B. Mohammadi, S. H. Dastgerdi, and M. Sabokrou, “AutoIDS: auto-encoder based method for intrusion detection system,” 2019, http://arxiv.org/abs/1911.03306.
[66] F. Qu, J. Zhang, Z. Shao, and S. Qi, “An intrusion detection model based on deep belief network,” in Proceedings of the 2017 VI International Conference on Network, Communication and Computing, pp. 97–101, Kunming, China, December 2017.
[67] S. Naseer, Y. Saleem, S. Khalid et al., “Enhanced network anomaly detection based on deep neural networks,” IEEE Access, vol. 6, pp. 48231–48246, 2018.
[68] N. Jones, “Computer science: the learning machines,” Nature, vol. 505, no. 7482, pp. 146–148, 2014.
[69] X. Xu, X. Zhang, H. Gao, Y. Xue, L. Qi, and W. Dou, “BeCome: blockchain-enabled computation offloading for IoT in mobile edge computing,” IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp. 4187–4195, 2020.
[70] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, “Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud,” IEEE Transactions on Industrial Informatics, vol. 16, 2019.
