
1462 Li / Front Inform Technol Electron Eng 2018 19(12):1462-1474

Frontiers of Information Technology & Electronic Engineering


www.jzus.zju.edu.cn; engineering.cae.cn; www.springerlink.com
ISSN 2095-9184 (print); ISSN 2095-9230 (online)
E-mail: jzus@zju.edu.cn

Review:

Cyber security meets artificial intelligence: a survey*

Jian-hua LI
School of Cyber Security, Shanghai Jiao Tong University, Shanghai 200240, China
E-mail: lijh888@sjtu.edu.cn
Received Sept. 16, 2018; Revision accepted Dec. 13, 2018; Crosschecked Dec. 24, 2018

Abstract: There is a wide range of interdisciplinary intersections between cyber security and artificial intelligence (AI). On one hand, AI technologies, such as deep learning, can be introduced into cyber security to construct smart models for implementing malware classification, intrusion detection, and threat intelligence sensing. On the other hand, AI models face various cyber threats, which can disturb their samples, learning processes, and decisions. Thus, AI models need specific cyber security defense and protection technologies to combat adversarial machine learning, preserve privacy in machine learning, secure federated learning, etc. Based on these two aspects, we review the intersection of AI and cyber security. First, we summarize existing research efforts in terms of combating cyber attacks using AI, including adopting traditional machine learning methods and existing deep learning solutions. Then, we analyze the counterattacks from which AI itself may suffer, dissect their characteristics, and classify the corresponding defense methods. Finally, from the aspects of constructing encrypted neural networks and realizing secure federated deep learning, we describe existing research on how to build a secure AI system.

Key words: Cyber security; Artificial intelligence (AI); Attack detection; Defensive techniques
https://doi.org/10.1631/FITEE.1800573 CLC number: TP309

1 Introduction

Today, various novel networking and computing technologies, such as software-defined networking (SDN), big data, and fog computing, have promoted the rapid development of cyberspace (Li GL et al., 2018; Li LZ et al., 2018a, 2018b). Meanwhile, cyber security has become one of the most important issues in cyberspace (Guan et al., 2017; Wu et al., 2018), and cyberspace security has a tremendous impact on various critical infrastructures. Traditional security relies on the static control of security devices deployed on special edges or nodes, such as firewalls, intrusion detection systems (IDSs), and intrusion prevention systems (IPSs), for network security monitoring according to pre-specified rules. However, this passive defense methodology is no longer useful in protecting systems against new cyber security threats, such as advanced persistent threats (APTs) and zero-day attacks. Moreover, as cyber threats become ubiquitous and sustained, the diverse attack entry points, high-level intrusion modes, and systematic attack tools reduce the cost of deploying cyber threats. To maximize the security level of core system assets, it is urgent to develop innovative and intelligent security defense methodologies that can cope with diversified and sustained threats. To implement new cyber security defense and protection, the system should obtain historical and current security state data and make intelligent decisions that provide adaptive security management and control.

* Project supported by the National Natural Science Foundation of China (Nos. 61431008 and 61571300)
ORCID: Jian-hua LI, http://orcid.org/0000-0002-6831-3973
© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Artificial intelligence (AI) is a fast-growing branch of computer science that researches and develops theories, methods, techniques, and application systems to simulate, extend, and expand human intelligence. Thanks to the development of ultrahigh-performance computing technology and the emergence of deep learning (DL), AI technology has made
great progress in recent years. In particular, DL technology has enabled people to benefit from more data, obtain better results, and develop more potential. It has dramatically changed people's lives and reshaped traditional AI technology. AI has a wide range of applications, such as facial recognition, speech recognition, and robotics, but its application scope goes far beyond the three aspects of image, voice, and behavior. It also has many other outstanding applications in the field of cyber security, such as malware monitoring and intrusion detection. In the early development of AI technology, machine learning (ML) played a vital role in dealing with cyberspace threats. Although ML is very powerful, it relies too much on feature extraction. This flaw is particularly glaring when ML is applied to the field of cyber security. For example, to enable an ML solution to recognize malware, we have to manually compile the various features associated with malware, which undoubtedly limits the efficiency and accuracy of threat detection. This is because ML algorithms work according to pre-defined specific features, which means that features that are not pre-defined will escape detection and cannot be discovered. It can be concluded that the performance of most ML algorithms depends on the accuracy of feature recognition and extraction (Golovko, 2017). In view of the obvious flaws in traditional ML, researchers began to study the deep neural network (DNN), also known as DL, which is a sub-domain of ML. A big conceptual difference between traditional ML and DL is that DL can be trained directly on the original data without extracting its features. In the past few years, DL has achieved a 20%-30% performance improvement in the fields of computer vision, speech recognition, and text understanding, and achieved a historic leap in the development of AI (Deng and Yu, 2014). DL can detect nonlinear correlations hidden in the data, support new file types, and detect unknown attacks, which is an attractive advantage in cyber security defense. In recent years, DL has made great progress in preventing cyber security threats, especially APT attacks. A DNN can learn the high-level abstract characteristics of APT attacks, even if they employ the most advanced evasion techniques (Yuan, 2017).

Although novel AI technologies, such as DL, play an important role in cyberspace defense, the AI system itself may also be attacked or deceived, resulting in incorrect classification or prediction results. For example, in adversarial environments, manipulating training samples results in poisoning attacks, and manipulating test samples results in evasion attacks. Attacks in adversarial environments are intended to undermine the integrity and usability of various AI applications, and mislead neural networks by employing adversarial samples, causing classifiers to produce wrong classifications. Of course, there are corresponding defense measures against adversarial attacks. These defense measures focus mainly on three aspects (Akhtar and Mian, 2018): (1) modifying the training process or input samples; (2) modifying the network itself, such as adding more layers/sub-networks and changing the loss/activation function; (3) using some external models as network add-ons when classifying samples that have not appeared. As DL models become more complex and datasets become larger, centralized training methods cannot adapt to these new requirements. Distributed learning modes, such as federated learning launched by Google, have emerged, enabling many intelligent terminals to learn a shared model in a collaborative way. However, all training data is stored on terminal devices, which brings many security challenges. How to ensure that the model is not maliciously stolen and how to construct a distributed ML system with privacy protection are major research hotspots.


2 Artificial intelligence: new trend of cyber security

There are many approaches for implementing AI. At the very early stage, people used a knowledge base to formalize knowledge. However, this approach needs too many manual operations to exactly describe the world with complex rules. Therefore, scientists designed a pattern in which the AI system can extract a model from raw data, and this ability is called "ML." ML algorithms include statistical mechanisms, such as Bayesian algorithms, function approximation (linear or logistic regression), and decision trees (Hatcher and Yu, 2018). All these algorithms are powerful and can be used in many situations where simple classification is needed. Nevertheless, these methods are limited in accuracy, which may lead to a
poor performance on massive and complex data representation (LeCun et al., 2015). DL was proposed to solve these deficiencies. DL imitates the process of human neurons and builds a neural architecture with complex interconnections. Today, DL is a research hotspot in academia and has been widely used in various industrial scenarios. Therefore, we will introduce the categorization and the applications of state-of-the-art models in DL research in different areas.

2.1 Categorization of deep learning

The categorization of DL is based on its learning mechanism. There are three kinds of primary learning mechanisms: supervised learning, unsupervised learning, and reinforcement learning.

2.1.1 Supervised learning

Supervised learning requires labeled input data, and is usually used as a classification mechanism or a regression mechanism. For example, malware detection is a typical binary classification scenario (malicious or benign) (Goodfellow et al., 2014). In contrast to classification, regression learning outputs a prediction of one or more continuous-valued numbers according to the input data.

2.1.2 Unsupervised learning

In contrast to supervised learning, the input data of unsupervised learning is unlabeled. Unsupervised learning is often used to cluster data, reduce dimensionality, or estimate density. For instance, a fuzzy deep belief network (DBN) system combines the DBN with the Takagi-Sugeno-Kang (TSK) fuzzy system, and can provide an adaptive mechanism to regulate the depth of the DBN to obtain highly accurate clustering.

2.1.3 Reinforcement learning

Reinforcement learning is based on rewarding the actions of a smart agent. It can be considered a fusion of supervised learning and unsupervised learning, and is suitable for tasks that have long-term feedback (Arulkumaran et al., 2017). By combining advances in the training of deep neural networks, Mnih et al. (2015) developed the deep Q-network, a deep reinforcement learning architecture that can achieve human-level control.

2.2 Deep learning applications

In this part, we review the applications of DL. DL is widely used in autonomous systems because of its significant advantages in optimization, discrimination, and prediction. Given the massive number of application areas, we introduce only a few representative domains.

2.2.1 Image and video recognition

Image and video recognition is the most important area of DL research. The typical DL structure in this area is the deep convolutional neural network (CNN). This structure can reduce the image size by convolving and pooling the image before putting the data into the fully connected neural network. In this area, there are numerous research branches, and many derivative applications are based on this fundamental research. For example, Ren et al. (2017) proposed a faster CNN for real-time object detection that significantly reduces the running time of the detection network.

2.2.2 Text analysis and natural language processing

With the development of social networking and the mobile Internet, massive data is created by human interaction. Text analysis and natural language processing are preconditions of on-the-fly translation and human-machine interaction with natural speech. Many related DL applications have been proposed. For instance, Manning et al. (2014) proposed a toolkit, named "Stanford CoreNLP," which is an extensible pipeline providing core natural language analysis.

2.2.3 Finance, economics, and market analysis

Stock trading and other market models require highly accurate market predictions. DL has been highly exploited as a powerful market prediction tool. For example, Korczak and Hernes (2017) proposed a financial time-series forecasting algorithm based on the CNN architecture. The forecasting error rate decreased significantly in tests using forex market data.


3 Artificial intelligence based cyber security

In this section, we review the traditional ML schemes against cyberspace attacks and various DL
schemes. The implementation process, experimental results, and efficiency of different programs in combating cyberspace attacks are discussed.

3.1 Traditional machine learning schemes against cyberspace attacks

An ML solution consists of four main steps (Xin et al., 2018):
1. extract the features;
2. select the appropriate ML algorithm;
3. train the model and then select the model with the best performance by evaluating different algorithms and adjusting parameters;
4. classify or predict unknown data using the trained model.

Common ML solutions include k-nearest-neighbor (k-NN), support vector machine (SVM), decision tree, neural network, etc. Different kinds of algorithms solve different types of problems. It is necessary to select an appropriate algorithm according to the specific industrial application scenario.

3.1.1 k-nearest-neighbor-based cyber security

The premise of k-NN execution is that the data and labels of the training dataset are known. Given a test sample, its characteristics are compared with the corresponding features in the training set to find the k training items most similar to it. Finally, the class that occurs most frequently among these k items is selected as the class of the test sample.

Syarif and Gata (2017) proposed an intrusion detection scheme using binary particle swarm optimization (PSO) and k-NN algorithms. They chose KDD CUP 1999, which is a widely used standard dataset for researchers to simulate IDSs. The overall experimental results show a 2% accuracy increase compared with those obtained using the k-NN algorithm alone.

Hybridization of classifiers usually performs better than individual ones. k-NN, SVM, and pdAPSO have been hybridized for intrusion detection (Dada, 2017). Dada (2017) compared the performances of these three classifiers on the KDD99 datasets, and the experimental results showed that the fusion of the three classifiers can lead to a classification accuracy of 98.55%. However, Dada (2017) focused on only classification accuracy, and not the complexity and efficiency of the model.

Based on a multi-class k-NN classifier, Meng et al. (2015) developed a knowledge-based alert verification method to identify false alarms and non-critical alarms. Then, to filter out these unwanted alarms, they designed an intelligent alarm filter that consists of three major components: an alarm database, a rating measurement, and an alarm filter. They conducted experiments from different dimensions, and the experimental results indicated that the designed alarm filter can achieve a good filtering performance even with limited CPU usage.

3.1.2 Support vector machine based cyber security

The support vector machine (SVM) is a supervised learning algorithm that has superior performance, including support vector classification and support vector regression. The core idea of SVM is to separate the data by constructing an appropriate split plane. Fig. 1 shows a typical SVM realization, in which the optimal split plane is determined for classification of attacked/safe measurements.

Fig. 1 A support vector machine (SVM) classification implementation

Olalere et al. (2016) constructed a real-time malware uniform resource locator (URL) classifier by identifying and evaluating discriminative lexical features of malware URLs. They manually examined blacklisted malware URLs, which led to the identification of 12 discriminative lexical features. Then, empirical analysis was conducted on the identified features of the existing blacklisted malware URLs and newly collected malware URLs, revealing that attackers followed the same pattern in crafting malware URLs. Finally, they used an SVM to evaluate the performance and effectiveness of the extracted features, and obtained 96.95% accuracy with a low false negative rate (FNR) of 0.018.
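To make the split-plane idea of Section 3.1.2 concrete, the following is a minimal, self-contained sketch (not the implementation of any surveyed paper): a linear SVM trained with stochastic subgradient descent on the regularized hinge loss, separating synthetic "safe" and "attacked" measurements. The data distribution, step size, and regularization strength are illustrative assumptions.

```python
import numpy as np

# Sketch of a linear SVM: find a split plane w.x + b = 0 by minimizing the
# regularized hinge loss with stochastic subgradient descent. The synthetic
# "safe"/"attacked" clusters below are assumptions made for illustration.

rng = np.random.default_rng(0)
safe = rng.normal(0.0, 1.0, size=(100, 2))       # label -1: safe measurements
attacked = rng.normal(4.0, 1.0, size=(100, 2))   # label +1: attacked measurements
X = np.vstack([safe, attacked])
y = np.hstack([-np.ones(100), np.ones(100)])

w, b = np.zeros(2), 0.0
lam, lr = 0.01, 0.05                             # regularization strength, step size

for epoch in range(100):
    for i in rng.permutation(len(X)):
        if y[i] * (X[i] @ w + b) < 1:            # inside the margin: hinge term active
            w += lr * (y[i] * X[i] - lam * w)
            b += lr * y[i]
        else:                                    # outside the margin: only weight decay
            w -= lr * lam * w

accuracy = float((np.sign(X @ w + b) == y).mean())
print(f"split plane w={w.round(2)}, b={b:.2f}, training accuracy={accuracy:.3f}")
```

On such well-separated synthetic clusters the learned plane classifies nearly all points correctly; real intrusion data requires the feature extraction and evaluation steps emphasized above.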
SVM has also been used in intrusion detection and analysis in some emerging networks. For example, in the software-defined network, the controller is vulnerable to DDoS attacks, which lead to resource exhaustion. Kokila et al. (2014) used an SVM classifier to detect DDoS attacks in the software-defined network. They also carried out experiments on the existing DARPA dataset and compared the performance of the SVM classifier with that of other techniques, showing that the designed SVM scheme produced a lower false positive rate (FPR) and higher classification accuracy. Nevertheless, SVM training requires more time, which is an obvious defect.

The cyberspace of different industrial applications presents different network characteristics, and thus the attack patterns suffered are also specific. For example, two-way communication and the distributed energy network that makes the grid intelligent are the main features of a smart grid. In the smart grid, malicious injection of erroneous data will have a catastrophic impact on decisions at various stages. Shahid et al. (2012) proposed two techniques for fault detection and classification in power transmission lines (TLs). Both approaches are based on one-class quarter-sphere support vector machines (QSSVMs). The first approach, called "temporal-attribute QSSVM (TA-QSSVM)," exploits the temporal and attribute correlations of data measured in a TL for fault detection, and the second approach, A-QSSVM, exploits attribute correlations for fault classification only. Convincing experiments showed that TA-QSSVM can obtain almost 100% fault-detection accuracy and A-QSSVM can achieve 99% fault-classification accuracy, which are remarkable results. In addition to accuracy, these approaches have lower computational complexity than a multi-class SVM (from O(n^4) to O(n^2)), making them applicable to online detection and classification.

3.1.3 Decision tree based cyber security

The decision tree algorithm is a method to approximate the value of a discrete function. In essence, the decision tree mechanism is a process of classifying data through a series of rules. Fig. 2 shows the decision tree construction process for malware detection. Malware can be classified based on a decision tree, where the decision result is derived from specific characteristics through pre-defined decision rules.

Vuong et al. (2015) used a decision tree to generate simple detection rules that were used to defend against denial of service and command injection attacks on robotic vehicles. They considered cyber input features, such as network traffic and disk data, and physical input features, such as speed, power consumption, and jittering. Their experimental results showed that different attacks have different impacts on robot behaviors, including cyber and physical operations, and that the addition of physical input features could help the decision tree increase the overall detection accuracy and reduce the false positive rate.

Fig. 2 A decision tree construction for malware detection

APT attacks employ social engineering methods to invade various systems, which brings big social issues. Moon et al. (2017) designed a decision tree based intrusion detection system to detect APT attacks that might change intelligently after intrusion into a system. The intuitive idea was to analyze the behavior information through a decision tree. This system could also detect the possibility of the initial intrusion and reduce the hazard to a minimum by responding to APT attacks as soon as possible. The detection accuracy was 84.7% in their experiments;
the accuracy was actually high considering the difficulty of detecting malware-related APT attacks.

3.1.4 Neural network based cyber security

Gao et al. (2010) developed an intrusion detection system based on a neural network to detect artifacts of command-and-response injection attacks by monitoring the physical behaviors of supervisory control and data acquisition (SCADA) systems. The experimental results showed that the neural network based IDS has excellent performance in detecting man-in-the-middle response injection and DoS-based response injection, but it could not detect replay-based response injection attacks.

Vollmer and Manic (2009) proposed a computationally efficient neural network algorithm to provide an intrusion detection alert scheme for cyber security state awareness. The experimental results indicated that this enhanced version of the neural network algorithm reduced memory requirements by 70% and reduced runtime from 37 s to 1 s.

3.2 Deep learning solutions for defending against cyberspace attacks

The DL method is very similar to the ML method. As mentioned earlier, feature selection in DL is automatic rather than manual, and DL attempts to obtain deeper features from the given data. Current DL programs include the DBN, recurrent neural network (RNN), and CNN. In this section we describe the use of different types of deep neural networks to defend against several network attacks in different scenarios.

3.2.1 Deep belief network based attack defense

DBN is a probability generation model consisting of multiple restricted Boltzmann machine layers. Zhu et al. (2017) proposed a novel DL-based approach called "DeepFlow" to directly detect malware from the data flows in Android applications. This scheme is implemented based on DBN (Fig. 3). Based on the DeepFlow architecture, complex attack feature data can be analyzed. The DeepFlow architecture consists of three components: FlowDroid for feature extraction, SUSI for feature coarse-granularity, and the DBN DL model for classification. Two crawler modules can be used to crawl malware from malware sources and benign applications from the Google Play Store separately. The experimental results showed that DeepFlow outperforms traditional ML algorithms, such as Naïve Bayes, PART, logistic regression, SVM, and the multi-layer perceptron (MLP). Some new DL technologies can also be used (Ota et al., 2017; Li LZ et al., 2018a, 2018b).

Fig. 3 Deep belief network

Focusing on problems in intrusion detection, such as redundant information, long training time, and a tendency to fall into local optima, Zhao et al. (2017) put forward a novel intrusion detection scheme combining DBN and a probabilistic neural network (PNN). In this method, the raw data was converted into low-dimensional data, and the DBN (with nonlinear learning ability) extracted the essential characteristics of the original data. They used a particle swarm optimization algorithm to optimize the number of hidden-layer nodes per layer. Then they employed a PNN to classify the low-dimensional data. The performance evaluation using the KDD CUP 1999 dataset indicated that this method performs better than traditional PNN, PCA-PNN, and raw DBN-PNN without optimization.

3.2.2 Recurrent neural network based attack detection

Unlike traditional feed-forward neural networks (FNNs), RNNs introduce directional loops that can handle contextual correlations among inputs to process sequence data.

To classify permission-based Android malware, Vinayakumar et al. (2018) used a long short-term
memory recurrent neural network (LSTM-RNN), because LSTM can learn temporal behaviors through sparse representations of Android permission sequences. They also launched notable experiments that ran up to 1000 epochs with learning rates from 0.01 to 0.50. The LSTM networks achieved the highest accuracy, 89.7%, on a real-world Android malware test dataset.

Loukas et al. (2018) proposed a cloud-based cyber-physical intrusion detection scheme for the Internet of Vehicles (IoV) using a deep multilayer perceptron and an RNN. They pointed out that an RNN with an LSTM hidden layer proved very promising in learning the temporal context of various attacks, such as DoS, command injection, and malware. This work also revealed that detection latency, the key defect of DL-based schemes, is a result of the increased processing demands, which can be addressed by cloud-based computational offloading. They also carried out experiments in a real cyber environment to verify their approach.

3.2.3 Convolutional neural network based attack detection

CNN is a kind of feed-forward neural network that includes a convolutional layer and a pooling layer, in which artificial neurons respond to surrounding elements.

Based on a CNN, Meng et al. (2017) proposed a novel model, named "malware classification based on static malware gene sequences (MCSMGS)," for malware classification. First, the scheme extracted the malware gene sequences of both informational and material attributes. Second, it tried to determine the representation of the correlation and similarity of each malware sample. Finally, to achieve accurate malware classification, a module named "static malware gene sequences-convolution neural network (SMGS-CNN)" was employed to analyze the extracted malware gene sequences. They claimed that the classification accuracy was up to 98% with the proposed scheme, and that it was more effective than the SVM model.

Chowdhury et al. (2017) presented an improved DL scheme based on CNN for intrusion detection. First, the scheme trained a convolutional neural network for intrusion detection. The second step was different from that of other CNN solutions: it extracted outputs from each layer in the CNN and implemented few-shot intrusion detection using a linear SVM and a 1-nearest-neighbor classifier. Few-shot learning is suitable for occasions where the training set for a certain class is small. Finally, they implemented the proposed scheme on two well-known public datasets: KDD99 and NSL-KDD. These two datasets are imbalanced, and some classes may have fewer training samples than others. The experimental results showed that the proposed scheme performs better than previous schemes on these two datasets.

3.2.4 Automatic encoder based solutions for threat detection

Some researchers have attempted to use DL for distributed attack detection in a fog computing environment. Abeshu and Chilamkurti (2018) proposed a novel distributed DL approach for cyberspace attack detection in fog-to-things computing. The model they adopted was a stacked auto-encoder for unsupervised DL. They trained a model with a mix of normal and attack samples from an unlabeled network, and the model identified patterns of attacks and normal data through a self-learning scheme. The experimental results showed that the proposed deep model performs better than shallow models in terms of the false alarm rate, accuracy, and scalability.

Aygün and Yavuz (2017) proposed two anomaly detection models employing an auto-encoder (AE) and a de-noising auto-encoder (DAE), respectively. They compared the performances of the deterministic AE and the stochastically improved DAE models based on the proposed stochastic anomaly threshold selection technique, indicating that each single model performs better than all previous non-hybrid anomaly detection approaches. In addition, they claimed that the performance of these two schemes could match that of some hybrid solutions, and that the proposed stochastic threshold selection method is a successful alternative to hybrid methods.

Zolotukhin et al. (2016) focused on the detection of DoS attacks in the application layer. Their scheme consists of analysis of communications between a web server and its clients, separation of these communications, and examination of the communication distribution using a stacked auto-encoder and a class of DL algorithms. The scheme requires no decryption of the encrypted traffic, which obeys ethical norms concerning privacy. The experimental results
with the dataset from a realistic cyber environment rate of 97% when only 4.02% of the input features per
suggested good detection of DoS-related attacks, sample were modified.
which increased web service availability.

4 Cyber security attack and defensive tech-


niques of artificial intelligence

In fact, AI will also face cybersecurity threats.


For example, ML requires protection of the samples,
learning models, and the interoperation processes.
This section consists of three parts: introducing pos-
sible adversarial attacks on AI, summarizing several
defense methods against these attacks, and introduc-
ing security challenges of distributed DL and how to
construct secure AI under a distributed training mode.
4.1 Adversarial attacks on artificial intelligence
Traditional ML approaches assume that the dis-
tribution of training data is almost the same as that of
testing data. In an adversarial environment, however,
modern deep networks are prone to attacks by adversarial samples. These adversarial samples impose only a slight disturbance on the original samples, so slight that the human visual system cannot detect it. Such an attack can lead to wrong classification by the deep neural network. The deep significance of this phenomenon has attracted many researchers to study adversarial attacks and DL security.

Fig. 4 shows three typical adversarial attacks in different application scenarios. In recommendation systems, injecting poisoned data may result in incorrect recommendations. In facial recognition, adding even a small number of modified images can cause the application to make an almost completely wrong classification. Imposing only a small adversarial perturbation on a generative model may produce totally incorrect reconstructed samples.

Fig. 4 Adversarial attacks in different scenarios

Papernot et al. (2016) proposed a novel class of algorithms that disturb classifiers by modifying only a few pixels in an image rather than perturbing the whole image. Their scheme is based on a precise understanding of the mapping between the inputs and outputs of the deep neural network. They also implemented an experiment on a typical computer vision application; the results showed that the proposed algorithm could produce samples correctly classified by humans but misclassified by a deep network with a high success rate.

Sabour et al. (2015) introduced a new algorithm to generate adversarial images by concentrating on the internal layers of the deep neural network instead of focusing on image perturbations designed to produce erroneous class labels. Mopuri et al. (2017) developed a systematic approach to compute a universal perturbation for a deep network. They also revealed that the existence of these universal perturbations implies unknown but important geometric correlations among classifier decision boundaries. Moosavi-Dezfooli et al. (2016) generated a minimum normalized perturbation by iterative computation, pushing an image within the classification boundary out of bounds until a misclassification occurred. They proved that the proposed scheme generates smaller disturbances than FGSM with similar deception rates. Houdini (Cisse et al., 2017) is a method for deceiving gradient-based ML algorithms: the attack is achieved by generating an adversarial sample specific to the task loss function, using the gradient information of the network's differentiable loss function to generate the adversarial disturbance. In addition to image classification networks, the algorithm can be used to spoof voice recognition networks.
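Several of the attacks above perturb an input along the gradient of the loss. As a concrete illustration, the fast gradient sign method (FGSM, Goodfellow et al., 2015) takes a single step of size ε in the direction of the sign of the input gradient. The following is a minimal sketch on a toy logistic-regression "network" in NumPy; the weights, input, and ε value are illustrative assumptions, not taken from any of the surveyed papers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    # Binary cross-entropy for a single example.
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, eps):
    # Gradient of the loss w.r.t. the INPUT x (not the weights):
    # dL/dx = (p - y) * w for logistic regression.
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    # FGSM step: move every input feature by eps in the sign of the gradient.
    return x + eps * np.sign(grad_x)

# Toy model: fixed weights classifying a 4-feature input.
w = np.array([1.0, -2.0, 0.5, 3.0])
b = 0.1
x = np.array([0.2, -0.1, 0.4, 0.3])   # clean sample, true label y = 1
y = 1.0

x_adv = fgsm(w, b, x, y, eps=0.5)
# The adversarial copy stays inside an L-infinity ball of radius eps ...
assert np.max(np.abs(x_adv - x)) <= 0.5 + 1e-12
# ... yet its loss is strictly higher than that of the clean sample.
print(loss(w, b, x, y) < loss(w, b, x_adv, y))   # True
```

With these illustrative numbers the clean sample is confidently classified as positive, while the ε-bounded perturbation flips the predicted label, which is exactly the imperceptible-disturbance effect described above.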
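The iterative minimal-perturbation idea of DeepFool (Moosavi-Dezfooli et al., 2016) is easiest to see for an affine binary classifier f(x) = w·x + b, where the smallest perturbation reaching the decision boundary is the orthogonal projection r = −f(x)·w/‖w‖². The sketch below implements that special case with a small overshoot, as the algorithm does; the weights and input are illustrative assumptions (for deep networks, DeepFool repeats this step on a local linearization of the model).

```python
import numpy as np

def deepfool_linear(w, b, x, overshoot=1e-4, max_iter=10):
    """Minimal-perturbation attack on an affine classifier f(x) = w.x + b.

    For a linear model the projection onto the decision boundary is exact,
    so the loop exits after one step; DeepFool applies the same step to a
    local linearization of a deep network until the predicted label flips.
    """
    x_adv = x.copy()
    for _ in range(max_iter):
        fx = w @ x_adv + b
        if np.sign(fx) != np.sign(w @ x + b):
            break  # label already flipped
        # Closest point on the hyperplane w.x + b = 0, plus a tiny overshoot.
        r = -(fx / (w @ w)) * w
        x_adv = x_adv + (1 + overshoot) * r
    return x_adv

w = np.array([2.0, -1.0])
b = -0.5
x = np.array([1.0, 0.5])          # f(x) = 2.0 - 0.5 - 0.5 = 1.0 > 0

x_adv = deepfool_linear(w, b, x)
print(np.sign(w @ x + b), np.sign(w @ x_adv + b))   # 1.0 -1.0
print(np.linalg.norm(x_adv - x))  # ~ |f(x)| / ||w||, i.e. about 0.447
```

The perturbation norm matches the analytic minimum |f(x)|/‖w‖, which is why the paper reports smaller disturbances than single-step methods such as FGSM at comparable deception rates.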
4.2 Defense methods against adversarial attacks

4.2.1 Modifying the training process and input data

The robustness of a deep network can be improved by continuously feeding in new types of adversarial samples and performing adversarial training. To ensure effectiveness, this method requires high-intensity adversarial samples, and the network architecture must have sufficient expressive power. This method is called "brute-force adversarial training" because it requires a large amount of training data. Goodfellow et al. (2015) and Cubuk et al. (2017) noted that this method can also regularize the network and reduce overfitting. However, Moosavi-Dezfooli et al. (2017) pointed out that no matter how many adversarial samples are added, new adversarial samples can always be found that deceive the network. Luo et al. (2015) proposed using the foveation mechanism to defend against the perturbations generated by L-BFGS and FGSM. The assumption of this proposal is that the image distribution is robust to translation variations while the perturbation does not have this property; however, the universality of this method has not been proven. Xie et al. (2017) found that introducing random rescaling of training images can reduce the intensity of attacks.

4.2.2 Modifying the network

It has been observed that simply stacking denoising auto-encoders on the original network only makes it more vulnerable. Gu and Rigazio (2015) therefore introduced deep contractive networks, in which a smoothness penalty term similar to that of contractive auto-encoders is used. Input gradient regularization can also improve robustness against attacks (Ross and Doshi-Velez, 2017); this method works well in combination with brute-force adversarial training, but its computational complexity is very high. Some researchers have attempted biologically inspired solutions; for example, Nayebi and Ganguli (2017) defended against attacks using a nonlinear activation function similar to that of nonlinear dendrites in biological brains. In another work, the dense associative memory model is based on a similar mechanism (Krotov and Hopfield, 2018).

4.2.3 Using an additional network

The scheme of Akhtar et al. (2018) is a defense framework against adversarial attacks based on universal perturbations (Moosavi-Dezfooli et al., 2017). The core idea is to add a separately trained network to the original model, yielding a defense that requires no tuning factors and is immune to the adversarial samples. Lee et al. (2017) employed the popular generative adversarial network (GAN) framework to train a deep network that is robust to attacks such as FGSM, and Lyu et al. (2015) provided another defense scheme based on a GAN. The following are detection-only approaches. The feature squeezing methods (He et al., 2017; Xu et al., 2017) use two models to determine whether a sample is adversarial; subsequent work showed that this method can be defeated by C&W attacks. Meng and Chen (2017) proposed a framework called "MagNet," which uses a classifier trained on manifold measurements to determine whether a picture is noisy. In miscellaneous methods (Feinman et al., 2017; Gebhart and Schrater, 2017; Liang et al., 2017), the authors trained models that treat all input images as noisy, first learning how to smooth a picture and then classifying it.

4.3 Construction of safe artificial intelligence systems

4.3.1 Safe distributed ML/DL systems

Shokri and Shmatikov (2015) first proposed the construction of privacy-preserving DL under a distributed training system (Fig. 5) that enables multiple parties to collaboratively learn an accurate neural network model without leaking their input datasets. The key innovation of this work is the selective sharing of deep neural network parameters during model training, which makes the scheme effective and robust because the training can run asynchronously. The proposed system was evaluated in experiments on two datasets, MNIST and SVHN; the results suggested high classification accuracy on both, even when the participants shared only 10% of their parameters. However, Phong et al. (2018) demonstrated that in the system of Shokri and Shmatikov (2015), gradients shared over the cloud server may be compromised, leading to local data leakage. To protect the gradients against an honest-but-curious server while ensuring training accuracy, Phong et al. (2018) used additive homomorphic encryption to enable computation over the encrypted gradients. The tradeoff of this scheme is the cost of
increased communication overhead between the cloud server and DL participants.

Fig. 5 Safe distributed machine learning/deep learning systems

In a federated learning environment, mobile devices can participate in the learning process, and terminal users benefit from the shared model trained on distributed data. A typical federated learning solution was proposed by McMahan et al. (2016), who presented a practical method for communication-efficient learning of DL networks from decentralized data. Based on this architecture, Bonawitz et al. (2017) designed a practical secure aggregation protocol for high-dimensional data in privacy-preserving ML. This aggregation protocol allows the server to securely compute the sum of parameters collected from many mobile devices in a distributed way. They also conducted experiments comparing this scheme with other protocols based on secure multi-party computation; the results showed that the proposed protocol produces lower overhead and has better fault tolerance and greater robustness.

4.3.2 Machine learning classification over encrypted data

It is vital to ensure the confidentiality of both the data and the classifier. To satisfy this security constraint, some researchers have tried to build a safe AI by training on encrypted data, which is genuinely difficult. Bost et al. (2015) constructed three major ML classification protocols over encrypted data: hyperplane decision, naïve Bayes, and decision trees. They also released a novel and fundamental library for constructing other kinds of classifiers, such as a multiplexer and a face detection classifier. The bottleneck of ML training on encrypted data lies in the accuracy of the classifier: it is difficult for an ML/DL algorithm to obtain high-dimensional statistical information from encrypted data, because ciphertext is the result of confusion, and its statistical structure has been destroyed to a large extent.

5 Conclusions and future work

We have summarized the integration of AI and cyberspace security from two aspects. On the one hand, we have reviewed the use of AI-related technologies (ML and DL) in detecting and resisting various types of attacks in cyberspace. The application range, implementation principles, and experimental results of the various schemes have been summarized and compared by means of classification. On the other hand, in view of the attacks that AI itself may encounter and the corresponding security protection requirements, we have first reviewed the various attacks that AI systems may suffer in an adversarial environment, and then analyzed the defense strategies against different kinds of attacks. Finally, we have discussed how to build a safe AI system in a distributed ML/DL environment.

With the rapid development of AI and cyberspace security, the integration of these two disciplines will present more and more application scenarios. Several promising and open topics exist for integrated cyber security and AI technologies. Typical topics are as follows: (1) AI-based cyber security situational awareness should be studied to provide smart prediction and protection for cyberspace; (2) novel AI algorithms designed specifically for cyber security, especially for big data intelligence, should be studied; (3) novel security protection solutions for AI should be pursued in the future.

References

Abeshu A, Chilamkurti N, 2018. Deep learning: the frontier for distributed attack detection in fog-to-things computing. IEEE Commun Mag, 56(2):169-175. https://doi.org/10.1109/MCOM.2018.1700332

Akhtar N, Mian A, 2018. Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access, 6:14410-14430. https://doi.org/10.1109/ACCESS.2018.2807385
Akhtar N, Liu J, Mian A, 2018. Defense against universal adversarial perturbations. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.3389-3398. https://doi.org/10.1109/CVPR.2018.00357

Arulkumaran K, Deisenroth MP, Brundage M, et al., 2017. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag, 34(6):26-38. https://doi.org/10.1109/MSP.2017.2743240

Aygün RC, Yavuz AG, 2017. A stochastic data discrimination based autoencoder approach for network anomaly detection. Proc 5th Signal Processing and Communications Applications Conf, p.1-4. https://doi.org/10.1109/SIU.2017.7960410

Bonawitz K, Ivanov V, Kreuter B, et al., 2017. Practical secure aggregation for privacy-preserving machine learning. Proc ACM SIGSAC Conf on Computer and Communications Security, p.1175-1191. https://doi.org/10.1145/3133956.3133982

Bost R, Popa RA, Tu S, et al., 2015. Machine learning classification over encrypted data. Network and Distributed System Security Symp, p.331-364. https://doi.org/10.14722/ndss.2015.23241

Chowdhury MMU, Hammond F, Konowicz G, et al., 2017. A few-shot deep learning approach for improved intrusion detection. Proc 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conf, p.456-462. https://doi.org/10.1109/UEMCON.2017.8249084

Cisse M, Adi Y, Neverova N, et al., 2017. Houdini: fooling deep structured prediction models. https://arxiv.org/abs/1707.05373

Cubuk ED, Zoph B, Schoenholz SS, et al., 2017. Intriguing properties of adversarial examples. https://arxiv.org/abs/1711.02846

Dada EG, 2017. A hybridized SVM-kNN-pdAPSO approach to intrusion detection system. Faculty Seminar Series, p.1-8.

Deng L, Yu D, 2014. Deep learning: methods and applications. Found Trend Sig Process, 7(3-4):197-387. https://doi.org/10.1561/2000000039

Feinman R, Curtin RR, Shintre S, et al., 2017. Detecting adversarial samples from artifacts. https://arxiv.org/abs/1703.00410

Gao W, Morris T, Reaves B, et al., 2010. On SCADA control system command and response injection and intrusion detection. eCrime Researchers Summit, p.1-9. https://doi.org/10.1109/ecrime.2010.5706699

Gebhart T, Schrater P, 2017. Adversary detection in neural networks via persistent homology. https://arxiv.org/abs/1711.10056

Golovko VA, 2017. Deep learning: an overview and main paradigms. Opt Memory Neur Netw, 26(1):1-17. https://doi.org/10.3103/S1060992X16040081

Goodfellow IJ, Pouget-Abadie J, Mirza M, et al., 2014. Generative adversarial networks. https://arxiv.org/abs/1406.2661

Goodfellow IJ, Shlens J, Szegedy C, 2015. Explaining and harnessing adversarial examples. https://arxiv.org/abs/1412.6572

Gu SX, Rigazio L, 2015. Towards deep neural network architectures robust to adversarial examples. https://arxiv.org/abs/1412.5068

Guan ZT, Li J, Wu LF, et al., 2017. Achieving efficient and secure data acquisition for cloud-supported Internet of Things in smart grid. IEEE Internet Things J, 4(6):1934-1944. https://doi.org/10.1109/JIOT.2017.2690522

Hatcher WG, Yu W, 2018. A survey of deep learning: platforms, applications and emerging research trends. IEEE Access, 6:24411-24432. https://doi.org/10.1109/ACCESS.2018.2830661

He W, Wei J, Chen XY, et al., 2017. Adversarial example defenses: ensembles of weak defenses are not strong. https://arxiv.org/abs/1706.04701

Kokila RT, Selvi ST, Govindarajan K, 2014. DDoS detection and analysis in SDN-based environment using support vector machine classifier. Proc 6th Int Conf on Advanced Computing, p.205-210. https://doi.org/10.1109/ICoAC.2014.7229711

Korczak J, Hernes M, 2017. Deep learning for financial time series forecasting in a-trader system. Proc Federated Conf on Computer Science and Information Systems, p.905-912. https://doi.org/10.15439/2017F449

Krotov D, Hopfield J, 2018. Dense associative memory is robust to adversarial inputs. Neur Comput, 30(12):3151-3167. https://doi.org/10.1162/neco_a_01143

LeCun Y, Bengio Y, Hinton G, 2015. Deep learning. Nature, 521(7553):436-444. https://doi.org/10.1038/Nature14539

Lee H, Han S, Lee J, 2017. Generative adversarial trainer: defense to adversarial perturbations with GAN. https://arxiv.org/abs/1705.03387

Li GL, Wu J, Li JH, et al., 2018. Service popularity-based smart resources partitioning for fog computing-enabled industrial Internet of Things. IEEE Trans Ind Inform, 14(10):4702-4711. https://doi.org/10.1109/TII.2018.2845844

Li LZ, Ota K, Dong MX, 2018a. Deep learning for smart industry: efficient manufacture inspection system with fog computing. IEEE Trans Ind Inform, 14(10):4665-4673. https://doi.org/10.1109/TII.2018.2842821

Li LZ, Ota K, Dong MX, 2018b. DeepNFV: a light-weight framework for intelligent edge network functions virtualization. IEEE Netw, in press. https://doi.org/10.1109/MNET.2018.1700394

Liang B, Li HC, Su MQ, et al., 2017. Detecting adversarial image examples in deep networks with adaptive noise reduction. https://arxiv.org/abs/1705.08378

Loukas G, Vuong T, Heartfield R, et al., 2018. Cloud-based cyber-physical intrusion detection for vehicles using deep learning. IEEE Access, 6:3491-3508. https://doi.org/10.1109/ACCESS.2017.2782159

Luo Y, Boix X, Roig G, et al., 2015. Foveation-based mechanisms alleviate adversarial examples. https://arxiv.org/abs/1511.06292

Lyu C, Huang KZ, Liang HN, 2015. A unified gradient regularization family for adversarial examples. IEEE Int Conf on Data Mining, p.301-309. https://doi.org/10.1109/ICDM.2015.84

Manning CD, Surdeanu M, Bauer J, et al., 2014. The Stanford CoreNLP natural language processing toolkit. Proc 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, p.55-60. https://doi.org/10.3115/v1/P14-5010

McMahan HB, Moore E, Ramage D, et al., 2016. Communication-efficient learning of deep networks from decentralized data. https://arxiv.org/abs/1602.05629

Meng DY, Chen H, 2017. MagNet: a two-pronged defense against adversarial examples. Proc ACM Conf on Computer and Communications Security, p.135-147. https://doi.org/10.1145/3133956.3134057

Meng WZ, Li WJ, Kwok LF, 2015. Design of intelligent KNN-based alarm filter using knowledge-based alert verification in intrusion detection. Secur Commun Netw, 8(18):3883-3895. https://doi.org/10.1002/sec.1307

Meng X, Shan Z, Liu FD, et al., 2017. MCSMGS: malware classification model based on deep learning. Int Conf on Cyber-Enabled Distributed Computing and Knowledge Discovery, p.272-275. https://doi.org/10.1109/CyberC.2017.21

Mnih V, Kavukcuoglu K, Silver D, et al., 2015. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533. https://doi.org/10.1038/nature14236

Moon D, Im H, Kim I, et al., 2017. DTB-IDS: an intrusion detection system based on decision tree using behavior analysis for preventing APT attacks. J Supercomput, 73(7):2881-2895. https://doi.org/10.1007/s11227-015-1604-8

Moosavi-Dezfooli SM, Fawzi A, Frossard P, 2016. DeepFool: a simple and accurate method to fool deep neural networks. IEEE Conf on Computer Vision and Pattern Recognition, p.2574-2582. https://doi.org/10.1109/CVPR.2016.282

Moosavi-Dezfooli SM, Fawzi A, Fawzi O, et al., 2017. Universal adversarial perturbations. Proc IEEE Conf on Computer Vision and Pattern Recognition, p.86-94. https://doi.org/10.1109/CVPR.2017.17

Mopuri KR, Garg U, Babu RV, 2017. Fast feature fool: a data independent approach to universal adversarial perturbations. https://arxiv.org/abs/1707.05572

Nayebi A, Ganguli S, 2017. Biologically inspired protection of deep networks from adversarial attacks. https://arxiv.org/abs/1703.09202

Olalere M, Abdullah MT, Mahmod R, et al., 2016. Identification and evaluation of discriminative lexical features of malware URL for real-time classification. Int Conf on Computer and Communication Engineering, p.90-95. https://doi.org/10.1109/ICCCE.2016.31

Ota K, Dao MS, Mezaris V, et al., 2017. Deep learning for mobile multimedia: a survey. ACM Trans Multim Comput Commun Appl, 13(3S), Article 34. https://doi.org/10.1145/3092831

Papernot N, McDaniel P, Jha S, et al., 2016. The limitations of deep learning in adversarial settings. IEEE European Symp on Security and Privacy, p.372-387. https://doi.org/10.1109/EuroSP.2016.36

Phong LT, Aono Y, Hayashi T, et al., 2018. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans Inform Forens Secur, 13(5):1333-1345. https://doi.org/10.1109/TIFS.2017.2787987

Ren SQ, He KM, Girshick R, et al., 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Patt Anal Mach Intell, 39(6):1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031

Ross AS, Doshi-Velez F, 2017. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. https://arxiv.org/abs/1711.09404

Sabour S, Cao YS, Faghri F, et al., 2015. Adversarial manipulation of deep representations. https://arxiv.org/abs/1511.05122

Shahid N, Aleem SA, Naqvi IH, et al., 2012. Support vector machine based fault detection & classification in smart grids. IEEE Globecom Workshops, p.1526-1531. https://doi.org/10.1109/GLOCOMW.2012.6477812

Shokri R, Shmatikov V, 2015. Privacy-preserving deep learning. Proc 53rd Annual Allerton Conf on Communication, Control, and Computing, p.1310-1321. https://doi.org/10.1109/ALLERTON.2015.7447103

Syarif AR, Gata W, 2017. Intrusion detection system using hybrid binary PSO and K-nearest neighborhood algorithm. 11th Int Conf on Information & Communication Technology and System, p.181-186. https://doi.org/10.1109/ICTS.2017.8265667

Vinayakumar R, Soman KP, Poornachandran P, et al., 2018. Detecting Android malware using long short-term memory (LSTM). J Int Fuzzy Syst, 34(3):1277-1288. https://doi.org/10.3233/JIFS-169424

Vollmer T, Manic M, 2009. Computationally efficient neural network intrusion security awareness. Proc 2nd Int Symp on Resilient Control Systems, p.25-30. https://doi.org/10.1109/ISRCS.2009.5251357

Vuong TP, Loukas G, Gan D, et al., 2015. Decision tree-based detection of denial of service and command injection attacks on robotic vehicles. IEEE Int Workshop on Information Forensics and Security, p.1-6. https://doi.org/10.1109/WIFS.2015.7368559

Wu J, Dong MX, Ota K, et al., 2018. Big data analysis-based secure cluster management for optimized control plane in software-defined networks. IEEE Trans Netw Serv Manag, 15(1):27-38. https://doi.org/10.1109/TNSM.2018.2799000

Xie CH, Wang JY, Zhang ZS, et al., 2017. Adversarial examples for semantic segmentation and object detection. IEEE Int Conf on Computer Vision, p.1378-1387. https://doi.org/10.1109/ICCV.2017.153

Xin Y, Kong LS, Liu Z, et al., 2018. Machine learning and deep learning methods for cybersecurity. IEEE Access, 6:35365-35381. https://doi.org/10.1109/ACCESS.2018.2836950

Xu WL, Evans D, Qi YJ, 2017. Feature squeezing mitigates and detects Carlini/Wagner adversarial examples. https://arxiv.org/abs/1705.10686

Yuan XY, 2017. PhD forum: deep learning-based real-time malware detection with multi-stage analysis. IEEE Int Conf on Smart Computing, p.1-2. https://doi.org/10.1109/SMARTCOMP.2017.7946997

Zhao GZ, Zhang CX, Zheng LJ, 2017. Intrusion detection using deep belief network and probabilistic neural network. IEEE Int Conf on Computational Science and Engineering and IEEE Int Conf on Embedded and Ubiquitous Computing, p.639-642. https://doi.org/10.1109/CSE-EUC.2017.119

Zhu DL, Jin H, Yang Y, et al., 2017. DeepFlow: deep learning-based malware detection by mining Android application for abnormal usage of sensitive data. IEEE Symp on Computers and Communications, p.438-443. https://doi.org/10.1109/ISCC.2017.8024568

Zolotukhin M, Hämäläinen T, Kokkonen T, et al., 2016. Increasing web service availability by detecting application-layer DDoS attacks in encrypted traffic. Proc 23rd Int Conf on Telecommunications, p.1-6. https://doi.org/10.1109/ICT.2016.7500408