
Towards Quantum-Enhanced Machine Learning for Network Intrusion Detection

Arnaldo Gouveia 1,2 and Miguel Correia 2

1 IBM Belgium, Brussels
2 INESC-ID, Instituto Superior Técnico, Universidade de Lisboa – Lisboa, Portugal

Abstract—Network Intrusion Detection Systems (NIDSs) are commonly used today to detect malicious activities. Quantum computers, despite not being practical yet, are becoming available for experimental purposes. We present the first approach for applying unsupervised Quantum Machine Learning (QML) in the context of network intrusion detection from the perspective of quantum information, based on the concept of quantum-assisted ML. We evaluate it using IBM QX in simulation mode and show that the accuracy of a Quantum-Assisted NIDS, based on our approach, can be high, rivaling the best conventional SVM results, with a dependence on the characteristics of the dataset.

I. INTRODUCTION

The detection of security violations using machine learning (ML) has been extensively investigated [1]–[8]. Intrusion Detection Systems (IDSs) are tools used to detect such malicious activity. Network IDSs (NIDSs) in particular detect malicious activity in networks and are one of the best known contexts of ML application in the field [3]. IDSs and NIDSs can be classified as signature-based or anomaly-based [1]. A signature-based (N)IDS detects attacks by comparing the data flow under analysis to patterns stored in a signature database of known attacks. An anomaly-based (N)IDS typically detects anomalies using a model of normal behaviour of the monitored system, flagging behavior lying outside the model as anomalous or suspicious. Signature-based IDSs can detect well-known attacks with high accuracy but fail to detect unknown attacks, whereas anomaly-based IDSs have that capacity. In this paper we focus on anomaly-based NIDSs.

Quantum computers, and among them Noisy Intermediate-Scale Quantum (NISQ) computers, aim to leverage quantum physics to perform computational tasks beyond the capabilities of the most powerful classical computers, potentially achieving quantum supremacy [9]–[11]. As the number of qubits and the accuracy of quantum computers increase, the question of when they will exceed state-of-the-art classical computation draws a great deal of attention. A definitive proof of quantum dominance is predicted for the near future [9]–[11], and one argument has already been made that it has been accomplished [12], although it has been disputed by competitors [13]. A consequence of quantum supremacy is that quantum computers may outperform their classical counterparts quadratically in terms of learning efficiency, or even exponentially in performance [10]. This is a motivation for investigating the potential of Quantum-Assisted Machine Learning (QAML) in the context of network intrusion detection.

In May 2016, IBM announced the first Near Term Quantum Computer, a NISQ, to be publicly available on the cloud [14]. The product is known as IBM Q Experience or simply IBM QX. Customers can program a universal quantum computer, simulate quantum operations via an interface, and perform the same quantum operations on a quantum computer. IBM QX has been used in many experiments, e.g., quantum oscillator synchronization [15], variational quantum algorithms [16], and quantum search [17], and optimization approaches to convert common quantum circuit descriptions into the IBM QX have been proposed [18]. A 5-qubit quantum computer was initially offered and, since June 2017, also a 16-qubit quantum computer, respectively IBM QX2 and IBM QX3. Revised versions of these quantum computers, IBM QX4 and IBM QX5, are available since September 2017. In 2019 and 2020, systems with 27 and 65 qubits were released. In order to use them, the desired quantum functionality, e.g., a quantum circuit, has to be properly mapped so that the underlying physical constraints are satisfied. Recently, the plan to reach 100 qubits by 2023 was announced.

We aim to use a hybrid classical/quantum approach where the data and feature learning are classical, whereas the classification algorithm is quantum. In this approach, classical data has to be converted into quantum data. This approach allows the implementation of quantum algorithms on the quantum computers available today, e.g., NISQs like IBM QX.

The IBM Quantum Experience (IBM Q) is accessible online, giving users in the general public access to a set of IBM's prototype quantum processors. It is an example instance of cloud-based quantum computing. As of May 2018, there are three processors on the IBM Quantum Experience: two 5-qubit processors and a 16-qubit processor. This service can be used to run algorithms and experiments, and to explore tutorials and simulations around what might be possible with quantum computing.

The first part of the processing relates to the concept of feature learning. Principal component analysis (PCA) is arguably the most used technique for dimensionality reduction, so it has already been used in the context of intrusion detection [19], [20]. Autoencoders are a more general technique that, however, can also be used for dimensionality reduction, with several benefits.

Autoencoders can be trained using a variety of stochastic optimization methods that have been developed for training deep neural networks, becoming highly efficient for large datasets, in particular those with statistical signatures that change over time [21].
Autoencoders can handle high-dimensional training data, such as images, and are suitable for online learning contexts in which new data arrives continuously, without the need for the computationally expensive matrix operations required by PCA. In summary, the advantages of using an autoencoder over PCA can be summarized as: autoencoders can process high-dimensional data; autoencoders can process large datasets; autoencoders are capable of online processing, processing new data as it arrives.

We use a quantum version of SVM, the Quantum Support Vector Machine (QSVM), that uses a quantum kernel estimator and optimizer [22], [23].

In this work we aim at answering the following questions:

1) How effectively can autoencoders, with known NIDS datasets, produce an encoding with probability distribution functions (PDFs) – in other terms, a latent space – related to the input variables, so that they can be processed by other ML algorithms in an unsupervised fashion?
2) Can this result be used to interface with quantum computation systems, in line with a quantum-enhanced approach for network intrusion detection?

The contributions of this work are: (1) a study of autoencoders in the scope of unsupervised learning; (2) a study on quantum unsupervised ML, interfacing a dimensionality reduction technique based on autoencoders with quantum processors. We evaluate our approach using IBM QX and two well-established network intrusion detection datasets (UNB NSL-KDD and UNSW-NB15).
II. BACKGROUND

A. Support Vector Machines

The SVM classification algorithm can classify data into two sub-groups. Let us assume there are M training data points x_i, i = 1, ..., M, each with a label +1 or −1. Let us also assume that the number of dimensions of the feature space is N. In this case, the training data can be written as {(x_i, y_i) : x_i ∈ R^N, y_i = ±1}.

Fig. 1. SVM kernel trick (mapping φ), illustrated with samples from two classes, here represented by black and white dots.

The frontier between two classes can be represented by the distance between two support hyperplanes, w · x + b = 1 and w · x + b = −1. With the expressions of the support hyperplanes defined, the distance between them can be represented by 2/||w||. Thus, SVM aims at maximizing the margin 2/||w||, which is the same as minimizing ||w||/2. Originally, SVM learning algorithms could only solve linearly separable problems. To address this limitation the kernel trick was introduced. The mapping, represented in Figure 1, is characterized by the choice of a class of functions known as kernels. Kernels use specific functions φ(x) to map the data into a higher-dimensional space. This procedure is justified by Cover's theorem [24], which guarantees that any data set becomes (linearly) separable as the data dimension grows. In this situation, SVM finds the hyperplane in the new space that maximizes the margin and minimizes misclassifications. A classifier with a large margin is prone to be more accurate.
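To make the kernel trick concrete, consider the following minimal classical example (ours, for illustration only; it uses scikit-learn rather than the liquidSVM package employed later in the paper): a dataset that no linear SVM can separate becomes separable under an implicit feature map.

    # Minimal illustration of the kernel trick (illustrative sketch, not
    # part of the paper's pipeline): concentric circles are not linearly
    # separable, but an RBF kernel implicitly maps the data to a
    # higher-dimensional space where they are.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear_svm = SVC(kernel="linear").fit(X, y)  # no feature map phi(x)
    rbf_svm = SVC(kernel="rbf").fit(X, y)        # implicit feature map phi(x)

    print("linear kernel accuracy:", linear_svm.score(X, y))  # close to 0.5
    print("RBF kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0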
B. Quantum-Assisted Machine Learning

Quantum computing can help speed up some types of problems. It may be able to find new, easier ways to achieve economic optimisation. Nevertheless, as stated above, the existing quantum computers are very small. We therefore plan to use QAML, a hybrid classical/quantum solution where data and feature learning are classical, while the classification algorithm is quantum.

An example of a classical/quantum hybrid algorithm that relies on quantum circuits considered hard to simulate efficiently by classical methods was provided in [23]. In this work, the authors define and apply two binary classification approaches utilizing supervised learning. Such classification algorithms are related to regular SVMs. The aim is to introduce a non-linear feature map that takes the data to be classified into a space where it can be linearly separated.

The method used in this work is the second method explained in [23], which directly exploits this connection to classical SVMs. Here, only the overlap of the feature map circuits is estimated for each pair in the training set. That is, the quantum computer is only used once to compute the kernel matrix for the training set. The optimal separating hyperplane and the support vectors can be obtained efficiently in the training set size by solving the conventional dual optimization problem on a classical computer. This problem is concave and therefore efficiently solvable on a classical computer. Once the support vectors have been determined, the quantum computer needs to be queried again for classification, when the kernel for a new datum is estimated. The quantum processor is thus used twice: once during training and then again in the classification phase, in both cases to estimate the quantum kernel.

III. QUANTUM SUPPORT VECTOR MACHINES

A. Quantum Encoding Preliminary Concepts

Suppose we have a dataset D of N instances:

$$D = \{x_1, \ldots, x_i, \ldots, x_N\} \qquad (1)$$

where each x_i is a real number. In basis encoding, each instance x_i is encoded as |b_i⟩, where b_i is the binary representation of x_i. The dataset D can then be represented as a superposition of all computational basis states:

$$|\psi\rangle = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} |b_i\rangle \qquad (2)$$

In amplitude encoding, data is encoded in the amplitudes of quantum states. The above dataset D can be encoded as:

$$|\psi\rangle = \frac{\sum_{i=1}^{N} x_i\,|i\rangle}{\lVert D \rVert} \qquad (3)$$
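For illustration, the two encodings can be sketched numerically as follows (our toy Python example, not code from the paper; the amplitude vector is padded to a power-of-two length so that it forms a valid quantum state):

    import numpy as np

    D = np.array([3.0, 1.0, 2.0])              # toy dataset, N = 3 instances

    # Basis encoding: each instance x_i becomes a bit string |b_i> (2 bits here).
    basis_states = [format(int(x), "02b") for x in D]
    print(basis_states)                         # ['11', '01', '10']

    # Amplitude encoding: |psi> = sum_i x_i |i> / ||D||, padded to 2^n amplitudes.
    amplitudes = np.zeros(4)                    # n = 2 qubits -> 4 amplitudes
    amplitudes[:len(D)] = D
    amplitudes /= np.linalg.norm(amplitudes)
    print(amplitudes, np.sum(amplitudes ** 2))  # unit-norm state vector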
The encoding scheme proposed consists of a layer of Hadamard gates on each qubit, followed by an operator U_Φ(x), then another layer of Hadamards, and U_Φ(x) again. A feature map on n qubits is thus generated by the unitary U_Φ(x) = U_{Φ(x)} H^⊗n U_{Φ(x)} H^⊗n, where H denotes the conventional Hadamard gate. The U_{Φ(x)} gate consists of Φ() gates, which can operate on one or two qubits. The single-qubit Φ operation is just a Z rotation by x_i on the i-th qubit.

Fig. 2. Quantum kernel functions. (a) A classical dataset mapped into the Bloch sphere: J(Ω) returns binary labels (−1, 1) and can be mapped onto the Bloch sphere by using a non-linear feature map; for a single qubit, U_Φ(x) = Z_x is a phase gate of angle x ∈ Ω. The mapped data can be separated by the hyperplane given by normal w. (b) For the general circuit, U_Φ(x) is formed by products of single- and two-qubit unitaries that are diagonal in the computational basis. The circuit family depends non-linearly on the data through the coefficients φ_S(x) with |S| ≤ 2. The experimental implementation of the parameterized diagonal two-qubit operations is done using CNOTs and Z-gates [23].

Quantum Support Vector Machines (QSVMs) use a quantum kernel estimator to estimate the kernel function and optimize the classifier directly. QSVMs are an interesting application for NISQ computers [22], [23]. The scheme we are proposing deals with the original problem of supervised learning, the creation of an SVM classifier, and is fully explained in [23]. For this problem, we are given data from a training set T and a test set S, both subsets of Ω ⊂ R^d. Both are assumed to be labeled by a map m : T ∪ S → {+1, −1} unknown to the algorithm. The algorithm only receives the labels of the training data T. The aim is to predict an estimated map on the test set, m̃ : S → {+1, −1}, in such a way that it matches the true map, m(s) = m̃(s), on the test data s ∈ S with high probability.

By means of SVMs, the data is projected non-linearly to a high-dimensional space, a feature space, where a hyperplane is configured to differentiate the labeled samples. The quantum variant of this method, QSVM, has already been proposed by Rebentrost et al. [25]. Their scheme makes it possible to accomplish an exponential speed-up when the data is given in a coherent superposition.

A feature map on n qubits is generated by the unitary U_Φ(x) = U_{Φ(x)} H^⊗n U_{Φ(x)} H^⊗n, where H denotes the conventional Hadamard gate and

$$U_{\Phi(\vec{x})} = \exp\!\left( i \sum_{S \subseteq [n]} \phi_S(\vec{x}) \prod_{i \in S} Z_i \right) \qquad (4)$$

is a diagonal gate in the Pauli Z basis (Figure 2(b)). This circuit acts on the initial state |0⟩^n. The coefficients φ_S(x) ∈ R encode the data x ∈ Ω. In general, any diagonal U_{Φ(x)} can be used if it can be implemented efficiently. This is the case, for example, when only interactions between at most two qubits (|S| ≤ 2) are considered. The exact evaluation of the inner product between two states created by such a circuit with only one diagonal layer U_{Φ(x)} is #P-hard [26]. Nevertheless, in the experimentally relevant context of estimation up to an additive error, the simulation of a single-layer preparation circuit can be performed efficiently by uniform sampling [27].
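The |S| ≤ 2 feature map of Eq. (4) is available as a library circuit in current Qiskit. The following sketch (ours; it assumes a recent Qiskit version, while the paper used the older Qiskit Aqua stack) binds one data point into the circuit:

    from qiskit.circuit.library import ZZFeatureMap

    # Two features, two repetitions of (H layer + diagonal U_Phi(x)),
    # i.e., the U_Phi(x) H^n U_Phi(x) H^n construction with |S| <= 2.
    feature_map = ZZFeatureMap(feature_dimension=2, reps=2)
    circuit = feature_map.assign_parameters([0.3, 1.2])  # encode one data point
    print(circuit.decompose().draw())                    # H, phase and CNOT gates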
Every n-qubit device operation can be approximated to an appropriate level of accuracy by the gates of a universal gate collection. One nearly universal set contains only two types of gates: the Toffoli gate and the Hadamard gate [28], [29]. The Toffoli gate is common in (reversible) classical computing and has a totally classical character in quantum circuits, as it preserves the computational basis. This seems to suggest that the quantum benefit is provided by the gates that do not preserve the computational basis, i.e., the Hadamard gates. Hence, circuits such as those used in this simulation are unlikely to be efficiently simulated by a purely classical computer.

B. Quantum kernel estimation

The protocol chosen implements the SVM directly, and a classical SVM is used for classification. The quantum computer is used twice in this protocol:

• First, the kernel K(x_i, x_j) is estimated on a quantum computer for all pairs of training data x_i, x_j ∈ T. Here it will be convenient to write T = {x_1, ..., x_t} with t = |T|; also let y_i = m(x_i) be the corresponding label. The optimization problem for the optimal SVM can be formulated in terms of a dual quadratic program that only uses access to the kernel:

$$L_D(\alpha) = \sum_{i=1}^{t} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{t} y_i y_j \alpha_i \alpha_j K(\vec{x}_i, \vec{x}_j), \qquad (5)$$

subject to Σ_{i=1}^{t} α_i y_i = 0 and α_i ≥ 0 for each i. This problem is concave whenever K(x_i, x_j) is a positive definite matrix. The solution to this problem will be given by a nonnegative vector α = (α_1, ..., α_t).

• The quantum computer is used a second time to estimate the kernel for a new datum s ∈ S with all the support vectors. The optimal solution α* is used to construct the classifier (a simulation sketch follows this list):

$$\tilde{m}(\vec{s}) = \mathrm{sign}\!\left( \sum_{i=1}^{t} y_i \alpha_i^{*} K(\vec{x}_i, \vec{s}) + b \right). \qquad (6)$$
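The two quantum calls can be sketched end to end by exact statevector simulation (our illustrative Python sketch on toy data, assuming a recent Qiskit version; on real hardware the overlaps |⟨Φ(x_i)|Φ(x_j)⟩|² are estimated from measurement counts instead):

    import numpy as np
    from qiskit.circuit.library import ZZFeatureMap
    from qiskit.quantum_info import Statevector
    from sklearn.svm import SVC

    feature_map = ZZFeatureMap(feature_dimension=2, reps=2)

    def kernel_matrix(A, B):
        # K[i, j] = |<Phi(a_i)|Phi(b_j)>|^2 for all pairs of data points.
        sa = [Statevector(feature_map.assign_parameters(a)) for a in A]
        sb = [Statevector(feature_map.assign_parameters(b)) for b in B]
        return np.array([[abs(u.inner(v)) ** 2 for v in sb] for u in sa])

    rng = np.random.default_rng(0)
    X_train = rng.uniform(0, 2 * np.pi, (20, 2))   # toy training data
    y_train = rng.integers(0, 2, 20)               # toy labels

    # The dual problem (5) is solved classically on the precomputed kernel;
    # classification (6) reuses kernel values between test and training points.
    svm = SVC(kernel="precomputed").fit(kernel_matrix(X_train, X_train), y_train)
    X_test = rng.uniform(0, 2 * np.pi, (5, 2))
    print(svm.predict(kernel_matrix(X_test, X_train)))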
IV. DESCRIPTION OF THE EXPERIMENT

This section presents the full quantum-enhanced intrusion detection scheme. The scheme can be divided into the following phases:

1) Obtain the input data, i.e., network flows corresponding to a period of time. The NIDS can be configured to detect intrusions by inspecting the flows from the last period of time T. For example, T can be set to 10 minutes, 30 minutes, or 1 hour.
2) Preprocess the data, e.g., do feature normalization.
3) Encode the data. This phase does dimensionality reduction, i.e., summarizes the input data to a size that is appropriate, in terms of dimension, to be inputted to the QSVM() function. As explained before, this phase processes the data with an autoencoder and obtains its latent space.
4) Normalize the data returned by the autoencoder, preparing it for the next phase.
5) Quantum processing, done by a quantum computer, in order to detect intrusions in the input data. As explained, this phase involves running a QSVM() algorithm to classify flows as normal or anomalous (a sketch of this step is given below).
6) Repeat for the next period of time or data sample, possibly in parallel. Notice that a period may overlap the previous one if that is considered useful for some reason.

The detection process has to be preceded by a training process. Training involves the same phases 1 to 5 above, except that the QSVM in phase 5 is executed in training mode.
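For phase 5, the sketch below shows the historical Qiskit Aqua interface; note that QSVM() lived in the now-deprecated qiskit.aqua package, so exact signatures vary across Qiskit versions, and the dataset dictionaries below are toy stand-ins for the encoded flows:

    import numpy as np
    from qiskit import BasicAer
    from qiskit.aqua import QuantumInstance          # deprecated Aqua stack
    from qiskit.aqua.algorithms import QSVM
    from qiskit.circuit.library import ZZFeatureMap

    rng = np.random.default_rng(0)
    # Toy stand-ins for the autoencoded, [-1, 1]-normalized flows.
    training_input = {"normal": rng.uniform(-1, 1, (50, 2)),
                      "anomalous": rng.uniform(-1, 1, (50, 2))}
    test_input = {"normal": rng.uniform(-1, 1, (25, 2)),
                  "anomalous": rng.uniform(-1, 1, (25, 2))}

    feature_map = ZZFeatureMap(feature_dimension=2, reps=2)
    qsvm = QSVM(feature_map, training_input, test_input)

    backend = BasicAer.get_backend("qasm_simulator")  # quantum simulation mode
    result = qsvm.run(QuantumInstance(backend, shots=1024))
    print(result["testing_accuracy"])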
V. EXPERIMENTAL EVALUATION

An important component of a NIDS evaluation is a good benchmarking dataset. We use two recent datasets that have been designed specifically and carefully to evaluate NIDSs. They contain a large number of attacks and are labelled with information about which network flows are normal traffic and which are malicious. We designate them NSL-KDD [30] and NB15 [5], [6].

A. Data encoding and preparation

The phases presented in Section IV were executed in the following way:

1) Obtaining the input data: the flows were extracted from the NSL-KDD and NB15 datasets:
   • NSL-KDD: a sample of 100 data samples for training and 50 data samples for testing was extracted from the KDDTrain+.arff file.
   • NB15: a sample of 150 data samples for training and 50 data samples for testing was extracted from the NB15 train file.
2) Preprocessing of the data: the non-numeric features were encoded into numeric values, then normalized into the interval [0, 1].
3) Encoding of the data: we considered an implementation based on the autoencode() function provided by the Ruta package [31], [32] for the R statistical computing environment [33] (an analogous Python sketch is given after this list).
   • NSL-KDD: for the NSL-KDD dataset, autoencode() has been tested with four activation functions – Linear, Hyperbolic Tangent, ReLU and Sigmoid – in order to understand the performance in the four cases.
   • NB15: for the NB15 dataset, autoencode() has been tested with a single activation function, Sigmoid, as there was no point in repeating the previous evaluation.
4) Normalization: the data returned by autoencode() was normalized to the range [−1, 1].
5) Quantum processing: the resulting dataset was processed by the QSVM() function¹ of the Qiskit quantum computing framework. This function runs the binary classification at the quantum simulation level.

For comparison purposes, a classical SVM algorithm was run over the entire datasets after being processed by the autoencoder. Specifically, we used liquidSVM, a package that implements SVMs and configures hyper-parameters automatically using k-fold cross-validation.²

¹ Qiskit Aqua algorithms on Qiskit
² liquidSVM on GitHub
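As a reference for the encoding step, an analogous autoencoder can be written in Python (our sketch, using Keras on stand-in data; the paper itself uses the R Ruta package's autoencode(), and the 41-feature width below is only indicative of NSL-KDD):

    import numpy as np
    from tensorflow import keras

    n_features = 41                              # indicative NSL-KDD feature count
    inputs = keras.Input(shape=(n_features,))
    latent = keras.layers.Dense(2, activation="sigmoid")(inputs)  # encoding layer
    outputs = keras.layers.Dense(n_features, activation="sigmoid")(latent)

    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, latent)
    autoencoder.compile(optimizer="adam", loss="mse")

    X = np.random.rand(100, n_features)          # stand-in for normalized flows
    autoencoder.fit(X, X, epochs=20, verbose=0)  # unsupervised reconstruction

    Z = encoder.predict(X, verbose=0)            # latent space in [0, 1]
    Z = 2.0 * Z - 1.0                            # phase 4: rescale to [-1, 1]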
B. Experimental process for the quantum component

1) For each data point, the process of encoding data involves a series of gates that transform the initial state |0⟩ into a state |Φ(x)⟩ that depends on x. One such method is amplitude encoding. In this scheme, the data is encoded into the basis state amplitudes of the resulting state:

$$\begin{pmatrix} a_1 \\ \vdots \\ a_{2^n} \end{pmatrix} \leftrightarrow \begin{pmatrix} x_1 \\ \vdots \\ x_{2^n} \end{pmatrix}, \qquad \sum_i |x_i|^2 = 1, \quad x_i \in \mathbb{R}$$

2) According to [23], [34], classifiers based on quantum circuits should only provide a quantum advantage if the kernel used in the quantum space is difficult to estimate classically: K(x, z) = |⟨Φ(x)|Φ(z)⟩|².

3) Inference: inference in quantum machine learning amounts to finding the unitary operation U such that the loss is minimized when measuring U|Φ(x)⟩. Mathematically, U is a 2^n × 2^n unitary matrix with complex elements, where n is the number of qubits. However, in practice, such a general matrix cannot be directly trained, since a real quantum computer has to employ combinations of a limited set of gates rather than arbitrary unitary operations. What is used instead are layers of rotation gates separated by entangling layers composed of controlled Pauli gates [35].

4) Measurement: after all the operations in the circuit have been performed, some observable is measured, here f = Z₁Z₂, the parity function, conjugated by a random unitary V ∈ SU(4). Each measurement provides one of a number of discrete values, so measurements are made repeatedly in order to obtain the mean as an estimate of the expected value of the observable. The end result is that the expected value of a measurement from the entire circuit is ⟨Φ(x)| V† f V |Φ(x)⟩, where f represents the observable measured. The measurement is then used to produce a label and a value for the loss, which is used to optimize the parameters used in the inference step.
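This measurement step can be reproduced by exact simulation (our sketch, assuming a recent Qiskit version; on hardware the expectation value is estimated from repeated shots, and random_unitary samples from U(4) rather than strictly SU(4)):

    import numpy as np
    from qiskit.circuit.library import ZZFeatureMap
    from qiskit.quantum_info import SparsePauliOp, Statevector, random_unitary

    # Prepare |Phi(x)> for one encoded data point x = (0.3, 1.2).
    state = Statevector(ZZFeatureMap(2, reps=2).assign_parameters([0.3, 1.2]))

    f = SparsePauliOp("ZZ")   # parity observable f = Z1 Z2
    V = random_unitary(4)     # Haar-random two-qubit unitary (the paper's V)

    # <Phi(x)| V^dagger f V |Phi(x)>: evolve by V, then take the expectation.
    expectation = state.evolve(V).expectation_value(f)
    print(np.real(expectation))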

Fig. 3. Distribution at the encoding layer (Linear activation and ELU function case). This diagram shows the location of the different encodings for normal and malicious samples. Weak superposition can be seen, with a continuous distribution between the two types of samples. Data obtained from the autoencode() Ruta package function applied to the NSL-KDD dataset. Two cases are shown: (a) NSL-KDD test dataset processed with Sigmoid activation, training loss: 9.4375; (b) NSL-KDD test dataset processed with Hyperbolic Tangent activation, training loss: 9.3450.

Fig. 4. Distribution at the encoding layer for the NSL-KDD test dataset. As the statistical distribution (SD) does not match the SD of the training set, the data from the test datasets were not used for testing. Data obtained from the autoencode() Ruta package function applied to the NSL-KDD dataset. Two cases: (a) NSL-KDD test dataset processed with Sigmoid activation, training loss: 9.3377; (b) NSL-KDD test dataset processed with Linear activation, training loss: 9.4114.

Evaluation: The accuracy values of our quantum classification algorithm, based on QSVM, are shown in Table I and Table II. Complementing these tables, a depiction of the accuracy versus circuit depth can be seen in Figure 8. It shows that the accuracy increases with the quantum circuit depth, as theoretically foreseen.

When these performance figures are compared with classical SVM classification (Table II) we can see that they compare favourably. In the NSL-KDD case both figures are very close: 0.92 accuracy for the QSVM and 0.93 accuracy for the SVM. Regarding the NB15 dataset, we can see that the accuracy values provided by the QSVM classifier are somewhat lower than the ones provided by the SVM classifier (Acc = 0.75).

In Figures 9 and 10 the quantum kernel matrices obtained in the simulations are shown. More defined matrices are observable for the best performing cases, in line with what was expected. A clear visualization of the quality of the kernel obtained is linked to the capacity to observe clear spots in the upper left and lower right of the figure, denoting a more distinctive density and, as a consequence, a better capacity for the SVM to separate data points.
Fig. 5. Distribution at the encoding layer (Linear activation function case). This diagram shows the location of the different encodings for normal and malicious samples. Strong superposition can be seen, with a continuous distribution between the several types of samples. Data obtained from the autoencode() Ruta package function applied to the NB15 dataset.

Fig. 6. NB15 train dataset distribution at the encoding layer (Linear activation function case). This diagram shows the location of the different encodings for normal and malicious samples. Strong superposition can be seen, with a continuous distribution between the several types of samples, each one corresponding to a different type of flow. Data obtained from the autoencode() Ruta package function applied to the NB15 dataset.

TABLE I. ACCURACY VERSUS CIRCUIT DEPTH FOR THE QUANTUM SVM SIMULATION, FOR BOTH DATASETS.

Depth     2     3     4     5
NSL-KDD   0.80  0.92  0.92  0.92
NB15      0.60  0.64  0.64  0.64

TABLE II. ACCURACY FOR THE CLASSICAL SVM (LIQUIDSVM, NON-QUANTUM) EXECUTED OVER THE ENTIRE DATASETS, FOR BOTH NSL-KDD AND NB15.

          Accuracy  Pos Val Error  Neg Val Error
NB15      0.75      0.132          0.383
NSL-KDD   0.93      0.136          0.019

The accuracy values obtained for the NSL-KDD dataset – Acc = 0.92 for the quantum SVM and Acc = 0.93 for the classical one (Tables I and II) – are very close. The kernel matrices show how the SVM algorithm can draw the margin to differentiate the two classes of traffic (normal/malicious). In this experiment, we can see that the kernel matrices allow the classes to be labeled properly most of the time. The kernel matrices are depicted in Figures 9 and 10. One can observe that the distinctiveness of the matrices is much more evident for the NSL-KDD case, where at the same time better separability was obtained (with better accuracy values).

VI. CONCLUSIONS

A hybrid solution to submit the classification of network flows to a quantum computer has been demonstrated. The performance values obtained have shown that the approach is feasible and may in the future allow NIDSs to benefit from quantum-assisted computing. One particular but fundamental issue relates to the need for statistical similarity between the testing and training datasets, for the test dataset to be usable with the trained models. Hence, the results also show that some work has to be done to optimize the datasets for submission to the quantum algorithms, here used in simulation mode. Judging from the accuracy values obtained in the classical SVM and quantum SVM trials, the data seems encouraging, hinting at an advantage for the quantum processing. This should be the object of further study.

Acknowledgements. This research was supported by the European Commission under grant agreement number 830892 (SPARTA) and by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UIDB/50021/2020 (INESC-ID).

REFERENCES

[1] H. Debar, M. Dacier, and A. Wespi, “A revised taxonomy of intrusion detection systems,” Annales des Télécommunications, vol. 55, no. 7, pp. 361–378, 2000.
[2] S. Dhaliwal, A.-A. Nahid, and R. Abbas, “Effective intrusion detection system using XGBoost,” Information, vol. 9, no. 7, pp. 1–24, 2018.
[3] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and
E. Vázquez, “Anomaly-based network intrusion detection: Techniques,
systems and challenges,” Computers & Security, vol. 28, no. 1-2, pp.
18–28, Feb. 2009.
[4] R. Mitchell and I.-R. Chen, “A survey of intrusion detection techniques
for cyber-physical systems,” ACM Computing Surveys, vol. 46, no. 4,
pp. 55:1–55:29, Mar. 2014.
[5] N. Moustafa and J. Slay, “UNSW-NB15: a comprehensive data set for
network intrusion detection systems (UNSW-NB15 network data set),”
in 2015 Military Communications and Information Systems Conference,
Nov 2015, pp. 1–6.
[6] ——, “The evaluation of network anomaly detection systems: Statistical
analysis of the UNSW-NB15 data set and the comparison with the
KDD99 data set,” Information Security Journal: A Global Perspective,
vol. 25, no. 1-3, pp. 18–31, 2016.
[7] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed
analysis of the KDD CUP 99 data set,” in Proceedings of the 2nd IEEE
International Conference on Computational Intelligence for Security and
Defense Applications, 2009, pp. 53–58.
[8] P. Wheeler and E. Fulp, “A taxonomy of parallel techniques for intrusion
detection,” in Proceedings of the 45th Annual Southeast Regional
Conference, 2007, pp. 278–282.
[9] B. Villalonga, D. Lyakh, S. Boixo, H. Neven, T. S. Humble, R. Biswas,
E. G. Rieffel, A. Ho, and S. Mandrà, “Establishing the quan-
tum supremacy frontier with a 281 Pflop/s simulation,” ArXiv, vol.
abs/1905.00444, 2019.
[10] I. L. Markov, A. Fatima, S. V. Isakov, and S. Boixo, “Quantum
supremacy is both closer and farther than it appears,” ArXiv, vol.
abs/1807.10749, 2018.
[11] D. Ristè, M. P. da Silva, C. A. Ryan, A. W. Cross, A. D. Córcoles, J. A.
Smolin, J. M. Gambetta, J. M. Chow, and B. R. Johnson, “Demonstration
of quantum advantage in machine learning,” npj Quantum Information,
vol. 3, no. 1, p. 16, 2017.
[12] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends,
R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell et al., “Quantum
supremacy using a programmable superconducting processor,” Nature,
vol. 574, no. 7779, pp. 505–510, 2019.
Fig. 7. NB15 test dataset distribution at the encoding layer (Sigmoid activation function case). As the statistical distribution (SD) does not match the SD of the training set, the data from the test datasets were not used for testing. This diagram shows the location of the different encodings for normal and malicious samples. Weak superposition can be seen, with a continuous distribution between the two types of samples. Data obtained from the autoencode() Ruta package function applied to the NB15 dataset. Training loss: 8.2805.

[13] A. Khrennikov, “Echoing the recent Google success: Foundational roots of quantum supremacy,” arXiv preprint arXiv:1911.10337, 2019.
[14] J. M. Gambetta, J. M. Chow, and M. Steffen, “Building logical qubits in a superconducting quantum computing system,” npj Quantum Information, vol. 3, no. 1, p. 2, 2017.
[15] M. Koppenhöfer, C. Bruder, and A. Roulet, “Quantum synchronization on the IBM Q system,” 2019.
quantum algorithms for nonlinear problems,” 2019.
[17] K. Das and A. Sadhu, “Constant time quantum search algorithm over a dataset: An experimental study using IBM Q Experience,” ArXiv, vol. abs/1810.03390, 2018.
[18] A. Zulehner, A. Paler, and R. Wille, “An efficient methodology for map-
ping quantum circuits to the IBM QX architectures,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 38,
pp. 1226–1236, 2017.
[19] Z. Elkhadir, K. Chougdali, and M. Benattou, “Intrusion detection system
using PCA and kernel PCA methods,” in Proceedings of the Mediter-
ranean Conference on Information & Communication Technologies,
A. El Oualkadi, F. Choubani, and A. El Moussati, Eds., 2016, pp. 489–
497.
[20] K. K. Vasan and B. Surendiran, “Dimensionality reduction using principal component analysis for network intrusion detection,” Perspectives in Science, vol. 8, pp. 510–512, 2016.

Fig. 8. Accuracy as a function of the quantum circuit depth for both datasets. These values can be compared with the accuracy obtained for the SVM in the classical domain executed over the entire datasets: Acc = 0.75 for the NB15 dataset and Acc = 0.93 for the NSL-KDD dataset.

[21] E. Plaut, “From principal subspaces to principal components with linear autoencoders,” ArXiv, vol. abs/1804.10253, 2018.
[22] J. Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum, vol. 2, p. 79, Aug. 2018.
[23] V. Havlíček, A. Córcoles, K. Temme, A. Harrow, A. Kandala, J. Chow, and J. Gambetta, “Supervised learning with quantum-enhanced feature spaces,” Nature, vol. 567, pp. 209–212, 2019.
[24] T. M. Cover, “Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition,” IEEE Transactions on Electronic Computers, vol. 14, no. 3, pp. 326–334, June 1965.
[25] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Physical Review Letters, vol. 113, 2014.
Fig. 9. Kernel matrices for several depth values: ideal kernel matrices containing the inner products of all data points used for training with the NSL-KDD dataset. Data obtained with a Qiskit simulation. (a) Kernel matrix for depth = 2, accuracy = 0.80; (b) kernel matrix for depth = 3, accuracy = 0.92; (c) kernel matrix for depth = 4, accuracy = 0.92; (d) kernel matrix for depth = 5, accuracy = 0.92. The correlation between the distinguishability of the kernel space and the accuracy values obtained is clearly visible.

Fig. 10. Kernel matrices for several depth values: ideal kernel matrices containing the inner products of all data points used for training with the NB15 dataset. Data obtained with a Qiskit simulation. (a) Kernel matrix for depth = 2, accuracy = 0.60; (b) kernel matrix for depth = 3, accuracy = 0.64; (c) kernel matrix for depth = 4, accuracy = 0.64; (d) kernel matrix for depth = 5, accuracy = 0.64. The correlation between the distinguishability of the kernel space and the accuracy values obtained is clearly visible.

[26] L. A. Goldberg and H. Guo, “The complexity of approximating complex-valued Ising and Tutte partition functions,” Computational Complexity, vol. 26, pp. 765–833, 2014.
[27] T. F. Demarie, Y. Ouyang, and J. F. Fitzsimons, “Classical verification
of quantum circuits containing few basis changes,” Physical Review A,
vol. 97, no. 4, Apr 2018.
[28] Y. Shi, “Both Toffoli and controlled-not need little help to do universal
quantum computing,” Quantum Info. Comput., vol. 3, no. 1, pp. 84–92,
Jan. 2003.
[29] D. Aharonov, “A simple proof that Toffoli and Hadamard are quantum
universal,” 2003.
[30] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Toward
developing a systematic approach to generate benchmark datasets for
intrusion detection,” Computers & Security, vol. 31, no. 3, pp. 357–374,
May 2012.
[31] D. Charte, F. Herrera, and F. Charte, “Ruta: Implementations of neural autoencoders in R,” Knowledge-Based Systems, vol. 174, pp. 4–8, 2019.
[32] D. Charte, F. Charte, S. García, M. J. del Jesús, and F. Herrera, “A
practical tutorial on autoencoders for nonlinear feature fusion: Taxon-
omy, models, software and guidelines,” Information Fusion, vol. 44, pp.
78–96, 2018.
[33] R Core Team, R: A Language and Environment for Statistical
Computing, R Foundation for Statistical Computing, Vienna, Austria,
2019. [Online]. Available: https://www.R-project.org/
[34] M. Schuld and N. Killoran, “Quantum machine learning in feature
Hilbert spaces,” Phys. Rev. Lett., vol. 122, p. 040504, Feb 2019.
[35] M. Schuld and F. Petruccione, “Supervised learning with quantum
computers,” 2018.
