Comparative Analysis of Dimensionality Reduction T
Research Article
Keywords: intrusion detection, secure water treatment dataset, convolutional neural networks,
dimensionality reduction, gated recurrent unit
DOI: https://doi.org/10.21203/rs.3.rs-2904250/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
The Internet of Things (IoT) has revolutionized the functionality and efficiency
of distributed cyber-physical systems, such as city-wide water treatment systems.
However, increased connectivity brings a greater risk of cybersecurity threats. In
this research, we propose an Intrusion Detection System (IDS), evaluated on the
Secure Water Treatment (SWaT) dataset, that uses a 1D Convolutional Neural Network
(CNN) model enhanced with a Gated Recurrent Unit (GRU). The proposed
method outperforms existing methods, achieving 99.68% accuracy and an F1 score
of 0.9869. The paper also explores dimensionality reduction methods, including
Autoencoders, Generalized Eigenvalue Decomposition (GED), and Principal
Component Analysis (PCA). The findings highlight the importance of
balancing dimensionality reduction against the need for accurate intrusion detection.
PCA performed better than the other techniques: reducing the input dimension
by 90.2% resulted in decreases of only 2.8% and 2.6% in the accuracy and F1 score,
respectively.
1 Introduction
The Internet of Things (IoT) is a technology that enables the connection of everyday
devices, such as appliances, vehicles, and industrial equipment, to the internet. This
allows these devices to communicate with one another and with other systems, and to
be controlled and monitored remotely. The increased connectivity provided by IoT has
had a significant impact on industrial control systems, which were previously closed
off from the outside world.
In the past, industrial control systems (ICS) were primarily used to control and
monitor industrial processes within a single facility or on a small scale. With the
advent of IoT, however, these systems can now be connected to the internet, enabling
remote monitoring and control. This allows for city-wide or nation-wide distributed
systems to work collaboratively and efficiently, with the ability to share information
and coordinate actions across different locations.
Although connectivity has many benefits, it also brings the danger of cyberattacks.
An attacker who gains access to the communication channel can take control of the
system and launch an attack. The effects range from simple unavailability of
service to catastrophic system failure. As industrial control systems become connected
to the internet, they become more vulnerable to cyberattacks.
There are several examples of cyberattacks targeting industrial control systems (ICS).
In 2000, a former employee maliciously issued commands to radio-controlled SCADA
(Supervisory Control and Data Acquisition) sewage equipment [1], causing hundreds of
thousands of liters of raw sewage to spill out in various parts of a city in Australia.
One of the most well-known examples of an ICS cyberattack is the Stuxnet worm
[2], which was discovered in 2010. The worm specifically targeted the software used to
control industrial processes at an Iranian nuclear facility. The attack caused physical
damage to the centrifuges used to enrich uranium, setting back the facility’s operations.
In 2015, a malicious cyberattack targeted the Ukrainian power grid [3], causing
widespread power outages across the country. The attackers used spear-phishing emails
to gain access to the network, and then used malware to disrupt the operations of the
power plants.
The WannaCry ransomware attack affected thousands of computers, including those in
industrial control plants [4]. Triton malware [5], which specifically targeted the industrial control
systems used to operate critical infrastructure, was discovered in 2017. The malware
manipulated the Triconex Safety Instrumented System (SIS) controllers, which are
used to monitor and control industrial processes in facilities such as oil refineries and
chemical plants. Although the full extent of the damage caused by these attacks has not
been made public, the attacks demonstrate the potential consequences of a successful
cyberattack on industrial control systems, which can include physical damage, disruption
of operations, and even fatalities.
Attacks on ICS can range from simple disruptions of service to catastrophic failures
that can have major physical consequences. Given the potential consequences of a
successful attack, it is important to take the necessary steps to protect industrial
control systems, especially critical infrastructure. Therefore, organizations that deploy
IoT-enabled industrial control systems need to be aware of these security risks and
take appropriate measures to protect against them. This includes implementing robust
security protocols, monitoring for and responding to potential security threats, and
providing employee education and training to raise awareness of security risks. One
common practice for protecting ICS is the use of intrusion detection systems (IDS).
Researchers have proposed various IDSs to identify and detect intrusions and help
secure cyber-physical systems. However, the efficiency and effectiveness of IDSs can
be improved through feature selection and feature reduction algorithms.
In this research, we propose an IDS for the Secure Water Treatment
(SWaT) dataset based on a one-dimensional Convolutional
Neural Network (CNN) model enhanced with a Gated Recurrent Unit (GRU). Additionally,
we explore various dimensionality reduction techniques, such as autoencoders,
Generalized Eigenvalue Decomposition (GED), and Principal Component Analysis
(PCA). The goal of the paper is to determine the optimal feature subset that can
improve the efficiency of the model without compromising its accuracy. In light of
the above, the contributions of the paper are as follows:
• A novel IDS approach based on a 1D CNN model enhanced with a GRU is proposed
for the SWaT dataset, which outperforms traditional IDS methods in terms
of accuracy and robustness.
• The impacts of dimensionality reduction techniques are evaluated.
• Insights are provided into the trade-off between feature reduction and
detection accuracy.
• The importance of balancing feature reduction and detection accuracy is demonstrated
for effective intrusion detection in cyber-physical systems.
The rest of this paper is organized as follows. Section 2 gives brief background
information on the SWaT dataset, GRU, and CNN, along with the related work. The proposed
method and implementation details are explained in Section 3. The experimental
results and discussion are presented in Section 4, followed by a conclusion in Section 5.
Fig. 1 The process flowchart of SWaT system.
analyzer, water level sensor, and flow meter, and handle the actuators such as valves
and pumps. At Level 1, the PLCs communicate with each other using a separate network,
which connects all six stages to the Supervisory Control and Data Acquisition
(SCADA) system.
PLC 1 controls the flow of raw water by opening or closing the valves connected to
the inlet and outlet of the raw water tank in Stage 1. After chemical dosing in Stage 2,
the water is fed to Stage 3 for the Ultra Filtration (UF). From there, the UF feed pump
forwards the water to the Reverse Osmosis (RO) feed tank in Stage 4. Before entering
the RO process, the water passes through an ultraviolet (UV) de-chlorinator to remove
any free chlorine. In Stage 5, the RO process removes inorganic impurities from the
de-chlorinated water. The filtered water produced by the RO process is stored in the
permeating tank in Stage 6 for distribution, and Stage 6 also handles the cleaning of
the UF membranes through the backwash process.
The dataset covers 11 days of operation, with the first seven days
being attack-free. It contains 946,722 samples and 51 attributes. Attacks were implemented
at Level 1, where Programmable Logic Controllers (PLCs) communicate with
the SCADA system. At this level, the data packets are manipulated, and malicious
messages are transmitted to the SCADA system. The attack duration varied from a
few minutes to a few hours.
Nedeljkovic and Jakovljevic [14] implemented a semi-supervised IDS using CNN-based
autoregression. They applied a Finite Impulse Response (FIR) filter to remove
high-frequency noise.
Dillon et al. [15] showed that design knowledge increases the efficiency of the IDS.
One reason is that when data consist of both binary and analogue values,
machine learning algorithms can be biased towards the binary values and ignore the analogue ones.
Experimental results show a 5% increase in detection when design knowledge is used.
There are also research papers that implement dimensionality reduction. Li et al. [16]
proposed a method called "end-to-end anomaly detection" for detecting anomalies
using a digital twin. The proposed method uses a multidimensional deconvolutional
network and an attention mechanism with PCA to detect anomalies quickly in real
time. However, the performance of the method, F1=0.94, is not acceptable for critical
infrastructure. Alimi et al. [17] applied PCA to various supervised learning algorithms.
They achieved the best performance for the SWaT dataset with the J48
decision tree classifier; however, the F1 score was only 0.814. Priyanga et al. [18] proposed
a hypergraph-based anomaly detection technique. The proposed algorithm involves
two phases: dimensionality reduction using enhanced principal component analysis
(EPCA) and anomaly detection with a hypergraph (HG)-based convolutional neural network (CNN).
Elnour et al. [19] proposed a framework involving two isolation forest models and PCA.
Although these methods use dimensionality reduction algorithms, they do
not focus on dimensionality reduction itself. This article explores the limitations of current
dimensionality reduction methods and discusses the importance of dimensionality
reduction.
2.4.1 Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a commonly used statistical technique for
dimensionality reduction in data analysis and machine learning. The main goal of
PCA is to identify patterns and structure in high-dimensional data by reducing the
number of variables and retaining the most important information.
Once the principal components are identified, data can be projected onto a lower-
dimensional subspace by selecting a subset of the principal components. This new
subspace retains the most important information from the original high-dimensional
dataset while reducing the number of variables.
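The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in the paper: the function name `pca_project`, the synthetic data, and the choice of 5 retained components are assumptions made for the example; only the 51-attribute input width comes from the dataset description.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                        # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)    # explained variance ratio
    return Xc @ components.T, components, explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 51))   # synthetic stand-in, NOT the SWaT data
Z, components, ratio = pca_project(X, 5)
print(Z.shape)                   # reduced representation: 200 rows, 5 columns
```

The explained variance ratio returned here is the quantity usually inspected when deciding how many components to keep.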
2.4.3 Autoencoders
An autoencoder is a neural network that can be used for dimensionality reduction
by compressing high-dimensional data into a lower-dimensional latent representation
as shown in Figure 2. The autoencoder consists of an encoder that maps the input
data to the latent layer, a bottleneck layer that represents the compressed data, and a
decoder that reconstructs the original data from the latent representation. By training
the network to minimize the difference between the input and reconstructed output,
the network learns to identify the most important features in the data and discard the
less important ones.
The size of the latent space is an important consideration when designing an
autoencoder, as it determines how much information will be retained after compression.
If the latent space is too small, the autoencoder may lose important information
and reconstruct the input data poorly. On the other hand, if the
latent space is too large, the autoencoder may overfit and memorize the training data,
resulting in poor generalization to new data.
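To make the encoder, bottleneck, and decoder roles concrete, the following is a deliberately simplified sketch: a linear autoencoder trained by plain gradient descent on the reconstruction error. It is not the architecture of Figure 2. The absence of activation functions, the synthetic rank-2 data, the latent size of 2, and the names `train_linear_autoencoder`, `W_enc`, and `W_dec` are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def train_linear_autoencoder(X, latent_dim, steps=300, lr=0.05, seed=0):
    """Minimize mean squared reconstruction error of X -> Z -> X_hat."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))   # encoder weights
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))   # decoder weights
    losses = []
    for _ in range(steps):
        Z = X @ W_enc                 # bottleneck (latent) representation
        X_hat = Z @ W_dec             # reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err / err.size          # d(loss)/d(X_hat)
        grad_dec = Z.T @ grad_out
        grad_enc = X.T @ (grad_out @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses

# Synthetic 8-feature data that really lives in 2 dimensions (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)
W_enc, W_dec, losses = train_linear_autoencoder(X, latent_dim=2)
Z = X @ W_enc                         # compressed features for a downstream model
```

Rerunning the sketch with `latent_dim=1` is one way to observe the information loss that a too-small latent space causes.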
Fig. 2 The architecture of the autoencoder used in this experiment.
The proposed 1D CNN model consists of three convolutional layers, each with a
filter size of 16 and kernel sizes of 5, 3, and 2, respectively, as shown in Figure 3a. The
filter size refers to the number of filters or feature maps used in each convolutional
layer. The kernel size, on the other hand, refers to the size of the convolutional filter
or window that is moved across the input sequence to extract features. A smaller
kernel size allows the network to capture more local features, while a larger kernel size
captures more global features.
After the three convolutional layers, the output feature maps from each layer are
concatenated into a single tensor. The purpose of this is to combine the features
learned at different levels of abstraction into a single representation.
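The shapes involved in the convolution and concatenation steps can be traced with a small NumPy sketch. The 16 filters and the kernel sizes 5, 3, and 2 come from the description above; everything else is an assumption for illustration: that the three layers act on the same input, that "same" zero padding keeps their output lengths equal so they can be concatenated, the window length of 10, and the random weights.

```python
import numpy as np

def conv1d_same(x, kernels):
    """1D convolution with zero padding so output length equals input length.

    x: (length, in_channels); kernels: (kernel_size, in_channels, n_filters).
    """
    k = kernels.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.empty((x.shape[0], kernels.shape[2]))
    for t in range(x.shape[0]):
        window = xp[t:t + k]                      # (kernel_size, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 51))        # hypothetical window: 10 time steps, 51 attributes
branches = []
for kernel_size in (5, 3, 2):        # kernel sizes from the text
    weights = rng.normal(scale=0.1, size=(kernel_size, 51, 16))   # 16 filters each
    branches.append(conv1d_same(x, weights))
merged = np.concatenate(branches, axis=1)   # feature maps stacked along the channel axis
print(merged.shape)
```

Under these assumptions the concatenated tensor has 3 × 16 = 48 channels per time step.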
The concatenated tensor is then fed into a max pooling layer, which reduces the
spatial dimensions of the tensor by taking the maximum value within a specified
window. This helps to extract the most salient features from the input sequence while
reducing the computational cost of the network.
Batch normalization is then applied to normalize the output of the previous layer,
which helps to speed up training and improve the generalization of the model. Finally,
the ReLU activation function is applied elementwise to the output of the batch
normalization layer, which introduces non-linearity to the model and helps to extract
more complex features.
Fig. 3 a) The architecture of the 1D CNN block used in the experiment, along with b) the whole architecture
including GRU layers.
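The pooling, normalization, and activation steps of the CNN block can be sketched as three small NumPy functions. The pool size of 2, the batch of 4 windows, and the omission of the learned scale and shift parameters of batch normalization are simplifying assumptions; the text does not specify them.

```python
import numpy as np

def max_pool1d(x, pool_size):
    """Max over non-overlapping windows along the time axis.

    x: (batch, length, channels); a trailing remainder shorter than pool_size is dropped.
    """
    b, length, c = x.shape
    n = length // pool_size
    return x[:, :n * pool_size].reshape(b, n, pool_size, c).max(axis=2)

def batch_norm(x, eps=1e-5):
    """Per-channel normalization over the batch and time axes (no learned scale/shift)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 10, 48))     # hypothetical concatenated feature maps
pooled = max_pool1d(features, pool_size=2)  # halves the time dimension
out = relu(batch_norm(pooled))
print(pooled.shape, out.min() >= 0.0)
```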
As shown in Figure 3b, after the two blocks of the CNN, the output is fed into GRU
layers, which can capture longer-term dependencies in the input sequence. Finally, the
output of the GRU layer is passed through a dense layer, which produces the final
output of the model.
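How a GRU carries information across time steps can be illustrated with the standard update equations: an update gate `z`, a reset gate `r`, and a candidate state. The sketch below uses one common gate convention, random untrained weights, and hypothetical sizes (48 input channels, 8 hidden units); none of these values are taken from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update for a single time step."""
    z = sigmoid(x_t @ p["Wz"] + h_prev @ p["Uz"] + p["bz"])   # update gate
    r = sigmoid(x_t @ p["Wr"] + h_prev @ p["Ur"] + p["br"])   # reset gate
    h_cand = np.tanh(x_t @ p["Wh"] + (r * h_prev) @ p["Uh"] + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand    # blend old state and candidate

def gru_forward(seq, hidden, rng):
    """Run a randomly initialized GRU over seq (length, in_dim); return the last state."""
    in_dim = seq.shape[1]
    p = {}
    for g in "zrh":
        p["W" + g] = rng.normal(scale=0.1, size=(in_dim, hidden))
        p["U" + g] = rng.normal(scale=0.1, size=(hidden, hidden))
        p["b" + g] = np.zeros(hidden)
    h = np.zeros(hidden)
    for x_t in seq:
        h = gru_step(x_t, h, p)
    return h

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 48))    # hypothetical CNN output: 5 time steps, 48 channels
h_last = gru_forward(seq, hidden=8, rng=rng)
```

In a full model, `h_last` would be the vector passed to the dense output layer.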
CNN-GRU model on the SWaT dataset. We trained the model for different numbers
of epochs ranging from 1 to 100 and evaluated its performance.
Figure 4 illustrates the trends of validation loss and accuracy as a function of the
number of epochs. The optimal number of epochs is 10, where the validation loss
reaches its minimum and the validation accuracy its maximum.
Fig. 4 Determining the optimal number of epochs for model training using validation loss and
accuracy.
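Selecting the epoch count from a validation curve reduces to finding the epoch with the lowest validation loss. The helper below is a minimal sketch; the function name `best_epoch` and the loss values are hypothetical placeholders, not measurements from this study.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Placeholder validation losses for epochs 1..6 (illustrative, not from the paper).
val_loss = [0.30, 0.21, 0.15, 0.12, 0.14, 0.19]
chosen = best_epoch(val_loss)
print(chosen)
```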
Other hyperparameters such as batch size, learning rate, and optimizer can significantly
impact the performance of a deep-learning model. Therefore, it is important
to optimize these hyperparameters to achieve the best possible performance. In this
study, we chose a batch size of 32, a learning rate of 0.001, and the Adam optimizer
based on their effectiveness in previous studies and our experimentation on the SWaT
dataset.
4.1 Evaluation Method
The proper testing of an Intrusion Detection System (IDS) is a crucial step in evaluating
its effectiveness. To ensure the accuracy and reliability of the proposed IDS
model, we conducted a comprehensive analysis of its performance.
Our proposed model employs a binary classifier to differentiate between authentic
messages and potential attacks. As a result, there are four possible outcomes: false
negative, false positive, true negative, and true positive. A true positive occurs when
an attack is correctly identified by the system, while a true negative occurs when an
authentic message is correctly accepted as such. In contrast, a false positive occurs
when an authentic message is labeled as an attack, and a false negative occurs when
an attack is labeled as an authentic message.
To assess the performance of our proposed IDS model, we calculated the values
for FN, FP, TN, and TP. These values provide important insights into the system’s
accuracy and effectiveness in detecting potential attacks. Additionally, we calculated
several key metrics, namely accuracy, precision, and recall.
The accuracy metric evaluates the percentage of correct predictions made by the
model, whereas the precision metric assesses the percentage of true positives among all
positive predictions. The recall metric evaluates the percentage of actual attacks
that the system detects. By considering all of these metrics, we
can assess the overall performance of the IDS model and determine its efficiency in
detecting potential attacks.
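The metrics described above follow directly from the four confusion-matrix counts. The sketch below also computes the F1 score reported elsewhere in the paper, using its standard definition as the harmonic mean of precision and recall. The function name and the example counts are illustrative assumptions, not results from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)     # share of raised alarms that are real attacks
    recall = tp / (tp + fn)        # share of real attacks that raise an alarm
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
print(metrics)
```

The sketch assumes at least one positive prediction and one actual attack; otherwise the precision or recall denominator would be zero.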
4.2 Comparison with the State-of-the-art Techniques
As many researchers use this dataset, it serves as a common benchmark for evaluating
and comparing the performance of different methods. The proposed method is
compared with other state-of-the-art methods.
Table 1 presents the performance metrics for the proposed method along with other
state-of-the-art proposals. There is a trade-off between precision and recall. These
two metrics measure different aspects of a classifier’s performance, and optimizing one
metric often comes at the expense of the other.
Table 1 shows that some models achieved high precision but lower
recall (e.g., CNN–FIR), while others achieved higher recall but lower
precision (e.g., EPCA-HG-CNN). The proposed CNN-GRU model achieved
the best overall performance, with an accuracy of 0.9968, an F1
score of 0.9869, a precision of 0.9855, and a recall of 0.9882.
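As a consistency check on the reported figures, the F1 score can be recomputed from the stated precision and recall, assuming the usual harmonic-mean definition (the symbols P and R are introduced here for brevity):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.9855 \times 0.9882}{0.9855 + 0.9882}
    = \frac{1.9477}{1.9737}
    \approx 0.9868
```

This agrees with the reported F1 score of 0.9869 to within the rounding of the four-decimal precision and recall values.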
of eigenvectors increases, with the highest precision of 0.9843 achieved with 20
eigenvectors. The recall of the IDS is highest for 25 eigenvectors with a value of 0.9781,
indicating that the IDS with 25 eigenvectors is better at detecting true positive cases.
The true positive (TP) values increase with the number of eigenvectors, while the false
negative (FN) values decrease, indicating that the IDS is more capable of detecting
true positive cases with a higher number of eigenvectors. However, the false positive
(FP) values slightly increase as the number of eigenvectors increases, which suggests
that increasing the number of eigenvectors may result in a higher rate of false alarms.
4.3.2 Autoencoder
Choosing the appropriate number of latent layers can be a challenging task, and it often
requires experimentation and tuning to find the optimal number for a given problem.
Typically, the number of latent layers is determined by balancing the trade-off between
model complexity and performance on the validation set.
Table 2 The performance analysis of generalized eigenvalue decomposition.
Fig. 6 The variance of PCA components.
cases as the number of components increases. In contrast, the false positive values
remain relatively low across all component numbers, indicating that the proposed
method can maintain a low rate of false alarms even with an increased number of
components.
technique among the three methods evaluated, as it resulted in the best balance between
the number of dimensions and accuracy. With 20 components, PCA can slightly improve the
accuracy and F1 score of the CNN-GRU architecture. On the other hand, reducing the
input features by 90.2% using PCA resulted in only a 2.6% decrease in the F1 score
of the intrusion detection system. When the number of components is decreased, the pure
CNN-GRU model outperforms all of the dimensionality reduction methods tested.
This suggests that there is a trade-off between reducing dimensionality and maintaining
accuracy, and that each situation may require a different approach depending
on the specific goals and constraints of the system being used. Overall, the findings
suggest that careful consideration and testing of different dimensionality reduction
techniques are necessary for optimization.
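The 90.2% figure can be related to a concrete component count with simple arithmetic. The sketch below assumes that all 51 dataset attributes are used as model inputs; under that assumption, keeping 5 components gives a reduction that rounds to 90.2%. The component count of 5 is an inference from the percentages, not a value stated in the text.

```python
def reduction_percent(original_dim, reduced_dim):
    """Percentage of input dimensions removed by the reduction."""
    return 100.0 * (original_dim - reduced_dim) / original_dim

# 51 attributes is taken from the dataset description; 5 components is an assumption.
pct = reduction_percent(51, 5)
print(round(pct, 1))
```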
Although successful results are observed, there is some room to improve the current
system. The method was evaluated offline. Future work could explore the feasibility
of implementing the proposed method in real time to provide continuous monitoring
and early detection of potential cyberattacks on critical infrastructure systems.
While this research focused on the SWaT dataset, future work could explore the
effectiveness of the proposed method on other datasets related to critical infrastructure,
such as power grids or transportation systems. This would provide insights into
the generalizability of the proposed method.
5 Conclusion
This research investigates the use of a 1D CNN and GRU model on the SWaT dataset.
The main aim of the research is to improve the performance of the model by utilizing
the complementary strengths of both models. The results demonstrate that combining
the 1D CNN and GRU models can significantly enhance the accuracy of the model.
Moreover, the study highlights the importance of dimensionality reduction, indicating
that the selection of relevant features can significantly affect the performance of the
model.
Declarations
• Funding: The authors declare that no funds, grants, or other support were received
during the preparation of this manuscript.
• Conflict of interest: The authors have no conflicts of interest to declare. All co-authors
have seen and agree with the contents of the manuscript and there is no
financial interest to report. We certify that the submission is original work and is
not under review at any other publication.
• Data Availability Statement: The SWaT dataset from iTrust Lab is used. It is publicly
available at: https://itrust.sutd.edu.sg/itrust-labs_datasets/#SWaT
• Authors’ contributions: All authors contributed to the study conception and design.
Material preparation and analysis were performed by Mehmet Bozdal, Kadir Ileri
and Ali Ozkahraman. All authors read and approved the final manuscript.
References
[1] Abrams, M., Weiss, J.: Malicious control system cyber security attack case
study–Maroochy Water Services, Australia. McLean, VA: The MITRE Corporation
(2008)
[2] David, K.: The real story of Stuxnet. IEEE Spectrum 50(3), 48–53 (2013)
[3] Case, D.U.: Analysis of the cyber attack on the Ukrainian power grid. Electricity
Information Sharing and Analysis Center (E-ISAC) 388, 1–29 (2016)
[4] Kovacs, E.: Industrial Systems at Risk of WannaCry Ransomware Attacks.
https://www.securityweek.com/industrial-systems-risk-wannacry-ransomware-attacks.
Accessed: 2023-01-11
[5] Di Pinto, A., Dragoni, Y., Carcano, A.: Triton: The first ICS cyber attack on safety
instrument systems. In: Proc. Black Hat USA, vol. 2018, pp. 1–26 (2018)
[7] Adepu, S., Mathur, A.: Distributed attack detection in a water treatment plant:
Method and case study. IEEE Transactions on Dependable and Secure Computing
18(1), 86–99 (2018)
[8] Das, T.K., Adepu, S., Zhou, J.: Anomaly detection in industrial control systems
using logical analysis of data. Computers & Security 96, 101935 (2020)
[10] Al-Dhaheri, M., Zhang, P., Mikhaylenko, D.: Detection of cyber attacks on a
water treatment process. IFAC-PapersOnLine 55(6), 667–672 (2022)
[11] Boateng, E.A., Bruce, J., Talbert, D.A.: Anomaly detection for a water treat-
ment system based on one-class neural network. IEEE Access 10, 115179–115191
(2022)
[12] Kravchik, M., Shabtai, A.: Detecting cyber attacks in industrial control systems
using convolutional neural networks. In: Proceedings of the 2018 Workshop on
Cyber-physical Systems Security and Privacy, pp. 72–83 (2018)
[13] Zhou, L., Zeng, Q., Li, B.: Hybrid anomaly detection via multihead dynamic graph
attention networks for multivariate time series. IEEE Access 10, 40967–40978
(2022)
[14] Nedeljkovic, D., Jakovljevic, Z.: CNN based method for the development of cyber-attacks
detection algorithms in industrial control systems. Computers & Security
114, 102585 (2022)
[15] Cheong Lien Sung, D., MR, G.R., P Mathur, A.: Design-knowledge in learning
plant dynamics for detecting process anomalies in water treatment plants (2022)
[16] Li, Z., Duan, M., Xiao, B., Yang, S.: A novel anomaly detection method for
digital twin data using deconvolution operation with attention mechanism. IEEE
Transactions on Industrial Informatics (2022)
[17] Alimi, O.A., Ouahada, K., Abu-Mahfouz, A.M., Rimer, S., Alimi, K.O.A.:
Supervised learning based intrusion detection for SCADA systems. In: 2022 IEEE
Nigeria 4th International Conference on Disruptive Technologies for Sustainable
Development (NIGERCON), pp. 1–5 (2022). IEEE
[18] Krithivasan, K., Pravinraj, S., VS, S.S., et al.: Detection of cyberattacks in
industrial control systems using enhanced principal component analysis and
hypergraph-based convolution neural network (EPCA-HG-CNN). IEEE Transactions
on Industry Applications 56(4), 4394–4404 (2020)
[19] Elnour, M., Meskin, N., Khan, K., Jain, R.: A dual-isolation-forests-based attack
detection framework for industrial control systems. IEEE Access 8, 36639–36651
(2020)
[20] Xie, X., Wang, B., Wan, T., Tang, W.: Multivariate abnormal detection for
industrial control systems using 1D CNN and GRU. IEEE Access 8, 88348–88359
(2020)
[21] Kravchik, M., Shabtai, A.: Efficient cyber attacks detection in industrial control
systems using lightweight neural networks. arXiv preprint
arXiv:1907.01216 (2019)
[22] Macas, M., Wu, C.: An unsupervised framework for anomaly detection in a water
treatment system. In: 2019 18th IEEE International Conference On Machine
Learning And Applications (ICMLA), pp. 1298–1305 (2019). IEEE