
Expert Systems With Applications 238 (2024) 121912

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

Deep residual convolutional neural network: An efficient technique for intrusion detection system
Gunupudi Sai Chaitanya Kumar a, *, Reddi Kiran Kumar b, Kuricheti Parish Venkata Kumar c,
Nallagatla Raghavendra Sai d, Madamachi Brahmaiah e
a Department of Artificial Intelligence, DVR & Dr. HS MIC College of Technology, Andhra Pradesh, India
b Department of Computer Science, Krishna University, Machilipatnam, Andhra Pradesh, India
c Department of Computer Applications, Velagapudi Ramakrishna Siddhartha Engineering College, Andhra Pradesh, India
d Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
e Department of Computer Science and Engineering, RVR & JC College of Engineering, Guntur, Andhra Pradesh, India

Abstract

The fast growth of computer networks over the past few years has made network security in smart cities a significant issue. Network intrusion detection is crucial to maintaining the integrity, confidentiality, and resource accessibility of the various network security rules. Conventional intrusion detection systems frequently use mining association rules to identify intrusion behaviors. They run into issues such as a high false alarm rate (FAR), limited generalization capacity, and slow timeliness because they cannot adequately extract distinctive information about user activities. The primary goal of the current research is to classify attacks using efficient approaches to identify genuine packets. If the number of characteristics in a dataset decreases, the complexity of DL approaches is significantly reduced. In this research work, a Deep Residual Convolutional Neural Network (DCRNN) is proposed to enhance network security through intrusion detection, optimized by the Improved Gazelle Optimization Algorithm (IGOA). Feature selection eliminates irrelevant features from the network data used in the classification process; the essential features are chosen using the Novel Binary Grasshopper Optimization Algorithm (NBGOA). Experimentation is carried out using the UNSW-NB15, CICDDoS2019, and CIC-IDS-2017 datasets. According to the experimental findings, the proposed system outperforms existing models regarding classification accuracy and processing time. The results demonstrate that the presented approach efficiently and precisely identifies various assaults.

1. Introduction

Information and communication technology (ICT) systems are now integral to every aspect of modern business and society (Smys et al., 2020). The complexity and frequency of cyber-attacks against ICT systems are also rising. ICT systems, therefore, require a very effective network security solution. One of the often-employed methods for spotting different kinds of hostile attacks in the network is an intrusion detection (ID) system (Aldweesh et al., 2020; Jin et al., 2020). Intrusion Detection Systems (IDSs) can be divided into signature-based and anomaly-based categories. IDSs that rely on signatures identify attacks by comparing them to known signatures (Almiani et al., 2020). These IDSs can neither detect zero-day attacks nor novel attack types. On the other hand, anomaly-based IDSs gather information about the behavior of legitimate users and then analyze the behavior currently being observed to determine whether it belongs to legitimate or malicious users (Zhou et al., 2020; Mulyanto et al., 2021; Khraisat et al., 2019; Nguyen et al., 2020). As a result, these kinds of IDSs can identify unknown attacks. However, anomaly-based IDSs frequently produce false positive results, whereas signature-based IDSs produce false negative results (Jan et al., 2019; Pawlicki et al., 2020; Choi et al., 2019). The training step of an anomaly-based strategy entails collecting and monitoring data related to the regular operation of the system before constructing a model of the behavior of legal users (Wang et al., 2020). The models that are employed for detection are then discussed.

Numerous kinds of research have concentrated on intrusion detection system design utilizing machine learning (ML) techniques, which primarily use unsupervised or supervised techniques to identify typical patterns to improve IDS performance (Liu et al., 2021; Sakr et al., 2019). The most popular ML-based methods are NB, DT, Gaussian mixture models, SVM, RF, KNN, and PCA. Using a relatively big dataset with an ML approach is time-consuming (Mebawondu et al., 2020), but successful ML algorithms only require a limited amount of input data. Large-scale data are also unsuitable for ML algorithms to handle various categorization tasks effectively because of their high-dimensional and nonlinear properties (Subbarayalu et al., 2019; Lee et al., 2019; Musafer et al., 2020).

* Corresponding author.
E-mail addresses: saichaitanyakumar@mictech.ac.in (G. Sai Chaitanya Kumar), rkk.cs@kru.ac.in (R. Kiran Kumar), kpvk@vrsiddhartha.ac.in (K. Parish Venkata
Kumar), nallagatlaraghavendra@kluniversity.in (N. Raghavendra Sai), brahmaiahm@rvrjc.ac.in (M. Brahmaiah).

https://doi.org/10.1016/j.eswa.2023.121912
Received 9 March 2023; Received in revised form 18 September 2023; Accepted 27 September 2023
Available online 4 October 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

Table 1
Overview of Prior Research.

References | Approaches | Advantages | Disadvantages
Otoum et al. (21) | CNN-LSTM | This technique is easier, faster, and less complicated than other ones. | Attack data is in a small ratio.
Li et al. (22) | AE-Random forest algorithm | This approach provides higher accuracy in prediction. | A high-dimensionality problem occurs.
Devan, P., and Khare, N. (23) | XGBoost-DNN | It is simple to use and understand. | Cannot identify other assaults.
Riyaz, B., and Ganapathy, S. (24) | CNN | Both binary and multi-class data sets were masterfully executed. | The number of rules can only be so many, and accurate detection takes more training time.
Khan, M. A. (25) | CRNN | The performance of the IDS was improved when compared with existing ones. | Due to the blending of anomalies and signatures, complexity has risen.
Ramaiah et al. (26) | DNN | Minimizes the production of duplicate data. | Higher-level users utilize more resources.
Imrana et al. (27) | BiLSTM | Increased accuracy level. | Higher computational complexity.

Thus, the feature selection utility, regarded as a crucial criterion of an ML algorithm, is unavoidable to exclude unnecessary features from the input data and enhance the learning process (Sohi et al., 2021; Jiang et al., 2020). To solve the abovementioned issues, we propose a Deep Residual Convolutional Neural Network (DCRNN) to improve network security through IDS, optimized by the Gazelle-based Optimization Algorithm (GOA). The relevant features are selected with the help of a novel optimization algorithm. The proposed approach improves performance effectively and also reduces the computational complexity. This article has the following format. A summary of the previous papers is provided in the second part. The third part describes the methodology. In Part 4, the results and discussion are given, along with an examination of all experimental data. Section 5 brings the article to a close.

1.1. Novelty of this research

The key contributions are:

• Data normalization improves the model's precision and convergence rate. The class labels of the three datasets were numeralized using one-hot encoding in the preprocessing stage.
• A Novel Binary Grasshopper Optimization Algorithm (NBGOA) selects the essential features. It eliminates irrelevant elements from the network data used in the classification process.
• A Deep Residual Convolutional Neural Network (DRCNN) was employed as the actual IDS classification model, which is fine-tuned using the Improved Gazelle Optimization Algorithm (IGOA).
• IGOA optimizes the DCRNN hyperparameters, accelerating feature learning and significantly enhancing performance.
• Various tests have been performed on the UNSW-NB15, CICDDoS2019, and CICIDS2017 datasets. The experimental results show that the effectiveness of our proposed network exceeds all existing approaches.
• The proposed approach was compared with several deep learning and machine learning methods. The developed technique performed better than the prior techniques per the evaluation outcomes.

2. Related works

IDSs have been used in different research because they are critical in safeguarding computer networks against virtual threats. This section outlines the ML and DL methodologies utilized in prior studies for IDS designs.

Otoum et al. (2022) introduced a DL-based IDS that extracts spatial and temporal information from network traffic data and provides a good IDS using an ensemble network of CNN and LSTM. The DL-based intrusion detection system employed a category weight optimization strategy to enhance robustness by reducing the impact that an imbalanced number of samples from various attack kinds in the training set has on model effectiveness. Compared to the previous method, the results showed that the given methodology outperformed it.

Li et al. (2020) developed a practical DL approach, the Auto-Encoder Intrusion Detection System (AE-IDS), based on the RF method to enhance classification accuracy and reduce training time. This strategy groups and selects features to generate the training set. After training, the model can anticipate the outcomes using an AE, drastically boosting prediction accuracy while reducing detection time. The outcomes show that the suggested technique performs better than conventional machine learning-based intrusion detection methods in terms of simplicity of training, flexibility, and detection rate.

Devan et al. (2020) developed the XGBoost-DNN (deep neural network) model, which employs the XGBoost approach for selecting the features, followed by a DNN for categorization in the NIDS. XGBoost deals with regularization and aids in the prevention of overfitting concerns. The Adam optimizer maximized the learning rate during DNN training, while the softmax classifier was utilized to classify network intrusions. The observed data show that the DL approach outperforms existing methods with a constant level of 97 % categorization accuracy.

For recognizing intruders in wireless networks, a new intrusion detection system was created by Riyaz et al. (2020) to ensure data communication security. The convolutional neural network (CNN) provides a new method for selecting features, termed the conditional random field and linear correlation coefficient, to determine and categorize the essential properties. Trials were carried out to assess the proposed intrusion detection system's ability to identify intrusions with greater accuracy.

A CRNN (convolutional neural network (CNN) - recurrent neural network (RNN)) was utilized for an IDS that anticipates and analyzes aggressive network cyberattacks, according to Khan et al. (2021). The convolutional neural network in the HCRNNIDS conducts convolution to capture local information, while the recurrent neural network captures temporal features to enhance the ID system's efficacy and prognosis.

Deep neural networks were recommended by Ramaiah et al. (2021) for the IDS classifier. The developed intrusion detection approach employs a correlation tool and the RF method to identify the most critical independent parameters. An assault classifier based on a new neural network is developed. Enhanced neural-based classifiers and shallow neural networks are presented to identify malicious attacks. According to the experimental findings, the developed IDS performs better in quantitative measures.

A Bidirectional Long Short-Term Memory (BiLSTM) based intrusion detection system was introduced by Imrana et al. (2021), which proposes a model that can use labeled data to describe whether a dataset is normal or an attack and then use that information to classify unknown data accurately. The suggested strategy performed well and produced correct results. Related prior work is summarized in Table 1.

Fig. 1. The architecture of the proposed methodology.

3. Proposed methodology

The intrusion detection system under consideration consists of two stages. The first stage is choosing the best features, and the second is categorization. In the second stage, a deep residual neural network is utilized to detect the network intrusion, while the Novel Binary Grasshopper Optimization Algorithm (NBGOA) is employed in the first stage to choose the best features. The proposed work adopts a novel technique instead of the conventional machine learning-based categorization procedure because it offers additional benefits. Through effective calculation, the proposed solution effectively detects intrusions and lowers the network's energy consumption. An overview of the suggested intrusion detection model is shown in Fig. 1. One-hot encoding is carried out on the data that needs to be cleaned during data preprocessing. The next step is data normalization, followed by selecting the most effective features. The best features are used to train the network. The network model then presents the findings after using the test data to find network intrusions.

3.1. Problem statement

This article addresses some significant research challenges that demand attentive consideration.

One of the significant issues in creating an intrusion detection system is the usage of a suitable dataset. The NSL-KDD dataset, which consists of dated traffic, does not accurately reflect current attack scenarios and lacks real-time features, yet it is the primary source of data used by current IDS systems, which makes them unreliable in terms of performance outcomes. By evaluating more recent datasets, such as the UNSW-NB15 dataset, the CICIDS-2017 dataset, Mawilab, and the IoT-23 dataset, it is possible to get traffic from simulated environments and resolve this problem. An IDS relies on machine learning methods that fit the data. Although the data was gathered from a single network, an IDS must be established on various networks with comparable accuracy. Another challenging task for an IDS is to find attacks concealed by evasion strategies. There is still room for more research into how resilient IDSs are to different evasion strategies.

3.2. Data preprocessing

Dataset processing is crucial for all the datasets utilized in this work. Here, one-hot encoding, data normalization, and data cleaning are carried out.

3.2.1. Data cleaning
We examined the entire dataset to manage missing or corrupted data. To do this, we first determined whether any instances had missing data and which entries contained invalid values, such as -inf, +inf, nan, etc. Specimens with erroneous or incomplete entries were discarded because the dataset comprised a substantial volume of data.

3.2.2. One-hot encoding
The second stage involves converting each symbolic value to a numerical value before normalization. All of the datasets contain symbolic features, such as protocol features (e.g., User Datagram Protocol (UDP), Transmission Control Protocol (TCP)), service features (e.g., Terminal Network (telnet), File Transfer Protocol (FTP)), and flag features that conflict with the categorization approach. Since a numeric value must define all classes, it is crucial to make sure that all symbols are mapped to sets of numbers.

3.2.3. Data normalization
Normalizing causes the numerical values to fall within the same range. Before using the dataset in the training method, the dataset's properties must be standardized to normalize the numeric values. This aims to provide the attribute values with regular semantics. The values of attribute x must be transformed to the new range between x_min and x_max according to Eq. (1):

x_{new} = \frac{x_{current} - x_{min}}{x_{max} - x_{min}}    (1)

where x_{current} represents the starting value and x_{new} is the normalized value. The minimum and maximum values of the data set are denoted by min and max, respectively, and the normalized values range between 0 and 1. The data in the training part is normalized using the min and max values derived for each column.
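To make this preprocessing stage concrete, the following minimal Python sketch chains the cleaning, one-hot encoding, and Eq. (1) min-max steps with pandas. The symbolic column names (proto, service, flag) are illustrative assumptions, not the exact field names of every dataset used here.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, symbolic_cols=("proto", "service", "flag")) -> pd.DataFrame:
    # Data cleaning (Section 3.2.1): treat -inf/+inf as missing and drop such rows.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()

    # One-hot encoding (Section 3.2.2): map symbolic features to 0/1 indicator columns.
    df = pd.get_dummies(df, columns=[c for c in symbolic_cols if c in df.columns])

    # Min-max normalization per Eq. (1): x_new = (x - x_min) / (x_max - x_min),
    # using the min and max derived for each numeric column.
    num = df.select_dtypes(include=[np.number])
    span = (num.max() - num.min()).replace(0, 1.0)  # guard against constant columns
    df[num.columns] = (num - num.min()) / span
    return df
```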

3.3. Feature selection

After pre-processing the data, the relevant features for the NIDS are chosen using the second percentile method and a recursive feature removal technique. This reduces the number of dimensions in the dataset, which lowers processing and memory usage and facilitates understanding and analysis of the data. The computational complexity of the system is significantly lowered by choosing the best features. Features are chosen using the Novel Binary Grasshopper Optimization Algorithm (NBGOA).

The Grasshopper Optimization Algorithm (GOA), a population-based evolutionary search methodology, was first implemented as an optimization method to deal with continuous problems. The setting of many optimization issues, including feature selection, is a binary space. As a result, we were inspired to suggest a binary version of the Grasshopper Optimization Algorithm, because binary optimization issues can only have solutions that fall into the binary 0 and 1 range. Continuous space is binarized when converted into binary space, where continuous values are represented by 0 or 1.

The first technique produces binary variants without altering the structural components of the continuous methods. Various consolidation techniques must be used to convert the position vectors from real values to binary values at each iteration to produce the solution vector. The modified position equation, significant value priority, angle modulation, and transfer functions, which take an actual value as input and output a value between 0 and 1, determine the likelihood of shifting positions.

The basic updating formula is rewritten employing binary vectors and modifying the operators so that they can only produce binary values. In the second technique, the continuous approach's structure is updated. In this method, the location vectors are created as binary vectors. The literature uses a variety of operators, one of which is the Hamming distance between two binary strings, which is employed to depict the difference between two vectors as a distance. A crossover operation is used among the vectors to place the extension operator between two vectors. The specifics of our proposed Novel Binary Grasshopper Optimization Algorithm technique for selecting the features are presented in the following subsections.

3.3.1. Initial population
Initializing the population is the first stage in creating a population-based optimization method. In the proposed method, each grasshopper i in the swarm is given a random initialization by a binary vector X_i (i = 1, 2, ..., N), where the swarm size is denoted as N. With a likelihood of 0.5, a binary value of 0 or 1 is allocated to every dimension:

X_{ij}(0) = \begin{cases} 1, & \text{if } Rand() > 0.5 \\ 0, & \text{else} \end{cases}    (2)

Rand() produces a random value in the range (0, 1). X_{ij}(0) represents the starting value of the jth element of the ith grasshopper.

3.3.2. Position updates
The grasshoppers migrate toward the destination during the search phase because that is the best spot the swarm has found. The swarm starts with a population of random solutions and updates every particle's position in the continuous GOA under Eq. (3) to find the optimal solution. This is considerably different from updating a search in binary space. The value in Equation (4) is added to the corresponding vector bit in the target, represented as T_d, to allow grasshoppers in continuous search to update their locations. Since grasshopper location vectors can only include 0 or 1, they cannot be updated directly in a binary search space. The grasshopper position is updated as:

X_i^d - T_d = c \left( \sum_{j=1, j \neq i}^{N} c \, \frac{ub_d - lb_d}{2} \, S\!\left(|X_j^d - X_i^d|\right) \frac{X_j - X_i}{d_{ij}} \right)    (3)

Since X_i^d and T_d are binary bits in the binary version of the algorithm, Equation (3) denotes a distance between these two binary bits. To calculate this distance, we utilize the Hamming distance.

3.3.3. Hamming distance
When two grasshoppers are provided as binary strings, their distance is described as the Hamming difference between the two binary strings that correspond to them. The Hamming distance between two strings is calculated using the total number of distinct bits. For instance, grasshopper i is denoted as G_i = (X_i^1, X_i^2, ..., X_i^R), with X_i^j ∈ {0, 1}, and it has R features.

|X_j - X_i| = r_H(X_j, X_i) = \sum_{k=1}^{R} \left[ X_k^i \neq X_k^j \right]    (4)

where r_H(X_i^r, X_j^r) is the per-bit distance, computed in Equation (5):

r_H(X_i^r, X_j^r) = \begin{cases} 0 & \text{if } X_i^r = X_j^r \\ 1 & \text{if } X_i^r \neq X_j^r \end{cases}    (5)

This is the basic model for using the transformation method:

|X_i^d - T_d| = F(Dist)    (6)

where |X_i^d - T_d| = d_H(X_i^d, T_d) is the distance between the bits X_i^d and T_d, and Dist is computed as:

Dist = c \left( \sum_{j=1, j \neq i}^{N} c \, \frac{ub_d - lb_d}{2} \, S\!\left(|X_j^d - X_i^d|\right) \frac{X_j - X_i}{d_{ij}} \right)    (7)

In the binary version, ub_d in any dimension is 1 and lb_d is 0, so the term (ub_d − lb_d)/2 in Eqn (7) equals (1 − 0)/2 = 0.5:

\frac{ub_d - lb_d}{2} = 0.5    (8)

3.3.4. Fitness function
The performance of the classifier and the number of chosen features are taken into account by the fitness function. It reduces the size of the chosen feature subset while maximizing categorization accuracy. As a result, as indicated in Equation (9), the following fitness function is employed to assess each candidate solution:

Fitness = \alpha \cdot ErrorRate + (1 - \alpha) \cdot \frac{\#SF}{\#All\_F}    (9)

ErrorRate is the categorization error rate utilizing the chosen features. The error rate ranges from 0 to 1 and is determined by dividing the number of incorrectly classified data points by the total number of categorizations. #SF stands for the quantity of chosen features, and #All_F for all the characteristics in the initial dataset. α is employed to regulate the weights given to subset length and categorization quality; α is configured to 0.9 in our tests.

3.3.5. Principles of the NBGOA
The Novel Binary Grasshopper Optimization Algorithm randomly generates the primary population of N grasshoppers. The fitness function assesses each solution, which might include a subset of features. It relies on the solution's accuracy as evaluated by the proposed approach and on the number of selected characteristics, as in Equation (9). The process continues after initiation until a predetermined stopping criterion is met. According to Equations (4), (7), and (8), grasshoppers update their positions dependent on all other grasshoppers in every cycle. The best target location found thus far is updated after every iteration.
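As a rough illustration of the NBGOA building blocks, the sketch below implements the random binary initialization of Eq. (2), the Hamming distance of Eqs. (4)-(5), and the fitness of Eq. (9). The classifier error rate is passed in as a plain number, standing in for whatever wrapper classifier evaluates a given feature subset.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_population(n_agents: int, n_features: int) -> np.ndarray:
    # Eq. (2): each bit is set to 1 with probability 0.5.
    return (rng.random((n_agents, n_features)) > 0.5).astype(np.int8)

def hamming_distance(x: np.ndarray, y: np.ndarray) -> int:
    # Eqs. (4)-(5): count the positions where the two binary strings differ.
    return int(np.sum(x != y))

def fitness(mask: np.ndarray, error_rate: float, alpha: float = 0.9) -> float:
    # Eq. (9): weight classification error against the selected-subset size,
    # with alpha = 0.9 as configured in the paper's experiments.
    return alpha * error_rate + (1 - alpha) * mask.sum() / mask.size

pop = init_population(n_agents=30, n_features=84)
print(hamming_distance(pop[0], pop[1]), fitness(pop[0], error_rate=0.05))
```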

3.4. Classification

A new classifier called the Deep Residual Convolutional Neural Network (DRCNN), optimized using an Improved Gazelle Optimization Algorithm (IGOA), has been developed to produce outcomes with more accuracy and attain the best performance of the IDS.

3.4.1. Deep residual convolutional neural network (DRCNN)
We employ a deep residual convolutional neural network inspired by ResNet for attack categorization. The DRCNN contains 40 convolutional layers and is mainly made up of three sections.

Learning the characteristics of the attack is the first part. It comprises a rectified linear unit (ReLU) and a convolutional layer, which includes 64 filters with a 3 × 3 filter size. The second section comprises 19 residual blocks with two convolutional layers and a Rectified Linear Unit (ReLU) layer for nonlinear mapping. 64 filters of size 3 × 3 make up every convolutional layer. A convolutional layer with a 3 × 3 filter makes up the third component. The residual block serves as the foundation of our network. Our network structure is comparable to ResNet and performs well in computer vision challenges. The neural network is now simpler because we have eliminated the Batch Normalization (BN) layer from every layer. Two convolutional layers that are skip-linked comprise the majority of the residual block. Every convolutional layer works directly with a ReLU layer to perform the nonlinear mapping.

3.4.2. Convolutional neural network
Although it has the same input, hidden, and output layers as other networks, a CNN is more adept at identifying patterns in the data and correlations between local values. In contrast to other networks, it uses square matrices as input data, and its hidden layers combine convolutional and pooling layers. The training data is an a × a square matrix denoted as:

x^{(i)} = \begin{bmatrix} x_{11}^{(i)} & x_{12}^{(i)} & \cdots & x_{1a}^{(i)} \\ x_{21}^{(i)} & x_{22}^{(i)} & \cdots & x_{2a}^{(i)} \\ \cdots & \cdots & \cdots & \cdots \\ x_{a1}^{(i)} & x_{a2}^{(i)} & \cdots & x_{aa}^{(i)} \end{bmatrix}    (10)

A filter is a square matrix with dimensions b × b (b < a) that serves as the convolutional kernel. There are numerous filters in each convolutional layer. The weight of the jth filter is represented as:

w^{(j)} = \begin{bmatrix} w_{11}^{(j)} & w_{12}^{(j)} & \cdots & w_{1b}^{(j)} \\ w_{21}^{(j)} & w_{22}^{(j)} & \cdots & w_{2b}^{(j)} \\ \cdots & \cdots & \cdots & \cdots \\ w_{b1}^{(j)} & w_{b2}^{(j)} & \cdots & w_{bb}^{(j)} \end{bmatrix}    (11)

The CNN feature representation functions as follows. The filters first perform convolution in the sample matrix's upper left corner. The filters then continue to slide with a steady stride while performing convolution until the entire matrix has been traversed. Ultimately, the filters produce new matrices that make up the hidden data in this manner. The processing of the intricate structure of multivariable samples can be avoided via this local connection. It emphasizes the relationships among many variables.

Pooling layers decrease the number of features, whereas convolutional layers extract features. With the most used pooling technique, average pooling, a new small matrix is created by averaging values from the original matrix. By reducing characteristics and the impact of tiny alterations, pooling layers ease the burden on classifiers. The CNN uses a softmax function in the output layer, with m feature representations S = [s^{(1)}, s^{(2)}, s^{(3)}, ..., s^{(m)}] and labels D = [d^{(1)}, d^{(2)}, d^{(3)}, ..., d^{(m)}], as its loss function. The hypothesis function is

h_{\theta}(s) = \frac{1}{1 + \exp(-\theta^{T} f)}    (12)

where θ (θ ∈ R^n) is the parameter model of the softmax function. θ is trained to reduce the cost function

J(\theta) = -\frac{1}{m} \sum_{k=1}^{m} \left[ y^{(k)} \log h_{\theta}(f^{(k)}) + (1 - y^{(k)}) \log\!\left(1 - h_{\theta}(f^{(k)})\right) \right]    (13)

Softmax minimizes the loss function, guaranteeing that the predictive probability density belongs to the correct class. Softmax assigns a value to each class it can identify; these values add up to 1. ReLU and leaky ReLU are the most often utilized activation functions in CNN layers. The momentum algorithm, an upgraded version of the gradient descent process, updates the model parameters. Continuous convolution and pooling are used in the CNN to represent features step by step. The optimization and improvement of the feature representation are ongoing, moving from local, tiny correlations to comprehensive relationships.

3.4.3. Residual convolutional neural network
The depth of a DL network plays a significant role in the outcome of categorization and detection. The DL model's effectiveness will suffer if the depth is increased beyond a certain point. One of the causes is that the issue of vanishing/exploding gradients becomes more evident the deeper the network design is, which makes network training more challenging. This network employs "shortcut connections" that skip over many network levels.

One-dimensional inputs are typically processed using 1D convolution. The one-dimensional residual block and its relationship to its input and output are described as follows:

l_j = ReLU\!\left(F(l_{j-1}, w_j) + l_{j-1}\right)    (14)

where l_{j−1} is the input vector and l_j is the output vector. A ReLU function activates every residual structure. The jth layer's weight parameter in the ResNet block is denoted by w_j. The function F includes the BN, ReLU, and 1D convolutional layers. When the dimensions differ, a projection W_s is applied on the shortcut:

l_j = ReLU\!\left(F(l_{j-1}, w_j) + W_s l_{j-1}\right)    (15)

The feature maps of the BP layer and the 1D convolutional layer are employed as residual information in the shortcut-connection branch to enhance the residual block features.

The architecture. We suggest a unique deep ResNet architecture to learn the features of attack types effectively. Due to the two leads' data, the input in the network layout contains two channels. A 1D convolution layer, a BP layer, a ReLU layer, and a max-pooling layer follow the input layer. In every residual block, the first convolution layer has a convolution kernel of size 1x1, the second 3x1, and the third 1x1. The benefit of utilizing a 1x1 convolution kernel is that it drastically reduces computation. Instead of a fully connected layer, we add an average pooling layer after the final residual block. Last but not least, the softmax layer is used to estimate every label's likelihood:

S_j = \frac{e^{a_j}}{\sum_{k=1}^{N} e^{a_k}}    (16)

where N represents the number of classes and a_j denotes the jth input of the softmax layer.
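A minimal TensorFlow/Keras sketch of this kind of 1D bottleneck residual network is given below, following the 1x1/3x1/1x1 kernel pattern and the 300x2 input of Table 2. The stage widths and strides are simplified, so this is an approximation of the layout rather than the authors' exact model.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    # Bottleneck block per Eqs. (14)-(15): 1x1 -> 3x1 -> 1x1 convolutions,
    # with a ReLU applied after adding the shortcut branch.
    y = layers.Conv1D(filters, 1, strides=stride, padding="same", activation="relu")(x)
    y = layers.Conv1D(filters, 3, padding="same", activation="relu")(y)
    y = layers.Conv1D(4 * filters, 1, padding="same")(y)
    shortcut = x
    if stride != 1 or x.shape[-1] != 4 * filters:
        # Projection shortcut: the W_s term in Eq. (15).
        shortcut = layers.Conv1D(4 * filters, 3, strides=stride, padding="same")(x)
    return layers.ReLU()(layers.add([y, shortcut]))

inputs = tf.keras.Input(shape=(300, 2))               # two-channel 1D input
x = layers.Conv1D(32, 5, strides=2, activation="relu")(inputs)  # 300 -> 148, as in Table 2
x = layers.MaxPooling1D(2)(x)                         # 148 -> 74
for f in (32, 64, 128, 256):                          # four residual stages
    x = residual_block(x, f, stride=2)
x = layers.GlobalAveragePooling1D()(x)                # average pooling instead of a dense layer
outputs = layers.Dense(5, activation="softmax")(x)    # Eq. (16)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy")  # Eq. (17)
```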

The cross-entropy loss function is represented as

L = -\sum_{k=1}^{N} y_k \log S_k    (17)

where y_k represents the likelihood that the sample under test belongs to class k. It is crucial to keep the loss function from optimizing one category while suppressing other categories in light of the training set's clear class imbalance issue. We need to train the model to pay more attention to minor classes to improve the classification accuracy for such classes. The focal loss function concentrates on small samples. The definition of focal loss in our multi-label categorization is

FL_{loss} = -a_t (1 - p_t)^{\gamma} \log(p_t)    (18)

The modulating factor (1 − p_t)^γ lowers the loss contribution from simple samples. p_t is computed using

p_t = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}    (19)

where p ∈ (0, 1) denotes the category prediction likelihood, the label value is represented as y, and a_t is a weighting factor that can scale the minor classes separately. Table 2 presents the details of the DCRNN.

3.5. Gazelle optimization algorithm

The GOA is introduced in this section.

3.5.1. Initialization
Gazelles (A) are used in the strategy, a population-based optimization approach, with search parameters that are randomly initialized. Equation (20) states that an n-by-d matrix of prospective solutions represents the search agents. The GOA uses the UB and LB constraints of the problem to calculate the population matrix's potential values stochastically.

A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,j} & \cdots & A_{1,d} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,j} & \cdots & A_{2,d} \\ \vdots & \vdots & & A_{i,j} & & \vdots \\ A_{n,1} & A_{n,2} & \cdots & \cdots & \cdots & A_{n,d} \end{bmatrix}    (20)

where the number of gazelles is represented by n, the dimension of the search space is represented by d, and A is the matrix of solution locations, in which every entry is produced by Equation (21):

A_{i,j} = rand \cdot (UB_j - LB_j) + LB_j    (21)

For every A_{i,j}, every iteration produces a candidate location, while rand denotes a random number.

3.5.2. The Lévy flight
The Lévy flight is represented as

L(x_j) \sim |x_j|^{1-\alpha}    (22)

where x_j represents the flight distance and α ∈ (1, 2] is the power-law exponent.

f_L(x; \alpha, \nu) = \frac{1}{\pi} \int_{0}^{\infty} \exp(-\nu q^{\alpha}) \cos(qx) \, dq    (23)

The algorithm used in this work produces a stable Lévy motion. The scale unit is indicated by ν, and α is the distribution index.

Levy(\alpha) = 0.05 \cdot \frac{x}{|y|^{1/\alpha}}    (24)

where x, y, and α are given below:

x = Normal(0, \sigma_x^2)    (25)

y = Normal(0, \sigma_y^2)    (26)

\sigma_x = \left[ \frac{\Gamma(1+\alpha)\sin(\pi\alpha/2)}{\Gamma\!\left(\frac{1+\alpha}{2}\right)\alpha \, 2^{(\alpha-1)/2}} \right]^{1/\alpha}    (27)

with σ_y = 1 and α = 1.5.

3.5.3. The basic Gazelle optimization algorithm
The newly created Gazelle Optimization Algorithm imitates how gazelles manage to survive. The optimal strategy involves grazing when no predator is present and running for cover when one is spotted. As a result, the described Gazelle Optimization Algorithm optimization process consists of two parts.

3.5.3.1. Exploitation. Utilizing Brownian motion, expressed by a uniform and controlled step, neighboring portions of the domain are effectively covered during this phase. Equation (28) gives the formal description of this behavior:

gazelle_{i+1} = gazelle_i + s \cdot R * S_B * (Elite - S_B * gazelle_i)    (28)

where gazelle_{i+1} is the result for the next iteration, gazelle_i is the result for the present iteration, the pace is depicted as s, R is a vector of uniform random numbers in (0, 1), and S_B is a vector of random numbers characterizing the Brownian motion.
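For illustration, the sketch below samples Lévy steps with the Mantegna-style recipe of Eqs. (24)-(27) and applies the Brownian exploitation update of Eq. (28). The pace value s and the use of standard-normal draws for the Brownian vector S_B are assumptions made for this sketch.

```python
import numpy as np
from math import gamma, pi, sin

rng = np.random.default_rng(seed=0)

def levy_steps(dim: int, alpha: float = 1.5, scale: float = 0.05) -> np.ndarray:
    # Eq. (27) with sigma_y = 1 and alpha = 1.5.
    sigma_x = (gamma(1 + alpha) * sin(pi * alpha / 2)
               / (gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    x = rng.normal(0.0, sigma_x, dim)              # Eq. (25)
    y = rng.normal(0.0, 1.0, dim)                  # Eq. (26)
    return scale * x / np.abs(y) ** (1 / alpha)    # Eq. (24)

def exploitation_step(gazelle: np.ndarray, elite: np.ndarray, s: float = 0.5) -> np.ndarray:
    # Eq. (28): Brownian-motion grazing around the elite solution.
    R = rng.random(gazelle.shape)                  # uniform numbers in (0, 1)
    SB = rng.normal(0.0, 1.0, gazelle.shape)       # Brownian step vector
    return gazelle + s * R * SB * (elite - SB * gazelle)
```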

Table 2
Architecture of DCRNN.

Layer | Type | Pool size | Activation | Stride | Kernel size | Output shape | No. of filters
0 | Input | – | – | – | – | 300x2 | –
1 | 1D convolution | – | ReLU | 2 | 5x1 | 148x32 | 32
2 | Batch normalization | – | – | – | – | 148x32 | –
3 | 1D max pooling | – | – | 2 | 5x1 | 74x32 | –
4-9 | 1D convolution in residual block 1 | – | ReLU | 1 | 1x1 | 74x32 | 32
 | | | | 1 | 3x1 | 74x32 | 32
 | | | | 1 | 1x1 | 74x128 | 128
 | 1D convolution in shortcut connection | – | ReLU | 2 | 3x1 | 74x128 | 128
10-15 | 1D convolution in residual block 2 | – | ReLU | 2 | 1x1 | 37x64 | 64
 | | | | 1 | 3x1 | 37x64 | 64
 | | | | 1 | 1x1 | 37x256 | 256
 | 1D convolution in shortcut connection | – | ReLU | 2 | 3x1 | 34x256 | 256
16-21 | 1D convolution in residual block 3 | – | ReLU | 2 | 1x1 | 19x128 | 128
 | | | | 1 | 3x1 | 19x128 | 128
 | | | | 1 | 1x1 | 19x128 | 512
 | 1D convolution in shortcut connection | – | ReLU | 2 | 3x1 | 37x256 | 256
22-27 | 1D convolution in residual block 4 | – | ReLU | 2 | 1x1 | 19x128 | 256
 | | | | 1 | 3x1 | 19x128 | 256
 | | | | 1 | 1x1 | 19x512 | 1024
 | 1D convolution in shortcut connection | – | ReLU | 2 | 3x1 | 19x512 | 1024
28 | 1D average pooling | 3 | – | – | – | 1x1024 | –
29 | Flatten | – | – | – | – | 1024 | –
30 | Fully connected | – | tanh | – | – | 40 | –
31 | Fully connected | – | softmax | – | – | 5 | –

3.5.3.2. Exploration. The exploration phase starts when a predator is spotted. It consists of a sequence of phases, and periodic large jumps are used in this algorithmic phase. This tactic has improved search functionality in the optimization literature. Runs show a sharp turn in the direction of travel, represented by μ. The gazelle shifts its direction every iteration, moving one way when the iteration number is even. We propose that the gazelle uses the Lévy flight to migrate because it reacts first. Equation (29) shows the mathematical formula for the gazelle's actions:

gazelle_{i+1} = gazelle_i + P \cdot \mu \cdot S_L * (Elite_i - S_L * gazelle_i)    (29)

where S_L is a vector of Lévy-distributed random numbers, and

CF = \left(1 - \frac{iter}{max\_iter}\right)^{2\frac{iter}{max\_iter}}    (30)

Even though Mongolian gazelles are not threatened, research on them indicated an annual survival rate of 0.66, equating to predators being effective in just 0.34 of cases. Predator success rates (PSRs) affect a gazelle's ability to escape. The effect of PSRs is modeled in Equations (31) and (32):

gazelle_{i+1} = \begin{cases} gazelle_i + CF\left[LB + S * (UB - LB) * U\right] & \text{if } r \leq PSRs \\ gazelle_i + (PSRs(1-r) + r)(gazelle_{r1} - gazelle_{r2}) & \text{else} \end{cases}    (31)

U = \begin{cases} 0 & \text{if } r < 0.34 \\ 1 & \text{otherwise} \end{cases}    (32)

3.5.4. Proposed IGOA
This section presents the suggested IGOA, highlighting its main functions and organizational structure. The proposed strategy uses three main methods: GOA, Orthogonal Learning (OL), and Rosenbrock's Direct Rotation (RDR), subject to three phases after a transition mechanism. According to an assumption (if rand < 0.2), the suggested IGOA changes the locations of the solutions. The search process is examined at the conclusion to see whether it should be stopped or continued. If this is the case, the OL's search operations will be conducted; if rand < 0.5, the RDR's search operations will be conducted.

The suggested methodology solves the difficulties of the conventional approaches. The GOA struggles with convergence rates and with an imbalance between the searching phases. The fact that there may be a limited range of potential solutions is one of the GOA's significant problems. The recommended solution has a suitable arrangement among the existing ways to address these problems and to handle clustering obstacles more efficiently.

Finally, we show how the problems were fixed. The imbalance between the search processes is addressed first. Due to this, the optimization process avoids getting stuck in a local search area and instead looks for the best solution. By applying numerous update strategies, the advised technique maintains the variety of the used solutions.

3.5.5. Computational complexity of the proposed approach
The entire computational complexity of the proposed IGOA is given by the initiation of the population, the evaluation of the solutions, and the updating of the populations. Assume that L(M) is the cost of initializing the solutions, where M refers to the total number of solutions employed. The updating of the solutions has a temporal complexity of O(T M) + O(T N Dim), where T is the maximum number of completed iterations and D is a measurement of the problem's size. The IGOA's computation time is described as follows:

L(IGOA) = M \times L(GOA) + L(OL) + L(RDR)    (33)

Three primary search operators determine the proposed method's time complexity. The complexity of these approaches is computed as follows:

L(LS) = L(M \times (max\_iter \times D + 1))    (34)

L(GOA) = L(M \times (max\_iter \times D + 1))    (35)

L(RDR) = L(M \times D)    (36)

As a result, the IGOA's overall time complexity is given as follows:

L(IGOA) = L(max\_iter \times M \times (D + 1) + (M \times D) + (M \times D))    (37)

Table 3
Details of parameters.

Parameter | Value
MLP layers | 3
Momentum | 0.9
Decay | 10^-5
MLP hidden nodes | 48
Learning rate | 0.01
RNN hidden units | 128
Batch size | 32
Epochs | 20

Table 4
Test environment.

Project | Environment
System | Python
Processor | Intel i5 2.60 GHz
Anaconda | 4.5.11
Python | 3.9
RAM | 16 GB
Backend | TensorFlow

Table 5
Data instances of every attack (UNSW-NB15).

Class | Train set size | Test set size
Normal | 1,553,132 | 443,755
Generic | 150,836 | 43,097
DoS | 11,449 | 3269
Fuzzers | 16,972 | 4849
Shellcode | 1057 | 303
Worms | 122 | 35
Backdoors | 1630 | 466
Analysis | 1874 | 535
Reconnaissance | 9791 | 2797
Exploits | 31,167 | 8906

Table 6
Data instances of every class (CICIDS2017).

Attack | Train set size | Test set size
BENIGN | 1,591,167 | 454,620
MSSQL | 15 | 4
Infiltration | 26 | 7
DoS Hulk | 161,751 | 46,215
Port Scan | 111,251 | 31,786
XSS | 457 | 130
DoS GoldenEye | 7205 | 2059
SSH | 4128 | 1179
DDoS | 89,618 | 25,606
Bot | 1376 | 393
DoS Slow HTTP Test | 3849 | 1100
DoS Slowloris | 4057 | 1159
Heartbleed | 8 | 2
Brute force | 1055 | 301
FTP | 5516 | 1588
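Complementing the exploitation sketch shown earlier, here is a hedged sketch of the exploration and predator-effect updates of Eqs. (29)-(32). The direction flip mu and the scalar P are illustrative values, and S_L would come from the Lévy sampler given above; per-dimension draws for U are an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
PSRS = 0.34  # predator success rate implied by the 0.66 survival figure above

def exploration_step(gazelle, elite, SL, mu=1.0, P=0.5):
    # Eq. (29): Levy-flight escape; SL is a vector of Levy steps.
    return gazelle + P * mu * SL * (elite - SL * gazelle)

def cf(iteration: int, max_iter: int) -> float:
    # Eq. (30): a factor that shrinks as the run progresses.
    return (1 - iteration / max_iter) ** (2 * iteration / max_iter)

def predator_effect(gazelle, population, lb, ub, CF):
    # Eqs. (31)-(32): a random relocation if the predator succeeds (r <= PSRs),
    # otherwise a move driven by two randomly chosen population members.
    r = rng.random()
    if r <= PSRS:
        U = (rng.random(gazelle.shape) >= 0.34).astype(float)  # Eq. (32)
        S = rng.random(gazelle.shape)
        return gazelle + CF * (lb + S * (ub - lb) * U)
    r1, r2 = rng.integers(0, len(population), size=2)
    return gazelle + (PSRS * (1 - r) + r) * (population[r1] - population[r2])
```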

L(IGOA) = L(max\_iter \times M \times (D + M))    (38)

4. Result and discussion

The experimental data and conclusions are discussed in this section. This research estimates the efficacy of the suggested approach using three datasets for validation. The data samples are divided into two groups: one serves as the training dataset and is used to build the classifier; in the second step, the classifier is assessed using the testing dataset. To assess the model's effectiveness, we ran two trials. The first experiment involves multi-class classification, and the second compares the model to current methods. We simultaneously designed contrast experiments on the other datasets to compare against state-of-the-art machine learning methodologies and deep learning approaches. The parameter setup is shown in Table 3.

4.1. Experimental setup

The findings were produced on the Python platform using a computer with 8 GB of RAM and an i5 CPU. The testing setting is displayed in Table 4.

4.2. Selection of dataset

Many Intrusion Detection Systems (IDSs) have been using the NSL-KDD dataset in recent years. Still, this dataset contains outdated traffic that does not accurately represent contemporary attack scenarios or traffic trends, and it lacks real-time features. The most current benchmark datasets, UNSW-NB15, CICIDS2017, and CICDDoS2019, are thus utilized in this work.

4.2.1. UNSW-NB15
The dataset comprises 2.54 million network packets and contains 9 distinct assault types. Attack traffic accounts for only 12.65 % of the entire dataset, compared to ordinary traffic, which comprises 87.35 % of the data set overall. Data distribution details for every attack are shown in Table 5.

4.2.2. CICIDS2017 dataset
The dataset contains more than 2,830,000 instances, of which attack traffic accounts for 19.70 % and benign traffic for 80.30 %. There are 14 different assault kinds and one standard class. 84 features were retrieved from the produced network traffic; the dataset's last column contains the multiclass label. The data distribution for every class is shown in Table 6.

4.2.3. CICDDoS2019 dataset
The CICDDoS2019 dataset, which has been extensively utilized for DDoS attack detection and categorization, is used in this study. The collection includes benign samples and many current, actual DDoS assault samples. Table 7 shows the total number of records included in CICDDoS2019.

There are 88 statistical features in every record in the dataset (e.g., source and destination IP addresses, timestamps, source and destination port numbers, the protocol used for the attack, and a label for the type of DDoS attack). While 7 DDoS attacks are present in the testing dataset, the training dataset comprises 12 separate DDoS attacks.

4.3. Performance metrics

We compute the accuracy, recall, specificity, and false alarm rate to assess the effectiveness of our model. The model's precision and F-score were also examined. Explanations of each of these metrics follow.

Accuracy (ACC): the proportion of correctly identified records among all traffic records:

ACC = \frac{TN + TP}{TN + FP + FN + TP}    (39)

Recall: the ratio of intrusion records accurately identified as such to all anomalies:

Recall = \frac{TP}{FN + TP}    (40)

Specificity: the proportion of normal records correctly identified as normal by the NIDS:

Specificity = \frac{TN}{FP + TN}    (41)

False Alarm Rate (FAR): the proportion of normal connections incorrectly flagged as intrusions:

FAR = \frac{FP}{TN + FP}    (42)

Precision: the proportion of actual attack records among the predicted attack records:

Precision = \frac{TP}{FP + TP}    (43)

F-Score: the harmonic mean of recall and precision:

F\text{-}Score = \frac{2}{precision^{-1} + TPR^{-1}}    (44)

4.3.1. Performance evaluation of the CICIDS 2017 dataset
Several experiments were conducted on the CICIDS 2017 dataset to estimate the effectiveness of the proposed approach. The multi-class categorization outcome is represented in Table 8.

Table 7
Training and testing set of the CICDDoS2019 dataset.

Dataset | Total | Benign | Malicious
Training | 50,063,112 | 56,863 | 50,006,249
Testing | 20,364,525 | 56,965 | 20,307,560

Table 8
Multi-class categorization of our method on the CICIDS2017 dataset.

Type of Attack | DCRNN (%): Accuracy TPR FPR F1-score | Optimized DCRNN (%): Accuracy TPR FPR F1-score
Normal | 99.17 99.56 0.89 99.64 | 99.57 99.86 0.31 99.64
DNS | 98.78 96.07 1.21 89.95 | 99.08 98.13 0.64 93.65
NTP | 99.53 98.05 2.34 97.22 | 99.63 98.65 1.02 97.99
NetBIOS | 99.29 96.89 0.93 98.62 | 99.41 97.99 0.54 99.62
SYN | 98.43 99.21 1.43 96.98 | 99.05 99.67 0.09 98.47
MSSQL | 99.02 95.67 0.97 99.06 | 99.17 97.80 0.21 99.87
UDP | 99.34 97.89 1.87 98.76 | 99.34 98.69 0.43 99.32
LDAP | 98.31 97.66 0.65 97.32 | 99.01 97.98 0.54 99.54
SNMP | 97.64 98.72 0.74 98.65 | 98.79 99.32 0.15 98.87
UDP-LAG | 99 97.87 0.63 98.09 | 99.37 98.17 0.28 99.02
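As a quick sanity check on Eqs. (39)-(44), the helper below computes every reported metric from the four confusion-matrix counts; the counts in the example call are made up purely for demonstration.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)                        # Eq. (43)
    recall = tp / (tp + fn)                           # Eq. (40), also the TPR/detection rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (39)
        "recall": recall,
        "specificity": tn / (tn + fp),                # Eq. (41)
        "far": fp / (tn + fp),                        # Eq. (42)
        "precision": precision,
        "f_score": 2 / (1 / precision + 1 / recall),  # Eq. (44)
    }

# Hypothetical counts: 990 attacks caught, 10 missed, 8 false alarms on 2000 normal flows.
print(detection_metrics(tp=990, tn=1992, fp=8, fn=10))
```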

Fig. 2. Multi-class classification on CICIDS2017 dataset (a) Evaluation of DCRNN approach (b) Evaluation of optimized DCRNN (c) Results of FAR on Optimized
DCRNN and without.


Table 8 shows that the suggested model's multi-class categorization effectiveness is greater and produces superior results for each attack class. All classes achieve accuracy rates of at least 99 %; these are the best values. However, performance on the DNS, SYN, LDAP, and SNMP attacks is lower compared to all other classes. Fig. 2 shows the comparison, and Fig. 3 shows the ROC curve.

Existing deep learning approaches on the CIC-IDS 2019 dataset, such as DBN, ANN, SAE, and RNN, are used for comparison with the proposed approach. When compared with the existing approaches, the proposed method outperforms them all; existing approaches like DBN and RNN need more attention. Fig. 4 and Table 9 present the comparison of similar approaches on the CIC-IDS 2019 dataset.

Fig. 3. ROC curve of the CIC-IDS 2019 dataset.

4.3.2. Performance evaluation of the UNSW-NB15 dataset

To analyze the efficiency of the suggested approach, several studies are conducted using the UNSW-NB15 dataset. The multi-class categorization outcome of the suggested method is shown in Table 10.

Table 10 shows that the suggested approach's multi-class categorization effectiveness is higher and produces better results for every attack class. All classes achieve accuracy rates of at least 99 %; these are the best values. However, performance on attacks like Fuzzers, Reconnaissance, Generic, DoS, and Exploits is lower compared to all other classes. Fig. 5 presents the performance on the UNSW-NB15 dataset, and the ROC curve is shown in Fig. 6.

Existing deep learning approaches on the UNSW-NB15 dataset, such as DBN, CNN, LSTM, and RNN, are used for comparison with the proposed approach. When compared with the existing approaches, the proposed method outperforms them all, achieving 99.06 % accuracy, a 98.99 % detection rate, and a FAR of 1.49. Fig. 7 and Table 11 present the comparison of similar approaches on the UNSW-NB15 dataset.

4.3.3. Performance evaluation of the CICDDoS2019 dataset

Several studies have been conducted on the CICDDoS2019 dataset to analyze the efficiency of the proposed approach. The multi-class categorization outcome of our method is given in Table 12.

Fig. 8 illustrates that the multi-class categorization effectiveness of the suggested strategy is better and delivers better results for all attack classes. All classes achieve accuracy rates of at least 99 %; these are the best values. However, the numbers for the MSSQL, WebDDoS, UDP, SNMP, and SSDP attacks are lower compared to all other classes. Fig. 9 shows the ROC curve.

Existing deep learning approaches on the CICDDoS2019 dataset, such as Bi-LSTM, CNN, MLP, and AE + MLP, are used for comparison with the proposed approach. When compared with the existing approaches, the proposed method outperforms them all, with 98.89 % precision, 99.32 % F1-score, 99.12 % accuracy, and 99.06 % recall. Fig. 10 and Table 13 present the comparison of similar approaches on the CICDDoS2019 dataset.

Table 9
Comparison of similar approaches using the CIC-IDS 2019 dataset.

Database | Approach | Accuracy | Detection rate | FAR
CIC-IDS2017 | ANN | 91.17 | 91.75 | 8.83
 | Autoencoder | 90.88 | 91.43 | 9.12
 | Sparse autoencoder | 99.06 | 98.76 | 4.12
 | DBN | 97.45 | – | 8.5
 | RNN | 96.08 | – | 9.2
 | Proposed | 99.56 | 99.07 | 0.78

Fig. 4. Existing approaches performance comparison on CIC-IDS 2019 dataset (a) performances of accuracy and DR (b) FAR comparison.


Table 10
Multi-class categorization of our method on the UNSW-NB15 dataset.

Attack | DCRNN (%): Accuracy TPR FPR F1-score | Optimized DCRNN (%): F1-score TPR FPR Accuracy
Normal | 99.03 98.96 1.21 97.64 | 99.71 99.36 0.61 99.76
Analysis | 98.43 97.43 0.81 96.54 | 99 98.76 0.49 99.24
Backdoors | 98.11 98.72 0.74 98.32 | 99.39 98.12 0.45 99.12
DoS | 96.74 97.34 0.78 98.76 | 99.17 96.34 0.78 99
Exploits | 98.04 99.08 1.08 95.17 | 98.16 99.18 0.48 99.13
Fuzzers | 97.67 96.67 0.94 98.90 | 99.47 98.27 0.63 99.72
Generic | 98.02 94.74 0.87 97.84 | 99.09 97.33 0.87 99.23
Reconnaissance | 97.98 97.32 0.75 97.04 | 99.23 98.12 0.71 99.05
Shellcode | 98.14 98.12 0.84 97.41 | 99.73 98.12 0.64 99.81
Worms | 98.29 97.21 0.59 98.08 | 99 97.56 0.59 99.16

Fig. 5. Multi-class classification on UNSW-NB15 dataset (a) Evaluation of DCRNN approach (b) Evaluation of optimized DCRNN (c) Results of FAR on Optimized
DCRNN and without.


Fig. 6. ROC curve for UNSW-NB15 dataset.

4.3.4. Overall effectiveness of the presented model

In terms of individual attack detection, the developed approach produces higher outcomes than prior deep and machine learning techniques according to the performance measures. The overall attacker-recognition results of the recommended model are summarised in Table 14. The existing machine learning approaches Multinomial NB, Random forest, J48, and Logistic regression are employed for comparison with the proposed approach. A comparison of the proposed approach with the existing machine learning approaches is shown in Fig. 11.

Table 15 compares the accuracy, false alarm rate, and detection rate metrics with the most recent studies. The suggested scheme has the highest accuracy (99.17 %), the lowest false alarm rate (0.87), and the highest detection rate (99.8 %). The findings demonstrate that the suggested approach effectively raises the recognition rate compared to the most recent studies.

The processing time comparison is shown in Table 16. Compared with the existing approaches, the proposed approach consumes less testing time.

Table 11
Comparison of similar approaches using the UNSW-NB15 dataset.

Database | Approach | Accuracy | Detection rate | FAR
UNSW-NB15 | Deep Belief Network (DBN) | 92.36 | – | 4.48
 | Recurrent Neural Network (RNN) | 95.36 | – | 7.65
 | IGWO-SVM | 89.66 | 94.56 | 30.15
 | CNN | 97.51 | 96.13 | 6.8
 | Long Short-Term Memory (LSTM) | 95.43 | 98.43 | 6.21
 | Proposed | 99.06 | 98.99 | 1.49

Fig. 7. Existing approaches performance comparison on UNSW NB15 dataset (a) performances of accuracy and DR (b) FAR comparison.


Table 12
Multi-class categorization on the CICDDoS2019 dataset.

Attack type | DCRNN (%): Accuracy TPR FPR F1-score | Optimized DCRNN (%): Accuracy TPR FPR F1-score
NTP | 98.71 95.78 1.26 97.78 | 99.09 98.74 0.84 98.56
DNS | 99.06 98.45 2.34 98.16 | 99.43 99.21 1.04 99.45
LDAP | 99.43 97.63 0.95 96.56 | 99.67 98.63 0.56 98.43
MSSQL | 97.92 99.02 1.84 97.15 | 98.99 99.44 0.67 98.64
NetBIOS | 99 99.17 0.67 98.13 | 99.34 99.56 0.32 99.05
SNMP | 96.65 98.71 0.73 96.15 | 99.03 99.64 0.54 98.74
SSDP | 97.73 96.74 1.52 98.56 | 98.65 98.76 1.09 99.03
UDP | 95.16 97.89 0.96 97.67 | 98.08 98.91 0.43 97.97
UDP-Lag | 98.23 98.74 0.64 98.15 | 98.87 99.12 0.41 98.74
WebDDoS | 95.64 99.06 0.88 99.06 | 98.08 99.43 0.34 99.31
SYN | 98.71 99.32 1.31 98.74 | 99.12 99.69 0.76 98.89
TFTP | 94.43 99.11 0.49 98.63 | 97.49 99.47 0.27 99.02

Fig. 8. Multi-class classification on Cicddos2019 dataset (a) Evaluation of DCRNN approach (b) Evaluation of optimized DCRNN (c) Results of FAR on Optimized
DCRNN and without.


Fig. 9. ROC curve of Cicddos2019 dataset.

Table 13
Comparison of similar approaches using the CICDDoS2019 dataset.

Method | Precision (%) | F1-Score (%) | Accuracy (%) | Recall (%)
Bi-LSTM | 97.93 | 98.18 | 99.84 | –
CNN | 93.3 | 92.8 | 95.4 | 92.4
Multilayer Perceptron (MLP) | 84.4 | 89 | 92.5 | 94.2
Autoencoder (AE) + MLP | 97.91 | 98.18 | 98.34 | 98.48
Proposed | 98.89 | 99.32 | 99.12 | 99.06

4.3.5. Performance of training and testing set

The classification accuracy and loss value of the IDS are plotted against the number of iterations in Fig. 12(a) and (b). As demonstrated by the figure, the technique employed in this paper results in a good convergence effect. The entire dataset was separated into a training phase and a testing phase: 25 percent of the data was used for testing and 75 percent for training in this investigation. During the training phase, the suggested method is trained for 200 iterations using the processed training set. The learning rate is set to 0.01.
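A minimal sketch of this training setup is given below, using synthetic stand-in data and a toy classifier. The 75/25 split and the learning rate of 0.01 follow Section 4.3.5, while momentum 0.9, batch size 32, and 20 epochs follow Table 3; the small dense model is only a placeholder for the optimized DRCNN.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; in the paper the inputs are the NBGOA-selected features.
X = np.random.rand(1000, 40).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 5, size=1000), num_classes=5)

# 75 % training / 25 % testing split, as described in Section 4.3.5.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = tf.keras.Sequential([                       # placeholder for the DRCNN
    tf.keras.layers.Dense(48, activation="relu", input_shape=(40,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32,
          validation_data=(X_test, y_test))
```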

5. Conclusion and future works

This work presents a model for intrusion detection based on ensemble learning. All types of attacks can be recognized using the provided strategy. With an ensemble paradigm and soft computing techniques in place of plain ANN and DL methodologies, high accuracy is achieved using fewer resources, less processing power, and a lower FAR. Data normalization improves both model precision and convergence speed. One-hot encoding is also used in the preprocessing phase for the class-label numeralization of the three datasets. The DRCNN then effectively categorizes various attack types using the compressed and reduced characteristics generated by the NBGOA model. Our proposed approach outperformed many comparable methods, with an accuracy of 99.17 %, a false alarm rate of 0.87, and a detection rate of 99.8 %, after being thoroughly and exhaustively evaluated against numerous subsets of massive assault data. Future research could incorporate a new clustering technique into the proposed models to further boost intrusion detection performance while focusing on overfitting and data sparsity issues.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 10. Differentiation of similar approaches using the Cicddos2019 dataset.

Table 14
Proposed model versus traditional machine learning algorithms.

Method | Accuracy (%) | TPR (%) | F1-score (%) | FPR (%)
J48 | 97.32 | 96.80 | 91.43 | 2.17
Multinomial NB | 72.52 | 78.20 | 52.06 | 33.16
Logistic regression | 97.68 | 94.96 | 90.55 | 1.47
Random forest | 96.08 | 95.47 | 76.71 | 3.30
Proposed | 99.04 | 98.67 | 98.89 | 0.40


Fig. 11. Comparison of proposed versus existing approaches (a) Evaluation of accuracy, F1-Score, and TPR (b) FPR comparison.

Table 15
Comparative investigation of the proposed and related work methods.

Method | Database | Accuracy | False alarm rate | Detection rate
CNN-LSTM (21) | NSL-KDD | 99.02 % | – | –
AE-Random forest algorithm (22) | CSE-CIC-IDS 2018 | – | – | –
XGBoost-DNN (23) | NSL-KDD | 97 % | – | –
CNN (24) | NSL-KDD | 98 % | – | –
CRNN (25) | CSE-CIC-IDS2018 | 97.75 % | 1.4 % | 97.75 %
DNN (26) | KDDCUP99 | 98 % | – | –
BiLSTM (27) | NSL-KDD | 94.26 % | 4.20 % | 98.54 %
Proposed | CSE-CIC-IDS2018, NSL-KDD, UNSW-2015 | 99.17 % | 0.87 | 99.08 %

Table 16
Summary of processing time comparison.

Approach | Processing time (s) | Testing time (s)
Feature grouping based on linear correlation coefficient (FGLCC)-CFA | 83.28 | 43.50
Cuttlefish algorithm (CFA) | 350.60 | 110.45
FGLCC | 78.37 | 38.21
Proposed | 47 | 23.01

Fig. 12. Training and testing evaluation of a proposed approach.


Data availability

Data will be made available on request.

Acknowledgments

We declare that this manuscript is original, has not been published before, and is not currently being considered for publication elsewhere.
Funding: Not applicable.
Availability of Data and Material: Not applicable.
Code Availability: Not applicable.
Authors' Contributions: The author confirms sole responsibility for the following: study conception and design, data collection, analysis and interpretation of results, and manuscript preparation.
Ethics Approval: This material is the author's original work, which has not been previously published elsewhere. The paper reflects the author's research and analysis in a truthful and complete manner.

References

Smys, S., Basar, A., & Wang, H. (2020). Hybrid intrusion detection system for the Internet of Things (IoT). Journal of ISMAC, 2(04), 190–199.
Aldweesh, A., Derhab, A., & Emam, A. Z. (2020). Deep learning approaches for anomaly-based intrusion detection systems: A survey, taxonomy, and open issues. Knowledge-Based Systems, 189, 105124.
Jin, D., Lu, Y., Qin, J., Cheng, Z., & Mao, Z. (2020). SwiftIDS: Real-time intrusion detection system based on LightGBM and parallel intrusion detection mechanism. Computers & Security, 97, 101984.
Almiani, M., AbuGhazleh, A., Al-Rahayfeh, A., Atiewi, S., & Razaque, A. (2020). Deep recurrent neural network for IoT intrusion detection system. Simulation Modelling Practice and Theory, 101, 102031.
Zhou, Y., Cheng, G., Jiang, S., & Dai, M. (2020). Building an efficient intrusion detection system based on feature selection and ensemble classifier. Computer Networks, 174, 107247.
Mulyanto, M., Faisal, M., Prakosa, S. W., & Leu, J. S. (2021). Effectiveness of focal loss for minority classification in network intrusion detection systems. Symmetry, 13(1), 4.
Khraisat, A., Gondal, I., Vamplew, P., Kamruzzaman, J., & Alazab, A. (2019). A novel ensemble of hybrid intrusion detection systems for detecting Internet of Things attacks. Electronics, 8(11), 1210.
Jan, S. U., Ahmed, S., Shakhov, V., & Koo, I. (2019). Toward a lightweight intrusion detection system for the Internet of Things. IEEE Access, 7, 42450–42471.
Pawlicki, M., Choraś, M., & Kozik, R. (2020). Defending network intrusion detection systems against adversarial evasion attacks. Future Generation Computer Systems, 110, 148–154.
Choi, H., Kim, M., Lee, G., & Kim, W. (2019). Unsupervised learning approach for network intrusion detection system using autoencoders. The Journal of Supercomputing, 75(9), 5597–5621.
Wang, H., Cao, Z., & Hong, B. (2020). A network intrusion detection system based on convolutional neural network. Journal of Intelligent & Fuzzy Systems, 38(6), 7623–7637.
Liu, J., Gao, Y., & Hu, F. (2021). A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM. Computers & Security, 106, 102289.
Sakr, M. M., Tawfeeq, M. A., & El-Sisi, A. B. (2019). Network intrusion detection system based PSO-SVM for cloud computing. International Journal of Computer Network and Information Security, 11(3), 22.
Mebawondu, J. O., Alowolodu, O. D., Mebawondu, J. O., & Adetunmbi, A. O. (2020). Network intrusion detection system using supervised learning paradigm. Scientific African, 9, e00497.
Subbarayalu, V., Surendiran, B., & Kumar, A. R. P. (2019). Hybrid network intrusion detection system for smart environments based on internet of things. The Computer Journal, 62(12), 1822–1839.
Musafer, H., Abuzneid, A., Faezipour, M., & Mahmood, A. (2020). An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems. Electronics, 9(2), 259.
Sohi, S. M., Seifert, J. P., & Ganji, F. (2021). RNNIDS: Enhancing network intrusion detection systems through deep learning. Computers & Security, 102, 102151.
Jiang, H., He, Z., Ye, G., & Zhang, H. (2020). Network intrusion detection based on PSO-XGBoost model. IEEE Access, 8, 58392–58401.
Otoum, Y., Liu, D., & Nayak, A. (2022). DL-IDS: A deep learning–based intrusion detection framework for securing IoT. Transactions on Emerging Telecommunications Technologies, 33(3), e3803.
Li, X., Chen, W., Zhang, Q., & Wu, L. (2020). Building auto-encoder intrusion detection system based on random forest feature selection. Computers & Security, 95, 101851.
Ramaiah, M., Chandrasekaran, V., Ravi, V., & Kumar, N. (2021). An intrusion detection system using optimized deep neural network architecture. Transactions on Emerging Telecommunications Technologies, 32(4), e4221.
Imrana, Y., Xiang, Y., Ali, L., & Abdul-Rauf, Z. (2021). A bidirectional LSTM deep learning approach for intrusion detection. Expert Systems with Applications, 185, 115524.

