Optimizing Anomaly-Based Attack Detection Using Classification Machine Learning
https://doi.org/10.1007/s00521-023-09309-y
ORIGINAL ARTICLE
Received: 10 April 2023 / Accepted: 16 November 2023 / Published online: 20 December 2023
The Author(s) 2023
Abstract
One of the significant aspects of our digital world is that data are everywhere and their volume keeps growing. At the same time, the number of cyberattacks aiming to seize this data and use it illegally is increasing at an exponential rate, and this is the challenge. Therefore, intrusion detection systems (IDS) have attracted considerable interest from researchers and industries. In this regard, machine learning (ML) techniques play a pivotal role because they shift the work of analyzing enormous amounts of data, finding patterns, classifying intrusions, and solving issues from humans to computers. This paper implements two separate classification layers of ML-based algorithms with the recently published NF-UQ-NIDS-v2 dataset, preprocessing two volumes of sample records (100 k and 10 million), utilizing MinMaxScaler and LabelEncoder, selecting the best features by recursive feature elimination, normalizing the data, and optimizing hyper-parameters for classical algorithms and neural networks. With the small dataset volume, the classical algorithms layer shows high detection accuracy for support vector (98.26%), decision tree (98.78%), random forest (99.07%), K-nearest neighbors (98.16%), CatBoost (99.04%), and gradient boosting (98.80%). In addition, the neural network layer, which uses deep learning, proved very powerful, particularly due to its ability to handle enormous amounts of data and detect hidden correlations and patterns; it achieved detection accuracies of 98.87% for long short-term memory and 98.56% for convolutional neural networks.
Keywords: Intrusion detection · Detection techniques and methodologies · Classical machine learning algorithms · Neural network and dataset
Hany Abdelghany Gouda
hany.abdelghany21@commerce.helwan.edu.eg
Mohamed Abdelslam Ahmed
dr.m_abdelsalam@commerce.helwan.edu.eg
Mohamed Ismail Roushdy
mohamed.roushdy@fue.edu.eg
1 Information Systems Department, Faculty of Commerce and Business Administration, Helwan University, Cairo, Egypt
2 Computer Science Department, Faculty of Computers and Information Technology, Future University in Egypt, Cairo, Egypt

1 Introduction

Globalization and constant change are currently the main drivers for information system security, resulting in a significant increase in security risks such as new attacks being detected daily, which can be disastrous [1, 2]. Most human behavior patterns that involve theft, misuse, deceit, and the making of mistaken suggestions are classified as intrusion and fraud [3]. According to a report by Cisco on annual internet usage, digital transformation and technological advancements have increased the frequency of cyberattacks and the total cost of data breaches. Furthermore, the nature of the threats is constantly changing [4]. Intrusion detection constitutes one of the most important methodologies for dealing with information security incidents; it is based on selections, deployments, and operations that cover the entire spectrum of mechanisms for signs of potential attacks [5–7]. However, not all vulnerabilities have always been considered equal, and thus constant improvements in detection must be made, according to the Edgescan Vulnerability Stats Report for 2022 [8].
Utilizing machine learning (ML) is one of the major research areas being studied currently to improve intrusion detection (ID). These ML approaches aim to overcome the
3240 Neural Computing and Applications (2024) 36:3239–3257
0,1 to avoid bias. Then, they used the XGBoost model to eliminate the lowest 20 features in the selection process with an accuracy score of 98.91%. In the classification stage, the authors used the deep neural network (DNN) algorithm and optimized it using Adam optimization to reduce issues related to mean and variance parameters. Finally, the authors achieved the highest accuracy of 97%.
Wei et al. [14] conducted research on two datasets, namely the UNSW-NB15 and NSL-KDD datasets. The UNSW-NB15 dataset consisted of 119,341 anomaly records related to nine classes of attacks, whereas the NSL-KDD dataset included 58,630 anomaly records indicating four types of attacks. The researchers aimed to address the challenge of feature selection in the data preparation stage, which has a direct impact on training and classification processes. To solve this issue, they proposed a multi-objective immune algorithm (MOIA). In the classification stage, a neural network algorithm was used to train and classify the dataset. The experimental results showed a detection accuracy of 79.81% for the first dataset with about 16 features, which excluded some common attacks. On the other hand, the second dataset achieved a higher detection accuracy of 99.47% with a total of 24 features.
Zhang et al. [15] conducted research on proactive intrusion detection on a server machine using the ADFA-LD dataset, which is based on the Linux core. The dataset was divided into more than two-thirds for training and the rest for testing, containing two types of data (normal and attacks). The authors analyzed and preprocessed the system call trace and trace length using N-gram and evaluated the significance sequence through term frequency and inverse document frequency (TF-IDF). Six classification models were used, and the best model with an accuracy of 97.05% was the multilayer perceptron (MLP) model with a k-fold size of 10.
Masser et al. [16] conducted a comparative study based on three key performance indicators (KPIs). First, an anomaly-based detection methodology was used. Second, the CICIDS2017 dataset was used, which includes two types of traffic (normal and attacks) with 14 sub-anomalies and was divided into three different ratios (4–6, 5–5, and 6–4). Third, seven supervised algorithms were utilized, including artificial neural network with solver = ADAM, decision tree using entropy, max depth = none, and weight = balanced, k-nearest neighbor with k = 1, Naive Bayes with default values, random forest using estimators = 100, weight = balanced, and max depth = none, and support vector machine with kernel = RBF, max iter = -1, and weight = balanced. In addition, three unsupervised algorithms were used, including convolutional neural network with n-estimators = 100, k-means with clusters = 4, max_iter = 300, and expectation–maximization, and self-organizing maps, both using the same model by default. The study showed that KNN, DT, and NB algorithms performed best in detecting attacks. However, they had limitations in recognizing unknown intrusions, and it was recommended to use multiple algorithms against multiple attacks.
Anwer et al. [17] recommend an anomaly-based NID framework. In the third section, they identified gaps in the previous literature, such as inefficiencies in fog-based attack detection, feature selection, inaccurate algorithms on small datasets, and rating of false positives and negatives. They used the NSL-KDD dataset, which was split into 80% for training and 20% for testing, with a scaling ratio of approximately 78:75. Three types of classification algorithms were utilized, including SVM with an accuracy of under 35%, GBDT with an accuracy of 78.01%, and RF, which was the most accurate with an accuracy of 85.34%.
Al-Bakaa and Al-Musawi [18] applied two different feature selection methods to the UNSW-NB15 dataset. The dataset had 49 features, and in the preprocessing stage, 35 features were selected by converting all values to numeric, scaling the features to a range of (0, 1) to avoid bias, and removing 14 noisy and insignificant features. Two types of classification algorithms, decision tree (DT) and random forest (RF), were used to classify the data. The results achieved 99.9% accuracy, but the weakness of the IDS efficiency in detecting multiple classes was also noted.
Singh and Kesswani [19] conducted a study on anomaly process detection across the IoT domain using architectural feature selection. They used the NSL-KDD dataset, which includes four types of attacks, and divided it into 4 million records for training and validation and 1 million records for testing. They applied scaling and reduction to eliminate irrelevant features and mitigate bias. They used the correlation coefficient to identify the factors influencing weight, and only included numerical values for low variance features. The feature_range attribute was set to (0, 1) to decrease data dimensionality. The k-fold attribute was set to 10 for the validation phase. However, the authors did not mention the classification algorithms used. The proposed method achieved a detection accuracy of 98.4%.
Ambavkar et al. [20] present a comprehensive analysis of seven different research topics conducted over six years that involve various classification algorithms, both individual and hybrid, that are relevant to intrusion detection systems. The analysis compares several factors such as algorithms used, datasets, accuracy, strengths, and weaknesses. The authors also propose several methods to identify the most effective features for enhancing the IDS efficiency. They prioritize detection accuracy and consider only results with an accuracy of 90% or higher, which are achieved by Bayesian-based algorithms, random forest
Table 1 Summary of the literature comparison (strengths and weaknesses)
Nos | Authors | Dataset (year, classes) | Accuracy | Algorithms | Strength points | Weak points
1 | This paper | NF-UQ-NIDS-v2 (Year 2021, Classes 20) | 98.26, 98.78, 99.07, 98.16, 99.04, 98.80, 98.87, 98.56% | SVC, DT, RFC, KNN, CatBoost, Gradient Boosting, LSTM, CNN | Utilizing two separate classification layers ML-based with the recently published NF-UQ-NIDS-v2 dataset, preprocessing two volumes of various sizes (100k and 10 million), and hyper-parameter optimization | Four classes with classical algorithms. In addition, two types in neural networks
2 | Meftah et al. [11] | UNSW-NB15 (Year 2015, Classes 9) | 86% | Log. Reg, Gradient Boosting, SVM, DT, and NB | Use one phase to select superlative features and the other to perform a binary classification to detect anomalous traffic | A limited volume of data, failed to train four classes of attacks, choices in cleansing, feature selection, scaling, and failed to train and detect four categories of attacks
3 | Alamiedy et al. [12] | NSL-KDD (Year 1999, Classes 4) | 93.64, 91.01% | Gray wolf optimization and SVM | Detect two classes DoS and Probe | Aging dataset, failed to train two classes of attacks, Within a small dataset volume, 57.72% for R2L and 53.7% for U2R
4 | Devan and Khare [13] | NSL-KDD (Year 1999, Classes 4) | 98.91, 97% | XGBoost-DNN | XGBoost model for feature selection | A limited volume of data from an aging dataset. Inappropriate for comparability whenever employing classical algorithms to evaluate experimental results that are on a different deep layer
5 | Wei et al. [14] | UNSW-NB15 (Year 2015, Classes 9); NSL-KDD (Year 1999, Classes 4) | 99.47% | NN, and multi-objective immune algorithm (MOIA) | improved multi-objective algorithm for optimize feature selection and network traffic issue | The experimental conditions and attack methods through the aging NSL-KDD dataset could not correctly represent the recent traffic. In addition, low accuracy achieved for the UNSW-NB15 dataset, which excluded some common attacks
6 | Zhang et al. [15] | ADFA-LD (Year 2017, Classes 6) | 97.05% | SVM, DT, RF, MLP, KNN and Multi-variable NB | Developing data preprocessing and system call analysis host-based | The loading process, extracting features, and prediction should all have low computation costs. With larger frames, it takes more time to extract the features
7 | Masser et al. [16] | CICIDS2017 (Year 2017, Classes 14) | Highest KNN, DT, then NB | ANN, DT, KNN, NB, RF, SVM, and CNN | comprehensively reviews previous studies, the KNN, DT, and NB models obtain the best results | limitations in recognizing unknown intrusions with ANN, RF, SVM, and CNN models
8 | Anwer et al. [17] | NSL-KDD (Year 1999, Classes 4) | 85.34% | SVM, GBDT, and RF | RF | Failed to train SVM with an accuracy of under 35% and GBDT with an accuracy of 78.01% due to an aging dataset. This attempt did not substantially fill the research gap
9 | Al-Bakaa and Al-Musawi [18] | UNSW-NB15 (Year 2015, Classes 9) | 99.96% | DT and RF | Forward selection ranking (FSR) and backward elimination ranking (BER) methods. Binary class | Detection delay issue and a multi-class scenario
10 | Singh and Kesswani [19] | NSL-KDD (Year 1999, Classes 4) | 98.4% | Proposed model | The mechanism utilizes correlation-based filtration and feature-based trust factors | Aging dataset. the authors did not mention the classification algorithms used
11 | Ambavkar et al. [20] | NSL-KDD (Year 1999, Classes 4) | 99.48, 99.87, 99.22% | Genetic Algorithm (GA), Bagged Classifier, and KNN, SVM, Breadth-Forest Tree (BFTree), NB, DT, RF, MLP, and LR | A comprehensive analysis, They prioritize to NB, RF, SVM and DT | More time was required to build the model. Additionally, U2L and R2L attacks were not effectively detected
(RF), support vector machine (SVM), and decision tree (DT).
The significance of this paper is that it contributes improved models to intrusion detection utilizing both classical and neural algorithms trained on a recently released dataset. Table 1 provides a summary of the literature comparison that highlights the strengths and weaknesses.

3 Intrusion detection

Anderson [21] made one of the earliest references to intrusion detection techniques in the 1980s, concerning the potential abuse and early detection of harmful activities. Seven years later, D. Denning proposed the concept of a real-time intrusion detection expert system as the foundation for what is now known as an IDS [22].
IDSs are pieces of software that monitor and examine the information system environment for malicious activity and are commonly used to detect and recognize attacks. Because of the constant and rapid improvements that characterize attacks in this domain, it is deemed important to prioritize innovative techniques and improve existing systems. The intention behind this section is to present widely used IDS technologies, methodologies, capabilities, and any remaining limitations [5, 23–28].

3.1 Types of ID technologies

Network-based ID (NIDS) is designed to monitor cloud, on-premises, and hybrid architectures for abnormal events that could imply a threat, with sensors placed strategically across the network. NIDS can recognize network intrusions in real time by analyzing data from all network traffic gathered, both incoming and outgoing, rather than from individual hosts. NIDS, on the other hand, has various issues, the most serious of which is extensibility, which means it cannot accommodate and process encrypted communication traffic.
Host-based ID (HIDS) is placed to monitor assets and inspects the integrity of the host itself by examining several running features, such as system files, system calls, services, configuration files, ports, audit logs, and abnormal activities. HIDS technologies typically require elevated privileges on the host to track the state of system layer files and execute responsive actions. For this reason, it is important that the IDS itself be secured and protected against attacks, as compromises could lead to unauthorized administrator level access. HIDS can monitor every single process, executables, kernel entries, memory utilization, and a variety of other features to report indicators of attacks.

3.2 Detection methodologies

The detection methodology is a strategy for monitoring, analyzing, determining occurrences and potential threats, and reporting actions. It comprises a management console and sensors, and each technique has its own set of pros and cons, which are presented individually under signature-based and anomaly-based.
• Signature-based: this detection method uses a list known as indicators of compromise (IOCs). It works by creating sequences and patterns of fingerprints of well-known attacks and then using them to identify intrusions when those match a particular signature, sometimes known as ‘misuse detection.’ It is simpler and more performant, and it provides accurate results from explicit knowledge of the attack, with few false positive alarms. The signature database faces failure against any
Popular sources of freely available machine learning datasets include Kaggle, the University of Queensland (Australia), the UCI Machine Learning Repository, and AWS. A number of machine learning datasets are also freely available to the public, e.g., NIDS, NF-UQ-NIDS-V2, UNSW-NB15, CICIDS 2017, ADFA-LD, NSL-KDD, KDD, and DARPA.
The University of Queensland in Australia and Kaggle have recently released the NF-UQ-NIDS-V2 dataset, which contains a wide range of variously generated, network-based anomaly attacks simulated in an environment containing events of traffic flow totaling nearly 76 million rows and 46 columns, roughly divided into a 1:3 ratio for normal to abnormal. The dataset CSV file size is 13.73 GB [32, 32].
In this work, 100 K records were imported and loaded from the master source file of the NF-UQ-NIDS-v2 dataset to train classical algorithms. For training neural network algorithms, the dataset was then increased until it had 10 million records. The following are the preparation stages (from 1 to 8) in which a data frame is analyzed, handled, and structured in a manner that maximizes accuracy and efficiency.

4.1 Libraries

The initial step imports the libraries used to perform specific functions, e.g., pandas to read CSV files, scikit-learn (sklearn) for missing values and scaling, NumPy to perform complex mathematical operations, Matplotlib to plot charts, and TensorFlow and Keras for neural networks.

4.2 Data load

Determine the path of the data source and read it with the specific number of rows required.

4.3 Descriptive analysis

A description of the 100 k records that were read from the data source to train classical algorithms shows the shape of the rows, columns, headers, and data types, and verifies that there are no duplicate rows or missing values, neither NaN nor NULL, in the whole data frame. Figure 1 lists the count values for the attack classes generated. There are two types of events totaling 100 k records: the first is normal with 32,986 records, and the second is anomalous with 67,014 records, containing 19 subsets. In preparation to train neural networks, these records were increased 100-fold, and the attacks reached 20 subsets.

4.4 Cleansing and imputer

This stage fixes the values flagged by the previous stage, if any exist: duplicate rows are dropped, and missing values are filled using a strategy such as mean, median, most frequent, or constant.

4.5 Scaling numerical values

Where features differ widely in their ranges and distributions across the data frame, handling and formulating the problem becomes difficult, and training the algorithm may fail. The preprocessing module of the sklearn library offers many techniques (e.g., StandardScaler, MinMaxScaler, MaxAbsScaler, Normalizer, Binarizer, PolynomialFeatures, FunctionTransformer). We utilize MinMaxScaler with the attributes copy = True and feature_range = (0, 1).

4.6 Encoding categorical attributes

The data frame is prepared for a machine learning algorithm that works best with mathematics and numbers. From this point on, categorical variables should be encoded into
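
The preparation stages of Sects. 4.2 and 4.4 to 4.6 (loading, cleansing, imputing, scaling, and encoding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper name `prepare`, the column names `IN_BYTES`, `OUT_BYTES`, and `Attack`, and the toy values are all hypothetical, and a small in-memory frame stands in for the CSV file that would be read with `pd.read_csv(path, nrows=...)`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def prepare(df: pd.DataFrame, label_col: str = "Attack"):
    """Clean, impute, scale, and encode a flow table (illustrative only)."""
    # 4.4 Cleansing: drop exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)

    numeric_cols = df.select_dtypes(include="number").columns.tolist()

    # 4.4 Imputer: fill missing numeric values; the strategy could also be
    # "median", "most_frequent", or "constant".
    imputer = SimpleImputer(strategy="mean")
    X = imputer.fit_transform(df[numeric_cols])

    # 4.5 Scaling: map every numeric feature into the range [0, 1].
    scaler = MinMaxScaler(feature_range=(0, 1), copy=True)
    X = scaler.fit_transform(X)

    # 4.6 Encoding: turn the categorical class names into integer codes.
    encoder = LabelEncoder()
    y = encoder.fit_transform(df[label_col])
    return X, y, encoder


# Hypothetical toy frame standing in for the data read in Sect. 4.2.
toy = pd.DataFrame({
    "IN_BYTES": [100.0, 250.0, np.nan, 100.0, 900.0],
    "OUT_BYTES": [10.0, 40.0, 20.0, 10.0, 80.0],
    "Attack": ["Benign", "DoS", "Benign", "Benign", "DDoS"],
})
X, y, encoder = prepare(toy)
print(X.shape, list(encoder.classes_), list(y))
```

In a real experiment the imputer, scaler, and encoder would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into preprocessing.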
5.1.1 Support vector classifier (SVC)

This module can handle and transform highly complex data with a minimal amount of training data; it essentially addresses the issue of nonlinear data separation through the kernel trick technique by using more than one hyperplane line, which is one of its main strengths. Utilizing the kernel attribute via the radial basis function (RBF), it can be expressed mathematically as in Eq. (1).

$$k(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right) \qquad (1)$$

where $\lVert x_1 - x_2 \rVert$ refers to the distance between points 1 and 2, and $\sigma$ refers to the variance and is our hyper-parameter.
We reformulate the two cases by mapping the datasets from lower-dimensional to higher-dimensional space to group each class together, as shown in Fig. 3.
In case one, as the figure shows, the datasets are mapped from one-dimensional to two-dimensional. One feature expresses the X-dimension, so the result is the green class sitting in the middle of the red class. For a nonlinearly separable issue like those classes, we must remap to a higher dimension (the Y-dimension), which means reformulating the same values in a different space by two equations, one to determine the red line and the other for the green line.
In case two, mapping the dataset from two-dimensional to three-dimensional raises the same issue as in case one but with different dimensions: the two classes are separated from each other by reformulating the two-dimensional (X, Y) into the three-dimensional (X, Y, Z) by three equations.

5.1.2 Decision tree classifier (DTC)

It is termed a decision tree since, similar to a hierarchy tree constructed to manage processes top-down, it starts at the root node with the most informative feature by asking a simple question with a conclusive answer (yes or no). The subtrees then grow downward by repeating this process, which creates two types of nodes. The first is the decision node, which contains branches, while the other is a leaf node, which does not have branches. Figure 4 demonstrates the hierarchy of decision tree construction.
We explain how a decision tree works through the most commonly used choice metrics, which can be mathematically expressed as in Eqs. (2), (3).

$$\mathrm{Entropy}(s) = -P_{+}\log(P_{+}) - P_{-}\log(P_{-}) \qquad (2)$$

where $s$ is the dataset, $P_{+}$ is the probability of yes, $P_{-}$ is the probability of no, and the base of the logarithm is the number of classes.
The result lies between 0 and 1 (zero refers to a superlative feature that directly affects the output, and one refers to a weak feature that does not have any effect on the outputs).

$$\mathrm{Gain} = E(s) - \left(\frac{C(G_1)}{C(s)}\,E(G_1) + \frac{C(G_2)}{C(s)}\,E(G_2)\right) \qquad (3)$$

where $E$ is the entropy, $s$ is the dataset, $C$ is the count, $G_1$ is group 1, and $G_2$ is group 2.
The result: select the group with the highest value.

5.1.3 Random forest classifier (RFC)

It was designed using the concept of parallel ensemble learning, which is a hierarchical combination of decision tree classifiers that uses a predefined likelihood to select the best feature. The random forest decides the final prediction result depending on the majority vote of the forecasts it receives from each tree. Consider Fig. 5.

5.1.4 K-Nearest neighbors (KNN)

It is one of the simplest types of classification algorithms in machine learning models. Figure 6 shows the method it utilizes when it is fed new data (the black triangle) to classify under a specific class (blue or red). It first fixes the K factor by assigning a specific number (k = 4), which sets the scope of nearby points it will work on. It then calculates the distance between the new data and all neighboring points, sorts them, takes the k nearest (3 blue and 1 red), and classifies the new data according to the majority of its nearest neighbors (the new data is classified under blue).
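
The KNN procedure described above (fix k, measure distances, take the k nearest points, vote) can be sketched in a few lines. This is an illustrative sketch rather than the implementation used in the experiments; the function name `knn_predict` and the six 2-D points are hypothetical, chosen so that three blue points and one red point are the four nearest neighbors, as in the description of Fig. 6.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=4):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(str(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical points: three "blue" and one "red" lie closest to the query.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.3], [1.6, 1.5],
                    [4.0, 4.0], [4.2, 3.9]])
y_train = np.array(["blue", "blue", "blue", "red", "red", "red"])
x_new = np.array([1.1, 1.1])

predicted = knn_predict(X_train, y_train, x_new, k=4)
print(predicted)
```

An even k such as 4 can produce ties (two blue, two red); in that case `Counter.most_common` returns the label it encountered first, so an odd k or a distance-weighted vote is a common alternative.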
The core idea depends on generating numerous weaker classifiers and integrating their prediction results to assemble one stringent classifier to achieve the highest accuracy. CatBoost classifier (CBC) and gradient boosting classifier (GBC) are two types of boosting algorithms.

5.2 Neural network algorithms

Neural network (NN) algorithms are characterized by mimicking the structure and function of the human cognitive system through convoluted correlations, similar to how a human would obtain conclusions. An NN trains on massive data and is able to process it in its original form, and when there is inadequate domain expertise for feature selection, it learns gradually and discovers exceptional features from the data.
Recurrent neural networks (RNNs) have been extended by the most powerful version that can buffer long-term dependencies in sequential data, called long short-term memory (LSTM). The LSTM structure involves a cell-chain; as shown in Fig. 7, each one of these cells has a memory and three fully connected gates (forget, input, and output), which control the information flow both inside and outside of the cell. Each LSTM cell’s output is fed into the next cell in the chain.
Here, Ct-1 is the hidden cell state of long-term memory, Ht-1 is the hidden state of the previous cell (short-term memory), and Xt is the current input. σ is the sigmoid activation function, with output value 0 (delete) or 1 (keep). tanh is the activation function with output values between -1 and 1. Ht1 is the current cell output.
The memory makes it easier to remember values of data over a random time interval from (Ct-1) to (Ct1). The forget gate determines which information the LSTM will proceed to keep and which to delete from the previous cell state (Ct-1) utilizing a sigmoid activation function (0 = delete, 1 = keep). It achieves this by assessing the new input data (Xt) and the data gained from the previous hidden state (Ht-1). The input gate, as a method to alter the state of memory, determines which value from the input should be utilized. The values to pass through are 0 or 1, depending on the sigmoid function. Secondly, the tanh activation function assigns weight to the values that are passed, determining their relevance on a scale between -1 and 1. The output gate determines the current hidden state: the newly updated cell state, passed through a tanh function, is multiplied by the output of a sigmoid function to give the hidden state of the current cell output (Ht1).

5.2.2 Convolutional neural network

A convolutional neural network (CNN) is broadly used for machine vision, and it was designed based on the human visual neuron simulator. Each layer of a CNN is only connected to a portion of the neurons, with the last layer at the tail of the network being fully connected. A CNN applies filters separately to each pixel in an image to learn the pattern features in detail. The CNN diagram is depicted in Fig. 8, where the network architecture is constructed from four layers (input, convolutional, pooling, and fully connected). As the CNN diagram shows, the input layer mainly feeds the dataset to the next layer. The kernel in the convolution layer works as a filter for the feature detector in the image. A pooling layer uses the downscaling technique to reduce the feature map and prevent over-fitting issues by summarizing via two common methods (average pooling and max pooling). Flattening then converts the full matrix of pooled feature maps directly into just one column before feeding that column to the fully connected layer of the neural network. In the fully connected layer, called the dense layer, all neurons have direct connectivity for transforming their output into any number of classes. Deep learning is composed of multiple layers of convolution and pooling. As we proceed more deeply within the CNN, it ultimately recognizes deeper characteristic features.
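
The four-layer flow described above (input, convolution, pooling, flattening into a fully connected layer) can be traced with a small NumPy forward pass. This is a sketch under stated assumptions and not the network trained in this work: the 6x6 input, the single 3x3 filter, the three output classes, the random weights, and the softmax on the dense output are all hypothetical choices made only to show the shapes at each step.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
image = rng.random((6, 6))            # input layer: one 6x6 single-channel input
kernel = rng.standard_normal((3, 3))  # convolutional layer: one 3x3 filter

feature_map = conv2d_valid(image, kernel)  # shape (4, 4)
pooled = max_pool2d(feature_map, size=2)   # shape (2, 2)
flat = pooled.reshape(-1)                  # flattening: shape (4,)

n_classes = 3                              # hypothetical number of classes
W = rng.standard_normal((n_classes, flat.size))
b = np.zeros(n_classes)
probs = softmax(W @ flat + b)              # fully connected (dense) layer
print(feature_map.shape, pooled.shape, flat.shape, probs.shape)
```

Replacing `.max(axis=(1, 3))` with `.mean(axis=(1, 3))` gives average pooling, the other summarizing method mentioned in the text.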
6 Experiment environment and results The SVC algorithm utilized hyper-parameter value sets
as shown in Table 3 and achieved (98.26%) for testing
The experiment addressed in this work conducted on model accuracy. Bot (100%) had the highest detection rate,
dataset, tools, libraries, modules, and classes, as indicated followed by three other types (Brute Force, DDoS, and
in Table 2. They are all powerful and agile open sources. Scanning) that obtained (99%), while there are four types
In terms of the performance measures shown in Fig. 9 (Shellcode, Theft, mitm, and ransomware) that have not
for validating and evaluating ML models, this work seeks been detected. al model.
to achieve high incident detection accuracy, essentially in The DT algorithm utilized hyper-parameter value sets as
the testing dataset. The model accuracy and classification shown in Table 4 and achieved 98.78% accuracy for the
report presented below are the main criteria to examine. In testing model. Bot (100%) had the highest detection rate,
addition, the difference in detection accuracy between followed by DDoS, DoS, and scanning as having the
training and testing for recognizing over-fitting and under- highest detection rate, while there are three types (Shell-
fitting issues must be noted. code, Theft, and mitm) that have not been detected due to
Also, six machine learning classification algorithms their small number.
configured with hyper-parameter values for each one sep- For the RFC algorithm, we utilized hyper-parameter
arately were trained, validated, tested, and evaluated in this value sets as depicted in Table 5 and achieved a 99.07%
environment, as shown in Tables 3, 4, 6, 7, 8, 9 and 10, which will be explained in the context of this section.

accuracy rate for the testing model. Bot and Scanning (100%) had the highest detection rate, followed by Brute Force, DDoS, and DoS. Four types (Shellcode, Theft, mitm, and Ransomware) were not detected because their total number of records did not exceed five.

With the KNN algorithm and the hyper-parameter values listed in Table 6, the testing model reached 98.16% accuracy. The detection rate was highest for Bot, DDoS, and Scanning. Four categories (Shellcode, Theft, mitm, and Ransomware) were not detected because their total number of records did not exceed five.

With the categorical boosting (CatBoost) algorithm and the hyper-parameter values listed in Table 7, the testing model reached 99.04% accuracy. Bot had the highest detection rate (100%), followed by Brute Force, DDoS, DoS, and Scanning. The same four anomaly types (Shellcode, Theft, mitm, and Ransomware) that the earlier models missed were again not detected, for the same reason.

With the gradient boosting algorithm and the hyper-parameter values listed in Table 8, the testing model reached 98.80% accuracy. Scanning had the highest detection rate (100%), followed by Bot, DDoS, and DoS. Five anomaly types (Fuzzers, Shellcode, Theft, mitm, and Ransomware) were not recognized. The difference in detection accuracy between testing and training is less than 0.33%, which is within the acceptable range for equity and quality.

The long short-term memory (LSTM) algorithm, with the hyper-parameter values shown in Table 9, achieved 98.87% accuracy for the testing model. Bot, Infiltration, and Scanning had the highest detection rate (100%), followed by DDoS and DoS, while two types (Analysis and Worms) were not detected.

The convolutional neural network (CNN) algorithm, with the hyper-parameter values shown in Table 10, achieved 98.56% accuracy for the testing model. Bot had the highest detection rate (100%), followed by Brute Force, DDoS, DoS, Infiltration, and Scanning, while two types (Analysis and Worms) were not detected.
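The pattern described above, where overall accuracy stays high while classes with very few records go undetected, can be illustrated with a minimal sketch. The labels below are hypothetical toy data rather than records from the dataset, and `per_class_recall` and `MIN_SUPPORT` are illustrative names, not part of the authors' pipeline.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: (recall, support)} computed from two label sequences."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: (hits[label] / n, n) for label, n in support.items()}


# Hypothetical toy labels (not the paper's data): two frequent classes and
# one class with only three records, all of which the model misses.
y_true = ["Benign"] * 50 + ["DDoS"] * 40 + ["Theft"] * 3
y_pred = ["Benign"] * 50 + ["DDoS"] * 39 + ["Benign"] * 1 + ["Benign"] * 3

MIN_SUPPORT = 5  # assumption: mirrors the "did not exceed five" remark in the text

report = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Classes that were never detected and also have very few records.
undetected_rare = [
    label
    for label, (recall, n) in report.items()
    if recall == 0.0 and n <= MIN_SUPPORT
]

for label, (recall, n) in sorted(report.items()):
    print(f"{label:8s} recall={recall:.2f} support={n}")
print(f"accuracy={accuracy:.4f}, undetected rare classes: {undetected_rare}")
```

In this toy case the overall accuracy remains above 95% even though the rare class has a recall of zero, which is why per-class detection rates are reported alongside accuracy.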
                 Report over 6,000,000 records               Report over 4,000,000 records
Class            Precision  Recall  F1-score  Support        Precision  Recall  F1-score  Support
Analysis         0.00       0.00    0.00      181            0.00       0.00    0.00      132
Backdoor         0.93       0.89    0.91      1500           0.93       0.89    0.91      1008
Benign           1.00       1.00    1.00      1,986,660      1.00       1.00    1.00      1,324,729
Bot              1.00       1.00    1.00      11,291         1.00       1.00    1.00      7620
Brute force      0.99       0.98    0.99      9935           0.99       0.98    0.98      6545
DDoS             0.99       0.99    0.99      1,717,661      0.99       0.99    0.99      1,144,686
DoS              0.98       0.99    0.99      1,411,559      0.98       0.99    0.99      940,644
Exploits         0.61       0.88    0.72      2502           0.62       0.90    0.73      1716
Fuzzers          0.65       0.72    0.68      1771           0.65       0.74    0.69      1153
Generic          0.99       0.68    0.81      1293           0.99       0.65    0.79      899
Infiltration     1.00       0.99    1.00      9249           1.00       0.99    1.00      6205
Reconnaissance   0.99       0.95    0.97      207,690        0.98       0.95    0.97      138,469
Shellcode        0.77       0.75    0.76      103            0.77       0.57    0.66      82
Theft            0.67       0.87    0.76      202            0.64       0.82    0.72      120
Worms            0.00       0.00    0.00      12             0.00       0.00    0.00      9
Injection        0.91       0.77    0.83      53,772         0.91       0.77    0.83      36,338
mitm             0.93       0.31    0.47      615            0.88       0.26    0.40      386
Password         0.91       0.94    0.93      91,081         0.91       0.94    0.92      60,587
Ransomware       0.87       0.92    0.89      281            0.88       0.88    0.88      170
Scanning         1.00       1.00    1.00      299,086        1.00       1.00    1.00      199,258
xss              0.94       0.96    0.95      193,556        0.94       0.96    0.95      129,244
Accuracy                            0.99      6,000,000                          0.99      4,000,000
Macro avg        0.82       0.79    0.79      6,000,000      0.81       0.77    0.78      4,000,000
Weighted avg     0.99       0.99    0.99      6,000,000      0.99       0.99    0.99      4,000,000
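The gap between the macro average (about 0.79) and the weighted average (0.99) in the report above follows from class imbalance: the macro average counts every class equally, whereas the weighted average is dominated by Benign, DDoS, and DoS. The sketch below recomputes both from the left-hand report (6,000,000 records), assuming the third numeric column is the F1-score and the fourth is the per-class record count; the variable names are illustrative.

```python
# (F1-score, record count) per class, copied from the left-hand report above.
rows = {
    "Analysis": (0.00, 181),
    "Backdoor": (0.91, 1500),
    "Benign": (1.00, 1_986_660),
    "Bot": (1.00, 11_291),
    "Brute force": (0.99, 9935),
    "DDoS": (0.99, 1_717_661),
    "DoS": (0.99, 1_411_559),
    "Exploits": (0.72, 2502),
    "Fuzzers": (0.68, 1771),
    "Generic": (0.81, 1293),
    "Infiltration": (1.00, 9249),
    "Reconnaissance": (0.97, 207_690),
    "Shellcode": (0.76, 103),
    "Theft": (0.76, 202),
    "Worms": (0.00, 12),
    "Injection": (0.83, 53_772),
    "mitm": (0.47, 615),
    "Password": (0.93, 91_081),
    "Ransomware": (0.89, 281),
    "Scanning": (1.00, 299_086),
    "xss": (0.95, 193_556),
}

total = sum(n for _, n in rows.values())

# Macro average: unweighted mean over classes, so rare classes count fully.
macro_f1 = sum(f1 for f1, _ in rows.values()) / len(rows)

# Weighted average: each class contributes in proportion to its record count.
weighted_f1 = sum(f1 * n for f1, n in rows.values()) / total

print(f"records={total:,} macro F1={macro_f1:.2f} weighted F1={weighted_f1:.2f}")
```

Both values round to the figures in the report (0.79 and 0.99), so the macro average is the more informative summary when the rare attack classes matter.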
                 Report over 6,000,000 records               Report over 4,000,000 records
Class            Precision  Recall  F1-score  Support        Precision  Recall  F1-score  Support
Analysis         0.00       0.00    0.00      181            0.00       0.00    0.00      132
Backdoor         1.00       0.88    0.93      1500           1.00       0.89    0.94      1008
Benign           1.00       1.00    1.00      1,986,660      1.00       1.00    1.00      1,324,729
Bot              1.00       1.00    1.00      11,291         1.00       1.00    1.00      7620
Brute force      1.00       0.98    0.99      9935           1.00       0.98    0.99      6545
DDoS             0.99       0.99    0.99      1,717,661      0.99       0.99    0.99      1,144,686
DoS              0.98       0.99    0.99      1,411,559      0.98       0.99    0.99      940,644
Exploits         0.82       0.66    0.73      2502           0.83       0.66    0.73      1716
Fuzzers          0.57       0.76    0.65      1771           0.56       0.77    0.65      1153
Generic          0.77       0.74    0.75      1293           0.75       0.71    0.73      899
Infiltration     0.99       1.00    0.99      9249           0.99       0.99    0.99      6205
Reconnaissance   0.96       0.93    0.95      207,690        0.96       0.93    0.95      138,469
Shellcode        0.78       0.80    0.79      103            0.74       0.60    0.66      82
Theft            0.33       0.24    0.28      202            0.32       0.31    0.31      120
Worms            0.00       0.00    0.00      12             0.00       0.00    0.00      9
Injection        0.75       0.76    0.75      53,772         0.75       0.76    0.76      36,338
mitm             0.92       0.30    0.46      615            0.88       0.25    0.39      386
Password         0.90       0.90    0.90      91,081         0.90       0.90    0.90      60,587
Ransomware       0.95       0.69    0.80      281            0.91       0.61    0.73      170
Scanning         0.99       0.99    0.99      299,086        0.99       0.99    0.99      199,258
xss              0.94       0.93    0.94      193,556        0.94       0.94    0.94      129,244
Accuracy                            0.99      6,000,000                          0.99      4,000,000
Macro avg        0.79       0.74    0.76      6,000,000      0.79       0.73    0.74      4,000,000
Weighted avg     0.99       0.99    0.99      6,000,000      0.99       0.99    0.99      4,000,000
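As a consistency check on the report above, and assuming its three numeric columns are precision (P), recall (R), and F1-score in the standard classification-report layout, each F1 value should be the harmonic mean of the other two. A worked instance for the Ransomware row of the left-hand report:

```latex
\[
F_1 = \frac{2 \cdot P \cdot R}{P + R}
\qquad\Longrightarrow\qquad
F_1^{\text{Ransomware}}
  = \frac{2 \cdot 0.95 \cdot 0.69}{0.95 + 0.69}
  = \frac{1.311}{1.64}
  \approx 0.80
\]
```

The result agrees with the 0.80 listed for Ransomware. Because the published inputs are already rounded to two decimals, some rows may differ from this recomputation by 0.01.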
correlation technique that is architecture-based on stacking for both classical algorithms and deep neural networks.

Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Data availability The data source utilized in this study is obtainable from: Sarhan, M., Layeghy, S. & Portmann, M. Towards a Standard Feature Set for Network Intrusion Detection System Datasets. Mobile Netw Appl (2021). https://doi.org/10.1007/s11036-021-01843-0. Data repositories: https://www.kaggle.com/datasets/aryashah2k/nfuqnidsv2-network-intrusion-detection-dataset.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.