Proposal
CHAPTER 1
Problem Statement (200 words max):
As the number of Internet of Things (IoT) devices in smart homes increases, the risk of zero-
day attacks also increases. Zero-day attacks exploit unknown vulnerabilities in software and
hardware, making them difficult to detect and defend against. IoT devices are often
vulnerable to such attacks due to their limited resources and lack of security updates.
To address this problem, implementing a honeypot for IoT smart homes can be an effective
solution. A honeypot is a security mechanism that creates a decoy system or application to
attract attackers and detect their activities. By implementing a honeypot, it is possible to
identify zero-day attacks and gather information about the attackers' methods and tactics.
Machine learning techniques can be used to develop a more effective honeypot for IoT smart
homes. Machine learning algorithms can analyse network traffic and identify patterns that
indicate an attack. By training the machine learning models on data from previous attacks, the
honeypot can become more accurate and effective in detecting zero-day attacks.
Research Questions:
How effective are honeypots for IoT devices in detecting and mitigating
cyber-attacks, and how does machine learning improve their performance?
What are the design considerations for implementing a honeypot for cyber security?
What are the limitations and challenges in implementing a honeypot for IoT devices
using machine learning, and how can they be addressed to improve its performance?
How does the use of honeypots compare to other cyber security approaches, and what
are their advantages and disadvantages in different contexts?
Research Objectives:
Detect and mitigate zero-day attacks: The honeypot will act as a trap for attackers, and
by analysing the data collected by the honeypot, organizations can identify and
mitigate zero-day attacks.
Improve machine learning algorithms: The honeypot will use machine learning
algorithms to detect and classify the behaviour of the attackers. The data collected by
the honeypot can be used to improve the machine learning algorithms, making them
more accurate and effective.
Enhance threat intelligence: By analysing the data collected by the honeypot,
organizations can gain valuable insights into the tactics, techniques, and procedures
used by attackers. This information can be used to enhance threat intelligence,
enabling organizations to better protect their IoT smart homes.
Provide early warning of attacks: The honeypot can provide early warning of attacks,
allowing organizations to take proactive measures to prevent the attacks from causing
damage.
Identify vulnerable devices: By analysing the data collected by the honeypot,
organizations can identify vulnerable devices and take measures to patch or replace
them.
CHAPTER 2
Literature Review
This chapter reviews the literature on honeypots and on machine learning as applied to IoT
security.
The Internet of Things (IoT) is now everywhere, and perhaps nowhere is it more pervasive than
in the modern smart home. As IoT devices proliferate, cyberattacks on smart homes are on the
rise. A honeypot can be used to detect and monitor such attacks. This review critically
examines the existing literature on using honeypots, combined with machine learning, to
mitigate the effects of zero-day attacks in IoT smart homes.
A honeypot is a security mechanism used in computer networks to detect and deflect intrusion
attempts. It is a decoy network, service, or piece of software that invites attack so that
the attacker's activity can be observed. By monitoring and analysing the attacks it receives,
a honeypot reveals trends in attacker behaviour. Honeypots have grown in popularity in recent
years as a low-cost technique for detecting and monitoring cyberattacks on IoT smart homes.
IoT devices make smart homes susceptible to zero-day attacks. A zero-day attack exploits a
security flaw that has not previously been discovered. Such attacks commonly go undetected
until they have already caused significant harm. Traditional security methods struggle to
identify these threats, whereas machine learning has been found to be an effective way of
detecting them.
Scope of Research
Review Questions
What challenges exist in ML-based honeypots?
How well do IoT honeypots detect and mitigate cyberattacks, and how does machine
learning improve their performance?
What are the design considerations for cybersecurity honeypots?
What are the limitations and challenges of machine learning honeypots for IoT devices,
and how might they be improved?
Research Selection Criteria
Journal articles and conference papers.
Research published between 2019 and 2022.
Research must address the review questions.
Research must state its title and year of publication.
Literature covering honeypots, honeypots for IoT smart homes, and machine
learning.
Targeted Area
Honeypot for IoT Smart Homes.
Honeypot and Machine Learning.
Intelligent-Interaction Honeypot
To address IoT security issues, this article proposes deploying deception strategies and
honeypots. Its goals are to find flaws in IoT devices, make the devices safer, and fix them
cost-effectively. The approach uses machine learning to build honeypots that deceive and
engage attackers, so that attacker sessions are prolonged and more IoT network threats are
captured. The paper's significance lies in showcasing the potential of machine learning
powered IoT honeypots and in drawing attention to the need for efficient, inexpensive ways to
discover security flaws in IoT devices. Future work on the honeypot includes further
improvement of the machine learning methods, testing in real-world IoT setups, monitoring and
updating to adapt to changing attacker methods, assessing the impact on IoT security, and a
cost-benefit analysis of economic feasibility in IoT systems (Mfogo, 2023).
The paper first presents a multi-cascaded CNN classifier for detecting attacks on IIoT networks.
Second, it recommends dynamic honeypot encryption for IoT cloud data transmission and
storage. Accuracy, precision, recall, and F1-score are compared on power, loop sensor, and
land sensor datasets. It compares the proposed method's throughput, latency, and detection
rate against existing approaches. The article compares system encrypting, decrypting, and
running times. A secure, energy efficient IIoT data transfer system uses power, loop sensor,
and land sensor data. Multi-scale grasshopper optimisation refines network model. Dynamic
honey pot encryption saves data in the cloud once the Robust Multi-cascaded CNN (RMC-
CNN) classification system detects an assault. Low-power IIoT data transmission, RMC-
CNN network breach detection, dynamic honeypot encryption for data security, encrypted
IoT cloud storage, and distributed ledger encryption key storage are the paper's contributions.
Simulations show the suggested technique transfers secret data quicker. The report also
emphasises the new system's cost function performance compared to prior systems and
recommends real-time data and precise detection. It encourages IIoT network microservice
behavioural analysis and virtual data environments and real-time data analysis. Building a
strong detection algorithm framework and applying it to real-time data is the identified gap.
Real-time data concerns should be investigated in the virtual data environment. The
recommended technique prioritises fixing real-time data issues and understanding IIoT
network microservice behaviour to fix disparities (Sankaran, 2023).
The study proposes a security solution for safeguarding Internet of Things (IoT) devices: an
intrusion detection system built with an ensemble machine learning approach. The proposed
solution is evaluated experimentally on several industry-standard datasets to establish its
reliability and accuracy. The main contribution is the design and implementation of this
ensemble-based intrusion detection system. Future work includes extending the experiments to
a wider range of IoT datasets, incorporating current cyber threat models into IoT devices,
addressing the difficulty of securing low-energy IoT devices, investigating alternative
security solutions for confidentiality and reliability, and testing the solution on
operational IoT devices to correct any anomalies (Das, 2022).
The study responds to rising concerns over the safety of Internet-connected technology,
including embedded systems, cyber-physical devices, and the Internet of Things (IoT). Its
goals are to formulate a workable plan for defending IoT infrastructure, to analyse the
efficacy of machine learning (ML) algorithms in defending the IoT and its ecosystem, and to
investigate relevant datasets. The method analyses recent datasets to detect security
vulnerabilities and propose solutions, and evaluates how effectively and efficiently ML
techniques manage IoT security. Contributions include a discussion of IoT data and crypto
ransomware, the highlighting of clustering challenges for attack detection, the use of a
feature-rich dataset (UKM-IDS20), and a demonstration that three machine learning algorithms
are effective in recognising and classifying IoT attacks. Future work includes integrating
the clusters found, extending the dataset to include current threats, supporting a wider
range of IoT and operational technology protocols, and using STIX for structured threat data
sharing (Ariffin, 2022).
The study aims to locate potential threats using honeypots. To this end it proposes
watermarked learning models built with machine learning techniques. The approach generates a
private key that is embedded in the watermark, and uses a threat model as part of the
HoneyModel framework. The primary contributions are the creation of machine learning
honeypots, called HoneyModel, that can identify adversarial use of machine learning models;
the synthesis of watermark keys embedded inside the models; and the use of neural networks
for model training. The authors identify further investigation of advanced neural network
methods as an area for future work (Abdou, 2021).
The paper first presents a conceptual architecture for network protection. Second, it
proposes a machine learning (ML) model and honeypot system to improve security across
different types of companies. The approach sets up three low-interaction honeypots within a
private network and applies Support Vector Machine (SVM) algorithms to build an enhanced
artificial intelligence (AI) system. The aim is to offer a security solution that enterprises
can use to secure their data and defend themselves against cyberattacks. In future work, the
authors intend to investigate high-interaction honeypots in conjunction with reinforcement
learning (RL) to further improve security (Tsochev, 2021).
The paper proposes AntiConcealer, an edge IoT framework that uses edge artificial
intelligence to determine whether an attacker is concealing their activity in the Internet of
Things (IoT). Honeypots are integrated with servers in the architecture to assess how
accurately attacker behavioural patterns are recognised. The approach uses a Multivariate
Hawkes Process to build an adversarial behaviour model and then applies it, using BPGM, to
discover concealed behaviours. The categorised hidden behaviours are then placed in a
non-negative weighted impact matrix, and a Decision Tree is used to evaluate the matrix. The
main contribution is the AntiConcealer framework itself, which identifies and inhibits
harmful activity in the IoT by using an AI technique to detect disguised behaviours. The
study indicates the application of reinforcement learning as future work to extend the
framework's capabilities (Zhang, 2021).
The study addresses rule-based content by proposing a solution based on artificial
intelligence. It also presents a method for identifying and evaluating attacks that uses
machine learning techniques such as LightGBM, Random Forest, and K-NN. The Cowrie honeypot
is used both to implement the system and to analyse attacks. The authors built an AI-based
threat identification and detection system to gain a deeper understanding of attacks made
against the Cowrie honeypot. As future work, they suggest further research into reinforcement
learning as a way to improve the system (BO-XIANG WANG, 2022).
The study deploys wireless honeypots (WHs) in ultra-dense networks and develops a strategic
deployment approach using Reinforcement Learning (RL) techniques, namely Q-Learning and
ε-Greedy. It also explores the kinds of honeypots that provide the highest level of
security. RL agents are used to establish a suitable number of WHs for protecting access
points. The paper's contributions are an investigation of WHs in ultra-dense networks, a
strategic deployment method based on two RL algorithms, and an attempt to find the ideal
number of WHs. As future work, the authors propose investigating more sophisticated RL
approaches for using WHs in B5G, 5G Core, and 5G-RAN networks (Radoglou-Grammatikis, 2022).
The study uses a machine learning (ML) enhanced honeypot to lower the labour cost of data
processing, with an emphasis on automatically assessing and cleaning the data to improve
detection. The honeypot architecture is composed of four modules: Request Handle, Base,
Payload Detection, and Logger. The Payload Detection module uses machine learning methods,
specifically a regret value ensemble, to recognise and record the contents of requests. The
contributions are the design and implementation of an ML-enhanced honeypot system, the
automation of data assessment using ML, and the use of a regret value ensemble strategy to
increase detection performance. As further work, the authors aim to investigate reinforcement
learning as a way to improve the proposed system (Jiang, 2020).
The study addresses the challenge of detecting social spammers and protecting social media
platforms from social attacks. The proposed technique develops a system based on BLS
(Bilateral Supervision Learning) and SSL (Semi Supervised Learning). The model is trained on
a real-world Twitter dataset using a small quantity of labelled data and a large quantity of
unlabelled data. For evaluation, the output of the proposed model, named ASSD (Adaptable
Social Spammer Detection), is compared with that of existing supervised and semi-supervised
machine learning models. The contribution is a fully customisable SSD system that includes
both a BLS and an SSL board. Future work should aim to reduce training time while increasing
the accuracy of spammer identification and optimising the computational complexity of the
system (Qiu, 2020).
The paper first introduces the Probabilistic Honeypot Game (PHG) into Software-Defined
Networking (SDN). Second, it addresses Distributed Denial of Service (DDoS) attacks in the
setting of the Industrial Internet of Things (IIoT). Third, it investigates whether several
Bayesian Nash Equilibrium groups can be shown to exist in the PHG. Lastly, it aims to use PHG
techniques to manage harmful attacks effectively. The approach incorporates PHG strategies
into SDN, with emphasis on the defender's optimal strategy, for which an optimal strategy
analysis is carried out using PHG. Anti-honeypot attacks are also covered, with emphasis on
the benefits they offer attackers and on how PHG methods can be used to analyse the
interactions involved. The contributions include addressing DDoS attacks in an IIoT scenario,
introducing the PHG technique into SDN, and considering the detection of honeypots by
attackers for the purpose of exploitation. Future research will focus on improving the
precision of PHG and applying it to real-world applications (Wang, 2020).
This malware detection solution for IoT platforms combines honeypot-based approaches with
machine learning (ML). Its primary goals are to design a system that can identify Distributed
Denial of Service (DDoS) attacks and zero-day vulnerabilities and protect against them. The
process sets up a virtual IoT honeypot to collect log files and then applies machine learning
models, such as KNN and random forest, for DDoS attack detection. The contribution is a DDoS
detection system built on honeypot and machine learning techniques. As future work, the
authors will investigate unsupervised machine learning as a means of improving honeypot
capabilities and of extending the system to identify more kinds of attacks
(Vishwakarma, 2019).
The paper first develops a honeypot system that can detect possible attacks by automatically
scanning network traffic or log files. Second, it presents a new automated identification
model that can differentiate between regular servers and honeypots, based on three groups of
features obtained with a random forest technique. The methodology has three components:
feature extraction, computation over the extracted features, and data collection and
labelling. The contributions include establishing that honeypots are capable of simulating
actual systems and providing a reference point for further development of honeypot
technology. As future work, the authors propose reinforcement learning as a possible strategy
(Huang, 2019).
Honeypots are a useful method for tracking and identifying cyber-attacks in IoT-enabled
homes, and machine learning algorithms are among the most promising ways to spot zero-day
threats. Researchers have therefore studied the use of honeypots combined with machine
learning to counter zero-day attacks in IoT smart homes, and the reviewed work reports that
these technologies are effective in identifying and protecting against such threats.
However, further study is required to assess their efficacy in practice and to deal with the
difficulties of integrating them with existing IoT devices.
CHAPTER 3
Methodology
Introduction
This chapter sets out the methodology for constructing a honeypot architecture that can
recognise and neutralise threats in a smart home environment. It provides a step-by-step guide
to establishing a honeypot environment and applying machine learning to monitor, identify, and
neutralise potential security risks.
In cybersecurity, a physical honeypot is a controlled environment in which real devices, such
as smart home hubs, cameras, thermostats, and other Internet of Things (IoT) devices, are
installed. These devices are connected, together with networking equipment such as routers and
switches, to a network that reproduces the default settings of a typical home network.
Virtual honeypots create virtual representations of connected home appliances and networks
using virtualization technologies such as hypervisors and containers. Because several
instances can run on a single piece of hardware, this method offers greater flexibility and
scalability.
In our case we will create a virtual environment because of the resources available.
A honeypot is a cybersecurity technique that sets up a trap, or decoy system, to attract and
monitor potentially malicious actors such as hackers or cybercriminals. Honeypots may also be
used to monitor legitimate traffic (Lutkevich, 2021), and are sometimes referred to as honey
baskets. They serve two purposes: first, they collect intelligence about an adversary's plans,
techniques, and tools; second, they divert the adversary's attention away from more vital
systems. Honeypots are made to look like real targets or weak systems so that potential
attackers are encouraged to interact with them. Depending on the objectives of the
cybersecurity team, a honeypot can simulate a wide variety of systems, including web servers,
databases, and network devices. It can be set up either on an organisation's internal network,
in which case it is referred to as an internal honeypot, or on a separate network segment, in
which case it is referred to as an external honeypot (Kumar, 2023).
There are two main types of honeypots, low-interaction and high-interaction:
Low-interaction honeypots mimic only a small subset of services or protocols, giving
attackers a more constrained environment in which to interact with the honeypot. They
are easier to deploy and maintain, but the information they give about an attacker's
behaviour is less detailed (Lakhwani, 2022).
High-interaction honeypots offer a more realistic environment by imitating a broad
variety of services and allowing attackers to engage in prolonged interaction. They can
capture more complex attack strategies and give deeper insight into the attacker's
behaviour, but they take more resources to implement and manage (Lakhwani, 2022).
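To make the low-interaction case concrete, the following is a minimal sketch, not the honeypot proposed in this work. It assumes a single TCP listener that presents a fake login banner and records whatever the client sends. The names make_record, serve_once, and run_honeypot, the banner text, and port 2323 are all hypothetical choices made for illustration.

```python
import socket
from datetime import datetime, timezone


def make_record(peer, payload: bytes) -> dict:
    """Build one log record for a single connection attempt."""
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "src_ip": peer[0],
        "src_port": peer[1],
        # latin-1 maps every byte to one character, so nothing is lost
        "payload": payload.decode("latin-1"),
    }


def serve_once(server: socket.socket, log: list, banner: bytes = b"login: ") -> None:
    """Accept one connection, send a fake banner, and log what the client sends."""
    conn, peer = server.accept()
    with conn:
        conn.settimeout(2.0)
        conn.sendall(banner)
        try:
            payload = conn.recv(1024)
        except socket.timeout:
            payload = b""
        log.append(make_record(peer, payload))


def run_honeypot(host: str = "127.0.0.1", port: int = 2323, max_connections: int = 10) -> list:
    """Listen on (host, port) and record up to max_connections attempts.

    The port and the limit are illustrative assumptions, not values
    taken from this proposal.
    """
    log = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        for _ in range(max_connections):
            serve_once(server, log)
    return log
```

Calling run_honeypot() blocks until the connection limit is reached and returns the collected records. A high-interaction honeypot would replace the fixed banner with a real or fully emulated service, which is why it needs more resources.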
Honeypots can gather a wide variety of data, such as network traffic, system logs, and the
actions of attackers. By analysing this data, cybersecurity specialists can gain significant
insight into attackers' tactics, strategies, and motives, and into possible holes in the
organisation's defences. This knowledge can be used to strengthen security measures, build
more robust defences, and improve incident response (Guan, 2023).
However, honeypots carry risks of their own. If a honeypot is not adequately separated from
the rest of the network, attackers might use it as a springboard to attack other systems. To
reduce this risk, it is essential to put in place stringent precautions such as network
segmentation and isolation. With such precautions, honeypots are a useful cybersecurity tool:
they provide a proactive approach to collecting threat intelligence and help organisations
keep one step ahead of possible attackers.
Machine Learning:
Machine learning is a subfield of computer science and artificial intelligence that uses data
and algorithms to imitate the way in which humans learn, gradually improving its accuracy
(Education, 2020).
Machine learning is one of the most important factors behind the considerable growth of data
science. In data mining projects, statistical techniques are used to train algorithms to make
classifications or predictions and to uncover key insights. These insights then drive
decisions within applications and businesses, ideally affecting key growth metrics, or key
performance indicators (KPIs). As big data continues to grow, demand for data scientists in
the labour market is expected to increase. They will be needed to help identify the most
critical business questions and the data required to answer them (Priyadharshini, 2020).
The two primary subfields of machine learning are supervised learning and unsupervised
learning. They differ in how they work, the tasks they perform, the outputs they deliver, and
the data they rely on. Around 70 percent of machine learning is attributed to supervised
learning and only 10 to 20 percent to unsupervised learning; according to
(Priyadharshini, 2020), the remainder is taken up by reinforcement learning (Menon, 2021).
The widespread availability of machine learning (ML) libraries has given the honeypot strategy
a new lease of life. Machine learning techniques have proved helpful both for proactive
functionality and for retrospective analysis, and both domains offer a wide range of
applications. Because the algorithm can be "taught" using data that has already been
collected, supervised learning is well suited to retrospective analysis (Alan, 2023). Once
trained on this data, the model categorises new events based on their characteristics. There
are many kinds of classifiers, including linear classifiers, Naive Bayes classifiers, Support
Vector Machines (SVM), Decision Trees, and Random Forests. Data can be structured in
different ways using supervised and unsupervised learning. In the unsupervised case, the
algorithm decides for itself how to group data into categories or form links between entities
(Dowling, 2020).
Supervised Learning
Supervised learning uses labelled training data: the correct output for each training example
is already known, and this guides the learning process. The labelled input data is used to
train the machine learning model. Once training is complete, previously unseen data can be fed
to the model to obtain a prediction (Education, 2020) (IBM, 2020).
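As a small illustration of the train-then-predict workflow described above, the sketch below implements a k-nearest-neighbours classifier in plain Python. The feature names (packets per second, failed logins) and all data values are invented for illustration and are not drawn from any dataset used in this work.

```python
import math
from collections import Counter

# Hypothetical labelled training data:
# (packets per second, failed logins) -> label
TRAIN = [
    ((5.0, 0.0), "benign"),
    ((7.0, 1.0), "benign"),
    ((6.0, 0.0), "benign"),
    ((90.0, 12.0), "attack"),
    ((120.0, 20.0), "attack"),
    ((100.0, 15.0), "attack"),
]


def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest labelled neighbours."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Unseen samples are classified using only the labelled examples above.
print(knn_predict(TRAIN, (110.0, 18.0)))
print(knn_predict(TRAIN, (4.0, 0.0)))
```

The labels in TRAIN play the role of the known outputs that guide learning. The two calls at the end correspond to feeding unfamiliar data to the trained model.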
Unsupervised Learning
Unsupervised learning is a subfield of machine learning that trains on data that has not been
pre-labelled or classified by a human expert. The term "unsupervised" refers to the
algorithm's ability to find structure in the data on its own, without guidance from known
outputs. The unlabelled data is fed to a machine learning algorithm to train a model. Once
trained, the model can recognise patterns in new data and respond appropriately
(Priyadharshini, 2020).
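The idea of finding structure without labels can be sketched with k-means clustering, written here in plain Python. The choice of k-means, the sample points, and the starting centroids are assumptions made for illustration; the text above does not prescribe a particular unsupervised algorithm.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical unlabelled observations (no class information is supplied).
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = kmeans(POINTS, centroids=[POINTS[0], POINTS[3]])
```

No labels are passed in at any point. The two groups emerge solely from the distances between the observations.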
Different ML Models
Support Vector Machine
Finding the hyperplane that creates the clearest division between the two sets of data is a
necessary step in the classification process (Jon, 2021).
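One way to read "clearest division" is as the largest margin, meaning the distance from the hyperplane to the closest data point. The sketch below does not train an SVM. It only compares two hand-picked candidate hyperplanes by their margin on a toy dataset, and every value in it is an illustrative assumption.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    The result is positive only if every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical two-class data with labels -1 and +1.
POINTS = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
LABELS = [-1, -1, 1, 1]

margin_a = geometric_margin((1.0, 1.0), -5.5, POINTS, LABELS)  # line x + y = 5.5
margin_b = geometric_margin((1.0, 0.0), -2.5, POINTS, LABELS)  # line x = 2.5
```

Both candidates separate the classes, but the first leaves a wider gap, so it is the better separator under the maximum-margin criterion that SVM training optimises.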
Random Forest
Two kinds of randomness are used when the trees are built. First, each tree is constructed
from a random sample of the full dataset. Second, at each node a random subset of the
features is selected, and the best split is chosen from among them. Previous research has
reported Random Forest to be among the most accurate classification methods in use, in part
because it can estimate which features matter most for classification when applied to a large
collection of data (IBM, 2022).
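The two sources of randomness can be sketched in a few lines of plain Python. This is not a full Random Forest: tree construction is omitted, and the helper names, the square-root rule for the feature subset size, and the seed are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """First source of randomness: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, rng, k=None):
    """Second source of randomness: candidate features for one split.

    The default k = sqrt(n_features) is a common heuristic, assumed here.
    """
    if k is None:
        k = max(1, int(n_features ** 0.5))
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine the per-tree predictions into the forest's prediction."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = list(range(10))   # stand-in for ten training examples
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(9, rng)
forest_output = majority_vote(["attack", "benign", "attack"])
```

Each tree would be trained on its own bootstrap sample and would consider only a fresh feature subset at every node. The forest then classifies a new example by majority vote across the trees.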
Decision Tree
The Decision Tree is one of the most frequently used and most successful strategies for
classifying data. A decision tree resembles a flowchart: each internal node represents a test on
a feature, each branch represents an outcome of that test, and each leaf node, also referred to
as a terminal node, holds the resulting class label.
The Decision Tree is a general-purpose predictive modelling tool with applications in many
fields. The tree is normally constructed by an algorithm that searches for ways to split the
dataset according to preset criteria. Decision trees are a non-parametric supervised learning
method that can be applied to both classification and regression problems. The objective is to
learn simple decision rules from the data so that the model can predict the value of the target
variable (IBM, 2022).
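The flowchart structure can be illustrated with a small hand-written tree in Python. The feature
names, thresholds, and class labels below are hypothetical; a real tree would be learned from the
dataset rather than written by hand.

```python
# A hand-written tree: internal nodes test a feature, branches carry the
# outcome of the test, and leaves hold the predicted class.
# Feature names and thresholds are hypothetical.
TREE = {
    "feature": "distinct_ports",
    "threshold": 10,
    "below": {"leaf": "benign"},
    "at_or_above": {
        "feature": "mean_packet_size",
        "threshold": 100.0,
        "below": {"leaf": "attack"},
        "at_or_above": {"leaf": "benign"},
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, following the branch chosen by each test."""
    while "leaf" not in node:
        branch = "below" if sample[node["feature"]] < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["leaf"]

scan = {"distinct_ports": 40, "mean_packet_size": 64.0}
normal = {"distinct_ports": 2, "mean_packet_size": 600.0}
predictions = [predict(TREE, scan), predict(TREE, normal)]
```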
Decision trees use entropy and information gain as their main criteria when building a tree from
the root node. The entropy of a sample measures how homogeneous it is: it is zero when all items
belong to one class and largest when the classes are evenly mixed.
$$\text{Entropy} = -\sum_{j} P_j \log_2(P_j)$$
where $P_j$ is the proportion of items in the sample that belong to class $j$.
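As a worked illustration, entropy and the information gain of a candidate split can be computed
as follows. The label lists are hypothetical, and information gain is taken here in its usual
sense: the parent's entropy minus the size-weighted entropy of the child groups.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: -sum_j P_j * log2(P_j) over the classes present."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels at a node and after one candidate split.
parent = ["attack", "attack", "benign", "benign"]
split = [["attack", "attack"], ["benign", "benign"]]

parent_entropy = entropy(parent)        # evenly mixed two-class node
gain = information_gain(parent, split)  # both children are pure
```

A tree-building algorithm would evaluate many candidate splits in this way and keep the one with
the highest gain.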
Decision trees also make use of the Gini index, which quantifies the degree of uncertainty within
a single node, in order to reduce the likelihood of an incorrect classification.
$$\text{Gini Index} = 1 - \sum_{j} P_j^{2}$$
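The Gini index can be computed directly from the class proportions in a node. The sketch below
uses hypothetical node contents to contrast a pure node with an evenly mixed one.

```python
from collections import Counter

def gini_index(labels):
    """Gini index: 1 - sum_j P_j^2, where P_j is the share of class j in the node."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

# Hypothetical node contents.
pure_node = ["attack", "attack", "attack", "attack"]
mixed_node = ["attack", "attack", "benign", "benign"]

gini_pure = gini_index(pure_node)
gini_mixed = gini_index(mixed_node)
```

A lower value indicates a purer node, so splits that reduce the Gini index are preferred.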
Data Collection
We collect data from several sources, such as UCI, Kaggle, and IEEE DataPort. Data collection is
the methodical gathering of information from a wide range of sources for the purposes of analysis
and decision-making. It involves compiling quantitative and qualitative descriptions of the world
using methods such as questionnaires, interviews, and computerised systems. Primary sources are
data acquired directly from the population of interest, whereas secondary sources are data
previously compiled by others working in the field. Primary sources are generally considered more
reliable than secondary ones. Appropriate data collection ensures that the information is
accurate, complete, and reliable, so that trends, patterns, and insights can be identified and
used to support informed decisions.
Training machine learning algorithms involves providing them with data, after which the models
automatically adjust their parameters to achieve the best possible performance. The models
continue to be refined in this way until they can accurately detect intrusion attempts.
Proposed ML Methodology
We adopt a comprehensive approach to this problem. Our method uses supervised learning and a
regression model to predict potentially malicious profiles. Supervised learning is a branch of
artificial intelligence and machine learning that is distinguished by its use of labelled
datasets to train algorithms for identification and prediction. To apply these models, we follow
the strategy described above.
To recognise potentially harmful profiles, we will use an automated machine learning model.
Wireshark, a network protocol analyser, can be used to capture data packets from a network, such
as one that connects computers to the internet, and to inspect the information contained in those
packets. Our group intends to use a dataset obtained from Kaggle or UCI.
We will use RapidMiner, a data science platform built for businesses to examine how their people,
expertise, and data interact to influence decisions. RapidMiner offers a wide range of data
mining and machine learning functions, including data loading and transformation (also known as
ETL), data preparation and visualisation, predictive analytics and statistical modelling,
evaluation, and deployment. RapidMiner is written primarily in Java.
The dataset is first imported into RapidMiner from the local repository. It is then cleaned, and
any missing values are removed, before the data is preprocessed. Next, the target column for the
training model is selected, and the data is divided into two separate groups: training data and
test data. Once the model has been trained, it is evaluated.
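The proposal carries out these steps in RapidMiner. Purely as an illustration, the same workflow
(clean, choose a target, split, train, evaluate) can be sketched in plain Python. The records,
field names, and the toy threshold "model" below are all hypothetical stand-ins, not the dataset
or the model that will actually be used.

```python
import random

# Hypothetical records standing in for an imported dataset; None marks a missing value.
records = [
    {"packet_count": 12, "distinct_ports": 1, "label": "benign"},
    {"packet_count": 950, "distinct_ports": 40, "label": "attack"},
    {"packet_count": None, "distinct_ports": 3, "label": "benign"},
    {"packet_count": 20, "distinct_ports": 2, "label": "benign"},
    {"packet_count": 1200, "distinct_ports": 55, "label": "attack"},
    {"packet_count": 700, "distinct_ports": None, "label": "attack"},
    {"packet_count": 15, "distinct_ports": 1, "label": "benign"},
    {"packet_count": 1100, "distinct_ports": 48, "label": "attack"},
]
TARGET = "label"  # the target column chosen for training

def drop_missing(rows):
    """Cleaning step: keep only rows in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold(rows, feature):
    """Toy 'model': a cut-off halfway between the class means of one feature."""
    def mean(label):
        values = [r[feature] for r in rows if r[TARGET] == label]
        return sum(values) / len(values)
    return (mean("attack") + mean("benign")) / 2

def accuracy(rows, feature, threshold):
    """Evaluation step: share of rows whose predicted label matches the target."""
    hits = sum(
        ("attack" if r[feature] >= threshold else "benign") == r[TARGET] for r in rows
    )
    return hits / len(rows)

clean = drop_missing(records)
train, test = train_test_split(clean, test_fraction=0.33, seed=0)
threshold = train_threshold(train, "distinct_ports")
test_accuracy = accuracy(test, "distinct_ports", threshold)
```

Keeping the test rows out of training is what makes the final accuracy figure a fair estimate of
how the model behaves on traffic it has not seen.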
The collected data is then analysed using the trained machine learning models. The models
identify suspicious behaviour by comparing it with the patterns they learned during training.
This analysis is useful for detecting and understanding attacks that have never been seen before.
The honeypot design for a smart home is described in further detail in the methodology chapter.
The method combines machine learning algorithms, data collection, and in-depth analysis to improve
the identification and mitigation of potential risks, thereby protecting the safety and
dependability of the smart home infrastructure.
References
Abdou, A., 2021. HoneyModels: Machine Learning Honeypots. Special Topics in Military
Communications, p. 6.
Alan, 2023. What is Machine Learning? Definition, Types, Applications, and more. [Online]
Available at: https://www.mygreatlearning.com/blog/what-is-machine-learning/
[Accessed 2023].
Anon., 2023. HoneyIoT: Adaptive High-Interaction Honeypot for IoT Devices Through
Reinforcement Learning. Chongqi Guan, p. 11.
Ariffin, T. A. M. T., 2022. IoT attacks and mitigation plan: A preliminary study with Machine
Learning Algorithms. p. 6.
BO-XIANG WANG, J.-L. C., 2022. An AI-Powered Network Threat Detection System. IEEE,
Issue May 25, 2022, p. 9.
Das, R. R., 2022. Securing IoT devices using Ensemble Machine Learning in Smart Home
Management System. p. 8.
Devi, B. T., 2020. An Appraisal over Intrusion Detection Systems in Cloud Computing Security
Attacks. Innovative Mechanisms for Industry Applications (ICIMIA 2020), p. 6.
Dowling, S., 2020. A new framework for adaptive and agile honeypots. A new framework for
adaptive and agile honeypots, p. 180.
Ellouh, M., 2022. IoTZeroJar: Towards a Honeypot Architecture for Detection of Zero-Day
Attacks in IoT. p. 7.
Guan, C. et al., 2023. HoneyIoT: Adaptive high-interaction honeypot for IoT devices through
reinforcement learning. [Online]
Available at: http://arxiv.org/abs/2305.06430
Huang, C., 2019. Automatic Identification of Honeypot Server Using Machine Learning
Techniques. Security and Communication Networks, Volume 9, p. 9.
Iqbal, Z., 2022. Denial of Service (DoS) Defences against Adversarial Attacks in IoT Smart
Home Networks using Machine Learning Methods. NUST Journal of Engineering Sciences,
Volume 15, No. 1, p. 8.
Irini Lygerou, S. S. E. V. G. S. &. D. G., 2022. A decentralized honeypot for IoT Protocols based
on Android devices. International Journal of Information Security, p. 21.
Jiang, K., 2020. Design and Implementation of A Machine Learning Enhanced Web Honeypot
System. 2020 13th International Congress on Image and Signal Processing, BioMedical
Engineering and Informatics (CISP-BMEI), p. 5.
Joseph Bao, M. K. Y. V. C. K., 2023. IoTFlowGenerator: Crafting Synthetic IoT Device Traffic
Flows for Cyber Deception. Cryptography and Security (cs.CR); Machine Learning, p. 13.
Lakhwani, S., 2022. What is a honeypot? Types, benefits, risks and best practices. [Online]
Available at: https://www.knowledgehut.com/blog/security/honeypot
[Accessed 2023].
Matin, I. M. M., 2019. Malware Detection Using Honeypot and Machine Learning. p. 4.
Mfogo, V. S., 2023. AIIPot: Adaptive Intelligent-Interaction Honeypot for IoT Devices. p. 7.
Priyadharshini, 2020. What is Machine Learning and types of Machine Learning. [Online]
Available at: https://www.simplilearn.com/tutorials/machine-learning-tutorial/what-is-machine-learning#what_are_the_different_types_of_machine_learning
Qiu, T., 2020. An Adaptive Social Spammer Detection Model with Semi-supervised Broad
Learning. p. 14.
machine-example-code/
[Accessed Sunday October 2022].
Sankaran, K. S., 2023. Deep learning-based energy efficient optimal RMC-CNN model for
secured data transmission and anomaly detection in industrial IOT. Sustainable Energy
Technologies and Assessments, Issue 4 January 2023, p. 8.
Srinivasa, S., 2022. Interaction matters: a comprehensive analysis and a dataset of hybrid IoT/OT
honeypots. p. 14.
Sumadi, F. D. S., 2022. SD-Honeypot Integration for Mitigating DDoS Attack Using Machine
Learning Approaches. INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION,
Issue March 2022, p. 6.
Tsochev, G., 2021. Using Machine Learning Reacted with Honeypot Systems for Securing
Network. International Conference AUTOMATICS AND INFORMATICS, Issue October 02,
2021, p. 4.
Vishwakarma, R., 2019. A Honeypot with Machine Learning based Detection Framework for
defending IoT based Botnet DDoS Attacks. Third International Conference on Trends in
Electronics and Informatics (ICOEI 2019), p. 6.
Abdou, A., 2021. HoneyModels: Machine Learning Honeypots. Special Topics in Military
Communications, p. 6.
Ariffin, T. A. M. T., 2022. IoT attacks and mitigation plan: A preliminary study with Machine
Learning Algorithms. p. 6.
28
BO-XIANG WANG, J.-L. C., 2022. An AI-Powered Network Threat Detection System. IEEE,
Issue May 25, 2022, p. 9.
Das, R. R., 2022. Securing IoT devices using Ensemble Machine Learning in Smart Home
Management System. p. 8.
Devi, B. T., 2020. An Appraisal over Intrusion Detection Systems in Cloud Computing Security
Attacks. Innovative Mechanisms for Industry Applications (ICIMIA 2020), p. 6.
Ellouh, M., 2022. IoTZeroJar: Towards a Honeypot Architecture for Detection of Zero-Day
Attacks in IoT. p. 7.
Huang, C., 2019. Automatic Identification of Honeypot Server Using Machine Learning
Techniques. Security and Communication Networks, Volume 9, p. 9.
Iqbal, Z., 2022. Denial of Service (DoS) Defences against Adversarial Attacks in IoT Smart
Home Networks using Machine Learning Methods. NUST Journal of Engineering Sciences,
Volume Vol. 15, No. 1, p. 8.
Jiang, K., 2020. Design and Implementation of A Machine Learning Enhanced Web Honeypot
System. 2020 13th International Congress on Image and Signal Processing, BioMedical
Engineering and Informatics (CISP-BMEI), p. 5.
Matin, I. M. M., 2019. Malware Detection Using Honeypot and Machine Learning. p. 4.
Mfogo, V. S., 2023. AIIPot: Adaptive Intelligent-Interaction Honeypot for IoT Devices. p. 7.
Qiu, T., 2020. An Adaptive Social Spammer Detection Model with Semi-supervised Broad
Learning. p. 14.
Sankaran, K. S., 2023. Deep learning-based energy efficient optimal RMC-CNN model for
secured data transmission and anomaly detection in industrial IOT. Sustainable Energy
Technologies and Assessments , Issue 4 January 2023, p. 8.
29
Sumadi, F. D. S., 2022. SD-Honeypot Integration for Mitigating DDoS Attack Using Machine
Learning Approaches. INTERNATIONAL JOURNAL ON INFORMATICS VISUALIZATION,
Issue March 2022, p. 6.
Tsochev, G., 2021. Using Machine Learning Reacted with Honeypot Systems for Securing
Network. International Conference AUTOMATICS AND INFORMATICS, Issue October 02,
2021, p. 4.
Vishwakarma, R., 2019. A Honeypot with Machine Learning based Detection Framework for
defending IoT based Botnet DDoS Attacks. Third International Conference on Trends in
Electronics and Informatics (ICOEI 2019), p. 6.
Vishwakarma, R. and Jain, A. K. (2019) “A Honeypot with Machine Learning based Detection
Framework for defending IoT based Botnet DDoS Attacks,” in 2019 3rd International
Conference on Trends in Electronics and Informatics (ICOEI). IEEE, pp. 1019–1024
AlMahmeed, Y. S. and Al-Omay, A. Y. (2022) “Zero-day attack solutions using threat hunting
intelligence: Extensive survey,” in 2022 International Conference on Data Analytics for Business
and Industry (ICDABI). IEEE, pp. 309–314.
Shahid, W. B. et al. (2022) “A deep learning assisted personalized deception system for
countering web application attacks,” Journal of information security and applications,
67(103169), p. 103169. doi: 10.1016/j.jisa.2022.103169
Lee, S. et al. (2021) “Classification of botnet attacks in IoT smart factory using honeypot
combined with machine learning,” PeerJ. Computer science, 7(e350), p. e350. doi:
10.7717/peerj-cs.350.
Ahmad, R. and Alsmadi, I. (2021) “Machine learning approaches to IoT security: A systematic
literature review,” Internet of Things, 14(100365), p. 100365. doi: 10.1016/j.iot.2021.100365.
Hamza, A. A. et al. (2022) “HSAS-MD analyzer: A hybrid security analysis system using model-
checking technique and deep learning for malware detection in IoT apps,” Sensors (Basel,
Switzerland), 22(3), p. 1079. doi: 10.3390/s22031079.
Gyamfi, E. and Jurcut, A. (2022) “Intrusion detection in Internet of Things systems: A review on
design approaches leveraging multi-access edge computing, machine learning, and
datasets,” Sensors (Basel, Switzerland), 22(10), p. 3744. doi: 10.3390/s22103744.
Sharma, S., Lone, F. R. and Lone, M. R. (2020) “Machine learning for enhancement of security
in internet of things based applications,” in Security and Privacy in the Internet of Things. 1st
Edition. Chapman and Hall/CRC, pp. 95–108.
Jha, C. K., Biswas, S. S. and Nafis, M. T. (2023) “A comprehensive system for smart homes with
a minimalist information security framework,” in Information and Communication Technology
for Competitive Strategies (ICTCS 2021). Singapore: Springer Nature Singapore, pp. 401–411.
Scott, E. et al. (2022) “Optimising user security recommendations for AI-powered smart-homes,”
in 2022 IEEE Conference on Dependable and Secure Computing (DSC). IEEE, pp. 1–8.
Ali, S. S. and Choi, B. J. (2020) “State-of-the-art artificial intelligence techniques for distributed
smart grids: A review,” Electronics, 9(6), p. 1030. doi: 10.3390/electronics9061030.
Amraoui, N. and Zouari, B. (2022) “Securing the operation of Smart Home Systems: a literature
review,” Journal of reliable intelligent environments, 8(1), pp. 67–74. doi: 10.1007/s40860-021-00160-3.
Viegas, E. K. et al. (2023) “A dynamic machine learning scheme for reliable network-based
intrusion detection,” in Advanced Information Networking and Applications. Cham: Springer
International Publishing, pp. 439–451.
Kavitha, A. and Priyanka, R. (2022) “Analysis of novel face recognition system to minimize the
false identification rate using fast Fourier transform in comparison with wavelet transform,”
in 2022 14th International Conference on Mathematics, Actuarial Science, Computer Science
and Statistics (MACS). IEEE, pp. 1–5.
El Kamel, N. et al. (2020) “A smart agent design for cyber security based on honeypot and
machine learning,” Security and communication networks, 2020, pp. 1–9. doi:
10.1155/2020/8865474.
Koroniotis, N., Moustafa, N. and Sitnikova, E. (2019) “Forensics and deep learning mechanisms
for botnets in internet of things: A survey of challenges and solutions,” IEEE access: practical
innovations, open solutions, 7, pp. 61764–61785. doi: 10.1109/access.2019.2916717.
Joseph, T. A. and Jayapandian, N. (2022) “Detection of various security threats in IoT and cloud
computing using machine learning,” in 2022 International Conference on Sustainable Computing
and Data Communication Systems (ICSCDS). IEEE, pp. 996–1001.
Meera, A. J., Kantipudi, M. V. V. P. and Aluvalu, R. (2021) “Intrusion detection system for the
IoT: A comprehensive review,” in Advances in Intelligent Systems and Computing. Cham:
Springer International Publishing, pp. 235–243.
Dowling, S., Schukat, M. and Barrett, E. (2019) “Using reinforcement learning to conceal
honeypot functionality,” in Machine Learning and Knowledge Discovery in Databases. Cham:
Springer International Publishing, pp. 341–355