
EAI Endorsed Transactions on Internet of Things
Review Article

Water Quality Estimation and Anomaly Detection: A Review
Deniz Balta1, Seda Balta Kaç2, Musa Balta3 and Süleyman Eken2,∗
1Sakarya University, Department of Software Engineering, Sakarya 54050, Turkey
2Kocaeli University, Department of Information Systems Engineering, Izmit 41001, Turkey
3Sakarya University, Department of Computer Engineering, Sakarya 54050, Turkey

Abstract

Critical infrastructures provide irreplaceable services and rely on industrial control systems (ICS); when the information in these systems is corrupted, the result can be great economic loss, security vulnerabilities, and disruption of public order. These ICSs, which were previously isolated, now contain online sensors, wireless networks, and artificial intelligence technologies. This has widened the scope of attacks available to malicious actors who intend to carry out industrial espionage and sabotage these systems. In this study, water quality estimation systems and anomaly detection are comprehensively examined. To this end, we discuss the statistics of the studies in the literature, the methods for water quality anomaly detection, the existing data sets, and the difficulties encountered in water systems on the way to better water management. The principal findings of this research can be summarized as follows: (i) new methodologies and architectures have improved water quality assessment through anomaly detection, (ii) different datasets including multi-modal information have been presented, and (iii) remaining challenges and prospects have been investigated.

Received on 01 August 2023; accepted on 14 October 2023; published on 18 October 2023


Keywords: Water quality, Anomaly detection, Water management systems, Water analytics, Water distribution networks,
Machine learning
Copyright © 2023 D. Balta et al., licensed to EAI. This is an open access article distributed under the terms of the CC
BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any
medium so long as the original work is properly cited.
doi:10.4108/eetiot.v9i4.3660

∗ Corresponding author. Email: suleyman.eken@kocaeli.edu.tr
EAI Endorsed Transactions on Internet of Things | Volume 9 | Issue 4

1. Introduction

About 71% of the Earth's surface is covered by water, which is vital to all known life forms. Treating drinking water is a major task for water supply companies around the world, and the vulnerability of drinking water to possible attacks is a serious problem. In recent years, with accelerating industrialization, water demand from intensive human activity, agriculture, and other sectors has increased in most countries. Environmental pollution and variable climatic conditions have brought problems such as rising water resource use, deteriorating water quality, and sabotage. In addition, water management systems are a research area that many developed countries prioritize. Accordingly, for the sustainable management of water resources, methods that require advanced technology in areas such as measurement, monitoring, and control systems need to be defined and put into practice [1–3]. Water management systems are therefore central to the protection of critical infrastructures, for several reasons. The first is the quality of the water distributed: even a slight deterioration in water quality directly affects the health of many people. The second is that, unlike certain other infrastructures where physical access to key assets can be restricted, water management systems have a large number of remote stations that are difficult to control and to safeguard from unintentional or intentional contamination incidents, and there are very few defense mechanisms in case of pollution. Various techniques [4, 5] can be used to model the distribution of pollutants, but the
large and complex topologies of water distribution systems make these techniques difficult to apply. To improve emergency response capacity and safeguard water quality from the risks posed by intentional or unintentional contamination, it has become essential to develop an effective detection method that spots changes and anomalies in water quality and provides rapid early warning of potential hazards.

The detection of intentional or unintentional contamination events that threaten the safety of water management systems, together with the systems that prevent them, is widely studied in the literature. These studies address the security of water management systems from many different angles, such as water quality determination and anomaly detection, placement of water quality sensors and SCADA (Supervisory Control and Data Acquisition) security, pollutant detection, modeling of intervention and mitigation methods, and the development of artificial intelligence and machine learning supported anomaly detection models for more complex situations [6, 7].

Water management systems are cyber-physical systems in which physical processes work together with computational engineering systems. In these systems, water quality measurement and water management are controlled by a SCADA system composed of sensors, actuators, programmable logic controllers (PLCs), remote terminal units (RTUs), and similar field devices. Recent cyber-physical incidents show that SCADA-based water management systems are susceptible to cyber attacks and are among the leading critical infrastructures. It is therefore clear that these water management systems need tools that can detect anomalies in water quality, evaluate the risk to the cyber-physical system, and support the prevention of and response to cyber-physical attacks [8].

1.1. Smart city and its water management perspectives

The study and creation of applications for smart cities have become hot topics in recent years. Although the concept of the smart city (connected cities, intelligent cities, digital cities, etc.) was first proposed in the 1990s, recent technological advancements driven by big data and AI have accelerated the adoption of these applications. These applications, made available by city governments, give residents amenities that can make daily living easier [9]. According to Bellini et al. [10], who manually categorized the applications for smart cities, there are eight main classes: smart governance, smart economy, smart facilities, smart transport, smart energy [11, 12], smart industry and production, smart environment (like smart water), and smart healthcare.

A smart water city enhances the quality of life of its residents by using Information and Communications Technology (ICT) and other technologies to address urban water issues at every stage of the urban water cycle. Research in this area can be classified into six categories: (i) management of alternative water resources and reuse, (ii) sustainability, (iii) water 4.0, (iv) sanitation and value-adding, (v) quality, and (vi) networks [13]. The restoration of the water cycle, waterfront use, and intelligent water management help to improve overall water management, in addition to offering individualized solutions for traditional water management practices including drainage, water treatment, and wastewater treatment. In a smart water city, ICT-based intelligent technologies augment and supplement existing infrastructure and water management technologies on a broad scale [14].

1.2. AI-driven next-generation cyber-physical systems

Next Generation Cyber-Physical Systems (NG-CPS) have become complex, autonomous, sophisticated, and pervasive as a result of the gradual integration of technology. As a result, both academia and industry are interested in today's NG-CPS, which include the Internet of Things (IoT), cyber components, the Internet of Vehicles (IoV), Intelligent Implantable Medical Devices (IMDs), etc. NG-CPS offer a number of opportunities for service providers (stakeholders in the industry and the market) as well as for consumers (clients). Although NG-CPS technology has many benefits, it also presents a number of difficulties for the involved parties, including reliability, security, and interoperability [15].

The literature has offered a number of ways to address these issues with NG-CPS technology; however, they do not seem able to recognize newly emerging risks. Designing trustworthy AI-driven solutions for NG-CPS technology is therefore imperative if these issues are to be handled effectively [16, 17]. Because AI-driven solutions have the potential to foresee and detect both existing threats and newly emerging ones, they should be employed as an alternative to the approaches in the existing literature. To create new AI-driven solutions for this developing technology, researchers and industry experts must collaborate [18].

1.3. Anomalies and their types

There are numerous definitions of anomalies, with varying levels of specificity. Anomalies are typically thought to be infrequent in comparison to non-anomalous observations in a dataset and to deviate from the norm in terms of their attributes. Any water-quality value or set of data that is the result of a manufacturing


defect in the in-situ sensor equipment is considered an anomaly in this study. In collaboration with the end user, Leigh et al. [19] established the categories of anomalies that are expected to arise in water-quality data. These include drift, clusters of spikes, missing values, large and small sudden spikes, low variability (including persistent values), constant offsets, sudden shifts, high variability, impossible values, out-of-sensor-range values, and others [20]. The main groups of anomalies can be summarized as follows. (i) Point anomaly: a single data instance deviates from the typical pattern of the overall dataset. (ii) Contextual/conditional anomaly: data instances that are anomalous only in a given context. (iii) Collective anomaly: a group of data instances that, taken together, behaves abnormally compared to the full dataset. The anomaly types can be assigned to these groups as given in Table 1.

Table 1. Types of anomalies and their groups

Anomaly type                        Group
Large sudden spike                  Point or collective anomaly
Low variability/persistent values   Contextual anomaly
Constant offset                     Point or collective anomaly
Sudden shifts                       Point or collective anomaly
High variability                    Collective anomaly
Impossible values                   Contextual anomaly
Out-of-sensor-range values          Contextual anomaly
Drift                               Collective anomaly
Clusters of spikes                  Collective anomaly
Small sudden spike                  Point anomaly
Missing values                      Point or collective anomaly

1.4. Motivations and objectives of this review

Much of the research on outlier or anomaly identification has been organized and classified in a few recent survey publications, with a focus on the research challenges that still require attention. Fanaee-T and João [21] highlight the potential of tensor-based techniques as a cutting-edge method for the detection and identification of anomalies and failures in interdisciplinary activities. Sebestyen and Hangan [22] discuss the difficulties and potential solutions associated with putting computer-based anomaly detection systems into practice through a number of case studies. Dogo et al. [23] conduct a thorough literature review to determine the current ML approaches used to address the water quality anomaly detection (WQAD) problem, highlight the drawbacks and limitations of these approaches, suggest a hybrid DL-ELM framework for WQAD that could be further investigated, and suggest future research directions. Using remote sensing, Sagan et al. [24] examine existing trends and improvements in water quality. They also identify and assess a variety of widely used estimation techniques across data sources and datasets, and point out the shortcomings of the current system as well as prospective improvements. Jiang et al. [25] provide a general systematic framework to analyze the dynamics of river water quality in depth by combining high temporal resolution observations with Fourier and wavelet spectrum analysis. Ahmed et al. [26] examine the issue from a number of angles, covering cutting-edge technologies such as the Internet of Things (IoT) and machine learning approaches to water quality as well as the traditional methods of measuring water quality. After examining the present options, the authors suggest a low-cost, IoT-based system that uses machine learning techniques to track trends in water quality and identify unusual events. Gupta et al. [27] offer a summary of data analytics platforms appropriate for diverse Environmental Science and Engineering (ESE) research applications. Using three example case studies, they demonstrate recent ML algorithm implementations in the ESE sector; one of these case studies is the detection of anomalies in continuous data generated by engineered water systems. Shi et al. [28] review applications of online UV-Vis spectrophotometers to drinking water quality management over the previous two decades. Table 2 compares this review with other review and survey papers.

Table 2. Comparison with other review and survey papers

Paper                        Year   Cyber-Physical System        Focus
Fanaee-T and João [21]       2016   None                         Anomaly Detection
Sebestyen and Hangan [22]    2017   All                          Anomaly Detection
Dogo et al. [23]             2019   Smart Water Grids            Water Quality and Anomaly Detection
Sagan et al. [24]            2020   None                         Water Quality
Jiang et al. [25]            2020   None                         Water Quality
Ahmed et al. [26]            2020   IoT-based low-cost system    Water Quality
Gupta et al. [27]            2021   Engineered Water Systems     Anomaly Detection
Shi et al. [28]              2022   None                         Water Quality
Ours                         2023   Water Management Systems     Water Quality and Anomaly Detection

The main objectives of this review are as follows.

• To conduct a thorough literature analysis to determine the most recent methods for estimating water quality and spotting anomalies.

• To draw attention to the flaws and limitations of these existing techniques.

• To present the remaining challenges and recommend future research directions.

1.5. Paper organization

The rest of this paper is organized as follows: Section 2 focuses on water management system architectures with their components and cyber security in water quality. Section 3 describes the methodology for the systematic review process according to PRISMA 2020. Section 4 deals with the materials and methods for water quality and anomaly detection. Section 5 presents the remaining challenges and prospects. In the last section, we conclude the paper.

2. Water Management Systems and Water Quality

2.1. Water management systems architecture and components

Water management facilities under the control of local governments should be organized according to the design principles and norms set by the World Health Organization. For this reason, drinking water treatment plants are built to defined standards for purifying water supplied from surface and underground water resources, and they are generally monitored by SCADA systems. All equipment is designed to be controlled from a single center. An automation program is prepared for the facility; all processes run in the computer environment under its control, and all data about the facility are continuously recorded through this program. Regional control panels are installed so that the electro-mechanical parts of the units in the facility can be operated locally; when needed, operators can intervene in a unit from its own panel.

The automation program contains all the information needed for effective operation (flow rates, levels, pressures, temperatures, dissolved oxygen concentration, pH values, and other concentrations). Major automatic quality control and measurement equipment (flow meters, pH meter, turbidity meter, residual chlorine analyzer) is checked every day, calibrated if necessary, and renewed, and the instructions for use are followed. In drinking water treatment plants, all units and main equipment are connected to the automation system. An alarm system allows the necessary measures to be taken in case of malfunction and notifies the control room.

Figure 1. Smart Water Management System

Thanks to smart technology, traditional water management systems gain the instrumentation to measure and record data, the connectivity to stay in touch with system administrators, and the ability to quickly analyze the current situation and respond to and solve problems intelligently (Fig. 1). Smart water management can generally be defined as managing water intelligently, efficiently, and sustainably. Smart water systems are designed to collect all kinds of water-related data such as flow, leakage, pressure changes, transmission, and current chemical parameters and levels, using technologies such as sensors, wireless communication devices, and control units. Analyzing the collected data in this way supports efficient use of the resource. In general, there are four components of smart water management technology, as shown in Table 3: digital output devices such as meters and sensors, SCADA systems, geographic information systems (GIS), and related software. These components serve various purposes. For example, with digital output devices, water quality can be monitored instantly, leakage and pressure can be detected in real time, asset management can be provided, and consumption can be measured with smart water meters. With SCADA systems, operations such as optimization of pumping stations, control of treatment and drinking water facilities, and environmental controls can be performed, and processes can be controlled remotely by processing and optimizing the information obtained. With GIS, information about the environment can be collected, managed, or analyzed; this enables asset management of a water management system, management and analysis of environmental data, and integrated network models. Related software is used to store, report, or use data collected by the other components. For example, by working in integration with GIS and SCADA systems, this software makes it possible to manage water networks and detect possible attacks. It thereby facilitates decision-making and risk management when modeling the infrastructure and environmental systems of water management systems. Making water management systems smart, that is, integrating them with information and communication technologies, increases efficiency and performance, but it also leaves the infrastructure vulnerable to cyber threats, because the thousands of sensors added to the system are controlled over a network, and if an attacker

gains access to the network, the security properties of confidentiality, integrity, and availability will be violated. Recent studies have not yet produced a complete solution, owing to the difficulty of testing on real systems, limited computing resources, existing architectures that do not adapt to change, re-usability problems, and limitations in communication. In summary, more security studies on smart water management systems are needed to obtain new and sustainable solutions.

Table 3. Components of Smart Water Management Technologies

Component: Digital outputs (meters and sensors)
Purpose: Collecting and transmitting information in real time.
Application examples:
• Rain meters, flow meters, water quality monitoring and other environmental data
• Acoustic devices for real-time leak detection
• Video camera for asset management
• Smart water meters to measure consumption
• Pressure monitoring for leak detection and pump optimization

Component: SCADA systems
Purpose: Process information and remotely operate and optimize systems and processes.
Application examples:
• Pressure management
• Pump station optimization
• Water treatment plant control
• Sewage treatment plant control
• Environmental controls, reservoirs, flows, etc.

Component: Geographic Information Systems
Purpose: To store, manage, process and analyze spatial information.
Application examples:
• Asset mapping and asset management
• Fully integrated network models
• Environmental data analysis and management

Component: Software
Purpose: To store, use and report data. To model infrastructure and environmental systems to improve design, decision making and risk management.
Application examples:
• Often integrated with GIS and/or SCADA systems to manage water networks, control pressure, monitor leakage
• Improved decision making and risk management
• Customer databases
• Intelligent metering, billing and collections
• Hydraulic design and optimization
• Water resources and hydrological modeling for water security
• Cloud-based data management and hosting options

2.2. Water quality process

Water quality is defined as an indicator of the physical, biological, and chemical properties of water. Changes in water chemistry can occur due to natural disasters such as earthquakes, terrorist attacks, or man-made pollution. Today, water companies use pollution warning systems to control drinking water quality. With these systems, they regularly monitor the relevant water quality and environmental data at various measurement points, using different sensors. In practice, institutions with water management systems measure and track many water quality parameters, but they do not analyze them together when determining water quality; instead, observations are made by looking at a few selected parameters one by one. The established systems therefore also need a monitoring system that accurately reports water quality changes by analyzing all measured parameters together. In other words, an adequate and accurate alarm system that enables early detection of any changes is a basic requirement for the provision of clean and safe drinking water.

A distribution system's water quality monitoring process is delicate and extremely complex, and it is influenced by a variety of factors. It is difficult to anticipate the water quality at a particular stage in the system's life because water quality data arrive from various sources and treatment facilities and because water follows diverse pathways through the system. Additionally, there is a lot of data pollution because the data produced differ from one another. Therefore, to ensure comparability of the produced data, a certain degree of standardization is required. When problems arise with the production methods of water-related data, duplicate data production, and data sharing, institutional capacities for data collection, storage, and analysis at the local level prove insufficient and the data cannot be recorded adequately. This situation calls for effective log/data management in water management systems. In addition, the inaccessibility of data in digital environments and similar issues are the main problems in the production and use of water-related data worldwide.

When determining the quality of water, it is necessary to know where and for what purpose the water is used (such as drinking water, industry, agriculture, and the energy sector) and where the water comes from (rivers, lakes, coastal-transitional waters, and groundwater); both play a role in determining water quality standards. For example, when determining the quality of water to be used for agriculture, parameters such as salinity and ion toxicity are involved, whereas for drinking water, parameters such as pH, the amount of chlorine, and dissolved oxygen should be considered. In water quality management, risk assessments generally cover the possible effects of pollutants in water resources on human health and the aquatic ecosystem, the analysis and rating of this risk, and the measures to be taken to prevent negative effects. General water quality standards are set worldwide so that water resources can reach a good level of quality. The Environmental Protection Agency (EPA) has launched a significant push to create powerful, thorough, and completely integrated surveillance and monitoring systems, including global water quality data, that allow for the early identification and awareness of diseases, pests, and dangerous substances [29]. In this direction, environmental quality standards have been determined with reference to the World Health Organization (WHO) for EU priority substances for water quality in water management systems and for country-specific pollutants [1, 2, 30, 31]. The Guidelines for drinking-water quality (GDWQ), the first version of which was published in 1958, is the international reference point used to establish national and regional regulations on water quality and includes an assessment of the health risks posed by various microbial, chemical, radiological, and physical contaminants that may be present in drinking water. In the literature, drinking water quality is generally determined by the analysis


of various parameters. The physico-chemical parameters that are of vital importance for institutions, that determine water quality by direct measurement, and that are widely used in the literature to determine water quality are given in Table 4.

Kang et al. [32] examined big data analytics studies applied in the field of water quality. These studies were classified and compared according to big data prediction models, including artificial neural networks, Radial-based Function Network (RBFN), Deep Belief Network, Decision Trees, Improved Decision Trees, and Least Squares Support Vector Machine. In addition, the parameters affecting water quality according to the standards in the related study were grouped under different subheadings. Lu et al. [5] collected data from the Tualatin River, one of the world's most polluted rivers, and estimated water quality indicators such as water temperature, dissolved oxygen, pH value, specific conductivity, turbidity, and fluorescent dissolved organic matter (FDOM). XGBoost and Random Forest (RF) models that account for data noise were proposed for forecasting systems. The proposed models were then compared with classical models (PSO-SVM, RBFNN, LSSVM, LSTM) under different metrics. The proposed RF model performed best in estimating temperature, dissolved oxygen, and specific conductivity. Chawla et al. [33] used regression and machine learning models such as linear regression, random forest, support vector machine (SVM), and long short-term memory (LSTM) to predict the salinity level of the Salton Sea and its future trend. Parameters such as temperature, conductivity, specific conductivity, dissolved oxygen, and salinity were studied. Selim et al. [34] present a study on water quality analysis using the Internet of Things and big data analytics. An IoT-based model is proposed that considers the parameters affecting water quality, such as Oxidation Reduction Potential (ORP), dissolved oxygen (DO), pH, Electrical Conductivity (EC), and turbidity. The study also discusses what needs to be considered to make the data read through these devices meaningful. Nemade and Shah [35] first cleaned the dataset collected using IoT sensors by removing missing values and outliers. They then proposed the G-SMOTE technique, which hybridizes SMOTE and a genetic algorithm, to solve the unbalanced data set problem. In the proposed system, the usage area of water is determined using a modified deep learning neural network (MDLNN) classifier. Jin et al. [36] estimate surface water quality to provide real-time early warnings based on past observation data. An improved genetic algorithm (IGA) and a back propagation neural network (BPNN) are integrated into the data-driven model. The genetic algorithm was used to optimize suitable initial weight parameters, and the BPNN was applied to adjust suitable connection architectures and determine the characteristics of water quality variation.

3. Methodology for Systematic Review

The goal of this systematic review is to summarize the current state of knowledge and identify areas that future studies should prioritize. To ensure a comparable and thorough outcome, we follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement [37] (see Fig. 2), which was created specifically to offer detailed reporting guidelines for such reviews. This process typically has four steps: (i) Identification, (ii) Screening, (iii) Eligibility, and (iv) Inclusion.

3.1. Identification of sources and search terms

Scopus, Web of Science, and DBLP were the main online databases used in the search strategy to find publications. These are the most popular libraries for conference proceedings and journal papers in the field of water quality estimation and anomaly detection in water management systems. We also used Google Scholar, which returns articles covered by these databases as well as relevant publications that appear elsewhere. The databases were searched using suitable keywords and keyword combinations such as ["water quality" && "anomaly detection"]. The search was restricted to the years 2012 through 2022, which narrows the scope of our meta-analysis to more recent publications. The search string for each database is displayed in Table 5. Advanced searching was employed to weed out irrelevant papers when a basic database search produced a large number of results.

3.2. Screening

The papers from the Identification stage that received the highest ratings were manually annotated. The degree to which water quality and anomaly detection were discussed or explained in each publication was the central qualifying question for the screening procedure. The publication's relation to the subjects described above was another criterion for this screening phase. Using a second keyword annotation procedure, we organized the final collection by classifying each manuscript according to a more specific set of categories based on its title, abstract, and keywords. The full text was reviewed at this point only if the publication could not be categorized from these three elements.
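
The keyword annotation step described above can be illustrated with a short Python sketch. It is only a minimal illustration, not the procedure the authors actually ran: the `Record` type, the helper names, and the entries in `CATEGORY_TERMS` are hypothetical, and plain substring matching is assumed in place of whatever annotation tooling was used. A record passes the first check when its title, abstract, and keywords together contain both phrases of the search string, and it is then assigned every category whose terms appear in the same text.

```python
from dataclasses import dataclass

# Phrases taken from the search string in Section 3.1.
REQUIRED_PHRASES = ("water quality", "anomaly detection")

# Hypothetical categories for the second annotation pass (assumption,
# not the category set used in the review).
CATEGORY_TERMS = {
    "machine learning": ("machine learning", "neural network", "deep learning"),
    "security": ("scada", "cyber", "attack"),
    "sensing": ("sensor", "iot", "internet of things"),
}


@dataclass
class Record:
    """Bibliographic record reduced to the three screened elements."""
    title: str
    abstract: str = ""
    keywords: tuple = ()


def searchable_text(record):
    """Join title, abstract, and keywords into one lowercase string."""
    return " ".join([record.title, record.abstract, *record.keywords]).lower()


def matches_search_string(record, phrases=REQUIRED_PHRASES):
    """True when every required phrase occurs in the screened elements."""
    text = searchable_text(record)
    return all(phrase in text for phrase in phrases)


def categorize(record, category_terms=CATEGORY_TERMS):
    """Return the sorted names of all categories with at least one matching term."""
    text = searchable_text(record)
    return sorted(
        name
        for name, terms in category_terms.items()
        if any(term in text for term in terms)
    )
```

A record that cannot be categorized this way (an empty result from `categorize`) corresponds to the case in which the full text has to be read.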


Figure 2. PRISMA 2020 flow diagram

Identification:
• Records identified from databases (n = 3,219)
• Records removed before screening: duplicate records removed (n = 150); records marked as ineligible by automation tools (n = 0)

Screening:
• Records screened (n = 119); records excluded (n = 3,100)
• Records sought for retrieval (n = 119); records not retrieved (n = 0)
• Records assessed for eligibility (n = 88); records excluded: not English (n = 12), type of text (n = 11), not closely related (n = 8)

Included:
• Studies included in review (n = 88)

3.3. Eligibility and Inclusion

This section outlines the procedures we used to select the final group of papers for this review. The following selection criteria were applied:
• Must address anomaly detection and water quality in water management systems.
• Must have technical content.
• Must have undergone peer review and been published in a workshop, conference, or international journal.

3.4. Results

A total of 3,219 publications were found by searching the online databases. To facilitate further investigation, their information was exported as a CSV file. After duplicates were removed, the remaining peer-reviewed articles that had appeared in internationally renowned conferences, seminars, or publications were picked for more in-depth analysis. The list of publications eligible for analysis was selected by reading the title and abstract and skimming the text in accordance with the inclusion and exclusion criteria. As a result, a selection of 78 papers to be examined in order to address the study themes was finalized.
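The duplicate-removal step described above can be illustrated with a minimal sketch. It assumes a CSV export with `source`, `title`, and `doi` columns; these column names, the sample rows, and the matching rule (DOI first, otherwise a normalized title) are hypothetical and are not taken from the study. Only the per-source totals are copied from Table 5.

```python
import csv
import io

# Per-source totals as reported in Table 5.
TABLE5_COUNTS = {"Scopus": 97, "Web of Science": 65, "DBLP": 7, "Google Scholar": 3050}


def normalize_title(title):
    """Lowercase a title and keep only letters and digits so near-identical entries match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def deduplicate(records):
    """Keep the first record for each DOI or, when the DOI is empty, each normalized title."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.get("doi") or "").strip().lower() or normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical export: rows 1-2 share a DOI, rows 3-4 share a title up to punctuation.
SAMPLE_CSV = """source,title,doi
Scopus,Water Quality Anomaly Detection with LSTM,10.1000/example.1
Web of Science,Water quality anomaly detection with LSTM,10.1000/example.1
DBLP,Anomaly Detection in Water Distribution Networks,
Google Scholar,Anomaly detection in water distribution networks.,
Scopus,Sensor Placement for Water Networks,10.1000/example.2
"""

records = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
unique_records = deduplicate(records)
print(len(records), "exported,", len(unique_records), "after duplicate removal")
print("Table 5 total:", sum(TABLE5_COUNTS.values()))
```

The Table 5 totals sum to the 3,219 records reported as identified. Matching on DOI and normalized title is only one possible rule; records that differ in subtitle or transliteration would still need manual checking.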

Table 4. Water quality physico-chemical parameters and explanations

Parameter | Explanation | Ref. Range (WHO)
pH | It measures the concentration of hydrogen ions. It is important for water purification. | 6.5-8.5
Turbidity (FNU) | It measures how clean the water is. Its presence in the water is undesirable. | 5 NTU
Chlorine (mg/L) | Free chlorine is added to disinfect water. It inactivates pathogenic bacteria. Free chlorine levels decrease over time, so the Cl2 levels of the water in the tanks and of the water from the treatment are different. | 5 mg/L
Total organic carbon | It measures the concentration of organic matter in water. It can decrease over time because of decomposition of organic substances in the water. | -
Specific conductance | It estimates the amount of dissolved salts in water. It is generally stable in water from the same source, but there can be significant changes in conductivity as waters mix. | -
Water temperature (°C) | The temperatures of surface waters are naturally determined by the climate. Temperatures of different water sources are different. It is usually constant, but changes with the seasons. | 7-12 °C (ideal)
Dissolved oxygen (mg/L) | The amount of dissolved oxygen is a very important parameter in water pollution and wastewater treatment. It depends on the temperature of the water, the partial pressure of oxygen in the atmosphere, the organisms that provide oxygen to the water, and the mineral concentration in the water. Low dissolved oxygen is critical to aquatic life, and a low level is one of the most important indicators of water pollution. | 4-5 mg/L

Table 5. Search string used for each data source

Source | Total paper | Search string
Scopus | 97 | "water quality" && "anomaly detection"
Web of Science | 65 | "water quality" && "anomaly detection"
DBLP | 7 | "water quality" && "anomaly detection"
Google Scholar | 3050 | "water quality" && "anomaly detection"

A software program called VOSviewer [38] is used to visualize and explore maps made from network data obtained from these papers. Country-based and abstract networks are shown in Figs. 3a and 3b, respectively. Each item is depicted with a circle and a label, and both grow in size in proportion to the item's weight.

Figure 3. VOSviewer-based visualization of the papers: (a) Country-based network, (b) Abstract network

4. Materials and Methods

4.1. Global description of the datasets

It is not safe to test attacks on cyber-physical systems, or the intrusion detection and intrusion prevention systems designed against them, on real physical systems. Researchers therefore often use platforms that simulate real systems or real cyber-physical test environments. Cyber-physical environments called testbeds have been established in about 30 countries for various needs such as vulnerability analysis, training, development, and testing of defense mechanisms. iTrust [39] for cyber security research at Singapore University of Technology and Design, the Sakarya University Critical Infrastructure National Test Bed center (CENTER) [40] in Turkey, the Mississippi State University (MSU) SCADA Security Lab [41], the Technical Assessment Research Lab in China [42], and the SCADA testbed recently built at the University of New Orleans, USA [43] are the most popular centers that provide opportunities for studies by offering cyber-physical environments created for critical infrastructures. Apart from these, cyber security studies on water management systems are also carried out on simulation platforms or small-scale cyber-physical test environments (e.g., EpanetCDA [44], Facies [45], WaterBox [46]). Among the cyber-physical test environments, the most respected and popular are the Secure Water Treatment (SWaT) and the Water Distribution (WADI™) test environments located at the iTrust center. Therefore, among the publicly accessible datasets in the literature on water management systems, these are the ones in which all kinds of scenarios are tried and the most realistic data is obtained. The SWaT architecture, which is designed around a 6-stage water treatment process, aims to test a small series of cyber-attacks in the test area and to develop defense mechanisms against them, using carefully designed experiments that ensure no damage to the physical system. In this context, the SWaT dataset, covering 7 days of normal operation and 4 days with attack scenarios, was systematically generated from the test area [47]. Designed as an extension of SWaT, the WADI test environment, in addition to the attacks and defenses therein, can simulate the effects of physical attacks such as water leaks and malicious chemical injections. Likewise, the WADI dataset contains data from 123 sensors and actuators


collected over 14 days of normal operation and two days with attacks [47]. In addition to the WADI dataset, BATADAL, which is the result of a competition to objectively compare the performance of algorithms for detecting cyber attacks on water distribution systems, includes one year of normal data without attacks and 6 months of labeled attack data [47].

4.2. Performance metrics

Performance metrics are an important part of machine learning because they indicate whether the analysis has led to progress. Several criteria can be used to evaluate the performance of ML algorithms, both for classification and for regression. How the performance of machine learning algorithms is measured and compared, and how the importance of various features in the result is evaluated, depends entirely on the metric chosen. Therefore, metrics must be chosen carefully to evaluate machine learning performance [48]. The performance metrics used for classification problems are the confusion matrix, accuracy, precision, recall, and F1-score.

The confusion matrix is the easiest way to measure the performance of a classification problem where the output can take two or more classes. It is a two-dimensional table (Figure 4) whose dimensions are "Actual" and "Predicted" and whose cells hold the "True Positives (TP)", "True Negatives (TN)", "False Positives (FP)", and "False Negatives (FN)".

Figure 4. Confusion Matrix

The terms associated with the confusion matrix are as follows. True Positives (TP): both the actual class and the predicted class of the data point are 1. True Negatives (TN): both the actual class and the predicted class of the data point are 0. False Positives (FP): the actual class of the data point is 0 and the predicted class is 1. False Negatives (FN): the actual class of the data point is 1 and the predicted class is 0.

Accuracy is the most common performance metric for classification algorithms. It is defined as the ratio of correct predictions to all predictions made. The formula is as follows.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Precision is the fraction of the positive results returned by the machine learning model that are correct. The formula is as follows.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

Recall is the fraction of the actual positives that are returned by the machine learning model.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

The F1-score is the harmonic mean of precision and recall, which weights the two equally. The best value of F1 is 1 and the worst is 0. The formula is as follows.

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN} \tag{4}$$

Performance metrics that can be used to evaluate predictions for regression problems are Mean Absolute Error (MAE), Mean Square Error (MSE), and R-squared ($R^2$).

Mean Absolute Error (MAE) is the simplest error metric used in regression problems. It is the mean of the absolute differences between the predicted and actual values. The formula is as follows.

$$\text{MAE} = \frac{1}{D}\sum_{i=1}^{D} |x_i - y_i| \tag{5}$$

Here, $x$ and $y$ are the $D$-dimensional vectors of actual and predicted values, and $x_i$ denotes the value on the $i$th dimension of $x$.

Mean Square Error (MSE) is like MAE except that, instead of using the absolute value, it squares the differences between the actual and predicted output values before averaging them. The formula is as follows.

$$\text{MSE} = \frac{1}{D}\sum_{i=1}^{D} (x_i - y_i)^2 \tag{6}$$

The R-squared metric is often used for explanatory purposes and provides an indication of the fitness or


goodness of a set of predicted output values to actual output values. The formula is as follows.

$$R^2 = 1 - \frac{\sum_{i=1}^{D} (x_i - y_i)^2}{\sum_{i=1}^{D} (x_i - \bar{x})^2} \tag{7}$$

Here, $x_i$ and $y_i$ are the actual and predicted values, and $\bar{x}$ is the mean of the actual values.

4.3. Traditional methods

The classical ML techniques that have recently drawn the most interest in water quality management and anomaly detection include logistic regression (LR), support vector machines (SVM), and artificial neural networks (ANN). In addition to these conventional ML methods, statistical techniques are also relied on for identifying anomalies in water quality data. Multivariate methods such as linear discriminant analysis and principal component analysis have also been used in this field of research.

The bulk of these classic ML approaches have limitations due to their large computational memory and time needs, imbalanced anomalous-to-normal data ratios, and sensor signal processing noise. Because of this, they have low levels of accuracy, a high rate of false alarms, poor missing data handling, and a lack of robustness when managing sizable real-time datasets from numerous and diverse sensory sources in high-dimensional data search spaces. As a result, it becomes vital to research additional cutting-edge anomaly detection approaches in order to enhance performance and fix these ML systems' flaws [49, 50].

The fundamental idea behind learning from data is to use a collection of observations to identify an underlying process. More formally, this can be seen as finding a function that maximizes a particular score on the data at hand. This function can be thought of as a rough approximation of the actual, unknown function that specifies the data generation process. We are in a supervised learning setting when the training data (the available data) provides explicit examples of what the desired output should be. In this context, classification refers to the process of giving a label to an observation in order to place it into one of several classes or categories. A classifier is trained on a set of previously labeled observations to establish the proper parameters, and the predicted label for an observation is the result of applying the classifier to that observation. Training a classifier is equivalent to finding a function from a hypothesis set, which includes all feasible functions for the chosen model [51].

Clustering techniques are designed to create clearly distinct clusters that are internally cohesive. When we lack the precise labels matching each observation, clustering is the most popular type of unsupervised learning. In this instance, the data's features will correspond with the labels [52].

4.4. Deep learning

Starting in 2012, deep learning, a new area of machine learning, led to breakthrough advances. Water quality estimation and anomaly detection are particularly well-suited to deep learning for three reasons: deep learning has specialized network types for sequential data that capture temporal structure; it largely automates feature engineering and selection by learning hierarchies of progressively more abstract representations of the inputs, which makes it particularly well-suited to high-dimensional data; and it can learn arbitrarily complex non-linear mappings [23, 53–55].

The primary deep learning (DL) architecture models are deep belief networks (DBN), deep Boltzmann machines (DBM), stacked denoising autoencoders (SDAE), convolutional neural networks (CNN), and recurrent neural networks (RNN). These models have been applied to the analysis of water quality and anomaly detection [56].

4.5. Extreme learning machine

The extreme learning machine (ELM) algorithm was devised in response to the slow learning of feedforward neural networks, which is usually attributed to their iterative training and parameter tuning. The traditional ELM has a three-layer feedforward design. The first layer is the input layer, and the second is the single hidden layer. The hidden layer projects the input to a higher dimensionality using connection weights that are randomly generated and then kept fixed. The hidden layer's outputs are produced by non-linear sigmoid activation functions. The third layer is the output layer, which is linear. The connection weights between the hidden and output layers are trained with a regularized least-squares technique, such as the Moore-Penrose pseudo-inverse, computed from the hidden layer outputs and the desired outputs [57].

In contrast to backpropagation (BP) based neural networks, ELM does not use iterations or parameter adjustment. The ELM algorithm's key advantages are quick training and strong generalization, that is, the capacity to perform well on novel inputs other than those used to train the model. As a result, ML research uses the ELM algorithm extensively. Numerous investigations have been carried out with the goal of enhancing the theoretical and practical performance of the original ELM. Researchers are paying close attention


to ELM as a solution for anomaly detection issues in different areas because of its quick training times and strong generalization abilities [58].

4.6. Reinforcement learning

Apart from these two types (supervised and unsupervised learning), we must distinguish another one that is very different from both: Reinforcement Learning (RL). In RL, instead of having an initial training dataset from which to learn, the learning system, called an agent, interacts with an environment and is responsible for selecting and performing actions, getting rewards or penalties in return. The agent must learn by itself the best strategy (the so-called policy) to get the most reward over time. A policy determines, either probabilistically or deterministically, what action the agent should take when it is in a given situation [59, 60]. Table 6 shows different approaches for water quality estimation and anomaly detection, together with the models and parameters they use.

5. Remaining Challenges and Prospects

5.1. Future sensors

Future water management system sensors will be more accurate, effective, and economical. These sensors might detect, monitor, and analyze water quality using sophisticated machine learning algorithms, which would give more precise, current information regarding water supplies. New sensors might also be employed to monitor the effects of climate change on water supplies in real time, allowing water managers to take preventative action to safeguard their water resources from pollution and other environmental concerns [76, 77].

5.2. Reproducibility

Reproducibility in water management systems is the ability to replicate the same results with the same set of data. This allows for a greater degree of confidence when making decisions based on the data available. Reproducibility also enables researchers and scientists to verify the accuracy of the results they are seeing and make sure they are reliable. By reproducing results, water management teams can ensure that their decisions are based on accurate, reliable data [78, 79].

5.3. Explainable artificial intelligence

Explainability in water management systems refers to the ability to explain the decisions and predictions that have been made by an AI-based system. Explainability can provide insight into why and how an AI-based system has arrived at a certain decision, enabling users to evaluate the accuracy and reliability of the system. This can be used by water management teams to better identify areas of concern and inform decisions about how best to allocate resources and solve issues related to water quality, safety, and sustainability. Recent initiatives to increase the explainability of black-box models fall under the purview of explainable AI (XAI) research. They include analysis tools such as DeepLIFT [80], RISE [81], SHAP [82], and LIME [83].

5.4. Contamination diffusion models

Contamination diffusion models are used in water management systems to simulate the transport and spread of water contaminants such as pollutants, chemicals, and viruses. These models are used to predict how contaminants will move over time, allowing water managers to identify areas of potential contamination and develop strategies to prevent or reduce their impact. The numerical modeling tools chosen for water pollution diffusion in the model base must be established, reliable, all-encompassing, and flexible enough to accommodate various scenarios [84].

5.5. Class imbalance problem

A dataset with an imbalanced distribution of classes, where one or more classes contain more instances than the others, is referred to as having a class imbalance. For instance, in a binary-class situation, the class having the majority of instances is referred to as the majority class, and the class with the minority of instances is referred to as the minority class. Real-world anomalies in water quality are uncommon but interesting occurrences, and forecasting them from imbalanced data using conventional machine learning algorithms is extremely difficult [53]. To solve the class imbalance problem, it is possible to utilize combinations of heterogeneous and homogeneous algorithms, such as bagging, boosting, stacking, and their variants embedded with resampling strategies, as well as optimized DNN models. Data-level, algorithm-level, and cost-sensitive methods can also be utilized. Models that remain robust as class imbalance increases and stable under extreme class imbalance ratios are still a gap in the literature [85].

5.6. Optimal sensor placement problem

The sensor placement problem is an optimization problem that attempts to find the optimal locations for sensors in a given area in order to maximize their effectiveness. This could involve finding the most effective placements for traffic cameras, temperature sensors, or any other kind of sensor. The goal is to strategically place the sensors so they can provide the most accurate readings and insights while minimizing costs [86]. The number of nodes in a water distribution network (WDN) is frequently substantially more than the number of accessible


Table 6. Comparison of different approaches for water quality estimation and anomaly detection

Ref. No. | Year | Computational Model | Parameters | DataSet | Performance
[61] | 2018 | - | flow rate, pressure, pumps status, water levels | BATADAL, BATADAL simulation | model based: 0.99 Acc.; pre-solve: 0.81 Acc.
[5] | 2020 | CEEMDAN-XGBoost, CEEMDAN-RF | temperature, pH, dissolved oxygen, conductance, turbidity, fluorescent dissolved organic matter | Collected Data | X
[62] | 2018 | Random Forest, Decision Tree, Logistic Regression, Naive Bayes, KNN | SCADA network traffic | Generated Dataset | Offline, Online Acc. / Offline, Online FPR: RF 99.98, 99.89 / 0.01, 0; Decision Tree 100, 99.89 / 0, 0; Log. Reg. 99.86, 99.59 / 0.12, 0.16; N. Bayes 99.51, 99.60 / 0.50, 0.31; KNN 100, 72.29 / 0, 0.11
[4] | 2021 | Epanet msx | chlorine, organic carbon, bacteria | OpenWater Analytics | X
[63] | 2020 | SVM, Decision Trees, kNN, Kmeans | network traffic | Generated Dataset | Acc. / F.Meas.: SVM 0.99994 / 0.99991; Dec. tree 0.99994 / 0.9999; kNN 0.99982 / 0.99664; Kmeans 0.99982 / 0.99667
[64] | 2018 | PCA + ANN | WMS equipments | BATADAL | 0.968 Accuracy
[65] | 2020 | MLR, SVR, ELR, pDNN | BGA, Chl, fDOM, DO, SC, turbidity | Simulation, collected data | X
[66] | 2020 | Bayes, Isolation Forest | Turbidity, SC, DO | Collected Data | 0.745 Accuracy
[67] | 2017 | DNN, time series, SVM | X | SWaT | Precision/Recall/F.Meas.: DNN 0.98295/0.67847/0.80281; SVM 0.92500/0.69901/0.79628
[68] | 2019 | SVM | X | Water Resources of China | X
[69] | 2018 | ANN | Temp, specific conductance, dissolved oxygen (DO), pH, turbidity (TURB), nitrate + nitrite nitrogen | Collected Data | >0.98 Accuracy
[70] | 2019 | logistic regression, linear discriminant analysis, SVM, ANN, DNN, LSTM | Temp, Cl, pH, Turbidity | Collected Data | 98.8 Accuracy; F.Meas.: SVM 0.9485, LSTM 0.9023, RNN 0.8345, Log.Reg. 0.6027, LDA 0.0820
[71] | 2021 | CUSUM, RF | pH, turbidity, iron, chlorine | Collected Data | 0.82 Accuracy; 0.84 F.Measure
[72] | 2022 | statistical methods, DL | flow, pressure | Generated Dataset | X
[73] | 2021 | Weighted kNN, C5.0, Discriminant Analysis | pH, turbidity, redox, temp, TOC, Chl | Generated Dataset | 0.97 Accuracy
[33] | 2021 | Linear regression, RF, SVM, LSTM | Temp, DO, pH, turbidity, phosphorus | Collected Data | Multivariate Linear 0.97 Accuracy
[36] | 2019 | GA, BPNN | Nitrogen, turbidity, electro-conductibility | Generated Dataset | X
[74] | 2021 | FFNN, GEP + PSO | magnesium, chloride, sulfate, bicarbonates, specific conductivity, temp | Generated Dataset | X
[19] | 2019 | ARIMA | turbidity, conductivity, level | Generated Dataset | Turbidity 0.86 Accuracy; Conductivity 0.93 Accuracy
[75] | 2019 | Attention-based Spatio-Temporal Autoencoder, ConvLSTM-ED | X | SWaT | Precision/Recall/F.Meas.: ConvLSTM-ED 0.98/0.422/0.578; STAE-AD 0.96/0.815/0.880

sensors. Sensors must therefore be positioned so that they deliver network-wide, globally relevant information. In real applications, moving sensors are utilized more frequently than static ones. It should be considered, though, that this raises the problem's level of complexity.
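One common way to approach placement with fewer sensors than nodes is a greedy coverage heuristic; the sketch below is illustrative only and is not the method of [86]. It assumes a hypothetical mapping from candidate nodes to the contamination scenarios a sensor at that node would detect (the node names `J1` to `J5`, the scenario names, and the budget are invented for the example), and it handles static sensors only.

```python
def greedy_sensor_placement(coverage, budget):
    """Pick up to `budget` nodes, each time adding the node that newly covers the most scenarios.

    coverage: dict mapping a candidate node to the set of scenarios detected from that node.
    Returns the chosen nodes in selection order and the set of scenarios they cover.
    """
    chosen = []
    covered = set()
    for _ in range(budget):
        remaining = [node for node in sorted(coverage) if node not in chosen]
        if not remaining:
            break
        # Ties are broken by node name because `remaining` is sorted and max keeps the first maximum.
        best = max(remaining, key=lambda node: len(coverage[node] - covered))
        if not coverage[best] - covered:
            break  # no remaining node detects anything new
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered


# Hypothetical network: which contamination scenarios each junction would detect.
coverage = {
    "J1": {"s1", "s2", "s3"},
    "J2": {"s3", "s4"},
    "J3": {"s4", "s5", "s6"},
    "J4": {"s1", "s6"},
    "J5": {"s2"},
}

chosen, covered = greedy_sensor_placement(coverage, budget=2)
print("sensors at:", chosen, "| scenarios covered:", len(covered))
```

The greedy rule is a heuristic and does not guarantee the best possible placement. In practice the coverage sets would come from hydraulic and water quality simulations, and costs or detection times would enter the objective.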


5.7. Anomaly event localization

In water distribution networks, it is necessary not only to detect anomalies and behavioral changes in sensor data but also to find the correct position and source of the faults that cause the anomalous behavior (fault localization/anomaly event localization). To produce good results in an anomaly hotspot localization process, the hydraulic model must be built with nodal demands that are sufficiently accurate to reflect actual water consumption, accurate elevations at the locations (nodes) where pressure data are recorded, and accurate boundary conditions, such as service reservoirs, tanks, and pumps [72]. Not only pressure-based anomalies but also other types of anomalies should be investigated. Identification of the source of contamination can be another hotspot that needs to be localized. This addresses the requirement to respond as soon as the contamination is discovered and to implement the necessary defenses to isolate the system component that has been affected.

5.8. Anomaly correction

Anomaly correction is a process of detecting, diagnosing, and correcting anomalies in data. It helps identify any unusual patterns or behavior in datasets that may indicate an error or irregularity. Anomaly correction can be used to improve the accuracy and reliability of data-driven decision-making [87]. The value of the data directly affects the relevance of the detection and correction methods. Sensor data, that is, the information produced by sensors, can be either numerical or categorical. The former behave like numbers: they support mathematical operations, are continuous and scalable, and have a zero. The latter, however, are discrete and do not support mathematical operations. Since categorical data are displayed as a string of symbols, any anomaly may be caused by an unknown symbol or symbol sequence. It should be noted that as processing power improves, the appeal of sensors with categorical output is rising. The inability to perform statistical analysis on such data makes anomaly detection and correction much more difficult.

5.9. Visualization and GUI design

Data visualization is the use of visual components to effectively communicate the relevance of large datasets and to find undiscovered data trends. Charts, graphs, maps, tables, and other visual representations of data are all examples of data visualization. Interactive data visualization, on the other hand, allows users to directly alter plot elements and create connections between several plots. Decision makers, especially those without a background in computer science or statistical analysis, can more easily and swiftly understand analytical data with the help of data visualization. In most cases, the graphical user interface (GUI) is provided by the user interface layer of water management systems, from which users can export and view data, produce summary statistics, and edit data quality [88]. To visualize water management-related data, some issues should be considered: (i) The data organization and analysis process must be done first. (ii) When working with massive datasets, it might be intimidating to try to spot trends by simply looking at the raw data, and it is crucial to present the data in an objective manner. (iii) The third phase involves monitoring data and analyzing trends. (iv) It is crucial to identify the audience before starting to produce infographics, social media posts, or academic outputs using the findings. (v) Science-congruent narratives that are values-driven can help us communicate with the right audiences. (vi) By its most basic definition, graphic design is the art of producing visual content, principally conveying messages through the use of visual hierarchy and page layout strategies. It is ideal to adhere to fundamental visual design guidelines and principles when creating graphics. (vii) The results should be announced to the public. In science, consistency and replication are crucial [89].

5.10. Parallel and distributed computing

The IoT alone is expected to comprise 50 billion connected sensors worldwide by 2025, while the size of data is expanding quickly, at a rate of millions per second. In order to extract knowledge or make an accurate prediction, integrating, analyzing, and mining enormous amounts of data requires an effective and efficient framework and algorithm [90]. Due to the continuous evolution of data streams, detecting anomalies and monitoring water quality at high speed are crucial and challenging tasks [91]. The majority of current and traditional anomaly detection techniques rely significantly on stationary data, and it can take a centralized algorithm hours or even days to compute and identify accurate results. Thus, parallel and distributed computing is critical in reducing execution time to meet the need for real-time or near-real-time detection and monitoring [92].

5.11. Water quality in social multimedia

Social media platforms have emerged as a reliable means of communication and information transmission during the past ten years. They are a favored forum to discuss and express concerns over various domestic and international difficulties because of their capacity to reach sizable audiences globally. Security, disaster response, disease outbreaks [93], and consumer


happiness are all monitored on social media by law enforcement, emergency management agencies, the public health community, and businesses [94, 95]. Although social media monitoring is still relatively new to the water industry, it might be utilized for comparable objectives, given that consumer complaints are a good source for spotting distribution system issues early on.

The Water Research Foundation's 2017 project¹, Social Media for Water Utilities, showed how the water industry lagged behind other sectors, such as the electric industry, in embracing social media. According to the survey, just a small portion of the 60 drinking water and wastewater utilities in the United States with social media profiles were actually using them, and of those that did, only a small portion were able to successfully reach their customer base.

Nowadays, various studies are carried out to monitor water quality through social media [96]. One of them is "Water Quality in Social Multimedia" [95]. The analysis of social media tweets on water quality, security, and safety is the focus of the WaterMM Task. Participants in this task are given a set of Twitter post IDs in order to download the text, the accompanying image, and the metadata of tweets that were chosen using a keyword-based search that included words or phrases about the quality of drinking water (e.g., strange color, odor or taste, related illnesses, etc.). Participants can tackle the task using text features, image features, metadata, or a combination of the above. Some papers using the WaterMM benchmark dataset can be found in [97–99]. Using other social media platforms and collecting multimodal and cross-data will be the main focus of future works.

6. Conclusion

In this study, water management systems and water quality issues from critical infrastructures, cyber security studies on water quality, past cyber attacks on water quality processes, and security requirements are systematically examined. The studies reviewed in this paper are promising, but more work is required for implementation and validation on real water systems. Monitoring water quality in water systems is a highly complicated and critical process influenced by many

contamination, classify water quality changes and anomalies, and ensure early warning in case of potential hazards. It is not easy to create a static set of rules or restrictions that catch major attacks clearly and quickly. Therefore, the use of learning-based anomaly detection techniques is essential for water quality detection in water management systems. In this way, the anomaly detection system will provide a defense mechanism to water management systems, while simultaneously maintaining, repairing, and developing similar critical infrastructures. By developing models based on artificial intelligence and machine learning techniques, it can become possible to predict more stealthy attacks and implement a defense mechanism. These intrusion detection systems still need to be evaluated against real-time water management systems.

Acknowledgement. This study was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) with project number 122E610.

References

[1] Türkiye'nin su politikaları, T.C. Dışişleri Bakanlığı, https://www.mfa.gov.tr/turkiye_nin-su-politikasi.tr.mfa. Accessed: 2023-01-12.
[2] Kucukcelebi, C. (2014) Avrupa Birliği Uyum Sürecinde Türkiye'nin Su Politikası, Su Hukuku ve Su Kaynakları Yönetiminde Yeniden Yapılanmalar. Master's thesis, İstanbul Teknik Üniversitesi.
[3] Health, E. (2013) NSW Guidelines for Drinking Water Management Systems (NSW Government).
[4] Pelekanos, N., Nikolopoulos, D. and Makropoulos, C. (2021) Simulation and vulnerability assessment of water distribution networks under deliberate contamination attacks. Urban Water Journal 18: 1–14. doi:10.1080/1573062X.2020.1864832.
[5] Lu, H. and Ma, X. (2020) Hybrid decision tree-based machine learning models for short-term water quality prediction. Chemosphere 249. doi:10.1016/j.chemosphere.2020.126169.
[6] Chhipi-Shrestha, G., Mian, H.R., Mohammadiun, S., Rodriguez, M., Hewage, K. and Sadiq, R. (2023) Digital water: artificial intelligence and soft computing applications for drinking water quality assessment. Clean Technologies and Environmental Policy: 1–30.
[7] Berglund, E.Z., Shafiee, M.E., Xing, L. and Wen, J.
factors. Therefore, methods that require advanced (2023) Digital twins for water distribution systems.
technology should be defined and applied. It has Journal of Water Resources Planning and Management
become compulsory to develop an efficient detection 149(3): 02523001.
method for improving the emergency response capacity [8] Sikder, M.N.K., Nguyen, M.B., Elliott, E.D. and
in the event of a possible attack, protect against Batarseh, F.A. (2023) Deep h2o: Cyber attacks detection
potential hazards caused by intentional/unintentional in water distribution systems using deep learning.
Journal of Water Process Engineering 52: 103568.
[9] Ismagilova, E., Hughes, L., Dwivedi, Y.K. and Raman,
1 https://www.nacwa.org/docs/default-source/conferences- K.R. (2019) Smart cities: Advances in research—an
events/older-events/2017-summer/stratcomm-h2o/laura- information systems perspective. International Journal of
ganus.pdf?sfvrsn=18c1f561_4 Information Management 47: 88–100.


[10] Bellini, P., Nesi, P. and Pantaleo, G. (2022) IoT-enabled smart cities: A review of concepts, frameworks and key technologies. Applied Sciences 12(3): 1607.
[11] Erdem, T. and Eken, S. (2021) Layer-wise relevance propagation for smart-grid stability prediction. In Mediterranean Conference on Pattern Recognition and Artificial Intelligence (Springer): 315–328.
[12] Breviglieri, P., Erdem, T. and Eken, S. (2021) Predicting smart grid stability with optimized deep models. SN Computer Science 2: 1–12.
[13] Oberascher, M., Rauch, W. and Sitzenfrei, R. (2022) Towards a smart water city: A comprehensive review of applications, data requirements, and communication technologies for integrated management. Sustainable Cities and Society 76: 103442.
[14] Keriwala, N. and Patel, A. (2022) Innovative roadmap for smart water cities: A global perspective. Materials Proceedings 10(1): 1.
[15] Yaacoub, J.P.A., Salman, O., Noura, H.N., Kaaniche, N., Chehab, A. and Malli, M. (2020) Cyber-physical systems security: Limitations, issues and future trends. Microprocessors and microsystems 77: 103201.
[16] Mishra, A. and Ray, A.K. (2022) A novel layered architecture and modular design framework for next-gen cyber physical system. In 2022 International Conference on Computer Communication and Informatics (ICCCI) (IEEE): 1–8.
[17] Latif, S.A., Wen, F.B.X., Iwendi, C., Li-li, F.W., Mohsin, S.M., Han, Z. and Band, S.S. (2022) AI-empowered, blockchain and SDN integrated security architecture for IoT network of cyber physical systems. Computer Communications 181: 274–283.
[18] Manogaran, G., Khalifa, N.E.M., Loey, M. and Taha, M.H.N. (2023) Cyber-Physical Systems for Industrial Transformation: Fundamentals, Standards, and Protocols (CRC Press).
[19] Leigh, C., Alsibai, O., Hyndman, R.J., Kandanaarachchi, S., King, O.C., McGree, J.M., Neelamraju, C. et al. (2019) A framework for automated anomaly detection in high frequency water-quality data from in situ sensors. Science of the Total Environment 664: 885–898.
[20] Li, Z., Liu, H., Zhang, C. and Fu, G. (2023) Generative adversarial networks for detecting contamination events in water distribution systems using multi-parameter, multi-site water quality monitoring. Environmental Science and Ecotechnology 14: 100231.
[21] Fanaee-T, H. and Gama, J. (2016) Tensor-based anomaly detection: An interdisciplinary survey. Knowledge-Based Systems 98: 130–147.
[22] Sebestyen, G. and Hangan, A. (2017) Anomaly detection techniques in cyber-physical systems. Acta Universitatis Sapientiae, Informatica 9(2): 101–118.
[23] Dogo, E.M., Nwulu, N.I., Twala, B. and Aigbavboa, C. (2019) A survey of machine learning methods applied to anomaly detection on drinking-water quality data. Urban Water Journal 16(3): 235–248.
[24] Sagan, V., Peterson, K.T., Maimaitijiang, M., Sidike, P., Sloan, J., Greeling, B.A., Maalouf, S. et al. (2020) Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Science Reviews 205: 103187.
[25] Jiang, J., Zheng, Y., Pang, T., Wang, B., Chachan, R. and Tian, Y. (2020) A comprehensive study on spectral analysis and anomaly detection of river water quality dynamics with high time resolution measurements. Journal of Hydrology 589: 125175.
[26] Ahmed, U., Mumtaz, R., Anwar, H., Mumtaz, S. and Qamar, A.M. (2020) Water quality monitoring: from conventional to emerging technologies. Water Supply 20(1): 28–45.
[27] Gupta, S., Aga, D., Pruden, A., Zhang, L. and Vikesland, P. (2021) Data analytics for environmental science and engineering research. Environmental Science & Technology 55(16): 10895–10907.
[28] Shi, Z., Chow, C.W., Fabris, R., Liu, J. and Jin, B. (2022) Applications of online UV-Vis spectrophotometer for drinking water quality monitoring and process control: a review. Sensors 22(8): 2987.
[29] Water quality criteria, https://www.epa.gov/wqc. Accessed: 2023-01-12.
[30] Ulusal su planı, T.C. Tarım ve Orman Bakanlığı, https://www.tarimorman.gov.tr/SYGM/Belgeler/NHYP%20DEN%C4%B0Z/ULUSAL%20SU%20PLANI.pdf. Accessed: 2023-01-12.
[31] WHO, W.H.O. (2022) Guidelines for drinking-water quality (World Health Organization WHO).
[32] Kang, G., Gao, J.Z. and Xie, G. (2017) Data-driven water quality analysis and prediction: A survey. 2017 IEEE Third International Conference on Big Data Computing Service and Applications (BigDataService): 224–232.
[33] Chawla, P., Cao, X., Fu, Y., Hu, C.m., Wang, M., Wang, S. and Gao, J. (2021) Water quality prediction of Salton Sea using machine learning and big data techniques. International Journal of Environmental Analytical Chemistry: 1–24. doi:10.1080/03067319.2021.1963713.
[34] Selim, G.E., Hemdan, E.E.D., Shehata, A. and El-Fishawy, N. (2021) Anomaly events classification and detection system in critical industrial internet of things infrastructure using machine learning algorithms. Multimedia Tools and Applications 80: 1–22. doi:10.1007/s11042-020-10354-1.
[35] Nemade, B. and Shah, D. (2022) An efficient IoT based prediction system for classification of water using novel adaptive incremental learning framework. J. King Saud Univ. Comput. Inf. Sci. 34(8 Part A): 5121–5131. doi:10.1016/j.jksuci.2022.01.009, URL https://doi.org/10.1016/j.jksuci.2022.01.009.
[36] Jin, T., Cai, S., Jiang, D. and Liu, J. (2019) A data-driven model for real-time water quality prediction and early warning by an integration method. Environmental Science and Pollution Research 26. doi:10.1007/s11356-019-06049-2.
[37] Page, M.J., McKenzie, J.E., Bossuyt, P.M., Boutron, I., Hoffmann, T.C., Mulrow, C.D., Shamseer, L. et al. (2021) The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Systematic reviews 10(1): 1–11.
[38] Van Eck, N.J. and Waltman, L. (2011) Text mining and visualization using VOSviewer. arXiv preprint arXiv:1109.2058.


[39] Mathur, A.P. and Tippenhauer, N.O. (2016) SWaT: a water treatment testbed for research and training on ICS security. In 2016 International Workshop on Cyber-physical Systems for Smart Water Networks (CySWater): 31–36. doi:10.1109/CySWater.2016.7469060.
[40] Özçelik, İ., İskefiyeli, M., Balta, M., Akpinar, K.O. and Toker, F.S. (2021) Center water: A secure testbed infrastructure proposal for waste and potable water management. In 2021 9th International Symposium on Digital Forensics and Security (ISDFS): 1–7. doi:10.1109/ISDFS52919.2021.9486364.
[41] Morris, T., Srivastava, A., Reaves, B., Gao, W., Pavurapu, K. and Reddi, R. (2011) A control system testbed to validate critical infrastructure protection concepts. International Journal of Critical Infrastructure Protection 4: 88–103. doi:10.1016/j.ijcip.2011.06.005.
[42] Gao, H., Peng, Y., Dai, Z., Wang, T., Han, X. and Li, H. (2014) An Industrial Control System Testbed Based on Emulation, Physical Devices and Simulation. In Butts, J. and Shenoi, S. [eds.] 8th International Conference on Critical Infrastructure Protection (ICCIP) (Arlington, United States: Springer), Critical Infrastructure Protection VIII AICT-441: 79–91. doi:10.1007/978-3-662-45355-1_6, URL https://hal.inria.fr/hal-01386756. Part 1: Control Systems Security.
[43] Ahmed, I., Roussev, V., Johnson, W., Senthivel, S. and Sudhakaran, S. (2016) A SCADA system testbed for cybersecurity and forensic research and pedagogy: 1–9. doi:10.1145/3018981.3018984.
[44] Taormina, R., Galelli, S., Tippenhauer, N.O., Ostfeld, A. and Salomons, E. (2016) Assessing the effect of cyber-physical attacks on water distribution systems: 436–442. doi:10.1061/9780784479865.046.
[45] Etchevés Miciolino, E., Setola, R., Bernieri, G., Panzieri, S., Pascucci, F. and Polycarpou, M.M. (2017) Fault diagnosis and network anomaly detection in water infrastructures. IEEE Design Test 34(4): 44–51. doi:10.1109/MDAT.2017.2682223.
[46] Kartakis, S., Abraham, E. and McCann, J.A. (2015) Waterbox: A testbed for monitoring and controlling smart water networks. In Proceedings of the 1st ACM International Workshop on Cyber-Physical Systems for Smart Water Networks, CySWater’15 (New York, NY, USA: Association for Computing Machinery). doi:10.1145/2738935.2738939, URL https://doi.org/10.1145/2738935.2738939.
[47] iTrust Labs datasets, https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/. Accessed: 2023-01-12.
[48] Botchkarev, A. (2019) A new typology design of performance metrics to measure errors in machine learning regression algorithms. Interdisciplinary Journal of Information, Knowledge, and Management 14: 045–076. doi:10.28945/4184, URL https://doi.org/10.28945%2F4184.
[49] Parmar, J., Chouhan, S., Raychoudhury, V. and Rathore, S. (2023) Open-world machine learning: applications, challenges, and opportunities. ACM Computing Surveys 55(10): 1–37.
[50] Muharemi, F., Logofătu, D. and Leon, F. (2019) Machine learning approaches for anomaly detection of water quality on a real-world data set. Journal of Information and Telecommunication 3(3): 294–307.
[51] Akpinar, K.O. and Ozcelik, I. (2019) Analysis of machine learning methods in EtherCAT-based anomaly detection. IEEE Access 7: 184365–184374.
[52] Berry, M.W., Mohamed, A. and Yap, B.W. (2019) Supervised and unsupervised learning for data science (Springer).
[53] Dogo, E.M. (2021) Application of Artificial Intelligence Technologies for Water Quality Anomaly Detection. Ph.D. thesis, University of Johannesburg (South Africa).
[54] Qian, K., Jiang, J., Ding, Y. and Yang, S. (2020) Deep learning based anomaly detection in water distribution systems. In 2020 IEEE International Conference on Networking, Sensing and Control (ICNSC) (IEEE): 1–6.
[55] Luo, Y., Xiao, Y., Cheng, L., Peng, G. and Yao, D. (2021) Deep learning-based anomaly detection in cyber-physical systems: Progress and opportunities. ACM Computing Surveys (CSUR) 54(5): 1–36.
[56] Alom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.S., Van Esesn, B.C. et al. (2018) The history began from AlexNet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164.
[57] Arias-Rodriguez, L.F., Duan, Z., Díaz-Torres, J.d.J., Basilio Hazas, M., Huang, J., Kumar, B.U., Tuo, Y. et al. (2021) Integration of remote sensing and Mexican water quality monitoring system using an extreme learning machine. Sensors 21(12): 4118.
[58] Choi, Y.H., Sadollah, A. and Kim, J.H. (2020) Improvement of cyber-attack detection accuracy from urban water systems using extreme learning machine. Applied Sciences 10(22): 8179.
[59] Li, Y. (2017) Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274.
[60] Khampuengson, T. and Wang, W. (2022) Deep reinforcement learning ensemble for detecting anomaly in telemetry water level data. Water 14(16): 2492.
[61] Housh, M. and Ohar, Z. (2018) Model-based approach for cyber-physical attack detection in water distribution systems. Water Research 139. doi:10.1016/j.watres.2018.03.039.
[62] Teixeira, M., Salman, T., Zolanvari, M. and Jain, R. (2018) SCADA system testbed for cybersecurity research using machine learning approach. Future Internet 10. doi:10.3390/fi10080076.
[63] Phillips, B., Gamess, E. and Krishnaprasad, S. (2020) An evaluation of machine learning-based anomaly detection in a SCADA system using the Modbus protocol. doi:10.1145/3374135.3385282.
[64] Abokifa, A., Haddad, K., Lo, C. and Biswas, P. (2018) Real-time identification of cyber-physical attacks on water distribution systems via machine learning based anomaly detection techniques. Journal of Water Resources Planning and Management 145. doi:10.1061/(ASCE)WR.1943-5452.0001023.
[65] Peterson, K.T., Sagan, V. and Sloan, J.J. (2020) Deep learning-based water quality estimation and anomaly detection using Landsat-8/Sentinel-2 virtual constellation and cloud computing. GIScience & Remote Sensing 57(4): 510–525.


doi:10.1080/15481603.2020.1738061, URL https://doi.org/10.1080/15481603.2020.1738061.
[66] Liu, J., Wang, P., Jiang, D., Nan, J. and Zhu, W. (2020) An integrated data-driven framework for surface water quality anomaly detection and early warning. Journal of Cleaner Production 251: 119145. doi:10.1016/j.jclepro.2019.119145, URL https://www.sciencedirect.com/science/article/pii/S0959652619340156.
[67] Inoue, J., Yamagata, Y., Chen, Y., Poskitt, C. and Sun, J. (2017) Anomaly detection for a water treatment system using unsupervised machine learning.
[68] Fang, S., Sun, W. and Huang, L. (2019) Anomaly detection for water supply data using machine learning technique. Journal of Physics: Conference Series 1345(2): 022054. doi:10.1088/1742-6596/1345/2/022054, URL https://dx.doi.org/10.1088/1742-6596/1345/2/022054.
[69] Shi, B., Wang, P., Jiang, J. and Liu, R. (2017) Applying high-frequency surrogate measurements and a wavelet-ANN model to provide early warnings of rapid surface water quality anomalies. The Science of the total environment 610-611: 1390–1399. doi:10.1016/j.scitotenv.2017.08.232.
[70] Muharemi, F., Logofătu, D. and Leon, F. (2019) Machine learning approaches for anomaly detection of water quality on a real-world data set. Journal of Information and Telecommunication 3(3): 294–307. doi:10.1080/24751839.2019.1565653.
[71] Riss, G., Romano, M., Memon, F. and Kapelan, Z. (2021) Detection of water quality failure events at treatment works using a hybrid two-stage method with CUSUM and random forest algorithms. Water Supply 21. doi:10.2166/ws.2021.062.
[72] Wu, Z., Chew, A., Meng, X., Cai, J., Pok, J., Kalfarisi, R., Lai, K. et al. (2022) Data-driven and model-based framework for smart water grid anomaly detection and localization. AQUA—Water Infrastructure, Ecosystems and Society 71(1): 31–41.
[73] Thompson, K.A. and Dickenson, E.R. (2021) Using machine learning classification to detect simulated increases of de facto reuse and urban stormwater surges in surface water. Water Research 204: 117556. doi:10.1016/j.watres.2021.117556, URL https://www.sciencedirect.com/science/article/pii/S0043135421007521.
[74] Shah, M.I., Javed, M.F., Alqahtani, A. and Aldrees, A. (2021) Environmental assessment based surface water quality prediction using hyper-parameter optimized machine learning models based on consistent big data. Process Safety and Environmental Protection 151: 324–340. doi:10.1016/j.psep.2021.05.026, URL https://www.sciencedirect.com/science/article/pii/S0957582021002664.
[75] Macas, M. and Wu, C. (2019) An unsupervised framework for anomaly detection in a water treatment system. In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA): 1298–1305. doi:10.1109/ICMLA.2019.00212.
[76] Zhu, J., Liu, X., Shi, Q., He, T., Sun, Z., Guo, X., Liu, W. et al. (2019) Development trends and perspectives of future sensors and MEMS/NEMS. Micromachines 11(1): 7.
[77] Mezni, H., Driss, M., Boulila, W., Atitallah, S.B., Sellami, M. and Alharbi, N. (2022) Smartwater: A service-oriented and sensor cloud-based framework for smart monitoring of water environments. Remote Sensing 14(4): 922.
[78] Eken, S., Şara, M., Satılmış, Y., Karslı, M., Tufan, M.F., Menhour, H. and Sayar, A. (2020) A reproducible educational plan to teach mini autonomous race car programming. The International Journal of Electrical Engineering & Education 57(4): 340–360.
[79] Stagge, J.H., Rosenberg, D.E., Abdallah, A.M., Akbar, H., Attallah, N.A. and James, R. (2019) Assessing data availability and research reproducibility in hydrology and water resources. Scientific data 6(1): 1–12.
[80] Li, J., Zhang, C., Zhou, J.T., Fu, H., Xia, S. and Hu, Q. (2021) Deep-lift: deep label-specific feature learning for image annotation. IEEE Transactions on Cybernetics.
[81] Petsiuk, V., Das, A. and Saenko, K. (2018) RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421.
[82] Lundberg, S.M. and Lee, S.I. (2017) A unified approach to interpreting model predictions. Advances in neural information processing systems 30.
[83] Das, A. and Rad, P. (2020) Opportunities and challenges in explainable artificial intelligence (XAI): A survey. arXiv preprint arXiv:2006.11371.
[84] Ali, I. (2012) New generation adsorbents for water treatment. Chemical reviews 112(10): 5073–5091.
[85] Dablain, D., Krawczyk, B. and Chawla, N.V. (2022) DeepSMOTE: Fusing deep learning and SMOTE for imbalanced data. IEEE Transactions on Neural Networks and Learning Systems.
[86] Brentan, B., Carpitella, S., Barros, D., Meirelles, G., Certa, A. and Izquierdo, J. (2021) Water quality sensor placement: a multi-objective and multi-criteria approach. Water Resources Management 35(1): 225–241.
[87] Xue, M., Chew, A.W.Z., Cai, J., Pok, J., Kalfarisi, R. and Wu, Z.Y. (2022) Improving near real-time anomaly event detection and classification with trend change detection for smart water grid operation management. Urban Water Journal: 1–11.
[88] Xu, H., Berres, A., Liu, Y., Allen-Dumas, M.R. and Sanyal, J. (2022) An overview of visualization and visual analytics applications in water resources management. Environmental Modelling & Software: 105396.
[89] Abdallah, A.M., Rheinheimer, D.E., Rosenberg, D.E., Knox, S. and Harou, J.J. (2022) An interoperable software ecosystem to store, visualize, and publish water resources systems modelling data. Environmental Modelling & Software 151: 105371.
[90] Sun, A.Y. and Scanlon, B.R. (2019) How can big data and machine learning benefit environment and water management: a survey of methods, applications, and future directions. Environmental Research Letters 14(7): 073001.
[91] Balta, S., Zavrak, S. and Eken, S. (2022) Real-time monitoring and scalable messaging of SCADA networks data: A case study on cyber-physical attack detection in


water distribution system. In International Congress of Electrical and Computer Engineering (Springer): 203–215.
[92] Difallah, D.E., Cudre-Mauroux, P. and McKenna, S.A. (2013) Scalable anomaly detection for smart city infrastructure networks. IEEE Internet Computing 17(6): 39–47.
[93] Özgüven, Y.M. and Eken, S. (2021) Distributed messaging and light streaming system for combating pandemics. Journal of Ambient Intelligence and Humanized Computing: 1–15.
[94] Shao, Z., Sumari, N.S., Portnov, A., Ujoh, F., Musakwa, W. and Mandela, P.J. (2021) Urban sprawl and its impact on sustainable urban development: a combination of remote sensing and social media data. Geo-spatial Information Science 24(2): 241–255.
[95] Andreadis, S., Gialampoukidis, I., Bozas, A., Moumtzidou, A., Fiorin, R., Lombardo, F., Karakostas, A. et al. (2021) WaterMM: water quality in social multimedia task at MediaEval 2021. In Proceedings of the MediaEval 2021 Workshop, Online.
[96] Balta Kaç, S. and Eken, S. (2023) Customer complaints-based water quality analysis. Water 15(18): 3171.
[97] Ayub, M.A., Ahmad, K., Ahmad, K., Ahmad, N. and Al-Fuqaha, A. (2021) NLP techniques for water quality analysis in social media content. arXiv preprint arXiv:2112.11441.
[98] Ahmad, K., Ayub, M., Khan, J., Ahmad, N. and Al-Fuqaha, A. (2022) Social media as an instant source of feedback on water quality. IEEE Transactions on Technology and Society.
[99] Hanif, M., Khawar, A., Tahir, M.A. and Rafi, M. (2021) Deep learning based framework for classification of water quality in social media data. In Proceedings of the MediaEval 2021 Workshop, Online.
