Pokhara University
A Report on
Submitted by:
Aashish Timalsina (019-401)
Bikesh Bhatta (019-421)
Shambhu sha (019-414)
Submitted to:
Department of Electronics and Communication Engineering
2024
ACKNOWLEDGEMENT
It gives us immense pleasure to present this project report on “IOT BASED WATER
QUALITY MEASUREMENT USING MACHINE LEARNING”. We express our
gratitude to ‘Nepal Engineering College’ and the ‘Department of Electronics and
Communication’ for providing us the opportunity to work on this project.
Last but not least, we express our deep gratitude to our seniors and colleagues,
who have been a constant source of inspiration throughout this project work. We
sincerely appreciate the inspiration, support and guidance of all those who have
helped us in making this project a success.
ABSTRACT
The World Economic Forum has ranked the drinking water crisis as one of the global risks, due to
which around 200 children die per day. Drinking unsafe water alone causes around 3.4 million
deaths per year. Despite advancements in technology, adequate measures for monitoring the
quality of drinking water are still lacking. Focusing on this issue, this paper proposes a
low-cost water quality monitoring system using emerging technologies such as IoT (Internet of
Things), Machine Learning and Cloud Computing, which can replace the traditional way of
quality monitoring. This helps protect people in rural areas from various dangerous diseases
such as fluorosis and bone deformities. The proposed model can also control the temperature of
the water and adjust it to suit the ambient temperature. It assesses water condition based on
four physical parameters, i.e., temperature, pH, electrical conductivity and turbidity. Four
sensors are connected discretely to an ESP-32 to detect these parameters. Data extracted from
the sensors are transmitted to the cloud and presented to the user through a webpage/mobile
app.
Contents
ACKNOWLEDGEMENT
LIST OF FIGURES
Chapter 1: INTRODUCTION
1.2 Objective
1.3 Application
Chapter 3: Methodology
3.1.2 Proposed Modeling and Analysis
3.4.2 pH Sensor
Conclusion
REFERENCES
LIST OF FIGURES
LIST OF TABLES
Chapter 1: INTRODUCTION
1.1 Background and Statement of the Problem
Water is one of the most valuable natural resources that humans have been gifted. Water
management has become an important issue, especially in the industrial, agricultural and other
sectors. Many people around the world lack access to drinkable water. Research by the WHO
(World Health Organization) shows that almost 1.4 million child deaths could be prevented by
providing drinkable water. The primary objective of this project is to introduce an intelligent
water quality monitoring system on an IoT (Internet of Things) platform, which would help to
monitor different physical parameters of drinkable water rather than relying on a manual process.
Moreover, we need a real-time system that monitors water quality through sensors such as pH,
turbidity and temperature and updates those values in a cloud service. This system consists of
sensors which measure the chemical composition of water. These sensor values are then passed
to a NodeMCU microcontroller with an inbuilt Wi-Fi module, through which the data is passed
to cloud storage, and different protocols deliver the information to the user in real time.
Monitoring water quality is essential for protecting human health and the environment and
controlling water quality. Artificial Intelligence (AI)/Machine learning offers significant
opportunities to help improve the classification and prediction of water quality (WQ). In this
study, various AI algorithms are assessed to handle WQ data collected over an extended period
and develop a dependable approach for forecasting water quality as accurately as possible.
Specifically, various machine learning classifiers and their stacking ensemble models were used
to classify the WQ data via the Water Quality Index (WQI). The studied classifiers included
Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Decision Tree
(DT), and CatBoost. The challenge lies in developing robust ML models capable of real-time
analysis and prediction of water quality parameters, enabling timely intervention to prevent
contamination and ensure safe water supply. By addressing this problem, the proposal aims to
enhance the efficiency and effectiveness of water quality management systems, contributing to
sustainable water resource utilization in the face of increasing environmental challenges.
1.1.1 Classification of water
Based on its source, water can be divided into ground water and surface water. Both types of
water can be exposed to contamination risks from agricultural, industrial, and domestic activities,
which may include many types of pollutants such as heavy metals, pesticides, fertilizers,
hazardous chemicals, and oils.
Water quality can be classified into four types: potable water, palatable water, contaminated
(polluted) water, and infected water. The most common scientific definitions of these types of
water quality are as follows:
1. Potable water: It is safe to drink, pleasant to taste, and usable for domestic purposes.
2. Palatable water: It is aesthetically pleasing; it may contain chemicals, provided they do not
pose a threat to human health.
1.1.2 Water quality standard
1.2 Objective
Gather and curate a diverse dataset comprising water quality parameters such as pH,
dissolved oxygen, turbidity, and contaminants to facilitate the training of a robust linear
regression model.
Develop a scalable and adaptable IoT infrastructure for water quality monitoring.
Implement machine learning algorithms to analyze water quality parameters and predict
potential contamination events.
Establish a feedback loop for continuous model improvement by regularly updating the
dataset and refining the linear regression model based on real-world performance, ensuring
adaptability to changing environmental conditions and emerging water quality challenges.
1.3 Application
The scope of implementing Machine Learning (ML) and IoT in water quality management is
expansive, encompassing diverse applications. ML algorithms offer real-time analysis of water
quality parameters, enabling proactive monitoring and early detection of contaminants. This
technology can optimize resource allocation, streamline treatment processes, and enhance
decision-making for water authorities. From predicting pollutant levels to assessing
environmental impact, ML's versatility allows for a holistic approach to water quality
management. Its applications extend to smart water systems, precision agriculture, and
safeguarding public health, underscoring its potential to revolutionize how we monitor, analyze,
and ensure the sustainability of water resources.
1.4 Overview of Proposal
The proposal aims to revolutionize water quality measurement by integrating advanced machine
learning (ML) techniques into existing monitoring systems. Leveraging ML algorithms will
enhance the efficiency, accuracy, and real-time nature of water quality assessment. This
innovative approach addresses the limitations of traditional methods, enabling proactive
management of water resources and ensuring the delivery of clean, safe water for diverse
purposes.
The Government of Nepal has issued a notice of implementation of the National Drinking Water
Quality Standards, 2062, under the provision of the Water Resources Act, 2049, Clause 18,
Sub-Clause 1.
(A) National Drinking Water Quality Standard

S.N.  Category         Parameter        Unit            Concentration limit  Remark
2                      pH                               6.5-8.5*
18                     Nitrate          mg/L            50
19                     Copper           mg/L            1
20                     Total Hardness   mg/L as CaCO3   500
22                     Zinc             mg/L            3
26    Microbiological  E. Coli          MPN/100 ml      0
27    Microbiological  Total Coliform   MPN/100 ml      0 in 95% samples
Based on its physical, chemical and biological properties, water quality is classified using
different parameters.
No.  Physical Parameters  Chemical Parameters      Biological Parameters
4    Taste and odor       Chloride                 Protozoa
5    Solids               Chlorine residual
7                         Nitrogen
8                         Fluoride
11                        Hardness
12                        Dissolved oxygen
17                        Radioactive substances
An index value is calculated for each of the water quality parameters: temperature, biological
oxygen demand (BOD), total suspended sediment (TSS), dissolved oxygen (DO), and
conductivity. A higher value of each index indicates better water quality. The following
relation was used to compute the WQI:
where N denotes the total number of parameters and q_i denotes the quality rating scale for each
parameter i, calculated by Eq. (2).
6 Above 150 Unfit for Drinking Proper treatment required before use.
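As a minimal sketch of how an index of this kind can be computed, the example below assumes the commonly used weighted arithmetic form, WQI = sum(w_i * q_i) / sum(w_i), with the rating q_i = 100 * |V_i - V_ideal| / |S_i - V_ideal| for a measured value V_i, a permissible limit S_i and an ideal value V_ideal. The function names, weights, ideal values, turbidity limit and sample readings are illustrative assumptions and are not taken from this report. In this assumed form the rating grows as a reading moves away from its ideal value, so a larger WQI means poorer water, in line with a scale on which values above 150 are unfit for drinking.

```python
def quality_rating(measured, standard, ideal=0.0):
    """Sub-index q_i: 0 at the ideal value, 100 at the permissible limit."""
    return 100.0 * abs(measured - ideal) / abs(standard - ideal)


def water_quality_index(readings, limits, weights):
    """Weighted arithmetic mean of the sub-indices over all N parameters."""
    weighted_sum = 0.0
    weight_total = 0.0
    for name, measured in readings.items():
        standard, ideal = limits[name]
        weighted_sum += weights[name] * quality_rating(measured, standard, ideal)
        weight_total += weights[name]
    return weighted_sum / weight_total


# Illustrative values only (assumed, not measured in this project).
limits = {"pH": (8.5, 7.0), "turbidity": (5.0, 0.0)}  # (permissible limit, ideal)
weights = {"pH": 1.0, "turbidity": 1.0}               # equal weights for simplicity
readings = {"pH": 7.5, "turbidity": 2.5}

wqi = water_quality_index(readings, limits, weights)
print(round(wqi, 2))  # prints 41.67
```

With equal weights the result is simply the average of the two ratings (33.33 for pH and 50 for turbidity). A real deployment would choose weights and limits from the applicable standard.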
Logistic regression is one of the most popular Machine Learning algorithms and
falls under the Supervised Learning technique. It is used for predicting a categorical
dependent variable from a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or
False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values
which lie between 0 and 1.
Logistic Regression is very similar to Linear Regression except in how they are
used. Linear Regression is used for solving regression problems, whereas Logistic
Regression is used for solving classification problems.
Logistic Regression can be used to classify observations using different types of data
and can easily determine the most effective variables for the classification. The
image below shows the logistic function:
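The logistic function maps any real-valued score z to a value between 0 and 1 via 1 / (1 + e^(-z)). The sketch below illustrates this and how a probability is turned into a class label. The weights, bias, feature values and the 0.5 threshold are hypothetical and chosen only for illustration; they are not fitted to the project's data.

```python
import math


def logistic(z):
    """Numerically stable logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def classify(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical model with two features (assumed values, for illustration only).
weights = [0.8, -1.2]
bias = 0.1
sample = [2.0, 0.5]

print(predict_probability(sample, weights, bias))
print(classify(sample, weights, bias))
```

The output of `predict_probability` always lies between 0 and 1, which is why logistic regression is suited to classification rather than to predicting unbounded quantities.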
Support Vector Machine, or SVM, is one of the most popular Supervised Learning algorithms
and is used for Classification as well as Regression problems. However, it is primarily used
for Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate
n-dimensional space into classes, so that a new data point can easily be placed in the correct
category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme
cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
Consider the diagram below, in which two different categories are classified using a
decision boundary, or hyperplane.
[Source:https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm ]
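To make the role of the hyperplane and the support vectors concrete, the sketch below evaluates a linear decision function w·x + b for a hand-picked hyperplane. The values of `w` and `b` and the sample points are assumptions chosen for illustration; they are not learned from data, and training an SVM (finding the maximum-margin `w` and `b`) is not shown.

```python
import math


def decision_value(x, w, b):
    """Signed score of point x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_class(x, w, b):
    """Class +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision_value(x, w, b) >= 0 else -1


def margin_width(w):
    """Distance between the two margin lines w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(points, w, b, tol=1e-9):
    """Points lying exactly on the margin, i.e. with |w.x + b| = 1."""
    return [x for x in points if abs(abs(decision_value(x, w, b)) - 1.0) <= tol]


# Hand-picked hyperplane x1 = 3 in two dimensions (assumed, not trained).
w = [1.0, 0.0]
b = -3.0
points = [[2.0, 1.0], [4.0, 5.0], [6.0, 0.0]]

print([predict_class(p, w, b) for p in points])  # prints [-1, 1, 1]
print(margin_width(w))                           # prints 2.0
print(support_vectors(points, w, b))             # prints [[2.0, 1.0], [4.0, 5.0]]
```

Only the two points on the margin determine where the boundary sits; the third point could move further away without changing the hyperplane.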
Fig 2.4.2 structural model of SVR
[Source:https://link.springer.com/article/10.1007/s11042-023-16737-4/figures/2]
CatBoost is a gradient boosted decision tree (GBDT) algorithm built around categorical
features. Within the family of GBDT algorithms, this method is well suited to implementation.
The critical issue is dealing with categorical features efficiently and reasonably. The name
CatBoost is made up of two elements: "category" and "boosting". When the CatBoost algorithm
analyzes categorical features, it includes all sample data sets in the learning process. It then
arranges these samples in random order and filters out samples that share the same category.
CatBoost overcomes a limitation of other decision tree-based methods in which, typically, the
data must be pre-processed to convert categorical string variables to numerical values, one-hot
encodings, and so on. This method can directly consume a combination of categorical and
non-categorical explanatory variables without separate preprocessing, because the encoding is
done as part of the algorithm. CatBoost uses a method called ordered encoding to encode
categorical features. Ordered encoding considers the target statistics from all the rows prior
to a data point to calculate a value that replaces the categorical feature.
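The idea of ordered encoding can be sketched in a few lines: each row's category is replaced by a smoothed mean of the targets of earlier rows with the same category, so a row never sees its own target. This is a simplified illustration, not CatBoost's actual implementation, which additionally works over random permutations of the data. The function name, the `prior` and `prior_weight` parameters and the sample data are assumptions.

```python
def ordered_target_encoding(categories, targets, prior=0.5, prior_weight=1.0):
    """Encode each category using only the targets of the rows before it."""
    target_sums = {}   # running sum of targets seen so far, per category
    counts = {}        # running number of rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = target_sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of earlier targets; equals the prior for a new category.
        encoded.append((seen_sum + prior_weight * prior) / (seen_count + prior_weight))
        # Update the statistics only after the row has been encoded.
        target_sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Hypothetical water-source feature with a binary "contaminated" target.
sources = ["river", "well", "river", "river", "well"]
contaminated = [1, 0, 0, 1, 1]

print(ordered_target_encoding(sources, contaminated))
# prints [0.5, 0.5, 0.75, 0.5, 0.25]
```

Because the statistics are updated after each row is encoded, the target of a row does not leak into its own feature value.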
Another distinctive characteristic of CatBoost is that it uses symmetric trees. This means that
at every depth level, all the decision nodes use the same split condition.
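Because every node at a given depth shares one split condition, a symmetric tree can be evaluated as a short list of comparisons whose results form the bits of the leaf index. The sketch below illustrates this; the split features, thresholds, leaf values and function names are hypothetical and chosen only for illustration.

```python
def leaf_index(sample, level_splits):
    """Combine one comparison per depth level into a binary leaf index."""
    index = 0
    for feature, threshold in level_splits:
        bit = 1 if sample[feature] > threshold else 0
        index = (index << 1) | bit
    return index


def predict_symmetric_tree(sample, level_splits, leaf_values):
    """Look up the value of the leaf that the sample falls into."""
    return leaf_values[leaf_index(sample, level_splits)]


# A depth-2 symmetric tree (assumed splits): one condition per level.
level_splits = [("pH", 8.5), ("turbidity", 5.0)]
leaf_values = ["good", "poor", "poor", "unfit"]  # 2**2 leaves

sample = {"pH": 7.2, "turbidity": 6.1}
print(leaf_index(sample, level_splits))                           # prints 1
print(predict_symmetric_tree(sample, level_splits, leaf_values))  # prints poor
```

A tree of depth d always has 2**d leaves, and prediction needs exactly d comparisons regardless of which leaf is reached.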
Koju, N. K., Prasad, T., Shrestha, S. M., & Raut, P. [2] (2014). Drinking water quality of
Kathmandu Valley. Nepal Journal of Science and Technology, 15(1), 115–120.
doi:10.3126/njst.v15i1.12027
Pradeepkumar M, Monisha J, Pravenisha R, Praiselin V, Suganya Devi K [3], in the paper entitled
"The Real Time Monitoring of Water Quality in IoT Environment", discuss not only a
sensor-based system but also introduce a cloud computing architecture into IoT, which makes
the sensor data accessible worldwide.
During our research we also found the work of Khatiwada, N. R., Takizawa, S., Tran, T. V. N.,
& Inoue, M. [1] (2002). Ground water contamination assessment for sustainable water supply
in Kathmandu Valley, Nepal. Water Science & Technology, 46(9), 147–154.
doi:10.2166/wst.2002.0226
M. Valdivia et al. [4] proposed a model to identify the best predictors of THM levels in final
potable water and distribution networks, and to estimate their future rate of change. Data from
93 full-scale Scottish water treatment plants, collected between Jan 2011 and Jan 2013, were
inspected to identify the factors driving the formation of THMs. Multilinear regression
algorithms were used to build the models for individual THM compounds. Pearson's correlation
analysis was applied to the measured data, and it was concluded that ambient temperature, DOC,
and chloride were important in the formation of THMs across Scottish WTPs.
Daigavane et al. [5] proposed a system in which sensors for conductivity, temperature, water
level, pH and turbidity, along with a Wi-Fi module and a power supply, were connected to the
basic controller, an Arduino UNO. The controller retrieves the sensor values by placing the
sensors in separate water samples, and the data are forwarded to the cloud using the Wi-Fi
module. The recommended Android application is used to view the sensor values via the cloud,
and alerts are provided to the user if a value exceeds its threshold.
Nikhil Kedia [6], in the paper entitled "Water Quality Monitoring for Rural Areas-A Sensor
Cloud Based Economical Project", not only highlights embedded sensor systems but also discusses
the challenges and economic viability of a system involving a Mobile Network Operator and
the Government. This system directly contacts the Government to take action based on the
severity of the quality issue.
Yafra Khan et al. [7] proposed a prediction model for water quality using an Artificial Neural
Network and time-series analysis to support water quality factors. The water quality data, from
January to March 2014, were collected from an online resource of the United States Geological
Survey. The dataset includes chlorophyll, specific conductance, dissolved oxygen, and turbidity,
which affect and influence the quality of water. A feed-forward Neural Network with a NAR time
series model was used, with Scaled Conjugate Gradient as the training algorithm and Log Sigmoid
as the activation function. The performance of the ANN-based predictive model was evaluated
using Regression, Mean Squared Error and Root Mean Squared Error. The proposed ANN-NAR model
showed much improved prediction accuracy compared to other algorithms.
Chapter 3: Methodology
3.1 Introduction to System Design
Design is the abstraction of a solution. It is a general description of the solution to a problem
without the details. Patterns identified in the analysis phase are carried over into the design
phase. A sound design reduces the time required to create the implementation.
The design of the system is the most critical factor affecting the quality of the application.
System design aims to identify the modules that should be in the system, the specification of
these modules, and how they interact with each other to produce the desired result.
A system like ours needs a dataset that includes multiple classes. Thus, to achieve proper
classification, we will collect data from different sources such as the Department of Water Supply
& Sewerage Management, ENFOS, etc., and add more through field visits if needed. Data
preprocessing steps will be needed where the data are noisy or messy.
Fig. 3.1.1 shows the schematic circuit diagram of the hardware set-up of the proposed IWQM
system. Except for the temperature sensor, the other three sensors are analog. Each sensor has
three wires of different colors: red, black and one other. The red wire is for the +5V power
supply, the black wire is for ground and the remaining wire carries the data signal. A breadboard
is used to create separate common points for ground and power supply. The common ground node
is then connected to the ground of the ESP-32, and the same process is repeated for the power
supply. The analog sensors are connected to the analog pins and the digital sensor is connected
to a digital pin of the controller.
3.1.2 Proposed Modeling and Analysis
Machine learning requires a large amount of historical data. Data collection provides a
sufficient amount of historical, raw data. Raw data cannot be used directly; it must first be
pre-processed. The pre-processed data is then used to decide which algorithm to use for the
model. The model is trained and tested to ensure that it predicts correctly and with minimal
error, and it is tuned from time to time to improve accuracy.
The following steps are used to make our model predict the quality of water samples:
A. Problem Identification
In this step, we identify the problem to be solved by our model, which is water
quality prediction using a dataset.
B. Data Extraction
In this step, we extract data from the internet to train our model and predict the water
quality. For that, we take the Department of Water Supply and Sewerage Management
dataset, which contains almost 2200 instances from different places, collected
up to 2023.
C. Data Exploration
In this step, we analyze the data visually by comparing some water parameters with the
WHO standards for water. This gives a brief overview of the data.
D. Data Cleaning
In this step, we clean the data: missing values are replaced with the mean, and noise is
removed from the data.
E. Data Engineering
In this step, we ensure that the data is of good quality so that the prediction accuracy
increases.
F. Data Selection
In this step, we select the data types and the source of the data. The essential goal of data
selection is to decide on a suitable data type, source, and instrument that allow
investigators to answer the research questions adequately.
G. Data Splitting
In this step, we divide the dataset into smaller subsets to reduce complexity.
Normally, with a two-part split, one part is used to evaluate or test the model
and the other to train it.
H. Data Modeling
In this step, we create a graph of the dataset for visual representation, to understand the
data better. A data model is a theoretical model that supports the further structuring of
conceptual models and sets out the relationships between data.
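The exploration, cleaning and splitting steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the function names, the four sample rows, the turbidity limit and the 25% test fraction are all hypothetical, and only the pH range of 6.5 to 8.5 comes from the drinking water standard quoted earlier.

```python
import random


def impute_with_mean(rows, columns):
    """Replace missing values (None) with the mean of the observed values.

    Assumes every column has at least one observed value.
    """
    cleaned = [dict(row) for row in rows]
    for column in columns:
        observed = [row[column] for row in rows if row[column] is not None]
        mean = sum(observed) / len(observed)
        for row in cleaned:
            if row[column] is None:
                row[column] = mean
    return cleaned


def within_limits(row, limits):
    """True if every parameter lies inside its permitted (low, high) range."""
    return all(low <= row[name] <= high for name, (low, high) in limits.items())


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly, then split into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical samples; None marks a missing reading.
samples = [
    {"pH": 7.0, "turbidity": 2.0},
    {"pH": None, "turbidity": 4.0},
    {"pH": 9.0, "turbidity": None},
    {"pH": 8.0, "turbidity": 6.0},
]
limits = {"pH": (6.5, 8.5), "turbidity": (0.0, 5.0)}  # turbidity limit assumed

cleaned = impute_with_mean(samples, ["pH", "turbidity"])
labels = [within_limits(row, limits) for row in cleaned]
train, test = train_test_split(cleaned)

print(labels)                 # prints [True, True, False, False]
print(len(train), len(test))  # prints 3 1
```

Fixing the random seed keeps the split reproducible, which makes later accuracy comparisons between models meaningful.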
3.2 Proposed Classification Models for Water Quality Prediction
ESP32 is a series of low-cost, low-power system-on-a-chip microcontrollers with integrated Wi-Fi
and dual-mode Bluetooth.
The ESP32 is dual core, meaning it has two processor cores.
It has Wi-Fi and Bluetooth built in.
The clock frequency can go up to 240 MHz and it has 520 KB of SRAM.
This particular board has 30 or 36 pins, 15 or 18 in each row.
3.4.2 pH Sensor
The sensor generates a voltage that depends on the hydrogen ion activity of the solution and
varies approximately linearly with pH. This voltage is then converted into a pH value.
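The conversion from voltage to pH can be sketched as a two-point linear calibration against buffer solutions of known pH. The calibration voltages, the 3.3 V full-scale reference and the function names below are assumptions for illustration; actual values depend on the sensor module and must be measured, and the sensor output is assumed to have been scaled into the ADC's input range.

```python
ADC_MAX = 4095  # full-scale count of a 12-bit ADC (the ESP32 default resolution)
V_REF = 3.3     # assumed full-scale voltage; real boards need calibration


def adc_to_voltage(raw_count, adc_max=ADC_MAX, v_ref=V_REF):
    """Convert a raw ADC count into volts."""
    return raw_count * v_ref / adc_max


def make_ph_converter(v_low, ph_low, v_high, ph_high):
    """Build a voltage-to-pH function from two calibration points."""
    slope = (ph_high - ph_low) / (v_high - v_low)

    def voltage_to_ph(voltage):
        return ph_low + slope * (voltage - v_low)

    return voltage_to_ph


# Hypothetical calibration: 2.0 V in a pH 4.0 buffer, 1.5 V in a pH 7.0 buffer.
voltage_to_ph = make_ph_converter(2.0, 4.0, 1.5, 7.0)

print(voltage_to_ph(1.5))   # prints 7.0
print(voltage_to_ph(1.75))  # prints 5.5
```

Two calibration points fix both the slope and the offset of the line, so the same code works whichever direction the module's output voltage moves with pH.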
Fig 3.4.2 PH Sensor
3.4.3 Turbidity Sensor
Turbidity is a measure of water quality that reflects the amount of suspended particles in a water
sample by observing the amount of light scattered through it. Water with high turbidity often
requires purification before it can be used in industrial and domestic applications. This
is because a decrease in turbidity often implies a reduction in harmful substances, bacteria, and
viruses in the water.
ThingSpeak enables sensors, instruments, and websites to send data to the cloud, where it is
stored in either a private or a public channel. ThingSpeak stores data in private channels by
default, but public channels can be used to share data with others.
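As a minimal sketch of how one set of readings could be sent to a ThingSpeak channel, the example below builds a request URL for ThingSpeak's HTTP update endpoint, which accepts a write API key and numbered fields as query parameters. The mapping of the four parameters to `field1` through `field4`, the placeholder key and the sample readings are assumptions; the function only builds the URL and does not contact the network.

```python
from urllib.parse import urlencode

THINGSPEAK_UPDATE_URL = "https://api.thingspeak.com/update"


def build_update_url(write_api_key, temperature, ph, conductivity, turbidity):
    """Build the URL that writes one entry to a ThingSpeak channel."""
    params = {
        "api_key": write_api_key,
        "field1": temperature,   # assumed field mapping for this sketch
        "field2": ph,
        "field3": conductivity,
        "field4": turbidity,
    }
    return THINGSPEAK_UPDATE_URL + "?" + urlencode(params)


# Placeholder key and illustrative readings (not real measurements).
url = build_update_url("YOUR_WRITE_API_KEY", 24.5, 7.2, 310, 3.1)
print(url)
```

Issuing an HTTP GET on the resulting URL, for example with `urllib.request.urlopen(url)` on a PC or the HTTP client available on the microcontroller, would store the four values as one channel entry.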
3.4.5 Web-based application / Mobile app
Chapter 4: Feasibility analysis
From the studies done regarding the feasibility of the proposed system, it can be said that the
system is moderately feasible. The feasibility study of the project can be categorized into the
following three types:
1. Technical Feasibility
2. Economic Feasibility
3. Operational Feasibility
4.1 Technical Feasibility
The project uses machine learning and deep learning techniques. The main difficulty is the
creation of the dataset, which is time consuming as well as hard to do. Once that is done,
the project is technically feasible.
4.2 Economic Feasibility
The system is more software than hardware, so it will be economically feasible. To build and
train the model, we only need an ordinary PC, which is widely available nowadays.
4.3 Operational Feasibility
We plan to deploy the trained model on a website, which will be easy to use and user
friendly. The system can therefore be operated by anyone who knows how to use a website.
4.4 Gantt Chart
Fig 4.2: Gantt chart
4.5 Cost Estimation
S.N.  Item                           No. of items  Cost per unit  Total cost
1     ESP-32                         1             1000           1000
2     Turbidity sensor               1             2250           2250
3     Waterproof temperature sensor  1             2000           2000
4     pH sensor                      1             1500           1500
5     Jumper wires (pieces)          15            100            100
      Total cost estimation                                       6850
By integrating this system with state and central government workflows, we can enable a
fast response from government officers, thus improving the quality of life in rural as
well as urban areas.
Adding more sensors that detect other chemical and physical parameters
affecting the quality of water can improve our results and make the system more
effective.
Conclusion
This paper presented a practical and economical solution for monitoring the quality of water,
especially in rural areas, without any human intervention. To solve this problem, the paper
combined contemporary technologies such as IoT, cloud computing and machine
learning. By combining these technologies we are able to address, to a certain extent, one of
the basic and emerging problems of human survival.
In this paper, we therefore propose an alternative approach that uses artificial intelligence to
predict water quality. This method uses a significant and easily available water quality index
set by the WHO (World Health Organization). The data used in this paper are taken from the
“Department of water supply and sewerage management” and ENFOS, and include 2285 samples
from distinct water sources. In this paper, the WQI (Water Quality Index) is calculated using AI
techniques.
REFERENCES
[1] Pradeepkumar M, Monisha J. "The Real Time Monitoring of Water Quality in IoT
Environment" 2016 International Journal of Innovative Research in Science, Engineering and
Technology, 2015, ISSN (Online): 2319-8753.
[3] Kedia, Nikhil. "Water Quality Monitoring for Rural Areas - a Sensor Cloud Based
Economical Project" 2015 1st International Conference on Next Generation Computing
Technologies (NGCT), 2015, doi:10.1109/ngct.2015.7375081.
[4] Vijayakumar, N, and R Ramya. "The Real Time Monitoring of Water Quality in IoT
Environment" 2015 International Conference on Circuits, Power and Computing Technologies
[ICCPCT-2015], 2015, doi:10.1109/iccpct.2015.7159459.
[6] Fiona Regan, Antoin, McCarthy. "Smart Coast Project - Smart Water Quality Monitoring
System" 2006, Marine Institute/Environmental Protection Agency Partnership, 2006.
[7] Vaishnavi V. Daigavane, Dr. M.A Gaikwad. "Water Quality Monitoring System Based on
IOT" 2017 Advances in Wireless and Mobile Communications, Nov 2017, ISSN 0973-6972.