Annual Average Rainfall Prediction Using KNN Model of Machine Learning

Nidhi Lal1, Shishupal Kumar2
nidhi.2592@gmail.com1
shishupal.kumar@cse.iiitn.ac.in2
Department of Computer Science and Engineering
Indian Institute of Information Technology, Nagpur, India

Abstract – Rainfall prediction is one of the most daunting tasks that can impact human society in a significant way. Accurate rainfall prediction can reduce the loss of life and property. It is the premise of water resource management and of timely action against floods. The complexity of the hydrological system, together with the nonlinear and nonstationary characteristics of the rainfall time series that forms part of it, makes accurate rainfall prediction a difficult task at present. This paper proposes a KNN model for the prediction of annual average rainfall.

KEYWORDS - Rainfall Prediction, K-Nearest Neighbors (KNN) Model, Annual Average Rainfall.

I. INTRODUCTION

Precipitation forecasting continues to be a serious concern and has attracted the attention of governments, businesses, risk management agencies and the scientific community. Rainfall is a weather factor that affects many human activities, among them agricultural production, construction, power generation, forestry and tourism [1], which makes its prediction important. Such forecasts also support the monitoring of activities in agriculture, infrastructure, tourism, transport and education, among others. Reliable meteorological forecasts can improve the decision-making of disaster management agencies when natural hazards may occur [2].

Precipitation forecasts have substantial value in many areas. The longer the lead time of the predictions, the better water can be used and the fewer drought disasters there will be. A flood warning system also needs a numerical precipitation forecast to improve the warning lead time. Annual average precipitation projections, i.e. predictions with a lead time of one year, can be used effectively to anticipate natural disasters such as droughts or floods, as well as for catchment management and for planning the use of water resources. Because the physical laws governing rainfall are highly complex, it is currently impossible to predict this process precisely and completely [3]. For this reason, time-series modeling, which aims to reproduce the relevant statistical characteristics of the observed series, plays an important role in long-term rainfall forecasting [4], [5].

This paper attempts to predict the rainfall in Barisal, Bangladesh, with the K-Nearest Neighbours (KNN) machine learning model, using daily precipitation records spanning several years. The paper is structured to contain an introduction, the proposed methodology, model training, implementation details, results, and a brief discussion of the proposed method, its implementation and its advantages.

II. RELATED WORK

Given the influence of weather on social and economic activities, predicting atmospheric phenomena such as rainfall can contribute to the prevention of adverse events. In the last few years, several approaches have been proposed toward this goal.

• Abhishek et al. proposed the use of artificial neural networks (ANNs) for predicting the monthly average rainfall in an area of India characterized by a monsoon-type climate. Their case study used data for eight months of each year, the months in which rainfall events are certain to occur. The authors used average humidity and average wind speed as explanatory variables [6].

• Lee, Sunyoung, Sungzoon Cho, and Patrick M. Wong. "Rainfall prediction using artificial neural networks." Journal of Geographic Information and Decision Analysis 2.2 (1998): 233-242 [7].

• Chau, K. W., and C. L. Wu. "A hybrid model coupled with singular spectrum analysis for daily rainfall prediction." Journal of Hydroinformatics 12.4 (2010): 458-473 [8].

• Ramana, R. Venkata, et al. "Monthly rainfall prediction using wavelet neural network analysis." Water Resources Management 27.10 (2013): 3697-3711 [9].

• Wong, Kok Wai, et al. "Rainfall prediction model using soft computing technique." Soft Computing 7.6 (2003): 434-438 [10].

• Sahai, A. K., M. K. Soman, and V. Satyan. "All India summer monsoon rainfall prediction using an artificial neural network." Climate Dynamics 16.4 (2000): 291-302 [11].

• Nayak, Deepak Ranjan, Amitav Mahapatra, and Pranati Mishra. "A survey on rainfall prediction using artificial neural network." International Journal of Computer Applications 72.16 (2013) [12].

III. PROPOSED WORK

A. THE DATASET

The dataset used in this project has 216 records. There are 31 distinct labels, each representing a different day of the month. There are 5 attributes: one categorical (Station), two discrete (Year and Month) and two continuous (High temp and Low temp) (Fig. 1).

Fig. 1: The Dataset

The Station attribute is Barisal for all the records, indicating that the dataset provides information on the annual precipitation in Barisal. Since this is the same for all the records, we drop this attribute. The records span the 18 years from 1995 to 2012. The Month attribute assigns a numerical value (1 to 12) to each month of the year in order. The High temp and Low temp attributes give the highest and the lowest temperature in each month of a particular year. The rainfall on each day of the month is recorded as an integer. Since not all months of the year have the same number of days, some days have no value assigned to them; these values are changed to 0 for preprocessing purposes. We have also calculated the total rainfall for each month of each year for analysis.

B. ATTRIBUTES

• Station: The location of the record. There is only one station, Barisal.

• Year: The year in which the record was taken, any number from 1995 to 2012. The number of records is the same for every year: 12 records (one per month), i.e. about 5.56% of the dataset per year.

• Month: The month in which the record was taken, numbered as follows: January - 1, February - 2, March - 3, April - 4, May - 5, June - 6, July - 7, August - 8, September - 9, October - 10, November - 11 and December - 12.

• High temp: The highest temperature in the corresponding month of the year, in degrees Celsius. The highest value is 35.1 in April 1999, and the lowest value was recorded in January 1998.

• Low temp: The lowest temperature in the corresponding month of the year, in degrees Celsius. The highest value is 27.1 in June 2005, and the lowest value was recorded in January 2001.

C. LABEL

Each label is an integer from 1 to 31 indicating the day of the month. The number of labels varies from 28 to 31 depending on the month: February is considered to have 29 days in leap years and 28 days in all other years.

D. DATA VISUALISATION

Fig. 2: Annual Rainfall

Fig. 2 shows the annual rainfall for each year. From the figure we can see that 1998 had the highest rainfall, about 2500 mm, and 2012 the lowest, about 1500 mm.

Fig. 3 shows the total rainfall in each month over all the years, calculated by adding up the monthly totals computed earlier for the respective month. July receives the highest rainfall, nearly 8000 mm, while December receives the lowest, less than 100 mm. The data shown in Fig. 2 and Fig. 3 has been plotted using Matplotlib in Python 3.7.

Fig. 3: Total Monthly Rainfall

E. APPROACH

For our work we have used the K-Nearest Neighbours (KNN) model to predict the rainfall. K-NN is a nonparametric approach that was introduced in pattern recognition work (Cover and Hart 1967) [13]. With the development of the theory of nonlinear dynamics, K-NN has been adopted by many researchers in that field as a standard method for predicting time series, because of its ability to approximate nonlinear dynamics (Farmer and Sidorowich 1987; Sugihara and May 1990) [14].

K-nearest neighbor (K-NN) methods use the similarity (neighborhood) between predictor observations and related sets of historical observations (successors) to obtain the best approximation for a dependent variable (Karlsson and Yakowitz 1987; Lall and Sharma 1996) [15]. In databases whose data points are divided into several classes, K-NN predicts the class of a new sample point; in regression, as in rainfall prediction, the prediction is the average of the target values of the nearest neighbours.

To estimate the error, the R² (R-squared) score has been used. It measures the proportion of the variance of a dependent variable that is explained by one or more independent variables in a regression model, i.e. the extent to which the variance of one variable explains the variance of a second variable.

For preprocessing, the entries containing no value (because the day does not exist) have been filled with 0. The Year attribute has been dropped completely, since better results were obtained without it.

80% of the dataset has been used for training and the remaining 20% for testing. The random state is set to 25, and the number of neighbours has been set to 8, the value that gave the best accuracy. For all the experiments and the development of the models we used Python 3.7 and a Google Colab Jupyter Notebook, with libraries such as Scikit Learn, Pandas and NumPy for data processing and predictions.
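The preprocessing, split, model and error measure described above can be sketched end to end in plain Python. This is a minimal, dependency-free sketch under stated assumptions, not the authors' code; they used Scikit Learn, where `KNeighborsRegressor`, `train_test_split` and `r2_score` play the same roles. The record layout (keys `"Station"`, `"Year"`, `"Month"`, `"High temp"`, `"Low temp"` and day columns `"1"` to `"31"`), the helper names, and the choice of the monthly total as the prediction target are hypothetical, since the paper does not state the exact target. Features are used unscaled because no scaling is mentioned, and the seeded shuffle does not reproduce Scikit Learn's split for the same random state. R² is computed as 1 - SS_res/SS_tot, the residual sum of squares over the total sum of squares around the mean.

```python
import math
import random

# Hypothetical column names for the 31 daily-rainfall labels.
DAY_KEYS = [str(d) for d in range(1, 32)]


def preprocess(record):
    """Fill missing days with 0, add the monthly total, and drop Station and Year."""
    days = [record.get(k) or 0 for k in DAY_KEYS]  # None (non-existent day) -> 0
    return {
        "Month": record["Month"],
        "High temp": record["High temp"],
        "Low temp": record["Low temp"],
        "Total": sum(days),  # assumed prediction target
    }


def features(row):
    """Predictor vector: Month, High temp, Low temp (unscaled)."""
    return (row["Month"], row["High temp"], row["Low temp"])


def train_test_split(rows, test_size=0.2, seed=25):
    """Seeded shuffle, then hold out ceil(test_size * n) rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k=8):
    """k-NN regression: mean target of the k training points closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: squared_distance(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic stand-in for the Barisal table: 18 years x 12 months = 216 records.
    rng = random.Random(0)
    records = []
    for year in range(1995, 2013):
        for month in range(1, 13):
            season = math.sin(math.pi * (month - 1) / 11)  # wetter mid-year
            rec = {"Station": "Barisal", "Year": year, "Month": month,
                   "High temp": 26 + 6 * season, "Low temp": 14 + 10 * season}
            for d in DAY_KEYS:
                # Days 29-31 left empty for simplicity, like non-existent days.
                rec[d] = rng.randint(0, int(20 * season) + 1) if int(d) <= 28 else None
            records.append(rec)

    rows = [preprocess(r) for r in records]
    train, test = train_test_split(rows, test_size=0.2, seed=25)
    X_train = [features(r) for r in train]
    y_train = [r["Total"] for r in train]
    preds = [knn_predict(X_train, y_train, features(r), k=8) for r in test]
    print("R^2 on held-out months:", round(r2_score([r["Total"] for r in test], preds), 3))
```

With 216 records and a 20% test fraction, 44 months are held out and 172 are used for training.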

IV. RESULT AND DISCUSSION

Fig. 4: Varying Neighbours with 25 and 10 Random States

From Fig. 4 we see that relatively better R² scores are obtained with a random state of 25 and 8 neighbours than with the earlier setting of a random state of 10 and 5 neighbours.

Fig. 5: Varying Random States with 5 and 8 Neighbours (panels 5a and 5b)

Fig. 5a and 5b show the R² score when the number of neighbours is fixed (8 and 5 respectively) and the random state is varied. Higher R² scores are observed with 8 neighbours.

Fig. 6: Predictions with 10 Random States and 5 Neighbours

Fig. 7: Predictions with 25 Random States and 8 Neighbours

In Fig. 6, with a random state of 10, 5 neighbours, and the Year attribute included in the dataset, the R² score is 0.761. To improve the R² score, the random state was changed from 10 to 25 and the number of neighbours was increased to 8. The Year attribute was also dropped from the dataset, which raised the R² score to 0.827 (Fig. 7).
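The comparisons of neighbour counts and random states behind Fig. 4 and Fig. 5 amount to a grid search over k and the split seed. A minimal NumPy sketch, assuming a feature matrix `X` (for example Month, High temp and Low temp) and a target vector `y` have already been prepared; the function names `knn_r2` and `sweep` are hypothetical, and the seeds feed NumPy's `default_rng`, so they do not reproduce the paper's Scikit Learn splits. Since Fig. 5 shows R² changing with the random state alone, comparing settings over several seeds gives a fairer picture than a single split.

```python
import numpy as np


def knn_r2(X, y, k, seed, test_size=0.2):
    """Seeded train/test split, k-NN regression on the test rows, and the resulting R^2."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))
    test, train = idx[:n_test], idx[n_test:]
    # Euclidean distance from every test row to every training row.
    dist = np.linalg.norm(X[test][:, None, :] - X[train][None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices into the training rows
    pred = y[train][nearest].mean(axis=1)      # average of the k neighbours' targets
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def sweep(X, y, ks, seeds):
    """R^2 for every (k, seed) pair."""
    return {(k, s): knn_r2(X, y, k, s) for k in ks for s in seeds}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic Month, High temp, Low temp columns and a noisy target (not the paper's data).
    X = rng.uniform([1, 20, 10], [12, 35, 27], size=(216, 3))
    y = 100 * X[:, 1] + rng.normal(0, 200, size=216)
    grid = sweep(X, y, ks=range(1, 16), seeds=[10, 25])
    best = max(grid, key=grid.get)
    print("best (k, seed):", best, "R^2 =", round(grid[best], 3))
```

Plotting `grid` against k for each seed gives a chart of the kind shown in Fig. 4; fixing k and varying the seed gives one like Fig. 5.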
These results can support better rainfall prediction and help tackle problems such as disaster mitigation and crop management planning. It should also be noted that these results may apply only to the city of Barisal, since rainfall depends on many more factors, such as geographic location. It is also possible that better results could be obtained with other machine learning models, such as Logistic Regression or Linear Regression, or with Artificial Neural Networks. Finally, because global warming and climate change are proceeding faster than before, the year might actually be a vital factor in recent years [17], [18].

V. CONCLUSION

From the above results we can see that, for predicting rainfall in the city of Barisal from 1995 to 2012 with the K-Nearest Neighbour model, the best results among the settings tried are obtained with a random state of 25 and 8 neighbours. We also observed that dropping the Year column gave better results than using it for prediction. This suggests that, in the timespan covered by the dataset, the year did not play a vital role in affecting rainfall. This may no longer be true due to global warming, pollution and other climate-affecting factors.

VI. REFERENCES

[1] World Health Organization. Climate Change and Human Health: Risks and Responses. World Health Organization, January 2003.

[2] Alcántara-Ayala, I. "Geomorphology, natural hazards, vulnerability and prevention of natural disasters in developing countries." Geomorphology 47, no. 2-4 (2002): 107-124.

[3] Hu, Jian, Jun Liu, Yong Liu, and Cheng Gao. "EMD-KNN model for annual average rainfall forecasting." Journal of Hydrologic Engineering 18, no. 11 (2011): 1450-1457.

[4] Chen, S. M., and C. C. Hsu. "A new method to forecast enrollments using fuzzy time series." Int. J. App. Sci. Eng. 2, no. 3 (2004): 234-244.

[5] Herrera, L. J., H. Pomares, I. Rojas, A. Guillén, A. Prieto, and O. Valenzuela. "Recursive prediction for long term time series forecasting using advanced models." Neurocomputing 70, no. 16-18 (2007): 2870-2880.

[6] Abhishek, Kumar, A. Kumar, R. Ranjan, and S. Kumar. "A rainfall prediction model using artificial neural network." In 2012 IEEE Control and System Graduate Research Colloquium (ICSGRC), pp. 82-87. July 2012.

[7] Lee, Sunyoung, Sungzoon Cho, and Patrick M. Wong. "Rainfall prediction using artificial neural networks." Journal of Geographic Information and Decision Analysis 2, no. 2 (1998): 233-242.

[8] Chau, K. W., and C. L. Wu. "A hybrid model coupled with singular spectrum analysis for daily rainfall prediction." Journal of Hydroinformatics 12, no. 4 (2010): 458-473.

[9] Ramana, R. Venkata, B. Krishna, S. R. Kumar, and N. G. Pandey. "Monthly rainfall prediction using wavelet neural network analysis." Water Resources Management 27, no. 10 (2013): 3697-3711.

[10] Wong, Shue Tuck. "A multivariate statistical model for predicting mean annual flood in New England." Annals of the Association of American Geographers 53, no. 3 (1963): 298-311.

[11] Sahai, A. K., A. M. Grimm, V. Satyan, and G. B. Pant. "Long-lead prediction of Indian summer monsoon rainfall from global SST evolution." Climate Dynamics 20, no. 7-8 (2003): 855-863.

[12] Nayak, Deepak Ranjan, Amitav Mahapatra, and Pranati Mishra. "A survey on rainfall prediction using artificial neural network." International Journal of Computer Applications 72, no. 16 (2013).

[13] Cover, T. M., and P. E. Hart. "Nearest neighbor pattern classification." IEEE Transactions on Information Theory 13, no. 1 (1967): 21-27.

[14] Farmer, J. D., and J. J. Sidorowich. "Predicting chaotic time series." Physical Review Letters 59, no. 8 (1987): 845-848.

[15] Karlsson, M., and S. Yakowitz. "Nearest-neighbor methods for nonparametric rainfall-runoff forecasting." Water Resources Research 23, no. 7 (1987): 1300-1308.

[16] Oswal, Nikhil. "Predicting Rainfall using Machine Learning Techniques." arXiv preprint arXiv:1910.13827 (2019).

[17] Shongwe, Mxolisi E., G. J. Van Oldenborgh, B. J. J. M. Van Den Hurk, B. De Boer, C. A. S. Coelho, and M. K. Van Aalst. "Projected changes in mean and extreme precipitation in Africa under global warming. Part I: Southern Africa." Journal of Climate 22, no. 13 (2009): 3819-3837.

[18] Ayers, Greg. "Air pollution and climate change: has air pollution suppressed rainfall over Australia?" Clean Air and Environmental Quality 39, no. 2 (2005): 51.
