Annual Average Rainfall Prediction Using KNN Model of Machine Learning
Abstract – Rainfall prediction is one of the most daunting tasks, and one that can impact human society in a significant way. Accurate and timely rainfall prediction can reduce the loss of life and property. The prediction of rainfall is the premise of water resource management and of timely action against floods. The complexity of the hydrological system, together with the nonlinear and nonstationary character of the rainfall time series that belongs to it, makes accurate prediction of rainfall very difficult at present. This paper proposes the KNN model for prediction of annual average rainfall.

KEYWORDS - Rainfall Prediction, K-Nearest Neighbors (KNN) Model, Annual Average Rainfall.

I. INTRODUCTION

The forecasting of precipitation continues to be a serious concern and has attracted the attention of governments, businesses, risk management agencies and the scientific community. Rainfall is a weather factor that affects many human activities, such as agricultural production, construction, power generation, forestry and tourism [1]. For this reason, the prediction of rainfall is important. Such forecasts also support the monitoring of activities in agriculture, infrastructure, tourism, transport and education, among others. The availability of reliable meteorological forecasts can improve the decision-making of disaster management agencies in the face of potential natural events [2].

The precipitation forecast has substantial value in many areas. The longer the lead time of the prediction, the better the use of water will be, and the fewer drought disasters there will be. A flood warning system also needs a numerical precipitation forecast to improve the warning lead time. Annual average precipitation projections, which are predictions with lead times of a year, can be used effectively to predict natural disasters such as droughts or floods, and also for catchment management and planning the use of water resources. Since the physical laws of rainfall are quite complex, it is impossible as of now to predict this process concisely and completely [3]. For this reason, time-series modeling, which aims to reproduce the relevant statistical characteristics of the observed series, plays an important role in long-term rainfall forecasting [4] [5].

This paper attempts to predict the rainfall in Barisal, Bangladesh, from daily precipitation data recorded over several years, using the KNN (K-Nearest Neighbours) model of machine learning. The paper is structured as follows: an introduction, the proposed methodology, training of the model, implementation details, results, and a brief discussion of the proposed method, its implementation and its advantages.

II. RELATED WORK

Given the influence of weather on social and economic activities, the prediction of atmospheric phenomena such as rainfall contributes to the prevention of adverse events. In the last few years, several approaches have been proposed to address this goal.

• Abhishek et al. proposed the use of ANNs for predicting the monthly average rainfall in an area of India characterized by a monsoon-type climate. The case study used data from eight months per year, the months in which rainfall events are certain to occur. The authors used average humidity and average wind speed as explanatory variables [6].

A. THE DATASET

The dataset used in this project has 216 records. There are 31 distinct labels, each representing a different day of the month. There are 5 attributes: one categorical, two continuous and two discrete (Fig. 1).

Fig. 1: The Dataset

The Station attribute is Barisal for all the records, indicating that the dataset provides information on the annual precipitation in Barisal. Since this is the same for all the records, we drop this attribute. The records span a period of 17 years, from 1995 to 2012. The Month attribute assigns a numerical value (1 to 12) to each of the months of the year in order. The High temp and Low temp attributes show the highest and the lowest temperature in each month of a particular year. The rainfall on each day of the month is recorded as an integer. Since not all months of the year have the same number of days, some of the days have no value assigned to them. These values are changed to 0 for preprocessing purposes. We have also calculated the total rainfall for each month of each year for analysis.

B. ATTRIBUTES

• Station: This is the location of the record. There is only one station, Barisal.

• Year: The year in which the record was taken. It can be any number from 1995 to 2012. The number of records for each year is the same. That is, each year comprises 14.28% of the dataset.

• Low temp: Indicates the lowest temperature in the corresponding month of the year, in degrees Celsius. The highest value is 27.1 in June 2005 and the lowest value occurs in January 2001.

mm. The data shown in Fig. 2 and Fig. 3 have been plotted using Matplotlib in Python 3.7.

Fig. 3: Total Monthly Rainfall

E. APPROACH
5a.
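As an illustration only, the following sketch shows how the preprocessing from Section A (missing daily values set to 0, monthly totals) and a KNN evaluation over several random train/test splits could be written in plain Python. It is not the implementation used in this paper: the function names, the 25% test fraction, the averaging of R² over the random states, and the synthetic 216-record data are all assumptions made for the example.

```python
import math
import random
from statistics import mean


def clean_daily(values):
    """Replace missing daily readings (None) with 0."""
    return [0 if v is None else v for v in values]


def monthly_total(values):
    """Total rainfall for one month after filling missing days with 0."""
    return sum(clean_daily(values))


def knn_predict(train_X, train_y, query, k):
    """Predict by averaging the targets of the k nearest training rows."""
    def sq_dist(row):
        # Squared Euclidean distance is enough for ranking neighbours.
        return sum((a - b) ** 2 for a, b in zip(row, query))
    nearest = sorted(zip(train_X, train_y), key=lambda pair: sq_dist(pair[0]))[:k]
    return mean(target for _, target in nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate(X, y, k, seeds, test_fraction=0.25):
    """Mean R2 over one random train/test split per seed (an assumption)."""
    scores = []
    for seed in seeds:
        idx = list(range(len(X)))
        random.Random(seed).shuffle(idx)
        n_test = max(1, int(len(idx) * test_fraction))
        test, train = idx[:n_test], idx[n_test:]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        preds = [knn_predict(train_X, train_y, X[i], k) for i in test]
        scores.append(r2_score([y[i] for i in test], preds))
    return mean(scores)


def synthetic_records(n_years=18, seed=0):
    """Synthetic (year, month, high, low) -> rainfall rows; NOT the Barisal data."""
    rng = random.Random(seed)
    X, y = [], []
    for year in range(1995, 1995 + n_years):
        for month in range(1, 13):
            # Seasonal factor: 0 in January/December, close to 1 mid-year.
            season = math.sin(math.pi * (month - 1) / 11)
            high = 26 + 8 * season + rng.uniform(-1, 1)
            low = 14 + 12 * season + rng.uniform(-1, 1)
            X.append((year, month, high, low))
            y.append(max(0.0, 400 * season ** 2 + rng.uniform(-40, 40)))
    return X, y
```

With X, y = synthetic_records(), calling evaluate(X, y, k=5, seeds=range(10)) keeps the year column, while evaluate([row[1:] for row in X], y, k=8, seeds=range(25)) drops it; these two calls mirror the settings compared in the results. On synthetic data the scores only demonstrate the mechanics. In practice, scikit-learn's KNeighborsRegressor, train_test_split and r2_score provide the same building blocks.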
Fig. 5: Varying Random States with 5 and 8 Neighbours

In Fig. 6, there were 10 random states and 5 neighbours, and the Year attribute was included in the dataset, which resulted in an R² score of 0.761. To improve the R² score, the number of random states was increased from 10 to 25 and the number of neighbours was increased to 8. The Year attribute was also dropped from the dataset. These changes improved the R² score to 0.827 (Fig. 7).

These results can be used for better predictions of rainfall and to tackle problems such as disaster mitigation and planned crop management. It should also be noted that these results may be applicable only to the city of Barisal, since rainfall depends on many more factors, such as geographic location. It is also possible that better results could be obtained using other machine learning models, such as Logistic Regression, Linear Regression or Artificial Neural Networks. Finally, because global warming and climate change are now proceeding at a higher rate than before, the year might actually be a vital factor in recent years [17] [18].

REFERENCES

[1] World Health Organization: Climate Change and Human Health: Risks and Responses. World Health Organization, January 2003.

[2] Alcántara-Ayala, I.: Geomorphology, natural hazards, vulnerability and prevention of natural disasters in developing countries. Geomorphology 47(2–4), 107–124 (2002).

[3] Hu, Jian, Jun Liu, Yong Liu, and Cheng Gao. "EMD-KNN model for annual average rainfall forecasting." Journal of Hydrologic Engineering 18, no. 11 (2011): 1450-1457.

[4] Chen, S. M., and Hsu, C. C. (2004). "A new method to forecast enrollments using fuzzy time series." Int. J. App. Sci. Eng., 2(3), 234–244.

[5] Herrera, L. J., Pomares, H., Rojas, I., Guillén, A., Prieto, A., and Valenzuela, O. (2007). "Recursive prediction for long term time series forecasting using advanced models." Neurocomputing, 70(16–18), 2870–2880.

[6] Kumar Abhishek, A. Kumar, R. Ranjan, and S. Kumar. "A rainfall prediction model using artificial neural network." In 2012 IEEE Control and System Graduate Research Colloquium (ICSGRC), pages 82–87, July 2012.

[7] Lee, Sunyoung, Sungzoon Cho, and Patrick M. Wong. "Rainfall prediction using artificial neural networks." Journal of Geographic Information and Decision Analysis 2, no. 2 (1998): 233-242.

[11] Sahai, A. K., A. M. Grimm, V. Satyan, and G. B. Pant. "Long-lead prediction of Indian summer monsoon rainfall from global SST evolution." Climate Dynamics 20, no. 7-8 (2003): 855-863.

[12] Nayak, Deepak Ranjan, Amitav Mahapatra, and Pranati Mishra. "A survey on rainfall prediction using artificial neural network." International Journal of Computer Applications 72, no. 16 (2013).

[13] Cover, T. M., and Hart, P. E. (1967). "Nearest neighbor pattern classification." IEEE Trans. Inform. Theory, 13(1), 21–27.

[14] Farmer, D. J., and Sidorowich, J. J. (1987). "Predicting chaotic time series." Phys. Rev. Lett., 59(8), 845–848.