


Online Incremental Machine Learning Model for Multi-Sensor and Multi-Temporal Satellite Data: A Case Study of Bangalore
Yash Mittal, Uttam Kumar, Deeksha Aggarwal
Spatial Computing Laboratory, IIIT Bangalore, Bangalore, India
yash.mittal@iiitb.ac.in, uttam@iiitb.ac.in, deeksha.aggarwal003@iiitb.ac.in

Abstract: Once a machine learning model has been trained and deployed, retraining it on new and different data is expensive in both time and compute. In this research, we propose a novel approach that enables machine learning models to continue learning on multi-sensor and multi-temporal data. We present a comparative analysis of traditional learning and incremental learning approaches in the context of land use classification. The study explores the benefits of incremental online learning over traditional learning methods, focusing on its performance and effectiveness in handling large and evolving datasets. The online incremental approach effectively integrates information from multiple sources and time periods, enhancing the overall predictive power and accuracy of the models.

Index Terms: incremental learning, LULC classification, online learning

I. INTRODUCTION

Land cover (LC) refers to the physical and biological cover present over the surface of land: water, vegetation, bare soil, etc. Land use (LU) is defined from the human perspective (purpose and context): use of land for agriculture, forestry and building construction (residential vs. commercial, etc.). An urban area (cities and towns) has various land uses such as residential, commercial, industrial, water/lakes, vegetation and agricultural areas. Understanding the spatial and temporal dynamics of urban growth, as well as determining how urbanization affects the environment, requires an understanding of LU. By examining changes in LU over time, researchers and decision-makers can find and analyse patterns and trends in urban growth and formulate plans for managing and reducing the detrimental effects of urbanization.

The process of classifying the land surface into distinct groups according to its physical, biological and anthropogenic properties is known as land use land cover (LULC) classification. Several techniques, including Geospatial Information Systems and remote sensing along with field surveys, aid in classifying LU. Large-scale mapping and classification of LU frequently involves data from several satellites, which are used to track changes over time and give useful information about their spatio-temporal distribution.

In recent years, various thematic LULC time series products have been generated for water [1], snow cover [2], vegetation index [3], urban/settlement extents [4], [5], and general LULC [6]. Many Earth Observation (EO) based forecasting applications have been developed for LULC, crop output, plant cover, flooding, and urbanization, according to Koehler and Kuenzer [7]. LULC classification plays a crucial role in understanding and managing our dynamic environment. Accurate and up-to-date information about the distribution and changes in land cover classes, such as forests, urban areas, agriculture, and water bodies, is essential for environmental monitoring, urban planning, natural resource management, and climate change studies. However, the classification of LULC from remote sensing data poses unique challenges due to the dynamic nature of land cover patterns and the continuous influx of new data. In our study we used incremental learning to perform LULC classification over four different classes, using data acquired over several years by different satellites.

Traditional machine learning approaches for LULC classification often rely on training models with a static dataset and then applying those models to classify new data. However, this approach fails to capture the temporal dynamics and concept drifts that occur in land cover classes over time. As the landscape undergoes transformations caused by natural processes and human activities, there is a pressing need for models that can adapt to these changes in a seamless and efficient manner.

In this study we used incremental learning (IL) to perform LU classification of four different classes in an urban area, using data acquired over several years through multi-satellite sensors. The current work addresses the limitations of traditional static models by enabling the adaptive and continuous learning of LU patterns. Incremental learning allows models to adapt to evolving landscapes, automatically incorporate new data, and refine their classification capabilities over time. By incrementally updating the models, we can effectively capture changes in LU classes, account for concept drifts, and provide accurate and up-to-date LU information.
TABLE I
DATA FROM DIFFERENT SENSORS

Data of Year | Sensors
2007 | LT05C2-L1
2009, 2010, 2011 | LT05-TM
2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022 | L8-OLI

II. DATA AND STUDY AREA

Landsat remote sensing data for Greater Bangalore, with a 30 m spatial resolution, were obtained from the USGS public domain at http://landsat.usgs.gov. Ward borders on the city map were digitally extracted from the BBMP (Bruhat Bangalore Mahanagara Palike) map. Ground control points were gathered using a portable, pre-calibrated GPS (Global Positioning System), Survey of India topographical sheets, Google Earth (http://earth.google.com) and Bhuvan (http://bhuvan.nrsc.gov.in) to register and geocorrect the remote sensing data.

Landsat data were georeferenced and cropped to fit the study area. False colour composite (FCC) images were generated to aid the identification and analysis of heterogeneous patches found within the research region. The land use categories involved are i) urban (buildings, roads and paved surfaces), ii) vegetation (parks, botanical gardens and grasslands such as golf courses), iii) water bodies (lakes, sewage treatment tanks) and iv) others (playgrounds, quarry regions and barren land).

Landsat data from 2007 to 2022, comprising several GeoTIFF images covering the study area, were preprocessed using Python libraries such as Rasterio to ensure spatial consistency, resolution and alignment. Note that these data were captured by different sensors, as shown in Table I. Subsequently, labelled data were extracted from the images for the different LU classes. The masked pixels were assigned class labels based on available ground truth data and expert knowledge. This process generated a labelled dataset for further analysis and model training. Only the bands common to LT05C2-L1, LT05-TM and L8-OLI were considered for classification and further analysis. The objective was to keep the same bands across the multi-sensor data so that IL could be performed with consistent classification results. Training and test data were collected for each year separately. Training data were used to train models in both online and offline learning methods.

III. METHODOLOGY

This section describes how online incremental learning was compared with traditional methods in various experiments.

A. Classical Offline Learning ML Models

In the context of traditional offline learning, machine learning (ML) models were first trained using a joint training approach. This means that when training a model for a specific year, the training data from multiple previous years were combined and used for training. For example, in the first set of experiments, training data from 2018 to 2022 were combined and used to train the ML model. The trained model was then evaluated on the test data for the final year, 2022. Similar experiments were conducted for other years as well, ensuring a comprehensive evaluation across multiple time periods (see Table II for details).

In addition to the evaluation over multiple years, we also made a single-year comparison, in which the training data from one year were used to train the model and the model was tested on the test data for the same year.

To perform the classification task, the Random Forest Classifier from the scikit-learn package was employed. This classifier is known for its ability to handle complex datasets and provide robust classification results. To optimize the performance of the classifier, a grid search was employed, coupled with five-fold cross-validation. This process allowed for the systematic exploration of various hyperparameter combinations, ultimately selecting the best model based on the evaluation metrics. Systematic tuning of the hyperparameters helped the model generalize well to the given dataset, enhancing its ability to accurately classify LC and LU classes. Details of the experimental setup and results are presented in Tables II and III.

B. Incremental Online Learning ML Models

In the context of incremental learning, the training data for all years were processed in a sequential manner. This involved passing the data for each year individually, starting from the earliest year and progressing chronologically. For example, the data for the year 2018 were processed record by record, followed by the data for the subsequent years up to 2022. The model was incrementally trained on this sequential data. Once the training phase was completed, the model was evaluated using the test data from the year 2022 to assess its performance and validate its effectiveness. Refer to Tables II and III for more details.

To check that the performance of the models does not degrade even for a single year of data, we took the training data of one single year, trained the same IL model on it, and validated it on the test data of the same year. Details of these results are given in Table III.

To facilitate the incremental learning process, a variant of the Random Forest model known as the Adaptive Random Forest model was employed [8]. The Adaptive Random Forest model was implemented using the River package, a Python library specifically designed for online machine learning tasks. This package provides efficient and scalable algorithms for processing streaming data and enabled the implementation of the incremental learning approach on the dataset. By using the Adaptive Random Forest model and the River package, the incremental learning method aimed to learn adaptively from the sequential data, incorporating concept drift detection and diversity-inducing mechanisms. This approach allowed for the continual refinement and updating of the model over time, accommodating changes in the land cover and land use patterns observed in the data.
IV. RESULTS AND DISCUSSIONS

The experiments examine the performance and effectiveness of the incremental learning (IL) approach compared with the traditional offline learning methods, shedding light on the advantages and limitations of each approach in the context of LU classification. We evaluated the models from both approaches on the test data using accuracy, precision, recall and F1-score. As Table II shows, the incremental online method outperformed the traditional offline method on accuracy, recall and F1-score in every multi-year experiment, and on precision in all but the 2014 to 2019 experiment (0.69 versus 0.72). We also compared online and offline learning on single-year data (Table III), where the incremental results matched or slightly exceeded the offline results in three of the four years.

Fig. 1. LULC prediction on year 2016 data using multiple years of data through: (a) Adaptive Random Forest and (b) Random Forest
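
For reference, the four reported metrics can be computed from true and predicted labels as sketched below. This is a minimal illustration in plain Python, not the authors' evaluation code. The support-weighted averaging across classes is an assumption: the paper does not state the averaging scheme, although recall equals accuracy in most rows of Tables II and III, which is what weighted averaging produces. The label lists and the function name `weighted_metrics` are made up for the example.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    total = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        p_c = tp / n_pred if n_pred else 0.0  # class never predicted
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total  # assumed: weight each class by its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for the four LU classes used in this study.
y_true = ["urban", "urban", "vegetation", "water", "others", "water"]
y_pred = ["urban", "vegetation", "vegetation", "water", "urban", "water"]
scores = weighted_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in scores.items()})
```

With support weighting, recall reduces to the fraction of correctly classified samples, so it coincides with accuracy whenever every class is weighted by its true count.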
In Table II, we divided our data into four experimental scenarios. In the first experiment, data from 2009 to 2014 were used to train the model with both the offline and online approaches, and the model was evaluated on an independent set of test data from the year 2014. In this experiment, the data for 2009 to 2011 come from one sensor (LT05-TM), while the data for 2013 and 2014 come from a different sensor (L8-OLI), as tabulated in Table I. In the second experiment, the data are taken from 2018 to 2022 and come from a single sensor. In the third experiment we took six years of data from the same sensor, from 2014 to 2019. Lastly, in the fourth experiment we increased the number of years to eight, from 2007 to 2016. Here the data for 2007 come from one sensor (LT05C2-L1), the data for 2009 to 2011 from a second sensor (LT05-TM) and the data for 2013 to 2016 from a third sensor (L8-OLI). In all four experiments the incremental Adaptive Random Forest achieved higher accuracy than the traditional Random Forest.

We observed that over multiple years, incremental learning performed better than traditional learning. To further evaluate the algorithms, we used data from a single year and tested both approaches on it. The results are given in Table III. In the first experiment, training data from 2014 were used to train Random Forest and Adaptive Random Forest, and both models were evaluated on the test data from 2014. Adaptive Random Forest performed slightly better than Random Forest in this case. Similar experiments were done for the years 2022, 2019 and 2016. For 2022 and 2019 the incremental method achieved higher accuracy. For 2016 the traditional method was ahead, by a margin of only 0.01.

Incremental online learning methods offer several advantages over traditional offline methods. One significant advantage is their ability to handle real-time changes in data while the model is in production. As time progresses, the Earth's surface undergoes continuous transformations, and geospatial data reflect these temporal changes. Geospatial data often exhibit seasonal variations, especially in phenomena such as vegetation indices or water bodies. For example, satellite imagery can capture the changing colors and density of vegetation throughout the year as plants go through different growth stages. Similarly, water bodies may experience fluctuations in size, depth, or water quality over different seasons. Geospatial data can be collected at regular intervals using satellite imagery, aerial surveys, or ground-based sensors. With more frequent data collection, it becomes possible to capture finer temporal variations and monitor changes more effectively. High-resolution satellite imagery, for example, can provide updates on land cover changes on a monthly, weekly, or even daily basis. An incremental model can use these updates to adapt to new changes.

Another advantage is the efficient use of resources. Unlike offline methods that require all data to be loaded into memory during training, online learning models can handle large amounts of data without the need for extensive memory resources. The model processes data in a sequential manner, allowing it to learn from each instance without needing to store the entire dataset in memory. Incremental online learning is particularly effective in handling different varieties of data over time. As the model continuously learns and updates itself, it can easily adapt to new patterns and variations in the data. This adaptability makes it well-suited for applications where data characteristics change over the years, such as land use and land cover classification.

Moreover, online learning models have the advantage of automatic updates. As new data are processed, the model incrementally incorporates the information, ensuring that it stays up to date with the latest trends and patterns in the data. This feature eliminates the need for manual retraining and helps the model remain accurate and effective over time.

V. CONCLUSIONS

Overall, it can be concluded that the incremental online method performs reasonably well in terms of accuracy, precision, and recall compared to the traditional offline method. Although there are some variations in performance between the two methods, the incremental online method offers the advantage of learning from new data in a sequential manner without the need for retraining the model from scratch. Geospatial data is considered dynamic due to its ability to
capture temporal changes, seasonal variations, and dynamic phenomena. Online learning is well-suited for dynamic environments where data arrives continuously and the model needs to adapt and learn from new observations in real-time. It provides the advantages of real-time updates, resource efficiency, adaptability to changing data patterns, and suitability for dynamic applications. However, careful data validation and quality control measures are necessary to ensure the accuracy and reliability of the online learning process.

By harnessing the power of incremental learning, we can unlock new opportunities in LULC classification and pave the way for more informed decision-making, sustainable land management, and a deeper understanding of our ever-changing environment. Once the classified data from online IL belonging to urban areas have been obtained, numerous related application studies such as urban modelling, road and building detection, water mapping, urban area forest estimation, transportation modelling, etc. can be carried out.

TABLE II
COMPARISON OF TRADITIONAL AND INCREMENTAL LEARNING METHOD FOR MULTIPLE YEARS

Columns 3 to 6: Traditional Offline Method. Columns 7 to 10: Incremental Online Method.

Years used for training | Year used for testing | Accuracy | Precision | Recall | F1-Score | Accuracy | Precision | Recall | F1-Score
2009, 2010, 2011, 2013, 2014 | 2014 | 0.55 | 0.59 | 0.55 | 0.52 | 0.66 | 0.71 | 0.66 | 0.66
2018, 2019, 2020, 2021, 2022 | 2022 | 0.59 | 0.60 | 0.59 | 0.58 | 0.65 | 0.66 | 0.65 | 0.64
2014, 2015, 2016, 2017, 2018, 2019 | 2019 | 0.58 | 0.72 | 0.58 | 0.57 | 0.62 | 0.69 | 0.62 | 0.62
2007, 2009, 2010, 2011, 2013, 2014, 2015, 2016 | 2016 | 0.86 | 0.87 | 0.83 | 0.84 | 0.87 | 0.88 | 0.88 | 0.88

TABLE III
COMPARISON OF TRADITIONAL AND INCREMENTAL LEARNING METHOD FOR A SINGLE YEAR

Columns 2 to 5: Traditional Offline Method. Columns 6 to 9: Incremental Online Method.

Year | Accuracy | Precision | Recall | F1-Score | Accuracy | Precision | Recall | F1-Score
2014 | 0.67 | 0.72 | 0.67 | 0.68 | 0.68 | 0.73 | 0.68 | 0.69
2022 | 0.62 | 0.65 | 0.62 | 0.62 | 0.63 | 0.66 | 0.63 | 0.62
2019 | 0.60 | 0.69 | 0.61 | 0.62 | 0.62 | 0.69 | 0.62 | 0.62
2016 | 0.89 | 0.89 | 0.89 | 0.89 | 0.88 | 0.89 | 0.88 | 0.88

ACKNOWLEDGMENT

The authors thank IIITB (International Institute of Information Technology Bangalore) for the support the institute provided for this project.

REFERENCES

[1] Igor Klein, Ursula Gessner, Andreas J Dietz, and Claudia Kuenzer. Global waterpack–a 250 m resolution dataset revealing the daily dynamics of global inland water bodies. Remote sensing of environment, 198:345–362, 2017. Doi:10.1016/j.rse.2017.06.045.
[2] Andreas J Dietz, Claudia Kuenzer, and Stefan Dech. Global snowpack: a new set of snow cover parameters for studying status and dynamics of the planetary snow cover extent. Remote sensing letters, 6(11):844–853, 2015. Doi:10.1080/2150704X.2015.1084551.
[3] Jonathan León-Tavares, Jean-Louis Roujean, Bruno Smets, Erwin Wolters, Carolien Toté, and Else Swinnen. Correction of directional effects in vegetation ndvi time-series. Remote Sensing, 13(6):1130, 2021. Doi:10.3390/rs13061130.
[4] AJ Florczyk, C Corbane, D Ehrlich, S Freire, T Kemper, L Maffenini, M Melchiorri, M Pesaresi, P Politis, M Schiavina, et al. Ghsl data package 2019: public release ghs p2019. Publications Office, 2019. Doi:10.2760/290498.
[5] Mattia Marconcini, Annekatrin Metz-Marconcini, Soner Üreyen, Daniela Palacios-Lopez, Wiebke Hanke, Felix Bachofer, Julian Zeidler, Thomas Esch, Noel Gorelick, Ashwin Kakarla, et al. Outlining where humans live, the world settlement footprint 2015. Scientific Data, 7(1):242, 2020. Doi:10.1038/s41597-020-00580-5.
[6] SM Damien and F Mark. Mcd12q1 modis/terra+ aqua land cover type yearly l3 global 500m sin grid v006 [data set]. NASA EOSDIS Land Processes DAAC, 2019. Doi:10.5067/MODIS/MCD12Q1.006.
[7] Jonas Koehler and Claudia Kuenzer. Forecasting spatio-temporal dynamics on the land surface using earth observation data: a review. Remote Sensing, 12(21):3513, 2020. Doi:10.3390/rs12213513.
[8] Heitor M Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabrício Enembreck, Bernhard Pfahringer, Geoff Holmes, and Talel Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, 106:1469–1495, 2017. Doi:10.1007/s10994-017-5642-8.
