Paper 84
Abstract: Once a machine learning model has been trained and deployed, training it again on new and different data is expensive in both time and compute. In this research, we propose a novel approach that enables machine learning models to continue learning on multi-sensor and multi-temporal data. We present a comparative analysis of traditional learning and incremental learning approaches in the context of land use classification. The study explores the benefits of incremental online learning over traditional learning methods, focusing on its performance and effectiveness in handling large and evolving datasets. The online incremental approach effectively integrates information from multiple sources and time periods, enhancing the overall predictive power and accuracy of the models.

Index Terms: incremental learning, LULC classification, online learning

I. INTRODUCTION

Land cover (LC) refers to the physical and biological cover present over the surface of land: water, vegetation, bare soil, etc. Land use (LU) is defined from a human perspective (purpose and context): the use of land for agriculture, forestry and building construction (residential vs. commercial, etc.). An urban area (cities and towns) has various land uses such as residential, commercial, industrial, water/lakes, vegetation and agricultural areas. Understanding the spatial and temporal dynamics of urban growth, as well as determining how urbanization affects the environment, requires an understanding of LU. By examining changes in LU over time, researchers and decision-makers can find and analyse patterns and trends in urban growth and formulate plans for managing and reducing the detrimental effects of urbanization.

The process of classifying the land surface into distinct groups according to its physical, biological and anthropogenic properties is known as land use land cover (LULC) classification. Several techniques, including Geospatial Information Systems and remote sensing along with field surveys, aid in classifying LU. Large-scale mapping and classification of LU frequently involves data from several satellites, which are used to track changes over time and give useful information about their spatio-temporal distribution.

In recent years, various thematic LULC time series products have been generated for water [1], snow cover [2], vegetation index [3], urban/settlement extents [4], [5], and general LULC [6]. According to Koehler and Kuenzer [7], many Earth Observation (EO) based forecasting applications have been developed for LULC, crop output, plant cover, flooding, and urbanization. LULC classification plays a crucial role in understanding and managing our dynamic environment. Accurate and up-to-date information about the distribution and changes in land cover classes, such as forests, urban areas, agriculture, and water bodies, is essential for environmental monitoring, urban planning, natural resource management, and climate change studies. However, the classification of LULC from remote sensing data poses unique challenges due to the dynamic nature of land cover patterns and the continuous influx of new data. In our study we have used incremental learning (IL) to perform LULC classification over four different classes. Data acquired over several years by different satellites were used to perform this classification.

Traditional machine learning approaches for LULC classification often rely on training models with a static dataset and then applying those models to classify new data. However, this approach fails to capture the temporal dynamics and concept drifts that occur in land cover classes over time. As the landscape undergoes transformations caused by natural processes and human activities, there is a pressing need for models that can adapt to these changes in a seamless and efficient manner.

In this study we have used IL to perform LU classification of four different classes in an urban area, using data acquired over several years by multiple satellite sensors. The current work addresses the limitations of traditional static models by enabling adaptive and continuous learning of LU patterns. Incremental learning allows models to adapt to evolving landscapes, automatically incorporate new data, and refine their classification capabilities over time. By incrementally updating the models, we can effectively capture changes in LU classes, account for concept drifts, and provide accurate and up-to-date LU information.
II. DATA AND STUDY AREA

Landsat-series remote sensing data for Greater Bangalore, at 30 m spatial resolution, were obtained from the USGS public domain at http://landsat.usgs.gov. Ward borders on the city map were digitally extracted from the BBMP (Bruhat Bangalore Mahanagara Palike) map. Ground control points were gathered to register and geocorrect the remote sensing data, using a portable, pre-calibrated GPS (Global Positioning System), Survey of India topographical sheets, Google Earth (http://earth.google.com) and Bhuvan (http://bhuvan.nrsc.gov.in).

Landsat data were georeferenced and cropped to fit the study area. False colour composite (FCC) images were generated to aid the identification and analysis of heterogeneous patches found within the research region. The land use categories involved are i) urban (buildings, roads and paved surfaces), ii) vegetation (parks, botanical gardens and grasslands such as golf courses), iii) water bodies (lakes, sewage treatment tanks) and iv) others (playgrounds, quarry regions and barren land).

Landsat data for the years 2007 to 2022, comprising several GeoTIFF images covering the study area, were preprocessed using Python libraries such as Rasterio to ensure consistent spatial extent, resolution and alignment. Note that these data were captured by different sensors, as shown in Table I. Subsequently, labelled data were extracted from the images for the different LU classes. The masked pixels were assigned class labels based on available ground truth data and expert knowledge. This process produced a labelled dataset for further analysis and model training. The bands common to LT05C2-L1, LT05-TM and L8-OLI were considered for classification and further analysis. The objective was to keep common bands across the multi-sensor data so that IL could be performed while maintaining consistency in the classification results. Training and test data were collected for each year separately. The training data were used to train models in both the online and offline learning methods.

TABLE I
DATA FROM DIFFERENT SENSORS

Data of Year                                                  Sensors
2007                                                          LT05C2-L1
2009, 2010, 2011                                              LT05-TM
2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022    L8-OLI

III. METHODOLOGY

This section describes how online incremental learning was compared with traditional methods in various experiments.

A. Classical Offline Learning ML Models

In the context of traditional offline learning, ML models were first trained using a joint training approach. This means that when training a model for a specific year, the training data from multiple previous years were combined and used for training. For example, in the first set of experiments, training data from 2018 to 2022 were combined and used to train the ML model. The trained model was then evaluated on the test data for the target year, 2022 in this case. Similar experiments were conducted for other years as well, ensuring a comprehensive evaluation across multiple time periods (see Table III for details).

In addition to the evaluation over multiple years, a single-year comparison was carried out: a model was trained on the training data from one year and tested on the test data for the same year.

To perform the classification task, the Random Forest Classifier from the scikit-learn package was employed. This classifier is known for its ability to handle complex datasets and provide robust classification results. To optimize the performance of the classifier, a grid search was employed, coupled with five-fold cross-validation. This process allowed for the systematic exploration of various hyperparameter combinations, ultimately selecting the best model based on the evaluation metrics. By systematically tuning the hyperparameters, the model was able to adapt and generalize well to the given dataset, enhancing its ability to accurately classify LC and LU classes. Details of the experimental setup and results are presented in Tables II and III.

B. Incremental Online Learning ML Models

In the context of incremental learning, the training data for all years were processed sequentially. This involved passing the data for each year individually, starting from the earliest year and progressing chronologically. For example, the data for the year 2018 were processed record by record, followed by the data for the subsequent years up to 2022. The model was incrementally trained on this sequential data. Once the training phase was completed, the model was evaluated using the test data from the year 2022 to assess its performance and validate its effectiveness. Refer to Tables II and III for more details.

To check that the performance of the models does not degrade even for a single year of data, we took the training data of one single year, trained a model using the same IL method, and validated it on the test data of the same year. Details of these results are given in Table III.

To facilitate the incremental learning process, a variant of the Random Forest model known as the Adaptive Random Forest model was employed [8]. The Adaptive Random Forest model was implemented using the River package, a Python library specifically designed for online machine learning tasks. This package provides efficient and scalable algorithms for processing streaming data and enabled the implementation of the incremental learning approach on the dataset. By utilizing the Adaptive Random Forest model and the River package, the incremental learning method aimed to learn adaptively from the sequential data, incorporating concept drift detection and diversity-inducing mechanisms. This approach allowed for the continual refinement and updating of the model over time, accommodating changes in the land cover and land use patterns observed in the data.
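The two training protocols of Section III (joint offline training on pooled years, and sequential online training record by record) can be contrasted in a minimal sketch. It is illustrative only and rests on assumptions. `CentroidLearner` is a hypothetical stand-in, an incremental nearest-centroid classifier, chosen so that the example runs without third-party packages; it is not the Adaptive Random Forest used in the study. The helper names and the tiny two-band samples are made up for illustration and are not the Landsat data. The only property borrowed from River is the `learn_one` / `predict_one` interface style.

```python
from collections import defaultdict


class CentroidLearner:
    """Hypothetical stand-in for an online classifier (NOT Adaptive Random Forest).

    Keeps a running sum of feature vectors per class and predicts the class
    whose mean feature vector is closest to the query.
    """

    def __init__(self):
        self.sums = {}                  # class label -> per-feature running sums
        self.counts = defaultdict(int)  # class label -> number of records seen

    def learn_one(self, x, y):
        """Update the model with a single (features, label) record."""
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_one(self, x):
        """Return the label with the nearest class mean, or None if untrained."""
        if not self.sums:
            return None

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


def joint_training_set(train_by_year, years):
    """Offline protocol: pool the training records of all listed years."""
    return [rec for year in years for rec in train_by_year[year]]


def chronological_stream(train_by_year, years):
    """Online protocol: yield (year, record) pairs, earliest year first."""
    for year in sorted(years):
        for rec in train_by_year[year]:
            yield year, rec


def accuracy(model, test_records):
    """Fraction of test records whose predicted label matches the true label."""
    hits = sum(model.predict_one(x) == y for x, y in test_records)
    return hits / len(test_records)


# Made-up two-band pixel values: (feature vector, class label) per year.
train_by_year = {
    2018: [((0.10, 0.80), "vegetation"), ((0.70, 0.20), "urban")],
    2019: [((0.12, 0.78), "vegetation"), ((0.72, 0.25), "urban")],
    2022: [((0.15, 0.75), "vegetation"), ((0.75, 0.30), "urban")],
}
test_2022 = [((0.14, 0.77), "vegetation"), ((0.74, 0.28), "urban")]

# Offline protocol: all years must be pooled before training starts.
offline_model = CentroidLearner()
for x, y in joint_training_set(train_by_year, [2018, 2019, 2022]):
    offline_model.learn_one(x, y)

# Online protocol: the same model is updated record by record, year by year.
online_model = CentroidLearner()
for _, (x, y) in chronological_stream(train_by_year, [2018, 2019, 2022]):
    online_model.learn_one(x, y)

offline_acc = accuracy(offline_model, test_2022)
online_acc = accuracy(online_model, test_2022)
```

With this simple stand-in both protocols arrive at the same model, because class means do not depend on record order. The difference the sketch is meant to show is in how the data reach the learner: the offline variant needs every year pooled up front and must be retrained when a new year arrives, whereas the online variant only needs the new year's records. An actual River model would take the place of `CentroidLearner`; note that River models expect feature dictionaries rather than tuples, and the import path of the Adaptive Random Forest classifier depends on the installed River version.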
IV. RESULTS AND DISCUSSIONS

TABLE III
COMPARISON OF TRADITIONAL AND INCREMENTAL LEARNING METHOD FOR A SINGLE YEAR
capture temporal changes, seasonal variations, and dynamic phenomena. Online learning is well-suited for dynamic environments where data arrive continuously and the model needs to adapt and learn from new observations in real time. It provides the advantages of real-time updates, resource efficiency, adaptability to changing data patterns, and suitability for dynamic applications. However, careful data validation and quality control measures are necessary to ensure the accuracy and reliability of the online learning process.

By harnessing the power of incremental learning, we can unlock new opportunities in LULC classification and pave the way for more informed decision-making, sustainable land management, and a deeper understanding of our ever-changing environment. Once the classified data for urban areas have been obtained from online IL, numerous related application studies such as urban modelling, road and building detection, water mapping, urban area forest estimation, transportation modelling, etc. can be carried out.

ACKNOWLEDGMENT

The authors are thankful to IIITB (International Institute of Information Technology Bangalore) for all the support the institute has provided for this project.

REFERENCES

[1] Igor Klein, Ursula Gessner, Andreas J Dietz, and Claudia Kuenzer. Global WaterPack - a 250 m resolution dataset revealing the daily dynamics of global inland water bodies. Remote Sensing of Environment, 198:345–362, 2017. doi:10.1016/j.rse.2017.06.045.
[2] Andreas J Dietz, Claudia Kuenzer, and Stefan Dech. Global SnowPack: a new set of snow cover parameters for studying status and dynamics of the planetary snow cover extent. Remote Sensing Letters, 6(11):844–853, 2015. doi:10.1080/2150704X.2015.1084551.
[3] Jonathan León-Tavares, Jean-Louis Roujean, Bruno Smets, Erwin Wolters, Carolien Toté, and Else Swinnen. Correction of directional effects in vegetation NDVI time-series. Remote Sensing, 13(6):1130, 2021. doi:10.3390/rs13061130.
[4] AJ Florczyk, C Corbane, D Ehrlich, S Freire, T Kemper, L Maffenini, M Melchiorri, M Pesaresi, P Politis, M Schiavina, et al. GHSL data package 2019: public release GHS P2019. Publications Office, 2019. doi:10.2760/290498.
[5] Mattia Marconcini, Annekatrin Metz-Marconcini, Soner Üreyen, Daniela Palacios-Lopez, Wiebke Hanke, Felix Bachofer, Julian Zeidler, Thomas Esch, Noel Gorelick, Ashwin Kakarla, et al. Outlining where humans live, the World Settlement Footprint 2015. Scientific Data, 7(1):242, 2020. doi:10.1038/s41597-020-00580-5.
[6] Damien Sulla-Menashe and Mark Friedl. MCD12Q1 MODIS/Terra+Aqua land cover type yearly L3 global 500m SIN grid V006 [data set]. NASA EOSDIS Land Processes DAAC, 2019. doi:10.5067/MODIS/MCD12Q1.006.
[7] Jonas Koehler and Claudia Kuenzer. Forecasting spatio-temporal dynamics on the land surface using earth observation data - a review. Remote Sensing, 12(21):3513, 2020. doi:10.3390/rs12213513.
[8] Heitor M Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabrício Enembreck, Bernhard Pfahringer, Geoff Holmes, and Talel Abdessalem. Adaptive random forests for evolving data stream classification. Machine Learning, 106:1469–1495, 2017. doi:10.1007/s10994-017-5642-8.