Artificial Intelligence and Machine Learning in Earth System Sciences With Special Reference To Climate Science and Meteorology in South Asia
Previous reviews A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Rolnick et al.2 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Reichstein et al.3 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Shen et al.4 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Sit et al.5 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Ball et al.6 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Fang et al.7 ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
The present study ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
A, Electricity systems; B, Transportation systems; C, Buildings and cities/urban climate; D, Industrial systems; E, Farms and forests; F, Climate change mitigation; G, Weather and climate prediction; H, Climate finance; I, Causality; J, Computer vision; K, Interpretable machine learning; L, Natural language processing; M, Reinforcement learning; N, Time series; O, Transfer learning; P, Uncertainty estimation; Q, Unsupervised learning; R, Seismology; S, South Asian monsoon; T, Short-range weather prediction; U, Extended range weather forecasting; V, Seasonal weather prediction; W, Hydrology; X, Oceanography; Y, Transformers or generative adversarial networks; Z, Weather and climate extremes.
temperature, etc. The developments in physical sciences associated with simple statistical methodologies have left a large grey area in uncovering the relationships leading to complex, nonlinear variables. Hence, there is a need to dedicate resources to using advanced ML-based tools to decipher the links between physical fields which are still out of our reach and improve their predictability. The developments in deep learning, deep reinforcement learning, transformers, nonlinear science, and recent advances in interpretable ML are the areas that can help solve crucial research problems in ESS. Recognizing this need, to effectively utilize the extensive data, the Ministry of Earth Sciences (MoES), Government of India (GoI) has recently set up a virtual centre for AI and ML devoted to earth sciences, which is anchored at the Indian Institute of Tropical Meteorology (IITM), Pune.
the challenges in accurately predicting the various spatio-temporal scales of the monsoon. We also focus on using ML methods for extended range predictions.
The studies summarized in Table 1 have not considered the latest state-of-the-art algorithms, such as the attention-based transformers and generative adversarial networks. The advancements brought about by these models in the computer vision and natural language processing community make them excellent candidates to be explored in the domain of ESS.
This study outlines all the previous reviews on the subject, delineates the tools required, the materials needed by interested researchers to gain hands-on experience in ML and can be used to further the applications of ML in ESS.
Background
Figure 1. Mind map of multidimensional areas related to machine learning (ML) in weather and climate sciences.
with added attention has been recently done12. Recent advances in computer vision show that algorithms such as SRGAN, LapSRN, FSRGAN and UNET outperform the standard SRCNN. Long short-term memory (LSTM) networks, sequence-to-sequence networks and the recent attention-based transformer models have improved the accuracies in natural language processing. Some of these algorithms have also been used or can be applied to the time-series forecasting problems in ESS. A survey on these applications can be found in Lim and Zohren13.
Weather and climate data are so massive that they have not been explored exhaustively by the community working on big data. The spatio-temporal nature of the datasets, i.e. three-dimensional fields at each time step, makes it a complex problem to solve. The patterns in this four-dimensional data cannot be deciphered manually, and ML offers a perfect opportunity. Models that have shown good performance on video datasets, such as ConvLSTM, can be used to build large-scale, deep learning-based systems that predict the information at high spatial and temporal resolution14,15. Sequence-to-sequence and LSTM networks have been used to predict and forecast active-break cycles of the Indian monsoon16. Before starting any analysis, traditional algorithms such as random forest, support vector machines and multivariate linear regression should be the first go-to methods. EnhanceNet and PSPNet are algorithms that can be used to classify the objects in images and spatially locate them. They have shown excellent results in computer vision applications. They can be used for problems such as identifying floods from satellite imagery.
ESS datasets
An understanding of ESS datasets is important while developing ML models. These datasets primarily come in three classes: (i) observational data, (ii) reanalysis products which are merged model outputs and observations created invariably and (iii) dynamical model simulated outputs (such as climate change data from models). For the South Asian domain, long-period ground-based observations made by the India Meteorological Department (IMD) are available. These datasets can now be obtained from the website https://dsp.imdpune.gov.in. Satellite-based products are available from the Tropical Rainfall Measuring Mission (TRMM), Landsat, Sentinel and MODIS. Reanalysis products are gridded products which are developed through blending models and observational data products using data assimilation techniques, and are useful for the
CURRENT SCIENCE, VOL. 122, NO. 9, 10 MAY 2022 1021
REVIEW ARTICLE
fields which are not/cannot be measured directly by instruments. They offer insights into the information which is closest to reality. Various reanalysis products are available for the South Asian region, such as IMDAA reanalysis, NCEP/NCAR reanalysis, CAMS reanalysis, ERA5 reanalysis, MERRA-2 reanalysis and JRA55 reanalysis. With regard to model products, TIGGE (short- and medium-range forecasts), CMIP5/CMIP6 (past and future climate scenarios), and seasonal to sub-seasonal (S2S) hindcasts are available. The model outputs are based on the integration of partial differential equations of dynamical systems. ML offers an innovative methodology to improve these dynamical model estimates by combining them with the observed or reanalysis products.
The archive of seismic waveform data, global positioning system (GPS) data, oceanographic and other geoscience datasets in India is increasing exponentially every year, calling for fast and efficient processing and dissemination of information to the public service systems.
Research problems in ESS
South Asia is home to more than two billion people who are largely dependent on natural climate variability for their livelihood. For example, the Indian monsoon feeds agricultural lands over the region, thus directly impacting its economic well-being. Monsoon is a complex, multi-scale and nonlinear problem. Hence linear methods cannot unravel the fundamental processes, especially the feedback processes leading to its variability. Forecasts at various temporal scales such as short to medium range (1–10 days), extended range (2–3 weeks), seasonal scale (for the coming season) and climate scale (hundreds of years) are essential for planning hydrological resources of the region. It is known that crop yields depend on meteorological variables; ML can be used to accurately forecast the spatial crop yield a season in advance and thus economically benefit the society. The demographics in the South Asian region have considerably changed in the past decades, and many people now live in the cities. This demographic shift could be attributed to the agricultural variability arising from the modulations in rainfall patterns (and other factors such as new opportunities in various sectors).
The population density in South Asian countries is also very high. Hence, locally accurate urban forecasts are a need of the hour. These locations are also sources of chemical species harmful to the environment and all living beings. Hence air-pollution prediction is a significant task. Identifying localities with high air pollution is essential for city planning; for example, deciding the number of electric buses to be introduced in a city. ML-based algorithms can be used to improve the cyclone forecasts of dynamical models. Extreme weather events such as heatwaves and cloud bursts are causing havoc in recent times. It is challenging to predict them accurately. Other important problems of interest to the ESS community are flood forecasting and disaster management using AI/ML-based techniques.
In seismology, AI/ML-based techniques are being used for earthquake detection, phase-picking (measurement of arrival times of distinct seismic phases), event classification, early warning of earthquakes, ground motion prediction, tomography and earthquake geodesy. They are also useful to determine and predict tsunami inundation and heights.
Popular tools to perform ML for ESS
Open-source software packages have provided a bridge to the domain experts to avoid reinventing the wheel while applying ML to their problems. Python is the most popular language for ML, and various libraries such as TensorFlow, PyTorch, Theano, MXNet, OpenCV, Keras and PyTorch Lightning are available freely. Visualization software such as TensorBoard and Tableau assists in communicating the results from ML models. In addition to the software requirements, deep learning needs graphical processing units (GPUs) to perform tensor computations in neural networks. Tensor processing units (TPUs) are a step ahead of GPUs, wherein the neural network is encoded on the chip to perform fast calculations. However, TPUs are only available over the cloud, and not every individual can buy a personal GPU for deep learning. Hence free and paid cloud computing services, such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), Paperspace, Digital Ocean, Google Earth Engine, etc. provide an option to build machines over the cloud to perform deep learning and data analysis in ESS17. A step further, the concept of Jupyter notebooks as a service has become popular, and there are several free and paid vendors providing notebooks as a service. Notable amongst them are the free services offered by Kaggle, Google Colab and others. Readers can find information on more cloud vendors at https://github.com/binga/cloud-gpus, https://github.com/zszazi/Deep-learning-in-cloud, https://github.com/discdiver/deep-learning-cloud-providers/blob/master/list.md, etc. 'Docker containers' have also become an essential part of the ecosystem, helping us to deploy end-to-end packages for deep learning.
Educational materials for learning earth system data science
A key component in the ML cycle is the educational resources to build knowledge and apply it to ESS. The avenues to learn data science and use ML for earth sciences applications are the Coursera specializations, courses, professional certificates, Udacity nanodegrees, Udemy
courses and other free and paid materials available as massive open online courses (MOOCs). The Development of Skilled Manpower in Earth System Sciences (DESK), Ministry of Earth Sciences (MoES), GoI regularly holds training programmes for young researchers on ML applications in earth sciences. DESK conducted one such training workshop in 2021, and the video recordings of the sessions can be found at https://tinyurl.com/448t8yb4.
Decision-making for ML in ESS
Once the weather/hydrological forecasts are generated, they must be used to make decisions for the benefit of society. Deep reinforcement learning is an excellent method for this. State-of-the-art algorithms such as Deep-Q-networks, vanilla policy gradient, trust region policy optimization, proximal policy optimization, deep deterministic policy gradient (DDPG), soft actor-critic, twin delayed DDPG, etc. can be used to train agents that can guide decision-making. The most crucial aspect of deep reinforcement learning is the design of the environment, action(s) and reward(s). The authorities can use these tools in decision-making for disaster preparedness/mitigation, hydrological planning and other associated tasks.
Feature engineering for ML in ESS
Feature engineering is the generation of meaningful predictors or parameters to improve the performance of an ML model. It is performed after cleaning the data and preparing them in a format that can train statistical models. It has been noted that removing redundant variables improves the performance of ML systems. Various methods can be used to find the most valuable predictors; some of them are principal component analysis (PCA), empirical orthogonal functions (EOF) and independent component analysis (ICA). Binning, counting, transforming or filtering can extract the predictive signal from the data to improve the models. Unsupervised learning techniques, such as autoencoders, can also assist in finding valuable predictors from raw datasets. The deep learning-based models are, however, coded for image-based input datasets. To overcome this limitation, strategies such as transforming the spherical global data to a cubed sphere or tangent planes mapping can effectively reduce spherical distortions in the data.
While the previous decade has seen the hype of deep learning overshadow other ML methodologies, numerous emerging and innovative ML methods can be used for ESS. Graph ML is the training of neural networks on graphs and is becoming increasingly popular. Complex networks and recurrence plots fall in the category of nonlinear methodologies and are suitable for specific applications. While using ML in physical sciences, one primary concern is that these could be considered as black-box models. Interpretable ML aims to address this concern, and analysis of deep learning model weights reveals the patterns learned. Active research is being done in this area, and it is crucial for the increasing acceptability of deep learning models at the production scale in ESS. The emerging fields of augmented reality, virtual reality, improved remote sensing measurements, crowd-sourcing and drone technology offer excellent potential to advance observation data collection and improve ML models.
Applications of AI and ML in earth sciences
The AI/ML algorithms have vast applications in earth sciences problems. Figure 2 depicts a few such applications in areas such as atmosphere/biosphere, seismology and ocean.
Statistical downscaling
Downscaling of data is necessary to obtain a local projection of the information. The present-day models and observations generated from weather stations (or other instruments) are available at a coarser resolution. They are irregularly spaced, which may often lead to misrepresentation (or absence) of precipitation, temperature or other variables at local levels. Downscaling the Indian summer monsoon (ISM) rainfall is a difficult task involving a multi-scale spatio-temporal dynamical process with significant variance18. Further, regional variations of ISM rainfall are often quite substantial, varying from a few millimetres to thousands of millimetres within a few hundred kilometres. The ISM rainfall can be classified into different coherently fluctuating zones, linked to complex multi-scale processes19–21.
Statistical downscaling is a low-cost method to obtain information at the local scale and provide it to the stakeholders. AI and ML techniques are used for statistical downscaling8,22. Recently, development in the single image super-resolution using deep learning has proved to be one of the best methods used for this purpose8–10. Another method that has shown promising results in statistical downscaling is ConvLSTM documented by Harilal et al.23.
The growing volume of seismological and other geoscience-related datasets acquired from surface and borehole studies requires efficient analysis and trend recognition techniques to extract valuable signals. AI/ML tools have been applied in different fields in seismology, from event
Figure 2. An overview of the application of artificial intelligence (AI)/ML algorithms in some earth sciences problems. The precipitation forecasting can include data from short-range, medium-range and extended-range forecasting.
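Classical earthquake detection often relies on a short-term average/long-term average (STA/LTA) criterion, which serves as the baseline that ML-based detectors in seismology are compared against. The following is a minimal sketch of that criterion, not the implementation used in any cited study. The function names, the window lengths (20 and 400 samples), the threshold of 4 and the synthetic trace are all illustrative assumptions.

```python
import numpy as np

def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average energy (trailing windows).

    Window lengths are in samples; n_sta must be smaller than n_lta.
    """
    energy = np.asarray(signal, dtype=float) ** 2
    # Prefix sums make every window average an O(1) difference.
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(energy.size)
    # The ratio is only defined once a full long window is available.
    end = np.arange(n_lta, energy.size + 1)  # exclusive window end
    sta = (csum[end] - csum[end - n_sta]) / n_sta
    lta = (csum[end] - csum[end - n_lta]) / n_lta
    ratio[end - 1] = sta / np.maximum(lta, 1e-12)
    return ratio

def first_trigger(ratio, threshold):
    """Index of the first sample whose ratio exceeds the threshold, or None."""
    above = np.flatnonzero(ratio > threshold)
    return int(above[0]) if above.size else None

# Synthetic example (assumed data): unit-variance noise with a burst at sample 1200.
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 2000)
trace[1200:1300] += 8.0 * np.sin(2 * np.pi * 0.05 * np.arange(100))

ratio = sta_lta(trace, n_sta=20, n_lta=400)   # hypothetical window lengths
onset = first_trigger(ratio, threshold=4.0)   # hypothetical threshold
print("first trigger at sample", onset)
```

In practice the window lengths and threshold are tuned per station and noise level. Template matching and learned pickers are typically motivated by events that fall below such a fixed threshold.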
identification to earthquake prediction, with varying degrees of success24–29. The case studies also highlight the need for further research and development to refine the existing techniques, and develop new tools that could be utilized in the processing and analyses of large datasets and identification of different geophysical signals. AI/ML techniques in geoscience/seismology could be employed gainfully to analyse other seismological datasets that MoES, GoI and its affiliated institutions routinely acquire. Identifying seismic phases accurately is one of the primary requirements in seismological data analysis to determine earthquake source parameters. ML helps identify different seismic phases in the data.
In many earthquake detection algorithms, short-term average (STA)/long-term average (LTA) criteria are used to detect possible arrival times of P and S waves30. Therefore, matched filtering or template matching technique is used for event detection. In this method, waveforms of known events are used as templates to scan through continuous waveforms to detect new events31. Recently, ML has been utilized to improve earthquake detection and phase-picking capabilities25,32. Fingerprinting and similarity thresholding (FAST) is the latest algorithm using ML techniques that has been used to identify earthquakes without prior knowledge of seismicity. FAST would facilitate the automated processing of large and voluminous datasets by being computationally more efficient than template matching. Similarly, the generalized phase detection (GPD) algorithm searches for near-identical waveforms from millions of seismograms, which is used to classify windowed data as P, S or noise. GPD can be applied to datasets not only encompassed by training sets, but also to complex cases such as clipped seismograms. Kong et al.33 used neural networks to detect P-wave onset and P-wave polarity. ML techniques have important applications in detecting small-magnitude local earthquakes in areas characterized by sparsity of receivers. AI/ML algorithms may play an essential role in the identification of events and in locating earthquakes with recordings of the events at fewer stations33,34. Other applications in earth sciences, such as hydrology, show that AI/ML can estimate and predict streamflow in ungauged basins35–37.
Short- and medium-range data-driven weather forecasting
Currently, the highest global resolution ensemble prediction system at ~12.5 km horizontal resolution (with 21 members) is being used for providing ten-day probabilistic forecasts based on the Global Ensemble Forecast System (GEFS@T1534) by IMD. IITM has implemented the high-resolution GEFS for operational application since June 2018. While the deterministic GFS model38 at 12.5 km horizontal resolution provides a better skill up to ~five days compared to the earlier coarser resolution (~25 km resolution GFST574)39, the ensemble prediction system has shown much better skill than the control member (the deterministic GFS model), particularly for predicting extreme rainfall events40,41. The model forecast inaccuracies mainly arise from initial conditions and improper physical parameterization. The uncertainties of initial conditions
are resolved primarily by the perturbed initial states in the ensemble prediction system. However, the uncertainty arising from deterministic closures of the physical parameterization still adds many errors due to unrealistic constraints, namely the quasi-equilibrium42. Under the AI/ML paradigms, the sub-grid-scale tendencies generated by the cloud-resolving models within each climate model grid would be used as the input of a deep learning model. The inputs would be mapped for training to target the heat and moisture tendencies and this framework holds promise in improving the model fidelity43–45.
ML for extended range forecasts
AI/ML methods have recently found applications in climate forecast models. There are two basic applications that show promise for near-future climate applications. The first is the bias correction and improvement of the numerical model forecasts. The second relates to the methods attempting the sub-seasonal low-frequency predictions. The bias correction and model post-processing applications are helpful to the stakeholders using climate forecasts. The climate forecasts from dynamical models show substantial bias when the forecast is considered over scales lower than the balanced flow, mainly arising due to unknown physics or unresolved dynamics. When sufficient observations are available over a location, some of the systematic errors arising due to unresolved scale dynamics or physics can be corrected46. Sub-seasonal forecasting using ML methods is now under active research12,47–49.
ML for seasonal and climate-scale forecasting
Seasonal forecasting is one of the most challenging problems in forecasting. As pointed out by Lorenz50, the weather forecasts are highly dependent on initial conditions (today's weather determines tomorrow's weather). In contrast, climate projections/decadal predictions (an average of weather for a few decades) are less sensitive to the initial conditions. However, they depend on boundary conditions. When we try to make seasonal forecasts, the distinction is somewhat blurred, and the seasonal forecasts still depend on initial conditions51. Chattopadhyay et al.51 have shown that model hindcasts initialized with February initial conditions exhibit better prediction skills for the Indian summer monsoon rainfall (ISMR). Further complexities such as resolving ocean processes also become essential at a seasonal scale. Hence, extracting predictive information (which changes from event to event) across both space and timescales is vital to significantly improve seasonal forecasts52. Therefore, the use of AI/ML methods for improving seasonal forecasts is imperative, and the research community has started using these methods extensively in seasonal forecasts53–55. Some researchers also consider that AI/ML methods can outperform conventional prediction systems for seasonal forecasts54,55. Currently, they outperform statistical models.
One of the long-standing seasonal prediction problems is the ISMR prediction. Blandford started seasonal forecasting of ISMR using empirical methods in 1886. Since then, numerous attempts have been made to predict seasonal mean monsoon over India using empirical and dynamical models (atmosphere and coupled ocean–atmosphere models; see Rao et al.39 for more details). Empirical models showed very high skills (>0.9) during the development stages, while during the actual operational phase they showed weak skills (<0.5). On the other hand, dynamical models showed moderate skill during the hindcast and operational forecast phase39. The primary reason for the failure of empirical models in providing high skills during the operational phase is that the relationship between predictors and predictands undergoes secular changes from the time the model has been developed to the stage when it is made operational. To avoid such a situation, AI/ML models can be used efficiently to identify new predictors53. Using autoencoders, Saha et al.53 have developed an AI/ML model to predict ISMR with two months lead time and an absolute mean error of less than 3%. On the other hand, the dynamical models exhibit systematic biases in precipitation that arise due to parametrization schemes used in these models39 and therefore underestimate the extremes. To avoid such systematic errors, AI/ML models will be useful.
ML for improving the physical processes in dynamical models
Dynamical models work on the principle of solving partial differential equations over the area of interest with the necessary initial and boundary conditions. They consist of various components such as atmosphere, ocean, land surface, etc. and a correct representation of physical processes in the numerical models is highly essential for accurate simulations of the coupled climate systems. For example, various researchers have tried to understand the relationship between the Indian monsoon and the global and regional teleconnections such as El Niño-Southern Oscillation (ENSO)56,57, Indian Ocean dipole (IOD)58, North Atlantic Oscillation59, Pacific Decadal Oscillation60, volcanic eruptions61 and aerosols62,63. Recent studies have attempted to use deep learning to develop models that better represent the physical processes. For example, de Witt and Hornigold64 used a deep reinforcement learning-based approach to test the stratospheric aerosol injection on climate. Volcanic eruptions have been used as an analogue for stratospheric aerosol injection, and deep learning can assist in addressing the nonlinear nature of the problem. Recently, Lamb and Gentine43 used graph neural networks to study the aerosol optical properties. Seifert
and Rasp65 discuss the role of ML in estimating cloud microphysics. The uncertainties in the simulation of the Indian monsoon arise from the missing or erroneous physics in the dynamical systems. Using ML to improve the understanding of physical processes can lead to cascading returns by enhancing the hydrological outputs from the numerical weather prediction (NWP) models66–70.
ML for nowcasting weather and tracking storm cells
There is a need for a high-resolution early warning system with reliable nowcasts in the regions of steep topography and urban areas during severe weather. Traditionally, nowcasting is performed by carrying out extrapolation, probabilistic nowcasting71, semi-Lagrangian advection scheme72 and using algorithms like optical flow, etc. The state-of-the-art, data-driven approach plays a pivotal role in weather nowcasting. Doppler weather radar provides extremely high geographical and temporal resolution weather information. Agarwal et al.73 utilized radar images to forecast the weather using the U-Net algorithm, demonstrating that it outperformed the optical flow technique. Su et al.74 have shown that ML approaches have a high learning capacity, and enhance echo position and intensity forecast accuracy in convective cells. The temporal precision of such convective cells varies from 30 to 60 min during a relatively short period. Estimating precipitation in complicated orography regions is a well-known problem. Arulraj and Barros75 used detection and classification ML algorithms to improve the estimation of orographic precipitation across the Southern Appalachian Mountains. Human lives, ecosystems, manmade structures, and landscapes are at risk when snow avalanches occur in mountainous locations. The International Commission for Alpine Rescue anticipates an increase in the frequency of deadly occurrences caused by snow avalanches, with an average of 138 recorded cases per year in 2015 across Alpine nations and North America. A recent study used ML to simulate the hazards due to snow avalanches76. Important precursors for modelling snow avalanche hazards were found to be slope, topographic location, surface wetness and precipitation.
ML for numerical weather prediction
Satellite remote sensing and NWP groups are ripe for rapid advancement in the application of ML. NWP relies heavily on integrating fields generated by satellites and other remote sensing devices. Spatial and temporal gaps are a common occurrence in such observations. Alleviating uncertainties arising due to these data gaps is necessary before performing ML. The time series of satellite ocean fields are constructed using an ensemble of neural networks with varying weights77 and a deep learning method to reconstruct the optical images78. While modelling and deploying systems and issuing warnings, the ML method can give a post-forecast correction to account for the uncertainties after learning from all previous failures79.
ML for hydrogeological modelling
Rajaee et al.80 used 67 published studies to assess the AI approaches towards groundwater level (GWL) modelling. They found that ML could accurately simulate and forecast GWL time series in various aquifers. This type of modelling uses data science to unravel physical relationships between GWL and various hydrological factors. Due to the lack of mathematical/physical representations of the processes, AI models are beneficial in groundwater modelling, where knowledge-driven simulation is challenging to design. Research and methods in hydrogeology have evolved in response to global challenges81. Hydrogeologists are now working to find solutions to a wide range of issues, including the long-term supply of potable water, geothermal energy production, preservation of the natural environment and the impact of climate change on groundwater. These challenges can be solved by hydrogeologists using numerical modelling. Identifying piezometric risk zones and calculating groundwater recharge are two examples of simple hydrogeological issues that are routinely treated using simpler models. Iterative discrete forms of the equations driving the hydrogeological process are solved using numerical models to handle complex difficulties. The Internet of Things and other recent technological advancements have allowed hydrogeologists to acquire large amounts of real-time data. Traditional modelling approaches have difficulty extracting useful features, quantifying uncertainty or establishing correlations between diverse factors. At least four issues impede the broad adoption of ML in hydrogeology as a complement to the numerical models. The first constraint is that most ML models are opaque black boxes. Using a black-box model, one does not know the laws that govern the system's operation or the causal relationships between the variables. Hence hydrogeologists cannot explain or justify the model results, either for improved understanding of the phenomena or to support high-stakes judgements. A second issue is that generalization is challenging in hydrogeology data-driven models even with high simulation fidelity. Another drawback of the ML models is that they may not converge and cannot be automatically extended to respond to new events in a system under study. Extensive and dedicated research efforts are needed at the intersection of hydrogeology and ML.
Tsunami evacuations helped by early warnings can considerably reduce the number of casualties. However, incorrect danger predictions and warnings might have the opposite impact. To limit the number of casualties in
Figure 3. Gartner’s hype cycle for ML in earth system science (ESS) with a focus on research problems associated with South Asia.
future tsunamis, it is vital to develop tsunami forecasting systems based on real-time tsunami observation data and provide early warnings. Using an advanced CNN, researchers were able to accurately forecast tsunamis based on data from extensive tsunami and geodetic monitoring networks82, which is the first effort at AI-enabled end-to-end tsunami inundation predictions.
AI for climate and human health
Using supervised ML, topic modelling and geoparsing, Berrang-Ford et al.82 identified and mapped all climate change and health research published between 1 January 2013 and 9 April 2020. Their analysis included only the studies published in English, with 15,963 climate and health
studies published between 2013 and 2019. They found an overwhelming focus on the effects of climate change on human health, with little attention paid to mitigation and adaptation. Causal mortality and infectious disease incidence due to heat and air pollution were most frequently studied. Seasonality, harsh weather, heat and weather variability were the most researched weather exposures. Mental health, undernutrition, and maternal and child health were the areas of climate health study that received less attention. Low-income countries, which often bear the brunt of health consequences due to climate change, were underrepresented in the studies. Climate change and human health must be mapped using automated ML in the era of big data. Given the lack of data guidance on climate and health, policymakers may be hesitant to make decisions on how to mitigate the health effects of climate change. Using ML to generate such datasets can lead to transformational benefits for society.

Summary and future directions

In this study, we have reviewed ML applications in ESS. The future directions especially relevant to solutions for the South Asian region have been summarized as a Gartner’s curve (Figure 3). Hard AI problems such as earthquake prediction and climate-scale predictions require long lead times of several years to centuries. They will take more than a decade of development to be fully solved by ML and allied techniques. Such a long development time is expected because of data sparsity, for example, over the Himalayan region for earthquake prediction. Significant uncertainties in dynamical-model projections of end-of-century climate are also expected to be resolved after extensive research and development. Recent developments in ML, particularly in deep learning, are expected to lead to transformative improvements in short- to extended-range forecasts, intelligent transportation, precision agriculture, policymaking, and wind and energy forecasts during this decade. These advancements would be driven by the critical nature of such problems and the availability of high spatio-temporal resolution drone, ground-based and satellite datasets.

We have discussed various AI/ML techniques that have been used and those with high potential for improving the state-of-the-art in ESS. Figure 4 is a word cloud showing all the critical components required for ML in ESS. An exhaustive literature survey on AI/ML/DL applications in the South Asian domain, a mind map incorporating all the essential components of data science applications in ESS and a Gartner’s curve for future directions are the main contributions of this review. It can be used as a starting point to understand the existing research problems, applicable algorithms, educational resources, hardware/software stacks and other vital aspects essential to data science for ESS. This work aims to further ESS over South Asia using ML applications as an end goal.

1. Chantry, M., Christensen, H., Dueben, P. and Palmer, T., Opportunities and challenges for machine learning in weather and climate modelling: hard, medium and soft AI. Philos. Trans. R. Soc. London, Ser. A, 2021, 379, 20200083.
2. Rolnick, D. et al., Tackling climate change with machine learning. arXiv.org, 2019; doi:https://arxiv.org/pdf/1906.05433.pdf.
3. Reichstein, M. et al., Deep learning and process understanding for data-driven earth system science. Nature, 2019, 566, 195–204.
4. Shen, C., A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resour. Res., 2018, 54, 8558–8593.
5. Sit, M. et al., A comprehensive review of deep learning applications in hydrology and water resources. Water Sci. Technol., 2020, 82, 2635–2670.
6. Ball, J. E., Anderson, D. T. and Chan, C. S., A comprehensive survey of deep learning in remote sensing: theories, tools and challenges for the community. J. Appl. Remote Sensing, 2017, 11, 042609.
7. Fang, W., Xue, Q., Shen, L. and Sheng, V. S., Survey on the application of deep learning in extreme weather prediction. Atmosphere, 2021, 12.
8. Dong, C., Loy, C. C., He, K. and Tang, X., Image super-resolution using deep convolutional networks. CoRR, abs/1501.00092, 2015.
9. Kumar, B. et al., Deep learning-based downscaling of summer monsoon rainfall data over Indian region. Theor. Appl. Climatol., 2021, 143, 1145–1156.
10. Vandal, T. et al., DeepSD: generating high resolution climate change projections through single image super-resolution. arXiv.org, 2017, 1–9; doi:https://arxiv.org/abs/1703.03126.
11. Saha, M., Mitra, P. and Nanjundiah, R. S., Autoencoder-based identification of predictors of Indian monsoon. Meteorol. Atmos. Phys., 2016, 128, 613–628.
12. Saha, M. and Nanjundiah, R. S., Prediction of the ENSO and EQUINOO indices during June–September using a deep learning method. Meteorol. Appl., 2020, 27, e1826.
13. Lim, B. and Zohren, S., Time-series forecasting with deep learning: a survey. Philos. Trans. R. Soc. London, Ser. A, 2021, 379, 20200209.
14. Kumar, B. et al., Deep learning based forecasting of Indian summer monsoon rainfall. 2021; arXiv:2107.04270.
15. Shi, X. et al., Convolutional LSTM network: a machine learning approach for precipitation nowcasting. arXiv.org, 1506.04214, 2015.
16. Viswanath, S., Saha, M., Mitra, P. and Nanjundiah, R. S., Deep learning based LSTM and SeqToSeq models to detect monsoon spells of India. In International Conference on Computational Science – ICCS 2019, Springer, Cham, 2019, pp. 204–218.
17. Singh, M., Singh, B. B., Singh, R., Upendra, B., Kaur, R., Gill, S. S. and Biswas, M. S., Quantifying COVID-19 enforced global changes in atmospheric pollutants using cloud computing based remote sensing. Remote Sensing Appl.: Soc. Environ., 2021, 22, 100489.
18. Chang, C.-P. et al., The multiscale global monsoon system: research and prediction challenges in weather and climate. Bull. Am. Meteorol. Soc., 2018, 99, ES149–ES153.
19. Gadgil, S., Yadumani and Joshi, N. V., Coherent rainfall zones of the Indian region. J. R. Meteorol. Soc., 1993, 13, 546–566.
20. Gadgil, S., The Indian monsoon and its variability. Annu. Rev. Earth Planet. Sci., 2003, 31, 429–467.
21. Moron, V., Robertson, A. W. and Pai, D. S., On the spatial coherence of sub-seasonal to seasonal Indian rainfall anomalies. Climate Dyn., 2017, 49, 3403–3423.