Remotesensing 11 02801
Article
Debris Flow Susceptibility Mapping Using Machine-Learning Techniques in Shigatse Area, China
Yonghong Zhang 1 , Taotao Ge 1 , Wei Tian 2, * and Yuei-An Liou 3, *
1 School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China;
zyh@nuist.edu.cn (Y.Z.); gtt347568@gmail.com (T.G.)
2 School of Computer and Software, Nanjing University of Information Science & Technology,
Nanjing 210044, China
3 Center for Space and Remote Sensing Research, National Central University, Taoyuan 32001, Taiwan
* Correspondence: tw@nuist.edu.cn (W.T.); yueian@csrsr.ncu.edu.tw (Y.-A.L.)
Received: 8 October 2019; Accepted: 21 November 2019; Published: 27 November 2019
Abstract: Debris flows have always been a serious problem in mountain areas. Research on the
assessment of debris flow susceptibility (DFS) is useful for preventing and mitigating debris flow
risks. The main purpose of this work is to study the DFS in the Shigatse area of Tibet by using
machine learning methods, after assessing the main triggering factors of debris flows. Remote sensing
and geographic information system (GIS) are used to obtain datasets of topography, vegetation,
human activities and soil factors for local debris flows. The imbalance of debris flow susceptibility
levels in the dataset is addressed by the Borderline-SMOTE method. Five machine learning
methods, i.e., back propagation neural network (BPNN), one-dimensional convolutional neural
network (1D-CNN), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost)
have been used to analyze and fit the relationship between debris flow triggering factors and
occurrence, and to evaluate the weight of each triggering factor. The ANOVA and Tukey HSD
tests have revealed that the XGBoost model exhibited the best mean accuracy (0.924) on ten-fold
cross-validation and the performance was significantly better than that of the BPNN (0.871), DT (0.816),
and RF (0.901). However, the performance of the XGBoost did not significantly differ from that of
the 1D-CNN (0.914). This is also the first comparison experiment between XGBoost and 1D-CNN
methods in the DFS study. The DFS maps have been verified by five evaluation methods: Precision,
Recall, F1 score, Accuracy and area under the curve (AUC). Experiments show that the XGBoost has
the best score, and the factors that have a greater impact on debris flows are aspect, annual average
rainfall, profile curvature, and elevation.
Keywords: debris flow susceptibility; remote sensing; GIS; oversampling methods; back propagation
neural network; one-dimensional convolutional neural network; decision tree; random forest; extreme
gradient boosting
1. Introduction
Debris flows involve gravity-driven motion of solid-fluid mixtures with abrupt surge fronts,
free upper surfaces, variably erodible basal surfaces, and compositions that may change with position
and time [1]. They can endanger human lives and cause great damage to property, public facilities
and the ecological environment. Due to the harsh natural environment and the deforestation caused by
human over-exploitation, Shigatse is a typical area with active debris flows in the Tibet
Autonomous Region. Debris flows can cause severe damage because the study area is densely
populated. Therefore, mitigating and reducing the disasters caused by debris flows are critical to
the local authorities. Most of the mountainous area of Shigatse is inaccessible and characterized by very
steep slopes, so field surveys are very difficult to carry out. Installing and maintaining
sufficient monitoring facilities in these areas is also very challenging. Therefore, debris
flow susceptibility (DFS) zoning maps derived from spatial data can be used to prevent and mitigate casualties and
economic losses caused by debris flow events.
Susceptibility mapping of debris flows is important for early warning and control of regional
debris flows. DFS assessment is based on the spatial characteristics of debris flow events and relevant
factors (topography, soil, vegetation, human activities and climate). It aims to estimate the spatial
distribution of future debris flow probability in a given area [2]. Some studies have discussed and
analyzed debris flows in the study area [3,4], focusing on residential settlements and the vicinity of
roads. Assessing the susceptibility of debris flows in the whole study area is difficult because of its vast
size (exceeding 180,000 square kilometers), and detailed spatial information on the debris
flow triggering factors is also quite limited. In this case, satellite remote sensing has good application
prospects because it can describe the characteristics of a large area, such as the terrain, vegetation, and
climate where debris flow events occur. Therefore, compared with traditional field
geological surveys, which require a lot of work and resources, remote sensing data represented in
a GIS environment can fill the gap left by on-site monitoring data, allowing DFS
research to be carried out more effectively and economically.
In recent years, GIS and remote sensing data have been used in many studies of disasters
in mountains. Researchers typically build their methodology by analyzing data on known debris flow events
and test it on unseen debris flow events. Gregoretti et al. [5] proposed a GIS-based model
tested against field measurements for rapid hazard mapping. Kim et al. [6] used a high-resolution
light detection and ranging (LiDAR) digital elevation model to calculate the volume of debris flows.
Kim et al. [7] developed a GIS-based real-time debris flow susceptibility assessment framework for
highway sections. Alharbi et al. [8] presented a GIS-based methodology for determining initiation area
and characteristics of debris flow by using remote sensing data. At present, the DFS assessment methods
can be mainly divided into two categories: qualitative and quantitative models. The qualitative model
assigns a weight (0–1) to each debris flow triggering factor based on expert experience and knowledge
or heuristics to assess the DFS [9]. Common qualitative analysis methods include fuzzy logic [10],
the analytic hierarchy process [11] and network analysis [12]. While these models have achieved a lot in
the study of debris flows, they still suffer from some shortcomings, such as a high degree of subjectivity
and limited applicability to specific areas [13].
Quantitative methods usually include two types: deterministic models based on physical mechanisms
and statistical models. Deterministic methods are used to study the physical laws of debris flows and
establish corresponding models to simulate the DFS [14]. The disadvantage of these models is
that they require detailed inspection data for each slope, so they are only suitable for smaller areas.
Statistical models are data-driven: they assess DFS by combining past debris flow events
with environmental characteristics, under the assumption that the environmental conditions associated with past debris
flow events will lead to debris flows in the future. The models for quantitative DFS assessment
include the information model [15], the weights-of-evidence method [16], the frequency ratio [17] and so on.
In recent years, data mining and machine learning techniques have also received extensive
attention because they can describe the nonlinear relationship between DFS and
triggering factors more accurately [18] and impose no special requirement on the distribution of the triggering factors.
Machine learning algorithms are often superior to traditional statistical models [19] for the following
reasons. First, machine learning can adapt to larger datasets, while traditional statistical
methods are more suitable for small datasets. Second, machine learning has better controllability
and extensibility than traditional statistical models. Moreover, traditional statistical models
generally depend on specific requirements and assumptions about the data, whereas machine learning methods
do not. In the past three decades, common machine learning methods used for DFS mapping
include back propagation neural network (BPNN) [20], decision tree (DT) [21], Bayesian network [22],
Remote Sens. 2019, 11, 2801 3 of 26
and support vector machine [23]. As research has advanced, more and more models with better
fitting performance have been developed. Under such circumstances, continuous verification
and evaluation are still necessary for constructing and selecting a DFS evaluation model, and
comparisons among various models for investigating DFS have therefore become an active research topic. Since
the information about debris flow occurrence is very limited and heterogeneous, stability and accurate
predictive power are the primary requirements for selecting an appropriate method to achieve better
modeling results.
Among machine learning methods, BPNN is widely used because its excellent nonlinear
fitting and learning abilities allow it to extract the complex relationship between debris flow triggering
factors and DFS [24]. The convolutional neural network, a classical deep learning method, has
developed rapidly in the past decade and is widely used in pattern recognition and medicine. It is
generally used for classification and recognition of two-dimensional images. In recent years,
researchers have adapted convolutional neural networks to one-dimensional inputs for
speech recognition [25], fault diagnosis [26] and data classification [27]. As an end-to-end model,
the one-dimensional convolutional neural network can extract and classify different characteristics of
debris flows directly from raw data without expert guidance. DT is another powerful prediction model
with three major advantages: the model is easy to build, the final model is easy to interpret, and the
model provides clear information about the relative importance of the input factors [28]. These advantages
have motivated researchers to develop new DT models that better utilize debris flow information.
At the same time, ensemble learning algorithms based on decision trees have also attracted wide
attention; the most representative are bagging and boosting. Kadavi et al. [29]
used four ensemble algorithms, AdaBoost, Bagging, LogitBoost, and Multiclass classifier, to calculate
and plot the DFS map, and showed that the Multiclass classifier performed best in terms of
the AUC value on the test set.
Due to the complex terrain, geology and other mountain conditions in the study area,
multi-source data are used as much as possible to characterize the terrain and geological
conditions of debris flows. Although machine learning methods have achieved satisfactory
results to some extent, this paper further examines whether they can be applied to
assess the DFS. Its most important contributions are as follows. (a) We collected debris
flow event data and a variety of original remote sensing data related to topographic, soil,
human and vegetation factors, and performed pre-processing operations, including
projection, registration and sampling, based on remote sensing and GIS technology (ArcGIS v.10.2
software). (b) We obtained the characteristics of the study area where debris flows occurred and used
a data generation algorithm to augment the collected debris flow event data. (c) Using Python
with the Keras framework and the scikit-learn module, five DFS models (BPNN, 1D-CNN,
DT, RF, and XGBoost) were constructed on the training set, and their applicability was
examined for the Shigatse region. Notably, this is the first comparative experiment of XGBoost
and 1D-CNN in the study of DFS. (d) Cross-validation was used to compare the performance
of artificial neural networks and tree-based models while reducing bias and variance. (e) Statistical
analyses of the compared algorithms were performed to verify whether their performance differs
significantly. (f) The test set was used to evaluate the models' prediction ability with
five classical evaluation metrics: Recall, Precision, F1 score, Accuracy, and AUC [30]. (g) Finally,
the tree-based "feature importance" ranking was used to evaluate the main factors
affecting the DFS.
Figure 1. Location of the study area. Site maps of (a) China, (b) Tibet Autonomous Region, and (c) the study area.
2.1.1. Debris Flow Dataset
Collection and analysis of debris flow event datasets are prerequisites for the DFS assessment.
There are 1944 debris flow sites recorded in the study area from 1998 to 2008. Each case includes information
obtained from field disaster investigation, such as time, debris flow susceptibility level, and
geographic location. The information on debris flows is provided by the Tibet Meteorological Bureau. These
events can be viewed through the geological cloud portal [32].
Table 1. Data layers related to debris flow susceptibility (DFS) in the study area.
Topographic factors that include elevation, slope, aspect, and curvature are extracted from
the Shuttle Radar Topography Mission Digital Elevation Model (SRTM DEM) using the ArcGIS
platform [33]. The vegetation coverage is represented by the normalized difference vegetation index
(NDVI), calculated from the obtained 2000–2008 MODIS images and averaged to generate the thematic
layer of the annual average NDVI. Rainfall data are collected from the Tropical Rainfall Measuring
Mission (TRMM) [34]. We use a rainfall dataset (product 3B42v7) with a temporal resolution of three hours
and a spatial resolution of 0.25° for 1998–2008 to construct a thematic layer of annual
average precipitation. The 15 types of land use information layers are provided by the National Earth
System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China
(http://www.geodata.cn) [35,36]. In addition, the road vector data provided by OpenStreetMap (OSM)
(https://www.openstreetmap.org/#map=11/22.3349/113.76000) are used to calculate the distance
to roads. Soil factors are provided by the Resource and Environmental Science Data Center (RESDC)
of the Chinese Academy of Sciences.
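As a rough illustration of how a 3-hourly rain-rate series becomes the annual average precipitation layer, the sketch below totals one grid cell's record and divides it by the number of years. It assumes that each value is a mean rain rate in mm/h over a 3-hour step (so one step contributes rate × 3 mm) and that missing values have already been removed; the function name is hypothetical and this is not the authors' processing chain.

```python
def annual_mean_precip(rates_mm_per_h, n_years, step_hours=3.0):
    """Average annual precipitation (mm/yr) for one grid cell.

    rates_mm_per_h: mean rain rate (mm/h) for each 3-hour step of the record,
    assumed already cleaned of missing values.
    n_years: number of full years the record covers.
    """
    if n_years <= 0:
        raise ValueError("n_years must be positive")
    # Each time step contributes rate * duration millimetres of rain.
    total_mm = sum(rate * step_hours for rate in rates_mm_per_h)
    return total_mm / n_years


# Toy record: 8 steps at 0.5 mm/h spread over 2 years -> 12 mm / 2 = 6 mm/yr.
print(annual_mean_precip([0.5] * 8, n_years=2))  # 6.0
```

For the 1998–2008 record used here, `n_years` would be 11; applying the function cell by cell yields the thematic layer.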
Higher resolution is conducive to the topographic analysis of single-gully debris flows, but in this
work, our research focuses on using multiple attribute factors to analyze the disaster susceptibility
of the entire study area. Golovko [2] and Ahmed [9] suggest that a 30 m resolution Digital Elevation
Model (DEM) can be used to analyze susceptibility to mountain disasters. Therefore, DEM
data with a pixel size of 30 m are used (Figure 2a). The slope angle is a fundamental factor calculated
from the DEM, and its range in the study area is wide (0–89°) (Figure 2b). The aspect of
the slope is another key factor affecting the DFS, because slope surfaces facing different directions are
exposed to wind and rain to different degrees. The aspect thematic layers are reclassified into nine
categories: flat (−1), north (337.5–360°, 0–22.5°), north-east (22.5–67.5°), east (67.5–112.5°), south-east
(112.5–157.5°), south (157.5–202.5°), south-west (202.5–247.5°), west (247.5–292.5°), and north-west
(292.5–337.5°) (Figure 2c). The curvature, i.e., the second derivative of the surface (the rate of change of slope), helps us understand
the characteristics of basin runoff and erosion processes. In this study, three curvature functions are
used to show the shape of the terrain (Figure 2d–f): the profile curvature, the plan curvature,
and the total curvature of the surface, defined as the curvature in the direction of maximum slope,
the curvature along the contour, and the combination of the two, respectively.
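The nine-class aspect scheme above can be expressed as a small lookup. This is a minimal sketch, assuming the input uses the ArcGIS convention of −1 for flat cells and degrees clockwise from north otherwise; the function name is hypothetical, and assigning each shared boundary (e.g., 67.5°) to the upper class is our assumption, since the text does not specify it.

```python
def classify_aspect(aspect_deg):
    """Return the aspect class name for one cell.

    aspect_deg: -1 for flat cells (ArcGIS convention), otherwise degrees
    clockwise from north in [0, 360].
    """
    if aspect_deg == -1:
        return "flat"
    if not 0 <= aspect_deg <= 360:
        raise ValueError("aspect must be -1 (flat) or within [0, 360]")
    # North wraps around 0 degrees: 337.5-360 and 0-22.5.
    if aspect_deg >= 337.5 or aspect_deg < 22.5:
        return "north"
    # The other seven classes are 45-degree bins starting at 22.5.
    names = ["north-east", "east", "south-east", "south",
             "south-west", "west", "north-west"]
    return names[int((aspect_deg - 22.5) // 45)]


print([classify_aspect(a) for a in (-1, 10, 45, 100, 180, 300, 350)])
# ['flat', 'north', 'north-east', 'east', 'south', 'north-west', 'north']
```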
Human activities affect the geographical environment, which in turn influences the occurrence of
debris flows. The land cover thematic map shows how human production can change natural land;
14 land use types, including farmland and forest, can be identified (Figure 2g). DFS assessment
often takes the distance to roads into account, because road construction and maintenance
cause changes and damage to the local morphology. This variable is calculated with the
Euclidean distance technique in the spatial analysis tools of ArcGIS 10.2 (Figure 2h).
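ArcGIS computes this layer with its Euclidean distance tool; the sketch below reproduces the idea on a small rasterized road mask, assuming 30 m cells to match the DEM. It is a brute-force illustration with a hypothetical function name, not the ArcGIS algorithm; for full-size rasters a distance transform such as `scipy.ndimage.distance_transform_edt` would be the practical choice.

```python
import math


def distance_to_road(road_mask, cell_size=30.0):
    """Euclidean distance from every cell centre to the nearest road cell.

    road_mask: list of rows of booleans, True where a road was rasterized.
    cell_size: cell width in metres (30 m assumed, matching the DEM).
    Brute force, O(cells x road cells): only suitable for small grids.
    """
    roads = [(r, c) for r, row in enumerate(road_mask)
             for c, is_road in enumerate(row) if is_road]
    if not roads:
        raise ValueError("road_mask contains no road cells")
    return [[cell_size * min(math.hypot(r - rr, c - rc) for rr, rc in roads)
             for c in range(len(row))]
            for r, row in enumerate(road_mask)]


mask = [[True, False, False],
        [False, False, False]]
for row in distance_to_road(mask):
    print([round(d, 1) for d in row])
```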
The vegetation coverage is one of the important parameters for evaluating the DFS. NDVI extracted from
remote sensing images is a commonly used vegetation index for inferring vegetation density, as it
is very sensitive to the presence of chlorophyll on vegetation surfaces (Figure 2i). We calculated the
NDVI value using the following formula:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad (1)$$

where NIR and RED represent the near-infrared and red bands, respectively, which are the second
and first channels of the MODIS image. The NDVI value ranges between −1 and 1. Negative values
indicate ground cover that is highly reflective in visible light, such as clouds, snow and water;
values near 0 indicate bare land; positive values represent vegetated areas and increase with
vegetation coverage density.
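Equation (1) and the multi-year averaging used for the annual NDVI layer can be sketched per pixel as follows. The function names are hypothetical; inputs are assumed to be red (MODIS band 1) and near-infrared (band 2) reflectances, and pixels where NIR + RED = 0 or that are flagged as no-data are represented by None and skipped when averaging.

```python
def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - RED) / (NIR + RED), Equation (1).

    nir, red: equal-length sequences of reflectance (MODIS band 2 and band 1).
    Pixels where NIR + RED == 0 are returned as None (undefined).
    """
    if len(nir) != len(red):
        raise ValueError("nir and red must have the same length")
    return [None if n + r == 0 else (n - r) / (n + r) for n, r in zip(nir, red)]


def mean_ndvi(images):
    """Per-pixel mean over several NDVI images, skipping None values."""
    means = []
    for values in zip(*images):
        valid = [v for v in values if v is not None]
        means.append(sum(valid) / len(valid) if valid else None)
    return means


scene = ndvi([0.5, 0.1, 0.2], [0.1, 0.3, 0.2])  # vegetation, water-like, bare
print([round(v, 3) for v in scene])  # [0.667, -0.5, 0.0]
```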
Debris flows are usually influenced by changes in humidity caused by rainfall infiltration.
Permeability can be expressed by soil type (Figure 2j), soil texture (Figure 2k–m) and soil erosion
(Figure 2n). Since the particle size distribution determines the shape of the soil water characteristic curve
and affects the hydraulic properties of the soil, soil type and texture have a great influence on the DFS.
Most of the study area is covered by alpine soils, including grass mat soil, cold soil, and frozen soil.
According to statistics, most of the alpine soil is brown and strongly acidic. It is mainly composed
of silt, sand, clay and fine sand, and has high permeability and low water-holding capacity.
Soil erosion is sometimes used as a synonym for soil and water loss, and areas with severe erosion are
susceptible to debris flows. The external causes of soil erosion are mainly water, wind, and freeze-thaw.
Clearly, fragile soil characteristics accompanied by concentrated rainfall usually result in debris flows.
Rainfall is the main factor leading to debris flows. The study area is affected by the monsoon climate,
with sparse precipitation and an annual average precipitation of less than 1300 mm (Figure 2o).
However, statistical analyses of the geological hazard points in the study area show that heavy rain and
continuous rainfall are important external factors leading to geological disasters in the Shigatse area.
Figure 2. Spatial distribution of debris flow characteristics: (a) elevation, (b) slope angle, (c) aspect,
(d) total curvature, (e) profile curvature, (f) plan curvature, (g) land cover, (h) distance to road, (i) NDVI,
(j) soil type, (k) sand, (l) silt, (m) clay, (n) erosion, and (o) rainfall.
2.2. Methods
The main purpose of our research is to fit the relationship between the triggering factors and the
occurrence of debris flows. The problem can be expressed as a multi-class classification: given a set
of input quantities, the classification model attempts to label the DFS level of each pixel in the study
area. The input quantities to the models are the triggering factors of the debris flow events that were
collected by local China Geological Survey researchers over many years of field investigation.
Based on the researchers' investigation of the debris flow sites, we obtain the values of 15
triggering factors at the corresponding positions through the value extraction function of ArcGIS
v10.2 software. That is, the input of the model is a one-dimensional vector of the form [x1, x2, …, x15].
The output value of the model is the DFS level, indicating the occurrence probability of debris flows.
The division criteria of regional DFS are based on the detailed survey specification for landslides,
collapses and debris flows issued by the China Geological Survey, as shown in Table 2.
Figure 3. Statistics of debris flow events with different susceptibility levels.
In addition, we also implemented other traditional machine learning algorithms, such as
support vector machine, logistic regression, and naive Bayes, but their results were unsatisfactory.
Therefore, these methods are not introduced here. The following subsections briefly introduce the data
generation (oversampling) algorithm and the five classifiers used in this paper.
2.2.1. Borderline-SMOTE
It is well known that when one class makes up a large proportion of the training data, classifier
performance can be seriously affected. The Synthetic Minority Oversampling Technique (SMOTE) is widely
used to solve data imbalance problems, and improved variants of it have been developed [37]. It
artificially generates feature vectors so that the categories in the dataset become balanced. In this
study, most units have moderate susceptibility, and in the classification process the scarcity of
samples in the categories with fewer samples (the minority classes) is one of the main causes of
over-fitting and inaccuracy. This paper therefore chooses the borderline-based SMOTE algorithm
(Borderline-SMOTE) to handle the data imbalance. Specifically, the k-nearest neighbor algorithm is used
to find the nearest neighbors of each minority sample in the training set. The boundary sample set is
constructed from the minority samples whose nearest neighbors are dominated by the majority class. For
each sample Ti in the boundary sample set, its k nearest neighbors are calculated and a sample Tj is
randomly selected among them. SMOTE then randomly inserts a new feature vector on the line segment
between Ti and Tj, as shown in Equation (2):

$$T_{\mathrm{new}} = T_i + \mathrm{rand}(0, 1) \times (T_j - T_i) \qquad (2)$$

Finally, the generated new samples are added to the original sample set.
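The two steps above can be sketched in plain Python as follows. This is a minimal Borderline-SMOTE1 illustration, not the authors' code: the function name and the parameters `m`, `k`, `n_new` and `seed` are hypothetical, the noise rule (skip a minority sample whose neighbors are all from other classes) follows the original Borderline-SMOTE description by Han et al. (2005) rather than anything stated here, and only one minority class is oversampled per call, so a multi-class dataset would call it once per class. In practice, the `BorderlineSMOTE` class of the imbalanced-learn package provides a tested implementation.

```python
import math
import random


def borderline_smote(X, y, minority, m=5, k=5, n_new=10, seed=0):
    """Borderline-SMOTE1 sketch: synthesize n_new samples of class `minority`.

    X: list of feature vectors; y: list of class labels (same length).
    m: neighbours used to decide whether a minority sample is on the border.
    k: minority neighbours used for interpolation.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if len(minority_idx) < 2:
        raise ValueError("need at least two minority samples")

    def nearest(i, candidates, count):
        # The `count` candidates closest to X[i], excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:count]

    # Step 1: boundary ("danger") set. A minority sample is kept if at least
    # half, but not all, of its m nearest neighbours belong to other classes;
    # if all of them do, it is treated as noise and skipped.
    danger = []
    for i in minority_idx:
        neigh = nearest(i, range(len(X)), m)
        n_major = sum(1 for j in neigh if y[j] != minority)
        if len(neigh) / 2 <= n_major < len(neigh):
            danger.append(i)
    if not danger:
        return []

    # Step 2: Equation (2), T_new = T_i + rand(0, 1) * (T_j - T_i), where T_j
    # is one of the k nearest minority neighbours of the boundary sample T_i.
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        j = rng.choice(nearest(i, minority_idx, k))
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


X = [[0.0, 0.0], [0.2, 0.0], [0.4, 0.0], [0.6, 0.0],   # majority class 0
     [0.5, 0.1], [0.5, 0.2], [2.0, 0.0], [2.1, 0.0]]   # minority class 1
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(borderline_smote(X, y, minority=1, m=3, k=1, n_new=3))
```

In this toy set only the two minority points inside the majority cluster are flagged as borderline, so every synthetic sample lies on the segment between them; the two isolated minority points are left alone, which is the intended difference from plain SMOTE.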
where f(θ) represents the transfer function, and θ = {w, b} represents the network parameters, with w the
weights and b the thresholds (biases).
Figure 5. One-dimensional neural network structure used in this research.
and t is the node. Finally, pruning techniques are used to deal with the over-fitting problem of the
model. Once the algorithm is complete, the internal decision-making mechanism of the tree can be
inspected directly, giving a more objective understanding of the debris flow triggering factors.

$$\mathrm{Gini}(t) = 1 - \sum_{k} \left[ p(c_k \mid t) \right]^2 \qquad (5)$$
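Equation (5) and the split criterion it feeds can be sketched as below. In CART, a candidate split is scored by the sample-weighted Gini impurity of its two children, and the split with the lowest value is chosen; the helper names are hypothetical, and scikit-learn's `DecisionTreeClassifier(criterion="gini")` performs this search internally.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of node t, Equation (5): 1 - sum_k p(c_k | t)^2."""
    n = len(labels)
    if n == 0:
        raise ValueError("a node must contain at least one sample")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Sample-weighted Gini impurity of the two children of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


node = ["low", "low", "high", "high"]
print(gini(node))                                     # 0.5
print(split_gini(["low", "low"], ["high", "high"]))   # 0.0, a pure split
```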
2.2.6. XGBoost
XGBoost (extreme gradient boosting) is a gradient boosting ensemble algorithm based on
classification or regression trees. It works in the same way as gradient boosting but adds
features similar to AdaBoost. The algorithm combines multiple DT models iteratively to form
a classification model with a better structure and higher precision. It has achieved impressive results
in many international data mining competitions, including more than two Kaggle championships.
In the DFS analysis experiment, XGBoost classifies the DFS level according to the
environmental characteristics of the Shigatse region and ranks the importance of the triggering factors.
XGBoost performs a second-order Taylor expansion of the loss function, using both its first and second
derivatives, and adds a regularization term to it. Therefore, model complexity is well controlled while
model accuracy is optimized. The model is trained by minimizing the total objective function [39],
which at iteration t can be expressed as Equation (6):

$$J(f_t) = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + C \qquad (6)$$

where i denotes the ith sample, $\hat{y}_i^{(t-1)}$ is the prediction for sample i after the first t − 1 trees,
$f_t(x_i)$ is the output of the newly added tth tree, $\Omega(f_t)$ is the regularization term, C collects
constant terms, and the outer L(·) is the loss (error) function.
The optimizer determines the structure and the leaf scores of each CART tree. XGBoost
accelerates existing boosted tree algorithms through cache-aware prefetching, distributed
out-of-core (external memory) computation and the fault-tolerant AllReduce framework. It can also be
trained on a graphics processing unit (GPU) for a substantial speed-up.
In this work, the XGBoost package is imported in Python. The training process controls the
construction of the trees by adjusting five hyper-parameters: the number of iterations, the number of trees
generated, the learning rate, the maximum depth of each tree, and the L2 regularization. The gamma
hyper-parameter sets the minimum gain (loss reduction) required to split a node.
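To make the roles of the L2 regularization λ and gamma γ concrete, the sketch below computes the quantities that XGBoost derives from the second-order expansion of Equation (6), following Chen and Guestrin's formulation: per-sample gradients g and Hessians h, the optimal leaf weight −G/(H + λ), and the split gain ½[G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ, where G and H are sums of g and h over a node. A binary logistic loss is assumed purely for illustration (the DFS task itself is multi-class), and all function names are hypothetical; in the xgboost Python package, λ and γ correspond to the `reg_lambda` and `gamma` parameters.

```python
import math


def grad_hess_logistic(label, margin):
    """First (g) and second (h) derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - label, p * (1.0 - p)


def leaf_weight(G, H, lam):
    """Optimal leaf score w* = -G / (H + lambda) under the second-order expansion."""
    return -G / (H + lam)


def split_gain(GL, HL, GR, HR, lam, gamma):
    """Loss reduction of splitting a node into left/right children.

    GL, HL, GR, HR are sums of g and h over the samples in each child; lam is
    the L2 penalty and gamma the minimum gain, so a split is kept only if > 0.
    """
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Four samples, all starting from raw score 0 (p = 0.5); labels 1, 1 | 0, 0.
gh = [grad_hess_logistic(t, 0.0) for t in (1, 1, 0, 0)]
GL, HL = gh[0][0] + gh[1][0], gh[0][1] + gh[1][1]   # -1.0, 0.5
GR, HR = gh[2][0] + gh[3][0], gh[2][1] + gh[3][1]   #  1.0, 0.5
print(split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0))  # about 0.667: split kept
print(split_gain(GL, HL, GR, HR, lam=1.0, gamma=1.0))  # about -0.333: split pruned
print(leaf_weight(GL, HL, lam=1.0))                    # about 0.667 for the left leaf
```

The same data produce a useful split with γ = 0 but not with γ = 1, which is exactly how the gamma hyper-parameter limits tree growth.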
Five common evaluation metrics are used to quantify model performance: Precision, Recall,
F1 score, Accuracy and AUC. Finally, 293 debris flow events are used as the test set.
In the binary classification problem, four elements, i.e., TP, TN, FP and FN, are defined
as follows.
TP: True Positive. Samples belonging to the TRUE class that are correctly marked as positive by the model.
TN: True Negative. Samples belonging to the FALSE class that are correctly marked as negative by
the model.
FP: False Positive. Samples belonging to the FALSE class that are incorrectly marked as positive by
the model.
FN: False Negative. Samples belonging to the TRUE class that are incorrectly marked as negative by
the model.
2.3.1. Precision
In the binary classification task, precision is the ratio of correctly labeled True class
samples to the total number of samples labeled True. The formula is as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (7)$$
2.3.2. Recall
Recall is the ratio of correctly labeled True samples to the total number of True samples;
in the binary classification task it is expressed as Equation (8):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (8)$$

In a multi-class classification task, Recall is the weighted average of the Recall values of the
individual categories.
2.3.3. F1 Score
The F1 score is computed from Precision and Recall and takes values between 0 and 1, which represent
the worst and best, respectively. Precision and recall contribute equally to the F1 score.
The formula is defined as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (9)$$

In a multi-class classification task, the F1 score is the weighted average of the F1 scores of
the individual categories.
2.3.4. Accuracy
In a multi-class classification task, accuracy is the ratio of correctly classified samples to
the total number of samples.
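The per-class counts and support-weighted averages described above can be sketched as follows. The helper is hypothetical rather than the code used in the study; it mirrors what scikit-learn's `precision_recall_fscore_support(..., average="weighted")` computes, and giving a never-predicted class a precision of 0 is an assumption.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted Precision, Recall and F1 for multi-class labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp  # samples of class c predicted as something else
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn)
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = n_c / n  # weight = share of class c among the true labels
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = ["low", "low", "high", "high"]
y_pred = ["low", "high", "high", "high"]
print(weighted_scores(y_true, y_pred))
```

One consequence worth keeping in mind: weighted Recall is mathematically identical to Accuracy, because Σ_c (n_c/n)(TP_c/n_c) = Σ_c TP_c/n.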
rate (FPR). TPR is the ratio of positive instances correctly classified to the total number of
positive instances, as represented by Equation (10):

$$\mathrm{TPR} = \frac{TP}{TP + FN} \qquad (10)$$

FPR is the ratio of negative instances misclassified as positive to the total number of negative
instances, as represented by Equation (11):

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad (11)$$
Like TPR and FPR, the AUC is defined for binary classification. The multiclass task is therefore first converted into multiple binary classification tasks (each class against the rest), and the AUC value of each category is calculated separately. The multiclass result is then obtained by averaging these per-class AUC values [40].
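The following is a minimal sketch of the averaging described above. The text specifies neither the split nor the weighting, so the sketch assumes a one-vs-rest split and an unweighted (macro) mean. Each binary AUC is computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. This equals the area under the ROC curve traced by the TPR and FPR of Equations (10) and (11). The labels and scores are synthetic, and the result is compared with scikit-learn's `roc_auc_score(..., multi_class="ovr")`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def binary_auc(is_pos, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half); equals the area under the ROC curve."""
    pos = scores[is_pos]
    neg = scores[~is_pos]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def ovr_macro_auc(y_true, proba):
    """Split a K-class task into K one-vs-rest binary tasks and average the AUCs."""
    y_true = np.asarray(y_true)
    return np.mean([binary_auc(y_true == k, proba[:, k]) for k in range(proba.shape[1])])


rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 25)           # hypothetical labels for four levels
logits = rng.normal(size=(100, 4))
logits[np.arange(100), y] += 1.5          # make the true class score higher
proba = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
print(ovr_macro_auc(y, proba), roc_auc_score(y, proba, multi_class="ovr"))
```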
2.4. Cross-Validation
In this paper, cross-validation is used to optimize the model parameters. Using an error-based evaluation index, the training set is divided into k mutually exclusive subsets of equal size. Of these, k − 1 subsets are used for training and the remaining subset is used for validation. The procedure is repeated k times so that each subset serves once as the validation set, and the k validation results are averaged. The GridSearchCV class of scikit-learn and the cv function of the XGBoost library are used to optimize the parameters of the decision tree, random forest, and XGBoost models. For the Keras models, the same GridSearchCV-based cross-validation is used to search the parameter space and to obtain the optimal parameter estimates of each model on the dataset.
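The following sketch illustrates the grid search with k-fold cross-validation described above. It uses scikit-learn's `GridSearchCV` with k = 10 on a synthetic dataset with 15 factors and four classes. The parameter grid, the stratified shuffled folds and the data are all assumptions for illustration, since the study's grids are not listed here. The XGBoost `cv` function is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the DFS table: 15 factors, 4 susceptibility levels
X, y = make_classification(n_samples=400, n_features=15, n_informative=8,
                           n_classes=4, random_state=0)

# Hypothetical search space; the grid used in the study is not given
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [5, 10, 20, None],
    "min_samples_split": [2, 5, 10],
}

# k = 10: each fold is used once for validation and the other 9 for training
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      scoring="accuracy", cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Here `best_score_` is the mean validation accuracy over the ten folds for the best parameter combination, that is, the averaged k verification results.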
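$$w_l^2(T) = \sum_{t=1}^{J-1} \hat{\tau}_t^2 \, I(v_t = l) \quad (12)$$
At each of the J − 1 internal nodes t of a tree T, the splitting feature v_t is the one that provides the maximal estimated improvement $\hat{\tau}_t^2$ in squared-error risk over a constant fit to the entire region. The indicator I(v_t = l) restricts the sum to the nodes split on feature l. For an additive model of M trees, the importance is averaged over all trees:
$$w_l^2 = \frac{1}{M} \sum_{m=1}^{M} w_l^2(T_m) \quad (13)$$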
In practice, an attribute that is frequently selected for splitting usually has good discriminating ability. In this study, the factors affecting debris flow occurrence are ranked from high to low importance according to their contribution to the decision process of the DFS models.
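The following sketch checks Equations (12) and (13) numerically. It assumes a squared-error regression tree, which is the setting in which these formulas are stated. In that setting, the improvement at a split equals the drop in sample-weighted node impurity. Summing this drop per splitting feature therefore reproduces Equation (12). Averaging over the trees of an ensemble reproduces Equation (13). The result is then normalized so the relative importances sum to 1, as scikit-learn does.

The function names are hypothetical and the data are synthetic. scikit-learn's `GradientBoostingRegressor` stands in for XGBoost. XGBoost's own "gain" importance is computed from gradient and Hessian statistics, so it is related but not identical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor


def tree_importance(fitted_tree, n_features):
    """Equation (12): sum the squared-error improvement of every internal node,
    credited to the feature that node splits on (unnormalized)."""
    t = fitted_tree.tree_
    w = t.weighted_n_node_samples
    imp = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, no improvement
            continue
        gain = (w[node] * t.impurity[node]
                - w[left] * t.impurity[left] - w[right] * t.impurity[right])
        imp[t.feature[node]] += gain
    return imp


def ensemble_importance(trees, n_features):
    """Equation (13): average the per-tree importances over the M trees, then
    normalize so the relative importances sum to 1."""
    avg = np.mean([tree_importance(tr, n_features) for tr in trees], axis=0)
    return avg / avg.sum()


X, y = make_regression(n_samples=300, n_features=6, n_informative=3, random_state=0)
single = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
gbm = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0).fit(X, y)

ranked = np.argsort(ensemble_importance(gbm.estimators_.ravel(), 6))[::-1]
print("factors ranked from high to low importance:", ranked)
```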
3. Results
This section describes the implementation of the five machine learning algorithms. Python is used as the development language. The BPNN and the 1D-CNN are built with the Keras framework, and the DT and the RF are implemented with the API of the scikit-learn module. The XGBoost algorithm is implemented with the Python package provided on its official website.
The performance of a DFS model depends largely on the choice, adjustment, and optimization of its parameters, so optimizing the model structure and parameters requires multiple experiments. The cross-validation method is used for parameter optimization. After many experiments, the optimal parameters of the five DFS models were obtained, as shown in Table 3.
Algorithm   Parameter
BPNN        Number of iterations: 3000; learning rate: 0.01; activation functions: tanh, softmax; number of nodes: input layer = 15, hidden layers = (30, 30), output layer = 4; optimization function: Adam; loss: logarithmic loss function; alpha: 0.005
DT          Criterion: gini; min_samples_split = 2; max_depth: 38; splitter: random
1D-CNN      Convolutional layer: filters = 8, kernel_size = 3, stride = 1, activation = ReLU; pooling layer: max pooling; fully connected layer: nodes = 15, activation = ReLU; output layer: nodes = 4, activation = softmax
RF          Number of iterations: 30; max_features = sqrt; max_depth: 20; criterion: gini; min_samples_split = 0.8; min_samples_leaf = 1
XGBoost     Number of iterations: 39; max_depth: 15; colsample_bytree: 0.5; subsample: 0.9; eval_metric: mlogloss; objective: multi:softmax; eta: 0.1; lambda: 0.2; alpha: 0.005; min_child_weight: 0.6; num_class: 4
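For readers reproducing the setup, the Table 3 values can be expressed as estimator settings. This sketch rests on several assumptions:
- The study built its BPNN in Keras, so scikit-learn's `MLPClassifier` is only a stand-in.
- Mapping "Number of iterations" to `n_estimators` for RF and to `num_boost_round` for XGBoost is an interpretation of the table.
- The XGBoost parameter dictionary is shown but not run, so the block does not need the `xgboost` package.
- The data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# DT and RF: values as listed in Table 3
dt = DecisionTreeClassifier(criterion="gini", min_samples_split=2,
                            max_depth=38, splitter="random", random_state=0)
rf = RandomForestClassifier(n_estimators=30, max_features="sqrt", max_depth=20,
                            criterion="gini", min_samples_split=0.8,
                            min_samples_leaf=1, random_state=0)

# BPNN: assumed stand-in for the Keras model; tanh hidden layers, Adam,
# log-loss and a softmax output for multiclass targets
bpnn = MLPClassifier(hidden_layer_sizes=(30, 30), activation="tanh",
                     solver="adam", learning_rate_init=0.01, alpha=0.005,
                     max_iter=3000, random_state=0)

# XGBoost: Table 3 values as a native parameter dict, intended for
# xgboost.train(xgb_params, dtrain, num_boost_round=39); not run here
xgb_params = {
    "max_depth": 15, "colsample_bytree": 0.5, "subsample": 0.9,
    "eval_metric": "mlogloss", "objective": "multi:softmax", "eta": 0.1,
    "lambda": 0.2, "alpha": 0.005, "min_child_weight": 0.6, "num_class": 4,
}

# Synthetic stand-in data: 15 factors, 4 susceptibility levels
X, y = make_classification(n_samples=300, n_features=15, n_informative=8,
                           n_classes=4, random_state=0)
for name, model in [("DT", dt), ("RF", rf), ("BPNN", bpnn)]:
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))
```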
To obtain robust verification results, we use one-way ANOVA to test whether the methods differ significantly. The ANOVA is applied to the five groups of ten-fold cross-validation accuracies, and the results are shown in Table 5.
The F value in the table is the ratio of the between-group mean square to the within-group mean square. The corresponding p value is obtained from the F distribution. Sig denotes the p value, which is reported as 0 and is therefore less than 0.05, so the null hypothesis H0 of equal mean accuracies can be rejected. This indicates a significant difference between at least two of the models. Pairwise differences are then assessed with the post hoc Tukey's HSD test for all pairs of accuracies, as shown in Table 6.
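The F ratio and the follow-up Tukey HSD test described above can be sketched with NumPy and SciPy. The accuracy values below are synthetic draws for five hypothetical models with arbitrary means, not the study's results. `scipy.stats.tukey_hsd` requires SciPy 1.8 or later.

```python
import numpy as np
from scipy import stats

# Synthetic ten-fold accuracies for five hypothetical models (not the study's data)
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=0.02, size=10) for m in (0.80, 0.82, 0.84, 0.86, 0.88)]


def one_way_anova(groups):
    """F = between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.mean(np.concatenate(groups))
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    f_value = ms_between / ms_within
    p_value = stats.f.sf(f_value, k - 1, n_total - k)  # upper tail of F(k-1, N-k)
    return f_value, p_value


f_value, p_value = one_way_anova(groups)
print(f"F = {f_value:.2f}, p = {p_value:.3g}")

# Post hoc pairwise comparisons with Tukey's HSD
tukey = stats.tukey_hsd(*groups)
print(tukey.pvalue.round(4))
```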
According to the statistics, XGBoost performs best in terms of accuracy, and its accuracy differs significantly (p < 0.005) from those of BPNN, DT, and RF. There is no significant difference between XGBoost and 1D-CNN, although the mean accuracy of XGBoost is higher than that of 1D-CNN.
Figure 6. DFS maps based on the models of (a) BPNN, (b) DT, (c) 1D-CNN, (d) RF, and (e) XGBoost in the Shigatse area.
Figure 7. Susceptibility level distributions in the DFS maps constructed by the five models (y-axis: percentage (%) of debris flow zones at the Very Low, Low, Medium, and High levels; x-axis: model, labeled ANN, CNN, DT, RF, and XGBoost).
(1) The Recall, Precision, F1 score, Accuracy, and AUC values of the five algorithms differ considerably, that is, the algorithms perform quite differently on the test set.
(2) Despite the large differences in the index values, all indices follow the same trend. An algorithm that outperforms the others on one evaluation index also outperforms them on the other indices.
(3) The AUC scores of all five algorithms are high, indicating that they are all suitable for evaluating DFS in our study area. The AUC values of the BPNN, 1D-CNN, DT, RF, and XGBoost are 0.946, 0.976, 0.911, 0.976, and 0.988, respectively, so XGBoost performs best.
(4) Across the five indicators, the BPNN and DT models obtain similar scores, and the 1D-CNN, RF, and XGBoost models also obtain similar scores. However, the former group lags well behind the latter.
(5) All models in this manuscript were run on a server with an Intel(R) Core(TM) i7-6800K CPU @ 3.4 GHz and 64 GB of RAM. For prediction over the entire area, the DT model takes the shortest time. The XGBoost and 1D-CNN models take about the same, moderate amount of time. The BPNN model predicts slowly, taking about 20 min for a single prediction, and the RF computation takes the longest time.
Table 7. Assessment scores of the five debris flow susceptibility models.
Figure 8. ROC curves of the five models.
Figure 9. Relative importance of DFS triggering factors.
4. Discussion
This study estimates regional DFS using five highly representative machine learning models, i.e., BPNN, 1D-CNN, DT, RF, and XGBoost. To our knowledge, such investigations are rare in Shigatse, particularly those based on 1D-CNN and XGBoost.
BPNN has shown excellent performance in a variety of classification tasks in the past. In this study, however, its accuracy exceeds only that of a single DT. XGBoost outperforms BPNN not only in accuracy but also in speed, because BPNN has too many parameters to tune. In particular, XGBoost can generate “feature importance” scores that allow researchers to analyze the data. BPNN, by contrast, is a black-box model, and much research has been devoted to explaining its internal structure. XGBoost has not previously been used for debris flow susceptibility analysis. However, studies of mountain disasters have reached a similar conclusion, with a boosted tree model exceeding the accuracy of BPNN by 8% [18].
RF and XGBoost are ensemble machine learning algorithms based on DT, and their evaluation scores are higher than those of a single DT. This result shows that ensemble algorithms can compensate for the limited fitting ability of a single DT. Although both are ensemble algorithms, XGBoost performs better overall than RF. RF relies on the final vote of all its DTs, which reduces variance. XGBoost instead fits each new tree to the residuals left by the previous iterations, which mainly reduces bias, while its shrinkage, subsampling, and regularization help control variance. XGBoost and RF have been compared in many research areas, and XGBoost usually comes out ahead [39,45].
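The contrast between averaging and residual fitting can be illustrated with a small sketch using scikit-learn regression trees on synthetic one-dimensional data with squared-error loss. It is not XGBoost itself, which additionally uses second-order gradient information, shrinkage, subsampling, and regularization. The `bagging` function averages deep trees grown on bootstrap samples, as in RF without feature subsampling. The `boosting` function fits each shallow tree to the residuals left by the previous iterations. Both function names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)


def bagging(X, y, n_trees=30, seed=0):
    """RF-style: average deep trees grown on bootstrap resamples."""
    rs = np.random.default_rng(seed)
    preds = []
    for _ in range(n_trees):
        idx = rs.integers(0, len(y), size=len(y))  # bootstrap sample
        tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
        preds.append(tree.predict(X))
    return np.mean(preds, axis=0)


def boosting(X, y, n_trees=30, learning_rate=0.1):
    """Boosting-style: each shallow tree fits the residuals left by the
    previous iterations. Returns predictions and the training MSE per step."""
    pred = np.full(len(y), y.mean())
    history = []
    for _ in range(n_trees):
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)
        history.append(np.mean((y - pred) ** 2))
    return pred, history


bag_pred = bagging(X, y)
boost_pred, mse_history = boosting(X, y)
print("bagging train MSE:", round(np.mean((y - bag_pred) ** 2), 4))
print("boosting train MSE every 10 steps:", [round(m, 4) for m in mse_history[::10]])
```

With a learning rate between 0 and 2, no residual-fitting step can increase the training MSE. This holds because the tree's leaf means are the least-squares fit to the current residuals. The steady fall in training error reflects the bias-reduction mechanism. The benefit of bagging, by contrast, appears as lower variance on held-out data, which this training-error printout does not measure.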
Like XGBoost, 1D-CNN has not been applied to debris flow susceptibility, and little literature addresses either model in this context. The cross-validation results show no significant difference between the accuracies of XGBoost and 1D-CNN, although the mean accuracy of XGBoost is higher. XGBoost also leads on the test set. The main reason for this result is likely that CNNs capture data such as images, audio, and possibly text well by modeling spatial or temporal locality, whereas tree-based models handle tabular data very well.
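To make the locality argument concrete, the sketch below runs a NumPy-only forward pass shaped after the 1D-CNN row of Table 3. The network has 15 input factors, 8 filters of size 3 with stride 1, max pooling, a 15-node ReLU layer and a 4-class softmax output. The weights are random and untrained, and the "valid" convolution and pooling size of 2 are assumptions. This is not the study's Keras model.

Each convolution output combines only three neighboring factors in the order they are stored. Reordering the factor columns therefore changes the output. A tree split, by contrast, examines one factor at a time regardless of column order.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_valid(x, kernels, bias, stride=1):
    """x: (length,), kernels: (n_filters, k). Each output mixes only k
    adjacent inputs, so the result depends on the order of the factors."""
    n_filters, k = kernels.shape
    n_out = (len(x) - k) // stride + 1
    out = np.empty((n_out, n_filters))
    for i in range(n_out):
        window = x[i * stride:i * stride + k]
        out[i] = kernels @ window + bias
    return out


def forward(x, params):
    h = np.maximum(conv1d_valid(x, params["conv_w"], params["conv_b"]), 0)  # ReLU
    h = h[: len(h) // 2 * 2].reshape(-1, 2, h.shape[1]).max(axis=1)          # max pooling, size 2
    h = np.maximum(h.ravel() @ params["fc_w"] + params["fc_b"], 0)          # dense, ReLU
    z = h @ params["out_w"] + params["out_b"]
    return np.exp(z - z.max()) / np.exp(z - z.max()).sum()                  # softmax


params = {
    "conv_w": rng.normal(size=(8, 3)), "conv_b": np.zeros(8),
    "fc_w": rng.normal(size=(6 * 8, 15)) * 0.1, "fc_b": np.zeros(15),
    "out_w": rng.normal(size=(15, 4)) * 0.1, "out_b": np.zeros(4),
}
x = rng.normal(size=15)  # one sample with 15 factor values
p_original = forward(x, params)
p_shuffled = forward(x[rng.permutation(15)], params)
print(p_original.round(3), p_shuffled.round(3))
```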
Considering classification performance comprehensively, XGBoost has the best overall performance. It combines high classification accuracy and good predictive ability with a shorter computation time than RF and BPNN. Therefore, XGBoost deserves more attention in future DFS evaluation.
Figure 10. Major triggering factor data obtained from the initial regions of the debris flows: (a) Aspect; (b) Profile curvature; (c) Rainfall (mm); and (d) Elevation (m). Each panel shows the percentage (%) of each susceptibility class (Very Low, Low, Medium, High) across the factor's value intervals.
Machine learning algorithms can handle large-scale data. They are also more objective than traditional qualitative evaluation methods and can support decision making without an expert system. However, they have some inherent problems. For example, data preprocessing is laborious and time-consuming, and the preprocessing results strongly affect the classifier.
5. Conclusions
In this study, multi-source satellite data and GIS are used to characterize the environment in which debris flows form in the study area. These environmental characteristics are then input into machine learning methods to establish DFS models. The roles and weights of the triggering factors revealed by the training process are analyzed to further study the main causes of debris flows. The four main findings of this research are as follows:
(1) Satellite remote sensing can provide data for regional DFS analysis, especially in mountainous areas such as southwestern Tibet, where steep terrain leaves many sites inaccessible for field investigation. Higher-resolution imagery describes the terrain where debris flows occur more accurately [48] and can potentially improve further analysis. It is important and necessary to use the topographical, human activity, vegetation cover, climatic, and soil factors provided by satellite remote sensing to estimate regional debris flow susceptibility.
(2) Five machine learning algorithms were used to construct DFS maps of Shigatse. The results confirm that all five methods can be used to analyze debris flow susceptibility. In terms of performance, XGBoost ranks first and 1D-CNN second, followed by RF, BPNN, and DT. XGBoost has the best predictive performance, with the highest score on all five evaluation metrics. The ANOVA and Tukey's HSD tests showed that the accuracy of XGBoost is significantly better than those of RF, BPNN, and DT, but not significantly different from that of 1D-CNN. In terms of prediction time, DT is the fastest, 1D-CNN is moderate and close to XGBoost, and RF and BPNN are slower. To our knowledge, this is the first comparative experiment of XGBoost and 1D-CNN in DFS studies. The ranking of the factors by “feature importance” indicates that slope aspect, rainfall, profile curvature, and elevation (DEM) have a greater impact on debris flows. These results are relevant to local public facility construction and residential property protection, and the XGBoost method shows good prospects for estimating DFS.
(3) Comparing the debris flow susceptibility maps of the five prediction algorithms shows that all five models assign a large proportion of the area to the moderate susceptibility level. This experiment has not yet explained why the models' predictions differ, and the causes will be explored in subsequent studies. Using statistically derived susceptibility levels as labels may also have shortcomings. In follow-up work, we plan to first use a clustering algorithm to identify locations where debris flows are unlikely to occur. These locations will then be used, together with the existing debris flow data, to classify debris flow susceptibility. As machine learning technology develops, we will further improve the performance of DFS models by modifying and optimizing the algorithms.
(4) Debris flows are common in mountain areas. In this study, five machine learning models were used to analyze debris flow events. The results show that the XGBoost model has the best predictive performance and can help prevent casualties and economic losses caused by debris flows. For local land planning and land use, relevant departments can combine the XGBoost model with satellite remote sensing and GIS spatial data processing. This combination can produce feature maps and high-precision regional susceptibility maps that guide debris flow prevention and mitigation.
Author Contributions: Conceptualization, Y.Z. and T.G.; methodology, T.G.; software, T.G.; validation, W.T.,
and Y.Z.; formal analysis, T.G.; investigation, T.G.; resources, Y.Z.; data curation, W.T.; writing—original draft
preparation, T.G.; writing—review and editing, Y.Z., W.T., Y.-A.L.; visualization, T.G.; supervision, W.T.; project
administration, Y.Z.; funding acquisition, Y.Z.
Funding: This research was funded by the National Natural Science Foundation of China, grant numbers 41661144039, 41875027, and 41871238.
Acknowledgments: The authors would like to thank the Tibet Plateau Institute of Atmospheric Environment, Geospatial Data Cloud, Resource and Environmental Cloud Platform, Earth Observing System Data and Information System, and Geoscientific Data & Discovery Publishing Systems for the data that they kindly provided. We also acknowledge data support from the “National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (http://www.geodata.cn)”.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Iverson, R.M. Debris-flow mechanics. In Debris-Flow Hazards and Related Phenomena; Springer: Berlin/Heidelberg,
Germany, 2005; pp. 105–134. ISBN 9783540207269.
2. Golovko, D.; Roessner, S.; Behling, R.; Wetzel, H.U.; Kleinschmit, B. Evaluation of remote-sensing-based
landslide inventories for hazard assessment in Southern Kyrgyzstan. Remote Sens. 2017, 9, 943. [CrossRef]
3. Lv, X.; Ding, M.; Zhang, Y.; Teng, J. Hazard assessment of mountainous disasters in Nieyou section of
Sino-Nepal highway based on triangle whitening weight function. J. Southwest Univ. Sci. Technol. 2017, 1.
[CrossRef]
4. Sun, Y.; Chen, H.; Zhang, Z.; Zhao, Y.; Bao, P.; Bai, J. Distribution regularities of geological hazards along the
G318 Lhasa-Shigatse section and their influence factors. J. Nat. Disasters 2014, 23, 111–119.
5. Gregoretti, C.; Stancanelli, L.M.; Bernard, M.; Boreggio, M.; Degetto, M.; Lanzoni, S. Relevance of erosion
processes when modelling in-channel gravel debris flows for efficient hazard assessment. J. Hydrol. 2019,
568, 575–591. [CrossRef]
6. Kim, H.; Lee, S.W.; Yune, C.Y.; Kim, G. Volume estimation of small scale debris flows based on observations
of topographic changes using airborne LiDAR DEMs. J. Mt. Sci. 2014, 11, 578–591. [CrossRef]
7. Kim, H.S.; Chung, C.K.; Kim, S.R.; Kim, K.S. A GIS-based framework for real-time debris-flow hazard
assessment for expressways in Korea. Int. J. Disaster Risk Sci. 2016, 7, 293–311. [CrossRef]
8. Alharbi, T.; Sultan, M.; Sefry, S.; ElKadiri, R.; Ahmed, M.; Chase, R.; Chounaird, K. An assessment of landslide
susceptibility in the Faifa area, Saudi Arabia, using remote sensing and GIS techniques. Nat. Hazards Earth
Syst. Sci. 2014, 14, 1553. [CrossRef]
9. Ahmed, B.; Dewan, A. Application of bivariate and multivariate statistical techniques in landslide
susceptibility modeling in Chittagong City Corporation, Bangladesh. Remote Sens. 2017, 9, 304. [CrossRef]
10. Li, Y.; Wang, H.; Chen, J.; Shang, Y. Debris flow susceptibility assessment in the Wudongde Dam area, China
based on rock engineering system and fuzzy C-means algorithm. Water 2017, 9, 669. [CrossRef]
11. Liou, Y.A.; Nguyen, A.K.; Li, M.H. Assessing spatiotemporal eco-environmental vulnerability by Landsat
data. Ecol. Indic. 2017, 80, 52–65. [CrossRef]
12. Sujatha, E.R.; Sridhar, V. Mapping debris flow susceptibility using analytical network process in Kodaikkanal
Hills, Tamil Nadu (India). J. Earth Syst. Sci. 2017, 126, 116. [CrossRef]
13. Aditian, A.; Kubota, T.; Shinohara, Y. Comparison of GIS-based landslide susceptibility models using
frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia.
Geomorphology 2018, 318, 101–111. [CrossRef]
14. Di Cristo, C.; Iervolino, M.; Vacca, A. Applicability of Kinematic and Diffusive models for mud-flows: A
steady state analysis. J. Hydrol. 2018, 559, 585–595. [CrossRef]
15. Xu, W.B.; Yu, W.J.; Jing, S.C.; Zhang, G.P.; Huang, J.X. Debris flow susceptibility assessment by GIS and
information value model in a large-scale region, Sichuan Province (China). Nat. Hazards 2013, 65, 1379–1392.
[CrossRef]
16. Chen, X.; Chen, H.; You, Y.; Chen, X.; Liu, J. Weights-of-evidence method based on GIS for assessing
susceptibility to debris flows in Kangding County, Sichuan Province, China. Environ. Earth Sci. 2016, 75, 70.
[CrossRef]
17. Achour, Y.; Garçia, S.; Cavaleiro, V. GIS-based spatial prediction of debris flows using logistic regression
and frequency ratio models for Zêzere River basin and its surrounding area, Northwest Covilhã, Portugal.
Arab. J. Geosci. 2018, 11, 550. [CrossRef]
18. Oh, H.J.; Lee, S. Shallow landslide susceptibility modeling using the data mining models artificial neural
network and boosted tree. Appl. Sci. 2017, 7, 1000. [CrossRef]
19. Shirzadi, A.; Shahabi, H.; Chapi, K.; Bui, D.T.; Pham, B.T.; Shahedi, K.; Ahmad, B.B. A comparative study
between popular statistical and machine learning methods for simulating volume of landslides. Catena 2017,
157, 213–226. [CrossRef]
20. Wang, L.J.; Guo, M.; Sawada, K.; Lin, J.; Zhang, J. A comparative study of landslide susceptibility maps using
logistic regression, frequency ratio, decision tree, weights of evidence and artificial neural network. Geosci. J.
2016, 20, 117–136. [CrossRef]
21. Abancó, C.; Hürlimann, M. Estimate of the debris-flow entrainment using field and topographical data.
Nat. Hazards 2014, 71, 363–383. [CrossRef]
22. Prenner, D.; Kaitna, R.; Mostbauer, K.; Hrachowitz, M. The value of using multiple hydrometeorological
variables to predict temporal debris flow susceptibility in an alpine environment. Water Resour. Res. 2018, 54,
6822–6843. [CrossRef]
23. Jiang, W.; Rao, P.; Cao, R.; Tang, Z.; Chen, K. Comparative evaluation of geological disaster susceptibility
using multi-regression methods and spatial accuracy validation. J. Geogr. Sci. 2017, 27, 439–462. [CrossRef]
24. Kang, S.; Lee, S.R.; Vasu, N.N.; Park, J.Y.; Lee, D.H. Development of an initiation criterion for debris flows
based on local topographic properties and applicability assessment at a regional scale. Eng. Geol. 2017, 230,
64–76. [CrossRef]
25. Zhao, J.; Mao, X.; Chen, L. Learning deep features to recognise speech emotion using merged deep CNN.
IET Signal Process. 2018, 12, 713–721. [CrossRef]
26. Pan, H.; He, X.; Tang, S.; Meng, F. An improved bearing fault diagnosis method using one-dimensional CNN
and LSTM. Strojniški Vestnik/J. Mech. Eng. 2018, 64, 443–452. [CrossRef]
27. Zhao, X.; Wen, Z.; Pan, X.; Ye, W.; Bermak, A. Mixture gases classification based on multi-label one-dimensional
deep convolutional neural network. IEEE Access 2019, 7, 12630–12637. [CrossRef]
28. Tsangaratos, P.; Ilia, I. Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi
Prefecture, Greece. Landslides 2016, 13, 305–320. [CrossRef]
29. Kadavi, P.R.; Lee, C.-W.; Lee, S. Application of ensemble-based machine learning models to landslide
susceptibility mapping. Remote Sens. 2018, 10, 1252. [CrossRef]
30. Nikolopoulos, E.I.; Destro, E.; Bhuian, M.; Borga, M.; Anagnostou, E. Evaluation of predictive models for
post-fire debris flow occurrence in the western United States. Nat. Hazards Earth Syst. Sci. 2018, 18, 2331–2343.
[CrossRef]
31. Tang, M.; Fu, T.; Zhang, W.; Yang, J. Genetic mechanism of geohazard along national highway 318 in Tibet
and prevention countermeasure. J. Highw. Transp. Res. Dev. 2012, 5, 005. [CrossRef]
32. Geological Cloud Portal Home Page. Available online: http://geocloud.cgs.gov.cn/#/portal/home (accessed
on 12 May 2017).
33. Cavalli, M.; Crema, S.; Trevisani, S.; Marchi, L. GIS tools for preliminary debris-flow assessment at
regional scale. J. Mt. Sci. 2017, 14, 2498–2510. [CrossRef]
34. Djeddaoui, F.; Chadli, M.; Gloaguen, R. Desertification susceptibility mapping using logistic regression
analysis in the Djelfa area, Algeria. Remote Sens. 2017, 9, 1031. [CrossRef]
35. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer
resolution observation and monitoring of global landcover: First mapping results with Landsat TM and
ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654. [CrossRef]
36. Li, C.; Gong, P.; Wang, J.; Zhu, Z.; Biging, G.S.; Yuan, C.; Hu, T.; Zhang, H.; Wang, Q.; Li, X.; et al. The
first all-season sample set for mapping global landcover with Landsat-8 data. Sci. Bull. 2017, 62, 508–515.
[CrossRef]
37. Verbiest, N.; Ramentol, E.; Cornelis, C.; Herrera, F. Preprocessing noisy imbalanced datasets using SMOTE
enhanced with fuzzy rough prototype selection. Appl. Soft Comput. 2014, 22, 511–517. [CrossRef]
38. Mao, Y.; Zhang, M.; Sun, P.; Wang, G. Landslide susceptibility assessment using uncertain decision tree
model in loess areas. Environ. Earth Sci. 2017, 76, 752. [CrossRef]
39. Wang, S.; Dong, P.; Tian, Y. A novel method of statistical line loss estimation for distribution feeders based on
feeder cluster and modified XGBoost. Energies 2017, 10, 2067. [CrossRef]
40. Wu, Y.J.; Chiang, C.T. ROC representation for the discriminability of multi-classification markers.
Pattern Recognit. 2016, 60, 770–777. [CrossRef]
41. Rajaraman, S.; Antani, S.K.; Poostchi, M.; Silamut, K.; Hossain, M.A.; Maude, R.J.; Thoma, G.R. Pre-trained
convolutional neural networks as feature extractors toward improved malaria parasite detection in thin
blood smear images. PeerJ 2018, 6, e4568. [CrossRef]
42. Abdi, H.; Williams, L.J. Tukey’s honestly significant difference (HSD) test. In Encyclopedia of Research Design;
Salkind, N., Ed.; Sage: Thousand Oaks, CA, USA, 2010; pp. 1–5.
43. Li, C.; Zheng, X.; Yang, Z.; Kuang, L. Predicting short-term electricity demand by combining the advantages
of ARMA and XGBoost in fog computing environment. Wirel. Commun. Mob. Comput. 2018, 2018, 18.
[CrossRef]
44. Shimoda, A.; Ichikawa, D.; Oyama, H. Using machine-learning approaches to predict non-participation in a
nationwide general health check-up scheme. Comput. Methods Programs Biomed. 2018, 163, 39–46. [CrossRef]
[PubMed]
45. Zhang, L.; Ai, H.; Chen, W.; Yin, Z.; Hu, H.; Zhu, J.; Liu, H. CarcinoPred-EL: Novel models for predicting the
carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods. Sci. Rep. 2017, 7,
2118. [CrossRef] [PubMed]
46. Wang, Z.Y.; Gong, T.L.; Shi, W.J. Typical types of vegetation and erosion in the Yalutsangpo Basin.
Adv. Earth Sci. 2011, 26, 1208–1216.
47. Guo, C.W.; Yao, L.K.; Duan, S.S.; Huang, Y.D. Distribution regularities of landslides induced by Wenchuan
earthquake, Lushan earthquake and Nepal earthquake. J. Southwest Jiaotong Univ. 2016, 51, 71–77.
48. Stolz, A.; Huggel, C. Debris flows in the Swiss National Park: The influence of different flow models and
varying DEM grid size on modeling results. Landslides 2008, 5, 311–319. [CrossRef]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).