Marjanovic 2009
Marjanovic 2009
Marjanovic 2009
Abstract— Case study addresses NW slopes of Fruška Gora or less reliable results depending on the complexity of the
Mountain, Serbia. Landslide activity is quite notorious in this approach [3], [4]. The central idea of all the studies implies
region, especially along the Danube’s right river bank, and the processing of input geo-parameters into a single final
recently intensified seismicity coupled with atmospheric model through various weighting and interpolating methods
precipitation might be critical for triggering new landslide i.e. statistical analysis, probabilistic approach, machine
occurrences. Hence, it is not a moment too soon for serious learning, expert-based opinion and so forth. However, the
landslide susceptibility assessment in this region. State-of-the- latter is characterized as quite accurate when combined with
art approaches had been taken into consideration, cutting other techniques, and the closest to the original geotechnical
down to the Support Vector Machine (SVM) and k-Nearest
assessment [5]. It has also been shown that the combination
Neighbor (k-NN) algorithms, trained upon expert based model
of landslide susceptibility (a multi-criteria analysis). The latter
of expert-based and machine learning techniques tend to be
involved Analytical Hierarchy Process (AHP) for weighting particularly convenient for the regional problematic [6]. In
influences of different input parameters. These included regional studies, certain generalization is necessary, so direct
elevation, slope angle, aspect, distance from flows, vegetation modeling could be extremely time-consuming and inefficient
cover, lithology, and rainfall, to represent the natural factors of unlike machine learning trained over an expert-based model.
the slope stability. Processed in a GIS environment (as discrete Namely, the algorithm that is generated while the machine is
or float raster layers) trough AHP, those parameters yielded being trained is readily applicable to a much wider area than
susceptibility pattern, classified by the entropy model into four the training set encompasses. Hence, a proper reconstruction
classes. Subsequently the susceptibility pattern has been of the final model is possible with sparse geo-inputs [7].
featured as training set in SVM and k-NN algorithms. Detailed Accordingly, the foci of this research were fabrication
fitting involved several cases, among which SVM with and fitting of the machine learning algorithms for the
Gaussian kernel over geo-dataset (coordinates and input landslide susceptibility assessment over a wider (coarser
parameters) reached the highest accuracy (88%) scale) mountain area. The approach is two-folded, initially
outperforming other considered cases by far. being introduced through multi-criteria analysis in GIS
environment and secondly, through machine learning. (Their
Keywords- AHP, k-NN, Landslide susceptibility, SVM duality will reflect the organization of this paper). The
former involves results readily presented by the author [8],
I. INTRODUCTION whereas the latter regards the main objective of this study.
The environmental hazards dealt within the Engineering Finally, it would be appropriate to mention that this
Geology scope, affect both, the social and the economic approach turns unprecedented for the area of interest, as well
aspects of human lives. Hazardous phenomena trouble at as for the geotechnical practice in Serbia. In fact, there is
different scales, with different intervals and endurance, generally limited number of studies which addresses
leaving the different outcomes. The strategies of their methodology adhered to herein, such as the case studies of
management need to treat the existing hotspots, but landslide susceptibility in Hong Kong area [6], slope stability
nonetheless to deal with potentially new ones by predicting assessment by [9], or general geotechnical 3D modeling
their behavior, volume and severeness, prior to their potential capabilities with machine learning [7]. The presented study
triggering. Herein, one of the most widespread hazardous gains gravity for its practical contribution in both, local and
phenomena is to be considered [1], [2]. This addresses global scales.
landslides and alike mass movements i.e. their susceptibility.
II. CASE STUDY
Landslide susceptibility stands for likelihood of landslide
occurrence over an area. Its assessment had been illustrated The area of interest encompasses NW slopes of the
in versatile techniques in various case studies, yielding more Fruška Gora Mountain, in vicinity of Novi Sad, NW Serbia.
274
Figure 1. Input dataset: a) elevation; b) slope angle; c) aspect; d) distance from flows; e) vegetation; f) lithology (a-alluvion, b-loess, c-limestone, d-clays);
g) rainfall; h) landslide susceptibility model (1-low, 2-mild, 3-moderate, 4-high susceptibility)
275
• susceptibility model (SM) – discrete raster model algorithms and by limiting the input dataset solely to spatial
utilized as a label parameter of the training mode in coordinate palettes. We used our own implementation of the
learning process (Fig. 1-h). It is generated by the k-NN classifier.
following normalized distribution (principal vector): 2) SVM algorithm
The second algorithm used in this study was the SVM
SM = 0,29 ⋅ P1 + 0,27 ⋅ P2 + 0,15 ⋅ P3 + 0,14 ⋅ P4 + 0,08 ⋅ P5 + 0,05 ⋅ P6 + 0,02 ⋅ P7
, (1) classifier. This method deals with the binary classification
model, but one can easily transform n-classes problem into
where Pi corresponds to pixel Digital Number (DN) the sequence of n binary classification tasks (one-versus-all
value of the appropriate parameter. Equation (1) is [16]). The algorithm tries to generate a separating hyper-
obtained by consistent AHP pair-wising and plane in the initial space of Pi coordinates between two
represents SM classified by its entropy function. distinct classes. Since the problem at hand is not linear by its
According to the information gain function the nature, SVM uses kernel functions to map the initial input
optimal number of susceptibility classes of our space into a high-dimensional feature space where the points
model is 4 (low, mild, moderate and high become more linearly separable. Gaussian radial function is
susceptibility), while it would be the most used in all experiments in a two-folded cross-validation. The
informative (but not properly visualized) with 9 latter means that the training mode was performed over 50%
classes [20]. SM is evaluated versus another, expert- of the input data (training set), evenly distributed throughout
based reference obtained by Remote Sensing method the raster sets. The other part is then used for testing (test
[11] with substantial correlation. set). In the next iteration the procedure is repeated with
swapping of training and testing set roles. Finally, the
Finally, the complete input set includes spatial reference accuracy is evaluated as the arithmetic mean of those two
or X,Y coordinates (in Gaus-Krüger Projection – Zone 7, iterations.
ellipsoid Bessel 1841) in order to support spatial patterns in All SVM experiments were performed in an open-source
training process. package LIBSVM [21].
Final act of the data preparation included normalization
IV. RESULTS AND DISCUSSION
of the entire input raster set to a 0-1 span and its conversion
to a convenient table format. Apart from Table I and Table II, the results will be
mainly presented and discussed textually, since correlations
B. Machine learning between input label (multi-criteria driven model of landslide
Machine learning classifiers implied in this research susceptibility) and output models (susceptibility patterns
included pattern recognition algorithms. They were tracked by k-NN and SVM) could not be fairly visualized,
performed through training and testing mode. Former especially where higher accuracy is reached.
requires input dataset, in our case a geo-dataset introduced in In order to compare the results (Table I) the same
the forgoing method description of multi-criteria analysis, training sets have been simultaneously generated and
and the latter includes predicting of the values by previously processed in both algorithms.
trained algorithm, and its accuracy evaluation. Accuracy is Initially, systematic fitting of the learning procedures
simply computed as count of matching instances (matching took place via parametric adjustments of k-NN and SVM
pixels) versus total number of instances used for testing (all algorithms, as to be respectively presented.
training-testing pixels), expressed as percentage. Thus, the There were two sets of experiments in k-NN method,
principal difference between the machine learning including the case with 1% and 5% of input data, randomly
approaches occurs in the training stage, where we used k- selected but uniformly distributed over the area. As
Nearest Neighbor (k-NN) and Support Vector Machine previously specified, this method turns extremely inefficient
(SVM) algorithms. with larger number of parameters and bigger percentage of
1) k-NN algorithm training data. This justifies the reduction of the input dataset
This is amongst the simplest algorithms, which performs to two attributes i.e. X, Y coordinates and landslide
classification of a point in an n-dimensional input space susceptibility class label, as well as the small amount (1%
(space of Pi coordinates) by class values of the k closest and 5% cases) of input data for training of the algorithm.
neighboring elements (in the training set). Herein, a simple Spatial parameters were not preferred over other input
voting system assigns the new class value to that particular parameters just by chance. Namely, some spatial correlations
element by the class which predominates in neighboring are observable in the susceptibility model (Fig. 1-h) which
instances. To avoid even votes, the number of neighbors is allows us to assume certain spatial dependence in data
set to an odd number (k=1, 3, 5, 7…). Since it is more distribution and regenerate it with sparse input data (it will
probable that closer neighbors have a greater impact it is be presented that small amount of input data was sufficient
desirable to ponder each neighbor’s proximity. However, this for substantial accuracy). In the experiment we randomly
requires sorting and pondering distances per each element, generated 5 training-testing splits and measured the accuracy
resulting in hardware-demanding and time-consuming of classification per each. The final result was obtained after
procedures. In this study, such shortcomings were partly averaging.
suppressed by implementing more advanced sorting Number of nearest neighbors is tested for consistency per
both cases, whereupon 3-NN experiment met the
276
requirements. Further increase of neighbors improved
consistency of the model but did not exceed the topmost TABLE II. ACCURACY ESTIMATION OF 2-FOLD A
SVM IN % FOR
DIFFERENT INPUT REPRESENTATIONS
accuracy. As indicated in Table I, the accuracy is nearly the
same, if not better in the 1% sample (with about 400
Reducedb set Parametricc set Entire set
instances for training versus nearly 2000 in the 5% case).
Equaling only about 57% of accuracy in the average, this 71,8 86,4 87,6
result is not that reliable, but the fact that only spatial a
2-fold SVM was performed with Gaussian kernel with C,Ȗ equaling 100 and 2, respectively
relations of the susceptibility model yield over 50% suggests b
reduced input set included X, Y coordinates of GK projection and DN values set of reclassified
that there is some room for improvement if learning would landslide susceptibility model
277
problem. In comparison to k-NN it turned more consistent [9] H. Zhao, “Slope reliability analysis using a support vector machine,”
and constantly more precise. The most important result is the Computers and Geotechnics, vol. 35, pp. 459-467, 2008.
one revealing that small training sets are sufficient to reach [10] M. Janjiü, Engineering Geology characteristics of terrains of National
Republic of Serbia (Inženjerskogeološke karakteristike terena
very high accuracy. In order to exploit these features, it has Narodne Republike Srbije), Geological and Geophysical Survey,
been suggested that the susceptibility model could be Belgrade (Serbia), 1962, pp. 19-189.
recreated for wider area, with less effort than in standard [11] R. Pavloviü et al, “Geological conditions of racional usage and
modeling procedures [3]. However, a multi-fold case with preservation of the Fruška Gora Mountain area (Geološki uslovi
sparse data inputs yet needs to be confirmed. racionalnog korišüenja i zaštite prostora Fruške gore),” Specialist
Another curiosity is that this approach is entirely novel in study, Department for Remote Sensing and Structural Geology,
Faculty of Mining and Geology, University of Belgrade, 2005,
landslides assessments and we hope that the future unpublished
This paper is written in the framework of Methods of [12] T. ýupkoviü, “Evolution of geomorphologic processes on Fruška
artificial intelligence in GIS, a project of Czech Republic Grant Gora Mountain (Evolucija geomorfoloških procesa Fruške gore),”
Agency (CR GA 205/09/079) Master Thesis, Department for Remote Sensing and Structural
experiments will show its applicability in this field. Geology, Faculty of Mining and Geology, University of Belgrade,
1995, unpublished.
[13] M. Komac, “A landslide susceptibility model using the analytical
REFERENCES hierarchy process method and multivariate statistics in perialpine
Slovenia,” Geomorphology, vol. 74, pp. 17-28, 2005.
[1] P. Aleotti and R. Chowdhury, “Landslide hazard assessment:
summary review and new perspectives,” Bull Eng Geol Environ, vol. [14] A. Esmali and H. Ahmadi, “Using GIS & RS in mass movements
58, pp. 21-44, 1999. hazard zonation – A case study in Germichay watershed, Aderbil,
Iran,” Proc. Map Asia 2003, Kuala Lumpur (Malaysia), 13-15
[2] K. Smith, Environmental Hazards – Assessing the risk and reducing October, 2003.
disaster, Routledge, London and New York (UK, USA), 2001, pp.
180-202. [15] V. Vapnik, The Nature of Statistical Learning Theory, 2nd ed.
Springer-Verlag, New York (USA), 1995, pp 138-167.
[3] J. Chacón et al, “Engineering geology maps: landslides and
geographical information systems,” Bull Eng Geol Environ, vol. 65, [16] A. I. Belousov, S. A. Verzakov, J. Von Frese, “Applicational aspects
pp. 341-411, 2006. of support vector machines,” J. Chemom, vol. 16, pp 482-489, 2002.
[4] G. Bonham-Carter, Geographic information system for geosciences – [17] C. Van Westen et al, “Landslide hazard and risk zonation – Why is it
still so difficult?”, Bull Eng Geol Environ, vol. 65, pp. 197-205, 2006.
Modeling with GIS, Pergamon, New York (USA), 1994, pp. 51-81,
177-334. [18] V. Voženílek, “Landslide modeling for natural risk/hazard assessment
[5] M. Ercanoglu et al, “Adaptation and comparison of expert opinion to with GIS,” Geographica, Acta Universitas Carolinae, vol. XXXV, pp.
analytical hierarchy process for landslide susceptibility mapping,” 145-155, 2000.
Bull Eng Geol Environ, vol 67, pp. 565-578, 2008. [19] G. Ravi, Remote sensing Geology, Springer-Verlag, Berlin
(Germany), 2002, pp. 429-583.
[6] X. Yao, L. G. Tham, F. C. Dai, “Landslide susceptibility mapping
based on support vector machine: A case study on natural slopes of [20] M. Marjanoviü and B. Abolmasov, “AHP-driven multi-criteria
Hong Kong, China,” Geomorphology, vol. 101, pp. 572-582, 2008. analysis of landslide susceptibility on Fruška Gora Mountain,”
unpublished
[7] A. Smirnoff, E. Boisvert, and S. J. Paradis, “Support vector machines
for 3D modeling from sparse geological information of various [21] C. C. Chang, and C. J. Lin, “LIBSVM: a library for support vector
origins,” Computers & Geosciences, vol. 34, pp. 127-143, 2008. machines,” 2001, Software available at:
[8] M. Marjanoviü, “Landslide susceptibility modeling: A case study on http://www.csie.ntu.edu.tw/~cjlin/libsvm
Fruška Gora Mountain, Serbia”, Geomorphologia Slovaca et
Bohemica, vol 10, 2009, in press
278