
Indian Geotech J

https://doi.org/10.1007/s40098-024-00870-4

ORIGINAL PAPER

Prediction of CBR by Deep Artificial Neural Networks with Hyperparameter Optimization by Simulated Annealing

Crespin Prudence Yabi1 · Sètondji Wadoscky Agongbe1 · Bio Chéïssou Koto Tamou1 · Ehsan Noroozinejad Farsangi2 · Eric Alamou1 · Mohamed Gibigaye3

Received: 12 June 2023 / Accepted: 2 January 2024


© The Author(s) 2024

Abstract The construction of pavements requires the complete identification of the soils in place and of the added materials. This identification consists in determining the class of the soils and in evaluating their bearing capacity through the California bearing ratio (CBR) index. Obtaining the CBR index is very costly in terms of time and financial resources, especially for a large-scale project; hence the need to find processes simpler than the classical ones. This study develops models for predicting the CBR index from physical properties that are less complex to obtain, based on deep neural networks. To achieve this, three databases were used. A first database consists of the proportion of fines, the Atterberg limits and the Proctor references of the soils. A second database uses the methylene blue value instead of the Atterberg limits, and a third database uses only the proportion of fines and the Proctor soil references. On each of the databases, a deep neural network model was developed using dense layers, regularization layers, residual blocks and parallelization in TensorFlow to predict the CBR value. Each model was formed by combining several deep neural networks developed according to specific architectures. To expedite training, the simulated annealing method was employed to optimize hyperparameters and define the optimal configuration for each network. The predictions obtained are correlated with the true values from 83.6 to 96.5%. In terms of performance, the models have a mean deviation ranging from 3.74 to 5.96%, a maximum deviation ranging from 12.43 to 16.2% and a mean square deviation ranging from 0.781 to 2.189. The results suggest that the variable VBS has a negative impact on the accuracy of the networks in predicting the CBR index. The developed models respect the confidence threshold (± 10%) and can be used to set up a local or regional geotechnical platform.

Keywords CBR prediction · Deep artificial neural networks · Clustering · Data augmentation · TensorFlow

* Ehsan Noroozinejad Farsangi
ehsan.noroozinejad@westernsydney.edu.au

1 Laboratory of Tests and Studies in Civil Engineering, National University of Sciences, Technologies, Engineering and Mathematics, Abomey, Benin
2 Urban Transformations Research Centre (UTRC), Western Sydney University, Parramatta, NSW, Australia
3 Laboratory of Applied Energetic and Mechanic, LEMA, University of Abomey-Calavi, Abomey-Calavi, Benin

Introduction

Roads are linear structures that rest on a soil called a roadbed. This subgrade may be of selected materials brought to the site, the existing soil, or both. All materials used in the construction of the road must be geotechnically identified to ensure their suitability for use. When identifying soils, the road engineer looks at several properties of the soil: its granularity, its clay content, its optimal compaction characteristics and its bearing capacity. These quantities make it possible to classify the soils according to the GTR and to determine the value of their California bearing ratio (CBR) at the end of the soil identification process. But several problems arise with the method of obtaining the CBR of soils. According to Liang et al. [1], the CBR test represents 37% of the total cost of identifying a single soil sample. The CBR test is also time-consuming: in the identification process, it represents 71% of the total identification time of a single sample. Despite its requirements, CBR is not only one of the input parameters for pre-calculated pavement structure sheets, but also an important parameter that can be used to accept pavement support platforms on site, in accordance with the Senegal pavement design guide [2]. The pavement dimensioning guide for tropical countries [3] also enables soil modulus to be estimated through existing correlations.


Furthermore, it is noted that CBR testing is performed on soil samples from project to project and can thus be considered a recurring task. As researchers, we therefore wonder whether there are technologies capable of learning, from existing CBR data, the behavior of the CBR as a function of other soil properties that are less complex and less expensive to obtain. The issue is to simplify the method of obtaining the CBR bearing capacity index so as to reduce both the cost of soil identification and the identification time of a single soil sample.

The question had been raised before and has seen three eras of attempted answers. From 1970 to 2021, researchers such as [4–12] used simple correlations to estimate the value of the CBR, and they obtained an accuracy of up to 78% in terms of coefficient of determination. From 2011 to 2021, statistical learning methods were used by [13–19]; the accuracy increased up to 88%, still in terms of coefficient of determination. With the availability of project data and the advent of artificial intelligence, researchers have since predicted the CBR value using artificial neural networks, as shown in Table 1.

The works presented in Table 1 are based on the use of lightweight, wide neural networks whose prediction variables are physical properties of soils, such as the percentage of fines, percentage of sand, percentage of gravel, liquid limit, plasticity index and optimum water content. The coefficients of determination (R²) obtained range from 0.68 to 0.98, and the root mean square error (RMSE) remains low. In view of the recommendations of Charu Aggarwal [20], deep architectures of artificial neural networks are preferable to simple, wide networks since, according to the author, the latter can present good performance but do not offer a satisfactory generalization capacity on data unknown to the prediction model.

Table 1 Literature review on artificial neural networks for CBR prediction

| References | Method | Predictors | Activation functions | R² | RMSE | MAE |
|---|---|---|---|---|---|---|
| Taskiran [21] | Combination of artificial neural networks and gene expression programming | %(F+S), %S, %G, LL, IP, γdmax, wopt | logsig, tansig | 0.91 | 1.48 | |
| | | %(F+S), %S, LL, IP, γdmax, wopt | | 0.64 | 2.94 | |
| | | %(F+S), %S, IP, γdmax, wopt | | 0.523 | 5.44 | |
| | | %(F+S), IP, γdmax, wopt | | 0.807 | 2.11 | |
| | | %(F+M), IP, γdmax, wopt, S | | 0.838 | 2.07 | |
| | | %(F+M), IP, γdmax, wopt | | 0.885 | 1.69 | |
| | | %(F+M), γdmax, wopt | | 0.681 | 2.85 | |
| Sabat [22] | Artificial neural networks with backpropagation | % silt, % quarry dust, number of days, γdmax, wopt | | 0.981 | 1.187 | 1.75 |
| Roy et al. [23] | Artificial neural networks with backpropagation | % passing (4.25 mm, 2 mm, 0.425 mm, 0.075 mm), IP, γdmax, wopt | | | | |
| Bhatt et al. [24] | Artificial neural networks with Levenberg–Marquardt backpropagation | %F, %S, %G, LL, LP, γdmax, wopt | tansig, linear | 0.9579 | 0.03 | |
| | | %S, %G, IP, γdmax, wopt | | 0.9615 | 0.0274 | |
| | | %S, %G, %F, γdmax, wopt | | 0.9501 | 0.032 | |
| | | %S, %G, γdmax, wopt | | 0.9792 | 0.0199 | |
| | | γdmax, wopt | | 0.8871 | 0.0439 | |
| Erzin et al. [25] | Artificial neural networks with backpropagation | C(%), A(%), Cc, Q(%), Fel(%), Ca(%), Cu, w(%), G, qdry (g/cm³) | tan-sigmoid | 0.9384 | 3.65 | 2.53 |
| Taha et al. [26] | Artificial neural networks with backpropagation | D60, γdmax | Tan-Axon | 0.90 | 6.96 | 5.21 |
| Bardhan et al. [27] | Extreme learning machine (ELM) and adaptive neuro swarm intelligence (ANSI) | %(F+S), %S, %G, IP, γdmax, wopt | | | | |


In 1989, Hornik et al. [28] rigorously established that standard multilayer feedforward networks with as little as one hidden layer, using arbitrary squashing functions, are capable of approximating any measurable Borel function from one finite-dimensional space to another with any desired degree of accuracy, provided that a sufficient number of hidden units are available. This work supports Aggarwal's recommendations: deep artificial neural networks can detect more information in the data than a simple, wide network. In 2022, Othman and Abdelwahab [29] adopted a deep architecture to predict the CBR value, with 2 to 4 hidden layers and 7 to 20 hidden units. They obtained a maximum coefficient of determination R² = 0.945, a minimum absolute error of 1.93 and a maximum error of 17.64, while the CBR values lie between 3 and 100. In view of the results obtained and the learning curves, we note that the models present perfectible points, such as reducing the error committed by the model during prediction and improving the coefficient of determination, to better exploit the power of deep artificial neural networks.

Not only the type of network (deep or light) but also the hyperparameter optimization methods used in the literature deserve attention. The first is the generate-and-test method, which consists in defining a network structure and analyzing its performance until a satisfactory result is obtained. The second, the grid method, consists in going through all the possible configurations in the search space defined for each hyperparameter. The disadvantage of the first method is that it does not guarantee that the obtained structure is the optimal one; other structures could offer better performance. As for the second method, it is too costly in time and computing power and can only be considered with very advanced computing tools such as GPUs (graphics processing units) and TPUs (tensor processing units).

Despite all the studies already carried out, to our knowledge, the development of deep CBR neural networks suitable for industrial use remains largely perfectible in terms of generalization performance. Since deep networks have thousands of parameters and many hyperparameters, the use of an optimization method such as simulated annealing (suited to the fast optimization of complex problems), which is less costly and more reliable, is of particular interest.

The aim of the present study is firstly to develop an easy-to-use numerical tool capable of predicting the CBR value on the basis of physical properties (optimum water content, maximum dry density, Atterberg limits, granularity, etc.) by exploiting the power of artificial neural networks to the maximum, and secondly to reduce the cost of developing deep networks and rationalize the method of optimizing network hyperparameters using simulated annealing.

The development of such a digital tool would reduce the financial and temporal costs of obtaining the CBR value of soils and, consequently, the costs of carrying out soil identification studies, since it requires only secondary data and does not depend on costly and time-consuming laboratory analyses. The obtained results can be used in setting up a local or regional geotechnical platform.

Material and Methods

Artificial neural networks are essentially used to learn processes or the behavior of phenomena, based on existing data that reflect that behavior. Yann Le Cun [30] proposed in 1986 the principle of learning a network, which is composed of six major steps, namely (i) data collection; (ii) preprocessing of the data; (iii) division of the data set; (iv) training of the network and optimization of the hyperparameters; (v) network performance evaluation; and (vi) deployment of the network.

The method of developing deep neural networks for CBR prediction used in this work is based on this principle. It consists of preparing and dividing the data into three sets, training and selecting the best model, evaluating the performance of the network and making new inferences from the developed model. Each of these steps is described in the following sections. The collected data must be preprocessed by cleaning and scaling transformations. For neural network development purposes, the preprocessed database is divided into a test data set that is held out and a training data set for the network parameters. Optimization methods, including simulated annealing, are used to determine the optimal values of the different hyperparameters of the network. After this step, the best model is retrained on the combined training and validation data set.

When, at the end of the process, the model shows satisfactory performance on the test data held out in the meantime, it can be saved to make predictions on new data instances. If not, the possible alternatives to improve model performance are to increase the data and/or to revise the architecture fixed at the beginning.

Data Presentation and Preparation

The database used to train the CBR soil bearing capacity index prediction models came from two sources. The data were collected from reports of previously completed and ongoing projects in the Republic of Benin. There are 372 instances of data with results of AG (particle size analysis), VBS (methylene blue value), OPM (Modified Proctor Optimum) and CBR of samples, and 122 instances with LA (Atterberg limits) in place of VBS.


From this database, the results of the particle size analysis test, the Atterberg limits test, the PROCTOR test and the CBR test could be collected. From these data, five Microsoft Excel databases were constructed, as presented in Table 2.

Table 2 Presentation of the starting databases

| Problems | Main bases | Derived bases |
|---|---|---|
| Prediction of the CBR index | AG, LA, OPM, CBR | AG, OPM, CBR |
| | AG, VBS, OPM, CBR | %F, LA, OPM, CBR |
| | | %F, OPM, CBR |

The raw data constituting the databases contain anomalies. Indeed, as shown in Fig. 1, the variables in the database contain outlier values that could decrease the performance of the models to be developed. On the other hand, the dispersion of the observed data (Table 3, for example) in each variable indicates the information that each data instance brings. Figure 2 presents a normality test conducted on the variable F (percentage of fines) of the BD2 database. According to [31], variables following a normal distribution facilitate and expedite the training of neural networks. The author also suggests using input variables that do not have a direct correlation, to avoid biasing the network's learning.

Fig. 1 Detection of outliers in the BD-P2 database using the boxplot method

In light of the aforementioned recommendations, a complete exploration of each database was carried out using the numpy, pandas, matplotlib and seaborn libraries in the Python language, to detect anomalies and the behavior and distribution of the data. The anomalies detected during this investigation were then cleaned up by a meticulous process that can be summarized as follows (a code sketch is given after Table 3):

• Removal of columns with a single value or low variance
• Removal of duplicate data instances
• Removal of rows with missing data
• Removal of outliers

At the end of the preprocessing process, carried out in the Python language using the scikit-learn library, three databases were selected. A first database consists of the proportion of fines, the Atterberg limits, the Proctor references of the soils and the CBR values. A second database uses the methylene blue value instead of the Atterberg limits. And a third database uses only the proportion of fines, the Proctor references of the soils and the CBR values. The three databases are presented in Tables 3, 4 and 5. Figure 3 shows that, after cleaning, the databases no longer contain outliers that could mislead the networks. Also, the predictor variables are only weakly linearly correlated with the target variable, as recommended by [31].

According to [20], using data that follow the normal distribution facilitates the training of a network. Figure 2 shows an example of a graphical normality test performed on the training sets. We notice that the data do not follow the normal distribution, so a series of transformations was used. As shown in Fig. 4, a first transformation is applied to center and reduce the variables toward a normal distribution; the last operation consists in normalizing the standardized variables to make them positive and remain faithful to the nature of the initial data.

Table 3 Statistical description of the database BD1

| | F | LL | IP | dsOPM | wOPM | CBR |
|---|---|---|---|---|---|---|
| Eff | 65 | 65 | 65 | 65 | 65 | 65 |
| Mean | 46.55 | 51.11 | 23.87 | 1.87 | 14.578 | 15.073 |
| Std | 10.82 | 5.23 | 3.19 | 0.096 | 3.079 | 3.184 |
| Min | 22 | 36.7 | 15.7 | 1.71 | 8.8 | 10 |
| 25% | 37 | 47.8 | 22 | 1.802 | 11.9 | 13 |
| 50% | 50.1 | 50.6 | 23.6 | 1.842 | 15.5 | 14.1 |
| 75% | 53.9 | 56 | 26.6 | 1.96 | 17.2 | 17 |
| Max | 64.9 | 60.5 | 30.8 | 2.027 | 19.3 | 24 |
| E | 42.9 | 23.8 | 15.1 | 0.317 | 10.5 | 14 |
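The cleaning and scaling steps described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the file name, the DataFrame `df` and its columns are assumptions, and the 1.5 × IQR threshold is the usual Tukey boxplot convention of Fig. 1.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

df = pd.read_excel("bd1.xlsx")          # hypothetical database file

df = df.drop_duplicates()               # remove duplicate instances
df = df.dropna()                        # remove rows with missing data
df = df.loc[:, df.std() > 1e-6]         # drop single-value/low-variance columns

# Tukey boxplot rule: drop rows with any point outside 1.5 * IQR
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
mask = ~((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).any(axis=1)
df = df[mask]

# Standardize, then rescale to [0, 1] so all values are positive,
# mirroring the two-stage transformation of Fig. 4.
X = MinMaxScaler().fit_transform(StandardScaler().fit_transform(df))
```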


Table 4 Statistical description of the BD3 database

| | F | dsOPM | wOPM | CBR |
|---|---|---|---|---|
| Count | 68 | 68 | 68 | 68 |
| Mean | 39.86 | 1.82 | 14.18 | 15.31 |
| Std | 20.76 | 0.05 | 3.54 | 3.62 |
| Min | 5.5 | 1.71 | 9 | 10 |
| 25% | 12.5 | 1.8 | 9.75 | 13 |
| 50% | 50.65 | 1.83 | 15.5 | 14 |
| 75% | 54.28 | 1.84 | 17.2 | 17.05 |
| Max | 64.9 | 1.95 | 19.3 | 29.4 |
| E | 59.4 | 0.24 | 10.3 | 19.4 |

Table 5 Statistical description of the BD2 database

| | F | VBS | dsOPM | wOPM | CBR |
|---|---|---|---|---|---|
| Eff | 175.00 | 175.00 | 175.00 | 175.00 | 175.00 |
| Mean | 10.03 | 0.20 | 1.92 | 9.32 | 36.65 |
| Std | 4.29 | 0.05 | 0.09 | 1.40 | 7.72 |
| Min | 0.20 | 0.08 | 1.63 | 5.80 | 16.00 |
| 25% | 6.20 | 0.18 | 1.91 | 8.20 | 34.00 |
| 50% | 10.80 | 0.20 | 1.93 | 9.00 | 37.00 |
| 75% | 12.65 | 0.22 | 1.98 | 10.10 | 41.00 |
| Max | 20.70 | 0.30 | 2.10 | 14.10 | 66.00 |
| E | 20.50 | 0.22 | 0.47 | 8.30 | 50.00 |

Fig. 3 Tukey box plots of variables in the preprocessed main database 1

As shown in Fig. 5, each database obtained after the preprocessing process was divided into three sets, as recommended in [20]. We used the holdout cross-validation method to split the starting data set into a training data set and a test data set. Subsequently, the training data set is divided into a training set and a validation set by the leave-one-out (LOO) cross-validation method (a code sketch is given below).

Indeed, the "holdout" cross-validation method consists in training the model on a proportion (1 − p)% of the starting set and retaining p% of the data set to test the model's performance. The "leave-one-out" cross-validation is a particular form of "k-fold" cross-validation: it consists in retaining a single data instance on which the model is evaluated after having been trained on the whole database except for this instance. Based on the work of Refaeilzadeh and Tang [32], this method has a high variance even though it makes an unbiased assessment of performance.

The proportion of test data was defined according to the size of the starting data set and the objectives set. It varied from one database to another because not all databases had the same size.
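A minimal sketch of this two-stage split with scikit-learn, assuming a feature matrix `X` and target vector `y` from the preprocessing step; the holdout proportion p = 15% is illustrative, since the proportion varied from one database to another.

```python
from sklearn.model_selection import train_test_split, LeaveOneOut

# Holdout: keep p% of the data aside as the final test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)

# Leave-one-out folds on the remaining training data.
loo = LeaveOneOut()
for train_idx, val_idx in loo.split(X_train):
    X_tr, X_val = X_train[train_idx], X_train[val_idx]
    y_tr, y_val = y_train[train_idx], y_train[val_idx]
    # ... fit the network on (X_tr, y_tr) and evaluate it on the
    # single held-out instance (X_val, y_val) ...
```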

Fig. 2 Graphical test of normality on the percentage of fines in the BD_P2 database

The Deep Artificial Neural Networks

An artificial neural network can be simply defined as a computational graph that uses the composition of several linear or nonlinear functions to approximate the behavior of highly nonlinear functions, according to Aggarwal [20]. Artificial neural networks use thousands of small computational units called perceptrons. A single neuron can approximate a linear function, but several are needed to approximate a relatively nonlinear function.


Fig. 4 Data transformation process

Fig. 5 Proposed method for developing neural network models

We then arrange several of these units to form an artificial neural network. The perceptrons are organized in layers, one after the other and in parallel. The first layer, the input layer, acts as a distributor of the input data to the following layers, called hidden or intermediate layers. In the intermediate layers, the information is transmitted from layer to layer until it reaches the last layer, called the output layer, where the prediction corresponding to the inputs provided is obtained. The network is said to be deep when it is composed of many hidden layers, and simple or light when it is composed of only a few hidden layers. According to Aggarwal, deep networks can detect complex features in the data and thus offer better generalization capabilities than simple networks, but they are subject to several problems that must be solved during their development.

The architecture depends on the family of artificial neural networks. This family is determined from the nature of the data entering and leaving the network. Here, we are dealing only with numerical data, and therefore with dense networks with fully connected hidden layers. The network architecture is initialized at the beginning of the process and is subject to modification when, after training, a problem arises.


At the end of the process, six architectures of artificial neural networks were developed; they were programmed using the TensorFlow + Keras library, written in Python and supported by Google.

Deep Architecture AP_1

This architecture consists of a single branch, on which we find:

• An input layer that distributes the data to the hidden layers
• Hidden layers consisting of residual blocks, which use skip connections to capture simple features in the training data; details on the residual block are provided below
• An output layer that uses linear activation and consists of a single unit

Deep Architecture AP_2

This architecture consists of two branches. Using parallelization in the network architecture, each branch detects a particular feature in the training data. The first branch consists of dense and batch normalization (DBN) blocks that behave like a normal dense neural network; to prevent the network from becoming saturated as the number of layers increases, we used "batch normalization" layers. The second branch consists of a sub-network of type AP_1 (Fig. 6).

Deep Architecture AP_3

This architecture consists of two sub-networks of type AP_1, as shown in Fig. 7. The advantage of developing a network consisting of already trained networks is that the weights of the sub-networks are already calculated and are used to initialize the weights of the AP_3 network. In this architecture, we increased the level of complexity of the model gradually. It also allowed us to save time.

Fig. 6 Residual blocks

Fig. 7 Deep artificial neural network architecture AP_3
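The residual block of Fig. 6 can be sketched with the Keras functional API as follows. The exact layer ordering inside the authors' block is not spelled out in the text, so this is a hedged reconstruction combining a skip connection with pre-activation batch normalization:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units, l2=1e-3):
    """One dense residual block: the input is added back (identity
    mapping) to the output of two dense layers, with pre-activation
    batch normalization. A sketch consistent with Fig. 6; the
    authors' exact layout may differ."""
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.Dense(units, activation="tanh",
                     kernel_regularizer=tf.keras.regularizers.l2(l2))(y)
    y = layers.BatchNormalization()(y)
    y = layers.Dense(units, activation=None,
                     kernel_regularizer=tf.keras.regularizers.l2(l2))(y)
    if shortcut.shape[-1] != units:        # match dimensions for the sum
        shortcut = layers.Dense(units, activation=None)(shortcut)
    return layers.Add()([shortcut, y])     # skip connection
```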


Deep Architecture AP_4

This architecture consists of two sub-networks of type AP_3. It is a network with two outputs, both predicting the value of the CBR index. An optimization by the grid method is then done to determine the proportions in which each prediction must be taken to obtain CBR values as close as possible to the true values.

Deep Architecture AP_5

The output layer of this architecture operates in the same way as the previous one. The two networks differ only in their input layer: here, the input layer is duplicated into four copies to feed each branch of the network. After several processing cells through the hidden blocks, the results are then concatenated two by two to form only two outputs.

Architecture Stacking

To exploit the strength of multiple models, an existing approach in the literature is to stack already trained models and then add processing through additional hidden layers to better converge to the true values. In our case, we stacked the architectures with the least bias to form a better performing model.

The different architectures were tested and optimized on a sample database in order to choose the one with the best performance.

Overview of the Deep Neural Network Development Method

Figure 5 illustrates the workflow of the method used to develop deep neural networks. The initial steps, as previously outlined, involve cleaning and preparing the data for use in the training and testing processes. With network architectures already defined, optimization methods, such as grid optimization and/or simulated annealing, are used to determine optimal values for network hyperparameters. The best model is then retrained on the training and validation set. If the model exhibits satisfactory performance on the retained test data, it can be saved for making predictions on new data. Otherwise, potential improvements include increasing the data and/or revising the initial architecture.

The Constituent Layers of Artificial Neural Network Architectures

The different architectures developed use layers that solve instability problems encountered during network training and make the networks robust. A brief description and the role of each layer are presented below; a code sketch assembling them follows this section.

Batch Normalization Layer

Batch normalization is a recent method for alleviating the vanishing and exploding gradient problems. Beyond that, it also ensures the stability of the loss gradient during training. In the different artificial neural network architectures, the pre-activation batch normalization layer facilitates the gradient flow in order to make training faster and cheaper. Details on how batch normalization layers work are provided by Aggarwal [20].

Regularization Layer

The regularization layer solves the overfitting problem that Othman may have faced without solving [29]. From the beginning of the training of the network, a gap starts to be observed between the loss on the training data and the loss on the validation data. The elaborated neural network architectures exploit a regularization layer of type L2, as recommended by Aggarwal.

Residual Blocks

In order to solve the exploding and vanishing gradient problem as well as the degradation problem, residual neural networks have been proposed. According to He et al. [33], the degradation problem results in an increase of the loss on the training data and on the test data when the number of hidden layers used to train deep neural networks increases. This finding is explained by the fact that deep networks have difficulty approximating less complex (linear) aspects with compositions of nonlinear activation functions. Residual neural networks introduce identity mappings in order to better capture the linear aspects of the function to be approximated.

These residual blocks have made it possible to better exploit the power of neural networks by developing very deep neural networks whose number of layers varies between 10 and 100. Figure 6 shows the structure of the residual block used in this study.

Gaussian Noise Layer

The Gaussian noise layers used are positioned just before the output layer of each architecture. They make the network less sensitive to noise, since the networks take as input experimental values that have a confidence level of ± 10%.
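Putting these layers together, an AP_2-like two-branch network could be sketched as below. Unit counts, block counts and the noise level are placeholders (the optimized values are in Table 7), and `residual_block` refers to the sketch given after Fig. 7; this is an illustration, not the authors' exact model.

```python
from tensorflow.keras import Model, layers

inputs = layers.Input(shape=(5,))             # e.g. F, VBS, dsOPM, wOPM, ...

b1 = inputs                                   # branch 1: DBN blocks
for _ in range(3):
    b1 = layers.Dense(30)(b1)
    b1 = layers.BatchNormalization()(b1)      # pre-activation batch norm
    b1 = layers.Activation("tanh")(b1)

b2 = inputs                                   # branch 2: residual blocks
for _ in range(3):
    b2 = residual_block(b2, 30)               # sketch defined earlier

merged = layers.Concatenate()([b1, b2])       # parallel branches joined
merged = layers.GaussianNoise(0.2)(merged)    # robustness to ±10% input noise
output = layers.Dense(1, activation="linear")(merged)
model = Model(inputs, output)
```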


Table 6 Search space for hyperparameters for each proposed ANN architecture

| Network architectures | Hyperparameters | Search space |
|---|---|---|
| Common | Number of residual blocks | [1 : 3 : 50] |
| | Number of units per hidden layer | [1 : 5 : 100] |
| | Batch size | [2 : 2 : 16] |
| | Hidden layer activation functions | {tansig, logsig, LeakyReLU} |
| AP2 | α coefficient of LeakyReLU | [0.1 : 0.1 : 2] |
| | Number of residual blocks in branch 2 | [1 : 3 : 50] |
| AP3 | Number of residual blocks in branch 1 | [1 : 3 : 50] |
| | Number of DBN blocks in branch 2 | [1 : 1 : 50] |
| AP4 | Number of residual blocks in branch 1 | [1 : 3 : 50] |
| | Number of DBN blocks in branch 2 | [1 : 1 : 50] |
| | Number of residual blocks in branch 3 | [1 : 3 : 50] |
| | Number of DBN blocks in branch 4 | [1 : 1 : 50] |
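With the hyperopt library used later for simulated annealing, the common part of this search space could be written as follows. This is a hedged sketch: the variable names are illustrative, and the [start : step : stop] ranges are approximated with `hp.quniform`, which draws quantized values over the same interval.

```python
from hyperopt import hp

space = {
    "n_res_blocks": hp.quniform("n_res_blocks", 1, 50, 3),   # [1 : 3 : 50]
    "units":        hp.quniform("units", 1, 100, 5),          # [1 : 5 : 100]
    "batch_size":   hp.quniform("batch_size", 2, 16, 2),      # [2 : 2 : 16]
    "activation":   hp.choice("activation",
                              ["tanh", "sigmoid", "leaky_relu"]),
    "alpha_lrelu":  hp.quniform("alpha_lrelu", 0.1, 2.0, 0.1),
}
```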

Activation Functions

The choice of activation functions is very important in the definition of the neural network architecture. Considering the activation functions already used to capture the behavior of the CBR index as a function of the physical properties, a search space was defined to select the one that offers the best performance and is thus the most suitable.

Training of Deep Artificial Neural Networks

Due to the complexity of the loss function of deep neural networks, the parameter optimization method used, i.e., gradient descent with backpropagation, has a high computational cost. Backpropagation progressively reduces the error by adjusting the values of the unknowns (synaptic weights and biases) until a minimum is reached.

The gradient descent approach to neural network training using backpropagation consists in reducing the error rate of the model by adjusting the vector of weights linking two consecutive layers until the minimum of the loss function is reached: this is the gradient descent.

To start training the model, the synaptic weights are randomly initialized. The training is done in two steps:

• A first phase computes the output values and local derivatives at the different nodes: this is the forward phase
• A second phase computes the sum of the products of the local derivatives over all paths from a node to the output: this is the backward phase

Once the weights are corrected, the first phase resumes and the cycle continues. In the learning process, the only unknowns are the learning rate and the derivative of the loss function with respect to the incoming weights of each computing unit. The learning rate is a hyperparameter to be optimized in the model selection phase, and the derivative of the loss function with respect to the incoming weights of each neuron is computed by the chain rule.

The mini-batch stochastic gradient descent strategy used is Nadam, a modified version of the basic method; details on this strategy are provided in Aggarwal's book. A minimal sketch follows.
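A compile-and-fit sketch with the Nadam optimizer in Keras; the learning rate, batch size and variable names are placeholders to be set by the hyperparameter search, not the authors' final values.

```python
import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3),
              loss="mse")                     # mean square error loss
history = model.fit(X_tr, y_tr,
                    validation_data=(X_val, y_val),
                    epochs=100, batch_size=8, verbose=0)
```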
Optimization of Hyperparameters

Hyperparameters are parameters that enter into the design of the neural network. They are called hyperparameters to distinguish them from the parameters of the network itself, the synaptic weights and biases; the hyperparameters define the architecture of the network.

The tuning of the hyperparameters is done on data other than those on which the gradient descent is performed. The hyperparameter optimization method used here is simulated annealing. The hyperparameters to be optimized vary from one model architecture to another, although all the architectures share common hyperparameters. Table 6 presents the hyperparameters to be optimized for each neural network architecture.

The optimization of hyperparameters using the grid method or the manual method (trial and error) has a very high computational cost: for each set of values taken from the search spaces of the hyperparameters, the model thus defined must be trained for a number of epochs sufficient to select the best models. To optimize the hyperparameters of a model of type AP_1 using the grid method, 14,044,800 neural networks would have to be trained, i.e., 554 years of computation when the average training time is two minutes on a CPU. With GPUs, TPUs and distributed training techniques, the computational cost can be considerably reduced.

Another alternative is to use optimization methods that are less cumbersome. In the present case, we have used simulated annealing, which is a metaheuristic method.


According to [34], simulated annealing also requires less memory.

Simulated Annealing

The simulated annealing method is a generalization of the Monte Carlo method. It was used here to optimize the hyperparameters of the networks through the hyperopt library available in Python. It is known for its ability to avoid getting stuck in local optima and to approach the global optimum for high-dimensional problems. It is inspired by a natural principle called "annealing," used in metallurgy to obtain defect-free alloys of various shapes. The process consists of heating the metal to a certain temperature at which it becomes liquid (the particles are free to circulate). At this stage, the temperature is lowered very slowly so as to obtain a solid of well-defined shape. If this temperature drop is abrupt, we obtain a glass; if, on the contrary, the temperature drop is very slow (allowing the atoms time to reach thermodynamic equilibrium), we obtain more and more regular structures, until we reach a state of minimum energy corresponding to the perfect structure of a crystal, and we say that the system is "frozen." If the lowering of temperature is not done slowly enough, defects can appear; it is then necessary to correct them by slightly reheating the material to allow the atoms to regain their freedom of movement, thus facilitating an eventual rearrangement leading to a more stable structure.

The main idea of simulated annealing is to reproduce this behavior of the material used in metallurgy in the optimization process in order to reach the optimal solution. The internal energy of the material becomes the objective function of the problem, so that reaching the thermodynamic equilibrium state corresponds to reaching the optimal solution of the problem.

To reach this state of thermodynamic quasi-equilibrium, at decreasing temperature steps, we used the iterative Metropolis procedure. This procedure allows the search to exit local minima with a probability that is higher when the temperature is higher. When the algorithm reaches very low temperatures, the most probable states are in principle excellent solutions to the optimization problem. Figure 8 summarizes the operation of the simulated annealing, and details are provided in the appendix of the document.

Fig. 8  Flowchart of the simulated annealing method [35]
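A hedged sketch of how such an annealing search can be run with hyperopt, the library mentioned above. `build_and_train` is a hypothetical helper that builds a model from a configuration, trains it for 100 epochs and returns its validation loss; `space` is the search space sketched after Table 6.

```python
from hyperopt import fmin, Trials
from hyperopt.anneal import suggest as anneal_suggest

def objective(config):
    # Build the network from the sampled configuration and return
    # the loss on the validation set (the "internal energy" here).
    return build_and_train(config)

trials = Trials()
best = fmin(fn=objective, space=space, algo=anneal_suggest,
            max_evals=200, trials=trials)   # 200-400 iterations were used
print(best)                                  # best configuration found
```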


Evaluation and Deployment of Neural Networks

Validation or testing in deep learning is done on data not seen by the model, reserved before starting the training. The performance of the models is evaluated through the following metrics.

Coefficient of Determination

It determines how the model behaves in its predictions when new inferences are made. Its value, between 0 and 1, reflects the weak or strong correlation between the predicted data and the true data. The ideal value of the coefficient of determination is 1.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

Maximum Absolute and Relative Error

They measure the maximum deviation of the predictions made by the models. They are very important among the metrics because of the deployment objectives of the models in industry.

$$MA = \max\left(|y - \hat{y}|\right), \qquad E_{\max} = \max\left(\frac{|y - \hat{y}|}{y}\right)$$

Mean Absolute and Relative Error

Although they are insufficient to ensure the reliability of the models, they complement the previous metrics. They identify the global behavior of the model on all the test data.

$$MAE = \frac{1}{N}\sum|y - \hat{y}|, \qquad MAPE = \frac{100}{N}\sum\frac{|y - \hat{y}|}{y}$$

Fig. 9 Evolution of the loss on the validation data according to the number of iterations


Square Root of the Mean Square Error

This metric is more sensitive to extreme values than the other metrics. When the model makes predictions that deviate greatly from the expected values, there is a large discrepancy between the mean absolute error and the root mean square error.

$$RMSE = \sqrt{\frac{1}{N}\sum\left(y - \hat{y}\right)^2}$$
Results and Discussion variables.
Figure 9 highlights the performance of each model
Optimal Structures of Neural Network Architectures asl as their development cost. The architectures AP1 ,
AP1 _AP3 _AP5 , et AP1 _AP3 _AP4 _AP5 have coefficients
The structure of each architecture was optimized by of determination between 0.934 and 0.965, which reflect
simulated annealing of a few iterations (ranging from 200 an excellent correlation between the true values and the
to 400) using the CPU. At each iteration, the model with predicted values according to the Pellinen assessment
the chosen configuration is trained on 100 epochs since criteria used by Rehman et al. [10]. From the point of view
according to [20], artificial neural networks reach the of the maximum relative error and the average relative error,
majority of their performance during the first iterations. we notice that the architectures AP1 , AP3 , AP5 and stacked
Figure 9 shows the optimization curve of the AP_1 networks offer better performance. In view of our objectives,
architecture structure using simulated annealing. From this the maximum relative error is more preponderant than the
figure, we notice that the convergence is already reached average one. Thus, the stacked model AP1 _AP3 _AP4 _A
since the first 25 iterations and that it is not useful to wait remains more powerful than all the others. However, its
for the 200 iterations. development cost is very high because it requires first the
Compared to [29] who had to train 240 neural net- development of the architectures AP1 , AP3 , AP4 etAP5
works to optimize their networks, the use of simulated (Fig. 10).
annealing completely reduces the optimization time of the In the following, we use the stacked architecture
hyperparameters. AP1 _AP3 _AP4 _AP5 to develop the different models related
Simulated annealing has the advantage of choosing the to the databases.
next configuration based on the current configuration while
trying to minimize the error committed until convergence.

Table 7  Optimal values Hyperparameters AP_1 AP_2 AP_3 AP_4


of hyperparameters using
simulated annealing Number of residual blocks 1 30 27 21 25
Number of DBN blocks 1 – 10 – 7
Number of residual blocks 2 – – 30 2
Number of DBN blocks 2 – – – 24
Number of hidden units 30 90 50
Hidden layer activation function LeakyReLU tanh tanh tanh
Gaussian noise coef 0.03045 0.3714 0.223727 0.2316145
Alpha leaky ReLU 0.116659 0.2864 0.275740 0.3279165
Batch size 12 8 8 12
L2 regularization Coef 0.0484169 0.001 0.001013 0.2664881


Fig. 10 Comparison of architectures in terms of performance

Fig. 11 Prediction results of CBR values on test data by the M1 model (R² = 0.838)

Models for Predicting the CBR Bearing Capacity of Soils

Three models of deep artificial neural networks have been developed based on the stacked architecture AP1_AP3_AP4_AP5; to each database corresponds a model of deep artificial neural networks.

Model M1: CBR = f(AG, LA, OPM)

The M1 model is a deep artificial neural network model based on the AP_1_AP_3_AP_4_AP_5 architecture to predict the CBR value from the proportion of fines, the Atterberg limits and the PROCTOR references. The model required the development of four different models taking the same parameters as input but based on the AP_1, AP_3, AP_4 and AP_5 architectures. The overall computational cost of the model is 53,928 s, i.e., 14 h 58 min 48 s. The models were trained on 105 data instances, i.e., 85% of the database. Figure 11 shows the results obtained when predicting the CBR value on the test data with the M1 model.


Fig. 12 Prediction results of CBR values on test data by the M2 model (R² = 0.30)

Model M2: CBR = f(AG, VBS, OPM)

The M2 model uses the proportion of fines, the methylene blue value and the PROCTOR soil references to predict the CBR value. It is also a model stack, but with a relatively different architecture. The results obtained by this second approach (R² = 0.36, Emax = 17.91%, MAE = 2.139, RMSE = 2.635, MAPE = 5.70%) show that the model has acceptable performance in terms of the deviation between true and predicted values, but still faces some difficulty in making predictions in the right direction. Figure 12 shows the results obtained on 30 instances of test data.

Model M3: CBR = f(AG, OPM)

The M3 model uses the proportion of fines and the PROCTOR references to predict the CBR bearing capacity. Compared to the M1 model, which has only 65 data instances, the M3 model has the advantage of having many more samples (282). The M3 model is therefore trained on a more representative population than the M1 model. Although the M2 model has more samples than the M1 model, we notice a decrease in its performance compared to the other models.

Fig. 13 Prediction results of CBR values on test data by the M3 model
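For deployment, a model judged satisfactory on the held-out test set can be saved and reloaded for new inferences, as described in the method overview. This is a hedged sketch: the file name, the fitted `scaler` and the feature order are assumptions.

```python
import numpy as np
import tensorflow as tf

model.save("cbr_m3.keras")                     # persist the trained model
m3 = tf.keras.models.load_model("cbr_m3.keras")

# New sample (F, dsOPM, wOPM), scaled with the same transformers
# fitted on the training data; the values are illustrative.
x_new = scaler.transform(np.array([[50.1, 1.84, 15.5]]))
print(m3.predict(x_new))                       # predicted CBR index
```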


Table 8 Model comparison with existing models (R², Emax, RMSE and MAE describe performance; Mean, Std, Min and Max describe the CBR values of the data set)

| Models | R² | Emax | RMSE | MAE | Mean | Std | Min | Max | Inputs |
|---|---|---|---|---|---|---|---|---|---|
| Othman et al. [29] | 0.945 | 17.64 | 2.81 | 2.16 | 49.87 | 33.22 | 3 | 100 | AG-LA-OPM |
| Harini and Naagesh [14] | 0.94 | – | 2.47 | 2.69 | 4 | 0.6 | 0.97 | 4 | LA-OPM |
| Bhatt et al. [24] | 0.9792 | – | 0.0199 | – | 3.77 | 2.87 | 1.55 | 22.4 | AG-OPM |
| Al-Busultan et al. [37] | 0.7773 | – | 4.3141 | – | 49.69 | 6.871 | 23.5 | 60 | Passing nos. 1, 2, 3/8, 4, 8, 50 and 200; M.D.D; O.W.C; L.L; P.I; SO3; soluble salt; gypsum; organic content |
| M1 | 0.838 | 2.27 | 0.781 | 0.563 | 15.07 | 3.18 | 10 | 24 | AG-LA-OPM |
| M2 | 0.36 | 5.771 | 2.635 | 2.139 | 36.65 | 7.72 | 16 | 66 | AG-VBS-OPM |
| M3 | 0.965 | 5.89 | 2.189 | 1.64 | 15.31 | 3.62 | 10 | 29.4 | F-OPM |

This can be explained by the fact that the VBS attribute has a negative impact on the performance of the M2 model. Indeed, we notice that this column has a very low variance and behaves almost like a constant. When the VBS variable is removed from the database to form the M3 model (in accordance with A. Géron's recommendations in [36]), we notice an improvement in performance. The M3 model offers a coefficient of determination of R² = 0.965, a maximum relative error of Emax = 12.43%, a mean absolute error of MAE = 1.64, a root mean square error of RMSE = 2.189 and a mean relative error of MAPE = 5.963% (Fig. 13).

Comparison with Existing Models

Two of the three developed models (M1 and M3) show satisfactory performance, while model M2 performs poorly (Table 8). Model M3, which is based on the fines percentage and the optimal soil compaction characteristics, performs well compared to the model of Othman et al. [29], which is also a deep neural network. Indeed, as shown in Table 7, the M3 model is composed of 135 residual blocks (see Fig. 6) and 31 dense-layer blocks (see Fig. 7), while the model of Othman et al. consists of at most 4 hidden layers. The M3 model is therefore much deeper (197 hidden and batch normalization layers) than the model of Othman et al. We also note that the architectures developed with the techniques of batch normalization, regularization of hidden layers, residual blocks and parallelization increased the depth of the networks and better exploited the potential of deep neural networks. Moreover, the network structures were optimized by the simulated annealing method, which allowed optimal values to be determined for the hyperparameters in order to converge more quickly to the optimal network.

However, the model of Harini and Naagesh [14] demonstrates better performance than model M1 developed in this article. Nevertheless, that model was tested on only 10 data instances after being developed on 114 instances. Furthermore, it is a shallow neural network, which is deemed unreliable for industrial applications according to the suggestions of Charu Aggarwal [20].

We note that the model of Al-Busultan et al. [37], a neural network consisting of a single hidden layer and an input layer with 15 prediction variables, developed on 358 data instances, has good performance, but less than our two deep neural network models M1 and M3, which were trained on a smaller database. This suggests that deep neural networks detect patterns in the data, even complex ones, better than wide, shallow neural networks.

Conclusion

The development of deep artificial neural networks based on the traditional method has enabled us to develop three very deep neural networks with 206 hidden layers and low development costs, thanks to the use of simulated annealing for hyperparameter optimization.

The models developed for enterprise use are as follows:

• Model M1 uses the proportion of fines, the Atterberg limits and the PROCTOR soil references to predict the value of the CBR bearing capacity index. The predictions of this model are 83.6% correlated with the true values (R² = 0.836). The model commits a maximum error of 2.27 on CBR values between 10 and 24, i.e., 16.2% in relative terms, a mean absolute error of 0.563 and a mean square error of 0.781. The mean relative error on the test data is 3.74%, which satisfies the requirements of geotechnical estimations.
• Model M3 uses fewer input parameters to predict the CBR bearing capacity of soils. Based solely on the proportion of fines and the PROCTOR references of the soils, its predictions are 96.5% correlated with the true values, with a maximum relative error of 12.43%, a mean absolute error of 1.64, a mean square error of 2.189 and a mean relative error of 5.76%.

We can conclude that the M1 model, having been trained on fine soils, shows good generalization performance in view of the results obtained and can be used in industry. The model's complexity has enabled us to learn more about CBR behavior even from a limited amount of data.

Model M2 was trained on a gravelly soil database consisting of GA, VBS and OPM results, while model M3 was trained on a database obtained by combining the databases of models M1 and M2 but retaining only the GA and OPM attributes. It can be seen that model M3 offers better generalization performance than the first two models. Unlike model M3, model M2 cannot be used in industry due to its poor performance.

We conclude that adding VBS to the attributes of the prediction network reduces performance and that the performance of the M3 model may improve as the size of the database increases. The complexity of the M3 model gives it a good ability to generalize to data not known to the model.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability The data used to support the findings of this study are included in the article.

Declarations

Conflict of interest The authors declare that there is no conflict of interest in the publication of this article. They have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Liang S et al (2015) Liste de prix 2015. Proc Natl Acad Sci 3(1):1–15
2. IFSTTAR (2015) Catalogue de structures de chaussées neuves et Guide de dimensionnement des chaussées au SENEGAL
3. CEBTP (2019) Revue du guide chaussées pour les pays tropicaux
4. Agarwal KB, Ghanekar KD (1970) Prediction of CBR from plasticity characteristics of soil. In: Proceedings of the 2nd south-east Asian conference on soil engineering, Singapore, pp 11–15
5. Olidis C, Hein D (2004) Guide for the mechanistic-empirical design of new and rehabilitated pavement structures: materials characterization. Is your agency ready? In: TAC/ATC 2004 annual conference and exhibition of the Transportation Association of Canada: transportation innovation accelerating the pace
6. Lepert P. Évolution de la déflexion observée sur les chaussées souples modernes, pp 35–42
7. Udo E, Kennedy EC, Assam S (2015) Comparative stabilization and model prediction of CBR values of Orukim residual soils, Akwa Ibom State, Nigeria. IOSR J Mech Civ Eng 12(4)
8. Al-Hashemi HM, Bukhary AH (2016) Correlation between California bearing ratio (CBR) and angle of repose of granular soil. Electron J Geotech Eng 21(17):5655–5660
9. Kumar AU, Sachar A (2020) Evaluation of correlations between CBR using DCP with laboratory CBR at varying energy levels. Int J Adv Sci Technol 29(9)
10. Rehman ZU, Khalid U, Farooq K, Mujtaba H (2017) Prediction of CBR value from index properties of different soils. Tech J Univ Eng Technol Taxila, Pakistan 22
11. Roksana K, Muqtadir A, Islam T (2018) Relationship between CBR and soil index properties of Bangladesh soil samples. Rev Cienc y Tecnol Mod 6(2):1–9
12. Gül Y, Çayir HM (2021) Prediction of the California bearing ratio from some field measurements of soils. Proc Inst Civ Eng Munic Eng. https://doi.org/10.1680/jmuen.19.00020
13. Yildirim B, Gunaydin O (2011) Estimation of California bearing ratio by using soft computing systems. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2010.12.054
14. Harini H, Naagesh S (2014) Prediction of CBR of fine grained soils by artificial neural network and multiple linear regression. Int J Civ Eng Technol 5(2):119–126
15. Attah IC, Agunwamba JC, Etim RK, Ogarekpe NM (2019) Modelling and predicting CBR values of lateritic soil treated with metakaolin for road material. ARPN J Eng Appl Sci 14(20):3609–3618
16. Bardhan A, Gokceoglu C, Burman A, Samui P, Asteris PG (2021) Efficient computational techniques for predicting the California bearing ratio of soil in soaked conditions. Eng Geol. https://doi.org/10.1016/j.enggeo.2021.106239
17. Trong DK et al (2021) On random subspace optimization-based hybrid computing models predicting the California bearing ratio of soils. Materials (Basel). https://doi.org/10.3390/ma14216516


18. Tenpe AR, Patel A (2020) Application of genetic expression programming and artificial neural network for prediction of CBR. Road Mater Pavement Des. https://doi.org/10.1080/14680629.2018.1544924
19. Tenpe AR, Patel A (2020) Utilization of support vector models and gene expression programming for soil strength modeling. Arab J Sci Eng. https://doi.org/10.1007/s13369-020-04441-6
20. Aggarwal CC (2018) Neural networks and deep learning. Springer. https://doi.org/10.1007/978-3-319-94463-0
21. Taskiran T (2010) Prediction of California bearing ratio (CBR) of fine grained soils by AI methods. Adv Eng Softw 41(6):886–892. https://doi.org/10.1016/j.advengsoft.2010.01.003
22. Sabat AK (2013) Prediction of California bearing ratio of a soil stabilized with lime and quarry dust using artificial neural network. Electron J Geotech Eng 18:3261–3272
23. Roy TK, Kuity A, Roy SK (2013) Prediction of soaked CBR for subgrade layer by using artificial neural network model. In: Proceedings of the international symposium on engineering under uncertainty: safety assessment and management (ISEUSAM-2012). https://doi.org/10.1007/978-81-322-0757-3_83
24. Bhatt S, Jain PK, Pradesh M (2014) Prediction of California bearing ratio of soils using artificial neural network. Am Int J Res Sci Technol Eng Math 8(2):156–161
25. Erzin Y, Turkoz D (2016) Use of neural networks for the prediction of the CBR value of some Aegean sands. Neural Comput Appl. https://doi.org/10.1007/s00521-015-1943-7
26. Taha S, Gabr A, El-Badawy S (2019) Regression and neural network models for California bearing ratio prediction of typical granular materials in Egypt. Arab J Sci Eng. https://doi.org/10.1007/s13369-019-03803-z
27. Bardhan A, Samui P, Ghosh K, Gandomi AH, Bhattacharyya S (2021) ELM-based adaptive neuro swarm intelligence techniques for predicting the California bearing ratio of soils in soaked conditions. Appl Soft Comput 110:107595. https://doi.org/10.1016/j.asoc.2021.107595
28. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366. https://doi.org/10.1016/0893-6080(89)90020-8
29. Othman K, Abdelwahab H (2022) The application of deep neural networks for the prediction of California bearing ratio of road subgrade soil. Ain Shams Eng J. https://doi.org/10.1016/j.asej.2022.101988
30. Le Cun Y (1986) Learning process in an asymmetric threshold network. In: Disordered systems and biological organization. https://doi.org/10.1007/978-3-642-82657-3_24
31. Abidin DZ, Nurmaini S, Malik RF, Rasywir E, Pratama Y (2020) Data preparation for machine learning. In: Proceedings of the 2nd international conference on informatics, multimedia, cyber and information system (ICIMCIS 2020), pp 284–289. https://doi.org/10.1109/ICIMCIS51567.2020.9354273
32. Refaeilzadeh P, Tang L (2005) Cross-validation. https://doi.org/10.5743/cairo/9789774160097.003.0002
33. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
34. Marne DELUPDE (2007) Conception de métaheuristiques d'optimisation pour la segmentation d'images. Application à des images biomédicales
35. Zhou A-H, Zhu L-P, Hu B, Pan S. Traveling-salesman-problem algorithm based on simulated annealing and gene-expression programming
36. Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras and TensorFlow
37. Al-Busultan S, Aswed GK, Almuhanna RRA, Rasheed SE (2020) Application of artificial neural networks in predicting subbase CBR values using soil indices data. In: IOP conference series: materials science and engineering, vol 671, no 1. https://doi.org/10.1088/1757-899X/671/1/012106

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
