Prediction of CBR by Deep Artificial Neural Networks With Hyperparameter Optimization by Simulated Annealing
https://doi.org/10.1007/s40098-024-00870-4
ORIGINAL PAPER
Abstract The construction of pavements requires the complete identification of the soils in place and of the added materials. This identification consists in determining the class of the soils and in evaluating their bearing capacity through the California bearing ratio (CBR) index. Obtaining the CBR index is very costly in time and financial resources, especially for large-scale projects, which motivates the search for simpler procedures than the classical one. This study develops models for predicting the CBR index from physical properties that are less complex to obtain, based on deep neural networks. To achieve this, three databases were used. The first consists of the proportion of fines, the Atterberg limits and the Proctor references of the soils; the second uses the methylene blue value instead of the Atterberg limits; and the third uses only the proportion of fines and the Proctor references. On each database, a deep neural network model was developed in TensorFlow using dense layers, regularization layers, residual blocks and parallelization to predict the CBR value. Each model was formed by combining several deep neural networks built according to specific architectures. To expedite training, the simulated annealing method was employed to optimize the hyperparameters and define the optimal configuration for each network. The predictions obtained are correlated with the true values from 83.6 to 96.5%. In terms of performance, the models have a mean deviation ranging from 3.74 to 5.96%, a maximum deviation ranging from 12.43 to 16.2% and a squared deviation ranging from 0.781 to 2.189. The results suggest that the variable VBS has a negative impact on the accuracy of the networks in predicting the CBR index. The developed models respect the confidence threshold (± 10%) and can be used to set up a local or regional geotechnical platform.

Keywords CBR prediction · Deep artificial neural networks · Clustering · Data augmentation · TensorFlow

* Ehsan Noroozinejad Farsangi, ehsan.noroozinejad@westernsydney.edu.au
1 Laboratory of Tests and Studies in Civil Engineering, National University of Sciences, Technologies, Engineering and Mathematics, Abomey, Benin
2 Urban Transformations Research Centre (UTRC), Western Sydney University, Parramatta, NSW, Australia
3 Laboratory of Applied Energetic and Mechanic, LEMA, University of Abomey-Calavi, Abomey-Calavi, Benin

Introduction

Roads are linear structures that rest on a soil called the roadbed. This subgrade may consist of selected materials brought to the site, of the existing soil, or of both. All materials used in the construction of a road must be geotechnically identified to ensure their suitability for use. When identifying soils, the road engineer looks at several properties: the granularity, the clay content, the optimal compaction characteristics and the bearing capacity. These quantities make it possible to classify the soils according to the GTR and to determine their California bearing ratio (CBR) at the end of the identification process. But several problems arise with the method of obtaining the CBR of soils. According to Liang et al. [1], the CBR test represents 37% of the total cost of identifying a single soil sample. The test is also time-consuming: it represents 71% of the total identification time of a single sample. Despite its requirements, CBR is not only one of the input parameters
for pre-calculated pavement structure sheets, but also an important parameter that can be used to accept pavement support platforms on site in accordance with the Senegal pavement design guide [2]. The pavement dimensioning guide for tropical countries [3] also enables the soil modulus to be estimated through existing correlations.

Furthermore, CBR testing is performed on soil samples from project to project and can thus be considered a recurring task. As researchers, we therefore wonder whether there are technologies capable of learning, from existing CBR data, the behavior of the CBR as a function of other soil properties that are less complex and less expensive to obtain. The issue is to simplify the method of obtaining the CBR bearing capacity index so as to reduce both the cost of soil identification and the identification time of a single soil sample.

This question had been raised before and has seen three eras of attempted answers. From 1970 to 2021, researchers such as [4–12] used simple correlations to estimate the value of the CBR, and they obtained an accuracy of up to 78% in terms of coefficient of determination. From 2011 to 2021, statistical learning methods were used by [13–19]; the accuracy increased up to 88%, still in terms of coefficient of determination. With the availability of project data and the advent of artificial intelligence, researchers have predicted the CBR value using artificial neural networks, as shown in Table 1.

The works presented in Table 1 are based on lightweight and wide neural networks whose prediction variables are physical properties of soils, such as the percentage of fines, percentage of sand, percentage of gravel, liquid limit, plasticity index and optimal water content. The coefficients of determination (R²) obtained range from 0.68 to 0.98, and the root mean square error (RMSE) remains low. In view of the recommendations of Charu Aggarwal [20], deep architectures of artificial neural networks are preferable to simple and wide networks since, according to the researcher, the latter can present good performances but do not offer a satisfactory generalization capacity on data unknown to the prediction model.

Table 1 CBR prediction models in the literature (values as reported; dashes indicate values not given)

| Author | Method | Input variables | Activation | R² | Error |
|---|---|---|---|---|---|
| Taskiran [21] | Combination of artificial neural networks and gene expression programming | %(F + S), %S, %G, LL, IP, γdmax, wopt | logsig, tansig | 0.91 | 1.48 |
| | | %(F + S), %S, LL, IP, γdmax, wopt | | 0.64 | 2.94 |
| | | %(F + S), %S, IP, γdmax, wopt | | 0.523 | 5.44 |
| | | %(F + S), IP, γdmax, wopt | | 0.807 | 2.11 |
| | | %(F + M), IP, γdmax, wopt, S | | 0.838 | 2.07 |
| | | %(F + M), IP, γdmax, wopt | | 0.885 | 1.69 |
| | | %(F + M), γdmax, wopt | | 0.681 | 2.85 |
| Sabat [22] | Artificial neural networks with backpropagation | % of silt, % quarry dust, number of days, γdmax, wopt | | 0.981 | 1.187, 1.75 |
| Roy et al. [23] | Artificial neural networks with backpropagation | % passing 4.25 mm, 2 mm, 0.425 mm, 0.075 mm; IP, γdmax, wopt | | – | – |
| Bhatt et al. [24] | Artificial neural networks with Levenberg–Marquardt backpropagation | %F, %S, %G, LL, LP, γdmax, wopt | tansig, linear | 0.9579 | 0.03 |
| | | %S, %G, IP, γdmax, wopt | | 0.9615 | 0.0274 |
| | | %S, %G, %F, γdmax, wopt | | 0.9501 | 0.032 |
| | | %S, %G, γdmax, wopt | | 0.9792 | 0.0199 |
| | | γdmax, wopt | | 0.8871 | 0.0439 |
| Erzin et al. [25] | Levenberg–Marquardt neural network with backpropagation | C(%), A(%), Cc, Q(%), Fel(%), Ca(%), Cu, w(%), G, qdry (g/cm³) | Tan-sigmoid | 0.9384 | 3.65, 2.53 |
| Taha et al. [26] | Artificial neural networks with backpropagation | D60, γdmax | Tan-Axon | 0.90 | 6.96, 5.21 |
| Bardhan et al. [27] | Extreme learning machine (ELM) and adaptive neuro swarm intelligence (ANSI) | %(F + S), %S, %G, IP, γdmax, wopt | | – | – |
In 1989, Hornik et al. [28] rigorously established that standard multilayer feedforward networks with as little as one hidden layer, using arbitrary squashing functions, are capable of approximating any Borel measurable function from one finite-dimensional space to another with any desired degree of accuracy, provided that a sufficient number of hidden units is available. This work confirms Aggarwal's recommendations: deep artificial neural networks can detect more information in the data than a simple, wide network. In 2022, Othman and Abdelwahab [29] adopted a deep architecture to predict the CBR value, with 2 to 4 hidden layers and 7 to 20 hidden units per layer. They obtained a maximum coefficient of determination R² = 0.945, a minimum absolute error of 1.93 and a maximum error of 17.64, for CBR values between 3 and 100. In view of the results obtained and the learning curves, the models still present perfectible points, such as reducing the prediction error and improving the coefficient of determination, in order to better exploit the power of deep artificial neural networks.

Not only the type of network (deep or light), but also the hyperparameter optimization methods used in the literature deserve attention. The first is the generate-and-test method, which consists in defining a network structure and analyzing its performance until a satisfactory result is obtained. The second, the grid method, consists in going through all the possible configurations in the search space defined for each hyperparameter. The disadvantage of the first method is that it does not ensure that the obtained structure is the optimal one; other structures could offer better performance. As for the second method, it is too costly in time and computing power and can only be considered with very advanced computing tools such as GPUs (graphics processing units) and TPUs (tensor processing units).

Despite all the studies already carried out, to our knowledge, the development of CBR neural networks with a deep architecture suitable for industrial use remains largely perfectible in terms of generalization performance. Since deep networks have thousands of parameters and many hyperparameters, the use of an optimization method such as simulated annealing (adapted to the fast optimization of complex problems), which is less costly and more reliable, is attractive.

The aim of the present study is, firstly, to develop an easy-to-use numerical tool capable of predicting the CBR value on the basis of physical properties (optimum water content, maximum dry density, Atterberg limits, granularity, etc.) by exploiting the power of artificial neural networks to the maximum, and secondly, to reduce the cost of developing deep networks and rationalize the method of optimizing network hyperparameters using simulated annealing.

The development of such a digital tool would reduce the financial and temporal costs of obtaining the CBR value of soils and, consequently, the costs of carrying out soil identification studies, since it requires only secondary data and does not depend on costly and time-consuming laboratory analyses. The obtained results can be used in setting up a local or regional geotechnical platform.

Material and Methods

Artificial neural networks are essentially used to learn processes or the behavior of phenomena, based on existing data that reflect that behavior. Le Cun [30] proposed in 1986 the principle of learning a network, which is composed of six major steps, namely (i) data collection; (ii) preprocessing of the data; (iii) division of the data set; (iv) training of the network and optimization of the hyperparameters; (v) network performance evaluation; and (vi) deployment of the network.

The method of developing deep neural networks for CBR prediction used in this work is based on this principle. It consists of preparing and dividing the data into three sets, training and selecting the best model, evaluating the performance of the network and making new inferences from the developed model. Each of these steps is described in the following sections. The collected data must be preprocessed by cleaning and scaling transformation. For neural network development purposes, the preprocessed database is divided into a test data set that is held out and a training data set for the network parameters. Optimization methods, including simulated annealing, are then used to determine the optimal values of the different hyperparameters of the network. After this step, the best model is retrained on the combined training and validation data set.

When, at the end of the process, the model shows satisfactory performance on the held-out test data, it can be saved to make predictions on new data instances. If not, the possible alternatives to improve the model performance are to increase the data and/or to revise the architecture fixed at the beginning.

Data Presentation and Preparation

The database used to train the CBR soil bearing capacity index prediction models came from two sources. The data were collected from reports of previously completed and ongoing projects in the Republic of Benin. There are 372 instances of data on the results of AG (particle size analysis), VBS (methylene blue value), OPM (modified Proctor optimum) and CBR of samples, and 122 instances of data with LA (Atterberg limits) in place of VBS. From this database, the results of
Descriptive statistics of the data set (column identities inferred from the value ranges and the text: proportion of fines F (%), γdmax (t/m³), wopt (%), CBR; E denotes the range max − min)

| | F (%) | γdmax | wopt (%) | CBR |
|---|---|---|---|---|
| Count | 68 | 68 | 68 | 68 |
| Mean | 39.86 | 1.82 | 14.18 | 15.31 |
| Std | 20.76 | 0.05 | 3.54 | 3.62 |
| Min | 5.5 | 1.71 | 9 | 10 |
| 25% | 12.5 | 1.8 | 9.75 | 13 |
| 50% | 50.65 | 1.83 | 15.5 | 14 |
| 75% | 54.28 | 1.84 | 17.2 | 17.05 |
| Max | 64.9 | 1.95 | 19.3 | 29.4 |
| E | 59.4 | 0.24 | 10.3 | 19.4 |
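To make the preparation step described above concrete, the following is a minimal sketch, assuming pandas and scikit-learn; the file name `cbr_data.csv` and the predictor columns are illustrative assumptions, not the authors' exact data layout:

```python
# A minimal sketch of the cleaning / splitting / scaling pipeline described
# in this section (hypothetical file and column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("cbr_data.csv")            # hypothetical database export
df = df.dropna().drop_duplicates()          # basic cleaning

X = df[["F", "gamma_dmax", "w_opt"]].values  # illustrative predictors
y = df["CBR"].values

# Hold out a test set first, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15, random_state=0)

# Fit the scaler on the training data only, so no information from the
# held-out sets leaks into training.
scaler = MinMaxScaler().fit(X_train)
X_train, X_val, X_test = map(scaler.transform, (X_train, X_val, X_test))
```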
needed to approximate a relatively nonlinear function. We then arrange several of these units to form an artificial neural network. The perceptrons are organized in layers, one after the other and in parallel. The first layer, the input layer, acts as a distributor of the input data to the following layers, called hidden or intermediate layers. In the intermediate layers, the information is transmitted from layer to layer until it reaches the last layer, called the output layer, where the prediction corresponding to the inputs provided is obtained. The network is said to be deep when it is composed of many hidden layers, and it is said to be simple or light when it is composed of only a few hidden layers. According to Aggarwal, deep networks can detect complex features in the data and thus offer better generalization capabilities than simple networks, but they are subject to several problems that must be solved during their development.

The architecture depends on the family of artificial neural networks. This family is determined by the nature of the data entering and leaving the network. Here, we are dealing only with numerical data, and therefore with dense networks with fully connected hidden layers. The network architecture is initialized at the beginning of the
process and is subject to modification when, after training, a problem arises.

At the end of the process, six architectures of artificial neural networks were developed and programmed using the TensorFlow + Keras library, written in Python and supported by Google.

Deep Architecture AP_1

This architecture consists of a single branch. On this branch, we find:

• An input layer that distributes the data to the hidden layers
• Hidden layers consisting of residual blocks that use skip connections to capture simple features in the training data (details on the residual block are provided below)
• An output layer that uses linear activation and consists of a single unit

Deep Architecture AP_2

This architecture consists of two branches. Using parallelization in the network architecture, each branch detects a particular feature in the training data. The first branch consists of dense and batch normalization (DBN) blocks that behave like a normal dense neural network; to prevent the network from becoming saturated as the number of layers increases, we used batch normalization layers. The second branch consists of a sub-network of type AP_1 (Fig. 6).
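As an illustration of these two architectures, here is a minimal TensorFlow/Keras sketch. It is an interpretation of the descriptions above, not the authors' code; the block counts, layer widths and the helper names `residual_block`, `build_ap1` and `build_ap2` are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, units=32):
    """Dense residual block: two dense layers plus a skip connection."""
    shortcut = x
    y = layers.Dense(units, activation="relu")(x)
    y = layers.Dense(units)(y)
    if shortcut.shape[-1] != units:           # project if widths differ
        shortcut = layers.Dense(units)(shortcut)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

def build_ap1(n_features, n_blocks=3):
    """AP_1 idea: one branch of residual blocks, single linear output."""
    inp = layers.Input(shape=(n_features,))
    x = inp
    for _ in range(n_blocks):
        x = residual_block(x)
    out = layers.Dense(1, activation="linear")(x)
    return tf.keras.Model(inp, out)

def build_ap2(n_features):
    """AP_2 idea: two parallel branches merged before the output."""
    inp = layers.Input(shape=(n_features,))
    # Branch 1: dense + batch normalization (DBN) blocks.
    b1 = inp
    for _ in range(3):
        b1 = layers.Dense(32, activation="relu")(b1)
        b1 = layers.BatchNormalization()(b1)
    # Branch 2: an AP_1-type residual sub-network.
    b2 = inp
    for _ in range(3):
        b2 = residual_block(b2)
    merged = layers.Concatenate()([b1, b2])
    out = layers.Dense(1, activation="linear")(merged)
    return tf.keras.Model(inp, out)
```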
Deep Architecture AP_4

This architecture consists of two sub-networks of type AP_3. It is a network with two outputs, both predicting the value of the CBR index. An optimization by the grid method is then done to determine the proportions in which each prediction must be taken to obtain CBR values as close as possible to the true values.

Deep Architecture AP_5

The output layer of this architecture admits the same operation as the previous one. The two networks differ only in their input layer. Here, the input layer is replicated in four copies to feed each branch of the network. After several processing cells through the hidden blocks, the results are then concatenated two by two to form only two outputs.

Architecture Stacking

To exploit the strength of multiple models, an existing approach in the literature is to stack already trained models and then bring processing through additional hidden layers to converge better to the true values. In our case, we stacked the architectures with the least bias to form a better performing model, as sketched below.
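A minimal sketch of this stacking idea, under the assumption that the base models are already-trained single-output Keras models (the helper name `stack_models` is ours, not the paper's):

```python
import tensorflow as tf
from tensorflow.keras import layers

def stack_models(trained_models, n_features):
    """Stack frozen base models and refine their combined predictions."""
    inp = layers.Input(shape=(n_features,))
    for m in trained_models:
        m.trainable = False                    # freeze the base models
    preds = [m(inp) for m in trained_models]   # one CBR estimate per model
    x = layers.Concatenate()(preds)
    x = layers.Dense(8, activation="relu")(x)  # extra hidden processing
    out = layers.Dense(1, activation="linear")(x)
    return tf.keras.Model(inp, out)

# e.g., stacked = stack_models([ap1, ap3, ap4, ap5], n_features=6)
```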
The different architectures were tested and optimized on a sample database in order to choose the one with the best performance.

Overview of the Deep Neural Network Development Method

Figure 5 illustrates the workflow of the method used to develop deep neural networks. The initial steps, as previously outlined, involve cleaning and preparing the data for use in the training and testing processes. With network architectures already defined, optimization methods, such as grid optimization and/or simulated annealing, are used to determine optimal values for the network hyperparameters. The best model is then retrained on the training and validation set. If the model exhibits satisfactory performance on the retained test data, it can be saved for making predictions on new data. Otherwise, potential improvements include increasing the data and/or revising the initial architecture.

The Constituent Layers of Artificial Neural Network Architectures

The different architectures developed use layers to solve instability problems encountered during network training and others to make the networks robust. A brief description and the role of each layer are presented below.

Batch Normalization Layer

Normalization is a recent method to address the vanishing and exploding gradient problems. Beyond that, it also ensures the stability of the loss gradient during training. In the different artificial neural network architectures, the pre-activation batch normalization layer helps the gradient flow, making training faster and cheaper. Details on how batch normalization layers work are provided by Aggarwal [20].

Regularization Layer

The regularization layer addresses the overfitting problem that Othman may have faced without solving it. As shown in the figure below, from the beginning of the training of the network, a gap starts to appear between the loss on the training data and the loss on the validation data. The neural network architectures developed here use a regularization layer of type L2, as recommended by Aggarwal.

Residual Blocks

Residual neural networks were proposed to solve the exploding and vanishing gradient problems as well as the degradation problem. According to He et al. [33], the degradation problem manifests as an increase of the loss on both the training data and the test data as the number of hidden layers of a deep neural network increases. This finding is explained by the fact that deep networks have difficulty approximating the less complex (linear) aspects of a function with compositions of nonlinear activation functions. Residual neural networks introduce identity mappings in order to better capture the linear aspects of the function to be approximated.

These residual blocks have made it possible to better exploit the power of neural networks by enabling very deep networks whose number of layers varies between 10 and 100. Figure 6 shows the structure of the residual block used in this study.

Gaussian Noise Layer

The Gaussian noise layers used are positioned just before the output layer of each architecture. They make the network less sensitive to noise, since the networks take as input experimental values that have a confidence level of ± 10%.
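To show how these constituent layers fit together, here is a minimal Keras sketch, combining pre-activation batch normalization, L2 kernel regularization, a residual skip connection and a Gaussian noise layer placed just before the output. The sizes, the L2 factor and the noise level are illustrative assumptions, not the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def regularized_residual_block(x, units=32, l2=1e-4):
    shortcut = x
    y = layers.BatchNormalization()(x)             # pre-activation batch norm
    y = layers.Dense(units, activation="relu",
                     kernel_regularizer=regularizers.l2(l2))(y)
    y = layers.BatchNormalization()(y)
    y = layers.Dense(units,
                     kernel_regularizer=regularizers.l2(l2))(y)
    if shortcut.shape[-1] != units:
        shortcut = layers.Dense(units)(shortcut)   # projection for the skip
    return layers.Add()([shortcut, y])             # identity mapping

inp = layers.Input(shape=(6,))                     # e.g., F, LL, IP, γdmax, wopt, ...
x = regularized_residual_block(inp)
x = regularized_residual_block(x)
x = layers.GaussianNoise(0.1)(x)                   # noise layer just before output,
out = layers.Dense(1, activation="linear")(x)      # active only during training
model = tf.keras.Model(inp, out)
```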
Activation Functions

The choice of activation functions is very important in the definition of the neural network architecture. Considering the activation functions already used to capture the behavior of the CBR index as a function of the physical properties, a search space was defined in order to select the one that offers the best performance and is thus the best adapted.

Training of Deep Artificial Neural Networks

Due to the complexity of the loss function of deep neural networks, the parameter optimization method used, i.e., gradient descent by backpropagation, has a high computational cost. Backpropagation progressively reduces the error by adjusting the values of the unknowns (synaptic weights and biases) until a minimum is reached.

The gradient descent approach to neural network training using backpropagation consists in reducing the error rate of the model by adjusting the vector of weights linking two consecutive layers until the minimum of the loss function is reached: this is the gradient descent.

To start training the model, the synaptic weights are randomly initialized. The training is done in two steps:

• A first phase computes the output values and local derivatives at the different nodes: this is the forward phase.
• A second phase computes the sum of the products of the local derivatives over all paths from a node to the output: this is the backward phase.

Once the weights are corrected, the first phase resumes and the cycle continues. In the learning process, the only unknowns are the learning rate and the derivative of the loss function with respect to the incoming weights of each computing unit. The learning rate is a hyperparameter to be optimized in the model selection phase, and the derivative of the loss function with respect to the incoming weights of each neuron is computed by the chain rule.

The mini-batch stochastic gradient descent strategy used is Nadam, a modified version of the basic method; details on this strategy are provided in Aggarwal's book. A concrete illustration of this training setup is sketched below.
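In Keras terms, a model built as in the earlier sketches can be compiled with the Nadam optimizer and trained by mini-batch gradient descent; the batch size, learning rate and epoch count below are illustrative assumptions, not the paper's exact settings:

```python
import tensorflow as tf

# Mini-batch training with Nadam; `model` and the data splits are assumed
# to come from the earlier sketches.
model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3),
              loss="mse",
              metrics=["mae", "mape"])
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=32, epochs=200, verbose=0)
```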
Optimization of Hyperparameters

Hyperparameters are parameters that enter into the design of the neural network. They are called hyperparameters to distinguish them from the network parameters, i.e., the synaptic weights and biases; the hyperparameters define the architecture of the network.

The tuning of the hyperparameters is done on data other than those on which the gradient descent is performed. During this tuning phase, we try to determine the optimal values of these parameters. The hyperparameter optimization method used is simulated annealing. The hyperparameters to be optimized vary from one model architecture to another, but all the architectures share common hyperparameters. Table 6 presents the hyperparameters to be optimized for each neural network architecture.

Optimizing hyperparameters using the grid method or the manual method (trial and error) has a very high computational cost: for each set of values taken from the search spaces of the hyperparameters, the model thus defined must be trained for a number of epochs sufficient to select the best models. To optimize the hyperparameters of a model of type AP_1 using the grid method, 14,044,800 neural networks must be trained, i.e., 554 years when the average training time is two minutes on the CPU. With GPUs, TPUs and distributed training techniques, the computational cost can be considerably reduced.

Another alternative is to use optimization methods that are less cumbersome. In the present case, we have used simulated annealing, which is a metaheuristic method.
According to [34], simulated annealing requires less memory.

Simulated Annealing

The simulated annealing method is a generalization of the Monte Carlo method. It has been used here to optimize the hyperparameters of the networks through the hyperopt library available in Python. It is known for its ability to avoid getting stuck in local optima and to approach the global optimum for high-dimensional problems. It is inspired by a natural principle called "annealing," used in metallurgy to obtain defect-free alloys of various shapes. The process consists of heating the metal to a certain temperature where it becomes liquid (the particles are free to move). At this stage, the temperature is lowered very slowly so as to obtain a solid of well-defined shape. If this temperature drop is abrupt, we obtain a glass; if, on the contrary, the temperature drop is very slow (allowing the atoms time to reach thermodynamic equilibrium), we obtain more and more regular structures, until we reach a state of minimum energy corresponding to the perfect structure of a crystal, and we say that the system is "frozen." If the lowering of the temperature is not done slowly enough, defects can appear. It is then necessary to correct them by slightly reheating the material to allow the atoms to regain their freedom of movement, thus facilitating an eventual rearrangement leading to a more stable structure.

The main idea of simulated annealing is to reproduce this behavior of the material used in metallurgy in the optimization process, in order to reach the optimal solution. The internal energy of the material is then the objective function of the problem, and reaching the thermodynamic equilibrium state corresponds to reaching the optimal solution of the problem.

To reach this state of thermodynamic quasi-equilibrium, at decreasing temperature steps, we used the iterative Metropolis procedure. This procedure makes it possible to escape local minima with a probability that is higher at higher temperatures: a move that worsens the objective by ΔE is accepted with probability exp(−ΔE/T). When the algorithm reaches very low temperatures, the most probable states are in principle excellent solutions to the optimization problem. Figure 8 summarizes the operation of the simulated annealing, and the details are provided in the appendix of the document.
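The paper states that the annealing search was run through the hyperopt library; the following is a minimal sketch of such a search using hyperopt's annealing suggester. The search-space names and the `build_ap1` builder reuse the earlier illustrative sketches and are assumptions, not the authors' exact setup:

```python
from hyperopt import fmin, hp, anneal, Trials

# Illustrative search space; hp.quniform returns floats, so integer
# hyperparameters should be cast inside the builder.
space = {
    "n_blocks": hp.quniform("n_blocks", 2, 20, 1),   # residual blocks
    "units": hp.quniform("units", 8, 128, 8),        # units per dense layer
    "lr": hp.loguniform("lr", -9, -3),               # Nadam learning rate
    "activation": hp.choice("activation", ["relu", "tanh", "selu"]),
}

def objective(params):
    model = build_ap1(n_features=X_train.shape[1],
                      n_blocks=int(params["n_blocks"]))
    model.compile(optimizer="nadam", loss="mse")
    hist = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                     epochs=50, verbose=0)
    return min(hist.history["val_loss"])   # "energy" = validation loss

trials = Trials()
best = fmin(fn=objective, space=space, algo=anneal.suggest,
            max_evals=200, trials=trials)
```

Each candidate configuration is trained briefly, and its validation loss plays the role of the internal energy to be minimized.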
Evaluation and Deployment of Neural Networks

Validation or testing in deep learning is done on data not seen by the model, reserved before starting the training. The performance of the models is evaluated through certain metrics. These are:

Coefficient of Determination

Maximum Absolute and Relative Error

They measure the maximum deviation of the predictions made by the models. They are very important among the metrics because of the objective of deploying the models in industry.

MA = max(|y − ŷ|)

Fig. 9 Evolution of the loss on the validation data according to the number of iterations
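The metrics reported throughout this paper can be reproduced with a few lines of numpy; the following sketch is our illustration, not the authors' code, and computes R², MAE, RMSE, MAPE and the maximum absolute and relative deviations:

```python
import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2":   1.0 - ss_res / ss_tot,
        "MAE":  np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
        "MA":   np.max(np.abs(err)),                   # max absolute error
        "Emax": 100.0 * np.max(np.abs(err / y_true)),  # max relative error
    }

print(evaluate([10, 14, 24], [11, 13.5, 23]))
```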
(Figures: evolution of the validation loss, and Test_values vs. Predicted_values on the test data.)
Models for Predicting the CBR Bearing Capacity of Soils

Three models of deep artificial neural networks have been developed based on the stacked architecture AP_1_AP_3_AP_4_AP_5. To each database corresponds a model of deep artificial neural networks.

Model M1: CBR = f(AG, LA, OPM)

The M1 model is a deep artificial neural network model based on the AP_1_AP_3_AP_4_AP_5 architecture to predict the CBR value from the proportion of fines, the Atterberg limits and the Proctor references. The model required the development of four different models taking the same parameters as input but based on the AP_1, AP_3, AP_4 and AP_5 architectures. The overall computational cost of the model is 53,928 s, i.e., 14 h 58 min 48 s. The models were trained on 105 data
instances, i.e., 85% of the database. Figure 11 shows the results obtained when predicting the CBR value on the test data by the M1 model.

Model M2: CBR = f(AG, VBS, OPM)

The M2 model uses the proportion of fines, the methylene blue value and the Proctor soil references to predict the CBR value. It is also a model stack, but with a relatively different architecture. The results obtained by the second approach (R² = 0.36, Emax = 17.91%, MAE = 2.139, RMSE = 2.635, MAPE = 5.70%) are better. The model shows acceptable performance in terms of the deviation between true and predicted values, but still has some difficulty making predictions in the right direction. Figure 12 shows the results obtained on 30 instances of test data.

Model M3: CBR = f(AG, OPM)

The M3 model uses the proportion of fines and the Proctor references to predict the CBR bearing capacity. Compared to the M1 model, which has only 65 data instances, the M3 model has the advantage of having many more samples (282); it is therefore trained on a more representative population. Although the M2 model has more samples than the M1 model, we notice a
Table 8 Comparison with existing models (column headings reconstructed from the text: coefficient of determination R², maximum absolute error MA, RMSE, MAE, mean, standard deviation, minimum and maximum of the CBR values, and input variables; values as reported)

| Model | R² | MA | RMSE | MAE | CBR mean | CBR std | CBR min | CBR max | Inputs |
|---|---|---|---|---|---|---|---|---|---|
| Othman et al. [29] | 0.945 | 17.64 | 2.81 | 2.16 | 49.87 | 33.22 | 3 | 100 | AG-LA-OPM |
| Harini HN [14] | 0.94 | – | 2.47 | 2.69 | 4 | 0.6 | 0.97 | 4 | LA-OPM |
| Bhatt et al. [24] | 0.9792 | – | 0.0199 | – | 3.77 | 2.87 | 1.55 | 22.4 | AG-OPM |
| Al-Busultan et al. [37] | 0.7773 | – | 4.3141 | – | 49.69 | 6.871 | 23.5 | 60 | Passing no. 2, no. 1, no. 3/8, no. 4, no. 8, no. 50, no. 200; M.D.D; O.W.C; L.L; P.I; SO3; soluble salt; gypsum; organic |
| M1 | 0.838 | 2.27 | 0.781 | 0.563 | 15.07 | 3.18 | 10 | 24 | AG-LA-OPM |
| M2 | 0.36 | 5.771 | 2.635 | 2.139 | 36.65 | 7.72 | 16 | 66 | AG-VBS-OPM |
| M3 | 0.965 | 5.89 | 2.189 | 1.64 | 15.31 | 3.62 | 10 | 29.4 | F-OPM |
decrease in performance compared to the other models. This can be explained by the fact that the VBS attribute has a negative impact on the performance of the M2 model. Indeed, this column has a very low variance and behaves almost like a constant. When the VBS variable is removed from the database to form the M3 model (in accordance with A. Géron's recommendations in [36]), we notice an improvement in performance: a coefficient of determination of R² = 0.965, a maximum relative error of Emax = 12.43%, a mean absolute error of MAE = 1.64, a root mean square error of RMSE = 2.189 and a mean relative error of MAPE = 5.963% (Fig. 13).

Comparison with Existing Models

Two out of the three developed models (M1 and M3) show satisfactory performance, while model M2 shows poor performance (Table 8). Model M3, which is based on the fines percentage and the optimal soil compaction characteristics, performs well compared to the model of Othman et al. [29], which is also a deep neural network. Indeed, as shown in Table 7, the M3 model is composed of 135 residual blocks (see Fig. 6) and 31 dense layer blocks (see Fig. 7), while the model of Othman et al. consists of at most 4 hidden layers. The M3 model is therefore much deeper (with 197 hidden and batch normalization layers) than the model of Othman et al. Moreover, the architectures developed with the techniques of batch normalization, regularization of hidden layers, residual blocks and parallelization increased the depth of the networks and better exploited the potential of deep neural networks. The network structures were also optimized by the simulated annealing method, which allowed optimal values of the hyperparameters to be determined in order to converge more quickly to the optimal network.

However, the model of Harini et al. [14] demonstrates better performance than model M1 developed in this article. Nevertheless, that model was tested on only 10 data instances after being developed on 114 instances. Furthermore, it is a shallow neural network, which is deemed unreliable for industrial applications according to the suggestions of Charu Aggarwal [20].

We note that the model of Al-Busultan et al. [37], a neural network consisting of a single hidden layer and an input layer with 15 prediction variables, developed on 358 data instances, has good performance, but less than our two deep neural network models M1 and M3, which were trained on a smaller database. This suggests that deep neural networks detect patterns in data, even complex ones, better than wide and light neural networks.

Conclusion

The development of deep artificial neural networks based on the traditional method has enabled us to develop three very deep neural networks with 206 hidden layers and low development costs, thanks to the use of simulated annealing for hyperparameter optimization.

The models developed for enterprise use are as follows:
• Model M1 uses the proportion of fines, the Atterberg limits and the Proctor soil references to predict the value of the CBR bearing capacity index. The predictions of this model are 83.6% correlated with the true values (R² = 0.836). The model commits a maximum error of 2.27 on CBR values between 10 and 24, i.e., 16.2% in relative terms, a mean absolute error of 0.563 and a mean square error of 0.781. The mean relative error on the test data is 3.74%, which satisfies the requirements of geotechnical estimations.
• Model M3 uses fewer input parameters to predict the CBR bearing capacity of soils. Based solely on the proportion of fines and the Proctor references of the soils, its predictions are 96.5% correlated with the true values, with a maximum relative error of 12.43%, a mean absolute error of 1.64, a mean square error of 2.189 and a mean relative error of 5.76%.

We can conclude that the M1 model, having been trained on fine soils, shows good generalization performance in view of the results obtained and can be used in industry. The model's complexity has enabled it to learn more about CBR behavior even from a limited amount of data.

Model M2 was trained on a gravelly soil database consisting of GA, VBS and OPM results, while model M3 was trained on a database obtained by combining those of models M1 and M2, but retaining only the GA and OPM attributes. Model M3 offers better generalization performance than the first two models. Unlike model M3, model M2 cannot be used in industry due to its poor performance.

We conclude that adding VBS to the attributes of the prediction network reduces performance and that the performance of the M3 model may improve as the size of the database increases. The complexity of the M3 model gives it a good ability to generalize to data not known to the model.

Funding Open Access funding enabled and organized by CAUL and its Member Institutions. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Data Availability The data used to support the findings of this study are included in the article.

Declarations

Conflict of interest The authors declare that there is no conflict of interest in the publication of this article. They have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Liang S et al (2015) Liste de prix 2015. Proc Natl Acad Sci 3(1):1–15
2. IFSTTAR (2015) Catalogue de structures de chaussées neuves et Guide de dimensionnement des chaussées au Sénégal
3. CEBTP (2019) Revue du guide chaussées pour les pays tropicaux
4. Agarwal KB, Ghanekar KD (1970) Prediction of CBR from plasticity characteristics of soil. In: Proceedings of the 2nd south-east Asian conference on soil engineering, Singapore, pp 11–15
5. Olidis C, Hein D (2004) Guide for the mechanistic-empirical design of new and rehabilitated pavement structures: materials characterization — is your agency ready? In: TAC/ATC 2004 annual conference and exhibition of the Transportation Association of Canada
6. Lepert P. Évolution de la déflexion observée sur les chaussées souples modernes, pp 35–42
7. Udo E, Kennedy EC, Assam S (2015) Comparative stabilization and model prediction of CBR values of Orukim residual soils, Akwa Ibom State, Nigeria. IOSR J Mech Civ Eng 12(4)
8. Al-Hashemi HM, Bukhary AH (2016) Correlation between California bearing ratio (CBR) and angle of repose of granular soil. Electron J Geotech Eng 21(17):5655–5660
9. Kumar AU, Sachar A (2020) Evaluation of correlations between CBR using DCP with laboratory CBR at varying energy levels. Int J Adv Sci Technol 29(9)
10. Rehman ZU, Khalid U, Farooq K, Mujtaba H (2017) Prediction of CBR value from index properties of different soils. Tech J Univ Eng Technol Taxila 22
11. Roksana K, Muqtadir A, Islam T (2018) Relationship between CBR and soil index properties of Bangladesh soil samples. Rev Cienc Tecnol Mod 6(2):1–9
12. Gül Y, Çayir HM (2021) Prediction of the California bearing ratio from some field measurements of soils. Proc Inst Civ Eng Munic Eng. https://doi.org/10.1680/jmuen.19.00020
13. Yildirim B, Gunaydin O (2011) Estimation of California bearing ratio by using soft computing systems. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2010.12.054
14. Harini H, Naagesh S (2014) Predicting CBR of fine-grained soils by artificial neural network and multiple linear regression. Int J Civ Eng Technol 5(2):119–126
15. Attah IC, Agunwamba JC, Etim RK, Ogarekpe NM (2019) Modelling and predicting CBR values of lateritic soil treated with metakaolin for road material. ARPN J Eng Appl Sci 14(20):3609–3618
16. Bardhan A, Gokceoglu C, Burman A, Samui P, Asteris PG (2021) Efficient computational techniques for predicting the California bearing ratio of soil in soaked conditions. Eng Geol. https://doi.org/10.1016/j.enggeo.2021.106239
17. Trong DK et al (2021) On random subspace optimization-based hybrid computing models predicting the California bearing ratio of soils. Materials (Basel). https://doi.org/10.3390/ma14216516
18. Tenpe AR, Patel A (2020) Application of genetic expression programming and artificial neural network for prediction of CBR. Road Mater Pavement Des. https://doi.org/10.1080/14680629.2018.1544924
19. Tenpe AR, Patel A (2020) Utilization of support vector models and gene expression programming for soil strength modeling. Arab J Sci Eng. https://doi.org/10.1007/s13369-020-04441-6
20. Aggarwal CC (2018) Neural networks and deep learning. Springer. https://doi.org/10.1007/978-3-319-94463-0
21. Taskiran T (2010) Prediction of California bearing ratio (CBR) of fine grained soils by AI methods. Adv Eng Softw 41(6):886–892. https://doi.org/10.1016/j.advengsoft.2010.01.003
22. Sabat AK (2013) Prediction of California bearing ratio of a soil stabilized with lime and quarry dust using artificial neural network. Electron J Geotech Eng 18:3261–3272
23. Roy TK, Kuity A, Roy SK (2013) Prediction of soaked CBR for subgrade layer by using artificial neural network model. In: Proceedings of the international symposium on engineering under uncertainty: safety assessment and management (ISEUSAM-2012). https://doi.org/10.1007/978-81-322-0757-3_83
24. Bhatt S, Jain PK, Pradesh M (2014) Prediction of California bearing ratio of soils using artificial neural network. Am Int J Res Sci Technol Eng Math 8(2):156–161
25. Erzin Y, Turkoz D (2016) Use of neural networks for the prediction of the CBR value of some Aegean sands. Neural Comput Appl. https://doi.org/10.1007/s00521-015-1943-7
26. Taha S, Gabr A, El-Badawy S (2019) Regression and neural network models for California bearing ratio prediction of typical granular materials in Egypt. Arab J Sci Eng. https://doi.org/10.1007/s13369-019-03803-z
27. Bardhan A, Samui P, Ghosh K, Gandomi AH, Bhattacharyya S (2021) ELM-based adaptive neuro swarm intelligence techniques for predicting the California bearing ratio of soils in soaked conditions. Appl Soft Comput 110:107595. https://doi.org/10.1016/j.asoc.2021.107595
28. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366. https://doi.org/10.1016/0893-6080(89)90020-8
29. Othman K, Abdelwahab H (2022) The application of deep neural networks for the prediction of California bearing ratio of road subgrade soil. Ain Shams Eng J. https://doi.org/10.1016/j.asej.2022.101988
30. Le Cun Y (1986) Learning process in an asymmetric threshold network. Disordered systems and biological organization. https://doi.org/10.1007/978-3-642-82657-3_24
31. Abidin DZ, Nurmaini S, Malik RF, Rasywir E, Pratama Y (2020) Data preparation for machine learning. In: Proceedings of the 2nd international conference on informatics, multimedia, cyber and information systems (ICIMCIS 2020), pp 284–289. https://doi.org/10.1109/ICIMCIS51567.2020.9354273
32. Refaeilzadeh P, Tang L, Liu H (2005) Cross-validation. https://doi.org/10.5743/cairo/9789774160097.003.0002
33. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778. https://doi.org/10.1109/CVPR.2016.90
34. (2007) Conception de métaheuristiques d'optimisation pour la segmentation d'images: application à des images biomédicales. Thèse de doctorat, Université de Marne-la-Vallée
35. Zhou A-H, Zhu L-P, Hu B, Pan S. Traveling-salesman-problem algorithm based on simulated annealing and gene-expression programming
36. Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras and TensorFlow. O'Reilly
37. Al-Busultan S, Aswed GK, Almuhanna RRA, Rasheed SE (2020) Application of artificial neural networks in predicting subbase CBR values using soil indices data. In: IOP conference series: materials science and engineering, vol 671, 012106. https://doi.org/10.1088/1757-899X/671/1/012106
Publisher’s Note Springer Nature remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.