Machine Learning Models For Estimating Preliminary Factory Construction Cost: Case Study in Southern Vietnam
Nguyen Dang-Trinh, Pham Duc-Thang, Tran Nguyen-Ngoc Cuong & Tran Duc-Hoc
To cite this article: Nguyen Dang-Trinh, Pham Duc-Thang, Tran Nguyen-Ngoc Cuong & Tran
Duc-Hoc (2022): Machine learning models for estimating preliminary factory construction
cost: case study in Southern Vietnam, International Journal of Construction Management, DOI:
10.1080/15623599.2022.2106043
ABSTRACT
Construction of industrial enterprises has become increasingly necessary in recent years, and project managers must estimate the entire cost of a building project at this early stage. Existing approaches translate the estimator's experience into a mathematical formula. Initial estimates are inaccurate because few data points are available, which leads to project cost overruns. This research applies different machine learning techniques to predict preliminary factory construction cost. Five popular numeric predictive techniques, namely support vector machine (SVM), artificial neural network (ANN), generalized linear regression (GENLIN), classification and regression trees (CART), and exhaustive chi-squared automatic interaction detection (CHAID), are used to build baseline and ensemble models. A deep learning neural network (DLNN) is also utilized in this study. The models are trained and tested on actual data gathered in southern Vietnam. Deep learning outperforms all single machine learning algorithms in this comparison, and the ensemble model of artificial neural networks and generalized linear regression also performs well. With access to a variety of estimation methodologies, cost estimators can quickly pick the best model for the preliminary estimation of factory construction cost.

KEYWORDS
Deep learning; ensemble model; industrial construction; machine learning; preliminary cost
low errors of predicting final project costs. An ensemble model combines machine learning techniques, drawing on the strengths of individual learners to achieve better prediction performance than a single model (Chou et al. 2022). Deep learning techniques can learn complicated relations and high-level data features (Ning et al. 2020), so an estimation model built on a deep neural network can efficiently handle the aforementioned problems and improve prediction accuracy. Therefore, these emerging technologies can generate practical and precise outcomes under actual circumstances.

Economic growth has driven investment in factory construction in southern Vietnam. Domestic and foreign investment capital in factories has increased substantially in recent years, drawing on several sources such as budget capital, non-budget capital, and social capital. Cost estimation is a crucial criterion for decision makers when committing funds to construction, especially at the idea formation stage. Estimators often use a conventional cost estimation process that may cause large errors due to a lack of necessary information. Therefore, this research aims to identify the factors that influence preliminary factory construction cost.

This study also applies various machine learning techniques to find the most suitable predictive models for estimating preliminary factory construction cost. Five popular numeric predictive algorithms, namely support vector machine (SVM), artificial neural network (ANN), generalized linear regression (GENLIN), classification and regression trees (CART), and exhaustive chi-squared automatic interaction detection (CHAID), as well as ensemble models, are compared for preliminary cost estimation. Moreover, a deep learning neural network is introduced to further improve cost estimation. A cross-fold validation approach is also used to avoid randomness in selecting the testing fold.

Related works on cost estimation

Many studies of construction cost estimation using artificial intelligence and machine learning models have been published (Elmousalami 2021). Previous studies can be classified into six groups: artificial neural network (ANN), fuzzy logic (FL), regression, support vector machine (SVM), case-based reasoning (CBR), and hybrid models (Elmousalami 2020).

Ambrule and Bhirud (2017) applied ANN to estimate the preliminary cost of building projects in order to overcome errors at the initial phases of building construction. Juszczyk et al. (2018) investigated the application of ANNs to calculating the overall construction cost of sports fields. Maya et al. (2021) designed a neural network-based model for estimating future project performance.

Fuzzy system techniques have been applied to estimating construction project costs for years. Yang and Xu (2010) proposed a fuzzy technique with four inputs and one output to estimate building projects with a maximum error of 3.2%. Zhai et al. (2013) used fuzzy c-means to establish a fuzzy system for cost prediction. Karatas and Ince (2016) modeled an expert tool based on fuzzy logic for satellite cost prediction. These fuzzy methods relied on experts' opinions to generate fuzzy rules. Hence, hybridizing fuzzy logic with other techniques for cost estimation is a newly evolving trend. Cheng and Roy (2010) hybridized a fuzzy approach, an evolutionary algorithm, and a neural network model to estimate conceptual cost, applying fuzzy logic to the input and output data. Zhu et al. (2010) built a model based on fuzzy and genetic neural networks for estimating project cost.

Several researchers have applied linear and multiple regression models to predict project cost at initial phases because of their simplicity and ease of implementation in software. Lowe et al. (2006) found that the log-of-cost backward model was the best of six proposed linear regression models for estimating building construction cost. Stoy et al. (2008) applied regression analysis to estimate residential building construction costs. Nasrazadani et al. (2017) created a modeling framework that uses Bayesian regression for retrofit cost prediction. Regression models have become popular in cost estimation due to their simplicity. Nevertheless, SVM and CBR perform better than regression models on nonlinear data.

An et al. (2007) assessed conceptual cost estimates using the support vector machine technique; SVM outperformed discriminant analysis in their results. HongWei (2009) integrated SVM with rough set theory to enhance the prediction accuracy of building construction cost. Son et al. (2012) hybridized principal component analysis and SVM to accurately predict project performance in the preparation phase. Petruseva et al. (2016) demonstrated that SVM estimates bidding prices more precisely than regression models. CBR, which works by retrieving similar past cases, is a promising technique for cost estimation (Kwon et al. 2017; Hyung et al. 2019). An et al. (2007) introduced a case-based reasoning model for predicting construction cost in which experience is incorporated through the analytic hierarchy process.

Among cost estimation techniques, hybrid models are the current trend because they yield high accuracy in predicting outcomes and can eliminate the drawbacks of a single model. Cheng et al. (2013) proposed a hybrid model that uses an evolutionary algorithm to optimize the parameters of a least squares SVM to predict the construction cost index. Arabzadeh et al. (2018) showed that hybrid models achieved higher accuracy in cost estimation than single models. Shoar et al. (2022) used a hybrid model based on random forest regression to predict engineering services' cost overruns in high-rise residential building projects. Das et al. (2022) hybridized seasonal regression and an artificial neural network to forecast the cost of wind energy production. Hybrid models have proved to be appropriate techniques for estimating construction cost, with stable and highly accurate results.

Ensemble models and deep neural networks have recently been introduced for cost estimation with high accuracy. Williams and Gong (2014) proposed a stacking ensemble and text mining method to predict cost overruns from project contract documentation. Cao et al. (2018) developed an ensemble method for estimating unit price bids that outperformed each of its constituent learning algorithms and the baseline models. Meharie et al. (2021) demonstrated a stacking ensemble-learning method for estimating construction project costs with high accuracy. Ning et al. (2020) applied a convolutional neural network to estimate manufacturing cost. Bodendorf et al. (2021) investigated a deep learning neural network based on image processing, autoencoding, and regression to calculate the manufacturing cost of circuit boards.

Several researchers have successfully identified key cost drivers in construction projects (Elmousalami 2020). ElSawy et al. (2011) determined the ten most important
INTERNATIONAL JOURNAL OF CONSTRUCTION MANAGEMENT 3
cost drivers out of 52 factors through a questionnaire survey based on experts' judgment. Kim (2013) used a questionnaire survey and factor analysis to identify and rank the factors affecting a set of guidelines for infrastructure projects. Marzouk and Elkadi (2016) relied on experts' questionnaire responses to select the causal cost factors of water treatment plants. El-Sawah and Moselhi (2014) collected a data set from 35 low-rise structural steel buildings for preliminary cost estimating. Lotfy and Mohamed (2002) used 480 real projects as input data for their proposed model.

An extensive review shows that few studies have estimated the early construction cost of factory buildings, and no research has predicted preliminary factory construction cost in Vietnam. Cong and Minh (2020) used ANN to estimate the construction cost of schools in Ho Chi Minh City, Vietnam; their model used real data from 27 school projects and achieved an accuracy of over 90%. Truong and Soo-Yong (2009) used a neural network model to predict apartment construction cost in Vietnam, with 14 samples for training and the last five cases for testing. The current study aims to determine the most effective and precise methods for estimating preliminary factory construction cost, and it collects actual data on 35 industrial park projects for model implementation.

Machine learning models
Baseline predicting approaches

and output dependent variables (Y) as the data distribution assumption.

$$\eta = g\big(E(Y)\big) = \sum_i X_i \beta_i + O, \qquad Y \sim F \tag{3}$$

where $\eta$ is the linear predictor, $O$ is an offset variable, $\beta_i$ denotes the slope coefficients, $X_i$ are the independent inputs, and $F$ is the distribution of $Y$. The generalized linear model consists of three components: (1) an output variable $Y$ that follows a particular random distribution with expected value $\mu$ and variance $\sigma^2$, where $E(Y) = \mu$; (2) a link function $g(\cdot)$ that relates the expected value $\mu$ of $Y$ to the linear predictor, $\eta = g(\mu)$; and (3) a structural model.

Classification and regression trees (CART)
CART is a basic machine learning algorithm that can handle both regression and classification problems (Breiman et al. 1984). Variables in CART can be numerical or categorical. The tree is optimized on a set of learning data, and the optimization process ensures robustness while keeping the model simple. Various impurity measures are used as the criterion for splitting nodes in CART: the Gini index is usually chosen for symbolic (categorical) targets, whereas the least-squared deviation is applied automatically for continuous targets. For a two-class node, the Gini index $g(t)$ can be formulated as Eq. (4):

$$g(t) = 1 - p(t)^2 - \big(1 - p(t)\big)^2 \tag{4}$$

where $p(t)$ is the relative frequency of the first class in node $t$. The Gini index equals zero when only one class appears at a node.
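As a worked illustration of Eq. (4) and of the least-squared-deviation criterion mentioned for continuous targets, the sketch below computes both impurities in plain Python. It assumes a binary target for the Gini index and normalises the squared deviation by node size; the function names are hypothetical, and this is not how RapidMiner implements CART internally.

```python
from typing import Sequence


def gini_binary(p: float) -> float:
    """Gini index of Eq. (4) for a two-class node; p is the relative
    frequency of the first class in the node."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a relative frequency in [0, 1]")
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def lsd_impurity(values: Sequence[float]) -> float:
    """Least-squared-deviation impurity for a continuous target: the mean
    squared deviation of the node's values from the node mean (assumed form)."""
    if not values:
        raise ValueError("a node must contain at least one value")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_cost(left: Sequence[float], right: Sequence[float]) -> float:
    """Size-weighted impurity of a candidate split of a continuous target;
    a CART-style learner keeps the split that minimises this value."""
    n = len(left) + len(right)
    return (len(left) * lsd_impurity(left) + len(right) * lsd_impurity(right)) / n


if __name__ == "__main__":
    print(gini_binary(1.0))  # 0.0: pure node
    print(gini_binary(0.5))  # 0.5: maximally mixed two-class node
    # Hypothetical project costs (MilVND), not taken from the study's data set
    costs = [1200.0, 1300.0, 9800.0, 10100.0]
    # Separating cheap from expensive projects lowers the impurity
    print(split_cost(costs[:2], costs[2:]) < lsd_impurity(costs))  # True
```

A pure node scores zero under both criteria, which is why a split that groups similar costs together is preferred.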
Ensemble model

Ensemble methods are powerful machine learning methods that integrate the best-performing models to improve overall performance. Formally, consider a mapping $g:\mathbb{R}^d \to \mathbb{R}$ from a $d$-dimensional vector of input data to a one-dimensional output $Y$. In each process, an estimated function $g_j(\cdot)$ is obtained by a particular algorithm. The linear combination in Eq. (6) is then used to obtain the ensemble-based function $g_{en}(\cdot)$:

$$g_{en}(\cdot) = \sum_{j=1}^{N} c_j\, g_j(\cdot) \tag{6}$$

where $c_j$ are the linear combination coefficients, which are defined based on the average values of the weights.

Model construction and evaluation methods
Data collection

This section consists of two phases. (1) A questionnaire was conducted to acquire preliminary data so that the main factors affecting preliminary factory construction cost could be clearly identified. Because the scope of work is factory construction cost in southern Vietnam, the questionnaire method allows information to be gathered from a large audience in a short period in a standardized way. (2) A real data set was collected from 35 industrial park projects in Vietnam for machine learning model evaluation.

In the first phase, a preliminary questionnaire was created using expert judgements and relevant suggestions from the literature; it included twenty factors collected from the literature review. A pilot study was conducted to determine the final questionnaire form. The preliminary survey targeted fifteen professionals with at least three years of experience in bidding for and constructing factory buildings in Vietnam, through face-to-face and online interviews. Because of the difficulty of contacting experts involved in factory projects and limited time, only eight responses were returned, four from bidding departments and four from construction departments. These eight specialists took part in a pilot test and suggested minor changes to the questionnaire design. The approved questionnaire was ready for use in the field after those changes were made.

The approved questionnaire consists of sixteen main questions to identify the influential factors. The field survey combined interviews and written surveys to reduce reluctant responses. The main questionnaire comprises three parts. The first part presents the survey purposes and fundamental knowledge about preliminary factory construction cost to the targeted respondents. In the second part, respondents are asked to rate the influence of each factor on a five-point Likert scale, where 1 denotes almost no influence and 5 represents extreme influence. The final part gathers respondents' information, including organization, designation, and years of experience. Unanswered questionnaires and those with identical responses to all questions were eliminated from the data, and outliers were removed via the boxplot method (Schwertman et al. 2004).

The purpose of the above process was to determine the most crucial factors that impact preliminary factory construction cost. A total of 200 questionnaires were distributed to respondents in southern Vietnam, and 178 valid responses were received, a response rate of 89%. Inadequate data were removed before the statistical analysis in SPSS V23 (Landau 2017). The critical factors were identified by mean value and Cronbach's α coefficient analysis (Hair et al. 2013). The variance inflation factor (VIF) was applied to examine multicollinearity (Hair et al. 2019; Nguyen et al. 2022): a VIF equal to or greater than 5 indicates that the regression model is likely to exhibit multicollinearity, and vice versa. The inner VIF values lie in an adequate range (1 to 1.923, all below 5); therefore, multicollinearity is not a concern. Table 1 lists the ten input variables that were used for estimating preliminary factory construction cost.

In the second phase, a data set was collected from the industrial park projects. Table 2 provides the complete input and output data sets.

K-fold cross-validation is a resampling method for assessing the performance of a machine learning model, and it tends to have lower bias than random sampling methods. This study applied fivefold validation, following the suggestion of Kohavi (1995) and considering the sample size, to keep computing time reasonable and variation small. The stratified fivefold cross-validation process consists of the following tasks: (1) dividing the sample data into five subsets, (2) selecting one subset for testing and using the remaining subsets for training, and (3) repeating model training and testing five times. Model performance is evaluated on each test subset, as shown in Figure 1, and the average result over the five testing rounds expresses the accuracy of the model under consideration.

Model construction and criteria

RapidMiner (Minerswa et al. 2001) is used to implement the predictive models for cost estimation. RapidMiner provides an easy-to-use graphical interface for executing an analytical process: users can input data, set algorithm parameters, and build models with simple button clicks. Figure 2 depicts the five steps of model construction, which are explained below.
First step (loading data): The Retrieve operator is used to access data in the repository and load them into the process.
Second step (select attributes): This mechanism uses various filter types to select attributes; subsequent processes operate only on the selected attributes.
Third step (set role): This node defines the function of the selected attributes, specifying the input and target variables.
Fourth step (cross validation): This task uses a k-fold cross-validation method to evaluate the performance of the statistical model.
Fifth step (building model): The model is built and tested in this phase.
The same five steps are used to implement the deep neural network and the ensemble model.

Model performance was measured via statistical indicators: the correlation coefficient (R), mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). R evaluates the correlation between two variables; a higher absolute value of R indicates a stronger relationship. MAPE expresses accuracy as a percentage and is based on absolute values. RMSE is the square root of the mean squared difference between estimated and actual values. MAE is the mean absolute error between estimated and actual values. Lower values of MAPE, RMSE, and MAE indicate
6 N. DANG-TRINH ET AL.
strong confidence in prediction accuracy. The mathematical formulas of these indicators are given in Eqs. (7)–(10):

$$R = \frac{n\sum y_a y_p - \sum y_a \sum y_p}{\sqrt{n\sum y_a^2 - \left(\sum y_a\right)^2}\,\sqrt{n\sum y_p^2 - \left(\sum y_p\right)^2}} \tag{7}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_p - y_a}{y_a}\right| \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_p - y_a\right)^2} \tag{9}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_p - y_a\right| \tag{10}$$

where $y_a$ represents an actual value, $y_p$ denotes the predicted value, and $n$ is the number of samples.

Experimental results and discussion
Results and analysis

Table 3 reports the statistical performance of the designed ensemble model, the DNN, and the single predictive approaches (ANN, SVM, CART, GENLIN, and CHAID). The average and standard deviation of each indicator summarize cross-fold performance over the testing folds. Among the commonly used single predictive models, ANN achieved the best accuracy, and the deep neural network outperformed all baseline models in predicting preliminary factory construction cost. Notably, the ensemble of the two single models with the lowest RMSE and MAE (ANN + GENLIN) was superior to the baseline models in all cases. The ensemble model achieved an RMSE of 4798.831 MilVND, an MAE of 3915.893 MilVND, a MAPE of 20.16%, and an R of 0.932. This MAE is acceptable for projects with a large total investment but represents a considerable error for projects with a small total investment. The correlation coefficient of 0.932 is relatively high, indicating a good linear correlation between actual and estimated values.

Table 3. Comparisons of machine learning approaches.
Model | RMSE (million VND) Avg. | RMSE Std. | MAE (million VND) Avg. | MAE Std. | MAPE (%) Avg. | MAPE Std. | R Avg. | R Std.
ANN | 5661.965 | 479.25 | 4153.246 | 425.31 | 22.49 | 5.74 | 0.910 | 0.125
SVM | 12,162.688 | 981.53 | 10,538.749 | 7831.26 | 88.78 | 8.76 | 0.763 | 0.167
CART | 7272.326 | 717.18 | 5174.612 | 625.15 | 32.16 | 7.89 | 0.861 | 0.159
GENLIN | 6559.556 | 596.21 | 4890.241 | 515.38 | 35.54 | 7.17 | 0.849 | 0.148
CHAID | 7364.911 | 754.29 | 5352.942 | 629.31 | 31.97 | 6.98 | 0.894 | 0.157
DNN | 4911.216 | 415.79 | 4019.731 | 376.84 | 21.70 | 4.98 | 0.921 | 0.109
Ensemble model | 4798.831 | 402.87 | 3915.893 | 381.29 | 20.16 | 4.25 | 0.932 | 0.093
Note: Avg. = average; Std. = standard deviation.

Figure 3 presents the actual and predicted values obtained by the best machine learning model (Ensemble). The largest and smallest absolute deviations between actual and predicted values are 5988.5 MilVND and 92.4 MilVND, respectively. The horizontal axis denotes the index of instances in the testing data of all folds; the vertical axis represents the preliminary factory construction cost.

Computational time is another important indicator for model evaluation. For a fair comparison, all models were run on the same hardware platform and their CPU time was recorded. The ANN needed the least computation time, an average of 5 seconds to estimate preliminary factory construction cost, followed by SVM (6 s), CART (7 s), DNN (7 s), GENLIN (8 s), and CHAID (8 s). These models were built in RapidMiner Studio with optimized parameters. Thus, they provide a good basis for developing a cost estimation system.
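Eqs. (7)–(10) translate directly into code. The following is a minimal plain-Python restatement for readers who want to reproduce the indicators outside RapidMiner; it assumes MAPE is scaled by 100 to match the percentages in Table 3 and that no actual value is zero. The helper names are hypothetical, and the numbers in the usage block are invented for illustration, not drawn from the study's data set.

```python
import math
from typing import Sequence


def _validate(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before computing any indicator."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def correlation_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (7): Pearson correlation; undefined if either series is constant."""
    _validate(actual, predicted)
    n = len(actual)
    sum_a, sum_p = sum(actual), sum(predicted)
    sum_ap = sum(a * p for a, p in zip(actual, predicted))
    sum_aa = sum(a * a for a in actual)
    sum_pp = sum(p * p for p in predicted)
    numerator = n * sum_ap - sum_a * sum_p
    denominator = (math.sqrt(n * sum_aa - sum_a ** 2)
                   * math.sqrt(n * sum_pp - sum_p ** 2))
    return numerator / denominator


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (8) scaled by 100 so the result is a percentage (as in Table 3)."""
    _validate(actual, predicted)
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (9): square root of the mean squared error."""
    _validate(actual, predicted)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (10): mean absolute error."""
    _validate(actual, predicted)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical costs in MilVND, for illustration only
    actual = [100.0, 200.0, 300.0, 400.0]
    predicted = [110.0, 190.0, 320.0, 380.0]
    print(f"R={correlation_r(actual, predicted):.3f}  "
          f"MAPE={mape(actual, predicted):.2f}%  "
          f"RMSE={rmse(actual, predicted):.2f}  MAE={mae(actual, predicted):.2f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE; a wide gap between the two, as for SVM in Table 3, signals a few large misses.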
Figure 3. The actual values and predicted values obtained by the best model.
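Figure 3 pools the test-fold predictions of the ensemble. The fivefold cross-validation loop and the linear-combination ensemble of Eq. (6) behind such predictions can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the RapidMiner workflow used in the study: the two base learners (a one-feature least-squares line and a k-nearest-neighbour average) are hypothetical stand-ins for GENLIN and ANN, the coefficients $c_j$ are taken as equal ($1/N$), folds are shuffled at random rather than stratified, and a single input feature is assumed for brevity.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# A fitted model maps one input value to a predicted cost; a learner fits a model.
Model = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Model]


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """One-feature least-squares line (hypothetical stand-in for GENLIN
    with a Gaussian distribution and identity link)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs: Sequence[float], ys: Sequence[float], k: int = 3) -> Model:
    """k-nearest-neighbour average (hypothetical nonlinear stand-in for ANN)."""
    pairs = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(pairs, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def make_ensemble(learners: Sequence[Learner]) -> Learner:
    """Eq. (6) with equal coefficients c_j = 1/N (an assumption)."""
    def fit(xs: Sequence[float], ys: Sequence[float]) -> Model:
        models = [learn(xs, ys) for learn in learners]
        c = 1.0 / len(models)
        return lambda x: sum(c * m(x) for m in models)
    return fit


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate_rmse(learner: Learner, xs: Sequence[float],
                        ys: Sequence[float], k: int = 5,
                        seed: int = 0) -> Tuple[float, float]:
    """Train on k-1 folds, test on the held-out fold, repeat k times; return
    the mean and sample standard deviation of the per-fold RMSE."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = learner([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq_errors = [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
        scores.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Synthetic, hypothetical data: 35 "projects" with one input feature
    rng = random.Random(1)
    xs = [rng.uniform(1.0, 50.0) for _ in range(35)]
    ys = [300.0 * x + rng.gauss(0.0, 500.0) for x in xs]
    for name, learner in [("linear", fit_linear), ("knn", fit_knn),
                          ("ensemble", make_ensemble([fit_linear, fit_knn]))]:
        mean_rmse, std_rmse = cross_validate_rmse(learner, xs, ys)
        print(f"{name}: RMSE avg={mean_rmse:.1f}, std={std_rmse:.1f}")
```

With 35 projects, each of the five folds holds seven test cases, and the Avg./Std. pair returned here mirrors the layout of Table 3. Swapping in real learners only requires that each one follow the same `(xs, ys) -> model` shape.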
learning model and neglecting the analysis of cost performance response. Nevertheless, the collected real dataset is useful for further research.

Disclosure statement
No potential competing interest was reported by the authors.

Funding
This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number DS2022-20-01.

References
Al-Tawal DR, Arafah M, Sweis GJ. 2021. A model utilizing the artificial neural network in cost estimation of construction projects in Jordan. Eng Constr Archit Manage. 28(9):2466–2488.
Ambrule VR, Bhirud AN. 2017. Use of artificial neural network for pre design cost estimation of building projects. Int J Recent Innov Trends Comput Commun. 5(2):173–176.
An S-H, Kim G-H, Kang K-I. 2007. A case-based reasoning cost estimating model using experience by analytic hierarchy process. Build Environ. 42(7):2573–2579.
An S-H, Yeol Park U, Kang K-I, Cho M-Y, Cho H-H. 2007. Application of support vector machines in assessing conceptual cost estimates. J Comput Civ Eng. 21(4):259–264.
Arabzadeh V, Niaki STA, Arabzadeh V. 2018. Construction cost estimation of spherical storage tanks: artificial neural networks and hybrid regression-GA algorithms. J Ind Eng Int. 14(4):747–756.
Asghari V, Hsu S-C, Wei H-H. 2021. Expediting life cycle cost analysis of infrastructure assets under multiple uncertainties by deep neural networks. J Manage Eng. 37(6):04021059.
Bodendorf F, Merbele S, Franke J. 2021. Deep learning based cost estimation of circuit boards: a case study in the automotive industry. Int J Prod Res. 1–22. doi:10.1080/00207543.2021.1998698.
Breiman L, Friedman JH, Olshen RA, Stone CJ. 1984. Classification and regression trees. New York: Chapman and Hall/CRC.
Cao Y, Ashuri B, Baek M. 2018. Prediction of unit price bids of resurfacing highway projects through ensemble machine learning. J Comput Civil Eng. 32(5):04018043.
CBRE. 2020. Vietnam industrial market time for a critical makeover. Vietnam: CBRE.
Cheng M-Y, Hoang N-D, Wu Y-W. 2013. Hybrid intelligence approach based on LS-SVM and Differential Evolution for construction cost index estimation: A Taiwan case study. Automat Construct. 35:306–313.
Cheng M-Y, Roy AFV. 2010. Evolutionary fuzzy decision model for construction management using support vector machine. Expert Syst Appl. 37(8):6061–6069.
Cheng M-Y, Tsai H-C, Sudjono E. 2010. Conceptual cost estimates using evolutionary fuzzy hybrid neural network for projects in construction industry. Expert Syst Appl. 37(6):4224–4231.
Chou J-S, Fleshman D-B, Truong D-N. 2022. Comparison of machine learning models to provide preliminary forecasts of real estate prices. J Hous Built Environ. doi:10.1007/s10901-022-09937-1.
Cong TD, Minh QN. 2020. Estimating the construction schools cost in Ho Chi Minh City using artificial neural network. Hanoi, Vietnam: IOP Conference Series: Materials Science and Engineering. p. 869.
Das P, Patty S, Malakar T, Rani N, Saha S, Barman D. 2022. A hybrid regression based forecasting model for estimating the cost of wind energy production. IFAC-PapersOnLine. 55(1):795–800.
El-Sawah H, Moselhi O. 2014. Comparative study in the use of neural networks for order of magnitude cost estimating in construction. J Inform Technol Construct. 19:462–473.
Elmousalami HH. 2020. Artificial intelligence and parametric construction cost estimate modeling: state-of-the-art review. J Constr Eng Manage. 146(1):03119008.
Elmousalami HH. 2021. Comparison of artificial intelligence techniques for project conceptual cost prediction: a case study and comparative analysis. IEEE Trans Eng Manage. 68(1):183–196.
ElSawy I, Hosny H, Razek MA. 2011. A neural network model for construction projects site overhead cost estimating in Egypt. Int J Comput Sci. 3(8):273–283.
Enshassi A, Mohamed S, Abdel-Hadi M. 2013. Factors affecting the accuracy of pre-tender cost estimates in the Gaza Strip. J Construct Dev Countries. 18(1):73–94.
Ganorkar AB, Lakhe RR, Agrawal KN. 2017. Cost estimation techniques in manufacturing industry: concept, evolution and prospects. Int J Econ Account. 8(3–4):303–336.
Günaydın HM, Doğan SZ. 2004. A neural network approach for early cost estimation of structural systems of buildings. Int J Project Manage. 22(7):595–602.
Hair JF, Black WC, Babin BJ, Anderson RE. 2013. Multivariate data analysis. USA: Pearson Prentice Hall publishing.
Hair JF, Sarstedt M, Ringle CM. 2019. Rethinking some of the rethinking of partial least squares. Eur J Market. 53(4):566–584.
HongWei M. 2009. An improved support vector machine based on rough set for construction cost prediction. In: 2009 International forum on computer science-technology and applications; Chongqing, China. IEEE; p. 3–6.
Hyung W-G, Kim S, Jo J-K. 2019. Improved similarity measure in case-based reasoning: a case study of construction cost estimation. Eng Constr Archit Manage. 27(2):561–578.
Juszczyk M, Lesniak A, Zima K. 2018. ANN based approach for estimation of construction costs of sports fields. Complexity. 2018:1–11.
Karatas Y, Ince F. 2016. Fuzzy expert tool for small satellite cost estimation. IEEE Aerosp Electron Syst Mag. 31(5):28–35.
Kass GV. 1980. An exploratory technique for investigating large quantities of categorical data. J R Stat Soc Ser C (Appl Stat). 29(2):119–127.
Kim S. 2013. Hybrid forecasting system based on case-based reasoning and analytic hierarchy process for cost estimation. J Civil Eng Manage. 19(1):86–96.
Kohavi R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th international joint conference on artificial intelligence - Vol. 2. Montreal, Quebec, Canada: Morgan Kaufmann Publishers Inc. p. 1137–1143.
Kwon N, Park M, Lee H-S, Ahn J, Kim S. 2017. Construction noise prediction model based on case-based reasoning in the preconstruction phase. J Constr Eng Manage. 143(6):04017008.
Landau S. 2017. A handbook of statistical analysis using SPSS. Washington D.C.: CRC Press LLC.
Lotfy EA, Mohamed AS. 2002. Applying neural networks in case-based reasoning adaptation for cost assessment of steel buildings. Int J Comput Appl. 24(1):28–38.
Lowe DJ, Emsley MW, Harding A. 2006. Predicting construction cost using multiple regression techniques. J Constr Eng Manage. 132(7):750–758.
Marzouk M, Elkadi M. 2016. Estimating water treatment plants costs using factor analysis and artificial neural networks. J Clean Prod. 112:4540–4549.
Maya R, Hassan B, Hassan A. 2021. Develop an artificial neural network (ANN) model to predict construction projects performance in Syria. J King Saud Univ Eng Sci.
Meharie MG, Mengesha WJ, Gariy ZA, Mutuku RNN. 2021. Application of stacking ensemble machine learning algorithm in predicting the cost of highway construction projects. Eng Constr Archit Manage. doi:10.1108/ECAM-02-2020-0128.
Minerswa I, Klinkenberg R, Fischer S. 2001. RapidMiner. Germany: University of Dortmund.
Mohamed A, Celik T. 2002. Knowledge based-system for alternative design, cost estimating and scheduling. Knowledge Based Syst. 15(3):177–188.
Nasrazadani H, Mahsuli M, Talebiyan H, Kashani H. 2017. Probabilistic modeling framework for prediction of seismic retrofit cost of buildings. J Constr Eng Manage. 143(8):04017055.
Nelder JA, Wedderburn RWM. 1972. Generalized linear models. J R Stat Soc Ser A (Gen). 135(3):370–384.
Nguyen T-T-N, Anh Nguyen T, Tien Do S, Nguyen VT. 2022. Assessing stakeholder behavioural intentions of BIM uses in Vietnam's construction projects. Int J Construct Manage. 1–9. doi:10.1080/15623599.2022.2051241.
Ning F, Shi Y, Cai M, Xu W, Zhang X. 2020. Manufacturing cost estimation based on a deep-learning method. J Manufact Syst. 54:186–195.
Petruseva S, Sherrod P, Pancovska VZ, Petrovski A. 2016. Predicting bidding price in construction using support vector machine. TEM J. 5(5):143–151.
Pettang C, Mbumbia L, Foudjet A. 1997. Estimating building materials cost in urban housing construction projects, based on matrix calculation: the case of Cameroon. Construct Build Mater. 11(1):47–55.
Pham TQD, Le-Hong T, Tran XV. 2021. Efficient estimation and optimization of building costs using machine learning. Int J Construct Manage. 1–13. doi:10.1080/15623599.2021.1943630.
Sayed M, Abdel-Hamid M, El-Dash K. 2020. Improving cost estimation in construction projects. Int J Construct Manage. 1–20. doi:10.1080/15623599.2020.1853657.
Schwertman NC, Owens MA, Adnan R. 2004. A simple more general boxplot method for identifying outliers. Comput Stat Data Anal. 47(1):165–174.
Shartooh Sharqi S, Bhattarai A. 2021. Evaluation of several machine learning models for field canal improvement project cost prediction. Complexity. 2021:1–12.
Shoar S, Chileshe N, Edwards JD. 2022. Machine learning-aided engineering services' cost overruns prediction in high-rise residential building projects: Application of random forest regression. J Build Eng. 50:104102.
Son H, Kim C, Kim C. 2012. Hybrid principal component analysis and support vector machine model for predicting the cost performance of commercial building projects using pre-project planning variables. Automat Construct. 27:60–66.
Stoy C, Pollalis S, Schalcher H-R. 2008. Drivers for cost estimating in early design: case study of residential construction. J Constr Eng Manage. 134(1):32–39.
Sut N, Simsek O. 2011. Comparison of regression tree data mining methods for prediction of mortality in head injury. Expert Syst Appl. 38(12):15534–15539.
Truong LV, Soo-Yong K. 2009. Neural network model for construction cost prediction of apartment projects in Vietnam. Korean J Construct Eng Manage. 10(3):139–147.
Vapnik VN. 1995. The nature of statistical learning theory. New York, NY: Springer-Verlag.
Williams TP, Gong J. 2014. Predicting construction cost overruns using text mining, numerical data and ensemble classifiers. Automat Construct. 43:23–29.
Yang S, Xu J. 2010. The application of fuzzy system method to the cost estimation of construction works. In: 2010 International conference on machine learning and cybernetics; Qingdao, China. IEEE; p. 654–658.
Zhai K, Jiang N, Pedrycz W. 2013. Cost prediction method based on an improved fuzzy model. Int J Adv Manuf Technol. 65(5–8):1045–1053.
Zhu WJ, Feng WF, Zhou YG. 2010. The application of genetic fuzzy neural network in project cost estimate. In: 2010 International conference on e-product e-service and e-entertainment; Henan, China. IEEE.