Automatic_crack_classification_on_asphalt_pavement
Table 1 provides a comprehensive summary of various studies utilizing transfer learning (TL) techniques for crack
detection and classification. Although TL methods have achieved significant success in this domain, a critical
review of the literature reveals several limitations. Notably, only a limited number of studies have undertaken a
systematic evaluation of multiple pre-trained CNN models, highlighting the need for a more thorough investigation
into their comparative performance in crack classification. Furthermore, prior research has addressed the
optimization of pre-trained CNN architectures only narrowly, with most efforts confined to applying existing
optimization algorithms to enhance these models. A detailed comparative analysis of pre-trained CNN models, however, can
yield valuable insights into the potential of fine-tuning their architectural layers to improve performance. This
approach not only refines the models but also contributes to the development of robust, automated TL-based
systems for crack classification.
To address these gaps, the present study goes beyond the standard application of transfer learning by emphasizing
comprehensive model evaluation, layer optimization, and generalizability. Unlike prior works that primarily test
pre-existing architectures, this study explores the structural adaptability of pre-trained CNNs, demonstrating how
fine-tuning specific layers can significantly improve model performance. Additionally, the study introduces a
robust methodology for selecting optimal architectures tailored to crack detection, which fills a critical gap in the
literature.
2. METHODOLOGY
This empirical research adopted a positivist philosophy and deductive reasoning (Edwards et al., 2020) to analyze
secondary image data obtained from open-source databases to accurately model the phenomenon under
investigation. Such an approach has been extensively used previously to evaluate risk factors impacting upon
public-private partnerships (Kukah et al., 2022); assess the risk associated with sustainable housing (Adabre et al.,
2022); and model construction machinery stability (Edwards et al., 2019). Therefore, this well-established
Figure 1: A transfer learning strategy using a trained CNN for identifying asphalt pavement cracks.
(1)
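Precision, shown in equation (2), is another metric; it calculates the proportion of positive predictions that are
correct and therefore reflects the accuracy of the model's positive predictions (Ilse et al., 2020).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
where TP and FP are the numbers of true-positive and false-positive predictions, respectively.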
Recall as shown in equation (3) is a metric that measures the proportion of correct positive predictions out of all
possible positive instances. In contrast to precision, which reflects the accuracy of positive predictions, recall
provides insight into the number of missed positive predictions. Therefore, recall gives an indication of the model's
coverage of the positive class (Ilse et al., 2020).
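$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
where TP and FN are the numbers of true-positive and false-negative predictions, respectively.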
The F1-score combines both precision and recall into a single metric, effectively capturing both aspects of model
performance. After calculating precision and recall for either a binary or multiclass classification task, these two
metrics are used to compute the F1-score (Ilse et al., 2020) as shown in equation (4).
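$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$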
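As a minimal sketch of the precision, recall and F1-score definitions above, the following Python functions compute each metric from counts of true positives (TP), false positives (FP) and false negatives (FN). The function names and the example counts are illustrative assumptions and are not taken from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Proportion of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Proportion of actual positive instances that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for a single class (not results from the study).
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Returning 0.0 when a denominator is zero is one common convention; other conventions (for example, raising an error) are equally defensible.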
Classification problems frequently involve multi-class classification, in which instances are assigned to one of
three or more classes (Ilse et al., 2020). For multi-class classification, precision, recall and F1-score can each be
averaged in three ways, viz.: ‘macro’, ‘micro’ and ‘weighted’. In macro averaging, the multi-class predictions are
reduced to a set of binary predictions, the corresponding metric is determined for each of the binary cases,
and the results are averaged over the k classes (Luo & Uzuner, 2014), as shown in equations (5), (6) and (7).
(5)
(6)
(7)
Micro averaging treats the entire dataset as a single aggregate and calculates one metric, rather than k metrics
that are then averaged together. Weighted averaging, like macro averaging, computes the metric for each class, but
it takes a mean weighted by the number of instances in each class and thereby accounts for label imbalance
(Luo & Uzuner, 2014), as shown in equations (8), (9), and (10).
(8)
(9)
(10)
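The three averaging schemes described above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the confusion matrix has true classes as rows and predicted classes as columns, the macro and weighted F1-scores are taken as means of the per-class F1-scores (one common convention, which may differ from the study's own formulation), and the example matrix is hypothetical.

```python
import numpy as np


def averaged_scores(cm) -> dict:
    """Macro, micro and weighted (precision, recall, F1) from a k x k
    confusion matrix (rows: true classes, columns: predicted classes)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp      # predicted as class i, but another class
    fn = cm.sum(axis=1) - tp      # belongs to class i, but predicted otherwise
    support = cm.sum(axis=1)      # number of true instances per class

    def safe_div(a, b):
        # Element-wise division that yields 0 where the denominator is 0.
        return np.divide(a, b, out=np.zeros_like(a), where=b != 0)

    p = safe_div(tp, tp + fp)
    r = safe_div(tp, tp + fn)
    f1 = safe_div(2 * p * r, p + r)

    # Micro: pool the counts over all classes, then compute one metric.
    micro_p = tp.sum() / (tp.sum() + fp.sum())
    micro_r = tp.sum() / (tp.sum() + fn.sum())
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

    # Weighted: per-class metrics averaged with class-frequency weights.
    w = support / support.sum()
    return {
        "macro": (p.mean(), r.mean(), f1.mean()),
        "micro": (micro_p, micro_r, micro_f1),
        "weighted": ((w * p).sum(), (w * r).sum(), (w * f1).sum()),
    }


# Hypothetical 3-class confusion matrix (not data from the study).
cm = [[8, 2, 0],
      [1, 7, 2],
      [0, 1, 9]]
scores = averaged_scores(cm)
for name, (prec, rec, f1) in scores.items():
    print(f"{name}: precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

For single-label multi-class problems, micro-averaged precision and recall both equal overall accuracy, which is why the three micro values coincide in this example.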
For these pre-trained models, the raw images were pre-processed by resizing them to 224 × 224 pixels and
normalising them. The models were trained with a learning rate of 0.001 for 10 epochs and a batch size of 64.
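The pre-processing step and training settings described above can be sketched as follows. This is a minimal NumPy illustration, not the study's pipeline: nearest-neighbour resizing and scaling of pixel values to [0, 1] are assumptions (the interpolation and normalisation schemes are not specified), and the constant and function names are hypothetical.

```python
import numpy as np

# Settings reported in the text.
IMG_SIZE = 224
LEARNING_RATE = 0.001
EPOCHS = 10
BATCH_SIZE = 64


def preprocess(image: np.ndarray, size: int = IMG_SIZE) -> np.ndarray:
    """Resize an H x W x C uint8 image to size x size (nearest neighbour,
    an assumption) and scale pixel values to [0, 1] (also an assumption)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    resized = image[rows][:, cols]
    return resized.astype(np.float32) / 255.0


# Hypothetical input: a random image standing in for a pavement photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(375, 500, 3), dtype=np.uint8)
x = preprocess(image)
print(x.shape, x.dtype, float(x.min()), float(x.max()))
```

In practice a deep learning framework's own resizing and model-specific pre-processing utilities would normally replace this hand-written function.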
2.4 Dataset
The study utilised public datasets obtained from GitHub. The main dataset is CRACK500, which consists of 500
images, each approximately 2,000 × 1,500 pixels in size. The images were taken on the main campus of Temple
University using mobile phones. Yang et al. (2019) increased the number of images by splitting each image into
16 non-overlapping regions and retaining only the regions that contained more than 1,000 crack pixels. This
method increased the size of the training dataset by incorporating 1,896 additional images. To evaluate the
pre-trained CNN models, additional datasets from GitHub were combined with the primary dataset to create a set of
2,380 pavement crack images. The images are divided into four groups, representing the four types of crack
(longitudinal, horizontal, diagonal and alligator), as shown in Table 3. A random selection of 20% of the
collected images was pooled to create the test set (see Table 3).
Table 3: Dataset groups summary.
Dataset   Longitudinal   Horizontal   Diagonal   Alligator   Total
Train     547            546          543        209         1845
Test      160            160          160        55          535
Total     707            706          703        264         2380
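The region-splitting step described for CRACK500 (16 non-overlapping regions per image, keeping only regions with more than 1,000 crack pixels) can be sketched as follows. The sketch assumes a 4 × 4 grid and the availability of a binary crack mask for each image; the function name, the handling of leftover edge pixels and the synthetic example are illustrative assumptions rather than the procedure of Yang et al. (2019).

```python
import numpy as np


def split_into_patches(image, mask, grid=4, min_crack_pixels=1000):
    """Split an image and its binary crack mask into grid x grid
    non-overlapping regions and keep the image regions whose mask
    contains more than min_crack_pixels crack pixels."""
    h, w = mask.shape[:2]
    ph, pw = h // grid, w // grid   # any remainder pixels at the edges are dropped
    kept = []
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * ph, (i + 1) * ph)
            cols = slice(j * pw, (j + 1) * pw)
            if np.count_nonzero(mask[rows, cols]) > min_crack_pixels:
                kept.append(image[rows, cols])
    return kept


# Synthetic example: one region with 1,600 crack pixels, one with 400.
image = np.zeros((1500, 2000, 3), dtype=np.uint8)
mask = np.zeros((1500, 2000), dtype=np.uint8)
mask[100:140, 100:140] = 1   # 40 x 40 = 1,600 pixels, region is kept
mask[400:420, 600:620] = 1   # 20 x 20 = 400 pixels, region is discarded
patches = split_into_patches(image, mask)
print(len(patches), patches[0].shape)
```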
3. RESULTS
This section presents and discusses the results of the study, including the loss and accuracy metrics.
3.1 Loss
Figure 2: Loss curves of the training and validation for (a) DenseNet121, (b) EfficientNetB0, (c) EfficientNetB3,
(d) MobileNet, and (e) MobileNetV2.
The loss value fluctuated during the first few epochs but, as the number of epochs increased, it decreased and proceeded to
Figure 3: Accuracy metrics of the training and validation for (a) DenseNet121, (b) EfficientNetB0, (c)
EfficientNetB3, (d) MobileNet, and (e) MobileNetV2.
The results indicate that EfficientNet-B3 achieved the highest recall at 94%, followed by DenseNet-121 at
92%, EfficientNet-B0 at 89%, MobileNet at 86% and MobileNetV2 at 84%. The F1-score is a performance measure
that integrates precision and recall into a single value, presenting a balanced assessment of a model's accuracy
in classification tasks. As shown in Table 4, EfficientNet-B3 achieved the highest F1-score at 94%, followed by
DenseNet-121 at 92%, EfficientNet-B0 at 89%, MobileNet at 86%, and MobileNetV2 at 84%.
The accuracy performance parameters for the pre-trained CNNs are shown and compared in Table 4. Notably,
4. DISCUSSION
The pre-trained CNN models classified the classes of the evaluated dataset with varying degrees of accuracy. A
confusion matrix illustrates how well a trained model performs when categorising the various classes. Figure 4
(a-e) shows the normalised confusion matrices. According to these matrices, the pre-trained models performed
very effectively on the diagonal, longitudinal and alligator crack images. However, misclassifications appeared
on the horizontal crack class for EfficientNetB0.
Moreover, the confusion matrices show that alligator cracks are classified less accurately by the DenseNet121 and
MobileNetV2 models. This could be because the alligator class contains fewer images than the other classes (264 of
the 2,380 images; see Table 3).
Figure 4: Normalised confusion matrix for (a) DenseNet121, (b) EfficientNetB0, (c) EfficientNetB3, (d) MobileNet,
and (e) MobileNetV2.
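A normalised confusion matrix of the kind shown in Figure 4 can be computed as sketched below. Row normalisation (each row divided by the number of true instances of that class) is an assumption, since the normalisation scheme is not stated. The row totals in the example follow the test-set sizes in Table 3, but the individual counts are hypothetical and are not the study's results.

```python
import numpy as np


def normalise_confusion_matrix(cm) -> np.ndarray:
    """Divide each row by its total so that entry (i, j) is the fraction
    of true class i instances that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    # Rows with no instances are left as zeros instead of dividing by zero.
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums != 0)


# Class order: longitudinal, horizontal, diagonal, alligator.
# Hypothetical counts; row totals (160, 160, 160, 55) match the test set.
cm = [[150, 5, 5, 0],
      [8, 144, 8, 0],
      [4, 4, 152, 0],
      [3, 2, 0, 50]]
norm = normalise_confusion_matrix(cm)
print(np.round(norm, 3))
```

With row normalisation the diagonal holds the per-class recall, so a smaller class such as alligator is judged on the same scale as the larger classes.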
In this study, some modifications to the EfficientNetB3 architecture are proposed to enhance model performance.
Specifically, the top layers of the model were replaced with dense, batch normalization, and dropout layers, as
suggested by Ali et al. (2022). These modifications were applied to the B3 base architecture, utilizing the Swish
activation function for the dense (fully connected) layers, as recommended by Ramachandran et al. (2018) and
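The kind of replacement head described above (a dense layer with Swish activation, batch normalization, dropout, and a four-class output) can be sketched as a NumPy forward pass. This is an illustration under stated assumptions, not the study's architecture: the hidden width of 256, the dropout rate of 0.3, the random weights and all function names are hypothetical, and the 1,536-dimensional input is assumed to be the pooled feature vector of the B3 base.

```python
import numpy as np


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def head_forward(features, params, drop_rate=0.3, training=False, rng=None):
    """Dense (Swish) -> batch normalization -> dropout -> softmax output."""
    x = swish(features @ params["w1"] + params["b1"])
    # Simplification: stored statistics are used in both modes, whereas a
    # real layer would use batch statistics during training.
    x = params["gamma"] * (x - params["mean"]) / np.sqrt(params["var"] + 1e-3)
    x = x + params["beta"]
    if training:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(x.shape) >= drop_rate
        x = x * keep / (1.0 - drop_rate)     # inverted dropout
    return softmax(x @ params["w2"] + params["b2"])


# Hypothetical sizes and random weights, for shape checking only.
rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 1536, 256, 4
params = {
    "w1": rng.normal(0.0, 0.05, (n_features, n_hidden)),
    "b1": np.zeros(n_hidden),
    "gamma": np.ones(n_hidden), "beta": np.zeros(n_hidden),
    "mean": np.zeros(n_hidden), "var": np.ones(n_hidden),
    "w2": rng.normal(0.0, 0.05, (n_hidden, n_classes)),
    "b2": np.zeros(n_classes),
}
features = rng.normal(size=(2, n_features))
probs = head_forward(features, params)
print(probs.shape, probs.sum(axis=1))
```

In a deep learning framework the same head would be built from the framework's dense, batch normalization and dropout layers on top of the frozen or partially unfrozen base model.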
The theoretical contributions of the study lie in its innovative use of convolutional neural networks (CNNs) and
transfer learning to address the critical task of asphalt pavement crack classification. This research addresses gaps
in current methodologies by systematically evaluating five pre-trained CNN models: EfficientNet-B0,
EfficientNet-B3, DenseNet-121, MobileNet, and MobileNetV2, all fine-tuned for crack detection tasks using the
CRACK500 dataset. The study's results highlight EfficientNet-B3 as the most effective model, achieving a 96% F1
score and 96% accuracy after applying advanced transfer learning techniques.
Key Theoretical Contributions:
1. Transfer Learning Optimization: By leveraging ImageNet pre-trained weights and fine-tuning the CNN
layers, the study demonstrates the effectiveness of transfer learning for domain-specific tasks. This
approach minimizes the need for extensive domain-specific data, a common challenge in pavement
inspection.
2. Evaluation of Multiple Models: The comparison of five CNN architectures provides a deeper
understanding of their strengths and weaknesses in multiclass classification. The study highlights
EfficientNet-B3’s architecture as optimal due to its ability to balance performance and computational
efficiency.
3. Scalability and Generalization: The study establishes a foundation for future research by demonstrating
that transfer learning techniques can generalize well to different types of cracks (e.g., longitudinal,
diagonal). It sets a roadmap for expanding datasets and testing additional pre-trained models to enhance
robustness further.
4. Integration into Practical Applications: Beyond theoretical advancements, the study offers insights into
integrating AI-driven crack detection into real-world workflows, such as automated road inspections,
reducing manual effort and improving accuracy.
By combining theoretical rigor with practical implications, this research provides a scalable framework that can
be built upon for broader pavement distress classifications and applied in various infrastructure management
scenarios.
5. CONCLUSION
Convolutional neural networks (CNNs) have quickly emerged as a prominent and effective method for crack
detection. While numerous studies have explored crack detection using CNNs, limited research has focused on
leveraging pre-trained models for this purpose. This study applied transfer learning to classify types of asphalt
pavement cracks. The main contributions of this research are: 1) the evaluation of five pre-trained CNN models,
originally trained on the ImageNet dataset, on the collected asphalt pavement crack dataset; 2) the assessment of
REFERENCES
Adabre, M. A., Chan, A. P. C., Edwards, D. J., & Osei-Kyei, R. (2022). To build or not to build, that is the
uncertainty: Fuzzy synthetic evaluation of risks for sustainable housing in developing economies. Cities,
125, 103644. https://doi.org/10.1016/j.cities.2022.103644
Ali, K., Shaikh, Z. A., Khan, A. A., & Laghari, A. A. (2022). Multiclass skin cancer classification using
EfficientNets – a first step towards preventing skin cancer. Neuroscience Informatics, 2(4), 100034.
https://doi.org/10.1016/j.neuri.2021.100034
Augustauskas, R., & Lipnickas, A. (2020). Improved Pixel-Level Pavement-Defect Segmentation Using a Deep
Autoencoder. Sensors, 20(9), 2557. https://www.mdpi.com/1424-8220/20/9/2557
Baduge, S. K., Thilakarathna, S., Perera, J. S., Ruwanpathirana, G. P., Doyle, L., Duckett, M., Lee, J., Saenda, J.,
& Mendis, P. (2023). Assessment of crack severity of asphalt pavements using deep learning algorithms
and geospatial system. Construction and Building Materials, 401, 132684.
https://doi.org/10.1016/j.conbuildmat.2023.132684
O'Brien, D., Osborne, J. A., Perez-Duenas, E., Cunningham, R., & Li, Z. (2023). Automated crack
classification for the CERN underground tunnel infrastructure using deep learning. Tunnelling and
Underground Space Technology, 131, 104668. https://doi.org/10.1016/j.tust.2022.104668
Canestrari, F., & Ingrassia, L. P. (2020). A review of top-down cracking in asphalt pavements: Causes, models,
experimental tools and future challenges. Journal of Traffic and Transportation Engineering (English
Edition), 7(5), 541-572. https://doi.org/10.1016/j.jtte.2020.08.002
Cha, Y.-J., Choi, W., & Büyüköztürk, O. (2017). Deep Learning-Based Crack Damage Detection Using
Convolutional Neural Networks. Computer-Aided Civil and Infrastructure Engineering, 32(5), 361-378.
https://doi.org/10.1111/mice.12263
Choi, W., & Cha, Y. J. (2020). SDDNet: Real-Time Crack Segmentation. IEEE Transactions on Industrial
Electronics, 67(9), 8016-8025. https://doi.org/10.1109/TIE.2019.2945265