Road Damage Detection
Bachelor of Technology
in Computer Science & Engineering
August, 2021
CERTIFICATE
This is to certify that Ajit Raj Shekhar, Chandan Yadav, Rahul Kumar Yadav Kausal and
Shreya Pandey have carried out the project work presented in the project report entitled
"Road Damage Detection", submitted to Ashoka Institute of Technology & Management,
Varanasi in partial fulfillment of the requirement for the award of the degree of
Bachelor of Technology in Computer Science & Engineering. It is a record of the
candidates' own work carried out by them under my supervision. The matter embodied in
this report is original and has not been submitted for the award of any other degree.
Forwarded By:

Mr. Gaurav Ojha
Assistant Professor
(Department of CSE)

Mr. Arvind Kumar
Assistant Professor & Head of Department
(Department of CSE)

Date: ………………….
DECLARATION
We hereby declare that the project entitled "Road Damage Detection", submitted by us
in partial fulfillment of the requirement for the award of the degree of Bachelor of
Technology (Computer Science & Engineering) of Dr. A.P.J. Abdul Kalam
Technical University, Lucknow, is a record of our own work carried out under the
supervision and guidance of Mr. Gaurav Ojha.
To the best of our knowledge, this project has not been submitted to any other
university or institute for the award of a degree.
Signature : Signature :
Name : Ajit Raj Shekhar Name : Chandan Yadav
Roll No. : 1764110005 Roll No. : 1764110018
Date : Date :
Signature : Signature :
Name : Rahul Kumar Yadav Kausal Name : Shreya Pandey
Roll No. : 1764110037 Roll No. : 1864110910
Date : Date :
ACKNOWLEDGEMENT
In carrying out our project, we received the help and guidance of some respected
persons, who deserve our gratitude. The completion of this project gives us much
pleasure.

We take this opportunity to express our deep gratitude and regard to our guide Mr.
Gaurav Ojha, Assistant Professor, Department of Computer Science & Engineering, Ashoka
Institute of Technology & Management, Varanasi, for his exemplary guidance,
monitoring and encouragement throughout the course of this project, and for serving as
our internal guide and helping us complete this project work successfully.

We also take the opportunity to acknowledge the contribution of Mr. Arvind Kumar,
Assistant Professor & Head of Department, Computer Science & Engineering, Ashoka
Institute of Technology & Management, Varanasi, for his full support
and assistance during the development of the project.

Sincere thanks to the whole project team; they performed very well and their constant
effort is truly appreciable. Their dedication encouraged us to perform well, we really
admired their company, and it was a great experience to work with them.

We are thankful to all our faculty members for their cooperation, invaluable
constructive criticism and friendly advice during the project. We are also thankful to
our colleagues and classmates who helped us in the compilation of this project.

Finally, yet importantly, we would like to express our heartfelt thanks to our beloved
parents for their blessings. We see this opportunity as a big milestone in our career
development. We will strive to use the skills and knowledge gained in the best possible
way, and will work on their improvement, in order to continue this cooperation in
the future.
ABSTRACT
The various defects that occur on asphalt pavement are a direct cause of car accidents,
and countermeasures are required because they create significantly dangerous
situations. In this project, we propose convolutional neural network (CNN)-based
road surface damage detection with deep learning. First, the training database is
collected through a camera installed in a vehicle while driving on the road. The CNN
model is then trained for semantic segmentation using a deep convolutional autoencoder.
We augmented the training dataset by varying brightness, generating a large number of
training images. Furthermore, the CNN model is updated using pseudo-labeled images from
semi-supervised learning methods to improve the performance of the road surface damage
detection technique. To demonstrate the effectiveness of the proposed method, various
evaluation datasets were created to verify the performance of the proposed road
surface damage detection, and four experts evaluated each image. As a result, it is
confirmed that the proposed method can properly segment road surface damage.
LIST OF FIGURES
CONTENTS
1. OBJECTIVE
2. INTRODUCTION
   2.1 What is road damage detection?
   2.2 What is deep learning?
   2.3 CNNs (Convolutional Neural Networks)
3. SYSTEM REQUIREMENTS
   3.1 Hardware components
   3.2 Software required
4. IMPLEMENTATION
   - Object detection system
   - Data collection
   - Deep ensemble learning
   - Table 1: Model comparison
5. CODING
1. OBJECTIVE
Roads are one of the most crucial parts of the social and economic development
of any country, developing or developed. But since the maintenance of roads by
governmental organisations such as municipalities is a big challenge, many researchers
are already engaged in finding efficient and apt ways of helping the municipalities. If
regular inspection of road conditions is not maintained, the condition of roads
worsens gradually due to many factors such as weather, traffic, aging, poor choice of
materials, etc.

Some agencies deploy road survey vehicles which carry multiple expensive
sensors and high-resolution cameras. There are some experienced road managers
who supervise and perform visual inspection of roads. But these methods are of
course really time-consuming and expensive. Even after the completion of an
inspection, these agencies struggle to maintain accurate and updated databases of
recorded structural damage.

So we need an inexpensive, fast and organised solution for road damage detection.
Nowadays we are fortunate that almost everyone carries a camera-based smartphone. So
with the advent of object detection techniques in AI, people have started launching
challenges and research in this domain, and municipalities in Japan have already
started using such smartphone-based AI techniques to perform road damage inspection.
This case study is an attempt to use some state-of-the-art techniques to build a model
that will try to detect multiple types of road damage, such as potholes, alligator
cracks, etc., using artificial intelligence tools.
2. INTRODUCTION
Throughout this project we mainly talk about road damage detection, deep
learning and CNNs (Convolutional Neural Networks). Before moving further,
we must have some knowledge of these terms.

What are some of the technical challenges one would need to overcome in doing
so? This report walks through our approach to solving the task, highlighting the
unexpected issues we encountered along the way.
Figure: Example images from existing papers on road damage detection.
Figure: Road damage detection and classifications.
To detect road surface damage, CNN-based techniques have also been
studied. They detect road surface damage using object detection. Object
detection finds the position of an object in an image, in the form of bounding
boxes, and determines what class the object belongs to.

In object detection, the regions of road surface damage are not precisely
segmented. In this project, we focus on finding road surface damage in the form of
semantic segmentation. Although convolutional neural networks show high
performance in image processing in various applications, most approaches are
limited to supervised learning. Traditional machine learning methods can be
divided into supervised and unsupervised learning categories, where supervised
learning refers to the use of datasets that pair input data with labeled data to train
models, and CNNs often use image information as input data. Labeled data can
vary across segmentation, classification, and regression depending on the structure
of the neural network, and much time and effort are required to acquire such
labeled data. On the other hand, collecting unlabeled input data is a relatively
easy and simple alternative to acquiring labeled data. Unsupervised learning
refers to a type of machine learning algorithm used to generate new input data,
or to determine hidden structures from datasets consisting of input data without
labeled responses.

In the case of performing supervised learning for the road surface damage
detection technique based on semantic segmentation, labeled images that
segment only the road surface damage can be used as training data. In this project,
5000 images were collected, and these datasets must be labeled
one by one to train the model, which requires a great deal of time and effort
compared to collecting simple unlabeled input data.
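To make the segmentation setup concrete, the following is a minimal sketch of an
encoder-decoder (convolutional autoencoder) that maps a road image to a per-pixel
damage probability. The layer sizes, input resolution and optimizer are illustrative
assumptions, not the exact network used in this project.

from tensorflow import keras
from tensorflow.keras import layers

def build_segmentation_autoencoder(input_shape=(600, 600, 3)):
    inputs = keras.Input(shape=input_shape)
    # Encoder: downsample while increasing channel depth.
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    # Decoder: upsample back to the input resolution.
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    # One output channel: per-pixel probability of "damage".
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model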
Figure. Examples of: (a) fully connected neural network (FCN) and (b) 1D and
(c) 2D convolutional neural networks (CNNs): all neurons are connected in (a),
while only adjacent neurons are connected in (b,c).
3. SYSTEM REQUIREMENTS
The project needs both hardware and software components. The hardware
components include an image-capturing device (webcam, optical device, or
smartphone camera), a monitoring system, connection cables, a storage system,
etc. The software components are Jupyter Notebook, PyCharm, etc. They are
described in detail below:

i) Image Capturing Device
Webcam software enables users to record a video or stream it on the
Internet. As video streaming over the Internet requires considerable bandwidth,
such streams usually use compressed formats. The maximum resolution of a webcam
is also lower than that of most handheld video cameras, as higher resolutions would
be reduced during transmission. The lower resolution makes webcams
relatively inexpensive compared to most video cameras, and the quality is
adequate for video chat sessions.
The webcam's features depend mainly on the computer processor as well as
the operating system of the computer. Webcams can provide advanced features such
as image archiving, motion sensing, custom coding, or even automation.
Furthermore, webcams are used for social video recording, video broadcasting,
and computer vision, and are widely used for security surveillance and in
videoconferencing.
Features of a webcam
Webcams can differ in terms of size, shape, specification, and price. There
are several features that help you choose the best webcam for your
individual needs:
1. Megapixels
2. Frame rate
3. Lens quality
4. Autofocus
5. Resolution
Smartphone camera: Today's smartphones come equipped with a very
comprehensive set of camera-related specifications. For many of us, the
smartphone has become our primary camera because it is the one we always have
with us.

In its purest form, smartphone photography is all about collecting photons (light)
and converting them into electrons (image). The capabilities of the supporting
hardware and software are paramount to producing high-quality images of your
chosen subject:
- Bright aperture
- Large screen
- HDR, etc.
ii) Monitoring System: By putting thresholds on the damage score, one can
automatically find areas of significant road damage. Most of the time the
classification was correct; in some cases there was noise in the data which
caused the score to be high. The damage score is meant to distinguish cracks and
potholes from smooth pavement. However, there are many more objects and
deformations that can be found in roads; three examples are shown in the figure.
Two of them are part of the infrastructure, a manhole and a grate; obviously
they are not considered road damage. The third example is an area where the
pavement is buckled. These structures can be clearly seen in the 3D maps. We
are currently working on algorithms to classify such objects and deformations.
The areas of damage can be shown on a map and used by maintenance. A display
concept is shown in the figure, where the user can click on selected locations
and view detailed information about them.
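As a concrete illustration of the thresholding idea above, the following sketch flags
locations whose damage score exceeds a cutoff. The record format, coordinates and
threshold value are hypothetical.

DAMAGE_THRESHOLD = 0.7  # hypothetical cutoff; would be tuned on real data

def flag_damaged_locations(records, threshold=DAMAGE_THRESHOLD):
    # records: iterable of (latitude, longitude, damage_score) tuples.
    return [(lat, lon, score) for (lat, lon, score) in records
            if score >= threshold]

# Example with made-up scores: only the first location is flagged.
samples = [(25.318, 82.973, 0.91), (25.319, 82.975, 0.12)]
print(flag_damaged_locations(samples))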
3.2 SOFTWARE REQUIRED
i) Jupyter Notebook
ii) PyCharm
PyCharm Edu (free and open-source): for learning programming languages and
related technologies with integrated educational tools.
4. IMPLEMENTATION
i) YOLO
YOLO is an object detection framework that can achieve high mean average
precision (mAP) and speed. In addition, YOLO can predict the region and class
of objects with a single CNN. An advantageous feature of YOLO is that its
processing speed is considerably fast because it solves detection as a single
regression problem, detecting objects while considering background information. The
YOLO algorithm receives an image as input and outputs the coordinates of the bounding
boxes of the object candidates and the confidence of the inference.
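To make this concrete, the sketch below decodes one cell of a YOLO-v1-style output
vector, where the image is divided into an S x S grid and each cell predicts a box
confidence, box coordinates and class probabilities. The flat channel-major layout
(p[channel*S*S + row*S + col]) mirrors the indexing used in the code listings later in
this report; the exact grid size and class count are assumptions.

import numpy as np

S = 7          # grid side (assumed)
CLASSES = 8    # e.g. damage classes D00 ... D44 (assumed)

def decode_cell(p, row, col, s=S):
    # Read one grid cell out of a flat prediction vector p.
    idx = row * s + col
    confidence = p[0 * s * s + idx]
    x, y = p[1 * s * s + idx], p[2 * s * s + idx]
    w, h = p[3 * s * s + idx], p[4 * s * s + idx]
    class_probs = [p[(5 + c) * s * s + idx] for c in range(CLASSES)]
    return confidence, (x, y, w, h), int(np.argmax(class_probs))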
ii) R-FCN
R-FCN is another object detection framework, which was proposed by Dai et al.
(Dai et al., 2016). Its architecture is that of a region-based, fully convolutional
network for accurate and efficient object detection. Although Faster R-CNN is
several times faster than Fast R-CNN, the region-specific component must be
applied several hundred times per image. Instead of cropping features from the
same layer where the region proposals are predicted like in the case of the Faster
R-CNN method, in the R-FCN method, crops are taken from the last layer of the
features prior to prediction. This approach of pushing cropping to the last layer
minimizes the amount of per-region computation that must be performed. Dai et
al. (Dai et al., 2016) showed that the R-FCN model (using ResNet-101) could
achieve accuracy comparable to Faster R-CNN, often at faster running speeds.
iii) SSD
Data Collection
Thus far, in the study of damage detection on the road surface, images have been
captured either from above the road surface or using on-board cameras on vehicles.
When models are trained with images captured from above, the situations in which they
can be applied in practice are limited, considering the difficulty of capturing such
images. In contrast, when a model is constructed with images captured from an on-board
vehicle camera, it is easy to apply such images to train the model for practical
situations. For example, using a readily available camera such as those on smartphones
and general passenger cars, any individual can easily detect road damage by running
the model on the smartphone or by transferring the images to an external server and
processing them on the server. We installed a smartphone (LG Nexus 5X) on the
dashboard of a car, as shown in the figure, and photographed images of 600 × 600
pixels once per second. The reason we selected a photographing interval of 1 s is that
it makes it possible to photograph images while traveling on the road without gaps or
duplication when the average speed of the car is approximately 40 km/h (approximately
10 m/s). For this purpose, we created a smartphone application that can capture
images of the roads and record the location information once per second.
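The following is a minimal desktop analogue of that capture logic, assuming OpenCV and
the default camera: grab a frame, resize it to the 600 × 600 dataset resolution, save
it with a timestamp, and wait one second. GPS logging, which the smartphone app also
performs, is omitted here.

import os
import time
import cv2

def capture_road_images(out_dir="frames", interval_s=1.0):
    # Grab one frame per second, resize to the 600 x 600 dataset resolution
    # and save it with a timestamp in the filename.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(0)          # default camera; index is an assumption
    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            frame = cv2.resize(frame, (600, 600))
            cv2.imwrite("%s/road_%d.jpg" % (out_dir, int(time.time())), frame)
            time.sleep(interval_s)     # one image per second, as in the report
    finally:
        cap.release()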
Deep Ensemble Learning
An object detection algorithm deals with detecting semantic objects and visual
content belonging to a certain class in a digital image. With the advances in
deep neural networks, several Convolutional Neural Network (CNN)-based
object detection algorithms have been proposed. The first was the Regions with
CNN features (R-CNN) method, which performs object detection in two steps:
object region proposal and classification. The first step generates
multiple regions using a selective search, which are then input to a CNN
classifier. Due to its inherent computational complexity, several
optimized versions of R-CNN were proposed, such as the Fast R-CNN algorithm.
More recently, an algorithm known as "You Only Look Once" (YOLO) was
proposed, which combined the two steps of the R-CNN algorithm and
significantly reduced the computational complexity. YOLO uses a CNN which
inherently decides regions from the image and outputs probabilities for each of
them. Hence, it is able to achieve a significant speedup compared to R-CNN-
based algorithms and can be used for real-time processing as well. The goal of
this work is to improve upon the real-time detection capabilities for road damage
detection, hence we use YOLO as our base model. Ensemble methods, which
combine the predictions from various models, have been successfully employed
in various machine learning tasks to improve accuracy. In this work, we use
an ensemble of YOLO models trained for different numbers of iterations and at
different resolutions. We present the model performance in Table I.
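The report does not spell out how the per-model detections are fused, so the sketch
below shows one common choice: pool the boxes from all models and apply non-maximum
suppression (NMS), keeping the highest-scoring box among heavy overlaps. Boxes are
assumed to be (x0, y0, x1, y1, score) tuples.

def iou(a, b):
    # Intersection-over-union of two boxes given as (x0, y0, x1, y1, score).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union) if union else 0.0

def ensemble_nms(model_outputs, iou_thresh=0.5):
    # model_outputs: one list of boxes per YOLO model in the ensemble.
    pooled = sorted((b for boxes in model_outputs for b in boxes),
                    key=lambda b: b[4], reverse=True)  # highest score first
    kept = []
    for box in pooled:
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept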
Results
Fig. 1 shows the detection results from a single YOLO model under varying
conditions.
In Fig. 2, we show some detection results for YOLO models trained on data
from Japan and India.
In Fig. 3, training data from Japan, India, and the Czech Republic are used.
The models trained on data from all the countries seem to perform better than
the models trained on data from Japan and India only, which is in contrast to the
results presented in [1].

Furthermore, the selection of the input image size considerably affects the detection
performance. Since YOLO requires the input image resolution to be a multiple
of 32, we focused on two specific sizes, 416 and 608. However, contrary to
common perception, increasing the resolution of the image decreased the
performance of the base model, as shown in Table II. We evaluated the
performance of the proposed models using the platform provided by the
organizers of the IEEE BigData Cup Challenge 2020. As described in Section III-A,
the bounding boxes whose class label matched the ground truth were
selected, and then those with a greater than 50% IoU were picked. Finally, the F1
score for these boxes was calculated.
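A minimal sketch of this evaluation follows: a predicted box counts as a true positive
when its class label matches the ground truth and the IoU exceeds 0.5, and the F1
score is computed from the resulting precision and recall. The greedy one-to-one
matching is a simplifying assumption.

def box_iou(a, b):
    # Intersection-over-union of two boxes (x0, y0, x1, y1, class_id).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union) if union else 0.0

def f1_score(predictions, ground_truths, iou_thresh=0.5):
    # Greedily match each prediction to at most one unmatched ground truth.
    matched, tp = set(), 0
    for p in predictions:
        for i, g in enumerate(ground_truths):
            if i not in matched and p[4] == g[4] and box_iou(p, g) > iou_thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)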
5. CODING
This section presents the code listings developed for the project, grouped into
numbered snippets.
import os

# Convert a Pascal VOC box (xmin, xmax, ymin, ymax) into the normalized
# YOLO format (x_center, y_center, width, height), given the image size.
def convert(size, box):
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)
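As a quick sanity check of the helper above, assuming a 600 × 600 image and a box
spanning x from 100 to 300 and y from 200 to 400:

print(convert((600, 600), (100, 300, 200, 400)))
# -> (0.333..., 0.5, 0.333..., 0.333...)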
2–
import glob
import pandas as pd
import xml.etree.ElementTree as ET
3-
def class_text_to_int(row_label):
    # Map road damage class labels to numeric ids. The intermediate labels
    # (D01 ... D44) are reconstructed here from the standard road damage
    # dataset classes; the original listing elided them.
    if row_label == 'D00':
        return '0'
    elif row_label == 'D01':
        return '1'
    elif row_label == 'D10':
        return '2'
    elif row_label == 'D11':
        return '3'
    elif row_label == 'D20':
        return '4'
    elif row_label == 'D40':
        return '5'
    elif row_label == 'D43':
        return '6'
    elif row_label == 'D44':
        return '7'
    else:
        exit(0)
4-
def xml_to_csv(path, op):
    # Walk the Pascal VOC annotation files under `path` and write one
    # YOLO-format label file per image into `op`. The loop structure,
    # output naming and write format are reconstructed from the original
    # fragment and are assumptions.
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        ct = 0
        fn = root.find('filename').text
        fn = fn.split('.')
        print(fn[0])
        w = int(root.find('size')[0].text)
        h = int(root.find('size')[1].text)
        txt_outfile = open(op + fn[0] + '.txt', 'w')
        for member in root.findall('object'):
            cls = member[0].text
            xmin = int(member.find('bndbox')[0].text)
            xmax = int(member.find('bndbox')[1].text)
            ymin = int(member.find('bndbox')[2].text)
            ymax = int(member.find('bndbox')[3].text)
            b = (xmin, xmax, ymin, ymax)
            bb = convert((w, h), b)
            cls_id = class_text_to_int(cls)
            txt_outfile.write(cls_id + ' ' + ' '.join(str(a) for a in bb) + '\n')
        txt_outfile.close()
    print("Files created")
def main():
    # Raw string so backslashes in the Windows path are not escape sequences.
    image_path = r"C:\Users\Ajit\Downloads\Road-damage-detection-master\src"
    op = "F:/intern/RoadDamageDataset/Sumida/lb/"
    xml_df = xml_to_csv(image_path, op)
5-
import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET
6-
#def label2det(label):
#    f = open('val.txt', 'a+')
#    f.write('/media/erress/Personal/Programming/BennettUniversity/bdd100k/images/100k/val/%s.jpg' % (label['name']))
#    f.write('\n')
#    f.close()
7-
def change_dir(path):
    # Append every annotated image name to testlist.txt. The loop over the
    # annotation files is reconstructed from the fragment and is an assumption.
    xml_list = []
    f = open('testlist.txt', 'a+')
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        value = root.find('filename').text
        f.write('train_data/images/%s' % (value))
        f.write('\n')
    f.close()
    print('file created')
8-
def main():
    image_path = r"C:\Users\Ajit\Downloads\Road-damage-detection-master\src"
    xml_df = change_dir(image_path)
9-
import os
import sys
import cv2
import tensorflow as tf
import keras
from keras.models import load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
from PIL import Image

# CPU config: the ConfigProto below is an assumed CPU-only configuration.
config = tf.ConfigProto(device_count={'GPU': 0})
sess = tf.Session(config=config)
keras.backend.set_session(sess)
#os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
#os.environ["CUDA_VISIBLE_DEVICES"] = ""

# The detection layer is the last layer of the network definition.
det_l = cfgconst.net.layers[len(cfgconst.net.layers)-1]
CLASSNUM = det_l.classes

# Read the class names, one per line.
f = open(cfgconst.labelnames)
voc_names = []
for ln in f:
    voc_names.append(ln.strip())
print(voc_names)
if CLASSNUM != len(voc_names):
    exit()
# run_yolo: validate command-line arguments and the per-class label images
if len(sys.argv) < 2:
    exit()
voc_labels = []
for i in range(CLASSNUM):
    voc_labels.append("ui_data/labels/" + voc_names[i] + ".PNG")
    if not os.path.isfile(voc_labels[i]):
        exit()
import utils
#print('thresh=' + str(thresh))
#cfg_path = sys.argv[2]
filename = sys.argv[3] if len(sys.argv) > 3 else 'nofilename'
print(sys.argv)
print(model_weights_path + ',' + filename)

# Construct the network: load saved weights if present,
# otherwise build a fresh model below.
backup_directory = "backup/"
if os.path.isfile(model_weights_path):
    model = load_model(model_weights_path,
                       custom_objects={'yololoss': ddd.yololoss})
    model.compile(loss=ddd.yololoss, optimizer=sgd,
                  metrics=["accuracy"])
else:
    #base = utils.basecfg(cfg_path)
    model = kerasmodel.makenetwork(net)
    print(str(net.max_batches / (len(X_train) / net.batch)))
    #datagen = ImageDataGenerator(
    #    featurewise_center=True,
    #    featurewise_std_normalization=True,
    #    rotation_range=0,
    #    width_shift_range=0.,
    #    height_shift_range=0.,
    #    horizontal_flip=True)
    #datagen.fit(X_train)
    #model.fit_generator(datagen.flow(X_train, Y_train, batch_size=net.batch),
    #                    samples_per_epoch=len(X_train),
    #                    nb_epoch=net.max_batches/(len(X_train)/net.batch))
#model.fit(X_train, Y_train, batch_size=net.batch,
#    nb_epoch=net.max_batches/(len(X_train)/net.batch))

# Stop training when the loss stops improving.
early_stop = EarlyStopping(monitor='loss',
                           min_delta=0.001,
                           patience=3,
                           mode='min',
                           verbose=1)
# Keep the best weights seen so far after every epoch.
checkpoint = ModelCheckpoint('yolo_weight.h5',
                             monitor='loss',
                             verbose=1,
                             save_best_only=True,
                             mode='min',
                             period=1)
batchesPerdataset = max(1, len(X_train) / net.batch)
model.fit(X_train, Y_train,
          nb_epoch=net.max_batches / batchesPerdataset,
          batch_size=net.batch,
          verbose=1)
model.save_weights('yolo_weight_rd.h5')
model.save('yolo_kerasmodel_rd.h5')
net = cfgconst.net  # parse.parse_network_cfg(cfg_path)
testmodel = load_model(model_weights_path,
                       custom_objects={'yololoss': ddd.yololoss})
(s, w, h, c) = testmodel.layers[0].input_shape
testloss = testmodel.evaluate(x_test, y_test)
print(y_test)
pred = testmodel.predict(X_test)
(s, w, h, c) = testmodel.layers[0].input_shape
confid_index_list = []
confid_value_list = []
x_value_list = []
y_value_list = []
w_value_list = []
h_value_list = []
class_id_list = []
classprob_list = []
x0_list = []
x1_list = []
y0_list = []
y1_list = []
det_l = cfgconst.net.layers[len(cfgconst.net.layers)-1]
side = det_l.side
classes = det_l.classes
xtext_index = 0
foundindex = False
max_confid = 0
for p in pred:
    #foundindex = False
    for i in range(side):
        for j in range(side):
            if k == 0:
                max_confid = max(max_confid, p[k*49 + i*7 + j])
                confid_index_list.append(i*7 + j)
                foundindex = True
    print('max_confid=' + str(max_confid))
    confid_value = max(0, p[0*49 + confid_index])
    x_value = max(0, p[1*49 + confid_index])
    y_value = max(0, p[2*49 + confid_index])
    w_value = max(0, p[3*49 + confid_index])
    h_value = max(0, p[4*49 + confid_index])
    maxclassprob = 0
    maxclassprob_i = -1
    for i in range(classes):
        maxclassprob = p[(5+i)*49 + confid_index]
        maxclassprob_i = i
    classprob_list.append(maxclassprob)
    class_id_list.append(maxclassprob_i)
    print('max_confid=' + str(max_confid) + ',c=' + str(confid_value)
          + ',x=' + str(x_value) + ',y=' + str(y_value) + ',w=' + str(w_value)
          + ',h=' + str(h_value) + ',cid=' + str(maxclassprob_i)
          + ',prob=' + str(maxclassprob))
    print('confid_index=' + str(confid_index) + ',x=' + str(x)
          + ',y=' + str(y) + ',row=' + str(row) + ',col=' + str(col))
    #draw = ImageDraw.Draw(nim)
    #draw.rectangle([x-(w_value/2)*w, y-(h_value/2)*h,
    #                x+(w_value/2)*w, y+(h_value/2)*h])
    #del draw
    #nim.save('predbox.png')
    #sourceimage = X_test[xtext_index].copy()
    y0_list.append(max(0, int(y - (h_value/2)*h)))
    x1_list.append(int(x + (w_value/2)*w))
    y1_list.append(int(y + (h_value/2)*h))
    break
    #xtext_index = xtext_index + 1
#print(pred)
sourceimage = X_test[0].copy()
#print((s, w, h, c))
#exit()
if os.path.isfile(imglist_path):
    testmodel = load_model(model_weights_path,
                           custom_objects={'yololoss': ddd.yololoss})
    (s, w, h, c) = testmodel.layers[0].input_shape
    f = open(imglist_path)
    for img_path in f:
        #X_test = []
        if os.path.isfile(img_path.strip()):
            frame = Image.open(img_path.strip())
            #(orgw, orgh) = img.size
        else:
            continue
        #X_test.append(np.asarray(nim))
        # draw the class label image for the detected class
        classimg = cv2.imread(voc_labels[class_id])
        print('box=' + str(x0) + ',' + str(y0) + ','
              + str(x1) + ',' + str(y1))
        #print(img.shape)
        #print(classimg.shape)
        yst = max(0, y0 - classimg.shape[0])
        yend = max(y0, classimg.shape[0])
        # overlay the class label image above the detected box
        img[yst:yend, x0:x0 + classimg.shape[1]] = classimg
        # draw text
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.imshow('frame', img)
        break
    else:
        cv2.destroyAllWindows()
else:
def demo_yolo(model_weights_path, filename, thresh=0.3):
    print('demo_yolo')
    testmodel = load_model(model_weights_path,
                           custom_objects={'yololoss': ddd.yololoss})
    (s, w, h, c) = testmodel.layers[0].input_shape
    cap = cv2.VideoCapture(filename)
    while cap.isOpened():
        ret, frame = cap.read()   # reconstructed: ret was used but never read
        if not ret:
            break
        #print(frame)
        # draw classimg
        classimg = cv2.imread(voc_labels[class_id])
        print('box=' + str(x0) + ',' + str(y0) + ','
              + str(x1) + ',' + str(y1))
        #print(img.shape)
        #print(classimg.shape)
        yst = max(0, y0 - classimg.shape[0])
        yend = max(y0, classimg.shape[0])
        # draw text
        font = cv2.FONT_HERSHEY_SIMPLEX
        cv2.imshow('frame', img)
        break
    cap.release()
    cv2.destroyAllWindows()

if sys.argv[1] == 'train':
    train_yolo(model_weights_path)
elif sys.argv[1] == 'test':
    if os.path.isfile(model_weights_path):
        test_yolo(filename, model_weights_path, confid_thresh=thresh)
    else:
        test_yolo(filename, confid_thresh=thresh)
elif sys.argv[1] == 'demo_video':
    if os.path.isfile(model_weights_path):
        demo_yolo(model_weights_path, filename, thresh)   # reconstructed call; elided in original
    else:
        exit()
elif sys.argv[1] == 'debug':