
Road Damage Detection

A graduate project report submitted to AKTU in partial fulfillment
of the requirement for the award of the degree of

Bachelor of Technology
In

Computer Science and Engineering

SUBMITTED BY:

Ajit Raj Shekhar (1764110005)
Chandan Yadav (1764110018)
Rahul Kumar Yadav Kausal (1764110037)
Shreya Pandey (1864110910)

UNDER THE GUIDANCE OF:

Mr. Gaurav Ojha
(Assistant Professor)
Computer Science & Engineering

Department of Computer Science & Engineering

Ashoka Institute of Technology and Management


(A constituent institute of Dr. A.P.J. Abdul Kalam Technical University)
Varanasi - 221007

August, 2021
CERTIFICATE

This is to certify that Ajit Raj Shekhar, Chandan Yadav, Rahul Kumar Yadav Kausal and
Shreya Pandey have carried out the project work presented in the Project Report entitled
“Road Damage Detection”, submitted in partial fulfillment of the requirement for the award
of the degree of B.Tech. in Computer Science & Engineering from Ashoka Institute of
Technology & Management, Varanasi. The report is a record of the candidates’ own work,
carried out by them under my supervision. The matter embodied in this report is original and
has not been submitted for the award of any other degree.

(Mr. Gaurav Ojha)
Assistant Professor
(Department of CSE)

Forwarded By:
Mr. Arvind Kumar
Assistant Professor & Head of Department
(Department of CSE)

Date- ………………….

DECLARATION

We hereby declare that the project entitled “Road Damage Detection”, submitted by us
in partial fulfillment of the requirement for the award of the degree of Bachelor of
Technology (Computer Science & Engineering) of Dr. A.P.J. Abdul Kalam
Technical University, Lucknow, is a record of our own work carried out under the
supervision and guidance of Mr. Gaurav Ojha.
To the best of our knowledge, this project has not been submitted to any other
University or Institute for the award of a degree.

Signature : Signature :
Name : Ajit Raj Shekhar Name : Chandan Yadav
Roll No. : 1764110005 Roll No. : 1764110018
Date : Date :

Signature : Signature :
Name : Rahul Kumar Yadav Kausal Name : Shreya Pandey
Roll No. : 1764110037 Roll No. : 1864110910
Date : Date :

ACKNOWLEDGEMENT

In performing our project, we had to take the help and guidance of some respected
persons, who deserve our gratitude. The completion of this project gives us much
pleasure.
We take this opportunity to express our deep gratitude and regard to our guide Mr.
Gaurav Ojha, Assistant Professor, Department of Computer Science & Engineering, Ashoka
Institute of Technology & Management, Varanasi, for his exemplary guidance,
monitoring and encouragement throughout the course of this project, and for serving
as our internal guide and helping us complete this project work successfully.

We also take the opportunity to acknowledge the contribution of Mr. Arvind Kumar,
Assistant Professor & Head of Department, Computer Science & Engineering, Ashoka
Institute of Technology & Management, Varanasi, for his full support
and assistance during the development of the project.
A sincere thanks to the whole project team; they performed very well, and their constant
effort is truly appreciable. Their dedication encouraged us to perform well, and it was a
great experience to work with them.
We are thankful to all our faculty members for their cooperation, invaluable constructive
criticism and friendly advice during the project. We are also thankful to our colleagues
and classmates who helped us in the compilation of this project.
Finally, yet importantly, we would like to express our heartfelt thanks to our beloved
parents for their blessings. We perceive this opportunity as a big milestone in our career
development. We will strive to use the gained skills and knowledge in the best possible
way, and will work on their improvement, in order to continue cooperation with all of
you in the future.

ABSTRACT

The various defects that occur on asphalt pavement are a direct cause of car accidents,
and countermeasures are required because they create significantly dangerous
situations. In this project, we propose convolutional neural network (CNN)-based
road surface damage detection with deep learning. First, the training database is
collected through a camera installed in the vehicle while driving on the road.
The CNN model is then trained for semantic segmentation using a deep
convolutional autoencoder. Here, we augmented the training dataset by varying the
brightness, and finally generated a large number of training images. Furthermore, the
CNN model is updated using pseudo-labeled images from semi-supervised
learning methods to improve the performance of the road surface damage detection
technique. To demonstrate the effectiveness of the proposed method, various
evaluation datasets were created to verify the performance of the proposed road
surface damage detection, and four experts evaluated each image. As a result, it is
confirmed that the proposed method can properly segment the road surface damages.

LIST OF FIGURES

CONTENT

1. OBJECTIVE
2. INTRODUCTION
   2.1 What is road damage detection?
   2.2 What is deep learning?
   2.3 CNNs (Convolutional Neural Networks)

3. SYSTEM REQUIREMENTS
   3.1 Hardware components
   3.2 Software required

4. IMPLEMENTATION
   - Object detection system
   - Data collection
   - Deep ensemble learning
   - Table I: Model comparison

5. CODE

1. OBJECTIVE

Roads are one of the most crucial parts of the social as well as economic development
of any country, developing or developed. But as we are well aware, the
maintenance of roads by governmental organisations such as municipalities is a
big challenge, and many researchers are already engaged in finding
an efficient and apt way of helping the municipalities. If regular
inspection of road conditions is not maintained, the condition of roads will
worsen gradually due to many factors such as weather, traffic, aging, poor choice of
materials, etc.

Some agencies deploy road survey vehicles which consist of multiple expensive
sensors and high resolution cameras. There are some experienced road managers
who supervise and perform visual inspection of roads. But these methods are of
course really time consuming and expensive. Even after the completion of
inspection, these agencies struggle to maintain accurate and updated databases of
recorded structural damages. This poor management leads to unorganised and
inappropriate resource allocation for road maintenance.

So we need an inexpensive, fast and organised solution for such road damage
detection. Nowadays we are fortunate that almost everyone
carries a camera-based smartphone. So, with the advent of object detection
techniques in AI, people have started launching challenges and research in this
domain, and municipalities in Japan have already started using such smartphone-
based AI techniques to perform road damage inspection. This case study is an
attempt to use some state-of-the-art techniques to build a model which will try to
detect multiple types of road damages, such as potholes, alligator cracks, etc., using
artificial intelligence tools.

2. INTRODUCTION

Throughout this project we mainly talk about road damage detection, deep
learning, and CNNs (Convolutional Neural Networks). So before moving further,
we must have some knowledge of these terms.

2.1 What is road damage detection?

Road damage detection is critical for the maintenance of a road, which
traditionally has been performed using expensive high-performance sensors.
With the recent advances in technology, especially in computer vision, it is
now possible to detect and categorize different types of road damages, which
can facilitate efficient maintenance and resource management.

Can we automatically detect and classify the severity of road damage by
exploiting raw video footage taken from smartphones on car dashboards?
What are some of the technical challenges one would need to overcome in doing
so? This section walks through our approach to solving the task, highlighting the
unexpected issues we encountered along the way.

From different road damage detection methods, we found most approaches could
be broken down into the following categories:

 3D analysis: usage of stereo image analysis¹ or LIDAR point clouds² to
detect abnormalities in pavement.

 Vibration-based analysis: capitalizing on on-board accelerometers or
gyroscopes.³

 Vision-based models: ranging from traditional techniques like edge
detection and spectral segmentation⁴ to representation learning and
segmentation via Convolutional Neural Networks (CNNs).⁵

Figure: Example images from existing papers on road damage detection.

2.2 What is deep learning?

Developers apply end-to-end object detection methods based on deep
learning to the road surface damage detection problem, and verify their
detection accuracy. In particular, we examine whether we can detect eight
classes of road damage by applying state-of-the-art object detection methods.

This case study is a real-world application of deep learning to the
classification and detection of different kinds of road damages.

Figure: Road damage detection and classifications.

2.3 CNNs (Convolutional Neural Networks)

To detect road surface damages, CNN-based techniques have also been
studied. They detect road surface damages using object detection. Object
detection means finding the position of an object in the image, in the form of
bounding boxes, and determining what class the object belongs to.

In object detection, the parts of the road surface damages are not precisely
segmented. In this project, we focus on finding road surface damages in the form of
semantic segmentation. Although convolutional neural networks show high
performance on image processing in various applications, most approaches are
limited to supervised learning. Traditional machine learning methods can be
divided into supervised and unsupervised learning categories, where supervised
learning refers to the use of datasets that pair input data with labeled data to train
models, and CNNs often use image information as input data. Labeled data can
vary across segmentation, classification, and regression depending on the structure
of the neural network, and much time and effort are required to acquire such
labeled data. On the other hand, collecting unlabeled input data is a relatively
easy and simple alternative to acquiring labeled data. Unsupervised learning
refers to a type of machine learning algorithm used to generate new input data,
or to determine hidden structures from datasets consisting of input data without
labeled responses.

In the case of performing supervised learning for the road surface damage
detection technique based on semantic segmentation, labeled images that
segment only the road surface damages can be used as training data. In this
project, 5000 images were collected, and these datasets must be labeled
one by one to train the model, which requires a great deal of time and effort
compared to collecting simple unlabeled input data.
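To make the segmentation setup concrete, the following is a minimal sketch of a convolutional autoencoder for binary damage segmentation, written with Keras (which the code in Section 5 also uses). The layer sizes, the 600 × 600 input taken from the data collection setup, and the training call are our own illustrative assumptions, not the exact architecture used in the project.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

# Hypothetical encoder-decoder; sizes are illustrative assumptions.
model = Sequential([
    # encoder: downsample while extracting features
    Conv2D(16, (3, 3), activation='relu', padding='same',
           input_shape=(600, 600, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    # decoder: upsample back to the input resolution
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    UpSampling2D((2, 2)),
    Conv2D(16, (3, 3), activation='relu', padding='same'),
    UpSampling2D((2, 2)),
    # one sigmoid channel: per-pixel probability of "damage"
    Conv2D(1, (1, 1), activation='sigmoid', padding='same'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# model.fit(images, masks, ...)  # masks are the hand-labeled damage maps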

Figure. Examples of: (a) fully connected neural network (FCN) and (b) 1D and
(c) 2D convolutional neural networks (CNNs): all neurons are connected in (a),
while only adjacent neurons are connected in (b,c).

3. SYSTEM REQUIREMENTS

The project needs both hardware and software components. The hardware
components include an image capturing device (webcam, optical device, or
smartphone camera), a monitoring system, connection cables, a storage system,
etc.

Software components are Jupyter Notebook, PyCharm, etc. They are described in
detail below:

3.1 Hardware Components

i) Image Capturing Device

Webcams: A webcam is a video camera that feeds or streams an image or
video in real time to or through a computer network, such as the Internet.
Webcams are typically small cameras that sit on a desk, attach to a user’s
monitor, or are built into the hardware. Webcams can be used during a video
chat session involving two or more people, with conversations that include live
audio and video.

Webcam software enables users to record a video or stream the video on the
Internet. As video streaming over the Internet requires much bandwidth, such
streams usually use compressed formats. The maximum resolution of a webcam
is also lower than that of most handheld video cameras, as higher resolutions
would be reduced during transmission. The lower resolution enables webcams to
be relatively inexpensive compared to most video cameras, but the effect is
adequate for video chat sessions.

The webcam features mainly depend on the computer processor as well as
the operating system of the computer. Webcams can provide advanced features
such as image archiving, motion sensing, custom coding, or even automation.
Furthermore, webcams are used for social video recording, video broadcasting,
and computer vision, and are mainly used for security surveillance and in
videoconferencing.

Features of a webcam

Webcams differ in terms of size, shape, specification, and price. Several features
of a webcam help you choose the best one for your individual needs:

1. Megapixels

2. Frame rate

3. Lens quality

4. Autofocus

5. Low light quality

6. Resolution

Smartphone camera: Today’s smartphones come equipped with a very
comprehensive set of camera-related specifications. For many of us, our
smartphone has become our primary camera because it is the one we always have
with us.

In its purest form, smartphone photography is all about collecting photons (light)
and converting them into electrons (image). The capabilities of the supporting
hardware and software are paramount to producing high-quality images of your
chosen subject.

Features of good camera phones:

Bright aperture

Decent number of pixels

Large screen

Optical image stabilization

Lenses and zoom

HDR, etc.

ii) Monitoring System: By putting thresholds on the damage score, one can
automatically find areas of significant road damage. Most of the time the
classification was correct; in some cases there was noise in the data which
caused the score to be high. The damage score is meant to distinguish cracks and
potholes from a smooth pavement. However, there are many more objects and
deformations that can be found in roads; three examples are shown in Figure.

Two of them are part of the infrastructure, a manhole and a grate. Obviously
they are not considered road damage. The third example is an area where the
pavement is buckled. These structures can be clearly seen in the 3D maps. We
are currently working on algorithms to classify such objects and deformations.
The areas of damage can be shown on a map and used by maintenance. A display
concept is shown in Figure, where the user can click on the selected locations
and view detailed information about them.

3.2 SOFTWARE REQUIRED

i) Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to
create and share documents that contain live code, equations, visualizations, and
narrative text. Its uses include data cleaning and transformation, numerical
simulation, statistical modeling, data visualization, machine learning, and much
more. Jupyter Notebook (formerly IPython Notebooks) is a web-based
interactive computational environment for creating Jupyter notebook documents.
The “notebook” term can colloquially refer to many different entities,
mainly the Jupyter web application, the Jupyter Python web server, or the Jupyter
document format, depending on context.

According to the official website of Jupyter, Project Jupyter exists to develop
open-source software, open standards, and services for interactive computing
across dozens of programming languages. Jupyter Book is an open-source
project for building books and documents from computational material. It allows
the user to construct the content in a mixture of Markdown, an extended version
of Markdown called MyST, maths and equations using MathJax, Jupyter
Notebooks, reStructuredText, and the output of running Jupyter Notebooks at
build time. Multiple output formats can be produced (currently single files,
multipage HTML web pages and PDF files).

ii) PyCharm

PyCharm is a dedicated Python Integrated Development Environment (IDE)
providing a wide range of essential tools for Python developers, tightly
integrated to create a convenient environment for productive Python, web, and
data science development.

PyCharm is available in three editions:

Community (free and open-source): for smart and intelligent Python
development, including code assistance, refactorings, visual debugging, and
version control integration.

Professional (paid): for professional Python, web, and data science development,
including code assistance, refactorings, visual debugging, version control
integration, remote configurations, deployment, support for popular web
frameworks such as Django and Flask, database support, scientific tools
(including Jupyter notebook support), and big data tools.

Edu (free and open-source): for learning programming languages and related
technologies with integrated educational tools.

PyCharm supports the following versions of Python:

Python 2: version 2.7

Python 3: from version 3.6 up to version 3.10

4. IMPLEMENTATION

Object Detection System

In general, for object detection, methods that apply an image classifier to an
object detection task have become mainstream; these methods entail varying the
size and position of the object in the test image, and then using the classifier to
identify the object. In the past few years, an approach involving the extraction of
multiple candidate regions of objects using region proposals, as typified by R-
CNN, and then making a classification decision on the candidate regions using
classifiers has also been reported. However, the R-CNN approach can be time
consuming because it requires more crops, leading to significant duplicate
computation from overlapping crops. This calculation redundancy was solved
by Fast R-CNN, which inputs the entire image once through a feature
extractor so that crops share the computation load of feature extraction. As
described above, image processing methods have historically developed at a
considerable pace. In our study, we primarily focus on four recent object
detection systems: Faster R-CNN, the You Only Look Once (YOLO)
system, the Region-based Fully Convolutional Networks (R-FCN) system and
the Single Shot Multibox Detector (SSD) system.

i) YOLO

YOLO is an object detection framework that can achieve high mean average
precision (mAP) and speed. In addition, YOLO can predict the region and class
of objects with a single CNN. An advantageous feature of YOLO is that its
processing speed is considerably fast because it solves the problem as a single
regression, detecting objects while taking background information into account.
The YOLO algorithm outputs the coordinates of the bounding box of each object
candidate and the confidence of the inference after receiving an image as input.

ii) R-FCN

R-FCN is another object detection framework, which was proposed by Dai et al.
(Dai et al., 2016). Its architecture is that of a region-based, fully convolutional
network for accurate and efficient object detection. Although Faster R-CNN is
several times faster than Fast R-CNN, the region-specific component must be
applied several hundred times per image. Instead of cropping features from the
same layer where the region proposals are predicted, as in the case of the Faster
R-CNN method, in the R-FCN method crops are taken from the last layer of the
features prior to prediction. This approach of pushing cropping to the last layer
minimizes the amount of per-region computation that must be performed. Dai et
al. (Dai et al., 2016) showed that the R-FCN model (using ResNet-101) could
achieve accuracy comparable to Faster R-CNN, often at faster running speeds.

iii) SSD

SSD is an object detection framework that uses a single feed-forward
convolutional network to directly predict classes and anchor offsets without
requiring a second-stage per-proposal classification operation. The key feature of
this framework is the use of multi-scale convolutional bounding box outputs
attached to multiple feature maps at the top of the network.

Data Collection

Thus far, in the study of damage detection on the road surface, images are
either captured from above the road surface or using on-board cameras on
vehicles. When models are trained with images captured from above,
the situations in which they can be applied in practice are limited, considering the
difficulty of capturing such images. In contrast, when a model is constructed
with images captured from an on-board vehicle camera, it is easy to apply these
images to train the model for practical situations. For example, using a readily
available camera, like the one on a smartphone, and a general passenger car, any
individual can easily detect road damages by running the model on the smartphone
or by transferring the images to an external server and processing them on the
server. We installed a smartphone (LG Nexus 5X) on the dashboard of a car, as
shown in Figure, and photographed images of 600 × 600 pixels once per second.
The reason we selected a photographing interval of 1 s is that it makes it possible
to photograph images while traveling on the road without gaps or duplication
when the average speed of the car is approximately 40 km/h (or approximately
10 m/s). For this purpose, we created a smartphone application that can capture
images of the roads and record the location information once per second.
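As a quick sanity check of that interval, a short sketch of the arithmetic (the ~10 m of forward road coverage per image is implied by the report’s numbers, not stated exactly):

speed_kmh = 40
speed_ms = speed_kmh * 1000 / 3600      # = 11.1 m/s, roughly the 10 m/s above
interval_s = 1.0
road_per_frame = speed_ms * interval_s  # = 11 m of road covered between shots
print(round(road_per_frame, 1))         # so consecutive images neither gap nor overlap much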

Deep Ensemble Learning

An object detection algorithm deals with detecting semantic objects and visual
content belonging to a certain class in a digital image. With the advances in
deep neural networks, several Convolutional Neural Network (CNN) based
object detection algorithms have been proposed. The first one was the Regions
with CNN features (R-CNN) method, which proposed to perform object detection
in two steps: object region proposal and classification. The first step generates
multiple regions using a selective search, which are then input to a CNN
classifier. However, due to its inherent computational complexity, several
optimized versions of R-CNN were proposed, such as the Fast R-CNN algorithm.
More recently, an algorithm known as ”You Only Look Once” (YOLO) was
proposed, which combined the two steps of the R-CNN algorithm and
significantly reduced the computational complexity. YOLO uses a CNN which
inherently decides regions from the image and outputs probabilities for each of
them. Hence, it is able to achieve a significant speedup compared to R-CNN
based algorithms and can be used for real-time processing as well. The goal of
this work is to improve upon the real-time detection capabilities for road damage
detection, hence we use YOLO as our base model. Ensemble methods, which
combine the predictions from various models, have been successfully employed
in various machine learning tasks to improve accuracy. In this work, we use
an ensemble of YOLO models trained for different numbers of iterations and at
different resolutions; a sketch of one possible box-fusion step follows below.
More details about the model selection and implementation can be found here.
We present the model performance in Table I.
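The report does not spell out how the ensemble’s boxes are merged; the following is a minimal sketch of one common choice, pooling all models’ detections and removing duplicates with non-maximum suppression (NMS). The function name and the IoU threshold of 0.5 are our own assumptions.

import numpy as np

def ensemble_nms(boxes, scores, iou_thresh=0.5):
    # boxes: (N, 4) array of [x0, y0, x1, y1] pooled from all ensemble members;
    # scores: (N,) confidences. Greedily keep the highest-scoring box and drop
    # any remaining box that overlaps it by more than iou_thresh.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, xx1 - xx0) * np.maximum(0, yy1 - yy0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep  # indices of the boxes that survive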

Results

Fig. 1 shows the detection results from a single YOLO model under varying
conditions.

In Fig. 2, we show some detection results for YOLO models trained on data
from Japan and India.

In Fig. 3, training data from Japan, India, and the Czech Republic are used.

The models trained on data from all the countries seem to perform better than
the models trained on data from Japan and India only, which is in contrast to the
results presented in [1].

Furthermore, the selection of the input image size considerably affects the
detection performance. Since YOLO requires the input image resolution to be a
multiple of 32, we focused on two specific sizes, 416 and 608. However, contrary
to common perception, increasing the resolution of the image decreased the
performance of the base model, as shown in Table II. We evaluated the
performance of the proposed models using the platform provided by the
organizers of the IEEE BigData Cup Challenge 2020. As described in Section III-A,
the bounding boxes whose class label matched the ground truth were
selected, and then those with a greater than 50% IoU were picked. Finally, the F1
score for these boxes was calculated.
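To make the scoring rule concrete, here is a small sketch of the IoU test it describes (the function below is our own illustration, not the organizers’ evaluation code):

def iou(a, b):
    # a, b: boxes as [x0, y0, x1, y1]; returns intersection-over-union
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

# A prediction counts as a true positive when its class matches the ground
# truth and iou(...) > 0.5; the F1 score is then
# 2 * precision * recall / (precision + recall).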

TABLE I: Model Comparison (F1 score)

Model Name              Test 1    Test 2
YOLOv4 (416×416)        0.5193    0.5137
YOLOv4 (608×608)        0.5122    0.5012
Ensemble (5 models)     0.5321    0.5226
Ensemble (15 models)    0.6091    0.5983
Ensemble (25 models)    0.6102    0.6297
Ensemble (30 models)    0.6275    0.6358

5. CODE
This section lists the code used in the project, fragment by fragment.

import os
from os import walk, getcwd
from PIL import Image

# classes = ["stopsign"]

def convert(size, box):
    # Convert a pixel-space box (xmin, xmax, ymin, ymax) on an image of
    # `size` = (width, height) into YOLO format: normalized
    # (x_center, y_center, width, height).
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)
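For example (our own numbers), convert((600, 600), (100, 200, 150, 250)) returns (0.25, 0.333..., 0.166..., 0.166...): a box centered a quarter of the way across the 600-pixel image, one sixth of the image wide and one sixth high.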

2 -

import glob
import pandas as pd
import xml.etree.ElementTree as ET

3 -

def class_text_to_int(row_label):
    # Map the dataset's damage class labels (D00-D44) to YOLO class ids.
    mapping = {'D00': '0', 'D01': '1', 'D10': '2', 'D11': '3',
               'D20': '4', 'D40': '5', 'D43': '6', 'D44': '7'}
    if row_label not in mapping:
        exit(0)  # unknown label: abort, as in the original listing
    return mapping[row_label]

4 -

def xml_to_csv(path, outpath):
    # For every VOC-style XML annotation in `path`, write a YOLO-format
    # .txt label file (one "class_id x y w h" line per object) to `outpath`.
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        fn = root.find('filename').text.split('.')
        print(fn[0])
        txt_outpath = outpath + fn[0] + '.txt'
        w = int(root.find('size')[0].text)
        h = int(root.find('size')[1].text)
        txt_outfile = open(txt_outpath, "w")
        for member in root.findall('object'):
            cls = member[0].text
            # note: this assumes the bndbox children are ordered
            # xmin, xmax, ymin, ymax
            xmin = int(member.find('bndbox')[0].text)
            xmax = int(member.find('bndbox')[1].text)
            ymin = int(member.find('bndbox')[2].text)
            ymax = int(member.find('bndbox')[3].text)
            b = (float(xmin), float(xmax), float(ymin), float(ymax))
            bb = convert((w, h), b)
            cls_id = class_text_to_int(cls)
            txt_outfile.write(cls_id + " " + " ".join([str(a) for a in bb]) + '\n')
        txt_outfile.close()
    print("Files created")

def main():
    # raw string: backslashes in the Windows path must not act as escapes
    image_path = r"C:\Users\Ajit\Downloads\Road-damage-detection-master\src"
    op = "F:/intern/RoadDamageDataset/Sumida/lb/"
    xml_to_csv(image_path, op)

5 -

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

6 -

# def label2det(label):
#     f = open('val.txt', 'a+')
#     f.write('/media/erress/Personal/Programming/BennettUniversity/bdd100k/images/100k/val/%s.jpg' % (label['name']))
#     f.write('\n')
#     f.close()
7 -

def change_dir(path):
    # Write a list file ("testlist.txt") with the image path for every
    # XML annotation found under `path`.
    f = open('testlist.txt', 'a+')
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        value = root.find('filename').text
        f.write('train_data/images/%s' % (value))
        f.write('\n')
    f.close()
    print('file created')

8 -

def main():
    image_path = r"C:\Users\Ajit\Downloads\Road-damage-detection-master\src"
    xml_df = change_dir(image_path)
9 -

import os
import sys

import cv2
import numpy as np
import scipy.misc
import tensorflow as tf
import keras
from keras.models import load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import optimizers as opt
from PIL import Image

# project-local modules: network/training settings, custom YOLO loss,
# network construction, data loading, and argv helpers
import cfgconst
import ddd
import kerasmodel
import yolodata
import utils

# note: this listing targets TensorFlow 1.x and an old Keras API
# (ConfigProto/Session, nb_epoch), so it will not run unmodified on TF 2.x

# cpu/gpu config
config = tf.ConfigProto(device_count={'GPU': 1, 'CPU': 56})  # max: 1 gpu, 56 cpu
sess = tf.Session(config=config)
keras.backend.set_session(sess)

# the detection layer is the last layer of the configured network
det_l = cfgconst.net.layers[len(cfgconst.net.layers) - 1]
CLASSNUM = det_l.classes

# read the class names and check them against the cfg class count
voc_names = []
with open(cfgconst.labelnames) as f:
    for ln in f:
        voc_names.append(ln.strip())  # e.g. ["stopsign", "skis"]
print(voc_names)
if CLASSNUM != len(voc_names):
    print('cfg file class setting is not equal to ' + cfgconst.labelnames)
    exit()

# run_yolo
if len(sys.argv) < 2:
    print('usage: python %s [train/test/valid] [pretrained model (optional)]\n' % (sys.argv[0]))
    exit()

# one label image per class, drawn next to each detection
voc_labels = []
for i in range(CLASSNUM):
    voc_labels.append("ui_data/labels/" + voc_names[i] + ".PNG")
    if not os.path.isfile(voc_labels[i]):
        print('can not load image:%s' % (voc_labels[i]))
        exit()

thresh = utils.find_float_arg(sys.argv, "-thresh", .2)
cam_index = utils.find_int_arg(sys.argv, "-c", 0)
model_weights_path = sys.argv[2] if len(sys.argv) > 2 else 'noweight'
filename = sys.argv[3] if len(sys.argv) > 3 else 'nofilename'
print(sys.argv)
print(model_weights_path + ',' + filename)

def train_yolo(weights_path):
    # note: the original listing reads the global model_weights_path,
    # not the weights_path argument
    net = cfgconst.net
    train_images = cfgconst.train  # e.g. "train_data/train.txt"
    backup_directory = "backup/"

    # load a pretrained model if one was given, otherwise build the network
    if os.path.isfile(model_weights_path):
        print('Loading ' + model_weights_path)
        model = load_model(model_weights_path,
                           custom_objects={'yololoss': ddd.yololoss})
        sgd = opt.SGD(lr=net.learning_rate, decay=net.decay,
                      momentum=net.momentum, nesterov=True)
        model.compile(loss=ddd.yololoss, optimizer=sgd, metrics=["accuracy"])
    else:
        print('Learning Rate: %f, Momentum: %f, Decay: %f\n' %
              (net.learning_rate, net.momentum, net.decay))
        model = kerasmodel.makenetwork(net)

    (X_train, Y_train) = yolodata.load_data(train_images, net.h, net.w, net.c, net)
    print('max_batches : %d, X_train: %d, batch: %d\n' %
          (net.max_batches, len(X_train), net.batch))

    # callbacks defined here but not passed to fit() in the original listing
    early_stop = EarlyStopping(monitor='loss', min_delta=0.001, patience=3,
                               mode='min', verbose=1)
    checkpoint = ModelCheckpoint('yolo_weight.h5', monitor='loss', verbose=1,
                                 save_best_only=True, mode='min', period=1)

    batchesPerdataset = max(1, len(X_train) // net.batch)
    model.fit(X_train, Y_train, nb_epoch=net.max_batches // batchesPerdataset,
              batch_size=net.batch, verbose=1)
    model.save_weights('yolo_weight_rd.h5')
    model.save('yolo_kerasmodel_rd.h5')

def debug_yolo(cfg_path, model_weights_path='yolo_kerasmodel_rd.h5'):
    # evaluate the saved model on the held-out test list
    net = cfgconst.net
    testmodel = load_model(model_weights_path,
                           custom_objects={'yololoss': ddd.yololoss})
    (s, w, h, c) = testmodel.layers[0].input_shape
    x_test, y_test = yolodata.load_data('train_data/test.txt', h, w, c, net)
    testloss = testmodel.evaluate(x_test, y_test)
    print(y_test)
    print('testloss= ' + str(testloss))

def predict(X_test, testmodel, confid_thresh):
    # Decode the raw YOLO output vector. The layout assumed here is a 7x7
    # grid (side == 7, 49 cells), stored as 49 confidences, then 49 x values,
    # 49 y values, 49 widths, 49 heights, then 49 values per class probability.
    print('predict, confid_thresh=' + str(confid_thresh))
    pred = testmodel.predict(X_test)
    (s, w, h, c) = testmodel.layers[0].input_shape

    confid_index_list = []
    classprob_list = []
    class_id_list = []
    x0_list, y0_list, x1_list, y1_list = [], [], [], []

    det_l = cfgconst.net.layers[len(cfgconst.net.layers) - 1]
    side = det_l.side
    classes = det_l.classes
    foundindex = False
    max_confid = 0

    for p in pred:
        # pass 1: collect grid cells whose confidence exceeds the threshold
        for i in range(side):
            for j in range(side):
                max_confid = max(max_confid, p[i * 7 + j])
                if p[i * 7 + j] > confid_thresh:
                    confid_index_list.append(i * 7 + j)
                    foundindex = True
        print('max_confid=' + str(max_confid))

        # pass 2: decode geometry and the best class for each confident cell
        for confid_index in confid_index_list:
            confid_value = max(0, p[0 * 49 + confid_index])
            x_value = max(0, p[1 * 49 + confid_index])
            y_value = max(0, p[2 * 49 + confid_index])
            w_value = max(0, p[3 * 49 + confid_index])
            h_value = max(0, p[4 * 49 + confid_index])
            maxclassprob = 0
            maxclassprob_i = -1
            for i in range(classes):
                if p[(5 + i) * 49 + confid_index] > maxclassprob and foundindex:
                    maxclassprob = p[(5 + i) * 49 + confid_index]
                    maxclassprob_i = i
            classprob_list.append(maxclassprob)
            class_id_list.append(maxclassprob_i)
            print('max_confid=' + str(max_confid) + ',c=' + str(confid_value) +
                  ',x=' + str(x_value) + ',y=' + str(y_value) +
                  ',w=' + str(w_value) + ',h=' + str(h_value) +
                  ',cid=' + str(maxclassprob_i) + ',prob=' + str(maxclassprob))

            # convert cell-relative coordinates into image pixels
            row = confid_index // side
            col = confid_index % side
            x = (w / side) * (col + x_value)
            y = (w / side) * (row + y_value)
            print('confid_index=' + str(confid_index) + ',x=' + str(x) +
                  ',y=' + str(y) + ',row=' + str(row) + ',col=' + str(col))
            x0_list.append(max(0, int(x - (w_value / 2) * w)))
            y0_list.append(max(0, int(y - (h_value / 2) * h)))
            x1_list.append(int(x + (w_value / 2) * w))
            y1_list.append(int(y + (h_value / 2) * h))
        # only the first image in the batch is processed
        break

    sourceimage = X_test[0].copy()
    return sourceimage, x0_list, y0_list, x1_list, y1_list, classprob_list, class_id_list

def test_yolo(imglist_path, model_weights_path='yolo_kerasmodel_rd.h5',
              confid_thresh=0.3):
    # run detection on every image listed in `imglist_path` and display
    # the boxes; press 'q' to stop
    print('test_yolo: ' + imglist_path)
    if not os.path.isfile(imglist_path):
        print(imglist_path + ' does not exist')
        return
    testmodel = load_model(model_weights_path,
                           custom_objects={'yololoss': ddd.yololoss})
    (s, w, h, c) = testmodel.layers[0].input_shape
    f = open(imglist_path)
    for img_path in f:
        if os.path.isfile(img_path.strip()):
            frame = Image.open(img_path.strip())
            nim = scipy.misc.imresize(frame, (w, h, c))
            if nim.shape != (w, h, c):
                continue
            # note: the original passes the global `thresh`, not confid_thresh
            img, x0_list, y0_list, x1_list, y1_list, classprob_list, class_id_list = \
                predict(np.asarray([nim]), testmodel, thresh)
            # draw each confident box with its class image and probability
            for x0, y0, x1, y1, classprob, class_id in zip(
                    x0_list, y0_list, x1_list, y1_list, classprob_list, class_id_list):
                cv2.rectangle(img, (x0, y0), (x1, y1), (255, 255, 255), 2)
                classimg = cv2.imread(voc_labels[class_id])
                print('box=' + str(x0) + ',' + str(y0) + ',' + str(x1) + ',' + str(y1))
                yst = max(0, y0 - classimg.shape[0])
                yend = max(y0, classimg.shape[0])
                img[yst:yend, x0:x0 + classimg.shape[1]] = classimg
                font = cv2.FONT_HERSHEY_SIMPLEX
                cv2.putText(img, str(classprob), (x0, y0 - classimg.shape[0] - 1),
                            font, 1, (255, 255, 255), 2, cv2.LINE_AA)
            cv2.imshow('frame', img)
            if cv2.waitKey(1000) & 0xFF == ord('q'):
                break
        else:
            print(img_path + ' predict fail')
    cv2.destroyAllWindows()

def demo_yolo(model_weights_path, filename, thresh=0.3):
    # run detection frame by frame on a video file; press 'q' to stop
    print('demo_yolo')
    testmodel = load_model(model_weights_path,
                           custom_objects={'yololoss': ddd.yololoss})
    (s, w, h, c) = testmodel.layers[0].input_shape
    cap = cv2.VideoCapture(filename)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        nim = scipy.misc.imresize(frame, (w, h, c))
        img, x0_list, y0_list, x1_list, y1_list, classprob_list, class_id_list = \
            predict(np.asarray([nim]), testmodel, thresh)
        for x0, y0, x1, y1, classprob, class_id in zip(
                x0_list, y0_list, x1_list, y1_list, classprob_list, class_id_list):
            # draw bounding box, class image, and probability
            cv2.rectangle(img, (x0, y0), (x1, y1), (255, 255, 255), 2)
            classimg = cv2.imread(voc_labels[class_id])
            print('box=' + str(x0) + ',' + str(y0) + ',' + str(x1) + ',' + str(y1))
            yst = max(0, y0 - classimg.shape[0])
            yend = max(y0, classimg.shape[0])
            img[yst:yend, x0:x0 + classimg.shape[1]] = classimg
            font = cv2.FONT_HERSHEY_SIMPLEX
            cv2.putText(img, str(classprob), (x0, y0 - classimg.shape[0] - 1),
                        font, 1, (255, 255, 255), 2, cv2.LINE_AA)
        cv2.imshow('frame', img)
        if cv2.waitKey(100) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

# command dispatch
if sys.argv[1] == 'train':
    train_yolo(model_weights_path)
elif sys.argv[1] == 'test':
    if os.path.isfile(model_weights_path):
        test_yolo(filename, model_weights_path, confid_thresh=thresh)
    else:
        test_yolo(filename, confid_thresh=thresh)
elif sys.argv[1] == 'demo_video':
    if os.path.isfile(model_weights_path):
        print('pretrain model:' + model_weights_path + ', video:' + filename +
              ', thresh:' + str(thresh))
        demo_yolo(model_weights_path, filename, thresh)
    else:
        print('syntax error::need specify a pretrained model')
        exit()
elif sys.argv[1] == 'debug':
    # note: cfg_path is not defined in this listing (its assignment is
    # commented out near the top), so debug mode assumes it is set elsewhere
    debug_yolo(cfg_path, model_weights_path)
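For reference, given the argument handling at the top of the listing, typical invocations would look like the following (the script and file names are our own illustrations):

python run_yolo.py train
python run_yolo.py test yolo_kerasmodel_rd.h5 train_data/test.txt -thresh 0.3
python run_yolo.py demo_video yolo_kerasmodel_rd.h5 dashcam.mp4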
