Car Parking Occupancy Detection Using YOLOv3

Car Parking Occupancy Detection using YOLOv3
by
Arepalli Rama Venkata Naga Sai
A thesis submitted in partial fulfillment of the requirements for the

degree of Master of Engineering in
Microelectronics and Embedded Systems
Examination Committee: Dr. Mongkol Ekpanyapong (Chairperson)

Prof. Matthew N. Dailey
Prof. Manukid Parnichkun
Nationality: India
Previous Degree: Bachelor of Technology in Electronics and
Communications Engineering
Jawaharlal Nehru Technological University,
Hyderabad, Telangana, India
Scholarship Donor: AIT Fellowship
Asian Institute of Technology

School of Engineering and Technology
Thailand
May 2019
Acknowledgements
I am grateful to my family for their love, support and motivation on me.
I would take this opportunity to thank my beloved advisor Dr. Mongkol Ekpanyapong for
his support, valuable suggestions, guidance and encouragement which have helped me
throughout to complete my thesis.
I would be grateful to thank Mr. Chatchai and Mr. Clifford for their assistance in
assembling GPU and CPU. I would be grateful to thank Mr. Teerapon for his help
whenever required. I would like to thank the committee members, Prof. Matthew N.
Dailey & Prof. Manukid Parnichkun for their valuable comments and suggestions for my
thesis.
A.R.V.N.SAI
May 2019
ii
ABSTRACT
The car parking occupancy detection is one of the most important systems that are needed
at various parking lots. For this thesis, CNNs have been used because they achieve most
promising results than compared to the other traditional parking detections. This thesis
presents a robust technique for the car parking occupancy detection by going through the
most of the parking issues such as parking displacements, non-unified car sizes and inter-
object occlusion. This thesis presents a real-time parking space detection based on the
Convolutional Neural Networks (CNN). The aim of this thesis is to solve the issue of car
parking for some extent which is done by taking videos from surveillance cameras and
detecting the occupancy of the parking lot whether the parking space is Empty or
Occupied. By using this technique, the main parking lot issue, that is availability of the
parking spaces in the parking lot is solved, so that the drivers do not waste much time in
searching for the parking space and do not leave in frustration of not able to find the exact
Empty space. This thesis uses the YOLOv3 object detection algorithm which is
implemented by deep neural network architecture. This is achieved by collecting the data
from five parking lots at our institute and training them using YOLOv3 model. The
detection is tested on both images and videos and the results indicate that this method is
most efficient in detecting a car in parking lot.
iii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
Title Page i
Acknowledgments ii
Abstract iii
Table of Contents iv
List of Figures vi
List of Tables viii
1 Introduction 1
1.1 Overview 1
1.2 Problem statement 2
1.3 Objective 2
1.4 Limitations and Scope 2
1.4.1 Limitations 2
1.4.2 Scope 3
2 Literature Review 4
2.1 Background 4
2.1.1 Counter-based 4
2.1.2 Sensor-based 5
2.1.3 Vision-based 6
2.2 Related Work 7
2.2.1 Advantages and Disadvantages of previous works 9
2.3 YOLOV3 11
2.4 Proposed Method 14
3 Methodology 15
3.1 Data Collection 15
3.2 Annotating Images 18
3.3 Data Processing 21
3.4 Training and Testing 22
3.5 Flow Chart 25
4 Experimental Results 26
4.1 Results 26
5 Conclusion 60
iv
5.1 Conclusion 60
5.2 Recommendations 60
References 61
v
LIST OF FIGURES
FIGURE TITLE PAGE
Fig 1.1 Car Parking Lot 1

Fig 2.1 Counter-based Parking 4
Fig 2.2 Sensor-based Parking 5
Fig 2.3 Vision-based Parking 6
Fig 2.4 CNN Architecture 11
Fig 2.5 YOLOv3 Architecture 12
Fig 3.1(a) Data Collection of Parking Lot1 from a Video 16
Fig 3.1 (b) Data Collection of Parking Lot2 from a Video 16
Fig 3.1(c) Data Collection of Parking Lot3 from a Video 17
Fig 3.1(d) Data Collection of Parking Lot4 from a Video 17
Fig 3.1(e) Data Collection of Parking Lot5 from a Video 18
Fig 3.2 (a) Parking Lot1 Image labelled using LabelImg Tool 19
Fig 3.2(b) Parking Lot2 Image labelled using LabelImg Tool 19
Fig 3.2(c) Parking Lot3 Image labelled using LabelImg Tool 20
Fig 3.2(d) Parking Lot4 Image labelled using LabelImg Tool 20
Fig 3.2(e) Parking Lot5 Image labelled using LabelImg Tool 21
Fig 3.3 YOLO format of txt files 21
Fig 3.4 Intersection Over Union 23
Fig 3.5 Screenshot taken during training the YOLOv3 model 24
Fig 3.6 Flowchart Representation of Algorithm 25
Fig 4.1 Mean Average Precision (mAP) 27
Fig 4.2(a) Detection of the parking using YOLOv3 28
Fig 4.2(b) Overlapped Output of YOLOv3 in OpenCV 28
Fig 4.2(c) Confidence of the predicted image 29
vi
Fig4.10(a) Detection of the parking using YOLOv3 40
Fig4.10(b) Overlapped Output of YOLOv3 in OpenCV 40
Fig4.10(c) Confidence of the predicted image 41
Fig4.14(a) Detection of parking lot1 on a video 46
Fig4.14(b) Detection of parking lot1 on a video 46
Fig4.19(a) Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV 51
Fig4.19(b) Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV 51
Fig4.19(c) Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV 52
Fig4.19(d) Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV 52
Fig 4.20 Graphical representation of average loss vs number of iterations 53
for car parking occupancy detection
Fig4.21(a) Overlapped output of YOLOv3 in OpenCV of parking lot2 59
Fig4.21(b) Overlapped output of YOLOv3 in OpenCV of parking lot2 59
vii
LIST OF TABLES
TABLE TITLE PAGE
Table 2.2 Related Work 7

Table 2.2.1 Advantages and Disadvantages of previous works 9
Table 4.1 Precision values of car parking occupancy detection 54
Table 4.2 Overall information about the testing videos 55
Table 4.3 Overall accuracy on the testing videos on new data 56
Table 4.4 Accuracy for training on 4 parking lots and testing on 1 parking 58
lot on an image
viii
CHAPTER 1
INTRODUCTION
1.1 Overview
Finding a parking space in the parking lot these days is the major issue which cannot be
neglected because most of the time is wasted and also the energy of the drivers. Due to
increase in the population day by day the number of private vehicles are also increasing
but due to some reasons most of the times the number of parking spaces in a parking lot
remains the same. Sometimes, there may be an empty space in the parking lot but the
driver who sits in the car cannot know where the empty space is exactly located. The
reason behind that may be the free spot is far away from the car, or may be the parking
space is not visible because of some other cars or big objects hiding the empty space. In
the previous days, or may be at some parking lots now there used to be a person who
manages the parking spaces, in fact the person who manages also do not have the
information about where the empty space is located, they get to know only the total
number of empty spaces so the drivers themselves have to look around parking lot and
search for the empty space, meanwhile other driver arrives there are many losses such as
time, fuel and also temper. So, for these kinds of issues the researchers have developed
some solutions like parking lot recognition techniques by making use of some objects like
video cameras and sensors which are used to detect an empty and occupied state of the
parking space. The below fig 1.1 represent the car parking lot of a shopping mall.
Fig 1.1 Car Parking Lot
(https://d2v9y0dukr6mq2.cloudfront.net/video/thumbnail/WUS5VgH/outdoor-parking-
lot-filled-with-parked-cars-outside_bjeihmswe_thumbnail-full01.png)
1
This system can be implemented mostly in the outdoor parking lot so that this can cover
the whole parking lot. Outdoor parking lots are like parking lots at shopping malls,
restaurants and at universities. This can be useful in real-time application by knowing the
empty and occupied parking spaces which also helps in reducing the time taken for
parking a vehicle, pollution and can also reduce the queues at the parking lot.
1.2 Problem Statement
As the population grows there is an increase in the private vehicles which caused many
issues to the people at the crowded areas such as shopping malls, restaurants and
universities parking lots. This results in time waste for the drivers in search for the empty
parking space which includes atmospheric pollution and frustration of the drivers. On the
other hand, most of the other systems show only the total number of spaces empty or
occupied rather than showing the exact location.
1.3 Objectives
• To design a model which can efficiently detect the Empty and Occupied status of
the parking space in an image and a video by making use of YOLOv3 object
detection.
• To detect a car parking lot occupancy in an image and in the video whether it is
Empty or Occupied by using YOLOv3 detector which is represented by
overlapping the output in OpenCV and to collect data which is good and large
enough to train the model.
1.4 Limitations and Scope

1.4.1 Limitations
In the YOLO algorithm, the model predicts and checks the total image and the video with
respect to the training data that is it checks in the camera point of view at the testing time.
Unless few images of the test environment are included in the training dataset, the YOLO
model cannot get good detection. The precision values may change according to the
lighting conditions.
2
1.4.2 Scope
In future the model can be trained by increasing the images in the dataset and making
huge dataset in such a way that it can detected at different locations and with different
environment. The other scope is that the system can be developed in such a way that it
could give directions to the drivers to the nearby available empty space by using deep
learning and AI methods.
3
CHAPTER 2
LITERATURE REVIEW
2.1 Background
It is very often to see large number crowds at some places like, mega shopping malls,
stadiums and universities. In one of the heavily crowded places like shopping malls, there
will be huge crowds whenever there are any discounts and sales season presented by
merchants. As many people travel to these shopping malls in their own vehicles, so the
parking lots will also be crowded these days. Similarly, students and the staff at the
universities come to the institute by their own vehicles. There are many drivers who only
park at their favourite parking spaces in the parking lot during these periods. Generally,
the solutions for the parking lot problems are categorized into 3 groups:
• Counter-based
• Sensor-based
• Vision-based
2.1.1 Counter-based: This type of system work on the sensors which are placed at the
entrance and exit points of the parking lots. These can only give the information about
the total number of empty or occupied spaces in the parking lot but do not give the exact
location about the empty space, so these types of systems cannot be used on street and
residential parking spaces. The below fig 2.1 shows the example of counter-based
parking.
Fig 2.1: Counter-based Parking
(https://upload.ecvv.com/upload/Product/200912/China_Video_Based_Counter2009123
14174110.JPG)
4
2.1.2 Sensor-based:
In the sensor-based systems there are wire and wireless based sensors in which wired
systems which mainly depend ultrasonic, infrared light and the wireless based on the
magnetic-based sensors which are fixed at the parking lot. Both the wired and wireless
are being used mainly at the parking lot like shopping malls which are indoor. Since,
these are mainly used at the indoor environments these ground sensors can be used to
decide the position of the parking spaces. But, this kind of sensors requires high
maintaining of sensors and installing costs at each and every parking space, so on a total
this might become expensive when there are many parking spaces at a parking lot. So,
the solution for this is to use application-focused sensors which can be like magnetic or
infrared sensors and pneumatic road tubes. So, these kinds of sensors are very rarely used
in some cities. Some paperwork will be required for installing these sensors for the
parking space detection and time will also be wasted and the drivers will face some issues
while they are willing to park their cars when the installation of the sensors are going on.
The fig 2.2 below shows an example of sensor-based parking.
Fig 2.2: Sensor-based Parking
(https://sc02.alicdn.com/kf/HTB12jDELXXXXXa_XXXXq6xXFXXXz/Parking-
occupancy-sensors-parking-guidance-system-for.jpg)
5
2.1.3 Vision-based:
By using vision-based systems than the sensor-based has two advantages: versatility and
lower cost. Rather than using one ground sensor per parking space a smart camera can be
used which decreases the cost. Compared to the other systems the accuracy of this system
is far better than the others. Therefore, rather than using the other systems it is better to
go with the vision-based systems which gives high accuracy results. In addition to the
above advantages these can also be used to do the video surveillance activities and can
be used to face recognition or people recognition and video tracking of the moving cars
and people. In the below fig 2.3 the image represents the vision-based parking.
Fig 2.3: Vision-based Parking (Giuseppe et al., 2016)
6
Table 2.2 Related Work:
Reference Description Authors

[1] In this paper, they used the magnetic sensors and Sangwon Lee,
the ultrasonic sensors for better accuracy and Dukhee Yoon,
quality recognition of the cars in the parking Amitabha Ghosh
space.
[2] In this paper, they proposed an intelligent car Tang, V.W.S., Y.
parking system by using the wireless sensor Zheng, and J. Cao
systems. They have fixed one sensor at every
respective parking space to detect the occupancy
of the parking lot. The prototype of the system is
done on toy cars which is controlled by the
remote system and they have used Crossbow
motes which consists of temperature, light and
acoustic sensors.
[3] In this paper, they proposed an algorithm on Z. Zhang, M. Tao and
AMR sensor for parking detection. It is based on H. Yuan
the characteristic of the interference signal which
is caused by the adjacent vehicles. They have
done this with more accuracy than the existing
algorithms. They achieved this on both the
vehicles parking and the vehicles leaving.
[4] In this paper, they have proposed a method for Q. Wu, C. Huang, S.
recognition of the available parking lot by using Wang, W. Chiu and T.
Markov Random Field framework and Support Chen
Vector Machines. In this, the model is trained to
detect the car in the parking space by using some
machine learning algorithm rather than
segmentation of the vehicle in an image which is
taken from the parking lot.
[5] In this paper, they have used machine learning X. Ling, J. Sheng, O.
algorithms and developed a procedure to detect Baiocchi, X. Liu and
the parking lot topologies which depends on the M. E. Tolentino
7
position of the vehicles parked in the parking lot.
They have implemented this system making use
of Raspberry Pi and checked it in real-time urban
street near the University of Washington campus.
[6] In this paper, COINS has been developed by Bong, David & K.C,
using an integrated image processing approach. Ting & Lai, Koon
This system particularly needs manual seeding Chun
process for the initialization process and the
program takes on boundary search to acquire the
exact locations of the parking lot in an image. The
operation of this system is mainly done by object
detection, image difference, edge detection and
voting algorithm.
[7] In this paper, they evaluated the approach for the G. Amato, F. Carrara,
parking detection, by making using of CNNs to F. Falchi, C. Gennaro
detect the parking lot occupancy by using and C. Vairo
Raspberry Pi camera.
[8] In this paper, they present a good detection model S. Valipour, M. Siam,
which depends on deep CNNs. The testing and E. Stroulia and M.
implementation of the system is done a huge Jagersand
dataset and also tested on the video file from the
web-accessible camera present at the parking
space. The status of the parking lots is shown by
their mobile application. The design of their
network is based on VGGNet-F.
8
Table 2.2.1 Advantages and Disadvantages of previous works:
References Type Advantages Disadvantages

[1] [2] [3] Sensor-based 1. There is high 1. It may be a bit
parking confusing for
efficiency. unfamiliar
2. There is no need users.
for driving while 2. It requires high
looking for an installation and
available space. maintenance
3. There are less cost.
chances for 3. It is not
vehicle recommended
vandalism. for high peak
4. There is a hour volume
minimal staff facilities.
requirement if it
is used by known
parkers.
[4] [5] [6] [7] Vision-based 1. Good for car 1. The lighting
[8] searching in conditions
large parking must be well
lots. designed and
2. Easy for lot implemented.
occupancy 2. Software must
detection, be carefully
parking space selected and
recognition, implemented.
parking charges 3. Has no real
collection etc. intelligence:
3. Easily adaptable follow rules.
to inspect
different
products.
9
4. Maintains
consistency of
inspection.
5. Can be
reprogrammed
off-line on
another machine.
Convolutional Neural Networks (ConvNets or CNNs) are same as the human neural
networks which are built with weights and neurons. So, by using Convolutional Neural
Networks the task can be easily learned by the network. So, this is used with the pre-
existing algorithm to recognise in real-time the occupancy of the parking space. This
includes the quality, accurate, efficient and the scalable answer for the real-time
recognition of the parking lot. Convolutional Neural Networks are different kinds of
multi-layered network, which is designed in such a way that with only some pre-
processing it can detect the visual patterns from the image itself. The learning techniques,
particularly the convolutional neural networks provide the solution to the problems such
as parking occupancy detection. This deep learning techniques provides a solution for the
which is good for disturbances like partial occlusions, presence of shadows, variation of
light conditions and exhibits great performance. The results provide some quality when
considered with the parking occupancy detection and environment which are totally
different from the data used while training the model. In the phase of classification it
needs computational resources which makes it appropriate to the embedded environment
such as Raspberry Pi. A convolutional neural network is a combination of a large number
of hidden layers in which each layer does mathematical computations on the image and
gives an output image that is given in input to the network. A convolutional neural
network is different from the classical neural networks which is the CNN has
convolutional layers, which are able to model the spatial correlation of neighbouring
pixels better than the normal fully connected layers. The classification output is given by
the input which is given at the training phase. The model takes lots of training time and
also a bit expensive but after the training is done the detection from the network is quite
fast and efficient. The below fig 2.4 represents the Convolutional Neural Network (CNN)
architecture.
10
Fig 2.4: CNN Architecture (Sepehr et al., 2016)
2.3 YOLOV3
You Look Only Once (YOLO) is one among the faster algorithms for the object
recognition. But, for now it is not the most accurate algorithm for the object recognition,
and can be chosen for the real-time recognition as there will be less loss in terms of
accuracy.
In the past years, YOLO 9000 used to be the fastest which is also one among the most
accurate algorithm. But, before a couple of years it is not the best accurate algorithms
when compared with the other algorithms like SSD and RetinaNet which are the best for
accuracy. Then the next version of the YOLO which is YOLOv3 has been released with
some of the changes in the internal architecture which is known as Darknet. YOLOv3
consists of skip connections, residual blocks and upsampling. Darknet, which is the
YOLOv3 architecture consists of “53” layered network which is originally trained on
Imagenet. Then mainly for the part of detection, “53” more layers are added onto it, and
is given by a “106” fully convolutional underlying architecture for YOLOv3.The below
fig 2.5 represents the YOLOv3 network architecture.
11
Fig2.5: YOLOV3 Architecture
(https://cdn-images-1.medium.com/max/1000/1*d4Eg17IVJ0L41e7CTWLLSg.png)
In the YOLOv3 algorithm the detections can be done at 3 different scales which is
considered as the most important feature in the entire algorithm. It is fully convolutional
neural network which gives the output by applying a “1 x 1” kernel on a feature map. In
the YOLOv3 architecture the detection of the object in the image is done at 3 different
sizes and at 3 different layers in the architecture.
The shape of the detection kernel is given by “1 x 1 x (B x (5+C) )”. In this B represents
the no. of the bounding boxes which are present in a cell on the feature map which can
predict, 5 represents the 4 boundary box attributes and 1 for the confidence of the object
and C denotes the no. of classes. The YOLOv3 which is trained on COCO dataset gives
B = 3 and C = 80, so which gives the size of the kernel as “1 x 1 x 255”. The feature map
which is extracted from this kernel has different height and the width of the previous
feature map, this has the detection attributes along the depth.
For example, the image which is given as input is of size “416 x 416”, as mentioned
earlier the algorithm makes the predictions at 3 different layers of the network which are
done by downsampling the data by 32, 16 and 8.
In the YOLOv3 algorithm detection is done at 3 different layers. The 1st detection is made
by the 82nd layer. The image given at the input is downsampled for the 1st 81 layers in
such a way that it has a stride of 32. For example, the image has dimensions like “416 x
12
416”, the output size after the feature mapping is given by “13 x 13”. So, the output after
the 1 x 1 kernel detection will be “13 x 13 x 255”.
In a similar way the feature map at the 79th layer is sent to some convolutional layers,
then they the image is upsampled by 2 times which gives the dimensions of “26 x 26”.
Then this feature map is joined together with the feature map from the 61st layer. The
feature map is then given to a 1 x 1 convolutional layers to join the features from the 61st
layer. So, the 2nd detection will take place at the 94th layer, by giving a feature map of “26
x 26 x 255”.
This procedure continues in a above similar way, the feature map at the 91st layer is sent
to some convolutional layers, before connecting it with the feature map from the 36th
layer. And the same continues to be done as before the feature map is then given to a “1
x 1” convolutional layers to join the features from the 36th layer. the last and the final
detection is done at the 106th layer giving a feature map of size “52 x 52 x 255”.
All the above process is done to make sure that the problem of predicting the smaller
bounding boxes in the previous version of the YOLOv2 is resolved. The upsampling
layers which are connected to the previous plays a key role in predicting the smaller
bounding boxes.
The “13 x 13” layers predicts the large bounding boxes, the “52 x 52” layer predicts the
smaller bounding boxes and the remaining “26 x 26” layer predicts the medium bounding
boxes. The comparison of YOLOv2 and YOLOv3 is done below. YOLOv3 totally makes
use of 9 anchor boxes, each of the three detections use 3 of anchor boxes. These anchor
boxes are calculated by the K-Means clustering to get the 9 anchors. These anchor boxes
are arranged in the config file according to the descending order of their particular
dimensions.
If the input image of the same size is given to both the YOLOv2 and YOLOv3 then
Yolov3 detects more objects than that of YOLOv2. For example, for the same resolution
of “416 x 416” the YOLOv2 detects “13 x 13 x 5 = 845 boxes”. Using five anchor boxes
5 bounding boxes are predicted. But, as mentioned earlier in YOLOv3 the detections are
done at three different layers. So, for the same input, the no. of bounding boxes detected
are “10,647”. This shows that YOLOv3 makes predictions 10times the number of boxes
done by YOLOv2. This is the reason behind the slowness of the YOLOv3 architecture.
In the YOLOv3 the confidence of the object and the class prediction in YOLOv3 is done
by using logistic regression and the threshold is used to predict many labels for a
particular object.
13
2.4 Proposed Method:
In the proposed method this would like to present a robust detection algorithm based on
the deep learning techniques with good accuracy. The proposed method detects a car
occupancy in the parking lot in an image and in the video by training the model with the
data of the parking lot created. This shows that the convolutional neural networks work
will have a success rate close to the traditional methods used in the past. The system will
be able to detect the car occupancy in the parking lot according to the camera direction
when the camera is installed by the user. In this the images are collected from the
surveillance camera in different angles of the parking lot so that the model can detect the
car in different angles. The data for this is collected form five parking lots from the
institute for better results. As mentioned earlier there are some disadvantages included in
this vision-based system, the detection in this is more accurate and the advantages far
outweighs its disadvantages.
14
CHAPTER 3
METHODOLOGY
In this this, training and finding a car in the parking space is done using convolutional
neural networks. In the proposed method, this would like to collect the images of cars
parked in the outdoor parking lot and create a dataset with all the images so that the neural
network can detect the car present in the parking space or not accurately.
3.1 Data Collection
In this thesis, we collected the images of the five parking lots at the two different sides
of the administrative building and others at the two different sides of the hotel at our
Institute. The images are collected using the temporary surveillance camera which is
placed above the Hom Krun coffee shop and at the hotel in a position such that every
parking slot is covered. This camera can be accessed using the VLC media player,
through which the videos are recorded. The videos are recorded every day in different
lighting conditions. We took snaps of the two different parking lots from the video
recorded. We collected 900 images from the parking lot and trained the model. For
test images, we randomly collected the images from the videos recorded and tested
them for detection, for testing the videos, we collected the video on a different day
for few minutes and tested them.
The camera which is used for collecting the data is Honeywell’s new HBW4PR2
bullet camera which has high picture clarity. The image quality is Full HD 1080p
25/30 fps. The camera is very good in taking the quality of the data to the next level,
it also has wide dynamic range. The performance of the camera is also excellent in
low-light conditions with 3D noise reduction.
15
Figure 3.1(a): Data Collection of Parking Lot1 from a Video
Figure 3.1(b): Data Collection of Parking Lot2 from a Video
The above shown fig 3.1(a), fig 3.1(b) represent the data collected from the
parking lot1 and parking lot2 which is done by taking screenshots on a video
recorded.
16
Figure 3.1(c): Data Collection of Parking Lot3 from a Video
Figure 3.1(d): Data Collection of Parking Lot4 from a Video
The above shown fig 3.1(c), fig 3.1(d), fig 3.1(e) represents the data collected
from the parking lot1, parkinglot2 and parking lot3 which is done by taking
screenshots on a video recorded.
17
Figure 3.1(e): Data Collection of Parking Lot5 from a Video
3.2 Annotating Images
We annotated images with two different classes which are “Empty” and “Occupied”.
For labelling the image we used LabelImg software which is an open source. In this
software the images can be annotated easily as it is user friendly. Firstly, the software
contains some pre-defined classes which should be changed according to our classes,
we have named the classes as “Empty” and “Occupied”. The software is set to YOLO
format and we annotated the images where there is no car in a particular slot as
“Empty” and where there is a car as “Occupied”. The data of the bounding boxes that
we annotated along with the two classes are saved in the txt format which will save
the coordinates of the bounding boxes.
18
Figure 3.2(a): Parking Lot1 Image labelled using LabelImg Tool
Figure 3.2(b): Parking Lot2 Image labelled using LabelImg Tool
The above shown fig 3.2(a), fig 3.2(b) represents the annotating images of the parking
lot1 and parking lot2. The images are annotated using the LabelImg Tool.
19
Figure 3.2(c): Parking Lot3 Image labelled using LabelImg Tool
Figure 3.2(d): Parking Lot4 Image labelled using LabelImg Tool
The above shown fig 3.2(c), fig 3.2(d) represents the annotating images of the parking
lot1 and parking lot2. The images are annotated using the LabelImg Tool.
20
Figure 3.2(e): Parking Lot5 Image labelled using LabelImg Tool
3.3 Data Processing
For YOLOV3 detector, there is a different format for the values to be fed into the
model. These values are saved in the txt format which contains some of the specified
classes and some values. These values can be as below.
The order of the values in txt file is class, x, y, w, h.
“where x = Absolute x / width of total image.
y = Absolute y / height of total image.
w = Absolute width / width of the total image.
h = Absolute height / height of total image.”
Figure 3.3: YOLO format of txt files
21
Where Absolute x, Absolute y, Absolute width, Absolute height are given below
“Absolute x = (Xmin + (Absolute width/2))
Absolute y = (Ymin + (Absolute height/2))
Absolute width = abs (Xmax - Xmin)
Absolute height = abs (Ymax - Ymin)”
In the above figure 3.3 the number ‘0’ denotes the class “Empty” and the number ‘1’
denotes the class “Occupied” and the other values denotes the metrics x,y,w,h.
3.4 Training and Testing
As the deep learning requires hardware with high end specifications for training, so
we are using hardware with the following configurations
CPU- Intel core i7-7700k, 4.3 GHz,
Motherboard – Asus Prime Z270-A and Asus ROG-SYRIX-GTX1080TI
RAM – 16GB for CPU and 11GB for GPU
We have used the pre-trained weights of YOLOV3 which are trained on COCO
dataset which consists of 80 classes and images are more than 80000. The training of
our model on the dataset was done by the previous steps. The filters of YOLOV3
configuration file should be set according to the below YOLOV3 filters formula
• “Filters = (classes + 5) * 3”
As there are two classes in our case, the classes are set to our number of objects that
is two in each of 3 yolo layers, the filters which are in convolutional layer just before
the 3 yolo layers are set to 21. The YOLOv3 configuration file consists of batch
number which is set to 64, subdivisions to 16, learning rate to 0.001, saturation to 1.5
and exposure to 1.5. The resolution of the input image is resized to a width and height
of 608*608. And the training of our model is done, but as YOLOv3 does not limit the
number of iterations to be done, the system continuously trains the model. Usually
2000 iterations are sufficient for each class(object), but the iteration number should
not be less than 4000 in total. But, for a better precision the training should be stopped
for the iteration where the average loss no longer decreases or remains constant. So,
when the training should be actually stopped depends upon the average loss function.
22
The average loss function changes gradually from a higher value to the lower value
and remains constant, then the training can be stopped.
The weights file is generated and saved for every 1000 iterations and this can be
changed in the configuration file for our desired number of iterations. After the
training is done the mAP (Mean Average Precision) is calculated for the validated
images. This should be done for every weight so that we can know the mAP for each
and every weight and the weights file whose mAP is higher can be used for testing
the image for good detection.
Thresh is considered as the minimum threshold of the IOU(Intersection Over Union)

during the training. IOU(Intersection Over Union) is determined to be the fraction of
area of union between the anchor box and the ground truth.
Figure3.4: Intersection Over Union
(https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-
detection/)
23
Figure 3.5: Screenshot taken during training the YOLOv3 model
The above fig 3.5 is screenshot taken when training the YOLOv3 model
24
3.5 Flow Chart:
The flow chart for training and testing the YOLOv3 model and representing it in
OpenCV is as follows in the below fig 3.6
Figure 3.6: Flowchart representation of Algorithm
25
CHAPTER 4
EXPERIMENTAL RESULTS
4.1 Results:
The results which are tested are from five parking lots i.e parking lot 1,2,3,4&5 at our
Institute. First, the occupancy detection of the parking lot is done by YOLOv3 detector
but as the parking lot at our institute is in quadrilateral form and in the form of squares
and rectangles, the bounding boxes of the parking lot are drawn manually using OpenCV
python code and saved them in an xml format and the output data of the YOLOv3 detector
and the each class i.e whether it is Empty or Occupied is used to detect the occupancy by
overlapping them with the bounding boxes using python code in such a way that it covers
each slot of the parking lot. For each image below we can see the good performance of
the system most of the times. So far, the accuracy is good when tested on the images and
also from the videos. The confidence of the model for each test image are attached below
the image which is pretty much good at all the times. For some of the videos due to the
detector not overlapping exactly on given particular slot there are miss predictions.
For this thesis we have used 10% of the total images for validating our model so that we
can calculate the mAP (Mean Average Precision) of our model. As mentioned earlier we
have two classes in our model which are represented by class_id 0,1. In the case of mAP,
precision and recall are the two ways to calculate the predictions. It can be calculated by
the below formula
Precision = TP / (TP+FP)
TP (True Positive): This is the case in which the network predicted correct, and it is True.
TN (True Negative): This is the case in which the network predicted wrong, and it is
False.
FP (False Positive): This is the case in which the network predicted correct, but it is False.
FN (False Negative): This is the case in which the network predicted wrong, but it is True.
26
Fig 4.1: Mean Average Precision(mAP)
As shown in the above figure 4.1 the mAP (Mean Average Precision) is 89 percent for
the parking lot.
As mentioned earlier, for the occupancy detection we used YOLOv3 detector, the output
is shown in below figures 4.2(a), 4.3(a), 4.4(a), 4.5(a), 4.6(a), 4.7(a), 4.8(a), 4.9(a),
4.10(a), 4.11(a), 4.12(a), 4.13(a). As shown in the figure 3.1(a), 3.1(b), 3.1(c), 3.1(d),
3.1(e) are the lanes at our institute are in the form of quadrilaterals and also squares and
rectangles so to exactly cover the parking slot we need to label the images manually which
can be done using OpenCV. So, the YOLOv3 detector data is used to detect the occupancy
by overlapping them with the bounding boxes in OpenCV. The overlapping output of
YOLOv3 in OpenCV for each parking lot are shown in below figures 4.2(b), 4.3(b),
4.4(b), 4.5(b), 4.6(b), 4.7(b), 4.8(b), 4.9(b), 4.10(b), 4.11(b), 4.12(b), 4.13(b). After the
occupancy detection the confidence of each image is seen in the terminal window, the
confidences of each predicted image in YOLOv3 for each parking lot are shown in the
Figures 4.2(c), 4.3(c), 4.4(c), 4.5(c), 4.6(c), 4.7(c), 4.8(c), 4.9(c), 4.10(c), 4.11(c), 4.12(c),
4.13(c). For detection in the videos, the videos are tested with different data which are
not included in the training dataset. The videos are randomly recorded for few minutes
on a different day and tested. The snapshots of the predicted output of the videos tested
27
are shown in the Figures 4.14(a), 4.14(b), 4.15(a), 4.15(b), 4.16(a), 4.16(b), 4.17(a),
4.17(b), 4.18(a), 4.18(b). The testing videos also includes some overlapping problems in
the videos due to the detector not overlapping exactly on given particular slot present in
OpenCV. This case is occurred due to the camera is not fixed in a particular angle which
covers every parking space for some parking lots, so the bounding boxes are not exactly
overlapping on the particular space and changing on a video. In this case, the YOLOv3
output is correct but due to wrong overlapping this is occurred on OpenCV. Figures
4.19(a), 4.19(b), 4.19(c), 4.19(d) represents the wrong overlapping of YOLOv3 in
OpenCV.
Figure 4.2(a): Detection of the parking using YOLOv3
Figure 4.2(b): Overlapped Output of YOLOv3 in OpenCV
28
Figure 4.2(c): Confidence of the predicted image
The above shown fig 4.2(c) represents the confidence of each predicted output in
YOLOv3 which is shown in the terminal window and the fig 4.3(a) represents the
occupancy detection of YOLOv3 detector which is done by testing in YOLOv3 detector.
29
The above shown fig 4.3(b) represents the output of OpenCV which is done by
overlapping the output of YOLOv3 detector in OpenCV and the fig 4.3(c) represents the
confidence of each predicted output in YOLOv3 which is shown in the terminal window.
30
The above fig 4.4(a) shows the occupancy detection of YOLOv3 detector which is done
by testing in YOLOv3 detector and the fig 4.4(b) shows the output of OpenCV which is
done by overlapping the output of YOLOv3 detector in OpenCV.
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
Figure 4.14(a): Detection of the parking lot1 on a video
Figure 4.14(b): Detection of the parking lot1 on a video
The above fig 4.14(a), fig 4.14(b) represent the snapshots of the predicted output of the
videos tested of the parking lot1. The videos are randomly recorded for few minutes on a
different and tested.
46
47
48
49
50
Figure 4.19(a): Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV
Figure 4.19(b): Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV
The above fig 4.19(a), fig 4.19(b) shows the wrong overlapping of bounding boxes of
YOLOv3 in OpenCV. It is occurred in the video output due to the bounding boxes not
overlapping exactly on the particular parking space. This occurs due to the camera is not
fixed in a particular angle which covers every parking space.
51
Figure 4.19(c): Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV
Figure 4.19(d): Wrong Overlapping of bounding boxes of YOLOv3 in OpenCV
The above fig 4.19(a), fig 4.19(b) shows the wrong overlapping of bounding boxes of
YOLOv3 in OpenCV. It is occurred in the video output due to the bounding boxes not
overlapping exactly on the particular parking space. This occurs due to the camera is not
fixed in a particular angle which covers every parking space.
52
Figure 4.20: Graphical representation of average loss vs number of iterations for
car parking occupancy detection
The above graph is plotted by taking down the accuracies for every 1000 iterations. This
is the graphical representation of Table 4.1
53
Table 4.1 Precision values of Car parking occupancy detection for the images
Number of Iterations Precision (%)
1000 86.72
2000 88.96
3000 89.69
4000 88.79
5000 89.34
6000 88.65
7000 88.97
8000 89.21
9000 89.14
10000 88.75
11000 89.16
12000 89.03
13000 88.93
14000 89.26
15000 88.73
16000 89.28
17000 89.03
18000 89.37
19000 89.31
20000 89.32
54
Table 4.2 Overall information about the testing videos
Test Video Total duration of Location of the Video recording

the video tested parking lot date
1 8sec Administrative 5th April 2019
building
2 4min 4sec Administrative 5th April 2019
building
3 36sec Administrative 5th April 2019
building
4 19sec Hom crun coffee shop 4th April 2019
5 1min 30sec Hom crun coffee shop 4th April 2019
6 10min 35sec Hom crun coffee shop 4th April 2019
7 2min 43sec Hom crun coffee shop 18th Feb 2019
8 8sec Aitcc Annex 18th April 2019
10 1min 20sec Aitcc Annex 18th April 2019
The above table 4.2 represents the total tested videos at different parking locations. The
videos are taken from five parking lots and tested, the time duration and the video
recording date are also included in the above table.
55
Table 4.3 Overall accuracy on the testing videos on new data
Test Actually Actually Predicted Predicted No. of Accuracy

Videos Empty Occupied Empty Occupied wrong (%)
predictions
1 2 5 2 5 - 100
2 2 5 2 5 - 100
3 2 5 2 5 - 100
1 6 1 6 - 100
4 9 4 9 4 - 100
5 8 5 9 4 1 92.3
7 6 9 4 2 84.6
6 7 7 6 1 92.3
5 8 6 7 1 92.3
6 2 11 3 10 1 92.3
2 11 4 9 2 84.6
3 10 4 9 1 92.3
2 11 2 11 - 100
2 11 3 10 1 92.3
1 12 3 10 2 84.6
2 11 3 10 1 92.3
7 7 7 7 7 - 100
6 8 6 8 - 100
8 9 6 10 5 1 93.3
9 10 2 10 2 - 100
10 8 7 8 7 - 100
9 6 9 6 - 100
10 5 10 5 - 100
9 6 10 5 1 93.33
11 10 2 10 2 - 100
10 2 11 1 1 91.66
9 3 11 1 2 83.33
9 3 10 2 1 91.66
56
Test Actually Actually Predicted Predicted No. of Accuracy
Videos Empty Occupied Empty Occupied wrong (%)
predictions
12 5 9 8 6 3 78.5
5 9 7 7 2 85.7
4 10 6 8 2 85.7
4 10 8 6 4 71.4
4 10 7 7 3 78.5
13 6 8 8 6 2 85.7
5 9 8 6 3 78.5
5 9 7 7 2 85.7
5 9 6 8 1 92.8
4 10 5 9 1 92.8
4 10 7 7 3 78.5
4 10 5 9 1 92.8
The above table 4.3 represents the overall accuracy on the testing videos on new data. For
the final testing, the data is taken from a different day and the actual empty and actual
occupied data is taken from the video so that this can calculate the overall precision. By
running the video, we can get the predicted empty and predicted occupied values from
which we get to know the total wrong detections for each video. It is occurred in the video
output due to the bounding boxes not overlapping exactly on the particular parking space.
This occurs due to the camera is not fixed in a particular angle which covers every parking
space. This case also occurs when there are some miss predictions in the YOLOv3 output.
By calculating the average precision for all the videos, the total overall accuracy is up to
94%. The values are noted for each change of the car parking space so that we can get
exact accuracy of the system.
57
Table 4.4 Accuracy for training on 4 parking lots and testing on 1 parking lot on
an image
Testing Actually Actually Predicted Predicted No. of Accuracy

Images Empty Occupied Empty Occupied wrong (%)
of only predictions
parking
lot2
1 2 5 5 2 3 57.14
2 2 5 5 2 3 57.14
3 2 5 5 2 3 57.14
4 2 5 4 3 2 71.42
5 2 5 4 3 2 71.42
6 2 5 4 3 2 71.42
7 2 5 4 3 2 71.42
8 3 4 6 1 3 57.14
9 4 3 5 2 1 85.71
10 4 3 6 1 2 71.42
11 3 4 5 2 2 71.42
12 2 5 5 2 3 57.14
13 2 5 4 3 2 71.42
14 1 6 6 1 5 28.57
15 0 7 6 1 6 14.28
16 0 7 6 1 6 14.28
17 1 6 6 1 5 28.57
18 0 7 6 1 6 14.28
19 0 7 6 1 6 14.28
20 0 7 4 3 4 42.85
The above table 4.4 represents the accuracy for the test data of which the environment is
completely different from the training data. The images in the test data are taken from the
parking lot 2 and tested with the trained 1,3,4,5 parking lot weights. The number of classes
is set to 1 during the training phase which is “Occupied”. This is done for checking the
58
accuracy of the model. As mentioned in the limitations some of the test environment
images should be included in the training data for good accuracy. So, this is done mainly
to check the accuracy of the model on a new test environment. The average accuracy of
the model on a new test environment is up to 51%. These are tested totally on 20 images
of the parking lot2. This can be increased by including some of the test environment data
in the training data. Due to not at all including the test environment data in the training
data, the accuracy of the model is very less compared to all the above scenarios. The
below fig 4.21(a) and 4.21(b) represent the testing images of the YOLOv3 output
overlapping with OpenCV of the parking lot2.
Fig 4.21(a) Overlapped output of YOLOv3 in OpenCV of parking lot2
Fig 4.21(b) Overlapped output of YOLOv3 in OpenCV of parking lot2
59
CHAPTER 5
CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion
The car parking detection occupancy is done by collecting the data from different
environments in real-time scenario from five parking lots. It is performed and tested
on images including the videos at five parking lots on YOLOv3 detector and on
OpenCV by overlapping the output of YOLOv3 detector.
5.2 Recommendations
This thesis can be further implemented by increasing the dataset by taking data from
different locations from different cameras so that the precision and the accuracy can
be improved. This can also be further implemented in night conditions.
60
REFERENCES
[1]. S. Lee, D. Yoon and A. Ghosh, "Intelligent parking lot application using wireless
sensor networks," 2008 International Symposium on Collaborative Technologies and
Systems, Irvine, CA, 2008.
[2]. V. W. s. Tang, Y. Zheng and J. Cao, "An Intelligent Car Park Management System
based on Wireless Sensor Networks," 2006 First International Symposium on Pervasive
Computing and Applications, Urumqi, 2006.
[3]. Z. Zhang, M. Tao and H. Yuan, "A Parking Occupancy Detection Algorithm Based
on AMR Sensor," in IEEE Sensors Journal, vol. 15, no. 2, pp. 1261-1269, Feb. 2015.
[4]. Q. Wu, C. Huang, S. Wang, W. Chiu and T. Chen, "Robust Parking Space Detection
Considering Inter-Space Correlation," 2007 IEEE International Conference on
Multimedia and Expo, Beijing, 2007.
[5]. X. Ling, J. Sheng, O. Baiocchi, X. Liu and M. E. Tolentino, "Identifying parking

spaces & detecting occupancy using vision-based IoT devices," 2017 Global Internet of
Things Summit (GIoTS), Geneva, 2017.
[6]. Bong, David & K.C, Ting & Lai, Koon Chun. (2008). Integrated Approach in the
Design of Car Park Occupancy Information System (COINS). IAENG International
Journal of Computer Science. 35.
[7]. G. Amato, F. Carrara, F. Falchi, C. Gennaro and C. Vairo, "Car parking occupancy
detection using smart camera networks and Deep Learning," 2016 IEEE Symposium on
Computers and Communication (ISCC), Messina, 2016.
[8]. S. Valipour, M. Siam, E. Stroulia and M. Jagersand, "Parking-stall vacancy indicator

system, based on deep convolutional neural networks," 2016 IEEE 3rd World Forum on
Internet of Things (WF-IoT), Reston, VA, 2016.
[9]. TzuTaLin, “tzutalin/labelimg: Labelimg is a graphical image annotation tool and

label object bounding boxes in images.” https://github.com/ tzutalin/labelImg, June 2017.
(Accessed on 09/30/2016).
[10]. B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, “Labelme: A

database and web-based tool for image annotation,” International Journal of Computer
Vision, vol. 77, no. 1, pp. 157–173, 2008.
61
[11]. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep
convolutional neural networks,” in Advances in Neural Information Processing Systems
25 (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), pp. 1097–1105,
Curran Associates, Inc., 2012.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied

to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[13] Hui, J. (Mar7, 2018), mAP(mean average precision) for object detection.
[14] Kathuria, A. (Apr 23, 2018). Whats new in yolov3?ence.com
[15] Murugavel, M. (Jun 23, 2018). How to train yolov3to detect custom objects.
[16] Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement.
[17] Redmon, J. (Mar 26, 2018). Yolo: Real-time object detection.
62

Car Parking Occupancy Detection Using YOLOv3

Uploaded by

Copyright:

Available Formats

Car Parking Occupancy Detection Using YOLOv3

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Car Parking Occupancy Detection Using YOLOv3

Uploaded by

Copyright:

Available Formats

Car Parking Occupancy Detection using YOLOv3

Arepalli Rama Venkata Naga Sai

A thesis submitted in partial fulfillment of the requirements for the

Examination Committee: Dr. Mongkol Ekpanyapong (Chairperson)

Scholarship Donor: AIT Fellowship

Asian Institute of Technology

I am grateful to my family for their love, support and motivation on me.

CHAPTER TITLE PAGE

FIGURE TITLE PAGE

Fig 1.1 Car Parking Lot 1

TABLE TITLE PAGE

Table 2.2 Related Work 7

Fig 1.1 Car Parking Lot

1.2 Problem Statement

1.4 Limitations and Scope

Fig 2.1: Counter-based Parking

The fig 2.2 below shows an example of sensor-based parking.

Fig 2.2: Sensor-based Parking

Fig 2.3: Vision-based Parking (Giuseppe et al., 2016)

Reference Description Authors

References Type Advantages Disadvantages

3.1 Data Collection

Figure 3.1(b): Data Collection of Parking Lot2 from a Video

Figure 3.1(d): Data Collection of Parking Lot4 from a Video

3.2 Annotating Images

Figure 3.2(b): Parking Lot2 Image labelled using LabelImg Tool

Figure 3.2(d): Parking Lot4 Image labelled using LabelImg Tool

3.3 Data Processing

The order of the values in txt file is class, x, y, w, h.

“where x = Absolute x / width of total image.

y = Absolute y / height of total image.

w = Absolute width / width of the total image.

h = Absolute height / height of total image.”

Figure 3.3: YOLO format of txt files

“Absolute x = (Xmin + (Absolute width/2))

Absolute y = (Ymin + (Absolute height/2))

Absolute width = abs (Xmax - Xmin)

Absolute height = abs (Ymax - Ymin)”

3.4 Training and Testing

CPU- Intel core i7-7700k, 4.3 GHz,

Motherboard – Asus Prime Z270-A and Asus ROG-SYRIX-GTX1080TI

RAM – 16GB for CPU and 11GB for GPU

Thresh is considered as the minimum threshold of the IOU(Intersection Over Union)

Figure3.4: Intersection Over Union

Figure 3.6: Flowchart representation of Algorithm

Figure 4.2(a): Detection of the parking using YOLOv3

Figure 4.2(b): Overlapped Output of YOLOv3 in OpenCV

Figure 4.3(a): Detection of the parking using YOLOv3

Figure 4.3(c): Confidence of the predicted image

Figure 4.4(b): Overlapped Output of YOLOv3 in OpenCV

Figure 4.5(a): Detection of the parking using YOLOv3

Figure 4.5(c): Confidence of the predicted image

Figure 4.6(b): Overlapped Output of YOLOv3 in OpenCV

Figure 4.7(a): Detection of the parking using YOLOv3

Figure 4.7(c): Confidence of the predicted image

Figure 4.8(b): Overlapped Output of YOLOv3 in OpenCV

Figure 4.9(a): Detection of the parking using YOLOv3

Figure 4.9(c): Confidence of the predicted image

Figure 4.10(b): Overlapped Output of YOLOv3 in OpenCV