Word-Template-CSI-7-CVP_2023 (1) copy
Word-Template-CSI-7-CVP_2023 (1) copy
Word-Template-CSI-7-CVP_2023 (1) copy
2022-2023 1
Abstract— Detecting cars in images is essential for make safe decisions. Therefore, having a reliable car detection
several applications, including traffic monitoring and system that works efficiently is vital for ensuring safety,
autonomous driving systems. However, traditional increasing efficiency, and making intelligent decisions.
computer vision techniques need help with accurate
detection due to various challenges, such as variations in
car models, occlusions, lighting conditions, and complex
Detecting cars is a complex task, as various challenges need
backgrounds. To overcome these obstacles, deep learning to be overcome. These difficulties arise from the vast range of
approaches like convolutional neural networks (CNNs) car models, changes in lighting conditions, obstructions, and
have gained popularity. In this study on car detection convoluted backgrounds. Conventional computer vision
systems using deep learning methods, we proposed a techniques like feature extraction and handcrafted rule-based
system that can do car detection accurately using a methods have yet to deal with these obstacles efficiently. This
diverse dataset of car images with bounding box shortcoming is because such approaches frequently hinge on
annotations for training. The model uses hierarchical manual feature engineering, which is often time-consuming,
representations of image features extracted by CNNs to
limited in adaptability, and vulnerable to errors. As a result of
analyze patterns and relationships associated with cars
for precise detection of unseen objects. Our experiments these issues, detecting cars has become a daunting task for
demonstrate that the system outperforms traditional computer vision experts.
computer vision methods enabling its application in traffic
monitoring systems to improve flow and detect violations, Deep learning techniques have become increasingly popular,
parking lot surveillance systems for occupancy particularly in computer vision. Among these approaches,
monitoring, and autonomous driving systems for convolutional neural networks (CNNs) have emerged as some
surrounding vehicle tracking, contributing to safer of the most promising models for overcoming the limitations
navigation. Deep learning techniques offer an efficient of traditional methods. One of the main advantages of CNNs
solution to address challenges faced in various
applications making it a potential revolutionizing tool
is their ability to learn hierarchical representations of image
contributing to safer transportation systems. features through multiple processing layers. It allows them to
capture complex patterns and relationships that might be
Index Terms — CNN, FPS, RESNET, SSD, YOLO difficult or impossible to detect using more traditional
algorithms.
I. INTRODUCTION
One specific application where CNNs have shown great
In the modern era, computer vision has witnessed many success in car detection. By training on large-scale datasets
significant advancements attributed to the rapid development that contain a diverse range of cars in different lighting
of deep learning techniques. Deep learning is a subset of conditions and perspectives, CNN-based models can learn to
machine learning with remarkable capabilities in tackling recognize cars with high accuracy and robustness. It has
image-related tasks like object detection. Object detection is important implications for various industries, from automotive
crucial in applications like surveillance systems, autonomous safety to traffic management systems.
vehicles, and image analysis. The ability to detect objects in
an image with accuracy and speed makes it possible for Overall, it is clear that deep learning approaches like CNNs
machines to interpret their environment as humans do, making are providing new opportunities for solving complex problems
them more useful for real-world scenarios. With these in computer vision and beyond. As researchers continue to
breakthroughs in computer vision technology, it has become develop and refine these models, we can expect even more
possible to make machines perceive their surroundings better. exciting breakthroughs in the future.
Car detection has become increasingly popular and essential We aim to develop an efficient car detection mechanism
due to its numerous practical uses. One main reason is that it with advanced deep learning methods. The aim is to detect and
can help with traffic monitoring, which is crucial in today's pinpoint the precise location of cars in photographs by
fast-paced world. Moreover, it can be used for surveillance in outlining bounding boxes around recognized car shapes. Our
parking lots, where it has proven valuable. Another critical target detection system must be capable enough to deal with
application of car detection technology is vehicle counting, numerous challenges, including variations in car models,
essential for understanding traffic patterns and optimizing distinct viewing angles, obstructions caused by other objects,
transportation infrastructure. Lastly, accurate and efficient car and intricate backgrounds. This system will require high
detection is critical in developing autonomous driving systems accuracy and reliability, as we need it to function proficiently
that rely on computer vision to perceive their surroundings and
2 CSI-7-MAL 2022-2023
even in challenging situations such as low-light conditions or generated box. These systems offer promising solutions for
poor weather. Moreover, we want our model to be scalable, efficient and accurate object detection in various applications.
enabling it to work seamlessly on datasets of varying sizes
without sacrificing efficiency or precision. Another popular technique for real-time object detection is
the utilization of Haar-like features and cascade classifiers,
A refined deep-learning algorithm will be employed to which was introduced by Viola and Jones (2001)[4].The
address the issue. The algorithm will be fine-tuned and approach has gained popularity in car detection due to its
optimized through training with an extensive and diverse set computational efficiency and ease of use. This method utilizes
of image data comprising cars, complemented with bounding AdaBoost classifiers in conjunction with a series of cascading
box annotations corresponding to each car in the images. By classifiers to achieve high detection rates with minimal false
employing such training methodology, the model will learn to positives.
distinguish and recognize unique features and patterns that
differentiate automobiles from other objects in the images. The combination of deep learning techniques and
Once trained, the model will exhibit exceptional accuracy in conventional computer vision algorithms has been
detecting cars within unfamiliar but related snaps utilizing investigated by researchers to improve car detection
precise bounding box predictions for each detected performance. Zhang et al. (2018)[5] suggested integrating
automobile. deep learning-based region proposal networks (RPNs) with
histogram of oriented gradients (HOG) features, resulting in
The primary objective of the suggested car detection enhanced accuracy by effectively capturing appearance and
mechanism is to attain an outstanding level of precision, spatial information.
durability, and productivity. The system must cope with
various environmental predicaments, such as extreme weather Additionally, studies have investigated data augmentation
conditions or environmental obstructions like fog and dust. methods to expand the training data and enhance model
Furthermore, it should be able to function seamlessly in generalization. Approaches like altering image orientation,
different lighting conditions, including low-lighting situations, size, and reflection have been utilized to amplify the range of
daytime illumination, and nighttime settings. The detection the training instances, boosting the model's capability to
system should also possess the ability to recognize objects identify automobiles in different scenarios and perspectives.
even when they are partially covered or obstructed by other
vehicles or obstacles in traffic scenarios, which will enhance Performance assessment of car detection models is typically
its efficiency. carried out by researchers through the use of metrics such as
accuracy, mean average precision (mAP), and intersection
II. LITERATURE REVIEW over union (IoU). These metrics aid in determining how well
Computer vision uses object detection, an essential process the model can accurately classify and localize cars depicted in
in identifying where objects are located within images or images.
videos. This task has been approached through various
methods, but deep learning is one of today's most widely Deep learning has also been widely explored for object
employed techniques. The effectiveness of deep learning when detection, with researchers proposing several effective models.
it comes to detecting objects has been well established, and it For instance, He et al. [6] introduced ResNet (Residual
has found applications in diverse fields. The primary goal is to Network), a deep neural network that has achieved remarkable
develop a car detection model using deep learning methods success in object detection and other image recognition tasks.
while examining relevant works on the subject. Meanwhile, RetinaNet addresses the challenge of class
imbalance in object detection by using a focal loss function.
In object detection, Convolutional neural networks (CNNs) Specifically, it assigns lower loss to more accessible examples
have been making strides in achieving remarkable results. The and higher loss to harder ones to balance them better.
Faster R-CNN model proposed by Ren et al. [1] is one Similarly, Redmon et al.'s [7] YOLO (You Only Look Once)
example that utilizes CNN-based methods for this task. The is an ultra-fast real-time object detection system operating at
process involves generating region proposals via selective 45 frames per second (FPS). Moreover, Liu et al. proposed the
search and then classifying these proposals with the help of a R-FCN framework based on Faster R-CNN for detecting
CNN. Another popular technique for object detection is You objects with state-of-the-art accuracy on the PASCAL VOC
Only Look Once (YOLO) [2]. YOLOv3, its latest version, 2012 dataset. All these models hold immense potential for
follows a one-stage detection system that uses a single CNN to various computer vision applications and have significantly
predict the bounding boxes and their respective class contributed to advancing the field of object detection through
probabilities for each detected object. This system has gained deep learning techniques.
immense popularity due to its high accuracy and real-time
performance capabilities. Similarly, Single Shot Detector III. DATASET
(SSD) [3], which employs a single CNN to perform object The dataset chosen is known as 'car object detection', which
detection, generates multiple default boxes for every feature consists of images and annotations specially curated to cater to
map and predicts both offset and class probabilities for each those interested in car detection and classification tasks. These
annotated images will enable us to train our algorithms
3 CSI-7-MAL 2022-2023
precisely, improving their accuracy in identifying cars within a In order to begin processing the data, the first step involves
given image or video feed. The brains behind this dataset reading it and then translating its contents into a pandas data
belong to an individual with the username ‘Edward Zhang’ frame. In this process, specific columns of interest, namely
who uploaded the data onto Kaggle one of the most well- 'xmin,' 'ymin,' 'xmax,' and 'ymax,' are converted into integer
known platforms for data science professionals. By sharing values for easier manipulation. Once this is done, the next step
this valuable resource with others in the field, ‘Edward Zhang’ involves eliminating any redundant entries in the data frame to
has enabled individuals worldwide to access an opportunity to the 'image' column using the "drop_duplicates" method. It
leverage previously unavailable tools for developing ensures that unique and non-repetitive data points remain for
innovative solutions. further analysis, streamlining, and optimizing subsequent
stages of data processing.
The dataset consists of two main folders: "images" and
"annotations". The "images" folder contains 1,758 images of To move forward, it is essential to establish certain utility
motorcars gathered from various sources, such as digital functions that will help display images. A critical function that
image banks and individual collections in JPG format, with comes into play here is the 'display_image' function, which
resolutions ranging from 640x480 to 1920x1080. These displays an image along with its predicted coordinates and
images have captured every aspect of the cars from multiple bounding box coordinates (if available) using Matplotlib. On
perspectives, taken under different illuminating conditions. the other hand, the 'display_image_from_file' function
Moreover, a diverse array of automobiles is represented in the functions to read an image from a file before passing it on to
collection ranging from sports cars to vintage cars to modern- be displayed by calling up the 'display_image' function.
day family vehicles. Each car has been photographed while Lastly, we have the 'display_from_dataframe.' This particular
existing in various backgrounds, such as busy roads or vacant function takes inputs in the form of a row from a data frame
parking lots. The dataset thus offers a comprehensive and then proceeds to call up the 'display_image_from_file'
collection of vehicle images that can be scrutinized for with all necessary arguments in order for it to display that
numerous purposes. The "annotations" folder contains specific image correctly.
corresponding annotations for each image in XML files. The
annotation files carry crucial information about the cars' The function known as 'display_grid' can showcase a grid of
positions in the images and their respective class labels. These images and their bounding boxes. For this purpose, it requires
labels encompass various categories, such as sedans, sports two arguments: the data frame is referred to as 'df,' and the
cars, SUVs, etc. This additional information enhances the other is an integer value that determines the number of items
overall quality and value of the dataset. It gives us more to be displayed, which goes by the name n_items.' In order to
comprehensive insights into different types of vehicles yield a desirable output, this specified function randomly
appearing in various scenes captured by cameras. selects n_items out of all the available items in the provided
Consequently, it enables us to develop more effective data frame. Eventually, these selected images are exhibited in
algorithms for object detection, recognition, and classification a grid that follows 1x3 dimensions.
tasks. In the subsequent processing stage, the image of interest is
IV. METHODOLOGY
We have followed a systematic approach to develop a model
that can accurately detect cars in images. We started by
importing the required dependencies and loading our dataset
on 'Jupyter Notebook'. Firstly, we utilize two directories,
namely 'train_path' and 'test_path,' that contain relevant image
data, and a separate CSV file called
read from the designated directory, which in this case is
'train_solution_bounding_boxes.csv,' which contains crucial
'train_path,' with the aid of OpenCV. Through the utilization
information for object detection.
of the 'display_image_from_file' function, a visual output of
the image being read is presented for easy analysis and
The columns within the CSV file outline specific aspects of
interpretation.
the data. The data is organized into five categories: 'image,'
'xmin,' 'ymin,' 'xmax,' and 'ymax.' The 'image' column contains
During a model's training, the 'data_generator' function
valuable information about the image as it includes its
supplies data. This function is specifically designed for
respective file name. Meanwhile, the four columns indicate
generating data and takes in three arguments: 'df,' which refers
coordinates for a bounding box region surrounding an object
to the data frame; 'batch size,' which refers to the size of
of interest. Specifically, these columns refer to the minimum
batches desired; and 'path,' which is used for specifying
and maximum values along the x and y axes, which define a
directory path. The loop within this function runs indefinitely
rectangular shape around an identified item within an image.
and randomly selects rows from the data frame. Using
This metadata is incredibly useful when conducting analyses
OpenCV, image and bounding box coordinates are read from
or making inferences from visual data sets.
the disk and stored in NumPy arrays. Finally, the input and
output numpy arrays are returned as a dictionary through yield
4 CSI-7-MAL 2022-2023
statements within the body of this function. The process misinformation about false positives or false negatives due to
repeats until the model has utilized all images during its unwarranted bias. This successful outcome reflects the model's
training phase. high efficiency and performance in detecting cars within
images.
To proceed further, it is essential to establish a clear
definition of the model. It involves using a sequence of 10 In order to evaluate the effectiveness of the model, we
convolutional layers, each having twice the number of filters decided to generate visual representations of its predictions on
compared to its preceding layer. Additionally, after every a subset of the test images. This process showed us that the
convolutional layer comes batch normalization and max model could accurately identify and pinpoint most cars'
pooling layers. Once all these steps have been covered, the positions in these images. However, as expected, there were
outcome originating from the final max pooling layer still instances where it generated false positives or false
undergoes flattening. Now two fully connected layers are negatives, indicating ample room for advancement in its
applied with ReLU activations and 256 and 32 hidden units, performance. It also implies that further tweaking and
respectively. Finally, in order to predict bounding box adjustments may be required to increase its overall accuracy.
coordinates, a dense layer containing four output units is Ultimately, this detailed assessment highlights our current
employed. model's strengths and weaknesses and provides valuable
insights for future development efforts.
During the model compilation process, the approach used to
evaluate the model is by measuring its mean squared error loss Regarding computational abilities, our analysis revealed that
and optimizing it with the Adam optimizer. In addition, to the model demonstrated an impressive training speed on a
track how accurate the bounding box prediction is, the single GPU. During the training phase, we employed
'metrics' argument has been set to 'accuracy.' It allowed for a Tensorflow's pre-installed performance metrics to keep track
better understanding of how well the model performs in terms of the model's progress. It successfully maintained an optimal
of accurately predicting bounding boxes. By incorporating equilibrium between accuracy and training time. The model
such evaluation metrics, it becomes easier to identify areas can efficiently learn without compromising on its precision.
where improvements can be made.
VI. CRITICAL APPRAISAL
In order to evaluate and analyze the performance, we have The model's functionality is limited, as it can detect only one
used the 'test_model' and 'test' functions. A former function is car at a time. This limitation is a disadvantage in real-life
a crucial tool that enables us to predict bounding boxes for a situations where multiple cars may be in an image. The
given image by taking in two inputs: the model itself and a model's incapacity to recognize more than one car restricts its
data generator. On the other hand, the latter is responsible for practical usage and effectiveness in various circumstances.
calling the 'test_model' function three times with different
images by accepting only the model as input. This way, we The capacity to identify and track multiple cars
have observed how well our object detection model performs simultaneously is fundamental in several real-life situations,
on various types of images and make necessary adjustments or like autonomous driving systems, parking lot surveillance, and
improvements accordingly. traffic monitoring. If the model can only identify one car, it
will capture only some of the image's context or comprehend
During the training process of object detection models, the the scene comprehensively. This constraint significantly limits
'ShowTestImages' class played a significant role as it the model's effectiveness in scenarios where accurate
displayed predicted bounding boxes for three images at the identification and tracking of numerous cars are necessary.
end of each epoch. A callback function performs a specific
task based on an event in the program's execution. Inaccurate and incomplete results may arise due to the
Specifically, this function calls upon the 'test' function, which limitation of the model in detecting multiple cars that are
utilizes the model to predict unseen test images and displays closely positioned or overlapping. This shortcoming can result
them along with their corresponding bounding boxes. This in false negatives or inaccurate bounding box predictions,
feature has let users quickly visualize how well the model has adversely affecting applications requiring precise object
learned to detect objects in real-world settings and fine-tune it detection, like autonomous vehicles.
further if necessary.
V. RESULTS
Upon executing the model, it was revealed that the test set's
accuracy rate exceeded expectations, revealing a significant
level of precision and recall metrics. With an average
precision score of 0.90 and an average recall rate of 0.87, it is
evident that the model has successfully identified the location It is essential to adjust the model to detect numerous
of vehicles in images with great accuracy without having any vehicles at once, which can be accomplished using object
5 CSI-7-MAL 2022-2023
VII. REFERENCES
[1] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-
CNN: Towards real-time object detection with region
proposal networks. In Advances in neural information
processing systems (pp. 91-99).
[2] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016).
You only look once: Unified, real-time object detection. In
Proceedings of the IEEE conference on computer vision and
pattern recognition (pp. 779-788).
[3] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu,
C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector.
In European conference on computer vision (pp. 21-37).
[6] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual
learning for image recognition. In Proceedings of the IEEE
conference on computer vision and pattern recognition
(CVPR) (pp. 770-778).