
Major Project Report on

Framework For Analysis and Automatic Summarization of
Surveillance Footage

Submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
by
Kaustubh Khedkar - 201IT128
Shlok Bhosale - 201IT258

under the guidance of

Dr. Sowmya Kamath S.

DEPARTMENT OF INFORMATION TECHNOLOGY


NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, MANGALORE - 575025

April 2024
ACKNOWLEDGEMENT
I would like to take this opportunity to express my heartfelt gratitude to all those
who have contributed to the successful completion of my final year major project
in the Information Technology branch. First and foremost, I am deeply thankful to
my project guide Dr. Sowmya Kamath S, whose unwavering support, guidance, and
expertise were invaluable throughout this journey. I am also thankful to the Head
of the Department (HOD) of Information Technology, Dr. Geetha V, for providing a
supportive academic environment and resources for the project.
I would like to extend my appreciation to the faculty members of the IT department
for their valuable insights and encouragement. Their feedback and mentorship played
a crucial role in refining the project.
I am also grateful to my classmates and friends who provided assistance during the
project’s development. Their constructive feedback was instrumental in the project’s
success.
Furthermore, I am thankful to my family for their unwavering support and under-
standing during the challenging phases of this project. Their patience and encour-
agement kept me motivated.
Lastly, I want to acknowledge the resources, both online and offline, that were instru-
mental in my research and development efforts. This project has been a significant
learning experience, and I am thankful to everyone who played a role in its successful
completion.
ABSTRACT
In the era of widespread surveillance footage adoption, the need for efficient au-
tomatic summary generation has become paramount. This research aims to build
an integrated system to detect, summarize, and store vehicle and road information
extracted from surveillance footage. The system encompasses Automatic Number
Plate Detection and Recognition, Road Quality Assessment, Road Sign Detection,
and Pothole Detection with Timestamps.
Our proposed approach revolves around a two-phase methodology. First, we em-
ploy OpenCV for video input and preprocessing, allowing frame-by-frame analysis and
enhancing image quality through sharpening and region-of-interest highlighting. To
mitigate processing latency, dynamic frame skipping is implemented. Subsequently,
we harness machine learning-based object detection methods, with a particular focus
on the YOLO (You Only Look Once) model. This approach significantly improves the
efficiency and accuracy of dashcam video summarization, offering enhanced quality,
reduced latency, and adaptability to changing environmental conditions.
The objectives include tracking vehicle entry and exit; developing an Automatic
Number Plate Detection, Recognition, and Timestamping system using a Deep
Learning YOLO model; implementing Automatic Road Sign Detection and
Classification using CNN with YOLO; creating a Pothole Detection and Road Quality
Assessment system using a TensorFlow YOLO model; and integrating a database into
the system through a web application.

Keywords— surveillance footage, OpenCV, Machine Learning, YOLO, Video
Summarization, Object Detection, Real-time Processing.

CONTENTS
LIST OF FIGURES iii

1 INTRODUCTION iv
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 LITERATURE REVIEW 3
2.1 Background and Related Works . . . . . . . . . . . . . . . . . . . . . 3
2.2 Outcome of Literature Review . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Objectives of the Project . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 PROPOSED METHODOLOGY 12
3.1 Video Input and Preprocessing using OpenCV . . . . . . . . . . . . . 12
3.2 Object detection methods . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Number Plate and Sign Board recognition . . . . . . . . . . . . . . . 13
3.4 Number and type of vehicles counter . . . . . . . . . . . . . . . . . . 15
3.5 Timestamp and Classification . . . . . . . . . . . . . . . . . . . . . . 17
3.6 Road Quality Assessment . . . . . . . . . . . . . . . . . . . . . . . . 17
3.7 Database functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.8 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4 WORK DONE AND RESULTS 22


4.1 Automatic Number Plate Detection . . . . . . . . . . . . . . . . . . . 22
4.2 Road Sign Detection and Classification . . . . . . . . . . . . . . . . . 24
4.3 Road Quality Assessment . . . . . . . . . . . . . . . . . . . . . . . . 26
4.4 Number and type of vehicles counter . . . . . . . . . . . . . . . . . . 28
4.5 Database Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . 29

5 CONCLUSIONS AND FUTURE WORK 34

REFERENCES 35

LIST OF FIGURES
2.2.1 System Flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Road Degradation Examples . . . . . . . . . . . . . . . . . . . . . . . 9

3.3.1 Road Sign Classification . . . . . . . . . . . . . . . . . . . . . . . . . 14


3.4.1 Vehicle Classification System . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.2 Vehicle Detection and Counting . . . . . . . . . . . . . . . . . . . . . 16
3.6.1 Pothole Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.1.1 Plate detection using YoloV5 . . . . . . . . . . . . . . . . . . . . . . 22


4.2.1 Sign Detection from video frames . . . . . . . . . . . . . . . . . . . . 25
4.3.1 Road Quality Detection . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4.1 Vehicle Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.5.1 View Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.5.2 Plate Query with Details . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5.3 Frame Query from Plate . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5.4 Types of Vehicles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5.5 Vehicle Movement Rate . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.0.1 Project Timeline Gantt Chart . . . . . . . . . . . . . . . . . . . . . . 37

CHAPTER 1
INTRODUCTION
1.1 Overview
The widespread integration of surveillance cameras in vehicles has ushered in an era
of unprecedented data collection, capturing countless hours of road footage. As the
volume of surveillance camera-generated video data continues to surge, the need for
automated surveillance camera summary generation has never been more pressing.
The objective is to develop an approach that can efficiently distill this vast reservoir
of video content into concise, informative summaries, ultimately enhancing the utility
and accessibility of surveillance camera data for various applications.
Automated surveillance camera summary generation offers numerous benefits, in-
cluding:
1. Enhanced Security: By analyzing surveillance footage, the system can moni-
tor vehicle activity in specific areas and detect unauthorized access through automatic
number plate detection and recognition. This capability strengthens security proto-
cols and aids in the detection of potential threats.
2. Improved Urban Planning: Integration of surveillance footage analysis fa-
cilitates urban planning initiatives by providing valuable insights into traffic patterns,
congestion hotspots, and road conditions. This information can inform infrastructure
development projects and optimize city planning efforts.
3. Efficient Resource Utilization: Manual review of extensive surveillance
footage is time-consuming and resource-intensive. Automated summarization stream-
lines this process, allowing for more efficient allocation of resources and personnel for
surveillance and security tasks.
4. Real-time Incident Response: Timely analysis of surveillance footage en-
ables rapid response to incidents such as accidents, traffic violations, or suspicious
activities. This proactive approach enhances public safety and minimizes the impact
of unforeseen events on road users.
By addressing these needs and challenges, automated surveillance camera sum-
mary generation facilitates the effective utilization of surveillance camera data for
a wide range of applications. The system’s adaptability extends its utility beyond
automotive uses, making it a valuable tool for enhancing security, safety, and urban
planning initiatives in diverse environments.

1.2 Motivation
The motivation behind this research is rooted in the critical role that surveillance
camera footage plays in enhancing road safety, traffic management, and incident
analysis. Surveillance camera services frequently struggle to process data in real-
time, preserve video quality, and correctly identify objects of interest. The goal of
this research is to provide institutions and consumers with more effective and effi-
cient tools for utilising surveillance camera data by addressing these shortcomings.
This entails putting into practice pothole detection, timestamping, road quality as-
sessment, automatic number plate detection and recognition. Moreover, the system’s
adaptability goes beyond automotive uses because it can be easily combined with
the analysis of surveillance video. Urban planning initiatives are supported, security
measures are strengthened, and possible threats are identified by repurposing the
surveillance camera summary generation system for surveillance footage. In the end,
these developments unlock the full potential of surveillance camera data for a wide
range of applications and contribute to safer and more informed driving experiences.
The lack of surveillance video summarization poses several challenges, including:
1. Data Overload: The exponential growth of surveillance camera-generated
video data overwhelms human operators and traditional analysis methods, necessi-
tating automated summarization for efficient processing.
2. Real-time Processing: The delay in processing surveillance footage in real-
time hampers timely responses to incidents, potentially compromising safety and
security.
3. Object Identification: Inaccurate object identification in surveillance footage
impedes effective analysis and decision-making, highlighting the need for advanced
recognition technologies.
4. Resource Allocation: Manual review of extensive surveillance footage con-
sumes significant time and resources, underscoring the importance of automated
summarization to streamline operations.
By addressing these challenges through automated surveillance camera summary
generation, institutions and consumers can access more effective and efficient tools
for utilizing surveillance camera data. This includes enhancing road safety, traffic
management, incident analysis, urban planning initiatives, and security measures.
Additionally, the adaptability of this system extends its utility beyond automotive
applications, enabling integration with broader surveillance video analysis for com-
prehensive security solutions.
In summary, automated surveillance camera summary generation unlocks the full
potential of surveillance camera data, contributing to safer and more informed driving
experiences while supporting various applications across security, safety, and urban
planning domains.

CHAPTER 2
LITERATURE REVIEW
2.1 Background and Related Works
After going through the literature on automatic summarization of surveillance
camera video footage, these were some of the main learnings from the papers we read.

Zeid Selmi et al. [1] introduce an automatic deep learning-based system for detecting
and recognizing vehicle License Plates (LPs). It emphasizes the challenges posed by
LP variations between countries and the limitations of existing systems that operate
under controlled conditions. The proposed system consists of three parts: detection,
segmentation, and character recognition. It demonstrates high accuracy in various
challenging conditions, such as poor image quality, perspective distortion, and varying
lighting, addressing the need for efficient LP detection and recognition.
M. T. Qadri et al. [2] created an ANPR (Automatic Number Plate Recognition)
system where the core component is a software model implemented in MATLAB
7.0.1, which consists of three main steps: image capture, number plate extraction,
and character recognition. Initially, RGB images are captured from a USB camera.
The number plate extraction process employs a yellow search algorithm to identify
the yellow background characteristic of Sindh’s official number plates. This algorithm
converts the image to black and white, removes border-connected white patches, and
eliminates small unwanted regions using a pixel count method. The smearing algo-
rithm is then applied to crop the vehicle number plate. After inverting the colors to
have white text on a black background, the OCR algorithm is used to recognize in-
dividual lines and characters, correlating them with an alphanumeric database. The
resulting number is stored and compared to a stored database for vehicle authoriza-
tion, providing corresponding signals based on the comparison result.
M. Swathi et al. [3] introduced a sign detection system. In the traffic sign detection
stage, various methods are reviewed. This includes colour-based detection, shape-
based detection, and methods combining colour and shape information. Colour-based
methods use colour segmentation to identify regions in the image that may contain
traffic signs. Shape-based methods employ techniques like Hough transform to detect
signs based on their geometric shapes. Combined methods use colour and shape
information to reduce interference from similar objects.
The traffic sign recognition stage involves identifying the specific traffic sign class
after it has been detected. Recognition is achieved through feature matching or
machine learning approaches. Feature matching methods use techniques like BRISK
or SURF to match detected signs with templates in a database. For classification,
machine learning approaches employ artificial neural networks or Support Vector
Machines (SVM).
T. Suwattanapunkul et al. [4] employed YOLOv5 and YOLOv8s models for traffic
sign detection. YOLO is an object detection algorithm known for its speed and accu-
racy, and the paper explains its grid-based approach. The authors use two datasets:
the Tsinghua-Tencent 100K (TT100k) dataset from China and a new Taiwan Traffic
Sign (TWTS) dataset. They combine these datasets into a hybrid dataset to improve
model performance. The experiments compare the models’ performance using preci-
sion, recall, and mean average precision (mAP) metrics.
Results indicate that YOLOv8s outperforms YOLOv5s6 on both TWTS and the hy-
brid dataset. The hybrid dataset enhances model performance compared to TWTS
alone. In conclusion, the study underscores the importance of large and diverse datasets
for training deep learning models and highlights the potential of the proposed method
for enhancing road safety and advanced driver assistance systems (ADAS).
M. V. Dharsini et al. [5] describe a system that revolves around
the development of a vision-based lane identification algorithm for use in driverless
cars and advanced driver assistance systems (ADAS). The algorithm is designed to
be accurate and robust in various lighting and driving conditions. It begins with
image preprocessing, which includes steps such as extracting the region of interest
(ROI), converting images to grayscale, enhancing image contrast, and applying me-
dian filtering to reduce noise. Edge detection is then performed using the Otsu-Canny
method, followed by Hough transform with slope filtering to detect lane lines. In-
terframe clustering is utilized to eliminate false lane candidates, and the final step
involves selecting left and right lanes based on their distance from the lane center.

To enhance object detection accuracy and speed, the YOLOv6 model is employed.
The results of this methodology demonstrate consistent lane detection performance
under challenging real-world conditions, making it a valuable contribution to the field
of autonomous driving and ADAS technology.
R. Li et al. [6] introduce a road quality evaluation system designed to detect
and assess potholes efficiently and accurately. The system employs a stereo camera
and a laser diode to gather data. Initially, images without laser dots are processed
through a pothole detection neural network (PDNN) based on the YOLO network,
which detects and outlines potholes with label boxes. Subsequently, laser dots are
projected onto the potholes’ surface, and corresponding images with laser dots are
captured by the stereo camera. Image processing is utilized to identify the laser dots in
each image, allowing the calculation of distances between the marks on the potholes’
surface and the camera through stereo vision techniques. Finally, a 3D model of the
pothole is created, and the volume of the damaged area is computed. This system is
cost-effective and can operate under various weather conditions and on different road
pavement materials. The comprehensive experimental results presented in the paper
validate the system’s effectiveness and reliability.
G. H. Palli et al. [7] introduce a process that involves converting RGB images
to grayscale, normalizing them for CNN-based road defects identification, and train-
ing them using a Faster R-CNN model with Transfer Learning on a COCO object
detection dataset. The model’s accuracy is evaluated using Mean Average Preci-
sion (MAP). Detected road defect locations are logged into a remote database for
monitoring and maintenance. The results show that the proposed system achieves
a high accuracy rate and effectively avoids false alarms like speed breakers or cat-
eye reflectors. Future work includes expanding the system’s coverage and addressing
speed limitations. The validation process demonstrates the system’s effectiveness in
detecting road surface faults while minimizing false alarms.
Mengjuan Fei et al. [8] propose a novel compact yet rich key frame creation
method for compressed video summarization. First, they directly extract the DC
coefficients of I-frames from a compressed video stream, and DC-based mutual
information is computed to segment the long video into shots. Then, they select
shots with a static background and a moving object according to the intensity and
range of the motion vectors in the video stream. Detecting moving object outliers in
each selected shot, the optimal object set is then selected by importance ranking and
solving an optimum programming problem. Finally, an improved KNN matting
approach is conducted on the optimal object outliers to automatically and seamlessly
splice these outliers into the final key frame as the video summary. This contrasts
with previous video summarization methods, which typically select one or more
frames from the original video as the summary.
M. Imran Hosen et al. [9] introduce a new multi-camera video stitching method
that combines several videos captured by different cameras into a single video with a
wide field of view. The keypoints and descriptors are obtained by the scale-invariant
feature transform (SIFT) and Root-SIFT, respectively. Then, these keypoint descriptors
are matched by applying a hybrid matcher, a combination of the Brute Force (BF)
and Fast Library for Approximate Nearest Neighbors (FLANN) matchers. After
geometrical verification and elimination of outlier matching points, a one-time
homography is estimated based on Random Sample Consensus (RANSAC).
S. Aggarwal et al. [10] add that licence plate recognition (LPR) on Indian
commercial trucks poses distinct difficulties owing to the absence of standardised
databases, varied font styles, and occasionally hand-painted numbers. By putting
forth a novel system for automated weighbridge applications, this paper addresses
these problems. The authors first close a significant gap in the literature by providing
a thorough database of Indian truck licence plates. Subsequently, they utilise
cutting-edge deep learning models, namely Faster R-CNN for licence plate
identification and EAST for text detection, tailoring them to the particularities of the
Indian truck LPR task. This method significantly outperforms existing methods,
whose maximum accuracy is below 90%, achieving an impressive 95.82% accuracy.
The suggested system opens the door for reliable and precise LPR in the automation
of Indian weighbridges.
D. Arya et al. [11] helped create a dataset to address the road damage detection
challenge, which has traditionally been a laborious and subjective process that
depends on visual inspections prone to human error. Automatic road damage
detection has the potential to transform infrastructure management, but diverse and
representative data are necessary for training robust algorithms; the RDD2022
dataset addresses this issue. With 47,420 high-resolution photos taken in six different
countries, this multi-national dataset provides a broad sample of actual road
conditions. RDD2022 covers a wide range of road types, weather conditions, and
damage manifestations, from India’s busy city streets to Japan’s immaculate
motorways, a far cry from the frequently geographically constrained datasets of the
past. The dataset’s strength is not limited to its sheer size: every image is carefully
annotated with bounding boxes and labels identifying different types of damage such
as cracks, potholes, bumps, and even lane markings. Training relies heavily on this
fine-grained data, which enables algorithms to pick up on the subtle differences in
damage appearance under various conditions. RDD2022’s value is further enhanced
by standardisation: the uniform image format and annotation schema guarantee
smooth integration with a variety of machine learning frameworks. By allowing
developers to benchmark their models against the dataset, this open-door policy
encourages a thriving research community and feeds the cycle of continuous
improvement.
K. Zheng et al. [12] propose a novel method that leverages the strengths of both
Haar-like features and Histogram of Oriented Gradients (HOGs) to achieve robust and
accurate plate detection. Haar-like features, known for their efficiency in identifying
local image patterns, excel at locating potential plate regions. Subsequently, HOGs,
adept at capturing edge and gradient information, refine these regions by identifying
characteristic patterns within the plate itself. This synergetic approach effectively
tackles challenges like cluttered backgrounds, diverse plate styles, and varying lighting
conditions. The reported 95.82% accuracy demonstrates the effectiveness of this
method, paving the way for improved performance in license plate-based applications.

2.2 Outcome of Literature Review


1. License Plate Detection: The field of License Plate (LP) detection encompasses
a diverse array of techniques and approaches. These methods range from sim-
ple deterministic rules to more complex machine learning and classification sys-
tems. Some employ shape descriptors, such as “context shapes”, to find cor-
respondences between forms, while others utilize morphological operations and
contour detection. Color-based approaches have been used to extract LPs by
identifying specific colors in images, and texture-based methods analyze pixel
intensity distributions in LPs. Hybrid approaches combine color and texture
features, and character-based methods focus on detecting and recognizing char-
acters within LPs.

Figure 2.2.1: System Flowchart

2. Character Segmentation: The second stage in LP detection and recognition


systems is character segmentation. Various techniques have been applied for
detecting character regions within license plate images. These methods include
image segmentation using sliding windows, fuzzy logic-based approaches, con-
nected component analysis, projection methods (both horizontal and vertical),
mathematical morphology, contour analysis, local and adaptive thresholding,
and histogram-based treatments.

3. Character Recognition: The final stage in LP detection and recognition sys-


tems involves character recognition, where different methods are employed for
recognizing characters within segmented regions. These methods range from
template matching to feature extraction and neural networks. Various classi-
fiers such as Hidden Markov Models (HMM), Support Vector Machines (SVM),
and Artificial Neural Networks (ANN) are used for character recognition. Some
approaches involve multi-stage classification schemes, while others incorporate
Convolutional Neural Networks (CNN) with spatial transformer networks for
text detection and recognition in natural scenes and videos.

Figure 2.2.2: Road Degradation Examples

4. Image Preprocessing: Convert the RGB images to grayscale to simplify pro-


cessing. Normalize the images to prepare them for input into a deep learning
model.

5. Pothole Detection: Utilize a deep learning model for object detection, such
as Faster R-CNN. Train the model on the collected dataset, using Transfer
Learning with a COCO object detection dataset as a starting point. Fine-tune
the model on the road defects dataset to adjust the features for detecting road
potholes accurately.
The published studies on detecting and classifying road damage using ML- and
DL-based methods may be categorized into two groups based on the type of images
used: the first group uses images captured perpendicularly above road surfaces to
build road damage classification models (see Fig. 2.2.2); the second group uses
images captured by vehicle-mounted dashboard cameras.

The limitations that could not be addressed in the papers were:

• Limited Evaluation Data: The paper discusses the system’s performance based
on a specific dataset. It’s important to evaluate the system on a broader range
of road conditions and regions to assess its generalizability.

• Single-Lane Detection: The focus appears to be on single-lane detection. Ex-
panding to multi-lane roads and complex intersections is a challenge not ad-
dressed. [4]

• Limited Road Types: The paper mentions the road pavement materials it cov-
ers, but there could be additional road surface types that the system hasn’t
been tested on. [5]

• Speed Limitations: The system’s performance may be impacted at higher speeds.
It’s essential to consider how the system functions under different speed condi-
tions. [1] [3] [6] [7]

• Evaluation Metrics: While the paper discusses accuracy metrics, it could pro-
vide more detailed information on other metrics like precision, recall, and F1-
score for a comprehensive assessment.[1] [3]

• Hardware Requirements: The paper does not elaborate on the hardware re-
quirements for implementing the system, which can be a limitation for practical
applications.[4][5][6]

• False Negatives: The paper does not explicitly mention how it handles potential
false negatives (missed potholes), which is critical for safety. [6] [7]

2.3 Problem Statement


To build an integrated and robust system to detect, summarize and store
information of the vehicles and the road extracted from a dash cam video.

This research addresses the need for a comprehensive automated system that
encompasses Automatic Number Plate Detection and Recognition, Road Quality As-
sessment, Road Sign Detection and Pothole Detection with Timestamps.

We can use this system for Law Enforcement and Public Safety: It can assist
law enforcement agencies in tracking vehicles of interest, ensuring road sign compli-
ance, and quickly responding to incidents. Timestamped footage and license plate
recognition assist in accident investigations and insurance claims. We would also be
able to detect vehicles that violate road signs and penalise them accordingly.

2.4 Objectives of the Project


(1) To track the entry and exit of vehicles on campus and check for potholes that
need repair. The objective is to maintain an automatic database of vehicle
entries instead of manually noting down their names.

(2) Develop an Automatic Number Plate Detection, Recognition, and Timestamp
system using a Deep Learning YOLO model based on the Inception-ResNet-v2
architecture for training.

(3) Implement Automatic Road Sign Detection and Classification using Keras and
Convolutional Neural Networks (CNN) with YOLO.

(4) Create a Pothole Detection and Road Quality Assessment system using a Ten-
sorFlow YOLO model.

(5) Integrate a database into the overall system through a web application.

CHAPTER 3
PROPOSED METHODOLOGY
3.1 Video Input and Preprocessing using OpenCV
Our system uses OpenCV to process the input video by working on individual frames,
which is one of its key features. We used a frame-by-frame method, where the input
video stream was divided into separate frames to enable in-depth analysis of each
frame separately. A set of OpenCV functions were utilised to implement various
image preprocessing methods to improve frame quality such as increasing sharpness
and simple blob based Region-Of-Interest highlighting. After the frames were ready,
we integrated object detection machine learning models to detect potential number
plate and sign board locations within each frame.
A potential issue of processing latency due to the preprocessing on every frame
was balanced by adjusting the number of frames skipped by OpenCV. This was
done to reduce the time taken to detect number plates and sign boards in consecutive
frames. This method enabled real-time video processing while also reducing detection
of number plates and sign boards that were already detected. The remaining set of
frames were then sent to the Object detection system to process.
Using OpenCV, image cropping, resizing, and geometric transformations can be
applied to correct field-of-view (FOV) and occlusion issues in images. Image
segmentation, inpainting, and depth estimation can also be used to correct occlusion
problems. Additionally, perspective correction and image enhancement methods can
both help enhance image quality. These steps can be applied after the ROI has been
identified by the object detection functions, as sketched below.
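
As an illustration of this pipeline, the following is a minimal sketch of the
frame-by-frame loop with sharpening and dynamic frame skipping. The sharpening
kernel, the skip heuristic, and the detect callback (standing in for the YOLO-based
detector of Section 3.2) are illustrative assumptions, not the exact production code.

    import cv2
    import numpy as np

    SHARPEN_KERNEL = np.array([[0, -1, 0],
                               [-1, 5, -1],
                               [0, -1, 0]])

    def preprocess(frame):
        """Sharpen a frame before it is handed to the detector."""
        return cv2.filter2D(frame, -1, SHARPEN_KERNEL)

    def process_video(path, detect, base_skip=5):
        """Run `detect` on every skip-th preprocessed frame.

        `detect` takes a frame and returns a (possibly empty) list of
        detections; it stands in for the YOLO model of Section 3.2.
        """
        cap = cv2.VideoCapture(path)
        skip, idx = base_skip, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            idx += 1
            if idx % skip:            # dynamic frame skipping bounds latency
                continue
            detections = detect(preprocess(frame))
            # Nothing found -> skip more aggressively; detections -> slow down.
            skip = base_skip * 2 if not detections else base_skip
        cap.release()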

3.2 Object detection methods


In contrast to traditional, non-machine learning-based computer vision techniques,
we examined the important role of machine learning-based object detection methods,
with a particular focus on the YOLO (You Only Look Once) model. [5]
To identify objects in images, traditional computer vision techniques use hand-
crafted features, filters, and heuristic rules. To design robust systems, these tech-
niques frequently involve intricate pipelines and deep domain expertise. However,
they frequently have trouble in environments that are constantly changing and in
different lighting. Additionally, the quality of the handcrafted features and rules has
a big impact on how well they perform, which can be constricting.
YOLO, on the other hand, serves as an example of a new class of object detection
models that make use of deep learning and neural networks. YOLO is incredibly quick
and appropriate for real-time applications because it can effectively process an entire
image in a single forward pass. As a result of the model’s extensive data training,
it is able to learn complex features, patterns, and context directly from the data,
reducing the need for hand-crafted features. YOLO is more robust and versatile than
conventional approaches due to its ability to adapt and generalise well across various
scenarios.
Furthermore, thanks to its capacity to detect multiple objects and their precise
locations in a single pass, YOLO offers a significant advantage over conventional
techniques, which frequently need multiple iterations or post-processing steps to
produce similar results. However, it is a known issue that YOLO and other machine learning-based
models may require substantial computational power during both training and frame
processing during execution. Additionally, the quality and quantity of the training
data play a pivotal role in their performance.
In this project, we compared both techniques and concluded that deep learning-based
methods such as YOLO should be employed as the object detection method.

3.3 Number Plate and Sign Board recognition


When a vehicle is found, it is marked as a region of interest (ROI), allowing us
to concentrate only on the locations where number plates are frequently seen. To
improve the precision of text extraction, preprocessing methods must be used before
feeding the regions of interest (ROIs) to the Tesseract Optical Character Recognition
(OCR) engine, as shown in Fig 3.3.1.

Figure 3.3.1: Road Sign Classification

We use a variety of thresholding techniques, such as binary thresholding, adaptive
thresholding, and Otsu’s thresholding, to clean up the image and make the characters
easier to distinguish. In this step, the image is transformed into a binary format,
where the characters are displayed in black on a white background.
The ROIs may contain noise and artefacts caused by factors such as lighting and
image quality. We use Gaussian or median filters to suppress the noise and soften
the edges of the characters. Enhancing the contrast between the characters and the
background makes the characters much easier to recognise; by scaling the pixel
values within the character regions, we can adjust the contrast.
Once the ROIs have been processed, the Tesseract Optical Character Recognition
(OCR) engine is used to identify the number plates’ alphanumeric characters.
Tesseract is an open-source OCR engine renowned for its precision and adaptability,
which makes it well suited to reading various types of licence plates. The characters
on the plates are extracted during OCR and transformed into machine-readable text.
This text, which represents the number plate, is then linked with a timestamp that
pinpoints the exact time at which it was detected.
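
A hedged sketch of this ROI cleanup and OCR step, assuming the pytesseract
wrapper for the Tesseract engine; the exact filter sizes and page segmentation mode
are illustrative choices:

    import cv2
    import pytesseract

    def read_plate(roi_bgr):
        """Binarize a plate ROI and extract its text with Tesseract."""
        gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
        gray = cv2.medianBlur(gray, 3)            # suppress salt-and-pepper noise
        # Otsu's thresholding yields clean characters on a plain background
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # --psm 7 treats the ROI as a single line of text, which suits a plate
        return pytesseract.image_to_string(binary, config="--psm 7").strip()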
Similar to this, a module for identifying traffic signs [13] is integrated. A convolu-
tional neural network (CNN) is then used to detect and categorise traffic signs based
on their shape, colour, and symbolism. In order to pinpoint the exact location of the
detected signs, localization is performed within the frame. All recognised traffic signs
are recorded in the database along with classifications and timestamps.

3.4 Number and type of vehicles counter


To develop a vehicle detection and counting system for highway traffic, the first step
involves collecting a diverse dataset of highway traffic videos capturing various types of
vehicles such as cars, motorcycles, buses, and trucks. These videos should encompass
different environmental conditions and traffic scenarios. Subsequently, the collected
videos are preprocessed by converting them into individual frames and resizing them
to a suitable input size for the object detection model. Normalization of pixel values
is also performed to ensure consistent processing. The system flow is shown in Fig 3.4.1.

Figure 3.4.1: Vehicle Classification System

For object detection, the YOLOv5 model is employed due to its efficiency and
accuracy in real-time applications. The YOLOv5 model is trained on the gathered
dataset to accurately detect the specified vehicle types. Fine-tuning may be applied to
enhance the model’s performance, particularly in challenging scenarios. Once trained,
the YOLOv5 model is capable of detecting vehicles in each frame of the video feed.
To track the detected objects across consecutive frames and facilitate counting, the
Deep SORT algorithm is utilized. Deep SORT provides identity tracking of vehicles
by associating detected bounding boxes with existing object tracks. This enables the
system to maintain a continuous count of vehicles passing through a defined region of
interest (ROI) in the video frames. Moreover, Deep SORT allows for distinguishing
between different vehicle categories and updating counts accordingly.

Figure 3.4.2: Vehicle Detection and Counting

Integration with Streamlit, a user-friendly web application framework for Python, enables the
development of a simple and intuitive web interface for the vehicle detection and
counting system. The Streamlit application incorporates real-time video streaming
with overlays of detected vehicles and count statistics. Additionally, user-friendly
controls are provided for starting, pausing, and resetting the video feed and counting
process, enhancing the usability of the system, as shown in Fig 3.4.2.
After development, the system undergoes rigorous testing and evaluation using a
separate test dataset to assess performance metrics such as accuracy, precision, recall,
and processing speed. Based on evaluation results, iterative improvements are made
to enhance system performance and accuracy. Once validated, the system is deployed
on a suitable platform, ensuring scalability, reliability, and ongoing monitoring for
performance optimization and maintenance.
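
A simplified sketch of the counting logic follows. The tracker interface is an
assumption: any Deep SORT implementation that yields (track_id, class_name,
bbox) tuples per frame can drive it; a vehicle is counted once when the centre of its
track crosses a horizontal ROI line.

    COUNT_LINE_Y = 400   # y-coordinate of the counting line, in pixels (assumed)

    def update_counts(tracks, last_cy, counts):
        """Count each track once when its centre crosses the line downward.

        tracks:  iterable of (track_id, class_name, (x1, y1, x2, y2))
        last_cy: dict mapping track_id -> centre y in the previous frame
        counts:  dict mapping class_name -> running count
        """
        for track_id, cls, (x1, y1, x2, y2) in tracks:
            cy = (y1 + y2) // 2
            prev = last_cy.get(track_id)
            if prev is not None and prev < COUNT_LINE_Y <= cy:
                counts[cls] = counts.get(cls, 0) + 1
            last_cy[track_id] = cy

Counting on a crossing event rather than on mere presence keeps a tracked vehicle
from being counted again in every frame in which it appears.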

3.5 Timestamp and Classification
We record the present video time each time a number plate or a sign is identified
in order to precisely timestamp the entry. We can track the presence of particular
vehicles over time because this timestamp is linked to the corresponding number plate
record in the database.
It’s crucial to deal with repeated number plate detections, where the same plate is
found multiple times. Duplicate entries in the database are merged using de-duplication
techniques. In order to avoid over-counting, a time threshold ensures that repeated
detections within a given time frame are treated as a single event. A similar method
is employed for road sign boards, where a new instance of a detection is not recorded
until a certain amount of time has passed. However, since the entire detection process
is repeated for every individual frame and every detected object, the system spends
extra time classifying and converting the same instance of an object.
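
A minimal sketch of this de-duplication rule, taking the two-minute window of
Section 3.7 as the threshold:

    THRESHOLD_SECONDS = 120   # repeated sightings within this window are one event

    def should_record(plate, video_time_s, last_seen):
        """Return True only for the first sighting inside the threshold window."""
        prev = last_seen.get(plate)
        last_seen[plate] = video_time_s
        return prev is None or video_time_s - prev > THRESHOLD_SECONDS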

3.6 Road Quality Assessment


In the methodology for road quality detection, the process commences with data
collection, involving the acquisition of video footage from surveillance cameras, en-
suring the inclusion of various road conditions and surfaces, and annotating frames
with relevant labels. Subsequently, data preprocessing is executed to break down the
video into individual frames, isolate the region of interest (ROI) encompassing the
road surface, and compute crucial features within these ROIs. To extract informative
features, convolutional neural networks (CNNs) are leveraged, as they can effectively
capture intricate details indicative of road quality. Moreover, deep learning mod-
els like ResNet were explored to enable automatic feature learning. The subsequent
phase encompasses training a machine learning model on the annotated dataset, dif-
ferentiating between various road quality attributes.
The road assessment model is run at frame intervals [14] that depend on the length
of the video. The road quality for the chosen frames is timestamped and the average
road rating is calculated based on metrics such as presence of potholes, width of the
road and road degradation.
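
A sketch of this interval sampling, assuming a rate_road callback for the CNN-based
model and a get_frame accessor; the fixed sample budget is an illustrative choice:

    def average_road_rating(total_frames, n_samples, get_frame, rate_road):
        """Rate evenly spaced frames across the video and average the scores."""
        stride = max(1, total_frames // n_samples)   # longer video -> larger stride
        ratings = [rate_road(get_frame(i)) for i in range(0, total_frames, stride)]
        return sum(ratings) / len(ratings)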

Figure 3.6.1: Pothole Conditions

Figure 3.6.1 shows the images from the database and a sample of how they were
obtained.

3.7 Database functions


In order to store extracted licence plates and the timestamps that go with them, we
have built a reliable and effective database system using NumPy in Python as part of
the project implementation. This database is essential for monitoring and controlling
vehicle movements within a camera’s field of view, especially in security scenarios like
CCTV surveillance or areas under police watch.
This database system is primarily intended for surveillance camera deployments,
where it will effectively handle and store licence plate data that the onboard camera
has captured. Surveillance camera footage is dynamic and continuous, so the im-
plementation is optimised to ensure a smooth integration with vehicle surveillance
systems. Time stamps linked to licence plate entries function as a log that is chrono-
logical in nature, allowing for a thorough record of vehicle movements on public roads.
In addition to providing a chronological record of the vehicle’s presence, the
timestamps linked to each licence plate entry allow us to monitor and track the
movement of a vehicle as it moves into and out of an area through multiple cameras’
fields of view. We have imposed a time-based constraint to stop the registration of
multiple entries for the same licence plate in consecutive frames in order to improve
the accuracy of our database. To be more precise, the system will only record the
first instance of a given licence plate if it is found within two minutes of the previous
instance. This prevents duplicate entries from being added.
We can successfully track vehicles that have entered a premises and have not left
within the allotted time limit thanks to this special feature. The database system also
offers full functionality for retrieving and displaying all timestamp occurrences linked
to a particular licence plate. This feature, which provides a thorough history of the
vehicle’s presence within the monitored area, is extremely helpful to law enforcement
and security personnel.
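
A minimal sketch of the NumPy-backed log and the per-plate timestamp query
described above; the dtype and field names are illustrative assumptions:

    import numpy as np

    # One record per accepted detection: plate text plus a UNIX timestamp.
    records = np.empty(0, dtype=[("plate", "U16"), ("timestamp", "f8")])

    def add_entry(plate, timestamp):
        """Append a plate sighting to the log."""
        global records
        entry = np.array([(plate, timestamp)], dtype=records.dtype)
        records = np.concatenate([records, entry])

    def timestamps_for(plate):
        """All timestamp occurrences linked to a particular licence plate."""
        return records["timestamp"][records["plate"] == plate]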
But since our licence plate recognition system is so flexible, we have also developed
a different setup that works well with closed-circuit television (CCTV) systems. Be-
cause of its flexibility, the database can be used in a wide range of security scenarios
and is not limited to surveillance camera applications.
For surveillance camera use, applications like traffic monitoring, incident detec-
tion, and improving overall road safety depend especially on the system’s effectiveness
in tracking vehicles and managing timestamps. In contrast, the database supports
law enforcement and security personnel’s surveillance efforts in a CCTV-based secu-
rity environment by offering a dependable way to track vehicles as they enter and
exit designated areas.
This dual-use feature highlights our database implementation’s adaptability and
scalability in various surveillance scenarios. The database is an essential part that
makes our licence plate recognition system more robust whether it is used on surveil-
lance cameras for vehicle tracking or integrated into CCTV systems for more general
security applications.
In addition to guaranteeing the effective storage of licence plate data, the NumPy-
based database implementation makes it easier to retrieve and analyse timestamp
data, which improves the overall performance of our licence plate recognition system.
Our surveillance and security applications become more sophisticated with the addi-
tion of this database functionality.

3.8 System Design


Based on the proposed methodology, the Iterative Model was chosen to be the ap-
propriate model for software development life cycle (SDLC) model. The Iterative
Model is suitable for projects where requirements are expected to change or evolve
over time, which is common in computer vision projects like the one described. This
model allows for flexibility and incremental development, enabling the team to incor-
porate feedback, adapt to changes, and refine the system gradually.
Iterative Model for the Proposed Methodology:

1. Planning Phase:

• Define project scope, problem statement, research objectives, and require-


ments.
• Formulate a development plan including timelines, resources, and mile-
stones.

2. Analysis Phase:

• Meet with our guide and discuss the requirements regarding video process-
ing, object detection, recognition, and database functionalities.
• Analyze what we could do better in our projects and implement those
functionalities.
• Think about how we can activate data querying in our system to make it
more robust.

3. Design Phase:

• Design the system flow, and modules for video input preprocessing, object
detection, recognition, database integration, and user interface.
• Design a landing page or dashboard which will unify all features together.

4. Implementation Phase:

• Develop the system iteratively, focusing on implementing core functional-
ities first.
• Utilize OpenCV for video input preprocessing and object detection.
• Implement YOLO model for object detection and recognition tasks.
• Integrate Tesseract OCR for number plate recognition and CNN for traffic
sign recognition.
• Implement the type of vehicle detector and counter
• Implement the Sign detection and classification.
• Implement database functionalities using NumPy for data storage and re-
trieval.

5. Testing Phase:

• Conduct unit testing for individual components to ensure they meet spec-
ifications.
• Perform integration testing to verify interactions between modules and
components.
• Conduct system testing to evaluate overall functionality, performance, and
reliability.
• Collect feedback from our guide and classmates for iterative improvements.

6. Maintenance Phase:

• Continuously monitor and evaluate system performance in real-world sce-


narios.
• Collect more real time video data from campus and get better dashboard
results.

The Iterative Model allows for continuous refinement and adaptation of the sys-
tem based on evolving requirements and feedback, making it well-suited for complex
computer vision projects like the one described in the proposed methodology.

CHAPTER 4
WORK DONE AND RESULTS
4.1 Automatic Number Plate Detection
We use OpenCV functions to identify the licence number plates and the Python
EasyOCR library to extract characters and digits from the plate. Because of its
accuracy and efficiency, YOLOv5, a version of the YOLO (You Only Look Once)
object detection algorithm, has become popular in licence plate recognition (LPR).
YOLOv5 is an excellent licence plate detection and localization tool, enabling
real-time processing in a variety of settings. It recognises licence plates and the
vehicles they are associated with by using a single, unified architecture that predicts
bounding boxes and class probabilities simultaneously. This feature is essential to
LPR systems because it makes it possible to extract licence plate data for later
character recognition, as shown in Fig 4.1.1.

Figure 4.1.1: Plate detection using YoloV5

The LPD system uses EasyOCR to perform robust Optical Character Recognition
(OCR) on the localised licence plate regions, complementing the detection stage. The
open-source EasyOCR engine is very good at turning text-containing images into
machine-encoded text. The LPD system can reliably extract alphanumeric characters
from the detected licence plates by integrating EasyOCR. With the accuracy of
EasyOCR’s character recognition combined with the effectiveness of the YOLOv5
model’s initial detection, this combination of the two technologies guarantees a
thorough approach to licence plate recognition. By calculating the degree of
similarity between recognised characters and expected licence plate patterns, the
Levenshtein distance metric further improves the system’s reliability and raises the
LPD system’s overall accuracy.
The YOLOv5 model can adapt well to different licence plate designs, sizes, and
orientations because it was trained on annotated datasets with a variety of image
types. YOLOv5 is a potent tool for precise and quick licence plate detection because of
its capacity to manage numerous objects in a scene. This feature helps to increase the
overall efficacy of licence plate recognition systems in applications like access control,
security, and traffic monitoring.
Pure CV methods such as morphological filters reached a detection rate of 95.5%
with a few false positives and a recognition rate of 90%. Models that used transfer
learning with a YOLO architecture achieved an mAP of 99.2%. Haar cascades had a
recall of 95.2% and an accuracy of 98.0%, with an average detection time of only 75 ms.
To benchmark the model we made use of two datasets specifically created for Indian
road scenarios.
The Indian Driving Dataset (IDD) by IIIT Hyderabad consists of 6,993 PNG images
of roads and their corresponding masks. The dataset was built for multi-class
segmentation but was resized to 512x512 images and made binary (road and
non-road). It consists of images obtained from a front-facing camera attached to a
car, which was driven around the cities of Hyderabad and Bangalore and their outskirts.
The Indian Licence Plate Dataset contains 16,192 images and 21,683 plates,
annotated with four points for each plate and for each character in the corresponding
plate. We used a subset of images from both these datasets to test our model on real
scenarios and unique plates.
As our metrics we used Average Precision and Levenshtein distance for detection
and recognition, respectively. For detection, we measure the Average Precision of
each class individually by computing the area under the precision x recall curve,
interpolating all points. In order to classify detections as True Positive or False
Positive, the IoU threshold is set to t=0.5. Levenshtein distance serves as a measure
of character accuracy, defined as the number of characters recognised in their correct
places divided by the total number of actual characters, i.e., how many characters
are rightly detected.
The Average Precision of the model was 0.8310, with an average Levenshtein
distance of 3. A Levenshtein distance of 3 represents an accuracy of 70% on a
10-character plate, as shown in Table 4.1.2

Table 4.1.1: Training details

Architecture/Model YoloV5
Fine Tuning Dataset Indian Licence Plate Dataset (ILPD)
Model Size 42.1 MB

Table 4.1.2: Performance Metrics

Metric Value
Average Precision 0.8310
Average Levenshtein Distance 3
Levenshtein Distance (Accuracy) 70%
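
For reference, a self-contained sketch of how this recognition metric can be
computed: the classic dynamic-programming edit distance, converted into the
character accuracy reported above (a distance of 3 on a 10-character plate gives 70%).

    def levenshtein(a, b):
        """Edit distance between strings a and b (insert/delete/substitute)."""
        dp = list(range(len(b) + 1))
        for i in range(1, len(a) + 1):
            prev, dp[0] = dp[0], i
            for j in range(1, len(b) + 1):
                cur = dp[j]
                dp[j] = min(dp[j] + 1,                      # deletion
                            dp[j - 1] + 1,                  # insertion
                            prev + (a[i - 1] != b[j - 1]))  # substitution
                prev = cur
        return dp[-1]

    def char_accuracy(predicted, actual):
        """Share of plate characters recognised in their correct places."""
        return 1 - levenshtein(predicted, actual) / max(len(actual), 1)

    # e.g. a distance of 3 on a 10-character plate -> char_accuracy == 0.70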

4.2 Road Sign Detection and Classification


Initially, a model was trained in the Darknet framework to detect Traffic Signs within
four categories utilizing the OpenCV dnn library. Following that, a CNN model was
trained in Keras to classify cut fragments of Traffic Signs into one of 43 classes. The
trained weights were then loaded into the YOLO v3 network using the OpenCV dnn
library. Subsequently, the output layers, where detections are made, were identified,
and parameters such as probability, color, and threshold for the red boxes were con-
figured. In the final step, frames were extracted from a video, and each image was
processed with this model, as shown in Fig 4.2.1.

Figure 4.2.1: Sign Detection from video frames
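
Loading the Darknet-trained weights through OpenCV's dnn module and identifying
the output layers can be sketched as follows; the file names and input size are
assumptions:

    import cv2

    # Weights and config produced by Darknet training (file names assumed)
    net = cv2.dnn.readNetFromDarknet("yolov3_ts.cfg", "yolov3_ts.weights")
    output_layers = net.getUnconnectedOutLayersNames()  # layers where detections are made

    def detect_signs(frame):
        """Run one forward pass and return the raw YOLO predictions."""
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                     swapRB=True, crop=False)
        net.setInput(blob)
        return net.forward(output_layers)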
The classification model underwent training on the German Traffic Sign Recogni-
tion Benchmark (GTSRB), employing 66,000 RGB images. The convolutional layer
filters were set at a dimension of 19 × 19, resulting in an achieved accuracy of 0.868 on
the testing dataset. Notably, the initial model focused on the location of traffic signs
within four categories, showcasing a high mAP (mean Average Precision) accuracy
of 97.22%.
The second model introduced an additional convolutional layer for the final clas-
sification, contributing to the overall system’s efficiency. Experimental results from
video file processing revealed a frames-per-second (FPS) range between thirty-six and
sixty-one, rendering the system suitable for real-time applications. The specific FPS
varied based on the number of traffic signs to be detected and classified in each
frame, which ranged from one to six.
When benchmarking this model against other systems on the German Traffic Sign
Recognition Benchmark (GTSRB), our model reached an accuracy of 97.04%,
whereas some models, such as a CNN with 3 spatial transformers, reached an
accuracy of 99.71%, according to the benchmark metrics in Table 4.2.2

Table 4.2.1: Training details

Architecture/Model Darknet53 - YoloV3


Training Dataset German Traffic Sign Recognition Benchmark (GTSRB)
Model Size 36 MB

Table 4.2.2: Comparison of Results on GTSRB

Model GTSRB Accuracy


YoloV3 with CNN 97.04%
CNN with 3 Spatial Transformers [15] 99.71%

4.3 Road Quality Assessment


For Pothole Detection we are using OpenCV and a pre-trained YOLO (You Only
Look Once) object detection model. The code captures video from a file or a camera,
processes each frame with the detection model, and saves the detected potholes as
images and their coordinates.
Here’s a step-by-step explanation of the code (a condensed sketch follows the list):

• Load the pre-trained YOLO object detection model using cv.dnn_DetectionModel(net1).
Set the input parameters for the model using model1.setInputParams(size=(640,
480), scale=1/255, swapRB=True).

• Define the video source using cap = cv.VideoCapture(filename).

• Define the result video writer using cv.VideoWriter('PotholeVideoResult.mp4',
cv.VideoWriter_fourcc(*'MP4V'), 10, (int(width), int(height))).

• Define the result saving path, initial values for some parameters, and the starting
time.

• Start a loop to process each frame of the video: (a) read the frame from the video
source; (b) analyze the frame with the detection model using model1.detect(frame,
Conf_threshold, NMS_threshold); (c) iterate over the detected objects and draw
detection boxes on the frame for detected potholes; (d) save the detected potholes
as images along with their coordinates; (e) write the FPS on the frame; (f) show
the result frame and save it using the video writer; (g) break the loop if the frame
is not read correctly.

• Release the video capture object, the result video writer, and destroy all OpenCV
windows.
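
A condensed, runnable sketch of this loop, following the report’s variable names; the
weight/config file names and thresholds are assumptions:

    import cv2 as cv

    Conf_threshold, NMS_threshold = 0.5, 0.4
    net1 = cv.dnn.readNet("pothole.weights", "pothole.cfg")   # file names assumed
    model1 = cv.dnn_DetectionModel(net1)
    model1.setInputParams(size=(640, 480), scale=1 / 255, swapRB=True)

    cap = cv.VideoCapture("road_video.mp4")                   # file name assumed
    while True:
        ok, frame = cap.read()
        if not ok:                       # stop when a frame cannot be read
            break
        classes, scores, boxes = model1.detect(frame, Conf_threshold, NMS_threshold)
        for box in boxes:
            x, y, w, h = box
            cv.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
            cv.imwrite(f"pothole_{x}_{y}.jpg", frame[y:y + h, x:x + w])  # save crop
        cv.imshow("result", frame)
        if cv.waitKey(1) == 27:          # Esc quits
            break
    cap.release()
    cv.destroyAllWindows()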

The code also includes a mechanism to save the coordinates of the detected pot-
holes. It uses the geocoder.ip('me') function to get the latitude and longitude of the
current location. The coordinates are saved in a text file with the same name as the
image file.
We also built a TensorFlow model capable of detecting damaged roads and
highlighting defective areas with red boxes. This model draws inspiration from
YOLO, specifically YOLOv4. The choice of the YOLO model is based on its
outstanding performance in real-time, fast object detection. We intend to implement
this model in a real-time computer vision application.
When compared to R-CNN and SSD object detection models, YOLO proves to be
the best fit for our purpose. YOLO employs a one-stage detector strategy, which
enhances the speed of object detection. It treats object detection as a regression
problem, simultaneously learning bounding box coordinates and class labels from a
given input image, as seen in Fig 4.3.1.

Figure 4.3.1: Road Quality Detection

After a thorough inspection of the detection outcomes, several phenomena were
noticed. Reflections of objects on a wet road, or rain spots on a car’s windshield,
were often misclassified as potholes. Small cracks and dark spots on the road might
be identified as potholes in images with low visibility. False detections occurred even
in images recorded under clear weather; for instance, the reflection from a car hood
may be detected as a pothole. The results of the top benchmarking models are
shown in Table 4.3.1.

Table 4.3.1: Comparison of Results on RDDC2022 [11]

Model F1 Score
YOLO-series and Faster R-CNN-series ensemble model 0.716
YOLOv5x P5 and P6 ensemble with image patches 0.674
YOLOv7 with label smoothing and coordinate attention 0.663

4.4 Number and type of vehicles counter


The developed vehicle detection and counting system encompasses several crucial
components and methodologies to achieve accurate and efficient performance. Ini-
tially, a diverse dataset of highway traffic videos capturing various vehicle types and
environmental conditions is collected. These videos undergo preprocessing, including
frame extraction and resizing, as well as pixel value normalization to ensure consistent
processing.
For object detection, the YOLOv5 model is employed due to its effectiveness in
real-time applications. Once trained, the YOLOv5 model detects vehicles in each
frame of the video feed. To facilitate tracking and counting of detected objects across
frames, the Deep SORT algorithm is utilized. This algorithm enables identity tracking
of vehicles by associating detected bounding boxes with existing object tracks. This
facilitates the maintenance of a continuous count of vehicles passing through defined
regions of interest in the video frames, while also allowing for distinguishing between
different vehicle categories and updating counts accordingly, as seen in Fig. 4.4.1.
As the figure shows, the system tracks several classes of vehicles, including motorcycles, cars, trucks, and buses. The training details and test metrics are given in Tables 4.4.1 and 4.4.2; a minimal tracking-and-counting sketch follows the tables.

Figure 4.4.1: Vehicle Classification

Table 4.4.1: Training details

Architecture/Model       YOLOv5 with Deep SORT
Pre-trained weights      ImageNet and Objects365
Model Size               14 MB

Table 4.4.2: Vehicle classification model

Metric                                  Score
Mean Average Precision (mAP@0.5)        0.9853
Precision                               0.977
Recall                                  0.971
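The sketch below outlines how YOLOv5 detections can be fed to Deep SORT for per-class counting. It is illustrative only: the torch.hub model call, the deep_sort_realtime package, the counting-line position, and all thresholds are assumptions rather than the exact project code.

import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

model = torch.hub.load("ultralytics/yolov5", "yolov5s")   # pre-trained detector
tracker = DeepSort(max_age=30)
VEHICLES = {"car", "truck", "bus", "motorcycle"}
LINE_Y = 400                     # hypothetical virtual counting line (pixels)
counts, counted_ids = {}, set()

cap = cv2.VideoCapture("highway.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    det = model(frame)
    detections = []
    for *xyxy, conf, cls in det.xyxy[0].tolist():
        name = model.names[int(cls)]
        if name in VEHICLES and conf > 0.4:
            x1, y1, x2, y2 = xyxy
            # Deep SORT expects ([left, top, width, height], conf, class).
            detections.append(([x1, y1, x2 - x1, y2 - y1], conf, name))
    for track in tracker.update_tracks(detections, frame=frame):
        if not track.is_confirmed():
            continue
        left, top, right, bottom = track.to_ltrb()
        # Count each track once, when its centre crosses the virtual line.
        if (top + bottom) / 2 > LINE_Y and track.track_id not in counted_ids:
            counted_ids.add(track.track_id)
            label = track.get_det_class()
            counts[label] = counts.get(label, 0) + 1
cap.release()
print(counts)   # e.g. {'car': 42, 'truck': 7, ...}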

4.5 Database Functionality


In our project, we developed a comprehensive set of functions to analyze and visualize
data from a license plate recognition system. The dataset comprises information on
vehicle entries, including license plate numbers and timestamps. Leveraging Python
libraries such as pandas, NumPy, and Matplotlib, we engineered a suite of functions
to extract valuable insights from this dataset. Our functions enable diverse analy-
ses, including determining vehicle movement rates per hour, calculating the average
time spent in entry for each vehicle, and visualizing the distribution of time spent
by vehicles as a percentage of the day. Additionally, we designed functionalities to
automatically add serial numbers to the dataset, trim strings to match specific regex
patterns, and display DataFrame outputs in a user-friendly manner using Streamlit.
These functions collectively empower users to explore and understand the dynamics
of vehicle entries, facilitating informed decision-making in various domains such as
traffic management, security surveillance, and resource allocation. With their flexi-
bility and scalability, our functions serve as essential tools for extracting actionable
insights from license plate recognition data, thereby enhancing efficiency and effec-
tiveness in real-world applications.
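A condensed sketch of these helpers is given below; the CSV file and the column names (plate, entry_time, exit_time, vehicle_type) are assumptions for illustration, not the project's actual schema:

import pandas as pd
import matplotlib.pyplot as plt
import streamlit as st

df = pd.read_csv("plates.csv", parse_dates=["entry_time", "exit_time"])
# Trim plates to a hypothetical Indian-format pattern, e.g. KA20AB1234.
df["plate"] = df["plate"].str.extract(r"([A-Z]{2}\d{2}[A-Z]{1,2}\d{4})",
                                      expand=False)

def movement_rate_per_hour(data):
    """Number of vehicle entries recorded in each hour of the day."""
    return data["entry_time"].dt.hour.value_counts().sort_index()

def average_time_spent(data):
    """Mean duration between entry and exit for each plate."""
    duration = data["exit_time"] - data["entry_time"]
    return duration.groupby(data["plate"]).mean()

df.index = range(1, len(df) + 1)                  # serial numbers from 1
st.dataframe(df)                                  # user-friendly table view
st.bar_chart(movement_rate_per_hour(df))          # movement rate per hour

fig, ax = plt.subplots()
df["vehicle_type"].value_counts().plot.pie(autopct="%1.1f%%", ax=ax)
ax.set_ylabel("")
st.pyplot(fig)                                    # vehicle-type pie chart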

• View database created from video

Figure 4.5.1: View Database

• Query Numberplate from Database (a query sketch follows this list)

Figure 4.5.2: Plate Query with Details

Figure 4.5.3: Frame Query from Plate

• Query Frame from Video

• Type of Vehicle Pie chart

Figure 4.5.4: Types of Vehicles

• Show movement rate using timestamps from database

Figure 4.5.5: Vehicle Movement Rate
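Building on the helper sketch above, the plate query behind Fig. 4.5.2 might look like the following (df and st as defined earlier; the column names remain assumptions):

plate = st.text_input("Number plate to query")
if plate:
    hits = df[df["plate"].str.contains(plate.upper(), na=False)]
    st.dataframe(hits)                    # entry/exit details for the plate
    st.metric("Matching entries", len(hits))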

CHAPTER 5
CONCLUSIONS AND FUTURE
WORK
Combining state-of-the-art computer vision techniques with machine learning models, YOLO in particular for object detection, yields a comprehensive system for real-time video data analysis. Through this project we have achieved a way to continuously track vehicles entering and exiting a campus, keeping track of their number plates and allowing a number plate and its location to be queried with timestamps. The software can even be used in moving emergency vehicles, such as police cars, to track road distress along the route or catch someone breaking the law.
To maintain consistent accuracy in the later stages, including the extraction and interpretation of textual information, our license plate identification method involves a rigorous preprocessing pipeline for regions of interest (ROIs). Furthermore, a robust timestamping system supports the creation of an extensive database of identified license plates.
Moreover, the information our system creates is the foundation for a multitude of uses beyond license plate identification. It acts as a repository for information related to traffic accidents, vehicle movements, and the analysis of security video. By employing advanced de-duplication techniques, the system manages repeated license plates and maintains the accuracy of the collected data.
In conclusion, our integrated approach epitomizes a robust and technically adept
method for tracking and evaluating traffic activity, enriched with multifaceted appli-
cations in traffic management, law enforcement, and urban planning domains. By
harnessing the computational prowess of machine learning and computer vision, our
system empowers stakeholders with invaluable insights and tools to fortify safety, se-
curity, and operational efficiency in transportation systems.
In the future, we plan to make the system more scalable so that it can process large volumes of video quickly or in real time. We would also like to enhance the UI for general-purpose use and to facilitate easy, lightweight tracking and database functionalities.

REFERENCES
[1] Zied Selmi, Mohamed Ben Halima, and Adel M. Alimi. Deep learning system for
automatic license plate detection and recognition. In 2017 14th IAPR Interna-
tional Conference on Document Analysis and Recognition (ICDAR), volume 01,
pages 1132–1138, 2017.

[2] Muhammad Tahir Qadri and Muhammad Asif. Automatic number plate recog-
nition system for vehicle identification using optical character recognition. In
2009 International Conference on Education Technology and Computer, pages
335–338, 2009.

[3] M Swathi and K. V. Suresh. Automatic traffic sign detection and recognition: A
review. In 2017 International Conference on Algorithms, Methodology, Models
and Applications in Emerging Technologies (ICAMMAET), pages 1–6, 2017.

[4] Taweelap Suwattanapunkul and Lung-Jen Wang. The efficient traffic sign de-
tection and recognition for taiwan road using yolo model with hybrid dataset.
In 2023 9th International Conference on Applied System Innovation (ICASI),
pages 160–162, 2023.

[5] Ms. Visnu Dharsini, Karthik K, M. Gopichandd, Jorige Venkatesh, and Sabarish S. Advanced road lane line detection. In 2022 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), volume 01, pages 1–5, 2022.

[6] Rongbang Li and Carolyn Liu. Road damage evaluation via stereo camera and
deep learning neural network. In 2021 IEEE Aerospace Conference (50100),
pages 1–7, 2021.

[7] Ghulam Hyder Palli, Ghulam Fiza Mirza, Ali Akbar Shah, Bhawani Shankar
Chowdhry, Tanweer Hussain, and Ubaid Ur Rehman. Defects detection condi-
tion assessment of road infrastructure using deep learning algorithm. In 2021 In-
ternational Conference on Robotics and Automation in Industry (ICRAI), pages
1–8, 2021.

[8] M. Fei, W. Jiang, and W. Mao. A novel compact yet rich key frame creation method for compressed video summarization. Multimedia Tools and Applications, 2018.

[9] Md Imran Hosen, Md Baharul Islam, and Arezoo Sadeghzadeh. An effective
multi-camera dataset and hybrid feature matcher for real-time video stitching.
In 2021 36th International Conference on Image and Vision Computing New
Zealand (IVCNZ), pages 1–6, 2021.

[10] Siddharth Agrawal and Keyur D. Joshi. Indian commercial truck license plate
detection and recognition for weighbridge automation, 2022.

[11] Deeksha Arya, Hiroya Maeda, Sanjay Kumar Ghosh, Durga Toshniwal, and
Yoshihide Sekimoto. Rdd2022: A multi-national image dataset for automatic
road damage detection, 2022.

[12] Kuan Zheng, Yuanxing Zhao, Jing Gu, and Qingmao Hu. License plate detection
using haar-like features and histogram of oriented gradients. In 2012 IEEE
International Symposium on Industrial Electronics, pages 1502–1505, 2012.

[13] Deeksha Arya, Hiroya Maeda, Sanjay Kumar Ghosh, Durga Toshniwal, and
Yoshihide Sekimoto. Rdd2022: A multi-national image dataset for automatic
road damage detection. arXiv preprint arXiv:2209.08538, 2022.

[14] Deeksha Arya, Hiroya Maeda, Sanjay Kumar Ghosh, Durga Toshniwal, Hiroshi
Omata, Takehiro Kashiyama, and Yoshihide Sekimoto. Crowdsensing-based road
damage detection challenge (crddc’2022). In 2022 IEEE International Conference
on Big Data (Big Data), pages 6378–6386. IEEE, 2022.

[15] Álvaro Arcos-García, Juan A. Álvarez-García, and Luis M. Soria-Morillo. Deep neural network for traffic sign recognition systems: An analysis of spatial transformers and stochastic optimisation methods. Neural Networks, 99, 2018.

Timeline of the B.Tech.(IT) Major Project

Figure 5.0.1: Project Timeline Gantt Chart

Biodata
