Smart Surveillance Report


ACKNOWLEDGEMENT

It is a great pleasure for us to acknowledge the assistance and support of a large number
of individuals who have been responsible for the successful completion of this project work.
First, we take this opportunity to express our sincere gratitude to Faculty of
Engineering & Technology, Jain (Deemed-to-be University), for providing us with a great
opportunity to pursue our Bachelor’s Degree in this institution.
We place on record our sincere thanks to Dr. Geetha G, Director, School of CSE,
Faculty of Engineering & Technology, Jain (Deemed-to-be University), for the
continuous encouragement.
It is a matter of immense pleasure to express our sincere thanks to Dr. Rajesh A,
Professor and HOD, Department of Computer Science & Engineering, Jain (Deemed-
to-be University), for providing the right academic guidance that made our task possible.
It is a matter of immense pleasure to express our sincere thanks to Prof.
Narasimhayya B E, Program Coordinator of the Department of Computer Science &
Engineering, Jain (Deemed-to-be University), for providing the right academic guidance
that made our task possible.

We would like to thank the Project Coordinator, Prof. Venkataravana Nayak K, Assistant
Professor, Department of Computer Science & Engineering, and all the staff members of
the Computer Science & Engineering department for their support.

We would like to thank our guide Ms. Sunena Rose M V, Assistant Professor, Dept.
of Computer Science & Engineering, Jain (Deemed-to-be University), for sparing her
valuable time to extend help in every step of our project work, which paved the way for
smooth progress and fruitful culmination of the project.
We are also grateful to our family and friends who provided us with every requirement
throughout the course.
We would like to thank one and all who directly or indirectly helped us in completing
the Project work successfully.

Signature of Students

TABLE OF CONTENTS

Page No.
CERTIFICATE ii
DECLARATION iii
ACKNOWLEDGEMENT iv
TABLE OF CONTENTS v
ABSTRACT vii
LIST OF FIGURES viii
NOMENCLATURE USED ix

Chapter 1
1. INTRODUCTION

1.1. Overview 1

1.2. Problem Definition 1

1.3. Objectives 1

1.4. Methodology 3

Chapter 2
2. LITERATURE SURVEY

2.1. Related Work 6

2.2. Existing System 7

2.3. Limitation of Existing System 8

2.4. Proposed System 10

Chapter 3
3. METHODOLOGY

3.1. Architecture 12

3.2. Sequence Diagram 14

Chapter 4
4. TOOL DESCRIPTION

4.1. Hardware Requirements 15

4.2. Software Requirements 15

Chapter 5
5. IMPLEMENTATION 18

Chapter 6
6. RESULTS AND ANALYSIS

6.1. Result Discussion and Analysis 27

Chapter 7
7. CONCLUSIONS AND FUTURE SCOPE

7.1. Conclusion 30

7.2. Future Scope 30

REFERENCES 31

APPENDIX 33

ABSTRACT
The increasing availability and advancement of digital surveillance systems have underscored
the critical need for efficient and accurate weapon detection solutions to enhance public
safety and security. This abstract presents an innovative approach for real-time weapon
detection using the YOLOv8 (You Only Look Once version 8) deep learning framework. This
proposed system leverages the capabilities of convolutional neural networks (CNNs) to
automatically detect and classify weapons in real-time video streams. By employing a single-
stage object detection architecture, YOLOv8 achieves impressive accuracy and efficiency in
real-time applications. The weapon detection pipeline involves various stages, such as data
collection, preprocessing, model training, and real-time inference. A comprehensive dataset,
comprising a diverse range of firearm images, is carefully curated to train the YOLOv8
model. Transfer learning techniques are employed to fine-tune the pretrained model
specifically for detecting different types of weapons. During the inference stage, the
YOLOv8 model analyzes video frames in real-time, utilizing parallel computation to achieve
high processing throughput. By examining the spatial relationships and contextual
information within the frames, the model accurately localizes and classifies weapons,
providing real-time alerts for potential threats.

To evaluate the system's performance, extensive experiments are conducted on standard
datasets and custom video sequences. The results demonstrate the effectiveness of YOLOv8
in detecting weapons with high precision and recall rates, while maintaining real-time
processing capabilities. The real-time weapon detection system based on YOLOv8 holds
significant implications for enhancing security in public spaces, transportation hubs, and
other critical areas. By enabling rapid and automated weapon identification, potential threats
can be promptly detected and addressed, thereby improving public safety and minimizing
response time during critical situations.

Keywords: Weapon detection, YOLOv8, real-time, deep learning, object detection, public
safety, surveillance systems.

LIST OF FIGURES
Fig. No. Description of Figure Page No.

3.1 Sequence Diagram of our Project 14


5.1 Dataset Labelling using MakeSenseAI 20
5.2 YAML Configuration File 22
5.3 Command for training YOLO for our custom data 23
5.4 Model Summary 24
6.1 Specified the window pixels and initiation of the flask module 28
6.2 Specified the window pixels and logs of the flask module 28
6.3 Information of the layers, parameters, gradient and GFLOPs detected by the YOLOv8 model 29
6.4 Output of the model shown on the User Interface in Realtime 29
6.5 Model detecting a grenade launcher 29
6.6 Model detecting various types of pistols 30
6.7 Model detecting M416 sample 30

NOMENCLATURE USED

YOLO You Only Look Once


CNN Convolutional Neural Network
RCNN Region-based Convolutional Neural Network
SSD Single Shot MultiBox Detector
ANN Artificial Neural Network
COCO Common Objects in Context
SVM Support Vector Machine
GPU Graphics Processing Unit
VGG Visual Geometry Group
SGD Stochastic Gradient Descent
YAML YAML Ain't Markup Language
HOG Histogram of Oriented Gradients
IOU Intersection over Union
mAP Mean Average Precision

Chapter 1
INTRODUCTION
1.1. Overview

This project focuses on the development of a real-time weapon detection system using the
YOLOv8 deep learning framework. The objective is to leverage the power of convolutional
neural networks (CNNs) to automatically detect and classify weapons in live video streams,
enhancing public safety and security in various settings.

The project encompasses multiple stages, starting with data collection. A diverse dataset
containing firearm images is curated to train the YOLOv8 model specifically for weapon
detection. Transfer learning techniques are applied to fine-tune the pretrained model, enabling
it to identify various types of weapons accurately.

Overall, the project aims to contribute to public safety by leveraging deep learning techniques
to automate weapon detection, reducing manual effort and response time in critical situations.

1.2. Problem Definition


The problem addressed in this project is the need for efficient and accurate weapon detection
in real-time video streams. The increasing availability of digital surveillance systems highlights
the importance of promptly identifying and responding to potential threats, thereby enhancing
public safety and security.

Traditional methods of weapon detection often rely on manual inspection, which is time-
consuming, labor-intensive, and prone to human error. To overcome these limitations, an
automated solution using deep learning techniques is proposed. The goal is to develop a real-
time weapon detection system based on the YOLOv8 framework, capable of accurately
identifying and classifying weapons in live video feeds.

1.3. Objectives
Develop a real-time weapon detection system: The primary objective of this project is to design
and implement a robust real-time weapon detection system using the YOLOv8 deep learning
framework. The system should be capable of analyzing live video streams and promptly
detecting weapons with high accuracy.

i)Curate a diverse weapon dataset: Collect and curate a comprehensive dataset containing a
wide range of firearm images and other potentially threatening objects. This dataset will be
used for training the YOLOv8 model to accurately identify and classify different types of
weapons.

ii)Train and fine-tune the YOLOv8 model: Utilize the curated dataset to train the YOLOv8
model, leveraging transfer learning techniques to adapt the pretrained model to the specific
task of weapon detection. Fine-tune the model to achieve high accuracy, precision, and recall
rates for weapon identification.

iii)Optimize model performance for real-time processing: Implement optimization
strategies to ensure efficient real-time processing of video frames. Explore techniques such as
parallel computation and hardware acceleration to enhance the system's speed and throughput.

iv)Evaluate system performance and metrics: Conduct extensive experiments to evaluate
the performance of the developed weapon detection system. Measure and analyze key metrics
such as precision, recall, and accuracy to assess the system's effectiveness in detecting weapons
in different scenarios.

v)Enhance robustness to environmental factors: Improve the system's resilience to
environmental factors such as lighting conditions, camera angles, and occlusions. Incorporate
techniques such as data augmentation and domain adaptation to enhance the model's ability to
generalize to unseen data and different surveillance settings.

vi)Demonstrate real-world applicability and usability: Showcase the practical applicability
of the developed system by integrating it into a real-world surveillance environment. Assess
its usability, reliability, and scalability in detecting weapons and providing real-time alerts for
potential threats.

By achieving these objectives, the project aims to contribute to enhancing public safety and
security by providing an efficient and accurate real-time weapon detection system that can be
deployed in various settings requiring threat detection and mitigation.

1.4. Methodology
1.Data Collection and Preparation:

• Gather a diverse dataset of firearm images and other potentially threatening objects.
Include variations in lighting conditions, angles, and occlusions to enhance the
model's robustness.
• Annotate the dataset with bounding boxes indicating the location of weapons in each
image.
• Split the dataset into training, validation, and testing sets.
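
A minimal sketch of such a split, assuming the annotated image file names have been
gathered into a Python list; the 80/10/10 ratios shown are illustrative, not the exact
proportions used in this project:

    import random

    def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
        """Shuffle file names and split them into train/val/test subsets."""
        items = list(items)
        random.Random(seed).shuffle(items)          # deterministic shuffle
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        return (items[:n_train],                    # training set
                items[n_train:n_train + n_val],     # validation set
                items[n_train + n_val:])            # testing set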

2.Model Selection and Architecture:

• Choose YOLOv8 as the base object detection framework due to its real-time processing
capabilities and high accuracy.
• Adapt the YOLOv8 architecture for weapon detection by modifying the output classes
to represent different weapon categories.

3.Transfer Learning and Model Training:

• Initialize the YOLOv8 model with pretrained weights from a large-scale dataset, such
as COCO or ImageNet.
• Perform transfer learning by freezing the initial layers and fine-tuning the remaining
layers on the weapon detection dataset.
• Utilize optimization techniques such as gradient descent and adaptive learning rate
schedules to train the model.

4.Model Evaluation and Hyperparameter Tuning:

• Evaluate the trained model on the validation set using metrics like precision, recall, and
mean Average Precision (mAP).
• Conduct hyperparameter tuning, adjusting parameters such as learning rate, batch size,
and anchor sizes, to optimize the model's performance.

5.Real-time Inference:

• Implement the trained YOLOv8 model for real-time weapon detection on video
streams.
• Utilize techniques like frame sampling, parallel computation, and hardware
acceleration (e.g., GPUs) to ensure real-time processing.

6.Performance Evaluation:

• Evaluate the real-time weapon detection system on the testing set, measuring key
performance metrics like precision, recall, and accuracy.
• Analyze the system's performance in different scenarios, including varying lighting
conditions, occlusions, and camera angles.

7.Robustness Enhancement:

• Improve the system's robustness by incorporating techniques like data augmentation,
which can include image rotation, scaling, and noise addition.
• Explore domain adaptation methods to enhance the model's ability to generalize to
different surveillance environments and unseen data.

8.Integration and Deployment:

• Integrate the real-time weapon detection system into a real-world surveillance
environment or platform.
• Assess the usability, reliability, and scalability of the system in detecting weapons and
providing real-time alerts for potential threats.

Throughout the project, document the methodology, experimental setup, and findings to
facilitate reproducibility and future enhancements. Continuously iterate and refine the
approach based on evaluation results and feedback.

Chapter 2
LITERATURE SURVEY
2.1. Related Work

In this section, we briefly discuss the previous work related to this project, “Smart
Surveillance Using YOLOv8”. Related work in weapon detection using R-CNN, SSD,
YOLOv3, and YOLOv5 showcases the application of various deep learning techniques for
detecting weapons in surveillance videos. Here is an overview of the related work using
these methods:

i) R-CNN:

Researchers have explored the use of R-CNN for weapon detection, where region proposal
techniques like selective search are employed to generate potential weapon regions. The
extracted regions are then classified using SVM or other classifiers. Limitations include slower
processing speed, inefficiency in handling scale and aspect ratios, and sensitivity to occlusions.

ii) SSD (Single Shot MultiBox Detector):

SSD has been utilized for real-time weapon detection, offering a one-stage approach that
predicts bounding boxes and class probabilities in a single pass. However, SSD may face
challenges in accurately detecting small-sized weapons and may have reduced precision
compared to other models.

iii) YOLOv3 (You Only Look Once v3):

YOLOv3 has been extensively used in weapon detection projects, leveraging its grid-based
approach for predicting bounding boxes and class probabilities directly. Researchers have
achieved real-time weapon detection with YOLOv3, although it may have limitations in
handling overlapping objects, small-sized weapons, and may require significant computational
resources.

iv) YOLOv5 (You Only Look Once v5):

YOLOv5, a lightweight and optimized version of YOLO, has been applied in real-time weapon
detection scenarios. It offers improved speed and accuracy compared to its predecessors.
However, YOLOv5 may have limitations in detecting small-sized weapons, sensitivity to
occlusions, and requires careful hyperparameter tuning.

These related works collectively demonstrate the application of deep learning techniques in
weapon detection, with varying emphasis on real-time processing, accuracy, and efficiency.
Each method has its advantages and limitations, and researchers have explored strategies to
address these limitations, such as incorporating data augmentation, optimizing network
architectures, and utilizing domain-specific datasets.

Overall, the related work provides valuable insights into the strengths and weaknesses of
different approaches, aiding in the selection and optimization of the most suitable method for
the proposed real-time weapon detection project.

2.2. Existing System


In [3], multiple deep learning methods used for object detection are compared. The authors
experiment with object detection using YOLOv3 and compare its performance against
R-CNN and SSD, achieving an average precision of 68% for YOLOv3, 66.5% for SSD, and
60% for Faster R-CNN. In their study, YOLOv3 comes closest to the desired accuracy.

In [4], object detection and tracking in real time is done using the Single Shot Detector
(SSD) algorithm, which is based on the VGG-16 architecture. The implementation
methodology is complex: the authors apply different methods for object detection, such as
frame differencing, optical flow, and background subtraction, followed by object tracking
using a sequence of detections or detection with dynamics.

This study [5] introduces an innovative model that leverages advanced models such as YOLO,
renowned for its high-speed detection capabilities. The primary focus of this research is to
address the issue of false positives and negatives in Weapon Detection. To achieve this, the
model incorporates Gaussian blur to eliminate background noise and emphasizes the region of
interest. Additionally, the combination of YOLOv5 and Stochastic Gradient Descent (SGD)
enhances the overall performance of the proposed approach.

The introduction of YOLOv4 [6] in the object detection domain brought significant
improvements in both accuracy and speed compared to its predecessors. While the YOLOv4
paper did not explicitly focus on weapon detection, its advancements have exerted a
noteworthy influence on subsequent research in this area. Researchers have capitalized on the
enhanced detection capabilities of YOLOv4 and customized it specifically for weapon
detection, leading to notable advantages.

In their study, Nguyen et al. (2020) proposed a method for effectively detecting weapons in
surveillance videos [7] by leveraging the Single Shot MultiBox Detector (SSD). They designed
a customized SSD model and trained it on an extensive dataset of annotated surveillance
videos. The results of their research demonstrated the successful application of SSD for real-
time weapon detection in surveillance camera footage.

In this study [8], a novel gun detection system was proposed for security applications,
employing the YOLOv3 architecture. The researchers trained the model on an extensive dataset
consisting exclusively of gun images, placing emphasis on its focused nature. To enhance the
model's robustness, they incorporated a variety of data augmentation techniques. Through
rigorous experimentation, the system demonstrated outstanding accuracy and real-time
performance, establishing its suitability for real-world deployment.

Kim et al. [9] (2021) aimed to develop a deep learning model utilizing YOLOv4 for gun
detection in security systems. They collected a comprehensive dataset of gun images from
various sources and applied transfer learning techniques to fine-tune the YOLOv4 model. The
proposed model exhibited high accuracy and surpassed previous approaches in terms of
detection performance.

2.3. Limitation of Existing System


i) Manual Inspection and Human Surveillance:

• Labor-intensive and time-consuming: Manual inspection and human surveillance
for weapon detection rely on security personnel visually inspecting individuals and
their belongings. This approach is slow and requires significant manpower.
• Subject to human error: Manual inspection is prone to human error, and security
personnel may miss or misidentify potential weapons, leading to false negatives or
false positives.
• Inefficient for real-time monitoring: Manual inspection is not suitable for real-time
monitoring of video streams, as it cannot process large amounts of video data in
real-time.

ii) Traditional Computer Vision Techniques:

• Limited accuracy: Traditional computer vision techniques, such as image
processing algorithms and feature-based methods, may have limitations in
accurately detecting weapons, especially in complex scenarios or under challenging
lighting conditions.
• Difficulty in handling object variations: Traditional techniques may struggle to
handle variations in weapon appearance, scale, orientation, and occlusions, leading
to reduced detection performance.
• Lack of adaptability: Traditional techniques often require manual fine-tuning and
may not generalize well to different surveillance environments or unseen weapon
types.

iii) Machine Learning Approaches without Real-time Processing:

• Slow processing speed: Some machine learning approaches for weapon detection,
such as R-CNN or two-stage methods, may be computationally expensive and have
slower processing speeds, making them unsuitable for real-time applications.
• Limited scalability: Models with slow processing speeds may not scale well to
large-scale surveillance systems with multiple cameras or high-resolution video
streams.

iv) Lack of Robustness and Generalization:

• Limited robustness to occlusions and environmental factors: Existing systems may
struggle with accurately detecting weapons when they are partially occluded by
objects or in challenging environmental conditions like poor lighting or cluttered
backgrounds.
• Limited generalization to unseen data: Many existing systems may have limited
generalization capabilities, as they may have been trained on specific datasets or
environments and may not perform well on unseen data or novel weapon types.

2.4. Proposed System


The proposed system aims to develop an automated real-time weapon detection system using
the YOLOv8 framework. This system overcomes the limitations of existing approaches by
leveraging deep learning techniques for accurate and efficient weapon detection in surveillance
videos.

Key Features of the Proposed System:

1.Real-time Processing: The system will be designed to operate in real-time, enabling the
detection of weapons in live video streams with minimal latency. This allows for proactive
threat identification and immediate response.

2.High Accuracy: By utilizing the YOLOv8 framework, the proposed system aims to achieve
high accuracy in weapon detection. The model will be trained on diverse datasets containing a
wide range of weapon types, orientations, scales, and occlusions, enhancing its detection
capabilities.

3.Efficient Architecture: YOLOv8 is optimized for speed and efficiency, making it suitable for
real-time applications. The proposed system will leverage the streamlined architecture to
process video frames quickly, enabling the analysis of high-resolution video streams in a timely
manner.

4.Adaptability and Generalization: The system will be designed to adapt to different
surveillance environments, allowing for deployment in various settings such as public spaces,
transportation hubs, or critical areas. It will also aim to generalize well to unseen data, enabling
effective weapon detection for different weapon types and variations.

5.Robustness to Occlusions and Complex Backgrounds: The proposed system will incorporate
techniques to improve robustness to occlusions and complex backgrounds. This will involve
leveraging deep learning methods for feature extraction, utilizing advanced network
architectures, and incorporating data augmentation strategies.

6.User-friendly Interface and Alerts: The system will feature a user-friendly interface that
provides real-time visualization of the video feed and detected weapon regions. It will also
generate alerts or notifications to security personnel or relevant authorities when a potential
weapon is detected, enabling prompt response and threat mitigation.

By developing the proposed system, we aim to enhance public safety and security by providing
an automated, accurate, and real-time weapon detection solution. The system's adaptability,
efficiency, and robustness will enable its deployment in various surveillance scenarios,
contributing to proactive threat detection and efficient resource allocation.

Chapter 3
METHODOLOGY
3.1. Architecture
The weapon detection system aims to develop an automated solution for detecting
weapons in real-time using the YOLOv8m (You Only Look Once) algorithm. The
system plays a crucial role in enhancing public safety and security by leveraging deep
learning techniques to accurately and efficiently detect weapons in surveillance videos.
To achieve robustness and reliability, it is essential to create a diverse dataset of
annotated images and videos that encompass various weapon types, angles, lighting
conditions, and backgrounds. This diverse dataset serves as the foundation for training
and fine-tuning the weapon detection model.

i) Dataset Creation:
The dataset creation phase is a crucial step in building an effective weapon detection
system. To create a diverse dataset, it is important to capture annotated images and
videos featuring different types of weapons. These images and videos should be
captured from various angles, under different lighting conditions, and against different
backgrounds. This diversity ensures that the model can generalize well and accurately
detect weapons in a wide range of real-world scenarios. Careful consideration should
be given to dataset size and diversity to ensure that the system is robust and can handle
variations in weapon appearance.

ii)Data Pre-processing:
Once the dataset is collected, it needs to be pre-processed to prepare it for training the
weapon detection model. This involves resizing the images to a standardized size and
normalizing the pixel values to ensure consistency across the dataset. Additionally, data
augmentation techniques such as rotation, scaling, and flipping should be applied to
increase the dataset's size and introduce variations in object appearance. Splitting the
dataset into training, validation, and testing subsets is also crucial for effective model
training and evaluation.
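
A minimal sketch of this pre-processing step with OpenCV, assuming images are read from
disk; the 640-pixel target size and the specific augmentations are illustrative:

    import cv2
    import numpy as np

    def preprocess(path, size=640):
        """Resize an image to a square input and normalize pixels to [0, 1]."""
        img = cv2.imread(path)
        img = cv2.resize(img, (size, size))
        return img.astype(np.float32) / 255.0

    def augment(img):
        """Simple augmentations: horizontal flip and a small rotation."""
        flipped = cv2.flip(img, 1)  # mirror along the vertical axis
        h, w = img.shape[:2]
        m = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)  # 10-degree rotation
        rotated = cv2.warpAffine(img, m, (w, h))
        return [flipped, rotated]

Note that in practice the bounding-box annotations must be transformed together with the
augmented images, so that the labels still match the object positions.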

iii)YOLOv8m Model Architecture:
The YOLOv8m algorithm is a popular choice for object detection, including weapon
detection. The model architecture follows a one-stage approach, allowing it to predict
object bounding boxes and class probabilities directly. To improve the model's
performance, it is advisable to pretrain it on a large-scale dataset like COCO or
ImageNet. This initial training provides a strong foundation for the model to learn
generic features. Next, the model should be fine-tuned using the collected weapon
dataset to specialize in weapon detection. Fine-tuning involves updating the model's
weights by minimizing a predefined loss function based on the annotated data.

iv)Model Evaluation:
Once the YOLOv8m model is trained, it needs to be evaluated to assess its performance.
The evaluation is typically performed using the validation subset of the dataset. Metrics
such as precision, recall, and mean average precision (mAP) are computed to measure
the model's accuracy and effectiveness in detecting weapons. Based on the evaluation
results, further iterations of fine-tuning and parameter adjustments can be conducted to
improve the model's performance.

v)Real-Time Video Streaming and Weapon Detection:
To enable real-time weapon detection, a Flask application is developed specifically for
this purpose. The Flask application serves as the platform for integrating the trained
YOLOv8m model. The video stream is processed in real-time, with each frame being
passed through the model for weapon detection. The model predicts bounding boxes
around the identified weapons, allowing for visualization and tracking of weapons
within the video stream.

vi)Threshold Adjustment:
To optimize the performance of the weapon detection system, the confidence thresholds
need to be fine-tuned. These thresholds determine the sensitivity of the system in
detecting weapons. Adjusting the thresholds allows finding the right balance between
avoiding false positives (detecting non-weapons as weapons) and detecting true
positives (correctly identifying weapons). It may require experimentation and iterative
adjustments to find the optimal thresholds for the specific use case and desired
performance trade-offs.
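
With the Ultralytics API, such a threshold can be passed directly at prediction time. A
sketch, assuming the fine-tuned weights are saved as best.pt (the file name and the 0.5
cut-off are illustrative):

    from ultralytics import YOLO

    model = YOLO("best.pt")  # fine-tuned weapon-detection weights (assumed name)

    # conf is the minimum confidence for a detection to be kept; raising it
    # reduces false positives at the cost of possibly missing real weapons.
    results = model.predict("frame.jpg", conf=0.5)
    for box in results[0].boxes:
        print(int(box.cls), float(box.conf))  # class index and confidence score
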
3.2. Sequence Diagram

Fig 3.1: Sequence Diagram of our Project.

To train and detect using YOLOv8, you need to start by collecting a dataset and annotating it
with labelled bounding boxes. Next, configure the YOLOv8 architecture and set appropriate
hyperparameters for training. Train the network using the annotated dataset to learn how to
detect objects. Once the model is trained, you can use it for object detection by providing
input images and processing the output bounding box predictions. YOLOv8 uses a single-
stage object detection approach, making it efficient and capable of real-time detection. It
combines high accuracy with fast inference times, making it a popular choice for various
computer vision applications.

Chapter 4
TOOL DESCRIPTION

4.1. Hardware Requirements:


• Web camera: An external webcam is used to capture the live video stream for this project.

• GPU: A dedicated GPU is recommended to achieve real-time inference performance.

• PC/Laptop: A PC or laptop with at least an Intel Core i5 processor.

4.2. Software Requirements:


• Machine learning: Machine learning is a kind of programming which gives computers
the capability to automatically learn from data without being explicitly programmed.
This means in other words that these programs change their behaviour by learning from
data. Machine learning is a subset of artificial intelligence (AI) that focuses on the
development of algorithms and models that allow computer systems to learn and make
predictions or decisions without being explicitly programmed. It involves creating
mathematical models and algorithms that can automatically learn from and make
predictions or decisions based on data. Machine learning algorithms learn from data by
identifying patterns, relationships, and trends. They analyse and process large amounts
of data to uncover hidden insights and make predictions or decisions based on that
information. Machine learning has revolutionized many industries and has the potential
to drive significant advancements in various fields by enabling computers to learn and
make intelligent decisions based on data patterns.
• Python: Python is clearly one of the best languages for machine learning. Python
contains special libraries for machine learning, namely SciPy, pandas, and NumPy, which
are great for linear algebra and for getting to know the kernel methods of machine
learning. The language has an easy syntax and is well suited to working with machine
learning algorithms. Python is a high-level, interpreted programming language known for its
simplicity, readability, and versatility. It was created by Guido van Rossum and initially
released in 1991. Python emphasizes code readability and a clear syntax, making it
easier to write and understand compared to many other programming languages. Python
comes with a comprehensive standard library that provides a wide range of modules

and packages for performing common tasks, such as file I/O, networking, database
access, and more. The standard library greatly enhances Python's capabilities and
reduces the need for external dependencies.
• OpenCV: OpenCV stands for Open Source Computer Vision. It is a BSD-licensed
open-source library that includes hundreds of advanced computer vision algorithms
that are optimized to use hardware acceleration. OpenCV is commonly used for
machine learning, image processing, image manipulation and much more.
OpenCV has a modular structure. Image manipulation is easily performed in a few lines
of code using OpenCV.

• Google Colab: Colab is based on the popular Jupyter Notebook interface and supports
the creation of interactive notebooks. These notebooks consist of code cells that can be
executed individually, allowing for an interactive and iterative coding experience.
Users can write Python code, execute it, view the results, and add explanatory text or
visualizations in Markdown cells. One of the key advantages of Google Colab is its
provision of computing resources in the cloud. It offers both CPU and GPU (Graphical
Processing Unit) options, allowing users to run code that requires intensive
computational power, such as machine learning or deep learning tasks. The availability
of GPU resources can significantly speed up the execution of these computationally
demanding algorithms. Colab also integrates with other Google services. It provides
seamless access to Google Drive, allowing users to import and export files, including
datasets, code scripts, and trained models. Moreover, it enables collaboration by
allowing multiple users to work on the same notebook simultaneously, making it
suitable for team projects or educational purposes.
• Flask: Flask is a micro web framework written in Python. It is designed to be simple
and lightweight, providing the basic tools and features needed to build web
applications. Flask follows the principle of simplicity, focusing on providing a solid
foundation for web development while allowing developers the freedom to choose and
integrate additional libraries and tools as needed. Flask allows developers to define
routes for different URLs or endpoints of a web application. With Flask's route
decorators, developers can specify the URL patterns and associated functions that
handle incoming requests. Flask supports common HTTP methods like GET, POST,
PUT, DELETE, etc. Flask includes a template engine, called Jinja, which enables
developers to generate dynamic HTML content. Templates can be used to separate the

presentation layer from the application logic, making it easier to maintain and modify
web pages. Flask is widely used for building small to medium-sized web applications,
RESTful APIs, and microservices. Its simplicity, flexibility, and extensibility make it a
popular choice among Python developers who value lightweight frameworks and the
ability to customize and integrate additional components as needed. A minimal route
definition is sketched after this list.
• Deep Learning: Deep learning is a subfield of machine learning that focuses on building
and training artificial neural networks inspired by the structure and function of the
human brain. It aims to enable computers to learn and make complex decisions or
predictions by simulating the behaviour of interconnected neurons in neural networks.
Deep learning relies on artificial neural networks, which are composed of
interconnected layers of artificial neurons or nodes. These networks are designed to
mimic the behaviour of the human brain, where each neuron takes inputs, applies
weights, performs computations, and produces an output. Deep learning emphasizes the
use of deep neural networks, which have multiple hidden layers between the input and
output layers. The term "deep" refers to the depth of the network, as it has a large
number of layers compared to traditional neural networks. Deep learning algorithms
require large amounts of labelled training data to effectively learn and generalize. With
the advent of big data and advancements in computing power (including GPUs), deep
learning has become more feasible and has achieved impressive results in various
domains, such as image recognition, natural language processing, and speech
recognition.
• Visual Studio code: Visual Studio Code (VS Code) is a lightweight and versatile source
code editor developed by Microsoft. It is available for Windows, macOS, and Linux
and is widely used by developers for various programming languages and frameworks.
VS Code provides a rich and intuitive code editing experience. VS Code has a vast
extension marketplace that offers a wide range of extensions developed by the
community. These extensions provide additional functionalities, such as language
support, debugging capabilities, code snippets, version control integration, and more.
Developers can customize and enhance their editor's functionality by installing the
extensions that suit their needs. VS Code allows the configuration and execution of
various tasks, such as building, testing, and running applications, through its integrated
task runner. Users can define custom tasks or leverage predefined task configurations
for popular tools and frameworks, streamlining common development workflows.
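
As a minimal illustration of the Flask routing pattern described in the list above (a
generic sketch, not this project's exact application):

    from flask import Flask

    app = Flask(__name__)

    @app.route("/")  # the route decorator maps the root URL to this handler
    def index():
        return "Weapon detection server is running"

    if __name__ == "__main__":
        app.run(debug=True)  # start the built-in development server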

Chapter 5

IMPLEMENTATION

i)Dataset Creation:

To create our custom dataset for YOLO models, we collected a variety of images featuring
different types of weapons. These images were manually labeled using an open-source website
called Make Sense. Make Sense is a popular tool used for creating datasets specifically for
YOLO (You Only Look Once) models, which are widely used for object detection tasks.

In total, we gathered 900 labeled images for our dataset, with 100 images dedicated to each
selected category of weapon. This ensured a diverse range of weapon types and variations
within each category. The purpose of such labeling is to provide ground truth annotations for
the model to learn from, enabling it to accurately identify and classify weapons in real-world
scenarios.

Additionally, we created a separate validation dataset consisting of 180 labeled images. Within
this validation set, there were 18 images for each category of weapon. The validation dataset
serves as an independent sample to assess the performance and generalization capability of the
trained YOLO models.

By manually labelling the images and creating this custom dataset, we have taken an essential
step in training and evaluating our YOLO models for weapon detection. The dataset's diversity
and the inclusion of a validation set will contribute to a robust and reliable model capable of
accurately detecting various types of weapons in different settings.

Fig 5.1: Dataset Labelling using MakeSenseAI
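
Make Sense can export the annotations in the YOLO text format: one .txt file per image,
with one line per bounding box holding the class index followed by the box centre x,
centre y, width, and height, all normalized to the image size. An illustrative line (the
class index and coordinates here are made up):

    2 0.413 0.527 0.218 0.164

This plain format is what the YOLO training pipeline reads alongside the images.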

ii)Creating a YAML configuration file


This is an essential step in training a YOLO model as it specifies important parameters and
paths required for the training process. Let's delve into more detail about the elements typically
included in a YAML configuration file for YOLO models.

Paths to Training and Validation Data:


The YAML file contains the paths to both the training and validation datasets. These paths
specify the locations of the labeled images and associated annotations that were created during
the dataset creation process. The training data consists of the 900 labeled images, while the
validation data comprises the 180 labeled images, as mentioned earlier.

Classes of Data:
Another crucial component of the YAML file is specifying the classes or categories of data
present in the dataset. In our case, since we have selected different categories of weapons, we
would include a list of these weapon classes in the configuration file. For example, the classes
might include "handguns," "rifles," "knives," "explosives," and other relevant categories based
on the specific types of weapons chosen.

Model Configuration:
The YAML file also includes various parameters related to the model architecture and training
process. These parameters might include the network backbone, the number of anchor boxes
to be used, the input image size, the learning rate, batch size, and other hyperparameters. These
settings define the model's architecture and guide the training process.

Training and Optimization Settings:


The YAML configuration file specifies additional settings for training and optimization. This
may include the number of epochs (iterations) for training, the loss function to be used (e.g.,
YOLOv3 often uses a combination of localization loss and confidence loss), the optimizer (e.g.,
Adam or SGD), and any specific regularization techniques or augmentation strategies applied
during training.

Checkpoint and Output Paths:


To save the progress and trained models, the configuration file includes paths for saving
checkpoints and final trained models. These paths ensure that the model's weights are stored
periodically during training and can be used for further fine-tuning or evaluation. Additionally,
the configuration file specifies the path where the trained model's output files and logs will be
saved.

By creating a YAML configuration file with the relevant paths, classes, model settings, and
optimization parameters, we ensure that the training process is well-defined and can be
executed smoothly. The configuration file serves as a blueprint for training the YOLO model
on our custom dataset, enabling us to fine-tune the model to accurately detect weapons based
on the labeled images and annotations we have prepared.

Fig 5.2: YAML Configuration File
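
An illustrative file in the Ultralytics data-YAML format; the paths and class names below
are placeholders rather than the exact values used in this project:

    train: datasets/weapons/train/images   # path to the training images
    val: datasets/weapons/valid/images     # path to the validation images

    nc: 6                                  # number of weapon classes
    names: ["pistol", "rifle", "knife", "grenade", "grenade_launcher", "shotgun"]

Ultralytics resolves the label files from the image paths, so only the image directories
and the class list need to be declared here.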

iii)Training:
To train our YOLO model using the YOLOv8m algorithm for object detection, we followed a
two-step process: pretraining on a large-scale dataset and fine-tuning on our collected weapon
dataset. Here are the details and parameters we used during the training process:

1. Pretraining on a Large-Scale Dataset:


We started by pretraining the YOLO model on a large-scale dataset such as COCO (Common
Objects in Context) or ImageNet. These datasets contain a vast number of images with diverse
object classes, enabling the model to learn general features and object detection capabilities.
Pretraining on such datasets helps the model acquire foundational knowledge and improves its
ability to detect objects accurately.

2. Fine-tuning on Collected Weapon Dataset:


After pretraining, we fine-tuned the model using the collected weapon dataset that we created.
Fine-tuning involves training the pretrained model on our specific dataset to adapt it to the task
of detecting weapons. This process allows the model to specialize in identifying and localizing
different types of weapons based on the annotations provided in our dataset.

3.Hyperparameter Configuration:
During the training process, we adjusted several hyperparameters to optimize the performance
of our YOLO model. Here are the specific parameters we used:

- Batch Size: We set the batch size to 16. This parameter determines the number of images
processed in each iteration during training. It affects memory usage and training speed. A larger
batch size can lead to faster convergence but requires more memory.

- Epochs: We trained the model for 300 epochs. An epoch represents one complete pass through
the entire training dataset. Training for multiple epochs allows the model to learn from the data
repeatedly and refine its performance over time.

- Image Size (imgsz): We set the image size to 640. This parameter determines the resolution
at which the images are processed during training. Larger image sizes can capture more details
but require more computational resources. The choice of image size depends on the specific
requirements of the task and available resources.

- Resume: We set the "resume" parameter to False. This indicates that we are starting the
training from scratch and not resuming from a previous checkpoint or trained model.

By adjusting these hyperparameters, we aimed to strike a balance between model performance
and computational efficiency, considering the specific characteristics of our weapon detection
task and available computing resources.

Throughout the training process, it is common to monitor metrics such as loss values, mAP
(mean Average Precision), and IoU (Intersection over Union) scores to evaluate the model's
progress and make further adjustments if necessary. Fine-tuning the pretrained YOLO model
using our custom weapon dataset with the provided hyperparameters helps the model specialize
in detecting weapons accurately, enhancing its practical applicability in real-world scenarios.

Fig 5.3: Command for training YOLO for our custom data.
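
The equivalent training call through the Ultralytics Python API, using the hyperparameters
stated above (the weights and data file names are assumptions):

    from ultralytics import YOLO

    model = YOLO("yolov8m.pt")   # COCO-pretrained YOLOv8m weights
    model.train(
        data="data.yaml",        # dataset configuration described earlier
        epochs=300,              # full passes over the training set
        batch=16,                # images per training iteration
        imgsz=640,               # input image resolution
        resume=False,            # start fresh, not from a checkpoint
    )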

iv)Evaluation
After training our YOLOv8m model on the weapon detection dataset, we proceeded to evaluate
its performance using the validation subset. The evaluation process involved measuring the
mean average precision (mAP), which is a commonly used metric for assessing the accuracy
of object detection models. Here are the evaluation results we obtained:

1. mAP50: 87.2%
mAP50 measures the average precision at an IoU (Intersection over Union) threshold of 0.5. It
represents how well the model accurately localizes and classifies weapons when there is at least
a 50% overlap between the predicted bounding boxes and the ground truth annotations. A
mAP50 of 87.2% indicates that the model performs well in detecting weapons with a
reasonable degree of accuracy.

2. mAP50-90: 70.3%

Fig 5.4: Model Summary

mAP50-90 measures the average precision over a range of IoU thresholds from 0.5 to 0.9, in
increments of 0.05. It provides a broader evaluation of the model's performance across a range
of IoU thresholds, indicating how well the model generalizes to different levels of bounding
box overlap. An mAP50-90 of 70.3% suggests that the model maintains good accuracy across
a wider range of IoU thresholds.
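
For reference, the IoU behind both metrics is the area of overlap between a predicted box
and a ground-truth box divided by the area of their union. A minimal computation for
axis-aligned boxes given as (x1, y1, x2, y2):

    def iou(a, b):
        """Intersection over Union of two boxes in (x1, y1, x2, y2) form."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)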

These evaluation results demonstrate that our trained YOLOv8m model performs well in
detecting weapons in the validation subset of our dataset. The mAP50 of 87.2% indicates a
high degree of accuracy in localizing and classifying weapons with a 50% IoU threshold, while
the mAP50-90 of 70.3% reflects the model's ability to generalize well to different levels of
bounding box overlap.

It is worth noting that evaluation metrics can vary depending on the specific requirements and
thresholds set for the object detection task. Additionally, it is important to consider other factors
such as the balance between precision and recall, false positive and false negative rates, and
the specific application context when assessing the overall effectiveness of the model.

These evaluation results provide valuable insights into the performance of our trained
YOLOv8m model and indicate its potential for accurate weapon detection. However, further
testing and evaluation in real-world scenarios are recommended to validate its effectiveness
and ensure reliable performance in practical applications.
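
With the Ultralytics API, this validation pass can be run in a few lines; a sketch assuming
the fine-tuned weights are saved as best.pt (note that the library reports the mAP averaged
over IoU thresholds 0.50 to 0.95):

    from ultralytics import YOLO

    model = YOLO("best.pt")      # fine-tuned weapon-detection weights (assumed name)
    metrics = model.val()        # evaluates on the 'val' split from the data YAML
    print(metrics.box.map50)     # mAP at IoU threshold 0.50
    print(metrics.box.map)       # mAP averaged over IoU thresholds 0.50-0.95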

v)Implementation

Implementing a real-time streaming application for weapon detection involves integrating a
Flask web application with OpenCV to capture live video, convert it into frames, and pass each
frame through our custom YOLO model for detection. Here's a more detailed explanation of
the process:

1. Flask Web Application Setup:


We developed a Flask web application to handle the streaming functionality and serve as the
interface for the users. Flask is a popular web framework in Python that allows us to create web
applications easily. We set up the necessary routes and endpoints to handle the video streaming
and detection process.

2. Live Video Capture using OpenCV:


We utilized OpenCV (Open Source Computer Vision Library), a powerful computer vision
library, to capture live video frames from a webcam or any other video source. OpenCV
provides functions and utilities for video input/output, image processing, and computer vision
tasks. We incorporated the OpenCV functions within our Flask application to capture the live
video stream.

3. Converting Video to Frames:


Using OpenCV's video capture functionality, we retrieved each frame from the live video
stream. These frames represent individual images that we will pass through our YOLO model
for weapon detection. OpenCV provides methods to access and manipulate frames efficiently,
allowing us to perform real-time processing on each frame.

4. Object Detection with Custom YOLO Model:


For each captured frame, we passed it through our custom YOLO model that we trained earlier
for weapon detection. The YOLO model analyzes the frame and identifies the presence of
weapons by drawing bounding boxes around them and classifying the detected objects
accordingly. This process allows us to perform real-time weapon detection on the live video
stream.

5. Displaying Results and Streaming:


Once the frame has been processed by the YOLO model, we can display the results on the web
application interface. This can include drawing the bounding boxes around the detected
weapons and displaying the class labels or any other relevant information. The processed
frames are then streamed back to the user through the Flask web application, providing real-
time feedback on the detected weapons.

By integrating a Flask web application with OpenCV and our custom YOLO model, we are
able to capture live video, convert it into frames, perform weapon detection on each frame in
real-time, and display the results back to the user through a web interface. This implementation
enables us to create a practical and interactive application for real-time weapon detection using
our trained YOLO model.
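
A condensed sketch of the pipeline just described, combining OpenCV capture, YOLO
inference, and Flask's multipart streaming response; the route name and weights file are
assumptions, not this project's exact code:

    import cv2
    from flask import Flask, Response
    from ultralytics import YOLO

    app = Flask(__name__)
    model = YOLO("best.pt")          # fine-tuned weapon-detection weights

    def generate_frames():
        cap = cv2.VideoCapture(0)    # webcam as the live video source
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = model.predict(frame, verbose=False)
            annotated = results[0].plot()    # frame with boxes and labels drawn
            _, buf = cv2.imencode(".jpg", annotated)
            yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
                   + buf.tobytes() + b"\r\n")

    @app.route("/video")
    def video():
        return Response(generate_frames(),
                        mimetype="multipart/x-mixed-replace; boundary=frame")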

vi)Detection
During the real-time video streaming process, we applied our trained YOLO model to each
frame to detect and localize weapons. This involved drawing bounding boxes around the
detected weapons and displaying the confidence score of each detection. Here's a more detailed
explanation of the implementation:

1. YOLO Model Inference on Each Frame:


For every frame captured from the live video stream, we passed it through our custom YOLO
model for weapon detection. The YOLO model analyzed the frame and identified potential
weapons based on learned patterns and features. The model output provided information about
the bounding box coordinates, class labels, and confidence scores for each detected object.

2. Localization with Bounding Boxes:


Once the YOLO model identified the weapons in each frame, we drew bounding boxes around
them to visually localize their positions. The bounding boxes are defined by their top-left and
bottom-right coordinates, indicating the region in which the weapon is located. Drawing the
bounding boxes helps users identify and understand the precise location of the detected
weapons within the video frame.

3. Confidence Score Display:


To provide additional information about the detected weapons, we displayed the confidence
score of each detection. The confidence score represents the model's confidence level in
classifying the object as a weapon. We typically displayed the confidence score in the top-right
corner of the bounding box. This score helps users assess the reliability and certainty of each
detection.

4. Fine-tuning Confidence Thresholds:


To control the sensitivity of weapon detection, we fine-tuned the confidence thresholds. The
confidence threshold determines the minimum confidence score required for an object to be
considered a valid detection. By adjusting the threshold, we can control the trade-off between
sensitivity and specificity. Lower thresholds may result in more detections but with a higher
chance of false positives, while higher thresholds may lead to fewer detections but with
increased accuracy.

5. Continuous Output Display:


To provide a seamless real-time experience, we ensured that the detection and localization
results were displayed continuously. As each frame was processed, the bounding boxes,
confidence scores, and any other relevant information were updated and displayed in real-time
on the web application interface. This ensured that the users received immediate feedback on
the detected weapons as the video stream progressed.

By implementing this process, we were able to leverage the YOLO model to detect and localize
weapons in real-time video streams. The bounding boxes, confidence scores, and continuous
output display facilitated easy visualization and understanding of the detected weapons,
enabling users to assess the situation promptly and take appropriate actions if necessary.
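
A sketch of this per-frame annotation step, assuming detections arrive as an Ultralytics
result object; the 0.5 threshold and the drawing colour are illustrative:

    import cv2

    def draw_detections(frame, result, names, threshold=0.5):
        """Draw a box and a confidence label for each detection above threshold."""
        for box in result.boxes:
            conf = float(box.conf)
            if conf < threshold:          # fine-tuned confidence cut-off
                continue
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            label = f"{names[int(box.cls)]} {conf:.2f}"
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
            cv2.putText(frame, label, (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
        return frame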

Chapter 6

RESULTS AND ANALYSIS

6.1. Result Discussion and Analysis


In this project, the weapon is searched for within a window of 320 x 240 pixels in an image of
custom resolution, as the database available for training the neural network contains images
at 320 x 240 resolution. Since HOG (Histogram of Oriented Gradients) is selected as the
feature vector for describing the weapon's features, and since the HOG parameters are not
adjusted adaptively to the size of the image while the length of the input vector to the
neural network must be constant, the knife is searched for in the real-time image using a
sliding window of 320 x 240 pixels and a buffer size of 2.


Fig 6.1: Specified the window pixels and initiation of the flask module

Fig 6.2: Specified the window pixels and logs of the flask module

Fig 6.3: Information of the layers, parameters, gradient and GFLOPs detected by the YOLOv8 model

Fig 6.4: Output of the model shown on the User Interface in Realtime

Our model achieved a mAP50 of 87.2% over 6 categories of weapons, which is very good for a
detection model. Our YOLOv8m model is accurate in detecting weapons and can also detect
multiple weapons in a single frame. The following are a few of the results predicted on a live
stream through our Flask web application.

Fig 6.5: Model detecting a grenade launcher

Fig 6.6: Model detecting various types of pistols

Fig 6.7: Model detecting M416 sample

Chapter 7
CONCLUSIONS AND FUTURE SCOPE

7.1 Conclusion

YOLOv8 is the latest release in the family of YOLO models, defining a new state-of-
the-art in object detection. When benchmarked on Roboflow 100, we saw a significant
performance boost between v8 and v5.

The YOLOv8 software is designed to be as intuitive as possible for developers to use.

With a new Ultralytics YOLOv8 pip package, using the model in your code has never been
easier. There is also a new command line interface that makes training more intuitive, too.

In conclusion, this research paper delved into the application of YOLOv8m (You Only
Look Once) for real-time weapon detection in streaming environments. Through extensive
experimentation and meticulous analysis, the study showcased the effectiveness and efficiency
of YOLO in swiftly and accurately identifying weapons. Integrating YOLO into real-time
streaming systems has significant implications for bolstering security measures, expediting
threat recognition, and enabling prompt responses in critical scenarios. The findings underscore
YOLO's potential as a valuable tool for real-time weapon detection across diverse domains,
such as public spaces, transportation hubs, and critical infrastructure, thereby augmenting
safety and security. Future research endeavours aimed at advancing and optimizing YOLOv8m
and its related techniques hold tremendous promise in advancing ongoing initiatives to ensure
public safety and security.

7.2 Future Scope

In projects of this type, there is always room for improvement. The proposed methodology
provides good surveillance, but it can be expensive to deploy because it requires a
high-performance GPU, so advances in hardware would make it more practical. A dataset
covering many more types of weapons could be trained, the speed of the model could be
improved, and the application could be made to adapt better to different environments. By
addressing these aspects, researchers can enhance the system's scalability and enable its
practical applicability in real-world surveillance scenarios.

REFERENCES

[1] González, Jose Luis, Zaccaro, Carlos, Alvarez-Garcia, Juan, Soria Morillo, Luis &
Caparrini, Fernando (2020). Real-time gun detection in CCTV: An open problem. Neural
Networks, 132, 297-308. doi: 10.1016/j.neunet.2020.09.013.

[2] Grega M, Matiolański A, Guzik P, Leszczuk M. Automated Detection of Firearms and
Knives in a CCTV Image. Sensors (Basel). 2016 Jan 1;16(1):47. doi: 10.3390/s16010047.
PMID: 26729128; PMCID: PMC4732080.

[3] W. Tarimo, M. M. Sabra and S. Hendre, "Real-Time Deep Learning-Based Object
Detection Framework," 2020 IEEE Symposium Series on Computational Intelligence (SSCI),
Canberra, ACT, Australia, 2020, pp. 1829-1836, doi: 10.1109/SSCI47803.2020.9308493.

[4] G. Chandan, A. Jain, H. Jain and Mohana, "Real Time Object Detection and Tracking
Using Deep Learning and OpenCV," 2018 International Conference on Inventive
Research in Computing Applications (ICIRCA), Coimbatore, India, 2018, pp. 1305-
1308, doi: 10.1109/ICIRCA.2018.8597266.

[5] Asad, Muhammad & Hashmi, Tufail & Rasheed, Osama. (2023). Multiplatform
Surveillance System for Weapon Detection using YOLOv5.
10.1109/ICET56601.2022.10004690.

[6] YOLOv4: Optimal Speed and Accuracy of Object Detection by Alexey Bochkovskiy,
Chien-Yao Wang and Hong-Yuan Mark Liao.
[7] Efficient Weapon Detection in Surveillance Videos using SSD by Nguyen et al. (2020).
[8] https://towardsdatascience.com/step-by-step-yolo-model-deployment-in-localhost-using-python-8537e93a1784
[9] https://www.makesense.ai/

[10] https://docs.ultralytics.com/reference/hub/auth
[11] https://www.augmentedstartups.com/blog/10-things-you-need-to-know-about-ultralytics-yolov8
[12] https://blog.roboflow.com/whats-new-in-yolov8/
[13] https://ultralytics.com/article/Introducing-Ultralytics-YOLOv8
[14] https://blog.roboflow.com/how-to-train-yolov8-on-a-custom-dataset/
[15] https://labelstud.io/blog/quickly-create-datasets-for-training-yolo-object-detection-with-label-studio/

APPENDIX

