19bce0014 VL2021220702099 Pe003
On
FINAL REPORT
Submitted By
Submitted to
Prof. SARAVANAKUMAR K
For
CSE1901 TECHNICAL ANSWERS FOR REAL WORLD PROBLEMS
PROBLEM STATEMENT
INTRODUCTION
YOLO is an acronym for You Only Look Once. We employ version 5
(YOLOv5), one of the most widely used real-time object detection
algorithms. It is a convolutional neural network (CNN) that detects
objects in real time with high accuracy. The approach passes the entire
image through a single neural network, which divides the image into
regions and predicts bounding boxes and class probabilities for each
region. These bounding boxes are then weighted by the predicted
probabilities.
OBJECTIVES
1) To create a dataset of students wearing a mask and/or an ID card.
2) To train a model that detects masks and ID cards.
3) To run inference and evaluate the model in real-time scenarios.
RESEARCH PAPERS
PROPOSED ARCHITECTURE
The network architecture of YOLOv5 is as follows:
Image source: ResearchGate - A Forest Fire Detection System Based on Ensemble Learning
The input image is first passed to CSPDarknet for feature extraction and
then to PANet for feature fusion. Finally, the YOLO layer outputs the
detection results (class, score, location, size).
PROPOSED SOLUTION
● We will use the YOLOv5 object detection architecture in PyTorch.
● One-stage detectors such as YOLO predict boxes and classes in a single
forward pass, which makes them better suited to real-time applications
than two-stage detectors.
● We also plan to address the speed, time, memory and accuracy problems
with the help of multi-task learning (MTL).
● In multi-class classification, a single label is assigned to each input
image, whereas in multi-task learning each input image is checked for
the presence of object 1, object 2, … object n.
● With multi-task learning, multiple objects can appear in the same image,
so one image can carry multiple labels (one for each object that we are
going to detect).
● With multi-task learning, we train a single neural network to look at
each image and solve N different classification problems. This can give
better performance than training N completely separate neural networks
to do the N tasks separately.
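
The difference between one label per image and one label per object can be sketched in a few lines. This is only an illustration: the class names `mask` and `id_card` and both helper functions are assumptions, not names taken from the project code.

```python
# Hypothetical class order; the report does not list the exact label names.
CLASSES = ("mask", "id_card")


def multi_hot(present):
    """One 0/1 flag per class, so several classes can be set for one image."""
    return [1 if name in present else 0 for name in CLASSES]


def combination_index(present):
    """Single-label alternative: every combination of objects becomes its own class."""
    index = 0
    for flag in multi_hot(present):
        index = index * 2 + flag  # read the flags as binary digits
    return index


# The four situations a student can be in.
cases = [set(), {"id_card"}, {"mask"}, {"mask", "id_card"}]
for case in cases:
    print(sorted(case), multi_hot(case), combination_index(case))
```

With N objects the multi-label form needs N outputs, while the single-label form needs 2**N classes, which is one reason the multi-label view scales better as more objects are added.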
MODULES
Roboflow
Roboflow is a web-based computer vision platform that allows us to create
custom datasets. First, we create a workspace and add our project,
Mask&IDCard detection. Then we upload the images we took. The platform
runs a preprocessing step to make the dimensions of all images uniform.
It then lets us add annotations: we annotate every image by drawing boxes
around the masks and ID cards in it. The platform also splits the dataset
into training, validation and test sets.
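
For YOLOv5, box annotations are commonly stored as one text line per object in the form `class x_center y_center width height`, with all coordinates normalised by the image size. The sketch below shows that conversion under the assumption that a box is given as pixel corners; the function name and the sample values are hypothetical.

```python
def to_yolo_line(class_id, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 208x208 pixel box in a 416x416 image, labelled as class 0.
print(to_yolo_line(0, (104, 52, 312, 260), (416, 416)))
# prints: 0 0.500000 0.375000 0.500000 0.500000
```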
SMMOD.ipynb
This notebook clones the publicly available yolov5 repository and reads
the custom dataset that we created with Roboflow. The number-of-classes
parameter in the predefined yolov5s.yaml file is edited to suit our needs.
The notebook then calls train.py to train the neural network for a
specified number of epochs and save the weights to the best.pt file. We
plot the performance metrics (mAP, precision, recall, loss) using
TensorBoard and finally run inference on a test dataset to verify the
results.
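
The two notebook steps that touch the configuration and start training can be sketched as below. This is an illustration rather than the notebook's actual code: the helper names, file names, image size, batch size and epoch count are all placeholder assumptions, and only commonly used train.py options are shown.

```python
import re


def set_num_classes(yaml_text, num_classes):
    """Rewrite the top-level `nc:` value, leaving any trailing comment in place."""
    new_text, count = re.subn(r"(?m)^nc:\s*\d+", f"nc: {num_classes}", yaml_text)
    if count != 1:
        raise ValueError("expected exactly one top-level 'nc:' line")
    return new_text


def build_train_command(data_yaml, cfg_yaml, epochs, image_size=416, batch_size=16):
    """Assemble the argument list for a train.py call (values are placeholders)."""
    return [
        "python", "train.py",
        "--img", str(image_size),
        "--batch-size", str(batch_size),
        "--epochs", str(epochs),
        "--data", data_yaml,
        "--cfg", cfg_yaml,
    ]


sample = "nc: 80  # number of classes\ndepth_multiple: 0.33\n"
print(set_num_classes(sample, 2))
print(" ".join(build_train_command("data.yaml", "custom_yolov5s.yaml", epochs=100)))
```

Building the command as a list keeps it easy to pass to `subprocess.run` or to print for a notebook shell cell.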
Train.py
This module is responsible for training our model. It initializes other
components such as the Model class and the Dataloader class, trains the
model for a specified number of epochs and calculates the performance
metrics. Finally, it stores the best weights in the best.pt file and
saves the model. It can also train the model on the GPU of the device.
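
The idea of keeping only the best weights can be sketched as follows. The scoring rule here, a weighted sum that favours mAP, is an assumption made for illustration; the weights and function names are hypothetical and the real train.py may score epochs differently.

```python
def fitness(precision, recall, map50, map50_95, weights=(0.0, 0.0, 0.1, 0.9)):
    """Collapse the per-epoch metrics into one score (assumed weighting)."""
    metrics = (precision, recall, map50, map50_95)
    return sum(w * m for w, m in zip(weights, metrics))


def best_epoch(history):
    """Return (index, score) of the epoch whose weights would be kept as best.pt."""
    best_index, best_score = -1, float("-inf")
    for index, metrics in enumerate(history):
        score = fitness(*metrics)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


# Made-up metrics for three epochs: (precision, recall, mAP@0.5, mAP@0.5:0.95).
history = [(0.5, 0.5, 0.60, 0.30), (0.7, 0.6, 0.80, 0.50), (0.9, 0.9, 0.79, 0.49)]
print(best_epoch(history)[0])
```

In this made-up history the second epoch wins even though the third has higher precision and recall, because the assumed weighting only looks at mAP.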
Yolov5s.yaml
This file deals with the model configurations. It defines the anchors.
Anchor boxes are a set of predefined bounding boxes of a certain height
and width. These boxes are defined to capture the scale and aspect ratio of
specific object classes you want to detect and are typically chosen based
on object sizes in your training datasets. Then comes the structure of the
backbone, which is a convolutional neural network that pools image pixels
to extract features.
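
To make the role of anchors concrete, the sketch below picks the anchor whose shape is closest to a labelled box. The anchor values are the defaults as we recall them from the stock yolov5s.yaml and should be treated as an assumption, and the shape-IoU matching is a simplification chosen for illustration, not necessarily the rule YOLOv5 itself uses.

```python
# Assumed default anchors as (width, height) in pixels, keyed by output stride.
ANCHORS = {
    8: [(10, 13), (16, 30), (33, 23)],
    16: [(30, 61), (62, 45), (59, 119)],
    32: [(116, 90), (156, 198), (373, 326)],
}


def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes that share the same centre, so only shape matters."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def best_anchor(box_wh):
    """Return (stride, anchor) for the anchor whose shape best fits the box."""
    candidates = [(stride, anchor) for stride, group in ANCHORS.items() for anchor in group]
    return max(candidates, key=lambda item: shape_iou(box_wh, item[1]))


print(best_anchor((60, 120)))  # a tall, medium-sized box
print(best_anchor((12, 14)))   # a small box
```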
Module LetterBox
This module resizes and pads the images while meeting stride-multiple
constraints, so that every frame has a shape the network can process. We
first get the current shape of the image and compute the scale ratio to
our desired shape. Then we scale the image down to fit, without
upscaling, for a better mAP score. Next, we compute the padding required
to surround the resized image. Finally, using OpenCV's copyMakeBorder()
method, we add a border of the specified colour around the image.
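
The arithmetic behind this resizing and padding can be sketched without OpenCV. The function below only computes the numbers (scale ratio, resized size and border widths); its name and default values are assumptions, and in the real module the image would then be resized and passed to copyMakeBorder() with these borders.

```python
def letterbox_geometry(shape, new_shape=(640, 640), stride=32, auto=True, scaleup=True):
    """Return (ratio, (new_w, new_h), (top, bottom, left, right)) for an (h, w) image."""
    height, width = shape
    # Scale so the image fits inside new_shape while keeping its aspect ratio.
    ratio = min(new_shape[0] / height, new_shape[1] / width)
    if not scaleup:
        ratio = min(ratio, 1.0)  # only shrink, never enlarge
    new_w, new_h = int(round(width * ratio)), int(round(height * ratio))
    pad_w, pad_h = new_shape[1] - new_w, new_shape[0] - new_h
    if auto:
        # Keep only the padding needed to reach the next multiple of the stride.
        pad_w, pad_h = pad_w % stride, pad_h % stride
    # Split the padding between the two sides of each axis.
    top, bottom = int(round(pad_h / 2 - 0.1)), int(round(pad_h / 2 + 0.1))
    left, right = int(round(pad_w / 2 - 0.1)), int(round(pad_w / 2 + 0.1))
    return ratio, (new_w, new_h), (top, bottom, left, right)


# A 1280x720 frame: resized to 640x360, then padded to 640x384 (a multiple of 32).
print(letterbox_geometry((720, 1280)))
# prints: (0.5, (640, 360), (12, 12, 0, 0))
```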
Detect.py
It initializes all the parameters like:
● Source of the video stream or image.
● Source of the weights file.
● Image size.
● Device: CPU / GPU.
● Confidence threshold.
It loads the model and opens the video stream. It then continuously reads
frames, preprocesses each frame using the letterbox module, runs
inference and draws boxes around the detected objects.
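
Two of the steps between inference and drawing can be sketched in plain Python: dropping predictions below the confidence threshold and mapping the remaining boxes from the letterboxed frame back to the original frame. The function name and tuple layout are assumptions for illustration, and non-maximum suppression, which a full pipeline also applies, is left out.

```python
def postprocess(detections, conf_threshold, ratio, pad):
    """Filter by confidence and undo the letterbox scaling and padding.

    detections: (x1, y1, x2, y2, confidence, class_id) in letterboxed pixels.
    ratio: scale factor applied by the letterbox step.
    pad: (left, top) border widths added by the letterbox step.
    """
    left, top = pad
    kept = []
    for x1, y1, x2, y2, confidence, class_id in detections:
        if confidence < conf_threshold:
            continue  # too uncertain to draw
        kept.append((
            (x1 - left) / ratio, (y1 - top) / ratio,
            (x2 - left) / ratio, (y2 - top) / ratio,
            confidence, class_id,
        ))
    return kept


# Made-up predictions on a frame that was halved and padded by 12 px at the top.
predictions = [(100, 62, 200, 162, 0.9, 0), (10, 20, 30, 40, 0.1, 1)]
print(postprocess(predictions, conf_threshold=0.4, ratio=0.5, pad=(0, 12)))
# prints: [(200.0, 100.0, 400.0, 300.0, 0.9, 0)]
```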
DATASET
Our dataset was created by our team members. We took pictures of people
covering four different cases: without a mask or ID card, with only an ID
card, with only a mask, and with both an ID card and a mask. Since our
model is intended for the VIT library, we photographed VIT students
wearing masks and VIT ID cards. The dataset comprises 324 pictures.
PERFORMANCE METRICS
EVALUATION METRICS
F1 Curve
Confusion Matrix
RESULTS
DISCUSSION
It was noticed that inference with our model ran smoothly for real-time
object detection on a CPU, whereas the multiple-model approach required a
GPU to compensate for the delay in the inference process.
There was also a significant difference in the memory used for running
inference: the single model (500 MB of RAM) used 200 MB less RAM than the
multiple-model approach (700 MB).
Also, our multi-task learning approach trains a single neural network to
detect all objects simultaneously, thereby reducing the cost function
(log loss) significantly compared with the other approaches.
For these reasons, the MTL approach is very beneficial for real-time
object detection where inference time and device memory are constraints.
In this way, we aim to solve a simple problem existing at VIT. Our project
confirms or rejects the entry of students depending on whether they are
wearing their ID cards and masks.
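
The entry decision itself reduces to checking that every required object was detected. A minimal sketch, assuming the detector reports class names `mask` and `id_card` (both names and the function are hypothetical):

```python
# Hypothetical class names; the report does not state the exact labels.
REQUIRED = frozenset({"mask", "id_card"})


def allow_entry(detected_labels, required=REQUIRED):
    """Confirm entry only if every required object appears among the detections."""
    return required.issubset(detected_labels)


print(allow_entry(["mask", "id_card"]))  # True: both items detected
print(allow_entry(["mask"]))             # False: ID card missing
```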
CONCLUSION
REFERENCES
[1] Tomar, Haider, Sagar (May 2022). A Study on Real Time Object Detection using Deep Learning.
International Journal of Engineering Research & Technology (IJERT) Volume 11, Issue 05
[2] Jaiswal, T., Pandey, M., & Tripathi, P. (2022, March). Real Time Multiple-Object Detection Based
On Enhanced SSD. In 2022 Second International Conference on Power, Control and Computing
Technologies (ICPC2T) (pp. 1-5). IEEE.
[3] Xu, R., Lin, H., Lu, K., Cao, L., & Liu, Y. (2021). A Forest Fire Detection System Based on
Ensemble Learning. Forests, 12, 217. doi: 10.3390/f12020217.
[4] Naik, U. P., Rajesh, V., & Kumar, R. (2021, September). Implementation of YOLOv4 algorithm for
multiple object detection in image and video dataset using deep learning and artificial intelligence for
urban traffic video surveillance application. In 2021 Fourth International Conference on Electrical,
Computer and Communication Technologies (ICECCT) (pp. 1-6). IEEE.
[5] Khattar, A., Hegde, S., & Hebbalaguppe, R. (2021). Cross-domain multi-task learning for object
detection and saliency estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (pp. 3639-3648).
[6] Cao, D., Chen, Z., & Gao, L. (2020). An improved object detection algorithm based on multi-scaled
and deformable convolutional neural networks. Human-centric Computing and Information Sciences,
10(1), 1-22.
[8] L. Jiao et al., "A Survey of Deep Learning-Based Object Detection," in IEEE Access, vol. 7, pp.
128837-128868, 2019, doi: 10.1109/ACCESS.2019.2939201.
[9] Zhao, Z.-Q., Zheng, P., Xu, S.-T., & Wu, X. (2019). Object Detection With Deep Learning: A
Review. IEEE Transactions on Neural Networks and Learning Systems, PP, 1-21.
doi: 10.1109/TNNLS.2018.2876865.
[10] S., Manjula & Krishnamurthy, Lakshmi & Ravichandran, Manjula. (2016). A Study On Object
Detection.