Lecture 7 Deep Learning in Object Detection 2025
CPCS432
Lecture 7
Deep Learning in
Object Detection
Computer Vision
Applying Deep Learning Algorithms
for Object Detection
CPCS432 Lecture 7
Dr. Arwa Basbrain
Object Detection
Ground Truth in Object Detection
1. Class Labels
The ground truth includes the object class label, such as "dog," "car," "person," etc.,
indicating the true identity of each object in the image.
2. Bounding Boxes
Ground truth bounding boxes specify the exact location and size of each object in the image.
A bounding box is typically represented by four values: the x and y coordinates of the top-left
corner, the width, and the height of the box, or alternatively, the coordinates of both the
top-left and bottom-right corners.
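The two box representations described above can be converted into each other. The following is a minimal sketch; the function names and the sample annotation are hypothetical and only serve as an illustration.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """Convert (x_min, y_min, x_max, y_max) to (x, y, width, height)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


# Hypothetical ground-truth annotation: a class label plus a top-left/width/height box.
annotation = {"label": "dog", "box_xywh": (40, 60, 120, 80)}
corners = xywh_to_xyxy(annotation["box_xywh"])
print(corners)  # (40, 60, 160, 140)
```

Datasets and libraries differ in which convention they use, so it is worth checking the format before comparing predictions with the ground truth.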
Ref -https://www.youtube.com/watch?v=QBVzZBsif20&ab_channel=DATAtab
CPCS432 Lecture 7 28/09/2023 15
How to Evaluate a Vision System
Ref -https://www.youtube.com/watch?v=QBVzZBsif20&ab_channel=DATAtab
Important components of Object Detection
Notice how the sliding box is moving across the entire image
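The sliding box can be sketched as a generator of window coordinates. This is only an illustration: the window size, the stride, and the function name are assumptions, and in a real detector a classifier would be applied to each window.

```python
def sliding_windows(image_w, image_h, win_w, win_h, stride):
    """Yield (x_min, y_min, x_max, y_max) for every window position.

    The window moves left to right, then top to bottom, in steps of `stride`
    pixels, and never extends past the image border.
    """
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Toy example: an 8x6 image scanned with a 4x4 window and a stride of 2.
windows = list(sliding_windows(8, 6, 4, 4, 2))
print(len(windows))  # 6
```

A smaller stride covers the image more densely but increases the number of windows that must be classified.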
If an object lies over multiple grids, then the grid that contains the midpoint of
that object is responsible for detecting that object.
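The midpoint rule can be sketched as follows. The function name, the image size, and the 3x3 grid are assumptions chosen for the example.

```python
def responsible_cell(box_xyxy, image_w, image_h, grid_size):
    """Return (row, col) of the grid cell that contains the box midpoint."""
    x_min, y_min, x_max, y_max = box_xyxy
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Clamp so that a midpoint exactly on the right or bottom border stays in range.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


# A box that spans all three grid columns of a 300x300 image; its midpoint
# (170, 160) falls in the centre cell, so that cell is responsible for it.
cell = responsible_cell((80, 120, 260, 200), 300, 300, 3)
print(cell)  # (1, 1)
```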
Non-max suppression
The same object can be detected in multiple grids. The grid with the highest probability will
be the final prediction for that object.
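A common way to implement this idea is greedy non-max suppression: keep the highest-scoring box and drop any remaining box that overlaps it too much. The sketch below assumes overlap is measured with intersection over union (IoU) and a threshold of 0.5; both are assumptions not stated in the slides, and the boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap strongly with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two overlapping predictions of the same object and one separate object.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.75, 0.8]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

In multi-class detection this procedure is usually run separately for each class.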
Anchor Boxes
Anchor boxes are used to capture the scale and aspect ratio of the objects we wish to detect. They
have predefined sizes (height and width), chosen to match the sizes of the objects we want to
detect.
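As an illustration, a set of anchor shapes can be generated from a few scales and aspect ratios. The base size, the scales, the ratios, and the function name below are assumed values for the sketch, not settings taken from the lecture.

```python
import math


def make_anchors(base_size, scales, aspect_ratios):
    """Return (width, height) pairs, one per scale and aspect ratio.

    The aspect ratio is width / height. For a given scale, every anchor has the
    same area, (base_size * scale) ** 2, and only its shape changes.
    """
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in aspect_ratios:
            width = math.sqrt(area * ratio)
            height = math.sqrt(area / ratio)
            anchors.append((width, height))
    return anchors


# Two scales and three aspect ratios give six anchor shapes.
anchors = make_anchors(base_size=16, scales=(1, 2), aspect_ratios=(0.5, 1.0, 2.0))
for width, height in anchors:
    print(f"{width:.1f} x {height:.1f}")
```

In practice the anchor sizes are often derived from the box sizes found in the training data.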
1. R-CNN: Regions with CNN features. It combines region proposals with a CNN.
3. Faster R-CNN: Object detection networks depend on region proposal algorithms to hypothesize
object locations.
4. Mask R-CNN: This network extends Faster R-CNN by adding the prediction of segmentation masks
on each region of interest.
5. YOLO: You Only Look Once architecture. It proposes a single neural network to predict bounding
boxes and class probabilities from an image in a single evaluation.
6. SSD: Single Shot MultiBox Detector. It presents a model to predict objects in images using a
single deep neural network.
Image reference : https://www.youtube.com/watch?v=nJzQDpppFj0&t=811s&ab_channel=SoroushMehraban
DL architectures in Object Detection
Region-based CNN (R-CNN)
Reference : https://www.youtube.com/watch?v=nJzQDpppFj0&t=811s&ab_channel=SoroushMehraban
Challenges with R-CNN
1. R-CNN implements three algorithms (a CNN for extracting the features, an SVM for classification,
and a bounding-box regressor), which makes R-CNN solutions quite slow to train.
2. It extracts CNN features for each of about 2000 regions per image, which makes it slower still.
4. The selective search algorithm is fixed, so not much improvement can be made to it.
Fast R-CNN
Reference : https://www.youtube.com/watch?v=5gAq6BZ87aA&ab_channel=SoroushMehraban
Regions Of Interest
• Softmax layers have better accuracy than SVM and have faster execution time.
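For reference, a softmax layer turns the raw class scores of a region into probabilities that sum to 1. The sketch below uses made-up scores and class names; it shows only the softmax computation itself, not the full Fast R-CNN classification head.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one region of interest.
classes = ["background", "car", "person"]
probs = softmax([0.1, 2.0, 1.0])
best = max(range(len(classes)), key=lambda i: probs[i])
print(classes[best])  # car
```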
Faster R-CNN
• YOLO was proposed in 2016 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
• You Only Look Once or YOLO is targeted for real-time object detection.
• In YOLO, a single CNN predicts both the bounding boxes and the respective class probabilities.
You Only Look Once (YOLO)
Challenges with YOLO
YOLO suffers from high localization error. Moreover, since each grid cell predicts only two
boxes and can output only one class, YOLO can predict only a limited number of nearby
objects. It also suffers from low recall. These issues were addressed in the later versions,
YOLOv2 and YOLOv3.
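The limit on nearby objects follows directly from the shape of the network output. The sketch below assumes the configuration of the original YOLO paper (a 7x7 grid, 2 boxes per cell, 20 classes); these numbers and the function names are assumptions for illustration and do not appear in the slides.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the prediction tensor for one image.

    Each box contributes 5 values (x, y, width, height, confidence), and each
    cell contributes a single set of class probabilities shared by its boxes.
    """
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def max_detectable_objects(grid_size):
    """With one class per cell, each cell can report at most one object."""
    return grid_size * grid_size


print(yolo_output_shape(7, 2, 20))   # (7, 7, 30)
print(max_detectable_objects(7))     # 49
```

Two small objects whose midpoints fall in the same cell therefore compete for a single prediction, which is one source of the low recall mentioned above.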