Lecture 7 Deep Learning in Object Detection 2025

The document discusses deep learning techniques in object detection, highlighting the dual tasks of classification and localization. It covers evaluation metrics such as Intersection over Union (IoU) and Mean Average Precision (mAP), along with various deep learning architectures like R-CNN, Fast R-CNN, and YOLO. Key components such as sliding window approaches, bounding boxes, and anchor boxes are also explained, emphasizing their importance in improving detection accuracy and performance.


CPCS432 Lecture 7
Deep Learning in Object Detection
Computer Vision: Applying Deep Learning Algorithms for Object Detection

Dr. Arwa Basbrain
Object Detection

Object detection involves two distinct sets of activities: locating objects and classifying objects.

Locating objects within the image is called object localization, which is typically performed by drawing bounding boxes around the objects.
Object Detection

Introduction to Object Detection in Computer Vision

1. Object detection is a computer vision technique that identifies and locates objects in images or videos.
2. It performs two tasks simultaneously:
   • Classification: identifying the type of object (e.g., person, car, dog).
   • Localization: determining the object's location in the form of bounding boxes or masks.
3. Object detection is critical for applications like autonomous driving, surveillance, and medical imaging.
Ground Truth in Object Detection

1. Class Labels
The ground truth includes the object class label, such as "dog," "car," or "person," indicating the true identity of each object in the image.

2. Bounding Boxes
Ground truth bounding boxes specify the exact location and size of each object in the image. A bounding box is typically represented by four values: the x and y coordinates of the top-left corner plus the width and height of the box, or alternatively the coordinates of both the top-left and bottom-right corners.
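As a minimal illustration of these two box formats (the numbers below are made up, not from the lecture), a small Python sketch converting between them:

```python
# Minimal sketch: converting between the two common ground-truth box formats.

def xywh_to_corners(x, y, w, h):
    """(top-left x, top-left y, width, height) -> (x_min, y_min, x_max, y_max)."""
    return x, y, x + w, y + h

def corners_to_xywh(x_min, y_min, x_max, y_max):
    """(x_min, y_min, x_max, y_max) -> (top-left x, top-left y, width, height)."""
    return x_min, y_min, x_max - x_min, y_max - y_min

# Example ground-truth annotation for one image (class label + box, illustrative values):
annotation = {"label": "dog", "box_xywh": (48, 60, 120, 90)}
print(xywh_to_corners(*annotation["box_xywh"]))   # (48, 60, 168, 150)
```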


Evaluation Metrics in Object Detection

1. Intersection over Union (IoU), also called the Jaccard Index: measures the overlap between the predicted and ground-truth bounding boxes.

IoU = area of overlap / area of union

IoU ranges from 0 (no overlap, a bad prediction) to 1 (perfect overlap, the best prediction).
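A minimal Python sketch of the IoU computation for two boxes given in (x_min, y_min, x_max, y_max) form; the example boxes are illustrative, not from the slides:

```python
# Minimal sketch: IoU between two boxes (x_min, y_min, x_max, y_max).

def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)            # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143
```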


Evaluation Metrics in Object Detection

2. Mean Average Precision (mAP): calculates the average precision across all classes and IoU thresholds, providing an overall measure of model accuracy.

Average Precision (AP)
The average precision (AP) is a way to summarize the precision-recall curve into a single value representing the average of all precisions. The AP is calculated according to the following equation. Using a loop that goes through all precision/recall pairs, the difference between consecutive recall values is calculated and then multiplied by the corresponding precision. In other words, the AP is the weighted sum of precisions at each threshold, where the weight is the increase in recall:

AP = Σ_n (R_n − R_(n−1)) · P_n
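A minimal sketch of the loop described above, assuming the precision/recall pairs are already ordered by decreasing confidence threshold; the example values are illustrative, not from the lecture:

```python
# Minimal sketch: AP as the weighted sum of precisions, where each weight is
# the increase in recall from the previous threshold.

def average_precision(precisions, recalls):
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p   # weight = increase in recall
        prev_recall = r
    return ap

precisions = [1.0, 0.67, 0.75, 0.6]     # illustrative values
recalls    = [0.25, 0.25, 0.5, 0.5]
print(average_precision(precisions, recalls))   # 0.25*1.0 + 0.25*0.75 = 0.4375
```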
Evaluation Metrics in Object Detection

3. Real-Time Performance Metrics: the balance between accuracy and detection speed (e.g., frames per second, FPS).
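A minimal sketch of how FPS might be measured; the detector object and its detect() method are hypothetical placeholders, not an API from the lecture:

```python
# Minimal sketch: measuring detection speed in frames per second (FPS)
# for any detector that exposes a detect() call.
import time

def measure_fps(detector, frames):
    start = time.perf_counter()
    for frame in frames:
        detector.detect(frame)          # hypothetical detect() method
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed        # frames processed per second

# fps = measure_fps(my_detector, video_frames)
# print(f"{fps:.1f} FPS")
```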


How to Evaluate a Vision System

The results of an ML algorithm are counted as:
• True Positive (TP): the number of instances correctly predicted as positive. (4)
• True Negative (TN): the number of instances correctly predicted as negative. (3)
• False Positive (FP): the number of instances incorrectly predicted as positive. (2)
• False Negative (FN): the number of instances incorrectly predicted as negative. (1)

In this example, positives correspond to having the disease and negatives to not having the disease.

Ref: https://www.youtube.com/watch?v=QBVzZBsif20&ab_channel=DATAtab
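Using the counts from the slide (TP=4, TN=3, FP=2, FN=1), a minimal sketch of the standard metrics derived from them; the F1 score is not on the slide but follows directly from precision and recall:

```python
# Minimal sketch: accuracy, precision, recall, and F1 from the slide's counts.
TP, TN, FP, FN = 4, 3, 2, 1

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # 7/10 = 0.70
precision = TP / (TP + FP)                    # 4/6  ≈ 0.67
recall    = TP / (TP + FN)                    # 4/5  = 0.80
f1        = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```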
Important components of Object Detection

• Sliding window approach for Object Detection


• Bounding box approach
• Non-max suppression
• Anchor boxes concept



Important components of Object Detection

Sliding window approach for Object Detection

A window of fixed size slides across the entire image, and each window position is checked for the presence of an object. Notice how the sliding box moves across the entire image.


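A minimal sketch (not from the slides) of the sliding window idea: generating fixed-size crops at a given stride, each of which a classifier would then score. The window size and stride here are assumptions:

```python
# Minimal sketch: yield sliding-window crops over an image array.
import numpy as np

def sliding_windows(image, window_size=(64, 64), stride=32):
    win_h, win_w = window_size
    h, w = image.shape[:2]
    for y in range(0, h - win_h + 1, stride):
        for x in range(0, w - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]   # top-left corner + crop

image = np.zeros((256, 256, 3), dtype=np.uint8)           # dummy image
print(sum(1 for _ in sliding_windows(image)))             # 7 * 7 = 49 windows
```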
Important components of Object Detection

Bounding box approach for Object Detection

The image is divided into an x-by-x grid, and a prediction vector is produced for each grid cell.

A bounding box can give us the following details:
Pc: probability of having an object (0: no object, 1: an object)
Bx: the x coordinate
By: the y coordinate
Bh: the height
Bw: the width
C1: Class 1
C2: Class 2

If an object lies over multiple grid cells, the grid cell that contains the midpoint of that object is responsible for detecting it.
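A minimal sketch of how such a target vector could be built for one object, assuming a 3x3 grid, a 300x300 image, and the slide's two-class layout [Pc, Bx, By, Bh, Bw, C1, C2]; the box values are made up:

```python
# Minimal sketch: the grid cell containing the box midpoint is made
# responsible for the object (assumed grid and image size).
import numpy as np

def encode_target(box, label, image_size=300, grid=3, num_classes=2):
    x_min, y_min, w, h = box
    cx, cy = x_min + w / 2, y_min + h / 2          # box midpoint
    cell = image_size / grid
    col, row = int(cx // cell), int(cy // cell)    # responsible grid cell

    target = np.zeros((grid, grid, 5 + num_classes))
    target[row, col, 0] = 1.0                      # Pc: an object is present
    target[row, col, 1:5] = [cx, cy, h, w]         # Bx, By, Bh, Bw
    target[row, col, 5 + label] = 1.0              # one-hot class (C1, C2)
    return target

y = encode_target(box=(120, 40, 60, 80), label=0)
print(y[0, 1])   # cell (row 0, col 1) holds [1, 150, 80, 80, 60, 1, 0]
```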


Important components of Object Detection

Non-max suppression

The same object may be detected in multiple grid cells, producing several overlapping boxes. The prediction with the highest probability is kept as the final prediction for that object, and overlapping lower-probability predictions are suppressed.
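A minimal sketch (not from the slides) of non-max suppression over scored boxes, using the same IoU definition sketched earlier; the boxes and scores below are illustrative:

```python
# Minimal sketch: keep the highest-scoring box, drop boxes that overlap it
# above an IoU threshold, and repeat with the remaining boxes.

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep   # indices of the surviving boxes

boxes  = [(10, 10, 60, 60), (12, 12, 62, 62), (100, 100, 150, 150)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # [0, 2]: box 1 is suppressed
```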


Important components of Object Detection

Anchor Boxes

Anchor boxes are used to capture the scale and aspect ratio of the objects we wish to detect. They have predefined sizes (height and width), chosen based on the sizes of the objects we want to detect.

During detection, each anchor box is tiled across the image, and the neural network outputs a unique set of predictions for each anchor box.

The output consists of:
1. the probability score,
2. the IoU,
3. the background class,
4. the offset for each anchor box.
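A minimal sketch of generating anchor boxes at one location; the scales and aspect ratios are assumptions, not values from the lecture:

```python
# Minimal sketch: anchors of several scales and aspect ratios centred
# on a single feature-map location.

def anchors_at(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * (r ** 0.5)          # width grows with sqrt(aspect ratio)
            h = s / (r ** 0.5)          # height shrinks accordingly, area ≈ s²
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

print(len(anchors_at(100, 100)))   # 9 anchors per location (3 scales x 3 ratios)
```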


Deep Learning frameworks for Object Detection

1. R-CNN: Regions with CNN features. It combines region proposals with a CNN.

2. Fast R-CNN: a fast region-based convolutional neural network.

3. Faster R-CNN: introduces a Region Proposal Network that shares features with the detection network to hypothesize object locations.

4. Mask R-CNN: extends Faster R-CNN by adding the prediction of segmentation masks on each region of interest.

5. YOLO: You Only Look Once. It proposes a single neural network that predicts bounding boxes and class probabilities from an image in a single evaluation.

6. SSD: Single Shot MultiBox Detector. It presents a model that predicts objects in images using a single deep neural network.


DL architectures in Object Detection

Deep learning architectures detect objects of interest in an image, in a video, or even in a live video stream, determining both the class labels and the bounding boxes.

Image reference: https://www.youtube.com/watch?v=nJzQDpppFj0&t=811s&ab_channel=SoroushMehraban
DL architectures in Object Detection

Region-based CNN (R-CNN)

Image reference: https://www.youtube.com/watch?v=nJzQDpppFj0&t=811s&ab_channel=SoroushMehraban
DL architectures in Object Detection

Region-based CNN (R-CNN)

R-CNN pipeline: roughly 2,000 region proposals are extracted from the image, each region is reshaped to a fixed size, and a CNN extracts features from every region.

Image reference: https://www.youtube.com/watch?v=nJzQDpppFj0&t=811s&ab_channel=SoroushMehraban
DL architectures in Object Detection

R-CNN

Challenges with R-CNN
1. R-CNN chains several separate algorithms (selective search for region proposals, a CNN for feature extraction, and SVMs for classification), which makes R-CNN solutions quite slow to train.
2. It extracts CNN features for about 2,000 regions per image, which makes it slow.
3. It takes 40-50 seconds to make a prediction for a single image.
4. The selective search algorithm is fixed, so not much improvement can be made to it.
DL architectures in Object Detection

Fast R-CNN

Reference: https://www.youtube.com/watch?v=5gAq6BZ87aA&ab_channel=SoroushMehraban
DL architectures in Object Detection

Fast R-CNN

(Figure: Regions of Interest.)

Reference: https://www.youtube.com/watch?v=5gAq6BZ87aA&ab_channel=SoroushMehraban


DL architectures in Object Detection

Fast R-CNN

Fast R-CNN has a few advantages over R-CNN:

• Fast R-CNN does not require feeding 2,000 regions through the CNN separately.
• It uses only one convolution operation per image.
• There is no need to store feature maps on disk, which saves disk space.
• Softmax layers give better accuracy than SVMs and have faster execution time.
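A minimal sketch of the RoI pooling step that makes the single shared convolution possible, using torchvision.ops.roi_pool; the feature-map shape, boxes, and spatial_scale below are assumptions for illustration:

```python
# Minimal sketch: Fast R-CNN's RoI pooling crops each region of interest from
# the shared feature map and resizes it to a fixed size for the detection head.
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 50, 50)          # one image, 256 channels
rois = [torch.tensor([[0.0, 0.0, 200.0, 200.0],    # boxes in image coordinates
                      [100.0, 120.0, 300.0, 380.0]])]

# spatial_scale maps image coordinates to feature-map coordinates (e.g. 50/400 = 0.125).
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=0.125)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]): one fixed-size tensor per RoI
```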


DL architectures in Object Detection
Faster R-CNN



DL architectures in Object Detection

Faster R-CNN

Region Proposal Networks (RPNs)
DL architectures in Object Detection

Faster R-CNN

(Figure: Fast R-CNN compared with Faster R-CNN.)

Reference: https://www.youtube.com/watch?v=5gAq6BZ87aA&ab_channel=SoroushMehraban


DL architectures in Object Detection

Faster R-CNN

The way an RPN works:

1. The RPN takes the feature maps produced by the backbone CNN.
2. The RPN applies a sliding window and generates k anchor boxes with different shapes and sizes at each location.
3. The RPN predicts whether each anchor contains an object or not.
4. It also outputs bounding-box regression offsets to adjust the anchors.

Note that the RPN does not suggest the class of the object.
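For experimentation, a minimal inference sketch with torchvision's pretrained Faster R-CNN, which contains a built-in RPN; the input image here is random, and the weights argument may differ across torchvision versions:

```python
# Minimal sketch: running a pretrained Faster R-CNN (ResNet-50 FPN backbone)
# from torchvision on one image.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")   # COCO weights; older versions use pretrained=True
model.eval()

image = torch.rand(3, 480, 640)                      # dummy RGB image in [0, 1]
with torch.no_grad():
    output = model([image])[0]                       # one dict per input image

# 'boxes' (N x 4), 'labels' (N), and 'scores' (N), sorted by confidence.
print(output["boxes"].shape, output["labels"][:5], output["scores"][:5])
```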
DL architectures in Object Detection

You Only Look Once (YOLO)

• YOLO was proposed in 2016 by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
• You Only Look Once (YOLO) is targeted at real-time object detection.
• In YOLO, a single CNN predicts both the bounding boxes and the respective class probabilities.
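A minimal sketch of the single output tensor this implies, using the grid size, boxes per cell, and class count from the original YOLOv1 paper (S=7, B=2, C=20):

```python
# Minimal sketch: YOLOv1 produces one S x S x (B*5 + C) tensor in a single
# forward pass, holding every box and class prediction at once.
S, B, C = 7, 2, 20
output_shape = (S, S, B * 5 + C)   # 5 = (x, y, w, h, confidence) per box
print(output_shape)                # (7, 7, 30)
```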


DL architectures in Object Detection

You Only Look Once (YOLO)

Loss function in YOLO
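The loss equation on this slide did not survive text extraction; for reference, a sketch of the sum-squared-error loss from the original YOLOv1 paper, with λ_coord = 5 and λ_noobj = 0.5, combining localization, confidence, and classification terms:

$$
\begin{aligned}
\mathcal{L} = {} & \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \Big[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2\Big] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \Big[\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2 + \big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \big(C_i-\hat{C}_i\big)^2
  + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \big(C_i-\hat{C}_i\big)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c\,\in\,\text{classes}} \big(p_i(c)-\hat{p}_i(c)\big)^2
\end{aligned}
$$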


DL architectures in Object Detection

You Only Look Once (YOLO)

(Figure: the YOLO network architecture, with convolutional layers followed by fully connected layers that produce the output tensor.)


DL architectures in Object Detection

You Only Look Once (YOLO)

Challenges with YOLO
YOLO suffers from high localization error. Moreover, since each grid cell predicts only two boxes and can output only one class, YOLO can detect only a limited number of nearby objects. It also suffers from low recall. These issues were addressed in the next versions, YOLOv2 and YOLOv3.


DL architectures in Object Detection

Single Shot MultiBox Detector (SSD)


DL architectures in Object Detection

Single Shot MultiBox Detector (SSD)

SSD uses the VGG16 architecture discussed previously, but with a few modifications. With an SSD, only a single shot is required to detect multiple objects in an image. It is called "single shot" because it uses a single forward pass for both object localization and classification. Region-proposal-based solutions such as R-CNN and Fast R-CNN need two shots: a first one to generate the region proposals and a second to detect the object in each proposal. Hence SSD proves to be much faster than region-proposal-based approaches. Szegedy et al. called it MultiBox, and the significance of the word "detector" is obvious. Let's explore the multibox detector concept further.
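For experimentation, a minimal inference sketch with torchvision's pretrained SSD300 with a VGG16 backbone, matching the architecture described above; the input image is random, and the weights argument may differ across torchvision versions:

```python
# Minimal sketch: a pretrained SSD300 with a VGG16 backbone from torchvision,
# detecting objects in a single forward pass.
import torch
from torchvision.models.detection import ssd300_vgg16

model = ssd300_vgg16(weights="DEFAULT")   # COCO weights; older versions use pretrained=True
model.eval()

image = torch.rand(3, 300, 300)           # dummy RGB image in [0, 1]
with torch.no_grad():
    detections = model([image])[0]        # single shot: boxes, labels, scores

print(detections["boxes"].shape, detections["scores"][:3])
```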


DL architectures in Object Detection
Single Shot MultiBox Detector (SSD)

