


Object Detection and Tracking
Using OpenCV in Python
Master of Science (Data Science & Analytics) Batch 2018 – 2020
Minor Project Presentation

By
Sidra Mehtab
(Reg. No: 182341810028)
Under the Supervision of Prof. Jaydip Sen

NSHM Knowledge Campus, Kolkata, INDIA


Affiliated to
Maulana Abul Kalam Azad University of Technology, Kolkata, INDIA
Objective of the Work
• The primary objective of this work is to develop a
framework for object detection and tracking in a video
using various approaches.
• The framework developed is also extended to the detection of
other features of a human face, such as the eyes and the nose.
• In addition, a very efficient and effective edge detection
algorithm – Canny’s Edge Detector – is also implemented.
Outline
• Frame differencing
• Colorspaces
• Background separation
• Optical flow
• Object detection and tracking using Haar cascade classifiers
• Face detection
• Eyes detection
• Nose detection
• Canny’s edge detection algorithm
• Conclusion
• References
Frame Differencing
• Frame differencing is one of the simplest techniques that can be
used to identify moving parts in a video.
• When we look at a live video stream, the differences between
consecutive frames captured from the stream give us a lot of useful
information.
• In frame differencing, the computer computes the difference between
two video frames. If pixels have changed, something in the scene has
apparently changed (for example, moved).
• Most techniques apply some “blurring” and “thresholding” to
distinguish real movement from noise, because frames can also differ
when the lighting conditions in the room change (a thresholded variant
is sketched after the code below).
• Using OpenCV, we define a function that computes the difference
between the current frame and the next frame, and the difference
between the current frame and the previous frame. It returns the
bitwise AND of the two difference frames.
def frame_diff(prev_frame, cur_frame, next_frame):
    # Difference between the current frame and the next frame
    diff_frames_1 = cv2.absdiff(next_frame, cur_frame)
    # Difference between the current frame and the previous frame
    diff_frames_2 = cv2.absdiff(cur_frame, prev_frame)
    return cv2.bitwise_and(diff_frames_1, diff_frames_2)
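The frame_diff() function above returns the raw difference image. A minimal sketch of the blurring and thresholding step mentioned earlier is shown below; the Gaussian kernel size and the threshold value of 25 are illustrative assumptions, not values taken from the original project code.

def frame_diff_thresholded(prev_frame, cur_frame, next_frame, thresh_val=25):
    # Raw motion image from the two absolute differences (as in frame_diff above)
    diff = cv2.bitwise_and(cv2.absdiff(next_frame, cur_frame),
                           cv2.absdiff(cur_frame, prev_frame))
    # Blur to suppress sensor noise, then binarize: pixels above thresh_val
    # are treated as real movement (thresh_val=25 is an illustrative choice)
    blurred = cv2.GaussianBlur(diff, (5, 5), 0)
    _, mask = cv2.threshold(blurred, thresh_val, 255, cv2.THRESH_BINARY)
    return mask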
Frame Differencing (contd…)
• We, then, define a function to grab the current frame from a
given input video and start reading it from the video “capture”
object. We resize the frame based on the “scaling factor”. Finally,
we convert the image into “grayscale” and return it.
def get_frame(cap, scaling_factor):
    _, frame = cap.read()
    frame = cv2.resize(frame, None, fx=scaling_factor,
                       fy=scaling_factor, interpolation=cv2.INTER_AREA)
    # Frames read by OpenCV are in BGR order, so convert BGR to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return gray
• Now, we define the main function and initialize the video capture object.
• Next, we define the scaling factor used to resize the frames; we used a
“scaling factor” of 0.8.
• We grab the current frame, the next frame, and the frame after that.
• Finally, we use an infinite “while loop” that keeps iterating until the
user presses the ESC key.
• The output video (with frame differencing) is saved using a free video
capture tool – Free Cam – which stores videos in .wmv format.
Frame Differencing (contd…)
The structure of the main function is as follows:
if __name__=='__main__':
    cap = cv2.VideoCapture('bihag_nazrul_geeti.mp4')
    scaling_factor = 0.8

    # Grab the current frame
    prev_frame = get_frame(cap, scaling_factor)
    # Grab the next frame
    cur_frame = get_frame(cap, scaling_factor)
    # Grab the frame after that
    next_frame = get_frame(cap, scaling_factor)

    # Iterate indefinitely until the user presses the Esc key
    while True:
        # Display the frame difference
        cv2.imshow('Object Movement',
                   frame_diff(prev_frame, cur_frame, next_frame))

        # Update the frame variables
        prev_frame = cur_frame
        cur_frame = next_frame
        # Grab the next frame from the video
        next_frame = get_frame(cap, scaling_factor)

        # Check if the user pressed the Esc key. If so, exit the loop
        key = cv2.waitKey(10)
        if key == 27:
            break

    # Once you exit the loop, make sure that all the windows are closed properly
    cv2.destroyAllWindows()
Results : Frame Differencing

Link of the Video

The frame of the original video

Link of the Video

Output of the frame differencing program


Tracking Objects Using Colorspace
• The information obtained by frame differencing is useful, but it is
not possible to build a robust tracker using that approach. The
method is also very sensitive to noise and it does not really track
an object completely.
• To build a robust object tracker, we need to know what
characteristics of the object can be used to track it accurately.
• This is where color spaces become relevant.
• RGB color space does not lend itself nicely to object tracking.
HSV (Hue, Saturation, and Value) color space is closer to how
humans perceive colors, and hence it is used for object tracking.
• We convert a captured frame from RGB to HSV colorspace and
then use color thresholding to track any given object.
• One needs to know the “color distribution” of the object so that
one can select the appropriate ranges for thresholding.
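To select the thresholding range, one simple approach (an illustrative sketch, not part of the original project code; the sample BGR value is an assumption) is to convert a sample color of the target object to HSV and build a range around its hue:

import cv2
import numpy as np

# A sample BGR color picked from the object to be tracked (illustrative value)
sample_bgr = np.uint8([[[45, 80, 200]]])          # one pixel, BGR order
sample_hsv = cv2.cvtColor(sample_bgr, cv2.COLOR_BGR2HSV)
h, s, v = sample_hsv[0, 0]

# Rule of thumb: roughly +/-10 around the hue, with generous saturation
# and value ranges, refined later by inspecting the resulting mask
lower = np.array([max(int(h) - 10, 0), 30, 50])
upper = np.array([min(int(h) + 10, 179), 255, 255])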
Tracking Objects Using Colorspace (contd…)
• We, first, define a function to grab the current frame from an input mp4
video file and start reading it from the video “capture” object.
def get_frame(cap, scaling_factor):
_, frame = cap.read()
frame = cv2.resize(frame, None, fx=scaling_factor,
fy=scaling_factor, interpolation=cv2.INTER_AREA)
return frame

• We now define the main function and start by initializing the video
“capture” object.
• Then, we define the actual value of the “scaling factor” to be used for
resizing the captured frames from the video.
• We, then, use a “while loop” to iterate indefinitely until the user
interrupts the program by hitting the ESC key.
• We convert each frame grabbed by the capturing unit to HSV color
space using the inbuilt library function in OpenCV. Then, we define the
approximate HSV color region for the color of human skin.
• We use the HSV threshold values to create the “mask”.
Tracking Objects Using Colorspace (contd…)
• Next, we compute the bitwise-AND between the pixels in the “mask” and
those in the original image.
• We apply median blurring to smooth the image.
• Finally, we display the input and output frames of the image side by side.
if __name__=='__main__':
    cap = cv2.VideoCapture('bihag_flute.mp4')
    scaling_factor = 0.5

    while True:
        frame = get_frame(cap, scaling_factor)
        # Convert the frame from BGR to HSV color space
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Approximate HSV range for human skin color
        lower = np.array([0, 30, 50])
        upper = np.array([30, 150, 255])
        # Build the mask, apply it to the frame, and smooth with a median blur
        mask = cv2.inRange(hsv, lower, upper)
        img_bitwise_and = cv2.bitwise_and(frame, frame, mask=mask)
        img_median_blurred = cv2.medianBlur(img_bitwise_and, 5)
        cv2.imshow('Input', frame)
        cv2.imshow('Output', img_median_blurred)
        c = cv2.waitKey(5)
        if c == 27:
            break

    cv2.destroyAllWindows()
Results : Tracking by Colorspace

Link of the Video

The frame of the original video

Link of the Video

Output of the colorspace program


Object Tracking Using Background Separation
• Background subtraction is a technique that models the background in
a given video and then uses that model to detect moving objects.
• This technique is used a lot in “video compression” as well as “video
surveillance”.
• It performs really well where we have to detect moving objects within a
static scene.
• The algorithm basically works by detecting the background, building a
model for it, and then subtracting it from the current frame to obtain the
foreground. This foreground corresponds to moving objects.
• Unlike the frame differencing approach, the algorithm here is not
differencing successive frames. Rather, it is actually modeling the
background and updating it in real-time.
• This approach makes the algorithm adaptive, allowing it to adjust to a
moving baseline.
• Due to its adaptive capability, the background separation algorithm
performs much better than the frame differencing algorithm.
Object Tracking Using Background Separation
(contd…)
• First, we define a function to grab the current frame. It resizes the frame
and returns it.
• The main function initializes the video capture object.
• Then, the background subtractor object is defined.
• The history and the learning rates are defined. The “history” refers to the
number of previous frames to use for learning. The higher value for
“history” indicates a slower learning rate.
• The value of the “learning rate” is computed as the reciprocal of the
value of the “history”.
• A “while loop” iterates indefinitely until the user interrupts the program
or the input video file stops running. The user interrupt is done by
pressing the ESC key.
• Inside the “while loop”, the program keeps reading frames from the
input video, grabbing the current frame and resizing it.
• A “mask” is computed using the background subtractor object provided
by the cv2 library. The mask is then converted from “grayscale” to a
three-channel (BGR) image.
• Both the input and output frames are displayed simultaneously.
Object Tracking Using Background Separation
(contd…)
The program structure is as follows:
def get_frame(cap, scaling_factor):
    _, frame = cap.read()
    frame = cv2.resize(frame, None, fx=scaling_factor,
                       fy=scaling_factor, interpolation=cv2.INTER_AREA)
    return frame

if __name__=='__main__':
    cap = cv2.VideoCapture('bihag_flute.mp4')

    # Create the background subtractor object
    bg_subtractor = cv2.createBackgroundSubtractorMOG2()

    # Number of previous frames used for learning; the learning rate is its reciprocal
    history = 75
    learning_rate = 1.0/history

    while True:
        frame = get_frame(cap, 0.5)
        # Compute the foreground mask and convert it to a three-channel image
        mask = bg_subtractor.apply(frame, learningRate=learning_rate)
        mask = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
        cv2.imshow('Input', frame)
        cv2.imshow('Output', mask & frame)
        c = cv2.waitKey(10)
        if c == 27:
            break

    cap.release()
    cv2.destroyAllWindows()
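Note that the “history” value can also be passed directly to the subtractor when it is created; a hedged one-line variant (not part of the original code shown above) is:

# Pass the number of frames to learn from directly to the MOG2 constructor
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=history)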
Results : Tracking by Background Separation

Link of the Video

The frame of the original video

Link of the Video

Output of the background separation program


Optical Flow-Based Tracking
• Optical flow is a technique used in computer vision that uses image
feature points to track an object across successive frames in a live
video.
• When we detect a set of “feature points” in a given frame, we compute
displacement vectors to keep track of them. The motion of these
“feature points” between successive frames is then displayed; these
displacement vectors are known as motion vectors.
• Lucas-Kanade’s method is the most popular method of optical flow-
based tracking.
• The steps of the method are as follows:
• We extract the feature points from the current frame. For each
feature point that is extracted, a 3*3 patch (of pixels) is created with
the feature point at the center. We assume that all the points in each
patch have similar motion. The size of the window depends on the
problem at hand.
Optical Flow-Based Tracking (contd…)
• For each patch, we look for a match in its neighborhood in the
previous frame.
• We pick the best match based on an “error metric”. The search area,
however, is bigger than 3*3 because we compare a number of different
3*3 patches to find the one that is closest to the current patch.
• Once the closest patch is found, the path from the center point of the
matched patch in the previous frame to the center point of the current
patch becomes the motion vector.
• We similarly compute the motion vectors for all the other patches.
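Before describing the implementation, a minimal two-frame sketch of this idea using OpenCV's pyramidal Lucas-Kanade routine is given below; the frame file names and parameter values are illustrative assumptions, not taken from the project code.

import cv2
import numpy as np

# Two consecutive grayscale frames (illustrative file names)
prev_gray = cv2.imread('frame_0.png', cv2.IMREAD_GRAYSCALE)
curr_gray = cv2.imread('frame_1.png', cv2.IMREAD_GRAYSCALE)

# Pick corner-like feature points in the first frame (assuming corners are found)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.3, minDistance=7)

# Track the points into the second frame; (p1 - p0) gives the motion vectors
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None,
                                           winSize=(11, 11), maxLevel=2)
motion_vectors = (p1 - p0).reshape(-1, 2)[status.ravel() == 1]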
• We, now, discuss the salient features of the implementation of this
algorithm:
• We, first, define a function to start tracking an object using optical
flow by initializing the video “capture” object and scaling factor.
• The number of frames to track and the number of frames to skip
are then decided.
• Variables related to tracking paths and frame index are initialized
Optical Flow-Based Tracking (contd…)
• The tracking parameters like the window size, maximum level, and
the termination criteria are defined.
• A “while loop” is constructed that iterates indefinitely until the user
presses the ESC key, or the input video finishes playing.
• In the “while loop”, the current frame from the input video is captured.
• The captured frame is resized.
• The captured frame is converted from color to grayscale.
• A copy of the frame is created.
• It is checked whether the length of the tracking path is greater than
zero or not.
• The “feature points” are organized on the tracking path.
• The optical flow is computed based on the previous and current
images by using the feature points and the “tracking parameters”. For
this purpose, the forward optical flow, reverse optical flow, and the
difference between the forward and the reverse optical flow is
computed.
Optical Flow-Based Tracking (contd…)
• The good feature points are extracted.
• The variables for the new tracking paths are initialized.
• Circles are drawn around the feature points by iterating through all
of them (i.e., the “good feature points”).
• The X and Y coordinates of the “good feature points” are appended,
and it is ensured that the number of frames to be tracked is never
exceeded.
• Circles are drawn around the “good feature points”, the tracking
paths are updated, and lines are drawn using the new tracking paths
to show the movement.
• An “if block” is set up that is executed after skipping the specified
number of frames. Inside the “if block”, a “mask” is constructed with all
pixel values set to 255, and circles are drawn on the mask over the last
point of every tracking path, so that those regions are excluded when
new feature points are searched for.
Optical Flow-Based Tracking (contd…)
• The “good features” to track are computed using the in-built function
along with parameters like mask, maximum corners, quality level,
minimum distance, and the block size.
• If the feature points exist, they are appended to the “tracking paths”.
• The variables related to the frame index and the previous grayscale
image are updated.
• The output frame (and hence the output video) is displayed.
• If the user presses the ESC button, or if the input video stops, the
“while loop” is terminated and the program stops.
Optical Flow-Based Tracking (contd…)
def start_tracking():
    cap = cv2.VideoCapture('bihag_flute.mp4')
    scaling_factor = 0.5

    # Number of frames to keep in each tracking path, and how often to
    # search for new feature points
    num_frames_to_track = 5
    num_frames_jump = 2

    # Initialize variables related to tracking paths and frame index
    tracking_paths = []
    frame_index = 0

    # Define the tracking parameters: window size, max pyramid level, and
    # the termination criteria
    tracking_params = dict(winSize=(11, 11), maxLevel=2,
                           criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

    # Iterate indefinitely until the user presses the Esc key
    while True:
        # Capture the current frame
        _, frame = cap.read()
        # Resize the frame
        frame = cv2.resize(frame, None, fx=scaling_factor, fy=scaling_factor,
                           interpolation=cv2.INTER_AREA)
        # Convert the frame from BGR to grayscale
        frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Create a copy of the frame
        output_img = frame.copy()
Optical Flow-Based Tracking (contd…)
        # Check if the length of tracking paths is greater than zero
        if len(tracking_paths) > 0:
            prev_img, current_img = prev_gray, frame_gray

            # Organize the feature points from the existing tracking paths
            feature_points_0 = np.float32([tp[-1] for tp in
                                           tracking_paths]).reshape(-1, 1, 2)

            # Forward optical flow, reverse optical flow, and their difference
            feature_points_1, _, _ = cv2.calcOpticalFlowPyrLK(
                prev_img, current_img, feature_points_0, None, **tracking_params)
            feature_points_0_rev, _, _ = cv2.calcOpticalFlowPyrLK(
                current_img, prev_img, feature_points_1, None, **tracking_params)
            diff_feature_points = abs(feature_points_0 -
                                      feature_points_0_rev).reshape(-1, 2).max(-1)

            # Keep only the points whose forward-backward error is small
            good_points = diff_feature_points < 1

            new_tracking_paths = []
            for tp, (x, y), good_points_flag in zip(
                    tracking_paths, feature_points_1.reshape(-1, 2), good_points):
                if not good_points_flag:
                    continue
                tp.append((x, y))
                if len(tp) > num_frames_to_track:
                    del tp[0]
                new_tracking_paths.append(tp)
                # Draw a circle around each good feature point
                cv2.circle(output_img, (int(x), int(y)), 3, (0, 255, 0), -1)
            tracking_paths = new_tracking_paths
Optical Flow-Based Tracking (contd…)
            # Draw lines along the tracking paths to show the movement
            cv2.polylines(output_img, [np.int32(tp) for tp in tracking_paths],
                          False, (0, 150, 0))

        # After skipping the specified number of frames, look for new feature points
        if not frame_index % num_frames_jump:
            # Mask out regions where points are already being tracked
            mask = np.zeros_like(frame_gray)
            mask[:] = 255
            for x, y in [np.int32(tp[-1]) for tp in tracking_paths]:
                cv2.circle(mask, (x, y), 6, 0, -1)

            feature_points = cv2.goodFeaturesToTrack(
                frame_gray, mask=mask, maxCorners=500, qualityLevel=0.3,
                minDistance=7, blockSize=7)
            if feature_points is not None:
                for x, y in np.float32(feature_points).reshape(-1, 2):
                    tracking_paths.append([(x, y)])

        frame_index += 1
        prev_gray = frame_gray

        cv2.imshow('Optical Flow', output_img)
        c = cv2.waitKey(1)
        if c == 27:
            break

if __name__ == '__main__':
    # Start the tracker
    start_tracking()
    # Close all the windows
    cv2.destroyAllWindows()
Results : Optical Flow-Based Tracking

Link of the Video

The frame of the original video

Link of the Video

Output of the optical flow-based tracking program


Face Detection and Tracking
• We first load the Haar cascade file of the frontal face using the
CascadeClassifier method of cv2 module using the following command:
face_cascade = cv2.CascadeClassifier('haar_cascade_files/Haarcascade_frontalface_default.xml')

• We initialize the video “capture” object and define the “scaling factor” using
the following:
cap = cv2.VideoCapture('bihag_nazrul_geeti.mp4')
ds_factor = 0.5
• Next, we use a “while loop” to iterate indefinitely until the user presses the
ESC key or the input video completes its playing. We convert the image
into a “grayscale” image.
while True:
    _, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

• Next, we run the face detector on the grayscale image using the following
line, with a “scaleFactor” value of 1.3 and a “minNeighbors” value of 15.
faces = face_cascade.detectMultiScale(gray, 1.3, 15)
Face Detection and Tracking (contd…)
• The first parameter, “scaleFactor”, of the detectMultiScale function determines a
trade-off between detection accuracy and detection speed. The detection
window starts at size “minSize”, and after testing all windows of that size, the
window is scaled up by the “scaleFactor” and re-tested, and so on until the
window reaches or exceeds “maxSize”. If the “scaleFactor” is large (e.g., 2.0),
there will be fewer steps, so detection will be faster, but we may miss objects
whose size falls between two tested scales. We have used a value of 1.3, which
is a common choice.

• The second parameter, “minNeighbors”, is tuned based on the requirements.
The higher its value, the fewer the false positives and the fewer the spurious
face detections. However, a high value also increases the chance of missing
faint or unclear faces.
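The parameters above can also be passed by keyword; a small hedged variant of the call is shown below, where the minSize value is an illustrative assumption rather than a value used in the original code.

# Explicit keyword form of the detection call; minSize=(30, 30) is an
# illustrative lower bound on the detection window size
faces = face_cascade.detectMultiScale(gray,
                                      scaleFactor=1.3,
                                      minNeighbors=15,
                                      minSize=(30, 30))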

• Next, we iterate through the detected faces and draw rectangles around them, so
that the faces are clearly identified and the detected faces are displayed.

for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 2)

# Display the output
cv2.imshow('Face detector', frame)
Results : Face Detection

Link of the Video

Output of the face detection program


Eyes Detection in a Face
• We, first load the Haar cascade files of “frontal face” and “eye” using the
CascadeClassifier method of the cv2 module using the following command:
face_cascade = cv2.CascadeClassifier('haar_cascade_files/Haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haar_cascade_files/haarcascade_eye.xml')

• Next, we initialize the video “capture” object and define the scaling factor.
cap = cv2.VideoCapture('bihag_nazrul_geeti.mp4')
ds_factor = 0.5

• Next, we use a “while loop” to iterate indefinitely until the user presses the
ESC key or the input video completes its playing. We convert the image into
a “grayscale” image.
while True:
    _, frame = cap.read()
    frame = cv2.resize(frame, None, fx=ds_factor, fy=ds_factor,
                       interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

• Next, we run the face detector on the “grayscale” image with a
“scaleFactor” value of 1.3 and a “minNeighbors” value of 20; the detected
faces are then processed in an outer “for loop”.
faces = face_cascade.detectMultiScale(gray, 1.3, 20)
Eyes Detection in a Face (contd…)
• In an inner “for loop”, we detect the eyes within each “face” found by
the outer “for loop” and mark each detected eye. Here we draw a circle
around each eye; a rectangle such as
cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 4)
could be used instead.

• The code of the two “for loops” together is as follows:

faces = face_cascade.detectMultiScale(gray, 1.3, 20)

for (x, y, w, h) in faces:
    # Region of interest (the detected face) in grayscale and in color
    roi_gray = gray[y:y+h, x:x+w]
    roi_color = frame[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (x_eye, y_eye, w_eye, h_eye) in eyes:
        # Draw a circle centered on each detected eye
        center = (int(x_eye + 0.5*w_eye), int(y_eye + 0.5*h_eye))
        radius = int(0.3 * (w_eye + h_eye))
        color = (0, 255, 0)
        thickness = 3
        cv2.circle(roi_color, center, radius, color, thickness)

cv2.imshow('Eye detector', frame)
Results : Eyes Detection

Link of the Video

Output of the eyes detection program


Nose Detection in a Face
• We, first load the Haar cascade file of the nose using the
CascadeClassifier method of cv2 module using the following command:
nose_cascade = cv2.CascadeClassifier('haar_cascade_files/haarcascade_mcs_nose.xml')

• We initialize the video “capture” object and define the “scaling factor”
using the following:
cap = cv2.VideoCapture('bihag_nazrul_geeti.mp4')
ds_factor = 0.7
• Next, we use a “while loop” to iterate indefinitely until the user presses the
ESC key or the input video finishes playing. We convert the image into a
“grayscale” image.
while True:
    _, frame = cap.read()
    frame = cv2.resize(frame, None, fx=ds_factor, fy=ds_factor,
                       interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

• Next, we run the nose detector on the grayscale image using the following
line, with a “scaleFactor” value of 1.3 and a “minNeighbors” value of 8.
nose_rects = nose_cascade.detectMultiScale(gray, 1.3, 8)
Nose Detection in Face (contd…)
• Next, we iterate through the detected noses and draw rectangles around them,
so that the noses are clearly identified.

for (x, y, w, h) in nose_rects:
    cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 3)
    break  # keep only the first detected nose

# Display the output
cv2.imshow('Nose detector', frame)
Results : Nose Detection

Link of the Video

Output of the nose detection program


Canny’s Edge Detection Algorithm
• The Canny Edge Detector is an edge detection operator that uses a
multi-stage algorithm to detect a wide range of edges in images. It was
developed by John F. Canny in 1986.
• The algorithm works in five steps:
 Noise reduction
 Gradient calculation
 Non-maximum suppression
 Double thresholding
 Edge tracking by hysteresis
• The algorithm operates on “grayscale” pictures. Therefore, the prerequisite
is to convert the image to “grayscale” before the method can be applied.
• Noise reduction: Since the working principle of this method is largely
based on the computation of derivatives, edge detection results are
highly sensitive to image noise. The most common way to get rid of the
noise is to apply a Gaussian blur. To do so, the image is convolved
with a Gaussian kernel (3*3, 5*5, 7*7, ….). The kernel size depends on
the expected blurring effect – smaller kernels imply less visible blur.
We have used a 5*5 Gaussian kernel in our work.
Canny’s Edge Detection Algorithm (contd…)
• Gradient computation: This step detects the edge intensity and
direction by calculating the gradient of the image using an edge
detection operator. Edges correspond to a change of pixels’ intensity. To
detect it, the easiest way is to apply filters that highlight this intensity
change in both directions: horizontal (x) and vertical (y). When the
image is smoothed, the derivatives Ix and Iy with respect to x and y are
calculated. In practice, this is implemented by convolving the image I
with Sobel kernels Kx and Ky, respectively. After the completion of this
step, the gradient intensity level lies between 0 and 255, and it is
generally not uniform. However, the edges in the final result should all
have the same intensity (i.e., white pixel = 255).
• Non-maximum Suppression: Since the final image should have thin
edges, we must perform non-maximum suppression to thin out the
edges. The algorithm goes through all the points on the gradient
intensity matrix and finds the pixels with the maximum value in the
edge directions. Each pixel has two main attributes – the edge direction
in radians, and the pixel intensity (between 0 and 255). Based on these
inputs, the non-maximum suppression steps are executed as follows:
 A matrix is created that is initialized to 0 and of the same size as that of the original
gradient intensity matrix.
Canny’s Edge Detection Algorithm (contd…)
 The edge directions are identified based on the angle value from the angle matrix.
 It is checked whether the pixel in the same direction has a higher intensity than the pixel
that is currently being processed.
 The image is returned after the non-maximum suppression step has been applied.
• Application of the non-maximum suppression step results in an image
with thinner edges. However, the image still exhibits some variations in
the intensity of the edges – some pixels tend to be brighter than the
others. This problem is solved in the next two steps.
• Double thresholding: In this step, three kinds of pixels are identified –
strong, weak, and non-relevant. Strong pixels have an intensity so high
that we are sure they contribute to the final edge. Weak pixels have an
intensity that is not high enough to be considered strong, yet not low
enough to be considered non-relevant for edge detection. The remaining
pixels are considered non-relevant for the edge. The double
thresholding works as follows:
 High threshold is used to identify the strong pixels (intensity higher than the high
threshold)
 Low threshold is used to identify the non-relevant pixels (intensity lower than the low
threshold)
 All pixels having intensity between both thresholds are flagged as weak. The next step
(Hysteresis) helps us identify the ones that could be considered as strong and the ones
that are considered as non-relevant.
 The result of this step is an image with only two pixel intensity values (strong and weak).
Canny’s Edge Detection Algorithm (contd…)
• Edge tracking by hysteresis: Based on the thresholding results,
hysteresis consists of transforming weak pixels into strong ones
if and only if at least one of the pixels around the one being
processed is strong.
• Hence, for a given edge chain, if the magnitude of any edge in the
chain is greater than the upper threshold, all edges of the chain
above the lower threshold are selected as edge points.
• Canny did not provide any basis for selecting the upper and the
lower thresholds; as in many such applications, the selection of
the thresholds is application-dependent.
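The presentation does not reproduce the code listing for this part. The following is a minimal sketch of how the pipeline described above can be applied to a video with OpenCV; the input file name and the two hysteresis thresholds (50 and 150) are illustrative assumptions, and cv2.Canny performs the gradient computation, non-maximum suppression, double thresholding, and hysteresis steps internally.

import cv2

cap = cv2.VideoCapture('input_video.mp4')      # illustrative file name
low_threshold, high_threshold = 50, 150        # illustrative hysteresis thresholds

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Step 1: convert to grayscale and reduce noise with a 5x5 Gaussian kernel
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Steps 2-5 (gradient, non-maximum suppression, double thresholding,
    # hysteresis) are performed inside cv2.Canny
    edges = cv2.Canny(blurred, low_threshold, high_threshold)

    cv2.imshow('Input', frame)
    cv2.imshow('Canny Edges', edges)
    if cv2.waitKey(10) == 27:    # Esc key
        break

cap.release()
cv2.destroyAllWindows()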
Results : Canny’s Edge Detection

Link of the Video

The frame of the original video

Link of the Video

Output of the Canny edge detection program


Conclusion
• In this work, we have discussed the concept of object detection
and tracking in a video using OpenCV in Python, using various
methods such as frame differencing, colorspaces, background
separation, optical flow, and Haar cascade classifiers. We
have also discussed in detail a famous edge detection algorithm
– Canny’s Edge Detector.
• We used the rich library set of OpenCV for a robust face
detection from a sample video. For training the model with the
feature set of a face, we used the “Haar frontal face” XML file.
• We later extended our model to detect eyes and nose in the
same input video. We used “haar_eyes” and “haar_mcs_nose”
XML files for this purpose.
• We also implemented all the concepts – frame differencing,
colorspaces, background separation, and optical flow – using
OpenCV in Python.
• We also implemented Canny’s edge detection algorithm in
OpenCV on a sample video and detected the edges in the
video.
• All our models could successfully detect the faces, eyes, and
noses in the input video with 100% detection accuracy and at
real-time detection speed.
References
1. Viola, P. & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features.
Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR, 2001), December 8-14, 2001, Kauai, HI, USA.
2. Kirby, M. & Sirovich, L. (1990). Application of the Karhunen-Loeve procedure for the characterization of
human faces. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 1, January
1990, pp. 103 – 108.
3. Liao, S., Jain, A. K., & Li, S. Z. (2016). A fast and accurate unconstrained face detector. IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 38, No. 2, pp. 211 – 223.
4. Luo, D., Wen, G., Li, D., Hu, Y., and Huna, E. (2018). Deep learning-based face detection using iterative
bounding-box regression. Multimedia Tools and Applications. DOI: https://doi.org/10.1007/s11042-018-56585.
5. Mingxing, J., Junqiang, D., Tao, C., Ning, Y., Yi, J., and Zhen, Z. (2013). An improved detection
algorithm of face with combining AdaBoost and SVM. Proceedings of the 25th Chinese Control and
Decision Conference, pp. 2459-2463.
6. Ren, Z., Yang, S., Zou, F., Yang, F., Luan, C., and Li, K. (2017). A face tracking framework based on
convolutional neural networks and Kalman filter. Proceedings of the 8th IEEE International Conference
on Software Engineering and Services Science, pp. 410-413.
7. Zhang, H., Xie, Y., Xu, C. (2011). A classifier training method for face detection based on AdaBoost.
Proceedings of the International Conference on Transportation, Mechanical, and Electrical
Engineering, pp. 731-734.
8. Zou, L. & Kamata, S. (2010). Face detection in color images based on skin color models. Proceedings of
the IEEE Region 10 Conference, pp. 681-686.
References
9. Zhang, Y., Wang, X., and Qu, B. (2012). Three-frame difference algorithm research based on
mathematical morphology. Proceedings of 2012 International Workshop on Information and Electronics
Engineering (IWIEE), pp. 2705 – 2709.
10. Altun, H., Sinekli, R., Tekbas, U., Karakaya, F. and Peker, M. (2011). An efficient color detection in
RGB space using hierarchical neural network structure. Proceedings of 2011 International Symposium
on Innovations in Intelligent Systems and Applications, pp. 154-158, Istanbul, Turkey.
11. Lee, J., Lim, S., Kim, J-G, Kim, B., Lee, D. (2014). Moving object detection using background
subtraction and motion depth detection in depth image sequences. Proceedings of the 18th IEEE
International Symposium on Consumer Electronics (ISCE’2014), Jeju Island, South Korea, August 2014.
12. Lucas, B. D. & Kanade, T. (1981). An iterative image registration technique with an application to stereo
vision. Proceedings of Imaging Understanding Workshop, pp 121 – 130.
13. Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis
and Machine Intelligence, Volume: PAMI-8, No: 6, pp. 679-698, November 1986.
14. Li, J. and Ding, S. (2011). A research on improved Canny edge detection algorithm. Proceedings of the
International Conference on Applied Informatics and Communication, pp. 102 – 108, Communications
in Computer and Information Science (CCIS), Vol 228, Springer-Verlag.
15. Han, X., Gao, Y., Lu, Z., Zhang, Z., Niu, D. (2016). Research on moving object detection algorithm
based on improved three frame difference method and optical flow. Proceedings of the 5th International
Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC),
Qinhuangdao, China, February 2016.
Thank You!
Questions?
email: smehtab@acm.org
