Object Detection
A project report submitted for the award of the degree of
BACHELOR OF TECHNOLOGY
in
ELECTRONICS & COMMUNICATION ENGINEERING
Submitted by
R. SADANAND (16311A04V3)
CH. AMAN PRASAD (16311A04W3)
Y. MUKESH REDDY (16311A04W4)
Department of Electronics and Communication Engineering
SREENIDHI INSTITUTE OF SCIENCE AND TECHNOLOGY
(Affiliated to Jawaharlal Nehru Technological University, Hyderabad)
CERTIFICATE
R. SADANAND (16311A04V3)
CH. AMAN PRASAD (16311A04W3)
Y. MUKESH REDDY (16311A04W4)
Head of Department
Dr. S.P.V. Subba Rao
Professor and HOD
Department of ECE
DECLARATION
This is to certify that the work reported in the present thesis titled “OBJECT DETECTION
USING MATLAB” is a record of work done by us in the Department of Electronics and
Communication Engineering, Sreenidhi Institute of Science and Technology,
Yamnampet, Ghatkesar.
No part of the thesis is copied from books, journals, or the internet, and wherever a portion is
taken, the same has been duly referred to in the text. The report is based on project work
done entirely by us and not copied from any other source.
R. SADANAND – 16311A04V3
CH. AMAN PRASAD – 16311A04W3
Y. MUKESH REDDY – 16311A04W4
ACKNOWLEDGMENT
We would like to express our sincere gratitude and thanks to Ms. E. Lavanya, Internal
Guide, Department of Electronics and Communication Engineering, Sreenidhi Institute of
Science and Technology, for allowing us to take up this project.
We would especially like to express our sincere gratitude and thanks to Mrs. C.N. Sujatha,
Project Coordinator, Department of Electronics and Communication Engineering,
Sreenidhi Institute of Science and Technology, for guiding us throughout the project.
We are very grateful to Dr. S.P.V. Subba Rao, Head of the Department of Electronics
and Communication Engineering, Sreenidhi Institute of Science and Technology, for
allowing us to take up this project.
We also extend our sincere thanks to our parents and friends for their moral support
throughout the project work.
ABSTRACT
Object detection is the most prevalent step of video analytics, and performance at higher levels
depends greatly on accurate object detection. Various platforms are used for designing and
implementing object detection algorithms, including C programming, MATLAB and Simulink,
OpenCV, etc. Among these, MATLAB programming is the most popular among students and
researchers due to its extensive features. These features include matrix-based data processing, a
set of toolboxes and Simulink blocks covering all technology fields, easy programming, and help
topics with numerous examples. This report presents the implementation of object detection and
tracking using MATLAB. It demonstrates the basic block diagram of object detection and explains
various predefined functions and objects from different toolboxes that can be useful at each level
of object detection. Useful toolboxes include image acquisition, image processing, and computer
vision. This study helps new researchers in the object detection field to design and implement
algorithms using MATLAB.
CONTENTS
Abstract
Chapter 1 Introduction to Object Detection
Chapter 2 Literature Survey
Chapter 3 Block Diagram
Chapter 4 Introduction to MATLAB
Chapter 5 MATLAB Toolboxes
Chapter 6 MATLAB Implementation
Chapter 7 Source Code
Chapter 8 Result
Chapter 9 Applications
Chapter 10 Conclusion
10.1 Conclusion
10.2 Future Scope
Chapter 11 References
Chapter 1
Introduction to Object Detection
1.1 INTRODUCTION:
Video analytics is a popular segment of computer vision. It has enormous applications such as traffic
monitoring, parking lot management, crowd detection, object recognition, unattended baggage
detection, secure area monitoring, etc. Object detection is a critical step in video analytics; the
performance at this step is important for scene analysis, object matching and tracking, and activity
recognition. Over the years, research has flowed towards innovating new concepts and improving
or extending established research to improve the performance of object detection and tracking.
Various object detection approaches have been developed based on statistical, fuzzy, and neural
network methods. Most approaches involve complex theory, and they can be evolved further
through thorough understanding, implementation, and experimentation. All these approaches can be
learned by reading, reviewing, and taking a professor's expert guidance. However, implementation
and experimentation require a good programmer. Various platforms are used for the design
and implementation of object detection and tracking algorithms, including C programming,
OpenCV, MATLAB, etc. An object detection system to be used in real time should satisfy two
conditions. First, the system code must be short in terms of execution time. Second, it must use
memory efficiently. However, a programmer must have good programming skills when programming
in C or OpenCV, and it is time intensive for a new researcher to develop such efficient code for
real-time use. Considering these facts, MATLAB was found to be the better platform for the design
and implementation of the algorithm. It contains more than seventy toolboxes covering all possible
fields in technology. All toolboxes are rich with predefined functions, system objects, and Simulink
blocks. This feature helps to write short code and saves time in logic development at the various
steps of the system. MATLAB supports matrix operations, which is a huge advantage when
processing an image or a frame of a video sequence. MATLAB coding is simple and easily learned
by any new researcher. This report presents the implementation of an object detection system using
MATLAB and its toolboxes. This study explored various toolboxes and identified useful functions
and objects that can be used at various levels in object detection and tracking. The toolboxes mainly
include computer vision, image processing, and image acquisition; MATLAB 2012 was used for this
study. The report first describes the general block diagram of object detection, then presents the
MATLAB functions and objects useful in implementing an object detection system, and finally
gives sample code for object detection and tracking.
Today, images and video are everywhere. Online photo sharing sites and social networks have
them in the billions. The field of vision research [1] has been dominated by machine learning and
statistics: using images and video to detect, classify, and track objects or events in order to
"understand" a real-world scene. Programming a computer and designing algorithms for
understanding what is in these images is the field of computer vision. Computer vision powers
applications like image search, robot navigation, medical image analysis, photo management, and
many more. From a computer vision point of view, the image is a scene consisting of objects of
interest and a background represented by everything else in the image. The relations and
interactions among these objects are the key factors for scene understanding. Object detection and
recognition are two important computer vision tasks. Object detection determines the presence of
an object and/or its scope and location in the image. Object recognition identifies the object class,
in the training database, to which the object belongs. Object detection typically precedes object
recognition; it can be treated as two-class object recognition, where one class represents the
object class and the other represents the non-object class. Object detection can be further divided
into soft detection, which only detects the presence of an object, and hard detection, which detects
both the presence and location of the object. Object detection is typically carried out by
searching each part of an image to localize candidate regions. This can be accomplished by scanning an object
template across an image at different locations, scales, and rotations; a detection is declared if
the similarity between the template and the image is sufficiently high. The similarity between a
template and an image region can be measured by their correlation or by the sum of squared
differences (SSD). Over the last several years it has been shown that image-based object detectors
are sensitive to the training data.
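To make the template-scanning idea concrete, the following is a minimal sketch, assuming hypothetical grayscale images named scene and template; it measures similarity with normalized cross-correlation and declares a detection at the correlation peak:

% Template scanning via normalized cross-correlation (one scale, one rotation).
c = normxcorr2(template, scene);         % correlation surface over the scene
[peak, idx] = max(c(:));                 % strongest similarity score
[ypeak, xpeak] = ind2sub(size(c), idx);  % location of the peak
if peak > 0.8                            % detection threshold (assumed value)
    yoff = ypeak - size(template, 1);    % top-left corner of the match
    xoff = xpeak - size(template, 2);
    fprintf('Object detected near (%d, %d)\n', xoff, yoff);
end

Scanning over scales and rotations would repeat this search on resized and rotated copies of the template.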
The future of object detection has massive potential across a wide range of industries. Real-time
intelligent vision, high-performance computing, artificial intelligence, and machine learning are
enabling solutions that do not distort video and allow for AI capabilities that were previously not
possible.
CHAPTER 2
LITERATURE SURVEY
The object detection task can be addressed by considering the video as an unrelated sequence of
frames and performing static object detection. In 2009, Felzenszwalb et al. [1] described an object
detection system based on mixtures of multiscale deformable part models. Their system was able
to represent highly variable object classes and achieved state-of-the-art results in the PASCAL
object detection challenges. They combined a margin-sensitive approach for data-mining hard
negative examples with a formalism they called latent SVM. This led to an iterative training algorithm
that alternates between fixing latent values for positive examples and optimizing the latent SVM
objective function. Their system relied heavily on new methods for discriminative training of
classifiers that make use of latent information. It also relied heavily on efficient methods for
matching deformable models to images. The described framework allows for exploration of
additional latent structure. For example, one can consider deeper part hierarchies (parts with parts)
or mixture models with many components. Leibe et al. [2], in 2007, presented a novel method for
detecting and localizing objects of a visual category in cluttered real-world scenes. Their approach
considered object categorization and figure-ground segmentation as two interleaved processes that
closely collaborate towards a common goal. The tight coupling between those two processes
allows them to benefit from each other and improve the combined performance. The core part of
their approach was a highly flexible learned representation for object shape that could combine the
information observed on different training examples in a probabilistic extension of the Generalized
Hough Transform. As they showed, the resulting approach can detect categorical objects in novel
images and automatically infer a probabilistic segmentation from the recognition result. This
segmentation was then in turn used to again improve recognition by allowing the system to focus
its efforts on object pixels and to discard misleading influences from the background. Their
extensive evaluation on several large data sets showed that the proposed system was applicable to
a range of different object categories, including both rigid and articulated objects. In addition, its
flexible representation allowed it to achieve competitive object detection performance already
from training sets that were between one and two orders of magnitude smaller than those used in
comparable systems. In the last decade, methods based on local image features have shown
promise for texture and object recognition tasks. Zhang et al. [3], in 2006, presented a large-scale
evaluation of an approach that represented images as distributions (signatures or histograms) of
features extracted from a sparse set of key-point locations and learned a Support Vector Machine
classifier with kernels based on two effective measures for comparing distributions. They first
evaluated the performance of the proposed approach with different key-point detectors and
descriptors, as well as different kernels and classifiers. Then, they conducted a comparative
evaluation with several modern recognition methods on 4 texture and 5 object databases. On most
of those databases, their implementation exceeded the best reported results and achieved
comparable performance on the rest. Additionally, they investigated the influence of
background correlations on recognition performance. In 2001, Viola and Jones [4], in a conference
on pattern recognition described a machine learning approach for visual object detection which
was capable of processing images extremely rapidly and achieving high detection rates. Their work
was distinguished by three key contributions. The first was the introduction of a new image
representation called the "integral image" which allowed the features used by their detector to be
computed very quickly. The second was a learning algorithm, based on AdaBoost, which was used to
select a small number of critical visual features from a larger set and yield extremely efficient
classifiers. The third contribution was a method for combining increasingly more complex
classifiers in a "cascade" which allowed background regions of the image to be quickly discarded
while spending more computation on promising object-like regions. The cascade could be viewed
as an object specific focus-of-attention mechanism which unlike some of the previous approaches
provided statistical guarantees that discarded regions were unlikely to contain the object of interest.
They tested the system on face detection, where it yielded detection rates
comparable to the best of previous systems. Used in real-time applications, the detector ran at 15
frames per second without resorting to image differencing or skin color detection. In 2000, Weber
et al. [5] proposed a method to learn heterogeneous models of object classes for visual recognition.
The training images that they used contained a preponderance of clutter, and the learning was
unsupervised. Their models represented objects as probabilistic constellations of rigid parts
(features). The variability within a class was represented by a joint probability density function on
the shape of the constellation and the appearance of the parts. Their method automatically
identified distinctive features in the training set. The set of model parameters was then learned
using expectation maximization. When trained on different, unlabeled, and non-segmented views
of a class of objects, each component of the mixture model could adapt to represent a subset of the
views. Similarly, different component models could also specialize on sub-classes of an object
class. Experiments on images of human heads, leaves from different species of trees, and
motorcars demonstrated that the method works well over a wide variety of objects.
CHAPTER 3
BLOCK DIAGRAM
This section explains the general block diagram of object detection and the significance of each
block in the system. A common object detection pipeline mainly includes video input,
preprocessing, object segmentation, and post processing, as shown in the block diagram figure.
The significance of each block is as follows.
Preprocessing: It mainly involves temporal and spatial smoothing, such as intensity adjustment
and removal of noise. For real-time systems, frame-size and frame-rate reduction are commonly
used, which greatly reduces computational cost and time [1].
Object detection: It is the process of detecting change and extracting the appropriate change for
further analysis and qualification. Pixels are classified as foreground if they have changed;
otherwise, they are considered background. This process is called background subtraction. The
degree of "change" is a key factor in segmentation and can vary depending on the application. The
result of segmentation is one or more foreground blobs, a blob being a collection of connected
pixels [1].
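As a minimal sketch of this idea, assuming hypothetical RGB images frame and background, pixels whose difference from the background model exceeds a threshold are classified as foreground and grouped into blobs:

gray_bg = double(rgb2gray(background));  % background model frame
gray_fr = double(rgb2gray(frame));       % current frame
change  = abs(gray_fr - gray_bg);        % per-pixel degree of change
mask    = change > 30;                   % threshold chosen per application
[blobs, n] = bwlabel(mask);              % n connected foreground blobs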
Post processing: It removes false detections caused by dynamic conditions in the background,
using morphological operations and speckle-noise removal.
Chapter 4
Introduction to MATLAB
4.1 MATLAB: MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and
proprietary programming language developed by MathWorks. MATLAB allows matrix
manipulations, plotting of functions and data, implementation of algorithms, creation of user
interfaces, and interfacing with programs written in other languages.
Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the
MuPAD symbolic engine allowing access to symbolic computing abilities. An additional package,
Simulink, adds graphical multi-domain simulation and model-based design for dynamic and
embedded systems.
As of 2018, MATLAB has more than 3 million users worldwide. MATLAB users come from
various backgrounds in engineering, science, and economics.
4.2 HISTORY:
Cleve Moler, the chairman of the computer science department at the University of New Mexico,
started developing MATLAB in the late 1970s. He designed it to give his students access to
LINPACK and EISPACK without them having to learn Fortran. It soon spread to other universities
and found a strong audience within the applied mathematics community. Jack Little, an engineer,
was exposed to it during a visit Moler made to Stanford University in 1983. Recognizing its
commercial potential, he joined with Moler and Steve Bangert. They rewrote MATLAB in C and
founded MathWorks in 1984 to continue its development. These rewritten libraries were known
as JACKPAC. In 2000, MATLAB was rewritten to use a newer set of libraries for matrix
manipulation, LAPACK.
MATLAB was first adopted by researchers and practitioners in control engineering, Little's
specialty, but quickly spread to many other domains. It is now also used in education, in particular
the teaching of linear algebra and numerical analysis, and is popular amongst scientists involved
in image processing.
4.4 SYNTAX
The MATLAB application is built around the MATLAB programming language. Common usage
of the MATLAB application involves using the "Command Window" as an interactive
mathematical shell or executing text files containing MATLAB code.
4.5 VARIABLES
Variables are defined using the assignment operator, =. MATLAB is a weakly typed programming
language because types are implicitly converted. It is an inferred-typed language because variables
can be assigned without declaring their type, except if they are to be treated as symbolic objects,
and their type can change.
values of other variables, or from the output of a function. For example:
>> x = 17
x =
    17
>> x = 'hat'
x =
hat
>> x = [3*4, pi/2]
x =
   12.0000    1.5708
>> y = 3*sin(x)
y =
   -1.6097    3.0000
A simple array is defined using the colon syntax: initial:increment:terminator. For instance:
>> array = 1:2:9
array =
   1   3   5   7   9
defines a variable named array (or assigns a new value to an existing variable with the name array)
which is an array consisting of the values 1, 3, 5, 7, and 9. That is, the array starts at 1 (the initial
value), increments from the previous value by 2 at each step (the increment value), and stops
once it reaches, without exceeding, 9 (the terminator value).
>> array = 1:3:9
array =
   1   4   7
The increment value can be left out of this syntax (along with one of the colons) to use a
default value of 1:
>> ari = 1:5
ari =
   1   2   3   4   5
assigns to the variable named ari an array with the values 1, 2, 3, 4, and 5, since the default value
of 1 is used as the increment.
Indexing is one-based, which is the usual convention for matrices in mathematics, unlike the
zero-based indexing commonly used in other programming languages such as C, C++, and Java.
Matrices can be defined by separating the elements of a row with blank space or comma and using
a semicolon to terminate each row. The list of elements should be surrounded by square brackets
[]. Parentheses () are used to access elements and subarrays (they are also used to denote a function
argument list).
>> A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]
A =
   16    3    2   13
    5   10   11    8
    9    6    7   12
    4   15   14    1
>> A(2,3)
ans =
    11
Sets of indices can be specified by expressions such as 2:4, which evaluates to [2, 3, 4]. For
example, a submatrix taken from rows 2 through 4 and columns 3 through 4 can be written as:
>> A(2:4,3:4)
ans =
   11    8
    7   12
   14    1
A square identity matrix of size n can be generated using the function eye, and matrices of any size
with zeros or ones can be generated with the functions zeros and ones, respectively.
>> eye(3,3)
ans =
   1   0   0
   0   1   0
   0   0   1
>> zeros(2,3)
ans =
   0   0   0
   0   0   0
>> ones(2,3)
ans =
   1   1   1
   1   1   1
Transposing a vector or a matrix is done either by the function transpose or by adding dot-prime
after the matrix (without the dot, prime will perform conjugate transpose for complex arrays):
>> A = [1 ; 2], B = A.', C = transpose(A)
A =
   1
   2
B =
   1   2
C =
   1   2
Structures
MATLAB supports structure data types. Since all variables in MATLAB are arrays, a more
adequate name is "structure array", where each element of the array has the same field names. In
addition, MATLAB supports dynamic field names (field look-ups by name, field manipulations,
etc.).
4.6 FUNCTIONS
A function is a group of statements that together perform a task. In MATLAB, functions are
defined in separate files. The name of the file and of the function should be the same.
Functions operate on variables within their own workspace, which is also called the local
workspace, separate from the workspace you access at the MATLAB command prompt which is
called the base workspace.
Functions can accept more than one input argument and may return more than one output
argument.
Syntax of a function statement is −
function [out1,out2, ..., outN] = myfun(in1,in2,in3, ..., inN)
Example
The following function named mymax should be written in a file named mymax.m. It takes five
numbers as arguments and returns the maximum of the numbers.
Create a function file named mymax.m and type the following code in it −

function max = mymax(n1, n2, n3, n4, n5)
%This function calculates the maximum of the
%five numbers given as input
max = n1;
if(n2 > max), max = n2; end
if(n3 > max), max = n3; end
if(n4 > max), max = n4; end
if(n5 > max), max = n5; end
The first line of a function starts with the keyword function. It gives the name of the function and
order of arguments. In our example, the mymax function has five input arguments and one output
argument.
The comment lines that come right after the function statement provide the help text. These lines
are printed when you type −
help mymax
MATLAB will execute the above statement and return the following result −
This function calculates the maximum of the
five numbers given as input
You can call the function as −
mymax(34, 78, 89, 23, 11)
MATLAB will execute the above statement and return the following result −
ans = 89
Example
In this example, we will write an anonymous function named power, which will take two numbers
as input and return the first number raised to the power of the second number.
Create a script file and type the following code in it −

power = @(x, n) x.^n;
result1 = power(7, 3)
result2 = power(49, 0.5)
result3 = power(10, -10)
result4 = power(4.5, 1.5)

When you run the file, it displays −

result1 = 343
result2 = 7
result3 = 1.0000e-10
result4 = 9.5459
Example
Let us write a function named quadratic that would calculate the roots of a quadratic equation.
The function would take three inputs: the quadratic coefficient, the linear coefficient, and the
constant term. It would return the roots.
The function file quadratic.m will contain the primary function quadratic and the sub-function
disc, which calculates the discriminant.
Create a function file quadratic.m and type the following code in it −
function [x1,x2] = quadratic(a,b,c)
%this function returns the roots of a quadratic equation
d = disc(a,b,c); % call the sub-function to get the discriminant
x1 = (-b + d) / (2*a);
x2 = (-b - d) / (2*a);
end % end of quadratic

function dis = disc(a,b,c)
%sub-function calculates the discriminant
dis = sqrt(b^2 - 4*a*c);
end % end of sub-function disc
You can call the function from the command prompt as −

quadratic(2,4,-4)

MATLAB will execute the above statement and return the following result −

ans = 0.7321
Example
Let us rewrite the function quadratic from the previous example; this time, however, the disc
function will be a nested function.
Create a function file quadratic2.m and type the following code in it −
function [x1,x2] = quadratic2(a,b,c)
function disc % nested function
d = sqrt(b^2 - 4*a*c);
end % end of function disc
disc;
x1 = (-b + d) / (2*a);
x2 = (-b - d) / (2*a);
end % end of function quadratic2
You can call the function from the command prompt as −

quadratic2(2,4,-4)

MATLAB will execute the above statement and return the following result −

ans = 0.73205
Example
Let us rewrite the quadratic function. This time, however, the disc function calculating the
discriminant will be a private function.
Create a subfolder named private in the working directory and store the following function file
disc.m in it −
function dis = disc(a,b,c)
%function calculates the discriminant
dis = sqrt(b^2 - 4*a*c);
end % end of sub-function
Create a function quadratic3.m in your working directory and type the following code in it −
function [x1,x2] = quadratic3(a,b,c)
%this function returns the roots of a quadratic equation
d = disc(a,b,c); % calls the private function disc.m
x1 = (-b + d) / (2*a);
x2 = (-b - d) / (2*a);
end % end of quadratic3
You can call the function from the command prompt as −

quadratic3(2,4,-4)

MATLAB will execute the above statement and return the following result −

ans = 0.73205
Global variables can be shared by more than one function. For this, you need to declare the
variable as global in all the functions.
If you want to access that variable from the base workspace, then declare the variable at the
command line.
The global declaration must occur before the variable is actually used in a function. It is a good
practice to use capital letters for the names of global variables to distinguish them from other
variables.
Example
Let us create a function file named average.m and type the following code in it −
function avg = average(nums)
global TOTAL
avg = sum(nums)/TOTAL;
end
Create a script file and type the following code in it −
global TOTAL;
TOTAL = 10;
n = [34, 45, 25, 45, 33, 19, 40, 34, 38, 42];
av = average(n)
When you run the file, it will display the following result −
av = 35.500
MATLAB supports object-oriented programming through classes. A method can alter a member
of an object only if the object is an instance of a reference class; otherwise, value class methods
must return a new instance if they need to modify the object.
An example of a simple class is provided below.
classdef Hello
methods
function greet(obj)
disp('Hello!')
end
end
end
When put into a file named hello.m, this can be executed with the following commands:
>> x = Hello();
>> x.greet();
Hello!
As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks,
MATLAB can be connected to Maple or Mathematica.
Libraries also exist to import and export MathML.
Chapter 5
MATLAB Toolboxes
5.1 Computer Vision

Computer vision is an interdisciplinary field that deals with how computers can gain high-level
understanding from digital images or videos. As a technological discipline, computer vision seeks
to apply its theories and models to the construction of computer vision systems.
Sub-domains of computer vision include scene reconstruction, event detection, video tracking,
object recognition, 3D pose estimation, learning, indexing, motion estimation, and image
restoration.
5.1.1 Applications
Applications range from tasks such as industrial machine vision systems which, say, inspect bottles
speeding by on a production line, to research into artificial intelligence and computers or robots
that can comprehend the world around them. The computer vision and machine vision fields have
significant overlap. Computer vision covers the core technology of automated image analysis
which is used in many fields. Machine vision usually refers to a process of combining automated
image analysis with other methods and technologies to provide automated inspection and robot
guidance in industrial applications. In many computer-vision applications, the computers are
pre-programmed to solve a particular task, but methods based on learning are now becoming
increasingly common. Examples of applications of computer vision include systems for:
● Automatic inspection, e.g., in manufacturing applications;
● Assisting humans in identification tasks, e.g., a species identification system;
● Controlling processes, e.g., an industrial robot;
● Detecting events, e.g., for visual surveillance or people counting, e.g., in the restaurant
industry;
● Interaction, e.g., as the input to a device for computer-human interaction;
● Modeling objects or environments, e.g., medical image analysis or topographical
modeling;
● Navigation, e.g., by an autonomous vehicle or mobile robot; and
● Organizing information, e.g., for indexing databases of images and image sequences.
The classical problem in computer vision, image processing, and machine vision is that of
determining whether or not the image data contains some specific object, feature, or activity.
Different varieties of the recognition problem are described in the literature:
● Object recognition (also called object classification) – one or several pre-specified
or learned objects or object classes can be recognized, usually together with their 2D
positions in the image or 3D poses in the scene. Blippar, Google Goggles and LikeThat
provide stand-alone programs that illustrate this functionality.
● Detection – the image data are scanned for a specific condition. Examples include
detection of possible abnormal cells or tissues in medical images or detection of a
vehicle in an automatic road toll system. Detection based on relatively simple and fast
computations is sometimes used for finding smaller regions of interesting image data
which can be further analyzed by more computationally demanding techniques to
produce a correct interpretation.
Currently, the best algorithms for such tasks are based on convolutional neural networks. An
illustration of their capabilities is given by the ImageNet Large Scale Visual Recognition
Challenge; this is a benchmark in object classification and detection, with millions of images and
hundreds of object classes. Performance of convolutional neural networks, on the ImageNet tests,
is now close to that of humans.[26] The best algorithms still struggle with objects that are small or
thin, such as a small ant on a stem of a flower or a person holding a quill in their hand. They also
have trouble with images that have been distorted with filters (an increasingly common
phenomenon with modern digital cameras). By contrast, those kinds of images rarely trouble
humans. Humans, however, tend to have trouble with other issues. For example, they are not good
at classifying objects into fine-grained classes, such as the particular breed of dog or species of
bird, whereas convolutional neural networks handle this with ease.
Several specialized tasks based on recognition exist, such as:
● Content-based image retrieval – finding all images in a larger set of images which
have a specific content. The content can be specified in different ways, for example in
terms of similarity relative to a target image (give me all images similar to image X), or
in terms of high-level search criteria given as text input (give me all images which
contain many houses, are taken during winter, and have no cars in them).
[Figure: Computer vision used for people-counting purposes in public places, malls, and shopping centres]
● 2D code reading – reading of 2D codes such as data matrix and QR codes.
● Facial recognition
● Shape Recognition Technology (SRT) in people counter systems differentiating
human beings (head and shoulder patterns) from objects
The aim of image restoration is the removal of noise (sensor noise, motion blur, etc.) from images.
The simplest possible approach for noise removal is various types of filters such as low-pass filters
or median filters. More sophisticated methods assume a model of how the local image structures
look, to distinguish them from noise. By first analysing the image data in terms of the local image
structures, such as lines or edges, and then controlling the filtering based on local information from
the analysis step, a better level of noise removal is usually obtained compared to the simpler
approaches.
An example in this field is inpainting.
5.1.2 Computer Vision Toolbox in MATLAB
Computer Vision Toolbox™ provides algorithms, functions, and apps for designing and testing
computer vision, 3D vision, and video processing systems. You can perform object detection and
tracking, as well as feature detection, extraction, and matching. For 3D vision, the toolbox supports
single, stereo, and fisheye camera calibration; stereo vision; 3D reconstruction; and lidar and 3D
point cloud processing. Computer vision apps automate ground truth labeling and camera
calibration workflows. You can train custom object detectors using deep learning and machine
learning algorithms such as YOLO v2, Faster R-CNN, and ACF. For semantic segmentation you
can use deep learning algorithms such as SegNet, U-Net, and DeepLab. Pretrained models let you
detect faces, pedestrians, and other common objects.
You can accelerate your algorithms by running them on multicore processors and GPUs. Most
toolbox algorithms support C/C++ code generation for integrating with existing code, desktop
prototyping, and embedded vision system deployment.
5.2 Image Processing Toolbox in MATLAB

In video processing, discrete cosine transforms (DCTs) are used for signaling, analog-to-digital
conversion, formatting luminance and color differences, and color formats such as YUV444 and
YUV411. DCTs are also used for encoding operations such as motion estimation, motion
compensation, inter-frame prediction, quantization, perceptual weighting, entropy encoding,
variable encoding, and motion vectors, and for decoding operations such as the inverse operation
between different color formats (YIQ, YUV, and RGB) for display purposes. DCTs are also
commonly used in high-definition television (HDTV) encoder/decoder chips.
Image Processing Toolbox apps let you automate common image processing workflows. You can
interactively segment image data, compare image registration techniques, and batch-process large
data sets. Visualization functions and apps let you explore images, 3D volumes, and videos; adjust
contrast; create histograms; and manipulate regions of interest (ROIs).
You can accelerate your algorithms by running them on multicore processors and GPUs. Many
toolbox functions support C/C++ code generation for desktop prototyping and embedded vision
system deployment.
5.3 Image Acquisition Toolbox in MATLAB

Image Acquisition Toolbox™ provides functions and blocks for connecting cameras to MATLAB®
and Simulink®. It supports processing in-the-loop, hardware triggering, background acquisition,
and synchronizing acquisition across multiple devices.
Image Acquisition Toolbox supports all major standards and hardware vendors, including USB3
Vision, GigE Vision®, and GenICam™ GenTL. You can connect to Velodyne LiDAR® sensors,
machine vision cameras, and frame grabbers, as well as high-end scientific and industrial devices.
CHAPTER 6
MATLAB IMPLEMENTATION
Different toolboxes have been explored for functions and objects that can be useful at various
levels of object detection. All such functions and objects are described in this section.
6.1 Video Input
Input video can come from two sources: stored video and real-time video. Stored video can be
obtained from the standard datasets available on the internet. Real-time video comes from a
camera continuously monitoring a specific area. These videos must be read into MATLAB before
they can be processed.

BMC 2012 Dataset [6]: This dataset includes real and synthetic videos. It is mainly used for
comparing different background subtraction techniques.

Fish4Knowledge Dataset [7]: The Fish4Knowledge dataset is an underwater benchmark dataset
for target detection against complex backgrounds.

Carnegie Mellon Dataset [8]: This CMU sequence by Sheikh and Shah involves a camera mounted
on a tall tripod. The wind caused the tripod to sway back and forth, causing vibration in the scene.
This dataset is useful for studying the camera-jitter background situation.

Stored video needs to be read in an appropriate format before processing. Various related functions
from the image processing (IP) and computer vision (CV) toolboxes can be used for this purpose.
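A minimal sketch of reading a stored video frame by frame is given below; the file name traffic.avi is a hypothetical example (newer MATLAB releases use VideoReader in place of the older mmreader):

v = VideoReader('traffic.avi');     % open the stored video file
nFrames = v.NumberOfFrames;         % total frame count
for k = 1:nFrames
    frame = read(v, k);             % read the k-th RGB frame
    imshow(frame); drawnow;         % display while processing
end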
IP Function imfinfo: returns information about a graphics file.
Image acquisition is a widely used toolbox that allows real-time acquisition of video from a video
acquisition device. Some commonly used functions are explained below.

Imaqtool: Launches an interactive GUI that allows users to explore, configure, and acquire data
from image acquisition devices.

Videoinput: Creates a video input object. This object can further be used to acquire and display
image sequences.

Propinfo: Captures the property information of an image acquisition object. This information can
be useful in further video processing.

Getsnapshot: Immediately returns one single image frame from the video input object. This
function is useful for capturing an image at a critical moment.

Trigger: Initiates data logging for the video input object. It can be used to start acquisition at the
appropriate moment and collect the video data.
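The snippet below is a hedged sketch combining these functions; the 'winvideo' adaptor name and device ID 1 are assumptions that vary from system to system:

vid = videoinput('winvideo', 1);             % create the video input object
triggerconfig(vid, 'manual');                % so trigger() starts the logging
preview(vid);                                % live preview of the camera
img = getsnapshot(vid);                      % grab one frame immediately
start(vid);
trigger(vid);                                % initiate logging at the chosen moment
frames = getdata(vid, vid.FramesPerTrigger); % retrieve the logged frames
stoppreview(vid);
delete(vid);                                 % release the device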
6.2 Preprocessing
Data preprocessing is an important step in the data mining process. The phrase "garbage in,
garbage out" is particularly applicable to data mining and machine learning projects. Data-
gathering methods are often loosely controlled, resulting in out-of-range values (e.g., Income:
−100). Analyzing data that has not been carefully screened for such problems can produce
misleading results. Thus, the representation and quality of data come first and foremost before
running an analysis. Often, data preprocessing is the most important phase of a machine learning
project, especially in computational biology.
If there is much irrelevant and redundant information present, or noisy and unreliable data, then
knowledge discovery during the training phase is more difficult. Data preparation and filtering
steps can take a considerable amount of processing time. Data preprocessing includes cleaning,
instance selection, normalization, transformation, feature extraction and selection, etc. The product
of data preprocessing is the final training set.
Data preprocessing may affect the way in which outcomes of the final data processing can be
interpreted. This aspect should be carefully considered when interpretation of the results is a key
point, such as in the multivariate processing of chemical data (chemometrics).
Tasks of preprocessing:
● Data cleansing
● Data editing
● Data reduction
● Data wrangling
The video needs to be converted to an appropriate data type after reading. Useful objects and
functions for video data type conversion are listed below.
IP Function im2double, im2single, im2uint8, im2uint16: convert an image to the specified data type.
This step may include noise removal, contrast adjustment, and image correction. Useful functions
and objects are summarized below.

IP Function medfilt2: 2-D median filtering (removes salt-and-pepper noise).
CV Object vision.ImageFilter: performs 2-D FIR filtering of the input matrix.
IP Function histeq: enhances contrast using histogram equalization.
Any object detection system performs segmentation based on one or more features of the scene,
which may include color, corner, edge, shape, gradient, texture, and DCT or DFT coefficients.
Different functions are available to extract these features.
Useful functions/objects for feature extraction:
IP Function rgb2gray: converts an RGB image to grayscale.
IP Function rgb2ycbcr: converts RGB values to the YCbCr color space.
IP Function ycbcr2rgb: converts YCbCr values to the RGB color space.
IP Function corner: finds corner points in an image.
IP Function edge: finds edges in a grayscale image.
IP Function imgradient: computes gradient magnitude and direction.
IP Function entropyfilt: computes the local entropy of an image.
IP Function rangefilt: computes the local range of an image.
IP Function stdfilt: computes the local standard deviation of an image.
CV Object vision.ColorSpaceConverter: converts image color information between color spaces.
CV Object vision.DCT: computes the 2-D discrete cosine transform.
CV Object vision.FFT: computes the 2-D fast Fourier transform.
CV Object vision.EdgeDetector: detects edges in an image.
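The short sketch below, applied to a hypothetical RGB image named frame, combines several of these functions:

gray = rgb2gray(frame);              % color to grayscale
bw   = edge(gray, 'canny');          % binary edge map
pts  = corner(gray, 50);             % up to 50 corner points
[gmag, gdir] = imgradient(gray);     % gradient magnitude and direction
coeff = dct2(double(gray));          % 2-D DCT coefficients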
STEP 1: INPUT. Stored input needs to be read in an appropriate format before processing. Various
related functions from the image processing and computer vision toolboxes can be used for this
purpose. Some example functions are imread, imfinfo, imwrite, and imshow, which are used to
read an image, get information about it, write it, and display it.
STEP 2: PREPROCESSING. This includes RGB-to-gray conversion and noise removal using a
median filter, a series of operations sketched below.
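A minimal sketch of these operations on a hypothetical RGB frame:

gray  = rgb2gray(frame);             % RGB to grayscale conversion
clean = medfilt2(gray, [3 3]);       % median filter suppresses noise
adj   = imadjust(clean);             % optional intensity adjustment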
STEP 3: DETECTION. Objects are segmented from the frame using detector objects from the
computer vision toolbox, such as:

vision.CascadeObjectDetector
vision.OpticalFlow
vision.PeopleDetector
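A hedged sketch of this step using the cascade object detector is shown below; note that the default pretrained model detects frontal faces, and other object classes require a custom-trained model:

detector = vision.CascadeObjectDetector();   % Viola-Jones cascade detector
bboxes = step(detector, frame);              % detect objects in one frame
out = insertObjectAnnotation(frame, 'rectangle', bboxes, 'object');
imshow(out);                                 % display annotated detections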
STEP 4: POST PROCESSING. It is required to remove unwanted portions of the foreground mask,
which may arise from false detections caused by dynamic conditions and may include speckle
noise, small holes in the scene, etc. Detected objects can then be annotated for proper display.
Some of the useful functions in this process, used as sketched below, are:

imclose
imopen
imfill
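A short sketch of this cleanup, assuming a hypothetical binary foreground mask named mask:

se  = strel('disk', 3);              % disk-shaped structuring element
out = imopen(mask, se);              % opening removes speckle noise
out = imclose(out, se);              % closing bridges small gaps
out = imfill(out, 'holes');          % fill holes inside detected blobs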
CHAPTER 7
SOURCE CODE
The following MATLAB script was used for this project. It reads a stored test video, performs
change detection against the previous gray frame, links edges, cleans the masks morphologically,
and extracts HoG and shape features from each detected blob (HoG, edgelinking2_C,
inertiaEllipse, and drawBox are project-specific helper functions).

clc
%% Test Two
%% Features: Histogram of Oriented Gradients (HoG), histogram of pixel
%% orientation, histogram of curvatures, eccentricity, area-ratio weights
%clear all
close all

tic
load('newData.mat')                    % pre-computed data for the test run
%load('FinalHog.mat')
depth = 6;
Params = [9 3 2 1 0.2];                % HoG descriptor parameters

% Open the test video (mmreader is the video reader in MATLAB 2012;
% later releases use VideoReader instead).
video = mmreader('F:\Thesis\Testing Videos\T4.h64');

for k = 3501:5:4000                    % process every fifth frame
    %% Read a frame and crop the region of interest
    figure(1);
    image = imcrop(read(video,k), [7.5 18.5 345 224]);

    %% Gray-scale conversion and Canny edge map
    img = rgb2gray(image);
    BW = edge(sqrt(double(img)), 'canny', 0.29);

    %% Change detection against the previous gray frame (fi); fi is set
    %% at the bottom of the loop, so its first value must come from
    %% newData.mat
    img1 = sqrt(double(img)) - sqrt(double(fi));
    fore = zeros(size(img1));
    ind = find(img1 > max(max(img1))*0.6);  % keep the top 40% of change
    fore(ind) = 255;

    %% Edge linking (project-specific helper)
    [BW, AngleLeft, AngleRight] = edgelinking2_C(BW, 3, 3);
    BW = abs(BW);

    %% Morphological cleanup of the masks
    st = strel('disk', 3);
    BW = imopen(BW, st);
    fore = imdilate(fore, st);
    LabelsList = unique(BW(ind));
    toc

    hold off;
    figure(2);
    subplot(2,2,1), imshow(BW);
    subplot(2,2,2), imshow(fore);
    subplot(2,2,3:4), imshow(image); hold on;
    maximumLabel = max(max(BW));

    %% Find properties of each connected component
    for i = 2:numel(LabelsList)
        [x_A, y_A] = find(BW == LabelsList(i));
        px = x_A;
        py = y_A;
        if(~isempty(px))
            % subImage (the blob's image patch) and box (its bounding
            % box) are computed at this point; those lines are missing
            % from the recovered listing
            if(~isempty(subImage))
                hogs = HoG(double(subImage), Params);   % HoG descriptor
                % Eigen-analysis of the blob's point scatter gives an
                % elongation measure (stra) as a shape feature
                Z = [x_A - median(x_A), y_A - median(y_A)];
                C = cov(Z);
                [E, V] = eig(C);
                V = sort(diag(V));
                stra = V(2)/sum(V);
                ell = inertiaEllipse([x_A y_A]);        % inertia ellipse of blob
                OtherFeatures = [ell(4)/ell(3); ell(5); stra];
                H = [hogs; OtherFeatures];              % final feature vector
                drawBox(box, 'g');                      % annotate the detection
            end
        end
    end

    %% Save the annotated result and keep this frame as the next background
    saveas(figure(2), strcat('Results\T4_', num2str(k), '.jpg'));
    time = toc;
    fi = rgb2gray(image);
end
CHAPTER 8
RESULT
Object detection is observed on the test video: moving objects are segmented from the
background, annotated with bounding boxes in the displayed frames, and the annotated frames are
saved as images in the Results folder.
CHAPTER 9
APPLICATIONS
OPTICAL CHARACTER RECOGNITION

Optical character recognition or optical character reading, often abbreviated as OCR, is the
mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-
encoded text, whether from a scanned document, a photo of a document, a scene photo (for
example, the text on signs and billboards in a landscape photo), or from subtitle text superimposed
on an image. Here, we are extracting characters from the image or video.
OCR is widely used as a form of information entry from printed paper data records, whether
passport documents, invoices, bank statements, computerized receipts, business cards, mail,
printouts of static data, or any suitable documentation. It is a common method of digitizing printed
texts so that they can be electronically edited, searched, stored more compactly, displayed online,
and used in machine processes such as cognitive computing, machine translation, and (extracted)
text-to-speech.
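The Computer Vision Toolbox provides an ocr function for this task; the following is a minimal sketch, with document.png as a hypothetical input image:

I = imread('document.png');          % hypothetical input image
results = ocr(I);                    % recognize machine-printed text
disp(results.Text);                  % the extracted characters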
AUTONOMOUS DRIVING

One of the best examples of why object detection is needed is autonomous driving. In order for a
car to decide what to do next, whether to accelerate, apply the brakes, or turn, it needs to know
where all the objects around the car are and what those objects are. That requires object detection,
and we would essentially train the car to detect a known set of objects such as cars, pedestrians,
traffic lights, road signs, bicycles, motorcycles, etc.
TRACKING OBJECTS
An object detection system is also used for tracking objects, for example tracking a ball during a
football match, tracking the movement of a cricket bat, or tracking a person in a video.
Object tracking has a variety of uses, some of which are surveillance and security, traffic
monitoring, video communication, robot vision, and animation. A minimal point-tracking sketch
is shown below.
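The following is a minimal KLT point-tracking sketch using the Computer Vision Toolbox; frame1, frame2, and the region of interest roi are assumed inputs:

points  = detectMinEigenFeatures(rgb2gray(frame1), 'ROI', roi);
tracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(tracker, points.Location, frame1);   % seed on the first frame
[newPts, valid] = step(tracker, frame2);        % point positions in the next frame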
FACE DETECTION AND RECOGNITION

Face detection is a computer technology, used in a variety of applications, that identifies human
faces in digital images. Face recognition describes a biometric technology that goes far beyond
recognizing when a human face is present; it actually attempts to establish whose face it is.
There are many applications of face recognition. It is already being used to unlock phones and
specific applications. Face recognition is also used for biometric surveillance: banks, retail stores,
stadiums, airports, and other facilities use facial recognition to reduce crime and prevent violence.
SMILE DETECTION
Facial expression analysis plays a key role in analyzing emotions and human behaviors. Smile
detection is a special task in facial expression analysis with various potential applications such as
photo selection, user experience analysis and patient monitoring.
PEDESTRIAN DETECTION
Pedestrian detection is an essential and significant task in any intelligent video surveillance system,
as it provides the fundamental information for semantic understanding of the video footage. It has
an obvious extension to automotive applications due to the potential for improving safety systems,
as in the sketch below.
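A hedged sketch using the pretrained HOG-based people detector from the Computer Vision Toolbox, with frame as a hypothetical input image:

peopleDetector = vision.PeopleDetector();       % pretrained HOG + SVM model
[bboxes, scores] = step(peopleDetector, frame); % detect pedestrians
out = insertObjectAnnotation(frame, 'rectangle', bboxes, scores);
imshow(out);                                    % annotated pedestrians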
BALL TRACKING

The increase in the number of sports lovers in games like football, cricket, etc. has created a need
for digging, analyzing, and presenting more and more multidimensional information to them.
Different classes of people require different kinds of information, and this expands the space and
scale of the required information. Tracking ball movement is of utmost importance for extracting
any information from ball-based sports video sequences, and the video frame can be recorded
according to the movement of the ball automatically.
IMAGE SEARCH

By recognizing the objects in images, combining each object in the image, and passing the detected
object labels in the URL, we can turn the object detection system into an image search.
AUTOMATIC TARGET RECOGNITION
Automatic target recognition (ATR) is the ability for an algorithm or device to recognize targets
or other objects based on data obtained from sensors.
Target recognition was initially done by using an audible representation of the received signal,
where a trained operator would decipher that sound to classify the target illuminated by the radar.
While these trained operators had success, automated methods have been developed, and continue
to be developed, that allow for more accuracy and speed in classification. ATR can be used to
identify man-made objects such as ground and air vehicles, as well as biological targets such as
animals, humans, and vegetative clutter. This can be useful for everything from recognizing an
object on a battlefield to filtering out interference caused by large flocks of birds on Doppler
weather radar.
CHAPTER 10
CONCLUSION AND FUTURE SCOPE
CHAPTER 11
REFERENCES