Sequential Particle Swarm Optimization for Visual Tracking
Xiaoqin Zhang1, Weiming Hu1, Steve Maybank2, Xi Li1, Mingliang Zhu1
1 National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China
{xqzhang,wmhu,lixi,mlzhu}@nlpr.ia.ac.cn
2 School of Computer Science and Information Systems, Birkbeck College, London, UK
sjmaybank@dcs.bbk.ac.uk
Abstract
Visual tracking usually involves an optimization process for estimating the motion of an object from measured images in a video sequence. In this paper, a new evolutionary approach, PSO (particle swarm optimization), is adopted for visual tracking. Since the tracking process is a dynamic optimization problem which is simultaneously influenced by the object state and the time, we propose a sequential particle swarm optimization framework by incorporating the temporal continuity information into the traditional PSO algorithm. In addition, the parameters in PSO are changed adaptively according to the fitness values of particles and the predicted motion of the tracked object, leading to a favourable performance in tracking applications. Furthermore, we show theoretically that, in a Bayesian inference view, the sequential PSO framework is in essence a multi-layer importance sampling based particle filter. Experimental results demonstrate that, compared with the state-of-the-art particle filter and its variation, the unscented particle filter, the proposed tracking algorithm is more robust and effective, especially when the object has an arbitrary motion or undergoes large appearance changes.
1. Introduction
Visual tracking has emerged as a central problem in many applications such as surveillance, vision-based control, human-computer interfaces, intelligent transportation, and augmented reality. Recent years have witnessed great advances in the literature, e.g. the snakes model [1], template matching [2], mean shift [3], condensation [4], appearance models [5], probabilistic data association [6] and so on.
Most of the existing tracking algorithms can be formulated as an optimization process, which is typically tackled using either deterministic methods [1, 2, 3, 7, 8, 9] or stochastic methods [4, 11, 12, 13, 14, 15, 16]. Deterministic methods usually involve a gradient descent search to minimize a cost function. The snakes model introduced by Kass et al. [1] is a good example. The aim is to obtain a tight contour enclosing the object by minimizing an energy function. In [2], the cost function is defined as the sum of squared differences between the observation candidate and a fixed template. Then the motion parameters are found by minimizing the cost function through a gradient descent search. Mean shift, which first appeared in [10] as an approach for estimating the gradient of a density function, is applied by Comaniciu [3] to visual tracking, in which the cost function between two color histograms is minimized through the mean shift iterations. In general, deterministic methods are usually computationally efficient but they easily become trapped in local minima. In contrast, stochastic methods introduce some stochastic factors into the searching process in order to have a higher probability of reaching the global optimum of the cost function. For example, in [11], object tracking is viewed as an online MAP (maximum a posteriori) problem, which is solved by randomly generating a large number of particles to find the maximum of the posterior distribution. Bray et al. [12] use the stochastic meta-descent strategy to adapt the step size of the gradient descent search, and thus avoid local minima of the optimization process in articulated structure tracking. Leung and Gong [13] incorporate random subsampling into mean shift tracking to boost its efficiency and robustness for low-resolution video sequences. Compared with their deterministic counterparts, stochastic methods are usually more robust, but they suffer a large computational load, especially in high-dimensional state space. Although considerable work has already been done, a more effective optimization method is still needed for robust visual tracking.
Recently, PSO (particle swarm optimization) [17, 18, 19, 20], a new population based stochastic optimization technique, has received more and more attention because of its considerable success in solving nonlinear, non-differentiable, multimodal optimization problems. Unlike other particle based stochastic optimization techniques such as genetic algorithms, the particles in PSO interact locally with one another and with their environment in analogy with the 'cognitive' and 'social' aspects of animal populations, found in fish schooling, bird flocking, and insect swarming. Starting from a diffuse population, now called a swarm, individuals, now termed particles, move about the search space and eventually cluster in the regions where the optima are located. The advantages of this mechanism are, on one hand, the robustness and sophistication of the group behavior, and on the other hand, the simplicity and low cost of the computation associated with each particle.
978-1-4244-2243-2/08/$25.00 ©2008 IEEE
In view of the foregoing discussion, we propose a sequential PSO based tracking framework. To the best of our knowledge, the proposed framework is new in the tracking literature. The main contributions of this paper are as follows. The sequential information in the tracking process is effectively incorporated into the PSO method to form a robust tracking framework, in which any appearance model can be used. Meanwhile, we show theoretically that, in a Bayesian inference view, the PSO iterations are essentially a swarm-intelligence guided multi-layer importance sampling strategy which incorporates the new observations into the sampling stage, and thus avoids the sample impoverishment problem suffered by the particle filter.
The rest of this paper is structured as follows. A brief introduction to the traditional PSO algorithm is presented in Section 2. In Section 3, the proposed sequential PSO framework is described in detail. Section 4 gives a Bayesian inference interpretation of sequential PSO, and Section 5 presents the proposed tracking algorithm in the sequential PSO framework. Experimental results are shown in Section 6, and Section 7 is devoted to the conclusion.
2. Particle Swarm Optimization
Particle swarm optimization, originally developed by Kennedy and Eberhart in 1995 [17], is a population based stochastic optimization technique, which is inspired by the social behavior of bird flocking. In detail, a PSO algorithm is initialized with a group of random particles $\{x^{i,0}\}_{i=1}^{N}$ (N is the number of particles). Each particle $x^{i,0}$ has a corresponding fitness value which is evaluated by the observation model $f(x^{i,0})$, and has a relevant velocity $v^{i,0}$ which directs the movement of the particle. In each iteration, the ith particle moves with the adaptable velocity $v^{i,0}$, which is a function of the best state found by that particle ($p^i$, for individual best), and of the best state found so far among all particles ($g$, for global best). Given these two best values, the particle updates its velocity and state with the following equations in the nth iteration (as shown in Fig.1),
$$v^{i,n+1} = X\,(v^{i,n} + \varphi_1 u_1 (p^i - x^{i,n}) + \varphi_2 u_2 (g - x^{i,n})) \tag{1}$$
$$x^{i,n+1} = x^{i,n} + v^{i,n+1} \tag{2}$$
where $\varphi_1, \varphi_2$ are acceleration constants, $u_1, u_2 \in (0, 1)$ are uniformly distributed random numbers, and $X$ is a constriction factor to confine the velocity within a reasonable range: $\|v^{i,n}\| \le v^{max}$. In Equation (1), the three different parts represent inertial velocity, cognitive effect and social effect respectively.
Figure 1. The nth iteration of particle i
After the nth iteration, the fitness value of each particle is evaluated by a predefined observation model as follows.
$$f(x^{i,n+1}) = p(o^{i,n+1} \mid x^{i,n+1}) \tag{3}$$
where $o^{i,n+1}$ is the observation corresponding to the state $x^{i,n+1}$. Then the individual best and global best of particles are updated in the following equations:
$$p^i = \begin{cases} x^{i,n+1}, & \text{if } f(x^{i,n+1}) > f(p^i) \\ p^i, & \text{else} \end{cases} \tag{4}$$
$$g = \arg\max_{p^i} f(p^i) \tag{5}$$
In this way, the particles search for the optima (here, assuming optimization means maximizing) through the above iterations until convergence. In the PSO algorithm, there are several parameters to be tuned: the constriction factor $X$, the maximum velocity $v^{max}$, the acceleration constants $\varphi_1, \varphi_2$, the maximum number of iterations $T$, and the initialization of the particles.
3. Sequential Particle Swarm Optimization
3.1. Motivation
In this section, an interpretation of the tracking process
in a stochastic optimization view is presented to show why
PSO can achieve good performance in tracking applications.
Essentially, visual tracking is the successive localization
of a specific region in a video sequence. Let’s consider the
following version of the tracking problem: suppose there
is a groundtruth corresponding to the object (food) in the
image (state space) being searched. Suppose a group of
particles (birds) are randomly generated in the image (state
space), and none of the particles (birds) knows where the
object (food) is. But each particle (bird) knows how far it
is from the object (food) by evaluating the observation in
each iteration. What is the best strategy to find the object
(food), and how can the information obtained by each particle (bird) be used efficiently? The PSO framework, inspired
by the swarm intelligence–birds flocking, provides an effective way to answer these questions, which motivates us
to design a PSO based framework for robust and efficient
visual tracking.
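To make the search strategy concrete, the canonical PSO update of Section 2 (Eqs. 1 to 5) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `pso_maximize`, the search box `[lo, hi]`, the default parameter values and the use of a norm clamp as the constriction factor are all assumptions made for the example.

```python
import random

def pso_maximize(f, dim, n_particles=20, n_iters=50, phi1=2.0, phi2=2.0,
                 v_max=1.0, lo=-5.0, hi=5.0, seed=0):
    """Maximize f with the update rules of Eqs. (1)-(5) (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initial states and velocities (initialization scheme is an assumption).
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                      # individual bests p^i
    pbest_f = [f(x) for x in xs]
    gi = max(range(n_particles), key=lambda i: pbest_f[i])
    g, g_f = pbest[gi][:], pbest_f[gi]              # global best g

    for _ in range(n_iters):
        for i in range(n_particles):
            u1, u2 = rng.random(), rng.random()
            v = [vs[i][d]
                 + phi1 * u1 * (pbest[i][d] - xs[i][d])
                 + phi2 * u2 * (g[d] - xs[i][d]) for d in range(dim)]
            # Constriction: rescale so that ||v|| <= v_max.
            norm = sum(c * c for c in v) ** 0.5
            chi = v_max / norm if norm > v_max else 1.0
            vs[i] = [chi * c for c in v]                        # Eq. (1)
            xs[i] = [xs[i][d] + vs[i][d] for d in range(dim)]   # Eq. (2)
            fx = f(xs[i])                                       # Eq. (3)
            if fx > pbest_f[i]:                                 # Eq. (4)
                pbest[i], pbest_f[i] = xs[i][:], fx
        gi = max(range(n_particles), key=lambda i: pbest_f[i])  # Eq. (5)
        g, g_f = pbest[gi][:], pbest_f[gi]
    return g, g_f
```

In a tracking setting, `f` would be the observation likelihood of a candidate state; here any function that returns larger values for better states can be plugged in.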
However, in tracking applications, the data is typically a time sequence, and hence the task is essentially a dynamic optimization problem which distinguishes it from traditional optimization problems. In this case, the cost function is influenced by both the object state and the time, and optima may shift spatially, change both height and shape, or come into or go out of existence over time.
To effectively tackle such a dynamic optimization problem, we need to answer these questions: a) how to utilize the temporal continuity information between two consecutive frames, b) how to maintain the diversity of the particles in the optimization process.
[Figure 2: flowchart. Initialization of particles; particles converged at time t; t = t + 1; Stage I: random propagation; Stage II: fitness evaluation, update particles and parameters; Stage III: check whether to stop the iteration or not (Y/N).]
Figure 2. Overview of the sequential PSO algorithm
3.2. Sequential PSO Based Framework
Motivated by the above discussion, we propose a sequential PSO based framework for visual tracking. To give a clear view, the flowchart of the sequential PSO based framework is schematically shown in Fig.2. First, the individual bests of the particles from the previous optimization round are randomly propagated to enhance their diversity. Then, the modified PSO with adaptively tuned parameters is carried out. Finally, an effective convergence criterion is checked to decide whether the PSO iteration stops or not. There are three major stages in the sequential PSO based framework: random propagation, adaptive PSO and convergence criterion, which are described in the following sections.
3.2.1 Random Propagation
When PSO is applied to such dynamic optimization problems, the major difficulty is the diversity loss of particles due to the convergence of the previous optimization process. Thus, a re-diversification mechanism must be employed when the particles are propagated to the next image frame.
An effective re-diversification mechanism needs prior knowledge of the object motion. In this paper, the particle set is randomly propagated according to a Gaussian transition model whose mean is the previous individual best particle and whose covariance matrix is determined by the predicted velocity of the object motion.
Given the individual best of particle set $\{p_t^i\}_{i=1}^{N}$ converged at time t, the re-diversification strategy is carried out as follows.
$$x_{t+1}^{i,0} \sim \mathcal{N}(p_t^i, \Sigma) \tag{6}$$
where $\Sigma$ is the covariance matrix of the Gaussian distribution, whose diagonal elements are proportional to the predicted velocity $v_t^{pred}$ of the optimum at time t.
$$v_t^{pred} = g_{t-1} - g_{t-2} \tag{7}$$
Meanwhile, the velocity $v_{t+1}^{i,0}$ is sampled from the uniform distribution $U(0, v_t^{pred})$.
In our re-diversification strategy, a resampling process is not needed because the individual best of the particle set converged at time t provides a compact sample set for propagation (for the reason, please see Section 3.2.3). Although random propagation according to the predicted velocity is simple, it is sufficient because it is only used to produce an initial value for a subsequent search for the optimal state.
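The propagation step of Eqs. (6) and (7) can be sketched as below. The text only states that the diagonal of Σ is proportional to the predicted velocity, so the proportionality constant `scale`, the variance floor `min_var` and the function name `propagate` are assumptions introduced for this illustration.

```python
import math
import random

def propagate(pbest_prev, g_prev1, g_prev2, scale=1.0, min_var=1e-6, seed=None):
    """Re-diversify the converged individual bests for the next frame.

    pbest_prev: individual bests p_t^i; g_prev1 = g_{t-1}; g_prev2 = g_{t-2}.
    """
    rng = random.Random(seed)
    v_pred = [a - b for a, b in zip(g_prev1, g_prev2)]            # Eq. (7)
    # Assumption: per-dimension variance = scale * |v_pred|, floored at min_var.
    stds = [math.sqrt(max(scale * abs(v), min_var)) for v in v_pred]
    # Eq. (6): Gaussian centred on each individual best, diagonal covariance.
    xs = [[rng.gauss(p[d], stds[d]) for d in range(len(p))] for p in pbest_prev]
    # Initial velocities drawn uniformly between 0 and v_pred in each dimension.
    vs = [[rng.uniform(0.0, v) for v in v_pred] for _ in pbest_prev]
    return xs, vs, v_pred
```

A larger predicted velocity therefore widens the initial particle cloud, which is what lets the swarm cover a fast-moving target before the PSO iterations start.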
3.2.2 Adaptive PSO
A drawback of the aforementioned version of PSO is the lack of a reasonable mechanism for controlling the acceleration parameters $\varphi_1, \varphi_2$ and the maximum velocity $v_t^{max}$, fostering the danger of swarm explosion and divergence especially in high-dimensional state space. To overcome this deficiency, we propose a modified PSO by self-tuning its parameters, where the acceleration parameters $\varphi_1, \varphi_2$ can be set as follows,
$$\varphi_1 = 2f(p^i)/(f(p^i) + f(g)) \tag{8}$$
$$\varphi_2 = 2f(g)/(f(p^i) + f(g)) \tag{9}$$
Compared with the canonical PSO in [17], which constantly sets these parameters to 2, our strategy is more reasonable. Meanwhile, equations (8) and (9) demonstrate that the preference for the 'cognitive' part or the 'social' part is determined by their fitness values.
In tracking applications, the maximum velocity $v^{max}$ provides a reasonable bound in order to cover the particle's maximum motion and prevent the particle from moving arbitrarily. Traditionally, the maximum velocity $v^{max}$ is set to a predefined constant. However, this is not reasonable when the object has an arbitrary motion. Therefore, we propose a novel scheme for selecting $v_t^{max}$ based on the predicted velocity $v_t^{pred}$ of the optimum.
$$v_t^{max} = 1.2 * v_t^{pred} \tag{10}$$
So $X$ is set to
$$X = \begin{cases} \|v_t^{max}\| / \|v_t^{i,n+1}\|, & \text{if } \|v_t^{i,n+1}\| > \|v_t^{max}\| \\ 1, & \text{else} \end{cases} \tag{11}$$
In this way, the maximum velocity $v^{max}$ is heuristically selected by utilizing the motion information in the previous tracking process, and thus provides a reasonable limit on the movement of particles and a certain capability to absorb their acceleration.
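As a small worked sketch of Eqs. (8) to (11), the two helper functions below compute the adaptive acceleration parameters and apply the constriction factor. The names `acceleration` and `constrict` are hypothetical, and the code assumes strictly positive fitness values (they are likelihoods in the text) so that the denominators are non-zero.

```python
def acceleration(f_p, f_g):
    """Eqs. (8)-(9): weights from the fitness of the individual best (f_p)
    and of the global best (f_g). Assumes f_p + f_g > 0."""
    total = f_p + f_g
    return 2.0 * f_p / total, 2.0 * f_g / total

def constrict(v, v_pred, factor=1.2):
    """Eqs. (10)-(11): rescale v so that its norm does not exceed
    ||factor * v_pred||; velocities already within the bound are unchanged."""
    def norm(a):
        return sum(c * c for c in a) ** 0.5
    v_max = factor * norm(v_pred)      # ||v_t^max|| with v_t^max = 1.2 * v_t^pred
    n = norm(v)
    chi = v_max / n if n > v_max else 1.0
    return [chi * c for c in v]
```

Note that the two weights always sum to 2, so a particle whose individual best is much worse than the global best is pulled almost entirely by the 'social' term.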
Figure 3. The convergence criterion of the sequential PSO algorithm
3.2.3 Convergence Criterion
The goal of tracking is to find the object as soon as possible. It is not necessary for all the particles to converge to the object. As a result, the convergence criterion is designed as follows: $f(g_t) > T_h$, where $T_h$ is a predefined threshold, and all the individual bests $\{p_t^i\}_{i=1}^{N}$ are in a neighborhood of $g_t$, as shown in Fig.3, or the maximum iteration number is reached. According to this criterion, the object to be searched can be efficiently identified and the convergent particle set $\{p_t^i\}_{i=1}^{N}$ provides a compact initialization without sample impoverishment for the next optimization process, and the temporal continuity information can be naturally incorporated into the sequential PSO framework.
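The stopping rule above can be written as a short predicate. The text does not specify the size or shape of the neighborhood, so the Euclidean ball of radius `radius` used here is an assumption, as is the function name `converged`.

```python
def converged(pbest, g, f_g, threshold, radius, n_iter, max_iter):
    """Convergence test of Section 3.2.3 (illustrative sketch).

    Stops when the iteration budget is spent, or when the global best is fit
    enough (f_g > threshold) and every individual best lies within `radius`
    of the global best g.
    """
    if n_iter >= max_iter:
        return True
    if f_g <= threshold:
        return False
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return all(dist(p, g) <= radius for p in pbest)
```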
4. A Bayesian Inference Interpretation of Sequential PSO
We investigate the sequential PSO in a Bayesian inference view, and find that sequential PSO is a unified framework which combines multi-layer importance sampling and the particle filter. The multi-layer importance sampling stage incorporates the newest observations into the importance sampling process to approximate the 'optimal' proposal distribution $p(x_t \mid x_{t-1}^i, o_t)$ [21].
To make this paper self-contained, we first briefly review the standard particle filter and its major limitation, which are described in more detail in [22]. We then present multi-layer importance sampling carried out by PSO iterations.
4.1. Standard Particle Filter
The particle filter [22] is an online Bayesian inference process for estimating the unknown state $x_t$ at time t from a sequence of observations $o_{1:t}$ perturbed by noise. A dynamic state-space form employed in the Bayesian inference framework is shown as follows,
$$\text{state transition model:}\quad x_t = f_t(x_{t-1}, \epsilon_t) \leftrightarrow p(x_t \mid x_{t-1}) \tag{12}$$
$$\text{observation model:}\quad o_t = h_t(x_t, \nu_t) \leftrightarrow p(o_t \mid x_t) \tag{13}$$
where $x_t, o_t$ represent the system state and the observation, and $\epsilon_t, \nu_t$ are the system noise and the observation noise. $f_t(\cdot,\cdot)$ and $h_t(\cdot,\cdot)$ are the state transition and observation models, which are determined by the probability distributions $p(x_t \mid x_{t-1})$ and $p(o_t \mid x_t)$ respectively. The key idea of the particle filter is to approximate the posterior probability distribution $p(x_t \mid o_{1:t})$ by a set of weighted samples $\{x_t^i, w_t^i\}_{i=1}^{N}$, which are sampled from a proposal distribution $q(\cdot)$, i.e. $x_t^i \sim q(x_t \mid x_{t-1}^i, o_{1:t})$, $(i = 1, \cdots, N)$, and then each particle's weight is set to
$$w_t^i \propto \frac{p(o_t \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{q(x_t \mid x_{t-1}^i, o_{1:t})} \tag{14}$$
Finally, the posterior probability distribution is approximated as $p(x_t \mid o_{1:t}) = \sum_{i=1}^{N} w_t^i\, \delta(x_t - x_t^i)$, where $\delta(\cdot)$ is the Dirac function.
Figure 4. An illustration of importance sampling (left: sample from $p(x_t \mid x_{t-1})$, right: after PSO iterations)
The proposal distribution $q(\cdot)$ is critically important for a successful particle filter since it determines whether the sampled particles are placed in the useful area where the posterior is significant. In practice, the dynamic transition distribution $p(x_t \mid x_{t-1})$ is usually taken as the proposal distribution. However, this is unreasonable when $p(x_t \mid x_{t-1})$ lies in the tail of $p(o_t \mid x_t)$ (as shown in Fig.4). In fact, Doucet et al. [21] show that the 'optimal' proposal distribution is $p(x_t \mid x_{t-1}^i, o_t)$. So the question is how to incorporate the current observation $o_t$ into the transition model to form an effective proposal distribution at a reasonable computational cost.
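The weight update of Eq. (14) can be illustrated with a few lines of code. This is a sketch for scalar states only: the densities are passed in as plain functions, and the names `gauss_pdf` and `importance_weights` are hypothetical helpers, not part of any library.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Univariate Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def importance_weights(samples, prev, likelihood, transition, proposal):
    """Eq. (14): w^i proportional to p(o_t|x_t^i) p(x_t^i|x_{t-1}^i) / q(x_t^i|x_{t-1}^i, o_{1:t}).

    likelihood(x), transition(x, x_prev) and proposal(x, x_prev) return densities;
    the observation is assumed to be baked into `likelihood` and `proposal`.
    The returned weights are normalized to sum to one.
    """
    raw = [likelihood(x) * transition(x, xp) / proposal(x, xp)
           for x, xp in zip(samples, prev)]
    total = sum(raw)
    return [w / total for w in raw]
```

When the transition density is used as the proposal, the ratio cancels and the weights reduce to the normalized likelihoods, which is why samples that fall in the tail of $p(o_t \mid x_t)$ all receive negligible weight.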
4.2. Multi-layer Importance Sampling and Incorporation of Measurement
From the above description, we can see that the sequential PSO is a combination of the PSO iterations and the particle filtering procedure. Unlike the traditional particle filter algorithm, which directly samples the particles from the state transition distribution, the PSO iterations employed in our framework are essentially a multi-layer importance sampling stage which progressively updates the sampled particles based on the newest observations.
Specifically, the initial particles in the sequential PSO method are first sampled from the transition distribution as follows.
$$x_{t+1}^{i,0} \sim \mathcal{N}(p_t^i, \Sigma) \tag{15}$$
In each PSO iteration, the particles are updated according to the newest observations. The detail of the multi-layer importance sampling strategy is presented in Algorithm 1.
As shown in Fig.4, the particles directly sampled from the transition model are situated in the tail of the observation likelihood. In contrast, through the PSO iterations, the particles are moved towards the region where the likelihood of observation has larger values, and are finally relocated to the dominant modes of the likelihood.
Algorithm 1 Multi-layer Importance Sampling
1. Initialization: $X_{t+1} = \{x_{t+1}^{i,0}\}_{i=1}^{N}$
2. for n = 0 : T do
3.   Carry out the PSO iteration based on Equ.(1)(2):
     $v_{t+1}^{i,n+1} = X\,(v_{t+1}^{i,n} + \varphi_1 u_1 (p_{t+1}^{i} - x_{t+1}^{i,n}) + \varphi_2 u_2 (g_{t+1} - x_{t+1}^{i,n}))$
     $x_{t+1}^{i,n+1} = x_{t+1}^{i,n} + v_{t+1}^{i,n+1}$
4.   Evaluation of fitness values:
     $f(x_{t+1}^{i,n+1}) = p(o_{t+1}^{i,n+1} \mid x_{t+1}^{i,n+1})$
5.   Incorporation of the current observations:
     $p_{t+1}^{i} = \begin{cases} x_{t+1}^{i,n+1}, & \text{if } f(x_{t+1}^{i,n+1}) > f(p_{t+1}^{i}) \\ p_{t+1}^{i}, & \text{else} \end{cases}$
     $g_{t+1} = \arg\max_{p_{t+1}^{i}} f(p_{t+1}^{i})$
6.   Update the parameters.
7.   Check the convergence criterion: if satisfied, break;
8. end for
9. Output the particle set $X_{t+1} = \{x_{t+1}^{i,n}\}_{i=1}^{N}$
5. Proposed Tracking Algorithm
In this section, we introduce the proposed tracking algorithm and demonstrate how the aforementioned sequential
PSO framework is adopted for tracking. Our algorithm localizes the tracked object in each image frame using a rectangular window, and the motion of a tracked object between
two consecutive frames is approximated by an affine image
warping. Specifically, the motion is characterized by the
state of the particle xt = (x, y, θ, s, α, β) where {x, y} denote the 2-D translation parameters and {θ, s, α, β} are deformation parameters. Moreover, the fitness value of each
particle is evaluated by a spatial constraint MOG (mixture
of Gaussian) based appearance model. In the following
parts, we first introduce the spatial constraint MOG based
appearance model, then give a detailed description of the
proposed tracking algorithm in the sequential PSO based
framework.
5.1. Spatial Constraint MOG Based Appearance Model
The appearance of the target is modeled by a spatial constraint MOG, with the parameters estimated by an online
EM algorithm.
1) Appearance Model: Similar to [5], [23], the appearance model consists of three components S, W, F, where the S component captures temporally stable images, the W component characterizes the two-frame variations, and the F component is a fixed template of the target to prevent the model from drifting away. However, this appearance model treats each pixel independently and discards the spatial layout of the target. So it may fail when, for instance, there are several similar objects close to the target or the target is partially occluded. In our work, we apply a 2-D Gaussian spatial constraint to the SWF based appearance model, whose mean vector is the coordinate of the center position and whose covariance matrix has diagonal elements proportional to the size of the target in the corresponding spatial directions, as illustrated in Fig. 5.
Figure 5. A 2-D Gaussian spatial constraint MOG based appearance model
Thus the fitness value of particles can be evaluated by the following appearance model,
$$f(x_t) = p(o_t \mid x_t) = \prod_{j=1}^{d} \Big\{ \mathcal{N}(x(j); x_c, \Sigma_c) * \sum_{l=s,w,f} \pi_{l,t}(j)\, \mathcal{N}(o_t(j); \mu_{l,t}(j), \sigma_{l,t}^{2}(j)) \Big\} \tag{16}$$
where $\{\pi_{l,t}, \mu_{l,t}, \sigma_{l,t}^{2}, l = s, w, f\}$ represent the mixture probabilities, mixture centers and mixture variances of the S, W, F components respectively, $o_t$ is the candidate region corresponding to the state of particle $x_t$ and $d$ is the number of pixels inside $o_t$. $x(j)$, $x_c$ and $\Sigma_c$ represent the coordinate of the pixel $j$, the center coordinate of the target and the covariance matrix in the spatial space. Here, $\mathcal{N}(x; \mu, \sigma^2)$ is a Gaussian density defined as follows.
$$\mathcal{N}(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\Big\{ -\frac{(x - \mu)^2}{2\sigma^2} \Big\} \tag{17}$$
This spatial constraint appearance model is based on the assumption that the closer a pixel is to the center, the more important it is for the model representation. Fortunately, such an assumption is usually satisfied in real applications.
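A minimal sketch of Eqs. (16) and (17) is given below. It makes several assumptions that are not stated in the text: the product over pixels is evaluated as a sum of logarithms to avoid numerical underflow, $\Sigma_c$ is treated as diagonal so that the 2-D spatial Gaussian factorizes, and the names `normal_pdf`, `log_fitness` and the layout of `mix` are chosen only for this illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Eq. (17): univariate Gaussian density with variance var."""
    return (2.0 * math.pi * var) ** -0.5 * math.exp(-(x - mu) ** 2 / (2.0 * var))

def log_fitness(obs, coords, center, spatial_var, mix):
    """Logarithm of Eq. (16) for one candidate region.

    obs[j]      : gray value of pixel j
    coords[j]   : (row, col) of pixel j
    center      : (row, col) of the target center x_c
    spatial_var : diagonal of Sigma_c (assumed diagonal)
    mix[l]      : (pi, mu, var), three per-pixel lists, for l in 's', 'w', 'f'
    """
    total = 0.0
    for j, (o, (r, c)) in enumerate(zip(obs, coords)):
        # Spatial weight N(x(j); x_c, Sigma_c), factorized over rows and columns.
        spatial = (normal_pdf(r, center[0], spatial_var[0])
                   * normal_pdf(c, center[1], spatial_var[1]))
        # Three-component mixture over the S, W, F appearance components.
        mixture = sum(mix[l][0][j] * normal_pdf(o, mix[l][1][j], mix[l][2][j])
                      for l in ("s", "w", "f"))
        total += math.log(spatial * mixture)
    return total
```

Since the logarithm is monotonic, ranking particles by `log_fitness` gives the same individual and global bests as ranking them by Eq. (16) directly.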
2) Parameter Estimation: In order to make the model parameters depend more heavily on the most recent observation, we assume that the previous appearance is exponentially forgotten and new information is gradually added to the appearance model. To avoid having to store all the data from previous frames, an online EM algorithm is used to estimate the parameters of the S, W, F components as follows.
Step1: During the E-step, the ownership probability of each component is computed as
$$m_{l,t}(j) \propto \pi_{l,t}(j)\, \mathcal{N}(o_t(j); \mu_{l,t}(j), \sigma_{l,t}^{2}(j)) \tag{18}$$
which fulfills $\sum_{l=s,w,f} m_{l,t} = 1$.
Step2: The mixing probability of each component is estimated as
$$\pi_{l,t+1}(j) = \alpha\, m_{l,t}(j) + (1 - \alpha)\, \pi_{l,t}(j); \quad l = s, w, f \tag{19}$$
and a recursive form for the moments $\{M_{k,t+1}; k = 1, 2\}$ is evaluated as
$$M_{k,t+1}(j) = \alpha\, o_t^{k}(j)\, m_{s,t}(j) + (1 - \alpha)\, M_{k,t}(j); \quad k = 1, 2 \tag{20}$$
where $\alpha = 1 - e^{-1/\tau}$ acts as a forgetting factor and $\tau$ is a predefined constant.
Step3: Finally, the mixture centers and the variances are estimated in the M-step
$$\mu_{s,t+1}(j) = \frac{M_{1,t+1}(j)}{\pi_{s,t+1}(j)}, \quad \sigma_{s,t+1}^{2}(j) = \frac{M_{2,t+1}(j)}{\pi_{s,t+1}(j)} - \mu_{s,t+1}^{2}(j)$$
$$\mu_{w,t+1}(j) = o_t(j), \quad \sigma_{w,t+1}^{2}(j) = \sigma_{w,1}^{2}(j)$$
$$\mu_{f,t+1}(j) = \mu_{f,1}(j), \quad \sigma_{f,t+1}^{2}(j) = \sigma_{f,1}^{2}(j)$$
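The three steps above can be sketched for a single pixel as follows. This is an illustration under assumptions: the function name `online_em_step`, the dictionary layout keyed by `'s'`, `'w'`, `'f'`, and the default value of `tau` are not taken from the text, and no safeguard against a vanishing S variance is included.

```python
import math

def normal_pdf(x, mu, var):
    """Univariate Gaussian density with variance var, as in Eq. (17)."""
    return (2.0 * math.pi * var) ** -0.5 * math.exp(-(x - mu) ** 2 / (2.0 * var))

def online_em_step(o, pi, mu, var, M1, M2, tau=20.0):
    """One online EM update for a single pixel with observation o.

    pi, mu, var: dicts keyed by 's', 'w', 'f'; M1, M2: running moments of S.
    """
    alpha = 1.0 - math.exp(-1.0 / tau)                 # forgetting factor
    # E-step, Eq. (18): ownership probabilities, normalized to sum to one.
    m = {l: pi[l] * normal_pdf(o, mu[l], var[l]) for l in "swf"}
    z = sum(m.values())
    m = {l: m[l] / z for l in m}
    # Eq. (19): mixing probabilities.
    pi_new = {l: alpha * m[l] + (1.0 - alpha) * pi[l] for l in "swf"}
    # Eq. (20): first and second moments of the stable component.
    M1_new = alpha * o * m["s"] + (1.0 - alpha) * M1
    M2_new = alpha * o * o * m["s"] + (1.0 - alpha) * M2
    # M-step: S is re-estimated, W follows the latest observation with its
    # initial variance, F keeps its initial mean and variance.
    mu_new, var_new = dict(mu), dict(var)
    mu_new["s"] = M1_new / pi_new["s"]
    var_new["s"] = M2_new / pi_new["s"] - mu_new["s"] ** 2
    mu_new["w"] = o
    return pi_new, mu_new, var_new, M1_new, M2_new
```

Only the two moments and the current parameters are carried from frame to frame, which is what removes the need to store past observations.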
5.2. Sequential PSO Based Tracking Algorithm
Sequential PSO has provided a general and effective
tracking framework. Therefore, we embed the spatial constraint MOG based appearance model into this framework
for the fitness value evaluation. The detail of the sequential
PSO based tracking algorithm is presented as follows.
Algorithm 2 Sequential PSO Based Tracking Algorithm
Input: Given the individual best particles $\{p_t^i\}_{i=1}^{N}$ at time t;
1. Randomly propagate the particle set to enhance its diversity according to the following transition model
   $x_{t+1}^{i,0} \sim \mathcal{N}(p_t^i, \Sigma)$
   where $\Sigma$ is a diagonal covariance matrix whose elements are the corresponding variances of affine parameters, i.e., $\sigma_x^2, \sigma_y^2, \sigma_\theta^2, \sigma_s^2, \sigma_\alpha^2, \sigma_\phi^2$.
2. The fitness value of each particle is evaluated by the spatial constraint MOG based observation model as follows.
   $f(x_{t+1}^{i,n}) = p(o_{t+1}^{i,n} \mid x_{t+1}^{i,n}), \quad i = 1 \cdots N, \; n = 0 \cdots T$
3. Update $\{p_{t+1}^i\}_{i=1}^{N}$ and $g_{t+1}$ by the fitness values obtained above, and update the parameters.
4. Carry out the PSO iteration based on Equ.(1),(2);
5. Check the convergence criterion: if satisfied, continue, otherwise go to step 2;
Output: Global optimum: $g_{t+1}$;
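To show how the steps of Algorithm 2 fit together, the following toy sketch tracks a 2-D point instead of an affine image region. It is not the authors' tracker: the fitness is a synthetic Gaussian bump centred on a known target position rather than the appearance model, the first two global bests are assumed to be known, and the floors on the variance and on the maximum velocity, the neighborhood `radius`, and all default values are assumptions made so that the example runs on its own.

```python
import math
import random

def track(targets, n_particles=30, max_iter=30, thresh=0.9, radius=3.0,
          sigma=5.0, seed=0):
    """Toy sequential PSO: follow a 2-D point whose fitness peak moves per frame."""
    rng = random.Random(seed)
    norm = lambda a: math.sqrt(sum(c * c for c in a))

    def fitness(x, target):
        # Synthetic likelihood: Gaussian bump around the true target position.
        return math.exp(-sum((a - b) ** 2 for a, b in zip(x, target)) / (2 * sigma ** 2))

    g_hist = [list(targets[0]), list(targets[0])]   # g_{t-2}, g_{t-1} (assumed known)
    pbest = [list(targets[0]) for _ in range(n_particles)]
    estimates = []
    for target in targets[1:]:
        v_pred = [a - b for a, b in zip(g_hist[-1], g_hist[-2])]     # Eq. (7)
        std = [math.sqrt(max(abs(v), 1.0)) for v in v_pred]          # floored variance
        v_max = max(1.2 * norm(v_pred), 1.0)                         # Eq. (10), floored
        # Step 1: random propagation of the previous individual bests.
        xs = [[rng.gauss(p[d], std[d]) for d in range(2)] for p in pbest]
        vs = [[rng.uniform(0.0, v) for v in v_pred] for _ in pbest]
        # Steps 2-3: evaluate fitness and initialize the bests for this frame.
        pbest = [x[:] for x in xs]
        pf = [fitness(x, target) for x in xs]
        for _ in range(max_iter):
            gi = max(range(n_particles), key=lambda i: pf[i])
            g, fg = pbest[gi][:], pf[gi]
            # Step 5: convergence criterion of Section 3.2.3.
            if fg > thresh and all(norm([a - b for a, b in zip(p, g)]) <= radius
                                   for p in pbest):
                break
            # Step 4: adaptive PSO iteration, Eqs. (1), (2), (8), (9), (11).
            for i in range(n_particles):
                phi1 = 2.0 * pf[i] / (pf[i] + fg)
                phi2 = 2.0 * fg / (pf[i] + fg)
                u1, u2 = rng.random(), rng.random()
                v = [vs[i][d] + phi1 * u1 * (pbest[i][d] - xs[i][d])
                     + phi2 * u2 * (g[d] - xs[i][d]) for d in range(2)]
                n = norm(v)
                chi = v_max / n if n > v_max else 1.0
                vs[i] = [chi * c for c in v]
                xs[i] = [xs[i][d] + vs[i][d] for d in range(2)]
                fx = fitness(xs[i], target)
                if fx > pf[i]:
                    pbest[i], pf[i] = xs[i][:], fx
        gi = max(range(n_particles), key=lambda i: pf[i])
        estimates.append(pbest[gi][:])      # output: global optimum g_{t+1}
        g_hist.append(pbest[gi][:])
    return estimates
```

For example, `track([(4.0 * t, 2.0 * t) for t in range(10)])` returns one estimated position per frame after the first; in a real tracker the call to `fitness` would be replaced by the appearance likelihood of the image patch selected by the particle state.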
6. Experiment Results
In our implementation, each candidate image corresponding to a particle is rectified to a 30×15 patch, and the
feature is a 450-dimension vector of gray level values subjected to zero-mean-unit-variance normalization. All of the
experiments are carried out on a CPU Pentium IV 3.2GHz
PC with 512M memory.
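The feature normalization mentioned above can be sketched as follows, assuming the 30×15 patch has already been flattened into a list of 450 gray values. The function name `normalize_patch` and the handling of a constant patch are assumptions for this illustration.

```python
def normalize_patch(pixels):
    """Zero-mean, unit-variance normalization of a flattened gray-level patch."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    # Assumption: a constant patch has zero variance, so fall back to 1.0
    # to avoid dividing by zero.
    std = var ** 0.5 or 1.0
    return [(p - mean) / std for p in pixels]
```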
6.1. Sequential PSO vs PF and UPF
First, we conduct a comparison experiment among the SPSO (sequential PSO) based tracking algorithm, a standard PF (particle filter) and its variation, the UPF (unscented particle filter) [24], on a video with manually labeled groundtruth. Then, a theoretical investigation is presented to show why SPSO has advantages over the other two algorithms.
This video sequence1 contains a human face moving to the left and right very quickly. Although the sequence is simple, it is effective in showing the claimed advantages of the SPSO framework. In our implementation, the parameters in the particle filter and unscented particle filter are set to $\{N = 600, \Sigma = \mathrm{diag}(8^2, 8^2, 0.02^2, 0.02^2, 0.002^2, 0.002^2)\}$, corresponding to the number of particles and the covariance matrix of the transition distribution respectively. To give a convincing comparison, the sequential PSO algorithm is run under the same setting, with the same covariance matrix and with 60 particles in each iteration.
1 The sequence is available at http://vision.stanford.edu/ birch/headtracker/seq/.
As shown in Fig.6(a), the particle filter based tracker fails to track the object at frame 19, because it cannot catch the rapid motion of the object. More particles and larger diagonal elements in the covariance matrix would improve its performance, but this strategy introduces more noise and a heavy computational load, and it may suffer from the curse of dimensionality when the dimension of the state increases. Fig.6(b) shows the tracking performance of the unscented particle filter, from which we can see that the tracker follows the object throughout the sequence, but the localization accuracy is unsatisfactory. In comparison, our method, which utilizes individual and environmental information in the search space, never loses the target and achieves the most accurate results. Furthermore, we have conducted a quantitative evaluation of these algorithms, comparing them in the following aspects: frames of successful tracking, and MSE (mean square error) between the estimated position and the labeled groundtruth. In Table 1, it is clear that the PF tracker fails at frame 19 while the UPF and SPSO trackers succeed in tracking throughout the sequence. Additionally, the SPSO tracker outperforms the UPF tracker in terms of accuracy.
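The position error used in this comparison can be computed as sketched below. The text does not say whether the reported figure is the mean squared distance or its square root, so this sketch assumes the plain mean of squared Euclidean distances; the function name `position_mse` is hypothetical.

```python
def position_mse(estimates, ground_truth):
    """Mean squared Euclidean distance between estimated and labeled (x, y) positions."""
    sq = [(ex - gx) ** 2 + (ey - gy) ** 2
          for (ex, ey), (gx, gy) in zip(estimates, ground_truth)]
    return sum(sq) / len(sq)
```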
A theoretical investigation shows the underlying reasons for the above experimental results. The undesired behavior of the particle filter in Fig.6 is caused by the sample impoverishment in its particle generation process. Let's focus on frame 19, where the PF tracker loses the target. Here, the particles are sampled from the Gaussian based transition distribution to catch the object motion. When the object has rapid and arbitrary motion, the particles drawn from this distribution do not cover a significant region of the likelihood (as shown in the top-left of Fig.7), and thus the weights of most particles are low, leading to the tracking failure. As for the unscented particle filter, the sigma-states are generated by UT (unscented transformation) and propagated (as shown in the top-right of Fig.7), and the weighted mean and covariance are calculated to form a better proposal distribution, thus enhancing the tracking performance to some degree. However, the estimation accuracy of UT is only to the second order for non-Gaussian data, which may not coincide with the actual motion and thus leads to inaccurate localization. Meanwhile, the generation of sigma-states and the updating of the covariance are time-consuming. In contrast, the SPSO framework extracts the local and global information in the particle configuration, and incorporates the newest observations into the proposal distribution, resulting in a better performance. The bottom row of Fig.7 shows the multi-layer importance sampling processes in the SPSO framework, which pull the particles to significant regions of the likelihood. As a result, the SPSO framework can handle this rapid motion even with a smaller particle number.
Figure 6. Tracking performances of a human face with rapid motion: (a) particle filter, (b) unscented particle filter, (c) sequential PSO
Tracking Framework | Frames Tracked | MSE of Position (by pixels)
PF                 | 18/31          | 17.069
UPF                | 31/31          | 6.975
SPSO               | 31/31          | 4.172
Table 1. Quantitative results of SPSO tracker and its comparison with PF tracker and UPF tracker
6.2. Tracking Results of Different Scenes
In order to further evaluate the performance of the proposed tracking framework, it is tested on three video sequences with different environments. The first video sequence contains a man walking across a lawn with a
cluttered background, large appearance and illumination
changes. In the second video sequence, a pedestrian walks
with a large pose change (bows down to reach the ground
and stands back up later). Both of these two video sequences are taken from moving cameras outdoors. The third
video sequence is a figure skating match, which contains a
figure skater with a drastic motion.
From Fig.8(a), we can see that the online updating scheme easily absorbs the appearance and illumination changes, and our tracking framework provides an effective solution to follow the walking man in the cluttered background, because the sequential PSO framework is very effective at finding the global optimum. Fig.8(b) shows the result of tracking the walking pedestrian, demonstrating the effectiveness of our framework in tracking under large pose changes. A tracking result of the figure skater with agile motions is shown in Fig.8(c), which demonstrates that our algorithm is able to track the object when large movements exist between two successive frames.
7. Conclusion
A new sequential particle swarm optimization framework for visual tracking has been proposed in this paper.
The sequential information required by the tracking process
is incorporated into the modified PSO to make this swarm
technique properly suited for tracking. In addition, we have
reformulated the SPSO framework in a Bayesian way, and
found that it is essentially a multi-layer importance sampling based particle filter. Furthermore, this framework has
been naturally extended to multi-object tracking as multimodal optimization. In experiments, the sequential PSO
based tracker is compared very favorably with the particle
filter and the unscented particle filter, both in terms of accuracy and efficiency, demonstrating that the sequential PSO
is a promising framework for visual tracking.
In summary, the sequential PSO provides a more reasonable mechanism and a more effective way to tackle
dynamic optimization problems than sequential Monte
Carlo methods. It therefore has many other potential applications
in computer vision, including image registration, template
matching and dynamic background modeling.
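The idea summarized in this section, re-diversifying the converged swarm before optimizing on each new frame, can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the tracker evaluated in this paper: the state is one-dimensional, the fitness is a negative squared distance to a known target standing in for an appearance-model likelihood, and the names `pso_iteration` and `track_1d` as well as the values of `w`, `c1`, `c2` and `sigma` are hypothetical choices made for illustration.

```python
import random


def pso_iteration(x, v, pbest, pbest_fit, fitness, rng, w=0.7, c1=1.5, c2=1.5):
    """Run one synchronous PSO update in place; return the swarm's best position."""
    best = max(range(len(x)), key=lambda i: pbest_fit[i])
    gbest = pbest[best]
    for i in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        # Inertia term plus attraction to the individual and global bests.
        v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (gbest - x[i])
        x[i] += v[i]
        f = fitness(x[i])
        if f > pbest_fit[i]:
            pbest[i], pbest_fit[i] = x[i], f
    best = max(range(len(x)), key=lambda i: pbest_fit[i])
    return pbest[best]


def track_1d(targets, n_particles=20, n_iters=50, sigma=1.0, seed=0):
    """Follow a moving 1-D target, one PSO run per 'frame' (toy setting)."""
    rng = random.Random(seed)
    swarm = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for target in targets:
        # Toy fitness (assumption): a real tracker would score an image
        # observation against an appearance model instead.
        def fitness(pos, t=target):
            return -(pos - t) ** 2

        # Sequential step: perturb the previous frame's individual bests with
        # Gaussian noise so the swarm regains diversity before optimizing.
        x = [p + rng.gauss(0.0, sigma) for p in swarm]
        v = [0.0] * n_particles
        pbest = list(x)
        pbest_fit = [fitness(p) for p in x]
        gbest = x[0]
        for _ in range(n_iters):
            gbest = pso_iteration(x, v, pbest, pbest_fit, fitness, rng)
        estimates.append(gbest)
        swarm = pbest  # carried over to the next frame
    return estimates


if __name__ == "__main__":
    path = [0.0, 1.0, 2.0, 3.0]
    for t, e in zip(path, track_1d(path)):
        print(f"target={t:.1f} estimate={e:.3f}")
```

In this sketch the only information passed between frames is the set of individual best positions; a fixed iteration count replaces any convergence test, which is a simplification.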
Acknowledgment
This work is partly supported by NSFC (Grant Nos.
60672040 and 60705003) and the National 863 High-Tech
R&D Program of China (Grant No. 2006AA01Z453).
Figure 7. Tracking procedure for frame 19 (panels: particle filter, unscented particle filter, sequential PSO)
Figure 8. More experimental results: (a) cluttered background, large appearance and illumination changes; (b) large pose changes; (c) drastic motion
References
[1] M. Kass, A. Witkin and D. Terzopoulos, “Snakes: active contour models”, IJCV, 1(4): 321-332, 1988.
[2] G. D. Hager and P. N. Belhumeur, “Efficient region tracking with parametric models of geometry and illumination”, IEEE Trans. on PAMI, 20(10): 1025-1039, 1998.
[3] D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking”, IEEE Trans. on PAMI, 25(5): 564-577, 2003.
[4] M. Isard and A. Blake, “Condensation: conditional density propagation for visual tracking”, IJCV, 29(1): 5-28, 1998.
[5] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi, “Robust online appearance models for visual tracking”, IEEE Trans. on PAMI, 25(10): 1296-1311, 2003.
[6] C. Rasmussen and G. D. Hager, “Probabilistic Data Association Methods for Tracking Complex Visual Objects”, IEEE Trans. on PAMI, 23(6): 560-576, 2001.
[7] T. Yu and Y. Wu, “Differential Tracking based on Spatial-Appearance Model (SAM)”, Proc. CVPR’06, pp. 720-727, 2006.
[8] Q. Zhao, S. Brennan and H. Tao, “Differential EMD Tracking”, Proc. ICCV’07, 2007.
[9] M. Dewan and G. D. Hager, “Toward Optimal Kernel-based Tracking”, Proc. CVPR’07, pp. 618-625, 2007.
[10] K. Fukunaga and L. Hostetler, “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition”, IEEE Trans. on Information Theory, 21(1): 32-40, 1975.
[11] C. Yang, R. Duraiswami and L. Davis, “Fast Multiple Object Tracking via a Hierarchical Particle Filter”, Proc. ICCV’05, pp. 212-219, 2005.
[12] M. Bray, E. K. Meier, N. N. Schraudolph and L. V. Gool, “Fast Stochastic Optimization for Articulated Structure Tracking”, Image and Vision Computing, 25(3): 352-364, 2007.
[13] A. P. Leung, S. G. Gong, “Optimizing Distribution-based Matching by Random Subsampling”, Proc. CVPR’07, pp. 1-8, 2007.
[14] A. Cuzol, E. Memin, “A stochastic filter for fluid motion tracking”, Proc. ICCV’05, pp. 396-402, 2005.
[15] B. Song, A. K. Chowdhury, “Stochastic Adaptive Tracking In A Camera Network”, Proc. ICCV’07, 2007.
[16] X. Zhang, W. Hu, S. Maybank, and X. Li, “Graph Based Discriminative Learning for Robust and Efficient Object Tracking”, Proc. ICCV’07, 2007.
[17] J. Kennedy and R. C. Eberhart, “Particle swarm optimization”, in Proc. IEEE Int’l Conf. on Neural Networks, pp. 1942-1948, 1995.
[18] M. Clerc and J. Kennedy, “The particle swarm - explosion, stability, and convergence in a multidimensional complex space”, IEEE Trans. on Evolutionary Computation, 6(1): 58-73, 2002.
[19] M. P. Wachowiak, R. Smolikova, and Y. Zheng, “An approach to multimodal biomedical image registration utilizing particle swarm optimization”, IEEE Trans. on Evolutionary Computation, 8(3): 289-301, 2004.
[20] D. Parrott and X. Li, “Locating and tracking multiple dynamic optima by a particle swarm model using speciation”, IEEE Trans. on Evolutionary Computation, 10(4): 440-458, 2006.
[21] A. Doucet, S. Godsill, and C. Andrieu, “On sequential Monte Carlo sampling methods for Bayesian filtering”, Statistics and Computing, 10(3): 197-208, 2000.
[22] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, “A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking”, IEEE Trans. on Signal Processing, 50(2): 174-188, 2002.
[23] S. Zhou, R. Chellappa, B. Moghaddam, “Visual Tracking and Recognition Using Appearance-adaptive Models in Particle Filters”, IEEE Trans. on IP, 13(11): 1434-1456, 2004.
[24] R. Merwe, A. Doucet, N. Freitas, and E. Wan, “The unscented particle filter”, Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, August 2000.