
Sequential particle swarm optimization for visual tracking

2008

Xiaoqin Zhang¹, Weiming Hu¹, Steve Maybank², Xi Li¹, Mingliang Zhu¹
¹ National Laboratory of Pattern Recognition, Institute of Automation, Beijing, China {xqzhang,wmhu,lixi,mlzhu}@nlpr.ia.ac.cn
² School of Computer Science and Information Systems, Birkbeck College, London, UK sjmaybank@dcs.bbk.ac.uk

978-1-4244-2243-2/08/$25.00 ©2008 IEEE

Abstract

Visual tracking usually involves an optimization process for estimating the motion of an object from measured images in a video sequence. In this paper, a new evolutionary approach, PSO (particle swarm optimization), is adopted for visual tracking. Since tracking is a dynamic optimization problem that is influenced simultaneously by the object state and by time, we propose a sequential particle swarm optimization framework that incorporates temporal continuity information into the traditional PSO algorithm. In addition, the PSO parameters are adapted according to the fitness values of the particles and the predicted motion of the tracked object, which improves performance in tracking applications. Furthermore, we show theoretically that, from a Bayesian inference viewpoint, the sequential PSO framework is in essence a particle filter based on multi-layer importance sampling. Experimental results demonstrate that, compared with the state-of-the-art particle filter and its variant, the unscented particle filter, the proposed tracking algorithm is more robust and effective, especially when the object moves arbitrarily or undergoes large appearance changes.

1. Introduction

Visual tracking has emerged as a central problem in many applications such as surveillance, vision-based control, human-computer interfaces, intelligent transportation, and augmented reality. Recent years have witnessed great advances in the literature, e.g. the snakes model [1], template matching [2], mean shift [3], condensation [4], appearance models [5], and probabilistic data association [6].

Most existing tracking algorithms can be formulated as an optimization process, which is typically tackled using either deterministic methods [1, 2, 3, 7, 8, 9] or stochastic methods [4, 11, 12, 13, 14, 15, 16]. Deterministic methods usually involve a gradient descent search to minimize a cost function. The snakes model introduced by Kass et al. [1] is a good example: the aim is to obtain a tight contour enclosing the object by minimizing an energy function. In [2], the cost function is defined as the sum of squared differences between the observation candidate and a fixed template, and the motion parameters are found by minimizing this cost function through a gradient descent search. Mean shift, which first appeared in [10] as an approach for estimating the gradient of a density function, was applied to visual tracking by Comaniciu et al. [3], where a cost function between two color histograms is minimized through mean shift iterations. In general, deterministic methods are computationally efficient, but they easily become trapped in local minima. In contrast, stochastic methods introduce random factors into the search process in order to have a higher probability of reaching the global optimum of the cost function. For example, in [11], object tracking is viewed as an online MAP (maximum a posteriori) problem, which is solved by randomly generating a large number of particles to find the maximum of the posterior distribution. Bray et al. [12] use the stochastic meta-descent strategy to adapt the step size of the gradient descent search, and thus avoid local minima of the optimization process in articulated structure tracking. Leung and Gong [13] incorporate random subsampling into mean shift tracking to boost its efficiency and robustness for low-resolution video sequences. Compared with their deterministic counterparts, stochastic methods are usually more robust, but they suffer from a large computational load, especially in high-dimensional state spaces. Although considerable work has been done, a more effective optimization method is still needed for robust visual tracking.

Recently, PSO (particle swarm optimization) [17, 18, 19, 20], a new population-based stochastic optimization technique, has received increasing attention because of its considerable success in solving nonlinear, nondifferentiable, multimodal optimization problems. Unlike other particle-based stochastic optimization techniques such as genetic algorithms, the particles in PSO interact locally with one another and with their environment, in analogy with the 'cognitive' and 'social' aspects of animal populations found in fish schooling, bird flocking, and insect swarming. Starting from a diffuse population, called a swarm, individuals, termed particles, move about the search space and eventually cluster in the regions where the optima are located. The advantages of this mechanism are, on the one hand, the robustness and sophistication of the group behavior, and on the other hand, the simplicity and low computational cost associated with each particle.

In view of the foregoing discussion, we propose a sequential PSO based tracking framework. To the best of our knowledge, this framework is new in the tracking literature. The main contributions of this paper are as follows. The sequential information in the tracking process is incorporated into the PSO method to form a robust tracking framework in which any appearance model can be used.
Meanwhile, we show theoretically that, from a Bayesian inference viewpoint, the PSO iterations are essentially a swarm-intelligence-guided multi-layer importance sampling strategy that incorporates the new observations into the sampling stage, and thus avoids the sample impoverishment problem suffered by the particle filter.

The rest of this paper is structured as follows. A brief introduction to the traditional PSO algorithm is presented in Section 2. In Section 3, the proposed sequential PSO framework is described in detail. Section 4 interprets sequential PSO from a Bayesian inference viewpoint, and Section 5 presents the proposed tracking algorithm in the sequential PSO framework. Experimental results are shown in Section 6, and Section 7 concludes the paper.

2. Particle Swarm Optimization

Particle swarm optimization, originally developed by Kennedy and Eberhart in 1995 [17], is a population-based stochastic optimization technique inspired by the social behavior of bird flocking. A PSO algorithm is initialized with a group of random particles $\{x^{i,0}\}_{i=1}^{N}$, where $N$ is the number of particles. Each particle $x^{i,0}$ has a fitness value $f(x^{i,0})$ evaluated by the observation model, and a velocity $v^{i,0}$ that directs its movement. In each iteration, the $i$th particle moves with an adaptable velocity, which is a function of the best state found so far by that particle ($p^i$, the individual best) and of the best state found so far among all particles ($g$, the global best). Given these two best values, in the $n$th iteration the particle updates its velocity and state (as shown in Fig. 1) according to

$$v^{i,n+1} = \chi\left(v^{i,n} + \varphi_1 u_1 (p^i - x^{i,n}) + \varphi_2 u_2 (g - x^{i,n})\right) \tag{1}$$

$$x^{i,n+1} = x^{i,n} + v^{i,n+1} \tag{2}$$

[Figure 1. The nth iteration of particle i.]

where $\varphi_1, \varphi_2$ are acceleration constants, $u_1, u_2 \in (0, 1)$ are uniformly distributed random numbers, and $\chi$ is a constriction factor that confines the velocity within a reasonable range, $\|v^{i,n}\| \le v^{\max}$. In Equation (1), the three terms represent the inertial velocity, the cognitive effect, and the social effect, respectively. After the $n$th iteration, the fitness value of each particle is evaluated by a predefined observation model:

$$f(x^{i,n+1}) = p(o^{i,n+1} \mid x^{i,n+1}) \tag{3}$$

where $o^{i,n+1}$ is the observation corresponding to the state $x^{i,n+1}$. The individual and global bests are then updated as

$$p^i = \begin{cases} x^{i,n+1}, & \text{if } f(x^{i,n+1}) > f(p^i) \\ p^i, & \text{otherwise} \end{cases} \tag{4}$$

$$g = \arg\max_{p^i} f(p^i) \tag{5}$$

In this way, the particles search for the optima (here, optimization means maximization) through the above iterations until convergence. Several PSO parameters must be tuned: the constriction factor $\chi$, the maximum velocity $v^{\max}$, the acceleration constants $\varphi_1, \varphi_2$, the maximum number of iterations $T$, and the initialization of the particles.

3. Sequential Particle Swarm Optimization

3.1. Motivation

In this section, the tracking process is interpreted from a stochastic optimization viewpoint to show why PSO can achieve good performance in tracking applications. Essentially, visual tracking is the successive localization of a specific region in a video sequence. Consider the following version of the tracking problem: there is a ground truth corresponding to the object (food) in the image (state space) being searched. A group of particles (birds) is randomly generated in the image, and none of the particles knows where the object is. However, each particle knows how far it is from the object by evaluating the observation in each iteration. What is the best strategy to find the object, and how can the information obtained by each particle be used efficiently?
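The bird-and-food thought experiment can be made concrete with the canonical iteration of Section 2. The sketch below is a minimal, illustrative implementation of Equations (1)-(5) in pure Python, not the authors' code. The function name `canonical_pso`, the search bounds, the toy fitness (negative squared distance to a hypothetical "food" location), and the norm-based choice of $\chi$ (rescaling so that $\|v\| \le v^{\max}$) are all assumptions made for illustration. It updates the global best as soon as a better individual best appears, which yields the same $g$ as evaluating Equation (5) after each particle.

```python
import math
import random


def canonical_pso(fitness, dim, n_particles=30, n_iters=100,
                  phi1=2.0, phi2=2.0, v_max=1.0, bounds=(-10.0, 10.0), seed=0):
    """Maximize `fitness` with the canonical PSO of Equations (1)-(5).

    Illustrative sketch: chi is chosen per particle so that ||v|| <= v_max.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                       # individual bests p^i
    pbest_f = [fitness(x) for x in xs]
    best = max(range(n_particles), key=lambda i: pbest_f[i])
    g, g_f = pbest[best][:], pbest_f[best]           # global best g

    for _ in range(n_iters):
        for i in range(n_particles):
            u1, u2 = rng.random(), rng.random()
            # Inertia + cognitive + social terms of Eq. (1), before constriction.
            raw = [vs[i][k]
                   + phi1 * u1 * (pbest[i][k] - xs[i][k])
                   + phi2 * u2 * (g[k] - xs[i][k]) for k in range(dim)]
            norm = math.sqrt(sum(c * c for c in raw))
            chi = v_max / norm if norm > v_max else 1.0   # keeps ||v|| <= v_max
            vs[i] = [chi * c for c in raw]
            xs[i] = [xs[i][k] + vs[i][k] for k in range(dim)]   # Eq. (2)
            f = fitness(xs[i])                                  # Eq. (3)
            if f > pbest_f[i]:                                  # Eq. (4)
                pbest[i], pbest_f[i] = xs[i][:], f
                if f > g_f:                                     # Eq. (5)
                    g, g_f = xs[i][:], f
    return g, g_f


# Usage: a hypothetical object ("food") at (3, -2) in a 2-D state space.
food = (3.0, -2.0)
best, best_f = canonical_pso(
    lambda x: -((x[0] - food[0]) ** 2 + (x[1] - food[1]) ** 2), dim=2)
```

No particle is told where the food is; each only sees its own fitness, and the swarm shares information through $p^i$ and $g$, which is exactly the mechanism the question above asks about.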
The PSO framework, inspired by the swarm intelligence of bird flocking, provides an effective answer to these questions, which motivates us to design a PSO-based framework for robust and efficient visual tracking. However, in tracking applications the data typically form a time sequence, so the task is essentially a dynamic optimization problem, which distinguishes it from traditional optimization problems. In this case, the cost function is influenced by both the object state and time, and optima may shift spatially, change in height and shape, or appear and disappear over time. To tackle such a dynamic optimization problem effectively, we need to answer two questions: a) how to utilize the temporal continuity information between two consecutive frames, and b) how to maintain the diversity of the particles during the optimization process.

3.2. Sequential PSO Based Framework

Motivated by the above discussion, we propose a sequential PSO based framework for visual tracking. Its flowchart is shown schematically in Fig. 2. First, the individual bests of the particles from the previous optimization round are randomly propagated to enhance their diversity. Then, a modified PSO with adaptively tuned parameters is carried out. Finally, a convergence criterion is checked to decide whether the PSO iteration stops. The framework thus has three major stages, random propagation, adaptive PSO, and the convergence criterion, which are described in the following sections.

[Figure 2. Overview of the sequential PSO algorithm. The flowchart shows initialization from the particles converged at time t, random propagation, fitness evaluation, update of particles and parameters, and a check of whether to stop iterating.]

3.2.1 Random Propagation

When PSO is applied to such dynamic optimization problems, the major difficulty is the loss of particle diversity caused by the convergence of the previous optimization process. A re-diversification mechanism must therefore be employed when the particles are propagated to the next image frame, and an effective re-diversification mechanism needs prior knowledge of the object motion. In this paper, the particle set is randomly propagated according to a Gaussian transition model whose mean is the previous individual best particle and whose covariance matrix is determined by the predicted velocity of the object. Given the set of individual bests $\{p_t^i\}_{i=1}^{N}$ converged at time $t$, the re-diversification strategy is carried out as follows:

$$x_{t+1}^{i,0} \sim \mathcal{N}(p_t^i, \Sigma) \tag{6}$$

where $\Sigma$ is the covariance matrix of the Gaussian distribution, whose diagonal elements are proportional to the predicted velocity $v_t^{pred}$ of the optimum at time $t$:

$$v_t^{pred} = g_{t-1} - g_{t-2} \tag{7}$$

Meanwhile, the velocity $v_{t+1}^{i,0}$ is sampled from the uniform distribution $U(0, v_t^{pred})$. No resampling is needed in this re-diversification strategy, because the set of individual bests converged at time $t$ already provides a compact sample set for propagation (see Section 3.2.3 for the reason). Although random propagation according to the predicted velocity is simple, it is sufficient, because it only produces initial values for the subsequent search for the optimal state.

3.2.2 Adaptive PSO

A drawback of the version of PSO described above is the lack of a reasonable mechanism for controlling the acceleration parameters $\varphi_1, \varphi_2$ and the maximum velocity $v_t^{\max}$, which raises the danger of swarm explosion and divergence, especially in high-dimensional state spaces.
To overcome this deficiency, we propose a modified PSO that tunes its own parameters. The acceleration parameters $\varphi_1, \varphi_2$ are set as follows:

$$\varphi_1 = \frac{2 f(p^i)}{f(p^i) + f(g)} \tag{8}$$

$$\varphi_2 = \frac{2 f(g)}{f(p^i) + f(g)} \tag{9}$$

Compared with the canonical PSO in [17], which sets both parameters to the constant 2, this strategy is more reasonable: Equations (8) and (9) make the preference for the 'cognitive' or the 'social' part depend on the corresponding fitness values.

In tracking applications, the maximum velocity $v^{\max}$ provides a reasonable bound that covers the particle's maximum motion and prevents the particle from moving arbitrarily. Traditionally, $v^{\max}$ is set to a predefined constant, which is unreasonable when the object moves arbitrarily. We therefore select $v_t^{\max}$ based on the predicted velocity $v_t^{pred}$ of the optimum:

$$v_t^{\max} = 1.2\, v_t^{pred} \tag{10}$$

and set $\chi$ to

$$\chi = \begin{cases} \|v_t^{\max}\| / \|v_t^{i,n+1}\|, & \text{if } \|v_t^{i,n+1}\| > \|v_t^{\max}\| \\ 1, & \text{otherwise} \end{cases} \tag{11}$$

In this way, the maximum velocity is selected heuristically from the motion information of the previous tracking process, which provides a reasonable limit on the movement of the particles and some capacity to absorb their acceleration.

3.2.3 Convergence Criterion

The goal of tracking is to find the object as quickly as possible; it is not necessary for all the particles to converge to the object. The convergence criterion is therefore: $f(g_t) > T_h$, where $T_h$ is a predefined threshold, and all the individual bests $\{p_t^i\}_{i=1}^{N}$ lie in a neighborhood of $g_t$, as shown in Fig. 3; or the maximum number of iterations is reached.

[Figure 3. The convergence criterion of the sequential PSO algorithm.]
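The parameter rules of Sections 3.2.2 and 3.2.3 are simple enough to state directly as code. The following is an illustrative sketch, not the authors' implementation; the helper names, the use of plain Python lists for state vectors, and the `radius` that defines the neighborhood of $g_t$ (which the text leaves unspecified) are assumptions. The iteration cap $T$ is left to the caller.

```python
import math
import random


def adaptive_acceleration(f_p, f_g):
    """Equations (8)-(9): split the acceleration between p^i and g by fitness."""
    total = f_p + f_g
    return 2.0 * f_p / total, 2.0 * f_g / total


def max_velocity(g_prev, g_prev2, factor=1.2):
    """Equations (7) and (10): v_max = 1.2 * v_pred, v_pred = g_{t-1} - g_{t-2}."""
    return [factor * (a - b) for a, b in zip(g_prev, g_prev2)]


def constriction(v_raw, v_max):
    """Equation (11): rescale v_raw to length ||v_max|| if longer, else chi = 1."""
    n_raw = math.sqrt(sum(c * c for c in v_raw))
    n_max = math.sqrt(sum(c * c for c in v_max))
    return n_max / n_raw if n_raw > n_max else 1.0


def adaptive_velocity(v, x, p, g, f_p, f_g, v_max, rng):
    """One velocity update: Eq. (1) with phi from Eqs. (8)-(9), chi from Eq. (11)."""
    phi1, phi2 = adaptive_acceleration(f_p, f_g)
    u1, u2 = rng.random(), rng.random()
    raw = [vk + phi1 * u1 * (pk - xk) + phi2 * u2 * (gk - xk)
           for vk, xk, pk, gk in zip(v, x, p, g)]
    chi = constriction(raw, v_max)
    return [chi * c for c in raw]


def converged(f_g, pbests, g, threshold, radius):
    """Section 3.2.3: f(g_t) > T_h and every p^i lies within `radius` of g_t."""
    return f_g > threshold and all(math.dist(p, g) <= radius for p in pbests)


# Usage with hypothetical values: predicted motion (2, 1) between frames.
rng = random.Random(0)
v_max = max_velocity([12.0, 7.0], [10.0, 6.0])
v_new = adaptive_velocity([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [3.0, 2.0],
                          f_p=0.4, f_g=0.8, v_max=v_max, rng=rng)
```

Taken literally, Equation (10) gives $v_t^{\max} = 0$ for a stationary object, in which case `constriction` returns 0 and the swarm freezes; the text does not say how this case is handled, and the sketch does not guard against it.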
With this criterion, the object can be identified efficiently, the converged particle set $\{p_t^i\}_{i=1}^{N}$ provides a compact initialization without sample impoverishment for the next optimization process, and the temporal continuity information is naturally incorporated into the sequential PSO framework.

4. A Bayesian Inference Interpretation of Sequential PSO

We investigate sequential PSO from a Bayesian inference viewpoint and find that it is a unified framework combining multi-layer importance sampling with the particle filter. The multi-layer importance sampling stage incorporates the newest observations into the importance sampling process to approximate the 'optimal' proposal distribution $p(x_t \mid x_{t-1}^i, o_t)$ [21]. To make this paper self-contained, we first briefly review the standard particle filter and its major limitation, which are described in more detail in [22]. We then present the multi-layer importance sampling carried out by the PSO iterations.

4.1. Standard Particle Filter

The particle filter [22] is an online Bayesian inference process for estimating the unknown state $x_t$ at time $t$ from a sequence of noisy observations $o_{1:t}$. The dynamic state-space form employed in the Bayesian inference framework is

$$x_t = f_t(x_{t-1}, \epsilon_t) \;\leftrightarrow\; p(x_t \mid x_{t-1}) \quad \text{(state transition model)} \tag{12}$$

$$o_t = h_t(x_t, \nu_t) \;\leftrightarrow\; p(o_t \mid x_t) \quad \text{(observation model)} \tag{13}$$

where $x_t$ and $o_t$ are the system state and observation, and $\epsilon_t$ and $\nu_t$ are the system noise and observation noise. $f_t(\cdot,\cdot)$ and $h_t(\cdot,\cdot)$ are the state transition and observation models, which correspond to the probability distributions $p(x_t \mid x_{t-1})$ and $p(o_t \mid x_t)$, respectively.

[Figure 4. An illustration of importance sampling (left: samples from $p(x_t \mid x_{t-1})$; right: after PSO iterations).]

The key idea of the particle filter is to approximate the posterior distribution $p(x_t \mid o_{1:t})$ by a set of weighted samples $\{x_t^i, w_t^i\}_{i=1}^{N}$, which are sampled from a proposal distribution $q(\cdot)$, i.e.,
$$x_t^i \sim q(x_t \mid x_{t-1}^i, o_{1:t}), \quad i = 1, \dots, N,$$

and each particle's weight is set to

$$w_t^i \propto \frac{p(o_t \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{q(x_t^i \mid x_{t-1}^i, o_{1:t})} \tag{14}$$

Finally, the posterior distribution is approximated as $p(x_t \mid o_{1:t}) \approx \sum_{i=1}^{N} w_t^i\, \delta(x_t - x_t^i)$, where $\delta(\cdot)$ is the Dirac delta function.

The proposal distribution $q(\cdot)$ is critical for a successful particle filter, because it determines whether the particles are placed in the region where the posterior is significant. In practice, the transition distribution $p(x_t \mid x_{t-1})$ is usually taken as the proposal distribution, in which case Equation (14) reduces to $w_t^i \propto p(o_t \mid x_t^i)$. However, this choice is poor when $p(x_t \mid x_{t-1})$ lies in the tail of $p(o_t \mid x_t)$, as shown in Fig. 4. In fact, Doucet et al. [21] show that the 'optimal' proposal distribution is $p(x_t \mid x_{t-1}^i, o_t)$. The question is therefore how to incorporate the current observation $o_t$ into the transition model to form an effective proposal distribution at a reasonable computational cost.

4.2. Multi-layer Importance Sampling and Incorporation of Measurement

Sequential PSO is thus a combination of the PSO iterations and the particle filtering procedure. Unlike the traditional particle filter, which samples the particles directly from the state transition distribution, the PSO iterations in our framework are essentially a multi-layer importance sampling stage that progressively updates the sampled particles based on the newest observations. Specifically, the initial particles in sequential PSO are first sampled from the transition distribution:

$$x_{t+1}^{i,0} \sim \mathcal{N}(p_t^i, \Sigma) \tag{15}$$

In each PSO iteration, the particles are updated according to the newest observations. The details of the multi-layer importance sampling strategy are given in Algorithm 1. As shown in Fig. 4, the particles sampled directly from the transition model are situated in the tail of the observation likelihood.
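The tail problem can be illustrated numerically. The sketch below is a toy one-dimensional example, not taken from the paper. It samples particles from a Gaussian transition model, so Equation (14) reduces to normalized likelihood weights, and reports the effective sample size $1/\sum_i (w_t^i)^2$, a standard degeneracy diagnostic that the paper itself does not use. The particle count matches the 600 used for the PF in Section 6.1; the likelihood centers and widths are arbitrary assumptions.

```python
import math
import random


def gaussian(x, mu, sigma):
    """1-D Gaussian density (Equation (17) with variance sigma**2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bootstrap_weights(particles, likelihood):
    """Eq. (14) with q = p(x_t | x_{t-1}): weights are the normalized likelihoods."""
    w = [likelihood(x) for x in particles]
    total = sum(w)
    return [wi / total for wi in w]


def effective_sample_size(weights):
    """Standard ESS estimate 1 / sum_i w_i^2 for normalized weights."""
    return 1.0 / sum(wi * wi for wi in weights)


def demo(n=600, seed=1):
    rng = random.Random(seed)
    # Particles drawn from the transition model around the previous state 0.0.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    near = bootstrap_weights(particles, lambda x: gaussian(x, 0.5, 0.3))  # small motion
    far = bootstrap_weights(particles, lambda x: gaussian(x, 4.0, 0.3))   # rapid motion
    return effective_sample_size(near), effective_sample_size(far)


ess_near, ess_far = demo()
```

When the likelihood peaks in the tail of the transition density (the `far` case), nearly all of the weight collapses onto a handful of particles, whereas the `near` case keeps a substantial fraction of the 600 particles effective.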
In contrast, through the PSO iterations, the particles are moved towards the region where the observation likelihood is larger, and are finally relocated to the dominant modes of the likelihood.

Algorithm 1 Multi-layer Importance Sampling
1. Initialization: $X_{t+1} = \{x_{t+1}^{i,0}\}_{i=1}^{N}$
2. for $n = 0 : T$ do
3. Carry out the PSO iteration based on Equations (1) and (2):
   $v_{t+1}^{i,n+1} = \chi\left(v_{t+1}^{i,n} + \varphi_1 u_1 (p_{t+1}^i - x_{t+1}^{i,n}) + \varphi_2 u_2 (g_{t+1} - x_{t+1}^{i,n})\right)$
   $x_{t+1}^{i,n+1} = x_{t+1}^{i,n} + v_{t+1}^{i,n+1}$
4. Evaluate the fitness values: $f(x_{t+1}^{i,n+1}) = p(o_{t+1}^{i,n+1} \mid x_{t+1}^{i,n+1})$
5. Incorporate the current observations:
   $p_{t+1}^i = x_{t+1}^{i,n+1}$ if $f(x_{t+1}^{i,n+1}) > f(p_{t+1}^i)$; otherwise $p_{t+1}^i$ is unchanged
   $g_{t+1} = \arg\max_{p_{t+1}^i} f(p_{t+1}^i)$
6. Update the parameters.
7. Check the convergence criterion: if it is satisfied, break.
8. end for
Output: the particle set $X_{t+1} = \{x_{t+1}^{i,n}\}_{i=1}^{N}$

5. Proposed Tracking Algorithm

In this section, we introduce the proposed tracking algorithm and show how the sequential PSO framework is adopted for tracking. Our algorithm localizes the tracked object in each image frame with a rectangular window, and the motion of the object between two consecutive frames is approximated by an affine image warp. Specifically, the motion is characterized by the particle state $x_t = (x, y, \theta, s, \alpha, \beta)$, where $\{x, y\}$ are the 2-D translation parameters and $\{\theta, s, \alpha, \beta\}$ are deformation parameters. The fitness value of each particle is evaluated by an appearance model based on a spatially constrained MOG (mixture of Gaussians). Below, we first introduce this appearance model and then give a detailed description of the proposed tracking algorithm in the sequential PSO based framework.

5.1. Spatial Constraint MOG Based Appearance Model

The appearance of the target is modeled by a spatially constrained MOG, whose parameters are estimated by an online EM algorithm.
1) Appearance Model: Similar to [5], [23], the appearance model consists of three components, S, W, and F: the S component captures temporally stable images, the W component characterizes two-frame variations, and the F component is a fixed template of the target that prevents the model from drifting away. However, this appearance model treats each pixel independently and discards the spatial layout of the target, so it may fail when, for instance, several similar objects are close to the target or the target is partially occluded. In our work, we apply a 2-D Gaussian spatial constraint to the SWF based appearance model, whose mean vector is the coordinate of the center position and whose covariance matrix has diagonal elements proportional to the size of the target in the corresponding spatial directions, as illustrated in Fig. 5.

[Figure 5. A 2-D Gaussian spatial constraint MOG based appearance model.]

The fitness value of a particle can thus be evaluated by the following appearance model:

$$f(x_t) = p(o_t \mid x_t) = \prod_{j=1}^{d} \left\{ \left[ \sum_{l = s,w,f} \pi_{l,t}(j)\, \mathcal{N}\big(o_t(j); \mu_{l,t}(j), \sigma_{l,t}^2(j)\big) \right] \mathcal{N}\big(x(j); x_c, \Sigma_c\big) \right\} \tag{16}$$

where $\{\pi_{l,t}, \mu_{l,t}, \sigma_{l,t}^2;\ l = s, w, f\}$ are the mixture probabilities, mixture centers, and mixture variances of the S, W, and F components, respectively; $o_t$ is the candidate region corresponding to the particle state $x_t$; and $d$ is the number of pixels inside $o_t$. $x(j)$, $x_c$, and $\Sigma_c$ are the coordinate of pixel $j$, the center coordinate of the target, and the covariance matrix in the spatial domain. Here, $\mathcal{N}(x; \mu, \sigma^2)$ is the Gaussian density

$$\mathcal{N}(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \tag{17}$$

This spatially constrained appearance model assumes that the closer a pixel is to the center, the more important it is for representing the target. This assumption is approximately satisfied in most real applications.
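Evaluating Equation (16) directly multiplies $d$ per-pixel terms (450 in the experiments of Section 6), which quickly underflows in floating point, so a practical implementation would likely work with its logarithm. The sketch below does that. It follows the product form of Equation (16) given above, takes the per-pixel mixture parameters as given (they would come from the online EM update described next), assumes a diagonal $\Sigma_c$, and uses hypothetical names and data layouts chosen for illustration.

```python
import math


def normal_pdf(x, mu, var):
    """Equation (17): Gaussian density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def spatial_weight(xy, center, cov_diag):
    """N(x(j); x_c, Sigma_c) for a 2-D Gaussian with diagonal Sigma_c."""
    return (normal_pdf(xy[0], center[0], cov_diag[0])
            * normal_pdf(xy[1], center[1], cov_diag[1]))


def log_fitness(obs, coords, pi, mu, var, center, cov_diag):
    """Logarithm of Equation (16).

    obs[j] is o_t(j) and coords[j] is x(j); pi, mu, var map each component
    name in ("s", "w", "f") to a per-pixel list of parameters.
    """
    total = 0.0
    for j, (o, xy) in enumerate(zip(obs, coords)):
        mixture = sum(pi[l][j] * normal_pdf(o, mu[l][j], var[l][j])
                      for l in ("s", "w", "f"))
        total += math.log(mixture) + math.log(spatial_weight(xy, center, cov_diag))
    return total


# Usage: a hypothetical 3-pixel patch with identical S, W, F parameters.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pi = {l: [1.0 / 3.0] * 3 for l in ("s", "w", "f")}
mu = {l: [0.0, 0.5, 1.0] for l in ("s", "w", "f")}
var = {l: [0.1] * 3 for l in ("s", "w", "f")}
center, cov_diag = (1.0, 0.0), (1.0, 1.0)
matching = log_fitness([0.0, 0.5, 1.0], coords, pi, mu, var, center, cov_diag)
shifted = log_fitness([1.0, 1.5, 2.0], coords, pi, mu, var, center, cov_diag)
```

A candidate whose gray levels match the mixture centers scores higher than one shifted away from them; since fitness values are only compared, the log form preserves the ordering that PSO needs.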
2) Parameter Estimation: To make the model parameters depend more heavily on the most recent observations, we assume that the previous appearance is exponentially forgotten and new information is gradually added to the appearance model. To avoid storing all the data from previous frames, an online EM algorithm is used to estimate the parameters of the S, W, and F components as follows.

Step 1: In the E-step, the ownership probability of each component is computed as

$$m_{l,t}(j) \propto \pi_{l,t}(j)\, \mathcal{N}\big(o_t(j); \mu_{l,t}(j), \sigma_{l,t}^2(j)\big) \tag{18}$$

which satisfies $\sum_{l=s,w,f} m_{l,t}(j) = 1$.

Step 2: The mixing probability of each component is estimated as

$$\pi_{l,t+1}(j) = \alpha\, m_{l,t}(j) + (1 - \alpha)\, \pi_{l,t}(j), \quad l = s, w, f \tag{19}$$

and the moments $\{M_{k,t+1};\ k = 1, 2\}$ are evaluated recursively as

$$M_{k,t+1}(j) = \alpha\, o_t^k(j)\, m_{s,t}(j) + (1 - \alpha)\, M_{k,t}(j), \quad k = 1, 2 \tag{20}$$

where $\alpha = 1 - e^{-1/\tau}$ acts as a forgetting factor and $\tau$ is a predefined constant.

Step 3: Finally, the mixture centers and variances are estimated in the M-step:

$$\mu_{s,t+1}(j) = \frac{M_{1,t+1}(j)}{\pi_{s,t+1}(j)}, \qquad \sigma_{s,t+1}^2(j) = \frac{M_{2,t+1}(j)}{\pi_{s,t+1}(j)} - \mu_{s,t+1}^2(j)$$

$$\mu_{w,t+1}(j) = o_t(j), \qquad \sigma_{w,t+1}^2(j) = \sigma_{w,1}^2(j)$$

$$\mu_{f,t+1}(j) = \mu_{f,1}(j), \qquad \sigma_{f,t+1}^2(j) = \sigma_{f,1}^2(j)$$

5.2. Sequential PSO Based Tracking Algorithm

Sequential PSO provides a general and effective tracking framework, into which we embed the spatially constrained MOG based appearance model for fitness evaluation. The sequential PSO based tracking algorithm is detailed in Algorithm 2.

Algorithm 2 Sequential PSO Based Tracking Algorithm
Input: the individual best particles $\{p_t^i\}_{i=1}^{N}$ at time $t$.
1. Randomly propagate the particle set to enhance its diversity according to the transition model $x_{t+1}^{i,0} \sim \mathcal{N}(p_t^i, \Sigma)$, where $\Sigma$ is a diagonal covariance matrix whose elements are the variances of the affine parameters, i.e., $\sigma_x^2, \sigma_y^2, \sigma_\theta^2, \sigma_s^2, \sigma_\alpha^2, \sigma_\beta^2$.
2. Evaluate the fitness value of each particle with the spatially constrained MOG based observation model: $f(x_{t+1}^{i,n}) = p(o_{t+1}^{i,n} \mid x_{t+1}^{i,n})$, $i = 1, \dots, N$, $n = 0, \dots, T$.
3. Update $\{p_{t+1}^i\}_{i=1}^{N}$ and $g_{t+1}$ using the fitness values obtained above, and update the parameters.
4. Carry out the PSO iteration based on Equations (1) and (2).
5. Check the convergence criterion: if it is satisfied, stop and output the result; otherwise, go to step 2.
Output: the global optimum $g_{t+1}$.

6. Experimental Results

In our implementation, each candidate image corresponding to a particle is rectified to a 30×15 patch, and the feature is a 450-dimensional vector of gray-level values normalized to zero mean and unit variance. All experiments are carried out on a PC with a 3.2 GHz Pentium IV CPU and 512 MB of memory.

6.1. Sequential PSO vs PF and UPF

First, we compare the SPSO (sequential PSO) based tracking algorithm with a standard PF (particle filter) and its variant, the UPF (unscented particle filter) [24], on a video with manually labeled ground truth. Then, a theoretical analysis is presented to show why SPSO has advantages over the other two algorithms. This video sequence¹ contains a human face moving quickly to the left and right. Although the sequence is simple, it effectively demonstrates the claimed advantages of the SPSO framework.

¹ The sequence is available at http://vision.stanford.edu/ birch/headtracker/seq/.

In our implementation, the number of particles and the covariance matrix of the transition distribution in the particle filter and the unscented particle filter are set to $N = 600$ and $\Sigma = \mathrm{diag}(8^2, 8^2, 0.02^2, 0.02^2, 0.002^2, 0.002^2)$, respectively. For a fair comparison, the sequential PSO algorithm is calibrated in the same way, using the same covariance matrix and 60 particles in each iteration.
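For concreteness, the random propagation step (Equation (6), step 1 of Algorithm 2) with the experimental settings above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it uses the fixed diagonal $\Sigma$ of this section (Section 3.2.1 instead ties $\Sigma$ to the predicted velocity), treats the 60 particles as 60 individual bests carried over from time $t$, and interprets $U(0, v_t^{pred})$ component-wise. The function name, the state values, and the previous global bests are hypothetical.

```python
import random

# Standard deviations of Sigma = diag(8^2, 8^2, 0.02^2, 0.02^2, 0.002^2, 0.002^2),
# ordered as the state (x, y, theta, s, alpha, beta).
SIGMA_STD = (8.0, 8.0, 0.02, 0.02, 0.002, 0.002)


def propagate(pbests, g_prev, g_prev2, rng, sigma_std=SIGMA_STD):
    """Re-diversify converged individual bests for the next frame.

    Positions follow Eq. (6), x ~ N(p^i, Sigma); initial velocities are drawn
    component-wise from U(0, v_pred) with v_pred = g_{t-1} - g_{t-2} (Eq. (7)).
    """
    v_pred = [a - b for a, b in zip(g_prev, g_prev2)]
    positions, velocities = [], []
    for p in pbests:
        positions.append([rng.gauss(m, s) for m, s in zip(p, sigma_std)])
        # random.uniform(0, b) also accepts b < 0 and then returns a value in [b, 0].
        velocities.append([rng.uniform(0.0, vp) for vp in v_pred])
    return positions, velocities


rng = random.Random(0)
g_prev2 = [100.0, 80.0, 0.0, 1.0, 0.0, 0.0]   # hypothetical g_{t-2}
g_prev = [104.0, 78.0, 0.01, 1.0, 0.0, 0.0]   # hypothetical g_{t-1}
pbests = [g_prev[:] for _ in range(60)]        # 60 converged individual bests
xs, vs = propagate(pbests, g_prev, g_prev2, rng)
```

Because the translation variances (8 pixels) are several orders of magnitude larger than the deformation variances, the re-diversified swarm spreads mainly in position, which matches the fast left-right face motion of this sequence.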
As shown in Fig. 6(a), the particle filter based tracker loses the object at frame 19, because it cannot follow the rapid motion of the object. Using more particles and enlarging the diagonal elements of the covariance matrix would improve its performance, but this strategy introduces more noise and a heavy computational load, and it suffers from the curse of dimensionality as the dimension of the state increases. Fig. 6(b) shows the tracking performance of the unscented particle filter: the tracker follows the object throughout the sequence, but its localization accuracy is unsatisfactory. In comparison, our method, which utilizes both individual and environmental information in the search space, never loses the target and achieves the most accurate results.

Furthermore, we evaluate these algorithms quantitatively in terms of the number of frames tracked successfully and the MSE (mean square error) between the estimated position and the labeled ground truth. Table 1 shows that the PF tracker fails at frame 19, while the UPF and SPSO trackers track the object throughout the sequence. Additionally, the SPSO tracker outperforms the UPF tracker in terms of accuracy.

A theoretical analysis reveals the underlying reasons for these results. The undesired behavior of the particle filter in Fig. 6 is caused by sample impoverishment in its particle generation process. Consider frame 19, where the PF tracker loses the target. Here, the particles are sampled from the Gaussian transition distribution to follow the object motion. When the object moves rapidly and arbitrarily, the particles drawn from this distribution do not cover a significant region of the likelihood (as shown in the top-left of Fig. 7), so the weights of most particles are low, leading to tracking failure.
As for the unscented particle filter, the sigma points are generated by the UT (unscented transformation) and propagated (as shown in the top-right of Fig. 7), and their weighted mean and covariance are calculated to form a better proposal distribution, which improves tracking performance to some degree. However, the UT is accurate only to second order for non-Gaussian data, which may not match the actual motion and thus leads to inaccurate localization. Meanwhile, generating the sigma points and updating the covariance are time-consuming. The SPSO framework, by contrast, extracts local and global information from the particle configuration and incorporates the newest observations into the proposal distribution, resulting in better performance. The bottom row of Fig. 7 shows the multi-layer importance sampling process in the SPSO framework, which pulls the particles to significant regions of the likelihood. As a result, the SPSO framework can handle this rapid motion even with fewer particles.

[Figure 6. Tracking performance on a human face with rapid motion: (a) particle filter, (b) unscented particle filter, (c) sequential PSO.]

Table 1. Quantitative results of the SPSO tracker compared with the PF and UPF trackers.

| Tracking framework | Frames tracked | MSE of position (pixels) |
|---|---|---|
| PF | 18/31 | 17.069 |
| UPF | 31/31 | 6.975 |
| SPSO | 31/31 | 4.172 |

6.2. Tracking Results of Different Scenes

To further evaluate the proposed tracking framework, we test it on three video sequences with different environments. The first sequence shows a man walking across a lawn against a cluttered background, with large appearance and illumination changes. In the second sequence, a pedestrian walks with a large pose change (bending down to reach the ground and later standing back up). Both sequences were taken outdoors with moving cameras.
The third sequence is a figure skating match featuring a skater with drastic motion. Fig. 8(a) shows that the online updating scheme easily absorbs the appearance and illumination changes, and that our tracking framework effectively follows the walking man against the cluttered background, because the sequential PSO framework is very effective at finding the global optimum. Fig. 8(b) shows the result of tracking the walking pedestrian, demonstrating the effectiveness of our framework under large pose changes. Fig. 8(c) shows the tracking result for the figure skater with agile motion, which demonstrates that our algorithm can track the object when large movements occur between two successive frames.

7. Conclusion

A new sequential particle swarm optimization framework for visual tracking has been proposed in this paper. The sequential information required by the tracking process is incorporated into a modified PSO to make this swarm technique suitable for tracking. In addition, we have reformulated the SPSO framework in a Bayesian way and found that it is essentially a multi-layer importance sampling based particle filter. Furthermore, this framework has been naturally extended to multi-object tracking as multimodal optimization. In experiments, the sequential PSO based tracker compares very favorably with the particle filter and the unscented particle filter in terms of both accuracy and efficiency, demonstrating that sequential PSO is a promising framework for visual tracking. In summary, sequential PSO provides a more reasonable mechanism and a more effective way to tackle dynamic optimization problems than sequential Monte Carlo methods, so it has many other potential applications in computer vision, including image registration, template matching, and dynamic background modeling.

Acknowledgment

This work is partly supported by NSFC (Grant No.
60672040, 60705003) and the National 863 High-Tech R&D Program of China (Grant No. 2006AA01Z453). References [1] M. Kass, A. Witkin and D. Terzopoulos, ”Snakes: active contour models”, IJCV., 1(4): 321-332, 1988. 1 Particle filter Unscented particle filter Sequential PSO Figure 7. Tracking procedure of the frame 19 (a) clutter background, large appearance and illumination changes (b) large pose changes (c) drastic motion Figure 8. More experimental results [2] G. D. Hager and P. N. Hager, ”Efficient region tracking with parametric models of geometry andillumination”, IEEE Trans. on PAMI., 20(10): 1025-1039, 1998. 1 [3] D. Comaniciu, V. Ramesh, and P. Meer, ”Kernel-based object tracking”, IEEE Trans. on PAMI., 25(5): 234-240, 2003. 1 [4] M. Isard and A. Blake, ”Condensation: conditional density propagation for visual tracking”, IJCV., 29(1): 5-28, 1998. 1 [5] A. D. Jepson, D. J. Fleet, and T. F. El-Maraghi, ”Robust online appearance models for visual tracking”, IEEE Trans. on PAMI., 25(10): 1296-1311, 2003. 1, 5 [6] C. Rasmussen and G. D. Hager, ”Probabilistic Data Association Methods for Tracking Complex Visual Objects”, IEEE Trans. on PAMI., 23(6):560-576, 2001. 1 [7] T. Yu and Y. Wu, ”Differential Tracking based on SpatialAppearance Model(SAM)”, Proc. CVPR’06, pp.720-727, 2006. 1 [8] Q. Zhao, S. Brennan and H. Tao, ”Differential EMD Tracking”, Pro. ICCV’07, 2007. 1 [9] M. Dewan and G. D. Hager, ”Toward Optimal Kernel-based Tracking”, Pro. CVPR’07, pp. 618-625,2007 1 [10] K. Fukunaga and L. Hostetler, ”The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition”, IEEE Trans. on Information Theory, 21(1): 32-40, 1975. 1 [11] C. Yang, R. Duraiswami and L. Davis, ”Fast Multiple Object Tracking via a Hierarchical Particle Filter”, Pro. ICCV’05, pp. 212-219, 2005. 1 [12] M. Bray, E. K. Meier, N. N. Schraudolph and L. V. 
Gool, ”Fast Stochastic Optimization for Articulated Structure Tracking”, Image and Vision Computing , 25(3): 352-364, 2007. 1 [13] A. P. Leung, S. G. Gong, ”Optimizing Distribution-based Matching by Random Subsampling”, Pro. CVPR’07, pp. 1-8, 2007. 1 [14] A. Cuzol, E. Memin, ”A stochastic filter for fluid motion tracking”, Pro. ICCV’05, pp. 396-402, 2005. 1 [15] B. Song, A. K. Chowdhury, ”Stochastic Adaptive Tracking In A Camera Network”, Pro. ICCV’07, 2007. 1 [16] X. Zhang, W. Hu, S. Maybank, and X. Li, ”Graph Based Discriminative Learning for Robust and Efficient Object Tracking”, Pro. ICCV’07, 2007. 1 [17] J. Kennedy, and R. C. Eberhart, ”Particle swarm optimization”, in Proc. IEEE Int’l Conf. on Neural Networks, pp. 1942-1948, 1995. 1, 2, 3 [18] M. Clerc, and J. Kennedy, ”The particle swarm-explosion, stability, and convergence in a multidimensional complex space”, IEEE Trans. on Evolutionary Computation, 6(1): 58-73, 2002. 1 [19] M. P. Wachowiak, R. Smolikova, and Y. Zheng, ”An approach to multimodal biomedical image registration utilizing particle swarm optimization”, IEEE Trans. on Evolutionary Computation, 8(3): 289-301, 2004. 1 [20] D. Parrott and X. Li, ”Locating and tracking multiple dynamic optima by a particle swarm model using speciation”, IEEE Trans. on Evolutionary Computation, 10(4): 440-458 , 2006. 1 [21] A. Doucet, S. Godsill, and C. Andrieu, ”On sequential Monte Carlo sampling methods for Bayesian filtering”, Statistics and Computing, 10(3): 197-208, 2000. 4 [22] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particles filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans. on Signal Processing, 50(2): 174-188, 2002. 4 [23] S. Zhou, R. Chellappa, B. Moghaddam, ”Visual Tracking and Recongnition Using Appearance-adaptive Models in Particles Filters”, IEEE Trans. on IP, 13(11): 1434-1456, 2004. 5 [24] R. Merwe, A. Doucet, N. Freitas, and E. 
Wan, ”The unscented particle filter”, Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, August 2000 . 6