EE 589/689 Foundations of computer vision: Lecture notes
Fall quarter 2006, OGI/OHSU
Miguel Á. Carreira-Perpiñán
Based mainly on:
David Forsyth and Jean Ponce: Computer Vision. A Modern Approach. Prentice-Hall, 2003.
Introduction to computer vision
Computer vision has been around since the 1960s. Recent developments:
• Increasing availability of cheap, powerful cameras (e.g. digital cameras, webcams) and other
sensors.
• Increasing availability of massive amounts of image and multimedia content on the web
(e.g. face databases, streaming video or image-based communication).
• Increasing availability of cheap, powerful computers (processor speed and memory capac-
ity).
• Introduction of techniques from machine learning and statistics (complex, data-driven mod-
els and algorithms).
Three related areas:
[Diagram: image processing maps 2D image(s) to 2D image(s); computer vision maps 2D image(s) to the 3D world; computer graphics maps the 3D world to 2D image(s).]
• Computer graphics: representation of a 3D scene in 2D image(s).
• Computer vision: recovery of information about the 3D world from 2D image(s); the inverse
problem of computer graphics.
• Image processing: operate on one image to produce another image (e.g. denoising, deblurring, enhancement, deconvolution, in particular in medical imaging).
Some problems of computer vision:
• Structure-from-motion (3D reconstruction from multiple views, stereo reconstruction)
• Shape-from-X (single image):
– shape-from-texture
– shape-from-shading
– shape-from-focus
• Segmentation
• Tracking
• Object recognition
A few applications of computer vision:
• Structure-from-motion:
– Throw away motion, keep structure: image-based rendering (e.g. 3D models of build-
ings, etc. for architecture or entertainment industry)
– Throw away structure, keep motion: mobile robot control (we know the structure but
not the robot location)
• Image collections:
– Image retrieval: find me pictures containing cars and trees
– Image annotation: textual description of objects in image
• Finding faces in a group picture, crowd, etc.
• Recovering articulated pose of a person from a video
• Medical applications:
– Image enhancement
– Segmentation of brain
– Image registration or alignment: compare brains of different people, or brains be-
fore/after lesion
– Blood vessels: track cells
– Unobtrusive patient monitoring
• HCI: track eye motion; recognize physical gestures (e.g. sign language)
Mean-shift clustering
Represent each pixel x_n, n = 1, . . . , N, by a feature vector as in spectral clustering, typically
position & intensity (i, j, I) or colour (i, j, L*, u*, v*).
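As a concrete illustration, here is a minimal NumPy sketch that builds (i, j, I) feature vectors from a greyscale image; the function name pixel_features and the intensity_scale weighting of the intensity channel against the pixel coordinates are illustrative choices (the weighting acts as a second, per-feature bandwidth).

    import numpy as np

    def pixel_features(image, intensity_scale=1.0):
        # One feature vector (i, j, I) per pixel of a 2-D greyscale image.
        # intensity_scale weights intensity against the pixel coordinates
        # (an illustrative choice, not fixed by the notes).
        rows, cols = image.shape
        i, j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
        return np.column_stack([i.ravel(), j.ravel(),
                                intensity_scale * image.ravel().astype(float)])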
Idea: define a function that represents the density of the data set {x_n}_{n=1}^N ⊂ R^D, then declare
each maximum as a cluster representative and assign each pixel to a maximum via the mean-shift
algorithm.
Kernel density estimate (smooth multivariate histogram) with bandwidth σ:
    p(x) = \sum_{n=1}^{N} p(n)\, p(x|n) = \frac{1}{N} \sum_{n=1}^{N} K\!\left( \frac{x - x_n}{\sigma} \right), \qquad x \in \mathbb{R}^D
The kernel K satisfies ∫ K(x) dx = 1 and K(x) ≥ 0 (so the kernel is a pdf). Typical kernels:
• Gaussian (infinite support): K((x − x_n)/σ) ∝ exp(−½ ‖(x − x_n)/σ‖²).
• Epanechnikov (finite support): K((x − x_n)/σ) ∝ 0 if ‖(x − x_n)/σ‖ > 1, and 1 − ‖(x − x_n)/σ‖² otherwise.
[Figure: a one-dimensional kernel density estimate p(x) whose modes define the clusters C1, C2, C3.]
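As a quick check of the definition, here is a small NumPy sketch that evaluates the Gaussian kernel density estimate at a query point x; the function name and interface are illustrative.

    import numpy as np

    def gaussian_kde(x, X, sigma):
        # p(x) = (1/N) sum_n K((x - x_n)/sigma) with the normalised Gaussian
        # kernel, for a data matrix X of shape (N, D) and query point x of shape (D,).
        N, D = X.shape
        sq = np.sum(((x - X) / sigma) ** 2, axis=1)      # ||(x - x_n)/sigma||^2
        return np.mean(np.exp(-0.5 * sq)) / (2 * np.pi * sigma ** 2) ** (D / 2)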
Mean-shift algorithm for Gaussian kde: maxima (also minima, saddle points) of p(x) satisfy
    0 = \nabla p(x) \propto -\frac{1}{N} \sum_{n=1}^{N} e^{-\frac{1}{2} \left\| \frac{x - x_n}{\sigma} \right\|^2} \frac{x - x_n}{\sigma^2} \propto p(x) \sum_{n=1}^{N} p(n|x)\,(x - x_n) \;\Longrightarrow\; x = \sum_{n=1}^{N} p(n|x)\, x_n = f(x)
with “shifts” x − xn (thus ∇p(x) ∝ mean shift) and posterior probabilities (by Bayes’ th.)
    p(n|x) = \frac{p(x|n)\, p(n)}{p(x)} = \frac{\exp\left( -\frac{1}{2} \left\| (x - x_n)/\sigma \right\|^2 \right)}{\sum_{n'=1}^{N} \exp\left( -\frac{1}{2} \left\| (x - x_{n'})/\sigma \right\|^2 \right)}.
This shows that the fixed points of f are stationary points of p, and suggests defining a fixed-point
iterative scheme by starting from a data point xn and iteratively applying f till convergence (and
repeating this for all pixels). It is possible to prove that this algorithm converges from nearly
any initial x to a maximum with linear convergence rate (in fact it is an EM algorithm).
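A minimal NumPy sketch of this fixed-point iteration from a single starting point; the tolerance tol and the iteration cap max_iter are illustrative stopping choices.

    import numpy as np

    def mean_shift_point(x0, X, sigma, tol=1e-5, max_iter=500):
        # Iterate x <- f(x) = sum_n p(n|x) x_n for the Gaussian kde of the data X (N x D).
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            w = np.exp(-0.5 * np.sum(((x - X) / sigma) ** 2, axis=1))  # proportional to p(n|x)
            x_new = w @ X / w.sum()                                    # posterior mean f(x)
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x

Started from a data point x_n, the iterates climb the density and stop at the mode that x_n belongs to.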
Advantages:
• Nonparametric clustering: only need to set σ.
• No step size needed.
• Works well with clusters having complex shapes.
• The number of clusters is determined automatically by σ.
Disadvantages:
• The mean-shift iteration is slow.
• Large total computational cost: O(kN²), where k is the average number of mean-shift iterations
per pixel (k ≈ 20–100). Accelerations are possible that produce almost the same segmentation.
Gaussian mean-shift (GMS) algorithm:

    for n ∈ {1, . . . , N}                                        For each data point
        x ← x_n                                                   Starting point
        repeat                                                    Iteration loop
            ∀n: p(n|x) ← exp(−½‖(x − x_n)/σ‖²) / ∑_{n′=1}^N exp(−½‖(x − x_{n′})/σ‖²)
            x ← ∑_{n=1}^N p(n|x) x_n                              Update x
        until x’s update < tol
        z_n ← x                                                   Maximum
    end
    connected-components({z_n}_{n=1}^N)                           Clusters

Gaussian blurring mean-shift (GBMS) algorithm:

    repeat                                                        Iteration loop
        for m ∈ {1, . . . , N}                                    For each data point
            ∀n: p(n|x_m) ← exp(−½‖(x_m − x_n)/σ‖²) / ∑_{n′=1}^N exp(−½‖(x_m − x_{n′})/σ‖²)
            y_m ← ∑_{n=1}^N p(n|x_m) x_n                          One GMS step
        end
        ∀m: x_m ← y_m                                             Update whole data set
    until stop
    connected-components({x_n}_{n=1}^N)                           Clusters

Figure 1: Pseudocode for GMS and GBMS. The “connected-components” step collects all equivalent but numerically
slightly different points.
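The following compact NumPy sketch follows the GMS pseudocode, advancing all N iterates in lockstep for brevity and using a simple greedy grouping in place of the connected-components step; the merge threshold σ/10 and the stopping rule are illustrative choices.

    import numpy as np

    def gms_cluster(X, sigma, tol=1e-5, max_iter=500):
        # Gaussian mean-shift clustering of the rows of X (shape N x D).
        X = np.asarray(X, dtype=float)
        Z = X.copy()                                     # one iterate per data point
        for _ in range(max_iter):
            d2 = np.sum((Z[:, None, :] - X[None, :, :]) ** 2, axis=2) / sigma ** 2
            P = np.exp(-0.5 * d2)
            P /= P.sum(axis=1, keepdims=True)            # row m holds p(n | z_m)
            Z_new = P @ X                                # one mean-shift step per point
            done = np.max(np.linalg.norm(Z_new - Z, axis=1)) < tol
            Z = Z_new
            if done:
                break
        # greedy stand-in for connected-components: group numerically close modes
        labels, modes = np.full(len(X), -1, dtype=int), []
        for m in range(len(X)):
            for k, mode in enumerate(modes):
                if np.linalg.norm(Z[m] - mode) < sigma / 10:
                    labels[m] = k
                    break
            else:
                modes.append(Z[m])
                labels[m] = len(modes) - 1
        return labels, np.array(modes)

For an image, X would be the per-pixel feature matrix from the start of this section; each iteration costs O(N²), consistent with the total O(kN²) cost quoted above.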
Mean-shift blurring clustering
Like mean-shift clustering, but actually move the data points at each step. It obtains segmentations
very similar to those of mean-shift clustering but considerably faster (cubic convergence rate for
Gaussian clusters): the total computational cost is still O(kN²), but k is much smaller (k ≈ 5).
It is related to spectral clustering, since effectively the algorithm iterates (X ← PX, update P),
where X = (x_1, . . . , x_N), P = D^{−1}A is the stochastic, random-walk matrix,
A_{mn} = exp(−½‖(x_m − x_n)/σ‖²) are the Gaussian affinities and D = diag(∑_{n=1}^N A_{mn}) is the
degree matrix.
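The matrix form of the update lends itself to a direct NumPy sketch, with the data points stored as the rows of X so that X ← PX is a valid matrix product; the stopping rule based on the change in X is an illustrative choice.

    import numpy as np

    def gbms(X, sigma, tol=1e-5, max_iter=100):
        # Gaussian blurring mean-shift: repeatedly replace the whole data set by P X,
        # where P = D^{-1} A is the random-walk matrix of the Gaussian affinities,
        # recomputed at every step.
        X = np.asarray(X, dtype=float).copy()
        for _ in range(max_iter):
            d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2) / sigma ** 2
            A = np.exp(-0.5 * d2)                        # affinities A_mn
            P = A / A.sum(axis=1, keepdims=True)         # P = D^{-1} A
            X_new = P @ X                                # one blurring step: X <- P X
            if np.max(np.linalg.norm(X_new - X, axis=1)) < tol:
                return X_new
            X = X_new
        return X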
Exercises
1. Derive the mean-shift algorithm for a general kernel K.
2. Prove that the mean-shift algorithm is gradient ascent on p(x) with an adaptive step size.
3. Derive the mean-shift algorithm for the Gaussian kde with a bandwidth that is a full
covariance matrix Σn for each point.
Figure 2: Segmentation results with GMS for a 50 × 40 hand image: number of clusters and number of iterations as a function of the bandwidth σ.