Lecture Notes
• Increasing availability of cheap, powerful cameras (e.g. digital cameras, webcams) and other
sensors.
• Increasing availability of massive amounts of image and multimedia content on the web
(e.g. face databases, streaming video or image-based communication).
• Increasing availability of cheap, powerful computers (processor speed and memory capacity).
• Introduction of techniques from machine learning and statistics (complex, data-driven models and algorithms).
• Computer vision: recovery of information about the 3D world from 2D image(s); the inverse
problem of computer graphics.
• Image processing: operate on one image to produce another image (e.g. denoising, deblurring, enhancement, deconvolution, particularly in medical imaging).
Examples of computer vision problems:
– shape-from-texture
– shape-from-shading
– shape-from-focus
• Segmentation
• Tracking
• Object recognition
A few applications of computer vision:
• Structure-from-motion:
– Throw away motion, keep structure: image-based rendering (e.g. 3D models of buildings, etc. for the architecture or entertainment industry)
– Throw away structure, keep motion: mobile robot control (we know the structure but
not the robot location)
• Image collections:
• Medical applications:
– Image enhancement
– Segmentation of brain
– Image registration or alignment: compare brains of different people, or brains before/after lesion
– Blood vessels: track cells
– Unobtrusive patient monitoring
• HCI: track eye motion; recognize physical gestures (e.g. sign language)
Mean-shift clustering
Represent each pixel $x_n$, $n = 1, \dots, N$, by a feature vector as in spectral clustering, typically position & intensity $(i, j, I)$ or colour $(i, j, L^*, u^*, v^*)$.
Idea: define a function that represents the density of the data set $\{x_n\}_{n=1}^N \subset \mathbb{R}^D$, then declare each maximum as a cluster representative and assign each pixel to a maximum via the mean-shift algorithm.
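For concreteness, a minimal Python sketch (my own, not from the notes) of building the $(i, j, I)$ feature vectors from a grayscale image; in practice the position and intensity coordinates are rescaled to comparable ranges, or equivalently given separate bandwidths:

import numpy as np

def pixel_features(img):
    """Build one (i, j, I) feature vector per pixel.

    img: 2D array of grayscale intensities.
    Returns an (N, 3) array, with N = number of pixels.
    """
    I, J = np.indices(img.shape)  # row and column coordinate grids
    return np.column_stack([I.ravel(), J.ravel(), img.ravel()])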
Kernel density estimate (smooth multivariate histogram) with bandwidth $\sigma$:
$$p(x) = \sum_{n=1}^{N} p(n)\, p(x|n) = \frac{1}{N} \sum_{n=1}^{N} K\!\left(\frac{x - x_n}{\sigma}\right), \qquad x \in \mathbb{R}^D.$$
The kernel $K$ satisfies $\int K(x)\, dx = 1$ and $K(x) \ge 0$ (so the kernel is a pdf). Typical kernels:
• Gaussian (infinite support): $K\!\left(\frac{x - x_n}{\sigma}\right) \propto \exp\left(-\frac{1}{2} \left\|\frac{x - x_n}{\sigma}\right\|^2\right)$.
• Epanechnikov (finite support): $K\!\left(\frac{x - x_n}{\sigma}\right) \propto \begin{cases} 0, & \left\|\frac{x - x_n}{\sigma}\right\| > 1 \\ 1 - \left\|\frac{x - x_n}{\sigma}\right\|^2, & \text{otherwise.} \end{cases}$
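To make the definition concrete, here is a small Python sketch (the function and variable names are mine, not from the notes) of evaluating the Gaussian kde at a query point, up to the kernel's normalization constant, since the notes define $K$ only up to proportionality:

import numpy as np

def gaussian_kde(x, X, sigma):
    """Evaluate the Gaussian kde p(x) at a query point.

    x: query point of shape (D,); X: data set {x_n} of shape (N, D).
    Returns p(x) up to the kernel's constant normalization factor.
    """
    sq = np.sum(((X - x) / sigma) ** 2, axis=1)  # ||(x - x_n)/sigma||^2 for all n
    return np.mean(np.exp(-0.5 * sq))            # (1/N) sum_n K((x - x_n)/sigma)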
[Figure: a kernel density estimate $p(x)$ with three modes defining clusters $C_1$, $C_2$, $C_3$.]
Mean-shift algorithm for the Gaussian kde: maxima (also minima, saddle points) of $p(x)$ satisfy
$$0 = \nabla p(x) \propto -\frac{1}{N} \sum_{n=1}^{N} e^{-\frac{1}{2}\left\|\frac{x - x_n}{\sigma}\right\|^2} \frac{x - x_n}{\sigma^2} \propto p(x) \sum_{n=1}^{N} p(n|x)(x - x_n) \implies x = \sum_{n=1}^{N} p(n|x)\, x_n = f(x)$$
with "shifts" $x - x_n$ (thus $\nabla p(x) \propto$ mean shift) and posterior probabilities (by Bayes' theorem)
$$p(n|x) = \frac{p(x|n)\, p(n)}{p(x)} = \frac{\exp\left(-\frac{1}{2}\left\|(x - x_n)/\sigma\right\|^2\right)}{\sum_{n'=1}^{N} \exp\left(-\frac{1}{2}\left\|(x - x_{n'})/\sigma\right\|^2\right)}.$$
This shows that the fixed points of $f$ are stationary points of $p$, and it suggests a fixed-point iterative scheme: start from a data point $x_n$ and iteratively apply $f$ until convergence (repeating this for every pixel). It can be proven that this algorithm converges from nearly any initial $x$ to a maximum, with a linear convergence rate (in fact, it is an EM algorithm).
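A minimal runnable sketch of this fixed-point iteration from a single starting point (the tolerance, iteration cap, and the max-subtraction for numerical stability are my own choices):

import numpy as np

def mean_shift_point(x0, X, sigma, tol=1e-5, max_iter=500):
    """Iterate x <- f(x) = sum_n p(n|x) x_n until the update is below tol.

    X: data set of shape (N, D). Returns (approximately) the mode of p(x)
    that x0 converges to.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        sq = np.sum(((X - x) / sigma) ** 2, axis=1)  # ||(x - x_n)/sigma||^2
        w = np.exp(-0.5 * (sq - sq.min()))           # unnormalized posteriors, shifted for stability
        p = w / w.sum()                              # p(n|x) by Bayes' theorem
        x_new = p @ X                                # f(x) = sum_n p(n|x) x_n
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

Running this from every pixel's feature vector and grouping the points that converge to the same mode yields the segmentation.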
Advantages:
• Works well with clusters having complex shapes.
Disadvantages:
• Large total computational cost: $O(kN^2)$, where $k$ = average number of mean-shift iterations per pixel ($k \approx 20$–$100$). Accelerations are possible that produce almost the same segmentation.
Gaussian mean-shift (GMS), for one point $x$:

    repeat  { iteration loop }
        $\forall n$: $p(n|x) \leftarrow \exp\left(-\frac{1}{2}\|(x - x_n)/\sigma\|^2\right) \big/ \sum_{n'=1}^{N} \exp\left(-\frac{1}{2}\|(x - x_{n'})/\sigma\|^2\right)$
        $x \leftarrow \sum_{n=1}^{N} p(n|x)\, x_n$  { update $x$ }
    until $x$'s update $<$ tol

Gaussian blurring mean-shift (GBMS), for the whole data set:

    repeat  { iteration loop }
        for $m = 1, \dots, N$:  { one GMS step }
            $\forall n$: $p(n|x_m) \leftarrow \exp\left(-\frac{1}{2}\|(x_m - x_n)/\sigma\|^2\right) \big/ \sum_{n'=1}^{N} \exp\left(-\frac{1}{2}\|(x_m - x_{n'})/\sigma\|^2\right)$
            $y_m \leftarrow \sum_{n=1}^{N} p(n|x_m)\, x_n$
        $\forall m$: $x_m \leftarrow y_m$  { update whole data set }
    until convergence

Figure 1: Pseudocode for Gaussian mean-shift (GMS) and Gaussian blurring mean-shift (GBMS). The "connected-components" step collects all equivalent but numerically slightly different points.
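The pseudocode above can be turned into a short runnable sketch; the fixed sweep count and the merge radius tol in the connected-components step are my own choices, not from the notes:

import numpy as np

def blurring_mean_shift(X, sigma, n_iter=30):
    """GBMS: apply one GMS step to every point of the data set simultaneously."""
    X = np.asarray(X, dtype=float).copy()
    for _ in range(n_iter):
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2) / sigma**2  # pairwise ||(x_m - x_n)/sigma||^2
        P = np.exp(-0.5 * sq)
        P /= P.sum(axis=1, keepdims=True)  # row m holds p(n | x_m)
        X = P @ X                          # y_m = sum_n p(n|x_m) x_n, all m at once
    return X

def connected_components(X, tol):
    """Merge points that collapsed to numerically close locations into one cluster."""
    labels = -np.ones(len(X), dtype=int)
    k = 0
    for m in range(len(X)):
        if labels[m] == -1:
            labels[np.linalg.norm(X - X[m], axis=1) < tol] = k
            k += 1
    return labels

Each GBMS sweep costs $O(N^2)$ kernel evaluations, consistent with the cost analysis above; the connected-components step assumes the surviving modes are separated by much more than tol.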
Exercises
1. Derive the mean-shift algorithm for a general kernel K.
2. Prove that the mean-shift algorithm is gradient ascent on p(x) with an adaptive step size.
3. Derive the mean-shift algorithm for the Gaussian kde with a bandwidth that is a full
covariance matrix Σn for each point.
[Figure: number of clusters and mean-shift iterations per point as a function of the bandwidth $\sigma$.]