
EE 589/689 Foundations of computer vision: Lecture notes

Fall quarter 2006, OGI/OHSU

Miguel Á. Carreira-Perpiñán

Based mainly on:


David Forsyth and Jean Ponce: Computer Vision. A Modern Approach. Prentice-Hall, 2003.
Introduction to computer vision
Computer vision has been around since the 1960s. Recent developments:

• Increasing availability of cheap, powerful cameras (e.g. digital cameras, webcams) and other sensors.

• Increasing availability of massive amounts of image and multimedia content on the web (e.g. face databases, streaming video or image-based communication).

• Increasing availability of cheap, powerful computers (processor speed and memory capacity).

• Introduction of techniques from machine learning and statistics (complex, data-driven models and algorithms).

Three related areas:

[Diagram: image processing maps 2D image(s) to 2D image(s); computer vision maps 2D image(s) to the 3D world; computer graphics maps the 3D world to 2D image(s).]

• Computer graphics: representation of a 3D scene in 2D image(s).

• Computer vision: recovery of information about the 3D world from 2D image(s); the inverse
problem of computer graphics.

• Image processing: operate on one image to produce another image (e.g. denoising, deblurring, enhancement, deconvolution, in particular in medical imaging).

Some problems of computer vision:

• Structure-from-motion (3D reconstruction from multiple views, stereo reconstruction)

• Shape-from-X (single image):

– shape-from-texture
– shape-from-shading
– shape-from-focus

• Segmentation

• Tracking

• Object recognition

A few applications of computer vision:

• Structure-from-motion:

– Throw away motion, keep structure: image-based rendering (e.g. 3D models of buildings, etc. for architecture or the entertainment industry)
– Throw away structure, keep motion: mobile robot control (we know the structure but not the robot location)

• Image collections:

– Image retrieval: find me pictures containing cars and trees


– Image annotation: textual description of objects in image

• Finding faces in a group picture, crowd, etc.

• Recovering articulated pose of a person from a video

• Medical applications:

– Image enhancement
– Segmentation of the brain
– Image registration or alignment: compare brains of different people, or brains before/after a lesion
– Blood vessels: track cells
– Unobtrusive patient monitoring

• HCI: track eye motion; recognize physical gestures (e.g. sign language)

Mean-shift clustering
Represent each pixel $x_n$, $n = 1, \dots, N$ by a feature vector as in spectral clustering, typically position & intensity $(i, j, I)$ or colour $(i, j, L^*, u^*, v^*)$.
Idea: define a function that represents the density of the data set $\{x_n\}_{n=1}^N \subset \mathbb{R}^D$, then declare each maximum as a cluster representative and assign each pixel to a maximum via the mean-shift algorithm.
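As a concrete illustration, building such feature vectors from a grayscale image can be sketched in NumPy (a minimal sketch; the function name and the intensity `scale` weight are my own additions, since the notes use a single bandwidth σ for all features):

```python
import numpy as np

def pixel_features(img, scale=1.0):
    """Stack each pixel's position and intensity into a feature vector
    (i, j, I) for mean-shift segmentation. `scale` weights intensity
    relative to position (a hypothetical knob, not from the notes)."""
    H, W = img.shape
    i, j = np.mgrid[0:H, 0:W]           # pixel coordinates
    return np.column_stack([i.ravel(), j.ravel(), scale * img.ravel()])

img = np.arange(6, dtype=float).reshape(2, 3)   # toy 2x3 "image"
F = pixel_features(img)
print(F.shape)    # (6, 3): one (i, j, I) row per pixel
```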
Kernel density estimate (smooth multivariate histogram) with bandwidth $\sigma$:

$$p(x) = \sum_{n=1}^N p(n)\,p(x|n) = \frac{1}{N} \sum_{n=1}^N K\!\left(\frac{x - x_n}{\sigma}\right), \qquad x \in \mathbb{R}^D$$

The kernel $K$ satisfies $\int K(x)\,dx = 1$ and $K(x) \ge 0$ (so the kernel is a pdf). Typical kernels:
• Gaussian (infinite support): $K\!\left(\frac{x - x_n}{\sigma}\right) \propto \exp\left(-\frac{1}{2}\left\|\frac{x - x_n}{\sigma}\right\|^2\right)$

• Epanechnikov (finite support): $K\!\left(\frac{x - x_n}{\sigma}\right) \propto \begin{cases} 1 - \left\|\frac{x - x_n}{\sigma}\right\|^2, & \left\|\frac{x - x_n}{\sigma}\right\| \le 1 \\ 0, & \text{otherwise.} \end{cases}$
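The Gaussian case can be sketched in a few lines of NumPy (a hypothetical helper; here the kernel's normalization constant is written out explicitly rather than absorbed into a $\propto$):

```python
import numpy as np

def gaussian_kde(x, X, sigma):
    """p(x) = (1/N) sum_n K((x - x_n)/sigma) with a Gaussian kernel,
    normalized so the estimate integrates to 1."""
    N, D = X.shape
    sq = np.sum((X - x)**2, axis=1) / sigma**2       # ||(x - x_n)/sigma||^2
    K = np.exp(-0.5 * sq) / ((2 * np.pi) ** (D / 2) * sigma**D)
    return K.mean()

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, .3, (50, 1)), rng.normal(5, .3, (50, 1))])
# density is high near a cluster and low in the gap between clusters
print(gaussian_kde(np.array([0.0]), X, 0.5) >
      gaussian_kde(np.array([2.5]), X, 0.5))   # True
```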
[Figure: a 1D kernel density estimate p(x) whose three modes define clusters C1, C2, C3.]

Mean-shift algorithm for the Gaussian kde: maxima (also minima, saddle points) of $p(x)$ satisfy

$$0 = \nabla p(x) \propto -\frac{1}{N} \sum_{n=1}^N e^{-\frac{1}{2}\left\|\frac{x - x_n}{\sigma}\right\|^2} \frac{x - x_n}{\sigma^2} \propto p(x) \sum_{n=1}^N p(n|x)(x - x_n) \implies x = \sum_{n=1}^N p(n|x)\,x_n = f(x)$$

with "shifts" $x - x_n$ (thus $\nabla p(x) \propto$ mean shift) and posterior probabilities (by Bayes' theorem)

$$p(n|x) = \frac{p(x|n)\,p(n)}{p(x)} = \frac{\exp\left(-\frac{1}{2}\|(x - x_n)/\sigma\|^2\right)}{\sum_{n'=1}^N \exp\left(-\frac{1}{2}\|(x - x_{n'})/\sigma\|^2\right)}.$$

This shows that the fixed points of f are stationary points of p, and suggests defining a fixed-point
iterative scheme by starting from a data point xn and iteratively applying f till convergence (and
repeating this for all pixels). It is possible to prove that this algorithm converges from nearly
any initial x to a maximum with linear convergence rate (in fact it is an EM algorithm).
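The fixed-point iteration $x \leftarrow f(x)$ can be sketched as follows (a minimal NumPy illustration; the function name and toy data are my own):

```python
import numpy as np

def mean_shift_mode(x0, X, sigma, tol=1e-6, max_iter=500):
    """Iterate x <- f(x) = sum_n p(n|x) x_n from x0 until the update
    is smaller than tol; returns a mode of the Gaussian kde of X."""
    x = x0.astype(float)
    for _ in range(max_iter):
        w = np.exp(-0.5 * np.sum((X - x)**2, axis=1) / sigma**2)
        p = w / w.sum()              # posterior p(n|x) by Bayes' theorem
        x_new = p @ X                # f(x): posterior-weighted mean
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, .2, (40, 2)), rng.normal(4, .2, (40, 2))])
mode = mean_shift_mode(X[0], X, sigma=0.5)
print(np.linalg.norm(mode) < 0.3)    # True: converged to the mode near (0, 0)
```

Starting from a point in the first cluster, the iteration climbs to that cluster's mode rather than the distant one, since the posterior weights of far-away points are negligible.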
Advantages:

• Nonparametric clustering: only need to set σ.

• No step size needed.

• Works well with clusters having complex shapes.

• The number of clusters is determined automatically by σ.

Disadvantages:

• The mean-shift iteration is slow.

• Large total computational cost: O(kN²) where k = average number of mean-shift iterations per pixel (k ≈ 20–100). Accelerations are possible that produce almost the same segmentation.

Gaussian mean-shift (GMS) algorithm:

    for n ∈ {1, ..., N}                                   for each data point
        x ← x_n                                           starting point
        repeat                                            iteration loop
            ∀n: p(n|x) ← exp(−½‖(x−x_n)/σ‖²) / Σ_{n'=1}^N exp(−½‖(x−x_{n'})/σ‖²)
            x ← Σ_{n=1}^N p(n|x) x_n                      update x
        until x's update < tol
        z_n ← x                                           maximum
    end
    connected-components({z_n}_{n=1}^N)                   clusters

Gaussian blurring mean-shift (GBMS) algorithm:

    repeat                                                iteration loop
        for m ∈ {1, ..., N}                               for each data point
            ∀n: p(n|x_m) ← exp(−½‖(x_m−x_n)/σ‖²) / Σ_{n'=1}^N exp(−½‖(x_m−x_{n'})/σ‖²)
            y_m ← Σ_{n=1}^N p(n|x_m) x_n                  one GMS step
        end
        ∀m: x_m ← y_m                                     update whole data set
    until stop
    connected-components({x_n}_{n=1}^N)                   clusters

Figure 1: Pseudocode for GMS (top) and GBMS (bottom). The "connected-components" step collects all equivalent but numerically slightly different points.
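A runnable NumPy sketch of the GMS pseudocode, assuming a simple distance-threshold stand-in for the connected-components step (merging modes closer than σ):

```python
import numpy as np

def gms_cluster(X, sigma, tol=1e-5, max_iter=500):
    """Gaussian mean-shift clustering: run the mean-shift iteration
    from every data point, then group the converged modes."""
    N = len(X)
    Z = np.empty_like(X, dtype=float)
    for n in range(N):
        x = X[n].astype(float)                    # starting point x <- x_n
        for _ in range(max_iter):
            w = np.exp(-0.5 * np.sum((X - x)**2, axis=1) / sigma**2)
            x_new = (w / w.sum()) @ X             # x <- sum_n p(n|x) x_n
            if np.linalg.norm(x_new - x) < tol:   # until update < tol
                break
            x = x_new
        Z[n] = x_new                              # z_n <- maximum
    # stand-in for connected-components: merge modes closer than sigma
    labels = -np.ones(N, dtype=int)
    modes = []
    for n in range(N):
        for k, m in enumerate(modes):
            if np.linalg.norm(Z[n] - m) < sigma:
                labels[n] = k
                break
        else:
            modes.append(Z[n])
            labels[n] = len(modes) - 1
    return labels, np.array(modes)

rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(0, .2, (30, 2)), rng.normal(5, .2, (30, 2))])
labels, modes = gms_cluster(X, sigma=0.6)
print(len(modes))   # 2: the number of clusters falls out of sigma alone
```

Note the O(kN²) cost is visible directly: an O(N) mean-shift step, repeated k times per point, for all N points.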

Mean-shift blurring clustering


Like mean-shift clustering, but actually move the data points at each step. It obtains very similar segmentations to those of mean-shift clustering but considerably faster (cubic convergence rate for Gaussian clusters): the total computational cost is still O(kN²) but k is much smaller (k ≈ 5). It is related to spectral clustering, since effectively the algorithm iterates (X ← PX, update P), where X = (x_1, ..., x_N), P = D⁻¹A is the stochastic, random-walk matrix, $A_{mn} = \exp\left(-\frac{1}{2}\|(x_m - x_n)/\sigma\|^2\right)$ are the Gaussian affinities, and $D = \mathrm{diag}\left(\sum_{n=1}^N A_{mn}\right)$ is the degree matrix.
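The X ← PX iteration can be sketched directly in matrix form (a minimal NumPy illustration, recomputing P at each step as described above):

```python
import numpy as np

def gbms_step(X, sigma):
    """One blurring step X <- P X with P = D^{-1} A the random-walk
    matrix of Gaussian affinities A_mn = exp(-1/2 ||(x_m - x_n)/sigma||^2)."""
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
    A = np.exp(-0.5 * sq / sigma**2)        # Gaussian affinities
    P = A / A.sum(axis=1, keepdims=True)    # normalize rows: D^{-1} A
    return P @ X                            # move the whole data set at once

rng = np.random.default_rng(3)
X = np.concatenate([rng.normal(0, .3, (30, 2)), rng.normal(5, .3, (30, 2))])
for _ in range(10):
    X = gbms_step(X, sigma=0.5)
# each well-separated cluster collapses to (almost) a single point
print(np.ptp(X[:30], axis=0).max() < 1e-3)   # True
```

The rapid collapse of each cluster onto its mean is the cubic convergence mentioned above; the two clusters barely attract each other because their mutual affinities are essentially zero at this σ.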

Exercises
1. Derive the mean-shift algorithm for a general kernel K.

2. Prove that the mean-shift algorithm is gradient ascent on p(x) with an adaptive step size.

3. Derive the mean-shift algorithm for the Gaussian kde with a bandwidth that is a full
covariance matrix Σn for each point.

[Plot: number of clusters (top) and number of mean-shift iterations (bottom) as a function of the bandwidth σ.]
Figure 2: Segmentation results with GMS for hand 50 × 40.

