Alignment and Object Instance Recognition: Computer Vision Jia-Bin Huang, Virginia Tech
Alignment and Object Instance Recognition: Computer Vision Jia-Bin Huang, Virginia Tech
Alignment and Object Instance Recognition: Computer Vision Jia-Bin Huang, Virginia Tech
Recognition
Computer Vision
Jia-Bin Huang, Virginia Tech
Many slides from S. Lazebnik and D. Hoiem
Administrative Stuffs
• HW 2 due 11:55 PM Oct 3
Today’s class
• Review fitting
• Alignment
• Going over HW 2
Previous class
• Global optimization / Search for parameters
• Least squares fit
• Robust least squares
• Iterative closest point (ICP)
2
2 x1 1 y1
m m
E i 1 xi 1 yi Ap y
n 2
b b
xn 1 yn
y T y 2( Ap)T y ( Ap)T ( Ap)
dE
2 A T Ap 2 A T y 0
dp Matlab: p = A \ y;
A T Ap A T y p A T A A T y
1
A p y
Slide modified from S. Lazebnik
E a n b n
i 1 2( a xi b yi c ) 0 c i 1 xi i 1 yi a x b y
n
c n n
2
x1 xy1 y
a
E i 1 ( a ( xi x ) b( yi y )) 2
n
p T A T Ap
b
x n x
y n y
p T A T Ap
minimize p T A T Ap s.t. p T p 1 minimize
pT p
Solution is eigenvector corresponding to smallest eigenvalue of ATA
% get a, b, c parameters
a = p(1);
b = p(2);
c = -(a*mean(x)+b*mean(y));
err = (a*x+b*y+c).^2;
A p
% convert to slope-intercept (m, b)
m = -a/b;
b = -c/b; % note: this b is for slope-intercept now
Robust Estimator
•1. Initialize: e.g., choose by least squares fit and
x
Hough space
N I 14
Algorithm:
1. Sample (randomly) the number of points required to fit the model (#=2)
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
function [m, b] = ransacfit(x, y)
% y = mx + b
N = 200;
thresh = 0.03;
bestcount = 0;
for k = 1:N
rp = randperm(numel(x));
tx = x(rp(1:2));
ty = y(rp(1:2));
m = (ty(2)-ty(1)) ./ (tx(2)-tx(1));
b = ty(2)-m*tx(2);
nin = sum(abs(y-m*x-b)<thresh);
if nin > bestcount
bestcount = nin;
inliers = (abs(y - m*x - b) < thresh);
end
end
% total least square fitting on inliers
[m, b] = total_lsqfit(x(inliers), y(inliers));
Line fitting demo
demo_linefit(npts, outliers, noise, method)
residual ( x , M )
i
i
residual (T ( x ), x)
i
i i
What if you want to align but have no prior
matched pairs?
• Hough transform and RANSAC not applicable
• Important applications
A1
A2 A3 B1
B2 B3
Given matched points in {A} and {B}, estimate the translation of the object
xiB xiA t x
B A t
yi yi y
Example: solving for translation
A1
A2 A3 (tx, ty) B1
B2 B3
A4
B2 B3
B5
Problem: outliers
A2 A3 (tx, ty) B1
A4 B2 B3
A5 A6
(tx, ty)
• Difficulties
• Noise (typically 1-3 pixels)
• Outliers (often 30-50%)
• Many-to-one matches or multiple objects
Parametric (global) warping
T
p = (x,y) p’ = (x’,y’)
Transformation T is a coordinate-changing machine:
p’ = T(p)
original
Transformed
2
Scaling
X 2,
Y 0.5
Scaling
(x’, y’)
(x, y)
x’ = x cos() - y sin()
y’ = x sin() + y cos()
2-D Rotation
Polar coordinates…
x = r cos (f )
y = r sin (f)
x’ = r cos (f + )
(x’, y’) y’ = r sin (f + )
Trig Identity…
(x, y) x’ = r cos(f) cos() – r sin(f) sin()
y’ = r sin(f) cos() + r cos(f) sin()
Substitute…
f
x’ = x cos() - y sin()
y’ = x sin() + y cos()
2-D Rotation
This is easy to capture in matrix form:
x
x' cos sin x x 1 0 t x
y ' sin cos y y 0 1 t y
y
1
Rotate Translate
x
x a b c Affine is any combination of
y d e
f
y
translation, scale, rotation,
1
Affine shear
Affine Transformations
x
Affine transformations are combinations of x a b c
• Linear transformations, and y d e
f
y
• Translations 1
or
Properties of affine transformations:
• Lines map to lines
x' a b c x
• Parallel lines remain parallel
y ' d e f y
• Ratios are preserved
• Closed under composition 1 0 0 1 1
Projective Transformations
Matched B2
B1
keypoints
2. Solve for affine
transformation parameters
Affine
Parameters
3. Score by inliers and choose
solutions with score above This Class
# Inliers
threshold
Choose hypothesis with max
score above threshold
Overview of Keypoint Matching
1. Find a set of
distinctive key-
points
A1 B3
2. Define a region
around each
A2 A3 keypoint
B2
3. Extract and
B1
normalize the
region content
fA fB
4. Compute a local
N pixels
Input
Image Stored
Image
xi Mx i t
xi m1 m2 xi t1
y m Want to find M, t to minimize
m4 yi t 2
i 3 n
|| xi Mxi t ||
i 1
2
Fitting an affine transformation
• Assume we know the correspondences, how do we
get the (transformation?
x,y )
i i
( xi, yi)
m1
m2
x 1 0 m3 xi
xi m1 m2 xi t1 i yi 0 0
y m
i 3 m4 yi t 2 0 0 xi yi 0 1 m4 yi
t1
t 2
Fitting an affine transformation
m1
m2
x yi 0 0 1 0 m3 xi
i
0 0 xi yi 0 1 m4 yi
t1
t 2
• Linear system with six unknowns
• Each match gives us two linearly independent
equations: need at least three to solve for the
transformation parameters
Finding the objects (in detail)
1. Match interest points from input image to database image
2. Get location/scale/orientation using Hough voting
• In training, each point has known position/scale/orientation
wrt whole object
• Matched points vote for the position, scale, and orientation
of the entire object
• Bins for x, y, scale, orientation
• Wide bins (0.25 object length in position, 2x scale, 30 degrees orientation)
• Vote for two closest bin centers in each direction (16 votes total)
3. Geometric verification
• For each bin with at least 3 keypoints
• Iterate between least squares fit and checking for inliers and
outliers
4. Report object if > T inliers (T is typically 3, can be computed to
match some probabilistic threshold)
Examples of recognized objects
View interpolation
• Training
– Given images of different
viewpoints
– Cluster similar viewpoints
using feature matches
– Link features in adjacent
views
• Recognition
– Feature matches may be
spread over several
training viewpoints
Use the known links to
“transfer votes” to other [Lowe01]
viewpoints Slide credit: David Lowe
Applications
• Sony Aibo
(Evolution Robotics)
• SIFT usage
– Recognize
docking station
– Communicate
with visual cards
• Other uses
– Place recognition
– Loop closure in SLAM
K. Grauman, B. Leibe 53
Slide credit: David Lowe
Location Recognition
Training
[Lowe04]
Slide credit: David Lowe
Another application: category
recognition
• Goal: identify what type of object is in the image
• Approach: align to known objects and choose
category with best match
“Shape matching and object recognition using low distortion correspondence”, Berg et
al., CVPR 2005: http://www.cnbc.cmu.edu/cns/papers/berg-cvpr05.pdf
Examples of Matches
Examples of Matches
Other ideas worth being aware of
• Thin-plate splines: combines global affine warp with
smooth local deformation
ws = 7; % Tracking ws x ws patches
[track_x, track_y] = ... % Prob 1.2 Keypoint tracking
trackPoints(pt_x, pt_y, im, ws);
% Visualizing the feature tracks on the first and the last frame
figure(2), imagesc(im{1}), hold off, axis image, colormap gray
hold on, plot(track_x', track_y', 'r');
HW 2 – Feature/Keypoint detection
• Compute second moment matrix
I x2 ( D ) I x I y ( D )
( I , D ) g ( I )
I x I y ( D ) I y ( D )
2
% Initialization
N = numel(pt_x); % Number of keypoints
nim = numel(im); % Number of images
track_x = zeros(N, nim);
track_y = zeros(N, nim);
track_x(:, 1) = pt_x(:);
track_y(:, 1) = pt_y(:);
for i = 1: 50