SVM and Kernels
SVM and Kernels
Martin Hofmann
1 / 13
Optimization Problem
maximize:
W() =
m
X
i=1
m
1X
j j yi yj hxi xj i
2
i,j=1
subject to i 0, i = 1, . . . , m and
m
i=1
i yi = 0
2 / 13
Mapping Example
map data points into feature space with some function
e.g.:
: R2 R2
hyperplane hw zi = 0, as a function of x:
3 / 13
Kernel Trick
solve maximisation problem using mapped data points
W() =
m
X
i=1
m
1X
i
j j yi yj h(xi ) (xj )i
2
i,j=1
f (x) = hw xi + b =
i yi hxi xi with w =
i yi xi
4 / 13
K(x, z)
= h~x ~zi
= (x1 z1 + x2 z2 )2
= (x12 z21 + 2x1 z1 x2 z2 + x22 z22 )
E
D
=
(x12 , 2x1 x2 , x22 ) (z21 , 2z1 z2 , z22 )
= h(~x) (~z)i
5 / 13
Typical Kernels
Polynomial Kernel
d
K(x, z) = (hx zi + ) ,
for d 0
K(x, z) = e
kxzk2
2 2
kxk :=
hx xi
(Sigmoid Kernel)
K(x, z) = tanh( hx zi +
Inverse multi-quadric
K(x, z) = p
kx zk2 2 2 + c2
6 / 13
, 0
0
K s(, ) =
N N0
X
X
k(xi , xj0 )
i=1 j=1
, 0
no one-fits-all kernel
model search and cross-validation in practice
low polynomial or RBF a good initial try
7 / 13
Kernel Properties
Symmetry
8 / 13
K(x, z)
K(x, z)
K(x, z)
K(x, z)
K(x, z)
=
=
=
=
=
c K1 (x, z)
c + K1 (x, z)
K1 (x, z) + K2 (x, z)
K1 (x, z) K2 (x, z)
f (x) f (z)
9 / 13
Gram Matrix
input vectors
let V = ~v1 , . . . ,~vn a set of input vectors, then the Gram Matrix K is defined
as:
..
h(~v2 ) (~v1 )i . . .
K=
..
.
h(~vn ) (~v1 )i . . . h(~vn ) (~vn )i
K is symmetric and positive semis-definite (positive eigenvalues)
10 / 13
Mercers Theorem
assume:
finite input space X = {x1 , . . . , xn }
symmetric function K(x, z) on X
Gram Matrix K = (K(xi , xj ))ni,j=1
since K is symmetric there exists an orthogonal matrix V s.t. K = VV0
diagonal containing eigenvalues t of K
and eigenvectors vt = (vti )ni=1 as columns of V
all eigenvalues are non-negative and let feature mapping be
: xi 7
p
i vti
n
t=1
then
h(xi ) (xj )i =
n
X
Rn , i = 1, . . . , n.
t=1
11 / 13
12 / 13
Summary
function
represented by symmetric, positive semi-definite Gram Matrix
fuses information about data and kernel
standard kernels (cross validation)
every similarity matrix can be used as kernel (satisfying Mercers criteria)
ongoing research to estimate Kernel Matrix from available data
13 / 13