Lecture 09: SVM Intro, Kernel Trick (Updated)
f(x) = sign(w^T x + b)
(Figure: several hyperplanes that all separate the training points xi. Which one should we choose? The one with good generalization.)
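As a small illustration (not part of the original slides), here is a minimal sketch of the decision rule above in Python/NumPy, assuming the weight vector w and bias b have already been learned; the values are hypothetical:

```python
import numpy as np

def predict(w, b, X):
    """Linear decision rule f(x) = sign(w^T x + b), applied to each row of X."""
    return np.sign(X @ w + b)

# Hypothetical, already-learned parameters for a 2-D toy problem.
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 1.0], [0.0, 2.0]])
print(predict(w, b, X))  # -> [ 1. -1.]
```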
SVM: Choosing a Separating Hyperplane
(Figure: the maximum-margin hyperplane with margin M and distance d to the closest points xi.)
The maximum-margin hyperplane is robust to outliers, as we saw, and thus has strong generalization ability. It has proven to give better performance on test data, both in practice and in theory.
SVM: Support Vectors
(Figure: the maximum-margin hyperplane with margin M; the points xi lying on the margin boundaries, at distance d from the hyperplane, are the support vectors.)
Maximum Margin Classification
(Figure: a maximum-margin separator with margin M and distances d to the points xi.)
Maximum-margin classification, as described so far, assumes linearly separable data and suffers from the noise (outlier) problem.
But it can deal with non-linear classes with a nice tweak.
SVM: Non-Linear Case
Key idea: map our points with a mapping function φ(x) to a space of sufficiently high dimension so that they become separable by a hyperplane in the new higher-dimensional feature space.
o Input space: the space where the points xi are located.
o Feature space: the space of φ(xi) after the transformation, where φ(·) is the transformation function.
For example, consider a non-linearly separable case in one dimension (points on the x axis). Mapping the data to two-dimensional space with φ(x) = (x, x²) makes the classes separable by a line.
(Figure: the 1-D points on the x axis, and the same points after mapping, plotted in the (x, x²) plane.)
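A minimal sketch (not from the slides) of this 1-D example in Python/NumPy, with hypothetical data points, showing that after the mapping φ(x) = (x, x²) the classes can be split by a simple threshold on the second coordinate:

```python
import numpy as np

# Toy 1-D data that is NOT linearly separable on the x axis:
# class -1 sits in the middle, class +1 on both sides (illustrative values).
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.4, 2.0, 3.0])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1])

# Map each point with phi(x) = (x, x^2); in the new feature space
# the two classes can be separated by a horizontal line, e.g. x^2 = 2.
phi = np.column_stack([x, x**2])

threshold = 2.0                      # hypothetical separating line x2 = 2
pred = np.where(phi[:, 1] > threshold, 1, -1)
print(pred)                          # matches y: the mapped data is separable
print((pred == y).all())             # True
```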
Interlude: Illustration of a hyperplane
Input space vs Feature space
o Linear kernel: K(x_i, x_j) = x_i^T x_j
o Polynomial kernel of power p: K(x_i, x_j) = (1 + x_i^T x_j)^p
o Gaussian kernel (also called RBF kernel): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2)); it can lift the data to an infinite-dimensional space.
o Two-layer perceptron kernel: K(x_i, x_j) = tanh(x_i^T x_j)
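As an illustration, here is a minimal NumPy sketch of these kernel functions; the parameter names and default values (p, sigma) are only for demonstration:

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi^T xj
    return xi @ xj

def polynomial_kernel(xi, xj, p=3):
    # K(xi, xj) = (1 + xi^T xj)^p
    return (1 + xi @ xj) ** p

def gaussian_kernel(xi, xj, sigma=1.0):
    # RBF kernel: K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

def perceptron_kernel(xi, xj):
    # Two-layer perceptron kernel: K(xi, xj) = tanh(xi^T xj)
    # (libraries such as scikit-learn usually add scale/offset parameters).
    return np.tanh(xi @ xj)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(xi, xj), polynomial_kernel(xi, xj, p=2),
      gaussian_kernel(xi, xj), perceptron_kernel(xi, xj))
```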
SVM: Kernel Issues
We saw that the Gaussian radial basis kernel lifts the data to an infinite-dimensional space, so the data is always separable there. Why don't we always use this kernel?
First of all, we have to decide which σ to use in this kernel:
K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
Secondly, a strong kernel, which lifts the data to infinite dimension, may sometimes lead to the severe problem of overfitting.
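A minimal sketch of this trade-off using scikit-learn (assuming it is available); note that scikit-learn parameterizes the RBF kernel through gamma = 1/(2σ²), and the dataset and gamma values below are purely illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, noisy data; the split and gamma values are illustrative.
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in (0.1, 1.0, 100.0):          # gamma = 1 / (2 * sigma^2)
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma:>6}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")

# A very large gamma (tiny sigma) typically drives the training accuracy
# toward 1.0 while the test accuracy drops: the overfitting problem above.
```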
SVM: Kernel Issues
o In addition to the above problems, sometimes the points are linearly separable but the margin is low: