Support Vector Machines: Some Slides Adapted From
Support Vector Machines: Some Slides Adapted From
Support Vector Machines: Some Slides Adapted From
Machines
Some slides adapted from
Var2
ABDBM © Ron Shamir 5
Maximizing the Margin
Var1
IDEA 1: Select the
separating
hyperplane that
maximizes the
margin!
Margin
Width
Margin
Width
Var2
ABDBM © Ron Shamir 6
Support Vectors
Var1
Support Vectors
Margin
Width
Var2
ABDBM © Ron Shamir 7
Setting Up the Optimization
Problem
Var1
Scaling w, b so that
k=1, the problem
becomes:
2
max
w x b 1 w
w s.t. (w x b) 1, x of class 1
(w x b) 1, x of class 2
w x b 1 1 Var2
1
w x b 0
• as
yi (w xi b) 1, xi
1
Minimize (w ) || w || 2 i yi (w x i b) i
w,b , 2
s.t. 1 0,..., l 0
• Dual Problem
1
Maximize F (Λ) Λ 1 Λ DΛ
2
Λy 0
subject to
Λ0
where Λ ( 1 , 2 ,...,l ), y ( y1 , y 2 ,..., y n ), Dij yi y j x i x j
• Representation for w
l
w i yi x i
• Decision function i 1
l
f ( x ) sign ( yi i ( x x i ) b)
i 1
Allow some
j instances to fall
within the margin,
but penalize them
w x b 1
w
w x b 1 1 Var2
1
w x b 0
Var1 yi (w xi b) 1 i , xi
i i 0
Objective function
j penalizes for
misclassified instances
and those within the
w x b 1 margin
w
1
min w C i
2
w x b 1 1 Var2
2
1 i
w x b 0
C trades-off margin width
& misclassifications
ABDBM © Ron Shamir 17
Linear, Soft-Margin SVMs
1
min w C i yi (w xi b) 1 i , xi
2
2 i i 0
• Algorithm tries to keep i at zero while maximizing
margin
• Alg does not minimize the no. of misclassifications
(NP-complete problem) but the sum of distances
from the margin hyperplanes
• Other formulations use i2 instead
• C: penalty for misclassification
• As C, we get closer to the hard-margin solution
l
f ( x ) sign ( yi i ( x x i ) b)
i 1
Var1 Var1
i
i
Var2
Var2 w x b 0
w x b 0
Var2
ABDBM © Ron Shamir 24
Advantages of Non-Linear
Surfaces
Var1
Var2
ABDBM © Ron Shamir 25
Linear Classifiers in High-
Dimensional Spaces
Constructed
Var1
Feature 2
Var2
Constructed
Feature 1
Find function (x) to map to
a different space
– n number of variables
s.t. yi ( w ( x ) b) 1 i , xi
• The (Wolfe) dual of this i 0
problem
– one equality constraint 1
– n positivity constraints min i j yi y j ( ( xi ) ( x j )) i
ai 2
– n number of variables i, j i
(Lagrange multipliers)
– Objective function more s.t. C i 0, xi
complicated
y
i
i i 0
sgn( wx b) sgn( i yi K ( xi , x ) b)
i
where b solves j ( y j i yi K ( xi , x j ) b 1) 0,
i
1/||w||
L2
dist
from
best
56