CpE646 9v3 PDF
In Chapter 3, the underlying probability densities were known (or given). The training samples were used to estimate the parameters of these probability densities (ML, MAP estimations).
In this chapter, we only know the proper forms of the discriminant functions; the training samples will be used to estimate the parameters of the discriminant functions.
We focus on linear discriminant functions, which are either linear in the components of x, or linear in some functions of x.
They may not be optimal, but they are very simple to use.
Linear Discriminant Functions and Decision Surfaces
For a linear discriminant g(x) = w^t x + w_0 with decision surface (hyperplane) H: g(x) = 0, g(x) also provides an algebraic measure of the distance from x to the hyperplane. Write

x = x_p + r \frac{w}{\|w\|}

where x_p is the normal projection of x onto H, and r is the distance between x and H.

Since g(x_p) = w^t x_p + w_0 = 0 and w^t w = \|w\|^2,

g(x) = w^t x + w_0 = w^t \left( x_p + r \frac{w}{\|w\|} \right) + w_0 = r \|w\|,

or

r = \frac{g(x)}{\|w\|}
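The relation r = g(x)/||w|| is easy to verify numerically. Below is a minimal NumPy sketch; the weight vector w, bias w_0, and test point x are made-up values used only for illustration.

```python
import numpy as np

# Hypothetical weight vector w, bias w0, and test point x (made-up values).
w = np.array([3.0, 4.0])
w0 = -5.0
x = np.array([2.0, 1.0])

g_x = w @ x + w0                      # linear discriminant g(x) = w^t x + w0
r = g_x / np.linalg.norm(w)           # signed distance from x to the hyperplane H

x_p = x - r * w / np.linalg.norm(w)   # normal projection of x onto H
print(r)                              # distance (positive when x is on the positive side of H)
print(w @ x_p + w0)                   # ~0, confirming that x_p lies on H
```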
Linear Discriminant Functions and Decision Surfaces
The multi-category case
We define c linear discriminant functions

g_i(x) = w_i^t x + w_{i0},   i = 1, \ldots, c
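In the multi-category case, x is assigned to the class whose discriminant g_i(x) is largest. A minimal sketch with made-up weights for c = 3 classes:

```python
import numpy as np

# Hypothetical weights: row i of W holds w_i, and b[i] holds w_i0 (made-up values).
W = np.array([[ 1.0,  2.0],
              [-1.0,  0.5],
              [ 0.0, -1.0]])
b = np.array([0.0, 1.0, -0.5])

def classify(x):
    """Assign x to the class with the largest discriminant g_i(x) = w_i^t x + w_i0."""
    g = W @ x + b
    return np.argmax(g) + 1            # classes numbered 1..c as in the text

print(classify(np.array([0.5, 0.5])))  # -> 1 for these example weights
```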
Example: a quadratic decision function for a 2-dimensional feature space

g(x) = w_0 + w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2 + w_4 x_1 + w_5 x_2

where w = (w_0, w_1, w_2, \ldots, w_5)^T and x = (1, x_1^2, x_1 x_2, x_2^2, x_1, x_2)^T
Generalized Linear Discriminant Functions
g(x) = 2x_1^2 + x_3^2 + x_2 x_3 + 4x_1 - 2x_2 + 1

g(x) = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix} + 1

Let y = (1, x_1, \ldots, x_d)^t and a = (w_0, w_1, \ldots, w_d)^t; then we have g(x) = a^t y.
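The point of the generalized form is that a discriminant that is quadratic in x becomes linear in a suitably chosen vector y. The sketch below evaluates the quadratic example above both directly and as a^t y; the ordering of the components of y is an illustrative choice, not fixed by the slide.

```python
import numpy as np

def g_direct(x):
    # The quadratic discriminant from the example, evaluated directly in x-space.
    x1, x2, x3 = x
    return 2*x1**2 + x3**2 + x2*x3 + 4*x1 - 2*x2 + 1

# The same function written as g(x) = a^t y with an augmented vector
# y = (1, x1^2, x3^2, x2*x3, x1, x2)^t  (component order chosen here for illustration).
a = np.array([1.0, 2.0, 1.0, 1.0, 4.0, -2.0])

def g_linear(x):
    x1, x2, x3 = x
    y = np.array([1.0, x1**2, x3**2, x2*x3, x1, x2])
    return a @ y

x = np.array([1.0, -2.0, 0.5])
print(g_direct(x), g_linear(x))   # identical values: g is linear in y
```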
Generalized Linear Discriminant Functions
Example: let the quadratic discriminant function be

g(x) = a_1 + a_2 x + a_3 x^2

The original x-space is 1-dimensional, and the 3-dimensional vector y is given by

y = (1, x, x^2)^t

Varying x in 1-D will cause y to trace out a curve in 3-D.
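A short sketch of that mapping, with made-up weights a = (a_1, a_2, a_3):

```python
import numpy as np

# Map 1-D samples x to 3-D vectors y = (1, x, x^2)^t; the mapped points all lie
# on a parabolic curve in 3-D, as described above.
x = np.linspace(-2.0, 2.0, 9)
Y = np.stack([np.ones_like(x), x, x**2], axis=1)   # each row is one y vector

a = np.array([-1.0, 0.0, 2.0])                     # made-up weights: g(x) = -1 + 2 x^2
print(Y @ a)                                       # g(x) = a^t y along the curve
```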
Generalized Linear Discriminant Functions
Disconnected regions may become connected in high dimension.
If x is drawn from p(x), the density in y-space, \hat{p}(y), will be degenerate, being zero everywhere except on the curve, where it is infinite. This is a common problem when \hat{d} > d, i.e., when mapping from a lower-dimensional space to a higher-dimensional space.
Generalized Linear Discriminant Functions
In a 2-category case, given a linear discriminant function g(x) = a^t y, if there is a weight vector that can correctly classify all the samples, these samples are called linearly separable.
In this case, y_i is labeled \omega_1 if a^t y_i > 0, and labeled \omega_2 if a^t y_i < 0.
Or we can replace all samples labeled \omega_2 by their negatives, and then we are looking for a weight vector a such that a^t y_i > 0 for all the samples. This weight vector is called a separating vector.
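A small NumPy sketch of this "normalization" trick; the samples, labels, and candidate weight vector are made up for illustration.

```python
import numpy as np

# Hypothetical augmented samples (leading 1) with labels 1 and 2.
Y = np.array([[1.0,  2.0,  1.0],    # class 1
              [1.0,  1.5,  2.0],    # class 1
              [1.0, -1.0, -0.5],    # class 2
              [1.0, -2.0,  1.0]])   # class 2
labels = np.array([1, 1, 2, 2])

# Normalization: replace every class-2 sample by its negative, so a separating
# vector a only needs to satisfy a^t y_i > 0 for all i.
Y_norm = Y.copy()
Y_norm[labels == 2] *= -1

a = np.array([0.0, 1.0, 0.5])       # a candidate weight vector (made up)
print(np.all(Y_norm @ a > 0))       # True if a is a separating vector
```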
The Two-Category Linearly Separable Case
Newton descent
The iteration is based on

a(k+1) = a(k) - H^{-1} \nabla J

where H is the Hessian matrix of second partial derivatives \partial^2 J / \partial a_i \partial a_j evaluated at a(k).
Usually Newton descent will give a greater improvement per step than the simple gradient descent algorithm.
It is not applicable if the Hessian matrix is singular.
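A generic sketch of the Newton update, assuming the user supplies the gradient and Hessian of the criterion J; the toy quadratic criterion below is made up to show that Newton descent reaches its minimum in one step.

```python
import numpy as np

def newton_descent(a0, grad, hess, n_iter=10):
    """Newton descent a(k+1) = a(k) - H^{-1} grad J, with H the Hessian at a(k)."""
    a = a0.astype(float)
    for _ in range(n_iter):
        H = hess(a)
        # Not applicable when H is singular; solve the linear system rather than inverting H.
        a = a - np.linalg.solve(H, grad(a))
    return a

# Toy criterion J(a) = ||a - c||^2 (made up); Newton converges in a single step.
c = np.array([1.0, -2.0])
a_min = newton_descent(np.zeros(2),
                       grad=lambda a: 2 * (a - c),
                       hess=lambda a: 2 * np.eye(2),
                       n_iter=1)
print(a_min)    # [ 1. -2.]
```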
The Two-Category Linearly Separable Case
Criterion Functions
The Perceptron criterion function:

J_p(a) = \sum_{y \in Y} (-a^t y),  and then  \nabla J_p = \sum_{y \in Y} (-y)

where Y is the set of samples misclassified by a.

J_q(a) = \sum_{y \in Y} (a^t y)^2

J_r(a) = \frac{1}{2} \sum_{y \in Y'} \frac{(a^t y - b)^2}{\|y\|^2}

where Y' is the set of samples for which a^t y \le b.

J_s(a) = \|Ya - b\|^2 = \sum_{i=1}^{n} (a^t y_i - b_i)^2
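A sketch of the Perceptron criterion and a basic batch gradient descent on it, assuming the samples have already been normalized (class-2 samples negated) as described above; the step size eta is a made-up choice.

```python
import numpy as np

def perceptron_criterion(a, Y_norm):
    """J_p(a) = sum over misclassified samples (a^t y <= 0) of (-a^t y)."""
    mis = Y_norm @ a <= 0
    return np.sum(-(Y_norm[mis] @ a))

def perceptron_batch(Y_norm, eta=0.5, n_iter=100):
    """Batch gradient descent on J_p (a sketch; eta and n_iter are made up)."""
    a = np.zeros(Y_norm.shape[1])
    for _ in range(n_iter):
        mis = Y_norm @ a <= 0                  # samples not yet on the positive side
        if not np.any(mis):                    # a^t y > 0 for all samples: a separates the data
            break
        grad = -np.sum(Y_norm[mis], axis=0)    # gradient of J_p over the misclassified samples
        a = a - eta * grad
    return a
```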
Support Vector Machine
Support vector machines (SVMs) are linear machines with margins.
SVMs rely on a nonlinear function (kernel) \varphi(\cdot) to map the data into a sufficiently high dimension, in which the two categories can always be separated by a hyperplane.
Given n samples x_k, k = 1, 2, \ldots, n, let z_k = +1 or -1 for sample x_k in \omega_1 or \omega_2 respectively. We have

y_k = \varphi(x_k)

g(y) = a^t y

The separating hyperplane ensures

z_k g(y_k) \ge 1,   k = 1, \ldots, n.
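A small sketch of the margin constraint check; the mapping phi (a quadratic feature map), the samples, and the weight vector a are all made-up illustrations, since the slide only assumes that some \varphi exists.

```python
import numpy as np

def phi(x):
    # A hypothetical mapping to a higher-dimensional y-space (quadratic features).
    x1, x2 = x
    return np.array([1.0, x1, x2, x1*x2, x1**2, x2**2])

def satisfies_margin(a, X, z):
    """Check the constraint z_k * g(y_k) >= 1 for every sample, with g(y) = a^t y."""
    Y = np.array([phi(x) for x in X])
    return np.all(z * (Y @ a) >= 1.0)

# Usage with made-up numbers:
X = np.array([[1.0, 1.0], [-1.0, -0.5]])
z = np.array([+1, -1])
a = np.array([0.0, 2.0, 1.0, 0.0, 0.0, 0.0])   # a hypothetical weight vector in y-space
print(satisfies_margin(a, X, z))               # True for these values
```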
Support Vector Machine
The support vectors are the transformed training samples for which z_k g(y_k) = 1; they are equally close to the hyperplane, and they are the most difficult patterns to classify.
Support Vector Machine
SVM training
The choice of \varphi(\cdot) requires domain knowledge. Other common choices include polynomials, Gaussians, and radial basis functions (RBF).
Define

L(a, \alpha) = \frac{1}{2}\|a\|^2 - \sum_{k=1}^{n} \alpha_k \left[ z_k a^t y_k - 1 \right]
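A minimal sketch that just evaluates this Lagrangian for given values; Y (mapped samples y_k as rows), z (labels +1/-1), a, and alpha would come from the training procedure and are treated here as inputs.

```python
import numpy as np

def svm_lagrangian(a, alpha, Y, z):
    """L(a, alpha) = 0.5 * ||a||^2 - sum_k alpha_k * [z_k a^t y_k - 1]."""
    margins = z * (Y @ a) - 1.0              # z_k a^t y_k - 1 for every sample
    return 0.5 * np.dot(a, a) - np.sum(alpha * margins)
```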