4.4-InstanceBasedLearning Part 2
This can be generalized to weighted nearest neighbor classifiers, in which the i-th
nearest neighbor is assigned a weight wi, i = 1..k.
A typical choice is the inverse of the squared distance, wi = 1 / d(xq, xi)^2.
The weighted k-NN algorithm can be used both for classification and for regression:
• In weighted k-NN classification, the output is a class membership. A query instance
(xq) is assigned the class label that receives the largest total vote among its k nearest
neighbors, where the vote of each neighbor i is weighted with wi.
• In weighted k-NN regression, the output is the property value for the query instance (xq).
This value is the weighted sum of the property values of its k nearest neighbors divided
by the sum of the weights.
In the weighted approach one can extend the k-nearest neighbor method from the k nearest
neighbors to all data items. Keeping only the k nearest elements is called a local weighted
method, while the extension to all data items is called a global weighted method. A sketch
of both weighted k-NN variants is given below.
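As an illustration, here is a minimal sketch of weighted k-NN classification and regression with inverse-squared-distance weights, using NumPy; the function names and the toy data are my own and are not from the lecture.

```python
import numpy as np

def inverse_square_weights(distances, eps=1e-12):
    """Typical weighting scheme: w_i = 1 / d(x_q, x_i)^2."""
    return 1.0 / (distances ** 2 + eps)  # eps avoids division by zero

def weighted_knn_classify(X, y, x_q, k=3):
    """Weighted k-NN classification: each neighbor votes with weight w_i."""
    d = np.linalg.norm(X - x_q, axis=1)          # distances to the query
    idx = np.argsort(d)[:k]                      # indices of the k nearest neighbors
    w = inverse_square_weights(d[idx])
    votes = {label: w[y[idx] == label].sum() for label in np.unique(y[idx])}
    return max(votes, key=votes.get)             # label with the largest total weight

def weighted_knn_regress(X, y, x_q, k=3):
    """Weighted k-NN regression: weighted mean of the neighbors' target values."""
    d = np.linalg.norm(X - x_q, axis=1)
    idx = np.argsort(d)[:k]
    w = inverse_square_weights(d[idx])
    return np.sum(w * y[idx]) / np.sum(w)

# Toy usage (illustrative data, not from the lecture)
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
y_cls = np.array([0, 0, 1, 1])
y_reg = np.array([0.1, 0.9, 2.1, 5.2])
x_q = np.array([1.5, 1.5])
print(weighted_knn_classify(X, y_cls, x_q, k=3))
print(weighted_knn_regress(X, y_reg, x_q, k=3))
```

Setting k equal to the number of training items turns this local weighted method into the global weighted method.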
Locally Weighted Regression
The nearest neighbor approaches approximate the target function for a single query instance (xq).
Locally weighted regression (LWR) is an extension of this approach. It constructs an explicit
approximation of the target function over a local region surrounding xq. The approximation may
be a linear function, a quadratic function, etc.
• The term local in locally weighted regression reflects the fact that the approximation
is based only on data near xq.
• The term weighted reflects the fact that the contribution of each training instance is
weighted based on its distance from xq. The weights are defined by a so-called kernel
function. One can say that the kernel function moderates the original distance measure.
• The term regression reflects the fact that we aim at approximating real-valued
functions.
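A minimal sketch of locally weighted linear regression follows, assuming a Gaussian kernel over the distance to xq and a weighted least-squares fit; the bandwidth tau, the helper names, and the toy data are illustrative choices, not from the lecture.

```python
import numpy as np

def gaussian_kernel(distance, tau=1.0):
    """Kernel (window) function: turns a distance into a weight in (0, 1]."""
    return np.exp(-distance ** 2 / (2.0 * tau ** 2))

def locally_weighted_regression(X, y, x_q, tau=1.0):
    """Fit a linear approximation of the target function around the query x_q."""
    Xb = np.c_[np.ones(len(X)), X]               # add intercept column
    xqb = np.r_[1.0, np.atleast_1d(x_q)]         # query point with intercept
    d = np.linalg.norm(X - x_q, axis=1) if X.ndim > 1 else np.abs(X - x_q)
    W = np.diag(gaussian_kernel(d, tau))         # weights shrink with distance from x_q
    # Weighted least squares: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xqb @ theta                           # prediction of the local linear model at x_q

# Toy usage: approximate a noisy sine locally (illustrative data)
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 50)
y = np.sin(X) + 0.1 * rng.standard_normal(50)
print(locally_weighted_regression(X, y, x_q=2.0, tau=0.5))
```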
Kernel Functions in Nonparametric Statistics
A kernel is a window function. When its argument is a distance measure, one can say
that the kernel function is a moderation of the original distance measure.
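For illustration, here is a sketch of two common window functions used in nonparametric statistics, the Gaussian and the Epanechnikov kernel; the bandwidth parameter h is my own notation, not from the lecture.

```python
import numpy as np

def gaussian_window(d, h=1.0):
    """Gaussian kernel: smooth, strictly positive weight for any distance d."""
    return np.exp(-(d / h) ** 2 / 2.0)

def epanechnikov_window(d, h=1.0):
    """Epanechnikov kernel: weight drops to exactly zero outside the window |d| < h."""
    u = d / h
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)

# Both map a distance to a weight that decreases as the distance grows.
print(gaussian_window(np.array([0.0, 0.5, 2.0])))
print(epanechnikov_window(np.array([0.0, 0.5, 2.0])))
```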
A kernelized binary classifier predicts the label of a query instance as
cq = sign( sum_{i=1..n} wi * ci * k(xi, xq) )
where
• xi and ci are the feature vector and class label (+1 or -1) for training instance i
• cq is the predicted label (+1 or -1) for the unlabeled input xq
• the wi are the weights for the training examples
• k is the function that measures similarity between any pair of instances (the kernel)
• the sum ranges over the n labeled examples in the classifier's training set
• the sign function determines whether the predicted classification comes out positive
or negative (+1 or -1).
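Below is a minimal sketch of this kernelized classifier, assuming a Gaussian (RBF) similarity function and uniform weights wi; both choices, and the toy data, are illustrative rather than fixed by the lecture.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Similarity between two instances: 1.0 for identical points, decaying with distance."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_classify(X, c, w, x_q, kernel=rbf_kernel):
    """Predict c_q = sign( sum_i w_i * c_i * k(x_i, x_q) )."""
    score = sum(w_i * c_i * kernel(x_i, x_q) for x_i, c_i, w_i in zip(X, c, w))
    return 1 if score >= 0 else -1

# Toy usage: two labeled clusters and a query point (illustrative data)
X = np.array([[0.0, 0.0], [0.5, 0.2], [3.0, 3.0], [3.2, 2.8]])
c = np.array([-1, -1, +1, +1])          # class labels in {+1, -1}
w = np.ones(len(X))                     # uniform weights for the training examples
print(kernel_classify(X, c, w, np.array([2.9, 3.1])))   # expected: +1
```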
Structure for the rest of the lecture
2. Kernel Methods
3. Support Vector Machine
The property of the instance space required for such a hyperplane to
be found is called linear separability. Obviously, linear classification
techniques can only handle linearly separable cases.
Techniques that can handle non-linear situations are called non-linear
classification techniques.
Hyperplane
A hyperplane in an n-dimensional Euclidean space is an
(n-1)-dimensional subset of that space that divides the space
into two disconnected parts. The examples show hyperplanes
in 2 and 3 dimensions (a line and a plane, respectively).
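A minimal sketch of how a hyperplane w·x + b = 0 splits the space into two parts: a linear classifier simply checks the sign of w·x + b. The weight vector, offset, and points below are illustrative.

```python
import numpy as np

def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 the point x lies."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# In 2 dimensions the hyperplane is a line, e.g. x1 + x2 - 3 = 0
w, b = np.array([1.0, 1.0]), -3.0
print(side_of_hyperplane(w, b, np.array([0.0, 0.0])))   # -1: below the line
print(side_of_hyperplane(w, b, np.array([4.0, 2.0])))   # +1: above the line
```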
Support Vector Machine
Example of the kernel trick: with the feature map φ(x) = (x1^2, x2^2, √2*x1*x2), the inner
product in feature space can be computed directly from the original two-dimensional vectors:
φ(x)·φ(z) = x1^2*z1^2 + x2^2*z2^2 + 2*x1*x2*z1*z2 = (x1*z1 + x2*z2)^2 = (x·z)^2
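A quick numerical check of this identity, under the assumed feature map φ(x) = (x1^2, x2^2, √2*x1*x2); the example vectors are arbitrary.

```python
import numpy as np

def phi(v):
    """Explicit feature map for the quadratic kernel in 2 dimensions."""
    x1, x2 = v
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def quadratic_kernel(x, z):
    """Kernel trick: the same inner product computed without mapping to feature space."""
    return np.dot(x, z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))     # inner product in feature space
print(quadratic_kernel(x, z))     # (x.z)^2 -- identical value
```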
Cluster Analysis