Kernel Models
From Intuition to Application
Motivation & Problem Statement
Linear models like logistic regression struggle with data that is not linearly separable, such as complex shapes or XOR patterns. This limitation makes them less effective in real-world applications like image classification or bioinformatics. To overcome it, we need a mechanism that can handle non-linear relationships while retaining computational efficiency. This is where kernel models come in: they let us implicitly transform data into higher dimensions where it becomes linearly separable.
Intuition Behind Non-Linearity
[Figure: two scatter plots of X and O classes over feature x1 — one linearly separable by a straight line, one where no linear decision boundary exists.]
What if there’s a 2D input space that’s not linearly separable?
2D to 3D Transformation
[Pipeline diagram: test data → 2D-to-3D feature map → output]
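To make the transformation concrete, here is a minimal sketch in Python (NumPy assumed; the feature map φ(x, y) = (x², √2 xy, y²) is the one used in the kernel-trick discussion later, and the toy points are invented for illustration):

import numpy as np

def phi(point):
    """Map a 2-D point (x, y) to 3-D: (x^2, sqrt(2)*x*y, y^2)."""
    x, y = point
    return np.array([x ** 2, np.sqrt(2) * x * y, y ** 2])

# The inner class sits near the origin, the outer class far from it:
# no straight line separates them in 2-D.
inner = [np.array([0.1, 0.2]), np.array([-0.2, 0.1])]
outer = [np.array([2.0, 1.5]), np.array([-1.8, 2.2])]

# After the map, the first and last coordinates encode x^2 and y^2,
# so a plane in 3-D separates the two classes.
print([phi(p) for p in inner])
print([phi(p) for p in outer])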
Types of Kernel Models
• Support Vector Machines (SVM)
• Kernel Ridge Regression (KRR)
• Kernel Principal Component Analysis (KPCA)
• Kernel Density Estimation (KDE)
Support Vector Machines
• Definition:
SVMs are max-margin classifiers that use
kernel functions to separate data into
different classes.
• SVMs find a hyperplane or line that
maximizes the distance between classes in
an N-dimensional space. They are based on
statistical learning theory and use support
vectors and margins to find the optimal
separating hyperplane.
• Advantages:
• Robust to noise and outliers when a soft margin is used.
• Effective in high-dimensional spaces.
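As a minimal sketch of the max-margin idea (scikit-learn assumed; the toy points below are invented for illustration):

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2-D (toy data).
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin.
clf = SVC(kernel="linear", C=1e3).fit(X, y)

print(clf.support_vectors_)        # the points that define the margin
print(clf.coef_, clf.intercept_)   # the hyperplane w.x + b = 0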
SVM using Kernel
• How It Works:
• Data is transformed into a
higher-dimensional space
or a new feature space
using a kernel.
• The optimal hyperplane is
found to separate the
classes.
• New data is classified
based on this hyperplane.
• Advantages:
• Can capture non-linear relationships in data.
• Can provide probabilistic confidence for predictions (e.g., via Platt scaling).
• Suitable for high-dimensional datasets.
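A minimal sketch of this pipeline, assuming scikit-learn (make_circles generates a non-linearly separable toy dataset):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# One class forms a ring around the other: no line separates them in 2-D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly maps the data to a higher-dimensional space,
# where the optimal separating hyperplane is found.
clf = SVC(kernel="rbf", gamma=2.0, probability=True).fit(X, y)

# New data is classified against that hyperplane; probability=True adds
# probabilistic confidence (computed internally via Platt scaling).
x_new = np.array([[0.0, 0.1]])
print(clf.predict(x_new), clf.predict_proba(x_new))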
Let’s discuss the kernel functions in detail…
1. Linear Kernel Function
• It is used when the data is linearly separable.
• K(x1, x2) = x1 · x2
Q) When to use it?
Linear kernels work best for high-dimensional datasets with many features, and are often used for text classification. They are also well suited to linearly separable problems, and are commonly paired with SVMs and logistic regression.
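As a sketch (NumPy assumed), the linear kernel is just the dot product:

import numpy as np

def linear_kernel(x1, x2):
    # K(x1, x2) = x1 . x2 -- no feature mapping needed.
    return np.dot(x1, x2)

print(linear_kernel(np.array([1, 2]), np.array([2, 3])))  # 1*2 + 2*3 = 8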
2. Polynomial Kernel
• It is used when the data is not
linearly separable.
• K(x1, x2) = (x1 · x2 + 1)^d
How does it work?
The polynomial kernel generates new
features by combining existing features
using polynomials. It looks at the given
features of input samples, as well as
combinations of those features, to
determine their similarity.
Polynomial Kernel Function
• For inhomogeneous kernels, this is given as:
k(x, y) = (c + xᵀy)^q,
where x and y are two vectors.
• Here c is a constant and q is the degree of the polynomial.
• If c is zero and the degree is one, the polynomial kernel reduces to a linear kernel.
• The degree q should be chosen carefully, as a higher degree may lead to overfitting.
Application
Q) Consider two data points x = (1, 2) and y = (2, 3) with c = 1.
Apply the linear, homogeneous, and inhomogeneous kernels.
Solution:
The kernel is given by k(x, y) = (xᵀy)^q.
If q = 1, it is called the linear kernel:
k(x, y) = (1)(2) + (2)(3) = 8
Solution:
The kernel is given by k(x, y) = (xᵀy)^q.
If q = 2, it is called the homogeneous (quadratic) kernel:
k(x, y) = ((1)(2) + (2)(3))² = 8² = 64
Solution:
The kernel is given by k(x, y) = (c + xᵀy)^q.
If q = 2 and c = 1, it is called the inhomogeneous kernel:
k(x, y) = (1 + 8)² = 81
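These three results can be checked with a few lines of Python (NumPy assumed):

import numpy as np

x, y = np.array([1, 2]), np.array([2, 3])
dot = x @ y              # x^T y = 1*2 + 2*3 = 8

print(dot)               # linear kernel (q = 1):           8
print(dot ** 2)          # homogeneous kernel (q = 2):      64
print((1 + dot) ** 2)    # inhomogeneous (q = 2, c = 1):    81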
3. Gaussian Kernel / Radial Basis Function (RBF)
• k(xi, xj) = exp(−γ||xi − xj||²)
• It is a popular radial basis function, used in a variety of learning architectures, including spatial statistics, dynamical system identification, Gaussian processes for machine learning, and classification tasks.
• Example: Image classification or
datasets with clusters.
Application
Q) Consider two data points x = (1, 2) and y = (2, 3) with σ = 1.
Apply the RBF kernel and find its value for these points.
Solution:
||x − y||² = (1 − 2)² + (2 − 3)² = 2, so
k(x, y) = exp(−||x − y||² / (2σ²)) = exp(−2/2) = e⁻¹ ≈ 0.3679
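The same computation in Python (NumPy assumed):

import numpy as np

x, y, sigma = np.array([1, 2]), np.array([2, 3]), 1.0

sq_dist = np.sum((x - y) ** 2)            # (1-2)^2 + (2-3)^2 = 2
k = np.exp(-sq_dist / (2 * sigma ** 2))   # exp(-2/2) = exp(-1)
print(k)                                  # ~0.3679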
Gaussian Kernel RBF:
• Radial Basis Functions (RBFs), or Gaussian kernels, are extremely useful in SVMs. The RBF function is:
k(x, y) = exp(−||x − y||² / (2σ²))
Problems with the discussed approach:
1. Curse of dimensionality.
2. Slow and inefficient computation.
3. Explicitly mapping data to higher-dimensional spaces is expensive.
Why is it important to use the kernel trick?
As the picture shows, if we find a way to map the data from 2-dimensional space to 3-dimensional space, we will be able to find a decision surface that clearly divides the different classes. A natural first thought for this data transformation is to map every data point to a higher dimension (in this case, 3 dimensions), find the boundary, and make the classification.
φ(x, y) = (x², √2 xy, y²)
Now let x and y be two data points in 3 dimensions, and assume we need to map them to 9-dimensional space. We would have to carry out the full mapping for each point to get the final result, which is just a scalar. The computational complexity in this case is O(n²).
However, if we use the kernel function, denoted k(x, y), instead of doing the complicated computations in the 9-dimensional space, we reach the same result within the 3-dimensional space by computing the dot product xᵀy and squaring it. The computational complexity in this case is O(n).
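A small sketch (NumPy assumed) makes the equivalence concrete: the explicit 9-dimensional map of a 3-D vector consists of all pairwise products of its coordinates, and the dot product of two mapped vectors matches the kernel k(x, y) = (xᵀy)²:

import numpy as np

def phi9(v):
    """Explicitly map a 3-D vector to 9-D: all pairwise products v_i * v_j."""
    return np.outer(v, v).ravel()    # O(n^2) features

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])

explicit = phi9(x) @ phi9(y)         # dot product in the 9-D space
kernel = (x @ y) ** 2                # k(x, y) = (x^T y)^2, computed in O(n)

print(explicit, kernel)              # both print 1024.0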