
Support Vector Machine (SVM)

• SVM works well with high-dimensional data and thus avoids the dimensionality problem.
• Although SVM-based classification (i.e., training) is extremely slow, the result is highly accurate. Furthermore, testing unknown data is very fast.
• SVM is less prone to overfitting than other methods, and it also produces a compact model for classification.
• Linear SVM: a classification technique used when the training data are linearly separable.
• Non-linear SVM: a classification technique used when the training data are not linearly separable.
The following are important concepts in SVM −
Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
Hyperplane − The decision plane (or boundary) that separates a set of objects belonging to different classes.
Margin − The gap between the two lines through the closest data points of different classes. It is calculated as the perpendicular distance from the separating line to the support vectors. A large margin is considered a good margin, and a small margin is considered a bad margin.
The main goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH), which is done in the following two steps −
First, SVM generates hyperplanes iteratively that segregate the classes.
Then, it chooses the hyperplane that separates the classes with the largest margin.
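
As a minimal sketch (assuming scikit-learn is available; the data here are purely illustrative), a fitted linear SVM directly exposes the support vectors and the separating hyperplane it has chosen:

```python
# Minimal sketch: fit a linear SVM on toy two-attribute data and inspect the result.
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes (+ / -), each tuple described by attributes A1, A2.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 7], [8, 6]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)             # the data points closest to the hyperplane
print(clf.coef_[0], clf.intercept_[0])  # w and b of the hyperplane w.x + b = 0
```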
Maximum Margin Hyperplane

We shall assume a simple situation: a training dataset D = {t1, t2, ..., tn} of n tuples, each of which belongs to one of two classes (+ or −) and is described by two attributes, say A1 and A2.

• Thus, a good classifier must choose, out of the infinite number of possible hyperplanes, one that performs well not only on the training data but also on the test data.
• To illustrate how different choices of hyperplane influence the classification error, consider two arbitrary hyperplanes H1 and H2 as shown in Fig. 3.
• In Fig. 3, the two hyperplanes H1 and H2 have their own boundaries, called decision boundaries (denoted b11 and b12 for H1, and b21 and b22 for H2).
• A decision boundary is a boundary that is parallel to the hyperplane and touches the closest data point of the class on one side of the hyperplane.
• The distance between the two decision boundaries of a hyperplane is called its margin. So, if the data are classified using hyperplane H1, the margin is larger than if hyperplane H2 is used.
• The margin of the hyperplane reflects the error of the classifier. In other words, the larger the margin, the lower the classification error.
• Intuitively, a classifier whose hyperplane has a small margin is more susceptible to model overfitting and tends to classify unseen data with weak confidence.
• Thus, during the training (learning) phase, the approach is to search for the hyperplane with the maximum margin.
• Such a hyperplane is called the maximum margin hyperplane, abbreviated as MMH.
• We may note that the shortest distance from a hyperplane to one of its decision boundaries is equal to the shortest distance from the hyperplane to the decision boundary on its other side.
• In other words, the hyperplane lies exactly in the middle of its two decision boundaries.
Linear SVM

• An SVM that is used to classify data which are linearly separable is called a linear SVM.
• In other words, a linear SVM searches for a hyperplane with the maximum margin.
• This is why a linear SVM is often termed a maximal margin classifier (MMC).
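
As a small sketch (scikit-learn assumed, data illustrative), the learned hyperplane w.x + b = 0 and the margin width 2/||w|| can be read off a trained linear SVM:

```python
# Sketch: read the hyperplane parameters and margin width of a trained linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [4, 4], [5, 5]], dtype=float)
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C ~ hard margin

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)  # distance between the two decision boundaries
print("w =", w, " b =", b, " margin =", margin)
```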
Objective function

• Hard margin: no misclassified training points are tolerated; the hyperplane must separate the classes exactly.
• Soft margin: the model tolerates up to C errors without changing the hyperplane; this trade-off is controlled by regularization.
• The value of C is obtained by hyperparameter tuning.
• For example, C = 3 informally means the model allows about 3 misclassifications.
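
Written out in its standard soft-margin form, the objective is to minimize

(1/2)·||w||^2 + C · Σi ξi   subject to   yi(w·xi + b) ≥ 1 − ξi and ξi ≥ 0 for every training tuple i,

where the slack variables ξi measure how far each point violates the margin; a larger C penalizes violations more heavily.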
Regularization parameter (C):
1. The C parameter in SVM is the penalty parameter of the error term.
2. You can consider it as the degree of correct classification that the algorithm has to meet, or the degree of optimization the SVM has to achieve.
3. It controls the trade-off between classifying the training points accurately and keeping a smooth decision boundary; in simple words, it influences how many data points the model keeps as support vectors.
4. For large C, the margin becomes narrow, fewer data points end up as support vectors, and we get lower bias and higher variance, which may lead to overfitting.
5. For small C, the margin becomes wide, more data points end up as support vectors, and we get higher bias and lower variance, which may lead to underfitting.
The values of C (and gamma) should be neither very high (overfitting) nor very small (underfitting). Thus we need to choose the optimal value of C; a small example is sketched below.
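
As a hedged sketch (assuming scikit-learn and a noisy toy dataset), the effect of C on the number of support vectors can be checked directly:

```python
# Sketch: effect of the regularization parameter C on the number of support vectors.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C -> wide margin, many support vectors; large C -> narrow margin, fewer.
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors")
```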
Gamma Parameter:

1. Gamma is used when we use the Gaussian RBF kernel.
2. If you use a linear or polynomial kernel, you do not need gamma; you only need the C hyperparameter.
3. It decides how much curvature we want in the decision boundary.
4. High gamma value – more curvature.
5. Low gamma value – less curvature.
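
A minimal sketch (scikit-learn assumed, data illustrative) of how gamma changes the flexibility of an RBF-kernel decision boundary:

```python
# Sketch: higher gamma -> more curved decision boundary -> tighter fit to training data.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
    print(f"gamma={gamma:>6}: training accuracy = {clf.score(X, y):.2f}")
```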
SVM Kernels

The SVM algorithm uses a technique called the kernel trick. An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable problem. It is mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations and then finds the procedure to separate the data based on the labels or outputs.

In the scenario below, we cannot have a linear hyperplane between the two classes. SVM can solve this problem easily by introducing an additional feature. Here, we add a new feature z = x^2 + y^2 and then plot the data points on the x and z axes:
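
As a minimal sketch of this idea (assuming scikit-learn; the data are illustrative concentric circles), adding z = x^2 + y^2 makes the classes linearly separable:

```python
# Sketch: the explicit feature z = x^2 + y^2 makes concentric-circle data separable.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the new feature z = x^2 + y^2; in the augmented space a plane separates the classes.
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X_aug = np.hstack([X, z])

print("linear SVM on (x, y):   ", SVC(kernel="linear").fit(X, y).score(X, y))
print("linear SVM on (x, y, z):", SVC(kernel="linear").fit(X_aug, y).score(X_aug, y))
```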
Non-linear SVM

When the data are linearly not separable, drawing a single best-fit line would give a high percentage of error. Instead, an SVM kernel (poly, rbf, or sigmoid) performs a transformation from the lower-dimensional space to a higher-dimensional space in which the classes become separable.
Common Types of Kernels used in SVM

Linear kernel
Let us say that we have two vectors named x1 and Y1; then the linear kernel is defined by the dot product of these two vectors:
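K(x1, Y1) = x1 · Y1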
Polynomial Kernel
A polynomial kernel is defined by the following equation:
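In its common form, K(x1, Y1) = (x1 · Y1 + c)^d, where d is the degree of the polynomial and c ≥ 0 is a constant (often 1).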
Gaussian radial basis function (RBF)

It is a general-purpose kernel, used when there is no prior knowledge about the data.
Its equation is:
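K(x1, Y1) = exp(−gamma · ||x1 − Y1||^2), where gamma > 0 controls the width of the Gaussian.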

Sigmoid kernel
We can use it as a proxy for neural networks. Its equation is:
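K(x1, Y1) = tanh(gamma · (x1 · Y1) + r), where gamma and r are kernel parameters.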
Pros and Cons associated with SVM
Pros:
It works really well when there is a clear margin of separation.
It is effective in high-dimensional spaces.
It is effective in cases where the number of dimensions is greater than the number of samples.
It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
Cons:
It does not perform well on large datasets, because the required training time is higher.
It also does not perform very well when the dataset has more noise, i.e. the target classes are overlapping.
SVM does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation, which is available in the SVC class of the Python scikit-learn library.
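
As a brief sketch of that last point (scikit-learn assumed, data illustrative), enabling probability estimates triggers the extra internal cross-validation:

```python
# Sketch: probability estimates in scikit-learn's SVC (fitted with internal 5-fold CV).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="rbf", probability=True)  # probability=True enables the extra CV fitting
clf.fit(X, y)

print(clf.predict_proba(X[:3]))  # class-membership probabilities for the first 3 samples
```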
