Support Vector Machine (SVM)
Soft margin
With a soft margin, the SVM tolerates some misclassified training points instead of demanding perfect separation. The parameter C (for example, C=3) sets how heavily each such margin violation is penalized.
Regularization parameter (C):
1. The C parameter in SVM is the penalty parameter of the error term.
2. You can think of it as the degree of correct classification that the algorithm has to meet, or the degree of optimization the SVM has to achieve.
3. It controls the trade-off between classifying the training points accurately and keeping a smooth decision boundary; in simple words, it influences how many data points the model keeps as support vectors.
4. For large C, misclassifications are penalized heavily, so the margin is narrow, fewer points typically remain as support vectors, and the model has higher variance and lower bias, which may lead to overfitting.
5. For small C, margin violations are cheap, so the margin is wide, more points end up as support vectors, and the model has lower variance and higher bias (risking underfitting).
The values of gamma and C should not be very high, because that leads to overfitting, nor very small, because that leads to underfitting. Thus we need to choose optimal values for C (and gamma).
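As an illustration, here is a minimal sketch (assuming scikit-learn; the synthetic dataset and the C values tried are illustrative, not prescriptive) of how the support-vector count changes with C:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy 2-D data with some label noise so the margin must be soft.
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)

# Fit the same RBF SVM with increasing C and count support vectors.
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=C).fit(X, y)
    print(f"C={C}: {model.n_support_.sum()} support vectors")

On data like this, the smallest C typically retains the most support vectors and the smoothest boundary.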
Gamma Parameter:
The gamma parameter is the kernel coefficient for the rbf, poly, and sigmoid kernels. It defines how far the influence of a single training example reaches: a low gamma means "far" (a smoother boundary), a high gamma means "close" (a more wiggly boundary).
The SVM algorithm uses a technique called the kernel trick. An SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable one. It is mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations, then finds the procedure to separate the data based on the labels or outputs.
Common kernel choices (as named in scikit-learn) include:
poly
rbf
sigmoid
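As a quick sketch (assuming scikit-learn; the moons dataset and the train/test split are illustrative), the kernels listed above can be compared on a non-linearly separable problem:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: not linearly separable.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit one SVC per kernel and report held-out accuracy.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))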
Sigmoid kernel
We can use it as a proxy for neural networks, since it applies the same tanh activation used in a perceptron. Its equation is
K(x, y) = tanh(gamma * x^T y + r)
where gamma is the kernel coefficient and r is a constant offset (coef0 in scikit-learn).
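A minimal sketch (assuming scikit-learn and NumPy; the gamma and r values are arbitrary) checking this equation against the library's pairwise implementation:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[1.0, 2.0], [3.0, 4.0]])
gamma, r = 0.5, 1.0

# K(x, y) = tanh(gamma * x^T y + r), computed by hand and via scikit-learn.
manual = np.tanh(gamma * (X @ X.T) + r)
library = sigmoid_kernel(X, gamma=gamma, coef0=r)
print(np.allclose(manual, library))  # True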
Pros and Cons associated with SVM
Pros:
It works really well when there is a clear margin of separation between classes.
It is effective in high dimensional spaces.
It is effective in cases where the number of dimensions is greater than the
number of samples.
It uses a subset of training points in the decision function (called support
vectors), so it is also memory efficient.
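To illustrate the last point, here is a small sketch (assuming scikit-learn; the dataset is synthetic) showing that only a subset of the training points is stored for the decision function:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)

# The decision function depends only on the stored support vectors.
print(X.shape[0], "training points ->", clf.support_vectors_.shape[0], "support vectors")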
Cons:
It doesn't perform well on large data sets, because the required training time is higher.
It also doesn't perform very well when the data set has more noise, i.e. when the target classes are overlapping.
SVM doesn't directly provide probability estimates; these are calculated using an expensive five-fold cross-validation, as implemented in the SVC class of the Python scikit-learn library.
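For reference, a minimal sketch (assuming scikit-learn) of those probability estimates: setting probability=True triggers the internal cross-validated calibration, which is why fitting becomes noticeably slower.

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X, y)

# predict_proba is only available when probability=True was set before fitting.
print(clf.predict_proba(X[:3]))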