Support Vector Machine - Explanation
From the figure above it is clear that there are multiple lines (our
hyperplane here is a line because we are considering only two input
features, x1 and x2) that segregate our data points, i.e., classify the
red and blue circles. So how do we choose the best line, or in general
the best hyperplane, that segregates our data points?
One reasonable choice as the best hyperplane is the one that represents
the largest separation or margin between the two classes.
Multiple hyperplanes separate the data from two classes
So for this type of data, what SVM does is find the maximum margin, as
was done with the previous data sets, and in addition it adds a penalty
each time a point crosses the margin. The margins in such cases are
called soft margins. When there is a soft margin, the SVM tries to
minimize (1/margin) + λ∑(penalty). Hinge loss is a commonly used
penalty: if there are no violations, there is no hinge loss; if there
are violations, the hinge loss is proportional to the distance of the
violation.
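In its standard form, for a labeled point (x_i, t_i) with t_i ∈ {−1, +1} and classifier output w^T x_i + b (the weight vector w and bias b are defined formally further below), the hinge loss is:

$$L_{\text{hinge}}(x_i, t_i) = \max\big(0,\ 1 - t_i\,(w^T x_i + b)\big)$$

It is zero when the point lies on the correct side of the margin and grows linearly with the distance of the violation otherwise.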
Till now, we were talking about linearly separable data (the group of blue
balls and the group of red balls are separable by a straight line). What do
we do if the data are not linearly separable?
Original 1D dataset for classification
Say our data are as shown in the figure above. SVM solves this by creating a
new variable using a kernel. For a point xi on the line, we create a new
variable yi as a function of its distance from the origin o. If we plot this,
we get something like the figure shown below.
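As a small sketch of this idea (not from the original article; the 1D points below are hypothetical, and the new variable is simply the distance from an assumed origin at 0):
Python3
import numpy as np

# Hypothetical 1D data: one class lies near the origin,
# the other lies further away on both sides (not linearly separable in 1D)
x = np.array([-4.0, -3.0, -2.5, 2.5, 3.0, 4.0, -1.0, -0.5, 0.0, 0.5, 1.0])
labels = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# New variable: distance from the origin o = 0
y_new = np.abs(x)

# In the (x, y_new) plane the two classes can now be separated
# by a horizontal line such as y_new = 1.75
for xi, yi, label in zip(x, y_new, labels):
    print(f"x={xi:5.1f}  y={yi:4.1f}  class={label}")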
The separating hyperplane can be written as the equation w^T x + b = 0.
The vector w is the normal vector to the hyperplane, i.e., the direction
perpendicular to the hyperplane. The parameter b in the equation represents
the offset, or distance of the hyperplane from the origin along the normal
vector w.
The distance between a data point x_i and the decision boundary can be
calculated as:
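In standard SVM notation, this distance is the signed classifier output scaled by the norm of w:

$$d_i = \frac{w^T x_i + b}{\lVert w \rVert}$$

where ‖w‖ is the Euclidean norm of the weight vector w.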
Optimization:
For Hard margin linear SVM classifier:
The target variable or label for the i-th training instance is denoted by the
symbol t_i in this statement, with t_i = -1 for negative instances (when y_i = 0)
and t_i = 1 for positive instances (when y_i = 1). This is because we require
the decision boundary to satisfy the constraint:
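In its usual form, the hard-margin problem maximizes the margin by minimizing ‖w‖ subject to every point being classified correctly with at least unit margin:

$$\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad t_i\,(w^T x_i + b) \ge 1 \ \ \text{for all } i$$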
For Soft margin linear SVM classifier:
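In the usual soft-margin form, slack variables ζ_i allow points to violate the margin, with the total penalty weighted by a regularization parameter C:

$$\min_{w,\,b,\,\zeta}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_i \zeta_i \quad \text{subject to} \quad t_i\,(w^T x_i + b) \ge 1 - \zeta_i,\ \ \zeta_i \ge 0$$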
Dual Problem: The SVM can also be solved through the dual of the
optimization problem, which requires locating the Lagrange multipliers
associated with the support vectors. The optimal Lagrange multipliers α_i
are the ones that maximize the following dual objective function:
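In its standard form, the dual objective is:

$$\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j\, t_i t_j\, K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i t_i = 0$$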
where,
αi is the Lagrange multiplier associated with the ith training sample.
K(xi, xj) is the kernel function that computes the similarity between
two samples xi and xj. It allows SVM to handle nonlinear classification
problems by implicitly mapping the samples into a higher-dimensional
feature space.
The term ∑αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange multipliers
have been found, the SVM decision boundary can be described in terms of
these optimal Lagrange multipliers and the support vectors. The training
samples with α_i > 0 are the support vectors, and the decision boundary is
given by:
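In standard form, the weight vector is a combination of the support vectors, and the prediction for a new point x is:

$$w = \sum_i \alpha_i t_i\, x_i, \qquad f(x) = \operatorname{sign}\!\Big(\sum_i \alpha_i t_i\, K(x_i, x) + b\Big)$$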
The SVM kernel is a function that takes a low-dimensional input space and
transforms it into a higher-dimensional space, i.e., it converts
non-separable problems into separable problems. It is mostly useful in
non-linear separation problems. Simply put, the kernel performs some
extremely complex data transformations and then works out how to separate
the data based on the labels or outputs defined.
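As a brief illustration (a sketch added here, not taken from the text above), the widely used RBF kernel K(x_i, x_j) = exp(−γ‖x_i − x_j‖²) measures the similarity between two points; it can be computed from its definition and checked against scikit-learn's rbf_kernel:
Python3
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two example points and a kernel width parameter gamma (illustrative values)
xi = np.array([[1.0, 2.0]])
xj = np.array([[2.0, 0.5]])
gamma = 0.5

# RBF kernel computed directly from its definition
manual = np.exp(-gamma * np.sum((xi - xj) ** 2))

# The same similarity value from scikit-learn
from_sklearn = rbf_kernel(xi, xj, gamma=gamma)[0, 0]

print(manual, from_sklearn)  # both print the same similarity value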
Advantages of SVM
Effective in high-dimensional cases.
It is memory efficient, as it uses a subset of training points in the
decision function, called support vectors.
Different kernel functions can be specified for the decision function,
and it is possible to specify custom kernels.
SVM implementation in Python
Predict whether a cancer is benign or malignant. Historical data about
patients diagnosed with cancer enables doctors to differentiate between
malignant and benign cases, given the independent attributes.
Steps
Load the breast cancer dataset from sklearn.datasets
Separate input features and target variables.
Build and train the SVM classifier using the RBF kernel.
Plot the scatter plot of the input features.
Plot the decision boundary.
Python3
# Load the important packages
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.svm import SVC

# Load the breast cancer dataset and keep only the first two features
cancer = load_breast_cancer()
X = cancer.data[:, :2]
y = cancer.target

# Build and train the SVM classifier with the RBF kernel
# (gamma and C are example hyperparameter values)
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
svm.fit(X, y)

# Plot the decision boundary
DecisionBoundaryDisplay.from_estimator(
    svm,
    X,
    response_method="predict",
    cmap=plt.cm.Spectral,
    alpha=0.8,
    xlabel=cancer.feature_names[0],
    ylabel=cancer.feature_names[1],
)

# Scatter plot of the input features
plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolors="k")
plt.show()
Output:
Breast Cancer Classifications with SVM RBF kernel