Presented By: M. Saqib Iqbal, Gull Muhammad
Presented To: Mr. Imran Ali Khan
Artificial Intelligence
National College of Business Administration & Economics, Multan
Hyperplane:
The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features (as shown in the image), the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a two-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class.
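The slides do not state the formula, but in the standard linear formulation the decision function is f(x) = w · x + b, and w and b are scaled so that the closest points satisfy |w · x + b| = 1. The margin width then works out to 2/||w||, so maximizing the margin is equivalent to:

minimize (1/2)||w||^2   subject to   y_i (w · x_i + b) >= 1   for every training point (x_i, y_i)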
Support Vectors:
The data points or vectors that lie closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
Now that we have a clear idea of the terminology related to SVM, let's see how the algorithm works. For example, suppose we have a classification problem where we have to separate the red data points from the blue ones.
Since it is a two-dimensional problem, our decision boundary will be a line; for a three-dimensional problem we would use a plane, and the complexity of the solution increases with the number of features.
As shown in the image above, multiple lines separate the data points successfully. But our objective is to find the best solution. There are a few rules that can help us identify the best line.
In our example, if we get a new red data point close to line A, as shown in the image below, line A will misclassify that point. Similarly, if we get a new blue instance close to line B, then lines A and C will classify the data successfully, whereas line B will misclassify this new point.
The point to notice here is that in both cases line C successfully classifies all the data points. Why? To understand this, let's take the lines one by one, as sketched below.
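To make this side-of-the-line test concrete, here is a minimal sketch; the slopes and intercepts for lines A, B, and C are made-up values for illustration, not taken from the image:

# hypothetical slope/intercept pairs for candidate lines A, B, and C
lines = {'A': (1.0, 0.2), 'B': (-0.5, 2.5), 'C': (0.3, 1.1)}

def side(px, py, m, b):
    # +1 if the point (px, py) lies above the line y = m*x + b, -1 otherwise
    return 1 if py > m * px + b else -1

# check a hypothetical new data point against each candidate line
for name, (m, b) in lines.items():
    print(name, side(1.5, 2.0, m, b))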
How to build a classifier based on SVM
Below is an example of SVM classification on the UCI cancer dataset using scikit-learn, a machine learning library for Python.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# generate a toy two-class dataset
X, Y = make_blobs(n_samples=500, centers=2, random_state=0, cluster_std=0.40)

# plotting scatters, colored by class
plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='spring')
plt.show()
Output
What support vector machines do is not simply draw a line between the two classes, but also consider a region around the line of some given width. Here's an example of what it can look like:
# plotting candidate lines together with the margin region around each
import numpy as np

xfit = np.linspace(-1, 3.5)
plt.scatter(X[:, 0], X[:, 1], c=Y, s=50, cmap='spring')
for m, b, d in [(1, 0.65, 0.33), (0.5, 1.6, 0.55), (-0.2, 2.9, 0.2)]:
    yfit = m * xfit + b
    plt.plot(xfit, yfit, '-k')
    # shade a band of width d on either side of the line
    plt.fill_between(xfit, yfit - d, yfit + d, edgecolor='none',
                     color='#AAAAAA', alpha=0.4)
plt.xlim(-1, 3.5)
plt.show()
Importing datasets
import numpy as np
import pandas as pd

# reading the csv file (path truncated in the original)
x = pd.read_csv("C:\...\cancer.csv")
a = np.array(x)
y = a[:, 30]  # extracting the class column (assumed to be column 30)

# extracting two features for a 2-D example
x = np.column_stack((x.malignant, x.benign))
x.shape
print(x, y)
Fitting a Support Vector Machine
Now we'll fit a support vector machine classifier to these points. While the mathematical details of the likelihood model are interesting, we'll let you read about those elsewhere. Instead, we'll just treat the scikit-learn algorithm as a black box that accomplishes the above task.
# import the support vector classifier
from sklearn.svm import SVC

clf = SVC(kernel='linear')
clf.fit(x, y)
After being fitted, the model can then be used to predict new values:
clf.predict([[120, 990]])
clf.predict([[85, 550]])
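Beyond prediction, the fitted classifier also exposes the support vectors it found and, for a linear kernel, the learned line; a minimal sketch using standard scikit-learn attributes:

# the data points that define the margin
print(clf.support_vectors_)
# w and b of the separating line (available for the linear kernel)
print(clf.coef_, clf.intercept_)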