
Lab Program 9


Machine Learning Laboratory 15CSL76

9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.

K-Nearest Neighbor Algorithm

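Training algorithm:
• For each training example (x, f(x)), add the example to the list training_examples.
Classification algorithm:
• Given a query instance xq to be classified,
• Let x1 . . . xk denote the k instances from training_examples that are nearest to xq.
• Return
      \hat{f}(x_q) \leftarrow \frac{1}{k} \sum_{i=1}^{k} f(x_i)

Where \hat{f}(x_q) is the mean of the values f(xi) of the k nearest training examples. For a discrete-valued target such as the iris class, the most common class among the k nearest neighbours is returned instead.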
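
The training and classification steps above can be sketched in plain Python. This is a minimal illustration, not the library implementation used in the program: the names euclidean and knn_classify are hypothetical, Euclidean distance is an assumed choice of "nearest", and the toy points are invented for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(training_examples, xq, k=3):
    """Return the majority label among the k training examples nearest to xq.

    training_examples is a list of (x, label) pairs. The list itself is the
    whole "model", because training only stores the examples.
    """
    nearest = sorted(training_examples, key=lambda ex: euclidean(ex[0], xq))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D points, invented for illustration only (not iris measurements).
training_examples = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
    ((4.8, 5.1), "b"),
]

print(knn_classify(training_examples, (1.1, 1.0), k=3))  # a
print(knn_classify(training_examples, (5.1, 5.0), k=3))  # b
```

All the work happens at query time: every stored example is compared with xq, which is why k-NN is often called a lazy learner.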

Data Set:

Iris Plants Dataset: Dataset contains 150 instances (50 in each of three classes)
Number of Attributes: 4 numeric, predictive attributes and the Class

1 Deepak D, Assistant Professor, Dept. of CS&E, Canara Engineering College, Mangaluru



Program:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

""" Iris Plants Dataset: contains 150 instances (50 in each of three
classes). Number of attributes: 4 numeric, predictive attributes and
the class.
"""
iris = datasets.load_iris()

""" The x variable contains the first four columns of the dataset
(i.e. attributes) while y contains the labels.
"""
x = iris.data
y = iris.target

print('sepal-length', 'sepal-width', 'petal-length', 'petal-width')
print(x)
print('class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica')
print(y)

""" Splits the dataset into 70% train data and 30% test data. Out of
the total 150 records, the training set will contain 105 records and
the test set the remaining 45. The split is random, so the results
change from run to run unless random_state is set.
"""
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Train the model with k = 5 nearest neighbours
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

# Make predictions on the test data
y_pred = classifier.predict(x_test)

""" For evaluating a classifier, the confusion matrix, precision, recall
and f1 score are the most commonly used metrics.
"""
print('Confusion Matrix')
print(confusion_matrix(y_test,y_pred))
print('Accuracy Metrics')
print(classification_report(y_test,y_pred))
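
The problem statement also asks for the correct and wrong predictions to be printed, which the program above leaves to the confusion matrix. One possible way to list them is sketched below. The helper report_predictions is a hypothetical name, and the demo labels are invented so that the block runs on its own.

```python
def report_predictions(y_true, y_pred, class_names):
    """Print one line per test sample and return (n_correct, n_wrong)."""
    n_correct = 0
    for i, (actual, predicted) in enumerate(zip(y_true, y_pred)):
        if actual == predicted:
            n_correct += 1
            status = "Correct"
        else:
            status = "Wrong"
        print(f"{i:3d}  actual={class_names[actual]:<12} "
              f"predicted={class_names[predicted]:<12} {status}")
    n_wrong = len(y_true) - n_correct
    print(f"Correct predictions: {n_correct}, wrong predictions: {n_wrong}")
    return n_correct, n_wrong


# Invented labels for illustration only (not results from the program).
names = ["setosa", "versicolor", "virginica"]
counts = report_predictions([0, 1, 2, 2], [0, 1, 1, 2], names)
print(counts)  # (3, 1)
```

In the program above, the equivalent call would be report_predictions(y_test, y_pred, iris.target_names).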


Output:

sepal-length sepal-width petal-length petal-width
[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 . . . . .
 . . . . .
 [6.2 3.4 5.4 2.3]
 [5.9 3.  5.1 1.8]]
class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica
[0 0 0 ………0 0 1 1 1 …………1 1 2 2 2 ………… 2 2]

Confusion Matrix
[[20  0  0]
 [ 0 10  0]
 [ 0  1 14]]
Accuracy Metrics
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        20
          1       0.91      1.00      0.95        10
          2       1.00      0.93      0.97        15

avg / total       0.98      0.98      0.98        45


Basic knowledge

Confusion Matrix

True positives: data points labelled as positive that are actually positive
False positives: data points labelled as positive that are actually negative
True negatives: data points labelled as negative that are actually negative
False negatives: data points labelled as negative that are actually positive
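
With three classes, these four counts are taken per class by treating that class as "positive" and the other two as "negative". The sketch below applies this to the confusion matrix printed in the Output section, assuming the scikit-learn layout (rows are actual classes, columns are predicted classes). The function name one_vs_rest_counts is hypothetical.

```python
# Confusion matrix from the Output section: rows = actual, columns = predicted.
cm = [
    [20, 0, 0],
    [0, 10, 0],
    [0, 1, 14],
]


def one_vs_rest_counts(cm, c):
    """Return (TP, FP, TN, FN) for class c, treating all other classes as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                 # actual c, predicted as another class
    fp = sum(row[c] for row in cm) - tp  # predicted c, actually another class
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


for c in range(3):
    print(c, one_vs_rest_counts(cm, c))
# 0 (20, 0, 25, 0)
# 1 (10, 1, 34, 0)
# 2 (14, 0, 30, 1)
```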

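Accuracy: how often is the classifier correct?
Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision: of the data points labelled as positive, how many are actually positive?
Precision = TP / (TP + FP)

Recall: of the data points that are actually positive, how many are labelled as positive?
Recall = TP / (TP + FN)

F1-Score: the harmonic mean of precision and recall.
F1 = 2 × Precision × Recall / (Precision + Recall)

Support: the number of actual occurrences of the class in the test set.
Support = TP + FN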
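
As a check on these definitions, the per-class rows of the classification report in the Output section can be recomputed by hand. The sketch below uses the standard formulas for precision, recall and F1. The counts are read off the printed confusion matrix, and the function name precision_recall_f1 is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1, support) from one-vs-rest counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    support = tp + fn
    return precision, recall, f1, support


# (TP, FP, FN) per class, read off the confusion matrix in the Output section.
counts = {0: (20, 0, 0), 1: (10, 1, 0), 2: (14, 0, 1)}

for label, (tp, fp, fn) in counts.items():
    p, r, f1, s = precision_recall_f1(tp, fp, fn)
    print(f"{label}  {p:.2f}  {r:.2f}  {f1:.2f}  {s}")
# 0  1.00  1.00  1.00  20
# 1  0.91  1.00  0.95  10
# 2  1.00  0.93  0.97  15

# Overall accuracy: correctly classified samples over all test samples.
accuracy = sum(tp for tp, _, _ in counts.values()) / 45
print(f"{accuracy:.2f}")  # 0.98
```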


Example:

Support_A = TP_A + FN_A
          = 30 + (20 + 10)
          = 60
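
Put differently, the support of a class is the sum of its row in the confusion matrix. A short sketch, using the numbers from the example above and, as a second check, the confusion matrix from the Output section (variable names are illustrative):

```python
# Class A from the example: the diagonal entry is TP_A, and the two
# off-diagonal entries of the same row together make up FN_A.
tp_a = 30
fn_a_parts = [20, 10]
support_a = tp_a + sum(fn_a_parts)
print(support_a)  # 60

# Same idea for the program's confusion matrix (rows = actual classes).
cm = [
    [20, 0, 0],
    [0, 10, 0],
    [0, 1, 14],
]
supports = [sum(row) for row in cm]
print(supports)  # [20, 10, 15]
```

The second result matches the support column of the classification report.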
