Experiment 7 Ids
October 2, 2023
[ ]: KNN CLASSIFIER
URK21CS2056
[ ]: AIM:
To implement KNN classification for the given data set and analyse
the performance of the classifier with different K values.
[ ]: DESCRIPTION:
The k-nearest neighbors (KNN) algorithm is a supervised machine learning
algorithm that can be used for both classification and regression
predictive problems.
However, in industry it is mainly used for classification predictive
problems.
Step1: Every algorithm needs a dataset, so the first step of KNN is to
load the training and test data.
Step2: Choose the value of K, i.e. the number of nearest data points to
consider. K can be any positive integer; for a two-class problem an odd K
avoids tied votes.
Step3: For each point in the test data, do the following:
3.1: Calculate the distance between the test point and each row of the
training data using a distance measure such as Euclidean, Manhattan or
Hamming distance. Euclidean distance is the most commonly used.
3.2: Sort the training rows in ascending order of distance.
3.3: Choose the top K rows from the sorted array.
3.4: Assign the test point the most frequent class among these K rows.
Step4: End
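The four steps above can be sketched in plain Python as follows. This is a minimal illustration on a hypothetical two-feature toy dataset, using Euclidean distance and a majority vote; it is not the scikit-learn implementation used later in the notebook.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Straight-line distance between two feature vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, test_point, k):
    # Step 3.1: distance from the test point to every training row
    distances = [(euclidean(row, test_point), label)
                 for row, label in zip(train_X, train_y)]
    # Step 3.2: sort by distance, ascending
    distances.sort(key=lambda pair: pair[0])
    # Step 3.3: keep the labels of the K nearest rows
    nearest = [label for _, label in distances[:k]]
    # Step 3.4: majority vote over those labels
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical toy data: class 0 near (1, 1), class 1 near (5.5, 5.5)
train_X = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]]
train_y = [0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [1.2, 1.5], k=3))  # → 0
print(knn_predict(train_X, train_y, [5.2, 5.1], k=3))  # → 1
```

For the first test point the three nearest rows have labels 0, 0 and 1, so the vote returns 0. If K is even, `Counter.most_common` breaks ties by first occurrence, which is one reason odd K values are preferred for two classes.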
Confusion Matrix:
It is one of the simplest ways to measure the performance of a
classification model whose output can belong to two or more classes.
A confusion matrix is a table with two dimensions, viz.
“Actual” and “Predicted”. For a two-class problem, its four cells
hold the counts of “True Positives (TP)”, “True Negatives (TN)”,
“False Positives (FP)” and “False Negatives (FN)”.
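A small sketch with hypothetical labels shows how scikit-learn lays out a two-class confusion matrix: with the labels ordered [0, 1], rows are the actual class and columns are the predicted class, so `ravel()` returns the counts in the order TN, FP, FN, TP. This is the order the notebook relies on in `tn, fp, fn, tp = conf_matrix.ravel()`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive (e.g. malignant), 0 = negative (e.g. benign)
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
# Layout for labels [0, 1]:
#   [[TN, FP],
#    [FN, TP]]
print(cm)

tn, fp, fn, tp = cm.ravel()
print("TN =", tn, "FP =", fp, "FN =", fn, "TP =", tp)  # TN = 2 FP = 1 FN = 2 TP = 3
```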
Classification Accuracy:
The accuracy_score function of sklearn.metrics is used to compute the
accuracy of the classification model, i.e. the fraction of all
predictions that are correct:
Accuracy= (TP+TN)/(TP+FP+FN+TN)
Precision:
Precision, a measure commonly used in document retrieval, is the
fraction of the items returned as positive by the classification
model that are actually positive (the correct documents).
Precision= TP/(TP+FP)
Recall or Sensitivity:
Recall is the fraction of the actual positives that the
classification model correctly returns as positive.
It can be calculated from the confusion matrix:
Recall= TP/(TP+FN)
Specificity:
Specificity is the fraction of the actual negatives
that the classification model correctly returns
as negative.
Specificity= TN/(TN+FP)
F1 Score:
This score is the harmonic mean of precision and
recall, so precision and recall contribute equally
to it:
F1= 2*(Precision*Recall)/(Precision+Recall)
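The formulas above can be checked against scikit-learn's scorers on hypothetical labels. scikit-learn has no dedicated specificity function, so this sketch obtains it as the recall of the negative class (`pos_label=0`). The F1 value is stored as `f1` rather than `f1_score`, because assigning to `f1_score` would shadow the `sklearn.metrics.f1_score` function if it had been imported.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # 2, 1, 2, 3

accuracy = (tp + tn) / (tp + fp + fn + tn)          # 5/8 = 0.625
precision = tp / (tp + fp)                          # 3/4 = 0.75
recall = tp / (tp + fn)                             # 3/5 = 0.6
specificity = tn / (tn + fp)                        # 2/3
f1 = 2 * precision * recall / (precision + recall)  # 2/3

# Pair each hand-computed value with scikit-learn's version
checks = {
    "accuracy": (accuracy, accuracy_score(y_true, y_pred)),
    "precision": (precision, precision_score(y_true, y_pred)),
    "recall": (recall, recall_score(y_true, y_pred)),
    "specificity": (specificity, recall_score(y_true, y_pred, pos_label=0)),
    "f1": (f1, f1_score(y_true, y_pred)),
}
for name, (by_hand, by_sklearn) in checks.items():
    print(f"{name:<12} {by_hand:.4f} {by_sklearn:.4f}")
```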
[ ]: a. Develop a KNN classification model for the Cancer dataset using scikit-learn.
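The cell that loads `df` from a CSV file is not shown here. As a self-contained stand-in, the sketch below assumes that scikit-learn's bundled Wisconsin Diagnostic Breast Cancer data (569 rows, 30 numeric features) is the same data set. scikit-learn encodes malignant as 0 and benign as 1, so the target is flipped to match the notebook's `'M'` → 1, `'B'` → 0 encoding. K = 5 is an arbitrary starting value; held-out evaluation only makes sense after the train/test split in step d.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Assumption: this bundled copy stands in for the notebook's CSV data
data = load_breast_cancer(as_frame=True)
X = data.data
# Flip scikit-learn's encoding so that malignant = 1 and benign = 0
y = 1 - data.target

# A first KNN model; K = 5 is an arbitrary choice here
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)

print(X.shape, int(y.sum()))  # (569, 30) 212
print(knn.predict(X.iloc[:5]))
```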
import matplotlib.pyplot as plt
[4]: y = df['diagnosis']
y
[4]: 0    M
1 M
2 M
3 M
4 M
..
564 M
565 M
566 M
567 M
568 B
Name: diagnosis, Length: 569, dtype: object
[5]: df['diagnosis'] = df['diagnosis'].replace({'M': 1, 'B': 0})
df['diagnosis']
[5]: 0    1
1 1
2 1
3 1
4 1
..
564 1
565 1
566 1
567 1
568 0
Name: diagnosis, Length: 569, dtype: int64
[6]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1,
0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1,
0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1,
1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1,
1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1,
1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0])
[ ]: d. Divide the data into training (75%) and testing (25%) sets.
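A minimal sketch of the 75/25 split, again assuming scikit-learn's copy of the breast cancer data. `train_test_split` rounds the 25% test share up, so 569 rows become 426 training and 143 test rows. `stratify=y` (an addition, not necessarily used in the original notebook) keeps the malignant/benign ratio similar in both parts, and `random_state=0` is an arbitrary value that makes the split reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # malignant = 1, benign = 0 (assumed to match the notebook)

# 75% training / 25% testing, stratified on the class label
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(len(x_train), len(x_test))  # 426 143
```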
[ ]: g. Perform feature scaling on the independent variables and analyse the performance.
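KNN compares raw distances, so features with large ranges (for example mean area, in the hundreds) can dominate features below 1 (for example mean smoothness). One common remedy is standardisation. The sketch below assumes scikit-learn's copy of the data and the same hypothetical split as before. It fits `StandardScaler` on the training set only and reuses that transform on the test set, then compares K = 7 with and without scaling. The accuracies printed depend on the split and are not the notebook's results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # malignant = 1, benign = 0
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Learn mean/std from the training set only, then apply them to both sets
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

for name, (tr, te) in {"unscaled": (x_train, x_test),
                       "scaled": (x_train_scaled, x_test_scaled)}.items():
    knn = KNeighborsClassifier(n_neighbors=7).fit(tr, y_train)
    print(name, knn.score(te, y_test))  # test-set accuracy
```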
F1-score: 0.8648648648648649
Classification Report:
precision recall f1-score support
auc_score = auc(fpr, tpr)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
tn, fp, fn, tp = conf_matrix.ravel()
specificity = tn / (tn + fp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1_score = 2 * precision * recall / (precision + recall)
print(tp,tn)
print(accuracy)
print(f1_score)
print(auc_score)
46 82
0.8951048951048951
0.8598130841121495
0.9190126478988168
[14]:
knn = KNeighborsClassifier(n_neighbors = 7)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
y_pred
conf_matrix = confusion_matrix(y_test, y_pred)
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba[:, 1])
auc_score = auc(fpr, tpr)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
tn, fp, fn, tp = conf_matrix.ravel()
specificity = tn / (tn + fp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1_score = 2 * precision * recall / (precision + recall)
print(tp,tn)
print(accuracy)
print(f1_score)
print(auc_score)
47 83
0.9090909090909091
0.8785046728971964
0.9190126478988168
[15]:
knn = KNeighborsClassifier(n_neighbors = 9)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
y_pred
conf_matrix = confusion_matrix(y_test, y_pred)
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba[:, 1])
auc_score = auc(fpr, tpr)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
tn, fp, fn, tp = conf_matrix.ravel()
specificity = tn / (tn + fp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1_score = 2 * precision * recall / (precision + recall)
print(tp,tn)
print(accuracy)
print(f1_score)
print(auc_score)
45 82
0.8881118881118881
0.8490566037735849
0.9190126478988168
[16]:
knn = KNeighborsClassifier(n_neighbors = 11)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
y_pred
conf_matrix = confusion_matrix(y_test, y_pred)
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba[:, 1])
auc_score = auc(fpr, tpr)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
tn, fp, fn, tp = conf_matrix.ravel()
specificity = tn / (tn + fp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1_score = 2 * precision * recall / (precision + recall)
print(tp,tn)
print(accuracy)
print(f1_score)
print(auc_score)
45 83
0.8951048951048951
0.8571428571428572
0.9190126478988168
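In cells [14], [15] and [16], `y_pred_proba` is never reassigned after the new model is fitted, so `roc_curve` keeps scoring probabilities from a model fitted earlier. That is why every K reports the same AUC (0.9190126478988168). Instead of copying the cell for each K, a loop can refit the model, recompute the probabilities and collect all metrics in one place. The sketch assumes scikit-learn's copy of the data, a stratified 75/25 split with an arbitrary `random_state` and standard scaling, so its numbers will not reproduce the table below. `roc_auc_score` is used as a shorthand for `roc_curve` followed by `auc`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # malignant = 1, benign = 0
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
scaler = StandardScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

results = []
for k in [3, 5, 7, 9, 11]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    y_pred = knn.predict(x_test)
    # Probabilities must come from this K's model; column 1 is class 1 (malignant)
    y_pred_proba = knn.predict_proba(x_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    results.append((k, int(tp), int(tn), accuracy_score(y_test, y_pred),
                    f1_score(y_test, y_pred), roc_auc_score(y_test, y_pred_proba)))

print("K   TP  TN  Accuracy  F1      AUC")
for k, tp, tn, acc, f1, auc_val in results:
    print(f"{k:<3} {tp:<3} {tn:<3} {acc:.4f}    {f1:.4f}  {auc_val:.4f}")
```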
[ ]: | K value | TP | TN | Accuracy | F1-score | AUC-score |
|---|---|---|---|---|---|
| 3 | 48 | 80 | 0.8951048951048951 | 0.8648648648648649 | 0.9190126478988168 |
| 5 | 46 | 82 | 0.8951048951048951 | 0.8598130841121495 | 0.9190126478988168 |
| 7 | 47 | 83 | 0.9090909090909091 | 0.8785046728971964 | 0.9190126478988168 |
| 9 | 45 | 82 | 0.8881118881118881 | 0.8490566037735849 | 0.9190126478988168 |
| 11 | 45 | 83 | 0.8951048951048951 | 0.8571428571428572 | 0.9190126478988168 |
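import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
irisData = load_iris()
X = irisData.data
y = irisData.target
# Hold out 25% for testing; random_state=42 is an arbitrary, reproducible choice
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
neighbors = [3, 5, 7, 9, 11]
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
for i, k in enumerate(neighbors):  # fit one model per K, score on both splits
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)
plt.plot(neighbors, test_accuracy, label='Testing accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('n_neighbors')
plt.ylabel('Accuracy')
plt.show()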
[ ]: Therefore, among the K values tested, k = 7 gives the best performance on
this test split, with the highest accuracy (0.9091) and F1-score (0.8785).
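This conclusion rests on a single test split: on 143 test samples, the accuracies from 0.8881 to 0.9091 correspond to a difference of only three correct predictions. One hedged way to check the choice of K is k-fold cross-validation over the same K values. The sketch assumes scikit-learn's copy of the data, 5 folds and F1 as the score. The pipeline refits `StandardScaler` inside each fold, so the held-out fold never influences the scaling.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
y = 1 - y  # malignant = 1, benign = 0

mean_scores = {}
for k in [3, 5, 7, 9, 11]:
    # Scaling lives inside the pipeline so each fold is scaled on its own training part
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    mean_scores[k] = scores.mean()
    print(k, round(scores.mean(), 4))

best_k = max(mean_scores, key=mean_scores.get)
print("best K by mean 5-fold F1:", best_k)
```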
[ ]: RESULT:
Hence, we were able to implement KNN classification and study the
performance metrics, and the output has been verified.