Algorithms - K Nearest Neighbors
Tilani Gunawardena
Simple Analogy
• Tell me about your friends (who your neighbors are) and I will tell you who you are.
Instance-based Learning
KNN – Different names
• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
What is KNN?
• An instance-based method: a new example is assigned the class that is most common among its k closest examples in the training set.
KNN: Classification Approach
Distance Measure
[Diagram: compute the distance from the test record to all training records, then choose the k "nearest" records.]
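The procedure above (compute the distance from the test record to every training record, keep the k "nearest", and take a majority vote) can be sketched in Python; the toy data points below are illustrative:

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    `training` is a list of (feature_vector, label) pairs.
    """
    # Euclidean distance from the query to every training record
    distances = [(math.dist(query, features), label)
                 for features, label in training]
    # Keep the k records with the smallest distance
    nearest = sorted(distances)[:k]
    # Majority vote among the k nearest labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two well-separated clusters in 2-D
training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify((1.5, 1.5), training, k=3))  # -> A
```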
Distance Measure for Continuous Variables
• Common choices include Euclidean, Manhattan, and Minkowski distance.
Distance Between Neighbors
• Calculate the distance between new example
(E) and all examples in the training set.
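This distance calculation between a new example and each training example can be written as a small function; a minimal sketch:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0, 0), (3, 4)))          # -> 5.0
print(euclidean((25, 40000), (48, 142000)))  # ~ 102000 (two (age, loan) records)
```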
How to choose K?
• If k is too small, classification is sensitive to noise points.
• If k is too large, the neighborhood may include many points from other classes.
• In practice, k is often chosen by cross-validation; a common rule of thumb is k < sqrt(n), where n is the number of training examples.
KNN Feature Weighting
• Irrelevant attributes can dominate the distance; weighting each attribute by its relevance to the class reduces their influence.
Feature Normalization
• Attributes measured on different scales should be rescaled to a common range (e.g., [0, 1]) so that no single attribute dominates the distance.
Nominal/Categorical Data
• Distance measures work naturally with numerical attributes; nominal attributes need a different treatment.
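For nominal attributes, a common convention is a 0/1 mismatch distance: 0 when two values are identical, 1 when they differ. A minimal sketch mixing numeric and nominal attributes (the attribute names are illustrative):

```python
def mixed_distance(a, b):
    """Per-attribute distance: squared difference for numbers,
    0/1 mismatch for nominal (string) values, combined Euclidean-style."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0   # nominal: match/mismatch
        else:
            total += (x - y) ** 2             # numeric: squared difference
    return total ** 0.5

# Records: (age, marital_status) -- illustrative attributes
print(mixed_distance((25, "single"), (25, "married")))  # -> 1.0
```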
KNN Classification
[Scatter plot: Loan ($0–$250,000) on the vertical axis vs. Age (0–70) on the horizontal axis; points labeled Default and Non-Default]
KNN Classification – Distance
Distance is the Euclidean distance, on the raw attribute values, from each training record to the new case in the last row (Age = 48, Loan = $142,000):
Age Loan Default Distance
25 $40,000 N 102000
35 $60,000 N 82000
45 $80,000 N 62000
20 $20,000 N 122000
35 $120,000 N 22000
52 $18,000 N 124000
23 $95,000 Y 47000
40 $62,000 Y 80000
60 $100,000 Y 42000
48 $220,000 Y 78000
33 $150,000 Y 8000
48 $142,000 ?
D = sqrt((x1 - x2)^2 + (y1 - y2)^2)
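Applying this distance to each training record with the query (Age = 48, Loan = $142,000) reproduces the Distance column above; with k = 1, the nearest record decides the class. A sketch:

```python
import math

query = (48, 142000)
training = [
    (25, 40000, "N"), (35, 60000, "N"), (45, 80000, "N"),
    (20, 20000, "N"), (35, 120000, "N"), (52, 18000, "N"),
    (23, 95000, "Y"), (40, 62000, "Y"), (60, 100000, "Y"),
    (48, 220000, "Y"), (33, 150000, "Y"),
]

# Euclidean distance from the query to every training record
scored = [(math.dist(query, (age, loan)), default)
          for age, loan, default in training]

# With k = 1 the nearest record (distance ~8000, label Y) decides the class
dist, label = min(scored)
print(round(dist), label)  # -> 8000 Y
```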
KNN Classification – Standardized Distance
Age and Loan are first rescaled to [0, 1] by min–max normalization; Distance is the Euclidean distance to the normalized new case in the last row (0.7, 0.61):
Age Loan Default Distance
0.125 0.11 N 0.7652
0.375 0.21 N 0.5200
0.625 0.31 N 0.3160
0 0.01 N 0.9245
0.375 0.50 N 0.3428
0.8 0.00 N 0.6220
0.075 0.38 Y 0.6669
0.5 0.22 Y 0.4437
1 0.41 Y 0.3650
0.7 1.00 Y 0.3861
0.325 0.65 Y 0.3771
0.7 0.61 ?
Xs = (X - Min) / (Max - Min)
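Using this min–max rescaling, the standardized distances in the table above can be reproduced; note that after normalization the nearest neighbor changes to a Non-Default record. A sketch:

```python
import math

ages = [25, 35, 45, 20, 35, 52, 23, 40, 60, 48, 33]
loans = [40000, 60000, 80000, 20000, 120000, 18000,
         95000, 62000, 100000, 220000, 150000]
labels = ["N", "N", "N", "N", "N", "N", "Y", "Y", "Y", "Y", "Y"]

def minmax(x, values):
    """Rescale x to [0, 1] using the min and max of `values`."""
    return (x - min(values)) / (max(values) - min(values))

# Normalize the query (Age = 48, Loan = $142,000) with the training min/max
q = (minmax(48, ages), minmax(142000, loans))   # ~ (0.7, 0.61)

scored = [(math.dist(q, (minmax(a, ages), minmax(l, loans))), lab)
          for a, l, lab in zip(ages, loans, labels)]

# After standardization the nearest record is Non-Default (distance ~0.316)
dist, label = min(scored)
print(round(dist, 4), label)  # -> 0.316 N
```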
Strengths of KNN
• Very simple and intuitive.
• Can be applied to data from any distribution (no distributional assumptions).
• Gives good classification when the number of samples is large enough.
Weaknesses of KNN
• Classifying a new example is slow: distances to all training records must be computed.
• The entire training set must be stored in memory.
• Sensitive to the choice of k, to irrelevant attributes, and to attribute scales.