
KNN

The K-Nearest Neighbor (KNN) algorithm is a simple, non-parametric machine learning technique used for classification and regression based on the similarity of data points. It classifies new data by comparing it to stored data points and determining the most common category among the nearest neighbors. The selection of the number of neighbors (K) can be optimized through methods like cross-validation and error analysis.


K-Nearest Neighbor

Dr. Adven
K-Nearest Neighbor (KNN) Algorithm

• K-Nearest Neighbor is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be assigned to a well-suited category using K-NN.
• The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
K-Nearest Neighbor (KNN) Algorithm
• K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs its computation only at classification time.
• During the training phase, the KNN algorithm simply stores the dataset; when it receives new data, it classifies that data into the category most similar to it.
Example
• Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, since it works on a similarity measure. Our KNN model will compare the features of the new image with stored cat and dog images and, based on the most similar features, place it in either the cat or the dog category.
Why do we need a K-NN Algorithm?

• Suppose there are two categories, Category A and Category B, and we have a new data point x1. Which of these categories will the data point belong to? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below:
How does K-NN work?

• The working of K-NN can be explained with the following steps:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to each training point.
• Step-3: Take the K nearest neighbors according to the calculated Euclidean distances.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category with the maximum number of neighbors.
• Step-6: Our model is ready.
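The steps above can be sketched directly in code. The following is a minimal pure-Python implementation; the 2-D data points and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(training_points, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors.

    training_points: list of (features, label) pairs.
    """
    # Step 2: Euclidean distance from the new point to every stored point
    distances = []
    for features, label in training_points:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, new_point)))
        distances.append((d, label))
    # Step 3: take the K nearest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Steps 4-5: count the labels and assign the majority category
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: Category A clusters near (1, 1), Category B near (5, 5)
data = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify(data, (1.5, 1.5), k=3))  # → A
```

Note that "training" here is nothing more than keeping the list in memory, which is exactly what makes K-NN a lazy learner.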
Selecting the value of K in the KNN
algorithm
1. Cross-Validation:
• Use k-fold cross-validation to test different values of K and choose the one that performs best on
validation data. This helps avoid overfitting and underfitting.
2. Error Analysis:
• Plot the error rate for different values of K and select the K with the lowest error rate. Typically this is done by
plotting the error rate on the y-axis against K values on the x-axis.
• The plot often shows a U-shaped curve: the error initially decreases, reaches a minimum, and then starts to
increase. Select K at the point where the error is minimized.
3. Domain Knowledge:
• Use domain-specific knowledge to guide the selection of K. For example, in some cases a small K (like 3 or 5) might
be more appropriate for capturing local patterns, while in others a larger K might be needed to account for noise.
4. Heuristic Methods:
• A common heuristic is to set K to the square root of the number of data points, but this is just a rule of thumb and
may not always give the best result.
5. Empirical Testing:
• Test multiple values of K empirically on the training set and observe which one provides the best balance between
bias and variance.
6. Even vs. Odd Values:
• Use odd values of K when the number of classes is even, to avoid ties.
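Points 1, 2, and 6 above can be combined into a small experiment: evaluate the error rate for several odd values of K using leave-one-out cross-validation and keep the K with the lowest error. This is a self-contained sketch on a made-up two-cluster dataset, not a general-purpose tuner.

```python
import math
from collections import Counter

def knn_classify(train, point, k):
    # Rank all training points by Euclidean distance, vote among the k nearest
    dists = sorted((math.dist(f, point), lab) for f, lab in train)
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]

# Hypothetical data: Category A near (1, 1), Category B near (5, 5)
data = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B")]

def loo_error(data, k):
    # Leave-one-out: hold out each point and classify it with the rest
    errors = sum(
        knn_classify(data[:i] + data[i + 1:], f, k) != lab
        for i, (f, lab) in enumerate(data)
    )
    return errors / len(data)

# Odd K values only, to avoid ties between the two classes
best_k = min(range(1, 8, 2), key=lambda k: loo_error(data, k))
print(best_k)  # → 1 (first K reaching the minimum error on this toy data)
```

On real data the error curve is rarely this clean; plotting `loo_error` against K usually shows the U-shaped curve described above.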
Advantages of KNN Algorithm

• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective when the training data is large.
Disadvantages of KNN Algorithm

• The value of K always needs to be determined, which may sometimes be complex.
• The computation cost is high, because the distance to every training sample must be calculated for each prediction.
Example: solve for SL = 5.2,
SW = 3.1, Species = ?
Solution
Step 1
Step 2: Assign Rank
Step 3: Find the NN value
Dealing with Categorical
Features (Example)
Example 2
Solution
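The solution tables for this example appeared as images on the original slides and did not survive conversion. As a stand-in, the sketch below runs the three solution steps (compute distances, rank them, take the K nearest and vote) on hypothetical sepal-length/sepal-width training data; the actual table and K value from the slides are not available, so K = 5 and all measurements here are assumptions.

```python
import math
from collections import Counter

# Hypothetical training data: (sepal length SL, sepal width SW) -> species.
# The real table from the slides was lost in conversion.
train = [((5.3, 3.7), "Setosa"), ((5.1, 3.8), "Setosa"),
         ((7.2, 3.0), "Virginica"), ((5.4, 3.4), "Setosa"),
         ((5.1, 3.3), "Setosa"), ((5.4, 3.9), "Setosa"),
         ((7.4, 2.8), "Virginica"), ((6.1, 2.8), "Versicolor"),
         ((7.3, 2.9), "Virginica"), ((6.0, 2.7), "Versicolor"),
         ((5.8, 2.8), "Virginica"), ((6.3, 2.3), "Versicolor"),
         ((5.1, 2.5), "Versicolor"), ((6.3, 2.5), "Versicolor"),
         ((5.5, 2.4), "Versicolor")]

query = (5.2, 3.1)  # SL = 5.2, SW = 3.1, Species = ?
k = 5               # assumed; not stated in the recovered text

# Step 1: Euclidean distance from the query to every training sample
# Step 2: assign ranks by sorting the distances
ranked = sorted((round(math.dist(f, query), 3), s) for f, s in train)
# Step 3: take the K nearest neighbors and vote
votes = Counter(s for _, s in ranked[:k])
print(votes.most_common(1)[0][0])  # → Setosa
```

With this data the five nearest neighbors are dominated by Setosa samples, so the query is classified as Setosa; with the slides' real table the ranks, and possibly the answer, could differ.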
