K-Nearest Neighbors (K-NN) algorithm
The K-Nearest Neighbors (K-NN) algorithm is a clear and simple supervised
machine learning algorithm that can be used to solve regression and classification
problems.
The K-NN algorithm assumes that the new case and existing cases are similar and
places the new case in the category that is most similar to the existing categories. The
K-NN algorithm stores all available data and classifies a new data point based on
its similarity to the existing data. This means that when new data appears, the KNN
algorithm can quickly classify it into a suitable category.
K-NN is a non-parametric algorithm, which means it makes no assumptions about
the underlying data distribution. It is also known as a lazy learner algorithm because
it does not learn from the training set right away; instead, it stores the dataset and
defers computation until a new data point needs to be classified.
Pattern recognition, data mining, and intrusion detection are some of the demanding
applications of K-NN.
Need of the K-NN Algorithm
Assume there are two categories, Category A and Category B, and we have a new data
point x1. Which of these categories will this data point fall into? A K-NN algorithm
is required to solve this type of problem. We can easily identify the category or class
of a dataset with the help of K-NN, as shown in Figure 2.12:
Figure 2.12: Category or class of a dataset with the help of K-NN
The following steps explain how K-NN works:
Step-I: Select the number K of neighbors.
Step-II: Calculate the Euclidean distance from the new data point to every existing data point.
Step-III: Take the K closest neighbors based on the calculated Euclidean distances.
Step-IV: Count the number of data points in each category among these K neighbors.
Step-V: Assign the new data point to the category with the greatest number of neighbors.
Step-VI: Our model is complete.
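The steps above can be sketched in Python. This is a minimal illustration, not a production implementation; the points, labels, and value of K are made up for the example:

```python
import math
from collections import Counter

def knn_classify(training_points, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Step II/III: compute Euclidean distances and keep the k closest points.
    neighbors = sorted(
        training_points,
        key=lambda item: math.dist(item[0], new_point),
    )[:k]
    # Step IV: count the data points in each category among the k neighbors.
    votes = Counter(label for _, label in neighbors)
    # Step V: assign the category with the greatest number of neighbors.
    return votes.most_common(1)[0][0]

# Hypothetical 2-D data: (coordinates, category) pairs.
data = [((1, 1), "A"), ((1, 2), "A"), ((2, 2), "A"),
        ((6, 6), "B"), ((7, 7), "B")]
print(knn_classify(data, (2, 1), k=3))  # the 3 nearest points are all in "A"
```

Note that K-NN does no work at "training" time; all the distance computation happens inside the call that classifies the new point, which is why it is called a lazy learner.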
Example: Let's say we have a new data point that needs to be placed in the required
category. Consider the following illustration:
Figure 2.13: K-NN example
First, we'll decide on the number of neighbors, so we'll go with k=5.
The Euclidean distance between the data points will then be calculated as shown
in Figure 2.14. The Euclidean distance is the distance between two points that we
learned about in geometry class. It can be calculated using the following formula:
Euclidean distance between A(x1, y1) and B(x2, y2) = √((x2 − x1)² + (y2 − y1)²)
Figure 2.14: The Euclidean distance between the data points
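As a quick check of the formula, the same quantity can be computed in Python two ways; the coordinates here are illustrative:

```python
import math

# Two illustrative points A(x1, y1) and B(x2, y2).
x1, y1 = 1.0, 2.0
x2, y2 = 4.0, 6.0

# Distance by the formula above.
d_formula = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
# The same distance via the standard library.
d_stdlib = math.dist((x1, y1), (x2, y2))

print(d_formula, d_stdlib)  # both give 5.0 (a 3-4-5 right triangle)
```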
We found the closest neighbors by calculating the Euclidean distance, which yielded
three closest neighbors in category A and two closest neighbors in category B as
shown in Figure 2.15. Consider the following illustration:
Figure 2.15: Closest neighbors for Category A and B
As can be seen, three of the five closest neighbors are from category A, so this new
data point must also belong to category A.
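The majority vote in this example can be checked directly with collections.Counter; the neighbor labels below mirror the three category-A and two category-B neighbors in the figure:

```python
from collections import Counter

# Labels of the five nearest neighbors from the example above.
neighbor_labels = ["A", "A", "A", "B", "B"]

votes = Counter(neighbor_labels)
predicted = votes.most_common(1)[0][0]
print(predicted)  # "A" wins with 3 of 5 votes
```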
Logistic Regression
Logistic regression is a classification algorithm used to assign observations to a
discrete set of classes. Unlike linear regression, which produces a continuous range
of values, logistic regression produces a probability value that can be mapped to two
or more discrete classes using the logistic sigmoid function.
A regression model with a categorical target variable is known as logistic regression.
To model binary dependent variables, it employs a logistic function.
The target variable in logistic regression has two possible values, such as yes/no.
Consider how the target variable y would be represented if the value "yes" is 1 and
"no" is 0. According to the logistic model, the log-odds of y being 1 is a linear
combination of one or more predictor variables. So, let's say we have two predictors
or independent variables, x1 and x2, and p is the probability of y equaling 1. Then,
according to the logistic model:

ln(p / (1 − p)) = β0 + β1·x1 + β2·x2

We can recover the odds by exponentiating the equation:

p / (1 − p) = e^(β0 + β1·x1 + β2·x2)

Solving for p gives the probability that y equals 1:

p = e^(β0 + β1·x1 + β2·x2) / (1 + e^(β0 + β1·x1 + β2·x2))
As a result, if p is closer to 1, y equals 1, and if p is closer to 0, y equals 0.
Equivalently, the logistic regression equation is:

p = 1 / (1 + e^(−(β0 + β1·x1 + β2·x2)))
This equation can be generalized to n parameters and independent variables:

p = 1 / (1 + e^(−(β0 + β1·x1 + β2·x2 + … + βn·xn)))
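The generalized equation can be evaluated directly in Python. The coefficients below are made up purely to illustrate the computation; they are not fitted to any data:

```python
import math

def sigmoid(z):
    """Logistic function: maps the log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(betas, xs):
    """p = 1 / (1 + e^-(b0 + b1*x1 + ... + bn*xn))."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return sigmoid(z)

# Hypothetical coefficients b0, b1, b2 and one observation (x1, x2).
betas = [-1.0, 0.8, 0.5]
p = predict_probability(betas, [2.0, 1.0])
print(round(p, 3))  # probability that y = 1; classify as 1 if p > 0.5
```

In practice the coefficients β0 … βn are estimated from training data (typically by maximum likelihood); this sketch only shows how a fitted model turns inputs into a probability.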