Module 03

21CS54 | AI & ML| SEARCH CREATORS.
Module – 03
Chapter: - 01 - Basics of Learning Theory
• Learning is a process by which one can acquire knowledge and construct new ideas or
concepts based on the experiences.
• Machine learning is an intelligent way of learning general concept from training examples
without writing a program.
• There are many machine learning algorithms through which computers can intelligently
learn from past data or experiences, identify patterns, and make predictions when new
data is fed.
Learning Objectives
Understand the basics of learning theory and the key elements of machine learning.
Introduce Concept learning to obtain an abstraction and generalization from the data.
Learn about hypothesis representation language and to explore searching the hypothesis
space using Find-S algorithm and its limitations.
Study about version spaces, List-Then-Eliminate algorithm and Candldate Elimination
algorithm. Introduce Inductive bias, a set of prior assumptions considered by a learning
algorithm beyond the training data in order to perform induction.
Discuss the tradeoff between the two factors called bias and variance which exist when
modelling a machine learning algorithm.
Understand the different challenges in model selection and model evaluation. Study
popular re-sampling model selection method called Cross-Validation (K-fold, LOOCYV,
etc.) to tune machine learning model.
Introduce the various learning frameworks and learn to evaluate hypothesis.
INTRODUCTION To LEARNING AND ITS TYPES

The process of- acquiring knowledge and expertise through study, experience, or being
taught is called as learning. generally, humans learn in different ways.
To make machines learn, we need t, simulate the. strategies of human learning in
machines. But, will the computers learn? This question has been raised over many
centuries by philosophers, mathematicians, and logicians.
Search Creators... Page 1

Let. the unknown target function be f: X = Y, that maps the input space to output space.
The objective of the learning Program is to pick a function, g: y — Y to approximate
hypothesis f. All the possible formulae form a hypothesis space.
In short, let H be the set of all formulae from which the learning algorithm chooses. The
choice is good when the hypothesis g replicates f for all samples. This is shown in Figure
3.1.
Classical and Adaptive Machine Learning Systems

classical machine learning system has components such as Input, Process and Output. The
input values are taken from the environment directly.

These values are processed and a hypothesis is generated as output model. This model is
then used for making predictions.
The predicted values are consumed by the environment. In contrast to the classical
systems, adaptive systems interact with the input for getting labelled data as direct inputs
are not available.
This process is called reinforcement learning. In reinforcement learning, a learning agent
interacts with the environment and in return gets feedback.
Based on the feedback, the learning agent generates input samples for learning, which are
used for generating the learning model.
Such learning agents are not static and change their behavior according to the external
signal received from the environment.
The feedback is known as reward and learning here is the ability of the learning agent
adapting to the environment based on the reward. These are the characteristics of an
adaptive system.
Learning
Types There are different types of learning. Some of the different learning methods are as
follows:
1. Learn by memorization or learn by repetition also called as rote learning is done by

memorizing without understanding the logic or concept.
2. Learn by examples also called as learn by experience or previous knowledge acquired at
some time, is like finding an analogy, which means performing inductive learning from
observations that formulate a general concept.
3. Learn by being taught by an expert or a teacher, generally called as passive learning.
4. Learning by critical thinking, also called as deductive learning, deduces new facts q,
conclusion from related known facts and information.
5. Self learning, also called as reinforcement learning, is a self-directed learning thy
normally learns from mistakes punishments and rewards.
6. Leaming to solve problems is a type of cognitive learning where learning happen in the
mind and is possible by devising a methodology to achieve a goal.
7. Learning by generalizing explanations, also called as explanation-based learning (EBJ)
is another learning method that exploits domain knowledge from experts to improve the
accuracy of learned concepts by supervised learning.
Acquiring general concept from specific instances of the training dataset is the main
challenge of machine learning.

INTRODUCTION TO COMPUTATION LEARNING THEORY
These questions are the basis of a field called ‘Computational Learning Theory’ or in
short (COLT). It is a specialized field of study of machine learning.
COLT deals with formal methods used for learning systems. It deals with frameworks for
quantifying learning tasks and learning algorithms.
It provides a fundamental basis for study of machine learning.
Computational Learning Theory uses many concepts from diverse areas such as
Theoretical Computer Science, Artificial Intelligence and Statistics.
The core concept of COLT is the concept of learning framework. One such important
framework is PAC.
It deals with Probably Approximate Learning (PAC) and Vapnik-Chervonenkis (VC)
dimensions.
The learning framework is discussed in a detailed manner in COLT focuses on supervised
learning tasks. Since the complexity of analyzing is difficult, normally, binary
classification tasks are considered for analysis.
DESIGN OF A LEARNING SYSTEM
Training Experience
Let us consider designing of a chess game. In direct experience, individual board states
and correct moves of the chess game are given directly.

In indirect system, the move sequences and results are only given. The training
experience also depends on the presence of a supervisor who can label all valid moves for
a board state.
In the absence of a supervisor, the game agent plays against itself and learns the good
moves, if the training samples cover all scenarios, or in other words, distributed enough
for performance computation.
If the training samples and testing samples have the same distribution, the results would
be good.
Determine the Target Function

The next step is the determination of a target function. In this step, the type of knowledge
that needs to be learnt is determined.
In direct experience, a board move is selected and is determined whether it is a good
move or not against all other moves. If it is the best move, then it is chosen as: B —> M,
where, B and M are legal moves.
In indirect experience, all legal moves are accepted and a score is generated for each. The
move with largest score is then chosen and executed.
Determine the Target Function Representation
Choosing an Approximation Algorithm for the Target Function
Here, b is the sample and v(b) is the predicted hypothesis. The approximation is carried, "
out as:

INTRODUCTION TO CONCEPT LEARNING

Concept learning is a learning strategy of acquiring abstract knowledge or inferring a
genera concept or deriving a category from the given training samples.
It is a process of abstraction and generalization from the data. Concept learning helps to
classify an object that has a set of common, relevant features.
Thus, it helps a learner compare and contrast categories based on the similarity and
association of positive and negative instances in the training data to classify an object.
The learner tries to simplify by observing the common features from the training samples
and then apply this simplified model to the future samples. This task is also known as
learning from experience.
Each concept or category obtained by learning is a Boolean valued function which takes a
true or false value.
For example, humans can identify different kinds of animals based on common relevant
features and categorize all animals based on specific sets of features.
Concept learning requires three things:

Input — Training dataset which is a set of training instances, each labeled with the name of a
concept or category to which it belongs. Use this past experience to train and build the model.
Output — Target concept or Target function f. It is a mapping function f(x) from input x to
output y. It is to determine the specific features or common features to identify an object.
Test — New instances to test the learned model.

Representation of a Hypothesis
hypothesis ‘h’ approximates a target function ‘f’ to represent the relationship between the
independent attributes and the dependent attribute of the training instances.
The hypothesis is the predicted approximate model that best maps the inputs to outputs.
Each hypothesis is represented as a conjunction of attribute conditions in the antecedent
part.
For example, (Tail = Short) A (Color = Black).... The set of hypothesis in the search
space is called as hypotheses. Hypotheses are the plural form of hypothesis.
Generally ‘H’ is used to represent the hypotheses and ‘i’ is used to represent a candidate
hypothesis.

Thus, concept learning can also be called as Inductive Learning that tries to induce a
general function from specific training instances.
That is why a hypothesis is an approximate target function that best maps the inputs to
outputs.

Hypothesis Space
Hypothesis space is the set of all possible hypotheses that approximates the target
function f. In other words, the set of all possible approximations of the target function can
be defined as hypothesis space.
From this set of hypotheses in the hypothesis space, a machine learning algorithm would
determine the best possible hypothesis that would best describe the target function or best
fit the outputs.
Generally, a hypothesis representation language represents a larger hypothesis space.
Every machine learning algorithm would represent the hypothesis space in a different
manner about the function that maps the input variables to output variables.
For example, a regression algorithm represents the hypothesis space as a linear function
whereas a decision tree algorithm represents the hypothesis space as a tree.
The set of hypotheses that can be generated by a learning algorithm can be further
reduced by specifying a language bias.
The subset of hypothesis space that is consistent with all-observed training instances is
called as Version Space. Version space represents the only hypotheses that are used for
the classification.
For example, each of the attribute given in the Table 3.1 has the following possible set of
values.
Hypothesis ordering is also important wherein the hypotheses are ordered from the most
specific one to the most general one in order to restrict searching the hypothesis space
exhaustively.
Heuristic Space Search

Heuristic search is a search strategy that finds an optimized hypothesis/solution to a
problem by iteratively improving the hypothesis/solution based on a given heuristic
function or a cost measure.
Heuristic search methods will generate a possible hypothesis that can be a solution in the
hypothesis spacer or a path from the initial state.
Generalization and Specialization
In order to understand about how we construct this concept hierarchy, Jet us apply this
general principle of generalization/specialization relation.
By generalization of the most specific hypothesis and by specialization of the most
general hypothesis, the hypothesis space can be searched for an approximate hypothesis
that matches all positive instances but does not match any negative instance.
Generalization - Specific to General Learning This learning methodology will search

through the hypothesis space for an approximate hypothesis by generalizing the most specific
hypothesis.
Example:- Consider the training instances shown in Table 3.1 and illustrate Specific to
General Learning.

Specialization - General to Specific Learning This learning methodology will search

through the hypothesis space for an approximate hypothesis by specializing the most general
hypothesis.
illustrate learning by Specialization — General to Specific Learning for the data instances
shown in Table 3.1. Solution: Start from the most general hypothesis which will make true all
positive and negative instances.


Hypothesis Space Search by Find-S Algorithm

Find-S algorithm is guaranteed to converge to the most specific hypothesis in H that is
consistent with the positive instances in the training dataset.
Obviously, it will also be consistent with the negative instances. Thus, this algorithm
considers only the positive instances and eliminates negative instances while generating
the hypothesis. It initially starts with the most specific hypothesis.
Example:- Consider the training dataset of 4 instances shown in Table 3.2. It contains the
details of the performance of students and their likelihood of getting a job offer or not in their
final semester. Apply the Find-S algorithm.

Limitations of Find-S Algorithm

1. Find-S algorithm tries to find a hypothesis that is consistent with positive instances,
ignoring all negative instances. As long as the training dataset is consistent, the
hypothesis found by this algorithm may be consistent.
2. The algorithm finds only one unique hypothesis, wherein there may be many other
hypotheses that are consistent with the training dataset.

Version Spaces
The version space contains the subset of hypotheses from the hypothesis space that is
consistent with all training instances in the training dataset.
List-Then-Eliminate Algorithm
The principle idea of this learning algorithm is to initialize the version space to contain all
hypotheses and then eliminate any hypothesis that is found inconsistent with any training
instances.
Initially, the algorithm starts with a version space to contain all hypotheses scanning each
training instance, The hypotheses that are inconsistent with the training instance are
eliminated.
Finally, the algorithm outputs the list of remaining hypotheses that are all consistent.
This algorithm works fine if the hypothesis space is finite but practically it is difficult to
deploy this algorithm. Hence, a variation of this idea is introduced in the Candidate
Elimination algorithm.
Version Spaces and the Candidate Elimination Algorithm
Version space learning is to generate all consistent hypotheses around. This algorithm
computes the version space by the combination of the two cases namely,
The algorithm defines two boundaries called 'general boundary’ which is a set of all
hypotheses that are the most general and “specific boundary’ which is a set of all
hypotheses that are the most specific.
Thus, the algorithm limits the version space to contain only those hypotheses that are
most general and most specific. Thus, it provides a compact representation of List-then
algorithm.

Generating Positive Hypothesis ‘S’
If it is a positive example, refine S to include the positive instance. We need to

generalize S to include the positive instance.
The hypothesis is the conjunction of 'S’ and positive instance. When generalizing, for
the first positive instance, add to S all minimal generalizations such that S is filled
with attribute values of the positive instance. previous iteration.
If it is a negative instance, it skips

Generating Negative Hypothesis ‘G’
If it is a negative instance, refine G to exclude the, negative instance.

Then, prune G to exclude all inconsistent hypotheses in G with the positive, instance.
The idea is to add to G al] minimal specializations to exclude the negative instance ap,
d be consistent with the positive instance. Negative hypothesis indicates general
hypothesis.
Generating Version Space - [Consistent Hypothesis]
We need to take the combination of sets in ‘G and check that with ‘S’. When the combined
set fields are matched with fields in °S’, the only that is included in the version space as
consistent hypothesis.
Example:-
Consider the same set of instances from the training dataset shown in Table 33 and generate
version space as consistent hypothesis.

Since it is a negative instance, specialize G2 to exclude the negative example but stay
consistent with S2. Generate hypothesis for each of the non-matching attribute value
in S2 and fill with the attribute value of S2.
There is no inconsistent hypothesis in 52 with the negative instance, hence S3 remains the
same.


Chapter: - 02 - Similarity-based Learning
Similarity-based Learning is a supervised learning technique that predicts the class label
of a test instance by gauging the similarity of this test instance with training instances.
Similarity-based learning refers to a family of instance-based learning which is used to
solve both classification and regression problems.
Similarity-based classification is useful in various fields such as image processing, text
classification, pattern recognition, bio informatics, data mining, information retrieval,
natural language processing, etc.
A practical application of this learning is predicting daily stock index price changes.
Learning Objectives
INTRODUCTION TO SIMILARITY OR INSTANCE-BASED

LEARNING
similarity-based classifiers use similarity measures to locate the nearest neighbors and
classify a test instance which works in contrast with other learning mechanisms such as
decision trees or Neural networks.
Similarity-based learning is also called as Instance-based learning/Just-in time learning
since it does not build an abstract model of the training instances and performs lazy
learning when classifying a new instance.
The drawback of this learning is that it requires a large memory to store the data since a
global abstract model is not constructed initially with the training data.
Generally, Similarity-based classification problems formulate the features of test instance
and training instances in Euclidean space to learn the similarity or dissimilarity between
instances.

Differences Between Instance- and Model-based Learning

An instance is an entity or an example in the training dataset. It is described by a set of
features or attributes. One attribute describes the class label or category of an instance.
Instance-based methods learn or predict the class label of a test instance only when a new
instance is given for classification and until then it delays the processing of the training
dataset.
The differences between Instance-based Learning and Model-based Learning
Some examples of Instance-based learning algorithms are:
NEAREST-NEIGHBOR LEARNING
A natural approach to similarity-based classification is k-Nearest-Neighbors (k-NN),
which is a non-parametric method used for both classification and regression problems.
It is a simple and powerful non-parametric algorithm that predicts the category of the test
instance according to the ‘K’ training samples which are closer to Q the test instance and
classifies it to that category which has the largest probability.
A visual representation x of this learning is shown in Figure 4.1. There are two classes of
objects called C, and C, in the given figure.

When given a test instance T, the category of this test instance is determined by looking
at the class of k =3 nearest neighbors. thus, the class of this test instance k-Nearest
Neighbor Learning T is predicted as C,.
WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM

The Weighted k-NN is an extension of k-NN. It chooses the neighbors by using the
weighted distance.

The k-Nearest Neighbor (k-NN) algorithm has some serious limitations as its performance
is solely dependent on choosing the k nearest neighbors, the distance metric used and the
decision rule. The idea is that weights are inversely proportional to distances.
The selected k nearest neighbors can be assigned uniform weights, which means all the
instances in each neighborhood are weighted equally or weights can be assigned by the
inverse of their distance.
NEAREST CENTROID CLASSIFIER

A simple alternative to k-NN classifiers for similarity-based classification is the Nearest Centroid
Classifier. It is a simple classifier and also called as Mean Difference classifier. The idea of this
classifier is to classify a test instance to the class whose centroid/mean is closest to that instance.

LOCALLY WEIGHTED REGRESSION (LWR)

Locally Weighted Regression (LWR) is a non-parametric supervised learning algorithm
that performs local regression by combining regression model with nearest neighbor’s
model.
LW is also referred to as a memory-based method as it requires training data while
prediction by uses only the training data instances locally around the point of interest.
Using nearest neighbor g algorithm, we find the instances that are closest to a test
instance and fit linear function to each of those ‘K’ nearest instances in the local
regression model. The key idea is that we need to approx.
imate the linear functions of all ‘¥ neighbors that minimize the error such that the
prediction line is no more linear but rather it is a curve.

Chapter: - 03 – Regression Analysis
Regression analysis is a supervised learning method for predicting Continuous variables,

The difference between classification and regression analysis is that regression methods
are used to predict qualitative variables or continuous numbers unlike categorical
variables or labels.
it is used to predict linear or non-linear relationships among variables of the given dataset.
This chapter deals with an introduction of regression and its various types.
Learning Objectives
INTRODUCTION TO REGRESSION
Regression analysis is the premier method of supervised learning. This is one of the most
popular and oldest supervised learning technique.
Given a training dataset D containing N training points (x, y,), where i = 1...N, regression
analysis is used to model the relationship between one or more independent variables x,
and a dependent variable y.
The relationship between the dependent and independent variables can be represented as a
function as follows: y=f(x)
Here, the feature variable x is also known as an explanatory variable, exploratory
variable, a predictor variable, an independent variable, a covariate, or a domain point. y is
a dependent variable. Dependent variables are also called as labels, target variables, or
response variables.
This is used to determine the relationship each of the exploratory variables exhibits. Thus,
regression analysis is used for prediction and forecasting.
Regression is used to predict continuous variables or quantitative variables such as price
and revenue. Thus, the primary concern of regression analysis is to find answer to
questions such as:

INTRODUCTION TO LINEARITY, CORRELATION, AND

CAUSATION
The quality of the regression analysis is determined by the factors such as correlation and
causation.
Regression and Correlation

Correlation among two variables can be done effectively using a Scatter plot, which is a
plot between explanatory variables and response variables. It is a 2D graph showing the
relationship between two variables.
The x-axis of the scatter plot is independent, or input or predictor variables and y-axis of
the scatter plot is output or dependent or predicted variables. The scatter plot is useful in
exploring data. Some of the scatter plots are shown in Figure 5.1.
In positive correlation, one variable change is associated with the change in another
variable. In negative correlation, the relationship between the variables is reciprocal while
in random correlation, no relationship exists between variables.
While correlation is about relationships among variables, say ¥ and y, regression is about
predicting one variable given another variable.

Regression and Causation

Causation is about causal relationship among variables, say x and y. Causation means know
in, whether x causes y to happen or vice versa. x causes y is often denoted as x implies y.
Correlation and Regression relationships are not same as causation relationship.
Similarly, the relationship between higher sales of cool drinks due to a rise in temperature
is not a causal relation. Even though high temperature is the cause of cool drinks sales, it
depends on other factors too,
Linearity and Non-linearity Relationships

The linearity relationship between the variables means the relationship between the
dependent and independent variables can be visualized as a straight line. The line of the
form, y = ax +p can be fitted to the data points that indicate the relationship between x and
y.
By linearity, it is meant that as one variable increases, the corresponding variable also
increases in a linear manner. A linear relationship is shown in Figure 5.2 (a).
A non-linear relationship exists in functions such as exponential function and power
function and it is shown in Figures 5.2 (b) and 5.2 (c). Here, x-axis is given by x data and
y-axis is given by y data.

Types of Regression Methods

❖ Linear Regression It is a type of regression where a line is fitted upon given data for
finding the linear relationship between one independent variable and one dependent
variable to describe relationships.
❖ Multiple Regression It is a type of regression where a line.is fitted for finding the linear
relationship between two or more independent variables and one dependent variable to
describe relationships among variables.
❖ Polynomial Regression It is a type of non-linear regression method of describing
relationships among variables where N™ degree polynomial is used to model the
relationship between one independent variable and one dependent variable. Polynomial
multiple regression is used to model two or more independent variables and one
dependent variable.
❖ Logistic Regression-It is used for predicting categorical variables that involve one or
more independent variables and one dependent variable. This is also known as a binary
classifier.
❖ Lasso and Ridge Regression Methods These are special variants of regression method
where regularization methods are used to limit the number and size of coefficients of the
independent variables.
Limitations of Regression Method

Outliers — Outliers are abnormal data. It can bias the outcome of the regression model,
as outliers push the regression line towards it.
Number of cases — The ratio of independent and dependent variables should be at least
20: 1. For every explanatory variable, there should be at least 20 samples. At least five
samples are required in extreme cases.
Missing data — Missing data in training data can make the model unfit for the sampled
data.
Multicollinearity — If exploratory variables are highly correlated (0.9 and above), the
regression is vulnerable to bias. Singularity leads to perfect correlation of 1. The remedy
is to remove exploratory variables that exhibit correlation more than 1. If there is a tie,
then the tolerance (1 - R squared) is used to eliminate variables that have the greatest
value.
INTRODUCTION TO LINEAR REGRESSION

In the simplest form, the linear regression model can be created by fitting a line among the Scattered
data points. The line is of the form given in Eq. (5.2).

The idea of linear regression is based on Ordinary Least Square (OLS) approach. This
method is also known as ordinary least squares method. In this method, the data points
are modelled using a straight line.
The squares of the individual errors can also be computed and added to give a sum of
squared error. The line with the lowest sum of squared error is called line of best fit.

Linear Regression in Matrix Form

Matrix notations can be used for representing the values of independent and dependent
variables. This is illustrated through Example 5.2. The Eq. (5.3) can be written in the
form of matrix as follows:
This can be written as: Y = Xa +e, where X is an n x 2 matrix, Y is an n x 1 vector, ais a 2

x 1 column vector and e is an n x 1 column vector.
Find linear regression of the data of week and product sales (in Thousands) given in
Table 5.3. Use linear regression in matrix form.

VALIDATION OF REGRESSION METHODS

The regression model should be evaluated using some metrics for checking the correctness,
The following metrics are used to validate the results of regression.
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Square Error (RMSE)

Standard Error
Residuals or error is the difference between the actual (y) and predicted value (7). If the
residuals have normal distribution, then the mean is zero and hence it is desirable. This is a
measure of variability in finding the coefficients.
Relative MSE
Coefficient of Variation
Coefficient of Determination
To understand the coefficient of determination, one needs to understand the total variation of
coefficients in regression analysis. The sum of the squares of the differences between the y-value of
the data pair and the average of y is called total variation. Thus, the following variations can be
defined.

Standard Error Estimate

Standard error estimate is another useful measure of regression. It is the standard deviation of
the observed values to the predicted values. This is given as:
Example: -
Let us consider the data given in the Table 5.3 with actual and predicted values. Find standard
error estimate.

21CS54: Artificial Intelligence and Machine Learning
Module 3
Topics: Basics of Learning theory, Similarity Based Learning, Regression Analysis
Textbook 2: Chapter 3 - 3.1 to 3.4, Chapter 4, chapter 5.1 to 5.4
Chapter 4
Similarity Based Learning
4.1 Similarity or Instance-based Learning
4.1.1 Difference between Instance-and Model-based Learning
Some examples of Instance-based Learning algorithms are:
a) KNN
b) Variants of KNN
c) Locally weighted regression
d) Learning vector quantization
e) Self-organizing maps
f) RBF networks
Dept of ISE, RNSIT VS, Dr SNK, AG

4.2 Nearest-Neighbor Learning

• A powerful classification algorithm used in pattern recognition.
• K nearest neighbors stores all available cases and classifies new cases based on a
similarity measure (e.g distance function).
• One of the top data mining algorithms used today.
• A non-parametric lazy learning algorithm (An Instance based Learning method).
• Used for both classification and regression problems.
In the above diagram 4.1, 2 classes of objects called C1 and C2. When given a test
instance T,the category of this test instance is determined by looking at the class ofk=3
nearest neighbors. Thus, the class of this test instance T ispredicted as C2.

Consider the student performance training, dataset of 8 data instances shown in Table 4.2
which describes the performance of individual students in a course and their CGPA
obtained in the previous semesters. The independent attributes are CGPA, Assessment and
Project. The target variable is ‘Result’ which is a discrete valued variable that takes two
values ‘Pass’ or ‘Fail. Based on the performance of a student, classify whether a student
will pass or fail in that course.
Sl No. CGPA Assessment Project Submitted Result

1 9.2 85 8 Pass
2 8 80 7 Pass
3 8.5 81 8 Pass
4 6 45 5 Fail
5 6.5 50 4 Fail
6 8.2 72 7 Pass
7 5.8 38 5 Fail
8 8.9 91 9 Pass

4.3 Weighted K-Nearest Neighbor Algorithm

The weighted KNN is an extension of k-NN.It chooses the neighbors by using the
weighted distance. In weighted kNN, the nearest k points are given a weight using a
function called as the kernel function. The intuition behind weighted kNN, is to give
more weight to the points which are nearby and less weight to the points which are
farther away.


4.4 Nearest Centroid Neighbor

The Nearest Centroids algorithm assumes that the centroids in the input feature space
are different for each target label. The training data is split into groups by class label,
then the centroid for each group of data is calculated. Each centroid is simply the mean
value of each of the input variables, so it is also called as Mean Difference classifier.
If there are two classes,then two centroids or points are calculated; three classes give
three centroids, and so on.

4.5 Locally Weighted Regression (LWR)

Where, г is called the bandwidth parameter and controls the rate at which wi reduces to
zero with distance from xi.


CHAPTER 5
REGRESSION ANALYSIS
5.1 Introduction to Regression
Regression analysis is a fundamental concept that consists of a set of machine learning
methods that predict a continuous outcome variable (y) based on the value of one or
multiple predictor variables (x). OR
Regression analysis is a statistical method to model the relationship between a
dependent (target) and independent (predictor) variables with one or more independent
variables.
Regression is a supervised learning technique which helps in finding the correlation
between variables.
It is mainly used for prediction, forecasting, time series modelling, and determining the
causal- effect relationship between variables.
Regression shows a line or curve that passes through all the datapoints on target-
predictor graph in such a way that the vertical distance between the datapoints and the
regression line is minimum." The distance between datapoints and line tells whether a
model has captured a strong relationship or not.
• Function of regression analysis is given by: Y=f(x)
Here, y is called dependent variable and x is called independent variable.
Applications of Regression Analysis
1) Sales of a goods or services
2) Value of bonds in portfolio management
3) Premium on insurance companies
4) Yield of crop in agriculture
5) Prices of real estate
5.2 INTRODUCTION TO LINEARITY, CORRELATION AND CAUSATION
A correlation is the statistical summary of the relationship between two sets of
variables. It is a core part of data exploratory analysis, and is a critical aspect of
numerous advanced machine learning techniques. Correlation between two variables
can be found using a scatter plot.
There are different types of correlation:

• Positive Correlation: Two variables are said to be positively correlated when their
values move in the same direction. For example, in the image below, as the value
for X increases, sodoes the value for Y at a constant rate.
• Negative Correlation: Finally, variables X and Y will be negatively correlated
when their values change in opposite directions, so here as the value for X increases,
the value for Y decreases at a constant rate.
• Neutral Correlation: No relationship in the change of variables X and Y. In this
case, the values are completely random and do not show any sign of correlation,
as shown in the following image:
Causation
Causation is about relationship between two variables as x causes y. This is called x
implies b.Regression is different from causation. Causation indicates that one event is
the result of the occurrence of the other event; i.e. there is a causal relationship between
the two events.
Linear and Non-Linear Relationships
• The relationship between input features (variables) and the output (target) variable is
fundamental. These concepts have significant implications for the choice of algorithms,
model complexity, and predictive performance.
• Linear relationship creates a straight line when plotted on a graph, a Non-Linear
relationship does not create a straight line but instead creates a curve.
• Example:
Linear-the relationship between the hours spent studying and the grades obtained in a
class.
Non-Linear- GPS Signal

• Linearity:
Linear Relationship: A linear relationship between variables means that a change in
one variable is associated with a proportional change in another variable.
Mathematically, it can be represented as y = a * x + b, where y is the output, x is the
input, and a and b are constants.
Linear Models: Goal is to find the best-fitting line (plane in higher dimensions) to the
data points. Linear models are interpretable and work well when the relationship
between variablesis close to being linear.
Limitations: Linear models may perform poorly when the relationship between
variables is non-linear. In such cases, they may underfit the data, meaning they are too
simple to capture the underlying patterns.
• Non-Linearity:
Non-Linear Relationship: A non-linear relationship implies that the change in one
variable isnot proportional to the change in another variable. Non-linear relationships
can take various forms, such as quadratic, exponential, logarithmic, or arbitrary shapes.
Non-Linear Models: Machine learning models like decision trees, random forests,
support vector machines with non-linear kernels, and neural networks can capture non-
linear relationships. These models are more flexible and can fit complex data patterns.
Benefits: Non-linear models can perform well when the underlying relationships in the
data are complex or when interactions between variables are non-linear. They have the
capacity to capture intricate patterns.

Types of Regression
Linear Regression:
Single Independent Variable: Linear regression, also known as simple linear
regression, is used when there is a single independent variable (predictor) and one
dependent variable (target).
Purpose: Linear regression is used to establish a linear relationship between two
variables and make predictions based on this relationship. It's suitable for simple
scenarios where there's onlyone predictor.
Multiple Regression:
Multiple Independent Variables: Multiple regression, as the name suggests, is used
when there are two or more independent variables (predictors) and one dependent
variable (target).
Purpose: Multiple regression allows you to model the relationship between the
dependent variable and multiple predictors simultaneously. It is used when there are
multiple factors that may influence the target variable, and you want to understand their
combined effect and makepredictions based on all these factors.
Polynomial Regression:
Polynomial regression is an extension of multiple regression used when the relationship
between the independent and dependent variables is non-linear.
Logistic Regression:
Logistic regression is used when the dependent variable is binary (0 or 1). It models the
probability of the dependent variable belonging to a particular class.
Lasso Regression (L1 Regularization):
Lasso regression is used for feature selection and regularization. It penalizes the
absolutevalues of the coefficients, which encourages sparsity in the model.

Ridge Regression (L2 Regularization):

Ridge regression is used for regularization to prevent overfitting in multiple regression.
Itpenalizes the square of the coefficients.
Limitations of Regression
5.3 INTRODUCTION TO LINEAR REGRESSION

Linear regression model can be created by fitting a line among the scattered data points.
The line is of the form:
y= a0+a1*x+e
Ordinary Least Square Approach

The ordinary least squares (OLS) algorithm is a method for estimating the parameters
of a linear regression model. Aim: To find the values of the linear regression model's
parameters (i.e., the coefficients) that minimize the sum of the squared residuals.
In mathematical terms, this can be written as: Minimize ∑(yi – ŷi)^2
where yi is the actual value, ŷi is the predicted value.

A linear regression model used for determining the value of the response variable, ŷ,
can berepresented as the following equation.
y = b0 + b1x1 + b2x2 + … + bnxn + e
where: y - is the dependent variable, b0 is the intercept, e isthe error term
b1, b2, …, bn are the coefficients of the independentvariables x1, x2, …, xn
The coefficients b1, b2, …, bn can also be called the coefficientsof determination. The
goal of the OLS method can be used to estimate the unknown parameters (b1, b2, …,
bn) by minimizingthe sum of squared residuals (RSS). The sum of squared residualsis
also termed the sum of squared error (SSE).
This method is also known as the least-squares method for regression or linear
regression.Mathematically the line of equations for points are:
y1=(a0+a1x1)+e1
y2=(a0+a1x2)+e2 ……. yn=(a0+a1xn)+en.
In general ei=yi - (a0+a1x1)

Linear Regression Example

Linear Regression in Matrix Form

5.4 VALIDATION OF REGRESSION METHODS

The regression should be evaluated using some metrics for checking the correctness.
Thefollowing metrics are used to validate the results of regression.

Coefficient of Determination
The coefficient of determination (R² or r-squared) is a statistical measure in a regression

model that determines the proportion of variance in the dependent variable that can be
explained by the independent variable.
The sum of the squares of the differences between the y-value of the data pair and the
average of y is called total variation. Thus, the following variation can be defined as,
The explained variation is given by, =∑( Ŷi – mean(Yi))2 The unexplained variation is
given by, =∑( Yi - Ŷi )2
Thus, the total variation is equal to the explained variation and the unexplained
variation. The coefficient of determination r2 is the ratio of the explained and
unexplained variations.
Standard Error Estimate:

Standard Error Estimate is another useful measure of regression. It is standard
deviation of the observed values to the predicted values. This is given as:


Module 03

Uploaded by

Copyright:

Available Formats

Module 03

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Module 03

Uploaded by

Copyright:

Available Formats

21CS54 | AI & ML| SEARCH CREATORS.

Chapter: - 01 - Basics of Learning Theory

INTRODUCTION To LEARNING AND ITS TYPES

Search Creators... Page 1

Classical and Adaptive Machine Learning Systems

Search Creators... Page 2

1. Learn by memorization or learn by repetition also called as rote learning is done by

Search Creators... Page 3

INTRODUCTION TO COMPUTATION LEARNING THEORY

DESIGN OF A LEARNING SYSTEM

Search Creators... Page 4

Determine the Target Function

Determine the Target Function Representation

Choosing an Approximation Algorithm for the Target Function

Search Creators... Page 5

INTRODUCTION TO CONCEPT LEARNING

Concept learning requires three things:

Test — New instances to test the learned model.

Search Creators... Page 6

Search Creators... Page 7

Search Creators... Page 8

Heuristic Space Search

Generalization and Specialization

Generalization - Specific to General Learning This learning methodology will search

Search Creators... Page 10

Specialization - General to Specific Learning This learning methodology will search

Search Creators... Page 11

Search Creators... Page 12

Hypothesis Space Search by Find-S Algorithm

Search Creators... Page 13

Limitations of Find-S Algorithm

Search Creators... Page 14

Version Spaces and the Candidate Elimination Algorithm

Search Creators... Page 15

Generating Positive Hypothesis ‘S’

If it is a positive example, refine S to include the positive instance. We need to

Search Creators... Page 16

Generating Negative Hypothesis ‘G’

If it is a negative instance, refine G to exclude the, negative instance.

Generating Version Space - [Consistent Hypothesis]

Search Creators... Page 17

Search Creators... Page 18

Search Creators... Page 19

Chapter: - 02 - Similarity-based Learning

INTRODUCTION TO SIMILARITY OR INSTANCE-BASED

Search Creators... Page 20

Differences Between Instance- and Model-based Learning

The differences between Instance-based Learning and Model-based Learning

Some examples of Instance-based learning algorithms are:

Search Creators... Page 21

WEIGHTED K-NEAREST-NEIGHBOR ALGORITHM

Search Creators... Page 22

NEAREST CENTROID CLASSIFIER

Search Creators... Page 23

LOCALLY WEIGHTED REGRESSION (LWR)

Search Creators... Page 24

Chapter: - 03 – Regression Analysis

Regression analysis is a supervised learning method for predicting Continuous variables,

Search Creators... Page 25