Unit I
Supervised learning is a type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, the machines predict the output. Labelled data means that the input data is already tagged with the correct output.
In supervised learning, the training data provided to the machine works as a supervisor that teaches the machine to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.
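To make the idea of labelled data concrete, here is a minimal sketch of supervised learning, assuming scikit-learn is available. The tiny spam-filtering example and its two features are hypothetical; the point is only that every input is tagged with the correct output, and the model learns the mapping from x to y.

# A minimal sketch of supervised learning, assuming scikit-learn is available.
# The spam-filtering data below is hypothetical: each input (x) is tagged with
# the correct output (y), and the labelled data supervises the learning.
from sklearn.linear_model import LogisticRegression

# Features per email: [number of links, number of suspicious words] (hypothetical)
X_train = [[0, 1], [1, 0], [5, 8], [7, 6], [0, 0], [6, 9]]
# Correct labels: 0 = not spam, 1 = spam
y_train = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)        # train on the labelled data
print(model.predict([[4, 7]]))     # predict the output for a new, unseen email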
The task of a Regression algorithm is to find the mapping function that maps the input variable (x) to the continuous output variable (y).
Example: Suppose we want to do weather forecasting; for this, we would use a Regression algorithm. In weather prediction, the model is trained on past data, and once training is completed, it can easily predict the weather for future days.
Types of regression:
There are various types of regression used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all regression methods analyze the effect of the independent variables on the dependent variable. Here we discuss some important types of regression, which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
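Linear regression models the relationship between the input variable (x) and the continuous output variable (y) with a straight line. A minimal sketch, assuming scikit-learn is available (the data points are hypothetical):

# A minimal sketch of simple linear regression, assuming scikit-learn is available.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])    # independent variable (x), hypothetical
y = np.array([3, 5, 7, 9, 11])             # dependent variable (y), here y = 2x + 1

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)    # slope and intercept learned from the data
print(model.predict([[6]]))                # predicted y for a new x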
Polynomial regression:
Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial:
y = b0 + b1x1 + b2x1^2 + b3x1^3 + ... + bnx1^n
o It is also called a special case of Multiple Linear Regression in ML, because we add some polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.
o It is a linear model with some modifications in order to increase the accuracy.
o The dataset used in Polynomial Regression for training is of a non-linear nature.
o It makes use of a linear regression model to fit complicated, non-linear functions and datasets.
o Hence, "In Polynomial Regression, the original features are converted into polynomial features of the required degree (2, 3, ..., n) and then modelled using a linear model" (see the sketch after this list).
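As a minimal sketch of the quoted idea, assuming scikit-learn is available, the original feature can be expanded into polynomial features and then fitted with an ordinary linear model (the data is hypothetical):

# A minimal sketch of polynomial regression, assuming scikit-learn is available.
# The single feature x is converted into polynomial features of degree 2 and then
# modelled using a linear model, as described above.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.array([[1], [2], [3], [4], [5]])    # independent variable (hypothetical)
y = np.array([2, 5, 10, 17, 26])           # non-linear target, here y = x^2 + 1

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[6]]))           # approximately 37 for this toy data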
In the previous topic, we learned about Simple Linear Regression, where a single independent/predictor variable (X) is used to model the response variable (Y). But there may be various cases in which the response variable is affected by more than one predictor variable; for such cases, the Multiple Linear Regression algorithm is used.
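A minimal sketch of Multiple Linear Regression, assuming scikit-learn is available; the two predictor variables and the response values are hypothetical:

# A minimal sketch of multiple linear regression, assuming scikit-learn is available.
# Here the response variable is affected by two predictor variables (hypothetical).
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: [advertising spend, number of stores] (hypothetical predictors)
X = np.array([[10, 1], [20, 1], [20, 2], [30, 2], [40, 3]])
y = np.array([25, 45, 55, 75, 105])        # hypothetical response (sales)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)       # one coefficient per predictor
print(model.predict([[35, 3]]))            # prediction that uses both predictors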
Overfitting and Underfitting are the two main problems that occur in machine learning
and degrade the performance of the machine learning models.
o Signal: It refers to the true underlying pattern of the data that helps the
machine learning model to learn from the data.
o Noise: Noise is unnecessary and irrelevant data that reduces the performance of
the model.
o Bias: Bias is a prediction error that is introduced in the model due to oversimplifying the machine learning algorithm; it is the difference between the predicted values and the actual values.
o Variance: If the machine learning model performs well on the training dataset but does not perform well on the test dataset, then variance occurs.
Overfitting:
Overfitting occurs when our machine learning model tries to cover all the data points, or more data points than required, in the given dataset. Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and all these factors reduce the efficiency and accuracy of the model. An overfitted model has low bias and high variance.
Example: The concept of overfitting can be understood from the below graph of a linear regression output:
How to avoid Overfitting in the Model
Both overfitting and underfitting cause degraded performance of the machine learning model. But the main cause is overfitting, so there are some ways by which we can reduce the occurrence of overfitting in our model (a cross-validation sketch follows the list):
o Cross-Validation
o Training with more data
o Removing features
o Early stopping the training
o Regularization
o Ensembling
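Cross-validation, the first item in the list above, can be sketched as follows, assuming scikit-learn is available and using hypothetical data. The model is evaluated on folds of data it was not trained on, so a large gap between training and validation scores is a sign of overfitting.

# A minimal sketch of cross-validation, assuming scikit-learn is available.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))            # hypothetical inputs
y = 2 * X.ravel() + 1 + rng.normal(0, 1, 50)    # linear signal plus noise

scores = cross_val_score(LinearRegression(), X, y, cv=5)  # 5-fold validation scores (R^2)
print(scores.mean())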
Underfitting
Underfitting occurs when our machine learning model is not able to capture the underlying trend of the data. To avoid overfitting, the feeding of training data can be stopped at an early stage, due to which the model may not learn enough from the training data. As a result, it may fail to find the best fit of the dominant trend in the data.
In the case of underfitting, the model is not able to learn enough from the training data, and hence it has reduced accuracy and produces unreliable predictions.
An underfitted model has high bias and low variance.
Example: We can understand underfitting using the below output of a linear regression model:
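A concrete way to see the contrast between underfitting and overfitting is to vary the model complexity. The sketch below, assuming scikit-learn is available and using hypothetical noisy data, fits a degree-1 polynomial (too simple, underfits) and a degree-15 polynomial (too complex, overfits) to the same points and compares their training and test errors.

# A rough illustration of underfitting vs. overfitting, assuming scikit-learn is available.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(40, 1))
y = np.sin(2 * X.ravel()) + rng.normal(0, 0.1, 40)   # non-linear signal plus noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # test error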
To understand the least-squares regression method, let's get familiar with the concepts involved in formulating the line of best fit.
Line of Best Fit:
The line of best fit is drawn to represent the relationship between two or more variables. To be more specific, the best fit line is drawn across a scatter plot of data points in order to represent the relationship between those data points.
If we were to plot the best fit line that depicts the sales of a company over a period of time, it would look something like this:
Notice that the line is as close as possible to all the scattered data points. This is what an ideal best fit line looks like.
To start constructing the line that best depicts the relationship between variables in the data, we first need to get our basics right. Take a look at the equation below:
y = mx + c
Surely, you've come across this equation before. It is a simple equation that represents a straight line in two-dimensional data, i.e. along the x-axis and y-axis. To better understand this, let's break down the equation:
• y: dependent variable
• m: the slope of the line
• x: independent variable
• c: y-intercept
So the aim is to calculate the values of the slope and the y-intercept, and then substitute the corresponding 'x' values into the equation in order to derive the value of the dependent variable.
Let’s see how this can be done.
Step 1: Compute the slope of the line (the change in y for a unit change in x):
m = Σ((x - x̄)(y - ȳ)) / Σ((x - x̄)²), where x̄ and ȳ are the means of the x and y values.
Step 2: Compute the y-intercept (the value of y at the point where the line crosses the y-axis):
c = ȳ - m·x̄
Now let’s look at an example and see how you can use the least-squares regression
method to compute the line of best fit.
Consider an example. Tom, who is the owner of a retail shop, recorded the price of different T-shirts versus the number of T-shirts sold at his shop over a period of one week.
Once you substitute the values, it should look something like this:
Let's construct a graph that represents the y = mx + c line of best fit:
Now Tom can use the above equation to estimate how many T-shirts priced at $8 he can sell at the retail shop.
This comes down to 13 T-shirts! That’s how simple it is to make predictions using
Linear Regression.
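The same calculation can be written out directly. The sketch below, assuming numpy is available, uses hypothetical price/sales numbers (stand-ins for a table like Tom's, chosen so that the prediction at $8 comes out to the 13 T-shirts mentioned above) and applies the Step 1 and Step 2 formulas.

# A minimal sketch of the least-squares calculation, assuming numpy is available.
# The price/sales numbers are hypothetical stand-ins for a table like Tom's.
import numpy as np

x = np.array([2, 4, 6, 8, 10])        # price of a T-shirt (hypothetical)
y = np.array([22, 19, 16, 13, 10])    # number of T-shirts sold (hypothetical)

# Step 1: slope m = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Step 2: y-intercept c = y_mean - m * x_mean
c = y.mean() - m * x.mean()

print(m, c)          # slope and y-intercept of the line of best fit
print(m * 8 + c)     # predicted number of T-shirts sold at a price of $8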
Now let's try to understand on what basis we can confirm that the above line is indeed the line of best fit.
The least-squares regression method works by making the sum of the squares of the errors as small as possible, hence the name least squares. Basically, the distance between the line of best fit and each data point (the error) must be minimized as much as possible. This is the basic idea behind the least-squares regression method.
A few things to keep in mind before implementing the least-squares regression method are:
• The data must be free of outliers, because outliers can lead to a biased and misleading line of best fit.
• The line of best fit can be computed iteratively until you get the line with the minimum possible sum of squared errors.
• On its own, the method fits a straight line; non-linear data can be handled by first transforming the features, as in Polynomial Regression discussed earlier.
• Technically, the difference between the actual value of 'y' and the predicted value of 'y' is called the residual (it denotes the error).
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a small amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty. We can compute this penalty term by multiplying lambda by the squared weight of each individual feature.
o The equation (cost function) for ridge regression will be:
Cost = Σ(yi - ŷi)² + λ Σ bj²
where the first term is the usual sum of squared errors and the second term is the lambda-weighted penalty on the squared coefficients bj.
Lasso Regression:
o Lasso regression is similar to ridge regression, but its penalty term uses the absolute values of the weights instead of their squares (an L1 penalty multiplied by lambda).
o Because of this, lasso regression can shrink some coefficients to exactly zero, so it can also be used for feature selection (a code sketch for both ridge and lasso follows).
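As a minimal sketch of both penalties, assuming scikit-learn is available (the data is hypothetical and the alpha values are arbitrary stand-ins for lambda):

# A minimal sketch of ridge and lasso regression, assuming scikit-learn is available.
# alpha plays the role of lambda: it scales the penalty added to the cost function.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                # hypothetical features
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 100)    # only two features matter

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights towards zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can set some weights exactly to zero

print(ridge.coef_)
print(lasso.coef_)                   # typically some coefficients are driven to 0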
Below are some important assumptions of Linear Regression. These are formal checks to make while building a Linear Regression model, and they help ensure the best possible result from the given dataset.
o Linear relationship between the features and target:
Linear regression assumes a linear relationship between the dependent and independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means a high correlation between the independent variables. If the features are highly correlated, it becomes difficult to determine the true effect of each individual predictor on the target, so linear regression assumes little or no multicollinearity in the data.
o Homoscedasticity Assumption:
Homoscedasticity is a situation in which the variance of the error term is the same for all values of the independent variables. With homoscedasticity, there should be no clear pattern in the distribution of the data in the scatter plot.
o Normal distribution of error terms:
Linear regression assumes that the error terms should follow a normal distribution. If the error terms are not normally distributed, then the confidence intervals will become either too wide or too narrow, which may cause difficulties in finding the coefficients.
This can be checked using a q-q plot: if the plot shows a straight line without major deviation, it means the errors are normally distributed (a short code sketch of this check is given after this list).
o No autocorrelations:
The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
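The normality check described above can be sketched in code. The example below, assuming numpy, scipy, scikit-learn and matplotlib are available and using hypothetical data, fits a linear model, computes the residuals (error terms), and draws a q-q plot of them; a roughly straight line suggests normally distributed errors.

# A minimal sketch of checking the normality of the error terms with a q-q plot.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))            # hypothetical inputs
y = 2 * X.ravel() + 1 + rng.normal(0, 1, 100)    # linear signal plus normal noise

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)                 # error terms (residuals)

stats.probplot(residuals, plot=plt)              # q-q plot against a normal distribution
plt.show()                                       # a straight line suggests normal errors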