Regression Notes
Regression
Linear regression is one of the most basic machine learning models. It is like the ‘hello world’
program of machine learning. Linear regression is used when there is a linear relationship
between the input variables and the output variable. That means we can calculate the output
variable as a linear combination of the input variables. If there is only one input
variable, we call it ‘Single Variable Linear Regression’ or ‘Univariate Linear Regression’.
If there is more than one input variable, we call it ‘Multi Variable Linear Regression’
or ‘Multivariate Linear Regression’.
Since there are only a few data points, we can easily eyeball them and draw the line of best fit,
which generalizes the relationship between the input and output variables for us.
Since this line generalizes the relationship between input and output values, to make a prediction
for a given input value we can simply plot the point on the line, and the Y coordinate of that
point will give us the predicted value.
Steps
To find the line of best fit for N points:
Step 1: For each (x, y) point, calculate x² and xy
Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy (Σ means "sum up")
Step 3: Calculate the slope m:
    m = (N Σxy − Σx Σy) / (N Σx² − (Σx)²)
Step 4: Calculate the intercept b:
    b = (Σy − m Σx) / N
Step 5: Assemble the equation of the line:
    y = mx + b
Example: Sam found how many hours of sunshine vs how many ice creams were sold at the shop
from Monday to Friday:
"x" "y"
Hours of Ice Creams
Sunshine Sold
2 4
3 5
5 7
7 10
9 15
Let us find the best m (slope) and b (y-intercept) that suit that data, y = mx + b.
    x    y    x²    xy
    2    4    4     8
    3    5    9     15
    5    7    25    35
    7    10   49    70
    9    15   81    135

    Σx = 26   Σy = 41   Σx² = 168   Σxy = 263
m = (N Σxy − Σx Σy) / (N Σx² − (Σx)²)
  = (5 × 263 − 26 × 41) / (5 × 168 − 26²)
  = (1315 − 1066) / (840 − 676)
  = 249 / 164
  = 1.5183...

b = (Σy − m Σx) / N
  = (41 − 1.5183 × 26) / 5
  = 0.3049...
y = mx + b
y = 1.518x + 0.305
Here are the (x,y) points and the line y = 1.518x + 0.305 on a graph:
Nice fit!
Sam hears the weather forecast which says "we expect 8 hours of sun tomorrow", so he uses the
above equation to estimate that he will sell y = 1.518 × 8 + 0.305 = 12.45, so about 12 or 13 ice creams.
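To check the arithmetic, here is a minimal Python sketch of the steps above (the variable names are my own):

    # Least-squares line of best fit for Sam's sunshine/ice-cream data
    xs = [2, 3, 5, 7, 9]
    ys = [4, 5, 7, 10, 15]

    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Steps 3 and 4: slope and intercept
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    print(m, b)          # 1.5183..., 0.3049...
    print(m * 8 + b)     # about 12.45 ice creams for 8 hours of sun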
Mean Absolute Error (MAE)
MAE is a very simple metric which calculates the absolute difference between actual and
predicted values. To better understand, take an example: you have input data and output data,
and you use Linear Regression, which draws a best-fit line.
Now you must find the MAE of your model, which is basically the mistake made by the model,
known as the error. Find the difference between the actual value and the predicted value; that is
an absolute error. But we must find the mean absolute error over the complete dataset, so we sum
all the errors and divide them by the total number of observations, and this is MAE. We aim for a
minimum MAE because this is a loss.
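In symbols, with y_i the actual value and ŷ_i the predicted value over N observations:

    MAE = (1/N) Σ |y_i − ŷ_i|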
Advantages of MAE
The MAE you get is in the same unit as the output variable.
It is robust to outliers.
Disadvantages of MAE
The graph of MAE is not differentiable everywhere, so we have to apply various optimizers,
like gradient descent, which require the loss to be differentiable.
Mean Squared Error (MSE)
MSE is a most used and very simple metric, with a little change from mean absolute error: it uses
the squared difference instead of the absolute difference. What does the MSE actually represent?
It represents the squared distance between actual and predicted values. We square the differences
to avoid the cancellation of negative terms, and this is the benefit of MSE.
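In the same notation:

    MSE = (1/N) Σ (y_i − ŷ_i)²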
Advantages of MSE
The graph of MSE is differentiable, so you can easily use it as a loss function.
Disadvantages of MSE
The value you get after calculating MSE is in the squared unit of the output. For example, if
the output variable is in meters (m), then after calculating MSE the value we get is in meters
squared.
If you have outliers in the dataset, then it penalizes the outliers most and the calculated
MSE is bigger. So, in short, it is not robust to outliers, which was an advantage of MAE.
Root Mean Squared Error (RMSE)
As is clear from the name itself, RMSE is simply the square root of the mean squared error.
Advantages of RMSE
The output value you get is in the same unit as the required output variable which makes
interpretation of loss easy.
Disadvantages of RMSE
It is not as robust to outliers as MAE.
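As a quick illustration, here is a minimal sketch computing all three metrics for Sam's data and
the fitted line y = 1.518x + 0.305 from earlier:

    import math

    xs = [2, 3, 5, 7, 9]
    ys = [4, 5, 7, 10, 15]
    preds = [1.518 * x + 0.305 for x in xs]   # model predictions

    n = len(ys)
    mae = sum(abs(y - p) for y, p in zip(ys, preds)) / n
    mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / n
    rmse = math.sqrt(mse)
    print(mae, mse, rmse)   # RMSE is back in the original units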
R Squared (R2)
R2 score is a metric that tells the performance of your model, not the loss in an absolute sense;
it tells how well your model performed. In contrast, MAE and MSE depend on the context, whereas
the R2 score is independent of it. So, with the help of R squared we have a baseline model to
compare against, which none of the other metrics provides. It is similar to what we have in
classification problems, where we call it a threshold, which is fixed at 0.5. So basically R2
calculates how much better the regression line is than a mean line.
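For reference, the usual formula compares the model's squared error to that of the mean-line
baseline (ȳ is the mean of the actual values):

    R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²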
Gradient Descent
1. Initially, let m = 0 and c = 0. Let L be the learning rate, controlling how much the value of
   m changes with each step.
2. Calculate the partial derivative of the loss function (MSE) with respect to m:
   D_m = (−2/n) Σ x_i (y_i − ŷ_i)
3. Similarly, the partial derivative of the loss function with respect to c:
   D_c = (−2/n) Σ (y_i − ŷ_i)
4. Now, update the current values of m and c:
   m = m − L × D_m
   c = c − L × D_c
5. We repeat this process until the loss function is very small, i.e. ideally 0% error (100% accuracy).
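A minimal sketch of these steps in Python, using Sam's data from earlier (the learning rate and
iteration count are illustrative choices):

    xs = [2, 3, 5, 7, 9]
    ys = [4, 5, 7, 10, 15]

    m, c = 0.0, 0.0   # step 1: start from zero
    L = 0.01          # learning rate (assumed)
    n = len(xs)

    for _ in range(10000):
        preds = [m * x + c for x in xs]
        # steps 2 and 3: partial derivatives of MSE w.r.t. m and c
        d_m = (-2 / n) * sum(x * (y - p) for x, y, p in zip(xs, ys, preds))
        d_c = (-2 / n) * sum(y - p for y, p in zip(ys, preds))
        # step 4: update
        m -= L * d_m
        c -= L * d_c

    print(m, c)   # approaches the least-squares fit (about 1.518 and 0.305)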
Multivariate Regression
Multivariate Regression is a supervised machine learning algorithm involving multiple data
variables for analysis. Multivariate regression is an extension of multiple regression with
one dependent variable and multiple independent variables. Based on the values of the
independent variables, we try to predict the output.
Multivariate regression tries to find a formula that can explain how factors (variables)
respond simultaneously to changes in others.
There are numerous areas where multivariate regression can be used. Let’s look at some
examples to understand multivariate regression better.
1. Paula wants to estimate the price of a house. She will collect details such as the
location of the house, the number of bedrooms, the size in square feet, and whether
amenities are available or not. Based on these details, the price of the house can be
predicted, along with how each variable is interrelated.
2. An agricultural scientist wants to predict the total crop yield expected for the
summer. He collected details of the expected amount of rainfall, the fertilizers to be
used, and the soil conditions. By building a multivariate regression model, the scientist
can predict his crop yield. Along with the crop yield, the scientist also tries to understand
the relationship among the variables.
3. If an organization wants to know how much it has to pay a new hire, it will
take into account many details such as education level, years of experience, job
location, and whether the candidate has a niche skill or not. Based on this information,
the salary of an employee can be predicted, along with how these variables help in
estimating the salary.
4. Economists can use Multivariate regression to predict the GDP growth of a state or
a country based on parameters like total amount spent by consumers, import
expenditure, total gains from exports, total savings, etc.
5. A company wants to predict the electricity bill of an apartment; the details needed
here are the number of flats, the number of appliances in use, the number of
people at home, etc. With the help of these variables, the electricity bill can be
predicted.
Mathematical Equation
The simple linear regression model represents a straight line, meaning y is a function of x.
When we have an extra dimension (z), the straight line becomes a plane.
Here, the plane is the function that expresses y as a function of x and z. The linear regression
equation can now be expressed as:
y = m1·x + m2·z + c
y is the dependent variable, that is, the variable that needs to be predicted.
x is the first independent variable. It is the first input.
m1 is the slope of x. It lets us know the angle of the line (x).
z is the second independent variable. It is the second input.
m2 is the slope of z. It helps us to know the angle of the line (z).
c is the intercept, a constant that gives the value of y when both x and z are 0.
The equation for a model with two input variables can be written as:
y = β0 + β1·x1 + β2·x2
What if there are three input variables? Humans can visualize only three dimensions, but in
the machine learning world there can be any number of dimensions. The equation for a model
with three input variables can be written as:
y = β0 + β1·x1 + β2·x2 + β3·x3
Below is the generalized equation for the multivariate regression model-
y = β0 + β1·x1 + β2·x2 + … + βn·xn
Where n represents the number of independent variables, β0 to βn represent the coefficients,
and x1 to xn are the independent variables.
The multivariate model helps us in understanding and comparing coefficients across the
output. Here, a smaller cost function makes multivariate linear regression a better model.
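As an illustration, here is a minimal sketch that fits y = β0 + β1·x1 + β2·x2 with NumPy's
least-squares solver (the data below is made up):

    import numpy as np

    # two input variables per row: x1, x2
    X = np.array([[2.0, 1.0],
                  [3.0, 2.0],
                  [5.0, 2.0],
                  [7.0, 3.0],
                  [9.0, 5.0]])
    y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])

    # prepend a column of ones so the intercept β0 is learned too
    X1 = np.hstack([np.ones((len(X), 1)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print(beta)   # [β0, β1, β2]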
Cost Function
The cost function is a function that assigns a cost to samples when the model differs from the
observed data. Here, the cost is the sum of squared errors: the sum of the squares of the
differences between the predicted value and the actual value, divided by twice the length of
the dataset. A smaller mean squared error implies better performance.
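Written out, with ŷ_i the predicted and y_i the actual value over N samples, the cost described
above is:

    J = (1/2N) Σ (ŷ_i − y_i)²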
The steps involved in multivariate regression analysis are feature selection and feature
engineering, normalizing the features, selecting the loss function and hypothesis, setting the
hypothesis parameters, minimizing the loss function, testing the hypothesis, and generating
the regression model.
Feature Selection
Feature selection is the choice of the input variables that actually help in predicting the
output; irrelevant variables should be left out before fitting the model.

Normalizing Features
We need to scale the features, as scaling maintains the general distribution and ratios in the
data. This leads to a more efficient analysis. The value of each feature can also be changed.
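For example, here is a minimal min-max scaling sketch, one common way to normalize (the feature
matrix below is made up):

    import numpy as np

    X = np.array([[2.0, 100.0],
                  [3.0, 250.0],
                  [5.0, 180.0],
                  [7.0, 300.0]])

    # rescale each feature (column) to the range [0, 1]
    X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    print(X_scaled)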
Selecting the Loss Function and Hypothesis
The loss function signals whenever there is an error, that is, when the hypothesis prediction
deviates from the actual values. Here, the hypothesis is the value predicted from the
features/variables.
Setting Hypothesis Parameters
The hypothesis parameters need to be set in such a way that they reduce the loss function and
predict well.
Minimizing the Loss Function
The loss function needs to be minimized by running a loss-minimization algorithm on the
dataset, which helps in adjusting the hypothesis parameters. Once the loss is minimized, the
model can be used for further action. Gradient descent is one of the algorithms commonly used
for loss minimization.
Testing the Hypothesis
The hypothesis function needs to be checked as well, since it is what predicts the values. Once
this is done, it has to be tested on test data.
Polynomial Regression
o It is also called a special case of Multiple Linear Regression in ML, because we add
some polynomial terms to the Multiple Linear Regression equation to convert it into
Polynomial Regression.
o It is a linear model with some modifications made in order to increase the accuracy.
o The dataset used in Polynomial Regression for training is of a non-linear nature.
o It makes use of a linear regression model to fit complicated, non-linear functions
and datasets.
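As a small illustration, adding a squared term turns the linear model y = b0 + b1·x into the
polynomial model y = b0 + b1·x + b2·x², which can still be fit with ordinary least squares (the
data below is made up):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 4.1, 9.3, 15.8, 25.1])   # roughly quadratic

    # design matrix with columns [1, x, x^2]
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)   # [b0, b1, b2]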
Regularization Techniques
As we move towards the right in this image, our model tries to learn the details and the noise
in the training data too well, which ultimately results in poor performance on unseen data. In
other words, while going towards the right, the complexity of the model increases such that the
training error reduces, but the testing error does not.
Let’s consider a neural network which is overfitting on the training data as shown in the image
below.
Assume that our regularization coefficient is so high that some of the weight matrices are nearly
equal to zero.
This will result in a much simpler linear network and slight underfitting of the training data.
Such a large value of the regularization coefficient is not that useful. We need to optimize the
value of the regularization coefficient in order to obtain a well-fitted model, as shown in the
image below.
L2 & L1 regularization
L1 and L2 are the most common types of regularization. These update the general cost function
by adding another term known as the regularization term.
Due to the addition of this regularization term, the values of the weight matrices decrease,
because it assumes that a neural network with smaller weight matrices leads to simpler models.
Therefore, it will also reduce overfitting to quite an extent.
In L2, we have:

    Cost function = Loss + (λ / 2N) Σ ‖w‖²

where N is the number of training examples and the sum runs over the weights w. Here, lambda (λ)
is the regularization parameter. It is the hyperparameter whose value is optimized for better
results. L2 regularization is also known as weight decay, as it forces the weights to decay
towards zero (although not exactly zero).
In L1, we have:

    Cost function = Loss + (λ / 2N) Σ ‖w‖

In this, we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to
zero here. Hence, it is very useful when we are trying to compress our model; otherwise, we
usually prefer L2 over it.
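A minimal sketch of the two penalized cost functions, assuming mean squared error as the base
loss and a weight vector w:

    import numpy as np

    def l2_cost(y_true, y_pred, w, lam):
        # ridge-style L2: penalize the squared weights
        return np.mean((y_true - y_pred) ** 2) + lam * np.sum(w ** 2)

    def l1_cost(y_true, y_pred, w, lam):
        # lasso-style L1: penalize the absolute weights
        return np.mean((y_true - y_pred) ** 2) + lam * np.sum(np.abs(w))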
Dropout
This is one of the most interesting types of regularization techniques. It also produces very
good results and is consequently the most frequently used regularization technique in the field of
deep learning.
To understand dropout, let’s say our neural network structure is akin to the one shown below:
So what does dropout do? At every iteration, it randomly selects some nodes and removes them
along with all of their incoming and outgoing connections as shown below.
So each iteration has a different set of nodes, and this results in a different set of outputs.
It can also be seen as an ensemble technique in machine learning.
Ensemble models usually perform better than a single model, as they capture more randomness.
Similarly, dropout also performs better than a normal neural network model.
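To make the mechanism concrete, here is a minimal sketch of (inverted) dropout applied to one
layer's activations; the keep probability of 0.8 is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.normal(size=(4, 5))             # activations from some layer
    keep_prob = 0.8
    mask = rng.random(a.shape) < keep_prob  # randomly drop ~20% of nodes
    a_dropped = a * mask / keep_prob        # rescale so the expected activation is unchanged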
Overfitting and Underfitting
Consider that we are designing a machine learning model. A model is said to be a good
machine-learning model if it generalizes any new input data from the problem domain in a
proper way. This helps us make predictions on future data that the model has never seen.
Whenever we work on a data set to predict or classify a problem, we tend to find the accuracy by
implementing the model first on the train set, then on the test set. If the accuracy is
satisfactory, we tend to increase the accuracy of the predictions either by increasing or
decreasing the data features, by feature selection, or by applying feature engineering to our
machine-learning model. But sometimes our model may give poor results. This can be explained by
overfitting and underfitting, which are majorly responsible for the poor performance of machine
learning algorithms.
Ideally, we want the model to capture the underlying trend of the data. We want the model to
learn from the training data, but we don't want it to learn too much (i.e. too many patterns).
One solution could be to stop the training earlier. However, this could lead the model to not
learn enough patterns from the training data, and possibly not even capture the dominant trend.
Underfitting (i.e. high bias) is just as bad for the generalization of the model as overfitting.
In high bias, the model might not have enough flexibility in terms of line fitting, resulting in
a simplistic line that does not generalize well.
When we run our training algorithm on the data set, we allow the overall cost (i.e. the distance
from each point to the line) to become smaller with more iterations. Leaving this training
algorithm to run for long leads to minimal overall cost. However, this means that the line will
be fit to all the points (including noise), catching secondary patterns that may not be needed
for the generalization of the model; this is overfitting (i.e. high variance).