DA Unit-3
Regression
Regression Concepts:
Regression analysis is a form of predictive modelling technique which investigates
the relationship between a dependent (target) variable and independent variable(s)
(predictors). This technique is used for forecasting, time series modelling and finding
the causal effect relationship between variables. For example, the relationship
between rash driving and the number of road accidents by a driver is best studied
through regression.
• Dependent-Target Variable, e.g: test score
• Independent Variable- Predictive Variable or Explanatory Variable,
e.g : age
Regression analysis estimates the relationship between two or more variables. Let’s
understand this with an easy example:
Let’s say, you want to estimate growth in sales of a company based on current
economic conditions. You have the recent company data which indicates that the
growth in sales is around two and a half times the growth in the economy. Using this
insight, we can predict future sales of the company based on current & past
information.
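The 2.5× relationship in the sales example can be written as a one-line predictor. This is a hypothetical sketch: the function name is illustrative, and the ratio 2.5 is the historical relationship described above (in practice it would itself be estimated by regression):

```python
def predicted_sales_growth(economic_growth, ratio=2.5):
    """Forecast sales growth as a fixed multiple of economic growth.

    The ratio of 2.5 is the historical relationship described in the text;
    in practice it would be estimated from past data by regression.
    """
    return ratio * economic_growth

# e.g. if the economy is expected to grow 4%, forecast sales growth of 10%
forecast = predicted_sales_growth(0.04)
```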
There are multiple types of regression techniques. Some commonly used ones are as follows:
• Linear Regression
• Logistic Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
1. Linear Regression
It is one of the most widely known modeling techniques. Linear regression is usually
among the first few topics which people pick while learning predictive modeling. In
this technique, the dependent variable is continuous, the independent variable(s) can be
continuous or discrete, and the nature of the regression line is linear.
The relationship between the two variables can be of three types. They are
(i) Positive linear relationship
(ii) Negative linear relationship
(iii) No Relationship
The simple linear regression line is given by y = a + bx,
Where
• y is the dependent variable
• x is the independent variable
• b is the slope --> how much the line rises for each unit increase in x
• a is the y-intercept --> the value of y when x = 0.
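The slope b and intercept a can be estimated from data with the standard least-squares formulas. A minimal pure-Python sketch (the age/score data below is hypothetical, echoing the age and test-score example earlier):

```python
def fit_line(x, y):
    """Least-squares fit of y = a + b*x.

    b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    a = mean_y - b * mean_x
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    return a, b

# hypothetical data: age (x) vs. test score (y), perfectly linear
ages = [5, 6, 7, 8, 9]
scores = [40, 45, 50, 55, 60]
a, b = fit_line(ages, scores)   # recovers intercept 15, slope 5
```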
Logistic Regression
Logistic Regression is used to solve classification problems, so it is called a
classification algorithm: it models the probability of the output class.
• It is used for classification problems where the target element is categorical.
• Unlike in Linear Regression, in Logistic Regression the output required is represented in
discrete values like binary 0 and 1.
• It estimates the relationship between a dependent variable (target) and one or more
independent variables (predictors) where the dependent variable is categorical/nominal.
• Logistic regression is a supervised learning classification algorithm used to predict
the probability of a dependent variable.
• The nature of the target or dependent variable is dichotomous (binary), which means
there would be only two possible classes.
• In simple words, the dependent variable is binary in nature having data coded as
either 1 (stands for success/yes) or 0 (stands for failure/no), etc. but instead of giving
the exact value as 0 and 1, it gives the probabilistic values which lie between 0
and 1.
• Logistic Regression is very similar to Linear Regression except in how they
are used. Linear Regression is used for solving regression problems,
whereas Logistic Regression is used for solving classification problems.
• In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, which predicts two maximum values (0 or 1).
Sigmoid Function:
• It is the logistic function used in Logistic Regression.
• The sigmoid function squashes the straight regression line into a curve whose
outputs can be thresholded into discrete values like binary 0 and 1.
• It shows how the continuous output of linear regression can be
converted into a classifier.
• The sigmoid function is a mathematical function used to map the predicted values
to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The value of the logistic regression must be between 0 and 1, which cannot go
beyond this limit, so it forms a curve like the "S" form. The S-form curve is called
the Sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which decides
between the classes 0 and 1: values above the threshold tend to class 1, and
values below the threshold tend to class 0.
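The sigmoid mapping and the threshold rule above can be sketched in a few lines (the threshold of 0.5 is the common default, an assumption rather than something fixed by the text):

```python
import math

def sigmoid(z):
    # maps any real value into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    # values above the threshold tend to class 1, below it to class 0
    return 1 if sigmoid(z) >= threshold else 0
```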
P = 1 / (1 + e^(-Y))
Where,
P represents the probability of the output class and Y represents the predicted output.
Example
Class (0/1)   Input value x
0             4.2
0             5.1
0             5.5
1             8.2
1             9.0
1             9.9
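A logistic model for the small example above can be fitted by gradient descent on the log-loss. The sketch below is a minimal from-scratch implementation (the learning rate and iteration count are arbitrary choices, not values from the text):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.05, iters=20000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient descent on log-loss."""
    w = b = 0.0
    n = len(x)
    for _ in range(iters):
        # prediction errors (p - y) drive the gradient updates
        errs = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)]
        w -= lr * sum(e * xi for e, xi in zip(errs, x)) / n
        b -= lr * sum(errs) / n
    return w, b

# the example data: class 0 for small x, class 1 for large x
xs = [4.2, 5.1, 5.5, 8.2, 9.0, 9.9]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
prob = lambda x: sigmoid(w * x + b)   # probability of class 1
```

Note that the model outputs probabilities between 0 and 1; the 0.5 threshold then recovers the discrete class labels.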
Polynomial Regression
o Polynomial Regression is a type of regression which models the non-linear
dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between
the value of x and corresponding conditional values of y.
o Suppose there is a dataset whose datapoints are arranged in a
non-linear fashion; in such a case, linear regression will not fit those
datapoints well. To cover such datapoints, we need Polynomial regression.
o In Polynomial regression, the original features are transformed into
polynomial features of a given degree and then modeled using a linear
model, which means the datapoints are best fitted using a polynomial curve.
o The equation for polynomial regression is also derived from the linear regression
equation: the linear equation Y = b0 + b1x is transformed
into the polynomial equation Y = b0 + b1x + b2x² + b3x³ + … + bnxⁿ.
o Here Y is the predicted/target output, b0, b1, …, bn are the regression
coefficients, and x is our independent/input variable.
o The model is still linear because it is linear in the coefficients, even though
the features (x, x², x³, …) are non-linear.
When we compare the above equations, we can clearly see that all of them
are polynomial equations that differ only in the degree of the variables.
The Simple and Multiple Linear equations are polynomial equations of
degree one, and the Polynomial regression equation is a linear equation
with terms up to the nth degree.
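Fitting Y = b0 + b1x + b2x² reduces to ordinary linear regression on the expanded features (1, x, x²). The sketch below solves the resulting normal equations with a small Gaussian elimination in pure Python (the data is synthetic, generated from a known quadratic):

```python
def poly_features(x, degree):
    # expand each x into [1, x, x^2, ..., x^degree]
    return [[xi ** d for d in range(degree + 1)] for xi in x]

def solve(A, v):
    # Gaussian elimination with partial pivoting for a small square system
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (M[i][n] - s) / M[i][i]
    return coeffs

def fit_polynomial(x, y, degree):
    """Least squares via the normal equations (X^T X) b = X^T y."""
    X = poly_features(x, degree)
    n = degree + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    return solve(XtX, Xty)

# synthetic data generated from Y = 2 + 3x + x^2
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 6.0, 12.0, 20.0]
b0, b1, b2 = fit_polynomial(xs, ys, 2)   # recovers (2, 3, 1) up to rounding
```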
Stepwise Regression
• This form of regression is used when we deal with multiple independent
variables. In this technique, the selection of independent variables is done
with the help of an automatic process, which involves no human intervention.
• Stepwise regression basically fits the regression model by adding/dropping
co-variates one at a time based on a specified criterion. Some of the most
commonly used Stepwise regression methods are listed below:
Standard stepwise regression does two things: it adds and removes predictors
as needed at each step.
➢ Forward selection starts with the most significant predictor in the model and
adds a variable at each step.
➢ Backward elimination starts with all predictors in the model and removes
the least significant variable at each step.
➢ The aim of this modeling technique is to maximize prediction power with
the minimum number of predictor variables. It is one of the methods to handle
higher dimensionality of a data set.
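Forward selection can be sketched greedily: at each step, fit each remaining candidate against the current residuals and keep the one that reduces the residual sum of squares the most. This is a simplification of full stepwise regression (real implementations refit the whole model and use significance tests such as F-statistics); the feature names and data below are illustrative:

```python
def fit_simple(x, y):
    # least-squares slope/intercept for a single predictor
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def forward_select(features, y, n_keep):
    """Greedily add the feature that best explains the current residuals."""
    residual = list(y)
    chosen = []
    remaining = dict(features)
    for _ in range(n_keep):
        best_name, best_rss, best_fit = None, float("inf"), None
        for name, x in remaining.items():
            a, b = fit_simple(x, residual)
            rss = sum((r - (a + b * xi)) ** 2 for xi, r in zip(x, residual))
            if rss < best_rss:
                best_name, best_rss, best_fit = name, rss, (a, b, x)
        a, b, x = best_fit
        residual = [r - (a + b * xi) for xi, r in zip(x, residual)]
        chosen.append(best_name)
        del remaining[best_name]
    return chosen

# y depends only on x1, so forward selection should pick x1 first
features = {"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]}
y = [3, 6, 9, 12, 15, 18]
order = forward_select(features, y, n_keep=2)
```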
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in
which a small amount of bias is introduced so that we can get better long-term
predictions.
o The amount of bias added to the model is known as the Ridge Regression
penalty. We compute this penalty term by multiplying lambda by the sum of
the squared weights of the individual features.
o The equation (cost function) for ridge regression will be:
Cost = Σ(yᵢ − ŷᵢ)² + λ·Σ bⱼ²
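For a single feature the ridge solution has a simple closed form: the ordinary least-squares slope is shrunk by adding λ to its denominator (leaving the intercept unpenalized, a common convention). A minimal sketch with illustrative data:

```python
def fit_ridge(x, y, lam):
    """Single-feature ridge: slope = Sxy / (Sxx + lambda).

    lam = 0 recovers ordinary least squares; a larger lam shrinks the
    slope towards zero (the "small amount of bias" described in the text).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)
    a = my - b * mx            # intercept left unpenalized
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
_, b_ols = fit_ridge(xs, ys, lam=0.0)    # ordinary slope: 2.0
_, b_ridge = fit_ridge(xs, ys, lam=1.0)  # shrunk slope, between 0 and 2.0
```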