National University of Modern Languages
Lahore Campus
Topic:
Regression Analysis
Subject:
Multivariate & Data Analysis
Submitted To:
Muhammad Shoaib
Submitted by:
Muhammad Ahmad
Roll Number:
L-21127
Class (shift):
MBA-VI (M)
Definition:
Regression analysis is the most widely used statistical technique for investigating or estimating the
relationship between a dependent variable and a set of independent (explanatory) variables. It is also used as a
blanket term for a variety of data analysis techniques that are utilized in quantitative research.
Types of Regression
Every regression technique has some assumptions attached to it which we need to meet before running the
analysis. These techniques differ in terms of the type of dependent and independent variables and their
distribution.
Linear Regression
It is the simplest form of regression. It is a technique in which the dependent variable is continuous in
nature, and the relationship between the dependent variable and the independent variables is assumed to be
linear. A scatter plot of car mileage against engine displacement, for example, shows a roughly linear
relationship: the plotted points are the actual observations, and the fitted line is the line of regression.
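As a minimal illustration, a simple linear regression could be fitted in Python with scikit-learn; the displacement and mileage values below are hypothetical.

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical engine displacement (X) and mileage (y) observations
X = np.array([[100], [150], [200], [250], [300], [350]])
y = np.array([35, 30, 26, 22, 19, 16])

model = LinearRegression().fit(X, y)            # ordinary least squares fit
print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
print("mileage predicted at displacement 220:", model.predict([[220]])[0])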
Polynomial Regression
In polynomial regression, the independent variable enters the model with a power greater than 1, so the
relationship between the dependent and independent variables is modeled as an nth-degree polynomial and the
fitted line is a curve rather than a straight line.
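A minimal sketch of polynomial regression in Python, assuming a quadratic (degree-2) relationship and synthetic data:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1.5 * X.ravel() ** 2 - 2 * X.ravel() + rng.normal(0, 1, 50)   # quadratic pattern plus noise

# expand x into polynomial features (x, x^2) and fit ordinary least squares on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[2.0]]))   # prediction from the fitted curve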
Logistic Regression
In logistic regression, the dependent variable is binary in nature (it has two categories), while the independent
variables can be continuous or binary. In multinomial logistic regression, the dependent variable can have more
than two categories.
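A minimal sketch of logistic regression on synthetic data (one continuous predictor, binary outcome), using scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
# synthetic rule: the probability of the outcome being 1 rises with X
p = 1 / (1 + np.exp(-2 * X.ravel()))
y = (rng.random(200) < p).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)        # estimated log-odds slope and intercept
print(clf.predict_proba([[0.5]]))       # predicted probabilities for classes 0 and 1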
Quantile Regression
Quantile regression is an extension of linear regression, and we generally use it when outliers, high
skewness, or heteroscedasticity exist in the data. In linear regression, we predict the mean of the
dependent variable for given independent variables. Since the mean does not describe the whole
distribution, modeling the mean is not a full description of the relationship between the dependent and
independent variables. Quantile regression instead predicts a quantile (or percentile) of the dependent
variable for given independent variables.
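As a rough sketch, quantile regression can be fitted with the QuantReg class in statsmodels; the data below are synthetic, with noise that grows with x (heteroscedasticity), and q = 0.5 and q = 0.9 are quantiles chosen only for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 2 + 0.8 * x + rng.normal(0, 1 + 0.3 * x, 200)   # spread of y increases with x
X = sm.add_constant(x)                               # add an intercept column

median_fit = sm.QuantReg(y, X).fit(q=0.5)            # conditional median
upper_fit = sm.QuantReg(y, X).fit(q=0.9)             # conditional 90th percentile
print(median_fit.params)
print(upper_fit.params)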
Lasso Regression
Lasso stands for Least Absolute Shrinkage and Selection Operator. It makes use of the L1 regularization
technique in the objective function. Thus the objective function in LASSO regression becomes:

minimize over β:   Σᵢ (yᵢ − β₀ − Σⱼ βⱼxᵢⱼ)²  +  λ Σⱼ |βⱼ|

λ is the regularization parameter and the intercept term β₀ is not regularized. We do not assume that the
error terms are normally distributed. For the estimates there is no closed-form mathematical formula, but we
can obtain them using statistical software.
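A minimal LASSO sketch with scikit-learn; the penalty parameter (called alpha here, playing the role of λ) and the synthetic data are chosen only for illustration.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
# in this synthetic data only the first two predictors actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)      # the L1 penalty shrinks many coefficients exactly to zero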
Elastic Net Regression
Elastic Net regression combines the L1 (lasso) and L2 (ridge) penalties, and it is preferred over both ridge
and lasso regression when one is dealing with highly correlated independent variables.
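A minimal Elastic Net sketch with scikit-learn; alpha and l1_ratio (the mix between the L1 and L2 penalties) are arbitrary illustrative settings, and the data include two deliberately correlated predictors.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.05, 100)                   # nearly a copy of x1 (high correlation)
X = np.column_stack([x1, x2, rng.normal(size=100)])
y = 2 * x1 + rng.normal(0, 0.5, 100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)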
Principal Component Regression (PCR)
PCR is a regression technique which is widely used when you have many independent variables or
multicollinearity exists in your data. It is divided into 2 steps:
1. Dimensionality reduction
2. Removal of multicollinearity
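The two steps above can be sketched in scikit-learn by chaining PCA with ordinary least squares; the number of retained components (3) is an arbitrary illustrative choice.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 8))
X[:, 1] = X[:, 0] + rng.normal(0, 0.01, 100)         # deliberately collinear columns
y = X[:, 0] - X[:, 3] + rng.normal(0, 0.5, 100)

# step 1: reduce dimensionality with PCA; step 2: regress on the uncorrelated components
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
print(pcr.predict(X[:5]))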
Partial Least Squares (PLS) Regression
It is an alternative to principal component regression when the independent variables are highly
correlated. It is also useful when there is a large number of independent variables.
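A minimal PLS sketch using scikit-learn's PLSRegression; the number of components is again an illustrative choice and the data are synthetic.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 20))                       # many predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 100)

pls = PLSRegression(n_components=2)
pls.fit(X, y)
print(pls.predict(X[:5]).ravel())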
Support Vector Regression
Support vector regression can solve both linear and non-linear models. SVM uses non-linear kernel
functions (such as polynomial) to find the optimal solution for non-linear models.
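A minimal support vector regression sketch with scikit-learn, using an RBF kernel on a non-linear (sine-shaped) synthetic pattern; C and epsilon are arbitrary illustrative hyperparameters.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)       # non-linear relationship plus noise

svr = SVR(kernel="rbf", C=10, epsilon=0.1)
svr.fit(X, y)
print(svr.predict([[1.0], [2.5]]))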
Ordinal Regression
Ordinal regression is used to predict ranked values. In simple words, this type of regression is suitable
when the dependent variable is ordinal in nature. Examples of ordinal variables: survey responses (on a 1 to 6
scale), patient reaction to a drug dose (none, mild, severe).
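A rough sketch of an ordinal (proportional odds) logistic model, assuming a recent statsmodels version that provides the OrderedModel class; the three ordered categories and the cut points are synthetic.

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(8)
x = rng.normal(size=300)
latent = 1.2 * x + rng.logistic(size=300)
# cut the latent scale into three ordered categories: low < medium < high
y = pd.Series(pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf],
                     labels=["low", "medium", "high"]))

model = OrderedModel(y, x.reshape(-1, 1), distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.params)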
Poisson Regression
Poisson regression is used when the dependent variable is count data (for example, the number of calls
received per day). It assumes that the variance of the counts is equal to their mean.
Negative Binomial Regression
Like Poisson regression, it also deals with count data. The question arises: how is it different from
Poisson regression? The answer is that negative binomial regression does not assume the distribution of the
counts has variance equal to its mean, while Poisson regression does assume the variance is equal to the mean.
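A minimal sketch contrasting the two models in statsmodels on synthetic overdispersed counts (variance larger than the mean); the data-generating numbers are arbitrary.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(0, 2, 300)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))       # counts whose variance exceeds their mean
X = sm.add_constant(x)

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
negbin_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(poisson_fit.params)
print(negbin_fit.params)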
Quasi-Poisson Regression
It is an alternative to negative binomial regression and can also be used for overdispersed count data.
Both approaches give similar results, but there are differences in how they estimate the effects of covariates:
the variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative
binomial model is a quadratic function of the mean.
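One common way to approximate a quasi-Poisson fit in statsmodels is to estimate a Poisson GLM and rescale its standard errors by the Pearson chi-square dispersion estimate (the scale="X2" option); the sketch below uses the same kind of overdispersed synthetic counts as above.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(0, 2, 300)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))       # overdispersed counts
X = sm.add_constant(x)

quasi_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(quasi_fit.scale)       # estimated dispersion; values above 1 indicate overdispersion
print(quasi_fit.bse)         # standard errors inflated by the dispersion estimate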
Cox Regression
Cox regression is suitable for time-to-event (survival) data, for example the time from diagnosis until death,
or the time from a customer opening an account until the customer leaves.
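A minimal Cox proportional hazards sketch using the third-party lifelines package; the durations, event indicators, and covariate values are made up for illustration.

import pandas as pd
from lifelines import CoxPHFitter

# made-up survival data: time until the event, whether the event was observed (1)
# or censored (0), and one covariate (e.g. a treatment group indicator)
df = pd.DataFrame({
    "duration": [5, 8, 12, 3, 9, 15, 7, 11],
    "event":    [1, 1, 0, 1, 0, 0, 1, 1],
    "group":    [0, 0, 0, 1, 1, 1, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()      # summary table including the hazard ratio for 'group'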