National University of Modern Languages
Lahore Campus
Topic:
Regression Analysis
Subject:
Multivariate & Data Analysis
Submitted To:
Muhammad Shoaib
Submitted by:
Muhammad Ahmad
Roll Number:
L-21127
Class (shift):
MBA-VI (M)
Definition:
Regression analysis is the most widely used statistical technique for investigating or estimating the
relationship between a dependent variable and a set of independent (explanatory) variables. It is also used as a
blanket term for a variety of data analysis techniques that are utilized in quantitative research.
Types of Regression
Every regression technique has some assumptions attached to it which we need to meet before running the
analysis. These techniques differ in terms of the type of dependent and independent variables and their
distribution.
Linear Regression
It is the simplest form of regression. It is a technique in which the dependent variable is continuous in
nature, and the relationship between the dependent variable and the independent variables is assumed to be
linear. A scatter plot of car mileage against engine displacement, for example, shows a roughly linear
relationship: the plotted points are the actual observations, and the fitted line is the line of regression.
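As a minimal illustration, a simple linear regression could be fitted in Python with scikit-learn; the displacement and mileage values below are hypothetical.

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical engine displacement (X) and mileage (y) observations
X = np.array([[100], [150], [200], [250], [300], [350]])
y = np.array([35, 30, 26, 22, 19, 16])

model = LinearRegression().fit(X, y)            # ordinary least squares fit
print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
print("mileage predicted at displacement 220:", model.predict([[220]])[0])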
Polynomial Regression
In polynomial regression, the independent variable enters the model with a power greater than 1, so the
relationship between the dependent and independent variables is modeled as an nth-degree polynomial and the
fitted line is a curve rather than a straight line.
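A minimal sketch of polynomial regression in Python, assuming a quadratic (degree-2) relationship and synthetic data:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1.5 * X.ravel() ** 2 - 2 * X.ravel() + rng.normal(0, 1, 50)   # quadratic pattern plus noise

# expand x into polynomial features (x, x^2) and fit ordinary least squares on them
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[2.0]]))   # prediction from the fitted curve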
Logistic Regression
In logistic regression, the dependent variable is binary in nature (it has two categories), while the independent
variables can be continuous or binary. In multinomial logistic regression, the dependent variable can have more
than two categories.
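A minimal sketch of logistic regression on synthetic data (one continuous predictor, binary outcome), using scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
# synthetic rule: the probability of the outcome being 1 rises with X
p = 1 / (1 + np.exp(-2 * X.ravel()))
y = (rng.random(200) < p).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)        # estimated log-odds slope and intercept
print(clf.predict_proba([[0.5]]))       # predicted probabilities for classes 0 and 1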
Quantile Regression
Quantile regression is an extension of linear regression, and we generally use it when outliers, high
skewness, or heteroscedasticity exist in the data. In linear regression, we predict the mean of the
dependent variable for given independent variables. Since the mean does not describe the whole
distribution, modeling the mean is not a full description of the relationship between the dependent and
independent variables. Quantile regression instead predicts a quantile (or percentile) of the dependent
variable for given independent variables.
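As a rough sketch, quantile regression can be fitted with the QuantReg class in statsmodels; the data below are synthetic, with noise that grows with x (heteroscedasticity), and q = 0.5 and q = 0.9 are quantiles chosen only for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 2 + 0.8 * x + rng.normal(0, 1 + 0.3 * x, 200)   # spread of y increases with x
X = sm.add_constant(x)                               # add an intercept column

median_fit = sm.QuantReg(y, X).fit(q=0.5)            # conditional median
upper_fit = sm.QuantReg(y, X).fit(q=0.9)             # conditional 90th percentile
print(median_fit.params)
print(upper_fit.params)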
Lasso Regression
Lasso stands for Least Absolute Shrinkage and Selection Operator. It makes use of the L1 regularization
technique in the objective function. Thus the objective function in LASSO regression becomes:

minimize over β:   Σᵢ (yᵢ − β₀ − Σⱼ βⱼxᵢⱼ)²  +  λ Σⱼ |βⱼ|

λ is the regularization parameter and the intercept term β₀ is not regularized. We do not assume that the
error terms are normally distributed. For the estimates there is no closed-form mathematical formula, but we
can obtain them using statistical software.
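A minimal LASSO sketch with scikit-learn; the penalty parameter (called alpha here, playing the role of λ) and the synthetic data are chosen only for illustration.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
# in this synthetic data only the first two predictors actually matter
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)      # the L1 penalty shrinks many coefficients exactly to zero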
Elastic Net Regression
Elastic Net regression combines the L1 (lasso) and L2 (ridge) penalties, and it is preferred over both ridge
and lasso regression when one is dealing with highly correlated independent variables.
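A minimal Elastic Net sketch with scikit-learn; alpha and l1_ratio (the mix between the L1 and L2 penalties) are arbitrary illustrative settings, and the data include two deliberately correlated predictors.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.05, 100)                   # nearly a copy of x1 (high correlation)
X = np.column_stack([x1, x2, rng.normal(size=100)])
y = 2 * x1 + rng.normal(0, 0.5, 100)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)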
Principal Component Regression (PCR)
PCR is a regression technique which is widely used when you have many independent variables or
multicollinearity exists in your data. It is divided into 2 steps:
1. Dimensionality reduction
2. Removal of multicollinearity
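The two steps above can be sketched in scikit-learn by chaining PCA with ordinary least squares; the number of retained components (3) is an arbitrary illustrative choice.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 8))
X[:, 1] = X[:, 0] + rng.normal(0, 0.01, 100)         # deliberately collinear columns
y = X[:, 0] - X[:, 3] + rng.normal(0, 0.5, 100)

# step 1: reduce dimensionality with PCA; step 2: regress on the uncorrelated components
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
print(pcr.predict(X[:5]))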
Partial Least Squares (PLS) Regression
It is an alternative to principal component regression when the independent variables are highly
correlated. It is also useful when there is a large number of independent variables.
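A minimal PLS sketch using scikit-learn's PLSRegression; the number of components is again an illustrative choice and the data are synthetic.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 20))                       # many predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 100)

pls = PLSRegression(n_components=2)
pls.fit(X, y)
print(pls.predict(X[:5]).ravel())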
Support Vector Regression
Support vector regression can solve both linear and non-linear models. SVM uses non-linear kernel
functions (such as polynomial) to find the optimal solution for non-linear models.
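A minimal support vector regression sketch with scikit-learn, using an RBF kernel on a non-linear (sine-shaped) synthetic pattern; C and epsilon are arbitrary illustrative hyperparameters.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)       # non-linear relationship plus noise

svr = SVR(kernel="rbf", C=10, epsilon=0.1)
svr.fit(X, y)
print(svr.predict([[1.0], [2.5]]))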
Ordinal Regression
Ordinal regression is used to predict ranked values. In simple words, this type of regression is suitable
when the dependent variable is ordinal in nature. Examples of ordinal variables: survey responses (on a 1 to 6
scale), patient reaction to a drug dose (none, mild, severe).
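A rough sketch of an ordinal (proportional odds) logistic model, assuming a recent statsmodels version that provides the OrderedModel class; the three ordered categories and the cut points are synthetic.

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(8)
x = rng.normal(size=300)
latent = 1.2 * x + rng.logistic(size=300)
# cut the latent scale into three ordered categories: low < medium < high
y = pd.Series(pd.cut(latent, bins=[-np.inf, -0.5, 0.5, np.inf],
                     labels=["low", "medium", "high"]))

model = OrderedModel(y, x.reshape(-1, 1), distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.params)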
Poisson Regression
Poisson regression is used when the dependent variable is count data (for example, the number of calls
received per day). It assumes that the variance of the counts is equal to their mean.
Negative Binomial Regression
Like Poisson regression, it also deals with count data. The question arises: how is it different from
Poisson regression? The answer is that negative binomial regression does not assume the distribution of the
counts has variance equal to its mean, while Poisson regression does assume the variance is equal to the mean.
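A minimal sketch contrasting the two models in statsmodels on synthetic overdispersed counts (variance larger than the mean); the data-generating numbers are arbitrary.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(0, 2, 300)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))       # counts whose variance exceeds their mean
X = sm.add_constant(x)

poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
negbin_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(poisson_fit.params)
print(negbin_fit.params)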
Quasi-Poisson Regression
It is an alternative to negative binomial regression and can also be used for overdispersed count data.
Both approaches give similar results, but there are differences in how they estimate the effects of covariates:
the variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative
binomial model is a quadratic function of the mean.
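One common way to approximate a quasi-Poisson fit in statsmodels is to estimate a Poisson GLM and rescale its standard errors by the Pearson chi-square dispersion estimate (the scale="X2" option); the sketch below uses the same kind of overdispersed synthetic counts as above.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(0, 2, 300)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))       # overdispersed counts
X = sm.add_constant(x)

quasi_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(scale="X2")
print(quasi_fit.scale)       # estimated dispersion; values above 1 indicate overdispersion
print(quasi_fit.bse)         # standard errors inflated by the dispersion estimate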
Cox Regression
Cox regression is suitable for time-to-event (survival) data, for example the time from diagnosis until death,
or the time from a customer opening an account until the customer leaves.
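A minimal Cox proportional hazards sketch using the third-party lifelines package; the durations, event indicators, and covariate values are made up for illustration.

import pandas as pd
from lifelines import CoxPHFitter

# made-up survival data: time until the event, whether the event was observed (1)
# or censored (0), and one covariate (e.g. a treatment group indicator)
df = pd.DataFrame({
    "duration": [5, 8, 12, 3, 9, 15, 7, 11],
    "event":    [1, 1, 0, 1, 0, 0, 1, 1],
    "group":    [0, 0, 0, 1, 1, 1, 0, 1],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event")
cph.print_summary()      # summary table including the hazard ratio for 'group'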