Linear regression and logistic regression are the two most prominent regression analysis techniques used in machine learning. But there are many other types of regression analysis, and their usage varies according to the nature of the data involved. This article explains the different types of regression in machine learning and the conditions under which each can be used. If you are new to machine learning, this article will help you understand the concept of regression modelling.

What is Regression Analysis?
Regression analysis is a predictive modelling technique that analyzes the relation between the target (dependent) variable and one or more independent variables in a dataset. The different regression analysis techniques apply when the target and independent variables show a linear or non-linear relationship with each other and the target variable contains continuous values. Regression is used mainly to determine predictor strength, forecast trends, model time series, and study cause-and-effect relationships. It is the primary technique for solving regression problems in machine learning through data modelling. It involves determining the best-fit line: the line chosen so that the total distance between the line and the data points is minimized, rather than one that passes through every point.

Types of Regression Analysis Techniques
There are many types of regression analysis techniques, and the choice of method depends on a number of factors. These factors include the type of target variable, the shape of the regression line, and the number of independent variables. Below are the different regression techniques:

1. Linear Regression
2. Logistic Regression
3. Ridge Regression
4. Lasso Regression
5. Polynomial Regression
6. Bayesian Linear Regression
1. Linear Regression
Linear regression is one of the most basic types of regression in machine learning. The linear regression model consists of a predictor variable and a dependent variable related linearly to each other. When the data involves more than one independent variable, the model is called multiple linear regression. The equation below denotes the simple linear regression model:

y = mx + c + e

where m is the slope of the line, c is the intercept, and e represents the error in the model. The best-fit line is determined by varying the values of m and c. The prediction error is the difference between the observed value and the predicted value, and m and c are selected so that this error is minimized. It is important to note that a simple linear regression model is susceptible to outliers, so it should not be applied to data containing significant outliers without preprocessing.
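As a minimal sketch of the above (assuming NumPy is available; the data here is synthetic and the true slope and intercept are chosen for illustration), the values of m and c can be recovered by least squares:

```python
import numpy as np

# Synthetic data generated from a known line y = 2x + 1 plus noise,
# so the fitted slope and intercept can be compared with the true ones.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# np.polyfit with deg=1 picks m and c that minimize the squared
# prediction error, exactly as described in the text.
m, c = np.polyfit(x, y, deg=1)
print(m, c)  # close to the true values 2.0 and 1.0
```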
2. Logistic Regression
Logistic regression is one of the types of regression analysis techniques and is used when the dependent variable is discrete, for example 0 or 1, or true or false. This means the target variable can take only two values, and a sigmoid curve denotes the relation between the target variable and the independent variables. The logit function is used in logistic regression to measure the relationship between the target variable and the independent variables. The equation below denotes logistic regression:

logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + … + bkXk

where p is the probability of occurrence of the feature. When selecting logistic regression as the analysis technique, note that it works best when the dataset is large, with a roughly equal occurrence of the two values of the target variable. Also, there should be no multicollinearity, which means there should be no correlation between the independent variables in the dataset.

3. Ridge Regression
This is another type of regression in machine learning, usually used when there is a high correlation between the independent variables. With multicollinear data, the least-squares estimates are unbiased, but their variances are large, so the estimated coefficients can be far from the true values. Ridge regression introduces a small amount of bias to reduce this variance, which makes the model less susceptible to overfitting. The equation below denotes ridge regression, where the introduction of λ (lambda), the penalty term, addresses the problem of multicollinearity:

β = (X^{T}X + λI)^{-1}X^{T}y

4. Lasso Regression
Lasso regression is one of the types of regression in machine learning that performs regularization along with feature selection. It penalizes the absolute size of the regression coefficients. As a result, some coefficient values shrink exactly to zero, which does not happen in the case of ridge regression. This is what enables feature selection in lasso regression: it effectively selects a subset of features from the dataset to build the model. Only the required features keep non-zero coefficients, while the others are set to zero, which helps avoid overfitting. If the independent variables are highly collinear, lasso regression picks only one of them and shrinks the others to zero.
Below is the objective that the lasso regression method minimizes:

N^{-1}Σ^{N}_{i=1}f(x_{i}, y_{i}, α, β)

subject to the constraint that the sum of the absolute values of the coefficients β stays below a fixed threshold, which is what drives some coefficients exactly to zero.
5. Polynomial Regression
Polynomial regression is another type of regression analysis technique in machine learning, the same as multiple linear regression with a small modification. In polynomial regression, the relationship between the independent variable X and the dependent variable Y is modelled by an n-th degree polynomial in X. It is still a linear model as an estimator, because it is linear in the coefficients. The least-squares method is used in polynomial regression as well. The best-fit curve in polynomial regression is not a straight line but a curve, whose shape depends on the power of X, that is, the value of n. While trying to reduce the mean squared error to a minimum to obtain the best-fit curve, the model can be prone to overfitting. It is recommended to analyze the curve towards the ends of the data range, as higher-degree polynomials can give strange results on extrapolation. The equation below represents polynomial regression:

y = β0 + β1x + β2x^2 + … + βnx^n + ε
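A minimal sketch of the above (assuming NumPy; the quadratic data is synthetic, generated from known coefficients so the fit can be checked against them):

```python
import numpy as np

# Synthetic data from a known quadratic y = 1 + 0.5x + 2x^2 plus noise.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 60)
y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.3, size=x.size)

# Polynomial regression is least squares on powers of x; np.polyfit
# returns coefficients from the highest degree down.
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(b2, b1, b0)  # close to the true 2.0, 0.5, 1.0
```

Raising deg well beyond the true degree of the data is exactly the overfitting risk the text warns about: the curve bends to chase noise, especially near the ends of the range.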
6. Bayesian Linear Regression
Bayesian regression is one of the types of regression in machine learning that uses Bayes' theorem to find the values of the regression coefficients. In this method, the posterior distribution of the coefficients is determined instead of finding a single least-squares estimate. Bayesian linear regression is related to both linear regression and ridge regression but is more stable than simple linear regression.
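As an illustrative sketch (assuming scikit-learn; the data and true coefficients are invented for this example), scikit-learn's BayesianRidge estimates a posterior over the coefficients rather than a single least-squares solution, and its predictions come with an uncertainty estimate:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic data with known coefficients 4.0 and -1.0 plus noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 4.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.2, size=100)

model = BayesianRidge().fit(X, y)

# Posterior mean of the coefficients (compare with the true 4.0 and -1.0).
print(np.round(model.coef_, 1))

# return_std=True yields a standard deviation per prediction,
# reflecting the model's uncertainty at that input.
mean, std = model.predict([[1.0, 1.0]], return_std=True)
print(float(mean[0]), float(std[0]))
```

The built-in uncertainty estimate and the prior-induced shrinkage (similar in effect to ridge) are what make this approach more stable than plain linear regression on small or noisy datasets.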