
Machine Learning

Unit-II
Linear regression
• Regression is essentially finding a relationship (or association) between the
dependent variable (Y) and the independent variable(s) (X), i.e. finding the function ‘f
’ for the association Y = f(X).
• Linear regression is a statistical model that is used to predict a continuous
dependent variable from one or more independent variables.
• It is called "linear" because the model is based on the idea that the relationship between
the dependent and independent variables is linear.
• In a linear regression model, the independent variables are referred to as the predictors
and the dependent variable is referred to as the response.
• The goal is to find the "best" line that fits the data. The "best" line is
the one that minimizes the sum of the squared differences between the
observed responses in the dataset and the responses predicted by the line.
• For example, if you were using linear regression to model the relationship
between the temperature outside and the number of ice cream cones sold at
an ice cream shop, you could use the model to predict how many ice cream
cones you would sell on a hot day given the temperature outside.
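As an illustrative sketch of this idea (the temperature and sales figures below are made up, and scikit-learn is used here only as one convenient way to fit the line):

# Simple linear regression sketch: predict cones sold from temperature.
# The data values are hypothetical, purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

temps = np.array([[20], [24], [27], [31], [35]])   # X: temperature (°C)
cones = np.array([110, 145, 170, 205, 240])        # Y: cones sold

model = LinearRegression().fit(temps, cones)
print("slope (b):", model.coef_[0])
print("intercept (a):", model.intercept_)
print("predicted sales at 33 °C:", model.predict([[33]])[0])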
• The value of the intercept indicates the value of Y when X = 0. It is known
as ‘the intercept’ or ‘Y intercept’ because it specifies where the straight line
crosses the vertical (Y) axis.

• The slope of a straight line represents how much the line in a graph changes
in the vertical direction (Y-axis) over a change in the horizontal direction (X-axis):
Slope = Change in Y / Change in X
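For instance (an illustrative calculation): if Y increases from 10 to 16 while X increases from 2 to 5, then Slope = (16 − 10) / (5 − 2) = 2.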

Simple linear regression


• Example: If we take Price of a Property as the dependent variable and the Area of
the Property (in sq. m.) as the predictor variable, we can build a model using
simple linear regression.

• Assuming a linear association, we can formulate the model as

Ŷ = a + b·X

where ‘a’ and ‘b’ are the intercept and slope of the straight
line, respectively.
Slope of the simple linear regression
model
• Slope of a straight line represents how much the line in a graph
changes in the vertical direction (Y-axis) over a change in the
horizontal direction (X-axis)
• Rise is the change along the Y-axis
• Run is the change along the X-axis
• Slope = Rise / Run
Ordinary Least Squares (OLS)
algorithm
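As a minimal sketch of the OLS computation for simple linear regression (the x and y values below are hypothetical), the closed-form estimates b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄ can be coded as:

# Minimal OLS sketch for simple linear regression (hypothetical x, y values).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
a = y_mean - b * x_mean                                              # intercept

y_hat = a + b * x                  # predicted values on the fitted line
sse = np.sum((y - y_hat) ** 2)     # sum of squared errors being minimized
print(f"a = {a:.3f}, b = {b:.3f}, SSE = {sse:.3f}")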
Exercise Problem
• A college professor believes that if the grade for internal examination
is high in a class, the grade for external examination will also be high.
A random sample of 15 students in that class was selected, and the
data is given as,
Multiple Linear Regression
• In a multiple regression model, two or more independent variables,
i.e. predictors are involved.
• Example: A model which can predict the correct value of a real estate
if it has certain standard inputs such as area (sq. m.) of the property,
location, floor, number of years since purchase, amenities available
etc as independent variables.
• Consider the following example of a multiple linear regression model with two
predictor variables, namely X1 and X2:

Ŷ = a + b1·X1 + b2·X2

• The model describes a plane in the three-dimensional space of Ŷ, X1,
and X2. Parameter ‘a’ is the intercept of this plane. Parameters ‘b1’
and ‘b2’ are referred to as partial regression coefficients.
• Parameter b1 represents the change in the mean response
corresponding to a unit change in X1 when X2 is held constant.
• Parameter b2 represents the change in the mean response
corresponding to a unit change in X2 when X1 is held constant.

Multiple regression plane


• The multiple regression estimating equation when there are ‘n’
predictor variables is as follows:

Ŷ = a + b1·X1 + b2·X2 + … + bn·Xn
While finding the best-fit relationship, we can also fit a polynomial or a curve instead of a
straight line; these models are known as polynomial regression and curvilinear regression, respectively.
Assumptions in Regression Analysis
• There is a linear relationship between the dependent and independent variables.
• The regression line is valid only over a limited range of data. If the line is
extended beyond that range (extrapolation), it may lead to wrong predictions.
• The values of the error term (ε) are independent and are not related to any
values of X.
• The number of observations (n) is greater than the number of parameters (k) to
be estimated, i.e. n > k.
• The error component is normally distributed.
• There is no multicollinearity, i.e. no instability of the regression coefficients.
• There is no heteroskedasticity, i.e. the variance of the residuals is constant across the
predicted values.
Given the above assumptions, the OLS estimator is the Best Linear Unbiased
Estimator (BLUE); this result is known as the Gauss–Markov theorem.
Problems in Regression Analysis
1.Multicollinearity
2.Heteroskedasticity
Multicollinearity
• Multicollinearity occurs when two or more independent variables are strongly correlated with one
another.
• The problem that arises with this is that the effects of the individual variables cannot be
clearly separated, so the regression equation becomes unstable.
• To detect multicollinearity, each independent variable is regressed on the other
independent variables, and the Tolerance and Variance Inflation Factor are computed
from that fit (see the sketch below). Multicollinearity is indicated when

Tolerance (T) < 0.1 or Variance Inflation Factor (VIF) > 10
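A hedged sketch of computing these diagnostics (it assumes the predictors are held in a pandas DataFrame X; statsmodels' variance_inflation_factor is used here as one convenient option, not the only way):

# Sketch: detect multicollinearity with Tolerance and VIF.
# Assumes X is a pandas DataFrame of the independent variables.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    exog = sm.add_constant(X)                    # add intercept column
    rows = []
    for i, col in enumerate(exog.columns):
        if col == "const":
            continue
        vif = variance_inflation_factor(exog.values, i)
        rows.append({"variable": col, "VIF": vif, "Tolerance": 1.0 / vif})
    return pd.DataFrame(rows)                    # flag VIF > 10 or Tolerance < 0.1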


Heteroskedasticity
• Heteroskedasticity refers to the changing variance of the error term;
that means the observed values deviate from the predicted values
non-uniformly. If the variance of the error term is not constant across
the data, the predictions will be erroneous.

• The homoskedasticity assumption states that var(u_i | X) = σ² for every observation i and
cov(u_i, u_j | X) = 0 for i ≠ j, where ‘var’ represents the variance, ‘cov’ represents
the covariance, ‘u’ represents the error terms, and ‘X’ represents the independent variables.
• This assumption is more commonly written as var(u | X) = σ² (a constant).
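One common diagnostic for non-constant error variance is the Breusch–Pagan test (it is not part of the notes above and is used here purely as an illustration); a minimal sketch on hypothetical data:

# Sketch: Breusch-Pagan test for heteroskedasticity (hypothetical data).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2 + 3 * x + rng.normal(0, 1 + x, 200)   # error variance grows with x

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, results.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)  # a small p-value suggests heteroskedasticity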
Improving Accuracy of the Linear
Regression Model
• Accuracy refers to how close the estimate is to the actual value.
• Prediction refers to the continuous estimation of the value.
• Bias and variance are analogous to accuracy and prediction, respectively:
High bias = low accuracy (estimates are not close to the real value)
High variance = low prediction (estimates are scattered)
Low bias = high accuracy (estimates are close to the real value)
Low variance = high prediction (estimates are close to each other)
• For a regression model which is highly accurate and highly predictive, the
overall error of the model will be low, implying low bias (high
accuracy) and low variance (high prediction), which is highly preferable.
• Similarly, if the variance increases (low prediction), the spread of our
data points increases, which results in less accurate prediction. As the
bias increases (low accuracy), the error between our predicted value
and the observed values increases.
• Balancing bias and variance is essential in a regression model. In the
linear regression model, it is assumed that the number of observations
(n) is greater than the number of parameters (k) to be estimated, i.e. n >
k; in that case, the least squares estimates tend to have low
variance and hence will perform well on test observations.
• However, if the number of observations (n) is not much larger than the number of
parameters (k), there can be high variability in the least squares fit, resulting in
overfitting and poor predictions. If k > n, then linear regression
is not usable.
• Accuracy of linear regression can be improved using the following
three methods:
1. Shrinkage Approach
2. Subset Selection
3. Dimensionality (Variable) Reduction
Shrinkage (Regularization) approach
• This approach involves fitting a model involving all predictors. However,
the estimated coefficients are shrunken towards zero relative to the
least squares estimates.
• This shrinkage (also known as regularization) has the effect of reducing
the overall variance. Some of the coefficients may also be estimated to
be exactly zero, thereby indirectly performing variable selection.
• The two best-known techniques for shrinking the regression
coefficients towards zero are
1. ridge regression
2. lasso (Least Absolute Shrinkage and Selection Operator)
Ridge regression (L2 regularization)
• It modifies over-fitted or under-fitted models by adding a penalty
equivalent to the sum of the squares of the magnitudes of the
coefficients.
• Ridge regression performs regularization by shrinking the estimated
coefficients towards zero.
Cost (L2) = Σ(y − ŷ)² + α · (sum of the squares of the coefficients)

where α (also written λ) is the tuning parameter controlling the penalty strength.
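A minimal sketch of ridge regression with scikit-learn (the data are hypothetical and the alpha value is arbitrary; in practice it would be tuned, e.g. by cross-validation):

# Sketch: ridge (L2) regression with scikit-learn.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)         # alpha is the tuning parameter (λ)
print("ridge coefficients:", ridge.coef_)  # shrunk towards zero, rarely exactly zero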


Lasso regression (L1 regularization)
• It modifies the over-fitted or under-fitted models by adding the
penalty equivalent to the sum of the absolute values of coefficients.
Cost (L1) = Σ(y − ŷ)² + α · (sum of the absolute values of the coefficients)

where α (also written λ) is the tuning parameter controlling the penalty strength.

• Lasso can be used to select important features of a dataset


• The difference between ridge and lasso regression is that lasso tends to
shrink some coefficients to exactly zero, whereas ridge never sets
the value of a coefficient exactly to zero.
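A minimal sketch of lasso with scikit-learn on the same hypothetical setup as the ridge example, showing that some coefficients come out exactly zero:

# Sketch: lasso (L1) regression; compare with the ridge coefficients above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)         # alpha is the tuning parameter (λ)
print("lasso coefficients:", lasso.coef_)  # some coefficients are exactly 0 -> feature selection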
Subset selection
• Identify a subset of the predictors that is assumed to be related to the
response and then fit a model using OLS on the selected reduced
subset of variables.
• There are two methods in which subset of the regression can be
selected:
1. Best subset selection (considers all 2^k possible subsets)
2. Stepwise subset selection
i. Forward stepwise selection (0 to k)
ii. Backward stepwise selection (k to 0)
1. Best subset selection -
• Fit a separate OLS model for each possible subset of predictors and pick the best one.
• Best subset selection is a method used in statistical modeling and
machine learning to select a subset of predictors that are most
relevant for predicting the target variable.
• This technique is particularly useful in situations where you have a
large number of potential predictors, and you want to identify the
most important ones to improve model accuracy and interpretability.
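A brute-force sketch of best subset selection (practical only for small k, since all 2^k subsets must be fitted; adjusted R² is used here as one possible selection criterion):

# Sketch: best subset selection by brute force.
# Assumes X is a 2-D numpy array of predictors and y a 1-D target array.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def best_subset(X, y):
    n, k = X.shape
    best = (None, -np.inf)
    for size in range(1, k + 1):
        for subset in combinations(range(k), size):
            cols = list(subset)
            model = LinearRegression().fit(X[:, cols], y)
            r2 = r2_score(y, model.predict(X[:, cols]))
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - size - 1)  # penalize larger subsets
            if adj_r2 > best[1]:
                best = (cols, adj_r2)
    return best   # (indices of the chosen predictors, adjusted R^2)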
2.Stepwise subset selection
• Forward stepwise selection (0 to k)
 It begins with a model containing no predictors; predictors are then added
one by one until all k predictors are included in the model.
• Backward stepwise selection (k to 0)
 This process starts with all k predictors and then iteratively
removes the least useful predictors one by one.
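A hedged sketch of forward and backward stepwise selection using scikit-learn's SequentialFeatureSelector (available in recent scikit-learn versions; the data below are hypothetical):

# Sketch: forward / backward stepwise selection with scikit-learn.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 4 * X[:, 0] - 3 * X[:, 2] + rng.normal(scale=0.5, size=100)

forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward"
).fit(X, y)
print("forward selection kept features:", forward.get_support(indices=True))

backward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="backward"
).fit(X, y)
print("backward elimination kept features:", backward.get_support(indices=True))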
Dimensionality Reduction
• In dimensionality reduction, the predictors (X) are transformed, and the
model is fit using the transformed variables.
• The number of variables is reduced using the dimensionality
reduction method.
• Principal component analysis is one of the most important
dimensionality (variable) reduction techniques.
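A minimal sketch of principal component regression: reduce the predictors with PCA, then fit a linear model on the retained components (the data are hypothetical and the choice of 3 components is arbitrary):

# Sketch: dimensionality reduction with PCA followed by linear regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
print("explained variance ratio:", pcr.named_steps["pca"].explained_variance_ratio_)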
