Module 3


Linear Models
Linear Regression

• Linear Regression is a machine learning algorithm based on supervised learning.
• Linear regression predicts the value of a dependent variable (y) from a given
independent variable (x). This regression technique therefore finds a linear relationship between x (input)
and y (output); hence the name Linear Regression.
• The hypothesis of Linear Regression is
• y = m·x + c
• The model obtains the best-fit regression line by finding the best m (slope) and
c (intercept) values.
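As a quick illustration (not part of the original slides), a minimal sketch that fits m and c with scikit-learn on made-up data:

```python
# Minimal sketch: fit y = m*x + c with scikit-learn (data values are illustrative assumptions).
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])             # dependent variable (roughly y = 2x)

model = LinearRegression().fit(x, y)
print("slope m:", model.coef_[0])        # best-fit slope
print("intercept c:", model.intercept_)  # best-fit intercept
```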
Least Square Method

• The least squares method is a statistical procedure that finds
the best fit for a set of data points by minimizing the sum
of the squared offsets (residuals) of the points from the fitted
curve.
• Least squares regression is used to predict the behavior
of dependent variables.
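For the simple line y = m·x + c, the least squares estimates have a standard closed form (added here for reference):

```latex
m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
c = \bar{y} - m\,\bar{x}
```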
Cost Function

• The model aims to predict the y value such that the difference (error) between the predicted value
and the true value is minimal.
• The cost function (J) of Linear Regression is the Root Mean Squared Error (RMSE) between the
predicted y value (pred) and the true y value (y).
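Written out, following the slide's RMSE definition:

```latex
J(m, c) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{pred}_i - y_i\right)^2},
\qquad \mathrm{pred}_i = m\,x_i + c
```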

GRADIENT DESCENT IN
LINEAR REGRESSION

• An algorithm that minimizes a loss function by iteratively optimizing the
parameters
• New value = old value – step size
• New value = old value – learning rate × slope (gradient)
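A minimal sketch of this update rule for y = m·x + c; the data, learning rate, and iteration count are illustrative assumptions:

```python
# Minimal gradient descent sketch for y = m*x + c with an MSE loss (illustrative values).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m, c = 0.0, 0.0          # initial parameter values
learning_rate = 0.01     # alpha: the step-size multiplier
n = len(x)

for _ in range(1000):
    pred = m * x + c
    # Gradients of the MSE cost with respect to m and c
    dm = (2.0 / n) * np.sum((pred - y) * x)
    dc = (2.0 / n) * np.sum(pred - y)
    # new value = old value - learning rate * slope
    m -= learning_rate * dm
    c -= learning_rate * dc

print("m:", m, "c:", c)
```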
What is a Cost Function? (Linear Regression)

Alpha (α) – Learning Rate

• If the learning rate is too high, we might OVERSHOOT the minimum and keep bouncing around
without ever reaching it.
• If the learning rate is too small, training may take too long to converge.
• Blog to study:
• https://www.analyticsvidhya.com/blog/2021/08/understanding-gradient-descent-algorithm-and-the-maths-behind-it/
Plotting the Gradient Descent Algorithm
• When we have a single parameter (theta), we can plot the cost (the dependent
variable) on the y-axis and theta on the x-axis, as in the sketch below. If there are
two parameters, we can go with a 3-D plot, with cost on one axis and the two
parameters (thetas) along the other two axes.
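A small illustrative sketch of such a single-parameter plot (the data and the range of theta values are assumptions):

```python
# Sketch: plot cost J(theta) against a single parameter theta (illustrative data).
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x                               # assume the true slope is 2, intercept 0

thetas = np.linspace(0.0, 4.0, 100)       # candidate slope values
costs = [np.mean((t * x - y) ** 2) for t in thetas]   # MSE cost for each theta

plt.plot(thetas, costs)
plt.xlabel("theta (slope)")
plt.ylabel("cost J(theta)")
plt.title("Cost vs. a single parameter")
plt.show()
```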
Regularization in ML

• Problem: Overfitting
• Solution: Regularization. This is a form of regression that constrains/regularizes or shrinks the
coefficient estimates towards zero. In other words, this technique discourages learning a
more complex or flexible model, so as to avoid the risk of overfitting.
• A simple Linear Regression:
• y = mx + c
Regression Regularization

• In particular, regularization is implemented to avoid overfitting of the data, especially
when there is a large gap between train-set and test-set performance.
• With regularization, the number of features used in training is kept constant, yet the
magnitude of the coefficients (β), as seen in the equation, is reduced.
Intuition for Regression

• While there are quite a number of predictors, RM and RAD
have the largest coefficients. The implication is
that housing prices will be driven most significantly by these
two features, leading to overfitting, where generalizable
patterns have not been learned.
• There are different ways of reducing model complexity and
preventing overfitting in linear models. These include ridge
and lasso regression models.

Image of coefficients used to predict house prices (shown on the original slide).


Ridge Regression

• The RSS is modified by adding a shrinkage quantity.
• λ is the tuning parameter that decides how much we want to penalize the flexibility of
our model.
• Ridge adds the sum of the squares of the coefficients to the optimization objective. Thus, ridge
regression optimizes the following:
• Objective = RSS + α * (sum of squares of coefficients)
• The penalty term used by this method is known as the L2 norm.
• Ridge performs L2 regularization, i.e. it adds a penalty equivalent to the square of the magnitude of
the coefficients.
Ridge Regression

• Objective = RSS + α * (sum of squares of coefficients)
• α = 0:
• The objective becomes the same as simple linear regression.
• We’ll get the same coefficients as simple linear regression.
• α = ∞:
• The coefficients will be zero. Why? Because of the infinite weight on the squares of the coefficients, anything other than
zero will make the objective infinite.
• 0 < α < ∞:
• The magnitude of α decides the weight given to the different parts of the objective.
• The coefficients will be somewhere between 0 and those of simple linear regression.
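A minimal sketch of how α shrinks ridge coefficients in scikit-learn; the data and the particular alpha values are illustrative assumptions:

```python
# Sketch: effect of alpha on ridge coefficients (illustrative random data).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

print("OLS:              ", LinearRegression().fit(X, y).coef_)
for alpha in [0.01, 1.0, 100.0]:
    # Larger alpha -> stronger shrinkage of the coefficient magnitudes
    print(f"Ridge alpha={alpha:>6}:", Ridge(alpha=alpha).fit(X, y).coef_)
```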
LASSO
Least Absolute Shrinkage and Selection Operator

• This variation differs from ridge regression only in how it penalizes high coefficients.
• It uses |βj| (the modulus) instead of the squares of β as its penalty. In statistics, this is known as
the L1 norm.
• Lasso regression performs L1 regularization, i.e. it adds the sum of the absolute values
of the coefficients to the optimization objective. Thus, lasso regression optimizes the
following:
• Objective = RSS + α * (sum of absolute values of coefficients)
LASSO

• Like ridge, α can take various values. Let’s iterate through them briefly:
• α = 0: same coefficients as simple linear regression
• α = ∞: all coefficients zero (same logic as before)
• 0 < α < ∞: coefficients between 0 and those of simple linear regression
• Alpha (α) can be any real-valued number between zero and infinity; the larger the value,
the more aggressive the penalization.
• Lasso regression shrinks the coefficients and helps to reduce model complexity and
multi-collinearity.
Key Takeaway for Lasso: Lasso Regression for Model
Selection

• Because coefficients are shrunk towards a mean of zero, the less important
features in a dataset are eliminated when penalized.
• The shrinkage of these coefficients, based on the alpha value provided, leads to a form
of automatic feature selection, as input variables are removed in an effective way.
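A minimal sketch (on illustrative random data) of lasso driving the weak coefficients to exactly zero, which is what enables this feature selection:

```python
# Sketch: lasso zeroing out weak coefficients (illustrative random data).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features truly matter here.
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print("lasso coefficients:", lasso.coef_)   # the last three should be (near) zero
```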
Why Lasso can be Used for Model Selection, but not Ridge
Regression
• The elliptical contours (red circles) are the cost
functions for each.
• Since the lasso constraint region takes a diamond shape in
the plot, each time the elliptical contours intersect
one of its corners, at least one of the coefficients becomes zero. This
is impossible in the ridge regression model, as its constraint region
forms a circular shape; values can therefore
be shrunk close to zero, but never exactly to zero.
Conclusion
❑ The cost functions for ridge and lasso regression are similar. However,
ridge regression takes the square of the coefficients while lasso takes their
magnitude (absolute value).
❑ Lasso regression can be used for automatic feature selection, as the
geometry of its constrained region allows coefficient values to shrink to exactly zero.
❑ An alpha value of zero in either the ridge or lasso model gives results
identical to simple linear regression.
❑ The larger the alpha value, the more aggressive the penalization.
What is Hyperplane ??
• For a linearly separable dataset having n features, a hyperplane is basically an (n – 1)-dimensional
subspace used for separating the dataset into two sets, each set containing data points belonging to a
different class.
• For example, for a dataset having two features X and Y (therefore lying in a 2-dimensional space), the
separating hyperplane is a line (a 1-dimensional subspace).
• Similarly, for a dataset in 3 dimensions, we have a 2-dimensional separating hyperplane, and so
on.
• In machine learning, the Support Vector Machine (SVM) is a non-probabilistic, linear, binary classifier
used for classifying data by learning a hyperplane that separates the data.
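In symbols (standard notation, added here for reference), a hyperplane is the set of points satisfying

```latex
\mathbf{w} \cdot \mathbf{x} + b = 0
```

where w is the weight (normal) vector and b the bias; a point x is then classified by the sign of w·x + b.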
What is SVM ?

• The foundations of SVM were developed by Vladimir Vapnik (with Alexey Chervonenkis) in the 1960s–70s.
• Vapnik envisaged that coming up with a decision boundary that tries to maximize the margin
between the two classes would give great results and overcome the problem of overfitting.
• In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the
number of features you have). Then, we perform classification by finding the hyperplane that
best differentiates the two classes.
• Later, the kernel method was introduced, which made it possible to solve non-linear problems using SVM.
Find the Right Hyperplane: Scenarios
Margin (maximum)

Our objective is to find a plane that has the maximum
margin, i.e. the maximum distance between data points of
both classes.

Support vectors are the data points that lie closest to the
hyperplane and influence its position and orientation.
The margins are calculated using these data points, known as
support vectors; in other words, support vectors are the points
near the hyperplane that help in orienting it.
Algorithm
Intuition of SVM
• In SVM, we take the output of the linear function: if that output is greater than or equal to 1, we
identify the point with one class, and if it is less than or equal to -1, we identify it with the other class.

• Step 1: The SVM algorithm predicts the classes. One of the classes is identified as 1 while the
other is identified as -1.
Step 2

• Convert the problem into a mathematical equation involving unknowns. These unknowns
are then found by converting the problem into an optimization problem.
• As optimization problems always aim at maximizing or minimizing something while
tweaking the unknowns, in the case of the SVM classifier a loss function
known as the hinge loss is used and tweaked to find the maximum margin.
Step 3: Loss function

• If the cost function is zero, no class is predicted incorrectly.
• The problem is that there is a trade-off between maximizing the margin and the loss generated
if the margin is pushed too far. To bring these concepts together, a
regularization parameter is added (see the objective written out below).
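The regularized hinge-loss objective referred to here is usually written as follows (standard form, added for reference; λ is the regularization parameter and the labels yᵢ are ±1):

```latex
J(\mathbf{w}, b) = \lambda \lVert \mathbf{w} \rVert^2
+ \frac{1}{n} \sum_{i=1}^{n} \max\!\big(0,\; 1 - y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b)\big)
```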
Step 4: Partial derivative

• We take partial derivatives of this objective with respect to the weights to find the gradients. Using the
gradients, we can update our weights.
Understanding the loss function/svm/kernels in svm

• https://www.geeksforgeeks.org/hinge-loss-relationship-with-support-vector-machines/
• https://iq.opengenus.org/hinge-loss-for-svm/
• Kernel SVM
• https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/
Step 5: Update weight
• When there is no misclassification, i.e. our model correctly predicts the class of a data
point, we only have to update the gradient from the regularization parameter.
• When there is a misclassification, i.e. our model makes a mistake on the prediction of the
class of a data point, we include the loss along with the regularization parameter to
perform the gradient update (see the sketch below).
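A minimal sketch of this two-case update (sub-gradient descent on the hinge loss); the toy data, learning rate, and λ are illustrative assumptions:

```python
# Sketch: sub-gradient descent for a linear SVM with hinge loss (illustrative values).
import numpy as np

# Tiny linearly separable toy set; labels must be +1 / -1.
X = np.array([[2.0, 3.0], [3.0, 3.5], [1.0, 1.0], [0.5, 0.2]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])
b = 0.0
lr, lam = 0.01, 0.01          # learning rate and regularization strength (assumed)

for _ in range(1000):
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) >= 1:
            # Correctly classified with margin: only the regularization gradient.
            w -= lr * (2 * lam * w)
        else:
            # Misclassified / inside the margin: include the hinge-loss gradient too.
            w -= lr * (2 * lam * w - yi * xi)
            b -= lr * (-yi)

print("w:", w, "b:", b)
```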
Introduction to kernels

• When we can easily separate the data with a hyperplane by drawing a straight line, we use a Linear
SVM.
• When we cannot separate the data with a straight line, we use a Non-Linear SVM. For this, we
have kernel functions.
• A kernel transforms the data into another dimension so that the data can be classified.
• For example, it can transform two variables x and y into three variables by adding a third variable z.
Kernel Trick
The datasets you will be working on, or are
currently working on, might not always be
linear. One approach to handling
nonlinear datasets is to add more
features, such as polynomial features; in
some cases, this can result in a linearly
separable dataset.
Consider the left plot in Figure 1: it
represents a simple dataset with just one
feature x1. This dataset is not linearly
separable, as you can see. But if you add
a second feature x2 = (x1)², the resulting
2-D dataset is perfectly linearly separable, as in the sketch below.
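A minimal sketch of that idea on made-up 1-D data; the values and classifier settings are illustrative assumptions:

```python
# Sketch: adding x2 = x1**2 turns a non-separable 1-D dataset into a separable 2-D one.
import numpy as np
from sklearn.svm import LinearSVC

x1 = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1, 1, 0, 0, 0, 0, 0, 1, 1])   # class 1 only at the extremes

X = np.column_stack([x1, x1 ** 2])          # add the second feature x2 = x1^2
clf = LinearSVC(C=10.0, max_iter=10000).fit(X, y)
print("training accuracy:", clf.score(X, y))  # expected 1.0: now linearly separable
```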
Kernel Trick

Kernel functions (a quadratic function is one example):
• Polynomial Kernel
• Sigmoid Kernel
• RBF Kernel
Polynomial Kernel

• A polynomial kernel is a kind of SVM kernel that uses a polynomial function to map the
data into a higher-dimensional space. It does this by applying a polynomial function to the dot product
of the data points in the original space.
• The important terms to note are x1, x2, x1², x2², and x1·x2. When these new terms are
computed, the non-linear dataset is converted into a higher dimension that has the
features x1², x2², and x1·x2.
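For reference, the general polynomial kernel and its degree-2 expansion (standard formulas, added here) show where those terms come from:

```latex
K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + c)^{d},
\qquad
(x_1 y_1 + x_2 y_2)^2 = x_1^2 y_1^2 + x_2^2 y_2^2 + 2\,x_1 x_2\, y_1 y_2
```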
RBF Kernel

• The squared Euclidean distance between two points is multiplied
by the gamma parameter, and the exponential of the (negated) result is taken.
• where:
1. γ = 1 / (2σ²), and ‘σ’ is the variance, our hyperparameter
2. ||X₁ - X₂|| is the Euclidean (L₂-norm) distance
between the two points X₁ and X₂
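Written out (standard form of the RBF kernel, added for reference):

```latex
K(X_1, X_2) = \exp\!\big(-\gamma\,\lVert X_1 - X_2 \rVert^2\big),
\qquad \gamma = \frac{1}{2\sigma^2}
```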
“Gamma” parameter in the RBF kernel

• It controls the width of the Gaussian function used to map the input data into a higher-dimensional space.
• A small value of gamma means that the influence of each training example reaches relatively far, and the
decision boundary becomes smoother and more nearly linear.
• Conversely, a larger value of gamma means that the influence of each training example is relatively local,
and the decision boundary becomes more curved or nonlinear.
• Choosing the optimal value of gamma depends on the complexity of the dataset and the number of training
examples.
• If gamma is too small, there is a risk of underfitting the data, while if it is too high, there is a risk of
overfitting.
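A minimal sketch of how gamma is set in scikit-learn's SVC; the dataset and the particular gamma values are illustrative assumptions:

```python
# Sketch: comparing RBF-kernel SVMs with different gamma values (illustrative data).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in [0.01, 1.0, 100.0]:          # small -> smoother boundary, large -> more curved
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma:>6}: training accuracy = {clf.score(X, y):.2f}")
```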
