ML Lecture - 3


Lecture two

(linear regression with one variable)


linear regression with one variable cont.

 What is linear regression

 Model Representation

 Cost Function

 Gradient Descent
• Machine learning models employ algorithms to acquire knowledge from data, akin
to how humans gain insights from their experiences. These models can be
categorized into two primary groups according to the learning algorithm they
utilize, and further subcategorized based on their specific tasks and the
characteristics of their outputs.

Supervised learning techniques involve utilizing historical data that includes associated labels to construct the model.
Regression pertains to situations where the target variable to be forecasted is
continuous, such as student grades or diamond prices.
Classification, on the other hand, deals with scenarios where the target variable to
be predicted falls into discrete categories, like labeling incoming emails as spam or
not, determining a binary choice (Yes or No, True or False), or categorizing into
classes (0 or 1).
What is Linear Regression?

• Linear regression is a statistical method employed to anticipate the connection
between two variables. It operates under the assumption of a linear association
between the independent and dependent variables, seeking to identify the
optimal-fitting straight line that characterizes this relationship. The line's
placement is determined by minimizing the sum of squared disparities between
the projected values and the observed values.
• Linear regression is a widely applied technique across various domains,
encompassing economics, finance, and social sciences, to scrutinize and foretell
trends within data. Additionally, it can be extended to encompass multiple linear
regression, which involves multiple independent variables, and logistic regression,
primarily utilized for addressing binary classification tasks.
Simple linear regression involves a single independent variable and one dependent
variable.
The model calculates the slope and intercept of the best-fit line, which illustrates the
relationship between these variables. The slope signifies how the dependent variable
changes for each unit alteration in the independent variable, whereas the intercept
signifies the projected value of the dependent variable when the independent variable
equals zero.
Linear regression is a straightforward and fundamental statistical method employed in
predictive analysis within the realm of machine learning. It demonstrates the linear
connection between the independent (predictor) variable (X-axis) and the dependent
(output) variable (Y-axis), hence the term "linear regression." When there's just one input
variable (independent variable), this is referred to as simple linear regression.
The graph shows the linear relationship between the output (y) and predictor (X) variables. The blue line is referred to as the best-fit straight line. Based on the given data points, we attempt to plot the line that fits the points best.

To calculate the best-fit line, linear regression uses the traditional slope-intercept form given below:

Yi = β0 + β1 Xi

where Yi = dependent variable, β0 = constant/intercept, β1 = slope/coefficient, Xi = independent variable.
This algorithm explains the linear relationship between the dependent (output) variable Y and the independent (predictor) variable X using a straight line Y = β0 + β1 X.
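As a quick illustration (not from the slides), the slope-intercept form can be evaluated directly in code; the coefficient values below are hypothetical.

```python
# A minimal sketch: evaluating Y = B0 + B1 * X for illustrative coefficients.

def predict(x, b0, b1):
    """Predict the dependent variable Y from X using Y = B0 + B1 * X."""
    return b0 + b1 * x

# Hypothetical coefficients: intercept 5.0, slope 2.0
print(predict(10, b0=5.0, b1=2.0))  # 25.0
```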
Types of Regression Models

[Figure: +ve linear relationship, -ve linear relationship, relationship not linear, no relationship]


How does linear regression find the best-fit line?

The objective of the linear regression algorithm is to determine the optimal values for B0
and B1 in order to identify the most suitable line of fit. This best-fit line is defined as the one
with the minimal error, indicating that the disparity between the projected values and the
real values should be as small as possible.

Random Error (Residuals)
In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (ypredicted) is called the residual:
εi = yi – ypredicted
where ypredicted = B0 + B1 Xi
What is the best fit line?
In simple terms, the best fit line is a line that fits the given scatter plot in the best way. Mathematically, the best fit line is obtained by minimizing the Residual Sum of Squares (RSS).
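To make this concrete, here is a minimal sketch that computes the residuals and the RSS for a candidate line; the data points and coefficients are illustrative, not taken from the lecture.

```python
# Illustrative sketch: Residual Sum of Squares (RSS) for a candidate line y = b0 + b1 * x.

def rss(xs, ys, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_pred = b0 + b1 * x      # predicted value
        residual = y - y_pred     # observed minus predicted
        total += residual ** 2
    return total

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]

# The better-fitting line is the one with the smaller RSS.
print(rss(xs, ys, b0=0.0, b1=2.0))   # close fit  -> small RSS
print(rss(xs, ys, b0=0.0, b1=1.0))   # poor fit   -> larger RSS
```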
linear regression with one variable cont.

 What is linear regression

 Model Representation

 Cost Function

 Gradient Descent
Let us take the good old example of Housing Prices vs. Size. Let us plot it on a graph, and let us have some arbitrary training sets available for various prices (of the order of $1000) with respect to the size of the house (in sq. ft.).
In this context, we are examining the house prices in the Portland area, measured in thousands of
dollars ($1000s) on the y-axis, while the house's size in square feet is plotted on the x-axis. The "red
cross" symbols denote the random training data points for various houses, showing their prices
corresponding to their respective areas.
Now, let's assume you have a friend who is planning to sell his house, using the provided dataset.
Imagine that the size of your friend's house is 1250 square feet, and you want to predict the "actual
price" at which your friend's house can be sold. Naturally, you'd be keen to get an accurate estimate,
right? To do this, we aim to create a model by drawing a straight line that best fits the data, in order
to determine the optimal price at which your friend can sell his house.
So, by drawing a straight line through the arbitrary training sets, we made a linear plot in order to predict a reasonable price for your friend's house, making his life a little easier (thank Machine Learning for this!).
• As a result, when we plot this, it appears that your friend can potentially sell his house for
approximately $250,000. That's a substantial amount!

• This example falls under the category of a Supervised Learning Algorithm. You might
wonder why this qualifies as a Supervised Learning Algorithm example. It's quite simple: it
provided the "correct answer" for each data point in the dataset. Additionally, this is an
illustration of a Regression Problem. So, what is regression and why is it relevant here? Once
again, it's straightforward: regression is employed when predicting real-valued outcomes.
We'll delve into a different type of Supervised Learning Algorithm shortly, where we address
Classification problems that involve discrete valued output.

• For now, let's continue focusing on Regression. Here, we are examining the problem of
Housing Prices and Area using training sets in the Portland area.
You might be curious about what this "training set" is. Well, the training set is essentially the provided
data, consisting of the x and y values. It serves as the training data, and our task here is to predict the
housing price based on the given "area" in the training set.
In the diagram above, you can observe that 'm' represents the total number of training examples. For
this specific instance, let's assume that 'm' equals 47. The 'x's are referred to as the "input" variables,
which are commonly known as "features," while the 'y's are considered the "output" variables,
sometimes also called "target" variables. To denote a single training example, we often use (x, y), and
(x^(i), y^(i)) signifies the "i^(th)" training example. Please note that the superscript "i" within
parentheses does not indicate an exponent. It simply refers to the index within our training set. So, it's
not "x to the power of i and y to the power of i." It merely denotes the number of rows in the training
sets (as shown in the table above).
In the context of the table, for instance, x^(1) corresponds to "2104," the first value of x in row 1, and
the same principle applies to the y values as well (i.e., y^(1) = "460").
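A small sketch of this notation in code; only the first row (x^(1) = 2104, y^(1) = 460) comes from the table above, and the remaining values are made up for illustration.

```python
# Training-set notation as code. Only the first row (2104 -> 460) appears in the
# table above; the other rows are hypothetical.

X = [2104, 1416, 1534, 852]   # input variables / "features" (size in sq. ft.)
y = [460, 232, 315, 178]      # output / "target" variables (price in $1000s)

m = len(X)                    # m = number of training examples

# (x^(i), y^(i)) denotes the i-th training example; the slides index from 1,
# so x^(1) corresponds to X[0] in Python.
i = 1
print(f"x^({i}) = {X[i - 1]}, y^({i}) = {y[i - 1]}")   # x^(1) = 2104, y^(1) = 460
```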
Here's the sequence of steps followed by the Supervised Learning Algorithm in the context of
Regression. The training set is first processed by the prominent Learning Algorithm and then
passed through a function denoted as "h." What does "h" represent? Well, "h" is referred to as
the hypothesis.
A "hypothesis" is essentially a function that provides an estimated value of y based on x. In our
housing example, it gives us the estimated price of your friend's house concerning the house's
size. To put it simply, it's like establishing a connection between x and y.
We acknowledge that the term "hypothesis" may seem somewhat unusual and potentially
confusing due to its broader meaning, but it has been used in this manner in the field of machine
learning since its early days. It's become the standard terminology for Machine Learning
Algorithms.

How do we represent "h"?


linear regression with one variable cont.

 What is linear regression

 Model Representation

 Cost Function

 Gradient Descent
The primary objective in training a linear regression model is to minimize the cost function. By
finding the values of w and b that result in a small cost function, we achieve a model that
accurately predicts the target values. Minimizing the cost function involves adjusting the
parameters iteratively until convergence, using techniques such as gradient descent.

How to select θi’s ?


Suppose you have the following dataset with one feature

• 1. Plot all (Xi, Yi) pairs

• 2. The plot suggests how well the model will fit

[Scatter plot of the (Xi, Yi) pairs: Y axis 0–30, X axis 0–60]
How you can assess the best fit

We try to minimize the value of the cost function to get the best fit.

[Scatter plot with candidate fit line: X axis 0–60]
Least Squares
• The least squares method is a form of mathematical regression analysis used to
determine the line of best fit for a set of data, providing a visual demonstration of the
relationship between the data points. Each point of data represents the relationship
between a known independent variable and an unknown dependent variable.
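As a hedged sketch of the least squares method for a single feature, the slope and intercept of the best-fit line can be computed in closed form; the data values below are illustrative.

```python
# Closed-form least squares fit for one feature:
#   b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b0 = mean_y - b1 * mean_x

def least_squares_fit(xs, ys):
    """Return (b0, b1) of the least squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                 # slope
    b0 = mean_y - b1 * mean_x      # intercept
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0]          # illustrative data
ys = [2.1, 4.0, 6.2, 7.9]
print(least_squares_fit(xs, ys))   # line with the minimum RSS for this data
```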

Least Squares Graphically

[Figure: least squares shown graphically]

More than one feature case:

Minimize J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

where h_θ(x^(i)) are the predictions on the training set and y^(i) are the actual values.
To gain a better understanding, let’s visualize how the cost function changes with different values of w.
We simplify the model by setting b to 0, resulting in the function f(x) = w * x.
The cost function J(w) depends only on w and measures the squared difference between f(x) and y
for each training example.

Example: Exploring the Cost Function

Suppose we have a training set with three points (1, 1), (2, 2), and (3, 3). We plot the function f(x) = w * x
for different values of w and calculate the corresponding cost function J(w).
• When w = 1: f(x) = x, the line passes
through the origin, perfectly fitting the
data.
• The cost function J(w) is 0 since f(x)
equals y for all training examples.
• Setting w = 0.5: f(x) = 0.5 * x, the line has a smaller slope. The cost function J(w) now measures the squared errors between f(x) and y for each example. It provides a measure of how well the line fits the data.
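The cost values discussed above can be reproduced with a few lines of code; this sketch uses the same three training points and the cost J(w) defined earlier.

```python
# Training set (1, 1), (2, 2), (3, 3); model f(x) = w * x with b fixed at 0;
# cost J(w) = (1 / (2m)) * sum((f(x_i) - y_i)^2).

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def cost(w):
    m = len(xs)
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(1.0))   # 0.0   -> f(x) = x passes through every training point
print(cost(0.5))   # ~0.58 -> squared errors grow as the slope moves away from 1
```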
The relationship between the function f(x) and the cost function J(w) in linear regression can be visualized
by plotting them side by side. The cost function J(w) measures the error between the predicted values of
the model and the actual values in the training dataset. The goal is to find the optimal parameter values
(weights) w that minimize the cost function. By minimizing the cost function, we find the best-fitting line
that minimizes prediction errors. This is typically done using optimization algorithms like gradient descent.
The plot helps us understand how changes in the parameters w affect the cost function and the relationship
between the model’s predictions and the associated error.
linear regression with one variable cont.

 What is linear regression

 Model Representation

 Cost Function

 Gradient Descent
Gradient descent is an iterative solution used not only in linear regression; it is actually used all over the place in machine learning.

 Objective: minimize any function (the cost function J)


Example
Imagine that this is the landscape of a grassy park, and you want to get to the lowest point in the park as rapidly as possible.

[Figure: surface plot of J(θ0, θ1); red means high, blue means low. From the starting point, gradient descent walks downhill to a local minimum.]

[Figure: the same surface with a different starting point; gradient descent reaches a different local minimum.]
Gradient Descent Algorithm

θ1 := θ1 − α · (d/dθ1) J(θ1)

[Plot of J(θ1): positive slope]
With a positive slope, θ1 := θ1 − α·(+ve number), so θ1 decreases.

[Plot of J(θ1): negative slope]
With a negative slope, θ1 := θ1 − α·(−ve number), so θ1 increases.
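As a toy illustration of the update rule (not part of the lecture), the sketch below applies θ1 := θ1 − α·dJ/dθ1 to a simple one-parameter cost J(θ1) = (θ1 − 3)², so the effect of the slope's sign is easy to see.

```python
# Toy example: gradient descent on J(theta1) = (theta1 - 3)^2, minimum at theta1 = 3.

def dJ(theta1):
    return 2.0 * (theta1 - 3.0)    # derivative (slope) of J at theta1

alpha = 0.1
for start in (5.0, 0.0):           # one start right of the minimum, one left of it
    theta1 = start
    for _ in range(50):
        # +ve slope moves theta1 down, -ve slope moves it up
        theta1 = theta1 - alpha * dJ(theta1)
    print(start, "->", round(theta1, 4))   # both starts converge toward 3
```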
Linear regression case
d/dθj J(θ0, θ1) = d/dθj [ (1/2m) Σ_{i=1}^{m} (h_θ(x_i) − y_i)² ]
              = d/dθj [ (1/2m) Σ_{i=1}^{m} (θ0 + θ1·x_i − y_i)² ]

j = 0:  d/dθ0 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i)

j = 1:  d/dθ1 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (h_θ(x_i) − y_i)·x_i
Example after implementing some iterations of gradient descent

[Figure: the fitted line after iterations 1 through 5 of gradient descent]
