ML Lecture - 3
Model Representation
Cost Function
Gradient Descent
Linear Regression with One Variable (cont.)
Model Representation
• Machine learning models employ algorithms to acquire knowledge from data, akin
to how humans gain insights from their experiences. These models can be
categorized into two primary groups according to the learning algorithm they
utilize, and further subcategorized based on their specific tasks and the
characteristics of their outputs.
To calculate the best-fit line, linear regression uses the traditional slope-intercept form:

Y = B0 + B1 * X

where B0 is the intercept and B1 is the slope of the line.
The objective of the linear regression algorithm is to determine the optimal values for B0
and B1 in order to identify the most suitable line of fit. This best-fit line is defined as the one
with the minimal error, indicating that the disparity between the projected values and the
real values should be as small as possible.
Random Error (Residuals)
In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (ŷi) is called the residual:
εi = yi − ŷi, where ŷi = B0 + B1 * xi
What is the best fit line?
In simple terms, the best fit line is the line that fits the given scatter plot most closely. Mathematically, the best fit line is obtained by minimizing the Residual Sum of Squares (RSS), the sum of the squared residuals over all data points:
RSS = Σ εi² = Σ (yi − ŷi)²
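To make this concrete, here is a minimal sketch in Python of how the residuals and RSS could be computed for a candidate line; the housing-style data values below are assumptions chosen purely for illustration.

```python
# Minimal sketch: residuals and RSS for a candidate line y_pred = B0 + B1 * x.
# The data values below are made up for illustration only.

def rss(x, y, b0, b1):
    """Residual Sum of Squares for the line y_pred = b0 + b1 * x."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(e ** 2 for e in residuals)

sizes = [1000, 1500, 2000, 2500]    # hypothetical house sizes (sq. ft.)
prices = [200, 280, 350, 430]       # hypothetical prices (in $1000s)

print(rss(sizes, prices, 50.0, 0.15))   # RSS of one candidate line
print(rss(sizes, prices, 10.0, 0.10))   # a worse candidate gives a larger RSS
```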
Linear Regression with One Variable (cont.)
Let us take the good old Housing Prices vs. Size example. We plot it on a graph, with some arbitrary training examples available for various prices (in units of $1000) with respect to the size of the house (in sq. ft.).
In this context, we are examining the house prices in the Portland area, measured in thousands of
dollars ($1000s) on the y-axis, while the house's size in square feet is plotted on the x-axis. The "red
cross" symbols denote the random training data points for various houses, showing their prices
corresponding to their respective areas.
Now, let's assume you have a friend who is planning to sell his house, and that we want to use the provided dataset to help him.
Imagine that the size of your friend's house is 1250 square feet, and you want to predict the "actual
price" at which your friend's house can be sold. Naturally, you'd be keen to get an accurate estimate,
right? To do this, we aim to create a model by drawing a straight line that best fits the data, in order
to determine the optimal price at which your friend can sell his house.
So, by drawing a straight line through the arbitrary training examples, we obtain a linear fit that lets us predict a reasonable price for your friend's house, making his life that much easier (thank Machine Learning for this!).
• As a result, when we plot this, it appears that your friend can potentially sell his house for
approximately $250,000. That's a substantial amount!
• This example falls under the category of a Supervised Learning Algorithm. You might wonder why this qualifies as a Supervised Learning Algorithm example. It's quite simple: the dataset provides the "correct answer" (the actual price) for each data point. Additionally, this is an illustration of a Regression Problem. So, what is regression and why is it relevant here? Once again, it's straightforward: regression is employed when predicting real-valued outcomes. We'll delve into a different type of Supervised Learning problem shortly, where we address Classification problems that involve discrete-valued outputs.
• For now, let's continue focusing on Regression. Here, we are examining the problem of
Housing Prices and Area using training sets in the Portland area.
You might be curious about what this "training set" is. Well, the training set is essentially the provided
data, consisting of the x and y values. It serves as the training data, and our task here is to predict the
housing price based on the given "area" in the training set.
In the diagram above, you can observe that 'm' represents the total number of training examples. For
this specific instance, let's assume that 'm' equals 47. The 'x's are referred to as the "input" variables,
which are commonly known as "features," while the 'y's are considered the "output" variables,
sometimes also called "target" variables. To denote a single training example, we often use (x, y), and
(x^(i), y^(i)) signifies the "i^(th)" training example. Please note that the superscript "i" within parentheses does not indicate an exponent. It simply refers to the index within our training set. So, it's not "x to the power of i and y to the power of i"; it merely denotes the row number in the training set (as shown in the table above).
In the context of the table, for instance, x^(1) corresponds to "2104," the first value of x in row 1, and
the same principle applies to the y values as well (i.e., y^(1) = "460").
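As a small illustration of the notation, the training set can be held as two parallel lists; only the first pair (2104, 460) comes from the table in the lecture, and the remaining values are placeholders.

```python
# The training set as parallel lists: x^(i) is the i-th size, y^(i) is the i-th price.
# Only the first pair (2104, 460) is from the lecture table; the rest are placeholders.

x = [2104, 1416, 1534, 852]   # sizes in sq. ft.
y = [460, 232, 315, 178]      # prices in $1000s

m = len(x)         # number of training examples (the lecture assumes m = 47)
print(m)           # 4 in this toy snippet
print(x[0], y[0])  # x^(1) = 2104, y^(1) = 460 (the notes index examples from 1)
```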
Here's the sequence of steps followed by the Supervised Learning Algorithm in the context of Regression. The training set is first fed to the learning algorithm, which outputs a function denoted "h." What does "h" represent? Well, "h" is referred to as the hypothesis.
A "hypothesis" is essentially a function that provides an estimated value of y based on x. In our
housing example, it gives us the estimated price of your friend's house concerning the house's
size. To put it simply, it's like establishing a connection between x and y.
We acknowledge that the term "hypothesis" may seem somewhat unusual and potentially
confusing due to its broader meaning, but it has been used in this manner in the field of machine
learning since its early days. It's become the standard terminology for Machine Learning
Algorithms.
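To make the notation concrete, here is a minimal sketch of the hypothesis as a Python function; the parameter values used in the example call are arbitrary placeholders rather than learned values.

```python
# Sketch of the hypothesis h(x) = theta0 + theta1 * x for one-variable linear regression.

def h(x, theta0, theta1):
    """Estimated price (in $1000s) for a house of size x (in sq. ft.)."""
    return theta0 + theta1 * x

# Predicting the price of the friend's 1250 sq. ft. house with placeholder parameters:
print(h(1250, 50.0, 0.16))   # 250.0, i.e. roughly $250,000
```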
Cost Function
The primary objective in training a linear regression model is to minimize the cost function. By
finding the values of w and b that result in a small cost function, we achieve a model that
accurately predicts the target values. Minimizing the cost function involves adjusting the
parameters iteratively until convergence, using techniques such as gradient descent.
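As a sketch of what "minimizing the cost function" means, the function below evaluates the usual squared-error cost J(w, b) = (1 / 2m) Σ (f(x_i) − y_i)² for a given choice of w and b; the training data here is again assumed for illustration.

```python
# Squared-error cost for f(x) = w * x + b:
#   J(w, b) = (1 / (2 * m)) * sum over i of (f(x_i) - y_i)^2

def cost(w, b, x, y):
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

sizes = [1000, 1500, 2000, 2500]   # hypothetical sizes (sq. ft.)
prices = [200, 280, 350, 430]      # hypothetical prices ($1000s)

print(cost(0.15, 50.0, sizes, prices))  # cost of one (w, b) choice
print(cost(0.10, 10.0, sizes, prices))  # a worse choice gives a larger cost
```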
[Scatter plot of Y (0 to 30) versus X (0 to 60), used to illustrate how to assess the best fit.]
Least Squares
• The least squares method is a form of mathematical regression analysis used to
determine the line of best fit for a set of data, providing a visual demonstration of the
relationship between the data points. Each point of data represents the relationship
between a known independent variable and an unknown dependent variable.
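For the single-feature case, the least squares line can be computed in closed form with the standard formulas B1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and B0 = ȳ − B1·x̄; the sketch below applies them to made-up data.

```python
# Closed-form least squares fit for one feature:
#   B1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   B0 = y_mean - B1 * x_mean

def least_squares_fit(x, y):
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b1 = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
          / sum((xi - x_mean) ** 2 for xi in x))
    b0 = y_mean - b1 * x_mean
    return b0, b1

sizes = [1000, 1500, 2000, 2500]   # hypothetical data
prices = [200, 280, 350, 430]
b0, b1 = least_squares_fit(sizes, prices)
print(b0, b1)   # intercept and slope of the best-fit line
```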
Least Squares Graphically

In the case of more than one feature, the idea is the same: minimize, over the parameters θ, the sum of squared differences between the predictions on the training set and the actual values:

min over θ: Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
To gain a better understanding, let’s visualize how the cost function changes with different values of w.
We simplify the model by setting b to 0, resulting in the function f(x) = w * x.
The cost function J(w) depends only on w and measures the squared difference between f(x) and y
for each training example.
Suppose we have a training set with three points (1, 1), (2, 2), and (3, 3). We plot the function f(x) = w * x
for different values of w and calculate the corresponding cost function J(w).
• When w = 1: f(x) = x, the line passes
through the origin, perfectly fitting the
data.
• The cost function J(w) is 0 since f(x)
equals y for all training examples.
• Setting w = 0.5: f(x) = 0.5 * x, the line has a smaller slope. The cost function J(w) now measures the squared errors between f(x) and y for each example, providing a measure of how well the line fits the data. For this training set, J(0.5) = (1/6)[(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] ≈ 0.58 (see the sketch below).
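A small sketch of this picture: the loop below evaluates J(w) on the three points above for a handful of w values (the grid is arbitrary) and traces out the bowl shape with its minimum at w = 1.

```python
# Evaluating J(w) for f(x) = w * x on the points (1, 1), (2, 2), (3, 3):
#   J(w) = (1 / (2 * m)) * sum over i of (w * x_i - y_i)^2

x = [1, 2, 3]
y = [1, 2, 3]
m = len(x)

for w in [0.0, 0.5, 1.0, 1.5, 2.0]:   # arbitrary grid of slopes
    j = sum((w * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)
    print(f"w = {w:.1f}: J(w) = {j:.3f}")
# Output traces out a bowl:
# J(0.0) ≈ 2.333, J(0.5) ≈ 0.583, J(1.0) = 0, J(1.5) ≈ 0.583, J(2.0) ≈ 2.333
```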
The relationship between the function f(x) and the cost function J(w) in linear regression can be visualized
by plotting them side by side. The cost function J(w) measures the error between the predicted values of
the model and the actual values in the training dataset. The goal is to find the optimal parameter values
(weights) w that minimize the cost function. By minimizing the cost function, we find the best-fitting line
that minimizes prediction errors. This is typically done using optimization algorithms like gradient descent.
The plot helps us understand how changes in the parameters w affect the cost function and the relationship
between the model’s predictions and the associated error.
Linear Regression with One Variable (cont.)
Gradient Descent
Gradient descent is an iterative solution that is not specific to linear regression; it is actually used all over the place in machine learning.
[Surface plot of the cost J over the parameters: red regions indicate high cost, blue regions indicate low cost. Starting from one point, gradient descent walks downhill to a local minimum; with a different starting point, it can end up at a different local minimum.]
Gradient Descent Algorithm

Repeat until convergence (shown here for a single parameter θ1):

θ1 := θ1 − α · dJ(θ1)/dθ1

If the slope dJ(θ1)/dθ1 is positive, the update decreases θ1; if the slope is negative, the update θ1 := θ1 − α · (negative value) increases θ1. Either way, θ1 moves toward the minimum of J(θ1).
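As a small worked example (the numbers are chosen purely for illustration): suppose the current value is θ1 = 0.5, the learning rate is α = 0.1, and the slope at that point is dJ(θ1)/dθ1 = −2. The update gives θ1 := 0.5 − 0.1 · (−2) = 0.7, so θ1 moves to the right, toward the minimum. If the slope had instead been +2, the update would give θ1 := 0.5 − 0.1 · 2 = 0.3, moving θ1 to the left.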
Linear regression case

The cost function is J(θ0, θ1) = (1 / 2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))², with h_θ(x) = θ0 + θ1x, so:

∂/∂θ_j J(θ0, θ1) = ∂/∂θ_j (1 / 2m) · Σ_{i=1}^{m} (θ0 + θ1x^(i) − y^(i))²

j = 0:  ∂/∂θ0 J(θ0, θ1) = (1 / m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))

j = 1:  ∂/∂θ1 J(θ0, θ1) = (1 / m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
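As a sketch of how these update rules translate into code, the loop below runs batch gradient descent on the toy three-point data set; the learning rate and iteration count are illustrative choices, not values from the lecture.

```python
# Batch gradient descent for one-variable linear regression.
# Both parameters are updated simultaneously in each iteration, following the
# derivative formulas above.

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m                                # dJ/d(theta0)
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m    # dJ/d(theta1)
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = [1, 2, 3]
y = [1, 2, 3]
print(gradient_descent(x, y))   # converges to approximately (0.0, 1.0)
```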
Example after implementing some iterations of gradient descent:
[Plots for iteration 1 through iteration 5, showing the fitted line improving with each iteration.]