Machine Learning Algorithms

Linear Regression with One Variable

 Model Representation

 Cost Function

 Gradient Descent
Housing Prices

Figure: scatter plot of housing prices — price in $1000's (dependent variable, y-axis, 0 to 500) versus house size (independent variable, x-axis).

Supervised learning: the "right answers" (labeled data) are given.
Regression: predict a continuous-valued output (the price).

Training set of housing prices (Portland, OR):

    Size in feet² (x)    Price ($) in 1000's (y)
    2104                 460
    1416                 232
    1534                 315
    852                  178
    ...                  ...

(the number of rows is m, the number of training examples)

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
(x, y) = one training example (one row)
(x^(i), y^(i)) = the i-th training example

Examples from the table above: x^(1) = 2104, y^(2) = 232, x^(4) = 852
Training set → Learning algorithm → h

The job of a learning algorithm is to output a function, usually denoted by a
lowercase h, where h stands for "hypothesis".

x → h → y

The job of the hypothesis function is to take the value of x and output an
estimate of y. So h is a function that maps from x's to y's.
How do we represent h ?

Linear Equations

    y = f(x) = θ0 + θ1·x

θ1 = slope = change in Y / change in X (ΔY / ΔX)
θ0 = Y-intercept
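As a concrete illustration, here is a minimal sketch of this hypothesis as a Python function; the intercept and slope in the example call are made-up values, not taken from the slides.

    # Minimal sketch: the straight-line hypothesis h_theta(x) = theta0 + theta1 * x.
    def h(x, theta0, theta1):
        """Predict y for a single input x."""
        return theta0 + theta1 * x

    # Example with illustrative (made-up) parameter values:
    print(h(2104, 50.0, 0.2))   # 50 + 0.2 * 2104 = 470.8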
Types of Regression Models

Figure: four example scatter plots —
  Positive Linear Relationship
  Negative Linear Relationship
  Relationship NOT Linear
  No Relationship

The cost function lets us figure out how to fit the best possible straight line
to our data.
Training Set

    Size in feet² (x)    Price ($) in 1000's (y)
    2104                 460
    1416                 232
    1534                 315
    852                  178
    ...                  ...

Hypothesis: hθ(x) = θ0 + θ1·x

How do we choose the θi's?
Scatter plot

• 1. Plot of all (xᵢ, yᵢ) pairs
• 2. Suggests how well the model will fit

Figure: scatter plot of the (x, y) points.
Thinking Challenge

How would you draw a line through the points?
How do you determine which line 'fits best'?

Figures: the same scatter plot shown with several candidate lines, obtained by
changing the slope and/or the intercept (intercept unchanged, slope unchanged,
intercept changed, slope changed).
Least Squares

• 1. 'Best fit' means the difference between actual Y values and predicted Y
  values is a minimum. So square the errors!

      Σ_{i=1}^{m} (y_i − h(x_i))² = Σ_{i=1}^{m} ε̂_i²

• 2. Least squares (LS) minimizes the sum of the squared differences (errors),
  the SSE.
Least Squares Graphically

LS minimizes  Σ_{i=1}^{n} ε̂_i² = ε̂_1² + ε̂_2² + ε̂_3² + ε̂_4²

Figure: four data points, the fitted line hθ(x_i) = θ0 + θ1·x_i, and the
vertical errors ε̂_1 … ε̂_4 between each point and the line
(e.g. y_2 = θ0 + θ1·x_2 + ε̂_2 for the second point).
Least Squared Errors Linear Regression

Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).

Minimize over θ0, θ1:

    (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

where the hθ(x^(i)) are the predictions on the training set and the y^(i) are
the actual values.
Cost function visualization

Consider a simplified hypothesis by setting θ0 = 0, so h becomes hθ(x) = θ1·x.
Each value of θ1 corresponds to a different hypothesis, since it is the slope of
the line; these are lines passing through the origin, because the y-intercept θ0
is nulled out.

For a small training set (the three points (1,1), (2,2), (3,3), which are the
points that reproduce the numbers below):

    At θ1 = 2:    J(2)   = (1/(2·3)) (1² + 2² + 3²)      ≈ 2.33
    At θ1 = 1:    J(1)   = (1/(2·3)) (0² + 0² + 0²)      = 0
    At θ1 = 0.5:  J(0.5) = (1/(2·3)) (0.5² + 1² + 1.5²)  ≈ 0.58

Plotting more points like this gives the following graph of the cost function,
which depends on the parameter θ1; each value of θ1 corresponds to a different
hypothesis.

Figure: J(θ1) plotted against θ1 — a bowl-shaped curve with its minimum at θ1 = 1.
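These values are easy to reproduce; the following is a small sketch that assumes the three training points (1,1), (2,2), (3,3) used above.

    # Simplified cost J(theta1) with theta0 = 0, for the assumed points (1,1), (2,2), (3,3).
    xs = [1.0, 2.0, 3.0]
    ys = [1.0, 2.0, 3.0]
    m = len(xs)

    def J(theta1):
        return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    for t in (0.5, 1.0, 2.0):
        print(t, round(J(t), 2))   # 0.5 -> 0.58, 1.0 -> 0.0, 2.0 -> 2.33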
Cost function visualization

What is the optimal value of θ1, i.e. the one that minimizes J(θ1)?

It is clear that the best value is θ1 = 1, since J(1) = 0, which is the minimum.

How do we find the best value for θ1?
Plotting? Not practical, especially in high dimensions.

The solution:
1. Analytical solution (see the sketch after this list): not applicable for
   large datasets.
2. Numerical solution, e.g. gradient descent.
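For completeness, the analytical solution mentioned in point 1 is usually written as the normal equation. The sketch below assumes NumPy and reuses the housing table from earlier; it is not the method these notes focus on.

    import numpy as np

    # Assumed toy data: the four (size, price) rows from the housing table.
    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    # Design matrix with a leading column of ones for the intercept theta0.
    X = np.column_stack([np.ones_like(x), x])

    # Normal equation: theta = (X^T X)^(-1) X^T y, solved without an explicit inverse.
    theta = np.linalg.solve(X.T @ X, X.T @ y)
    print(theta)   # [theta0, theta1]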
Hypothesis:      hθ(x) = θ0 + θ1·x

Parameters:      θ0, θ1

Cost function:   J(θ0, θ1) = (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²

Goal:            minimize J(θ0, θ1) over θ0, θ1
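A minimal sketch of this cost function in Python (function and variable names are choices made here, not from the slides):

    # J(theta0, theta1) = (1/2m) * sum over i of (h_theta(x_i) - y_i)^2
    def compute_cost(xs, ys, theta0, theta1):
        m = len(xs)
        total = 0.0
        for x, y in zip(xs, ys):
            error = (theta0 + theta1 * x) - y   # h_theta(x) - y
            total += error ** 2
        return total / (2 * m)

    # Example with the housing data from the training-set table:
    sizes = [2104.0, 1416.0, 1534.0, 852.0]
    prices = [460.0, 232.0, 315.0, 178.0]
    print(compute_cost(sizes, prices, 0.0, 0.2))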
Gradient Descent

An iterative solution, and not only for linear regression — it is actually used
all over the place in machine learning.

 Objective: minimize any function (here, the cost function J)

Have some function J(θ0, θ1)
Want:  min over θ0, θ1 of J(θ0, θ1)

Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum

Imagine that this is the landscape of a grassy park, and you want to get to the
lowest point in the park as rapidly as possible.

Figure: surface plot of J(θ0, θ1) — red means high, blue means low. Starting
from one point, gradient descent walks downhill to a local minimum; with a
different starting point it may end up at a different local minimum.
Gradient descent algorithm

repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1)
}

• Where:
  o := is the assignment operator
  o α is the learning rate, which basically defines how big the steps are
    during the descent
  o ∂/∂θj J(θ0, θ1) is the partial derivative term
  o j = 0, 1 represents the feature index number

Also, the parameters should be updated simultaneously, i.e.:

    temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
    temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1
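A sketch of a single simultaneous update step, where d_theta0 and d_theta1 stand for the two partial derivatives of J at the current parameters (how to compute them for linear regression is derived below):

    def gradient_step(theta0, theta1, d_theta0, d_theta1, alpha):
        # Compute both new values from the same old (theta0, theta1) ...
        temp0 = theta0 - alpha * d_theta0
        temp1 = theta1 - alpha * d_theta1
        # ... and only then assign them, so the update is simultaneous.
        return temp0, temp1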
Intuition for the update θ1 := θ1 − α · d/dθ1 J(θ1):

Figure: J(θ1) with a positive slope at the current θ1 — the derivative is
positive, so θ1 := θ1 − α·(+ve) decreases θ1, moving it toward the minimum.

Figure: J(θ1) with a negative slope at the current θ1 — the derivative is
negative, so θ1 := θ1 − α·(−ve) increases θ1, again moving it toward the minimum.
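A small numeric check of this sign intuition, reusing the assumed (1,1), (2,2), (3,3) points from the cost-function visualization:

    xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
    m = len(xs)

    def dJ(theta1):
        # d/dtheta1 J(theta1) = (1/m) * sum of (theta1*x - y) * x
        return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

    alpha = 0.1
    for theta1 in (2.0, 0.5):                 # right of the minimum, then left of it
        print(theta1, "->", theta1 - alpha * dJ(theta1))
    # 2.0 -> 1.53...  (positive slope, theta1 decreases toward 1)
    # 0.5 -> 0.73...  (negative slope, theta1 increases toward 1)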
Gradient descent algorithm for the linear regression model

repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)      (for j = 0 and j = 1)
}

The partial derivative term:

    ∂/∂θj J(θ0, θ1) = ∂/∂θj (1/2m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))²
                    = ∂/∂θj (1/2m) Σ_{i=1}^{m} (θ0 + θ1·x^(i) − y^(i))²

j = 0:   ∂/∂θ0 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))

j = 1:   ∂/∂θ1 J(θ0, θ1) = (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
Gradient descent algorithm

repeat until convergence {
    θ0 := θ0 − α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) Σ_{i=1}^{m} (hθ(x^(i)) − y^(i)) · x^(i)
}
"Batch" Gradient Descent

"Batch": Each step of gradient descent


uses all the training examples.
repeat until convergence {
m
9o : Oo

1
am w {i )
£ (M* ) y )
-

m
9\ 91 i
am E { ho { x {l)
)- y ) - x
{i)
^
}
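Putting the pieces together, here is a minimal batch gradient descent sketch; the learning rate and iteration count in the example call are illustrative choices, not values from the slides.

    def batch_gradient_descent(xs, ys, alpha, num_iters):
        m = len(xs)
        theta0, theta1 = 0.0, 0.0
        for _ in range(num_iters):
            # "Batch": each step uses all m training examples.
            errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
            d_theta0 = sum(errors) / m
            d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
            # Simultaneous update of both parameters.
            theta0, theta1 = theta0 - alpha * d_theta0, theta1 - alpha * d_theta1
        return theta0, theta1

    # Example on the small assumed dataset (1,1), (2,2), (3,3):
    print(batch_gradient_descent([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], alpha=0.1, num_iters=1000))
    # -> approximately (0.0, 1.0)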
Example: fit after several iterations of gradient descent

Figures: iterations 1 through 11 — each plot shows the data (x from −2.0 to 2.0)
together with the fitted line hθ(x) after that iteration.
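A sketch of how plots like these could be produced: record the parameters after every iteration and draw the corresponding line. The data, learning rate, and use of matplotlib here are assumptions, not taken from the slides.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    x = np.linspace(-2.0, 2.0, 40)                               # x range matching the plots above
    y = 0.1 * x + 0.35 + 0.03 * rng.standard_normal(x.size)      # made-up noisy line

    theta0, theta1, lr = 0.0, 0.0, 0.1
    for it in range(1, 12):                                      # iterations 1..11
        errors = (theta0 + theta1 * x) - y
        # Simultaneous batch update using all points.
        theta0, theta1 = theta0 - lr * errors.mean(), theta1 - lr * (errors * x).mean()
        plt.plot(x, theta0 + theta1 * x, linewidth=1, label=f"iteration {it}")

    plt.scatter(x, y, s=10, color="black")
    plt.legend(fontsize=6)
    plt.show()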
Thanks
