Optimization PPT - Part-2


OPTIMIZATION TECHNIQUES

(19MAT202)

Dr. SAYANTAN MANDAL

DEPARTMENT OF MATHEMATICS
Gradient-based Method
Line Search in Multidimensional
Optimization
• One-dimensional search methods play an
important role in multidimensional optimization
problems.
• In particular, iterative algorithms for solving such
optimization problems (to be discussed in the
upcoming classes) typically involve a line search
at every iteration.
• To be specific, let f : Rⁿ → R be a function that we wish to minimize.
• Iterative algorithms for finding a minimizer of f are of the form
x^(k+1) = x^(k) + α_k d^(k),
where d^(k) is a search direction and α_k ≥ 0 is the step size.
• Note that the choice of α_k involves a one-dimensional minimization: α_k is chosen to minimize φ_k(α) := f(x^(k) + α d^(k)).
• This choice ensures that, under appropriate conditions, f(x^(k+1)) < f(x^(k)).
Multidimensional Optimization

• Any of the one-dimensional methods discussed in this chapter can be used to minimize φ_k.
• We may, for example, use the secant method to find α_k.
• In this case we need the derivative of φ_k, which is
φ_k'(α) = d^(k)ᵀ ∇f(x^(k) + α d^(k)),
the directional derivative of f in the direction d^(k), evaluated at x^(k) + α d^(k).


Multidimensional Optimization

• Therefore, applying the secant method for the line search requires the gradient ∇f, the initial line-search point x^(k), and the search direction d^(k) (see the sketch below).
• Of course, other one-dimensional search methods may be used for the line search.
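For illustration, here is a minimal Python sketch (not part of the original slides) of a secant-based line search; grad_f, x_k, and d_k are assumed placeholders for the gradient function, the current point, and the search direction, and the secant method is applied to the directional derivative φ_k'(α).

```python
import numpy as np

def secant_line_search(grad_f, x_k, d_k, a0=0.0, a1=1.0, tol=1e-6, max_iter=50):
    """Approximate alpha_k = argmin_{a >= 0} f(x_k + a*d_k) by applying the
    secant method to phi'(a) = d_k . grad_f(x_k + a*d_k)."""
    dphi = lambda a: np.dot(grad_f(x_k + a * d_k), d_k)
    for _ in range(max_iter):
        g0, g1 = dphi(a0), dphi(a1)
        if abs(g1 - g0) < 1e-15:              # secant slope vanished; stop
            break
        a2 = a1 - g1 * (a1 - a0) / (g1 - g0)  # secant update for phi'(a) = 0
        if abs(a2 - a1) < tol:
            return a2
        a0, a1 = a1, a2
    return a1
```

Only gradient evaluations of f along the ray x_k + α d_k are needed, which is exactly the information listed above.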
Gradient Methods
• In this chapter we consider a class of search methods for real-valued functions on Rⁿ.
• These methods use the gradient of the given function.
• We use such terms as level sets, normal vectors, and tangent vectors.
• A level set of a function f : Rⁿ → R is the set of points x satisfying f(x) = c for some constant c.
• Thus, a point x₀ ∈ Rⁿ is on the level set corresponding to level c if f(x₀) = c.
Gradient Methods
• In the case of functions of two real variables, f : R² → R, the notion of the level set is illustrated in the following figure.
Gradient Methods
• The gradient of f at x₀, denoted ∇f(x₀), if it is not the zero vector, is orthogonal to the tangent vector to an arbitrary smooth curve passing through x₀ on the level set f(x) = c.

• Thus, the direction of maximum rate of increase of a real-valued differentiable function at a point is orthogonal to the level set of the function through that point.
• In other words, the gradient acts in such a direction that, for a given small displacement, the function f increases more in the direction of the gradient than in any other direction.
Gradient Methods
• Thus, the direction in which ∇f(x) points is the direction of maximum rate of increase of f at x.
• The direction in which −∇f(x) points is the direction of maximum rate of decrease of f at x.
• Hence, the direction of the negative gradient is a good direction to search if we want to find a function minimizer.
Procedure:
• We proceed as follows:
• Let x^(0) be a starting point, and consider the point x^(0) − α∇f(x^(0)).
• Then, by Taylor's theorem, we obtain
f(x^(0) − α∇f(x^(0))) = f(x^(0)) − α‖∇f(x^(0))‖² + o(α).
• Thus, if ∇f(x^(0)) ≠ 0, then for sufficiently small α > 0 we have
f(x^(0) − α∇f(x^(0))) < f(x^(0)).
• This means that the point x^(0) − α∇f(x^(0)) is an improvement over the point x^(0) if we are searching for a minimizer.
Procedure:
• To formulate an algorithm that implements this idea, suppose that we are given a point x^(k).
• To find the next point x^(k+1),
 we start at x^(k)
 and move by an amount −α_k ∇f(x^(k)),
 where α_k is a positive scalar called the step size.
• This procedure leads to the following iterative algorithm:
x^(k+1) = x^(k) − α_k ∇f(x^(k)).
• We refer to this as a gradient descent algorithm (or simply a gradient algorithm); a minimal sketch appears below.
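A minimal sketch of this gradient algorithm in Python (not from the slides; the constant step size alpha and the tolerance are illustrative choices):

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Iterate x_{k+1} = x_k - alpha * grad_f(x_k) with a fixed step size,
    stopping when the gradient norm falls below tol."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g
    return x

# Example with f(x1, x2) = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2 (used again below):
grad_f = lambda x: np.array([1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]])
print(gradient_descent(grad_f, [0.0, 0.0]))   # tends to the minimizer (-1, 1.5)
```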
The Method of Steepest Descent
• The method of steepest descent is a gradient algorithm in which the step size α_k is chosen to achieve the maximum amount of decrease of the objective function at each individual step.
• Specifically, α_k is chosen to minimize
φ_k(α) := f(x^(k) − α∇f(x^(k))).
• In other words,
α_k = argmin_{α ≥ 0} f(x^(k) − α∇f(x^(k))).
The Method of Steepest Descent
• To summarize, the steepest descent algorithm proceeds as follows:
• At each step, starting from the point x^(k), we conduct a line search in the direction −∇f(x^(k)) until a minimizer, x^(k+1), is found.
• A typical sequence resulting from the method of steepest descent is depicted in the following figure.
• Observe that the method of steepest descent moves in orthogonal steps, as stated in the following proposition.
Remark: (Stopping Criteria)
In practice, the iterations are terminated once a stopping criterion is met, for example when ‖∇f(x^(k))‖, |f(x^(k+1)) − f(x^(k))|, or ‖x^(k+1) − x^(k)‖ falls below a prescribed tolerance ε > 0.
Problem: We use the method of steepest descent to find the minimizer of
f(x₁, x₂) = x₁ − x₂ + 2x₁² + 2x₁x₂ + x₂².
The initial point is x^(0) = (0, 0)ᵀ. Perform three iterations.

Solution: (i). First find ∇f(x):
∇f(x) = (1 + 4x₁ + 2x₂, −1 + 2x₁ + 2x₂)ᵀ.
Then find ∇f(x^(0)):
∇f(x^(0)) = (1, −1)ᵀ.
(ii). Next we find x^(1) = x^(0) − α₀∇f(x^(0)). What is this α₀? It is
α₀ = argmin_{α ≥ 0} f(x^(0) − α∇f(x^(0)))
   = argmin_{α ≥ 0} f((0, 0)ᵀ − α(1, −1)ᵀ)
   = argmin_{α ≥ 0} f((−α, α)ᵀ)
   = argmin_{α ≥ 0} (−α − α + 2α² − 2α² + α²)
   = argmin_{α ≥ 0} (α² − 2α) = 1,
using the second-derivative test to find the minimizer of α² − 2α.
Therefore
x^(1) = (0, 0)ᵀ − 1·(1, −1)ᵀ = (−1, 1)ᵀ.
(iii). Next we find x^(2).
To find x^(2), first find ∇f(x^(1)). Since
∇f(x) = (1 + 4x₁ + 2x₂, −1 + 2x₁ + 2x₂)ᵀ,
we get
∇f(x^(1)) = (−1, −1)ᵀ,
and the remaining iterations proceed in the same way (a numerical check is sketched below).
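As a numerical check of this example, the sketch below runs three steepest-descent iterations in Python. Because f is quadratic with Hessian Q, the exact line search has the closed form α_k = (g_kᵀg_k)/(g_kᵀQg_k); this formula is standard but is not stated on the slides.

```python
import numpy as np

# f(x1, x2) = x1 - x2 + 2*x1**2 + 2*x1*x2 + x2**2
Q = np.array([[4.0, 2.0], [2.0, 2.0]])                   # Hessian of f
grad_f = lambda x: np.array([1 + 4*x[0] + 2*x[1], -1 + 2*x[0] + 2*x[1]])

x = np.array([0.0, 0.0])                                 # x^(0)
for k in range(3):
    g = grad_f(x)
    alpha = (g @ g) / (g @ Q @ g)                        # exact line search (quadratic f)
    x = x - alpha * g
    print(f"alpha_{k} = {alpha:.4f}, x^({k+1}) = {x}")
# The first step reproduces alpha_0 = 1 and x^(1) = (-1, 1); the iterates approach (-1, 1.5).
```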
Problem: We use the method of steepest descent to find the minimizer of
f(x₁, x₂, x₃) = (x₁ − 4)⁴ + (x₂ − 3)² + 4(x₃ + 5)⁴.
The initial point is x^(0) = (4, 2, −1)ᵀ. Perform three iterations.

Solution:
(i). First find ∇f(x):
∇f(x) = (4(x₁ − 4)³, 2(x₂ − 3), 16(x₃ + 5)³)ᵀ.
Then find ∇f(x^(0)):
∇f(x^(0)) = (0, −2, 1024)ᵀ.


(ii). Next we find x^(1) = x^(0) − α₀∇f(x^(0)), where α₀ = argmin_{α ≥ 0} f(x^(0) − α∇f(x^(0))).
(iii). To find x^(2), first find ∇f(x^(1)) and repeat the line search.
(iv). In the same way we obtain x^(3). A numerical sketch of these three iterations is given below.
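Since the numerical details of these iterations are not reproduced in the text, here is a Python sketch of them. The slides do not say which one-dimensional method to use for the line search, so SciPy's bounded scalar minimizer is used here purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

f = lambda x: (x[0] - 4)**4 + (x[1] - 3)**2 + 4*(x[2] + 5)**4
grad_f = lambda x: np.array([4*(x[0] - 4)**3,
                             2*(x[1] - 3),
                             16*(x[2] + 5)**3])

x = np.array([4.0, 2.0, -1.0])                           # x^(0)
for k in range(3):
    g = grad_f(x)
    phi = lambda a, x=x, g=g: f(x - a * g)               # phi_k(alpha)
    alpha = minimize_scalar(phi, bounds=(0.0, 100.0), method='bounded').x
    x = x - alpha * g
    print(f"alpha_{k} = {alpha:.6g}, x^({k+1}) = {x}, f = {f(x):.6g}")
```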
Newton’s Method
• Recall that the method of steepest descent uses only
first derivatives (gradients) in selecting a suitable
search direction.
• This strategy is not always the most effective.
• If higher derivatives are used, the resulting iterative
algorithm may perform better than the steepest
descent method.
• Newton's method (sometimes called the Newton-
Raphson method) uses first and second derivatives
and indeed does perform better than the steepest
descent method if the initial point is close to the
minimizer.
Newton’s Method
• The idea behind this method is as follows. Given a starting
point, we construct a quadratic approximation to the
objective function that matches the first and second
derivative values at that point.
• We then minimize the approximate (quadratic) function
instead of the original objective function.
• We use the minimizer of the approximate function as the
starting point in the next step and repeat the procedure
iteratively.
• If the objective function is quadratic, then the
approximation is exact, and the method yields the true
minimizer in one step.
• If, on the other hand, the objective function is not quadratic, then the approximation will provide only an estimate of the position of the true minimizer.
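To make the construction explicit (standard material, consistent with the recursion on the next slide): writing g^(k) := ∇f(x^(k)) and F(x^(k)) for the Hessian, the quadratic approximation about x^(k) is
q(x) = f(x^(k)) + g^(k)ᵀ(x − x^(k)) + ½ (x − x^(k))ᵀ F(x^(k)) (x − x^(k)).
Setting ∇q(x) = g^(k) + F(x^(k))(x − x^(k)) = 0 and solving (assuming F(x^(k)) is invertible) gives the minimizer of the model, x = x^(k) − F(x^(k))⁻¹ g^(k), which is taken as the next iterate x^(k+1).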
Newton’s Method
• The following recursive formula represents Newton's method:
x^(k+1) = x^(k) − F(x^(k))⁻¹ g^(k),
where g^(k) := ∇f(x^(k)) is the gradient of f at x^(k) and F(x^(k)) is its Hessian matrix.
Newton’s Method
Problem: Use Newton's method to minimize the Powell function:
f(x₁, x₂, x₃, x₄) = (x₁ + 10x₂)² + 5(x₃ − x₄)² + (x₂ − 2x₃)⁴ + 10(x₁ − x₄)⁴.

Solution:

x^(1) = x^(0) − F(x^(0))⁻¹ g^(0)
      = x^(0) − F(x^(0))⁻¹ ∇f(x^(0))
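A minimal Python sketch of these Newton iterations (not from the slides): the gradient and Hessian of the Powell function are written out by hand below, and the starting point x^(0) = (3, −1, 0, 1)ᵀ is the one commonly used for this function, assumed here since the slides do not show it.

```python
import numpy as np

# Powell function f(x) = (x1+10*x2)^2 + 5*(x3-x4)^2 + (x2-2*x3)^4 + 10*(x1-x4)^4
def grad(x):
    x1, x2, x3, x4 = x
    return np.array([2*(x1 + 10*x2) + 40*(x1 - x4)**3,
                     20*(x1 + 10*x2) + 4*(x2 - 2*x3)**3,
                     10*(x3 - x4) - 8*(x2 - 2*x3)**3,
                     -10*(x3 - x4) - 40*(x1 - x4)**3])

def hessian(x):
    x1, x2, x3, x4 = x
    a, b = 120*(x1 - x4)**2, 12*(x2 - 2*x3)**2
    return np.array([[2 + a,   20,        0,        -a     ],
                     [20,      200 + b,  -2*b,       0     ],
                     [0,       -2*b,      10 + 4*b, -10    ],
                     [-a,       0,       -10,        10 + a]])

x = np.array([3.0, -1.0, 0.0, 1.0])          # commonly used starting point (assumed)
for k in range(5):
    g = grad(x)
    x = x - np.linalg.solve(hessian(x), g)   # Newton step: x - F(x)^-1 g
    print(f"x^({k+1}) = {x}")
```

Each step solves the linear system F(x^(k)) d = −g^(k) instead of forming the inverse explicitly, which is the usual way to implement x^(k+1) = x^(k) − F(x^(k))⁻¹ g^(k).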
