6 Gradient Method
6 Gradient Method
6 Gradient Method
1
Gradient
• If we look at the Level Sets (projections at different
heights), then the derivatives indicate the direction
along which the function increases at the maximum
rate.
2
Gradient
Level Set of a function 𝑓: ℝ𝑛 → ℝ at a level 𝑐 is the set of points 𝑆 = {𝑥: 𝑓 𝑥 = 𝑐}.
Example:
2 2
𝑓 𝑥, 𝑦 = 𝑥𝑒 −𝑥 −𝑦
𝑥 ∈ −0.1, 2 , 𝑦 ∈ [−0.1, 3]
3
Gradient
• If we consider the direction of increment to be 𝑑, 𝑑 = 1,
then the rate of increase of 𝑓(𝑥) along 𝑑 at 𝑥 is
𝛻𝑓 𝑥 𝑇 𝑑 ≤ 𝛻𝑓 𝑥
𝛻𝑓 𝑥
• If 𝑑 = i.e., unit vector along the direction of
| 𝛻𝑓 𝑥 |
𝛻𝑓(𝑥), then
𝑇
𝛻𝑓 𝑥
𝛻𝑓 𝑥 = 𝛻𝑓 𝑥
| 𝛻𝑓 𝑥 |
4
Gradient
𝑇
𝛻𝑓 𝑥
𝛻𝑓 𝑥 = 𝛻𝑓 𝑥
| 𝛻𝑓 𝑥 |
5
Gradient Descent
• Let us start at an initial approximation 𝑥0 , then to find the minimum, the next point according to gradient
descent will be 𝑥1 = 𝑥0 − 𝛻𝑓(𝑥0 ).
• Here, 𝛼𝑘 > 0 is called the Step Size or Learning Rate and we start at 𝑥𝑘 and move along the direction
− 𝛼𝑘 𝛻𝑓 𝑥𝑘 .
6
Gradient Descent
• The gradient varies as the search proceeds, tending to zero as we approach the minimizer.
• We have the option of either taking very small steps and reevaluating the gradient at every step, or
we can take large steps each time.
• The first approach results in a laborious method of reaching the minimizer, whereas the second
approach may result in a more zigzag path to the minimizer.
• The advantage of the second approach is possibly fewer gradient evaluations.
7
Steepest Descent
• Here, the step size 𝛼𝑘 is chosen to achieve the maximum reduction in the objective function at each
step.
• Some methods of selecting 𝛼𝑘 are
• Fixed learning rate
• Line search
• Bold driver
• Momentum
8
Steepest Descent: Fixed Rate
4 3 4
Example: Find the minimum of 𝑓 𝑥1 , 𝑥2 , 𝑥3 = 𝑥1 − 4 + 𝑥2 − 3 + 4 𝑥3 + 5
Solution: Assume 𝑥0 = 4, 2, −1 𝑇
9
Steepest Descent: Fixed Rate
Example: Find the minimum of 𝑓 𝑥1 , 𝑥2 , 𝑥3 = 𝑥1 − 4 4 + 𝑥2 − 3 3 + 4 𝑥3 − 5 4
𝑇
Solution: Assume 𝑥0 = 4, 2, −1 and choose 𝛼 = 0.001
Step 1: Calculate the gradient at the point
4 𝑥1 − 4 3
𝛻𝑓 𝑥 ቚ = 2 𝑥2 − 3 2
𝑥= 4,2,−1 𝑇
16 𝑥3 + 5 3 𝑥= 4,2,−1 𝑇
10
Steepest Descent: Line Search
• Step 3: Use Line search to determine 𝛼𝑘 and then calculate the next point 𝑥𝑘+1
𝜙(𝛼) = 2𝛼 − 1 4 + 4 4 − 1024𝛼 4
𝛼 𝑘 − 𝛼 𝑘−1
𝛼 𝑘+1 = 𝛼𝑘 − ′ 𝑘 ′ 𝑘−1
𝜙 ′ (𝛼 𝑘 )
𝜙 𝛼 −𝜙 𝛼
⇒ 𝛼0 = 3.976 × 10−3
11
Steepest Descent: Line Search
• Step 3: Use Line search to determine 𝛼𝑘 and then calculate the next point 𝑥𝑘+1
4 4
𝜙(𝛼) = 2𝛼 − 1 + 4 4 − 1024𝛼
• Place this 𝛼0 in 𝑥1 = 𝑥0 − 𝛼0 𝛻𝑓 𝑥0 .
• Go to Step 1 and use this 𝑥1 to find 𝑥2 and keep repeating Steps 1, 2 and 3 to reach the minimum
of 𝑓 𝑥1 , 𝑥2 , 𝑥3 . Each iteration will have a new 𝛼 value.
12
Selection of the Stopping Criteria
• But if we are working with local minima or constraint sets, then we will have to modify the
stopping criterion. Such as
• Stop when the step size 𝛼𝑘 approaches zero.
13
Examples
1. Find the solution for ℎ 𝑥 = 0 using fixed rate steepest descent.
4 + 3𝑥1 + 2𝑥2
ℎ 𝑥 = =0
1 + 2𝑥1 + 3𝑥2
𝑥1 = −1.999
𝑥2 = 0.999
14
Steepest Descent for Quadratic Function
• For quadratic functions of the form
1
𝑓 𝑥 = 𝑥 𝑇 𝑄𝑥 − 𝑏 𝑇 𝑥
2
• In this case, for the update equation, 𝑥𝑘+1 = 𝑥𝑘 − 𝛼𝑘 𝑔𝑘 , we can determine the update of 𝛼𝑘 .
• Since
𝛼𝑘 = 𝑎𝑟𝑔𝑚𝑖𝑛𝛼≥0 𝑓 𝑥𝑘 − 𝛼𝑔𝑘
⇒ 𝑀𝑖𝑛𝑖𝑚𝑖𝑧𝑒 𝜙 𝛼 = 𝑥𝑘 − 𝛼𝑔𝑘 𝑇 𝑄 𝑥𝑘 − 𝛼𝑔𝑘 − 𝑏 𝑇 𝑥𝑘 − 𝛼𝑔𝑘
⇒ 𝜙 ′ 𝛼 = 𝑥𝑘 − 𝛼𝑔𝑘 𝑇 𝑄 −𝑔𝑘 − 𝑏 𝑇 −𝑔𝑘
15
Steepest Descent for Quadratic Function
𝑔𝑘𝑇 𝑔𝑘
⇒ 𝛼𝑘 = 𝑇
𝑔𝑘 𝑄𝑔𝑘
𝑔𝑘𝑇 𝑔𝑘
Hence, the update equation would be 𝑥𝑘+1 = 𝑥𝑘 − 𝑔𝑘 , where 𝑔𝑘 = 𝑄𝑥𝑘 − 𝑏
𝑔𝑘𝑇 𝑄𝑔𝑘
16
Steepest Descent for Quadratic Function
For fixed rate of step size 𝛼, the gradient descent algorithm 𝑥𝑘 → 𝑥 ∗ for any 𝑥0 if and only if
2
0<𝛼<
𝜆𝑚𝑎𝑥 (𝑄)
17
Home Work 1
1
Convert the function in 𝑓 𝑥 = 2 𝑥 𝑇 𝑄𝑥 + 𝑏 𝑇 𝑥 format and show steepest descent with varying 𝜶
for 2 iterations.
3 2 2
𝑓 𝑥 = (𝑥1 +𝑥2 ) + 2𝑥1 𝑥2 + 3𝑥1 − 𝑥2 − 22
2
Hint: 𝑄 must be symmetric and positive definite.
2
• 0<𝛼<
𝜆_𝑚𝑎𝑥 𝑄
18
Home Work 2
Show the first two iterations for steepest descent with a fixed 𝜶 for the following function
3
𝑓 𝑥 = 𝑥 𝑇 4 2√2 𝑥 + 𝑥 𝑇 + 24
0 5 6
• Hint: Rearrange the equation to make 𝑸 symmetric and then check for positive definiteness.
• 𝑄 = 8 2 2; 2 2 10 ; Find maximum eigenvalue of Q
2
• 0<𝛼<
𝜆_𝑚𝑎𝑥 𝑄
• Defined 𝑓(𝑥) and 𝛻𝑓 𝑥 = 𝑄𝑥𝑘 − 𝑏 in script as function
• Use the update equation 𝑥𝑘+1 = 𝑥𝑘 − 𝛼𝛻𝑓(𝑥𝑘 )
𝑔𝑘𝑇 𝑔𝑘
• Repeat the same problem with 𝛼𝑘 = , instead of Fixed value.
𝑔𝑘𝑇 𝑄𝑔𝑘
• Note down the differences, pros and cons you have.
19