Unconstrained minimization
Mathématiques Informatique et Statistique Appliquées (MISA)
Université d’Antananarivo
Unconstrained minimization problems
minimize f (x)
where f : RN → R is convex and twice differentiable. We assume that the optimal
value p ? = inf x f (x) is attained (and finite).
Optimality condition
f is differentiable and convex, a point x ? is optimal iff ∇f (x ? ) = 0.
Sometimes, we can analytically solve the optimality equation but usually the problem
must be solved by an iterative algorithm.
Descent methods
We want to produce a minimizing sequence x (k) , k = 1, ..., where
x (k+1) = x (k) + t (k) ∆x (k) with f (x (k+1) ) < f (x (k) )
I ∆x (k) is the search direction
I t (k) > 0 is the step size
Descent direction
∆x (k) is a descent direction if ∇f (x (k) )> ∆x (k) < 0 (from convexity).
Algorithm 1: General descent method
Start at a point x ∈ dom f
repeat
Determine a descent direction ∆x
Choose a step size t > 0
Update x = x + t∆x
until stopping criterion is satisfied;
Choosing the step size - Line search
Exact line search
Choose t to minimize f along the ray {x + t∆x|t > 0}
t = arg min f (x + t∆x)
t>0
Backtracking line search
An inexact line search approach that chooses t just to reduce f enough. It depends on
two constants α ∈ (0, 0.5) and β ∈ (0, 1).
Algorithm 2: Backtracking line search
t=1
repeat
t = βt
until f (x + t∆x) < f (x) + αt∇f (x)> ∆x;
Gradient descent
It is a descent method with search direction ∆x = −∇f (x).
Algorithm 3: Gradient descent
Start at a point x ∈ dom f
repeat
∆x = −∇f (x)
Choose a step size t > 0
Update x = x + t∆x
until stopping criterion is satisfied;
The stopping criterion is usually of the form ||∇f (x)||2 < .
Newton’s method
It is a descent method with search direction ∆x = −(∇2 f (x))−1 ∇f (x).
We can say that v = ∆x
I minimizes the second order approximation
1
f (x + v ) ≈ f (x) + ∇f (x)> v + v > ∇2 f (x)v
2
I or equivalently solves the linearized optimality condition
∇f (x + v ) ≈ ∇f (x) + ∇2 f (x)v = 0
More
Explore further
I Steepest descent methods (generalizes gradient descent)
I Quasi-Newton methods (like BFGS, L-BFGS)
I Convergence analysis
I Self-concordance
I Gradient-free optimization