
MAE Optimization Lecture 2 Handout

The document provides an introduction to optimization, outlining key concepts such as optimization problems, feasible sets, and cost functions. It discusses the definitions of global and local solutions, as well as the importance of differential calculus in optimization. Additionally, it covers practical and theoretical issues related to optimization, including modeling, numerical search for solutions, and the characterization of solutions.


1MAE004 - Optimization

Introduction to Optimization

E. Flayac

March 6th, 2025



Outline

Generalities on Optimization

Differential Calculus

Convexity



Generalities on Optimization



General Optimization problem

Optimization is about making the decision that is best according to a given criterion, among a given set of options.

An optimization problem is defined by:

▶ An admissible (feasible) set A ⊂ Rn
▶ A vector x ∈ A of decision variables
▶ A cost function (objective, criterion) f : A → R

The problem is written:

  min_{x ∈ A} f(x)    (P)



Generalities about the Feasible Set

The nature of A depends on the nature of the decision to be made.

▶ Discrete or Continuous
▶ Finite or Infinite Dimension
▶ Static or Dynamic
▶ Defined by a finite number of equality and inequality constraints:
  A = {x ∈ Rn : h(x) = 0, g(x) ≤ 0},  with h(x) ∈ Rm and g(x) ∈ Rp



Generalities about the Feasible Set

Examples of constraints:
▶ Physical restrictions: positivity, equilibrium law, Newton’s second
law, . . .
▶ Goal to achieve: target point, production goal, . . .
▶ Safety constraints: "stay on the road", collision avoidance,
probability of failure, . . .
▶ Other constraints: budget, time, space, . . .



Generalities about the Cost Function

The cost function f makes it possible to compare the available options in A.

The properties of f (and of g and h) define several types of problems:
▶ Linear or Nonlinear
▶ Differentiable (smooth) or Non-differentiable (non-smooth)
▶ Convex or Non-convex



Generalities about the Cost Function

Examples of cost functions:


▶ "Usual-sense" cost to minimize: fuel consumption, expenses, . . .
▶ Profit to maximize: return on investment, sales, reward, . . .
▶ Other performance indicators: time, distance, system energy,
regression error, . . .



Unconstrained example: non-linear regression
Data: [ti, yi] for i = 1 to M

    ti        yi
  0.2308    0.0652
  0.4903    0.0153
  0.8438   −0.0496
  0.9398   −0.0657
  2.5145   −0.1448
  2.6552   −0.1336
    ...       ...
Formulation
▶ Signal of the form s(t) = a exp(−bt) cos(ct + d)
▶ Inverse problem: identify the parameters a, b, c, and d with no a priori information
▶ a, b, c, and d are the decision variables of the objective function f (to be defined), so that:

  x = [a, b, c, d]T ∈ R4 = A




Unconstrained example: non-linear regression

▶ From M sampling points [ti, yi] for i = 1 to M, identification by minimizing the function

  f(x) = (1/2) Σ_{i=1}^{M} (yi − s(ti))²
       = (1/2) Σ_{i=1}^{M} (yi − a exp(−b ti) cos(c ti + d))²

▶ Squaring the residuals makes f differentiable (which would not be the case with the absolute value)
▶ Constraints on x = [a, b, c, d]T can be added if needed
▶ This is called a non-linear least-squares problem (a minimal numerical sketch is given below)
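As a minimal numerical sketch (not part of the original slides), this identification can be set up with scipy.optimize.least_squares, which minimizes 0.5 ∥r(x)∥². The data below reuses the first rows of the table above; the remaining points and the starting guess x0 are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

# First sampling points [t_i, y_i] from the slide (the full data set is truncated there)
t = np.array([0.2308, 0.4903, 0.8438, 0.9398, 2.5145, 2.6552])
y = np.array([0.0652, 0.0153, -0.0496, -0.0657, -0.1448, -0.1336])

def s(x, t):
    """Signal model s(t) = a exp(-b t) cos(c t + d), with x = [a, b, c, d]."""
    a, b, c, d = x
    return a * np.exp(-b * t) * np.cos(c * t + d)

def residuals(x):
    """Residual vector r_i(x) = y_i - s(t_i); the solver minimizes 0.5 * ||r(x)||^2."""
    return y - s(x, t)

x0 = np.array([1.0, 1.0, 1.0, 0.0])   # illustrative initial guess (no a priori)
res = least_squares(residuals, x0)
print("estimated [a, b, c, d]:", res.x)
print("cost f(x*):", 0.5 * np.sum(res.fun ** 2))
```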





Example with a constraint: minimum distance to an obstacle

▶ Cost function: (squared) horizontal distance to the obstacle

  f(x1, x2) = x1²

▶ One equality constraint:

  A = {"path of the vehicle"}
  A = {(x1, x2) ∈ R2 | c(x1, x2) = 0}



General optimization problem: Solutions

Going back to:

  min_{x ∈ A} f(x)    (P)

Definition: Solution
We say that x∗ ∈ Rn is a global solution of (P) if
▶ x∗ ∈ A;
▶ f(x∗) ≤ f(x), ∀x ∈ A.
The set of solutions is written as:

  argmin_{x ∈ A} f(x)

Remarks
▶ A global solution is also called a global minimum
▶ Minimizing f is equivalent to maximizing −f



Example (figure): f(x) = 0.4x(sin(x) + cos(2x)) plotted over a feasible interval A, with the global minimum marked.



Local solutions
Going back to:

  min_{x ∈ A} f(x)    (P)

Definitions
We denote by B(x∗ , r ) the ball centered at x∗ of radius r > 0 i.e.
B(x∗ , r ) = {x ∈ Rn | ∥x − x∗ ∥ ≤ r }
Local solutions
We say that x∗ ∈ Rn is a local solution of (P) if
▶ x∗ ∈ A
▶ ∃r > 0, ∀x ∈ B(x∗ , r ) ∩ A, f (x∗ ) ≤ f (x)
Strict local solution
We say that x∗ ∈ Rn is a strict local solution of (P) if
▶ x∗ ∈ A
▶ ∃r > 0, ∀x ∈ B(x∗ , r ) ∩ A\{x∗ }, f (x∗ ) < f (x)
Example (figure): the same function f(x) = 0.4x(sin(x) + cos(2x)) over A, showing a local minimum x∗ with a ball B(x∗, r) around it, and the global minimum.
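To illustrate the difference numerically (a sketch that is not part of the original slides), a gradient-based solver such as scipy.optimize.minimize only returns a local minimum near its starting point; the starting points below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """f(x) = 0.4 x (sin(x) + cos(2x)), accepting a scalar or a length-1 array."""
    x = float(np.atleast_1d(x)[0])
    return 0.4 * x * (np.sin(x) + np.cos(2 * x))

# Different starting points lead to different local minima; only one of them is global on a given interval A.
for x0 in [1.0, 5.0, 8.0]:
    res = minimize(f, x0=[x0])
    print(f"x0 = {x0:4.1f}  ->  x* = {res.x[0]:8.3f},  f(x*) = {res.fun:8.3f}")
```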



Theoretical Issues

▶ Existence and uniqueness of solutions


▶ Characterization of solutions (necessary conditions) Lecture 2-4
▶ Verification of optimality (sufficient conditions) Lecture 2-4
▶ Perturbed problems (stability of solutions)



Practical Issues

▶ Modeling and approximations Lecture 2


▶ Numerical search for solutions
▶ Design and application of resolution algorithms Lecture 3
▶ Proof of convergence



Differential Calculus



Why do we need to know calculus?

▶ Expert in trajectory computation (orbital mechanics)
▶ She was asked to recheck all the calculations for the orbit of the first American in space in 1961
▶ She was more trusted than the electronic computers of the time

Katherine Johnson (1918-2020), American mathematician



Partial derivatives

Definition
We consider a smooth function f : Rn → R.
For a fixed x = [x1, . . . , xn]T ∈ Rn and y ∈ R, set:

  fi(y) = f(x1, . . . , xi−1, y, xi+1, . . . , xn)    (y in the i-th position)

The partial derivative of f with respect to xi at x, denoted by ∂f/∂xi, reads:

  ∂f/∂xi (x) = fi′(xi)



Partial derivatives
▶ Example 1: f : R2 → R

  f(x) = f(x1, x2) = 2(x1 − 1)² + x1 x2 + x2²

  ∂f/∂x1 (x) = 4(x1 − 1) + x2        ∂f/∂x2 (x) = x1 + 2x2

▶ Example 2: f : Rn → R

  f(x) = Σ_{i=2}^{n} (xi − xi−1)²
       = (x2 − x1)² + · · · + (xi − xi−1)² + (xi+1 − xi)² + · · · + (xn − xn−1)²

  ∂f/∂x1 (x) = 2(x1 − x2)        ∂f/∂xn (x) = 2(xn − xn−1)

  ∂f/∂xi (x) = 2(xi − xi−1) + 2(xi − xi+1)    for i = 2, . . . , n − 1
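As a quick numerical sanity check of the formulas in Example 2 (a sketch, not part of the original slides), the analytic gradient can be compared against central finite differences; the test point x below is an arbitrary assumption.

```python
import numpy as np

def f(x):
    """f(x) = sum_{i=2}^{n} (x_i - x_{i-1})^2, i.e. squared differences of consecutive entries."""
    return np.sum(np.diff(x) ** 2)

def grad_analytic(x):
    """Gradient from the slide: boundary terms plus 2(x_i - x_{i-1}) + 2(x_i - x_{i+1}) inside."""
    g = np.zeros_like(x)
    g[0] = 2 * (x[0] - x[1])
    g[-1] = 2 * (x[-1] - x[-2])
    g[1:-1] = 2 * (x[1:-1] - x[:-2]) + 2 * (x[1:-1] - x[2:])
    return g

def grad_fd(x, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([0.3, -1.2, 0.7, 2.0, -0.5])
print(np.max(np.abs(grad_analytic(x) - grad_fd(x))))   # ~1e-9: the formulas agree
```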



Directional derivatives

We consider a smooth function f : Rn → R


Definition
Let x, h ∈ Rn; the derivative of f at x in the direction h reads:

  f′(x; h) := lim_{ϵ→0, ϵ>0} [f(x + ϵh) − f(x)] / ϵ

In other words:

  f′(x; h) = ϕ′(0),    where ϕ(ϵ) = f(x + ϵh).
Here, ϕ is the restriction of f along the line passing through x with
direction h.



Directional derivatives: example



Directional derivatives and Partial derivatives

Remarks
▶ Let ei = [0, . . . , 0, 1, 0, . . . , 0]T, with the 1 in the i-th position, be the i-th column vector of the canonical basis of Rn
▶ The partial derivatives are in fact the directional derivatives in the directions of the canonical basis:

  ∂f/∂xi (x) = f′(x; ei) = lim_{ϵ→0} [f(x + ϵei) − f(x)] / ϵ



Gradient

We consider a scalar function f : Rn → R

First-order derivative
▶ We will denote by C1 = C1(Rn, R) the set of continuously differentiable functions from Rn to R
▶ For f ∈ C1(Rn, R) and x ∈ Rn, one defines the gradient of f at x by

  ∇f(x) := [∂f/∂xi (x)]_{1≤i≤n} = [∂f/∂x1 (x), . . . , ∂f/∂xn (x)]T ∈ Rn

  which is the column vector of partial derivatives.
▶ One gets for h ∈ Rn (Ex 3), as checked numerically below:

  f′(x; h) = ∇f(x)T h = Σ_{i=1}^{n} ∂f/∂xi (x) hi
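The identity f′(x; h) = ∇f(x)T h can be illustrated numerically (a sketch, not part of the original slides), here on Example 1 from the partial-derivatives slide; the point x and direction h are arbitrary assumptions.

```python
import numpy as np

def f(x):
    """Example 1: f(x1, x2) = 2(x1 - 1)^2 + x1 x2 + x2^2."""
    return 2 * (x[0] - 1) ** 2 + x[0] * x[1] + x[1] ** 2

def grad_f(x):
    """Gradient built from the partial derivatives computed on the earlier slide."""
    return np.array([4 * (x[0] - 1) + x[1], x[0] + 2 * x[1]])

x = np.array([1.5, -0.5])
h = np.array([2.0, 1.0])

eps = 1e-6
dir_deriv_fd = (f(x + eps * h) - f(x)) / eps   # finite-difference value of phi'(0), with phi(eps) = f(x + eps h)
dir_deriv_grad = grad_f(x) @ h                 # gradient formula  ∇f(x)^T h
print(dir_deriv_fd, dir_deriv_grad)            # agree up to ~1e-5
```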



Hessian

We consider a smooth scalar function f : Rn → R

Second-order derivative
▶ We will denote by C2 = C2(Rn, R) the set of twice continuously differentiable functions from Rn to R.
▶ For f ∈ C2(Rn, R) and x ∈ Rn, one defines the Hessian of f at x by

  ∇²f(x) := [∂²f/∂xi∂xj (x)]_{1≤i,j≤n}

           [ ∂²f/∂x1² (x)      · · ·   ∂²f/∂x1∂xn (x) ]
         = [      ...          . . .         ...      ]  ∈ Rn×n
           [ ∂²f/∂xn∂x1 (x)    · · ·   ∂²f/∂xn² (x)   ]

The Hessian is a square symmetric matrix.



Gradient and Hessian: example for two variables

Consider the two-dimensional example

  f(x1, x2) = (1 − x1)² + (1 − x2)² + (1/2)(2x2 − x1²)²

  ∇f(x) = [ ∂f/∂x1 (x) ]  =  [ −2(1 − x1) − 2x1(2x2 − x1²) ]
          [ ∂f/∂x2 (x) ]     [ −2(1 − x2) + 2(2x2 − x1²)   ]

  ∇²f(x) = [ ∂²f/∂x1² (x)      ∂²f/∂x2∂x1 (x) ]  =  [ 6x1² − 4x2 + 2    −4x1 ]
           [ ∂²f/∂x1∂x2 (x)    ∂²f/∂x2² (x)   ]     [ −4x1                6  ]

Gradient and Hessian at the specific point x = [0, −2]T:

  ∇f(0, −2) = [ −2  ] ,    ∇²f(0, −2) = [ 10   0 ]
              [ −14 ]                   [  0   6 ]
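A short numerical cross-check of these values (a sketch, not part of the original slides): evaluating the analytic gradient and Hessian at x = [0, −2]T reproduces the numbers above.

```python
import numpy as np

def f(x):
    return (1 - x[0]) ** 2 + (1 - x[1]) ** 2 + 0.5 * (2 * x[1] - x[0] ** 2) ** 2

def grad_f(x):
    """Gradient formulas from the slide."""
    return np.array([
        -2 * (1 - x[0]) - 2 * x[0] * (2 * x[1] - x[0] ** 2),
        -2 * (1 - x[1]) + 2 * (2 * x[1] - x[0] ** 2),
    ])

def hess_f(x):
    """Hessian formulas from the slide."""
    return np.array([
        [6 * x[0] ** 2 - 4 * x[1] + 2, -4 * x[0]],
        [-4 * x[0], 6.0],
    ])

x = np.array([0.0, -2.0])
print(f(x))        # 18.0
print(grad_f(x))   # [ -2. -14.]
print(hess_f(x))   # [[10.  0.] [ 0.  6.]]
```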



Taylor expansion - Integral Remainder

We consider a smooth scalar function f : Rn → R

First-order Taylor expansion
If f ∈ C1, for any x, h ∈ Rn

  f(x + h) = f(x) + ∫₀¹ ∇f(x + th)T h dt.

Second-order Taylor expansion
If f ∈ C2, for any x, h ∈ Rn

  f(x + h) = f(x) + ∇f(x)T h + ∫₀¹ (1 − t) hT ∇²f(x + th) h dt.



Taylor expansion

We consider a smooth scalar function f : Rn → R

First-order Taylor expansion
If f ∈ C1, for any x, h ∈ Rn

  f(x + h) = f(x) + ∇f(x)T h + ∥h∥ ϵ(h),    with lim_{∥h∥→0} ϵ(h) = 0

Second-order Taylor expansion
If f ∈ C2, for any x, h ∈ Rn

  f(x + h) = f(x) + ∇f(x)T h + (1/2) hT ∇²f(x) h + ∥h∥² ϵ(h),    with lim_{∥h∥→0} ϵ(h) = 0

(the first-order term involves the gradient, the second-order term the Hessian)

⇒ useful to identify the gradient and Hessian of any function
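A numerical illustration of the second-order expansion (a sketch, not part of the original slides), using the two-variable example and the point x = [0, −2]T from the previous slides; the direction h is an arbitrary assumption.

```python
import numpy as np

def f(x):
    return (1 - x[0]) ** 2 + (1 - x[1]) ** 2 + 0.5 * (2 * x[1] - x[0] ** 2) ** 2

x = np.array([0.0, -2.0])
g = np.array([-2.0, -14.0])                  # gradient at x (previous slide)
H = np.array([[10.0, 0.0], [0.0, 6.0]])      # Hessian at x (previous slide)

for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * np.array([1.0, -2.0])
    quad_model = f(x) + g @ h + 0.5 * h @ H @ h
    # The gap between f(x + h) and the quadratic model shrinks faster than ||h||^2
    print(scale, abs(f(x + h) - quad_model))
```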



Taylor expansion for two variables
Consider the two-dimensional example

  f(x1, x2) = (1 − x1)² + (1 − x2)² + (1/2)(2x2 − x1²)²

Taylor series expansion about x = [0, −2]T (gradient and Hessian computed earlier), for h = [h1, h2]T:

  f(x + h) ≈ 18 + [−2, −14] h + (1/2) hT [ 10  0 ] h
                                         [  0  6 ]

Engineering Design Optimization Book, 2022



The Jacobian matrix
We consider a smooth vector function f : Rn → Rm such that for x ∈ Rn:

  f(x) = [f1(x), . . . , fm(x)]T

with scalar components fi : Rn → R for 1 ≤ i ≤ m

First-order derivative
If f is continuously differentiable on Rn, one defines for any x ∈ Rn the Jacobian of f at x by

  Jf(x) := f′(x) := [ ∂f1/∂x1 (x)   · · ·   ∂f1/∂xn (x) ]
                    [     ...       . . .        ...     ]  ∈ Rm×n
                    [ ∂fm/∂x1 (x)   · · ·   ∂fm/∂xn (x) ]

  Equivalently, Jf(x) := f′(x) := [ ∇f1(x)T ]
                                  [   ...    ]   (an m × n rectangular matrix with the ∇fi(x)T as rows)
                                  [ ∇fm(x)T ]
Differential form and Jacobian matrix

We consider a smooth vector function f : Rn → Rm

Some rules of differentiation
▶ Linearity: if f(x) = αu(x) + βv(x), with u, v : Rn → Rm and (α, β) ∈ R2, then f′(x) = αu′(x) + βv′(x).
▶ The chain rule: for v : Rn → Rp and u : Rp → Rm, and f defined for x ∈ Rn by the composition

  f(x) = u ∘ v(x) = u(v(x))

  one has:

  f′(x) = (u ∘ v)′(x) = u′(v(x)) v′(x)
  Jf(x) = Ju∘v(x) = Ju(v(x)) Jv(x)    (matrix product of an m × p matrix with a p × n matrix)



Chain rule examples: univariate (n = p = m = 1)

Single-variable chain rule (n = p = m = 1): let u, v and f be such that for any x ∈ R and y ∈ R:

  v(x) = x²,    u(y) = sin(y)

  f(x) = u(v(x)) = sin(x²)

Then, the Jacobian (derivative) of f reads:

  f′(x) = u′(v(x)) v′(x) = cos(x²) · (2x)



Chain rule examples: multivariate (n = 1, m = 2, p = 2)

Let u, v and f be such that for any x ∈ R and y ∈ R2:

  v(x) = [ x² ]     u(y1, y2) = [ sin(y1) + y2 ]
         [ x³ ]                 [      y2      ]

  f(x) = u(v(x)) = [ sin(x²) + x³ ]
                   [      x³      ]

One has:

  v′(x) = [ 2x  ]     u′(y1, y2) = [ cos(y1)  1 ]
          [ 3x² ]                  [    0     1 ]

  f′(x) = u′(v(x)) v′(x) = [ cos(x²)  1 ] [ 2x  ]
                           [    0     1 ] [ 3x² ]

  f′(x) = [ 2x cos(x²) + 3x² ]
          [       3x²        ]
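The same computation can be checked numerically (a sketch, not part of the original slides), comparing the chain-rule Jacobian with a finite-difference derivative of the composed map; the evaluation point is an arbitrary assumption.

```python
import numpy as np

def v(x):
    return np.array([x ** 2, x ** 3])

def u(y):
    return np.array([np.sin(y[0]) + y[1], y[1]])

def f(x):
    """Composition f = u ∘ v, here a map from R to R^2."""
    return u(v(x))

def Jf_chain(x):
    """Chain rule: Ju(v(x)) @ Jv(x), a (2 x 2) matrix times a (2 x 1) matrix."""
    Jv = np.array([[2 * x], [3 * x ** 2]])
    y = v(x)
    Ju = np.array([[np.cos(y[0]), 1.0], [0.0, 1.0]])
    return Ju @ Jv

x, eps = 0.7, 1e-6
fd = (f(x + eps) - f(x - eps)) / (2 * eps)   # finite-difference Jacobian (2 entries)
print(Jf_chain(x).ravel())                   # ~ [2x cos(x^2) + 3x^2, 3x^2]
print(fd)                                    # should match the line above
```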



Level curves of a function of 2 variables
Definition
Let f : R2 → R and z ∈ R. The level curve (or contour map) of f of level z is the set:

  Cz = {(x, y) ∈ R2 | f(x, y) = z}.

Example 1: f(x, y) = 2x² + 5y²
For a given z, we get the level curves from 2x² + 5y² = z.
⇒ It describes a family of ellipses.




Example 2: f(x, y) = −xy exp(−x² − y²)



Gradient and level curves
The gradient vector at (x, y ), ∇f (x, y ), is perpendicular to the level curve
of the function passing through (x, y ).
 
2 2 4x
Example 1:f (x, y ) = 2x + 5y , ∇f (x, y ) =
10y
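This perpendicularity can be checked numerically on Example 1 (a sketch, not part of the original slides), by parametrizing the level curve 2x² + 5y² = z as an ellipse and taking the dot product of its tangent with the gradient; the level z is an arbitrary assumption.

```python
import numpy as np

def grad_f(x, y):
    """Gradient of f(x, y) = 2x^2 + 5y^2."""
    return np.array([4 * x, 10 * y])

z = 3.0   # level of the curve 2x^2 + 5y^2 = z
for t in np.linspace(0.0, 2 * np.pi, 7):
    # Point on the level curve and tangent vector of the parametrization
    p = np.array([np.sqrt(z / 2) * np.cos(t), np.sqrt(z / 5) * np.sin(t)])
    tangent = np.array([-np.sqrt(z / 2) * np.sin(t), np.sqrt(z / 5) * np.cos(t)])
    print(np.dot(grad_f(*p), tangent))   # ~0: the gradient is perpendicular to the level curve
```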



Convexity



Convex sets

Definition
A set C ⊂ Rn is called convex if
∀(x, y) ∈ C², ∀t ∈ [0, 1], tx + (1 − t)y ∈ C.

(Figures: a convex set and a nonconvex set, each shown with a segment between two points x and y.)

Examples
▶ Rn
▶ Ball: {x ∈ Rn | ∥x∥ ≤ 1}



Convex functions
Generic definition
A function f : Rn → R is convex if
∀(x, y) ∈ (Rn )2 , ∀t ∈ [0, 1], f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y).

(Figures: a convex function, where the chord between (x, f(x)) and (y, f(y)) lies above the graph, and a nonconvex function, where it does not.)

Remark
A function f : Rn → Rm is convex iff its components (f1 , . . . fm ) are convex



Convex functions: examples

▶ Linear functions: f(x) = Ax for A ∈ Rm×n
▶ Affine functions: f(x) = Ax + b for A ∈ Rm×n and b ∈ Rm
▶ Squared Euclidean norm: f(x) = ∥x∥₂² = xT x



Smooth convex functions
Convexity and Gradient
A continuously differentiable function f : Rn → R is convex if and only if
∀(x, y) ∈ (Rn )2 , f (y) ≥ f (x) + ∇f (x)T (y − x).

(Figure: a convex function lies above its tangent line f(x) + ∇f(x)T(y − x) at every point x.)

Convexity and Hessian


A twice continuously differentiable function f : Rn → R is convex ⇔
∀x ∈ Rn , ∇2 f (x) is positive semi-definite (i.e., ∇2 f (x) ⪰ 0).
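The Hessian criterion can be applied numerically (a sketch, not part of the original slides): check the eigenvalues of ∇²f. The quadratic f(x, y) = 2x² + 5y² has a constant positive definite Hessian, while the earlier two-variable example has a Hessian with a negative eigenvalue at some points, so it is not convex; the test point is an arbitrary assumption.

```python
import numpy as np

def hess_quadratic(x):
    """Hessian of f(x, y) = 2x^2 + 5y^2: constant and positive definite, hence f is convex."""
    return np.array([[4.0, 0.0], [0.0, 10.0]])

def hess_example(x):
    """Hessian of f = (1 - x1)^2 + (1 - x2)^2 + 0.5 (2 x2 - x1^2)^2 (earlier slide)."""
    return np.array([[6 * x[0] ** 2 - 4 * x[1] + 2, -4 * x[0]],
                     [-4 * x[0], 6.0]])

print(np.linalg.eigvalsh(hess_quadratic([0.0, 0.0])))   # [ 4. 10.]: >= 0 everywhere
print(np.linalg.eigvalsh(hess_example([0.0, 2.0])))     # [-6.  6.]: a negative eigenvalue -> not convex
```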
Gradient and Hessian of a quadratic form

Definition
Let S ∈ Rn×n be a real symmetric matrix, c ∈ Rn, and α ∈ R; a quadratic form is a function f : Rn → R given by

  f(x) = (1/2) xT S x + cT x + α

In this case, one has

  ∇f(x) = Sx + c ∈ Rn

  ∇²f(x) = S

Convex quadratic function
The function f is convex ⇐⇒ S ⪰ 0



Gradient and Hessian of a quadratic form - Identification

  f(x) = (1/2) xT S x + cT x + α

Taylor expansion with x ∈ Rn and h ∈ Rn:

  f(x + h) = (1/2) (x + h)T S (x + h) + cT (x + h) + α
           = (1/2) (xT S x + xT S h + hT S x + hT S h) + cT x + cT h + α
           = f(x) + (1/2) (xT S h + hT S x) + (1/2) hT S h + cT h
           = f(x) + xT S h + (1/2) hT S h + cT h        (since hT S x = (hT S x)T = xT S h, S being symmetric)
           = f(x) + (Sx + c)T h + (1/2) hT S h,         with (Sx + c)T h = ∇f(x)T h and hT S h = hT ∇²f(x) h

By identification: ⇒ ∇f (x) = Sx + c ∈ Rn and ∇2 f (x) = S ∈ Rn×n .
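A brief numerical confirmation of this identification (a sketch, not part of the original slides): for a random symmetric S, finite differences of f recover Sx + c as the gradient, and the eigenvalues of S decide convexity; the dimension and random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
S = A + A.T                               # symmetric, but not necessarily positive semi-definite
c = rng.standard_normal(n)
alpha = 1.3

def f(x):
    """Quadratic form f(x) = 0.5 x^T S x + c^T x + alpha."""
    return 0.5 * x @ S @ x + c @ x + alpha

x = rng.standard_normal(n)
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])

print(np.max(np.abs(grad_fd - (S @ x + c))))   # ~1e-9: the gradient is Sx + c
print(np.all(np.linalg.eigvalsh(S) >= 0))      # True iff S ⪰ 0, i.e. f convex
```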

