lect4_removed

Convex Optimization
(EE227A: UC Berkeley)
Lecture 4
(Conjugates, subdifferentials)
31 Jan, 2013
◦
Suvrit Sra
Organizational
♥ HW1 due: 14th Feb 2013 in class.
♥ Please LATEX your solutions (contact TA if this is an issue)
♥ Discussion with classmates is ok
♥ Each person must submit his/her individual solutions
♥ Acknowledge any help you receive
♥ Do not copy!
♥ Make sure you understand the solution you submit
♥ Cite any source that you use
♥ Have fun solving problems!
2 / 30
Recap
I Eigenvalues, singular values, positive definiteness
I Convex sets, θ1 x + θ2 y ∈ C, θ1 + θ2 = 1, θi ≥ 0
I Convex functions, midpoint convex, recognizing convexity
I Norms, mixed-norms, matrix norms, dual norms
I Indicator, distance function, minimum of jointly convex
I Brief mention of other forms of convexity
x+y
f 2 ≤ 12 [f (x) + f (y)] + continuity =⇒ f is cvx
∇2 f (x) 0 implies f is cvx.
3 / 30
Fenchel Conjugate
4 / 30
Dual norms
Def. Let k·k be a norm on Rn . Its dual norm is
kuk∗ := sup uT x | kxk ≤ 1 .

uT x
Exercise: Verify that we may writekuk∗ = supx6=0 kxk
Exercise: Verify that kuk∗ is a norm.
I ku + vk∗ = sup (u + v)T x | kxk ≤ 1

I But sup (A + B) ≤ sup A + sup B
Exercise: Let 1/p + 1/q = 1, where p, q ≥ 1. Show that k·kq is

dual to k·kp . In particular, the `2 -norm is self-dual.
Hint: Use Hölder’s inequality: uT v ≤ kukp kvkq
5 / 30
Fenchel conjugate
Def. The Fenchel conjugate of a function f is
f ∗ (z) := sup xT z − f (x).

x∈dom f
Note: f ∗ pointwise (over x) sup of linear functions of z, so cvx!
Example Let f (x) = kxk. We have f ∗ (z) = Ik·k∗ ≤1 (z). That is,
conjugate of norm is the indicator function of dual norm ball.
I Consider two cases: (i) kzk∗ > 1; (ii) kzk∗ ≤ 1

I Case (i), by definition of dual norm (sup over z T u) there is a u s.t.
kuk ≤ 1 and z T u > 1
I f ∗ (z) = supx xT z − f (x). Rewrite x = αu, and let α → ∞
I Then, z T x − kxk = αz T u − kαuk = α(z T u − kuk); → ∞
I Case (ii): Since z T x ≤ kxkkzk∗ , xT z − kxk ≤ kxk(kzk∗ − 1) ≤ 0.
I x = 0 maximizes kxk(kzk∗ − 1), hence f (z) = 0.
I Thus, f (z) = +∞ if (i), and 0 if (ii), as desired.
6 / 30
Fenchel conjugate
Example f (x) = ax + b; then,
f ∗ (z) = sup zx − (ax + b)
x
= ∞, if (z − a) 6= 0.
7 / 30
Fenchel conjugate
Example f (x) = ax + b; then,
f ∗ (z) = sup zx − (ax + b)
x
= ∞, if (z − a) 6= 0.
Thus, dom f ∗ = {a}, and f ∗ (a) = −b.

√
Example Let a ≥ 0, and set √ f (x) = − a2 − x2 if |x| ≤ a, and +∞
otherwise. Then, f ∗ (z) = a 1 + z 2 .
Example f (x) = 12 xT Ax, where A 0. Then, f ∗ (z) = 12 z T A−1 z.
Exercise: If f (x) = max(0, 1 − x), then dom f ∗ is [−1, 0], and

within this domain, f ∗ (z) = z.
Hint: Analyze cases: max(0, 1 − x) = 0; and
max(0, 1 − x) = 1 − x
7 / 30
Subdifferentials
8 / 30
First order global underestimator
f (x) i
−y
), x
h∇ f (y
)+
f (y
f (y)
y x
f (x) ≥ f (y) + h∇f (y), x − yi
9 / 30
First order global underestimator
f (x)
f (y) x − yi
+ hg 1,
f (y )
g1
g2
g3
y
f (x) ≥ f (y) + hg, x − yi
9 / 30
Subgradients
f (x)
f (y) x − yi
+ hg 1,
f (y )
g1
g2
g3
y
g1 , g2 , g3 are subgradients at y
10 / 30
Subgradients – basic facts
I f is convex, differentiable: ∇f (y) the unique subgradient at y
I A vector g is a subgradient at a point y if and only if
f (y) + hg, x − yi is globally smaller than f (x).
I Usually, one subgradient costs approx. as much as f (x)
I Determining all subgradients at a given point — difficult.
I Subgradient calculus—a great achievement in convex analysis
I Without convexity, things become wild! — advanced course
11 / 30
Subgradients – example
f (x) := max(f1 (x), f2 (x)); both f1 , f2 convex, differentiable
f (x) f1 (x)
f2 (x)
y
? f1 (x) > f2 (x): unique subgradient of f is f10 (x)
? f1 (x) < f2 (x): unique subgradient of f is f20 (x)
? f1 (y) = f2 (y): subgradients, the segment [f10 (y), f20 (y)]
(imagine all supporting lines turning about point y)
12 / 30
Subgradients
Def. A vector g ∈ Rn is called a subgradient at a point y, if for all

x ∈ dom f , it holds that
f (x) ≥ f (y) + hg, x − yi
13 / 30
Subdifferential
Def. The set of all subgradients at y denoted by ∂f (y). This set is

called subdifferential of f at y
If f is convex, ∂f (x) is nice:
♣ If x ∈ relative interior of dom f , then ∂f (x) nonempty
♣ If f differentiable at x, then ∂f (x) = {∇f (x)}
♣ If ∂f (x) = {g}, then f is differentiable and g = ∇f (x)
14 / 30
Subdifferential – example
f (x) = |x|
∂f (x)
+1
x
−1

−1
 x < 0,
∂|x| = +1 x > 0,

[−1, 1] x = 0.

15 / 30
More examples
Example f (x) = kxk2 . Then,

(
kxk−1
2 x x 6= 0,
∂f (x) :=
{z | kzk2 ≤ 1} x = 0.
Proof.
kzk2 ≥ kxk2 + hg, z − xi
kzk2 ≥ hg, zi
=⇒ kgk2 ≤ 1.
16 / 30
More examples
Example A convex function need not be subdifferentiable everywhere.

Let (
−(1 − kxk22 )1/2 if kxk2 ≤ 1,
f (x) :=
+∞ otherwise.
f diff. for all x with kxk2 < 1, but ∂f (x) = ∅ whenever kxk2 ≥ 1.
17 / 30
Calculus
18 / 30
Recall basic calculus
If f and k are differentiable, we know that
Addition: ∇(f + k)(x) = ∇f (x) + ∇k(x)
Scaling: ∇(αf (x)) = α∇f (x)
Chain rule
If f :Rn → Rm , and k : Rm → Rp . Let h : Rn → Rp be the
composition h(x) = (k ◦ f )(x) = k(f (x)). Then,
Dh(x) = Dk(f (x))Df (x).
Example If f : Rn → R and k : R → R, then using the fact that

∇h(x) = [Dh(x)]T , we obtain
∇h(x) = k 0 (f (x))∇f (x).
19 / 30
Subgradient calculus
♠ Finding one subgradient within ∂f (x)
♠ Determining entire subdifferential ∂f (x) at a point x
♠ Do we have the chain rule?
♠ Usually not easy!
20 / 30
Subgradient calculus
H
If f is differentiable, ∂f (x) = {∇f (x)}
H
Scaling α > 0, ∂(αf )(x) = α∂f (x) = {αg | g ∈ ∂f (x)}
Addition∗ : ∂(f + k)(x) = ∂f (x) + ∂k(x) (set addition)
H
Chain rule∗ : Let A ∈ Rm×n , b ∈ Rm , f : Rm → R, and

H
h : Rn → R be given by h(x) = f (Ax + b). Then,

∂h(x) = AT ∂f (Ax + b).
Chain rule∗ : h(x) = f ◦ k, where k : X → Y is diff.
H
∂h(x) = ∂f (k(x)) ◦ Dk(x) = [Dk(x)]T ∂f (k(x))
Max function∗ : If f (x) := max1≤i≤m fi (x), then

H
[
∂f (x) = conv {∂fi (x) | fi (x) = f (x)} ,
convex hull over subdifferentials of “active” functions at x

Conjugation: z ∈ ∂f (x) if and only if x ∈ ∂f ∗ (z)
H
21 / 30
Examples
It can happen that ∂(f1 + f2 ) 6= ∂f1 + ∂f2
Example Define f1 and f2 by

( √ (
−2 x if x ≥ 0, +∞ if x > 0,
f1 (x) := and f2 (x) := √
+∞ if x < 0, −2 −x if x ≤ 0.
Then, f = max {f1 , f2 } = I0 , whereby ∂f (0) = R

But ∂f1 (0) = ∂f2 (0) = ∅.
However, ∂f1 (x) + ∂f2 (x) ⊂ ∂(f1 + f2 )(x) always holds.
22 / 30
Examples
Example f (x) = kxk∞ . Then,
∂f (0) = conv {±e1 , . . . , ±en } ,
where ei is i-th canonical basis vector (column of identity matrix).

To prove, notice that f (x) = max1≤i≤n |eTi x| .

23 / 30
Example (S. Boyd)
Example Let f (x) = max sT x | si ∈ {−1, 1} (2n members)

(−1, 1)
+1 (1, 1)
+1
−1 −1
(1, −1)
∂f at x = (0, 0) ∂f at x = (1, 0) ∂f at x = (1, 1)
24 / 30
Rules for subgradients
25 / 30
Subgradient for pointwise sup
f (x) := sup h(x, y)
y∈Y
Getting ∂f (x) is complicated!
Simple way to obtain some g ∈ ∂f (x):

I Pick any y ∗ for which h(x, y ∗ ) = f (x)
I Pick any subgradient g ∈ ∂h(x, y ∗ )
I This g ∈ ∂f (x)
h(z, y ∗ ) ≥ h(x, y ∗ ) + g T (z − x)
h(z, y ∗ ) ≥ f (x) + g T (z − x)
f (z) ≥ h(z, y) (because of sup)
T
f (z) ≥ f (x) + g (z − x).
26 / 30
Example
Suppose ai ∈ Rn and bi ∈ R. And
f (x) := max (aTi x + bi ).

1≤i≤n
This f a max (in fact, over a finite number of terms)

I Suppose f (x) = aTk x + bk for some index k
I Here f (x; y) = fk (x) = aTk x + bk , and ∂fk (x) = {∇fk (x)}
I Hence, ak ∈ ∂f (x) works!
27 / 30
Subgradient of expectation
Suppose f = Ef (x, u), where f is convex in x for each u (an r.v.)
Z
f (x) := f (x, u)p(u)du
I For each u choose any g(x, u) ∈ ∂x f (x, u)

R
I Then, g = g(x, u)p(u)du = Eg(x, u) ∈ ∂f (x)
28 / 30
Subgradient of composition
Suppose h : Rn → R cvx and nondecreasing; each fi cvx
f (x) := h(f1 (x), f2 (x), . . . , fn (x)).
To find a vector g ∈ ∂f (x), we may:

I For i = 1 to n, compute gi ∈ ∂fi (x)
I Compute u ∈ ∂h(f1 (x), . . . , fn (x))
I Set g = u1 g1 + u2 g2 + · · · + un gn ; this g ∈ ∂f (x)
I Compare with ∇f (x) = J∇h(x), where J matrix of ∇fi (x)
Exercise: Verify g ∈ ∂f (x) by showing f (z) ≥ f (x) + g T (z − x)
29 / 30
References
1 R. T. Rockafellar. Convex Analysis
2 S. Boyd (Stanford); EE364b Lecture Notes.
30 / 30

lect4_removed

Uploaded by

Copyright:

Available Formats

lect4_removed

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

lect4_removed

Uploaded by

Copyright:

Available Formats

Convex Optimization

Def. Let k·k be a norm on Rn . Its dual norm is

kuk∗ := sup uT x | kxk ≤ 1 .

I But sup (A + B) ≤ sup A + sup B

Exercise: Let 1/p + 1/q = 1, where p, q ≥ 1. Show that k·kq is

f ∗ (z) := sup xT z − f (x).

Note: f ∗ pointwise (over x) sup of linear functions of z, so cvx!

I Consider two cases: (i) kzk∗ > 1; (ii) kzk∗ ≤ 1

Thus, dom f ∗ = {a}, and f ∗ (a) = −b.

Example f (x) = 12 xT Ax, where A  0. Then, f ∗ (z) = 12 z T A−1 z.

Exercise: If f (x) = max(0, 1 − x), then dom f ∗ is [−1, 0], and

f (x) ≥ f (y) + h∇f (y), x − yi

f (x) := max(f1 (x), f2 (x)); both f1 , f2 convex, differentiable

Def. A vector g ∈ Rn is called a subgradient at a point y, if for all

f (x) ≥ f (y) + hg, x − yi

Def. The set of all subgradients at y denoted by ∂f (y). This set is

Example f (x) = kxk2 . Then,

Example A convex function need not be subdifferentiable everywhere.

Example If f : Rn → R and k : R → R, then using the fact that

∇h(x) = k 0 (f (x))∇f (x).

Chain rule∗ : Let A ∈ Rm×n , b ∈ Rm , f : Rm → R, and

h : Rn → R be given by h(x) = f (Ax + b). Then,

∂h(x) = ∂f (k(x)) ◦ Dk(x) = [Dk(x)]T ∂f (k(x))

Max function∗ : If f (x) := max1≤i≤m fi (x), then

convex hull over subdifferentials of “active” functions at x

It can happen that ∂(f1 + f2 ) 6= ∂f1 + ∂f2

Example Define f1 and f2 by

Then, f = max {f1 , f2 } = I0 , whereby ∂f (0) = R

However, ∂f1 (x) + ∂f2 (x) ⊂ ∂(f1 + f2 )(x) always holds.

Example f (x) = kxk∞ . Then,

∂f (0) = conv {±e1 , . . . , ±en } ,

where ei is i-th canonical basis vector (column of identity matrix).

Example Let f (x) = max sT x | si ∈ {−1, 1} (2n members)

∂f at x = (0, 0) ∂f at x = (1, 0) ∂f at x = (1, 1)

Getting ∂f (x) is complicated!

Simple way to obtain some g ∈ ∂f (x):

f (x) := max (aTi x + bi ).

This f a max (in fact, over a finite number of terms)

I For each u choose any g(x, u) ∈ ∂x f (x, u)

f (x) := h(f1 (x), f2 (x), . . . , fn (x)).

To find a vector g ∈ ∂f (x), we may:

Exercise: Verify g ∈ ∂f (x) by showing f (z) ≥ f (x) + g T (z − x)

You might also like

Example f (x) = 12 xT Ax, where A 0. Then, f ∗ (z) = 12 z T A−1 z.