Convex Optimization

(EE227A: UC Berkeley)

Lecture 4
(Conjugates, subdifferentials)
31 Jan, 2013

Suvrit Sra
- Eigenvalues, singular values, positive definiteness
- Convex sets, θ1 x + θ2 y ∈ C, θ1 + θ2 = 1, θi ≥ 0
- Convex functions, midpoint convex, recognizing convexity
- Norms, mixed-norms, matrix norms, dual norms
- Indicator, distance function, minimum of jointly convex
- Brief mention of other forms of convexity

$f\left(\tfrac{x+y}{2}\right) \le \tfrac{1}{2}\,[f(x) + f(y)]$ + continuity $\implies$ $f$ is convex
$\nabla^2 f(x) \succeq 0$ implies $f$ is convex.

Fenchel Conjugate

Dual norms

Def. Let $\|\cdot\|$ be a norm on $\mathbb{R}^n$. Its dual norm is

$\|u\|_* := \sup\{\, u^T x \mid \|x\| \le 1 \,\}.$

Exercise: Verify that we may write $\|u\|_* = \sup_{x \ne 0} \dfrac{u^T x}{\|x\|}$.
Exercise: Verify that $\|u\|_*$ is a norm.
- $\|u + v\|_* = \sup\{\, (u + v)^T x \mid \|x\| \le 1 \,\}$
- But $\sup(A + B) \le \sup A + \sup B$

Exercise: Let $1/p + 1/q = 1$, where $p, q \ge 1$. Show that $\|\cdot\|_q$ is dual to $\|\cdot\|_p$. In particular, the $\ell_2$-norm is self-dual.
Hint: Use Hölder's inequality: $u^T v \le \|u\|_p \|v\|_q$
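The exercises above can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the lecture; the helper names `lp_norm`, `dual_of_l1`, and `holder_gap` are hypothetical. It evaluates the dual of the l_1 norm at the extreme points of the l_1 unit ball (where a linear function attains its sup) and checks Hölder's inequality for two conjugate pairs (p, q).

```python
import math

def lp_norm(x, p):
    """l_p norm of a list of floats; p may be math.inf."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def dual_of_l1(u):
    """sup { u^T x : ||x||_1 <= 1 }, evaluated at the extreme points +/- e_i.

    u^T (+/- e_i) = +/- u_i, so the sup is the largest |u_i|.
    """
    return max(max(ui, -ui) for ui in u)

def holder_gap(u, v, p, q):
    """||u||_p * ||v||_q - u^T v; nonnegative by Hoelder when 1/p + 1/q = 1."""
    inner = sum(ui * vi for ui, vi in zip(u, v))
    return lp_norm(u, p) * lp_norm(v, q) - inner

u = [3.0, -4.0, 1.0]
v = [0.5, -2.0, 1.0]
l1_dual = dual_of_l1(u)             # should agree with the l_inf norm of u
gap_22 = holder_gap(u, v, 2, 2)     # Cauchy-Schwarz case (p = q = 2)
gap_3 = holder_gap(u, v, 3, 1.5)    # 1/3 + 1/1.5 = 1
```

Here `l1_dual` coincides with `lp_norm(u, math.inf)`, which illustrates the pair p = 1, q = ∞; it is a check on one vector, not a proof.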

Fenchel conjugate
Def. The Fenchel conjugate of a function $f$ is

$f^*(z) := \sup_{x \in \operatorname{dom} f} \; x^T z - f(x).$

Note: $f^*$ is a pointwise (over $x$) sup of affine functions of $z$, so it is convex!

Example Let $f(x) = \|x\|$. We have $f^*(z) = I_{\|\cdot\|_* \le 1}(z)$. That is, the conjugate of a norm is the indicator function of the dual norm ball.

- Consider two cases: (i) $\|z\|_* > 1$; (ii) $\|z\|_* \le 1$
- Case (i): by definition of the dual norm (sup over $z^T u$) there is a $u$ s.t. $\|u\| \le 1$ and $z^T u > 1$
- $f^*(z) = \sup_x x^T z - f(x)$. Rewrite $x = \alpha u$ with $\alpha > 0$, and let $\alpha \to \infty$
- Then, $z^T x - \|x\| = \alpha z^T u - \|\alpha u\| = \alpha(z^T u - \|u\|) \to \infty$, since $z^T u > 1 \ge \|u\|$
- Case (ii): Since $z^T x \le \|x\|\,\|z\|_*$, we get $x^T z - \|x\| \le \|x\|(\|z\|_* - 1) \le 0$.
- $x = 0$ maximizes $\|x\|(\|z\|_* - 1)$ and gives $x^T z - \|x\| = 0$, hence $f^*(z) = 0$.
- Thus, $f^*(z) = +\infty$ if (i), and $0$ if (ii), as desired.
Fenchel conjugate
Example $f(x) = ax + b$; then,
$f^*(z) = \sup_x \; zx - (ax + b) = \infty$, if $(z - a) \ne 0$.

Thus, $\operatorname{dom} f^* = \{a\}$, and $f^*(a) = -b$.

Example Let $a \ge 0$, and set $f(x) = -\sqrt{a^2 - x^2}$ if $|x| \le a$, and $+\infty$ otherwise. Then, $f^*(z) = a\sqrt{1 + z^2}$.

Example $f(x) = \tfrac{1}{2} x^T A x$, where $A \succ 0$. Then, $f^*(z) = \tfrac{1}{2} z^T A^{-1} z$.
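A short worked derivation of the quadratic example, sketched under the usual assumption that $A$ is symmetric (so the gradient of $\tfrac12 x^T A x$ is $Ax$). The sup is attained where the gradient of the concave objective vanishes; $x^\star$ is notation introduced here for that maximizer.

```latex
\begin{align*}
f^*(z) &= \sup_x \; x^T z - \tfrac12 x^T A x, \\
0 &= \nabla_x\bigl(x^T z - \tfrac12 x^T A x\bigr) = z - A x
  \quad\Longrightarrow\quad x^\star = A^{-1} z, \\
f^*(z) &= z^T A^{-1} z - \tfrac12 z^T A^{-1} A A^{-1} z
        = \tfrac12 z^T A^{-1} z.
\end{align*}
```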

Exercise: If $f(x) = \max(0, 1 - x)$, then $\operatorname{dom} f^*$ is $[-1, 0]$, and within this domain, $f^*(z) = z$.
Hint: Analyze cases: $\max(0, 1 - x) = 0$; and $\max(0, 1 - x) = 1 - x$
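The closed-form conjugates in the one-dimensional examples above can be checked roughly by replacing the sup with a max over a finite grid. This is only a sketch: the function name `conjugate_on_grid`, the grid range [-5, 5], the step 1e-3, and the values a = 2, A = 4, z = 0.75 are all assumptions chosen for illustration, and a grid can never detect that a sup is infinite.

```python
import math

def conjugate_on_grid(f, z, xs):
    """Approximate f*(z) = sup_x (x*z - f(x)) by a max over the grid xs."""
    return max(x * z - f(x) for x in xs)

# Grid on [-5, 5] with step 1e-3 (an arbitrary choice for this sketch).
xs = [i / 1000.0 for i in range(-5000, 5001)]

def half_circle(x, a=2.0):
    """f(x) = -sqrt(a^2 - x^2) on |x| <= a, +inf otherwise."""
    return -math.sqrt(a * a - x * x) if abs(x) <= a else math.inf

def quadratic(x, A=4.0):
    """One-dimensional instance of f(x) = 0.5 * x^T A x with A = 4 > 0."""
    return 0.5 * A * x * x

z = 0.75
approx_circle = conjugate_on_grid(half_circle, z, xs)
exact_circle = 2.0 * math.sqrt(1.0 + z * z)   # a * sqrt(1 + z^2) with a = 2
approx_quad = conjugate_on_grid(quadratic, z, xs)
exact_quad = 0.5 * z * z / 4.0                # 0.5 * z^T A^{-1} z with A = 4
```

Because the grid is a subset of the domain, each approximation is a lower bound on the true conjugate value and approaches it as the grid is refined.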

First order global underestimator

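[Figure: graph of a convex function f(x) with its tangent line f(y) + ⟨∇f(y), x − y⟩ at the point y; the line lies below the graph.]

$f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle$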

First order global underestimator

[Figure: graph of a convex function f(x) with several supporting lines f(y) + ⟨g_i, x − y⟩ through the point (y, f(y)).]

$f(x) \ge f(y) + \langle g, x - y \rangle$

$g_1, g_2, g_3$ are subgradients at $y$

Subgradients – basic facts
- $f$ is convex, differentiable: $\nabla f(y)$ is the unique subgradient at $y$
- A vector $g$ is a subgradient at a point $y$ if and only if $f(y) + \langle g, x - y \rangle$ is a global underestimator of $f$, i.e., it is at most $f(x)$ for every $x$.
- Usually, computing one subgradient costs approximately as much as evaluating $f(x)$
- Determining all subgradients at a given point is difficult.
- Subgradient calculus: a great achievement in convex analysis
- Without convexity, things become wild! (advanced course)

Subgradients – example

$f(x) := \max(f_1(x), f_2(x))$; both $f_1, f_2$ convex, differentiable

[Figure: graphs of f_1(x) and f_2(x); f(x) is their upper envelope.]

- $f_1(x) > f_2(x)$: unique subgradient of $f$ is $f_1'(x)$
- $f_1(x) < f_2(x)$: unique subgradient of $f$ is $f_2'(x)$
- $f_1(y) = f_2(y)$: subgradients, the segment $[f_1'(y), f_2'(y)]$
  (imagine all supporting lines turning about point $y$)
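The three cases above can be illustrated on a concrete pair of functions. The sketch below assumes the hypothetical choice f_1(x) = x^2 and f_2(x) = (x - 2)^2, which cross at y = 1; the helper names are made up for this illustration. It returns the interval of subgradients (the segment between the two derivatives, sorted) and tests the subgradient inequality on a grid.

```python
def f1(x):
    return x * x

def f2(x):
    return (x - 2.0) ** 2

def df1(x):
    return 2.0 * x

def df2(x):
    return 2.0 * (x - 2.0)

def f(x):
    return max(f1(x), f2(x))

def subdifferential_of_max(x, tol=1e-12):
    """Return (lo, hi), the interval of subgradients of f = max(f1, f2) at x."""
    if f1(x) > f2(x) + tol:
        return (df1(x), df1(x))      # only f1 is active
    if f2(x) > f1(x) + tol:
        return (df2(x), df2(x))      # only f2 is active
    lo, hi = sorted((df1(x), df2(x)))
    return (lo, hi)                  # both active: segment between derivatives

def is_subgradient(g, y, xs):
    """Check f(x) >= f(y) + g*(x - y) on the sample points xs."""
    return all(f(x) >= f(y) + g * (x - y) - 1e-12 for x in xs)

xs = [i / 10.0 for i in range(-50, 51)]
interval_at_kink = subdifferential_of_max(1.0)
interval_right = subdifferential_of_max(3.0)
inside_ok = is_subgradient(0.0, 1.0, xs)     # 0 lies in the segment at y = 1
outside_ok = is_subgradient(2.5, 1.0, xs)    # 2.5 lies outside the segment
```

A grid check can only refute a candidate subgradient; passing it is evidence, not a proof.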
Def. A vector $g \in \mathbb{R}^n$ is called a subgradient at a point $y$, if for all $x \in \operatorname{dom} f$, it holds that

$f(x) \ge f(y) + \langle g, x - y \rangle$

Def. The set of all subgradients at $y$ is denoted by $\partial f(y)$. This set is called the subdifferential of $f$ at $y$.

If $f$ is convex, $\partial f(x)$ is nice:
♣ If $x \in$ relative interior of $\operatorname{dom} f$, then $\partial f(x)$ is nonempty
♣ If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$
♣ If $\partial f(x) = \{g\}$, then $f$ is differentiable at $x$ and $g = \nabla f(x)$

Subdifferential – example

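$f(x) = |x|$

[Figure: graph of f(x) = |x| and of its subdifferential ∂f(x).]

$\partial |x| = \begin{cases} -1 & x < 0, \\ +1 & x > 0, \\ [-1, 1] & x = 0. \end{cases}$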

More examples

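Example $f(x) = \|x\|_2$. Then,

$\partial f(x) := \begin{cases} \|x\|_2^{-1}\, x & x \ne 0, \\ \{z \mid \|z\|_2 \le 1\} & x = 0. \end{cases}$

For the case $x = 0$: $g \in \partial f(0)$ means that for all $z$,

$\|z\|_2 \ge \|x\|_2 + \langle g, z - x \rangle$ with $x = 0$, i.e.,
$\|z\|_2 \ge \langle g, z \rangle$
$\implies \|g\|_2 \le 1.$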
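The formula for the subdifferential of the Euclidean norm translates directly into code. This is a minimal sketch with hypothetical helper names (`l2_subgradient`, `violates`); at the origin it returns the zero vector, which is one arbitrary choice from the unit ball. The random test points are only a spot check of the subgradient inequality.

```python
import math
import random

def l2_norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def l2_subgradient(x):
    """Return one element of the subdifferential of ||.||_2 at x."""
    nrm = l2_norm(x)
    if nrm == 0.0:
        # Any g with ||g||_2 <= 1 is valid at the origin; 0 is the simplest.
        return [0.0 for _ in x]
    return [xi / nrm for xi in x]

def violates(g, x, z, tol=1e-12):
    """True if the inequality ||z|| >= ||x|| + <g, z - x> fails at z."""
    rhs = l2_norm(x) + sum(gi * (zi - xi) for gi, zi, xi in zip(g, z, x))
    return l2_norm(z) < rhs - tol

rng = random.Random(0)
points = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(200)]

x = [3.0, 0.0, -4.0]
g = l2_subgradient(x)
ok_at_x = not any(violates(g, x, z) for z in points)

# At the origin, a vector outside the unit ball is not a subgradient:
# the inequality already fails at z = e_1.
bad = [1.5, 0.0, 0.0]
bad_fails = violates(bad, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
```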

More examples

Example A convex function need not be subdifferentiable everywhere.

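Let
$f(x) := \begin{cases} -(1 - \|x\|_2^2)^{1/2} & \text{if } \|x\|_2 \le 1, \\ +\infty & \text{otherwise.} \end{cases}$

$f$ is differentiable for all $x$ with $\|x\|_2 < 1$, but $\partial f(x) = \emptyset$ whenever $\|x\|_2 \ge 1$.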

Recall basic calculus
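If $f$ and $k$ are differentiable, we know that
Addition: $\nabla (f + k)(x) = \nabla f(x) + \nabla k(x)$
Scaling: $\nabla (\alpha f)(x) = \alpha \nabla f(x)$

Chain rule
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $k : \mathbb{R}^m \to \mathbb{R}^p$, and let $h : \mathbb{R}^n \to \mathbb{R}^p$ be the composition $h(x) = (k \circ f)(x) = k(f(x))$. Then,
$Dh(x) = Dk(f(x))\,Df(x).$

Example If $f : \mathbb{R}^n \to \mathbb{R}$ and $k : \mathbb{R} \to \mathbb{R}$, then using the fact that $\nabla h(x) = [Dh(x)]^T$, we obtain

$\nabla h(x) = k'(f(x))\,\nabla f(x).$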
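As a quick illustration of the scalar chain rule, the sketch below compares the analytic gradient k'(f(x)) ∇f(x) with central finite differences. The particular choices f(x) = x^T x and k(t) = sin(t), the point `x0`, and the step size are assumptions made for this example only.

```python
import math

def f(x):
    return sum(xi * xi for xi in x)

def grad_f(x):
    return [2.0 * xi for xi in x]

def k(t):
    return math.sin(t)

def dk(t):
    return math.cos(t)

def h(x):
    return k(f(x))

def grad_h(x):
    """Chain rule: grad h(x) = k'(f(x)) * grad f(x)."""
    scale = dk(f(x))
    return [scale * gi for gi in grad_f(x)]

def numeric_grad(fun, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of fun at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((fun(xp) - fun(xm)) / (2.0 * eps))
    return grad

x0 = [0.3, -0.7, 0.5]
analytic = grad_h(x0)
numeric = numeric_grad(h, x0)
max_err = max(abs(a - b) for a, b in zip(analytic, numeric))
```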

Subgradient calculus
♠ Finding one subgradient within ∂f (x)
♠ Determining entire subdifferential ∂f (x) at a point x
♠ Do we have the chain rule?
♠ Usually not easy!

Subgradient calculus
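If $f$ is differentiable, $\partial f(x) = \{\nabla f(x)\}$
Scaling: for $\alpha > 0$, $\partial(\alpha f)(x) = \alpha\,\partial f(x) = \{\alpha g \mid g \in \partial f(x)\}$
Addition∗: $\partial(f + k)(x) = \partial f(x) + \partial k(x)$ (set addition)

Chain rule∗: Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $f : \mathbb{R}^m \to \mathbb{R}$, and let $h : \mathbb{R}^n \to \mathbb{R}$ be given by $h(x) = f(Ax + b)$. Then,

$\partial h(x) = A^T \partial f(Ax + b).$

Chain rule∗: $h = f \circ k$, where $k : X \to Y$ is differentiable. Then,

$\partial h(x) = \partial f(k(x)) \circ Dk(x) = [Dk(x)]^T \partial f(k(x))$

Max function∗: If $f(x) := \max_{1 \le i \le m} f_i(x)$, then

$\partial f(x) = \operatorname{conv} \bigcup \{\partial f_i(x) \mid f_i(x) = f(x)\},$

the convex hull over subdifferentials of "active" functions at $x$

Conjugation: $z \in \partial f(x)$ if and only if $x \in \partial f^*(z)$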
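The affine chain rule can be tried out on h(x) = ||Ax + b||_1. The sketch below is an illustration under assumptions: the matrix `A`, vector `b`, point `x`, and all helper names are made up, and sign(0) = 0 is used as one valid element of [-1, 1] for zero coordinates. It forms g = A^T s with s a subgradient of the l_1 norm at Ax + b and spot-checks the subgradient inequality on random points.

```python
import random

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def transpose_matvec(A, y):
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]

def l1_norm(y):
    return sum(abs(v) for v in y)

def l1_subgradient(y):
    # sign(y_i) when y_i != 0; 0 is a valid choice from [-1, 1] when y_i == 0
    return [float((v > 0) - (v < 0)) for v in y]

def h(A, b, x):
    return l1_norm([r + bi for r, bi in zip(matvec(A, x), b)])

def h_subgradient(A, b, x):
    """One element of A^T * subdifferential of ||.||_1 at Ax + b."""
    y = [r + bi for r, bi in zip(matvec(A, x), b)]
    return transpose_matvec(A, l1_subgradient(y))

A = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
b = [0.0, 1.0, -4.0]
x = [1.0, 1.0]                      # here Ax + b = (3, 0, 0)
g = h_subgradient(A, b, x)

rng = random.Random(1)
trial_points = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(500)]
holds = all(
    h(A, b, z) >= h(A, b, x)
    + sum(gi * (zi - xi) for gi, zi, xi in zip(g, z, x)) - 1e-9
    for z in trial_points
)
```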

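It can happen that $\partial(f_1 + f_2) \ne \partial f_1 + \partial f_2$

Example Define $f_1$ and $f_2$ by

$f_1(x) := \begin{cases} -2\sqrt{x} & \text{if } x \ge 0, \\ +\infty & \text{if } x < 0, \end{cases}$ and $f_2(x) := \begin{cases} +\infty & \text{if } x > 0, \\ -2\sqrt{-x} & \text{if } x \le 0. \end{cases}$

Then, $f = f_1 + f_2 = \max\{f_1, f_2\} = I_{\{0\}}$ (the indicator of $\{0\}$), whereby $\partial f(0) = \mathbb{R}$

But $\partial f_1(0) = \partial f_2(0) = \emptyset$.

However, $\partial f_1(x) + \partial f_2(x) \subset \partial(f_1 + f_2)(x)$ always holds.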
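A brief worked step, added here as a sketch, showing why $\partial f_1(0)$ is empty in this example: the subgradient inequality at $y = 0$ would force $g$ below an unbounded quantity. The argument for $f_2$ is symmetric.

```latex
\begin{align*}
g \in \partial f_1(0)
  &\iff f_1(x) \ge f_1(0) + g\,(x - 0) \quad \text{for all } x \ge 0 \\
  &\iff -2\sqrt{x} \ge g\,x \quad \text{for all } x > 0 \\
  &\iff g \le -\frac{2}{\sqrt{x}} \quad \text{for all } x > 0 .
\end{align*}
% As x decreases to 0 the bound -2/sqrt(x) tends to -infinity,
% so no real g satisfies it and the subdifferential at 0 is empty.
```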

Example $f(x) = \|x\|_\infty$. Then,

$\partial f(0) = \operatorname{conv}\{\pm e_1, \ldots, \pm e_n\},$

where $e_i$ is the $i$-th canonical basis vector (column of the identity matrix).

To prove this, notice that $f(x) = \max_{1 \le i \le n} |e_i^T x|$.

Example (S. Boyd)

Example Let $f(x) = \max\{\, s^T x \mid s_i \in \{-1, 1\} \,\}$ ($2^n$ members)

[Figure: three panels in the plane, with ticks at ±1 and points (−1, 1), (1, 1), (1, −1) marked: ∂f at x = (0, 0), ∂f at x = (1, 0), ∂f at x = (1, 1).]
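For small n the 2^n sign vectors can simply be enumerated. The following sketch (helper names are hypothetical) lists the "active" sign vectors s with s^T x = f(x) at the three points named in the panels; by the max rule, the subdifferential is the convex hull of these active vectors.

```python
from itertools import product

def inner(s, x):
    return sum(si * xi for si, xi in zip(s, x))

def f(x):
    """f(x) = max over all sign vectors s in {-1, 1}^n of s^T x."""
    return max(inner(s, x) for s in product((-1, 1), repeat=len(x)))

def active_sign_vectors(x, tol=1e-12):
    """Sign vectors attaining the max; their convex hull is the subdifferential."""
    fx = f(x)
    return [s for s in product((-1, 1), repeat=len(x))
            if abs(inner(s, x) - fx) <= tol]

at_origin = active_sign_vectors((0.0, 0.0))   # all four corners of the square
at_10 = active_sign_vectors((1.0, 0.0))       # two corners: a vertical segment
at_11 = active_sign_vectors((1.0, 1.0))       # a single corner
```

Enumeration costs 2^n evaluations, so this is only practical for tiny n; it is meant as an illustration, not as a way to compute with this function.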

Subgradient for pointwise sup
$f(x) := \sup_{y} h(x, y)$

Getting $\partial f(x)$ is complicated!

Simple way to obtain some $g \in \partial f(x)$:

- Pick any $y^*$ for which $h(x, y^*) = f(x)$
- Pick any subgradient $g \in \partial_x h(x, y^*)$
- This $g \in \partial f(x)$

$h(z, y^*) \ge h(x, y^*) + g^T(z - x)$
$h(z, y^*) \ge f(x) + g^T(z - x)$
$f(z) \ge h(z, y^*)$ (because of sup)
$f(z) \ge f(x) + g^T(z - x).$

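Suppose $a_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$, and

$f(x) := \max_i \,(a_i^T x + b_i).$

This $f$ is a max (in fact, over a finite number of terms)

- Suppose $f(x) = a_k^T x + b_k$ for some index $k$
- Here the maximizing index $k$ plays the role of $y^*$: $h(x, y^*) = f_k(x) = a_k^T x + b_k$, and $\partial f_k(x) = \{\nabla f_k(x)\} = \{a_k\}$
- Hence, $a_k \in \partial f(x)$ works!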
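The recipe above amounts to an argmax followed by a lookup. A minimal sketch, with made-up data `a_list`, `b_list` and a hypothetical function name `max_affine`:

```python
def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def max_affine(a_list, b_list, x):
    """Return (f(x), g) where g = a_k for some index k attaining the max."""
    values = [dot(a, x) + b for a, b in zip(a_list, b_list)]
    k = max(range(len(values)), key=lambda i: values[i])
    return values[k], list(a_list[k])

a_list = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_list = [0.0, 0.5, 1.0]
x = [2.0, 1.0]
fx, g = max_affine(a_list, b_list, x)

# Spot-check f(z) >= f(x) + g^T (z - x) on a grid of half-integer points.
grid = [[i / 2.0, j / 2.0] for i in range(-10, 11) for j in range(-10, 11)]
holds = all(
    max_affine(a_list, b_list, z)[0]
    >= fx + dot(g, [zi - xi for zi, xi in zip(z, x)])
    for z in grid
)
```

When several indices attain the max, `max` returns the first one; any of them yields a valid subgradient.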

Subgradient of expectation
Suppose $f(x) = \mathbb{E}\,f(x, u)$, where $f(x, u)$ is convex in $x$ for each $u$ (an r.v.)
$f(x) := \int f(x, u)\,p(u)\,du$

- For each $u$ choose any $g(x, u) \in \partial_x f(x, u)$
- Then, $g = \int g(x, u)\,p(u)\,du = \mathbb{E}\,g(x, u) \in \partial f(x)$
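To make the rule concrete, the sketch below swaps the density p(u) for a small discrete distribution, so the integral becomes a finite sum; this substitution, the values in `u_values` and `probs`, and the choice f(x, u) = |x - u| are all assumptions for illustration. The subgradient of the expectation is then the probability-weighted average of per-u subgradients.

```python
def sign(v):
    return float((v > 0) - (v < 0))

# Hypothetical discrete distribution of u (values and probabilities).
u_values = [-1.0, 0.0, 2.0]
probs = [0.25, 0.5, 0.25]

def f(x):
    """f(x) = E|x - u| = sum_i p_i * |x - u_i|."""
    return sum(p * abs(x - u) for p, u in zip(probs, u_values))

def subgradient(x):
    """E g(x, u) with g(x, u) = sign(x - u), a subgradient of |x - u| in x."""
    return sum(p * sign(x - u) for p, u in zip(probs, u_values))

x = 1.0
g = subgradient(x)

# Spot-check the subgradient inequality on a grid.
xs = [i / 10.0 for i in range(-40, 41)]
holds = all(f(z) >= f(x) + g * (z - x) - 1e-12 for z in xs)
```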

Subgradient of composition
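Suppose $h : \mathbb{R}^n \to \mathbb{R}$ is convex and nondecreasing in each argument; each $f_i$ is convex

$f(x) := h(f_1(x), f_2(x), \ldots, f_n(x)).$

To find a vector $g \in \partial f(x)$, we may:

- For $i = 1$ to $n$, compute $g_i \in \partial f_i(x)$
- Compute $u \in \partial h(f_1(x), \ldots, f_n(x))$
- Set $g = u_1 g_1 + u_2 g_2 + \cdots + u_n g_n$; this $g \in \partial f(x)$
- Compare with the differentiable case: $\nabla f(x) = J\,\nabla h(f_1(x), \ldots, f_n(x))$, where $J$ is the matrix whose columns are $\nabla f_i(x)$

Exercise: Verify $g \in \partial f(x)$ by showing $f(z) \ge f(x) + g^T(z - x)$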
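The recipe can be tried on a small instance before attempting the proof. The sketch below assumes h is log-sum-exp (convex and nondecreasing in each argument, with the softmax as its gradient), f_1 the l_1 norm on R^2, and f_2(x) = (x_1 - 1)^2; these choices and all names are illustrative. It builds g = u_1 g_1 + u_2 g_2 at a point where f_1 is not differentiable and spot-checks the inequality on random points, which is evidence rather than a proof.

```python
import math
import random

def f1(x):                      # convex: l1 norm on R^2
    return abs(x[0]) + abs(x[1])

def g1(x):                      # one subgradient of f1 (sign, with sign(0) = 0)
    return [float((v > 0) - (v < 0)) for v in x]

def f2(x):                      # convex and differentiable
    return (x[0] - 1.0) ** 2

def g2(x):                      # gradient of f2
    return [2.0 * (x[0] - 1.0), 0.0]

def h(y):                       # log-sum-exp, computed stably
    m = max(y)
    return m + math.log(sum(math.exp(v - m) for v in y))

def grad_h(y):                  # softmax, the gradient of log-sum-exp
    m = max(y)
    w = [math.exp(v - m) for v in y]
    s = sum(w)
    return [wi / s for wi in w]

def f(x):
    return h([f1(x), f2(x)])

def composite_subgradient(x):
    """g = u_1 * g_1 + u_2 * g_2 with u in the subdifferential of h."""
    u = grad_h([f1(x), f2(x)])
    a, b = g1(x), g2(x)
    return [u[0] * a[i] + u[1] * b[i] for i in range(2)]

x = [0.0, 2.0]                  # f1 has a kink here (first coordinate is 0)
g = composite_subgradient(x)

rng = random.Random(2)
zs = [[rng.uniform(-4, 4), rng.uniform(-4, 4)] for _ in range(1000)]
holds = all(
    f(z) >= f(x) + sum(gi * (zi - xi) for gi, zi, xi in zip(g, z, x)) - 1e-9
    for z in zs
)
```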

References
1. R. T. Rockafellar. Convex Analysis.
2. S. Boyd (Stanford). EE364b Lecture Notes.

