
Technische Universität Chemnitz

Lecture notes

Convex Analysis
in normed spaces with applications to image processing

Sommersemester 2024

Lecture by Prof. Dr. Sebastian Neumayer


Exercises by Rajmadan Lakshmanan

Based on the script by Gabriele Steidl, which was modified by Viktor Stein and Johannes Hertrich.
This script surely contains typos and I appreciate any hints for corrections and
improvements.
Last edited on July 23, 2024.
Contents

1 Convex Sets
  1.1 Basic Properties of Convex Sets
  1.2 Hahn-Banach Separation of Convex Sets
  1.3 Convex Cones
  1.4 Orthogonal Projections
2 Convex Functions
  2.1 Proper, Lower Semicontinuous Functions
  2.2 Convex Functions
  2.3 Convexity for Differentiable Functions
  2.4 Continuity Properties of Convex Functions
  2.5 Minima of Convex Functions
  2.6 Infimal Projection and Infimal Convolution
  2.7 Proximal Mappings
3 Subgradients
  3.1 The Subdifferential
  3.2 Subdifferential Calculus
4 Proximal Algorithms
  4.1 Fixed Point Algorithms and Averaged Operators
  4.2 First-order Algorithms for Convex Functions
  4.3 Application: Inverse Problems in Imaging
5 Monotone Set-valued Mappings
  5.1 Set-valued Maps
  5.2 The Subdifferential as a Set-valued Map
6 Conjugate Functions
  6.1 Conjugate Functions and their Properties
  6.2 Moreau's Characterization of Proximal Mappings
  6.3 Relations of Positively Homogeneous Functions to Support and Indicator Functions
7 Duality Theory
  7.1 Conjugate Duality
  7.2 Fenchel-type Duality
  7.3 Lagrangian Duality
  7.4 Lagrangian Duality and Saddle Point Problems
8 Primal-Dual Algorithms
  8.1 Alternating Direction Method of Multipliers
  8.2 Primal-Dual Hybrid Gradient Algorithm
Index


Foreword
In recent years, Convex Analysis has become an important tool in image and data processing. This lecture introduces Convex Analysis and in particular optimization algorithms for solving convex variational problems.
In contrast to the initial version of this lecture, we present many results in general normed spaces, Banach spaces or Hilbert spaces, and highlight the differences to the Euclidean setting. To keep the notation simple, we restrict our attention to real vector spaces. Nevertheless, many theorems can be generalized further to (Hausdorff locally convex) topological vector spaces and to vector spaces over the complex numbers.
In order to get familiar with the concepts of the lecture and to see their relation to image processing tasks, we highly recommend participation in the exercises accompanying the lecture.
Good treatments of Convex Analysis in normed spaces are [AO16, MN22, Luc06, Ză02, NP19], where the last is more introductory than the others. In [ET99] and the more advanced texts [BS13, Chp. 2] and [BP12, Chp. 1-3], locally convex (Hausdorff) topological vector spaces are considered, in [Bon19] and [BV10] Banach spaces, and in [BC11] Hilbert spaces. For the Euclidean case, we refer to [Roc70b, HUL93] for an overview.


1 Convex Sets
In this chapter, we consider convex sets such as affine sets, halfspaces and hyperplanes, polyhedral sets, simplices and convex cones, as well as their basic properties. Moreover, we consider separation and projection theorems. Unless specified otherwise, $E$ and $F$ are always real vector spaces. Later, these are additionally equipped with norms. (Lecture 1, 03.04.2024)

1.1 Basic Properties of Convex Sets


Textbooks: [AB99, Sec. 5.3], [Luc06, Sec. 1.1], [ET99, Sec. 1.1], [NP19, Chp. 3], [AO16,
Chp. 1-3], [MN22, Sec. 2.1], [Man94, Sec. 3.1] and [Hol12, Sec. 2].
A set is convex if all line segments connecting points in the set are subsets of the set.

Definition 1.1.1 (Convex set)
A subset $S \subset E$ is convex if
\[ (1-\lambda)x + \lambda y \in S \quad \forall \lambda \in [0,1], \ \forall x, y \in S. \]
A point $x \in E$ is a convex combination of $(x_k)_{k=1}^n \subset E$ if there exist $(\lambda_k)_{k=1}^n \subset [0,1]$ such that
\[ x = \sum_{k=1}^n \lambda_k x_k \quad \text{and} \quad \sum_{k=1}^n \lambda_k = 1. \]
The convex hull of $S \subset E$ is the set of all convex combinations of points in $S$:
\[ \mathrm{conv}(S) := \Big\{ \sum_{k=1}^N \lambda_k x_k : N \in \mathbb{N}, \ (x_k)_{k=1}^N \subset S, \ (\lambda_k)_{k=1}^N \subset [0,1], \ \sum_{k=1}^N \lambda_k = 1 \Big\}. \]

Fig. 1.1: Left: Two points $x, y \in E = \mathbb{R}^2$, the line segment connecting them and an intermediary convex combination $\frac{1}{3}x + \frac{2}{3}y$. Right: A convex set and a non-convex set in $\mathbb{R}^2$.

Example 1.1.2 The whole set E and the empty set H both are convex sets and so is any
singleton txu for x P E. The convex subsets of R are exactly the intervals. Open and closed
balls in pE, } ¨ }q are convex. The unit sphere tx P E : }x} “ 1u is never convex. ˛
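The defining inequality is easy to check numerically. The following Python sketch (our illustration, not part of the notes; all helper names are ours) samples pairs of points from the closed unit ball in $\mathbb{R}^2$ and verifies that every convex combination stays inside, in line with Example 1.1.2:

```python
import random

def in_unit_ball(p, tol=1e-12):
    # membership test for the closed Euclidean unit ball in R^2
    # (a small tolerance guards against floating point round-off)
    return p[0] * p[0] + p[1] * p[1] <= 1.0 + tol

random.seed(0)
all_inside = True
for _ in range(1000):
    # draw two points of the ball by rejection sampling
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    y = (random.uniform(-1, 1), random.uniform(-1, 1))
    if not (in_unit_ball(x) and in_unit_ball(y)):
        continue
    lam = random.random()
    # the convex combination (1 - lam) x + lam y
    z = ((1 - lam) * x[0] + lam * y[0], (1 - lam) * x[1] + lam * y[1])
    all_inside = all_inside and in_unit_ball(z)

print(all_inside)
```

Such a random check does not prove convexity, of course, but it illustrates the definition on a concrete set.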


Fig. 1.2: Left: Convex combination $p = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3$ of three points in $\mathbb{R}^2$ and relations between distances at the triangle spanned by these points. Right: The convex hull of a subset of $\mathbb{R}^2$ [Luc06, Fig. 1.2].

A set is convex if and only if it contains all convex combinations of its points, see Exercise 1.1.20. In particular, $\mathrm{conv}(S) \subset E$ is convex for any $S \subset E$.
Various combinations of convex sets are convex again. Here is an important example.

Definition 1.1.3 (Minkowski sum)
The Minkowski sum of two sets $C_1, C_2 \subset E$ is defined by
\[ C_1 + C_2 := \{c_1 + c_2 : c_1 \in C_1, \ c_2 \in C_2\}. \]

Fig. 1.3: The red set is the Minkowski sum of the blue and green set.

The Minkowski sum of two convex sets is convex. If $C_1$ is closed and $C_2$ is compact, then $C_1 + C_2$ is closed. On the other hand, the Minkowski sum of two closed (but not compact) sets is not necessarily closed, as the following example shows. Let
\[ C_1 := \mathbb{R}_{\le 0} \times \mathbb{R}_{\ge 0} \quad \text{and} \quad C_2 := \{(x, y) \in \mathbb{R}^2 : x \ge 0, \ xy \ge 1\}. \tag{1.1} \]
Then, the Minkowski sum is given by $C_1 + C_2 = \{(x, y) \in \mathbb{R}^2 : x \in \mathbb{R}, \ y > 0\}$, which is not closed. (Fig. 1.4: The sets from (1.1) and their Minkowski sum [Luc06, Fig. 1.6].) Moreover, it holds that $\mathrm{conv}(aA + bB) = a\,\mathrm{conv}(A) + b\,\mathrm{conv}(B)$ for all $A, B \subset E$ and all $a, b \in \mathbb{R}$. The proof is left as Exercise 1.1.22.
We now gather some operations preserving convex sets, see Exercise 1.1.24.
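The failure of closedness in this example can be seen concretely: with $c_1 = (-n, 0) \in C_1$ and $c_2 = (n, 1/n) \in C_2$, the sums $(0, 1/n)$ lie in $C_1 + C_2$ and converge to $(0, 0) \notin C_1 + C_2$. A small Python sketch (ours) of this limiting sequence:

```python
def in_C1(p):
    # C1 = R_{<=0} x R_{>=0}
    return p[0] <= 0 and p[1] >= 0

def in_C2(p):
    # C2 = {(x, y) : x >= 0, x * y >= 1}
    return p[0] >= 0 and p[0] * p[1] >= 1

sums = []
for n in range(1, 6):
    c1, c2 = (-float(n), 0.0), (float(n), 1.0 / n)
    assert in_C1(c1) and in_C2(c2)
    sums.append((c1[0] + c2[0], c1[1] + c2[1]))

# the points (0, 1/n) all lie in C1 + C2 and converge to (0, 0),
# but (0, 0) itself is not in C1 + C2: the sum is not closed
print(sums)
```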

Theorem 1.1.4: Operations preserving convex sets
Let $C, D, C_i \subset E$ be convex sets for $i \in I$ and $a, b \in \mathbb{R}$. Further, let $L : E \to F$ be an affine function and $S \subset F$ convex. Then, these sets are convex:
(1) $C \times D := \{(c, d) : c \in C, \ d \in D\}$.
(2) $L(C)$ as well as the inverse image $L^{-1}(S)$.
(3) $aC + bD = \{ac + bd : c \in C, \ d \in D\}$ (in particular the translation $x + C := \{x\} + C$ for $x \in E$).
(4) $\bigcap_{i \in I} C_i$.
(5) The interior $C^\circ$ and the closure $\overline{C}$, if $E$ is a normed space.

Remark 1.1.5 (Convex hull) By Theorem 1.1.4 (4), we have for $S \subset E$ that
\[ \mathrm{conv}(S) = \bigcap_{C \text{ convex}: \, S \subset C \subset E} C \]
is convex. In this sense, the convex hull is the smallest convex set that includes $S$. ˝
(Fig. 1.5: While the intersection of convex sets is convex, their union is not necessarily convex.)

Affine Sets and Functions


An important subclass of convex sets are the affine sets.

Definition 1.1.6 (Affine set, affine combination, affine hull)
A subset $S \subset E$ is affine if
\[ \lambda x + (1 - \lambda) y \in S \quad \forall \lambda \in \mathbb{R}, \ \forall x, y \in S. \]
A point $x \in E$ is an affine combination of $(x_k)_{k=1}^n \subset E$ if there exist $(\lambda_k)_{k=1}^n \subset \mathbb{R}$ with
\[ x = \sum_{k=1}^n \lambda_k x_k \quad \text{and} \quad \sum_{k=1}^n \lambda_k = 1. \tag{1.2} \]
The affine hull of $S \subset E$ is $\mathrm{aff}(S)$, the set of all affine combinations of points in $S$. The closed affine hull of $S \subset E$ is $\overline{\mathrm{aff}}(S) := \overline{\mathrm{aff}(S)}$.

Fig. 1.6: A convex set $C$ and its affine hull $\mathrm{aff}(C)$ [Luc06, Fig. 1.3].

By definition, affine sets are convex. Moreover, any affine set $S \subset E$ can be represented as a translation $S = L + s$ of some linear subspace $L \subset E$ by a vector $s \in E$. In particular, every linear subspace of $E$ is an affine set, and we have $\mathrm{aff}(\{x\}) = \{x\}$ for all $x \in E$. The affine hull of two distinct points is the unique line containing them both.
Remark 1.1.7 (Affine versus convex) Affine sets inherit many properties from convex ones. For example, properties (1)-(4) from Theorem 1.1.4 also hold if we replace convex by affine. Moreover, a set is affine if and only if it contains all affine combinations of its points, and we have
\[ \mathrm{aff}(S) = \bigcap_{C \text{ affine}: \, S \subset C \subset E} C, \]
i.e., $\mathrm{aff}(S)$ is the smallest affine set containing $S$. However, not every convex set is affine, and consequently some properties do not coincide. For instance, if $E$ is finite-dimensional, then the affine hull is always closed, which is not true for the convex hull. ˝
Due to the relation of affine sets with linear subspaces, we can define a concept analogous to linear independence.

Definition 1.1.8 (Affine independence)
The points $(x_k)_{k=1}^n \subset E$ are affinely independent if for $(\lambda_k)_{k=1}^n \subset \mathbb{R}$ the relations
\[ \sum_{k=1}^n \lambda_k x_k = 0 \quad \text{and} \quad \sum_{k=1}^n \lambda_k = 0 \tag{1.3} \]
imply that $\lambda_1 = \ldots = \lambda_n = 0$.

Fig. 1.7: In the first plot, the points are affinely independent. In the others, they are not.

Affine and linear independence are closely related. More precisely, the points $(x_k)_{k=1}^n \subset E$ are affinely independent if and only if the elements $\{x_k - x_r : k \in \{1, \ldots, n\} \setminus \{r\}\}$ are linearly independent for any $r \in \{1, \ldots, n\}$, see Exercise 1.1.30.
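This characterization turns affine independence into a rank condition on the difference vectors, which is easy to test numerically. A small pure-Python sketch (ours; `rank` is a naive Gaussian elimination, adequate for small matrices):

```python
def rank(rows, tol=1e-9):
    # rank of a small matrix (list of rows) via Gaussian elimination
    m = [list(r) for r in rows]
    rk, col, nrows = 0, 0, len(m)
    ncols = len(m[0]) if m else 0
    while rk < nrows and col < ncols:
        piv = max(range(rk, nrows), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < tol:
            col += 1
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for i in range(rk + 1, nrows):
            f = m[i][col] / m[rk][col]
            m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        col += 1
    return rk

def affinely_independent(points):
    # points are affinely independent iff the differences x_k - x_r
    # (k != r) are linearly independent; we choose r = 1 as reference
    diffs = [[a - b for a, b in zip(p, points[0])] for p in points[1:]]
    return rank(diffs) == len(diffs)

# three vertices of a triangle in R^2: affinely independent
print(affinely_independent([(0, 0), (1, 0), (0, 1)]))   # True
# three collinear points: affinely dependent
print(affinely_independent([(0, 0), (1, 1), (2, 2)]))   # False
```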
Let $(x_k)_{k=1}^n$ be affinely independent. Since (1.2) is equivalent to
\[ x - x_r = \sum_{\substack{k=1 \\ k \ne r}}^n \lambda_k (x_k - x_r) \quad \text{and} \quad \sum_{k=1}^n \lambda_k = 1, \tag{1.4} \]
every $x \in \mathrm{aff}\big((x_k)_{k=1}^n\big)$ can be uniquely represented as an affine combination of $(x_k)_{k=1}^n$, see Exercise 1.1.30.

Definition 1.1.9 (Barycentric coordinates)
The coefficients $(\lambda_k)_{k=1}^n$ (written as $(\lambda_1 : \ldots : \lambda_n)$) from (1.4) are the (normalized) barycentric coordinates of $x$ with respect to $(x_k)_{k=1}^n$.

Remark 1.1.10 (Barycenter) For points $(x_k)_{k=1}^n$ in a metric space $(E, d)$ and coordinates $(\lambda_k)_{k=1}^n \subset \mathbb{R}_{\ge 0}$ with $\sum_{k=1}^n \lambda_k = 1$, the barycenters are given by $\arg\min_{x \in E} \sum_{k=1}^n \lambda_k \, d(x_k, x)^2$. If $E$ is a Hilbert space, then there exists a unique barycenter $\hat{x}$, given by $\hat{x} = \sum_{k=1}^n \lambda_k x_k$. In this case $\hat{x}$ has the barycentric coordinates $(\lambda_1 : \ldots : \lambda_n)$ with respect to $(x_k)_{k=1}^n$. ˝
Example 1.1.11 (Simplex and unit simplex) A simplex is the convex hull of a (finite) set of affinely independent points. In $\mathbb{R}^d$ the unit vectors $(e_k)_{k=1}^d$ (which have zero entries except for the $k$-th entry, which is equal to one) are affinely independent, such that
\[ \Delta_d := \mathrm{conv}(e_1, \ldots, e_d) = \Big\{ x \in [0,1]^d : \sum_{k=1}^d x_k = 1 \Big\} \]
is a simplex, called the probability simplex or unit simplex. ˛
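A minimal membership test for $\Delta_d$ (our sketch, not part of the notes; the tolerance guards against floating point round-off in the sum):

```python
def in_unit_simplex(x, tol=1e-12):
    # Delta_d = {x in [0,1]^d : sum_k x_k = 1}
    return all(0.0 <= xi <= 1.0 for xi in x) and abs(sum(x) - 1.0) <= tol

print(in_unit_simplex((0.2, 0.3, 0.5)))   # True
print(in_unit_simplex((0.7, 0.7, -0.4)))  # False: a negative entry
print(in_unit_simplex((0.5, 0.5, 0.5)))   # False: the entries sum to 1.5
```

Note that a point of $\Delta_d$ is exactly its own vector of barycentric coordinates with respect to $(e_k)_{k=1}^d$.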

In order to examine affine sets in more detail, we consider so-called affine functions.

Definition 1.1.12 (Affine function)
An affine function is the sum of a linear function and a constant (vector); it has the form
\[ f : E \to F, \quad x \mapsto g(x) + y, \]
where $g : E \to F$ is linear and $y \in F$.

A function $f$ is affine if and only if it commutes with affine combinations, i.e., $f(\lambda x + (1 - \lambda)y) = \lambda f(x) + (1 - \lambda) f(y)$ for all $x, y \in E$ and $\lambda \in \mathbb{R}$, see Exercise 1.1.32. This implies that images and preimages of affine sets under affine functions are affine.


Hyperplanes and Half-spaces


Textbooks: [MN22, Def. 2.44], [Bré11, Def., p. 4] and [Hol12, p. 4].
Now, we consider hyperplanes, which are an important example of affine sets. These are affine subspaces of codimension one. In order to give a precise definition, we recap some basic facts from functional analysis.
The space $E' := \{g : E \to \mathbb{R} : g \text{ is linear}\}$ is called the algebraic dual space and its elements are called linear functionals. Within this lecture, we mainly consider real normed vector spaces $(E, \|\cdot\|)$ and require that the considered linear functionals are continuous. Therefore, we define the (continuous or topological) dual space as $E^* := \{g \in E' : g \text{ is continuous}\}$. The evaluation of a continuous linear functional is denoted by the dual pairing
\[ \langle \cdot, \cdot \rangle_E : E^* \times E \to \mathbb{R}, \quad (g, x) \mapsto g(x). \tag{1.5} \]
By definition, $\langle \cdot, \cdot \rangle_E$ is a bilinear mapping. For simplicity, we write $\langle \cdot, \cdot \rangle$ for $\langle \cdot, \cdot \rangle_E$ whenever it is clear which underlying space is meant. Then, the function
\[ \|\cdot\|_* : E^* \to \mathbb{R}, \quad g \mapsto \sup_{x \in E, \, \|x\| = 1} \langle g, x \rangle \]
defines a norm on the dual space $E^*$. We know from the lecture "Functional Analysis" that if $E$ is a Banach space, then $E^*$ is also a Banach space. Moreover, if $E$ is a Hilbert space, then the dual pairing corresponds to the inner product by Riesz's representation theorem.

Definition 1.1.13 (Hyperplane and closed halfspace)
A hyperplane through $x_0 \in E$ with normal $p \in E^* \setminus \{0\}$ is
\[ H_{p,a} := \{x \in E : \langle p, x - x_0 \rangle = 0\} = \{x \in E : \langle p, x \rangle = a\}, \]
where $a := \langle p, x_0 \rangle$.
Moreover, for $a \in \mathbb{R}$ and $p \in E^* \setminus \{0\}$ the closed halfspaces in $E$ are defined as
\[ H_{p,a}^+ := \{x \in E : \langle p, x \rangle \ge a\} \quad \text{and} \quad H_{p,a}^- := \{x \in E : \langle p, x \rangle \le a\}. \]

By definition, the hyperplane $H_{p,a}$ is the preimage of the closed affine set $\{0\}$ under the continuous affine function $x \mapsto \langle p, x \rangle - a$. Hence, it is a closed affine set. In contrast, the closed halfspaces $H_{p,a}^{\pm}$ are the preimages of the closed non-affine sets $[0, +\infty)$ and $(-\infty, 0]$ under the same function. Thus, they are closed and convex, but not affine. Moreover, we can always assume that $\|p\|_* = 1$, since $H_{bp,ba}^{(\pm)} = H_{p,a}^{(\pm)}$ for all real $b > 0$; for the hyperplane itself we may additionally assume $a \ge 0$, since $H_{-p,-a} = H_{p,a}$.

An important example of convex sets are so-called polyhedra. A subset $A \subset E$ is called polyhedral if it can be expressed as the intersection of finitely many closed halfspaces. Polyhedra are particularly relevant in linear and combinatorial optimization tasks. (Fig. 1.8: A hyperplane and its two associated closed halfspaces in $\mathbb{R}^2$.)
Definition 1.1.14 (Polyhedral set)
A set $S \subset E$ is a polyhedral set (or polyhedron) if there exists a continuous linear function $f : E \to \mathbb{R}^m$ and a vector $a \in \mathbb{R}^m$ such that
\[ S = \{x \in E : f(x) \le a\}, \]
where the "$\le$" is meant componentwise.

Fig. 1.9: A polyhedral set defined by five linear inequalities is equal to the intersection of five halfspaces with normal vectors $a_1, \ldots, a_5$ [BV04, Fig. 2.11].
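Membership in a polyhedral set amounts to finitely many linear inequalities. A minimal sketch (ours, not part of the notes), encoding the square $[-1, 1]^2$ in the form $S = \{x : f(x) \le a\}$ of Definition 1.1.14, with $f$ given by a matrix of row vectors:

```python
def in_polyhedron(f_rows, a, x):
    # S = {x : f(x) <= a componentwise}, f linear, given by its matrix rows
    return all(sum(r_j * x_j for r_j, x_j in zip(row, x)) <= a_i
               for row, a_i in zip(f_rows, a))

# the square [-1, 1]^2 as the intersection of four closed halfspaces
rows = [(1, 0), (-1, 0), (0, 1), (0, -1)]
a = (1, 1, 1, 1)

print(in_polyhedron(rows, a, (0.5, -0.25)))  # True
print(in_polyhedron(rows, a, (2.0, 0.0)))    # False
```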

The Relative Interior

Definition 1.1.15 (Relative interior)
Let $(E, \|\cdot\|)$ be a real normed vector space. The relative interior of $S \subset E$ is the interior of $S$ relative to its closed affine hull:
\[ \mathrm{ri}(S) := \big\{ y \in S : \exists \varepsilon > 0 \text{ such that } B_\varepsilon(y) \cap \overline{\mathrm{aff}}(S) \subset S \big\}, \]
where $B_\varepsilon(y) := \{x \in E : \|x - y\| < \varepsilon\}$ is the open ball centered at $y$ with radius $\varepsilon$.

In the case that $S$ is full-dimensional, in the sense that $\overline{\mathrm{aff}}(S) = E$, we have that $\mathrm{ri}(S) = S^\circ$. In general, this is not true, as the following examples show.
• The relative interior of a point is the point itself.
• The relative interior of a line segment contains all points except the endpoints.
• A planar triangle sitting in $\mathbb{R}^3$ as in Fig. 1.6 has empty interior. However, the relative interior captures the interior with respect to the lower-dimensional affine subspace which the triangle occupies.
Remark 1.1.16 (Relative interior of convex sets)
If $E$ is finite-dimensional, then every convex set $\emptyset \ne C \subset E$ has non-empty relative interior. This is not true in infinite dimensions, e.g., for $C$ being the closed convex set of Lebesgue-a.e. non-positive functions $f : [0,1] \to \mathbb{R}$ in $E = L^2([0,1])$. The proof of these facts is left as an exercise. Solutions can be found in [Luc06, Prop. 1.1.13], [Hol12, p. 9] and [BS13, p. 20]. ˝


The following theorem shows a couple of properties of the relative interior.

Theorem 1.1.17: Properties of the relative interior of convex sets
Let $E$ and $F$ be real normed vector spaces and $C, C_1, C_2 \subset E$ be convex subsets with $\mathrm{ri}(C), \mathrm{ri}(C_1), \mathrm{ri}(C_2) \ne \emptyset$.
(1) $\mathrm{ri}(C)$ is a convex set and has the same closed affine hull as $C$.
(2) We have $\mathrm{ri}(\overline{C}) = \mathrm{ri}(C)$ and $\overline{\mathrm{ri}(C)} = \overline{C}$.
(3) We have $z \in \mathrm{ri}(C)$ if and only if for every $x \in C$ there exists a $\bar{\mu} > 0$ such that for all $0 < \mu < \bar{\mu}$ it holds $z \pm \mu(z - x) \in C$.
(4) We have $\overline{C_1} = \overline{C_2}$ if and only if $\mathrm{ri}(C_1) = \mathrm{ri}(C_2)$.
(5) We have $f(\mathrm{ri}(C)) = \mathrm{ri}\big(f(C)\big)$ for all continuous affine functions $f : E \to F$.
(6) We have $\mathrm{ri}(C_1 \times C_2) = \mathrm{ri}(C_1) \times \mathrm{ri}(C_2)$.
(7) We have $\mathrm{ri}(C_1 \pm C_2) = \mathrm{ri}(C_1) \pm \mathrm{ri}(C_2)$ (in particular $\mathrm{ri}(x + C) = x + \mathrm{ri}(C)$ for all $x \in E$) and $\mathrm{ri}(t C_1) = t\,\mathrm{ri}(C_1)$ for all $t \in \mathbb{R}$.
(8) If $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2) \ne \emptyset$, then $\mathrm{ri}(C_1 \cap C_2) = \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$.

(Fig. 1.10: If $\mathrm{ri}(S_1) \cap \mathrm{ri}(S_2) = \emptyset$, then $\mathrm{ri}(S_1 \cap S_2)$ can be non-empty.)
For the proof, we need the next lemma. After the lemma is shown, the proof of Theorem 1.1.17 is left as Exercise 1.1.34.

Lemma 1.1.18 (Accessibility lemma)
Let $C \subset (E, \|\cdot\|)$ be convex with $\mathrm{ri}(C) \ne \emptyset$ and $t \in [0,1)$. Then $(1 - t)\,\mathrm{ri}(C) + t\,\overline{C} = \mathrm{ri}(C)$. Hence, if $C \subset E$ is closed and convex with non-empty relative interior, then $C = \overline{\mathrm{ri}(C)}$.

Proof. (Lecture 2, 05.04.2024) The inclusion "$\supset$" is clear: every $w \in \mathrm{ri}(C)$ can be written as $w = (1 - t)w + tw$ with $w \in \mathrm{ri}(C)$ and $w \in C \subset \overline{C}$. For the other inclusion, let $x \in \mathrm{ri}(C)$, $y \in \overline{C}$ and $t \in [0,1)$. We want to show that $w := (1 - t)x + ty \in \mathrm{ri}(C)$. The case $t = 0$ is clear, so assume $t \ne 0$. As $x \in \mathrm{ri}(C)$, there exists an $\varepsilon > 0$ such that
\[ B_\varepsilon(x) \cap \overline{\mathrm{aff}}(C) \subset C. \tag{1.6} \]
Since $y \in \overline{C}$, there exists a $y' \in C$ such that $\|y - y'\| < \frac{1-t}{t} \frac{\varepsilon}{3}$.
Let $\tilde{\varepsilon} := (1 - t)\frac{\varepsilon}{3} > 0$ and let $w' \in B_{\tilde{\varepsilon}}(w) \cap \overline{\mathrm{aff}}(C)$ be arbitrary. We want to show that $w' \in C$. Let $x' := \frac{1}{1-t} w' - \frac{t}{1-t} y' \in \overline{\mathrm{aff}}(C)$. Then $w' = (1 - t)x' + ty'$ and, by the triangle inequality,
\[ \|x - x'\| = \Big\| \frac{w - ty}{1-t} - \frac{w' - ty'}{1-t} \Big\| \le \frac{1}{1-t} \|w - w'\| + \frac{t}{1-t} \|y' - y\| < \frac{1}{1-t} \cdot (1-t)\frac{\varepsilon}{3} + \frac{t}{1-t} \cdot \frac{1-t}{t} \frac{\varepsilon}{3} = \frac{2}{3}\varepsilon < \varepsilon, \]
so that $x' \in B_\varepsilon(x) \cap \overline{\mathrm{aff}}(C)$. Since $x \in \mathrm{ri}(C)$, this implies by (1.6) that $x' \in C$. As $C$ is convex, $w' = (1 - t)x' + ty' \in C$. In particular, $w \in C$ (take $w' = w$), so indeed $w \in \mathrm{ri}(C)$. □

If a set $C \subset (E, \|\cdot\|)$ is convex with $C^\circ \ne \emptyset$, then we have $\mathrm{ri}(C) = C^\circ$. Therefore, the lemma implies that $(1 - t)C^\circ + t\overline{C} \subset C^\circ$ for all $t \in [0,1)$.


Exercises

Exercise 1.1.19
Prove that the sets $C_1 := \{(x_n)_{n \in \mathbb{N}} \in \ell_2 : |x_n| \le 2^{-n}\}$ and
\[ C_2 := \{(x_n)_{n \in \mathbb{N}} \in \ell_2 : x_n = 0 \text{ except for finitely many } n \in \mathbb{N}\} \]
are convex. ■

Exercise 1.1.20 (Convex hull and convex combination [Roc70b, Thm. 2.2])
Prove that a set is convex if and only if it contains all convex combinations of its points. ■

Exercise 1.1.21 (Convex sets are neither open nor closed in general) Find a convex set that is neither closed nor open. Is the convex hull of an open set open? ■

Exercise 1.1.22 (Convex hull of a sum of sets [Hol12, (2.2)])
Prove that $\mathrm{conv}(aA + bB) = a\,\mathrm{conv}(A) + b\,\mathrm{conv}(B)$ for all $A, B \subset E$ and all $a, b \in \mathbb{R}$. ■

Exercise 1.1.23 (Properties of the Minkowski sum) Minkowski addition is associative, commutative and distributive with respect to $\cup$. We have $\lambda A + \lambda B = \lambda(A + B)$ and $(\lambda + \mu)A \subset \lambda A + \mu A$ for all $\lambda, \mu \ge 0$ and $A \subset E$, with equality if and only if $A$ is convex [Sch14, Rem. 1.1.1, p. 127]. ■

Exercise 1.1.24 Prove Theorem 1.1.4. ■

Exercise 1.1.25 If $C \subset E$ is closed and convex with non-empty interior, then $C = \overline{C^\circ}$. ■

Exercise 1.1.26 Let $S \subset E$ with $S^\circ \ne \emptyset$. Prove that $\mathrm{aff}(S) = E$. ■

Exercise 1.1.27 (Affine calculus [Ză02, p. 2]) We have $\mathrm{aff}(A + B) = \mathrm{aff}(A) + \mathrm{aff}(B)$, $\mathrm{aff}(A \times C) = \mathrm{aff}(A) \times \mathrm{aff}(C)$ and $\mathrm{aff}\big(T(A)\big) = T\big(\mathrm{aff}(A)\big)$ for all $A, B \subset E$, $C \subset F$ and linear mappings $T : E \to F$. All of these relations also hold if aff is replaced by conv. ■

Exercise 1.1.28 (aff is a closure operator) For $A \subset E$ we have $A \subset \mathrm{aff}(A)$ (extensivity), $\mathrm{aff}(\mathrm{aff}(A)) = \mathrm{aff}(A)$ (idempotence), and if $A \subset B \subset E$, then $\mathrm{aff}(A) \subset \mathrm{aff}(B)$ (monotonicity). ■

Exercise 1.1.29 ($\overline{\mathrm{aff}}(C) = \overline{\mathrm{aff}}(\overline{C})$) We have $\overline{\mathrm{aff}}(C) = \overline{\mathrm{aff}}(\overline{C})$, because $\overline{\mathrm{aff}}$ is a closure operator as well. ■

Exercise 1.1.30 (Affine independence and linear independence)
Prove that the points $(x_k)_{k=1}^n \subset E$ are affinely independent if and only if the set $\{x_k - x_r : k \in \{1, \ldots, n\} \setminus \{r\}\}$ is linearly independent for any $r \in \{1, \ldots, n\}$. ■

Exercise 1.1.31 (Caratheodory's theorem) Let $S \subset \mathbb{R}^d$. Prove that for any $x \in \mathrm{conv}(S)$ we find points $x_1, \ldots, x_{d+1} \in S$ and $\lambda_1, \ldots, \lambda_{d+1} \ge 0$ with $\sum_{i=1}^{d+1} \lambda_i = 1$ such that $x = \sum_{i=1}^{d+1} \lambda_i x_i$. ■
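In $\mathbb{R}^2$ (so $d = 2$), Caratheodory's theorem says three points always suffice. The following brute-force sketch (our illustration, not a required solution) searches all triples of a small point set and certifies one via barycentric coordinates computed by Cramer's rule:

```python
from itertools import combinations

def barycentric(p, a, b, c):
    # solve la*a + lb*b + lc*c = p with la + lb + lc = 1 (Cramer's rule)
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if abs(det) < 1e-12:
        return None  # degenerate (collinear) triple
    lb = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    lc = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return (1.0 - lb - lc, lb, lc)

def caratheodory_triangle(p, points):
    # return 3 of the points whose convex hull already contains p
    for tri in combinations(points, 3):
        lam = barycentric(p, *tri)
        if lam is not None and all(l >= -1e-12 for l in lam):
            return tri, lam
    return None

pts = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]
p = (1.6, 1.6)  # a point in conv(pts)
tri, lam = caratheodory_triangle(p, pts)
print(tri, lam)
```

Brute force over triples is exponential in general; it is only meant to make the statement tangible for small examples.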


Exercise 1.1.32 The affine functions are exactly the functions $F$ with $F(\lambda x + (1 - \lambda)y) = \lambda F(x) + (1 - \lambda)F(y)$ for all $x, y \in E$ and $\lambda \in \mathbb{R}$. ■

Exercise 1.1.33 Prove that taking the affine hull commutes with affine functions, i.e., it holds $\mathrm{aff}(T(A)) = T(\mathrm{aff}(A))$ for any affine function $T : E \to F$ and any set $A \subset E$. ■

Exercise 1.1.34 Prove Theorem 1.1.17. ■


1.2 Hahn-Banach Separation of Convex Sets


Textbooks: [BS13, pp. 18-20], [HUL93, Sec. 4.1], [ET99, p. 5] and [Hol12, Sec. I§4, pp. 63-64].
In this subsection we investigate under which conditions a very simple convex set, namely a hyperplane, can be squeezed in between two convex sets. Figure 1.11 shows that this is clearly not possible when either of the sets is non-convex. (Fig. 1.11: Two sets that cannot be separated by a hyperplane.) Let $(E, \|\cdot\|)$ be a real normed space with dual space $E^*$. A hyperplane can separate two sets in different ways.

Definition 1.2.1 ((Strong) Separation of sets by a hyperplane)
Let $C, D \subset E$ be two sets. The hyperplane $H_{p,a}$
• separates $C$ and $D$ if
\[ \langle p, x \rangle \le a \le \langle p, y \rangle \quad \forall x \in C, \ y \in D. \]
• strongly (or: strictly) separates $C$ and $D$ if
\[ \langle p, x \rangle < a < \langle p, y \rangle \quad \forall x \in C, \ y \in D. \]
• properly separates $C$ and $D$ if it separates $C$ and $D$ and both sets are not fully contained in $H_{p,a}$, i.e., if additionally there exist $x \in C$ and $y \in D$ such that
\[ \langle p, x \rangle < \langle p, y \rangle. \]

Fig. 1.12: Left: Two properly separable sets $C_1, C_2 \subset \mathbb{R}^2$. Middle and Right: Two sets $C_1, C_2 \subset \mathbb{R}^k$, $k \in \{2, 3\}$, which cannot be properly separated [HUL93, Fig. 4.1.3].

The following theorem can be proven using the Hahn-Banach theorem [BS13, Thm. 2.13].

Theorem 1.2.2: Separating point and convex set
Let $C \subset E$ be a non-empty, convex set and $x_0 \notin C$. Then, the following holds true.
• There exists a $p \in E^* \setminus \{0\}$ such that $\sup_{x \in C} \langle p, x \rangle \le \langle p, x_0 \rangle$.
• If $C$ is closed, there exist $p \in E^* \setminus \{0\}$ and $\varepsilon > 0$ with $\sup_{x \in C} \langle p, x \rangle \le \langle p, x_0 \rangle - \varepsilon$.
For $a := \frac{1}{2}\big( \langle p, x_0 \rangle + \sup_{x \in C} \langle p, x \rangle \big)$, the hyperplane $H_{p,a}$ separates $C$ and $\{x_0\}$. If $C$ is closed, this separation is strong.
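For a concrete instance of the strong case, take $C$ the closed unit ball in $\mathbb{R}^2$ (identifying $E^* = \mathbb{R}^2$ via the inner product) and $p = x_0 - \mathrm{proj}_C(x_0)$; then $\sup_{x \in C} \langle p, x \rangle = \|p\| < \langle p, x_0 \rangle$. A small numerical sketch (ours, not part of the notes):

```python
import math

def separate_from_unit_ball(x0):
    # C = closed unit ball in R^2; p = x0 - proj_C(x0) separates strongly
    n = math.hypot(*x0)
    assert n > 1.0, "x0 must lie outside the ball"
    proj = (x0[0] / n, x0[1] / n)          # projection onto the ball
    p = (x0[0] - proj[0], x0[1] - proj[1])
    sup_C = math.hypot(*p)                 # sup over the ball of <p, x> is ||p||
    val_x0 = p[0] * x0[0] + p[1] * x0[1]
    return p, sup_C, val_x0

p, sup_C, val_x0 = separate_from_unit_ball((3.0, 4.0))
print(sup_C < val_x0)  # strict gap: strong separation
```

Here $x_0 = (3, 4)$ gives $p = (2.4, 3.2)$, $\sup_C \langle p, x \rangle = 4$ and $\langle p, x_0 \rangle = 20$.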


Remark 1.2.3 Let $C \subset E$ be convex. Then, it holds that
\[ \overline{C} = \bigcap_{p \in E^*, \, a \in \mathbb{R}: \, C \subset H_{p,a}^-} H_{p,a}^-. \]
The inclusion "$\subset$" holds since $C$ is a subset of the right-hand side by definition and since the right-hand side is closed. Moreover, we find by Theorem 1.2.2 for any $x \in E \setminus \overline{C}$ some $p \in E^*$ and $a \in \mathbb{R}$ such that $x \notin H_{p,a}^- \supset C$, which implies the reverse inclusion "$\supset$". ˝

Theorem 1.2.4: Separation of disjoint convex sets
Let $C, D \subset E$ be non-empty, convex, disjoint sets. Then there exist $p \in E^* \setminus \{0\}$ and $a \in \mathbb{R}$ such that the hyperplane $H_{p,a}$ separates $C$ and $D$. If $C$ is closed and $D$ is compact, then $p$ and $a$ can be chosen such that $H_{p,a}$ separates $C$ and $D$ strongly.

Fig. 1.13: Left: Two convex compact sets are strongly separable by a hyperplane. Right: Without compactness, two closed convex disjoint sets need not be strongly separable.

Proof. The set $S := C - D$ does not contain the origin, because $C$ and $D$ are disjoint. Applying Theorem 1.2.2 to $S$ and $\{0\}$, there exists $p \in E^* \setminus \{0\}$ such that
\[ \langle p, x \rangle \le \langle p, 0 \rangle = 0 \quad \forall x \in S. \tag{1.7} \]
Since $S = C - D$, we obtain
\[ \langle p, c \rangle \le \langle p, d \rangle \quad \forall c \in C, \ d \in D. \tag{1.8} \]
Hence, the hyperplane $H_{p,a}$ with $a := \frac{1}{2}\big( \sup_{c \in C} \langle p, c \rangle + \inf_{d \in D} \langle p, d \rangle \big)$ separates $C$ and $D$.
If $C$ is closed and $D$ is compact, then $S$ is closed and convex, since the Minkowski sum of a closed and a compact set is again closed. Hence, we can choose $p \in E^* \setminus \{0\}$ and $\varepsilon > 0$ by Theorem 1.2.2 such that (1.7) and (1.8) hold with an additional "$-\varepsilon$" on the right-hand side. Thus, $H_{p,a}$ separates $C$ and $D$ strongly in this case. □

Even though two disjoint convex sets can always be separated by the previous theorem, this separation might not be proper, as we can see in the following theorem.

Theorem 1.2.5: General separation theorem for convex sets
Let $C, D \subset E$ be convex sets with non-empty relative interior. There exists a hyperplane that separates $C$ and $D$ properly if and only if $\mathrm{ri}(C) \cap \mathrm{ri}(D) = \emptyset$.

Proof. "$\Longrightarrow$": Let $p \in E^* \setminus \{0\}$ and $a \in \mathbb{R}$ be such that $H_{p,a}$ separates $C$ and $D$ properly, and assume that $x_0 \in \mathrm{ri}(C) \cap \mathrm{ri}(D)$. That is, it holds $\langle p, z_1 \rangle \le a \le \langle p, z_2 \rangle$ for all $z_1 \in C$ and $z_2 \in D$, and there exist $x_1 \in C$ and $x_2 \in D$ such that $\langle p, x_1 \rangle < \langle p, x_2 \rangle$. Since $x_0 \in C \cap D$, we have that $\langle p, x_0 \rangle = a$. Because $x_0 \in \mathrm{ri}(C) \cap \mathrm{ri}(D)$, we find by Theorem 1.1.17 (3) numbers $\lambda_1, \lambda_2 > 0$ such that $y_1 := x_0 - \lambda(x_1 - x_0) = (1 + \lambda)x_0 - \lambda x_1 \in C$ and $y_2 := x_0 - \lambda(x_2 - x_0) = (1 + \lambda)x_0 - \lambda x_2 \in D$ for all $0 < \lambda < \min(\lambda_1, \lambda_2)$. Consequently, we have that
\[ \langle p, y_1 \rangle = \langle p, (1 + \lambda)x_0 \rangle - \langle p, \lambda x_1 \rangle > \langle p, (1 + \lambda)x_0 \rangle - \langle p, \lambda x_2 \rangle = \langle p, y_2 \rangle. \]
This contradicts the separation property, since $y_1 \in C$ and $y_2 \in D$.


"$\Longleftarrow$": We have to distinguish two cases.
(1) Let $0 \notin \overline{\mathrm{aff}}(C - D)$. Then, Theorem 1.2.2 implies that there exist $p \in E^* \setminus \{0\}$ and $\varepsilon > 0$ such that
\[ \langle p, x \rangle \le \langle p, 0 \rangle - \varepsilon = -\varepsilon \quad \forall x \in \overline{\mathrm{aff}}(C - D). \]
In particular, we have for all $x_1 - x_2 \in C - D$ that
\[ \langle p, x_1 - x_2 \rangle \le -\varepsilon \quad \Leftrightarrow \quad \langle p, x_1 \rangle \le \langle p, x_2 \rangle - \varepsilon. \]
Consequently, $H_{p,a}$ for $a = \frac{1}{2}\big( \sup_{x_1 \in C} \langle p, x_1 \rangle + \inf_{x_2 \in D} \langle p, x_2 \rangle \big)$ strongly separates $C$ and $D$, which implies proper separation.
(2) Let $0 \in \overline{\mathrm{aff}}(C - D)$. Then $F := \overline{\mathrm{aff}}(C - D)$ is a closed linear subspace of $E$. By assumption, we have that $\mathrm{ri}(C) \cap \mathrm{ri}(D) = \emptyset$, such that $0 \notin \mathrm{ri}(C) - \mathrm{ri}(D) = \mathrm{ri}(C - D)$. Therefore, there exists by Theorem 1.2.2 some $p \in E^* \setminus \{0\}$ such that
\[ \langle p, x \rangle \le 0 \quad \forall x \in \mathrm{ri}(C - D). \]
In particular, it holds
\[ \{x \in F : \langle p, x \rangle \le 0\} \supset \mathrm{ri}(C - D). \]
Taking the interior in $F$ of both sides, we obtain
\[ \{x \in F : \langle p, x \rangle < 0\} \supset \mathrm{ri}(C - D) = \mathrm{ri}(C) - \mathrm{ri}(D), \]
such that
\[ \langle p, x_1 - x_2 \rangle < 0 \quad \forall x_1 \in \mathrm{ri}(C), \ x_2 \in \mathrm{ri}(D). \]
Hence, we obtain
\[ \langle p, x_1 \rangle < \langle p, x_2 \rangle \quad \forall x_1 \in \mathrm{ri}(C), \ x_2 \in \mathrm{ri}(D). \tag{1.9} \]


Moreover, for $x \in C \subset \overline{C} = \overline{\mathrm{ri}(C)}$ and $y \in D \subset \overline{D} = \overline{\mathrm{ri}(D)}$, there exist sequences $(x_k)_k$ in $\mathrm{ri}(C)$ and $(y_k)_k$ in $\mathrm{ri}(D)$ such that $x_k \to x$ and $y_k \to y$ for $k \to \infty$. In particular, we have that
\[ \langle p, x - y \rangle = \lim_{k \to \infty} \underbrace{\langle p, x_k - y_k \rangle}_{< 0} \le 0. \]
Consequently, it holds $\langle p, x \rangle \le \langle p, y \rangle$ for all $x \in C$ and $y \in D$. To summarize, the hyperplane $H_{p,a}$ for $a = \frac{1}{2}\big( \sup_{x_1 \in C} \langle p, x_1 \rangle_E + \inf_{x_2 \in D} \langle p, x_2 \rangle_E \big)$ separates $C$ and $D$. By (1.9) together with $\mathrm{ri}(C) \ne \emptyset \ne \mathrm{ri}(D)$, it separates $C$ and $D$ properly. □

Remark 1.2.6 (Supporting halfspace / hyperplane)
A supporting halfspace to a convex set $C \subset E$ is a closed halfspace $H$ which contains $C$ and whose boundary has non-empty intersection with $\overline{C}$. Any point in this intersection is called a supporting point for $C$.
A supporting hyperplane to $C$ is a hyperplane which is the boundary of a supporting halfspace of $C$. For $x_0 \in \overline{C} \setminus C^\circ$, the supporting hyperplane of $C$ at $x_0$ is
\[ H_{p,a} := \{x \in E : \langle p, x \rangle = \langle p, x_0 \rangle =: a\} \]
for any $p \ne 0$ such that $\langle p, x \rangle \le a = \langle p, x_0 \rangle$ for all $x \in C$.
By Theorem 1.2.5, there always exists a supporting hyperplane to a non-empty convex set $C \subset E$ at $x_0 \in \overline{C} \setminus C^\circ$. ˝

Fig. 1.14: Supporting hyperplanes to a convex set at different points [HUL93, Fig. 2.4.1].


1.3 Convex Cones

Introduction to convex cones and examples


Textbooks: [AB99, p. 197-202], [BV04, Subsec. 2.4.1], [Bar02, Sec. II.8], [AO16, Chp. 10]
and [BS13, Subsec. 2.1.4].

Definition 1.3.1 (Cone, convex cone, conic hull)
A nonempty set $K \subset E$ is a cone (with apex 0) if $x \in K$ implies $\lambda x \in K$ for every $\lambda \ge 0$. A convex cone is a cone which is a convex set. The conic hull of $C \subset E$, $\mathrm{cone}(C)$, is the intersection of all convex cones containing it.

Fig. 1.15: Left: An open convex cone. Right: A closed non-convex cone.

(Fig. 1.16: The conic hull of a convex subset of $\mathbb{R}^2$ [Jah07, Fig. 4.3].)
In the literature, the defining property of a cone is sometimes only required for $\lambda > 0$, so that such cones do not necessarily contain the origin. We do not follow this convention, since including 0 is often important for us.
The properties (1)-(4) of Theorem 1.1.4 remain true when "convex set" is replaced by "cone". Moreover, the closure of a cone remains a cone. However, the interior of a cone does not necessarily remain one. Simple examples of cones are linear subspaces of $E$.

Lemma 1.3.2 (Characterization of convex cones)
A cone $K \subset E$ is convex if and only if $K + K \subset K$, i.e., if $x + y \in K$ for all $x, y \in K$.

The proof is left as Exercise 1.3.15.


Example 1.3.3 (Convex cones from nonnegativity constraints)
Sets defined by nonnegativity / nonpositivity constraints are often convex cones:
• the componentwise nonnegative vectors of $\mathbb{R}^d$, $\mathbb{R}^d_+ := [0, \infty)^d$ (called the nonnegative orthant);
• nonnegative functions, for example continuous nonnegative functions
\[ \{f \in C([a,b]; \mathbb{R}) : f(x) \ge 0 \ \forall x \in [a,b]\} \]
or square-integrable functions which are nonnegative almost everywhere,
\[ \{f \in L^2([a,b], \mu; \mathbb{R}) : f \ge 0 \ \mu\text{-almost everywhere}\}, \]
where $\mu$ is a nonnegative measure and $a < b$ are real numbers;
• nonnegative measures $M_+(X) := \{\mu \in M(X) : \mu(A) \ge 0 \ \forall A \in \mathcal{B}\}$, where $M(X)$ are the Radon measures on a compact Polish measure space $(X, \mathcal{B})$;
• symmetric positive semidefinite matrices $\mathrm{Sym}^+_d(\mathbb{R}) := \{A \in \mathbb{R}^{d \times d} : A^T = A, \ x^T A x \ge 0 \ \forall x \in \mathbb{R}^d\}$, or more generally positive self-adjoint operators on a Hilbert space $H$: $\{T \in L(H) : T^* = T, \ \langle Tx, x \rangle \ge 0 \ \forall x \in H\}$.
Furthermore, the closed halfspaces $H^{\pm}_{p,0}$ are convex cones. ˛

Remark 1.3.4 (Plotting $\mathrm{Sym}^+_2(\mathbb{R})$) Let $\mathrm{Sym}_d(\mathbb{R})$ be equipped with the Frobenius norm $\|\cdot\|_F$ and let $|\cdot|_2$ be the Euclidean norm. We can visualize the cone $\mathrm{Sym}^+_2(\mathbb{R})$ of symmetric positive semidefinite $2 \times 2$ matrices by using the linear (normal) isometry (Exercise!)
\[ T : \big(\mathrm{Sym}_2(\mathbb{R}), \|\cdot\|_F\big) \to \big(\mathbb{R}^3, |\cdot|_2\big), \quad \begin{pmatrix} a & b \\ b & c \end{pmatrix} \mapsto \frac{1}{\sqrt{2}} \begin{pmatrix} a - c \\ 2b \\ a + c \end{pmatrix} =: \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]
The eigenvalues of $X := \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ are (Exercise!)
\[ \lambda_{1,2} = \frac{1}{\sqrt{2}} \Big( z \pm \sqrt{x^2 + y^2} \Big). \]
Hence, $X \in \mathrm{Sym}^+_2(\mathbb{R})$ if and only if $\sqrt{x^2 + y^2} \le z$, so $T\big(\mathrm{Sym}^+_2(\mathbb{R})\big) = \{(x, y, z) \in \mathbb{R}^3 : \sqrt{x^2 + y^2} \le z\}$ is an "ice cream cone" (see Fig. 1.17). ˝
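Both "Exercise!" claims of Remark 1.3.4 can be sanity-checked numerically. The sketch below (our illustration, using the closed-form eigenvalues of a symmetric $2 \times 2$ matrix) verifies on random matrices that $T$ preserves norms and that the stated eigenvalue formula holds:

```python
import math
import random

def T(a, b, c):
    # the linear map from Remark 1.3.4, Sym_2(R) -> R^3
    s = 1.0 / math.sqrt(2.0)
    return (s * (a - c), s * 2.0 * b, s * (a + c))

def eigvals_sym2(a, b, c):
    # closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]
    mean = (a + c) / 2.0
    r = math.hypot((a - c) / 2.0, b)
    return mean - r, mean + r

random.seed(1)
ok = True
for _ in range(100):
    a, b, c = (random.uniform(-5, 5) for _ in range(3))
    x, y, z = T(a, b, c)
    # isometry: squared Frobenius norm equals squared Euclidean norm of T
    frob2 = a * a + 2 * b * b + c * c
    ok = ok and abs(frob2 - (x * x + y * y + z * z)) < 1e-9
    # eigenvalue formula: lambda_{1,2} = (z -+ sqrt(x^2 + y^2)) / sqrt(2)
    l1, l2 = eigvals_sym2(a, b, c)
    ok = ok and abs(l1 - (z - math.hypot(x, y)) / math.sqrt(2)) < 1e-9
    ok = ok and abs(l2 - (z + math.hypot(x, y)) / math.sqrt(2)) < 1e-9
print(ok)
```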

Fig. 1.17: The boundaries of the standard second order cone $L^3$ (the "ice cream cone") and its rotated version $L^3_r$. Image source: https://docs.mosek.com/modeling-cookbook/cqo.html.

Remark 1.3.5 (Second order cone programming)

The standard second order (or quadratic or Lorentz) cone

Ld`1 :“ tpx, xd`1 qT : x P Rd , |x|2 ď xd`1 u

is closed and convex, where d ě 1. For d “ 2, the second-order cone is simply the “ice cream cone” from Remark 1.3.4. Its rotated version is

Lrd`2 :“ tpx, xd`1 , xd`2 qT P Rd`2 : |x|22 ď 2xd`1 xd`2 , xd`1 ě 0, xd`2 ě 0u.


A second order cone program (SOCP) solves

min xPK cT x subject to Ax ` b P K̃,

where K, K̃ are products of closed convex cones of the form Ld1 , Lrd2 , Rd3 , Rd`4 or t0u, A is a matrix and b and c are vectors. Software packages for SOCP are e.g. SeDuMi and MOSEK. In Exercise 1.3.17, we see how SOCPs can be applied for solving quadratic problems. ˝
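The rotated cone is obtained from the standard cone by a 45-degree rotation in the last two coordinates, which can be spot-checked numerically. A NumPy sketch (the membership helpers and their names are our own):

```python
import numpy as np

def in_lorentz(p, tol=1e-12):
    # p = (x, t) lies in L^{d+1}  <=>  |x|_2 <= t
    return np.linalg.norm(p[:-1]) <= p[-1] + tol

def in_rotated(p, tol=1e-12):
    # p = (x, u, v) lies in the rotated cone  <=>  |x|_2^2 <= 2uv, u >= 0, v >= 0
    x, u, v = p[:-2], p[-2], p[-1]
    return bool(x @ x <= 2.0 * u * v + tol) and u >= -tol and v >= -tol

def rotate(p):
    # 45-degree rotation in the last two coordinates maps the rotated cone onto L
    q = p.astype(float).copy()
    q[-2] = (p[-2] - p[-1]) / np.sqrt(2.0)
    q[-1] = (p[-2] + p[-1]) / np.sqrt(2.0)
    return q

rng = np.random.default_rng(1)
for _ in range(1000):
    p = rng.standard_normal(4)                 # d = 2: p = (x_1, x_2, u, v)
    assert in_rotated(p) == in_lorentz(rotate(p))
```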

Dual and Polar Cones

Definition 1.3.6 (Polar cone, dual cone)


The polar cone of the nonempty set S Ă E is                                    polar cone

S 0 :“ tp P E ˚ : x p, x y ď 0 @x P Su.

The dual cone of S is S0 :“ ´S 0 .                                             dual cone

Fig. 1.18: Left. The polar cone C 0 of a non-convex set C Ă R2 . The angles between the
limiting rays of the conic hull of C and that of C 0 make right angles. Right. The dual cone
C0 (in the figure denoted as C ˚ ) of the same set C.

For analyzing polar and dual cones, we recap the definition of the canonical embedding of E
into its bidual space E ˚˚ :“ pE ˚ q˚ . The canonical embedding is defined by the continuous canonical
linear isometry ΛE : E Ñ E ˚˚ , where we define ΛE pxq for x P E by embedding
x ΛE pxq, f yE ˚ “ x f, x yE for all f P E ˚ .
If ΛE is surjective, the normed space E is called reflexive. In this case, ΛE is an isometric reflexive
isomorphism between E and E ˚˚ . Then, we identify x with ΛE pxq and write formally
x “ ΛE pxq and E “ E ˚˚ . Note that not every normed space is reflexive.
Now, we can rewrite the polar cone S 0 by
S 0 :“ tp P E ˚ : x p, x yE ď 0 @x P Su “ čxPS tp P E ˚ : x p, x yE ď 0u “ čxPS tp P E ˚ : x ΛE pxq, p yE ˚ ď 0u “ čxPS H´ΛE pxq,0 .


In particular, S 0 is an intersection of closed convex sets in E ˚ and hence closed and convex.
Example 1.3.7
1 For E “ Rd it holds rr0, 8qd s0 “ p´8, 0sd .

2 For a linear subspace S Ă E we have that the dual and polar cone coincide and are
given by the orthogonal complement of S, i.e., S 0 “ S0 “ S K , where the orthogonal
complement is defined as S K :“ tp P E ˚ : x p, x y “ 0 @x P Su. In particular pt0uq0 “
E ˚ and E 0 “ t0u Ă E ˚ .
3 For x P E, the cone K “ tλx : λ ě 0u has the polar cone K 0 “ tp P E ˚ : x p, x y ď 0u.

4 rSym`d pRqs0 “ Sym´d pRq.

5 Equipped with the right topologies, the polar cone of the cone of nonnegative measures M` pXq is the cone of nonpositive continuous functions C ´ pX; Rq.

6 rLp pΩ; Rě0 qs0 “ Lq pΩ; Rď0 q, where p ě 1 and 1{p ` 1{q “ 1.

7 Let pE, } ¨ }q “ pRd , } ¨ }2 q. The polar cone of the finitely generated cone

K :“ tAy : y P Rm ě0 u,  A P Rdˆm ,

is given by the polyhedral cone

K 0 “ tp P Rd : AT p ď 0u.

Argument: Set S :“ tp P Rd : AT p ď 0u. If p P S, then we have for x P K that

xp, xy “ xp, Ayy “ xAT p, yy ď 0,

so that p P K 0 . Thus, S Ď K 0 .
Conversely, let p P K 0 , i.e., xp, Ayy ď 0 for all y ě 0. In particular, we obtain for
y “ ei that xp, ai y ď 0, i “ 1, . . . , m. Hence K 0 Ď S and in summary K 0 “ S. ˛
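The argument in item 7 can be replayed numerically: membership in tp : AT p ď 0u coincides with x p, Ay y ď 0 tested on the unit vectors ei and on random y ě 0. A NumPy sketch (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))                    # K = {A y : y >= 0} in R^3

for _ in range(200):
    p = rng.standard_normal(3)
    in_S = bool(np.all(A.T @ p <= 0))              # p in S = {p : A^T p <= 0} ?
    # check <p, A y> <= 0 on the unit vectors e_i and on random y >= 0
    Y = np.vstack([np.eye(4), rng.random((50, 4))])
    in_K0 = bool(np.all(Y @ (A.T @ p) <= 1e-12))
    assert in_S == in_K0                           # S = K^0 on these samples
```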

Definition 1.3.8 (Angle)


The angle between x P Ezt0u and p P E ˚ zt0u is the number α P r0, πq such that angle

cospαq “ x p, x y { p}p}˚ }x}q.

Using this definition, the polar cone S 0 zt0u consists exactly of those p P E ˚ such that for all x P Szt0u the angle α between x and p fulfills cospαq ď 0, i.e., α ě π{2.
Lemma 1.3.9 (The bipolar cone)                                                 Lecture 3, 10.04.2024
Assume that E is reflexive (where we identify E with E ˚˚ ). Then, for K ‰ H Ă E we have K 00 :“ pK 0 q0 “ clpconepconvpKqqq. For closed convex cones K Ă E we get K 00 “ K.

Proof. “Ą”: We have

K 00 “ tx̂ P E ˚˚ : x x̂, p yE ˚ ď 0 @p P K 0 u.


By the identification x “ ΛE pxq and by inserting the definition of K 0 , we get that

K 00 “ tx P E : x ΛE pxq, p yE ˚ ď 0 @p P K 0 u “ tx P E : x p, x y ď 0 @p P K 0 u.

Now, let x P K. Then, it holds by the definition of K 0 for any p P K 0 that x p, x y ď 0.


Therefore, we have that x P K 00 . Hence K Ă K 00 . Since K 00 is a closed convex cone, we obtain that clpconepconvpKqqq Ă K 00 .
“Ă”: We show for any x0 R clpconepconvpKqqq that x0 R K 00 . By Theorem 1.2.4 there exists a p P E ˚ zt0u strongly separating the compact convex non-empty set tx0 u and the convex closed non-empty set clpconepconvpKqqq:

x p, x y ă x p, x0 y @x P clpconepconvpKqqq. (1.10)

Choosing x “ 0 P clpconepconvpKqqq yields 0 ă x p, x0 y.


To show x0 R K 00 , it now remains to show that p P K 0 . Assume that p R K 0 . Then we find y P K such that x p, y y ą 0. This implies that x :“ px p, x0 y { x p, y yq y P clpconepconvpKqqq fulfills

x p, x y “ px p, x0 y { x p, y yq x p, y y “ x p, x0 y,

which contradicts (1.10) since K Ă clpconepconvpKqqq. l

Tangent and normal cones


Textbooks: [AF09, Sec. 4.1–4.2], [Jah07, Sec. 4.1], [Roc81, Sec. 2] and [HUL93, Sec. III.5].
Let S Ă E. Given some point x P S, we are interested in the set of all directions in which
we can move within the set S. To this end, we define the so-called tangential cone. In
the literature, there are also other definitions of a tangential cone. Here, we rely on a definition
by Bouligand. The arising tangential cone is also called Bouligand’s contingent cone or
paratingent cone and dates back to Bouligand [Bou30] and Severi [Sev30].

Definition 1.3.10 (Tangential cone)


Let S ‰ H Ă E and x0 P E. Then, the tangential cone TS px0 q is defined by tangential cone

TS px0 q :“ td P E : Dxn P S, xn Ñ x0 , Dλn Ñ 0` such that d “ limnÑ8 p1{λn qpxn ´ x0 qu.

By definition, we have TS px0 q “ H for all x0 R clpSq. It can be proven that the tangential cone is always closed. If S Ă E is convex, the tangential cone can be defined equivalently by

TS px0 q “ clpttpx ´ x0 q : x P S, t ě 0uq “ clptd P E : Dλ ą 0 such that x0 ` λd P Suq “ clpconepS ´ x0 qq.

In particular, if S is convex, then TS px0 q is convex, too. However, in the case that S is non-convex, TS px0 q is not necessarily convex, see Fig. 1.19 left.
Remark 1.3.11 (Clarke’s tangent cone) An alternative tangential cone is Clarke’s
tangent cone as proposed in [Cla90a, p. 51]. It is always convex even if S is non-convex. If


S is convex, then Clarke’s tangent cone and the tangential cone due to Bouligand from
Definition 1.3.10 coincide. Since we are mainly interested in convex sets, we do not discuss
Clarke’s tangent cone within this lecture. ˝
For a convex set C Ă E we define the cone of all normals onto C in a point x0 P E. Since
normals live on the dual space E ˚ , the normal cone is a subset of E ˚ .

Definition 1.3.12 (Normal cone)


Let C Ă E be convex and let x0 P C. Then, we define the normal cone to C at x0 by normal cone

NC px0 q :“ tp P E ˚ : xp, x ´ x0 y ď 0 @x P Cu.

By definition normal cones are closed and it holds NC px0 q “ NclpCq px0 q “ NripCq px0 q. The right side of Fig. 1.19 illustrates the tangential and normal cone of a convex set.
Lemma 1.3.13
Let C Ă E be an affine set and let x0 P C. Then, it holds

NC px0 q “ tp P E ˚ : x p, x ´ x0 y “ 0, @x P Cu.

Proof. The inclusion “Ą” is clear by the definition. For “Ă” let p P NC px0 q and x P C.
Moreover, let pxn qnPN be a sequence in C such that xn Ñ x0 . Then we have by the affinity of
C that 2xn ´ x P C such that the definition of the normal cone yields x p, 2xn ´ x ´ x0 y ď 0.
Taking the limit n Ñ 8 and using the continuity of p, we obtain that x p, x0 ´ x y “
x p, 2x0 ´ x ´ x0 y ď 0. Since x P C the definition of the normal cone also yields that
x p, x ´ x0 y ď 0 such that we arrive at x p, x ´ x0 y “ 0. l

Fig. 1.19: Left: Tangential cone (blue dotted) of a non-convex set S (black curve). Right: Tangential cone and normal cone of a convex set C.

As we show next, tangent and normal cones are related by polarity.


Theorem 1.3.14: Polarity of Tangent and Normal Cones

Let C be convex and x0 P C. Then, NC px0 q is the polar cone to TC px0 q, i.e.,

NC px0 q “ pTC px0 qq0 .

Proof. Let p P pTC px0 qq0 . Since C ´ x0 Ă TC px0 q, we have that

x p, x ´ x0 y ď 0 @x P C,

which implies p P NC px0 q.


Conversely, let p P NC px0 q and d P TC px0 q. Then, there exist sequences pxn qn in C with xn Ñ x0 and λn Ñ 0` such that d “ limnÑ8 p1{λn qpxn ´ x0 q. Therefore, we obtain by the continuity of p that

x p, d y “ limnÑ8 p1{λn q x p, xn ´ x0 y ď 0,

where x p, xn ´ x0 y ď 0 since xn P C and p P NC px0 q. Since d P TC px0 q was arbitrarily chosen, we obtain that x p, d y ď 0 for all d P TC px0 q such that p P pTC px0 qq0 . l
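As a concrete instance of Theorem 1.3.14, take C “ R2` and x0 “ p0, 1q, so that TC px0 q “ td : d1 ě 0u and NC px0 q “ tpp1 , 0q : p1 ď 0u. The polarity inequality can then be spot-checked (a sketch; the choice of C and x0 is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
x0 = np.array([0.0, 1.0])   # boundary point of C = R^2_+ (second coordinate interior)

for _ in range(500):
    # p in N_C(x0) = {(p1, 0) : p1 <= 0},  d in T_C(x0) = {d : d_1 >= 0}
    p = np.array([-abs(rng.standard_normal()), 0.0])
    d = rng.standard_normal(2)
    d[0] = abs(d[0])
    assert p @ d <= 1e-12                      # polarity: <p, d> <= 0
    x = rng.random(2) * 5.0                    # x in C, so x - x0 is a tangent direction
    assert p @ (x - x0) <= 1e-12               # defining inequality of N_C(x0)
```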

Exercises

Exercise 1.3.15 Prove Lemma 1.3.2. ■

Exercise 1.3.16 A convex cone K Ă E is a linear subspace over R iff K “ ´K. ■

Exercise 1.3.17 Rewrite the optimization problems

min xPR p1{2q x2 ,    min xPR |x|,    min xPR2 p1{2q |x ´ y|22 ` λ |x1 ´ x2 |,    min aPR, uPRm , xPRn a ´ 1  s. t.  Ax “ u,  1 ď a,  1{a ď u ď a,

for some fixed y P R2 and A P Rmˆn as SOCPs (see Remark 1.3.5). ■

Exercise 1.3.18 (Polar cones) Prove the identities from Example 1.3.7. ■

Exercise 1.3.19 (Dual cone of Lorentz cone) The second order cone Ld`1 from Re-
mark 1.3.5 is dual to itself. ■

Exercise 1.3.20 What is TC pxq for x P C ˝ ? ■

Exercise 1.3.21 For a linear bounded operator A : E Ñ F and b P F consider C :“ tx P E : Ax “ bu. Then TC px0 q “ kerpAq and NC px0 q “ ranpA˚ q for all x0 P C. ■


1.4 Orthogonal Projections


In this subsection, we consider orthogonal projections onto convex sets. For this, we assume
that E is a real Hilbert space. In abuse of notation, we denote the inner product in the
Hilbert space E again by x ¨, ¨ yE : E ˆ E Ñ R. The following projection theorem is usually
proven in any introductory functional analysis course.

Theorem 1.4.1: Projection Theorem in Hilbert Spaces


Let E be a real Hilbert space and let C Ă E be non-empty, closed and convex.
Then, for every x P E, there exists a uniquely determined element x̂ P C such that

x̂ “ arg min }x ´ y}. (1.11)


yPC

The element x̂ is called orthogonal projection of x onto C and we denote it by PC pxq.        orthogonal projection
Further, (1.11) is equivalent to

x x ´ x̂, y ´ x̂ y ď 0 for all y P C. (1.12)

If C Ă E is a linear subspace, then (1.12) simplifies to

x x ´ x̂, y y “ 0 for all y P C.

Theorem 1.4.1 does not remain true, whenever we drop some of the assumptions:
• If E is no longer a Hilbert space, the arg min from (1.11) might not be unique. For
example, let pE, } ¨ }q “ pR2 , } ¨ }8 q, C “ tpx, 0q : x P Ru and x “ p0, 1q. Then, it holds
arg minyPC }x ´ y} “ tpx, 0q : x P r´1, 1su.
• If C is not convex, the arg min from (1.11) might not be unique. For instance, consider
pE, } ¨ }q “ pR, | ¨ |q, C “ t´1, 1u and x “ 0. Then, it holds arg minyPC }x ´ y} “
t´1, 1u “ C. See also Fig. 1.20 for another example.
• If C is not closed, the arg min from (1.11) might be empty. For example, consider
pE, } ¨ }q “ pR, | ¨ |q, C “ r´1, 0q and x “ 1. Then, arg minyPC }x ´ y} “ H.
Fig. 1.20: The orthogonal projection onto a non-convex set is not unique [AM19, Fig. 3].

Example 1.4.2 Let pE, } ¨ }q “ pRd , } ¨ }2 q. For y P Rd and Bp p0, λq “ tz P Rd : }z}p ď λu, we want to find the minimizer x̂ “ PBp p0,λq pyq of

}y ´ x}22 Ñ min s.t. x P Bp p0, λq.

1 For p “ 8, the problem can be solved componentwise, i.e., we have to find xi with |xi | ď λ such that the distance |yi ´ xi | becomes minimal. We obtain that

x̂i “ yi if |yi | ď λ, and x̂i “ λ sgnpyi q if |yi | ą λ.

2 For p “ 2, we obtain (see also Fig. 1.21) that

x̂ “ y if }y}2 ď λ, and x̂ “ λ y{}y}2 if }y}2 ą λ.

3 The case p “ 1 is more complicated and we refer to [DSSSC08, DFL08]. One can prove that

x̂ “ y if }y}1 ď λ, and x̂ “ pSµ pyi qqdi“1 if }y}1 ą λ, where Sγ pyi q :“ 0 if |yi | ď γ and Sγ pyi q :“ yi ´ γ sgnpyi q if |yi | ą γ,

with µ :“ p|yπp1q | ` . . . ` |yπpmq | ´ λq{m, where |yπp1q | ě . . . ě |yπpdq | ě 0 are the sorted absolute values of the components of y and m ď d is the largest index with |yπpmq | ą 0 and

p|yπp1q | ` . . . ` |yπpmq | ´ λq{m ď |yπpmq |.

The computation of PB1 p0,λq pyq requires Opd log dq operations due to the sorting procedure. The function Sγ is known as soft-shrinkage with threshold γ, see [DJ98].          soft-shrinkage
Note that the example can be generalized to pE, } ¨ }q “ L2 pΩ, µq for some measure µ on some set Ω and the sets

Bp p0, λq :“ tf P L2 pΩ, µq : ż |f pxq|p dµpxq ď λp u, 1 ď p ă 8,
B8 p0, λq :“ tf P L2 pΩ, µq : ess sup |f | ď λu. ˛
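The three projections of Example 1.4.2 can be sketched in NumPy; `proj_l1` follows the sorting procedure described above. This is an illustrative sketch with our own function names, not an optimized implementation:

```python
import numpy as np

def proj_linf(y, lam):
    # projection onto B_inf(0, lam): clip each component to [-lam, lam]
    return np.clip(y, -lam, lam)

def proj_l2(y, lam):
    # projection onto B_2(0, lam): radial shrinkage
    n = np.linalg.norm(y)
    return y if n <= lam else lam * y / n

def proj_l1(y, lam):
    # projection onto B_1(0, lam) via sorting and soft-shrinkage
    if np.abs(y).sum() <= lam:
        return y.astype(float)
    u = np.sort(np.abs(y))[::-1]               # |y_pi(1)| >= ... >= |y_pi(d)|
    css = np.cumsum(u)
    ks = np.arange(1, y.size + 1)
    m = np.nonzero(u * ks > css - lam)[0].max() + 1   # largest admissible index
    mu = (css[m - 1] - lam) / m                # threshold from the example
    return np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)

y = np.array([3.0, 1.0])
assert np.allclose(proj_l1(y, 2.0), [2.0, 0.0])
assert np.allclose(proj_linf(y, 2.0), [2.0, 1.0])
assert np.allclose(proj_l2(np.array([3.0, 4.0]), 1.0), [0.6, 0.8])
```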

Fig. 1.21: Projection onto B2 p0, λq Ď R2 .

In the case that C is a convex cone, we obtain the following corollary.


Corollary 1.4.3 (Orthogonal Projection onto Closed Convex Cones)
Let E be a Hilbert space and let C Ď E be a closed, convex cone. Then any element x0 P E
can be uniquely decomposed as

x0 “ PC px0 q ` PC 0 px0 q, where x PC px0 q, PC 0 px0 q y “ 0.


Proof. By Theorem 1.4.1 we have that

xx0 ´ PC px0 q, x ´ PC px0 qy ď 0 @x P C.

In particular, this implies for x “ 0 that xx0 ´ PC px0 q, PC px0 qy ě 0 and for x “ 2PC px0 q
that xx0 ´ PC px0 q, PC px0 qy ď 0. Thus, xx0 ´ PC px0 q, PC px0 qy “ 0 and consequently xx0 ´
PC px0 q, xy ď 0 for all x P C, i.e., x0 ´ PC px0 q P C 0 . Since for all x P C 0 it holds

xx0 ´ px0 ´ PC px0 qq, x ´ px0 ´ PC px0 qqy “ xPC px0 q, x ´ x0 ` PC px0 qy
“ xPC px0 q, xy ` xPC px0 q, PC px0 q ´ x0 y ď 0,

we see that x0 ´ PC px0 q “ PC 0 px0 q. l

Fig. 1.22 illustrates the orthogonal projection onto a convex cone.

Fig. 1.22: Orthogonal projection onto a closed, convex cone K.
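Corollary 1.4.3 is easy to illustrate for C “ Rd` , whose polar cone is Rd´ : the two projections clip to the nonnegative and nonpositive parts, and the decomposition is orthogonal (a NumPy sketch; the choice of C is ours):

```python
import numpy as np

rng = np.random.default_rng(4)
# C = R^d_+ is a closed convex cone with polar C^0 = R^d_-;
# P_C keeps the nonnegative part, P_{C^0} the nonpositive part.
for _ in range(100):
    x0 = rng.standard_normal(6)
    pc = np.maximum(x0, 0.0)                   # P_C(x0)
    pc0 = np.minimum(x0, 0.0)                  # P_{C^0}(x0)
    assert np.allclose(x0, pc + pc0)           # x0 = P_C(x0) + P_{C^0}(x0)
    assert pc @ pc0 == 0.0                     # <P_C(x0), P_{C^0}(x0)> = 0
```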

Remark 1.4.4 (Projection Theorem in strictly convex Banach spaces) In general


Banach spaces, Theorem 1.4.1 does not hold. Nevertheless, under additional assumptions,
a similar projection theorem can be derived. Possible assumptions include strict convexity,
reflexivity and smoothness. However, these considerations are beyond the scope of the lec-
ture and we refer to [BS13, Subsec. 2.1.2], [Dud02, Sec. 6.2], [Hol12, Sec. 4, Thms. on pp.
63 – 64], [Hol72, §22, 28, 32], [Ză02, Sec. 3.8], [Alb96, Sec. 2] for more details. ˝


2 Convex Functions
Real-valued convex functions are intimately connected with convex sets since they are precisely the functions such that all the points on and above their graph form a convex set. They
have very favorable properties when it comes to optimization.
After studying extended real-valued functions that are suitably well-behaved (“proper”) and
fulfill a semi-continuity requirement compatible with minimization procedures, we introduce
different types of convex functions. For differentiable functions, convexity is equivalent to
monotonicity of the derivative. Moreover, convex functions are continuous (on the relative
interior of their domain) under mild assumptions. In the last two sections of the chapter,
we study the problem of minimizing a convex function and related problems like proximal
operators. Throughout this chapter, pE, } ¨ }q is a real normed vector space, unless indicated
otherwise.
Textbooks: [ET99, Chp. 1.2], [Roc70b, Sec. 4], [AB99, Sec. 5.4], [BL11, Subsec. 6.2.1],
[DM93, Chp. 1] and [Hol12, Sec. 1.3].

2.1 Proper, Lower Semicontinuous Functions


Textbooks: [BS13, p. 13], [Luc06, Sec. 1.2], [Phe09, Sec. 3], [Ză02, p. 60-63] and [ET99,
Subsec. I.2.2].
The extended real numbers are R :“ R Y t´8, 8u “ r´8, 8s with the arithmetic            extended real numbers

a ` 8 “ 8 ` a “ 8 @a P p´8, 8s,
a ´ 8 “ ´8 ` a “ ´8 @a P r´8, 8q,
a ¨ p˘8q “ p˘8q ¨ a “ ˘8 @a P p0, 8s,
a ¨ p˘8q “ p˘8q ¨ a “ ¯8 @a P r´8, 0q,
0 ¨ p˘8q “ p˘8q ¨ 0 “ 0 and ´p´8q “ 8,

as well as infpHq :“ 8 and suppHq :“ ´8.
Note that R is not a field like R (not even a semi-group). The last part of the definition is motivated by the fact that for all S ‰ H we want infpSq ď infpHq and suppSq ě suppHq. The expressions 8 ` p´8q and ´8 ` 8 are undefined.
In many minimization applications, only functions with values in R Y t`8u “ p´8, 8s are considered. We require functions taking the value ´8 later for defining conjugate functions and duality. A function with values in R is a finite function.
Common examples for functions taking infinite values are indicator functions. For a set C Ă E, the indicator function is defined by                                              indicator function

ιC : E Ñ R Y t`8u,  ιC pxq :“ 0 if x P C, and ιC pxq :“ `8 if x R C.

It can be used to rewrite a constrained minimization problem in unconstrained form as

inf xPC f pxq “ inf xPE tf pxq ` ιC pxqu.


Definition 2.1.1 (Proper and lower semicontinuous functions)


A function f : E Ñ R is proper if f pxq ‰ ´8 for every x P E and f ı 8. Otherwise, f proper
is improper. Let C Ă E. The proper or effective domain of f : C Ñ R is effective domain

dompf q :“ tx P C : f pxq ă 8u.

A function f : E Ñ R is lower semicontinuous (lsc) at x0 P E if                              lower semicontinuous

f px0 q ď lim inf xÑx0 f pxq :“ supεą0 inf xPBε px0 q f pxq. (2.1)

A function f is lower semicontinuous if it is lower semicontinuous at every point x0 P E.


We call f upper semicontinuous (usc) if ´f is lower semicontinuous.

Since f px0 q ě inf xPBε px0 q f pxq for every ε ą 0, we always have f px0 q ě supεą0 inf xPBε px0 q f pxq,
so the inequality in (2.1) can be replaced by equality.

Fig. 2.1: An upper semicontinuous function that is not lower semicontinuous at x0 .

For f : E Ñ R, the following definitions are equivalent to lower semicontinuity at x0 P E.


• For all λ ă f px0 q, there exists an ε ą 0 so that f pxq ě λ for all x P Bε px0 q.
• For every sequence pxn qnPN Ă E with xn Ñ x0 such that limnÑ8 f pxn q exists in R we have f px0 q ď limnÑ8 f pxn q.
Definition 2.1.2 ((Strict) epigraph, level set)                                               Lecture 4, 12.04.2024
The epigraph of f : E Ñ R is                                                                  epigraph

epipf q :“ tpx, rq P E ˆ R : f pxq ď ru,

the strict epigraph of f is                                                                   strict epigraph

s-epipf q :“ tpx, rq P E ˆ R : f pxq ă ru.

The r-level set of f for r P R is level set

levr pf q :“ tx P E : f pxq ď ru.


The domain dompf q is given by the projection of epipf q Ă E ˆ R onto E, i.e., the image
of epipf q under the mapping px, aq ÞÑ x for px, aq P E ˆ R. Thus epipf q “ H if and only if
f ” 8. The set epipf q has vertical lines where f takes the value ´8.
The (strict) epigraph is a subset of E ˆ R. We equip the space E ˆ R with the norm }px, aq}2 “ }x}2 ` a2 . Then, the dual space of E ˆ R is given by pE ˆ Rq˚ “ E ˚ ˆ R with x pp, bq, px, aq yEˆR “ x p, x y ` ba.

Fig. 2.2: Level set and epigraph.

Example 2.1.3 (Epigraph of an affine function) For p P E ˚ and a P R the epigraph of the affine function x p, ¨ y ` a is given by the closed halfspace H´p̃,´a , where p̃ :“ pp, ´1qT . ˛

Lemma 2.1.4 (Lsc, epigraphs and level sets)


Let f : E Ñ R. Then the following are equivalent.
• f is lower semicontinuous;
• epipf q is closed;
• all level sets of f are closed.
Fig. 2.3: The epigraph of an affine function with “normal vector” p is the closed half-space with normal vector pp, ´1qT .

Proof. “ 1 ùñ 2 ”: Let x P E, a P R and ppxn , an qqnPN Ă epipf q with pxn , an q Ñ px, aq. Then, f pxn q ď an for all n P N by Definition 2.1.2. As f is lower semicontinuous, we have

f pxq ď lim inf nÑ8 f pxn q ď limnÑ8 an “ a,

that is, px, aq P epipf q.


“ 2 ùñ 3 ”: Let a P R and pxn qnPN Ă leva pf q with xn Ñ x. As epipf q is closed, we have
limnÑ8 pxn , aq “ px, aq P epipf q, that is, x P leva pf q.
“ 3 ùñ 1 ”: Let x0 P E. If f px0 q “ ´8, then f is lower semicontinuous at x0 . If
f px0 q ą ´8, then for all a P R with a ă f px0 q we have by definition that x0 R leva pf q.
Since leva pf q is closed, there exists ε ą 0 such that leva pf q X Bpx0 , εq “ H. Thus f pxq ą a
for all x P Bpx0 , εq. Since this holds for all a P R with a ă f px0 q, we obtain that f is lower
semicontinuous at x0 . l

Corollary 2.1.5 (Lower semicontinuity of indicator functions)


A subset S Ă E is closed if and only if ιS is lower semicontinuous.

Proof. We have that epipιS q “ S ˆ r0, 8q is closed if and only if S is closed. l

Using Lemma 2.1.4 one can prove a result similar to Theorem 1.1.4 [BL18, Lem. 6.14], which, among other things, states that the lower semicontinuous functions form a convex cone.

Theorem 2.1.6: Operations preserving lsc functions

Let f, g, fi : E Ñ R for i P I be lower semicontinuous and F be a real normed space.


Then the following functions are lower semicontinuous.
1 αf for any α ě 0.


2 f ` g (if defined).
3 f ˝ L, where L : F Ñ E is a continuous function.
4 minpf1 , f2 q
5 φ ˝ f provided φ : R Ñ R is increasing and lower semicontinuous.
6 supiPI fi .

The proof is not hard and left as Exercise 2.1.11. Theorem 2.1.6 does not hold true for other operations such as subtraction or the infimum, see Exercise 2.1.12.

Definition 2.1.7 (Lower semicontinuous hull, closure)


The lower semicontinuous hull f¯ of f : E Ñ R is defined by                                   lower semicontinuous hull

f¯pxq “ inftb P R : px, bq P clpepipf qqu.

The closure of f : E Ñ R is                                                                   closure

clpf q :“ f¯ if f¯pxq ą ´8 for all x P E, and clpf q :“ ´8 if f¯pxq “ ´8 for some x P E.

A function f : E Ñ R is closed if clpf q “ f .                                                closed

By definition, the lower semicontinuous hull of f : E Ñ R can be equivalently defined by
• the unique function f¯: E Ñ R which fulfills

epipf¯q “ clpepipf qq.

• the function f¯: E Ñ R defined by

f¯px0 q “ lim inf xÑx0 f pxq.

By Lemma 2.1.4 f is lower semicontinuous if and only if f “ f¯. If f : E Ñ p´8, 8s, then closedness and lower semicontinuity are equivalent.

Example 2.1.8 (Closure of indicator functions) Let S Ă E. By Corollary 2.1.5 (and its proof), we have clpιS q “ ιclpSq . This motivates the denomination and notation from Definition 2.1.7. ˛


Fig. 2.4: Left: A proper non-lower semicontinuous function g1 . Right: Its closure clpg1 q.

Exercises

Exercise 2.1.9 (Lower semicontinuous functions)


Consider the functions pfk q4k“1 : R Ñ R,

f1 :“ ιp´1,1s ,  f2 :“ ιr´1,1s ,  f3 :“ ιt0u ,  f4 pxq :“ 1{x if x ą 0 and f4 pxq :“ 8 if x ď 0,  (2.2)

and the negative Shannon entropy

f5 : R Ñ R,  f5 pxq :“ x logpxq if x ą 0,  f5 p0q :“ 0,  f5 pxq :“ 8 else.  (2.3)

Fig. 2.5: The proper functions from Eqs. (2.2) and (2.3). In the first two plots, the blue
region depicts the epigraph of the functions.

All of them are proper. Which of them are lower semicontinuous? ■

Exercise 2.1.10 (Epigraph of a sublinear function) A function f : E Ñ R is called


sublinear if
• f is positively homogeneous, i.e., f prxq “ rf pxq for all r ě 0 and x P E, and
• f is subadditive, i.e., f px ` yq ď f pxq ` f pyq for all x, y P E.
Prove that f : E Ñ R is sublinear if and only if epipf q is a convex cone. ■

Exercise 2.1.11 Prove Theorem 2.1.6. ■


Exercise 2.1.12 (Operations preserving lower semicontinuous functions?)


Let f, fi , g, L and F be as in Theorem 2.1.6. Show that f ´ g, L ˝ f and inf iPI fi are in
general not lower semicontinuous. ■

Exercise 2.1.13 Show that the “ℓ0 -norm” (used by David Donoho for image processing
[Don01, Chp. 2]) defined as

} ¨ }0 : Rd Ñ t0, . . . , du,  px1 , . . . , xd q ÞÑ # tj P t1, . . . , du : xj ‰ 0u,

where # denotes cardinality, is not a norm, but lower semicontinuous. ■

Exercise 2.1.14 (Lower semicontinuous hull is a closure operator)
The assignment f ÞÑ f¯ is a hull / closure operator, that is, the hull of f¯ is again f¯, f ď g ùñ f¯ ď g¯, and f¯ ď f for all f, g : E Ñ R. ■

Exercise 2.1.15 Show that for f : E Ñ R we have clpf qpxq “ supthpxq : h is lsc, h ď f u
for all x P E. ■


2.2 Convex Functions


Textbooks: [MN22, Sec. 2.4], [BC11, Sec. 8.1], [Ză02, Sec. 2.1] and [ET99, Sec. I.2].
We start off this section with several definitions.

Definition 2.2.1 (Strictly / Strongly / Uniformly convex function)


Let C Ă E be convex. A function f : C Ñ R is convex if convex

f ptx ` p1 ´ tqyq ď tf pxq ` p1 ´ tqf pyq @t P r0, 1s, @x, y P dompf q, (2.4)

strictly convex if                                                                            strictly convex

f ptx ` p1 ´ tqyq ă tf pxq ` p1 ´ tqf pyq @t P p0, 1q, @x, y P dompf q, x ‰ y,

uniformly convex if there exists a non-decreasing function φ : r0, 8q Ñ r0, 8q with uniformly convex
φpxq “ 0 if and only if x “ 0 such that
f ptx ` p1 ´ tqyq ď tf pxq ` p1 ´ tqf pyq ´ tp1 ´ tqφp}x ´ y}q @t P r0, 1s, @x, y P dompf q,

λ-convex (with modulus of convexity λ P R) if                                                 λ-convex

f ptx ` p1 ´ tqyq ď tf pxq ` p1 ´ tqf pyq ´ p1{2q λtp1 ´ tq}x ´ y}2 @t P r0, 1s, @x, y P dompf q.

If f is λ-convex with λ ą 0 (that is, uniformly convex with φpxq “ p1{2q λx2 ), then f is strongly convex.                                                                        strongly convex

Fig. 2.6: Left: A strictly convex function. Right: A convex, but not strictly convex
function.

We call f (strictly/uniformly/λ-/strongly) concave if ´f is (strictly/uniformly/λ-/strongly)


convex. Strong convexity implies strict convexity, which implies convexity. The converse is
not true in both cases. If f is λ-convex and λ̃ ă λ, then f is also λ̃-convex. If f1 is λ1 -convex
and f2 is λ2 -convex, then f1 ` f2 is λ1 ` λ2 convex.


By Exercise 1.1.32 affine functions are the only ones that are both convex and concave. If
f : E Ñ R is convex, then dompf q is convex too. In particular, for S Ă E, the indicator
function ιS is convex if and only if S is convex. Moreover, any convex f : S Ñ R for
some convex S Ă E can naturally be extended to a convex function f¯: E Ñ R by setting
f¯pxq “ f pxq for x P S and f¯pxq “ `8 otherwise.
We denote the set of convex and lower semicontinuous functions f : E Ñ R by ΓpEq. Its
subset of proper functions is denoted by Γ0 pEq. Note that ΓpEq and Γ0 pEq are convex cones
in the vector space of all functions from E to R.

Lemma 2.2.2
Let f : E Ñ R be convex. Then the following holds true.
• If f has a finite value in some point x0 P ripdom f q, then f is proper.
• If f is lsc, ripdompf qq ‰ H and f px0 q “ ´8 for some x0 P E, then f pxq P t˘8u for
all x P E.

Proof. For the first statement, we have to verify that f does not take the value ´8.
Assume that there exists x1 P dom f with f px1 q “ ´8. Since x0 P ripdom f q, there exists
x2 P ripdom f q such that x0 “ λx1 ` p1 ´ λqx2 , λ P p0, 1q. Then, we get the contradiction

f px0 q ď λf px1 q ` p1 ´ λqf px2 q “ ´8.

Next, we prove the second statement. Due to the first statement, we know that f pxq “ ´8
for all x P ripdompf qq. Since f is lsc, this implies that f pxq “ ´8 for all x P dompf q. By
definition, f pxq “ `8 for x R dompf q. l

Lemma 2.2.3
Let E be a Hilbert space. Then, a function f is λ-convex if and only if the quadratic perturbation f ´ p1{2qλ} ¨ }2 of f is convex.

Proof. Since inner products are bi-linear, it holds for any x, y P E that

}tx ` p1 ´ tqy}2 “ t2 }x}2 ` 2tp1 ´ tq x x, y y `p1 ´ tq2 }y}2 .

Hence

p1{2qλ}tx ` p1 ´ tqy}2 ´ p1{2qtλ}x}2 ´ p1{2qp1 ´ tqλ}y}2
“ p1{2qλtpt ´ 1q}x}2 ` λtp1 ´ tq x x, y y ` p1{2qλp1 ´ tqrp1 ´ tq ´ 1s}y}2
“ ´p1{2qλtp1 ´ tq}x ´ y}2 .
Therefore, f is λ-convex if and only if

f ptx ` p1 ´ tqyq ď tf pxq ` p1 ´ tqf pyq ´ p1{2qλtp1 ´ tq}x ´ y}2
“ tf pxq ` p1 ´ tqf pyq ` p1{2qλ}tx ` p1 ´ tqy}2 ´ p1{2qtλ}x}2 ´ p1{2qp1 ´ tqλ}y}2

for all x, y P E. By subtracting p1{2qλ}tx ` p1 ´ tqy}2 this is equivalent to convexity of f ´ p1{2qλ} ¨ }2 . l


Note that we do not need completeness for the proof. Hence, Lemma 2.2.3 also holds true in non-complete inner product spaces.                                                  Lecture 5, 17.04.2024
Example 2.2.4
• Linear functionals and their absolute value are convex, but not strictly convex.
• Let E be a Hilbert space. Then, the function p1{2q} ¨ ´y}2 for y P E is 1-convex.
• Let pE, } ¨ }q “ pR, | ¨ |q. Then, f : R Ñ R with x ÞÑ signpxq |x|1{2 is not λ-convex for any λ P R.
• Let E be a Hilbert space. Then, } ¨ }2 is strictly convex. This is no longer true on Banach spaces. A counterexample is pRd , } ¨ }8 q for d ě 2. ˛
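Two of these claims can be spot-checked numerically (a sketch; the helper name is ours): the function p1{2qpx ´ 3q2 satisfies the 1-convexity inequality, while for signpxq|x|1{2 even midpoint convexity fails near 0, so no λ can work.

```python
import numpy as np

def lam_convex_violated(f, lam, x, y, t):
    # True if the lambda-convexity inequality from Definition 2.2.1 fails at (x, y, t)
    lhs = f(t * x + (1.0 - t) * y)
    rhs = t * f(x) + (1.0 - t) * f(y) - 0.5 * lam * t * (1.0 - t) * (x - y) ** 2
    return lhs > rhs + 1e-9

f = lambda x: 0.5 * (x - 3.0) ** 2            # 1-convex (equality holds in the inequality)
g = lambda x: np.sign(x) * np.sqrt(abs(x))    # not lambda-convex for any lambda

rng = np.random.default_rng(5)
for _ in range(1000):
    x, y, t = rng.standard_normal(), rng.standard_normal(), rng.random()
    assert not lam_convex_violated(f, 1.0, x, y, t)

# near 0 even midpoint convexity of g fails and the quadratic term is negligible:
assert lam_convex_violated(g, 100.0, 1e-6, 0.0, 0.5)
```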

Lemma 2.2.5 (Jensen’s inequality)


A function f : E Ñ R is convex if and only if for all n P N and all pxk qnk“1 Ă dompf q

f pλ1 x1 ` ¨ ¨ ¨ ` λn xn q ď λ1 f px1 q ` ¨ ¨ ¨ ` λn f pxn q for all pλk qnk“1 Ă r0, 1s with λ1 ` ¨ ¨ ¨ ` λn “ 1.

The proof is left as an exercise.
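Jensen's inequality is easy to test numerically, e.g. for the convex function exp (a small sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
f = np.exp                                    # a convex function on R
for _ in range(1000):
    x = rng.standard_normal(5)
    lam = rng.random(5)
    lam /= lam.sum()                          # convex coefficients summing to 1
    assert f(lam @ x) <= lam @ f(x) + 1e-12   # Jensen's inequality
```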

Theorem 2.2.6: Convexity, level sets and epigraphs

The following relations between convexity, level sets and epigraphs hold true.
• If f : E Ñ R is convex, then levα pf q is convex for every α P R.
• A function f : E Ñ R is convex if and only if epipf q is a convex set.
• If f is convex, then so are f¯ and clpf q.

Proof. • Take x, y P levα pf q for α P R and λ P r0, 1s. Then λx ` p1 ´ λqy P levα pf q as by the convexity of f we have

f pλx ` p1 ´ λqyq ď λf pxq ` p1 ´ λqf pyq ď λα ` p1 ´ λqα “ α.

• If dompf q “ H, then f ” 8 as well as epipf q “ H are convex. Therefore, it suffices


to consider dompf q ‰ H.
“ ùñ ”: Let f be convex. For px, aq, py, bq P epipf q and λ P r0, 1s we have

f pλx ` p1 ´ λqyq ď λf pxq ` p1 ´ λqf pyq ď λa ` p1 ´ λqb,

so
λpx, aq ` p1 ´ λqpy, bq “ pλx ` p1 ´ λqy, λa ` p1 ´ λqbq P epipf q.

“ ðù ”: Let epipf q be convex and x, y P dompf q such that f pxq, f pyq ‰ ´8. Since
px, f pxqq, py, f pyqq P epipf q, we have

λpx, f pxqq ` p1 ´ λqpy, f pyqq “ pλx ` p1 ´ λqy, λf pxq ` p1 ´ λqf pyqq P epipf q,

which implies f pλx ` p1 ´ λqyq ď λf pxq ` p1 ´ λqf pyq. The case f pxq “ ´8 follows
by considering the points px, ´N q and py, f pyqq for N Ñ 8.


• If f is convex, then epipf q is convex by the second part. By Theorem 1.1.4 epipf¯q “ clpepipf qq is convex, so f¯ is convex by the second part.
If f takes the value ´8, then clpf q ” ´8 is constant, hence convex and if not, then clpf q ” f¯ is convex by the previous paragraph. l

Hence the epigraph of a proper convex function is non-empty, convex and does not contain
any vertical lines. Moreover, if f : E Ñ R is proper and convex, then clpf q P Γ0 pEq.
Remark 2.2.7 (Quasiconvex functions)
The converse of the first statement in Theorem 2.2.6 is not true, consider e.g. | ¨ |1{2 . Functions with convex level sets are called quasi-convex and can equivalently be characterized by the relaxed inequality

f pλx ` p1 ´ λqyq ď maxtf pxq, f pyqu @λ P r0, 1s, @x, y P dompf q. ˝

Fig. 2.7: The quasi-convex function | ¨ |1{2 [MN22, Fig. 2.15].

Theorem 2.2.8: Supporting hyperplane

Let f : E Ñ R be proper and convex. There exist p P E ˚ and a P R such that

f pyq ě x p, y y `a @y P E.

Fig. 2.8: Left: A convex function f : R Ñ R. At any point of differentiability, there is exactly one tangent (blue) staying below the graph of f and intersecting it in that point, while at non-differentiable points, there are infinitely many (some in red). Right: At any x0 on the boundary of dompf q, there are also multiple non-vertical hyperplanes supporting f .

Proof. Since clpf q ď f by Exercise 2.1.15, it suffices to show the assertion for lower semi-
continuous functions.
Let x0 P dompf q and b0 ă f px0 q. Then, px0 , b0 q R epipf q Ă E ˆ R, which is a closed and
convex set. By Theorem 1.2.2, there exists pp̄, ´āq P pE ˆ Rq˚ zt0u “ pE ˚ ˆ Rqzt0u with

x p̄, x y ´āb ă x p̄, x0 y ´āb0 @px, bq P epipf q.


By noting that px, f pxqq P epipf q for all x P dompf q, this implies

x p̄, x y ´āf pxq ă x p̄, x0 y ´āb0 @x P dompf q. (2.5)

In particular, we have for x “ x0 that

0 ă ā pf px0 q ´ b0 q, where f px0 q ´ b0 ą 0,

such that ā ą 0. By dividing (2.5) by ´ā and setting p :“ p̄{ā this yields

f pxq ą x p, x y ´ x p, x0 y ` b0 @x P dompf q.

Since f pxq “ 8 for x R dompf q, the inequality is true for all x P E. Thus, setting a :“ b0 ´ x p, x0 y yields the claim. l

Using similar techniques and Remark 1.2.3, one can prove the following corollary.
Corollary 2.2.9 (Lower description of convex lsc functions)
For a convex function f : E Ñ R and x0 P E
␣ (
clpf qpx0 q “ sup x p, x0 y `a : x p, x y `a ď f pxq @x P E .
pPE ˚ zt0u,
aPR

Analogously to Theorem 1.1.4, there are several operations preserving the convexity of func-
tions. Note that products of convex functions may not be convex. An example is the function
f : R2 Ñ R, px1 , x2 q ÞÑ x1 x2 .

Theorem 2.2.10: Operations preserving convex functions

Let f, f1 , f2 , fi : E Ñ R for i P I be convex. Then, the following functions are convex:


1 f1 ` f2 (if defined). If f2 is strictly convex, then so is f1 ` f2 .
2 αf for α ě 0.
3 f ˝ A, where A : F Ñ E is an affine map and F is a vector space.
4 φ ˝ f , where φ : R Ñ R is a convex and non-decreasing (both on a convex set
containing the range of f ) function on R with φp`8q :“ 8 and φp´8q :“
infpφq.
If f is strictly convex and φ is strictly increasing (that is, x ă y implies φpxq ă
φpyq), then φ ˝ f is strictly convex.
5 supiPI fi .
6 the pointwise limit (if it exists) of a sequence pfn qnPN of convex functions.
7 the perspective function of f ,
g : tpx, tq P E ˆ p0, 8q : x{t P dompf qu Ñ R, px, tq ÞÑ t f px{tq,
which is (strictly) convex if and only if f is (strictly) convex.

The proof is left as Exercise 2.2.12.
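The perspective construction in item 7 can also be checked numerically; the following sketch (with the freely chosen f pxq “ x2 , whose perspective is gpx, tq “ x2 {t) tests joint convexity at random points:

```python
# Illustration for item 7: the perspective of f(x) = x^2 is
# g(x, t) = t * f(x / t) = x^2 / t on R x (0, inf); we test joint convexity
# at randomly drawn points.
import random

random.seed(3)

def g(x, t):
    return x * x / t  # t * f(x / t) for f(x) = x^2

for _ in range(200):
    x1, t1 = random.uniform(-5, 5), random.uniform(0.1, 5)
    x2, t2 = random.uniform(-5, 5), random.uniform(0.1, 5)
    lam = random.random()
    xm, tm = lam * x1 + (1 - lam) * x2, lam * t1 + (1 - lam) * t2
    # joint convexity of the perspective function
    assert g(xm, tm) <= lam * g(x1, t1) + (1 - lam) * g(x2, t2) + 1e-9
```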


Exercises

Exercise 2.2.11 Prove the claims from Example 2.2.4. ■

Exercise 2.2.12 Prove Theorem 2.2.10. ■

Exercise 2.2.13 (Vecmax function) The vecmax function

vecmax : Rd Ñ R, px1 , . . . , xd q ÞÑ maxptx1 , . . . , xd uq

is convex, but not strictly convex. ■
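A small numerical check of this exercise (the test points are my own choice):

```python
# vecmax satisfies the convexity inequality, but equality can occur for
# x != y, so it is not strictly convex.
def vecmax(v):
    return max(v)

def combine(x, y, t):
    return [t * a + (1 - t) * b for a, b in zip(x, y)]

x, y, t = [1.0, 0.0], [2.0, 0.0], 0.5
lhs = vecmax(combine(x, y, t))
rhs = t * vecmax(x) + (1 - t) * vecmax(y)
assert lhs <= rhs + 1e-12      # convexity inequality holds
assert abs(lhs - rhs) < 1e-12  # equality although x != y
```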


2.3 Convexity for Differentiable Functions


Checking the definition of convexity directly might be hard in practice. If the function of
interest is differentiable, the monotonicity properties of the derivatives are directly related
to convexity. In this section, we first extend the basic notions of differentiability to general
real normed vector spaces. Afterwards, we show that for differentiable functions convexity
is equivalent to monotonicity of the gradient.
Textbooks: [Ză02, p. 79-81 (+ parts of Sec. 2.1)], [CV77, Sec. 1.5], [ET99, Sec. I.5.2], [BS13,
Def. 2.44] and [Phe09, Lem. 1.2–1.4].

Differentiability in Real Normed Spaces


In the following, we consider derivatives for functions mapping from a real normed space
pE, } ¨ }E q to another one pF, } ¨ }F q. To this end, we denote the normed vector space of all
linear continuous mappings A : E Ñ F by pLpE, F q, } ¨ }LpE,F q q, where the norm is defined
by
}A}LpE,F q “ sup }Apxq}F .
xPE
}x}E ď1

We abbreviate } ¨ }E by } ¨ }. In analogy to the differentiability in finite dimensional spaces,


the derivative in normed spaces is based on directional derivatives. The definition is named
in honor of the French mathematician René Eugène Gateaux (1889 - 1914).

Definition 2.3.1 (Gateaux derivative)


Let x0 P E and let U Ă E be a neighborhood of x0 . Further, let f : U Ñ F be a
(possibly non-linear) function. Then, the one-sided directional derivative of f at x0 one-sided
into direction v P E is defined by directional
derivative
f px0 ` tvq ´ f px0 q
Df rx0 spvq :“ lim P F, (2.6)
tŒ0 t

if it exists. If the limit in (2.6) exists for all v and if Df rx0 s : E Ñ F is linear and continuous,
we call Df rx0 s P LpE, F q the (linear) Gateaux derivative of f at x0 . In this case, we Gateaux
call f Gateaux differentiable at x0 . We call f : U Ñ F Gateaux differentiable, if it derivative
is Gateaux differentiable at x0 for all x0 P U ˝ . Then, the mapping Df : U ˝ Ñ LpE, F q,
x ÞÑ Df rxs is called the Gateaux differential.

Consider the case that pE, } ¨ }E q “ pRd , } ¨ }2 q and pF, } ¨ }F q “ pRn , } ¨ }2 q and let f : Rd Ñ
Rn be a Gateaux differentiable function. Then, f is by definition partially differentiable
and the Gateaux derivative Df rx0 s is given by the Jacobian matrix Jf rx0 s of f at x0
via the matrix-vector multiplication Df rx0 spvq “ Jf rx0 sv. Consequently, the Gateaux
derivative coincides with the partial derivatives in Euclidean spaces. Conversely, not every
partially differentiable function is Gateaux differentiable, as the following example shows.
Example 2.3.2 The function

f : R2 Ñ R, px, yq ÞÑ x2 y{px2 ` y 2 q for px, yq ‰ p0, 0q, and 0 else,


is continuous and has the partial derivatives

fx px, yq “ 2xy 3 {px2 ` y 2 q2 and fy px, yq “ x2 px2 ´ y 2 q{px2 ` y 2 q2

for px, yq ‰ p0, 0q and fx p0, 0q “ fy p0, 0q “ 0. Note that fx is not continuous in
x0 :“ p0, 0q (choose pxn , yn q “ p1{n, 1{nq Ñ p0, 0q as n Ñ 8). The directional derivative in x0 is not a
linear function since
lim tŒ0 pf px0 ` thq ´ f px0 qq{t “ lim tŒ0 t3 h12 h2 {ptpt2 h12 ` t2 h22 qq “ lim tŒ0 h12 h2 {ph12 ` h22 q “ h12 h2 {ph12 ` h22 q

and the last term is not linear in h. Thus, f is not Gâteaux differentiable in x0 . ˛
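The non-linearity of the directional derivative can also be seen numerically; the following sketch evaluates the difference quotients of the function above at the origin:

```python
# Difference quotients of the function from Example 2.3.2 at the origin: they
# equal h1^2*h2/(h1^2 + h2^2), which is not linear in h.
def f(x, y):
    return x * x * y / (x * x + y * y) if (x, y) != (0.0, 0.0) else 0.0

def dq(h1, h2, t=1e-6):
    # one-sided difference quotient (f(0 + t*h) - f(0)) / t
    return f(t * h1, t * h2) / t

for h1, h2 in [(1.0, 1.0), (2.0, 1.0), (1.0, -3.0)]:
    assert abs(dq(h1, h2) - h1 * h1 * h2 / (h1 * h1 + h2 * h2)) < 1e-9

# additivity fails, so D f[0] is not a linear map:
assert abs(dq(1.0, 1.0) - (dq(1.0, 0.0) + dq(0.0, 1.0))) > 0.4
```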

Remark 2.3.3 (Higher-order Gateaux derivatives)


As in the finite-dimensional case, we can define second derivatives as derivatives of the
derivative. To this end, let U Ă E be open. We call a Gateaux differentiable function
f : U Ñ F twice Gateaux differentiable (at x0 P U ) if the Gateaux differential
Df : U Ñ LpE, F q is again Gateaux differentiable (at x0 P U ). More precisely, f is twice
Gateaux differentiable at x0 P U if D2 f rx0 s defined by

Df rx0 ` tvs ´ Df rx0 s


D2 f rx0 spvq :“ lim P LpE, F q
tŒ0 t

is a continuous linear operator from E to LpE, F q (i.e., D2 f rx0 s P LpE, LpE, F qq).
Analogously, higher-order Gateaux derivatives can be defined. Similar to first derivatives,
higher-order Gateaux derivatives are consistent with the usual definition of differentiability
in finite-dimensional cases. ˝

In this lecture, we are interested in functions mapping into the real numbers, i.e., pF, } ¨
}F q “ pR, | ¨ |q. Here, we will need first- and second-order Gateaux derivatives. In this case,
the first Gateaux derivative Df rx0 s is an element of E ˚ . Then, we call it the Gateaux
gradient, denoted by ∇f px0 q :“ Df rx0 s P E ˚ , since the directional derivatives from (2.6) Gateaux
are given by the dual pairing gradient

Df rx0 spvq “ x ∇f px0 q, v y

for all v P E. Moreover, we call the second Gateaux derivative the Gateaux Hessian Gateaux
and denote it by ∇2 f rx0 s :“ D2 f rx0 s P LpE, E ˚ q. It defines a bilinear form E ˆ E Ñ R by Hessian

px, yq ÞÑ x ∇2 f rx0 spxq, y y .

Example 2.3.4 (Lecture 6, 19.04.2024) Let pE, } ¨ }q “ pRnˆd , } ¨ }Fro q be the Hilbert space of all n ˆ d matrices
equipped with the Frobenius norm and associated inner product x A, B y “ tracepAT Bq. We
compute the Gateaux gradient and Hessian of f pAq “ p1{2q}Ax ´ y}22 for fixed x P Rd , y P Rn .


To this end, we first compute the directional derivatives Df rAspHq for H P E:


Df rAspHq “ lim tŒ0 pf pA ` tHq ´ f pAqq{t
“ lim tŒ0 pxpA ` tHqx ´ y, pA ` tHqx ´ y y ´ x Ax ´ y, Ax ´ y yq{p2tq
“ lim tŒ0 px Ax ´ y, Ax ´ y y ` 2t x Hx, Ax ´ y y ` t2 x Hx, Hx y ´ x Ax ´ y, Ax ´ y yq{p2tq
“ lim tŒ0 p2t x Hx, Ax ´ y y ` t2 x Hx, Hx yq{p2tq “ x Hx, Ax ´ y y ` lim tŒ0 pt{2q}Hx}22
“ x Hx, Ax ´ y y “ tracepxT H T pAx ´ yqq “ tracepH T pAx ´ yqxT q “ xpAx ´ yqxT , H y .

Consequently, we have that ∇f pAq P E ˚ is given by H ÞÑ x ∇f pAq, H y “ xpAx ´ yqxT , H y.


Using the identification of E ˚ with E via the Riesz isomorphism, we write ∇f pAq “ pAx ´
yqxT P E “ E ˚ . Similarly, the Gateaux Hessian is given by
∇2 f rAspHq “ lim tŒ0 p∇f pA ` tHq ´ ∇f pAqq{t “ lim tŒ0 pppA ` tHqx ´ yqxT ´ pAx ´ yqxT q{t “ HxxT .
Hence, ∇2 f rAs : E Ñ E “ E ˚ is given by the mapping H ÞÑ HxxT “ x HxxT , ¨ y. ˛
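The gradient formula of this example can be sanity-checked against finite differences (an illustration with randomly generated data, assuming NumPy is available):

```python
# Finite-difference check of grad f(A) = (A x - y) x^T from Example 2.3.4.
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 4
A = rng.standard_normal((n, d))
x = rng.standard_normal(d)
y = rng.standard_normal(n)

def f(A):
    return 0.5 * np.linalg.norm(A @ x - y) ** 2

grad = np.outer(A @ x - y, x)  # the claimed gradient (A x - y) x^T

H = rng.standard_normal((n, d))
t = 1e-6
fd = (f(A + t * H) - f(A)) / t            # difference quotient in direction H
assert abs(np.sum(grad * H) - fd) < 1e-4  # matches <grad, H> (Frobenius)
```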

Note that a function which is Gateaux differentiable in x0 is not necessarily continuous in


x0 . This can be seen by the following example.
Example 2.3.5 Consider

f : R2 Ñ R, px, yq ÞÑ x4 y{px6 ` y 3 q for px, yq ‰ p0, 0q, and 0 else.

To see that f is not continuous in x0 :“ p0, 0q consider f pt, t2 q as t Ñ 0. However, we have


for all v P R2 zt0u that
Df r0spvq “ lim tŒ0 t5 v14 v2 {pt7 v16 ` t4 v23 q “ lim tÑ0 t v14 v2 {pt3 v16 ` v23 q “ 0.

Consequently, f is Gateaux differentiable in 0 with Df r0s “ 0. ˛
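A short numerical illustration of this example (test values chosen freely):

```python
# The function from Example 2.3.5: all directional difference quotients at 0
# vanish in the limit, yet f(t, t^2) = 1/2 for every t != 0, so f is not
# continuous at the origin.
def f(x, y):
    return x ** 4 * y / (x ** 6 + y ** 3) if (x, y) != (0.0, 0.0) else 0.0

for t in [0.1, 0.01, 0.001]:
    assert abs(f(t, t * t) - 0.5) < 1e-9   # values along (t, t^2) stay at 1/2

for v1, v2 in [(1.0, 1.0), (2.0, -1.0), (1.0, 0.0)]:
    q = f(1e-4 * v1, 1e-4 * v2) / 1e-4     # difference quotient at 0
    assert abs(q) < 1e-2                   # small, consistent with Df[0] = 0
```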

In order to retain the usual properties of differentiable functions, we introduce the total
derivative or Fréchet derivative.

Definition 2.3.6 (Fréchet differentiability)


Let x0 P E and let U Ă E be a neighborhood of x0 . Further, let f : U Ñ F be a (possibly
non-linear) function. Then, f is Fréchet differentiable at x0 , if there exists
Df rx0 s P LpE, F q such that

lim }v}E Ñ0 }f px0 ` vq ´ f px0 q ´ Df rx0 spvq}F { }v}E “ 0. (2.7)

One often writes “f pxq “ f px0 q ` Df rx0 spx ´ x0 q ` op}x ´ x0 }q”. Similarly as before,
we call Df rx0 s P LpE, F q the Fréchet derivative of f at x0 . Further, f is Fréchet
differentiable if it is Fréchet differentiable at every x P U ˝ . In this case Df : U ˝ Ñ
LpE, F q is called the Fréchet differential of f .


Theorem 2.3.7: Properties of Gateaux and Fréchet derivatives


Let x0 P E, U Ă E a neighborhood of x0 and f : U Ñ F . Then, the following hold.
1 If f is Fréchet differentiable (at x0 ), then f is also Gateaux differentiable
(at x0 ). Moreover, Fréchet derivative and Gateaux derivative (and therefore
also the differentials) coincide.
2 If the one-sided directional derivative Df rx0 spvq from (2.6) exists, it is positively
homogeneous, i.e., for all t ą 0 it holds that Df rx0 sptvq exists and we have that
Df rx0 sptvq “ t Df rx0 spvq.
3 Let g : U Ñ F and v P E such that the one-sided directional derivative of f
and g at x0 P U into direction v exists and let t ě 0. Then, it holds

Dptf qrx0 spvq “ t Df rx0 spvq, Dpf ` gqrx0 spvq “ Df rx0 spvq ` Dgrx0 spvq.

In particular, if f and g are Gateaux differentiable in x0 P U ˝ , then it holds


for any t P R that

Dptf qrx0 s “ t Df rx0 s, Dpf ` gqrx0 s “ Df rx0 s ` Dgrx0 s.

4 Let E be a Hilbert space and F “ R. Assume that f is twice Gateaux


differentiable on U and suppose that the Gateaux Hessian ∇2 f : E Ñ LpE, Eq
is continuous on U . Then, ∇2 f is self-adjoint.
5 Let dimpEq ă 8. If f is Gateaux differentiable (at x0 ) and locally Lipschitz
continuous (at x0 ), then f is Fréchet differentiable (at x0 ).

Proof. 1 - 3 These follow directly from the definition and are left as an exercise to the reader.
4 The proof is analogous to the proof of the finite-dimensional Schwarz theorem. For
x, y P E consider the expression
Qph, tq :“ pf px0 ` hx ` tyq ´ f px0 ` hxq ´ f px0 ` tyq ` f px0 qq{pthq
and note that
lim hÑ0 lim tÑ0 Qph, tq “ lim hÑ0 x ∇f px0 ` hxq ´ ∇f px0 q, y y {h “ x ∇2 f rx0 spxq, y y . (2.8)
On the other hand, the (one-dimensional) mean value theorem applied on the function
g1 psq “ f px0 ` sx ` tyq ´ f px0 ` sxq yields that there exists some 0 ă c1 ph, tq ă h such that

Qph, tq “ p1{pthqq pf px0 ` hx ` tyq ´ f px0 ` hxq ´ f px0 ` tyq ` f px0 qq “ p1{pthqq pg1 phq ´ g1 p0qq
“ p1{tq g11 pc1 ph, tqq “ x p∇f px0 ` c1 ph, tqx ` tyq ´ ∇f px0 ` c1 ph, tqxqq{t, x y .
Applying again the mean value theorem onto g2 psq “ x ∇f px0 ` c1 ph, tqx ` syq, x y yields
that there exists some 0 ă c2 ph, tq ă t with
Qph, tq “ p1{tq pg2 ptq ´ g2 p0qq “ g21 pc2 ph, tqq “ x ∇2 f rx0 ` c1 ph, tqx ` c2 ph, tqyspyq, x y .
Since ∇2 f is continuous and limh,tÑ0 c1 ph, tq “ 0 “ limh,tÑ0 c2 ph, tq, we obtain that

lim lim Qph, tq “ x ∇2 f rx0 spyq, x y .


hÑ0 tÑ0


Together with (2.8), we arrive at x ∇2 f rx0 spxq, y y “ x ∇2 f rx0 spyq, x y “ x x, ∇2 f rx0 spyq y.
5 Let Df rx0 s be the Gateaux derivative. Assume that Df rx0 s is not a Fréchet deriva-
tive. Then, there exists ε ą 0 and a sequence pun qn in E such that }un }E Ñ 0 and

}f px0 ` un q ´ f px0 q ´ Df rx0 spun q}F { }un }E ě ε.

Define vn :“ un {}un }E and tn :“ }un }E such that }vn }E “ 1, un “ tn vn and tn Ñ 0. Since E


is finite-dimensional, the unit ball in E is compact such that we can assume (by choosing
a subsequence) that pvn qn converges to some v P E with }v}E “ 1. Moreover, the local
Lipschitz continuity of f yields that there exist L ě 0 and δ ą 0 such that f is L-Lipschitz
continuous on Bpx0 , δq. Thus, we have for n sufficiently large that

ε ď }f px0 ` un q ´ f px0 q ´ Df rx0 spun q}F { }un }E “ }f px0 ` tn vn q ´ f px0 q ´ tn Df rx0 spvn q} { tn
“ }f px0 ` tn vn q ´ f px0 ` tn vq ´ tn Df rx0 spvn ´ vq ` f px0 ` tn vq ´ f px0 q ´ tn Df rx0 spvq} { tn
ď }f px0 ` tn vn q ´ f px0 ` tn vq} { tn ` } Df rx0 spvn ´ vq} ` }f px0 ` tn vq ´ f px0 q ´ tn Df rx0 spvq} { tn
ď L}vn ´ v}E ` } Df rx0 spvn ´ vq}F ` }f px0 ` tn vq ´ f px0 q ´ tn Df rx0 spvq}F { tn .
Now, the first two terms converge to zero since vn Ñ v and since the Gateaux derivative
Df rx0 s is continuous. Moreover, the last term converges to zero by the definition of the
Gateaux derivative. Therefore, the right side converges to zero which contradicts ε ą 0.l

It is not clear if the last item of the theorem still holds true if E is infinite-dimensional.

Convexity for Differentiable Functions


The next lemma summarizes the properties of directional derivatives for convex functions.

Lemma 2.3.8 (Directional derivatives of convex functions)


Let f : E Ñ R be convex, x0 P E with f px0 q P R and v P E. Then, the following hold.
1 The limit (2.6) exists (but might be ˘8) and is given by

f px0 ` tvq ´ f px0 q


Df rx0 spvq “ inf .
tą0 t
In particular, Df rx0 spvq is finite for any x0 P pdompf qq˝ .
2 The directional derivative Df rx0 s : E Ñ R is convex.
3 If f px0 ´ vq ą ´8, it holds

f px0 q ´ f px0 ´ vq ď Df rx0 spvq ď f px0 ` vq ´ f px0 q.

Proof. 1 Let λ P p0, 1q. We show that

f px0 ` λtvq ´ f px0 q f px0 ` tvq ´ f px0 q


ď . (2.9)
λt t


Then, this implies that the function t ÞÑ pf px0 ` tvq ´ f px0 qq{t is increasing in t such that

lim tŒ0 pf px0 ` tvq ´ f px0 qq{t “ inf tą0 pf px0 ` tvq ´ f px0 qq{t.

To show (2.9), let λ P p0, 1q. Then, it holds by convexity that

λf px0 ` tvq ` p1 ´ λqf px0 q ě f pλpx0 ` tvq ` p1 ´ λqx0 q “ f px0 ` λtvq.

Reformulating yields

f px0 ` tvq ě p1{λq f px0 ` λtvq ´ pp1 ´ λq{λq f px0 q.

From this, we indeed obtain (2.9) as required:

pf px0 ` tvq ´ f px0 qq{t ě pp1{λq f px0 ` λtvq ´ pp1 ´ λq{λ ` 1q f px0 qq{t “ pf px0 ` λtvq ´ f px0 qq{pλtq,

where we used p1 ´ λq{λ ` 1 “ 1{λ.

2 Let v1 , v2 P dompDf rx0 sq and λ P p0, 1q. Then, we obtain for h ą 0 small enough such that x0 ` hv1 ,
x0 ` hv2 P dompf q that

f px0 ` hpλv1 ` p1 ´ λqv2 qq ´ f px0 q


“ f pλpx0 ` hv1 q ` p1 ´ λqpx0 ` hv2 qq ´ λf px0 q ´ p1 ´ λqf px0 q
ď λpf px0 ` hv1 q ´ f px0 qq ` p1 ´ λqpf px0 ` hv2 q ´ f px0 qq.

Dividing by h ą 0, and letting h tend to 0`, we obtain

Df rx0 spλv1 ` p1 ´ λqv2 q ď λ Df rx0 spv1 q ` p1 ´ λq Df rx0 spv2 q.

3 (Lecture 7, 24.04.2024) The second inequality follows from 1 for t “ 1. The first one is clear if f px0 ´ vq “ `8
or Df rx0 spvq “ `8. Otherwise, there exists h̃ ą 0 with p1{hqpf px0 ` hvq ´ f px0 qq ă 8
for all h P p0, h̃s. Hence, we have x0 ´ v, x0 ` hv P dompf q for all h P p0, h̃s. By the
convexity of f we obtain

f px0 q “ f pp1{p1 ` hqqpx0 ` hvq ` ph{p1 ` hqqpx0 ´ vqq ď p1{p1 ` hqqf px0 ` hvq ` ph{p1 ` hqqf px0 ´ vq.

Since f px0 q, f px0 ´ vq P R, we can rearrange this inequality to

f px0 q ´ f px0 ´ vq ď pf px0 ` hvq ´ f px0 qq{h @h P p0, h̃s.
For h Œ 0 we obtain the desired inequality. l
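Part 1 of the lemma says the difference quotients are nondecreasing in t; a sketch with the freely chosen convex function f pxq “ ex :

```python
# Lemma 2.3.8, part 1, for f(x) = exp(x): the difference quotient
# t |-> (f(x0 + t*v) - f(x0)) / t is nondecreasing in t and decreases
# towards the directional derivative f'(x0) = 1 as t -> 0.
import math

def f(x):
    return math.exp(x)

x0, v = 0.0, 1.0

def quotient(t):
    return (f(x0 + t * v) - f(x0)) / t

qs = [quotient(t) for t in [0.01, 0.1, 0.5, 1.0, 2.0]]
assert all(a <= b + 1e-12 for a, b in zip(qs, qs[1:]))  # monotone in t
assert abs(quotient(1e-8) - 1.0) < 1e-6                 # infimum = derivative
```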

Next, we prove some equivalent characterizations of convexity for Gateaux differentiable


functions.


Theorem 2.3.9: First-order criterion for convexity


Let U Ă E be convex and open and f : U Ñ R be a finite Gateaux differentiable
function. The following statements are equivalent.
1 f is (strictly) convex.
2 “∇f is a (strictly) monotone operator”, that is,

pąq
x ∇f pxq ´ ∇f pyq, x ´ y y ě 0 @x, y P U px ‰ yq.

3
pąq
f pyq ´ f pxq ě x ∇f pxq, y ´ x y @x, y P U px ‰ yq.

Proof. “ 1 ùñ 3 ”: By the convexity of f , we obtain for any x ‰ y P U and t P p0, 1q


that
pąq
tf pyq ` p1 ´ tqf pxq ě f pty ` p1 ´ tqxq.

This yields
pąq
f pyq ´ f pxq ě pf px ` tpy ´ xqq ´ f pxqq{t ě Df rxspy ´ xq “ x ∇f pxq, y ´ x y,
where we used Lemma 2.3.8 for the second inequality.
“ 3 ùñ 2 ”: By assumption, it holds that

x ∇f pxq ´ ∇f pyq, x ´ y y “ ´ x ∇f pyq, x ´ y y ´ x ∇f pxq, y ´ x y


pąq
ě ´pf pxq ´ f pyqq ´ pf pyq ´ f pxqq “ 0.

“ 2 ùñ 1 ”: For x, y P E define
φ : r0, 1s Ñ R, λ ÞÑ f px ` λpy ´ xqq.

We show that φ is convex. To this end, observe that φ is differentiable on p0, 1q with
derivative
φ1 pλq “ lim hÑ0 pφpλ ` hq ´ φpλqq{h “ lim hÑ0 pf px ` pλ ` hqpy ´ xqq ´ f px ` λpy ´ xqqq{h
“ lim hÑ0 pf px ` λpy ´ xq ` hpy ´ xqq ´ f px ` λpy ´ xqqq{h
“ Df rx ` λpy ´ xqspy ´ xq “ x ∇f px ` λpy ´ xqq, y ´ x y .

Moreover, let 0 ď λ ă λ̃ ď 1 and z :“ x ` λpy ´ xq and z̃ :“ x ` λ̃py ´ xq. Then


z̃ ´ z “ pλ̃ ´ λqpy ´ xq and thus

pąq
φ1 pλ̃q ´ φ1 pλq “ x ∇f pz̃q ´ ∇f pzq, y ´ x y “ p1{pλ̃ ´ λqq x ∇f pz̃q ´ ∇f pzq, z̃ ´ z y ě 0,
since λ̃ ´ λ ą 0 and x ∇f pz̃q ´ ∇f pzq, z̃ ´ z y ě 0 (ą 0 in the strict case) by assumption.

Hence, φ1 is (strictly) increasing.


Now let t P p0, 1q. Then, we obtain by the fundamental theorem of differentiation and
integration together with the fact that φ1 is (strictly) increasing that
păq pąq
φptq ´ φp0q “ ż0t φ1 psq ds ď tφ1 ptq, φp1q ´ φptq “ żt1 φ1 psq ds ě φ1 ptqp1 ´ tq.

Subtracting t times the second inequality from p1 ´ tq times the first one, we obtain

păq
φptq ´ p1 ´ tqφp0q ´ tφp1q ď tp1 ´ tqφ1 ptq ´ tp1 ´ tqφ1 ptq “ 0.
Rearranging the terms yields that
păq
φptq ď p1 ´ tqφp0q ` tφp1q.

In particular, we have that

păq
f px ` tpy ´ xqq “ φptq ď p1 ´ tqφp0q ` tφp1q “ p1 ´ tqf pxq ` tf pyq. l

Corollary 2.3.10 (Convexity of the norm)


• The norm is convex but not strictly convex.
• Let E be a Hilbert space. Then 21 } ¨ }2 is strictly convex.

Proof. • Convexity follows from the triangle inequality. For x P Ezt0u, y “ 2x and
t P r0, 1s we have }p1 ´ tqx ` ty} “ p1 ´ tq}x} ` t}y}, so } ¨ } is not strictly convex.
• For f pxq “ p1{2q}x}2 , it holds

Df rxsphq “ lim tÑ0 pp1{2q}x ` th}2 ´ p1{2q}x}2 q{t “ lim tÑ0 px x, th y ` pt2 {2q}h}2 q{t
“ x x, h y ` lim tÑ0 pt{2q}h}2 “ x x, h y .

Consequently ∇f pxq “ x (where we identify E ˚ with E via the Riesz isomorphism)


such that for all x ‰ y P E it holds

x ∇f pxq ´ ∇f pyq, x ´ y y “ x x ´ y, x ´ y y “ }x ´ y}2 ą 0.

Hence, f is strictly convex. l

The second statement fails for general norms, even in finite dimensions. For instance, let
pE, } ¨ }q “ pR2 , } ¨ }8 q and let x “ p´1{2, 1q and y “ p1{2, 1q. Then, we have
p1{2q}x}2 ` p1{2q}y}2 “ 1 “ }p0, 1q}2 “ }p1{2qx ` p1{2qy}2 , so p1{2q} ¨ }2 is not strictly convex.
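This counterexample can be verified directly (a small sketch of the computation above):

```python
# In (R^2, ||.||_inf) the function 0.5 * ||.||^2 is not strictly convex:
# the convexity inequality holds with equality for two distinct points.
def sup_norm(v):
    return max(abs(c) for c in v)

def g(v):
    return 0.5 * sup_norm(v) ** 2

x, y, mid = (-0.5, 1.0), (0.5, 1.0), (0.0, 1.0)
assert abs(g(mid) - (0.5 * g(x) + 0.5 * g(y))) < 1e-12  # equality, x != y
```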
In the case that f is twice Gateaux differentiable, we obtain another criterion for convexity.

Theorem 2.3.11: Second-order criterion for convexity


Let U Ă E be convex and open and f : U Ñ R be a finite function that is twice
Gateaux differentiable. Then, the following holds true.
1 f is convex if and only if the Gateaux Hessian of f is positive semidefinite for


every x0 P U , i.e., x ∇2 f rx0 spxq, x y ě 0 for all x P E.


2 Assume that the Gateaux Hessian of f is positive definite for every x0 P U ,
i.e., x ∇2 f rx0 spxq, x y ą 0 for all x P Ezt0u. Then, f is strictly convex.

Proof. 1 “ ùñ ”: Assume that f is convex and x0 P U . Since U is open, there exists


ε ą 0 such that Bpx0 , εq Ă U . Then, we know by Theorem 2.3.9 and by the convexity of f
that for all t P p0, εq and all z P E with }z} ď 1 it holds

x ∇f px0 ` tzq ´ ∇f px0 q, tz y ě 0.

Dividing by t2 yields

x p∇f px0 ` tzq ´ ∇f px0 qq{t, z y ě 0 @z P E with }z} ď 1.
Taking the limit for t Œ 0 and using the continuity of the dual pairing, we obtain
x lim tŒ0 p∇f px0 ` tzq ´ ∇f px0 qq{t, z y ě 0 @z P E with }z} ď 1.
By the definition of the Gateaux Hessian as derivative of the Gateaux gradient this is
equal to
x ∇2 f rx0 spzq, z y ě 0, @z P E with }z} ď 1.
Using the positive homogeneity of Gateaux derivatives and dual pairings, we obtain for
arbitrary x P Ezt0u that

x ∇2 f rx0 spxq, x y “ }x}2 x ∇2 f rx0 spx{}x}q, x{}x} y ě 0,

while the case x “ 0 is trivial.

1 “ ðù ” and 2 : Now, assume that ∇2 f rx0 s is positive (semi)definite for any x0 P U .


We prove the claims by contradiction and assume that f is not (strictly) convex. Then,
there exist by Theorem 2.3.9 x, y P U such that
pěq
0 ą x ∇f pxq ´ ∇f pyq, x ´ y y “ φp0q ´ φp1q,

where φ : r0, 1s Ñ R is defined by

φpλq “ x ∇f px ` λpy ´ xqq, x ´ y y .

We observe that φ is differentiable on p0, 1q with derivative


φpλ ` tq ´ φpλq
φ1 pλq “ lim
tÑ0 t
∇f px ` λpy ´ xq ` tpy ´ xqq ´ ∇f px ` λpy ´ xqq
“ x lim ,x ´ yy
tÑ0 t
“ x ∇2 f rx ` λpy ´ xqspy ´ xq, x ´ y y
păq
“ ´ x ∇2 f rx ` λpy ´ xqspy ´ xq, y ´ x y ď 0.
Finally, we obtain by the fundamental theorem of differentiation and integration that
pďq păq
0 ă φp1q ´ φp0q “ ż01 φ1 psq ds ď 0,

which is a contradiction. l


Example 2.3.12 Consider the function f pAq “ 12 }Ax ´ y}22 from Example 2.3.4. We found
that ∇2 f rAspHq “ HxxT . Hence, we have that
x ∇2 f rAspHq, H y “ x HxxT , H y “ tracepxxT H T Hq
“ tracepxT H T Hxq “ xT H T Hx “ }Hx}22 ě 0.
Hence, f is convex. ˛

Example 2.3.13 • The function f pxq “ x4 is strictly convex, but its Gateaux Hessian
(i.e., its second derivative) in 0 is not positive definite since f 2 pxq “ 12x2 for all x P R.
Consequently, the reverse statement of Theorem 2.3.11 2 is not true.
• Using Theorem 2.3.11 and Theorem 2.3.9, we can now verify (strict) convexity of
functions just by evaluating their (second) derivatives. For instance, we can verify
that the functions x ÞÑ ´ logpxq, x ÞÑ x logpxq and x ÞÑ x e1{x are proper, continuous
and strictly convex on p0, 8q.
• The negative geometric mean

f : Rn Ñ R, x ÞÑ ´pśnk“1 xk q1{n ` ιr0,8qn pxq

is convex.
• Let ψ : R Ñ R be increasing and a P R. Then

φ : R Ñ p´8, 8s, x ÞÑ żax ψptq dt if x ě a, and `8 else,

is convex [BC11, Ex. 8.13]. ˛
Remark 2.3.14 (First- and second-order criterion for λ-convex functions) We can
extend Theorem 2.3.9 and Theorem 2.3.11 to λ-convex functions. More precisely, let
U Ă E be open and let f : U Ñ R be (twice) Gateaux differentiable. Then, one can show
that the following statements are equivalent.
• The function f is λ-convex.
• For all x, y P U , it holds x ∇f pxq ´ ∇f pyq, x ´ y y ě λ}x ´ y}2 .
• For all x, y P U , it holds f pyq ´ f pxq ´ x ∇f pxq, y ´ x y ě pλ{2q}x ´ y}2 .
• For all x0 P U and x P E it holds x ∇2 f rx0 spxq, x y ě λ}x}2 .
In the Euclidean case, the last statement is fulfilled if and only if all eigenvalues of the
Hessian matrix are greater than or equal to λ.
The proof of the statements is analogous to the ones from Theorem 2.3.9 and Theo-
rem 2.3.11 and is left as Exercise 2.3.19. ˝
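The first bullet point can be illustrated for a quadratic function, where the Hessian is constant (an example with a freely chosen matrix Q, assuming NumPy):

```python
# For a quadratic f(x) = 0.5 * x^T Q x with symmetric Q, the Hessian is Q,
# so f is lambda-convex precisely for lambda up to the smallest eigenvalue
# of Q; the gradient Q x is then strongly monotone with that constant.
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
lam = np.linalg.eigvalsh(Q).min()  # smallest eigenvalue of the Hessian

def grad(x):
    return Q @ x

rng = np.random.default_rng(1)
for _ in range(100):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    # strong monotonicity: <grad f(x) - grad f(y), x - y> >= lam * ||x - y||^2
    assert (grad(x) - grad(y)) @ (x - y) >= lam * np.linalg.norm(x - y) ** 2 - 1e-9
```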

Exercises

Exercise 2.3.15 (Descent lemma) Let f : E Ñ R be Gateaux differentiable. If ∇f is


L-Lipschitz continuous, then it holds
f pyq ď f pxq ` x ∇f pxq, y ´ x y ` pL{2q}x ´ y}2 . ■
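A numerical check of the descent lemma for a quadratic function, whose gradient is Lipschitz with constant equal to the spectral norm of AT A (randomly generated data, assuming NumPy):

```python
# Descent lemma for f(x) = 0.5 * ||A x||^2: the gradient A^T A x is
# L-Lipschitz with L = ||A^T A||_2 (spectral norm).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
L = np.linalg.norm(A.T @ A, 2)

def f(x):
    return 0.5 * np.linalg.norm(A @ x) ** 2

def grad(x):
    return A.T @ A @ x

for _ in range(100):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    bound = f(x) + grad(x) @ (y - x) + 0.5 * L * np.linalg.norm(y - x) ** 2
    assert f(y) <= bound + 1e-9   # quadratic upper bound from the lemma
```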


Exercise 2.3.16 Let pE, } ¨ }q “ pRd , } ¨ }q and pF, } ¨ }1 q “ pRn , } ¨ }1 q be finite-dimensional
spaces, equipped with arbitrary (i.e., possibly non-Euclidean) norms } ¨ } and } ¨ }1 , and let
f : Rd Ñ Rn . Is the relation between the Gateaux derivative Df rx0 s and the Jacobian
matrix Jf rx0 s of f at x0 still true? ■

Exercise 2.3.17 (Gâteaux differentiability of the norm)


Show that } ¨ } is never Gateaux differentiable at 0 (if E ‰ t0u). Give examples for a norm,
which is Gateaux differentiable on Ezt0u, where E is infinite-dimensional, and one that is
not. ■

Exercise 2.3.18 A positively homogeneous function f : E Ñ R is convex if and only if it


is subadditive. ■

Exercise 2.3.19 (First- and second-order criterion for λ-convex functions)


Prove the statements from Remark 2.3.14. ■

Exercise 2.3.20 (HM-AM inequality) The harmonic mean n{přnk“1 1{xk q of pxk qnk“1 Ă p0, 8q
is smaller than or equal to its arithmetic mean p1{nq řnk“1 xk . ■
Exercise 2.3.21 The log-exponential function (or: LogSumExp) x ÞÑ logpřdk“1 exk q is
convex, as is the log barrier ´ log ˝ det on the cone of symmetric positive definite matrices. ■
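Quick numerical checks for Exercises 2.3.20 and 2.3.21 (illustrative only; the LogSumExp implementation uses the standard max-shift for numerical stability):

```python
# HM-AM inequality and convexity of LogSumExp at random points.
import math
import random

random.seed(0)
for _ in range(100):
    xs = [random.uniform(0.1, 10.0) for _ in range(5)]
    hm = len(xs) / sum(1.0 / x for x in xs)   # harmonic mean
    am = sum(xs) / len(xs)                    # arithmetic mean
    assert hm <= am + 1e-12

def logsumexp(v):
    m = max(v)  # max-shift to avoid overflow in exp
    return m + math.log(sum(math.exp(c - m) for c in v))

for _ in range(100):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    t = random.random()
    z = [t * a + (1 - t) * b for a, b in zip(x, y)]
    assert logsumexp(z) <= t * logsumexp(x) + (1 - t) * logsumexp(y) + 1e-9
```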


2.4 Continuity Properties of Convex Functions


Textbooks: [NP19, Sec. 3.5], [Luc06, Sec. 2.1], [BS13, Sec. 2.4.1], [Hol12, Sec. 14, p. 82-84],
[Ză02, p. 64-65], [BL18, Thm. 6.25, 6.30] and [ET99, Subsec. I.2.3].
In this subsection we present sufficient conditions for the continuity of convex functions,
which can, remarkably, be deduced only from boundedness assumptions. The main takeaway
is that (lower semicontinuous if E is infinite-dimensional) proper, convex functions f are
continuous on ripdompf qq.

Lemma 2.4.1 (Locally bounded + convex ùñ locally Lipschitz)


Let f : E Ñ R be proper and convex. Suppose there are x0 P E, δ ą 0 and M P R such that
f pxq ď M for all x P B2δ px0 q. Then there exists m ď M such that

|f px1 q ´ f px2 q| ď ppM ´ mq{δq }x1 ´ x2 } @x1 , x2 P Bδ px0 q.
In particular, f is Lipschitz continuous on Bδ px0 q.

Proof. For x P B2δ px0 q we have x0 “ p1{2qx ` p1{2qp2x0 ´ xq and thus, by the convexity of f ,

f px0 q ď p1{2qf pxq ` p1{2qf p2x0 ´ xq. (2.10)
Since 2x0 ´ x P B2δ px0 q, we obtain the lower bound
(2.10)
f pxq ě 2f px0 q ´ f p2x0 ´ xq ě 2f px0 q ´ M :“ m.


Fig. 2.9: Construction of y in the proof of Lemma 2.4.1.

For x1 , x2 P Bδ px0 q with x1 ‰ x2 let


y :“ p1 ` δ{}x2 ´ x1 }q x2 ´ pδ{}x2 ´ x1 }q x1 “ x2 ` δ px2 ´ x1 q{}x2 ´ x1 } P B2δ px0 q,


see Fig. 2.9 for an illustration. Then


x2 “ p}x2 ´ x1 }{pδ ` }x2 ´ x1 }qq y ` pδ{pδ ` }x2 ´ x1 }qq x1
is a convex combination of y and x1 . As f is convex,
f px2 q ď p}x2 ´ x1 }{pδ ` }x2 ´ x1 }qq f pyq ` pδ{pδ ` }x2 ´ x1 }qq f px1 q
and thus
f px2 q ´ f px1 q ď p}x2 ´ x1 }{pδ ` }x2 ´ x1 }qq f pyq ` pδ{pδ ` }x2 ´ x1 }q ´ 1q f px1 q
“ p}x2 ´ x1 }{pδ ` }x2 ´ x1 }qq pf pyq ´ f px1 qq
ď ppM ´ mq{pδ ` }x2 ´ x1 }qq }x2 ´ x1 } ď ppM ´ mq{δq }x2 ´ x1 }.
Exchanging the roles of x1 and x2 yields the statement. l

The local boundedness assumption in Lemma 2.4.1 is essential. For example, the function
f pxq :“ ´?x if x ě 0, and `8 else, (2.11)
is convex but not locally Lipschitz continuous in x “ 0.
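Numerically, the failure of local Lipschitz continuity shows up as exploding difference quotients near 0 (a small sketch):

```python
# The difference quotients of f(x) = -sqrt(x) from (2.11) near 0 grow like
# 1 / sqrt(h), so f is not Lipschitz on any neighbourhood of 0.
import math

def f(x):
    return -math.sqrt(x)

quotients = [abs(f(h) - f(0.0)) / h for h in [1e-2, 1e-4, 1e-6]]
assert quotients[0] < quotients[1] < quotients[2]  # blow-up as h -> 0
assert quotients[2] > 100.0
```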
Fig. 2.10: The function (2.11).
Corollary 2.4.2 (Continuity of convex functions)
Let f : E Ñ R be convex and x0 P dompf q. The following are equivalent.
1 f is locally Lipschitz continuous at x0 .
2 f is continuous at x0 .
3 f is locally bounded at x0 .
4 f is locally bounded above at x0 .

Proof. The implications 1 ùñ 2 ùñ 3 ùñ 4 are clear and do not require
convexity. The implication 4 ùñ 1 follows from Lemma 2.4.1. l

Theorem 2.4.3
Let E be finite-dimensional. Then, every proper convex function f : E Ñ R is
continuous on pdompf qq˝ . In particular, every finite convex function f : E Ñ R is
continuous.

For a proof, we refer to [Luc06, Cor. 2.1.3] and [BS13, Cor. 2.109]. A finer analysis using
the same arguments shows that a proper and convex function f : E Ñ R on a finite-
dimensional E is continuous on ripdompf qq relative to affpdompf qq.
The result is wrong for infinite-dimensional spaces. Simple counterexamples are unbounded
(i.e., non-continuous) linear functionals. However, continuity can be recovered by assuming
that f is lower semicontinuous and that E is a Banach space.


Theorem 2.4.4: Lsc convex ùñ continuous

Let E be a Banach space and f P Γ0 pEq, then f is locally Lipschitz continuous on


pdompf qq˝ .

Proof. • We first show that there exists a point û P dompf q and some δ ą 0 and R P R
such that f puq ă R for all u P Bδ pûq.
Let u0 P pdompf qq˝ and R P R such that f pu0 q ă R. Then, we define V :“ levR pf q.
Since f is convex and lsc, we obtain that V is convex and closed. We want to utilize
Baire’s category theorem (Corollary 1.8 in the Functional Analysis script or [BL18,
Thm. 2.14]) in order to show that V admits an interior point. To this end, we define
the scaled and translated versions of V defined by
Vn :“ tu P E : u0 ` pu ´ u0 q{n P V u “ nV ´ pn ´ 1qu0 .

By construction, V has an interior point if and only if Vn has an interior point for
some n. Further, by the properties of the Minkowski sum, Vn is closed, convex and
non-empty for all n. Thus, by Baire’s category theorem, it remains to show that
ŤnPN Vn “ E.

(Lecture 8, 26.04.2024) For this purpose, we consider for arbitrary u P E the functions

fu : R Ñ R, t ÞÑ f pu0 ` tpu ´ u0 qq,

which are convex due to the convexity of f . Since u0 P pdompf qq˝ , there exists some
t0 ą 0 such that fu is finite on r´t0 , t0 s. Thus, by the convexity of fu , we have for any
t P r´t0 , t0 s that
fu ptq “ fu pppt0 ´ tq{p2t0 qqp´t0 q ` ppt0 ` tq{p2t0 qqt0 q ď ppt0 ´ tq{p2t0 qq fu p´t0 q ` ppt0 ` tq{p2t0 qq fu pt0 q ď max pfu p´t0 q, fu pt0 qq .

By Corollary 2.4.2 this implies that fu is continuous in 0. Since fu p0q “ f pu0 q ă R, this means
that there exists some n P N such that

f pu0 ` pu ´ u0 q{nq “ fu p1{nq ă R, i.e., u0 ` pu ´ u0 q{n P V.
Hence, it holds that u P Vn . Since u was arbitrary, this implies that ŤnPN Vn “ E.

Consequently, Baire’s category theorem implies that one Vn has an interior point.
Since Vn is a scaled and translated version of V , this implies that there exists û P V ˝ ,
i.e., there exists some δ ą 0 such that f puq ă R for all u P Bδ pûq.
• Now let u P dompf q˝ be arbitrary. We show that f is locally Lipschitz continuous
in u. The mapping p0, 8q Ñ E given by λ ÞÑ p1{λqpu ´ p1 ´ λqûq is continuous and
has value u at λ “ 1. Since u P pdompf qq˝ , there exists some λ P p0, 1q such that
f pp1{λqpu ´ p1 ´ λqûqq “: S ă 8.
For v P Bδp1´λq puq, set w “ û ` p1{p1 ´ λqqpv ´ uq P Bδ pûq. Rearranging yields v “ u ´ p1 ´


λqû ` p1 ´ λqw. Hence, the convexity of f yields that


ˆ ˙
u ´ p1 ´ λqû
f pvq “ f λ ` p1 ´ λqw
λ
ˆ ˙
u ´ p1 ´ λqû
ď λf ` p1 ´ λqf pwq ď S ` R.
λ

Thus, f pvq ď S ` R for all v P Bδp1´λq puq, namely, f is locally bounded from above at
u. This implies by Corollary 2.4.2 that f is locally Lipschitz continuous at u. l


2.5 Minima of Convex Functions


Textbooks: [BS13, p. 14], [BS13, Thm. 2.28, 2.31, Lem. 2.33], [BC11, Chp. 11] and [ET99,
Sec. II.1].
In this section, we want to examine the existence and uniqueness of minimizers for proper
and convex functions f : E Ñ R. Note that the minimizers of the function f : C Ñ R for
some C Ă E are exactly the minimizers of f ` ιC : E Ñ R. Thus it suffices to consider
functions defined on the whole normed space E.

Definition 2.5.1 (Local / global minimum / minimizer)


Let f : E Ñ R.
Then, a point x̂ P dompf q is a local minimizer of f , if there exists a neighbourhood U px̂q local minimizer
of x̂ such that f px̂q ď f pxq for all x P U px̂q. Further, x̂ P dompf q is a global minimizer
of f , if f px̂q ď f pxq for all x P dompf q. global minimizer
In this case, we call f px̂q a local/global minimum of f .
Moreover, we denote the set of global minimizers by
␣ (
arg minpf q :“ x̂ P dompf q : f px̂q ď f pxq @x P dompf q .

Similarly, we define the set of global maximizers as arg maxpf q :“ arg minp´f q.

The following properties state that a function grows sufficiently quickly at the “boundary”.

Definition 2.5.2 (Level-bounded, coercive, supercoercive)


A function f : E Ñ R is (lower) level-bounded if levα pf q is bounded for all α P R. This is level-bounded
equivalent to f being coercive, i.e., if f pxn q Ñ 8 as }xn } Ñ 8 pn Ñ 8q, see Exercise 2.5.7. coercive
Moreover, f is supercoercive if it is bounded from below on bounded subsets and supercoercive

lim inf }x}Ñ8 f pxq{}x} “ 8.

By convention, f is coercive and supercoercive if E “ t0u.

For example, the function f pxq “ x2 is supercoercive, but f pxq “ x is not. Note that not
every strictly convex and coercive function f : E Ñ R is supercoercive. For instance consider
f : R Ñ R, x ÞÑ |x| ´ logp|x| ` 1q.
In finite dimensions, a function f P Γ0 pRd q is even coercive if one level set is bounded, see
Exercise 2.5.8.
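A numerical illustration of this example (evaluation points chosen freely):

```python
# The function f(x) = |x| - log(|x| + 1) mentioned above: it is coercive
# (f(x) -> inf as |x| -> inf), but f(x)/|x| -> 1, so it is not supercoercive.
import math

def f(x):
    return abs(x) - math.log(abs(x) + 1.0)

assert f(1e6) > 1e5                     # values grow without bound
assert abs(f(1e9) / 1e9 - 1.0) < 1e-6   # but only linearly in |x|
```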
In the following, we will prove the existence of minimizers for coercive functions f P Γ0 pEq.
In finite dimensions, the proof works by a compactness argument for bounded sets. Unfortunately,
the unit ball is compact if and only if E is finite-dimensional, so this argument
fails in infinite-dimensional spaces.
As a remedy, we use compactness with respect to weak convergence. Recall that pxn qn in


E converges weakly to x P E (denoted by xn á xq if it holds x p, xn y Ñ x p, x y for


all p P E ˚ . Moreover, we call C Ă E weakly sequentially closed if for all pxn qn in C
converging weakly to some x P E it holds that x P C. Similarly, C is weakly sequentially
compact if any pxn qn in C admits a subsequence pxnk qk such that xnk á x for some x P C.
Further, a function f : E Ñ R is called weakly lower semicontinuous at x0 if it holds
f px0 q ď lim inf nÑ8 f pxn q for all pxn qn in E with xn á x0 . The function f is weakly lower
semicontinuous if it is weakly lower semicontinuous at every point x0 P E.
The following lemma shows the connections between convexity and weak convergence.

Lemma 2.5.3
Let E be a real normed space. Then, the following holds true.
• A convex set C Ă E is closed if and only if it is weakly sequentially closed.
• A convex function f : E Ñ R is lower semicontinuous if and only if it is weakly lower
semicontinuous.
• Let E be a reflexive Banach space, then the unit ball is weakly sequentially compact.
In particular, any bounded sequence admits a weakly convergent subsequence.

Proof. • Let C Ă E be convex and weakly sequentially closed. Moreover, let pxn qn
be a sequence in C that converges strongly to some x P E. Since strong convergence
implies weak convergence, we obtain that xn á x. Because C is weakly sequentially
closed, we obtain x P C. Thus, C is closed.
For the reverse direction, note that any closed halfspace H´p,a is weakly sequentially
closed. Indeed: let pxn qn be a sequence in H´p,a and let xn á x. Then, x p, x y “
limnÑ8 x p, xn y ď a such that x P H´p,a .
Now, let C Ă E be convex and closed. Then, we have by Remark 1.2.3 that C is the intersection
of closed halfspaces, which are weakly sequentially closed. Consequently, also C is
weakly sequentially closed.
• Let f : E Ñ R be weakly lower semicontinuous and let xn Ñ x. Then, it holds xn á x
such that f pxq ď lim inf nÑ8 f pxn q. Thus, f is lower semicontinuous.
For the other direction let f be convex and lower semicontinuous. Then, by Lemma 2.1.4
and Theorem 2.2.6, epipf q is convex and closed. Therefore, by the first claim, epipf q is
also weakly sequentially closed. Analogously to the proof of Lemma 2.1.4, this implies
that f is weakly lower semicontinuous.
• This is standard content of any Functional Analysis class. l

Now, we prove existence of minimizers for convex functions under certain assumptions.

Theorem 2.5.4: Existence and uniqueness of global minima

Let f : E Ñ p´8, 8s be a proper convex function.


1 Let E be a reflexive Banach space. If f is lower semicontinuous and coer-
cive, then inf xPE f pxq is attained, so arg minpf q ‰ H. Furthermore, arg minpf q


is closed and bounded.


2 Every local minimizer is a global one.
3 arg minpf q is convex (but possibly empty).
4 If f is strictly convex, then arg minpf q contains at most one point, that is, if
a minimizer exists, it is unique.

Proof. 1 Since f is proper, there exists some x0 P E such that a :“ f px0 q P
R. Now, we choose a sequence pxn qn in E such that f pxn q ď a and limnÑ8 f pxn q “
inf xPE f pxq. By definition, we have that xn P leva pf q. Since f is coercive, we obtain
that leva pf q and therefore the sequence pxn qn is bounded. Using that E is a reflexive
Banach space this yields by Lemma 2.5.3 that it admits a subsequence pxnk qk which
converges weakly to some x̂ P E. Finally, the lower semicontinuity of f implies by
Lemma 2.5.3 weak lower semicontinuity such that we have

inf xPE f pxq ď f px̂q ď lim inf kÑ8 f pxnk q “ inf xPE f pxq.

This implies that ´8 ă f px̂q “ inf xPE f pxq such that x̂ P arg minpf q.
Because arg minpf q “ levf px̂q pf q, we obtain that arg minpf q is bounded (due to coer-
civity of f ) and closed (by Lemma 2.1.4 since f is lsc).
2 Let x̂ be a local minimizer which is not a global minimizer. Then, there exists some x̃
with f px̃q ă f px̂q. In particular, we have for any t P p0, 1q that

f pp1 ´ tqx̂ ` tx̃q ď p1 ´ tqf px̂q ` tf px̃q ă f px̂q.

Since p1 ´ tqx̂ ` tx̃ Ñ x̂ as t Ñ 0 we obtain that for any δ ą 0 there exists some
x P Bpx̂, δq such that f pxq ă f px̂q which contradicts the assumption that x̂ is a local
minimizer.
3 Let x̂1 , x̂2 P arg minpf q and t P r0, 1s. Then, we have that

inf xPE f pxq ď f pp1 ´ tqx̂1 ` tx̂2 q ď p1 ´ tqf px̂1 q ` tf px̂2 q “ inf xPE f pxq,

such that all ď are in fact equalities. Thus, f pp1 ´ tqx̂1 ` tx̂2 q “ inf xPE f pxq, which
implies p1 ´ tqx̂1 ` tx̂2 P arg minpf q which implies convexity.
4 Assume that x̂1 ‰ x̂2 are two elements of arg minpf q, i.e., f px̂1 q “ f px̂2 q “ inf xPE f pxq.
Then, it holds by strict convexity that

f p 12 x̂1 ` 12 x̂2 q ă 12 pf px̂1 q ` f px̂2 qq “ inf xPE f pxq

which is a contradiction. l

Using the same proof, we observe that part 1 of the theorem also holds true if f is only
quasi-convex.
Remark 2.5.5 Part 1 of the previous proof follows the so-called direct method in the
calculus of variations or Tonelli’s direct method for proving the existence of minimizers
of some functional f . It consists of three steps.


• Take a minimizing sequence of f .


• Show that the sequence has a convergent subsequence regarding some topology.
• Show that f is lower semicontinuous with respect to this topology.
Then, it follows that there exists a minimizer of f . ˝
Remark 2.5.6 (Critical points are global minima) In the case that f : E Ñ R is
proper and convex, we obtain that every critical point is a global minimum. To see this, let
f be Gateaux differentiable at some point x0 P E with ∇f px0 q “ 0. Then for any x P E,
we have by Lemma 2.3.8 3 that 0 “ Df rx0 spx ´ x0 q ď f pxq ´ f px0 q, i.e., f px0 q ď f pxq.
Consequently, we arrive at x0 P arg minpf q. ˝

Exercises

Exercise 2.5.7 (Coercivity and level-boundedness) A function f : E Ñ R is coercive


if and only if it is level-bounded. ■

Exercise 2.5.8 (One level set is enough)


A function f P Γ0 pRd q is coercive if and only if there exists an a P R such that leva pf q is
non-empty and bounded. ■

Exercise 2.5.9 If g is supercoercive and f P Γ0 pEq, then f ` g is supercoercive. ■

Exercise 2.5.10 Find a convex, continuous function f : R Ñ R with arg minpf q “ H. ■


2.6 Infimal Projection and Infimal Convolution


Textbooks: [BC11, Chp. 12], ([BS13, p. 115], [Luc06, p. 17]), [CV77, Sec. 1.4] and [ET99, pp. 39-40].

Infimal Projection

We study a fundamental convexity-preserving operation for functions. Let pE, } ¨ }q and
pF, } ¨ }q be real normed spaces.

Definition 2.6.1 (Epigraphical (or infimal) projection)

The infimal projection (or marginal function) of φ : E ˆ F Ñ R is

v : F Ñ R, u ÞÑ inf xPE φpx, uq.

Fig. 2.11: The name “epigraphical projection” is due to epipvq being the image of epipφq
under the projection px, u, aq ÞÑ pu, aq.

Theorem 2.6.2: Infimal projection of convex function is convex

Let φ : E ˆ F Ñ R be a convex function. Then its infimal projection v is also convex.
In particular, for a convex set C Ă F ˆ R, the function u ÞÑ inf pa,uqPC a is convex.

Proof. Since φ is convex, for pxi , ui q P dompφq, i P t1, 2u, and λ P r0, 1s we have

λφpx1 , u1 q ` p1 ´ λqφpx2 , u2 q ě φpλx1 ` p1 ´ λqx2 , λu1 ` p1 ´ λqu2 q ě vpλu1 ` p1 ´ λqu2 q.

Taking the infimum on the left hand side over x1 and x2 we obtain

λvpu1 q ` p1 ´ λqvpu2 q ě vpλu1 ` p1 ´ λqu2 q.

Lastly, noting that dompvq is the projection of dompφq shows that v is convex. For the
second part consider φpa, uq :“ a ` ιC pa, uq. l


Example 2.6.3 (Infimal projection can be improper) Let f P Γ0 pRd q be finite and
φpx, uq :“ f pxq ` ιtex ďuu pxq. Then φp¨, uq P Γ0 pRd q. For f :“ id, the infimal projection of φ
is the improper, not lower semicontinuous function
vpuq “ 8 if u ď 0, and vpuq “ ´8 if u ą 0. ˛

Infimal Convolution
Closely related to infimal projections are infimal convolutions.

Definition 2.6.4 (Infimal convolution [BC11, Def. 12.1])


The infimal convolution of two proper functions f1 , f2 : E Ñ R is defined by

pf1 □f2 qpxq :“ inf x1 ,x2 PE, x“x1 `x2 tf1 px1 q ` f2 px2 qu “ inf yPE tf1 pyq ` f2 px ´ yqu. (2.12)

By definition, the infimal convolution is commutative (i.e., f1 □f2 “ f2 □f1 ) and it holds
dompf1 □f2 q “ dompf1 q ` dompf2 q. Moreover, we can analogously define the infimal convo-
lution of more than two functions by

pf1 □ ¨ ¨ ¨ □fn qpxq “ inf tf1 px1 q ` ¨ ¨ ¨ ` fn pxn q : xi P E, x1 ` ¨ ¨ ¨ ` xn “ xu.

In the case that pf1 □f2 q and pf2 □f3 q are proper, the infimal convolution is associative, i.e.,
f1 □pf2 □f3 q “ pf1 □f2 q□f3 “ f1 □f2 □f3 .
Example 2.6.5
• We have ιS1 □ιS2 “ ιS1 `S2 for S1 , S2 Ă E.
• For f1 :“ ιC and f2 :“ } ¨ } we have pf1 □f2 qpyq “ inf xPC }y ´ x}, which is the
distance of the point y to the set C ‰ H, and also the infimal projection of φpx, uq :“
}u ´ x} ` ιC pxq.
• The infimal convolution admits the neutral element ιt0u . ˛
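The distance-function example above can be verified by brute force on a grid. This is an illustrative sketch (Python with NumPy; the choice C “ r´1, 1s Ă R is an assumption for the demo, not taken from the text): the infimal convolution of ιC and | ¨ |, evaluated by discretizing the infimum, matches the closed-form distance maxp|y| ´ 1, 0q.

```python
import numpy as np

C = np.linspace(-1.0, 1.0, 2001)          # grid discretization of C = [-1, 1]
y = np.linspace(-3.0, 3.0, 13)            # evaluation points

# (iota_C [] |.|)(y) = inf_{x in C} |y - x| = dist(y, C)
inf_conv = np.array([np.min(np.abs(yi - C)) for yi in y])
dist = np.maximum(np.abs(y) - 1.0, 0.0)   # closed form of dist(., [-1, 1])

print(np.max(np.abs(inf_conv - dist)))    # zero up to grid resolution
```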

The following theorem explains the alternative name “epi-sum”.

Theorem 2.6.6: Infimal convolution and epigraphs

Let f1 , f2 : E Ñ R be proper.
1 We have

epipf1 q ` epipf2 q Ă epipf1 □f2 q, (2.13)

where equality holds if and only if the infimum in (2.12) is attained for all
elements in dompf1 □f2 q.
2 We have s-epipf1 □f2 q “ s-epipf1 q ` s-epipf2 q.


3 If f1 and f2 are convex, then so is f1 □f2 .

Proof. 1 Let px, aq P epi f1 ` epi f2 . Then there exist pxi , ai q P epi fi , i “ 1, 2 so that
x “ x1 ` x2 , a “ a1 ` a2 and f1 px1 q ` f2 px2 q ď a1 ` a2 “ a. Consequently,

pf1 lf2 qpxq “ inf y1 `y2 “x tf1 py1 q ` f2 py2 qu ď f1 px1 q ` f2 px2 q ď a

and px, aq P epipf1 lf2 q.


Equality condition: Assume that the infimum defining f1 lf2 is attained at each
point where f1 lf2 is finite. Let px, aq P epipf1 lf2 q, i.e., pf1 lf2 qpxq ď a. Since the
infimum in f1 lf2 is attained, there exist x1 , x2 so that x “ x1 ` x2 and

pf1 lf2 qpxq “ f1 px1 q ` f2 px2 q.

Hence, f1 px1 q ` f2 px2 q ď a and

px, aq “ px1 , f1 px1 qq ` px2 , a ´ f1 px1 qq P epi f1 ` epi f2 .

Conversely, assume that we have equality in (2.13). Let f pxq :“ pf1 lf2 qpxq be finite
so that px, f pxqq P epipf1 lf2 q. Then there exist pxi , ai q P epi fi with px, f pxqq “
px1 , a1 q ` px2 , a2 q. This implies that f pxq “ a1 ` a2 ě f1 px1 q ` f2 px2 q. Since we have
on the other hand that f pxq ď f1 px1 q ` f2 px2 q we see that the infimum is attained for
the decomposition x “ x1 ` x2 . Lecture 9, 03.05.2024
2 The inclusion “Ă” is shown analogously to 1 .

For “Ą” let px, aq P s-epipf1 □f2 q. Then, it holds

inf x1 `x2 “x tf1 px1 q ` f2 px2 qu ă a

such that we find x1 , x2 P E with x1 ` x2 “ x and f1 px1 q ` f2 px2 q ă a. In
particular, it holds for ε “ pa ´ pf1 px1 q ` f2 px2 qqq{2 ą 0 and ai “ fi pxi q ` ε that pxi , ai q P
s-epipfi q such that px, aq “ px1 , a1 q ` px2 , a2 q P s-epipf1 q ` s-epipf2 q.
3 Let fi , i “ 1, 2 be convex. Then epi f1 ` epi f2 is a convex set by Theorem 2.2.6 and
Theorem 1.1.4. Regarding that the inf-convolution can be rewritten as

pf1 lf2 qpxq “ infta P R : px, aq P epi f1 ` epi f2 u

we conclude by Theorem 2.6.2 that f1 lf2 is convex. l

The following example shows that the reverse inclusion of part 1 of the theorem is not true.
Moreover, the infimal convolution of two proper convex and lsc functions is not necessarily
proper or lsc.
Example 2.6.7
• Let p P R and consider the proper, continuous, convex functions f1 , f2 : R Ñ R with
f1 pxq :“ px and f2 pxq :“ ex . Then
pf1 □f2 qpxq “ ppx ´ logppq ` 1q if p ą 0, pf1 □f2 qpxq “ 0 if p “ 0, and pf1 □f2 qpxq “ ´8 if p ă 0,


so that for p ă 0, the infimal convolution is not proper. For p “ 0 we have f1 “ f1 □f2
and thus

epipf1 q “ epipf1 □f2 q “ R ˆ Rě0 and epipf2 q “ tpx, aq P R ˆ R : ex ď au. (2.14)

In particular, we have that p0, 0q P epipf1 □f2 qzpepipf1 q ` epipf2 qq.


• Let again f2 pxq :“ ex . The sets C1 :“ epipf2 q and C2 :“ epip1{f2 q are non-empty, convex
and closed. Thus ιC1 , ιC2 P Γ0 pR2 q. We have C1 ` C2 “ tpx, yqT P R2 : y ą 0u, which
is open, such that ιC1 □ιC2 “ ιC1 `C2 is not lower semicontinuous. ˛
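For p ą 0 the closed form in the first item can be sanity-checked numerically. A rough sketch (Python with NumPy; the values p “ 0.5 and x “ 1 are arbitrary demo choices): discretize the infimum inf y tpy ` ex´y u and compare it with ppx ´ logppq ` 1q.

```python
import math
import numpy as np

p, x = 0.5, 1.0                              # demo parameters (p > 0)
ys = np.linspace(-20.0, 20.0, 200001)        # grid for the infimum over y

numeric = np.min(p * ys + np.exp(x - ys))    # (f1 [] f2)(x) by brute force
closed = p * (x - math.log(p) + 1.0)         # closed form for p > 0
print(numeric, closed)                       # agree up to grid resolution
```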

Exercises

Exercise 2.6.8 (Computing some inf-convolutions [Luc06, Ex. 1.2.24]) Find f □g,
where
• f pxq :“ x and gpxq :“ 2x.
• f : E Ñ R and gpxq :“ ιtx0 u for some x0 P E.
• f : R Ñ R and gpxq :“ rιt0u for r P R.
• f pxq :“ 12 x2 and gpxq :“ ιr0,1s .
• f : E Ñ p´8, 8s is convex and proper and gpxq :“ ιBr p0q for some r ą 0.
• f pxq :“ 12 x2 and gpxq :“ x. ■


2.7 Proximal Mappings


A standard technique for minimizing a differentiable function f : Rd Ñ R is a gradient
descent method. That is, starting with some initialization x0 P Rd we generate a sequence
by xn “ xn´1 ´ λ∇f pxn´1 q. In this lecture, the functions of interest are non-differentiable
and defined on a general real normed space pE, } ¨ }q. Therefore, we consider so-called
proximal mappings, which are also called proximal operators or proximity operators
(abbreviated proxy or prox).

Definition 2.7.1 (Proximal mapping)


Let λ ą 0 and f P Γ0 pEq. Then, the proximal mapping with respect to f is defined by

proxλf :“ arg min yPE tf pyq ` 1{p2λq } ¨ ´y}2 u.

Moreover, we define the Moreau envelope or Moreau-Yoshida regularization λ f by

λ f : E Ñ R, x ÞÑ inf yPE t1{p2λq }x ´ y}2 ` f pyqu “ p1{p2λq } ¨ }2 □f qpxq. (2.15)

The Moreau envelope is convex as the infimal convolution of two convex functions.
In the case that f : Rd Ñ R is differentiable, the proximal mapping corresponds to an
implicit gradient step. More precisely, let x̂ P proxλf pxq. Then, x̂ is a minimizer and hence
a critical point of the function f ` 1{p2λq }x ´ ¨}2 . Consequently, it holds 0 “ 1{λ px̂ ´ xq ` ∇f px̂q,
which can be reformulated as x̂ “ x ´ λ∇f px̂q.
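For smooth f with a contractive update map, the implicit equation x̂ “ x ´ λ∇f px̂q can be solved by fixed-point iteration. This sketch (Python) uses the illustrative choice f pxq “ x2 {2, for which ∇f pxq “ x and the prox has the closed form x{p1 ` λq; f , λ and x are demo assumptions, not from the text.

```python
lam, x = 0.5, 3.0
grad_f = lambda t: t            # gradient of f(t) = t^2 / 2

xh = x
for _ in range(100):
    # iterate the implicit gradient step xh = x - lam * grad_f(xh);
    # for this f the map is a contraction since lam < 1
    xh = x - lam * grad_f(xh)

print(xh, x / (1.0 + lam))      # fixed point equals the closed-form prox
```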
Remark 2.7.2 (Generalizations of proximal mappings)
The definition can be extended in several directions. For instance, we could consider proximal
mappings with respect to non-convex functions f . Moreover, we could replace the term
}x ´ y}2 by other “distance-like” functions g : E ˆ E Ñ R. However, in this case many
of the properties of proximal mappings which we show in the sequel are not clear. ˝

Example 2.7.3 (Proxy of indicator functions and | ¨ |)


• Let f pyq “ ιC pyq for some nonempty closed convex set C Ă E. Then, proxλf is given
by arg min yPC }x ´ y}. In the case that E is a Hilbert space, this is exactly the
orthogonal projection onto C from Theorem 1.4.1.
• For the proper, convex, lower semicontinuous function f : R Ñ R, y ÞÑ |y| we have
Sλ pxq :“ proxλf pxq “ arg min yPR t1{p2λq px ´ yq2 ` |y|u “ x ´ λ for x ą λ, “ 0 for |x| ď λ, and “ x ` λ for x ă ´λ,

which is the soft-shrinkage operator with threshold λ. The Moreau envelope of


f is the Huber function


λ f pxq “ 1{p2λq px ´ Sλ pxqq2 ` |Sλ pxq| “ x ´ λ{2 for x ą λ, “ 1{p2λq x2 for |x| ď λ, and “ ´x ´ λ{2 for x ă ´λ.

Fig. 2.12: Left: The soft-shrinkage function proxλf “ Sλ for f :“ | ¨ |. Middle: The
Moreau envelope λ f of f (in these plots, λ “ 2). Right: The absolute value function
(black) and its Moreau envelopes (gray) for λ P t1, 2, 4u.
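The Huber formula can be cross-checked by evaluating the infimum in (2.15) on a grid. A sketch (Python with NumPy; λ “ 2 matches the middle plot, and the grids are arbitrary demo choices):

```python
import numpy as np

lam = 2.0

def huber(x):
    # closed-form Moreau envelope of |.| derived above (Huber function)
    return np.where(np.abs(x) <= lam, x**2 / (2.0 * lam), np.abs(x) - lam / 2.0)

ys = np.linspace(-50.0, 50.0, 100001)    # grid for the infimum over y
xs = np.linspace(-5.0, 5.0, 21)

# brute-force Moreau envelope:  min_y  1/(2 lam) (x - y)^2 + |y|
env = np.array([np.min((x - ys) ** 2 / (2.0 * lam) + np.abs(ys)) for x in xs])
print(np.max(np.abs(env - huber(xs))))   # tiny: matches the Huber function
```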

For the multivariate analogon f :“ } ¨ }1 of the function considered above, the mini-
mization can be done componentwise, leading to

proxλf pxq “ arg min yPRd ř dk“1 p 12 pxk ´ yk q2 ` λ|yk |q,

so the soft-shrinkage operator has to be applied componentwise as well. ˛
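In practice, componentwise soft-shrinkage is a one-liner. A sketch (Python with NumPy) implementing Sλ for } ¨ }1 and checking one component against a brute-force minimization (the input vector is an arbitrary demo choice):

```python
import numpy as np

def soft_shrink(x, lam):
    # componentwise soft-shrinkage = prox of lam * ||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([3.0, 0.5, -2.0])
print(soft_shrink(x, 1.0))                 # components 2, 0, -1

# brute-force check of one component: argmin_y 1/2 (3 - y)^2 + 1 * |y|
ys = np.linspace(-5.0, 5.0, 100001)
y_star = ys[np.argmin(0.5 * (3.0 - ys) ** 2 + np.abs(ys))]
print(y_star)                              # close to soft_shrink(3, 1) = 2
```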

In Hilbert spaces, the proximal operator can be characterized by a variational inequality


similarly to the orthogonal projection.

Theorem 2.7.4: Properties of proximal mappings

Let f P Γ0 pEq. Then, the following holds true.


1 A point x̂ is a minimizer of f if and only if x̂ P proxλf px̂q for any fixed λ ą 0.

If additionally, E is a reflexive Banach space, it holds


2 For any x P E the proximal mapping proxλf pxq is non-empty.
3 We have arg minpf q “ arg minpλ f q.

If E is a Hilbert space:
4 The proximal mapping admits a unique element, i.e., there exists some x̂ P E
such that proxλf pxq “ tx̂u. In this case, we also write x̂ “ proxλf pxq.
5 A point x̂ P E attains the minimum in (2.15), i.e., x̂ “ proxλf pxq, if and only if the variational inequality

1{λ x x ´ x̂, y ´ x̂ y ` f px̂q ď f pyq

holds for all y P E.


6 The Moreau envelope is Fréchet differentiable with

∇pλ f qpxq “ 1{λ px ´ proxλf pxqq. (2.16)

Proof. 1 “ ùñ ”: Let x̂ be a global minimizer of f . Then it holds for any x P E that

1{p2λq }x̂ ´ x̂}2 ` f px̂q “ f px̂q ď f pxq ď 1{p2λq }x̂ ´ x}2 ` f pxq.
Hence, x̂ P proxλf px̂q.
“ ðù ”: Suppose x̂ P proxλf px̂q and let x P E. Then it holds

x̂ P arg min yPE t1{p2λq }x̂ ´ y}2 ` f pyqu

such that we have for t P p0, 1q that


f px̂q “ 1{p2λq }x̂ ´ x̂}2 ` f px̂q ď 1{p2λq }x̂ ´ pp1 ´ tqx̂ ` txq}2 ` f pp1 ´ tqx̂ ` txq
“ 1{p2λq }tpx̂ ´ xq}2 ` f pp1 ´ tqx̂ ` txq ď t2 {p2λq }x̂ ´ x}2 ` p1 ´ tqf px̂q ` tf pxq.
Subtracting p1 ´ tqf px̂q on both sides and dividing by t yields

f px̂q ď t{p2λq }x̂ ´ x}2 ` f pxq.
Taking the limit t Ñ 0 implies that f px̂q ď f pxq. Since x P E was chosen arbitrarily
x̂ is a global minimizer.
2 Since 1{p2λq } ¨ ´x}2 ` f is convex and coercive, we get by Theorem 2.5.4 1 that
proxλf pxq “ arg min yPE t1{p2λq }x ´ y}2 ` f pyqu is non-empty.
3 By definition, we have

inf yPE f pyq ď inf yPE t1{p2λq }x ´ y}2 ` f pyqu “ λ f pxq ď f pxq.

In particular, if x̂ is a minimizer of f , it holds for all x P E that λ f px̂q ď f px̂q “


inf yPE f pyq ď λ f pxq such that x̂ is a minimizer of λ f .
Vice versa, let x̂ be a minimizer of λ f . By part 2 there exists some x0 P proxλf px̂q.
In particular, we have that

λ f px0 q ď f px0 q ď 1{p2λq }x̂ ´ x0 }2 ` f px0 q “ λ f px̂q ď λ f px0 q.

Consequently, the above inequalities are equalities such that }x̂´x0 }2 “ 0, i.e., x0 “ x̂.
Therefore, we have x̂ P proxλf px̂q such that x̂ is a minimizer of f by part 1 .
4 Since E is a Hilbert space, we get by Corollary 2.3.10 that 1{p2λq } ¨ }2 is strictly convex
such that the claim follows by Theorem 2.5.4 4 .


5 “ ùñ ”: Let x P E and x̂ P proxλf pxq.


Then, it holds by the definition of the proxy for all y P E and t P p0, 1q that

0 ě f px̂q ´ f px̂ ` tpy ´ x̂qq ` 1{p2λq p}x̂ ´ x}2 ´ }x̂ ´ x ´ tpx̂ ´ yq}2 q
“ f px̂q ´ f pp1 ´ tqx̂ ` tyq ` 1{p2λq p2t x x̂ ´ x, x̂ ´ y y ´ t2 }x̂ ´ y}2 q

and by convexity of f further

0 ě tf px̂q ´ tf pyq ` t{λ xx̂ ´ x, x̂ ´ yy ´ t2 {p2λq }y ´ x̂}2 .
Finally, dividing by t ą 0 and taking t Œ 0 we obtain the variational inequality.
“ ðù ”: Conversely, let the variational inequality hold true. Then we have for all y P E
that
f px̂q ` 1{p2λq }x̂ ´ x}2 ď f pyq ´ 1{λ xx̂ ´ x, x̂ ´ yy ` 1{p2λq }x̂ ´ x}2
ď f pyq ` 1{p2λq }x̂ ´ x}2 ´ 1{λ xx̂ ´ x, x̂ ´ yy ` 1{p2λq }x̂ ´ y}2
“ f pyq ` 1{p2λq }y ´ x}2

by the binomial formula. Hence x̂ “ proxλf pxq.
6 We show that the gradient of λ f exists and is given by (2.16). For an arbitrary x0 P E let

x̂0 :“ proxλf px0 q, z :“ 1{λ px0 ´ x̂0 q.
To show that λ f is differentiable at x0 with ∇pλ f qpx0 q “ z we have to prove that

rpuq :“ λ f px0 ` uq ´ λ f px0 q ´ xz, uy

fulfills lim }u}Ñ0 rpuq{}u} “ 0. To this end we show |rpuq| ď C}u}2 . We have

λ f px0 q “ f px̂0 q ` 1{p2λq }x̂0 ´ x0 }2 ,
λ f px0 ` uq “ min xPE tf pxq ` 1{p2λq }x ´ x0 ´ u}2 u ď f px̂0 q ` 1{p2λq }x̂0 ´ x0 ´ u}2 ,

so that

rpuq ď 1{p2λq }x̂0 ´ x0 ´ u}2 ´ 1{p2λq }x̂0 ´ x0 }2 ´ 1{λ xx0 ´ x̂0 , uy “ 1{p2λq }u}2 .

Since f is convex, so is λ f by Theorem 2.6.6 3 . Now r inherits the convexity of λ f so that

0 “ rp0q “ rp 12 u ` 12 p´uqq ď 12 rpuq ` 12 rp´uq.

Consequently, we get

rpuq ě ´rp´uq ě ´1{p2λq } ´ u}2 “ ´1{p2λq }u}2

and in summary |rpuq| ď 1{p2λq }u}2 . l


The uniqueness from Theorem 2.7.4 4 also holds true in so-called “strictly convex” Banach
spaces.
Explicit forms of proximal mappings for specific (finite-dimensional) functions can be found
at the website http://proximity-operator.net/ and in Appendix 9 of the script on
convex analysis in Euclidean spaces by Gabriele Steidl (available in ISIS).

Exercises

Exercise 2.7.5 What is proxλf if f is constant? Show that every translation on E (that
is, a map z ÞÑ z ´ a) is a proximal operator proxg for a suitable g [Mor65, p. 278-279]. ■

Exercise 2.7.6 (Proximal calculus)


Let f P Γ0 pEq and C Ă E be a non-empty closed convex set. We have the following
calculation rules for the proximal operator.

property      gpxq                    proxg pxq
translation   f px ´ zq, z P E        z ` proxf px ´ zq
scaling       f px{aq, a ‰ 0          a proxa´2 f px{aq
reflection    f p´xq                  ´ proxf p´xq
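These rules can be verified numerically against a brute-force prox. A sketch (Python with NumPy) checking the translation rule for f “ | ¨ | (whose prox is soft-shrinkage with threshold 1); the specific numbers are arbitrary demo choices:

```python
import numpy as np

def prox_abs(x):
    # prox of |.| (with lambda = 1): soft-shrinkage with threshold 1
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

z, x = 1.5, 3.0
ys = np.linspace(-10.0, 10.0, 200001)

# brute force: prox_g(x) for g(y) = |y - z|, i.e. argmin_y |y - z| + 1/2 (x - y)^2
brute = ys[np.argmin(np.abs(ys - z) + 0.5 * (x - ys) ** 2)]
rule = z + prox_abs(x - z)       # translation rule from the table
print(brute, rule)               # agree up to grid resolution
```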

65
3 SUBGRADIENTS

3 Subgradients

3.1 The Subdifferential


Textbooks: [BS13, Sec. 2.4.3], [ET99, Sec. I.5.1-I.5.2], [MN22, Sec. 3.3], [Phe09, Def. 3.7].
By Theorem 2.3.9 any finite convex Gateaux differentiable function f : E Ñ R fulfills

f pxq ě f px0 q ` x ∇f px0 q, x ´ x0 y @x P E (3.1)

at every x0 P E. In other words, f is minorized by the first Taylor polynomial of f in x0 .

Now, for a proper, convex (but not necessarily differentiable) function f , we call some
p P E ˚ a subgradient if it fulfills (3.1) with the Gateaux gradient ∇f px0 q replaced by p.
More precisely, we have the following definition.

x0

Fig. 3.1: Three affine functions supporting a non-differentiable convex f : R Ñ R at x0 . Two


of their slopes are given by directional derivatives Df rx0 sp1q and ´ Df rx0 sp´1q, respectively.

Definition 3.1.1 (Subdifferential)


Let f : E Ñ R be proper and x0 P dompf q. The subdifferential of f at x0 is defined as subdifferential

Bf px0 q :“ tp P E ˚ : f pxq ´ f px0 q ě x p, x ´ x0 y @x P Eu

and its elements are the subgradients of f at x0 . If x0 R dompf q we set Bf px0 q :“ H. If


Bf px0 q ‰ H, then f is subdifferentiable at x0 .

By definition, it holds Bf px0 q “ tp P E ˚ : x p, y y ď f px0 ` yq ´ f px0 q @y P Eu. Moreover,
we have that

Bf px0 q “ č xPEztx0 u H´ΛE px´x0 q,f pxq´f px0 q .


In particular, Bf px0 q is convex, closed and weakly sequentially closed.


Example 3.1.2 (Subdifferential of the absolute value) Lecture 10,
The subdifferential of | ¨ | : R Ñ R is 08.05.2024

B| ¨ |pxq “ t´1u if x ă 0, B| ¨ |pxq “ r´1, 1s if x “ 0, and B| ¨ |pxq “ t1u if x ą 0. ˛

Remark 3.1.3 (Subdifferentials for non-convex functions)


For non-convex functions, one usually uses a slightly different definition of subdifferentials.
For a proper (not necessarily convex) f : E Ñ R, the regular subdifferential is defined by
" *
r ˚ f pxq ´ f px0 q ´ x p, x ´ x0 y
B f px0 q :“ p P E : lim inf ě0 .
xÑx0 }x ´ x0 }

The condition is often denoted by “f pxq´f px0 q ě x p, x´x0 y `op}x´x0 }q”. In the case that
f is convex, one can show that the regular subdifferential coincides with Definition 3.1.1.
In this lecture, we mainly consider convex functions. Nevertheless, many of the following
relations remain true for non-convex functions and the regular subdifferential. ˝

For convex functions the following lemma relates subgradients with one-sided directional
derivatives (which exist by Lemma 2.3.8).

Lemma 3.1.4 (Directional derivatives and subgradients)


Let f : E Ñ R be convex and finite at x0 P E. Then

Bf px0 q “ tp P E ˚ : x p, v y ď Df rx0 spvq @v P Eu .

Proof. “Ă”: Let p P Bf px0 q. Setting x :“ x0 ` hv for h ą 0 and v P E, we obtain by the


definition of the subdifferential that

f px0 ` hvq ´ f px0 q ě x p, x0 ` hv ´ x0 y,

that is, p1{hq pf px0 ` hvq ´ f px0 qq ě x p, v y. Letting h Œ 0, we obtain the first inclusion.
“Ą”: Let p P tp P E ˚ : x p, v y ď Df rx0 spvq @v P Eu. By Lemma 2.3.8 3 we have

x p, v y ď Df rx0 spvq ď f px0 ` vq ´ f px0 q @v P E

and thus p P Bf px0 q. l

The lemma implies particularly that Df rx0 spvq “ suppPBf px0 q x p, v y. By replacing v by
´v, in the statement of the lemma, we obtain that p P Bf px0 q if and only if x p, v y ě
´ Df rx0 sp´vq for all v P E. Consequently, we can rewrite the subdifferential as

Bf px0 q “ tp P E ˚ : ´ Df rx0 sp´vq ď x p, v y ď Df rx0 spvq @v P Eu. (3.2)


In the case E “ R, this yields that Bf px0 q “ tp P R : ´ Df rx0 sp´1q ď p ď Df rx0 sp1qu,
i.e., the subdifferential is the interval with the bounds being exactly the one-sided deriva-
tives of f at x0 . If the directional derivatives at x0 are finite, this simplifies to Bf px0 q “
r´ Df rx0 sp´1q, Df rx0 sp1qs.
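In R the interval formula suggests a quick numerical probe: approximate the one-sided directional derivatives by forward difference quotients. A sketch (Python; for convex f the quotient decreases to Df rx0 spvq as h Œ 0, and for f “ | ¨ | at x0 “ 0 it is exact for every h ą 0):

```python
def f(x):
    return abs(x)

def dir_deriv(f, x0, v, h=1e-8):
    # forward difference quotient approximating Df[x0](v)
    return (f(x0 + h * v) - f(x0)) / h

x0 = 0.0
upper = dir_deriv(f, x0, 1.0)     # Df[x0](1)   =  1
lower = -dir_deriv(f, x0, -1.0)   # -Df[x0](-1) = -1
print([lower, upper])             # subdifferential of |.| at 0 is [-1, 1]
```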
Under the assumption that f is continuous at x0 , the next theorem shows that the subdif-
ferential is non-empty and bounded. By Theorem 2.4.3 and Theorem 2.4.4, this assumption
is particularly fulfilled if E is finite-dimensional or if E is a Banach space and f lsc.

Theorem 3.1.5: Existence of subgradients

Let f : E Ñ R be proper and convex. If f is continuous at x0 P pdompf qq˝ , then


Bf px0 q is non-empty and bounded. Furthermore, for x0 P Ezpdompf qq˝ , the subdif-
ferential Bf px0 q is either empty or unbounded.

Proof. Since f is continuous in x0 , there exist by Corollary 2.4.2 some ε ą 0 and c P R


such that
f pxq ă c @x P B2ε px0 q. (3.3)

Bounded: It holds by Lemma 2.3.8 for p P Bf px0 q and x P E with }x} ď 1 that

x p, x y ď Df rx0 spxq ď pf px0 ` εxq ´ f px0 qq{ε ď pc ´ f px0 qq{ε.
Now, Bf px0 q is bounded since

}p}˚ “ sup xPE, }x}ď1 x p, x y ď pc ´ f px0 qq{ε.

Non-empty: Consider the sets

C1 :“ epipf q˝ and C2 :“ tpx0 , f px0 qqu.

Since f is a convex function, we have that C1 is convex. Moreover, (3.3) yields that px0 , bq P
C1 for all b ą c. Consequently, C1 and C2 are disjoint, convex and non-empty. Hence,
they can be separated accordingly to Theorem 1.2.2. More precisely, there exists pp, aq P
pE ˆ Rq˚ zt0u “ pE ˚ ˆ Rqzt0u such that

xpp, aq, px, bq y ď xpp, aq, px0 , f px0 qq y @px, bq P epipf q˝ ,

i.e.,
x p, x ´ x0 y `apb ´ f px0 qq ď 0 @px, bq P epipf q˝ .

Reformulating this as set inclusion yields

tpx, bq P E ˆ R : x p, x ´ x0 y `apb ´ f px0 qq ď 0u Ą epipf q˝ .

Taking the interior of both sides yields by pp, aq ‰ 0 that

tpx, bq P E ˆ R : x p, x ´ x0 y `apb ´ f px0 qq ă 0u Ą epipf q˝ .


In particular, we have that

x p, x ´ x0 y `apb ´ f px0 qq ă 0 @px, bq P epipf q˝ .

Since (3.3) yields that there exists b ą c ą f px0 q such that px0 , bq P epipf q˝ , this implies
that apb ´ f px0 qq ă 0 with b ´ f px0 q ą 0, such that a ă 0.
Hence, we obtain by dividing by a ă 0 and setting p̄ :“ ´p{a P E ˚ that

´ x p̄, x ´ x0 y `b ´ f px0 q ą 0 @px, bq P epipf q˝ .

By Lemma 1.1.18 we have that epipf q is contained in the closure of epipf q˝ , such that for any
x P dompf q there exists some sequence pxn , bn qn in epipf q˝ with pxn , bn q Ñ px, f pxqq (for
x R dompf q the inequality below holds trivially). In particular, we have that

´ x p̄, x ´ x0 y ` f pxq ´ f px0 q “ lim nÑ8 p´ x p̄, xn ´ x0 y ` bn ´ f px0 qq ě 0,

where each term under the limit is ą 0 by the previous inequality.

Reformulating yields that

f pxq ´ f px0 q ě x p̄, x ´ x0 y @x P E

which implies p̄ P Bf px0 q.


Unbounded or empty for x0 R pdompf qq˝ : We show that Bf px0 q is unbounded whenever
it is non-empty for x0 R pdompf qq˝ . To this end let p P Bf px0 q. Then, there exists by
Theorem 1.2.2 some p̂ P E ˚ zt0u such that x p̂, x y ď x p̂, x0 y for all x P pdompf qq˝ . By going
over to the closure, we obtain that this holds true for all x P dompf q. In particular, we have
for all x P dompf q and all λ ě 0 that

f pxq ´ f px0 q ě x p, x ´ x0 y ě x p, x ´ x0 y ` x λp̂, x ´ x0 y “ x p ` λp̂, x ´ x0 y .

Consequently, we obtain p ` λp̂ P Bf px0 q. Moreover, we have by the reverse triangle
inequality that }p ` λp̂}˚ ě λ}p̂}˚ ´ }p}˚ Ñ 8 as λ Ñ 8. In particular, Bf px0 q is
unbounded. l
Corollary 3.1.6 (Subdifferential is non-empty for x0 P ripdompf qq)
Let E be a Banach space and f : E Ñ R be proper and convex. Assume that either f is lsc
or dimpEq ă 8. Then, for any x0 P ripdompf qq, the subdifferential is non-empty.

Proof. Without loss of generality, assume that 0 P affpdompf qq (otherwise consider gpxq “
f px ` x0 q instead of f ). Then, F :“ affpdompf qq is a closed linear subspace of the Banach
space E and therefore complete. Now, we consider the restricted function f |F : F Ñ R de-
fined by f |F pxq “ f pxq for x P F . Since x0 P ripdompf qq, we obtain that x0 P pdompf |F qq˝ Ă
F . Because either f is lsc or E (and therefore F ) is finite-dimensional, we obtain by The-
orem 2.4.4 or Theorem 2.4.3 that f |F is continuous at x0 . Hence, by Theorem 3.1.5 there
exists q P Bf |F px0 q Ă F ˚ . Finally, the Hahn-Banach theorem implies that there exists
p P E ˚ with x p, x y “ x q, x y for all x P F such that

x p, x ´ x0 y “ x q, x ´ x0 y ď f |F pxq ´ f |F px0 q “ f pxq ´ f px0 q

for all x P dompf q Ă F . This implies p P Bf px0 q. l


Theorem 3.1.7: Subdifferentiability and differentiability

Let f : E Ñ R be convex and finite at x0 P E. If f is Gateaux differentiable at x0 ,


then
Bf px0 q “ t∇f px0 qu. (3.4)

If E is a Banach space and f is lsc, the reverse direction holds true. That is, if
Bf px0 q “ tpu, then f is Gateaux differentiable at x0 and p “ ∇f px0 q holds.

Proof. “ ùñ ”: If f is differentiable at x0 , then by Lemma 3.1.4 any p P Bf px0 q fulfills

x p, x y ď x ∇f px0 q, x y @x P E.

By plugging in ´x instead of x, this implies that x p, x y “ x ∇f px0 q, x y for all x P E such


that p “ ∇f px0 q.
“ ðù ”: Let Bf px0 q “ tpu. Then we have by Theorem 3.1.5 that x0 P pdompf qq˝ (otherwise
Bf px0 q would be either empty or unbounded). In particular, we have by Theorem 2.4.4 that f
is continuous in x0 such that by Corollary 2.4.2 there exist some ε ą 0 and c P R such that

f pxq ă c @x P B2ε px0 q. (3.5)

Now we show that x p, v y “ Df rx0 spvq for all v P E which implies that p “ ∇f px0 q is the
Gateaux gradient of f at x0 . Assume that there exists v P E such that x p, v y ă Df rx0 spvq.
Then, let F “ ttv : t P Ru be the linear subspace of E spanned by v and define the linear
functional q : F Ñ R by x q, tv y “ t Df rx0 spvq. Since F is finite-dimensional, we have that
q P F ˚ . In the following, we extend q to an element p̄ P E ˚ ztpu by the Hahn-Banach
theorem and show that p̄ P Bf px0 q.
Using the positive homogeneity of Df rx0 s, it holds that x q, tv y “ t Df rx0 spvq “ Df rx0 sptvq
for all t ě 0. Moreover, since p is a subgradient, we have for t ă 0 that

x q, tv y “ t Df rx0 spvq ď t x p, v y “ x p, tv y ď Df rx0 sptvq.

Hence, we have that x q, x y ď Df rx0 spxq for all x P F . Moreover Df rx0 s is subadditive since
it is positively homogeneous and it holds

Df rx0 spx ` yq “ 2 Df rx0 sp 12 x ` 12 yq ď 2p 12 Df rx0 spxq ` 12 Df rx0 spyqq “ Df rx0 spxq ` Df rx0 spyq
due to the convexity of Df rx0 s by Lemma 2.3.8. Thus, by the algebraic Hahn-Banach
theorem, there exists some p̄ P E 1 such that x p̄, tv y “ x q, tv y for all t P R and x p̄, x y ď
Df rx0 spxq for all x P E. By Lemma 2.3.8 and (3.5), it holds for all x P E with }x} ď 1 that

x p̄, x y ď Df rx0 spxq ď pf px0 ` εxq ´ f px0 qq{ε ď pc ´ f px0 qq{ε.
In particular, p̄ is bounded and therefore continuous, i.e., p̄ P E ˚ . Together with x p̄, x y ď
Df rx0 spxq for all x P E, this implies that p̄ P Bf px0 q. Since p̄ ‰ p (because x p̄, v y “
Df rx0 spvq ą x p, v y) this contradicts Bf px0 q “ tpu. l


Example 3.1.8 (Unbounded / empty subdifferential of f P Γ0 pRq)
Consider
f : R Ñ R, x ÞÑ |x| ´ ?p2 ´ xq if |x| ď 2, and 8 else, (3.6)
which is a proper convex lower semicontinuous function. (Fig. 3.2: The function from (3.6).) We have
Df rxsp1q “ 8 if x “ 2, 1 ` 1{p2?p2 ´ xqq if x P r0, 2q, and 1{p2?p2 ´ xqq ´ 1 if x P r´2, 0q,
and
Df rxsp´1q “ ´8 if x “ 2, ´1 ´ 1{p2?p2 ´ xqq if x P p0, 2q, 1 ´ 1{p2?p2 ´ xqq if x P p´2, 0s, and 8 if x “ ´2.
Thus, by ˘p ď Df rxsp˘1q, i.e., ´ Df rxsp´1q ď p ď Df rxsp1q, we get
Bf pxq “ t1 ` 1{p2?p2 ´ xqqu if x P p0, 2q, r1{p2?2q ´ 1, 1{p2?2q ` 1s if x “ 0, t1{p2?p2 ´ xqq ´ 1u if x P p´2, 0q, p´8, ´3{4s if x “ ´2, and H otherwise. ˛
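As a quick numerical sanity check of these formulas (an illustrative Python sketch that is not part of the script; the grid and tolerance are ad hoc choices), one can test the subgradient inequality f(y) >= f(x0) + p*(y - x0) at x0 = 0:

```python
import math

def f(x):
    # the function from (3.6): |x| - sqrt(2 - x) for |x| <= 2, +infinity otherwise
    return abs(x) - math.sqrt(2 - x) if abs(x) <= 2 else math.inf

def is_subgradient(p, x0, tol=1e-9):
    # p is a subgradient of f at x0 iff f(y) >= f(x0) + p*(y - x0) for all y;
    # we only test this on a fine grid, so the check is heuristic
    ys = [k / 1000 for k in range(-3000, 3001)]
    return all(f(y) >= f(x0) + p * (y - x0) - tol for y in ys)

lo = 1 / (2 * math.sqrt(2)) - 1   # left endpoint of the claimed interval at x = 0
hi = 1 / (2 * math.sqrt(2)) + 1   # right endpoint

print(is_subgradient(lo, 0), is_subgradient(hi, 0))  # True True
print(is_subgradient(hi + 0.01, 0))                  # False: p slightly too large
```

Both endpoints of the computed interval pass the test, while a slightly larger slope already violates the inequality near 0.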

Recall that x̂ P E minimizes a Gateaux differentiable convex f : E Ñ R if and only if


∇f px̂q “ 0. This optimality criterion can be generalized to proper (convex) f as follows.

Theorem 3.1.9: Fermat’s rule

Let f : E Ñ R be proper. Then x0 P E is a global minimizer of f if and only if

0 P Bf px0 q.

Proof. We have that x0 is a global minimizer of f if and only if

f px0 q ď f pxq @x P E ô x 0, x ´ x0 y ď f pxq ´ f px0 q @x P E ô 0 P Bf px0 q.
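As a small illustration of Fermat's rule (a hypothetical Python example with an ad hoc grid search, not from the script): for f pxq “ px ´ 1q2 {2 ` |x| we have Bf p0q “ t´1u ` r´1, 1s “ r´2, 0s Q 0, so 0 is a global minimizer.

```python
# Fermat's rule for f(x) = (x - 1)^2 / 2 + |x|: since 0 lies in
# ∂f(0) = {0 - 1} + [-1, 1] = [-2, 0], the point 0 is a global minimizer.
def f(x):
    return 0.5 * (x - 1) ** 2 + abs(x)

xs = [k / 1000 for k in range(-5000, 5001)]   # grid search as a sanity check
x_min = min(xs, key=f)
print(x_min)  # 0.0
```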

Example 3.1.10
• The subdifferential of the norm } ¨ } is given by
B} ¨ }pxq “ tp P E ˚ : x p, x y “ }x}, }p}˚ “ 1u if x ‰ 0, and B} ¨ }p0q “ tp P E ˚ : }p}˚ ď 1u, the closed unit ball of E ˚ .

• Let C Ă E be non-empty and convex. Then the subdifferential of its indicator function ιC at x0 P C is given by the normal cone, BιC px0 q “ NC px0 q. ˛
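The first identity can be probed numerically in the Euclidean plane (an illustrative Python sketch, not part of the script; the random sampling only gives a heuristic check of the subgradient inequality }y} ě }x} ` x p, y ´ x y):

```python
import math, random

def norm(v):
    return math.hypot(v[0], v[1])

def is_subgradient(p, x, trials=2000, tol=1e-12):
    # test the subgradient inequality ||y|| >= ||x|| + <p, y - x> at random y
    rng = random.Random(0)
    for _ in range(trials):
        y = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        inner = p[0] * (y[0] - x[0]) + p[1] * (y[1] - x[1])
        if norm(y) < norm(x) + inner - tol:
            return False
    return True

x = (3.0, 4.0)
p = (x[0] / norm(x), x[1] / norm(x))            # the unique subgradient for x != 0
print(is_subgradient(p, x))                     # True
print(is_subgradient((0.9, 0.1), (0.0, 0.0)))   # True: ||p|| <= 1
print(is_subgradient((1.1, 0.0), (0.0, 0.0)))   # False: ||p|| > 1
```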


Exercises

Exercise 3.1.11 Prove the identities from Example 3.1.10. ■

Exercise 3.1.12 Let f be subdifferentiable at x0 P E. Prove that f is lsc in x0 . ■

Exercise 3.1.13 (Subgradients define normals to epigraph) Let f : E Ñ R be proper,
convex and x0 P dompf q. Show that
Bf px0 q “ tp P E ˚ : pp, ´1q P Nepipf q ppx0 , f px0 qqqu. ■


3.2 Subdifferential Calculus


Textbooks: [ET99, Sec. I.5.3], [Hol12, p. 87], [Ză02, Thm. 2.4.2] and [BC11, Sec. 16.4]. (Lecture 11, 10.05.2024)
Lemma 3.2.1 (Subdifferential of scaling, translating, dilation)
For a function f : E Ñ R we have
1 Bgpxq “ Bf px ` x0 q for gpxq :“ f px ` x0 q,
2 Bgpxq “ λBf pλxq for gpxq :“ f pλxq for all λ P R,
3 Bgpxq “ λBf pxq for gpxq :“ λf pxq for λ ą 0.

The proof is left as Exercise 3.2.6.


As the following theorem shows, the subdifferential is in general not additive with respect to
sums of functions and compositions with linear transforms, unless we require some continuity.

Theorem 3.2.2: Subdifferential of sum and composition with


linear transformation

For f, g : E Ñ R we have

Bf px0 q ` Bgpx0 q Ă Bpf ` gqpx0 q @x0 P E,

with equality if f, g P Γ0 pEq and there exists a point x̂ P dompf q X dompgq such that
f is continuous at x̂.
Let F be a real normed space, A P LpE; F q and f P Γ0 pF q be continuous at Aŷ for
some ŷ P E. For every x0 P E we have
Bpf ˝ Aqpx0 q “ A˚ rpBf qpAx0 qs ,

where A˚ P LpF ˚ ; E ˚ q is the adjoint of A defined by x A˚ p, x y “ x p, Ax y for all


p P F ˚ and x P E.

Proof. We only prove the first statement. A proof for the second statement can be found
in [ET99, Prop. I.5.7].
“Ă”: Let p P Bf px0 q ` Bgpx0 q, i.e., p “ p1 ` p2 for p1 P Bf px0 q and p2 P Bgpx0 q. Then it holds

x p1 , x ´ x0 y ď f pxq ´ f px0 q and x p2 , x ´ x0 y ď gpxq ´ gpx0 q @x P E.

Summing up both inequalities yields

x p, x ´ x0 y “ x p1 ` p2 , x ´ x0 y ď f pxq ` gpxq ´ f px0 q ´ gpx0 q “ pf ` gqpxq ´ pf ` gqpx0 q @x P E,

i.e., p P Bpf ` gqpx0 q.


“Ą”: For x0 R dompf `gq, we have either f px0 q “ 8 or gpx0 q “ 8 such that Bf px0 q`Bgpx0 q “
H “ Bpf ` gqpx0 q. Thus, let x0 P dompf ` gq “ dompf q X dompgq and p P Bpf ` gqpx0 q, that
is,
f pxq ` gpxq ě x p, x ´ x0 y ` f px0 q ` gpx0 q @x P E.


This implies
f pxq ´ f px0 q ´ x p, x ´ x0 y ě gpx0 q ´ gpxq, (3.7)
where we set φ1 pxq :“ f pxq ´ f px0 q ´ x p, x ´ x0 y and φ2 pxq :“ gpxq ´ gpx0 q.

Since f, g P Γ0 pEq, we have that φ1 , φ2 P Γ0 pEq. Consequently, the sets C1 :“ tpx, aq P


E ˆ R : φ1 pxq ď au and C2 :“ tpx, bq P E ˆ R : ´φ2 pxq ě bu are convex and non-empty.
Moreover, (3.7) yields that ripC1 q X ripC2 q “ H. Thus, we can separate C1 and C2 due to
Theorem 1.2.5. That is, there exist pq, aq P pE ˚ ˆ Rqzt0u and β P R such that

xpq, aq, px2 , b2 q y ď β ď xpq, aq, px1 , b1 q y @px1 , b1 q P C1 , px2 , b2 q P C2 . (3.8)

Since φ1 px0 q “ φ2 px0 q “ 0, we have for all b2 ă 0 ă b1 that px0 , b1 q P C1 and px0 , b2 q P C2
and therefore

xpq, aq, px0 , b2 q y ď β ď xpq, aq, px0 , b1 q y ùñ apb2 ´ b1 q ď 0,
and since b2 ´ b1 ă 0, it follows that a ě 0. For px0 , 0q P C1 X C2 , we obtain that

x q, x0 y ď β ď x q, x0 y, i.e., β “ x q, x0 y . (3.9)

Next, we show that a ą 0. Assume that a “ 0; then it holds q ‰ 0. Further, we have


that x q, x2 y ď x q, x1 y for all x1 P dompφ1 q “ dompf q and x2 P dompφ2 q “ dompgq. In
particular, we have that
x q, x̂ y ď x q, x1 y @x1 P dompf q. (3.10)

Since q ‰ 0, we can choose y P E with }y} ď 1 such that x q, y y ą 0. Since f is continuous at x̂, we have
that there exists some ε ą 0 such that B2ε px̂q Ă dompf q. In particular, we have that
x̂ ´ εy P dompf q such that it holds by (3.10) that

x q, x̂ y ď x q, x̂ ´ εy y “ x q, x̂ y ´ε x q, y y ă x q, x̂ y

which is a contradiction. Hence, it holds a ą 0 and without loss of generality it holds a “ 1.


Inserting this in (3.8) together with (3.9) yields for x P dompf q “ dompφ1 q with px, φ1 pxqq P
C1 “ epipφ1 q that
x q, x0 y “ β ď x q, x y `φ1 pxq,

which can be reformulated as

x p ´ q, x ´ x0 y ď f pxq ´ f px0 q @x P dompf q.

In particular, we have p ´ q P Bf px0 q. Similarly, we get by (3.8) together with a “ 1 and


(3.9) that
x q, x ´ x0 y ď gpxq ´ gpx0 q @x P dompgq

such that q P Bgpx0 q. Consequently, we have that p “ p ´ q ` q P Bf px0 q ` Bgpx0 q. l


Example 3.2.3 (Bf pxq ` Bgpxq ⊊ Bpf ` gqpxq)


Consider f :“ ιRě0 and
gpxq :“ 0 if x ă 0, 1 if x “ 0, and 8 if x ą 0.
We have
Bf pxq “ H if x ă 0, p´8, 0s if x “ 0, t0u if x ą 0, and Bgpxq “ t0u if x ă 0, H if x ě 0.

Thus Bf pxq ` Bgpxq “ H for all x P R (as S ` H “ H for all sets S). Furthermore, for x P R
we have pf ` gqpxq “ 1 if x “ 0, and 8 if x ‰ 0.

For x “ 0 we obtain

Bpf ` gqp0q “ tp P R : f pyq ` gpyq ´ 1 ě py @y P Ru “ R .

Thus Bpf ` gqpxq “ R if x “ 0, and H if x ‰ 0.
˛

Combining Theorem 3.2.2 and Theorem 3.1.9 gives a characterization of minima in constrained optimization problems. (Lecture 12, 15.05.2024)

Corollary 3.2.4 (First order optimality conditions for constrained minimization)


Let f P Γ0 pEq and C Ă E be a non-empty closed convex set such that dompf q X C ˝ ‰ H.
Then
x̂ P arg min f pxq
xPC

if and only if x̂ P C and 0 P Bf px̂q ` NC px̂q, i.e., there exists some p P Bf px̂q such that
x p, x ´ x̂ y ě 0 for all x P C.
In particular, if E is a Hilbert space, then x̂ is a minimizer if and only if x̂ P C and there
exists a p P Bf px̂q such that

x̂ “ PC px̂ ´ γpq @γ ą 0. (3.11)

Proof. 1 As ιC is continuous on C ˝ , by Theorem 3.2.2 and Theorem 3.1.9 we have that


x̂ is a minimizer of f ` ιC if and only if

0 P Bf px̂q ` BιC px̂q “ Bf px̂q ` NC px̂q.

Thus there exists p P Bf px̂q with x ´p, x ´ x̂ y ď 0 for all x P C.


2 Let E be a Hilbert space. By the projection theorem, x̂ “ PC px̂ ´ γpq is equivalent


to x x̂ ´ γp ´ x̂, x ´ x̂ y ď 0 for all x P C, which is equivalent to

x p, x ´ x̂ y ě 0 @x P C,

so the result follows from 1 . l
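The fixed-point characterization (3.11) can be illustrated in E “ R (a hypothetical Python example, not from the script; the set C and function f below are ad hoc choices):

```python
# Sketch of (3.11) with C = [0, 2] and the smooth f(x) = (x + 1)^2 / 2,
# whose constrained minimizer over C is x_hat = 0.
def proj_C(x):              # orthogonal projection onto C = [0, 2]
    return min(max(x, 0.0), 2.0)

x_hat = 0.0
p = x_hat + 1.0             # f'(x_hat), the only subgradient of f at x_hat

ok = all(proj_C(x_hat - gamma * p) == x_hat for gamma in [0.1, 1.0, 10.0])
print(ok)  # True: x_hat = P_C(x_hat - gamma * p) for every tested gamma > 0
```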

There are also product, quotient and chain rules for subdifferentials. We refer to [FP03,
Prop. 7.1.9, Prop. 7.1.11] for detailed formulations and proofs in the case E “ Rn and to
[Cla90b, p. 42 - 48] for a more general case.
Finally, using subdifferential calculus, we can now present an easy proof for the variational
inequality for proximity operators from Theorem 2.7.4 5 .

Corollary 3.2.5 (Easy proof of Theorem 2.7.4 5 )


Let E be a Hilbert space, f P Γ0 pEq, x P E and λ ą 0. Then, it holds x̂ P proxλf pxq if
and only if
1
x x ´ x̂, y ´ x̂ y ď f pyq ´ f px̂q.
λ

Proof. By definition, we have that x̂ P proxλf pxq if and only if

1
x̂ P arg mint }z ´ x}2 ` λf pzqu.
zPE 2

By Fermat’s rule (Theorem 3.1.9) and subdifferential calculus (Theorem 3.2.2) this is equiv-
alent to
1
0 P x̂ ´ x ` λBf px̂q ô px ´ x̂q P Bf px̂q.
λ
Inserting the definition of subdifferentials this is equivalent to
1
x x ´ x̂, y ´ x̂ y ď f pyq ´ f px̂q.
λ
This completes the proof. l
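For f “ | ¨ | the proximal mapping has the well-known soft-thresholding closed form, and the variational inequality of Corollary 3.2.5 can be checked numerically (an illustrative Python sketch with an ad hoc test grid, not part of the script):

```python
def prox_abs(x, lam):
    # closed form of the proximal mapping of lam * |.| (soft thresholding)
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

lam, x = 0.5, 1.5
x_hat = prox_abs(x, lam)   # 1.0

# check the variational inequality with f = |.| on a grid of test points y
ys = [k / 100 for k in range(-300, 301)]
ok = all((x - x_hat) * (y - x_hat) / lam <= abs(y) - abs(x_hat) + 1e-12 for y in ys)
print(x_hat, ok)  # 1.0 True
```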

Exercises

Exercise 3.2.6 Prove Lemma 3.2.1. ■


4 Proximal Algorithms
In this section, we consider algorithms for solving the optimization problem

x P arg min f pxq, f P Γ0 pEq. (4.1)


xPE

In the case that f is differentiable, we know from a lecture on numerical methods that we
can employ a gradient descent scheme. Moreover, we have seen in the previous sections that
the proximal mapping generalizes the backward gradient step

xn`1 “ xn ´ λ∇f pxn`1 q.

Therefore, we will consider algorithms for solving (4.1), where the proximal mapping replaces
gradient descent steps. To analyze their convergence, we consider fixed point theorems and
averaged operators. Moreover, we assume that pE, } ¨ }q is a Hilbert space with inner
product x ¨, ¨ y. Indeed, many of the following theorems do not hold true in general Banach
spaces without additional assumptions.

4.1 Fixed Point Algorithms and Averaged Operators


Textbooks: [BC11, Chp. 4, Sec. 5.2, Chp. 23, Chp. 27], [AE06, Chp. 5.2], [Cha09, Chp. 6],
[Byr04] and [Com04].
In this subsection, we analyze the convergence of the fixed point iteration xn`1 “ T pxn q.
If T is contractive (i.e., L-Lipschitz-continuous for some L ă 1), we can apply Banach’s
fixed point theorem. Unfortunately, proximal mappings are in general not contractive. For
example, for f : E Ñ R with f ” 0, we have that proxλf is the identity operator I : E Ñ E,
which is not a contraction. Hence, we resort to a more general convergence result based
on so-called averaged operators. This type of convergence result can be traced back to
Mark Krasnoselskii [Kra55], William Mann [Man53] and Helmut Schaefer [Sch57].
Therefore, the above iteration is also called Krasnoselskii-Mann iteration whenever T
is an averaged operator. The main result of this subsection, Theorem 4.1.10, shows that
such iterates converge weakly to a fixed point of T provided FixpT q ‰ H.

Definition 4.1.1 (Contractive, (firmly) non-expansive)


We call T : E Ñ E contractive if it is Lipschitz continuous with Lipschitz constant
L ă 1, that is,
}T x ´ T y} ď L}x ´ y} @x, y P E.
If L “ 1, then T is non-expansive.
Moreover, for t P p0, 1q, the operator T is called t-averaged if
T “ tI ` p1 ´ tqR for some non-expansive R : E Ñ E, (4.2)
where I is the identity. We say that T is averaged if it is averaged for some t P p0, 1q.


Finally, T is firmly non-expansive if
}T x ´ T y}2 ď }x ´ y}2 ´ }pI ´ T qx ´ pI ´ T qy}2 @x, y P E,

which is equivalent to
}T x ´ T y}2 ď x T x ´ T y, x ´ y y .

By definition, averaged and firmly non-expansive operators are non-expansive. The reverse
is not true. For instance, the identity is firmly non-expansive and t-averaged for all t P p0, 1q,
but is not a contraction.
Convex combinations and concatenations of non-expansive operators are non-expansive.
Moreover, if T is t-averaged for some t P p0, 1q, then it is also s-averaged for any 0 ă s ď t.
Further, the following lemma summarizes some relations between the above definitions.

Lemma 4.1.2
Let E be a Hilbert space. Then:
1 A contractive operator T : E Ñ E with Lipschitz constant L ă 1 is t-averaged for all t P p0, p1 ´ Lq{2s Ă p0, 1{2s.

2 An operator T : E Ñ E is t-averaged if and only if
}T x ´ T y}2 ď }x ´ y}2 ´ pt{p1 ´ tqq}pI ´ T qx ´ pI ´ T qy}2 @x, y P E. (4.3)
In particular, T is firmly non-expansive if and only if it is 1{2-averaged.


3 If Tk : E Ñ E are tk -averaged, respectively, for k P t1, 2u, then T2 ˝ T1 is t1 t2 -averaged.

Proof. 1 Let T : E Ñ E have Lipschitz constant L ă 1. Then R :“ p1{p1 ´ tqqpT ´ tIq fulfills, by the triangle inequality,
}Rx ´ Ry} “ p1{p1 ´ tqq}pT ´ tIqx ´ pT ´ tIqy} ď p1{p1 ´ tqq}T x ´ T y} ` pt{p1 ´ tqq}x ´ y} ď ppL ` tq{p1 ´ tqq}x ´ y}
for all x, y P E, so R is non-expansive if pL ` tq{p1 ´ tq ď 1, that is, if t P p0, p1 ´ Lq{2s.

2 Define R :“ p1{p1 ´ tqqT ´ pt{p1 ´ tqqI such that T “ tI ` p1 ´ tqR. Then, we have to show that R is non-expansive if and only if T fulfills (4.3).
Using the identity }αv ` p1 ´ αqw}2 ` αp1 ´ αq}v ´ w}2 “ α}v}2 ` p1 ´ αq}w}2 , which holds in Hilbert spaces for all α P R, we obtain with α “ 1{p1 ´ tq (i.e., 1 ´ α “ ´t{p1 ´ tq), v “ T x ´ T y and w “ x ´ y that
}Rx ´ Ry}2 “ }p1{p1 ´ tqqpT x ´ T yq ´ pt{p1 ´ tqqpx ´ yq}2
“ p1{p1 ´ tqq}T x ´ T y}2 ´ pt{p1 ´ tqq}x ´ y}2 ` pt{p1 ´ tq2 q}T x ´ T y ´ px ´ yq}2
“ p1{p1 ´ tqq}T x ´ T y}2 ´ pt{p1 ´ tqq}x ´ y}2 ` pt{p1 ´ tq2 q}pI ´ T qx ´ pI ´ T qy}2 .


Thus, multiplying by 1 ´ t and subtracting p1 ´ tq}x ´ y}2 yields
p1 ´ tqp}Rx ´ Ry}2 ´ }x ´ y}2 q “ }T x ´ T y}2 ´ }x ´ y}2 ` pt{p1 ´ tqq}pI ´ T qx ´ pI ´ T qy}2 .
We conclude that R is non-expansive if and only if the left side of the above equation
is ď 0 for all x, y P E if and only if the right side of the above equation is ď 0 for all
x, y P E if and only if T fulfills (4.3).
Inserting t “ 1{2 gives that T is firmly non-expansive if and only if it is 1{2-averaged.
3 By assumption there exist non-expansive operators Rk : E Ñ E and tk P p0, 1q such that Tk “ tk I ` p1 ´ tk qRk for k P t1, 2u. Then, for t :“ t1 t2 we have for x P E that
pT2 ˝ T1 qpxq “ t2 T1 pxq ` p1 ´ t2 qpR2 ˝ T1 qpxq
“ t2 pt1 x ` p1 ´ t1 qR1 pxqq ` p1 ´ t2 qpR2 ˝ T1 qpxq
“ tx ` pt2 ´ tqR1 pxq ` p1 ´ t2 qpR2 ˝ T1 qpxq
“ tx ` p1 ´ tqpppt2 ´ tq{p1 ´ tqqR1 pxq ` pp1 ´ t2 q{p1 ´ tqqpR2 ˝ T1 qpxqq “: tx ` p1 ´ tqRpxq.
Here, R is non-expansive as a convex combination of non-expansive operators. l

Important examples of firmly non-expansive operators are proximal mappings.

Lemma 4.1.3 (prox is firmly non-expansive)


Let E be a Hilbert space, f P Γ0 pEq and λ ą 0. Then, the proximal operator proxλf is
firmly non-expansive.

Proof. Let x, y P E. By Theorem 2.7.4, we have that
p1{λq x x ´ proxλf pxq, z ´ proxλf pxq y ` f pproxλf pxqq ´ f pzq ď 0 @z P E.
With z :“ proxλf pyq this gives
x x ´ proxλf pxq, proxλf pyq ´ proxλf pxq y ` λf pproxλf pxqq ´ λf pproxλf pyqq ď 0
and by exchanging x and y,
x y ´ proxλf pyq, proxλf pxq ´ proxλf pyq y ` λf pproxλf pyqq ´ λf pproxλf pxqq ď 0.
Adding the two previous inequalities we obtain
x x ´ proxλf pxq ` proxλf pyq ´ y, proxλf pyq ´ proxλf pxq y ď 0,
and thus that proxλf is firmly non-expansive, namely
} proxλf pyq ´ proxλf pxq}2 ď x y ´ x, proxλf pyq ´ proxλf pxq y . l
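For f “ | ¨ | on R, whose proximal mapping is soft thresholding, the firm non-expansiveness can be verified at random sample pairs (an illustrative Python sketch, not part of the script; the sampling is heuristic):

```python
import random

def prox_abs(x, lam=1.0):
    # proximal mapping of lam * |.| in R (soft thresholding)
    return max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)

rng = random.Random(1)
firm = all(
    (prox_abs(x) - prox_abs(y)) ** 2
    <= (prox_abs(x) - prox_abs(y)) * (x - y) + 1e-12
    for x, y in ((rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(1000))
)
print(firm)  # True: |Tx - Ty|^2 <= <Tx - Ty, x - y> at all sampled pairs
```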

Next, we aim to examine the convergence behaviour of the fixed point iteration

xn`1 “ T pxn q

provided that T is an averaged operator. To this end, we need some more definitions.


Definition 4.1.4 (Fixed point, FixpT q)
Let T : E Ñ E be an operator. We denote the set of fixed points of T by
FixpT q :“ tx P E : T x “ xu.
Moreover, we call T asymptotically regular if T k`1 x ´ T k x Ñ 0 as k Ñ 8 for all x P E,
where T k denotes the k-fold composition T ˝ ¨ ¨ ¨ ˝ T .

Example 4.1.5 In general, the fixed point iteration xn`1 “ T pxn q does not necessarily
converge for non-expansive operators T even when FixpT q ‰ H. Simple examples are
T “ ´I or T : R2 Ñ R2 defined by T px1 , x2 q “ px1 , ´x2 q. ˛
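A quick numerical illustration of the first example (Python, not part of the script):

```python
# T = -I on R has Fix(T) = {0}, yet the Picard iteration oscillates:
x, orbit = 1.0, []
for _ in range(6):
    orbit.append(x)
    x = -x                      # non-expansive (an isometry), but not averaged
print(orbit)                    # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

# In contrast, the (1/2)-averaged operator (I + T)/2 = 0 hits the fixed point:
print((orbit[0] + (-orbit[0])) / 2)   # 0.0
```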

In contrast to the operators from Example 4.1.5, averaged ones are asymptotically regular.

Theorem 4.1.6: Asymptotic regularity of averaged operators

Let E be a Hilbert space. If T : E Ñ E is averaged with FixpT q ‰ H, then T is


asymptotically regular.

Proof. Suppose T is t-averaged with t P p0, 1q and let y P FixpT q. By Lemma 4.1.2
and T y “ y we have for all z P E that
}T z ´ y}2 “ }T z ´ T y}2 ď }z ´ y}2 ´ pt{p1 ´ tqq}pI ´ T qz ´ pI ´ T qy}2 “ }z ´ y}2 ´ pt{p1 ´ tqq}T z ´ z}2 ,
which can be rearranged to
}T z ´ z}2 ď pp1 ´ tq{tqp}z ´ y}2 ´ }T z ´ y}2 q. (4.4)

Now, let x P E be arbitrary. Since T is averaged, it is non-expansive, such that for any
y P FixpT q it holds
}T k`1 x ´ y} “ }T k`1 x ´ T y} ď }T k x ´ y}.

Therefore, the sequence p}T n x´y}qnPN is non-negative and decreasing such that it converges
to some d ě 0.
Plugging z “ T k x into (4.4) yields
}T k`1 x ´ T k x}2 “ }T pT k xq ´ T k x}2 ď pp1 ´ tq{tqp}T k x ´ y}2 ´ }T k`1 x ´ y}2 q ÝÑ pp1 ´ tq{tqpd ´ dq “ 0 as k Ñ 8. l

For the next theorems, we need a technical lemma on weak convergence. Recall that x P E
is a weak sequential cluster point of a sequence pxk qk in E if there exists a subsequence
pxkj qj such that xkj á x as j Ñ 8.
(Lecture 13, 17.05.2024)


Lemma 4.1.7 (Subsequence-subsequence criterion for weak convergence)


Let E be a reflexive Banach space and let pxk qk be a sequence in E. Then pxk qk converges
weakly to some x P E if and only if pxk qk is bounded and has at most one weak sequential
cluster point.

Proof. “ ùñ ”: Assume that pxk qk converges weakly to some x P E. Then, the sequence
x ΛE pxk q, p y is bounded for all p P E ˚ , i.e., there exists Mp ă 8 such that x ΛE pxk q, p y ă Mp
for all k P N. Thus, the uniform boundedness principle (also known as Banach–Steinhaus
theorem) yields that there exists some M ă 8 such that }xk } “ }ΛE pxk q}˚˚ ă M . Hence,
pxk qk is bounded. Since weak limits are unique, x is the only weak sequential cluster point,
and any subsequence of pxk qk converges weakly to x.
“ ðù ”: Let pxk qk be bounded and suppose that it has at most one weak sequential cluster-
point. Since E is a reflexive Banach space, there exists a subsequence converging to some
x P E. In particular, x is the unique weak sequential cluster point.
Now assume that pxk qk does not weakly converge to x. By definition, this implies that
there exists some p P E ˚ such that x p, xk y does not converge to x p, x y. Thus, there
exists a subsequence pxkj qj and some ε ą 0 such that | x p, xkj y ´ x p, x y | ą ε. Since
x p, xk y ď }p}˚ }xk } is bounded, there exists a subsequence of pxkj qj (which we again denote
by pxkj qj q and some a P R such that x p, xkj y Ñ a ‰ x p, x y. Because pxkj qj is bounded and E is a
reflexive Banach space, it admits a weakly convergent subsequence pxkjl ql . Using that x is
the unique weak sequential cluster point of pxk qk , we obtain that xkjl á x. In particular,
we have that
x p, x y “ lim x p, xkjl y “ lim x p, xkj y “ a ‰ x p, x y .
lÑ8 jÑ8

This is a contradiction. l

The next theorem ensures that weak limits of the fixed point iteration xn`1 “ T pxn q for
non-expansive, asymptotically regular operators T are fixed points of T .

Theorem 4.1.8: Demiclosedness principle

Let E be a Hilbert space and T : E Ñ E be non-expansive. If there exists pxk qkPN Ă


E with xk á x P E and xk ´ T xk Ñ 0, then x P FixpT q.

Proof. Using twice that }v ˘ w}2 “ }v}2 ˘ 2 x v, w y `}w}2 for all v, w P E yields for every
k P N that
}T x ´ x}2 “ }T x ´ xk }2 ´ }xk ´ x}2 ´ 2 x xk ´ x, x ´ T x y
“ }pT xk ´ xk q ` pT x ´ T xk q}2 ´ }xk ´ x}2 ´ 2 x xk ´ x, x ´ T x y
“ }T xk ´ xk }2 ` 2 x T xk ´ xk , T x ´ T xk y
` }T xk ´ T x}2 ´ }xk ´ x}2 ´ 2 x xk ´ x, x ´ T x y
Since T is non-expansive, this is smaller than or equal to
}T xk ´ xk }2 ` 2 x T xk ´ xk , T x ´ T xk y ´ 2 x xk ´ x, x ´ T x y
ď }T xk ´ xk }2 ` 2}T xk ´ xk }}T x ´ T xk } ´ 2 x xk ´ x, x ´ T x y .


Now, the first two summands converge to zero by the assumption that T xk ´ xk Ñ 0 and
by the fact that xk converges weakly and is therefore bounded by Lemma 4.1.7. The last
summand converges to zero since pxk qk converges weakly to x. Thus, considering the limit
k Ñ 8, we obtain that }T x ´ x} “ 0. l

The following theorem was first proved by Zdzisław Opial in 1967 [Opi67, Thm. 1].

Theorem 4.1.9: Opial’s convergence theorem


Let E be a Hilbert space and T : E Ñ E be non-expansive and asymptotically
regular with FixpT q ‰ H. Then, for any x0 P E, the sequence pxk qkPN defined by
xk`1 “ T xk converges weakly to some x P FixpT q.

Proof. Since T is non-expansive, we obtain for any y P FixpT q that }xk`1 ´ y} “ }T xk ´


T y} ď }xk ´ y}. By induction this yields that

}xk } ď }xk ´ y} ` }y} ď }x0 ´ y} ` }y},

i.e., the sequence pxk qk is bounded and the sequence ak pyq :“ }xk ´ y}2 is decreasing.
Since any Hilbert space is reflexive, we obtain by Lemma 2.5.3 that pxk qk admits a sub-
sequence pxkj qj which converges weakly to some x̂ P E. Using that T is non-expansive and
asymptotically regular, it holds by Theorem 4.1.8 that x̂ P FixpT q. In the following, we show
that the whole sequence pxk qk converges weakly to x̂. Due to Lemma 4.1.7 it suffices to show
that x̂ is the only weak sequential cluster point of pxk qk . Assume that ŷ P E is another
weak sequential cluster point and let pxkl ql be a subsequence of pxk qk converging weakly to
ŷ. Analogously to x̂, we have that ŷ P FixpT q. Since ak px̂q and ak pŷq are decreasing and
positive, we obtain that they converge to some apx̂q and apŷq. Then, it holds
2 x x̂, x̂ ´ ŷ y “ lim jÑ8 2 x xkj , x̂ ´ ŷ y “ lim jÑ8 p}xkj ´ ŷ}2 ´ }ŷ}2 ` }x̂}2 ´ }xkj ´ x̂}2 q
“ lim jÑ8 pakj pŷq ´ }ŷ}2 ` }x̂}2 ´ akj px̂qq “ apŷq ´ }ŷ}2 ` }x̂}2 ´ apx̂q,
where we used }xkj ´ ŷ}2 ´ }ŷ}2 “ }xkj }2 ´ 2 x xkj , ŷ y and }x̂}2 ´ }xkj ´ x̂}2 “ 2 x xkj , x̂ y ´ }xkj }2 .

Analogously, we obtain that

2 x ŷ, x̂ ´ ŷ y “ apŷq ´ }ŷ}2 ` }x̂}2 ´ apx̂q.

Subtracting these two equations, we obtain that

}x̂ ´ ŷ}2 “ x x̂ ´ ŷ, x̂ ´ ŷ y “ 0

such that ŷ “ x̂. Consequently, x̂ is the only weak sequential cluster point of the bounded
sequence pxk qk such that xk á x̂ as k Ñ 8 by Lemma 4.1.7. l

Combining Theorem 4.1.9 and Theorem 4.1.6 yields the following main result.


Theorem 4.1.10: Convergence of averaged operator iterations

Let E be a Hilbert space and T : E Ñ E be averaged with FixpT q ‰ H. Then, for


every x0 P E, the sequence pT k x0 qkPN converges weakly to a fixed point of T .

If dimpEq ă 8, weak and strong convergence coincide. Then, Theorem 4.1.10 yields strong
convergence of the sequence pT k x0 qkPN for any initial point x0 P E. Putting everything
together, we obtain the following diagram, which summarizes the results of this section.

Fig. 4.1: Different conditions ensuring the weak convergence of Picard iterates to a fixed
point x̂ P FixpT q of an operator T : E Ñ E and the relationships between them: a contraction with L ă 1 is t-averaged for t ď p1 ´ Lq{2, a firmly non-expansive operator is 1{2-averaged, and an averaged operator with FixpT q ‰ H is non-expansive and asymptotically regular, which together yield pT k x0 qk á x̂.
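The convergence statement of Theorem 4.1.10 can be observed numerically (an illustrative Python sketch, not part of the script; the rotation around the point c is an arbitrary non-expansive operator with a unique fixed point):

```python
import math

c = (1.0, 1.0)   # the unique fixed point of the non-expansive rotation R below

def R(v):        # rotation by 90 degrees around c (an isometry)
    d = (v[0] - c[0], v[1] - c[1])
    return (c[0] - d[1], c[1] + d[0])

def T(v):        # (1/2)-averaged operator T = (I + R) / 2
    w = R(v)
    return ((v[0] + w[0]) / 2, (v[1] + w[1]) / 2)

v = (5.0, -3.0)
for _ in range(100):           # Picard iteration of the averaged operator
    v = T(v)
dist = math.hypot(v[0] - c[0], v[1] - c[1])
print(dist < 1e-8)             # True: T^k x0 converges to the fixed point
```

Iterating R alone would orbit around c forever; the averaging is what forces convergence.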

Exercises

Exercise 4.1.11 If T is averaged with respect to a contractive operator R, then T is also


contractive. ■

Exercise 4.1.12 Let E be a Hilbert space. If T : E Ñ E is continuous and the “orbit


sequence” pT k x0 qkPN converges, then it converges to a fixed point of T . ■

Exercise 4.1.13 (| ¨ | not averaged) The absolute value is not an averaged operator. ■

Exercise 4.1.14
Let E be a real normed space and let T : E Ñ E.
• Assume that pT k xqkPN converges for all x P E. Prove that T is asymptotically regular.
• Find an example of an asymptotically regular operator T and some x P E such that
the sequence pT k xqkPN is unbounded (and diverges consequently). ■


4.2 First-order Algorithms for Convex Functions


Now, we consider algorithms for solving the minimization problem (4.1). Throughout this
section, we assume that E is a Hilbert space and identify the dual space E ˚ with E by
the Riesz isomorphism.

Gradient descent
First, we recall the basic gradient descent algorithms from the lecture on numerical methods.

Algorithm 1: Gradient Descent.


Data: f P Γ0 pEq Gateaux differentiable with arg minpf q ‰ H, x0 P E, λ ą 0.
for k “ 0, 1, . . . do
xk`1 “ xk ´ λ∇f pxk q
end

For proving convergence of Algorithm 1, we need the following theorem. It is known under
the name Baillon-Haddad theorem and can be shown in even more generality by using
duality theory. In particular, existence of the second derivative can be dropped.
(Lecture 14, 22.05.2024)
Theorem 4.2.1: Baillon-Haddad

Let E be a Hilbert space and suppose that f P Γ0 pEq is twice Gateaux differen-
tiable such that the Gateaux gradient is L-Lipschitz continuous and the Gateaux
Hessian is continuous. Then, the operator L1 ∇f : E Ñ E is firmly non-expansive.

The proof follows quite simple ideas. However, taking care of the infinite-dimensional setting
requires some technical effort. Therefore, we first state a finite-dimensional sketch of the
proof and present afterwards a general proof which includes all details.

Proof. Without loss of generality assume that L “ 1 (otherwise consider p1{Lqf instead of f ).
Let R :“ 2∇f ´ I such that ∇f “ p1{2qI ` p1{2qR. Now it suffices by Lemma 4.1.2 to show that
R is 1-Lipschitz continuous.
Finite-dimensional sketch: Let x0 P E. Since f is convex, all eigenvalues of the Hessian
∇2 f rx0 s are nonnegative. Further, ∇f is 1-Lipschitz such that }∇2 f rx0 s} ď 1. In partic-
ular, all eigenvalues of ∇2 f rx0 s are contained in r0, 1s. Therefore, using that R “ 2∇f ´ I,
all eigenvalues of the Jacobian JRrx0 s of R are contained in r´1, 1s. This implies that
}JRrx0 s} ď 1 such that R is 1-Lipschitz-continuous.
General proof with all details: Using the convexity of f together with Theorem 2.3.11
and the fact that ∇f is 1-Lipschitz continuous, we obtain for all x0 , x P E that
0 ď x ∇2 f rx0 spxq, x y “ lim tŒ0 p1{tq x ∇f px0 ` txq ´ ∇f px0 q, x y ď lim tŒ0 p1{tq}∇f px0 ` txq ´ ∇f px0 q}}x} ď lim tŒ0 p1{tq}x0 ` tx ´ x0 }}x} ď }x}2 .

Hence, using DIrx0 spxq “ x, we obtain that DRrx0 s “ Dp2∇f ´ Iqrx0 s fulfills for all x P E


that
´}x}2 ď x 2∇2 f rx0 spxq ´ x, x y “ x DRrx0 spxq, x y ď }x}2 .

Since ∇2 f rx0 s is self-adjoint, we have that also DRrx0 s “ 2∇2 f rx0 s ´ I is self-adjoint (as the sum
of two self-adjoint operators). In particular, it was proven in the lecture “Functional Analysis”
(SoSe 2022, Lem. 7.41) that
} DRrx0 s} “ sup t| x DRrx0 spxq, x y | : x P E, }x} ď 1u ď 1.

To show for fixed x0 , y0 P E that }Rpx0 q ´ Rpy0 q} ď }x0 ´ y0 }, we consider the function
φptq :“ x Rpx0 q ´ Rpp1 ´ tqx0 ` ty0 q, Rpx0 q ´ Rpy0 q y .

By definition, we obtain that φp0q “ x Rpx0 q ´ Rpx0 q, Rpx0 q ´ Rpy0 q y “ 0 and φp1q “
x Rpx0 q ´ Rpy0 q, Rpx0 q ´ Rpy0 q y “ }Rpx0 q ´ Rpy0 q}2 . Moreover, φ is differentiable with
φ1 ptq “ lim hÑ0 x pRpp1 ´ tqx0 ` ty0 ` hpy0 ´ x0 qq ´ Rpp1 ´ tqx0 ` ty0 qq{h, Rpx0 q ´ Rpy0 q y
“ x DRrp1 ´ tqx0 ` ty0 spy0 ´ x0 q, Rpx0 q ´ Rpy0 q y .
In particular, we have by the mean value theorem and Cauchy-Schwarz that there exists
some 0 ă t ă 1 such that
}Rpx0 q ´ Rpy0 q}2 “ φp1q “ φp1q ´ φp0q “ φ1 ptq
“ x DRrp1 ´ tqx0 ` ty0 spy0 ´ x0 q, Rpx0 q ´ Rpy0 q y
ď } DRrp1 ´ tqx0 ` ty0 spy0 ´ x0 q}}Rpx0 q ´ Rpy0 q}
ď } DRrp1 ´ tqx0 ` ty0 s}}y0 ´ x0 }}Rpx0 q ´ Rpy0 q}
ď }y0 ´ x0 }}Rpx0 q ´ Rpy0 q}.
Now, dividing by }Rpx0 q ´ Rpy0 q} yields the claim (if Rpx0 q “ Rpy0 q, the claim is trivial). l

The Baillon-Haddad theorem directly implies the convergence of Algorithm 1.


Corollary 4.2.2
Let E be a Hilbert space and f P Γ0 pEq be twice Gateaux differentiable such that the
Gateaux gradient is L-Lipschitz continuous and the Gateaux Hessian is continuous.
Moreover, suppose that arg minpf q ‰ H. Then, for any x0 P E and any step size 0 ă λ ă 2{L,
the sequence pxk qk generated by Algorithm 1 converges weakly to a minimizer of f .

Proof. Let T : E Ñ E be defined by T pxq “ x ´ λ∇f pxq. By Theorem 4.2.1, there exists
some non-expansive operator R : E Ñ E such that ∇f pxq “ pL{2qx ` pL{2qRpxq. In particular,
T pxq “ p1 ´ λL{2qx ´ pλL{2qRpxq is an averaged operator, since ´R is non-expansive as well
and λL{2 P p0, 1q. Noting that FixpT q “ tx P E : ∇f pxq “ 0u Ą arg minpf q ‰ H, Theorem 4.1.10
yields that pxk qk “ pT k x0 qk converges weakly to some x̂ P FixpT q. This can be reformulated as
T px̂q “ x̂, i.e., ∇f px̂q “ 0. By Remark 2.5.6, this implies that x̂ P arg minpf q. l
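As a minimal numerical illustration of Algorithm 1 and the step-size condition (Python, not part of the script; the one-dimensional quadratic below is an ad hoc example):

```python
# Gradient descent for f(x) = (a*x - b)^2 / 2 on R; here grad f(x) = a*(a*x - b)
# is L-Lipschitz with L = a^2, and the minimizer is b / a.
a, b = 2.0, 3.0
L = a * a
lam = 1.9 / L          # any step size in (0, 2/L) yields convergence

x = 0.0
for _ in range(200):
    x = x - lam * a * (a * x - b)
print(abs(x - b / a) < 1e-8)   # True
```

With this choice the error contracts by the factor |1 - lam * L| = 0.9 per step; a step size above 2/L would make the iteration diverge.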

Remark 4.2.3 (Gradient Flows)


Algorithm 1 can be interpreted as explicit Euler scheme for the ordinary differential equa-
tion
x9 ptq “ ´∇f pxptqq.
Differential equations of this form are called gradient flows and of interest in many parts of
mathematics. Explicit Euler discretizations of ODEs are also called forward schemes. ˝


Proximal Point Algorithm


Textbooks: [CP11c], [BC11].
In Section 2.7, we have seen that the proximal mapping generalizes an implicit gradient
descent step. This motivates the following algorithm for minimizing a (not necessarily
differentiable) function f P Γ0 pEq.

Algorithm 2: Proximal Point Algorithm (PPA).


Data: f P Γ0 pEq with arg minpf q ‰ H, x0 P E, λ ą 0.
for k “ 0, 1, . . . do
xk`1 “ proxλf pxk q
end

Corollary 4.2.4 (Convergence of PPA)


Let E be a Hilbert space, f P Γ0 pEq with arg minpf q ‰ H and λ ą 0. Then, for any
x0 P E the sequence pxk qk generated by Algorithm 2 converges weakly to a minimizer of f .

Proof. By definition, we have that xk “ T k x0 for T “ proxλf : E Ñ E. Due to Lemma 4.1.3,


T is firmly non-expansive and therefore 1{2-averaged. Hence, Theorem 4.1.10 yields that pxk qk
converges weakly to some x̂ P FixpT q. By Theorem 2.7.4, x̂ is a minimizer of f . l
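A minimal run of Algorithm 2 for f “ | ¨ | on R, whose proximal mapping is soft thresholding (an illustrative Python sketch, not part of the script):

```python
def prox_abs(x, lam):
    # proximal mapping of lam * |.| (soft thresholding)
    return max(abs(x) - lam, 0.0) * (1.0 if x >= 0 else -1.0)

x, lam = 3.2, 0.5
for _ in range(10):        # proximal point iterations x_{k+1} = prox_{lam |.|}(x_k)
    x = prox_abs(x, lam)
print(x)  # 0.0: the iterates shrink by lam per step and reach the minimizer of |.|
```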

Note that, in general, pxk qk does not converge strongly.


Remark 4.2.5 (Gradient Flows) If f is Gateaux differentiable, we have by the defini-
tion of the proximal mapping that the sequence pxk qk fulfills

xk`1 “ xk ´ λ∇f pxk`1 q.

Consequently, Algorithm 2 can be interpreted as an implicit Euler scheme of the ODE

x9 ptq “ ´∇f pxptqq.

Such implicit discretizations of ODEs are also called backward schemes. ˝


Since the proximal mapping is defined as a minimization problem, it does not necessarily
admit an analytic form. Still, the objective function f can be often decomposed into a sum
of functionals fj such that proxλfj is known. Since the computation of proxλpf1 `f2 q might
not be possible directly, we extend the proximal point algorithm as follows.

Algorithm 3: Cyclic Proximal Point Algorithm (CPPA).


Data: pfj qnj“1 Ă Γ0 pEq with arg minpf1 ` ¨ ¨ ¨ ` fn q ‰ H, x0 P E, pλk qkPN Ă p0, 8q.
for k “ 0, 1, . . . do
xk`1 “ proxλk fn ˝ . . . ˝ proxλk f1 pxk q
end

Convergence of this algorithm is guaranteed by the following lemma. For the proof, we refer
to [Bačá14, Thm. 3.4, Rem. 3.8].


Lemma 4.2.6 (Convergence of CPPA)


Let E be a Hilbert space, fj P Γ0 pEq be Lipschitz continuous for j “ 1, . . . , n, f “ f1 ` ¨ ¨ ¨ ` fn
and pλk qkPN P ℓ2 pNqzℓ1 pNq. Then, the sequence pxk qk generated by Algorithm 3 converges
weakly to a minimizer of f .

Suitable step sizes λk would be, e.g., λk :“ λ0 {k for some λ0 ą 0. An important example of
the cyclic proximal point algorithm is the computation of a point within the intersection of
convex sets.


Fig. 4.2: The cyclic proximal point algorithm finds a point in the intersection of two closed
convex sets A and B in R2 by minimizing ιA ` ιB .

Example 4.2.7 (Finding intersection of closed convex sets)


Consider non-empty, closed, convex sets pCj qnj“1 Ă E with C1 X ¨ ¨ ¨ X Cn ‰ H. We aim to find
some x̂ P C1 X ¨ ¨ ¨ X Cn , that is, x̂ P arg minxPE pιC1 pxq ` ¨ ¨ ¨ ` ιCn pxqq.
In this case, the CPPA iteration becomes the much older alternating projection algorithm [Bre65, p. 488]
xk`1 “ pPCn ˝ . . . ˝ PC1 qpxk q,
where PA is the orthogonal projection onto the set A Ă E. Algorithm 3 converges for n “ 2
to a fixed point of PC2 ˝ PC1 if C1 is compact [CG59, Thm. 4]. The choice of the step sizes
pλk qkPN does not matter in this case since λιCj “ ιCj for any λ ą 0. In general, the limit of
this algorithm might be different from PC1 X¨¨¨XCn px0 q, see Fig. 4.3. ˛
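The alternating projection scheme is easy to run in the plane (an illustrative Python sketch, not part of the script; the disk and half-plane below are arbitrary closed convex sets with non-empty intersection):

```python
import math

def proj_disk(v):        # projection onto the closed unit disk C1
    n = math.hypot(v[0], v[1])
    return v if n <= 1.0 else (v[0] / n, v[1] / n)

def proj_halfplane(v):   # projection onto C2 = {x in R^2 : x1 >= 0.5}
    return (max(v[0], 0.5), v[1])

v = (-2.0, 2.0)
for _ in range(100):     # alternating projections P_{C2} o P_{C1}
    v = proj_halfplane(proj_disk(v))
in_both = math.hypot(v[0], v[1]) <= 1.0 + 1e-9 and v[0] >= 0.5
print(in_both)           # True: the limit lies in C1 ∩ C2
```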

Fig. 4.3: The alternating projection algorithm applied to a square and a circle.

Forward-Backward Splitting

To motivate the next algorithm, we consider a small example.
To motivate the next algorithm, we consider a small example. square and a circle.

Example 4.2.8 Let f : Rd Ñ R be defined by


f pxq “ p1{2q}Φpxq ´ y}2 ` η}x}1 ,


where Φ : Rd Ñ Rn is differentiable and y P Rn is fixed. Since hpxq “ η}x}1 is not differentiable,
we cannot apply gradient descent. Moreover, the computation of the proximal mapping of the
function gpxq “ p1{2q}Φpxq ´ y}2 might be intractable. On the other hand, the computation of
proxh and the evaluation of ∇g are straightforward (provided the derivative of Φ is known). ˛

To make use of this observation, we split f into g ` h and apply a forward step (i.e., a
gradient descent step) on g and a backward step (i.e., a proximal mapping) on h. The
arising algorithm is called “forward-backward splitting” or “proximal gradient algorithm”.

Algorithm 4: Forward-Backward Splitting Algorithm (FBS)

Data: f “ g ` h, where g P Γ0 pEq is Gateaux differentiable such that ∇g is
L-Lipschitz continuous, h P Γ0 pEq, x0 P E, λ P p0, 2{Lq.
for k “ 0, 1, . . . do
xk`1 “ proxλh pxk ´ λ∇gpxk qq
end

If h “ ιC , where C Ă E is non-empty, closed and convex, the minimization problem becomes
arg minxPC gpxq and Algorithm 4 becomes the projected gradient descent algorithm.
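A minimal sketch of Algorithm 4 for E “ Rd (added for illustration, not part of the original notes): with h “ ιC for a box C, the proximal step reduces to clipping, so the loop below is exactly projected gradient descent.

```python
import numpy as np

def fbs(grad_g, prox_h, x0, step, n_iter=1000):
    # Forward-backward splitting: forward (gradient) step on g,
    # backward (proximal) step on h; step should lie in (0, 2/L).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = prox_h(x - step * grad_g(x), step)
    return x
```

For gpxq “ 1{2 }x ´ c}2 and C “ r0, 1sd , the iterates converge to the clipped point of c, which here is also the projection of c onto C.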

Theorem 4.2.9: Convergence of FBS

Let E be a Hilbert space and let f P Γ0 pEq be given by f “ g ` h for some


g, h P Γ0 pEq such that arg minpf q ‰ H. Further, assume that g is twice Gateaux
differentiable such that the Gateaux gradient is L-Lipschitz continuous and the
Gateaux Hessian is continuous. Then, for any x0 P E and λ P p0, 2{Lq, the sequence
pxk qkPN generated by Algorithm 4 converges weakly to a minimizer of f .

Proof. Note that xk “ T k x0 , where T “ T2 ˝ T1 with T2 “ proxλh and T1 pxq “ x ´ λ∇gpxq.
Then, we have that x P FixpT q if and only if

x “ proxλh px ´ λ∇gpxqq ô x P arg minyPE t 1{2 }x ´ λ∇gpxq ´ y}2 ` λhpyqu.

By Fermat’s rule this is equivalent to

0 P x ´ px ´ λ∇gpxqq ` λBhpxq “ λp∇gpxq ` Bhpxqq ô 0 P ∇gpxq ` Bhpxq.

Using again Fermat’s rule, this is equivalent to x P arg minpg ` hq “ arg minpf q. Summarizing, this yields that FixpT q “ arg minpf q ‰ H.
Moreover, T1 is averaged by Theorem 4.2.1 and T2 is averaged since it is a proximal mapping. Consequently, T is a composition of averaged operators and hence averaged
by Lemma 4.1.2. Thus, the sequence pxk qk converges weakly to some x̂ P FixpT q “
arg minpf q. l
Remark 4.2.10 Under the conditions we posed on g and h, a sublinear convergence rate of
Algorithm 4 can be achieved: for a minimizer x̂ P arg minxPE f pxq of f we have

f pxk q ´ f px̂q “ Op1{kq,


see [BT09b, CR97] (the latter being only in Rn ). ˝


We continue Example 4.2.8.
Example 4.2.11 (Iterative soft-thresholding algorithm (ISTA))
We consider the functional from Example 4.2.8 with Φpxq “ Kx for some linear operator
K P Rnˆd , i.e.,

arg minxPRd t 1{2 }Kx ´ y}22 ` η}x}1 u,

and write gpxq :“ 1{2 }Kx ´ y}22 and h :“ η} ¨ }1 .
Then, the gradient of g is given by ∇gpxq “ KT pKx ´ yq and the proximal
mapping with respect to h is given by the soft-shrinkage operator proxh “ Sη due to Example 2.7.3. For some λ P p0, 2{Lq with L :“ }KT K}2 , the update step in Algorithm 4 thus
becomes

xk`1 “ Sηλ pxk ´ λKT pKxk ´ yqq.

This algorithm is known as iterative soft-thresholding algorithm (ISTA). It is in
particular important for dictionary-based image reconstruction techniques, which will be
considered in the exercise classes. A generalization of ISTA to Hilbert spaces can be found,
e.g., in [DDD04]. ˛
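A sketch of ISTA in NumPy (added for illustration; the step size rule below is one admissible choice in p0, 2{Lq, not the only one):

```python
import numpy as np

def soft_threshold(x, tau):
    # Soft-shrinkage S_tau, the proximal map of tau * ||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(K, y, eta, n_iter=2000):
    # ISTA update: x_{k+1} = S_{eta*lam}(x_k - lam * K^T (K x_k - y)).
    lam = 1.0 / np.linalg.norm(K.T @ K, 2)  # a step size in (0, 2/L)
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - lam * K.T @ (K @ x - y), eta * lam)
    return x
```

For K “ I the iteration reaches its fixed point Sη pyq after one step, which already exhibits the sparsity induced by the 1-norm.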

The FBS algorithm can be extended to non-convex functions, see e.g., [AB09, ABS13,
BST13, CPR13, OCBP14]. The convergence analysis mainly relies on the assumption that
the objective function f “ g ` h satisfies the Kurdyka-Łojasiewicz inequality, which
is indeed fulfilled for a wide class of functions, such as log-exp, semi-algebraic and subanalytic
functions, which are of interest in image processing.

Accelerated Algorithms
For large scale problems such as those arising in image processing, a major concern is to find Lecture 15,
efficient algorithms with a reasonable runtime. While each step in Algorithm 4 has low 24.05.2024
computational complexity, the algorithm may converge slowly. Using a simple extrapolation
idea with appropriate relaxation parameters τk P p0, 1q, the convergence can often be
accelerated by the extrapolation step

yk “ xk ` τk pxk ´ xk´1 q ,

the new update step being

xk`1 “ proxλh pyk ´ λ∇gpyk qq “: T pyk q. (4.5)

For k P N we choose

τk :“ pk ´ 1q{pk ` 2q “ θk p1 ´ θk´1 q{θk´1 , where θk :“ 2{pk ` 2q.


Then, the definition of yk can be rewritten as

yk “ xk ` τk pxk ´ xk´1 q “ p1 ` τk qxk ´ τk xk´1
“ p1 ´ pθk ´ θk {θk´1 qqxk ` pθk ´ θk {θk´1 qxk´1
“ p1 ´ θk qxk ` θk pxk´1 ` p1{θk´1 qpxk ´ xk´1 qq
“ p1 ´ θk qxk ` θk zk , where zk “ xk´1 ` p1{θk´1 qpxk ´ xk´1 q.
This leads to the following algorithm.

Algorithm 5: Fast FBS

Data: x0 “ z0 P E, θk “ 2{pk ` 2q, λ P p0, 1{Lq.
for k “ 0, 1, . . . do
yk “ p1 ´ θk qxk ` θk zk ;
xk`1 “ proxλh pyk ´ λ∇gpyk qq;
zk`1 “ xk ` p1{θk qpxk`1 ´ xk q;
end
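Algorithm 5 can be sketched as follows (illustrative NumPy code, assuming E “ Rd ; added for these notes):

```python
import numpy as np

def fast_fbs(grad_g, prox_h, x0, step, n_iter=500):
    # Accelerated FBS with theta_k = 2/(k+2); step should lie in (0, 1/L).
    x = np.asarray(x0, dtype=float)
    z = x.copy()
    for k in range(n_iter):
        theta = 2.0 / (k + 2)
        y = (1 - theta) * x + theta * z       # extrapolation point
        x_new = prox_h(y - step * grad_g(y), step)
        z = x + (x_new - x) / theta
        x = x_new
    return x
```

With h “ 0 (so that the proximal map is the identity) and a smooth quadratic g, this reduces to a Nesterov-type accelerated gradient method.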
Remark 4.2.12 (Generalizations of fast FBS) There exist many variants and generalizations of the above algorithm, such as
• Nesterov’s algorithms [Nes04, Nes83], see also [DHJJ10, Tse08]; this includes
approximation algorithms for non-smooth g [BBC11, Nes05] such as NESTA,
• fast iterative shrinkage algorithms (FISTA) by Beck and Teboulle [BT09a],
• variable metric strategies [BGLS95, BQ99, CV12, PLS08], where the step (4.5) is
replaced by

yk`1 “ proxQr ,ηr h pyk ´ ηr Q´1
r ∇gpyk qq (4.6)

with symmetric, positive definite matrices Qr .
Line search strategies can be incorporated [GS11, Gül92, Nes13]. Finally, we mention
Barzilai-Borwein step size rules [BB88] based on a quasi-Newton approach and relatives,
see [FZZ08] for an overview, and the cyclic proximal gradient algorithm related to the
cyclic Richardson algorithm [SSM13]. ˝
By the following standard theorem, the extrapolation modification of the FBS algorithm
ensures an Op1{k 2 q convergence rate, which is “optimal” for first order methods on smooth
problems in the sense of Nemirovsky and Yudin [NY83].

Theorem 4.2.13: Convergence rate of fast FBS

Let E be a Hilbert space and let f :“ g ` h, where g, h P Γ0 pEq and g is Gateaux
differentiable with L-Lipschitz continuous gradient. If f has a minimizer x̂, then
the function values of the iterates in Algorithm 5 with step size λ P p0, 1{Lq converge
to the minimum at rate Op1{k 2 q, i.e.,

f pxk q ´ f px̂q “ Op1{k 2 q.


Proof. First we consider the progress in one step of the algorithm. By the Lipschitz continuity of ∇g and since λ ă 1{L, we know by the descent lemma (Exercise 2.3.15) that for k P N
it holds

gpxk`1 q ď gpyk q ` x∇gpyk q, xk`1 ´ yk y ` 1{p2λq }xk`1 ´ yk }2 . (4.7)

Moreover, by the variational inequality of the proximal operators (Theorem 2.7.4, 4 ) we have
for all u P E that

hpxk`1 q ď hpuq ` 1{λ xyk ´ λ∇gpyk q ´ xk`1 , xk`1 ´ uy
“ hpuq ´ x∇gpyk q, xk`1 ´ uy ` 1{λ xyk ´ xk`1 , xk`1 ´ uy. (4.8)

Adding the inequalities (4.7) and (4.8) yields

f pxk`1 q ď f puq ` p´gpuq ` gpyk q ` x∇gpyk q, u ´ yk yq
` 1{p2λq }xk`1 ´ yk }2 ` 1{λ xyk ´ xk`1 , xk`1 ´ uy
ď f puq ` 1{p2λq }xk`1 ´ yk }2 ` 1{λ xyk ´ xk`1 , xk`1 ´ uy,

where the last inequality holds since ´gpuq ` gpyk q ` x∇gpyk q, u ´ yk y ď 0 by the convexity of g.
Combining these inequalities for u :“ x̂ and u :“ xk with θk P r0, 1s gives

f pxk`1 q ´ f px̂q ` p1 ´ θk q pf px̂q ´ f pxk qq
“ θk pf pxk`1 q ´ f px̂qq ` p1 ´ θk q pf pxk`1 q ´ f pxk qq
ď 1{p2λq }xk`1 ´ yk }2 ` 1{λ xyk ´ xk`1 , xk`1 ´ θk x̂ ´ p1 ´ θk qxk y
“ 1{p2λq p}yk ´ θk x̂ ´ p1 ´ θk qxk }2 ´ }xk`1 ´ θk x̂ ´ p1 ´ θk qxk }2 q
“ θk2 {p2λq p}zk ´ x̂}2 ´ }zk`1 ´ x̂}2 q.

Thus, we obtain for a single step

λ{θk2 pf pxk`1 q ´ f px̂qq ` 1{2 }zk`1 ´ x̂}2 ď λp1 ´ θk q{θk2 pf pxk q ´ f px̂qq ` 1{2 }zk ´ x̂}2 .

Using the relation recursively on the right-hand side and regarding that p1 ´ θk q{θk2 ď 1{pθk´1 q2 , we
obtain

λ{θk2 pf pxk`1 q ´ f px̂qq ď λp1 ´ θ0 q{θ02 pf px0 q ´ f px̂qq ` 1{2 }z0 ´ x̂}2 “ 1{2 }x0 ´ x̂}2 .

This yields the assertion

f pxk`1 q ´ f px̂q ď 2{pλpk ` 2q2 q }x0 ´ x̂}2 . l

Douglas-Rachford Splitting
Textbooks: [FP03, Subsubsec. 12.4.1].


We consider the problem


arg mintgpxq ` hpxqu
xPE

where g, h P Γ0 pEq such that g or h is continuous at a point in dompgq X domphq. In order


to solve this problem, we characterize the minimizers by the fixed points of an operator
T : E Ñ E defined by
T :“ 1{2 pI ` p2 proxλh ´Iq ˝ p2 proxλg ´Iqq “ I ` proxλh ˝p2 proxλg ´Iq ´ proxλg . (4.9)
The operator Rf :“ 2 proxf ´I describes the reflection of x at proxf .
The fixed points of T and the minimizers of g ` h are related to each other as follows.

Fig. 4.4: The reflection of x at p is p ` pp ´ xq “ 2p ´ x.

Lemma 4.2.14
Let g, h P Γ0 pEq with Bpg ` hq “ Bg ` Bh and let T : E Ñ E be defined by (4.9). Then, x̂ P E
is a minimizer of g ` h if and only if there exists some t̂ P FixpT q such that x̂ “ proxλg pt̂q.

Proof. By Fermat’s rule and Theorem 3.2.2 we have that x̂ minimizes g ` h if and only if

0 P λBpg ` hqpx̂q “ λBgpx̂q ` λBhpx̂q ô Dξ̂ P λBgpx̂q such that 0 P x̂ ´ px̂ ´ ξ̂q ` λBhpx̂q.

By using again Fermat’s rule, this is equivalent to

Dξ̂ P λBgpx̂q such that x̂ “ arg minyPE t 1{2 }y ´ px̂ ´ ξ̂q}2 ` λhpyqu “ proxλh px̂ ´ ξ̂q.

By setting t̂ :“ ξ̂ ` x̂ this is equivalent to

Dt̂ P x̂ ` λBgpx̂q such that x̂ “ proxλh p2x̂ ´ t̂q

which can be reformulated as

Dt̂ P E such that 0 P x̂ ´ t̂ ` λBgpx̂q and x̂ “ proxλh p2x̂ ´ t̂q.

Using again Fermats rule this is equivalent to

Dt̂ P E such that x̂ P arg minyPE t 1{2 }y ´ t̂}2 ` λgpyqu and x̂ “ proxλh p2x̂ ´ t̂q

which means by the definition of the proximal mapping that

Dt̂ P E such that x̂ “ proxλg pt̂q and x̂ “ proxλh p2x̂ ´ t̂q.

By inserting the first condition into the second one this is true if and only if

Dt̂ P E such that x̂ “ proxλg pt̂q and proxλg pt̂q “ proxλh p2 proxλg pt̂q ´ t̂q

which can be reformulated as

Dt̂ P E such that x̂ “ proxλg pt̂q and t̂ “ t̂ ` proxλh p2 proxλg pt̂q ´ t̂q ´ proxλg pt̂q “ T pt̂q.

Finally, this is equivalent to

Dt̂ P FixpT q such that x̂ “ proxλg pt̂q.

This completes the proof. l


The lemma indicates, in particular, that it suffices to find a fixed point of T in order to
find minimizers of g ` h. This leads to the following algorithm, which implements the fixed
point iteration tk`1 “ T ptk q. It is known under the name Douglas-Rachford splitting
algorithm (abbreviated as DRS or DRA).

Algorithm 6: Douglas-Rachford splitting algorithm


Data: g, h P Γ0 pEq, t0 P E, x0 :“ proxλg pt0 q, λ ą 0.
for k “ 0, 1, . . . do
tk`1 “ proxλh p2xk ´ tk q ` tk ´ xk “ T ptk q;
xk`1 “ proxλg ptk`1 q;
end
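The fixed point iteration of Algorithm 6 can be sketched as follows (illustrative NumPy code added for these notes; the two proximal maps are passed as functions of pt, λq, and for g “ ιA , h “ ιB they are the projections onto A and B):

```python
import numpy as np

def drs(prox_g, prox_h, t0, lam=1.0, n_iter=500):
    # Douglas-Rachford iteration t_{k+1} = T(t_k) from (4.9);
    # a minimizer of g + h is recovered as prox_{lam*g}(t_k).
    t = np.asarray(t0, dtype=float)
    for _ in range(n_iter):
        x = prox_g(t, lam)
        t = prox_h(2 * x - t, lam) + t - x
    return prox_g(t, lam)
```

As a usage example, take A “ tx P R2 : x1 “ x2 u and B “ tx P R2 : x1 “ 1u, whose intersection is the single point p1, 1q.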

Next, we prove that T from (4.9) is firmly nonexpansive such that the Douglas-Rachford
splitting algorithm converges.

Theorem 4.2.15: Convergence of Douglas-Rachford algorithm

Let E be a Hilbert space and let g, h P Γ0 pEq such that Bpg ` hq “ Bg ` Bh and
arg minpg ` hq ‰ H. Then, the iterates pxk qkPN produced by Algorithm 6 converge
weakly to some x̂ P arg minpg ` hq.

Proof. Since arg minpg ` hq ‰ H, we have by Lemma 4.2.14 that FixpT q ‰ H for T as
defined in (4.9). Moreover, since proxλh is firmly nonexpansive, we know that there exists
some nonexpansive R : E Ñ E such that proxλh “ 12 I ` 12 R. In particular, it holds that
2 proxλh ´I “ R is nonexpansive. Analogously, we obtain that 2 proxλg ´I is nonexpansive
such that p2 proxλh ´Iq ˝ p2 proxλg ´Iq is nonexpansive. This implies by the definition in
(4.9) that T is firmly non-expansive. Consequently, we obtain by Theorem 4.1.10 that there
exists t̂ P FixpT q such that tk “ T k t0 á t̂. In particular, we have that xk “ proxλg ptk q á
proxλg pt̂q P arg minpg ` hq by Lemma 4.2.14. l



Fig. 4.5: The Douglas-Rachford splitting Algorithm 6 finds a point in the intersection of
closed convex non-disjoint sets A, B Ă R2 by minimizing g ` h, where g :“ ιA and h :“ ιB .
In this case, it converges in two steps.

řn
The DRS can also be applyed for minimizing the sum f “ i“1 hi for n ą 2 with hi P Γ0 pEq. Lecture 16,
To this end consider the Hilbert space E n with inner product xpx1 , ..., xn q, py1 , ..., yn q y “ 29.05.2024
řn
i“1 x xi , yi y and let

D :“ tx “ px1 , . . . , xn q : x1 “ . . . “ xn P Eu Ă E n

and consider the function F P Γ0 pE n q defined by

F pxq “ ιD pxq ` řni“1 hi pxi q, x “ px1 , ..., xn q P E n ,

where we set Gpxq :“ ιD pxq and Hpxq :“ řni“1 hi pxi q.

Then, any minimizer of F is of the form x “ px̂, ..., x̂q, where x̂ P arg minpf q. Now, we
can apply the DRS for minimizing F “ G ` H. For the implementation, we note for
x “ px1 , ..., xn q P E n that

proxλH pxq “ arg miny“py1 ,...,yn qPE n t 1{p2λq }y ´ x}2 ` Hpyqu
“ arg miny“py1 ,...,yn qPE n řni“1 t 1{p2λq }yi ´ xi }2 ` hi pyi qu
“ parg minyi PE t 1{p2λq }yi ´ xi }2 ` hi pyi quqni“1 “ pproxλhi pxi qqni“1 .

Moreover, for t “ pt1 , ..., tn q it holds that

proxλG ptq “ proxιD ptq “ arg miny“pz,...,zqPD t 1{2 }y ´ t}2 u “ px̂, ..., x̂q

with

x̂ “ arg minzPE 1{2 řni“1 }z ´ ti }2 “ 1{n řni“1 ti ,
where the last equality follows by setting the derivative to zero. By putting these computa-
tions together, we obtain the following algorithm, which is called the Parallel Douglas-
Rachford splitting. The name is inspired by the fact that the inner loop can be com-
puted in parallel.

Algorithm 7: Parallel Douglas-Rachford splitting


Data: Initial guess t0 P E n
Result: An approximate minimizer of F .
for k “ 0, 1, . . . do
for i “ 1, . . . , n do
tk`1,i “ proxλhi p2xk ´ tk,i q ` tk,i ´ xk ;
end
tk`1 “ ptk`1,1 , ..., tk`1,n q;
řn
xk`1 “ n1 i“1 tk`1,i ;
end
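Algorithm 7 can be sketched as follows (illustrative NumPy code added for these notes; each element of proxes implements pv, λq ÞÑ proxλhi pvq):

```python
import numpy as np

def parallel_drs(proxes, t0, lam=1.0, n_iter=500):
    # Parallel Douglas-Rachford: one proximal step per summand h_i
    # (the inner list comprehension is parallelizable), followed by
    # the averaging step which realizes prox of the diagonal indicator.
    t = [np.asarray(ti, dtype=float) for ti in t0]
    x = sum(t) / len(t)
    for _ in range(n_iter):
        t = [proxes[i](2 * x - t[i], lam) + t[i] - x for i in range(len(t))]
        x = sum(t) / len(t)
    return x
```

As a toy usage example, minimizing the sum of the two quadratics 1{2 px ´ 0q2 and 1{2 px ´ 4q2 yields the midpoint 2.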

By the previous computations we obtain the following corollary.


Corollary 4.2.16
Let E be a Hilbert space and let hi P Γ0 pEq, i “ 1, ..., n, such that there exists a point x P E
at which all hi , i “ 1, ..., n, are continuous. Then, the sequence pxk qk generated by
Algorithm 7 converges weakly to a minimizer of řni“1 hi .

Exercises.

Exercise 4.2.17 (Variable proximal parameters) Let us consider the case n “ 2 and
suppose that in Algorithm 3 we choose λk ” λ ą 0 to be constant. With the same arguments
as in Algorithm 3, the iterates converge weakly to a fixed point of proxλf2 ˝ proxλf1 . Show
that this fixed point is not a minimizer of f but of the smoother function f2 ` λ f1 . ■


4.3 Application: Inverse Problems in Imaging


In this subsection, we want to apply the algorithms from the previous subsection to image Lecture 17,
processing. Throughout this subsection, we always consider the case that E “ Rd with 13.06.2023
standard inner product and Euclidean norm.
We consider gray-valued images as matrices x P r0, 1snˆm of pixels xij , i “ 1, ..., n,
j “ 1, ..., m, where xij “ 0 corresponds to a black pixel, xij “ 1 represents a white pixel and
xij P p0, 1q are the gray-values in between. To simplify the notation, we will often consider
the vectorized form vecpxq P Rnm instead of the image x itself, where vecpxq contains the
same entries as x but reordered into one large vector instead of a matrix.

(a) Original image (b) Noisy image

(c) Blurred image (d) Image with missing pixels

(e) Low resolution image

Fig. 4.6: Corrupted image for different types of degradation.

Typical tasks from mathematical image processing are the following.


• Denoising: Reconstruct a clean image from a noisy version, see Fig. 4.6b.
• Deblurring: Sharpening of a blurred image, see Fig. 4.6c.
• Inpainting: Reconstruct missing pixels from an image, see Fig. 4.6d.
• Superresolution: Increase the resolution of a given image, see Fig. 4.6e.
All of these tasks can be formulated as so-called inverse problems. More precisely, we
want to reconstruct (approximately) an unknown ground truth x̂ P Rnm from an observation
y P Rd generated by
y “ F px̂q ` ξ, (4.10)
where F : Rnˆm Ñ Rd is a function and ξ P Rd is some unknown noise which arises from
inexact measurements or other perturbations in the image acquisition. For the specific examples
from above, the operator F is given as follows.
• Denoising: For denoising, we do not modify the image itself, i.e., we choose F pxq “ x.
On the other hand, we add a large perturbation ξ.
• Deblurring: The forward operator is given by a 2D-convolution with kernel k “ pkrs qPr,s“´P .
That is, for x “ pxij qij we generate an output image y “ F pxq “ pyij qij , where
yij “ řPr,s“´P krs xi´r,j´s . Usually, we have that řPr,s“´P krs “ 1, so that each pixel
yij is a weighted average of nearby pixels of x.
• Inpainting: Here, the forward operator F : Rnˆm Ñ Rnˆm is given by the matrix
vector multiplication F pxq “ diagpe1 , ..., enm qx, where ei “ 1 if the ith pixel of x is
known and ei “ 0 otherwise.
• Superresolution: For superresolution, the forward operator F is a concatenation of
a blur operator and a downsampling operator. The blur operator is given similarly as
in the deblurring example. An example of a downsampling operator is the operator
which takes every second pixel in the image.
Moreover, several image acquisition problems in medical applications can be formulated as
inverse problems. This includes computed tomography (CT), magnetic resonance imaging
(MRI) and electrical impedance tomography (EIT). Unfortunately, for the considered problems, the operator F is not invertible or highly ill-posed, i.e., small perturbations of y can
lead to huge differences in the reconstruction F ´1 pyq.
A classical approach to reconstruct the image x from the observation y is the minimization
of a variational functional given by

arg min J pxq, J pxq “ DpF pxq, yq ` λRpxq, λ ą 0. (4.11)


xPRnm

Here, DpF pxq, yq is called the data-fidelity term and measures the distance between F pxq
and y. Using that y “ F px̂q ` ξ, we obtain that for “good” reconstructions x which are
close to x̂, the distance between F pxq and y is small. A common choice of D is
Dpx, yq “ 1{2 }x ´ y}2 . The term Rpxq is a regularizer. It incorporates some prior knowledge
about the image. More precisely, R should be a function which is small if the image x looks
“natural” and large otherwise.


Example 4.3.1 Using only the data-fidelity term is not sufficient. To see this, we consider
the deblurring problem. Here, we minimize the data-fidelity term

J pxq “ DpF pxq, yq “ 1{2 }F pxq ´ y}2

via a gradient descent algorithm. The results after certain numbers of steps are given in
Fig. 4.7. We see that some unnatural artifacts appear. ˛

The following example summarizes different choices for the regularizer R in (4.11) and
shows how they can be used for reconstructing the unknown ground truth in certain inverse
problems.

Original image Blurred image

After 200 gradient descent steps After 2000 gradient descent steps

Fig. 4.7: Minimizing only the data-fidelity term yields unnatural artifacts.

Example 4.3.2
• Tikhonov Regularization for Deblurring. The regularizer Rpxq “ }x}2 is called
Tikhonov regularization. Together with the data-fidelity term DpF pxq, yq “
1{2 }F pxq ´ y}2 , the objective function J from the variational problem (4.11) reads
as

J pxq “ 1{2 }F pxq ´ y}2 ` λ}x}2 .
In this case J is differentiable and can be minimized by applying Algorithm 1 (gradient
descent). An application of this algorithm to deblurring is given in Fig. 4.8. We can
see that it reduces the artifacts compared to the reconstruction without regularization.


Original image Blurred image

No regularization Tikhonov regularization

TV regularization

Fig. 4.8: Deblurring by solving the variational problem (4.11) for different regularizers.

• Image Inpainting by Penalizing Distances of Neighboring Pixels. We use a
regularizer which penalizes the squared distances of neighboring pixels. That is, we
set

Rpxq “ řni“2 řmj“1 pxi´1,j ´ xi,j q2 ` řni“1 řmj“2 pxi,j´1 ´ xi,j q2 .

The regularizer can be rewritten as Rpxq “ }Dx}22 for a certain matrix D. To ensure that the known pixels have the correct value, we choose the data fidelity term
DpF pxq, yq “ ιtzPRnm :F pzq“yu pxq. This leads to the variational problem

J pxq “ ιtzPRnm :F pzq“yu pxq ` λRpxq.

In order to minimize J , we can use the forward-backward splitting, where g “ R is
differentiable. The proximity operator of h “ ιtzPRnm :F pzq“yu can be computed by
setting all known pixels to the correct value. The result is given in Fig. 4.9.


Original image Image with missing pixels

Smooth regularization TV regularization

Fig. 4.9: Inpainting by solving the variational problem (4.11) for different regularizers.

• Total Variation. Finally, we consider the regularizer R which penalizes the 1-norm
of the differences of neighboring pixels. More precisely, we define R for an image
x “ pxi,j qi,j P Rnˆm by

Rpxq “ TVpxq “ řni“2 řmj“1 |xi´1,j ´ xi,j | ` řni“1 řmj“2 |xi,j´1 ´ xi,j |.

It is called the total variation (TV) regularizer and can be rewritten as TVpxq “
}Dx}1 for a certain difference matrix D. Together with the data-fidelity term
DpF pxq, yq “ 1{2 }F pxq ´ y}2 we end up with the objective function

J pxq “ 1{2 }F pxq ´ y}2 ` λTVpxq. (4.12)

Unfortunately, TV is not differentiable and no closed form of its proximity operator
is known. Therefore, none of the algorithms from the previous section is directly
applicable. As a remedy, we will study so-called dual functions within the next sections
in order to derive an efficient algorithm for minimizing functions of the form f pxq “
gpxq ` hpAxq. By setting gpxq “ 1{2 }F pxq ´ y}2 , A “ D and hpxq “ λ}x}1 , this will
enable us to minimize the functional from (4.12).
In Fig. 4.8 we show some reconstructions using the functional (4.12) for deblurring. In
order to apply the TV regularizer together with the data-fidelity term DpF pxq, yq “
ιtzPRnm :F pzq“yu pxq to an inpainting problem, we use the Douglas-Rachford
splitting with gpxq “ ιtzPRnm :F pzq“yu pxq and hpxq “ TVpxq. For computing proxλTV ,


we again refer to the duality theory at the end of this lecture. The result is given in
Fig. 4.9. ˛
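Although its proximal map has no closed form, the (anisotropic) TV regularizer itself is straightforward to evaluate (illustrative NumPy sketch added for these notes):

```python
import numpy as np

def tv(x):
    # Anisotropic total variation: 1-norm of the vertical and horizontal
    # differences of neighboring pixels, TV(x) = ||Dx||_1.
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()
```

Constant images have TV equal to zero, while each jump across an edge contributes its height once per pixel row or column, which is why TV favors piecewise constant reconstructions.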

Remark 4.3.3 (1-Norm Regularization and Sparsity)


It can be proven that minimizing the 1-norm as a regularizer leads to a “sparse” solution.
That is, the functional
1
J pxq “ }F pxq ´ y}2 ` λ}x}1
2
admits minimizers x̂, where many entries of x̂ are zero, see also Exercise 4.3.4.
Consequently the TV regularizer from the previous example enforces that many differences
between neighboring pixels are zero. For large regularization parameters λ, this leads to
piece-wise constant or “cartoon-like” images, see Fig. 4.10 for an illustration. ˝
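The sparsity effect can already be seen for F “ I by comparing the proximal maps of η}¨}1 and η}¨}22 (illustrative sketch added for these notes; the closed form of the second prox follows by setting the derivative of 1{2 px ´ yq2 ` ηx2 to zero):

```python
import numpy as np

def prox_l1(y, eta):
    # Minimizer of 1/2 ||x - y||^2 + eta ||x||_1: soft-shrinkage, exact zeros.
    return np.sign(y) * np.maximum(np.abs(y) - eta, 0.0)

def prox_l2sq(y, eta):
    # Minimizer of 1/2 ||x - y||^2 + eta ||x||_2^2: mere scaling, no zeros.
    return y / (1.0 + 2.0 * eta)
```

Applied to a noisy signal, the 1-norm prox sets all small entries exactly to zero, while the squared 2-norm prox only shrinks every entry by a constant factor.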

λ “ 0.0002 λ “ 0.002

λ “ 0.02 λ “ 0.2

Fig. 4.10: Deblurring with TV regularizer and different regularization parameters λ. For
larger values of λ, the reconstruction becomes more “cartoon-like”.

Bayesian Inverse Problems


The exact reconstruction of the ground truth x̂ in an inverse problem is usually impossible.
Therefore, we can view the variables x̂, y and ξ in (4.10) as random variables X, Y and Ξ
related by
Y “ F pXq ` Ξ.
Here, we call the distribution PX of X the prior distribution. Now, for a fixed sample
y of Y , we are interested in the so-called posterior distribution PX|Y “y which can be
interpreted as the probability distribution of “all possible reconstructions”.


In the case that all involved probability distributions have densities, we can search for the
“most likely” reconstruction x by maximizing the density pX|Y “y of PX|Y “y . Assuming that
Ξ „ N p0, σ 2 q, this can be reformulated by Bayes’ theorem as

arg maxx tlogppX|Y “y pxqqu “ arg maxx tlogppY |X“x pyq pX pxq{pY pyqqu
“ arg maxx tlogpexpp´}F pxq ´ y}2 {p2σ 2 qqq ` logppX pxqqu
“ arg minx t 1{2 }F pxq ´ y}2 ´ σ 2 logppX pxqqu.

The solution of this problem is called the maximum-a-posteriori estimator (MAP) of
X given Y “ y. By setting Rpxq “ ´ logppX pxqq and λ “ σ 2 , the maximum-a-posteriori
estimator is the solution of the regularized problem (4.11).

Exercises

Exercise 4.3.4 (Peak detection with ISTA)


Implement the ISTA from Example 4.2.11 for the matrix K “ I. Then, take a signal
x̂ “ p0, 0, 0, 0, 0, 2, 0, 0, 5, 0, 3, 0q and generate y by adding Gaussian noise onto x̂. Afterwards,
run ISTA onto this observation y and denote the limit by x. Count the number of zero-entries
in x depending on the parameter η.
Redo the same experiment with the functional

1{2 }Kx ´ y}2 ` η}x}22

instead and count again the number of zero-entries of the minimizer. ■


5 Monotone Set-valued Mappings

5.1 Set-valued Maps


In this section, we consider set-valued maps F : E Ñ 2E (where 2S is the power set of S), Lecture 16
henceforth written as F : E ⇒ E. Throughout this section we always assume that E is a continued,
Hilbert space and that F : E ⇒ E is non-degenerate in the sense that there exists an
x P E with F pxq ‰ H.
Examples of set-valued maps we encountered before are the subdifferential of a convex
function, the proximal operator for non-convex functions and the tangent cone to a set.
Textbooks: [Phe09, Sec. 2, pp. 17, 27–31], [AE06, p. 1–4], [Sim06, Sec. II.8], [BP12, pp.
53–58] and [BP12, Sec. 1.4].

Definition 5.1.1 (Graph, inverse, domain)


The graph of a set-valued map F : E ⇒ E is

grpF q :“ tpx, yq P E ˆ E : y P F pxqu.

Its inverse F ´1 : E ⇒ E is the set-valued map F ´1 with

grpF ´1 q :“ tpy, xq P E ˆ E : y P F pxqu.

The domain of F is defined by dompF q :“ tx P E : F pxq ‰ Hu and its range by
ranpF q :“ ŤxPE F pxq.
If ‹ denotes an operation on subsets of E (e.g. `) and F, G : E ⇒ E are set-valued maps,
then we set F ‹ G : E ⇒ E, x ÞÑ F pxq ‹ Gpxq.

We have dompF ´1 q “ ranpF q and ranpF ´1 q “ dompF q.


If F pxq “ tau for some x P E, then we just write F pxq “ a. Similarly, any single-valued
map f : E Ñ E induces a set-valued map F : E ⇒ E, x ÞÑ tf pxqu.

Example 5.1.2 We consider the set-valued function

F pxq :“ t´1u for x ă 0, F pxq :“ r´1, 1s for x “ 0, F pxq :“ t1u for x ą 0,

which is the subdifferential of f pxq “ |x|. The functions F , I ` λF , λ ą 0, and their inverses
F ´1 , pI ` λF q´1 are depicted in Fig. 5.1. We see that pI ` λF q´1 is a single-valued function. ˛



Fig. 5.1: F, F ´1 , I ` λF and the single-valued pI ` λF q´1 for some λ ą 0.

Remember that a single-valued function on R is monotone increasing if x1 ď x2 implies that


f px1 q ď f px2 q. This can be rewritten as

xx1 ´ x2 , f px1 q ´ f px2 qy ě 0 @x1 , x2 P R.

Strictly monotone functions f : R Ñ f pRq are invertible.


This definition can be generalized for set-valued functions.

Definition 5.1.3 (Strongly / maximal monotone)

A function F : E ⇒ E is
• monotone if x y1 ´ y2 , x1 ´ x2 y ě 0 for all px1 , y1 q, px2 , y2 q P grpF q.
• strongly monotone with modulus σ ą 0 if

x y1 ´ y2 , x1 ´ x2 y ě σ}x1 ´ x2 }2 @px1 , y1 q, px2 , y2 q P grpF q.

• maximal monotone if it is monotone and no enlargement of its graph is possible
without destroying its monotonicity: for every px1 , y1 q P pE ˆ Eqz grpF q there exists
a px2 , y2 q P grpF q with x y1 ´ y2 , x1 ´ x2 y ă 0.

By definition F is σ-strongly monotone if and only if F ´ σI is monotone. Moreover, firmly


non-expansive operators are monotone by definition. This includes in particular proximal
mappings.
Example 5.1.4 (Monotone but not maximal monotone)
Consider the set-valued map

F : R Ñ 2R , x ÞÑ tf pxqu, where f pxq “ ´1 if x ď 0 and f pxq “ 1 else. (5.1)



Fig. 5.2: The graphs of the set-valued maps F , I ` F and pI ` F q´1 , where F is induced by
the single-valued map f from (5.1).

The maps F and I ` F are monotone, but F is not maximally monotone: take px1 , y1 q :“
p0, 0q P R ˆ R. Then px1 , y1 q R grpF q because 0 R F p0q “ t´1u. However, for any px2 , y2 q P
grpF q it holds
x x1 ´ x2 , y1 ´ y2 y “ x x2 , y2 y ě 0.

Hence F is not maximally monotone. ˛

Exercises

Exercise 5.1.5 Let E be a Hilbert space. What do linear monotone maps T : E Ñ E look
like? ■

Exercise 5.1.6 If F : E ⇒ E is maximal monotone, then F pxq Ă E is closed and convex


for any x P E. ■


5.2 The Subdifferential as a Set-valued Map


Textbooks: [Ză02, Thm. 2.4.2], [BP12, Subsec. 2.2.2] and [Roc66].
By Theorem 2.3.9, the single-valued gradient of a convex Gateaux-differentiable function
f on E is a monotone operator. This holds more generally for the subdifferential of proper
convex functions.

Theorem 5.2.1: Monotonicity and convexity

Let E be a Hilbert space. If f : E Ñ R is proper and convex, then Bf is monotone.


If f is additionally lower semicontinuous, then Bf : E ⇒ E is maximal monotone.

Proof. 1 Let f be proper and convex. For p1 P Bf px1 q and p2 P Bf px2 q we have

x y ´ xk , pk y ď f pyq ´ f pxk q @y P E

for all k P t1, 2u.


Setting y “ x1 for k “ 2 and y “ x2 for k “ 1 yields

x x1 ´ x2 , p2 y ď f px1 q ´ f px2 q and x x2 ´ x1 , p1 y ď f px2 q ´ f px1 q.

Adding both inequalities yields

x x1 ´ x2 , p1 ´ p2 y ě 0.

2 This goes back to Minty, a not too difficult proof can be found in [Roc66, Thm. 4] or
[Phe09, Thm. 3.25]. l

It can be shown that also the converse statement of the theorem is true: if f : E Ñ R is
proper and lower semicontinuous and Bf is monotone, then f is convex. For a proof, we
refer to [Ză02, Thm. 3.2.7].
The following figure shows that lower semi-continuity of f is really necessary for Bf to be
maximal monotone.


Fig. 5.3: Left to right: f1 P Γ0 pRq, Bf1 is maximal monotone, the proper, convex function
f2 which is not lower semicontinuous, Bf2 is not maximal monotone.


Example 5.2.2 By Theorem 2.7.4, for f P Γ0 pEq, λ ą 0 and x P E it holds


xλ :“ proxλf pxq “ arg minyPE tλf pyq ` 1{2 }y ´ x}2 u

is unique. Since 1{2 } ¨ ´x}2 is continuous, by Theorems 3.1.9 and 3.2.2, xλ is a minimizer if
and only if
0 P λBf pxλ q ` xλ ´ x.
This can be rewritten as

x P pI ` λBf qpxλ q ðñ xλ P pI ` λBf q´1 pxq.

In particular, we see that the so-called resolvent of λBf given by JλBf :“ pI ` λBf q´1
coincides with the proximal mapping proxλf . Hence, it is single-valued and firmly
nonexpansive by Theorem 2.7.4. ˛
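Example 5.2.2 can be checked numerically for f ptq “ |t| from Example 5.1.2 (illustrative sketch added for these notes; the grid minimization is only a crude stand-in for the exact proximal map):

```python
import numpy as np

def resolvent_abs(x, lam):
    # Resolvent (I + lam * dF)^{-1} for F = subdifferential of |.|,
    # solved case by case; this is exactly the soft-shrinkage S_lam.
    if x > lam:
        return x - lam   # here x_lam > 0, so x = x_lam + lam
    if x < -lam:
        return x + lam   # here x_lam < 0, so x = x_lam - lam
    return 0.0           # |x| <= lam: x lies in 0 + lam * [-1, 1]

def prox_abs(x, lam):
    # prox_{lam |.|}(x) by brute-force minimization on a fine grid,
    # used only to cross-check the closed form above.
    grid = np.linspace(-10.0, 10.0, 200001)
    return grid[np.argmin(0.5 * (grid - x) ** 2 + lam * np.abs(grid))]
```

Both functions agree up to the grid resolution, illustrating that the resolvent of λBf and proxλf coincide.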

More generally, we have the following definition.

Definition 5.2.3 (Resolvent)


The resolvent of a set-valued mapping F : E ⇒ E is defined by JF :“ pI ` F q´1 : E ⇒ E.

The following theorem generalizes the last example.

Theorem 5.2.4: Resolvents for monotone mappings

Let E be a Hilbert space and let F : E ⇒ E be monotone. Set R :“ ranpI `F q Ă E.


1 The resolvent JF : R Ñ E is single-valued and firmly non-expansive.
2 The mapping F is maximal monotone if and only if R “ E.

Proof. 1 Assume that JF is not single-valued. Then, there exist y P E and x1 ‰ x2 P E


such that xi P pI ` F q´1 pyq, i.e., y P pI ` F qpxi q for i “ 1, 2. Then, it holds

y ´ x1 P F px1 q and y ´ x2 P F px2 q.

Since F is monotone, we obtain

xx1 ´ x2 , y ´ x1 ´ py ´ x2 qy ě 0,
xx1 ´ x2 , ´x1 ` x2 y “ ´}x1 ´ x2 }2 ě 0.

This implies x1 “ x2 , which is a contradiction.
Since F is monotone, we obtain for all xi P E and yi P F pxi q, i “ 1, 2, that

0 ď xx1 ´ x2 , y1 ´ y2 y,
}x1 ´ x2 }2 ď xx1 ´ x2 , x1 ` y1 ´ px2 ` y2 qy,

and with zi P pI ` F qpxi q, i.e., xi “ pI ` F q´1 pzi q, further

}x1 ´ x2 }2 ď xx1 ´ x2 , z1 ´ z2 y,
}JF pz1 q ´ JF pz2 q}2 ď xJF pz1 q ´ JF pz2 q, z1 ´ z2 y

for all zi P R, i “ 1, 2. Hence, JF is firmly nonexpansive.


2 This is known as Theorem of Minty and for a proof we refer to [Min62, Corollary on
p. 344], [Roc70a, Cor. on p. 78] or [Ză02, Thm. 3.11.7]. l

Finally we characterize λ-convex functions via the subdifferential. The following lemma
generalizes Remark 2.3.14.

Lemma 5.2.5 (λ-convexity and subdifferentials)


Let E be a Hilbert space and let f P Γ0 pEq. Then, the following holds true.
1 If f is λ-convex for some λ P R, then it holds

x p0 ´ p1 , x0 ´ x1 y ě λ}x0 ´ x1 }2 @px0 , p0 q, px1 , p1 q P grpBf q.

2 If E is finite-dimensional, also the reverse direction is true.


In particular, the subdifferential of σ-strongly convex functions is σ-strongly monotone.

Proof. We only prove 1 . For 2 we refer to the finite-dimensional script.


Let x0 , x1 P dompf q and define xt :“ p1 ´ tqx0 ` tx1 for t P p0, 1q. Since f is λ-convex, we
have
f pxt q ď p1 ´ tqf px0 q ` tf px1 q ´ 12 λtp1 ´ tq}x0 ´ x1 }2 ,
which can be rearranged to
1
t pf pxt q ´ f px0 qq ` 12 λp1 ´ tq}x0 ´ x1 }2 ď f px1 q ´ f px0 q.
Taking the limit t Œ 0 we get
Df rx0 spx1 ´ x0 q ` 12 λ}x0 ´ x1 }2 ď f px1 q ´ f px0 q.
Since f : E Ñ R is convex and finite at x0 , we have by Lemma 3.1.4 that x p0 , x1 ´ x0 y ď
Df rx0 spx1 ´ x0 q for any p0 P Bf px0 q. Therefore, it holds
x p0 , x1 ´ x0 y ` 12 λ}x0 ´ x1 }2 ď f px1 q ´ f px0 q @p0 P Bf px0 q.
Analogously, by exchanging the roles of x0 and x1 we obtain
x p1 , x0 ´ x1 y ` 12 λ}x1 ´ x0 }2 ď f px0 q ´ f px1 q @p1 P Bf px1 q.
Adding the last two inequalities yields

x p1 , x0 ´ x1 y ´ x p0 , x0 ´ x1 y `λ}x0 ´ x1 }2 ď 0,

that is, λ}x0 ´ x1 }2 ď x p0 ´ p1 , x0 ´ x1 y. l
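The strong-monotonicity statement can also be checked numerically. The following sketch is an illustration of our own (the choice f pxq “ σ2 x2 ` |x| on E “ R, the random samples and the tolerance are all ad-hoc assumptions, not part of the script):

```python
import numpy as np

# Illustrative check of Lemma 5.2.5 on E = R: f(x) = sigma/2 x^2 + |x| is
# sigma-strongly convex; sign(x) is a valid selection from the subdifferential
# of |.| at every x (with sign(0) = 0), so p(x) = sigma*x + sign(x) in df(x).
sigma = 0.7

def subgrad(x):
    # one selection p(x) from the subdifferential of f at x
    return sigma * x + np.sign(x)

rng = np.random.default_rng(0)
xs = rng.normal(size=1000)
for x0, x1 in zip(xs[:-1], xs[1:]):
    lhs = (subgrad(x0) - subgrad(x1)) * (x0 - x1)
    assert lhs >= sigma * (x0 - x1) ** 2 - 1e-12  # strong monotonicity
```

The inequality holds with equality exactly when the two points have the same sign, since the sign term itself is monotone.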

Exercises

Exercise 5.2.6 Explicitly write down the functions f1 and f2 from Fig. 5.3, calculate their
subdifferentials and verify that Bf1 is maximal monotone, but Bf2 is not. ■

Exercise 5.2.7 Let f P Γ0 pEq, intpdompf qq ‰ H and x1 , x2 P intpdompf qq be distinct.
Then there exists a t P p0, 1q and a pt P Bf pxt q, where xt :“ p1 ´ tqx1 ` tx2 , such that
f px2 q ´ f px1 q “ x pt , x2 ´ x1 y. ■


6 Conjugate Functions
Like in Functional Analysis, in Convex Analysis many mathematical objects can be paired
with their so-called duals. In this way, close connections between apparently dissimilar
properties of objects become transparent and the analysis of a given object can be treated
in an equivalent yet very different context.
Textbooks: [ET99, Sec. I.4], [Sim06, pp. 25–26], [BC11, Chp. 13], [BS13, Subsec. 2.4.2],
[Z0̆2, Sec. 2.3], [BP12, Subsec. 3.2.1] and [MN22, Sec. 4.1].

6.1 Conjugate Functions and their Properties


Textbooks: [BL12] and [BP12, Subsec. 2.1.4]. (Lecture 17 continued, 31.05.2024)

Let E be a real normed space. The following concept plays a crucial role in duality theory
and goes back to the German-Danish mathematician Werner Fenchel.

Definition 6.1.1 (Conjugate)


For f : E Ñ R we define the Fenchel conjugate by

f ˚ : E ˚ Ñ R, p ÞÑ sup x p, x yE ´f pxq.
xPE

Moreover, we call the conjugate of the conjugate of f the biconjugate f ˚˚ “ pf ˚ q˚ defined
by
f ˚˚ : E ˚˚ Ñ R, x ÞÑ sup x x, p yE ˚ ´f ˚ ppq.
pPE ˚

Note that f ˚ p0q “ ´ inf xPE f pxq. By definition, we have that for a given p P E ˚ the value
f ˚ ppq is the largest element in R such that it holds
f pxq ě x p, x y ´f ˚ ppq.
In other words, f ˚ ppq is the largest value such that the hyperplane Hpp,1q,´f ˚ ppq is below the
epigraph of f . As an example, we consider the case of f pxq “ x2 and choose p “ 2. Then,
the geometric interpretation from above is visualized on the left of Fig. 6.1.
Similarly, the function value f ˚ ppq can be interpreted as the largest difference between the
linear function x ÞÑ x p, x y and f . An illustration is given on the right of Fig. 6.1.
Remark 6.1.2 ((Bi)conjugates are lower semicontinuous, convex)
By definition, we have that
epipf ˚ q “ tpp, aq P E ˚ ˆ R : f ˚ ppq ď au
“ tpp, aq P E ˚ ˆ R : x p, x y ´f pxq ď a @x P Eu
č
“ tpp, aq P E ˚ ˆ R : xpp, aq, px, ´1q y ď f pxqu
xPE
č
“ tpp, aq P E ˚ ˆ R : x ΛEˆR px, ´1q, pp, aq y ď f pxqu
xPE
č
“ HΛ´EˆR px,1q,f pxq .
xPE


In particular, epipf ˚ q is closed and convex such that f ˚ is convex and lower semicontinuous.
Consequently, also f ˚˚ “ pf ˚ q˚ and f ˚˚ ˝ ΛE are convex and lower semicontinuous. ˝

Fig. 6.1: Left: Geometric interpretation of f ˚ p2q for f pxq :“ x2 . Right: f ˚ pxq as supremum
of the differences between the graph of f and x ¨, x y.

Example 6.1.3 We compute the conjugate for different functions f .


• For f :“ ιB1 p0q , the conjugate is given by the dual norm: for p P E ˚ we have

f ˚ ppq “ supxPE x p, x y ´ιB1 p0q pxq “ supxPB1 p0q x p, x y “ }p}˚ .

• For f : R Ñ R, x ÞÑ ´ex and p P R we have


f ˚ ppq “ supxPR ppx ` ex q ” 8.

In particular, f ˚ is not proper even though f is proper. Furthermore, we have
f ˚˚ pxq “ suppPR ppx ´ 8q ” ´8 for all x P R.
• Now consider f pxq :“ αex , where α P p0, 1s. If p ă 0, then the function gpxq :“ px ´ αex
fulfills gpxq Ñ 8 as x Ñ ´8, so f ˚ ppq “ 8. If p ą 0, then maxxPR ppx ´ αex q “ p logpp{αq ´ p.
If p “ 0, then supxPR ppx ´ αex q “ 0. Hence the conjugate of f is the entropy

f ˚ ppq “ p logpp{αq ´ p ` ιr0,8q ppq

using the convention 0 logp0q :“ limtŒ0 t logptq “ 0. The functions f and f ˚ are
visualized in Fig. 6.2. Similarly, we can verify f ˚˚ “ f . ˛

Fig. 6.2: Left: The family px ÞÑ αex qαPp0,1s . Right: The family of conjugates.
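The entropy formula from the last example can be checked numerically. The sketch below is an illustration of our own (the grid range, number of grid points and tolerance are ad-hoc assumptions): it approximates the supremum defining f ˚ on a fine grid and compares it with the closed form p logpp{αq ´ p.

```python
import numpy as np

# Numerical sanity check of Example 6.1.3: for f(x) = alpha * exp(x) with
# alpha in (0, 1], the conjugate is f*(p) = p log(p / alpha) - p for p > 0.
alpha = 0.5
x = np.linspace(-40.0, 10.0, 500001)  # dense grid to approximate the supremum

def conj_numeric(p):
    return np.max(p * x - alpha * np.exp(x))

for p in [0.25, 1.0, 3.0]:
    closed_form = p * np.log(p / alpha) - p
    assert abs(conj_numeric(p) - closed_form) < 1e-6
```

The maximizer x “ logpp{αq lies well inside the grid for these values of p, which is why the grid approximation is accurate.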


(Lecture 18, 05.06.2024)

The following lemmas collect some basic properties of conjugate functions. In order to
compare f and f ˚˚ , we use the canonical embedding ΛE : E Ñ E ˚˚ :

f ˚˚ ˝ ΛE : E Ñ R, x ÞÑ suppPE ˚ x ΛE pxq, p yE ˚ ´f ˚ ppq “ suppPE ˚ x p, x yE ´f ˚ ppq.

In the case that E is reflexive, we can identify E ˚˚ with E such that ΛE becomes the
identity.

Lemma 6.1.4
Let f, g : E Ñ R.
1 If f ě g, then f ˚ ď g ˚ .
2 For all x P E and p P E ˚ such that either f pxq or f ˚ ppq is finite, we have the
Fenchel(-Young) inequality

f ˚ ppq ` f pxq ě x p, x y .

3 We have f ˚˚ ˝ ΛE ď f .

Proof. 1 Let p P E ˚ . Then it holds

f ˚ ppq “ supxPE x p, x y ´f pxq ď supxPE x p, x y ´gpxq “ g ˚ ppq.

2 By definition it holds
f ˚ ppq ě x p, x y ´f pxq.
Adding f pxq on both sides yields the claim.
3 Let x P E. Then it holds

f ˚˚ ˝ ΛE pxq “ suppPE ˚ x ΛE pxq, p y ´f ˚ ppq “ suppPE ˚ x p, x y ´f ˚ ppq ď f pxq,

where the last inequality follows from 2 . l

Lemma 6.1.5
Let f : E Ñ R.
1 If f is improper, then f ˚ ” ´8 or f ˚ ” 8.
2 If f is proper, then f ˚ cannot attain ´8.
3 If f is proper and convex , then f ˚ P Γ0 pE ˚ q. Moreover, if f is proper, then clpf q˚ “
f ˚.

Proof. 1 If there exists x0 with f px0 q “ ´8, then it holds f ˚ ppq ě x p, x0 y ´f px0 q “
8 such that f ˚ ” 8. If f ” 8, then it holds f ˚ ppq “ supxPE x p, x y ´f pxq “ ´8
such that f ˚ ” ´8.
2 Assume that f ˚ ppq “ ´8 for some p P E ˚ . Then, it holds for any x P E that

´8 “ f ˚ ppq ě x p, x y ´f pxq ô 8 ď f pxq.

This implies that f ” 8 and contradicts the assumption that f is proper.


3 By Remark 6.1.2, f ˚ is convex and lower semicontinuous. Due to 2 , we know that


since f is proper, f ˚ cannot attain the value ´8. We can use Theorem 2.2.8 to find
p0 P E ˚ and α P R such that α ě x p0 , x y ´f pxq for all x P E. Hence
f ˚ pp0 q “ supxPE x p0 , x y ´f pxq ď α ă 8

and so f ˚ is proper.

Finally, assume that f is proper. Then, we have to show that clpf q˚ “ f ˚ . Since clpf q ď f ,
Lemma 6.1.4 1 yields clpf q˚ ě f ˚ such that it is left to show that clpf q˚ ď f ˚ . By
definition and the Fenchel inequality from Lemma 6.1.4 2 it holds for all p P E ˚
and x0 P E that

clpf qpx0 q “ lim inf xÑx0 f pxq ě lim inf xÑx0 x p, x y ´f ˚ ppq “ x p, x0 y ´f ˚ ppq.

Thus, f ˚ ppq ě x p, x0 y ´ clpf qpx0 q and by taking the supremum over x0 P E, we obtain
f ˚ ppq ě clpf q˚ ppq. l

The converse of 1 is not true, see Example 6.1.3. The lemma implies in particular that
the assumptions from Lemma 6.1.4 2 are fulfilled automatically if f is proper. Moreover,
the statement of Lemma 6.1.4 3 is extended by the following theorem.

Theorem 6.1.6: Fenchel-Moreau-Rockafellar

A proper function f : E Ñ R is in Γ0 pEq if and only if f “ f ˚˚ ˝ ΛE .

Proof. “ ðù ”: Suppose that f “ f ˚˚ ˝ ΛE . By Remark 6.1.2 f ˚ and f ˚˚ are convex and


lower semicontinuous such that also f “ f ˚˚ ˝ ΛE is convex and lower semicontinuous.
“ ùñ ”: Let f P Γ0 pEq. By Lemma 6.1.5 3 , f ˚ , f ˚˚ and f ˚˚ ˝ ΛE are proper.
Suppose towards contradiction, that there exists a x0 P E with f px0 q ‰ f ˚˚ ˝ ΛE px0 q. By
Lemma 6.1.4 3 we have f ˚˚ ˝ ΛE ď f , so we must have

f px0 q ą f ˚˚ ˝ ΛE px0 q “ suppPE ˚ x p, x0 y ´f ˚ ppq. (6.1)

In particular, x0 P dompf ˚˚ ˝ ΛE q because otherwise 8 “ f ˚˚ ˝ ΛE px0 q ă f px0 q, which


would imply f px0 q “ 8 “ f ˚˚ ˝ ΛE px0 q, a contradiction.
Now, (6.1) implies px0 , f ˚˚ ˝ ΛE px0 qq R epipf q. Thus, by Theorem 1.2.2, we can strongly
separate the point tpx0 , f ˚˚ ˝ ΛE px0 qqu from the set epipf q to obtain p0 P E ˚ , a P R and
ε ą 0 such that

x p0 , x0 yE `af ˚˚ ˝ ΛE px0 q ě x p0 , x yE `at ` ε @px, tq P epipf q. (6.2)

Now, we distinguish three cases.


• If a ą 0, then this implies for y P dompf q (which exists since f is proper) and
t “ f pyq ` b that

x p0 , x0 yE ` af ˚˚ ˝ ΛE px0 q ě x p0 , y yE ` af pyq ` ab ` ε,

where the left-hand side is finite and independent of b, while the right-hand side
tends to 8 as b Ñ 8,


which is a contradiction.
• If a “ 0, then (6.2) implies that

x p0 , x y ď x p0 , x0 y ´ε @x P dompf q.

If x0 P dompf q, we obtain for x “ x0 a contradiction. For the case that x0 R dompf q,


we use that by definition it holds that

f ˚ pqq ě x q, x y ´f pxq @x P dompf q.

Adding the last two inequalities up yields for q P dompf ˚ q (which exists since f ˚ is
proper) and b ą 0 that

x q ` bp0 , x y ´f pxq ď f ˚ pqq ` b x p0 , x0 y ´bε @x P dompf q

Taking the supremum over all x P dompf q this yields that

f ˚ pq ` bp0 q ď f ˚ pqq ` b x p0 , x0 y ´bε.

This can be reformulated to


x q, x0 y ´f ˚ pqq ` bε ď x q ` bp0 , x0 y ´f ˚ pq ` bp0 q
“ x ΛE px0 q, q ` bp0 y ´f ˚ pq ` bp0 q ď f ˚˚ ˝ ΛE px0 q.

Letting b Ñ 8 implies that f ˚˚ ˝ ΛE px0 q “ 8, which contradicts the fact that x0 P


dompf ˚˚ ˝ ΛE q from the beginning.
• If a ă 0, dividing (6.2) by ´a yields for all x P dompf q (by noting that px, f pxqq P
epipf q) that

x ´p1{aqp0 , x0 y ´ f ˚˚ ˝ ΛE px0 q ě x ´p1{aqp0 , x y ´ f pxq ´ ε{a.

For x R dompf q, we have that f pxq “ 8 such that the above inequality is fulfilled
automatically. Thus, taking the supremum over all x P E yields that

x ´p1{aqp0 , x0 y ´ f ˚˚ ˝ ΛE px0 q ě f ˚ p´p1{aqp0 q ´ ε{a ą f ˚ p´p1{aqp0 q.

This contradicts the Fenchel inequality from Lemma 6.1.4 2 . l

As a direct consequence, we can extend the statement from Lemma 6.1.5 3 .


Corollary 6.1.7 (f ˚˚ “ clpf q for convex functions)
For a proper convex function f : E Ñ R we have f ˚˚ ˝ ΛE “ clpf q.

Proof. Since f is proper and convex, we have that clpf q P Γ0 pEq. Thus, applying Theo-
rem 6.1.6 to clpf q yields
6.1.5 ` ˘˚ 6.1.6
f ˚˚ ˝ ΛE “ pf ˚ q˚ ˝ ΛE “ clpf q˚ ˝ ΛE “ clpf q˚˚ ˝ ΛE “ clpf q. l
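On E “ R, biconjugation can be illustrated with a discrete Legendre transform. The sketch below is our own illustration (the grids, the choice f “ | ¨ | and the tolerance are ad-hoc assumptions): since f P Γ0 pRq, the corollary and Theorem 6.1.6 predict that conjugating twice recovers f.

```python
import numpy as np

# Discrete sketch of biconjugation: everything is sampled on a grid, so the
# conjugates are only approximated, but f = |.| is recovered exactly here.
x = np.linspace(-3.0, 3.0, 601)
p = np.linspace(-5.0, 5.0, 1001)

def conjugate(vals, grid, dual_grid):
    # discrete Legendre transform: f*(q) = max over the grid of <q, x> - f(x)
    return np.max(dual_grid[:, None] * grid[None, :] - vals[None, :], axis=1)

f = np.abs(x)                       # f in Gamma_0(R)
f_star = conjugate(f, x, p)         # approximates iota_{[-1,1]} (cut off by the grid)
f_bistar = conjugate(f_star, p, x)  # Fenchel-Moreau-Rockafellar: f** = f
assert np.max(np.abs(f_bistar - f)) < 1e-6
```

For a nonconvex f the same procedure would instead return the closed convex envelope of f, in line with Corollary 6.1.7 applied to the convexified function.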

If E is a Hilbert space, we have E “ E ˚ such that we can compare the functions f and
f ˚ . The following lemma shows that there exists exactly one function fulfilling f “ f ˚ .


Lemma 6.1.8 (Self-conjugacy)


Let E be a Hilbert space and let f : E Ñ R be proper. Then f ˚ “ f if and only if f “ 12 } ¨ }2 .

Proof. Let upxq :“ 12 }x}2 .


“ ðù ”: We show that u˚ “ u. For p P E, the concave function

g : E Ñ R, x ÞÑ x p, x y ´ 12 }x}2

attains its maximum in x0 if and only if ∇gpx0 q “ p ´ x0 “ 0, i.e. x0 “ p. Thus

u˚ ppq “ supxPE x p, x y ´ 12 }x}2 “ gppq “ 12 }p}2 .

“ ùñ ”: Let f “ f ˚ be proper. By the Fenchel inequality from Lemma 6.1.4 2 we have


for x “ p P dompf q that

2f pxq “ f ˚ pxq ` f pxq ě x x, x y “ }x}2 “ 2upxq,

that is, f pxq ě upxq for x P dompf q. Since u only takes finite values, it holds that f pxq ě upxq
for all x P E. By Lemma 6.1.4 1 this implies for all p P E that

f ppq “ f ˚ ppq ď u˚ ppq “ uppq,

where we used that u˚ “ u by first part of the proof. We conclude that f ” u. l

Theorem 6.1.9: Conjugate calculus

Let f : E Ñ R be a function, A P LpE; Eq a bijective linear operator, a P R, b P E a


vector and λ P R zt0u.
1 For φpxq :“ f pxq ` a we have φ˚ ppq “ f ˚ ppq ´ a.
2 For φpxq :“ f pλxq we have φ˚ ppq “ f ˚ pλ´1 pq.
3 For φpxq :“ λf pxq we have φ˚ ppq “ λf ˚ pλ´1 pq for λ ą 0.
4 For φpxq :“ f pxq ´ x b, x y we have φ˚ ppq “ f ˚ pp ` bq.
5 For φpxq :“ f px ` bq we have φ˚ ppq “ f ˚ ppq ´ x p, b y.
6 For φpxq :“ f pAxq we have φ˚ ppq “ f ˚ pA´˚ pq, where A´˚ is the inverse of the
adjoint operator of A.

The proof of the theorem is left as Exercise 6.1.16.
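Two of the rules can be spot-checked numerically. The sketch below is our own illustration (the test function f pxq “ ex with f ˚ ppq “ p log p ´ p for p ą 0, the grid and the tolerances are ad-hoc assumptions):

```python
import numpy as np

# Numerical spot check of rules 2 and 3 of Theorem 6.1.9 on E = R.
x = np.linspace(-40.0, 10.0, 500001)
lam = 2.0

def num_conj(vals, p):
    return np.max(p * x - vals)

def fstar(p):
    return p * np.log(p) - p  # conjugate of exp for p > 0

for p in [0.5, 1.0, 4.0]:
    # rule 2: phi(x) = f(lam x)  =>  phi*(p) = f*(p / lam)
    assert abs(num_conj(np.exp(lam * x), p) - fstar(p / lam)) < 1e-6
    # rule 3: phi(x) = lam f(x)  =>  phi*(p) = lam f*(p / lam)
    assert abs(num_conj(lam * np.exp(x), p) - lam * fstar(p / lam)) < 1e-6
```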


(Lecture 19, 07.06.2024)
Theorem 6.1.10: Conjugation and inf-convolution

Let f, g : E Ñ R be proper. Then, the following holds true.


1 It holds pf □gq˚ “ f ˚ ` g ˚ .
2 If E is reflexive, then pf ` gq˚ ď f ˚ □g ˚ .
3 If E is reflexive and f, g P Γ0 pEq, then it holds pf ` gq˚ “ clpf ˚ □g ˚ q.


Proof. 1 Since f and g are proper, both f ˚ and g ˚ cannot attain the value ´8 anywhere
by Lemma 6.1.5 2 .
Using that suprema commute, we have for p P E ˚ that
pf □gq˚ ppq “ supxPE tx p, x y ´ inf yPE pf pyq ` gpx ´ yqqu “ supx,yPE tx p, x y ´f pyq ´ gpx ´ yqu
“ supyPE tx p, y y ´f pyqu ` supxPE tx p, x ´ y y ´gpx ´ yqu
“ supyPE tx p, y y ´f pyqu ` supzPE tx p, z y ´gpzqu “ f ˚ ppq ` g ˚ ppq.

2 Since E is reflexive, ΛE is an isometric isomorphism and we identify E ˚˚ with E via


ΛE . By Lemma 6.1.5, f ˚ and g ˚ can not attain the value ´8 anywhere. If either f ˚
or g ˚ are improper, then f ˚ □g ˚ ” 8, so the inequality holds true.
Now let f ˚ and g ˚ be proper. By applying Lemma 6.1.4 3 we have f ˚˚ `g ˚˚ ď f `g.
By Lemma 6.1.4 1 this entails pf `gq˚ ď pf ˚˚ `g ˚˚ q˚ . Applying the first part of the
theorem to f ˚ and g ˚ instead of f and g implies pf ˚ □g ˚ q˚ “ f ˚˚ ` g ˚˚ . Conjugating
this equality yields
pf ˚˚ ` g ˚˚ q˚ “ pf ˚ □g ˚ q˚˚ ď f ˚ □g ˚ ,
where the inequality is again due to Lemma 6.1.4 2 .
3 Since f, g P Γ0 pEq, it holds by Corollary 6.1.7 that f ˚˚ “ f and g ˚˚ “ g. Thus, it
holds by the first part of the theorem and by Corollary 6.1.7 that

pf ` gq˚ “ pf ˚˚ ` g ˚˚ q˚ “ pf ˚ □g ˚ q˚˚ “ clpf ˚ □g ˚ q,

which concludes the proof. l

If f, g : Rd Ñ R are convex and proper and dompf q X dompgq contains a point at which f
or g is continuous, then we have equality in the second part of the theorem, i.e., it holds
pf ` gq˚ “ f ˚ □g ˚ . For a proof we refer to [Roc70b, Thm. 16.4].
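The identity pf □gq˚ “ f ˚ ` g ˚ can be checked numerically. The sketch below is our own illustration (the quadratics f pxq “ x2 , gpxq “ 2x2 , the grid and the tolerance are ad-hoc assumptions); note f ˚ ppq “ p2 {4 and g ˚ ppq “ p2 {8.

```python
import numpy as np

# Numerical check of Theorem 6.1.10 1 on E = R with quadratics.
x = np.linspace(-30.0, 30.0, 1501)
f = x ** 2

# inf-convolution on the grid: (f box g)(x_i) = min over y of f(y) + g(x_i - y)
fg = np.min(f[None, :] + 2.0 * (x[:, None] - x[None, :]) ** 2, axis=1)

def conj(vals, p):
    return np.max(p * x - vals)

for p in [-3.0, 0.0, 2.0]:
    assert abs(conj(fg, p) - (p ** 2 / 4.0 + p ** 2 / 8.0)) < 1e-2
```

The tolerance is loose because the inf-convolution is computed on the grid; refining the grid reduces the error quadratically.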
Next we are interested in relations between subgradients of functions and their conjugates.

Theorem 6.1.11: Subgradients and conjugates

Let f : E Ñ R and x0 P E with f px0 q P R. Then the Fenchel–Young identity
p0 P Bf px0 q ðñ x p0 , x0 y “ f px0 q ` f ˚ pp0 q (6.3)

holds. If additionally f px0 q “ f ˚˚ ˝ ΛE px0 q, then

p0 P Bf px0 q ðñ ΛE px0 q P Bf ˚ pp0 q (6.4)

and Bf px0 q “ arg maxpPE ˚ x p, x0 y ´f ˚ ppq.

Proof. 1 We have

p0 P Bf px0 q ðñ f pxq ´ x p0 , x y ě f px0 q ´ x p0 , x0 y @x P E.


Since we have equality for x “ x0 , this implies


p0 P Bf px0 q ðñ inf xPE pf pxq ´ x p0 , x yq “ f px0 q ´ x p0 , x0 y,

where the left-hand side equals ´f ˚ pp0 q.

2 Let additionally f px0 q “ f ˚˚ ˝ ΛE px0 q. Using the first part of the theorem, we observe
that both sides cannot hold true if f ˚ pp0 q R R. Therefore, we can assume that f ˚ pp0 q P
R such that we get analogously to the first part that

ΛE px0 q P Bf ˚ pp0 q ðñ x ΛE px0 q, p0 y “ f ˚ pp0 q ` f ˚˚ ˝ ΛE px0 q.

Since f px0 q “ f ˚˚ ˝ ΛE px0 q, this is equivalent to

x p0 , x0 y “ f ˚ pp0 q ` f px0 q ðñ p0 P Bf px0 q,

which yields the claim.


3 Finally, by 1 we have for p0 P E ˚

p0 P Bf px0 q ðñ x p0 , x0 y ´f ˚ pp0 q “ f px0 q “ f ˚˚ ˝ ΛE px0 q “ suppPE ˚ x p, x0 yE ´f ˚ ppq,

which holds true if and only if p0 P arg maxpPE ˚ x p, x0 y ´f ˚ ppq. l
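The Fenchel–Young identity (6.3) is easy to check for a differentiable example. The snippet below is our own illustration (the choice f pxq “ x2 with f ˚ ppq “ p2 {4 is an assumption for the example):

```python
# Spot check of (6.3) on E = R: for f(x) = x^2 the unique subgradient at x0
# is p0 = 2 x0, and f*(p) = p^2 / 4, so <p0, x0> = f(x0) + f*(p0) must hold.
def fenchel_young_gap(x0):
    p0 = 2.0 * x0
    return p0 * x0 - (x0 ** 2 + p0 ** 2 / 4.0)  # <p0,x0> - (f(x0) + f*(p0))

for x0 in [-1.5, 0.0, 2.0]:
    assert abs(fenchel_young_gap(x0)) < 1e-12
```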

The following corollary deals with conjugates of infimal projections which will be used in
the next section on duality theory.
Let F be a normed space. Recall that the dual space of E ˆ F is given by pE ˆ F q˚ “
E ˚ ˆ F ˚ with dual pairing xpp, qq, px, yq yEˆF :“ x p, x yE ` x q, y yF for pp, qq P E ˚ ˆ F ˚ and
px, yq P E ˆ F .
Corollary 6.1.12 (Conjugates of Infimal Projections)
For φ : E ˆ F Ñ R, the infimal projection v : F Ñ R, u ÞÑ inf xPE φpx, uq has the conjugate

v ˚ “ φ˚ p0, ¨q.

If vpuq is finite and there exists a x0 P E such that vpuq “ φpx0 , uq, then

p̂ P Bvpuq ðñ p0, p̂q P Bφpx0 , uq.

Proof. 1 For p P F ˚ we have

v ˚ ppq “ supuPF x p, u yF ´vpuq “ supuPF x p, u yF ´ inf xPE φpx, uq “ supuPF supxPE x p, u yF ´φpx, uq
“ supuPF,xPE x 0, x yE ` x p, u yF ´φpx, uq “ φ˚ p0, pq.

2 Let x0 satisfy vpuq “ φpx0 , uq P R. By Theorem 6.1.11 we have

p̂ P Bvpuq ðñ x p̂, u y “ vpuq ` v ˚ pp̂q “ φpx0 , uq ` φ˚ p0, p̂q.

On the other hand, by the same theorem we have that

p0, p̂q P Bφpx0 , uq ðñ x p̂, u y “ φpx0 , uq ` φ˚ p0, p̂q. l


Exercises

Exercise 6.1.13 Let f, fi : E Ñ R for i P I. If f is even, then so is f ˚ . We have
pinf iPI fi q˚ “ supiPI fi˚ and psupiPI fi q˚ ď inf iPI fi˚ . Give an example where strict inequality
holds. ■

Exercise 6.1.14 Let E be a Hilbert space. Find the conjugate of L : E Ñ R,
x ÞÑ 12 x Ax, x y, where A : E Ñ E is an injective, Hermitian, positive semidefinite operator.
This is a generalization of the mapping 12 } ¨ }2 . ■

Exercise 6.1.15 In general, we have pf ` gq˚ ‰ f ˚ ` g ˚ . ■

Exercise 6.1.16 Prove Theorem 6.1.9. ■

Exercise 6.1.17 (Coerciveness via the conjugate)


Let E be a Hilbert space. Prove that f : E Ñ R is coercive if f ˚ is continuous (and finite)
in 0. ■


6.2 Moreau’s Characterization of Proximal Mappings


In this section, we analyze proximal mappings with the help of conjugates. Here, we will
assume that E is a Hilbert space. The section has two main results which were first
proven by Moreau in [Mor65]. The first one relates the proximal mapping of a function
with the proximal mapping of its dual and is known under the name Moreau decomposition.
The second one characterizes the class of proximal mappings as the class of non-expansive
operators which are the gradient of a convex potential.

Theorem 6.2.1: Moreau’s decomposition

Let E be a Hilbert space and f P Γ0 pEq. Then, the Moreau decomposition
holds true for all x P E:
1 1 f pxq ` 1 f ˚ pxq “ 12 }x}2 ;
2 proxf pxq ` proxf ˚ pxq “ x;
3 there exist unique elements y, z P E such that y ` z “ x and z P Bf pyq. These
elements are y “ proxf pxq and z “ proxf ˚ pxq.

Proof. 1 By Theorem 2.7.4 6 , we know that 1 f ˚ is Frechet differentiable and therefore
continuous. In particular, it holds by Lemma 6.1.8 and Theorem 6.1.10 3 that

1 f ˚ “ clp1 f ˚ q “ clp 12 } ¨ }2 □f ˚ q “ clpp 12 } ¨ }2 q˚ □f ˚ q “ p 12 } ¨ }2 ` f q˚ .

Hence, we have that


1 f pxq ` 1 f ˚ pxq “ 1 f pxq ` p 12 } ¨ }2 ` f q˚ pxq
“ inf yPE t 12 }y ´ x}2 ` f pyqu ` supyPE tx y, x y ´ 12 }y}2 ´ f pyqu
“ inf yPE t 12 }y}2 ´ x y, x y ` 12 }x}2 ` f pyqu ` supyPE tx y, x y ´ 12 }y}2 ´ f pyqu
“ 12 }x}2 ` inf yPE t 12 }y}2 ´ x y, x y `f pyqu ´ inf yPE t 12 }y}2 ´ x y, x y `f pyqu “ 12 }x}2 .

2 Taking the Gateaux derivative in 1 , we obtain by Theorem 2.7.4 that

x ´ proxf pxq ` x ´ proxf ˚ pxq “ x ô x “ proxf pxq ` proxf ˚ pxq.

3 It holds x “ y ` z with z P Bf pyq if and only if we have

x“y`z and x ´ y “ z P Bf pyq, i.e., 0 P y ´ x ` Bf pyq.

By Fermat’s rule this is equivalent to

x “ y ` z and y P arg minwPE t 12 }w ´ x}2 ` f pwqu “ proxf pxq.

Using part two of the theorem this is equivalent to

y “ proxf pxq and z “ proxf ˚ pxq.

This concludes the proof. l


Remark 6.2.2 (Direct proof of 2 )


Part 2 of the theorem can be shown directly via Theorem 6.1.11: Assume that x̂ “
proxf pxq “ arg minyPE t 12 }y ´ x}2 ` f pyqu. Using Fermat’s rule, this is equivalent to

0 P x̂ ´ x ` Bf px̂q ô x ´ x̂ P Bf px̂q.

By Theorem 6.1.11, this is equivalent to

x̂ P Bf ˚ px ´ x̂q ô 0 P px ´ x̂q ´ x ` Bf ˚ px ´ x̂q.

Using Fermat’s rule again, this is equivalent to x ´ x̂ “ arg minyPE t 12 }y ´ x}2 ` f ˚ pyqu “
proxf ˚ pxq. Since we defined x̂ “ proxf pxq, we arrive at

proxf pxq ` proxf ˚ pxq “ x,

which proves 2 . ˝
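For f “ | ¨ | on E “ R, both proximal mappings are explicit and the decomposition can be verified directly. The sketch below uses the standard pair for this example (grid and tolerance are our ad-hoc choices):

```python
import numpy as np

# Illustration of Theorem 6.2.1 2 for f = |.| on R: prox_f is soft-thresholding
# with threshold 1, f* = iota_{[-1,1]}, and prox_{f*} projects onto [-1, 1].
def prox_abs(x):
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def prox_conj(x):
    return np.clip(x, -1.0, 1.0)

x = np.linspace(-5.0, 5.0, 1001)
assert np.allclose(prox_abs(x) + prox_conj(x), x)  # Moreau decomposition
```

This identity is what makes soft-thresholding and clipping interchangeable in proximal algorithms: one is always the identity minus the other.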

For proving the second result, we need two auxiliary lemmas.

Lemma 6.2.3
Let E be a Hilbert space and assume that f P Γ0 pEq. Then

1 1
f ˚ ´ } ¨ }2 is convex ô } ¨ }2 ´ f is convex.
2 2

Proof. “ ùñ ”: Define g “ f ˚ ´ 12 } ¨ }2 . Then, we obtain that g P Γ0 pEq and, by
Theorem 6.1.10 and Corollary 6.1.7, that

f ˚ “ g ` 12 } ¨ }2 “ g ˚˚ ` 12 } ¨ }2 “ pg ˚ □ 12 } ¨ }2 q˚ “ p1 g ˚ q˚ .

Taking the dual and noting that the Moreau envelope is continuous by Theorem 2.7.4 6 ,
we obtain that
f “ f ˚˚ “ p1 g ˚ q˚˚ “ clp1 g ˚ q “ 1 g ˚ .

In particular, it holds that


1
2 }x}2 ´ f pxq “ 12 }x}2 ´ inf yPE t 12 }x ´ y}2 ` g ˚ pyqu
“ supyPE t 12 }x}2 ´ 12 }x}2 ` x x, y y ´ 12 }y}2 ´ g ˚ pyqu
“ supyPE tx x, y y ´g ˚ pyq ´ 12 }y}2 u “ pg ˚ ` 12 } ¨ }2 q˚ pxq,

which is convex.

(Lecture 20, 12.06.2024)

“ ðù ”: Define h “ 12 } ¨ }2 ´ f and note that h is convex by assumption. Since f is proper,
there exists some x0 such that f px0 q P R which implies hpx0 q P R. If f px1 q “ 8 for some
x1 , then hpx1 q “ ´8 such that by convexity x2 “ 2x0 ´ x1 either fulfills hpx2 q “ `8 or

hpx0 q “ hp 12 x1 ` 12 x2 q ď 12 hpx1 q ` 12 hpx2 q “ ´8.


In the first case, we obtain f px2 q “ ´8 which contradicts the assumption that f is proper.
In the second case, we have f px0 q “ 8, which contradicts f px0 q P R. Consequently, it holds
dompf q “ domphq “ E such that Theorem 2.4.4 implies that f (and hence h) is continuous.
In particular, we have that h P Γ0 pEq such that h “ h˚˚ by Theorem 6.1.6.
Therefore, it holds that
f pxq “ 12 }x}2 ´ h˚˚ pxq “ 12 }x}2 ´ supyPE tx x, y y ´h˚ pyqu “ inf yPE t 12 }x}2 ´ x x, y y `h˚ pyqu.

In particular, we obtain with the computation rule for conjugates from Theorem 6.1.9 that
f ˚ pxq “ supyPE x x, y y ´ inf zPE t 12 }y}2 ´ x y, z y `h˚ pzqu
“ supzPE p 12 } ¨ }2 ´ x ¨, z y `h˚ pzqq˚ pxq “ supzPE p 12 } ¨ }2 ´ x ¨, z yq˚ pxq ´ h˚ pzq
“ supzPE 12 }x ` z}2 ´ h˚ pzq “ 12 }x}2 ` supzPE x x, z y ` 12 }z}2 ´ h˚ pzq
“ 12 }x}2 ` ph˚ ´ 12 } ¨ }2 q˚ pxq.

We conclude that

f ˚ ´ 12 } ¨ }2 “ ph˚ ´ 12 } ¨ }2 q˚
is convex since conjugate functions are convex. l

Lemma 6.2.4
Let E be a Hilbert space and let T : E Ñ E be non-expansive such that there exists some
Φ : E Ñ R with T pxq P BΦpxq for all x P E. Then, the following holds true.
1 Φ is Frechet differentiable with gradient ∇Φ “ T .
2 The function g “ 21 Φ˚ ´ 12 } ¨ }2 is convex.

Proof. 1 Let x, x0 P E. Then, T px0 q P BΦpx0 q and T pxq P BΦpxq yields by the definition
of the subdifferential that
Φpxq ´ Φpx0 q ě x T px0 q, x ´ x0 y, and Φpx0 q ´ Φpxq ě x T pxq, x0 ´ x y
ô Φpxq ´ Φpx0 q ď x T pxq, x ´ x0 y .
This implies by Cauchy-Schwartz and since T is non-expansive that
0 ď Φpxq ´ Φpx0 q ´ x T px0 q, x ´ x0 y ď x T pxq ´ T px0 q, x ´ x0 y
ď }T pxq ´ T px0 q}}x ´ x0 } ď }x ´ x0 }2 .
In particular, we have that
limxÑx0 |Φpxq ´ Φpx0 q ´ x T px0 q, x ´ x0 y | { }x ´ x0 } ď limxÑx0 }x ´ x0 } “ 0.

This means that Φ is Frechet differentiable in x0 with gradient ∇Φpx0 q “ T px0 q.


2 By Theorem 2.7.4 6 we know that 1 Φ˚ is differentiable with

∇1 Φ˚ pxq “ x ´ proxΦ˚ pxq “ x ´ px ´ proxΦ pxqq “ proxΦ pxq,


where we used the Moreau decomposition from Theorem 6.2.1. Consequently, g “


21 Φ˚ ´ 12 } ¨ }2 is differentiable with ∇gpxq “ 2 proxΦ pxq ´ x.
Now let x1 , x2 P E and let zi “ proxΦ pxi q for i “ 1, 2. By the definition of the prox
and by setting the derivative to zero, this yields

zi “ arg minyPE t 12 }y ´ xi }2 ` Φpyqu ô xi ´ zi “ ∇Φpzi q “ T pzi q.

Since T is non-expansive, this yields that

}z1 ´ x1 ´ z2 ` x2 }2 “ }T pz2 q ´ T pz1 q}2 ď }z2 ´ z1 }2 “ }z1 ´ z2 }2

such that
0 ď }z1 ´ z2 }2 ´ }z1 ´ z2 ´ px1 ´ x2 q}2
“ }z1 ´ z2 }2 ´ }z1 ´ z2 }2 ` 2 x z1 ´ z2 , x1 ´ x2 y ´}x1 ´ x2 }2
“ x 2z1 ´ 2z2 , x1 ´ x2 y ´ x x1 ´ x2 , x1 ´ x2 y
“ x 2z1 ´ x1 ´ p2z2 ´ x2 q, x1 ´ x2 y
“ x ∇gpx1 q ´ ∇gpx2 q, x1 ´ x2 y .

By the first order criterion from Theorem 2.3.9, this implies that g is convex. l

Now, we can prove that an operator is a proximal mapping if and only if it is non-expansive
and the gradient of a convex potential.

Theorem 6.2.5: Characterization of the Proximal Mappings


Let E be a Hilbert space and let T : E Ñ E. Then the following are equivalent.
1 There exists some f P Γ0 pEq such that T “ proxf .
2 T is non-expansive and there exists some convex function Φ : E Ñ R such that
T pxq P BΦpxq for all x P E.
3 T is non-expansive and there exists some convex, Frechet differentiable func-
tion Φ : E Ñ R such that T pxq “ ∇Φpxq for all x P E.

Proof. The direction “ 2 ô 3 ” is clear by Theorem 3.1.7 and Lemma 6.2.4.


“ 1 ùñ 3 ”: Let T “ proxf for some f P Γ0 pEq and define Φ “ 12 } ¨ }2 ´ 1 f . Then, we
have by Theorem 6.1.10 and Lemma 6.1.8 that
p1 f q˚ “ p 12 } ¨ }2 □f q˚ “ 12 } ¨ }2 ` f ˚
such that p1 f q˚ ´ 12 } ¨ }2 “ f ˚ is convex. Thus, Lemma 6.2.3 implies that Φ is convex.
Moreover, it holds by Theorem 2.7.4 that Φ is Frechet differentiable with gradient

∇Φpxq “ x ´ px ´ proxf pxqq “ proxf pxq “ T pxq.

“ 3 ùñ 1 ”: Let T be non-expansive and assume that there exists some Frechet


differentiable Φ : E Ñ R such that T “ ∇Φ. Then, by Lemma 6.2.4 2 , we have that


21 Φ˚ ´ 12 } ¨ }2 is convex, which implies by Lemma 6.2.3 that 12 } ¨ }2 ´ p21 Φ˚ q˚ is convex.
Further, it holds by Theorem 6.1.10 and Lemma 6.1.8 that

p1 Φ˚ q˚ “ p 12 } ¨ }2 □Φ˚ q˚ “ 12 } ¨ }2 ` Φ˚˚ “ 12 } ¨ }2 ` Φ,

where the last statement follows from the fact that Φ is convex and Frechet differentiable
and therefore Φ P Γ0 pEq. This implies that
p21 Φ˚ q˚ “ 2p1 Φ˚ q˚ p 12 ¨q “ 2Φp 12 ¨q ` } 12 ¨ }2 “ 2Φp 12 ¨q ` 14 } ¨ }2 .
Hence, we have that the function
1
2 } ¨ }2 ´ p21 Φ˚ q˚ “ 14 } ¨ }2 ´ 2Φp 12 ¨q “ 2p 12 } 12 ¨ }2 ´ Φp 12 ¨qq
is convex. This implies that also 12 } ¨ }2 ´ Φ is convex. Again by Lemma 6.2.3 this implies
that f :“ Φ˚ ´ 12 } ¨ }2 is convex such that
Φ “ Φ˚˚ “ p 12 } ¨ }2 ` f q˚ “ 12 } ¨ }2 □f ˚ “ 1 f ˚ .
Finally, this implies by Theorem 2.7.4 that

T pxq “ ∇Φpxq “ ∇1 f ˚ pxq “ x ´ proxf ˚ pxq “ x ´ px ´ proxf pxqq “ proxf pxq.

This finishes the proof. l

The proof implies in particular that the function f from the first item of Theorem 6.2.5 can
be expressed by the potential Φ from the second and third item by

f pxq “ Φ˚ pxq ´ 12 }x}2 .
In the case that E “ R, the characterization of proximal mappings can be simplified.
Corollary 6.2.6
Let f : R Ñ R. Then, there exists some g P Γ0 pRq such that f “ proxg if and only if f is
non-expansive and monotone increasing.

The proof is left as Exercise 6.2.7.
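The corollary gives a cheap way to recognize one-dimensional proximal mappings. As our own illustration (the choice T “ tanh is an assumption for the example, not from the script): tanh has derivative 1 ´ tanh2 P p0, 1s, so it is increasing and non-expansive and hence equals proxg for some g P Γ0 pRq. A quick grid check:

```python
import numpy as np

# Grid check for Corollary 6.2.6: tanh is monotone increasing and 1-Lipschitz.
x = np.linspace(-10.0, 10.0, 10001)
t = np.tanh(x)
steps = np.diff(t)
assert np.all(steps > 0)                    # monotone increasing
assert np.all(steps <= np.diff(x) + 1e-12)  # nonexpansive
```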

Exercises

Exercise 6.2.7 Prove Corollary 6.2.6. ■


6.3 Relations of Positively Homogeneous Functions to


Support and Indicator Functions
Textbooks: [HUL93, Sec. V.2], [Z0̆2, p. 79] and [BS13, Ex. 2.115, Prop. 2.116].
In this subsection, we deal with the close relation between positively homogeneous functions,
support functions and indicator functions.

Definition 6.3.1 (Support function)


The support function of a set S Ă E is defined as the conjugate of the indicator function,
i.e.,
σS : E ˚ Ñ R, σS ppq “ ι˚S ppq “ supxPE x p, x y ´ιS pxq “ supxPS x p, x y .

Fig. 6.3: Illustration of the support function of a non-convex set C Ă R2 .

By definition, σS is convex, lower semicontinuous and positively homogeneous, i.e., σS pcpq “


cσS ppq for all c ě 0. If S ‰ H is convex and bounded, then σS is proper and

σS˚ ˝ ΛE “ ι˚˚S ˝ ΛE “ clpιS q.

In particular, we have that σS˚ ˝ΛE “ ιS if and only if S is additionally closed. Summarizing,
indicator and support function of a non-empty, bounded, closed, convex set are conjugate
to each other.
(Lecture 21, 14.06.2024)

Example 6.3.2 Let K Ď E be a non-empty, closed, convex cone. Then we have that
ι˚K ppq “ σK ppq “ ιK ˚ ppq.

This can be seen as follows: Assume that p P K ˚ :“ tq P E ˚ : xq, xy ď 0 @x P Ku. Then


σK ppq “ supxPK xp, xy “ 0 “ ιK ˚ ppq, where the supremum is attained for x “ 0. Assume
that p R K ˚ . Then there exists x P K such that xp, xy ą 0. Since λx belongs to K for all
λ ě 0 this implies that σK ppq “ `8 “ ιK ˚ ppq. ˛

The following lemma relates positively homogeneous functions with support functions.


Lemma 6.3.3 (Conjugate of positively homogeneous functions)


Let f : E Ñ R be proper, convex and positively homogeneous. Then

f ˚ “ ιC f , where Cf :“ tp P E ˚ : x p, x y ď f pxq @x P Eu.

and consequently f “ σCf if f is additionally lower semicontinuous.

Proof. Let p P E ˚ . Since f is positively homogeneous, for all λ ą 0 we have

f ˚ ppq “ supxPE x p, x y ´f pxq “ supxPE x p, x y ´λf px{λq “ supyPE x p, λy y ´λf pyq “ λf ˚ ppq,

implying that f ˚ ppq P t0, 8u.


Suppose that f ˚ ppq “ 0. Then supxPE x p, x y ´f pxq “ 0 and thus p P Cf . Conversely, if
p P Cf , then x p, x y ´f pxq ď 0 for all x P E. Thus f ˚ ppq “ supxPE x p, x y ´f pxq P t0, 8u
must be zero. l
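For f “ } ¨ }1 on R2 , the set Cf is the unit ball of the dual norm } ¨ }8 , which can be checked on a grid. The sketch below is our own illustration (grids and tolerances are ad-hoc assumptions):

```python
import numpy as np

# Numerical illustration of Lemma 6.3.3: f = |.|_1 on R^2 is proper, convex
# and positively homogeneous, so f* = iota_{C_f} with C_f = { p : |p|_inf <= 1 }.
g = np.linspace(-20.0, 20.0, 201)
X1, X2 = np.meshgrid(g, g)
X = np.stack([X1.ravel(), X2.ravel()], axis=1)

def conj(p):
    return np.max(X @ p - np.abs(X).sum(axis=1))

assert abs(conj(np.array([0.7, -1.0]))) < 1e-9  # |p|_inf <= 1: f*(p) = 0
assert conj(np.array([1.5, 0.0])) >= 9.99       # |p|_inf > 1: grows with the grid
```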

Corollary 6.3.4 (Subdifferential of a positively homogeneous function)


Let f : E Ñ R be proper, convex and positively homogeneous. For x0 P dompf q we have

Bf px0 q “ tp P Cf : f px0 q “ x p, x0 yu.

In particular, if f p0q “ 0, then Bf p0q “ Cf .

Proof. The Fenchel-Young identity from Theorem 6.1.11 states that

p P Bf px0 q ðñ x p, x0 y “ f px0 q ` f ˚ ppq.

Since f ˚ “ ιCf by Lemma 6.3.3, the equality on the right holds if and only if p P Cf and
f px0 q “ x p, x0 y. l

We exemplify the two previous statements.


Example 6.3.5 (Conjugate and subdifferentiable of a norm)
Consider the function f P Γ0 pEq given by f pxq “ }x}. Clearly, f is positively homogeneous.
Thus, applying Lemma 6.3.3 yields that its conjugate is given by

f ˚ “ ιC}¨} ,

where

C}¨} “ tp P E ˚ : x p, x y ď }x} @x P Eu “ tp P E ˚ : }p}˚ ď 1u “ B1,}¨}˚ p0q Ă E ˚ ,

where } ¨ }˚ :“ supxPE:}x}ď1 x ¨, x y is the dual norm on E ˚ .


Further, by Corollary 6.3.4 we have B} ¨ }p0q “ B1,}¨}˚ p0q and for x0 ‰ 0 that

B} ¨ }px0 q “ tp P E ˚ : }p}˚ ď 1, x p, x0 y “ }x0 }u “ tp P E ˚ : }p}˚ “ 1, x p, x0 y “ }x0 }u ,

which is the intersection of the unit sphere in E ˚ with a hyperplane. ˛


Exercises

Exercise 6.3.6 What is the support function of a linear subspace, a half-space or the kernel
of a linear operator? ■

Exercise 6.3.7 (Support function calculus) For closed convex sets S, T Ă E, is σS `


σT “ σS`T ? Characterize S Ă T in terms of σS and σT . ■


7 Duality Theory
Textbooks: [BS13, Sec. 2.5], [BP12, Subsec. 3.2.1], [AE06, Sec. 4.4] and [ET99, Chp. III].
To a convex minimization problem we can associate a “dual” maximization problem using
the conjugates of the functions. The original problem is called the “primal” problem. We
examine the relationship between the optimal values and solution of these two problems.
Throughout this section, we will assume that E and F are reflexive normed spaces.

7.1 Conjugate Duality


Textbooks: [ET99, Sec. III.1 – III.2].
Let E and F be normed spaces. We want to minimize f : E Ñ R. To this end, we consider
a family of “perturbed versions” of f , i.e., we consider a function φ : E ˆ F Ñ R with
φp¨, 0q “ f . Then, we define the primal problems v : F Ñ R as

vpuq :“ inf xPE φpx, uq (Pu )

vp0q “ inf xPE φpx, 0q “ inf xPE f pxq (P )

and the corresponding solutions as

solpPu q :“ arg min φpx, uq, solpP q :“ arg min φpx, 0q.
xPE xPE

By Corollary 6.1.12 we know that v ˚ “ φ˚ p0, ¨q. Thus for u P F we have

v ˚˚ puq “ suppPF ˚ x p, u y ´v ˚ ppq “ suppPF ˚ x p, u y ´φ˚ p0, pq.

This leads us to the conjugate dual problems of (P ) and (Pu ) defined as

v ˚˚ puq :“ suppPF ˚ x p, u y ´φ˚ p0, pq “ ´ inf pPF ˚ pφ˚ p0, pq ´ x p, u yq (Du )

v ˚˚ p0q :“ suppPF ˚ ´φ˚ p0, pq “ ´ inf pPF ˚ φ˚ p0, pq (D)

and their corresponding solutions

solpDu q :“ arg max x p, u y ´φ˚ p0, pq, solpDq :“ arg min φ˚ p0, pq.
pPF ˚ pPF ˚

To keep the notations short, we will use the notation

v˚ ppq :“ inf qPF ˚ φ˚ pp, qq such that v ˚˚ p0q “ ´v˚ p0q.

If we had v P Γ0 pF q, we would obtain that v ˚˚ “ v such that the infima in (P )


and (Pu ) coincide with the suprema from (D) and (Du ). However, even for φ P Γ0 pE ˆ F q
we know in general only that vpuq “ inf xPE φpx, uq is convex as inf projection of a convex
function, see Theorem 2.6.2.


Therefore we can only guarantee that

´8 ď v ˚˚ puq ď vpuq ď 8, (7.1)

where the middle inequality is due to Lemma 6.1.4 3 .
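The weak duality estimate (7.1) can be probed numerically. The following Python sketch uses a hand-picked perturbation φpx, uq “ p1{2qpx ´ uq2 ` |x| and finite grids; this choice of φ and all grid sizes and tolerances are illustrative assumptions, not taken from the text.

```python
# Weak duality (7.1) on finite grids, for the illustrative perturbation
#   phi(x, u) = 0.5*(x - u)**2 + |x|
# (this phi, the grids and the tolerances are ad-hoc choices, not from the text).
# We approximate v(0) = inf_x phi(x, 0) and the dual value sup_p -phi*(0, p),
# where phi*(0, p) = sup_{x, u} p*u - phi(x, u).

def phi(x, u):
    return 0.5 * (x - u) ** 2 + abs(x)

grid = [k / 20 for k in range(-100, 101)]        # x- and u-grid on [-5, 5]

primal = min(phi(x, 0.0) for x in grid)          # v(0); here 0 at x = 0

def phi_conj_0(p):                               # phi*(0, p) on the grid
    return max(p * u - phi(x, u) for x in grid for u in grid)

dual = max(-phi_conj_0(p) for p in [k / 10 for k in range(-20, 21)])

assert dual <= primal + 1e-9                     # weak duality (7.1)
# For this phi there is no duality gap: both values equal 0.
assert abs(primal) < 1e-9 and abs(dual) < 1e-9
```

For this particular φ the two values coincide, so strong duality holds; the assertions only certify the weak inequality plus this coincidence on the chosen grids.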

Definition 7.1.1 (Duality gap, strong duality)


The term vpuq ´ v ˚˚ puq ě 0 (whenever it is defined) is the duality gap. If the duality gap
is zero, we speak of strong duality between the conjugate dual and the primal problems.

The next theorem relates the primal and dual problem.

Theorem 7.1.2: Conjugate duality

Let E and F be reflexive, let φ : E ˆ F Ñ R be proper and define v as in (Pu ). Then,


the following holds true.
1 If v ˚˚ puq is finite, then solpDu q “ Bv ˚˚ puq.
2 If v ˚˚ puq “ vpuq is finite, then solpDu q “ Bvpuq.
3 If Bvpuq ‰ H, then vpuq “ v ˚˚ puq is a finite value.
4 The following equivalence holds true.
vpuq “ v ˚˚ puq, x̂ P solpPu q and p̂ P solpDu q ô φpx̂, uq ` φ˚ p0, p̂q “ x p̂, u y ô p0, p̂q P Bφpx̂, uq

Proof. 1 Let u P F with v ˚˚ puq P R. Then v ˚ is proper by Lemma 6.1.5 and therefore
v ˚ P Γ0 pF ˚ q such that v ˚ “ v ˚˚˚ . Then, the last statement in Theorem 6.1.11 applied
to f “ v ˚˚ yields
Bv ˚˚ puq “ arg max x p, u y ´v ˚˚˚ ppq “ arg max x p, u y ´v ˚ ppq “ solpDu q,
pPF ˚˚˚ pPF ˚

where we used that by the reflexivity of F it holds that F ˚˚˚ “ F ˚ .


2 Let u P F such that vpuq “ v ˚˚ puq is finite. As in 1 we obtain that v ˚ P Γ0 pF ˚ q
and v ˚˚ P Γ0 pF ˚˚ q. Thus applying the second part of Theorem 6.1.11 for v ˚˚ and v ˚
yields

1 (6.4) (6.4)
p̂ P solpDu q “ Bv ˚˚ puq ðñ u P Bv ˚˚˚ pp̂q “ Bv ˚ pp̂q ðñ p̂ P Bvpuq.

3 Let Bvpuq ‰ H. Then, by definition of the subdifferential, vpuq must be finite. Let
p̂ P Bvpuq. By the Fenchel–Young identity from Theorem 6.1.11 we get
(7.1)
x p̂, u y ´v ˚ pp̂q “ vpuq ě v ˚˚ puq “ sup x p, u y ´v ˚ ppq.
pPF ˚

Hence p̂ P arg maxpPF ˚ x p, u y ´v ˚ ppq. Consequently,

v ˚˚ puq “ x p̂, u y ´v ˚ pp̂q “ vpuq.


4 Let u P F .

" i ùñ ii ": Suppose that vpuq “ v ˚˚ puq, x̂ P solpPu q and p̂ P solpDu q. Since φ is
proper, we have that φ˚ is proper such that

´8 ă φpx̂, uq “ vpuq “ v ˚˚ puq “ x p̂, u y ´φ˚ p0, p̂q ă 8.

In particular, vpuq “ v ˚˚ puq is finite. By 2 this implies that p̂ P Bvpuq and thus the
Fenchel–Young identity from Theorem 6.1.11 yields

x p̂, u y “ vpuq ` v ˚ pp̂q “ φpx̂, uq ` φ˚ p0, p̂q.

" ii ùñ i ` iii ": Let x̂ P E and p̂ P F ˚ such that

x p̂, u y “ φpx̂, uq ` φ˚ p0, p̂q. (7.2)

In particular, φpx̂, uq and φ˚ p0, p̂q are finite. Since vpuq ď φpx̂, uq by definition, we
obtain
vpuq ` v ˚ pp̂q ď φpx̂, uq ` φ˚ p0, p̂q “ x p̂, u y .
Thus, by the Fenchel inequality from Lemma 6.1.4 2 we derive

vpuq ` v ˚ pp̂q “ x p̂, u y . (7.3)

By (7.2) and since it holds v ˚ pp̂q “ φ˚ p0, p̂q by Corollary 6.1.12 this yields

vpuq ` v ˚ pp̂q “ φpx̂, uq ` φ˚ p0, p̂q “ φpx̂, uq ` v ˚ pp̂q

such that φpx̂, uq “ vpuq and x̂ P solpPu q.


Further, we obtain by (7.3) and the Fenchel–Young identity from Theorem 6.1.11
applied for v that p̂ P Bvpuq. Using the part 3 and 2 , we obtain that p̂ P Bvpuq “
solpDu q. Finally, (7.3) implies

φpx̂, uq ` φ˚ p0, p̂q “ vpuq ` v ˚ pp̂q “ x p̂, u y “ xpx̂, uq, p0, p̂q y .

By the Fenchel–Young identity applied for φ this yields that p0, p̂q P Bφpx̂, uq.
" iii ùñ ii ": Let p0, p̂q P Bφpx̂, uq. Then, in particular, φpx̂, uq is finite. Thus, the
Fenchel–Young identity yields that φpx̂, uq ` φ˚ p0, p̂q “ x p̂, u y. l

Since Young’s inequality entails φpx̂, uq ` φ˚ p0, p̂q ě x x̂, 0 y ` x u, p̂ y, the second condition
from 4 is sometimes called extremality condition. For the special case u “ 0, part 4 of the
theorem states in particular that f px̂q “ ´φ˚ p0, p̂q implies x̂ P arg minpf q.
The following example shows that the statements of the theorem no longer hold true if we
drop some of the assumptions.
Example 7.1.3
Consider the function
φ : R ˆ R Ñ R, px, uq ÞÑ ex if ex ď u, and 8 otherwise.


It is easy to check that φ P Γ0 pR ˆ Rq. Then

vpuq “ inf xPR φpx, uq “ ιRą0 puq

and thus Bvp0q “ H and Bvp1q “ t0u and vp0q “ 8, vp1q “ 0. Furthermore,
v ˚ ppq “ sup xPR tpx ´ ιRą0 pxqu “ sup xą0 px “ ιRď0 ppq.

By Example 6.3.2 we obtain that v ˚˚ “ ιRě0 .


Thus, v ˚˚ p0q “ v ˚˚ p1q “ 0, so we have a duality gap of vp0q ´ v ˚˚ p0q “ 8 between (P ) and
(D), but no duality gap between pP1 q and pD1 q, i.e., vp1q ´ v ˚˚ p1q “ 0.
Furthermore, Bv ˚˚ p0q “ p´8, 0s and Bv ˚˚ p1q “ t0u. Hence, in accordance with Theo-
rem 7.1.2, we have solpDq “ Bv ˚˚ p0q ‰ Bvp0q “ H and solpD1 q “ t0u “ Bv ˚˚ p1q “ Bvp1q. ˛
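The conjugate formulas in Example 7.1.3 can be checked on truncated grids. The following Python snippet is a rough numerical sanity check (grid ranges and tolerances are ad hoc), where the value 8 is reflected by quantities that blow up with the grid.

```python
# Grid-based sanity check for Example 7.1.3: v = iota_{(0, inf)}, hence
#   v*(p)  = sup_{u > 0} p*u  = iota_{R_{<=0}}(p),
#   v**(u) = sup_{p <= 0} p*u = iota_{R_{>=0}}(u).
# Infinite values show up as quantities that blow up with the truncated grid.

us = [k / 100 for k in range(1, 1001)]       # u in (0, 10]
ps = [-k / 100 for k in range(0, 1001)]      # p in [-10, 0]

def v_star(p):
    return max(p * u for u in us)

def v_star_star(u):
    return max(p * u - v_star(p) for p in ps)

assert abs(v_star(-1.0)) < 0.05              # v*(-1) = 0
assert v_star(1.0) > 1.0                     # v*(1) = +inf on the true domain
assert abs(v_star_star(1.0)) < 0.05          # v**(1) = 0 = v(1): no gap at u = 1
assert v_star_star(-1.0) > 1.0               # v**(-1) = +inf, matching iota_{R>=0}
```

In particular, the gap vp0q ´ v ˚˚ p0q “ 8 comes entirely from vp0q “ 8, which no finite grid can display directly.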

We are mainly interested in the solutions of the primal problem (P ) for u “ 0. In this case,
the following corollary reformulates the findings from the previous theorem for φ P Γ0 pE ˆ F q.
Corollary 7.1.4 (Conjugate duality for the unperturbed problem)
Let φ P Γ0 pE ˆ F q.
1 If F is finite-dimensional and 0 P ripdompvqq or 0 P ripdompv˚ qq, then vp0q “ v ˚˚ p0q.
2 If F is finite-dimensional and 0 P ripdompvqq and vp0q is finite, then vp0q “ v ˚˚ p0q
and solpDq “ Bvp0q ‰ H.
3 If F is finite-dimensional and 0 P ripdompv˚ qq and v ˚˚ p0q is finite, then vp0q “ v ˚˚ p0q
and solpP q “ Bv˚ p0q ‰ H.
4 The following equivalence holds true.
vp0q “ v ˚˚ p0q, x̂ P solpP q and p̂ P solpDq ô px̂, 0q P Bφ˚ p0, p̂q ô p0, p̂q P Bφpx̂, 0q.

Proof. Since φ is convex, so is v by Theorem 2.6.2.


• Let 0 P ripdompvqq. Then in particular vp0q ă 8. If vp0q “ ´8, then vp0q “ v ˚˚ p0q “
´8. Otherwise we have vp0q ą ´8 such that v is proper by Lemma 2.2.2. Since F is
finite-dimensional we obtain by Corollary 3.1.6 that Bvp0q ‰ H. By Theorem 7.1.2 3
this implies that vp0q “ v ˚˚ p0q is finite such that Theorem 7.1.2 2 yields solpDq “
Bvp0q. This shows the claim 2 and 1 for 0 P ripdompvqq.
• Let 0 P ripdompv˚ qq. Note that v˚ plays the role of the primal problem (Pu ) for the
function φ˚ instead of the function φ, with the perturbation now in the first argument.
If v ˚˚ p0q “ ´v˚ p0q is finite, we know by Lemma 2.2.2
that v˚ is proper. Further, we can conclude as in the previous part of the proof that
v˚ p0q “ v˚˚˚ p0q and Bv˚ p0q ‰ H by Corollary 3.1.6. Since
v˚˚˚ p0q “ sup xPE ´v˚˚ pxq “ sup xPE ´φ˚˚ px, 0q “ ´ inf xPE φpx, 0q,
where we used 6.1.12 and 6.1.6,

we obtain by Theorem 7.1.2 2 for v˚ instead of v that solpP q “ Bv˚ p0q, which
shows 3 .


If v˚ p0q “ ´8, then v ˚˚ p0q “ 8 and therefore vp0q ě v ˚˚ p0q “ 8 such that
vp0q “ 8 “ v ˚˚ p0q. This shows the claim 1 for 0 P ripdompv˚ qq.
• Finally, we prove 4 . Putting u “ 0 in Theorem 7.1.2 4 yields the equivalence of
the first and the third statement. Since φ P Γ0 pE ˆ F q, (6.4) yields the equivalence of
the second and the third statement. l


7.2 Fenchel-type Duality


Textbooks: [ET99, Sec. III.4, Sec. VII.4] and [BP12, Subsec. 3.2.2].
Lecture 22, 19.06.2024
We consider a special case of the primal and dual problem, which was first considered by
Werner Fenchel. It covers many functionals arising in image processing approaches
as well as the linear programming setting. Let g P Γ0 pEq, h P Γ0 pF q, let A P LpE, F q be a
continuous linear operator, and c P E ˚ and b P F . Then, we are interested in minimizing the
functional

f : E Ñ R, f pxq “ x c, x y `gpxq ` hpb ´ Axq.

To this end, we consider the perturbation φ defined by


φ : E ˆ F Ñ R, px, uq ÞÑ x c, x y `gpxq ` hpb ´ Ax ` uq.

Then, the primal problems (P ) and (Pu ) read as


vpuq “ inf xPE x c, x y `gpxq ` hpb ´ Ax ` uq,
vp0q “ inf xPE x c, x y `gpxq ` hpb ´ Axq. (7.4)

By the conjugate calculus rule from Theorem 6.1.9 4 and 5 we get for y P E ˚ and p P F ˚
that

φ˚ py, pq “ sup xPE,uPF tx y, x y ` x p, u y ´ x c, x y ´gpxq ´ hpb ´ Ax ` uqu
“ sup xPE tx y, x y ´ x c, x y ´gpxq ` sup uPF tx p, u y ´hpb ´ Ax ` uquu
“ sup xPE tx y, x y ´ x c, x y ´gpxq ` h˚ ppq ´ x p, b ´ Ax yu
“ h˚ ppq ´ x p, b y ` sup xPE tx y, x y ´ x c, x y ´gpxq ` x p, Ax yu
“ h˚ ppq ´ x p, b y ` sup xPE tx y, x y ´gpxq ´ x c ´ A˚ p, x yu
“ h˚ ppq ´ x p, b y ` g ˚ py ´ c ` A˚ pq,

where the inner supremum in the second line equals phpb ´ Ax ` ¨qq˚ ppq “ h˚ ppq ´ x p, b ´ Ax y
and the supremum in the second to last line equals px c ´ A˚ p, ¨ y ` gq˚ pyq “ g ˚ py ´ c ` A˚ pq.
Thus, the dual problems (D) and (Du ) can be rewritten as

v ˚˚ puq “ sup x p, u y ´h˚ ppq ` x p, b y ´g ˚ pA˚ p ´ cq.


pPF ˚

v ˚˚ p0q “ sup ´g ˚ pA˚ p ´ cq ´ h˚ ppq ` x p, b y . (7.5)


pPF ˚

Corollary 7.1.4 can be rewritten in our special setting as follows.


Corollary 7.2.1 (Fenchel conjugate duality)
1 If F is finite-dimensional and b P ri pA dompgq ` domphqq or c P ri pA˚ domph˚ q ´ dompg ˚ qq, then vp0q “ v ˚˚ p0q.
2 If F is finite-dimensional and b P ri pA dompgq ` domphqq and vp0q is finite, then
solpDq “ Bvp0q ‰ H.


3 If F is finite-dimensional and c P ri pA˚ domph˚ q ´ dompg ˚ qq and v ˚˚ p0q is finite, then
solpP q “ Bv˚ p0q ‰ H.
4 The following equivalence holds true.
vp0q “ v ˚˚ p0q, x̂ P solpP q and p̂ P solpDq ô px̂, 0q P Bφ˚ p0, p̂q ô p0, p̂q P Bφpx̂, 0q.

Proof. By Corollary 7.1.4 we only have to verify the two relations


0 P ri pdompvqq ðñ b P ri pA dompgq ` domphqq ,
0 P ri pdompv˚ qq ðñ c P ri pA˚ domph˚ q ´ dompg ˚ qq .

We verify only the first relation. The second one follows analogously. By the definition of
the relative interior we have 0 P ri pdompvqq if and only if 0 P dompvq and there exists an
ε ą 0 such that vpuq ă 8 for all u P Bε p0q X aff pdompvqq. This is the case if and only if
for any such u there is an xu P dompgq with b ´ Axu ` u P domphq, that is,

domphq X pb ´ A dompgq ` uq ‰ H @u P Bε p0q X aff pdompvqq,

that is,

b ` u P A dompgq ` domphq @u P Bε p0q X aff pdompvqq.

By the definition of the relative interior this is equivalent to b P ri pA dompgq ` domphqq. l


Example 7.2.2 (Dual problem to a linear problem with inequality constraint)


Let E “ Rd , F “ Rm and A P Rmˆd . We are interested in the solution of the following
constrained problem
inf x c, x y subject to Ax ě b. (7.6)
xPRd

Problems like this are called linear programs (in standard form) and play an important
role in discrete and combinatorial optimization. The constraint Ax ě b ensures that x is
contained in a polyhedron.
In our notation, we can rewrite the linear program equivalently as the unconstrained
problem
inf xPRd x c, x y ` ιRmď0 pb ´ Axq.

This fits in the above setting (7.4) with g ” 0 and h :“ ιRmď0 . By Example 6.3.2, we obtain
that g ˚ “ ιt0u and h˚ “ ιRmě0 . Thus, the dual problem (7.5) is given by

sup pPRm x p, b y ´ ιRmě0 ppq ´ ιt0u pA˚ p ´ cq.

This can be reformulated as

sup pPRm x p, b y subject to p ě 0 and A˚ p “ c,

which is exactly the dual form of the linear program (7.6) that is known from linear opti-
mization. ˛
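For a concrete feel, the following Python snippet instantiates (7.6) with hand-picked data (A the identity, so both optimal solutions can be read off directly; all numbers are illustrative) and verifies feasibility, strong duality and complementary slackness.

```python
# Tiny instance of the linear program (7.6) and its dual, with all data chosen
# by hand for illustration:
#   primal:  min <c, x>   s.t.  A x >= b
#   dual:    max <p, b>   s.t.  p >= 0,  A^T p = c
# For A = I (and c >= 0) the primal solution is x = b and the dual solution is p = c.

A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 2.0]
c = [3.0, 4.0]

x_hat = [1.0, 2.0]
p_hat = [3.0, 4.0]

# feasibility of both candidates
Ax = [sum(aij * xj for aij, xj in zip(row, x_hat)) for row in A]
ATp = [sum(A[i][j] * p_hat[i] for i in range(2)) for j in range(2)]
assert all(axi >= bi for axi, bi in zip(Ax, b))          # A x_hat >= b
assert all(pi >= 0.0 for pi in p_hat) and ATp == c       # p_hat >= 0, A^T p_hat = c

primal_value = sum(ci * xi for ci, xi in zip(c, x_hat))  # <c, x_hat>
dual_value = sum(pi * bi for pi, bi in zip(p_hat, b))    # <p_hat, b>

# Strong duality: equal objective values certify optimality of both candidates.
assert primal_value == dual_value == 11.0
# Complementary slackness: <p_hat, A x_hat - b> = 0.
assert sum(pi * (axi - bi) for pi, axi, bi in zip(p_hat, Ax, b)) == 0.0
```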


7.3 Lagrangian Duality


Textbooks: [BS13, Subsec. 2.5.3], [ET99, Sec. III.3].
In this section, we consider another specific choice for φ leading to the so-called Lagrangian
duality. Traditionally, this type of duality was used to derive optimality conditions for
constrained optimization problems. For proper functions g : E Ñ R and h : F Ñ R and an
arbitrary function G : E Ñ F , we aim to solve the Lagrange minimization problem

inf f pxq, where f pxq “ gpxq ` hpGpxqq. (7.7)


xPE

Note that Gpxq :“ b ´ Ax leads again to a Fenchel problem (7.4) with c “ 0 handled in
the previous subsection.

Definition 7.3.1 (Feasible set / vector for (7.7))


The set G :“ dompf q “ tx P dompgq : Gpxq P domphqu is the feasible set for (7.7) and its
elements are feasible vectors.

In order to solve this kind of problem, we consider the perturbation function

φ : E ˆ F Ñ R, px, uq ÞÑ gpxq ` hpGpxq ` uq.

This leads to the Lagrange primal problems associated to (7.7) given by

vpuq :“ inf gpxq ` hpGpxq ` uq (L-Pu )


xPE

vp0q “ inf gpxq ` hpGpxqq. (L-P )


xPE

To formulate the dual problem (Du ) we calculate for p P F ˚ , again using the conjugate
calculus rule Theorem 6.1.9 4 ,
φ˚ p0, pq “ sup xPE sup uPF tx u, p y ´gpxq ´ hpGpxq ` uqu “ h˚ ppq ´ inf xPE tgpxq ` x p, Gpxq yu .

Definition 7.3.2 (Lagrangian, Lagrange multiplier)


The (standard) Lagrangian associated with (L-Pu ) is

L : E ˆ F ˚ Ñ R, px, pq ÞÑ gpxq ` x p, Gpxq y

and p P F ˚ is the Lagrange multiplier or dual variable.
We can thus rewrite the dual problems (Du ) and (D) as
! )
v ˚˚ puq “ sup x p, u y ´h˚ ppq ` inf Lpx, pq , (L-Du )
pPF ˚ xPE
! ) ! )
v ˚˚ p0q “ sup ´ h˚ ppq ` inf Lpx, pq “ ´ inf˚ h˚ ppq ´ inf Lpx, pq . (L-D)
pPF ˚ xPE pPF xPE

The following theorem gives optimality conditions which ensure the existence of primal and
dual solutions. Later we will see that for a special setting these conditions coincide with the
so-called Karush-Kuhn-Tucker (KKT) conditions.


Theorem 7.3.3: Lagrangian optimality conditions

Let g : E Ñ R and h : F Ñ R be proper functions and G : E Ñ F . If vp0q “ v ˚˚ p0q


and there exist x̂ P sol (L-P ) and p̂ P sol (L-D), then the following optimality
conditions
x̂ P arg min Lpx, p̂q and p̂ P BhpGpx̂qq. (OC)
xPE

hold true. Conversely, if there exist x̂ and p̂ satisfying (OC), then they are solutions
of (L-P ) and (L-D) and there is no duality gap.

Proof. “ ùñ ”: Let vp0q “ v ˚˚ p0q, x̂ P sol (L-P ) and p̂ P sol (L-D). Then we obtain
vp0q “ gpx̂q ` hpGpx̂qq “ v ˚˚ p0q “ inf Lpx, p̂q ´ h˚ pp̂q.
xPE

Subtracting the last from the second term and adding 0 “ x p̂, Gpx̂q y ´ x p̂, Gpx̂q y yields
pgpx̂q ` x p̂, Gpx̂q y ´ inf xPE Lpx, p̂qq ` phpGpx̂qq ` h˚ pp̂q ´ x p̂, Gpx̂q yq “ 0,

where the first summand equals Lpx̂, p̂q ´ inf xPE Lpx, p̂q ě 0 and the second summand is
nonnegative by the Fenchel inequality.

Thus both summands are zero and we obtain


Lpx̂, p̂q “ inf Lpx, p̂q and hpGpx̂qq ` h˚ pp̂q “ x p̂, Gpx̂q y .
xPE

By the Fenchel-Young identity (6.3) the second equality is equivalent to p̂ P BhpGpx̂qq.


“ ðù ”: This direction is left as Exercise 7.3.8. l

Special setting of closed convex cone constraints


We now consider the special case where h “ ιK for a closed convex cone K Ď F .
Let Ω Ă E be a non-empty, closed and convex set, and let g : Ω Ñ R
and G : Ω Ñ F be functions. Consider the constrained minimization problem

inf gpxq subject to Gpxq P K. (P)


xPΩ

We extend g to a function defined on E by setting gpxq “ 8 if x R Ω. Then, the correspond-


ing primal and dual problems (L-P ) and (L-D) read as

inf gpxq ` ιK pGpxqq, (7.8)


xPE

sup inf Lpx, pq ´ ι˚K ppq, (7.9)


pPF ˚ xPE

where we formulated the dual problem as a maximization problem instead of a minimization
problem. Since K is closed and convex, we have by Example 6.3.2 that
ιK pGpxqq “ σK ˚ pGpxqq “ sup x p, Gpxq y and ι˚K “ ιK ˚ .
pPK ˚

Thus, the above problems can be written as min-max problems:

inf sup Lpx, pq (C-P )


xPE pPK ˚

sup inf Lpx, pq (C-D)


pPK ˚ xPE


This relates our minimization problem with saddle points as outlined in the next section.
Lecture 23, 21.06.2024
Corollary 7.3.4 (Optimality conditions for conic problem)
For the conic problem (P) the optimality conditions (OC) read as

x̂ P arg min Lpx, p̂q, Gpx̂q P K, p̂ P K ˚ , x p̂, Gpx̂q y “ 0.


xPΩ

Proof. By Example 3.1.10, the subdifferential of ιK is given by the normal cone BιK pxq “
NK pxq. Thus, the second optimality condition p̂ P BhpGpx̂qq in (OC) for x̂ P E and p̂ P F ˚
becomes p̂ P NK pGpx̂qq. By the definition of the normal cone we have

p̂ P NK pGpx̂qq ðñ Gpx̂q P K and x p̂, x ´ Gpx̂q y ď 0 @x P K. (7.10)

Because Gpx̂q P K and K is a cone, we have x :“ λGpx̂q P K for all λ ě 0. Plugging this into
(7.10) yields
x p̂, x y ď x p̂, Gpx̂q y @x P K, (7.11)
λ x p̂, Gpx̂q y ď x p̂, Gpx̂q y @λ ě 0. (7.12)
Setting λ “ 0 yields x p̂, Gpx̂q y ě 0. If x p̂, Gpx̂q y ą 0, we could divide by it such that λ ď 1
for all λ ě 0 which is a contradiction. Thus, we obtain that x p̂, Gpx̂q y “ 0. Plugging this
into (7.11) yields x p̂, x y ď 0 for all x P K such that p̂ P K ˚ . l

Finally, we consider a special problem which arises frequently in practice. To this end, we
assume that the spaces E and F are finite-dimensional.
Example 7.3.5 (Finitely many convex inequality and affine equality constraints)
Consider the functions f P Γ0 pEq, G :“ pg1 , . . . , gs q : E Ñ Rs , where g1 , . . . , gk P Γ0 pEq
and gk`1 , . . . , gs are affine functions for some k P t1, . . . , su and H :“ ph1 , . . . , ht q : E Ñ Rt ,
where hj is an affine function for every j P t1, . . . , tu. We are interested in the problem

inf f pxq subject to Gpxq ď 0, Hpxq “ 0. (7.13)


xPE

This is a special case of the above cone setting (C-P ) where G is replaced by pG, Hq : E Ñ
F :“ Rs ˆ Rt , Ω is given by the whole space E and K :“ Rsď0 ˆt0ut . Then, the polar cone is
given by K ˚ “ Rsě0 ˆ Rt . The primal (C-P ) and dual problems (C-D) can thus be written
as
inf sup Lpx, p, qq and sup inf Lpx, p, qq
xPE pě0, pě0, xPE
qPRt qPRt

with the Lagrangian

L : E ˆ Rs ˆ Rt Ñ R, px, p, qq ÞÑ f pxq ` x p, Gpxq y ` x Hpxq, q y (7.14)

The optimality conditions from Corollary 7.3.4 thus become

x̂ P arg min Lpx, p̂, q̂q, Gpx̂q ď 0, Hpx̂q “ 0, p̂ ě 0, x p̂, Gpx̂q y “ 0. (7.15)
xPE

If Lp¨, p̂, q̂q is differentiable, the first condition can be rewritten as 0 “ ∇x Lpx̂, p̂, q̂q. Together
these conditions are the original KKT conditions derived in 1951. ˛
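The conditions (7.15) can be verified by hand for a small convex program. The following Python sketch does this for the illustrative problem min x1² ` x2² subject to 1 ´ x1 ´ x2 ď 0 (no equality constraints); the candidate point and multiplier are derived in the comments, and all data are hypothetical.

```python
# KKT conditions (7.15) for the hand-picked convex problem (no equality part)
#   min f(x) = x1^2 + x2^2   s.t.   G(x) = 1 - x1 - x2 <= 0.
# Stationarity 0 = grad f(x) + p*grad G(x) = 2x - p*(1, 1) gives x = (p/2, p/2);
# the active constraint x1 + x2 = 1 then yields p = 1 and x = (1/2, 1/2).

x_hat = (0.5, 0.5)
p_hat = 1.0

grad_f = (2 * x_hat[0], 2 * x_hat[1])
grad_G = (-1.0, -1.0)
G_val = 1.0 - x_hat[0] - x_hat[1]

# stationarity of the Lagrangian L(x, p) = f(x) + p*G(x)
assert grad_f[0] + p_hat * grad_G[0] == 0.0
assert grad_f[1] + p_hat * grad_G[1] == 0.0
# primal feasibility, dual feasibility and complementarity
assert G_val <= 0.0 and p_hat >= 0.0 and p_hat * G_val == 0.0
```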


Applying Corollary 7.1.4 we obtain the following optimality conditions.


Corollary 7.3.6 (Slater condition for the finitely constrained problem)
Assume that E is finite-dimensional, that C :“ dompf q X Şki“1 dompgi q ‰ H and that the
following Slater condition is fulfilled: there exists an x P ripCq such that

Gpxq ă 0, Hpxq “ 0. (7.16)

Assume that the optimal value is finite. Then the solution fulfills

min max Lpx, p, qq “ max min Lpx, p, qq,


xPE pě0, pě0, xPE
q q

that is, we have strong duality and both a primal and dual solution exist.

The KKT conditions can also be formulated for nonconvex smooth problems, where they
provide necessary conditions for local minima if certain constraint qualifications are fulfilled.
We refer to [BS13, Spe93, Man94] for more information.
Example 7.3.7 (Discrepancy Principle: penalized vs. constrained problems)
Consider matrices A P Rnˆd and L P Rmˆd with kerpAq X kerpLq “ t0u and b P Rn . For
τ ą 0 consider the constrained minimization problem

inf }Lx}22 subject to }Ax ´ b}22 ď τ. (Pτ )


xPRd

This setting can be applied for solving inverse problems as discussed in Section 4.3. To
ensure that there exists a solution of (Pτ ) we require that

τ ě min }Ax ´ b}22 .


x

Since x ÞÑ }Lx}22 is continuous and coercive on the non-empty set tx P Rd : }Ax ´ b}22 ď τ u,
the problem (Pτ ) has a solution. If there exists x P kerpLq with }Ax ´ b}22 ď τ , then x is a
solution and the problem (Pτ ) becomes trivial. Therefore, we assume that

τ ď min xPkerpLq }Ax ´ b}22 “: τe . (7.17)

Now, the Lagrangian (7.14) reads as

Lpx, qq :“ }Lx}22 ` qp}Ax ´ b}22 ´ τ q.

The optimality conditions (7.15) state that if x̂ P Rd and λ P R satisfy


1 0 “ ∇x Lpx̂, λq “ 2pLT L ` λAT Aqx̂ ´ 2λAT b.
2 λ ě 0 and }Ax̂ ´ b}22 ´ τ ď 0 and λp}Ax̂ ´ b}22 ´ τ q “ 0,
then x̂ solves (Pτ ).
If λ ą 0, we obtain by 2 that }Ax̂ ´ b}22 “ τ . If λ “ 0, we get by 1 that

LT Lx̂ “ 0 ùñ x̂T LT Lx̂ “ }Lx̂}22 “ 0 ùñ x̂ P kerpLq, (7.18)


which implies by the assumption (7.17) that τ “ }Ax̂ ´ b}22 . Consequently, in both cases,
the minimum x̂ is attained on tx P Rd : }Ax ´ b}22 “ τ u. Now 1 implies that
x̂ “: xλ “ pLT L ` λAT Aq´1 λAT b “ pp1{λqLT L ` AT Aq´1 AT b.

One can verify by differentiating that


$`
& 1 LT L ` AT A˘´1 AT b, if λ ą 0,
λ
xλ :“
% any x P kerpLq, if λ “ 0

is a solution of the penalized problem

min λ}Ax ´ b}22 ` }Lx}22 .


xPRd

Rewriting (Pτ ) as a cone problem (P) with Ω “ E “ Rd , F “ R, f pxq :“ }Lx}22 , K :“


p´8, 0s and Gpxq :“ }Ax ´ b}22 ´ τ we obtain that the dual problem (C-D) of (Pτ ) is

sup inf }Lx}22 ` pp}Ax ´ b}22 ´ τ q “ sup }Lxp }22 ` pp}Axp ´ b}22 ´ τ q.
pě0 xPRd pě0

Its solution p̂ “ λ is determined for τ ă τe by the solution p ě 0 of


}Axp ´ b}22 “ }App1{pqLT L ` AT Aq´1 AT b ´ b}22 “ τ. (7.19)

In summary, we have that any solution of

x̂ “ arg min }Lx}22 subject to }Ax ´ b}22 ď τ


x

solves
x̂ “ arg min x }Lx}22 ` λ}Ax ´ b}22 ,

where λ is the solution p of (7.19). ˛
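Numerically, the multiplier from (7.19) can be found by bisection, since p ÞÑ }Axp ´ b}22 is decreasing. The following Python sketch does this for randomly generated data; the matrix sizes, the choice L “ I and all tolerances are illustrative assumptions, not from the example.

```python
import numpy as np

# Bisection for the multiplier in (7.19): we look for p > 0 with
# ||A x_p - b||^2 = tau, where  x_p = ((1/p) L^T L + A^T A)^{-1} A^T b  solves
# the penalized problem  min_x p*||Ax - b||^2 + ||Lx||^2.  The matrix sizes,
# the choice L = I and all tolerances are illustrative assumptions.

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
L = np.eye(3)                                  # ker(A) ∩ ker(L) = {0} trivially
b = rng.standard_normal(5)

def residual(p):
    x_p = np.linalg.solve(L.T @ L / p + A.T @ A, A.T @ b)
    return float(np.sum((A @ x_p - b) ** 2))

# residual(p) decreases from roughly ||b||^2 towards the least-squares residual,
# so any tau strictly in between is attained.
lo, hi = 1e-6, 1e6
tau = 0.5 * (residual(lo) + residual(hi))

for _ in range(200):
    mid = 0.5 * (lo + hi)
    if residual(mid) > tau:
        lo = mid                               # residual too large -> increase p
    else:
        hi = mid

assert abs(residual(0.5 * (lo + hi)) - tau) < 1e-8
```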

Exercises

Exercise 7.3.8 Prove the direction “ ðù ” from Theorem 7.3.3. ■


7.4 Lagrangian Duality and Saddle Point Problems


Textbooks: [Man94, Chp. 5], [Spe93, Sec. 2.2], [ET99, Sec. VI.1] and [BP12, Subsec. 2.3.1].
We have seen in the previous section that primal and dual problems transform into min-max
problems of the Lagrangian function. In this section, we study this kind of problem in
more detail. From now on, we assume that E and F are finite-dimensional spaces with Euclidean
norm.

Definition 7.4.1 (Saddle point)


A point px̂, ŷq P Ω ˆ Ξ Ă E ˆ F is a saddle point of a function Φ : Ω ˆ Ξ Ñ R if

Φpx̂, yq ď Φpx̂, ŷq ď Φpx, ŷq @px, yq P Ω ˆ Ξ. (7.20)

There is the following close relation between saddle point problems and min-max problems.
(Fig. 7.1: A smooth function with saddle point at the origin.)
Lemma 7.4.2 (Saddle points and min-max problems)
Let Φ : Ω ˆ Ξ Ñ R.
1 The inequality
max min Φpx, yq ď min max Φpx, yq (7.21)
yPΞ xPΩ xPΩ yPΞ

holds true provided that all extrema exist.


2 If all extrema exist, there exists a saddle point px̂, ŷq P Ω ˆ Ξ if and only if the equality

max min Φpx, yq “ min max Φpx, yq (7.22)


yPΞ xPΩ xPΩ yPΞ

holds true.
3 Any saddle point px̂, ŷq of Φ has function value

Φpx̂, ŷq “ max min Φpx, yq “ min max Φpx, yq


yPΞ xPΩ xPΩ yPΞ

Proof. 1 For px̃, ỹq P Ω ˆ Ξ we have that

ψpỹq :“ min Φpx, ỹq ď Φpx̃, ỹq ď max Φpx̃, yq “: φpx̃q. (7.23)
xPΩ yPΞ

Hence we can take the maximum on the left side and the minimum on the right:
maxỹPΞ ψpỹq ď minx̃PΩ φpx̃q.
Lecture 24, 26.06.2024
2 “ ùñ ”: Let px̂, ŷq be a saddle point of Φ. By definition this implies that

min max Φpx, yq ď max Φpx̂, yq “ Φpx̂, ŷq “ min Φpx, ŷq ď max min Φpx, yq.
xPΩ yPΞ yPΞ xPΩ yPΞ xPΩ


Together with (7.21) we arrive at the desired equality (7.22). Moreover, the above
equation shows 3 .
“ ðù ”: Assume that (7.22) holds true. Let x̂ “ arg minxPΩ φpxq and ŷ “ arg maxyPΞ ψpyq.
Then, it holds

max min Φpx, yq “ max ψpyq “ ψpŷq “ min Φpx, ŷq ď Φpx̂, ŷq ď max Φpx̂, yq
yPΞ xPΩ yPΞ xPΩ yPΞ
(7.24)
“ φpx̂q “ min φpxq “ min max Φpx, yq “ max min Φpx, yq.
xPΩ xPΩ yPΞ yPΞ xPΩ

Since we have the same value on the left and on the right, all inequalities are in fact
equalities. In particular, this yields for all x̃ P Ω and ỹ P Ξ that

Φpx̂, ỹq ď max Φpx̂, yq “ φpx̂q “ Φpx̂, ŷq


yPΞ

and Φpx̃, ŷq ě min xPΩ Φpx, ŷq “ ψpŷq “ Φpx̂, ŷq,

i.e., px̂, ŷq is a saddle point of Φ. l

Example 7.4.3 Note that the claim of Lemma 7.4.2 in the first version of this script (which
is the same as in [Spe93, Satz 2.2.3] as well as in the finite-dimensional script) was not true.
More precisely, the condition

max min Φpx, yq “ Φpx̂, ŷq “ min max Φpx, yq


yPΞ xPΩ xPΩ yPΞ

is not sufficient for showing that px̂, ŷq is a saddle point as the following counterexample
shows. Consider the function Φpx, yq “ x2 ´ y 2 and define px̂, ŷq “ p1, 1q. Then it holds

min max Φpx, yq “ Φpx̂, ŷq “ max min Φpx, yq “ 0.


xPR yPR yPR xPR

However, it is easy to check that p0, 0q is the unique saddle point of Φ. In particular,
px̂, ŷq “ p1, 1q is not a saddle point since Φp0, 1q “ ´1 ă 0 “ Φp1, 1q. ˛
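The counterexample can be replayed on a grid: the following Python snippet checks the saddle-point inequalities (7.20) for Φpx, yq “ x2 ´ y 2 at p0, 0q and at p1, 1q (the grid range is an arbitrary illustrative choice).

```python
# Replaying Example 7.4.3 on a grid: Phi(x, y) = x^2 - y^2 has its unique saddle
# point at the origin, while (1, 1) is not a saddle point even though
# Phi(1, 1) = 0 equals the common min-max value.  The grid range is ad hoc.

def Phi(x, y):
    return x * x - y * y

grid = [k / 10 for k in range(-30, 31)]

def is_saddle(x0, y0):
    return (all(Phi(x0, y) <= Phi(x0, y0) for y in grid)
            and all(Phi(x0, y0) <= Phi(x, y0) for x in grid))

assert is_saddle(0.0, 0.0)
assert not is_saddle(1.0, 1.0)
assert Phi(1.0, 1.0) == 0.0      # same value as the min-max, yet no saddle point
```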

By the following theorem a solution of the saddle point problem of the Lagrangian associated
with (7.13) gives rise to a solution of (7.13).

Theorem 7.4.4: Sufficient saddle point optimality conditions

Let f : Rd Ñ R be a proper function, Ω :“ dompf q, G : Rd Ñ Rs a componentwise


convex function and H : Rd Ñ Rt an affine function. Denote the Lagrangian by

Lpx, p, qq :“ f pxq ` x p, Gpxq y ` x Hpxq, q y .

If px̂, p̂, q̂q P Ω ˆ Rsě0 ˆ Rt is a saddle point of the Lagrangian, i.e., if

Lpx̂, p, qq ď Lpx̂, p̂, q̂q ď Lpx, p̂, q̂q @x P Rd , @pp, qq P Rsě0 ˆ Rt ,

then x̂ solves (7.13).


Proof. We want to use the optimality conditions (7.15).


Let px̂, p̂, q̂q be a saddle point of the Lagrangian L. Then

f px̂q ` xp, Gpx̂qy ` xq, Hpx̂qy ď f px̂q ` xp̂, Gpx̂qy ` xq̂, Hpx̂qy
(7.25)
ď f pxq ` xp̂, Gpxqy ` xq̂, Hpxqy

for all x P Ω, p P Rsě0 and q P Rt . Thus,

xp̂ ´ p, Gpx̂qy ` xq̂ ´ q, Hpx̂qy ě 0 @p P Rsě0 , q P Rt . (7.26)

1 Plugging p “ p̂ P Rsě0 and q “ q̂ ´ αei0 P Rt into (7.26), where α P R, i0 P t1, . . . , tu
and ej is the j-th unit vector, yields

α ¨ pHpx̂qqi0 ě 0 @α P R, i0 P t1, . . . , tu,

so that Hpx̂q “ 0.
2 On the other hand, setting p “ p̂ ` ej and q “ q̂ in (7.26), we get pGpx̂qqj ď 0 for
every j P t1, . . . , su.
3 Putting p “ 0 P Rsě0 in (7.26) we see further that xp̂, Gpx̂qy ě 0 and since p̂ P Rsě0 and
Gpx̂q ď 0 we conclude xp̂, Gpx̂qy ď 0 and thus xp̂, Gpx̂qy “ 0.
4 Finally, using that p̂ ě 0 and Gpxq ď 0, Hpxq “ 0 for feasible points x P G, we obtain
from the right-hand side of (7.25) that

f px̂q ď f pxq ` x p̂, Gpxq y ` x q̂, Hpxq y ď f pxq @x P G,

since x p̂, Gpxq y ď 0 and x q̂, Hpxq y “ 0.

Thus x̂ solves the minimization problem (7.13). l

Theorem 7.4.5: Necessary saddle point optimality conditions

Let f P Γ0 pRd q be a function with Ω :“ dompf q, G : Rd Ñ Rs a componentwise


convex function and H : Rd Ñ Rt an affine function. In the case H ı 0, let Ω “ Rd .
Further let Slater’s constraint qualification (7.16) be fulfilled.
If x̂ solves the primal problem (7.13), then there exist p̂ P Rsě0 and q̂ P Rt such
that
1 px̂, p̂, q̂q is a saddle point of the associated Lagrangian (7.14),
2 x p̂, Gpx̂q y “ 0.

Proof. Let x̂ be a solution of (7.13). Then the system

G0 pxq :“ f pxq ´ f px̂q ă 0, Gpxq ă 0, Hpxq “ 0

has no solution and by Lemma 7.4.6 there exist pp0 , pqT P Rs`1ě0 zt0u and q P Rt such that

p0 ¨ pf pxq ´ f px̂qq ` xp, Gpxqy ` xq, Hpxqy ě 0 @x P Ω. (7.27)


Now p0 “ 0 would imply p ‰ 0 and thus contradict the Slater condition. Hence p0 ‰ 0,
and dividing (7.27) by p0 yields with p̂ :“ p1{p0 qp and q̂ :“ p1{p0 qq,

f pxq ´ f px̂q ` xp̂, Gpxqy ` xq̂, Hpxqy ě 0 @x P Ω,

that is,
Lpx, p̂, q̂q ě f px̂q @x P Ω. (7.28)

Since Gpx̂q ď 0 and Hpx̂q “ 0 we have on the other hand that

Lpx̂, p, qq “ f px̂q ` x p, Gpx̂q y ` x q, Hpx̂q y ď f px̂q @p P Rsě0 , q P Rt , (7.29)

since x p, Gpx̂q y ď 0 and x q, Hpx̂q y “ 0.

Finally this implies that


(7.28) (7.29)
Lpx̂, p̂, q̂q ě f px̂q ě Lpx̂, p, qq @p P Rsě0 , q P Rt ,
(7.29) (7.28)
Lpx̂, p̂, q̂q ď f px̂q ď Lpx, p̂, q̂q @x P Ω,
so that px̂, p̂, q̂q P Ω ˆ pRsě0 ˆ Rt q is a saddle point of L. l

Lemma 7.4.6 (Generalized Gordon theorem [FGH57] [Man94, p. 65])


Let Ω Ă Rd be a nonempty, convex set, G : Ω Ñ Rs be a componentwise convex function and
H : Rd Ñ Rt an affine function. Let Ω “ Rd if H ı 0. If

Gpxq ă 0, Hpxq “ 0 has no solution on Ω,

then there exist p P Rsě0 , q P Rt , pp, qq ‰ p0, 0q such that

xp, Gpxqy ` xq, Hpxqy ě 0 @x P Ω. (7.30)

We can choose p ‰ 0 if either H ” 0 or if there exists x P Rd such that Hpxq “ 0.

Proof. 1. Define the sets


ď
Λpxq :“ tpy, zq P Rs ˆ Rt : Gpxq ă y, Hpxq “ zu, Λ :“ Λpxq.
xPΩ

By assumption, Λ does not contain the origin p0, 0q P Rs ˆ Rt . Further, Λ is convex, because
pyi , zi q P Λ, i “ 1, 2 and the convexity of Ω and G imply

p1 ´ λqy1 ` λy2 ą p1 ´ λqGpx1 q ` λGpx2 q ě Gpp1 ´ λqx1 ` λx2 q,


p1 ´ λqz1 ` λz2 “ p1 ´ λqHpx1 q ` λHpx2 q “ Hpp1 ´ λqx1 ` λx2 q.

By Theorem 1.2.2 there exist p P Rs , q P Rt , pp, qq ‰ 0 such that

x p, u y ` x q, w y ě 0 @pu, wq P Λ.

Since the entries of u can be made arbitrarily large, we conclude that p ě 0.


For ε ą 0 and x P Ω, let u :“ Gpxq ` ε1, w :“ Hpxq. Then pu, wq P Λpxq Ă Λ and

xp, uy ` xq, wy “ xp, Gpxqy ` εxp, 1y ` xq, Hpxqy ě 0,


xp, Gpxqy ` xq, Hpxqy ě ´εxp, 1y @x P Ω. (7.31)


Now if
inf txp, Gpxqy ` xq, Hpxqyu “ ´δ ă 0,
xPΩ

we get by choosing ε such that εxp, 1y ă δ that

inf txp, Gpxqy ` xq, Hpxqyu “ ´δ ă ´εxp, 1y


xPΩ

which contradicts (7.31).


2. Finally, we want to see why p ‰ 0 can be chosen also for H ı 0. Let Hpxq “ Ax ´ b. If
p “ 0 we obtain in (7.30) that

xq, Ax ´ by ě 0 @x P Ω,

in particular for x “ 0 that ´xq, by ě 0 and for x :“ ´λA˚ q that

´λxq, AA˚ qy ´ xq, by “ ´λxA˚ q, A˚ qy ´ xq, by ě 0.

Since λ can be chosen arbitrarily large, this results in A˚ q “ 0. If the rows of A are linearly
independent, this implies the contradiction q “ 0. If the rows of A are not linearly independent
and Hpxq “ 0, i.e., Ax “ b for some x, we may choose a maximal linearly independent
row set of A and cancel the other rows of the system without changing the solution set of
the system. l

142
8 PRIMAL-DUAL ALGORITHMS

8 Primal-Dual Algorithms
For this section we only consider optimization methods over finite dimensional spaces with
Euclidean norm.
The following minimization algorithms closely rely on the primal-dual formulation of prob-
lems. We consider functions f “ g ` hpA ¨q, where g P Γ0 pRd q, h P Γ0 pRm q, and A P Rmˆd ,
and ask for the solution of the primal problem

pP q arg min tgpxq ` hpAxqu , (8.1)


xPRd

that can be rewritten as

pP q arg min tgpxq ` hpyq s.t. Ax “ yu . (8.2)


xPRd ,yPRm

The Lagrangian of (8.2) is given by

Lpx, y, pq :“ gpxq ` hpyq ` xp, Ax ´ yy. (8.3)

Thus, the primal and dual problem are given by

pP q arg min xPRd ,yPRm sup pPRm Lpx, y, pq,
pDq arg max pPRm inf xPRd ,yPRm Lpx, y, pq. (8.4)

Recall that ppx̂, ŷq, p̂q P pRd ˆ Rm q ˆ Rm is a saddle point of the Lagrangian L in (8.3) if

Lppx, yq, p̂q ě Lppx̂, ŷq, p̂q ě Lppx̂, ŷq, pq @x P Rd , y P Rm , p P Rm .

Using the results from the previous section, we obtain that ppx̂, ŷq, p̂q is a saddle
point of L if and only if px̂, ŷq is a solution of pP q and p̂ is a solution of pDq. However, the
existence of a solution px̂, ŷq P Rd ˆ Rm of the primal problem only implies under additional
constraint qualifications that there exists a p̂ such that ppx̂, ŷq, p̂q is a saddle point of
L.
The main idea of the following algorithms is to search for saddle points of the Lagrangian
in order to solve the primal problem (8.1).

8.1 Alternating Direction Method of Multipliers


A first idea to find a saddle point of the Lagrangian is to alternate the minimization with
respect to px, yq and to apply a gradient ascent approach with respect to p with stepsize
γ ą 0. This leads to the iterations

pxpr`1q , y pr`1q q P arg min Lpx, y, pprq q


xPRd ,yPRm (8.5)
pr`1q prq pr`1q pr`1q prq prq pr`1q pr`1q
p “p ` γ∇p Lpx ,y ,p q“p ` γpAx ´y q,


which is known as the general Uzawa method [AHU58]. Since for fixed p the Lagrangian
can be decomposed into terms that each depend on only one of the variables x and y, this is
equivalent to the iterations

xpr`1q P arg min Lpx, y prq , pprq q


xPRd

y pr`1q
P arg min Lpxpr`1q , y, pprq q (8.6)
yPRm

ppr`1q “ pprq ` γpAxpr`1q ´ y pr`1q q.

Linear convergence can be proved under certain conditions [GL89]. In particular, these
conditions include strict convexity of the objective function f pxq “ gpxq ` hpAxq. In order
to relax these assumptions on f , we replace the Lagrangian by the so-called augmented
Lagrangian

Lγ px, y, pq :“ gpxq ` hpyq ` xp, Ax ´ yy ` pγ{2q}Ax ´ y}22
“ gpxq ` hpyq ` pγ{2q}Ax ´ y ` p{γ}22 ´ p1{p2γqq}p}22 .
Since any solution of the primal problem fulfills Ax “ y, we obtain that the primal problem
can be reformulated as

pP q arg min sup Lpx, y, pq “ arg min sup Lγ px, y, pq.


xPRd ,yPRm pPRm xPRd ,yPRm pPRm

Consequently, it can be shown that any saddle point of L is a saddle point of Lγ for any
γ ą 0. A formal proof of this fact is left as Exercise 8.1.7.
Now, we replace in (8.5) the Lagrangian by the augmented Lagrangian with fixed pa-
rameter γ:
pxpr`1q , y pr`1q q P arg min Lγ px, y, pprq q, (8.7)
xPRd ,yPRm

ppr`1q “ pprq ` γpAxpr`1q ´ y pr`1q q, γ ą 0.

This augmented Lagrangian method is known as the method of multipliers [Hel69, Pow72,
Roc76]. The improved convergence properties come at a cost: the rewriting of (8.5) to (8.6)
is no longer possible when we use the augmented Lagrangian. Nevertheless, we consider
the algorithm which alternates the minimization with respect to x and y. This leads to

xpr`1q P arg min Lγ px, y prq , pprq q, (8.8)


xPRd

y pr`1q “ arg min Lγ pxpr`1q , y, pprq q, (8.9)


yPRm

ppr`1q “ pprq ` γpAxpr`1q ´ y pr`1q q,

and is called the alternating direction method of multipliers (ADMM) which dates alternating
back to [Gab83, GM76, GM75]. Setting bprq :“ pprq {γ we obtain the following (scaled) direction method
ADMM: of multipliers
(ADMM)


Algorithm 8: Alternating Direction Method of Multipliers (ADMM)

Data: Initialization: y^{(0)} \in \mathbb{R}^m, b^{(0)} \in \mathbb{R}^m
for r = 0, 1, \dots do
    x^{(r+1)} \in \argmin_{x \in \mathbb{R}^d} \{ g(x) + \frac{\gamma}{2} \|b^{(r)} + Ax - y^{(r)}\|_2^2 \};
    y^{(r+1)} = \argmin_{y \in \mathbb{R}^m} \{ h(y) + \frac{\gamma}{2} \|b^{(r)} + Ax^{(r+1)} - y\|_2^2 \} = prox_{\frac{1}{\gamma} h}(b^{(r)} + Ax^{(r+1)});
    b^{(r+1)} = b^{(r)} + Ax^{(r+1)} - y^{(r+1)};
end
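The scaled iteration in Algorithm 8 can be sketched in a few lines of code; the two proximal steps are passed in as oracle functions. The quadratic test problem below (g(x) = ½‖x − c‖², h = ½‖·‖², random A) is made-up data for a sanity check only, since then the minimizer of g(x) + h(Ax) solves (I + AᵀA)x = c.

```python
import numpy as np

def admm(prox_g_quad, prox_h, A, y0, b0, iters=1000):
    """Scaled ADMM (Algorithm 8), sketched with two oracles:
    prox_g_quad(v) returns an argmin_x of g(x) + (gamma/2)*||A x - v||^2,
    prox_h(u)      returns prox_{h/gamma}(u); gamma is baked into the oracles."""
    y, b = y0.copy(), b0.copy()
    for _ in range(iters):
        x = prox_g_quad(y - b)   # x-step
        y = prox_h(b + A @ x)    # y-step
        b = b + A @ x - y        # scaled multiplier update, b = p/gamma
    return x, y, b

# Hedged toy check: g(x) = 0.5*||x - c||^2, h(y) = 0.5*||y||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
c = rng.standard_normal(3)
gamma = 1.0
M = np.eye(3) + gamma * A.T @ A
x, y, b = admm(lambda v: np.linalg.solve(M, c + gamma * A.T @ v),
               lambda u: gamma * u / (gamma + 1.0),
               A, np.zeros(4), np.zeros(4))
x_exact = np.linalg.solve(np.eye(3) + A.T @ A, c)
```

Here both proximal steps are linear, so the result can be checked directly against the normal equations of the combined problem.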

A good overview of the ADMM algorithm and its applications is given in [BPC+11], where
in particular the important issue of choosing the parameter \gamma > 0 is addressed. Convergence
of the ADMM under various assumptions was proved, e.g., in [GM76, HY98, LM79, Tse91].
A few bounds on the global convergence rate of the algorithm can be found in [EB90] (linear
convergence for linear programs, depending on a variety of quantities), in [HL12] (linear
convergence for sufficiently small step sizes), and, for the local behaviour of a specific variant
of the ADMM applied to quadratic programs, in [Bol14].

Relation to Douglas-Rachford Splitting (Lecture 25, 28.06.2024)  In order to show
convergence of the ADMM, we prove that it is equivalent to the Douglas-Rachford splitting
algorithm applied to the dual problem, see [EB92, Ess09, Gab83, Set11].
Recall that the dual problem from (8.4) reads as

    \argmax_{p \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^d,\, y \in \mathbb{R}^m} L(x, y, p) = \argmax_{p \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^d,\, y \in \mathbb{R}^m} \{ g(x) + h(y) + \langle p, Ax - y \rangle \}.

Since

    \inf_{y \in \mathbb{R}^m} \{ h(y) - \langle p, y \rangle \} = -\sup_{y \in \mathbb{R}^m} \{ \langle p, y \rangle - h(y) \} = -h^*(p),

this can be rewritten as

         \argmax_{p \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^d} \{ g(x) - h^*(p) + \langle p, Ax \rangle \}
       = \argmax_{p \in \mathbb{R}^m} \{ -h^*(p) - \sup_{x \in \mathbb{R}^d} \{ \langle -A^T p, x \rangle - g(x) \} \}
    (D)  = \argmax_{p \in \mathbb{R}^m} \{ -h^*(p) - g^*(-A^T p) \}                                                     (8.10)
       = \argmin_{p \in \mathbb{R}^m} \{ g^*(-A^T p) + h^*(p) \}.

Applying the DRS to this objective leads to the iteration

    t^{(r+1)} = prox_{\eta g^* \circ (-A^T)}(2p^{(r)} - t^{(r)}) + t^{(r)} - p^{(r)},                                                     (8.11)
    p^{(r+1)} = prox_{\eta h^*}(t^{(r+1)}).
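As an illustration, the iteration (8.11) can be run on a small hypothetical instance with a known dual solution; the choices g(x) = ½‖x − c‖², h = λ‖·‖₁ and A = I below are assumptions for this sketch only.

```python
import numpy as np

# For g(x) = 0.5*||x - c||^2, h = lam*||.||_1 and A = I we have
# g*(-A^T p) = 0.5*||p||^2 - <p, c> and h* = indicator of [-lam, lam]^d,
# so the dual problem (8.10) is solved by p_hat = clip(c, -lam, lam).
c = np.array([2.0, -0.2, 0.6])
lam, eta = 0.5, 1.0
prox_g_part = lambda v: (v + eta * c) / (1.0 + eta)  # prox of eta*(g* o (-A^T))
prox_h_star = lambda v: np.clip(v, -lam, lam)        # prox of eta*h* (a projection)

t = np.zeros(3)
p = np.zeros(3)
for _ in range(200):                      # DRS iteration (8.11)
    t = prox_g_part(2 * p - t) + t - p
    p = prox_h_star(t)
```

After a few hundred iterations, p matches the projection of c onto the box, which is exactly the dual solution of this assumed instance.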

Then, the following theorem establishes a relation between ADMM and DRS. It was first
proven in [EB92, Gab83].

Theorem 8.1.1: Relation between ADMM and DRS

    The ADMM sequences (b^{(r)})_r and (y^{(r)})_r are related to the sequences (8.11) generated
    by the DRS algorithm applied to the dual problem with \eta = \gamma via

        t^{(r)} = \eta (b^{(r)} + y^{(r)}),
        p^{(r)} = \eta b^{(r)}.                                                     (8.12)

Proof. First, we show that

    \hat{p} = \argmin_{p \in \mathbb{R}^m} \{ \frac{\eta}{2} \|Ap - q\|_2^2 + g(p) \}   \Rightarrow   \eta(A\hat{p} - q) = prox_{\eta g^* \circ (-A^T)}(-\eta q)                                                     (8.13)

holds true. The left-hand side of (8.13) is equivalent to

    0 \in \eta A^T (A\hat{p} - q) + \partial g(\hat{p})   \Leftrightarrow   \hat{p} \in \partial g^*(-\eta A^T (A\hat{p} - q)).

Applying -\eta A on both sides and using the chain rule implies

    -\eta A \hat{p} \in -\eta A\, \partial g^*(-\eta A^T (A\hat{p} - q)) = \eta\, \partial (g^* \circ (-A^T))(\eta(A\hat{p} - q)).

Adding \eta(A\hat{p} - q) on both sides, we get

    -\eta q \in (I + \eta\, \partial (g^* \circ (-A^T)))(\eta(A\hat{p} - q)),

which is equivalent to the right-hand side of (8.13) by the definition of the resolvent.
Secondly, applying (8.13) to the first ADMM step with \gamma = \eta and q := y^{(r)} - b^{(r)} yields

    \eta(b^{(r)} + Ax^{(r+1)} - y^{(r)}) = prox_{\eta g^* \circ (-A^T)}(\eta(b^{(r)} - y^{(r)})).

Assume that the ADMM and DRS iterates satisfy the identification (8.12) up to some r \in \mathbb{N}.
Using this induction hypothesis, and noting that \eta(b^{(r)} - y^{(r)}) = 2p^{(r)} - t^{(r)} and
\eta y^{(r)} = t^{(r)} - p^{(r)}, it follows that

    \eta(b^{(r)} + Ax^{(r+1)}) = prox_{\eta g^* \circ (-A^T)}(2p^{(r)} - t^{(r)}) + t^{(r)} - p^{(r)} \overset{(8.11)}{=} t^{(r+1)}.                                                     (8.14)

By definition of b^{(r+1)} we see that \eta(b^{(r+1)} + y^{(r+1)}) = t^{(r+1)}. Next, we apply (8.13) to the
second ADMM step, where we replace g by h and A by -I and use q := -b^{(r)} - Ax^{(r+1)}.
Together with (8.14) this gives

    \eta(-y^{(r+1)} + b^{(r)} + Ax^{(r+1)}) = prox_{\eta h^*}(\eta(b^{(r)} + Ax^{(r+1)})) \overset{(8.11),(8.14)}{=} p^{(r+1)}.                                                     (8.15)

Using again the definition of b^{(r+1)}, we obtain \eta b^{(r+1)} = p^{(r+1)}, which completes the
proof.                                                                                \square

The theorem implies the following convergence result for the ADMM.
Corollary 8.1.2 (Convergence of ADMM)
Let g \in \Gamma_0(\mathbb{R}^d), h \in \Gamma_0(\mathbb{R}^m) and A \in \mathbb{R}^{m,d}. Assume that the Lagrangian (8.3) has a saddle
point. Then, for r \to \infty, the sequences (\gamma b^{(r)})_r and (y^{(r)})_r converge to limits \hat{p} = \gamma \hat{b}
and \hat{y}, where \hat{p} is a solution of the dual problem. If additionally the first step (8.8) of the
ADMM algorithm has a unique solution, then (x^{(r)})_r converges to a solution of the primal
problem.


Proof. The convergence of y^{(r)} to \hat{y} and of p^{(r)} to the dual solution \hat{p} follows from
Theorem 8.1.1 together with Theorem 4.2.15. For the convergence of (x^{(r)})_r, we refer
to [Set09, Thm. 2.4.9].                                                               \square

If the first step of the ADMM has more than one solution, the sequence (x^{(r)})_r generated
by the ADMM might diverge, as the following example shows.

Example 8.1.3  Let g(x) = 0, h(y) = \iota_{\{0\}}(y) and let A \in \mathbb{R}^{m,d} be the zero matrix. Then, the
steps of the ADMM read as

    x^{(r+1)} \in \mathbb{R}^d  (arbitrary),
    y^{(r+1)} = prox_{\iota_{\{0\}}}(b^{(r)}) = 0,
    b^{(r+1)} = b^{(r)}.

Choosing x^{(r)} = (-1)^r leads to a divergent sequence generated by the ADMM.   \diamond

With the help of the ADMM, we can now minimize the TV regularized functionals from
Example 4.3.2.

Example 8.1.4  Let K \in \mathbb{R}^{n,d} and D \in \mathbb{R}^{m,d} be matrices. For this example, we do not
require specific forms of K and D. However, in our main application, D is the matrix
which maps a (vectorized) image x onto the vector Dx containing the horizontal and
vertical differences between neighboring pixels. In this case, R(x) = \|Dx\|_1 is the total
variation regularizer from Example 4.3.2. Moreover, the matrix K could define the forward
operator of an inverse problem from Section 4.3 via F(x) = Kx. Now, we aim to minimize
the functional

    J(x) = \frac{1}{2} \|Kx - a\|_2^2 + \lambda \|Dx\|_1.

In order to apply the ADMM, we set g(x) = \frac{1}{2}\|Kx - a\|_2^2, h(y) = \lambda\|y\|_1 and A = D. Then,
starting with initializations y^{(0)}, b^{(0)}, the first step of the ADMM Algorithm 8 reads as

    x^{(r+1)} \in \argmin_{x \in \mathbb{R}^d} \{ g(x) + \frac{\gamma}{2} \|b^{(r)} + Ax - y^{(r)}\|_2^2 \}
             = \argmin_{x \in \mathbb{R}^d} \{ \frac{1}{2} \|Kx - a\|_2^2 + \frac{\gamma}{2} \|b^{(r)} + Dx - y^{(r)}\|_2^2 \}.

By setting the derivative of the objective function to zero, we obtain that x^{(r+1)} is the
solution of the linear system

    (K^T K + \gamma D^T D)\, x = K^T a + \gamma D^T (y^{(r)} - b^{(r)}),

which can be computed, e.g., by a conjugate gradient method, see the lecture on numerical
methods. The second step reads as

    y^{(r+1)} = prox_{\frac{1}{\gamma} h}(b^{(r)} + Ax^{(r+1)}) = S_{\frac{\lambda}{\gamma}}(b^{(r)} + Dx^{(r+1)}),

where S_{\frac{\lambda}{\gamma}} is the soft-shrinkage operator from Example 2.7.3. Finally, the last step is
given by

    b^{(r+1)} = b^{(r)} + Ax^{(r+1)} - y^{(r+1)}.

Now, all steps are straightforward to implement.   \diamond
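The three steps above can be sketched as follows; a direct solver stands in for CG, and the sanity check with K = D = I (made-up data, where the minimizer of ½‖x − a‖² + λ‖x‖₁ is the soft-shrinkage S_λ(a)) is an assumption for illustration only.

```python
import numpy as np

def soft_shrink(u, tau):
    # soft-shrinkage S_tau, the prox of tau*||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def admm_l2tv(K, D, a, lam, gamma=1.0, iters=500):
    """Sketch of Example 8.1.4: ADMM for 0.5*||Kx - a||^2 + lam*||Dx||_1.
    The x-step solves the normal equations directly; for large images
    one would rather use a conjugate gradient method."""
    y = np.zeros(D.shape[0])
    b = np.zeros(D.shape[0])
    M = K.T @ K + gamma * D.T @ D
    for _ in range(iters):
        x = np.linalg.solve(M, K.T @ a + gamma * D.T @ (y - b))
        y = soft_shrink(b + D @ x, lam / gamma)   # prox of (lam/gamma)*||.||_1
        b = b + D @ x - y
    return x

a = np.array([2.0, -0.2, 0.6])
x = admm_l2tv(np.eye(3), np.eye(3), a, lam=0.5)
```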


For minimizing more than two summands, we can use a slight modification of the ADMM,
summarized in the following remark. The idea is quite similar to the parallel DRS from
Algorithm 7.

Remark 8.1.5  The ADMM can be extended for minimizing functionals of the form

    f(x) = g(x) + \sum_{i=1}^n h_i(A_i x)

by rewriting f as f(x) = g(x) + H(Ax), where

    H(y_1, \dots, y_n) = \sum_{i=1}^n h_i(y_i)   and   A = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix}.

Now we apply the ADMM to the functions g and H, resulting in the following iterations for
x^{(r)}, y^{(r)} = (y_1^{(r)}, \dots, y_n^{(r)}) and b^{(r)} = (b_1^{(r)}, \dots, b_n^{(r)}):

    x^{(r+1)} \in \argmin_{x \in \mathbb{R}^d} \{ g(x) + \frac{\gamma}{2} \sum_{i=1}^n \|b_i^{(r)} + A_i x - y_i^{(r)}\|_2^2 \},
    y_i^{(r+1)} = prox_{\frac{1}{\gamma} h_i}(b_i^{(r)} + A_i x^{(r+1)}),    for i = 1, \dots, n,
    b_i^{(r+1)} = b_i^{(r)} + A_i x^{(r+1)} - y_i^{(r+1)},    for i = 1, \dots, n.   \circ

The remark can be applied to simplify the calculations in Example 8.1.4. However, it
cannot avoid the need to solve a linear system. Moreover, we can use the remark in order
to apply the ADMM to the TV-regularized functional from Example 4.3.2 with
the non-differentiable data-fidelity term for the inpainting problem.

Example 8.1.6  Let K and D be given as in Example 8.1.4. We aim to minimize the
functional

    f(x) = \iota_{\{a\}}(Kx) + \|Dx\|_1.

To this end, we set g \equiv 0, h_1 = \iota_{\{a\}}, A_1 = K, h_2 = \|\cdot\|_1 and A_2 = D. Then, the first step
of the ADMM is given by

    x^{(r+1)} \in \argmin_{x \in \mathbb{R}^d} \{ \|Kx + b_1^{(r)} - y_1^{(r)}\|_2^2 + \|Dx + b_2^{(r)} - y_2^{(r)}\|_2^2 \}.

Setting the gradient of the objective to zero, we obtain that this is equivalent to setting
x^{(r+1)} to a solution of the linear system

    (K^T K + D^T D)\, x = K^T (y_1^{(r)} - b_1^{(r)}) + D^T (y_2^{(r)} - b_2^{(r)}).

The second and third steps read as

    y_1^{(r+1)} = prox_{\frac{1}{\gamma} h_1}(b_1^{(r)} + Kx^{(r+1)}) = a,
    y_2^{(r+1)} = prox_{\frac{1}{\gamma} h_2}(b_2^{(r)} + Dx^{(r+1)}) = S_{\frac{1}{\gamma}}(b_2^{(r)} + Dx^{(r+1)}),
    b_1^{(r+1)} = b_1^{(r)} + Kx^{(r+1)} - y_1^{(r+1)} = b_1^{(r)} + Kx^{(r+1)} - a,
    b_2^{(r+1)} = b_2^{(r)} + Dx^{(r+1)} - y_2^{(r+1)}.

Now, all steps can be computed.   \diamond
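A minimal sketch of these steps on a toy "inpainting" instance: fix x_1 = 2 and minimize the forward differences, so the unique minimizer is the constant vector (2, 2, 2). The data are invented for illustration.

```python
import numpy as np

def soft_shrink(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def admm_inpaint(K, D, a, gamma=1.0, iters=50):
    """Sketch of Example 8.1.6: minimize iota_{a}(Kx) + ||Dx||_1 via ADMM."""
    y1, b1 = np.zeros(K.shape[0]), np.zeros(K.shape[0])
    y2, b2 = np.zeros(D.shape[0]), np.zeros(D.shape[0])
    M = K.T @ K + D.T @ D
    for _ in range(iters):
        x = np.linalg.solve(M, K.T @ (y1 - b1) + D.T @ (y2 - b2))
        y1 = a                                     # prox of the indicator of {a}
        y2 = soft_shrink(b2 + D @ x, 1.0 / gamma)  # prox of (1/gamma)*||.||_1
        b1 = b1 + K @ x - y1
        b2 = b2 + D @ x - y2
    return x

K = np.array([[1.0, 0.0, 0.0]])                    # observe only x_1
D = np.array([[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]]) # forward differences
x = admm_inpaint(K, D, np.array([2.0]))
# x is (numerically) the constant vector (2, 2, 2)
```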


There exist different modifications of the ADMM algorithm presented above:

- inexact computation of the first step (8.8) [CT94, EB92], such that it can be handled
  by an iterative method,
- variable parameter and metric strategies [BPC+11, HLHY02, HY98, HYW00, KM98],
  where the fixed parameter \gamma can vary in each step, or the quadratic term (\gamma/2)\|Ax - y\|_2^2
  within the augmented Lagrangian L_\gamma is replaced by a more general proximal term,
  such that the ADMM updates (8.8) and (8.9) receive the form

      x^{(r+1)} \in \argmin_{x \in \mathbb{R}^d} \{ g(x) + \frac{1}{2} \|b^{(r)} + Ax - y^{(r)}\|_{Q_r}^2 \},
      y^{(r+1)} = \argmin_{y \in \mathbb{R}^m} \{ h(y) + \frac{1}{2} \|b^{(r)} + Ax^{(r+1)} - y\|_{Q_r}^2 \},

  respectively, with symmetric positive definite matrices Q_r. The variable parameter
  strategies might mitigate the dependence of the performance on the initially chosen fixed
  parameter [BPC+11, HYW00, KM98, WL01] and include monotone conditions [HY98,
  KM98] or more flexible non-monotone rules [BPC+11, HLHY02, HYW00].

Exercises

Exercise 8.1.7  Let g \in \Gamma_0(\mathbb{R}^d), h \in \Gamma_0(\mathbb{R}^m) and A \in \mathbb{R}^{m,d}. Denote by

    L(x, y, p) = g(x) + h(y) + \langle p, Ax - y \rangle,    L_\gamma(x, y, p) = L(x, y, p) + \frac{\gamma}{2} \|Ax - y\|_2^2

the Lagrangian and the augmented Lagrangian. Show that any saddle point (\hat{x}, \hat{y}, \hat{p}) of L
is a saddle point of L_\gamma.   \blacksquare


8.2 Primal-Dual Hybrid Gradient Algorithm


The first ADMM step (8.8) requires, in general, the solution of a linear system of equations.
Therefore, we discuss another algorithm for solving the primal problem (8.1) given by

    (P)    \argmin_{x \in \mathbb{R}^d} \{ g(x) + h(Ax) \}.

Using that

    h(Ax) = \sup_{p \in \mathbb{R}^m} \{ \langle p, Ax \rangle - h^*(p) \},

the problem can be rewritten as

    (P)    \argmin_{x \in \mathbb{R}^d} \sup_{p \in \mathbb{R}^m} \{ g(x) - h^*(p) + \langle p, Ax \rangle \}.

Similarly, we have seen in (8.10) that the dual problem can be rewritten as

    (D)    \argmax_{p \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^d} \{ g(x) - h^*(p) + \langle p, Ax \rangle \}.

Hence, the Lagrangian can be rewritten as

    L(x, p) = g(x) - h^*(p) + \langle p, Ax \rangle.

In order to find a saddle point of L, we can use the so-called Arrow-Hurwicz method, i.e.,
we alternate the application of the proximal mappings of L(\cdot, p) and -L(x, \cdot). This results
in the sequences

    x^{(r+1)} = \argmin_{x \in \mathbb{R}^d} \{ g(x) + \langle p^{(r)}, Ax \rangle + \frac{1}{2\tau} \|x - x^{(r)}\|_2^2 \}
              = prox_{\tau g}(x^{(r)} - \tau A^T p^{(r)}),                                                     (8.16)
    p^{(r+1)} = \argmin_{p \in \mathbb{R}^m} \{ h^*(p) - \langle p, Ax^{(r+1)} \rangle + \frac{1}{2\sigma} \|p - p^{(r)}\|_2^2 \}
              = prox_{\sigma h^*}(p^{(r)} + \sigma Ax^{(r+1)}).                                                     (8.17)

There are several modifications of this basic primal-dual algorithm which improve its
convergence properties, such as
- the predictor-corrector proximal multiplier method [CT94],
- the primal-dual hybrid gradient method (PDHG) [ZC08] with convergence proof in
  [BR12],
- the primal-dual hybrid gradient method with extrapolation of the primal or dual variable
  [CP11a, PCCB09], a preconditioned version [CP11b] and a generalization [Con13],
  Douglas-Rachford-type algorithms [BH13, BH14] for solving inclusion equations, see
  also [CP12, Vũ13], as well as an extension allowing the operator A to be non-linear
  [Val13].
Here is the algorithm proposed by Chambolle, Cremers and Pock [CP11a, PCCB09], which
is the iteration (8.16) and (8.17) with an extrapolation step for the dual variable p.


Algorithm 9: Primal-Dual Hybrid Gradient Method with Extrapolation of the Dual
Variable (PDHGMp)

Data: Initialization: x^{(0)} \in \mathbb{R}^d, p^{(0)} = \bar{p}^{(0)} \in \mathbb{R}^m, \tau, \sigma > 0 with \tau\sigma < 1/\|A\|_2^2 and \theta \in (0, 1]
for r = 0, 1, \dots do
    x^{(r+1)} = prox_{\tau g}(x^{(r)} - \tau A^T \bar{p}^{(r)});
    p^{(r+1)} = prox_{\sigma h^*}(p^{(r)} + \sigma Ax^{(r+1)});
    \bar{p}^{(r+1)} = p^{(r+1)} + \theta(p^{(r+1)} - p^{(r)});
end
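Algorithm 9 transcribes directly into code once the two proximal mappings are available as oracles. The lasso-type test instance below (g(x) = ½‖x − c‖², h = λ‖·‖₁, A = I, with minimizer S_λ(c) and dual solution c − S_λ(c)) is assumed for illustration only.

```python
import numpy as np

def soft_shrink(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def pdhgmp(prox_tau_g, prox_sigma_hstar, A, x0, p0, tau, sigma, theta=1.0, iters=2000):
    """Sketch of Algorithm 9 (PDHGMp) with extrapolated dual variable."""
    x, p = x0.copy(), p0.copy()
    p_bar = p.copy()
    for _ in range(iters):
        x = prox_tau_g(x - tau * (A.T @ p_bar))
        p_new = prox_sigma_hstar(p + sigma * (A @ x))
        p_bar = p_new + theta * (p_new - p)    # extrapolation of the dual variable
        p = p_new
    return x, p

# Assumed test problem: for h = lam*||.||_1, h* is the indicator of the box
# [-lam, lam]^d, so prox_{sigma h*} is simply the projection onto that box.
c = np.array([2.0, -0.2, 0.6])
lam, tau, sigma = 0.5, 0.4, 0.4            # tau*sigma = 0.16 < 1/||I||_2^2
x, p = pdhgmp(lambda v: (v + tau * c) / (1.0 + tau),
              lambda q: np.clip(q, -lam, lam),
              np.eye(3), np.zeros(3), np.zeros(3), tau, sigma)
```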

In order to compute the proximity operator prox_{\sigma h^*}, we apply the Moreau decomposition
from Theorem 6.2.1. We use the identities (\sigma h^*)^*(x) = \sigma h^{**}(\sigma^{-1} x) = \sigma h(\sigma^{-1} x) and
prox_{\sigma h(\sigma^{-1} \cdot)}(x) = \sigma\, prox_{\sigma^{-1} h}(\sigma^{-1} x) from Theorem 6.1.9 and Exercise 2.7.6. Then we obtain

    prox_{\sigma h^*}(x) = x - prox_{(\sigma h^*)^*}(x) = x - prox_{\sigma h(\sigma^{-1} \cdot)}(x) = x - \sigma\, prox_{\sigma^{-1} h}(\sigma^{-1} x).

Inserting p^{(r)} + \sigma Ax^{(r+1)} for x, the update (8.17) becomes

    p^{(r+1)} = prox_{\sigma h^*}(p^{(r)} + \sigma Ax^{(r+1)}) = p^{(r)} + \sigma Ax^{(r+1)} - \sigma\, prox_{\frac{1}{\sigma} h}(\frac{1}{\sigma} p^{(r)} + Ax^{(r+1)}).

Setting y^{(r+1)} := prox_{\frac{1}{\sigma} h}(\frac{1}{\sigma} p^{(r)} + Ax^{(r+1)}) and replacing b^{(r)} := \frac{1}{\sigma} p^{(r)} (similarly to
the ADMM), we arrive at the iterations

    x^{(r+1)} = prox_{\tau g}(x^{(r)} - \sigma\tau A^T b^{(r)}),
    y^{(r+1)} = prox_{\frac{1}{\sigma} h}(b^{(r)} + Ax^{(r+1)}),                                                     (8.18)
    b^{(r+1)} = b^{(r)} + Ax^{(r+1)} - y^{(r+1)}.
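The Moreau decomposition used above is easy to check numerically. For the assumed choice h = ‖·‖₁, the conjugate h* is the indicator of the ‖·‖∞-unit ball, so prox_{σh*} must be the projection onto [−1, 1]^m for every σ > 0:

```python
import numpy as np

def soft_shrink(u, tau):
    # prox of tau*||.||_1
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def prox_conjugate(x, sigma, prox_h_over_sigma):
    # Moreau decomposition: prox_{sigma h*}(x) = x - sigma * prox_{h/sigma}(x/sigma)
    return x - sigma * prox_h_over_sigma(x / sigma)

x = np.array([-2.5, -0.3, 0.0, 0.7, 4.0])
projections = [prox_conjugate(x, s, lambda u, s=s: soft_shrink(u, 1.0 / s))
               for s in (0.5, 1.0, 3.0)]
# each entry of `projections` equals np.clip(x, -1.0, 1.0), independently of sigma
```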

Remark 8.2.1 (Primal-Dual as Linearization of ADMM)
The iterations (8.18) can also be derived by applying a Taylor expansion at x^{(r)} to the
objective functional of the first step of the ADMM Algorithm 8. More precisely, we have

    \frac{\gamma}{2} \|b^{(r)} + Ax - y^{(r)}\|_2^2 \approx const + \gamma \langle A^T (Ax^{(r)} - y^{(r)} + b^{(r)}), x \rangle + \frac{\gamma}{2} (x - x^{(r)})^T A^T A\, (x - x^{(r)}).

By approximating A^T A \approx \frac{1}{\gamma\tau} I, setting \gamma := \sigma and using b^{(r)} instead of Ax^{(r)} - y^{(r)} + b^{(r)}
(note that b^{(r+1)} = b^{(r)} + Ax^{(r+1)} - y^{(r+1)}), we obtain exactly the same update steps as
in (8.18).   \circ

Together with the extrapolation step from before, we arrive at the following equivalent
formulation of Algorithm 9.

Algorithm 10: Equivalent Formulation of Algorithm 9 (PDHGMp)

Data: Initialization: x^{(0)} \in \mathbb{R}^d, b^{(0)} = \bar{b}^{(0)} \in \mathbb{R}^m, \tau, \sigma > 0 with \tau\sigma < 1/\|A\|_2^2 and \theta \in (0, 1]
for r = 0, 1, \dots do
    x^{(r+1)} = \argmin_{x \in \mathbb{R}^d} \{ g(x) + \frac{1}{2\tau} \|x - (x^{(r)} - \tau\sigma A^T \bar{b}^{(r)})\|_2^2 \};
    y^{(r+1)} = \argmin_{y \in \mathbb{R}^m} \{ h(y) + \frac{\sigma}{2} \|b^{(r)} + Ax^{(r+1)} - y\|_2^2 \};
    b^{(r+1)} = b^{(r)} + Ax^{(r+1)} - y^{(r+1)};
    \bar{b}^{(r+1)} = b^{(r+1)} + \theta(b^{(r+1)} - b^{(r)});
end


For A being the identity, \theta = 1 and \gamma = \sigma = 1/\tau, the PDHGMp algorithm corresponds to
the ADMM as well as to the Douglas-Rachford splitting algorithm as derived in Section 8.1.
The following theorem and convergence proof are based on [CP11a]. (Lecture 26, 18.07.2023)

Theorem 8.2.2: Convergence of PDHGMp

    Let g \in \Gamma_0(\mathbb{R}^d), h \in \Gamma_0(\mathbb{R}^m) and \theta \in (0, 1]. Let \tau, \sigma > 0 fulfill

        \tau\sigma < 1/\|A\|_2^2,                                                     (8.19)

    where \|A\|_2 denotes the spectral norm of A. Suppose that the Lagrangian L(x, p) :=
    g(x) - h^*(p) + \langle Ax, p \rangle has a saddle point (x^*, p^*). Then the sequence ((x^{(r)}, p^{(r)}))_r
    produced by the PDHGMp converges to a saddle point of the Lagrangian.

Proof. We restrict the proof to the case \theta = 1. By the update steps (8.16) and (8.17) it holds

    x^{(r+1)} = (I + \tau \partial g)^{-1}(x^{(r)} - \tau A^T \bar{p}^{(r)}),
    p^{(r+1)} = (I + \sigma \partial h^*)^{-1}(p^{(r)} + \sigma Ax^{(r+1)}),

i.e.,

    \frac{x^{(r)} - x^{(r+1)}}{\tau} - A^T \bar{p}^{(r)} \in \partial g(x^{(r+1)}),        \frac{p^{(r)} - p^{(r+1)}}{\sigma} + Ax^{(r+1)} \in \partial h^*(p^{(r+1)}).

By definition of the subdifferential we obtain for all x \in \mathbb{R}^d and all p \in \mathbb{R}^m that

    g(x) \geq g(x^{(r+1)}) + \frac{1}{\tau} \langle x^{(r)} - x^{(r+1)}, x - x^{(r+1)} \rangle - \langle A^T \bar{p}^{(r)}, x - x^{(r+1)} \rangle,
    h^*(p) \geq h^*(p^{(r+1)}) + \frac{1}{\sigma} \langle p^{(r)} - p^{(r+1)}, p - p^{(r+1)} \rangle + \langle p - p^{(r+1)}, Ax^{(r+1)} \rangle,

and by adding the equations, inserting x = x^* and p = p^*, and using the saddle point
property L(x^{(r+1)}, p^*) - L(x^*, p^{(r+1)}) \geq 0, we obtain

    0 \geq g(x^{(r+1)}) - h^*(p^*) - (g(x^*) - h^*(p^{(r+1)}))
          - \langle A^T \bar{p}^{(r)}, x^* - x^{(r+1)} \rangle + \langle p^* - p^{(r+1)}, Ax^{(r+1)} \rangle
          + \frac{1}{\tau} \langle x^{(r)} - x^{(r+1)}, x^* - x^{(r+1)} \rangle + \frac{1}{\sigma} \langle p^{(r)} - p^{(r+1)}, p^* - p^{(r+1)} \rangle
      = L(x^{(r+1)}, p^*) - \langle p^*, Ax^{(r+1)} \rangle - L(x^*, p^{(r+1)}) + \langle p^{(r+1)}, Ax^* \rangle
          - \langle \bar{p}^{(r)}, A(x^* - x^{(r+1)}) \rangle + \langle p^* - p^{(r+1)}, Ax^{(r+1)} \rangle
          + \frac{1}{\tau} \langle x^{(r)} - x^{(r+1)}, x^* - x^{(r+1)} \rangle + \frac{1}{\sigma} \langle p^{(r)} - p^{(r+1)}, p^* - p^{(r+1)} \rangle
      \geq -\langle p^{(r+1)} - \bar{p}^{(r)}, A(x^{(r+1)} - x^*) \rangle
          + \frac{1}{\tau} \langle x^{(r)} - x^{(r+1)}, x^* - x^{(r+1)} \rangle + \frac{1}{\sigma} \langle p^{(r)} - p^{(r+1)}, p^* - p^{(r+1)} \rangle.

By the identity for inner products

    \langle x^{(r)} - x^{(r+1)}, x^* - x^{(r+1)} \rangle = \frac{1}{2} \big( \|x^{(r)} - x^{(r+1)}\|_2^2 + \|x^* - x^{(r+1)}\|_2^2 - \|x^* - x^{(r)}\|_2^2 \big)


and the same identity for p, this can be rewritten as

    \frac{1}{2\tau} \|x^* - x^{(r)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(r)}\|_2^2
    \geq \frac{1}{2\tau} \|x^{(r)} - x^{(r+1)}\|_2^2 + \frac{1}{2\tau} \|x^* - x^{(r+1)}\|_2^2 + \frac{1}{2\sigma} \|p^{(r)} - p^{(r+1)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(r+1)}\|_2^2
        - \langle p^{(r+1)} - \bar{p}^{(r)}, A(x^{(r+1)} - x^*) \rangle
    = \frac{1}{2\tau} \|x^{(r)} - x^{(r+1)}\|_2^2 + \frac{1}{2\tau} \|x^* - x^{(r+1)}\|_2^2 + \frac{1}{2\sigma} \|p^{(r)} - p^{(r+1)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(r+1)}\|_2^2
        + \langle p^{(r+1)} - 2p^{(r)} + p^{(r-1)}, A(x^* - x^{(r+1)}) \rangle,

where we have inserted the definition \bar{p}^{(r)} = 2p^{(r)} - p^{(r-1)}. We estimate the last summand
using the Cauchy-Schwarz inequality as follows:

    \langle p^{(r+1)} - p^{(r)} - (p^{(r)} - p^{(r-1)}), A(x^* - x^{(r+1)}) \rangle
    = \langle p^{(r+1)} - p^{(r)}, A(x^* - x^{(r+1)}) \rangle - \langle p^{(r)} - p^{(r-1)}, A(x^* - x^{(r)}) \rangle
        - \langle p^{(r)} - p^{(r-1)}, A(x^{(r)} - x^{(r+1)}) \rangle
    \geq \langle p^{(r+1)} - p^{(r)}, A(x^* - x^{(r+1)}) \rangle - \langle p^{(r)} - p^{(r-1)}, A(x^* - x^{(r)}) \rangle
        - \|A\|_2 \|x^{(r+1)} - x^{(r)}\|_2 \|p^{(r)} - p^{(r-1)}\|_2.

Since

    2uv \leq \alpha u^2 + \frac{1}{\alpha} v^2,    \alpha > 0,                                                     (8.20)

we obtain

    \|A\|_2 \|x^{(r+1)} - x^{(r)}\|_2 \|p^{(r)} - p^{(r-1)}\|_2
    \leq \frac{\|A\|_2}{2} \Big( \alpha \|x^{(r+1)} - x^{(r)}\|_2^2 + \frac{1}{\alpha} \|p^{(r)} - p^{(r-1)}\|_2^2 \Big)
    = \frac{\|A\|_2 \alpha\tau}{2\tau} \|x^{(r+1)} - x^{(r)}\|_2^2 + \frac{\|A\|_2 \sigma}{2\alpha\sigma} \|p^{(r)} - p^{(r-1)}\|_2^2.

With \alpha := \sqrt{\sigma/\tau}, the relation

    \|A\|_2 \alpha\tau = \frac{\|A\|_2 \sigma}{\alpha} = \|A\|_2 \sqrt{\sigma\tau} < 1

holds true. Thus, we get

    \frac{1}{2\tau} \|x^* - x^{(r)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(r)}\|_2^2
    \geq \frac{1}{2\tau} \|x^* - x^{(r+1)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(r+1)}\|_2^2
        + \frac{1}{2\tau} (1 - \|A\|_2 \sqrt{\sigma\tau}) \|x^{(r+1)} - x^{(r)}\|_2^2 + \frac{1}{2\sigma} \|p^{(r+1)} - p^{(r)}\|_2^2 - \frac{\|A\|_2 \sqrt{\sigma\tau}}{2\sigma} \|p^{(r)} - p^{(r-1)}\|_2^2
        + \langle p^{(r+1)} - p^{(r)}, A(x^* - x^{(r+1)}) \rangle - \langle p^{(r)} - p^{(r-1)}, A(x^* - x^{(r)}) \rangle.                                                     (8.21)

Summing up these inequalities from r = 0 to N - 1 and regarding that p^{(0)} = p^{(-1)}, we
obtain

    \frac{1}{2\tau} \|x^* - x^{(0)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(0)}\|_2^2
    \geq \frac{1}{2\tau} \|x^* - x^{(N)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(N)}\|_2^2
        + (1 - \|A\|_2 \sqrt{\sigma\tau}) \Big( \frac{1}{2\tau} \sum_{r=1}^{N} \|x^{(r)} - x^{(r-1)}\|_2^2 + \frac{1}{2\sigma} \sum_{r=1}^{N-1} \|p^{(r)} - p^{(r-1)}\|_2^2 \Big)
        + \frac{1}{2\sigma} \|p^{(N)} - p^{(N-1)}\|_2^2 + \langle p^{(N)} - p^{(N-1)}, A(x^* - x^{(N)}) \rangle.


By

    \langle p^{(N)} - p^{(N-1)}, A(x^{(N)} - x^*) \rangle \geq -\frac{1}{2\sigma} \|p^{(N)} - p^{(N-1)}\|_2^2 - \frac{\|A\|_2^2 \sigma\tau}{2\tau} \|x^{(N)} - x^*\|_2^2,

this can be further estimated as

    \frac{1}{2\tau} \|x^* - x^{(0)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(0)}\|_2^2
    \geq \frac{1}{2\tau} (1 - \|A\|_2^2 \sigma\tau) \|x^* - x^{(N)}\|_2^2 + \frac{1}{2\sigma} \|p^* - p^{(N)}\|_2^2
        + (1 - \|A\|_2 \sqrt{\sigma\tau}) \Big( \frac{1}{2\tau} \sum_{r=1}^{N} \|x^{(r)} - x^{(r-1)}\|_2^2 + \frac{1}{2\sigma} \sum_{r=1}^{N-1} \|p^{(r)} - p^{(r-1)}\|_2^2 \Big).                                                     (8.22)

By (8.22) we conclude that the sequence ((x^{(n)}, p^{(n)}))_n is bounded. Thus, there exists a
subsequence ((x^{(n_j)}, p^{(n_j)}))_j which converges to some point (\hat{x}, \hat{p}) as j \to \infty.
Further, we see by (8.22) that

    \lim_{n \to \infty} (x^{(n)} - x^{(n-1)}) = 0,        \lim_{n \to \infty} (p^{(n)} - p^{(n-1)}) = 0.

Consequently,

    \lim_{j \to \infty} (x^{(n_j - 1)} - \hat{x}) = 0,        \lim_{j \to \infty} (p^{(n_j - 1)} - \hat{p}) = 0

holds true. Let T denote the iteration operator of one PDHGMp cycle, i.e., T(x^{(r)}, p^{(r)}) =
(x^{(r+1)}, p^{(r+1)}). Since T is a concatenation of affine operators and proximal mappings, it
is continuous. Now we have T(x^{(n_j - 1)}, p^{(n_j - 1)}) = (x^{(n_j)}, p^{(n_j)}), and taking the limit
for j \to \infty we see that T(\hat{x}, \hat{p}) = (\hat{x}, \hat{p}), so that (\hat{x}, \hat{p}) is a fixed point of the iteration and
thus a saddle point of L. Using this particular saddle point in (8.21) and summing up from
r = n_j to N - 1, N > n_j, we obtain

    \frac{1}{2\tau} \|\hat{x} - x^{(n_j)}\|_2^2 + \frac{1}{2\sigma} \|\hat{p} - p^{(n_j)}\|_2^2
    \geq \frac{1}{2\tau} \|\hat{x} - x^{(N)}\|_2^2 + \frac{1}{2\sigma} \|\hat{p} - p^{(N)}\|_2^2
        + (1 - \|A\|_2 \sqrt{\sigma\tau}) \Big( \frac{1}{2\tau} \sum_{r=n_j}^{N-1} \|x^{(r+1)} - x^{(r)}\|_2^2 + \frac{1}{2\sigma} \sum_{r=n_j+1}^{N-1} \|p^{(r)} - p^{(r-1)}\|_2^2 \Big)
        + \frac{1}{2\sigma} \|p^{(N)} - p^{(N-1)}\|_2^2 - \frac{\|A\|_2 \sqrt{\sigma\tau}}{2\sigma} \|p^{(n_j)} - p^{(n_j - 1)}\|_2^2
        + \langle p^{(N)} - p^{(N-1)}, A(\hat{x} - x^{(N)}) \rangle - \langle p^{(n_j)} - p^{(n_j - 1)}, A(\hat{x} - x^{(n_j)}) \rangle

and further

    \frac{1}{2\tau} \|\hat{x} - x^{(n_j)}\|_2^2 + \frac{1}{2\sigma} \|\hat{p} - p^{(n_j)}\|_2^2
    \geq \frac{1}{2\tau} \|\hat{x} - x^{(N)}\|_2^2 + \frac{1}{2\sigma} \|\hat{p} - p^{(N)}\|_2^2
        + \frac{1}{2\sigma} \|p^{(N)} - p^{(N-1)}\|_2^2 - \frac{\|A\|_2 \sqrt{\sigma\tau}}{2\sigma} \|p^{(n_j)} - p^{(n_j - 1)}\|_2^2
        + \langle p^{(N)} - p^{(N-1)}, A(\hat{x} - x^{(N)}) \rangle - \langle p^{(n_j)} - p^{(n_j - 1)}, A(\hat{x} - x^{(n_j)}) \rangle.

For j \to \infty, this implies that (x^{(N)}, p^{(N)}) also converges to (\hat{x}, \hat{p}), and we are done.   \square


Similarly to the parallel DRS from Algorithm 7 and to Remark 8.1.5, the PDHGMp can be
applied for minimizing functionals with more than two terms.

Remark 8.2.3  We derive the PDHGMp algorithm for functionals of the form

    f(x) = g(x) + \sum_{i=1}^n h_i(A_i x).

As in Remark 8.1.5, we rewrite f as f(x) = g(x) + H(Ax), where

    H(y_1, \dots, y_n) = \sum_{i=1}^n h_i(y_i)   and   A = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix},

and apply the PDHGMp. This leads to the iteration

    x^{(r+1)} = prox_{\tau g}\Big( x^{(r)} - \tau\sigma \sum_{i=1}^n A_i^T \bar{b}_i^{(r)} \Big),
    y_i^{(r+1)} = prox_{\frac{1}{\sigma} h_i}\big( b_i^{(r)} + A_i x^{(r+1)} \big),
    b_i^{(r+1)} = b_i^{(r)} + A_i x^{(r+1)} - y_i^{(r+1)},
    \bar{b}_i^{(r+1)} = b_i^{(r+1)} + \theta(b_i^{(r+1)} - b_i^{(r)}).   \circ

Using the primal-dual algorithm, we can now solve the L2-TV problem considered in
Example 4.3.2, Example 8.1.4 and Example 8.1.6 directly, without solving a linear system.

Example 8.2.4  Let K \in \mathbb{R}^{n,d} and D \in \mathbb{R}^{m,d} be matrices. Then, the PDHGMp for
minimizing the functional

    J(x) = \underbrace{\frac{1}{2} \|Kx - a\|_2^2}_{=: h_1(Kx)} + \underbrace{\lambda \|Dx\|_1}_{=: h_2(Dx)},    g(x) = 0,

reads as

    x^{(r+1)} = prox_{\tau g}(x^{(r)} - \tau\sigma A^T \bar{b}^{(r)}) = x^{(r)} - \tau\sigma K^T \bar{b}_1^{(r)} - \tau\sigma D^T \bar{b}_2^{(r)},
    y_1^{(r+1)} = prox_{\frac{1}{\sigma} h_1}(b_1^{(r)} + Kx^{(r+1)}) = \frac{\sigma}{\sigma+1} (b_1^{(r)} + Kx^{(r+1)}) + \frac{1}{\sigma+1}\, a,
    y_2^{(r+1)} = prox_{\frac{1}{\sigma} h_2}(b_2^{(r)} + Dx^{(r+1)}) = S_{\frac{\lambda}{\sigma}}(b_2^{(r)} + Dx^{(r+1)}),
    b_1^{(r+1)} = b_1^{(r)} + Kx^{(r+1)} - y_1^{(r+1)},
    b_2^{(r+1)} = b_2^{(r)} + Dx^{(r+1)} - y_2^{(r+1)},
    \bar{b}_i^{(r+1)} = b_i^{(r+1)} + \theta(b_i^{(r+1)} - b_i^{(r)}),    i = 1, 2.

Since every step can be computed in closed form, this is a more efficient algorithm than the
one derived in Example 8.1.4. Similarly, if h_1 = \iota_{\{a\}}, then we replace the y_1-update by
y_1^{(r+1)} = a.   \diamond
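The closed-form steps translate directly into code. The sanity check below again uses the assumed instance K = D = I (made-up data, with minimizer S_λ(a)); note that here ‖A‖₂² = ‖KᵀK + DᵀD‖₂ = 2, so τσ < 1/2 is required.

```python
import numpy as np

def soft_shrink(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def pdhgmp_l2tv(K, D, a, lam, tau, sigma, theta=1.0, iters=5000):
    """Sketch of Example 8.2.4: PDHGMp for 0.5*||Kx - a||^2 + lam*||Dx||_1
    with g = 0; every step is in closed form."""
    x = np.zeros(K.shape[1])
    b1 = bb1 = np.zeros(K.shape[0])
    b2 = bb2 = np.zeros(D.shape[0])
    for _ in range(iters):
        x = x - tau * sigma * (K.T @ bb1 + D.T @ bb2)
        y1 = (sigma * (b1 + K @ x) + a) / (sigma + 1.0)   # prox of (1/sigma)*h1
        y2 = soft_shrink(b2 + D @ x, lam / sigma)         # prox of (1/sigma)*h2
        b1_new = b1 + K @ x - y1
        b2_new = b2 + D @ x - y2
        bb1 = b1_new + theta * (b1_new - b1)              # extrapolation
        bb2 = b2_new + theta * (b2_new - b2)
        b1, b2 = b1_new, b2_new
    return x

a = np.array([2.0, -0.2, 0.6])
x = pdhgmp_l2tv(np.eye(3), np.eye(3), a, lam=0.5, tau=0.4, sigma=0.4)
```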

Index

A
affine combination, 4; affine function, 6; affine hull, 4; affine set, 4; affinely independent, 5;
alternating direction method of multipliers, 144; alternating projection algorithm, 87;
angle, 19; asymptotically regular, 80; augmented Lagrangian, 144

B
barycentric coordinates, 6

C
canonical embedding, 18; closed function, 29; closed halfspace, 7; closure, 29;
closure operator, 10, 31; coercive, 53; cone, 16; contractive, 77; convex combination, 2;
convex cone, 16; convex function, 32; convex hull, 2; convex set, 2

D
domain, 103; dual cone, 18; duality gap, 127

E
effective domain, 27; epigraph, 27; extended real numbers, 26

F
feasible set, 133; Fenchel-Young identity, 115; FISTA, 90; fixed point, 80;
Fréchet differentiability, 40

G
Gateaux derivative, 38; Gateaux gradient, 39; Gateaux Hessian, 39;
general Uzawa method, 144; global minimizer, 53; Gordon theorem, 141; graph, 103

H
hyperplane, 7

I
indicator function, 26; infimal convolution, 58; inverse problems, 97;
iterative soft-thresholding algorithm, 89

K
Krasnoselskii-Mann iteration, 77

L
Lagrangian, 133, 143; λ-convex, 32; level set, 27; level-bounded, 53; linear programs, 132;
local minimizer, 53; lower semicontinuous, 27; lower semicontinuous hull, 29

M
MAP estimator, 102; maximal monotone, 104; method of multipliers, 144;
Minkowski sum, 3; monotone, 104; Moreau decomposition, 118; Moreau envelope, 61

N
NESTA, 90; Nesterov algorithm, 90; non-expansive, 77; normal cone, 21

O
one-sided directional derivative, 38; orthogonal projection, 23

P
parallel DRS, 95; polar cone, 18; polyhedral set, 7;
predictor corrector proximal multiplier method, 150;
primal dual hybrid gradient method (PDHG), 150; proper function, 27;
proximal mapping, 61

R
reflexive, 18; relative interior, 8; resolvent, 107

S
saddle point, 138, 143; simplex, 6; soft-shrinkage, 24; soft-shrinkage operator, 61;
strict epigraph, 27; strictly convex function, 32; strong duality, 127; strongly convex, 32;
strongly monotone, 104; subdifferential, 66; supercoercive, 53; support function, 123;
supporting halfspace, 15; supporting hyperplane, 15

T
tangential cone, 20; Tikhonov regularization, 98; total variation, 100

U
uniformly convex function, 32; unit simplex, 6

V
variable metric strategies, 90; variational inequality, 62; vecmax function, 37


References
[AB99] Charalambos Dionisios Aliprantis and Kim Border. Infinite Dimensional Analysis -
A Hitchhiker's Guide. Springer Science & Business Media, Berlin Heidelberg,
2nd edition, 1999.
[AB09] H. Attouch and J. Bolte. On the convergence of the proximal algorithm for
nonsmooth functions involving analytic features. Math. Program., 116(1-2):5–
16, 2009.
[ABS13] H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for
semi-algebraic and tame problems proximal algorithms, forward-backward split-
ting, and regularized gauss-seidel methods. Math. Program. Series A, 137(1-
2):91–129, 2013.
[AE06] Jean-Pierre Aubin and Ivar Ekeland. Applied nonlinear analysis. Courier Cor-
poration, 2006.
[AF09] Jean-Pierre Aubin and Hélène Frankowska. Set-valued analysis. Birkhäuser
Boston, MA, 1 edition, 2009.
[AHU58] K. J. Arrow, L. Hurwitz, and H. Uzawa. Studies in Linear and Nonlinear Pro-
gramming. Stanford University Press, 1958.
[Alb96] Ya I Alber. Metric and generalized projection operators in Banach spaces: properties
and applications. In Athanass Kartsatos, editor, Theory and Applications
of Nonlinear Operators of Accretive and Monotone Type, volume 178, chapter 2,
pages 15–50. CRC Press, 1996.
[AM19] Francisco J. Aragón Artacho and Paula Segura Martínez. Finding magic squares
with the Douglas-Rachford algorithm, 2019.
[AO16] Aram V Arutyunov and Valeri Obukhovskii. Convex and Set-Valued Analysis.
De Gruyter, 2016.
[Bačá14] Miroslav Bačák. Computing medians and means in Hadamard spaces. SIAM
Journal on Optimization, 24(3):1542–1566, 2014.
[Bar02] Alexander Barvinok. A course in convexity, volume 54. American Mathematical
Society, 2002.
[BB88] J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA
Journal of Numerical Analysis, 8:141–148, 1988.
[BBC11] S. Becker, J. Bobin, and Emmanuel Jean Candès. NESTA: a fast and accurate
first-order method for sparse recovery. SIAM Journal on Imaging Sciences,
4(1):1–39, 2011.
[BC11] Heinz H. Bauschke and Patrick Louis Combettes. Convex analysis and monotone
operator theory in Hilbert spaces, volume 408. Springer, 2011.
[BGLS95] Joseph Frédéric Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal.
A family of variable metric proximal methods. Mathematical Programming,


68:15–47, 1995.
[BH13] Radu Ioan Boţ and Christopher Hendrich. A Douglas-Rachford type primal-dual
method for solving inclusions with mixtures of composite and parallel-sum type
monotone operators. SIAM Journal on Optimization, 23(4):2541–2565, 2013.
[BH14] Radu Ioan Boţ and Christopher Hendrich. Convergence analysis for a primal-
dual monotone + skew splitting algorithm with applications to total vari-
ation minimization. Journal of Mathematical Imaging and Vision, 2014.
arXiv:1211.1706.
[BL11] Kristian Bredies and Dirk Lorenz. Mathematische Bildverarbeitung, volume 1.
Springer, 2011.
[BL12] Heinz H. Bauschke and Yves Lucet. What is a Fenchel conjugate? Notices of the
AMS, 59(1):44–46, 2012.
[BL18] Kristian Bredies and Dirk Lorenz. Mathematical Image Processing. Springer,
2018.
[Bol14] D. Boley. Local linear convergence of the alternating direction method of mul-
tipliers on quadratic or linear programs. SIAM J. Optim., 2014.
[Bon19] Joseph Frédéric Bonnans. Convex and Stochastic Optimization. Springer, 2019.
[Bou30] Georges Louis Bouligand. Sur les surfaces dépourvues de points hyperlimites.
Ann. Soc. Polon. Math, 9:32–41, 1930.
[BP12] Viorel Barbu and Teodor Precupanu. Convexity and optimization in Banach
spaces. Springer Science & Business Media, 2012.
[BPC` 11] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimiza-
tion and statistical learning via the alternating direction method of multipliers.
Foundations and Trends in Machine Learning, 3(1):101–122, 2011.
[BQ99] J. V. Burke and M. Qian. A variable metric proximal point algorithm for mono-
tone operators. SIAM Journal on Control and Optimization, 37:353–375, 1999.
[BR12] Silvia Bonettini and Valeria Ruggiero. On the convergence of primal-dual hybrid
gradient algorithms for total variation image restoration. J. Math. Imaging Vis.,
44:236–253, 2012.
[Bre65] Lev Meerovich Bregman. Finding the common point of convex sets by the
method of successive projection. Doklady Akademii Nauk, 162(3):487–490, 1965.
[Bré11] Haïm Brézis. Functional analysis, Sobolev spaces and partial differential equa-
tions. Universitext. Springer, 2011.
[BS13] Joseph Frédéric Bonnans and Alexander Shapiro. Perturbation analysis of opti-
mization problems. Springer Science & Business Media, 2013.
[BST13] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized mini-
mization for nonconvex and nonsmooth problems. Math. Program., Series A,
2013.

[BT09a] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm
for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202,
2009.
[BT09b] Amir Beck and Marc Teboulle. Fast gradient-based algorithms for constrained
total variation image denoising and deblurring problems. IEEE Transactions on
Image Processing, 18(11):2419–2434, 2009.
[BV04] Stephen Poythress Boyd and Lieven Vandenberghe. Convex optimization. Cam-
bridge University Press, 2004.
[BV10] Jonathan M Borwein and Jon D. Vanderwerff. Convex functions: constructions,
characterizations and counterexamples, volume 172. Cambridge University Press,
Cambridge, 2010.
[Byr04] Charles Byrne. A unified treatment of some iterative algorithms in signal pro-
cessing and image reconstruction. Inverse Problems, 20:103–120, 2004.
[CG59] Ward Cheney and Allen A. Goldstein. Proximity maps for convex sets. Proceed-
ings of the American Mathematical Society, 10(3):448–450, 1959.
[Cha09] Charles Chidume. Geometric properties of Banach spaces and nonlinear itera-
tions. Springer, 2009.
[Cla90a] Frank H. Clarke. Optimization and Nonsmooth Analysis. Society for Industrial
and Applied Mathematics, 1990.
[Cla90b] Frank H. Clarke. Optimization and Nonsmooth Analysis. Society for Industrial
and Applied Mathematics, 1990.
[Com04] Patrick Louis Combettes. Solving monotone inclusions via compositions of non-
expansive averaged operators. Optimization, 53(5–6):475–504, 2004.
[Con13] L. Condat. A primal-dual splitting method for convex optimization involving
Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl.,
158(2):460–479, 2013.
[CP11a] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex
problems with applications to imaging. Journal of Mathematical Imaging and
Vision, 40(1):120–145, 2011.
[CP11b] Antonin Chambolle and Thomas Pock. Diagonal preconditioning for first order
primal-dual algorithms in convex optimization. ICCV, pages 1762–1769, 2011.
[CP11c] Patrick Louis Combettes and Jean-Christophe Pesquet. Proximal splitting meth-
ods in signal processing. In Fixed-point algorithms for inverse problems in sci-
ence and engineering, pages 185–212. Springer, 2011.
[CP12] Patrick Combettes and Jean-Christophe Pesquet. Primal-dual splitting algorithm
for solving inclusions with mixture of composite, Lipschitzian, and parallel-sum
type monotone operators. Set-Valued and Variational Analysis, 20(2):307–330,
2012.
[CPR13] E. Chouzenoux, J.-C. Pesquet, and A. Repetti. Variable metric forward-
backward algorithm for minimizing the sum of a differentiable function and
a convex function. J. Optim. Theory Appl., 2013.
[CR97] George HG Chen and R Tyrrell Rockafellar. Convergence rates in forward–
backward splitting. SIAM Journal on Optimization, 7(2):421–444, 1997.
[CT94] Gong Chen and Marc Teboulle. A proximal-based decomposition method for
convex minimization problems. Mathematical Programming, 64:81–101, 1994.
[CV77] Charles Castaing and Michel Valadier. Convex analysis and measurable multi-
functions. Lecture Notes in Mathematics. Springer Berlin, Heidelberg, 1 edition,
1977.
[CV12] Patrick Louis Combettes and B. C. Vu. Variable metric forward-backward split-
ting with applications to monotone inclusions in duality. Optimization, pages
1–30, 2012.
[DDD04] Ingrid Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algo-
rithm for linear inverse problems with a sparsity constraint. Communications
on Pure and Applied Mathematics, 57(11):1413–1457, 2004.
[DFL08] I. Daubechies, M. Fornasier, and I. Loris. Accelerated projected gradient meth-
ods for linear inverse problems with sparsity constraints. The Journal of Fourier
Analysis and Applications, 14(5-6):764–792, 2008.
[DHJJ10] Joachim Dahl, Per Christian Hansen, Søren Holdt Jensen, and Tobias Lindstrøm
Jensen. Algorithms and software for total variation image reconstruction via
first-order methods. Numerical Algorithms, 53(1):67, 2010.
[DJ98] D. L. Donoho and I. M. Johnstone. Minimax estimation via wavelet shrinkage.
The Annals of Statistics, 26(3):879–921, 1998.
[DM93] Gianni Dal Maso. An Introduction to Γ-Convergence. Springer Science & Busi-
ness Media, 1993.
[Don01] David Leigh Donoho. Sparse components of images and optimal atomic decom-
positions. Constructive Approximation, 17(3):353–382, 2001.
[DSSSC08] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections
onto the l1-ball for learning in high dimensions. In ICML ’08 Proceedings of the
25th International Conference on Machine Learning, ACM New York, 2008.
[Dud02] Richard Mansfield Dudley. Real Analysis and Probability. Cambridge Studies in
Advanced Mathematics. Cambridge University Press, 2 edition, 2002.
[EB90] J. Eckstein and Dimitri Panteli Bertsekas. An alternating direction method for
linear programming. Technical report, Laboratory for Information and Decision
Systems, Massachusetts Institute of Technology, 1990.
[EB92] J. Eckstein and D. P. Bertsekas. On the Douglas-Rachford splitting method and
the proximal point algorithm for maximal monotone operators. Mathematical
Programming, 55:293–318, 1992.

[Ess09] E. Esser. Applications of Lagrangian-based alternating direction methods and
connections to split Bregman. Technical report, UCLA Computational and
Applied Mathematics, March 2009.
[ET99] Ivar Ekeland and Roger Temam. Convex analysis and variational problems.
SIAM, 1999.
[FGH57] Ky Fan, Irving Glicksberg, and Alan Jerome Hoffman. Systems of inequalities
involving convex functions. Proceedings of the American Mathematical Society,
8:617–622, 1957.
[FP03] Francisco Facchinei and Jong-Shi Pang. Finite-dimensional variational inequal-
ities and complementarity problems, volume 2. Springer, 2003.
[FZZ08] G. Frassoldati, L. Zanni, and G. Zanghirati. New adaptive stepsize selections
in gradient methods. Journal of Industrial and Management Optimization,
4(2):299–312, 2008.
[Gab83] D. Gabay. Applications of the method of multipliers to variational inequalities.
In M. Fortin and R. Glowinski, editors, Augmented Lagrangian Methods: Appli-
cations to the Solution of Boundary Value Problems, chapter IX, pages 299–340.
North-Holland, Amsterdam, 1983.
[GL89] R. Glowinski and P. Le Tallec. Augmented Lagrangian and Operator-Splitting
Methods in Nonlinear Mechanics, volume 9 of SIAM Studies in Applied and
Numerical Mathematics. SIAM, Philadelphia, 1989.
[GM75] R. Glowinski and A. Marroco. Sur l’approximation, par éléments finis d’ordre un,
et la résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet
non linéaires. Revue française d’automatique, informatique, recherche opéra-
tionnelle. Analyse numérique, 9(2):41–76, 1975.
[GM76] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear vari-
ational problems via finite element approximation. Computers & Mathematics
with Applications, 2:17–40, 1976.
[GS11] D. Goldfarb and K. Scheinberg. Fast first-order methods for composite convex
optimization with line search. SIAM Journal on Imaging Sciences, 2011.
[Gül92] Osman Güler. New proximal point algorithms for convex minimization. SIAM
J. Optim., 2(4):649–664, 1992.
[Hel69] G. Helmberg. Introduction to Spectral Theory in Hilbert Space. North Holland,
Amsterdam, 1969.
[HL12] M. Hong and Z. Q. Luo. On linear convergence of the alternating direction
method of multipliers. Arxiv preprint 1208.3922, 2012.
[HLHY02] Bingsheng He, Li-Zhi Liao, Deren Han, and Hai Yang. A new inexact alternating
directions method for monotone variational inequalities. Math. Program., Ser.
A, 92(1):103–118, 2002.
[Hol72] Richard Bruce Holmes. A course on optimization and best approximation, vol-
ume 257 of Lecture Notes in Mathematics. Springer Berlin, Heidelberg, 1972.
[Hol12] Richard Bruce Holmes. Geometric functional analysis and its applications, vol-
ume 24. Springer Science & Business Media, 2012.
[HUL93] Jean-Baptiste Hiriart-Urruty and Claude Lemaréchal. Convex analysis and min-
imization algorithms I: Fundamentals, volume 305. Springer Science & Business
Media, 1993.
[HY98] B. He and H. Yang. Some convergence properties of the method of multipliers
for linearly constrained monotone variational operators. Operations Research
Letters, 23:151–161, 1998.
[HYW00] B. S. He, H. Yang, and S. L. Wang. Alternating direction method with self-
adaptive penalty parameters for monotone variational inequalities. J. Optimiz.
Theory App., 106(2):337–356, 2000.
[Jah07] Johannes Jahn. Introduction to the Theory of Nonlinear Optimization. Springer
Berlin, Heidelberg, 3 edition, 2007.
[KM98] Spyridon Kontogiorgis and Robert R Meyer. A variable-penalty alternating
directions method for convex optimization. Math. Program., 83(1-3):29–53, 1998.
[Kra55] Mark Aleksandrovich Krasnoselskii. Two observations about the method of
successive approximations. Uspekhi Matematicheskikh Nauk, 10:123–127, 1955.
[LM79] P. L. Lions and B. Mercier. Splitting algorithms for the sum of two nonlinear
operators. SIAM Journal on Numerical Analysis, 16(6):964–979, 1979.
[Luc06] Roberto Lucchetti. Convexity and well-posed problems. Springer Science &
Business Media, 2006.
[Man53] William Robert Mann. Mean value methods in iteration. Proceedings of the
American Mathematical Society, 4(3):506–510, 1953.
[Man94] Olvi Leon Mangasarian. Nonlinear Programming. Classics in Applied Mathe-
matics. SIAM, Madison, WI, 1994.
[Min62] George James Minty. Monotone (nonlinear) operators in Hilbert space. Duke
mathematical journal, 29(3):341–346, 1962.
[MN22] Boris S Mordukhovich and Nguyen Mau Nam. Convex Analysis and Beyond, I:
Basic Theory. Springer, Cham, Switzerland, 2022.
[Mor65] Jean-Jacques Moreau. Proximité et dualité dans un espace hilbertien. Bulletin
de la Société mathématique de France, 93:273–299, 1965.
[Nes83] Yurii Evgenievich Nesterov. A method of solving a convex programming problem
with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372–376,
1983.
[Nes04] Yurii Evgenievich Nesterov. Introductory Lectures on Convex Optimization.
Kluwer, Dordrecht, 2004.
[Nes05] Yurii Evgenievich Nesterov. Smooth minimization of non-smooth functions.
Mathematical Programming, 103:127–152, 2005.
[Nes13] Yurii Evgenievich Nesterov. Gradient methods for minimizing composite func-
tions. Mathematical Programming, Series B, 140(1):125–161, 2013.
[NP19] Constantin P Niculescu and Lars-Erik Persson. Convex Functions and their
Applications: A Contemporary Approach. Springer, 2019.
[NY83] A. Nemirovsky and D. Yudin. Informational Complexity and Efficient Methods
for Solution of Convex Extremal Problems. J. Wiley & Sons, New York, 1983.
[OCBP14] Peter Ochs, Yunjin Chen, Thomas Brox, and Thomas Pock. iPiano: Inertial
proximal algorithm for non-convex optimization. SIAM Journal on Imaging
Sciences, 7(2):1388–1419, 2014.
[Opi67] Zdzisław Opial. Weak convergence of the sequence of successive approximations
for nonexpansive mappings. Bulletin of the American Mathematical Society,
73(4):591–597, 1967.
[PCCB09] T. Pock, A. Chambolle, D. Cremers, and H. Bischof. A convex relaxation ap-
proach for computing minimal partitions. IEEE Conf. Computer Vision and
Pattern Recognition, pages 810–817, 2009.
[Phe09] Robert Ralph Phelps. Convex functions, monotone operators and differentiabil-
ity, volume 1364. Springer, 2009.
[PLS08] Lisandro A. Parente, Pablo A. Lotito, and Mikhail V. Solodov. A class of inex-
act variable metric proximal point algorithms. SIAM Journal on Optimization,
19(1):240–260, 2008.
[Pow72] Michael James David Powell. A method for nonlinear constraints in minimiza-
tion problems. Optimization, 1972.
[Roc66] Ralph Rockafellar. Characterization of the subdifferentials of convex functions.
Pacific Journal of Mathematics, 17(3):497–510, 1966.
[Roc70a] Ralph Tyrell Rockafellar. On the maximality of sums of nonlinear monotone
operators. Transactions of the American Mathematical Society, 149(1):75–88,
1970.
[Roc70b] Ralph Tyrrell Rockafellar. Convex Analysis. Princeton Landmarks in Mathe-
matics and Physics. Princeton University Press, 1970.
[Roc76] Ralph Tyrrell Rockafellar. Augmented Lagrangians and applications of the prox-
imal point algorithm in convex programming. Mathematics of Operations Re-
search, 1(2):97–116, May 1976.
[Roc81] R. Tyrrell Rockafellar. The theory of subgradients and its applications to prob-
lems of optimization: convex and nonconvex functions. Research and education
in mathematics. Heldermann Verlag Berlin, 1 edition, 1981.
[Sch57] Helmut Heinrich Schaefer. Über die Methode sukzessiver Approximationen.
Jahresbericht der Deutschen Mathematiker-Vereinigung, 59:131–140, 1957.
[Sch14] Rolf Schneider. Convex bodies: the Brunn–Minkowski theory. Number 151 in
Encyclopedia of Mathematics and its Applications. Cambridge University Press,
2014.
[Set09] Simon Setzer. Splitting Methods in Image Processing. University of Mannheim,
2009. Ph.D. Thesis.
[Set11] S. Setzer. Operator splittings, Bregman methods and frame shrinkage in image
processing. International Journal of Computer Vision, 92(3):265–280, 2011.
[Sev30] Francesco Severi. Su alcune questioni di topologia infinitesimale. Ann. Soc.
Polon. Math, 9:97–108, 1930.
[Sim06] Stephen Simons. Minimax and monotonicity. Springer, 2006.
[Spe93] Peter Spellucci. Numerische Verfahren der Nichtlinearen Optimierung.
Birkhäuser, Basel, Boston, Berlin, 1993.
[SSM13] S. Setzer, G. Steidl, and J. Morgenthaler. A cyclic projected gradient method.
Computational Optimization and Applications, 54(2):417–440, 2013.
[Tse91] P. Tseng. Applications of a splitting algorithm to decomposition in convex
programming and variational inequalities. SIAM Journal on Control and Opti-
mization, 29:119–138, 1991.
[Tse08] Paul Tseng. On accelerated proximal gradient methods for convex-concave op-
timization, 2008.
[Val13] Tuomo Valkonen. A primal-dual hybrid gradient method for non-linear op-
erators with applications to MRI. Technical report, arXiv e-print, 2013.
http://arxiv.org/abs/1309.5032.
[Vũ13] Bang Cong Vũ. A splitting algorithm for dual monotone inclusions involving
cocoercive operators. Advances in Computational Mathematics, 38(3):667–681,
2013.
[WL01] S. L. Wang and L. Z. Liao. Decomposition method with a variable parameter for
a class of monotone variational inequality problems. J. Optimiz. Theory App.,
109(2):415–429, 2001.
[ZC08] M. Zhu and T. Chan. An efficient primal-dual hybrid gradient algorithm for
total variation image restoration. Technical report, UCLA, Center for Applied
Math., 2008.
[Ză02] Constantin Zălinescu. Convex analysis in general vector spaces. World Scientific,
2002.
