Inverse Problems
Inverse Problems
Inverse Problems
Martin Benning
Updated on: June 6, 2016
Lecture Notes
Michaelmas Term 2015
2
Contents
3 Regularisation 21
3.1 Truncated singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Tikhonov regularisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Source-conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Asymptotic regularisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Landweber iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.6 Variational regularisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.6.1 Tikhonov-type regularisation . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.7 Convex Variational Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4 Computational realisation 39
4.1 A primal dual hybrid gradient method . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Convergence of PDHGM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.2 Deconvolution with total variation regularisation . . . . . . . . . . . . . . . 43
3
4 CONTENTS
These lecture notes are based on the following books and lecture notes:
1. Engl, Heinz Werner, Martin Hanke, and Andreas Neubauer. Regularization of inverse prob-
lems. Vol. 375. Springer Science & Business Media, 1996.
2. Mueller, Jennifer L., and Samuli Siltanen. Linear and nonlinear inverse problems with
practical applications. Vol. 10. SIAM, 2012.
4. Christian Clason, Inverse Probleme (in German), Lecture notes winter 2014/2015
https://www.uni-due.de/~adf040p/teaching/inverse_14/InverseSkript.pdf
The lecture notes are under constant redevelopment and will likely contain errors and mistakes. I
very much appreciate the finding and reporting of those (to mb941@cam.ac.uk). Thanks!
Chapter 1
Solving an inverse problem is the task of computing an unknown physical quantity that is related
to given, indirect measurements via a forward model. Inverse problems appear in a vast majority
of applications, including imaging (Computed Tomography (CT), Positron Emission Tomography
(PET), Magnetic Resonance Imaging (MRI), Electron Tomography (ET), microscopic imaging,
geophysical imaging), signal- and image-processing, computer vision, machine learning and (big)
data analysis in general, and many more.
Mathematically, an inverse problem can be described as the solution of the operator equation
Ku = f (1.1)
with given measurement data f for the unknown quantity u. Here, K : U → V denotes an operator
mapping from the Banach space U to the Banach space V. Throughout this lecture we are going
to restrict ourselves to linear and bounded operators though.
Inverting a forward model however is not straightforward in most relevant applications, for two
basic reasons: either a (unique) inverse model simply does not exist, or existing inverse models
heavily amplify small measurement errors. In the sense of Hadamard the problem (1.1) is called
well-posed if
• for all input data there exists a solution of the problem, i.e. for all f ∈ V there exists a
u ∈ U with Ku = f .
• the solution of the problem depends continuously on the input datum, i.e. for all {uk }k∈N
with Kuk → f we have uk → u.
If any of these conditions is violated, problem (1.1) is called ill-posed. In the following we are going
to see that most practically relevant inverse problems are ill-posed or approximately ill-posed.1
1.1 Examples
In the following we are going to present various examples of inverse problems and highlight the
challenges of dealing with them.
1
In fact the name ill-posed problems may be a more suitable name for this lecture, as the real challenge is to deal
with the ill-posedness of the inverse problems. However, the name inverse problems became more widely accepted
for this field of mathematics.
5
6 1.1. EXAMPLES
n
X
K= λj kj kjT . (1.2)
j=1
Note that by dividing (1.1) on both sides of the equality with λn , (1.2) can be rescaled so that
λn = 1 and λ1 = κ−1 hold. Here, κ denotes the so-called condition number κ = λn /λ1 (in the
case of λ1 6= 0) of the matrix K. It is well known from numerical linear algebra that the condition
number is a measure of how stable (1.1) can be solved.
If we assume that we observe f δ instead of f , with kf − f δ k2 ≤ δ (here k · k2 denotes the
Euclidean norm), and further denote uδ the solution of Kuδ = f δ , the difference between uδ and
the solution u of (1.1) reads as
n
X
u − uδ = λ−1
j kj kj
T
f − f δ
.
j=1
due to λ1 ≤ λi for i 6= 1 and the orthogonality of the eigenvectors. Thus, taking the square root
yields the estimate
u − uδ
≤ κ
f − f δ
≤ κδ .
2 2
Hence, we observe that in the worst case an error δ in the data y is amplified by the condition
number κ of the matrix K. A matrix with large κ is therefore called ill-conditioned. We want to
demonstrate the effect of this error amplification with a small example.
and the vector f = (1, 1)T . Then the solution of Ku = f is simply given via u = (1, 0)T . If we,
however, consider the perturbed data f δ = (99/100, 101/100)T instead of f , the solution uδ of
Kuδ = f δ is uδ = (−19.01, 20)T .
CHAPTER 1. INTRODUCTION TO INVERSE PROBLEMS 7
1.1.2 Differentiation
Another classic inverse problem is differentiation. Assume we are given a function f ∈ L∞ ([0, 1])
with f (0) = 0 for which we want to compute u = f 0 . These conditions are satisfied if and only if
u and f satisfy the operator equation
Z y
f (y) = u(x) dx ,
0
which
Ry can be written as the operator equation Ku = f with the linear operator (K·)(y) :=
0 ·(x) dx. As in the previous section, we assume that instead of f we observe a noisy version f δ
for which we further assume that the perturbation is additive, i.e. f δ = f + nδ with f ∈ C 1 ([0, 1])
and nδ ∈ L∞ ([0, 1]). It is obvious that the derivative u only exists if the noise nδ is differentiable.
However, even in case nδ is differentiable the error in the derivative can become arbitrarily large.
Consider for instance the sequence (δj )j∈N with δj → 0 and the sequence of corresponding noise
functions nδj ∈ C 1 ([0, 1]) ,→ L∞ ([0, 1]) with
kx
δj
n (x) := δj sin ,
δj
for a fixed but arbitrary number k. We on the one hand observe
nδj
L∞ ([0,1]) = δj → 0, but on
Thus, despite the noise in the data becoming arbitrarily small, the error in the derivative can
become arbitrarily big (depending on k).
Note that considering a decreasing error in the norm of the Banach
δ
space C ([0,
1 1]) will yield
a different result. If we have a sequence of noise functions with n C 1 ([0,1]) = δ j → 0 instead,
j
we can conclude
δj
u − u
∞ ≤
nδj
1 → 0,
L ([0,1]) C ([0,1])
due to C 1 ([0, 1]) being embedded in L∞ ([0, 1]). In contrast to the previous example the sequence
of functions nδj (x) := δj sin(kx) for instance satisfies
δ 0
(x) = (1 + k)δj → 0 .
δj
δj
n
= sup n (x) + sup n
j
C 1 ([0,1]) x∈[0,1] x∈[0,1]
However, for a fixed δ the bound on
u − uδ
L∞ ([0,1]) can obviously still become fairly large
1.1.3 Deconvolution
An interesting problem that occurs in many imaging, image- and signal processing applications is
the deblurring or deconvolution of signals from a known, linear degradation. Deconvolution of a
signal can be modelled as solving the inverse problem of the convolution, which reads as
Z
f (y) = (Ku)(y) := u(x)g(y − x) dx . (1.3)
Rn
Here f denotes the blurry image, u is the (unknown) true image and g is the function that models
the degradation. Due to the Fourier convolution theorem we can rewrite (1.3) to
n
f = (2π) 2 F −1 (F(u)F(g)) . (1.4)
It is important to note that the inverse Fourier transform is indeed the unique, inverse operator of
the Fourier transform in the Hilbert-space L2 due to the theorem of Plancherel. If we rearrange
(1.4) to solve it for u we obtain
−n −1 F(f )
u = (2π) 2 F , (1.7)
F(g)
and hence, we allegedly can recover u by simple division in the Fourier domain. However, we are
quickly going to discover that this inverse problem is ill-posed and the division will lead to heavy
amplifications of small errors.
Let u denote the image that satisfies (1.3). Further we assume that instead of the blurry image
f we observe f δ = f + nδ instead, and that uδ is the solution of (1.7) with input datum f δ . Hence,
we observe
F(f − f δ ) −1 F(nδ )
n
−1
δ
(2π) 2 u − u = F = F . (1.8)
F(g) F(g)
As the convolution kernel g usually has compact support, F(g) will tend to zero for high frequen-
cies. Hence, the denominator of (1.8) becomes fairly small, whereas the numerator will be non-zero
as the noise is of high frequency. Thus, in the limit the solution will not depend continuously on
the data and the convolution problem therefore be ill-posed.
1.1.4 Tomography
In almost any tomography application the underlying inverse problem is either the inversion of
the Radon transform or of the X-ray transform in dimensions higher than two. For u ∈ C0∞ (Rn ),
CHAPTER 1. INTRODUCTION TO INVERSE PROBLEMS 9
s ∈ R and θ ∈ S n−1 , the Radon transform2 R : C0∞ (Rn ) → C ∞ (S n−1 × R) can be defined as the
line integral operator
Z
f (θ, s) = (Ru)(θ, s) = u(x) dx (1.9)
Zx·θ=s
= u(sθ + tθ⊥ ) dt . (1.10)
R
Note that - with respect to the origin of the reference coordinate system - ϕ determines the angle
of the line along one wants to integrate, while s is the offset of that line to the centre of the
coordinate system.
⊥
dt = − u(sθ + tθ⊥ ) dt .
− R I(sθ + tθ )
2
−R 2
Note that due to d/dx log(f (x)) = f 0 (x)/f (x) the left hand side in the above equation simplifies
to
Z R d ⊥
dt I(sθ + tθ ) R ⊥ R ⊥
2
⊥
dt = log I sθ + θ − log I sθ − θ .
− R I(sθ + tθ )
2
2 2
As we know the radiation intensity at both the emitter and the detector, we therefore know
f (θ, s) := log(I(sθ − (Rθ⊥ )/2)) − log(I(sθ + (Rθ⊥ )/2)) and we can write the estimation of the
unknown density u as the inverse problem of the Radon transform (1.10) (if we further assume
that u can be continuously extended to zero outside of the circle of radius R).
2
Named after the Austrian mathematician Johann Karl August Radon (16 December 1887 – 25 May 1956)
10 1.1. EXAMPLES
In Positron Emission Tomography (PET) a so-called radioactive tracer (a positron emitting ra-
dionuclide on a biologically active molecule) is injected into a patient (or subject). The emitted
positrons of the tracer will interact with the subjects’ electrons after travelling a short distance
(usually less than 1mm), causing the annihilation of both the positron and the electron, which
results in a pair of gamma photons moving into (approximately) opposite directions. This pair
of photons is detected by the scanner detectors, and an intensity f (ϕ, s) can be associated with
the number of annihilations detected at the detector pair that forms the line with offset s and
angle ϕ (with respect to the reference coordinate system). Thus, we can consider the problem of
recovering the unknown tracer density u as a solution of the inverse problem (1.10) again. The line
of integration is determined by the position of the detector pairs and the geometry of the scanner.
Here M (t) = (Mx (t), My (t), Mz (t)) is the nuclear magnetisation (of the spin isochromat), γ is the
gyromagnetic ratio, B(t) = (Bx (t), By (t), Bz (t)) denotes the magnetic field experienced by the
nuclei, T1 is the longitudinal and T2 the transverse relaxation time and M0 the magnetisation in
thermal equilibrium. If we define Mxy (t) = Mx (t) + iMy (t) and Bxy (t) = Bx (t) + iBy (t), we can
rewrite (1.12) to
d Mxy (t)
Mxy (t) = −iγ (Mxy (t)Bz (t) − Mz (t)Bxy (t)) − (1.13a)
dt T2
d γ Mz (t) − M0
(1.13b)
Mz (t) = i Mxy (t)Bxy (t) − Mxy (t)Bxy (t) −
dt 2 T1
d Mxy (t)
Mxy (t) = −iγB0 Mxy (t) − , (1.14a)
dt T2
d Mz (t) − M0
Mz (t) = − . (1.14b)
dt T1
3
Named after the Swiss born American physicist Felix Bloch (23 October 1905 - 10 September 1983)
CHAPTER 1. INTRODUCTION TO INVERSE PROBLEMS 11
It is easy to see that this system of equations (1.14) has the unique solution
for ω0 := γB0 denoting the Lamor frequency, and Mxy (0), Mz (0) being the initial magnetisations
at time t = 0.
Rotating frame
Thus, for a constant magnetic background field in z-direction, B0 , Mxy basically rotates around
the z-axis in clockwise direction with frequency ω0 (if we ignore the T2 decay for a moment).
Rotating the x- and y-coordinate axes with the same frequency yields the representation of the
Bloch equations in the so-called rotating frame. If we substitute Mxy
r (t) := eiω0 t M (t), B r (t) :=
xy xy
Bxy (t)e 0 , Mz (t) := Mz (t) and Bz (t) := Bz (t), we obtain
iω t r r
Mxy r (t)
d r
Mxy (t) = −iγ Mxyr
(t)(Bzr (t) − B0 ) − Mzr (t)Bxy
r
(t) − (1.16a)
dt T2
d r γ M r (t) − M
0
r r (t) − M r (t)B r (t) − z
(1.16b)
M (t) = i Mxy (t)Bxy
dt z 2 xy xy
T1
instead of (1.13).
Thus, if we assume the magnetic field to be constant with magnitude B0 in z-direction within
the rotating frame, i.e. B r (t) = (Bxr (t), Byr (t), B0 ), (1.16a) simplifies to
r (t)
Mxy
d r
Mxy (t) = iγMzr (t)Bxy
r
(t) − . (1.17)
dt T2
90◦ pulse
Now we assume that Bxr (t) = c, c constant, and Byr (t) = 0 for t ∈ [0, τ ], and τ T1 and τ T2 .
Then we can basically ignore the effect of Mxy
r (t)/T and (M r (t)−M )/T , and the Bloch equations
2 z 0 1
in the rotating frame simplify to
with ω := γc, in matrix form with separate components. Assuming the initial magnetisations in
the rotating frame to be zero in the x-y plane, i.e. Mxr (0) = 0 and Myr (0) = 0, and constant in
the z-plane with value Mzr (0), the solution of (1.18) can be written as
Mxr (t)
1 0 0 0
Myr (t) = 0 cos(ωt) sin(ωt) 0 . (1.19)
r 0 − sin(ωt) cos(ωt) r
Mz (0)
Mz (t)
Thus, equation (1.19) rotates the initial z-magnetisation around the x-axis by the angle θ := ωt.
Note that if c and τ are chosen such that θ = π, all magnetisation is rotated from the z-axis to
the y-axis, i.e. Myr (τ ) = Mzr (0). In analogy, choosing Bx (t) = 0 and By (t) = c, all magnetisation
can be shifted from the z- to the x-axis.
12 1.1. EXAMPLES
Signal acquisition
If the radio-frequency (RF) pulse is turned off and thus, Bxr (t) = 0 and Byr (t) = 0 for t > τ ,
the same coils that have been used to induce the RF pulse can be used to measure the x-y
magnetisation. Since we measure a volume of the whole x-y net-magnetisation, the acquired
signal equals
Z Z
y(t) = M (x, t) dx = e−iω0 (x)t M r (x, t) dx (1.20)
R3 R2
with M (x, t) denoting Mxy (t) for a specific spatial coordinate x ∈ R3 (M r (x, t) respectively).
Using (1.15a) and assuming τ < t T2 , this yields
Z
y(t) = Mτ (x)e−iω0 (x)t dx , (1.21)
R3
with Mτ denoting the x-y-magnetisation at spatial location x ∈ R3 and time t = τ . Note that
Mτ = 0 without any RF pulse applied in advance.
Signal recovery
The basic clue to allow for spatially resolving nuclear magnetic resonance spectrometry is to add
a magnetic field B̂(t) to the constant magnetic field B0 in z-direction that varies spatially over
time. Then, (1.14a) changes to
d Mxy (t)
Mxy (t) = −iγ(B0 + B̂(t))Mxy (t) − ,
dt T2
which, for initial value Mxy (0), has the unique solution
Rt t
B̂(τ ) dτ ) − T2
Mxy (t) = e−iγ (B0 t+ 0 e Mxy (0) (1.22)
if we ensure B̂(0) = 0. If now x(t) denotes the spatial location of a considered spin isochromat at
time t, we can write B̂(t) as B̂(t) = x(t) · G(t), with a function G that describes the influence of
the magnetic field gradient over time.
Based on the considerations that lead to (1.21) we therefore measure
Z Rt
y(t) = Mτ (x)e−iγ (B0 (x)t+ 0 x(τ )·G(τ ) dτ ) dx
R3
in an NMR experiment. Further assuming that B0 is also constant in space, we can eliminate this
term and write the signal acquisition as
Z Rt
eiγB0 t
y(t) = Mτ (x)e−iγ 0 x(τ )·G(τ ) dτ dx (1.23)
R3
In the following we assume that x(t) can be approximated reasonably well via its zero-order
Taylor approximation around t0 = 0, i.e.
Z t Z t
x(τ ) · G(τ ) dτ ≈ x(0) · G(τ ) dτ . (1.24)
0 0
and hence, the inverse problem of finding the unknown spin-proton density Mτ for given measure-
ments y is equivalent to solving the inverse problem of the Fourier transform
Z
f (t) = (KMτ )(t) = Mτ (x)e−ix·ξ(t) dx , (1.25)
R3
Rt
with f (t) := eiγB0 t y(t) and ξ(t) := γ 0 G(τ ) dτ .
Chapter 2
Throughout this lecture we deal with functional analytic operators. For the sake of brevity, we
cannot recall all basic concepts of functional analysis but refer to popular textbooks that deal with
this subject, like [10]. Nevertheless, we want to recall a few important properties that are going to
be important for the further course of this lecture. In particular, we are going to focus on inverse
problems with bounded, linear operators K only, i.e. K ∈ L(U, V) with
kKukV
kKkL(U ,V) := sup = sup kKukV < ∞ .
u∈U \{0} kukU kukU ≤1
of K. We say that K is continuous in u ∈ U if there exists a δ > 0 for all ε > 0 with
For linear K (as assumed throughout this lecture) it can be shown that continuity is equivalent
to the existence of a positive constant C such that
kKukV ≤ CkukU
for all u ∈ X . Note that this constant C actually equals the operator norm kKkL(U ,V) .
For the first part of the lecture we only consider K ∈ L(U, V) with U and V being Hilbert
spaces. From functional calculus we know that every Hilbert space is equipped with a scalar
product, which we are going to denote by h·, ·iU (if U denotes the corresponding Hilbertspace). In
analogy to the transpose of a matrix, this scalar product structure together with the theorem of
Fréchet-Riesz [10, Section 2.10, Theorem 2.E] allows us to define the (unique) adjoint operator of
K, denoted with K ∗ , as follows:
13
14 2.1. GENERALISED INVERSE
In addition to that, a scalar product allows to have a notion of orthogonality. Two function u, v ∈ U
are said to be orthogonal if hu, viU = 0. For a subset X ⊂ U the orthogonal complement of X in
U is defined as
X ⊥ := {u ∈ U | hu, xiU = 0 for all x ∈ X } .
From this definition we immediately observe that X ⊥ is a closed subspace. Further we have
U ⊥ = {0}. Moreover, we have X ⊂ (X ⊥ )⊥ . If X is a closed subspace we even have X = (X ⊥ )⊥
and {0}⊥ = X . In this case there exists the orthogonal decomposition
U = X ⊕ X⊥ ,
which means that every element u ∈ U can uniquely be represented as
u = x + x⊥ with x ∈ X , x⊥ ∈ X ⊥ ,
see for instance [10, Section 2.9, Corollary 1]. The mapping u 7→ x defines a linear operator
PX ∈ L(U, X ) that is called orthogonal projection on X . The orthogonal projection satisfies the
following conditions (cf. [6, Section 5.16]):
1. PX is self-adjoint, i.e. PX = PX∗ ;
2. kPX kL(U ,X ) = 1 (if X 6= {0});
3. I − PX = PX ⊥ ;
4. kv − PX vkU = minu∈X kv − ukU ;
5. x = PX u if and only if x ∈ X and u − x ∈ X ⊥ .
Note that for a non-closed subspace X we only have (X ⊥ )⊥ = X . For K ∈ L(U, V) we therefore
have
• R(K)⊥ = N (K ∗ ) and thus N (K ∗ )⊥ = R(K);
• R(K ∗ )⊥ = N (K) and thus N (K)⊥ = R(K ∗ ).
Hence, we can conclude the orthogonal decompositions
U = N (K) ⊕ R(K ∗ ) and V = N (K ∗ ) ⊕ R(K) .
In the following we want to investigate the concept of generalised inverses of bounded linear
operators, before we will identify compactness of operators as the major source of ill-posedness.
Subsequently we are going to discuss this in more detail by analysing compact operators in terms
of their singular value decomposition.
Note that for arbitrary f a minimum norm solution does not need to exist if R(K) is not
closed. If, however, a minimum norm solution exists, it is unique and can (in theory) be computed
via the Moore-Penrose generalised inverse.
Definition 2.2. Let K̃ := K|N (K)⊥ : N (K)⊥ → R(K) denote the restriction of K ∈ L(U, V).
Then the Moore-Penrose inverse K † is defined as the unique linear extension of K̃ −1 with
Note that K̃ is injective due to the restriction to N (K)⊥ , and surjective due to the restriction
to R(K). Hence, K̃ −1 exists, and - as a consequence - K † is well-defined on R(K). Due to the
orthogonal decomposition D(K † ) = R(K) ⊕ R(K)⊥ there exist f1 ∈ R(K) and f2 ∈ R(K)⊥ with
f = f1 + f2 , for arbitrary f ∈ D(K † ). Hence, we have
K † f = K † f1 + K † f2 = K † f1 = K̃ −1 f1 , (2.3)
due to R(K)⊥ = N (K † ), and thus, K † is well-defined on the whole of D(K † ). It can be shown
that K † can be characterized by the Moore-Penrose equations.
Lemma 2.1. The Moore-Penrose inverse K † satisfies R(K † ) = N (K)⊥ and the Moore-Penrose
equations
1. KK † K = K,
2. K † KK † = K † ,
3. K † K = I − PN (K) ,
4. KK † = PR(K) ,
†
D(K )
where PN (K) : U → N (K) and PR(K) : V → R(K) denote the orthogonal projections on N (K)
and R(K), respectively.
Proof. First of all we are going to prove R(K † ) = N (K)⊥ . According to (2.3) and the definition
of the Moore-Penrose inverse we observe for f ∈ D(K † ) that
as we not only have PR(K) f ∈ R(K) but also PR(K) f ∈ R(K). Hence, K † f ∈ R(K̃ −1 ) =
N (K)⊥ and therefore R(K † ) ⊂ N (K)⊥ . However, we also observe N (K)⊥ ⊂ R(K † ) as K † Ku =
K̃ −1 K̃u = u for u ∈ N (K)⊥ already implies u ∈ R(K † ).
It remains to prove the Moore-Penrose equations. We begin with 4: For f ∈ D(K † ) it follows
from R(K † ) = N (K)⊥ and (2.4) that
K † f = K † PR(K) f = K † KK † f
The following theorem states that minimum norm solutions can be computed via the generalised
inverse.
Theorem 2.1. For each f ∈ D(K † ) the minimal norm solution u† of (1.1) is given via
u† = K † f .
The set of all least squares solutions is given via {u† } + N (K).
In numerical linear algebra it is a well known fact that the normal equations can be considered
to compute least squares solutions. The same is true in the continuous case.
Theorem 2.2. For given f ∈ D(K † ) the function u ∈ U is minimal norm solution if and only if
it satisfies the normal equations
K ∗ Ku = K ∗ f . (2.5)
Proof. An element u ∈ U is least squares solution if and only if Ku is the projection of f onto R(K),
i.e. Ku = PR(K) (f ). The latter is equivalent to Ku ∈ R(K) and Ku − f ∈ R(K)⊥ = N (K ∗ ).
Thus, we conclude K ∗ (Ku − f ) = 0. Further, a least squares solution has minimal norm if and
only if u ∈ N (K)⊥ .
CHAPTER 2. LINEAR INVERSE PROBLEMS 17
K † f = (K ∗ K)† K ∗ f ,
and hence, in order to approximate K † f we may also compute an approximation via (2.5) instead.
At the end of this section we further want to analyse the domain of the generalised inverse
in more detail. Due to the construction of the Moore-Penrose inverse we have D(K † ) = R(K) ⊕
R(K)⊥ . As orthogonal complements are always closed we can conclude
and hence, D(K † ) is dense in V. Thus, if R(K) is closed it follows that D(K † ) = V and vice versa,
D(K † ) = V implies R(K) to be closed. Moreover, for f ∈ R(K)⊥ = N (K † ) the minimum-norm
solution is u† = 0. Therefore, for given f ∈ R(K) the important question to address is when f
also satisfies f ∈ R(K). If this is the case, K † has to be continuous. However, the existence of a
single element f ∈ R(K) \ R(K) is enough already to prove that K † is discontinuous.
Theorem 2.3. Let K ∈ L(U, V). Then K † ∈ L(D(K † ), U) if and only if R(K) is closed.
In the next section we are going to discover that the class of compact operators is a class for
which the Moore-Penrose inverses are discontinuous.
Proof. Since U and R(K) are infinite dimensional, we can conclude that N (K)⊥ is also infi-
nite dimensional. We can therefore find a sequence {un }n∈N with un ∈ N (K)⊥ , kun k = 1 and
hun , uk i = 0 for n 6= k. Since K is a compact operator the sequence fn = Kun is compact. Hence,
we can find k, l such that for each δ > 0 we have kKul − Kuk k < δ. However, we also obtain
Hence, K † is discontinuous.
To have a better understanding of when we have f ∈ R(K) \ R(K) for compact operators K,
we want to consider the singular value decomposition of compact operators.
18 2.3. SINGULAR VALUE DECOMPOSITION OF COMPACT OPERATORS
and
∞
X
Kw = σj hw, uj iU vj for all j ∈ N. (2.7)
j=1
Remark 2.1. Since Eigenvalues of K ∗ K with Eigenvectors uj are also Eigenvalues of KK ∗ with
Eigenvectors vj , we further obtain a singular value decomposition of K ∗ due to (2.6), i.e.
∞
X
∗
K z= σj hz, vj iV uj for all j ∈ N.
j=1
We now want to derive a representation of the Moore-Penrose inverse in terms of the singular
value decomposition. Remember that we have concluded K † f = (K ∗ K)† K ∗ f from Theorem 2.1
and Theorem 2.2. Hence, for u† = K † f we obtain
∞
X ∞
X
σj2 hu† , uj iU uj = K ∗ Ku† = K ∗ f = σj hf, vj iV uj
j=1 j=1
As for the singular value decomposition of K and K ∗ we have to verify if the sum converges. In
contrast to the singular value decompositions of K and K ∗ this is not always true. Precisely, the
so-called Picard criterion has to be met for (2.8) to converge.
CHAPTER 2. LINEAR INVERSE PROBLEMS 19
Theorem 2.6. Let K ∈ K(U, V) with singular system {(σj , vj , uj )}, and f ∈ R(K). Then
f ∈ R(K) and (2.8) converges if and only if the Picard criterion
∞
†
X |hf, vj iV |2
kK f k2U = <∞ (2.9)
j=1
σj2
is met.
Hence, the Picard criterion tells us how quickly the coefficients hf, vj iV have to decay compared
to the singular values σj . From representation (2.8) we can further conduct what happens in case
of noisy measurements. Assume we have f δ = f + δvj instead of f . We then observe
†
δ
K f − K f
= δkK † vj kU =
† δ
→ ∞ for j → ∞ .
U σj
For static j we see that the amplification of the error δ depends on how small σj is. Hence, the
faster the singular values decay, the stronger the amplification of errors. For that reason, one
distinguishes between three classes of ill-posed problems:
• Mildly ill-posed inverse problems are problems of the form (1.1) for which c > 0 and r > 0
exist with σj ≥ cj −r for all j ∈ N.
• Severely ill-posed inverse problems of the form (1.1) for which a significantly large j̃ ∈ N
exists, such that σj ≤ cj −r is true for all j ≥ j̃ and c, r > 0.
• Exponentially ill-posed problems of the form (1.1) for which a significantly large j̃ ∈ N exists,
r
such that σj ≤ ce−j is true for all j ≥ j̃ and c, r > 0.
Example 2.1. Let us consider the example of differentiation again, as introduced in Section 1.1.2.
Let U = V = L2 ([0, 1]), then the operator K of the inverse problem (1.1) of differentiation is given
as
Z y Z 1
(Ku)(y) = u(x) dx = k(x, y)u(x) dx ,
0 0
for
(
1 x≤y
k(x, y) = .
0 else
From the theory of integral operators it follows that K is compact, due to it’s kernel k(x, y) being
square integrable (cf. [1]). In order to compute the singular value decomposition we have to
compute K ∗ first, which is characterised via
Hence, we obtain
Z 1Z 1 Z 1 Z 1
hKu, viL2 ([0,1]) = k(x, y)u(x) dx v(y) dy = u(x) k(x, y)v(y) dy dx .
0 0 0 0
20 2.3. SINGULAR VALUE DECOMPOSITION OF COMPACT OPERATORS
Now we want to compute the Eigenvalues and Eigenvectors of K ∗ K, i.e. we look for λ > 0 and
u ∈ L2 ([0, 1]) with
Z 1Z y
λu(x) = (K ∗ Ku)(x) = u(z) dz dy .
x 0
from which we conclude u0 (0) = 0. Taking the derivative another time thus yields the ordinary
differential equation
Thus, the picard criterion in this case is equivalent to the condition that the Fourier series of f is
differentiable with respect to every element of the series (and hence, that f is differentiable).
Chapter 3
Regularisation
We have seen in the previous section that the major source of ill-posedness of inverse problems
of the type (1.1) is a fast decay of the singular values of K. An idea to overcome this issue is to
define approximations of K † in the following fashion. Consider the family of operators
∞
X
Rα f := gα (σj )hf, vj iV uj , (3.1)
j=1
with functions gα : R>0 → R≥0 that converge to 1/σj as α converges to zero. We are going to see
that such an operator Rα is what is called a regularisation (of K † ), if gα is bounded, i.e.
Definition 3.1. Let K ∈ L(U, V) be a bounded operator. A family {Rα }α>0 of linear operators is
called regularisation (or regularisation operator) of K † if
21
22
for measured data f δ ∈ V, we can expect that α has to be chosen in accordance to this error δ.
Let us therefore consider the overall error kRα f δ − K † f kU that we can split up as follows:
Rα f δ − K † f
≤
Rα f δ − Rα f
+
Rα f − K † f
U U
U
†
≤ δ kRα kL(V,U ) +
Rα f − K f
. (3.3)
U
The first term of (3.3) is the data error; this term unfortunately does not stay bounded for α → 0,
which can be concluded from the uniform boundedness principle (also known as the Banach-
Steinhaus theorem), cf. [5, Section 2.2, Theorem 2.2]. The second term however vanishes for
α → 0, due to the pointwise convergence of Rα to K † . Hence it becomes evident from (3.3) that
the choice of α depends on δ.
Definition 3.2. A function α : R>0 × V → R>0 , (δ, f δ ) → α(δ, f δ ) is called parameter choice rule.
We distinguish between
1. a-priori parameter choice rules, if they depend on δ only;
and
n
o
lim sup α(δ, f † ) f δ ∈ V,
f − f δ
≤ δ = 0
δ→0 V
are guaranteed.
In case of 1 or 2 we would simply write α(δ), respectively α(f δ ), instead of α(δ, f δ ).
For the sake of brevity we are only discussing a-priori parameter choices throughout this lecture.
In fact, it can be shown that for every regularisation an a-priori parameter choice rule, and thus,
a convergent regularisation, exists.
Theorem 3.1. Let {Rα }α>0 be a regularisation of K † , for K ∈ L(U, V). Then there exists an
a-priori parameter choice rule, such that (Rα , α) is a convergent regularisation.
Proof. Let f ∈ D(K † ) be arbitrary but fixed. We can find a monotone increasing function σ :
R>0 → R>0 such that for every ε > 0 we have
†
ε
R f δ
− K f
≤ ,
σ(ε)
U 2
due to the pointwise convergence Rα → K † .
As the operator Rσ(ε) is continuous for fixed ε, there exists ρ(ε) > 0 with
Then limδ→0 α(δ) = 0 follows. Furthermore, there exists δ := ρ(ε) for all ε > 0, such that with
α(δ) = σ(ε)
Rα(δ) f δ − K † f
≤
Rσ(ε) f δ − Rσ(ε) f
+
Rσ(ε) f δ − K † f
≤ ε
U U U
Now we want to turn our attention to the case when f ∈ / D(K † ). As the generalised inverse is
not defined for those functions we cannot expect a convergent regularisation to stay bounded as
α → 0. This is confirmed by the following result.
Proof. The convergence in case of f ∈ D(K † ) follows from Theorem 3.1. We therefore only
need to consider the case f ∈ / D(K † ). We assume that there exists a sequence αk → 0 such
that kuαk kU is uniformly bounded. Then there exists a weakly convergent subsequence uαkl with
some limit u ∈ U, cf. [3, Section 2.2, Theorem 2.1]. As continuous linear operators are also weakly
continuous, we further have Kuαkl * Ku. However, as KRα are uniformly bounded operators, we
also conclude Kuαkl = KRαkl f * PR(K) f (according to Lemma 2.1). Hence, we have f ∈ D(K † )
in contradiction to the assumption f ∈
/ D(K † ).
We can finally characterise a-priori parameter choice strategies that lead to convergent regu-
larisation methods via the following theorem.
Theorem 3.3. Let {Rα }α>0 be a regularisation, and α : R>0 → R>0 an a-priori parameter choice
rule. Then (Rα , α) is a convergent regularisation method if and only if
1. limδ→0 α(δ) = 0
to (Rα , α) being a convergent regularisation method. If condition 1 is violated, Rα(δ) f does not
converge pointwise to K † f and hence, (3.4) is violated for f = f δ and δ = 0. Hence, (Rα , α) is
not a convergent regularisation method. If condition 1 is fulfilled but condition 2 is violated, there
exists a null sequence {δk }k∈N with δk kRα(δk ) kL(V,U ) ≥ C > 0, and hence, we can find a sequence
{gk }k∈N ⊂ V with kgk kV = 1 and δk kRα(δk ) gk kU ≥ C. Let f ∈ D(K † ) be arbitrary and define
fk := f + δk gk . Then we have on the one hand kf − fk kV ≤ δk , but on the other hand the norm of
cannot converge to zero, as the second term δk Rα(δk ) gk is bounded from below by construction.
Hence, (3.4) is violated for f δ = gk and thus, (Rα , α) is not a convergent regularisation method.
In the following we want to revisit (3.1) and prove that these methods are indeed regularisation
methods for piecewise continuous functions gα satisfying (3.2).
Theorem 3.4. Let gα : R>0 → R be a piecewise continuous function satisfying (3.2), limα→0 gα (σ) =
σ and
1
Rα f → K † f as α → 0,
Proof. From the singular value decomposition of K † and the definition of Rα we obtain
∞ ∞
†
X 1 X
Rα f − K f = gα (σj ) − hf, vj iV uj = (σj gα (σj ) − 1) hu† , uj iU uj .
σj
j=1 j=1
and hence, each element of the sum stays bounded. Thus, we can estimate
2 ∞
X 2
lim sup
Rα f − K † f
≤ lim sup |σj gα (σj ) − 1|2 hu† , uj iU
α→0 U α→0
j=1
∞
X 2 2
†
≤ lim σ g (σ ) − 1 hu , uj U ,
i
j α j
α→0
j=1
and due to
the pointwise
convergence of gα (σj ) to 1/σj we deduce limα→0 σj gα (σj )−1 = 0. Hence,
we have
Rα f − K † f
U → 0 for all f ∈ D(K † ).
Proposition 3.1. Let the same assumptions hold as in Theorem 3.4. Further, let (3.2) be satisfied.
Then (Rα , α) is a convergent regularisation method if
lim δCα(δ) = 0
δ→0
is guaranteed.
CHAPTER 3. REGULARISATION 25
Finally we want to consider the error propagation of a data error for the regularisation method
(3.1).
Theorem 3.5. Let the same assumptions hold for gα as in Proposition 3.1. If we define uα := Rα f
and uδα := Rα f δ , with f ∈ D(K † ), f δ ∈ V and kf − f δ kV ≤ δ, then
and
hold true.
to obtain (3.7).
Combining the assertions of Theorem 3.4, Proposition 3.1 and Theorem 3.5, we obtain the
following convergence results of the regularised solutions.
Proposition 3.2. Let the assumptions of Theorem 3.4, Proposition 3.1 and Theorem 3.5 hold
true. Then,
uα(δ,f δ ) → u†
is guaranteed as δ → 0.
Note that we naturally obtain limα→0 gα (σ) = 1/σ for σ > 0. Equation (3.1) then reads as
X 1
uα = Rα f = hf, vj iV uj , (3.9)
σj
σj ≥α
for all f ∈ V. Note that (3.9) is always finite for α > 0 as zero is the only accumulation point of
singular vectors of compact operators.
From (3.8) we immediately observe gα (σ) ≤ Cα = 1/α. Thus, according to Proposition 3.1
the truncated singular value decomposition, together with an a-priori parameter choice strategy
satisfying limδ→0 α(δ) = 0, is a convergent regularisation method if limδ→0 δ/α(δ) = 0.
Moreover, we observe supσ,α σgα (σ) = γ = 1 and hence, we obtain the error estimates kKuα −
Kuα kV ≤ δ and kuα − uδα kU ≤ δ/α as a consequence of Theorem 3.5.
δ
Note that Tikhonov regularisation can be computed without knowledge of the singular system.
Considering the equation (K ∗ K + αI)uα in terms of the singular value decomposition, we observe
∞ ∞
X σj ∗
X ασj
2 hf, v i
j V K Ku j + 2 hf, vj iV uj
j=1
σj + α |{z} j=1 σj + α
=σj vj
| {z }
=σj2 uj
∞ ∞
X σj (σj2 + α) X
= hf, vj iV uj = σj hf, vj iV uj = K ∗ f .
j=1
σj2 + α j=1
(K ∗ K + αI)uα = K ∗ f (3.11)
for uα . The advantage in computing uα via (3.11) is that its computation does not require the
singular value decomposition of K, but only involves the inversion of a linear, well-posed operator
equation with a symmetric, positive definite operator.
1
Named after the Russian mathematician Andrey Nikolayevich Tikhonov (30 October 1906 - 7 October 1993)
CHAPTER 3. REGULARISATION 27
3.3 Source-conditions
Before we continue to investigate other examples of regularisation we want to briefly address
the question of the convergence speed of a regularisation method. From Theorem 3.5 we have
already obtained a convergence rate result; however, with additional regularity assumptions on
the (unknown) minimal norm solution we are able to improve those. The regularity assumptions
that we want to consider are known as source conditions, and are of the form
∃w ∈ U : u† = (K ∗ K)µ w . (3.12)
The power µ > 0 of the operator is understood in the sense of the consider the µ-th power of the
singular values of the operator K ∗ K, i.e.
∞
σj2µ hw, uj iU uj .
X
(K ∗ K)µ w =
j=1
The rate of convergence of a regularisation scheme to the minimal norm solution now depends on
the specific choice of gα . We assume that gα satisfies
for all σ > 0. In case of the truncated singular value decomposition we would for instance have
ωµ (α) = αµ . With this additional assumption, we can improve the estimate in Theorem 3.4 as
follows:
∞
X
†
kRα f − K f kV ≤ |σj gα (σj ) − 1|2 |hu† , uj iU |2
j=1
∞
|σj gα (σj ) − 1|2 σj4µ |hw, uj iU |2
X
=
j=1
≤ ωµ (α)2 kwk2U
kuα − u† kU ≤ ωµ (α)kwkU .
Example 3.1. In case of the truncated singular value decomposition we know from Section 3.1
that Cα = 1/α, and we can further conclude ωµ (α) = α2µ . Hence, (3.13) simplifies to
in this case. In order to make the right-hand-side of (3.14) as small as possible, we have to choose
α such that
1
δ 2µ+1
α= .
2µkwkU
28 3.4. ASYMPTOTIC REGULARISATION
for some function γ : R → R. From the initial conditions we immediately observe γ(0) = 0. From
the singular value decomposition and (3.15) we further see
X∞ X∞
γj0 (t)uj = σj hf, vj iV − σj γ(t) huj , uj iU uj .
j=1 j=1
| {z }
=kuj k2U =1
σ2 σ2
1 + x we further observe 1 − e− α ≤ σ 2 /α and therefore (1 − e− α )/σ ≤ maxj σj /α = σ1 /α =
kKkL(U ,V) /α =: Cα .
CHAPTER 3. REGULARISATION 29
for some τ > 0 and u0 ≡ 0. Iteration (3.17) is known as the so-called Landweber iteration. We
assume f ∈ D(K † ) first, and with the singular value decomposition of K and K ∗ we obtain
∞
X ∞
X
k+1
1 − τ σj hu , uj iU + τ σj hf, vj iV uj ,
2
(3.18)
k
hu , uj iU uj =
j=1 j=1
Proof. Equation (3.21) can simply be verified via induction. We immediately see that
2 2
X
2 2−i 2 1 − (1 − 2τ σ 2 + τ 2 σ 4 ) 1 − 1 − τ σ2
(1 − τ σ ) = 1 + (1 − τ σ ) = =
τ σ2 τ σ2
i=1
Together with compactness of the sub level sets this leads to the fundamental theorem of
optimisation.
Theorem 3.6. Let X be a Banach space with topology τ and let E : X → R ∪ {+∞} be lower
semi-continuous. Furthermore, let the level set
{x ∈ X | E(x) ≤ c}
be non-empty, and compact in the topology τ for some c ∈ R. Then there exists a global minimiser
x̂ of E, i.e. E(x̂) ≤ E(x) for all x ∈ X .
Proof. Let Ê = inf x∈X E(x). Then a sequence {xk }k∈N exists with E(xk ) → Ê for k → ∞. For
k sufficiently large, E(xk ) ≤ c holds and hence, {xk }k∈N is contained in a compact set. As a
consequence, a subsequence {xkl }l∈N exists with xkl → x̂, for l → ∞, for some x̂ ∈ X . From the
lower semi-continuity of E we obtain
Ê ≤ E(x̂) ≤ lim inf E(xkl ) ≤ Ê ,
l→∞
For the sake of brevity, we will not going to prove lower semi-continuity and compactness of
the sub-level sets for any of the functionals that we are going to consider. Nevertheless we want
to emphasise (without proof) that all functionals considered throughout this lecture are lower
semi-continuous and have compact sub level sets in the weak-* topology of the corresponding
Banach space. According to the Theorem of Banach-Alaoglu (cf. [9, Theorem 1.9.13]), the set
{v ∈ X ∗ | kvkX ∗ ≤ c}, for c > 0, is compact in the weak-* topology. Hence, we could conclude
existence of a global minimum for a given infinite dimensional optimisation problem, if we were
able to prove lower semi-continuity in the weak-* topology. Unfortunately this is not as easy as it
sounds and certainly beyond the scope of this lecture.
Before we want to proceed to our first example of a variational regularisation model, we further
want to recall the concepts of directional and Fréchet-derivatives of functionals.
Definition 3.5. Let E : X → R ∪ {+∞} be a functional or operator. The directional derivative
(also called first variation) at position x ∈ X in direction y ∈ X is defined as
E(x + ty) − E(x)
dy E(x) := lim , (3.24)
t↓0 t
if that limit exists.
Note that if we define the function ϕy (τ ) := E(x + τ y), the directional derivative of E is
equivalent to ϕ0y (τ )|τ =0
Definition 3.6. Let E : X → R ∪ {+∞} a functional mapping from the Banach space X , and let
dy E(x) exist for all y ∈ X . If E 0 (x) : X → R ∪ {+∞} exists such that
(E 0 (x))(y) = dy E(x) ∀y ∈ X ,
and
|E(x + y) − E(x) − (E 0 (x))(y)|
→0 for kykX → 0 ,
kykX
holds true, E is called Fréchet-differentiable in x and E 0 (x) the Fréchet-derivative in x. If the
Fréchet-derivative exists for all x ∈ X , the operator E 0 : X → X ∗ is called Fréchet-derivative.
32 3.6. VARIATIONAL REGULARISATION
We therefore see, that uα satisfying (3.10) (resp. (3.11)) is also the solution of a variational
regularisation problem of the form (3.23), with H(Ku, f ) = 21 kKu − f k2V and J(u) = 12 kukU .
A nice property of this representation as a minimiser of a functional is that we can generalise
Tikhonov regularisation by choosing different regularisation functionals for J, i.e.
1
T̃α (u) := kKu − f k2V + αJ(u) , (3.26)
2
for general J : U → R ∪ {+∞}. We want to denote this regularisation as Tikhonov-type regular-
isation and present a few examples in the following. Note that Tikhonov-type regularisation for
general J is not necessarily a regularisation in the sense of Definition 3.1, as we cannot expect
uα := arg minu∈U T̃α (u) → u† for α → 0. However, we can generalise Definition 2.1 to justify
calling the minimisation of (3.26) a regularisation.
Definition 3.7. An element u ∈ U is called J-minimising solution of (1.1), if
The (negative) entropy used in physics and information theory is defined as the functional J :
PDF(Ω) → R≥−1 with
Z
J(u) := u(x) log(u(x)) − u(x) dx ,
Ω
is often used as a regulariser, in order to enforce sparse solutions. The corresponding Tikhonov-
type regularisation reads as
1 X∞
2
uα ∈ arg min kKu − f k`2 + α |uj | .
u∈`1 2 j=1
Note that `p spaces are increasing in p, which implies kuk`2 ≤ kuk`1 and hence, u ∈ `2 if u ∈ `1 .
34 3.7. CONVEX VARIATIONAL CALCULUS
Example 3.5 (Total variation regularisation). Total variation as a regulariser has originally been
introduced for image-denoising and -restoration applications with the goal to preserve edges in
images, respectively discontinuities in signals [8]. For smooth signals u ∈ W 1,1 (Ω) the total
variation is simply defined as
Z
J(u) := k(∇u)(x)k2 dx ;
Ω
a more rigorous definition to also include functions with discontinuties is given via the dual for-
mulation
Z
J(u) = TV(u) := sup u(x) (divϕ)(x) dx .
ϕ∈C0∞ (Ω;Rn ) Ω
kkϕk2 k∞ ≤1
Note that the choice of the vector norm is somewhat arbitrary and could be replaced by other
p-vector norms. The corresponding Tikhonov-type regularisation reads as
1 2
uα ∈ arg min kKu − f kV + αTV(u) , (3.29)
u∈BV(Ω) 2
for K ∈ L(BV(Ω) , V), and where BV(Ω) denotes the space of functions of bounded variation
defined as
λx + (1 − λ)y ∈ C ,
for all λ ∈ [0, 1] and all x, y ∈ C. The functional E is called strictly convex, if the equality of
(3.30) only holds for x = y or λ ∈ {0, 1}.
Example 3.6 (Absolute Value Function). For C = R and E : R → R≥0 the absolute value function
E(x) = |x| is convex, since the absolute value function is a metric, and the triangular inequality
yields |λx + (1 − λ)y| ≤ λ|x| + (1 − λ)|y|. Obviously, the absolute value function is not strictly
convex.
Example 3.7 (Characteristic Function). The characteristic function χC : X → R ∪ {+∞} with
(
0 x∈C
χC (x) := (3.31)
+∞ x 6∈ C
is convex if C ⊆ X is a convex subset.
CHAPTER 3. REGULARISATION 35
for x > 0
{1}
∂E(x) = sign(x) := [−1, 1] for x = 0 .
{−1} for x < 0
This can easily be verified by case differentiation. For x = 0 the inequality |x| ≥ px is obviously
fulfilled for any x ∈ R iff p ∈ [−1, 1]. For x > 0 the inequality |y| ≥ py+(1−p)x is fulfilled for every
y ∈ R iff p = 1. In analogy, p = −1 has to hold for x < 0 in order to guarantee |y| ≥ py − (1 + p)x
for each y ∈ R.
Throughout the remainder of this lecture we are particularly interested in convex, non-differentiable
and absolutely one-homogeneous functionals (like the `1 -norm or the total variation), i.e. we want
to consider functionals E for which E(cx) = |c|E(x) is satisfied, for every c ∈ R. We therefore
characterise the subdifferential of (absolutely) one-homogeneous functionals with the following
lemma.
Lemma 3.2 (Subdifferential for One-Homogeneous Functionals). Let E : X → R ∪ {+∞} be a
convex and one-homogeneous functional. Then, the subdifferential of E at x can equivalently be
written as
∂E(x) = {p ∈ X ∗ | hp, xi = E(x), hp, yiX ≤ E(y), ∀y ∈ X } .
Proof. If we consider y = 0 ∈ X in the definition of the subdifferential we immediately see
hp, xi ≥ E(x) ,
while for y = 2x ∈ X we obtain
hp, xi ≤ E(2x) − E(x) = 2E(x) − E(x) = E(x) ,
due to the one-homogeneity of E. Thus, hp, xi = E(x) follows. Moreover, if we insert hp, xi = E(x)
in the definition of the subdifferential we get
hp, yi ≤ E(y) ,
for all y ∈ X .
36 3.7. CONVEX VARIATIONAL CALCULUS
Example 3.9. If we consider the one-homogeneous absolute value function E : R → R≥0 with
E(x) = |x| again, we see that xp = |x| implies p = sign(x). Moreover, we discover that for
arguments y with the same sign as x we obtain yp = |y|, while for arguments y having the
opposite sign as x, we get yp = −|y| < |y|.
Considering (3.23) again, we now and for the rest of the lecture assume H and J to be convex
(in addition to the assumption of being proper and lower semi-continuous). Further, we assume
H to be Fréchet-differentiable with respect to u. For this setup we can state the following useful
characterisation of the subdifferential (without proof).
Theorem 3.8. Let H : V × V → R ∪ {+∞} and J : U → R ∪ {+∞} be proper, convex and lower
semi-continuous functionals, for Banach spaces U and V. Let furthermore K ∈ L(U, V), and H be
Fréchet-differentiable with respect to u. Then, for α > 0, the subdifferential of the functional Eα
as defined in (3.23) is given via
It is worth mentioning that all the definitions and theorems derived in this chapter can be
transferred to (strictly) concave functions and (global) maximisation problems. In particular, if E
is a (strictly) concave functional then −E is (strictly) convex, and the characterisation of a global
maximum of E is identical to the characterisation of a global minimum of −E. We will make use
of this fact throughout the next chapter when it comes to saddle-point problems.
To conclude this chapter, we want to introduce the useful notion of the generalised Bregman
distance.
Definition 3.11. For a proper, lower semi-continuous and convex functional E : X → R ∪ {+∞}
with non-empty subdifferential ∂E, the generalised Bregman distance is defined as
p
DE (x, y) := E(x) − E(y) − hp, x − yi , p ∈ ∂E(y) . (3.32)
p
The generalised Bregman distance is no distance in the classical sense. It does satisfy DE (x, y) ≥
0 for all x, y ∈ X ; however, it is neither symmetric nor does it satisfy a triangular inequality in
general. Nevertheless can we symmetrise the Bregman distance by simply adding the two Bregman
distances with interchanged arguments.
CHAPTER 3. REGULARISATION 37
Definition 3.12. Let the same assumptions hold as in Definition 3.11. The generalised symmetric
Bregman distance is defined as
symm p1 p2
DE (x1 , x2 ) := DE (x2 , x1 ) + DE (x1 , x2 ) = hx1 − x2 , p1 − p2 i ,
Computational realisation
In this final chapter of the lecture we want to focus on how to implement variational regularisation
methods numerically. We focus on variational regularisation methods as they appear to be the most
general class of regularisation methods that we have studied throughout this lecture. However,
we will only consider convex variational regularisation methods, as we want to guarantee that the
iterative methods that we are going to consider converge to global minimisers.
Here both F and G are proper, l.s.c. and convex functionals, and D ∈ L(U, W). Before we
continue to study (4.1) and potential numerical solutions, we want to give a quick example.
Example 4.1 (Total variation regularisation). Note that we can reformulate (3.29) as
1 2
uα ∈ arg min kKu − f kV + suphu, αdivvi − χC (v)
u∈BV(Ω) 2 v
In order to discuss how to solve (4.1) numerically, we want to consider the optimality conditions
of the saddle-point problem first. For any subgradients p̂ ∈ ∂F (û) and q̂ ∈ ∂G(v̂) that satisfy
p̂ + D∗ v̂ = 0 ,
(4.2)
q̂ − Dû = 0 ,
the point (û, v̂) is a saddle-point of (4.1), due to the convexity of F and G. We therefore want to
approach finding a saddle point of (4.1) via the iterative approach
+ D∗ v k+1
k+1 k+1
− uk
p u 0
+M = , (4.3)
q k+1 − Duk+1 v k+1 − v k 0
39
40 4.1. A PRIMAL DUAL HYBRID GRADIENT METHOD
for pk+1 ∈ ∂F (uk+1 ) and q k+1 ∈ ∂G(v k+1 ), and a 2 × 2 operator-matrix M . We immediately
observe that (4.3) should approach (4.2) in the limit (for properly chosen M ). Hence, the important
question is how to choose M such that (4.3) results in a convergent algorithm with relatively simple
update steps for uk+1 and v k+1 ? From our intuitive knowledge as numerical analysts we would
guess that M should be positive definite and maybe even self-adjoint, to guarantee convergence
of (4.3). But beyond that? A naïve choice for M could simply be the identity; however, in that
case each update step is implicit in both uk+1 and v k+1 , which may not simplify solving our
saddle-point problem at all. Alternatively, we propose to use
I −D∗
1
M := τ (4.4)
−D σ1 I
instead, with τ and σ being positive scalars. For this choice, (4.3) simplifies to the equations
Due to pk+1 ∈ ∂F (uk+1 ) and q k+1 ∈ ∂G(uk+1 ), Equations (4.5) and (4.6) can be rewritten to
û = (I + τ K ∗ K)−1 (w + τ K ∗ f ) ,
CHAPTER 4. COMPUTATIONAL REALISATION 41
where zj denotes the j-th vectorial component of the vector field z, for j ∈ {1, . . . , n}. Hence,
PDHGM reads as
in case of total variation regularisation, due to ∇∗ = −div. Note that in case of ROF-denoising as
proposed in [8], Equation (4.12) simplifies to
uk + τ (divv k + f )
uk+1 = .
1+τ
Note that for real world applications one obviously has to find appropriate discretisations of K
and ∇.
Lemma 4.1. Consider the self-adjoint matrix M as defined in (4.4). Then, M is positive definite
if 0 < τ σkDk2 < 1.
holds true for all (u, v)T 6= (0, 0)T . Evaluating the dual product in (4.14) yields the condition
1 1
kuk2 + kvk2 − 2hDu, vi > 0 . (4.15)
τ σ
With the help of Young’s inequality, we can estimate −2hDu, vi from below with
2c 1
−2hDu, vi ≥ − τ σkDk kuk + kvk2 2
,
τ cσ
for some constant c > 0. Due to the condition τ σkDk2 < 1 it is clear that we can find an ε > 0
with (1 + ε)2 τ σkDk2 = 1. This further implies 1/((1 + ε)τ σkDk2 ) = 1 + ε and hence, we can
choose c = 1 + ε = 1/((1 + ε)τ σkDk2 ) to obtain
1 1 1
−2hDu, vi ≥ − 2
kuk + kvk 2
.
1+ε τ σ
42 4.1. A PRIMAL DUAL HYBRID GRADIENT METHOD
With the help of Lemma 4.1 we are able to prove convergence of the iterates (4.7) and (4.8)
to a saddle point of (4.1).
Theorem 4.1. Let 0 < τ σkDk2 < 1. Then the sequence (uk , v k ) defined via (4.7) and (4.8)
satisfies the following properties (for ku0 k < ∞ and kv 0 k < ∞):
• limk→∞ DFsymm (uk , û) = 0
symm k
• limk→∞ DG (v , v̂) = 0
Proof. Note that we can combine (4.2) and (4.3) with M as defined in (4.4) to
− p̂ + D∗ (v k+1 − v̂)
k+1 k+1
p u − û
, symm k+1
= DFsymm (uk+1 , û) + DG (v , v̂) ≥ 0 .
q k+1 − q̂ − D(uk+1 − û) v k+1 − v̂
due to Mpbeing self-adjoint. By using the notation wk := (uk , v k )T and ŵ := (û, v̂)T , and
kwkM := hM, w, wi, we observe
As k · kM is a norm, due to M being self-adjoint and positive definite, this implies boundedness
of the sequence wk . Summing up the inner product of (4.16) with wk+1 − ŵ over all k therefore
yields
∞
X X∞
symm k+1 symm k+1
2 DF (u , û) + DG (v , v̂) + kwk+1 − wk k2M
k=0 k=0
∞
X
= kwk − ŵk2M + kwk+1 − ŵk2M = kw0 − ŵk2M < ∞ .
k=0
w∞ := liml→+∞ wkl . We have to verify that w∞ is a saddle point, which for s∞ := liml→+∞ skl ,
sk := (pk , q k )T , follows immediately from (4.16).
[2] Heinz Werner Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems,
volume 375. Springer Science & Business Media, 1996. 16, 17, 19
[3] Charles W Groetsch. Stable approximate evaluation of unbounded operators. Springer, 2006.
23
[4] Jan Lellmann. Convex optimization with applications to image processing, 2013. Lecture
notes. 36
[5] Terry J Morrison. Functional analysis: An introduction to Banach space theory, volume 43.
John Wiley & Sons, 2011. 22
[6] Arch W Naylor and George R Sell. Linear operator theory in engineering and science. Springer
Science & Business Media, 2000. 14
[8] Leonid I Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise
removal algorithms. Physica D: Nonlinear Phenomena, 60(1):259–268, 1992. 34, 41
[9] Terence Tao. Epsilon of Room, One, volume 1. American Mathematical Soc., 2010. 31
[10] Eberhard Zeidler. Applied functional analysis, applications to mathematical physics. Applied
Mathematical Sciences, 108, 1995. 13, 14
45