Book On Damage Mechanics
Book On Damage Mechanics
Book On Damage Mechanics
June 5, 2015
2
i
The purpose of this text is to offer an overview of the most popular do-
main decomposition methods for partial differential equations (PDE). The
presentation is kept as much as possible at an elementary level with a spe-
cial focus on the definitions of these methods in terms both of PDEs and
of the sparse matrices arising from their discretizations. We also provide
implementations written in an open source finite element software. In ad-
dition, we consider a number of methods that have not been presented in
other books. We think that this book will give a new perspective and that
it will complement those of Smith, Bjrstad and Gropp [170], Quarteroni
and Valli [160], Mathew [134] and Toselli and Widlund[179] as well as the
review article [20].
The book is addressed to computational scientists, mathematicians, physi-
cists and, in general, to people involved in numerical simulation of par-
tial differential equations. It can also be used as textbook for advanced
undergraduate/First-Year Graduate students. The mathematical tools needed
are basic linear algebra, notions of programming, variational formulation of
PDEs and basic knowledge in finite element discretization.
is a parallel machine. Waiting for the next generation machine does not
guarantee anymore a better performance of a software. To keep doubling
performance parallelism must double. It implies a huge effort in algorithmic
development. Scientific computing is only one illustration of this general
need in computer science. Visualization, data storage, mesh generation,
operating systems, . . . must be designed with parallelism in mind.
We focus here on parallel linear iterative solvers. Contrary to direct
methods, the appealing feature of domain decomposition methods is that
they are naturally parallel. We introduce the reader to the main classes
of domain decomposition algorithms: Schwarz, Neumann-Neumann/FETI
and Optimized Schwarz. For each method we start by the continuous for-
mulation in terms of PDEs for two subdomains. We then give the definition
in terms of stiffness matrices and their implementation in a free finite ele-
ment package in the many subdomain case. This presentation reflects the
dual nature of domain decomposition methods. They are solvers of linear
systems keeping in mind that the matrices arise from the discretization of
partial differential operators. As for domain decomposition methods that
directly address non linearities, we refer the reader to e.g. [14] or [15] and
references therein.
as well. These algorithms are the method of choice for wave propagation
phenomena in the frequency regime. Such situations occur in acoustics,
electromagnetics and elastodynamics.
In Chapter 3 we present the main ideas which justify the use of Krylov
methods instead of stationary iterations. Since Schwarz methods introduced
in Chapter 1 represent fixed point iterations applied to preconditioned global
problems, and consequently not providing the fastest convergence possible,
it is natural to apply Krylov methods instead. This provides the justi-
fication of using Schwarz methods as preconditioners rather than solvers.
Numerical implementations and results using FreeFem++ are closing the
chapter. Although some part of the presentation of some Krylov methods
is not standard, readers already familiar with Krylov methods may as well
skip it.
Chapter 4 is devoted to the introduction of two-level methods. In the
presence of many subdomains, the performance of Schwarz algorithms, i.e.
the iteration number and execution time will grow linearly with the number
of subdomains in one direction. From a parallel computing point of view this
translates into a lack of scalability. The latter can be achieved by adding a
second level or a coarse space. This is strongly related to multigrid methods
and to deflation methods from numerical linear algebra. The simplest coarse
space which belongs to Nicolaides is introduced and then implemented in
FreeFem++.
In Chapter 5, we show that Nicolaides coarse space (see above) is a
particular case of a more general class of spectral coarse spaces which are
generated by vectors issued from solving some local generalized eigenvalue
problems. Then, a theory of these two-level algorithms is presented. First, a
general variational setting is introduced as well as elements from the abstract
theory of the two-level additive Schwarz methods (e.g. the concept of stable
decomposition). The analysis of spectral and classical coarse spaces goes
through some properties and functional analysis results. These results are
valid for scalar elliptic PDEs. This chapter is more technical than the others
and is not necessary to the sequel of the book.
Chapter 6 is devoted to the Neumann-Neumann and FETI algorithms.
We start with the two subdomain case for the Poisson problem. Then, we
consider the formulation in terms of stiffness matrices and stress the duality
of these methods. We also establish a connection with block factorization
of the stiffness matrix of the original problem. We then show that in the
many subdomains case Neumann-Neumann and FETI are no longer strictly
equivalent. For sake of simplicity, we give a FreeFem++ implementation of
only the Neumann-Neumann algorithm. The reader is then ready to delve
into the abundant litterature devoted to the use of these methods for solving
complex mechanical problems.
In Chapter 7, we return to two level methods. This time, a quite recent
adaptive abstract coarse space, as well as most classical two-level methods
iv
1 2 3 4 5 6 716 77 78 8
ested in having a quick and partial view and already familiar with Krylov
methods, may very well read only Chapter 1 followed by Chapter 4. For
new comers to Krylov methods, reading of Chapter 3 must be intercalated
between Chapter 1 and Chapter 4.
For a quick view on all Schwarz methods without entering into the technical
details of coarse spaces, one could consider beginning by Chapter 1 followed
by Chapter 2 and then by Chapter 3 on the use of Schwarz methods as
preconditioners, to finish with Chapter 4 on classical coarse spaces.
For the more advanced reader, Chapters 5 and 7 provide the technical frame-
work for the analysis and construction of more sophisticated coarse spaces.
And last, but not least Chapter 8 gives the keys of parallel numerical imple-
mentation and illustrates with numerical results the previously introduced
methods.
vi
Contents
1 Schwarz methods 1
1.1 Three continuous Schwarz Algorithms . . . . . . . . . . . . . . 1
1.2 Connection with the Block Jacobi algorithm . . . . . . . . . . 6
1.3 discrete partition of unity . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Two subdomain case in one dimension . . . . . . . . . 10
1d Algebraic setting . . . . . . . . . . . . . . . . . . . . . 10
1d Finite element decomposition . . . . . . . . . . . . . 12
1.3.2 Multi dimensional problems and many subdomains . . 13
Multi-D algebraic setting . . . . . . . . . . . . . . . . . . 13
Multi-D finite element decomposition . . . . . . . . . . 14
1.4 Iterative Schwarz methods: RAS, ASM . . . . . . . . . . . . . 15
1.5 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 1d case: a geometrical analysis . . . . . . . . . . . . . . 16
1.5.2 2d case: Fourier analysis for two subdomains . . . . . . 17
1.6 More sophisticated Schwarz methods: P.L. Lions Algorithm . 19
1.7 Schwarz methods using FreeFem++ . . . . . . . . . . . . . . . 21
1.7.1 A very short introduction to FreeFem++ . . . . . . . . 21
1.7.2 Setting the domain decomposition problem . . . . . . . 26
1.7.3 Schwarz algorithms as solvers . . . . . . . . . . . . . . . 36
1.7.4 Systems of PDEs: the example of linear elasticity . . . 38
1
2 CONTENTS
3 Krylov methods 91
3.1 Fixed point iterations . . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Krylov spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.2.1 Gradient methods . . . . . . . . . . . . . . . . . . . . . . 96
3.3 The Conjugate Gradient method . . . . . . . . . . . . . . . . . 97
3.3.1 The Preconditioned Conjugate Gradient Method . . . 102
3.4 Krylov methods for non-symmetric problems . . . . . . . . . . 104
3.4.1 The GMRES method . . . . . . . . . . . . . . . . . . . . 106
3.4.2 Convergence of the GMRES algorithm . . . . . . . . . 109
3.5 Krylov methods for ill-posed problems . . . . . . . . . . . . . . 111
3.6 Schwarz preconditioners using FreeFem++ . . . . . . . . . . . 114
Schwarz methods
(u) = f in (1.1)
u = 0 on .
1 2
Figure 1.1: A complex domain made from the union of two simple geometries
1
2 CHAPTER 1. SCHWARZ METHODS
2 ) by:
1 and 2 . It updates (un1 , un2 ) (u1n+1 , un+1
(un+1
1 ) = f in 1 (un+1
2 ) = f in 2
un+1
1 = 0 on 1 then, un+1
2 = 0 on 2
n+1
u1 = un2 on 1 2 . n+1
u2 = un+1
1 on 2 1 .
(1.2)
H. Schwarz proved the convergence of the algorithm and thus the well-
posedness of the Poisson problem in complex geometries.
With the advent of digital computers, this method also acquired a prac-
tical interest as an iterative linear solver. Subsequently, parallel computers
became available and a small modification of the algorithm [124] makes it
suited to these architectures. Its convergence can be proved using the max-
imum principle [123].
(un+1
i ) =f in i
un+1
i =0 on i (1.3)
un+1
i = un3i on i 3i .
(e ) = 0 in 12
e = 0 on 12
and thus e = 0 .
Algorithms (1.2) and (1.3) act on the local functions (ui )i=1,2 . In order
to write algorithms that act on global functions in H 1 (), the space in which
problem (1.1) is naturally posed, we need extension operators and partitions
of unity.
There are two ways to write related algorithms that act on functions un
H 1 (). They are given in Definitions 1.1.4 and 1.1.5.
(win+1 ) = f in i , win+1 = un on i 3i
(1.5)
win+1 = 0 on i .
and then gluing them together using the partition of unity functions:
2
un+1 = Ei (i win+1 ) . (1.6)
i=1
holds for all n 0. Assume the property holds at step n of the algorithm.
Then, using the fact that 1 = 1 and 2 = 0 on 1 2 we have by definition
that w1n+1 is a solution to BVP (1.3):
(w1n+1 ) =f in 1 ,
w1n+1 =0 on 1 ,
2 (1.8)
w1n+1 = un = Ei (i uni ) = un2 on 1 2 .
i=1
un = E1 (1 un1 ) + E2 (2 un2 ) ,
u0 = E1 (1 u01 ) + E2 (2 u02 )
un = un2 on 1 2 .
Finally from (1.10) and (1.11) we can conclude that un1 + v1n = un+1
1 satisfies
n+1
problem (1.3) and is thus equal to u1 . The same holds for domain 2 ,
un2 + v2n = un+1
2 . Then equation (1.9) reads
un+1 = E1 (1 un+1
1 ) + E2 (2 u2 )
n+1
which ends the proof of the equivalence between Schwarz algorithm and the
continuous RAS algorithm (1.12)-(1.13)-(1.15).
rn = f + (un ) (1.12)
rn = f + (un ) (1.16)
3. Update un :
un+1 = un + E1 (v1n ) + E2 (v2n ) . (1.18)
To sum up, starting from the original Schwarz algorithm (1.2) that is
sequential, we have thus three continuous algorithms that are essentially
parallel:
6 CHAPTER 1. SCHWARZ METHODS
Let U1 = (Uk )kN1 = UN1 , U2 = (Uk )kN2 = UN2 and similarly F1 = FN1 ,
F2 = FN2 .
The linear system has the following block form:
A11 A12 U F
( )( 1 ) = ( 1 )
A21 A22 U2 F2
or equivalently,
Un+1 = Un + D1 (F AUn ) = Un + D1 rn ,
A11 0 U1 F1 A12 U2
n+1 n
= . (1.21)
0 A22 Un+1
2
F2 A21 Un1
A1
11 0 ) = RT A1 R = RT (R ART )1 R
( 1 11 1 1 1 1 1
0 0
and
0 0
( ) = R2T A1
22 R2 = R2 (R2 AR2 ) R2 ,
T T 1
0 A1
22
8 CHAPTER 1. SCHWARZ METHODS
1 2
1 xms xms +1 2
Figure 1.2: Domain decomposition with minimal overlap and partition of
unity
u = f, in
u(0) = u(1) = 0.
Remark 1.2.1 In conclusion when the overlap is minimal the discrete coun-
terparts of the three Schwarz methods of section 1.1 are equivalent to the
same block Jacobi algorithm. Notice here a counter-intuitive feature: a non
overlapping decomposition of the set of indices N corresponds to a geometric
decomposition of the domain with minimal overlap.
A function u R.
A vector U R#N .
or in other words
N
Id = RiT Di Ri (1.25)
i=1
In the following we will give some simple examples where all the ingre-
dients of the Definition 1.3.1 are detailed and we will check that (1.25) is
verified in those cases.
1 0 0 0 0 0 0 0 1 0
R1 = 0 1 0 0 0 and R2 = ( ),
0 0 1 0 0 0 0 0 0 1
1.3. DISCRETE PARTITION OF UNITY 11
1 2 3 4 5
N1 N2
Figure 1.3: Algebraic partition of the set of indices
and
1 0 0 0 0
0 1 0 0 0
R1 =
T
0 0 1 and R2 =
T
0 0
.
0 0 1 0
0
0 0 0 0 1
We also have
1 0 0 1 0
D1 = 0 1 0 and D2 = ( ).
0 0 1 0 1
1 2 3 4 5
N1=1 N2=1
Figure 1.4: Algebraic decomposition of the set of indices into overlapping
subsets
Consider now the case where each subset is extended with a neighboring
point, see Figure 1.4:
N1=1 = {1, 2, 3, 4} and N2=1 = {3, 4, 5} .
Then, matrices R1 and R2 are:
1 0 0 0 0
0 1 0 0 0 0 0 1 0 0
R1 = and R2 = 0 0 0 1 0 .
0 0 1 0 0 0 0 0 0 1
0 0 0 1 0
The simplest choices for the partition of unity matrices are
1 0 0 0
0 1 0 0 0 0 0
D1 = and D2 = 0 1 0
0 0 1 0 0 0 1
0 0 0 0
12 CHAPTER 1. SCHWARZ METHODS
or
1 0 0 0
0 1 0 0 1/2 0 0
D1 = and D2 = 0 1/2 0 .
0 0 1/2 0 0
0 0 1
0 0 1/2
Again, it is clear that relation (1.25) holds.
1 2 3 4 5
1 2
Figure 1.5: Finite element partition of the mesh
1 0 0 0 0 0 0 1 0 0
R1 = 0 1 0 0 0 and R2 = 0 0 0 1 0 .
0 0 1 0 0 0 0 0 0 1
In order to satisfy relation (1.25), the simplest choice for the partition of
unity matrices is
1 0 0 1/2 0 0
D1 = 0 1 0 and D2 = 0 1 0
0 0 1/2 0 0 1
Consider now the situation where we add a mesh to each subdomain, see
Figure 1.6. Accordingly, the set of indices is decomposed as:
1 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 1 0 0
R1 = and R2 = .
0 0 1 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0 0 1
1.3. DISCRETE PARTITION OF UNITY 13
1 2 3 4 5
=1
1 =1
2
Figure 1.6: Finite element decomposition of the mesh into overlapping sub-
domains
In order to satisfy relation (1.25), the simplest choice for the partition of
unity matrices is
1 0 0 0 0 0 0 0
0 1 0 0 0 1/2 0 0
D1 = and D2 = .
0 0 1/2 0 0 0 1 0
0 0 0 0 0 0 0 1
1 0 0 0 1/2 0 0 0
0 1/2 0 0 0 1/2 0 0
D1 = and D2 = .
0 0 1/2 0 0 0 1/2 0
0 0 0 1/2 0 0 0 1
Let Ri be the restriction matrix from set N to the subset Ni and Di the
identity matrix of size #Ni #Ni , 1 i N . Then, relation (1.25) is
satisfied.
N1 N1=1
N3 N3=1
N2 N2=1
Consider now the case where each subset Ni is extended with its direct
neighbors to form Ni=1 , see Figure 1.7. Let Ri be the restriction matrix from
set N to the subset Ni=1 and Di be a diagonal matrix of size #Ni=1 #Ni=1 ,
1 i N . For the choice of the coefficients of Di there are two main options.
The simplest one is to define it as a Boolean matrix:
1 if j Ni ,
(Di )jj = {
0 if j Ni=1 /Ni .
Mj = {1 i N j Ni=1 } .
Then, define
(Di )jj = 1/#Mj , for j Ni=1 .
Figure 1.8: Left: Finite element partition; Right: one layer extension of the
right subdomain
Un+1 = Un + MASM
1
rn , rn = F A Un
d2 en+1 d2 e2n+1
1
= 0 in (0, L1 ) = 0 in (l2 , L)
dx2 then, dx2
1 (0) = 0
en+1 2 (l2 ) = e1 (l2 )
en+1 n+1
(1.31)
Thus the errors are affine functions in each subdomain:
x Lx
1 (x) = e2 (L1 )
en+1 2 (x) = e1 (l2 )
n
and en+1 n+1
.
L1 L l2
Thus, we have
L L1 l2 L L1
2 (L1 ) = e1 (l2 )
en+1 = en2 (L1 )
n+1
.
L l2 L1 L l2
1.5. CONVERGENCE ANALYSIS 17
e02
e11 e12
e21 e22
e31
x
0 l2 L1 L
Figure 1.9: Convergence of the Schwarz method
l2 L l2 n 1 /(L l2 ) n
2 (L1 ) =
en+1 e2 (L1 ) = e2 (L1 ) .
l2 + L l2 1 + /l2
We see that the following quantity is the convergence factor of the algorithm
1 /(L l2 )
1 =
1 + /l2
( )(u) = f in R2 ,
u is bounded at infinity ,
18 CHAPTER 1. SCHWARZ METHODS
( )(un+1
1 ) = f (x, y), (x, y) 1
(1.32)
1 (, y) = u2 (, y),
un+1 yR
n
and
( )(un+1
2 ) = f (x, y), (x, y) 2
(1.33)
2 (0, y) = u1 (0, y),
un+1 yR
n
j , j = 1, 2 bounded at infinity.
with the local solutions un+1
In order to compute the convergence factor, we introduce the errors
eni = uni ui , i = 1, 2.
1 ) = f (x, y),
( )(en+1 (x, y) 1
(1.34)
1 (, y) = e2 (, y),
en+1 yR
n
and
( )(en+1
2 ) = f (x, y), (x, y) 2
(1.35)
2 (0, y) = e1 (0, y),
en+1 yR
n
with en+1
j bounded at infinity.
By taking the partial Fourier transform of the first line of (1.34) in the
y direction we get:
2
( 1 (x, k)) = 0
+ k 2 ) (en+1 in 1 .
x2
Therefore we have
Since the solution must be bounded at x = , this implies that n+1 (k) 0.
Thus we have
1 (x, k) = + (k) exp( (k)x)
en+1 n+1 +
1.6. MORE SOPHISTICATED SCHWARZ METHODS: P.L. LIONS ALGORITHM19
n+1
with 1,2 to be determined. From the interface conditions we get
and
2n+1 (k) = 1n (k) exp(+ (k)).
Combining these two and denoting (k) = + (k) = (k), we get for i = 1, 2,
In p u t in te rp re ta tio n :
Plo t:
0.8
0.6
0.4
0.2 k 20 .1
k 20 .1
0 .5
1 2 3 4 5 6 7
k
Figure 1.10: Convergence rate of the Schwarz method for = .1, = 0.5 (red
curve) or = 1 (blue curve).
n2 n1
1 2 1 n2 n1 2
Figure 1.11: Outward normals for overlapping and non overlapping subdo-
mains for P.L. Lions algorithm.
Ge n e ra tebdy Wo lfra m |Alp h(www.wo
a lfra m a lp h a .co m
o n) Octo b e r2 6 , 2 0 11 fro m Ch a m p a igIL.
n,
Wo lfra mAlp h aLLCA Wolfram Research Company
1.7. SCHWARZ METHODS USING FREEFEM++ 21
(un+1
1 ) = f in 1 ,
un+1
1 = 0 on 1 , (1.37)
( + ) (un+1
1 ) = ( + ) (un2 ) on 1 2 ,
n1 n1
and
(un+1
2 ) = f in 2 ,
un+1
2 = 0 on 2 (1.38)
( 2 ) = (
+ ) (un+1 + ) (un1 ) on 2 1
n2 n2
where n1 and n2 are the outward normals on the boundary of the subdo-
mains, see Figure 1.11.
u.vdx f v dx = 0, v H0 () .
1
u.vdx f v dx = 0, v H () .
1
where (i )1iM are the finite element functions. Note that the discretized
system corresponds to a Neumann problem. Dirichlet conditions of the type
u = g are then implemented by penalty, namely by setting
1
Tres Grande Valeur (Terrifically Great Value) = Very big value in French
1.7. SCHWARZ METHODS USING FREEFEM++ 23
4 2
The function square returns a structured mesh of the square: the first two
arguments are the number of mesh points according to x and y directions
and the third one is a parametrization of for x and y varying between 0
and 1 (here it is the identity). The sides of the square are labeled from 1 to
4 in trigonometrical sense (see Figure 1.2).
//Mesh definition
mesh Th=square(Nbnoeuds,Nbnoeuds,[x,y]);
We define the function representing the right-hand side using the keyword
func
// Functions of x and y
14 func f=xy;
func g=1.;
and the P 1 finite element space Vh over the mesh Th using the keyword
fespace
uh .vh dx f vh dx = 0, vh Vh .
Note that keyword problem defines problem (1.39) without solving it. The
parameter solver sets the method that will be used to solve the resulting
linear system, here a Gauss factorization. In order to effectively solve the
finite element problem, we need the command
The FreeFem++ script can be saved with your favorite text editor (e.g.
under the name heat.edp). In order to execute the script FreeFem++, it
is enough to write the shell command FreeFem++ heat.edp. The result
will be displayed in a graphic window.
One can easily modify the script in order to solve the same kind of problems
1.7. SCHWARZ METHODS USING FREEFEM++ 25
uh .vh dx + uh vh gvh f vh dx = 0,
3 4 3 4
If one wants to use some linear algebra package to solve the linear system
resulting from the finite element discretisation, the program below shows
26 CHAPTER 1. SCHWARZ METHODS
how one can retrieve first the stiffness matrix and the vector associated
to the right-hand side of the variational formulation. As a general rule,
this procedure can be very useful if one wants to use other solvers such
as domain decomposition methods. Here, the linear system is solved by
UMFPACK [37].
if (withmetis)
2 {
metisdual(lpart,Th,npart); // FreeFem++ interface to Metis
for(int i=0;i<lpart.n;++i)
part[][i]=lpart[i];
6 }
else
{
Ph xx=x,yy=y;
10 part= int(xx/allongnn)mm + int(yymm);
}
if (verbosity > 1)
plot(part,wait=1,fill=1,value=1);
Using the function part defined as above as an argument into the routine
SubdomainsPartitionUnity, well get as a result, for each subdomain
labeled i the overlapping meshes aTh[i]:
1.7. SCHWARZ METHODS USING FREEFEM++ 29
load medit
3 verbosity=2;
include dataGENEO.edp
include decomp.idp
include createPartition.idp
7 SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
// check the partition of unity
Vh sum=0,fctone=1;
for(int i=0; i < npart;i++)
11 {
Vh localone;
real[int] bi = Rih[i]fctone[]; // restriction to the local domain
real[int] di = Dih[i]bi;
15 localone[] = Rih[i]di;
sum[] +=localone[] ;
plot(localone,fill=1,value=1, dim=3,wait=1);
}
19 plot(sum,fill=1,value=1, dim=3,wait=1);
load metis
load medit
int nn=2,mm=2,ll=2; // number of the domains in each direction
4 int npart= nnmmll; // total number of domains
int nloc = 11; // local no of dof per domain in one direction
bool withmetis = 1; // =1 (Metis decomp) =0 (uniform decomp)
int sizeovr = 2; // size of the overlap
8 real allongx, allongz;
allongx = real(nn)/real(mm);
allongz = real(ll)/real(mm);
// Build the mesh
12 include cube.idp
int[int] NN=[nnnloc,mmnloc,llnloc];
real [int,int] BB=[[0,allongx],[0,1],[0,allongz]]; // bounding box
int [int,int] L=[[1,1],[1,1],[1,1]]; // the label of the 6 faces
16 mesh3 Th=Cube(NN,BB,L); // left,right,front, back, down, right
fespace Vh(Th,P1);
fespace Ph(Th,P0);
Ph part; // piecewise constant function
20 int[int] lpart(Ph.ndof); // giving the decomposition
// domain decomposition data structures
mesh3[int] aTh(npart); // sequence of ovr. meshes
matrix[int] Rih(npart); // local restriction operators
24 matrix[int] Dih(npart); // partition of unity operators
int[int] Ndeg(npart); // number of dof for each mesh
real[int] VolumeThi(npart); // volume of each subdomain
matrix[int] aA(npart); // local Dirichlet matrices
28 Vh[int] Z(npart); // coarse space
// Definition of the problem to solve
// Delta (u) = f, u = 1 on the global boundary
Vh intern;
32 intern = (x>0) && (x<allongx) && (y>0) && (y<1) && (z>0) && (z<allongz);
Vh bord = 1intern;
macro Grad(u) [dx(u),dy(u),dz(u)] // EOM
func f = 1; // right hand side
36 func g = 1; // Dirichlet data
Vh rhsglobal,uglob; // rhs and solution of the global problem
varf vaglobal(u,v) = int3d(Th)(Grad(u)Grad(v))
+on(1,u=g) + int3d(Th)(fv);
40 matrix Aglobal;
// Iterative solver
real tol=1e10; // tolerance for the iterative method
int maxit=200; // maximum number of iterations
Then we have to define a piecewise constant function part which takes inte-
ger values. The isovalues of this function implicitly defines a non overlapping
partition of the domain. Suppose we want a decomposition of a rectangle
1.7. SCHWARZ METHODS USING FREEFEM++ 33
1 if (withmetis)
{
metisdual(lpart,Th,npart);
for(int i=0;i<lpart.n;++i)
5 part[][i]=lpart[i];
}
else
{
9 Ph xx=x,yy=y, zz=z;
part= int(xx/allongxnn)mmll + int(zz/allongzll)mm+int(ymm);
}
As in the 2D case, these last two functions are tricky. The reader does not
need to understand their behavior in order to use them. They are given
here for sake of completeness.
include data3d.edp
include decomp3d.idp
3 include createPartition3d.idp
medit(part, Th, part, order = 1);
SubdomainsPartitionUnity3(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,VolumeThi);
// check the partition of unity
7 Vh sum=0,fctone=1;
for(int i=0; i < npart;i++)
{
Vh localone;
11 real[int] bi = Rih[i]fctone[]; // restriction to the local domain
real[int] di = Dih[i]bi;
localone[] = Rih[i]di;
sum[] +=localone[] ;
15 medit(loc,Th, localone, order = 1);
medit(subdomains,aTh[i]);
}
medit(sum,Th, sum, order = 1);
Then we need to define the global data from the variational formula-
tion.
9 Aglobal = vaglobal(Vh,Vh,solver = UMFPACK); // global matrix
rhsglobal[] = vaglobal(0,Vh); // global rhs
uglob[] = Aglobal1 rhsglobal[];
plot(uglob,value=1,fill=1,wait=1,cmm=Solution by a direct method,dim=3);
for(int i = 0;i<npart;++i)
{
17 cout << Domain : << i << / << npart << endl;
matrix aT = AglobalRih[i];
aA[i] = Rih[i]aT;
set(aA[i],solver = UMFPACK);// direct solvers
21 }
ofstream filei(Conv.m);
25 Vh un = 0; // initial guess
Vh rn = rhsglobal;
for(int iter = 0;iter<maxit;++iter)
{
29 real err = 0, res;
Vh er = 0;
for(int i = 0;i<npart;++i)
{
33 real[int] bi = Rih[i]rn[]; // restriction to the local domain
real[int] ui = aA[i] 1 bi; // local solve
bi = Dih[i]ui;
// bi = ui; // uncomment this line to test the ASM method as a solver
37 er[] += Rih[i]bi;
}
un[] += er[]; // build new iterate
rn[] = Aglobalun[]; // computes global residual
41 rn[] = rn[] rhsglobal[];
rn[] = 1;
err = sqrt(er[]er[]);
res = sqrt(rn[]rn[]);
45 cout << Iteration: << iter << Correction = << err << Residual =
<< res << endl;
plot(un,wait=1,value=1,fill=1,dim=3,cmm=Approximate solution at step +
iter);
int j = iter+1;
// Store the error and the residual in Matlab/Scilab/Octave form
49 filei << Convhist(+j+,:)=[ << err << << res <<]; << endl;
if(err < tol) break;
}
plot(un,wait=1,value=1,fill=1,dim=3,cmm=Final solution);
2
10
overlap=2
1 overlap=5
10
overlap=10
0
10
1
10
2
10
3
10
4
10
5
10
6
10
0 10 20 30 40 50 60
Figure 1.15: Solution and RAS convergence as a solver for different overlaps
The result of tracing the evolution of the error is shown in Figure 1.15 where
one can see the convergence history of the RAS solver for different values of
the overlapping parameter.
Remark 1.7.1 Previous tests have shown a very easy use of the RAS iter-
ative algorithm and some straightforward conclusions from this.
Note that it is very easy to test the ASM method, see eq. (1.30), when
used as a solver. It is sufficient to uncomment the line bi = ui;.
Running the program shows that the ASM does not converge. For this
reason, the ASM method is always used a preconditioner for a Krylov
method such as CG, GMRES or BiCGSTAB, see chapter 3.
In the the three-dimensional case the only part that changes is the
decomposition into subdomains. The other parts of the algorithm are
identical.
include ../../FreefemCommon/data3d.edp
include ../../FreefemCommon/decomp3d.idp
4 include ../../FreefemCommon/createPartition3d.idp
SubdomainsPartitionUnity3(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,VolumeThi);
load metis
2 load medit
int nn=3,mm=3; // number of the domains in each direction
int npart= nnmm; // total number of domains
int nloc = 20; // local no of dof per domain in one direction
6 bool withmetis = 1; // =1 (Metis decomp) =0 (uniform decomp)
int sizeovr = 2; // size of the overlap
real allong = real(nn)/real(mm); // aspect ratio of the global domain
func E = 21011; // Young modulus ans Poisson ratio
10 func sigma = 0.3;
func lambda = Esigma/((1+sigma)(12sigma)); // Lame coefficients
func mu = E/(2(1+sigma));
real sqrt2=sqrt(2.);
14 func eta = 1.0e6;
// Mesh of a rectangular domain
mesh Th=square(nnnloc,mmnloc,[xallong,y]);
fespace Vh(Th,[P1,P1]); // vector fem space
18 fespace Uh(Th,P1); // scalar fem space
fespace Ph(Th,P0);
Ph part; // piecewise constant function
int[int] lpart(Ph.ndof); // giving the decomposition
22 // Domain decomposition data structures
mesh[int] aTh(npart); // sequence of ovr. meshes
matrix[int] Rih(npart); // local restriction operators
matrix[int] Dih(npart); // partition of unity operators
26 int[int] Ndeg(npart); // number of dof for each mesh
real[int] AreaThi(npart); // area of each subdomain
matrix[int] aA(npart); // local Dirichlet matrices
// Definition of the problem to solve
30 int[int] chlab=[1,11 ,2,2 ,3,33 ,4,1 ]; //Dirichlet conditions for label = 1
Th=change(Th,refe=chlab);
macro Grad(u) [dx(u),dy(u)] // EOM
macro epsilon(u,v) [dx(u),dy(v),(dy(u)+dx(v))/sqrt2] // EOM
34 macro div(u,v) ( dx(u)+dy(v) ) // EOM
func uboundary = (0.25 (y0.5)2);
varf vaBC([u,v],[uu,vv]) = on(1, u = uboundary, v=0) + on(11, u = 0, v=0) +
on(33, u=0,v=0);
// global problem
38 Vh [rhsglobal,rrhsglobal], [uglob,uuglob];
macro Elasticity(u,v,uu,vv) eta(uuu+vvv) +
lambda(div(u,v)div(uu,vv))+2.mu( epsilon(u,v)epsilon(uu,vv) ) //
EOM
varf vaglobal([u,v],[uu,vv]) = int2d(Th)(Elasticity(u,v,uu,vv)) + vaBC; //
on(1,u=uboundary,v=0)
matrix Aglobal;
42 // Iterative solver parameters
real tol=1e6; // tolerance for the iterative method
int maxit=200; // maximum number of iterations
include ../../FreefemCommon/dataElast.edp
include ../../FreefemCommon/decomp.idp
4 include ../../FreefemCommon/createPartitionVec.idp
SubdomainsPartitionUnityVec(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
Then we need to define the global data from the variational formula-
tion.
42 CHAPTER 1. SCHWARZ METHODS
27 ofstream filei(Conv.m);
Vh [un,uun] = [0,0]; // initial guess
Vh [rn,rrn] = [rhsglobal,rrhsglobal];
for(int iter = 0;iter<maxit;++iter)
31 {
real err = 0, res;
Vh [er,eer] = [0,0];
for(int i = 0;i<npart;++i)
35 {
real[int] bi = Rih[i]rn[]; // restriction to the local domain
real[int] ui = aA[i] 1 bi; // local solve
bi = Dih[i]ui;
39 // bi = ui; // uncomment this line to test the ASM method as a solver
er[] += Rih[i]bi;
}
un[] += er[]; // build new iterate
43 rn[] = Aglobalun[]; // computes global residual
rn[] = rn[] rhsglobal[];
rn[] = 1;
err = sqrt(er[]er[]);
47 res = sqrt(rn[]rn[]);
cout << Iteration: << iter << Correction = << err << Residual =
<< res << endl;
int j = iter+1;
// Store the error and the residual in Matlab/Scilab/Octave form
51 filei << Convhist(+j+,:)=[ << err << << res <<]; << endl;
if(res < tol) break;
}
mesh Thm=movemesh(Th,[x+coeff2un,y+coeff2uun]);
55 medit(Thm, Thm);
medit(uh, Th, un, Th, uun, order=1);
43
44 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
n2 n1
1 2 1 n2 n1 2
Figure 2.1: Outward normals for overlapping and non overlapping subdo-
mains for P.L. Lions algorithm.
2 ) = f
(un+1 in 2 ,
un+1
2 = 0 on 2 (2.2)
( 2 ) = (
+ ) (un+1 + ) (un1 ) on 2 1
n2 n2
where n1 and n2 are the outward normals on the boundary of the subdo-
mains, see Figure 2.1.
We use Fourier transform to analyze the benefit of the Robin interface con-
ditions in a simple case. The domain = R2 is decomposed into two half-
planes 1 = (, ) R and 2 = (0, ) R with 0. We consider the
example of a symmetric positive definite problem
( )(u) = f in R2 ,
and
( )(un+1
2 ) = f (x, y), (x, y) 2
n+1
u2 is bounded at infinity (2.4)
( + ) (u2 )(0, y) = (
n+1
+ ) (un1 )(0, y), yR
n2 n2
( )(en+1
1 ) = 0, (x, y) 1
n+1
e1 is bounded at infinity (2.5)
( + ) (e1 )(, y) = (
n+1
+ ) (en2 )(, y), y R
n1 n1
and
2 ) = 0,
( )(en+1 (x, y) 2
n+1
e2 is bounded at infinity (2.6)
( + ) (e2 )(0, y) = (
n+1
+ ) (en1 )(0, y), y R
n2 n2
By taking the partial Fourier transform of the first line of (2.5) in the y
direction we get:
2
( 1 (x, k)) = 0
+ k 2 ) (en+1 for x < and k R.
x2
n+1
with 1,2 to be determined. From the interface conditions we get
and
2n+1 (k)( + ) = 1n (k)(+ + ) exp(+ (k)).
Combining these two and denoting (k) = + (k) = (k), we get for i = 1, 2,
with
(k)
(k, ; ) = exp((k)) (2.7)
(k) +
where (k) = + k 2 and > 0.
(k)
(k, 0; ) = < 1. (2.8)
(k) +
2.1. P.L. LIONS ALGORITHM 47
1
1 Improvement factor
with Robin interface conditions
Dirichlet interface conditions
0.8
0.8
Convergence Rate
0.6
0.6
0.4 0.4
0.2 0.2
0 0
0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 30 35 40
Fourier number Fourier number
Figure 2.2: Left: Convergence factor (dotted line: Dirichlet ICs, solid line:
Robin ICs) vs. Fourier number.
Right: Improvement factor due to Robin interface conditions (2.8) vs.
Fourier number
(x)ij = (x)ji 0
ij (x) = 0 on ij .
In this case the operators ij = ji defined in (2.9) have the following re-
markable properties
(x)un+1
i ((x)un+1i ) = f in i ,
uin+1
= 0 on i
((x) + ij ) un+1
i = ((x) + ij ) unj on ij .
ni nj
(2.10)
ui 1/2 uj 1/2
((x) ) + ij (ui ) = ij ((x) ) + ij (uj ) on ij .
1/2 1/2
ij
ni nj
The convergence proof of the algorithm (2.10) follows the arguments given
in [33] and is based on a preliminary result which is an energy estimate
(x)u ((x)u) = 0 in i
u = 0 on i ,
Then,
2
1 ui
(x)ui 2 + (x)ui 2 + ( [(x) (u )])
1/2
ij i
i 4 ji ij ij ni
2
1 ui
= (ij [(x) + ij (ui )])
1/2
4 ji ij ni
2.1. P.L. LIONS ALGORITHM 49
(x)ui ((x)ui ) = 0 in i ,
ui ui
(x)ui 2 + (x)ui 2 = (x) ui = (x) ui
i i ni ji ij ni
ui 1/2 1/2
= (x) ij (ui )
ji ij ni ij
ui 1/2
= ij ((x) ) ij (ui )
1/2
ji ij ni
2
1 ui 1/2
(x)ui 2 + (x)ui 2 + ( ((x) ) (u ))
1/2
ij ij i
i ji 4 ij ni
2
1 ui 1/2
= (ij ((x) ) + ij (ui ))
1/2
ji 4 ij ni
(x)en+1
i ((x)en+1
i ) = 0 in i ,
ein+1
= 0 on i
en+1 1/2 en 1/2
ij ((x) i ) + ij (en+1 ) = ij ((x) njj ) + ij (enj )
1/2 1/2
i on ij .
ni
Let us denote
N
E n+1 = Ein+1 and C n = Cji
n
,
i=1 i,j(ji)
we have
E n+1 + C n+1 = C n .
Hence, by summation over n, we get
E C0.
n+1
n=0
This series is convergent only if E 0 as n , which proves that for all
n
The same kind of proof holds for the Maxwell system [45] and the convection-
diffusion equation [140].
1 2u
u = f (t, x, y)
c2 t2
which models for instance pressure variation with a source term f and a
sound velocity c. When the source is time periodic, it makes sense to look
for time periodic solutions as well. With some abuse of notation, let f =
f (x, y) exp(it) (i2 = 1) be a harmonic source term of frequency , we seek
the solution to the acoustic wave equation in the form u = u(x, y) exp(it).
Then, u must be a solution to the Helmholtz equation
2
L(u) = ( ) (u) = f (x, y), x, y
c2
An extra difficulty comes from the non positivity of the Helmholtz operator
due to the negative sign of the term of order zero. More precisely, a Schwarz
algorithm for solving Helmholtz equation involves the decomposition of do-
main into N overlapping subdomains (j )1jN , the solving of Dirichlet
problems in the subdomains:
2
( c2 )(un+1
j ) = f (x, y), in j , 1 j N
(2.12)
un+1
j = un , on j
N
un+1 = j un+1
j .
j=1
Moreover, even when the local problems are well-posed, a bad convergence is
to be expected. We present the analysis in the case of the plane divided into
two subdomains although it is very similar to the elliptic case considered in
1.5.2. We introduce the wave number
= .
c
Consider the domain is = R2 with the Sommerfeld radiation condition at
infinity,
u
lim r ( + iu) = 0,
r r
where r = x2 + y 2 . We decompose it into two subdomains with or without
overlap, 0, 1 = (, ) R and 2 = ( 0, ) R and consider the
Schwarz algorithm
u1n+1 2 un+1
1 = f (x, y), x, y 1
u1 (, y) = un2 (, y) , y R
n+1
(2.13)
u n+1
lim r ( 1 + iu1n+1 ) = 0
r r
and
un+1
2 2 un+1
2 = f (x, y), x, y 2
un+1
2 (0, y) = un1 (0, y) , y R
(2.14)
u n+1
lim r ( 2 + iu2n+1 ) = 0 .
r r
For the convergence analysis it suffices by linearity to consider the case
f (x, y) = 0 and to analyze convergence to the zero solution. Let the Fourier
transform in y direction be denoted by
and
2 un+1
2
( 2 k 2 )u2n+1 = 0,
x2
x > 0, k R (2.16)
2 (0, k) = u1 (0, k)
un+1 n
2.2. HELMHOLTZ PROBLEMS 53
un+1
j = Aj e(k)x + Bj e(k)x , j = 1, 2,
Note that the main difference with the elliptic case is that (k) is a complex
valued function. It takes real values for the vanishing modes k and
purely imaginary values for propagative modes k < . Since the Sommerfeld
radiation condition excludes growing solutions as well as incoming modes at
infinity, we obtain the local solutions
1 (x, k) = u1 (, k)e
un+1 n+1 (k)(x)
(2.18)
2 (x, k) = u2 (0, k)e
un+1 n+1 (k)x
.
where (k) is given by (2.17). By induction, the following result follows for
even n
un1 (0, k) = (k, )n u01 (0, k), un2 (0, k) = (k, )n u02 (0, k).
Remark 2.2.1 We can distinguish a few features from the expression of the
convergence factor, see Figure 2.3:
The main difference with the elliptic case is that (k, ) is a complex
valued function. The convergence will occur if the modulus of is
smaller than one, (k, ) < 1.
54 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
1.2
Convergence rate for omega = 10
0.8
0.6
0.4
0.2
0
0 5 10 15 20 25 30 35 40
Fourier number
Figure 2.3: Convergence factor (2.20) of the Schwarz method for Helmholtz
equation vs. Fourier number
For vanishing modes k > , (k) is negative and real so that these
modes converge faster for larger overlaps. But we have (k, ) 1 for
propagative modes k < whatever the overlap is since there (k)
iR. This prevents the convergence of the Schwarz algorithm for the
Helmholtz equation.
A possible fix is the use of a relaxation parameter :
uin+1 = un+1
i + (1 )uni .
The choice of the optimal parameter is not easy and the overall
convergence rate is not good anyway.
It is thus clear that the Schwarz method cannot be used safely nor efficiently
as a solver for the Helmholtz equation.
N
un+1 = j un+1
j .
j=1
This algorithm fixes the two drawbacks of the Schwarz algorithm explained
in 2.2.1. First, note that the solution to the boundary value problem (2.21)
is unique even if 2 is an eigenvalue of the Laplace operator with Dirichlet
boundary conditions. Indeed, suppose for some subdomain k (1 k N )
we have a function v k C that satisfies:
2 v v = 0 in k
( + i) (v) = 0 on k .
nk
Multiplying the equation by the conjugate of v and integrating by parts, we
get:
v
v + v v = 0 .
2 2 2
k k nk
2 v2 + v2 + i v2 = 0 .
k k
v2 = 0 ,
k
un+11 2 un+1
1 = f (x, y), x, y 1
( + i) (u1 )(, y) = (
n+1
+ i) un2 (, y) , y R
n1 n1 (2.22)
un+1
lim r ( 1 + iun+1 1 ) = 0
r r
56 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
1.2
Convergence rate for omega = 10
0.8
0.6
0.4
0.2
0
0 5 10 15 20 25 30 35 40
Fourier number
and
un+1
2 2 un+1
2 = f (x, y), x, y 2
( + i) un+1
2 (0, y) = ( + i) un1 (0, y) , y R
n2 n2 (2.23)
un+1
lim r ( 2 + iun+1 2 ) = 0.
r r
(k) i
(k, ) = exp((k) ) (2.24)
(k) + i
see Figure 2.4. Note that for propagative modes k < , exp((k)) =
1 since (k) is purely imaginary. Therefore the convergence comes from
the Robin condition independently of the overlap. As for vanishing modes
k > , is real. The modulus of the rational fraction in (2.24) is 1 and
2.3. IMPLEMENTATION ISSUES 57
convergence comes from the overlap. Thus for all Fourier modes except k =
, the convergence factor is smaller than one. Thus, the method converges
for almost every Fourier number k.
Consider for example the problem (2.1) in Lions algorithm and let g2n denote
the right-hand side on the interface of subdomain 1 :
g2n = ( + ) (un2 ) on 1 2 . (2.25)
n1
Then, the bilinear form associated to the variational formulation of (2.1) is:
aRobin H 1 (1 ) H 1 (1 ) R
(u, v) z (uv + u v) + u v .
1 1 2
(2.26)
Note that the boundary integral term containing is positive when v = u.
Thus bilinear form aRobin is even more coercive than the one associated with
the Neumann boundary value problem which contains only the subdomain
integral term. The linear form related to (2.1) reads:
lRobin H 1 (1 ) R
(2.27)
v z fv+ g2n v .
1 1 2
{k k N1 } Vh (1 )
58 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
The only difficulty lies now in the discretization of the right-hand side g2n on
the interface. Suppose we have a P 1 (piecewise linear) finite element so that
we can identify a degree of freedom with a vertex of a triangular mesh in 2D.
The variational form aRobin implicitly defines a discretization scheme for the
outward normal derivative un+1 1 /n1 . The corresponding stencil can only
involve vertices that belong to domain 1 , see Figure 2.5. In order to ensure
that the domain decomposition algorithm leads to a solution identical (up to
the tolerance of the iterative solver) to the one which would be obtained by
the original discretization scheme on the whole domain, it seems necessary
to discretize g2n (= un2 /n1 ) using the same stencil. But, the function un2 is
defined by degrees of freedom that are in 2 and thus cannot be the ones
1 /n1 . In the overlapping case, a discretization of the right-
defining un+1
hand side based on the same stencil points could be done but at the expense
of identifying the stencil implicitly defined by the variational formulation.
This is not so easy to implement in a code.
In the next two sections, we give two tricks that ease the implementation so
that the converged solution is equal to the one which would be obtained by
the original discretization scheme on the whole domain. One trick applies
to a non overlapping decomposition and the other one to the overlapping
case only.
1 2
Figure 2.5: Stencil for the outward normal derivative (u1 /n1 ) at the in-
terface between two non overlapping subdomains 1 and 2
level,
( )un+1
1 =f in 1
un+1 un2
1
+ un+1
1 = + un2 on 12
n1 n2
(2.29)
( )un+1
2 =f in 2
un+1 un1
2
+ un+1
2 = + un1 on 12 .
n2 n1
where we have used that on the interface between non overlapping subdo-
mains, we have n1 = n2 , see Figure 2.1. A direct discretization would
require the computation of the normal derivatives along the interfaces in or-
der to evaluate the right-hand sides in the transmission conditions of (2.29).
This can be avoided by re-naming the problematic quantities:
un2 un1
n1 = + un2 and n2 = + un1 .
n2 n1
60 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
( )u1n+1 = f in 1
un+1
1
+ un+1
1 = n1 on 12
n1
(2.30)
( )un+1
2 = f in 2
un+1
2
+ un+1
2 = n2 on 12
n2
n+1
1 =n2 + 2 un+1
2
n+1
2 =n1 + 2 un+1
1 .
un+1 un+1
n+1
1 = 2
+ un+1
2 = ( 2
2 ) + 2 u2
+ un+1 n+1
= n2 + 2 un+1
2
n2 n2
(2.31)
and similarly
un+1 un+1
n+1
2 = 1
+ un+1
1 = ( 1
1 ) + 2 u1
+ un+1 n+1
= n1 + 2 un+1
1 .
n1 n1
(2.32)
Equations (2.31) and (2.32) can be interpreted as a fixed point algorithm in
the new variables j , j = 1, 2, to solve the substructured problem
1 = 2 + 2 u2 (2 , f ) ,
(2.33)
2 = 1 + 2 u1 (1 , f ) ,
( )uj = f in j ,
uj
+ uj = j on 12 .
nj
Instead of solving the substructured problem (2.33) by the fixed point itera-
tion (2.30), one usually uses a Krylov subspace method to solve the substruc-
tured problem. This corresponds to using the optimized Schwarz method as
a preconditioner for the Krylov subspace method.
At this point, we can introduce the algebraic counterparts of the continu-
ous quantities. A finite element discretization of the substructured prob-
lem (2.33) leads to the linear system
1 = 2 + 2 B2 u2
(2.34)
2 = 1 + 2 B1 u1
2.3. IMPLEMENTATION ISSUES 61
ARobin,1 u1 = f1 + B1T 1
(2.35)
ARobin,2 u2 = f2 + B2T 2
We detail now the new matrices and vectors we have introduced:
Vectors u1 , u2 contain the degrees of freedom of the subdomain solu-
tions.
Vectors f1 , f2 are the degrees of freedom related to f .
Matrices B1 and B2 are the trace operators of the domains 1 and 2
on the interface 12 .
In order to define matrices ARobin,j , we first split the two vectors u1 and u2
into interior and boundary degrees of freedom:
ui
j
uj = b , j = 1, 2, (2.36)
uj
where the indices i and b correspond to interior and interface degrees of
freedom respectively for domain j . Then the discrete trace operators B1
and B2 are just the Boolean matrices corresponding to the decomposition
(2.36) and they can be written as
Bj = [0 I] , j = 1, 2, (2.37)
Matrices ARobin,1 and ARobin,2 arise from the discretization of the local
operators along with the interface conditions n + ,
Here Kj are local matrices of problem (as combination of stiffness and mass
matrices) and M12 is the interface mass matrix
The functions n and m are the basis functions associated with the degrees
of freedom n and m on the interface 12 .
For given 1 and 2 , the functions u1 and u2 can be computed by solving
equations (2.35). By eliminating u1 and u2 in (2.34) using (2.35), we obtain
the substructured linear system
F = d, (2.40)
62 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
I I 2 B2 A1 T
Robin,2 B2
F =( )
I 2 B1 A1 T
Robin,1 B1 I
(2.41)
2 B1 A1
Robin,1 f1
d=( )
Robin,2 f2 .
2 B2 A1
The linear system (2.40) is solved by a Krylov subspace method. The matrix
vector product amounts to solving a subproblem in each subdomain and to
send interface data between subdomains.
The general case of a decomposition into an arbitrary number of subdomains
is treated in [88] for the case of the Helmholtz equations. It is also possible to
discretize the interface conditions by using Lagrange multipliers the method
is then named FETI 2 LM (Two-Lagrange Multiplier ), see [162].
1 + 2 = 1, suppi i , i = 1, 2.
un = E1 (1 un1 ) + E2 (2 un2 ) ,
2.3. IMPLEMENTATION ISSUES 63
rn = f ( )un (2.42)
( )vin = rn in i
vin = 0 on i
(2.43)
( + ) (vin ) = 0 on i 3i
ni
u0 = E1 (1 u01 ) + E2 (2 u02 )
( + ) (un ) = ( + ) (un2 ) on 1 2 .
n1 n1
1 n2 n1 2
(a) (a) (b) (c) (c)
1 ) + E2 (2 u2 )
un+1 = E1 (1 un+1 n+1
which ends the proof of the equivalence between P.L. Lions algorithm the
continuous version of the ORAS algorithm.
We now give the algebraic definition of the ORAS method by giving the
discrete counterparts of steps (2.42)-(2.43)-(2.44), see [36]. Let Un R#N
be an approximate solution to a linear system:
AU = F . (2.48)
The set of degrees of freedom N is decomposed into two subsets N1 and N2 .
For i = 1, 2, let Ri denote the Boolean restriction matrix to Ni and Di define
an algebraic partition of unity 2i=1 RiT Di Ri = Id, as in (1.25).
The above three steps algorithm can be written more compactly as:
2.4. OPTIMAL INTERFACE CONDITIONS 65
rn = F AUn (2.49)
Un+1 = Un + MORAS
1
rn , rn = F A Un
1
1
2 2
c
2 c
1
L(u) = f,
u = 0, .
The domain is decomposed into two subdomains 1 and 2 . We suppose
that the problem is regular so that ui = ui , i = 1, 2, is continuous and has
continuous normal derivatives across the interface i = i j , i j. A
modified Schwarz type method is considered.
L(un+1
1 )=f in 1 2 )=f
L(un+1 in 2
un+1
1 =0 on 1 un+1
2 =0 on 2
1 u1 .n1
n+1
+ B1 (un+1
1 ) 2 u2 .n2
n+1
+ B2 (un+1
2 )
= 1 un2 .n2 + B1 (un2 ) on 1 = 2 un1 .n1 + B2 (un1 ) on
2
(2.53)
where 1 and 2 are real-valued functions and B1 and B2 are operators
acting along the interfaces 1 and 2 . For instance, 1 = 2 = 0 and B1 =
B2 = Id correspond to the original Schwarz algorithm (1.3); 1 = 2 = 1 and
2.4. OPTIMAL INTERFACE CONDITIONS 67
In order to answer this question, we note that by linearity, the error e satisfies
(we supposed here with no loss of generality that 1 = 2 = 1)
L(en+1
1 )=0 in 1 L(en+1
2 )=0 in 2
en+1
1 =0 on 1 en+1
2 =0 on 2
e1 .n1
n+1
+ B1 (en+1
1 ) e2 .n2
n+1
+ B2 (en+1
2 )
= en2 .n2 + B1 (en2 ) on 1 = en1 .n1 + B2 (en1 ) on 2
L(e12 ) = 0 in 2 .
u0 1 R
(2.54)
DtN2 (u0 ) = v.n2 1 2 ,
L(v) = 0 in 2 1
v = 0 on 2
v = u0 on 1 2 .
If we take
B1 = DtN2
we see that this choice is optimal since we have
L(e12 ) = 0.
68 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
Hence,
The two-domain case for an operator with constant coefficients has been
first treated in [104]. The multidomain case for a variable coefficient oper-
ator with both positive results [141] and negative conjectures for arbitrary
domain decompositions [147] has been considered as well.
Remark 2.4.1 The main feature of this result is to be very general since it
does not depend on the exact form of the operator L and can be extended to
systems or to coupled systems of equations as well with a proper care of the
well-posedness of the algorithm.
U1
U2
These symbols are not polynomials in the Fourier variable k so that the
corresponding operators and hence the optimal interface conditions are not
partial differential operators. They correspond to exact absorbing condi-
tions. These conditions are used on the artificial boundary resulting from
the truncation of a computational domain. The solution on the truncated
domain depends on the choice of this artificial condition. We say that it
is an exact absorbing boundary condition if the solution computed on the
truncated domain is the restriction of the solution of the original problem.
Surprisingly enough, the notions of exact absorbing conditions for domain
truncation and that of optimal interface conditions in domain decomposition
methods coincide.
As the above examples show, the optimal interface transmission conditions
are pseudodifferential. Therefore they are difficult to implement. More-
over, in the general case of a variable coefficient operator and/or a curved
boundary, the exact form of these operators is not known, although they can
be approximated by partial differential operators which are easier to imple-
ment. The approximation of the DtN has been addressed by many authors
since the seminal paper [72] by Engquist and Majda on this question.
n+1
A22 A2 U F2
( ) 2 = n (2.57)
A2 A + T1 Un+1 F + T1 Un,1 A1 U1
,2
n n
limit as n goes to infinity of the sequence (U1 , Un,1 , U2 , Un,2 ) ,
n0
,1 U,2 ) = 0
(A + T1 + T2 )(U
T
as n goes to infinity shows that (U1 , U
,1 = U,2 , U2 )
is a solution to
(2.55) which is unique by assumption.
As in 2.4, we can use the optimal interface conditions
Lemma 2.4.2 Assume Aii is invertible for i = 1, 2.
Then, in algorithm (2.56)-(2.57), taking
T1 = A1 A1
11 A1 and T2 = A2 A22 A2
1
Proof Note that in this case, the bottom-right blocks of the two by two
bock matrices in (2.56) and (2.57)
A A1 A1
11 A1 and A A2 A22 A2
1
or equivalently by applying A1 A1
11 :
1
A1 U1 A1 A1
11 A1 U,1 = 0
1
So that the right-hand side of (2.57) is zero at step 2 (i.e. n = 1). We have
thus convergence to zero in domain 2. The same holds for domain 1.
Matrices T1 and T2 are in general dense matrices whose computation and
use as interface conditions is very costly. The relationship between optimal
algebraic interface conditions and the continuous ones is simply that last
line of block matrices in (2.56) and (2.57) are the correct discretizations of
the optimal interface introduced in 2.4.1.
n + ( ) (2.58)
max (k, 0; ) = 1,
k
2.5. OPTIMIZED INTERFACE CONDITIONS 73
1/y 10 20 40 80
sc
opt 6 7 10 16
=1 27 51 104 231
Table 2.1: Number of iterations for different values of the mesh size and two
possible choices for
mesh, the iteration count is reduced by a factor larger than ten. The opti-
mized Schwarz method is thus quite sensitive to the choice of the interface
condition. As we shall see in the next section, when the method is used as
a preconditioner in Krylov methods as explained in 2.3.1, performance is
less dependent on the optimal choice of the interface condition. Typically,
the iteration count is reduced by a factor three, see Table 2.3. Since taking
optimal interface conditions is beneficial in terms of iteration counts and
has no extra cost, it is a good thing to do especially for wave propagation
phenomena that we consider in the sequel.
The difficulty comes from the negative sign of the term of order zero of the
operator.
Although the following analysis could be carried out on rectangular domains
as well, we prefer for simplicity to present the analysis in the domain = R2
with the Sommerfeld radiation condition at infinity,
u
lim r( + iu) = 0,
r r
where r = x2 + y 2 . We decompose the domain into two non-overlapping
subdomains 1 = (, 0 ] R and 2 = [ 0, ) R and consider the Schwarz
2.5. OPTIMIZED INTERFACE CONDITIONS 75
algorithm
un+1
1 2 un+1
1 = f (x, y), x, y 1
(2.62)
B1 (un+1
1 )(0) = B1 (un2 )(0)
and
un+1
2 2 un+1
2 = f (x, y), x, y 2
(2.63)
B2 (un+1
2 )(0) = B2 (un1 )(0)
where Bj , j = 1, 2, are two linear operators. Note that for the classical
Schwarz method Bj is the identity, Bj = I and without overlap the algo-
rithm cannot converge. But even with overlap in the case of the Helmholtz
equation, only the evanescent modes in the error are damped, while the
propagating modes are unaffected by the Schwarz algorithm [88]. One pos-
sible remedy is to use a relatively fine coarse grid [19] or Robin transmission
conditions, see for example [43, 16]. We consider here transmission con-
ditions which lead to a convergent non-overlapping version of the Schwarz
method. We assume that the linear operators Bj are of the form
Bj = x + Tj , j = 1, 2,
for two linear operators T1 and T2 acting in the tangential direction on the
interface. Our goal is to use these operators to optimize the convergence
factor of the algorithm. For the analysis it suffices by linearity to consider
the case f (x, y) = 0 and to analyze convergence to the zero solution. Taking
a Fourier transform in the y direction we obtain
2 un+1
1
( 2 k 2 )un+1
1 = 0,
x2
x < 0, k R (2.64)
(x + 1 (k))(un+1
1 )(0) = (x + 1 (k))(u2 )(0)
n
and
2 un+1
2
( 2 k 2 )un+1
2 = 0,
x2
x > 0, k R (2.65)
(x + 2 (k))(un+1
2 )(0) = (x + 2 (k))(u1 )(0)
n
where j (k) denotes the symbol of the operator Tj , and k is the Fourier vari-
able, which we also call frequency. The general solutions of these ordinary
differential equations are
un+1
j = Aj e(k)x + Bj e(k)x , j = 1, 2,
un+1
1
= (k)un+1
1
x
u2n+1
= (k)un+1
2
x
we obtain over one step of the Schwarz iteration
choice of the symbols j (k) leads to non-local operators Tj in the real do-
main, caused by the square root in the symbols. We have to construct local
approximations for the optimal transmission conditions.
In Despres algorithm [42], the approximation consists in Tj i (i2 = 1). In
[72], the approximation valid for the truncation of an infinite computational
domain is obtained via Taylor expansions of the symbol in the vicinity of
k = 0:
1
Tjapp = i ( ) ,
2
which leads to the zeroth or second order Taylor transmission conditions,
depending on whether one keeps only the constant term or also the second
order term. But these transmission conditions are only effective for the low
frequency components of the error. This is sufficient for the truncation of a
domain since there is an exponential decay of the high frequency part (large
k) of the solution away from the artificial boundary.
But in domain decomposition, what is important is the convergence factor
which is given by the maximum over k of (k). Since there is no overlap
between the subdomains, it is not possible to profit from any decay. We
present now an approximation procedure suited to domain decomposition
methods. To avoid an increase in the bandwidth of the local discretized
subproblems, we take polynomials of degree at most 2, which leads to trans-
mission operators Tjapp which are at most second order partial differential
operators acting along the interface. By symmetry of the Helmholtz equa-
tion there is no interest in a first order term. We therefore approximate the
operators Tj , j = 1, 2, in the form
Tjapp = (a + b )
with a, b C and where denotes the tangent direction at the interface.
where and + are parameters to be chosen, and kmin denotes the smallest
frequency relevant to the subdomain, and kmax denotes the largest frequency
supported by the numerical grid. This largest frequency is of the order
/h. For example, if the domain is a strip of height L with homogeneous
Dirichlet conditions on top and bottom, the solution can be expanded in
a Fourier series with the harmonics sin(jy/L), j N. Hence the relevant
frequencies are k = j/L. They are equally distributed with a spacing /L
and thus choosing = /L and + = + /L leaves precisely one
frequency k = for the Krylov method and treats all the others by the
optimization. If falls in between the relevant frequencies, say j/L < <
(j + 1)/L then we can even get the iterative method to converge by choosing
= j/L and + = (j + 1)/L, which will allow us to directly verify our
asymptotic analysis numerically without the use of a Krylov method. How
to choose the optimal parameters p and q is given by the following:
Theorem 2.5.1 (Optimized Robin conditions) Under the three as-
sumptions
2 2 2 + +2 , < (2.71)
2 >
2 2
kmin + +2 , (2.72)
2 2 < 2
kmin + kmax
2
, (2.73)
the solution to the min-max problem (2.70) is unique and the optimal pa-
rameters are given by
2 2 k 2 2
max
p = q =
. (2.74)
2
The optimized convergence factor (2.70) is then given by
2 2 1/4 2 2
1 2 ( k2 2 ) + k2 2
(p , q , k) =
max max
max (2.75)
k(kmin , )(+ , kmax ) 2
2 1/4 2 2
1 + 2 ( k2 2 ) + k2 2
max max
where (k) is defined in (2.66) and the two parameters , C can be used
to optimize the performance. By the symmetry of (k) with respect to k, it
suffices to consider only positive k to optimize performance. We thus need
to solve the min-max problem
which has an elegant analytical solution. Note however that the original
minimization problem (2.80) might have a solution with better convergence
factor, an issue investigated in [84].
2 2
2
2 2
we have k2 2 = 1 since iR and therefore (k; , ) = k2 2
k + k +
depends only on . The solution (, ) of the minimization problem (2.81)
is thus given by the solution of the two independent minimization problems
i 2 k 2
min ( max ) (2.86)
iR, k(kmin , ) i 2 k 2 +
and
k 2 2
min ( max ) . (2.87)
R k(+ , kmax ) k 2 2 +
We show the solution for the second problem (2.87) only, the solution
for the first problem (2.86) is similar. First note that the maximum of
2 2
= k2 2 is attained on the boundary of the interval [+ , kmax ],
k +
because the function (but not ) is monotonically increasing with
k [+ , kmax ]. On the other hand as a function of , (+ ) grows mono-
tonically with while (kmax ) decreases monotonically with . The opti-
mum is therefore reached when we balance the two values on the boundary,
(+ ) = (kmax ) which implies that the optimal satisfies the equation
2
kmax 2 2 2
= + (2.88)
2
kmax 2 + +2 2 +
The optimization problem (2.87) arises also for symmetric positive definite
problems when an optimized Schwarz algorithm without overlap and Robin
transmission conditions is used and the present solution can be found in
[182].
Note that the optimization of the interface conditions was performed for the
convergence factor of a fixed-point method and not for a particular Krylov
method applied to the substructured problem. In the positive definite case
one can show that minimizing the convergence factor is equivalent to mini-
mizing the condition number of the substructured problem [110]. Numerical
experiments in the next section indicate that for the Helmholtz equation
our optimization also leads to parameters close to the best ones for the
preconditioned Krylov method.
Numerical results

We present two sets of numerical experiments. The first set corresponds to the model problem analyzed in this section, and the results obtained illustrate the analysis and confirm the asymptotic convergence results. The second numerical experiment comes from industry. The model problem reads
$$\begin{cases}
-\Delta u - \omega^2 u = f, & 0 < x, y < 1,\\[2pt]
u = 0, & 0 < x < 1,\ y = 0, 1,\\[2pt]
-\dfrac{\partial u}{\partial x} - i\omega u = 0, & x = 0,\ 0 < y < 1,\\[2pt]
\dfrac{\partial u}{\partial x} - i\omega u = 0, & x = 1,\ 0 < y < 1.
\end{cases} \qquad (2.89)$$
We decompose the unit square into two subdomains of equal size, and we use a uniform rectangular mesh for the discretization. We perform all our experiments directly on the error equations, i.e. $f = 0$, and choose the initial guess of the Schwarz iteration so that all the frequencies are present in the error.
We show two sets of experiments. The first one is with $\omega = 9.5\pi$, thus excluding $\omega$ from the frequencies $k$ relevant in this setting, $k = n\pi$, $n = 1, 2, \dots$. This allows us to test directly the iterative Schwarz method, since with the optimization parameters $k_- = 9\pi$ and $k_+ = 10\pi$ we obtain a convergence factor which is uniformly less than one for all $k$. Table 2.2 shows the number of iterations needed for different values of the mesh parameter $h$ for both the zeroth and second order transmission conditions; the Taylor conditions are included for comparison (cf. Figure 2.9).
Figure 2.9: Asymptotic behavior for the zeroth order transmission conditions
(left) and for the second order transmission conditions (right)
In the second set of experiments $\omega$ was chosen to lie between two frequencies, which shows that with Krylov acceleration the method is robust for any value of $\omega$. We finally tested, for the smallest resolution of the model problem, how well the Fourier analysis predicts the optimal parameters to use. Since we want to test both the iterative and the Krylov versions, we again need to put the frequency $\omega$ in between two problem frequencies, and in this case it is important to be precise. We therefore choose $\omega$ to be exactly between two frequencies of the discrete problem, $\omega = 9.3596\pi$, and optimized using $k_- = 8.8806\pi$ and $k_+ = 9.8363\pi$. Fig. 2.10 shows the number of iterations the algorithm needs to achieve a residual of $10^{-6}$ as a function of the optimization parameters $p$ and $q$ of the zeroth order transmission conditions, on the left in the iterative version and on the right for the Krylov accelerated version. The Fourier
[Figure 2.10: level curves of the iteration count as a function of the zeroth order optimization parameters $p$ and $q$; left: iterative version, right: Krylov accelerated version.]
analysis shows well where the optimal parameters lie and when a Krylov
method is used, the optimized Schwarz method is very robust with respect
to the choice of the optimization parameter. The same holds also for the
second order transmission conditions, as Fig. 2.11 shows.
[Figure 2.11: level curves of the iteration count as a function of the second order optimization parameters $\alpha$ and $\beta$; left: iterative version, right: Krylov accelerated version.]

[Figure: decomposition into 16 subdomains for the industrial test case.]
length $a$. To solve the problem, the optimized Schwarz method was used as a preconditioner for the Krylov method ORTHODIR, and as convergence criterion we used
$$\|K u - f\|_{L^2} \le 10^{-6}\,\|f\|_{L^2}. \qquad (2.90)$$
The first order and second order formulations were presented in a unified framework in [50, 49]. We consider the Helmholtz problem
$$-k^2 u - \Delta u = f \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial n} + iku = 0 \ \text{ on } \partial\Omega.$$
The right-hand side $f$ is a Gaussian function. The real part of the solution is plotted in Figure 2.13. After discretization, the resulting linear system is solved with the scripts below.
load "metis"
load "medit"
int nn=4,mm=4; // number of the domains in each direction
int npart=nn*mm; // total number of domains
int nloc=40; // local no of dof per domain in one direction
bool withmetis=0; // =1 (Metis decomp) =0 (uniform decomp)
int sizeovr=2; // size of the overlap
real allong=real(nn)/real(mm); // aspect ratio of the global domain
// Mesh of a rectangular domain
mesh Th=square(nn*nloc,mm*nloc,[x*allong,y]);
fespace Vh(Th,P1);
fespace Ph(Th,P0);
Ph part; // piecewise constant function
int[int] lpart(Ph.ndof); // giving the decomposition
// Domain decomposition data structures
mesh[int] aTh(npart); // sequence of ovr. meshes
matrix[int] Rihreal(npart); // local restriction operators
matrix[int] Dihreal(npart); // partition of unity operators
matrix<complex>[int] Rih(npart); // local restriction operators
matrix<complex>[int] Dih(npart); // partition of unity operators
int[int] Ndeg(npart); // number of dof for each mesh
real[int] AreaThi(npart); // area of each subdomain
matrix<complex>[int] aA(npart); // local Dirichlet matrices
matrix<complex>[int] aR(npart); // local Robin matrices
// Definition of the problem to solve
// -k^2*u - Delta(u) = f,
// u = g, Dirichlet boundary (top and bottom)
// dn(u) + i k u = 0, Robin boundary (left and right)
int Dirichlet=1, Robin=2;
int[int] chlab=[1, Robin, 2, Robin, 3, Robin, 4, Robin];
Th=change(Th,refe=chlab);
macro Grad(u) [dx(u),dy(u)] // EOM
real k=12.*pi;
func f=exp(-((x-.5)^2+(y-.5)^2)*120.); // right-hand side
func g=0; // Dirichlet data
varf vaglobal(u,v) = int2d(Th)(-k^2*u*v+Grad(u)'*Grad(v))
 + int1d(Th,Robin)(1i*k*u*v) - int2d(Th)(f*v)
 + on(Dirichlet,u=g);
matrix<complex> Aglobal;
Vh<complex> rhsglobal,uglob; // rhs and solution of the global problem
complex alpha=1i*k; // Despres algorithm
// Optimal alpha:
// h21: the mesh size for a rectangular domain of dimension 2x1,
// that is obtained with nn x mm = 2x1,4x2,8x4
real h21=1./(mm*nloc);
real kplus=k+pi/1; // the rectangle height is 1
real kminus=k-pi/1;
real Cw=min(kplus^2-k^2,k^2-kminus^2);
real alphaOptR=pow(Cw/h21,1/3.)/2.;
complex alphaOpt=alphaOptR+1i*alphaOptR;
//alpha=alphaOpt;
//alpha=1.0e30;
// Iterative solver
real tol=1e-4; // tolerance for the iterative method
int maxit=1000; // maximum number of iterations
Script file (note that the GMRES routine has to be adapted to complex types):
/# debutPartition #/
include "../../FreefemCommon/dataHelmholtz.edp"
include "../../FreefemCommon/decomp.idp"
include "../../FreefemCommon/createPartition.idp"
SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rihreal,Dihreal,Ndeg,AreaThi);
for (int i=0; i<npart; i++) {
 Rih[i]=Rihreal[i];
 Dih[i]=Dihreal[i];
}
/# endPartition #/
/# debutGlobalData #/
Aglobal=vaglobal(Vh,Vh,solver=UMFPACK); // global matrix
rhsglobal[]=vaglobal(0,Vh); // global rhs
uglob[]=Aglobal^-1*rhsglobal[];
Vh realuglob=real(uglob);
plot(realuglob,dim=3,wait=1,cmm="Exact solution",value=1,fill=1);
/# finGlobalData #/
/# debutLocalData #/
for(int i=0; i<npart; ++i)
{
 cout << "Domain: " << i << "/" << npart << endl;
 mesh Thi=aTh[i];
 fespace Vhi(Thi,P1);
 varf RobinInt(u,v) = int2d(Thi)(-k^2*u*v+Grad(u)'*Grad(v))
 + int1d(Thi,Robin)(1i*k*u*v) + on(Dirichlet,u=g)
 + int1d(Thi,10)(alpha*u*v);
 aR[i]=RobinInt(Vhi,Vhi);
 set(aR[i],solver=UMFPACK); // direct solvers using UMFPACK
}
/# finLocalData #/
/# debutGMRESsolve #/
include "GMRES.idp"
Vh<complex> un=0, sol, er; // initial guess, final solution and error
sol[]=GMRES(un[],tol,maxit);
plot(sol,dim=3,wait=1,cmm="Final solution",value=1,fill=1);
er[]=sol[]-uglob[];
cout << "Final scaled error = " << er[].linfty/uglob[].linfty << endl;
/# finGMRESsolve #/
iterations). These results are obtained by a one-level method, that is, without a coarse space correction. We can improve them by adding a specially designed coarse space for Helmholtz equations [34]. The effect can be seen in Figure 2.14 (right). We plot the convergence curves for the one-level Despres algorithm and its acceleration using either 5 or 10 vectors per subdomain in the coarse space.
[Figure 2.14: residual histories (log scale) versus iteration count; left: Despres algorithm compared with a Dirichlet-based variant; right: one-level Despres algorithm and its coarse space acceleration with 5 (CS5) or 10 (CS10) vectors per subdomain.]
Krylov methods

Recall first that the iterative versions of the Schwarz methods introduced in chapter 1 can be written as preconditioned fixed point iterations
$$U^{n+1} = U^n + M^{-1}r^n, \qquad r^n = F - A\,U^n,$$
where $M^{-1}$ depends on the method used (RAS or ASM). The key point in what follows is to prove that fixed point methods (3.3) are intrinsically slower than Krylov methods; for more details see [39]. Since this result is valid for any fixed point method, we will start by placing ourselves in an abstract linear algebra framework and then apply the results to the Schwarz methods. Consider the linear system
$$A\,x = b, \qquad x \in \mathbb{R}^N. \qquad (3.1)$$
is called a fixed point algorithm, and the solution $x$ of (3.1) is a fixed point of the operator
$$x \longmapsto x + M^{-1}(b - A\,x).$$
The difference between the matrices $A$ and $M$ is denoted by $P := M - A$. When convergent, the iteration (3.3) converges to the solution of the preconditioned system
$$M^{-1}A\,x = M^{-1}b.$$
The above system, which has the same solution as the original system, is called a preconditioned system; here $M^{-1}$ is called the preconditioner. In other words, a splitting method is equivalent to a fixed-point iteration on a preconditioned system. We see that the fixed point iterations (3.3) can be written as
$$\begin{aligned}
x^{n+1} &= x^n + M^{-1}(b - A\,x^n)\\
&= (I - M^{-1}(M - P))\,x^n + M^{-1}b\\
&= M^{-1}P\,x^n + M^{-1}b \qquad (3.4)\\
&= M^{-1}P\,x^n + M^{-1}A\,x = M^{-1}P\,x^n + M^{-1}(M - P)\,x\\
&= x + M^{-1}P\,(x^n - x).
\end{aligned}$$
From (3.4) it is easy to see that the error vector $e^n := x^n - x$ verifies
$$e^{n+1} = M^{-1}P\,e^n,$$
so that
$$e^{n+1} = (M^{-1}P)^{n+1}\,e^0. \qquad (3.5)$$
For this reason $M^{-1}P$ is called the iteration matrix associated with (3.4) and, since the expression of the iteration does not change with respect to $n$, we call (3.4) a stationary iteration.
We recall below the well-known convergence results in the case of stationary
iterations.
Lemma 3.1.1 (Convergence of the fixed point iterations) The fixed point iteration (3.4) converges for an arbitrary initial error $e^0$ (that is, $e^n \to 0$ as $n \to \infty$) if and only if the spectral radius of the iteration matrix is smaller than one, that is $\rho(M^{-1}P) < 1$, where
$$\rho(B) = \max\{|\lambda| : \lambda \text{ eigenvalue of } B\}.$$
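The lemma is easy to check numerically. The following sketch (in Python, with illustrative values) builds a Jacobi splitting $M = \mathrm{diag}(A)$, computes the spectral radius of the iteration matrix $M^{-1}P$ and runs the stationary iteration (3.4):

import numpy as np

rng = np.random.default_rng(0)
A = np.diag([4.0, 5.0, 6.0]) + rng.uniform(-1, 1, (3, 3))  # diagonally dominant
b = rng.standard_normal(3)
M = np.diag(np.diag(A))                       # Jacobi splitting M = diag(A)
P = M - A
G = np.linalg.solve(M, P)                     # iteration matrix M^{-1} P
print("spectral radius:", max(abs(np.linalg.eigvals(G))))

x, xstar = np.zeros(3), np.linalg.solve(A, b)
for n in range(50):
    x = x + np.linalg.solve(M, b - A @ x)     # x^{n+1} = x^n + M^{-1}(b - A x^n)
print("error after 50 iterations:", np.linalg.norm(x - xstar))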
In general, it is not easy to ensure that the spectral radius of the iteration matrix $M^{-1}P$ is smaller than one, except for $M$-matrices ($A$ non-singular and $A^{-1}$ non-negative), for which any regular splitting $A = M - P$ ($M$ non-singular, $M^{-1}$ and $P$ non-negative) leads to a convergent fixed point iteration (see Chapter 4 in [164] for details).
$$z^n := M^{-1}r^n = M^{-1}(b - A\,x^n),$$
$$x^{n+1} = x^0 + \sum_{i=0}^{n}(M^{-1}P)^i\,z^0. \qquad (3.7)$$
From (3.5) we see that the preconditioned residual $z^n$ verifies
$$z^n = M^{-1}r^n = M^{-1}(PM^{-1})^n r^0 = M^{-1}(PM^{-1})^n M\,z^0 = (M^{-1}P)^n\,z^0, \qquad (3.10)$$
so that
$$x^{n+1} = x^0 + z^0 + (M^{-1}P)\,z^0 + \dots + (M^{-1}P)^n z^0 = x^0 + \sum_{i=0}^{n}(M^{-1}P)^i z^0, \qquad (3.11)$$
which leads to the conclusion. Thus the difference $x^{n+1} - x^0$ is a geometric series of common ratio $M^{-1}P$. Note that (3.11) can also be written in terms of the residual vector:
$$x^{n+1} = x^0 + M^{-1}r^0 + (M^{-1}P)\,M^{-1}r^0 + \dots + (M^{-1}P)^n M^{-1}r^0 = x^0 + \sum_{i=0}^{n}(M^{-1}P)^i\,M^{-1}r^0. \qquad (3.12)$$
Writing $S_n(t) := \sum_{i=0}^n t^i$, this takes the compact form
$$x^{n+1} - x^0 = S_n(M^{-1}P)\,M^{-1}r^0 = S_n(M^{-1}P)\,z^0, \qquad (3.13)$$
so that
$$x^{n+1} - x^0 \in \mathrm{Span}\{M^{-1}r^0,\,(M^{-1}P)M^{-1}r^0,\,\dots,\,(M^{-1}P)^nM^{-1}r^0\} = \mathrm{Span}\{z^0,\,(M^{-1}P)z^0,\,\dots,\,(M^{-1}P)^n z^0\}, \qquad (3.14)$$
that is,
$$x^n - x^0 \in \mathcal K^n(M^{-1}P,\,M^{-1}r^0) = \mathcal K^n(M^{-1}P,\,z^0).$$
Similarly,
$$e^n = (M^{-1}P)^n e^0 \implies e^n \in \mathcal K^{n+1}(M^{-1}P,\,e^0), \qquad z^n = (M^{-1}P)^n z^0 \implies z^n \in \mathcal K^{n+1}(M^{-1}P,\,z^0).$$
Note also that, since $P = M - A$,
$$\mathcal K^n(M^{-1}P,\,z^0) = \mathcal K^n(M^{-1}A,\,z^0).$$
Remark 3.2.2 Note also that the difference between two successive iterates is given by
$$x^{n+1} - x^n = (M^{-1}P)^n z^0 = (I - M^{-1}A)^n M^{-1}r^0 \in \mathcal K^{n+1}(M^{-1}A,\,M^{-1}r^0).$$
Example 3.2.1 From the family of fixed point methods we can mention the Richardson iteration with a relaxation parameter $\alpha$:
$$x^{n+1} = x^n + \alpha\,(b - A\,x^n).$$
The parameter $\alpha$ can be chosen in such a way as to obtain the best convergence factor over the iterations. In the case of a symmetric positive definite matrix, the value of $\alpha$ which minimizes the convergence factor of the algorithm is
$$\alpha_{opt} = \frac{2}{\lambda_{\min}(A) + \lambda_{\max}(A)},$$
where
$$\kappa_2(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$
is the condition number of $A$, $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ being the extreme eigenvalues of $A$. In practice, it is very difficult to estimate accurately $\lambda_{\min}$ or $\lambda_{\max}$, and thus $\alpha_{opt}$. This motivates the introduction of the gradient method in the next paragraph.
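As a small illustration, the optimal parameter and the condition number can be computed directly from the extreme eigenvalues; a minimal sketch:

import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])        # a small SPD example
lam = np.linalg.eigvalsh(A)
alpha_opt = 2.0 / (lam.min() + lam.max())     # optimal Richardson parameter
kappa = lam.max() / lam.min()                 # condition number kappa_2(A)
print("alpha_opt =", alpha_opt, " condition number =", kappa)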
Consider the function
$$f : \alpha \longmapsto f(\alpha) = \|x^n + \alpha\,p^n - x\|_A^2.$$
The optimal step satisfies
$$\begin{aligned}
\|x^{n+1} - x\|_A^2 = f(\alpha_n) &= \min_\alpha \|x^n + \alpha\,p^n - x\|_A^2 = \min_\alpha\,\big(A(x^n + \alpha p^n - x),\, x^n + \alpha p^n - x\big)\\
&= \min_\alpha\,\big[\alpha^2 (Ap^n, p^n) + 2\alpha\,(Ap^n, x^n - x) + (A(x^n - x), x^n - x)\big]\\
&= \min_\alpha\,\big[\alpha^2 (Ap^n, p^n) - 2\alpha\,(p^n, A(x - x^n))\big] + (A(x^n - x), x^n - x).
\end{aligned}$$
With the choice $p^n = r^n$ this yields
$$\alpha_n = \frac{\|r^n\|_2^2}{\|r^n\|_A^2} = \frac{(e^n, r^n)_A}{\|r^n\|_A^2}. \qquad (3.18)$$
In the following we will define a new method that extends the idea of gradient methods, and we will prove afterwards that it is exactly the Krylov method whose iterates are given by (3.19).
Definition 3.3.1 (Conjugate Gradient) Starting from an initial guess $x^0$ and an initial descent direction $p^0 = r^0 = b - A\,x^0$, a conjugate gradient method is an iteration of the form
$$\begin{aligned}
x^{n+1} &= x^n + \alpha_n\,p^n,\\
r^{n+1} &= r^n - \alpha_n\,A\,p^n, \qquad (3.20)\\
p^{n+1} &= r^{n+1} + \beta_{n+1}\,p^n,
\end{aligned}$$
where $\alpha_n$ and $\beta_n$ are chosen such that they minimize the norm of the error $\|e^{n+1}\|_A^2 = \|x^{n+1} - x\|_A^2$ at each iteration.
Coefficients $\alpha_n$ and $\beta_{n+1}$ are easy to find. First, from (3.18) we see that the coefficient $\alpha_n$ which minimizes $\|e^{n+1}\|_A^2$ is necessarily given by
$$\alpha_n = \frac{(p^n, r^n)}{(Ap^n, p^n)}. \qquad (3.21)$$
By taking the dot product of the second relation of (3.20) with $p^n$ and using (3.21) in it, we see that $(r^{n+1}, p^n) = 0$. By taking the dot product of the third relation of (3.20) with $r^{n+1}$ and using the previous orthogonality relation, we obtain
$$\alpha_n = \frac{(r^n, r^n)}{(Ap^n, p^n)} = \frac{\|r^n\|_2^2}{\|p^n\|_A^2}. \qquad (3.22)$$
A similar computation for the error after two steps gives
$$\|e^{n+2}\|_A^2 = \|e^{n+1}\|_A^2 - \frac{(r^{n+1}, r^{n+1})^2}{(Ap^{n+1}, p^{n+1})} = \|e^{n+1}\|_A^2 - \frac{(r^{n+1}, r^{n+1})^2}{\big(A(r^{n+1}+\beta_{n+1}p^n),\, r^{n+1}+\beta_{n+1}p^n\big)}, \qquad (3.23)$$
so minimizing with respect to $\beta_{n+1}$ yields $(Ap^{n+1}, p^n) = 0$. Using this identity and taking the $A$-dot product of the third relation of (3.20) with $p^{n+1}$, we get
$$(Ap^{n+1}, p^{n+1}) = (Ar^{n+1}, p^{n+1}) + \beta_{n+1}\,(Ap^{n}, p^{n+1}) = (Ar^{n+1}, p^{n+1}) = (Ap^{n+1}, r^{n+1}). \qquad (3.24)$$
By using (3.24) in the dot product of the second relation of (3.20) with $r^n$,
$$(r^{n+1}, r^n) = (r^n, r^n) - \alpha_n\,(Ap^n, r^n) = (r^n, r^n) - \alpha_n\,(Ap^n, p^n) = 0. \qquad (3.25)$$
Finally, by taking the dot product of the second relation of (3.20) with $r^{n+1}$ and using (3.25),
$$(r^{n+1}, Ap^n) = -\frac{\|r^{n+1}\|_2^2}{\alpha_n}. \qquad (3.26)$$
By plugging (3.26) into (3.23) we conclude, using the expression (3.22), that
$$\beta_{n+1} = \frac{\|r^{n+1}\|_2^2}{\|r^n\|_2^2}. \qquad (3.27)$$
The resulting algorithm is given in Algorithm 4, see [9].

Algorithm 4 CG algorithm
Compute $r^0 := b - Ax^0$, $p^0 = r^0$.
for $i = 0, 1, \dots$ do
  $\rho_i = (r^i, r^i)_2$
  $q^i = A\,p^i$
  $\alpha_i = \dfrac{\rho_i}{(p^i, q^i)_2}$
  $x^{i+1} = x^i + \alpha_i\,p^i$
  $r^{i+1} = r^i - \alpha_i\,q^i$
  $\rho_{i+1} = (r^{i+1}, r^{i+1})_2$
  $\beta_{i+1} = \dfrac{\rho_{i+1}}{\rho_i}$
  $p^{i+1} = r^{i+1} + \beta_{i+1}\,p^i$
  check convergence; continue if necessary
end for
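A direct transcription of Algorithm 4 in Python (a sketch for dense NumPy arrays, with an illustrative test) reads:

import numpy as np

def cg(A, b, x0=None, tol=1e-10, maxit=200):
    # conjugate gradient mirroring Algorithm 4
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    for _ in range(maxit):
        q = A @ p
        alpha = rho / (p @ q)
        x += alpha * p
        r -= alpha * q
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:        # convergence check on ||r||_2
            break
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([1.0, 2.0])
print(cg(A, b), np.linalg.solve(A, b))    # both should agree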
We can see that the conjugate gradient algorithm has a few remarkable properties, stated in the following lemma.
Lemma 3.3.1 (Conjugate Gradient as a Krylov method) The descent directions $p^n$ are $A$-orthogonal, or conjugate,
$$(Ap^k, p^l) = 0, \quad \forall k, l,\ k \neq l,$$
and the residual vectors are orthogonal to each other and to the descent directions,
$$(r^k, r^l) = 0, \quad \forall k, l,\ k \neq l, \qquad (p^k, r^l) = 0, \quad \forall k, l,\ k < l.$$
Moreover, both the descent directions and the residuals span the Krylov space,
$$\mathrm{Span}\{p^0, \dots, p^{n-1}\} = \mathrm{Span}\{r^0, \dots, r^{n-1}\} = \mathcal K^n(A, r^0),$$
and the vector $x^n$ from (3.20) minimizes the error norm $\|e^n\|_A^2$ on the affine space $x^0 + \mathcal K^n(A, r^0)$. The algorithm converges in at most $N$ iterations, $N$ being the size of the matrix $A$.
$$e^n = P_n(A)\,e^0,$$
where $P_n$ is a polynomial of degree $n$ with
$$P_n(0) = 1.$$
If $A = T\Lambda T^t$ is a diagonalization of $A$, then for any polynomial $Q$,
$$Q(A) = T\,Q(\Lambda)\,T^t.$$
Therefore we have, for any monic (the coefficient of the leading term is equal to one) polynomial $Q$ of degree $n$, a bound in terms of $\max_{\lambda\in[\lambda_1,\lambda_2]}|Q(\lambda)|$, where $\lambda_1 = \lambda_{\min}(A)$ and $\lambda_2 = \lambda_{\max}(A)$. We know from [23] that this quantity is minimized by the shifted and scaled Chebyshev polynomial
$$\bar Q(x) = \frac{T_n\!\left(\dfrac{\lambda_1 + \lambda_2 - 2x}{\lambda_2 - \lambda_1}\right)}{T_n\!\left(\dfrac{\lambda_1 + \lambda_2}{\lambda_2 - \lambda_1}\right)}.$$
It follows that the optimal gradient method reduces the error by a factor $\varepsilon$ in a number of iterations
$$n_{OG} \simeq \frac{\kappa_2(A)}{2}\,(-\log\varepsilon),$$
whereas, from Lemma 3.3.2, the conjugate gradient method converges in $n_{CG}$ iterations with
$$n_{CG} \simeq \frac{\sqrt{\kappa_2(A)}}{2}\,(-\log\varepsilon).$$
These estimates are clearly in favor of the CG algorithm over the optimal gradient method.
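For a quick comparison of the two estimates (with the constants as reconstructed above, so only indicative), one can evaluate:

import numpy as np

kappa, eps = 1.0e4, 1.0e-6               # illustrative condition number and accuracy
n_og = 0.5 * kappa * abs(np.log(eps))    # optimal gradient estimate
n_cg = 0.5 * np.sqrt(kappa) * abs(np.log(eps))  # conjugate gradient estimate
print(f"gradient: ~{n_og:.0f} iterations, CG: ~{n_cg:.0f} iterations")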
Let $M^{-1/2}$ be a square root of the preconditioner, $M^{-1/2}M^{-1/2} = M^{-1}$. The conjugate gradient method applied to the system
$$\tilde A\,\tilde x = \tilde b, \qquad (3.32)$$
with
$$\tilde A = M^{-1/2}A\,M^{-1/2}, \qquad \tilde x = M^{1/2}x, \qquad \tilde b = M^{-1/2}b,$$
can be rewritten as the following iterative method: starting from an initial guess $x^0$ and an initial descent direction
$$p^0 = M^{-1}r^0 = M^{-1}(b - A\,x^0),$$
compute
$$\begin{aligned}
x^{n+1} &= x^n + \alpha_n\,p^n,\\
r^{n+1} &= r^n - \alpha_n\,A\,p^n, \qquad (3.33)\\
p^{n+1} &= M^{-1}r^{n+1} + \beta_{n+1}\,p^n,
\end{aligned}$$
with
$$\alpha_n = \frac{(M^{-1}r^n, r^n)}{\|p^n\|_A^2}, \qquad \beta_{n+1} = \frac{(M^{-1}r^{n+1}, r^{n+1})}{(M^{-1}r^n, r^n)}. \qquad (3.34)$$
Proof Suppose we apply the conjugate gradient method (3.20) to the system (3.32). This gives
$$\begin{aligned}
\tilde x^{n+1} &= \tilde x^n + \tilde\alpha_n\,\tilde p^n,\\
\tilde r^{n+1} &= \tilde r^n - \tilde\alpha_n\,\tilde A\,\tilde p^n, \qquad (3.35)\\
\tilde p^{n+1} &= \tilde r^{n+1} + \tilde\beta_{n+1}\,\tilde p^n,
\end{aligned}$$
with
$$\tilde p^0 = \tilde r^0 \iff M^{1/2}p^0 = M^{-1/2}r^0 \iff p^0 = M^{-1}r^0.$$
Translating back to the original variables, the resulting iteration is exactly the preconditioned conjugate gradient applied to
$$M^{-1}A\,x = M^{-1}b,$$
and the error satisfies $\|e^n\|_2 = \|A^{-1}r^n\|_2 \le \|A^{-1}\|_2\,\|r^n\|_2$.
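A minimal Python sketch of the recurrences (3.33)-(3.34), with a simple Jacobi preconditioner as illustration (the function name pcg is ours, not from the text):

import numpy as np

def pcg(A, b, Minv, tol=1e-10, maxit=200):
    # preconditioned CG; Minv is a callable applying M^{-1} to a vector
    x = np.zeros_like(b)
    r = b - A @ x
    z = Minv(r)
    p = z.copy()
    rz = r @ z                          # (M^{-1} r^n, r^n)
    for _ in range(maxit):
        q = A @ p
        alpha = rz / (p @ q)            # alpha_n from (3.34)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) < tol:
            break
        z = Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # beta_{n+1} = rz_new / rz
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([1.0, 2.0])
d = np.diag(A)
print(pcg(A, b, lambda r: r / d))       # Jacobi preconditioner M = diag(A)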
In the GMRES method, the approximation $x^n = x^0 + V_n Y_n$ is obtained from the Arnoldi relation $A\,V_n = V_{n+1}\bar H_n$ by minimizing the residual over the Krylov space, which leads to the least squares problem
$$\min_{Y_n}\ \big\|\,V_{n+1}\big(\|r^0\|_2\,e_1 - \bar H_n\,Y_n\big)\big\|_2, \qquad (3.45)$$
and consequently to the normal equations
$$\bar H_n^*\,\bar H_n\,Y_n = \|r^0\|_2\,\bar H_n^*\,e_1, \qquad (3.46)$$
where $e_1$ is the first column vector of the canonical basis of $\mathbb{R}^{n+1}$. Solving such a system is straightforward if one knows the QR decomposition of $\bar H_n$,
$$\bar H_n = Q_n\,R_n,$$
with $Q_n$ a unitary matrix of order $n+1$ and $R_n$ an upper triangular matrix of size $(n+1)\times n$ whose last row is null. This factorization is quite easy to compute in the case of an upper Hessenberg matrix (since it is almost upper triangular). Supposing that this factorization is available, (3.46) reduces to the system
$$R_n^*\,(Q_n^*Q_n)\,R_n\,Y_n = \|r^0\|_2\,R_n^*\,Q_n^*\,e_1 \iff R_n^*R_n\,Y_n = \|r^0\|_2\,R_n^*\,Q_n^*\,e_1, \qquad (3.47)$$
that is,
$$R_n\,Y_n = g_n, \qquad g_n := \|r^0\|_2\,(Q_n^*\,e_1)_{1:n}, \qquad (3.50)$$
where $\gamma_{n+1}$ denotes the last element of $\|r^0\|_2\,Q_n^*\,e_1$ (we used the fact that the matrices $V_{n+1}$ and $Q_n$ are unitary). See [164] for more details of the proof.
$N$, the cost is marginal; when the iteration count gets large, its cost can become a problem. If $A$ is diagonalizable, then for any polynomial $Q$,
$$A = W\Lambda W^{-1} \implies Q(A) = W\,Q(\Lambda)\,W^{-1}.$$
Other facts:
Krylov methods do not need the explicit form of the operator $A$ nor of the preconditioner $M$. They are based on applying these operators to vectors (so-called matrix-vector product operations), as is the case for the fixed point method (3.3). These are matrix-free methods.
There exist Matlab functions for the main Krylov methods: pcg for the Conjugate Gradient and gmres for the GMRES method.
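The SciPy equivalents can be called in the same matrix-free spirit; in the sketch below the preconditioner is a Jacobi example of our own, passed as a LinearOperator applying $M^{-1}$:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")  # 1D Laplacian
b = np.ones(n)
Minv = spla.LinearOperator((n, n), matvec=lambda r: r / A.diagonal())
x, info = spla.cg(A, b, M=Minv)       # Conjugate Gradient, Jacobi-preconditioned
y, info2 = spla.gmres(A, b, M=Minv)   # GMRES with the same preconditioner
print(info, info2)                    # 0 means convergence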
We now consider the system
$$A\,x = b, \qquad (3.54)$$
where $A$ may be singular. When $A$ is symmetric, we have the orthogonal decomposition
$$\mathbb{R}^N = \ker(A) \oplus \mathrm{range}(A).$$
The starting point is the fact that the inverse of an invertible matrix $C$ can be written as a polynomial in $C$:
$$C^{-1} = \mathcal P(C).$$
Proof Let
$$\mathcal M(X) = \sum_{i=0}^d a_i X^i$$
be a minimal polynomial of $C$, so that $\mathcal M(C) = 0$. If we had $a_0 = 0$, then $\sum_{i=0}^{d-1} a_{i+1} X^i$ would be an annihilating polynomial of $C$ of degree lower than $d$. By definition of a minimal polynomial, this is impossible. This proves that $a_0 \neq 0$ and we can divide by $a_0$:
$$\mathrm{Id} + \left(\sum_{i=1}^d \frac{a_i}{a_0}\,C^{i-1}\right)C = 0,$$
that is,
$$C^{-1} = \mathcal P(C), \qquad \mathcal P(C) = -\sum_{i=1}^d \frac{a_i}{a_0}\,C^{i-1}. \qquad (3.55)$$
Our first step is to extend Lemma 3.5.1 to the non-invertible symmetric case.
Proof Let $\sigma(A)$ denote the set of the eigenvalues of $A$, without taking into account their multiplicities. Since the matrix $A$ is non-invertible, we have $0 \in \sigma(A)$. Moreover, since it is symmetric, it is diagonalizable, and it is easy to check that $A$ cancels the polynomial built from its spectrum, that is,
$$\prod_{\lambda\in\sigma(A)} (\lambda\,\mathrm{Id} - A) = 0. \qquad (3.57)$$
The zeroth order term of the above polynomial is zero, and the next term is non-zero since it takes the following value:
$$\Big(\prod_{\lambda\in\sigma(A)\setminus\{0\}} \lambda\Big)\,A.$$
Proof Let $r = At \in \mathrm{range}(A)$ for some $t \in \mathbb{R}^N$. Note that, since $t$ may have a non-zero component in $\ker(A)$, we may have $y \neq t$. We apply Lemma 3.5.2 and right-multiply (3.56) by the vector $t$:
$$A\left(\sum_{i=2}^p a_i A^{i-2}\right) r = r,$$
so that $y := \big(\sum_{i=2}^p a_i A^{i-2}\big)\,r$ satisfies $A\,y = r$; a solution can then be written as
$$x = x^0_{\ker} + A^\dagger b, \qquad x^0_{\ker} \in \ker(A).$$
Consider the system
$$\begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix} x = b = \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix}, \qquad (3.59)$$
whose solutions are
$$x = \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix} + \mathbb{R}\begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix}.$$
In particular, the first component of any solution is equal to $1$. For the sake of simplicity, assume $x^0 = 0$. Then the first residual $r^0 = b$ has its first component equal to zero. The same holds for $A^k r^0$ for all $k \ge 1$. Thus, for all $n$, any vector in the space $\{x^0\} + \mathcal K^n(A, r^0)$ has its first component equal to zero and cannot be a solution to system (3.59).
The Krylov method applied in this case is the conjugate gradient; we are in the symmetric case, since both the original problem and the preconditioner are symmetric positive definite. The whole program in which these routines are used is given below.

/# debutPartition #/
include "../../FreefemCommon/dataGENEO.edp"
include "../../FreefemCommon/decomp.idp"
include "../../FreefemCommon/createPartition.idp"
SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
/# endPartition #/
/# debutGlobalData #/
Aglobal=vaglobal(Vh,Vh,solver=UMFPACK); // global matrix
rhsglobal[]=vaglobal(0,Vh); // global rhs
uglob[]=Aglobal^-1*rhsglobal[];
/# finGlobalData #/
/# debutLocalData #/
for(int i=0; i<npart; ++i)
{
 cout << "Domain: " << i << "/" << npart << endl;
 matrix aT=Aglobal*Rih[i]';
 aA[i]=Rih[i]*aT;
 set(aA[i],solver=UMFPACK); // direct solvers using UMFPACK
}
/# finLocalData #/
/# debutPCGSolve #/
include "../../FreefemCommon/matvecAS.idp"
include "PCG.idp"
Vh un=0, sol; // initial guess un and final solution
cout << "Schwarz Dirichlet algorithm" << endl;
sol[]=myPCG(un[],tol,maxit); // PCG with initial guess un
plot(sol,cmm="Final solution",wait=1,dim=3,fill=1,value=1);
Vh er=sol-uglob;
cout << "Final error: " << er[].linfty << endl;
/# finPCGSolve #/
In the three-dimensional case, the only thing that changes with respect to the main script is the preparation of the data:

include "../../FreefemCommon/data3d.edp"
include "../../FreefemCommon/decomp3d.idp"
include "../../FreefemCommon/createPartition3d.idp"
SubdomainsPartitionUnity3(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,VolumeThi);
[Figure 3.1: residual histories versus iteration count for overlap sizes 2, 5 and 10; left: iterative version, right: Krylov-accelerated version.]
include "../../FreefemCommon/matvecAS.idp"
include "GMRES.idp"
Vh un=0, sol, er; // initial guess, final solution and error
sol[]=GMRES(un[],tol,maxit);
plot(sol,dim=3,wait=1,cmm="Final solution",value=1,fill=1);
er[]=sol[]-uglob[];
cout << "Final error = " << er[].linfty/uglob[].linfty << endl;

In the case of the use of BiCGstab, only the last part has to be modified:

include "../../FreefemCommon/matvecAS.idp"
include "BiCGstab.idp"
Vh un=0, sol, er; // initial guess, final solution and error
sol[]=BiCGstab(un[],tol,maxit);
plot(sol,dim=3,wait=1,cmm="Final solution",value=1,fill=1);
er[]=sol[]-uglob[];
cout << "Final error = " << er[].linfty/uglob[].linfty << endl;
In the three-dimensional case we have to use the specific preparation of the data, as for the case of the ASM. The performance of Krylov methods is much less sensitive to the overlap size than that of the iterative version, as shown in Figure 3.1. Note also that in the case of BiCGstab, an iteration involves two matrix-vector products; thus the cost of a BiCGstab iteration is twice the cost of a CG iteration. In all previous scripts the convergence residuals can be stored in a file in Matlab/Octave/Scilab form, and the convergence histories can then be plotted in one of these languages.
Chapter 4
Coarse Spaces
$$\begin{cases} -\Delta u = f, & \text{in } \Omega,\\ u = 0, & \text{on } \partial\Omega. \end{cases}$$
The problem with the one-level method comes from the fact that in the Schwarz methods of Chapter 1 there is a lack of global exchange of information. Data are exchanged only from one subdomain to its direct neighbors. But the solution in each subdomain depends on the right-hand side in all subdomains. Let us denote by $N_d$ the number of subdomains in one direction. Then, for instance, the leftmost domain of Figure 4.1 needs at least $N_d$ iterations before being aware of the value of the right-hand side $f$ in the rightmost subdomain. The length of the plateau is thus typically related to the number of subdomains in one direction, and therefore to the notion of scalability met in the context of high performance computing.
To be more precise, there are two common notions of scalability: strong scalability, where a fixed global problem is solved faster as the number of cores grows, and weak scalability, where the time to solution stays constant as the problem size and the number of cores grow proportionally.
A mechanism through which scalability (in this case, weak scalability) can be achieved consists in the introduction of a two-level preconditioner via a coarse space correction. In two-level methods, a small problem, of size typically the number of subdomains, couples all subdomains at each iteration. Note that here the term scalable refers to the number of iterations. In real applications, scalability is defined in terms of the total computing time. When measured by the computing time, two-level methods are often not scalable for large scale calculations, because the coarse grid solver is not scalable. As a result, a multilevel (more than two) approach is often necessary.
To illustrate the need for a coarse space, in Figure 4.2 we consider a 2D problem decomposed into 2 × 2, 4 × 4 and 8 × 8 subdomains. For each domain decomposition, we have two curves: one for a one-level method, which has longer and longer plateaus, and a second one for the method with a coarse grid correction, denoted by M2, for which the plateaus are much smaller. The problem of one-level methods and its cure are also well illustrated in Figure 4.3 for a domain decomposition into 64 strips, as well as in Table 4.1. The one-level method has a long plateau in the convergence, whereas with a coarse space correction convergence is quite fast. We also see that, for the one-level curves, the plateau has a size proportional to the number of subdomains in one direction. This can be understood by looking at Figure 4.4, where we plot the slow decrease of the error for a one-dimensional Poisson problem.
[Figure 4.2: $\log_{10}$ of the error versus the number of iterations (GCR) for decompositions into 2×2, 4×4 and 8×8 subdomains, with (M2) and without coarse grid correction.]
From the linear algebra point of view, stagnation in the convergence of one-level Schwarz methods corresponds to a few very low eigenvalues in the spectrum of the preconditioned problem. Using the preconditioners $M^{-1}_{ASM}$ or $M^{-1}_{RAS}$, we can remove the influence of very large eigenvalues of the coefficient matrix, which correspond to high frequency modes. It has been proved that, for a SPD (symmetric positive definite) matrix, the largest eigenvalue of the system preconditioned by $M^{-1}_{ASM}$ is bounded by the number of colors needed for the overlapping decomposition such that different colors are used for adjacent subdomains; see [179] or [164] for instance. But the small eigenvalues still exist and hamper the convergence. These small eigenvalues correspond to low frequency modes and represent a certain global information with which we have to deal efficiently.
A classical remedy consists in the introduction of a coarse grid or coarse space correction that couples all subdomains at each iteration of the iterative method. This is closely related to the deflation technique, classical in linear algebra; see Nabben and Vuik's paper [178] and references therein. This is connected as well to augmented or recycled Krylov space methods; see e.g. [73], [149], [163] or [21] and references therein.
Suppose we have identified the modes corresponding to the slow convergence of the iterative method used to solve the linear system
$$Ax = b;$$
in elasticity, for instance, these are typically the rigid body motions. Let us call $Z$ the rectangular matrix whose columns correspond to these slow modes. There are algebraic ways to incorporate this information in order to accelerate the method. We give here the presentation that is classical in domain decomposition; see e.g. [133] or [179]. In the case where $A$ is SPD, the starting point is to consider the minimization problem $\min_\beta \|y + Z\beta - x\|_A^2$, whose solution is
$$\beta = (Z^TAZ)^{-1}Z^T(b - Ay),$$
so that the best correction in the range of $Z$ is
$$Z\beta = Z\,(Z^TAZ)^{-1}Z^T(b - Ay).$$
$$M^{-1}_{ASM,2} := R_0^T(R_0AR_0^T)^{-1}R_0 + \sum_{i=1}^N R_i^T(R_iAR_i^T)^{-1}R_i. \qquad (4.1)$$
Compared to the one-level Schwarz method, where only local subproblems have to be solved in parallel, the two-level method adds the solution of a linear system with the matrix $R_0AR_0^T$, which is global in nature. This global problem is called the coarse problem.
The coarse problem couples all subdomains at each iteration, but its matrix is a small $O(N \times N)$ square matrix, and the extra cost is negligible compared to the gain, provided the number of subdomains is not too large.
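A minimal sketch of the application of the two-level preconditioner (4.1); the restriction matrices R (local) and R0 (coarse) are assumed given, and dense solves stand in for the factorizations used in practice:

import numpy as np

def asm2_apply(A, r, R, R0):
    # M^{-1} r = R0^T (R0 A R0^T)^{-1} R0 r + sum_i Ri^T (Ri A Ri^T)^{-1} Ri r
    z = R0.T @ np.linalg.solve(R0 @ A @ R0.T, R0 @ r)   # coarse correction
    for Ri in R:                                        # local solves (parallel in practice)
        z += Ri.T @ np.linalg.solve(Ri @ A @ Ri.T, Ri @ r)
    return z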
Figure 4.4: Convergence of the error for the one level Schwarz method in
the 1D case.
Figure 4.3: Convergence curves with and without a coarse space correction
for a decomposition into 64 strips
Recall first that we have a discrete partition of unity (see § 1.3) in the following sense: let $\mathcal N_i$ be subsets of indices and $D_i$ be diagonal matrices, $1 \le i \le N$,
$$D_i : \mathbb{R}^{\#\mathcal N_i} \longrightarrow \mathbb{R}^{\#\mathcal N_i}, \qquad (4.2)$$
so that we have
$$\sum_{i=1}^N R_i^T D_i R_i = Id.$$
With these ingredients we are ready to provide the definition of what we will further call the classical or Nicolaides coarse space.
Definition 4.2.1 (Nicolaides coarse space) We define $Z$ as the matrix whose $i$-th column is
$$Z_i := R_i^T D_i R_i\,\mathbf{1}, \quad 1 \le i \le N, \qquad (4.4)$$
where $\mathbf{1} \in \mathbb{R}^{\#\mathcal N}$ is the vector full of ones.
The numerical results of Figure 4.3 and Table 4.1 have been obtained with the two-level Schwarz preconditioner (4.1), where $R_0 = Z^T$, with $Z$ defined by (4.4).
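Assuming the reconstruction of (4.4) above, the matrix $Z$ can be assembled from the restriction and partition-of-unity matrices as follows (a sketch; in practice $Z$ is built subdomain by subdomain):

import numpy as np

def nicolaides_Z(R, D, n):
    # column i of Z is Ri^T Di Ri 1: the partition-of-unity-weighted
    # constant function on subdomain i, extended by zero to the whole domain
    ones = np.ones(n)
    return np.column_stack([Ri.T @ (Di @ (Ri @ ones)) for Ri, Di in zip(R, D)])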
Based on the previous quantities, we can build two versions of the additive Schwarz preconditioner: first the classical one, called AS2, corresponding to formula (4.1), and then the alternative version $Q + P^T M^{-1}_{AS,1} P$. After that, we can use one of these inside a preconditioned conjugate gradient. Finally, the main program script is given by AS2-PCG.edp, whose first part is identical to that of the one-level method.
include "../../FreefemCommon/matvecAS.idp"
include "../../FreefemCommon/coarsespace.idp"
include "PCGCS.idp"
Vh un=0, sol; // initial guess un and final solution
cout << "Schwarz Dirichlet algorithm" << endl;
sol[]=myPCG(un[],tol,maxit); // PCG with initial guess un
plot(sol,cmm="Final solution",wait=1,dim=3,fill=1,value=1);
Vh er=sol-uglob;
cout << "Final error: " << er[].linfty << endl;
Figure 4.6: Convergence history without and with coarse space for various
domain decompositions
In this chapter we first introduce the spectral coarse space, which generalizes the idea of the Nicolaides coarse space. In order to quantify the action of this new preconditioner, we need to estimate the condition number of the preconditioned matrix $M^{-1}_{ASM,2}A$. This requires a theoretical framework allowing a functional analysis of the condition number of the two-level Schwarz method with this spectral coarse space.
the coefficients are across subdomain interfaces (see e.g. [62, 129, 47, 48])
or inside the subdomains and not near their boundaries (cf. [155, 154]).
However, when the discontinuities are along subdomain interfaces, classical
methods do not work anymore. In [142], we proposed the construction
of a coarse subspace for a scalar elliptic positive definite operator, which
leads to a two-level method that is observed to be robust with respect
to heterogeneous coefficients for fairly arbitrary domain decompositions,
e.g. provided by an automatic graph partitioner such as METIS or SCOTCH
[117, 24]. This method was extensively studied from the numerical point
of view in [143]. The main idea behind the construction of the coarse
space is the computation of the low-frequency modes associated with
a generalized eigenvalue problem based on the Dirichlet-to-Neumann
(DtN) map on the boundary of each subdomain. We use the harmonic
extensions of these low-frequency modes to the whole subdomain to
build the coarse space. With this method, even for discontinuities along
(rather than across) the subdomain interfaces (cf. Fig. 5.1), the iteration
counts are robust to arbitrarily large jumps of the coefficients leading
to a very efficient, automatic preconditioning method for scalar hetero-
geneous problems. As usual with domain decomposition methods, it is
also well suited for parallel implementation as it has been illustrated in [115].
We first motivate why the use of spectral information local to the sub-
domains is the key ingredient in obtaining an optimal convergence. A
full mathematical analysis of the two-level method will be given in chapter 5.
We now propose a construction of the coarse space that will be suitable for
parallel implementation and efficient for arbitrary domain decompositions
and highly heterogeneous problems such as the Darcy problem with Dirichlet
boundary conditions:
$$-\mathrm{div}(\kappa\,\nabla u) = f \quad \text{in } \Omega,$$
where $\kappa$ can vary over several orders of magnitude, typically between 1 and $10^6$. More precisely, let us first consider, at the continuous level, the Dirichlet-to-Neumann map $\mathrm{DtN}_i$.
Definition 5.1.1 (Dirichlet-to-Neumann map) For any function defined on the interface, $u_{\Gamma_i} : \Gamma_i \to \mathbb{R}$, we consider the Dirichlet-to-Neumann map
$$\mathrm{DtN}_i(u_{\Gamma_i}) = \kappa\,\frac{\partial v}{\partial n_i}\Big|_{\Gamma_i},$$
where $\Gamma_i := \partial\Omega_i \setminus \partial\Omega$ and $v$ satisfies
$$\begin{cases} -\mathrm{div}(\kappa\,\nabla v) = 0, & \text{in } \Omega_i,\\ v = u_{\Gamma_i}, & \text{on } \Gamma_i, \\ v = 0, & \text{on } \partial\Omega_i \cap \partial\Omega. \end{cases} \qquad (5.2)$$
To construct the coarse grid subspace, we use some low frequency modes of the DtN operator, obtained by solving a local eigenvalue problem of the form
$$\mathrm{DtN}_i(u) = \lambda\,u.$$
We now motivate our choice of this coarse space based on the DtN map. We write the original Schwarz method at the continuous level, where the domain is decomposed in a one-way partitioning. The error $e_i^n := u_i^n - u_{|\Omega_i}$ between the iterate at step $n$ of the algorithm and the solution in subdomain $\Omega_i$ is harmonic:
$$-\mathrm{div}(\kappa\,\nabla e_i^{n+1}) = 0 \quad \text{in } \Omega_i. \qquad (5.5)$$
On the 1D example sketched in Figure 5.2, we see that the rate of conver-
gence of the algorithm is related to the decay of the harmonic functions eni in
the vicinity of i via the subdomain boundary condition. Indeed, a small
value for this BC leads to a smaller error in the entire subdomain thanks
to the maximum principle. That is, a fast decay for this value corresponds
to a large eigenvalue of the DtN map whereas a slow decay corresponds to
small eigenvalues of this map because the DtN operator is related to the
normal derivative at the interface and the overlap is thin. Thus the small
eigenvalues of the DtN map are responsible for the slow convergence of the
algorithm and it is natural to incorporate them in the coarse grid space.
For any domain $D \subset \Omega$ we need the usual norm $\|\cdot\|_{L^2(D)}$ and seminorm $|\cdot|_{H^1(D)}$, as well as the $L^2$ inner product $(v, w)_{L^2(D)}$. To simplify notations we write
$$\|v\|_{0,D}^2 = \int_D v^2, \qquad |v|_{a,D}^2 = \int_D \kappa\,|\nabla v|^2.$$
When $D = \Omega$ we omit the domain from the subscript and write $\|\cdot\|_0$ and $|\cdot|_a$ instead of $\|\cdot\|_{0,\Omega}$ and $|\cdot|_{a,\Omega}$, respectively. Note that the seminorm $|\cdot|_a$ becomes a norm on $H_0^1(\Omega)$.
Finally, we will also need averages and norms defined on $(d-1)$-dimensional manifolds $X \subset \mathbb{R}^d$, namely, for any $v \in L^2(X)$, we define
$$\overline{v}^X = \frac{1}{|X|}\int_X v \qquad \text{and} \qquad \|v\|_{0,X}^2 = \int_X v^2.$$
The standard nodal interpolant onto the P1 finite element space is
$$I_h(v) = \sum_{k\in\mathcal N} v(x_k)\,\phi_k.$$
This interpolant is known to be stable in the sense that there exists a constant $C_{I_h} > 0$ such that, for all continuous, piecewise quadratic (w.r.t. $\mathcal T_h$) functions $v$, we have
$$|I_h(v)|_a \le C_{I_h}\,|v|_a. \qquad (5.8)$$
and $\mathrm{Int}(\cdot)$ denotes the interior of a domain. Extensions by more than one layer can then be defined recursively. Note that in the case of P1 finite elements the degrees of freedom are in fact the values of the unknown function at the vertices of the mesh.
$$\mathrm{supp}(\chi_j) \subset \overline{\Omega_j}, \qquad 0 \le \chi_j \le 1, \qquad \sum_{j=1}^N \chi_j(x) = 1 \quad \forall x\in\overline\Omega, \qquad \|\nabla\chi_j\|_\infty \le C\,\delta_j^{-1}.$$
Proof Let
$$d_j(x) = \begin{cases} \mathrm{dist}(x, \partial\Omega_j), & x \in \Omega_j,\\ 0, & x \notin \Omega_j. \end{cases}$$
Then it is enough to set
$$\chi_j(x) = I_h\!\left(\frac{d_j(x)}{\sum_{\ell=1}^N d_\ell(x)}\right).$$
The first three properties are obvious; the last one can be found in [179], page 57.
$$\Omega_j^\circ := \{x \in \Omega_j : \chi_j(x) < 1\},$$
$$V_h(\Omega_j) := \{v_{|\Omega_j} : v \in V_h\},$$
and the extension by zero
$$E_j : V_{h,0}(\Omega_j) \longrightarrow V_h. \qquad (5.10)$$
This is well defined because $a(\cdot,\cdot)$ is coercive on $V_h$. For the same reason, the matrix $A_j$ in (5.11) is invertible. An important ingredient in the definition of Schwarz algorithms are the following operators: the local projections $\widetilde P_j : V_h \to V_{h,0}(\Omega_j)$, defined by $a_j(\widetilde P_j u, v_j) = a(u, E_j v_j)$ for all $v_j \in V_{h,0}(\Omega_j)$, and
$$P_j := E_j\,\widetilde P_j : V_h \to V_h, \quad j = 1, \dots, N.$$
The additive operator is
$$P_{ad} := \sum_{j=0}^N P_j = \sum_{j=0}^N E_j\,\widetilde P_j.$$
We can give an interpretation of the above from the linear algebra point of view. If we denote by $\mathbf P_j$ and $\widetilde{\mathbf P}_j$ the matrix counterparts of $P_j$ and $\widetilde P_j$ w.r.t. the finite element basis $\{\phi_k\}_{k\in\mathcal N}$, we get
$$\widetilde{\mathbf P}_j = A_j^{-1}R_jA, \qquad \mathbf P_j = R_j^TA_j^{-1}R_jA.$$
Taking into account (5.11), the additive matrix which corresponds to the parallel or block-Jacobi version of the original Schwarz method is then given by
$$P_{ad} = M^{-1}_{ASM,2}\,A = \sum_{j=0}^N \mathbf P_j. \qquad (5.13)$$
$$\lambda_{\max}(P_{ad}) = \sup_{u\in V_h}\frac{a(P_{ad}\,u, u)}{a(u, u)}, \qquad \lambda_{\min}(P_{ad}) = \inf_{u\in V_h}\frac{a(P_{ad}\,u, u)}{a(u, u)}.$$
With $N_c$ the number of colors,
$$\lambda_{\max}(P_{ad}) \le N_c + 1,$$
and, without the coloring argument, simply
$$\lambda_{\max}(P_{ad}) \le N + 1.$$
By using the fact that the $P_j$ are $a$-orthogonal projectors, we get, for all $u \in V_h$,
$$\frac{a(P_j u, u)}{\|u\|_a^2} = \frac{a(P_j u, P_j u)}{\|u\|_a^2} \le 1,$$
which implies
$$\lambda_{\max}(P_{ad}) = \sup_{u\in V_h}\sum_{j=0}^N\frac{a(P_j u, u)}{\|u\|_a^2} \le \sum_{j=0}^N\,\sup_{u\in V_h}\frac{a(P_j u, u)}{\|u\|_a^2} \le N + 1.$$
As for the estimate with a bound on the maximum number of colors, we proceed in the following way. By assumption, there are $N_c$ sets of indices $\Theta_i$, $i = 1, \dots, N_c$, such that all subdomains $\Omega_j$ for $j \in \Theta_i$ have no intersection with one another. Then
$$P_{\Theta_i} := \sum_{j\in\Theta_i} P_j, \quad i = 1, \dots, N_c,$$
are again $a$-orthogonal projectors, and the previous argument applied to $P_0, P_{\Theta_1}, \dots, P_{\Theta_{N_c}}$ yields the bound $N_c + 1$.
$$v = \sum_{j=0}^N E_j\,v_j, \quad \text{with } v_0 \in V_0,\ v_j \in V_{h,0}(\Omega_j) \text{ for } j \ge 1, \qquad (5.14)$$
and
$$\sum_{j=0}^N \|v_j\|^2_{a,\Omega_j} \le C_0^2\,\|v\|_a^2. \qquad (5.15)$$
Then
$$\lambda_{\min}(M^{-1}_{ASM,2}A) \ge C_0^{-2}. \qquad (5.16)$$
$$a(P_{ad}u, u) = \sum_{j=0}^N a(P_j u, u) = \sum_{j=0}^N a(P_j u, P_j u) = \sum_{j=0}^N a(E_j\widetilde P_j u,\,E_j\widetilde P_j u) = \sum_{j=0}^N a_j(\widetilde P_j u, \widetilde P_j u) = \sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}. \qquad (5.17)$$
On the other hand,
$$\|u\|_a^2 = a(u, u) = \sum_{j=0}^N a(u, E_j u_j) = \sum_{j=0}^N a_j(\widetilde P_j u, u_j) \le \sum_{j=0}^N \|\widetilde P_j u\|_{a,\Omega_j}\,\|u_j\|_{a,\Omega_j} \le \Big(\sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}\Big)^{1/2}\Big(\sum_{j=0}^N \|u_j\|^2_{a,\Omega_j}\Big)^{1/2}. \qquad (5.18)$$
From (5.18) we get first that
$$\|u\|_a^4 \le \Big(\sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}\Big)\Big(\sum_{j=0}^N \|u_j\|^2_{a,\Omega_j}\Big),$$
so that
$$\frac{a(P_{ad}u, u)}{\|u\|_a^2} = \frac{\sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}}{\|u\|_a^2} \ge \frac{\|u\|_a^2}{\sum_{j=0}^N \|u_j\|^2_{a,\Omega_j}}.$$
We can clearly see that if the decomposition $u = \sum_{j=0}^N E_j u_j$ is $C_0^2$-stable, that is, if (5.15) is verified, then the smallest eigenvalue of $P_{ad}$ satisfies
$$\lambda_{\min}(P_{ad}) = \inf_{u\in V_h}\frac{a(P_{ad}u, u)}{\|u\|_a^2} \ge C_0^{-2}.$$
From Lemma 5.4.1 and (5.16) we get the final condition number estimate
$$\kappa(M^{-1}_{ASM,2}A) \le C_0^2\,(N_c + 1).$$
The functions $\mathbf 1_{\Omega_j}$ are not in $V_{h,0}(\Omega_j)$, so we will use the partition of unity functions $\chi_j$ to build the global coarse space. Recall that $I_h$ denotes the standard nodal value interpolation operator from $C(\overline\Omega)$ to $V_h(\Omega)$, and define
$$\Phi_j^H := I_h(\chi_j\,\mathbf 1_{\Omega_j}),$$
$$V_0 := \mathrm{span}\{\Phi_j^H : \Omega_j \text{ is a floating subdomain}\}.$$
The use of the standard nodal value interpolation operator $I_h$ is made necessary by the fact that $\chi_j\,\mathbf 1_{\Omega_j}$ is not a P1 finite element function. By construction, each of the functions $\Phi_j^H$ belongs to $V_{h,0}$, so that, as required, $V_0 \subset V_{h,0}$. The dimension of $V_0$ is the number of floating subdomains.
where $a_j$ is the local bilinear form associated with the Laplace operator on the local domain $\Omega_j$,
$$a_j(v, w) = \int_{\Omega_j}\nabla v\cdot\nabla w, \qquad \forall v, w \in V_h(\Omega_j).$$
As before, these functions are not in $V_{h,0}(\Omega_j)$, so we use the partition of unity functions $\chi_j$ to build the global coarse space
$$V_0 := \mathrm{span}\{\Phi^H_{j,\ell} : 1 \le j \le N \text{ and } 1 \le \ell \le m_j\}, \qquad \Phi^H_{j,\ell} := I_h(\chi_j\,v^{(j)}_\ell),$$
where the $v^{(j)}_\ell$ denote the low-frequency eigenfunctions of the local DtN eigenproblem.
Remark 5.5.1 The structure of this coarse space calls for the following remarks.
As in the case of the Nicolaides coarse space, the use of the standard nodal value interpolation operator $I_h$ is justified by the fact that $\chi_j\,v^{(j)}_\ell$ is not a P1 finite element function.
The dimension of $V_0$ is $\sum_{j=1}^N m_j$.
When $m_j = 1$ and the subdomain does not touch the boundary of $\Omega$, the lowest eigenvalue of the DtN map is zero and the corresponding eigenvector is a constant vector. Thus the Nicolaides and the spectral coarse spaces are then identical.
In the following we illustrate some very natural properties of the projection operators defined above. If $\Pi_j = \Pi_j^{Nico}$ and $\Omega_j$ is a floating subdomain then, under some regularity assumptions on the subdomains, we have the following inequality (see [179] for this type of result):
$$\|u - \Pi_j u\|^2_{0,\Omega_j} \le C_{tr}\,H_j\,|u|^2_{a,\Omega_j}.$$
When $\Omega_j$ is not floating, $\Pi_j u = 0$ and the above inequality becomes a classical trace inequality. In the spectral case, we expand $u$ in the eigenbasis,
$$u = \sum_{\ell} a_j(v^{(j)}_\ell, u)\,v^{(j)}_\ell. \qquad (5.26)$$
The restriction of the functions $\{v^{(j)}_\ell\}_{\ell=1}^{n_{\Gamma_j}}$ to the boundary $\Gamma_j$ forms a complete basis of $V_h(\Gamma_j)$. This implies that $v^{(j)}_{n_{\Gamma_j}+\ell} \equiv 0$ on $\Gamma_j$ for all $\ell = 1, \dots, n_j$. Moreover, it follows from the definition of the eigenproblem that the functions $\{v^{(j)}_\ell\}_{\ell=1}^{n_j}$ are also orthogonal in the $(\cdot,\cdot)_{0,\Omega_j}$ inner product. Therefore
$$\|u - \Pi_j u\|^2_{0,\Omega_j} = \Big\|\sum_{\ell=m_j+1}^{n_{\Gamma_j}} a_j(v^{(j)}_\ell, u)\,v^{(j)}_\ell\Big\|^2_{0,\Omega_j} \qquad (5.27)$$
$$= \sum_{\ell=m_j+1}^{n_{\Gamma_j}} \frac{1}{\lambda^{(j)}_\ell}\,a_j(v^{(j)}_\ell, u)^2 \qquad (5.28)$$
$$\le \frac{1}{\lambda^{(j)}_{m_j+1}}\sum_{\ell=1}^{n_{\Gamma_j}} a_j(v^{(j)}_\ell, u)^2 = \frac{1}{\lambda^{(j)}_{m_j+1}}\,|u|^2_{a,\Omega_j},$$
and the result follows from (5.24), (5.25) and (5.27). Note that this estimate does not require any regularity assumption on the subdomain.
$$u_0 := I_h\Big(\sum_{j=1}^N \chi_j\,\Pi_j u\Big) \in V_0 \quad\text{and}\quad u_j := I_h\big(\chi_j(u - \Pi_j u)\big), \qquad (5.29)$$
where $\Pi_j$ is $\Pi^{Nico}_j$ or $\Pi^{spec}_j$. Then $\{u_j\}_{0\le j\le N}$ form a $C_0^2$-stable decomposition of $u$ in the sense of Definition 5.4.1, with
$$C_0^2 = \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1,$$
where $c_j$ depends on the choice of the coarse space and is given by formula (5.23).
Proof Note that the proof is the same for both cases, that is, for $\Pi_j = \Pi^{Nico}_j$ and $\Pi_j = \Pi^{spec}_j$. First,
$$\sum_{j=1}^N u_j = \sum_{j=1}^N I_h\big(\chi_j(u - \Pi_j u)\big) = I_h\Big(\sum_{j=1}^N \chi_j\,u\Big) - I_h\Big(\sum_{j=1}^N \chi_j\,\Pi_j u\Big) = u - u_0.$$
We go now to the second part, that is, to the proof of (5.15). From a simple application of the triangle inequality it follows that
$$\sum_{j=0}^N \|u_j\|_a^2 \le \big(\|u - u_0\|_a + \|u\|_a\big)^2 + \sum_{j=1}^N \|u_j\|_a^2. \qquad (5.30)$$
Since the interpolant $I_h$ is stable with respect to the $a$-norm (see (5.8)), and since each cell is overlapped by at most $k_0$ domains, we have
$$\|u - u_0\|_a^2 = \Big|I_h\Big(\sum_{j=1}^N \chi_j(u - \Pi_j u)\Big)\Big|_a^2 \le C_{I_h}\,\Big|\sum_{j=1}^N \chi_j(u - \Pi_j u)\Big|_a^2 \le C_{I_h}\,k_0\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2.$$
Substituting this into (5.30) and using the definition of $u_j$, as well as the $a$-stability of the interpolant $I_h$, we get
$$\sum_{j=0}^N \|u_j\|_a^2 \le C_{I_h}(k_0 + 1)\,\Big(\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2\Big) + \|u\|_a^2. \qquad (5.31)$$
Next, by the product rule,
$$\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2 \le \sum_{j=1}^N \Big(2\,|u - \Pi_j u|^2_{a,\Omega_j} + 2\,\|\nabla\chi_j\|^2_\infty\,\|u - \Pi_j u\|^2_{0,\Omega_j}\Big) \le \sum_{j=1}^N \Big(4\,|u|^2_{a,\Omega_j} + 4\,|\Pi_j u|^2_{a,\Omega_j} + 2\,C^2\delta_j^{-2}\,\|u - \Pi_j u\|^2_{0,\Omega_j}\Big). \qquad (5.32)$$
Note that in the last part we used the last property of the partition of unity stated in Lemma 5.3.2.
We have
$$\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2 \le \sum_{j=1}^N |u|^2_{a,\Omega_j}\,(8 + 8\,C^2 c_j) \le \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\sum_{j=1}^N |u|^2_{a,\Omega_j} \le \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,|u|_a^2. \qquad (5.33)$$
The last inequality comes from the assumption that each point $x \in \Omega$ is contained in at most $k_0$ subdomains.
From (5.31) and (5.33) we conclude that
$$\sum_{j=0}^N \|u_j\|_a^2 \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,\|u\|_a^2,$$
which ends the proof and yields the following formula for the constant $C_0$:
$$C_0^2 = \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1. \qquad (5.34)$$
From the abstract Schwarz theory we have the condition number estimates in the two cases (see Corollary 5.4.1). For the Nicolaides coarse space,
$$\kappa(M^{-1}_{ASM,2}A) \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N\Big(C_P^2 + C_{tr}\,\frac{H_j}{\delta_j}\Big)\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,(N_c + 1),$$
and for the spectral coarse space,
$$\kappa(M^{-1}_{ASM,2}A) \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N\Big(C_P^2 + \frac{1}{\delta_j\,\lambda^{(j)}_{m_j+1}}\Big)\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,(N_c + 1).$$
Figure 5.3: Assumption on the overlapping region
Remark 5.6.1 Note that, by choosing the number $m_j$ of modes per subdomain in the coarse space such that $\lambda^{(j)}_{m_j+1} \ge H_j^{-1}$, the preconditioned problem verifies
$$\kappa(M^{-1}_{ASM,2}A) \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N\Big(C_P^2 + \frac{H_j}{\delta_j}\Big)\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,(N_c + 1).$$
Hence, we have the same type of estimate as for the Nicolaides coarse space.
An interesting observation is that the bound depends only in an additive
way on the constant CP and on the ratio of subdomain diameter to overlap.
We make two geometric assumptions on the regions $D_{jk}$ and the manifolds $X_{jk}$ covering the overlapping zone (cf. Figure 5.3), and we assume as well that the triangulation $\mathcal T_h$ resolves each of the regions $D_{jk}$ and each of the manifolds $X_{jk}$.
Lemma 5.7.1 There exists a uniform constant $C_P > 0$ such that the following Poincare-type inequality holds for all $j = 1, \dots, N$ and $k = 1, \dots, K_j$:
$$\|u\|_{0,D_{jk}} \le C_P\,\delta_j\,|u|_{a,D_{jk}} + \frac{|D_{jk}|^{1/2}}{|X_{jk}|}\,\Big|\int_{X_{jk}} u\Big|.$$
Proof It follows from Lemma 5.7.1, as well as the triangle and the Cauchy-Schwarz inequalities, that
$$\|u\|^2_{0,D_{jk}} \le C_P^2\,\delta_j^2\,|u|^2_{a,D_{jk}} + \frac{|D_{jk}|}{|X_{jk}|^2}\Big(\int_{X_{jk}} u\Big)^2 \le C_P^2\,\delta_j^2\,|u|^2_{a,D_{jk}} + \frac{|D_{jk}|}{|X_{jk}|}\int_{X_{jk}} u^2 \le C_P^2\,\delta_j^2\,|u|^2_{a,D_{jk}} + \delta_j\,\|u\|^2_{0,X_{jk}}.$$
$$-\mathrm{div}(\kappa\,\nabla u) = f \quad \text{in } \Omega.$$
All these approaches have their advantages and disadvantages, which depend on many factors, in particular the type of coefficient variation and the size of the overlap. When the coefficient variation is on a very small scale, many of the above approaches lead to rather large (and therefore costly) coarse spaces, and it is still an open theoretical question how large the coarse space has to become in each case to achieve robustness for an arbitrary coefficient variation, and how to mitigate this.
Chapter 6
Neumann-Neumann and
FETI Algorithms
The last decade has shown that Neumann-Neumann [12, 128, 121] type and FETI [78] algorithms, as well as their variants such as the BDDC [46] and FETI-DP [75] algorithms, are very efficient domain decomposition methods. Most of the early theoretical and numerical work was carried out for symmetric positive definite second order problems; see for example [38, 131, 78, 132]. The methods were then extended to other problems, like the advection-diffusion equations [92, 1], plate and shell problems [176], or the Stokes equations [150, 177].
These methods require pseudo inverses for local Neumann solves which can
be ill-posed either in the formulation of the domain decomposition problem
or in the domain decomposition preconditioner. FETI methods are based on
the introduction of Lagrange multipliers on the interfaces to ensure a weak
continuity of the solution. We present here an alternative formulation of the
Neumann-Neumann algorithm with the purpose of simplifying the numerical
implementation. This is made at the expense of minor modifications to the
original algorithm while keeping its main features. We start with a historical
and heuristic presentation of these methods.
$$A = \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ A_{\Gamma 1}A_{11}^{-1} & A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 & A_{1\Gamma}\\ 0 & A_{22} & A_{2\Gamma}\\ 0 & 0 & S \end{pmatrix}, \qquad (6.2)$$
where
$$S = A_{\Gamma\Gamma} - A_{\Gamma 1}A_{11}^{-1}A_{1\Gamma} - A_{\Gamma 2}A_{22}^{-1}A_{2\Gamma}, \qquad (6.3)$$
so that
$$A^{-1} = \begin{pmatrix} I & 0 & -A_{11}^{-1}A_{1\Gamma}\\ 0 & I & -A_{22}^{-1}A_{2\Gamma}\\ 0 & 0 & I \end{pmatrix} \begin{pmatrix} A_{11}^{-1} & 0 & 0\\ 0 & A_{22}^{-1} & 0\\ 0 & 0 & S^{-1} \end{pmatrix} \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ -A_{\Gamma 1}A_{11}^{-1} & -A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix}. \qquad (6.4)$$
Note that in practice the three matrices $A_{11}^{-1}$, $A_{22}^{-1}$ and $S^{-1}$ are never evaluated explicitly. From the above formula, it appears that applying $A^{-1}$ to the right-hand side $F$ is equivalent to solving linear systems with the three above matrices, which can be done by factorizing them. The parallelism comes from the fact that the matrices $A_{11}$ and $A_{22}$ can be factorized concurrently. We say that we advance two fronts in parallel. Once this is done, the matrix $S$ can be computed and then factorized.
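The block elimination just described can be summarized in a few lines; the following sketch mirrors (6.2)-(6.4) for two subdomains (dense solves stand in for the factorizations, and in practice the two interior solves run concurrently):

import numpy as np

def schur_solve(A11, A22, A1G, A2G, AG1, AG2, AGG, b1, b2, bG):
    # form the Schur complement S and the condensed right-hand side
    S = AGG - AG1 @ np.linalg.solve(A11, A1G) - AG2 @ np.linalg.solve(A22, A2G)
    bS = bG - AG1 @ np.linalg.solve(A11, b1) - AG2 @ np.linalg.solve(A22, b2)
    xG = np.linalg.solve(S, bS)                  # interface unknowns
    x1 = np.linalg.solve(A11, b1 - A1G @ xG)     # interior unknowns, subdomain 1
    x2 = np.linalg.solve(A22, b2 - A2G @ xG)     # interior unknowns, subdomain 2
    return x1, x2, xG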
By considering more than two fronts and recursive variants, it is possible to achieve numerical efficiency for a dozen or two of cores and for problems with one or two million unknowns in two dimensions, or hundreds of thousands of degrees of freedom in three dimensions. Although these figures improve over time, it is generally accepted that purely direct solvers cannot scale well on large parallel platforms. The bottleneck comes from the fact that $S$ is a full matrix, which is costly to compute and factorize. This is unfortunate, since direct solvers have the advantage over iterative solvers of being very predictable, reliable and easy to integrate via third-party libraries. There is thus a great amount of effort in developing hybrid direct/iterative solvers that try to take advantage of the two worlds.
$$\left(A_{\Gamma\Gamma} - A_{\Gamma 1}A_{11}^{-1}A_{1\Gamma} - A_{\Gamma 2}A_{22}^{-1}A_{2\Gamma}\right)x_\Gamma = \tilde b_\Gamma, \qquad (6.5)$$
$$x_\Gamma = S^{-1}\tilde b_\Gamma. \qquad (6.6)$$
$$\begin{cases} -\Delta u = f, & \text{in } \Omega,\\ u = 0, & \text{on } \partial\Omega. \end{cases}$$
In order to simplify the presentation, we forget about the boundary condition on $\partial\Omega$. Suppose that the domain $\Omega$ is decomposed into two non-overlapping subdomains $\Omega_1$ and $\Omega_2$, and let $\Gamma$ be the interface between the subdomains.
$$u_i^{n+1} = u_i^{n+1/2} + e_i^{n+1}, \quad i = 1, 2. \qquad (6.10)$$
The rationale for the first step (6.8) is to satisfy the Poisson equation in the subdomains while ensuring the continuity of the solution on the interface $\Gamma$. After this step, the solution is continuous on the interface but the fluxes may not match. The jump in the fluxes is corrected after (6.10). On $\Gamma$ we have
$$\frac{\partial u_1^{n+1}}{\partial n_1} + \frac{\partial u_2^{n+1}}{\partial n_2} = \frac{\partial}{\partial n_1}\big(u_1^{n+1/2} + e_1^{n+1}\big) + \frac{\partial}{\partial n_2}\big(u_2^{n+1/2} + e_2^{n+1}\big)$$
$$= \frac{\partial u_1^{n+1/2}}{\partial n_1} - \frac{1}{2}\Big(\frac{\partial u_1^{n+1/2}}{\partial n_1} + \frac{\partial u_2^{n+1/2}}{\partial n_2}\Big) + \frac{\partial u_2^{n+1/2}}{\partial n_2} - \frac{1}{2}\Big(\frac{\partial u_1^{n+1/2}}{\partial n_1} + \frac{\partial u_2^{n+1/2}}{\partial n_2}\Big) = 0.$$
The half step can thus be written as
$$\begin{cases} -\Delta u_i^{n+1/2} = f, & \text{in } \Omega_i,\\[4pt] \dfrac{\partial u_i^{n+1/2}}{\partial n_i} = \dfrac{\partial u_i^{n}}{\partial n_i} - \dfrac{1}{2}\Big(\dfrac{\partial u_1^{n}}{\partial n_1} + \dfrac{\partial u_2^{n}}{\partial n_2}\Big), & \text{on } \Gamma. \end{cases} \qquad (6.11)$$
$$S_i(u_\Gamma, f) := \frac{\partial u_i}{\partial n_i}, \quad i = 1, 2, \qquad (6.14)$$
where the $u_i$ solve the local problems
$$\begin{cases} -\Delta u_i = f, & \text{in } \Omega_i,\\ u_i = u_\Gamma, & \text{on } \Gamma. \end{cases} \qquad (6.15)$$
Let also $S$ be defined as the operator
$$S(u_\Gamma, f) := S_1(u_\Gamma, f) + S_2(u_\Gamma, f) = \frac{\partial u_1}{\partial n_1} + \frac{\partial u_2}{\partial n_2}, \qquad (6.16)$$
which quantifies the jump of the normal derivative at the interface. Define also the operator $T$ by
$$T(g_\Gamma) := \frac{1}{2}\Big(\frac{1}{2}\,S_1^{-1}(g_\Gamma, 0) + \frac{1}{2}\,S_2^{-1}(g_\Gamma, 0)\Big) = \frac{1}{4}\,(e_1 + e_2), \qquad (6.17)$$
where the $e_i$ solve
$$\begin{cases} -\Delta e_i = 0, & \text{in } \Omega_i,\\[2pt] \dfrac{\partial e_i}{\partial n_i} = g_\Gamma, & \text{on } \Gamma. \end{cases} \qquad (6.18)$$
With these notations, steps (6.9)-(6.10) can be rewritten in terms of the interface unknowns:
$$u_\Gamma^{n+1} = u_\Gamma^{n} - (T \circ S)(u_\Gamma^n, f). \qquad (6.19)$$
This defines what we call the iterative interface version of the Neumann-Neumann algorithm.
It can be checked that both operators $S(\cdot, 0)$ and $T$ are symmetric positive definite, and thus the preconditioned conjugate gradient algorithm (PCG) can be used. Moreover, the action of both operators $S(\cdot, 0)$ and $T$ can be computed mostly in parallel. This very popular preconditioner for the operator $S(\cdot, 0)$ was proposed in [12].
Remark 6.2.1 There are at least three points of view that indicate the relevance of this simply derived preconditioner.
$$\lambda^{n+1} = \lambda^{n} - (S_{feti} \circ T_{feti})(\lambda^n, f). \qquad (6.24)$$
This defines what we call the iterative interface version of the FETI algorithm.
where $H^1(\Omega)$ is the Sobolev space of functions that are square integrable together with their first derivatives. In order to introduce domain decomposition methods, we make use of a functional analysis result which proves that the space $H^1(\Omega)$ is isomorphic to the domain decomposed space (local $H^1$ functions with Dirichlet traces continuous at the interface)
$$\mathcal H^1(\Omega_1, \Omega_2) := \{(u_1, u_2) \in H^1(\Omega_1)\times H^1(\Omega_2) : u_1 = u_2 \text{ on } \Gamma\};$$
see e.g. [13] or [28]. This allows the splitting of the functional of the optimization problem (6.26) into local contributions, constrained by the continuity condition at the interface. We have thus the following
Lemma 6.2.3 (Dual optimization problem) Minimization problem (6.26) is equivalent to
$$\min_{(u_1,u_2)\in\mathcal H^1(\Omega_1,\Omega_2)} J(u_1, u_2) = \min_{(u_1,u_2)\in\mathcal H^1(\Omega_1,\Omega_2)} J_1(u_1) + J_2(u_2)$$
$$= \min_{\substack{u_1\in H^1(\Omega_1),\,u_2\in H^1(\Omega_2)\\ u_1 = u_2 \text{ on } \Gamma}} \frac12\int_{\Omega_1}|\nabla u_1|^2 + \frac12\int_{\Omega_2}|\nabla u_2|^2 - \int_{\Omega_1} f\,u_1 - \int_{\Omega_2} f\,u_2. \qquad (6.27)$$
$$T_{feti}(\lambda, f) = 0$$
$$S(U_\Gamma, \mathbf F) := \left(A_{\Gamma\Gamma} - A_{\Gamma 1}A_{11}^{-1}A_{1\Gamma} - A_{\Gamma 2}A_{22}^{-1}A_{2\Gamma}\right)U_\Gamma - \left(F_\Gamma - A_{\Gamma 1}A_{11}^{-1}F_1 - A_{\Gamma 2}A_{22}^{-1}F_2\right) = 0. \qquad (6.31)$$
freedom on the interface associated with two basis functions $\phi_k$ and $\phi_l$. The corresponding entry $a^{kl}_{\Gamma\Gamma}$ can be decomposed into a sum $a^{kl}_{\Gamma\Gamma} = a^{1,kl}_{\Gamma\Gamma} + a^{2,kl}_{\Gamma\Gamma}$, where
$$a^{i,kl}_{\Gamma\Gamma} = \int_{\Omega_i}\nabla\phi_k\cdot\nabla\phi_l, \quad i = 1, 2.$$
Similarly, the interface component of the right-hand side is split using
$$f^{(i),k}_{\Gamma} = \int_{\Omega_i} f\,\phi_k.$$
This leads to the decomposition of $A_{\Gamma\Gamma}$ into
$$A_{\Gamma\Gamma} = A^{(1)}_{\Gamma\Gamma} + A^{(2)}_{\Gamma\Gamma} \qquad (6.32)$$
and of $F_\Gamma$ into
$$F_\Gamma = F^{(1)}_\Gamma + F^{(2)}_\Gamma.$$
Accordingly,
$$S(U_\Gamma, \mathbf F) = S_1(U_\Gamma, \mathbf F) + S_2(U_\Gamma, \mathbf F) \quad\text{and}\quad S = S_1 + S_2.$$
$$\begin{pmatrix} A_{ii} & A_{i\Gamma}\\ A_{\Gamma i} & A^{(i)}_{\Gamma\Gamma} \end{pmatrix}\begin{pmatrix} v_i\\ v_{i,\Gamma} \end{pmatrix} = \begin{pmatrix} 0\\ g_\Gamma \end{pmatrix}. \qquad (6.35)$$
Parallelism is natural, since all these steps can mostly be computed concurrently on two processors. Matrix-vector products with the operators $S$ and $T$ are both defined as sums of local computations and are therefore essentially parallel, with a very high ratio of local computations over data transfer between processors. As for the step defined by equation (6.36), it is purely parallel.
As in the continuous case, it is possible to invert the roles of the Neumann and Dirichlet problems and thus obtain the FETI algorithm.
Definition 6.3.2 (FETI interface operator) Let $T_{feti}$ denote the discrete counterpart of the FETI interface operator introduced at the continuous level.
It can be checked that $T_{feti}(\cdot, 0) = 4\,T$. We can infer from this that, since $T$ is a good preconditioner for $S$, $\frac14 S$ is a good preconditioner for $T_{feti}$. This leads to the following definition of the discrete FETI substructured formulation.
Recall the exact block factorization
$$A^{-1} = \begin{pmatrix} I & 0 & -A_{11}^{-1}A_{1\Gamma}\\ 0 & I & -A_{22}^{-1}A_{2\Gamma}\\ 0 & 0 & I \end{pmatrix}\begin{pmatrix} A_{11}^{-1} & 0 & 0\\ 0 & A_{22}^{-1} & 0\\ 0 & 0 & S^{-1} \end{pmatrix}\begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ -A_{\Gamma 1}A_{11}^{-1} & -A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix},$$
and define the preconditioner $M^{-1}$ by replacing the exact interface solve $S^{-1}$ with $T$:
$$M^{-1} = \begin{pmatrix} I & 0 & -A_{11}^{-1}A_{1\Gamma}\\ 0 & I & -A_{22}^{-1}A_{2\Gamma}\\ 0 & 0 & I \end{pmatrix}\begin{pmatrix} A_{11}^{-1} & 0 & 0\\ 0 & A_{22}^{-1} & 0\\ 0 & 0 & T \end{pmatrix}\begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ -A_{\Gamma 1}A_{11}^{-1} & -A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix}, \qquad (6.38)$$
where $T$ is given by (6.34). A direct computation shows that the preconditioned system $M^{-1}A$ has the following form:
$$M^{-1}A = \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ 0 & 0 & T\,S \end{pmatrix}.$$
In the case of exact solves in the subdomains, that is, for the computation of the exact factorizations of the $A_{ii}$, the application of this preconditioner is equivalent to that of the Schur complement approach, except that it is performed at the global level.
The bad news is that, even if we have spectral equivalence of the preconditioner with the diagonal blocks $A_{ii}^{-1}$, $M^{-1}$ is not necessarily spectrally equivalent to the matrix $A$. In contrast, with overlapping Schwarz methods, when the local subdomain solvers are replaced by spectrally equivalent solvers, the convergence behavior of the ASM is asymptotically equivalent to that of the ASM with exact solvers; see [170].
$$\mu_k := \#\{j : 1 \le j \le N \text{ and } k \in \mathcal N_j\},$$
and the set of interface unknowns, also called the skeleton, defined as
$$\mathcal N_\Gamma := \{k \in \mathcal N : \mu_k > 1\}.$$
Let $R_i$ and $R_\Gamma$ be the boolean restriction matrices from the global set $\mathcal N$ to the subsets $\mathring{\mathcal N}_i$ and $\mathcal N_\Gamma$. For $U \in \mathbb{R}^{\#\mathcal N}$, let $U_i = R_i U = (U_k)_{k\in\mathring{\mathcal N}_i}$ be the set of interior degrees of freedom of subdomain $i$, and $U_\Gamma = R_\Gamma U = (U_k)_{k\in\mathcal N_\Gamma}$ denote the set of interface degrees of freedom. Note that the sets $(\mathring{\mathcal N}_i)_{1\le i\le N}$ and $\mathcal N_\Gamma$ form a partition of $\mathcal N$:
$$\mathcal N = \Big(\bigcup_{i=1}^N \mathring{\mathcal N}_i\Big) \cup \mathcal N_\Gamma.$$
If the skeleton unknowns are numbered last, the corresponding block form of the global problem is thus
$$\begin{pmatrix}
A_{11} & 0 & \cdots & 0 & A_{1\Gamma}\\
0 & A_{22} & & & A_{2\Gamma}\\
\vdots & & \ddots & & \vdots\\
0 & & & A_{NN} & A_{N\Gamma}\\
A_{\Gamma 1} & A_{\Gamma 2} & \cdots & A_{\Gamma N} & A_{\Gamma\Gamma}
\end{pmatrix}
\begin{pmatrix} U_1\\ U_2\\ \vdots\\ U_N\\ U_\Gamma \end{pmatrix}
=
\begin{pmatrix} F_1\\ F_2\\ \vdots\\ F_N\\ F_\Gamma \end{pmatrix}. \qquad (6.39)$$
As in the two-subdomain case for matrix (6.1), it has an arrow shape which allows a multifrontal direct factorization.
In order to define a Neumann-Neumann preconditioner, we consider the substructured problem
$$S\,U_\Gamma = F_\Gamma - \sum_{i=1}^N A_{\Gamma i}A_{ii}^{-1}F_i, \qquad\text{where}\qquad S = A_{\Gamma\Gamma} - \sum_{i=1}^N A_{\Gamma i}A_{ii}^{-1}A_{i\Gamma}. \qquad (6.40)$$
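Assembling (6.40) from the local blocks can be sketched as follows (dense NumPy placeholders for the sparse factorizations used in practice):

import numpy as np

def schur_system(Aii, AiG, AGi, AGG, Fi, FG):
    # build the Schur complement S and the condensed right-hand side of (6.40)
    # from lists of local blocks, one per subdomain
    S, g = AGG.copy(), FG.copy()
    for Ai, AiGam, AGami, fi in zip(Aii, AiG, AGi, Fi):
        S -= AGami @ np.linalg.solve(Ai, AiGam)   # S -= A_{Gamma i} A_{ii}^{-1} A_{i Gamma}
        g -= AGami @ np.linalg.solve(Ai, fi)      # g -= A_{Gamma i} A_{ii}^{-1} F_i
    return S, g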
To this end, we first decompose $A_{\Gamma\Gamma}$ into local contributions, as was done in the two-subdomain case in eq. (6.32). More precisely, for a Poisson problem, let $k, l$ be the indices of two degrees of freedom on the interface, associated with two basis functions $\phi_k$ and $\phi_l$. The corresponding entry $a^{kl}_{\Gamma\Gamma}$ can be decomposed into the sum $a^{kl}_{\Gamma\Gamma} = \sum_{i=1}^N a^{i,kl}_{\Gamma\Gamma}$, where
$$a^{i,kl}_{\Gamma\Gamma} = \int_{\Omega_i}\nabla\phi_k\cdot\nabla\phi_l, \quad i = 1, \dots, N.$$
Remark 6.4.1 Note that, contrary to the two-subdomain case, the matrices $S_i$ have many zero columns and rows and are thus not invertible. Indeed, the original matrix $A$ arising from a finite element discretization of a partial differential operator, all rows and columns related to indices that do not belong to the set $\mathcal N_i$ are zero. It is thus not possible to define the preconditioner directly as a weighted sum of the inverses of the local Schur complements $S_i$, as in the two-subdomain case (6.34). One works instead with the local restrictions
$$\tilde S_i := R_{\Gamma_i}\,S_i\,R_{\Gamma_i}^T, \qquad S_i = R_{\Gamma_i}^T\,\tilde S_i\,R_{\Gamma_i},$$
where $R_{\Gamma_i}$ denotes the restriction from the skeleton $\mathcal N_\Gamma$ to its local part. Note that it is not necessary to compute the matrix $\tilde S_i^{-1}$: as seen in formula (6.35), it is sufficient to solve a Neumann problem in the subdomains.
Figure 6.5: Skeleton and BNN interface unknowns for four subdomains
where the unknowns related to the FETI operator $T_{feti}(\cdot, 0)$ are the links between the duplicated interface unknowns. They are Lagrange multipliers of the equality constraints on the duplicated interface unknowns. When a d.o.f. belongs to two and only two subdomains, there is one associated Lagrange multiplier. The duplication of the cross point of Figure 6.6 yields four unknowns, denoted $u^C_1$, $u^C_2$, $u^C_3$ and $u^C_4$; the subscript indicates the subdomain it comes from. In order to minimize the number of Lagrange multipliers, we choose to impose non-redundant equality constraints such as $u^C_1 = u^C_2$, $u^C_2 = u^C_3$ and $u^C_3 = u^C_4$. They yield three Lagrange multipliers.
$$Id = \sum_{i=1}^N R_i^T\,D_i\,R_i. \qquad (6.44)$$
Figure 6.6: FETI duplicated interfaces and non redundant Lagrange multi-
pliers for four subdomains
$$A_i^D := \begin{pmatrix} A_{ii} & A_{i\Gamma_i}\\ A_{\Gamma_i i} & A^{(i)}_{\Gamma_i\Gamma_i} + tgv\,Id \end{pmatrix}. \qquad (6.45)$$
For the sake of simplicity, we suppose that $tgv$ is sufficiently large, so that
$$(A_i^D)^{-1}\begin{pmatrix} F_i\\ F_{\Gamma_i} + tgv\,U_{\Gamma_i} \end{pmatrix} = \begin{pmatrix} A_{ii}^{-1}(F_i - A_{i\Gamma_i}U_{\Gamma_i})\\ U_{\Gamma_i} \end{pmatrix}. \qquad (6.46)$$
Let $\varepsilon$ be a small positive parameter and $M_i = (m_{i,kl})_{k,l\in\mathcal N_i}$ denote the local mass matrix of the subdomain $\Omega_i$. More precisely, for $k, l \in \mathcal N_i$, we define
$$m_{i,kl} := \int_{\Omega_i}\phi_k\,\phi_l,$$
$$A_i^N := \varepsilon\,M_i + \begin{pmatrix} A_{ii} & A_{i\Gamma_i}\\ A_{\Gamma_i i} & A^{(i)}_{\Gamma_i\Gamma_i} \end{pmatrix}. \qquad (6.47)$$
Using the algebraic partition of unity, we now define the global counterpart of the interface operator $S$ (see eq. (6.31)).
Definition 6.5.1 Let $U, \mathbf F \in \mathbb{R}^{\#\mathcal N}$. We define $\widetilde S(U, \mathbf F)$ by the following formula:
$$\widetilde S(U, \mathbf F) := F - A\,\sum_{i=1}^N R_i^T\,D_i\,(A_i^D)^{-1}\begin{pmatrix} F_i\\ F_{\Gamma_i} + tgv\,U_{\Gamma_i} \end{pmatrix}. \qquad (6.49)$$
For all $U, \mathbf F \in \mathbb{R}^{\#\mathcal N}$ we then have
$$\widetilde S(U, \mathbf F) = R_\Gamma^T\,S(R_\Gamma U, \mathbf F),$$
where we have used the partition of unity identity (6.44), the fact that the diagonal entries of $D_i$ are one for the interior d.o.f.s of subdomain $i$ (indices in $\mathring{\mathcal N}_i$), and the equality
$$A_{i\Gamma_i}\,R_{\Gamma_i}U = A_{i\Gamma}\,R_\Gamma U.$$
The substructured problem then reads: find $U$ such that
$$\widetilde S(\,\cdot\,, 0)(U) = -\widetilde S(0, \mathbf F). \qquad (6.50)$$
In other words, only the interface values of a solution to (6.50) are unique. The correct interior values can be obtained by a parallel post-processing consisting in setting the interior values to
$$A_{ii}^{-1}(F_i - A_{i\Gamma_i}U_{\Gamma_i}), \quad 1 \le i \le N.$$
$$\widetilde T(r) := \Big(\sum_{i=1}^N R_i^T\,D_i\,(A_i^N)^{-1}\,D_i\,R_i\Big)\,r \quad \text{for } r \in \mathbb{R}^{\#\mathcal N}. \qquad (6.51)$$
Result 6.5.2 We have the following link between $\widetilde T$ defined by eq. (6.51) and $T$ defined by eq. (6.43): for all $r_\Gamma \in \mathbb{R}^{\#\mathcal N_\Gamma}$,
$$R_\Gamma\big(\widetilde T(R_\Gamma^T\,r_\Gamma)\big) = T(r_\Gamma).$$
Indeed, a direct computation shows that
$$(A_i^N)^{-1}\begin{pmatrix} 0\\ D_{\Gamma_i}\,r_{\Gamma_i} \end{pmatrix} = \begin{pmatrix} -A_{ii}^{-1}A_{i\Gamma_i}\,\tilde S_i^{-1}D_{\Gamma_i}\,r_{\Gamma_i}\\[2pt] \tilde S_i^{-1}D_{\Gamma_i}\,r_{\Gamma_i} \end{pmatrix}.$$
Multiplying by $D_i$, extending by $R_i^T$ and summing over the subdomains, and using the fact that the global partition of unity (6.44) induces an interface partition of unity, we obtain
$$\widetilde T(R_\Gamma^T\,r_\Gamma) = \begin{pmatrix} -A_{11}^{-1}A_{1\Gamma_1}\,\tilde S_1^{-1}D_{\Gamma_1}\,r_{\Gamma_1}\\ \vdots\\ -A_{NN}^{-1}A_{N\Gamma_N}\,\tilde S_N^{-1}D_{\Gamma_N}\,r_{\Gamma_N}\\ T(r_\Gamma) \end{pmatrix},$$
whose interface component is precisely $T(r_\Gamma)$.
To summarize,
We solve the (ill-posed) linear system (6.50) by a conjugate gradient
method preconditioned by operator T (6.51).
is given by
and that of operator T
180 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
2
10 2
10
3
10
3
10
4
10
Residual
Residual
4
10
5
10
5
10
6
10
6
7 10
10
7
8
10 10
9 8
10 10
0 20 40 60 80 100 120 140 0 1 2 3 4 5 6 7
Iterations Iterations
Vh s, b, un, gd;
37 include ../../FreefemCommon/matvecDtNtd.idp
include ../../FreefemCommon/coarsespace.idp
// compute rhs of the interface problem: b[]
un = 0;
41 gd[]=g;
un[]+= bord[].gd[];
s[] = extendDir(un[],0);
b[]= Aglobals[];
45 b[]= rhsglobal[];
b[] = 1;
include PCGBNN.idp
Vh lam = 0,sol;
49 lam[] = lam[].intern[]; // insures that the first iterate verifies
lam[]+= bord[].gd[]; // the global BC, from iter=1 on this is true
sol[] = myPCG(lam[],tol,maxit);
sol[] = extendDir(sol[],0);
53 Vh err = soluglob;
plot(sol,cmm= Final Solution, wait=1,dim=3,fill=1,value=1);
cout << Final error: << err[].linfty << endl;
In Figures 6.7 and 6.8, we report results for uniform and Metis decomposi-
tions into 44, 88 and 1010 subdomains. We use the Neumann-Neumann
method as a preconditioner in a Krylov method. In the first set of tests the
method is used without a coarse space and we see that the iteration num-
ber depends linearly on the total number of domains and not only on the
number of domains in one direction as in the case of the Schwarz methods.
By adding a coarse space, the behavior becomes insensitive to the number
of subdomains as expected. Note that the convergence is better in the case
of uniform decompositions than in the case of decompositions using Metis.
182 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
1
10 2
10
2
10
3
10
3
10
Residual
Residual
4
10
4
10
5
10
5
10
6
6 10
10
7
7
10 10
8 8
10 10
0 50 100 150 200 250 300 0 2 4 6 8 10 12 14
Iterations Iterations
In order to achieve this goal, we have used in [29, 30] algebraic methods
developed in constructive algebra, D-modules (differential modules) and
symbolic computation such as the so-called Smith or Jacobson normal forms
and Grobner basis techniques for transforming a linear system of PDEs into
a set of independent scalar PDEs. These algebraic and symbolic methods
provide important intrinsic information (e.g., invariants) about the linear
system of PDEs to solve. For instance we recover that the two-dimensional
Stokes system is in some sense equivalent to a biharmonic operator (6.56).
6.6. NON-STANDARD NEUMANN-NEUMANN TYPE METHODS 183
Then, there exist two matrices E GLp (R) and F GLp (R) such that
A = E S F,
Ad w = g, (6.53)
S ws = E 1 g. (6.55)
Since E GLp (Rx ) and F GLp (Rx ), the entries of their inverses are still
polynomial in x . Thus, applying E 1 to the right-hand side g of Ad w = g
amounts to taking k-linear combinations of derivatives of g with respect to
x. If Rd is split into two subdomains R Rd1 and R+ Rd1 , then the
application of E 1 and F 1 to a vector can be done for each subdomain
independently. No communication between the subdomains is necessary.
In conclusion, it is enough to find a domain decomposition algorithm for
the uncoupled system (6.55) and then transform it back to the original one
(6.53) by means of the invertible matrix F over Rx . This technique can be
applied to any linear system of PDEs once it is rewritten in a polynomial
form. The uncoupled system acts on the new dependent variables ws , which
we shall further call Smith variables since they are issued from the Smith
normal form.
1 0
SA2 = ( ). (6.56)
0 2
The particular form of SA2 shows that, over Rx , the system of PDEs for the
linear elasticity in R2 is algebraically equivalent to a bi-harmonic equation.
(x + y ) + b1 x + b2 y + c
2 2
0 x
O2 =
0 (x2 + y2 ) + b1 x + b2 y + c y
x y 0
1 0 0
SO2
= 0 1 0 , L2 = c + b . (6.57)
0 0 L2
From the form of SO2 we can deduce that the two-dimensional Oseen equa-
tions can be mainly characterized by the scalar fourth order PD operator
L2 . This is not surprising since the stream function formulation of the
Oseen equations for d = 2 gives the same PDE for the stream function.
Remark 6.6.2 The above applications of Smith normal forms suggest that
one should design an optimal domain decomposition method for the bi-
harmonic operator 2 (resp., L2 ) in the case of linear elasticity (resp., the
Oseen/Stokes equations) for the two-dimensional problems, and then trans-
form it back to the original system.
u = f in R2 ,
(6.58)
u(x) 0 for x .
Since the bi-harmonic operator seems to play a key role in the design of a
new algorithm for both Stokes and elasticity problem in two dimensions, we
need to build an optimal algorithm for it. We consider the following problem:
2 i,n
= f, in i ,
i,n 1,n 2,n
i,n
1
= ( + ), on ,
= n ,
on , ni
2 n1 n2
i,n
= Dn , on ,
1 1,n 2,n
i,n
= ( + ) , on ,
ni
2 n1 n2
(6.61)
and then n+1
= n
+ 1
2
( 1,n
+ 2,n
) , D n+1
= Dn
+ 2
(
1 1,n
+ 2,n ).
Now, in the case of the two dimensional linear elasticity, represents the sec-
ond component of the vector of Smith variables, that is, = (ws )2 = (F u)2 ,
where u = (u, v) is the displacement field. Hence, we need to replace
with (F u)2 into the algorithm for the biLaplacian, and then simplify it
using algebraically admissible operations. Thus, one can obtain an optimal
algorithm for the Stokes equations or linear elasticity depending on the
form of F . From here comes the necessity of choosing in a proper way the
matrix F (which is not unique), used to define the Smith normal form,
in order to obtain a good algorithm for the systems of PDEs from the
optimal one applied to the bi-harmonic operator. In [57] and [59], the
computation of the Smith normal forms for the Euler equations and the
Stokes equations was done by hand or using the Maple command Smith.
Surprisingly, the corresponding matrices F have provided good algorithms
for the Euler equations and the Stokes equations even if the approach was
entirely heuristic.
The efficiency of our algorithms heavily relies on the simplicity of the Smith
188 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
E2 (ui,n ) = f, in i ,
1
1,n
ui
i,n
= (u1,n n1 + un2 ) ,
2,n
on ,
ui = u , on ,
n 2
ni ni (u ) =
i,n
n , on ,
ni i (ui,n ) = (n1 1 (u1,n ) + n2 2 (u2,n )) ,
2
on ,
(6.63)
1,n 2,n
and un+1
= un + 1
2
(u 1 + u 2 ) , n+1
= n
+ 1
2
( n n
1 1 (u 1,n
) + n n
2 2 (u 2,n
)).
All algorithms and interface conditions are derived for problems posed on
the whole space, since for the time being, this is the only way to treat from
the algebraic point of view these problems. The effect of the boundary
condition on bounded domains cannot be quantified with the same tools.
All the algorithms are designed in the PDE level and it is very important
to choose the right discrete framework in order to preserve the optimal
properties. For example, in the case of linear elasticity a good candidate
would be the TDNNS finite elements that can be found in [152] and the
algorithms obtained by these algebraic techniques have been used to design
a FETI-TDNNS method [151].
190 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
Chapter 7
191
192 CHAPTER 7. GENEO COARSE SPACE
In this chapter, the GenEO method, as well as most classical two-level meth-
ods are presented in a different light, under a common framework. Moreover,
their convergence can be proven in an abstract setting, provided that the
assumptions of the Fictitious Space Lemma are satisfied. Before stating this
Lemma, we first reformulate the two-level ASM method in this framework.
requires it.
In the fictitious space lemma (see Lemma 7.2.2) a certain number of abstract
ingredients are needed. These ingredients can be easily identified in the case
of the ASM. This intuitive introduction will give the general flavour of the
methods exposed later on. We will come to the abstract framework after
this short presentation.
b HD HD R
N N
(U, V) z b(U, V) = a(RiT Ui , RiT Vi ) = ViT (Ri ARiT )Ui
i=0 i=0
= V T BU,
(7.3)
where B HD HD is the operator defined by
can be re-written as
MASM,2
1
= RASM,2 B 1 RASM,2 , (7.7)
that is
N N N
Vi (RASM,2 (U))i = U Ri Vi = Vi Ri U .
T T T T
i=0 i=0 i=0
Since this equality is valid for arbitrary Vi , we have the identification:
RASM,2 (U) = (Ri U)0iN . (7.8)
which leads to the re-writing (7.7) of the Additive Schwarz Method.
The explanation of the application of the preconditionner in term of these
operators is the following
According to (7.8), the right most operator RASM,2 decomposes a
global vector in H into local components in HD
The middle operator B 1 corresponds to solving a coarse problem and
N local Dirichlet problems
RASM,2 interpolates the modified local components into a global vector
in H.
As we shall see in the sequel of the chapter, this abstract form is also valid
for many domain decomposition methods such as the balancing Neumann-
Neumann preconditioner (BNN) and a modified version of the Optimized
Schwarz method (OSM). It will enable us to both analyze their condition
number and propose new coarse space constructions.
7.2. MATHEMATICAL FOUNDATION 195
there exists a positive constant cT such that for all u H there exists
uD HD with RuD = u and
Proof We will give a proof of the spectral estimates only in the finite
dimensional case. First note that operator RB 1 R H H is symmetric
by definition. Its positive definiteness, is easy to check. For any u H, we
have:
(RB 1 R u, u) = (B 1 R u, R u)D 0 .
Since B is S.P.D. the above term is zero iff R u = 0. Since R is surjective,
it follows that R is one-to-one. Thus, R u = 0 implies that u = 0.
We first prove the upper bound of the spectral estimate (7.11). First note
that (7.9) is equivalent to
(R ARuD , uD )D cR (B uD , uD )D , uD HD .
cT a(u, u) (R A u, B 1 R A u)D
= (A u, RB 1 R A u) = a(u, RB 1 R A u) .
For a proof valid in the infinite dimensional case as well and for a proof of
the optimality of the spectral estimate see [145, 144] or [102].
7.2. MATHEMATICAL FOUNDATION 197
Note also that the bilinear operators from Lemma 7.2.2 can also be related
to an optimization problem.
L HD H R
(uD , ) 21 (BuD , uD )D (, R(uD ) u) .
Thus, we have:
y (y) = (By, yk ) yk ,
k 1
We need to consider the case where both operators A and B may be undefi-
nite. Let P be the orthogonal projection on range(A). Since A is symmetric
positive, P is actually a projection parallel to ker(A). We introduce the fol-
lowing generalized eigenvalue problem
(Cxk , xl ) = (k Axk , xl ) = k kl ,
(x) = (x P x) + (AP x, xk ) xk .
k >
ker(A)
Span{xk k > }
x (x) = (AP x, xk ) xk .
k
By plugging this formula into (B(x (x)), x (x)) and by using the
fact that the eigenvectors xk and xl belong to range(A) (xk = P xk ) and
their C-orthogonality, we get
(B(x (x)), x (x)) = (AP x, xk ) Bxk , (AP x, xl ) xl
kk ll
= (AP x, xk ) (AP x, xl ) (Bxk , xl )
ll kk
= (AP x, xk ) (AP x, xl ) (BP xk , P xl )
ll kk
= (AP x, xk ) (AP x, xl ) (Cxk , xl )
ll kk
= (AP x, xk ) k .
2
k
(7.21)
We can easily find upper bounds of this expression
m
(AP x, xk ) k (AP x, xk ) (AP x, xk )
2 2 2
k k k=1
m
= (AP x, (AP x, xk )xk ) = (AP x, P x) = (Ax, x).
k=1
(7.22)
From (7.21) and (7.22), the conclusion follows.
Proof Consider first the case when A is positive definite. In this case the
projection on the range of A is the identity: P = I and ker(A) = . Thus
problem (7.18) reduces to
Bxk = k Axk A1 Bxk = k xk
while (7.13) will be equivalent to
1
A1 Byk = yk
k
We can thus conclude that
1
Y = Span {yk k < } = Span {xk k > } = Z .
If A is not definite but B is, we can now left-multiply (7.13) by B 1/2 which
results into
B 1/2 AB 1/2 B 1/2 yk = k B 1/2 yk Ayk = k yk . (7.23)
A yk yk
ker(A)
and the quantity P Bxk is not necessarily null. Consider now the fol-
lowing example
1 0 2 1 1 0
A=( ), B = ( )P =( ).
0 0 1 1 0 0
1 0 2 1 y 2y + y2
( )y = ( )y ( 1 ) = ( 1 )
0 0 1 1 0 y1 + y2
0 1
(y, ) {[( ) , 0] , [( ) , 1]}
1 1
2 0 1 0 2x1 x
( )x = ( )x ( ) = ( 1 )
0 0 0 0 0 0
0 1
(x, ) {[( ) , 0] , [( ) , 2]}
1 0
therefore Y Z .
M T M M
( Qi Ui ) A ( Qi Ui ) k0 UTi (QTi AQi ) Ui (7.26)
i=1 i=1 i=1
7.2. MATHEMATICAL FOUNDATION 203
M T M
( Qi Ui ) A ( Qi Ui ) = (UTi QTi ) A (Qj Uj )
i=1 i=1 i,jQT
i A Qj 0
1/2 1/2
(UTi QTi A Qi Ui ) (UTj QTj A Qj Uj ) .
i,jQT
i A Qj 0
(7.27)
Let us introduce the connectivity matrix C RM RM defined as follows:
1 if QTi A Qj 0, 1 i, j M
Cij = { (7.28)
0 otherwise.
T
v = ((UT1 QT1 AQ1 U1 )1/2 , . . . , (UTM QTM AQM UM )1/2 ) . (7.29)
M M
v22 = vi2 = UTi (QTi AQi )Ui (7.30)
i=1 i=1
M T M
( Qi Ui ) A ( Qi Ui ) vT Cv . (7.31)
i=1 i=1
vT Cv C2 v22 . (7.32)
M T M M M
( Qi Ui ) A ( Qi Ui ) k0 vi2 = k0 UTi (QTi AQi )Ui
i=1 i=1 i=1 i=1
a (u, v) = K u v dx , (7.34)
or the elasticity system (C is the fourth-order stiffness tensor and (u) is
the strain tensor of a displacement field u):
When the bilinear form a results from the variational solve of a Laplace
problem, the previous matrix corresponds to the discretisation of local Neu-
mann boundary value problems. For this reason we will call it Neumann
matrix even in a more general setting.
7.4. GENEO COARSE SPACE FOR ADDITIVE SCHWARZ 205
1
2
which proves the continuity of operator RASM,2 (i.e. hypothesis (7.9) from
Lemma 7.2.2), with
cR = max(2 , k0 )
as a continuity constant.
206 CHAPTER 7. GENEO COARSE SPACE
and
N
T
b(U, U) = (RiT Ui ) A (RiT Ui ) . (7.40)
i=0
Note that the result (7.43) is insufficient to yield a spectral estimate of the
ASM preconditioner. We still have to bound from above the subdomain en-
ergy terms a(RiT Ui , RiT Ui ), 1 i N , by the global energy term a(U, U).
In order to do this, we first introduce an estimate to a(U, U) from below
in terms of a sum of some local energy terms, see (7.44) and then infer from
it a construction of the coarse space in Definition 7.4.2.
Lemma 7.4.4 Let k1 be the maximal multiplicity of subdomains intersec-
tion, i.e. the largest integer m such that there exists m different subdomains
whose intersection has a non zero measure.
Then, for all U RN , we have
N
T j
(Rj U) A Rj U k1 UT AU = k1 a(U, U) (7.44)
j=1
jk , jk ) range(A
Find (U j ) R
Pj B jk = jk A
j Pj U jk .
j U
Define also
jk jk > }
j ) Span{U
Zj = ker(A
and the local projection
j on Z jk jk }.
j parallel to Span{U
j )T A (RT Dj (Id j )U
(RjT Dj (Id j )U j) U
TA j .
j U (7.47)
j j
We see now that estimate (7.46) can be obtained directly from (7.47) pro-
vided that Uj are such that the left-hand sides of (7.46) and (7.47) are the
same, that is
Uj = Dj (Id j )Rj U. (7.48)
It remains now to define the coarse space component U0 and the coarse
space interpolation operator, such that Nj=0 Rj Uj = U. From the previous
T
N
j .
V0 = RjT Dj Z (7.49)
j=1
N
U0 = (R0 R0T )1 R0 RjT Dj j Rj U
j=1
so that
N
R0T U0 = RjT Dj j Rj U.
j=1
N
1 T
RASM,2 (U) = RjT Uj = U and b(U, U) U AU = a(U, U) .
j=0 cT
210 CHAPTER 7. GENEO COARSE SPACE
N
U = RjT Dj Rj U
j=1
N N N
= RjT Dj j Rj U + RjT Dj (Id j )Rj U = RjT Uj .
j=1 j=1 j=0
Uj
R0T U0
We now prove the second part of the theorem, the stability of the decom-
position. By Lemma 7.4.3, we have
N
b(U, U) 2 a(U, U) + (2 k0 + 1) a(RiT Ui , RiT Ui ) . (7.50)
i=1
i (Ri U) .
a(RiT Ui , RiT Ui ) (Ri U)T A
1
(MASM
1
2 A) max(2 , k0 ) .
2 + (2k0 + 1)k1
Due to the definition of the coarse space, we see that the condition number
of the preconditioned problem will not depend on the number of the subdo-
mains but only on the parameters k0 , k1 and . Parameter can be chosen
arbitrarily small at the expense of a large coarse space.
7.5. HYBRID SCHWARZ WITH GENEO 211
N
MASM
1
= RiT (Ri ARiT )1 Ri .
i=1
MHSM
1
= R0T (R0 AR0T )1 R0 + (Id P0 ) MASM
1
(Id P0T ) . (7.53)
N
RHSM (U) = R0T U0 + (Id P0 ) RiT Ui . (7.54)
i=1
We check the assumptions of the Fictitious Space Lemma when RHSM re-
places RASM,2 in Definition 7.1.1. This will give us the condition number
estimate (7.57).
Proof We first check that we have indeed a decomposition, i.e. that the
equality RHSM (U) = U holds. Note that for all 1 j N we have
RjT Dj
j Rj U V0 (Id P0 )RjT Dj
j Rj U = 0 .
7.5. HYBRID SCHWARZ WITH GENEO 213
We have:
N
U = P0 U + (Id P0 )U = P0 U + (Id P0 ) RjT Dj Rj U
j=1
N
= P0 U + (Id P0 ) RjT Dj Rj U
j=1
N
= R0T U0 + (Id P0 ) RjT Dj (Id
j ) Rj U = RHSM (U) .
j=1
(R0T (R0 AR0T )1 R0 A)2 = R0T (R0 AR0T )1 R0 AR0T (R0 AR0T )1 R0 A
= R0T (R0 AR0T )1 R0 A .
Finally, the range of R0T (R0 AR0T )1 R0 A is V0 since for all U V0 , there
exist W such that U = R0T W and we have:
MHSM
1
= P0 A1 + (Id P0 ) MASM
1
(Id P0T ) .
These relations yield the following expression for the preconditioned operator
MHSM
1
A = P0 + (Id P0 ) MASM
1
A(Id P0 ) . (7.59)
Kn (MHSM
1
A, r0 ) = {r0 , MHSM
1
A r0 , . . . , (MHSM
1
A)n1 r0 }
where
r0 = MHSM
1
(b Ax0 )
P0 r0 = 0 ,
Kn (MHSM
1
A, r0 ) = {r0 , (Id P0 ) MASM
1
Ar0 , . . . , ((Id P0 ) MASM
1
)n1 r0 } .
7.6. FREEFEM++ IMPLEMENTATION 215
This can be easily proved using formula (7.59) and the fact that P02 = P0 :
MHSM
1
A r0 = (P0 + (Id P0 ) MASM
1
A(Id P0 )) r0
= (Id P0 ) MASM
1
A r0 ,
(MHSM
1
A)2 r0 = (P0 + (Id P0 ) MASM
1
A(Id P0 ))(Id P0 ) MASM
1
A r0
= (Id P0 ) MASM
1
A(Id P0 )MASM
1
A r0
= ((Id P0 ) MASM
1
A)2 r0 ,
It means that in the PCG method, it is sufficient to consider that the pre-
conditioner is
(Id P0 ) MASM
1
.
In order to have P0 r0 = 0, we can choose for example
P0 MHSM
1
= P0 R0T (R0 AR0T )1 R0 = R0T (R0 AR0T )1 R0
which leads to
P0 r0 = P0 MHSM
1
(b AR0T (R0 AR0T )1 R0 b)
= R0T (R0 AR0T )1 R0 (b AR0T (R0 AR0T )1 R0 b)
= 0.
To sum up, the PCG algorithm (see Algorithm 5 in 3.3.1) for the Hybrid
Schwarz method takes the form given in Algorithm 8.
div(u) = f, in (7.60)
for(int i=0;i<npart;++i)
{
mesh Thi = aTh[i];
5 fespace Vhi(Thi,P1);
Vhi[int] eV(abs(nev));
real[int] ev(abs(nev));
if (nev > 0){//GENEO coarse space
9 int k =
EigenValue(aN[i],aAweighted[i],sym=true,sigma=0,maxit=50,tol=1.e4,value=ev,vector=eV);
cout << Eigenvalues in the subdomain << i <<endl;
k=min(k,nev); //sometimes the no of converged eigenvalues is bigger than nev.
cout << ev <<endl;
13 }
else// Nicolaides Coarse space
{
eV[0] = 1.;
17 }
for(int j=0;j<abs(nev);++j){
real[int] zitemp = Dih[i]eV[j][];
21 int k = iabs(nev)+j;
Z[k][]=Rih[i]zitemp;
}
}
/# debutPartition #/
include ../../FreefemCommon/dataGENEO.edp
include ../../FreefemCommon/decomp.idp
4 include ../../FreefemCommon/createPartition.idp
SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
plot(part,wait=1,fill=1,ps=partition.eps);
/# endPartition #/
8 /# debutGlobalData #/
Aglobal = vaglobal(Vh,Vh,solver = UMFPACK); // global matrix
rhsglobal[] = vaglobal(0,Vh); // global rhs
uglob[] = Aglobal1 rhsglobal[];
12 /# finGlobalData #/
/# debutLocalData #/
for(int i = 0;i<npart;++i)
{
16 cout << Domain : << i << / << npart << endl;
matrix aT = AglobalRih[i];
aA[i] = Rih[i]aT;
set(aA[i],solver = UMFPACK); // direct solvers using UMFPACK
20 varf valocal(u,v) = int2d(aTh[i])(etauv+ka(Grad(u)Grad(v)))
+on(1,u=g);
fespace Vhi(aTh[i],P1);
aN[i]= valocal(Vhi,Vhi);
24 set(aN[i], solver = UMFPACK);
matrix atimesxi = aA[i] Dih[i];
aAweighted[i] = Dih[i] atimesxi;
set(aAweighted[i], solver = UMFPACK);
28 }
/# finLocalData #/
/# debutPCGSolve #/
include ../../FreefemCommon/matvecAS.idp
32 include GENEO.idp
include PCGCS.idp
Vh un = 0, sol; // initial guess un and final solution
cout << Schwarz Dirichlet algorithm << endl;
36 sol[] = myPCG(un[], tol, maxit); // PCG with initial guess un
plot(sol,cmm= Final solution, wait=1,dim=3,fill=1,value=1);
Vh er = soluglob;
cout << Final relative error: << er[].linfty/sol[].linfty << endl;
40 /# finPCGSolve #/
220
-52630.6
CHAPTER 7. GENEO COARSE SPACE
26316.8
78948.4
131580
184212
236843
289475
342106
394738
447369
500001
552633
605264
657896
710527
763159
815790
868422
921054
1.05263e+06
0
10
Nicolaides
GenEO
1
10
2
10
3
10
Residual
4
10
5
10
6
10
7
10
0 20 40 60 80 100 120
Iterations
N
Ri Si Ri U = G .
T
(7.64)
i=1
N
N = Ri Di Si Di Ri .
T
MN
1 1
(7.65)
i=1
H = R#N
endowed with the standard Euclidean scalar product and the bilinear
form a H H R
a(U , V ) = (S U , V ) , U , V H .
endowed with the standard Euclidean scalar product and the bilinear
form b
b HD HD R
N
((Ui )1iN , (Vi )1iN ) z (Si Ui , Vi ) .
i=1
RN N HD H
N
(7.66)
(Ui )1iN RTi Di Ui
i=1
7.7. BALANCING NEUMANN-NEUMANN 223
Note that the operatpor RN N is surjective since from (7.62), we have for
all U H:
N
U = RTi Di Ri U = RN N ((Ri U )1iN ) .
i=1
Contrarily to the Schwarz method, the stable decomposition is easily checked
and is satisfied with cT = 1. Indeed, let U H, we have the natural decom-
position U = N i=1 Ri Di Ri U . In other words, let U = (Ri U )1iN , we
T
This result is of little practical use since it assumes that the local Neumann
subproblems are well-posed which is not always the case. Moreover, we have
studied only the one level method.
We address the former issue in the next section and a GenEO coarse space
construction in 7.7.3.
7.7. BALANCING NEUMANN-NEUMANN 225
Si range Si range Si .
In order to capture the part of the solution that will come from the local ker-
nels Si (1 i N ), let Zi be a rectangular matrix of size #Ni dim(ker Si )
whose columns are a basis of ker Si . We form a rectangular matrix Z0 of
size #N N i=1 dim(ker Si ) by concatenation of the Zi s:
Z0 = (RTi Di Zi )1iN .
Let W0 be the vector space spanned by the columns of Z0 , we introduce the
b HD HD R
N
((Ui )0iN , (Vi )0iN ) z V0T S U0 + ViT Si Ui .
i=1
RBN N HD H
N
(7.72)
(Ui )0iN z U0 + (Id P0 ) RTi Di Ui .
i=1
N
RBN N B 1 RBN N = P0 S1 + (Id P0 ) RTi Di Si (Id Pi )Di Ri (Id P0 )T .
i=1
(7.73)
RBN N H HD
U z RBN N (U) ,
such that
V HD , V T RBN N (U) = RBN N (V)T U .
For all V = (Vi )0iN HD , the above equation is
N N
V0T RBN N (U)0 + ViT RBN N (U)i = (V0 + (Id P0 ) RTi Di Vi )T U
i=1 i=1
N
= V0T U + ViT Di Ri (Id P0 )T U .
i=1
RBN N (U)0 = P0 U
7.7. BALANCING NEUMANN-NEUMANN 227
This yields the final form of the preconditioner RBN N B 1 RBN N which is
called the Balancing Neumann-Neumann preconditioner, see [130], [121] and
[63]:
N
N = P0 S + (Id P0 ) Ri Di Si (Id Pi )Di Ri (Id P0 ) .
T T
MBN
1 1
i=1
(7.74)
b(U, U) = (SP0 U, P0 U)
N
+ (Si (Id Pi )Ri (Id P0 )U, (Id Pi )Ri (Id P0 )U)
i=1
= (SP0 U, P0 U)
N
+ (Si Ri (Id P0 )U, (Id Pi )Ri (Id P0 )U)
i=1
N
= (SP0 U, P0 U) + (Si Ri (Id P0 )U, Ri (Id P0 )U)
i=1
= (SP0 U, P0 U) + (S(Id P0 )U, (Id P0 )U)
= (SU, U) = a(U, U) .
(7.77)
(S RTi Di Ui , RTi Di Ui )
max = max max . (7.78)
1iN Ui rangeSi /{0} (Si Ui , Ui )
1 (MBN
1
N S) max(1, k2 max ) .
Constant max in (7.78) can be large and thus the Balancing Neumann
Neumann preconditioner (7.74) can be inefficient. For this reason, in the
next section, we define a coarse space which allow to guarantee any targeted
convergence rate.
1
i denote projection from RNi on Wi parallel to Span {Uik ik }.
Let WGenEO be the vector space spanned by the columns of ZGenEO and
Pg = ZGenEO (ZGenEO
T
SZGenEO )1 ZGenEO
T
S.
The proof is similar to that of formula (7.58) that was done in the context
of the hybrid Schwarz method in 7.5. Note that ker(Sj ) Wj for all
1 j N , so that we have W0 WGenEO .
N
N G = Pg S + (Id Pg ) Ri Di Si (Id Pi )Di Ri (Id Pg ) ,
T T
MBN
1 1
i=1
(7.84)
In order to study this new preconditioner we use the same framework than
for the balancing Neumann-Neumann method except that the natural coarse
7.7. BALANCING NEUMANN-NEUMANN 231
It can easily be checked from 7.7.2 that the surjectivity of RBN N G and
the stable decomposition are unchanged since W0 WGenEO .
Lemma 7.7.7 (Surjectivity of RBN N G ) Operator RBN N G is surjective.
Proof Similarly to (7.75), we have:
The last term of this equation is zero since for all subdomains i, Pi Ri (Id
Pg )U ker Si and thus
N
Ri Di Pi Ri (Id Pg )U W0 WGenEO
T
i=1
we have
N N
(S(Id Pg ) RTi Di Ui , (Id Pg ) RTi Di Ui )
i=1 i=1
N N
= (S(Id Pg ) RTi Di (Id i )Ui , (Id Pg ) RTi Di (Id i )Ui ) .
i=1 i=1
Thus using equality (7.88), Lemma 7.7.1 and then Lemma 7.7.6 we have:
1 (MBN
1
N G S) max(1, k2 ) .
7.8. SORAS-GENEO-2 233
S U = G ,
an efficient implementation can be done if the initial guess is such that the
initial residual is S-orthogonal to WGenEO . It can be achieved simply by
taking as initial guess:
U0 = ZGenEO (ZGenEO
T
SZGenEO )1 ZGenEO
T
G .
For all 1 i N , let Ri be the restriction matrix from R#N to the subset
R#Ni and Di be a diagonal matrix of size #Ni #Ni , so that we have a par-
tition of unity at the algebraic level, Id = N
i=1 Ri Di Ri , where Id R
T #N #N
and
Find (Vjk , jk ) R#Ni {0} R such that
i Vik = ik Di Bi Di Vik . .
A
In the general case for 1 i N , matrices Di may have zero entries for
boundary degrees of freedom since they are related to a partition of unity.
Moreover very often matrices Bi and Ai differ only by the interface condi-
tions that is for entries corresponding to boundary degrees of freedom. There-
fore, matrix Di Bi Di on the right hand side of the last generalized eigenvalue
problem is not impacted by the choice of the interface conditions of the one
level optimized Schwarz method. This cannot lead to efficient adaptive coarse
spaces.
Matrix A i arises from the variational formulation (7.98) where the integra-
tion over domain is replaced by the integration over subdomain i and
finite element space Vh is restricted to subdomain i . Matrix Bi corresponds
to a Robin problem and is the sum of matrix A i and of the matrix of the
following variational formulation restricted to the same finite element space:
2(2 + )
uh vh with = 10 in our test.
i + 3
7.8. SORAS-GENEO-2 237
Table 7.1: 2D Elasticity. GMRES iteration counts for a solid made of steel
and rubber. Simulations made with FreeFem++ [105]
AS SORAS AS+ZEM SORAS +ZEM AS-GenEO SORAS geneo2
d.o.f N iter. iter. iter. dim iter. dim iter. dim iter. dim
35841 8 150 184 117 24 79 24 110 184 13 145
70590 16 276 337 170 48 144 48 153 400 17 303
141375 32 497 >1000 261 96 200 96 171 800 22 561
279561 64 >1000 >1000 333 192 335 192 496 1600 24 855
561531 128 >1000 >1000 329 384 400 384 >1000 2304 29 1220
1077141 256 >1000 >1000 369 768 >1000 768 >1000 3840 36 1971
Parallel implementation of
Schwarz methods
239
240 CHAPTER 8. IMPLEMENTATION OF SCHWARZ METHODS
Note that the three dimensional mesh is not actually built but a macro is
defined.
From now on, all the tasks can be computed concurrently, meaning that
each MPI process is in charge of only one subdomain and variables are local
to each process. Then a parallel mesh refinement is made by cutting each
tetrahedra into 8 tetrahedra. This corresponds to a mesh refinement factor
s equals to 2. Note also that at this point, the displacement field u is
approximated by P2 continuous piecewise quadratic functions.
real f = 900000.;
func real stripes(real a, real b, real paramA, real paramB) {
int da = int(a 10);
107 return (da == (int(da / 2) 2) ? paramB : paramA);
}
matrix N;
if(mpisize > 1 && solver == 12) {
166 int[int] parm(1);
parm(0) = getARGV(nu, 20);
EVproblem(vPbNoPen, Th, Ph)
matrix noPen = vPbNoPen(Wh, Wh, solver = CG);
170 attachCoarseOperator(mpiCommWorld, Aglob, A = noPen, /threshold = 2.
h[].max / diam,/ parameters = parm);
}
PETSc interface If the PETSc interface is used, the local stiffness matrix
K = Aii = Ri ARiT and the local load vector rhs are built concurrently from
the variational forms for all subdomains 1 i N .
Wh[int] def(Rb)(6);
[Rb[0], RbB[0], RbC[0]] = [ 1, 0, 0];
[Rb[1], RbB[1], RbC[1]] = [ 0, 1, 0];
135 [Rb[2], RbB[2], RbC[2]] = [ 0, 0, 1];
[Rb[3], RbB[3], RbC[3]] = [ y, x, 0];
[Rb[4], RbB[4], RbC[4]] = [z, 0, x];
[Rb[5], RbB[5], RbC[5]] = [ 0, z, y];
141 set(Mat, sparams = pc type gamg ksp type gmres pc gamg threshold
0.05 ksp monitor, nearnullspace = Rb);
}
else if(solver == 3)
set(Mat, sparams = pc type lu pc factor mat solver package mumps
mat mumps icntl 7 2 ksp monitor);
145 mpiBarrier(mpiCommWorld);
timing = mpiWtime();
u[] = Mat1 rhs;
/# problemPhysics #/
13 real Sqrt = sqrt(2.);
macro epsilon(u)[dx(u), dy(u#B), dz(u#C), (dz(u#B) + dy(u#C)) / Sqrt, (dz(u)
+ dx(u#C)) / Sqrt, (dy(u) + dx(u#B)) / Sqrt]// EOM
macro div(u)(dx(u) + dy(u#B) + dz(u#C))// EOM
29 /# sequentialMesh #/
real depth = 0.25;
int discrZ = getARGV(discrZ, 1);
real L = 2.5;
33 real H = 0.71;
real Hsupp = 0.61;
real r = 0.05;
real l = 0.35;
37 real h = 0.02;
real width = 2.5L/4.;
real alpha = asin(h/(2.r))/2;
if(mpirank == 0) {
cout << << mpirank << / << mpisize;
68 cout << input parameters: global size = << global << refinement
factor = << s << precision = << getARGV(eps, 1e8) <<
overlap = << overlap << with partitioner? = <<
partitioner << endl;
}
/# parallelMesh #/
72 func Pk = [P2, P2, P2];
Wh def(u);
/# chooseSolver #/
80 solver = getARGV(solver, 0);
if(solver == 0)
{
if(mpirank == 0) {
84 cout << What kind of solver would you like to use ? << endl;
cout << [1] PETSc GMRES << endl;
cout << [2] GAMG << endl;
cout << [3] MUMPS << endl;
88 cout << [10] ASM << endl;
cout << [11] RAS << endl;
cout << [12] Schwarz GenEO << endl;
cout << [13] GMRES << endl;
92 cout << Please type in a number: ;
cin >> solver;
if(solver != 1 && solver != 2 && solver != 3 && solver != 4 && solver
!= 10 && solver != 11 && solver != 12) {
cout << Wrong choice, using GMRES instead ! << endl;
96 solver = 10;
}
}
}
100 broadcast(processor(0), solver);
/# chooseSolverEnd #/
/# physicalParameters #/
104 real f = 900000.;
func real stripes(real a, real b, real paramA, real paramB) {
int da = int(a 10);
return (da == (int(da / 2) 2) ? paramB : paramA);
108 }
real[int] res(Wh.ndof);
120 real[int] rhs(Wh.ndof);
In 8.2.1, we present results obtained on few cores with the above script
from 8.1. Section 8.2.2 shows the scalability of the method with a large
number of cores solving both the system of linear elasticity and a problem
of scalar diffusion.
Results and timings for solving this problem with 263,000 unknowns on
8 cores running at 1.6 GHz are given in Table 8.1. The parameter is
the relative threshold used for dropping edges in the aggregation graphs of
the multigrid preconditioner, while the parameter is the number of local
deflation vectors computed per subdomain in the GenEO coarse space. The
multigrid implementation is based on GAMG [2], which is bundled into
PETSc [8, 7]. The results for exactly the same problem as before on 64
cores are given in Table 8.2.
For the GenEO method, the computational times vary slightly when the
parameter varies around its optimal value. The iteration count decreases
when the parameter increases. On the other hand, when is increased the
cost of the factorization of the coarse operator increases. For the multigrid
method, the computational times vary rapidly with the parameter .
8.2. NUMERICAL EXPERIMENTS 253
200
100
40
10 20 40 81
24 48 96 92
# of processes
100% 100%
80% 80%
60% 60%
Ratio
40% 40%
Factorization
20% Deation 20%
Coarse operator
0% 10 20 40 81 Krylov method 10 2 4 8 0%
24 48 96 92 24 048 096 192
# of processes # of processes
Figure 8.3: Comparison of the time spent in various steps for building and
using the preconditioner in 2D (left) and 3D (right).
256 CHAPTER 8. IMPLEMENTATION OF SCHWARZ METHODS
1
MRAS eq. (1.15)
Relative residual error 102 1
PA-DEF1 eq. (1.19a)
103
104
105
106
0 100 200 300 400
# of iterations
Figure 8.5:
Convergence of the restarted GMRES(40) for a 2D problem
of linear elasticity using 1 024 subdomains. Timings for the
setup and solution phases using PA-DEF1
1
are available in 8.4,
using MRAS , the convergence is not reached after 10 minutes.
1
Moving on to the weak scaling properties, see Definition 4.1.2, the problem
now being solved is a scalar equation of diffusivity
(u) = 1 in
(8.3)
u =0 on [0; 1] {0} .
(x, y)
6
3 10
2 106
106
a(u, v) = u v f v = 0 .
Vi = Ri V .
we have:
N N
(U, V) = (U, RiT Di Ri V) = (Ri U, Di Ri V)
i=1 i=1
N
= (Ui , Di Vi ) .
i=1
yi xi + yi , 1 i N .
N N
Vi = Ri AU = Ri ARjT Dj Rj U = Ri ARjT Dj Uj .
j=1 j=1
Akl = 0 .
We can take advantage of this sparsity pattern in the following way. A degree
of freedom k Nj is interior to Nj if for all i j and all l Ni Nj , Akl = 0.
Otherwise, it is said to be a boundary degree of freedom. If the overlap is
sufficient, it is possible to choose diagonal matrix Dj with zero entries for
the boundary degrees of freedom. Then all non zero rows of matrix ARjT Dj
have indices in Nj that is:
we have:
N
Vi = Ri AU = Ri ARjT Dj Rj U = (Ri ARiT ) Di Ui + Ri RjT (Rj ARjT ) Dj Uj .
j=1 jOi
Aii Ajj
Since we have basic linear algebra subroutines, we have all the necessary
ingredients for solving concurrently the linear system A x = b by a Krylov
method such as CG (conjugate gradient) or GMRES. We now turn our at-
tention to domain decomposition methods. The ASM preconditioner reads:
N
MASM
1
= RjT A1
jj Rj .
j=1
N
Ri MASM
1
R = Ri RjT A1
jj Rj R = Aii Ri + (Ri Rj ) Ajj Rj .
1 T 1
j=1 jOi
summed locally. This pattern is very similar to that of the matrix vec-
tor product.
The RAS preconditioner reads:
N
MRAS
1
= RjT Dj A1
jj Rj .
j=1
100% 22 311
# of d.o.f.in millions
80%
2 305
60%
695
40%
20% 3D
2D 74
0% 256 512 10 20 40 81
24 48 96 92
# of processes
(a) Timings of various simulations
d.o.f. d.o.f.
2.1M sbdmn
in 2D 280k sbdmn
in 3D
200
200
150
Time (seconds)
100
100
Factorization
50 Deation
Coarse operator
0 256 512 1 0 2 0 4 0 8 1 Krylov method 0 256 512 1 0 2 0 4 0 8 1
24 48 96 92 24 48 96 92
# of processes # of processes
(b) Comparison of the time spent in various steps for building and using the preconditioner
264 CHAPTER 8. IMPLEMENTATION OF SCHWARZ METHODS
265
266 BIBLIOGRAPHY
[8] Satish Balay, William D. Gropp, Lois Curfman McInnes, and Barry
F. Smith. Efficient management of parallelism in object oriented
numerical software libraries. In Modern software tools in scientific
computing. E. Arge, A. M. Bruaset, and H. P. Langtangen, editors.
Birkhauser Press, 1997, pages 163202 (cited on pages 239, 252).
[9] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates
for the solution of linear systems: building blocks for iterative meth-
ods, 2nd edition. SIAM, Philadelphia, PA, 1994 (cited on pages 99,
111).
[10] H. Barucq, J. Diaz, and M. Tlemcani. New absorbing layers condi-
tions for short water waves. J. comput. phys., 229(1):5872, 2010.
issn: 0021-9991. doi: 10.1016/j.jcp.2009.08.033. url: http:
//dx.doi.org/10.1016/j.jcp.2009.08.033 (cited on page 183).
[11] G. K. Batchelor. An introduction to fluid dynamics. Of Cambridge
Mathematical Library. Cambridge University Press, Cambridge, pa-
perback edition, 1999, pages xviii+615. isbn: 0-521-66396-2 (cited on
page 183).
[12] Jean-Francois Bourgat, Roland Glowinski, Patrick Le Tallec, and Ma-
rina Vidrascu. Variational formulation and algorithm for trace oper-
ator in domain decomposition calculations. In Domain decomposition
methods. Tony Chan, Roland Glowinski, Jacques Periaux, and Olof
Widlund, editors. SIAM, Philadelphia, PA, 1989, pages 316 (cited
on pages 155, 162, 222).
[13] Ham Brezis. Analyse fonctionnelle : theorie et applications. Dunod,
Paris, 1983 (cited on page 164).
[14] X. C. Cai, W. D. Gropp, D. E. Keyes, R. G. Melvin, and D. P.
Young. Parallel Newton-Krylov-Schwarz algorithms for the transonic
full potential equation. Sisc, 19:245265, 1998 (cited on page ii).
[15] Xiao Chuan Cai and David Keyes. Nonlinearly preconditioned inex-
act newton algorithms. Sisc, 2003 (cited on page ii).
[16] Xiao-Chuan Cai, Mario A. Casarin, Frank W. Elliott Jr., and Olof
B. Widlund. Overlapping Schwarz algorithms for solving Helmholtzs
equation. In, Domain decomposition methods, 10 (boulder, co, 1997),
pages 391399. Amer. Math. Soc., Providence, RI, 1998 (cited on
page 75).
[17] Xiao-Chuan Cai, Charbel Farhat, and Marcus Sarkis. A minimum
overlap restricted additive Schwarz preconditioner and applications
to 3D flow simulations. Contemporary mathematics, 218:479485,
1998 (cited on page 6).
BIBLIOGRAPHY 267
[112] Caroline Japhet, Frederic Nataf, and Francois-Xavier Roux. The Op-
timized Order 2 Method with a coarse grid preconditioner. applica-
tion to convection-diffusion problems. In Ninth international confer-
ence on domain decompositon methods in science and engineering.
P. Bjorstad, M. Espedal, and D. Keyes, editors. John Wiley & Sons,
1998, pages 382389 (cited on page 86).
[113] P. Jolivet, F. Hecht, F. Nataf, and C. Prudhomme. Scalable do-
main decomposition preconditioners for heterogeneous elliptic prob-
lems. In Proceedings of the 2013 acm/ieee conference on supercom-
puting. In SC13. Best paper finalist. ACM, 2013, 80:180:11 (cited
on page 253).
[114] Pierre Jolivet. Methodes de decomposition de domaine. applica-
tion au calcul haute performance. PhD thesis. Universite de Greno-
ble, https://www.ljll.math.upmc.fr/ jolivet/thesis.pdf, 2014 (cited on
page 261).
[115] Pierre Jolivet, Victorita Dolean, Frederic Hecht, Frederic Nataf,
Christophe Prudhomme, and Nicole Spillane. High performance do-
main decomposition methods on massively parallel architectures with
freefem++. J. numer. math., 20(3-4):287302, 2012. issn: 1570-2820
(cited on page 136).
[116] J.P.Berenger. A perfectly matched layer for the absorption of elec-
tromagnetic waves. J. of comp.phys., 114:185200, 1994 (cited on
pages 66, 86).
[117] G. Karypis and V. Kumar. METIS: A software package for
partitioning unstructured graphs, partitioning meshes, and com-
puting fill-reducing orderings of sparse matrices. Technical re-
port. http://glaros.dtc.umn.edu/gkhome/views/metis. Department
of Computer Science, University of Minnesota, 1998 (cited on
pages 136, 140).
[118] George Karypis and Vipin Kumar. Metis, unstructured graph parti-
tioning and sparse matrix ordering system. version 2.0. Technical re-
port. Minneapolis, MN 55455: University of Minnesota, Department
of Computer Science, August 1995 (cited on pages 13, 27).
[119] Jung-Han Kimn and Marcus Sarkis. Restricted overlapping balanc-
ing domain decomposition methods and restricted coarse problems
for the Helmholtz problem. Comput. methods appl. mech. engrg.,
196(8):15071514, 2007. issn: 0045-7825. doi: 10.1016/j.cma.2006.
03.016. url: http://dx.doi.org/10.1016/j.cma.2006.03.016
(cited on page 138).
278 BIBLIOGRAPHY
[139] Frederic Nataf and Francis Nier. Convergence rate of some do-
main decomposition methods for overlapping and nonoverlapping
subdomains. Numerische mathematik, 75(3):35777, 1997 (cited on
page 86).
[140] Frederic Nataf and Francois Rogier. Factorization of the convection-
diffusion operator and the Schwarz algorithm. M 3 AS, 5(1):6793,
1995 (cited on page 50).
[141] Frederic Nataf, Francois Rogier, and Eric de Sturler. Optimal inter-
face conditions for domain decomposition methods. Technical report
(301). CMAP (Ecole Polytechnique), 1994 (cited on page 68).
[142] Frederic Nataf, Hua Xiang, and Victorita Dolean. A two level domain
decomposition preconditioner based on local Dirichlet-to-Neumann
maps. C. r. mathematique, 348(21-22):11631167, 2010 (cited on
pages 136, 153, 154).
[143] Frederic Nataf, Hua Xiang, Victorita Dolean, and Nicole Spillane. A
coarse space construction based on local Dirichlet to Neumann maps.
Siam j. sci comput., 33(4):16231642, 2011 (cited on pages 136, 153,
154, 191, 192).
[144] Sergey V. Nepomnyaschikh. Decomposition and fictious domains
for elliptic boundary value problems. In Fifth international sympo-
sium on domain decomposition methods for partial differential equa-
tions. David E. Keyes, Tony F. Chan, Gerard A. Meurant, Jeffrey
S. Scroggs, and Robert G. Voigt, editors. SIAM, Philadelphia, PA,
1992, pages 6272 (cited on pages 195, 196).
[145] Sergey V. Nepomnyaschikh. Mesh theorems of traces, normalizations
of function traces and their inversions. Sov. j. numer. anal. math.
modeling, 6:125, 1991 (cited on pages 195, 196, 235).
[146] Roy A. Nicolaides. Deflation of conjugate gradients with applications
to boundary value problems. Siam j. numer. anal., 24(2):355365,
1987. issn: 0036-1429. doi: 10 .1137 / 0724027. url: http :/ / dx.
doi.org/10.1137/0724027 (cited on pages 123, 128, 153).
[147] Francis Nier. Remarques sur les algorithmes de decomposition de do-
maines. In, Seminaire: equations aux derivees partielles, 19981999,
Exp. No. IX, 26. Ecole Polytech., 1999 (cited on page 68).
[148] M. Oorsprong, F. Berberich, V. Teodor, T. Downes, S. Erotokritou,
S. Requena, E. Hogan, M. Peters, S. Wong, A. Gerber, E. Emeriau,
R. Guichard, G. Yepes, K. Ruud, et al., editors. Prace annual report
2013. Insight Publishers, 2014 (cited on page 253).
BIBLIOGRAPHY 281
[159] Jack Poulson, Bjorn Engquist, Siwei Li, and Lexing Ying. A paral-
lel sweeping preconditioner for heterogeneous 3D Helmholtz equa-
tions. Siam j. sci. comput., 35(3):C194C212, 2013. issn: 1064-8275.
doi: 10 . 1137 / 120871985. url: http : / / dx . doi . org / 10 . 1137 /
120871985 (cited on page 86).
[160] Alfio Quarteroni and Alberto Valli. Domain decomposition methods
for partial differential equations. Oxford Science Publications, 1999
(cited on page i).
[161] Vineet Rawat and Jin-Fa Lee. Nonoverlapping domain decomposi-
tion with second order transmission condition for the time-harmonic
Maxwells equations. Siam j. sci. comput., 32(6):35843603, 2010.
issn: 1064-8275 (cited on page 86).
[162] Francois-Xavier Roux, Frederic Magoules, Laurent Series, and Yas-
sine Boubendir. Approximation of optimal interface boundary condi-
tons for two-Lagrange multiplier FETI method. In, Domain decompo-
sition methods in science and engineering. Volume 40, in Lect. Notes
Comput. Sci. Eng. Pages 283290. Springer, Berlin, 2005 (cited on
page 62).
[163] Yousef Saad. Analysis of augmented Krylov subspace methods. Siam
j. matrix anal. appl., 18(2):435449, 1997. issn: 0895-4798. doi: 10.
1137/S0895479895294289. url: http://dx.doi.org/10.1137/
S0895479895294289 (cited on page 125).
[164] Youssef Saad. Iterative methods for sparse linear systems. PWS Pub-
lishing Company, 1996 (cited on pages 93, 104, 108, 111, 125).
[165] Youssef Saad and Martin H. Schultz. GMRES: a generalized minimal
residual algorithm for solving nonsymmetric linear systems. Siam j.
sci. stat. comp., 7:856869, 1986 (cited on page 104).
[166] Achim Schadle, Lin Zschiedrich, Sven Burger, Roland Klose, and
Frank Schmidt. Domain decomposition method for Maxwells equa-
tions: scattering off periodic structures. J. comput. phys., 226(1):477
493, 2007. issn: 0021-9991. doi: 10.1016/j.jcp.2007.04.017. url:
http : / / dx . doi . org / 10 . 1016 / j . jcp . 2007 . 04 . 017 (cited on
page 86).
[167] Robert Scheichl and Eero Vainikko. Additive Schwarz with
aggregation-based coarsening for elliptic problems with highly vari-
able coefficients. Computing, 80(4):319343, 2007 (cited on page 191).
[168] Robert Scheichl, Panayot S. Vassilevski, and Ludmil Zikatanov. Weak
approximation properties of elliptic projections with functional con-
straints. Multiscale model. simul., 9(4):16771699, 2011. issn: 1540-
3459. doi: 10 . 1137 / 110821639. url: http : / / dx . doi . org / 10 .
1137/110821639 (cited on pages 154, 191).
BIBLIOGRAPHY 283