Book On Damage Mechanics
Book On Damage Mechanics
Book On Damage Mechanics
June 5, 2015
2
i
The purpose of this text is to offer an overview of the most popular do-
main decomposition methods for partial differential equations (PDE). The
presentation is kept as much as possible at an elementary level with a spe-
cial focus on the definitions of these methods in terms both of PDEs and
of the sparse matrices arising from their discretizations. We also provide
implementations written in an open source finite element software. In ad-
dition, we consider a number of methods that have not been presented in
other books. We think that this book will give a new perspective and that
it will complement those of Smith, Bjrstad and Gropp [170], Quarteroni
and Valli [160], Mathew [134] and Toselli and Widlund[179] as well as the
review article [20].
The book is addressed to computational scientists, mathematicians, physi-
cists and, in general, to people involved in numerical simulation of par-
tial differential equations. It can also be used as textbook for advanced
undergraduate/First-Year Graduate students. The mathematical tools needed
are basic linear algebra, notions of programming, variational formulation of
PDEs and basic knowledge in finite element discretization.
is a parallel machine. Waiting for the next generation machine does not
guarantee anymore a better performance of a software. To keep doubling
performance parallelism must double. It implies a huge effort in algorithmic
development. Scientific computing is only one illustration of this general
need in computer science. Visualization, data storage, mesh generation,
operating systems, . . . must be designed with parallelism in mind.
We focus here on parallel linear iterative solvers. Contrary to direct
methods, the appealing feature of domain decomposition methods is that
they are naturally parallel. We introduce the reader to the main classes
of domain decomposition algorithms: Schwarz, Neumann-Neumann/FETI
and Optimized Schwarz. For each method we start by the continuous for-
mulation in terms of PDEs for two subdomains. We then give the definition
in terms of stiffness matrices and their implementation in a free finite ele-
ment package in the many subdomain case. This presentation reflects the
dual nature of domain decomposition methods. They are solvers of linear
systems keeping in mind that the matrices arise from the discretization of
partial differential operators. As for domain decomposition methods that
directly address non linearities, we refer the reader to e.g. [14] or [15] and
references therein.
as well. These algorithms are the method of choice for wave propagation
phenomena in the frequency regime. Such situations occur in acoustics,
electromagnetics and elastodynamics.
In Chapter 3 we present the main ideas which justify the use of Krylov
methods instead of stationary iterations. Since Schwarz methods introduced
in Chapter 1 represent fixed point iterations applied to preconditioned global
problems, and consequently not providing the fastest convergence possible,
it is natural to apply Krylov methods instead. This provides the justi-
fication of using Schwarz methods as preconditioners rather than solvers.
Numerical implementations and results using FreeFem++ are closing the
chapter. Although some part of the presentation of some Krylov methods
is not standard, readers already familiar with Krylov methods may as well
skip it.
Chapter 4 is devoted to the introduction of two-level methods. In the
presence of many subdomains, the performance of Schwarz algorithms, i.e.
the iteration number and execution time will grow linearly with the number
of subdomains in one direction. From a parallel computing point of view this
translates into a lack of scalability. The latter can be achieved by adding a
second level or a coarse space. This is strongly related to multigrid methods
and to deflation methods from numerical linear algebra. The simplest coarse
space which belongs to Nicolaides is introduced and then implemented in
FreeFem++.
In Chapter 5, we show that Nicolaides coarse space (see above) is a
particular case of a more general class of spectral coarse spaces which are
generated by vectors issued from solving some local generalized eigenvalue
problems. Then, a theory of these two-level algorithms is presented. First, a
general variational setting is introduced as well as elements from the abstract
theory of the two-level additive Schwarz methods (e.g. the concept of stable
decomposition). The analysis of spectral and classical coarse spaces goes
through some properties and functional analysis results. These results are
valid for scalar elliptic PDEs. This chapter is more technical than the others
and is not necessary to the sequel of the book.
Chapter 6 is devoted to the Neumann-Neumann and FETI algorithms.
We start with the two subdomain case for the Poisson problem. Then, we
consider the formulation in terms of stiffness matrices and stress the duality
of these methods. We also establish a connection with block factorization
of the stiffness matrix of the original problem. We then show that in the
many subdomains case Neumann-Neumann and FETI are no longer strictly
equivalent. For sake of simplicity, we give a FreeFem++ implementation of
only the Neumann-Neumann algorithm. The reader is then ready to delve
into the abundant litterature devoted to the use of these methods for solving
complex mechanical problems.
In Chapter 7, we return to two level methods. This time, a quite recent
adaptive abstract coarse space, as well as most classical two-level methods
iv
1 2 3 4 5 6 716 77 78 8
ested in having a quick and partial view and already familiar with Krylov
methods, may very well read only Chapter 1 followed by Chapter 4. For
new comers to Krylov methods, reading of Chapter 3 must be intercalated
between Chapter 1 and Chapter 4.
For a quick view on all Schwarz methods without entering into the technical
details of coarse spaces, one could consider beginning by Chapter 1 followed
by Chapter 2 and then by Chapter 3 on the use of Schwarz methods as
preconditioners, to finish with Chapter 4 on classical coarse spaces.
For the more advanced reader, Chapters 5 and 7 provide the technical frame-
work for the analysis and construction of more sophisticated coarse spaces.
And last, but not least Chapter 8 gives the keys of parallel numerical imple-
mentation and illustrates with numerical results the previously introduced
methods.
vi
Contents
1 Schwarz methods 1
1.1 Three continuous Schwarz Algorithms . . . . . . . . . . . . . . 1
1.2 Connection with the Block Jacobi algorithm . . . . . . . . . . 6
1.3 discrete partition of unity . . . . . . . . . . . . . . . . . . . . . . 9
1.3.1 Two subdomain case in one dimension . . . . . . . . . 10
1d Algebraic setting . . . . . . . . . . . . . . . . . . . . . 10
1d Finite element decomposition . . . . . . . . . . . . . 12
1.3.2 Multi dimensional problems and many subdomains . . 13
Multi-D algebraic setting . . . . . . . . . . . . . . . . . . 13
Multi-D finite element decomposition . . . . . . . . . . 14
1.4 Iterative Schwarz methods: RAS, ASM . . . . . . . . . . . . . 15
1.5 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 1d case: a geometrical analysis . . . . . . . . . . . . . . 16
1.5.2 2d case: Fourier analysis for two subdomains . . . . . . 17
1.6 More sophisticated Schwarz methods: P.L. Lions Algorithm . 19
1.7 Schwarz methods using FreeFem++ . . . . . . . . . . . . . . . 21
1.7.1 A very short introduction to FreeFem++ . . . . . . . . 21
1.7.2 Setting the domain decomposition problem . . . . . . . 26
1.7.3 Schwarz algorithms as solvers . . . . . . . . . . . . . . . 36
1.7.4 Systems of PDEs: the example of linear elasticity . . . 38
1
2 CONTENTS
3 Krylov methods 91
3.1 Fixed point iterations . . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Krylov spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.2.1 Gradient methods . . . . . . . . . . . . . . . . . . . . . . 96
3.3 The Conjugate Gradient method . . . . . . . . . . . . . . . . . 97
3.3.1 The Preconditioned Conjugate Gradient Method . . . 102
3.4 Krylov methods for non-symmetric problems . . . . . . . . . . 104
3.4.1 The GMRES method . . . . . . . . . . . . . . . . . . . . 106
3.4.2 Convergence of the GMRES algorithm . . . . . . . . . 109
3.5 Krylov methods for ill-posed problems . . . . . . . . . . . . . . 111
3.6 Schwarz preconditioners using FreeFem++ . . . . . . . . . . . 114
Schwarz methods
(u) = f in (1.1)
u = 0 on .
1 2
Figure 1.1: A complex domain made from the union of two simple geometries
1
2 CHAPTER 1. SCHWARZ METHODS
2 ) by:
1 and 2 . It updates (un1 , un2 ) (u1n+1 , un+1
(un+1
1 ) = f in 1 (un+1
2 ) = f in 2
un+1
1 = 0 on 1 then, un+1
2 = 0 on 2
n+1
u1 = un2 on 1 2 . n+1
u2 = un+1
1 on 2 1 .
(1.2)
H. Schwarz proved the convergence of the algorithm and thus the well-
posedness of the Poisson problem in complex geometries.
With the advent of digital computers, this method also acquired a prac-
tical interest as an iterative linear solver. Subsequently, parallel computers
became available and a small modification of the algorithm [124] makes it
suited to these architectures. Its convergence can be proved using the max-
imum principle [123].
(un+1
i ) =f in i
un+1
i =0 on i (1.3)
un+1
i = un3i on i 3i .
(e ) = 0 in 12
e = 0 on 12
and thus e = 0 .
Algorithms (1.2) and (1.3) act on the local functions (ui )i=1,2 . In order
to write algorithms that act on global functions in H 1 (), the space in which
problem (1.1) is naturally posed, we need extension operators and partitions
of unity.
There are two ways to write related algorithms that act on functions un
H 1 (). They are given in Definitions 1.1.4 and 1.1.5.
(win+1 ) = f in i , win+1 = un on i 3i
(1.5)
win+1 = 0 on i .
and then gluing them together using the partition of unity functions:
2
un+1 = Ei (i win+1 ) . (1.6)
i=1
holds for all n 0. Assume the property holds at step n of the algorithm.
Then, using the fact that 1 = 1 and 2 = 0 on 1 2 we have by definition
that w1n+1 is a solution to BVP (1.3):
(w1n+1 ) =f in 1 ,
w1n+1 =0 on 1 ,
2 (1.8)
w1n+1 = un = Ei (i uni ) = un2 on 1 2 .
i=1
un = E1 (1 un1 ) + E2 (2 un2 ) ,
u0 = E1 (1 u01 ) + E2 (2 u02 )
un = un2 on 1 2 .
Finally from (1.10) and (1.11) we can conclude that un1 + v1n = un+1
1 satisfies
n+1
problem (1.3) and is thus equal to u1 . The same holds for domain 2 ,
un2 + v2n = un+1
2 . Then equation (1.9) reads
un+1 = E1 (1 un+1
1 ) + E2 (2 u2 )
n+1
which ends the proof of the equivalence between Schwarz algorithm and the
continuous RAS algorithm (1.12)-(1.13)-(1.15).
rn = f + (un ) (1.12)
rn = f + (un ) (1.16)
3. Update un :
un+1 = un + E1 (v1n ) + E2 (v2n ) . (1.18)
To sum up, starting from the original Schwarz algorithm (1.2) that is
sequential, we have thus three continuous algorithms that are essentially
parallel:
6 CHAPTER 1. SCHWARZ METHODS
Let U1 = (Uk )kN1 = UN1 , U2 = (Uk )kN2 = UN2 and similarly F1 = FN1 ,
F2 = FN2 .
The linear system has the following block form:
A11 A12 U F
( )( 1 ) = ( 1 )
A21 A22 U2 F2
or equivalently,
Un+1 = Un + D1 (F AUn ) = Un + D1 rn ,
A11 0 U1 F1 A12 U2
n+1 n
= . (1.21)
0 A22 Un+1
2
F2 A21 Un1
A1
11 0 ) = RT A1 R = RT (R ART )1 R
( 1 11 1 1 1 1 1
0 0
and
0 0
( ) = R2T A1
22 R2 = R2 (R2 AR2 ) R2 ,
T T 1
0 A1
22
8 CHAPTER 1. SCHWARZ METHODS
1 2
1 xms xms +1 2
Figure 1.2: Domain decomposition with minimal overlap and partition of
unity
u = f, in
u(0) = u(1) = 0.
Remark 1.2.1 In conclusion when the overlap is minimal the discrete coun-
terparts of the three Schwarz methods of section 1.1 are equivalent to the
same block Jacobi algorithm. Notice here a counter-intuitive feature: a non
overlapping decomposition of the set of indices N corresponds to a geometric
decomposition of the domain with minimal overlap.
A function u R.
A vector U R#N .
or in other words
N
Id = RiT Di Ri (1.25)
i=1
In the following we will give some simple examples where all the ingre-
dients of the Definition 1.3.1 are detailed and we will check that (1.25) is
verified in those cases.
1 0 0 0 0 0 0 0 1 0
R1 = 0 1 0 0 0 and R2 = ( ),
0 0 1 0 0 0 0 0 0 1
1.3. DISCRETE PARTITION OF UNITY 11
1 2 3 4 5
N1 N2
Figure 1.3: Algebraic partition of the set of indices
and
1 0 0 0 0
0 1 0 0 0
R1 =
T
0 0 1 and R2 =
T
0 0
.
0 0 1 0
0
0 0 0 0 1
We also have
1 0 0 1 0
D1 = 0 1 0 and D2 = ( ).
0 0 1 0 1
1 2 3 4 5
N1=1 N2=1
Figure 1.4: Algebraic decomposition of the set of indices into overlapping
subsets
Consider now the case where each subset is extended with a neighboring
point, see Figure 1.4:
N1=1 = {1, 2, 3, 4} and N2=1 = {3, 4, 5} .
Then, matrices R1 and R2 are:
1 0 0 0 0
0 1 0 0 0 0 0 1 0 0
R1 = and R2 = 0 0 0 1 0 .
0 0 1 0 0 0 0 0 0 1
0 0 0 1 0
The simplest choices for the partition of unity matrices are
1 0 0 0
0 1 0 0 0 0 0
D1 = and D2 = 0 1 0
0 0 1 0 0 0 1
0 0 0 0
12 CHAPTER 1. SCHWARZ METHODS
or
1 0 0 0
0 1 0 0 1/2 0 0
D1 = and D2 = 0 1/2 0 .
0 0 1/2 0 0
0 0 1
0 0 1/2
Again, it is clear that relation (1.25) holds.
1 2 3 4 5
1 2
Figure 1.5: Finite element partition of the mesh
1 0 0 0 0 0 0 1 0 0
R1 = 0 1 0 0 0 and R2 = 0 0 0 1 0 .
0 0 1 0 0 0 0 0 0 1
In order to satisfy relation (1.25), the simplest choice for the partition of
unity matrices is
1 0 0 1/2 0 0
D1 = 0 1 0 and D2 = 0 1 0
0 0 1/2 0 0 1
Consider now the situation where we add a mesh to each subdomain, see
Figure 1.6. Accordingly, the set of indices is decomposed as:
1 0 0 0 0 0 1 0 0 0
0 1 0 0 0 0 0 1 0 0
R1 = and R2 = .
0 0 1 0 0 0 0 0 1 0
0 0 0 1 0 0 0 0 0 1
1.3. DISCRETE PARTITION OF UNITY 13
1 2 3 4 5
=1
1 =1
2
Figure 1.6: Finite element decomposition of the mesh into overlapping sub-
domains
In order to satisfy relation (1.25), the simplest choice for the partition of
unity matrices is
1 0 0 0 0 0 0 0
0 1 0 0 0 1/2 0 0
D1 = and D2 = .
0 0 1/2 0 0 0 1 0
0 0 0 0 0 0 0 1
1 0 0 0 1/2 0 0 0
0 1/2 0 0 0 1/2 0 0
D1 = and D2 = .
0 0 1/2 0 0 0 1/2 0
0 0 0 1/2 0 0 0 1
Let Ri be the restriction matrix from set N to the subset Ni and Di the
identity matrix of size #Ni #Ni , 1 i N . Then, relation (1.25) is
satisfied.
N1 N1=1
N3 N3=1
N2 N2=1
Consider now the case where each subset Ni is extended with its direct
neighbors to form Ni=1 , see Figure 1.7. Let Ri be the restriction matrix from
set N to the subset Ni=1 and Di be a diagonal matrix of size #Ni=1 #Ni=1 ,
1 i N . For the choice of the coefficients of Di there are two main options.
The simplest one is to define it as a Boolean matrix:
1 if j Ni ,
(Di )jj = {
0 if j Ni=1 /Ni .
Mj = {1 i N j Ni=1 } .
Then, define
(Di )jj = 1/#Mj , for j Ni=1 .
Figure 1.8: Left: Finite element partition; Right: one layer extension of the
right subdomain
Un+1 = Un + MASM
1
rn , rn = F A Un
d2 en+1 d2 e2n+1
1
= 0 in (0, L1 ) = 0 in (l2 , L)
dx2 then, dx2
1 (0) = 0
en+1 2 (l2 ) = e1 (l2 )
en+1 n+1
(1.31)
Thus the errors are affine functions in each subdomain:
x Lx
1 (x) = e2 (L1 )
en+1 2 (x) = e1 (l2 )
n
and en+1 n+1
.
L1 L l2
Thus, we have
L L1 l2 L L1
2 (L1 ) = e1 (l2 )
en+1 = en2 (L1 )
n+1
.
L l2 L1 L l2
1.5. CONVERGENCE ANALYSIS 17
e02
e11 e12
e21 e22
e31
x
0 l2 L1 L
Figure 1.9: Convergence of the Schwarz method
l2 L l2 n 1 /(L l2 ) n
2 (L1 ) =
en+1 e2 (L1 ) = e2 (L1 ) .
l2 + L l2 1 + /l2
We see that the following quantity is the convergence factor of the algorithm
1 /(L l2 )
1 =
1 + /l2
( )(u) = f in R2 ,
u is bounded at infinity ,
18 CHAPTER 1. SCHWARZ METHODS
( )(un+1
1 ) = f (x, y), (x, y) 1
(1.32)
1 (, y) = u2 (, y),
un+1 yR
n
and
( )(un+1
2 ) = f (x, y), (x, y) 2
(1.33)
2 (0, y) = u1 (0, y),
un+1 yR
n
j , j = 1, 2 bounded at infinity.
with the local solutions un+1
In order to compute the convergence factor, we introduce the errors
eni = uni ui , i = 1, 2.
1 ) = f (x, y),
( )(en+1 (x, y) 1
(1.34)
1 (, y) = e2 (, y),
en+1 yR
n
and
( )(en+1
2 ) = f (x, y), (x, y) 2
(1.35)
2 (0, y) = e1 (0, y),
en+1 yR
n
with en+1
j bounded at infinity.
By taking the partial Fourier transform of the first line of (1.34) in the
y direction we get:
2
( 1 (x, k)) = 0
+ k 2 ) (en+1 in 1 .
x2
Therefore we have
Since the solution must be bounded at x = , this implies that n+1 (k) 0.
Thus we have
1 (x, k) = + (k) exp( (k)x)
en+1 n+1 +
1.6. MORE SOPHISTICATED SCHWARZ METHODS: P.L. LIONS ALGORITHM19
n+1
with 1,2 to be determined. From the interface conditions we get
and
2n+1 (k) = 1n (k) exp(+ (k)).
Combining these two and denoting (k) = + (k) = (k), we get for i = 1, 2,
In p u t in te rp re ta tio n :
Plo t:
0.8
0.6
0.4
0.2 k 20 .1
k 20 .1
0 .5
1 2 3 4 5 6 7
k
Figure 1.10: Convergence rate of the Schwarz method for = .1, = 0.5 (red
curve) or = 1 (blue curve).
n2 n1
1 2 1 n2 n1 2
Figure 1.11: Outward normals for overlapping and non overlapping subdo-
mains for P.L. Lions algorithm.
Ge n e ra tebdy Wo lfra m |Alp h(www.wo
a lfra m a lp h a .co m
o n) Octo b e r2 6 , 2 0 11 fro m Ch a m p a igIL.
n,
Wo lfra mAlp h aLLCA Wolfram Research Company
1.7. SCHWARZ METHODS USING FREEFEM++ 21
(un+1
1 ) = f in 1 ,
un+1
1 = 0 on 1 , (1.37)
( + ) (un+1
1 ) = ( + ) (un2 ) on 1 2 ,
n1 n1
and
(un+1
2 ) = f in 2 ,
un+1
2 = 0 on 2 (1.38)
( 2 ) = (
+ ) (un+1 + ) (un1 ) on 2 1
n2 n2
where n1 and n2 are the outward normals on the boundary of the subdo-
mains, see Figure 1.11.
u.vdx f v dx = 0, v H0 () .
1
u.vdx f v dx = 0, v H () .
1
where (i )1iM are the finite element functions. Note that the discretized
system corresponds to a Neumann problem. Dirichlet conditions of the type
u = g are then implemented by penalty, namely by setting
1
Tres Grande Valeur (Terrifically Great Value) = Very big value in French
1.7. SCHWARZ METHODS USING FREEFEM++ 23
4 2
The function square returns a structured mesh of the square: the first two
arguments are the number of mesh points according to x and y directions
and the third one is a parametrization of for x and y varying between 0
and 1 (here it is the identity). The sides of the square are labeled from 1 to
4 in trigonometrical sense (see Figure 1.2).
//Mesh definition
mesh Th=square(Nbnoeuds,Nbnoeuds,[x,y]);
We define the function representing the right-hand side using the keyword
func
// Functions of x and y
14 func f=xy;
func g=1.;
and the P 1 finite element space Vh over the mesh Th using the keyword
fespace
uh .vh dx f vh dx = 0, vh Vh .
Note that keyword problem defines problem (1.39) without solving it. The
parameter solver sets the method that will be used to solve the resulting
linear system, here a Gauss factorization. In order to effectively solve the
finite element problem, we need the command
The FreeFem++ script can be saved with your favorite text editor (e.g.
under the name heat.edp). In order to execute the script FreeFem++, it
is enough to write the shell command FreeFem++ heat.edp. The result
will be displayed in a graphic window.
One can easily modify the script in order to solve the same kind of problems
1.7. SCHWARZ METHODS USING FREEFEM++ 25
uh .vh dx + uh vh gvh f vh dx = 0,
3 4 3 4
If one wants to use some linear algebra package to solve the linear system
resulting from the finite element discretisation, the program below shows
26 CHAPTER 1. SCHWARZ METHODS
how one can retrieve first the stiffness matrix and the vector associated
to the right-hand side of the variational formulation. As a general rule,
this procedure can be very useful if one wants to use other solvers such
as domain decomposition methods. Here, the linear system is solved by
UMFPACK [37].
if (withmetis)
2 {
metisdual(lpart,Th,npart); // FreeFem++ interface to Metis
for(int i=0;i<lpart.n;++i)
part[][i]=lpart[i];
6 }
else
{
Ph xx=x,yy=y;
10 part= int(xx/allongnn)mm + int(yymm);
}
if (verbosity > 1)
plot(part,wait=1,fill=1,value=1);
Using the function part defined as above as an argument into the routine
SubdomainsPartitionUnity, well get as a result, for each subdomain
labeled i the overlapping meshes aTh[i]:
1.7. SCHWARZ METHODS USING FREEFEM++ 29
load medit
3 verbosity=2;
include dataGENEO.edp
include decomp.idp
include createPartition.idp
7 SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
// check the partition of unity
Vh sum=0,fctone=1;
for(int i=0; i < npart;i++)
11 {
Vh localone;
real[int] bi = Rih[i]fctone[]; // restriction to the local domain
real[int] di = Dih[i]bi;
15 localone[] = Rih[i]di;
sum[] +=localone[] ;
plot(localone,fill=1,value=1, dim=3,wait=1);
}
19 plot(sum,fill=1,value=1, dim=3,wait=1);
load metis
load medit
int nn=2,mm=2,ll=2; // number of the domains in each direction
4 int npart= nnmmll; // total number of domains
int nloc = 11; // local no of dof per domain in one direction
bool withmetis = 1; // =1 (Metis decomp) =0 (uniform decomp)
int sizeovr = 2; // size of the overlap
8 real allongx, allongz;
allongx = real(nn)/real(mm);
allongz = real(ll)/real(mm);
// Build the mesh
12 include cube.idp
int[int] NN=[nnnloc,mmnloc,llnloc];
real [int,int] BB=[[0,allongx],[0,1],[0,allongz]]; // bounding box
int [int,int] L=[[1,1],[1,1],[1,1]]; // the label of the 6 faces
16 mesh3 Th=Cube(NN,BB,L); // left,right,front, back, down, right
fespace Vh(Th,P1);
fespace Ph(Th,P0);
Ph part; // piecewise constant function
20 int[int] lpart(Ph.ndof); // giving the decomposition
// domain decomposition data structures
mesh3[int] aTh(npart); // sequence of ovr. meshes
matrix[int] Rih(npart); // local restriction operators
24 matrix[int] Dih(npart); // partition of unity operators
int[int] Ndeg(npart); // number of dof for each mesh
real[int] VolumeThi(npart); // volume of each subdomain
matrix[int] aA(npart); // local Dirichlet matrices
28 Vh[int] Z(npart); // coarse space
// Definition of the problem to solve
// Delta (u) = f, u = 1 on the global boundary
Vh intern;
32 intern = (x>0) && (x<allongx) && (y>0) && (y<1) && (z>0) && (z<allongz);
Vh bord = 1intern;
macro Grad(u) [dx(u),dy(u),dz(u)] // EOM
func f = 1; // right hand side
36 func g = 1; // Dirichlet data
Vh rhsglobal,uglob; // rhs and solution of the global problem
varf vaglobal(u,v) = int3d(Th)(Grad(u)Grad(v))
+on(1,u=g) + int3d(Th)(fv);
40 matrix Aglobal;
// Iterative solver
real tol=1e10; // tolerance for the iterative method
int maxit=200; // maximum number of iterations
Then we have to define a piecewise constant function part which takes inte-
ger values. The isovalues of this function implicitly defines a non overlapping
partition of the domain. Suppose we want a decomposition of a rectangle
1.7. SCHWARZ METHODS USING FREEFEM++ 33
1 if (withmetis)
{
metisdual(lpart,Th,npart);
for(int i=0;i<lpart.n;++i)
5 part[][i]=lpart[i];
}
else
{
9 Ph xx=x,yy=y, zz=z;
part= int(xx/allongxnn)mmll + int(zz/allongzll)mm+int(ymm);
}
As in the 2D case, these last two functions are tricky. The reader does not
need to understand their behavior in order to use them. They are given
here for sake of completeness.
include data3d.edp
include decomp3d.idp
3 include createPartition3d.idp
medit(part, Th, part, order = 1);
SubdomainsPartitionUnity3(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,VolumeThi);
// check the partition of unity
7 Vh sum=0,fctone=1;
for(int i=0; i < npart;i++)
{
Vh localone;
11 real[int] bi = Rih[i]fctone[]; // restriction to the local domain
real[int] di = Dih[i]bi;
localone[] = Rih[i]di;
sum[] +=localone[] ;
15 medit(loc,Th, localone, order = 1);
medit(subdomains,aTh[i]);
}
medit(sum,Th, sum, order = 1);
Then we need to define the global data from the variational formula-
tion.
9 Aglobal = vaglobal(Vh,Vh,solver = UMFPACK); // global matrix
rhsglobal[] = vaglobal(0,Vh); // global rhs
uglob[] = Aglobal1 rhsglobal[];
plot(uglob,value=1,fill=1,wait=1,cmm=Solution by a direct method,dim=3);
for(int i = 0;i<npart;++i)
{
17 cout << Domain : << i << / << npart << endl;
matrix aT = AglobalRih[i];
aA[i] = Rih[i]aT;
set(aA[i],solver = UMFPACK);// direct solvers
21 }
ofstream filei(Conv.m);
25 Vh un = 0; // initial guess
Vh rn = rhsglobal;
for(int iter = 0;iter<maxit;++iter)
{
29 real err = 0, res;
Vh er = 0;
for(int i = 0;i<npart;++i)
{
33 real[int] bi = Rih[i]rn[]; // restriction to the local domain
real[int] ui = aA[i] 1 bi; // local solve
bi = Dih[i]ui;
// bi = ui; // uncomment this line to test the ASM method as a solver
37 er[] += Rih[i]bi;
}
un[] += er[]; // build new iterate
rn[] = Aglobalun[]; // computes global residual
41 rn[] = rn[] rhsglobal[];
rn[] = 1;
err = sqrt(er[]er[]);
res = sqrt(rn[]rn[]);
45 cout << Iteration: << iter << Correction = << err << Residual =
<< res << endl;
plot(un,wait=1,value=1,fill=1,dim=3,cmm=Approximate solution at step +
iter);
int j = iter+1;
// Store the error and the residual in Matlab/Scilab/Octave form
49 filei << Convhist(+j+,:)=[ << err << << res <<]; << endl;
if(err < tol) break;
}
plot(un,wait=1,value=1,fill=1,dim=3,cmm=Final solution);
2
10
overlap=2
1 overlap=5
10
overlap=10
0
10
1
10
2
10
3
10
4
10
5
10
6
10
0 10 20 30 40 50 60
Figure 1.15: Solution and RAS convergence as a solver for different overlaps
The result of tracing the evolution of the error is shown in Figure 1.15 where
one can see the convergence history of the RAS solver for different values of
the overlapping parameter.
Remark 1.7.1 Previous tests have shown a very easy use of the RAS iter-
ative algorithm and some straightforward conclusions from this.
Note that it is very easy to test the ASM method, see eq. (1.30), when
used as a solver. It is sufficient to uncomment the line bi = ui;.
Running the program shows that the ASM does not converge. For this
reason, the ASM method is always used a preconditioner for a Krylov
method such as CG, GMRES or BiCGSTAB, see chapter 3.
In the the three-dimensional case the only part that changes is the
decomposition into subdomains. The other parts of the algorithm are
identical.
include ../../FreefemCommon/data3d.edp
include ../../FreefemCommon/decomp3d.idp
4 include ../../FreefemCommon/createPartition3d.idp
SubdomainsPartitionUnity3(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,VolumeThi);
load metis
2 load medit
int nn=3,mm=3; // number of the domains in each direction
int npart= nnmm; // total number of domains
int nloc = 20; // local no of dof per domain in one direction
6 bool withmetis = 1; // =1 (Metis decomp) =0 (uniform decomp)
int sizeovr = 2; // size of the overlap
real allong = real(nn)/real(mm); // aspect ratio of the global domain
func E = 21011; // Young modulus ans Poisson ratio
10 func sigma = 0.3;
func lambda = Esigma/((1+sigma)(12sigma)); // Lame coefficients
func mu = E/(2(1+sigma));
real sqrt2=sqrt(2.);
14 func eta = 1.0e6;
// Mesh of a rectangular domain
mesh Th=square(nnnloc,mmnloc,[xallong,y]);
fespace Vh(Th,[P1,P1]); // vector fem space
18 fespace Uh(Th,P1); // scalar fem space
fespace Ph(Th,P0);
Ph part; // piecewise constant function
int[int] lpart(Ph.ndof); // giving the decomposition
22 // Domain decomposition data structures
mesh[int] aTh(npart); // sequence of ovr. meshes
matrix[int] Rih(npart); // local restriction operators
matrix[int] Dih(npart); // partition of unity operators
26 int[int] Ndeg(npart); // number of dof for each mesh
real[int] AreaThi(npart); // area of each subdomain
matrix[int] aA(npart); // local Dirichlet matrices
// Definition of the problem to solve
30 int[int] chlab=[1,11 ,2,2 ,3,33 ,4,1 ]; //Dirichlet conditions for label = 1
Th=change(Th,refe=chlab);
macro Grad(u) [dx(u),dy(u)] // EOM
macro epsilon(u,v) [dx(u),dy(v),(dy(u)+dx(v))/sqrt2] // EOM
34 macro div(u,v) ( dx(u)+dy(v) ) // EOM
func uboundary = (0.25 (y0.5)2);
varf vaBC([u,v],[uu,vv]) = on(1, u = uboundary, v=0) + on(11, u = 0, v=0) +
on(33, u=0,v=0);
// global problem
38 Vh [rhsglobal,rrhsglobal], [uglob,uuglob];
macro Elasticity(u,v,uu,vv) eta(uuu+vvv) +
lambda(div(u,v)div(uu,vv))+2.mu( epsilon(u,v)epsilon(uu,vv) ) //
EOM
varf vaglobal([u,v],[uu,vv]) = int2d(Th)(Elasticity(u,v,uu,vv)) + vaBC; //
on(1,u=uboundary,v=0)
matrix Aglobal;
42 // Iterative solver parameters
real tol=1e6; // tolerance for the iterative method
int maxit=200; // maximum number of iterations
include ../../FreefemCommon/dataElast.edp
include ../../FreefemCommon/decomp.idp
4 include ../../FreefemCommon/createPartitionVec.idp
SubdomainsPartitionUnityVec(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
Then we need to define the global data from the variational formula-
tion.
42 CHAPTER 1. SCHWARZ METHODS
27 ofstream filei(Conv.m);
Vh [un,uun] = [0,0]; // initial guess
Vh [rn,rrn] = [rhsglobal,rrhsglobal];
for(int iter = 0;iter<maxit;++iter)
31 {
real err = 0, res;
Vh [er,eer] = [0,0];
for(int i = 0;i<npart;++i)
35 {
real[int] bi = Rih[i]rn[]; // restriction to the local domain
real[int] ui = aA[i] 1 bi; // local solve
bi = Dih[i]ui;
39 // bi = ui; // uncomment this line to test the ASM method as a solver
er[] += Rih[i]bi;
}
un[] += er[]; // build new iterate
43 rn[] = Aglobalun[]; // computes global residual
rn[] = rn[] rhsglobal[];
rn[] = 1;
err = sqrt(er[]er[]);
47 res = sqrt(rn[]rn[]);
cout << Iteration: << iter << Correction = << err << Residual =
<< res << endl;
int j = iter+1;
// Store the error and the residual in Matlab/Scilab/Octave form
51 filei << Convhist(+j+,:)=[ << err << << res <<]; << endl;
if(res < tol) break;
}
mesh Thm=movemesh(Th,[x+coeff2un,y+coeff2uun]);
55 medit(Thm, Thm);
medit(uh, Th, un, Th, uun, order=1);
43
44 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
n2 n1
1 2 1 n2 n1 2
Figure 2.1: Outward normals for overlapping and non overlapping subdo-
mains for P.L. Lions algorithm.
2 ) = f
(un+1 in 2 ,
un+1
2 = 0 on 2 (2.2)
( 2 ) = (
+ ) (un+1 + ) (un1 ) on 2 1
n2 n2
where n1 and n2 are the outward normals on the boundary of the subdo-
mains, see Figure 2.1.
We use Fourier transform to analyze the benefit of the Robin interface con-
ditions in a simple case. The domain = R2 is decomposed into two half-
planes 1 = (, ) R and 2 = (0, ) R with 0. We consider the
example of a symmetric positive definite problem
( )(u) = f in R2 ,
and
( )(un+1
2 ) = f (x, y), (x, y) 2
n+1
u2 is bounded at infinity (2.4)
( + ) (u2 )(0, y) = (
n+1
+ ) (un1 )(0, y), yR
n2 n2
( )(en+1
1 ) = 0, (x, y) 1
n+1
e1 is bounded at infinity (2.5)
( + ) (e1 )(, y) = (
n+1
+ ) (en2 )(, y), y R
n1 n1
and
2 ) = 0,
( )(en+1 (x, y) 2
n+1
e2 is bounded at infinity (2.6)
( + ) (e2 )(0, y) = (
n+1
+ ) (en1 )(0, y), y R
n2 n2
By taking the partial Fourier transform of the first line of (2.5) in the y
direction we get:
2
( 1 (x, k)) = 0
+ k 2 ) (en+1 for x < and k R.
x2
n+1
with 1,2 to be determined. From the interface conditions we get
and
2n+1 (k)( + ) = 1n (k)(+ + ) exp(+ (k)).
Combining these two and denoting (k) = + (k) = (k), we get for i = 1, 2,
with
(k)
(k, ; ) = exp((k)) (2.7)
(k) +
where (k) = + k 2 and > 0.
(k)
(k, 0; ) = < 1. (2.8)
(k) +
2.1. P.L. LIONS ALGORITHM 47
1
1 Improvement factor
with Robin interface conditions
Dirichlet interface conditions
0.8
0.8
Convergence Rate
0.6
0.6
0.4 0.4
0.2 0.2
0 0
0 5 10 15 20 25 30 35 40 0 5 10 15 20 25 30 35 40
Fourier number Fourier number
Figure 2.2: Left: Convergence factor (dotted line: Dirichlet ICs, solid line:
Robin ICs) vs. Fourier number.
Right: Improvement factor due to Robin interface conditions (2.8) vs.
Fourier number
(x)ij = (x)ji 0
ij (x) = 0 on ij .
In this case the operators ij = ji defined in (2.9) have the following re-
markable properties
(x)un+1
i ((x)un+1i ) = f in i ,
uin+1
= 0 on i
((x) + ij ) un+1
i = ((x) + ij ) unj on ij .
ni nj
(2.10)
ui 1/2 uj 1/2
((x) ) + ij (ui ) = ij ((x) ) + ij (uj ) on ij .
1/2 1/2
ij
ni nj
The convergence proof of the algorithm (2.10) follows the arguments given
in [33] and is based on a preliminary result which is an energy estimate
(x)u ((x)u) = 0 in i
u = 0 on i ,
Then,
2
1 ui
(x)ui 2 + (x)ui 2 + ( [(x) (u )])
1/2
ij i
i 4 ji ij ij ni
2
1 ui
= (ij [(x) + ij (ui )])
1/2
4 ji ij ni
2.1. P.L. LIONS ALGORITHM 49
(x)ui ((x)ui ) = 0 in i ,
ui ui
(x)ui 2 + (x)ui 2 = (x) ui = (x) ui
i i ni ji ij ni
ui 1/2 1/2
= (x) ij (ui )
ji ij ni ij
ui 1/2
= ij ((x) ) ij (ui )
1/2
ji ij ni
2
1 ui 1/2
(x)ui 2 + (x)ui 2 + ( ((x) ) (u ))
1/2
ij ij i
i ji 4 ij ni
2
1 ui 1/2
= (ij ((x) ) + ij (ui ))
1/2
ji 4 ij ni
(x)en+1
i ((x)en+1
i ) = 0 in i ,
ein+1
= 0 on i
en+1 1/2 en 1/2
ij ((x) i ) + ij (en+1 ) = ij ((x) njj ) + ij (enj )
1/2 1/2
i on ij .
ni
Let us denote
N
E n+1 = Ein+1 and C n = Cji
n
,
i=1 i,j(ji)
we have
E n+1 + C n+1 = C n .
Hence, by summation over n, we get
E C0.
n+1
n=0
This series is convergent only if E 0 as n , which proves that for all
n
The same kind of proof holds for the Maxwell system [45] and the convection-
diffusion equation [140].
1 2u
u = f (t, x, y)
c2 t2
which models for instance pressure variation with a source term f and a
sound velocity c. When the source is time periodic, it makes sense to look
for time periodic solutions as well. With some abuse of notation, let f =
f (x, y) exp(it) (i2 = 1) be a harmonic source term of frequency , we seek
the solution to the acoustic wave equation in the form u = u(x, y) exp(it).
Then, u must be a solution to the Helmholtz equation
2
L(u) = ( ) (u) = f (x, y), x, y
c2
An extra difficulty comes from the non positivity of the Helmholtz operator
due to the negative sign of the term of order zero. More precisely, a Schwarz
algorithm for solving Helmholtz equation involves the decomposition of do-
main into N overlapping subdomains (j )1jN , the solving of Dirichlet
problems in the subdomains:
2
( c2 )(un+1
j ) = f (x, y), in j , 1 j N
(2.12)
un+1
j = un , on j
N
un+1 = j un+1
j .
j=1
Moreover, even when the local problems are well-posed, a bad convergence is
to be expected. We present the analysis in the case of the plane divided into
two subdomains although it is very similar to the elliptic case considered in
1.5.2. We introduce the wave number
= .
c
Consider the domain is = R2 with the Sommerfeld radiation condition at
infinity,
u
lim r ( + iu) = 0,
r r
where r = x2 + y 2 . We decompose it into two subdomains with or without
overlap, 0, 1 = (, ) R and 2 = ( 0, ) R and consider the
Schwarz algorithm
u1n+1 2 un+1
1 = f (x, y), x, y 1
u1 (, y) = un2 (, y) , y R
n+1
(2.13)
u n+1
lim r ( 1 + iu1n+1 ) = 0
r r
and
un+1
2 2 un+1
2 = f (x, y), x, y 2
un+1
2 (0, y) = un1 (0, y) , y R
(2.14)
u n+1
lim r ( 2 + iu2n+1 ) = 0 .
r r
For the convergence analysis it suffices by linearity to consider the case
f (x, y) = 0 and to analyze convergence to the zero solution. Let the Fourier
transform in y direction be denoted by
and
2 un+1
2
( 2 k 2 )u2n+1 = 0,
x2
x > 0, k R (2.16)
2 (0, k) = u1 (0, k)
un+1 n
2.2. HELMHOLTZ PROBLEMS 53
un+1
j = Aj e(k)x + Bj e(k)x , j = 1, 2,
Note that the main difference with the elliptic case is that (k) is a complex
valued function. It takes real values for the vanishing modes k and
purely imaginary values for propagative modes k < . Since the Sommerfeld
radiation condition excludes growing solutions as well as incoming modes at
infinity, we obtain the local solutions
1 (x, k) = u1 (, k)e
un+1 n+1 (k)(x)
(2.18)
2 (x, k) = u2 (0, k)e
un+1 n+1 (k)x
.
where (k) is given by (2.17). By induction, the following result follows for
even n
un1 (0, k) = (k, )n u01 (0, k), un2 (0, k) = (k, )n u02 (0, k).
Remark 2.2.1 We can distinguish a few features from the expression of the
convergence factor, see Figure 2.3:
The main difference with the elliptic case is that (k, ) is a complex
valued function. The convergence will occur if the modulus of is
smaller than one, (k, ) < 1.
54 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
1.2
Convergence rate for omega = 10
0.8
0.6
0.4
0.2
0
0 5 10 15 20 25 30 35 40
Fourier number
Figure 2.3: Convergence factor (2.20) of the Schwarz method for Helmholtz
equation vs. Fourier number
For vanishing modes k > , (k) is negative and real so that these
modes converge faster for larger overlaps. But we have (k, ) 1 for
propagative modes k < whatever the overlap is since there (k)
iR. This prevents the convergence of the Schwarz algorithm for the
Helmholtz equation.
A possible fix is the use of a relaxation parameter :
uin+1 = un+1
i + (1 )uni .
The choice of the optimal parameter is not easy and the overall
convergence rate is not good anyway.
It is thus clear that the Schwarz method cannot be used safely nor efficiently
as a solver for the Helmholtz equation.
N
un+1 = j un+1
j .
j=1
This algorithm fixes the two drawbacks of the Schwarz algorithm explained
in 2.2.1. First, note that the solution to the boundary value problem (2.21)
is unique even if 2 is an eigenvalue of the Laplace operator with Dirichlet
boundary conditions. Indeed, suppose for some subdomain k (1 k N )
we have a function v k C that satisfies:
2 v v = 0 in k
( + i) (v) = 0 on k .
nk
Multiplying the equation by the conjugate of v and integrating by parts, we
get:
v
v + v v = 0 .
2 2 2
k k nk
2 v2 + v2 + i v2 = 0 .
k k
v2 = 0 ,
k
un+11 2 un+1
1 = f (x, y), x, y 1
( + i) (u1 )(, y) = (
n+1
+ i) un2 (, y) , y R
n1 n1 (2.22)
un+1
lim r ( 1 + iun+1 1 ) = 0
r r
56 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
1.2
Convergence rate for omega = 10
0.8
0.6
0.4
0.2
0
0 5 10 15 20 25 30 35 40
Fourier number
and
un+1
2 2 un+1
2 = f (x, y), x, y 2
( + i) un+1
2 (0, y) = ( + i) un1 (0, y) , y R
n2 n2 (2.23)
un+1
lim r ( 2 + iun+1 2 ) = 0.
r r
(k) i
(k, ) = exp((k) ) (2.24)
(k) + i
see Figure 2.4. Note that for propagative modes k < , exp((k)) =
1 since (k) is purely imaginary. Therefore the convergence comes from
the Robin condition independently of the overlap. As for vanishing modes
k > , is real. The modulus of the rational fraction in (2.24) is 1 and
2.3. IMPLEMENTATION ISSUES 57
convergence comes from the overlap. Thus for all Fourier modes except k =
, the convergence factor is smaller than one. Thus, the method converges
for almost every Fourier number k.
Consider for example the problem (2.1) in Lions algorithm and let g2n denote
the right-hand side on the interface of subdomain 1 :
g2n = ( + ) (un2 ) on 1 2 . (2.25)
n1
Then, the bilinear form associated to the variational formulation of (2.1) is:
aRobin H 1 (1 ) H 1 (1 ) R
(u, v) z (uv + u v) + u v .
1 1 2
(2.26)
Note that the boundary integral term containing is positive when v = u.
Thus bilinear form aRobin is even more coercive than the one associated with
the Neumann boundary value problem which contains only the subdomain
integral term. The linear form related to (2.1) reads:
lRobin H 1 (1 ) R
(2.27)
v z fv+ g2n v .
1 1 2
{k k N1 } Vh (1 )
58 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
The only difficulty lies now in the discretization of the right-hand side g2n on
the interface. Suppose we have a P 1 (piecewise linear) finite element so that
we can identify a degree of freedom with a vertex of a triangular mesh in 2D.
The variational form aRobin implicitly defines a discretization scheme for the
outward normal derivative un+1 1 /n1 . The corresponding stencil can only
involve vertices that belong to domain 1 , see Figure 2.5. In order to ensure
that the domain decomposition algorithm leads to a solution identical (up to
the tolerance of the iterative solver) to the one which would be obtained by
the original discretization scheme on the whole domain, it seems necessary
to discretize g2n (= un2 /n1 ) using the same stencil. But, the function un2 is
defined by degrees of freedom that are in 2 and thus cannot be the ones
1 /n1 . In the overlapping case, a discretization of the right-
defining un+1
hand side based on the same stencil points could be done but at the expense
of identifying the stencil implicitly defined by the variational formulation.
This is not so easy to implement in a code.
In the next two sections, we give two tricks that ease the implementation so
that the converged solution is equal to the one which would be obtained by
the original discretization scheme on the whole domain. One trick applies
to a non overlapping decomposition and the other one to the overlapping
case only.
1 2
Figure 2.5: Stencil for the outward normal derivative (u1 /n1 ) at the in-
terface between two non overlapping subdomains 1 and 2
level,
( )un+1
1 =f in 1
un+1 un2
1
+ un+1
1 = + un2 on 12
n1 n2
(2.29)
( )un+1
2 =f in 2
un+1 un1
2
+ un+1
2 = + un1 on 12 .
n2 n1
where we have used that on the interface between non overlapping subdo-
mains, we have n1 = n2 , see Figure 2.1. A direct discretization would
require the computation of the normal derivatives along the interfaces in or-
der to evaluate the right-hand sides in the transmission conditions of (2.29).
This can be avoided by re-naming the problematic quantities:
un2 un1
n1 = + un2 and n2 = + un1 .
n2 n1
60 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
( )u1n+1 = f in 1
un+1
1
+ un+1
1 = n1 on 12
n1
(2.30)
( )un+1
2 = f in 2
un+1
2
+ un+1
2 = n2 on 12
n2
n+1
1 =n2 + 2 un+1
2
n+1
2 =n1 + 2 un+1
1 .
un+1 un+1
n+1
1 = 2
+ un+1
2 = ( 2
2 ) + 2 u2
+ un+1 n+1
= n2 + 2 un+1
2
n2 n2
(2.31)
and similarly
un+1 un+1
n+1
2 = 1
+ un+1
1 = ( 1
1 ) + 2 u1
+ un+1 n+1
= n1 + 2 un+1
1 .
n1 n1
(2.32)
Equations (2.31) and (2.32) can be interpreted as a fixed point algorithm in
the new variables j , j = 1, 2, to solve the substructured problem
1 = 2 + 2 u2 (2 , f ) ,
(2.33)
2 = 1 + 2 u1 (1 , f ) ,
( )uj = f in j ,
uj
+ uj = j on 12 .
nj
Instead of solving the substructured problem (2.33) by the fixed point itera-
tion (2.30), one usually uses a Krylov subspace method to solve the substruc-
tured problem. This corresponds to using the optimized Schwarz method as
a preconditioner for the Krylov subspace method.
At this point, we can introduce the algebraic counterparts of the continu-
ous quantities. A finite element discretization of the substructured prob-
lem (2.33) leads to the linear system
1 = 2 + 2 B2 u2
(2.34)
2 = 1 + 2 B1 u1
2.3. IMPLEMENTATION ISSUES 61
ARobin,1 u1 = f1 + B1T 1
(2.35)
ARobin,2 u2 = f2 + B2T 2
We detail now the new matrices and vectors we have introduced:
Vectors u1 , u2 contain the degrees of freedom of the subdomain solu-
tions.
Vectors f1 , f2 are the degrees of freedom related to f .
Matrices B1 and B2 are the trace operators of the domains 1 and 2
on the interface 12 .
In order to define matrices ARobin,j , we first split the two vectors u1 and u2
into interior and boundary degrees of freedom:
ui
j
uj = b , j = 1, 2, (2.36)
uj
where the indices i and b correspond to interior and interface degrees of
freedom respectively for domain j . Then the discrete trace operators B1
and B2 are just the Boolean matrices corresponding to the decomposition
(2.36) and they can be written as
Bj = [0 I] , j = 1, 2, (2.37)
Matrices ARobin,1 and ARobin,2 arise from the discretization of the local
operators along with the interface conditions n + ,
Here Kj are local matrices of problem (as combination of stiffness and mass
matrices) and M12 is the interface mass matrix
The functions n and m are the basis functions associated with the degrees
of freedom n and m on the interface 12 .
For given 1 and 2 , the functions u1 and u2 can be computed by solving
equations (2.35). By eliminating u1 and u2 in (2.34) using (2.35), we obtain
the substructured linear system
F = d, (2.40)
62 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
I I 2 B2 A1 T
Robin,2 B2
F =( )
I 2 B1 A1 T
Robin,1 B1 I
(2.41)
2 B1 A1
Robin,1 f1
d=( )
Robin,2 f2 .
2 B2 A1
The linear system (2.40) is solved by a Krylov subspace method. The matrix
vector product amounts to solving a subproblem in each subdomain and to
send interface data between subdomains.
The general case of a decomposition into an arbitrary number of subdomains
is treated in [88] for the case of the Helmholtz equations. It is also possible to
discretize the interface conditions by using Lagrange multipliers the method
is then named FETI 2 LM (Two-Lagrange Multiplier ), see [162].
1 + 2 = 1, suppi i , i = 1, 2.
un = E1 (1 un1 ) + E2 (2 un2 ) ,
2.3. IMPLEMENTATION ISSUES 63
rn = f ( )un (2.42)
( )vin = rn in i
vin = 0 on i
(2.43)
( + ) (vin ) = 0 on i 3i
ni
u0 = E1 (1 u01 ) + E2 (2 u02 )
( + ) (un ) = ( + ) (un2 ) on 1 2 .
n1 n1
1 n2 n1 2
(a) (a) (b) (c) (c)
1 ) + E2 (2 u2 )
un+1 = E1 (1 un+1 n+1
which ends the proof of the equivalence between P.L. Lions algorithm the
continuous version of the ORAS algorithm.
We now give the algebraic definition of the ORAS method by giving the
discrete counterparts of steps (2.42)-(2.43)-(2.44), see [36]. Let Un R#N
be an approximate solution to a linear system:
AU = F . (2.48)
The set of degrees of freedom N is decomposed into two subsets N1 and N2 .
For i = 1, 2, let Ri denote the Boolean restriction matrix to Ni and Di define
an algebraic partition of unity 2i=1 RiT Di Ri = Id, as in (1.25).
The above three steps algorithm can be written more compactly as:
2.4. OPTIMAL INTERFACE CONDITIONS 65
rn = F AUn (2.49)
Un+1 = Un + MORAS
1
rn , rn = F A Un
1
1
2 2
c
2 c
1
L(u) = f,
u = 0, .
The domain is decomposed into two subdomains 1 and 2 . We suppose
that the problem is regular so that ui = ui , i = 1, 2, is continuous and has
continuous normal derivatives across the interface i = i j , i j. A
modified Schwarz type method is considered.
L(un+1
1 )=f in 1 2 )=f
L(un+1 in 2
un+1
1 =0 on 1 un+1
2 =0 on 2
1 u1 .n1
n+1
+ B1 (un+1
1 ) 2 u2 .n2
n+1
+ B2 (un+1
2 )
= 1 un2 .n2 + B1 (un2 ) on 1 = 2 un1 .n1 + B2 (un1 ) on
2
(2.53)
where 1 and 2 are real-valued functions and B1 and B2 are operators
acting along the interfaces 1 and 2 . For instance, 1 = 2 = 0 and B1 =
B2 = Id correspond to the original Schwarz algorithm (1.3); 1 = 2 = 1 and
2.4. OPTIMAL INTERFACE CONDITIONS 67
In order to answer this question, we note that by linearity, the error e satisfies
(we supposed here with no loss of generality that 1 = 2 = 1)
L(en+1
1 )=0 in 1 L(en+1
2 )=0 in 2
en+1
1 =0 on 1 en+1
2 =0 on 2
e1 .n1
n+1
+ B1 (en+1
1 ) e2 .n2
n+1
+ B2 (en+1
2 )
= en2 .n2 + B1 (en2 ) on 1 = en1 .n1 + B2 (en1 ) on 2
L(e12 ) = 0 in 2 .
u0 1 R
(2.54)
DtN2 (u0 ) = v.n2 1 2 ,
L(v) = 0 in 2 1
v = 0 on 2
v = u0 on 1 2 .
If we take
B1 = DtN2
we see that this choice is optimal since we have
L(e12 ) = 0.
68 CHAPTER 2. OPTIMIZED SCHWARZ METHODS (OSM)
Hence,
The two-domain case for an operator with constant coefficients has been
first treated in [104]. The multidomain case for a variable coefficient oper-
ator with both positive results [141] and negative conjectures for arbitrary
domain decompositions [147] has been considered as well.
Remark 2.4.1 The main feature of this result is to be very general since it
does not depend on the exact form of the operator L and can be extended to
systems or to coupled systems of equations as well with a proper care of the
well-posedness of the algorithm.
U1
U2
These symbols are not polynomials in the Fourier variable k so that the
corresponding operators and hence the optimal interface conditions are not
partial differential operators. They correspond to exact absorbing condi-
tions. These conditions are used on the artificial boundary resulting from
the truncation of a computational domain. The solution on the truncated
domain depends on the choice of this artificial condition. We say that it
is an exact absorbing boundary condition if the solution computed on the
truncated domain is the restriction of the solution of the original problem.
Surprisingly enough, the notions of exact absorbing conditions for domain
truncation and that of optimal interface conditions in domain decomposition
methods coincide.
As the above examples show, the optimal interface transmission conditions
are pseudodifferential. Therefore they are difficult to implement. More-
over, in the general case of a variable coefficient operator and/or a curved
boundary, the exact form of these operators is not known, although they can
be approximated by partial differential operators which are easier to imple-
ment. The approximation of the DtN has been addressed by many authors
since the seminal paper [72] by Engquist and Majda on this question.
n+1
A22 A2 U F2
( ) 2 = n (2.57)
A2 A + T1 Un+1 F + T1 Un,1 A1 U1
,2
n n
limit as n goes to infinity of the sequence (U1 , Un,1 , U2 , Un,2 ) ,
n0
,1 U,2 ) = 0
(A + T1 + T2 )(U
T
as n goes to infinity shows that (U1 , U
,1 = U,2 , U2 )
is a solution to
(2.55) which is unique by assumption.
As in 2.4, we can use the optimal interface conditions
Lemma 2.4.2 Assume Aii is invertible for i = 1, 2.
Then, in algorithm (2.56)-(2.57), taking
T1 = A1 A1
11 A1 and T2 = A2 A22 A2
1
Proof Note that in this case, the bottom-right blocks of the two by two
bock matrices in (2.56) and (2.57)
A A1 A1
11 A1 and A A2 A22 A2
1
or equivalently by applying A1 A1
11 :
1
A1 U1 A1 A1
11 A1 U,1 = 0
1
So that the right-hand side of (2.57) is zero at step 2 (i.e. n = 1). We have
thus convergence to zero in domain 2. The same holds for domain 1.
Matrices T1 and T2 are in general dense matrices whose computation and
use as interface conditions is very costly. The relationship between optimal
algebraic interface conditions and the continuous ones is simply that last
line of block matrices in (2.56) and (2.57) are the correct discretizations of
the optimal interface introduced in 2.4.1.
n + ( ) (2.58)
max (k, 0; ) = 1,
k
2.5. OPTIMIZED INTERFACE CONDITIONS 73
1/y 10 20 40 80
sc
opt 6 7 10 16
=1 27 51 104 231
Table 2.1: Number of iterations for different values of the mesh size and two
possible choices for
mesh, the iteration count is reduced by a factor larger than ten. The opti-
mized Schwarz method is thus quite sensitive to the choice of the interface
condition. As we shall see in the next section, when the method is used as
a preconditioner in Krylov methods as explained in 2.3.1, performance is
less dependent on the optimal choice of the interface condition. Typically,
the iteration count is reduced by a factor three, see Table 2.3. Since taking
optimal interface conditions is beneficial in terms of iteration counts and
has no extra cost, it is a good thing to do especially for wave propagation
phenomena that we consider in the sequel.
The difficulty comes from the negative sign of the term of order zero of the
operator.
Although the following analysis could be carried out on rectangular domains
as well, we prefer for simplicity to present the analysis in the domain = R2
with the Sommerfeld radiation condition at infinity,
u
lim r( + iu) = 0,
r r
where r = x2 + y 2 . We decompose the domain into two non-overlapping
subdomains 1 = (, 0 ] R and 2 = [ 0, ) R and consider the Schwarz
2.5. OPTIMIZED INTERFACE CONDITIONS 75
algorithm
un+1
1 2 un+1
1 = f (x, y), x, y 1
(2.62)
B1 (un+1
1 )(0) = B1 (un2 )(0)
and
un+1
2 2 un+1
2 = f (x, y), x, y 2
(2.63)
B2 (un+1
2 )(0) = B2 (un1 )(0)
where Bj , j = 1, 2, are two linear operators. Note that for the classical
Schwarz method Bj is the identity, Bj = I and without overlap the algo-
rithm cannot converge. But even with overlap in the case of the Helmholtz
equation, only the evanescent modes in the error are damped, while the
propagating modes are unaffected by the Schwarz algorithm [88]. One pos-
sible remedy is to use a relatively fine coarse grid [19] or Robin transmission
conditions, see for example [43, 16]. We consider here transmission con-
ditions which lead to a convergent non-overlapping version of the Schwarz
method. We assume that the linear operators Bj are of the form
Bj = x + Tj , j = 1, 2,
for two linear operators T1 and T2 acting in the tangential direction on the
interface. Our goal is to use these operators to optimize the convergence
factor of the algorithm. For the analysis it suffices by linearity to consider
the case f (x, y) = 0 and to analyze convergence to the zero solution. Taking
a Fourier transform in the y direction we obtain
2 un+1
1
( 2 k 2 )un+1
1 = 0,
x2
x < 0, k R (2.64)
(x + 1 (k))(un+1
1 )(0) = (x + 1 (k))(u2 )(0)
n
and
2 un+1
2
( 2 k 2 )un+1
2 = 0,
x2
x > 0, k R (2.65)
(x + 2 (k))(un+1
2 )(0) = (x + 2 (k))(u1 )(0)
n
where j (k) denotes the symbol of the operator Tj , and k is the Fourier vari-
able, which we also call frequency. The general solutions of these ordinary
differential equations are
un+1
j = Aj e(k)x + Bj e(k)x , j = 1, 2,
un+1
1
= (k)un+1
1
x
u2n+1
= (k)un+1
2
x
we obtain over one step of the Schwarz iteration
choice of the symbols j (k) leads to non-local operators Tj in the real do-
main, caused by the square root in the symbols. We have to construct local
approximations for the optimal transmission conditions.
In Despres algorithm [42], the approximation consists in Tj i (i2 = 1). In
[72], the approximation valid for the truncation of an infinite computational
domain is obtained via Taylor expansions of the symbol in the vicinity of
k = 0:
1
Tjapp = i ( ) ,
2
which leads to the zeroth or second order Taylor transmission conditions,
depending on whether one keeps only the constant term or also the second
order term. But these transmission conditions are only effective for the low
frequency components of the error. This is sufficient for the truncation of a
domain since there is an exponential decay of the high frequency part (large
k) of the solution away from the artificial boundary.
But in domain decomposition, what is important is the convergence factor
which is given by the maximum over k of (k). Since there is no overlap
between the subdomains, it is not possible to profit from any decay. We
present now an approximation procedure suited to domain decomposition
methods. To avoid an increase in the bandwidth of the local discretized
subproblems, we take polynomials of degree at most 2, which leads to trans-
mission operators Tjapp which are at most second order partial differential
operators acting along the interface. By symmetry of the Helmholtz equa-
tion there is no interest in a first order term. We therefore approximate the
operators Tj , j = 1, 2, in the form
Tjapp = (a + b )
with a, b C and where denotes the tangent direction at the interface.
where and + are parameters to be chosen, and kmin denotes the smallest
frequency relevant to the subdomain, and kmax denotes the largest frequency
supported by the numerical grid. This largest frequency is of the order
/h. For example, if the domain is a strip of height L with homogeneous
Dirichlet conditions on top and bottom, the solution can be expanded in
a Fourier series with the harmonics sin(jy/L), j N. Hence the relevant
frequencies are k = j/L. They are equally distributed with a spacing /L
and thus choosing = /L and + = + /L leaves precisely one
frequency k = for the Krylov method and treats all the others by the
optimization. If falls in between the relevant frequencies, say j/L < <
(j + 1)/L then we can even get the iterative method to converge by choosing
= j/L and + = (j + 1)/L, which will allow us to directly verify our
asymptotic analysis numerically without the use of a Krylov method. How
to choose the optimal parameters p and q is given by the following:
Theorem 2.5.1 (Optimized Robin conditions) Under the three as-
sumptions
2 2 2 + +2 , < (2.71)
2 >
2 2
kmin + +2 , (2.72)
2 2 < 2
kmin + kmax
2
, (2.73)
the solution to the min-max problem (2.70) is unique and the optimal pa-
rameters are given by
2 2 k 2 2
max
p = q =
. (2.74)
2
The optimized convergence factor (2.70) is then given by
2 2 1/4 2 2
1 2 ( k2 2 ) + k2 2
(p , q , k) =
max max
max (2.75)
k(kmin , )(+ , kmax ) 2
2 1/4 2 2
1 + 2 ( k2 2 ) + k2 2
max max
where (k) is defined in (2.66) and the two parameters , C can be used
to optimize the performance. By the symmetry of (k) with respect to k, it
suffices to consider only positive k to optimize performance. We thus need
to solve the min-max problem
which has an elegant analytical solution. Note however that the original
minimization problem (2.80) might have a solution with better convergence
factor, an issue investigated in [84].
2 2
2
2 2
we have k2 2 = 1 since iR and therefore (k; , ) = k2 2
k + k +
depends only on . The solution (, ) of the minimization problem (2.81)
is thus given by the solution of the two independent minimization problems
i 2 k 2
min ( max ) (2.86)
iR, k(kmin , ) i 2 k 2 +
and
k 2 2
min ( max ) . (2.87)
R k(+ , kmax ) k 2 2 +
We show the solution for the second problem (2.87) only, the solution
for the first problem (2.86) is similar. First note that the maximum of
2 2
= k2 2 is attained on the boundary of the interval [+ , kmax ],
k +
because the function (but not ) is monotonically increasing with
k [+ , kmax ]. On the other hand as a function of , (+ ) grows mono-
tonically with while (kmax ) decreases monotonically with . The opti-
mum is therefore reached when we balance the two values on the boundary,
(+ ) = (kmax ) which implies that the optimal satisfies the equation
2
kmax 2 2 2
= + (2.88)
2
kmax 2 + +2 2 +
The optimization problem (2.87) arises also for symmetric positive definite
problems when an optimized Schwarz algorithm without overlap and Robin
transmission conditions is used and the present solution can be found in
[182].
Note that the optimization of the interface conditions was performed for the
convergence factor of a fixed-point method and not for a particular Krylov
method applied to the substructured problem. In the positive definite case
one can show that minimizing the convergence factor is equivalent to mini-
mizing the condition number of the substructured problem [110]. Numerical
experiments in the next section indicate that for the Helmholtz equation
our optimization also leads to parameters close to the best ones for the
preconditioned Krylov method.
Numerical results

We present two sets of numerical experiments. The first set corresponds to the model problem analyzed in this section, and the results obtained illustrate the analysis and confirm the asymptotic convergence results. The second numerical experiment comes from industry. The model problem reads
$$\begin{cases}
-\Delta u - \omega^2 u = f, & 0 < x, y < 1,\\[2pt]
u = 0, & 0 < x < 1,\ y = 0, 1,\\[2pt]
-\dfrac{\partial u}{\partial x} - i\omega u = 0, & x = 0,\ 0 < y < 1,\\[2pt]
\dfrac{\partial u}{\partial x} - i\omega u = 0, & x = 1,\ 0 < y < 1.
\end{cases} \qquad (2.89)$$
We decompose the unit square into two subdomains of equal size, and we use a uniform rectangular mesh for the discretization. We perform all our experiments directly on the error equations, i.e. $f = 0$, and choose the initial guess of the Schwarz iteration so that all the frequencies are present in the error.
We show two sets of experiments. The first one is with $\omega = 9.5\pi$, thus excluding $\omega$ from the frequencies $k$ relevant in this setting, $k = n\pi$, $n = 1, 2, \dots$. This allows us to test directly the iterative Schwarz method, since with the optimization parameters $k_- = 9\pi$ and $k_+ = 10\pi$ we obtain a convergence factor which is uniformly less than one for all $k$. Table 2.2 shows the number of iterations needed for different values of the mesh parameter $h$ for both the zeroth and second order transmission conditions; the Taylor conditions are included for comparison (cf. Figure 2.9).
Figure 2.9: Asymptotic behavior for the zeroth order transmission conditions
(left) and for the second order transmission conditions (right)
In the second set of experiments $\omega$ was chosen to lie between two frequencies, which shows that with Krylov acceleration the method is robust for any value of $\omega$. We finally tested, for the smallest resolution of the model problem, how well the Fourier analysis predicts the optimal parameters to use. Since we want to test both the iterative and the Krylov versions, we again need to put the frequency $\omega$ in between two problem frequencies, and in this case it is important to be precise. We therefore choose $\omega$ to be exactly between two frequencies of the discrete problem, $\omega = 9.3596\pi$, and optimized using $k_- = 8.8806\pi$ and $k_+ = 9.8363\pi$. Fig. 2.10 shows the number of iterations the algorithm needs to achieve a residual of $10^{-6}$ as a function of the optimization parameters $p$ and $q$ of the zeroth order transmission conditions, on the left in the iterative version and on the right for the Krylov accelerated version. The Fourier
[Figure 2.10: level curves of the iteration count as a function of the zeroth order optimization parameters $p$ and $q$; left: iterative version, right: Krylov accelerated version.]
analysis shows well where the optimal parameters lie and when a Krylov
method is used, the optimized Schwarz method is very robust with respect
to the choice of the optimization parameter. The same holds also for the
second order transmission conditions, as Fig. 2.11 shows.
[Figure 2.11: level curves of the iteration count as a function of the second order optimization parameters $\alpha$ and $\beta$; left: iterative version, right: Krylov accelerated version.]

[Figure: decomposition into 16 subdomains for the industrial test case.]
length $a$. To solve the problem, the optimized Schwarz method was used as a preconditioner for the Krylov method ORTHODIR, and as convergence criterion we used
$$\|K u - f\|_{L^2} \le 10^{-6}\,\|f\|_{L^2}. \qquad (2.90)$$
The first order and second order formulations were presented in a unified framework in [50, 49]. We consider the Helmholtz problem
$$-k^2 u - \Delta u = f \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial n} + iku = 0 \ \text{ on } \partial\Omega.$$
The right-hand side $f$ is a Gaussian function. The real part of the solution is plotted in Figure 2.13. After discretization, the resulting linear system is solved with the scripts below.
load "metis"
load "medit"
int nn=4,mm=4; // number of the domains in each direction
int npart=nn*mm; // total number of domains
int nloc=40; // local no of dof per domain in one direction
bool withmetis=0; // =1 (Metis decomp) =0 (uniform decomp)
int sizeovr=2; // size of the overlap
real allong=real(nn)/real(mm); // aspect ratio of the global domain
// Mesh of a rectangular domain
mesh Th=square(nn*nloc,mm*nloc,[x*allong,y]);
fespace Vh(Th,P1);
fespace Ph(Th,P0);
Ph part; // piecewise constant function
int[int] lpart(Ph.ndof); // giving the decomposition
// Domain decomposition data structures
mesh[int] aTh(npart); // sequence of ovr. meshes
matrix[int] Rihreal(npart); // local restriction operators
matrix[int] Dihreal(npart); // partition of unity operators
matrix<complex>[int] Rih(npart); // local restriction operators
matrix<complex>[int] Dih(npart); // partition of unity operators
int[int] Ndeg(npart); // number of dof for each mesh
real[int] AreaThi(npart); // area of each subdomain
matrix<complex>[int] aA(npart); // local Dirichlet matrices
matrix<complex>[int] aR(npart); // local Robin matrices
// Definition of the problem to solve
// -k^2*u - Delta(u) = f,
// u = g, Dirichlet boundary (top and bottom)
// dn(u) + i k u = 0, Robin boundary (left and right)
int Dirichlet=1, Robin=2;
int[int] chlab=[1, Robin, 2, Robin, 3, Robin, 4, Robin];
Th=change(Th,refe=chlab);
macro Grad(u) [dx(u),dy(u)] // EOM
real k=12.*pi;
func f=exp(-((x-.5)^2+(y-.5)^2)*120.); // right-hand side
func g=0; // Dirichlet data
varf vaglobal(u,v) = int2d(Th)(-k^2*u*v+Grad(u)'*Grad(v))
 + int1d(Th,Robin)(1i*k*u*v) - int2d(Th)(f*v)
 + on(Dirichlet,u=g);
matrix<complex> Aglobal;
Vh<complex> rhsglobal,uglob; // rhs and solution of the global problem
complex alpha=1i*k; // Despres algorithm
// Optimal alpha:
// h21: the mesh size for a rectangular domain of dimension 2x1,
// that is obtained with nn x mm = 2x1,4x2,8x4
real h21=1./(mm*nloc);
real kplus=k+pi/1; // the rectangle height is 1
real kminus=k-pi/1;
real Cw=min(kplus^2-k^2,k^2-kminus^2);
real alphaOptR=pow(Cw/h21,1/3.)/2.;
complex alphaOpt=alphaOptR+1i*alphaOptR;
//alpha=alphaOpt;
//alpha=1.0e30;
// Iterative solver
real tol=1e-4; // tolerance for the iterative method
int maxit=1000; // maximum number of iterations
Script file (note that the GMRES routine has to be adapted to complex types):
/# debutPartition #/
include "../../FreefemCommon/dataHelmholtz.edp"
include "../../FreefemCommon/decomp.idp"
include "../../FreefemCommon/createPartition.idp"
SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rihreal,Dihreal,Ndeg,AreaThi);
for (int i=0; i<npart; i++) {
 Rih[i]=Rihreal[i];
 Dih[i]=Dihreal[i];
}
/# endPartition #/
/# debutGlobalData #/
Aglobal=vaglobal(Vh,Vh,solver=UMFPACK); // global matrix
rhsglobal[]=vaglobal(0,Vh); // global rhs
uglob[]=Aglobal^-1*rhsglobal[];
Vh realuglob=real(uglob);
plot(realuglob,dim=3,wait=1,cmm="Exact solution",value=1,fill=1);
/# finGlobalData #/
/# debutLocalData #/
for(int i=0; i<npart; ++i)
{
 cout << "Domain: " << i << "/" << npart << endl;
 mesh Thi=aTh[i];
 fespace Vhi(Thi,P1);
 varf RobinInt(u,v) = int2d(Thi)(-k^2*u*v+Grad(u)'*Grad(v))
 + int1d(Thi,Robin)(1i*k*u*v) + on(Dirichlet,u=g)
 + int1d(Thi,10)(alpha*u*v);
 aR[i]=RobinInt(Vhi,Vhi);
 set(aR[i],solver=UMFPACK); // direct solvers using UMFPACK
}
/# finLocalData #/
/# debutGMRESsolve #/
include "GMRES.idp"
Vh<complex> un=0, sol, er; // initial guess, final solution and error
sol[]=GMRES(un[],tol,maxit);
plot(sol,dim=3,wait=1,cmm="Final solution",value=1,fill=1);
er[]=sol[]-uglob[];
cout << "Final scaled error = " << er[].linfty/uglob[].linfty << endl;
/# finGMRESsolve #/
iterations). These results are obtained by a one-level method, that is, without a coarse space correction. We can improve them by adding a specially designed coarse space for Helmholtz equations [34]. The effect can be seen in Figure 2.14 (right). We plot the convergence curves for the one-level Despres algorithm and its acceleration using either 5 or 10 vectors per subdomain in the coarse space.
[Figure 2.14: residual histories (log scale) versus iteration count; left: Despres algorithm compared with a Dirichlet-based variant; right: one-level Despres algorithm and its coarse space acceleration with 5 (CS5) or 10 (CS10) vectors per subdomain.]
Krylov methods

Recall first that the iterative versions of the Schwarz methods introduced in chapter 1 can be written as preconditioned fixed point iterations
$$U^{n+1} = U^n + M^{-1}r^n, \qquad r^n = F - A\,U^n,$$
where $M^{-1}$ depends on the method used (RAS or ASM). The key point in what follows is to prove that fixed point methods (3.3) are intrinsically slower than Krylov methods; for more details see [39]. Since this result is valid for any fixed point method, we will start by placing ourselves in an abstract linear algebra framework and then apply the results to the Schwarz methods. Consider the linear system
$$A\,x = b, \qquad x \in \mathbb{R}^N. \qquad (3.1)$$
is called a fixed point algorithm, and the solution $x$ of (3.1) is a fixed point of the operator
$$x \longmapsto x + M^{-1}(b - A\,x).$$
The difference between the matrices $A$ and $M$ is denoted by $P := M - A$. When convergent, the iteration (3.3) converges to the solution of the preconditioned system
$$M^{-1}A\,x = M^{-1}b.$$
The above system, which has the same solution as the original system, is called a preconditioned system; here $M^{-1}$ is called the preconditioner. In other words, a splitting method is equivalent to a fixed-point iteration on a preconditioned system. We see that the fixed point iterations (3.3) can be written as
$$\begin{aligned}
x^{n+1} &= x^n + M^{-1}(b - A\,x^n)\\
&= (I - M^{-1}(M - P))\,x^n + M^{-1}b\\
&= M^{-1}P\,x^n + M^{-1}b \qquad (3.4)\\
&= M^{-1}P\,x^n + M^{-1}A\,x = M^{-1}P\,x^n + M^{-1}(M - P)\,x\\
&= x + M^{-1}P\,(x^n - x).
\end{aligned}$$
From (3.4) it is easy to see that the error vector $e^n := x^n - x$ verifies
$$e^{n+1} = M^{-1}P\,e^n,$$
so that
$$e^{n+1} = (M^{-1}P)^{n+1}\,e^0. \qquad (3.5)$$
For this reason $M^{-1}P$ is called the iteration matrix associated with (3.4) and, since the expression of the iteration does not change with respect to $n$, we call (3.4) a stationary iteration.
We recall below the well-known convergence results in the case of stationary
iterations.
Lemma 3.1.1 (Convergence of the fixed point iterations) The fixed point iteration (3.4) converges for an arbitrary initial error $e^0$ (that is, $e^n \to 0$ as $n \to \infty$) if and only if the spectral radius of the iteration matrix is smaller than one, that is $\rho(M^{-1}P) < 1$, where
$$\rho(B) = \max\{|\lambda| : \lambda \text{ eigenvalue of } B\}.$$
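The lemma is easy to check numerically. The following sketch (in Python, with illustrative values) builds a Jacobi splitting $M = \mathrm{diag}(A)$, computes the spectral radius of the iteration matrix $M^{-1}P$ and runs the stationary iteration (3.4):

import numpy as np

rng = np.random.default_rng(0)
A = np.diag([4.0, 5.0, 6.0]) + rng.uniform(-1, 1, (3, 3))  # diagonally dominant
b = rng.standard_normal(3)
M = np.diag(np.diag(A))                       # Jacobi splitting M = diag(A)
P = M - A
G = np.linalg.solve(M, P)                     # iteration matrix M^{-1} P
print("spectral radius:", max(abs(np.linalg.eigvals(G))))

x, xstar = np.zeros(3), np.linalg.solve(A, b)
for n in range(50):
    x = x + np.linalg.solve(M, b - A @ x)     # x^{n+1} = x^n + M^{-1}(b - A x^n)
print("error after 50 iterations:", np.linalg.norm(x - xstar))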
In general, it is not easy to ensure that the spectral radius of the iteration matrix $M^{-1}P$ is smaller than one, except for $M$-matrices ($A$ non-singular and $A^{-1}$ non-negative), for which any regular splitting $A = M - P$ ($M$ non-singular, $M^{-1}$ and $P$ non-negative) leads to a convergent fixed point iteration (see Chapter 4 in [164] for details).
$$z^n := M^{-1}r^n = M^{-1}(b - A\,x^n),$$
$$x^{n+1} = x^0 + \sum_{i=0}^{n}(M^{-1}P)^i\,z^0. \qquad (3.7)$$
From (3.5) we see that the preconditioned residual $z^n$ verifies
$$z^n = M^{-1}r^n = M^{-1}(PM^{-1})^n r^0 = M^{-1}(PM^{-1})^n M\,z^0 = (M^{-1}P)^n\,z^0, \qquad (3.10)$$
so that
$$x^{n+1} = x^0 + z^0 + (M^{-1}P)\,z^0 + \dots + (M^{-1}P)^n z^0 = x^0 + \sum_{i=0}^{n}(M^{-1}P)^i z^0, \qquad (3.11)$$
which leads to the conclusion. Thus the difference $x^{n+1} - x^0$ is a geometric series of common ratio $M^{-1}P$. Note that (3.11) can also be written in terms of the residual vector:
$$x^{n+1} = x^0 + M^{-1}r^0 + (M^{-1}P)\,M^{-1}r^0 + \dots + (M^{-1}P)^n M^{-1}r^0 = x^0 + \sum_{i=0}^{n}(M^{-1}P)^i\,M^{-1}r^0. \qquad (3.12)$$
Writing $S_n(t) := \sum_{i=0}^n t^i$, this takes the compact form
$$x^{n+1} - x^0 = S_n(M^{-1}P)\,M^{-1}r^0 = S_n(M^{-1}P)\,z^0, \qquad (3.13)$$
so that
$$x^{n+1} - x^0 \in \mathrm{Span}\{M^{-1}r^0,\,(M^{-1}P)M^{-1}r^0,\,\dots,\,(M^{-1}P)^nM^{-1}r^0\} = \mathrm{Span}\{z^0,\,(M^{-1}P)z^0,\,\dots,\,(M^{-1}P)^n z^0\}, \qquad (3.14)$$
that is,
$$x^n - x^0 \in \mathcal K^n(M^{-1}P,\,M^{-1}r^0) = \mathcal K^n(M^{-1}P,\,z^0).$$
Similarly,
$$e^n = (M^{-1}P)^n e^0 \implies e^n \in \mathcal K^{n+1}(M^{-1}P,\,e^0), \qquad z^n = (M^{-1}P)^n z^0 \implies z^n \in \mathcal K^{n+1}(M^{-1}P,\,z^0).$$
Note also that, since $P = M - A$,
$$\mathcal K^n(M^{-1}P,\,z^0) = \mathcal K^n(M^{-1}A,\,z^0).$$
Remark 3.2.2 Note also that the difference between two successive iterates is given by
$$x^{n+1} - x^n = (M^{-1}P)^n z^0 = (I - M^{-1}A)^n M^{-1}r^0 \in \mathcal K^{n+1}(M^{-1}A,\,M^{-1}r^0).$$
Example 3.2.1 From the family of fixed point methods we can mention the Richardson iteration with a relaxation parameter $\alpha$:
$$x^{n+1} = x^n + \alpha\,(b - A\,x^n).$$
The parameter $\alpha$ can be chosen in such a way as to obtain the best convergence factor over the iterations. In the case of a symmetric positive definite matrix, the value of $\alpha$ which minimizes the convergence factor of the algorithm is
$$\alpha_{opt} = \frac{2}{\lambda_{\min}(A) + \lambda_{\max}(A)},$$
where
$$\kappa_2(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}$$
is the condition number of $A$, $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ being the extreme eigenvalues of $A$. In practice, it is very difficult to estimate accurately $\lambda_{\min}$ or $\lambda_{\max}$, and thus $\alpha_{opt}$. This motivates the introduction of the gradient method in the next paragraph.
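As a small illustration, the optimal parameter and the condition number can be computed directly from the extreme eigenvalues; a minimal sketch:

import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])        # a small SPD example
lam = np.linalg.eigvalsh(A)
alpha_opt = 2.0 / (lam.min() + lam.max())     # optimal Richardson parameter
kappa = lam.max() / lam.min()                 # condition number kappa_2(A)
print("alpha_opt =", alpha_opt, " condition number =", kappa)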
Consider the function
$$f : \alpha \longmapsto f(\alpha) = \|x^n + \alpha\,p^n - x\|_A^2.$$
The optimal step satisfies
$$\begin{aligned}
\|x^{n+1} - x\|_A^2 = f(\alpha_n) &= \min_\alpha \|x^n + \alpha\,p^n - x\|_A^2 = \min_\alpha\,\big(A(x^n + \alpha p^n - x),\, x^n + \alpha p^n - x\big)\\
&= \min_\alpha\,\big[\alpha^2 (Ap^n, p^n) + 2\alpha\,(Ap^n, x^n - x) + (A(x^n - x), x^n - x)\big]\\
&= \min_\alpha\,\big[\alpha^2 (Ap^n, p^n) - 2\alpha\,(p^n, A(x - x^n))\big] + (A(x^n - x), x^n - x).
\end{aligned}$$
With the choice $p^n = r^n$ this yields
$$\alpha_n = \frac{\|r^n\|_2^2}{\|r^n\|_A^2} = \frac{(e^n, r^n)_A}{\|r^n\|_A^2}. \qquad (3.18)$$
In the following we will define a new method that extends the idea of gradient methods, and we will prove afterwards that it is exactly the Krylov method whose iterates are given by (3.19).
Definition 3.3.1 (Conjugate Gradient) Starting from an initial guess $x^0$ and an initial descent direction $p^0 = r^0 = b - A\,x^0$, a conjugate gradient method is an iteration of the form
$$\begin{aligned}
x^{n+1} &= x^n + \alpha_n\,p^n,\\
r^{n+1} &= r^n - \alpha_n\,A\,p^n, \qquad (3.20)\\
p^{n+1} &= r^{n+1} + \beta_{n+1}\,p^n,
\end{aligned}$$
where $\alpha_n$ and $\beta_n$ are chosen such that they minimize the norm of the error $\|e^{n+1}\|_A^2 = \|x^{n+1} - x\|_A^2$ at each iteration.
Coefficients $\alpha_n$ and $\beta_{n+1}$ are easy to find. First, from (3.18) we see that the coefficient $\alpha_n$ which minimizes $\|e^{n+1}\|_A^2$ is necessarily given by
$$\alpha_n = \frac{(p^n, r^n)}{(Ap^n, p^n)}. \qquad (3.21)$$
By taking the dot product of the second relation of (3.20) with $p^n$ and using (3.21) in it, we see that $(r^{n+1}, p^n) = 0$. By taking the dot product of the third relation of (3.20) with $r^{n+1}$ and using the previous orthogonality relation, we obtain
$$\alpha_n = \frac{(r^n, r^n)}{(Ap^n, p^n)} = \frac{\|r^n\|_2^2}{\|p^n\|_A^2}. \qquad (3.22)$$
A similar computation for the error after two steps gives
$$\|e^{n+2}\|_A^2 = \|e^{n+1}\|_A^2 - \frac{(r^{n+1}, r^{n+1})^2}{(Ap^{n+1}, p^{n+1})} = \|e^{n+1}\|_A^2 - \frac{(r^{n+1}, r^{n+1})^2}{\big(A(r^{n+1}+\beta_{n+1}p^n),\, r^{n+1}+\beta_{n+1}p^n\big)}, \qquad (3.23)$$
so minimizing with respect to $\beta_{n+1}$ yields $(Ap^{n+1}, p^n) = 0$. Using this identity and taking the $A$-dot product of the third relation of (3.20) with $p^{n+1}$, we get
$$(Ap^{n+1}, p^{n+1}) = (Ar^{n+1}, p^{n+1}) + \beta_{n+1}\,(Ap^{n}, p^{n+1}) = (Ar^{n+1}, p^{n+1}) = (Ap^{n+1}, r^{n+1}). \qquad (3.24)$$
By using (3.24) in the dot product of the second relation of (3.20) with $r^n$,
$$(r^{n+1}, r^n) = (r^n, r^n) - \alpha_n\,(Ap^n, r^n) = (r^n, r^n) - \alpha_n\,(Ap^n, p^n) = 0. \qquad (3.25)$$
Finally, by taking the dot product of the second relation of (3.20) with $r^{n+1}$ and using (3.25),
$$(r^{n+1}, Ap^n) = -\frac{\|r^{n+1}\|_2^2}{\alpha_n}. \qquad (3.26)$$
By plugging (3.26) into (3.23) we conclude, using the expression (3.22), that
$$\beta_{n+1} = \frac{\|r^{n+1}\|_2^2}{\|r^n\|_2^2}. \qquad (3.27)$$
The resulting algorithm is given in Algorithm 4, see [9].

Algorithm 4 CG algorithm
Compute $r^0 := b - Ax^0$, $p^0 = r^0$.
for $i = 0, 1, \dots$ do
  $\rho_i = (r^i, r^i)_2$
  $q^i = A\,p^i$
  $\alpha_i = \dfrac{\rho_i}{(p^i, q^i)_2}$
  $x^{i+1} = x^i + \alpha_i\,p^i$
  $r^{i+1} = r^i - \alpha_i\,q^i$
  $\rho_{i+1} = (r^{i+1}, r^{i+1})_2$
  $\beta_{i+1} = \dfrac{\rho_{i+1}}{\rho_i}$
  $p^{i+1} = r^{i+1} + \beta_{i+1}\,p^i$
  check convergence; continue if necessary
end for
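A direct transcription of Algorithm 4 in Python (a sketch for dense NumPy arrays, with an illustrative test) reads:

import numpy as np

def cg(A, b, x0=None, tol=1e-10, maxit=200):
    # conjugate gradient mirroring Algorithm 4
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    for _ in range(maxit):
        q = A @ p
        alpha = rho / (p @ q)
        x += alpha * p
        r -= alpha * q
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:        # convergence check on ||r||_2
            break
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([1.0, 2.0])
print(cg(A, b), np.linalg.solve(A, b))    # both should agree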
We can see that the conjugate gradient algorithm has a few remarkable properties, stated in the following lemma.
Lemma 3.3.1 (Conjugate Gradient as a Krylov method) The descent directions $p^n$ are $A$-orthogonal, or conjugate,
$$(Ap^k, p^l) = 0, \quad \forall k, l,\ k \neq l,$$
and the residual vectors are orthogonal to each other and to the descent directions,
$$(r^k, r^l) = 0, \quad \forall k, l,\ k \neq l, \qquad (p^k, r^l) = 0, \quad \forall k, l,\ k < l.$$
Moreover, both the descent directions and the residuals span the Krylov space,
$$\mathrm{Span}\{p^0, \dots, p^{n-1}\} = \mathrm{Span}\{r^0, \dots, r^{n-1}\} = \mathcal K^n(A, r^0),$$
and the vector $x^n$ from (3.20) minimizes the error norm $\|e^n\|_A^2$ on the affine space $x^0 + \mathcal K^n(A, r^0)$. The algorithm converges in at most $N$ iterations, $N$ being the size of the matrix $A$.
$$e^n = P_n(A)\,e^0,$$
where $P_n$ is a polynomial of degree $n$ with
$$P_n(0) = 1.$$
If $A = T\Lambda T^t$ is a diagonalization of $A$, then for any polynomial $Q$,
$$Q(A) = T\,Q(\Lambda)\,T^t.$$
Therefore we have, for any monic (the coefficient of the leading term is equal to one) polynomial $Q$ of degree $n$, a bound in terms of $\max_{\lambda\in[\lambda_1,\lambda_2]}|Q(\lambda)|$, where $\lambda_1 = \lambda_{\min}(A)$ and $\lambda_2 = \lambda_{\max}(A)$. We know from [23] that this quantity is minimized by the shifted and scaled Chebyshev polynomial
$$\bar Q(x) = \frac{T_n\!\left(\dfrac{\lambda_1 + \lambda_2 - 2x}{\lambda_2 - \lambda_1}\right)}{T_n\!\left(\dfrac{\lambda_1 + \lambda_2}{\lambda_2 - \lambda_1}\right)}.$$
It follows that the optimal gradient method reduces the error by a factor $\varepsilon$ in a number of iterations
$$n_{OG} \simeq \frac{\kappa_2(A)}{2}\,(-\log\varepsilon),$$
whereas, from Lemma 3.3.2, the conjugate gradient method converges in $n_{CG}$ iterations with
$$n_{CG} \simeq \frac{\sqrt{\kappa_2(A)}}{2}\,(-\log\varepsilon).$$
These estimates are clearly in favor of the CG algorithm over the optimal gradient method.
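For a quick comparison of the two estimates (with the constants as reconstructed above, so only indicative), one can evaluate:

import numpy as np

kappa, eps = 1.0e4, 1.0e-6               # illustrative condition number and accuracy
n_og = 0.5 * kappa * abs(np.log(eps))    # optimal gradient estimate
n_cg = 0.5 * np.sqrt(kappa) * abs(np.log(eps))  # conjugate gradient estimate
print(f"gradient: ~{n_og:.0f} iterations, CG: ~{n_cg:.0f} iterations")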
Let $M^{-1/2}$ be a square root of the preconditioner, $M^{-1/2}M^{-1/2} = M^{-1}$. The conjugate gradient method applied to the system
$$\tilde A\,\tilde x = \tilde b, \qquad (3.32)$$
with
$$\tilde A = M^{-1/2}A\,M^{-1/2}, \qquad \tilde x = M^{1/2}x, \qquad \tilde b = M^{-1/2}b,$$
can be rewritten as the following iterative method: starting from an initial guess $x^0$ and an initial descent direction
$$p^0 = M^{-1}r^0 = M^{-1}(b - A\,x^0),$$
compute
$$\begin{aligned}
x^{n+1} &= x^n + \alpha_n\,p^n,\\
r^{n+1} &= r^n - \alpha_n\,A\,p^n, \qquad (3.33)\\
p^{n+1} &= M^{-1}r^{n+1} + \beta_{n+1}\,p^n,
\end{aligned}$$
with
$$\alpha_n = \frac{(M^{-1}r^n, r^n)}{\|p^n\|_A^2}, \qquad \beta_{n+1} = \frac{(M^{-1}r^{n+1}, r^{n+1})}{(M^{-1}r^n, r^n)}. \qquad (3.34)$$
Proof Suppose we apply the conjugate gradient method (3.20) to the system (3.32). This gives
$$\begin{aligned}
\tilde x^{n+1} &= \tilde x^n + \tilde\alpha_n\,\tilde p^n,\\
\tilde r^{n+1} &= \tilde r^n - \tilde\alpha_n\,\tilde A\,\tilde p^n, \qquad (3.35)\\
\tilde p^{n+1} &= \tilde r^{n+1} + \tilde\beta_{n+1}\,\tilde p^n,
\end{aligned}$$
with
$$\tilde p^0 = \tilde r^0 \iff M^{1/2}p^0 = M^{-1/2}r^0 \iff p^0 = M^{-1}r^0.$$
Translating back to the original variables, the resulting iteration is exactly the preconditioned conjugate gradient applied to
$$M^{-1}A\,x = M^{-1}b,$$
and the error satisfies $\|e^n\|_2 = \|A^{-1}r^n\|_2 \le \|A^{-1}\|_2\,\|r^n\|_2$.
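A minimal Python sketch of the recurrences (3.33)-(3.34), with a simple Jacobi preconditioner as illustration (the function name pcg is ours, not from the text):

import numpy as np

def pcg(A, b, Minv, tol=1e-10, maxit=200):
    # preconditioned CG; Minv is a callable applying M^{-1} to a vector
    x = np.zeros_like(b)
    r = b - A @ x
    z = Minv(r)
    p = z.copy()
    rz = r @ z                          # (M^{-1} r^n, r^n)
    for _ in range(maxit):
        q = A @ p
        alpha = rz / (p @ q)            # alpha_n from (3.34)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) < tol:
            break
        z = Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # beta_{n+1} = rz_new / rz
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]]); b = np.array([1.0, 2.0])
d = np.diag(A)
print(pcg(A, b, lambda r: r / d))       # Jacobi preconditioner M = diag(A)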
In the GMRES method, the approximation $x^n = x^0 + V_n Y_n$ is obtained from the Arnoldi relation $A\,V_n = V_{n+1}\bar H_n$ by minimizing the residual over the Krylov space, which leads to the least squares problem
$$\min_{Y_n}\ \big\|\,V_{n+1}\big(\|r^0\|_2\,e_1 - \bar H_n\,Y_n\big)\big\|_2, \qquad (3.45)$$
and consequently to the normal equations
$$\bar H_n^*\,\bar H_n\,Y_n = \|r^0\|_2\,\bar H_n^*\,e_1, \qquad (3.46)$$
where $e_1$ is the first column vector of the canonical basis of $\mathbb{R}^{n+1}$. Solving such a system is straightforward if one knows the QR decomposition of $\bar H_n$,
$$\bar H_n = Q_n\,R_n,$$
with $Q_n$ a unitary matrix of order $n+1$ and $R_n$ an upper triangular matrix of size $(n+1)\times n$ whose last row is null. This factorization is quite easy to compute in the case of an upper Hessenberg matrix (since it is almost upper triangular). Supposing that this factorization is available, (3.46) reduces to the system
$$R_n^*\,(Q_n^*Q_n)\,R_n\,Y_n = \|r^0\|_2\,R_n^*\,Q_n^*\,e_1 \iff R_n^*R_n\,Y_n = \|r^0\|_2\,R_n^*\,Q_n^*\,e_1, \qquad (3.47)$$
that is,
$$R_n\,Y_n = g_n, \qquad g_n := \|r^0\|_2\,(Q_n^*\,e_1)_{1:n}, \qquad (3.50)$$
where $\gamma_{n+1}$ denotes the last element of $\|r^0\|_2\,Q_n^*\,e_1$ (we used the fact that the matrices $V_{n+1}$ and $Q_n$ are unitary). See [164] for more details of the proof.
$N$, the cost is marginal; when the iteration count gets large, its cost can become a problem. If $A$ is diagonalizable, then for any polynomial $Q$,
$$A = W\Lambda W^{-1} \implies Q(A) = W\,Q(\Lambda)\,W^{-1}.$$
Other facts:
Krylov methods do not need the explicit form of the operator $A$ nor of the preconditioner $M$. They are based on applying these operators to vectors (so-called matrix-vector product operations), as is the case for the fixed point method (3.3). These are matrix-free methods.
There exist Matlab functions for the main Krylov methods: pcg for the Conjugate Gradient and gmres for the GMRES method.
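The SciPy equivalents can be called in the same matrix-free spirit; in the sketch below the preconditioner is a Jacobi example of our own, passed as a LinearOperator applying $M^{-1}$:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")  # 1D Laplacian
b = np.ones(n)
Minv = spla.LinearOperator((n, n), matvec=lambda r: r / A.diagonal())
x, info = spla.cg(A, b, M=Minv)       # Conjugate Gradient, Jacobi-preconditioned
y, info2 = spla.gmres(A, b, M=Minv)   # GMRES with the same preconditioner
print(info, info2)                    # 0 means convergence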
We now consider the system
$$A\,x = b, \qquad (3.54)$$
where $A$ may be singular. When $A$ is symmetric, we have the orthogonal decomposition
$$\mathbb{R}^N = \ker(A) \oplus \mathrm{range}(A).$$
The starting point is the fact that the inverse of an invertible matrix $C$ can be written as a polynomial in $C$:
$$C^{-1} = \mathcal P(C).$$
Proof Let
$$\mathcal M(X) = \sum_{i=0}^d a_i X^i$$
be a minimal polynomial of $C$, so that $\mathcal M(C) = 0$. If we had $a_0 = 0$, then $\sum_{i=0}^{d-1} a_{i+1} X^i$ would be an annihilating polynomial of $C$ of degree lower than $d$. By definition of a minimal polynomial, this is impossible. This proves that $a_0 \neq 0$ and we can divide by $a_0$:
$$\mathrm{Id} + \left(\sum_{i=1}^d \frac{a_i}{a_0}\,C^{i-1}\right)C = 0,$$
that is,
$$C^{-1} = \mathcal P(C), \qquad \mathcal P(C) = -\sum_{i=1}^d \frac{a_i}{a_0}\,C^{i-1}. \qquad (3.55)$$
Our first step is to extend Lemma 3.5.1 to the non-invertible symmetric case.
Proof Let $\sigma(A)$ denote the set of the eigenvalues of $A$, without taking into account their multiplicities. Since the matrix $A$ is non-invertible, we have $0 \in \sigma(A)$. Moreover, since it is symmetric, it is diagonalizable, and it is easy to check that $A$ cancels the polynomial built from its spectrum, that is,
$$\prod_{\lambda\in\sigma(A)} (\lambda\,\mathrm{Id} - A) = 0. \qquad (3.57)$$
The zeroth order term of the above polynomial is zero, and the next term is non-zero since it takes the following value:
$$\Big(\prod_{\lambda\in\sigma(A)\setminus\{0\}} \lambda\Big)\,A.$$
Proof Let $r = At \in \mathrm{range}(A)$ for some $t \in \mathbb{R}^N$. Note that, since $t$ may have a non-zero component in $\ker(A)$, we may have $y \neq t$. We apply Lemma 3.5.2 and right-multiply (3.56) by the vector $t$:
$$A\left(\sum_{i=2}^p a_i A^{i-2}\right) r = r,$$
so that $y := \big(\sum_{i=2}^p a_i A^{i-2}\big)\,r$ satisfies $A\,y = r$; a solution can then be written as
$$x = x^0_{\ker} + A^\dagger b, \qquad x^0_{\ker} \in \ker(A).$$
Consider the system
$$\begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{pmatrix} x = b = \begin{pmatrix} 0\\ 1\\ 1 \end{pmatrix}, \qquad (3.59)$$
whose solutions are
$$x = \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix} + \mathbb{R}\begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix}.$$
In particular, the first component of any solution is equal to $1$. For the sake of simplicity, assume $x^0 = 0$. Then the first residual $r^0 = b$ has its first component equal to zero. The same holds for $A^k r^0$ for all $k \ge 1$. Thus, for all $n$, any vector in the space $\{x^0\} + \mathcal K^n(A, r^0)$ has its first component equal to zero and cannot be a solution to system (3.59).
The Krylov method applied in this case is the conjugate gradient; we are in the symmetric case, since both the original problem and the preconditioner are symmetric positive definite. The whole program in which these routines are used is given below.

/# debutPartition #/
include "../../FreefemCommon/dataGENEO.edp"
include "../../FreefemCommon/decomp.idp"
include "../../FreefemCommon/createPartition.idp"
SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
/# endPartition #/
/# debutGlobalData #/
Aglobal=vaglobal(Vh,Vh,solver=UMFPACK); // global matrix
rhsglobal[]=vaglobal(0,Vh); // global rhs
uglob[]=Aglobal^-1*rhsglobal[];
/# finGlobalData #/
/# debutLocalData #/
for(int i=0; i<npart; ++i)
{
 cout << "Domain: " << i << "/" << npart << endl;
 matrix aT=Aglobal*Rih[i]';
 aA[i]=Rih[i]*aT;
 set(aA[i],solver=UMFPACK); // direct solvers using UMFPACK
}
/# finLocalData #/
/# debutPCGSolve #/
include "../../FreefemCommon/matvecAS.idp"
include "PCG.idp"
Vh un=0, sol; // initial guess un and final solution
cout << "Schwarz Dirichlet algorithm" << endl;
sol[]=myPCG(un[],tol,maxit); // PCG with initial guess un
plot(sol,cmm="Final solution",wait=1,dim=3,fill=1,value=1);
Vh er=sol-uglob;
cout << "Final error: " << er[].linfty << endl;
/# finPCGSolve #/
In the three-dimensional case, the only thing that changes with respect to the main script is the preparation of the data:

include "../../FreefemCommon/data3d.edp"
include "../../FreefemCommon/decomp3d.idp"
include "../../FreefemCommon/createPartition3d.idp"
SubdomainsPartitionUnity3(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,VolumeThi);
[Figure 3.1: residual histories versus iteration count for overlap sizes 2, 5 and 10; left: iterative version, right: Krylov-accelerated version.]
include "../../FreefemCommon/matvecAS.idp"
include "GMRES.idp"
Vh un=0, sol, er; // initial guess, final solution and error
sol[]=GMRES(un[],tol,maxit);
plot(sol,dim=3,wait=1,cmm="Final solution",value=1,fill=1);
er[]=sol[]-uglob[];
cout << "Final error = " << er[].linfty/uglob[].linfty << endl;

In the case of the use of BiCGstab, only the last part has to be modified:

include "../../FreefemCommon/matvecAS.idp"
include "BiCGstab.idp"
Vh un=0, sol, er; // initial guess, final solution and error
sol[]=BiCGstab(un[],tol,maxit);
plot(sol,dim=3,wait=1,cmm="Final solution",value=1,fill=1);
er[]=sol[]-uglob[];
cout << "Final error = " << er[].linfty/uglob[].linfty << endl;
In the three-dimensional case we have to use the specific preparation of the data, as for the case of the ASM. The performance of Krylov methods is much less sensitive to the overlap size than that of the iterative version, as shown in Figure 3.1. Note also that in the case of BiCGstab, an iteration involves two matrix-vector products; thus the cost of a BiCGstab iteration is twice the cost of a CG iteration. In all previous scripts the convergence residuals can be stored in a file in Matlab/Octave/Scilab form, and the convergence histories can then be plotted in one of these languages.
Chapter 4
Coarse Spaces
$$\begin{cases} -\Delta u = f, & \text{in } \Omega,\\ u = 0, & \text{on } \partial\Omega. \end{cases}$$
The problem with the one-level method comes from the fact that in the Schwarz methods of Chapter 1 there is a lack of global exchange of information. Data are exchanged only from one subdomain to its direct neighbors. But the solution in each subdomain depends on the right-hand side in all subdomains. Let us denote by $N_d$ the number of subdomains in one direction. Then, for instance, the leftmost domain of Figure 4.1 needs at least $N_d$ iterations before being aware of the value of the right-hand side $f$ in the rightmost subdomain. The length of the plateau is thus typically related to the number of subdomains in one direction, and therefore to the notion of scalability met in the context of high performance computing.
To be more precise, there are two common notions of scalability: strong scalability, where a fixed global problem is solved faster as the number of cores grows, and weak scalability, where the time to solution stays constant as the problem size and the number of cores grow proportionally.
A mechanism through which scalability (in this case, weak scalability) can be achieved consists in the introduction of a two-level preconditioner via a coarse space correction. In two-level methods, a small problem, of size typically the number of subdomains, couples all subdomains at each iteration. Note that here the term scalable refers to the number of iterations. In real applications, scalability is defined in terms of the total computing time. When measured by the computing time, two-level methods are often not scalable for large scale calculations, because the coarse grid solver is not scalable. As a result, a multilevel (more than two) approach is often necessary.
To illustrate the need for a coarse space, in Figure 4.2 we consider a 2D problem decomposed into 2 × 2, 4 × 4 and 8 × 8 subdomains. For each domain decomposition, we have two curves: one for a one-level method, which has longer and longer plateaus, and a second one for the method with a coarse grid correction, denoted by M2, for which the plateaus are much smaller. The problem of one-level methods and its cure are also well illustrated in Figure 4.3 for a domain decomposition into 64 strips, as well as in Table 4.1. The one-level method has a long plateau in the convergence, whereas with a coarse space correction convergence is quite fast. We also see that, for the one-level curves, the plateau has a size proportional to the number of subdomains in one direction. This can be understood by looking at Figure 4.4, where we plot the slow decrease of the error for a one-dimensional Poisson problem.
[Figure 4.2: $\log_{10}$ of the error versus the number of iterations (GCR) for decompositions into 2×2, 4×4 and 8×8 subdomains, with (M2) and without coarse grid correction.]
From the linear algebra point of view, stagnation in the convergence of one-level Schwarz methods corresponds to a few very low eigenvalues in the spectrum of the preconditioned problem. Using the preconditioners $M^{-1}_{ASM}$ or $M^{-1}_{RAS}$, we can remove the influence of very large eigenvalues of the coefficient matrix, which correspond to high frequency modes. It has been proved that, for a SPD (symmetric positive definite) matrix, the largest eigenvalue of the system preconditioned by $M^{-1}_{ASM}$ is bounded by the number of colors needed for the overlapping decomposition such that different colors are used for adjacent subdomains; see [179] or [164] for instance. But the small eigenvalues still exist and hamper the convergence. These small eigenvalues correspond to low frequency modes and represent a certain global information with which we have to deal efficiently.
A classical remedy consists in the introduction of a coarse grid or coarse space correction that couples all subdomains at each iteration of the iterative method. This is closely related to the deflation technique, classical in linear algebra; see Nabben and Vuik's paper [178] and references therein. This is connected as well to augmented or recycled Krylov space methods; see e.g. [73], [149], [163] or [21] and references therein.
Suppose we have identified the modes corresponding to the slow convergence of the iterative method used to solve the linear system
$$Ax = b;$$
in elasticity, for instance, these are typically the rigid body motions. Let us call $Z$ the rectangular matrix whose columns correspond to these slow modes. There are algebraic ways to incorporate this information in order to accelerate the method. We give here the presentation that is classical in domain decomposition; see e.g. [133] or [179]. In the case where $A$ is SPD, the starting point is to consider the minimization problem $\min_\beta \|y + Z\beta - x\|_A^2$, whose solution is
$$\beta = (Z^TAZ)^{-1}Z^T(b - Ay),$$
so that the best correction in the range of $Z$ is
$$Z\beta = Z\,(Z^TAZ)^{-1}Z^T(b - Ay).$$
$$M^{-1}_{ASM,2} := R_0^T(R_0AR_0^T)^{-1}R_0 + \sum_{i=1}^N R_i^T(R_iAR_i^T)^{-1}R_i. \qquad (4.1)$$
Compared to the one-level Schwarz method, where only local subproblems have to be solved in parallel, the two-level method adds the solution of a linear system with the matrix $R_0AR_0^T$, which is global in nature. This global problem is called the coarse problem.
The coarse problem couples all subdomains at each iteration, but its matrix is a small $O(N \times N)$ square matrix, and the extra cost is negligible compared to the gain, provided the number of subdomains is not too large.
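A minimal sketch of the application of the two-level preconditioner (4.1); the restriction matrices R (local) and R0 (coarse) are assumed given, and dense solves stand in for the factorizations used in practice:

import numpy as np

def asm2_apply(A, r, R, R0):
    # M^{-1} r = R0^T (R0 A R0^T)^{-1} R0 r + sum_i Ri^T (Ri A Ri^T)^{-1} Ri r
    z = R0.T @ np.linalg.solve(R0 @ A @ R0.T, R0 @ r)   # coarse correction
    for Ri in R:                                        # local solves (parallel in practice)
        z += Ri.T @ np.linalg.solve(Ri @ A @ Ri.T, Ri @ r)
    return z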
Figure 4.4: Convergence of the error for the one level Schwarz method in
the 1D case.
Figure 4.3: Convergence curves with and without a coarse space correction
for a decomposition into 64 strips
Recall first that we have a discrete partition of unity (see § 1.3) in the following sense: let $\mathcal N_i$ be subsets of indices and $D_i$ be diagonal matrices, $1 \le i \le N$,
$$D_i : \mathbb{R}^{\#\mathcal N_i} \longrightarrow \mathbb{R}^{\#\mathcal N_i}, \qquad (4.2)$$
so that we have
$$\sum_{i=1}^N R_i^T D_i R_i = Id.$$
With these ingredients we are ready to provide the definition of what we will further call the classical or Nicolaides coarse space.
Definition 4.2.1 (Nicolaides coarse space) We define $Z$ as the matrix whose $i$-th column is
$$Z_i := R_i^T D_i R_i\,\mathbf{1}, \quad 1 \le i \le N, \qquad (4.4)$$
where $\mathbf{1} \in \mathbb{R}^{\#\mathcal N}$ is the vector full of ones.
The numerical results of Figure 4.3 and Table 4.1 have been obtained with the two-level Schwarz preconditioner (4.1), where $R_0 = Z^T$, with $Z$ defined by (4.4).
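Assuming the reconstruction of (4.4) above, the matrix $Z$ can be assembled from the restriction and partition-of-unity matrices as follows (a sketch; in practice $Z$ is built subdomain by subdomain):

import numpy as np

def nicolaides_Z(R, D, n):
    # column i of Z is Ri^T Di Ri 1: the partition-of-unity-weighted
    # constant function on subdomain i, extended by zero to the whole domain
    ones = np.ones(n)
    return np.column_stack([Ri.T @ (Di @ (Ri @ ones)) for Ri, Di in zip(R, D)])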
Based on the previous quantities, we can build two versions of the additive Schwarz preconditioner: first the classical one, called AS2, corresponding to formula (4.1), and then the alternative version $Q + P^T M^{-1}_{AS,1} P$. After that, we can use one of these inside a preconditioned conjugate gradient. Finally, the main program script is given by AS2-PCG.edp, whose first part is identical to that of the one-level method.
include "../../FreefemCommon/matvecAS.idp"
include "../../FreefemCommon/coarsespace.idp"
include "PCGCS.idp"
Vh un=0, sol; // initial guess un and final solution
cout << "Schwarz Dirichlet algorithm" << endl;
sol[]=myPCG(un[],tol,maxit); // PCG with initial guess un
plot(sol,cmm="Final solution",wait=1,dim=3,fill=1,value=1);
Vh er=sol-uglob;
cout << "Final error: " << er[].linfty << endl;
Figure 4.6: Convergence history without and with coarse space for various
domain decompositions
In this chapter we first introduce the spectral coarse space, which generalizes the idea of the Nicolaides coarse space. In order to quantify the action of this new preconditioner, we need to estimate the condition number of the preconditioned matrix $M^{-1}_{ASM,2}A$. This requires a theoretical framework allowing a functional analysis of the condition number of the two-level Schwarz method with this spectral coarse space.
the coefficients are across subdomain interfaces (see e.g. [62, 129, 47, 48])
or inside the subdomains and not near their boundaries (cf. [155, 154]).
However, when the discontinuities are along subdomain interfaces, classical
methods do not work anymore. In [142], we proposed the construction
of a coarse subspace for a scalar elliptic positive definite operator, which
leads to a two-level method that is observed to be robust with respect
to heterogeneous coefficients for fairly arbitrary domain decompositions,
e.g. provided by an automatic graph partitioner such as METIS or SCOTCH
[117, 24]. This method was extensively studied from the numerical point
of view in [143]. The main idea behind the construction of the coarse
space is the computation of the low-frequency modes associated with
a generalized eigenvalue problem based on the Dirichlet-to-Neumann
(DtN) map on the boundary of each subdomain. We use the harmonic
extensions of these low-frequency modes to the whole subdomain to
build the coarse space. With this method, even for discontinuities along
(rather than across) the subdomain interfaces (cf. Fig. 5.1), the iteration
counts are robust to arbitrarily large jumps of the coefficients leading
to a very efficient, automatic preconditioning method for scalar hetero-
geneous problems. As usual with domain decomposition methods, it is
also well suited for parallel implementation as it has been illustrated in [115].
We first motivate why the use of spectral information local to the sub-
domains is the key ingredient in obtaining an optimal convergence. A
full mathematical analysis of the two-level method will be given in chapter 5.
We now propose a construction of the coarse space that will be suitable for
parallel implementation and efficient for arbitrary domain decompositions
and highly heterogeneous problems such as the Darcy problem with Dirichlet
boundary conditions:
$$-\mathrm{div}(\kappa\,\nabla u) = f \quad \text{in } \Omega,$$
where $\kappa$ can vary over several orders of magnitude, typically between 1 and $10^6$. More precisely, let us first consider, at the continuous level, the Dirichlet-to-Neumann map $\mathrm{DtN}_i$.
Definition 5.1.1 (Dirichlet-to-Neumann map) For any function defined on the interface, $u_{\Gamma_i} : \Gamma_i \to \mathbb{R}$, we consider the Dirichlet-to-Neumann map
$$\mathrm{DtN}_i(u_{\Gamma_i}) = \kappa\,\frac{\partial v}{\partial n_i}\Big|_{\Gamma_i},$$
where $\Gamma_i := \partial\Omega_i \setminus \partial\Omega$ and $v$ satisfies
$$\begin{cases} -\mathrm{div}(\kappa\,\nabla v) = 0, & \text{in } \Omega_i,\\ v = u_{\Gamma_i}, & \text{on } \Gamma_i, \\ v = 0, & \text{on } \partial\Omega_i \cap \partial\Omega. \end{cases} \qquad (5.2)$$
To construct the coarse grid subspace, we use some low frequency modes of the DtN operator, obtained by solving a local eigenvalue problem of the form
$$\mathrm{DtN}_i(u) = \lambda\,u.$$
We now motivate our choice of this coarse space based on the DtN map. We write the original Schwarz method at the continuous level, where the domain is decomposed in a one-way partitioning. The error $e_i^n := u_i^n - u_{|\Omega_i}$ between the iterate at step $n$ of the algorithm and the solution in subdomain $\Omega_i$ is harmonic:
$$-\mathrm{div}(\kappa\,\nabla e_i^{n+1}) = 0 \quad \text{in } \Omega_i. \qquad (5.5)$$
On the 1D example sketched in Figure 5.2, we see that the rate of conver-
gence of the algorithm is related to the decay of the harmonic functions eni in
the vicinity of i via the subdomain boundary condition. Indeed, a small
value for this BC leads to a smaller error in the entire subdomain thanks
to the maximum principle. That is, a fast decay for this value corresponds
to a large eigenvalue of the DtN map whereas a slow decay corresponds to
small eigenvalues of this map because the DtN operator is related to the
normal derivative at the interface and the overlap is thin. Thus the small
eigenvalues of the DtN map are responsible for the slow convergence of the
algorithm and it is natural to incorporate them in the coarse grid space.
For any domain $D \subset \Omega$ we need the usual norm $\|\cdot\|_{L^2(D)}$ and seminorm $|\cdot|_{H^1(D)}$, as well as the $L^2$ inner product $(v, w)_{L^2(D)}$. To simplify notations we write
$$\|v\|_{0,D}^2 = \int_D v^2, \qquad |v|_{a,D}^2 = \int_D \kappa\,|\nabla v|^2.$$
When $D = \Omega$ we omit the domain from the subscript and write $\|\cdot\|_0$ and $|\cdot|_a$ instead of $\|\cdot\|_{0,\Omega}$ and $|\cdot|_{a,\Omega}$, respectively. Note that the seminorm $|\cdot|_a$ becomes a norm on $H_0^1(\Omega)$.
Finally, we will also need averages and norms defined on $(d-1)$-dimensional manifolds $X \subset \mathbb{R}^d$, namely, for any $v \in L^2(X)$, we define
$$\overline{v}^X = \frac{1}{|X|}\int_X v \qquad \text{and} \qquad \|v\|_{0,X}^2 = \int_X v^2.$$
The standard nodal interpolant onto the P1 finite element space is
$$I_h(v) = \sum_{k\in\mathcal N} v(x_k)\,\phi_k.$$
This interpolant is known to be stable in the sense that there exists a constant $C_{I_h} > 0$ such that, for all continuous, piecewise quadratic (w.r.t. $\mathcal T_h$) functions $v$, we have
$$|I_h(v)|_a \le C_{I_h}\,|v|_a. \qquad (5.8)$$
and $\mathrm{Int}(\cdot)$ denotes the interior of a domain. Extensions by more than one layer can then be defined recursively. Note that in the case of P1 finite elements the degrees of freedom are in fact the values of the unknown function at the vertices of the mesh.
$$\mathrm{supp}(\chi_j) \subset \overline{\Omega_j}, \qquad 0 \le \chi_j \le 1, \qquad \sum_{j=1}^N \chi_j(x) = 1 \quad \forall x\in\overline\Omega, \qquad \|\nabla\chi_j\|_\infty \le C\,\delta_j^{-1}.$$
Proof Let
$$d_j(x) = \begin{cases} \mathrm{dist}(x, \partial\Omega_j), & x \in \Omega_j,\\ 0, & x \notin \Omega_j. \end{cases}$$
Then it is enough to set
$$\chi_j(x) = I_h\!\left(\frac{d_j(x)}{\sum_{\ell=1}^N d_\ell(x)}\right).$$
The first three properties are obvious; the last one can be found in [179], page 57.
$$\Omega_j^\circ := \{x \in \Omega_j : \chi_j(x) < 1\},$$
$$V_h(\Omega_j) := \{v_{|\Omega_j} : v \in V_h\},$$
and the extension by zero
$$E_j : V_{h,0}(\Omega_j) \longrightarrow V_h. \qquad (5.10)$$
This is well defined because $a(\cdot,\cdot)$ is coercive on $V_h$. For the same reason, the matrix $A_j$ in (5.11) is invertible. An important ingredient in the definition of Schwarz algorithms are the following operators: the local projections $\widetilde P_j : V_h \to V_{h,0}(\Omega_j)$, defined by $a_j(\widetilde P_j u, v_j) = a(u, E_j v_j)$ for all $v_j \in V_{h,0}(\Omega_j)$, and
$$P_j := E_j\,\widetilde P_j : V_h \to V_h, \quad j = 1, \dots, N.$$
The additive operator is
$$P_{ad} := \sum_{j=0}^N P_j = \sum_{j=0}^N E_j\,\widetilde P_j.$$
We can give an interpretation of the above from the linear algebra point of view. If we denote by $\mathbf P_j$ and $\widetilde{\mathbf P}_j$ the matrix counterparts of $P_j$ and $\widetilde P_j$ w.r.t. the finite element basis $\{\phi_k\}_{k\in\mathcal N}$, we get
$$\widetilde{\mathbf P}_j = A_j^{-1}R_jA, \qquad \mathbf P_j = R_j^TA_j^{-1}R_jA.$$
Taking into account (5.11), the additive matrix which corresponds to the parallel or block-Jacobi version of the original Schwarz method is then given by
$$P_{ad} = M^{-1}_{ASM,2}\,A = \sum_{j=0}^N \mathbf P_j. \qquad (5.13)$$
$$\lambda_{\max}(P_{ad}) = \sup_{u\in V_h}\frac{a(P_{ad}\,u, u)}{a(u, u)}, \qquad \lambda_{\min}(P_{ad}) = \inf_{u\in V_h}\frac{a(P_{ad}\,u, u)}{a(u, u)}.$$
With $N_c$ the number of colors,
$$\lambda_{\max}(P_{ad}) \le N_c + 1,$$
and, without the coloring argument, simply
$$\lambda_{\max}(P_{ad}) \le N + 1.$$
By using the fact that the $P_j$ are $a$-orthogonal projectors, we get, for all $u \in V_h$,
$$\frac{a(P_j u, u)}{\|u\|_a^2} = \frac{a(P_j u, P_j u)}{\|u\|_a^2} \le 1,$$
which implies
$$\lambda_{\max}(P_{ad}) = \sup_{u\in V_h}\sum_{j=0}^N\frac{a(P_j u, u)}{\|u\|_a^2} \le \sum_{j=0}^N\,\sup_{u\in V_h}\frac{a(P_j u, u)}{\|u\|_a^2} \le N + 1.$$
As for the estimate with a bound on the maximum number of colors, we proceed in the following way. By assumption, there are $N_c$ sets of indices $\Theta_i$, $i = 1, \dots, N_c$, such that all subdomains $\Omega_j$ for $j \in \Theta_i$ have no intersection with one another. Then
$$P_{\Theta_i} := \sum_{j\in\Theta_i} P_j, \quad i = 1, \dots, N_c,$$
are again $a$-orthogonal projectors, and the previous argument applied to $P_0, P_{\Theta_1}, \dots, P_{\Theta_{N_c}}$ yields the bound $N_c + 1$.
$$v = \sum_{j=0}^N E_j\,v_j, \quad \text{with } v_0 \in V_0,\ v_j \in V_{h,0}(\Omega_j) \text{ for } j \ge 1, \qquad (5.14)$$
and
$$\sum_{j=0}^N \|v_j\|^2_{a,\Omega_j} \le C_0^2\,\|v\|_a^2. \qquad (5.15)$$
Then
$$\lambda_{\min}(M^{-1}_{ASM,2}A) \ge C_0^{-2}. \qquad (5.16)$$
$$a(P_{ad}u, u) = \sum_{j=0}^N a(P_j u, u) = \sum_{j=0}^N a(P_j u, P_j u) = \sum_{j=0}^N a(E_j\widetilde P_j u,\,E_j\widetilde P_j u) = \sum_{j=0}^N a_j(\widetilde P_j u, \widetilde P_j u) = \sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}. \qquad (5.17)$$
On the other hand,
$$\|u\|_a^2 = a(u, u) = \sum_{j=0}^N a(u, E_j u_j) = \sum_{j=0}^N a_j(\widetilde P_j u, u_j) \le \sum_{j=0}^N \|\widetilde P_j u\|_{a,\Omega_j}\,\|u_j\|_{a,\Omega_j} \le \Big(\sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}\Big)^{1/2}\Big(\sum_{j=0}^N \|u_j\|^2_{a,\Omega_j}\Big)^{1/2}. \qquad (5.18)$$
From (5.18) we get first that
$$\|u\|_a^4 \le \Big(\sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}\Big)\Big(\sum_{j=0}^N \|u_j\|^2_{a,\Omega_j}\Big),$$
so that
$$\frac{a(P_{ad}u, u)}{\|u\|_a^2} = \frac{\sum_{j=0}^N \|\widetilde P_j u\|^2_{a,\Omega_j}}{\|u\|_a^2} \ge \frac{\|u\|_a^2}{\sum_{j=0}^N \|u_j\|^2_{a,\Omega_j}}.$$
We can clearly see that if the decomposition $u = \sum_{j=0}^N E_j u_j$ is $C_0^2$-stable, that is, if (5.15) is verified, then the smallest eigenvalue of $P_{ad}$ satisfies
$$\lambda_{\min}(P_{ad}) = \inf_{u\in V_h}\frac{a(P_{ad}u, u)}{\|u\|_a^2} \ge C_0^{-2}.$$
From Lemma 5.4.1 and (5.16) we get the final condition number estimate
$$\kappa(M^{-1}_{ASM,2}A) \le C_0^2\,(N_c + 1).$$
The functions $\mathbf 1_{\Omega_j}$ are not in $V_{h,0}(\Omega_j)$, so we will use the partition of unity functions $\chi_j$ to build the global coarse space. Recall that $I_h$ denotes the standard nodal value interpolation operator from $C(\overline\Omega)$ to $V_h(\Omega)$, and define
$$\Phi_j^H := I_h(\chi_j\,\mathbf 1_{\Omega_j}),$$
$$V_0 := \mathrm{span}\{\Phi_j^H : \Omega_j \text{ is a floating subdomain}\}.$$
The use of the standard nodal value interpolation operator $I_h$ is made necessary by the fact that $\chi_j\,\mathbf 1_{\Omega_j}$ is not a P1 finite element function. By construction, each of the functions $\Phi_j^H$ belongs to $V_{h,0}$, so that, as required, $V_0 \subset V_{h,0}$. The dimension of $V_0$ is the number of floating subdomains.
where $a_j$ is the local bilinear form associated with the Laplace operator on the local domain $\Omega_j$,
$$a_j(v, w) = \int_{\Omega_j}\nabla v\cdot\nabla w, \qquad \forall v, w \in V_h(\Omega_j).$$
As before, these functions are not in $V_{h,0}(\Omega_j)$, so we use the partition of unity functions $\chi_j$ to build the global coarse space
$$V_0 := \mathrm{span}\{\Phi^H_{j,\ell} : 1 \le j \le N \text{ and } 1 \le \ell \le m_j\}, \qquad \Phi^H_{j,\ell} := I_h(\chi_j\,v^{(j)}_\ell),$$
where the $v^{(j)}_\ell$ denote the low-frequency eigenfunctions of the local DtN eigenproblem.
Remark 5.5.1 The structure of this coarse space calls for the following remarks.
As in the case of the Nicolaides coarse space, the use of the standard nodal value interpolation operator $I_h$ is justified by the fact that $\chi_j\,v^{(j)}_\ell$ is not a P1 finite element function.
The dimension of $V_0$ is $\sum_{j=1}^N m_j$.
When $m_j = 1$ and the subdomain does not touch the boundary of $\Omega$, the lowest eigenvalue of the DtN map is zero and the corresponding eigenvector is a constant vector. Thus the Nicolaides and the spectral coarse spaces are then identical.
In the following we illustrate some very natural properties of the projection operators defined above. If $\Pi_j = \Pi_j^{Nico}$ and $\Omega_j$ is a floating subdomain then, under some regularity assumptions on the subdomains, we have the following inequality (see [179] for this type of result):
$$\|u - \Pi_j u\|^2_{0,\Omega_j} \le C_{tr}\,H_j\,|u|^2_{a,\Omega_j}.$$
When $\Omega_j$ is not floating, $\Pi_j u = 0$ and the above inequality becomes a classical trace inequality. In the spectral case, we expand $u$ in the eigenbasis,
$$u = \sum_{\ell} a_j(v^{(j)}_\ell, u)\,v^{(j)}_\ell. \qquad (5.26)$$
The restriction of the functions $\{v^{(j)}_\ell\}_{\ell=1}^{n_{\Gamma_j}}$ to the boundary $\Gamma_j$ forms a complete basis of $V_h(\Gamma_j)$. This implies that $v^{(j)}_{n_{\Gamma_j}+\ell} \equiv 0$ on $\Gamma_j$ for all $\ell = 1, \dots, n_j$. Moreover, it follows from the definition of the eigenproblem that the functions $\{v^{(j)}_\ell\}_{\ell=1}^{n_j}$ are also orthogonal in the $(\cdot,\cdot)_{0,\Omega_j}$ inner product. Therefore
$$\|u - \Pi_j u\|^2_{0,\Omega_j} = \Big\|\sum_{\ell=m_j+1}^{n_{\Gamma_j}} a_j(v^{(j)}_\ell, u)\,v^{(j)}_\ell\Big\|^2_{0,\Omega_j} \qquad (5.27)$$
$$= \sum_{\ell=m_j+1}^{n_{\Gamma_j}} \frac{1}{\lambda^{(j)}_\ell}\,a_j(v^{(j)}_\ell, u)^2 \qquad (5.28)$$
$$\le \frac{1}{\lambda^{(j)}_{m_j+1}}\sum_{\ell=1}^{n_{\Gamma_j}} a_j(v^{(j)}_\ell, u)^2 = \frac{1}{\lambda^{(j)}_{m_j+1}}\,|u|^2_{a,\Omega_j},$$
and the result follows from (5.24), (5.25) and (5.27). Note that this estimate does not require any regularity assumption on the subdomain.
$$u_0 := I_h\Big(\sum_{j=1}^N \chi_j\,\Pi_j u\Big) \in V_0 \quad\text{and}\quad u_j := I_h\big(\chi_j(u - \Pi_j u)\big), \qquad (5.29)$$
where $\Pi_j$ is $\Pi^{Nico}_j$ or $\Pi^{spec}_j$. Then $\{u_j\}_{0\le j\le N}$ form a $C_0^2$-stable decomposition of $u$ in the sense of Definition 5.4.1, with
$$C_0^2 = \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1,$$
where $c_j$ depends on the choice of the coarse space and is given by formula (5.23).
Proof Note that the proof is the same for both cases, that is, for $\Pi_j = \Pi^{Nico}_j$ and $\Pi_j = \Pi^{spec}_j$. First,
$$\sum_{j=1}^N u_j = \sum_{j=1}^N I_h\big(\chi_j(u - \Pi_j u)\big) = I_h\Big(\sum_{j=1}^N \chi_j\,u\Big) - I_h\Big(\sum_{j=1}^N \chi_j\,\Pi_j u\Big) = u - u_0.$$
We go now to the second part, that is, to the proof of (5.15). From a simple application of the triangle inequality it follows that
$$\sum_{j=0}^N \|u_j\|_a^2 \le \big(\|u - u_0\|_a + \|u\|_a\big)^2 + \sum_{j=1}^N \|u_j\|_a^2. \qquad (5.30)$$
Since the interpolant $I_h$ is stable with respect to the $a$-norm (see (5.8)), and since each cell is overlapped by at most $k_0$ domains, we have
$$\|u - u_0\|_a^2 = \Big|I_h\Big(\sum_{j=1}^N \chi_j(u - \Pi_j u)\Big)\Big|_a^2 \le C_{I_h}\,\Big|\sum_{j=1}^N \chi_j(u - \Pi_j u)\Big|_a^2 \le C_{I_h}\,k_0\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2.$$
Substituting this into (5.30) and using the definition of $u_j$, as well as the $a$-stability of the interpolant $I_h$, we get
$$\sum_{j=0}^N \|u_j\|_a^2 \le C_{I_h}(k_0 + 1)\,\Big(\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2\Big) + \|u\|_a^2. \qquad (5.31)$$
Next, by the product rule,
$$\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2 \le \sum_{j=1}^N \Big(2\,|u - \Pi_j u|^2_{a,\Omega_j} + 2\,\|\nabla\chi_j\|^2_\infty\,\|u - \Pi_j u\|^2_{0,\Omega_j}\Big) \le \sum_{j=1}^N \Big(4\,|u|^2_{a,\Omega_j} + 4\,|\Pi_j u|^2_{a,\Omega_j} + 2\,C^2\delta_j^{-2}\,\|u - \Pi_j u\|^2_{0,\Omega_j}\Big). \qquad (5.32)$$
Note that in the last part we used the last property of the partition of unity stated in Lemma 5.3.2.
We have
$$\sum_{j=1}^N |\chi_j(u - \Pi_j u)|_a^2 \le \sum_{j=1}^N |u|^2_{a,\Omega_j}\,(8 + 8\,C^2 c_j) \le \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\sum_{j=1}^N |u|^2_{a,\Omega_j} \le \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,|u|_a^2. \qquad (5.33)$$
The last inequality comes from the assumption that each point $x \in \Omega$ is contained in at most $k_0$ subdomains.
From (5.31) and (5.33) we conclude that
$$\sum_{j=0}^N \|u_j\|_a^2 \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,\|u\|_a^2,$$
which ends the proof and yields the following formula for the constant $C_0$:
$$C_0^2 = \Big(8 + 8\,C^2\max_{j=1}^N c_j\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1. \qquad (5.34)$$
From the abstract Schwarz theory we have the condition number estimates in the two cases (see Corollary 5.4.1). For the Nicolaides coarse space,
$$\kappa(M^{-1}_{ASM,2}A) \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N\Big(C_P^2 + C_{tr}\,\frac{H_j}{\delta_j}\Big)\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,(N_c + 1),$$
and for the spectral coarse space,
$$\kappa(M^{-1}_{ASM,2}A) \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N\Big(C_P^2 + \frac{1}{\delta_j\,\lambda^{(j)}_{m_j+1}}\Big)\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,(N_c + 1).$$
Figure 5.3: Assumption on the overlapping region
Remark 5.6.1 Note that, by choosing the number $m_j$ of modes per subdomain in the coarse space such that $\lambda^{(j)}_{m_j+1} \ge H_j^{-1}$, the preconditioned problem verifies
$$\kappa(M^{-1}_{ASM,2}A) \le \Big[\Big(8 + 8\,C^2\max_{j=1}^N\Big(C_P^2 + \frac{H_j}{\delta_j}\Big)\Big)\,k_0\,C_{I_h}(k_0 + 1) + 1\Big]\,(N_c + 1).$$
Hence, we have the same type of estimate as for the Nicolaides coarse space.
An interesting observation is that the bound depends only in an additive
way on the constant CP and on the ratio of subdomain diameter to overlap.
We make two geometric assumptions on the regions $D_{jk}$ and the manifolds $X_{jk}$ covering the overlapping zone (cf. Figure 5.3), and we assume as well that the triangulation $\mathcal T_h$ resolves each of the regions $D_{jk}$ and each of the manifolds $X_{jk}$.
Lemma 5.7.1 There exists a uniform constant $C_P > 0$ such that the following Poincare-type inequality holds for all $j = 1, \dots, N$ and $k = 1, \dots, K_j$:
$$\|u\|_{0,D_{jk}} \le C_P\,\delta_j\,|u|_{a,D_{jk}} + \frac{|D_{jk}|^{1/2}}{|X_{jk}|}\,\Big|\int_{X_{jk}} u\Big|.$$
Proof It follows from Lemma 5.7.1, as well as the triangle and the Cauchy-Schwarz inequalities, that
$$\|u\|^2_{0,D_{jk}} \le C_P^2\,\delta_j^2\,|u|^2_{a,D_{jk}} + \frac{|D_{jk}|}{|X_{jk}|^2}\Big(\int_{X_{jk}} u\Big)^2 \le C_P^2\,\delta_j^2\,|u|^2_{a,D_{jk}} + \frac{|D_{jk}|}{|X_{jk}|}\int_{X_{jk}} u^2 \le C_P^2\,\delta_j^2\,|u|^2_{a,D_{jk}} + \delta_j\,\|u\|^2_{0,X_{jk}}.$$
$$-\mathrm{div}(\kappa\,\nabla u) = f \quad \text{in } \Omega.$$
All these approaches have their advantages and disadvantages, which depend on many factors, in particular the type of coefficient variation and the size of the overlap. When the coefficient variation is on a very small scale, many of the above approaches lead to rather large (and therefore costly) coarse spaces, and it is still an open theoretical question how large the coarse space has to become in each case to achieve robustness for an arbitrary coefficient variation, and how to mitigate this.
Chapter 6
Neumann-Neumann and
FETI Algorithms
The last decade has shown that Neumann-Neumann [12, 128, 121] type and FETI [78] algorithms, as well as their variants such as the BDDC [46] and FETI-DP [75] algorithms, are very efficient domain decomposition methods. Most of the early theoretical and numerical work was carried out for symmetric positive definite second order problems; see for example [38, 131, 78, 132]. The methods were then extended to other problems, like the advection-diffusion equations [92, 1], plate and shell problems [176], or the Stokes equations [150, 177].
These methods require pseudo inverses for local Neumann solves which can
be ill-posed either in the formulation of the domain decomposition problem
or in the domain decomposition preconditioner. FETI methods are based on
the introduction of Lagrange multipliers on the interfaces to ensure a weak
continuity of the solution. We present here an alternative formulation of the
Neumann-Neumann algorithm with the purpose of simplifying the numerical
implementation. This is made at the expense of minor modifications to the
original algorithm while keeping its main features. We start with a historical
and heuristic presentation of these methods.
$$A = \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ A_{\Gamma 1}A_{11}^{-1} & A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix} \begin{pmatrix} A_{11} & 0 & A_{1\Gamma}\\ 0 & A_{22} & A_{2\Gamma}\\ 0 & 0 & S \end{pmatrix}, \qquad (6.2)$$
where
$$S = A_{\Gamma\Gamma} - A_{\Gamma 1}A_{11}^{-1}A_{1\Gamma} - A_{\Gamma 2}A_{22}^{-1}A_{2\Gamma}, \qquad (6.3)$$
so that
$$A^{-1} = \begin{pmatrix} I & 0 & -A_{11}^{-1}A_{1\Gamma}\\ 0 & I & -A_{22}^{-1}A_{2\Gamma}\\ 0 & 0 & I \end{pmatrix} \begin{pmatrix} A_{11}^{-1} & 0 & 0\\ 0 & A_{22}^{-1} & 0\\ 0 & 0 & S^{-1} \end{pmatrix} \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ -A_{\Gamma 1}A_{11}^{-1} & -A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix}. \qquad (6.4)$$
Note that in practice the three matrices $A_{11}^{-1}$, $A_{22}^{-1}$ and $S^{-1}$ are never evaluated explicitly. From the above formula, it appears that applying $A^{-1}$ to the right-hand side $F$ is equivalent to solving linear systems with the three above matrices, which can be done by factorizing them. The parallelism comes from the fact that the matrices $A_{11}$ and $A_{22}$ can be factorized concurrently. We say that we advance two fronts in parallel. Once this is done, the matrix $S$ can be computed and then factorized.
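The block elimination just described can be summarized in a few lines; the following sketch mirrors (6.2)-(6.4) for two subdomains (dense solves stand in for the factorizations, and in practice the two interior solves run concurrently):

import numpy as np

def schur_solve(A11, A22, A1G, A2G, AG1, AG2, AGG, b1, b2, bG):
    # form the Schur complement S and the condensed right-hand side
    S = AGG - AG1 @ np.linalg.solve(A11, A1G) - AG2 @ np.linalg.solve(A22, A2G)
    bS = bG - AG1 @ np.linalg.solve(A11, b1) - AG2 @ np.linalg.solve(A22, b2)
    xG = np.linalg.solve(S, bS)                  # interface unknowns
    x1 = np.linalg.solve(A11, b1 - A1G @ xG)     # interior unknowns, subdomain 1
    x2 = np.linalg.solve(A22, b2 - A2G @ xG)     # interior unknowns, subdomain 2
    return x1, x2, xG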
By considering more than two fronts and recursive variants, it is possible to achieve numerical efficiency for a dozen or two of cores and for problems with one or two million unknowns in two dimensions, or hundreds of thousands of degrees of freedom in three dimensions. Although these figures improve over time, it is generally accepted that purely direct solvers cannot scale well on large parallel platforms. The bottleneck comes from the fact that $S$ is a full matrix, which is costly to compute and factorize. This is unfortunate, since direct solvers have the advantage over iterative solvers of being very predictable, reliable and easy to integrate via third-party libraries. There is thus a great amount of effort in developing hybrid direct/iterative solvers that try to take advantage of the two worlds.
$$\left(A_{\Gamma\Gamma} - A_{\Gamma 1}A_{11}^{-1}A_{1\Gamma} - A_{\Gamma 2}A_{22}^{-1}A_{2\Gamma}\right)x_\Gamma = \tilde b_\Gamma, \qquad (6.5)$$
$$x_\Gamma = S^{-1}\tilde b_\Gamma. \qquad (6.6)$$
$$\begin{cases} -\Delta u = f, & \text{in } \Omega,\\ u = 0, & \text{on } \partial\Omega. \end{cases}$$
In order to simplify the presentation, we forget about the boundary condition on $\partial\Omega$. Suppose that the domain $\Omega$ is decomposed into two non-overlapping subdomains $\Omega_1$ and $\Omega_2$, and let $\Gamma$ be the interface between the subdomains.
$$u_i^{n+1} = u_i^{n+1/2} + e_i^{n+1}, \quad i = 1, 2. \qquad (6.10)$$
The rationale for the first step (6.8) is to satisfy the Poisson equation in the subdomains while ensuring the continuity of the solution on the interface $\Gamma$. After this step, the solution is continuous on the interface but the fluxes may not match. The jump in the fluxes is corrected after (6.10). On $\Gamma$ we have
$$\frac{\partial u_1^{n+1}}{\partial n_1} + \frac{\partial u_2^{n+1}}{\partial n_2} = \frac{\partial}{\partial n_1}\big(u_1^{n+1/2} + e_1^{n+1}\big) + \frac{\partial}{\partial n_2}\big(u_2^{n+1/2} + e_2^{n+1}\big)$$
$$= \frac{\partial u_1^{n+1/2}}{\partial n_1} - \frac{1}{2}\Big(\frac{\partial u_1^{n+1/2}}{\partial n_1} + \frac{\partial u_2^{n+1/2}}{\partial n_2}\Big) + \frac{\partial u_2^{n+1/2}}{\partial n_2} - \frac{1}{2}\Big(\frac{\partial u_1^{n+1/2}}{\partial n_1} + \frac{\partial u_2^{n+1/2}}{\partial n_2}\Big) = 0.$$
The half step can thus be written as
$$\begin{cases} -\Delta u_i^{n+1/2} = f, & \text{in } \Omega_i,\\[4pt] \dfrac{\partial u_i^{n+1/2}}{\partial n_i} = \dfrac{\partial u_i^{n}}{\partial n_i} - \dfrac{1}{2}\Big(\dfrac{\partial u_1^{n}}{\partial n_1} + \dfrac{\partial u_2^{n}}{\partial n_2}\Big), & \text{on } \Gamma. \end{cases} \qquad (6.11)$$
$$S_i(u_\Gamma, f) := \frac{\partial u_i}{\partial n_i}, \quad i = 1, 2, \qquad (6.14)$$
where the $u_i$ solve the local problems
$$\begin{cases} -\Delta u_i = f, & \text{in } \Omega_i,\\ u_i = u_\Gamma, & \text{on } \Gamma. \end{cases} \qquad (6.15)$$
Let also $S$ be defined as the operator
$$S(u_\Gamma, f) := S_1(u_\Gamma, f) + S_2(u_\Gamma, f) = \frac{\partial u_1}{\partial n_1} + \frac{\partial u_2}{\partial n_2}, \qquad (6.16)$$
which quantifies the jump of the normal derivative at the interface. Define also the operator $T$ by
$$T(g_\Gamma) := \frac{1}{2}\Big(\frac{1}{2}\,S_1^{-1}(g_\Gamma, 0) + \frac{1}{2}\,S_2^{-1}(g_\Gamma, 0)\Big) = \frac{1}{4}\,(e_1 + e_2), \qquad (6.17)$$
where the $e_i$ solve
$$\begin{cases} -\Delta e_i = 0, & \text{in } \Omega_i,\\[2pt] \dfrac{\partial e_i}{\partial n_i} = g_\Gamma, & \text{on } \Gamma. \end{cases} \qquad (6.18)$$
With these notations, steps (6.9)-(6.10) can be rewritten in terms of the interface unknowns:
$$u_\Gamma^{n+1} = u_\Gamma^{n} - (T \circ S)(u_\Gamma^n, f). \qquad (6.19)$$
This defines what we call the iterative interface version of the Neumann-Neumann algorithm.
It can be checked that both operators $S(\cdot, 0)$ and $T$ are symmetric positive definite, and thus the preconditioned conjugate gradient algorithm (PCG) can be used. Moreover, the action of both operators $S(\cdot, 0)$ and $T$ can be computed mostly in parallel. This very popular preconditioner for the operator $S(\cdot, 0)$ was proposed in [12].
Remark 6.2.1 There are at least three points of view that indicate the relevance of this simply derived preconditioner.
$$\lambda^{n+1} = \lambda^{n} - (S_{feti} \circ T_{feti})(\lambda^n, f). \qquad (6.24)$$
This defines what we call the iterative interface version of the FETI algorithm.
where $H^1(\Omega)$ is the Sobolev space of functions that are square integrable together with their first derivatives. In order to introduce domain decomposition methods, we make use of a functional analysis result which proves that the space $H^1(\Omega)$ is isomorphic to the domain decomposed space (local $H^1$ functions with Dirichlet traces continuous at the interface)
$$\mathcal H^1(\Omega_1, \Omega_2) := \{(u_1, u_2) \in H^1(\Omega_1)\times H^1(\Omega_2) : u_1 = u_2 \text{ on } \Gamma\};$$
see e.g. [13] or [28]. This allows the splitting of the functional of the optimization problem (6.26) into local contributions, constrained by the continuity condition at the interface. We have thus the following
Lemma 6.2.3 (Dual optimization problem) Minimization problem (6.26) is equivalent to
$$\min_{(u_1,u_2)\in\mathcal H^1(\Omega_1,\Omega_2)} J(u_1, u_2) = \min_{(u_1,u_2)\in\mathcal H^1(\Omega_1,\Omega_2)} J_1(u_1) + J_2(u_2)$$
$$= \min_{\substack{u_1\in H^1(\Omega_1),\,u_2\in H^1(\Omega_2)\\ u_1 = u_2 \text{ on } \Gamma}} \frac12\int_{\Omega_1}|\nabla u_1|^2 + \frac12\int_{\Omega_2}|\nabla u_2|^2 - \int_{\Omega_1} f\,u_1 - \int_{\Omega_2} f\,u_2. \qquad (6.27)$$
$$T_{feti}(\lambda, f) = 0$$
$$S(U_\Gamma, \mathbf F) := \left(A_{\Gamma\Gamma} - A_{\Gamma 1}A_{11}^{-1}A_{1\Gamma} - A_{\Gamma 2}A_{22}^{-1}A_{2\Gamma}\right)U_\Gamma - \left(F_\Gamma - A_{\Gamma 1}A_{11}^{-1}F_1 - A_{\Gamma 2}A_{22}^{-1}F_2\right) = 0. \qquad (6.31)$$
freedom on the interface associated with two basis functions $\phi_k$ and $\phi_l$. The corresponding entry $a^{kl}_{\Gamma\Gamma}$ can be decomposed into a sum $a^{kl}_{\Gamma\Gamma} = a^{1,kl}_{\Gamma\Gamma} + a^{2,kl}_{\Gamma\Gamma}$, where
$$a^{i,kl}_{\Gamma\Gamma} = \int_{\Omega_i}\nabla\phi_k\cdot\nabla\phi_l, \quad i = 1, 2.$$
Similarly, the interface component of the right-hand side is split using
$$f^{(i),k}_{\Gamma} = \int_{\Omega_i} f\,\phi_k.$$
This leads to the decomposition of $A_{\Gamma\Gamma}$ into
$$A_{\Gamma\Gamma} = A^{(1)}_{\Gamma\Gamma} + A^{(2)}_{\Gamma\Gamma} \qquad (6.32)$$
and of $F_\Gamma$ into
$$F_\Gamma = F^{(1)}_\Gamma + F^{(2)}_\Gamma.$$
Accordingly,
$$S(U_\Gamma, \mathbf F) = S_1(U_\Gamma, \mathbf F) + S_2(U_\Gamma, \mathbf F) \quad\text{and}\quad S = S_1 + S_2.$$
$$\begin{pmatrix} A_{ii} & A_{i\Gamma}\\ A_{\Gamma i} & A^{(i)}_{\Gamma\Gamma} \end{pmatrix}\begin{pmatrix} v_i\\ v_{i,\Gamma} \end{pmatrix} = \begin{pmatrix} 0\\ g_\Gamma \end{pmatrix}. \qquad (6.35)$$
Parallelism is natural, since all these steps can mostly be computed concurrently on two processors. Matrix-vector products with the operators $S$ and $T$ are both defined as sums of local computations and are therefore essentially parallel, with a very high ratio of local computations over data transfer between processors. As for the step defined by equation (6.36), it is purely parallel.
As in the continuous case, it is possible to invert the roles of the Neumann and Dirichlet problems and thus obtain the FETI algorithm.
Definition 6.3.2 (FETI interface operator) Let $T_{feti}$ denote the discrete counterpart of the FETI interface operator introduced at the continuous level.
It can be checked that $T_{feti}(\cdot, 0) = 4\,T$. We can infer from this that, since $T$ is a good preconditioner for $S$, $\frac14 S$ is a good preconditioner for $T_{feti}$. This leads to the following definition of the discrete FETI substructured formulation.
Recall the exact block factorization
$$A^{-1} = \begin{pmatrix} I & 0 & -A_{11}^{-1}A_{1\Gamma}\\ 0 & I & -A_{22}^{-1}A_{2\Gamma}\\ 0 & 0 & I \end{pmatrix}\begin{pmatrix} A_{11}^{-1} & 0 & 0\\ 0 & A_{22}^{-1} & 0\\ 0 & 0 & S^{-1} \end{pmatrix}\begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ -A_{\Gamma 1}A_{11}^{-1} & -A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix},$$
and define the preconditioner $M^{-1}$ by replacing the exact interface solve $S^{-1}$ with $T$:
$$M^{-1} = \begin{pmatrix} I & 0 & -A_{11}^{-1}A_{1\Gamma}\\ 0 & I & -A_{22}^{-1}A_{2\Gamma}\\ 0 & 0 & I \end{pmatrix}\begin{pmatrix} A_{11}^{-1} & 0 & 0\\ 0 & A_{22}^{-1} & 0\\ 0 & 0 & T \end{pmatrix}\begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ -A_{\Gamma 1}A_{11}^{-1} & -A_{\Gamma 2}A_{22}^{-1} & I \end{pmatrix}, \qquad (6.38)$$
where $T$ is given by (6.34). A direct computation shows that the preconditioned system $M^{-1}A$ has the following form:
$$M^{-1}A = \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ 0 & 0 & T\,S \end{pmatrix}.$$
In the case of exact solves in the subdomains, that is, for the computation of the exact factorizations of the $A_{ii}$, the application of this preconditioner is equivalent to that of the Schur complement approach, except that it is performed at the global level.
The bad news is that, even if we have spectral equivalence of the preconditioner with the diagonal blocks $A_{ii}^{-1}$, $M^{-1}$ is not necessarily spectrally equivalent to the matrix $A$. In contrast, with overlapping Schwarz methods, when the local subdomain solvers are replaced by spectrally equivalent solvers, the convergence behavior of the ASM is asymptotically equivalent to that of the ASM with exact solvers; see [170].
$$\mu_k := \#\{j : 1 \le j \le N \text{ and } k \in \mathcal N_j\},$$
and the set of interface unknowns, also called the skeleton, defined as
$$\mathcal N_\Gamma := \{k \in \mathcal N : \mu_k > 1\}.$$
Let $R_i$ and $R_\Gamma$ be the boolean restriction matrices from the global set $\mathcal N$ to the subsets $\mathring{\mathcal N}_i$ and $\mathcal N_\Gamma$. For $U \in \mathbb{R}^{\#\mathcal N}$, let $U_i = R_i U = (U_k)_{k\in\mathring{\mathcal N}_i}$ be the set of interior degrees of freedom of subdomain $i$, and $U_\Gamma = R_\Gamma U = (U_k)_{k\in\mathcal N_\Gamma}$ denote the set of interface degrees of freedom. Note that the sets $(\mathring{\mathcal N}_i)_{1\le i\le N}$ and $\mathcal N_\Gamma$ form a partition of $\mathcal N$:
$$\mathcal N = \Big(\bigcup_{i=1}^N \mathring{\mathcal N}_i\Big) \cup \mathcal N_\Gamma.$$
If the skeleton unknowns are numbered last, the corresponding block form of the global problem is thus
$$\begin{pmatrix}
A_{11} & 0 & \cdots & 0 & A_{1\Gamma}\\
0 & A_{22} & & & A_{2\Gamma}\\
\vdots & & \ddots & & \vdots\\
0 & & & A_{NN} & A_{N\Gamma}\\
A_{\Gamma 1} & A_{\Gamma 2} & \cdots & A_{\Gamma N} & A_{\Gamma\Gamma}
\end{pmatrix}
\begin{pmatrix} U_1\\ U_2\\ \vdots\\ U_N\\ U_\Gamma \end{pmatrix}
=
\begin{pmatrix} F_1\\ F_2\\ \vdots\\ F_N\\ F_\Gamma \end{pmatrix}. \qquad (6.39)$$
As in the two-subdomain case for matrix (6.1), it has an arrow shape which allows a multifrontal direct factorization.
In order to define a Neumann-Neumann preconditioner, we consider the substructured problem
$$S\,U_\Gamma = F_\Gamma - \sum_{i=1}^N A_{\Gamma i}A_{ii}^{-1}F_i, \qquad\text{where}\qquad S = A_{\Gamma\Gamma} - \sum_{i=1}^N A_{\Gamma i}A_{ii}^{-1}A_{i\Gamma}. \qquad (6.40)$$
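Assembling (6.40) from the local blocks can be sketched as follows (dense NumPy placeholders for the sparse factorizations used in practice):

import numpy as np

def schur_system(Aii, AiG, AGi, AGG, Fi, FG):
    # build the Schur complement S and the condensed right-hand side of (6.40)
    # from lists of local blocks, one per subdomain
    S, g = AGG.copy(), FG.copy()
    for Ai, AiGam, AGami, fi in zip(Aii, AiG, AGi, Fi):
        S -= AGami @ np.linalg.solve(Ai, AiGam)   # S -= A_{Gamma i} A_{ii}^{-1} A_{i Gamma}
        g -= AGami @ np.linalg.solve(Ai, fi)      # g -= A_{Gamma i} A_{ii}^{-1} F_i
    return S, g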
To this end, we first decompose $A_{\Gamma\Gamma}$ into local contributions, as was done in the two-subdomain case in eq. (6.32). More precisely, for a Poisson problem, let $k, l$ be the indices of two degrees of freedom on the interface, associated with two basis functions $\phi_k$ and $\phi_l$. The corresponding entry $a^{kl}_{\Gamma\Gamma}$ can be decomposed into the sum $a^{kl}_{\Gamma\Gamma} = \sum_{i=1}^N a^{i,kl}_{\Gamma\Gamma}$, where
$$a^{i,kl}_{\Gamma\Gamma} = \int_{\Omega_i}\nabla\phi_k\cdot\nabla\phi_l, \quad i = 1, \dots, N.$$
Remark 6.4.1 Note that, contrary to the two-subdomain case, the matrices $S_i$ have many zero columns and rows and are thus not invertible. Indeed, the original matrix $A$ arising from a finite element discretization of a partial differential operator, all rows and columns related to indices that do not belong to the set $\mathcal N_i$ are zero. It is thus not possible to define the preconditioner directly as a weighted sum of the inverses of the local Schur complements $S_i$, as in the two-subdomain case (6.34). One works instead with the local restrictions
$$\tilde S_i := R_{\Gamma_i}\,S_i\,R_{\Gamma_i}^T, \qquad S_i = R_{\Gamma_i}^T\,\tilde S_i\,R_{\Gamma_i},$$
where $R_{\Gamma_i}$ denotes the restriction from the skeleton $\mathcal N_\Gamma$ to its local part. Note that it is not necessary to compute the matrix $\tilde S_i^{-1}$: as seen in formula (6.35), it is sufficient to solve a Neumann problem in the subdomains.
Figure 6.5: Skeleton and BNN interface unknowns for four subdomains
where the unknowns related to the FETI operator $T_{feti}(\cdot, 0)$ are the links between the duplicated interface unknowns. They are Lagrange multipliers of the equality constraints on the duplicated interface unknowns. When a d.o.f. belongs to two and only two subdomains, there is one associated Lagrange multiplier. The duplication of the cross point of Figure 6.6 yields four unknowns, denoted $u^C_1$, $u^C_2$, $u^C_3$ and $u^C_4$; the subscript indicates the subdomain it comes from. In order to minimize the number of Lagrange multipliers, we choose to impose non-redundant equality constraints such as $u^C_1 = u^C_2$, $u^C_2 = u^C_3$ and $u^C_3 = u^C_4$. They yield three Lagrange multipliers.
$$Id = \sum_{i=1}^N R_i^T\,D_i\,R_i. \qquad (6.44)$$
Figure 6.6: FETI duplicated interfaces and non redundant Lagrange multi-
pliers for four subdomains
$$A_i^D := \begin{pmatrix} A_{ii} & A_{i\Gamma_i}\\ A_{\Gamma_i i} & A^{(i)}_{\Gamma_i\Gamma_i} + tgv\,Id \end{pmatrix}. \qquad (6.45)$$
For the sake of simplicity, we suppose that $tgv$ is sufficiently large, so that
$$(A_i^D)^{-1}\begin{pmatrix} F_i\\ F_{\Gamma_i} + tgv\,U_{\Gamma_i} \end{pmatrix} = \begin{pmatrix} A_{ii}^{-1}(F_i - A_{i\Gamma_i}U_{\Gamma_i})\\ U_{\Gamma_i} \end{pmatrix}. \qquad (6.46)$$
Let $\varepsilon$ be a small positive parameter and $M_i = (m_{i,kl})_{k,l\in\mathcal N_i}$ denote the local mass matrix of the subdomain $\Omega_i$. More precisely, for $k, l \in \mathcal N_i$, we define
$$m_{i,kl} := \int_{\Omega_i}\phi_k\,\phi_l,$$
$$A_i^N := \varepsilon\,M_i + \begin{pmatrix} A_{ii} & A_{i\Gamma_i}\\ A_{\Gamma_i i} & A^{(i)}_{\Gamma_i\Gamma_i} \end{pmatrix}. \qquad (6.47)$$
Using the algebraic partition of unity, we now define the global counterpart of the interface operator $S$ (see eq. (6.31)).
Definition 6.5.1 Let $U, \mathbf F \in \mathbb{R}^{\#\mathcal N}$. We define $\widetilde S(U, \mathbf F)$ by the following formula:
$$\widetilde S(U, \mathbf F) := F - A\,\sum_{i=1}^N R_i^T\,D_i\,(A_i^D)^{-1}\begin{pmatrix} F_i\\ F_{\Gamma_i} + tgv\,U_{\Gamma_i} \end{pmatrix}. \qquad (6.49)$$
For all $U, \mathbf F \in \mathbb{R}^{\#\mathcal N}$ we then have
$$\widetilde S(U, \mathbf F) = R_\Gamma^T\,S(R_\Gamma U, \mathbf F),$$
where we have used the partition of unity identity (6.44), the fact that the diagonal entries of $D_i$ are one for the interior d.o.f.s of subdomain $i$ (indices in $\mathring{\mathcal N}_i$), and the equality
$$A_{i\Gamma_i}\,R_{\Gamma_i}U = A_{i\Gamma}\,R_\Gamma U.$$
The substructured problem then reads: find $U$ such that
$$\widetilde S(\,\cdot\,, 0)(U) = -\widetilde S(0, \mathbf F). \qquad (6.50)$$
In other words, only the interface values of a solution to (6.50) are unique. The correct interior values can be obtained by a parallel post-processing consisting in setting the interior values to
$$A_{ii}^{-1}(F_i - A_{i\Gamma_i}U_{\Gamma_i}), \quad 1 \le i \le N.$$
$$\widetilde T(r) := \Big(\sum_{i=1}^N R_i^T\,D_i\,(A_i^N)^{-1}\,D_i\,R_i\Big)\,r \quad \text{for } r \in \mathbb{R}^{\#\mathcal N}. \qquad (6.51)$$
Result 6.5.2 We have the following link between $\widetilde T$ defined by eq. (6.51) and $T$ defined by eq. (6.43): for all $r_\Gamma \in \mathbb{R}^{\#\mathcal N_\Gamma}$,
$$R_\Gamma\big(\widetilde T(R_\Gamma^T\,r_\Gamma)\big) = T(r_\Gamma).$$
Indeed, a direct computation shows that
$$(A_i^N)^{-1}\begin{pmatrix} 0\\ D_{\Gamma_i}\,r_{\Gamma_i} \end{pmatrix} = \begin{pmatrix} -A_{ii}^{-1}A_{i\Gamma_i}\,\tilde S_i^{-1}D_{\Gamma_i}\,r_{\Gamma_i}\\[2pt] \tilde S_i^{-1}D_{\Gamma_i}\,r_{\Gamma_i} \end{pmatrix}.$$
Multiplying by $D_i$, extending by $R_i^T$ and summing over the subdomains, and using the fact that the global partition of unity (6.44) induces an interface partition of unity, we obtain
$$\widetilde T(R_\Gamma^T\,r_\Gamma) = \begin{pmatrix} -A_{11}^{-1}A_{1\Gamma_1}\,\tilde S_1^{-1}D_{\Gamma_1}\,r_{\Gamma_1}\\ \vdots\\ -A_{NN}^{-1}A_{N\Gamma_N}\,\tilde S_N^{-1}D_{\Gamma_N}\,r_{\Gamma_N}\\ T(r_\Gamma) \end{pmatrix},$$
whose interface component is precisely $T(r_\Gamma)$.
To summarize,
We solve the (ill-posed) linear system (6.50) by a conjugate gradient
method preconditioned by operator T (6.51).
is given by
and that of operator T
180 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
2
10 2
10
3
10
3
10
4
10
Residual
Residual
4
10
5
10
5
10
6
10
6
7 10
10
7
8
10 10
9 8
10 10
0 20 40 60 80 100 120 140 0 1 2 3 4 5 6 7
Iterations Iterations
Vh s, b, un, gd;
37 include ../../FreefemCommon/matvecDtNtd.idp
include ../../FreefemCommon/coarsespace.idp
// compute rhs of the interface problem: b[]
un = 0;
41 gd[]=g;
un[]+= bord[].gd[];
s[] = extendDir(un[],0);
b[]= Aglobals[];
45 b[]= rhsglobal[];
b[] = 1;
include PCGBNN.idp
Vh lam = 0,sol;
49 lam[] = lam[].intern[]; // insures that the first iterate verifies
lam[]+= bord[].gd[]; // the global BC, from iter=1 on this is true
sol[] = myPCG(lam[],tol,maxit);
sol[] = extendDir(sol[],0);
53 Vh err = soluglob;
plot(sol,cmm= Final Solution, wait=1,dim=3,fill=1,value=1);
cout << Final error: << err[].linfty << endl;
In Figures 6.7 and 6.8, we report results for uniform and Metis decomposi-
tions into 44, 88 and 1010 subdomains. We use the Neumann-Neumann
method as a preconditioner in a Krylov method. In the first set of tests the
method is used without a coarse space and we see that the iteration num-
ber depends linearly on the total number of domains and not only on the
number of domains in one direction as in the case of the Schwarz methods.
By adding a coarse space, the behavior becomes insensitive to the number
of subdomains as expected. Note that the convergence is better in the case
of uniform decompositions than in the case of decompositions using Metis.
182 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
1
10 2
10
2
10
3
10
3
10
Residual
Residual
4
10
4
10
5
10
5
10
6
6 10
10
7
7
10 10
8 8
10 10
0 50 100 150 200 250 300 0 2 4 6 8 10 12 14
Iterations Iterations
In order to achieve this goal, we have used in [29, 30] algebraic methods
developed in constructive algebra, D-modules (differential modules) and
symbolic computation such as the so-called Smith or Jacobson normal forms
and Grobner basis techniques for transforming a linear system of PDEs into
a set of independent scalar PDEs. These algebraic and symbolic methods
provide important intrinsic information (e.g., invariants) about the linear
system of PDEs to solve. For instance we recover that the two-dimensional
Stokes system is in some sense equivalent to a biharmonic operator (6.56).
6.6. NON-STANDARD NEUMANN-NEUMANN TYPE METHODS 183
Then, there exist two matrices E GLp (R) and F GLp (R) such that
A = E S F,
Ad w = g, (6.53)
S ws = E 1 g. (6.55)
Since E GLp (Rx ) and F GLp (Rx ), the entries of their inverses are still
polynomial in x . Thus, applying E 1 to the right-hand side g of Ad w = g
amounts to taking k-linear combinations of derivatives of g with respect to
x. If Rd is split into two subdomains R Rd1 and R+ Rd1 , then the
application of E 1 and F 1 to a vector can be done for each subdomain
independently. No communication between the subdomains is necessary.
In conclusion, it is enough to find a domain decomposition algorithm for
the uncoupled system (6.55) and then transform it back to the original one
(6.53) by means of the invertible matrix F over Rx . This technique can be
applied to any linear system of PDEs once it is rewritten in a polynomial
form. The uncoupled system acts on the new dependent variables ws , which
we shall further call Smith variables since they are issued from the Smith
normal form.
1 0
SA2 = ( ). (6.56)
0 2
The particular form of SA2 shows that, over Rx , the system of PDEs for the
linear elasticity in R2 is algebraically equivalent to a bi-harmonic equation.
(x + y ) + b1 x + b2 y + c
2 2
0 x
O2 =
0 (x2 + y2 ) + b1 x + b2 y + c y
x y 0
1 0 0
SO2
= 0 1 0 , L2 = c + b . (6.57)
0 0 L2
From the form of SO2 we can deduce that the two-dimensional Oseen equa-
tions can be mainly characterized by the scalar fourth order PD operator
L2 . This is not surprising since the stream function formulation of the
Oseen equations for d = 2 gives the same PDE for the stream function.
Remark 6.6.2 The above applications of Smith normal forms suggest that
one should design an optimal domain decomposition method for the bi-
harmonic operator 2 (resp., L2 ) in the case of linear elasticity (resp., the
Oseen/Stokes equations) for the two-dimensional problems, and then trans-
form it back to the original system.
u = f in R2 ,
(6.58)
u(x) 0 for x .
Since the bi-harmonic operator seems to play a key role in the design of a
new algorithm for both Stokes and elasticity problem in two dimensions, we
need to build an optimal algorithm for it. We consider the following problem:
2 i,n
= f, in i ,
i,n 1,n 2,n
i,n
1
= ( + ), on ,
= n ,
on , ni
2 n1 n2
i,n
= Dn , on ,
1 1,n 2,n
i,n
= ( + ) , on ,
ni
2 n1 n2
(6.61)
and then n+1
= n
+ 1
2
( 1,n
+ 2,n
) , D n+1
= Dn
+ 2
(
1 1,n
+ 2,n ).
Now, in the case of the two dimensional linear elasticity, represents the sec-
ond component of the vector of Smith variables, that is, = (ws )2 = (F u)2 ,
where u = (u, v) is the displacement field. Hence, we need to replace
with (F u)2 into the algorithm for the biLaplacian, and then simplify it
using algebraically admissible operations. Thus, one can obtain an optimal
algorithm for the Stokes equations or linear elasticity depending on the
form of F . From here comes the necessity of choosing in a proper way the
matrix F (which is not unique), used to define the Smith normal form,
in order to obtain a good algorithm for the systems of PDEs from the
optimal one applied to the bi-harmonic operator. In [57] and [59], the
computation of the Smith normal forms for the Euler equations and the
Stokes equations was done by hand or using the Maple command Smith.
Surprisingly, the corresponding matrices F have provided good algorithms
for the Euler equations and the Stokes equations even if the approach was
entirely heuristic.
The efficiency of our algorithms heavily relies on the simplicity of the Smith
188 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
E2 (ui,n ) = f, in i ,
1
1,n
ui
i,n
= (u1,n n1 + un2 ) ,
2,n
on ,
ui = u , on ,
n 2
ni ni (u ) =
i,n
n , on ,
ni i (ui,n ) = (n1 1 (u1,n ) + n2 2 (u2,n )) ,
2
on ,
(6.63)
1,n 2,n
and un+1
= un + 1
2
(u 1 + u 2 ) , n+1
= n
+ 1
2
( n n
1 1 (u 1,n
) + n n
2 2 (u 2,n
)).
All algorithms and interface conditions are derived for problems posed on
the whole space, since for the time being, this is the only way to treat from
the algebraic point of view these problems. The effect of the boundary
condition on bounded domains cannot be quantified with the same tools.
All the algorithms are designed in the PDE level and it is very important
to choose the right discrete framework in order to preserve the optimal
properties. For example, in the case of linear elasticity a good candidate
would be the TDNNS finite elements that can be found in [152] and the
algorithms obtained by these algebraic techniques have been used to design
a FETI-TDNNS method [151].
190 CHAPTER 6. NEUMANN-NEUMANN AND FETI ALGORITHMS
Chapter 7
191
192 CHAPTER 7. GENEO COARSE SPACE
In this chapter, the GenEO method, as well as most classical two-level meth-
ods are presented in a different light, under a common framework. Moreover,
their convergence can be proven in an abstract setting, provided that the
assumptions of the Fictitious Space Lemma are satisfied. Before stating this
Lemma, we first reformulate the two-level ASM method in this framework.
requires it.
In the fictitious space lemma (see Lemma 7.2.2) a certain number of abstract
ingredients are needed. These ingredients can be easily identified in the case
of the ASM. This intuitive introduction will give the general flavour of the
methods exposed later on. We will come to the abstract framework after
this short presentation.
b HD HD R
N N
(U, V) z b(U, V) = a(RiT Ui , RiT Vi ) = ViT (Ri ARiT )Ui
i=0 i=0
= V T BU,
(7.3)
where B HD HD is the operator defined by
can be re-written as
MASM,2
1
= RASM,2 B 1 RASM,2 , (7.7)
that is
N N N
Vi (RASM,2 (U))i = U Ri Vi = Vi Ri U .
T T T T
i=0 i=0 i=0
Since this equality is valid for arbitrary Vi , we have the identification:
RASM,2 (U) = (Ri U)0iN . (7.8)
which leads to the re-writing (7.7) of the Additive Schwarz Method.
The explanation of the application of the preconditionner in term of these
operators is the following
According to (7.8), the right most operator RASM,2 decomposes a
global vector in H into local components in HD
The middle operator B 1 corresponds to solving a coarse problem and
N local Dirichlet problems
RASM,2 interpolates the modified local components into a global vector
in H.
As we shall see in the sequel of the chapter, this abstract form is also valid
for many domain decomposition methods such as the balancing Neumann-
Neumann preconditioner (BNN) and a modified version of the Optimized
Schwarz method (OSM). It will enable us to both analyze their condition
number and propose new coarse space constructions.
7.2. MATHEMATICAL FOUNDATION 195
there exists a positive constant cT such that for all u H there exists
uD HD with RuD = u and
Proof We will give a proof of the spectral estimates only in the finite
dimensional case. First note that operator RB 1 R H H is symmetric
by definition. Its positive definiteness, is easy to check. For any u H, we
have:
(RB 1 R u, u) = (B 1 R u, R u)D 0 .
Since B is S.P.D. the above term is zero iff R u = 0. Since R is surjective,
it follows that R is one-to-one. Thus, R u = 0 implies that u = 0.
We first prove the upper bound of the spectral estimate (7.11). First note
that (7.9) is equivalent to
(R ARuD , uD )D cR (B uD , uD )D , uD HD .
cT a(u, u) (R A u, B 1 R A u)D
= (A u, RB 1 R A u) = a(u, RB 1 R A u) .
For a proof valid in the infinite dimensional case as well and for a proof of
the optimality of the spectral estimate see [145, 144] or [102].
7.2. MATHEMATICAL FOUNDATION 197
Note also that the bilinear operators from Lemma 7.2.2 can also be related
to an optimization problem.
L HD H R
(uD , ) 21 (BuD , uD )D (, R(uD ) u) .
Thus, we have:
y (y) = (By, yk ) yk ,
k 1
We need to consider the case where both operators A and B may be undefi-
nite. Let P be the orthogonal projection on range(A). Since A is symmetric
positive, P is actually a projection parallel to ker(A). We introduce the fol-
lowing generalized eigenvalue problem
(Cxk , xl ) = (k Axk , xl ) = k kl ,
(x) = (x P x) + (AP x, xk ) xk .
k >
ker(A)
Span{xk k > }
x (x) = (AP x, xk ) xk .
k
By plugging this formula into (B(x (x)), x (x)) and by using the
fact that the eigenvectors xk and xl belong to range(A) (xk = P xk ) and
their C-orthogonality, we get
(B(x (x)), x (x)) = (AP x, xk ) Bxk , (AP x, xl ) xl
kk ll
= (AP x, xk ) (AP x, xl ) (Bxk , xl )
ll kk
= (AP x, xk ) (AP x, xl ) (BP xk , P xl )
ll kk
= (AP x, xk ) (AP x, xl ) (Cxk , xl )
ll kk
= (AP x, xk ) k .
2
k
(7.21)
We can easily find upper bounds of this expression
m
(AP x, xk ) k (AP x, xk ) (AP x, xk )
2 2 2
k k k=1
m
= (AP x, (AP x, xk )xk ) = (AP x, P x) = (Ax, x).
k=1
(7.22)
From (7.21) and (7.22), the conclusion follows.
Proof Consider first the case when A is positive definite. In this case the
projection on the range of A is the identity: P = I and ker(A) = . Thus
problem (7.18) reduces to
Bxk = k Axk A1 Bxk = k xk
while (7.13) will be equivalent to
1
A1 Byk = yk
k
We can thus conclude that
1
Y = Span {yk k < } = Span {xk k > } = Z .
If A is not definite but B is, we can now left-multiply (7.13) by B 1/2 which
results into
B 1/2 AB 1/2 B 1/2 yk = k B 1/2 yk Ayk = k yk . (7.23)
A yk yk
ker(A)
and the quantity P Bxk is not necessarily null. Consider now the fol-
lowing example
1 0 2 1 1 0
A=( ), B = ( )P =( ).
0 0 1 1 0 0
1 0 2 1 y 2y + y2
( )y = ( )y ( 1 ) = ( 1 )
0 0 1 1 0 y1 + y2
0 1
(y, ) {[( ) , 0] , [( ) , 1]}
1 1
2 0 1 0 2x1 x
( )x = ( )x ( ) = ( 1 )
0 0 0 0 0 0
0 1
(x, ) {[( ) , 0] , [( ) , 2]}
1 0
therefore Y Z .
M T M M
( Qi Ui ) A ( Qi Ui ) k0 UTi (QTi AQi ) Ui (7.26)
i=1 i=1 i=1
7.2. MATHEMATICAL FOUNDATION 203
M T M
( Qi Ui ) A ( Qi Ui ) = (UTi QTi ) A (Qj Uj )
i=1 i=1 i,jQT
i A Qj 0
1/2 1/2
(UTi QTi A Qi Ui ) (UTj QTj A Qj Uj ) .
i,jQT
i A Qj 0
(7.27)
Let us introduce the connectivity matrix C RM RM defined as follows:
1 if QTi A Qj 0, 1 i, j M
Cij = { (7.28)
0 otherwise.
T
v = ((UT1 QT1 AQ1 U1 )1/2 , . . . , (UTM QTM AQM UM )1/2 ) . (7.29)
M M
v22 = vi2 = UTi (QTi AQi )Ui (7.30)
i=1 i=1
M T M
( Qi Ui ) A ( Qi Ui ) vT Cv . (7.31)
i=1 i=1
vT Cv C2 v22 . (7.32)
M T M M M
( Qi Ui ) A ( Qi Ui ) k0 vi2 = k0 UTi (QTi AQi )Ui
i=1 i=1 i=1 i=1
a (u, v) = K u v dx , (7.34)
or the elasticity system (C is the fourth-order stiffness tensor and (u) is
the strain tensor of a displacement field u):
When the bilinear form a results from the variational solve of a Laplace
problem, the previous matrix corresponds to the discretisation of local Neu-
mann boundary value problems. For this reason we will call it Neumann
matrix even in a more general setting.
7.4. GENEO COARSE SPACE FOR ADDITIVE SCHWARZ 205
1
2
which proves the continuity of operator RASM,2 (i.e. hypothesis (7.9) from
Lemma 7.2.2), with
cR = max(2 , k0 )
as a continuity constant.
206 CHAPTER 7. GENEO COARSE SPACE
and
N
T
b(U, U) = (RiT Ui ) A (RiT Ui ) . (7.40)
i=0
Note that the result (7.43) is insufficient to yield a spectral estimate of the
ASM preconditioner. We still have to bound from above the subdomain en-
ergy terms a(RiT Ui , RiT Ui ), 1 i N , by the global energy term a(U, U).
In order to do this, we first introduce an estimate to a(U, U) from below
in terms of a sum of some local energy terms, see (7.44) and then infer from
it a construction of the coarse space in Definition 7.4.2.
Lemma 7.4.4 Let k1 be the maximal multiplicity of subdomains intersec-
tion, i.e. the largest integer m such that there exists m different subdomains
whose intersection has a non zero measure.
Then, for all U RN , we have
N
T j
(Rj U) A Rj U k1 UT AU = k1 a(U, U) (7.44)
j=1
jk , jk ) range(A
Find (U j ) R
Pj B jk = jk A
j Pj U jk .
j U
Define also
jk jk > }
j ) Span{U
Zj = ker(A
and the local projection
j on Z jk jk }.
j parallel to Span{U
j )T A (RT Dj (Id j )U
(RjT Dj (Id j )U j) U
TA j .
j U (7.47)
j j
We see now that estimate (7.46) can be obtained directly from (7.47) pro-
vided that Uj are such that the left-hand sides of (7.46) and (7.47) are the
same, that is
Uj = Dj (Id j )Rj U. (7.48)
It remains now to define the coarse space component U0 and the coarse
space interpolation operator, such that Nj=0 Rj Uj = U. From the previous
T
N
j .
V0 = RjT Dj Z (7.49)
j=1
N
U0 = (R0 R0T )1 R0 RjT Dj j Rj U
j=1
so that
N
R0T U0 = RjT Dj j Rj U.
j=1
N
1 T
RASM,2 (U) = RjT Uj = U and b(U, U) U AU = a(U, U) .
j=0 cT
210 CHAPTER 7. GENEO COARSE SPACE
N
U = RjT Dj Rj U
j=1
N N N
= RjT Dj j Rj U + RjT Dj (Id j )Rj U = RjT Uj .
j=1 j=1 j=0
Uj
R0T U0
We now prove the second part of the theorem, the stability of the decom-
position. By Lemma 7.4.3, we have
N
b(U, U) 2 a(U, U) + (2 k0 + 1) a(RiT Ui , RiT Ui ) . (7.50)
i=1
i (Ri U) .
a(RiT Ui , RiT Ui ) (Ri U)T A
1
(MASM
1
2 A) max(2 , k0 ) .
2 + (2k0 + 1)k1
Due to the definition of the coarse space, we see that the condition number
of the preconditioned problem will not depend on the number of the subdo-
mains but only on the parameters k0 , k1 and . Parameter can be chosen
arbitrarily small at the expense of a large coarse space.
7.5. HYBRID SCHWARZ WITH GENEO 211
N
MASM
1
= RiT (Ri ARiT )1 Ri .
i=1
MHSM
1
= R0T (R0 AR0T )1 R0 + (Id P0 ) MASM
1
(Id P0T ) . (7.53)
N
RHSM (U) = R0T U0 + (Id P0 ) RiT Ui . (7.54)
i=1
We check the assumptions of the Fictitious Space Lemma when RHSM re-
places RASM,2 in Definition 7.1.1. This will give us the condition number
estimate (7.57).
Proof We first check that we have indeed a decomposition, i.e. that the
equality RHSM (U) = U holds. Note that for all 1 j N we have
RjT Dj
j Rj U V0 (Id P0 )RjT Dj
j Rj U = 0 .
7.5. HYBRID SCHWARZ WITH GENEO 213
We have:
N
U = P0 U + (Id P0 )U = P0 U + (Id P0 ) RjT Dj Rj U
j=1
N
= P0 U + (Id P0 ) RjT Dj Rj U
j=1
N
= R0T U0 + (Id P0 ) RjT Dj (Id
j ) Rj U = RHSM (U) .
j=1
(R0T (R0 AR0T )1 R0 A)2 = R0T (R0 AR0T )1 R0 AR0T (R0 AR0T )1 R0 A
= R0T (R0 AR0T )1 R0 A .
Finally, the range of R0T (R0 AR0T )1 R0 A is V0 since for all U V0 , there
exist W such that U = R0T W and we have:
MHSM
1
= P0 A1 + (Id P0 ) MASM
1
(Id P0T ) .
These relations yield the following expression for the preconditioned operator
MHSM
1
A = P0 + (Id P0 ) MASM
1
A(Id P0 ) . (7.59)
Kn (MHSM
1
A, r0 ) = {r0 , MHSM
1
A r0 , . . . , (MHSM
1
A)n1 r0 }
where
r0 = MHSM
1
(b Ax0 )
P0 r0 = 0 ,
Kn (MHSM
1
A, r0 ) = {r0 , (Id P0 ) MASM
1
Ar0 , . . . , ((Id P0 ) MASM
1
)n1 r0 } .
7.6. FREEFEM++ IMPLEMENTATION 215
This can be easily proved using formula (7.59) and the fact that P02 = P0 :
MHSM
1
A r0 = (P0 + (Id P0 ) MASM
1
A(Id P0 )) r0
= (Id P0 ) MASM
1
A r0 ,
(MHSM
1
A)2 r0 = (P0 + (Id P0 ) MASM
1
A(Id P0 ))(Id P0 ) MASM
1
A r0
= (Id P0 ) MASM
1
A(Id P0 )MASM
1
A r0
= ((Id P0 ) MASM
1
A)2 r0 ,
It means that in the PCG method, it is sufficient to consider that the pre-
conditioner is
(Id P0 ) MASM
1
.
In order to have P0 r0 = 0, we can choose for example
P0 MHSM
1
= P0 R0T (R0 AR0T )1 R0 = R0T (R0 AR0T )1 R0
which leads to
P0 r0 = P0 MHSM
1
(b AR0T (R0 AR0T )1 R0 b)
= R0T (R0 AR0T )1 R0 (b AR0T (R0 AR0T )1 R0 b)
= 0.
To sum up, the PCG algorithm (see Algorithm 5 in 3.3.1) for the Hybrid
Schwarz method takes the form given in Algorithm 8.
div(u) = f, in (7.60)
for(int i=0;i<npart;++i)
{
mesh Thi = aTh[i];
5 fespace Vhi(Thi,P1);
Vhi[int] eV(abs(nev));
real[int] ev(abs(nev));
if (nev > 0){//GENEO coarse space
9 int k =
EigenValue(aN[i],aAweighted[i],sym=true,sigma=0,maxit=50,tol=1.e4,value=ev,vector=eV);
cout << Eigenvalues in the subdomain << i <<endl;
k=min(k,nev); //sometimes the no of converged eigenvalues is bigger than nev.
cout << ev <<endl;
13 }
else// Nicolaides Coarse space
{
eV[0] = 1.;
17 }
for(int j=0;j<abs(nev);++j){
real[int] zitemp = Dih[i]eV[j][];
21 int k = iabs(nev)+j;
Z[k][]=Rih[i]zitemp;
}
}
/# debutPartition #/
include ../../FreefemCommon/dataGENEO.edp
include ../../FreefemCommon/decomp.idp
4 include ../../FreefemCommon/createPartition.idp
SubdomainsPartitionUnity(Th,part[],sizeovr,aTh,Rih,Dih,Ndeg,AreaThi);
plot(part,wait=1,fill=1,ps=partition.eps);
/# endPartition #/
8 /# debutGlobalData #/
Aglobal = vaglobal(Vh,Vh,solver = UMFPACK); // global matrix
rhsglobal[] = vaglobal(0,Vh); // global rhs
uglob[] = Aglobal1 rhsglobal[];
12 /# finGlobalData #/
/# debutLocalData #/
for(int i = 0;i<npart;++i)
{
16 cout << Domain : << i << / << npart << endl;
matrix aT = AglobalRih[i];
aA[i] = Rih[i]aT;
set(aA[i],solver = UMFPACK); // direct solvers using UMFPACK
20 varf valocal(u,v) = int2d(aTh[i])(etauv+ka(Grad(u)Grad(v)))
+on(1,u=g);
fespace Vhi(aTh[i],P1);
aN[i]= valocal(Vhi,Vhi);
24 set(aN[i], solver = UMFPACK);
matrix atimesxi = aA[i] Dih[i];
aAweighted[i] = Dih[i] atimesxi;
set(aAweighted[i], solver = UMFPACK);
28 }
/# finLocalData #/
/# debutPCGSolve #/
include ../../FreefemCommon/matvecAS.idp
32 include GENEO.idp
include PCGCS.idp
Vh un = 0, sol; // initial guess un and final solution
cout << Schwarz Dirichlet algorithm << endl;
36 sol[] = myPCG(un[], tol, maxit); // PCG with initial guess un
plot(sol,cmm= Final solution, wait=1,dim=3,fill=1,value=1);
Vh er = soluglob;
cout << Final relative error: << er[].linfty/sol[].linfty << endl;
40 /# finPCGSolve #/
220
-52630.6
CHAPTER 7. GENEO COARSE SPACE
26316.8
78948.4
131580
184212
236843
289475
342106
394738
447369
500001
552633
605264
657896
710527
763159
815790
868422
921054
1.05263e+06
0
10
Nicolaides
GenEO
1
10
2
10
3
10
Residual
4
10
5
10
6
10
7
10
0 20 40 60 80 100 120
Iterations
N
Ri Si Ri U = G .
T
(7.64)
i=1
N
N = Ri Di Si Di Ri .
T
MN
1 1
(7.65)
i=1
H = R#N
endowed with the standard Euclidean scalar product and the bilinear
form a H H R
a(U , V ) = (S U , V ) , U , V H .
endowed with the standard Euclidean scalar product and the bilinear
form b
b HD HD R
N
((Ui )1iN , (Vi )1iN ) z (Si Ui , Vi ) .
i=1
RN N HD H
N
(7.66)
(Ui )1iN RTi Di Ui
i=1
7.7. BALANCING NEUMANN-NEUMANN 223
Note that the operatpor RN N is surjective since from (7.62), we have for
all U H:
N
U = RTi Di Ri U = RN N ((Ri U )1iN ) .
i=1
Contrarily to the Schwarz method, the stable decomposition is easily checked
and is satisfied with cT = 1. Indeed, let U H, we have the natural decom-
position U = N i=1 Ri Di Ri U . In other words, let U = (Ri U )1iN , we
T
This result is of little practical use since it assumes that the local Neumann
subproblems are well-posed which is not always the case. Moreover, we have
studied only the one level method.
We address the former issue in the next section and a GenEO coarse space
construction in 7.7.3.
7.7. BALANCING NEUMANN-NEUMANN 225
Si range Si range Si .
In order to capture the part of the solution that will come from the local ker-
nels Si (1 i N ), let Zi be a rectangular matrix of size #Ni dim(ker Si )
whose columns are a basis of ker Si . We form a rectangular matrix Z0 of
size #N N i=1 dim(ker Si ) by concatenation of the Zi s:
Z0 = (RTi Di Zi )1iN .
Let W0 be the vector space spanned by the columns of Z0 , we introduce the
b HD HD R
N
((Ui )0iN , (Vi )0iN ) z V0T S U0 + ViT Si Ui .
i=1
RBN N HD H
N
(7.72)
(Ui )0iN z U0 + (Id P0 ) RTi Di Ui .
i=1
N
RBN N B 1 RBN N = P0 S1 + (Id P0 ) RTi Di Si (Id Pi )Di Ri (Id P0 )T .
i=1
(7.73)
RBN N H HD
U z RBN N (U) ,
such that
V HD , V T RBN N (U) = RBN N (V)T U .
For all V = (Vi )0iN HD , the above equation is
N N
V0T RBN N (U)0 + ViT RBN N (U)i = (V0 + (Id P0 ) RTi Di Vi )T U
i=1 i=1
N
= V0T U + ViT Di Ri (Id P0 )T U .
i=1
RBN N (U)0 = P0 U
7.7. BALANCING NEUMANN-NEUMANN 227
This yields the final form of the preconditioner RBN N B 1 RBN N which is
called the Balancing Neumann-Neumann preconditioner, see [130], [121] and
[63]:
N
N = P0 S + (Id P0 ) Ri Di Si (Id Pi )Di Ri (Id P0 ) .
T T
MBN
1 1
i=1
(7.74)
b(U, U) = (SP0 U, P0 U)
N
+ (Si (Id Pi )Ri (Id P0 )U, (Id Pi )Ri (Id P0 )U)
i=1
= (SP0 U, P0 U)
N
+ (Si Ri (Id P0 )U, (Id Pi )Ri (Id P0 )U)
i=1
N
= (SP0 U, P0 U) + (Si Ri (Id P0 )U, Ri (Id P0 )U)
i=1
= (SP0 U, P0 U) + (S(Id P0 )U, (Id P0 )U)
= (SU, U) = a(U, U) .
(7.77)
(S RTi Di Ui , RTi Di Ui )
max = max max . (7.78)
1iN Ui rangeSi /{0} (Si Ui , Ui )
1 (MBN
1
N S) max(1, k2 max ) .
Constant max in (7.78) can be large and thus the Balancing Neumann
Neumann preconditioner (7.74) can be inefficient. For this reason, in the
next section, we define a coarse space which allow to guarantee any targeted
convergence rate.
1
i denote projection from RNi on Wi parallel to Span {Uik ik }.
Let WGenEO be the vector space spanned by the columns of ZGenEO and
Pg = ZGenEO (ZGenEO
T
SZGenEO )1 ZGenEO
T
S.
The proof is similar to that of formula (7.58) that was done in the context
of the hybrid Schwarz method in 7.5. Note that ker(Sj ) Wj for all
1 j N , so that we have W0 WGenEO .
N
N G = Pg S + (Id Pg ) Ri Di Si (Id Pi )Di Ri (Id Pg ) ,
T T
MBN
1 1
i=1
(7.84)
In order to study this new preconditioner we use the same framework than
for the balancing Neumann-Neumann method except that the natural coarse
7.7. BALANCING NEUMANN-NEUMANN 231
It can easily be checked from 7.7.2 that the surjectivity of RBN N G and
the stable decomposition are unchanged since W0 WGenEO .
Lemma 7.7.7 (Surjectivity of RBN N G ) Operator RBN N G is surjective.
Proof Similarly to (7.75), we have:
The last term of this equation is zero since for all subdomains i, Pi Ri (Id
Pg )U ker Si and thus
N
Ri Di Pi Ri (Id Pg )U W0 WGenEO
T
i=1
we have
N N
(S(Id Pg ) RTi Di Ui , (Id Pg ) RTi Di Ui )
i=1 i=1
N N
= (S(Id Pg ) RTi Di (Id i )Ui , (Id Pg ) RTi Di (Id i )Ui ) .
i=1 i=1
Thus using equality (7.88), Lemma 7.7.1 and then Lemma 7.7.6 we have:
1 (MBN
1
N G S) max(1, k2 ) .
7.8. SORAS-GENEO-2 233
S U = G ,
an efficient implementation can be done if the initial guess is such that the
initial residual is S-orthogonal to WGenEO . It can be achieved simply by
taking as initial guess:
U0 = ZGenEO (ZGenEO
T
SZGenEO )1 ZGenEO
T
G .
For all 1 i N , let Ri be the restriction matrix from R#N to the subset
R#Ni and Di be a diagonal matrix of size #Ni #Ni , so that we have a par-
tition of unity at the algebraic level, Id = N
i=1 Ri Di Ri , where Id R
T #N #N
and
Find (Vjk , jk ) R#Ni {0} R such that
i Vik = ik Di Bi Di Vik . .
A
In the general case for 1 i N , matrices Di may have zero entries for
boundary degrees of freedom since they are related to a partition of unity.
Moreover very often matrices Bi and Ai differ only by the interface condi-
tions that is for entries corresponding to boundary degrees of freedom. There-
fore, matrix Di Bi Di on the right hand side of the last generalized eigenvalue
problem is not impacted by the choice of the interface conditions of the one
level optimized Schwarz method. This cannot lead to efficient adaptive coarse
spaces.
Matrix A i arises from the variational formulation (7.98) where the integra-
tion over domain is replaced by the integration over subdomain i and
finite element space Vh is restricted to subdomain i . Matrix Bi corresponds
to a Robin problem and is the sum of matrix A i and of the matrix of the
following variational formulation restricted to the same finite element space:
2(2 + )
uh vh with = 10 in our test.
i + 3
7.8. SORAS-GENEO-2 237
Table 7.1: 2D Elasticity. GMRES iteration counts for a solid made of steel
and rubber. Simulations made with FreeFem++ [105]
AS SORAS AS+ZEM SORAS +ZEM AS-GenEO SORAS geneo2
d.o.f N iter. iter. iter. dim iter. dim iter. dim iter. dim
35841 8 150 184 117 24 79 24 110 184 13 145
70590 16 276 337 170 48 144 48 153 400 17 303
141375 32 497 >1000 261 96 200 96 171 800 22 561
279561 64 >1000 >1000 333 192 335 192 496 1600 24 855
561531 128 >1000 >1000 329 384 400 384 >1000 2304 29 1220
1077141 256 >1000 >1000 369 768 >1000 768 >1000 3840 36 1971
Parallel implementation of
Schwarz methods
239
240 CHAPTER 8. IMPLEMENTATION OF SCHWARZ METHODS
Note that the three dimensional mesh is not actually built but a macro is
defined.
From now on, all the tasks can be computed concurrently, meaning that
each MPI process is in charge of only one subdomain and variables are local
to each process. Then a parallel mesh refinement is made by cutting each
tetrahedra into 8 tetrahedra. This corresponds to a mesh refinement factor
s equals to 2. Note also that at this point, the displacement field u is
approximated by P2 continuous piecewise quadratic functions.
real f = 900000.;
func real stripes(real a, real b, real paramA, real paramB) {
int da = int(a 10);
107 return (da == (int(da / 2) 2) ? paramB : paramA);
}
matrix N;
if(mpisize > 1 && solver == 12) {
166 int[int] parm(1);
parm(0) = getARGV(nu, 20);
EVproblem(vPbNoPen, Th, Ph)
matrix noPen = vPbNoPen(Wh, Wh, solver = CG);
170 attachCoarseOperator(mpiCommWorld, Aglob, A = noPen, /threshold = 2.
h[].max / diam,/ parameters = parm);
}
PETSc interface If the PETSc interface is used, the local stiffness matrix
K = Aii = Ri ARiT and the local load vector rhs are built concurrently from
the variational forms for all subdomains 1 i N .
Wh[int] def(Rb)(6);
[Rb[0], RbB[0], RbC[0]] = [ 1, 0, 0];
[Rb[1], RbB[1], RbC[1]] = [ 0, 1, 0];
135 [Rb[2], RbB[2], RbC[2]] = [ 0, 0, 1];
[Rb[3], RbB[3], RbC[3]] = [ y, x, 0];
[Rb[4], RbB[4], RbC[4]] = [z, 0, x];
[Rb[5], RbB[5], RbC[5]] = [ 0, z, y];
141 set(Mat, sparams = pc type gamg ksp type gmres pc gamg threshold
0.05 ksp monitor, nearnullspace = Rb);
}
else if(solver == 3)
set(Mat, sparams = pc type lu pc factor mat solver package mumps
mat mumps icntl 7 2 ksp monitor);
145 mpiBarrier(mpiCommWorld);
timing = mpiWtime();
u[] = Mat1 rhs;
/# problemPhysics #/
13 real Sqrt = sqrt(2.);
macro epsilon(u)[dx(u), dy(u#B), dz(u#C), (dz(u#B) + dy(u#C)) / Sqrt, (dz(u)
+ dx(u#C)) / Sqrt, (dy(u) + dx(u#B)) / Sqrt]// EOM
macro div(u)(dx(u) + dy(u#B) + dz(u#C))// EOM
29 /# sequentialMesh #/
real depth = 0.25;
int discrZ = getARGV(discrZ, 1);
real L = 2.5;
33 real H = 0.71;
real Hsupp = 0.61;
real r = 0.05;
real l = 0.35;
37 real h = 0.02;
real width = 2.5L/4.;
real alpha = asin(h/(2.r))/2;
if(mpirank == 0) {
cout << << mpirank << / << mpisize;
68 cout << input parameters: global size = << global << refinement
factor = << s << precision = << getARGV(eps, 1e8) <<
overlap = << overlap << with partitioner? = <<
partitioner << endl;
}
/# parallelMesh #/
72 func Pk = [P2, P2, P2];
Wh def(u);
/# chooseSolver #/
80 solver = getARGV(solver, 0);
if(solver == 0)
{
if(mpirank == 0) {
84 cout << What kind of solver would you like to use ? << endl;
cout << [1] PETSc GMRES << endl;
cout << [2] GAMG << endl;
cout << [3] MUMPS << endl;
88 cout << [10] ASM << endl;
cout << [11] RAS << endl;
cout << [12] Schwarz GenEO << endl;
cout << [13] GMRES << endl;
92 cout << Please type in a number: ;
cin >> solver;
if(solver != 1 && solver != 2 && solver != 3 && solver != 4 && solver
!= 10 && solver != 11 && solver != 12) {
cout << Wrong choice, using GMRES instead ! << endl;
96 solver = 10;
}
}
}
100 broadcast(processor(0), solver);
/# chooseSolverEnd #/
/# physicalParameters #/
104 real f = 900000.;
func real stripes(real a, real b, real paramA, real paramB) {
int da = int(a 10);
return (da == (int(da / 2) 2) ? paramB : paramA);
108 }
real[int] res(Wh.ndof);
120 real[int] rhs(Wh.ndof);
In 8.2.1, we present results obtained on few cores with the above script
from 8.1. Section 8.2.2 shows the scalability of the method with a large
number of cores solving both the system of linear elasticity and a problem
of scalar diffusion.
Results and timings for solving this problem with 263,000 unknowns on
8 cores running at 1.6 GHz are given in Table 8.1. The parameter is
the relative threshold used for dropping edges in the aggregation graphs of
the multigrid preconditioner, while the parameter is the number of local
deflation vectors computed per subdomain in the GenEO coarse space. The
multigrid implementation is based on GAMG [2], which is bundled into
PETSc [8, 7]. The results for exactly the same problem as before on 64
cores are given in Table 8.2.
For the GenEO method, the computational times vary slightly when the
parameter varies around its optimal value. The iteration count decreases
when the parameter increases. On the other hand, when is increased the
cost of the factorization of the coarse operator increases. For the multigrid
method, the computational times vary rapidly with the parameter .
8.2. NUMERICAL EXPERIMENTS 253
200
100
40
10 20 40 81
24 48 96 92
# of processes
100% 100%
80% 80%
60% 60%
Ratio
40% 40%
Factorization
20% Deation 20%
Coarse operator
0% 10 20 40 81 Krylov method 10 2 4 8 0%
24 48 96 92 24 048 096 192
# of processes # of processes
Figure 8.3: Comparison of the time spent in various steps for building and
using the preconditioner in 2D (left) and 3D (right).
256 CHAPTER 8. IMPLEMENTATION OF SCHWARZ METHODS
1
MRAS eq. (1.15)
Relative residual error 102 1
PA-DEF1 eq. (1.19a)
103
104
105
106
0 100 200 300 400
# of iterations
Figure 8.5:
Convergence of the restarted GMRES(40) for a 2D problem
of linear elasticity using 1 024 subdomains. Timings for the
setup and solution phases using PA-DEF1
1
are available in 8.4,
using MRAS , the convergence is not reached after 10 minutes.
1
Moving on to the weak scaling properties, see Definition 4.1.2, the problem
now being solved is a scalar equation of diffusivity
(u) = 1 in
(8.3)
u =0 on [0; 1] {0} .
(x, y)
6
3 10
2 106
106
a(u, v) = u v f v = 0 .
Vi = Ri V .
we have:
N N
(U, V) = (U, RiT Di Ri V) = (Ri U, Di Ri V)
i=1 i=1
N
= (Ui , Di Vi ) .
i=1
yi xi + yi , 1 i N .
N N
Vi = Ri AU = Ri ARjT Dj Rj U = Ri ARjT Dj Uj .
j=1 j=1
Akl = 0 .
We can take advantage of this sparsity pattern in the following way. A degree
of freedom k Nj is interior to Nj if for all i j and all l Ni Nj , Akl = 0.
Otherwise, it is said to be a boundary degree of freedom. If the overlap is
sufficient, it is possible to choose diagonal matrix Dj with zero entries for
the boundary degrees of freedom. Then all non zero rows of matrix ARjT Dj
have indices in Nj that is:
we have:
N
Vi = Ri AU = Ri ARjT Dj Rj U = (Ri ARiT ) Di Ui + Ri RjT (Rj ARjT ) Dj Uj .
j=1 jOi
Aii Ajj
Since we have basic linear algebra subroutines, we have all the necessary
ingredients for solving concurrently the linear system A x = b by a Krylov
method such as CG (conjugate gradient) or GMRES. We now turn our at-
tention to domain decomposition methods. The ASM preconditioner reads:
N
MASM
1
= RjT A1
jj Rj .
j=1
N
Ri MASM
1
R = Ri RjT A1
jj Rj R = Aii Ri + (Ri Rj ) Ajj Rj .
1 T 1
j=1 jOi
summed locally. This pattern is very similar to that of the matrix vec-
tor product.
The RAS preconditioner reads:
N
MRAS
1
= RjT Dj A1
jj Rj .
j=1
100% 22 311
# of d.o.f.in millions
80%
2 305
60%
695
40%
20% 3D
2D 74
0% 256 512 10 20 40 81
24 48 96 92
# of processes
(a) Timings of various simulations
d.o.f. d.o.f.
2.1M sbdmn
in 2D 280k sbdmn
in 3D
200
200
150
Time (seconds)
100
100
Factorization
50 Deation
Coarse operator
0 256 512 1 0 2 0 4 0 8 1 Krylov method 0 256 512 1 0 2 0 4 0 8 1
24 48 96 92 24 48 96 92
# of processes # of processes
(b) Comparison of the time spent in various steps for building and using the preconditioner
264 CHAPTER 8. IMPLEMENTATION OF SCHWARZ METHODS
265
266 BIBLIOGRAPHY
[8] Satish Balay, William D. Gropp, Lois Curfman McInnes, and Barry
F. Smith. Efficient management of parallelism in object oriented
numerical software libraries. In Modern software tools in scientific
computing. E. Arge, A. M. Bruaset, and H. P. Langtangen, editors.
Birkhauser Press, 1997, pages 163202 (cited on pages 239, 252).
[9] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra,
V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst. Templates
for the solution of linear systems: building blocks for iterative meth-
ods, 2nd edition. SIAM, Philadelphia, PA, 1994 (cited on pages 99,
111).
[10] H. Barucq, J. Diaz, and M. Tlemcani. New absorbing layers condi-
tions for short water waves. J. comput. phys., 229(1):5872, 2010.
issn: 0021-9991. doi: 10.1016/j.jcp.2009.08.033. url: http:
//dx.doi.org/10.1016/j.jcp.2009.08.033 (cited on page 183).
[11] G. K. Batchelor. An introduction to fluid dynamics. Of Cambridge
Mathematical Library. Cambridge University Press, Cambridge, pa-
perback edition, 1999, pages xviii+615. isbn: 0-521-66396-2 (cited on
page 183).
[12] Jean-Francois Bourgat, Roland Glowinski, Patrick Le Tallec, and Ma-
rina Vidrascu. Variational formulation and algorithm for trace oper-
ator in domain decomposition calculations. In Domain decomposition
methods. Tony Chan, Roland Glowinski, Jacques Periaux, and Olof
Widlund, editors. SIAM, Philadelphia, PA, 1989, pages 316 (cited
on pages 155, 162, 222).
[13] Ham Brezis. Analyse fonctionnelle : theorie et applications. Dunod,
Paris, 1983 (cited on page 164).
[14] X. C. Cai, W. D. Gropp, D. E. Keyes, R. G. Melvin, and D. P.
Young. Parallel Newton-Krylov-Schwarz algorithms for the transonic
full potential equation. Sisc, 19:245265, 1998 (cited on page ii).
[15] Xiao Chuan Cai and David Keyes. Nonlinearly preconditioned inex-
act newton algorithms. Sisc, 2003 (cited on page ii).
[16] Xiao-Chuan Cai, Mario A. Casarin, Frank W. Elliott Jr., and Olof
B. Widlund. Overlapping Schwarz algorithms for solving Helmholtzs
equation. In, Domain decomposition methods, 10 (boulder, co, 1997),
pages 391399. Amer. Math. Soc., Providence, RI, 1998 (cited on
page 75).
[17] Xiao-Chuan Cai, Charbel Farhat, and Marcus Sarkis. A minimum
overlap restricted additive Schwarz preconditioner and applications
to 3D flow simulations. Contemporary mathematics, 218:479485,
1998 (cited on page 6).
BIBLIOGRAPHY 267
[112] Caroline Japhet, Frederic Nataf, and Francois-Xavier Roux. The Op-
timized Order 2 Method with a coarse grid preconditioner. applica-
tion to convection-diffusion problems. In Ninth international confer-
ence on domain decompositon methods in science and engineering.
P. Bjorstad, M. Espedal, and D. Keyes, editors. John Wiley & Sons,
1998, pages 382389 (cited on page 86).
[113] P. Jolivet, F. Hecht, F. Nataf, and C. Prudhomme. Scalable do-
main decomposition preconditioners for heterogeneous elliptic prob-
lems. In Proceedings of the 2013 acm/ieee conference on supercom-
puting. In SC13. Best paper finalist. ACM, 2013, 80:180:11 (cited
on page 253).
[114] Pierre Jolivet. Methodes de decomposition de domaine. applica-
tion au calcul haute performance. PhD thesis. Universite de Greno-
ble, https://www.ljll.math.upmc.fr/ jolivet/thesis.pdf, 2014 (cited on
page 261).
[115] Pierre Jolivet, Victorita Dolean, Frederic Hecht, Frederic Nataf,
Christophe Prudhomme, and Nicole Spillane. High performance do-
main decomposition methods on massively parallel architectures with
freefem++. J. numer. math., 20(3-4):287302, 2012. issn: 1570-2820
(cited on page 136).
[116] J.P.Berenger. A perfectly matched layer for the absorption of elec-
tromagnetic waves. J. of comp.phys., 114:185200, 1994 (cited on
pages 66, 86).
[117] G. Karypis and V. Kumar. METIS: A software package for
partitioning unstructured graphs, partitioning meshes, and com-
puting fill-reducing orderings of sparse matrices. Technical re-
port. http://glaros.dtc.umn.edu/gkhome/views/metis. Department
of Computer Science, University of Minnesota, 1998 (cited on
pages 136, 140).
[118] George Karypis and Vipin Kumar. Metis, unstructured graph parti-
tioning and sparse matrix ordering system. version 2.0. Technical re-
port. Minneapolis, MN 55455: University of Minnesota, Department
of Computer Science, August 1995 (cited on pages 13, 27).
[119] Jung-Han Kimn and Marcus Sarkis. Restricted overlapping balanc-
ing domain decomposition methods and restricted coarse problems
for the Helmholtz problem. Comput. methods appl. mech. engrg.,
196(8):15071514, 2007. issn: 0045-7825. doi: 10.1016/j.cma.2006.
03.016. url: http://dx.doi.org/10.1016/j.cma.2006.03.016
(cited on page 138).
278 BIBLIOGRAPHY
[139] Frederic Nataf and Francis Nier. Convergence rate of some do-
main decomposition methods for overlapping and nonoverlapping
subdomains. Numerische mathematik, 75(3):35777, 1997 (cited on
page 86).
[140] Frederic Nataf and Francois Rogier. Factorization of the convection-
diffusion operator and the Schwarz algorithm. M 3 AS, 5(1):6793,
1995 (cited on page 50).
[141] Frederic Nataf, Francois Rogier, and Eric de Sturler. Optimal inter-
face conditions for domain decomposition methods. Technical report
(301). CMAP (Ecole Polytechnique), 1994 (cited on page 68).
[142] Frederic Nataf, Hua Xiang, and Victorita Dolean. A two level domain
decomposition preconditioner based on local Dirichlet-to-Neumann
maps. C. r. mathematique, 348(21-22):11631167, 2010 (cited on
pages 136, 153, 154).
[143] Frederic Nataf, Hua Xiang, Victorita Dolean, and Nicole Spillane. A
coarse space construction based on local Dirichlet to Neumann maps.
Siam j. sci comput., 33(4):16231642, 2011 (cited on pages 136, 153,
154, 191, 192).
[144] Sergey V. Nepomnyaschikh. Decomposition and fictious domains
for elliptic boundary value problems. In Fifth international sympo-
sium on domain decomposition methods for partial differential equa-
tions. David E. Keyes, Tony F. Chan, Gerard A. Meurant, Jeffrey
S. Scroggs, and Robert G. Voigt, editors. SIAM, Philadelphia, PA,
1992, pages 6272 (cited on pages 195, 196).
[145] Sergey V. Nepomnyaschikh. Mesh theorems of traces, normalizations
of function traces and their inversions. Sov. j. numer. anal. math.
modeling, 6:125, 1991 (cited on pages 195, 196, 235).
[146] Roy A. Nicolaides. Deflation of conjugate gradients with applications
to boundary value problems. Siam j. numer. anal., 24(2):355365,
1987. issn: 0036-1429. doi: 10 .1137 / 0724027. url: http :/ / dx.
doi.org/10.1137/0724027 (cited on pages 123, 128, 153).
[147] Francis Nier. Remarques sur les algorithmes de decomposition de do-
maines. In, Seminaire: equations aux derivees partielles, 19981999,
Exp. No. IX, 26. Ecole Polytech., 1999 (cited on page 68).
[148] M. Oorsprong, F. Berberich, V. Teodor, T. Downes, S. Erotokritou,
S. Requena, E. Hogan, M. Peters, S. Wong, A. Gerber, E. Emeriau,
R. Guichard, G. Yepes, K. Ruud, et al., editors. Prace annual report
2013. Insight Publishers, 2014 (cited on page 253).
BIBLIOGRAPHY 281
[159] Jack Poulson, Bjorn Engquist, Siwei Li, and Lexing Ying. A paral-
lel sweeping preconditioner for heterogeneous 3D Helmholtz equa-
tions. Siam j. sci. comput., 35(3):C194C212, 2013. issn: 1064-8275.
doi: 10 . 1137 / 120871985. url: http : / / dx . doi . org / 10 . 1137 /
120871985 (cited on page 86).
[160] Alfio Quarteroni and Alberto Valli. Domain decomposition methods
for partial differential equations. Oxford Science Publications, 1999
(cited on page i).
[161] Vineet Rawat and Jin-Fa Lee. Nonoverlapping domain decomposi-
tion with second order transmission condition for the time-harmonic
Maxwells equations. Siam j. sci. comput., 32(6):35843603, 2010.
issn: 1064-8275 (cited on page 86).
[162] Francois-Xavier Roux, Frederic Magoules, Laurent Series, and Yas-
sine Boubendir. Approximation of optimal interface boundary condi-
tons for two-Lagrange multiplier FETI method. In, Domain decompo-
sition methods in science and engineering. Volume 40, in Lect. Notes
Comput. Sci. Eng. Pages 283290. Springer, Berlin, 2005 (cited on
page 62).
[163] Yousef Saad. Analysis of augmented Krylov subspace methods. Siam
j. matrix anal. appl., 18(2):435449, 1997. issn: 0895-4798. doi: 10.
1137/S0895479895294289. url: http://dx.doi.org/10.1137/
S0895479895294289 (cited on page 125).
[164] Youssef Saad. Iterative methods for sparse linear systems. PWS Pub-
lishing Company, 1996 (cited on pages 93, 104, 108, 111, 125).
[165] Youssef Saad and Martin H. Schultz. GMRES: a generalized minimal
residual algorithm for solving nonsymmetric linear systems. Siam j.
sci. stat. comp., 7:856869, 1986 (cited on page 104).
[166] Achim Schadle, Lin Zschiedrich, Sven Burger, Roland Klose, and
Frank Schmidt. Domain decomposition method for Maxwells equa-
tions: scattering off periodic structures. J. comput. phys., 226(1):477
493, 2007. issn: 0021-9991. doi: 10.1016/j.jcp.2007.04.017. url:
http : / / dx . doi . org / 10 . 1016 / j . jcp . 2007 . 04 . 017 (cited on
page 86).
[167] Robert Scheichl and Eero Vainikko. Additive Schwarz with
aggregation-based coarsening for elliptic problems with highly vari-
able coefficients. Computing, 80(4):319343, 2007 (cited on page 191).
[168] Robert Scheichl, Panayot S. Vassilevski, and Ludmil Zikatanov. Weak
approximation properties of elliptic projections with functional con-
straints. Multiscale model. simul., 9(4):16771699, 2011. issn: 1540-
3459. doi: 10 . 1137 / 110821639. url: http : / / dx . doi . org / 10 .
1137/110821639 (cited on pages 154, 191).
BIBLIOGRAPHY 283