
Krylov methods, an introduction

Paul Van Dooren, CESAME, Univ. Cath. Louvain, Belgium


slides on http://www.auto.ucl.ac.be/vdooren/Iterative.html

What we'll talk about ...


basic ideas of iterative methods: recursive refinement
Krylov methods and their variants: orthogonality vs bi-orthogonality
some numerical aspects: error propagation
some algebraic aspects: breakdowns
eigenvalue problems: projected eigenvalues
rational approximation: Padé approximation

Motivation
Every method performs better for some classes of problems ...
Direct methods
Jacobi/Gauss-Seidel
Krylov methods
Multigrid methods
Fast multipole methods
but their features can be combined (hybrid methods, preconditioning)
Advantages of Krylov methods depend on what one compares them with

Recurrences and Krylov methods

Solve $Ax = b$ via the fixed-point iteration $x_k := b + (I - A)x_{k-1}$.
Rewrite this as $x_k := x_{k-1} + r_{k-1}$ using the residual $r_{k-1} := b - Ax_{k-1}$

$\Longrightarrow \quad x_k = x_0 + r_0 + r_1 + \ldots + r_{k-1}$

From $b - Ax_k = b - Ax_{k-1} - Ar_{k-1}$ we find $r_k := (I - A)r_{k-1}$

$\Longrightarrow \quad x_k = x_0 + r_0 + (I - A)r_0 + \ldots + (I - A)^{k-1}r_0 = x_0 + [\,r_0, Ar_0, \ldots, A^{k-1}r_0\,]\,c$

A Krylov subspace is the space spanned by these vectors:

$\mathcal{K}_k(A, r_0) := \mathrm{Im}\,[\,r_0, Ar_0, \ldots, A^{k-1}r_0\,]$

We are looking for good linear combinations

$x_k - x_0 = \sum_{j=0}^{k-1} c_j A^j r_0 \in \mathcal{K}_k(A, r_0)$

There are essentially two different criteria:

$\min \|Ax_k - b\|_2 \qquad$ or $\qquad$ make $Ax_k - b \perp \mathcal{K}_k(A, r_0)$,

related to orthogonal recurrence relations (GMRES and FOM).

Two additional classes of methods are related to bi-orthogonal relations (QMR and BiCG).
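As a sanity check of this recursion, here is a minimal sketch (assuming a made-up test matrix for which the spectral radius of $I - A$ is well below one, so the fixed-point iteration converges); by construction the iterate $x_k$ stays in $x_0 + \mathcal{K}_k(A, r_0)$:

```python
import numpy as np

# Hypothetical test problem: spectral radius of (I - A) is well below 1,
# so the fixed-point iteration x_k = x_{k-1} + r_{k-1} converges.
rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.4 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)

x = np.zeros(n)                       # x_0 = 0, so r_0 = b
for k in range(100):
    r = b - A @ x                     # r_{k-1} := b - A x_{k-1}
    x = x + r                         # x_k := x_{k-1} + r_{k-1}
print(np.linalg.norm(b - A @ x))      # residual norm after 100 refinements
```

Krylov methods replace the fixed coefficients of this sum by the best linear combination of $r_0, Ar_0, \ldots, A^{k-1}r_0$.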

Arnoldi process

There always exists an orthogonal matrix $U$, $U^T U = I_n$, such that

$U^T A U =: H = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,n} \\ h_{2,1} & h_{2,2} & & \vdots \\ & \ddots & \ddots & \vdots \\ & & h_{n,n-1} & h_{n,n} \end{bmatrix}$

(upper Hessenberg) and the first column $u_1$ of $U$ can be chosen arbitrarily.

Equate columns of $AU = UH$. First one: $Au_1 = u_1 h_{1,1} + u_2 h_{2,1}$

$h_{1,1} := u_1^T A u_1, \qquad \hat u_2 := Au_1 - u_1 h_{1,1}, \qquad h_{2,1} := \|\hat u_2\|_2$

For the following columns:

$Au_k = \sum_{j=1}^{k} u_j h_{j,k} + u_{k+1} h_{k+1,k}$

$h_{j,k} := u_j^T A u_k, \qquad \hat u_{k+1} := Au_k - \sum_{j=1}^{k} u_j h_{j,k}, \qquad h_{k+1,k} := \|\hat u_{k+1}\|_2, \qquad u_{k+1} := \hat u_{k+1}/h_{k+1,k}$
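A compact sketch of this process in Python with modified Gram-Schmidt (the function name and the breakdown handling are my own assumptions, not from the slides); it is reused in the FOM and GMRES sketches further below:

```python
import numpy as np

def arnoldi(A, r0, k):
    """k Arnoldi steps: A @ U[:, :k] = U @ H with H of size (k+1) x k."""
    n = len(r0)
    U = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    U[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ U[:, j]
        for i in range(j + 1):                    # modified Gram-Schmidt
            H[i, j] = U[:, i] @ w
            w = w - H[i, j] * U[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                    # breakdown: invariant subspace found
            return U[:, :j + 1], H[:j + 1, :j + 1]
        U[:, j + 1] = w / H[j + 1, j]
    return U, H
```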

In block notation, with $U_k := U(:, 1:k)$, $H_k := H(1:k, 1:k)$:

$A\,U_k = U_k H_k + \hat u_{k+1} e_k^T$

It is easy to see that $\mathcal{K}_k(H, e_1) = \mathrm{Im}\,[\,e_1, He_1, \ldots, H^{k-1}e_1\,] = \mathrm{Im}\begin{bmatrix} I_k \\ 0 \end{bmatrix}$ since $H$ is upper Hessenberg.

Choose $U$ such that $r_0/\|r_0\|_2 = Ue_1$ and $A = UHU^T$; then

$\mathcal{K}_k(A, r_0) = \mathrm{Im}\,[U_k]$

because

$[\,Ue_1, (UHU^T)Ue_1, \ldots, (UH^{k-1}U^T)Ue_1\,] = U\,[\,e_1, He_1, \ldots, H^{k-1}e_1\,]$

Galerkin condition (FOM)

Look for $x_k - x_0 \in \mathcal{K}_k$ such that $b - Ax_k \perp \mathcal{K}_k$. Therefore

$x_k - x_0 = U_k y, \qquad U_k^T(b - Ax_k) = 0, \qquad U_k^T A U_k y = U_k^T r_0 = \|r_0\|_2 e_1$

So we solve, using efficient recurrence relations, the $k \times k$ Hessenberg system

$\begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,k} \\ h_{2,1} & h_{2,2} & & \vdots \\ & \ddots & \ddots & h_{k-1,k} \\ & & h_{k,k-1} & h_{k,k} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

Error bound: $\|b - Ax_k\|_2 = |h_{k+1,k}\, y_k|$ may grow ...
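Using the arnoldi() sketch from above (my own naming), the FOM iterate just solves this small $k \times k$ Hessenberg system; a sketch assuming no breakdown occurs:

```python
import numpy as np

def fom(A, b, x0, k):
    """Galerkin (FOM) iterate x_k; assumes the arnoldi() sketch above, no breakdown."""
    r0 = b - A @ x0
    U, H = arnoldi(A, r0, k)
    rhs = np.zeros(k)
    rhs[0] = np.linalg.norm(r0)                # ||r_0||_2 e_1
    y = np.linalg.solve(H[:k, :k], rhs)        # H_k y = ||r_0||_2 e_1
    x = x0 + U[:, :k] @ y
    res = abs(H[k, k - 1] * y[-1])             # ||b - A x_k||_2 = |h_{k+1,k} y_k|
    return x, res
```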

Minimize residual (GMRES)

Look for $x_k - x_0 \in \mathcal{K}_k$ to minimize $\|b - Ax_k\|_2$. Therefore

$x_k - x_0 = U_k y, \qquad b - Ax_k \in \mathrm{Im}[U_{k+1}], \qquad \|b - Ax_k\|_2 = \|U_{k+1}^T r_0 - U_{k+1}^T A U_k y\|_2$

So we solve, using efficient recurrence relations, the $(k+1) \times k$ Hessenberg least-squares problem

$\min_y \left\| \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,k} \\ h_{2,1} & h_{2,2} & & \vdots \\ & \ddots & \ddots & h_{k-1,k} \\ & & h_{k,k-1} & h_{k,k} \\ & & & h_{k+1,k} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} - \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \right\|_2$

One can prove that $\|b - Ax_k\|_2 \le \|b - Ax_{k-1}\|_2$, but it may stall ...
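The GMRES iterate differs from FOM only in replacing the square solve by the $(k+1) \times k$ least-squares problem; again a sketch on top of the arnoldi() helper (production codes instead update a QR factorization of $H$ with Givens rotations at every step):

```python
import numpy as np

def gmres_iterate(A, b, x0, k):
    """GMRES iterate x_k; assumes the arnoldi() sketch above, no breakdown."""
    r0 = b - A @ x0
    U, H = arnoldi(A, r0, k)                     # H is (k+1) x k
    rhs = np.zeros(k + 1)
    rhs[0] = np.linalg.norm(r0)
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)  # min || ||r_0|| e_1 - H y ||_2
    return x0 + U[:, :k] @ y
```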

GMRES vs FOM

Denote the residual norms $\rho_k := \|b - Ax_k\|_2$ of FOM and GMRES by $\rho_k^F$ and $\rho_k^G$, respectively. Then

$(\rho_k^G)^{-2} = (\rho_k^F)^{-2} + (\rho_{k-1}^G)^{-2}, \qquad \rho_k^F \ge \rho_k^G$

The Arnoldi process breaks down when $h_{k+1,k} = 0$, since we need to divide by it.
But then the system is solved, since both FOM and GMRES yield the same answer and $\rho_k^G = \rho_k^F = |h_{k+1,k}\, y_k| = 0$.

Stalling

Consider the $n \times n$ cyclic shift matrix and right-hand side

$\begin{bmatrix} 0 & & & 1 \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

Then

$\rho^G = (1, \ldots, 1, 0), \qquad \rho^F = (\infty, \ldots, \infty, 0)$

The GMRES residual is always bounded and GMRES outperforms FOM, but it can still stall.

Lanczos process

For $A = A^T$ there exists an orthogonal matrix $U$, $U^T U = I_n$, such that

$U^T A U =: T = \begin{bmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_n \\ & & \beta_n & \alpha_n \end{bmatrix}$

and the first column $u_1$ of $U$ can be chosen arbitrarily.

Same derivation, but the recurrences are now short:

$Au_k = u_{k-1}\beta_k + u_k\alpha_k + u_{k+1}\beta_{k+1}$

$\alpha_k := u_k^T A u_k, \qquad \hat u_{k+1} := Au_k - u_{k-1}\beta_k - u_k\alpha_k, \qquad \beta_{k+1} := \|\hat u_{k+1}\|_2, \qquad u_{k+1} := \hat u_{k+1}/\beta_{k+1}$

For $A = A^T \succ 0$ the tridiagonal matrix $T$ can be factored, yielding two coupled 2-term recurrences instead (Conjugate Gradient).
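A sketch of this short recurrence (my own naming; note that only three vectors are needed at any time, even though the full basis is stored here for clarity):

```python
import numpy as np

def lanczos(A, r0, k):
    """k symmetric Lanczos steps (A = A^T assumed).
    Returns U (n x k), alpha (k,), beta (k,), where beta[j] = ||u_hat|| computed
    at step j couples columns j and j+1; beta[k-1] is the next off-diagonal entry."""
    n = len(r0)
    U = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    U[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ U[:, j]
        alpha[j] = U[:, j] @ w
        w = w - alpha[j] * U[:, j]
        if j > 0:
            w = w - beta[j - 1] * U[:, j - 1]
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0.0:                     # breakdown: invariant subspace found
            break
        if j + 1 < k:
            U[:, j + 1] = w / beta[j]
    return U, alpha, beta
```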

Minimize residual (MINRES)

Look for $x_k - x_0 \in \mathcal{K}_k$ to minimize $\|b - Ax_k\|_2$ using efficient recurrence relations:

$\min_y \left\| \begin{bmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \beta_k & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} - \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \right\|_2$

One shows that $\|b - Ax_k\|_2$ decreases linearly with approximate factor $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$, where $\kappa := \|A\|\,\|A^{-1}\|$.
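With the lanczos() sketch above, the MINRES iterate solves this $(k+1) \times k$ tridiagonal least-squares problem (again a plain dense solve for clarity; the real algorithm updates the solution with Givens rotations and short recurrences):

```python
import numpy as np

def minres_iterate(A, b, x0, k):
    """MINRES iterate x_k; assumes the lanczos() sketch above (A = A^T, no breakdown)."""
    r0 = b - A @ x0
    U, alpha, beta = lanczos(A, r0, k)
    T = np.zeros((k + 1, k))                   # (k+1) x k tridiagonal matrix
    for j in range(k):
        T[j, j] = alpha[j]
        T[j + 1, j] = beta[j]                  # sub-diagonal, includes beta_{k+1}
        if j + 1 < k:
            T[j, j + 1] = beta[j]              # super-diagonal (symmetry)
    rhs = np.zeros(k + 1)
    rhs[0] = np.linalg.norm(r0)
    y, *_ = np.linalg.lstsq(T, rhs, rcond=None)
    return x0 + U @ y
```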

The complexity of the different methods up to step k is

                         Arnoldi    Lanczos    Conj. Grad.
  flops (Ax + orthog.)   2k^2 n     9kn        10kn
  storage                kn         3n         4n

This clearly shows the need for other approaches for unsymmetric $A$ if $k$ gets large:
  Partial orthogonalization (IOM, ...)
  Restarts (FOM(m), GMRES(m), ...)
  Consider the normal equations $A^T A x = A^T b$, or the augmented system $\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$

Unsymmetric Lanczos process

There exist invertible matrices $V, W$ such that $W^T V = I_n$ and

$W^T A V =: T = \begin{bmatrix} \alpha_1 & \gamma_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \gamma_n \\ & & \beta_n & \alpha_n \end{bmatrix}$

for almost all first columns $v_1, w_1$ of $V, W$.

The derivation now involves the columns of $AV = VT$ and $A^T W = WT^T$:

$Av_k = v_{k-1}\gamma_k + v_k\alpha_k + v_{k+1}\beta_{k+1}$
$A^T w_k = w_{k-1}\beta_k + w_k\alpha_k + w_{k+1}\gamma_{k+1}$
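A bare sketch of the two-sided recurrence (my own normalization choice $\beta_{k+1} = \sqrt{|\hat w^T \hat v|}$, and no look-ahead, so it simply stops on a serious breakdown as discussed on a later slide):

```python
import numpy as np

def two_sided_lanczos(A, v1, w1, k):
    """k unsymmetric Lanczos steps: W^T V = I and W^T A V tridiagonal (sketch)."""
    n = len(v1)
    V = np.zeros((n, k + 1))
    W = np.zeros((n, k + 1))
    alpha = np.zeros(k)
    beta = np.zeros(k + 1)                     # sub-diagonal entries
    gamma = np.zeros(k + 1)                    # super-diagonal entries
    s = w1 @ v1                                # must be nonzero to start
    V[:, 0] = v1 / np.sqrt(abs(s))
    W[:, 0] = np.sign(s) * w1 / np.sqrt(abs(s))
    for j in range(k):
        alpha[j] = W[:, j] @ (A @ V[:, j])
        v = A @ V[:, j] - alpha[j] * V[:, j] - (gamma[j] * V[:, j - 1] if j > 0 else 0)
        w = A.T @ W[:, j] - alpha[j] * W[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0)
        s = w @ v
        if s == 0.0:                           # serious breakdown: look-ahead needed
            break
        beta[j + 1] = np.sqrt(abs(s))
        gamma[j + 1] = s / beta[j + 1]
        V[:, j + 1] = v / beta[j + 1]          # w_{j+1}^T v_{j+1} = s/(beta*gamma) = 1
        W[:, j + 1] = w / gamma[j + 1]
    return V, W, alpha, beta[1:], gamma[1:]
```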

Without breakdowns we have, with $T_k := T(1:k, 1:k)$, $V_k := V(:, 1:k)$, $W_k := W(:, 1:k)$, in block notation:

$A V_k = V_k T_k + \beta_{k+1} v_{k+1} e_k^T$
$A^T W_k = W_k T_k^T + \gamma_{k+1} w_{k+1} e_k^T$
$W_k^T A V_k = T_k, \qquad W_k^T V_k = I_k$

(this requires $\beta_{k+1}\gamma_{k+1} = \hat w_{k+1}^T \hat v_{k+1} \neq 0$ at every step)

Coupled Krylov subspaces

Since

$\mathcal{K}_k(T, e_1) = \mathrm{Im}\begin{bmatrix} I_k \\ 0 \end{bmatrix} \qquad$ and $\qquad \mathcal{K}_k(T^T, e_1) = \mathrm{Im}\begin{bmatrix} I_k \\ 0 \end{bmatrix}$

we have that

$\mathcal{K}_k(A, v_1) = \mathrm{Im}[V_k], \qquad \mathcal{K}_k(A^T, w_1) = \mathrm{Im}[W_k]$

because (using $A = VTW^T$ and $A^T = WT^TV^T$)

$[\,Ve_1, (VTW^T)Ve_1, \ldots, (VT^{k-1}W^T)Ve_1\,] = V\,[\,e_1, Te_1, \ldots, T^{k-1}e_1\,]$
$[\,We_1, (WT^TV^T)We_1, \ldots, (W(T^T)^{k-1}V^T)We_1\,] = W\,[\,e_1, T^Te_1, \ldots, (T^T)^{k-1}e_1\,]$

Galerkin condition (BiCG)

Look for $x_k - x_0 \in \mathcal{K}_k(A, r_0)$ such that $b - Ax_k \perp \mathcal{K}_k(A^T, w_1)$. Therefore

$x_k - x_0 = V_k y, \qquad W_k^T(b - Ax_k) = 0, \qquad W_k^T A V_k y = W_k^T r_0 = \|r_0\|_2 e_1$

So we solve recursively the tridiagonal system

$\begin{bmatrix} \alpha_1 & \gamma_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \gamma_k \\ & & \beta_k & \alpha_k \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

But now $\|W^T(b - Ax_k)\|_2 = |\beta_{k+1}\, y_k|$ (a quasi-residual, not the true residual) ...

Minimize quasi-residual (QMR)

Look for $x_k - x_0 \in \mathcal{K}_k(A, r_0)$ to minimize $\|W^T(b - Ax_k)\|_2$. Therefore

$x_k - x_0 = V_k y, \qquad b - Ax_k \in \mathrm{Im}[V_{k+1}], \qquad \|W^T(b - Ax_k)\|_2 = \|W_{k+1}^T r_0 - W_{k+1}^T A V_k y\|_2$

So we solve, using efficient recurrence relations, the $(k+1) \times k$ tridiagonal least-squares problem

$\min_y \left\| \begin{bmatrix} \alpha_1 & \gamma_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \gamma_k \\ & & \beta_k & \alpha_k \\ & & & \beta_{k+1} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{bmatrix} - \begin{bmatrix} \|r_0\|_2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \right\|_2$

One can only prove that $\|W^T(b - Ax_k)\|_2 \le \|W^T(b - Ax_{k-1})\|_2$

Variants

Avoid transposes via inner products $(x, A^T y) = (Ax, y)$
Factor $T = L\,U$ to get coupled 2-term recurrences rather than 3-term recurrences
Re-orthogonalize to compensate for loss of bi-orthogonality
Apply look-ahead to avoid breakdowns
Restart rather than use look-ahead or re-orthogonalization
Block versions for all of the above

Loss of orthogonality

Under finite precision, the orthogonalization process $\hat u_{k+1} := Au_k - \sum_{j=1}^{k} u_j h_{j,k}$ yields

$AU_k = U_k H_k + \hat u_{k+1} e_k^T + E_k, \qquad \|E_k\| \approx \epsilon\,\|A\|$

(with $\epsilon$ the machine precision), but the error $F_k := U_k^T U_k - I_k$ can grow arbitrarily (even with MGS) unless one orthogonalizes once again, which gives $\|F_k\| \approx \epsilon$.
The same comment holds for Lanczos (at a higher relative cost!)
For unsymmetric Lanczos the bounds grow with $\|W_k\|\,\|V_k\|$

Breakdowns

Arnoldi stops when $h_{k+1,k} = 0$, implying $Ax_k = b$ (solved)
Lanczos stops when $\beta_{k+1} = 0$, implying $Ax_k = b$ (solved)
Unsymmetric Lanczos: when $\beta_{k+1} = 0$, choose $v_{k+1} \perp V_k$, e.g. $v_{k+1} = w_{k+1}$
when $w_{k+1}^T v_{k+1} = 0$ there is no bi-orthogonality: a serious breakdown

Breakdown is cured by going to the block version (look-ahead), which only needs $\det(W_{k+1}^T V_{k+1}) \neq 0$

Recap: solving Ax = b

Choose $x_0$, $r_0 := b - Ax_0$, and look for $x_k - x_0 \in \mathcal{K}_k(A, r_0)$.
Four different conditions on the residual, with the corresponding methods and costs:

  $Ax_k - b \perp \mathcal{K}_k(A, r_0)$:  CG ($A = A^T \succ 0$, $O(kn)$) and FOM (general $A$, $O(k^2 n)$)
  $\min \|Ax_k - b\|_2$:  MINRES ($A = A^T$, $O(kn)$) and GMRES (general $A$, $O(k^2 n)$)
  $Ax_k - b \perp \mathcal{K}_k(A^T, w_1)$:  BiCG (general $A$, $O(kn)$)
  $\min \|W^T(Ax_k - b)\|_2$:  QMR (general $A$, $O(kn)$)

Notice that with full re-orthogonalization all methods are $O(k^2 n)$.
Many variants try to cure loss of orthogonality and erratic convergence.
All methods improve with preconditioning (application-dependent).

Eigenvalue problems

The Bauer-Fike theorem applied to

$AU_k - U_k H_k = h_{k+1,k}\, u_{k+1} e_k^T$

yields (for $X^{-1} A X = \Lambda$)

$\forall j\ \exists i: \ |\lambda_j(H_k) - \lambda_i(A)| \le |h_{k+1,k}|\,\kappa(X)$

i.e. each eigenvalue of $H_k$ approximates well some eigenvalue of $A$.
For $A = A^T$, $\kappa(X) = 1$.
Breakdown $h_{k+1,k} = 0$ is good, since then $\lambda_j(H_k) = \lambda_i(A)$ exactly.
Improved bounds exist if we know something about $\Lambda(A)$.
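A small numerical illustration (assuming the arnoldi() sketch from earlier and a made-up matrix whose spectrum lies roughly in the interval [1, 2]); the extreme eigenvalues of $H_k$ (the Ritz values) approach the outer eigenvalues of $A$ after a modest number of steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
# spectrum close to the interval [1, 2] (small non-normal perturbation)
A = np.diag(np.linspace(1.0, 2.0, n)) + 1e-3 * rng.standard_normal((n, n))

k = 30
U, H = arnoldi(A, rng.standard_normal(n), k)
ritz = np.linalg.eigvals(H[:k, :k])            # Ritz values = eigenvalues of H_k
print(ritz.real.min(), ritz.real.max())        # outer Ritz values, near 1 and 2
```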

The Bauer-Fike theorem applied to

$AV_k - V_k T_k = \hat v_{k+1} e_k^T, \qquad A^T W_k - W_k T_k^T = \hat w_{k+1} e_k^T$

yields (for $X^{-1} A X = \Lambda$)

$\forall j\ \exists i: \ |\lambda_j(T_k) - \lambda_i(A)| \le \|V_k\|_2\,\|\hat v_{k+1}\|_2\,\kappa(X)$ and similarly $\le \|W_k\|_2\,\|\hat w_{k+1}\|_2\,\kappa(X)$

i.e. each eigenvalue of $T_k$ approximates well some eigenvalue of $A$.
Breakdowns $\hat v_{k+1} = 0$ or $\hat w_{k+1} = 0$ are good, since then $\lambda_j(T_k) = \lambda_i(A)$.
The breakdown $\hat w_{k+1}^T \hat v_{k+1} = 0$ does not help.
The result still holds when bi-orthogonality gets lost.

Eigenvalue convergence

Note that Krylov spaces are related to the power method,

$\mathcal{K}_k(A, r) := \mathrm{Im}\,[\,r, Ar, \ldots, A^{k-1}r\,],$

and that, in exact arithmetic, they are shift-invariant:

$\mathcal{K}_k(A - cI, r) = \mathcal{K}_k(A, r)$

For Arnoldi, $\Lambda(H_k)$ should therefore converge to the outer spectrum of $A - cI$.
The same holds for Lanczos and $\Lambda(T_k)$ (real spectrum if $A = A^T$).
In practice one often converges to outer eigenvalues, but the full story is more complex ...

Convergence is strongly influenced by a rational transformation or implicit shift technique.

A rational transformation $\hat A := (bI - cA)(dI - A)^{-1}$ transforms the spectrum as well
(inverse iteration, $\hat A := (dI - A)^{-1}$, is a special case)

$\mathcal{K}_k(\hat A, r) := \mathrm{Im}\,[\,r, \hat A r, \ldots, \hat A^{k-1} r\,]$

tends to the eigenspace associated with the spectrum of $A$ closest to $d$
approximate inverses are needed for the multiplication with $\hat A$
loss of orthogonality indicates convergence of eigenvalues
the symmetric Lanczos case has been extensively studied
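A sketch of this shift-and-invert idea built on the arnoldi() helper from before: one factorization of $(dI - A)$ plays the role of the approximate inverse, and the Ritz values of the transformed operator are mapped back to eigenvalues of $A$ near $d$ (all names here are my own illustration):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

class ShiftInvert:
    """Acts like (d*I - A)^{-1} via one precomputed (dense) LU factorization."""
    def __init__(self, A, d):
        self.lu = lu_factor(d * np.eye(A.shape[0]) - A)
    def __matmul__(self, x):                   # so that `op @ x` works inside arnoldi()
        return lu_solve(self.lu, x)

rng = np.random.default_rng(2)
n = 300
A = rng.standard_normal((n, n)) / np.sqrt(n)   # spectrum roughly in the unit disk
d = 0.9                                        # look for eigenvalues of A near d

op = ShiftInvert(A, d)
k = 25
U, H = arnoldi(op, rng.standard_normal(n), k)
theta = np.linalg.eigvals(H[:k, :k])           # Ritz values of (d I - A)^{-1}
lam = d - 1.0 / theta                          # mapped back: eigenvalues of A near d
print(lam[np.argsort(abs(lam - d))][:5])
```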

Implicitly shifted Arnoldi (ARPACK)

One implicit shift QR step applied to $H_k$,

$H_k - \mu I = QR, \qquad \hat H_k = RQ + \mu I$

(and a deflation) deletes the undesired eigenvalue $\mu$ from $\hat H_k$ and yields a new starting vector $\hat r := (A - \mu I)r$ such that

$\mathcal{K}_{k-1}(A, \hat r) := \mathrm{Im}\,[\,\hat r, A\hat r, \ldots, A^{k-2}\hat r\,]$

yields $\hat H_{k-1}$ as projected matrix.
Choose the shifts $\mu_i$ recursively,

$\hat r := \Big[\prod_i (A - \mu_i I)\Big]\, r = p(A)\, r,$

such that the polynomial $p(s)$ filters the undesired eigenvalues out of the complex plane.
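In practice one rarely codes the restarting machinery by hand; for example, SciPy's sparse eigensolvers wrap ARPACK, which implements exactly this implicitly restarted Arnoldi/Lanczos scheme. A usage sketch on a made-up sparse matrix:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

n = 5000
A = sp.random(n, n, density=1e-3, random_state=3) + sp.eye(n)

# 6 eigenvalues of largest magnitude, keeping at most ncv = 40 Arnoldi vectors
# between the implicit restarts
vals, vecs = eigs(A, k=6, which='LM', ncv=40)
print(np.sort(abs(vals))[::-1])

# shift-and-invert mode: the 6 eigenvalues closest to sigma
vals_near, _ = eigs(A.tocsc(), k=6, sigma=1.5)
```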

Filter after a number of implicit steps (figure omitted)

Implicit shift ideas

allow keeping k small (deflations)
behave numerically better than explicit calculation of $\hat r$
extend to the Lanczos and unsymmetric Lanczos algorithms
extend to the block Arnoldi and block Lanczos algorithms
extend to the generalized eigenvalue problem
  $\det(\lambda B - A) = 0 \iff \det(\lambda I - B^{-1}A) = 0$
implicit QR steps become implicit QZ steps

Rational approximation

Approximate $R(s)$ of degree $N$ by $\hat R(s)$ of degree $n \ll N$. Assume

$R(s) := c(sI_N - A)^{-1}b, \qquad \hat R(s) := \hat c(sI_n - \hat A)^{-1}\hat b$

To interpolate at a point $\sigma$ up to order $2k$ (Padé approximation),

$R(s) - \hat R(s) = O\big((s - \sigma)^{2k}\big),$

one chooses

$\hat A = W_k^T A V_k, \qquad \hat b = W_k^T b, \qquad \hat c = c V_k,$

where

$\tilde A := (A - \sigma I)^{-1}, \qquad \mathrm{Im}[V_k] = \mathcal{K}_k(\tilde A, \tilde A b), \qquad \mathrm{Im}[W_k] = \mathcal{K}_k(\tilde A^T, \tilde A^T c^T)$

Also valid for multipoint interpolation, multi-input multi-output systems, and the Arnoldi variant.
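A sketch of this moment-matching reduction in the simpler one-sided (Arnoldi) variant mentioned on the slide, which matches $k$ moments at $\sigma$ rather than $2k$; it reuses the arnoldi() helper from above, and the dense inverse is only for illustration (a real code would factor once and solve):

```python
import numpy as np

def reduce_model(A, b, c, sigma, k):
    """One-sided moment matching at s = sigma (Arnoldi variant, k moments):
    project onto K_k((A - sigma I)^{-1}, (A - sigma I)^{-1} b).  Sketch only."""
    n = A.shape[0]
    M = np.linalg.inv(A - sigma * np.eye(n))
    V, _ = arnoldi(M, M @ b, k)                  # orthonormal basis of the Krylov space
    Vk = V[:, :k]
    return Vk.T @ A @ Vk, Vk.T @ b, c @ Vk       # reduced (A_hat, b_hat, c_hat)

def transfer(A, b, c, s):
    return c @ np.linalg.solve(s * np.eye(A.shape[0]) - A, b)

rng = np.random.default_rng(4)
N = 200
A = -np.diag(rng.uniform(1.0, 10.0, N)) + 0.01 * rng.standard_normal((N, N))
b = rng.standard_normal(N)
c = rng.standard_normal(N)

Ak, bk, ck = reduce_model(A, b, c, sigma=0.0, k=10)
for s in [0.05j, 0.2j]:
    print(transfer(A, b, c, s), transfer(Ak, bk, ck, s))   # should agree for small |s|
```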

We did NOT cover


least squares variants
implementation aspects
preconditioning
singular value problems
block versions
multiple right hand sides
restarts ...


Conclusion
Krylov subspace algorithms are used to
solve large scale systems of equations
find eigenvalues, generalized eigenvalues and singular values of
large scale matrix problems
approximate the exponential or high degree rational functions
by lower degree ones
There are numerous variants and sophisticated algorithms
It is still a very active area of research


References
[1] R. Lehoucq, D. Sorensen, C. Yang, ARPACK users guide,
SIAM, 1997.
[2] B. Parlett, The symmetric eigenvalue problem, Prentice Hall,
1980.
[3] Y. Saad, Iterative methods for sparse linear systems, SIAM,
2nd Ed., 2003.
[4] H. van der Vorst, Iterative Krylov Methods for Large Linear
Systems, Cambridge Monographs on Applied and
Computational Mathematics, 2003.
[5] P. Van Dooren, Gramian based model reduction of large-scale
dynamical systems, in Numerical Analysis, Chapman and Hall,
CRC Press, London, 2000.
