Strakos: On The Real Convergence Rate of The Conjugate Gradient Method

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

On the Real Convergence Rate of the Conjugate Gradient Method

z. strakos
General Computing Center of the
Czechoslovak Academy of Science
Pod vodrirenskou v&Z 2
182 07 Praha 8, Czechoslovakia

Dedicated to Gene Golub, Richard Varga, and David Young

Submitted by Owe Ax&son

ABSTRACT

We present a parametrized class of matrices for which the rate of convergence of


the conjugate gradient method varies greatly with the parameter and does not
appreciably depend on the algorithm implementation. A small change in the eigen-
value distribution can lead to a large change in the sensitivity of CG to rounding
errors. A theorem is proved which gives a necessary and sufficient condition for
ordering exact arithmetic CG processes for systems with different spectra according
to the energy norm of the error. Theorems 4.1 and 4.2 continue Paige’s and
Greenbaum’s work.

1. INTRODUCTION

Let Ax = b be a system of N linear equations, where x, b E RN, and A is


a symmetric, positive definite matrix of order N. The conjugate gradient
method (CG) for the solution of this system can be presented in the form

x0, rO=b-h” do = r”,


(ri,r’)
‘i= (d’,Ad’) ’

x i+1 = xi + Ti d”,

r i+l= ri - 7iAdi,
rf+l i+l
>r 1
cp,=(.
I
(ri,ri) ’

di+l _ ri+l + pi di, i=O,1,2 ,.... (1.1)

LINEAR ALGEBRA AND ITS APPLZCATZONS 154-156:535-549 (1991) 535


0 Elsevier Science Publishing Co., Inc., 1991
655 Avenue of the Americas, New York, NY 10010 0024-3795/91/$3.50
536 Z. STFUKOS

The CG method was proposed by Hestenes and Stiefel in 1952 [8]. It gives,
in exact arithmetic, the correct solution x within N steps. In practice,
however, CG is regarded as an iterative method [16], because a sufficiently
accurate approximate solution x k is often obtained in far fewer than N steps.
The theoretical convergence rate (in the absence of rounding errors) has
been studied by many authors. Using Chebyshev polynomials, estimates for
the convergence rate were developed ([7], e.g.). It was recognized that the
rate of convergence depends strongly on the distribution of eigenvalues of A.
The effect of roundoff was discussed in Hestenes and Stiefel’s original paper,
and again by Reid [16] and Jennings [9]. It was also studied extensively by
Wozniakowski [26] and Bollen [4, 51. Results developed by Paige, Parlett, and
Simon for the Lanczos algorithm provided further insight into this problem
[12-15, 17-191.
During our experiments we have found a parametrized class of matrices
for which the behavior of the CG method is quite surprising. We have
studied the change in the number of iterations required for convergence as a
function of eigenvalue distribution. For some values of the parameter the CG
process is sensitive to rounding errors, which leads to very slow convergence.
A small change in the parameter (which implies a small change in the
eigenvalue distribution) can lead to a large change in the convergence rate.
The fact that the CG rate of convergence depends strongly on the distribu-
tion of eigenvalues is well known. However, the known results deal with
qualitative characteristics of the spectrum (isolated eigenvalues, clustered
eigenvalues, eigenvalue gap). In our experiments, the parameter does not
affect the eigenvalue distribution qualitatively.
The purpose of our paper is to report our observations and to give at least
a partial explanation of them. For our parametrized eigenvalue distributions
CG is sensitive to rounding errors even in the case of a small condition
number K (e.g. K = 100) and small N (e.g. N = 12,24). The fact that the CG
process is strongly influenced by rounding errors for small N is more
surprising than for large N. We have studied the sensitivity of CG to
rounding errors as a function of eigenvalue distribution. The small problem
size N is convenient for this work. Therefore most of our numerical results
illustrated in Figures I-3 have been obtained for N = 24 (we note that they
represent a small part of the experimental results obtained during our work
for this paper). It can be easily verified that similar results can be obtained
for N >> 24.
We do not claim that our particular eigenvalue distribution is the only
one for which CG suffers from rounding errors. Nevertheless, we think that
studying this distribution may lead to a better understanding of the effect of
rounding errors on the CG process.
CONJUGATE GRADIENT METHOD 537

2. THE REAL CONVERGENCE RATE: DESCRIPTION


OF EXPERIMENTS

The aim of our experiments was to study the dependence of the real CG
convergence rate on the distribution of eigenvalues of the matrix A (in
preconditioned CG methods, the distribution of eigenvalues is that of the
preconditioned matrix, which depends on the choice of preconditioning
method [3]).
We use the following eigenvalue distributions. For a given N, A,, and
K = A,/A, we generate the inner eigenvalues by the formula

i-l
hi = A, + N_1(A” - A,)P> d=2,3 ,...,N-I. (2.1)

Thus for p = 1 we have a uniformly distributed spectrum, and for p < 1 the
spectrum has the “clusterpoint” A i. The parameter p describes quantitatively
the nonuniformity of the spectrum.
For a given spectrum {A,, AZ,. . , AN) determined by N, A i, K, and p we
carried out computations using following five algorithms:

ALGORITHM 2A. The simulation procedure proposed by Hageman and


Young [7] generates the Euclidean norm of the kth CG error, ]]x - xk]]; the
A-norm of the k th CG error, [lx - rk(( A; and the Euclidean norm of the k th
CG residual, ]]rk]]. The coordinates 77j of the initial residual r” in the basis of
normalized eigenvectors of the matrix A are randomly generated with
7j E (O,l>. This simulation procedure is equivalent to the CG algorithm in
which all vectors are represented by their coordinates in the basis of A’s
eigenvectors. The operation y = Ax is thus reduced to y = diag(A,, . . . , A,)x
E Dx, and the computation corresponds to the ordinary CC process applied
to the diagonal system Dx = b.

ALGORITHM 2B. The simulation procedure based on Paige and Saunder’s


SYMMQL algorithm generates [lx - xk](, ]]x - xk]]A, and J]rkJ], as well as the
Euclidean norms [lx - z;]] of the SYMMQL error,Ilr:ll of the SYMMQL residual,
and ]]rhll of the MINRES residual; cf. [ll].

ALGORITHM 2C. The Rutishauser variant of the CG algorithm [27] is


applied to the system Dx+b=O, where D=diag(A,,...,A,), b=-r”=
-(771,172,...7 vNIT, and the initial guess is x0 = 0.
538 Z. STRAKOS

ALGORITHM2D. The classical variant of the CG algorithm (referred to


by Reid as algorithm version 2; cf. 1161).

ALGORITHM2E. Jacobi acceleration of the Algorithm 2D; cf. [l].

These last two algorithms were applied to the system A? = 6 generated


by

A = GDGT, (2.2)
where % and ? are related to r” and x by

6 = Gr', i!=Gx, (2.3)


and G is a randomly generated orthogonal matrix, GGT = I. The initial guess
is !i” = 0.

3. THE REAL CONVERGENCE RATE: RESULTS AND DISCUSSION

Our observations and conclusions are summarized in three points.

(1) There exists a critical value of the parameter p for which the number
of CG iterations required for convergence greatly exceeds N even in the case
of a small condition number (e.g. K = 100) and small N (e.g. N = 12,24). In
its neighborhood a small change in p (or a small perturbation of the
eigenvalue distribution) may cause a large change in the CG convergence
rate.
(2) The critical value of the parameter p decreases with increasing
precision level. For p 2 0.9, CG processes are practically ordered in the
sense of Theorem 3.1.
(3) The observed results practically do not depend on the variant of the
CG algorithm used (that is why the variants are not distinguished in the
figures).

A detailed discussion and partial explanation of these results is given in


the following subsections.

Point Cl)
We have observed that the real CG convergence rate strongly depends on
the distribution of eigenvalues of the system matrix. For some eigenvalue
CONJUGATE GRADIENT METHOD 539

distributions CG is sensitive to roundoff even in the case of a small condition


number (e.g. K = 100) and small N (e.g. N = 24) while for a slightly
modified spectrum the sensitivity IS remarkably smaller. This is illustrated in
Figures 1 and 2. Figure 1 shows the number k of CG iterations required to
satisfy the criterion (IX - x’ll/ (Ix - x0/l < 10pL, L = 12, as a function of the
parameter p for A, = 0.1, K = 103, and N = 12, 24, and 48. In Figure 2 the
size of the problem, N, is fixed (N = 24) and the condition number is set at
K = 10, 102, 103, and 104. All results have been obtained using IBM double
precision (DP) arithmetic (56 bit mantissa). We have found that for a given
N, K, and precision level L (here L = 12), there exists a critical value p*. As
p approaches p *, the number of iterations k = k(p, L) required to satisfy
IIx - xkl( < 10-LIlx - x0/l greatly exceeds N, reaching its maximum at p = p*.
In the neighborhood of p* a small change in p may cause a large change in
kb, LX

110

70
k
60

I
4 I I I
0.2 0.4 0.6 0.8 I.0
c

FIG. 1. The number of CG iterations required to reduce the relative CG error


measured in the Euclidean norm below 10-12, as a function of the parameter p
(A, = 0.1, K = 103, N = 12, 24, and 48).
540 Z. STRAKOS

60

I I
, I I
0,. 2 0.4 0.6 0.8 li. 0
c

FIG. 2. The number of CG iterations required to reduce the relative CG error


measured in the Euclidean norm below lo-“, as a function of the parameter p
(A, = 0.1, N = 24, K = 10, lo*, 103, and 104).

The fact that the CG rate of convergence depends strongly on the


distribution of eigenvalues is well known. However, the known results deal
with qualitative characteristics of the spectrum (isolated eigenvalues, clus-
tered eigenvalues, eigenvalue gap). In our experiments, the parameter p
does not affect the eigenvalue distribution qualitatively. We wish to call the
reader’s attention to the fact that a small perturbation of the eigenvalue
distribution may cause a large change in the CG convergence rate (or in the
sensitivity of CG process to rounding errors).

Point (2)
For a fixed A,, N, and K, the critical value p* decreases with increasing
precision level I,. This is illustrated in Figure 3, which shows k = k(p, L)
computed in DP arithmetic for A, = 0.1, N = 24, K = 103, and precision
levels L = i, 1, 4, 7, and 10. We see a shift of p*(L) from E 0.9 for L = 0.5
to E 0.68 for I, = 10. Even more surprising is the fact that k(p*(l), 1) x=-
k(p*(lO),l). To reach an accuracy of 10-l for p = p*(l)s 0.9, the CG
process needs many more iterations than for p = p*(lO) E 0.68. CG rapidly
accelerates its rate of convergence in the first case, while in the second it
converges slowly with a roughly constant convergence rate. We have not
found an explanation.
We see that for any value of L and for any p1.p2 + 1, p1 > pz,

k(p,,L) Q k(pz,L),
CONJUGATE GRADIENT METHOD 541

FIG. 3. The function k = k(p, L) for h, = 0.1, N = 24, K = 103, and L = f, 1, 4,


7, and 10.

i.e., the CG processes computed for different values of p, p -+ 1, are in some


sense “ordered.”
Using Greenbaum’s ideas [6], we prove Theorem 3.1, which gives a
necessary and sufficient condition for ordering exact arithmetic CG processes
for systems with different spectra according to the energy norm of the error.
As above, let xi be the ith iterate of the exact CG process for AX = b,
and r” be the initial residual. Corresponding to r” there are uniquely
determined eigenvalues 0 < h, < A, < . . * < A,,, and orthonormal eigenvec-
tors z’,z~,...,z”’ of A such that

m
r” = C 7jjzj, 77j'O, j=1,2 ,...,m, ( 3.1)
j=l

(3.2)
542 Z. STRAKOS

Let By = c be a system of linear equations, B a symmetric positive definite


matrix. Let yi be the exact ith CG iterate for this system corresponding to
the initial residual to = c - By’, and let to be expressed as

to= 2 tjwj, tj>O, j=1,2 ,..., m, (3.3)


j=l

where w1,w2,...,w”’ are normalized eigenvectors of B corresponding to its


eigenvalues PI, pZ,. . . ,p,,.

DEFINITION 3.1. We call the initial guesses x0 and y” comparable if the


corresponding initial residuals are defined by (3.1) and (3.3) and nj /A =
(j/G> j = L%...,m.

COMMENT. For the comparable initial guesses r” and y” we have

IIX - x011*= II!/- YOlle.

THEOREM 3.1. The following three assertions are equivalent:

(A) We have

‘i Pi
-<- Vi> j, i,jE{l,Z ,..., m}. (3.4)
Aj Pj

(B) For any comparable initial guesses x0 and y” we have

IIX - xill.4 < Ily - dlle ViE(1,2 ,..., m}.

CC> For any comparable initial guesses x0 and y ’ we have

IIX - xx4 < Ily - Y’lls.


CONJUGATE GRADIENT METHOD 543

Proof. (A) d (B): Using the CG-minimizing property (e.g. [20, 21, 251)
and the expansions (3.1) and (3.31, we can write

lly - y”lG= E [4i(Pk)l%


k=l

where pi, qi are the minimizing CG polynomials with respect to xi, y i, and
where S,” = T-E/hk = [i/pk. Greenbaum showed [6, Lemma l] that there
exists an ith degree polynomial si, ~~(0) = 1, such that [So]’ <[qi&k)12
Vk E {1,2,. . . , m). Applying this, we must have

Q 2 [9i(pk)12(Jf
= IIY - ~‘11: ViE(1,2 ,..., m).
k=l

(B) j (C): This implication is trivial.


(Cl * (A): Suppose C holds and there are i, j E (1,2,. . . , ml, i > j, such
that

EL<-,
‘i

Pj Aj

Choosing li = lj = 1, lk = 0, k E {{l, 2,. . . , m)\{i, j)}, we obtain

hi + Aj Pi+cLj
P,(A) = I- -A, %(P)=l-- 2p.
A; + A; Pf + Pj

It is easy to show that

lly - #II: = 5 [91(,+)]21f< t


k=l k=l
[Pl(Ak)]26;= 11%
- X’lli.

This contradicts the assumption, and the proof is finished. The ordering
544 Z. STFUKOS

according to the first iteration implies the same ordering for subsequent
iterations. n

Though for p > 0.9 the CG processes shown in Figure 3 are practically
ordered in the sense of Theorem 3.1, this fact can hardly be explained using
this theorem. Theorem 3.1 assumes exact arithmetic, while the experimental
results in Figure 3 are strongly influenced by rounding errors. Theorem 3.1
can be successfully applied for explanation of experiments, e.g. in the case of
CG with full reorthogonalization.

Point (3):
The observed results do not appreciably depend on the variant of the CG
algorithm used. Comparable results of Algorithms 2A-2D differ in at most
one or two iterations. Jacobi acceleration of CG (Algorithm 2E) does not
improve the results shown, e.g., in Figure 3. On the contrary, for p E (0.5,
0.6) it converges more slowly than ordinary CG. The agreement of real
convergence rates obtained using Algorithms 2A, 2C, and 2D shows that the
effect of roundoff on the CG process does not depend noticeably on the
density of the system matrix (on the accumulation of elemental roundoff).
The agreement of the Algorithm 2A and 2B results shows definitely that in
our experiments the CG process suffers from rounding errors not due to
unstable solution of the implicitly (in Algorithm 2A) or explicitly (in Algo-
rithm 2B) generated tridiagonal system, but due to loss of orthogonality
among the residual or normalized residual vectors. This corresponds to
Simon’s results [17].

4. CONCLUDING REMARKS

In exact arithmetic the convergence of extremal Ritz values implies the


acceleration of the CG convergence rate (e.g. [2, 20, 22-241). On the
contrary, the convergence of a computed Ritz pair to an eigenpair is
equivalent to the loss of orthogonality caused by rounding errors (e.g. [12]).
Jennings observed the influence of large outlying eigenvalues on the CG
sensitivity to rounding errors [9]. We present a parameterized eigenvalue
distribution with large outlying eigenvalues for which a small change of the
parameter can cause a large change in the sensitivity of CG process to
rounding errors.
We believe that further progress in our work requires a large number of
experiments which relate the real rate of convergence to the loss of orthogo-
nality and the convergence of Ritz pairs.
CONJUGATE GRADIENT METHOD 545

Fundamental theoretical results concerning the convergence of Ritz pairs


and the loss of orthogonality were developed by Paige and Parlett.
Greenbaum used Paige’s work and proved a very interesting result concem-
ing the real convergence rate of the CG and Lanczos methods. She shows
that for a given values of J, finite precision CG applied to a linear system
AX = b converges in steps 1 through J at the same rate as exact CG applied
to a certain linear system x,X, = ?I,. The matrix A, has more distinct
eigenvalues than A. The eigenvalues of x, all lie within small intervals
about the eigenvalues of A [lo].
We give a small new contribution to these results. We prove that for any
eigenvalue hj of A for which the corresponding eigenvector has a nonnegli-
gible component in the initial residual, there is an eigenvalue of XJ very
near Aj.
We use the notation similar to [12]. The computed results of the Lanczos
process after k steps satisfy

AV, =vkTk + /?k+lv k+l(ek)T+ sv,. (4.1)

Let the exact eigendecomposition of A and Tk be

A = UAUT, A = diag(hj), (4.2)

Tk = YkMkYkT, Mk = diag( $)), (4.3)

where the orthonormal matrices U and Yk have the columns uj and yjk!
Then from (4.1)

If we denote

w, = (t”$‘)i j = UT& = uTv,Y,,

f& = (t,$))i,j = uT[&+lvk+l(ek)T


+ av,]r,, (4.5)

where 2, = VkYk is the matrix of computed Ritz vectors, then (4.4) can be
546 Z. STRAKOS

written as

(4.6)

or

(4.7)

We now consider the estimate for IlSn,ll. For any vector x

< 11x11
lld+lll < (1+ 4)“‘ll~ll>

(4.8)

Here E,,, &I are small multiples of the relative machine precision E defined in
[ 12, (2.16)].
We are interested in the value of

min IA.1 - CL(k)\


J
j

The result is obtained in the next theorem.

THEOREM 4.1. Zf the eigenvector of A has a nonzero component in the


initial vector of the Lanczos process, i.e.

I(ut,v’)(+o, (4.9)

then for some eigenvalue CL(“)


3
of T k

Ihi _ ~~‘)I = minIA, _ ~~k)I ~ ’ k”2&1JIAII)


k1’2(‘k+l
t [1+
I("i~vl)l
O(E)]. (4.10)
CONJUGATti GRADIENT METHOD 547

Proof. From (4.7)

Then for any set of real numbers qt, C:=,cp,” = 1, we have

(4.11)

(4.12)

and we can find coeffkients qt so that

c ‘PtYt(k) =
t=1
e,,
(4.13)

Using (4.131, (4.12), (4.11), and (4.8), the proof is completed. n

As a consequence we obtain

THEOREM 4.2. Zf (4.9) holds, th en f or each eigenvalue Ai of A there is


some eigenvalue pL(jN+ m, of the Greenbaum matrix xJ constructed fm the Jth
step of the Lanczos process with the property

Ihi - /.L(jN+y Q (yy;f)i;ll [1+ O(&)], (4.14)


If

where m and 5 are defined in [lo, pp. 22, 36, 46, 501.

Greenbaum warns that Theorem 4.2 does not imply that the finite
precision Lanczos process will eventually find all eigenvalues with nonnegli-
gible components in the initial residual. Only ifit could be shown that, for
some J, the exact Lanczos process applied to A, finds some eigenvalue by
step J, would it then follow that the finite precision Lanczos process finds
this eigenvalue (A. Greenbaum, personal communication).
548 ’ Z. STtiKOS

I wish to thank Professor A. Greenbaum for suggesting several valuable


improvements, and Professor 0. Axelsson for his kind support during the
preparation of this paper.

REFERENCES

1 0. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value


Problems-Theory and Computations, Academic, New York, 1984, 432 pp.
2 0. Axelsson and G. Lindskog, On the rate of convergence of the preconditioned
conjugate gradient method, Numer. Math. 48:499-523 (1986).
3 0. Axelsson and G. Lindskog, On the eigenvalue distribution of a class of
preconditioning methods, Namer. Math. 48:479-498 (1986).
4 J. Bollen, Round-off Error Analysis of Descent Methods for Solving Linear
Equations, Ph.D. Thesis, Tech. Hogeschool Eindhoven, 1980, 193 pp.
5 J. Bollen, Numerical stability of descent methods for solving linear equations,
Numer. Math. 43:361-377 (1984).
6 A. Greenbaum, Comparison of splittings used with the conjugate gradient
algorithm, Numer. Math. 33:181-194
(1979).
7 L. A. Hageman and D. M. Young, Applied Zterative Methods, Academic, New
York, 1981, 386 pp.
8 M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear
systems, J. Res. Nat. Bur. Standards 49:409-436 (1952).
9 A. Jennings, Influence of the eigenvalue spectrum on the convergence rate of the
conjugate gradient method, J. Inst. Math. Appl. 20:61-72 (1977).
10 A. Greenbaum, Behavior of slightly perturbed Lanczos and conjugate gradient
recurrences, Linear Algebra Appl. 113:7-63 (1989).
11 C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear
equations, SIAM J. Numer. Anal. 12(4):617-629 (1975).
12 C. C. Paige, Accuracy and effectiveness of the Lanczos algorithm for the
symmetric eigenproblem, Linear Algebra Appl. 34:235-258 (1980).
13 C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations
and sparse least squares, ACM Trans. Math. Software 8(1):43-71 (1982).
14 B. N. Parlett, The Symmetric EigenvaZue Problem, Prentice-Hall, New York,
1980.
15 B. N. Parlett, A new look at the Lanczos algorithm for solving symmetric systems
of linear equations, Linear Algebra Appl. 29:323-346 (1980).
16 J. K. Reid, On the method of conjugate gradients for solution of large sparse
systems of linear equations, in Large Sparse Sets of Linear Equations (J. K. Reid,
Ed.), Academic, London, 1971, pp. 231-254.
17 H. D. Simon, The Lanczos Algorithm for Solving Symmetric Linear Systems,
Ph.D. Thesis, Univ. of California, Berkeley, 1982, 109 pp.
18 H. D. Simon, The Lanczos algorithm with partial reorthogonalization. Math.
Comp. 42:115-142 (1984).
19 H. D. Simon, Analysis of the symmetric Lanczos algorithm with reorthogonaliza-
tion methods, Linear Algebra Appl. 61:101-131 (1984).
CONJUGATE GRADIENT METHOD

20 A. van der Sluis and H. A. van der Vorst, The rate of convergence of conjugate
gradients, preprint 354, U iv. Utrecht, 1984, 29 pp.
21 J. Stoer and R. Freund, 6 n the solution of large indefinite systems of linear
equations by conjugate gradient algorithms, preprint 85, Univ. Wiirzburg, Inst.
Rir Angewandte Mathematik und Statistik, 1981.
22 Z. StrakoS, A remark on the rate of convergence of the conjugate gradient
method for the solution of linear systems (in Czech), in Proceedings of the
Conference Algorithms ‘87, Slovak Union of Math., Strbske Pleso, 1987, pp.
287-290.
23 Z. StrakoS, On the rate of convergence of conjugate gradient method for solution
of linear systems, in Proceedings of the European Congress of Simulation,
Prague, 1987, pp. 184-189.
24 Z. Strakog, The Conjugate Gradient Method for the Solution of Linear Systems
(in Czech), Res. Rep. V-279, General Computing Center of the Czechoslovak
Academy of Sciences, Prague, 1987, 80 pp.
25 H. A. van der Vorst, Preconditioning by Incomplete Decompositions, Ph.D.
Thesis, Univ. Utrecht, 1982, 97 pp.
26 H. Wozniakowski, Roundoff-error analysis of a new class of conjugate-gradient
algorithms, Linear Algebra Appl. 29:507-529 (1980).
27 J. W. Wilkinson and C. Reinsch, Handbook for Automatic Computation-Linear
Algebra, Springer, New York, 1971.

Received 5 February 1990;final manuscript accepted 20 November 1990

You might also like