Strakos: On The Real Convergence Rate of The Conjugate Gradient Method
Z. Strakoš
General Computing Center of the
Czechoslovak Academy of Sciences
Pod vodárenskou věží 2
182 07 Praha 8, Czechoslovakia
ABSTRACT
1. INTRODUCTION
x^{i+1} = x^i + τ_i d^i,        τ_i = (r^i, r^i) / (d^i, A d^i),

r^{i+1} = r^i − τ_i A d^i,

d^{i+1} = r^{i+1} + φ_i d^i,    φ_i = (r^{i+1}, r^{i+1}) / (r^i, r^i).
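A minimal NumPy sketch of the Hestenes–Stiefel recurrence above (our own illustration; the function name, tolerance, and test matrix are not from the paper):

```python
# Sketch of the CG recurrence for a symmetric positive definite A.
# Names, tolerance, and the 2x2 test problem are illustrative only.
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Solve A x = b; return the approximation and the iteration count."""
    n = len(b)
    max_iter = max_iter or 5 * n      # finite precision may need more than n steps
    x = x0.copy()
    r = b - A @ x                     # initial residual r^0
    d = r.copy()                      # initial direction d^0 = r^0
    rr = r @ r
    for k in range(max_iter):
        if np.sqrt(rr) < tol:
            return x, k
        Ad = A @ d
        tau = rr / (d @ Ad)           # tau_i = (r^i, r^i) / (d^i, A d^i)
        x = x + tau * d
        r = r - tau * Ad
        rr_new = r @ r
        phi = rr_new / rr             # phi_i = (r^{i+1}, r^{i+1}) / (r^i, r^i)
        d = r + phi * d
        rr = rr_new
    return x, max_iter

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, k = conjugate_gradient(A, b, np.zeros(2))
```

For this 2 × 2 system the iteration terminates, up to rounding, within two steps, in agreement with the finite-termination property discussed below.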
The CG method was proposed by Hestenes and Stiefel in 1952 [8]. It gives,
in exact arithmetic, the correct solution x within N steps. In practice,
however, CG is regarded as an iterative method [16], because a sufficiently
accurate approximate solution x^k is often obtained in far fewer than N steps.
The theoretical convergence rate (in the absence of rounding errors) has
been studied by many authors. Using Chebyshev polynomials, estimates for
the convergence rate were developed ([7], e.g.). It was recognized that the
rate of convergence depends strongly on the distribution of eigenvalues of A.
The effect of roundoff was discussed in Hestenes and Stiefel’s original paper,
and again by Reid [16] and Jennings [9]. It was also studied extensively by
Wozniakowski [26] and Bollen [4, 5]. Results developed by Paige, Parlett, and
Simon for the Lanczos algorithm provided further insight into this problem
[12-15, 17-19].
During our experiments we have found a parametrized class of matrices
for which the behavior of the CG method is quite surprising. We have
studied the change in the number of iterations required for convergence as a
function of eigenvalue distribution. For some values of the parameter the CG
process is sensitive to rounding errors, which leads to very slow convergence.
A small change in the parameter (which implies a small change in the
eigenvalue distribution) can lead to a large change in the convergence rate.
The fact that the CG rate of convergence depends strongly on the distribu-
tion of eigenvalues is well known. However, the known results deal with
qualitative characteristics of the spectrum (isolated eigenvalues, clustered
eigenvalues, eigenvalue gap). In our experiments, the parameter does not
affect the eigenvalue distribution qualitatively.
The purpose of our paper is to report our observations and to give at least
a partial explanation of them. For our parametrized eigenvalue distributions
CG is sensitive to rounding errors even in the case of a small condition
number κ (e.g. κ = 100) and small N (e.g. N = 12, 24). The fact that the CG
process is strongly influenced by rounding errors for small N is more
surprising than for large N. We have studied the sensitivity of CG to
rounding errors as a function of eigenvalue distribution. The small problem
size N is convenient for this work. Therefore most of our numerical results
illustrated in Figures 1-3 have been obtained for N = 24 (we note that they
represent a small part of the experimental results obtained during our work
for this paper). It can be easily verified that similar results can be obtained
for N >> 24.
We do not claim that our particular eigenvalue distribution is the only
one for which CG suffers from rounding errors. Nevertheless, we think that
studying this distribution may lead to a better understanding of the effect of
rounding errors on the CG process.
CONJUGATE GRADIENT METHOD 537
The aim of our experiments was to study the dependence of the real CG
convergence rate on the distribution of eigenvalues of the matrix A (in
preconditioned CG methods, the distribution of eigenvalues is that of the
preconditioned matrix, which depends on the choice of preconditioning
method [3]).
We use the following eigenvalue distributions. For a given N, λ_1, and
κ = λ_N/λ_1 we generate the inner eigenvalues by the formula

λ_i = λ_1 + ((i − 1)/(N − 1)) (λ_N − λ_1) ρ^{N−i},   i = 2, 3, ..., N − 1.   (2.1)
Thus for ρ = 1 we have a uniformly distributed spectrum, and for ρ < 1 the
spectrum has the cluster point λ_1. The parameter ρ describes quantitatively
the nonuniformity of the spectrum.
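Formula (2.1) is easy to transcribe directly (the function name and sample parameter values below are our own illustration, not the paper's):

```python
# Generate the spectrum (2.1): lambda_1 fixed, lambda_N = kappa * lambda_1,
# inner eigenvalues filled in with nonuniformity parameter rho.
import numpy as np

def eigenvalue_distribution(n, lam1, kappa, rho):
    lam_n = kappa * lam1
    lam = np.empty(n)
    lam[0] = lam1
    lam[-1] = lam_n
    for i in range(2, n):  # i = 2, ..., N-1 as in (2.1)
        lam[i - 1] = lam1 + ((i - 1) / (n - 1)) * (lam_n - lam1) * rho ** (n - i)
    return lam

lam_uniform = eigenvalue_distribution(24, 0.1, 100.0, 1.0)  # rho = 1: uniform
lam_cluster = eigenvalue_distribution(24, 0.1, 100.0, 0.6)  # rho < 1: cluster at lambda_1
```

For ρ = 1 the eigenvalues are equally spaced between λ_1 and λ_N; for ρ < 1 the factor ρ^{N−i} pushes the small-index eigenvalues toward λ_1, producing the cluster point.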
For a given spectrum {λ_1, λ_2, ..., λ_N} determined by N, λ_1, κ, and ρ we
carried out computations using the following five algorithms:
A = GDG^T,   (2.2)

where r̃ and x̃ are related to r and x by
(1) There exists a critical value of the parameter ρ for which the number
of CG iterations required for convergence greatly exceeds N even in the case
of a small condition number (e.g. κ = 100) and small N (e.g. N = 12, 24). In
its neighborhood a small change in ρ (or a small perturbation of the
eigenvalue distribution) may cause a large change in the CG convergence
rate.
(2) The critical value of the parameter ρ decreases with increasing
precision level. For ρ ≥ 0.9, the CG processes are practically ordered in the
sense of Theorem 3.1.
(3) The observed results practically do not depend on the variant of the
CG algorithm used (that is why the variants are not distinguished in the
figures).
Point (1)
We have observed that the real CG convergence rate strongly depends on
the distribution of eigenvalues of the system matrix. For some eigenvalue
[Figures 1 and 2: iteration count k as a function of the parameter ρ (horizontal axis from 0.2 to 1.0).]
Point (2)
For a fixed λ_1, N, and κ, the critical value ρ* decreases with increasing
precision level L. This is illustrated in Figure 3, which shows k = k(ρ, L)
computed in DP arithmetic for λ_1 = 0.1, N = 24, κ = 10^3, and precision
levels L = 0.5, 1, 4, 7, and 10. We see a shift of ρ*(L) from ≈ 0.9 for L = 0.5
to ≈ 0.68 for L = 10. Even more surprising is the fact that k(ρ*(1), 1) ≫
k(ρ*(10), 1). To reach an accuracy of 10^{-1} for ρ = ρ*(1) ≈ 0.9, the CG
process needs many more iterations than for ρ = ρ*(10) ≈ 0.68. CG rapidly
accelerates its rate of convergence in the first case, while in the second it
converges slowly with a roughly constant convergence rate. We have not
found an explanation.
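Experiments of this kind are easy to reproduce in double precision with diagonal test matrices D = diag(λ_1, ..., λ_N) built from (2.1). The sketch below is our own (all-ones right-hand side, fixed residual tolerance, parameter values chosen for illustration); the iteration counts it produces depend on the machine and the tolerance:

```python
# Count CG iterations on D = diag(lambda_1, ..., lambda_N) with spectrum (2.1).
# For a diagonal matrix the recurrence is unchanged; matrix products are cheap.
import numpy as np

def spectrum(n, lam1, kappa, rho):
    lam = np.empty(n)
    lam[0], lam[-1] = lam1, kappa * lam1
    for i in range(2, n):
        lam[i - 1] = lam1 + ((i - 1) / (n - 1)) * (lam[-1] - lam1) * rho ** (n - i)
    return lam

def cg_iterations(lam, tol=1e-10, max_iter=5000):
    n = len(lam)
    b = np.ones(n)
    x = np.zeros(n)
    r = b.copy()
    d = r.copy()
    rr = r @ r
    for k in range(max_iter):
        if np.sqrt(rr) < tol * np.linalg.norm(b):
            return k                      # converged after k iterations
        Ad = lam * d                      # D @ d for diagonal D
        tau = rr / (d @ Ad)
        x += tau * d
        r -= tau * Ad
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    return max_iter

counts = {rho: cg_iterations(spectrum(24, 0.1, 1000.0, rho)) for rho in (1.0, 0.8, 0.6)}
```

Varying ρ over a finer grid and plotting the counts reproduces the qualitative shape of k = k(ρ, L) discussed above.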
We see that for any value of L and for any ρ_1, ρ_2 close to 1, ρ_1 > ρ_2,

k(ρ_1, L) ≤ k(ρ_2, L),
r^0 = ∑_{j=1}^m η_j z^j,   η_j ≠ 0,   j = 1, 2, ..., m,   (3.1)
(3.2)
(A) We have

λ_i/λ_j ≤ μ_i/μ_j   ∀ i > j,   i, j ∈ {1, 2, ..., m}.   (3.4)
Proof. (A) ⇒ (B): Using the CG minimizing property (e.g. [20, 21, 25])
and the expansions (3.1) and (3.3), we can write

‖x − x^i‖_A^2 = ∑_{k=1}^m [p_i(λ_k)]^2 η_k^2/λ_k,   ‖y − y^i‖_B^2 = ∑_{k=1}^m [q_i(μ_k)]^2 ξ_k^2/μ_k,

where p_i, q_i are the minimizing CG polynomials with respect to x^i, y^i, and
where δ_k^2 = η_k^2/λ_k = ξ_k^2/μ_k. Greenbaum showed [6, Lemma 1] that there
exists an ith degree polynomial s_i, s_i(0) = 1, such that [s_i(λ_k)]^2 ≤ [q_i(μ_k)]^2
∀ k ∈ {1, 2, ..., m}. Applying this, we must have

‖x − x^i‖_A^2 ≤ ∑_{k=1}^m [s_i(λ_k)]^2 δ_k^2 ≤ ∑_{k=1}^m [q_i(μ_k)]^2 δ_k^2 = ‖y − y^i‖_B^2   ∀ i ∈ {1, 2, ..., m}.
Suppose, on the contrary, that there exist indices i, j with

μ_i/λ_i < μ_j/λ_j.

The minimizing polynomials of the first degree are

p_1(λ) = 1 − ((λ_i + λ_j)/(λ_i^2 + λ_j^2)) λ,   q_1(μ) = 1 − ((μ_i + μ_j)/(μ_i^2 + μ_j^2)) μ.
This contradicts the assumption, and the proof is finished. The ordering
according to the first iteration implies the same ordering for subsequent
iterations. ∎
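As a quick numerical sanity check (our own, not part of the proof), one can verify that the slope (λ_i + λ_j)/(λ_i^2 + λ_j^2) appearing in p_1 indeed minimizes the two-point error sum over all first-degree polynomials with value 1 at zero:

```python
# Check that c* = (l_i + l_j)/(l_i**2 + l_j**2) minimizes
# f(c) = (1 - c*l_i)**2 + (1 - c*l_j)**2; sample eigenvalues are arbitrary.
import numpy as np

li, lj = 2.0, 5.0                          # illustrative eigenvalue pair
c_star = (li + lj) / (li**2 + lj**2)

def f(c):
    return (1.0 - c * li) ** 2 + (1.0 - c * lj) ** 2

cs = np.linspace(c_star - 1.0, c_star + 1.0, 2001)  # slopes around c*
assert f(c_star) <= f(cs).min() + 1e-12              # no sampled slope does better
```

The same check follows analytically: f'(c) = 2c(λ_i^2 + λ_j^2) − 2(λ_i + λ_j) vanishes exactly at c*.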
Though for p > 0.9 the CG processes shown in Figure 3 are practically
ordered in the sense of Theorem 3.1, this fact can hardly be explained using
this theorem. Theorem 3.1 assumes exact arithmetic, while the experimental
results in Figure 3 are strongly influenced by rounding errors. Theorem 3.1
can be successfully applied to explain experiments, e.g. in the case of
CG with full reorthogonalization.
Point (3)
The observed results do not appreciably depend on the variant of the CG
algorithm used. Comparable results of Algorithms 2A-2D differ in at most
one or two iterations. Jacobi acceleration of CG (Algorithm 2E) does not
improve the results shown, e.g., in Figure 3. On the contrary, for ρ ∈ (0.5,
0.6) it converges more slowly than ordinary CG. The agreement of real
convergence rates obtained using Algorithms 2A, 2C, and 2D shows that the
effect of roundoff on the CG process does not depend noticeably on the
density of the system matrix (on the accumulation of elemental roundoff).
The agreement of the Algorithm 2A and 2B results shows definitely that in
our experiments the CG process suffers from rounding errors not due to
unstable solution of the implicitly (in Algorithm 2A) or explicitly (in Algo-
rithm 2B) generated tridiagonal system, but due to loss of orthogonality
among the residual or normalized residual vectors. This corresponds to
Simon’s results [17].
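The loss of orthogonality among the normalized residuals r^i/‖r^i‖ is easy to observe directly. The sketch below is our own construction (an arbitrary SPD test matrix with a fixed random seed): it runs the CG recurrence and measures ‖V^T V − I‖, which would be zero in exact arithmetic:

```python
# Measure loss of orthogonality among normalized CG residuals.
# Test matrix, spectrum, and right-hand side are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 24
lam = np.linspace(0.1, 100.0, n)            # sample spectrum; any SPD A works
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T                  # SPD matrix with eigenvalues lam
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
d = r.copy()
V = [r / np.linalg.norm(r)]                 # collect normalized residuals
for _ in range(n - 1):
    Ad = A @ d
    tau = (r @ r) / (d @ Ad)
    x = x + tau * d
    r_new = r - tau * Ad
    phi = (r_new @ r_new) / (r @ r)
    d = r_new + phi * d
    r = r_new
    V.append(r / np.linalg.norm(r))

V = np.array(V).T                           # columns: r^i / ||r^i||
loss = np.linalg.norm(V.T @ V - np.eye(n))  # 0 in exact arithmetic
```

Tracking this quantity step by step, rather than only at the end, shows when orthogonality begins to degrade for a given spectrum.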
4. CONCLUDING REMARKS
where the orthonormal matrices U and Y_k have the columns u^j and y_j^{(k)}.
Then from (4.1)
If we denote
where 2, = VkYk is the matrix of computed Ritz vectors, then (4.4) can be
written as
(4.6)
or
(4.7)
< 11x11
lld+lll < (1+ 4)“‘ll~ll>
(4.8)
Here ε_0, ε_1 are small multiples of the relative machine precision ε defined in
[12, (2.16)].
We are interested in the value of
|(u^t, v^i)| → 0,   (4.9)
(4.11)
(4.12)
∑_{t=1} φ_t y_t^{(k)} = e_1,   (4.13)
As a consequence we obtain
where m and ξ are defined in [10, pp. 22, 36, 46, 50].
Greenbaum warns that Theorem 4.2 does not imply that the finite
precision Lanczos process will eventually find all eigenvalues with
nonnegligible components in the initial residual. Only if it could be shown
that, for some J, the exact Lanczos process applied to Â finds some
eigenvalue by step J, would it then follow that the finite precision Lanczos
process finds this eigenvalue (A. Greenbaum, personal communication).
REFERENCES
20 A. van der Sluis and H. A. van der Vorst, The rate of convergence of conjugate
gradients, Preprint 354, Univ. Utrecht, 1984, 29 pp.
21 J. Stoer and R. Freund, On the solution of large indefinite systems of linear
equations by conjugate gradient algorithms, Preprint 85, Univ. Würzburg, Inst.
für Angewandte Mathematik und Statistik, 1981.
22 Z. Strakoš, A remark on the rate of convergence of the conjugate gradient
method for the solution of linear systems (in Czech), in Proceedings of the
Conference Algorithms '87, Slovak Union of Math., Štrbské Pleso, 1987, pp.
287-290.
23 Z. Strakoš, On the rate of convergence of the conjugate gradient method for the
solution of linear systems, in Proceedings of the European Congress of Simulation,
Prague, 1987, pp. 184-189.
24 Z. Strakoš, The Conjugate Gradient Method for the Solution of Linear Systems
(in Czech), Res. Rep. V-279, General Computing Center of the Czechoslovak
Academy of Sciences, Prague, 1987, 80 pp.
25 H. A. van der Vorst, Preconditioning by Incomplete Decompositions, Ph.D.
Thesis, Univ. Utrecht, 1982, 97 pp.
26 H. Wozniakowski, Roundoff-error analysis of a new class of conjugate-gradient
algorithms, Linear Algebra Appl. 29:507-529 (1980).
27 J. H. Wilkinson and C. Reinsch, Handbook for Automatic Computation: Linear
Algebra, Springer, New York, 1971.