3.6 Iterative Methods For Solving Linear Systems



cannot be) fulfilled, only the Gauss-Seidel iterative method may be used after applying Theorem 1.27; that is, converting the given system into the form AᵀAx = Aᵀb, we get

45x1 + 38x2 + 38x3 = 350
38x1 + 45x2 + 38x3 = 340
38x1 + 38x2 + 45x3 = 300

Now using the initial solution x^(0) = [0, 0, 0]ᵀ and the following Gauss-Seidel iterative formula

x1^(k+1) = 0.022[350 − 38x2^(k) − 38x3^(k)]
x2^(k+1) = 0.022[340 − 38x1^(k+1) − 38x3^(k)]
x3^(k+1) = 0.022[300 − 38x1^(k+1) − 38x2^(k+1)]

we get x^(4) = [5.58, 4.18, −1.58]ᵀ, which is close to the exact solution of the given system. •
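For readers who want to experiment, the following is a minimal MATLAB sketch of this Gauss-Seidel sweep (the loop bound of 25 sweeps is an arbitrary choice, and the exact factor 1/45 is used instead of the rounded value 0.022):

x = [0; 0; 0];                            % initial solution x^(0)
for k = 1:25                              % fixed number of sweeps (arbitrary)
    x(1) = (350 - 38*x(2) - 38*x(3))/45;  % uses old x2, x3
    x(2) = (340 - 38*x(1) - 38*x(3))/45;  % uses the updated x1
    x(3) = (300 - 38*x(1) - 38*x(2))/45;  % uses the updated x1, x2
end
x                                         % settles near the solution quoted above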

1.6.4.1 Eigenvalues and Eigenvectors


Here we briefly discuss the eigenvalues and eigenvectors of an n × n matrix, and show how they can be used to describe the solutions of linear systems.

Definition 1.28 A scalar λ is said to be an eigenvalue of an n × n matrix A if there exists a nonzero vector x, called an eigenvector, such that

Ax = λx. (1.58)

The relation (1.58) represents the eigenvalue problem and we will refer to (λ, x) as an eigenpair. •

The equivalent form of (1.58) is


(A − λI)x = 0, (1.59)

where I is the n × n identity matrix. The system (1.59) has a non-trivial solution x if and only if A − λI is singular or, equivalently,

det(A − λI) = |A − λI| = 0. (1.60)

The relation (1.60) is a polynomial equation in λ of degree n which, in principle, can be used to obtain the eigenvalues of the matrix A. This equation is called the characteristic equation of A. There are n roots of (1.60), which we denote by λ1, λ2, . . . , λn. For a given eigenvalue λi, the corresponding eigenvector xi is not uniquely determined: if x is an eigenvector, then so is αx for any nonzero scalar α.

Example 1.65 Find the eigenvalues and eigenvectors of the matrix

A = [ −6 0 0; 11 −3 0; −3 6 7 ].

Solution. To find the eigenvalues of the given matrix A by using (1.60), we have

| −6−λ 0 0; 11 −3−λ 0; −3 6 7−λ | = 0,

which gives a characteristic equation of the form

(−6 − λ)(−3 − λ)(7 − λ) = 0, that is, λ³ + 2λ² − 45λ − 126 = 0,

and hence the eigenvalues of the given matrix A are λ = −6, λ = −3 and λ = 7. After finding the eigenvalues of the matrix we turn to the problem of finding eigenvectors. The eigenvectors of A corresponding to an eigenvalue λ are the nonzero vectors x that satisfy (1.59); equivalently, they are the nonzero vectors in the solution space of (1.59). We call this solution space the eigenspace of A corresponding to λ.
To find the eigenvectors of the given matrix A corresponding to each of these eigenvalues, we
substitute each of these three eigenvalues in (1.59). When λ = −6, we have

[ 0 0 0; 11 3 0; −3 6 13 ][ x1; x2; x3 ] = [ 0; 0; 0 ],
which implies that
0x1 + 0x2 + 0x3 = 0
11x1 + 3x2 + 0x3 = 0
−3x1 + 6x2 + 13x3 = 0
Solving this system, we get x1 = 3, x2 = −11 and x3 = 75/13. Hence the eigenvector x^(1) corresponding to the first eigenvalue, λ1 = −6, is
x^(1) = α[3, −11, 75/13]ᵀ, where α ∈ R, α ≠ 0.
When λ = −3, we have

[ −3 0 0; 11 0 0; −3 6 10 ][ x1; x2; x3 ] = [ 0; 0; 0 ],
which implies that
−3x1 + 0x2 + 0x3 = 0
11x1 + 0x2 + 0x3 = 0
−3x1 + 6x2 + 10x3 = 0
which gives the solution x1 = 0, x2 = 5 and x3 = −3. Hence the eigenvector x^(2) corresponding to the second eigenvalue, λ2 = −3, is
x^(2) = α[0, 5, −3]ᵀ, where α ∈ R, α ≠ 0.
Finally, when λ = 7, we have

[ −13 0 0; 11 −10 0; −3 6 0 ][ x1; x2; x3 ] = [ 0; 0; 0 ],

which implies that
−13x1 + 0x2 + 0x3 = 0
11x1 − 10x2 + 0x3 = 0
−3x1 + 6x2 + 0x3 = 0
which gives x1 = x2 = 0 and x3 = 1. The eigenvector x^(3) corresponding to λ3 = 7 is

x^(3) = α[0, 0, 1]ᵀ, where α ∈ R, α ≠ 0.

The MATLAB command eig is the basic eigenvalue and eigenvector routine. The command

>> D = eig(A);

returns a vector containing all the eigenvalues of the matrix A. If the eigenvectors are also wanted, then

>> [X, D] = eig(A);

will return a matrix X whose columns are eigenvectors of A corresponding to the eigenvalues in the diagonal matrix D. To get the results of Example 1.65, we use the MATLAB command window as:

>> A = [-6 0 0; 11 -3 0; -3 6 7]; P = poly(A); PP = poly2sym(P);
>> [X, D] = eig(A); eigenvalues = diag(D);

Definition 1.29 (Spectral Radius of a Matrix)

Let A be an n × n matrix. Then the spectral radius ρ(A) of the matrix A is defined as

ρ(A) = max_{1≤i≤n} |λi|,

where λi are the eigenvalues of the matrix A. •

For example, the matrix

A = [ 4 1 −3; 0 0 2; 0 0 −3 ]

has the characteristic equation

det(A − λI) = −λ³ + λ² + 12λ = 0,

which gives the eigenvalues λ = 4, 0, −3 of A. Hence the spectral radius of A is

ρ(A) = max{|4|, |0|, |−3|} = 4.

The spectral radius of a matrix A may be found with the MATLAB commands:

>> A = [4 1 -3; 0 0 2; 0 0 -3]; B = max(abs(eig(A)))



Example 1.66 For the matrix

A = [ a b; c d ],

if the eigenvalues of the Jacobi iteration matrix and the Gauss-Seidel iteration matrix are λi and µi, respectively, then show that µmax = (λmax)².

Solution. Decompose the given matrix into the form

A = L + D + U = [ 0 0; c 0 ] + [ a 0; 0 d ] + [ 0 b; 0 0 ].

First we define the Jacobi iteration matrix as

TJ = −D⁻¹(L + U),

and computing the right-hand side, we have

TJ = −[ 1/a 0; 0 1/d ][ 0 b; c 0 ] = [ 0 −b/a; −c/d 0 ].
To find the eigenvalues of the matrix TJ, we solve

det(TJ − λI) = | −λ −b/a; −c/d −λ | = 0,

which gives

λ1 = −√(cb/ad), λ2 = √(cb/ad), and λmax = √(cb/ad).
Similarly, we can find the Gauss-Seidel iteration matrix as

TG = −(L + D)⁻¹U,

and computing the right-hand side, we get

TG = [ 0 −b/a; 0 cb/ad ].
To find the eigenvalues of the matrix TG, we solve

det(TG − µI) = | −µ −b/a; 0 cb/ad − µ | = 0,

which gives

µ1 = 0, µ2 = cb/ad, and µmax = cb/ad.

Thus

µmax = cb/ad = (√(cb/ad))² = (λmax)²,
which is the required result. •
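A quick numerical check of this identity in MATLAB, for an arbitrary choice of test entries (a = 4, b = 1, c = 2, d = 5 here is only an illustration), might look like:

a = 4; b = 1; c = 2; d = 5;              % arbitrary test entries
TJ = [0 -b/a; -c/d 0];                   % Jacobi iteration matrix
TG = [0 -b/a; 0 c*b/(a*d)];              % Gauss-Seidel iteration matrix
lam = max(abs(eig(TJ)))                  % sqrt(cb/ad) = sqrt(0.1)
mu  = max(abs(eig(TG)))                  % cb/ad = 0.1 = lam^2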

A necessary and sufficient condition for the convergence of the Jacobi and Gauss-Seidel iterative methods is given in the following theorem.

Theorem 1.28 (Necessary and Sufficient Condition for Convergence)

For any initial approximation x^(0) ∈ Rⁿ, the sequence {x^(k)}, k = 0, 1, 2, . . ., of approximations defined by

x^(k+1) = Tx^(k) + c, for each k ≥ 0, and c ≠ 0, (1.61)

converges to the unique solution of x = Tx + c if and only if ρ(T) < 1.


Note that the condition ρ(T ) < 1 is satisfied when kT k < 1 because ρ(T ) ≤ kT k for any natural
norm. •

No general results exist to tell us which of the Jacobi or Gauss-Seidel methods should be chosen for an arbitrary linear system. However, the following theorem covers a useful special case.

Theorem 1.29 If aij ≤ 0 for each i ≠ j and aii > 0 for each i = 1, 2, . . . , n, then one and only
one of the following statements holds:

1. 0 ≤ ρ(TG ) < ρ(TJ ) < 1.

2. 1 < ρ(TJ ) < ρ(TG ).

3. ρ(TJ ) = ρ(TG ) = 0.

4. ρ(TJ ) = ρ(TG ) = 1. •

Example 1.67 Find the spectral radius of the Jacobi and Gauss-Seidel iteration matrices for each of the following matrices.

(a) A = [ 2 0 −1; −1 3 0; 0 −1 4 ],   (b) A = [ 1 −1 1; −2 2 −1; 0 1 5 ],

(c) A = [ 1 0 0; −1 2 0; 0 −1 3 ],    (d) A = [ 1 0 −1; 1 1 0; 0 1 1 ].
Solution. (a) The Jacobi iteration matrix TJ for the given matrix A can be obtained as

TJ = [ 0 0 1/2; 1/3 0 0; 0 1/4 0 ],

and the characteristic equation of the matrix TJ is

det(TJ − λI) = −λ³ + 1/24 = 0.

Solving this cubic equation, the maximum eigenvalue of TJ in absolute value is (1/24)^(1/3) ≈ 0.3467, that is,

ρ(TJ) = 0.3467.
Also, the Gauss-Seidel iteration matrix TG for the given matrix A is

TG = [ 0 0 1/2; 0 0 1/6; 0 0 1/24 ],

and it has the characteristic equation

det(TG − λI) = −λ³ + (1/24)λ² = 0.

Solving this cubic equation, the maximum eigenvalue of TG in absolute value is 1/24, that is,

ρ(TG) = 1/24 ≈ 0.0417.
(b) The Jacobi iteration matrix TJ for the given matrix A is

TJ = [ 0 1 −1; 1 0 1/2; 0 −1/5 0 ],

with the characteristic equation

det(TJ − λI) = −λ³ + (9/10)λ + 1/5 = 0, which gives ρ(TJ) ≈ 1.0447.
The Gauss-Seidel iteration matrix is

TG = [ 0 1 −1; 0 1 −1/2; 0 −1/5 1/10 ],

with the characteristic equation

det(TG − λI) = −λ³ + (11/10)λ² = 0, which gives ρ(TG) = 11/10 = 1.1000.
Similarly, for the matrices in (c) and (d) we have, respectively,

TJ = [ 0 0 0; 1/2 0 0; 0 1/3 0 ],  TG = [ 0 0 0; 0 0 0; 0 0 0 ],

with

ρ(TJ) = 0.0000 and ρ(TG) = 0.0000,

and

TJ = [ 0 0 1; −1 0 0; 0 −1 0 ],  TG = [ 0 0 1; 0 0 −1; 0 0 1 ],

with ρ(TJ) = 1.0000 and ρ(TG) = 1.0000. •
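These computations can be checked in MATLAB. The following sketch (assuming A has nonzero diagonal entries) builds TJ and TG for the matrix of part (a) and evaluates their spectral radii:

A  = [2 0 -1; -1 3 0; 0 -1 4];     % matrix of part (a)
D  = diag(diag(A));                % diagonal part of A
L  = tril(A, -1);                  % strictly lower triangular part
U  = triu(A,  1);                  % strictly upper triangular part
TJ = -D \ (L + U);                 % Jacobi iteration matrix -D^(-1)(L+U)
TG = -(D + L) \ U;                 % Gauss-Seidel iteration matrix -(D+L)^(-1)U
rhoJ = max(abs(eig(TJ)))           % about 0.3467
rhoG = max(abs(eig(TG)))           % about 0.0417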

Note that for some linear systems only one of the two methods, Jacobi or Gauss-Seidel, will converge.
Example 1.68 Show that for the following linear system the Jacobi method converges but the Gauss-Seidel method diverges:

A = [ 1 2 −2; 1 1 1; 2 2 1 ],  b = [ 3; 0; 1 ].
Solution. The Jacobi iteration matrix TJ for the given matrix A can be obtained as

TJ = −D⁻¹(L + U) = [ 0 −2 2; −1 0 −1; −2 −2 0 ],

and the characteristic equation of the matrix TJ is

det(TJ − λI) = −λ³ = 0,

which gives λ = 0, 0, 0, and so ρ(TJ) = 0. Thus the Jacobi method converges. The Gauss-Seidel iteration matrix TG for the given matrix A is

TG = −(D + L)⁻¹U = [ 0 −2 2; 0 2 −3; 0 0 2 ],

and it has the characteristic equation

det(TG − λI) = −λ³ + 4λ² − 4λ = 0.

Solving this cubic equation, we obtain λ = 0, 2, 2 and ρ(TG) = 2, which means the Gauss-Seidel method diverges. •

Example 1.69 Show that for the following linear system the Jacobi method diverges but the Gauss-Seidel method converges:

A = [ 2 1 1; 1 2 1; 1 1 1 ],  b = [ 4; 4; 4 ].
Solution. The Jacobi iteration matrix TJ for the given matrix A can be obtained as

TJ = −D⁻¹(L + U) = [ 0 −1/2 −1/2; −1/2 0 −1/2; −1 −1 0 ],

and the characteristic equation of the matrix TJ is

det(TJ − λI) = −λ³ + (5/4)λ − 1/2 = 0, or equivalently, 4λ³ − 5λ + 2 = 0,

which gives λ = −1.2808, 0.5000, 0.7808, and so ρ(TJ) = 1.2808. Thus the Jacobi method does not converge.
The Gauss-Seidel iteration matrix TG for the given matrix A is

TG = −(D + L)⁻¹U = [ 0 −1/2 −1/2; 0 1/4 −1/4; 0 1/4 3/4 ],

and it has the characteristic equation

det(TG − λI) = −λ³ + λ² − (1/4)λ = 0.

Solving this cubic equation, we obtain λ = 0, 0.5, 0.5 and ρ(TG) = 0.5, which means the Gauss-Seidel method converges. •

Definition 1.30 (Convergent Matrix)

An n × n matrix A is called a convergent matrix if

lim_{k→∞} (A^k)ij = 0, for each i, j = 1, 2, . . . , n.

Example 1.70 Show that the matrix A = [ 1/3 0; 1/9 1/3 ] is a convergent matrix.
Solution. Computing the powers of the given matrix, we obtain

A² = [ 1/9 0; 2/27 1/9 ],  A³ = [ 1/27 0; 3/81 1/27 ],  A⁴ = [ 1/81 0; 4/243 1/81 ].

In general, we have

A^k = [ (1/3)^k 0; k/3^(k+1) (1/3)^k ],

and it gives

lim_{k→∞} (1/3)^k = 0 and lim_{k→∞} k/3^(k+1) = 0.
Hence the given matrix A is convergent. •

Since the above matrix has the eigenvalue 1/3 of multiplicity two, its spectral radius is ρ(A) = 1/3. This illustrates the important relation between the spectral radius of a matrix and the convergence of a matrix.
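A short MATLAB check of this convergence (the power 20 is an arbitrary cutoff):

A = [1/3 0; 1/9 1/3];
norm(A^20, inf)       % already of order 1e-9, consistent with A^k -> 0
max(abs(eig(A)))      % spectral radius 1/3 < 1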

Theorem 1.30 The following statements are equivalent:

1. A is a convergent matrix.

2. lim_{n→∞} kAⁿk = 0, for all natural norms.

3. ρ(A) < 1.

4. lim_{n→∞} Aⁿx = 0, for every x. •
 
Example 1.71 Show that the matrix

A = [ 1 1 0 1; 1 1 1 0; 0 1 1 1; 1 0 1 1 ]

is not a convergent matrix.
Solution. First we find the eigenvalues of the given matrix A by computing its characteristic equation

det(A − λI) = λ⁴ − 4λ³ + 2λ² + 4λ − 3 = 0,

which factorizes to

(λ + 1)(λ − 3)(λ − 1)² = 0,

giving the eigenvalues 3, 1, 1, and −1 of the given matrix A. Hence the spectral radius of A is

ρ(A) = max{|3|, |1|, |1|, |−1|} = 3 > 1,

so by Theorem 1.30 the matrix is not convergent. •

We now state some very important results concerning eigenvalue problems. The proofs of these results are beyond the scope of this text and will be omitted; the results themselves, however, are easily understood and applied.

Theorem 1.31 If A is an n × n matrix, then

1. [ρ(AᵀA)]^(1/2) = kAk₂;

2. ρ(A) ≤ kAk, for any natural norm k·k.

Example 1.72 Consider the matrix

A = [ −2 1 2; 1 0 0; 0 1 0 ],

which gives a characteristic equation of the form

det(A − λI) = −λ³ − 2λ² + λ + 2 = 0.

Solving this cubic equation, the eigenvalues of A are −2, −1, and 1. Thus the spectral radius of A is

ρ(A) = max{|−2|, |−1|, |1|} = 2.

Also,

AᵀA = [ −2 1 0; 1 0 1; 2 0 0 ][ −2 1 2; 1 0 0; 0 1 0 ] = [ 5 −2 −4; −2 2 2; −4 2 4 ],

and the characteristic equation of AᵀA is

−λ³ + 11λ² − 14λ + 4 = 0,

which gives the eigenvalues 0.4174, 1, and 9.5826. Therefore, the spectral radius of AᵀA is 9.5826. Hence

kAk₂ = √ρ(AᵀA) = √9.5826 ≈ 3.0956.

From this we conclude that

ρ(A) = 2 < 3.0956 ≈ kAk₂.
One can also show that

ρ(A) = 2 < 5 = kAk∞ and ρ(A) = 2 < 3 = kAk1 ,

which satisfies the Theorem 1.31. •

The spectral norm of a matrix A may be found with the MATLAB commands:

>> A = [-2 1 2; 1 0 0; 0 1 0]; B = sqrt(max(eig(A' * A)))

Theorem 1.32 If A is a symmetric matrix, then

kAk₂ = √ρ(AᵀA) = √ρ(A²) = √([ρ(A)]²) = ρ(A).

Example 1.73 Consider the symmetric matrix

A = [ 3 0 1; 0 −3 0; 1 0 3 ],

which has a characteristic equation of the form

−λ³ + 4λ² + 9λ − 36 = 0.

Solving this cubic equation, we obtain the eigenvalues 4, −3, and 3 of the given matrix A. Therefore, the spectral radius of A is 4. Since A is symmetric,

AᵀA = A² = [ 10 0 6; 0 9 0; 6 0 10 ].

Since the eigenvalues of A² are the eigenvalues of A raised to the power 2, the eigenvalues of AᵀA are 16, 9, and 9, and its spectral radius is ρ(AᵀA) = ρ(A²) = [ρ(A)]² = 16. Hence

kAk₂ = √ρ(AᵀA) = √16 = 4 = ρ(A),

which satisfies Theorem 1.32. •

Theorem 1.33 If A is a nonsingular matrix, then for any eigenvalue λ of A,

1/kA⁻¹k₂ ≤ |λ| ≤ kAk₂.

Note that this result is also true for any natural norm. •
Example 1.74 Consider the matrix A = [ 2 1; 3 2 ] and its inverse A⁻¹ = [ 2 −1; −3 2 ]. First, we find the eigenvalues of the matrix AᵀA = [ 13 8; 8 5 ], which can be obtained by solving the characteristic equation

det(AᵀA − λI) = | 13−λ 8; 8 5−λ | = λ² − 18λ + 1 = 0,

which gives the eigenvalues 17.944 and 0.056. The spectral radius of AᵀA is therefore 17.944. Hence

kAk₂ = √ρ(AᵀA) = √17.944 ≈ 4.24.

Similarly, (A⁻¹)ᵀ(A⁻¹) = [ 13 −8; −8 5 ], whose characteristic equation is

det[(A⁻¹)ᵀ(A⁻¹) − λI] = | 13−λ −8; −8 5−λ | = λ² − 18λ + 1 = 0,

which gives the same eigenvalues 17.944 and 0.056 for (A⁻¹)ᵀ(A⁻¹), and therefore its spectral radius is 17.944. Hence

kA⁻¹k₂ = √ρ((A⁻¹)ᵀ(A⁻¹)) = √17.944 ≈ 4.24.

Note that the eigenvalues of A are 3.73 and 0.27, and therefore its spectral radius is 3.73. Hence

1/4.24 ≈ 0.24 ≤ |λ| ≤ 4.24

for both eigenvalues, which satisfies the above Theorem 1.33. •

1.6.5 Successive Over-Relaxation Method

We have seen that the Gauss-Seidel method uses updated information immediately and usually converges more quickly than the Jacobi method. For some large systems of equations, however, the Gauss-Seidel method converges at a very slow rate. Many techniques have been developed to improve the convergence of the Gauss-Seidel method. Perhaps the simplest and most widely used is successive over-relaxation (SOR). A useful modification of the Gauss-Seidel method is defined by the iterative scheme

xi^(k+1) = (1 − ω)xi^(k) + (ω/aii)[ bi − Σ_{j=1}^{i−1} aij xj^(k+1) − Σ_{j=i+1}^{n} aij xj^(k) ],   (1.62)
i = 1, 2, . . . , n,  k = 0, 1, 2, . . . ,

or, equivalently,

xi^(k+1) = xi^(k) + (ω/aii)[ bi − Σ_{j=1}^{i−1} aij xj^(k+1) − Σ_{j=i}^{n} aij xj^(k) ],   (1.63)
i = 1, 2, . . . , n,  k = 0, 1, 2, . . . .

The matrix form of the SOR method can be represented by

x^(k+1) = (D + ωL)⁻¹[(1 − ω)D − ωU]x^(k) + ω(D + ωL)⁻¹b,   (1.64)

which is equivalent to

x^(k+1) = Tω x^(k) + c,   (1.65)

where

Tω = (D + ωL)⁻¹[(1 − ω)D − ωU] and c = ω(D + ωL)⁻¹b   (1.66)

are called the SOR iteration matrix and vector, respectively.
The quantity ω is called the relaxation factor. It can be formally proved that convergence can be obtained only for values of ω in the range 0 < ω < 2. For ω = 1, the SOR method (1.62) is simply the Gauss-Seidel method. Methods involving (1.62) are called relaxation methods. For choices 0 < ω < 1, the procedures are called under-relaxation methods, and they can be used to obtain convergence for some systems that are not convergent by the Gauss-Seidel method. For choices 1 < ω < 2, the procedures are called over-relaxation methods, which can be used to accelerate the convergence for systems that are convergent by the Gauss-Seidel method. The SOR methods are particularly useful for solving the linear systems that occur in the numerical solution of certain partial differential equations.
Example 1.75 Find the l∞-norm of the SOR iteration matrix Tω when ω = 1.005, using the matrix

A = [ 5 −1; −1 10 ].
Solution. The SOR iteration matrix is

Tω = (D + ωL)⁻¹[(1 − ω)D − ωU],

where

L = [ 0 0; −1 0 ],  U = [ 0 −1; 0 0 ],  D = [ 5 0; 0 10 ].

With ω = 1.005, this equals

Tω = [ 5 0; −1.005 10 ]⁻¹ [ −0.025 1.005; 0 −0.05 ] = [ −0.005 0.201; −0.0005 0.0152 ].

Then kTωk∞ = max{0.206, 0.0157} = 0.206 is the l∞-norm of the matrix Tω. •


Table 1.7: Solution by the Gauss-Seidel Method

 k     x1          x2          x3          x4
 0     0.0000      0.0000      0.0000      0.0000
 1     5.0e-01     5.0e-01     3.0e+00     1.7e+00
 2    -1.5e+00    -6.5e+00     3.83e+00    2.94e+00
 3     2.65e+01    1.34e+02   -2.47e+01   -1.95e+01
 4    -5.37e+02   -2.71e+03    5.51e+02    4.34e+02

Table 1.8: Solution by the SOR Method

 k     x1          x2          x3          x4
 0     0.00000     0.00000     0.00000     0.00000
 1     0.16500    -0.38775     1.03560     0.74924
 2     0.66376     0.51715     1.63414     1.15201
 3    -0.26301    -0.20820     1.98531     1.44656
 ...   ...         ...         ...         ...
 21   -0.11932    -0.08436     2.51291     1.91401
 22   -0.11939    -0.08427     2.51284     1.91410

Example 1.76 Solve the following system of linear equations, taking the initial approximation x^(0) = [0, 0, 0, 0]ᵀ and ε = 10⁻⁴ in the l∞-norm:

2x1 + 8x2 = 1
5x1 − x2 + x3 = 2
−x1 + x2 + 4x3 + x4 = 12
x2 + x3 + 5x4 = 12

(a) using the Gauss-Seidel method, and (b) using the SOR method with ω = 0.33.

Solution. (a) The Gauss-Seidel method for the given system is

x1^(k+1) = 0.5[1 − 8x2^(k)]
x2^(k+1) = −[2 − 5x1^(k+1) − x3^(k)]
x3^(k+1) = 0.25[12 + x1^(k+1) − x2^(k+1) − x4^(k)]
x4^(k+1) = 0.2[12 − x2^(k+1) − x3^(k+1)]

Starting with the initial approximation x^(0) = [0, 0, 0, 0]ᵀ, for k = 0 we obtain

x1^(1) = 0.5[1 − 8x2^(0)] = 0.5
x2^(1) = −[2 − 5x1^(1) − x3^(0)] = 0.5
x3^(1) = 0.25[12 + x1^(1) − x2^(1) − x4^(0)] = 3.0
x4^(1) = 0.2[12 − x2^(1) − x3^(1)] = 1.7

The first and subsequent iterations are listed in Table 1.7.

(b) The SOR method with initial approximation x^(0) = [0, 0, 0, 0]ᵀ and ω = 0.33, for k = 0, gives

x1^(1) = (1 − 0.33)x1^(0) + 0.165[1 − 8x2^(0)] = 0.16500
x2^(1) = (1 − 0.33)x2^(0) − 0.33[2 − 5x1^(1) − x3^(0)] = −0.38775
x3^(1) = (1 − 0.33)x3^(0) + 0.0825[12 + x1^(1) − x2^(1) − x4^(0)] = 1.03560
x4^(1) = (1 − 0.33)x4^(0) + 0.066[12 − x2^(1) − x3^(1)] = 0.74924

The first and subsequent iterations are listed in Table 1.8.
Notice that the Gauss-Seidel method diverged for the given system, while the SOR method converged, although very slowly. •
Table 1.9: Solution by the SOR Method

 k     x1          x2          x3          x4
 0     0.000000    0.000000    0.000000    0.000000
 1     2.540000    3.467100    5.418392    3.544321
 2    -0.347410    0.923809    3.319772    3.919978
 3     2.047182    1.422556    3.331152    3.811324
 ...   ...         ...         ...         ...
 15    0.999999    2.000000    3.000000    4.000000
 16    1.000000    2.000000    3.000000    4.000000

Table 1.10: Solution by the Gauss-Seidel Method

 k     x1          x2          x3          x4
 0     0.000000    0.000000    0.000000    0.000000
 1     2.000000    3.000000    4.500000    3.250000
 2     0.500000    1.500000    3.625000    3.687500
 3     1.250000    1.562500    3.375000    3.812500
 ...   ...         ...         ...         ...
 35    1.000000    1.999999    3.000000    4.000000
 36    1.000000    2.000000    3.000000    4.000000

Example 1.77 Solve the following system of linear equations using the SOR method, with ε = 0.5 × 10⁻⁶ in the l∞-norm:

2x1 + x2 = 4
x1 + 2x2 + x3 = 8
x2 + 2x3 + x4 = 12
x3 + 2x4 = 11

Start with the initial approximation x^(0) = [0, 0, 0, 0]ᵀ, and take ω = 1.27.
Solution. For the given system, the SOR method with ω = 1.27, starting with the initial approximation x^(0) = [0, 0, 0, 0]ᵀ, for k = 0, gives

x1^(1) = (1 − 1.27)x1^(0) + 0.635[4 − x2^(0)] = 2.54
x2^(1) = (1 − 1.27)x2^(0) + 0.635[8 − x1^(1) − x3^(0)] = 3.4671
x3^(1) = (1 − 1.27)x3^(0) + 0.635[12 − x2^(1) − x4^(0)] = 5.418392
x4^(1) = (1 − 1.27)x4^(0) + 0.635[11 − x3^(1)] = 3.544321

The first and subsequent iterations are listed in Table 1.9.

We use the author-defined function SORM and the following MATLAB commands to get the above results:

>> Ab = [2 1 0 0 4; 1 2 1 0 8; 0 1 2 1 12; 0 0 1 2 11];
>> x = [0 0 0 0]; w = 1.27; acc = 0.5e-6;
>> SORM(Ab, x, w, acc);

We note that the SOR method converges and required 16 iterations to obtain what is obviously the correct solution for the given system. If we solve the same Example 1.77 using the Gauss-Seidel method, we find that this method also converges, but very slowly: it needed 36 iterations to obtain the correct solution, as shown in Table 1.10, which is 20 iterations more than required by the SOR method. Also, if we solve the same example using the Jacobi method, we find that it needs 73 iterations to get the correct solution. Comparing the SOR method with the Gauss-Seidel method, a large reduction in the number of iterations can be achieved, given an efficient choice of ω. In practice, ω should be chosen in the range 1 < ω < 2, but the precise choice of ω is a major problem. Finding the optimum value of ω depends on the particular problem (the size of the system of equations and the nature of the equations) and often requires careful experimentation. A detailed study of the optimization of ω can be found in Isaacson and Keller (1966). The following theorems can be used in certain situations for the convergence of the SOR method.

Theorem 1.34 If all the diagonal elements of a matrix A are nonzero, that is, aii ≠ 0 for each i = 1, 2, . . . , n, then

ρ(Tω) ≥ |ω − 1|.

This implies that the SOR method can converge only if 0 < ω < 2. •

Theorem 1.35 If A is a positive definite matrix and 0 < ω < 2, then the SOR method converges for any choice of the initial approximation vector x^(0) ∈ Rⁿ. •

Theorem 1.36 If A is a positive definite and tridiagonal matrix, then

ρ(TG) = [ρ(TJ)]² < 1,

and the optimal choice of the relaxation factor ω for the SOR method is

ω = 2 / (1 + √(1 − [ρ(TJ)]²)),   (1.67)

where TG and TJ are the Gauss-Seidel and Jacobi iteration matrices, respectively. With this choice of the relaxation factor ω, we have

ρ(Tω) = ω − 1,

the spectral radius of the SOR iteration matrix Tω. •

Example 1.78 Find the optimal choice for the relaxation factor ω using the matrix

A = [ 2 −1 0; −1 2 −1; 0 −1 2 ].

Solution. Since the given matrix A is positive definite and tridiagonal, we can use Theorem 1.36 to find the optimal choice of ω. Using the matrix A, we find the Jacobi iteration matrix

TJ = [ 0 1/2 0; 1/2 0 1/2; 0 1/2 0 ].

To find the spectral radius of the Jacobi iteration matrix TJ, we use the characteristic equation

det(TJ − λI) = −λ³ + λ/2 = 0,

which gives the eigenvalues λ = 0, ±1/√2 of the matrix TJ. Thus ρ(TJ) = 1/√2 ≈ 0.7071, and the optimal value of ω is

ω = 2 / (1 + √(1 − (0.7071)²)) = 1.1716.

Also, note that the Gauss-Seidel iteration matrix TG has the form

TG = [ 0 1/2 0; 0 1/4 1/2; 0 1/8 1/4 ],

and its characteristic equation is

det(TG − λI) = −λ³ + λ²/2 = 0.

Thus

ρ(TG) = 1/2 = 0.5000 = (ρ(TJ))²,
which agrees with Theorem 1.36. •
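The quantities in this example are easily reproduced in MATLAB; the following sketch computes ρ(TJ), the optimal ω from (1.67), and ρ(TG):

A    = [2 -1 0; -1 2 -1; 0 -1 2];
D    = diag(diag(A)); L = tril(A, -1); U = triu(A, 1);
rhoJ = max(abs(eig(-D \ (L + U))))     % 0.7071
w    = 2/(1 + sqrt(1 - rhoJ^2))        % 1.1716
rhoG = max(abs(eig(-(D + L) \ U)))     % 0.5000 = rhoJ^2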

Note that the optimal value of ω can also be found by using (1.67) whenever the eigenvalues of the Jacobi iteration matrix TJ are real and 0 < ρ(TJ) < 1. •
Example 1.79 Find the optimal choice for the relaxation factor ω using the matrix

A = [ 5 −1 −1 −1; −2 5 −1 0; −1 −1 5 −1; −1 −1 −1 5 ].
Solution. Using the given matrix A, we find the Jacobi iteration matrix

TJ = [ 0 1/5 1/5 1/5; 2/5 0 1/5 0; 1/5 1/5 0 1/5; 1/5 1/5 1/5 0 ].

To find the spectral radius of the Jacobi iteration matrix TJ, we use the equation det(TJ − λI) = 0 and get the polynomial equation

λ⁴ − (6/25)λ² − (8/125)λ − 3/625 = (1/625)(5λ − 3)(5λ + 1)³ = 0.
Solving this polynomial equation, we obtain

λ = 3/5, −1/5, −1/5, −1/5,

which are the eigenvalues of the matrix TJ. From this we get ρ(TJ) = 3/5 = 0.6, the spectral radius of the matrix TJ.
Since the value of ρ(TJ) is less than 1, we can use the formula (1.67) and get

ω = 2 / (1 + √(1 − (0.6)²)) = 1.1111,

the optimal value of ω. •

Since the rate of convergence of an iterative method depends on the spectral radius of the matrix associated with the method, one way to accelerate convergence is to choose the method whose associated iteration matrix T has the smallest spectral radius.

Example 1.80 Compare the convergence of the Jacobi, Gauss-Seidel, and SOR iterative methods for the system of linear equations Ax = b, where the coefficient matrix A is given as

A = [ 4 −1 0 0; −1 4 −1 0; 0 −1 4 −1; 0 0 −1 4 ].

Then use the fastest-converging method to find the second approximation x^(2) for the solution of the system Ax = [1, 2, 2, 3]ᵀ, using the initial approximation x^(0) = [0.5, 0.5, 0.5, 0.5]ᵀ.

Solution. First we compute the Jacobi iteration matrix using

TJ = −D⁻¹(L + U).

Since

D = [ 4 0 0 0; 0 4 0 0; 0 0 4 0; 0 0 0 4 ] and L + U = [ 0 −1 0 0; −1 0 −1 0; 0 −1 0 −1; 0 0 −1 0 ],

therefore,

TJ = [ 0 1/4 0 0; 1/4 0 1/4 0; 0 1/4 0 1/4; 0 0 1/4 0 ].

To find the eigenvalues of the matrix TJ, we evaluate the determinant

det(TJ − λI) = | −λ 1/4 0 0; 1/4 −λ 1/4 0; 0 1/4 −λ 1/4; 0 0 1/4 −λ | = 0,

which gives the characteristic equation λ⁴ − 0.1875λ² + 1/256 = 0. Solving this fourth-degree polynomial equation, we get the eigenvalues λ = −0.4045, −0.1545, 0.1545, 0.4045 of the matrix TJ. The spectral radius of the matrix TJ is ρ(TJ) = 0.4045 < 1, which shows that the Jacobi method will converge for the given linear system.
Since the given matrix is positive definite and tridiagonal, by Theorem 1.36 we can compute the spectral radius of the Gauss-Seidel iteration matrix from the spectral radius of the Jacobi iteration matrix, that is,

ρ(TG) = [ρ(TJ)]² = (0.4045)² = 0.1636 < 1,

which shows that the Gauss-Seidel method will also converge, faster than the Jacobi method. Also, from Theorem 1.36, we have

ρ(Tω) = ω − 1.
Now to find the spectral radius of the SOR iteration matrix Tω, we first calculate the optimal value of ω:

ω = 2 / (1 + √(1 − [ρ(TJ)]²)) = 2 / (1 + √(1 − (0.4045)²)) = 1.045.

Using this optimal value of ω, we compute the spectral radius of the SOR iteration matrix Tω as

ρ(Tω) = ω − 1 = 1.045 − 1 = 0.045 < 1,
which shows that the SOR method will also converge for the given system. Since

ρ(Tω) < ρ(TG) < ρ(TJ),

the SOR method will converge faster than the Jacobi and Gauss-Seidel methods.
For the given system, using the SOR method with ω = 1.045 and the initial approximation x^(0) = [0.5, 0.5, 0.5, 0.5]ᵀ gives

x^(1) = [0.3694, 0.7271, 0.8208, 0.9756]ᵀ,  x^(2) = [0.4346, 0.8177, 0.9541, 0.9891]ᵀ,

the required approximation of the exact solution x = [0.4641, 0.8565, 0.9617, 0.9904]ᵀ. •

Procedure 1.9 [SOR Method]

1. Find or take ω in the interval (0, 2) (for guaranteed convergence).

2. Initialize the first approximation x^(0) and the preassigned accuracy ε.

3. Compute the constant c = ω(D + ωL)⁻¹b.

4. Compute the SOR iteration matrix Tω = (D + ωL)⁻¹[(1 − ω)D − ωU].

5. Solve for the approximate solutions x^(k+1) = Tω x^(k) + c, k = 0, 1, . . . .

6. Repeat step 5 until kx^(k+1) − x^(k)k < ε.
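The author-defined function SORM used in Example 1.77 is not listed in this text; the following is a minimal sketch of what such a function might look like, implementing the component form (1.63) with a stopping test in the l∞-norm (the argument list and the iteration cap of 500 are assumptions):

function x = SORM(Ab, x, w, acc)
% Minimal SOR solver sketch: Ab = [A b] is the augmented matrix,
% x the initial guess, w the relaxation factor, acc the tolerance.
A = Ab(:, 1:end-1); b = Ab(:, end); x = x(:); n = length(b);
for k = 1:500                           % assumed maximum number of sweeps
    xold = x;
    for i = 1:n
        % A(i,:)*x mixes the already-updated x(1:i-1) with the old
        % x(i:n), exactly as in the scheme (1.63)
        x(i) = x(i) + w*(b(i) - A(i, :)*x)/A(i, i);
    end
    if norm(x - xold, inf) < acc, break, end
end
end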

1.7 Errors in Solving Linear Systems


Any computed solution of a linear system must, because of round-off and other errors, be considered
an approximate solution. Here we shall consider the most natural method for determining the
accuracy of a solution of the linear system. One obvious way of estimating the accuracy of the
computed solution x∗ is to compute Ax∗ and to see how close Ax∗ comes to b. Thus if x∗ is an
approximate solution of the given system Ax = b, we compute a vector

r = b − Ax∗ , (1.68)
which is called the residual vector and can be easily calculated. The quantity

krk/kbk = kb − Ax∗k/kbk

is called the relative residual.
The smallness of the residual then provides a measure of the goodness of the approximate solution
x∗ . If every component of vector r vanishes, then x∗ is the exact solution. If x∗ is a good
approximation then we would expect each component of r to be small, at least in a relative sense.
For example, the following linear system
x1 + 2x2 = 3
1.0001x1 + 2x2 = 3.0001

has the exact solution x = [1, 1]ᵀ. Consider the poor approximate solution x∗ = [3, 0]ᵀ. To see how good this solution is, we compute the residual r = [0, −0.0002]ᵀ, and so krk∞ = 0.0002. Although the norm of the residual vector is small, the approximate solution x∗ = [3, 0]ᵀ is obviously quite poor; in fact kx − x∗k∞ = 2.
We use the author-defined function RESID and the following MATLAB commands to get the above results:

>> A = [1 2; 1.0001 2]; b = [3 3.0001]; x0 = [3 0];
>> RESID(A, b, x0); x = [1 1]; Error = norm((x - x0), inf);

From the residual alone, one would conclude that the approximate solution is correct to about three decimal places, which is clearly misleading here.
Intuitively it would seem reasonable to assume that when krk is small for a given vector norm, then
the error kx − x∗ k would be small as well. In fact this is true for some systems. However, there are
systems of equations which do not satisfy this property. Such systems are said to be ill-conditioned.

1.7.1 Ill-Conditioned Linear Systems


In solving a linear system numerically we must consider problem conditioning, algorithm stability, and cost. We discussed above efficient elimination schemes for solving a linear system, and these schemes are stable when pivoting is employed. But there are some ill-conditioned systems which are tough to solve by any method.
A system of equations is considered to be well-conditioned if a small change in the coefficient matrix
or a small change in the right hand side results in a small change in the solution vector.
A system of equations is considered to be ill-conditioned if a small change in the coefficient matrix
or a small change in the right hand side results in a large change in the solution vector.
Before discussing ill-conditioned systems further, let us consider the following system of equations:

x1 + 3x2 = 4
(1/3)x1 + x2 = 1.33

Note that this system of equations has no solution. But if we take the approximate value 1/3 ≈ 0.3, then the above system becomes

x1 + 3x2 = 4
0.3x1 + x2 = 1.33
The solution of this system is x1 = 0.1, x2 = 1.3. If we take the approximate value 1/3 ≈ 0.333, then the system becomes

x1 + 3x2 = 4
0.333x1 + x2 = 1.33

whose solution is x1 = 10, x2 = −2. The system

x1 + 3x2 = 4
0.3333x1 + x2 = 1.33

gives the solution x1 = 100, x2 = −32.
So all these systems of equations "look" ill-conditioned (unstable or ill-posed systems), because a small change in the coefficient matrix resulted in a large change in the solution vector.
Consider another linear system
x1 + 2x2 = 4
2x1 + 3x2 = 7
The exact solution is easily verified to be x1 = 2, x2 = 1. Now making a small change in the right-hand side vector of the equations,

x1 + 2x2 = 4.001
2x1 + 3x2 = 7.001

gives the solution x1 = 1.999, x2 = 1.001. Making a small change in the coefficient matrix of the equations,

1.001x1 + 2.001x2 = 4
2.001x1 + 3.001x2 = 7

gives the solution x1 = 2.003, x2 = 0.997.
So this system of equations "looks" well-conditioned (a stable or well-posed system), because small changes in the coefficient matrix or the right-hand side resulted in small changes in the solution vector.
In practical problems we can expect the coefficients in the system to be subject to small errors, either because of round-off or because of physical measurement. If the system is ill-conditioned, the resulting solution may be grossly in error. Errors of this type, unlike those caused by round-off error accumulation, cannot be avoided by careful programming.
Note that for ill-conditioned systems the residual is not necessarily a good measure of the accuracy of a solution. How then can we tell when a system is ill-conditioned? In the following we discuss some possible indicators of an ill-conditioned system.

Definition 1.31 (Condition Number of a Matrix)

The number kAkkA−1 k is called the condition number of a nonsingular matrix A and is denoted by
K(A), that is
cond(A) = K(A) = kAkkA−1 k. (1.69)

Note that the condition number K(A) for A depends on the matrix norm used and can, for some
matrices, vary considerably as the matrix norm is changed. Since

1 = kIk = kAA−1 k ≤ kAkkA−1 k = K(A),



therefore, the condition number is always in the range 1 ≤ K(A) ≤ ∞ regardless of any natural
norm. The lower limit is attained for identity matrices and K(A) = ∞ if A is singular. So the
matrix A is well-behaved (well-conditioned) if K(A) is close to 1 and is increasingly ill-conditioned
when K(A) is significantly greater than 1, that is, K(A) → ∞. But, how large does K(A) have to
be before a system is regarded as ill-conditioned? There is no clear threshold. However, to assess
the effects of ill-conditioning, a rule of thumb can be used. For a system with condition number
K(A), expect a loss of roughly log10 K(A) decimal places in the accuracy of the solution. For the first system above, the matrix and its inverse are

A = [ 1 2; 2 3.999 ],  A⁻¹ = [ −3999 2000; 2000 −1000 ],

with K(A) = kAk∞ kA⁻¹k∞ = 5.999 × 5999 ≈ 35988, so we expect to lose about 4 decimal places (log10 35988 ≈ 4.556) in accuracy. On the other hand, for the second system the matrix and its inverse are

A = [ 1 2; 2 3 ],  A⁻¹ = [ −3 2; 2 −1 ],

with K(A) = kAk∞ kA⁻¹k∞ = 5 × 5 = 25, so we expect to lose about 1 decimal place (log10 25 ≈ 1.398) in accuracy.
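A small MATLAB illustration of this rule of thumb for the first matrix above:

A = [1 2; 2 3.999];
K = norm(A, inf) * norm(inv(A), inf)   % about 3.6e+04
digits_lost = log10(K)                 % about 4.6, so expect ~4 digits lost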

1.7.1.1 Scaling
Large condition numbers can also arise from equations that are in need of scaling. Consider the following coefficient matrix, which corresponds to one "regular" equation and one "large" equation:

A = [ 1 −1; 1000 1000 ],  A⁻¹ = [ 0.5 0.0005; −0.5 0.0005 ],  K(A) = kAk∞ kA⁻¹k∞ = 2000 × 0.5005 = 1001.

Scaling (called equilibration) can be used to reduce the condition number of a system that is poorly scaled. If each row of A is scaled by its largest element, then the new A and its inverse become

A = [ 1 −1; 1 1 ],  A⁻¹ = [ 0.5 0.5; −0.5 0.5 ],  K(A) = kAk∞ kA⁻¹k∞ = 2 × 1 = 2,

which is the condition number of the scaled system.


The condition number provides bounds for the sensitivity of the solution of a system of equations to changes in the coefficient matrix. Unfortunately, evaluating any of the condition numbers of a matrix A is not a trivial task, since it is necessary first to obtain its inverse.
A very large condition number is thus one indicator of an ill-conditioned system. Another indicator of ill-conditioning is when the pivots suffer a loss of one or more significant figures during the elimination process. Small changes in the right-hand side terms of the system leading to large changes in the solution give another indicator of an ill-conditioned system. Also, when the elements of the inverse of the coefficient matrix are large compared to the elements of the coefficient matrix, the system is likely ill-conditioned. •

1.7.1.2 Properties of the Condition Number of a Matrix


• For any matrix A, cond(A) ≥ 1.

• For the identity matrix, cond(I) = 1.

• For any matrix A and nonzero scalar α, cond(αA) = cond(A).

• For any diagonal matrix D = diag(di), cond(D) = (max |di|)/(min |di|).

• A large condition number of A means that A is nearly singular.

• When A and B are square matrices, the inequality cond(AB) ≤ cond(A)cond(B) holds for every matrix norm.

• If A⁻¹ = Aᵀ (that is, A is orthogonal), then cond(AᵀA) = 1, and conversely.

Example 1.81 Compute the condition number of the following matrix using the l∞-norm:

A = [ 2 −1 0; 2 −4 −1; −1 0 2 ].

Solution. The condition number of a matrix is defined as

K(A) = kAk∞ kA⁻¹k∞.

First we calculate the inverse of the given matrix, which is

A⁻¹ = [ 8/13 −2/13 −1/13; 3/13 −4/13 −2/13; 4/13 −1/13 6/13 ].

Then

kAk∞ = 7 and kA⁻¹k∞ = 11/13, so K(A) = kAk∞ kA⁻¹k∞ = 7(11/13) ≈ 5.9231.
Depending on the application, we might consider this number to be reasonably small and conclude
that the given matrix A is reasonably well-conditioned. •

To get the above results using MATLAB commands, we do the following:

>> A = [2 -1 0; 2 -4 -1; -1 0 2]; Ainv = inv(A)
>> K = norm(A, inf) * norm(Ainv, inf)

Example 1.82 If the condition number of the following matrix A is 8.8671, find the l∞-norm of its inverse matrix, that is, kA⁻¹k∞:

A = [ 10.2 2.4 4.5; −2.3 7.7 11.1; −5.5 −3.2 0.9 ].

Solution. Since the condition number of a matrix is defined as

K(A) = kAk∞ kA⁻¹k∞,

we first calculate the l∞-norm of the given matrix A, which is the maximum of the absolute row sums:

kAk∞ = max{17.1000, 21.1000, 9.6000} = 21.1000.

Since K(A) = 8.8671 is given, we have

8.8671 = (21.1000) kA⁻¹k∞.

Simplifying this, we get kA⁻¹k∞ = 0.4202. •

We might think that if the determinant of a matrix is close to zero, then the matrix is ill-conditioned. However, this is false. Consider the matrix

A = [ 10⁻⁷ 0; 0 10⁻⁷ ],

for which det A = 10⁻¹⁴ ≈ 0. One can easily find the condition number of this matrix:

K(A) = kAk∞ kA⁻¹k∞ = (10⁻⁷)(10⁷) = 1.

The matrix A is therefore perfectly conditioned. Thus a small determinant does not by itself indicate that a matrix is ill-conditioned.
The condition number of a matrix in the l₂-norm can be computed with the built-in MATLAB function cond as follows:

>> A = [1 -1 2; 3 1 -1; 2 0 1]; K = cond(A)

Theorem 1.37 (Error in Linear Systems)

Suppose that x∗ is an approximation to the solution x of the linear system Ax = b, where A is a nonsingular matrix, and let r be the residual vector for x∗. Then for any natural norm, the error is

kx − x∗k ≤ krk kA⁻¹k,   (1.70)

and the relative error is

kx − x∗k/kxk ≤ K(A) krk/kbk, provided that x ≠ 0, b ≠ 0.   (1.71)

Proof. Since r = b − Ax∗ and A is nonsingular, we have

Ax − Ax∗ = b − (b − r) = r, so A(x − x∗) = r, or x − x∗ = A⁻¹r.   (1.72)

Hence

kx − x∗k = kA⁻¹rk ≤ kA⁻¹k krk.

Moreover, since b = Ax, we have

kbk ≤ kAk kxk, or kxk ≥ kbk/kAk.

Hence

kx − x∗k/kxk ≤ kA⁻¹k krk / (kbk/kAk) = K(A) krk/kbk.
The inequalities (1.70) and (1.71) imply that the quantities kA⁻¹k and K(A) can be used to give an indication of the connection between the residual vector and the accuracy of the approximation. If K(A) ≈ 1, the relative error will be fairly close to the relative residual; but if K(A) ≫ 1, the relative error can be many times larger than the relative residual. •

Example 1.83 Find the condition number of the matrix (for n = 2, 3, . . .)

An = [ 1 1; 1 1 − 1/n ].

If n = 2 and x∗ = [−1.99, 2.99]ᵀ is an approximate solution of the linear system Ax = [1, −0.5]ᵀ, find the relative error.

Solution. We can easily find the inverse of the given matrix:

An⁻¹ = (1 / ((1 − 1/n) − 1)) [ 1 − 1/n −1; −1 1 ] = −n [ 1 − 1/n −1; −1 1 ] = [ 1−n n; n −n ].

Then the l∞-norms of the matrices An and An⁻¹ are

kAnk∞ = 2 and kAn⁻¹k∞ = 2n,

and so the condition number of the matrix can be computed as

K(An) = kAnk∞ kAn⁻¹k∞ = (2)(2n) = 4n, with lim_{n→∞} K(An) = ∞,

which shows that the matrix An is ill-conditioned for large n. Here we expect that the relative error in the calculated solution of a linear system of the form An x = b could be as much as 4n times the relative residual. The residual vector (taking n = 2) can be calculated as
r = b − A₂x∗ = [ 1; −0.5 ] − [ 1 1; 1 0.5 ][ −1.99; 2.99 ] = [ 0.000; −0.005 ],

and it gives krk∞ = 0.005. Now using (1.71), we obtain

kx − x∗k/kxk ≤ K(A) krk/kbk = (8)(0.005/1) = 0.0400,

which is the required relative error bound. •



Example 1.84 Consider the linear system

x1 + x2 − x3 = 1
x1 + 2x2 − 2x3 = 0
−2x1 + x2 + x3 = −1

(a) Discuss the ill-conditioning of the given linear system.
(b) If x∗ = [2.01, 1.01, 1.98]ᵀ is an approximate solution of the given system, find the residual vector r and its norm krk∞.
(c) Estimate the relative error using (1.71).

Solution. (a) The matrix A and its inverse are

A = [ 1 1 −1; 1 2 −2; −2 1 1 ],  A⁻¹ = [ 2 −1 0; 1.5 −0.5 0.5; 2.5 −1.5 0.5 ].

Then

kAk∞ = 5, kA⁻¹k∞ = 4.5, K(A) = kAk∞ kA⁻¹k∞ = (5)(4.5) = 22.5 ≫ 1,

which shows that the matrix, and thus the given system, is ill-conditioned.

>> A = [1 1 -1; 1 2 -2; -2 1 1]; K = norm(A, inf) * norm(inv(A), inf)

(b) The residual vector can be calculated as

r = b − Ax∗ = [ 1; 0; −1 ] − [ 1 1 −1; 1 2 −2; −2 1 1 ][ 2.01; 1.01; 1.98 ] = [ −0.04; −0.07; 0.03 ],

with krk∞ = 0.07.

>> A = [1 1 -1; 1 2 -2; -2 1 1]; b = [1 0 -1]';
>> x0 = [2.01 1.01 1.98]'; r = RES(A, b, x0); rnorm = norm(r, inf);

(c) From (1.71), we have

kx − x∗k/kxk ≤ K(A) krk/kbk.

Using parts (a) and (b) and the value kbk∞ = 1, we obtain

kx − x∗k/kxk ≤ (22.5)(0.07/1) = 1.575,

which is the required relative error bound. •

>> RelErr = (K * rnorm)/norm(b, inf);



Example 1.85 Consider the linear system Ax = b, where

A = [ 2 1 2; 1 4 0; 1 2 1 ] and b = [ 1; 1; 2 ].

(a) Discuss the conditioning of the given linear system.
(b) Suppose that b is changed to b∗ = [1, 1, 1.99]ᵀ. How large a relative change can this change produce in the solution to Ax = b?

Solution. (a) The matrix A and its inverse are

A = [ 2 1 2; 1 4 0; 1 2 1 ],  A⁻¹ = [ 4/3 1 −8/3; −1/3 0 2/3; −2/3 −1 7/3 ].

Then

kAk∞ = 5, kA⁻¹k∞ = 5, K(A) = kAk∞ kA⁻¹k∞ = (5)(5) = 25.

(b) The change from b to b∗ is an error δb = b∗ − b = [0, 0, −0.01]ᵀ = −r, and the l∞-norm of this vector is kδbk∞ = 0.01. From equation (1.71), we get

kδxk/kxk ≤ 25(0.01)/2 = 0.1250,

the possible relative change in the solution to the given linear system. •

1.7.2 Conditioning of a Linear System

Let us consider the conditioning of the linear system

Ax = b.   (1.73)

Case 1.1 Suppose that the right-hand side b is replaced by b + δb, where δb is an error in b. If x + δx is the solution corresponding to the right-hand side b + δb, then we have

A(x + δx) = b + δb,   (1.74)

which implies that

Ax + Aδx = b + δb, or Aδx = δb.

Multiplying by A⁻¹ and taking norms gives

kδxk = kA⁻¹δbk ≤ kA⁻¹k kδbk.   (1.75)

The change kδxk in the solution is bounded by kA⁻¹k times the change kδbk in the right-hand side. The conditioning of the linear system is connected with the ratio between the relative error kδxk/kxk and the relative change kδbk/kbk in the right-hand side, which gives

(kδxk/kxk) / (kδbk/kbk) = kA⁻¹δbk kbk / (kxk kδbk) = kAxk kA⁻¹δbk / (kxk kδbk) ≤ kAk kA⁻¹k,

which implies that

kδxk/kxk ≤ K(A) kδbk/kbk.   (1.76)

Thus the relative change in the solution is bounded by the condition number of the matrix times the relative change in the right-hand side. When the product on the right-hand side is small, the relative change in the solution is small.

Case 1.2 Suppose that the matrix A is replaced by A + δA, where δA is an error in A, while the right-hand side b is unchanged. If x + δx is the solution corresponding to the matrix A + δA, then we have

(A + δA)(x + δx) = b,   (1.77)

which implies that

Ax + Aδx + δA(x + δx) = b, or Aδx = −δA(x + δx).

Multiplying by A⁻¹ and taking norms gives

kδxk(1 − kA⁻¹k kδAk) ≤ kA⁻¹k kδAk kxk,

which can be written as

kδxk/kxk ≤ kA⁻¹k kδAk / (1 − kA⁻¹k kδAk) = K(A)(kδAk/kAk) / (1 − kA⁻¹k kδAk).   (1.78)

If the product kA⁻¹k kδAk is much smaller than 1, the denominator in (1.78) is near 1. Consequently, when kA⁻¹k kδAk is much smaller than 1, (1.78) implies that the relative change in the solution is bounded by roughly the condition number of the matrix times the relative change in the coefficient matrix.

Case 1.3 Suppose that the coefficient matrix A and the right-hand side b change together, and let x + δx be the solution corresponding to the coefficient matrix A + δA and the right-hand side b + δb. Then we have

(A + δA)(x + δx) = b + δb,   (1.79)

which implies that

Ax + Aδx + δA x + δA δx = b + δb, or (A + δA)δx = δb − δA x.

Multiplying by A⁻¹, we get

δx = (I + A⁻¹δA)⁻¹A⁻¹(δb − δA x).   (1.80)

If A is nonsingular and the error δA in A satisfies

kA⁻¹δAk ≤ kA⁻¹k kδAk < 1,   (1.81)

then it follows (see Fröberg 1969) that the matrix (I + A⁻¹δA) is nonsingular and

k(I + A⁻¹δA)⁻¹k ≤ 1/(1 − kA⁻¹δAk) ≤ 1/(1 − kA⁻¹k kδAk).   (1.82)

Taking the norm of (1.80) and using (1.82) gives

kδxk ≤ [ kA⁻¹k / (1 − kA⁻¹k kδAk) ] [ kδbk + kxk kδAk ],

so that

kδxk/kxk ≤ [ kA⁻¹k / (1 − kA⁻¹k kδAk) ] [ kδbk/kxk + kδAk ].   (1.83)

Since we know that

kxk ≥ kbk/kAk,   (1.84)

using (1.84) in (1.83) gives

kδxk/kxk ≤ [ K(A) / (1 − K(A) kδAk/kAk) ] ( kδAk/kAk + kδbk/kbk ).   (1.85)

The estimate (1.85) shows that small relative changes in A and b cause small relative changes in the solution x of the linear system (1.73) if the quantity

K(A) / (1 − K(A) kδAk/kAk)   (1.86)

is not too large.


Example 1.86 Consider the linear system Ax = b, where

A = [ 2 1 2; 1 4 0; 1 2 1 ] and b = [ 1; 1; 2 ].

(a) Discuss the conditioning of the given linear system.
(b) Suppose that b is changed to b1 = [1, 1, 1.99]ᵀ. How large a relative change can this change produce in the solution to Ax = b?
(c) Suppose that A is changed to A1 = [ 1.99 1 2; 1 4 0; 1 2 1 ]. How large a relative change can this change produce in the solution to Ax = b?
(d) Suppose that A is changed to A1 = [ 1.99 1 2; 1 4 0; 1 2 1 ] and b is changed to b1 = [1, 1, 1.99]ᵀ. How large a relative change can these changes produce in the solution to Ax = b?

Solution. (a) The matrix and its inverse are

A = [ 2 1 2; 1 4 0; 1 2 1 ],  A⁻¹ = [ 4/3 1 −8/3; −1/3 0 2/3; −2/3 −1 7/3 ].

Then the l∞-norms of both matrices are

kAk∞ = 5 and kA⁻¹k∞ = 5.

Using the values of both matrix norms, we find the condition number of A:

K(A) = kAk∞ kA⁻¹k∞ = 5 × 5 = 25.

(b) The change from b to b1 is an error δb = b1 − b = [0, 0, −0.01]ᵀ, with l∞-norm kδbk∞ = 0.01. From equation (1.76), we get

kδxk/kxk ≤ 25(0.01)/2 = 0.1250,

the possible relative change in the solution to the given linear system.
(c) The change from A to A1 is an error δA, that is, A1 = A + δA, where

δA = [ −0.01 0 0; 0 0 0; 0 0 0 ],

with l∞-norm kδAk∞ = 0.01. From equation (1.78), we get

kδxk/kxk ≤ 25(0.01/5)/(1 − 5(0.01)) = 0.0526,

the possible relative change in the solution to the given linear system.
(d) From parts (b) and (c), the changes in b and A are

δb = [ 0; 0; −0.01 ] and δA = [ −0.01 0 0; 0 0 0; 0 0 0 ],

so using equation (1.85), we get

kδxk/kxk ≤ [ 25 / (1 − 25(0.01/5)) ] (0.01/5 + 0.01/2) = 0.1842,

the possible relative change in the solution to the given linear system. •

1.7.3 Method to Solve Ill-Conditioned Systems

If the system is ill-conditioned, an approximate solution can be improved by an iterative technique, called the method of residual correction (or iterative refinement). The procedure of the method is described below.
Let x^(1) be an approximate solution to the system

Ax = b,   (1.87)

and let y be a correction to x^(1) so that the exact solution x satisfies

x = x^(1) + y.

Then substituting into (1.87), we find that y must satisfy

Ay = r = b − Ax^(1),   (1.88)

where r is the residual. The system (1.88) can now be solved to give the correction y to the approximation x^(1). The new approximation x^(2) = x^(1) + y will be closer to the solution than x^(1). If necessary, we compute the new residual r = b − Ax^(2) and solve (1.88) again to get a new correction. Normally two or three iterations are enough to obtain an accurate solution. This iterative method can be used to obtain an improved solution whenever an approximate solution has been obtained by any means.
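A minimal MATLAB sketch of this residual-correction loop, applied to the system of Example 1.87 below (in practice one would reuse the LU factors of A to solve Ay = r cheaply):

A = [1.01 0.99; 0.99 1.01]; b = [2; 2];
x = [1.01; 1.01];             % approximate solution x^(1)
for k = 1:3                   % two or three corrections usually suffice
    r = b - A*x;              % residual of the current approximation
    y = A \ r;                % correction from A*y = r
    x = x + y;                % improved approximation
end
x                             % converges to the exact solution [1; 1]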

Example 1.87 The linear system

1.01x1 + 0.99x2 = 2
0.99x1 + 1.01x2 = 2

has the exact solution x = [1, 1]ᵀ. The approximate solution due to the Gaussian elimination method is x^(1) = [1.01, 1.01]ᵀ, with residual r^(1) = [−0.02, −0.02]ᵀ. Then the solution of the system

Ay = r^(1),

using the simple Gaussian elimination method, is y^(1) = [−0.01, −0.01]ᵀ. So the new approximation is x^(2) = x^(1) + y^(1) = [1, 1]ᵀ, which equals the exact solution after just one iteration. •

1.8 Applications
In this section we discuss applications of linear systems. Here, we will solve or tackle a variety of
real-life problems from several areas of science.

1.8.1 Electric Networks and Traffic Flow


1.8.1.1 Electrical Network Analysis
Systems of linear equations are used to determine the currents through various branches of electrical
networks. The following two laws, which are based on experimental verification in the laboratory,
lead to the equations.

Theorem 1.38 (Kirchhoff's Laws)

1. Junctions: All the current flowing into a junction must flow out of it.
2. Paths: The sum of the IR terms (where I denotes current and R resistance) in any direction around a closed path is equal to the total voltage in the path in that direction. •

Figure 1.3: (a) Electrical circuit. (b) Traffic flow. (The circuit contains 8-volt and 16-volt batteries, two 2-ohm resistors, a 1-ohm resistor, and a 4-ohm resistor, with branch currents I1, I2, I3. The traffic network shows one-way streets, Duval, Monroe, Adams, Hogan, and Laura Streets, with branch flows x1, . . . , x7 measured in vehicles per hour (vph).)

Example 1.88 Consider the electric network of Figure 1.3(a). Let us determine the currents through each branch of this network.

Solution. The batteries are 8 volts and 16 volts. The resistances are one 1-ohm, one 4-ohm, and two 2-ohm resistors. The current entering each battery will be the same as that leaving it.
Let the currents in the various branches of the given circuit be I1, I2 and I3. Kirchhoff's laws refer to junctions and closed paths. There are two junctions in this circuit, namely the points B and D. There are three closed paths, namely ABDA, CBDC and ABCDA. We apply the laws to the junctions and paths.

Junctions

Junction B: I1 + I2 = I3
Junction D: I3 = I1 + I2

These two equations result in the single linear equation I1 + I2 − I3 = 0.

Paths

Path ABDA: 2I1 + 1I3 + 2I1 = 8
Path CBDC: 4I2 + 1I3 = 16

It is not necessary to look further at path ABCDA. We now have a system of three linear equations
in three unknowns, I1 , I2 and I3 . Path ABCDA in fact leads to an equation that is a combination
of the last two equations; there is no new information. The problem thus reduces to solving the
following system of three linear equations in three variables I1 , I2 , and I3 .

I1 + I2 − I3 = 0
4I1 + I3 = 8
4I2 + I3 = 16

Solve this system for I1, I2 and I3 using the Gauss elimination method:

[ 1 1 −1 | 0; 4 0 1 | 8; 0 4 1 | 16 ] ≡ [ 1 1 −1 | 0; 0 −4 5 | 8; 0 4 1 | 16 ] ≡ [ 1 1 −1 | 0; 0 −4 5 | 8; 0 0 6 | 24 ].

Now using backward substitution, we get the solution of the system: I1 = 1, I2 = 3, I3 = 4. The units are amps. The solution is unique, as is to be expected in this physical situation. •
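As a check, the same currents can be obtained directly in MATLAB:

>> A = [1 1 -1; 4 0 1; 0 4 1]; b = [0; 8; 16]; I = A\b   % gives [1; 3; 4]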

Traffic Flow

Network analysis, as we saw in the previous discussion, plays an important role in electrical engineering. In recent years, the concepts and tools of network analysis have been found to be useful in many other fields, such as information theory and the study of transportation systems. The following analysis of traffic flow through a road network during a peak period illustrates how systems of linear equations with many solutions can arise in practice.
Consider the typical road network of Figure 1.3(b). It represents an area of downtown Jacksonville, Florida. The streets are all one-way, with the arrows indicating the direction of traffic flow. The flow of traffic in and out of the network is measured in vehicles per hour (vph). The figures given here are based on midweek peak traffic hours, 7 A.M. to 9 A.M. and 4 P.M. to 6 P.M. An increase of 2 percent in the overall flow should be allowed for during Friday evening traffic flow. Let us construct a mathematical model that can be used to analyze this network. Let the traffic flows along the various branches be x1, . . . , x7, as shown in Figure 1.3(b).

Theorem 1.39 (Traffic Law)

All traffic entering a junction must leave that junction. •

This conservation of flow constraint (compare it to the first of the Kirchhoff’s laws for electrical
networks) leads to a system of linear equations.

Junction A: Traffic entering = 400 + 200; traffic leaving = x1 + x5. Thus x1 + x5 = 600.
Junction B: Traffic entering = x1 + x6; traffic leaving = x2 + 100. Thus x1 + x6 = x2 + 100.

Continuing thus for each junction and writing the resulting equations in convenient form, with variables on the left and constants on the right, we get the following system of linear equations.

Junction A: x1 + x5 = 600
Junction B: x1 − x2 + x6 = 100
Junction C: x2 − x7 = 500
Junction D: −x3 + x7 = 200
Junction E: −x3 + x4 + x6 = 800
Junction F: x4 + x5 = 600

The Gauss-Jordan elimination method is used to solve this system of equations. Observe that the augmented matrix contains many zeros. These zeros greatly reduce the amount of computation involved. In practice, networks are much larger than the one we have illustrated here, and the systems of linear equations that describe them are thus much larger; such systems are solved on a computer. However, the augmented matrices of all such systems contain many zeros.
Solve this system for x1, x2, . . . , x7 using the Gauss-Jordan elimination method:

[ 1  0  0  0  1  0  0 | 600 ]       [ 1  0  0  0  0  1 −1 | 600  ]
[ 1 −1  0  0  0  1  0 | 100 ]       [ 0  1  0  0  0  0 −1 | 500  ]
[ 0  1  0  0  0  0 −1 | 500 ]   ≡   [ 0  0  1  0  0  0 −1 | −200 ]
[ 0  0 −1  0  0  0  1 | 200 ]       [ 0  0  0  1  0  1 −1 | 600  ]
[ 0  0 −1  1  0  1  0 | 800 ]       [ 0  0  0  0  1 −1  1 | 000  ]
[ 0  0  0  1  1  0  0 | 600 ]       [ 0  0  0  0  0  0  0 | 000  ]

Expressing each leading variable in terms of the remaining variables, we get

x1 = −x6 + x7 + 600
x2 = x7 + 500
x3 = x7 − 200
x4 = −x6 + x7 + 600
x5 = x6 − x7

As was perhaps to be expected, the system of equations has many solutions; there are many traffic flows possible. One does have a certain amount of choice at intersections.
Let us now use this mathematical model to obtain information. Suppose it becomes necessary to perform road work on the stretch of Adams Street between Laura and Hogan. It is desirable to have as small a flow of traffic as possible along this stretch of road. The flows can be controlled along various branches by means of traffic lights at junctions. What is the minimum flow possible along Adams that would not lead to traffic congestion? What are the flows along the other branches when this is attained? Our model will enable us to answer these questions.
Minimizing the flow along Adams corresponds to minimizing x7. Since all traffic flows must be greater than or equal to zero, the third equation implies that the minimum value of x7 is 200, for otherwise x3 would become negative. (A negative flow would be interpreted as traffic moving in the opposite direction to the one permitted on a one-way street.) Thus the road work must allow for a flow of at least 200 cars per hour on the branch CD in the peak period.
Let us now examine what the flows in the other branches will be when this minimum flow along Adams is attained. Setting x7 = 200 gives

x1 = −x6 + 800
x2 = 700
x3 = 0
x4 = −x6 + 800
x5 = x6 − 200

Since x7 = 200 implies x3 = 0 and vice versa, we see that the minimum flow x7 = 200 can be attained by making x3 = 0, that is, by closing branch DE to traffic. •

Figure 1.4: Heat-transfer problem. (A rectangular plate with boundary temperatures 0 on the left edge, 2 on the right edge, and 1 on the top and bottom edges, and four interior mesh points with unknown temperatures x1, x2, x3, x4.)

1.8.2 Heat Conduction

Another typical application of linear systems is to heat-transfer problems in physics and engineering.
Suppose we have a thin rectangular metal plate whose edges are kept at fixed temperatures. As an example, let the left edge be at 0°C, the right edge at 2°C, and the top and bottom edges at 1°C (see Figure 1.4). We want to know the temperature inside the plate. There are several ways of approaching this kind of problem. The simplest approach, which will interest us here, is the following type of approximation: we overlay the plate with finer and finer grids, or meshes. The intersections of the mesh lines are called mesh points. Mesh points are divided into boundary and interior points, depending on whether they lie on the boundary or in the interior of the plate. We may consider these points as heat elements, such that each influences its neighboring points. We need the temperature at the interior points, given the temperature at the boundary points. It is obvious that the finer the grid, the better the approximation of the temperature distribution of the plate. To compute the temperature at the interior points, we use the following principle.

Theorem 1.40 (Mean Value Property for Heat Conduction)

The temperature at any interior point is the average of the temperatures of its neighboring points. •

Suppose, for simplicity, we have only four interior points with unknown temperatures x1, x2, x3, x4 and 12 boundary points (not named) with temperatures as indicated in Figure 1.4.
Example 1.89 Compute the unknown temperatures x1, x2, x3, x4 using Figure 1.4.

Solution. According to the mean value property, we have

x1 = 0.25(x2 + x3 + 1), x2 = 0.25(x1 + x4 + 3), x3 = 0.25(x1 + x4 + 1), x4 = 0.25(x2 + x3 + 3).

The problem thus reduces to solving the following system of four linear equations in four variables
x1 , x2 , x3 and x4 .
4x1 − x2 − x3 = 1
−x1 + 4x2 − x4 = 3
−x1 + 4x3 − x4 = 1
− x2 − x3 + 4x4 = 3
Solve this system for x1, x2, x3 and x4 using the Gauss elimination method:

[ 4 −1 −1 0 | 1; −1 4 0 −1 | 3; −1 0 4 −1 | 1; 0 −1 −1 4 | 3 ]
   ≡ ··· ≡ [ 4 −1 −1 0 | 1; 0 15/4 −1/4 −1 | 13/4; 0 0 56/15 −16/15 | 22/15; 0 0 0 24/7 | 30/7 ].

Using backward substitution, we get the solution x1 = 3/4, x2 = 5/4, x3 = 3/4 and x4 = 5/4. •

1.8.3 Chemical Solutions and Balancing Chemical Equations

Example 1.90 (Chemical Solutions)

It takes three different ingredients, A, B and C, to produce a certain chemical substance. A, B and C have to be dissolved in water separately before they interact to form the chemical. The solution containing A at 2.5 g per cubic centimeter (g/cm³), combined with the solution containing B at 4.2 g/cm³ and the solution containing C at 5.6 g/cm³, makes 26.50 g of the chemical. If the proportions for A, B, C in these solutions are changed to 3.4, 4.7 and 2.8 g/cm³, respectively (while the volumes remain the same), then 22.86 g of the chemical is produced. Finally, if the proportions are changed to 3.6, 6.1 and 3.7 g/cm³, respectively, then 29.12 g of the chemical is produced. What are the volumes in cubic centimeters of the solutions containing A, B, and C?

Solution. Let x, y, z be the volumes in cubic centimeters of the solutions containing A, B and C, respectively. Then 2.5x is the mass of A in the first case, 4.2y is the mass of B, and 5.6z is the mass of C. Added together, the three masses should give 26.50, so 2.5x + 4.2y + 5.6z = 26.50. The same reasoning applies to the other two cases, and we get the system

2.5x + 4.2y + 5.6z = 26.50
3.4x + 4.7y + 2.8z = 22.86
3.6x + 6.1y + 3.7z = 29.12

Solve this system for x, y and z using the Gauss elimination method:

[ 2.5 4.2 5.6 | 26.50; 3.4 4.7 2.8 | 22.86; 3.6 6.1 3.7 | 29.12 ]
   ≡ [ 2.5 4.2 5.6 | 26.50; 0 −1.012 −4.816 | −13.18; 0 0 −4.612 | −9.717 ].

Using backward substitution, we get the solution x = 0.847, y = 2.996, z = 2.107. The volumes of the solutions containing A, B and C are 0.847 cm³, 2.996 cm³, and 2.107 cm³, respectively. •
