
Prelim Notes for Numerical Analysis ∗

Wenqiang Feng †

Abstract
This note is intended to assist my prelim examination preparation. You may download and distribute
it. Please be aware, however, that the note contains typos as well as incorrect or inaccurate solutions.
Here, I would also like to thank Liguo Wang for his help with some problems. This note is based on Dr.
Abner J. Salgado’s lecture notes [4]. Some solutions are from Dr. Steven Wise’s lecture notes [5].

∗ Key words: UTK, PDE, Prelim exam, Numerical Analysis.


† Department of Mathematics, University of Tennessee, Knoxville, TN, 37909, wfeng@math.utk.edu

Contents

List of Figures 4

List of Tables 4

1 Preliminaries 5
1.1 Linear Algebra Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Common Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2 Similar and diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.3 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Unitary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.5 Hermitian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.6 Positive definite matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.7 Normal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.8 Common Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2 Calculus Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Preliminary Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Norms’ Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4.1 Vector Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.4.2 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2 Direct Method 33
2.1 For square or rectangular matrices A ∈ Cm,n , m ≥ n . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.1 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.2 Gram-Schmidt orthogonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.1.3 QR Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 For square matrices A ∈ Cn,n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.1 Condition number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.2 LU Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.3 Cholesky Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2.4 The Relationship of the Existing Decomposition . . . . . . . . . . . . . . . . . . . . 39
2.2.5 Regular Splittings[3] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3 Iterative Method 43
3.1 Diagonal dominant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 General Iterative Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Stationary cases iterative method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.1 Jacobi Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.2 Gauss-Seidel Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3.3 Richardson Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.4 Successive Over Relaxation (SOR) Method . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Convergence in energy norm for steady cases . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Dynamic cases iterative method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.1 Chebyshev iterative Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.2 Minimal residuals Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.5.3 Minimal correction iterative method . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5.4 Steepest Descent Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5.5 Conjugate Gradients Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Wenqiang Feng Prelim Exam note for Numerical Analysis Page 3

3.5.6 Another look at Conjugate Gradients Method . . . . . . . . . . . . . . . . . . . . . . 59


3.6 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

4 Eigenvalue Problems 63
4.1 Schur algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 QR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3 Power iteration algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4 Inverse Power iteration algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5 Solution of Nonlinear problems 69


5.1 Bisection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2 Chord method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Secant method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.5 Newton’s method for system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.6 Fixed point method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.7 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

6 Euler Method 79
6.1 Euler’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2 Trapezoidal Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3 Theta Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.4 Midpoint Rule Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7 Multistep Methods 88
7.1 The Adams Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.2 The Order and Convergence of Multistep Methods . . . . . . . . . . . . . . . . . . . . . . . . 88
7.3 Method of A-stable verification for Multistep Methods . . . . . . . . . . . . . . . . . . . . . 89
7.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

8 Runge-Kutta Methods 95
8.1 Quadrature Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.2 Explicit Runge-Kutta Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.3 Implicit Runge-Kutta Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.4 Method of A-stable verification for Runge-Kutta Method . . . . . . . . . . . . . . . . . . . . 96
8.5 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

9 Finite Difference Method 97


9.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

10 Finite Element Method 106


10.1 Finite element methods for 1D elliptic problems . . . . . . . . . . . . . . . . . . . . . . . . . 108
10.2 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

References 113

Appendices 114

Appendix 114

A Numerical Mathematics Preliminary Examination Sample Question, Summer, 2013 114
A.1 Numerical Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
A.2 Numerical Solutions of Nonlinear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
A.3 Numerical Solutions of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
A.4 Numerical Solutions of PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.5 Supplemental Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

B Numerical Mathematics Preliminary Examination 148


B.1 Numerical Mathematics Preliminary Examination Jan. 2011 . . . . . . . . . . . . . . . . . . 148
B.2 Numerical Mathematics Preliminary Examination Aug. 2010 . . . . . . . . . . . . . . . . . . 155
B.3 Numerical Mathematics Preliminary Examination Jan. 2009 . . . . . . . . . . . . . . . . . . 160
B.4 Numerical Mathematics Preliminary Examination Jan. 2008 . . . . . . . . . . . . . . . . . . 160

C Project 1 MATH571 161

D Project 2 MATH571 177

E Midterm examination 572 189

F Project 1 MATH572 196

G Project 2 MATH572 214

List of Figures

1 The curve of ρ (TRC ) as a function of ω . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


2 The curve of ρ (TR ) as a function of w . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3 One dimension’s uniform partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
A1 One dimension’s uniform partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
B2 The curve of ρ (TR ) as a function of w . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

List of Tables


1 Preliminaries
1.1 Linear Algebra Preliminaries
1.1.1 Common Properties

Properties 1.1. (Structure of Matrices) Let A = [aij ] be a square or rectangular matrix. A is called

• diagonal : if aij = 0, ∀i ≠ j,
• tridiagonal : if aij = 0, ∀|i − j| > 1,
• upper triangular : if aij = 0, ∀i > j,
• lower triangular : if aij = 0, ∀i < j,
• upper Hessenberg : if aij = 0, ∀i > j + 1,
• lower Hessenberg : if aij = 0, ∀j > i + 1,
• block diagonal : A = diag (A11 , A22 , · · · , Ann ),
• block tridiagonal : A = tridiag (Ai,i−1 , Aii , Ai,i +1 ).

Properties 1.2. (Type of Matrices) Let A = [aij ] be a square or rectangular matrix. A is called

• Hermitian : if A∗ = A,
• skew Hermitian : if A∗ = −A,
• symmetric : if AT = A,
• skew symmetric : if AT = −A,
• normal : if AT A = AAT when A ∈ Rn×n , or A∗ A = AA∗ when A ∈ Cn×n ,
• orthogonal : if AT A = I, when A ∈ Rn×n ,
• unitary : if A∗ A = I, when A ∈ Cn×n .

Properties 1.3. (Properties of invertible matrices) Let A be an n × n square matrix. If A is invertible, then

• det (A) ≠ 0,
• nullity (A) = 0,
• rank (A) = n,
• λi ≠ 0 for every eigenvalue λi ,
• Ax = b has a unique solution for every b ∈ Rn ,
• Ax = 0 has only the trivial solution,
• the row vectors are linearly independent,
• the column vectors are linearly independent,
• the row vectors of A form a basis for Rn ,
• the column vectors of A form a basis for Rn ,
• the row vectors of A span Rn ,
• the column vectors of A span Rn .

Properties 1.4. (Properties of conjugate transpose) Let A, B be n × n square matrices and γ a complex
constant, then

• (A∗ )∗ = A,
• (AB)∗ = B∗ A∗ ,
• (A + B)∗ = A∗ + B∗ ,
• (γA)∗ = γ̄ A∗ ,
• det (A∗ ) equals the complex conjugate of det (A),
• tr (A∗ ) equals the complex conjugate of tr (A).

Properties 1.5. (Properties of similar matrices) If A ∼ B, then

• det (A) = det (B),
• rank (A) = rank (B),
• eig (A) = eig (B),
• A ∼ A,
• B ∼ A,
• if B ∼ C, then A ∼ C.


Properties 1.6. (Properties of Unitary Matrices) Let A be an n × n unitary matrix, then

• A∗ = A−1 ,
• A∗ A = AA∗ = I,
• A∗ is unitary,
• A is an isometry,
• A is diagonalizable,
• A is unitarily similar to a diagonal matrix,
• the row vectors of A form an orthonormal set,
• the column vectors of A form an orthonormal set.

Properties 1.7. (Properties of Hermitian Matrices) Let A be an n × n Hermitian matrix, then

• its eigenvalues are real,
• vi∗ vj = 0 for i ≠ j, where vi , vj are eigenvectors of distinct eigenvalues,
• A is unitarily diagonalizable (Spectral theorem),
• A = H + K, where H is Hermitian and K is skew-Hermitian.

Properties 1.8. (Properties of positive definite Matrices) Let A ∈ Cn×n be a positive definite matrix and
B ∈ Cn×n , then

• σ (A) ⊂ (0, ∞),
• A is invertible,
• B∗ B is positive semidefinite,
• if B is invertible, then B∗ B is positive definite,
• if A is positive semidefinite then diag (A) ≥ 0,
• if A is positive definite then diag (A) > 0.

Properties 1.9. (Properties of determinants) Let A, B be n × n square matrices and α a real constant,
then

• det (AT ) = det (A),
• det (AB) = det (A) det (B),
• det (αA) = α^n det (A),
• det (A−1 ) = 1/det (A) = det (A)^{−1} .

Properties 1.10. (Properties of inverse) Let A, B be n × n square matrices and α a nonzero real constant, then

• (A∗ )−1 = (A−1 )∗ ,
• (A−1 )−1 = A,
• (AB)−1 = B−1 A−1 ,
• (αA)−1 = (1/α ) A−1 ,
• if A = [ a, b; c, d ] with ad − bc ≠ 0, then A−1 = (1/(ad − bc)) [ d, −b; −c, a ].
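The 2 × 2 cofactor formula above is easy to sanity-check numerically. The following sketch (in Python with NumPy, using arbitrary example entries) compares it against numpy.linalg.inv:

```python
import numpy as np

# Hypothetical 2x2 example: invert via the cofactor formula
# A^{-1} = (1/(ad - bc)) [[d, -b], [-c, a]] and compare with NumPy.
a, b, c, d = 3.0, 1.0, 2.0, 5.0
A = np.array([[a, b], [c, d]])
det = a * d - b * c                       # must be nonzero for A to be invertible
A_inv = (1.0 / det) * np.array([[d, -b], [-c, a]])

assert np.allclose(A_inv, np.linalg.inv(A))
assert np.allclose(A @ A_inv, np.eye(2))  # A A^{-1} = I
```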
Properties 1.11. (Properties of Rank) Let A be an m × n matrix, B an n × m matrix, and P (m × m), Q (n × n)
invertible matrices, then

• rank (A) ≤ min{m, n},
• rank (A) = rank (A∗ ),
• rank (A) + dim(ker (A)) = n,
• rank (P AQ ) = rank (A), in particular rank (AQ ) = rank (A) = rank (P A),
• rank (AB) ≥ rank (A) + rank (B) − n,
• rank (AB) ≤ min{rank (A), rank (B)},
• rank (AB) ≤ rank (A) + rank (B).


1.1.2 Similar and diagonalization

Theorem 1.1. (Similar) A is said to be similar to B, if there is a nonsingular matrix X, such that

A = XBX −1 , (A ∼ B).

Theorem 1.2. (Diagonalizable a ) A matrix A is diagonalizable if and only if there exist a nonsingular
matrix X and a diagonal matrix D such that A = XDX −1 .
a Being diagonalizable has nothing to do with being invertible.

Theorem 1.3. (Diagonalizable) A matrix is diagonalizable if and only if all its eigenvalues are semisimple.

Theorem 1.4. (Diagonalizable) Suppose dim(A) = n. A is said to be diagonalizable , if and only if A has
n linearly independent eigenvectors .

Corollary 1.1. (Sample question #2, summer, 2013 ) Suppose dim(A) = n. If A has n distinct eigenvalues,
then A is diagonalizable.

Proof. (Sketch) Suppose n = 2, and let λ1 , λ2 be distinct eigenvalues of A with corresponding eigenvectors
v1 , v2 . Now, we will use contradiction to show v1 , v2 are linearly independent. Suppose v1 , v2 are linearly
dependent, then

c1 v1 + c2 v2 = 0, (1)

with c1 , c2 not both 0. Multiplying A on both sides of (1), then

c1 Av1 + c2 Av2 = c1 λ1 v1 + c2 λ2 v2 = 0. (2)

Multiplying λ1 on both sides of (1), then

c1 λ1 v1 + c2 λ1 v2 = 0. (3)

Subtracting (3) from (2), then

c2 (λ2 − λ1 )v2 = 0. (4)

Since λ1 ≠ λ2 and v2 ≠ 0, then c2 = 0. Similarly, we can get c1 = 0. Hence, we get the contradiction.
A similar argument gives the result for general n. Then we get that A has n linearly independent eigenvectors.
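Corollary 1.1 can also be illustrated numerically: a matrix built with distinct eigenvalues must have a full set of linearly independent eigenvectors. A small sketch, assuming NumPy and a hypothetical 3 × 3 example:

```python
import numpy as np

# A (hypothetical) 3x3 matrix with distinct eigenvalues 1, 2, 3:
# its eigenvector matrix must be nonsingular, so A = X D X^{-1}.
X0 = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
D = np.diag([1.0, 2.0, 3.0])
A = X0 @ D @ np.linalg.inv(X0)

lam, X = np.linalg.eig(A)
assert np.linalg.matrix_rank(X) == 3      # eigenvectors linearly independent
assert np.allclose(X @ np.diag(lam) @ np.linalg.inv(X), A)  # A is diagonalizable
```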

Theorem 1.5. (Diagonalizable) Every Hermitian matrix is diagonalizable. In particular, every real
symmetric matrix is diagonalizable.

1.1.3 Eigenvalues and Eigenvectors

Theorem 1.6. If λ is an eigenvalue of A, then λ̄ is an eigenvalue of A∗ .

Theorem 1.7. The eigenvalues of a triangular matrix are the entries on its main diagonal.


Theorem 1.8. Let A be a square matrix with eigenvalue λ and corresponding eigenvector x. Then
• λn , n ∈ N, is an eigenvalue of An with corresponding eigenvector x,
• if A is invertible, then 1/λ is an eigenvalue of A−1 with corresponding eigenvector x.

Theorem 1.9. Let A be an n × n square matrix and let λ1 , λ2 , · · · , λm be distinct eigenvalues of A with corre-
sponding eigenvectors v1 , v2 , · · · , vm . Then v1 , v2 , · · · , vm are linearly independent.

1.1.4 Unitary matrices

Definition 1.1. (Unitary Matrix) A matrix A ∈ Cn×n is said to be unitary a , if

A∗ A = I.

a A matrix A ∈ Rn×n is said to be orthogonal , if AT A = I.

Theorem 1.10. (Angle preservation) If a matrix A is unitary, then the transformation defined by A preserves
angles.

Proof. For any vectors x, y ∈ Cn , the angle θ between them is determined from the inner product via
cos θ = < x, y > /(‖x‖‖y‖). Since A is unitary (and thus an isometry), ‖Ax‖ = ‖x‖, ‖Ay‖ = ‖y‖ and

< Ax, Ay >=< A∗ Ax, y >=< x, y > .

This proves the angle preservation.

Theorem 1.11. (Reflections) If A is a 2 × 2 real orthogonal matrix with det (A) = −1, then A has the
reflection form T (θ ) for some θ, e.g.

A = [ 1, 0; 0, −1 ] = T (0), where T (θ ) = [ cos(θ ), sin(θ ); sin(θ ), − cos(θ ) ]. (5)

Finally, we can easily establish the diagonalizability of unitary matrices.

Theorem 1.12. (Schur Decomposition) Every matrix A ∈ Cn×n is unitarily similar to an upper triangular matrix:

A = U T U −1 , (6)

where U is a unitary matrix and T is an upper triangular matrix.

Proof. See Appendix (??)

Theorem 1.13. (Spectral Theorem for Unitary matrices) If A is unitary, then A is diagonalizable and A is
unitarily similar to a diagonal matrix:

A = U DU −1 = U DU ∗ , (7)

where U is a unitary matrix and D is a diagonal matrix.


Proof. The result follows from Theorem 1.12.

Theorem 1.14. (Spectral representation) If A is unitary, then

1. A has a set of n orthogonal eigenvectors,
2. letting {λ1 , λ2 , · · · , λn } be the eigenvalues with corresponding orthonormal eigenvectors {v1 , v2 , · · · , vn },
A has the representation as the sum of rank one matrices given by

A = Σ_{i=1}^{n} λi vi vi∗ . (8)

Note: this representation is often called the Spectral Representation or Spectral Decomposition of A.

Proof. see Appendix (??)
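The rank-one expansion (8) can be checked numerically. A sketch with a hypothetical 2 × 2 unitary matrix (a plane rotation), assuming NumPy; the arbitrary eigenvector phases returned by numpy.linalg.eig cancel in the outer products:

```python
import numpy as np

# Verify the rank-one spectral representation A = sum_i lam_i v_i v_i^*
# for a 2x2 unitary example: a plane rotation by theta.
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

lam, V = np.linalg.eig(A)                 # eigenvalues lie on the unit circle
assert np.allclose(np.abs(lam), 1.0)

# Eigenvectors of distinct eigenvalues of a normal matrix are orthogonal,
# so the rank-one sum reconstructs A.
S = sum(lam[i] * np.outer(V[:, i], V[:, i].conj()) for i in range(2))
assert np.allclose(S, A)
```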

1.1.5 Hermitian matrices

Definition 1.2. (Hermitian Matrix) A matrix is Hermitian , if

A∗ = A.

Definition 1.3. Let A be Hermitian, then the spectrum of A, σ (A), is real.

Proof. Let λ ∈ σ (A) with corresponding eigenvector v. Then

< Av, v > = < λv, v >= λ < v, v > (9)


< Av, v > = < v, A∗ v >=< v, λ̄v >= λ̄ < v, v > . (10)

Since < v, v > ≠ 0, therefore λ = λ̄. Hence λ is real.

Definition 1.4. Let A be Hermitian, then eigenvectors corresponding to different eigenvalues are orthogonal, i.e.

< vi , vj >= 0, i ≠ j. (11)

Proof. Let λ1 ≠ λ2 be two arbitrary different eigenvalues with corresponding eigenvectors v1 , v2 . Then

< Av1 , v2 > = < λ1 v1 , v2 >= λ1 < v1 , v2 > , (12)

< Av1 , v2 > = < v1 , A∗ v2 >=< v1 , Av2 >=< v1 , λ2 v2 >= λ2 < v1 , v2 > , (13)

using that λ2 is real. Since λ1 ≠ λ2 , therefore < v1 , v2 >= 0.

Theorem 1.15. (Spectral Theorem for Hermitian matrices) If A is Hermitian, then A is unitarily diagonalizable:

A = U DU −1 = U DU ∗ , (14)

where U is a unitary matrix and D is a diagonal matrix.

Theorem 1.16. If A, B are unitarily similar , then A is Hermitian if and only if B is Hermitian .


Proof. Since A, B are unitarily similar, A = U BU −1 , where U is a unitary matrix, so U −1 = U ∗ and (U −1 )∗ = U . Then

A∗ = (U −1 )∗ B∗ U ∗ = U B∗ U ∗ = U B∗ U −1 .

Therefore, if A is Hermitian,

U BU −1 = A = A∗ = U B∗ U −1 .

Hence, B = B∗ . The converse direction follows by exchanging the roles of A and B.

Theorem 1.17. If A = A∗ , then ρ (A) = ‖A‖2 .

Proof. Since A is self-adjoint, there is an orthonormal basis of eigenvectors {e1 , · · · , en } of Cn , so that any
x ∈ Cn can be written as

x = α1 e1 + α2 e2 + · · · + αn en .

Moreover, Aei = λi ei , ‖ei ‖ = 1 and (ei , ej ) = 0 when i ≠ j, (ej , ej ) = 1. So,

‖x‖ℓ2^2 = Σ_{i=1}^{n} |αi |² ,

since

(x, x ) = ( Σ_{i=1}^{n} αi ei , Σ_{j=1}^{n} αj ej ) = Σ_{i=1}^{n} Σ_{j=1}^{n} αi ᾱj (ei , ej ) = Σ_{i=1}^{n} |αi |² .

Since Ax = A(α1 e1 + α2 e2 + · · · + αn en ) = α1 λ1 e1 + α2 λ2 e2 + · · · + αn λn en , then

‖Ax‖ℓ2^2 = Σ_{i=1}^{n} |λi αi |² = Σ_{i=1}^{n} |λi |² |αi |² ≤ max_i {|λi |}² Σ_{i=1}^{n} |αi |² .

Therefore ‖Ax‖ℓ2 ≤ ρ (A) ‖x‖ℓ2 , i.e.

‖A‖2 = sup_{x∈Cn} ‖Ax‖ℓ2 / ‖x‖ℓ2 ≤ ρ (A).

Conversely, let k be an index such that |λk | = ρ (A), and take x = ek . Then Ax = Aek = λk ek , so
‖Ax‖ℓ2 = |λk | = ρ (A) ‖ek ‖ℓ2 , and

‖A‖2 = sup_{x∈Cn} ‖Ax‖ℓ2 / ‖x‖ℓ2 ≥ ρ (A).

Hence ρ (A) = ‖A‖2 .
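The identity ρ (A) = ‖A‖2 is easy to test numerically, and a counterexample for non-normal matrices is instructive. A sketch assuming NumPy, with a random symmetric matrix and a nilpotent 2 × 2 block:

```python
import numpy as np

# For A = A^*, the spectral radius equals the 2-norm (largest singular value).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B + B.T                                # a random real symmetric matrix

rho = np.max(np.abs(np.linalg.eigvalsh(A)))
assert np.isclose(rho, np.linalg.norm(A, 2))

# For a non-normal matrix the identity can fail: rho(N) = 0 but ||N||_2 = 1.
N = np.array([[0.0, 1.0], [0.0, 0.0]])
assert np.max(np.abs(np.linalg.eigvals(N))) < 1e-12
assert np.isclose(np.linalg.norm(N, 2), 1.0)
```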


1.1.6 Positive definite matrices

Definition 1.5. (Positive Definite Matrix)

1. A symmetric real matrix A ∈ Rn×n is said to be Positive Definite , if

xT Ax > 0, ∀x ≠ 0.

2. A Hermitian matrix A ∈ Cn×n is said to be Positive Definite , if

x∗ Ax > 0, ∀x ≠ 0.

Theorem 1.18. Let A, B ∈ Cn×n . Then

1. if A is positive definite, then σ (A) ⊂ (0, ∞),
2. if A is positive definite, then A is invertible,
3. B∗ B is positive semidefinite,
4. if B is invertible, then B∗ B is positive definite,
5. if B is positive semidefinite, then diag (B) is nonnegative,
6. if B is positive definite, then diag (B) is strictly positive.

Problem 1.1. (Sample question #1, summer, 2013 ) Suppose A ∈ Cn×n is Hermitian and σ (A) ⊂ (0, ∞).
Prove A is Hermitian Positive Definite (HPD).

Proof. Since A is Hermitian, A is unitarily diagonalizable, i.e. A = U DU −1 = U DU ∗ . Then

x∗ Ax = x∗ U DU −1 x = x∗ U DU ∗ x = (U ∗ x )∗ D (U ∗ x ). (15)

Moreover, since σ (A) ⊂ (0, ∞), then x̃∗ D x̃ > 0 for any nonzero x̃. Hence, with x̃ = U ∗ x,

x∗ Ax = (U ∗ x )∗ D (U ∗ x ) = x̃∗ D x̃ > 0, for any nonzero x. (16)
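Problem 1.1 can be illustrated numerically: for a Hermitian matrix with positive spectrum, a Cholesky factorization exists, which is a practical HPD test. A sketch assuming NumPy and a hypothetical random example:

```python
import numpy as np

# A Hermitian matrix with sigma(A) in (0, inf) is HPD;
# numpy.linalg.cholesky succeeds exactly for (numerically) HPD matrices.
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B.conj().T @ B + np.eye(4)             # Hermitian with spectrum in (0, inf)

assert np.allclose(A, A.conj().T)          # Hermitian
assert np.all(np.linalg.eigvalsh(A) > 0)   # positive spectrum
L = np.linalg.cholesky(A)                  # exists iff A is HPD
assert np.allclose(L @ L.conj().T, A)
```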

1.1.7 Normal matrices

Definition 1.6. (Normal Matrix) A matrix is called normal , if

A∗ A = AA∗ .

Corollary 1.2. Unitary matrices and Hermitian matrices are normal matrices.

Theorem 1.19. A ∈ Cn×n is normal if and only if every matrix unitarily equivalent to A is normal.


Proof. Suppose A is normal and B = U ∗ AU , where U is unitary. Then B∗ B = U ∗ A∗ U U ∗ AU = U ∗ A∗ AU =
U ∗ AA∗ U = U ∗ AU U ∗ A∗ U = BB∗ , so B is normal. Conversely, if B is normal, it is easy to get that U ∗ A∗ AU =
U ∗ AA∗ U , and then A∗ A = AA∗ .


Theorem 1.21. (Spectral theorem for normal matrices) If A ∈ Cn×n has eigenvalues λ1 , · · · , λn , counted
according to multiplicity, the following statements are equivalent.
1. A is normal,
2. A is unitarily diagonalizable,
3. Σ_{i=1}^{n} Σ_{j=1}^{n} |aij |² = Σ_{i=1}^{n} |λi |² ,
4. There is an orthonormal set of n eigenvectors of A.
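Item 3 of Theorem 1.21 gives a cheap numerical test of normality: compare the squared Frobenius mass of A with the squared moduli of its eigenvalues. A sketch assuming NumPy, with a rotation (normal) and a Jordan block (not normal):

```python
import numpy as np

# A is normal iff sum_ij |a_ij|^2 = sum_i |lam_i|^2 (item 3 above).
theta = 0.3
normal = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])   # orthogonal, hence normal
non_normal = np.array([[1.0, 1.0], [0.0, 1.0]])        # a Jordan block is not normal

def frob_vs_eigs(A):
    return np.sum(np.abs(A) ** 2), np.sum(np.abs(np.linalg.eigvals(A)) ** 2)

f, e = frob_vs_eigs(normal)
assert np.isclose(f, e)                # equality for the normal matrix
f, e = frob_vs_eigs(non_normal)
assert f > e + 0.5                     # strict excess for the non-normal one
```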

1.1.8 Common Theorems

Definition 1.7. (Orthogonal Complement) Suppose S ⊂ Rn is a subspace. The orthogonal complement
of S is defined as

S ⊥ = { y ∈ Rn | y T x = 0, ∀x ∈ S }.

Theorem 1.22. Suppose A ∈ Rn×n . Then


1. R(A)⊥ = N (AT ),
2. R(AT )⊥ = N (A).

Proof. 1. For any ỹ ∈ R(A)⊥ , we have ỹ T y = 0, ∀y ∈ R(A). And ∀y ∈ R(A), there exists x such that Ax = y.
Then

ỹ T Ax = (AT ỹ )T x = 0.

Since x is arbitrary, it must be that AT ỹ = 0. Hence

R(A)⊥ ⊂ N (AT ).

Conversely, suppose y ∈ N (AT ), then AT y = 0 and hence (AT y )T x = y T Ax = 0 for any x ∈ Rn . So
y ∈ R(A)⊥ . Therefore

N (AT ) ⊂ R(A)⊥ ,

and thus R(A)⊥ = N (AT ).
2. Similarly, we can prove R(AT )⊥ = N (A).
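The identity R(A)⊥ = N (AT ) can be checked numerically via the SVD: the left singular vectors with zero singular value span N (AT ), and they are orthogonal to R(A). A sketch assuming NumPy and a hypothetical rank-2 example in R4×3:

```python
import numpy as np

# Build a 4x3 matrix of rank 2, so N(A^T) is nontrivial in R^4.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

# Left singular vectors beyond the rank span N(A^T).
U, s, Vt = np.linalg.svd(A)
null_AT = U[:, 2:]                       # columns spanning N(A^T)
assert np.allclose(A.T @ null_AT, 0)     # indeed null vectors of A^T

# Any vector in R(A) is orthogonal to them: R(A)^perp = N(A^T).
x = rng.standard_normal(3)
assert np.allclose(null_AT.T @ (A @ x), 0)
```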

1.2 Calculus Preliminaries

Definition 1.8. (Taylor formula for one variable) Let f (x ) be (n + 1)-times differentiable at x0 . Then there
exists a neighborhood B(x0 , ε ) such that ∀x ∈ B(x0 , ε ),

f (x ) = f (x0 ) + f ′ (x0 )(x − x0 ) + (f ′′ (x0 )/2!)(x − x0 )² + · · · + (f (n) (x0 )/n!)(x − x0 )^n + O ((x − x0 )^{n+1})
      = f (x0 ) + f ′ (x0 )∆x + (f ′′ (x0 )/2!)∆x² + · · · + (f (n) (x0 )/n!)∆x^n + O (∆x^{n+1}). (17)


Definition 1.9. (Taylor formula for two variables) Let f (x, y ) ∈ C^{k+1} (B((x0 , y0 ), ε )). Then ∀(x0 +
∆x, y0 + ∆y ) ∈ B((x0 , y0 ), ε ),

f (x0 + ∆x, y0 + ∆y ) = f (x0 , y0 ) + ( ∆x ∂/∂x + ∆y ∂/∂y ) f (x0 , y0 )
    + (1/2!) ( ∆x ∂/∂x + ∆y ∂/∂y )² f (x0 , y0 ) + · · · (18)
    + (1/k!) ( ∆x ∂/∂x + ∆y ∂/∂y )^k f (x0 , y0 ) + Rk ,

where

Rk = (1/(k + 1)!) ( ∆x ∂/∂x + ∆y ∂/∂y )^{k+1} f (x0 + θ∆x, y0 + θ∆y ), θ ∈ (0, 1).

Definition 1.10. (Commonly used Taylor series)

1/(1 − x ) = Σ_{n=0}^{∞} x^n = 1 + x + x² + x³ + x⁴ + · · · , x ∈ (−1, 1), (19)

e^x = Σ_{n=0}^{∞} x^n /n! = 1 + x + x²/2! + x³/3! + x⁴/4! + · · · , x ∈ R, (20)

sin(x ) = Σ_{n=0}^{∞} (−1)^n x^{2n+1} /(2n + 1)! = x − x³/3! + x⁵/5! − x⁷/7! + x⁹/9! − · · · , x ∈ R, (21)

cos(x ) = Σ_{n=0}^{∞} (−1)^n x^{2n} /(2n)! = 1 − x²/2! + x⁴/4! − x⁶/6! + x⁸/8! − · · · , x ∈ R, (22)

ln(1 + x ) = Σ_{n=0}^{∞} (−1)^n x^{n+1} /(n + 1) = x − x²/2 + x³/3 − x⁴/4 + · · · , x ∈ (−1, 1]. (23)
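The truncated series above converge rapidly for moderate x; a short Python check (standard library only) compares partial sums against the built-in functions:

```python
import math

# Partial sums of the sin and exp series at x = 0.5, compared with math.sin/exp.
x = 0.5
sin_series = sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1)
                 for n in range(10))
exp_series = sum(x ** n / math.factorial(n) for n in range(20))

assert abs(sin_series - math.sin(x)) < 1e-12
assert abs(exp_series - math.exp(x)) < 1e-12
```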

1.3 Preliminary Inequalities

Definition 1.11. (Cauchy’s Inequality)

ab ≤ a²/2 + b²/2, for all a, b ∈ R. (24)

Proof. Since (a − b )² = a² − 2ab + b² ≥ 0, therefore ab ≤ a²/2 + b²/2 for all a, b ∈ R.

Definition 1.12. (Cauchy’s Inequality with ε)

ab ≤ εa² + b²/(4ε ), for all a, b > 0, ε > 0. (25)

Proof. Applying Cauchy’s Inequality with √(2ε ) a and b/√(2ε ) in place of a and b, we get the result.
2


Definition 1.13. (Young’s Inequality) Let 1 < p, q < ∞, 1/p + 1/q = 1. Then

ab ≤ a^p /p + b^q /q for all a, b > 0. (26)

Proof. Firstly, we introduce an auxiliary function

f (t ) = t^p /p + 1/q − t.

Since f ′ (t ) = t^{p−1} − 1 = 0 at t = 1 and f ′′ (t ) = (p − 1)t^{p−2} > 0 for t > 0, f attains its minimum value
f (1) = 1/p + 1/q − 1 = 0 at t = 1. Now, setting t = ab^{−q/p} , we get

0 ≤ f (ab^{−q/p} ) = (ab^{−q/p} )^p /p + 1/q − ab^{−q/p} = a^p b^{−q} /p + 1/q − ab^{−q/p} .

So,

ab^{−q/p} ≤ a^p b^{−q} /p + 1/q.

Multiplying both sides by b^q yields

ab^{q−q/p} ≤ a^p /p + b^q /q.

Since 1/p + 1/q = 1, we have pq = p + q and q − q/p = 1. Hence

ab ≤ a^p /p + b^q /q for all a, b > 0.
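Young's inequality is easy to spot-check numerically over random a, b and random conjugate exponent pairs. A sketch in Python (standard library only):

```python
import random

# Spot check of Young's inequality ab <= a^p/p + b^q/q with 1/p + 1/q = 1.
random.seed(0)
for _ in range(1000):
    a, b = random.uniform(0.01, 10.0), random.uniform(0.01, 10.0)
    p = random.uniform(1.1, 5.0)
    q = p / (p - 1.0)                      # conjugate exponent: 1/p + 1/q = 1
    assert a * b <= a ** p / p + b ** q / q + 1e-9  # small float slack
```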

Definition 1.14. (Young’s Inequality with ε)

ab ≤ εa^p + C (ε )b^q , for all a, b > 0, ε > 0, (27)

where C (ε ) = (εp )^{−q/p} q^{−1} .

Proof. Applying Young’s Inequality with (εp )^{1/p} a and (εp )^{−1/p} b in place of a and b, we get the result.

Definition 1.15. (Hölder’s Inequality) Let 1 < p, q < ∞, 1/p + 1/q = 1. If u ∈ Lp (U ), v ∈ Lq (U ), then we
have uv ∈ L1 (U ) and

∫_U |uv| dx ≤ ( ∫_U |u|^p dx )^{1/p} ( ∫_U |v|^q dx )^{1/q} = ‖u‖Lp (U ) ‖v‖Lq (U ) . (28)


Proof. Suppose ∫_U |u|^p dx ≠ 0 and ∫_U |v|^q dx ≠ 0. Otherwise, if ∫_U |u|^p dx = 0, then u ≡ 0 a.e. and Hölder’s
Inequality is trivial; we can use the same argument for v. Now, we define f , g as follows:

f = |u| / ‖u‖Lp , g = |v| / ‖v‖Lq . (29)

Now applying Young’s inequality to f g, we have

f g = (|u| / ‖u‖Lp )(|v| / ‖v‖Lq ) ≤ (1/p ) |u|^p / ‖u‖Lp^p + (1/q ) |v|^q / ‖v‖Lq^q . (30)

Integrating over U with respect to x, we obtain

∫_U (|u| / ‖u‖Lp )(|v| / ‖v‖Lq ) dx ≤ (1/p ) ∫_U |u|^p dx / ‖u‖Lp^p + (1/q ) ∫_U |v|^q dx / ‖v‖Lq^q = 1/p + 1/q = 1. (31)

(31) implies that

∫_U |u||v| dx ≤ ‖u‖Lp ‖v‖Lq . (32)

Hence

∫_U |uv| dx ≤ ∫_U |u||v| dx ≤ ‖u‖Lp ‖v‖Lq . (33)

Corollary 1.3. (Hölder’s Inequality) Suppose that u ∈ L1 (U ), v ∈ L∞ (U ), then we have uv ∈ L1 (U ) and

∫_U |uv| dx ≤ ‖u‖L1 (U ) ‖v‖L∞ (U ) . (34)

Proof. Since u ∈ L1 (U ) and v ∈ L∞ (U ), |v| ≤ ‖v‖L∞ (U ) a.e., so

∫_U |uv| dx ≤ ‖u‖L1 (U ) ‖v‖L∞ (U ) < ∞, (35)

and in particular uv ∈ L1 (U ). More explicitly,

∫_U |uv| dx ≤ ∫_U |u||v| dx ≤ ‖v‖L∞ (U ) ∫_U |u| dx = ‖u‖L1 (U ) ‖v‖L∞ (U ) . (36)

Definition 1.16. (General Hölder’s Inequality) Let 1 < p1 , · · · , pn < ∞ with 1/p1 + · · · + 1/pn = 1/r. If
uk ∈ Lpk (U ), then we have Π_{k=1}^{n} uk ∈ Lr (U ) and

∫_U |u1 · · · un |^r dx ≤ Π_{k=1}^{n} ‖uk ‖Lpk (U )^r . (37)


Proof. We will use induction to prove the General Hölder’s Inequality.

1. For two factors, we have 1/r = 1/p1 + 1/p2 , so r < min(p1 , p2 ) and

1 = 1/(p1 /r ) + 1/(p2 /r ).

Then applying Hölder’s inequality with the conjugate exponents p1 /r and p2 /r to |u1 u2 |^r , we have

∫_U |u1 u2 |^r dx ≤ ( ∫_U (|u1 |^r )^{p1 /r} dx )^{r/p1} ( ∫_U (|u2 |^r )^{p2 /r} dx )^{r/p2}
    = ( ∫_U |u1 |^{p1} dx )^{r/p1} ( ∫_U |u2 |^{p2} dx )^{r/p2}
    = ‖u1 ‖Lp1 (U )^r ‖u2 ‖Lp2 (U )^r < ∞,

so u1 u2 ∈ Lr (U ).

2. Induction assumption: assume the inequality holds for n − 1 factors, i.e. whenever 1/p1 + · · · + 1/pn−1 = 1/p,
Π_{k=1}^{n−1} uk ∈ Lp (U ) and

∫_U |u1 · · · un−1 |^p dx ≤ Π_{k=1}^{n−1} ‖uk ‖Lpk (U )^p .

3. Induction step: for n factors, let 1/p = 1/p1 + · · · + 1/pn−1 , so that 1/p + 1/pn = 1/r. From the case of two
factors and the induction assumption, we have

∫_U |u1 · · · un |^r dx ≤ ( ∫_U |u1 · · · un−1 |^p dx )^{r/p} ( ∫_U |un |^{pn} dx )^{r/pn}
    ≤ ( Π_{k=1}^{n−1} ‖uk ‖Lpk (U )^p )^{r/p} ‖un ‖Lpn (U )^r = Π_{k=1}^{n} ‖uk ‖Lpk (U )^r .


Corollary 1.4. (General Hölder’s Inequality) Let 1 < p1 , · · · , pn < ∞ with 1/p1 + · · · + 1/pn = 1. If uk ∈ Lpk (U ),
then we have Π_{k=1}^{n} uk ∈ L1 (U ) and

∫_U |u1 · · · un | dx ≤ Π_{k=1}^{n} ‖uk ‖Lpk (U ) . (38)

Proof. Take r = 1 in the General Hölder’s Inequality above.

Definition 1.17. (Discrete Hölder’s Inequality) Let 1 < p, q < ∞, 1/p + 1/q = 1. Then for all a, b ∈ Rn ,

Σ_{k=1}^{n} |ak bk | ≤ ( Σ_{k=1}^{n} |ak |^p )^{1/p} ( Σ_{k=1}^{n} |bk |^q )^{1/q} . (39)

Proof. The idea of the proof is the same as for the integral version. Suppose Σ |ak |^p ≠ 0 and Σ |bk |^q ≠ 0;
otherwise, if Σ |ak |^p = 0, then ak ≡ 0 and the inequality is trivial, and the same argument applies to bk .
Now, we define f , g as follows:

fk = |ak | / ‖a‖ℓp , gk = |bk | / ‖b‖ℓq . (40)

Applying Young’s inequality to fk gk , we have

fk gk = (|ak | / ‖a‖ℓp )(|bk | / ‖b‖ℓq ) ≤ (1/p ) |ak |^p / ‖a‖ℓp^p + (1/q ) |bk |^q / ‖b‖ℓq^q . (41)

Taking the summation yields

Σ_{k=1}^{n} fk gk ≤ (1/p ) Σ_{k=1}^{n} |ak |^p / ‖a‖ℓp^p + (1/q ) Σ_{k=1}^{n} |bk |^q / ‖b‖ℓq^q = 1/p + 1/q = 1. (42)

Therefore

Σ_{k=1}^{n} |ak bk | ≤ ‖a‖ℓp ‖b‖ℓq . (43)
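The discrete Hölder inequality can be spot-checked numerically, e.g. with the conjugate pair p = 3, q = 3/2. A sketch assuming NumPy and random example vectors:

```python
import numpy as np

# Spot check of the discrete Hoelder inequality with p = 3, q = 3/2.
rng = np.random.default_rng(3)
a = rng.standard_normal(50)
b = rng.standard_normal(50)
p, q = 3.0, 1.5                            # conjugate: 1/3 + 2/3 = 1

lhs = np.sum(np.abs(a * b))
rhs = np.sum(np.abs(a) ** p) ** (1 / p) * np.sum(np.abs(b) ** q) ** (1 / q)
assert lhs <= rhs + 1e-12
```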

Corollary 1.5. (Discrete Hölder’s Inequality) Let a ∈ ℓ¹ and b ∈ ℓ∞ . Then ab ∈ ℓ¹ and

Σ_{k=1}^{n} |ak bk | ≤ ( Σ_{k=1}^{n} |ak | ) sup_{k∈N} |bk | . (44)

Proof.

Σ_{k=1}^{n} |ak bk | = Σ_{k=1}^{n} |ak ||bk | ≤ Σ_{k=1}^{n} |ak | sup_{k∈N} |bk | = ( Σ_{k=1}^{n} |ak | ) sup_{k∈N} |bk | . (45)

Definition 1.18. (Cauchy-Schwarz’s Inequality) Let u, v ∈ L2 (U ). Then uv ∈ L1 (U ) and

‖uv‖L1 (U ) ≤ ‖u‖L2 (U ) ‖v‖L2 (U ) . (46)

Proof. Take p = q = 2 in Hölder’s inequality.

Definition 1.19. (Discrete Cauchy-Schwarz’s Inequality)

| Σ_{i=1}^{n} xi yi |² ≤ ( Σ_{i=1}^{n} |xi |² ) ( Σ_{i=1}^{n} |yi |² ) . (47)

Proof. Take p = q = 2 in the Discrete Hölder’s inequality.

Definition 1.20. (Minkowski’s Inequality) Let 1 ≤ p < ∞ and u, v ∈ Lp (U ). Then

‖u + v‖Lp (U ) ≤ ‖u‖Lp (U ) + ‖v‖Lp (U ) . (48)

Proof. Suppose ∫_U |u|^p dx ≠ 0 and ∫_U |v|^p dx ≠ 0; otherwise, if ∫_U |u|^p dx = 0, then u ≡ 0 a.e. (and
similarly for v) and the Minkowski’s Inequality is trivial. First, we have the following fact:

|u + v|^p ≤ (|u| + |v|)^p ≤ 2^p max(|u|^p , |v|^p ) ≤ 2^p (|u|^p + |v|^p ). (49)

Hence u + v ∈ Lp (U ) if u, v ∈ Lp (U ). Let

1/p + 1/q = 1, i.e. q = p/(p − 1). (50)

Then, if u + v ∈ Lp , we have |u + v|^{p−1} ∈ Lq , since

‖ |u + v|^{p−1} ‖Lq = ( ∫_U |u + v|^{(p−1)q} dx )^{1/q} = ( ∫_U |u + v|^p dx )^{(p−1)/p} = ‖u + v‖Lp^{p−1} < ∞. (51)

Now, we can use Hölder’s inequality for |u + v| · |u + v|^{p−1} , i.e.

‖u + v‖Lp^p = ∫_U |u + v|^p dx = ∫_U |u + v||u + v|^{p−1} dx
    ≤ ∫_U |u||u + v|^{p−1} dx + ∫_U |v||u + v|^{p−1} dx (52)
    ≤ ‖u‖Lp ‖ |u + v|^{p−1} ‖Lq + ‖v‖Lp ‖ |u + v|^{p−1} ‖Lq
    = ( ‖u‖Lp + ‖v‖Lp ) ‖u + v‖Lp^{p−1} .


Since $\|u+v\|_{L^p} \ne 0$, dividing both sides of (52) by $\|u+v\|_{L^p}^{p-1}$ yields

$$\|u+v\|_{L^p} \le \|u\|_{L^p} + \|v\|_{L^p}. \qquad (53)$$

Definition 1.21. (Discrete Minkowski's Inequality) Let $1 \le p < \infty$ and $a = (a_k), b = (b_k) \in \ell^p$. Then $a + b \in \ell^p$ and

$$\left(\sum_{k=1}^{n}|a_k+b_k|^p\right)^{1/p} \le \left(\sum_{k=1}^{n}|a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n}|b_k|^p\right)^{1/p}. \qquad (54)$$

Proof. The idea is similar to the continuous case. Assume $p > 1$ (for $p = 1$ the result is the ordinary triangle inequality) and let $\frac{1}{p} + \frac{1}{q} = 1$, so that $(p-1)q = p$. Then

$$\begin{aligned}
\sum_{k=1}^{n}|a_k+b_k|^p &= \sum_{k=1}^{n}|a_k+b_k|\,|a_k+b_k|^{p-1} \\
&\le \sum_{k=1}^{n}|a_k|\,|a_k+b_k|^{p-1} + \sum_{k=1}^{n}|b_k|\,|a_k+b_k|^{p-1} \\
&\le \left(\sum_{k=1}^{n}|a_k|^p\right)^{1/p}\left(\sum_{k=1}^{n}\big[|a_k+b_k|^{p-1}\big]^q\right)^{1/q}
+ \left(\sum_{k=1}^{n}|b_k|^p\right)^{1/p}\left(\sum_{k=1}^{n}\big[|a_k+b_k|^{p-1}\big]^q\right)^{1/q} \quad\text{(Hölder)}\\
&= \left[\left(\sum_{k=1}^{n}|a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n}|b_k|^p\right)^{1/p}\right]\left(\sum_{k=1}^{n}|a_k+b_k|^p\right)^{\frac{p-1}{p}}.
\end{aligned}$$

Dividing both sides by $\left(\sum_{k=1}^{n}|a_k+b_k|^p\right)^{1-\frac{1}{p}}$, we get

$$\left(\sum_{k=1}^{n}|a_k+b_k|^p\right)^{1/p} \le \left(\sum_{k=1}^{n}|a_k|^p\right)^{1/p} + \left(\sum_{k=1}^{n}|b_k|^p\right)^{1/p}.$$
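The discrete Minkowski inequality (the $\ell^p$ triangle inequality) can be sanity-checked numerically for several values of $p$; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Check ||a + b||_p <= ||a||_p + ||b||_p for several p on random data.
rng = np.random.default_rng(1)
a = rng.standard_normal(100)
b = rng.standard_normal(100)

def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

for p in (1.0, 1.5, 2.0, 3.0, 7.0):
    assert lp_norm(a + b, p) <= lp_norm(a, p) + lp_norm(b, p) + 1e-12
```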


Definition 1.22. (Integral Minkowski's Inequality) Let $1 \le p < \infty$ and $u(x,y) \in L^p$. Then

$$\left(\int\left|\int u(x,y)\,dx\right|^p dy\right)^{1/p} \le \int\left(\int |u(x,y)|^p\,dy\right)^{1/p} dx. \qquad (55)$$

Proof. 1. When $p = 1$,

$$\int\left|\int u(x,y)\,dx\right| dy \le \int\int |u(x,y)|\,dx\,dy = \int\int |u(x,y)|\,dy\,dx, \qquad (56)$$

where the last step follows by Fubini's theorem for nonnegative measurable functions.
2. When $1 < p < \infty$, let $\frac{1}{p} + \frac{1}{q} = 1$, so that $(p-1)q = p$. Then

$$\begin{aligned}
\int\left|\int u(x,y)\,dx\right|^p dy
&\le \int\left(\int |u(x,y)|\,dx\right)^p dy \\
&= \int\underbrace{\left(\int |u(x,y)|\,dx\right)^{p-1}}_{\text{independent of }x}\left(\int |u(x,y)|\,dx\right) dy \\
&= \int\int\left(\int |u(x,y)|\,dx\right)^{p-1}|u(x,y)|\,dx\,dy \\
&= \int\int\left(\int |u(x,y)|\,dx\right)^{p-1}|u(x,y)|\,dy\,dx \qquad \text{(Fubini)}\\
&\le \int\left(\int\left(\int |u(x,y)|\,dx\right)^{(p-1)q} dy\right)^{1/q}\left(\int |u(x,y)|^p\,dy\right)^{1/p} dx \qquad \text{(Hölder)}\\
&= \left(\int\left(\int |u(x,y)|\,dx\right)^{p} dy\right)^{1/q}\int\left(\int |u(x,y)|^p\,dy\right)^{1/p} dx,
\end{aligned}$$

where in the last step the first factor is a constant independent of $x$. So we get

$$\int\left(\int |u(x,y)|\,dx\right)^p dy \le \left(\int\left(\int |u(x,y)|\,dx\right)^p dy\right)^{1-1/p}\int\left(\int |u(x,y)|^p\,dy\right)^{1/p} dx.$$

Dividing both sides by $\left(\int\left(\int |u(x,y)|\,dx\right)^p dy\right)^{1-1/p}$ yields

$$\left(\int\left(\int |u(x,y)|\,dx\right)^p dy\right)^{1/p} \le \int\left(\int |u(x,y)|^p\,dy\right)^{1/p} dx.$$


Hence, the result follows from the fact that

$$\left(\int\left|\int u(x,y)\,dx\right|^p dy\right)^{1/p} \le \left(\int\left(\int |u(x,y)|\,dx\right)^p dy\right)^{1/p}.$$

Definition 1.23. (Differential Version of Gronwall’s Inequality ) Let η (·) be a nonnegative, absolutely
continuous function on [0, T], which satisfies for a.e t the differential inequality

η 0 (t ) ≤ φ (t )η (t ) + ψ (t ), (57)

where φ(t ) and ψ (t ) are nonnegative, summable functions on [0, T]. Then
Rt " Zt #
φ ( s ) ds
η (t ) ≤ e 0 η (0) + ψ (s )ds , ∀0 ≤ t ≤ T . (58)
0

In particular, if

$$\eta' \le \varphi\eta \ \text{ on } [0,T] \quad\text{and}\quad \eta(0) = 0, \qquad (59)$$

then

$$\eta(t) = 0, \quad \forall\, 0 \le t \le T. \qquad (60)$$

Proof. Since
η 0 (t ) ≤ φ(t )η (t ) + ψ (t ), a.e.0 ≤ t ≤ T . (61)
then
η 0 (s ) − φ(s )η (s ) ≤ ψ (s ), a.e.0 ≤ s ≤ T . (62)
Let

$$f(s) = \eta(s)\,e^{-\int_0^s \varphi(\xi)\,d\xi}. \qquad (63)$$

By the product rule and chain rule, we have

$$\frac{df}{ds} = \eta'(s)e^{-\int_0^s \varphi(\xi)\,d\xi} - \eta(s)\varphi(s)e^{-\int_0^s \varphi(\xi)\,d\xi} = \big(\eta'(s) - \varphi(s)\eta(s)\big)e^{-\int_0^s \varphi(\xi)\,d\xi} \le \psi(s)e^{-\int_0^s \varphi(\xi)\,d\xi}, \quad \text{a.e. } 0 \le s \le T.$$
Integrating the above inequality from $0$ to $t$, we get

$$f(t) - f(0) = \eta(t)e^{-\int_0^t \varphi(\xi)\,d\xi} - \eta(0) \le \int_0^t \psi(s)e^{-\int_0^s \varphi(\xi)\,d\xi}\,ds,$$

i.e.

$$\eta(t)e^{-\int_0^t \varphi(\xi)\,d\xi} \le \eta(0) + \int_0^t \psi(s)e^{-\int_0^s \varphi(\xi)\,d\xi}\,ds.$$

Therefore, since $e^{-\int_0^s \varphi(\xi)\,d\xi} \le 1$,

$$\eta(t) \le e^{\int_0^t \varphi(\xi)\,d\xi}\left[\eta(0) + \int_0^t \psi(s)e^{-\int_0^s \varphi(\xi)\,d\xi}\,ds\right] \le e^{\int_0^t \varphi(\xi)\,d\xi}\left[\eta(0) + \int_0^t \psi(s)\,ds\right].$$


Definition 1.24. (Integral Version of Gronwall’s Inequality) Let ξ (·) be a nonnegative, summable func-
tion on [0, T], which satisfies for a.e t the integral inequality
Z t
ξ (t ) ≤ C1 ξ (s )ds + C2 , (67)
0

where C1 , C2 ≥ 0. Then
 
ξ (t ) ≤ C2 1 + C1 teC1 t , ∀a.e. 0 ≤ t ≤ T . (68)

In particular, if

$$\xi(t) \le C_1\int_0^t \xi(s)\,ds, \quad \text{for a.e. } 0 \le t \le T, \qquad (69)$$

then

$$\xi(t) = 0 \quad \text{a.e.} \qquad (70)$$

Proof. Let
Z t
η (t ) : = ξ (s )ds, (71)
0

then

η 0 (t ) = ξ (t ). (72)

Since
Z t
ξ (t ) ≤ C1 ξ (s )ds + C2 , (73)
0

so

η 0 (t ) ≤ C1 η (t ) + C2 . (74)

By Differential Version of Gronwall’s Inequality, we get


$$\eta(t) \le e^{\int_0^t C_1\,ds}\left[\eta(0) + \int_0^t C_2\,ds\right], \qquad (75)$$

i.e., since $\eta(0) = 0$,

$$\eta(t) \le C_2\,t\,e^{C_1 t}. \qquad (76)$$

Therefore, substituting this bound for $\eta(t)$ into (74), i.e. into $\xi(t) \le C_1\eta(t) + C_2$, we get

$$\xi(t) \le C_1 C_2\,t\,e^{C_1 t} + C_2 = C_2\big(1 + C_1 t e^{C_1 t}\big). \qquad (78)$$


Definition 1.25. (Discrete Version of Gronwall's Inequality) If

$$(1+\gamma)a_{n+1} \le a_n + \beta f_n, \qquad \beta, \gamma \in \mathbb{R},\ \gamma > -1,\ n = 0, 1, \dots, \qquad (79)$$

then

$$a_{n+1} \le \frac{a_0}{(1+\gamma)^{n+1}} + \beta\sum_{k=0}^{n}\frac{f_k}{(1+\gamma)^{n-k+1}}. \qquad (80)$$

Proof. We will use induction to prove this discrete Gronwall’s inequality.


1. For $n = 0$,

$$(1+\gamma)a_1 \le a_0 + \beta f_0, \quad\text{so}\quad a_1 \le \frac{a_0}{1+\gamma} + \beta\frac{f_0}{1+\gamma}. \qquad (81\text{--}82)$$

2. Induction hypothesis: assume the discrete Gronwall inequality (80) is valid up to index $n-1$, i.e.

$$a_n \le \frac{a_0}{(1+\gamma)^n} + \beta\sum_{k=0}^{n-1}\frac{f_k}{(1+\gamma)^{n-k}}. \qquad (83)$$

3. Induction step: for index $n$, we have

$$\begin{aligned}
(1+\gamma)a_{n+1} &\le a_n + \beta f_n \\
&\le \frac{a_0}{(1+\gamma)^n} + \beta\sum_{k=0}^{n-1}\frac{f_k}{(1+\gamma)^{n-k}} + \beta\frac{f_n}{(1+\gamma)^{n-n}} \\
&= \frac{a_0}{(1+\gamma)^n} + \beta\sum_{k=0}^{n}\frac{f_k}{(1+\gamma)^{n-k}}. \qquad (84)
\end{aligned}$$

Dividing both sides by $1+\gamma$ gives

$$a_{n+1} \le \frac{a_0}{(1+\gamma)^{n+1}} + \beta\sum_{k=0}^{n}\frac{f_k}{(1+\gamma)^{n-k+1}}. \qquad (85)$$
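The discrete Gronwall bound can be demonstrated numerically on a sequence that satisfies the recursion with equality (the worst case); a small sketch, assuming only the standard library:

```python
# Sequence satisfying (1+gamma) a_{n+1} = a_n + beta f_n with equality,
# checked against the discrete Gronwall bound (80).
gamma, beta = 0.5, 2.0
f = [0.3, 1.0, 0.7, 0.2, 0.9]
a = [1.0]
for n in range(len(f)):
    a.append((a[n] + beta * f[n]) / (1.0 + gamma))

for n in range(len(f)):
    bound = a[0] / (1 + gamma) ** (n + 1) + beta * sum(
        f[k] / (1 + gamma) ** (n - k + 1) for k in range(n + 1)
    )
    assert a[n + 1] <= bound + 1e-12
```

For this equality case the bound is attained exactly at every step, which confirms that (80) is sharp.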

Definition 1.26. (Interpolation Inequality for $L^p$-norms) Assume $1 \le p \le r \le q \le \infty$ and

$$\frac{1}{r} = \frac{\theta}{p} + \frac{1-\theta}{q}. \qquad (86)$$

Suppose also $u \in L^p(U) \cap L^q(U)$. Then $u \in L^r(U)$, and

$$\|u\|_{L^r(U)} \le \|u\|_{L^p(U)}^{\theta}\,\|u\|_{L^q(U)}^{1-\theta}. \qquad (87)$$


Proof. If $1 \le p < r < q$ then $\frac{1}{q} < \frac{1}{r} < \frac{1}{p}$, hence there exists $\theta \in [0,1]$ s.t. $\frac{1}{r} = \theta\frac{1}{p} + (1-\theta)\frac{1}{q}$; therefore

$$1 = \frac{r\theta}{p} + \frac{r(1-\theta)}{q} = \frac{1}{\frac{p}{r\theta}} + \frac{1}{\frac{q}{r(1-\theta)}}. \qquad (88)$$

Moreover $|u|^{r\theta} \in L^{\frac{p}{r\theta}}$ and $|u|^{r(1-\theta)} \in L^{\frac{q}{r(1-\theta)}}$, since

$$\left(\int_U \big(|u|^{r\theta}\big)^{\frac{p}{r\theta}}\,dx\right)^{\frac{r\theta}{p}} = \left(\int_U |u|^p\,dx\right)^{\frac{r\theta}{p}} = \|u\|_{L^p(U)}^{r\theta} < \infty, \qquad (89)$$

$$\left(\int_U \big(|u|^{r(1-\theta)}\big)^{\frac{q}{r(1-\theta)}}\,dx\right)^{\frac{r(1-\theta)}{q}} = \left(\int_U |u|^q\,dx\right)^{\frac{r(1-\theta)}{q}} = \|u\|_{L^q(U)}^{r(1-\theta)} < \infty. \qquad (90)$$

Now, we can use Hölder's inequality for $|u|^r = |u|^{r\theta}|u|^{r(1-\theta)}$ with the conjugate exponents from (88):

$$\int_U |u|^r\,dx = \int_U |u|^{r\theta}|u|^{r(1-\theta)}\,dx \le \left(\int_U |u|^p\,dx\right)^{\frac{r\theta}{p}}\left(\int_U |u|^q\,dx\right)^{\frac{r(1-\theta)}{q}} = \|u\|_{L^p(U)}^{r\theta}\,\|u\|_{L^q(U)}^{r(1-\theta)}. \qquad (91\text{--}92)$$

Therefore

$$\|u\|_{L^r(U)} \le \|u\|_{L^p(U)}^{\theta}\,\|u\|_{L^q(U)}^{1-\theta}. \qquad (93)$$
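The interpolation inequality holds for any measure, including the counting measure, so it can be sanity-checked on a finite vector; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

# Check ||u||_r <= ||u||_p^theta * ||u||_q^(1-theta)
# with 1/r = theta/p + (1-theta)/q, using the counting measure.
rng = np.random.default_rng(2)
u = np.abs(rng.standard_normal(200))

def norm(v, p):
    return np.sum(v ** p) ** (1.0 / p)

p, q, theta = 1.0, 4.0, 0.6
r = 1.0 / (theta / p + (1 - theta) / q)
assert p <= r <= q
assert norm(u, r) <= norm(u, p) ** theta * norm(u, q) ** (1 - theta) + 1e-9
```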

Definition 1.27. (Interpolation Inequality for $L^p$-norms, explicit exponents) Assume $1 \le p \le r \le q \le \infty$. Suppose also $u \in L^p(U) \cap L^q(U)$. Then $u \in L^r(U)$ and

$$\|u\|_{L^r(U)} \le \|u\|_{L^p(U)}^{\frac{1/p-1/r}{1/p-1/q}}\,\|u\|_{L^q(U)}^{\frac{1/r-1/q}{1/p-1/q}}. \qquad (94)$$

Proof. Exactly as in the proof of Definition 1.26, there exists $\theta \in [0,1]$ with $\frac{1}{r} = \theta\frac{1}{p} + (1-\theta)\frac{1}{q}$, and Hölder's inequality applied to $|u|^r = |u|^{r\theta}|u|^{r(1-\theta)}$ gives

$$\|u\|_{L^r(U)} \le \|u\|_{L^p(U)}^{\theta}\,\|u\|_{L^q(U)}^{1-\theta}. \qquad (100)$$

Solving $\frac{1}{r} = \frac{\theta}{p} + \frac{1-\theta}{q}$ for $\theta$ gives $\theta = \frac{1/p-1/r}{1/p-1/q}$ and $1-\theta = \frac{1/r-1/q}{1/p-1/q}$, hence

$$\|u\|_{L^r(U)} \le \|u\|_{L^p(U)}^{\frac{1/p-1/r}{1/p-1/q}}\,\|u\|_{L^q(U)}^{\frac{1/r-1/q}{1/p-1/q}}. \qquad (101)$$

Theorem 1.23. (1D Dirichlet-Poincaré inequality) Let $a > 0$, $u \in C^1([-a,a])$ and $u(-a) = 0$. Then

$$\int_{-a}^{a}|u(x)|^2\,dx \le 4a^2\int_{-a}^{a}|u'(x)|^2\,dx.$$

Proof. Since $u(-a) = 0$, by the fundamental theorem of calculus we have

$$u(x) = u(x) - u(-a) = \int_{-a}^{x} u'(\xi)\,d\xi.$$

Therefore

$$|u(x)| \le \int_{-a}^{x}|u'(\xi)|\,d\xi \le \int_{-a}^{a}|u'(\xi)|\,d\xi \le \left(\int_{-a}^{a}1^2\,d\xi\right)^{1/2}\left(\int_{-a}^{a}|u'(\xi)|^2\,d\xi\right)^{1/2} = (2a)^{1/2}\left(\int_{-a}^{a}|u'(\xi)|^2\,d\xi\right)^{1/2},$$

using $x \le a$ in the second step and the Cauchy-Schwarz inequality in the third. Therefore

$$|u(x)|^2 \le 2a\int_{-a}^{a}|u'(\xi)|^2\,d\xi.$$


Integrating both sides from $-a$ to $a$ w.r.t. $x$ yields

$$\int_{-a}^{a}|u(x)|^2\,dx \le \int_{-a}^{a}2a\int_{-a}^{a}|u'(\xi)|^2\,d\xi\,dx = 4a^2\int_{-a}^{a}|u'(x)|^2\,dx.$$
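The inequality can be checked with a simple finite-difference computation on a sample function satisfying $u(-a)=0$; a small numerical sketch, assuming NumPy is available (the sample function is arbitrary):

```python
import numpy as np

# Sanity check of int u^2 <= 4 a^2 int (u')^2 for u(-a) = 0,
# using Riemann sums on a fine grid.
a = 1.5
x = np.linspace(-a, a, 2001)
h = x[1] - x[0]
u = (x + a) * np.cos(x)          # satisfies u(-a) = 0
du = np.gradient(u, x)           # finite-difference derivative

lhs = np.sum(u**2) * h
rhs = 4 * a**2 * np.sum(du**2) * h
assert lhs <= rhs
```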

Theorem 1.24. (1D Neumann-Poincaré inequality) Let $a > 0$, $u \in C^1([-a,a])$ and $\bar u = \frac{1}{2a}\int_{-a}^{a}u(x)\,dx$. Then

$$\int_{-a}^{a}|u(x)-\bar u|^2\,dx \le 2a(a-c)\int_{-a}^{a}|u'(x)|^2\,dx \le 4a^2\int_{-a}^{a}|u'(x)|^2\,dx,$$

where $c \in [-a,a]$ is a point with $u(c) = \bar u$, which exists by the intermediate value theorem.

Proof. Since $\bar u = \frac{1}{2a}\int_{-a}^{a}u(x)\,dx$, by the intermediate value theorem there exists $c \in [-a,a]$ s.t.

$$u(c) = \bar u.$$

Then by the fundamental theorem of calculus we have

$$u(x) - \bar u = u(x) - u(c) = \int_{c}^{x} u'(\xi)\,d\xi.$$

Therefore

$$|u(x)-\bar u| \le \int_{c}^{x}|u'(\xi)|\,d\xi \le \int_{c}^{a}|u'(\xi)|\,d\xi \le \left(\int_{c}^{a}1^2\,d\xi\right)^{1/2}\left(\int_{c}^{a}|u'(\xi)|^2\,d\xi\right)^{1/2} \le (a-c)^{1/2}\left(\int_{-a}^{a}|u'(\xi)|^2\,d\xi\right)^{1/2},$$

using $x \le a$ and the Cauchy-Schwarz inequality. Therefore

$$|u(x)-\bar u|^2 \le (a-c)\int_{-a}^{a}|u'(\xi)|^2\,d\xi.$$
−a


Integrating both sides from $-a$ to $a$ w.r.t. $x$ yields

$$\int_{-a}^{a}|u(x)-\bar u|^2\,dx \le \int_{-a}^{a}(a-c)\int_{-a}^{a}|u'(\xi)|^2\,d\xi\,dx = 2a(a-c)\int_{-a}^{a}|u'(x)|^2\,dx.$$

1.4 Norms’ Preliminaries


1.4.1 Vector Norms

Definition 1.28. (Vector Norms) A vector norm is a function $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ satisfying the following conditions for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$:
1. nonnegativity: $\|x\| \ge 0$, and $\|x\| = 0 \Leftrightarrow x = 0$;
2. homogeneity: $\|\alpha x\| = |\alpha|\,\|x\|$;
3. triangle inequality: $\|x+y\| \le \|x\| + \|y\|$.

Definition 1.29. For $x \in \mathbb{R}^n$, some of the most frequently used vector norms are
1. 1-norm: $\|x\|_1 = \sum_{i=1}^{n}|x_i|$;
2. 2-norm: $\|x\|_2 = \left(\sum_{i=1}^{n}|x_i|^2\right)^{1/2}$;
3. $\infty$-norm: $\|x\|_\infty = \max_{1\le i\le n}|x_i|$;
4. $p$-norm: $\|x\|_p = \left(\sum_{i=1}^{n}|x_i|^p\right)^{1/p}$.

Corollary 1.6. For all $x \in \mathbb{R}^n$,

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2, \qquad (102)$$
$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty, \qquad (103)$$
$$\frac{1}{\sqrt{n}}\|x\|_1 \le \|x\|_2 \le \|x\|_1, \qquad (104)$$
$$\|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty. \qquad (105)$$
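These equivalence constants are easy to verify numerically on random vectors; a small sanity check, assuming NumPy is available:

```python
import numpy as np

# Check the norm equivalence constants of Corollary 1.6 on random vectors.
rng = np.random.default_rng(3)
n = 20
for _ in range(100):
    x = rng.standard_normal(n)
    n1 = np.sum(np.abs(x))
    n2 = np.linalg.norm(x)
    ninf = np.max(np.abs(x))
    assert n2 <= n1 <= np.sqrt(n) * n2 + 1e-12
    assert ninf <= n2 <= np.sqrt(n) * ninf + 1e-12
    assert ninf <= n1 <= n * ninf + 1e-12
```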

Theorem 1.25. (vector 2-norm invariance) Vector 2-norm is invariant under the orthogonal transforma-
tion, i.e., if Q is an n × n orthogonal matrix, then

kQxk2 = kxk2 , ∀x ∈ Rn (106)

Proof.
kQxk22 = (Qx )T (Qx ) = xT QT Qx = xT x = kxk22 .


1.4.2 Matrix Norms

Definition 1.30. (Matrix Norms) A matrix norm is a function $\|\cdot\| : \mathbb{R}^{m\times n} \to \mathbb{R}$ satisfying the following conditions for all $A, B \in \mathbb{R}^{m\times n}$ and $\alpha \in \mathbb{R}$:
1. nonnegativity: $\|A\| \ge 0$, and $\|A\| = 0 \Leftrightarrow A = 0$;
2. homogeneity: $\|\alpha A\| = |\alpha|\,\|A\|$;
3. triangle inequality: $\|A+B\| \le \|A\| + \|B\|$.

Definition 1.31. For $A \in \mathbb{R}^{m\times n}$, some of the most frequently used matrix norms are
1. F-norm: $\|A\|_F = \left(\sum_{i=1}^{m}\sum_{j=1}^{n}|a_{ij}|^2\right)^{1/2}$;
2. 1-norm (maximum column sum): $\|A\|_1 = \max_{1\le j\le n}\sum_{i=1}^{m}|a_{ij}|$;
3. $\infty$-norm (maximum row sum): $\|A\|_\infty = \max_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|$;
4. induced $p$-norm: $\|A\|_p = \sup_{x\in\mathbb{R}^n,\,x\ne 0}\frac{\|Ax\|_p}{\|x\|_p}$.

Corollary 1.7. For all $A \in \mathbb{C}^{n\times n}$,

$$\|A\|_2 \le \|A\|_F \le \sqrt{n}\,\|A\|_2, \qquad (107)$$
$$\frac{1}{\sqrt{n}}\|A\|_2 \le \|A\|_\infty \le \sqrt{n}\,\|A\|_2, \qquad (108)$$
$$\frac{1}{\sqrt{n}}\|A\|_\infty \le \|A\|_2 \le \sqrt{n}\,\|A\|_\infty, \qquad (109)$$
$$\frac{1}{\sqrt{n}}\|A\|_1 \le \|A\|_2 \le \sqrt{n}\,\|A\|_1. \qquad (110)$$

Corollary 1.8. For all $A \in \mathbb{C}^{n\times n}$, $\|A\|_2 \le \sqrt{\|A\|_1\|A\|_\infty}$.

Proof.

$$\|A\|_2^2 = \rho(A^*A) = \lambda_{\max} \le \|A^*A\|_1 \le \|A^*\|_1\|A\|_1 = \|A\|_\infty\|A\|_1,$$

where $\lambda_{\max}$ is the largest eigenvalue of $A^*A$, and we used that the spectral radius is bounded by any induced norm, and $\|A^*\|_1 = \|A\|_\infty$.

Theorem 1.26. (Matrix 2-norm and Frobenius norm invariance) The matrix 2-norm and the Frobenius norm are invariant under orthogonal transformation, i.e., if $Q$ is an $n\times n$ orthogonal matrix, then

$$\|QA\|_2 = \|A\|_2, \quad \forall A \in \mathbb{R}^{n\times n}, \qquad (111)$$
$$\|QA\|_F = \|A\|_F, \quad \forall A \in \mathbb{R}^{n\times n}. \qquad (112)$$


Theorem 1.27. (Neumann Series) Suppose that $A \in \mathbb{R}^{n\times n}$. If $\|A\| < 1$ in some induced norm, then $I - A$ is nonsingular and

$$(I-A)^{-1} = \sum_{k=0}^{\infty} A^k \qquad (113)$$

with

$$\frac{1}{1+\|A\|} \le \left\|(I-A)^{-1}\right\| \le \frac{1}{1-\|A\|}. \qquad (114)$$

Moreover, if $A$ is nonnegative, then $(I-A)^{-1} = \sum_{k=0}^{\infty} A^k$ is also nonnegative.

Proof. 1. $I - A$ is nonsingular, i.e. $(I-A)^{-1}$ exists:

$$\|(I-A)x\| \ge \|Ix\| - \|Ax\| \ge \|x\| - \|A\|\|x\| = (1-\|A\|)\|x\|.$$

So if $(I-A)x = 0$, then $x = 0$. Therefore $\ker(I-A) = \{0\}$, and $(I-A)^{-1}$ exists.

2. Let $S_N = \sum_{k=0}^{N} A^k$; we want to show $(I-A)S_N \to I$ as $N \to \infty$. First, $\|A^k\| \le \|A\|^k$, since induced norms are submultiplicative:

$$\|A^k\| = \sup_{0\ne x\in\mathbb{C}^n}\frac{\|A\,A^{k-1}x\|}{\|x\|} \le \|A\|\sup_{0\ne x\in\mathbb{C}^n}\frac{\|A^{k-1}x\|}{\|x\|} \le \cdots \le \|A\|^k.$$

Next,

$$(I-A)S_N = S_N - AS_N = \sum_{k=0}^{N} A^k - \sum_{k=1}^{N+1} A^k = A^0 - A^{N+1} = I - A^{N+1},$$

so

$$\|(I-A)S_N - I\| = \|A^{N+1}\| \le \|A\|^{N+1}.$$

Since $\|A\| < 1$, $\|A\|^{N+1} \to 0$. Therefore $(I-A)\sum_{k=0}^{\infty} A^k = I$ and

$$(I-A)^{-1} = \sum_{k=0}^{\infty} A^k.$$

3. Norm bounds: from the identity $(I-A)^{-1} = I + A(I-A)^{-1}$,

$$\|(I-A)^{-1}\| \le 1 + \|A\|\,\|(I-A)^{-1}\|, \quad\text{so}\quad (1-\|A\|)\,\|(I-A)^{-1}\| \le 1,$$

and from $1 = \|I\| = \|(I-A)(I-A)^{-1}\| \le \|I-A\|\,\|(I-A)^{-1}\| \le (1+\|A\|)\,\|(I-A)^{-1}\|$,

$$1 \le (1+\|A\|)\,\|(I-A)^{-1}\|.$$


Therefore,

$$\frac{1}{1+\|A\|} \le \left\|(I-A)^{-1}\right\| \le \frac{1}{1-\|A\|}.$$

Finally, if $A$ is nonnegative entrywise, each partial sum $S_N$ is nonnegative, and hence so is the limit $(I-A)^{-1}$.
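The Neumann series and the norm bounds (114) can be verified numerically by comparing the partial sums with the exact inverse; a small sketch, assuming NumPy is available:

```python
import numpy as np

# Approximate (I - A)^{-1} by partial sums of sum_k A^k for ||A||_2 < 1,
# and check the bounds 1/(1+||A||) <= ||(I-A)^{-1}|| <= 1/(1-||A||).
rng = np.random.default_rng(4)
A = 0.1 * rng.standard_normal((5, 5))   # small entries keep ||A||_2 < 1
I = np.eye(5)

S = np.zeros_like(A)
P = np.eye(5)
for _ in range(60):                      # S = I + A + A^2 + ... + A^59
    S += P
    P = P @ A

inv = np.linalg.inv(I - A)
assert np.allclose(S, inv, atol=1e-10)

nA = np.linalg.norm(A, 2)
nInv = np.linalg.norm(inv, 2)
assert nA < 1
assert 1 / (1 + nA) <= nInv <= 1 / (1 - nA)
```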

Lemma 1.1. Suppose that A ∈ Rn×n . If (I − A) is singular, then kAk ≥ 1.

Proof. This is the contrapositive of: if $\|A\| < 1$, then $I - A$ is nonsingular (Theorem 1.27).

Theorem 1.28. Let $A$ be a nonnegative matrix. Then $\rho(A) < 1$ if and only if $I - A$ is nonsingular and $(I-A)^{-1}$ is nonnegative.

Proof. 1. ($\Rightarrow$) This follows from Theorem 1.27, applied in a norm for which $\|A\| < 1$ (such a norm exists since $\rho(A) < 1$).

2. ($\Leftarrow$) Since $A$ is nonnegative, by the Perron-Frobenius theorem there is a nonnegative eigenvector $u$ associated with $\rho(A)$, which is an eigenvalue, i.e.

$$Au = \rho(A)u,$$

or, equivalently,

$$(I-A)^{-1}u = \frac{1}{1-\rho(A)}u.$$

Since $I - A$ is nonsingular and $(I-A)^{-1}$ is nonnegative, this shows that $1 - \rho(A) > 0$, which implies $\rho(A) < 1$.

1.5 Problems

Problem 1.2. (Prelim Jan. 2011#2) Let A ∈ Cm×n and b ∈ Cm . Prove that the vector x ∈ Cn is a least
squares solution of Ax = b if and only if r⊥ range(A), where r = b − Ax.

Solution. We already know that $x \in \mathbb{C}^n$ is a least squares solution of $Ax = b$ if and only if the normal equations hold:

$$A^*Ax = A^*b.$$

For any $y \in \mathbb{C}^n$,

$$(r, Ay) = (Ay)^*r = y^*A^*(b - Ax) = y^*(A^*b - A^*Ax) = 0,$$

so $r \perp \mathrm{range}(A)$. Each step above is reversible: if $r \perp \mathrm{range}(A)$, then $y^*(A^*b - A^*Ax) = 0$ for all $y$, hence $A^*Ax = A^*b$. This proves the equivalence. J


Problem 1.3. (Prelim Jan. 2011#3) Suppose A, B ∈ Rn×n and A is non-singular and B is singular. Prove
that
1 kA − Bk
≤ ,
κ (A) kAk

where κ (A) = kAk · A−1 , and k·k is an reduced matrix norm.

Solution. Since $B$ is singular, there exists a vector $x \ne 0$ s.t. $Bx = 0$. Since $A$ is nonsingular, $A^{-1}$ exists, and $A^{-1}Bx = 0$. Then we have

$$x = x - A^{-1}Bx = (I - A^{-1}B)x.$$

So

$$\|x\| = \|(I - A^{-1}B)x\| = \|A^{-1}(A - B)x\| \le \|A^{-1}\|\,\|A - B\|\,\|x\|.$$

Since $x \ne 0$,

$$1 \le \|A^{-1}\|\,\|A - B\|,$$

i.e.

$$\frac{1}{\kappa(A)} = \frac{1}{\|A^{-1}\|\,\|A\|} \le \frac{\|A-B\|}{\|A\|}.$$

J

Problem 1.4. (Prelim Aug. 2010#2) Suppose that A ∈ Rn×n is SPD.



1. Show that $\|x\|_A = \sqrt{x^TAx}$ defines a vector norm.
2. Let the eigenvalues of A be ordered so that 0 < λ1 ≤ λ2 ≤ · · · ≤ λn . Show that
p p
λ1 kxk2 ≤ kxkA ≤ λn kxk2 .

for any x ∈ Rn .
3. Let b ∈ Rn be given. Prove that x∗ ∈ Rn solves Ax = b if and only if x∗ minimizes the quadratic
function f : Rn → R defined by

1 T
f (x ) = x Ax − xT b.
2
√ √
Solution. 1. (a) Obviously $\|x\|_A = \sqrt{x^TAx} \ge 0$. If $x = 0$ then $\|x\|_A = 0$; conversely, if $\|x\|_A = \sqrt{x^TAx} = 0$, then $(Ax, x) = 0$, and since $A$ is SPD this forces $x = 0$.

(b) $\|\lambda x\|_A = \sqrt{\lambda x^TA\lambda x} = \sqrt{\lambda^2 x^TAx} = |\lambda|\sqrt{x^TAx} = |\lambda|\,\|x\|_A$.

(c) Next we show $\|x+y\|_A \le \|x\|_A + \|y\|_A$. First, we show $y^TAx \le \|x\|_A\|y\|_A$. Since $A$ is SPD, it has a factorization $A = R^TR$, and

$$\|Rx\|_2 = (Rx, Rx)^{1/2} = \sqrt{x^TR^TRx} = \sqrt{x^TAx} = \|x\|_A.$$

Then, by the Cauchy-Schwarz inequality,

$$y^TAx = y^TR^TRx = (Rx, Ry) \le \|Rx\|_2\,\|Ry\|_2 = \|x\|_A\,\|y\|_A.$$

And

$$\|x+y\|_A^2 = (x+y, x+y)_A = (x,x)_A + 2(x,y)_A + (y,y)_A \le \|x\|_A^2 + 2\|x\|_A\|y\|_A + \|y\|_A^2 = \big(\|x\|_A + \|y\|_A\big)^2.$$

Therefore $\|x+y\|_A \le \|x\|_A + \|y\|_A$.

2. Since $A$ is SPD, $A = R^TR$ and $\|Rx\|_2 = \|x\|_A$ as above. The singular values of $R$ are $\tilde\lambda_i = \sqrt{\lambda_i}$, where $0 < \lambda_1 \le \cdots \le \lambda_n$ are the eigenvalues of $A = R^TR$. So

$$\tilde\lambda_1\|x\|_2 \le \|Rx\|_2 = \|x\|_A \le \tilde\lambda_n\|x\|_2,$$

i.e.

$$\sqrt{\lambda_1}\,\|x\|_2 \le \|x\|_A \le \sqrt{\lambda_n}\,\|x\|_2.$$
3. Since $A$ is symmetric,

$$\frac{\partial}{\partial x_i}\big(x^TAx\big) = (Ax)_i + (A^Tx)_i = 2(Ax)_i, \qquad \frac{\partial}{\partial x_i}\big(x^Tb\big) = b_i.$$

Therefore,

$$\nabla f(x) = \frac{1}{2}\,2Ax - b = Ax - b.$$

If $Ax^* = b$, then $\nabla f(x^*) = Ax^* - b = 0$; since the Hessian $\nabla^2 f = A$ is SPD, $f$ is strictly convex, so $x^*$ minimizes the quadratic function $f$. Conversely, if $x^*$ minimizes $f$, then $\nabla f(x^*) = Ax^* - b = 0$, therefore $Ax^* = b$.
J


2 Direct Method
2.1 For squared or rectangular matrices A ∈ Cm,n , m ≥ n
2.1.1 Singular Value Decomposition

Theorem 2.1. (Reduced SVD) Suppose that A ∈ Rm×n .

$$A = \hat U\,\hat\Sigma\,\hat V^*, \qquad \hat U \in \mathbb{R}^{m\times n},\ \hat\Sigma \in \mathbb{R}^{n\times n},\ \hat V \in \mathbb{R}^{n\times n}.$$

This is called a reduced SVD of A, where
• $\sigma_i$ are the singular values and $\hat\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_n)$;
• $u_i$ are the left singular vectors and $\hat U = [u_1, u_2, \dots, u_n]$;
• $v_i$ are the right singular vectors and $\hat V = [v_1, v_2, \dots, v_n]$.

Theorem 2.2. (SVD) Suppose that A ∈ Rm×n .

$$A = U\,\Sigma\,V^*, \qquad U \in \mathbb{R}^{m\times m},\ \Sigma \in \mathbb{R}^{m\times n},\ V \in \mathbb{R}^{n\times n}.$$

This is called a (full) SVD of A, where
• $\sigma_i$ are the singular values, with $\Sigma$ carrying $\mathrm{diag}(\sigma_1, \dots, \sigma_n)$ in its top $n\times n$ block;
• $u_i$ are the left singular vectors, $U = [u_1, \dots, u_m]$, and $U$ is unitary;
• $v_i$ are the right singular vectors, $V = [v_1, \dots, v_n]$, and $V$ is unitary.

Remark 2.1. 1. SVD works for any matrices, spectral decomposition only works for squared matrices.
2. The spectral decomposition A = XΛX −1 works only if A is non-defective matrices.
For a symmetric matrix the following decompositions are equivalent to SVD.
1. Eigenvalue decomposition, i.e. $A = X\Lambda X^{-1}$. When A is symmetric, the eigenvalues are real and the eigenvectors can be chosen to be orthonormal, hence $X^TX = XX^T = I$, i.e. $X^{-1} = X^T$. The only difference is that the singular values are the magnitudes of the eigenvalues, so a column of X needs to be multiplied by $-1$ whenever the corresponding eigenvalue is negative to obtain the singular value decomposition. Hence $U = X$ and $\sigma_i = |\lambda_i|$.
2. Orthogonal decomposition: i.e. A = P DP T , where P is a unitary matrix and D is a diagonal matrix.
This exists only when matrix A is symmetric and is the same as eigenvalue decomposition.
3. Schur decomposition i.e. A = QSQT , where Q is a unitary matrix and S is an upper triangular
matrix. This can be done for any matrix. When A is symmetric, then S is a diagonal matrix and
again is the same as the eigenvalue decomposition and orthogonal decomposition.


2.1.2 Gram-Schmidt orthogonalization

Definition 2.1. (projection operator) We define the projection operator as

(u, v)
proju (v) = u,
(u, u)

where (u, v) is the inner product of the vector u and v. If u = 0, we define

proj0 (v) = 0.

Remark 2.2. 1. This operator projects the vector v orthogonally onto the line spanned by vector u.
2. the projection map proj0 is the zero map, sending every vector to the zero vector.

Definition 2.2. (Gram-Schmidt orthogonalization) The Gram-Schmidt process then works as follows:
u1
u1 = v1 , q1 =
ku1 k
u
u2 = v2 − proju1 (v2 ), q2 = 2
ku2 k
u
u3 = v3 − proju1 (v3 ) − proju2 (v3 ), q3 = 3
ku3 k
u
u4 = v4 − proju1 (v4 ) − proju2 (v4 ) − proju3 (v4 ), q4 = 4
ku4 k
.. ..
. .
k−1
X uk
uk = vk − projuj (vk ), qk = .
kuk k
j =1

In matrix form, collecting the $q_k$ into $\hat Q$ and the coefficients $r_{jk} = (q_j, a_k)$, $r_{kk} = \|u_k\|$ into an upper triangular $\hat R$:

$$A = [a_1, a_2, \dots, a_n] = [q_1, q_2, \dots, q_n]\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n}\\ & r_{22} & \cdots & r_{2n}\\ & & \ddots & \vdots\\ & & & r_{nn}\end{bmatrix}.$$
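The Gram-Schmidt process above translates directly into a reduced QR factorization; a minimal classical Gram-Schmidt implementation, assuming NumPy is available (not numerically robust, but faithful to the formulas):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Reduced QR via classical Gram-Schmidt: A = Q R, Q^T Q = I, R upper triangular."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for j in range(k):                # subtract projections onto earlier q_j
            R[j, k] = Q[:, j] @ A[:, k]
            v -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(v)       # r_kk = ||u_k||
        Q[:, k] = v / R[k, k]
    return Q, R

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
Q, R = gram_schmidt_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(4), atol=1e-10)
assert np.allclose(R, np.triu(R))
```

In practice modified Gram-Schmidt or Householder reflectors are preferred for numerical stability; the classical version here mirrors the projection formulas above.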

Definition 2.3. (projector) A projector is a square matrix P that satisfies

P2 = P.

Definition 2.4. (complementary projector) If P is a projector, then

I −P

is also a projector and is called complementary projector.

Definition 2.5. (orthogonal projector) If P is a orthogonal projector if only if

P = P ∗.

The complement of an orthogonal projector is also orthogonal projector.


Definition 2.6. (projection with orthonormal basis) If P is a orthogonal projector, then P = P ∗ and P
has SVD, i.e. P = QΣQ∗ . Since an orthogonal projector has some singular values equal to zero (except the
identity map P=I), it is natural to drop the silent columns of Q and use the reduced rather than full SVD,
i.e.

P = Q̂Q̂∗ .

The complement projects onto the space orthogonal to range(Q̂).

Definition 2.7. (Gram- Schmidt projections)

P = I − Q̂Q̂∗ .

The complement projects onto the space orthogonal to range(Q̂).

Definition 2.8. (Householder reflectors) A Householder reflector F is a matrix of the form

$$F = I - 2\frac{vv^*}{v^*v} = I - 2\frac{vv^*}{\|v\|_2^2}.$$
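A reflector can be built to zero out all but the first entry of a vector; a small numerical illustration with the standard choice of $v$, assuming NumPy is available:

```python
import numpy as np

# Householder reflector F = I - 2 v v^T / (v^T v); with the standard choice
# v = x + sign(x_1) ||x|| e_1, F maps x to (a multiple of) ||x|| e_1.
rng = np.random.default_rng(6)
x = rng.standard_normal(5)

v = x.copy()
v[0] += np.sign(x[0]) * np.linalg.norm(x)
F = np.eye(5) - 2.0 * np.outer(v, v) / (v @ v)

y = F @ x
assert np.allclose(np.abs(y[0]), np.linalg.norm(x))   # |y_1| = ||x||
assert np.allclose(y[1:], 0.0, atol=1e-12)            # remaining entries zeroed
assert np.allclose(F @ F, np.eye(5), atol=1e-12)      # a reflector: F^2 = I
```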

Comparison 2.1. (Gram-Schmidt and Householder)

Gram-Schmidt: $A\,R_1R_2\cdots R_n = \hat Q$, where $R_1R_2\cdots R_n = \hat R^{-1}$ (triangular orthogonalization);
Householder: $Q_n\cdots Q_2Q_1A = R$, where $Q_n\cdots Q_2Q_1 = Q^*$ (orthogonal triangularization).

2.1.3 QR Decomposition

Theorem 2.3. (Reduced QR Decomposition) Suppose that A ∈ Cm×n .

$$A = \hat Q\,\hat R, \qquad \hat Q \in \mathbb{C}^{m\times n},\ \hat R \in \mathbb{C}^{n\times n}.$$

This is called a Reduced QR Decomposition of A. where


• Q̂ ∈ Cm×n – with orthonormal columns.
• R̂ ∈ Cn×n – upper triangular matrix.

Theorem 2.4. (QR Decomposition) Suppose that A ∈ Cm×n .

$$A = Q\,R, \qquad Q \in \mathbb{C}^{m\times m},\ R \in \mathbb{C}^{m\times n}.$$

This is called a QR Decomposition of A. where


• Q ∈ Cm×m – is unitary.
• R ∈ Cm×n – upper triangular matrix.

Theorem 2.5. (Existence of QR Decomposition) Every A ∈ Cm×n has full and reduced QR decomposition.


Theorem 2.6. (Uniqueness of QR Decomposition) Each A ∈ Cm×n of full rank has a unique reduced QR
decomposition A = Q̂R̂ with rjj > 0.

2.2 For squared matrices A ∈ Cn,n


A problem can be read as

f : D → S
Data → Solution

2.2.1 Condition number

Definition 2.9. (Well posedness ) We say that a problem is well- posed if the solution depends continuously
on the data, otherwise we say it is ill-posed.

Definition 2.10. (absolute condition number) The absolute condition number $\hat\kappa = \hat\kappa(x)$ of the problem $f$ at $x$ is defined as

$$\hat\kappa = \lim_{\delta\to 0}\sup_{\|\delta x\|\le\delta}\frac{\|f(x+\delta x)-f(x)\|}{\|\delta x\|}.$$

If $f$ is (Fréchet) differentiable,

$$\hat\kappa = \|Df(x)\|.$$

Example 2.1. f: R2 → R and f (x1 , x2 ) = x1 − x2 , then

∂f ∂f
Df (x ) = [ , ] = [1, −1].
∂x1 ∂x2

and

κ̂ = Df (x ) ∞ = 1.

Definition 2.11. (relative condition number) The relative condition number $\kappa = \kappa(x)$ of the problem $f$ at $x$ is defined as

$$\kappa = \lim_{\delta\to 0}\sup_{\|\delta x\|\le\delta}\frac{\|f(x+\delta x)-f(x)\|/\|f(x)\|}{\|\delta x\|/\|x\|} = \frac{\|x\|}{\|f(x)\|}\,\hat\kappa.$$


Definition 2.12. (condition number of matrix-vector multiplication) The relative condition number of $f(x) = Ax$ is

$$\kappa = \lim_{\delta\to 0}\sup_{\|\delta x\|\le\delta}\frac{\|A(x+\delta x)-Ax\|/\|Ax\|}{\|\delta x\|/\|x\|} = \|A\|\frac{\|x\|}{\|Ax\|}.$$

Theorem 2.7. (condition of matrix-vector multiplication) Since $\|x\| = \|A^{-1}Ax\| \le \|A^{-1}\|\,\|Ax\|$, we have $\frac{\|x\|}{\|Ax\|} \le \|A^{-1}\|$. So

$$\kappa \le \|A\|\,\|A^{-1}\|.$$

In particular, in the 2-norm this bound is attained for some $x$:

$$\kappa = \|A\|_2\,\|A^{-1}\|_2.$$

Definition 2.13. (condition number of a matrix) Let $A \in \mathbb{C}^{n\times n}$ be invertible. The condition number of A is

$$\kappa(A)_{\|\cdot\|} = \|A\|\,\|A^{-1}\|.$$

In particular,

$$\kappa_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n},$$

where $\sigma_1 \ge \cdots \ge \sigma_n$ are the singular values of A, so that $\|A\|_2 = \sigma_1$ and $\|A^{-1}\|_2 = 1/\sigma_n$.
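The SVD formula for $\kappa_2$ can be checked directly against the norm definition; a small sketch, assuming NumPy is available:

```python
import numpy as np

# kappa_2(A) = sigma_max / sigma_min = ||A||_2 ||A^{-1}||_2;
# compare the SVD formula with NumPy's built-in condition number.
rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))

s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
kappa = s[0] / s[-1]
assert np.isclose(kappa, np.linalg.cond(A, 2))
assert np.isclose(kappa,
                  np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))
```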

2.2.2 LU Decomposition

Definition 2.14. (LU Decomposition without pivoting) Let A ∈ Cn×n . An LU factorization refers to the
factorization of A, with proper row and/or column orderings or permutations, into two factors, a lower
triangular matrix L and an upper triangular matrix U ,

A = LU .
In the lower triangular matrix all elements above the diagonal are zero, in the upper triangular matrix,
all the elements below the diagonal are zero. For example, for a 3-by-3 matrix A, its LU decomposition
looks like this:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0\\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33}\end{bmatrix}\begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33}\end{bmatrix}.$$


Definition 2.15. (LU Decomposition with partial pivoting) The LU factorization with Partial Pivoting
refers often to the LU factorization with row permutations only,

P A = LU ,

where L and U are again lower and upper triangular matrices, and P is a permutation matrix which, when
left-multiplied to A, reorders the rows of A.

Definition 2.16. (LU Decomposition with full pivoting) An LU factorization with full pivoting involves
both row and column permutations,

P AQ = LU ,

where L, U and P are defined as before, and Q is a permutation matrix that reorders the columns of A

Definition 2.17. (LDU Decomposition) An LDU decomposition is a decomposition of the form

A = L̃D Ũ ,

where D is a diagonal matrix and L and U are unit triangular matrices , meaning that all the entries on
the diagonals of L and U are one.
For example, for a 3-by-3 matrix A, its LDU decomposition looks like this:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix} = \begin{bmatrix} 1 & 0 & 0\\ l_{21} & 1 & 0\\ l_{31} & l_{32} & 1\end{bmatrix}\begin{bmatrix} d_1 & 0 & 0\\ 0 & d_2 & 0\\ 0 & 0 & d_3\end{bmatrix}\begin{bmatrix} 1 & u_{12} & u_{13}\\ 0 & 1 & u_{23}\\ 0 & 0 & 1\end{bmatrix}.$$

Theorem 2.8. (existence of decompositions) Any square matrix A admits an LUP factorization. If A is invertible, then it admits an LU (or LDU) factorization if and only if all its leading principal minors are nonzero. If A is a singular matrix of rank k, then it admits an LU factorization if the first k leading principal minors are nonzero, although the converse is not true.

2.2.3 Cholesky Decomposition

Definition 2.18. (Cholesky Decomposition) In linear algebra, the Cholesky decomposition or Cholesky
factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower trian-
gular matrix and its conjugate transpose,

A = LL∗ .

Definition 2.19. (LDM Decomposition) Let $A \in \mathbb{R}^{n\times n}$ with all leading principal minors $\det(A(1{:}k, 1{:}k)) \ne 0$, $k = 1, \dots, n-1$. Then there exist unique unit lower triangular matrices $\tilde L$ and $M$ and a unique diagonal matrix $D = \mathrm{diag}(d_1, \dots, d_n)$, such that

A = L̃DM T .


Definition 2.20. (LDL Decomposition) A closely related variant of the classical Cholesky decomposition
is the LDL decomposition,

A = L̃D L̃∗ ,

where L is a lower unit triangular (unitriangular) matrix and D is a diagonal matrix.

Remark 2.3. This decomposition is related to the classical Cholesky decomposition, of the form LL∗ , as follows:
$$A = \tilde L D \tilde L^* = \tilde L D^{\frac{1}{2}} D^{\frac{1}{2}*}\tilde L^* = \big(\tilde L D^{\frac{1}{2}}\big)\big(\tilde L D^{\frac{1}{2}}\big)^*.$$

The LDL variant, if efficiently implemented, requires the same space and computational complexity to construct
and use but avoids extracting square roots. Some indefinite matrices for which no Cholesky decomposition exists
have an LDL decomposition with negative entries in D. For these reasons, the LDL decomposition may be pre-
ferred. For real matrices, the factorization has the form A = LDLT and is often referred to as LDLT decomposition
(or LDLT decomposition). It is closely related to the eigendecomposition of real symmetric matrices, A = QΛQT .
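The relation $A = \tilde L D \tilde L^T$ with $\tilde L D^{1/2}$ equal to the Cholesky factor can be demonstrated numerically; a small sketch, assuming NumPy is available:

```python
import numpy as np

# Recover the LDL^T factorization of an SPD matrix from its Cholesky factor:
# if A = C C^T with C lower triangular, then Lt = C diag(C)^{-1} is unit
# lower triangular and A = Lt D Lt^T with D = diag(C)^2.
rng = np.random.default_rng(8)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)           # SPD by construction

C = np.linalg.cholesky(A)
d = np.diag(C)
Lt = C / d                             # divide column j by c_jj: unit lower triangular
D = np.diag(d**2)

assert np.allclose(Lt @ D @ Lt.T, A)
assert np.allclose(np.diag(Lt), 1.0)
```

This matches the remark: $\tilde L D^{1/2} = C$ is exactly the classical Cholesky factor, while the LDL form avoids the square roots.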

2.2.4 The Relationship of the Existing Decomposition


From last subsection, If A = A∗ , then
1. diagonal elements of A are real and positive .
2. principal sub matrices of A are HPD .

Comparison 2.2. (relations among the factorizations)

$A = \tilde LDM^*$: when $A = A^*$, $\tilde L = M$, so $A = \tilde LDM^* = \tilde LD\tilde L^*$;
$A = \tilde LD\tilde L^*$: with $L = \tilde LD^{\frac{1}{2}}$, $A = \tilde LD^{\frac{1}{2}}D^{\frac{1}{2}*}\tilde L^* = LL^*$;
$A = LU$: here $U = L^*$, so $A = LU = LL^*$.

2.2.5 Regular Splittings[3]

Definition 2.21. (Regular Splittings) Let A, M, N be three given matrices satisfying

A = M − N.

The pair of matrices M, N is a regular splitting of A, if M is nonsingular and M −1 and N are nonnegative .

Theorem 2.9. (The eigenvalue radius estimation of Regular Splittings[3]) Let M, N be a regular splitting of A. Then

$$\rho(M^{-1}N) < 1$$

if and only if A is nonsingular and $A^{-1}$ is nonnegative.

Proof. 1. ($\Rightarrow$) Define $G = M^{-1}N$. Since $\rho(G) < 1$, $I - G$ is nonsingular, and $A = M(I-G)$, so A is nonsingular. Moreover, since $G = M^{-1}N$ is nonnegative and $\rho(G) < 1$, Theorem 1.28 gives that $(I-G)^{-1}$ is nonnegative, and hence so is $A^{-1} = (I-G)^{-1}M^{-1}$.


2. ⇐: since A,M are nonsingular and A−1 is nonnegative, then A = M (I − G ) is nonsingular. Moreover
 −1
A−1 N = M (I − M −1 N ) N
= (I − M −1 N )−1 M −1 N
= (I − G )−1 G.
Clearly, G = M −1 N is nonnegative by the assumptions, and as a result of the Perron-Frobenius theo-
rem, there is a nonnegative eigenvector x associated with ρ (G ) which is an eigenvalue, such that

Gx = ρ (G )x.

Therefore

$$A^{-1}Nx = \frac{\rho(G)}{1-\rho(G)}\,x.$$

Since $x$ and $A^{-1}N$ are nonnegative, this shows that

$$\frac{\rho(G)}{1-\rho(G)} \ge 0,$$

and this can be true only when $0 \le \rho(G) \le 1$. Since $I - G$ is nonsingular, $\rho(G) \ne 1$, which implies $\rho(G) < 1$.

2.3 Problems

Problem 2.1. (Prelim Aug. 2010#1) Prove that A ∈ Cm×n (m > n) and let A = Q̂R̂ be a reduced QR
factorization.
1. Prove that A has rank n if and only if all the diagonal entries of R̂ are non-zero.
2. Suppose rank(A) = n, and define P = Q̂Q̂∗ . Prove that range(P ) = range(A).
3. What type of matrix is P?

Solution. 1. Since $\hat Q \in \mathbb{C}^{m\times n}$ has orthonormal columns, it is injective, so $\mathrm{rank}(A) = \mathrm{rank}(\hat Q\hat R) = \mathrm{rank}(\hat R)$. The upper triangular matrix $\hat R \in \mathbb{C}^{n\times n}$ has rank n if and only if

$$\det(\hat R) = \prod_{i=1}^{n} r_{ii} \ne 0.$$

Therefore, A has rank n if and only if all the diagonal entries of $\hat R$ are non-zero.
2. (a) range(A) ⊆ range(P ): Let y ∈ range(A), that is to say there exists a x ∈ Cn s.t. Ax = y. Then by
reduced QR factorization we have y = Q̂R̂x. then

P y = P Q̂R̂x = Q̂Q̂∗ Q̂R̂x = Q̂R̂x = Ax = y.

therefore y ∈ range(P ).
(b) range(P ) ⊆ range(A): Let v ∈ range(P ), that is to say there exists a v ∈ Cn , s.t. v = P v = Q̂Q̂∗ v.
Claim 2.1.

Q̂Q̂∗ = A (A∗ A)−1 A∗ .


Proof.

$$A(A^*A)^{-1}A^* = \hat Q\hat R\big(\hat R^*\hat Q^*\hat Q\hat R\big)^{-1}\hat R^*\hat Q^* = \hat Q\hat R\big(\hat R^*\hat R\big)^{-1}\hat R^*\hat Q^* = \hat Q\hat R\hat R^{-1}\big(\hat R^*\big)^{-1}\hat R^*\hat Q^* = \hat Q\hat Q^*.$$

J
Therefore, by the claim, we have

$$v = Pv = \hat Q\hat Q^*v = A(A^*A)^{-1}A^*v = Ax,$$

where $x = (A^*A)^{-1}A^*v$. Hence $v \in \mathrm{range}(A)$.


3. P is an orthogonal projector.
J

Problem 2.2. (Prelim Aug. 2010#4) Prove that A ∈ Rn×n is SPD if and only if it has a Cholesky factor-
ization.

Solution. 1. Since A is SPD, all its leading principal minors are positive, so by Theorem 2.8 it admits an LDL$^T$ factorization $A = \tilde LD\tilde L^T$ with $D = \mathrm{diag}(d_1, \dots, d_n)$, $d_i > 0$. Setting $U = D^{1/2}\tilde L^T$ gives

$$A = U^TU$$

with U upper triangular and nonsingular, i.e. a Cholesky factorization.
2. If A has a Cholesky factorization, i.e. $A = U^TU$ with U nonsingular, then A is clearly symmetric, and

$$x^TAx = x^TU^TUx = (Ux)^TUx.$$

Let $y = Ux$; then

$$x^TAx = y^Ty = y_1^2 + y_2^2 + \cdots + y_n^2 \ge 0,$$

with equality only when $y = 0$, i.e. $x = 0$ (since U is nonsingular). Hence A is SPD.
J

Problem 2.3. (Prelim Aug. 2009#2) Prove that for any matrix A ∈ Cn×n , singular or nonsingular, there
exists a permutation matrix P ∈ Rn×n such that PA has an LU factorization, i.e. PA=LU.

Solution. J

Problem 2.4. (Prelim Aug. 2009#4) Let $A \in \mathbb{C}^{n\times n}$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$ be its singular values.
1. Let $\lambda$ be an eigenvalue of A. Show that $|\lambda| \le \sigma_1$.
2. Show that $|\det(A)| = \prod_{j=1}^{n}\sigma_j$.

Solution. 1. Since $\sigma_1 = \|A\|_2$, it suffices to show $|\lambda| \le \|A\|_2$. Let $x \ne 0$ be an eigenvector for $\lambda$. Then

$$|\lambda|\,\|x\|_2 = \|\lambda x\|_2 = \|Ax\|_2 \le \|A\|_2\,\|x\|_2.$$

Therefore $|\lambda| \le \sigma_1$.

2. Using the SVD $A = U\Sigma V^*$ and $|\det(U)| = |\det(V^*)| = 1$ (U, V unitary),

$$|\det(A)| = |\det(U)|\,|\det(\Sigma)|\,|\det(V^*)| = \det(\Sigma) = \prod_{j=1}^{n}\sigma_j.$$

Problem 2.5. (Prelim Aug. 2009#4) Let

Solution. J


3 Iterative Method
3.1 Diagonal dominant

Definition 3.1. (Diagonal dominance of size $\delta$) $A \in \mathbb{C}^{n\times n}$ is diagonally dominant of size $\delta > 0$ if

$$|a_{ii}| \ge \sum_{j\ne i}|a_{ij}| + \delta, \qquad i = 1, \dots, n.$$

Properties 3.1. If A ∈ Cn×n is diagonally dominant of size δ > 0, then

1. A−1 exists.

2. ‖A−1 ‖∞ ≤ 1/δ.

Proof. 1. Let b = Ax and choose k ∈ {1, 2, . . . , n} such that ‖x‖∞ = |xk |. Note that bk = Σⁿj=1 akj xj . Since

|akk | ≥ Σj≠k |akj | + δ,

and

|Σj≠k akj xj | ≤ Σj≠k |akj ||xj | ≤ ‖x‖∞ Σj≠k |akj |,

we have

|bk | = |Σⁿj=1 akj xj |
     = |akk xk + Σj≠k akj xj |
     ≥ |akk xk | − |Σj≠k akj xj |
     ≥ |akk | ‖x‖∞ − ‖x‖∞ Σj≠k |akj |
     ≥ δ ‖x‖∞ .

So ‖Ax‖∞ = ‖b‖∞ ≥ |bk | ≥ δ ‖x‖∞ . If Ax = 0, then x = 0. So ker(A) = {0}, and therefore A−1 exists.

2. Since ‖Ax‖∞ ≥ δ ‖x‖∞ for every x, taking x = A−1 y gives ‖y‖∞ ≥ δ ‖A−1 y‖∞ for every y, i.e. ‖A−1 ‖∞ ≤ 1/δ.
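Properties 3.1 can be checked numerically. The following is a minimal sketch in Python (NumPy); the test matrix is an arbitrary diagonally dominant example chosen for illustration, not taken from the note.

```python
import numpy as np

# A sample matrix: each row i satisfies |a_ii| >= sum_{j != i} |a_ij| + delta.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.5, 1.0, 3.0]])

# Largest delta for which A is diagonally dominant of size delta.
off_diag = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
delta = np.min(np.abs(np.diag(A)) - off_diag)

# Properties 3.1: A is invertible and ||A^{-1}||_inf <= 1/delta.
inv_norm = np.linalg.norm(np.linalg.inv(A), np.inf)
print(delta, inv_norm, 1.0 / delta)
```

Here delta = 1.5 for this matrix, and the computed ∞-norm of A⁻¹ indeed stays below 1/δ.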

3.2 General Iterative Scheme


An iterative scheme for the solution

Ax = b, (115)


is a sequence given by

xk+1 = φ(A, b, xk , · · · , xk−r ).

1. r = 0: two-layer scheme.

2. r ≥ 1: multi-layer scheme.
3. If φ is a linear function of its arguments, then the scheme is linear; otherwise it is nonlinear.
4. The scheme is convergent if xk → x as k → ∞.

Definition 3.2. (General Iterative Scheme) A general linear two-layer iterative scheme reads

Bk (xk+1 − xk )/αk + Axk = b.

1. αk ∈ R, Bk ∈ Cn×n are the iterative parameters.

2. If αk = α, Bk = B, then the method is stationary.
3. If Bk = I, then the method is explicit.

If xk → x0 , then x0 solves Ax = b. Indeed, passing to the limit,

Bk (x0 − x0 )/αk + Ax0 = b,

i.e.

Ax0 = b.

Now, consider the stationary scheme, i.e.

B (xk+1 − xk )/α + Axk = b.

Then we get

xk+1 = xk + αB−1 (b − Axk ).

Definition 3.3. (Error Transfer Operator) Let ek = x − xk , where x is the exact solution and xk is the approximate solution at step k. Then

x = x + αB−1 (b − Ax ),
xk+1 = xk + αB−1 (b − Axk ).

Subtracting, we get

ek+1 = ek − αB−1 Aek = (I − αB−1 A)ek := T ek .

T = I − αB−1 A is the error transfer operator.

With the error transfer operator defined, the iteration can be written as

xk+1 = T xk + αB−1 b.


Theorem 3.1. (sufficient condition for convergence) A sufficient condition for convergence is

‖T ‖ < 1. (116)

Theorem 3.2. (sufficient & necessary condition for convergence) The sufficient and necessary condition for convergence is

ρ (T ) < 1, (117)

where ρ (T ) is the spectral radius of T.

3.3 Stationary cases iterative method


3.3.1 Jacobi Method

Definition 3.4. (Jacobi Method) Let

A = L + D + U.

A Jacobi Method scheme reads

D (xk+1 − xk ) + Axk = b,

i.e. αk = 1, B = D in the general iterative scheme.

Definition 3.5. (Error Transfer Operator for Jacobi Method) The error transfer operator for the Jacobi Method is

TJ = I − D−1 A.

Remark 3.1. Since

A = L + D + U

and

D (xk+1 − xk ) + Axk = b,

we have

D (xk+1 − xk ) + (L + D + U )xk = Lxk + Dxk+1 + U xk = b.

So, the Jacobi iterative method can be written componentwise as

Σj<i aij xjᵏ + aii xiᵏ⁺¹ + Σj>i aij xjᵏ = bi ,

or

xiᵏ⁺¹ = (1/aii ) (bi − Σj≠i aij xjᵏ ).
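The componentwise Jacobi update above can be sketched in Python (NumPy); the test matrix, right-hand side, and tolerance below are arbitrary illustrative choices.

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, maxiter=500):
    """Jacobi iteration: x_i^{k+1} = (b_i - sum_{j != i} a_ij x_j^k) / a_ii."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    D = np.diag(A)            # diagonal entries of A
    R = A - np.diagflat(D)    # off-diagonal part L + U
    for _ in range(maxiter):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

# A diagonally dominant example, so Theorem 3.3 guarantees convergence.
A = np.array([[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [0.5, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
x = jacobi(A, b)
print(np.linalg.norm(A @ x - b, np.inf))  # residual near zero
```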


Theorem 3.3. (convergence of the Jacobi Method) If A is diagonally dominant, then the Jacobi Method converges.

Proof. We want to show that if A is diagonally dominant, then ‖TJ ‖∞ < 1, which implies convergence. From the definition of T, the error transfer operator for the Jacobi Method is

TJ = I − D−1 A,

whose entries are tij = 0 if i = j and tij = −aij /aii if i ≠ j. So,

‖T ‖∞ = maxi Σj≠i |tij | = maxi Σj≠i |aij |/|aii |.

Since A is diagonally dominant,

|aii | ≥ Σj≠i |aij | + δ.

Therefore,

Σj≠i |aij |/|aii | ≤ 1 − δ/|aii | < 1.

Hence, ‖T ‖∞ < 1.

3.3.2 Gauss-Seidel Method

Definition 3.6. (Gauss-Seidel Method) Let

A = L + D + U.

A Gauss-Seidel Method scheme reads

(L + D )(xk+1 − xk ) + Axk = b,

i.e. αk = 1, B = L + D in the general iterative scheme.

Definition 3.7. (Error Transfer Operator for Gauss-Seidel Method) The error transfer operator for the Gauss-Seidel Method is

TGS = I − (L + D )−1 A
    = I − (L + D )−1 (L + D + U )
    = −(L + D )−1 U .


Remark 3.2. The Gauss-Seidel method is an iterative technique for solving a square system of n linear equations with unknown x:

Ax = b.

It is defined by the iteration

L∗ x(k+1) = b − U x(k) ,

where the matrix A is decomposed into a lower triangular component L∗ and a strictly upper triangular component U: A = L∗ + U .
In more detail, writing out A, x and b in components,

A = [ a11 a12 · · · a1n
      a21 a22 · · · a2n
       .   .  . .   .
      an1 an2 · · · ann ],  x = (x1 , x2 , . . . , xn )T ,  b = (b1 , b2 , . . . , bn )T .

Then the decomposition of A into its lower triangular component and its strictly upper triangular component is given by A = L∗ + U , where

L∗ = [ a11  0  · · ·  0
       a21 a22 · · ·  0
        .   .  . .    .
       an1 an2 · · · ann ],  U = [ 0 a12 · · · a1n
                                   0  0  · · · a2n
                                   .  .  . .   .
                                   0  0  · · ·  0 ].

The system of linear equations may be rewritten as

L∗ x = b − U x.

The Gauss-Seidel method now solves the left hand side of this expression for x, using the previous value of x on the right hand side. Analytically, this may be written as

x(k+1) = L∗−1 (b − U x(k) ).

However, by taking advantage of the triangular form of L∗ , the elements of x(k+1) can be computed sequentially using forward substitution:

xi(k+1) = (1/aii ) ( bi − Σj<i aij xj(k+1) − Σj>i aij xj(k) ), i = 1, 2, . . . , n.

The procedure is generally continued until the changes made by an iteration are below some tolerance, such as a sufficiently small residual.
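The forward-substitution sweep above can be sketched in Python (NumPy); the test system is an arbitrary diagonally dominant example, not from the note.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, maxiter=500):
    """Gauss-Seidel sweep: components x_j^{(k+1)} with j < i are used
    as soon as they are available (forward substitution)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(maxiter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]           # already-updated entries
            s2 = A[i, i+1:] @ x_old[i+1:]   # previous iterate
            x[i] = (b[i] - s1 - s2) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [0.5, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
print(np.linalg.norm(A @ x - b, np.inf))  # residual near zero
```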

Theorem 3.4. (convergence of the Gauss-Seidel Method) If A is diagonally dominant, then the Gauss-Seidel Method converges.

Proof. We want to show that if A is diagonally dominant, then ‖TGS ‖∞ < 1, which implies convergence. From the definition of T, the error transfer operator for the Gauss-Seidel Method is

TGS = −(L + D )−1 U .

Since A is diagonally dominant,

|aii | ≥ Σj≠i |aij | + δ = Σj>i |aij | + Σj<i |aij | + δ.

So,

|aii | − Σj<i |aij | ≥ Σj>i |aij | + δ,

which implies

γ := maxi ( Σj>i |aij | / (|aii | − Σj<i |aij |) ) < 1.

Now, we will show ‖TGS ‖∞ ≤ γ. Let x ∈ Cn and y = TGS x = −(L + D )−1 U x, so that (L + D )y = −U x. Let i0 be the index such that ‖y‖∞ = |yi0 |. Then

|((L + D )y )i0 | = |(U x )i0 | = |Σj>i0 ai0 j xj | ≤ Σj>i0 |ai0 j ||xj | ≤ Σj>i0 |ai0 j | ‖x‖∞ .

Moreover,

|((L + D )y )i0 | = |Σj<i0 ai0 j yj + ai0 i0 yi0 | ≥ |ai0 i0 ||yi0 | − Σj<i0 |ai0 j ||yj | ≥ |ai0 i0 | ‖y‖∞ − Σj<i0 |ai0 j | ‖y‖∞ .

Therefore, we have

( |ai0 i0 | − Σj<i0 |ai0 j | ) ‖y‖∞ ≤ Σj>i0 |ai0 j | ‖x‖∞ ,

which implies

‖y‖∞ ≤ ( Σj>i0 |ai0 j | / (|ai0 i0 | − Σj<i0 |ai0 j |) ) ‖x‖∞ ≤ γ ‖x‖∞ .

So ‖TGS x‖∞ ≤ γ ‖x‖∞ for every x, which implies ‖TGS ‖∞ ≤ γ < 1.

3.3.3 Richardson Method

Definition 3.8. (Richardson Method) Let

A = L + D + U.

A Richardson Method scheme reads

I (xk+1 − xk )/ω + Axk = b,

i.e. αk = ω (in general ≠ 1), B = I in the general iterative scheme.


Definition 3.9. (Error Transfer Operator for Richardson Method) The error transfer operator for the Richardson Method is

TRC = I − ωB−1 A = I − ωA.

Remark 3.3. Richardson iteration is an iterative method for solving a system of linear equations. Richardson iteration was proposed by Lewis Richardson in his work dated 1910. It is similar to the Jacobi and Gauss-Seidel methods. We seek the solution to a set of linear equations, expressed in matrix terms as

Ax = b.

The Richardson iteration is

x(k+1) = (I − ωA)x(k) + ωb,

where ω is a scalar parameter that has to be chosen such that the sequence x(k) converges.
It is easy to see that the method has the correct fixed points, because if it converges, then x(k+1) ≈ x(k) and x(k) has to approximate a solution of Ax = b.
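The iteration in Remark 3.3 can be sketched in Python (NumPy), using the optimal parameter ω = 2/(λmin + λmax ); the SPD test matrix is an arbitrary example chosen for illustration.

```python
import numpy as np

def richardson(A, b, omega, x0=None, tol=1e-10, maxiter=2000):
    """Richardson iteration: x^{k+1} = (I - omega*A) x^k + omega*b."""
    x = np.zeros(len(b)) if x0 is None else x0.astype(float)
    for _ in range(maxiter):
        x_new = x + omega * (b - A @ x)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD example
b = np.array([1.0, 2.0])
lam = np.linalg.eigvalsh(A)
omega_opt = 2.0 / (lam[0] + lam[-1])    # 2 / (lambda_min + lambda_max)
x = richardson(A, b, omega_opt)
print(np.linalg.norm(A @ x - b, np.inf))
```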

Theorem 3.5. (convergence of the Richardson Method) Let A = A∗ > 0 (SPD). If 0 < ω < 2/λmax , then the Richardson Method converges. Moreover, the best acceleration parameter is given by

ωopt = 2/(λmin + λmax ),

where λmin and λmax are the smallest and largest eigenvalues of A.

Proof. 1. From the above definition, we know that the error transfer operator is

TRC = I − ωB−1 A = I − ωA.

Let λ ∈ σ (A); then ν := 1 − ωλ ∈ σ (T ). From the sufficient and necessary condition for convergence, the Richardson Method converges iff ρ (T ) < 1, i.e.

|1 − ωλ| < 1 for all λ ∈ σ (A),

which (for ω > 0) is equivalent to

−1 < 1 − ωλmax ≤ 1 − ωλmin < 1.

So, from −1 < 1 − ωλmax we get

ω < 2/λmax .

2. The minimum of ρ (TRC ) = max{|1 − ωλmin |, |1 − ωλmax |} is attained at |1 − ωλmax | = |1 − ωλmin | (Figure 1), i.e.

ωλmax − 1 = 1 − ωλmin .

Therefore, we get

ωopt = 2/(λmin + λmax ).


[Figure 1: The curve of ρ (TRC ) as a function of ω: the two branches |1 − ωλmax | and |1 − ωλmin | intersect at ωopt , which lies between 1/λmax and 1/λmin .]

3.3.4 Successive Over Relaxation (SOR) Method

Definition 3.10. (SOR Method) Let

A = L + D + U.

A SOR Method scheme reads

(ωL + D )(xk+1 − xk )/ω + Axk = b,

i.e. αk = ω, B = ωL + D in the general iterative scheme.

Remark 3.4. For the Gauss-Seidel method, we have

Lxk+1 + Dxk+1 + U xk = b.

If we relax the contribution of the diagonal part, i.e. let ω > 0 and split

D = ω−1 D + (1 − ω−1 )D,

then

Lxk+1 + ω−1 Dxk+1 + (1 − ω−1 )Dxk + U xk = b,

and we obtain

(L + ω−1 D )xk+1 + ((1 − ω−1 )D + U )xk = b.

• ω = 1: Gauss-Seidel method,
• ω < 1: under-relaxation method,
• ω > 1: over-relaxation method.

We can rewrite the above formula to get the general form:

(L + ω−1 D )xk+1 + ((1 − ω−1 )D + U )xk = b
(L + ω−1 D )xk+1 + (D − ω−1 D + U + L − L)xk = b
(L + ω−1 D )xk+1 + (A − (L + ω−1 D ))xk = b
(L + ω−1 D )(xk+1 − xk ) + Axk = b
(ωL + D )(xk+1 − xk )/ω + Axk = b.
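The relaxed sweep derived above can be sketched in Python (NumPy); the SPD test matrix and ω = 1.25 are arbitrary illustrative choices (any 0 < ω < 2 converges for SPD A by Theorem 3.7 below).

```python
import numpy as np

def sor(A, b, omega, x0=None, tol=1e-10, maxiter=2000):
    """SOR sweep: a Gauss-Seidel step with the diagonal relaxed by omega."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(maxiter):
        x_old = x.copy()
        for i in range(n):
            s1 = A[i, :i] @ x[:i]
            s2 = A[i, i+1:] @ x_old[i+1:]
            gs = (b[i] - s1 - s2) / A[i, i]   # plain Gauss-Seidel value
            x[i] = (1 - omega) * x_old[i] + omega * gs
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x

A = np.array([[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 3.0]])  # SPD
b = np.array([1.0, 2.0, 3.0])
x = sor(A, b, omega=1.25)
print(np.linalg.norm(A @ x - b, np.inf))
```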


Definition 3.11. (Error Transfer Operator for SOR Method) The error transfer operator for the SOR Method is

TSOR = I − αB−1 A = I − ω (ωL + D )−1 A = −(L + ω−1 D )−1 ((1 − ω−1 )D + U ).

Theorem 3.6. (Necessary condition for convergence of the SOR Method) If SOR method convergences,
then 0 < ω < 2.

Proof. If the SOR method converges, then ρ (T ) < 1, i.e. |λi | < 1 for every eigenvalue λi of TSOR . Let λi be the roots of the characteristic polynomial XT (λ) = det (λI − T ) = (−1)n Πⁿi=1 (λ − λi ). Then

|XT (0)| = Πⁿi=1 |λi | = |det (TSOR )|.

Since |λi | < 1, we have |det (TSOR )| < 1. Since TSOR = −(L + ω−1 D )−1 ((1 − ω−1 )D + U ), and the determinant of a triangular matrix is the product of its diagonal entries,

|det (TSOR )| = |det ((1 − ω−1 )D + U )| / |det (L + ω−1 D )|
             = |det ((1 − ω−1 )D )| / |det (ω−1 D )|
             = |Πⁿi=1 (1 − ω−1 )aii | / |Πⁿi=1 ω−1 aii |
             = |1 − ω−1 |ⁿ / ω−ⁿ = |ω − 1|ⁿ < 1.

Therefore, |ω − 1| < 1, so 0 < ω < 2.

Theorem 3.7. (convergence of the SOR Method for SPD) If A = A∗ , and 0 < ω < 2, then SOR converges.

Proof. Since

TSOR = −(L + ω−1 D )−1 ((1 − ω−1 )D + U ) = (L + ω−1 D )−1 ((ω−1 − 1)D − U ),

let Q = L + ω−1 D; then

I − TSOR = Q−1 A.

Let (λ, x ) be an eigenpair of T, i.e. T x = λx with x ≠ 0, and set y = (I − TSOR )x = (1 − λ)x. So, we have

y = Q−1 Ax, or Qy = Ax.

(Note y ≠ 0: if y = 0, then Ax = Qy = 0 and hence x = 0, a contradiction.) Moreover,

(Q − A)y = Qy − Ay = Ax − Ay = A(x − y ) = A(x − (I − T )x ) = AT x = λAx.

So, we have

(Qy, y ) = (Ax, y ) = (Ax, (1 − λ)x ) = (1 − λ̄)(Ax, x ),
(y, (Q − A)y ) = (y, λAx ) = λ̄(y, Ax ) = λ̄((1 − λ)x, Ax ) = λ̄(1 − λ)(Ax, x ).

Adding the two equations together,

(Qy, y ) + (y, (Q − A)y ) = (1 − λ̄ + λ̄ − |λ|2 )(Ax, x ) = (1 − |λ|2 )(Ax, x ),

while

(Qy, y ) + (y, (Q − A)y ) = ((L + ω−1 D )y, y ) + (y, ((ω−1 − 1)D − U )y )
 = (Ly, y ) + (ω−1 Dy, y ) + (ω−1 − 1)(y, Dy ) − (y, U y )
 = (2ω−1 − 1)(Dy, y ) (since A = A∗ , so U = L∗ and (y, U y ) = (Ly, y )).

So, we get

(2ω−1 − 1)(Dy, y ) = (1 − |λ|2 )(Ax, x ).

Since 0 < ω < 2, (Dy, y ) > 0 and (Ax, x ) > 0, we have

1 − |λ|2 > 0.

Then, we have |λ| < 1.

3.4 Convergence in energy norm for steady cases

From now on, A = A∗ > 0.

Definition 3.12. (Energy norm w.r.t. A) The energy norm associated with A is

‖x‖A = (Ax, x )1/2 .

Now, we will consider the convergence in energy norm of the stationary scheme

B (xk+1 − xk )/α + Axk = b.

Theorem 3.8. (convergence in energy norm) If Q = B − (α/2)A > 0, then ‖ek ‖A → 0.

Proof. Let ek = xk − x. Since

B (xk+1 − xk )/α + Axk = b = Ax,

we get

B (ek+1 − ek )/α + Aek = 0.

Let v k+1 = ek+1 − ek ; then

(1/α) Bv k+1 + Aek = 0.

Take the inner product of both sides with v k+1 :

(1/α)(Bv k+1 , v k+1 ) + (Aek , v k+1 ) = 0.

Since

ek = (1/2)(ek+1 + ek ) − (1/2)(ek+1 − ek ) = (1/2)(ek+1 + ek ) − (1/2)v k+1 ,

we have

0 = (1/α)(Bv k+1 , v k+1 ) + (Aek , v k+1 )
  = (1/α)(Bv k+1 , v k+1 ) + (1/2)(A(ek+1 + ek ), v k+1 ) − (1/2)(Av k+1 , v k+1 )
  = (1/α)((B − (α/2)A)v k+1 , v k+1 ) + (1/2)(A(ek+1 + ek ), v k+1 )
  = (1/α)((B − (α/2)A)v k+1 , v k+1 ) + (1/2)(‖ek+1 ‖A2 − ‖ek ‖A2 ).

By assumption, Q = B − (α/2)A > 0, i.e. there exists m > 0 s.t.

(Qy, y ) ≥ m ‖y‖22 .

Therefore,

(m/α) ‖v k+1 ‖22 + (1/2)(‖ek+1 ‖A2 − ‖ek ‖A2 ) ≤ 0,

i.e.

(2m/α) ‖v k+1 ‖22 + ‖ek+1 ‖A2 ≤ ‖ek ‖A2 .

Hence

‖ek+1 ‖A2 ≤ ‖ek ‖A2 ,

and

‖ek ‖A2 → 0.

3.5 Dynamic cases iterative method

In this subsection, we will consider the following dynamic iterative method

Bk (xk+1 − xk )/αk + Axk = b,

where Bk and αk depend on k.

3.5.1 Chebyshev iterative Method

Definition 3.13. (Chebyshev iterative Method) The Chebyshev iterative Method chooses α1 , α2 , · · · , αn s.t. ‖ek ‖2 is minimal for

(xk+1 − xk )/αk+1 + Axk = b.

Theorem 3.9. (convergence of Chebyshev iterative Method) If A = A∗ > 0, then for a given n, ‖en ‖2 is minimized by choosing

αk = α0 /(1 + ρ0 tk ), k = 1, · · · , n,

where

α0 = 2/(λmin + λmax ), ρ0 = (κ2 (A) − 1)/(κ2 (A) + 1), tk = cos((2k + 1)π/(2n)).

Moreover, we have

‖ek ‖2 ≤ (2ρ1k /(1 + ρ12k )) ‖e0 ‖2 , where ρ1 = (√κ2 (A) − 1)/(√κ2 (A) + 1).


3.5.2 Minimal residuals Method

Definition 3.14. (Minimal residuals Method) The minimal residuals iterative Method chooses α1 , α2 , · · · , αk s.t. the residual r k = b − Axk is minimal for

(xk+1 − xk )/αk+1 + Axk = b.

Theorem 3.10. (optimal αk+1 of minimal residuals iterative Method) The optimal αk+1 of the minimal residuals iterative Method is

αk+1 = (r k , Ar k )/‖Ar k ‖22 .

Proof. From the iterative scheme

(xk+1 − xk )/αk+1 + Axk = b,

we get

xk+1 = xk + αk+1 r k .

Multiplying by −A and adding b to both sides of the above equation, we have

r k+1 = r k − αk+1 Ar k .

Therefore,

‖r k+1 ‖22 = (r k − αk+1 Ar k , r k − αk+1 Ar k )
          = ‖r k ‖22 − 2αk+1 (r k , Ar k ) + (αk+1 )2 ‖Ar k ‖22 .

Minimizing over αk+1 ,

(‖r k+1 ‖22 )′ = −2(r k , Ar k ) + 2αk+1 ‖Ar k ‖22 = 0, i.e.

αk+1 = (r k , Ar k )/‖Ar k ‖22 .

Corollary 3.1. The residual r k+1 of the minimal residuals iterative Method is orthogonal to the residual r k in the A-inner product.

Proof.

(Ar k+1 , r k ) = (r k+1 , Ar k ) = (r k − αk+1 Ar k , Ar k ) = (r k , Ar k ) − αk+1 (Ar k , Ar k ) = 0.


Algorithm 3.1. (Minimal residuals method algorithm)

• choose x0
• compute r k = b − Axk
• compute αk+1 = (r k , Ar k )/‖Ar k ‖22
• compute xk+1 = xk + αk+1 r k
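The steps above can be sketched in Python (NumPy); the SPD test system is an arbitrary illustrative choice.

```python
import numpy as np

def minres_iteration(A, b, x0=None, tol=1e-10, maxiter=2000):
    """Minimal residual iteration: alpha_{k+1} = (r, Ar) / ||Ar||_2^2."""
    x = np.zeros(len(b)) if x0 is None else x0.astype(float)
    for _ in range(maxiter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        alpha = (r @ Ar) / (Ar @ Ar)
        x = x + alpha * r
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD example
b = np.array([1.0, 2.0])
x = minres_iteration(A, b)
print(np.linalg.norm(A @ x - b))
```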

Theorem 3.11. (convergence of minimal residuals iterative Method) The minimal residuals iterative Method converges for any x0 and

‖Aek ‖2 ≤ ρ0k ‖Ae0 ‖2 , with ρ0 = (κ2 (A) − 1)/(κ2 (A) + 1).

Proof. Since the choice

αk+1 = (r k , Ar k )/‖Ar k ‖22

minimizes ‖r k+1 ‖2 , it does at least as well as the fixed choice

αk+1 = α0 = 2/(λmax + λmin ),

for which

ρ0 = (λmax − λmin )/(λmax + λmin ) = (λmax /λmin − 1)/(λmax /λmin + 1) = (‖A‖2 ‖A−1 ‖2 − 1)/(‖A‖2 ‖A−1 ‖2 + 1) = (κ2 (A) − 1)/(κ2 (A) + 1).

Moreover, since

r k+1 = r k − αk+1 Ar k = (I − αk+1 A)r k ,

we have

‖r k+1 ‖2 ≤ ‖I − α0 A‖2 ‖r k ‖2 = ρ (I − α0 A) ‖r k ‖2 ≤ ρ0 ‖r k ‖2 .

Since

Aek = A(x − xk ) = Ax − Axk = b − Axk = r k ,

we conclude

‖Aek ‖2 = ‖r k ‖2 ≤ ρ0 ‖r k−1 ‖2 ≤ · · · ≤ ρ0k ‖Ae0 ‖2 .

3.5.3 Minimal correction iterative method

Definition 3.15. (Minimal correction Method) The minimal correction iterative Method chooses α1 , α2 , · · · , αk s.t. the correction wk = B−1 (b − Axk ) = B−1 r k (with A = A∗ > 0, B = B∗ > 0) has minimal norm ‖wk+1 ‖B for

B (xk+1 − xk )/αk+1 + Axk = b.


Theorem 3.12. (optimal αk+1 of minimal correction iterative Method) The optimal αk+1 of the minimal correction iterative Method is

αk+1 = (wk , Awk )/(B−1 Awk , Awk ) = ‖wk ‖A2 / ‖Awk ‖2B−1 .

Proof. From the iterative scheme

B (xk+1 − xk )/αk+1 + Axk = b,

we get

xk+1 = xk + αk+1 B−1 r k .

Multiplying by −A and adding b to both sides of the above equation, we have

r k+1 = r k − αk+1 AB−1 r k .

Since wk = B−1 (b − Axk ) = B−1 r k , A = A∗ > 0, B = B∗ > 0, we have

‖wk+1 ‖B2 = (Bwk+1 , wk+1 ) = (BB−1 r k+1 , B−1 r k+1 ) = (r k+1 , B−1 r k+1 )
 = (r k − αk+1 AB−1 r k , B−1 r k − αk+1 B−1 AB−1 r k )
 = (r k , B−1 r k ) − 2αk+1 (B−1 r k , AB−1 r k ) + (αk+1 )2 (B−1 AB−1 r k , AB−1 r k )
 = (r k , wk ) − 2αk+1 (wk , Awk ) + (αk+1 )2 (B−1 Awk , Awk ).

Minimizing over αk+1 ,

(‖wk+1 ‖B2 )′ = −2(wk , Awk ) + 2αk+1 (B−1 Awk , Awk ) = 0, i.e.

αk+1 = (wk , Awk )/(B−1 Awk , Awk ).

Remark 3.5. Most of the time, it is not easy to compute ‖·‖A , ‖·‖B−1 . We will use the following alternative way to implement the algorithm. Let v k = B1/2 wk . From the iterative scheme

B (xk+1 − xk )/αk+1 + Axk = b,

multiplying by B−1 on both sides yields

(xk+1 − xk )/αk+1 + B−1 Axk = B−1 b.

Then, multiplying by −A on both sides yields

(−Axk+1 + Axk )/αk+1 − AB−1 Axk = −AB−1 b,

therefore

(b − Axk+1 − (b − Axk ))/αk+1 + AB−1 (b − Axk ) = 0,

i.e.

(r k+1 − r k )/αk+1 + AB−1 r k = 0.

By using the identity B−1 r k = wk , we get

B (wk+1 − wk )/αk+1 + Awk = 0.

Then, we have

B1/2 B1/2 (wk+1 − wk )/αk+1 + AB−1/2 B1/2 wk = 0.

Multiplying by B−1/2 on both sides yields

B1/2 (wk+1 − wk )/αk+1 + B−1/2 AB−1/2 B1/2 wk = 0,

i.e.

(v k+1 − v k )/αk+1 + B−1/2 AB−1/2 v k = 0.

Since B−1/2 AB−1/2 > 0, we can minimize ‖v k+1 ‖2 instead of ‖wk+1 ‖B . But

‖wk+1 ‖B2 = (Bwk+1 , wk+1 ) = (B1/2 B1/2 wk+1 , wk+1 ) = (B1/2 wk+1 , B1/2 wk+1 ) = ‖v k+1 ‖22 .

Theorem 3.13. (convergence of minimal correction iterative Method) The minimal correction iterative Method converges for any x0 and

‖Aek ‖B−1 ≤ ρ0k ‖Ae0 ‖B−1 , with ρ0 = (κ2 (B−1 A) − 1)/(κ2 (B−1 A) + 1).

Proof. Same as convergence of minimal residuals iterative Method.

Algorithm 3.2. (Minimal correction method algorithm)

• choose x0
• compute wk = B−1 (b − Axk )
• compute αk+1 = (wk , Awk )/(B−1 Awk , Awk )
• compute xk+1 = xk + αk+1 wk


3.5.4 Steepest Descent Method

Definition 3.16. (Steepest Descent Method) The Steepest Descent iterative Method chooses α1 , α2 , · · · , αk s.t. the error ‖ek+1 ‖A is minimal for

(xk+1 − xk )/αk+1 + Axk = b.

Theorem 3.14. (optimal αk+1 of Steepest Descent iterative Method) The optimal αk+1 of the Steepest Descent iterative Method is

αk+1 = ‖Aek ‖22 / ‖Aek ‖A2 = ‖r k ‖22 / ‖r k ‖A2 .

Proof. From the iterative scheme

(xk+1 − xk )/αk+1 + Axk = b = Ax,

we get

ek+1 = ek − αk+1 Aek .

Therefore,

‖ek+1 ‖A2 = (Aek+1 , ek+1 )
 = (A(ek − αk+1 Aek ), ek − αk+1 Aek )
 = ‖ek ‖A2 − 2αk+1 ‖Aek ‖22 + (αk+1 )2 ‖Aek ‖A2 .

Minimizing over αk+1 ,

(‖ek+1 ‖A2 )′ = −2 ‖Aek ‖22 + 2αk+1 ‖Aek ‖A2 = 0, i.e.

αk+1 = ‖Aek ‖22 / ‖Aek ‖A2 = ‖r k ‖22 / ‖r k ‖A2 .

In the last step, we use the fact Aek = r k .

Theorem 3.15. (convergence of Steepest Descent iterative Method) The Steepest Descent iterative Method converges for any x0 (A = A∗ > 0) and

‖ek ‖A ≤ ρ0k ‖e0 ‖A , with ρ0 = (κ2 (A) − 1)/(κ2 (A) + 1).

Proof. Same as convergence of minimal residuals iterative Method.


3.5.5 Conjugate Gradients Method

Definition 3.17. (Conjugate Gradients Method) The Conjugate Gradients Method is a three-layer iterative method which chooses α1 , α2 , · · · , αk and τ1 , τ2 , · · · , τk s.t. the error ‖ek+1 ‖A is minimal for

B ((xk+1 − xk ) + (1 − αk+1 )(xk − xk−1 ))/(αk+1 τk+1 ) + Axk = b.

3.5.6 Another look at Conjugate Gradients Method

If A is SPD, we know that solving Ax = b is equivalent to minimizing the following quadratic functional

Φ (x ) = (1/2)(Ax, x ) − (f , x ).

In fact, the minimum value of Φ is −(1/2)(A−1 f , f ), attained at x = A−1 f , and the residual r k is the negative gradient of Φ at xk , i.e.

r k = −∇Φ (xk ).

• The Richardson method always uses the increment along the negative gradient of Φ to correct the result, i.e.

xk+1 = xk + αk r k .

• The Conjugate Gradients Method uses the increment along a direction pk which is not parallel to the gradient of Φ to correct the result.

Definition 3.18. (A-Conjugate) The directions {pk } are called A-conjugate if (pj , Apk ) = 0 when j ≠ k. In particular,

(pk+1 , Apk ) = 0, ∀k ∈ N.

Let p0 , p1 , · · · , pm be linearly independent directions and x0 be the initial guess; construct the sequence

xk+1 = xk + αk pk , 0 ≤ k ≤ m,

where αk is nonnegative. Then xk+1 minimizes the functional Φ (x ) over the (k + 1)-dimensional hyperplane

x = x0 + Σᵏj=0 γj pj , γj ∈ R,

if and only if the pj are A-conjugate and

αk = (r k , pk )/(pk , Apk ).


Algorithm 3.3. (Conjugate Gradients method algorithm)

• choose x0
• compute r 0 = f − Ax0 and p0 = r 0
• compute αk = (r k , pk )/(pk , Apk ) = ‖r k ‖22 /(pk , Apk )
• compute xk+1 = xk + αk pk
• compute r k+1 = r k − αk Apk
• compute βk+1 = −(r k+1 , Apk )/(pk , Apk ) = ‖r k+1 ‖22 / ‖r k ‖22
• compute pk+1 = r k+1 + βk+1 pk
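The algorithm can be sketched in Python (NumPy); the SPD test system below is an arbitrary illustrative choice. In exact arithmetic CG terminates in at most n steps.

```python
import numpy as np

def conjugate_gradients(A, f, x0=None, tol=1e-10, maxiter=None):
    """Conjugate gradients for SPD A, following the steps of Algorithm 3.3."""
    n = len(f)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = f - A @ x
    p = r.copy()
    for _ in range(maxiter or n):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)       # alpha_k = ||r^k||^2 / (p^k, Ap^k)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            return x
        beta = (r_new @ r_new) / (r @ r)  # beta_{k+1} = ||r^{k+1}||^2 / ||r^k||^2
        p = r_new + beta * p
        r = r_new
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])  # SPD
f = np.array([1.0, 2.0, 3.0])
x = conjugate_gradients(A, f)
print(np.linalg.norm(A @ x - f))
```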

Properties 3.2. (properties of {pk } and {r k }) The {pk } and {r k } generated by the Conjugate Gradients method have the following properties:
• (pi , r j ) = 0, 0 ≤ i < j ≤ k
• (pi , Apj ) = 0, i ≠ j, 0 ≤ i, j ≤ k
• (r i , r j ) = 0, i ≠ j, 0 ≤ i, j ≤ k

Theorem 3.16. (convergence of Conjugate Gradients iterative Method) The Conjugate Gradients iterative Method converges for any x0 (A = A∗ > 0) and

‖ek ‖A ≤ 2ρ0k ‖e0 ‖A , with ρ0 = (√κ2 (A) − 1)/(√κ2 (A) + 1).

Definition 3.19. (Krylov subspace) In linear algebra, the order-k Krylov subspace generated by an n-by-n
matrix A and a vector b of dimension n is the linear subspace spanned by the images of b under the first k-1
powers of A (starting from A0 = I), that is,

Kk (A, b ) = span {b, Ab, A2 b, . . . , Ak−1 b}.

Theorem 3.17. (Conjugate Gradients iterative Method in Krylov subspace) For Conjugate Gradients
iterative Method, we have

span{r 0 , r 1 , · · · , r k } = span{p0 , p1 , · · · , pk } = Kk +1 (A, r 0 ).


3.6 Problems

Problem 3.1. (Prelim Jan. 2011#1) Consider a linear system Ax = b with A ∈ Rn×n . Richardson’s method is an iterative method

Mxk+1 = N xk + b

with M = (1/w )I, N = M − A = (1/w )I − A, where w is a damping factor chosen to make M approximate A as well as possible. Suppose A is positive definite and w > 0. Let λ1 and λn denote the smallest and largest eigenvalues of A.

1. Prove that Richardson’s method converges if and only if w < 2/λn .
2. Prove that the optimal value of w is w0 = 2/(λ1 + λn ).

Solution. 1. Since M = (1/w )I, N = M − A = (1/w )I − A, we have

xk+1 = (I − wA)xk + wb.

So TR = I − wA. From the sufficient and necessary condition for convergence, we should have ρ (TR ) < 1. Since the λi are the eigenvalues of A, the eigenvalues of TR are 1 − λi w. Hence Richardson’s method converges if and only if |1 − λi w| < 1 for all i, i.e.

−1 < 1 − λn w ≤ · · · ≤ 1 − λ1 w < 1,

i.e. w < 2/λn .
2. The minimum of ρ (TR ) is attained at |1 − λn w| = |1 − λ1 w| (Figure 2), i.e.

λn w − 1 = 1 − λ1 w,

i.e.

w0 = 2/(λ1 + λn ).
J

[Figure 2: The curve of ρ (TR ) as a function of w: the two branches |1 − wλn | and |1 − wλ1 | intersect at wopt , which lies between 1/λn and 1/λ1 .]


Problem 3.2. (Prelim Aug. 2010#3) Suppose that A ∈ Rn×n is SPD and b ∈ Rn is given. The nth Krylov subspace is defined as

Kn := span{b, Ab, A2 b, · · · , An−1 b}.

Let {xj }, x0 = 0, denote the sequence of vectors generated by the conjugate gradient algorithm. Prove that if the method has not already converged after n − 1 iterations, i.e. rn−1 = b − Axn−1 ≠ 0, then the nth iterate xn is the unique vector in Kn that minimizes

φ(y ) = ‖x∗ − y‖A2 ,

where x∗ = A−1 b.

Solution. J

Problem 3.3. (Prelim Jan. 2011#1)

Solution. J


4 Eigenvalue Problems

Definition 4.1. (Geršgorin disks) Let A ∈ Cn×n . The Geršgorin disks of A are

Di = {ξ ∈ C : |ξ − aii | ≤ Ri }, where Ri = Σj≠i |aij |.

Theorem 4.1. Every eigenvalue of A lies within at least one of the Geršgorin disks Di .

Proof. Let λ be an eigenvalue of A and let x = (xj ) be a corresponding eigenvector. Let i ∈ {1, · · · , n} be chosen so that |xi | = maxj |xj |. (That is to say, choose i so that xi is the largest (in absolute value) entry of the vector x.) Then |xi | > 0, otherwise x = 0. Since x is an eigenvector, Ax = λx, and thus

Σj aij xj = λxi .

So, splitting the sum, we get

Σj≠i aij xj = λxi − aii xi .

We may then divide both sides by xi (choosing i as we explained, we can be sure that xi ≠ 0) and take the absolute value to obtain

|λ − aii | = |Σj≠i aij xj /xi | ≤ Σj≠i |aij ||xj |/|xi | ≤ Σj≠i |aij | = Ri ,

where the last inequality is valid because |xj |/|xi | ≤ 1 for j ≠ i.

Corollary 4.1. The eigenvalues of A must also lie within the Geršgorin disks Di corresponding to the columns of A.

Proof. Apply the theorem to AT .
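Theorem 4.1 is easy to verify numerically. A minimal sketch in Python (NumPy), with an arbitrary test matrix chosen for illustration:

```python
import numpy as np

def gershgorin_disks(A):
    """Return (center, radius) pairs for the Gershgorin disks of A."""
    centers = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return list(zip(centers, radii))

A = np.array([[4.0, 1.0, 0.5],
              [1.0, -2.0, 0.5],
              [0.0, 1.0, 7.0]])
disks = gershgorin_disks(A)

# Theorem 4.1: every eigenvalue lies in at least one disk.
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - c) <= r + 1e-12 for c, r in disks)
print("all eigenvalues inside the disks")
```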

Definition 4.2. (Rayleigh Quotient) Let A ∈ Rn×n , x ∈ Rn , x ≠ 0. The Rayleigh quotient is

R(x ) = (Ax, x )/(x, x ).

Remark 4.1. If x is an eigenvector of A, then Ax = λx and

R(x ) = (Ax, x )/(x, x ) = λ.


Properties 4.1. (properties of the Rayleigh quotient) The Rayleigh quotient has the following properties:
1.

∇R(x ) = (2/(x, x )) [Ax − R(x )x ] .

2. R(x ) minimizes

f (α ) = ‖Ax − αx‖2 .

Proof. 1. From the definition of the gradient, we have

∇R(x ) = [∂R(x )/∂x1 , ∂R(x )/∂x2 , · · · , ∂R(x )/∂xn ] .

By using the quotient rule, we have

∂R(x )/∂xi = ∂/∂xi ( xT Ax / xT x ) = [ (∂(xT Ax )/∂xi ) xT x − xT Ax (∂(xT x )/∂xi ) ] / (xT x )2 ,

where, writing ei for the ith standard basis vector and using A = AT ,

∂(xT Ax )/∂xi = eiT Ax + xT Aei = (Ax )i + (AT x )i = 2 (Ax )i .

Similarly,

∂(xT x )/∂xi = eiT x + xT ei = 2xi .

Therefore, we have

∂R(x )/∂xi = 2 (Ax )i /(xT x ) − xT Ax · 2xi /(xT x )2 = (2/(xT x )) ((Ax )i − R(x )xi ) .

Hence

∇R(x ) = (2/(xT x )) (Ax − R(x )x ) = (2/(x, x )) (Ax − R(x )x ) .

2. Let

g (α ) = ‖Ax − αx‖22 .

Then,

g (α ) = (Ax − αx, Ax − αx ) = (Ax, Ax ) − 2α (Ax, x ) + α 2 (x, x ),

and

g ′ (α ) = −2(Ax, x ) + 2α (x, x ).

R(x ) minimizes f (α ) = ‖Ax − αx‖2 when g ′ (α ) = 0, i.e.

α = (Ax, x )/(x, x ) = R(x ).

4.1 Schur algorithm

Algorithm 4.1. (Schur algorithm)


• A0 = A = Q ∗ U Q
• compute Ak = Qk−1 Ak−1 Qk

4.2 QR algorithm

Algorithm 4.2. (QR algorithm)


• A0 = A
• compute Qk Rk = Ak−1
• compute Ak = Rk Qk
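The QR algorithm above can be sketched in Python (NumPy); the symmetric test matrix and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """Unshifted QR iteration: Q_k R_k = A_{k-1}, A_k = R_k Q_k.
    For symmetric A with distinct |eigenvalues|, A_k tends to a diagonal
    matrix whose entries are the eigenvalues."""
    Ak = A.astype(float).copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q   # similar to A_{k-1}: A_k = Q_k^{-1} A_{k-1} Q_k
    return Ak

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # symmetric example
Ak = qr_algorithm(A)
print(np.sort(np.diag(Ak)))
print(np.sort(np.linalg.eigvalsh(A)))   # same values
```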

Properties 4.2. (properties of QR algorithm) The QR algorithm has the following properties:

1. Ak is similar to Ak−1 .
2. If (Ak−1 )∗ = Ak−1 , then (Ak )∗ = Ak .
3. If Ak−1 is tridiagonal (and Hermitian), then Ak is tridiagonal.

Proof. 1. Since Qk Rk = Ak−1 , we have Rk = (Qk )−1 Ak−1 , and so Ak = Rk Qk = (Qk )−1 Ak−1 Qk .

2. Since Qk is unitary, (Qk )∗ = (Qk )−1 , so if (Ak−1 )∗ = Ak−1 , then

(Ak )∗ = ((Qk )−1 Ak−1 Qk )∗ = (Qk )∗ (Ak−1 )∗ ((Qk )−1 )∗ = (Qk )−1 Ak−1 Qk = Ak .

3. If Ak−1 is tridiagonal, it is in particular upper Hessenberg, so Qk is upper Hessenberg and Ak = Rk Qk is again upper Hessenberg; combined with 2 (Hermitian symmetry is preserved), Ak is tridiagonal.

4.3 Power iteration algorithm

Algorithm 4.3. (Power iteration algorithm)


• v 0 : an arbitrary nonzero vector
• compute v k = Av k−1

Remark 4.2. This algorithm generates a sequence of vectors

v 0 , Av 0 , A2 v 0 , A3 v 0 , · · · .

If we want to prove that this sequence converges (after normalization) to an eigenvector of A, the matrix needs to have a unique largest eigenvalue λ1 ,

|λ1 | > |λ2 | ≥ · · · ≥ |λm | ≥ 0.

There is another technical assumption. The initial vector v 0 needs to be chosen such that q1T v 0 ≠ 0. Otherwise, if v 0 is completely perpendicular to the eigenvector q1 , the algorithm will not converge.

Algorithm 4.4. (improved Power iteration algorithm)

• v 0 : an arbitrary nonzero vector with ‖v 0 ‖2 = 1
• compute wk = Av k−1
• compute v k = wk / ‖wk ‖2
• compute λk = R(v k )
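The improved power iteration can be sketched in Python (NumPy); the symmetric test matrix and random starting vector are arbitrary illustrative choices.

```python
import numpy as np

def power_iteration(A, iters=200, seed=0):
    """Normalized power iteration with Rayleigh-quotient eigenvalue estimate."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    lam = v @ A @ v
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)
        lam = v @ A @ v   # Rayleigh quotient R(v^k)
    return lam, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # symmetric, |lambda_1| > |lambda_2|
lam, v = power_iteration(A)
print(lam)  # approximates the dominant eigenvalue of A
```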

Theorem 4.2. (Convergence of the power iteration) If A = A∗ , q1T v 0 ≠ 0 and |λ1 | > |λ2 | ≥ · · · ≥ |λm | ≥ 0, then the convergence to the eigenvector is linear with ratio |λ2 /λ1 |, while the eigenvalue error decays at the squared rate:

‖v k − (±q1 )‖2 = O (|λ2 /λ1 |k ),
|λk − λ1 | = O (|λ2 /λ1 |2k ).


Proof. Let {q1 , q2 , · · · , qn } be an orthonormal eigenbasis of Rn . Then v 0 can be written as

v 0 = Σj αj qj .

Following the power algorithm (using Aqj = λj qj ),

w1 = Av 0 = Σj αj λj qj ,  v 1 = ( Σj αj λj qj ) / ( Σj αj2 λj2 )1/2 ,

and, inductively,

v k = ( Σj αj λjk qj ) / ( Σj αj2 λj2k )1/2 .

v k can be rewritten as

v k = ( α1 λ1k q1 + Σj>1 αj λjk qj ) / ( α12 λ12k + Σj>1 αj2 λj2k )1/2
    = ± ( q1 + Σj>1 (αj /α1 )(λj /λ1 )k qj ) / ( 1 + Σj>1 (αj /α1 )2 (λj /λ1 )2k )1/2 .

Therefore,

‖v k − (±q1 )‖2 ≤ C Σj>1 |αj /α1 | |λj /λ1 |k ≤ C′ |λ2 /λ1 |k = O (|λ2 /λ1 |k ).

From Taylor’s formula (since ∇R(q1 ) = 0),

|λk − λ1 | = |R(v k ) − R(q1 )| = O (‖v k − q1 ‖22 ) = O (|λ2 /λ1 |2k ).

Remark 4.3. This shows that the speed of convergence depends on the gap between the two largest eigenvalues of A. In particular, if the largest eigenvalue of A were complex (which it cannot be for the real symmetric matrices we are considering), then λ2 = λ̄1 , so |λ2 | = |λ1 | and the algorithm would not converge at all.


4.4 Inverse Power iteration algorithm

Algorithm 4.5. (inverse Power iteration algorithm)

• v 0 : an arbitrary nonzero vector with ‖v 0 ‖2 = 1
• compute wk = A−1 v k−1
• compute v k = wk / ‖wk ‖2
• compute λk = R(v k )

Algorithm 4.6. (improved inverse Power iteration algorithm)

• v 0 : an arbitrary nonzero vector with ‖v 0 ‖2 = 1
• compute wk = (A − µI )−1 v k−1
• compute v k = wk / ‖wk ‖2
• compute λk = R(v k )

Remark 4.4. Improved inverse Power iteration algorithm is a shift µ.

Algorithm 4.7. (Rayleigh Quotient Iteration algorithm)

• v^0: an arbitrary nonzero vector with ‖v^0‖_2 = 1
• compute λ^0 = R(v^0)
• compute w^k = (A − λ^{k−1} I)^{−1} v^{k−1}
• compute v^k = w^k / ‖w^k‖_2
• compute λ^k = R(v^k)

Theorem 4.3. (Convergence of the Rayleigh Quotient Iteration) If A = A∗, q_1^T v^0 ≠ 0 and |λ_1| > |λ_2| ≥ · · · ≥ |λ_m| ≥ 0, and if we update the estimate µ for the eigenvalue with the Rayleigh quotient at each iteration, we get a cubically convergent algorithm, i.e.

‖v^{k+1} − (±q_J)‖_2 = O(‖v^k − (±q_J)‖_2³),
|λ^{k+1} − λ_J| = O(|λ^k − λ_J|³).
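The Rayleigh Quotient Iteration can be sketched as follows (NumPy assumed; the test matrix is illustrative). Each step solves the shifted linear system rather than forming an inverse:

```python
import numpy as np

def rayleigh_quotient_iteration(A, v0, iters=10):
    """Rayleigh Quotient Iteration sketch: the shift is updated with
    R(v^k) at every step; each step solves (A - lambda^{k-1} I) w = v^{k-1}."""
    v = v0 / np.linalg.norm(v0)
    lam = v @ A @ v                          # lambda^0 = R(v^0)
    n = A.shape[0]
    for _ in range(iters):
        try:
            w = np.linalg.solve(A - lam * np.eye(n), v)
        except np.linalg.LinAlgError:
            break                            # shift hit an eigenvalue exactly
        v = w / np.linalg.norm(w)
        lam = v @ A @ v                      # lambda^k = R(v^k)
    return lam, v

A = np.array([[2.0, 1.0], [1.0, 3.0]])      # illustrative symmetric matrix
lam, v = rayleigh_quotient_iteration(A, np.array([1.0, 0.0]))
```

Because the shift converges cubically, very few iterations are needed; the near-singularity of A − λ^k I at convergence is harmless, since only the direction of w is used.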

4.5 Problems

Problem 4.1. (Prelim Aug. 2013#1)

Solution. J


5 Solution of Nonlinear problems

Definition 5.1. (Convergence with order p) An iterative scheme converges with order p > 0 if there is a constant C > 0 (with C < 1 when p = 1) such that

|x − x_{k+1}| ≤ C|x − x_k|^p.   (118)

5.1 Bisection method

Definition 5.2. (Bisection method) The method is applicable for solving the equation f(x) = 0 for the real
variable x, where f is a continuous function defined on an interval [a, b] and f(a) and f(b) have opposite
signs i.e. f (a)f (b ) < 0. In this case a and b are said to bracket a root since, by the intermediate value
theorem, the continuous function f must have at least one root in the interval (a, b).

Algorithm 1 Bisection method

1: a_0 ← a, b_0 ← b
2: for k = 1, 2, · · · do
3:   c_k ← (a_{k−1} + b_{k−1})/2
4:   if f(a_{k−1}) f(c_k) < 0 then
5:     a_k ← a_{k−1}
6:     b_k ← c_k
7:   end if
8:   if f(b_{k−1}) f(c_k) < 0 then
9:     a_k ← c_k
10:    b_k ← b_{k−1}
11:  end if
12:  x_k ← c_k
13: end for
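The bisection algorithm above can be sketched as follows (the tolerance and the test function x² − 2 are illustrative):

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Bisection sketch following the algorithm above; needs f(a) f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2.0             # c_k = (a_{k-1} + b_{k-1}) / 2
        if f(a) * f(c) < 0:           # root bracketed in [a, c]
            b = c
        else:                         # root bracketed in [c, b]
            a = c
        if b - a < tol:
            break
    return (a + b) / 2.0

root = bisection(lambda x: x**2 - 2.0, 0.0, 2.0)   # approximates sqrt(2)
```

Each step halves the bracket, so the error after k steps is at most (b − a)/2^{k+1}: linear convergence with constant 1/2.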

5.2 Chord method

Definition 5.3. (Chord method) The method is applicable for solving the equation f(x) = 0 for the real variable x, where f is a continuous function defined on an interval [a, b] and f(a) and f(b) have opposite signs, i.e. f(a)f(b) < 0. Instead of halving the segment [a, b], we divide it in the ratio f(a) : f(b). This gives the approximation of a root of the equation

x^{k+1} = x^k − [η_k]^{−1} f(x^k),

where

η_k = (f(b) − f(a)) / (b − a).


Algorithm 2 Chord method

1: x_0 = 0, x_1 = a − f(a)(b − a)/(f(b) − f(a))
2: η_k = (f(b) − f(a))/(b − a)
3: while |x_{k+1} − x_k| > ε do
4:   x_{k+1} ← x_k − [η_k]^{−1} f(x_k)
5: end while

5.3 Secant method

Definition 5.4. (Secant method) The method is applicable for solving the equation f(x) = 0 for the real variable x, where f is a continuous function defined on an interval [a, b] and f(a) and f(b) have opposite signs, i.e. f(a)f(b) < 0. Instead of halving the segment [a, b], we divide it in the ratio f(x_k) : f(x_{k−1}). This gives the approximation of a root of the equation

x^{k+1} = x^k − [η_k]^{−1} f(x^k),

where

η_k = (f(x_k) − f(x_{k−1})) / (x_k − x_{k−1}).

Algorithm 3 Secant method

1: x_0 = 0, x_1 = a − f(a)(b − a)/(f(b) − f(a))
2: η_k = (f(x_k) − f(x_{k−1}))/(x_k − x_{k−1})
3: while |x_{k+1} − x_k| > ε do
4:   x_{k+1} ← x_k − [η_k]^{−1} f(x_k)
5: end while
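A minimal sketch of the secant method above (the stopping rule and test function are illustrative):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant sketch: eta_k = (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1})
    replaces the derivative in the update x_{k+1} = x_k - f(x_k)/eta_k."""
    for _ in range(max_iter):
        fx0, fx1 = f(x0), f(x1)
        if fx1 == fx0:                # slope unavailable; stop
            break
        eta = (fx1 - fx0) / (x1 - x0)
        x0, x1 = x1, x1 - fx1 / eta
        if abs(x1 - x0) < tol:
            break
    return x1

root = secant(lambda x: x**2 - 2.0, 1.0, 2.0)   # approximates sqrt(2)
```

Unlike the chord method, η_k here is re-estimated from the two most recent iterates, which is what yields superlinear convergence.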

5.4 Newton’s method

Definition 5.5. (Newton’s method) The method is applicable for solving the equation f(x) = 0 for the real variable x, where f is a continuous function defined on an interval [a, b] and f(a) and f(b) have opposite signs, i.e. f(a)f(b) < 0. Instead of halving the segment [a, b], we use the slope f'(x_k). This gives the approximation of a root of the equation

x^{k+1} = x^k − [η_k]^{−1} f(x^k),

where

η_k = f'(x^k).

Remark 5.1. This scheme needs f'(x_k) ≠ 0.


Algorithm 4 Newton’s method

1: x_0 = 0, x_1 = a − f(a)(b − a)/(f(b) − f(a))
2: η_k = f'(x_k)
3: while |x_{k+1} − x_k| > ε do
4:   x_{k+1} ← x_k − [η_k]^{−1} f(x_k)
5: end while
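A minimal sketch of Newton's method above, guarding against the f'(x_k) = 0 case flagged in Remark 5.1 (test function illustrative):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton sketch: x_{k+1} = x_k - f(x_k)/f'(x_k); per Remark 5.1
    the iteration requires f'(x_k) != 0."""
    x = x0
    for _ in range(max_iter):
        dfx = fprime(x)
        if dfx == 0.0:
            raise ZeroDivisionError("f'(x_k) = 0")
        x_new = x - f(x) / dfx
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

root = newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.5)  # approximates sqrt(2)
```

For x² − 2 from x_0 = 1.5 the number of correct digits roughly doubles per step, the quadratic rate proved in the theorem that follows.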

Theorem 5.1. (Convergence of Newton’s method) Let f ∈ C², f(x∗) = 0, f'(x∗) ≠ 0 and f''(x∗) be bounded in a neighborhood of x∗. Provided x_0 is sufficiently close to x∗, Newton’s method converges quadratically, i.e.

|x^{k+1} − x∗| ≤ C |x^k − x∗|².

Proof. Let x∗ be the root of f(x). From the Taylor expansion, we know

0 = f(x∗) = f(x^k) + f'(x^k)(x∗ − x^k) + (1/2) f''(θ)(x∗ − x^k)²,

where θ is between x∗ and x^k. Define e^k = x∗ − x^k; then

0 = f(x∗) = f(x^k) + f'(x^k) e^k + (1/2) f''(θ)(e^k)²,

so

[f'(x^k)]^{−1} f(x^k) = −e^k − (1/2)[f'(x^k)]^{−1} f''(θ)(e^k)².

From the Newton scheme, we have

x^{k+1} = x^k − [f'(x^k)]^{−1} f(x^k),   x∗ = x∗.

So,

e^{k+1} = e^k + [f'(x^k)]^{−1} f(x^k) = −(1/2)[f'(x^k)]^{−1} f''(θ)(e^k)²,

i.e.

e^{k+1} = −(f''(θ) / (2 f'(x^k))) (e^k)².

By assumption, there is a neighborhood of x∗ such that

|f''(z)| ≤ C_1,   |f'(z)| ≥ C_2 > 0.

Therefore,

|e^{k+1}| ≤ (|f''(θ)| / (2|f'(x^k)|)) |e^k|² ≤ (C_1 / (2C_2)) |e^k|².

This implies

|x^{k+1} − x∗| ≤ C |x^k − x∗|².


5.5 Newton’s method for system

Theorem 5.2. If F : R → R^n is integrable over the interval [a, b], then

‖∫_a^b F(t) dt‖ ≤ ∫_a^b ‖F(t)‖ dt.

Theorem 5.3. Suppose F : R^n → R^m is continuously differentiable and a, b ∈ R^n. Then

F(b) = F(a) + ∫_0^1 J(a + θ(b − a))(b − a) dθ,

where J is the Jacobian of F.

Theorem 5.4. Suppose J : R^m → R^{n×n} is a continuous matrix-valued function. If J(x∗) is nonsingular, then there exists δ > 0 such that, for all x ∈ R^m with ‖x − x∗‖ < δ, J(x) is nonsingular and

‖J(x)^{−1}‖ < 2 ‖J(x∗)^{−1}‖.

Theorem 5.5. Suppose J : R^n → R^{n×n}. Then J is said to be Lipschitz continuous on S ⊂ R^n if there exists a positive constant L such that

‖J(x) − J(y)‖ ≤ L ‖x − y‖ for all x, y ∈ S.

Theorem 5.6. (Convergence of Newton’s method) Suppose F : R^n → R^n is continuously differentiable, F(x∗) = 0, and

1. the Jacobian J(x∗) of F at x∗ is nonsingular, and
2. J is Lipschitz continuous on a neighborhood of x∗.

Then, for all x^0 sufficiently close to x∗, ‖x^0 − x∗‖ < ε, Newton’s method converges quadratically to x∗, i.e.

‖x^{k+1} − x∗‖ ≤ C ‖x^k − x∗‖².

Proof. Let x∗ be the root of F (x ) i.e. F (x∗ )=0. From the Newton’s scheme, we have
 h i−1
xk+1 = xk − J(xk ) F(xk )


x∗ = x∗

Therefore, we have

x∗ − x^{k+1} = x∗ − x^k + [J(x^k)]^{−1}(F(x^k) − F(x∗))
            = x∗ − x^k + [J(x^k)]^{−1}(F(x^k) − F(x∗) + J(x∗)(x∗ − x^k) − J(x∗)(x∗ − x^k))
            = (I − [J(x^k)]^{−1} J(x∗))(x∗ − x^k) + [J(x^k)]^{−1}(F(x^k) − F(x∗) + J(x∗)(x∗ − x^k)).

So,

‖x∗ − x^{k+1}‖ ≤ ‖I − [J(x^k)]^{−1} J(x∗)‖ ‖x∗ − x^k‖ + ‖[J(x^k)]^{−1}‖ ‖F(x^k) − F(x∗) + J(x∗)(x∗ − x^k)‖.   (119)


Now, we will estimate ‖I − [J(x^k)]^{−1} J(x∗)‖ and ‖F(x^k) − F(x∗) + J(x∗)(x∗ − x^k)‖. First,

‖I − [J(x^k)]^{−1} J(x∗)‖ = ‖[J(x^k)]^{−1} J(x^k) − [J(x^k)]^{−1} J(x∗)‖
                          = ‖[J(x^k)]^{−1}(J(x^k) − J(x∗))‖   (120)
                          ≤ ‖[J(x^k)]^{−1}‖ ‖J(x^k) − J(x∗)‖
                          ≤ L ‖[J(x^k)]^{−1}‖ ‖x∗ − x^k‖.

In the last step of the above estimate, we used that J is Lipschitz continuous (if J is not Lipschitz continuous, we can only get that Newton’s method converges linearly to x∗). Since F : R^n → R^n is continuously differentiable,

F(b) = F(a) + ∫_0^1 J(a + θ(b − a))(b − a) dθ.

So

F(x^k) = F(x∗) + ∫_0^1 J(x∗ + θ(x^k − x∗))(x^k − x∗) dθ
       = F(x∗) + ∫_0^1 [J(x∗ + θ(x^k − x∗))(x^k − x∗) + J(x∗)(x∗ − x^k) − J(x∗)(x∗ − x^k)] dθ
       = F(x∗) − J(x∗)(x∗ − x^k) + ∫_0^1 [J(x∗ + θ(x^k − x∗))(x^k − x∗) + J(x∗)(x∗ − x^k)] dθ.

Hence

F(x^k) − F(x∗) + J(x∗)(x∗ − x^k) = ∫_0^1 [J(x∗ + θ(x^k − x∗))(x^k − x∗) + J(x∗)(x∗ − x^k)] dθ.

So,

‖F(x^k) − F(x∗) + J(x∗)(x∗ − x^k)‖ = ‖∫_0^1 [J(x∗ + θ(x^k − x∗))(x^k − x∗) + J(x∗)(x∗ − x^k)] dθ‖
    ≤ ∫_0^1 ‖J(x∗ + θ(x^k − x∗))(x^k − x∗) + J(x∗)(x∗ − x^k)‖ dθ
    ≤ ∫_0^1 ‖J(x∗ + θ(x^k − x∗)) − J(x∗)‖ ‖x∗ − x^k‖ dθ   (121)
    ≤ ∫_0^1 Lθ ‖x∗ − x^k‖² dθ
    = (L/2) ‖x∗ − x^k‖².
From (119), (120) and (121), we have

‖x∗ − x^{k+1}‖ ≤ (3/2) L ‖[J(x^k)]^{−1}‖ ‖x∗ − x^k‖² ≤ 3L ‖[J(x∗)]^{−1}‖ ‖x∗ − x^k‖².   (122)


Remark 5.2. From the last step of the above proof, we can derive a condition on ε. For example, if

‖x∗ − x^k‖ ≤ 1 / (6L ‖[J(x∗)]^{−1}‖),

then

‖x∗ − x^{k+1}‖ ≤ (1/2) ‖x∗ − x^k‖.   (123)
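The Newton iteration for systems analyzed above can be sketched as follows (NumPy assumed). In practice one solves the linear system J(x^k) d = −F(x^k) rather than inverting J; the 2×2 example system is purely illustrative:

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Newton sketch for F(x) = 0 with F : R^n -> R^n: at each step solve
    J(x^k) d = -F(x^k) and set x^{k+1} = x^k + d (no explicit inverse)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = np.linalg.solve(J(x), -F(x))
        x = x + d
        if np.linalg.norm(d) < tol:
            break
    return x

# Illustrative system: x^2 + y^2 = 4 and x = y, with root (sqrt(2), sqrt(2))
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] - v[1]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])
x = newton_system(F, J, [1.0, 2.0])
```

This Jacobian is Lipschitz continuous and nonsingular at the root, so Theorem 5.6 applies and the iteration converges quadratically from a nearby start.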

5.6 Fixed point method

In fact, the Chord, Secant and Newton’s methods can be considered as fixed point iterations, since

x^{k+1} = x^k − [η_k]^{−1} f(x^k) = φ(x^k).

Theorem 5.7. Let x be a fixed point of φ and U_δ = {z : |x − z| ≤ δ}. If φ is differentiable on U_δ and there exists q < 1 such that |φ'(z)| ≤ q < 1 for all z ∈ U_δ, then

1. φ(U_δ) ⊂ U_δ,
2. φ is a contraction.
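A sketch of the fixed point iteration; φ(x) = cos x is a standard contraction example, since |φ'(z)| = |sin z| ≤ q < 1 in a neighborhood of its fixed point:

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=200):
    """Fixed point iteration sketch: x_{k+1} = phi(x_k), which converges
    when |phi'(z)| <= q < 1 on a neighborhood of the fixed point."""
    x = x0
    for _ in range(max_iter):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# phi(x) = cos(x) has a fixed point x* ~ 0.739085
xstar = fixed_point(math.cos, 1.0)
```

The convergence is only linear with rate q, which is why many iterations are needed here compared with Newton's method.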

5.7 Problems

Problem 5.1. (Prelim Jan. 2011#4) Let f : Ω ⊂ Rn → Rn be twice continuously differentiable. Suppose
x∗ ∈ Ω is a solution of f (x ) = 0, and the Jacobian matrix of f, denoted Jf , is invertible at x∗ .
1. Prove that if x0 ∈ Ω is sufficiently close to x∗ , then the following iteration converges to x∗ :

xk +1 = xk − Jf (x0 )−1 f (xk ).

2. Prove that the convergence is typically only linear.

Solution. Let x∗ be the root of f(x), i.e. f(x∗) = 0. The scheme is

x^{k+1} = x^k − [J(x^0)]^{−1} f(x^k),   x∗ = x∗.

Therefore, by the mean value theorem, with ξ between x^k and x∗,

x∗ − x^{k+1} = x∗ − x^k + [J(x^0)]^{−1}(f(x^k) − f(x∗)) = (I − [J(x^0)]^{−1} J(ξ))(x∗ − x^k).

Therefore

‖x∗ − x^{k+1}‖ ≤ ‖I − [J(x^0)]^{−1} J(ξ)‖ ‖x∗ − x^k‖.

From theorem


Theorem 5.8. Suppose J : R^m → R^{n×n} is a continuous matrix-valued function. If J(x∗) is nonsingular, then there exists δ > 0 such that, for all x ∈ R^m with ‖x − x∗‖ < δ, J(x) is nonsingular and

‖J(x)^{−1}‖ < 2 ‖J(x∗)^{−1}‖,

we get, for x^0 sufficiently close to x∗,

‖x∗ − x^{k+1}‖ ≤ (1/2) ‖x∗ − x^k‖,

which also shows the convergence is typically only linear.
J

Problem 5.2. (Prelim Aug. 2010#5) Assume that f : R → R, f ∈ C 2 (R), f 0 (x ) > 0 for all x ∈ R, and
f 00 (x ) > 0, for all x ∈ R.
1. Suppose that a root ξ ∈ R exists. Prove that it is unique. Exhibit a function satisfying the assump-
tions above that has no root.
2. Prove that for any starting guess x0 ∈ R, Newton’s method converges, and the convergence rate is
quadratic.

Solution. 1. Let x_1 and x_2 be two different roots, so f(x_1) = f(x_2) = 0. Then, by the mean value theorem, there exists η ∈ [x_1, x_2] such that f'(η) = 0, which contradicts f'(x) > 0.
2. An example with no root: f(x) = e^x.
3. Let x∗ be the root of f(x). From the Taylor expansion, we know

0 = f(x∗) = f(x^k) + f'(x^k)(x∗ − x^k) + (1/2) f''(θ)(x∗ − x^k)²,

where θ is between x∗ and x^k. Define e^k = x∗ − x^k; then

0 = f(x∗) = f(x^k) + f'(x^k) e^k + (1/2) f''(θ)(e^k)²,

so

[f'(x^k)]^{−1} f(x^k) = −e^k − (1/2)[f'(x^k)]^{−1} f''(θ)(e^k)².

From the Newton scheme, we have

x^{k+1} = x^k − [f'(x^k)]^{−1} f(x^k),   x∗ = x∗.

So,

e^{k+1} = e^k + [f'(x^k)]^{−1} f(x^k) = −(1/2)[f'(x^k)]^{−1} f''(θ)(e^k)²,

i.e.

e^{k+1} = −(f''(θ) / (2 f'(x^k))) (e^k)².

By assumption, there is a neighborhood of x∗ such that

|f''(z)| ≤ C_1,   |f'(z)| ≥ C_2 > 0.

Page 75 of 236
Wenqiang Feng Prelim Exam note for Numerical Analysis Page 76

Therefore,

|e^{k+1}| ≤ (|f''(θ)| / (2|f'(x^k)|)) |e^k|² ≤ (C_1 / (2C_2)) |e^k|².

This implies

|x^{k+1} − x∗| ≤ C |x^k − x∗|².

Problem 5.3. (Prelim Aug. 2010#4) Let f : R^n → R be twice continuously differentiable. Suppose x∗ is an isolated root of f and the Jacobian of f at x∗ (J(x∗)) is non-singular. Determine conditions on ε so that if ‖x^0 − x∗‖_2 < ε then the following iteration converges to x∗:

xk +1 = xk − Jf (x0 )−1 f (xk ), k = 0, 1, 2, · · · .

Solution. J

Problem 5.4. (Prelim Aug. 2009#5) Consider the two-step Newton method

y_k = x_k − f(x_k)/f'(x_k),   x_{k+1} = y_k − f(y_k)/f'(x_k)

for the solution of the equation f(x) = 0. Prove:

1. If the method converges, then

lim_{k→∞} (x_{k+1} − x∗) / ((y_k − x∗)(x_k − x∗)) = f''(x∗)/f'(x∗),

where x∗ is the solution.

2. Prove the convergence is cubic, that is

lim_{k→∞} (x_{k+1} − x∗) / (x_k − x∗)³ = (1/2) (f''(x∗)/f'(x∗))².

3. Would you say that this method is faster than Newton’s method given that its convergence is cubic?

Solution. 1. First, we will show that if x_k ∈ [x∗ − h, x∗ + h], then y_k ∈ [x∗ − h, x∗ + h]. By the Taylor expansion formula, we have

0 = f(x∗) = f(x_k) + f'(x_k)(x∗ − x_k) + (1/2!) f''(ξ_k)(x∗ − x_k)²,

where ξ_k is between x∗ and x_k. Therefore, we have

f(x_k) = −f'(x_k)(x∗ − x_k) − (1/2!) f''(ξ_k)(x∗ − x_k)².

Plugging the above equation into the first step of the method, we have

y_k = x_k + (x∗ − x_k) + (1/2!) (f''(ξ_k)/f'(x_k)) (x∗ − x_k)²,

so

y_k − x∗ = (1/2!) (f''(ξ_k)/f'(x_k)) (x∗ − x_k)².   (124)

Therefore,

|y_k − x∗| = (1/2) |f''(ξ_k)/f'(x_k)| (x∗ − x_k)² ≤ (1/2) |(f''(ξ_k)/f'(x_k))(x∗ − x_k)| |x∗ − x_k|.

Since we can choose the initial value very close to x∗, such that

|f''(ξ_k)/f'(x_k)| |x∗ − x_k| ≤ 1,

we have that

|y_k − x∗| ≤ (1/2) |x∗ − x_k|.
Hence, we proved the result, that is to say, if xk → x∗ , then yk , ξk → x∗ .
2. Next, we will show that if x_k ∈ [x∗ − h, x∗ + h], then x_{k+1} ∈ [x∗ − h, x∗ + h]. From the second step of the method, we have

x_{k+1} − x∗ = y_k − x∗ − f(y_k)/f'(x_k)
            = (1/f'(x_k)) ((y_k − x∗) f'(x_k) − f(y_k))
            = (1/f'(x_k)) [(y_k − x∗)(f'(x_k) − f'(x∗)) − f(y_k) + (y_k − x∗) f'(x∗)].

By the mean value theorem, there exists η_k between x∗ and x_k such that

f'(x_k) − f'(x∗) = f''(η_k)(x_k − x∗),

and by the Taylor expansion formula, we have

f(y_k) = f(x∗) + f'(x∗)(y_k − x∗) + ((y_k − x∗)²/2) f''(γ_k)
       = f'(x∗)(y_k − x∗) + ((y_k − x∗)²/2) f''(γ_k),

where γ_k is between y_k and x∗. Plugging the above two equations into the second step of the method, we get

x_{k+1} − x∗ = (1/f'(x_k)) [f''(η_k)(x_k − x∗)(y_k − x∗) − f'(x∗)(y_k − x∗) − ((y_k − x∗)²/2) f''(γ_k) + (y_k − x∗) f'(x∗)]
            = (1/f'(x_k)) [f''(η_k)(x_k − x∗)(y_k − x∗) − ((y_k − x∗)²/2) f''(γ_k)].   (125)
Taking absolute values of the above equation, with A a local bound for |f''|/|f'|, we have

|x_{k+1} − x∗| ≤ A |x_k − x∗| |y_k − x∗| + (A/2) |y_k − x∗|²
             ≤ (1/2) |x_k − x∗| + (1/8) |x_k − x∗| = (5/8) |x_k − x∗|.

Hence, we proved the result, that is to say, if y_k → x∗, then x_{k+1}, η_k, γ_k → x∗.


3. Finally, we will prove the convergence order is cubic. From (125), we can get that

(x_{k+1} − x∗) / ((x_k − x∗)(y_k − x∗)) = f''(η_k)/f'(x_k) − ((y_k − x∗) f''(γ_k)) / (2(x_k − x∗) f'(x_k)).

By using (124), we have

(x_{k+1} − x∗) / ((x_k − x∗)(y_k − x∗)) = f''(η_k)/f'(x_k) − (1/4) (f''(ξ_k)/f'(x_k)) (f''(γ_k)/f'(x_k)) (x∗ − x_k).

Taking limits gives

lim_{k→∞} (x_{k+1} − x∗) / ((x_k − x∗)(y_k − x∗)) = f''(x∗)/f'(x∗).

By using (124) again, we have

1/(y_k − x∗) = (2 f'(x_k)) / ((x∗ − x_k)² f''(ξ_k)).

Hence

lim_{k→∞} (x_{k+1} − x∗) / (x_k − x∗)³ = (1/2) (f''(x∗)/f'(x∗))².
J


6 Euler Method

In this section, we focus on

y' = f(t, y),   y(t_0) = y_0,

where f is Lipschitz continuous w.r.t. the second variable, i.e.

|f(t, x) − f(t, y)| ≤ λ|x − y|,   λ > 0.   (126)

In the following, we will let y_n be the numerical approximation of y(t_n) and e_n = y_n − y(t_n) be the error.
Definition 6.1. (Order of the Method) A time stepping scheme

y_{n+1} = Φ(h, y_0, y_1, · · · , y_n)   (127)

is of order p ≥ 1 if, when exact solution values are inserted,

y(t_{n+1}) − Φ(h, y(t_0), y(t_1), · · · , y(t_n)) = O(h^{p+1}).   (128)

Definition 6.2. (Convergence of the Method) A time stepping scheme

yn+1 = Φ (h, y0 , y1 , · · · , yn ) (129)

is convergent , if

lim max y (tn ) − yn = 0. (130)
h→0 n

6.1 Euler’s method

Definition 6.3. (Forward Euler Methoda )

yn+1 = yn + hf (tn , yn ), n = 0, 1, 2, · · · . (131)


a Forward Euler Method is explicit.

Theorem 6.1. (Forward Euler Method is of order 1 a ) Forward Euler Method

y (tn+1 ) = y (tn ) + hf (tn , y (tn )), (132)

is of order 1 .
a You can also use multi-step theorem to derive it.

Proof. By the Taylor expansion,


y (tn+1 ) = y (tn ) + hy 0 (tn ) + O (h2 ). (133)
So,
y (tn+1 ) − y (tn ) − hf (tn , y (tn )) = y (tn ) + hy 0 (tn ) + O (h2 ) − y (tn ) − hf (tn , y (tn ))
= y (tn ) + hy 0 (tn ) + O (h2 ) − y (tn ) − hy 0 (tn ) (134)
= O (h2 ).


Therefore, Forward Euler Method (6.3) is order of 1 .

Theorem 6.2. (The convergence of Forward Euler Method) Forward Euler Method

y (tn+1 ) = y (tn ) + hf (tn , y (tn )), (135)

is convergent.

Proof. From (134), we get

y (tn+1 ) = y (tn ) + hf (tn , y (tn )) + O (h2 ), (136)

Subtracting (136) from (131), we get

en+1 = en + h[f (tn , yn ) − f (tn , y (tn ))] + ch2 . (137)

Since f is lipschitz continuous w.r.t. the second variable, then

|f (tn , yn ) − f (tn , y (tn ))| ≤ λ|yn − y (tn )|, λ > 0. (138)

Therefore,

‖e_{n+1}‖ ≤ ‖e_n‖ + hλ ‖e_n‖ + ch² = (1 + hλ) ‖e_n‖ + ch².   (139)

Claim: [2]

‖e_n‖ ≤ (c/λ) h [(1 + hλ)^n − 1],   n = 0, 1, · · ·   (140)

Proof of Claim (140): by induction on n.
1. When n = 0, e_0 = 0, hence ‖e_n‖ ≤ (c/λ) h[(1 + hλ)^n − 1].
2. Induction assumption:

‖e_n‖ ≤ (c/λ) h[(1 + hλ)^n − 1].

3. Induction step:

‖e_{n+1}‖ ≤ (1 + hλ) ‖e_n‖ + ch²   (141)
         ≤ (1 + hλ)(c/λ) h[(1 + hλ)^n − 1] + ch²   (142)
         = (c/λ) h[(1 + hλ)^{n+1} − 1].   (143)

So, from the claim (140), we get ‖e_n‖ → 0 when h → 0. Therefore the Forward Euler Method is convergent.

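A minimal forward Euler sketch; the test problem y' = y, y(0) = 1 with exact solution e^t is purely illustrative:

```python
import math

def forward_euler(f, t0, y0, h, n_steps):
    """Forward Euler sketch: y_{n+1} = y_n + h f(t_n, y_n)."""
    t, y = t0, y0
    for _ in range(n_steps):
        y = y + h * f(t, y)
        t = t + h
    return y

# Illustrative test problem y' = y, y(0) = 1, exact solution e^t
y1 = forward_euler(lambda t, y: y, 0.0, 1.0, 0.001, 1000)  # approximates e
```

Halving h roughly halves the global error at t = 1, which is the first-order convergence established above.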
Definition 6.4. (Tableau) The tableau of the Forward Euler method is

0 | 0
  | 1.


Solution. Since the Forward Euler method is as follows,

y_{n+1} = y_n + hf(t_n, y_n),

it can be rewritten in RK format, i.e.

ξ_1 = y_n
y_{n+1} = y_n + hf(t_n + 0h, ξ_1).

Definition 6.5. (Backward Euler Methodsa )

yn+1 = yn + hf (tn+1 , yn+1 ), n = 0, 1, 2, · · · . (144)


a Backward Euler Method is implicit.

Theorem 6.3. (backward Euler Method is of order 1 a ) Backward Euler Method

y (tn+1 ) = y (tn ) + hf (tn+1 , y (tn+1 )), (145)

is of order 1 .
a You can also use multi-step theorem to derive it.

Proof. By the Taylor expansion,

y (tn+1 ) = y (tn ) + hy 0 (tn ) + O (h2 ) (146)


y 0 (tn+1 ) = y 0 (tn ) + O (h). (147)

So,

y(t_{n+1}) − y(t_n) − hf(t_{n+1}, y(t_{n+1}))
= y(t_{n+1}) − y(t_n) − hy'(t_{n+1})
= y(t_n) + hy'(t_n) + O(h²) − y(t_n) − h[y'(t_n) + O(h)]   (148)
= O(h²).

Therefore, Backward Euler Method (6.5) is order of 1 .

Theorem 6.4. (The convergence of Backward Euler Method) Backward Euler Method

y (tn+1 ) = y (tn ) + hf (tn+1 , y (tn+1 )), (149)

is convergent.

Proof. From (148), we get

y (tn+1 ) = y (tn ) + hf (tn+1 , y (tn+1 )) + O (h2 ), (150)

Subtracting (150) from (144), we get

en+1 = en + h[f (tn+1 , yn+1 ) − f (tn+1 , y (tn+1 ))] + ch2 . (151)


Since f is lipschitz continuous w.r.t. the second variable, then

|f (tn+1 , yn+1 ) − f (tn+1 , y (tn+1 ))| ≤ λ|yn+1 − y (tn+1 )|, λ > 0. (152)

Therefore,

‖e_{n+1}‖ ≤ ‖e_n‖ + hλ ‖e_{n+1}‖ + ch².   (153)

So,

(1 − hλ) ‖e_{n+1}‖ ≤ ‖e_n‖ + ch².   (154)

Since e_0 = 0, iterating (154) gives

‖e_{n+1}‖ ≤ ‖e_0‖/(1 − hλ)^{n+1} + c Σ_{k=1}^{n+1} h²/(1 − hλ)^k = c Σ_{k=1}^{n+1} h²/(1 − hλ)^k.   (155)

Taking h small enough that hλ ≤ 1/2, we have 1/(1 − hλ) ≤ 1 + 2hλ ≤ e^{2hλ}, hence, for (n + 1)h ≤ T,

‖e_{n+1}‖ ≤ c h²(n + 1) e^{2(n+1)hλ} ≤ c T e^{2λT} h.

So, from (155), we get ‖e_n‖ → 0 when h → 0. Therefore the Backward Euler Method is convergent.
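A backward Euler sketch. The implicit equation for y_{n+1} is solved here by plain fixed-point iteration, which converges when h·λ is small; a Newton solve is the more usual choice in practice. The test problem is illustrative:

```python
import math

def backward_euler(f, t0, y0, h, n_steps, inner_iters=50):
    """Backward Euler sketch: y_{n+1} = y_n + h f(t_{n+1}, y_{n+1}).
    The implicit equation is solved by plain fixed-point iteration."""
    t, y = t0, y0
    for _ in range(n_steps):
        t_next = t + h
        z = y                             # initial guess for y_{n+1}
        for _ in range(inner_iters):
            z = y + h * f(t_next, z)      # z = y_n + h f(t_{n+1}, z)
        t, y = t_next, z
    return y

# Illustrative test problem y' = -y, y(0) = 1, exact solution e^{-t}
y1 = backward_euler(lambda t, y: -y, 0.0, 1.0, 0.001, 1000)  # approximates 1/e
```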

Definition 6.6. (Tableau) The tableau of the Backward Euler method is

0 | 0 0
1 | 0 1
  | 0 1.

Solution. Since the Backward Euler method is as follows,

y_{n+1} = y_n + hf(t_{n+1}, y_{n+1}),

it can be rewritten in RK format, i.e.

ξ_1 = y_n
ξ_2 = y_n + h [0 f(t_n + 0h, ξ_1) + 1 f(t_n + 1h, ξ_2)]
y_{n+1} = y_n + hf(t_n + h, ξ_2).

6.2 Trapezoidal Method

Definition 6.7. (Trapezoidal Method a)

y_{n+1} = y_n + (1/2) h[f(t_n, y_n) + f(t_{n+1}, y_{n+1})],   n = 0, 1, 2, · · · .   (156)

a The Trapezoidal Method is a combination of the Forward Euler Method and the Backward Euler Method.


Theorem 6.5. (Trapezoidal Method is of order 2 a) The Trapezoidal Method

y(t_{n+1}) = y(t_n) + (1/2) h[f(t_n, y(t_n)) + f(t_{n+1}, y(t_{n+1}))]   (157)

is of order 2.

a You can also use the multi-step theorem to derive it.

Proof. By the Taylor expansion,

1 2 00
y (tn+1 ) = y (tn ) + hy 0 (tn ) + h y (tn ) + O (h3 ) (158)
2!
y 0 (tn+1 ) = y 0 (tn ) + hy 00 (tn ) + O (h2 ). (159)

So,

y(t_{n+1}) − y(t_n) − (1/2) h[f(t_n, y(t_n)) + f(t_{n+1}, y(t_{n+1}))]
= y(t_{n+1}) − y(t_n) − (1/2) h[y'(t_n) + y'(t_{n+1})]
= y(t_n) + hy'(t_n) + (1/2!) h² y''(t_n) + O(h³) − y(t_n) − (1/2) h[y'(t_n) + y'(t_n) + hy''(t_n) + O(h²)]   (160)
= O(h³).

Therefore, Trapezoidal Method (6.7) is order of 2 .

Theorem 6.6. (The convergence of the Trapezoidal Method) The Trapezoidal Method

y(t_{n+1}) = y(t_n) + (1/2) h[f(t_n, y(t_n)) + f(t_{n+1}, y(t_{n+1}))]   (161)

is convergent.

Proof. From (160), we get

y(t_{n+1}) = y(t_n) + (1/2) h[f(t_n, y(t_n)) + f(t_{n+1}, y(t_{n+1}))] + O(h³).   (162)

Subtracting (162) from (156), we get

e_{n+1} = e_n + (1/2) h[f(t_n, y_n) − f(t_n, y(t_n)) + f(t_{n+1}, y_{n+1}) − f(t_{n+1}, y(t_{n+1}))] + ch³.   (163)
Since f is lipschitz continuous w.r.t. the second variable, then

|f (tn , yn ) − f (tn , y (tn ))| ≤ λ|yn − y (tn )|, λ > 0, (164)


|f (tn+1 , yn+1 ) − f (tn+1 , y (tn+1 ))| ≤ λ|yn+1 − y (tn+1 )|, λ > 0. (165)

Therefore,

‖e_{n+1}‖ ≤ ‖e_n‖ + (1/2) hλ(‖e_n‖ + ‖e_{n+1}‖) + ch³.   (166)

So,

(1 − (1/2)hλ) ‖e_{n+1}‖ ≤ (1 + (1/2)hλ) ‖e_n‖ + ch³.   (167)

Claim: [2]

‖e_n‖ ≤ (c/λ) h² [((1 + (1/2)hλ)/(1 − (1/2)hλ))^n − 1],   n = 0, 1, · · ·   (168)

Proof for Claim (168): The proof is by induction on n.

Then, we can make h small enough that 0 < hλ < 2, so that

(1 + (1/2)hλ)/(1 − (1/2)hλ) = 1 + hλ/(1 − (1/2)hλ) ≤ Σ_{ℓ=0}^{∞} (1/ℓ!) (hλ/(1 − (1/2)hλ))^ℓ = exp(hλ/(1 − (1/2)hλ)).

Therefore,

‖e_n‖ ≤ (c/λ) h² [((1 + (1/2)hλ)/(1 − (1/2)hλ))^n − 1] ≤ (c/λ) h² ((1 + (1/2)hλ)/(1 − (1/2)hλ))^n ≤ (c/λ) h² exp(nhλ/(1 − (1/2)hλ)).   (169)

This bound holds for every nonnegative integer n such that nh < T. Therefore,

‖e_n‖ ≤ (c/λ) h² exp(nhλ/(1 − (1/2)hλ)) ≤ (c/λ) h² exp(Tλ/(1 − (1/2)hλ)).   (170)
So, from the claim (170), we get ‖e_n‖ → 0 when h → 0. Therefore the Trapezoidal Method is convergent.
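A trapezoidal rule sketch with the implicit stage again handled by fixed-point iteration; the two runs at h and h/2 illustrate the second-order error decay claimed above (test problem illustrative):

```python
import math

def trapezoidal(f, t0, y0, h, n_steps, inner_iters=50):
    """Trapezoidal sketch:
    y_{n+1} = y_n + (h/2)[f(t_n, y_n) + f(t_{n+1}, y_{n+1})],
    with the implicit stage solved by fixed-point iteration."""
    t, y = t0, y0
    for _ in range(n_steps):
        t_next = t + h
        z = y
        for _ in range(inner_iters):
            z = y + 0.5 * h * (f(t, y) + f(t_next, z))
        t, y = t_next, z
    return y

# Halving h on y' = y, y(0) = 1 divides the error at t = 1 by ~4 (second order)
e1 = abs(trapezoidal(lambda t, y: y, 0.0, 1.0, 0.1, 10) - math.e)
e2 = abs(trapezoidal(lambda t, y: y, 0.0, 1.0, 0.05, 20) - math.e)
```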

Definition 6.8. (Tableau) The tableau of the Trapezoidal method is

0 | 0   0
1 | 1/2 1/2
  | 1/2 1/2.

6.3 Theta Method

Definition 6.9. (Theta Methoda )

yn+1 = yn + h[θf (tn , yn ) + (1 − θ )f (tn+1 , yn+1 )], n = 0, 1, 2, · · · . (171)


a Theta Method is a general form of Forward Euler Method (θ = 1), Backward Euler Method (θ = 0) and
Trapezoidal Method (θ = 12 ).

Definition 6.10. (Tableau) The tableau of the θ-method is

0 | 0 0
1 | θ 1−θ
  | θ 1−θ.


Solution. Since the θ-method’s scheme is as follows,

y_{n+1} = y_n + h[θf(t_n, y_n) + (1 − θ)f(t_{n+1}, y_{n+1})],   n = 0, 1, 2, · · · ,

this scheme can be rewritten as an RK scheme, i.e.

ξ_1 = y_n
ξ_2 = y_n + h [θf(t_n + 0h, ξ_1) + (1 − θ)f(t_n + 1h, ξ_2)]
y_{n+1} = y_n + h[θf(t_n + 0h, ξ_1) + (1 − θ)f(t_n + h, ξ_2)].

So, the tableau of the θ-method is

0 | 0 0
1 | θ 1−θ
  | θ 1−θ.
J

6.4 Midpoint Rule Method

Definition 6.11. (Midpoint Rule Method)

y_{n+1} = y_n + hf(t_n + (1/2)h, (1/2)(y_n + y_{n+1})).   (172)

Theorem 6.7. (Midpoint Rule Method is of order 2) The Midpoint Rule Method

y(t_{n+1}) = y(t_n) + hf(t_n + (1/2)h, (1/2)(y(t_n) + y(t_{n+1})))   (173)

is of order 2.
is of order 2 .

Proof. By the Taylor expansion,

y(t_{n+1}) = y(t_n) + hy'(t_n) + (1/2!) h² y''(t_n) + O(h³),   (174)
f(x_0 + Δx, y_0 + Δy) = f(x_0, y_0) + (Δx ∂/∂x + Δy ∂/∂y) f(x_0, y_0) + O(h²),   (175)

and the chain rule

y'' = f'(t, y) = ∂f(t, y)/∂t + (∂f(t, y)/∂y) f(t, y).   (176)

So,

y(t_{n+1}) − y(t_n) − hf(t_n + (1/2)h, (1/2)(y(t_n) + y(t_{n+1})))
= y(t_n) + hy'(t_n) + (1/2!) h² y''(t_n) + O(h³) − y(t_n)
  − h(f(t_n, y(t_n)) + (1/2)h ∂f(t_n, y(t_n))/∂t + (1/2)(y(t_{n+1}) − y(t_n)) ∂f(t_n, y(t_n))/∂y + O(h²))
= hy'(t_n) + (1/2!) h² (∂f(t_n, y(t_n))/∂t + (∂f(t_n, y(t_n))/∂y) y'(t_n)) + O(h³)
  − (hy'(t_n) + (1/2)h² ∂f(t_n, y(t_n))/∂t + (1/2)h² (∂f(t_n, y(t_n))/∂y) y'(t_n) + O(h³))
= O(h³),

where we used (1/2)(y(t_{n+1}) − y(t_n)) = (1/2)hy'(t_n) + O(h²). Therefore, the Midpoint Rule Method (172) is of order 2.

Theorem 6.8. (The convergence of the Midpoint Rule Method) The Midpoint Rule Method

y(t_{n+1}) = y(t_n) + hf(t_n + (1/2)h, (1/2)(y(t_n) + y(t_{n+1})))   (177)

is convergent.

Proof. From (177), we get

y(t_{n+1}) = y(t_n) + hf(t_n + (1/2)h, (1/2)(y(t_n) + y(t_{n+1}))) + O(h³).   (178)

Subtracting (178) from (172), we get

e_{n+1} = e_n + h[f(t_n + (1/2)h, (1/2)(y_n + y_{n+1})) − f(t_n + (1/2)h, (1/2)(y(t_n) + y(t_{n+1})))] + ch³.   (179)

Since f is Lipschitz continuous w.r.t. the second variable,

|f(t_n + (1/2)h, (1/2)(y_n + y_{n+1})) − f(t_n + (1/2)h, (1/2)(y(t_n) + y(t_{n+1})))| ≤ (λ/2)|y_n − y(t_n) + y_{n+1} − y(t_{n+1})|,   λ > 0.   (180)

Therefore,

‖e_{n+1}‖ ≤ ‖e_n‖ + (1/2) hλ(‖e_n‖ + ‖e_{n+1}‖) + ch³.   (181)

So,

(1 − (1/2)hλ) ‖e_{n+1}‖ ≤ (1 + (1/2)hλ) ‖e_n‖ + ch³.   (182)

Claim: [2]

‖e_n‖ ≤ (c/λ) h² [((1 + (1/2)hλ)/(1 − (1/2)hλ))^n − 1],   n = 0, 1, · · ·   (183)

Proof of Claim (183): by induction on n.


Then, we can make h small enough that 0 < hλ < 2, so that

(1 + (1/2)hλ)/(1 − (1/2)hλ) = 1 + hλ/(1 − (1/2)hλ) ≤ Σ_{ℓ=0}^{∞} (1/ℓ!) (hλ/(1 − (1/2)hλ))^ℓ = exp(hλ/(1 − (1/2)hλ)).

Therefore,

‖e_n‖ ≤ (c/λ) h² [((1 + (1/2)hλ)/(1 − (1/2)hλ))^n − 1] ≤ (c/λ) h² ((1 + (1/2)hλ)/(1 − (1/2)hλ))^n ≤ (c/λ) h² exp(nhλ/(1 − (1/2)hλ)).   (184)

This bound holds for every nonnegative integer n such that nh < T. Therefore,

‖e_n‖ ≤ (c/λ) h² exp(nhλ/(1 − (1/2)hλ)) ≤ (c/λ) h² exp(Tλ/(1 − (1/2)hλ)).   (185)

So, from the claim (185), we get ‖e_n‖ → 0 when h → 0. Therefore the Midpoint Rule Method is convergent.
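An implicit midpoint sketch, with the implicit stage solved by fixed-point iteration (test problem illustrative):

```python
import math

def implicit_midpoint(f, t0, y0, h, n_steps, inner_iters=50):
    """Midpoint rule sketch:
    y_{n+1} = y_n + h f(t_n + h/2, (y_n + y_{n+1})/2),
    with the implicit stage solved by fixed-point iteration."""
    t, y = t0, y0
    for _ in range(n_steps):
        z = y
        for _ in range(inner_iters):
            z = y + h * f(t + 0.5 * h, 0.5 * (y + z))
        t, y = t + h, z
    return y

y1 = implicit_midpoint(lambda t, y: y, 0.0, 1.0, 0.1, 10)  # approximates e
```

On the linear problem y' = y this step reduces to the same amplification factor as the trapezoidal rule, consistent with both methods being of order 2.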

6.5 Problems

Problem 6.1. (Prelim Aug. 2013#1)

Solution. J


7 Multistep Methods

7.1 The Adams Method

Definition 7.1. (s-step Adams–Bashforth)

y_{n+s} = y_{n+s−1} + h Σ_{m=0}^{s−1} b_m f(t_{n+m}, y_{n+m}),   (186)

where

b_m = h^{−1} ∫_{t_{n+s−1}}^{t_{n+s}} p_m(τ) dτ = h^{−1} ∫_0^h p_m(t_{n+s−1} + τ) dτ,   n = 0, 1, 2, · · · ,
p_m(t) = Π_{l=0, l≠m}^{s−1} (t − t_{n+l})/(t_{n+m} − t_{n+l})   (Lagrange interpolation polynomials).

(1-step Adams–Bashforth)

y_{n+1} = y_n + hf(t_n, y_n),

(2-step Adams–Bashforth)

y_{n+2} = y_{n+1} + h ((3/2) f(t_{n+1}, y_{n+1}) − (1/2) f(t_n, y_n)),

(3-step Adams–Bashforth)

y_{n+3} = y_{n+2} + h ((23/12) f(t_{n+2}, y_{n+2}) − (4/3) f(t_{n+1}, y_{n+1}) + (5/12) f(t_n, y_n)).
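The 2-step Adams–Bashforth formula above can be sketched as follows; the needed second starting value y_1 is generated here with one forward Euler step (a simple, if low-order, bootstrap), and the test problem is illustrative:

```python
import math

def adams_bashforth2(f, t0, y0, h, n_steps):
    """2-step Adams-Bashforth sketch:
    y_{n+2} = y_{n+1} + h[(3/2) f(t_{n+1}, y_{n+1}) - (1/2) f(t_n, y_n)].
    The second starting value y_1 is bootstrapped with one forward Euler step."""
    t = [t0, t0 + h]
    y = [y0, y0 + h * f(t0, y0)]
    for _ in range(n_steps - 1):
        y_next = y[-1] + h * (1.5 * f(t[-1], y[-1]) - 0.5 * f(t[-2], y[-2]))
        t.append(t[-1] + h)
        y.append(y_next)
    return y[-1]

y1 = adams_bashforth2(lambda t, y: y, 0.0, 1.0, 0.001, 1000)  # approximates e
```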

7.2 The Order and Convergence of Multistep Methods

Definition 7.2. (General s-step Method) The general s-step method a can be written as

Σ_{m=0}^{s} a_m y_{n+m} = h Σ_{m=0}^{s} b_m f(t_{n+m}, y_{n+m}),   (187)

where a_m, b_m, m = 0, · · · , s, are given constants, independent of h, n and the original equation.

a if b_s = 0 the method is explicit; otherwise it is implicit.


Theorem 7.1. (s-step method convergence order) The multistep method (187) is of order p ≥ 1 if and only if there exists c ≠ 0 s.t.

ρ(w) − σ(w) ln w a = c(w − 1)^{p+1} + O(|w − 1|^{p+2}),   w → 1,   (188)

where

ρ(w) := Σ_{m=0}^{s} a_m w^m   and   σ(w) := Σ_{m=0}^{s} b_m w^m.   (189)

a Let w = ξ + 1; then ln(1 + ξ) = Σ_{n=0}^{∞} (−1)^n ξ^{n+1}/(n + 1) = ξ − ξ²/2 + ξ³/3 − ξ⁴/4 + · · · + (−1)^n ξ^{n+1}/(n + 1) + · · · , ξ ∈ (−1, 1).

Theorem 7.2. (s-step method convergence order) The multistep method (187) is of order p ≥ 1 if and only if

1. Σ_{m=0}^{s} a_m = 0 (i.e. ρ(1) = 0),
2. Σ_{m=0}^{s} m^k a_m = k Σ_{m=0}^{s} m^{k−1} b_m, k = 1, 2, · · · , p,
3. Σ_{m=0}^{s} m^{p+1} a_m ≠ (p + 1) Σ_{m=0}^{s} m^p b_m,

where

ρ(w) := Σ_{m=0}^{s} a_m w^m   and   σ(w) := Σ_{m=0}^{s} b_m w^m.   (190)

Lemma 7.1. (Root Condition) If the roots of ρ satisfy |λ_i| ≤ 1 for each i = 1, · · · , m and all roots with modulus 1 are simple roots, then the difference method is said to satisfy the root condition.

Theorem 7.3. (The Dahlquist equivalence theorem) The multistep method (187) is convergent if and only if

1. consistency: the multistep method (187) is of order p ≥ 1,
2. stability: the polynomial ρ(w) satisfies the root condition.

7.3 Method of A-stable verification for Multistep Methods

Theorem 7.4. Explicit multistep methods cannot be A-stable.

Theorem 7.5. (Dahlquist second barrier) The highest order of an A-stable multistep method is 2.

7.4 Problems

Problem 7.1. Find the order of the following quadrature formula:

∫_0^1 f(τ) dτ = (1/6) f(0) + (2/3) f(1/2) + (1/6) f(1),   Simpson Rule.

Solution. Since a quadrature formula is of order p if it is exact for every f ∈ P_{p−1}, we can choose the simplest basis (1, τ, τ², τ³, · · · , τ^{p−1}), and the order conditions read

Σ_{j=1}^{p} b_j c_j^m = ∫_a^b τ^m w(τ) dτ,   m = 0, 1, · · · , p − 1.   (191)

Checking the order conditions by the following procedure:

1 = ∫_0^1 1 dτ = 1/6 + 2/3 + 1/6 = 1,
1/2 = ∫_0^1 τ dτ = (1/6)·0 + (2/3)(1/2) + (1/6)·1 = 1/2,
1/3 = ∫_0^1 τ² dτ = (1/6)·0² + (2/3)(1/2)² + (1/6)·1² = 1/3,
1/4 = ∫_0^1 τ³ dτ = (1/6)·0³ + (2/3)(1/2)³ + (1/6)·1³ = 1/4,
1/5 = ∫_0^1 τ⁴ dτ ≠ (1/6)·0⁴ + (2/3)(1/2)⁴ + (1/6)·1⁴ = 5/24,

we get that the order of the Simpson rule quadrature formula is 4. J

Problem 7.2. Recall Simpson’s quadrature rule:

∫_a^b f(τ) dτ = ((b − a)/6) [f(a) + 4f((a + b)/2) + f(b)] + O(|b − a|⁵),   Simpson Rule.

Starting from the identity

y(t_{n+1}) − y(t_{n−1}) = ∫_{t_{n−1}}^{t_{n+1}} f(s; y(s)) ds,   (192)

use Simpson’s rule to derive a 3-step method. Determine its order and whether it is convergent.

Solution. 1. The derivation of the 3-step method.

Since

y(t_{n+1}) − y(t_{n−1}) = ∫_{t_{n−1}}^{t_{n+1}} f(s; y(s)) ds,   (193)

then, by Simpson’s quadrature rule, we have

y(t_{n+1}) − y(t_{n−1})   (194)
= ∫_{t_{n−1}}^{t_{n+1}} f(s; y(s)) ds   (195)
= ((t_{n+1} − t_{n−1})/6) [f(t_{n−1}; y(t_{n−1})) + 4f((t_{n−1} + t_{n+1})/2; y((t_{n−1} + t_{n+1})/2)) + f(t_{n+1}; y(t_{n+1}))]   (196)
= (h/3) [f(t_{n−1}; y(t_{n−1})) + 4f(t_n; y(t_n)) + f(t_{n+1}; y(t_{n+1}))].   (197)


Therefore, the 3-step method derived from Simpson’s rule is

y(t_{n+1}) = y(t_{n−1}) + (h/3) [f(t_{n−1}; y(t_{n−1})) + 4f(t_n; y(t_n)) + f(t_{n+1}; y(t_{n+1}))],   (198)

or

y(t_{n+2}) − y(t_n) = (h/3) [f(t_n; y(t_n)) + 4f(t_{n+1}; y(t_{n+1})) + f(t_{n+2}; y(t_{n+2}))].   (199)
3
2. The order. For this problem

ρ(w) := Σ_{m=0}^{s} a_m w^m = −1 + w²   and   σ(w) := Σ_{m=0}^{s} b_m w^m = 1/3 + (4/3)w + (1/3)w².   (200)

By making the substitution w = ξ + 1,

ρ(w) = ξ² + 2ξ   and   σ(w) = (1/3)ξ² + 2ξ + 2.   (201)

So,

ρ(w) − σ(w) ln(w) = ξ² + 2ξ − (2 + 2ξ + (1/3)ξ²)(ξ − ξ²/2 + ξ³/3 − ξ⁴/4 + ξ⁵/5 − · · ·).

Collecting powers of ξ, the ξ and ξ² terms cancel, the ξ³ coefficient is −(2/3 − 1 + 1/3) = 0, the ξ⁴ coefficient is −(−1/2 + 2/3 − 1/6) = 0, and the ξ⁵ coefficient is −(2/5 − 1/2 + 1/9) = −1/90. Therefore, by the theorem,

ρ(w) − σ(w) ln(w) = −(1/90) ξ⁵ + O(ξ⁶).

Hence, this scheme is of order 4.

3. The stability. Since

ρ(w) := Σ_{m=0}^{s} a_m w^m = −1 + w² = (w − 1)(w + 1),   (202)

and w = ±1 are simple roots, the root condition is satisfied. Therefore, this scheme is stable. Hence, it is of order 4 and convergent. J
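The order conditions of Theorem 7.2 can also be checked mechanically in exact rational arithmetic; a sketch for the Simpson-rule method derived above (the helper name `multistep_order` is illustrative):

```python
from fractions import Fraction

# Coefficients of y_{n+2} - y_n = h[(1/3) f_n + (4/3) f_{n+1} + (1/3) f_{n+2}]
a = [Fraction(-1), Fraction(0), Fraction(1)]
b = [Fraction(1, 3), Fraction(4, 3), Fraction(1, 3)]

def multistep_order(a, b, max_p=10):
    """Largest p with sum_m m^k a_m = k sum_m m^{k-1} b_m for k = 1..p,
    given sum_m a_m = 0, following the order conditions of Theorem 7.2."""
    if sum(a) != 0:
        return 0
    p = 0
    for k in range(1, max_p + 1):
        lhs = sum(Fraction(m) ** k * am for m, am in enumerate(a))
        rhs = k * sum(Fraction(m) ** (k - 1) * bm for m, bm in enumerate(b))
        if lhs != rhs:
            break
        p = k
    return p

p = multistep_order(a, b)
```

Exact fractions avoid any floating-point ambiguity when deciding whether an order condition holds with equality.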

Problem 7.3. Restricting your attention to scalar autonomous y' = f(y), prove that the ERK method with tableau

0   |
1/2 | 1/2
1/2 | 0   1/2
1   | 0   0   1
    | 1/6 1/3 1/3 1/6

is of order 4.


Solution. J

Problem 7.4. (Prelim Jan. 2011#5) Consider

y 0 (t ) = f (t, y (t )), t ≥ t0 , y (t0 ) = y0 ,

where f : [t0 , t ∗ ] × R → R is continuous in its first variable and Lipschitz continuous in its second variable.
Prove that Euler’s method converges.

Solution. The Euler’s scheme is as follows:

yn+1 = yn + hf (tn , yn ), n = 0, 1, 2, · · · . (203)

By the Taylor expansion,

y (tn+1 ) = y (tn ) + hy 0 (tn ) + O (h2 ).

So,

y(t_{n+1}) - y(t_n) - h f(t_n, y(t_n)) = y(t_n) + h y'(t_n) + O(h^2) - y(t_n) - h f(t_n, y(t_n))
                                       = y(t_n) + h y'(t_n) + O(h^2) - y(t_n) - h y'(t_n)   (204)
                                       = O(h^2).

Therefore, the forward Euler method is of order 1.


From (204), we get

y(t_{n+1}) = y(t_n) + h f(t_n, y(t_n)) + O(h^2).   (205)

Let e_n := y_n - y(t_n). Subtracting (205) from (203), we get

e_{n+1} = e_n + h[f(t_n, y_n) - f(t_n, y(t_n))] + O(h^2).

Since f is Lipschitz continuous w.r.t. the second variable,

|f(t_n, y_n) - f(t_n, y(t_n))| ≤ λ |y_n - y(t_n)| = λ |e_n|, \quad λ > 0.

Therefore, with c the constant hidden in the O(h^2) term,

‖e_{n+1}‖ ≤ ‖e_n‖ + hλ ‖e_n‖ + ch^2 = (1 + hλ) ‖e_n‖ + ch^2.

Claim [2]:

‖e_n‖ ≤ \frac{c}{λ} h \left[(1 + hλ)^n - 1\right], \quad n = 0, 1, \cdots.

Proof of the claim, by induction on n:

1. When n = 0, e_0 = 0, hence ‖e_0‖ ≤ \frac{c}{λ} h [(1 + hλ)^0 - 1] = 0.

2. Induction assumption:

‖e_n‖ ≤ \frac{c}{λ} h \left[(1 + hλ)^n - 1\right].

Page 92 of 236
Wenqiang Feng Prelim Exam note for Numerical Analysis Page 93

3. Induction step:

‖e_{n+1}‖ ≤ (1 + hλ) ‖e_n‖ + ch^2
         ≤ (1 + hλ) \frac{c}{λ} h \left[(1 + hλ)^n - 1\right] + ch^2
         = \frac{c}{λ} h \left[(1 + hλ)^{n+1} - (1 + hλ)\right] + ch^2
         = \frac{c}{λ} h \left[(1 + hλ)^{n+1} - 1\right].

From the claim, since nh ≤ t^* - t_0 implies (1 + hλ)^n ≤ e^{λ(t^* - t_0)}, we get ‖e_n‖ ≤ \frac{c}{λ}\left(e^{λ(t^* - t_0)} - 1\right) h → 0 as h → 0. Therefore the forward Euler method is convergent.
J
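The first-order convergence just proved is easy to see in practice; a minimal sketch (my own helper names, test problem y' = y rather than a general f) checks that halving h roughly halves the global error at t = 1:

```python
import math

def euler(h):
    # forward Euler for y' = y, y(0) = 1 on [0, 1]
    y, n = 1.0, round(1.0 / h)
    for _ in range(n):
        y += h * y
    return y

e1 = abs(euler(0.1)  - math.e)
e2 = abs(euler(0.05) - math.e)
print(e1 / e2)  # roughly 2, consistent with first order
```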

Problem 7.5. (Prelim Jan. 2011#6) Consider the scheme

yn+2 + yn+1 − 2yn = h (f (tn+2 , yn+2 ) + f (tn+1 , yn+1 ) + f (tn , yn ))

for approximating the solution to

y'(t) = f(t, y(t)), \quad t ≥ t_0, \qquad y(t_0) = y_0,

what’s the order of the scheme? Is it a convergent scheme? Is it A-stable? Justify your answers.

Solution. For this method,

\rho(w) := \sum_{m=0}^{s} a_m w^m = -2 + w + w^2 \quad\text{and}\quad \sigma(w) := \sum_{m=0}^{s} b_m w^m = 1 + w + w^2.   (206)

By making the substitution \xi = w - 1, i.e. w = \xi + 1, we obtain

\rho(\xi+1) = \xi^2 + 3\xi \quad\text{and}\quad \sigma(\xi+1) = \xi^2 + 3\xi + 3.   (207)

So, using \ln(w) = \ln(1+\xi) = \xi - \frac{\xi^2}{2} + \frac{\xi^3}{3} - \cdots,

\rho(w) - \sigma(w)\ln(w) = \xi^2 + 3\xi - (3 + 3\xi + \xi^2)\left(\xi - \frac{\xi^2}{2} + \frac{\xi^3}{3} - \cdots\right)
                          = \xi^2 + 3\xi - \left(3\xi + \frac{3}{2}\xi^2 + \frac{1}{2}\xi^3 + \cdots\right)
                          = -\frac{1}{2}\xi^2 + O(\xi^3).

Therefore, by the theorem, this scheme is of order 1.

The stability: since
\rho(w) = \sum_{m=0}^{s} a_m w^m = -2 + w + w^2 = (w + 2)(w - 1),   (208)

the roots are w = 1 and w = -2. Since |-2| > 1, the root condition is violated. Therefore, this scheme is not zero-stable, hence not convergent, and in particular it is not A-stable.
J


Problem 7.6. (Prelim Jan. 2011#4)

Solution. J


8 Runge-Kutta Methods
8.1 Quadrature Formulas

Definition 8.1. (The Quadrature) The Quadrature is the procedure of replacing an integral with a finite
sum.

Definition 8.2. (The Quadrature Formula) Let w be a nonnegative weight function on (a,b) s.t.

0 < \int_a^b w(\tau)\, d\tau < \infty, \qquad \left|\int_a^b \tau^j w(\tau)\, d\tau\right| < \infty, \quad j = 1, 2, \cdots.

Then, the quadrature formula is as follows:

\int_a^b f(\tau) w(\tau)\, d\tau \approx \sum_{j=1}^{\nu} b_j f(c_j).   (209)

Remark 8.1. The quadrature formula (209) is of order p if it is exact for every f ∈ P_{p-1}.

8.2 Explicit Runge-Kutta Formulas

Definition 8.3. (Explicit Runge-Kutta Formulas) An explicit Runge-Kutta method integrates from t_n to t_{n+1},

y(t_{n+1}) = y(t_n) + \int_{t_n}^{t_{n+1}} f(\tau, y(\tau))\, d\tau = y(t_n) + h \int_0^1 f(t_n + h\tau, y(t_n + h\tau))\, d\tau,

and replaces the last integral by a quadrature, i.e.

y_{n+1} = y_n + h \sum_{j=1}^{\nu} b_j f(t_n + c_j h, y(t_n + c_j h)).

Specifically, we have

\xi_1 = y_n,
\xi_2 = y_n + h a_{21} f(t_n, \xi_1),
\xi_3 = y_n + h a_{31} f(t_n + c_1 h, \xi_1) + h a_{32} f(t_n + c_2 h, \xi_2),
\vdots
\xi_\nu = y_n + h \sum_{i=1}^{\nu-1} a_{\nu i} f(t_n + c_i h, \xi_i),
y_{n+1} = y_n + h \sum_{j=1}^{\nu} b_j f(t_n + c_j h, \xi_j).


Definition 8.4. (Tableaux) The tableau of an ERK method is

c | A
  | b^T

where A is a strictly lower triangular matrix.

Remark 8.2. Observe that the condition

\sum_{i=1}^{j-1} a_{j,i} = c_j, \quad j = 2, 3, \cdots, \nu,

is necessary for order 1.

8.3 Implicit Runge-Kutta Formulas

Definition 8.5. (Implicit Runge-Kutta Formulas) Implicit Runge-Kutta methods use the scheme

\xi_j = y_n + h \sum_{i=1}^{\nu} a_{j,i} f(t_n + c_i h, \xi_i), \quad j = 1, 2, \cdots, \nu,
y_{n+1} = y_n + h \sum_{j=1}^{\nu} b_j f(t_n + c_j h, \xi_j).

8.4 Method of A-stable verification for Runge-Kutta Method

Theorem 8.1. Explicit Runge-Kutta methods cannot be A-stable.

Theorem 8.2. (Necessary & sufficient condition) A necessary and sufficient condition for a Runge-Kutta method to be A-stable is

|r(z)| < 1, \quad \forall z \in \mathbb{C}^-,

where

r(z) = 1 + z b^T (I - zA)^{-1} \mathbf{1}.
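Theorem 8.2 can be exercised directly. The sketch below (the helper `stability_r` and the sample point z = -3 are my own choices) evaluates r(z) = 1 + z b^T (I - zA)^{-1} 1 via a small Gaussian elimination, for the one-stage explicit Euler tableau (A = [0], b = [1], giving r(z) = 1 + z) and the one-stage implicit Euler tableau (A = [1], b = [1], giving r(z) = 1/(1-z)):

```python
def stability_r(A, b, z):
    # r(z) = 1 + z * b^T (I - z A)^{-1} 1, solved by Gaussian elimination
    n = len(b)
    M = [[(1.0 if i == j else 0.0) - z * A[i][j] for j in range(n)] for i in range(n)]
    x = [1.0] * n                     # right-hand side: the all-ones vector
    for k in range(n):                # forward elimination with partial pivoting
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= m * M[k][j]
            x[i] -= m * x[k]
    for k in range(n - 1, -1, -1):    # back substitution
        x[k] = (x[k] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return 1.0 + z * sum(b[i] * x[i] for i in range(n))

# explicit Euler: |r(-3)| = |-2| > 1 (not A-stable on this ray)
# implicit Euler: |r(-3)| = 0.25 < 1
print(stability_r([[0.0]], [1.0], -3.0), stability_r([[1.0]], [1.0], -3.0))
```

Evaluating a single point of the negative real axis is of course only illustrative of the criterion, not a verification of A-stability.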

8.5 Problems


9 Finite Difference Method

Definition 9.1. (Discrete 2-norm) The discrete 2-norm is defined as follows:

‖u‖_{2,h}^2 = h^d \sum_{i=1}^{N} |u_i|^2,

where d is the dimension.

Theorem 9.1. (Discrete maximum principle) Let A = tridiag{a_i, b_i, c_i}_{i=1}^n ∈ R^{n×n} be a tridiagonal matrix with the properties that

b_i > 0, \quad a_i, c_i ≤ 0, \quad a_i + b_i + c_i = 0.

Prove the following maximum principle: If u ∈ R^n is such that (Au)_{i=2,\cdots,n-1} ≤ 0, then u_i ≤ \max\{u_1, u_n\}.

Proof. Without loss of generality, we assume the maximum value is attained at an interior index k ∈ {2, \cdots, n-1}, i.e. at u_k.


1. For (Au)_{i=2,\cdots,n-1} < 0:
I will use contradiction to prove this case. Since (Au)_{i=2,\cdots,n-1} < 0,

a_k u_{k-1} + b_k u_k + c_k u_{k+1} < 0.

Since a_k + c_k = -b_k, and a_k, c_k ≤ 0 while u_{k-1} - u_k ≤ 0 and u_{k+1} - u_k ≤ 0 (u_k is the maximum),

a_k u_{k-1} - (a_k + c_k) u_k + c_k u_{k+1} = a_k (u_{k-1} - u_k) + c_k (u_{k+1} - u_k) ≥ 0.

This contradicts (Au)_{i=2,\cdots,n-1} < 0. Therefore, if u ∈ R^n is such that (Au)_{i=2,\cdots,n-1} < 0, then u_i < \max\{u_1, u_n\}.
2. For (Au)_{i=2,\cdots,n-1} = 0:
Since (Au)_{i=2,\cdots,n-1} = 0,

a_k u_{k-1} + b_k u_k + c_k u_{k+1} = 0.

Since a_k + c_k = -b_k,

a_k u_{k-1} - (a_k + c_k) u_k + c_k u_{k+1} = a_k (u_{k-1} - u_k) + c_k (u_{k+1} - u_k) = 0.

And a_k < 0, c_k < 0, u_{k-1} - u_k ≤ 0, u_{k+1} - u_k ≤ 0, so both terms are nonnegative and must vanish; hence u_{k-1} = u_k = u_{k+1}, that is to say, u_{k-1} and u_{k+1} are also maximum points. By using the same argument again, we get u_{k-2} = u_{k-1} = u_k = u_{k+1} = u_{k+2}. Repeating the process, we get

u_1 = u_2 = \cdots = u_{n-1} = u_n.

Therefore, if u ∈ R^n is such that (Au)_{i=2,\cdots,n-1} = 0, then u_i ≤ \max\{u_1, u_n\}.

Theorem 9.2. (Discrete Poincaré inequality) Let Ω = (0,1) and Ω_h be a uniform grid of size h. If Y ∈ U_h is a mesh function on Ω_h such that Y(0) = 0, then there is a constant C, independent of Y and h, for which

‖Y‖_{2,h} ≤ C ‖\bar{\delta} Y‖_{2,h}.


Proof. I consider the uniform partition of the interval (0,1) with N points, x_1 = 0 < x_2 < \cdots < x_{N-1} < x_N = 1 (Figure 3: one-dimensional uniform partition).

Since the discrete 2-norm is defined by ‖v‖_{2,h}^2 = h^d \sum_{i=1}^{N} |v_i|^2, where d is the dimension, in one dimension we have

‖v‖_{2,h}^2 = h \sum_{i=1}^{N} |v_i|^2, \qquad ‖\bar{\delta} v‖_{2,h}^2 = h \sum_{i=2}^{N} \left|\frac{v_{i-1} - v_i}{h}\right|^2.

Since Y(0) = 0, i.e. Y_1 = 0,

\sum_{i=2}^{N} (Y_{i-1} - Y_i) = Y_1 - Y_N = -Y_N.

Then

\left|\sum_{i=2}^{N} (Y_{i-1} - Y_i)\right| = |Y_N|,

and, by the Cauchy-Schwarz inequality,

|Y_N| ≤ \sum_{i=2}^{N} |Y_{i-1} - Y_i| = \sum_{i=2}^{N} h \left|\frac{Y_{i-1} - Y_i}{h}\right| ≤ \left(\sum_{i=2}^{N} h^2\right)^{1/2} \left(\sum_{i=2}^{N} \left|\frac{Y_{i-1} - Y_i}{h}\right|^2\right)^{1/2}.

Therefore, for each K,

|Y_K|^2 ≤ \left(\sum_{i=2}^{K} h^2\right) \left(\sum_{i=2}^{K} \left|\frac{Y_{i-1} - Y_i}{h}\right|^2\right) = h^2 (K-1) \sum_{i=2}^{K} \left|\frac{Y_{i-1} - Y_i}{h}\right|^2.

1. When K = 2,

|Y_2|^2 ≤ h^2 \left|\frac{Y_1 - Y_2}{h}\right|^2.

2. When K = 3,

|Y_3|^2 ≤ 2h^2 \left(\left|\frac{Y_1 - Y_2}{h}\right|^2 + \left|\frac{Y_2 - Y_3}{h}\right|^2\right).

3. When K = N,

|Y_N|^2 ≤ (N-1) h^2 \left(\left|\frac{Y_1 - Y_2}{h}\right|^2 + \left|\frac{Y_2 - Y_3}{h}\right|^2 + \cdots + \left|\frac{Y_{N-1} - Y_N}{h}\right|^2\right).

Summing |Y_K|^2 over K from 2 to N, we get

\sum_{i=2}^{N} |Y_i|^2 ≤ \frac{N(N-1)}{2} h^2 \sum_{i=2}^{N} \left|\frac{Y_{i-1} - Y_i}{h}\right|^2.

Since Y_1 = 0,

\sum_{i=1}^{N} |Y_i|^2 ≤ \frac{N(N-1)}{2} h^2 \sum_{i=2}^{N} \left|\frac{Y_{i-1} - Y_i}{h}\right|^2.

Since h = \frac{1}{N-1}, multiplying both sides by h and using h^2 \frac{N(N-1)}{2} = \frac{N}{2(N-1)} = \frac{1}{2} + \frac{1}{2(N-1)} gives

h \sum_{i=1}^{N} |Y_i|^2 ≤ \left(\frac{1}{2} + \frac{1}{2(N-1)}\right) h \sum_{i=2}^{N} \left|\frac{Y_{i-1} - Y_i}{h}\right|^2,

i.e.

‖Y‖_{2,h}^2 ≤ \left(\frac{1}{2} + \frac{1}{2(N-1)}\right) ‖\bar{\delta} Y‖_{2,h}^2.

Since N ≥ 2, the factor is at most 1, so

‖Y‖_{2,h}^2 ≤ ‖\bar{\delta} Y‖_{2,h}^2,

and hence

‖Y‖_{2,h} ≤ C ‖\bar{\delta} Y‖_{2,h}.
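The inequality just proved can be sanity-checked on a concrete mesh function; the sketch below (my own names, using Y_i = x_i^2 as an arbitrary example with Y(0) = 0) evaluates both discrete norms and confirms ‖Y‖_{2,h} ≤ ‖\bar{\delta}Y‖_{2,h}, i.e. the bound with C = 1:

```python
def norms(N):
    # uniform grid x_1 = 0, ..., x_N = 1 with h = 1/(N-1); mesh function Y_i = x_i^2
    h = 1.0 / (N - 1)
    Y = [(i * h) ** 2 for i in range(N)]
    n2  = (h * sum(y * y for y in Y)) ** 0.5                              # ||Y||_{2,h}
    dn2 = (h * sum(((Y[i - 1] - Y[i]) / h) ** 2 for i in range(1, N))) ** 0.5  # ||dY||_{2,h}
    return n2, dn2

lhs, rhs = norms(101)
print(lhs <= rhs)  # True
```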

Theorem 9.3. (Von Neumann stability analysis method) For the difference scheme

U_j^{n+1} = \sum_{p} \alpha_p U_{j-p}^n,

the corresponding Fourier transform is

\hat{U}^{n+1}(\xi) = \left(\sum_{p} \alpha_p e^{-ip\xi}\right) \hat{U}^n(\xi) := G(\lambda, \xi) \hat{U}^n(\xi),

where \lambda = \frac{\tau}{h^2} is the CFL number and G(\lambda, \xi) is called the growth factor. If |G(\lambda, \xi)| ≤ 1 for all \xi, then the difference scheme is stable.


9.1 Problems

Problem 9.1. (Prelim Jan. 2011#7) Consider the Crank-Nicolson scheme applied to the diffusion equation

\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad t > 0, \ -\infty < x < \infty.

1. Show that the amplification factor in the Von Neumann analysis of the scheme is

g(\xi) = \frac{1 + \frac{1}{2} z}{1 - \frac{1}{2} z}, \qquad z = \frac{2\Delta t}{\Delta x^2} (\cos(\Delta x \xi) - 1).

2. Use the result of part 1 to show that the scheme is stable.

Solution. 1. The Crank-Nicolson scheme for the diffusion equation is

\frac{u_j^{n+1} - u_j^n}{\Delta t} = \frac{1}{2} \left( \frac{u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1}}{\Delta x^2} + \frac{u_{j-1}^n - 2u_j^n + u_{j+1}^n}{\Delta x^2} \right).

Let \mu = \frac{\Delta t}{\Delta x^2}; then the scheme can be rewritten as

u_j^{n+1} = u_j^n + \frac{\mu}{2} \left( u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j-1}^n - 2u_j^n + u_{j+1}^n \right),

i.e.

-\frac{\mu}{2} u_{j-1}^{n+1} + (1+\mu) u_j^{n+1} - \frac{\mu}{2} u_{j+1}^{n+1} = \frac{\mu}{2} u_{j-1}^n + (1-\mu) u_j^n + \frac{\mu}{2} u_{j+1}^n.
By using the Fourier ansatz u_j^{n+1} = g(\xi) u_j^n, u_j^n = e^{ij\Delta x\xi}, we have

-\frac{\mu}{2} g(\xi) u_{j-1}^n + (1+\mu) g(\xi) u_j^n - \frac{\mu}{2} g(\xi) u_{j+1}^n = \frac{\mu}{2} u_{j-1}^n + (1-\mu) u_j^n + \frac{\mu}{2} u_{j+1}^n,

and then

g(\xi) \left( -\frac{\mu}{2} e^{-i\Delta x\xi} + (1+\mu) - \frac{\mu}{2} e^{i\Delta x\xi} \right) e^{ij\Delta x\xi} = \left( \frac{\mu}{2} e^{-i\Delta x\xi} + (1-\mu) + \frac{\mu}{2} e^{i\Delta x\xi} \right) e^{ij\Delta x\xi},

i.e.

g(\xi) (1 + \mu - \mu \cos(\Delta x\xi)) = 1 - \mu + \mu \cos(\Delta x\xi).

Therefore,

g(\xi) = \frac{1 - \mu + \mu \cos(\Delta x\xi)}{1 + \mu - \mu \cos(\Delta x\xi)} = \frac{1 + \frac{1}{2} z}{1 - \frac{1}{2} z}, \qquad z = \frac{2\Delta t}{\Delta x^2} (\cos(\Delta x\xi) - 1).


2. Since z = \frac{2\Delta t}{\Delta x^2} (\cos(\Delta x\xi) - 1), we have z ≤ 0, and hence

1 + \frac{1}{2} z ≤ 1 - \frac{1}{2} z,

so g(\xi) ≤ 1. Also, since -1 < 1, we have \frac{1}{2} z - 1 < \frac{1}{2} z + 1, i.e. -(1 - \frac{1}{2} z) < 1 + \frac{1}{2} z, and therefore

g(\xi) = \frac{1 + \frac{1}{2} z}{1 - \frac{1}{2} z} > -1.

Hence |g(\xi)| ≤ 1, and the scheme is (unconditionally) stable.
J
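The unconditional bound |g(\xi)| ≤ 1 is easy to scan numerically; a minimal sketch (my own names, arbitrary sample values of \mu including very large ones):

```python
import math

def g(mu, xi_dx):
    # Crank-Nicolson amplification factor; z = 2*mu*(cos(dx*xi) - 1) <= 0
    z = 2.0 * mu * (math.cos(xi_dx) - 1.0)
    return (1.0 + 0.5 * z) / (1.0 - 0.5 * z)

worst = max(abs(g(mu, x))
            for mu in (0.1, 1.0, 10.0, 1000.0)
            for x in [k * math.pi / 50 for k in range(101)])
print(worst <= 1.0)  # True: stable regardless of mu
```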

Problem 9.2. (Prelim Jan. 2011#8) Consider the explicit scheme

u_j^{n+1} = u_j^n + \mu \left( u_{j-1}^n - 2u_j^n + u_{j+1}^n \right) - \frac{b\mu\Delta x}{2} \left( u_{j+1}^n - u_{j-1}^n \right), \quad 0 ≤ n ≤ N, \ 1 ≤ j ≤ L,

for the convection-diffusion problem

\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} - b \frac{\partial u}{\partial x}, \quad 0 ≤ x ≤ 1, \ 0 ≤ t ≤ t^*,
u(0,t) = u(1,t) = 0, \quad 0 ≤ t ≤ t^*,
u(x,0) = g(x), \quad 0 ≤ x ≤ 1,

where b > 0, \mu = \frac{\Delta t}{(\Delta x)^2}, \Delta x = \frac{1}{L+1}, and \Delta t = \frac{t^*}{N}. Prove that, under suitable restrictions on \mu and \Delta x, the error grid function e^n satisfies the estimate

‖e^n‖_\infty ≤ t^* C \left(\Delta t + \Delta x^2\right),

for all n such that n\Delta t ≤ t^*, where C > 0 is a constant.

Solution. Let \bar{u} be the exact solution and \bar{u}_j^n = \bar{u}(n\Delta t, j\Delta x). Then from Taylor expansion, we have

\bar{u}_j^{n+1} = \bar{u}_j^n + \Delta t \frac{\partial}{\partial t}\bar{u}_j^n + \frac{1}{2}(\Delta t)^2 \frac{\partial^2}{\partial t^2}\bar{u}(\xi_1, j\Delta x), \quad t_n ≤ \xi_1 ≤ t_{n+1},

\bar{u}_{j-1}^n = \bar{u}_j^n - \Delta x \frac{\partial}{\partial x}\bar{u}_j^n + \frac{1}{2}(\Delta x)^2 \frac{\partial^2}{\partial x^2}\bar{u}_j^n - \frac{1}{6}(\Delta x)^3 \frac{\partial^3}{\partial x^3}\bar{u}_j^n + \frac{1}{24}(\Delta x)^4 \frac{\partial^4}{\partial x^4}\bar{u}(n\Delta t, \xi_2), \quad x_{j-1} ≤ \xi_2 ≤ x_j,

\bar{u}_{j+1}^n = \bar{u}_j^n + \Delta x \frac{\partial}{\partial x}\bar{u}_j^n + \frac{1}{2}(\Delta x)^2 \frac{\partial^2}{\partial x^2}\bar{u}_j^n + \frac{1}{6}(\Delta x)^3 \frac{\partial^3}{\partial x^3}\bar{u}_j^n + \frac{1}{24}(\Delta x)^4 \frac{\partial^4}{\partial x^4}\bar{u}(n\Delta t, \xi_3), \quad x_j ≤ \xi_3 ≤ x_{j+1}.

Then the truncation error T of this scheme is

T = \frac{\bar{u}_j^{n+1} - \bar{u}_j^n}{\Delta t} - \frac{\bar{u}_{j-1}^n - 2\bar{u}_j^n + \bar{u}_{j+1}^n}{\Delta x^2} + b \frac{\bar{u}_{j+1}^n - \bar{u}_{j-1}^n}{2\Delta x} = O(\Delta t + (\Delta x)^2).
Therefore, with e_j^n = u_j^n - \bar{u}_j^n,

e_j^{n+1} = e_j^n + \mu \left( e_{j-1}^n - 2e_j^n + e_{j+1}^n \right) - \frac{b\mu\Delta x}{2} \left( e_{j+1}^n - e_{j-1}^n \right) + c\Delta t \left(\Delta t + (\Delta x)^2\right),


i.e.

e_j^{n+1} = \left(\mu + \frac{b\mu\Delta x}{2}\right) e_{j-1}^n + (1 - 2\mu) e_j^n + \left(\mu - \frac{b\mu\Delta x}{2}\right) e_{j+1}^n + c\Delta t \left(\Delta t + (\Delta x)^2\right).

Then

|e_j^{n+1}| ≤ \left|\mu + \frac{b\mu\Delta x}{2}\right| |e_{j-1}^n| + |1 - 2\mu| |e_j^n| + \left|\mu - \frac{b\mu\Delta x}{2}\right| |e_{j+1}^n| + c\Delta t \left(\Delta t + (\Delta x)^2\right).

If 1 - 2\mu ≥ 0 and \mu - \frac{b\mu\Delta x}{2} ≥ 0, i.e. \mu ≤ \frac{1}{2} and 1 - \frac{1}{2} b\Delta x ≥ 0, then all three coefficients are nonnegative and sum to 1, so

‖e^{n+1}‖_\infty ≤ \left(\mu + \frac{b\mu\Delta x}{2}\right) ‖e^n‖_\infty + (1 - 2\mu) ‖e^n‖_\infty + \left(\mu - \frac{b\mu\Delta x}{2}\right) ‖e^n‖_\infty + c\Delta t \left(\Delta t + (\Delta x)^2\right)
              = ‖e^n‖_\infty + c\Delta t \left(\Delta t + (\Delta x)^2\right).

Then, since e^0 = 0,

‖e^n‖_\infty ≤ ‖e^{n-1}‖_\infty + c\Delta t \left(\Delta t + (\Delta x)^2\right)
            ≤ ‖e^{n-2}‖_\infty + 2c\Delta t \left(\Delta t + (\Delta x)^2\right)
            ≤ \cdots
            ≤ ‖e^0‖_\infty + n c\Delta t \left(\Delta t + (\Delta x)^2\right)
            ≤ c t^* \left(\Delta t + (\Delta x)^2\right).
J

Problem 9.3. (Prelim Aug. 2010#8) Consider the Crank-Nicolson scheme

u_j^{n+1} = u_j^n + \frac{\mu}{2} \left( u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j-1}^n - 2u_j^n + u_{j+1}^n \right)

for approximating the solution to the heat equation \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} on the intervals 0 ≤ x ≤ 1 and 0 ≤ t ≤ t^* with the boundary conditions u(0,t) = u(1,t) = 0.
1. Show that the scheme may be written in the form u^{n+1} = A u^n, where A ∈ R^{m×m}_{sym} (the space of m × m symmetric matrices) and

‖Ax‖_2 ≤ ‖x‖_2

for any x ∈ R^m, regardless of the value of \mu.

2. Show that

‖Ax‖_\infty ≤ ‖x‖_\infty

for any x ∈ R^m, provided \mu ≤ 1. (In other words, the scheme may only be conditionally stable in the max norm.)


Solution. 1. The scheme

u_j^{n+1} = u_j^n + \frac{\mu}{2} \left( u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j-1}^n - 2u_j^n + u_{j+1}^n \right)

can be rewritten as

-\frac{\mu}{2} u_{j-1}^{n+1} + (1+\mu) u_j^{n+1} - \frac{\mu}{2} u_{j+1}^{n+1} = \frac{\mu}{2} u_{j-1}^n + (1-\mu) u_j^n + \frac{\mu}{2} u_{j+1}^n.

By using the boundary conditions, we have C u^{n+1} = B u^n, where C and B are the symmetric tridiagonal Toeplitz matrices

C = \text{tridiag}\left\{-\frac{\mu}{2},\ 1+\mu,\ -\frac{\mu}{2}\right\}, \qquad B = \text{tridiag}\left\{\frac{\mu}{2},\ 1-\mu,\ \frac{\mu}{2}\right\},

and u^{n+1} = (u_1^{n+1}, \cdots, u_m^{n+1})^T, u^n = (u_1^n, \cdots, u_m^n)^T. So the scheme may be written in the form u^{n+1} = A u^n, where A = C^{-1} B; note A is symmetric, since C and B are both polynomials of the same tridiagonal matrix and therefore commute. By using the Fourier ansatz u_j^{n+1} = g(\xi) u_j^n, u_j^n = e^{ij\Delta x\xi}, the same computation as in Problem 9.1 gives

g(\xi) (1 + \mu - \mu\cos(\Delta x\xi)) = 1 - \mu + \mu\cos(\Delta x\xi),

so

g(\xi) = \frac{1 - \mu + \mu\cos(\Delta x\xi)}{1 + \mu - \mu\cos(\Delta x\xi)} = \frac{1 + \frac{1}{2} z}{1 - \frac{1}{2} z}, \qquad z = \frac{2\Delta t}{\Delta x^2}(\cos(\Delta x\xi) - 1).

Moreover, |g(\xi)| ≤ 1; the eigenvalues of A have the same form, so ρ(A) ≤ 1. Since A is symmetric, ‖A‖_2 = ρ(A), and hence

‖Ax‖_2 ≤ ‖A‖_2 ‖x‖_2 = ρ(A) ‖x‖_2 ≤ ‖x‖_2.


2. The scheme

u_j^{n+1} = u_j^n + \frac{\mu}{2} \left( u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j-1}^n - 2u_j^n + u_{j+1}^n \right)

can be rewritten as

(1+\mu) u_j^{n+1} = \frac{\mu}{2} u_{j-1}^{n+1} + \frac{\mu}{2} u_{j+1}^{n+1} + \frac{\mu}{2} u_{j-1}^n + (1-\mu) u_j^n + \frac{\mu}{2} u_{j+1}^n.

Then

(1+\mu) |u_j^{n+1}| ≤ \frac{\mu}{2} |u_{j-1}^{n+1}| + \frac{\mu}{2} |u_{j+1}^{n+1}| + \frac{\mu}{2} |u_{j-1}^n| + |1-\mu| |u_j^n| + \frac{\mu}{2} |u_{j+1}^n|,

and therefore

(1+\mu) ‖u^{n+1}‖_\infty ≤ \frac{\mu}{2} ‖u^{n+1}‖_\infty + \frac{\mu}{2} ‖u^{n+1}‖_\infty + \frac{\mu}{2} ‖u^n‖_\infty + |1-\mu| ‖u^n‖_\infty + \frac{\mu}{2} ‖u^n‖_\infty.

If \mu ≤ 1, then |1-\mu| = 1-\mu, and the above gives ‖u^{n+1}‖_\infty ≤ ‖u^n‖_\infty, i.e.

‖A u^n‖_\infty ≤ ‖u^n‖_\infty.
J

Problem 9.4. (Prelim Aug. 2010#9) Consider the Lax-Wendroff scheme

u_j^{n+1} = u_j^n + \frac{a^2(\Delta t)^2}{2(\Delta x)^2} \left( u_{j-1}^n - 2u_j^n + u_{j+1}^n \right) - \frac{a\Delta t}{2\Delta x} \left( u_{j+1}^n - u_{j-1}^n \right)

for approximating the solution of the Cauchy problem for the advection equation

\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = 0, \quad a > 0.

Use Von Neumann's method to show that the Lax-Wendroff scheme is stable provided the CFL condition

\frac{a\Delta t}{\Delta x} ≤ 1

is enforced.

Solution. By using the Fourier ansatz u_j^{n+1} = g(\xi) u_j^n, u_j^n = e^{ij\Delta x\xi}, we have

g(\xi) u_j^n = u_j^n + \frac{a^2(\Delta t)^2}{2(\Delta x)^2} \left( u_{j-1}^n - 2u_j^n + u_{j+1}^n \right) - \frac{a\Delta t}{2\Delta x} \left( u_{j+1}^n - u_{j-1}^n \right),

and then

g(\xi) e^{ij\Delta x\xi} = e^{ij\Delta x\xi} + \frac{a^2(\Delta t)^2}{2(\Delta x)^2} \left( e^{i(j-1)\Delta x\xi} - 2e^{ij\Delta x\xi} + e^{i(j+1)\Delta x\xi} \right) - \frac{a\Delta t}{2\Delta x} \left( e^{i(j+1)\Delta x\xi} - e^{i(j-1)\Delta x\xi} \right).

Therefore

g(\xi) = 1 + \frac{a^2(\Delta t)^2}{2(\Delta x)^2} \left( e^{-i\Delta x\xi} - 2 + e^{i\Delta x\xi} \right) - \frac{a\Delta t}{2\Delta x} \left( e^{i\Delta x\xi} - e^{-i\Delta x\xi} \right)
       = 1 + \frac{a^2(\Delta t)^2}{2(\Delta x)^2} (2\cos(\Delta x\xi) - 2) - \frac{a\Delta t}{2\Delta x} (2i\sin(\Delta x\xi))
       = 1 + \frac{a^2(\Delta t)^2}{(\Delta x)^2} (\cos(\Delta x\xi) - 1) - \frac{a\Delta t}{\Delta x} i\sin(\Delta x\xi).

Let \mu = \frac{a\Delta t}{\Delta x}; then

g(\xi) = 1 + \mu^2 (\cos(\Delta x\xi) - 1) - i\mu \sin(\Delta x\xi).

The scheme is stable if |g(\xi)| ≤ 1, i.e.

\left( 1 + \mu^2 (\cos(\Delta x\xi) - 1) \right)^2 + (\mu \sin(\Delta x\xi))^2 ≤ 1,

i.e.

1 + 2\mu^2 (\cos(\Delta x\xi) - 1) + \mu^4 (\cos(\Delta x\xi) - 1)^2 + \mu^2 \sin^2(\Delta x\xi) ≤ 1,

i.e.

\mu^2 \left( \sin^2(\Delta x\xi) + 2\cos(\Delta x\xi) - 2 \right) + \mu^4 (\cos(\Delta x\xi) - 1)^2 ≤ 0.

Using \sin^2(\Delta x\xi) = 1 - \cos^2(\Delta x\xi), we have \sin^2(\Delta x\xi) + 2\cos(\Delta x\xi) - 2 = -(\cos(\Delta x\xi) - 1)^2, so the condition becomes

(\mu^2 - 1)(\cos(\Delta x\xi) - 1)^2 ≤ 0,

which holds for all \xi if and only if \mu ≤ 1. The above steps are reversible; therefore the scheme is stable exactly under the CFL condition \mu = \frac{a\Delta t}{\Delta x} ≤ 1. J
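The sharpness of the CFL condition shows up immediately in a numerical scan of |g(\xi)|^2; a small sketch (my own names, sample CFL numbers 0.8 and 1.2):

```python
import math

def gsq(mu, th):
    # |g|^2 for Lax-Wendroff: g = 1 + mu^2 (cos th - 1) - i mu sin th
    c, s = math.cos(th), math.sin(th)
    return (1.0 + mu * mu * (c - 1.0)) ** 2 + (mu * s) ** 2

thetas = [k * math.pi / 100 for k in range(201)]
print(max(gsq(0.8, t) for t in thetas) <= 1.0 + 1e-12,  # stable:   mu <= 1
      max(gsq(1.2, t) for t in thetas) > 1.0)           # unstable: mu > 1
```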

Problem 9.5. (Prelim Aug. 2010#9)

Solution. J


10 Finite Element Method

Theorem 10.1. (1D Dirichlet-Poincaré inequality) Let a > 0, u ∈ C^1([-a,a]) and u(-a) = 0. Then the 1D Dirichlet-Poincaré inequality holds:

\int_{-a}^{a} |u(x)|^2 dx ≤ 4a^2 \int_{-a}^{a} |u'(x)|^2 dx.

Proof. Since u(-a) = 0, by the fundamental theorem of calculus we have

u(x) = u(x) - u(-a) = \int_{-a}^{x} u'(\xi)\, d\xi.

Therefore

|u(x)| = \left| \int_{-a}^{x} u'(\xi)\, d\xi \right|
       ≤ \int_{-a}^{x} |u'(\xi)|\, d\xi
       ≤ \int_{-a}^{a} |u'(\xi)|\, d\xi \quad (x ≤ a)
       ≤ \left( \int_{-a}^{a} 1^2\, d\xi \right)^{1/2} \left( \int_{-a}^{a} |u'(\xi)|^2\, d\xi \right)^{1/2} \quad \text{(Cauchy-Schwarz inequality)}
       = (2a)^{1/2} \left( \int_{-a}^{a} |u'(\xi)|^2\, d\xi \right)^{1/2}.

Therefore

|u(x)|^2 ≤ 2a \int_{-a}^{a} |u'(\xi)|^2\, d\xi.

Integrating both sides of the above inequality from -a to a w.r.t. x yields

\int_{-a}^{a} |u(x)|^2 dx ≤ \int_{-a}^{a} 2a \int_{-a}^{a} |u'(\xi)|^2 d\xi\, dx = 4a^2 \int_{-a}^{a} |u'(x)|^2 dx.

Theorem 10.2. (1D Neumann-Poincaré inequality) Let a > 0, u ∈ C^1([-a,a]), and let \bar{u} = \frac{1}{2a}\int_{-a}^{a} u(x)\, dx be the mean value of u. Then the 1D Neumann-Poincaré inequality holds:

\int_{-a}^{a} |u(x) - \bar{u}|^2 dx ≤ 2a(a - c) \int_{-a}^{a} |u'(x)|^2 dx,

where c ∈ [-a,a] is a point with u(c) = \bar{u} (see the proof); in particular the bound 4a^2 \int_{-a}^{a} |u'(x)|^2 dx always holds.


Proof. Since \bar{u} is the mean value of u, by the intermediate value theorem there exists c ∈ [-a,a] s.t. u(c) = \bar{u}. Then by the fundamental theorem of calculus,

u(x) - \bar{u} = u(x) - u(c) = \int_{c}^{x} u'(\xi)\, d\xi.

Therefore

|u(x) - \bar{u}| = \left| \int_{c}^{x} u'(\xi)\, d\xi \right|
               ≤ \int_{c}^{x} |u'(\xi)|\, d\xi
               ≤ \int_{c}^{a} |u'(\xi)|\, d\xi \quad (x ≤ a)
               ≤ \left( \int_{c}^{a} 1^2\, d\xi \right)^{1/2} \left( \int_{c}^{a} |u'(\xi)|^2\, d\xi \right)^{1/2} \quad \text{(Cauchy-Schwarz inequality)}
               ≤ (a - c)^{1/2} \left( \int_{-a}^{a} |u'(\xi)|^2\, d\xi \right)^{1/2}.

Therefore

|u(x) - \bar{u}|^2 ≤ (a - c) \int_{-a}^{a} |u'(\xi)|^2\, d\xi.
Integrating both sides from -a to a w.r.t. x yields

\int_{-a}^{a} |u(x) - \bar{u}|^2 dx ≤ \int_{-a}^{a} (a - c) \int_{-a}^{a} |u'(\xi)|^2 d\xi\, dx = 2a(a - c) \int_{-a}^{a} |u'(x)|^2 dx.

Definition 10.1. (Symmetric, continuous and coercive) We consider a bilinear form a : H × H → R on a normed space H.

1. a(·,·) is said to be symmetric provided that

a(u,v) = a(v,u), \quad ∀ u, v ∈ H.

2. a(·,·) is said to be continuous, or bounded, if there exists a constant C s.t.

|a(u,v)| ≤ C ‖u‖ ‖v‖, \quad ∀ u, v ∈ H.

3. a(·,·) is said to be coercive provided there exists a constant α > 0 s.t.

a(u,u) ≥ α ‖u‖^2, \quad ∀ u ∈ H.



Theorem 10.3. (Lax-Milgram Theorem [1]) Given a Hilbert space H, a continuous, coercive bilinear form a(·,·) and a continuous functional F ∈ H', there exists a unique u ∈ H s.t.

a(u,v) = F(v), \quad ∀ v ∈ H.

Theorem 10.4. (Céa Lemma [1]) Suppose V is a subspace of H, and a(·,·) is a continuous and coercive bilinear form on V. Given F ∈ V', let u ∈ V satisfy

a(u,v) = F(v), \quad ∀ v ∈ V.

For the finite element variational problem

a(u_h, v_h) = F(v_h), \quad ∀ v_h ∈ V_h,

we have

‖u - u_h‖_V ≤ \frac{C}{α} \min_{v ∈ V_h} ‖u - v‖_V,

where C is the continuity constant and α is the coercivity constant of a(·,·) on V.


10.1 Finite element methods for 1D elliptic problems

Theorem 10.5. (Convergence of 1D FEM) The linear basis FEM solution u_h for

-u''(x) = f(x), \quad x ∈ I = [a,b],
u(a) = u(b) = 0,

has the following properties:

‖u - u_h‖_{L^2(I)} ≤ C h^2 ‖u''‖_{L^2(I)},
‖u' - u_h'‖_{L^2(I)} ≤ C h ‖u''‖_{L^2(I)}.

Proof. 1. Define the first degree Taylor polynomial on I_i = [x_i, x_{i+1}] as

Q_1 u(x) = u(x_i) + u'(x_i)(x - x_i).

Then, by Taylor's theorem with integral remainder, we have

u(x) - Q_1 u(x) = \int_{x_i}^{x} (x - y)\, u''(y)\, dy.


This implies

‖u - Q_1 u‖_{C(I_i)} = \max_{x ∈ I_i} \left| \int_{x_i}^{x} (x - y)\, u''(y)\, dy \right|
                     ≤ h \int_{I_i} |u''(y)|\, dy
                     ≤ h \left( \int_{I_i} 1^2\, dy \right)^{1/2} \left( \int_{I_i} |u''(y)|^2\, dy \right)^{1/2}
                     ≤ h^{3/2} \left( \int_{I_i} |u''(y)|^2\, dy \right)^{1/2}
                     = h^{3/2} ‖u''‖_{L^2(I_i)}.
i

And

‖u - u_h‖_{L^2(I_i)}^2 = \int_{I_i} (u - u_h)^2\, dx ≤ \int_{I_i} ‖u - u_h‖_{C(I_i)}^2\, dx = h ‖u - u_h‖_{C(I_i)}^2,

therefore

‖u - u_h‖_{L^2(I_i)} ≤ h^{1/2} ‖u - u_h‖_{C(I_i)},

and, with I_h the piecewise-linear nodal interpolant,

‖u - u_h‖_{C(I_i)} ≤ ‖u - Q_1 u‖_{C(I_i)} + ‖Q_1 u - u_h‖_{C(I_i)}
                  = ‖u - Q_1 u‖_{C(I_i)} + ‖I_h (Q_1 u - u)‖_{C(I_i)}
                  ≤ 2 ‖u - Q_1 u‖_{C(I_i)}
                  ≤ 2 h^{3/2} ‖u''‖_{L^2(I_i)}.

Therefore

‖u - u_h‖_{L^2(I_i)} ≤ 2 h^2 ‖u''‖_{L^2(I_i)},

and summing over the elements,

‖u - u_h‖_{L^2(I)} ≤ 2 h^2 ‖u''‖_{L^2(I)}.

2. For the linear basis we have the FEM solution values u_h(x_i) = u(x_i) and u_h(x_{i+1}) = u(x_{i+1}) on the element I_i = [x_i, x_{i+1}], and

u_h'(x) = \frac{u_h(x_{i+1}) - u_h(x_i)}{h} = \frac{u(x_{i+1}) - u(x_i)}{h} = \frac{1}{h} \int_{x_i}^{x_{i+1}} u'(y)\, dy = \frac{1}{h} \int_{I_i} u'(y)\, dy,

u'(x) = \frac{h}{h}\, u'(x) = \frac{1}{h} \int_{I_i} u'(x)\, dy.


Therefore

u_h'(x) - u'(x) = \frac{1}{h} \int_{I_i} \left( u'(y) - u'(x) \right) dy = \frac{1}{h} \int_{I_i} \int_{x}^{y} u''(\xi)\, d\xi\, dy,

so

‖u' - u_h'‖_{L^2(I_i)}^2 = \int_{I_i} \left( u_h'(x) - u'(x) \right)^2 dx
                        = \frac{1}{h^2} \int_{I_i} \left( \int_{I_i} \int_{x}^{y} u''(\xi)\, d\xi\, dy \right)^2 dx
                        ≤ \frac{1}{h^2} \int_{I_i} \left( \int_{I_i} dy \int_{I_i} |u''(\xi)|\, d\xi \right)^2 dx
                        = \frac{1}{h^2} \cdot h \cdot \left( h \int_{I_i} |u''(\xi)|\, d\xi \right)^2
                        = h \left( \int_{I_i} |u''(\xi)|\, d\xi \right)^2
                        ≤ h \left( \int_{I_i} 1^2\, d\xi \right) \left( \int_{I_i} |u''(\xi)|^2\, d\xi \right)
                        = h^2 \int_{I_i} |u''(\xi)|^2\, d\xi.

Hence

‖u' - u_h'‖_{L^2(I_i)} ≤ C h ‖u''‖_{L^2(I_i)},

and, summing over the elements,

‖u' - u_h'‖_{L^2(I)} ≤ C h ‖u''‖_{L^2(I)}.
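The O(h^2) L^2-rate just proved can be observed numerically. The sketch below (all names are mine; the load vector uses a Simpson-type quadrature, which is an assumption of this sketch rather than the notes' construction) solves -u'' = \pi^2 \sin(\pi x), u(0) = u(1) = 0, with linear elements and a Thomas tridiagonal solve, and checks that halving h divides the L^2 error by about 4:

```python
import math

def fem_error(n):
    # linear FEM for -u'' = f on (0,1), u(0) = u(1) = 0, f = pi^2 sin(pi x);
    # exact solution u = sin(pi x); returns an approximate L2 error
    h = 1.0 / n
    f = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    # interior nodes x_1..x_{n-1}; stiffness matrix tridiag(-1/h, 2/h, -1/h)
    a = [-1.0 / h] * (n - 1)          # constant sub/super diagonal
    d = [2.0 / h] * (n - 1)           # main diagonal
    b = [(h / 3.0) * (f((i - 0.5) * h) + f(i * h) + f((i + 0.5) * h))
         for i in range(1, n)]        # Simpson-type load quadrature
    for i in range(1, n - 1):         # Thomas algorithm: elimination
        m = a[i] / d[i - 1]
        d[i] -= m * a[i - 1]
        b[i] -= m * b[i - 1]
    u = [0.0] * (n - 1)
    u[-1] = b[-1] / d[-1]
    for i in range(n - 3, -1, -1):    # back substitution
        u[i] = (b[i] - a[i] * u[i + 1]) / d[i]
    uh = [0.0] + u + [0.0]
    # sample the error at element midpoints (u_h is linear per element)
    err2 = 0.0
    for i in range(n):
        xm = (i + 0.5) * h
        uhm = 0.5 * (uh[i] + uh[i + 1])
        err2 += h * (uhm - math.sin(math.pi * xm)) ** 2
    return math.sqrt(err2)

e1, e2 = fem_error(20), fem_error(40)
print(e1 / e2)  # approximately 4: second-order L2 convergence
```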


10.2 Problems

Problem 10.1. (Prelim Jan. 2008#8) Let Ω ⊂ R^2 be a bounded domain with a smooth boundary. Consider the 2-D Poisson-like equation

-\Delta u + 3u = x^2 y^2, \quad \text{in } Ω,
u = 0, \quad \text{on } ∂Ω.

1. Write the corresponding Ritz and Galerkin variational problems.


2. Prove that the Galerkin method has a unique solution uh and the following estimate is valid

‖u - u_h‖_{H^1} ≤ C \inf_{v_h ∈ V_h} ‖u - v_h‖_{H^1},

with C independent of h, where V_h denotes a finite element subspace of H^1(Ω) consisting of continuous piecewise polynomials of degree k ≥ 1.

Solution. 1. For this pure Dirichlet problem, the test functional space is v ∈ H_0^1(Ω). Multiplying both sides of the equation by a test function and integrating over Ω, we get

-\int_Ω \Delta u\, v\, dx + 3 \int_Ω u v\, dx = \int_Ω x^2 y^2 v\, dx.

Integration by parts yields

\int_Ω \nabla u \cdot \nabla v\, dx + 3 \int_Ω u v\, dx = \int_Ω x^2 y^2 v\, dx.

Let

a(u,v) = \int_Ω \nabla u \cdot \nabla v\, dx + 3 \int_Ω u v\, dx, \qquad f(v) = \int_Ω x^2 y^2 v\, dx.

Then:

(a) The Ritz variational problem is: find u_h ∈ V_h ⊂ H_0^1 such that

J(u_h) = \min_{v_h ∈ V_h} J(v_h), \qquad J(v) = \frac{1}{2} a(v,v) - f(v).

(b) The Galerkin variational problem is: find u_h ∈ V_h ⊂ H_0^1 such that

a(u_h, v_h) = f(v_h), \quad ∀ v_h ∈ V_h.
2. Next, we use the Lax-Milgram theorem to prove existence and uniqueness.

(a) Continuity:

|a(u,v)| ≤ \int_Ω |\nabla u \cdot \nabla v|\, dx + 3 \int_Ω |uv|\, dx
         ≤ ‖\nabla u‖_{L^2(Ω)} ‖\nabla v‖_{L^2(Ω)} + 3 ‖u‖_{L^2(Ω)} ‖v‖_{L^2(Ω)}
         ≤ C ‖u‖_{H^1(Ω)} ‖v‖_{H^1(Ω)}.

(b) Coercivity:

a(u,u) = \int_Ω |\nabla u|^2\, dx + 3 \int_Ω |u|^2\, dx ≥ ‖\nabla u‖_{L^2(Ω)}^2 + ‖u‖_{L^2(Ω)}^2 = ‖u‖_{H^1(Ω)}^2.

(c) Boundedness of f: since Ω is bounded, \max_{\bar{Ω}} |x^2 y^2| ≤ C, so

|f(v)| ≤ \int_Ω |x^2 y^2 v|\, dx ≤ C \int_Ω |v|\, dx ≤ C \left( \int_Ω 1^2\, dx \right)^{1/2} \left( \int_Ω |v|^2\, dx \right)^{1/2} ≤ C ‖v‖_{L^2(Ω)} ≤ C ‖v‖_{H^1(Ω)}.

By the Lax-Milgram theorem, the Galerkin method has a unique solution u_h. Moreover,

a(u_h, v_h) = f(v_h), \quad ∀ v_h ∈ V_h,

and from the weak formulation,

a(u, v_h) = f(v_h), \quad ∀ v_h ∈ V_h.

Subtracting, we get the Galerkin orthogonality (GO)

a(u - u_h, v_h) = 0, \quad ∀ v_h ∈ V_h.

Then, by coercivity, GO and continuity, for any v_h ∈ V_h,

‖u - u_h‖_{H^1(Ω)}^2 ≤ a(u - u_h, u - u_h)
                    = a(u - u_h, u - v_h) + a(u - u_h, v_h - u_h)
                    = a(u - u_h, u - v_h)
                    ≤ C ‖u - u_h‖_{H^1(Ω)} ‖u - v_h‖_{H^1(Ω)}.

Therefore,

‖u - u_h‖_{H^1} ≤ C \inf_{v_h ∈ V_h} ‖u - v_h‖_{H^1}.
J


Problem 10.2. (Prelim Aug. 2006#9) Let Ω := {(x,y) : x^2 + y^2 < 1}, and consider the Poisson problem

-\Delta u + 2u = xy, \quad \text{in } Ω,
u = 0, \quad \text{on } ∂Ω.

1. Define the corresponding Ritz and Galerkin variational formulations.


2. Suppose that the Galerkin variational problem has a solution; prove that the Ritz variational problem must also have a solution. Is the converse statement true?
3. Let VN be an N-dimension subspace of W 1,2 (Ω). Define the Galerkin method for approximating the
solution of the poisson equation problem, and prove that the Galerkin method has a unique solution.
4. Let u_N denote the Galerkin solution; prove that

‖u - u_N‖_E ≤ C \inf_{v_N ∈ V_N} ‖u - v_N‖_E,

where

‖v‖_E^2 := \int_Ω \left( |\nabla v|^2 + 2v^2 \right) dx\, dy.

Solution. J

References

[1] S. C. Brenner and R. Scott, The Mathematical Theory of Finite Element Methods, vol. 15, Springer, 2008.
[2] A. Iserles, A First Course in the Numerical Analysis of Differential Equations (Cambridge Texts in Applied Mathematics), Cambridge University Press, 2008.
[3] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 2003.
[4] A. J. Salgado, Numerical math lecture notes: 571-572. UTK, 2013-14.
[5] S. M. Wise, Numerical math lecture notes: 571-572. UTK, 2012-13.


Appendices

A Numerical Mathematics Preliminary Examination Sample Questions, Summer 2013

A.1 Numerical Linear Algebra

Problem A.1. (Sample#1) Suppose A ∈ C^{n×n}_{her} and ρ(A) ⊂ (0,∞). Prove that A is Hermitian positive definite.

Solution. Since A ∈ C^{n×n}_{her}, the eigenvalues of A are real: let λ be an arbitrary eigenvalue of A with eigenvector x ≠ 0; then

(Ax, x) = (λx, x) = λ(x,x),
(Ax, x) = (x, A^* x) = (x, Ax) = (x, λx) = \bar{λ}(x,x),

and then λ = \bar{λ}, so λ is real. Moreover, since ρ(A) ⊂ (0,∞), every eigenvalue λ is positive. Since A is Hermitian, it is unitarily diagonalizable, A = UΛU^*; hence for any x ≠ 0, setting y = U^* x ≠ 0,

x^* A x = y^* Λ y = \sum_{i=1}^{n} λ_i |y_i|^2 > 0.

Hence, A is Hermitian positive definite. J

Problem A.2. (Sample#2) Suppose dim(A) = n. If A has n distinct eigenvalues, then A is diagonalizable.

Solution. (Sketch) Suppose n = 2, and let λ_1 ≠ λ_2 be distinct eigenvalues of A with corresponding eigenvectors v_1, v_2. We use contradiction to show that v_1, v_2 are linearly independent. Suppose v_1, v_2 are linearly dependent; then

c_1 v_1 + c_2 v_2 = 0,   (210)

with c_1, c_2 not both 0. Multiplying (210) by A gives

c_1 A v_1 + c_2 A v_2 = c_1 λ_1 v_1 + c_2 λ_2 v_2 = 0.   (211)

Multiplying (210) by λ_1 gives

c_1 λ_1 v_1 + c_2 λ_1 v_2 = 0.   (212)

Subtracting (212) from (211) gives

c_2 (λ_2 - λ_1) v_2 = 0.   (213)

Since λ_1 ≠ λ_2 and v_2 ≠ 0, we get c_2 = 0. Similarly, c_1 = 0. Hence, we get a contradiction. A similar argument gives the result for general n. Then A has n linearly independent eigenvectors, and is therefore diagonalizable. J

Problem A.3. (Sample#5) Let u, v ∈ C^n and set A := I_n + uv^* ∈ C^{n×n}.

1. Suppose A is invertible. Prove that A^{-1} = I_n + α uv^*, for some α ∈ C. Give the expression for α.
2. For what u and v is A singular?
3. Suppose A is singular. What is the null space of A, N(A), in this case?

Page 114 of 236


Wenqiang Feng Prelim Exam note for Numerical Analysis Page 115

Solution. 1. If uv^* = 0, the proof is trivial. Assume uv^* ≠ 0; then

A^{-1} A = (I_n + α uv^*)(I_n + uv^*)
         = I_n + uv^* + α (uv^* + u(v^* u)v^*)
         = I_n + (1 + α + α v^* u) uv^*
         = I_n,

i.e.

1 + α + α v^* u = 0,

i.e.

α = -\frac{1}{1 + v^* u}, \quad \text{provided } v^* u ≠ -1.

2. When v^* u = -1, A is singular.

3. If A is singular, then v^* u = -1.

Claim A.1. N(A) = span(u).

Proof. (a) ⊆: let w ∈ N(A); then

A w = (I_n + uv^*) w = w + u v^* w = 0.

Then w = -(v^* w) u, hence w ∈ span(u).

(b) ⊇: let w ∈ span(u); then w = βu, and

A w = (I_n + uv^*) βu = β (u + u(v^* u)) = β (u + (v^* u)u) = β(u - u) = 0,

hence span(u) ⊆ N(A).
J
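The rank-one inverse formula from part 1 (a special case of the Sherman-Morrison formula) is easy to verify numerically; a minimal sketch with real vectors (all names and sample data are mine) forms (I + uv^T)(I + α uv^T) entrywise and measures its distance to the identity:

```python
def sm_inverse_residual(u, v):
    # A = I + u v^T; claimed inverse I + alpha u v^T with
    # alpha = -1/(1 + v^T u), valid when v^T u != -1
    n = len(u)
    vu = sum(v[k] * u[k] for k in range(n))
    alpha = -1.0 / (1.0 + vu)
    eye = lambda i, j: 1.0 if i == j else 0.0
    worst = 0.0
    for i in range(n):
        for j in range(n):
            # entry (i, j) of the product (I + u v^T)(I + alpha u v^T)
            s = sum((eye(i, k) + u[i] * v[k]) * (eye(k, j) + alpha * u[k] * v[j])
                    for k in range(n))
            worst = max(worst, abs(s - eye(i, j)))
    return worst

print(sm_inverse_residual([1.0, 2.0, 3.0], [0.5, -1.0, 0.25]))  # essentially zero
```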

Problem A.4. (Sample #6) Suppose that A ∈ R^{n×n} is SPD.

1. Show that ‖x‖_A = \sqrt{x^T A x} defines a vector norm.
2. Let the eigenvalues of A be ordered so that 0 < λ_1 ≤ λ_2 ≤ \cdots ≤ λ_n. Show that

\sqrt{λ_1} ‖x‖_2 ≤ ‖x‖_A ≤ \sqrt{λ_n} ‖x‖_2

for any x ∈ R^n.
3. Let b ∈ R^n be given. Prove that x^* ∈ R^n solves Ax = b if and only if x^* minimizes the quadratic function f : R^n → R defined by

f(x) = \frac{1}{2} x^T A x - x^T b.

Solution. 1. (a) Obviously, ‖x‖_A = \sqrt{x^T A x} ≥ 0. When x = 0, ‖x‖_A = 0; conversely, when ‖x‖_A = \sqrt{x^T A x} = 0, we have (Ax, x) = 0, and since A is SPD, x = 0.

(b) ‖λx‖_A = \sqrt{λ x^T A λ x} = \sqrt{λ^2 x^T A x} = |λ| \sqrt{x^T A x} = |λ| ‖x‖_A.



(c) Next we show the triangle inequality ‖x + y‖_A ≤ ‖x‖_A + ‖y‖_A. First, we show

y^T A x ≤ ‖x‖_A ‖y‖_A.

Since A is SPD, it admits a factorization A = R^T R (e.g. Cholesky); moreover

‖Rx‖_2 = (Rx, Rx)^{1/2} = \sqrt{(Rx)^T Rx} = \sqrt{x^T R^T R x} = \sqrt{x^T A x} = ‖x‖_A.

Then, by the Cauchy-Schwarz inequality,

y^T A x = y^T R^T R x = (Ry)^T Rx = (Rx, Ry) ≤ ‖Rx‖_2 ‖Ry‖_2 = ‖x‖_A ‖y‖_A.

And

‖x + y‖_A^2 = (x+y, x+y)_A = (x,x)_A + 2(x,y)_A + (y,y)_A
            ≤ ‖x‖_A^2 + 2 ‖x‖_A ‖y‖_A + ‖y‖_A^2
            = \left( ‖x‖_A + ‖y‖_A \right)^2,

therefore

‖x + y‖_A ≤ ‖x‖_A + ‖y‖_A.

2. By the Rayleigh quotient bounds for the symmetric matrix A,

λ_1 x^T x ≤ x^T A x ≤ λ_n x^T x, \quad ∀ x ∈ R^n.

Taking square roots,

\sqrt{λ_1} ‖x‖_2 ≤ ‖x‖_A ≤ \sqrt{λ_n} ‖x‖_2.

3. Since

\frac{\partial}{\partial x_i} (x^T A x) = (e_i)^T A x + x^T A e_i = (Ax)_i + (A^T x)_i = 2 (Ax)_i,

where e_i denotes the i-th standard basis vector and we used A = A^T, and

\frac{\partial}{\partial x_i} (x^T b) = (e_i)^T b = b_i.


Therefore,

\nabla f(x) = \frac{1}{2} \cdot 2 A x - b = A x - b.

If A x^* = b, then \nabla f(x^*) = A x^* - b = 0; since A is SPD, f is strictly convex, so x^* minimizes the quadratic function f. Conversely, when x^* minimizes f, then \nabla f(x^*) = A x^* - b = 0, therefore A x^* = b.
J
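The gradient formula \nabla f(x) = Ax - b can be checked against central finite differences; for a quadratic, central differences are exact up to rounding. A minimal sketch (my own names and sample data):

```python
def grad_check():
    # f(x) = 0.5 x^T A x - b^T x with symmetric A; gradient should equal A x - b
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    x = [0.3, -0.7]
    f = lambda z: 0.5 * sum(z[i] * A[i][j] * z[j]
                            for i in range(2) for j in range(2)) \
                  - sum(b[i] * z[i] for i in range(2))
    grad = [sum(A[i][j] * x[j] for j in range(2)) - b[i] for i in range(2)]
    eps, num = 1e-6, []
    for i in range(2):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        num.append((f(xp) - f(xm)) / (2 * eps))  # central difference
    return max(abs(grad[i] - num[i]) for i in range(2))

print(grad_check())  # essentially zero
```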

Problem A.5. (Sample#9) Suppose that the spectrum of A ∈ R^{n×n}_{sym} is denoted ρ(A) = {λ_1, λ_2, \cdots, λ_n} ⊂ R. Let S = {x_1, \cdots, x_n} be an orthonormal basis of eigenvectors of A, with A x_k = λ_k x_k, for k = 1, \cdots, n. The Rayleigh quotient of x ∈ R^n_* is defined as

R(x) := \frac{x^T A x}{x^T x}.

Prove the following facts:

1. R(x) = \frac{\sum_{j=1}^{n} λ_j α_j^2}{\sum_{j=1}^{n} α_j^2}, where α_j = x^T x_j.

2. \min_{λ ∈ ρ(A)} λ ≤ R(x) ≤ \max_{λ ∈ ρ(A)} λ.

Solution. 1. First, we need to show that x = \sum_{j=1}^{n} α_j x_j, with α_j = x^T x_j, is the unique representation of x w.r.t. the orthonormal basis S. Since S is an orthonormal basis, \sum_{j=1}^{n} (x^T x_j) x_j is a representation of x; if \sum_{j=1}^{n} β_j x_j is another representation, then \sum_{j=1}^{n} (β_j - α_j) x_j = 0, and taking the inner product with x_k gives β_k = α_k. Now, we have

x^T A x = x^T A \sum_{j=1}^{n} α_j x_j = x^T \sum_{j=1}^{n} α_j λ_j x_j = \sum_{j=1}^{n} α_j λ_j\, x^T x_j = \sum_{j=1}^{n} λ_j α_j^2.

Similarly, x^T x = \sum_{j=1}^{n} α_j^2. Hence,

R(x) := \frac{x^T A x}{x^T x} = \frac{\sum_{j=1}^{n} λ_j α_j^2}{\sum_{j=1}^{n} α_j^2}.


2. Since

R(x) = \frac{\sum_{j=1}^{n} λ_j α_j^2}{\sum_{j=1}^{n} α_j^2},

we have

\min_j λ_j \cdot \frac{\sum_j α_j^2}{\sum_j α_j^2} ≤ R(x) ≤ \max_j λ_j \cdot \frac{\sum_j α_j^2}{\sum_j α_j^2},

i.e.

\min_j λ_j ≤ R(x) ≤ \max_j λ_j,

hence

\min_{λ ∈ ρ(A)} λ ≤ R(x) ≤ \max_{λ ∈ ρ(A)} λ.
J
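The bound in part 2 is easy to probe numerically; a minimal sketch (my own names; the diagonal test matrix with eigenvalues 1, 2, 5 and the sample vectors are arbitrary choices) evaluates R(x) for several x and checks that every value lies in [\min λ, \max λ]:

```python
def rayleigh(A, x):
    # Rayleigh quotient x^T A x / x^T x for a symmetric matrix A
    n = len(x)
    num = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return num / sum(xi * xi for xi in x)

# diagonal example: eigenvalues are exactly 1, 2 and 5
A = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 5.0]]
vals = [rayleigh(A, x) for x in ([1, 1, 1], [2, -1, 0.5], [0.1, 0.1, 3])]
print(all(1.0 <= r <= 5.0 for r in vals))  # True
```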

Problem A.6. (Sample #31) Let A ∈ R^{n×n} be symmetric positive definite (SPD). Let b ∈ R^n. Consider solving Ax = b using the iterative method

M x_{n+1} = N x_n + b, \quad n = 0, 1, 2, \cdots,

where A = M - N, M is invertible, and x_0 ∈ R^n is arbitrary.

1. If M + M^T - A is SPD, prove that the method is convergent.
2. Prove that the Gauss-Seidel method converges.

Solution. 1. From the iteration, we get

x_{n+1} = M^{-1} N x_n + M^{-1} b.

Let G = M^{-1} N = M^{-1}(M - A) = I - M^{-1} A. If we can prove that ρ(G) < 1, then this method converges. Let λ be any eigenvalue of G and x ≠ 0 a corresponding eigenvector, i.e. Gx = λx. Then

(I - M^{-1} A) x = λ x,

i.e.

(M - A) x = λ M x,

i.e.

(1 - λ) M x = A x.

(a) λ ≠ 1: if λ = 1, then A x = 0 with x ≠ 0, so x^* A x = 0, which contradicts A being SPD.


(b) $|\lambda| < 1$: since $(1 - \lambda) M x = A x$,
$$ (1 - \lambda) x^* M x = x^* A x, $$
so
$$ x^* M x = \frac{1}{1 - \lambda} x^* A x. $$
Taking the conjugate transpose (note $x^* A x$ is real since $A$ is real symmetric) yields
$$ x^* M^* x = \frac{1}{1 - \bar{\lambda}} x^* A^* x = \frac{1}{1 - \bar{\lambda}} x^* A x. $$
Then, we have
$$ x^* (M + M^* - A) x = \left( \frac{1}{1 - \lambda} + \frac{1}{1 - \bar{\lambda}} - 1 \right) x^* A x = \frac{(1 - \bar{\lambda}) + (1 - \lambda) - (1 - \lambda)(1 - \bar{\lambda})}{|1 - \lambda|^2} \, x^* A x = \frac{1 - |\lambda|^2}{|1 - \lambda|^2} \, x^* A x. $$
Since $M + M^* - A$ and $A$ are SPD, $x^* (M + M^* - A) x > 0$ and $x^* A x > 0$. Therefore
$$ 1 - |\lambda|^2 > 0, \quad \text{i.e.} \quad |\lambda| < 1. $$

2. Write $A = L + D + U$, where $D$ is the diagonal and $L$, $U$ the strictly lower and upper triangular parts. Then:
Jacobi method: $M_J = D$, $N_J = -(L + U)$;
Gauss-Seidel method: $M_{GS} = D + L$, $N_{GS} = -U$.
Since $A$ is SPD, $U = L^T$, and hence
$$ M_{GS} + M_{GS}^T - A = D + L + D^T + L^T - A = D + L^T - U = D, $$
which is SPD because the diagonal entries of an SPD matrix are positive. Therefore, from part 1, the Gauss-Seidel method converges.
J
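A small numerical illustration of part 2 (the SPD matrix and right-hand side below are my own example): Gauss-Seidel sweeps on a $2 \times 2$ SPD system converge to the exact solution.

```python
# A small SPD system solved by Gauss-Seidel (M = D + L, N = -U); with x0 = 0
# the iterates converge, as part 2 predicts.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [0.0, 0.0]
for _ in range(100):
    x[0] = (b[0] - A[0][1] * x[1]) / A[0][0]   # sweep uses updated entries
    x[1] = (b[1] - A[1][0] * x[0]) / A[1][1]

# Exact solution of 4x + y = 1, x + 3y = 2 is x = (1/11, 7/11).
assert abs(x[0] - 1 / 11) < 1e-12 and abs(x[1] - 7 / 11) < 1e-12
print("Gauss-Seidel converged")
```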

Problem A.7. (Sample #32) Let $A \in \mathbb{R}^{n \times n}$ be symmetric positive definite (SPD). Let $b \in \mathbb{R}^n$. Consider solving $Ax = b$ using the iterative method
$$ M x_{n+1} = N x_n + b, \quad n = 0, 1, 2, \dots $$
where $A = M - N$, $M$ is invertible, and $x_0 \in \mathbb{R}^n$ is arbitrary. Suppose that $M + M^T - A$ is SPD. Show that each step of this method reduces the $A$-norm of $e_n = x - x_n$, whenever $e_n \ne 0$. Recall that the $A$-norm of any $y \in \mathbb{R}^n$ is defined via
$$ \|y\|_A = \sqrt{y^T A y}. $$

Solution. Let $e^k = x^k - x$. Rewrite the scheme in the canonical form with $B = M$ and $\alpha = 1$:
$$ B \left( \frac{x^{k+1} - x^k}{\alpha} \right) + A x^k = b = A x. $$
Subtracting, we get
$$ B \left( \frac{e^{k+1} - e^k}{\alpha} \right) + A e^k = 0. $$
Let $v^{k+1} = e^{k+1} - e^k$, so that
$$ \frac{1}{\alpha} B v^{k+1} + A e^k = 0. $$
Let $B_s = \frac{B + B^T}{2}$ be the symmetric part of $B$. Taking the inner product with $v^{k+1}$, and noting that only the symmetric part contributes to $(B v^{k+1}, v^{k+1})$, we obtain
$$ \frac{1}{\alpha} (B_s v^{k+1}, v^{k+1}) + (A e^k, v^{k+1}) = 0. $$
Since
$$ e^k = \frac{1}{2}(e^{k+1} + e^k) - \frac{1}{2}(e^{k+1} - e^k) = \frac{1}{2}(e^{k+1} + e^k) - \frac{1}{2} v^{k+1}, $$
we have
$$ 0 = \frac{1}{\alpha} (B_s v^{k+1}, v^{k+1}) + \frac{1}{2} (A (e^{k+1} + e^k), v^{k+1}) - \frac{1}{2} (A v^{k+1}, v^{k+1}) = \frac{1}{\alpha} \left( \left( B_s - \frac{\alpha}{2} A \right) v^{k+1}, v^{k+1} \right) + \frac{1}{2} \left( \|e^{k+1}\|_A^2 - \|e^k\|_A^2 \right), $$
where we used $(A(e^{k+1} + e^k), e^{k+1} - e^k) = \|e^{k+1}\|_A^2 - \|e^k\|_A^2$, by symmetry of $A$. By assumption,
$$ Q = B_s - \frac{\alpha}{2} A = \frac{M + M^T - A}{2} > 0, $$
i.e. there exists $m > 0$ such that $(Q y, y) \ge m \|y\|_2^2$ for all $y$. Therefore,
$$ \frac{m}{\alpha} \|v^{k+1}\|_2^2 + \frac{1}{2} \left( \|e^{k+1}\|_A^2 - \|e^k\|_A^2 \right) \le 0, $$
i.e.
$$ \frac{2m}{\alpha} \|v^{k+1}\|_2^2 + \|e^{k+1}\|_A^2 \le \|e^k\|_A^2. $$

Hence
$$ \|e^{k+1}\|_A^2 \le \|e^k\|_A^2 - \frac{2m}{\alpha} \|v^{k+1}\|_2^2. $$
Moreover, if $e^k \ne 0$, then $A e^k \ne 0$, so $v^{k+1} = -\alpha B^{-1} A e^k \ne 0$, and the inequality is strict:
$$ \|e^{k+1}\|_A < \|e^k\|_A. $$
Thus each step reduces the $A$-norm of the error.

Problem A.8. (Sample #33) Consider a linear system $Ax = b$ with $A \in \mathbb{R}^{n \times n}$. Richardson's method is the iterative method
$$ M x^{k+1} = N x^k + b $$
with $M = \frac{1}{w} I$, $N = M - A = \frac{1}{w} I - A$, where $w$ is a damping factor chosen to make $M$ approximate $A$ as well as possible. Suppose $A$ is positive definite and $w > 0$. Let $\lambda_1$ and $\lambda_n$ denote the smallest and largest eigenvalues of $A$.
1. Prove that Richardson's method converges if and only if $w < \frac{2}{\lambda_n}$.
2. Prove that the optimal value of $w$ is $w_0 = \frac{2}{\lambda_1 + \lambda_n}$.

Solution. 1. From the scheme of Richardson's method, we know that
$$ x^{k+1} = (I - wA) x^k + w b. $$
So the error transfer operator is $T = I - wA$, and if $\lambda_i$ is an eigenvalue of $A$, then $1 - w\lambda_i$ is an eigenvalue of $T$. The sufficient and necessary condition for convergence is $\rho(T) < 1$, i.e.
$$ |1 - w \lambda_i| < 1 \quad \text{for all } i. $$
Since $w > 0$ and $\lambda_i > 0$, this holds if and only if
$$ w < \frac{2}{\lambda_i} \quad \text{for all } i. $$
Since $\lambda_n$ is the largest eigenvalue of $A$, $\frac{2}{\lambda_n} = \min_i \frac{2}{\lambda_i}$, hence we need
$$ w < \frac{2}{\lambda_n}. $$
Conversely, if $w < \frac{2}{\lambda_n}$, then $\rho(T) < 1$ and the scheme converges.
2. We minimize $\rho(T) = \max_i |1 - w\lambda_i| = \max\{|1 - w\lambda_1|, |1 - w\lambda_n|\}$ over $w$. The minimum is attained at $|1 - \omega \lambda_n| = |1 - \omega \lambda_1|$ (Figure 1), i.e.
$$ \omega \lambda_n - 1 = 1 - \omega \lambda_1. $$
Therefore, we get
$$ \omega_{opt} = \frac{2}{\lambda_1 + \lambda_n}. $$
J
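Both parts can be checked numerically by scanning the contraction factor $\max_i |1 - w\lambda_i|$ over a grid of $w$ values (the eigenvalues $\lambda_1 = 1$, $\lambda_n = 9$ below are my own illustrative choice):

```python
# Richardson iteration contracts iff max(|1 - w*lam1|, |1 - w*lamn|) < 1.
# It converges for w < 2/lamn, and the factor is minimized at w0 = 2/(lam1+lamn).
lam1, lamn = 1.0, 9.0

def rho(w):
    return max(abs(1 - w * lam1), abs(1 - w * lamn))

w0 = 2 / (lam1 + lamn)
ws = [0.01 * k for k in range(1, 300)]   # grid of damping factors

assert all(rho(w) < 1 for w in ws if w < 2 / lamn)    # convergent range
assert all(rho(w) >= 1 for w in ws if w >= 2 / lamn)  # divergent range
best = min(ws, key=rho)                               # grid minimizer
assert abs(best - w0) < 1e-12
print("optimal w =", w0)
```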

Page 121 of 236


Wenqiang Feng Prelim Exam note for Numerical Analysis Page 122

Problem A.9. (Sample #34) Let $A \in \mathbb{C}^{n \times n}$. Define
$$ S_n := I + A + A^2 + \dots + A^n. $$
1. Prove that the sequence $\{S_n\}_{n=0}^{\infty}$ converges if and only if $A$ is convergent (i.e. $A^k \to 0$ as $k \to \infty$).
2. Prove that if $A$ is convergent, then $I - A$ is non-singular and
$$ \lim_{n \to \infty} S_n = (I - A)^{-1}. $$

Solution. 1. From the problem, we know that
$$ S_n := I + A + A^2 + \dots + A^n = \sum_{k=0}^n A^k. $$
Recall that $A$ is convergent ($A^k \to 0$) if and only if $\rho(A) < 1$, and in that case there is an induced norm with $\|A\| < 1$, so
$$ \|A^k\| \le \|A\|^k, $$
and the partial sums form a Cauchy sequence by comparison with the geometric series $\sum_k \|A\|^k < \infty$; hence $S_n$ converges. Conversely, if $S_n$ converges, then $A^n = S_n - S_{n-1} \to 0$, i.e. $A$ is convergent.
2. Choose an induced norm with $\|A\| < 1$. Then, for $x \ne 0$,
$$ \|(I - A)x\| \ge \|x\| - \|Ax\| \ge \|x\| - \|A\| \|x\| = (1 - \|A\|) \|x\| > 0, $$
so $(I - A)x = 0$ forces $x = 0$, i.e. $\ker(I - A) = \{0\}$. Hence $I - A$ is non-singular. From the definition of $S_n$, we get the telescoping identity
$$ (I - A) S_n = \sum_{k=0}^n A^k - \sum_{k=1}^{n+1} A^k = A^0 - A^{n+1} = I - A^{n+1}. $$
Taking the limit on both sides, using $A^{n+1} \to 0$, we get
$$ (I - A) \lim_{n \to \infty} S_n = I. $$
Since $I - A$ is non-singular, we have
$$ \lim_{n \to \infty} S_n = (I - A)^{-1}. $$
J
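The Neumann-series identity can be checked on a small convergent matrix (the $2 \times 2$ matrix below, with spectral radius well below $1$, is my own illustrative choice):

```python
# Partial sums S_n = I + A + ... + A^n for a convergent 2x2 matrix A
# (spectral radius < 1) approach (I - A)^{-1}.
A = [[0.5, 0.2], [0.1, 0.3]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[1.0, 0.0], [0.0, 1.0]]   # S_0 = I
P = [[1.0, 0.0], [0.0, 1.0]]   # current power A^k
for _ in range(200):
    P = matmul(P, A)
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]

# (I - A)^{-1} from the 2x2 inverse formula
M = [[0.5, -0.2], [-0.1, 0.7]]                 # I - A
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

assert all(abs(S[i][j] - inv[i][j]) < 1e-10 for i in range(2) for j in range(2))
print("S_n -> (I - A)^{-1}")
```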

Problem A.10. (Sample #40) Show that if $\lambda$ is an eigenvalue of $A^* A$, where $A \in \mathbb{C}^{n \times n}$, then
$$ 0 \le \lambda \le \|A\| \|A^*\|. $$

Solution. Let $x \ne 0$ be an eigenvector: $A^* A x = \lambda x$. Since $x^* A^* A x = (Ax)^* (Ax) = \lambda x^* x \ge 0$ and $x^* x > 0$, $\lambda \ge 0$ and $\lambda$ is real. Moreover, from $A^* A x = \lambda x$,
$$ \lambda \|x\| = \|\lambda x\| = \|A^* A x\| \le \|A^*\| \|A\| \|x\|, $$
and dividing by $\|x\| \ne 0$ gives
$$ 0 \le \lambda \le \|A^*\| \|A\|. $$
J

Problem A.11. (Sample #41) Suppose $A \in \mathbb{C}^{n \times n}$ and $A$ is invertible. Prove that
$$ \kappa_2(A) \le \sqrt{\frac{\lambda_n}{\lambda_1}}, $$
where $\lambda_n$ is the largest eigenvalue of $B := A^* A$, and $\lambda_1$ is the smallest eigenvalue of $B$.

Solution. Since $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$, $\|A\|_2 = \sqrt{\rho(A^* A)} = \sqrt{\lambda_n}$, and $\|A^{-1}\|_2 = 1/\sqrt{\lambda_1}$ (the squared singular values of $A^{-1}$ are the reciprocals of the eigenvalues of $A^* A$), therefore
$$ \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sqrt{\lambda_n}}{\sqrt{\lambda_1}} = \sqrt{\frac{\lambda_n}{\lambda_1}}, $$
so in fact equality holds, and in particular the bound is satisfied.
J

Problem A.12. (Sample #34) Let $A = [a_{i,j}] \in \mathbb{C}^{n \times n}$ be invertible and $b \in \mathbb{C}^n$. Prove that the classical Jacobi iteration method for approximating the solution to $Ax = b$ is convergent, for any starting value $x_0$, if $A$ is strictly diagonally dominant, i.e.
$$ |a_{i,i}| > \sum_{k \ne i} |a_{i,k}|, \quad \forall \, i = 1, \dots, n. $$

Solution. The Jacobi iteration scheme is as follows:
$$ D (x^{k+1} - x^k) + A x^k = b, $$
where $D$ is the diagonal part of $A$. This scheme can be rewritten as
$$ x^{k+1} = (I - D^{-1} A) x^k + D^{-1} b. $$
We want to show that if $A$ is strictly diagonally dominant, then $\|T_J\|_\infty < 1$, and hence the Jacobi method converges. The iteration matrix is
$$ T_J = I - D^{-1} A = [t_{ij}], \qquad t_{ij} = \begin{cases} 0, & i = j, \\ -\dfrac{a_{ij}}{a_{ii}}, & i \ne j. \end{cases} $$
So,
$$ \|T_J\|_\infty = \max_i \sum_j |t_{ij}| = \max_i \sum_{j \ne i} \left| \frac{a_{ij}}{a_{ii}} \right|. $$
Since $A$ is strictly diagonally dominant,
$$ |a_{ii}| > \sum_{j \ne i} |a_{ij}|, \quad \text{so} \quad \sum_{j \ne i} \frac{|a_{ij}|}{|a_{ii}|} < 1 \ \text{for every } i. $$
Hence $\|T_J\|_\infty < 1$, and the iteration converges for any starting value. J
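A minimal sketch of this result in action (the strictly diagonally dominant matrix and right-hand side below are my own example, built so the exact solution is $(1,1,1)^T$):

```python
# Jacobi iteration on a strictly diagonally dominant system: the iteration
# matrix satisfies ||T_J||_inf < 1, so the method converges for any x0.
A = [[10.0, 2.0, 1.0], [1.0, 8.0, 2.0], [2.0, 1.0, 9.0]]
b = [13.0, 11.0, 12.0]   # row sums, so the exact solution is (1, 1, 1)
n = 3

# ||T_J||_inf = max_i sum_{j != i} |a_ij| / |a_ii|
T_inf = max(sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
            for i in range(n))
assert T_inf < 1

x = [0.0, 0.0, 0.0]
for _ in range(100):
    x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
         for i in range(n)]

assert all(abs(xi - 1.0) < 1e-10 for xi in x)
print("Jacobi converged, ||T_J||_inf =", T_inf)
```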

Problem A.13. (Sample #35) Let $A = [a_{i,j}] \in \mathbb{C}^{n \times n}$ be invertible and $b \in \mathbb{C}^n$. Prove that the classical Gauss-Seidel iteration method for approximating the solution to $Ax = b$ is convergent, for any starting value $x_0$, if $A$ is strictly diagonally dominant, i.e.
$$ |a_{i,i}| > \sum_{k \ne i} |a_{i,k}|, \quad \forall \, i = 1, \dots, n. $$

Solution. The Gauss-Seidel iteration scheme is as follows:
$$ (D + L)(x^{k+1} - x^k) + A x^k = b, $$
where $A = L + D + U$. This scheme can be rewritten as
$$ x^{k+1} = -(L + D)^{-1} U x^k + (L + D)^{-1} b := T_{GS} x^k + (L + D)^{-1} b. $$
We want to show that if $A$ is strictly diagonally dominant, then $\|T_{GS}\|_\infty < 1$, and hence the Gauss-Seidel method converges. Since $A$ is strictly diagonally dominant,
$$ |a_{ii}| - \sum_{j < i} |a_{ij}| > \sum_{j > i} |a_{ij}| \ge 0, $$
which implies
$$ \gamma := \max_i \frac{\sum_{j > i} |a_{ij}|}{|a_{ii}| - \sum_{j < i} |a_{ij}|} < 1. $$
Now, we will show $\|T_{GS}\|_\infty \le \gamma$. Let $x \in \mathbb{C}^n$ and
$$ y = T_{GS} x = -(L + D)^{-1} U x, \quad \text{i.e.} \quad (L + D) y = -U x. $$
Let $i_0$ be an index such that $\|y\|_\infty = |y_{i_0}|$. Then we have
$$ |((L + D) y)_{i_0}| = |(U x)_{i_0}| = \Big| \sum_{j > i_0} a_{i_0 j} x_j \Big| \le \sum_{j > i_0} |a_{i_0 j}| \, |x_j| \le \sum_{j > i_0} |a_{i_0 j}| \, \|x\|_\infty. $$
Moreover,
$$ |((L + D) y)_{i_0}| = \Big| \sum_{j < i_0} a_{i_0 j} y_j + a_{i_0 i_0} y_{i_0} \Big| \ge |a_{i_0 i_0}| \, |y_{i_0}| - \sum_{j < i_0} |a_{i_0 j}| \, |y_j| \ge \Big( |a_{i_0 i_0}| - \sum_{j < i_0} |a_{i_0 j}| \Big) \|y\|_\infty. $$
Therefore, from the above two inequalities, we have
$$ \Big( |a_{i_0 i_0}| - \sum_{j < i_0} |a_{i_0 j}| \Big) \|y\|_\infty \le \sum_{j > i_0} |a_{i_0 j}| \, \|x\|_\infty, $$
which implies
$$ \|y\|_\infty \le \frac{\sum_{j > i_0} |a_{i_0 j}|}{|a_{i_0 i_0}| - \sum_{j < i_0} |a_{i_0 j}|} \, \|x\|_\infty. $$

So,
$$ \|T_{GS} x\|_\infty \le \gamma \|x\|_\infty \quad \text{for all } x, $$
which implies
$$ \|T_{GS}\|_\infty \le \gamma < 1. $$
J

Problem A.14. (Sample #38) Let $A \in \mathbb{C}^{n \times n}$ be invertible and suppose the nonzero $b \in \mathbb{C}^n$ satisfies $Ax = b$. Let the perturbations $\delta x, \delta b \in \mathbb{C}^n$ satisfy $A \delta x = \delta b$, so that $A(x + \delta x) = b + \delta b$.
1. Prove the error (or perturbation) estimate
$$ \frac{1}{\kappa(A)} \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|} \le \kappa(A) \frac{\|\delta b\|}{\|b\|}. $$
2. Show that for any invertible matrix $A$, the upper bound for $\frac{\|\delta x\|}{\|x\|}$ above can be attained for a suitable choice of $b$ and $\delta b$. (In other words, the upper bound is sharp.)

Solution. 1. Since $Ax = b$ and $A \delta x = \delta b$, we have $x = A^{-1} b$ and
$$ \|\delta b\| = \|A \delta x\| \le \|A\| \|\delta x\|, \qquad \|x\| = \|A^{-1} b\| \le \|A^{-1}\| \|b\|. $$
Therefore
$$ \frac{\|\delta b\|}{\|A\|} \le \|\delta x\|, \qquad \frac{1}{\|A^{-1}\| \|b\|} \le \frac{1}{\|x\|}. $$
Hence
$$ \frac{1}{\kappa(A)} \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|}. $$
Similarly, since $Ax = b$ and $A \delta x = \delta b$, we have $\delta x = A^{-1} \delta b$ and
$$ \|b\| = \|Ax\| \le \|A\| \|x\|, \qquad \|\delta x\| = \|A^{-1} \delta b\| \le \|A^{-1}\| \|\delta b\|. $$
Therefore
$$ \frac{1}{\|x\|} \le \frac{\|A\|}{\|b\|}. $$
Hence,
$$ \frac{\|\delta x\|}{\|x\|} \le \kappa(A) \frac{\|\delta b\|}{\|b\|}. $$
So,
$$ \frac{1}{\kappa(A)} \frac{\|\delta b\|}{\|b\|} \le \frac{\|\delta x\|}{\|x\|} \le \kappa(A) \frac{\|\delta b\|}{\|b\|}. $$

2. For the induced norm $\|\cdot\|$, choose $x \ne 0$ realizing the operator norm of $A$, i.e. $\|Ax\| = \|A\| \|x\|$, and set $b = Ax$. Likewise, choose $\delta b \ne 0$ realizing the operator norm of $A^{-1}$, i.e. $\|A^{-1} \delta b\| = \|A^{-1}\| \|\delta b\|$, and set $\delta x = A^{-1} \delta b$. Then
$$ \frac{\|\delta x\|}{\|x\|} = \frac{\|A^{-1} \delta b\|}{\|x\|} = \frac{\|A^{-1}\| \|\delta b\| \, \|A\|}{\|Ax\|} = \kappa(A) \frac{\|\delta b\|}{\|b\|}. $$
So the upper bound is attained for this choice of $b$ and $\delta b$, i.e. it is sharp.
J
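Part 1 can be illustrated numerically in the $\infty$-norm (the $2 \times 2$ matrix and the perturbation below are my own example):

```python
# Perturbation bound: ||dx||/||x|| lies between (1/kappa)||db||/||b||
# and kappa * ||db||/||b||, checked on a 2x2 system in the infinity norm.
A = [[1.0, 2.0], [3.0, 4.0]]
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[A[1][1] / detA, -A[0][1] / detA], [-A[1][0] / detA, A[0][0] / detA]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def vnorm(v):   # vector infinity norm
    return max(abs(t) for t in v)

def mnorm(M):   # induced infinity norm = max row sum
    return max(abs(M[0][0]) + abs(M[0][1]), abs(M[1][0]) + abs(M[1][1]))

b, db = [1.0, 1.0], [1e-3, -2e-3]
x, dx = apply(Ainv, b), apply(Ainv, db)
kappa = mnorm(A) * mnorm(Ainv)

lower = vnorm(db) / (kappa * vnorm(b))
upper = kappa * vnorm(db) / vnorm(b)
assert lower <= vnorm(dx) / vnorm(x) <= upper
print("perturbation bound holds; kappa_inf(A) =", kappa)
```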

Problem A.15. (Sample #39) Let $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$. Suppose $x$ and $\hat{x}$ solve $Ax = b$ and $(A + \delta A)\hat{x} = b + \delta b$, respectively. Assuming that $\|A^{-1}\| \|\delta A\| < 1$, show that
$$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa_2(A) \frac{\|\delta A\|_2}{\|A\|_2}} \left( \frac{\|\delta A\|}{\|A\|} + \frac{\|\delta b\|}{\|b\|} \right), $$
where $\delta x = \hat{x} - x$.

Solution. Since $\|A^{-1}\| \|\delta A\| < 1$, we have
$$ \|A^{-1} \delta A\| \le \|A^{-1}\| \|\delta A\| < 1. $$
Therefore $I + A^{-1} \delta A$ is invertible (Neumann series), and
$$ \|(I + A^{-1} \delta A)^{-1}\| \le \frac{1}{1 - \|A^{-1} \delta A\|}. $$
Now,
$$ \begin{aligned} \delta x &= \hat{x} - x \\ &= (A + \delta A)^{-1} (b + \delta b) - A^{-1} b \\ &= \left[ A (I + A^{-1} \delta A) \right]^{-1} (b + \delta b) - A^{-1} b \\ &= (I + A^{-1} \delta A)^{-1} A^{-1} (b + \delta b) - A^{-1} b \\ &= (I + A^{-1} \delta A)^{-1} \left[ A^{-1} (b + \delta b) - (I + A^{-1} \delta A) A^{-1} b \right] \\ &= (I + A^{-1} \delta A)^{-1} \left[ A^{-1} \delta b - A^{-1} \delta A A^{-1} b \right]. \end{aligned} $$

Therefore,
$$ \begin{aligned} \|\delta x\| &\le \frac{1}{1 - \|A^{-1} \delta A\|} \left( \|A^{-1} \delta b\| + \|A^{-1} \delta A A^{-1} b\| \right) \\ &\le \frac{1}{1 - \|A^{-1} \delta A\|} \left( \|A^{-1}\| \|\delta b\| + \|A^{-1}\| \|\delta A\| \|A^{-1} b\| \right) \\ &= \frac{1}{1 - \|A^{-1} \delta A\|} \left( \|A^{-1}\| \|\delta b\| + \|A^{-1}\| \|\delta A\| \|x\| \right) \\ &= \frac{\kappa(A)}{1 - \|A^{-1} \delta A\|} \left( \frac{\|\delta b\|}{\|A\|} + \frac{\|\delta A\| \|x\|}{\|A\|} \right). \end{aligned} $$
Dividing by $\|x\|$ on both sides yields
$$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \|A^{-1} \delta A\|} \left( \frac{\|\delta b\|}{\|A\| \|x\|} + \frac{\|\delta A\|}{\|A\|} \right). $$
Since $\|b\| = \|Ax\| \le \|A\| \|x\|$, we have $\frac{1}{\|A\| \|x\|} \le \frac{1}{\|b\|}$, so
$$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \|A^{-1} \delta A\|} \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right) \le \frac{\kappa(A)}{1 - \|A^{-1}\|_2 \|\delta A\|_2} \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right) = \frac{\kappa(A)}{1 - \kappa_2(A) \frac{\|\delta A\|_2}{\|A\|_2}} \left( \frac{\|\delta b\|}{\|b\|} + \frac{\|\delta A\|}{\|A\|} \right), $$
where we used $\|A^{-1}\|_2 \|A\|_2 = \kappa_2(A)$.

Problem A.16. (Sample #39) Let $A \in \mathbb{R}^{n \times n}$, $b \in \mathbb{R}^n$. Suppose $x$ and $\hat{x}$ solve $Ax = b$ and $(A + \delta A)\hat{x} = b$, respectively. Assuming that $\|A^{-1}\| \|\delta A\| < 1$, show that
$$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa_2(A) \frac{\|\delta A\|_2}{\|A\|_2}} \frac{\|\delta A\|}{\|A\|}, $$
where $\delta x = \hat{x} - x$.

Solution. Since $\|A^{-1}\| \|\delta A\| < 1$, we have
$$ \|A^{-1} \delta A\| \le \|A^{-1}\| \|\delta A\| < 1, $$
so $I + A^{-1} \delta A$ is invertible and
$$ \|(I + A^{-1} \delta A)^{-1}\| \le \frac{1}{1 - \|A^{-1} \delta A\|}. $$

Now,
$$ \begin{aligned} \delta x &= \hat{x} - x \\ &= (A + \delta A)^{-1} b - A^{-1} b \\ &= \left[ A (I + A^{-1} \delta A) \right]^{-1} b - A^{-1} b \\ &= (I + A^{-1} \delta A)^{-1} A^{-1} b - A^{-1} b \\ &= (I + A^{-1} \delta A)^{-1} \left[ A^{-1} b - (I + A^{-1} \delta A) A^{-1} b \right] \\ &= (I + A^{-1} \delta A)^{-1} \left[ -A^{-1} \delta A A^{-1} b \right]. \end{aligned} $$
Therefore,
$$ \|\delta x\| \le \frac{\|A^{-1}\| \|\delta A\| \|A^{-1} b\|}{1 - \|A^{-1} \delta A\|} = \frac{\|A^{-1}\| \|\delta A\| \|x\|}{1 - \|A^{-1} \delta A\|} = \frac{\kappa(A)}{1 - \|A^{-1} \delta A\|} \frac{\|\delta A\| \|x\|}{\|A\|}. $$
Dividing by $\|x\|$ on both sides yields
$$ \frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \|A^{-1} \delta A\|} \frac{\|\delta A\|}{\|A\|} \le \frac{\kappa(A)}{1 - \|A^{-1}\|_2 \|\delta A\|_2} \frac{\|\delta A\|}{\|A\|} = \frac{\kappa(A)}{1 - \kappa_2(A) \frac{\|\delta A\|_2}{\|A\|_2}} \frac{\|\delta A\|}{\|A\|}. $$
J

Problem A.17. (Sample #40) Show that if $\lambda$ is an eigenvalue of $A^* A$, where $A \in \mathbb{C}^{n \times n}$, then
$$ 0 \le \lambda \le \|A^*\| \|A\|. $$

Problem A.18. (Sample #41) Suppose $A \in \mathbb{C}^{n \times n}$ is invertible. Show that
$$ \kappa_2(A) = \sqrt{\frac{\lambda_n}{\lambda_1}}, $$
where $\lambda_n$ is the largest eigenvalue of $B := A^* A$, and $\lambda_1$ is the smallest eigenvalue of $B$.

Problem A.19. (Sample #42) Suppose $A \in \mathbb{C}^{n \times n}$ and $A$ is invertible. Prove that
$$ \kappa_2(A) \le \sqrt{\kappa_1(A) \, \kappa_\infty(A)}. $$

Solution.
Claim A.2.
$$ \|A\|_2^2 \le \|A\|_1 \|A\|_\infty. $$
Proof.
$$ \|A\|_2^2 = \rho(A^* A) = \lambda \le \|A^* A\|_1 \le \|A^*\|_1 \|A\|_1 = \|A\|_\infty \|A\|_1, $$
where $\lambda$ is the largest eigenvalue of $A^* A$, and we used that the spectral radius is bounded by any induced norm and $\|A^*\|_1 = \|A\|_\infty$. J

Since $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$, $\kappa_1(A) = \|A\|_1 \|A^{-1}\|_1$ and $\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty$, the claim applied to $A$ and to $A^{-1}$ gives
$$ \kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 \le \sqrt{\|A\|_1 \|A\|_\infty} \, \sqrt{\|A^{-1}\|_1 \|A^{-1}\|_\infty} = \sqrt{\kappa_1(A) \, \kappa_\infty(A)}. $$
J
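The inequality can be checked numerically on a small example (the $2 \times 2$ matrix below is my own choice; the 2-norm uses the closed-form largest eigenvalue of $M^T M$ for $2 \times 2$ matrices):

```python
import math

# Check kappa_2(A) <= sqrt(kappa_1(A) * kappa_inf(A)) on a 2x2 example.
A = [[3.0, 1.0], [2.0, 4.0]]
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[A[1][1] / detA, -A[0][1] / detA], [-A[1][0] / detA, A[0][0] / detA]]

def norm1(M):    # max column sum
    return max(abs(M[0][0]) + abs(M[1][0]), abs(M[0][1]) + abs(M[1][1]))

def norminf(M):  # max row sum
    return max(abs(M[0][0]) + abs(M[0][1]), abs(M[1][0]) + abs(M[1][1]))

def norm2(M):    # sqrt of largest eigenvalue of M^T M (2x2 closed form)
    t = sum(M[i][j] ** 2 for i in range(2) for j in range(2))   # trace(M^T M)
    d = (M[0][0] * M[1][1] - M[0][1] * M[1][0]) ** 2            # det(M^T M)
    return math.sqrt((t + math.sqrt(t * t - 4 * d)) / 2)

k1 = norm1(A) * norm1(Ainv)
kinf = norminf(A) * norminf(Ainv)
k2 = norm2(A) * norm2(Ainv)
assert k2 <= math.sqrt(k1 * kinf) + 1e-12
print("kappa_2 =", k2, "<= sqrt(kappa_1 * kappa_inf) =", math.sqrt(k1 * kinf))
```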

Problem A.20. (Sample #44) Suppose $A, B \in \mathbb{R}^{n \times n}$, $A$ is non-singular and $B$ is singular. Prove that
$$ \frac{1}{\kappa(A)} \le \frac{\|A - B\|}{\|A\|}, $$
where $\kappa(A) = \|A\| \cdot \|A^{-1}\|$, and $\|\cdot\|$ is an induced matrix norm.

Solution. Since $B$ is singular, there exists a vector $x \ne 0$ such that $Bx = 0$. Then
$$ x = x - A^{-1} B x = (I - A^{-1} B) x = A^{-1} (A - B) x. $$
So
$$ \|x\| = \|A^{-1} (A - B) x\| \le \|A^{-1}\| \|A - B\| \|x\|. $$
Since $x \ne 0$, dividing by $\|x\|$ gives
$$ 1 \le \|A^{-1}\| \|A - B\|, $$
and dividing by $\|A^{-1}\| \|A\|$,
$$ \frac{1}{\|A^{-1}\| \|A\|} \le \frac{\|A - B\|}{\|A\|}, \quad \text{i.e.} \quad \frac{1}{\kappa(A)} \le \frac{\|A - B\|}{\|A\|}. $$
J

A.2 Numerical Solutions of Nonlinear Equations

Problem A.21. (Sample #1) Let $\{x^k\}$ be a sequence generated by Newton's method. Suppose that the initial guess $x_0$ is well chosen so that this sequence converges to the exact solution $x^*$. Prove that if $f(x^*) = f'(x^*) = \dots = f^{(m-1)}(x^*) = 0$ and $f^{(m)}(x^*) \ne 0$, then $x^k$ converges linearly to $x^*$ with
$$ \lim_{k \to \infty} \frac{e^{k+1}}{e^k} = \frac{m-1}{m}. $$

Solution. Newton's method reads
$$ x^{k+1} = x^k - \frac{f(x^k)}{f'(x^k)}. $$
Let $e^k = x^k - x^*$. Then
$$ e^{k+1} = x^{k+1} - x^* = x^k - \frac{f(x^k)}{f'(x^k)} - x^* = e^k - \frac{f(x^k)}{f'(x^k)}, $$
so
$$ \frac{e^{k+1}}{e^k} = 1 - \frac{f(x^k)}{e^k f'(x^k)}. $$
Since $x_0$ is well chosen so that the sequence converges to $x^*$, we may Taylor expand $f(x^k)$ and $f'(x^k)$ about $x^*$; using the vanishing derivatives,
$$ f(x^k) = f(x^*) + f'(x^*) e^k + \dots + \frac{f^{(m-1)}(x^*)}{(m-1)!} (e^k)^{m-1} + \frac{f^{(m)}(\xi^k)}{m!} (e^k)^m = \frac{f^{(m)}(\xi^k)}{m!} (e^k)^m, \quad \xi^k \in [x^*, x^k], $$
$$ f'(x^k) = f'(x^*) + f''(x^*) e^k + \dots + \frac{f^{(m-1)}(x^*)}{(m-2)!} (e^k)^{m-2} + \frac{f^{(m)}(\eta^k)}{(m-1)!} (e^k)^{m-1} = \frac{f^{(m)}(\eta^k)}{(m-1)!} (e^k)^{m-1}, \quad \eta^k \in [x^*, x^k]. $$
Hence,
$$ \lim_{k \to \infty} \frac{e^{k+1}}{e^k} = 1 - \lim_{k \to \infty} \frac{\frac{f^{(m)}(\xi^k)}{m!} (e^k)^m}{\frac{f^{(m)}(\eta^k)}{(m-1)!} (e^k)^m} = 1 - \frac{1}{m} = \frac{m-1}{m}, $$
since $\xi^k, \eta^k \to x^*$ as $k \to \infty$ and $f^{(m)}(x^*) \ne 0$. J

Problem A.22. (Sample #2) Let f : Ω ⊂ Rn → Rn be twice continuously differentiable. Suppose x∗ ∈ Ω is


a solution of f(x) = 0, and the Jacobian matrix of f, denoted Jf , is invertible at x∗ .
1. Prove that if x0 ∈ Ω is sufficiently close to x∗ , then the following iteration converges to x∗ :
$$ x^{k+1} = x^k - \left[ J_f(x_0) \right]^{-1} f(x^k). $$

2. Prove that the convergence is typically linear.

Solution. J

Problem A.23. (Sample #3) Let $a \in \mathbb{R}^n$ and $R > 0$ be given. Suppose that $f : B(a, R) \to \mathbb{R}^n$, with $f_i \in C^2(B(a, R))$ for each $i = 1, \dots, n$. Suppose that there is a point $\xi \in B(a, R)$ such that $f(\xi) = 0$, and that the Jacobian matrix $J_f(x)$ is invertible, with estimate $\|[J_f(x)]^{-1}\|_2 \le \beta$ for any $x \in B(a, R)$. Prove that the sequence $\{x^k\}$ defined by Newton's method,
$$ J_f(x^k) (x^{k+1} - x^k) = -f(x^k), $$
converges (at least) linearly to the root $\xi$ as $k \to \infty$, provided $x_0$ is sufficiently close to $\xi$.

Solution. J

Problem A.24. (Sample #6) Assume that $f : \mathbb{R} \to \mathbb{R}$, $f \in C^2(\mathbb{R})$, $f'(x) > 0$ for all $x \in \mathbb{R}$, and $f''(x) > 0$ for all $x \in \mathbb{R}$.
1. Suppose that a root $\xi \in \mathbb{R}$ exists. Prove that it is unique. Exhibit a function satisfying the assumptions above that has no root.
2. Prove that for any starting guess $x_0 \in \mathbb{R}$, Newton's method converges, and the convergence rate is quadratic.

Solution. 1. Suppose $x_1 \ne x_2$ are two different roots, so $f(x_1) = f(x_2) = 0$. By Rolle's theorem, there exists $\eta \in (x_1, x_2)$ such that $f'(\eta) = 0$, which contradicts $f'(x) > 0$. A function satisfying the assumptions that has no root: $f(x) = e^x$.
2. Let $x^*$ be the root of $f$. From the Taylor expansion, we know
$$ 0 = f(x^*) = f(x^k) + f'(x^k)(x^* - x^k) + \frac{1}{2} f''(\theta)(x^* - x^k)^2, $$
where $\theta$ is between $x^*$ and $x^k$. Define $e^k = x^* - x^k$; then
$$ 0 = f(x^k) + f'(x^k) e^k + \frac{1}{2} f''(\theta)(e^k)^2, $$
so
$$ \left[ f'(x^k) \right]^{-1} f(x^k) = -e^k - \frac{1}{2} \left[ f'(x^k) \right]^{-1} f''(\theta)(e^k)^2. $$
From Newton's scheme, we have
$$ x^{k+1} = x^k - \left[ f'(x^k) \right]^{-1} f(x^k), \qquad x^* = x^*. $$

So,
$$ e^{k+1} = e^k + \left[ f'(x^k) \right]^{-1} f(x^k) = -\frac{1}{2} \left[ f'(x^k) \right]^{-1} f''(\theta)(e^k)^2, $$
i.e.
$$ e^{k+1} = -\frac{f''(\theta)}{2 f'(x^k)} (e^k)^2. $$
In a neighborhood of $x^*$, which the iterates eventually enter, there are constants $C_1, C_2 > 0$ such that
$$ |f''(z)| \le C_1, \qquad |f'(z)| \ge C_2. $$
Therefore,
$$ |e^{k+1}| \le \frac{|f''(\theta)|}{2 |f'(x^k)|} |e^k|^2 \le \frac{C_1}{2 C_2} |e^k|^2, $$
which implies
$$ |x^{k+1} - x^*| \le C |x^k - x^*|^2, $$
i.e. the convergence rate is quadratic.

Problem A.25. (Sample #8) Consider the two-step Newton method
$$ y_k = x_k - \frac{f(x_k)}{f'(x_k)}, \qquad x_{k+1} = y_k - \frac{f(y_k)}{f'(x_k)} $$
for the solution of the equation $f(x) = 0$. Prove:
1. If the method converges, then
$$ \lim_{k \to \infty} \frac{x_{k+1} - x^*}{(y_k - x^*)(x_k - x^*)} = \frac{f''(x^*)}{f'(x^*)}, $$
where $x^*$ is the solution.
2. The convergence is cubic, that is,
$$ \lim_{k \to \infty} \frac{x_{k+1} - x^*}{(x_k - x^*)^3} = \frac{1}{2} \left( \frac{f''(x^*)}{f'(x^*)} \right)^2. $$
3. Would you say that this method is faster than Newton's method, given that its convergence is cubic?

Solution. 1. First, we show that the intermediate iterate $y_k$ stays near $x^*$ when $x_k$ does. By the Taylor expansion formula, we have
$$ 0 = f(x^*) = f(x_k) + f'(x_k)(x^* - x_k) + \frac{1}{2!} f''(\xi_k)(x^* - x_k)^2, $$
where $\xi_k$ is between $x^*$ and $x_k$. Therefore, we have
$$ f(x_k) = -f'(x_k)(x^* - x_k) - \frac{1}{2!} f''(\xi_k)(x^* - x_k)^2. $$
Plugging this into the first step of the method, we have
$$ y_k = x_k + (x^* - x_k) + \frac{1}{2!} \frac{f''(\xi_k)}{f'(x_k)} (x^* - x_k)^2, $$
so
$$ y_k - x^* = \frac{1}{2!} \frac{f''(\xi_k)}{f'(x_k)} (x^* - x_k)^2. \qquad (214) $$
Therefore,
$$ |y_k - x^*| = \frac{1}{2} \left| \frac{f''(\xi_k)}{f'(x_k)} (x^* - x_k) \right| \, |x^* - x_k|. $$
Since we can choose the initial value so close to $x^*$ that
$$ \left| \frac{f''(\xi_k)}{f'(x_k)} (x^* - x_k) \right| \le 1, $$
we then have
$$ |y_k - x^*| \le \frac{1}{2} |x^* - x_k|. $$
Hence we proved the result; that is to say, if $x_k \to x^*$, then $y_k, \xi_k \to x^*$.
2. Next, we analyze the second step of the method. From the second step, we have
$$ x_{k+1} - x^* = y_k - x^* - \frac{f(y_k)}{f'(x_k)} = \frac{1}{f'(x_k)} \left[ (y_k - x^*) f'(x_k) - f(y_k) \right] = \frac{1}{f'(x_k)} \left[ (y_k - x^*)\left( f'(x_k) - f'(x^*) \right) - f(y_k) + (y_k - x^*) f'(x^*) \right]. $$
By the mean value theorem, there exists $\eta_k$ between $x^*$ and $x_k$ such that
$$ f'(x_k) - f'(x^*) = f''(\eta_k)(x_k - x^*), $$
and by the Taylor expansion formula,
$$ f(y_k) = f(x^*) + f'(x^*)(y_k - x^*) + \frac{(y_k - x^*)^2}{2} f''(\gamma_k) = f'(x^*)(y_k - x^*) + \frac{(y_k - x^*)^2}{2} f''(\gamma_k), $$
where $\gamma_k$ is between $y_k$ and $x^*$. Plugging these two identities into the expression above, we get
$$ x_{k+1} - x^* = \frac{1}{f'(x_k)} \left[ f''(\eta_k)(x_k - x^*)(y_k - x^*) - \frac{(y_k - x^*)^2}{2} f''(\gamma_k) \right]. \qquad (215) $$

Taking absolute values and letting $A$ bound $|f''|/|f'|$ in the neighborhood, we have
$$ |x_{k+1} - x^*| \le A \, |x_k - x^*| \, |y_k - x^*| + \frac{A}{2} |y_k - x^*|^2. $$
Using $|y_k - x^*| \le \frac{1}{2} |x_k - x^*|$, and taking the neighborhood small enough that $A |x_k - x^*| \le 1$,
$$ |x_{k+1} - x^*| \le \frac{1}{2} |x_k - x^*| + \frac{1}{8} |x_k - x^*| = \frac{5}{8} |x_k - x^*|. $$
Hence we proved the result; that is to say, if $y_k \to x^*$, then $x_{k+1}, \eta_k, \gamma_k \to x^*$.
3. Finally, we prove that the convergence order is cubic. From (215), we get
$$ \frac{x_{k+1} - x^*}{(x_k - x^*)(y_k - x^*)} = \frac{f''(\eta_k)}{f'(x_k)} - \frac{(y_k - x^*) f''(\gamma_k)}{2 (x_k - x^*) f'(x_k)}. $$
By (214), the second term contains the factor $y_k - x^* = \frac{1}{2} \frac{f''(\xi_k)}{f'(x_k)} (x^* - x_k)^2$, so it tends to zero, and taking limits gives
$$ \lim_{k \to \infty} \frac{x_{k+1} - x^*}{(x_k - x^*)(y_k - x^*)} = \frac{f''(x^*)}{f'(x^*)}. $$
Using (214) again,
$$ \lim_{k \to \infty} \frac{y_k - x^*}{(x_k - x^*)^2} = \frac{f''(x^*)}{2 f'(x^*)}. $$
Multiplying the two limits,
$$ \lim_{k \to \infty} \frac{x_{k+1} - x^*}{(x_k - x^*)^3} = \frac{1}{2} \left( \frac{f''(x^*)}{f'(x^*)} \right)^2. $$
J
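The cubic rate can be observed numerically. Here is a minimal sketch on $f(x) = x^3 - 2$ (my own test function, with exact root $2^{1/3}$), where the asymptotic constant $\frac{1}{2}(f''(r)/f'(r))^2$ evaluates to $2/r^2$:

```python
# Two-step Newton on f(x) = x^3 - 2; the exact root is r = 2^(1/3).
# The asymptotic constant (1/2)(f''(r)/f'(r))^2 equals 2/r^2 here.
def f(x):  return x ** 3 - 2.0
def df(x): return 3.0 * x ** 2

r = 2.0 ** (1.0 / 3.0)
x = 1.5
errs = [abs(x - r)]
for _ in range(4):
    y = x - f(x) / df(x)
    x = y - f(y) / df(x)       # note: reuses f'(x_k), not f'(y_k)
    errs.append(abs(x - r))

C = 0.5 * (6.0 * r / (3.0 * r ** 2)) ** 2      # = 2 / r^2
assert errs[-1] < 1e-13                        # machine-precision root in 4 steps
assert abs(errs[2] / errs[1] ** 3 - C) < 0.3   # cubic rate: e_{k+1} ~ C e_k^3
print("root:", x)
```

Regarding part 3: each iteration costs two function evaluations and one derivative evaluation, so a fair comparison is per evaluation, not per iteration; two plain Newton steps (order $2^2 = 4$) cost roughly the same work, so cubic order alone does not make the method faster.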

A.3 Numerical Solutions of ODEs

Problem A.26. (Sample #1) Show that, if $z$ is a non-zero complex number that lies on the boundary of the linear stability domain of the two-step BDF method
$$ y_{n+2} - \frac{4}{3} y_{n+1} + \frac{1}{3} y_n = \frac{2}{3} h f(x_{n+2}, y_{n+2}), $$
then the real part of $z$ must be positive. Thus deduce that this method is A-stable.

Solution. For this problem the characteristic polynomials are
$$ \rho(w) = w^2 - \frac{4}{3} w + \frac{1}{3}, \qquad \sigma(w) = \frac{2}{3} w^2. $$
A point $z$ lies on the boundary of the linear stability domain exactly when the stability polynomial $\rho(w) - z\sigma(w)$ has a root of modulus one, i.e. when
$$ z = \frac{\rho(e^{i\theta})}{\sigma(e^{i\theta})} \quad \text{for some } \theta \in [0, 2\pi). $$
We compute
$$ z = \frac{e^{2i\theta} - \frac{4}{3} e^{i\theta} + \frac{1}{3}}{\frac{2}{3} e^{2i\theta}} = \frac{3}{2} - 2 e^{-i\theta} + \frac{1}{2} e^{-2i\theta}, $$
so
$$ \operatorname{Re} z = \frac{3}{2} - 2\cos\theta + \frac{1}{2}\cos 2\theta = \frac{3}{2} - 2\cos\theta + \cos^2\theta - \frac{1}{2} = (1 - \cos\theta)^2 \ge 0, $$
with equality only when $\cos\theta = 1$, i.e. $\theta = 0$, which gives $z = 0$. Hence every non-zero boundary point $z$ satisfies $\operatorname{Re} z > 0$. Consequently the boundary of the stability domain does not meet the open left half-plane; since the stability domain contains points of the negative real axis near the origin (the method is zero-stable and convergent), the entire left half-plane $\operatorname{Re} z < 0$ belongs to the stability domain, i.e. the method is A-stable.
J
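The boundary-locus identity $\operatorname{Re} z = (1 - \cos\theta)^2$ can be verified numerically from the polynomials $\rho(w) = w^2 - \frac{4}{3}w + \frac{1}{3}$ and $\sigma(w) = \frac{2}{3}w^2$ read off the scheme (a sanity-check sketch, sampling the unit circle at points of my choosing):

```python
import cmath
import math

# Boundary locus of the two-step BDF method: z(theta) = rho(e^{i theta}) / sigma(e^{i theta}),
# which should satisfy Re z = (1 - cos theta)^2 >= 0, positive away from theta = 0.
for k in range(1, 200):
    theta = 2 * math.pi * k / 200
    w = cmath.exp(1j * theta)
    z = (w ** 2 - (4.0 / 3.0) * w + 1.0 / 3.0) / ((2.0 / 3.0) * w ** 2)
    assert abs(z.real - (1 - math.cos(theta)) ** 2) < 1e-12
    assert z.real > 0   # nonzero boundary points have positive real part
print("Re z = (1 - cos theta)^2 verified on the unit circle")
```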

A.4 Numerical Solutions of PDEs


p
Problem A.27. (Sample #1) Let V be a Hilbert space with inner product (·, ·)V and norm kvkV = (v, v )V ,
∀v ∈ V . Suppose a : V × V → R is a symmetric bilinear form that is
• continuous:

a(u, v ) ≤ γ kukV kvkV , ∃γ > 0, ∀u, v ∈ V ,

• coercive:

α kuk2V ≤ a(u, v ) , ∃α > 0, ∀u ∈ V ,

Suppose L : V → R is linear and bound, i.e.



L(u ) ≤ λ kukV ,

for some λ > 0, ∀u ∈ V . Let u satisfies


a(u, v ) = L(v )
, for all v ∈ V .
1. Galerkin approximation: Suppose that Sh ⊂ V is finite dimensional. Prove that there exists a unique
uh ∈ V that satisfies
a(u, v ) = L(v )
, for all v ∈ Sh .
2. Prove that the Galerkin approximation is stable kuh k ≤ αλ .
3. Prove Ceá’s lemma:
γ
ku − uh kV ≤ inf ku − wkV .
α w∈Sh

Solution. 1. The bilinear form $a$ restricted to the (closed, finite-dimensional) subspace $S_h$ is still continuous and coercive, and $L$ restricted to $S_h$ is still bounded and linear, so existence and uniqueness of $u_h \in S_h$ follow from the Lax-Milgram theorem applied on $S_h$.
2. Let $u_h \in S_h$ be the Galerkin approximation. Taking $v = u_h$,
$$ \alpha \|u_h\|_V^2 \le a(u_h, u_h) = L(u_h) \le \lambda \|u_h\|_V. $$
So, we have
$$ \|u_h\|_V \le \frac{\lambda}{\alpha}. $$
3. For any $w \in S_h$: subtracting $a(u_h, v) = L(v)$ from $a(u, v) = L(v)$, valid for all $v \in S_h$, gives the so-called Galerkin orthogonality $a(u - u_h, v) = 0$ for all $v \in S_h$. Then, by coercivity, orthogonality (with $v = w - u_h \in S_h$), and continuity,
$$ \alpha \|u - u_h\|_V^2 \le a(u - u_h, u - u_h) = a(u - u_h, u - w) + a(u - u_h, w - u_h) = a(u - u_h, u - w) \le \gamma \|u - u_h\|_V \|u - w\|_V. $$
Therefore
$$ \|u - u_h\|_V \le \frac{\gamma}{\alpha} \|u - w\|_V \quad \text{for every } w \in S_h, $$
and hence
$$ \|u - u_h\|_V \le \frac{\gamma}{\alpha} \inf_{w \in S_h} \|u - w\|_V. $$
J
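To make the Galerkin framework concrete, here is a minimal P1 finite element sketch for $-u'' = 1$ on $(0,1)$ with homogeneous Dirichlet data (the model problem and mesh are my own illustrative choices; for this particular 1D problem the Galerkin solution happens to coincide with the exact solution $u(x) = x(1-x)/2$ at the nodes):

```python
# P1 Galerkin approximation of -u'' = 1, u(0) = u(1) = 0, on a uniform mesh.
# Stiffness matrix tridiag(-1/h, 2/h, -1/h), load vector h (f = 1).
N = 8                      # number of subintervals
h = 1.0 / N
m = N - 1                  # interior nodes
a, d = -1.0 / h, 2.0 / h   # off-diagonal / diagonal entries
rhs = [h] * m

# Thomas algorithm (tridiagonal solve)
cp, dp = [0.0] * m, [0.0] * m
cp[0], dp[0] = a / d, rhs[0] / d
for j in range(1, m):
    den = d - a * cp[j - 1]
    cp[j] = a / den
    dp[j] = (rhs[j] - a * dp[j - 1]) / den
u = [0.0] * m
u[-1] = dp[-1]
for j in range(m - 2, -1, -1):
    u[j] = dp[j] - cp[j] * u[j + 1]

# Compare with the exact solution u(x) = x(1-x)/2 at the nodes.
for j in range(m):
    x = (j + 1) * h
    assert abs(u[j] - x * (1 - x) / 2) < 1e-12
print("Galerkin solution is nodally exact for this model problem")
```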

Problem A.28. (Sample #3) Consider the Lax-Friedrichs scheme,
$$ u_j^{n+1} = \frac{1}{2} \left( u_{j-1}^n + u_{j+1}^n \right) - \frac{\mu}{2} \left( u_{j+1}^n - u_{j-1}^n \right), \qquad \mu = \frac{as}{h}, $$
for approximating solutions to the Cauchy problem for the advection equation
$$ \frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = 0, $$
where $a > 0$. Here $h > 0$ is the space step size, and $s > 0$ is the time step size.
1. Prove that, if $s = C_1 h$, where $C_1$ is a fixed positive constant, then the local truncation error satisfies the estimate
$$ |T_j^n| \le C_0 (s + h), $$
where $C_0 > 0$ is a constant independent of $s$ and $h$.
2. Use von Neumann analysis to show that the Lax-Friedrichs scheme is stable provided the CFL condition $0 < \mu = \frac{as}{h} \le 1$ holds. In other words, compute the amplification factor $g(\xi)$ and show that $|g(\xi)| \le 1$ for all values of $\xi$, provided $\mu \le 1$.

Solution. 1. The Lax-Friedrichs method for the above equation is
$$ \frac{u_j^{n+1} - \frac{1}{2}(u_{j+1}^n + u_{j-1}^n)}{\Delta t} + a \frac{u_{j+1}^n - u_{j-1}^n}{2 \Delta x} = 0, $$
or, solving for the unknown $u_j^{n+1}$,
$$ u_j^{n+1} = \frac{1}{2}(u_{j+1}^n + u_{j-1}^n) - \frac{a \Delta t}{2 \Delta x} (u_{j+1}^n - u_{j-1}^n), $$
which is the scheme above with $s = \Delta t$, $h = \Delta x$, $\mu = as/h$. Let $\bar{u}$ be the exact solution and $\bar{u}_j^n = \bar{u}(n\Delta t, j\Delta x)$. Then, from Taylor expansion,
$$ \bar{u}_j^{n+1} = \bar{u}_j^n + \Delta t \, \partial_t \bar{u}_j^n + \frac{(\Delta t)^2}{2} \partial_t^2 \bar{u}(\xi_1, j\Delta x), \quad t_n \le \xi_1 \le t_{n+1}, $$
$$ \bar{u}_{j-1}^n = \bar{u}_j^n - \Delta x \, \partial_x \bar{u}_j^n + \frac{(\Delta x)^2}{2} \partial_x^2 \bar{u}(n\Delta t, \xi_2), \quad x_{j-1} \le \xi_2 \le x_j, $$
$$ \bar{u}_{j+1}^n = \bar{u}_j^n + \Delta x \, \partial_x \bar{u}_j^n + \frac{(\Delta x)^2}{2} \partial_x^2 \bar{u}(n\Delta t, \xi_3), \quad x_j \le \xi_3 \le x_{j+1}. $$
Substituting these expansions, the leading terms reproduce $\partial_t \bar{u} + a \partial_x \bar{u} = 0$, and the truncation error of the scheme satisfies
$$ T_j^n = \frac{\bar{u}_j^{n+1} - \frac{1}{2}(\bar{u}_{j+1}^n + \bar{u}_{j-1}^n)}{\Delta t} + a \frac{\bar{u}_{j+1}^n - \bar{u}_{j-1}^n}{2 \Delta x} = O(\Delta t) + O\!\left( \frac{(\Delta x)^2}{\Delta t} \right). $$
If $s = C_1 h$, where $C_1$ is a fixed positive constant, then $(\Delta x)^2 / \Delta t = h / C_1 = O(h)$, so the local truncation error satisfies
$$ |T_j^n| \le C_0 (s + h). $$

2. By von Neumann analysis, insert the Fourier mode
$$ u_j^n = e^{ij\Delta x \xi}, \qquad u_j^{n+1} = g(\xi) u_j^n. $$
Then
$$ g(\xi) e^{ij\Delta x \xi} = \frac{1}{2} \left( e^{i(j-1)\Delta x \xi} + e^{i(j+1)\Delta x \xi} \right) - \frac{\mu}{2} \left( e^{i(j+1)\Delta x \xi} - e^{i(j-1)\Delta x \xi} \right), $$
so
$$ g(\xi) = \frac{1}{2} \left( e^{-i\Delta x \xi} + e^{i\Delta x \xi} \right) - \frac{\mu}{2} \left( e^{i\Delta x \xi} - e^{-i\Delta x \xi} \right) = \cos(\Delta x \xi) - i \mu \sin(\Delta x \xi). $$
From von Neumann analysis, the scheme is stable if $|g(\xi)| \le 1$ for all $\xi$, i.e.
$$ \cos^2(\Delta x \xi) + \mu^2 \sin^2(\Delta x \xi) \le 1 \ \Longleftrightarrow \ (\mu^2 - 1)\sin^2(\Delta x \xi) \le 0 \ \Longleftrightarrow \ \mu \le 1. $$
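The amplification factor just computed can be scanned numerically over the frequency range (a minimal sketch; the sample values of $\mu$ are my own choices):

```python
import math

# Lax-Friedrichs amplification factor: g = cos(xi) - i*mu*sin(xi), so
# |g|^2 = cos^2(xi) + mu^2 sin^2(xi). It stays <= 1 exactly when mu <= 1.
def max_amp(mu, samples=1000):
    return max(math.sqrt(math.cos(x) ** 2 + (mu * math.sin(x)) ** 2)
               for x in [math.pi * k / samples for k in range(samples + 1)])

assert max_amp(0.5) <= 1.0 + 1e-12   # CFL satisfied: stable
assert max_amp(1.0) <= 1.0 + 1e-12   # borderline case
assert max_amp(1.5) > 1.0            # CFL violated: amplified modes exist
print("CFL condition for Lax-Friedrichs confirmed")
```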

Problem A.29. (Sample #4) Consider the linear reaction-diffusion problem
$$ \begin{cases} \dfrac{\partial u}{\partial t} = \dfrac{\partial^2 u}{\partial x^2} - u & \text{for } 0 \le x \le 1, \ 0 \le t \le T, \\ u(0, t) = u(1, t) = 0 & \text{for } 0 \le t \le T, \\ u(x, 0) = g(x) & \text{for } 0 \le x \le 1. \end{cases} $$
The Crank-Nicolson scheme for this problem is written as
$$ u_j^{n+1} = u_j^n + \frac{\mu}{2} \left( u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j-1}^n - 2u_j^n + u_{j+1}^n \right) - \frac{s}{2} \left( u_j^{n+1} + u_j^n \right), $$
where $\mu = \frac{s}{h^2}$. Prove that the method is stable in the sense that
$$ \|u^{n+1}\|_\infty \le \|u^n\|_\infty $$
for all $n \ge 0$, if $0 < \mu + \frac{s}{2} \le 1$.

Solution. This problem is similar to Sample #14. The scheme can be rewritten as
$$ (1 + \mu) u_j^{n+1} = \frac{\mu}{2} u_{j-1}^{n+1} - \frac{s}{2} u_j^{n+1} + \frac{\mu}{2} u_{j+1}^{n+1} + \frac{\mu}{2} u_{j-1}^n + \left( 1 - \mu - \frac{s}{2} \right) u_j^n + \frac{\mu}{2} u_{j+1}^n. $$
If $0 < \mu + \frac{s}{2} \le 1$, then $1 - \mu - \frac{s}{2} \ge 0$, and taking absolute values,
$$ (1 + \mu) |u_j^{n+1}| \le \frac{\mu}{2} |u_{j-1}^{n+1}| + \frac{s}{2} |u_j^{n+1}| + \frac{\mu}{2} |u_{j+1}^{n+1}| + \frac{\mu}{2} |u_{j-1}^n| + \left( 1 - \mu - \frac{s}{2} \right) |u_j^n| + \frac{\mu}{2} |u_{j+1}^n|. $$
Bounding each term by the corresponding max norm and taking the maximum over $j$,
$$ (1 + \mu) \|u^{n+1}\|_\infty \le \left( \mu + \frac{s}{2} \right) \|u^{n+1}\|_\infty + \left( 1 - \frac{s}{2} \right) \|u^n\|_\infty, $$
i.e.
$$ \left( 1 - \frac{s}{2} \right) \|u^{n+1}\|_\infty \le \left( 1 - \frac{s}{2} \right) \|u^n\|_\infty. $$
Since $\frac{s}{2} < \mu + \frac{s}{2} \le 1$, the factor $1 - \frac{s}{2}$ is positive, hence
$$ \|u^{n+1}\|_\infty \le \|u^n\|_\infty. $$
Problem A.30. (Sample #8) 1D discrete Poincaré inequality: Let $\Omega = (0, 1)$ and $\Omega_h$ be a uniform grid of size $h$. If $Y \in U_h$ is a mesh function on $\Omega_h$ such that $Y(0) = 0$, then there is a constant $C$, independent of $Y$ and $h$, for which
$$ \|Y\|_{2,h} \le C \, \|\bar{\delta} Y\|_{2,h}. $$

Solution. I consider the uniform partition of the interval $(0, 1)$ with $N$ points, $x_1 = 0 < x_2 < \dots < x_{N-1} < x_N = 1$ (Figure A1: one-dimensional uniform partition).

Since the discrete 2-norm is defined as
$$ \|v\|_{2,h}^2 = h^d \sum_{i=1}^N |v_i|^2, $$
where $d$ is the dimension, here we have
$$ \|v\|_{2,h}^2 = h \sum_{i=1}^N |v_i|^2, \qquad \|\bar{\delta} v\|_{2,h}^2 = h \sum_{i=2}^N \left| \frac{v_{i-1} - v_i}{h} \right|^2. $$
Since $Y(0) = 0$, i.e. $Y_1 = 0$, for any $2 \le K \le N$ the sum telescopes:
$$ \sum_{i=2}^K (Y_{i-1} - Y_i) = Y_1 - Y_K = -Y_K. $$
Then, by the triangle inequality and the Cauchy-Schwarz inequality,
$$ |Y_K| \le \sum_{i=2}^K |Y_{i-1} - Y_i| = h \sum_{i=2}^K \left| \frac{Y_{i-1} - Y_i}{h} \right| \le h \, (K - 1)^{1/2} \left( \sum_{i=2}^K \left| \frac{Y_{i-1} - Y_i}{h} \right|^2 \right)^{1/2}. $$
Therefore
$$ |Y_K|^2 \le h^2 (K - 1) \sum_{i=2}^K \left| \frac{Y_{i-1} - Y_i}{h} \right|^2 \le h^2 (K - 1) \sum_{i=2}^N \left| \frac{Y_{i-1} - Y_i}{h} \right|^2. $$

Summing the bound $|Y_K|^2 \le h^2 (K - 1) \sum_{i=2}^N |(Y_{i-1} - Y_i)/h|^2$ over $K = 2, \dots, N$, and using $\sum_{K=2}^N (K - 1) = \frac{N(N-1)}{2}$, we get
$$ \sum_{i=2}^N |Y_i|^2 \le \frac{N(N-1)}{2} h^2 \sum_{i=2}^N \left| \frac{Y_{i-1} - Y_i}{h} \right|^2. $$
Since $Y_1 = 0$, the sum on the left may start at $i = 1$:
$$ \sum_{i=1}^N |Y_i|^2 \le \frac{N(N-1)}{2} h^2 \sum_{i=2}^N \left| \frac{Y_{i-1} - Y_i}{h} \right|^2. $$

Multiplying both sides by $h$ and using $h = \frac{1}{N-1}$, so that
$$ h^2 \, \frac{N(N-1)}{2} = \frac{N}{2(N-1)} = \frac{1}{2} + \frac{1}{2(N-1)}, $$
we obtain
$$ \|Y\|_{2,h}^2 = h \sum_{i=1}^N |Y_i|^2 \le \left( \frac{1}{2} + \frac{1}{2(N-1)} \right) h \sum_{i=2}^N \left| \frac{Y_{i-1} - Y_i}{h} \right|^2 = \left( \frac{1}{2} + \frac{1}{2(N-1)} \right) \|\bar{\delta} Y\|_{2,h}^2. $$
Since $N \ge 2$, $\frac{1}{2} + \frac{1}{2(N-1)} \le 1$, so
$$ \|Y\|_{2,h}^2 \le \|\bar{\delta} Y\|_{2,h}^2. $$
Hence,
$$ \|Y\|_{2,h} \le C \, \|\bar{\delta} Y\|_{2,h} \quad \text{with } C = 1. $$
J
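A minimal numerical check of the inequality (the grid size and the particular mesh function below, which vanishes at $x = 0$, are my own choices):

```python
import math

# Discrete Poincare inequality on a uniform grid of (0,1): for mesh functions
# with Y(0) = 0, ||Y||_{2,h} <= C ||dY||_{2,h}, with C = 1 from the proof above.
N = 50
h = 1.0 / (N - 1)
Y = [math.sin(2.5 * (i * h)) for i in range(N)]   # a mesh function with Y[0] = 0

lhs = math.sqrt(h * sum(y * y for y in Y))
rhs = math.sqrt(h * sum(((Y[i] - Y[i - 1]) / h) ** 2 for i in range(1, N)))
assert lhs <= rhs
print("||Y||_{2,h} =", lhs, "<= ||dY||_{2,h} =", rhs)
```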

Problem A.31. (Sample #12) Discrete maximum principle: Let $A = \mathrm{tridiag}\{a_i, b_i, c_i\}_{i=1}^n \in \mathbb{R}^{n \times n}$ be a tridiagonal matrix with the properties that
$$ b_i > 0, \qquad a_i, c_i \le 0, \qquad a_i + b_i + c_i = 0. $$
Prove the following maximum principle: if $u \in \mathbb{R}^n$ is such that $(Au)_i \le 0$ for $i = 2, \dots, n-1$, then $u_i \le \max\{u_1, u_n\}$.

Solution. Without loss of generality, assume the maximum of $u$ is attained at an interior index $k \in \{2, \dots, n-1\}$ (otherwise there is nothing to prove).
1. Case $(Au)_i < 0$ for $i = 2, \dots, n-1$: we argue by contradiction. We have
$$ a_k u_{k-1} + b_k u_k + c_k u_{k+1} < 0. $$
Since $a_k + c_k = -b_k$, and $a_k \le 0$, $c_k \le 0$, while $u_{k-1} - u_k \le 0$ and $u_{k+1} - u_k \le 0$,
$$ a_k u_{k-1} + b_k u_k + c_k u_{k+1} = a_k (u_{k-1} - u_k) + c_k (u_{k+1} - u_k) \ge 0, $$
which is a contradiction. Therefore, if $u \in \mathbb{R}^n$ is such that $(Au)_{i=2,\dots,n-1} < 0$, then $u_i \le \max\{u_1, u_n\}$.
2. Case $(Au)_i \le 0$ for $i = 2, \dots, n-1$: if $(Au)_k = 0$ at the interior maximum, then
$$ a_k (u_{k-1} - u_k) + c_k (u_{k+1} - u_k) = 0, $$
and since both terms are $\ge 0$, each must vanish; when $a_k$ and $c_k$ are strictly negative this gives $u_{k-1} = u_k = u_{k+1}$, that is to say, $u_{k-1}$ and $u_{k+1}$ are also maximum points. By using the same argument again, we get $u_{k-2} = u_{k-1} = u_k = u_{k+1} = u_{k+2}$. Repeating the process, we get
$$ u_1 = u_2 = \dots = u_{n-1} = u_n. $$
Therefore, if $u \in \mathbb{R}^n$ is such that $(Au)_{i=2,\dots,n-1} \le 0$, then $u_i \le \max\{u_1, u_n\}$.


J

Problem A.32. (Sample #14) Consider the Crank-Nicolson scheme
$$ u_j^{n+1} = u_j^n + \frac{\mu}{2} \left( u_{j-1}^{n+1} - 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j-1}^n - 2u_j^n + u_{j+1}^n \right) $$
for approximating the solution to the heat equation $\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$ on the intervals $0 \le x \le 1$ and $0 \le t \le t^*$, with the boundary conditions $u(0, t) = u(1, t) = 0$.
1. Show that the scheme may be written in the form $u^{n+1} = A u^n$, where $A \in \mathbb{R}^{m \times m}_{sym}$ (the space of $m \times m$ symmetric matrices) and
$$ \|A x\|_2 \le \|x\|_2 $$
for any $x \in \mathbb{R}^m$, regardless of the value of $\mu$.
2. Show that
$$ \|A x\|_\infty \le \|x\|_\infty $$
for any $x \in \mathbb{R}^m$, provided $\mu \le 1$. (In other words, the scheme may only be conditionally stable in the max norm.)

Solution. 1. The scheme can be rewritten as
$$ -\frac{\mu}{2} u_{j-1}^{n+1} + (1 + \mu) u_j^{n+1} - \frac{\mu}{2} u_{j+1}^{n+1} = \frac{\mu}{2} u_{j-1}^n + (1 - \mu) u_j^n + \frac{\mu}{2} u_{j+1}^n. $$
Using the boundary conditions, this is
$$ C u^{n+1} = B u^n, $$
where
$$ C = \mathrm{tridiag}\left( -\frac{\mu}{2}, \, 1 + \mu, \, -\frac{\mu}{2} \right), \qquad B = \mathrm{tridiag}\left( \frac{\mu}{2}, \, 1 - \mu, \, \frac{\mu}{2} \right) $$
are symmetric tridiagonal $m \times m$ matrices, and $u^n = (u_1^n, \dots, u_m^n)^T$. So the scheme may be written in the form $u^{n+1} = A u^n$ with $A = C^{-1} B$; since $C$ and $B$ are polynomials in the same symmetric tridiagonal matrix, they commute and $A$ is symmetric. By the Fourier (von Neumann) ansatz $u_j^n = e^{ij\Delta x \xi}$, $u_j^{n+1} = g(\xi) u_j^n$,
$$ g(\xi) \left( -\frac{\mu}{2} e^{-i\Delta x \xi} + (1 + \mu) - \frac{\mu}{2} e^{i\Delta x \xi} \right) = \frac{\mu}{2} e^{-i\Delta x \xi} + (1 - \mu) + \frac{\mu}{2} e^{i\Delta x \xi}, $$
i.e.
$$ g(\xi) \left( 1 + \mu - \mu \cos(\Delta x \xi) \right) = 1 - \mu + \mu \cos(\Delta x \xi). $$
Therefore,
$$ g(\xi) = \frac{1 - \mu + \mu \cos(\Delta x \xi)}{1 + \mu - \mu \cos(\Delta x \xi)} = \frac{1 + \frac{1}{2} z}{1 - \frac{1}{2} z}, \qquad z = \frac{2\Delta t}{\Delta x^2} \left( \cos(\Delta x \xi) - 1 \right) \le 0. $$
Since $z \le 0$, $|g(\xi)| \le 1$ for every $\xi$, regardless of the value of $\mu$, so $\rho(A) \le 1$. Since $A$ is symmetric, $\|A\|_2 = \rho(A)$, hence
$$ \|A x\|_2 \le \|A\|_2 \|x\|_2 = \rho(A) \|x\|_2 \le \|x\|_2. $$

2. The scheme

u_j^{n+1} = u_j^n + (µ/2)(u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j−1}^n − 2u_j^n + u_{j+1}^n)

can be rewritten as

(1 + µ)u_j^{n+1} = (µ/2)u_{j−1}^{n+1} + (µ/2)u_{j+1}^{n+1} + (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n.

Then, by the triangle inequality,

(1 + µ)|u_j^{n+1}| ≤ (µ/2)|u_{j−1}^{n+1}| + (µ/2)|u_{j+1}^{n+1}| + (µ/2)|u_{j−1}^n| + |1 − µ||u_j^n| + (µ/2)|u_{j+1}^n|.

Bounding each term by the corresponding max norm and taking the maximum over j on the left,

(1 + µ)‖u^{n+1}‖_∞ ≤ (µ/2)‖u^{n+1}‖_∞ + (µ/2)‖u^{n+1}‖_∞ + (µ/2)‖u^n‖_∞ + |1 − µ|‖u^n‖_∞ + (µ/2)‖u^n‖_∞.

If µ ≤ 1, then |1 − µ| = 1 − µ, so the right-hand side equals µ‖u^{n+1}‖_∞ + ‖u^n‖_∞, and therefore

‖u^{n+1}‖_∞ ≤ ‖u^n‖_∞,

i.e.

‖Au^n‖_∞ ≤ ‖u^n‖_∞. J
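Both norm bounds can be checked numerically. The following is a sketch in Python/NumPy (the matrix size m and the sampled values of µ are arbitrary choices, not part of the problem): it assembles the tridiagonal C and B above, forms A = C⁻¹B, and verifies that ‖A‖₂ ≤ 1 for every µ while ‖A‖∞ ≤ 1 is only asserted for µ ≤ 1.

```python
import numpy as np

def cn_matrices(m, mu):
    """Tridiagonal C (implicit side) and B (explicit side) of Crank-Nicolson."""
    off = (mu / 2) * np.ones(m - 1)
    C = (1 + mu) * np.eye(m) - np.diag(off, 1) - np.diag(off, -1)
    B = (1 - mu) * np.eye(m) + np.diag(off, 1) + np.diag(off, -1)
    return C, B

m = 50
for mu in (0.5, 1.0, 5.0, 50.0):
    C, B = cn_matrices(m, mu)
    A = np.linalg.solve(C, B)                 # A = C^{-1} B
    norm2 = np.linalg.norm(A, 2)
    norm_inf = np.linalg.norm(A, np.inf)
    assert norm2 <= 1 + 1e-12                 # unconditional in the 2-norm
    if mu <= 1:
        assert norm_inf <= 1 + 1e-12          # conditional in the max norm
```

For large µ the max-norm bound may fail even though the 2-norm bound still holds, which is exactly the point of part 2.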

Problem A.33. (Sample #15) Consider the Lax-Wendroff scheme

u_j^{n+1} = u_j^n + (a²(∆t)²/(2(∆x)²))(u_{j−1}^n − 2u_j^n + u_{j+1}^n) − (a∆t/(2∆x))(u_{j+1}^n − u_{j−1}^n),

for approximating the solution of the Cauchy problem for the advection equation

∂u/∂t + a ∂u/∂x = 0,   a > 0.

Use Von Neumann's method to show that the Lax-Wendroff scheme is stable provided the CFL condition

a∆t/∆x ≤ 1

is enforced.
is enforced.

Solution. By using the Fourier transform, i.e.

u_j^{n+1} = g(ξ)u_j^n,   u_j^n = e^{ij∆xξ},

we have

g(ξ)u_j^n = u_j^n + (a²(∆t)²/(2(∆x)²))(u_{j−1}^n − 2u_j^n + u_{j+1}^n) − (a∆t/(2∆x))(u_{j+1}^n − u_{j−1}^n).


And then

g(ξ)e^{ij∆xξ} = e^{ij∆xξ} + (a²(∆t)²/(2(∆x)²))(e^{i(j−1)∆xξ} − 2e^{ij∆xξ} + e^{i(j+1)∆xξ}) − (a∆t/(2∆x))(e^{i(j+1)∆xξ} − e^{i(j−1)∆xξ}).

Therefore

g(ξ) = 1 + (a²(∆t)²/(2(∆x)²))(e^{−i∆xξ} − 2 + e^{i∆xξ}) − (a∆t/(2∆x))(e^{i∆xξ} − e^{−i∆xξ})
     = 1 + (a²(∆t)²/(2(∆x)²))(2cos(∆xξ) − 2) − (a∆t/(2∆x))(2i sin(∆xξ))
     = 1 + (a²(∆t)²/(∆x)²)(cos(∆xξ) − 1) − (a∆t/∆x)(i sin(∆xξ)).

Let µ = a∆t/∆x; then

g(ξ) = 1 + µ²(cos(∆xξ) − 1) − iµ sin(∆xξ).

The scheme is stable if |g(ξ)| ≤ 1 for all ξ, i.e.

(1 + µ²(cos(∆xξ) − 1))² + (µ sin(∆xξ))² ≤ 1,

i.e.

1 + 2µ²(cos(∆xξ) − 1) + µ⁴(cos(∆xξ) − 1)² + µ² sin²(∆xξ) ≤ 1,

i.e.

µ²(sin²(∆xξ) + 2cos(∆xξ) − 2) + µ⁴(cos(∆xξ) − 1)² ≤ 0,

i.e.

µ²((1 − cos²(∆xξ)) + 2cos(∆xξ) − 2) + µ⁴(cos(∆xξ) − 1)² ≤ 0,

i.e.

µ²(µ²(cos(∆xξ) − 1)² − (cos(∆xξ) − 1)²) ≤ 0,
µ²(µ² − 1)(cos(∆xξ) − 1)² ≤ 0,

which holds for every ξ exactly when µ ≤ 1. Since each step above is an equivalence, this proves the result. J
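The computation above can be confirmed numerically. The following Python sketch (the grid of frequencies θ = ∆xξ is an arbitrary sample) evaluates the amplification factor and checks that |g| ≤ 1 exactly when the CFL number µ ≤ 1:

```python
import numpy as np

def g_lw(mu, theta):
    """Lax-Wendroff amplification factor; theta stands for dx*xi, mu = a*dt/dx."""
    return 1 + mu**2 * (np.cos(theta) - 1) - 1j * mu * np.sin(theta)

theta = np.linspace(0.0, 2 * np.pi, 2001)
for mu in (0.3, 0.7, 1.0):
    assert np.max(np.abs(g_lw(mu, theta))) <= 1 + 1e-12   # stable when mu <= 1
assert np.max(np.abs(g_lw(1.2, theta))) > 1               # unstable when mu > 1
```

At µ = 1 one has |g(ξ)| = 1 for every ξ, consistent with the identity |g|² = 1 + µ²(µ² − 1)(cos(∆xξ) − 1)² derived above.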

Problem A.34. (Sample #16) Consider the Crank-Nicolson scheme applied to the diffusion equation

∂u/∂t = ∂²u/∂x²,

where t > 0, −∞ < x < ∞.

1. Show that the amplification factor in the Von Neumann analysis of the scheme is

g(ξ) = (1 + z/2)/(1 − z/2),   z = 2(∆t/∆x²)(cos(∆xξ) − 1).

2. Use the results of part 1 to show that the scheme is stable.


Solution. 1. The Crank-Nicolson scheme for the diffusion equation is

(u_j^{n+1} − u_j^n)/∆t = (1/2)((u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1})/∆x² + (u_{j−1}^n − 2u_j^n + u_{j+1}^n)/∆x²).

Let µ = ∆t/∆x²; then the scheme can be rewritten as

u_j^{n+1} = u_j^n + (µ/2)(u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j−1}^n − 2u_j^n + u_{j+1}^n),

i.e.

−(µ/2)u_{j−1}^{n+1} + (1 + µ)u_j^{n+1} − (µ/2)u_{j+1}^{n+1} = (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n.

By using the Fourier transform, i.e.

u_j^{n+1} = g(ξ)u_j^n,   u_j^n = e^{ij∆xξ},

we have

−(µ/2)g(ξ)u_{j−1}^n + (1 + µ)g(ξ)u_j^n − (µ/2)g(ξ)u_{j+1}^n = (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n,

i.e.

g(ξ)(−(µ/2)e^{−i∆xξ} + (1 + µ) − (µ/2)e^{i∆xξ})e^{ij∆xξ} = ((µ/2)e^{−i∆xξ} + (1 − µ) + (µ/2)e^{i∆xξ})e^{ij∆xξ},

i.e.

g(ξ)(1 + µ − µ cos(∆xξ)) = 1 − µ + µ cos(∆xξ).

Therefore,

g(ξ) = (1 − µ + µ cos(∆xξ))/(1 + µ − µ cos(∆xξ)),

hence

g(ξ) = (1 + z/2)/(1 − z/2),   z = 2(∆t/∆x²)(cos(∆xξ) − 1).
2. Since z = 2(∆t/∆x²)(cos(∆xξ) − 1), we have z ≤ 0, so

1 + z/2 ≤ 1 − z/2,

and therefore g(ξ) ≤ 1. Also, from −1 < 1 we get

z/2 − 1 < z/2 + 1,

i.e. −(1 − z/2) < 1 + z/2, and dividing by 1 − z/2 > 0,

g(ξ) = (1 + z/2)/(1 − z/2) > −1.

Hence |g(ξ)| ≤ 1, so the scheme is (unconditionally) stable.
J
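The bound |g(ξ)| ≤ 1 for every µ can also be checked directly from the formula. A short Python sketch (θ = ∆xξ is sampled over a full period, and the values of µ = ∆t/∆x² are arbitrary):

```python
import numpy as np

theta = np.linspace(0.0, 2 * np.pi, 2001)     # theta stands for dx*xi
for mu in (0.1, 1.0, 10.0, 1000.0):           # mu = dt/dx^2, arbitrary samples
    z = 2 * mu * (np.cos(theta) - 1)          # z <= 0 for every frequency
    g = (1 + z / 2) / (1 - z / 2)
    assert np.all(np.abs(g) <= 1 + 1e-12)     # |g| <= 1: unconditional stability
```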


Problem A.35. (Sample #17) Consider the explicit scheme

u_j^{n+1} = u_j^n + µ(u_{j−1}^n − 2u_j^n + u_{j+1}^n) − (bµ∆x/2)(u_{j+1}^n − u_{j−1}^n),   0 ≤ n ≤ N, 1 ≤ j ≤ L,

for the convection-diffusion problem

∂u/∂t = ∂²u/∂x² − b ∂u/∂x   for 0 ≤ x ≤ 1, 0 ≤ t ≤ t*,
u(0, t) = u(1, t) = 0   for 0 ≤ t ≤ t*,
u(x, 0) = g(x)   for 0 ≤ x ≤ 1,

where b > 0, µ = ∆t/(∆x)², ∆x = 1/(L + 1), and ∆t = t*/N. Prove that, under suitable restrictions on µ and ∆x, the error grid function e^n satisfies the estimate

‖e^n‖_∞ ≤ t*C(∆t + ∆x²),

for all n such that n∆t ≤ t*, where C > 0 is a constant.

Solution. Let ū be the exact solution and ū_j^n = ū(n∆t, j∆x). Then from Taylor expansion, we have

ū_j^{n+1} = ū_j^n + ∆t ∂ū_j^n/∂t + ((∆t)²/2) ∂²ū(ξ₁, j∆x)/∂t²,   t_n ≤ ξ₁ ≤ t_{n+1},
ū_{j−1}^n = ū_j^n − ∆x ∂ū_j^n/∂x + ((∆x)²/2) ∂²ū_j^n/∂x² − ((∆x)³/6) ∂³ū_j^n/∂x³ + ((∆x)⁴/24) ∂⁴ū(n∆t, ξ₂)/∂x⁴,   x_{j−1} ≤ ξ₂ ≤ x_j,
ū_{j+1}^n = ū_j^n + ∆x ∂ū_j^n/∂x + ((∆x)²/2) ∂²ū_j^n/∂x² + ((∆x)³/6) ∂³ū_j^n/∂x³ + ((∆x)⁴/24) ∂⁴ū(n∆t, ξ₃)/∂x⁴,   x_j ≤ ξ₃ ≤ x_{j+1}.

Then the truncation error T of this scheme is

T = (ū_j^{n+1} − ū_j^n)/∆t − (ū_{j−1}^n − 2ū_j^n + ū_{j+1}^n)/∆x² + b(ū_{j+1}^n − ū_{j−1}^n)/(2∆x) = O(∆t + (∆x)²).
Therefore

e_j^{n+1} = e_j^n + µ(e_{j−1}^n − 2e_j^n + e_{j+1}^n) − (bµ∆x/2)(e_{j+1}^n − e_{j−1}^n) + c∆t(∆t + (∆x)²),

i.e.

e_j^{n+1} = (µ + bµ∆x/2)e_{j−1}^n + (1 − 2µ)e_j^n + (µ − bµ∆x/2)e_{j+1}^n + c∆t(∆t + (∆x)²).

Then

|e_j^{n+1}| ≤ |µ + bµ∆x/2||e_{j−1}^n| + |1 − 2µ||e_j^n| + |µ − bµ∆x/2||e_{j+1}^n| + c∆t(∆t + (∆x)²),

and therefore, taking the maximum over j,

‖e^{n+1}‖_∞ ≤ |µ + bµ∆x/2|‖e^n‖_∞ + |1 − 2µ|‖e^n‖_∞ + |µ − bµ∆x/2|‖e^n‖_∞ + c∆t(∆t + (∆x)²).


If 1 − 2µ ≥ 0 and µ − bµ∆x/2 ≥ 0, i.e. µ ≤ 1/2 and 1 − b∆x/2 ≥ 0, then all three coefficients are nonnegative and sum to 1, so

‖e^{n+1}‖_∞ ≤ (µ + bµ∆x/2)‖e^n‖_∞ + (1 − 2µ)‖e^n‖_∞ + (µ − bµ∆x/2)‖e^n‖_∞ + c∆t(∆t + (∆x)²)
           = ‖e^n‖_∞ + c∆t(∆t + (∆x)²).

Then

‖e^n‖_∞ ≤ ‖e^{n−1}‖_∞ + c∆t(∆t + (∆x)²)
        ≤ ‖e^{n−2}‖_∞ + 2c∆t(∆t + (∆x)²)
        ≤ ···
        ≤ ‖e^0‖_∞ + cn∆t(∆t + (∆x)²)
        = cn∆t(∆t + (∆x)²) ≤ ct*(∆t + (∆x)²),

since e^0 = 0 and n∆t ≤ t*. J
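Under the restrictions µ ≤ 1/2 and b∆x ≤ 2 used above, the scheme's update is a convex combination of neighboring values, so the max norm of the numerical solution cannot grow. A small Python sketch (the values of L, b, µ and the initial function are arbitrary choices) checks this step by step:

```python
import numpy as np

L, b = 49, 1.0
dx = 1.0 / (L + 1)
mu = 0.4                          # satisfies mu <= 1/2; also b*dx/2 < 1
x = np.linspace(dx, 1 - dx, L)
u = np.sin(np.pi * x)             # an arbitrary initial grid function g(x)

for n in range(200):
    up = np.r_[u[1:], 0.0]        # u_{j+1}, with the zero Dirichlet boundary
    um = np.r_[0.0, u[:-1]]       # u_{j-1}
    u_new = u + mu * (um - 2 * u + up) - (b * mu * dx / 2) * (up - um)
    assert np.max(np.abs(u_new)) <= np.max(np.abs(u)) + 1e-14
    u = u_new
```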

A.5 Supplemental Problems


B Numerical Mathematics Preliminary Examination


B.1 Numerical Mathematics Preliminary Examination Jan. 2011

Problem B.1. (Prelim Jan. 2011#1) Consider a linear system Ax = b with A ∈ R^{n×n}. Richardson's method is an iterative method

Mx^{k+1} = Nx^k + b

with M = (1/w)I, N = M − A = (1/w)I − A, where w is a damping factor chosen to make M approximate A as well as possible. Suppose A is positive definite and w > 0. Let λ_1 and λ_n denote the smallest and largest eigenvalues of A.

1. Prove that Richardson's method converges if and only if w < 2/λ_n.

2. Prove that the optimal value of w is w_0 = 2/(λ_1 + λ_n).

Solution. 1. Since M = (1/w)I and N = M − A = (1/w)I − A, we have

x^{k+1} = (I − wA)x^k + wb.

So T_R = I − wA. From the necessary and sufficient condition for convergence, we must have ρ(T_R) < 1. Since the λ_i are the eigenvalues of A, the eigenvalues of T_R are 1 − λ_i w. Hence Richardson's method converges if and only if |1 − λ_i w| < 1 for all i, i.e.

−1 < 1 − λ_n w ≤ ··· ≤ 1 − λ_1 w < 1.

Since w > 0 and λ_1 > 0, the right inequality holds automatically, so the condition is w < 2/λ_n.

2. ρ(T_R) = max{|1 − λ_1 w|, |1 − λ_n w|} attains its minimum where |1 − λ_n w| = |1 − λ_1 w| (Figure B2), i.e.

λ_n w − 1 = 1 − λ_1 w,

i.e.

w_0 = 2/(λ_1 + λ_n).
J

Figure B2: The curve of ρ(T_R) = max{|1 − λ_1 w|, |1 − λ_n w|} as a function of w; the two branches cross at w = w_opt, which lies between 1/λ_n and 1/λ_1.
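Both claims are easy to confirm numerically. The Python sketch below builds a random SPD matrix (an arbitrary test case, not part of the problem) and checks the convergence threshold 2/λ_n and the location of the optimal damping factor:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                 # symmetric positive definite
lam = np.linalg.eigvalsh(A)                 # eigenvalues in increasing order
lam1, lamn = lam[0], lam[-1]

def rate(w):
    """Spectral radius of the iteration matrix T_R = I - w*A."""
    return np.max(np.abs(1 - w * lam))

assert rate(0.9 * 2 / lamn) < 1             # converges for w < 2/lambda_n
assert rate(1.1 * 2 / lamn) > 1             # diverges once w passes 2/lambda_n
w0 = 2 / (lam1 + lamn)
ws = np.linspace(0.01 * w0, 1.9 * w0, 500)
w_best = ws[np.argmin([rate(w) for w in ws])]
assert abs(w_best - w0) < 0.01 * w0         # the minimizer sits at w0
```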


Problem B.2. (Prelim Jan. 2011#2) Let A ∈ C^{m×n} and b ∈ C^m. Prove that the vector x ∈ C^n is a least squares solution of Ax = b if and only if r ⊥ range(A), where r = b − Ax.

Solution. We already know that x ∈ C^n is a least squares solution of Ax = b if and only if the normal equations hold:

A*Ax = A*b.

Now, for any y ∈ C^n,

(r, Ay) = (Ay)*r = y*A*(b − Ax) = y*(A*b − A*Ax),

so A*Ax = A*b holds if and only if (r, Ay) = 0 for every y ∈ C^n, i.e. if and only if r ⊥ range(A). J
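Numerically, the equivalence with the normal equations means the residual of any computed least squares solution should annihilate A*. A Python sketch (random complex data of arbitrary size):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))
b = rng.standard_normal(8) + 1j * rng.standard_normal(8)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
r = b - A @ x
# r is orthogonal to range(A)  <=>  A* r = 0 (the normal equations)
assert np.linalg.norm(A.conj().T @ r) < 1e-10
```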

Problem B.3. (Prelim Jan. 2011#3) Suppose A, B ∈ R^{n×n}, A is non-singular and B is singular. Prove that

1/κ(A) ≤ ‖A − B‖/‖A‖,

where κ(A) = ‖A‖·‖A^{−1}‖ and ‖·‖ is an induced matrix norm.

Solution. Since B is singular, there exists a vector x ≠ 0 such that Bx = 0. Then

x = x − A^{−1}Bx = A^{−1}(A − B)x,

so

‖x‖ = ‖A^{−1}(A − B)x‖ ≤ ‖A^{−1}‖‖A − B‖‖x‖.

Since x ≠ 0, dividing by ‖x‖ gives

1 ≤ ‖A^{−1}‖‖A − B‖,

i.e.

1/(‖A^{−1}‖‖A‖) ≤ ‖A − B‖/‖A‖,

i.e.

1/κ(A) ≤ ‖A − B‖/‖A‖.
J
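The inequality can be illustrated numerically by perturbing a nonsingular matrix into a singular one. The construction below, in Python, is an arbitrary example: the last column of B is replaced by a linear combination of the others, which makes B singular.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)    # generically nonsingular
B = A.copy()
B[:, -1] = B[:, :-1] @ rng.standard_normal(4)      # last column dependent
assert abs(np.linalg.det(B)) < 1e-8                # B is (numerically) singular
kappa = np.linalg.cond(A, 2)
assert 1 / kappa <= np.linalg.norm(A - B, 2) / np.linalg.norm(A, 2) + 1e-15
```

In other words, the relative distance from A to the nearest singular matrix is at least 1/κ(A) (and in the 2-norm it equals 1/κ(A)).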

Problem B.4. (Prelim Jan. 2011#4) Let f : Ω ⊂ R^n → R^n be twice continuously differentiable. Suppose x* ∈ Ω is a solution of f(x) = 0, and the Jacobian matrix of f, denoted J_f, is invertible at x*.

1. Prove that if x⁰ ∈ Ω is sufficiently close to x*, then the following iteration converges to x*:

x^{k+1} = x^k − J_f(x⁰)^{−1} f(x^k).

2. Prove that the convergence is typically only linear.


Solution. Let x* be the root of f, i.e. f(x*) = 0. From the modified Newton (chord) scheme, we have

x^{k+1} = x^k − [J(x⁰)]^{−1} f(x^k).

Therefore, using f(x*) = 0 and the mean value theorem, f(x^k) − f(x*) = J(ξ)(x^k − x*) for some ξ on the segment between x^k and x*, we have

x* − x^{k+1} = x* − x^k + [J(x⁰)]^{−1}(f(x^k) − f(x*))
             = (I − [J(x⁰)]^{−1}J(ξ))(x* − x^k),

therefore

‖x* − x^{k+1}‖ ≤ ‖I − [J(x⁰)]^{−1}J(ξ)‖ ‖x* − x^k‖.

From the theorem:

Theorem B.1. Suppose J : R^n → R^{n×n} is a continuous matrix-valued function. If J(x*) is nonsingular, then there exists δ > 0 such that, for all x ∈ R^n with ‖x − x*‖ < δ, J(x) is nonsingular and

‖J(x)^{−1}‖ < 2‖J(x*)^{−1}‖.

together with the continuity of J at x*, by shrinking δ we can ensure that ‖I − [J(x⁰)]^{−1}J(ξ)‖ ≤ 1/2 whenever x⁰ and x^k lie in the δ-ball around x*, and we get

‖x* − x^{k+1}‖ ≤ (1/2)‖x* − x^k‖.

So the iteration converges when x⁰ is sufficiently close to x*. The contraction factor does not improve as x^k → x* (the Jacobian stays frozen at x⁰), which also shows the convergence is typically only linear.
J
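The linear (not quadratic) convergence is easy to observe numerically. The sketch below is a hypothetical scalar example with f(x) = x² − 2, so that J reduces to f′: the derivative is frozen at x⁰ and the error ratios settle at the constant |1 − f′(x*)/f′(x⁰)|, the hallmark of linear convergence.

```python
import numpy as np

f = lambda x: x * x - 2.0          # hypothetical test problem, root sqrt(2)
x_star = np.sqrt(2.0)
x0 = 2.0
fp0 = 2.0 * x0                     # derivative frozen at x0 (chord iteration)

x, errs = x0, []
for _ in range(20):
    x = x - f(x) / fp0
    errs.append(abs(x - x_star))
ratios = [errs[k + 1] / errs[k] for k in range(10, 16)]
# the ratio settles at |1 - f'(x*)/f'(x0)| = 1 - sqrt(2)/2: linear convergence
assert all(0 < r < 1 for r in ratios)
assert abs(ratios[-1] - (1 - np.sqrt(2) / 2)) < 1e-3
```

By contrast, full Newton (re-evaluating the derivative at each iterate) would drive these ratios to zero quadratically.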

Problem B.5. (Prelim Jan. 2011#5) Consider

y′(t) = f(t, y(t)),   t ≥ t₀,   y(t₀) = y₀,

where f : [t₀, t*] × R → R is continuous in its first variable and Lipschitz continuous in its second variable. Prove that Euler's method converges.

Solution. Euler's scheme is as follows:

y_{n+1} = y_n + hf(t_n, y_n),   n = 0, 1, 2, ···. (218)

By the Taylor expansion,

y(t_{n+1}) = y(t_n) + hy′(t_n) + O(h²).

So,

y(t_{n+1}) − y(t_n) − hf(t_n, y(t_n)) = y(t_n) + hy′(t_n) + O(h²) − y(t_n) − hf(t_n, y(t_n))
                                     = y(t_n) + hy′(t_n) + O(h²) − y(t_n) − hy′(t_n)   (219)
                                     = O(h²).

Therefore, the forward Euler method is of order 1.

From (219), we get

y(t_{n+1}) = y(t_n) + hf(t_n, y(t_n)) + O(h²). (220)

Subtracting (220) from (218), and writing e_n = y_n − y(t_n), we get

e_{n+1} = e_n + h[f(t_n, y_n) − f(t_n, y(t_n))] + ch².

Since f is Lipschitz continuous with respect to the second variable,

|f(t_n, y_n) − f(t_n, y(t_n))| ≤ λ|y_n − y(t_n)|,   λ > 0.

Therefore,

‖e_{n+1}‖ ≤ ‖e_n‖ + hλ‖e_n‖ + ch² = (1 + hλ)‖e_n‖ + ch².

Claim [2]:

‖e_n‖ ≤ (c/λ)h[(1 + hλ)^n − 1],   n = 0, 1, ···

Proof of the claim: by induction on n.

1. When n = 0, e_0 = 0, hence ‖e_0‖ ≤ (c/λ)h[(1 + hλ)⁰ − 1] = 0.

2. Induction hypothesis:

‖e_n‖ ≤ (c/λ)h[(1 + hλ)^n − 1].

3. Induction step:

‖e_{n+1}‖ ≤ (1 + hλ)‖e_n‖ + ch²
          ≤ (1 + hλ)(c/λ)h[(1 + hλ)^n − 1] + ch²
          = (c/λ)h[(1 + hλ)^{n+1} − 1].

From the claim, since (1 + hλ)^n ≤ e^{nhλ} ≤ e^{λ(t*−t₀)} is bounded for nh ≤ t* − t₀, we get ‖e_n‖ ≤ Ch → 0 as h → 0. Therefore the forward Euler method is convergent.
J
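The first-order convergence can be observed numerically. The Python sketch below uses the (arbitrary) test problem y′ = −2y + sin t, y(0) = 1, whose exact solution is y(t) = (2 sin t − cos t)/5 + 1.2e^{−2t}, and checks that halving h roughly halves the error:

```python
import numpy as np

f = lambda t, y: -2.0 * y + np.sin(t)       # Lipschitz in y (lambda = 2)

def euler(h, t_end, y0):
    y, t = y0, 0.0
    while t < t_end - 1e-12:
        y += h * f(t, y)
        t += h
    return y

exact = lambda t: (2 * np.sin(t) - np.cos(t)) / 5 + 1.2 * np.exp(-2 * t)
errs = [abs(euler(h, 1.0, 1.0) - exact(1.0)) for h in (0.1, 0.05, 0.025)]
assert errs[1] < errs[0] and errs[2] < errs[1]   # error decreases with h
rate = np.log2(errs[0] / errs[1])
assert 0.8 < rate < 1.2                          # first-order convergence
```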

Problem B.6. (Prelim Jan. 2011#6) Consider the scheme

y_{n+2} + y_{n+1} − 2y_n = h(f(t_{n+2}, y_{n+2}) + f(t_{n+1}, y_{n+1}) + f(t_n, y_n))

for approximating the solution to

y′(t) = f(t, y(t)),   t ≥ t₀,   y(t₀) = y₀.

What is the order of the scheme? Is it a convergent scheme? Is it A-stable? Justify your answers.

Solution. For this problem,

ρ(w) := Σ_{m=0}^s a_m w^m = −2 + w + w²   and   σ(w) := Σ_{m=0}^s b_m w^m = 1 + w + w². (221)

By making the substitution ξ = w − 1, i.e. w = ξ + 1,

ρ(w) = ξ² + 3ξ   and   σ(w) = ξ² + 3ξ + 3. (222)

So, using ln(w) = ln(1 + ξ) = ξ − ξ²/2 + ξ³/3 − ···,

ρ(w) − σ(w)ln(w) = ξ² + 3ξ − (3 + 3ξ + ξ²)(ξ − ξ²/2 + ξ³/3 − ···)
                 = ξ² + 3ξ − (3ξ + (3/2)ξ² + (1/2)ξ³ + O(ξ⁴))
                 = −(1/2)ξ² + O(ξ³).

Therefore, by the order theorem (the method is of order p if and only if ρ(w) − σ(w)ln(w) = O(ξ^{p+1})), and since here

ρ(w) − σ(w)ln(w) = −(1/2)ξ² + O(ξ³),

this scheme is of order 1. Since

ρ(w) = Σ_{m=0}^s a_m w^m = −2 + w + w² = (w + 2)(w − 1), (223)

the roots are w = 1 and w = −2, and |−2| > 1, so the root condition is not satisfied. Therefore this scheme is not zero-stable and hence not convergent. It is therefore also not A-stable. J
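The root-condition check can be confirmed with a few lines of Python:

```python
import numpy as np

# characteristic polynomial rho(w) = w^2 + w - 2 of the two-step scheme
roots = np.roots([1.0, 1.0, -2.0])
assert set(np.round(roots, 12)) == {1.0, -2.0}   # roots w = 1 and w = -2
# the root condition requires |w| <= 1 for every root (simple if |w| = 1)
assert max(abs(roots)) > 1                       # violated: not zero-stable
```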

Problem B.7. (Prelim Jan. 2011#7) Consider the Crank-Nicolson scheme applied to the diffusion equation

∂u/∂t = ∂²u/∂x²,

where t > 0, −∞ < x < ∞.

1. Show that the amplification factor in the Von Neumann analysis of the scheme is

g(ξ) = (1 + z/2)/(1 − z/2),   z = 2(∆t/∆x²)(cos(∆xξ) − 1).

2. Use the results of part 1 to show that the scheme is stable.

Solution. 1. The Crank-Nicolson scheme for the diffusion equation is

(u_j^{n+1} − u_j^n)/∆t = (1/2)((u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1})/∆x² + (u_{j−1}^n − 2u_j^n + u_{j+1}^n)/∆x²).

Let µ = ∆t/∆x²; then the scheme can be rewritten as

u_j^{n+1} = u_j^n + (µ/2)(u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j−1}^n − 2u_j^n + u_{j+1}^n),

i.e.

−(µ/2)u_{j−1}^{n+1} + (1 + µ)u_j^{n+1} − (µ/2)u_{j+1}^{n+1} = (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n.

By using the Fourier transform, i.e.

u_j^{n+1} = g(ξ)u_j^n,   u_j^n = e^{ij∆xξ},

we have

−(µ/2)g(ξ)u_{j−1}^n + (1 + µ)g(ξ)u_j^n − (µ/2)g(ξ)u_{j+1}^n = (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n,

i.e.

g(ξ)(−(µ/2)e^{−i∆xξ} + (1 + µ) − (µ/2)e^{i∆xξ})e^{ij∆xξ} = ((µ/2)e^{−i∆xξ} + (1 − µ) + (µ/2)e^{i∆xξ})e^{ij∆xξ},

i.e.

g(ξ)(1 + µ − µ cos(∆xξ)) = 1 − µ + µ cos(∆xξ).

Therefore,

g(ξ) = (1 − µ + µ cos(∆xξ))/(1 + µ − µ cos(∆xξ)),

hence

g(ξ) = (1 + z/2)/(1 − z/2),   z = 2(∆t/∆x²)(cos(∆xξ) − 1).

2. Since z = 2(∆t/∆x²)(cos(∆xξ) − 1), we have z ≤ 0, so

1 + z/2 ≤ 1 − z/2,

and therefore g(ξ) ≤ 1. Also, from −1 < 1 we get

z/2 − 1 < z/2 + 1,

i.e. −(1 − z/2) < 1 + z/2, and dividing by 1 − z/2 > 0,

g(ξ) = (1 + z/2)/(1 − z/2) > −1.

Hence |g(ξ)| ≤ 1, so the scheme is (unconditionally) stable.
J


Problem B.8. (Prelim Jan. 2011#8) Consider the explicit scheme

u_j^{n+1} = u_j^n + µ(u_{j−1}^n − 2u_j^n + u_{j+1}^n) − (bµ∆x/2)(u_{j+1}^n − u_{j−1}^n),   0 ≤ n ≤ N, 1 ≤ j ≤ L,

for the convection-diffusion problem

∂u/∂t = ∂²u/∂x² − b ∂u/∂x   for 0 ≤ x ≤ 1, 0 ≤ t ≤ t*,
u(0, t) = u(1, t) = 0   for 0 ≤ t ≤ t*,
u(x, 0) = g(x)   for 0 ≤ x ≤ 1,

where b > 0, µ = ∆t/(∆x)², ∆x = 1/(L + 1), and ∆t = t*/N. Prove that, under suitable restrictions on µ and ∆x, the error grid function e^n satisfies the estimate

‖e^n‖_∞ ≤ t*C(∆t + ∆x²),

for all n such that n∆t ≤ t*, where C > 0 is a constant.

Solution. Let ū be the exact solution and ū_j^n = ū(n∆t, j∆x). Then from Taylor expansion, we have

ū_j^{n+1} = ū_j^n + ∆t ∂ū_j^n/∂t + ((∆t)²/2) ∂²ū(ξ₁, j∆x)/∂t²,   t_n ≤ ξ₁ ≤ t_{n+1},
ū_{j−1}^n = ū_j^n − ∆x ∂ū_j^n/∂x + ((∆x)²/2) ∂²ū_j^n/∂x² − ((∆x)³/6) ∂³ū_j^n/∂x³ + ((∆x)⁴/24) ∂⁴ū(n∆t, ξ₂)/∂x⁴,   x_{j−1} ≤ ξ₂ ≤ x_j,
ū_{j+1}^n = ū_j^n + ∆x ∂ū_j^n/∂x + ((∆x)²/2) ∂²ū_j^n/∂x² + ((∆x)³/6) ∂³ū_j^n/∂x³ + ((∆x)⁴/24) ∂⁴ū(n∆t, ξ₃)/∂x⁴,   x_j ≤ ξ₃ ≤ x_{j+1}.

Then the truncation error T of this scheme is

T = (ū_j^{n+1} − ū_j^n)/∆t − (ū_{j−1}^n − 2ū_j^n + ū_{j+1}^n)/∆x² + b(ū_{j+1}^n − ū_{j−1}^n)/(2∆x) = O(∆t + (∆x)²).
Therefore

e_j^{n+1} = e_j^n + µ(e_{j−1}^n − 2e_j^n + e_{j+1}^n) − (bµ∆x/2)(e_{j+1}^n − e_{j−1}^n) + c∆t(∆t + (∆x)²),

i.e.

e_j^{n+1} = (µ + bµ∆x/2)e_{j−1}^n + (1 − 2µ)e_j^n + (µ − bµ∆x/2)e_{j+1}^n + c∆t(∆t + (∆x)²).

Then

|e_j^{n+1}| ≤ |µ + bµ∆x/2||e_{j−1}^n| + |1 − 2µ||e_j^n| + |µ − bµ∆x/2||e_{j+1}^n| + c∆t(∆t + (∆x)²),

and therefore, taking the maximum over j,

‖e^{n+1}‖_∞ ≤ |µ + bµ∆x/2|‖e^n‖_∞ + |1 − 2µ|‖e^n‖_∞ + |µ − bµ∆x/2|‖e^n‖_∞ + c∆t(∆t + (∆x)²).


If 1 − 2µ ≥ 0 and µ − bµ∆x/2 ≥ 0, i.e. µ ≤ 1/2 and 1 − b∆x/2 ≥ 0, then all three coefficients are nonnegative and sum to 1, so

‖e^{n+1}‖_∞ ≤ (µ + bµ∆x/2)‖e^n‖_∞ + (1 − 2µ)‖e^n‖_∞ + (µ − bµ∆x/2)‖e^n‖_∞ + c∆t(∆t + (∆x)²)
           = ‖e^n‖_∞ + c∆t(∆t + (∆x)²).

Then

‖e^n‖_∞ ≤ ‖e^{n−1}‖_∞ + c∆t(∆t + (∆x)²)
        ≤ ‖e^{n−2}‖_∞ + 2c∆t(∆t + (∆x)²)
        ≤ ···
        ≤ ‖e^0‖_∞ + cn∆t(∆t + (∆x)²)
        = cn∆t(∆t + (∆x)²) ≤ ct*(∆t + (∆x)²),

since e^0 = 0 and n∆t ≤ t*.
J

B.2 Numerical Mathematics Preliminary Examination Aug. 2010

Problem B.9. (Prelim Aug. 2010#1) Let A ∈ C^{m×n} (m > n) and let A = Q̂R̂ be a reduced QR factorization.

1. Prove that A has rank n if and only if all the diagonal entries of R̂ are non-zero.

2. Suppose rank(A) = n, and define P = Q̂Q̂*. Prove that range(P) = range(A).

3. What type of matrix is P?

Solution. 1. From the properties of the reduced QR factorization, Q̂ ∈ C^{m×n} has orthonormal, hence linearly independent, columns, so multiplication by Q̂ preserves rank: rank(A) = rank(Q̂R̂) = rank(R̂). Since R̂ ∈ C^{n×n} is upper triangular, det(R̂) = Π_{i=1}^n r_{ii}, so R̂ has rank n if and only if all the diagonal entries r_{ii} are non-zero.

Therefore, A has rank n if and only if all the diagonal entries of R̂ are non-zero.
2. (a) range(A) ⊆ range(P): Let y ∈ range(A), that is, there exists x ∈ C^n s.t. Ax = y. Then by the reduced QR factorization we have y = Q̂R̂x, and then

Py = P Q̂R̂x = Q̂Q̂*Q̂R̂x = Q̂R̂x = Ax = y,

therefore y ∈ range(P).

(b) range(P) ⊆ range(A): Let v ∈ range(P); since P² = (Q̂Q̂*)(Q̂Q̂*) = Q̂Q̂* = P, we have v = Pv = Q̂Q̂*v.

Claim B.1.

Q̂Q̂* = A(A*A)^{−1}A*.

Proof. Since rank(A) = n, the matrices A*A and R̂ are invertible, and

A(A*A)^{−1}A* = Q̂R̂(R̂*Q̂*Q̂R̂)^{−1}R̂*Q̂*
             = Q̂R̂(R̂*R̂)^{−1}R̂*Q̂*
             = Q̂R̂R̂^{−1}(R̂*)^{−1}R̂*Q̂*
             = Q̂Q̂*.


J
Therefore, by the claim, we have

v = Pv = Q̂Q̂*v = A((A*A)^{−1}A*v) = Ax,

where x = (A*A)^{−1}A*v. Hence v ∈ range(A).

3. P is an orthogonal projector (P² = P and P* = P).
J
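The statements in parts 1-3 can be checked numerically; the Python sketch below uses a random real matrix (an arbitrary full-rank example) and NumPy's reduced QR:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 4))                 # full column rank (generically)
Q, R = np.linalg.qr(A, mode='reduced')
assert np.all(np.abs(np.diag(R)) > 1e-10)       # rank n <=> nonzero diag of R-hat
P = Q @ Q.T                                     # the projector Q-hat Q-hat^*
P2 = A @ np.linalg.solve(A.T @ A, A.T)          # the claim: A (A^T A)^{-1} A^T
assert np.allclose(P, P2)
assert np.allclose(P @ P, P) and np.allclose(P, P.T)   # orthogonal projector
```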

Problem B.10. (Prelim Aug. 2010#4) Prove that A ∈ R^{n×n} is SPD if and only if it has a Cholesky factorization.

Solution. 1. Since A is SPD, it has an LU (indeed LDL^T) factorization A = LDL^T, where L is unit lower triangular and the diagonal entries of D are positive. Setting U = D^{1/2}L^T, we get

A = U^T U.

Therefore, it has a Cholesky factorization.

2. If A has a Cholesky factorization, i.e. A = U^T U with U nonsingular upper triangular, then A is clearly symmetric, and

x^T Ax = x^T U^T Ux = (Ux)^T Ux.

Let y = Ux; then we have

x^T Ax = (Ux)^T Ux = y^T y = y_1² + y_2² + ··· + y_n² ≥ 0,

with equality only when y = 0, i.e. x = 0 (since U is non-singular). Hence A is SPD.
J
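Both directions can be illustrated numerically (a Python sketch; the SPD matrix is an arbitrary random example):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)            # SPD by construction
U = np.linalg.cholesky(A).T            # A = U^T U, with U upper triangular
assert np.allclose(U.T @ U, A)
# conversely, A = U^T U gives x^T A x = ||U x||^2 > 0 for any x != 0
x = rng.standard_normal(5)
assert x @ A @ x > 0
```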

Problem B.11. (Prelim Aug. 2010#8) Consider the Crank-Nicolson scheme

u_j^{n+1} = u_j^n + (µ/2)(u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j−1}^n − 2u_j^n + u_{j+1}^n)

for approximating the solution to the heat equation ∂u/∂t = ∂²u/∂x² on the intervals 0 ≤ x ≤ 1 and 0 ≤ t ≤ t* with the boundary conditions u(0, t) = u(1, t) = 0.

1. Show that the scheme may be written in the form u^{n+1} = Au^n, where A ∈ R^{m×m}_sym (the space of m × m symmetric matrices) and

‖Ax‖_2 ≤ ‖x‖_2,

for any x ∈ R^m, regardless of the value of µ.

2. Show that

‖Ax‖_∞ ≤ ‖x‖_∞,

for any x ∈ R^m, provided µ ≤ 1. (In other words, the scheme may only be conditionally stable in the max norm.)


Solution. 1. The scheme

u_j^{n+1} = u_j^n + (µ/2)(u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j−1}^n − 2u_j^n + u_{j+1}^n)

can be rewritten as

−(µ/2)u_{j−1}^{n+1} + (1 + µ)u_j^{n+1} − (µ/2)u_{j+1}^{n+1} = (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n.

Using the boundary conditions, we have

Cu^{n+1} = Bu^n,

where C and B are the m × m symmetric tridiagonal matrices

C = tridiag(−µ/2, 1 + µ, −µ/2),   B = tridiag(µ/2, 1 − µ, µ/2),

and

u^{n+1} = (u_1^{n+1}, u_2^{n+1}, ..., u_m^{n+1})^T,   u^n = (u_1^n, u_2^n, ..., u_m^n)^T.

So the scheme may be written in the form u^{n+1} = Au^n, where A = C^{−1}B; A is symmetric because C and B are symmetric polynomials in the same tridiagonal matrix and therefore commute. By using the Fourier transform, i.e.

u_j^{n+1} = g(ξ)u_j^n,   u_j^n = e^{ij∆xξ},

we have

−(µ/2)g(ξ)u_{j−1}^n + (1 + µ)g(ξ)u_j^n − (µ/2)g(ξ)u_{j+1}^n = (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n.

And then

−(µ/2)g(ξ)e^{i(j−1)∆xξ} + (1 + µ)g(ξ)e^{ij∆xξ} − (µ/2)g(ξ)e^{i(j+1)∆xξ} = (µ/2)e^{i(j−1)∆xξ} + (1 − µ)e^{ij∆xξ} + (µ/2)e^{i(j+1)∆xξ},

i.e.

g(ξ)(−(µ/2)e^{−i∆xξ} + (1 + µ) − (µ/2)e^{i∆xξ})e^{ij∆xξ} = ((µ/2)e^{−i∆xξ} + (1 − µ) + (µ/2)e^{i∆xξ})e^{ij∆xξ},

i.e.

g(ξ)(1 + µ − µ cos(∆xξ)) = 1 − µ + µ cos(∆xξ).

Therefore,

g(ξ) = (1 − µ + µ cos(∆xξ))/(1 + µ − µ cos(∆xξ)),

hence

g(ξ) = (1 + z/2)/(1 − z/2),   z = 2(∆t/∆x²)(cos(∆xξ) − 1).

Moreover, |g(ξ)| ≤ 1, therefore ρ(A) ≤ 1, and since A is symmetric, ‖A‖_2 = ρ(A). Hence

‖Ax‖_2 ≤ ‖A‖_2 ‖x‖_2 = ρ(A)‖x‖_2 ≤ ‖x‖_2.


2. The scheme

u_j^{n+1} = u_j^n + (µ/2)(u_{j−1}^{n+1} − 2u_j^{n+1} + u_{j+1}^{n+1} + u_{j−1}^n − 2u_j^n + u_{j+1}^n)

can be rewritten as

(1 + µ)u_j^{n+1} = (µ/2)u_{j−1}^{n+1} + (µ/2)u_{j+1}^{n+1} + (µ/2)u_{j−1}^n + (1 − µ)u_j^n + (µ/2)u_{j+1}^n.

Then, by the triangle inequality,

(1 + µ)|u_j^{n+1}| ≤ (µ/2)|u_{j−1}^{n+1}| + (µ/2)|u_{j+1}^{n+1}| + (µ/2)|u_{j−1}^n| + |1 − µ||u_j^n| + (µ/2)|u_{j+1}^n|.

Bounding each term by the corresponding max norm and taking the maximum over j on the left,

(1 + µ)‖u^{n+1}‖_∞ ≤ (µ/2)‖u^{n+1}‖_∞ + (µ/2)‖u^{n+1}‖_∞ + (µ/2)‖u^n‖_∞ + |1 − µ|‖u^n‖_∞ + (µ/2)‖u^n‖_∞.

If µ ≤ 1, then |1 − µ| = 1 − µ, so the right-hand side equals µ‖u^{n+1}‖_∞ + ‖u^n‖_∞, and therefore

‖u^{n+1}‖_∞ ≤ ‖u^n‖_∞,

i.e.

‖Au^n‖_∞ ≤ ‖u^n‖_∞. J

Problem B.12. (Prelim Aug. 2010#9) Consider the Lax-Wendroff scheme

u_j^{n+1} = u_j^n + (a²(∆t)²/(2(∆x)²))(u_{j−1}^n − 2u_j^n + u_{j+1}^n) − (a∆t/(2∆x))(u_{j+1}^n − u_{j−1}^n),

for approximating the solution of the Cauchy problem for the advection equation

∂u/∂t + a ∂u/∂x = 0,   a > 0.

Use Von Neumann's method to show that the Lax-Wendroff scheme is stable provided the CFL condition

a∆t/∆x ≤ 1

is enforced.
is enforced.

Solution. By using the Fourier transform, i.e.

u_j^{n+1} = g(ξ)u_j^n,   u_j^n = e^{ij∆xξ},

we have

g(ξ)u_j^n = u_j^n + (a²(∆t)²/(2(∆x)²))(u_{j−1}^n − 2u_j^n + u_{j+1}^n) − (a∆t/(2∆x))(u_{j+1}^n − u_{j−1}^n).


And then

g(ξ)e^{ij∆xξ} = e^{ij∆xξ} + (a²(∆t)²/(2(∆x)²))(e^{i(j−1)∆xξ} − 2e^{ij∆xξ} + e^{i(j+1)∆xξ}) − (a∆t/(2∆x))(e^{i(j+1)∆xξ} − e^{i(j−1)∆xξ}).

Therefore

g(ξ) = 1 + (a²(∆t)²/(2(∆x)²))(e^{−i∆xξ} − 2 + e^{i∆xξ}) − (a∆t/(2∆x))(e^{i∆xξ} − e^{−i∆xξ})
     = 1 + (a²(∆t)²/(2(∆x)²))(2cos(∆xξ) − 2) − (a∆t/(2∆x))(2i sin(∆xξ))
     = 1 + (a²(∆t)²/(∆x)²)(cos(∆xξ) − 1) − (a∆t/∆x)(i sin(∆xξ)).

Let µ = a∆t/∆x; then

g(ξ) = 1 + µ²(cos(∆xξ) − 1) − iµ sin(∆xξ).

The scheme is stable if |g(ξ)| ≤ 1 for all ξ, i.e.

(1 + µ²(cos(∆xξ) − 1))² + (µ sin(∆xξ))² ≤ 1,

i.e.

1 + 2µ²(cos(∆xξ) − 1) + µ⁴(cos(∆xξ) − 1)² + µ² sin²(∆xξ) ≤ 1,

i.e.

µ²(sin²(∆xξ) + 2cos(∆xξ) − 2) + µ⁴(cos(∆xξ) − 1)² ≤ 0,

i.e.

µ²((1 − cos²(∆xξ)) + 2cos(∆xξ) − 2) + µ⁴(cos(∆xξ) − 1)² ≤ 0,

i.e.

µ²(µ²(cos(∆xξ) − 1)² − (cos(∆xξ) − 1)²) ≤ 0,
µ²(µ² − 1)(cos(∆xξ) − 1)² ≤ 0,

which holds for every ξ exactly when µ ≤ 1. Since each step above is an equivalence, this proves the result. J


B.3 Numerical Mathematics Preliminary Examination Jan. 2009


B.4 Numerical Mathematics Preliminary Examination Jan. 2008

Problem B.13. (Prelim Jan. 2008#8) Let Ω ⊂ R² be a bounded domain with a smooth boundary. Consider the 2-D Poisson-like equation

−∆u + 3u = x²y²   in Ω,
u = 0   on ∂Ω.

1. Write the corresponding Ritz and Galerkin variational problems.

2. Prove that the Galerkin method has a unique solution u_h and that the following estimate is valid:

‖u − u_h‖_{H¹} ≤ C inf_{v_h ∈ V_h} ‖u − v_h‖_{H¹},

with C independent of h, where V_h denotes a finite element subspace of H¹(Ω) consisting of continuous piecewise polynomials of degree k ≥ 1.

Solution. 1. For this pure Dirichlet problem, the test function space is H¹₀(Ω). Multiplying both sides of the equation by a test function v and integrating over Ω, we get

−∫_Ω ∆u v dx + 3∫_Ω uv dx = ∫_Ω x²y² v dx.

Integration by parts yields

∫_Ω ∇u·∇v dx + 3∫_Ω uv dx = ∫_Ω x²y² v dx.

Let

a(u, v) = ∫_Ω ∇u·∇v dx + 3∫_Ω uv dx,   f(v) = ∫_Ω x²y² v dx.

Then:

(a) The Ritz variational problem is: find u_h ∈ V_h ⊂ H¹₀ such that

u_h = argmin_{v_h ∈ V_h} J(v_h),   J(v_h) = (1/2)a(v_h, v_h) − f(v_h).

(b) The Galerkin variational problem is: find u_h ∈ V_h ⊂ H¹₀ such that

a(u_h, v_h) = f(v_h)   for all v_h ∈ V_h.
2. Next, we use the Lax-Milgram theorem to prove existence and uniqueness.

(a) Continuity: by the Cauchy-Schwarz inequality,

a(u, v) ≤ ∫_Ω |∇u·∇v| dx + 3∫_Ω |uv| dx
        ≤ ‖∇u‖_{L²(Ω)}‖∇v‖_{L²(Ω)} + 3‖u‖_{L²(Ω)}‖v‖_{L²(Ω)}
        ≤ C‖u‖_{H¹(Ω)}‖v‖_{H¹(Ω)}.

(b) Coercivity:

a(u, u) = ∫_Ω |∇u|² dx + 3∫_Ω |u|² dx ≥ ‖∇u‖²_{L²(Ω)} + ‖u‖²_{L²(Ω)} = ‖u‖²_{H¹(Ω)}.

(c) Boundedness of f: since x²y² is bounded on the bounded domain Ω,

f(v) ≤ ∫_Ω |x²y²||v| dx ≤ max_{Ω̄} |x²y²| ∫_Ω |v| dx ≤ C (∫_Ω 1² dx)^{1/2} (∫_Ω |v|² dx)^{1/2} ≤ C‖v‖_{L²(Ω)} ≤ C‖v‖_{H¹(Ω)}.

By the Lax-Milgram theorem, we get that the Galerkin method has a unique solution u_h. Moreover, for all v_h ∈ V_h,

a(u_h, v_h) = f(v_h),

and from the weak formulation,

a(u, v_h) = f(v_h).

Subtracting, we get the Galerkin orthogonality (GO):

a(u − u_h, v_h) = 0   for all v_h ∈ V_h.

Then, by coercivity, GO, and continuity, for any v_h ∈ V_h,

‖u − u_h‖²_{H¹(Ω)} ≤ a(u − u_h, u − u_h)
                  = a(u − u_h, u − v_h) + a(u − u_h, v_h − u_h)
                  = a(u − u_h, u − v_h)
                  ≤ C‖u − u_h‖_{H¹(Ω)}‖u − v_h‖_{H¹(Ω)}.

Therefore,

‖u − u_h‖_{H¹} ≤ C inf_{v_h ∈ V_h} ‖u − v_h‖_{H¹}. J
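The quasi-optimality estimate forces the Galerkin error to shrink as the mesh is refined. As a hedged illustration (a 1-D analogue, not the 2-D problem itself), the Python sketch below solves −u″ + 3u = f, u(0) = u(1) = 0 with P1 elements and a manufactured solution u = sin(πx), and checks that the error decreases under refinement:

```python
import numpy as np

def p1_solve(m):
    """P1 Galerkin for -u'' + 3u = f on (0,1), u(0)=u(1)=0 (1-D analogue)."""
    h = 1.0 / (m + 1)
    x = np.linspace(h, 1 - h, m)                      # interior nodes
    # stiffness and mass matrices of the hat-function basis (both tridiagonal)
    K = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
    M = h / 6 * (4 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1))
    f = (np.pi**2 + 3) * np.sin(np.pi * x)            # manufactured: u = sin(pi x)
    uh = np.linalg.solve(K + 3 * M, M @ f)            # a(u_h, v_h) = f(v_h)
    return np.sqrt(h) * np.linalg.norm(uh - np.sin(np.pi * x))

e1, e2 = p1_solve(20), p1_solve(40)
assert e2 < e1        # the best-approximation bound forces the error down with h
```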

C Project 1 MATH571


COMPUTATIONAL ASSIGNMENT # 1

MATH 571

1. Instability of Gram–Schmidt
The purpose of the first part of your assignment is to investigate the instability of the
classical Gram–Schmidt orthogonalization process. Lecture 9 in BT is somewhat related to
this and could be a good source of inspiration.
1.– Write a piece of code that implements the classical Gram–Schmidt process, see Algo-
rithm 7.1 in BT. Ideally, this should be implemented in the form of a QR factorization,
that is, given a matrix A ∈ Rm×n your method should return two matrices Q ∈ Rm×n
and R ∈ Rn×n , where the matrix Q has (or at least should have) orthonormal columns
and A = QR.
2.– With the help of the developed piece of code, test the algorithm on a matrix A ∈ R^{20×10} with:
• entries uniformly distributed over the interval [0, 1].
• entries given by

a_{i,j} = ((2i − 21)/19)^{j−1}.
• entries given by

a_{i,j} = 1/(i + j − 1),

this is the so-called Hilbert matrix.
3.– For each one of these cases compute Q*Q. Since Q, in theory, has orthonormal columns, what should you get? What do you actually get?
4.– Implement the modified Gram-Schmidt process (Algorithm 8.1 in BT) and repeat steps
1.—3. What do you observe?

2. Linear Least Squares


The purpose of the second part of your assignment is to observe the so-called Runge’s
phenomenon and try to mitigate it using least squares. Lecture 11 on BT might give some
hints on how to proceed. Consider the function
f(x) = 1/(1 + 25x²),
on the interval [−1, 1]. Do the following:
1.– Choose N ∈ N (not too large ≈ 10 should suffice) and on an equally spaced grid of points
construct a polynomial that interpolates f . In other words, given the grid of points
x_i = −1 + 2i/N,   i = 0, ..., N,
N
Date: Due October 16, 2013.

you must find a polynomial p_N of degree N such that

p_N(x_i) = f(x_i),   i = 0, ..., N.
2.– Even though f and pN coincide at the nodes, how do they compare on the whole interval?
You can, for instance plot them or look at their values on a grid that consists of 2N
points.
3.– We are going to, instead of interpolating, construct a least squares fit for f . In other
words, we choose n ∈ N, n < N , and construct a polynomial qn of degree n such that
Σ_{i=0}^N |q_n(x_i) − f(x_i)|²

is minimal.
4.– If our least squares polynomial is defined as q_n(x) = Σ_{j=0}^n Q_j x^j, then the minimality conditions lead to the overdetermined system

(1)   Aq = y,   A_{i,j} = x_i^{j−1},   (q)_j = Q_j,   y_i = f(x_i),
which, since all the points x_i are different, has full rank (can you prove this?). This
means that the least squares solution can be found, for instance, using the QR algorithm
which you developed on the first part of the assignment. This gives you the coefficients
of the polynomial.
5.– How do q_n and f compare? Keeping N fixed, vary n and try to find an empirical relation for the n (in terms of N) which optimizes the least squares fit.
Remark. Equation (1) is also the system of equations you obtain when trying to compute
the interpolating polynomial of point 1.–. In this case, however, the system will be square.
You can still use the QR algorithm to solve this system.


MATH 571: Coding Assignment #1


Due on Wednesday, October 16, 2013

TTH 12:40pm

Wenqiang Feng


Contents
Problem 1
Problem 2


Problem 1
1. See Listing 3.

2. See Listing 2.

3. We should get the n × n identity matrix. But we do not actually get the identity matrix from the classical Gram-Schmidt algorithm. For cases 1-2, we get matrices with diag(Q*Q) = (1, ..., 1) and off-diagonal elements of size C × 10⁻¹⁶ ~ 10⁻¹⁷. For case 3 (the Hilbert matrix), the classical Gram-Schmidt algorithm is not stable: some off-diagonal elements of Q*Q are far from 0, so Q*Q is not close to diagonal any more.

4. For cases 1-2, the modified Gram-Schmidt algorithm also does not produce the exact identity: diag(Q*Q) = (1, ..., 1) and the off-diagonal elements are of size C × 10⁻¹⁷ ~ 10⁻¹⁸. For case 3, the modified Gram-Schmidt algorithm works much better: diag(Q*Q) = (1, ..., 1) and the off-diagonal elements are of size C × 10⁻⁸ ~ 10⁻¹³. So the modified Gram-Schmidt algorithm is more stable than the classical one.
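The observations above can be reproduced outside MATLAB as well; the following Python sketch (using the same 20 × 10 Hilbert matrix) compares the loss of orthogonality ‖Q*Q − I‖ for the classical and modified algorithms:

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt QR (the unstable variant)."""
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[:k, k] = Q[:, :k].T @ A[:, k]
        v = A[:, k] - Q[:, :k] @ R[:k, k]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

def mgs(A):
    """Modified Gram-Schmidt QR (subtracts one projection at a time)."""
    V = A.astype(float).copy()
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(V[:, k])
        Q[:, k] = V[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ V[:, j]
            V[:, j] -= R[k, j] * Q[:, k]
    return Q, R

# the 20-by-10 Hilbert matrix a_{i,j} = 1/(i+j-1)
H = 1.0 / (np.arange(1, 21)[:, None] + np.arange(1, 11)[None, :] - 1)
I = np.eye(10)
Qc, _ = cgs(H)
Qm, _ = mgs(H)
loss_c = np.linalg.norm(Qc.T @ Qc - I)
loss_m = np.linalg.norm(Qm.T @ Qm - I)
assert loss_m < loss_c      # MGS keeps Q much closer to orthonormal
```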

Listing 1 shows the main function for problem 1.

Listing 1: Main Function of Problem1


%Main function
clc
clear all
m=20;n=10;
fun1=@(i,j) ((2*i-21)/19)^(j-1);
fun2=@(i,j) 1/(i+j-1);   % Hilbert matrix a_{i,j} = 1/(i+j-1)
A1=rand(m,n);
A2=matrix_gen(m,n,fun1);
A3=matrix_gen(m,n,fun2);
% Test for the random case 1
[CQ1,CR1]=gschmidt(A1)
[MQ1,MR1]=mgschmidt(A1)
q11=CQ1'*CQ1
q12=MQ1'*MQ1
% Test for case 2
[CQ2,CR2]=gschmidt(A2)
[MQ2,MR2]=mgschmidt(A2)
q21=CQ2'*CQ2
q22=MQ2'*MQ2
% Test for case 3
[CQ3,CR3]=gschmidt(A3)
[MQ3,MR3]=mgschmidt(A3)
q31=CQ3'*CQ3
q32=MQ3'*MQ3

Listing 2 shows the matrices generating function.



Listing 2: Matrices Generating Function


function A=matrix_gen(m,n,fun)
A=zeros(m,n);
for i=1:m
    for j=1:n
        A(i,j)=fun(i,j);
    end
end

Listing 3 shows the classical Gram-Schmidt algorithm.

Listing 3: Classical Gram-Schmidt Algorithm


function [Q,R]=gschmidt(V)
% gschmidt: classical Gram-Schmidt algorithm
%
% USAGE
%   gschmidt(V)
%
% INPUT
%   V: V is an m by n matrix of full rank n<=m
%
% OUTPUT
%   Q: an m-by-n matrix with orthonormal columns
%   R: an n-by-n upper triangular matrix
%
% AUTHOR
%   Wenqiang Feng
%   Department of Mathematics
%   University of Tennessee at Knoxville
%   E-mail: wfeng@math.utk.edu
%   Date: 9/14/2013

[m,n]=size(V);
Q=zeros(m,n);
R=zeros(n);
R(1,1)=norm(V(:,1));
Q(:,1)=V(:,1)/R(1,1);
for k=2:n
    R(1:k-1,k)=Q(:,1:k-1)'*V(:,k);
    Q(:,k)=V(:,k)-Q(:,1:k-1)*R(1:k-1,k);
    R(k,k)=norm(Q(:,k));
    if R(k,k) == 0
        break;
    end
    Q(:,k)=Q(:,k)/R(k,k);
end

Listing 4 shows the modified Gram-Schmidt algorithm.

Listing 4: Modified Gram-Schmidt Algorithm


function [Q,R]=mgschmidt(V)
% mgschmidt: Modified Gram-Schmidt algorithm
%
% USAGE
%   mgschmidt(V)
%
% INPUT
%   V: V is an m by n matrix of full rank n<=m
%
% OUTPUT
%   Q: an m-by-n matrix with orthonormal columns
%   R: an n-by-n upper triangular matrix
%
% AUTHOR
%   Wenqiang Feng
%   Department of Mathematics
%   University of Tennessee at Knoxville
%   E-mail: wfeng@math.utk.edu
%   Date: 9/14/2013

[m,n]=size(V);
Q=zeros(m,n);
R=zeros(n);

for k=1:n
    R(k,k)=norm(V(:,k));
    if R(k,k) == 0
        break;
    end
    Q(:,k)=V(:,k)/R(k,k);
    for j=k+1:n
        R(k,j)=Q(:,k)'*V(:,j);
        V(:,j)=V(:,j)-R(k,j)*Q(:,k);
    end
end

Problem 2
1. I Chose N = 10 and I got the polynomial p10 is as follow:

P1 0 = −220.941742081448x10 + 7.38961566181029e−13 x9 + 494.909502262444x8


−1.27934383856085e−12 x7 − 381.433823529411x6 + 5.56308212237901e−13 x5
+123.359728506787x4 − 1.16016030941682e−14 x3 − 16.8552036199095x2
−5.86232968237562e−15 x + 1.00000000000000

2. See Figure 1.

3. See Listing 6.

4. Since A is a Vandermonde matrix and the points xi are all distinct, det(A) ≠ 0. Therefore A has full rank.

5. I varied N from 3 to 15. For every fixed N, I varied n from 1 to N; this produced Table 1. From Table 1, we can see that n ≈ 2 N + 1, where N is the number of partitions.

N\n   1      2           3           4           5           6           7           8        ···
3     0.23   3.96·10⁻¹⁷  5.55·10⁻¹⁷
4     0.82   0.56        0.56        5.10·10⁻¹⁷
5     0.50   0.28        0.28        9.04·10⁻¹⁶  9.32·10⁻¹⁶
6     0.84   0.62        0.62        0.43        0.43        8.02·10⁻¹⁵
7     0.71   0.46        0.46        0.25        0.25        3.32·10⁻¹⁵  3.96·10⁻¹⁵
8     0.89   0.64        0.64        0.45        0.45        0.30        0.30        1.39·10⁻¹⁴
⋮

Table 1: The L2 norm of the least-squares polynomial fit

Fixing N = 10 and varying n gives Figures 2-11.

Listing 5 shows main function of problem2.1.

Listing 5: Main Function of Problem2.1


% Main function of A2
clc
clear all
N=10;
n=N;
fun= @(x) 1./(1+25*x.^2);
x=-1:2/N:1;
y=fun(x);

x1=-1:2/(2*N):1;
a = polyfit(x,y,n);
p = polyval(a,x1)
plot(x,y,'o',x1,p,'-')

for m=1:10
    least_squares(x, y, m)
end

Listing 6 shows Polynomial Least Squares Fitting Algorithm.

Listing 6: Polynomial Least Squares Fitting Algorithm


%Main function for pro#2.5
clc
clear all
for N=3:15
    j=1;
    for n=1:N
        fun= @(x) 1./(1+25*x.^2);
        x=-1:2/N:1;
        b=fun(x);

        A=MatrixGen(x,n);
        cof=GSsolver(A,b);
        q=0;
        for i=1:n+1
            q=q+cof(i)*(x.^(i-1));
        end
        error(j)=norm(q-b);
        j=j+1;
        error
    end
end

function A=MatrixGen(x,n)
m=size(x,2);
A=zeros(m,n+1);
for i=1:m
    for j=1:n+1
        A(i,j)=x(i).^(j-1);
    end
end

function x=GSsolver(A,b)
[Q,R]=mgschmidt(A);
x= R\(Q'*b');
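The same pipeline (Vandermonde matrix, QR factorization, triangular solve) can be sketched compactly in NumPy. This is my own illustration, not the author's code: `np.linalg.qr` stands in for `mgschmidt`, and the exact-quadratic test data is a hypothetical sanity check.

```python
import numpy as np

def lsq_poly(x, b, n):
    # degree-n least-squares fit: build A[i, j] = x_i^j, then solve R c = Q^T b
    A = np.vander(x, n + 1, increasing=True)
    Q, R = np.linalg.qr(A)          # thin QR, playing the role of mgschmidt above
    return np.linalg.solve(R, Q.T @ b)

x = np.linspace(-1.0, 1.0, 11)
c = lsq_poly(x, 1 - 2 * x + 3 * x**2, 2)   # data sampled from 1 - 2x + 3x^2
```

Since the data come from an exact quadratic, the computed coefficients should reproduce (1, −2, 3) up to rounding.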


Figure 1: Runge’s phenomenon of Polynomial interpolation with 2N points.

Figure 2: Least Square polynomial of degree=1, N=10.



Figure 3: Least Square polynomial of degree=2, N=10.

Figure 4: Least Square polynomial of degree=3, N=10.



Figure 5: Least Square polynomial of degree=4, N=10.

Figure 6: Least Square polynomial of degree=5, N=10.



Figure 7: Least Square polynomial of degree=6, N=10.

Figure 8: Least Square polynomial of degree=7, N=10.



Figure 9: Least Square polynomial of degree=8, N=10.

Figure 10: Least Square polynomial of degree=9, N=10.



Figure 11: Least Square polynomial of degree=10, N=10.




D Project 2 MATH571


COMPUTATIONAL ASSIGNMENT # 2

MATH 571

1. Convergence of Classical Schemes


The purpose of this part of your assignment is to investigate the convergence properties of classical iterative schemes. To do so, develop:
1. A piece of code [x̃, K] = Jacobi(M, f, ε) that implements the Jacobi method.
2. A piece of code [x̃, K] = SOR(M, f, ω, ε) that implements the SOR method. Notice that the number ω should be an input parameter.¹
Your implementations should take as input a square matrix M ∈ R^{N×N}, a right-hand-side vector f ∈ R^N and a tolerance ε > 0. The output should be a vector x̃ ∈ R^N — an approximate solution to Mx = f — and an integer K — the number of iterations.²
For n ∈ N set N = 2^n − 1 and consider the following matrices:
• The nonsymmetric matrix A ∈ R^{N×N}:
  A_{i,i} = 3, i = 1, …, N;  A_{i,i+1} = −1, i = 1, …, N − 1;  A_{i,i−n} = −1, i = n + 1, …, N.
• The tridiagonal matrix J ∈ R^{N×N}:
  J_{1,1} = 1 = −J_{1,2};  J_{i,i} = 2 + 1/N², J_{i,i+1} = J_{i,i−1} = −1, i = 2, …, N − 1;  J_{N,N} = 1 = −J_{N,N−1}.
• The tridiagonal matrix S ∈ R^{N×N}:
  S_{i,i} = 3, i = 1, …, N;  S_{i,i+1} = −1, i = 1, …, N − 1;  S_{i,i−1} = −1, i = 2, …, N.
For different values of n ∈ {2, …, 50} and for each M ∈ {A, J, S}, choose a vector x ∈ R^N and define f_M = Mx.
i) Run Jacobi(M, f_M, ε) and record the number of iterations. How does the number of iterations depend on N?
ii) Run SOR(M, f_M, 1, ε). How does the number of iterations depend on N?
iii) Try to find the optimal value of ω, that is, the one for which the number of iterations is minimal.
iv) How do the numbers of iterations of Jacobi(M, f_M, ε), SOR(M, f_M, 1, ε) and SOR(M, f_M, ω, ε) with an optimal ω compare? What can you conclude?

2. The Method of Alternating Directions


In this section we will study the Alternating Directions Implicit (ADI) method. Given A ∈ RN ×N ,
A = A? > 0 and f ∈ RN we wish to solve Ax = f . Assume that we have the following splitting of the matrix
A:
A = A1 + A2 , Ai = A?i > 0, i = 1, 2, A1 A2 = A2 A1 .

Date: Due November 26, 2013.


¹ Recall that for ω = 1 we obtain the Gauß–Seidel method, so you obtain two methods for one here ;-)
² As stopping criterion you can use either ‖x^{k+1} − x^k‖ < ε, or, since we are just trying to learn, ‖x^{k+1} − x‖ < ε, where x is the exact solution.

Then, we propose the following scheme:

(1)  (I + τA₁) (x^{k+1/2} − x^k)/τ + Ax^k = f,
(2)  (I + τA₂) (x^{k+1} − x^{k+1/2})/τ + Ax^{k+1/2} = f.
1. Write a piece of code [x̃, K] = ADI(A, f, ε, A₁, A₂, τ) that implements the ADI scheme described above. As before, the input should be a matrix A ∈ R^{N×N}, a right-hand side f ∈ R^N and a tolerance ε > 0. In addition, the scheme should take parameters A₁, A₂ ∈ R^{N×N} and τ > 0.
Notice that, in general, we need to invert (I + τ Ai ). In practice these matrices are chosen so that these
inversions are easy.
2. Let n ∈ {4, …, 50} and set N = n². Define the matrices Λ, Σ ∈ R^{N×N} as follows:

for i = 1, n
    for j = 1, n
        I = i + n(j − 1);
        Λ[I, I] = Σ[I, I] = 3;
        if i < n, Λ[I, I + 1] = −1; endif
        if i > 1, Λ[I, I − 1] = −1; endif
        if j < n, Σ[I, I + n] = −1; endif
        if j > 1, Σ[I, I − n] = −1; endif
    endfor
endfor

3. Set A = Λ + Σ.
4. Are the matrices Λ and Σ SPD? Do they commute?
5. Choose x ∈ R^N and set f = Ax. Run ADI(A, f, ε, Λ, Σ, τ) for different values of τ. Which one seems to be the optimal one?
The following are not obligatory but can be used as extra credit:
6. Write an expression for xk+1 in terms of xk only. Hint: Try adding and subtracting (1) and (2). What
do you get?
7. From this expression find the equation that controls the error ek = x − xk .
8. Assume that (A₁A₂x, x) ≥ 0; show that in this case [x, y] = (A₁A₂x, y) is an inner product. If that is the case we will denote ‖x‖_B = [x, x]^{1/2}.
9. Under this assumption we will show convergence of the ADI scheme. To do so:
• Take the inner product of the equation that controls the error with e^{k+1} + e^k.
• Add over k = 1, …, K. We should obtain

‖e^{K+1}‖²₂ + τ Σ_{k=1}^{K} ‖e^{k+1} + e^k‖²_A + 2τ² ‖e^{K+1}‖²_B = ‖e^0‖²₂ + 2τ² ‖e^0‖²_B.

• From this it follows that, for every τ > 0, ½(x^{k+1} + x^k) → x. How?


MATH 571: Computational Assignment #2


Due on Tuesday, November 26, 2013

TTH 12:40pm

Wenqiang Feng


Contents

Problem 1
Problem 2



Let Ndim be the dimension of the matrix and Niter the number of iterations. Throughout this report, b was generated as b = Ax, where x is a corresponding vector whose entries are random numbers between 0 and 10. The initial iterate of x is the zero vector.

Problem 1
1. Listing 1 shows the implementation of the Jacobi method.

2. Listing 2 shows the implementation of the SOR method.

3. The numerical results:

(a) From the recorded iteration counts, I obtained the following results: For case (2), the Jacobi method does not converge, because the matrix has a large condition number. For cases (1) and (3), if Ndim is small (roughly Ndim ≤ 10-20), then Ndim and Niter roughly satisfy Niter = log(Ndim + C); when Ndim is large, Niter does not depend on Ndim (see Figure 1).
(b) When ω = 1, the SOR method degenerates to the Gauss-Seidel method. For the Gauss-Seidel method, I obtained similar results as for the Jacobi method (see Figure 2). However, the Gauss-Seidel method is more stable than the Jacobi method, and case (3) is more stable than case (1) (see Figures 1 and 2).

Figure 1: The relationship between Ndim and Niter for case (1) (Jacobi and Gauss-Seidel iterations)

(c) The optimal ω

i. For case (1), the optimal ω is around 1, but this ω is not optimal for all sizes (see Figure 3 and Figure 4);

Figure 2: The relationship between Ndim and Niter for case (3) (Jacobi and Gauss-Seidel iterations)

Figure 3: The relationship between Ndim and Niter for case (1) (Jacobi, Gauss-Seidel, and SOR with ω = 0.999)

ii. For case (2), in general the SOR method does not converge, but it does converge for some small Ndim;

Figure 4: The relationship between Ndim and Niter for case (1) (Jacobi, Gauss-Seidel, and SOR with ω = 1.001)

iii. For case (3), the optimal ω is around 1.14; this numerical result agrees with the theoretical one. Let D = diag(diag(A)), E = A − D, T = D\E; then

ω_opt = 2/(1 + √(1 − ρ(T)²)) ≈ 1.14,

where ρ(T) is the spectral radius of T (see Figure 5).
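This value can be reproduced numerically. The sketch below is my own check, not part of the report; it builds the matrix S for n = 6 (one of the sizes in the experiments), forms the Jacobi iteration matrix T, and evaluates the formula:

```python
import numpy as np

N = 2**6 - 1   # n = 6, so N = 63
S = 3 * np.eye(N) - np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)
D = np.diag(np.diag(S))
T = np.linalg.solve(D, D - S)              # Jacobi iteration matrix D^{-1}(D - S)
rho = np.abs(np.linalg.eigvals(T)).max()   # spectral radius of T
w_opt = 2.0 / (1.0 + np.sqrt(1.0 - rho**2))
```

For this size the computed value is approximately 1.145, consistent with the value of about 1.14 reported above.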


(d) In general, for the convergent cases, Niter_Jacobi > Niter_Gauss-Seidel > Niter_SORopt. I conclude that SOR with the optimal ω is more efficient than Gauss-Seidel, and Gauss-Seidel is more efficient than Jacobi, in the convergent cases (see Figure 5).

Figure 5: The relationship between Ndim and Niter for case (3) (Jacobi, Gauss-Seidel, and SOR with ω = 1.14)

Listing 1: Jacobi Method

function [x, iter]=jacobi(A,b,x,tol,max_iter)
% jacobi: Solve the linear system with the Jacobi iterative algorithm
%
% USAGE
% jacobi(A,b,x0,tol)
%
% INPUT
% A: N by N LHS coefficient matrix
% b: N by 1 RHS vector
% x: Initial guess
% tol: The stopping tolerance
% max_iter: maximum number of iterations
%
% OUTPUT
% x: The solution
% iter: number of iterations performed
%
% AUTHOR
% Wenqiang Feng
% Department of Mathematics
% University of Tennessee at Knoxville
% E-mail: wfeng@math.utk.edu
% Date: 11/13/2013
n=size(A,1);

% Set default parameters
if (nargin<3), x=zeros(n,1); tol=1e-16; max_iter=500; end
% Initialize some parameters
error=norm(b - A*x);
iter=0;
% split the matrix for the Jacobi iterative method
D = diag(diag(A));
E = D-A;

while (error>tol && iter<max_iter)
    x1=x;
    x= D\(E*x+b);
    error=norm(x-x1);
    iter=iter+1;
end

Listing 2: SOR Method

function [x, iter]=sor(A,b,w,x,tol,max_iter)
% sor: Solve the linear system with the SOR iterative algorithm
%
% USAGE
% sor(A,b,w,x0,tol,max_iter)
%
% INPUT
% A: N by N LHS coefficient matrix
% b: N by 1 RHS vector
% w: Relaxation parameter
% x: Initial guess
% tol: The stopping tolerance
% max_iter: maximum number of iterations
%
% OUTPUT
% x: The solution
% iter: number of iterations performed
%
% AUTHOR
% Wenqiang Feng
% Department of Mathematics
% University of Tennessee at Knoxville
% E-mail: wfeng@math.utk.edu
% Date: 11/13/2013

n=size(A,1);
% Set default parameters
if (nargin<4), x=zeros(n,1); tol=1e-16; max_iter=500; end
% Initialize some parameters
error=norm(b - A*x)/norm(b);
iter=0;
% split the matrix for the SOR iterative method
D = diag(diag(A));
b = w * b;
M = w * tril(A, -1) + D;
N = -w * triu(A, 1) + (1.0 - w) * D;

while (error>tol && iter<max_iter)
    x1=x;
    x= M\(N*x+b);
    error=norm(x-x1)/norm(x);
    iter=iter+1;
end


Problem 2
1. Listing 3 shows the implementation of the ADI method.

2. Yes, Σ and Λ are SPD matrices. Moreover, they commute, since ΣΛ = ΛΣ.
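Assuming the indexing I = i + n(j − 1) of the construction loop, Λ and Σ are Kronecker products of the 1-D tridiagonal block tridiag(−1, 3, −1) with the identity, which makes both claims easy to verify numerically. This Python sketch is my own check with a hypothetical small n, not part of the report:

```python
import numpy as np

n = 5
# 1-D tridiagonal block tridiag(-1, 3, -1)
L = 3 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
Lam = np.kron(np.eye(n), L)   # couples neighbors in i (offsets +-1)
Sig = np.kron(L, np.eye(n))   # couples neighbors in j (offsets +-n)

def is_spd(M):
    # symmetric with strictly positive eigenvalues
    return np.allclose(M, M.T) and np.linalg.eigvalsh(M).min() > 0
```

Both claims hold here because (I ⊗ L)(L ⊗ I) = L ⊗ L = (L ⊗ I)(I ⊗ L), and L is symmetric and strictly diagonally dominant with positive diagonal.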

3. The optimal τ for the ADI method:

The optimal τ for the ADI method is the same as for the SSOR and SOR methods. Let D = diag(diag(A)), E = A − D, T = D\E; then

τ_opt = 2/(1 + √(1 − ρ(T)²)),

where ρ(T) is the spectral radius of T.

4. The expression of xk+1 :


By adding and subtracting scheme (1) and scheme (2), we get that

(I + τ A1 )(I + τ A2 )xk+1 − (I − τ A1 )(I − τ A2 )xk = 2τ f. (1)

5. The equation that controls the error e^k = x − x^k:

(I + τ A1 )(I + τ A2 )ek+1 = (I − τ A1 )(I − τ A2 )ek . (2)

6. Now, I will show that [x, y] = (A₁A₂x, y) is an inner product; i.e., I will show that ‖x‖²_B = [x, x] satisfies the parallelogram law. Indeed,

‖x + y‖²_B + ‖x − y‖²_B = (A₁A₂(x + y), x + y) + (A₁A₂(x − y), x − y)
  = (A₁A₂x, x) + (A₁A₂x, y) + (A₁A₂y, x) + (A₁A₂y, y)
    + (A₁A₂x, x) − (A₁A₂x, y) − (A₁A₂y, x) + (A₁A₂y, y)
  = 2(‖x‖²_B + ‖y‖²_B).

So the norm ‖·‖_B satisfies the parallelogram law; hence it is induced by an inner product, and [x, y] = (A₁A₂x, y) is an inner product.

7. Taking the inner product of (2) with e^{k+1} + e^k, we get

((I + τA₁)(I + τA₂)e^{k+1}, e^{k+1} + e^k) = ((I − τA₁)(I − τA₂)e^k, e^{k+1} + e^k).   (3)

Expanding both sides (using A = A₁ + A₂), we get

(e^{k+1}, e^{k+1}) + τ(Ae^{k+1}, e^{k+1}) + τ²(A₁A₂e^{k+1}, e^{k+1})
+ (e^{k+1}, e^k) + τ(Ae^{k+1}, e^k) + τ²(A₁A₂e^{k+1}, e^k)
= (e^k, e^{k+1}) − τ(Ae^k, e^{k+1}) + τ²(A₁A₂e^k, e^{k+1})
+ (e^k, e^k) − τ(Ae^k, e^k) + τ²(A₁A₂e^k, e^k).   (4)

Since A₁A₂ = A₂A₁, we have (A₁A₂e^{k+1}, e^k) = (A₁A₂e^k, e^{k+1}). Therefore, (4) reduces to

(e^{k+1}, e^{k+1}) + τ(A(e^{k+1} + e^k), e^{k+1} + e^k) + τ²(A₁A₂e^{k+1}, e^{k+1})
= (e^k, e^k) + τ²(A₁A₂e^k, e^k).   (8)

Therefore,

‖e^{k+1}‖²₂ + τ‖e^{k+1} + e^k‖²_A + τ²‖e^{k+1}‖²_B = ‖e^k‖²₂ + τ²‖e^k‖²_B.   (10)

Summing over k from 0 to K, we get

‖e^{K+1}‖²₂ + τ Σ_{k=0}^{K} ‖e^{k+1} + e^k‖²_A + τ²‖e^{K+1}‖²_B = ‖e^0‖²₂ + τ²‖e^0‖²_B.   (11)

Therefore, from (11), we get ‖e^{k+1} + e^k‖²_A → 0 for every τ > 0. So ½(x^{k+1} + x^k) → x with respect to ‖·‖_A.

Listing 3: ADI Method


function [x, iter]=adi(A,b,A1,A2,tau,x,tol,max_iter)
% adi: Solve the linear system with the ADI algorithm
%
% USAGE
% adi(A,b,A1,A2,tau,x,tol,max_iter)
%
% INPUT
% A: N by N LHS coefficient matrix
% b: N by 1 RHS vector
% A1: The decomposition of A: A=A1+A2 and A1*A2=A2*A1
% A2: The decomposition of A: A=A1+A2 and A1*A2=A2*A1
% tau: The ADI parameter
% x: Initial guess
% tol: The stopping tolerance
% max_iter: maximum number of iterations
%
% OUTPUT
% x: The solution
% iter: number of iterations performed
%
% AUTHOR
% Wenqiang Feng
% Department of Mathematics
% University of Tennessee at Knoxville
% E-mail: wfeng@math.utk.edu
% Date: 11/13/2013
n=size(A,1);

% Set default parameters
if (nargin<6), x=zeros(n,1); tol=1e-16; max_iter=300; end

% Initialize some parameters
error=norm(b - A*x);
iter=0;
I=eye(n);

while (error>tol && iter<max_iter)
    x1=x;
    x=(tau*I+A1)\((tau*I-A2)*x+b); % the first half step
    x=(tau*I+A2)\((tau*I-A1)*x+b); % the second half step
    error=norm(x-x1);
    iter=iter+1;
end

E Midterm examination 572


MATH 572: Exam problem 4-5


Due on July 15, 2014

TTH 12:40pm

Wenqiang Feng


Contents

Problem 1
Problem 2
Problem 3



Problem 1
Given the equation

−u″ + u = f in Ω,   −u′(0) = u′(1) = 0 on ∂Ω,   (1)

devise a finite difference scheme for this problem that results in a tridiagonal matrix. The scheme must be consistent of order O(h²) in the C(Ω̄_h) norm and you should prove this.

Proof: I consider the following uniform partition (Figure 1) of the interval (0, 1) with N + 1 points. For the Neumann boundary, we introduce two ghost points x₋₁ and x_{N+1}.

x−1 x0 = 0 x1 xN −1 xN = 1 xN +1

Figure 1: One dimension’s partition

The second-order scheme for (1) is the following:

−(U_{i+1} − 2U_i + U_{i−1})/h² + U_i = F_i,  ∀i = 0, …, N,
−(U_1 − U_{−1})/(2h) = 0,   (2)
(U_{N+1} − U_{N−1})/(2h) = 0.

From the homogeneous Neumann boundary condition, we know that U₁ = U₋₁ and U_{N+1} = U_{N−1}. Therefore:

1. for i = 0, from the scheme,

−(1/h²)U₋₁ + (2/h²)U₀ − (1/h²)U₁ + U₀ = (1 + 2/h²)U₀ − (2/h²)U₁ = F₀;

2. for i = 1, …, N − 1, we get

−(1/h²)U_{i−1} + (2/h²)U_i − (1/h²)U_{i+1} + U_i = −(1/h²)U_{i−1} + (1 + 2/h²)U_i − (1/h²)U_{i+1} = F_i;

3. for i = N,

−(1/h²)U_{N−1} + (2/h²)U_N − (1/h²)U_{N+1} + U_N = (1 + 2/h²)U_N − (2/h²)U_{N−1} = F_N.
So the algebraic system is AU = F, where

    | 1 + 2/h²   −2/h²                                     |
    | −1/h²      1 + 2/h²   −1/h²                          |
A = |            ⋱          ⋱          ⋱                   |,
    |                       −1/h²      1 + 2/h²   −1/h²    |
    |                                  −2/h²      1 + 2/h² |

U = (U₀, U₁, …, U_{N−1}, U_N)ᵀ,  F = (F₀, F₁, …, F_{N−1}, F_N)ᵀ.
Next, I will show this scheme is of order O(h²). From the Taylor expansion, we know

U_{i+1} = u(x_{i+1}) = u(x_i) + h u′(x_i) + (h²/2) u″(x_i) + (h³/6) u⁽³⁾(x_i) + O(h⁴),
U_{i−1} = u(x_{i−1}) = u(x_i) − h u′(x_i) + (h²/2) u″(x_i) − (h³/6) u⁽³⁾(x_i) + O(h⁴).

Therefore,

−(U_{i+1} − 2U_i + U_{i−1})/h² = −(u(x_{i+1}) − 2u(x_i) + u(x_{i−1}))/h² = −u″(x_i) + O(h²).

Therefore, the scheme (2) is of order O(h²).
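The claimed second-order rate can also be checked numerically with a manufactured solution. The sketch below is my own verification, not part of the exam solution: it uses u(x) = cos(πx), which satisfies the homogeneous Neumann conditions, so f = (π² + 1)cos(πx), assembles the matrix A derived above, and compares the maximum-norm error at two resolutions:

```python
import numpy as np

def max_error(N):
    # Solve -u'' + u = f on (0, 1) with u'(0) = u'(1) = 0 by the ghost-point scheme,
    # using the manufactured solution u(x) = cos(pi x), f = (pi^2 + 1) cos(pi x).
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    F = (np.pi**2 + 1.0) * np.cos(np.pi * x)
    A = np.zeros((N + 1, N + 1))
    for i in range(1, N):                    # interior rows
        A[i, i - 1] = A[i, i + 1] = -1.0 / h**2
        A[i, i] = 1.0 + 2.0 / h**2
    A[0, 0] = A[N, N] = 1.0 + 2.0 / h**2     # boundary rows after ghost elimination
    A[0, 1] = A[N, N - 1] = -2.0 / h**2
    U = np.linalg.solve(A, F)
    return np.abs(U - np.cos(np.pi * x)).max()

ratio = max_error(40) / max_error(80)        # close to 4 indicates O(h^2)
```

Halving h reduces the error by roughly a factor of four, consistent with second-order convergence.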

Problem 2
Let A = tridiag{aᵢ, bᵢ, cᵢ}ⁿᵢ₌₁ ∈ Rⁿˣⁿ be a tridiagonal matrix with the properties that

bᵢ > 0,  aᵢ, cᵢ ≤ 0,  aᵢ + bᵢ + cᵢ = 0.

Prove the following maximum principle: If u ∈ Rⁿ is such that (Au)_{i=2,…,n−1} ≤ 0, then uᵢ ≤ max{u₁, uₙ}.

Proof: Without loss of generality, we assume u_k, k ∈ {2, …, n − 1}, is the maximum value.

1. For (Au)_{i=2,…,n−1} < 0:

I will prove this case by contradiction. Since (Au)_{i=2,…,n−1} < 0, we have

a_k u_{k−1} + b_k u_k + c_k u_{k+1} < 0.

Since a_k + c_k = −b_k, a_k ≤ 0, c_k ≤ 0, and u_k is the maximum,

a_k u_{k−1} − (a_k + c_k) u_k + c_k u_{k+1} = a_k (u_{k−1} − u_k) + c_k (u_{k+1} − u_k) ≥ 0.

This contradicts (Au)_{i=2,…,n−1} < 0. Therefore, if u ∈ Rⁿ is such that (Au)_{i=2,…,n−1} < 0, then uᵢ ≤ max{u₁, uₙ}.

2. For (Au)_{i=2,…,n−1} = 0:

Since (Au)_{i=2,…,n−1} = 0, we have

a_k u_{k−1} + b_k u_k + c_k u_{k+1} = 0.

Since a_k + c_k = −b_k,

a_k u_{k−1} − (a_k + c_k) u_k + c_k u_{k+1} = a_k (u_{k−1} − u_k) + c_k (u_{k+1} − u_k) = 0.

Since a_k < 0, c_k < 0, u_{k−1} − u_k ≤ 0 and u_{k+1} − u_k ≤ 0, we get u_{k−1} = u_k = u_{k+1}; that is to say, u_{k−1} and u_{k+1} are also maximum points. By using the same argument again, we get u_{k−2} = u_{k−1} = u_k = u_{k+1} = u_{k+2}. Repeating the process, we get

u₁ = u₂ = ⋯ = u_{n−1} = uₙ.

Therefore, if u ∈ Rⁿ is such that (Au)_{i=2,…,n−1} = 0, then uᵢ ≤ max{u₁, uₙ}.

Problem 3
Prove the following discrete Poincaré inequality: Let Ω = (0, 1) and Ω_h be a uniform grid of size h. If Y ∈ U_h is a mesh function on Ω_h such that Y(0) = 0, then there is a constant C, independent of Y and h, for which

‖Y‖_{2,h} ≤ C ‖δ̄Y‖_{2,h}.


Proof: I consider the following uniform partition (Figure 2) of the interval (0, 1) with N points.

x₁ = 0, x₂, …, x_{N−1}, x_N = 1

Figure 2: One dimension's uniform partition

Since the discrete 2-norm is defined as

‖v‖²_{2,h} = h^d Σ_{i=1}^{N} |vᵢ|²,

where d is the dimension, here we have

‖v‖²_{2,h} = h Σ_{i=1}^{N} |vᵢ|²,   ‖δ̄v‖²_{2,h} = h Σ_{i=2}^{N} |(v_{i−1} − vᵢ)/h|².

Since Y(0) = 0, i.e. Y₁ = 0,

Σ_{i=2}^{N} (Y_{i−1} − Yᵢ) = Y₁ − Y_N = −Y_N.

Then

|Σ_{i=2}^{N} (Y_{i−1} − Yᵢ)| = |Y_N|,

and, by the Cauchy–Schwarz inequality,

|Y_N| ≤ Σ_{i=2}^{N} |Y_{i−1} − Yᵢ| = Σ_{i=2}^{N} h |(Y_{i−1} − Yᵢ)/h| ≤ (Σ_{i=2}^{N} h²)^{1/2} (Σ_{i=2}^{N} |(Y_{i−1} − Yᵢ)/h|²)^{1/2}.

Therefore, for any K,

|Y_K|² ≤ (Σ_{i=2}^{K} h²)(Σ_{i=2}^{K} |(Y_{i−1} − Yᵢ)/h|²) = (K − 1) h² Σ_{i=2}^{K} |(Y_{i−1} − Yᵢ)/h|².

1. When K = 2,

|Y₂|² ≤ h² |(Y₁ − Y₂)/h|².

2. When K = 3,

|Y₃|² ≤ 2h² (|(Y₁ − Y₂)/h|² + |(Y₂ − Y₃)/h|²).

3. When K = N,

|Y_N|² ≤ (N − 1) h² (|(Y₁ − Y₂)/h|² + |(Y₂ − Y₃)/h|² + ⋯ + |(Y_{N−1} − Y_N)/h|²).

Summing |Yᵢ|² over i from 2 to N, we get

Σ_{i=2}^{N} |Yᵢ|² ≤ (N(N − 1)/2) h² Σ_{i=2}^{N} |(Y_{i−1} − Yᵢ)/h|².

Since Y₁ = 0,

Σ_{i=1}^{N} |Yᵢ|² ≤ (N(N − 1)/2) h² Σ_{i=2}^{N} |(Y_{i−1} − Yᵢ)/h|².

Multiplying both sides by h and using h = 1/(N − 1), so that (N(N − 1)/2) h² = N/(2(N − 1)) = 1/2 + 1/(2(N − 1)), we get

h Σ_{i=1}^{N} |Yᵢ|² ≤ (1/2 + 1/(2(N − 1))) h Σ_{i=2}^{N} |(Y_{i−1} − Yᵢ)/h|²,

i.e.,

‖Y‖²_{2,h} ≤ (1/2 + 1/(2(N − 1))) ‖δ̄Y‖²_{2,h}.

Since N ≥ 2, we have 1/2 + 1/(2(N − 1)) ≤ 1, so

‖Y‖²_{2,h} ≤ ‖δ̄Y‖²_{2,h}.

Hence,

‖Y‖_{2,h} ≤ C ‖δ̄Y‖_{2,h}.
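The constant obtained above (C = 1 for N ≥ 2) can be sanity-checked on random mesh functions. This Python sketch is my own verification, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
h = 1.0 / (N - 1)                       # uniform grid with N points on [0, 1]
for _ in range(200):
    Y = rng.standard_normal(N)
    Y[0] = 0.0                          # enforce Y(0) = 0
    norm_Y = np.sqrt(h * np.sum(Y**2))
    dY = (Y[:-1] - Y[1:]) / h           # backward differences (delta-bar Y)
    norm_dY = np.sqrt(h * np.sum(dY**2))
    assert norm_Y <= norm_dY            # discrete Poincare inequality with C = 1
```

Every random sample satisfies the inequality with C = 1, as the bound 1/2 + 1/(2(N − 1)) ≤ 1 predicts.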


F Project 1 MATH572


COMPUTATIONAL ASSIGNMENT # 1

MATH 572

Adaptive Solution of Ordinary Differential Equations


All the theorems about convergence that we have had in class state that, under certain conditions,
lim_{h→0⁺} maxₙ ‖y(tₙ) − yₙ‖ = 0.

While this is good and we should not use methods that do not satisfy this condition, this type of result is
of little help in practice. In other words, we usually compute with a fixed h and, even if we know y(tn ), we
do not know the exact solution at the next time step and, thus, cannot assess how small the local error
en+1 = y(tn+1 ) − yn+1
is. Here we will study two strategies to estimate this quantity. Your assignment will consist in implementing
these two strategies and use them for the solution of a Cauchy problem
y 0 = f (t, y) t ∈ (t0 , T ), y(t0 ) = y0 ,
where
1. f = y − t, (t₀, T) = (0, 10), y₀ = 1 + δ, with δ ∈ {0, 10⁻³}.
2. f = λy + sin t − λ cos t, (t₀, T) = (0, 5), y₀ = 0, λ ∈ {0, ±5, ±10}.
3. f = 1 − y/t, (t₀, T) = (2, 20), y₀ = 2.
4. The Fresnel integral is given by

φ(t) = ∫₀ᵗ sin(s²) ds.

Set it as a Cauchy problem and generate a table of values on [0, 10]. If possible obtain a plot of the function.
5. The dilogarithm function

f(x) = −∫₀ˣ (ln(1 − t)/t) dt

on the interval [−2, 0].
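For instance, problem 4 becomes the Cauchy problem φ′(t) = sin(t²), φ(0) = 0. The sketch below is my own illustration of this reformulation using a fixed-step RK4 integrator (not the adaptive solver the assignment asks for):

```python
import math

def rk4_table(f, t0, T, y0, nsteps):
    # classical RK4 with nsteps uniform steps; returns the list of (t, y) pairs
    h = (T - t0) / nsteps
    t, y, table = t0, y0, [(t0, y0)]
    for _ in range(nsteps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        table.append((t, y))
    return table

# phi(t) = int_0^t sin(s^2) ds  <=>  phi' = sin(t^2), phi(0) = 0
table = rk4_table(lambda t, y: math.sin(t * t), 0.0, 10.0, 0.0, 1000)
```

With 1000 steps the tabulated value at t = 1 agrees with the series value of the integral, approximately 0.31027, to many digits.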
Step Bisection. The local error analysis that is usually carried out with the help of Taylor expansions
yields, for a method of order s, that
‖e_{n+1}‖ ≤ C h^{s+1}.
The constant C here is independent of h but it might depend on the exact solution y and the current step
tn . To control the local error we will assume that C does not change as n changes. Let v denote the value of
the approximate solution at tn+1 obtained by doing one step of length h from tn . Let u be the approximate
solution at tn+1 obtained by taking two steps of size h/2 from tn . The important thing here is that both u
and v are computable. By the assumption on C we have

y(t_{n+1}) = v + C h^{s+1},
y(t_{n+1}) = u + 2C (h/2)^{s+1},

which implies

‖e_{n+1}‖ ≤ C h^{s+1} = ‖u − v‖ / (1 − 2⁻ˢ).
Notice that the quantity on the right of this expression is completely computable. In a practical realization
one can then monitor ‖u − v‖ to make sure that it is below a prescribed tolerance. If it is not, the time step can be reduced (halved) to improve the local truncation error. On the other hand, if this quantity is well below the prescribed tolerance, the time step can be doubled.

Date: Due March 13, 2014.
Implement this strategy for the fourth order ERK scheme
Implement this strategy for the fourth order ERK scheme

0    |
1/2  | 1/2
1/2  | 0    1/2
1    | 0    0    1
-----+---------------------
     | 1/6  1/3  1/3  1/6
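A minimal sketch of the step-bisection estimate (my own illustration, using the hypothetical test problem y′ = y, whose exact solution eᵗ makes the true local error computable):

```python
import math

def rk4_step(f, t, y, h):
    # one step of the classical fourth-order ERK scheme tabulated above
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

f = lambda t, y: y
t, y, h, s = 0.0, 1.0, 0.1, 4
v = rk4_step(f, t, y, h)                                     # one step of size h
u = rk4_step(f, t + h / 2, rk4_step(f, t, y, h / 2), h / 2)  # two steps of size h/2
est = abs(u - v) / (1 - 2**(-s))                             # ||u - v|| / (1 - 2^{-s})
true_err = abs(math.exp(t + h) - v)                          # exact local error for y' = y
```

Here the computable estimate and the true local error agree to within a few percent, which is what makes monitoring ‖u − v‖ practical.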

Adaptive Runge-Kutta-Fehlberg Method. The Runge-Kutta-Fehlberg method is an attempt at devising a procedure to automatically choose the step size. It consists of a fourth order and a fifth order method with cleverly chosen parameters so that they use the same nodes and, thus, the function evaluations are at the same points. The result is a fifth order method that has an estimate for the local error. The method computes two sequences {yₙ} and {ȳₙ} of fifth and fourth order, respectively, with Butcher tableau (c | A; weights b for yₙ and b̄ for ȳₙ):

0     |
1/4   | 1/4
3/8   | 3/32        9/32
12/13 | 1932/2197   −7200/2197   7296/2197
1     | 439/216     −8           3680/513     −845/4104
1/2   | −8/27       2            −3544/2565   1859/4104    −11/40
------+--------------------------------------------------------------------
b     | 16/135      0            6656/12825   28561/56430  −9/50    2/55
b̄     | 25/216      0            1408/2565    2197/4104    −1/5     0
The quantity

e_{n+1} = y_{n+1} − ȳ_{n+1} = h Σ_{i=1}^{6} (bᵢ − b̄ᵢ) f(tₙ + cᵢh, ξᵢ)

can be used as an estimate of the local error. An algorithm to control the step size is based on the size of ‖y_{n+1} − ȳ_{n+1}‖ which, in principle, is controlled by Ch⁵.
Implement this scheme.


MATH 572: Computational Assignment #2


Due on Thursday, March 13, 2014

TTH 12:40pm

Wenqiang Feng


Contents

Adaptive Runge-Kutta Methods Formulas
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
Adaptive Runge-Kutta Methods MATLAB Code



Adaptive Runge-Kutta Methods Formulas


In this project, we consider two adaptive Runge-Kutta methods for the following initial-value ODE problem:

y′(t) = f(t, y),  y(t₀) = y₀.   (1)

The formula for the fourth order Runge-Kutta (4th RK) method reads as follows:

y(t₀) = y₀,
K₁ = h f(tᵢ, yᵢ),
K₂ = h f(tᵢ + h/2, yᵢ + K₁/2),
K₃ = h f(tᵢ + h/2, yᵢ + K₂/2),
K₄ = h f(tᵢ + h, yᵢ + K₃),
y_{i+1} = yᵢ + (1/6)(K₁ + 2K₂ + 2K₃ + K₄).   (2)

And the Adaptive Runge-Kutta-Fehlberg (RKF) method can be written as:

y(t₀) = y₀,
K₁ = h f(tᵢ, yᵢ),
K₂ = h f(tᵢ + h/4, yᵢ + (1/4)K₁),
K₃ = h f(tᵢ + 3h/8, yᵢ + (3/32)K₁ + (9/32)K₂),
K₄ = h f(tᵢ + 12h/13, yᵢ + (1932/2197)K₁ − (7200/2197)K₂ + (7296/2197)K₃),
K₅ = h f(tᵢ + h, yᵢ + (439/216)K₁ − 8K₂ + (3680/513)K₃ − (845/4104)K₄),
K₆ = h f(tᵢ + h/2, yᵢ − (8/27)K₁ + 2K₂ − (3544/2565)K₃ + (1859/4104)K₄ − (11/40)K₅),
y_{i+1} = yᵢ + (16/135)K₁ + (6656/12825)K₃ + (28561/56430)K₄ − (9/50)K₅ + (2/55)K₆,
ỹ_{i+1} = yᵢ + (25/216)K₁ + (1408/2565)K₃ + (2197/4104)K₄ − (1/5)K₅.   (3)

The error

E = (1/h) |y_{i+1} − ỹ_{i+1}|   (4)

will be used as an estimator. If E ≤ Tol, y will be kept as the current-step solution and we then move to the next step with time step size δh. If E > Tol, we recalculate the current step with time step size δh, where

δ = 0.84 (Tol/E)^{1/4}.
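A direct transcription of (3) into Python, as a sketch of one RKF step (my own illustration; the test problem y′ = y is hypothetical). The pair returned is the fifth- and fourth-order result:

```python
import math

def rkf45_step(f, t, y, h):
    # one Runge-Kutta-Fehlberg step, coefficients exactly as in (3)
    k1 = h * f(t, y)
    k2 = h * f(t + h / 4, y + k1 / 4)
    k3 = h * f(t + 3 * h / 8, y + 3 * k1 / 32 + 9 * k2 / 32)
    k4 = h * f(t + 12 * h / 13,
               y + 1932 * k1 / 2197 - 7200 * k2 / 2197 + 7296 * k3 / 2197)
    k5 = h * f(t + h,
               y + 439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104)
    k6 = h * f(t + h / 2,
               y - 8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
               + 1859 * k4 / 4104 - 11 * k5 / 40)
    y5 = (y + 16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
            - 9 * k5 / 50 + 2 * k6 / 55)
    y4 = y + 25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5
    return y5, y4

y5, y4 = rkf45_step(lambda t, y: y, 0.0, 1.0, 0.1)
E = abs(y5 - y4) / 0.1     # the estimator (4)
```

For y′ = y with h = 0.1 both results agree with e^0.1 to many digits, and their small difference is exactly what drives the estimator (4).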

Problem 1
1. The 4th RK method and RKF method for Problem 1.1

(a) Results for Problem 1.1. From the figure (Fig. 1) we can see that the 4th RK method and the RKF method are both convergent for Problem 1.1. The 4th RK method converges in 4 steps, and the RKF method in 2 steps, reaching error 4.26 × 10⁻¹⁴.

(b) Figures (Fig.1)

Figure 1: The 4th RK method and RKF method for Problem 1.1

2. The 4th RK method and RKF method for Problem 1.2

(a) Results for Problem 1.2. From the figure (Fig. 2) we can see that the 4th RK method and the RKF method are both convergent for Problem 1.2. The 4th RK method converges in 404 steps, reaching error 9.9 × 10⁻⁶; the RKF method takes 29 steps, reaching error 2.3 × 10⁻⁹.
(b) Figures (Fig. 2)

Figure 2: The 4th RK method and RKF method for Problem 1.2

Problem 2
1. The 4th RK method and RKF method for Problem 2.1

(a) Results for Problem 2.1. From the figure (Fig. 3) we can see that the 4th RK method and the RKF method are both convergent for Problem 2.1. The 4th RK method converges in 24 steps, reaching error 7.1 × 10⁻⁶; the RKF method takes 8 steps, reaching error 9.4 × 10⁻¹⁰.
(b) Figures (Fig. 3)

Figure 3: The 4th RK method and RKF method for Problem 2.1

2. The 4th RK method and RKF method for Problem 2.2

(a) Results for Problem 2.2. From the figure (Fig.4) we can see that the 4th RK method and
RKF method are both divergent for Problem 2.2.
(b) Figures (Fig.4)

Figure 4: The 4th RK method and RKF method for Problem 2.2

3. The 4th RK method and RKF method for Problem 2.3

(a) Results for Problem 2.3. From the figure (Fig. 5) we can see that the 4th RK method and the RKF method are both convergent for Problem 2.3. The 4th RK method converges in 96 steps, reaching error 9.98 × 10⁻⁶; the RKF method takes 69 steps, reaching error 1.3 × 10⁻¹¹.
(b) Figures (Fig. 5)

Figure 5: The 4th RK method and RKF method for Problem 2.3

4. The 4th RK method and RKF method for Problem 2.4

(a) Results for Problem 2.4. From Figure 6 we can see that the 4th RK method and the RKF method both diverge for Problem 2.4.
(b) Figures (Fig.6)

Figure 6: The 4th RK method and RKF method for Problem 2.4

5. The 4th RK method and RKF method for Problem 2.5

(a) Results for Problem 2.5. From Figure 7 we can see that the 4th RK method and the RKF method both converge for Problem 2.5. The 4th RK method converges in 88 steps with error 8.77 × 10−6; the RKF method converges in 114 steps with error 2.57 × 10−10.
(b) Figures (Fig.7)

Figure 7: The 4th RK method and RKF method for Problem 2.5

Problem 3
1. The 4th RK method and RKF method for Problem 3

(a) Results for Problem 3. From Figure 8 we can see that the 4th RK method and the RKF method both converge for Problem 3. The 4th RK method converges in 4 steps with error 1.78 × 10−15; the RKF method converges in 2 steps with error 3.6 × 10−15.
(b) Figures (Fig.8)

Figure 8: The 4th RK method and RKF method for Problem 3

Problem 4
1. The 4th RK method and RKF method for Problem 4

(a) Results for Problem 4. From Figure 9 we can see that the 4th RK method and the RKF method both converge for Problem 4. The 4th RK method converges in 438 steps with error 9.9 × 10−6; the RKF method converges in 134 steps with error 3.68 × 10−14.
(b) Figures (Fig.9)

Figure 9: The 4th RK method and RKF method for Problem 4

Problem 5
1. The 4th RK method and RKF method for Problem 5

(a) Results for Problem 5. Since x = 0 is a singular point of the problem and y′(0) = lim_{x→0} ln(1 + x)/x = 1, the schemes do not work on the interval [−2, 0]. The schemes do work on the interval [−2, −δ] for δ > 1 × 10−16. I changed the problem to the following:

f′(x) = ln(1 + x)/x,  x ∈ [δ, 2],
f(δ) = 0.

Figure 10 gives the result for the interval [δ, 2] with δ = 1 × 10−10.
(b) Figures (Fig.10)


Figure 10: The 4th RK method and RKF method for Problem 5


Adaptive Runge-Kutta Methods MATLAB Code


1. 4th-order Runge-Kutta Method MATLAB code

Listing 1: 4th-order Runge-Kutta Method


function [x,y,h]=Runge_Kutta_4(f,xinit,yinit,xfinal,n)
% Runge-Kutta 4th-order method for an ODE initial value problem
% author: Wenqiang Feng
% Email: fw253@mst.edu
% date: January 22, 2012
% Calculation of h from xinit, xfinal, and n
h=(xfinal-xinit)/n;
x=[xinit zeros(1,n)]; y=[yinit zeros(1,n)];

for i=1:n % calculation loop
    x(i+1)=x(i)+h;
    k_1 = f(x(i),y(i));
    k_2 = f(x(i)+0.5*h,y(i)+0.5*h*k_1);
    k_3 = f(x(i)+0.5*h,y(i)+0.5*h*k_2);
    k_4 = f(x(i)+h,y(i)+h*k_3);
    y(i+1) = y(i) + (1/6)*(k_1+2*k_2+2*k_3+k_4)*h; % main equation
end
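For reference alongside the MATLAB listing, the same classical RK4 loop can be sketched in Python (the function name and the test problem y′ = y are illustrative, not part of the assignment):

```python
import math

def rk4(f, x0, y0, xf, n):
    """Classical 4th-order Runge-Kutta on [x0, xf] with n uniform steps."""
    h = (xf - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + 0.5*h, y + 0.5*h*k1)
        k3 = f(x + 0.5*h, y + 0.5*h*k2)
        k4 = f(x + h, y + h*k3)
        y += (h/6.0) * (k1 + 2*k2 + 2*k3 + k4)  # main update
        x += h
    return y

# y' = y, y(0) = 1 on [0, 1]; the exact value at x = 1 is e
approx = rk4(lambda x, y: y, 0.0, 1.0, 1.0, 100)
err = abs(approx - math.e)
```

Halving h should cut the error by roughly 2⁴ = 16, which is the 4th-order behavior that the step-doubling error estimate in Listing 2 relies on.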

2. Main function for problems

Listing 2: Main function for Problems 1-5 with the 4th-order Runge-Kutta Method
% Script file: main1.m
% The RHS of the differential equation is defined as a handle function
% author: Wenqiang Feng
% Email: wfeng1@utk.edu
% date: Mar 8, 2014
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% common parameters
clc
clear all
n=1;
tol=1e-5;
choice=5;                % The choice of the problem number
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% the parameters for each problem
switch choice
    case 1.1             % problem 1.1
        f=@(x,y) y-x;    % The right-hand side
        xinit=0; xfinal=10; yinit=1;
    case 1.2             % problem 1.2 (perturbed initial condition)
        f=@(x,y) y-x;
        xinit=0; xfinal=10; yinit=1+1e-3;
    case 2.1             % problem 2.1
        lambda=0;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.2             % problem 2.2
        lambda=5;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.3             % problem 2.3
        lambda=-5;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.4             % problem 2.4
        lambda=10;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.5             % problem 2.5
        lambda=-10;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 3               % problem 3
        f=@(x,y) 1-y/x;
        xinit=2; xfinal=20; yinit=2;
    case 4               % problem 4
        f=@(x,y) sin(x^2);
        xinit=0; xfinal=10; yinit=0;
    case 5               % problem 5
        f=@(x,y) log(1+x)/x;
        xinit=1e-10; xfinal=2; yinit=0;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% computing the numerical solutions
y0=100*ones(1,n+1);
[x1,y1]=Runge_Kutta_4(f,xinit,yinit,xfinal,n);
% computing the initial error
en=max(abs(y1(end)-y0(end)));
while (en>tol)
    n=n+1;
    [x1,y1]=Runge_Kutta_4(f,xinit,yinit,xfinal,n);
    [x2,y2,h]=Runge_Kutta_4(f,xinit,yinit,xfinal,2*n);
    % alternative error estimate:
    % temp=interp1(x1,y1,x2); en=max(abs(temp-y2));
    en=max(abs(y1(end)-y2(end)));   % step-doubling error estimate
    if (n>5000)
        disp('the number of partitions exceeds 5000')
        break;
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Plot
figure
plot(x2,y2,'-.')
xlabel('x')
ylabel('y')
legend('Runge-Kutta-4th')
title(sprintf('Problem.%1.1f,With steps =%d,error=%1e',choice,2*n,en),...
    'FontSize', 14)

3. Adaptive Runge-Kutta-Fehlberg Method MATLAB code

Listing 3: Adaptive Runge-Kutta-Fehlberg Method


function [time,u,i,E]=Runge_Kutta_Fehlberg(t,T,h,y,f,tol)
% Adaptive Runge-Kutta-Fehlberg (RKF45) method
% author: Wenqiang Feng
% Email: wfeng1@utk.edu
% date: Mar 8, 2014
u0=y; % initial value
t0=t; % initial time
i=0;  % step counter
while t<T
    h = min(h, T-t);
    k1 = h*f(t,y);
    k2 = h*f(t+h/4, y+k1/4);
    k3 = h*f(t+3*h/8, y+3*k1/32+9*k2/32);
    k4 = h*f(t+12*h/13, y+1932*k1/2197-7200*k2/2197+7296*k3/2197);
    k5 = h*f(t+h, y+439*k1/216-8*k2+3680*k3/513-845*k4/4104);
    k6 = h*f(t+h/2, y-8*k1/27+2*k2-3544*k3/2565+1859*k4/4104-11*k5/40);
    y1 = y + 16*k1/135+6656*k3/12825+28561*k4/56430-9*k5/50+2*k6/55; % 5th-order value
    y2 = y + 25*k1/216+1408*k3/2565+2197*k4/4104-k5/5;               % 4th-order value
    E = abs(y1-y2);   % local error estimate
    R = E/h;
    delta = 0.84*(tol/R)^(1/4);
    if E<=tol         % accept the step
        t = t+h;
        y = y1;
        i = i+1;
        fprintf('Step %d: t = %6.4f, y = %18.15f\n', i, t, y);
        u(i)=y;
        time(i)=t;
        h = delta*h;
    else              % reject the step and shrink h
        h = delta*h;
    end
    if (i>1000)
        disp('the number of steps exceeds 1000')
        break;
    end
end
time=[t0,time];
u=[u0,u];
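The same embedded 4(5) pair and the step-size rule h ← 0.84 (tol/R)^{1/4} h can be sketched in Python (names are illustrative; the small floor on R guards against division by zero and is an addition not present in the MATLAB code):

```python
import math

def rkf45_step(f, t, y, h):
    """One Runge-Kutta-Fehlberg step: 5th-order value and local error estimate."""
    k1 = h * f(t, y)
    k2 = h * f(t + h/4, y + k1/4)
    k3 = h * f(t + 3*h/8, y + 3*k1/32 + 9*k2/32)
    k4 = h * f(t + 12*h/13, y + 1932*k1/2197 - 7200*k2/2197 + 7296*k3/2197)
    k5 = h * f(t + h, y + 439*k1/216 - 8*k2 + 3680*k3/513 - 845*k4/4104)
    k6 = h * f(t + h/2, y - 8*k1/27 + 2*k2 - 3544*k3/2565 + 1859*k4/4104 - 11*k5/40)
    y5 = y + 16*k1/135 + 6656*k3/12825 + 28561*k4/56430 - 9*k5/50 + 2*k6/55
    y4 = y + 25*k1/216 + 1408*k3/2565 + 2197*k4/4104 - k5/5
    return y5, abs(y5 - y4)

def rkf45(f, t, T, y, h, tol):
    """Adaptive integration: accept a step only when the error estimate meets tol."""
    while t < T:
        h = min(h, T - t)
        y5, E = rkf45_step(f, t, y, h)
        if E <= tol:           # accept
            t, y = t + h, y5
        R = max(E / h, 1e-16)  # guard against E == 0
        h *= 0.84 * (tol / R) ** 0.25
    return y

y_end = rkf45(lambda t, y: y, 0.0, 1.0, 1.0, 0.2, 1e-8)
```

On y′ = y the controller rejects the first h = 0.2 step, shrinks it, and then grows the step again while keeping each local error estimate below tol.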

4. Main function for problems

Listing 4: Main function for Problems 1-5 with the adaptive Runge-Kutta-Fehlberg Method


%% main2
clc
clear all
%% common parameters
tol=1e-5;
h = 0.2;                 % initial step size
choice=5;                % The choice of the problem number
%% the parameters for each problem
switch choice
    case 1.1             % problem 1.1
        f=@(x,y) y-x;    % The right-hand side
        xinit=0; xfinal=10; yinit=1;
    case 1.2             % problem 1.2 (perturbed initial condition)
        f=@(x,y) y-x;
        xinit=0; xfinal=10; yinit=1+1e-3;
    case 2.1             % problem 2.1
        lambda=0;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.2             % problem 2.2
        lambda=5;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.3             % problem 2.3
        lambda=-5;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.4             % problem 2.4
        lambda=10;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 2.5             % problem 2.5
        lambda=-10;
        f=@(x,y) lambda*y+sin(x)-lambda*cos(x);
        xinit=0; xfinal=5; yinit=0;
    case 3               % problem 3
        f=@(x,y) 1-y/x;
        xinit=2; xfinal=20; yinit=2;
    case 4               % problem 4
        f=@(x,y) sin(x^2);
        xinit=0; xfinal=10; yinit=0;
    case 5               % problem 5
        f=@(x,y) log(1+x)/x;
        xinit=1e-10; xfinal=2; yinit=0;
end
% sample problem for testing:
% xinit = 0; xfinal = 2; yinit = 0.5; f=@(t,y) y-t^2+1;

fprintf('Step %d: t = %6.4f, w = %18.15f\n', 0, xinit, yinit);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% computing the numerical solutions
[time,u,step,error]=Runge_Kutta_Fehlberg(xinit,xfinal,h,yinit,f,tol);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Plot
figure
plot(time,u,'-.')
xlabel('x')
ylabel('y')
legend('Runge-Kutta-Fehlberg')
title(sprintf('Problem.%1.1f,with step=%d,error=%1e',choice,step,error),...
    'FontSize', 14)


G Project 2 MATH572


COMPUTATIONAL ASSIGNMENT #2

MATH 572

The purpose of this assignment is to explore techniques for the numerical solution of boundary value and
initial boundary value problems and to introduce some ideas that we did not discuss in class but, nevertheless,
are quite important. You should submit the solution of at least two (2) of the following problems. Submitting
the solution to the third can be used for extra credit.

The Convection diffusion equation. Upwinding.


Let Ω = (0, 1) and consider the following two point boundary value problem:

−εu′′ + u′ = 0,  u(0) = 1,  u(1) = 0.

Here ε > 0 is a constant. We are interested in what happens when ε ≪ 1.

• Find the exact solution to this problem. Is it monotone?


• Compute a finite difference approximation of this problem on a uniform grid of size h = 1/N using centered differences: that is, set U_0 = 1, U_N = 0 and

(1)  β_i U_{i−1} + α_i U_i + γ_i U_{i+1} = 0,  0 < i < N,

where α_i, β_i, γ_i are defined in terms of ε and h. Set ε ∈ {1, 10−1, 10−3, 10−6} and compute the solution for different values of h. What do you observe for h > ε? For h ≈ ε? For h < ε?
• Show that

Û_i = 1,   Ǔ_i = ( (2ε/h + 1) / (2ε/h − 1) )^i,   i = 0, …, N,

are two linearly independent solutions of the difference equation. Find the discrete solution U of the problem in terms of Û and Ǔ. Using this representation, determine the relation between ε and h that ensures that there are no oscillations in U. Does this coincide with your observations of the previous item?
Hint: Consider the sign of (2ε/h + 1)/(2ε/h − 1).
• Replace the centered difference approximation of the first derivative u′ by the up-wind difference u′(x_i) ≈ h−1 (u(x_i) − u(x_{i−1})). Repeat the previous two items and draw conclusions.
• Show that, using an up-wind approximation, the arising matrix satisfies a discrete maximum principle.

Date: Due April 24, 2014.

A posteriori error estimation


For this problem consider

(2)  −(a(x)u′)′ = f in (0, 1),  u(0) = 0,  u(1) = 1.

Write a piece of code that, for given a and f, computes the finite element solution to this problem over a mesh T_h = {I_j}_{j=1}^N, I_j = [x_{j−1}, x_j], with h_j = x_j − x_{j−1} not necessarily uniform.

• Set a = 1 and choose f so that u = x³. Compute the finite element solution on a sequence of uniform meshes of size h = 1/N and verify the estimate

(3)  ‖u − U‖_{H¹(0,1)} ≤ Ch = CN−1.

• Set a = 1 and f = −3/(4√x) and notice that f ∉ L²(0, 1). This problem, however, is still well posed. Show this. For this case repeat the previous item. What do you observe?
• Set a(x) = 1 if 0 ≤ x < 1/π and a(x) = 2 otherwise. Choose f ≡ 1 and compute the exact solution. Repeat the first item. What do you observe? Recall that to compute the exact solution we must include the interface conditions: u and au′ are continuous.
The last two items show that when either the right-hand side or the coefficient of the equation is not smooth, the solution does not satisfy u′′ ∈ L²(0, 1), and so the error estimate (3) cannot be obtained with uniform meshes. Notice also that in both cases the solution is smooth except perhaps at very few points, so that if we were able to handle these problematic points we should be able to recover (3). This is exactly the purpose of a posteriori error estimates.
Let us recall the weak formulation of (2). Define

A(v, w) = ∫₀¹ a v′ w′,   L(v) = ∫₀¹ f v;

then we need to find u such that u − 1 ∈ H₀¹(0, 1) and

A(u, v) = L(v)   ∀v ∈ H₀¹(0, 1).

If U is the finite element solution to (2) and v ∈ H₀¹(0, 1), we have

A(u − U, v) = A(u, v) − A(U, v) = L(v) − A(U, v) = ∫₀¹ f v − ∫₀¹ a U′ v′ = Σ_{j=1}^{N} ∫_{I_j} (f v − a U′ v′).

Let us now consider each integral separately. Integrating by parts, we obtain

∫_{I_j} (f v − a U′ v′) = ∫_{I_j} f v + ∫_{I_j} (a U′)′ v − [a U′ v]_{x_{j−1}}^{x_j},

so that adding up we get

A(u − U, v) = Σ_{j=1}^{N} ∫_{I_j} (f + (a U′)′) v + Σ_{j=1}^{N−1} v(x_j) j(a(x_j) U′(x_j)),

where

j(w(x)) = w(x + 0) − w(x − 0)

is the so-called jump.

is the so-called jump. Let us now set v = w − Ih w, where Ih is the Lagrange interpolation operator. In this
case then v(xj ) = 0 (why?) and
Page 216 of 236
kvk 2 = kw − I wk 2 ≤ ch kw0 k 2 .
Wenqiang Feng Prelim Exam note for Numerical Analysis Page 217

Consequently,

A(u − U, w − I_h w) ≤ C Σ_{I_j∈T_h} h_j ‖f + (a U′)′‖_{L²(I_j)} ‖w′‖_{L²(I_j)}
                    ≤ C ( Σ_{I_j∈T_h} h_j² ‖f + (a U′)′‖²_{L²(I_j)} )^{1/2} ( Σ_{I_j∈T_h} ‖w′‖²_{L²(I_j)} )^{1/2}
                    = C ( Σ_{I_j∈T_h} h_j² ‖f + (a U′)′‖²_{L²(I_j)} )^{1/2} ‖w′‖_{L²(0,1)}.

What is the use of all this? Define r_j = h_j ‖f + (a U′)′‖_{L²(I_j)}; then, using Galerkin orthogonality, we obtain

‖u − U‖²_{H¹(0,1)} ≤ (1/c₁) A(u − U, u − U) = (1/c₁) A(u − U, u − U − I_h(u − U)) ≤ C ( Σ_{j=1}^{N} r_j² )^{1/2} ‖u − U‖_{H¹(0,1)}.

In other words, we bounded the error in terms of computable and local quantities rj . This allows us to
devise an adaptive method:
• (Solve) Given Th find U .
• (Estimate) Compute the rj ’s.
• (Mark) Choose ` for which r` is maximal.
• (Refine) Construct a new mesh by bisecting I` and leaving all the other elements unchanged.
Implement this method and show that (3) is recovered.
You might also want to try choosing a set M of minimal cardinality so that Σ_{j∈M} r_j² ≥ (1/2) Σ_{j=1}^{N} r_j² and bisecting the cells I_j with j ∈ M.
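The two marking strategies above, maximal indicator and the minimal-cardinality ("bulk") set, can be sketched as follows; `r` stands for the list of computed indicators r_j and the function names are illustrative:

```python
def mark_max(r):
    """Mark the single element with the largest indicator r_j."""
    return [max(range(len(r)), key=lambda j: r[j])]

def mark_bulk(r, theta=0.5):
    """Greedy minimal-cardinality set M with sum_{j in M} r_j^2 >= theta * sum_j r_j^2."""
    total = sum(rj * rj for rj in r)
    marked, acc = [], 0.0
    for j in sorted(range(len(r)), key=lambda j: -r[j]):  # largest indicators first
        if acc >= theta * total:
            break
        marked.append(j)
        acc += r[j] ** 2
    return marked

indicators = [0.1, 0.5, 0.2, 0.4]
```

After marking, each marked interval I_ℓ is bisected and the loop SOLVE → ESTIMATE → MARK → REFINE is repeated.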

Numerical methods for the heat equation


Let Ω = (0, 1) and T = 1. Consider the heat equation

u_t − u′′ = f in Ω,  u|_∂Ω = 0,  u|_{t=0} = u₀.

Choose f and u₀ so that the exact solution reads

u(x, t) = sin(3πx) e^{−2t}.

Implement a finite difference discretization of this problem in space and, in time, the θ-method:

(U_i^{k+1} − U_i^k)/τ − θ Δ_h U_i^k − (1 − θ) Δ_h U_i^{k+1} = f_i^{k+1}.

In doing so you obtain:
• The explicit Euler method, θ = 1.
• The implicit Euler method, θ = 0.
• The Crank-Nicolson method, θ = 1/2.
For each one of them compute the discrete solution U at T = 1 and measure the L², H¹ and L∞ norms of the error. You should do this on a series of meshes and verify the theoretical error estimates. The time step must be chosen as:
• τ = h.
• τ = h.
• τ = h².
What can you conclude?


MATH 572: Computational Assignment #2


Due on Thursday, April 24, 2014

TTH 12:40pm

Wenqiang Feng


Contents

The Convection Diffusion Equation

Problem 1

A Posteriori Error Estimation

Problem 2

Heat Equation

Problem 3


The Convection Diffusion Equation

Problem 1
1. The exact solution
From the problem, we know that the characteristic equation is

−ελ² + λ = 0,

so λ = 0, 1/ε. Therefore, the general solution is

u = c₁ e^{0·x} + c₂ e^{x/ε} = c₁ + c₂ e^{x/ε}.

Using the boundary conditions u(0) = 1, u(1) = 0, we get the solution

u(x) = 1 − 1/(1 − e^{1/ε}) + e^{x/ε}/(1 − e^{1/ε}) = (e^{1/ε} − e^{x/ε})/(e^{1/ε} − 1).

Since u′(x) = e^{x/ε}/(ε(1 − e^{1/ε})) < 0, u(x) is monotone (decreasing).
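A quick numerical check of this exact solution (a sketch; the formula is rescaled by e^{−1/ε} to avoid overflow for small ε):

```python
import math

def u_exact(x, eps):
    """u(x) = (e^{1/eps} - e^{x/eps}) / (e^{1/eps} - 1), scaled by e^{-1/eps}."""
    return (1.0 - math.exp((x - 1.0) / eps)) / (1.0 - math.exp(-1.0 / eps))

eps = 0.1
us = [u_exact(i / 100.0, eps) for i in range(101)]
```

The values start at u(0) = 1, end at u(1) = 0, and decrease monotonically, with the drop concentrated in a boundary layer of width O(ε) near x = 1.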
2. Central Finite difference scheme
I consider the following partition for the finite difference method:

Figure 1: One-dimensional uniform partition for the finite difference method

Then the central difference scheme is as follows:

−ε (U_{i−1} − 2U_i + U_{i+1})/h² + (U_{i+1} − U_{i−1})/(2h) = 0,  i = 1, 2, …, N − 1,   (1)
U_0 = 1,  U_N = 0.   (2)
So
(a) when i = 1, we get

−ε (U_0 − 2U_1 + U_2)/h² + (U_2 − U_0)/(2h) = 0,

i.e.

−(ε/h² + 1/(2h)) U_0 + (2ε/h²) U_1 + (1/(2h) − ε/h²) U_2 = 0.

Since U_0 = 1, we get

(2ε/h²) U_1 + (1/(2h) − ε/h²) U_2 = ε/h² + 1/(2h).   (3)

(b) when i = 2, …, N − 2, we get

−ε (U_{i−1} − 2U_i + U_{i+1})/h² + (U_{i+1} − U_{i−1})/(2h) = 0,

i.e.

−(ε/h² + 1/(2h)) U_{i−1} + (2ε/h²) U_i + (1/(2h) − ε/h²) U_{i+1} = 0.   (4)

(c) when i = N − 1, we get

−ε (U_{N−2} − 2U_{N−1} + U_N)/h² + (U_N − U_{N−2})/(2h) = 0,

i.e.

−(ε/h² + 1/(2h)) U_{N−2} + (2ε/h²) U_{N−1} + (1/(2h) − ε/h²) U_N = 0.

Since U_N = 0,

−(ε/h² + 1/(2h)) U_{N−2} + (2ε/h²) U_{N−1} = 0.   (5)

From (3)-(5), we get the algebraic system

AU = F,

where A is the (N − 1) × (N − 1) tridiagonal matrix with subdiagonal entries −(ε/h² + 1/(2h)), diagonal entries 2ε/h², and superdiagonal entries 1/(2h) − ε/h², that is

A = tridiag{ −(ε/h² + 1/(2h)),  2ε/h²,  1/(2h) − ε/h² },

and

U = (U_1, U_2, …, U_{N−2}, U_{N−1})ᵀ,   F = (ε/h² + 1/(2h), 0, …, 0)ᵀ.
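A NumPy sketch of assembling and solving this system (illustrative, not the code used for the reported tables); it also exhibits the oscillation threshold: the growth factor (2ε + h)/(2ε − h) of the solution Ǔ is negative exactly when h > 2ε, which is when the computed solution alternates in sign:

```python
import numpy as np

def solve_centered(eps, N):
    """Centered differences for -eps*u'' + u' = 0, u(0)=1, u(1)=0, h = 1/N."""
    h = 1.0 / N
    n = N - 1                            # interior unknowns U_1, ..., U_{N-1}
    sub = -(eps / h**2 + 1.0 / (2*h))    # coefficient of U_{i-1}
    dia = 2.0 * eps / h**2               # coefficient of U_i
    sup = 1.0 / (2*h) - eps / h**2       # coefficient of U_{i+1}
    A = (np.diag(np.full(n, dia))
         + np.diag(np.full(n - 1, sub), -1)
         + np.diag(np.full(n - 1, sup), 1))
    F = np.zeros(n)
    F[0] = eps / h**2 + 1.0 / (2*h)      # contribution of U_0 = 1
    return np.concatenate(([1.0], np.linalg.solve(A, F), [0.0]))

def oscillates(U):
    d = np.diff(U)
    return bool(np.any(d[:-1] * d[1:] < 0))  # the slope changes sign

U_smooth = solve_centered(0.1, 64)   # h = 1/64 < 2*eps: monotone
U_wiggly = solve_centered(1e-3, 16)  # h = 1/16 > 2*eps: oscillatory
```

This matches the observations in the tables: for h < 2ε the discrete solution is monotone like the exact one, while for h > 2ε it oscillates.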

4. Numerical Results of Central Difference Method

h  N_nodes  ‖u − u_h‖_{l∞}, ε=1  ‖u − u_h‖_{l∞}, ε=10−1  ‖u − u_h‖_{l∞}, ε=10−3  ‖u − u_h‖_{l∞}, ε=10−6


0.5 3 2.540669 × 10−3 7.566929 × 10−1 1.245000 × 102 ∞
0.25 5 6.175919 × 10−4 1.933238 × 10−1 3.050403 × 101 ∞
0.125 9 1.563835 × 10−4 5.570936 × 10−2 7.449173 × 100 ∞
0.0625 17 3.928711 × 10−5 1.211929 × 10−2 1.692902 × 100 ∞
0.03125 33 9.827515 × 10−6 3.018484 × 10−3 2.653958 × 10−1 ∞
0.015625 65 2.457936 × 10−6 7.484336 × 10−4 7.515267 × 10−3 ∞
0.007812 129 6.144675 × 10−7 1.870750 × 10−4 2.281210 × 10−9 ∞
0.003906 257 1.536257 × 10−7 4.674564 × 10−5 6.661338 × 10−16 ∞

Table 1: l∞ norms for the central difference method with ε ∈ {1, 10−1, 10−3, 10−6}

From Table 1, we get that

(a) when h < ε the scheme is convergent with optimal convergence order (Figure 2), i.e.

‖u − u_h‖_{l∞} ≈ 0.01 h^{1.9992},

(b) when h ≈ ε the scheme is convergent with optimal convergence order (Figure 2), i.e.

‖u − u_h‖_{l∞} ≈ 3.201 h^{2.0072},

(c) when h > ε the scheme is not stable and the solution oscillates.

Figure 2: linear regression for the l∞ norm with ε = 1 and ε = 0.1

5. Linearly Independent Solutions Û and Ǔ

(a) Linear independence
It is easy to check that C₁Û_i + C₂Ǔ_i = 0 for all i only when C₁ = C₂ = 0.
(b) Solutions to (4)
Checking Û_i = 1:

−ε (1 − 2·1 + 1)/h² + (1 − 1)/(2h) = 0.

Checking Ǔ_i = ((2ε/h + 1)/(2ε/h − 1))^i = r^i with r = (2ε + h)/(2ε − h):

−ε (r^{i−1} − 2r^i + r^{i+1})/h² + (r^{i+1} − r^{i−1})/(2h)
  = r^{i−1} [ −ε (1 − 2r + r²)/h² + (r² − 1)/(2h) ]
  = r^{i−1} (r − 1) [ −ε (r − 1)/h² + (r + 1)/(2h) ]
  = 0,

since r − 1 = 2h/(2ε − h) and r + 1 = 4ε/(2ε − h), so that

−ε (r − 1)/h² + (r + 1)/(2h) = −2ε/((2ε − h)h) + 2ε/((2ε − h)h) = 0.

(c) The representation of U in terms of Û and Ǔ

Since Û and Ǔ are solutions of (1), any linear combination

U = c₁Û + c₂Ǔ

is also a solution of (1). We also need this solution to satisfy the boundary conditions, so

c₁ + c₂ (2ε + h)/(2ε − h) = 1,
c₁ + c₂ ((2ε + h)/(2ε − h))^N = 0,

so

c₁ = −(2ε + h)^N / ((2ε + h)(2ε − h)^{N−1} − (2ε + h)^N),
c₂ = (2ε − h)^N / ((2ε + h)(2ε − h)^{N−1} − (2ε + h)^N).

6. Up-wind Finite Difference Scheme

Using the same partition as for the central difference scheme, the up-wind difference scheme is as follows:

−ε (U_{i−1} − 2U_i + U_{i+1})/h² + (U_i − U_{i−1})/h = 0,  i = 1, 2, …, N − 1,
U_0 = 1,  U_N = 0.

So

(a) when i = 1, we get

−ε (U_0 − 2U_1 + U_2)/h² + (U_1 − U_0)/h = 0,

i.e.

−(ε/h² + 1/h) U_0 + (2ε/h² + 1/h) U_1 − (ε/h²) U_2 = 0.

Since U_0 = 1, we get

(2ε/h² + 1/h) U_1 − (ε/h²) U_2 = ε/h² + 1/h.   (6)

(b) when i = 2, …, N − 2, we get

−ε (U_{i−1} − 2U_i + U_{i+1})/h² + (U_i − U_{i−1})/h = 0,

i.e.

−(ε/h² + 1/h) U_{i−1} + (2ε/h² + 1/h) U_i − (ε/h²) U_{i+1} = 0.   (7)

(c) when i = N − 1, we get

−ε (U_{N−2} − 2U_{N−1} + U_N)/h² + (U_{N−1} − U_{N−2})/h = 0,

i.e.

−(ε/h² + 1/h) U_{N−2} + (2ε/h² + 1/h) U_{N−1} − (ε/h²) U_N = 0.

Since U_N = 0,

−(ε/h² + 1/h) U_{N−2} + (2ε/h² + 1/h) U_{N−1} = 0.   (8)

From (6)-(8), we get the algebraic system

AU = F,

where A is the (N − 1) × (N − 1) tridiagonal matrix

A = tridiag{ −(ε/h² + 1/h),  2ε/h² + 1/h,  −ε/h² },

and

U = (U_1, U_2, …, U_{N−2}, U_{N−1})ᵀ,   F = (ε/h² + 1/h, 0, …, 0)ᵀ.

7. Numerical Results of Up-wind Difference Scheme

h  N_nodes  ‖u − u_h‖_{l∞}, ε=1  ‖u − u_h‖_{l∞}, ε=10−1  ‖u − u_h‖_{l∞}, ε=10−3  ‖u − u_h‖_{l∞}, ε=10−6


0.5 3 2.245933 × 10−2 1.361643 × 10−1 1.992032 × 10−3 ∞
0.25 5 1.270323 × 10−2 1.988791 × 10−1 1.587251 × 10−5 ∞
0.125 9 6.925118 × 10−3 1.571250 × 10−1 4.999060 × 10−7 ∞
0.0625 17 3.623644 × 10−3 9.196290 × 10−2 9.685710 × 10−10 ∞
0.03125 33 1.849028 × 10−3 5.061410 × 10−2 1.110223 × 10−15 ∞
0.015625 65 9.343457 × 10−4 2.695432 × 10−2 2.220446 × 10−16 ∞
0.007812 129 4.695265 × 10−4 1.391029 × 10−2 1.554312 × 10−15 ∞
0.003906 257 2.353710 × 10−4 7.064951 × 10−3 8.881784 × 10−16 ∞

Table 2: l∞ norms for the up-wind difference method with ε ∈ {1, 10−1, 10−3, 10−6}

From Table 2 we get that

(a) when h < ε the scheme is convergent with optimal convergence order (Figure 3), i.e.

‖u − u_h‖_{l∞} ≈ 0.0471 h^{0.946},

(b) when h ≈ ε the scheme is convergent, but the convergence order is not optimal (Figure 3), i.e.

‖u − u_h‖_{l∞} ≈ 0.4398 h^{0.6852},

(c) when h > ε the scheme is convergent, and the solution has no oscillations.

Figure 3: linear regression for the l∞ norm with ε = 1 and ε = 0.1

8. Maximum Principle for the Up-wind Difference Scheme

Lemma 0.1 Let A = tridiag{a_i, b_i, c_i}_{i=1}^{n} ∈ R^{n×n} be a tridiagonal matrix with the properties that

b_i > 0,  a_i, c_i ≤ 0,  a_i + b_i + c_i = 0.

Then the following maximum principle holds: if u ∈ R^n is such that (Au)_{i=2,…,n−1} ≤ 0, then u_i ≤ max{u_1, u_n}.

From the up-wind difference scheme, we get a_1 = 0, a_i = −(ε/h² + 1/h) for i = 2, …, n, b_i = 2ε/h² + 1/h for i = 1, …, n, and c_i = −ε/h² for i = 1, …, n − 1; moreover (Au)_{i=2,…,n−1} = 0. Therefore

b_i > 0,  a_i, c_i ≤ 0,  a_i + b_i + c_i = 0.

Since (Au)_{i=2,…,n−1} = 0, the matrix arising from the up-wind scheme satisfies the discrete maximum principle (Lemma 0.1).
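The hypotheses of Lemma 0.1 can be verified directly for the interior up-wind coefficients; a small sketch over the values of ε used above (h = 0.1 is an arbitrary illustrative choice):

```python
def upwind_row(eps, h):
    """Tridiagonal entries (a, b, c) of an interior up-wind row:
    a*U_{i-1} + b*U_i + c*U_{i+1} = 0."""
    a = -(eps / h**2 + 1.0 / h)      # subdiagonal
    b = 2.0 * eps / h**2 + 1.0 / h   # diagonal
    c = -eps / h**2                  # superdiagonal
    return a, b, c

for eps in (1.0, 1e-1, 1e-3, 1e-6):
    a, b, c = upwind_row(eps, 0.1)
    assert b > 0 and a <= 0 and c <= 0   # sign conditions hold for every eps > 0
    assert abs(a + b + c) < 1e-9         # row sum is zero
```

By contrast, the centered scheme violates these conditions as soon as h > 2ε, since its superdiagonal entry 1/(2h) − ε/h² becomes positive; this is another way to see the oscillations observed earlier.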

A Posteriori Error Estimation

Problem 2
1. Partition
I consider the following partition for the finite element method:

Figure 4: One-dimensional uniform partition for the finite element method

2. Basis Functions
I will use the linear basis functions, i.e. for each element I = [x_i, x_{i+1}]

φ₁(x) = (x_{i+1} − x)/(x_{i+1} − x_i),   φ₂(x) = (x − x_i)/(x_{i+1} − x_i).

3. Weak Formulation
Multiplying both sides of the problem by a test function v ∈ H₀¹ and integrating by parts, we get the following weak formulation:

∫₀¹ a(x) u′ v′ dx = ∫₀¹ f v dx.

4. Approximate Problem
The approximate problem is to find u_h ∈ H¹ s.t.

a(u_h, v_h) = f(v_h)   ∀v_h ∈ H₀¹,

where

a(u_h, v_h) = ∫₀¹ a(x) u_h′ v_h′ dx   and   f(v_h) = ∫₀¹ f v_h dx.
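A minimal P1 assembly in Python/NumPy for problem (2), with a(x) and f sampled at element midpoints (midpoint quadrature is an assumption here, not necessarily the quadrature used for the reported results):

```python
import numpy as np

def assemble_p1(nodes, a, f):
    """Stiffness matrix K and load vector F for -(a u')' = f with P1 elements."""
    n = len(nodes)
    K = np.zeros((n, n))
    F = np.zeros(n)
    for j in range(n - 1):
        h = nodes[j + 1] - nodes[j]
        xm = 0.5 * (nodes[j] + nodes[j + 1])   # element midpoint
        K[j:j + 2, j:j + 2] += a(xm) / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
        F[j:j + 2] += f(xm) * h / 2.0
    return K, F

# u(0) = 0, u(1) = 1 as in problem (2); with a = 1, f = 0 the solution is
# u(x) = x, which P1 elements reproduce exactly
nodes = np.linspace(0.0, 1.0, 33)
K, F = assemble_p1(nodes, lambda x: 1.0, lambda x: 0.0)
F_int = F[1:-1] - K[1:-1, -1] * 1.0            # lift the boundary value u(1) = 1
U = np.concatenate(([0.0], np.linalg.solve(K[1:-1, 1:-1], F_int), [1.0]))
```

The same assembly loop works on a non-uniform `nodes` array, which is what the adaptive refinement below requires.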

5. Numerical Results of Finite Element Method for Poisson Equation

(a) Problem: a(x) = 1, u_e = x³ and f = −6x.

h  N_nodes  ‖u − u_h‖_{L²}  |u − u_h|_{H¹}
1/4  5  1.791646 × 10−2  2.480392 × 10−1
1/8  9  4.502711 × 10−3  1.247556 × 10−1
1/16  17  1.127148 × 10−3  6.246947 × 10−2
1/32  33  2.818787 × 10−4  3.124619 × 10−2
1/64  65  7.047542 × 10−5  1.562452 × 10−2
1/128  129  1.761921 × 10−5  7.812440 × 10−3
1/256  257  4.404826 × 10−6  3.906243 × 10−3

Table 3: L² and H¹ errors of the finite element method for the Poisson equation.

Using linear regression (Figure 5), we can also see that the errors in Table 3 obey

‖u − u_h‖_{L²} ≈ 0.2870 h^{1.9987},
‖u − u_h‖_{H¹} ≈ 0.9935 h^{0.9986}.

These linear regressions indicate that the finite element method for this problem converges at the optimal rates, which are second order in the L² norm and first order in the H¹ norm.


Figure 5: linear regression for the L² and H¹ norm errors

(b) Problem: a(x) = 1, u_e = x^{3/2} and f = −3/(4√x).

h  N_nodes  ‖u − u_h‖_{L²}  |u − u_h|_{H¹}
1/4  5  7.625472 × 10−3  1.022294 × 10−1
1/8  9  2.029299 × 10−3  5.585353 × 10−2
1/16  17  5.324774 × 10−4  3.011300 × 10−2
1/32  33  1.378846 × 10−4  1.607571 × 10−2
1/64  65  3.523180 × 10−5  8.517032 × 10−3
1/128  129  8.876332 × 10−6  4.485323 × 10−3
1/256  257  2.203920 × 10−6  2.350599 × 10−3

Table 4: L² and H¹ errors of the finite element method for the Poisson equation.

Using linear regression (Figure 6), we can also see that the errors in Table 4 obey

‖u − u_h‖_{L²} ≈ 0.1193 h^{1.9593},
‖u − u_h‖_{H¹} ≈ 0.3682 h^{0.9081}.

These linear regressions indicate that the finite element method for this problem converges, but not at the optimal rates.


Figure 6: linear regression for the L² and H¹ norm errors

(c) Problem: f ≡ 1,

a(x) = 1 for 0 ≤ x < 1/π,   a(x) = 2 for 1/π ≤ x ≤ 1.

So the exact solution should be

u_e(x) = −(1/2)x² + (5π² + 1)/(2π(π + 1)) · x for 0 ≤ x < 1/π,
u_e(x) = −(1/4)x² + (5π² + 1)/(4π(π + 1)) · x + (5π − 1)/(4π(π + 1)) for 1/π ≤ x ≤ 1.

We cannot use a uniform mesh to compute this problem: the interface point 1/π would have to be a node, that is to say

n h = n / N_elem = 1/π,   i.e.   nπ = N_elem,   n, N_elem ∈ Z.

Since π is irrational, this is impossible, so we cannot generate such a mesh.

6. Adaptive Finite Element Method for the Poisson Equation

I will follow the standard local mesh refinement loop:

SOLVE → ESTIMATE → MARK → REFINE.

(a) Problem: a(x) = 1, u_e = x^{3/2} and f = −3/(4√x).

Iter  N_elem  ‖u − u_h‖_{L∞}  ‖u − u_h‖_{L²}  |u − u_h|_{H¹}


1 32 2.797720 × 10−5 1.378846 × 10−4 1.607571 × 10−2
2 47 1.022508 × 10−5 6.093669 × 10−5 9.927148 × 10−3
3 75 3.674022 × 10−6 2.038303 × 10−5 5.935496 × 10−3
4 102 1.313414 × 10−6 1.400631 × 10−5 4.239849 × 10−3
5 145 4.663453 × 10−7 6.119733 × 10−6 2.933869 × 10−3
6 171 1.654010 × 10−7 4.589394 × 10−6 2.432512 × 10−3
7 192 5.970786 × 10−8 4.010660 × 10−6 2.185324 × 10−3
8 208 5.956431 × 10−8 3.587483 × 10−6 2.034418 × 10−3
9 219 5.957050 × 10−8 3.297922 × 10−6 1.942123 × 10−3
10 229 5.976916 × 10−8 3.076573 × 10−6 1.864147 × 10−3

Table 5: L∞, L² and H¹ errors of the adaptive finite element method for the Poisson equation.

Using linear regression, we can also see that the errors (Figure 7) in Table 5 obey

‖u − u_h‖_{H¹} ≈ 0.6454 N_elem^{−1.0798}.

This linear regression indicates that the adaptive finite element method for this problem converges at the optimal rate, which is first order in the H¹ norm.

Figure 7: L² and H¹ norm errors for each iteration

(b) Problem: f ≡ 1,

a(x) = 1 for 0 ≤ x < 1/π,   a(x) = 2 for 1/π ≤ x ≤ 1.

So the exact solution should be

u_e(x) = −(1/2)x² + (5π² + 1)/(2π(π + 1)) · x for 0 ≤ x < 1/π,
u_e(x) = −(1/4)x² + (5π² + 1)/(4π(π + 1)) · x + (5π − 1)/(4π(π + 1)) for 1/π ≤ x ≤ 1.

Iter  N_elem  ‖u − u_h‖_{L∞}  ‖u − u_h‖_{L²}  |u − u_h|_{H¹}


1 2 5.652041 × 10−1 4.506966 × 10−1 9.637043 × 10−2
2 4 5.652041 × 10−1 4.626630 × 10−1 4.818522 × 10−2
3 8 5.652041 × 10−1 4.656590 × 10−1 2.409261 × 10−2
4 16 5.652041 × 10−1 4.664083 × 10−1 1.204630 × 10−2
5 32 5.652041 × 10−1 4.665956 × 10−1 6.023152 × 10−3
6 48 5.652041 × 10−1 4.666425 × 10−1 4.116248 × 10−3
7 96 5.652041 × 10−1 4.666542 × 10−1 2.058124 × 10−3
8 160 5.652041 × 10−1 4.666571 × 10−1 1.739956 × 10−3
9 192 5.652041 × 10−1 4.666571 × 10−1 1.029062 × 10−3
10 192 5.652041 × 10−1 4.666571 × 10−1 1.029062 × 10−3

Table 6: L∞, L2 and H1 errors of the finite element method for the interface problem.

Using linear regression, we can also see that the errors (Figure 8) in Table 6 obey

    ‖u − u_h‖_{H1} ≈ 0.1825 N_elem^{−0.9706}.

This linear regression indicates that the adaptive finite element method converges at the optimal rate for this problem, which is first order in the H1 norm.

Figure 8: L2 and H1 norm errors for each iteration (plot omitted).

Heat Equation

Problem 3
1. Partition
I consider the following partition for the finite element method:


Figure 9: one-dimensional uniform partition 0 = x_0 < x_1 < ··· < x_{N−1} < x_N = 1 (sketch omitted).

2. The corresponding values of f and u0

I choose the following values of f and u0:

    u_0 = sin(3πx),   f(t, x) = −2 sin(3πx)e^{−2t} + 9π² sin(3πx)e^{−2t}.
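These data correspond to the manufactured exact solution u(t, x) = sin(3πx)e^{−2t}, against which the errors below are measured. A quick finite-difference sanity check that u_t − u_xx = f (the step size 1e-5 and the test point (0.3, 0.4) are arbitrary choices):

```python
import math

u = lambda t, x: math.sin(3 * math.pi * x) * math.exp(-2 * t)
f = lambda t, x: (9 * math.pi**2 - 2) * math.sin(3 * math.pi * x) * math.exp(-2 * t)

eps, t0, x0 = 1e-5, 0.3, 0.4
u_t  = (u(t0 + eps, x0) - u(t0 - eps, x0)) / (2 * eps)               # central difference in t
u_xx = (u(t0, x0 + eps) - 2 * u(t0, x0) + u(t0, x0 - eps)) / eps**2  # central difference in x
residual = u_t - u_xx - f(t0, x0)
```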

3. θ-Method Scheme

The θ-method discretization of this problem is as follows:

    (U_i^{k+1} − U_i^k)/τ − θ (U_{i−1}^k − 2U_i^k + U_{i+1}^k)/h² − (1 − θ)(U_{i−1}^{k+1} − 2U_i^{k+1} + U_{i+1}^{k+1})/h² = θ f_i^k + (1 − θ) f_i^{k+1}.   (9)

(Note that with this convention θ = 1 gives the explicit Euler method, θ = 0 the implicit Euler method, and θ = 1/2 the Crank-Nicolson method.) Let µ = τ/h²; then the scheme (9) can be rewritten as

    U_i^{k+1} − U_i^k − θµ(U_{i−1}^k − 2U_i^k + U_{i+1}^k) − (1 − θ)µ(U_{i−1}^{k+1} − 2U_i^{k+1} + U_{i+1}^{k+1}) = θτ f_i^k + (1 − θ)τ f_i^{k+1}.

Collecting like terms, we get

    −(1 − θ)µ U_{i−1}^{k+1} + (2(1 − θ)µ + 1) U_i^{k+1} − (1 − θ)µ U_{i+1}^{k+1} = θµ U_{i−1}^k − (2θµ − 1) U_i^k + θµ U_{i+1}^k + θτ f_i^k + (1 − θ)τ f_i^{k+1}.

Since U(0) = U(1) = 0, the θ-scheme can be written in the following matrix form:

    A U^{k+1} = B U^k + F,

where A and B are the (N − 1) × (N − 1) tridiagonal matrices

    A = tridiag( −(1 − θ)µ,  2(1 − θ)µ + 1,  −(1 − θ)µ ),
    B = tridiag( θµ,  −(2θµ − 1),  θµ ),

and

    U^{k+1} = (U^{k+1}(x_1), U^{k+1}(x_2), …, U^{k+1}(x_{N−1}))^T,
    U^k = (U^k(x_1), U^k(x_2), …, U^k(x_{N−1}))^T,
    F = θτ (f^k(x_1), …, f^k(x_{N−1}))^T + (1 − θ)τ (f^{k+1}(x_1), …, f^{k+1}(x_{N−1}))^T.
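The matrix form above takes only a few lines to implement. A minimal sketch (dense numpy solves for simplicity, the manufactured data above, and function names of my own; recall θ = 0 is implicit Euler in this write-up's convention):

```python
import numpy as np

def theta_matrices(N, mu, theta):
    """A (implicit side) and B (explicit side) on the N-1 interior nodes."""
    n = N - 1
    D = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)  # second-difference stencil
    A = np.eye(n) - (1.0 - theta) * mu * D
    B = np.eye(n) + theta * mu * D
    return A, B

def heat_theta(N, tau, T, theta):
    """theta-scheme for u_t - u_xx = f on (0,1) with the data of this problem."""
    h = 1.0 / N
    mu = tau / h**2
    x = np.linspace(0.0, 1.0, N + 1)[1:-1]
    U = np.sin(3.0 * np.pi * x)                               # u0 on interior nodes
    A, B = theta_matrices(N, mu, theta)
    src = lambda t: (9.0 * np.pi**2 - 2.0) * np.sin(3.0 * np.pi * x) * np.exp(-2.0 * t)
    nsteps = round(T / tau)
    for k in range(nsteps):
        t = k * tau
        F = tau * (theta * src(t) + (1.0 - theta) * src(t + tau))
        U = np.linalg.solve(A, B @ U + F)
    return x, U, nsteps * tau

x, U, T = heat_theta(N=64, tau=1.0e-3, T=0.1, theta=0.0)      # theta = 0: implicit Euler here
err = np.max(np.abs(U - np.sin(3.0 * np.pi * x) * np.exp(-2.0 * T)))
```

In practice one would exploit the tridiagonal structure (e.g. a banded solver) rather than a dense solve, but the sketch suffices to reproduce the qualitative behavior of the tables below.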

4. Numerical Results of the Finite Difference Method (θ-Method) for the Heat Equation

(a) Numerical results for the θ-method with fixed τ = 1 × 10^{−5}

h      Nnodes  µ        ‖u − u_h‖_{L∞}, θ = 0   ‖u − u_h‖_{L∞}, θ = 1   ‖u − u_h‖_{L∞}, θ = 1/2

1/4    5       0.00016  8.794539 × 10^{−2}      8.794522 × 10^{−2}      8.794531 × 10^{−2}
1/8    9       0.00064  1.723827 × 10^{−2}      1.723819 × 10^{−2}      1.723823 × 10^{−2}
1/16   17      0.00256  4.076556 × 10^{−3}      4.076490 × 10^{−3}      4.076523 × 10^{−3}
1/32   33      0.01024  1.005390 × 10^{−3}      1.005327 × 10^{−3}      1.005359 × 10^{−3}
1/64   65      0.04096  2.505219 × 10^{−4}      2.504594 × 10^{−4}      2.532024 × 10^{−4}
1/128  129     0.16384  6.260098 × 10^{−5}      6.253858 × 10^{−5}      6.256978 × 10^{−5}

Table 7: L∞ norms for the θ-method with fixed τ = 1 × 10^{−5}

h      Nnodes  µ        ‖u − u_h‖_{L2}, θ = 0   ‖u − u_h‖_{L2}, θ = 1   ‖u − u_h‖_{L2}, θ = 1/2

1/4    5       0.00016  6.218678 × 10^{−2}      6.218666 × 10^{−2}      6.218672 × 10^{−2}
1/8    9       0.00064  1.218929 × 10^{−2}      1.218924 × 10^{−2}      1.218927 × 10^{−2}
1/16   17      0.00256  2.882561 × 10^{−3}      2.882514 × 10^{−3}      2.882537 × 10^{−3}
1/32   33      0.01024  7.109183 × 10^{−4}      7.108736 × 10^{−4}      7.108959 × 10^{−4}
1/64   65      0.04096  1.771458 × 10^{−4}      1.771015 × 10^{−4}      1.771236 × 10^{−4}
1/128  129     0.16384  4.426558 × 10^{−5}      4.422145 × 10^{−5}      4.424352 × 10^{−5}

Table 8: L2 norms for the θ-method with fixed τ = 1 × 10^{−5}

h      Nnodes  µ        ‖u − u_h‖_{H1}, θ = 0   ‖u − u_h‖_{H1}, θ = 1   ‖u − u_h‖_{H1}, θ = 1/2

1/4    5       0.00016  1.838499 × 10^{0}       1.838496 × 10^{0}       1.838497 × 10^{0}
1/8    9       0.00064  8.668172 × 10^{−1}      8.668132 × 10^{−1}      8.668152 × 10^{−1}
1/16   17      0.00256  4.284228 × 10^{−1}      4.284158 × 10^{−1}      4.284193 × 10^{−1}
1/32   33      0.01024  2.136338 × 10^{−1}      2.136204 × 10^{−1}      2.136271 × 10^{−1}
1/64   65      0.04096  1.067553 × 10^{−1}      1.067286 × 10^{−1}      1.067419 × 10^{−1}
1/128  129     0.16384  5.338867 × 10^{−2}      5.333545 × 10^{−2}      5.336206 × 10^{−2}

Table 9: H1 norms for the θ-method with fixed τ = 1 × 10^{−5}

From Tables 7-9, we can conclude that when µ < 0.5 the implicit Euler, explicit Euler, and Crank-Nicolson methods all converge with optimal order in space, which is second order in the L∞ and L2 norms and first order in the H1 norm.
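Besides a full regression, the spatial order can be estimated from successive mesh halvings, p ≈ log2(e(h)/e(h/2)). A sketch on the θ = 0 column of Table 8 (data copied from the table):

```python
import math

# L2 errors from Table 8 (theta = 0 column), h = 1/4, 1/8, ..., 1/128
e = [6.218678e-2, 1.218929e-2, 2.882561e-3, 7.109183e-4, 1.771458e-4, 4.426558e-5]

# halving h divides a second-order error by 4, so log2 of the ratio tends to 2
orders = [math.log2(e[i] / e[i + 1]) for i in range(len(e) - 1)]
```

The estimates approach 2 from above as h decreases, consistent with second-order spatial accuracy.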

(b) Numerical results for the θ-method with τ = √h (so that µ = τ/h² = h^{−3/2}, matching the µ column below)

h      Nnodes  µ        ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       8.00     9.334285 × 10^{−2}   6.600336 × 10^{−2}   1.951333 × 10^{0}
1/8    9       22.63    1.418498 × 10^{−1}   1.003029 × 10^{−1}   7.132843 × 10^{0}
1/16   17      64.00    5.067314 × 10^{−3}   3.583132 × 10^{−3}   5.325457 × 10^{−1}
1/32   33      181.02   3.744691 × 10^{−2}   2.647897 × 10^{−2}   7.957035 × 10^{0}
1/64   65      512      6.776843 × 10^{−4}   4.791952 × 10^{−4}   2.887826 × 10^{−1}
1/128  129     1228.15  8.093502 × 10^{−3}   5.722970 × 10^{−3}   6.902469 × 10^{0}
1/256  257     4096     2.192061 × 10^{−4}   1.550021 × 10^{−4}   3.739592 × 10^{−2}

Table 10: Error norms for the implicit Euler method with τ = √h

h      Nnodes  µ        ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       8.00     4.341161 × 10^{2}    3.069664 × 10^{2}    9.075199 × 10^{3}
1/8    9       22.63    8.631363 × 10^{1}    6.103296 × 10^{1}    4.340236 × 10^{3}
1/16   17      64.00    4.466761 × 10^{3}    3.158477 × 10^{3}    4.694310 × 10^{5}
1/32   33      181.02   2.482730 × 10^{3}    1.755559 × 10^{3}    5.275526 × 10^{5}
1/64   65      512      5.556307 × 10^{10}   2.439517 × 10^{10}   1.962496 × 10^{14}
1/128  129     1228.15  4.383362 × 10^{25}   1.193837 × 10^{25}   3.823127 × 10^{29}
1/256  257     4096     3.530479 × 10^{51}   1.095038 × 10^{51}   1.420743 × 10^{56}

Table 11: Error norms for the explicit Euler method with τ = √h

h      Nnodes  µ        ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       8.00     3.937504 × 10^{−1}   2.784236 × 10^{−1}   8.231355 × 10^{0}
1/8    9       22.63    4.372744 × 10^{−2}   3.091997 × 10^{−2}   2.198812 × 10^{0}
1/16   17      64.00    1.007102 × 10^{−2}   7.121285 × 10^{−3}   1.058406 × 10^{0}
1/32   33      181.02   3.858423 × 10^{−2}   2.728317 × 10^{−2}   8.198702 × 10^{0}
1/64   65      512      1.408511 × 10^{−4}   9.959676 × 10^{−5}   6.002108 × 10^{−2}
1/128  129     1228.15  7.776086 × 10^{−3}   5.498523 × 10^{−3}   6.631764 × 10^{0}
1/256  257     4096     1.158509 × 10^{−5}   8.191894 × 10^{−6}   1.976382 × 10^{−2}

Table 12: Error norms for the Crank-Nicolson method with τ = √h

From Tables 10-12, we can conclude that the implicit Euler and Crank-Nicolson methods are unconditionally stable, while the explicit Euler method is unstable when µ > 1/2.
(c) Numerical results for the θ-method with τ = h

h      Nnodes  µ     ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       4     9.048357 × 10^{−2}   6.398155 × 10^{−2}   1.891560 × 10^{0}
1/8    9       8     1.777939 × 10^{−2}   1.257192 × 10^{−2}   8.940271 × 10^{−1}
1/16   17      16    4.292498 × 10^{−3}   3.035255 × 10^{−3}   4.511170 × 10^{−1}
1/32   33      32    1.106397 × 10^{−3}   7.823405 × 10^{−4}   2.350965 × 10^{−1}
1/64   65      64    2.999114 × 10^{−4}   2.120694 × 10^{−4}   1.278017 × 10^{−1}
1/128  129     128   8.707869 × 10^{−5}   6.157393 × 10^{−5}   7.426427 × 10^{−2}
1/256  257     256   2.785209 × 10^{−5}   1.969440 × 10^{−5}   4.751484 × 10^{−2}

Table 13: Error norms for the implicit Euler method with τ = h

h      Nnodes  µ     ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       4     1.633634 × 10^{4}    1.155154 × 10^{4}    3.415113 × 10^{5}
1/8    9       8     4.782087 × 10^{6}    3.381446 × 10^{6}    2.404647 × 10^{8}
1/16   17      16    3.367080 × 10^{12}   2.023268 × 10^{12}   1.028718 × 10^{15}
1/32   33      32    1.762004 × 10^{51}   8.628878 × 10^{50}   1.756719 × 10^{54}
1/64   65      64    5.115840 × 10^{137}  2.577582 × 10^{137}  2.101478 × 10^{141}
1/128  129     128   4.972138 × 10^{−17}  ∞                    ∞
1/256  257     256   4.972138 × 10^{−17}  ∞                    ∞

Table 14: Error norms for the explicit Euler method with τ = h

h      Nnodes  µ     ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       4     1.115040 × 10^{−1}   7.884526 × 10^{−2}   2.330993 × 10^{0}
1/8    9       8     1.245553 × 10^{−2}   8.807388 × 10^{−3}   6.263197 × 10^{−1}
1/16   17      16    4.072106 × 10^{−3}   2.879414 × 10^{−3}   4.279551 × 10^{−1}
1/32   33      32    1.004329 × 10^{−3}   7.101680 × 10^{−4}   2.134083 × 10^{−1}
1/64   65      64    2.502360 × 10^{−4}   1.769436 × 10^{−4}   1.066335 × 10^{−1}
1/128  129     128   6.250630 × 10^{−5}   4.419863 × 10^{−5}   5.330792 × 10^{−2}
1/256  257     256   1.562328 × 10^{−5}   1.104733 × 10^{−5}   2.665286 × 10^{−2}

Table 15: Error norms for the Crank-Nicolson method with τ = h

From Tables 13-15, we can conclude that the implicit Euler and Crank-Nicolson methods are unconditionally stable, while the explicit Euler method is unstable when µ > 1/2.
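The instability threshold can be observed directly. A minimal sketch (assuming f = 0, a small high-frequency perturbation of the initial data to seed the unstable mode, and 50 steps; recall that θ = 1 is explicit Euler in this write-up's convention):

```python
import numpy as np

def step_matrices(n, mu, theta):
    # theta multiplies the EXPLICIT side here: theta = 1 -> explicit Euler, theta = 0 -> implicit
    D = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    return np.eye(n) - (1.0 - theta) * mu * D, np.eye(n) + theta * mu * D

def max_after_steps(n, mu, theta, nsteps=50):
    x = np.linspace(0.0, 1.0, n + 2)[1:-1]
    # smooth data plus a tiny highest-frequency component
    U = np.sin(3.0 * np.pi * x) + 1.0e-3 * np.sin(n * np.pi * x)
    A, B = step_matrices(n, mu, theta)
    for _ in range(nsteps):
        U = np.linalg.solve(A, B @ U)
    return np.max(np.abs(U))

n, mu = 31, 1.0                     # mu = tau/h^2 = 1 > 1/2
explicit = max_after_steps(n, mu, theta=1.0)
implicit = max_after_steps(n, mu, theta=0.0)
```

For µ = 1 the explicit amplification factor of the highest mode is close to 1 − 4µ = −3, so the perturbation grows geometrically, while every implicit factor lies in (0, 1) and the solution stays bounded.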
(d) Numerical results for the θ-method with τ = h²

h      Nnodes  µ   ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       1   8.849982 × 10^{−2}   6.257882 × 10^{−2}   1.850089 × 10^{0}
1/8    9       1   1.730081 × 10^{−2}   1.223352 × 10^{−2}   8.699621 × 10^{−1}
1/16   17      1   4.089480 × 10^{−3}   2.891699 × 10^{−3}   4.297810 × 10^{−1}
1/32   33      1   1.008450 × 10^{−3}   7.130822 × 10^{−4}   2.142840 × 10^{−1}
1/64   65      1   2.512547 × 10^{−4}   1.776639 × 10^{−4}   1.070675 × 10^{−1}
1/128  129     1   6.276023 × 10^{−5}   4.437819 × 10^{−5}   5.352449 × 10^{−2}
1/256  257     1   1.568672 × 10^{−5}   1.109219 × 10^{−5}   2.676109 × 10^{−2}

Table 16: Error norms for the implicit Euler method with τ = h²

h      Nnodes  µ   ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       1   8.603950 × 10^{5}    6.083912 × 10^{5}    1.798656 × 10^{7}
1/8    9       1   8.967110 × 10^{12}   6.340704 × 10^{12}   7.960153 × 10^{14}
1/16   17      1   3.903063 × 10^{104}  2.759883 × 10^{104}  1.406256 × 10^{107}
1/32   33      1   4.972138 × 10^{−17}  ∞                    ∞
1/64   65      1   4.972138 × 10^{−17}  ∞                    ∞
1/128  129     1   4.972138 × 10^{−17}  ∞                    ∞
1/256  257     1   4.972138 × 10^{−17}  ∞                    ∞

Table 17: Error norms for the explicit Euler method with τ = h²

h      Nnodes  µ   ‖u − u_h‖_{L∞}       ‖u − u_h‖_{L2}       ‖u − u_h‖_{H1}

1/4    5       1   8.793428 × 10^{−2}   6.217892 × 10^{−2}   1.838267 × 10^{0}
1/8    9       1   1.723790 × 10^{−2}   1.218904 × 10^{−2}   8.667990 × 10^{−1}
1/16   17      1   4.076506 × 10^{−3}   2.882525 × 10^{−3}   4.284175 × 10^{−1}
1/32   33      1   1.005358 × 10^{−3}   7.108952 × 10^{−4}   2.136269 × 10^{−1}
1/64   65      1   2.504906 × 10^{−4}   1.771236 × 10^{−4}   1.067419 × 10^{−1}
1/128  129     1   6.256978 × 10^{−5}   4.424351 × 10^{−5}   5.336206 × 10^{−2}
1/256  257     1   1.563914 × 10^{−5}   1.105854 × 10^{−5}   2.667992 × 10^{−2}

Table 18: Error norms for the Crank-Nicolson method with τ = h²

From Tables 16-18, we can conclude that the implicit Euler and Crank-Nicolson methods are unconditionally stable, while the explicit Euler method is unstable when µ > 1/2. Moreover, using linear regression (Figure 10) on the implicit Euler errors, we can see that the errors in Table 16 obey

    ‖u − u_h‖_{L2} ≈ 0.9435 h^{2.0580},
    ‖u − u_h‖_{H1} ≈ 7.2858 h^{1.0137}.

These linear regressions indicate that the method converges at the optimal rates for this problem, which are second order in the L2 norm and first order in the H1 norm.

Figure 10: linear regression for the L2 and H1 norm errors of the implicit Euler method with τ = h² (log-log plot omitted).

Similarly, using linear regression (Figure 11) for the Crank-Nicolson method, we can also see that the errors in Table 18 obey

    ‖u − u_h‖_{L2} ≈ 0.9382 h^{2.0574},
    ‖u − u_h‖_{H1} ≈ 7.2445 h^{1.0131}.

These linear regressions indicate that the method converges at the optimal rates for this problem, which are second order in the L2 norm and first order in the H1 norm.

Figure 11: linear regression for the L2 and H1 norm errors of the Crank-Nicolson method with τ = h² (log-log plot omitted).
