Matrices and Linear Algebra
Contents
26.1 Matrix algebra
  26.1.1 Determinant (s,mat,det)
  26.1.2 Eigenvalues and eigenvectors (s,mat,eig)
  26.1.3 Hermitian and symmetric matrices
  26.1.4 Matrix trace (s,mat,trace)
  26.1.5 Inversion formulas (s,mat,mil)
  26.1.6 Kronecker products (s,mat,kron)
26.2 Positive-definite matrices (s,mat,pd)
  26.2.1 Properties of positive-definite matrices
  26.2.2 Partial order
  26.2.3 Diagonal dominance
  26.2.4 Diagonal majorizers
  26.2.5 Simultaneous diagonalization
26.3 Vector norms (s,mat,vnorm)
  26.3.1 Examples of vector norms
  26.3.2 Inequalities
  26.3.3 Properties
26.4 Inner products (s,mat,inprod)
  26.4.1 Examples
  26.4.2 Properties
26.5 Matrix norms (s,mat,mnorm)
  26.5.1 Induced norms
  26.5.2 Other examples
  26.5.3 Properties
  26.5.4 Properties for square matrices
    26.5.4.1 Invertibility
    26.5.4.2 Relationship with spectral radius
26.6 Singular values (s,mat,svd)
26.7 Condition numbers and linear systems (s,mat,cond)
26.8 Adjoints (s,mat,adjoint)
  26.8.1 Examples
  26.8.2 Properties
26.9 Pseudo inverse / generalized inverse (s,mat,pseudo)
26.10 Matrices and derivatives (s,mat,grad)
26.11 The four spaces (s,mat,4space)
26.12 Principal components analysis (low-rank approximation) (s,mat,pca)
26.13 Problems (s,mat,prob)
26.14 Bibliography
This appendix reviews some of the linear algebra and matrix analysis results that are useful for developing iterative
algorithms and analyzing inverse problems.
In particular, the concept of the adjoint of a linear operator arises frequently when studying inverse problems,
generalizing the familiar concept of the transpose of a matrix. This appendix also sketches the basic elements of
functional analysis needed to describe an adjoint.
26.1 Matrix algebra
26.1.1 Determinant (s,mat,det)
The determinant of a square matrix A ∈ C^{n×n} can be computed by the cofactor (Laplace) expansion
det{A} = Σ_{j=1}^{n} (−1)^{i+j} a_{ij} det{A_{−i,−j}},
for any i ∈ {1, . . . , n}, where A_{−i,−j} denotes the (n − 1) × (n − 1) matrix formed by removing the ith row and jth column from A.
Properties of the determinant include the following.
• det{AB} = det{BA} when A and B are both n × n. (For rectangular A ∈ C^{m×n} and B ∈ C^{n×m} with m ≠ n this equality can fail, although the nonzero eigenvalues of AB and BA always coincide; see below.)
• det{A} = (det{A′})*, where "′" denotes the Hermitian transpose or conjugate transpose.
• A is singular (not invertible) if and only if det{A} = 0.
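The following small NumPy sketch (added for illustration, not part of the original text; the test matrices are arbitrary) checks the cofactor expansion and the first two determinant properties numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# det{AB} = det{BA} for square A and B
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(B @ A))

# det{A} = (det{A'})^*, where A' is the conjugate (Hermitian) transpose
assert np.isclose(np.linalg.det(A), np.conj(np.linalg.det(A.conj().T)))

# cofactor (Laplace) expansion along row i reproduces the determinant
i = 2
cof = sum((-1) ** (i + j) * A[i, j]
          * np.linalg.det(np.delete(np.delete(A, i, axis=0), j, axis=1))
          for j in range(n))
assert np.isclose(cof, np.linalg.det(A))
```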
26.1.2 Eigenvalues and eigenvectors (s,mat,eig)
A scalar λ ∈ C is an eigenvalue of a square matrix A ∈ C^{n×n}, with corresponding eigenvector x ≠ 0, iff
Ax = λx.
If A ∈ C^{m×n} and B ∈ C^{n×m}, then eig{AB} − {0} = eig{BA} − {0}, i.e., the nonzero elements of each set of eigenvalues are the same.
Proof. If λ ∈ eig{AB} − {0}, then ∃ x ≠ 0 such that ABx = λx ≠ 0. Clearly y ≜ Bx ≠ 0 here. Multiplying both sides by B yields BABx = λBx =⇒ BAy = λy, so λ ∈ eig{BA} − {0}. The reverse inclusion follows by swapping the roles of A and B. □
• If A ∈ C^{n×n}, then det{A} = ∏_{i=1}^{n} λ_i.
• eig{A′} = {λ* : λ ∈ eig{A}}.
• If A is invertible, then λ ∈ eig{A} iff 1/λ ∈ eig{A⁻¹}.
• For k ∈ N: eig{A^k} = {λ^k : λ ∈ eig{A}}.
• The spectral radius of a matrix A is defined as the largest eigenvalue magnitude:
ρ(A) ≜ max{ |λ| : λ ∈ eig{A} }. (26.1.2)
In particular, because the nonzero eigenvalues of AB and BA coincide,
ρ(AB) = ρ(BA) for A ∈ C^{m×n}, B ∈ C^{n×m}. (26.1.3)
• Weyl’s inequality [wiki] is useful for bounding the (real) eigenvalues of sums of Hermitian matrices. In particular
it is useful for analyzing how matrix perturbations affect eigenvalues.
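As a quick numerical illustration of the eigenvalue facts above (a sketch added here, not from the original text), one can verify that the nonzero eigenvalues of AB and BA coincide, and hence that ρ(AB) = ρ(BA) as in (26.1.3):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))

eig_ab = np.linalg.eigvals(A @ B)   # m eigenvalues (at least m - n are ~0)
eig_ba = np.linalg.eigvals(B @ A)   # n eigenvalues

# every nonzero eigenvalue of AB is (numerically) also an eigenvalue of BA
for lam in eig_ab:
    if abs(lam) > 1e-8:
        assert np.min(np.abs(eig_ba - lam)) < 1e-6

# hence the spectral radii agree: rho(AB) = rho(BA)
assert np.isclose(np.max(np.abs(eig_ab)), np.max(np.abs(eig_ba)))
```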
26.1.3 Hermitian and symmetric matrices
Any Hermitian matrix A = A′ ∈ C^{n×n} has an orthonormal eigendecomposition
A = U diag{λ_i} U′,
where U is unitary, i.e., U⁻¹ = U′.
• The eigenvalues of a Hermitian matrix are all real [1, p. 170].
• If A is Hermitian, then x′Ax is real for all x ∈ C^n [1, p. 170].
• If A is Hermitian, then (see Problem 26.1 for consideration of whether this condition is necessary):
x′Ax ≤ ρ(A) ‖x‖², ∀x ∈ C^n. (26.1.5)
• If A is real and symmetric, then we have an upper bound that is sometimes tighter [1, p. 34]:
x′Ax ≤ ( max_{λ ∈ eig{A}} λ ) ‖x‖², ∀x ∈ R^n.
26.1.4 Matrix trace (s,mat,trace)
The trace of a square matrix A ∈ C^{n×n} is the sum of its diagonal elements, trace{A} ≜ Σ_{i=1}^{n} a_{ii}, and it equals the sum of the eigenvalues:
trace{A} = Σ_{i=1}^{n} λ_i. (26.1.8)
The trace also satisfies trace{AB} = trace{BA} whenever both products are defined.
26.1.5 Inversion formulas (s,mat,mil)
The matrix inversion lemma [2] states that
[A + BCD]⁻¹ = A⁻¹ − A⁻¹B [C⁻¹ + DA⁻¹B]⁻¹ DA⁻¹,
assuming that A and C are invertible. It is also known as the Sherman-Morrison-Woodbury formula [3–5]. (See [6] for the case where A is singular but positive semidefinite.)
Multiplying on the right by B and simplifying yields the following useful related equality, sometimes called the push-through identity:
[A + BCD]⁻¹ B = A⁻¹B [DA⁻¹B + C⁻¹]⁻¹ C⁻¹. (26.1.10)
The following inverse of 2 × 2 block matrices holds if A and B are invertible:
[A, D; C, B]⁻¹ = [ [A − DB⁻¹C]⁻¹, −A⁻¹D∆⁻¹ ; −∆⁻¹CA⁻¹, ∆⁻¹ ], (26.1.11)
where ∆ = B − CA⁻¹D is the Schur complement of A, and ";" separates the block rows. A generalization is available even when B is not invertible [7, p. 656].
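A small numerical check of the matrix inversion lemma and the push-through identity (26.1.10) — a sketch with arbitrary, well-conditioned test matrices, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well conditioned
B = rng.standard_normal((n, k))
C = rng.standard_normal((k, k)) + k * np.eye(k)
D = rng.standard_normal((k, n))
Ai, Ci = np.linalg.inv(A), np.linalg.inv(C)

# matrix inversion lemma: [A + BCD]^{-1} = A^{-1} - A^{-1}B[C^{-1} + D A^{-1} B]^{-1} D A^{-1}
lhs = np.linalg.inv(A + B @ C @ D)
rhs = Ai - Ai @ B @ np.linalg.inv(Ci + D @ Ai @ B) @ D @ Ai
assert np.allclose(lhs, rhs)

# push-through identity (26.1.10): [A + BCD]^{-1} B = A^{-1} B [D A^{-1} B + C^{-1}]^{-1} C^{-1}
assert np.allclose(lhs @ B, Ai @ B @ np.linalg.inv(D @ Ai @ B + Ci) @ Ci)
```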
26.1.6 Kronecker products (s,mat,kron)
The Kronecker product of A ∈ C^{L×M} with B ∈ C^{K×N} is the LK × MN block matrix
A ⊗ B ≜ [ a_{11}B ... a_{1M}B ; ... ; a_{L1}B ... a_{LM}B ]. (26.1.12)
Properties of the Kronecker product include the following (among many others [8]):
• (A ⊗ B)(C ⊗ D) = (AC ⊗ BD) if the dimensions are compatible.
• In particular (A ⊗ B)(u ⊗ v) = (A u) ⊗ (B v).
If A and B are Toeplitz or circulant matrices, then this property is the matrix analog of the separability property of
2D convolution.
• (A ⊗ B)⁻¹ = A⁻¹ ⊗ B⁻¹ if A and B are invertible.
• (A ⊗ B)′ = A′ ⊗ B′
• [A ⊗ B]_{(l−1)K+k, (m−1)N+n} = a_{lm} b_{kn}, for l = 1, . . . , L, k = 1, . . . , K, n = 1, . . . , N, m = 1, . . . , M.
• det{A ⊗ B} = (det{A})^m (det{B})^n if A is n × n and B is m × m.
• If A u = λ u and B v = η v then u ⊗ v is an eigenvector of A ⊗ B with eigenvalue λη.
• If A has singular values {σi } and B has singular values {ηj }, then A ⊗ B has singular values {σi ηj }.
In the context of imaging problems, Kronecker products are useful for representing separable operations such as
convolution with a separable kernel and the 2D DFT. To see this, consider that the linear operation
v[k] = Σ_{n=0}^{N−1} b[k, n] u[n], k = 0, . . . , K − 1,
can be represented by the matrix-vector product v = B u, where B is the K × N matrix with elements b[k, n].
Similarly, the separable 2D operation
v[k, l] = Σ_{m=0}^{M−1} a[l, m] ( Σ_{n=0}^{N−1} b[k, n] u[n, m] ), k = 0, . . . , K − 1, l = 0, . . . , L − 1,
can be represented by the matrix-vector product v = C u, where C = A ⊗ B and A is the L × M matrix with elements a[l, m]. Choosing b[k, n] = e^{−ı(2π/N)kn} and a[l, m] = e^{−ı(2π/M)lm} shows that the matrix representation of the (N, M)-point 2D DFT is Q_{2D} = Q_M ⊗ Q_N, where Q_N denotes the N-point 1D DFT matrix.
For an M × N matrix G, let lex{G} denote the column vector formed by lexicographic ordering of its elements, i.e., lex{G} = (g_{11}, . . . , g_{M1}, g_{12}, . . . , g_{M2}, . . . , g_{1N}, . . . , g_{MN}), sometimes denoted vec(G). Then one can show that, for matrices A and B of compatible dimensions,
lex{B G A^T} = (A ⊗ B) lex{G}.
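The following NumPy sketch (added for illustration; not from the original text) verifies that the separable 2D DFT factors as a Kronecker product acting on the lexicographically ordered (column-major) image, and that lex{B G A^T} = (A ⊗ B) lex{G}:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 4, 3                      # image U has N rows (index n) and M columns (index m)
U = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

# 1D DFT matrices Q_N and Q_M
QN = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
QM = np.exp(-2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M)

# lex{G} = vec(G): stack the columns (column-major / Fortran order)
vec = lambda G: G.flatten(order="F")

# separable 2D operation V = B U A^T with B = QN, A = QM
V = QN @ U @ QM.T

# lex{B U A^T} = (A kron B) lex{U}
assert np.allclose(np.kron(QM, QN) @ vec(U), vec(V))

# and V is exactly the (N, M)-point 2D DFT of U
assert np.allclose(V, np.fft.fft2(U))
```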
26.2 Positive-definite matrices (s,mat,pd)
Definition 26.2.1 For an n × n matrix M that is Hermitian symmetric, we say M is positive definite [1, p. 396] iff x′Mx > 0 for all x ≠ 0 ∈ C^n.
26.2.2 Partial order
The notation B ≻ A is shorthand for saying that B − A is positive definite. This is a strict partial order; in particular it is transitive: C ≻ B and B ≻ A imply C ≻ A. Likewise, B ⪰ A is shorthand for saying that B − A is positive semidefinite, and ⪰ is also transitive. This partial order of matrices is called the Loewner order [wiki]. These inequalities are important for designing majorizers. The following results are useful properties of these inequalities.
Lemma 26.2.3 If B ⪰ A, then C′BC ⪰ C′AC for any matrix C of suitable dimensions. (See Problem 26.5.)
Theorem 26.2.4 If B ⪰ A ≻ 0, then A⁻¹ ⪰ B⁻¹ ≻ 0. (See Problem 26.6.) In words: matrix inversion reverses the natural (partial) ordering of symmetric positive-definite matrices.
Corollary 26.2.6 If B is a Hermitian matrix, then B ⪯ D ≜ diag{|B| 1}, where |B| denotes the matrix consisting of the absolute values of the elements of B and 1 denotes the vector of ones.
Proof:
Let H ≜ D − B = diag{|B| 1} − B. Then h_{ii} = Σ_j |b_{ij}| − b_{ii} = Σ_{j≠i} |b_{ij}| + (|b_{ii}| − b_{ii}) ≥ Σ_{j≠i} |b_{ij}|, because |b| − b ≥ 0. Also, for j ≠ i: h_{ij} = −b_{ij}, so Σ_{j≠i} |h_{ij}| = Σ_{j≠i} |b_{ij}| ≤ h_{ii}. Thus H is diagonally dominant with nonnegative diagonal, hence H ⪰ 0 by the Geršgorin theorem, i.e., D − B ⪰ 0. □
When A (and W) have nonnegative elements (e.g., in CT, PET, SPECT), an alternative, simpler proof is to use Corollary 26.2.6 directly with B = A′WA.
The following theorem generalizes Corollary 26.2.7 (cf. (12.5.14)). See Problem 26.12.
Theorem 26.2.8 For B ∈ C^{n_d × n_p} and any π_{ij} ≥ 0 with Σ_{j=1}^{n_p} π_{ij} = 1 for which π_{ij} = 0 only if b_{ij} = 0:
B′B ⪯ D ≜ diag{d_j}, d_j ≜ Σ_{i=1}^{n_d} |b_{ij}|² / π_{ij}, (26.2.2)
with the convention that terms having b_{ij} = 0 contribute zero to the sum.
Proof:
x′B′Bx = Σ_{i=1}^{n_d} | Σ_{j=1}^{n_p} b_{ij} x_j |² = Σ_{i=1}^{n_d} | Σ_{j=1}^{n_p} π_{ij} (b_{ij} x_j / π_{ij}) |² ≤ Σ_{i=1}^{n_d} Σ_{j=1}^{n_p} π_{ij} |b_{ij} x_j / π_{ij}|² = Σ_{j=1}^{n_p} |x_j|² d_j = x′Dx,
using the convexity of |·|². □
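The following sketch (not from the original text; the matrix is arbitrary) builds the diagonal majorizer of Theorem 26.2.8 with π_{ij} chosen proportional to |b_{ij}| within each row, and confirms numerically that D − B′B is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(4)
nd, np_ = 8, 5                                   # np_ avoids shadowing numpy's np
B = rng.standard_normal((nd, np_)) * (rng.random((nd, np_)) < 0.6)  # some zeros

# pi_ij proportional to |b_ij| in each row, so pi_ij = 0 only where b_ij = 0
row_sums = np.abs(B).sum(axis=1, keepdims=True)
pi = np.abs(B) / np.where(row_sums > 0, row_sums, 1.0)

# d_j = sum_i |b_ij|^2 / pi_ij   (terms with b_ij = 0 contribute nothing)
terms = np.zeros_like(B)
mask = pi > 0
terms[mask] = np.abs(B[mask]) ** 2 / pi[mask]
d = terms.sum(axis=0)

# check B'B <= D in the Loewner order: D - B'B is positive semidefinite
H = np.diag(d) - B.T @ B
assert np.min(np.linalg.eigvalsh(H)) >= -1e-10
```

This particular choice of π_{ij} yields d_j = Σ_i |b_{ij}| Σ_k |b_{ik}|, a diagonal commonly used for separable quadratic surrogates.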
26.3 Vector norms (s,mat,vnorm)
Definition 26.3.1 Let V be a vector space over a field such as R or C. A function ‖·‖ : V → R is a vector norm iff for all x, y ∈ V:
• ‖x‖ ≥ 0 (nonnegative)
• ‖x‖ = 0 iff x = 0 (positive)
• ‖cx‖ = |c| ‖x‖ for all scalars c in the field (homogeneous)
• ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)
The ∞-norm is ‖x‖_∞ ≜ sup_i |x_i|, where sup denotes the supremum (least upper bound) of a set. One can show [10, Prob. 2.12] that
‖x‖_∞ = lim_{p→∞} ‖x‖_p.
However, the "0-norm" ‖x‖_0, the number of nonzero elements of x, is not a vector norm because it does not satisfy all the conditions of Definition 26.3.1. The proper name for ‖x‖_0 is counting measure.
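A brief numerical illustration (not from the original text) of the p-norm limit and of why ‖·‖_0 fails the homogeneity condition:

```python
import numpy as np

x = np.array([3.0, -1.0, 0.0, 2.0])

# p-norms approach the max-norm as p grows
for p in (1, 2, 4, 16, 64):
    print(p, np.linalg.norm(x, p))
print("inf", np.linalg.norm(x, np.inf))      # = max_i |x_i| = 3

# the "0-norm" (count of nonzeros) is not homogeneous: ||c x||_0 != |c| ||x||_0
assert np.count_nonzero(2.0 * x) == np.count_nonzero(x)   # unchanged by scaling
```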
26.3.2 Inequalities
To establish that (26.3.1) and (26.3.2) are vector norms, the hardest part is proving the triangle inequality. The proofs use the following inequalities.
The Hölder inequality [10, p. 29]: if p ∈ [1, ∞] and q ∈ [1, ∞] satisfy 1/p + 1/q = 1, and if x = (x_1, x_2, . . .) ∈ ℓ_p and y = (y_1, y_2, . . .) ∈ ℓ_q, then
Σ_n |x_n y_n| ≤ ‖x‖_p ‖y‖_q.
26.3.3 Properties
• If ‖·‖ is a vector norm, then
‖x‖_T ≜ ‖Tx‖ (26.3.7)
is also a vector norm for any nonsingular¹ matrix T (with appropriate dimensions).
• Let ‖·‖_α and ‖·‖_β be any two vector norms on a finite-dimensional space. Then there exist finite positive constants C_m and C_M such that (see Problem 26.3):
C_m ‖x‖_β ≤ ‖x‖_α ≤ C_M ‖x‖_β, ∀x. (26.3.8)
26.4 Inner products (s,mat,inprod)
For a vector space V over the field C, an inner product operation ⟨·, ·⟩ : V × V → C must satisfy the following axioms ∀x, y, z ∈ V, α ∈ C.
• ⟨x, y⟩ = ⟨y, x⟩* (Hermitian symmetry), where * denotes complex conjugate.
• ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (additivity)
• ⟨αx, y⟩ = α ⟨x, y⟩ (scaling)
• ⟨x, x⟩ ≥ 0 and ⟨x, x⟩ = 0 iff x = 0. (positive definite)
Every inner product induces a norm via
‖x‖ ≜ √(⟨x, x⟩). (26.4.1)
26.4.1 Examples
Example 26.4.1 For the space of (suitably regular) functions on [a, b], a valid inner product is
⟨f, g⟩ = ∫_a^b w(t) f(t) g*(t) dt,
where w(t) > 0, ∀t is some (real) weighting function. The usual choice is w = 1.
Example 26.4.2 In Euclidean space, C^n, the usual inner product (aka "dot product") is
⟨x, y⟩ = Σ_{i=1}^{n} x_i y_i*, where x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n).
¹ We also use the notation ‖x‖_T even when T might be singular, in which case the resulting functional is a semi-norm rather than a norm, because the positivity condition in Definition 26.3.1 no longer holds.
26.4.2 Properties
• Bilinearity:
⟨ Σ_i α_i x_i, Σ_j β_j y_j ⟩ = Σ_i Σ_j α_i β_j* ⟨x_i, y_j⟩.
• The Cauchy-Schwarz inequality:
|⟨x, y⟩| ≤ ‖x‖ ‖y‖,
for a norm ‖·‖ induced by an inner product ⟨·, ·⟩ via (26.4.1), with equality iff x and y are linearly dependent.
26.5 Matrix norms (s,mat,mnorm)
Definition 26.5.1 A function ‖·‖ : C^{m×n} → R is a (vector) norm for C^{m×n} iff it satisfies the following properties for all A ∈ C^{m×n} and B ∈ C^{m×n}:
• ‖A‖ ≥ 0 (nonnegative)
• ‖A‖ = 0 iff A = 0 (positive)
• ‖cA‖ = |c| ‖A‖ for all c ∈ C (homogeneous)
• ‖A + B‖ ≤ ‖A‖ + ‖B‖ (triangle inequality)
In addition, many, but not all, norms for the space C^{n×n} of square matrices are submultiplicative, meaning that they satisfy the following inequality:
|||AB||| ≤ |||A||| |||B|||, ∀A, B ∈ C^{n×n}. (26.5.1)
We use the notation |||·||| to distinguish such matrix norms on C^{n×n} from the ordinary vector norms ‖·‖ on C^{m×n} that need not satisfy this extra condition.
For example, the max norm on C^{m×n} is the element-wise maximum: ‖A‖_max = max_{i,j} |a_{ij}|. This is a (vector) norm on C^{m×n} but does not satisfy the submultiplicative condition (26.5.1). Most of the norms of interest in imaging problems are submultiplicative, so these matrix norms are our primary focus hereafter.
26.5.1 Induced norms
• Given a vector norm ‖·‖, the corresponding induced matrix norm (or operator norm) is
|||A||| ≜ max_{x≠0} ‖Ax‖ / ‖x‖.
• The spectral norm |||·|||_2, often denoted simply |||·|||, is defined on C^{m×n} by
|||A|||_2 ≜ max{ √λ : λ ∈ eig{A′A} },
which is real and nonnegative. This is the matrix norm induced by the Euclidean vector norm ‖·‖_2, i.e.,
|||A|||_2 = max_{x≠0} ‖Ax‖_2 / ‖x‖_2.
26.5.2 Other examples
• The Frobenius norm is defined by
|||A|||_Frob ≜ ( Σ_{i=1}^{m} Σ_{j=1}^{n} |a_{ij}|² )^{1/2} = √( trace{A′A} ),
and is also called the Schur norm and Hilbert-Schmidt norm. It is often the easiest norm to compute.
This norm is invariant to unitary transformations [11, p. 442], because of the trace property (26.1.7).
This is not an induced norm [12], but nevertheless it is compatible with the Euclidean vector norm because
‖Ax‖_2 ≤ |||A|||_Frob ‖x‖_2. (26.5.7)
However, this is not a tight upper bound in general. By combining (26.5.7) with the definition of matrix multiplication, one can show easily that the Frobenius norm is submultiplicative [1, p. 291].
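The following NumPy sketch (illustrative only, not from the original text) checks the compatibility inequality (26.5.7), the rank bound (26.5.8) below, and submultiplicativity (26.5.1) for the spectral and Frobenius norms:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 3))
x = rng.standard_normal(4)

spec = lambda M: np.linalg.norm(M, 2)        # spectral norm = largest singular value
frob = lambda M: np.linalg.norm(M, "fro")

# compatibility (26.5.7) and the rank bound (26.5.8)
r = np.linalg.matrix_rank(A)
assert np.linalg.norm(A @ x) <= frob(A) * np.linalg.norm(x) + 1e-12
assert spec(A) <= frob(A) <= np.sqrt(r) * spec(A) + 1e-12

# both norms are submultiplicative (26.5.1)
assert spec(A @ B) <= spec(A) * spec(B) + 1e-12
assert frob(A @ B) <= frob(A) * frob(B) + 1e-12
```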
26.5.3 Properties
• All matrix norms are equivalent in the sense given for vectors in (26.3.8).
• Two vector norms can induce the same matrix norm if and only if one of the vector norms is a constant scalar
multiple of the other.
• No induced matrix norm can be uniformly dominated by another induced matrix norm: |||A|||_α ≤ |||A|||_β, ∀A ∈ C^{m×n}, if and only if |||A|||_α = |||A|||_β, ∀A.
• A unitarily invariant matrix norm satisfies |||A||| = |||U AV ||| for all A ∈ Cm×n and all unitary matrices
U ∈ Cm×m , V ∈ Cn×n .
The spectral norm ||| · |||2 is the only matrix norm that is both induced and unitarily invariant.
• A self-adjoint matrix norm satisfies |||A′||| = |||A|||.
The spectral norm |||·|||_2 is the only matrix norm that is both induced and self-adjoint.
• If A ∈ C^{m×n} has rank r ≤ min(m, n), then [13, p. 57]:
|||A|||_2 ≤ |||A|||_Frob ≤ √r |||A|||_2. (26.5.8)
• By [13, p. 58],
|||A|||_2 ≤ √( |||A|||_1 |||A|||_∞ ).
• Using the spectral radius ρ(·) defined in (26.1.2):
|||A|||_2 = √( ρ(A′A) ). (26.5.9)
26.5.4 Properties for square matrices
• If |||·||| is a matrix norm on C^{n×n}, and if T ∈ C^{n×n} is invertible, then the following is a matrix norm:
|||A|||_T ≜ |||T A T⁻¹|||.
26.5.4.1 Invertibility
• If A is invertible, then
|||A⁻¹||| ≥ |||I||| / |||A|||.
26.5.4.2 Relationship with spectral radius
• Every submultiplicative matrix norm satisfies ρ(A) ≤ |||A|||.
• If A is Hermitian (more generally, normal), then
|||A|||_2 = ρ(A).
• If A is Hermitian positive semidefinite, then
|||A|||_2 ≤ 1 ⟺ A ⪯ I.
26.6 Singular values (s,mat,svd)
Eigenvalues are defined only for square matrices. For any rectangular matrix A ∈ C^{m×n}, the singular values, denoted σ_1, . . . , σ_n, are the square roots of the eigenvalues of the n × n square matrix A′A. (Because A′A is positive semidefinite, its eigenvalues are all real and nonnegative.) Written concisely:
σ_i(A) ≜ √( λ_i(A′A) ).
26.7 Condition numbers and linear systems (s,mat,cond)
The condition number [wiki] for matrix inversion with respect to a matrix norm |||·||| is defined:
κ(A) ≜ { |||A||| |||A⁻¹|||, A invertible; ∞, A singular }. (26.7.1)
For the spectral norm, κ(A) = σ_max / σ_min for invertible A, where σ_max and σ_min denote the maximum and minimum singular values of A. A concept of condition number has also been developed for problems with constraints [14]. Condition numbers are submultiplicative:
κ(AB) ≤ κ(A) κ(B).
Suppose we want to solve Ax = b, but the right-hand side is perturbed (e.g., by noise or numerical error), so instead we solve A x̂ = b + ε. Then the error propagation depends on the condition number [1, p. 338]:
‖x − x̂‖ / ‖x‖ ≤ κ(A) ‖ε‖ / ‖b‖.
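To illustrate this bound (a sketch, not from the original text), one can construct a matrix with a prescribed spectral-norm condition number via its SVD and compare the observed relative error to κ(A)‖ε‖/‖b‖:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 8
# build a matrix with a prescribed (2-norm) condition number via its SVD
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -4, n)                     # sigma_max / sigma_min = 1e4
A = U @ np.diag(s) @ V.T
kappa = np.linalg.cond(A, 2)                  # ~= 1e4

x = rng.standard_normal(n)
b = A @ x
eps = 1e-8 * rng.standard_normal(n)           # perturb the right-hand side
x_hat = np.linalg.solve(A, b + eps)

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
bound = kappa * np.linalg.norm(eps) / np.linalg.norm(b)
assert rel_err <= bound * (1 + 1e-8)          # error is within the predicted bound
```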
26.8 Adjoints (s,mat,adjoint)
Recall the following fact from linear algebra. If A ∈ C^{m×n}, then
⟨Ax, y⟩ = ⟨x, A′y⟩, ∀x ∈ C^n, y ∈ C^m,
where A′ denotes the Hermitian transpose of A. For analyzing some image reconstruction problems, we need to generalize the above relationship to operators A in function spaces (specifically Hilbert spaces). The appropriate generalization of "transpose" is called the adjoint of A and is denoted A* [17, p. 352].
Let X and Y denote vector spaces with inner products ⟨·, ·⟩_X and ⟨·, ·⟩_Y respectively. Let B(X, Y) denote the space of bounded linear operators from X to Y, i.e., if A ∈ B(X, Y) then the following supremum is finite:
|||A||| ≜ sup_{f ∈ X, f ≠ 0} ‖A f‖_Y / ‖f‖_X,
where ‖·‖_X is the norm on X corresponding to ⟨·, ·⟩_X, defined in (26.4.1), and likewise for ‖·‖_Y.
If X and Y are Hilbert spaces, i.e., complete vector spaces under their respective inner products, and if A ∈ B(X, Y), then one can show that there exists a unique bounded linear operator A* ∈ B(Y, X), called the adjoint of A, that satisfies
⟨A f, g⟩_Y = ⟨f, A* g⟩_X, ∀ f ∈ X, g ∈ Y. (26.8.1)
26.8.1 Examples
If X = C^n and Y = C^m and A ∈ C^{m×n}, then A* = A′. So adjoint and transpose are the same in Euclidean space.
As another finite-dimensional example, consider X = C^{n×n} and Y = C and the trace operator A : C^{n×n} → C defined by y = AX iff y = trace{X}. To determine the adjoint we massage the inner products:
⟨AX, y⟩_Y = Σ_{i=1}^{n} X_{ii} y* = Σ_{i,j=1}^{n} δ[i − j] X_{ij} y* = Σ_{i,j=1}^{n} X_{ij} (δ[i − j] y)* = Σ_{i,j=1}^{n} X_{ij} [I_n y]*_{ij} = ⟨X, I_n y⟩_X,
so the adjoint is A* y = y I_n.
Example 26.8.1 Consider X = ℓ_2, the space of square summable sequences, and Y = L_2[−π, π], the space of square integrable functions on [−π, π]. The discrete-time Fourier transform (DTFT) operator A : X → Y is defined by
F = A f ⟺ F(ω) = Σ_{n=−∞}^{∞} e^{−ıωn} f_n, ∀ω ∈ [−π, π].
The fact that this linear operator is bounded is equivalent to Parseval's theorem:
‖F‖²_Y = ∫_{−π}^{π} |F(ω)|² dω = 2π Σ_{n=−∞}^{∞} |f_n|² = 2π ‖f‖²_X.
Thus |||A||| = √(2π). To determine the adjoint, manipulate the inner product:
⟨A f, G⟩_Y = ∫_{−π}^{π} (A f)(ω) G*(ω) dω = ∫_{−π}^{π} ( Σ_{n=−∞}^{∞} e^{−ıωn} f_n ) G*(ω) dω
= Σ_{n=−∞}^{∞} f_n ( ∫_{−π}^{π} e^{ıωn} G(ω) dω )* = ⟨f, A* G⟩_X,
where
[A* G]_n = ∫_{−π}^{π} e^{ıωn} G(ω) dω.
Example 26.8.2 Consider X = Y = ℓ_2 and the (linear) discrete-time convolution operator A : ℓ_2 → ℓ_2 defined by
z = Ax ⟺ z_n = Σ_{k=−∞}^{∞} h_{n−k} x_k, n ∈ Z,
where we assume that h ∈ ℓ_1. One can show that ‖Ax‖_2 ≤ ‖h‖_1 ‖x‖_2, so A is bounded with |||A||| ≤ ‖h‖_1. Since A is bounded, it is legitimate to search for its adjoint:
⟨Ax, y⟩ = Σ_{n=−∞}^{∞} [ Σ_{k=−∞}^{∞} h_{n−k} x_k ] y_n* = Σ_{k=−∞}^{∞} x_k [ Σ_{n=−∞}^{∞} h*_{n−k} y_n ]* = Σ_{k=−∞}^{∞} x_k [A* y]_k* = ⟨x, A* y⟩,
where [A* y]_k = Σ_{n=−∞}^{∞} h*_{n−k} y_n, i.e., the adjoint of convolution with h is correlation with the conjugated filter.
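In practice, adjoint implementations are usually validated with the inner-product ("dot-product") test ⟨Ax, y⟩ = ⟨x, A*y⟩ applied to random vectors. Here is a finite-length NumPy sketch of that test for the convolution operator above (truncated to finite sequences; the names A_fwd and A_adj are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(7)
L, N = 5, 20
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N + L - 1) + 1j * rng.standard_normal(N + L - 1)

inner = lambda a, b: np.sum(a * np.conj(b))       # <a, b> = sum_i a_i b_i^*

# forward operator: finite-length convolution  z_n = sum_k h_{n-k} x_k
A_fwd = lambda x: np.convolve(h, x, mode="full")

# candidate adjoint: correlation with the conjugated (time-reversed) filter
A_adj = lambda y: np.convolve(np.conj(h)[::-1], y, mode="valid")

# adjoint (dot-product) test:  <A x, y> = <x, A* y>
assert np.isclose(inner(A_fwd(x), y), inner(x, A_adj(y)))
```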
26.8.2 Properties
The following properties of adjoints all concur with those of the Hermitian transpose in Euclidean space.
• I* = I, where I denotes the identity operator: I f = f
• (A*)* = A
• (AB)* = B* A*
• (A + B)* = A* + B*
• (αA)* = α* A*
• |||A*||| = |||A|||
• |||A* A||| = |||A A*||| = |||A|||² = |||A*|||².
• If A ∈ B(X, Y) is invertible, then A* is invertible and (A*)⁻¹ = (A⁻¹)*.
26.9 Pseudo inverse / generalized inverse (s,mat,pseudo)
The Moore-Penrose generalized inverse or pseudo inverse of a matrix A ∈ C^{m×n} is the unique matrix A† ∈ C^{n×m} that satisfies [1, p. 421]:
A A† A = A, A† A A† = A†, and both A† A and A A† are Hermitian. (26.9.1)
The pseudo inverse is related to minimum-norm least-squares (MNLS) problems as follows. Of all the vectors x that minimize ‖Ax − b‖_2, the unique vector having minimum (Euclidean) norm ‖x‖_2 is
x̂ = A† b. (26.9.2)
• [18, p. 252]
A† = A′ [A A′]† = [A′ A]† A′.
• The orthogonal projectors onto the range of A and the range of A′ are
P_A = A A† = A [A′ A]† A′, P_{A′} = A† A = A′ [A A′]† A.
• If U ∈ C^{m×m} is unitary and V ∈ C^{n×n} is unitary and A ∈ C^{m×n}, then (see Problem 26.9):
(U A V)† = V′ A† U′. (26.9.3)
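A short NumPy check (illustrative, not from the original text) of the four conditions in (26.9.1) and of the MNLS property (26.9.2) for a rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 6, 4
A = rng.standard_normal((m, n)) @ np.diag([1.0, 1.0, 1.0, 0.0])   # rank deficient
Ap = np.linalg.pinv(A)

# the four Moore-Penrose conditions (26.9.1)
assert np.allclose(A @ Ap @ A, A)
assert np.allclose(Ap @ A @ Ap, Ap)
assert np.allclose(A @ Ap, (A @ Ap).T)       # Hermitian (symmetric in the real case)
assert np.allclose(Ap @ A, (Ap @ A).T)

# minimum-norm least-squares solution (26.9.2)
b = rng.standard_normal(m)
x_mnls = Ap @ b
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]  # lstsq also returns the MNLS solution
assert np.allclose(x_mnls, x_ls)
```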
26.10 Matrices and derivatives (s,mat,grad)
The derivative of a matrix inverse with respect to a parameter also can be useful [wiki]:
(∂/∂θ) A⁻¹(θ) = −A⁻¹(θ) [ (∂/∂θ) A(θ) ] A⁻¹(θ).
26.11 The four spaces (s,mat,4space)
The range space and null space of a matrix A ∈ Cm×n and related quantities can be important.
R_A ≜ {y : y = Ax for some x} (the range, or column space, of A)
N_A ≜ {x : Ax = 0} (the null space of A)
N⊥_{A′} ≜ {y : A′y_0 = 0 =⇒ y′y_0 = 0}
R⊥_{A′} ≜ {x : x′A′y = 0, ∀y}.
All four are linear spaces, so all include the zero vector. These spaces have the following relationships (see Problem 26.10):
R_A = N⊥_{A′} (26.11.1)
N_A = R⊥_{A′} (26.11.2)
R_A − {0} ⊆ N^c_{A′} (26.11.3)
N_A − {0} ⊆ R^c_{A′} (26.11.4)
N⊥_{A′} − {0} ⊆ N^c_{A′}, (26.11.5)
where superscript c denotes set complement.
26.12 Principal components analysis (low-rank approximation) (s,mat,pca)
Given data y_ij for i = 1, . . . , N and j = 1, . . . , M, often we wish to find a set of L orthonormal vectors φ_1, . . . , φ_L ∈ C^N and corresponding coefficients x_1, . . . , x_M ∈ C^L to minimize the following WLS approximation error
Σ_{j=1}^{M} w_j ‖ y_j − Σ_{l=1}^{L} φ_l x_{lj} ‖² = Σ_{j=1}^{M} w_j ‖ y_j − B x_j ‖²,
where y_j = (y_{1j}, . . . , y_{Nj}), x_j = (x_{1j}, . . . , x_{Lj}), and B = [φ_1 . . . φ_L]. Defining ỹ_j ≜ √(w_j) y_j and x̃_j ≜ √(w_j) x_j, we can rewrite this low-rank matrix approximation problem as
min_{X̃, B : B′B = I_L} Σ_{j=1}^{M} ‖ ỹ_j − B x̃_j ‖²_2 = min_{X̃, B : B′B = I_L} ||| Ỹ − B X̃ |||²_Frob,
where Ỹ ≜ [ỹ_1 . . . ỹ_M] and X̃ ≜ [x̃_1 . . . x̃_M].
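By the Eckart-Young theorem [20], this minimization is solved by the rank-L truncated SVD of Ỹ. The following NumPy sketch (not from the original text; the variable names are illustrative) computes that solution and checks that the residual equals the energy of the discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(9)
N, M, L = 16, 40, 3
Y = rng.standard_normal((N, M))              # data vectors y_j as columns
w = rng.random(M) + 0.1                      # positive weights

# weighted data  ytilde_j = sqrt(w_j) y_j
Yt = Y * np.sqrt(w)

# rank-L truncated SVD gives the minimizing B (orthonormal columns) and Xtilde
U, s, Vh = np.linalg.svd(Yt, full_matrices=False)
B = U[:, :L]                                  # principal components, B'B = I_L
Xt = np.diag(s[:L]) @ Vh[:L, :]               # Xtilde = B' Ytilde

# unweight to recover the coefficients  x_j = xtilde_j / sqrt(w_j)
X = Xt / np.sqrt(w)

# the residual equals the energy in the discarded singular values
err = np.linalg.norm(Yt - B @ Xt, "fro") ** 2
assert np.isclose(err, np.sum(s[L:] ** 2))
```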
26.13 Problems (s,mat,prob)
Problem 26.1 Prove or disprove the ratio property for ρ(A) in (26.1.5) in the general case where A is square but not necessarily symmetric.
Problem 26.2 Equation (26.1.3) states that ρ(AB) = ρ(BA) when A ∈ C^{m×n} and B ∈ C^{n×m}. Prove or disprove:
ρ(I_m − AB) ?= ρ(I_n − BA).
Problem 26.3 Determine the constants C_m and C_M relating the norms in (26.3.8) for the case α = 2 and β = 1.
Problem 26.6 Prove Theorem 26.2.4, relating the inverses of ordered Hermitian positive-definite matrices.
Problem 26.7 Prove the Frobenius norm inequality (26.5.7) and show that it is not tight.
Problem 26.8 Following Example 26.8.2, determine the adjoint of the 2D convolution operator g = Af ⟺ g(x, y) = ∫∫ h(x − x′, y − y′) f(x′, y′) dx′ dy′.
Problem 26.9 Prove the equality (26.9.3) for unitary transforms of pseudo inverses.
Problem 26.10 Prove the relationships between the four spaces in (26.11.5).
Problem 26.11 Prove that f(x) = ‖x‖_p^p is strictly convex for p > 1.
Problem 26.12 Either prove the generalized diagonal dominance theorem (Theorem 26.2.8) using the Geršgorin theorem, or construct a counterexample showing that (26.2.2) truly is a generalization, i.e., a case where D − B′B is not diagonally dominant. (Solve?)
26.14 Bibliography
[1] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge: Cambridge Univ. Press, 1985.
[2] H. V. Henderson and S. R. Searle. "On deriving the inverse of a sum of matrices." In: SIAM Review 23.1 (Jan. 1981), 53–60. DOI: 10.1137/1023004.
[3] J. Sherman and W. J. Morrison. "Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix." In: Ann. Math. Stat. 20.4 (Dec. 1949), p. 621. URL: http://www.jstor.org/stable/2236322.
[4] M. A. Woodbury. Inverting modified matrices. Tech. Report 42, Stat. Res. Group, Princeton Univ. 1950.
[5] M. S. Bartlett. "An inverse matrix adjustment arising in discriminant analysis." In: Ann. Math. Stat. 22.1 (Mar. 1951), 107–11. URL: http://www.jstor.org/stable/2236707.
[6] K. Kohno, M. Kawamoto, and Y. Inouye. "A matrix pseudoinversion lemma and its application to block-based adaptive blind deconvolution for MIMO systems." In: IEEE Trans. Circ. Sys. I, Fundamental theory and applications 57.7 (July 2010), 1499–1512. DOI: 10.1109/TCSI.2010.2050222.
[7] T. Kailath. Linear systems. New Jersey: Prentice-Hall, 1980.
[8] C. F. Van Loan. "The ubiquitous Kronecker product." In: J. Comp. Appl. Math. 123.1-2 (Nov. 2000), 85–100. DOI: 10.1016/S0377-0427(00)00393-9.
[9] C. R. Johnson. "Positive definite matrices." In: Amer. Math. Monthly 77.3 (Mar. 1970), 259–64. URL: http://www.jstor.org/stable/2317709.
[10] D. G. Luenberger. Optimization by vector space methods. New York: Wiley, 1969. URL: http://books.google.com/books?id=lZU0CAH4RccC.
[11] R. H. Chan and M. K. Ng. "Conjugate gradient methods for Toeplitz systems." In: SIAM Review 38.3 (Sept. 1996), 427–82. DOI: 10.1137/S0036144594276474. URL: http://www.jstor.org/stable/2132496.
[12] V-S. Chellaboina and W. M. Haddad. "Is the Frobenius matrix norm induced?" In: IEEE Trans. Auto. Control 40.12 (Dec. 1995), 2137–9. DOI: 10.1109/9.478340.
[13] G. H. Golub and C. F. Van Loan. Matrix computations. 2nd ed. Johns Hopkins Univ. Press, 1989.
[14] J. Renegar. "Condition numbers, the barrier method, and the conjugate-gradient method." In: SIAM J. Optim. 6.4 (Nov. 1996), 879–912. DOI: 10.1137/S105262349427532X.
[15] W. C. Rheinboldt. "On measures of ill-conditioning for nonlinear equations." In: Mathematics of Computation 30.133 (Jan. 1976), 104–11. URL: http://www.jstor.org/stable/2005433.
[16] L. N. Trefethen and D. Bau. Numerical linear algebra. Philadelphia: Soc. Indust. Appl. Math., 1997.
[17] A. W. Naylor and G. R. Sell. Linear operator theory in engineering and science. 2nd ed. New York: Springer-Verlag, 1982.
[18] T. K. Moon and W. C. Stirling. Mathematical methods and algorithms for signal processing. Prentice-Hall, 2000. URL: http://www.neng.usu.edu/ece/faculty/tmoon/book/book.html#errata.
[19] S. L. Campbell and C. D. Meyer. Generalized inverses of linear transformations. London: Pitman, 1979.
[20] C. Eckart and G. Young. "The approximation of one matrix by another of lower rank." In: Psychometrika 1.3 (1936), 211–8. DOI: 10.1007/BF02288367.