A Concise Text On Advanced Linear Algebra
A Concise Text On Advanced Linear Algebra
A Concise Text On Advanced Linear Algebra
com
This engaging textbook for advanced undergraduate students and beginning graduates
covers the core subjects in linear algebra. The author motivates the concepts by
drawing clear links to applications and other important areas.
The book places particular emphasis on integrating ideas from analysis wherever
appropriate and features many novelties in its presentation. For example, the notion of
determinant is shown to appear from calculating the index of a vector field which leads
to a self-contained proof of the Fundamental Theorem of Algebra; the
Cayley–Hamilton theorem is established by recognizing the fact that the set of
complex matrices of distinct eigenvalues is dense; the existence of a real eigenvalue of
a self-adjoint map is deduced by the method of calculus; the construction of the Jordan
decomposition is seen to boil down to understanding nilpotent maps of degree two;
and a lucid and elementary introduction to quantum mechanics based on linear algebra
is given.
The material is supplemented by a rich collection of over 350 mostly proof-oriented
exercises, suitable for readers from a wide variety of backgrounds. Selected solutions
are provided at the back of the book, making it ideal for self-study as well as for use as
a course text.
www.pdfgrip.com
www.pdfgrip.com
A Concise Text on
Advanced Linear Algebra
YISONG YANG
Polytechnic School of Engineering, New York University
www.pdfgrip.com
www.cambridge.org
Information on this title: www.cambridge.org/9781107087514
c Yisong Yang 2015
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2015
Printed in the United Kingdom by Clays, St Ives plc
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Yang, Yisong.
A concise text on advanced linear algebra / Yisong Yang, Polytechnic School
of Engineering, New York University.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-08751-4 (Hardback) – ISBN 978-1-107-45681-5 (Paperback)
1. Algebras, Linear–Textbooks. 2. Algebras, Linear–Study and teaching (Higher) .
3. Algebras, Linear–Study and teaching (Graduate). I. Title.
II. Title: Advanced linear algebra.
QA184.2.Y36 2015
512 .5–dc23 2014028951
ISBN 978-1-107-08751-4 Hardback
ISBN 978-1-107-45681-5 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
www.pdfgrip.com
For Sheng,
Peter, Anna, and Julia
www.pdfgrip.com
www.pdfgrip.com
Contents
Preface page ix
Notation and convention xiii
1 Vector spaces 1
1.1 Vector spaces 1
1.2 Subspaces, span, and linear dependence 8
1.3 Bases, dimensionality, and coordinates 13
1.4 Dual spaces 16
1.5 Constructions of vector spaces 20
1.6 Quotient spaces 25
1.7 Normed spaces 28
2 Linear mappings 34
2.1 Linear mappings 34
2.2 Change of basis 45
2.3 Adjoint mappings 50
2.4 Quotient mappings 53
2.5 Linear mappings from a vector space into itself 55
2.6 Norms of linear mappings 70
3 Determinants 78
3.1 Motivational examples 78
3.2 Definition and properties of determinants 88
3.3 Adjugate matrices and Cramer’s rule 102
3.4 Characteristic polynomials and Cayley–Hamilton
theorem 107
4 Scalar products 115
4.1 Scalar products and basic properties 115
vii
www.pdfgrip.com
viii Contents
Preface
ix
www.pdfgrip.com
x Preface
then to show how the exponential of a linear mapping may be constructed and
understood.
In Chapter 3 we cover determinants. As a non-traditional but highly
motivating example, we show that the calculation of the topological degree
of a differentiable map from a closed curve into the unit circle in R2 involves
computing a two-by-two determinant, and the knowledge gained allows us to
prove the Fundamental Theorem of Algebra. We then formulate the definition
of a general determinant inductively, without resorting to the notion of permu-
tations, and establish all its properties. We end the chapter by establishing the
Cayley–Hamilton theorem. Two independent proofs of this important theorem
are given. The first proof is analytic and consists of two steps. In the first step,
we show that the theorem is valid for a matrix of distinct eigenvalues. In the
second step, we show that any matrix may be regarded as a limiting point of a
sequence of matrices of distinct eigenvalues. Hence the theorem follows again
by taking the limit. The second proof, on the other hand, is purely algebraic.
In Chapter 4 we discuss vector spaces with scalar products. We start from the
most general notion of scalar products without requiring either non-degeneracy
or positive definiteness. We then carry out detailed studies on non-degenerate
and positive definite scalar products, respectively, and elaborate on adjoint
mappings in terms of scalar products. We end the chapter with a discussion
of isometric mappings in both real and complex space settings and noting their
subtle differences.
In Chapter 5 we focus on real vector spaces with positive definite scalar
products and quadratic forms. We first establish the main spectral theorem for
self-adjoint mappings. We will not take the traditional path of first using the
Fundamental Theorem of Algebra to assert that there is an eigenvalue and then
applying the self-adjointness to show that the eigenvalue must be real. Instead
we shall formulate an optimization problem and use calculus to prove directly
that a self-adjoint mapping must have a real eigenvalue. We then present a
series of characteristic conditions for a symmetric bilinear form, a symmetric
matrix, or a self-adjoint mapping, to be positive definite. We end the chapter
by a discussion of the commutativity of self-adjoint mappings and the useful-
ness of self-adjoint mappings for the investigation of linear mappings between
different spaces.
In Chapter 6 we study complex vector spaces with Hermitian scalar products
and related notions. Much of the theory here is parallel to that of the real space
situation with the exception that normal mappings can only be fully understood
and appreciated within a complex space formalism.
In Chapter 7 we establish the Jordan decomposition theorem. We start with
a discussion of some basic facts regarding polynomials. We next show how
www.pdfgrip.com
Preface xi
to reduce a linear mapping over its generalized eigenspaces via the Cayley–
Hamilton theorem and the prime factorization of the characteristic polynomial
of the mapping. We then prove the Jordan decomposition theorem. The key
and often the most difficult step in this construction is a full understanding
of how a nilpotent mapping is reduced canonically. We approach this problem
inductively with the degree of a nilpotent mapping and show that it is crucial to
tackle a mapping of degree 2. Such a treatment eases the subtlety of the subject
considerably.
In Chapter 8 we present four selected topics that may be used as materi-
als for some optional extra-curricular study when time and interest permit. In
the first section we present the Schur decomposition theorem, which may be
viewed as a complement to the Jordan decomposition theorem. In the second
section we give a classification of skewsymmetric bilinear forms. In the third
section we state and prove the Perron–Frobenius theorem regarding the prin-
cipal eigenvalues of positive matrices. In the fourth section we establish some
basic properties of the Markov matrices.
In Chapter 9 we present yet another selected topic for the purpose of
optional extra-curricular study: a short excursion into quantum mechanics
using gadgets purely from linear algebra. Specifically we will use Cn as the
state space and Hermitian matrices as quantum mechanical observables to for-
mulate the over-simplified quantum mechanical postulates including Bohr’s
statistical interpretation of quantum mechanics and the Schrödinger equation
governing the time evolution of a state. We next establish Heisenberg’s uncer-
tainty principle. Then we prove the equivalence of the Schrödinger description
via the Schrödinger equation and the Heisenberg description via the Heisen-
berg equation of quantum mechanics.
Also provided in the book is a rich collection of mostly proof-oriented
exercises to supplement and consolidate the main course materials. The
diversity and elasticity of these exercises aim to satisfy the needs and inter-
ests of students from a wide variety of backgrounds.
At the end of the book, solutions to some selected exercises are presented.
These exercises and solutions provide additional illustrative examples, extend
main course materials, and render convenience for the reader to master the
subjects and methods covered in a broader range.
Finally some bibliographic notes conclude the book.
This text may be curtailed to meet the time constraint of a semester-long
course. Here is a suggested list of selected sections for such a plan: Sec-
tions 1.1–1.5, 2.1–2.3, 2.5, 3.1.2, 3.2, and 3.3 (present the concept of adjugate
matrices only), Section 3.4 (give the second proof of the Cayley–Hamilton the-
orem only, based on an adjugate matrix expansion), Sections 4.3, 4.4, 5.1, 5.2
www.pdfgrip.com
xii Preface
(omit the analytic proof that a self-adjoint mapping must have an eigenvalue
but resort to Exercise 5.2.1 instead), Sections 5.3, 6.1, 6.2, 6.3.1, and 7.1–7.4.
Depending on the pace of lectures and time available, the instructor may
decide in the later stage of the course to what extent the topics in Sections
7.1–7.4 (the Jordan decomposition) can be presented productively.
The author would like to take this opportunity to thank Patrick Lin, Thomas
Otway, and Robert Sibner for constructive comments and suggestions, and
Roger Astley of Cambridge University Press for valuable editorial advice,
which helped improve the presentation of the book.
xiii
www.pdfgrip.com
www.pdfgrip.com
1
Vector spaces
In this chapter we study vector spaces and their basic properties and structures.
We start by stating the definition and a discussion of the examples of vector
spaces. We next introduce the notions of subspaces, linear dependence, bases,
coordinates, and dimensionality. We then consider dual spaces, direct sums,
and quotient spaces. Finally we cover normed vector spaces.
1.1.1 Fields
The scalars to operate on vectors in a vector space are required to form a field,
which may be denoted by F, where two operations, usually called addition,
denoted by ‘+’, and multiplication, denoted by ‘·’ or omitted, over F are per-
formed between scalars, such that the following axioms are satisfied.
(1) (Closure) If a, b ∈ F, then a + b ∈ F and ab ∈ F.
(2) (Commutativity) For a, b ∈ F, there hold a + b = b + a and ab = ba.
(3) (Associativity) For a, b, c ∈ F, there hold (a + b) + c = a + (b + c) and
a(bc) = (ab)c.
(4) (Distributivity) For a, b, c ∈ F, there hold a(b + c) = ab + ac.
(5) (Existence of zero) There is a scalar, called zero, denoted by 0, such that
a + 0 = a for any a ∈ F.
(6) (Existence of unity) There is a scalar different from zero, called one,
denoted by 1, such that 1a = a for any a ∈ F.
1
www.pdfgrip.com
2 Vector spaces
It is easily seen that zero, unity, additive and multiplicative inverses are all
unique. Besides, a field consists of at least two elements.
With the usual addition and multiplication, the sets of rational numbers, real
numbers, and complex numbers, denoted by Q, R, and C, respectively, are all
fields. These fields are infinite fields. However, the set of integers, Z, is not a
field because there is a lack of multiplicative inverses for its non-unit elements.
Let p be a prime (p = 2, 3, 5, . . . ) and set pZ = {n ∈ Z | n = kp, k ∈ Z}.
Classify Z into the so-called cosets modulo pZ, that is, some non-overlapping
subsets of Z represented as [i] (i ∈ Z) such that
It is clear that Z is divided into exactly p cosets, [0], [1], . . . , [p − 1]. Use
Zp to denote the set of these cosets and pass the additive and multiplicative
operations in Z over naturally to the elements in Zp so that
It can be verified that, with these operations, Zp becomes a field with its obvi-
ous zero and unit elements, [0] and [1]. Of course, p[1] = [1] + · · · + [1] (p
terms)= [p] = [0]. In fact, p is the smallest positive integer whose multipli-
cation with unit element results in zero element. A number of such a property
is called the characteristic of the field. Thus, Zp is a field of characteristic p.
For Q, R, and C, since no such integer exists, we say that these fields are of
characteristic 0.
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
a1 b1 a1 + b1
⎜ . ⎟ ⎜ .. ⎟ ⎜ ⎟
⎜ . ⎟+⎜ ⎟ ⎜ .. ⎟,
⎝ . ⎠ ⎝ . ⎠=⎝ . ⎠ (1.1.4)
an bn an + bn
⎛ ⎞ ⎛ ⎞
a1 αa1
⎜ . ⎟ ⎜ . ⎟
α⎜ ⎟ ⎜ ⎟
⎝ .. ⎠ = ⎝ .. ⎠ where α ∈ F. (1.1.5)
an αan
The set Fn , modeled over the field F and equipped with the above operations,
is a prototype example of a vector space.
More generally, we say that a set U is a vector space over a field F if U is
non-empty and there is an operation called addition, denoted by ‘+’, between
the elements of U , called vectors, and another operation called scalar mul-
tiplication between elements in F, called scalars, and vectors, such that the
following axioms hold.
(1) (Closure) For u, v ∈ U , we have u + v ∈ U . For u ∈ U and a ∈ F, we
have au ∈ U .
(2) (Commutativity) For u, v ∈ U , we have u + v = v + u.
(3) (Associativity of addition) For u, v, w ∈ U , we have u + (v + w) =
(u + v) + w.
(4) (Existence of zero vector) There is a vector, called zero and denoted by 0,
such that u + 0 = u for any u ∈ U .
(5) (Existence of additive inverse) For any u ∈ U , there is a vector, denoted
as (−u), such that u + (−u) = 0.
(6) (Associativity of scalar multiplication) For any a, b ∈ F and u ∈ U , we
have a(bu) = (ab)u.
(7) (Property of unit scalar) For any u ∈ U , we have 1u = u.
(8) (Distributivity) For any a, b ∈ F and u, v ∈ U , we have (a+b)u = au+bu
and a(u + v) = au + av.
As in the case of the definition of a field, we see that it readily follows from
the definition that zero vector and additive inverse vectors are all unique in
a vector space. Besides, any vector multiplied by zero scalar results in zero
vector. That is, 0u = 0 for any u ∈ U .
Other examples of vector spaces (with obviously defined vector addition and
scalar multiplication) include the following.
(1) The set of all polynomials with coefficients in F defined by
P = {a0 + a1 t + · · · + an t n | a0 , a1 , . . . , an ∈ F, n ∈ N}, (1.1.6)
www.pdfgrip.com
4 Vector spaces
dn x dx
an + · · · + a1 + a0 x = 0, a0 , a1 , . . . , an ∈ R. (1.1.7)
dt n dt
(4) In addition, we can also consider the set of arrays of scalars in F consisting
of m rows of vectors in Fn or n columns of vectors in Fm of the form
⎛ ⎞
a11 a12 . . . a1n
⎜ ⎟
⎜ a21 a22 . . . a2n ⎟
⎜
(aij ) = ⎜ ⎟, (1.1.8)
⎟
⎝ ... ... ... ... ⎠
am1 am2 . . . amn
1.1.3 Matrices
Here we consider some of the simplest manipulations on, and properties of,
matrices.
Let A be the matrix given in (1.1.8). Then At , called the transpose of A, is
defined to be
⎛ ⎞
a11 a21 . . . am1
⎜ ⎟
⎜ a12 a22 . . . am2 ⎟
A =⎜
t ⎜ ⎟. (1.1.9)
⎟
⎝ ... ... ... ... ⎠
a1n a2n . . . amn
Of course, At ∈ F(n, m). Simply put, At is a matrix obtained from taking the
row (column) vectors of A to be its corresponding column (row) vectors.
For A ∈ F(n, n), we say that A is symmetric if A = At , or skew-symmetric
or anti-symmetric if At = −A. The sets of symmetric and anti-symmetric
matrices are denoted by FS (n, n) and FA (n, n), respectively.
It is clear that (At )t = A.
www.pdfgrip.com
It will now be useful to introduce the notion of dot product. For any two
vectors u = (a1 , . . . , an ) and v = (b1 , . . . , bn ) in Fn , their dot product u·v ∈ F
is defined to be
u · v = a1 b1 + · · · + an bn . (1.1.10)
With the notion of dot product, we can define the product of two matrices
A ∈ F(m, k) and B ∈ F(k, n) by
where cij is the dot product of the ith row of A and the j th column of B. Thus
AB ∈ F(m, n).
Alternatively, if we use u, v to denote column vectors in Fn , then
u · v = ut v. (1.1.12)
That is, the dot product of u and v may be viewed as a matrix product of the
1 × n matrix ut and n × 1 matrix v as well.
Matrix product (or matrix multiplication) enjoys the following properties.
6 Vector spaces
which suggests that matrix multiplication may be carried out with legitimate
multiplications executed over appropriate matrix blocks.
If A ∈ F(m, k) and B ∈ F(k, n), then At ∈ F(k, m) and B t ∈ F(n, k) so
that B t At ∈ F(n, m). Regarding how AB and B t At are related, here is the
conclusion.
(AB)t = B t At . (1.1.15)
(1) Diagonal matrices are of the form A = (aij ) with aij = 0 whenever
i = j . The set of diagonal matrices is denoted as FD (n, n).
(2) Lower triangular matrices are of the form A = (aij ) with aij = 0 when-
ever j > i. The set of lower triangular matrices is denoted as FL (n, n).
(3) Upper triangular matrices are of the form A = (aij ) with aij = 0 when-
ever i > j . The set of upper triangular matrices is denoted as FU (n, n).
There is a special element in F(n, n), called the identity matrix, or unit
matrix, and denoted by In , or simply I , which is a diagonal matrix whose
diagonal entries are all 1 (unit scalar) and off-diagonal entries are all 0. For
any A ∈ F(n, n), we have AI = I A = A.
www.pdfgrip.com
AB = BA = I. (1.1.16)
In this situation, B is unique (cf. Exercise 1.1.7) and called the inverse of A
and denoted by A−1 .
Exercises
1.1.1 Show that it follows from the definition of a field that zero, unit, additive,
and multiplicative inverse scalars are all unique.
1.1.2 Let p ∈ N be a prime and [n] ∈ Zp . Find −[n] and prove the existence
of [n]−1 when [n] = [0]. In Z5 , find −[4] and [4]−1 .
1.1.3 Show that it follows from the definition of a vector space that both zero
and additive inverse vectors are unique.
1.1.4 Prove the associativity of matrix multiplication by showing that
A(BC) = (AB)C for any A ∈ F(m, k), B ∈ F(k, l), C ∈ F(l, n).
1.1.5 Prove Theorem 1.1.
1.1.6 Let A ∈ F(n, n) (n ≥ 2) and rewrite A as
A1 A2
A= , (1.1.17)
A3 A4
where A1 ∈ F(k, k), A2 ∈ F(k, l), A3 ∈ F(l, k), A4 ∈ F(l, l), k, l ≥ 1,
k + l = n. Show that
At1 At3
A =
t
. (1.1.18)
At2 At4
www.pdfgrip.com
8 Vector spaces
1.1.7 Prove that the inverse of an invertible matrix is unique by showing the
fact that if A, B, C ∈ F(n, n) satisfy AB = I and CA = I then B = C.
1.1.8 Let A ∈ C(n, n). Show that A is Hermitian if and only if iA is anti-
Hermitian.
S0 = Span{e2 − e1 , . . . , en − e1 }. (1.2.7)
For F(m, n), we define Mij ∈ F(m, n) to be the vector such that all its
entries vanish except that its entry at the position (i, j ) (at the ith row and j th
column) is 1, i = 1, . . . , m, j = 1, . . . , n. We have
The notion of spans can be extended to cover some useful situations. Let U
be a vector space and S be a (finite or infinite) subset of U . Define
P = Span{1, t, . . . , t n , . . . }. (1.2.10)
P = ∪∞
n=0 Pn . (1.2.11)
10 Vector spaces
Theorem 1.4 In the system (1.2.13), if m < n, then the system has a nontrivial
solution (x1 , . . . , xn ) = (0, . . . , 0).
x1 v1 + · · · + xn vn = 0, (1.2.15)
for some x1 , . . . , xn ∈ F.
Since each vj ∈ Span{u1 , . . . , um }, j = 1, . . . , n, there are scalars aij ∈ F
(i = 1, . . . , m, j = 1, . . . , n) such that
m
vj = aij ui , j = 1, . . . , n. (1.2.16)
i=1
n
aij xj = 0, i = 1, . . . , m. (1.2.19)
j =1
This system of equations is exactly the system (1.2.13) which allows a non-
trivial solution in view of Theorem 1.4. Hence the proof follows.
We are now prepared to study in the next section several fundamental prop-
erties of vector spaces.
www.pdfgrip.com
12 Vector spaces
Exercises
14 Vector spaces
p(t) = r0 + r1 t + r2 t 2 + · · · + rn t n , (1.3.5)
a1 u1 + · · · + an un + au = 0. (1.3.6)
u = a1 u1 + · · · + an un . (1.3.7)
u = a1 u1 + · · · + an un = b1 v1 + · · · + bn vn . (1.3.8)
Note that the relation (1.3.9) between bases may be formally and conve-
niently expressed in a ‘matrix form’ as
or concisely V = U A, or
⎛ ⎞ ⎛ ⎞
v1 u1
⎜ . ⎟ ⎜ .. ⎟
⎜ . ⎟ = At ⎜ ⎟ (1.3.13)
⎝ . ⎠ ⎝ . ⎠,
vn un
where multiplications between scalars and vectors are made in a well defined
manner. On the other hand, the relation (1.3.11) between coordinate vectors
may be rewritten as
⎛ ⎞ ⎛ ⎞
a1 b1
⎜ . ⎟ ⎜ ⎟
⎜ . ⎟ = A ⎜ .. ⎟ , (1.3.14)
⎝ . ⎠ ⎝ . ⎠
an bn
or
Exercises
16 Vector spaces
1.3.5 Consider the vector space R3 and the bases U = {e1 , e2 , e3 } and
V = {e1 , e1 + e2 , e1 + e2 + e3 }. Find the basis transition matrix A from
U into V satisfying V = U A. Find the coordinate vectors of the given
vector (1, 2, 3) ∈ R3 with respect to the bases U and V, respectively,
and relate these vectors with the matrix A.
1.3.6 Prove that a basis transition matrix must be invertible.
1.3.7 Let U be an n-dimensional vector space over a field F where n ≥ 2
(say). Consider the following construction.
(i) Take u1 ∈ U \ {0}.
(ii) Take u2 ∈ U \ Span{u1 }.
(iii) Take (if any) u3 ∈ U \ Span{u1 , u2 }.
(iv) In general, take ui ∈ U \ Span{u1 , . . . , ui−1 } (i ≥ 2).
Show that this construction will terminate itself in exactly n steps,
that is, it will not be possible anymore to get un+1 , and that the vectors
u1 , u2 , . . . , un so obtained form a basis of U .
U ↔ Fn , f ↔ (f1 , . . . , fn ). (1.4.7)
It is clear that u1 , . . . , un are linearly independent and span U because an
element f of U satisfying (1.4.5) is simply given by
In other words, {u1 , . . . , un } is a basis of U , commonly called the dual basis
of U with respect to the basis {u1 , . . . , un } of U . In particular, we have seen
that U and U are of the same dimensionality.
Let U = {u1 , . . . , un } and V = {v1 , . . . , vn } be two bases of the vec-
tor space U . Let their dual bases be denoted by U = {u1 , . . . , un } and
V = {v1 , . . . , vn }, respectively. Suppose that the bases U and V are related
through
n
uj = aij vi , j = 1, . . . , n. (1.4.10)
i=1
n
n
ui (vj ) =
aki vk (vj ) =
aki δkj = aj i , (1.4.12)
k=1 k=1
www.pdfgrip.com
18 Vector spaces
or
⎛ ⎞ ⎛ ⎞
u1 v1
⎜ . ⎟ ⎜ .. ⎟
⎜ . ⎟ = A⎜ ⎟ (1.4.15)
⎝ . ⎠ ⎝ . ⎠.
un vn
the discussion in the previous section and the above immediately allow us
to get
n
bi = aj i aj , i = 1, . . . , n. (1.4.17)
j =1
or
⎛ ⎞ ⎛ ⎞
b1 a1
⎜ . ⎟ ⎜ .. ⎟
⎜ . ⎟ = At ⎜ ⎟ (1.4.19)
⎝ . ⎠ ⎝ . ⎠.
bn an
Comparing the above results with those established in the previous section,
we see that, with respect to bases and dual bases, the coordinates vectors in U
and U follow ‘opposite’ rules of correspondence. For this reason, coordinate
vectors in U are often called covariant vectors, and those in U contravariant
vectors.
Using the relation stated in (1.4.8), we see that we may naturally view
u1 , . . . , un as elements in (U ) = U so that they form a basis of U dual
to {u1 , . . . , un } since
www.pdfgrip.com
0, i = j,
ui (uj ) ≡ uj (ui ) = δij = i, j = 1, . . . , n. (1.4.20)
1, i = j,
u = a1 u1 + · · · + an un . (1.4.21)
U = U, (1.4.22)
S = {u ∈ U | u , u = 0, ∀u ∈ S }
0
(1.4.27)
Exercises
20 Vector spaces
for some c ∈ F.
1.4.5 Let U = P2 and f, g, h ∈ U be defined by
V + W ≡ {u ∈ U | u = v + w, v ∈ V , w ∈ W }. (1.5.1)
V + W = Span{u1 , . . . , uk , v1 , . . . , vl , w1 , . . . , wm }. (1.5.3)
a1 u1 + · · · + ak uk + b1 v1 + · · · + bl vl + c1 w1 + · · · + cm wm = 0, (1.5.4)
w = c1 w1 + · · · + cm wm = 0. (1.5.5)
is valid for the sum of any two subspaces V and W of finite dimensions in a
vector space.
Theorem 1.11 The sum of two subspaces V and W of U is a direct sum if and
only if each vector u in V + W may be expressed as the sum of a unique vector
v ∈ V and a unique vector w ∈ W .
www.pdfgrip.com
22 Vector spaces
u = v1 + w1 = v2 + w2 , v1 , v2 ∈ V , w1 , w2 ∈ W. (1.5.8)
W = Span{w1 , . . . , wl }. (1.5.9)
It is clear that the set U of all vectors of the form (1.5.10) equipped with the
vector addition (1.5.11) and scalar multiplication (1.5.12) is a vector space over
F. Naturally we may identify V and W with the subspaces of U given by
V = V1 + · · · + Vk , (1.5.14)
V = V1 ⊕ · · · ⊕ Vk . (1.5.15)
Vi ∩ Vj = {0}, i = j, i, j = 1, . . . , k, (1.5.16)
In other words, when the condition (1.5.18) is fulfilled, then (1.5.15) is valid.
The proof of this fact is left as an exercise.
Exercises
1.5.1 For
U = {(x1 , . . . , xn ) ∈ Fn | x1 + · · · + xn = 0},
(1.5.19)
V = {(x1 , . . . , xn ) ∈ Fn | x1 = · · · = xn },
prove that Fn = U ⊕ V .
www.pdfgrip.com
24 Vector spaces
1.5.2 Consider the vector space of all n × n matrices over a field F, denoted by
F(n, n). As before, use FS (n, n) and FA (n, n) to denote the subspaces of
symmetric and anti-symmetric matrices. Assume that the characteristic
of F is not equal to 2. For any M ∈ F(n, n), rewrite M as
1 1
M= (M + M t ) + (M − M t ). (1.5.20)
2 2
1 1
Check that (M + M t ) ∈ FS (n, n) and (M − M t ) ∈ FA (n, n). Use
2 2
this fact to prove the decomposition
(i) Prove that if R is identified with the set of all constant functions over
[a, b] then X = R ⊕ Y .
(ii) For a = 0, b = 1, and f (t) = t 2 + t − 1, find the unique c ∈ R and
g ∈ Y such that f (t) = c + g(t) for all t ∈ [0, 1].
1.5.6 Let U be a vector space and V , W its subspaces such that U = V + W .
If X is subspace of U , is it true that X = (X ∩ V ) + (X ∩ W )?
1.5.7 Let U be a vector space and V , W, X some subspaces of U such that
U = V ⊕ W, U = V ⊕ X. (1.5.23)
26 Vector spaces
Since the coset [0] is already seen to be the additive zero when adding cosets,
we are prompted to define
We may examine how the above introduced addition between cosets and
scalar multiplication with cosets make the set of all cosets into a vector space
over R, denoted by R2 /V , and called the quotient space of R2 modulo V . As
investigated, the geometric meaning of R2 /V is that it is the set of all the lines
in R2 parallel to the vector v and that these lines can be added and multiplied
by real scalars so that the set of lines enjoys the structure of a real vector space.
There is no difficulty extending the discussion to the case of R3 with V a
line or plane through the origin.
More generally, the above quotient-space construction may be formulated
as follows.
forms a vector space over F, called the quotient space of U modulo V , and is
denoted by U/V .
BU = {v1 , . . . , vk , u1 , . . . , ul }. (1.6.9)
a1 u1 + · · · + al ul = b1 v1 + · · · + bk vk . (1.6.11)
www.pdfgrip.com
Exercises
28 Vector spaces
Definition 1.13 Let U be a vector space over the field F. A norm over U is a
correspondence · : U → R such that we have the following.
The notions of convergence and limit are essential for carrying out calculus
in a normed space.
Let U be a finite-dimensional space. We can easily equip U with a norm.
For example, assume that B = {u1 , . . . , un } is a basis of U . For any u ∈ U ,
define
n
n
u1 = |ai |, where u= ai ui . (1.7.2)
i=1 i=1
It can be shown that · p also defines a norm over U . We will not check this
fact. What interests us here, however, is the limit
www.pdfgrip.com
To prove (1.7.4), we note that the right-hand side of (1.7.4) is simply |ai0 |
for some i0 ∈ {1, . . . , n}. Thus, in view of (1.7.3), we have
n 1
p
1
up ≤ |ai0 | p
= |ai0 |n p . (1.7.5)
i=1
From (1.7.5), we obtain
lim sup up ≤ |ai0 |. (1.7.6)
p→∞
Definition 1.15 Let U be a vector space and · and · two norms over U .
We say that · is stronger than · if convergence with respect to norm ·
implies convergence with respect to norm · . More precisely, any convergent
sequence {uk } with limit u0 in (U, · ) is also a convergent sequence with the
same limit in (U, · ).
30 Vector spaces
Theorem 1.16 Let U be a vector space and · and · two norms over U .
Then norm · is stronger than norm · if and only if there is a constant
C > 0 such that
Definition 1.17 Let U be a vector space and · and · two norms over U .
We say that norms · and · are equivalent if convergence in norm · is
equivalent to convergence in norm · .
Theorem 1.18 Any two norms over a finite-dimensional space are equivalent.
n
n
u ≤ |ai |ui ≤ α2 |ai | = α2 u1 , (1.7.16)
i=1 i=1
where we have set
α2 = max{ui | i = 1, . . . , n}. (1.7.17)
In other words, we have shown that · 1 is stronger than · .
In order to show that · is also stronger than · 1 , we need to prove the
existence of a constant α1 > 0 such that
α1 u1 ≤ u, ∀u ∈ U. (1.7.18)
Suppose otherwise that (1.7.18) fails to be valid. In other words, the set of
ratios
u
u ∈ U, u = 0 (1.7.19)
u1
does not have a positive infimum. Then there is a sequence {vk } in U such that
1
vk 1 ≥ vk , vk = 0, k = 1, 2, . . . . (1.7.20)
k
Now set
1
wk = vk , k = 1, 2, . . . . (1.7.21)
vk 1
Then (1.7.20) implies that wk → 0 as k → ∞ with respect to · .
On the other hand, we have wk 1 = 1 (k = 1, 2, . . . ). Express wk with
respect to basis B as
wk = a1,k u1 + · · · + an,k un , a1,k , . . . , an,k ∈ F, k = 1, 2, . . . . (1.7.22)
The definition of norm · 1 then implies |ai,k | ≤ 1 (i = 1, . . . , n, k =
1, 2, . . . ). Hence, by the Bolzano–Weierstrass theorem, there is a subsequence
of {k}, denoted by {ks }, such that ks → ∞ as s → ∞ and
ai,ks → some ai,0 ∈ F as s → ∞, i = 1, . . . , n. (1.7.23)
Now set w0 = a1,0 u1 + · · · + an,0 un . Then w0 − wks 1 → 0 as s → ∞.
Moreover, the triangle inequality gives us
w0 1 ≥ wks 1 − wks − w0 1 = 1 − wks − w0 1 , s = 1, 2, . . . .
(1.7.24)
Thus, letting s → ∞ in (1.7.24), we obtain w0 1 ≥ 1.
However, substituting u = w0 − wks in (1.7.16) and letting s → ∞, we see
that wks → w0 as s → ∞ with respect to norm · as well which is false
because we already know that wk → 0 as k → ∞ with respect to norm · .
www.pdfgrip.com
32 Vector spaces
Summarizing the above study, we see that there are some constants α1 , α2 >
0 such that
α1 u1 ≤ u ≤ α2 u1 , ∀u ∈ U. (1.7.25)
Finally, let · be another norm over U . Then we have some constants
β1 , β2 > 0 such that
β1 u1 ≤ u ≤ β2 u1 , ∀u ∈ U. (1.7.26)
Combining (1.7.25) and (1.7.26), we arrive at the desired conclusion
β1 β2
u ≤ u ≤ u, ∀u ∈ U, (1.7.27)
α2 α1
which establishes the equivalence of norms · and · as stated.
As an immediate application of Theorem 1.18, we have the following.
Exercises
1.7.1 Consider the vector space C[a, b] of the set of all real-valued continuous
functions over the interval [a, b] where a, b ∈ R and a < b. Define two
norms · 1 and · ∞ by setting
www.pdfgrip.com
b
u1 = |u(t)| dt, u∞ = max |u(t)|, ∀u ∈ C[a, b].
a t∈[a,b]
(1.7.30)
Show that · ∞ is stronger than · 1 but not vice versa.
1.7.2 Over the vector space C[a, b] again, define norm · p (p ≥ 1) by
b p1
up = |u(t)| dt
p
, ∀u ∈ C[a, b]. (1.7.31)
a
Show that · ∞ is stronger than · p for any p ≥ 1 and prove that
lim up = u∞ , ∀u ∈ C[a, b]. (1.7.32)
p→∞
1.7.3 Let (U, · ) be a normed space and use U to denote the dual space of
U . For each u ∈ U , define
u = sup{|u (u)| | u ∈ U, u = 1}. (1.7.33)
Show that · defines a norm over U .
1.7.4 Let U be a vector space and U its dual space which is equipped with a
norm, say · . Define
u = sup{|u (u)| | u ∈ U , u = 1}. (1.7.34)
Show that · defines a norm over U as well.
1.7.5 Prove that for any finite-dimensional normed space (U, ·) with a given
subspace V we may indeed use (1.7.29) to define a norm for the quotient
space U/V .
1.7.6 Let · denote the Euclidean norm of R2 which is given by
u = a12 + a22 , u = (a1 , a2 ) ∈ R2 . (1.7.35)
2
Linear mappings
34
www.pdfgrip.com
We can directly check that the mapping addition (2.1.1) and scalar-mapping
multiplication (2.1.2) make L(U, V ) a vector space over F. We adopt the
notation L(U ) = L(U, U ).
As an example, consider the space of matrices, F(m, n). For
A = (aij ) ∈ F(m, n), define
⎛ ⎞⎛ x1
⎞
x1
⎛⎞
a11 ··· a1n
⎜ ⎟⎜ .. ⎟ ⎜ . ⎟
TA (x) = Ax = ⎝ · · · ··· ··· ⎠⎜
⎝
⎟
. ⎠, x=⎜ ⎟
⎝ .. ⎠ ∈ F .
n
In other words, the images of e1 , . . . , en under the linear mapping TA are sim-
ply the column vectors of the matrix A, respectively.
Conversely, take any element T ∈ L(Fn , Fm ). Let v1 , . . . , vn ∈ Fm be
images of e1 , . . . , en under T such that
⎛ ⎞
a11
⎜ . ⎟ m
T (e1 ) = v1 = ⎜ .
⎝ . ⎠
⎟= ai1 ei , . . . ,
i=1
am1
www.pdfgrip.com
36 Linear mappings
⎛
⎞
a1n
⎜ . ⎟ m
T (en ) = vn = ⎜ .
⎝ . ⎠
⎟= ain ei , (2.1.5)
i=1
amn
where {e1 , . . . , em } is the standard basis of Fm . Then any x = x1 e1 +· · ·+xn en
has the image
⎛ ⎞
⎛ ⎞ x1
n n
⎜ . ⎟
T (x) = T ⎝ xj en ⎠ = xj vj = (v1 , . . . , vn ) ⎜ ⎟
⎝ .. ⎠
j =1 j =1
xn
⎛ ⎞⎛ x1
⎞
a11 ··· a1n
⎜ ⎟⎜ .. ⎟
= ⎝ ··· ··· ··· ⎠⎜
⎝
⎟
. ⎠. (2.1.6)
am1 ··· amn xn
In other words, T can be identified with TA through the matrix A consisting
of column vectors as images of e1 , . . . , en under T given in (2.1.5).
It may also be examined that mapping addition and scalar-mapping multi-
plication correspond to matrix addition and scalar-matrix multiplication.
In this way, as vector spaces, F(m, n) and L(Fn , Fm ) may be identified with
each other.
In general, let BU = {u1 , . . . , un } and BV = {v1 , . . . , vm } be bases of the
finite-dimensional vector spaces U and V over the field F, respectively. For
any T ∈ L(U, V ), we can write
m
T (uj ) = aij vi , aij ∈ F, i = 1, . . . , m, j = 1, . . . , n. (2.1.7)
i=1
Thus we again see, after specifying bases for U and V , that we may identify
L(U, V ) with F(m, n) in a natural way.
After the above general description of linear mappings, especially their iden-
tification with matrices, we turn our attention to some basic properties of linear
mappings.
38 Linear mappings
and
for any u ∈ U .
Thus, when applying (2.1.11) to the situation of linear mappings between
finite-dimensional vector spaces and using their associated matrix representa-
tions, we obtain another proof of the associativity of matrix multiplication.
N (T ) = {x ∈ U | T (x) = 0} (2.1.14)
T (x) = v (2.1.17)
40 Linear mappings
The fact stated in the following theorem is known as the nullity-rank equa-
tion or simply rank equation.
Proof If T is 1-1 and onto, then n(T ) = 0 and r(T ) = dim(V ). Thus
dim(U ) = dim(V ) follows from the rank equation.
www.pdfgrip.com
42 Linear mappings
S ◦ T = IU ; (2.1.29)
an element R ∈ L(V , U ) a right inverse of T ∈ L(U, V ) if
T ◦ R = IV . (2.1.30)
It is interesting that if T is known to be invertible, then the left and right in-
verses coincide and are simply the inverse of T . To see this, we let T −1 ∈
L(V , U ) denote the inverse of T and use the associativity of composition of
mappings to obtain from (2.1.29) and (2.1.30) the results
S = S ◦ IV = S ◦ (T ◦ T −1 ) = (S ◦ T ) ◦ T −1 = IV ◦ T −1 = T −1 ,
(2.1.31)
and
R = IU ◦ R = (T −1 ◦ T ) ◦ R = T −1 ◦ (T ◦ R) = T −1 ◦ IV = T −1 ,
(2.1.32)
respectively.
It is clear that, regarding T ∈ L(U, V ), the condition (2.1.29) implies that
T is 1-1, and the condition (2.1.30) implies that T is onto. In view of Theo-
rem 2.5, we see that when U, V are finite-dimensional and dim(U ) = dim(V ),
then T is invertible. Thus we have S = R = T −1 . On the other hand, if
dim(U ) = dim(V ), T can never be invertible and the notion of left and right
inverses is of separate interest.
As an example, we consider the linear mappings S : F3 → F2 and T :
F → F3 associated with the matrices
2
⎛ ⎞
1 0
1 0 0 ⎜ ⎟
A= , B = ⎝ 0 1 ⎠, (2.1.33)
0 1 0
0 0
respectively according to (2.1.3). Then we may examine that S is a left inverse
of T (thus T is a right inverse of S). However, it is absurd to talk about the
invertibility of these mappings.
Let U and V be two vector spaces over the same field. An invertible element
T ∈ L(U, V ) (if it exists) is also called an isomorphism. If there is an isomor-
phism between U and V , they are said to be isomorphic to each other, written
as U ≈ V .
www.pdfgrip.com
Exercises
X = {x ∈ U | T (x) ∈ Y } ≡ T −1 (Y ), (2.1.37)
necessarily a subspace of U ?
2.1.4 Let U, V be finite-dimensional vector spaces and T ∈ L(U, V ). Prove
that r(T ) ≤ dim(U ). In particular, if T is onto, then dim(U ) ≥
dim(V ).
www.pdfgrip.com
44 Linear mappings
where the matrix B = (bj k ) ∈ F(n, n) is called the basis transition matrix
from the basis {u1 , . . . , un } to the basis {ũ1 , . . . , ũn }, which is necessarily
invertible.
Let V be an m-dimensional vector space over F and take T ∈ L(U, V ).
Let A = (aij ) and à = (ãij ) be the m × n matrices of the linear mapping T
associated with the pairs of the bases
46 Linear mappings
A = C Â. (2.2.14)
Then
m
m
(S ◦ T̂ )(uj ) = âij S(vi ) = âij v̂i , j = 1, . . . , n. (2.2.16)
i=1 i=1
48 Linear mappings
then (2.2.20) gives us the manner in which the two matrices A and à are
related:
A = B ÃB −1 . (2.2.22)
Such a relation spells out the following definition.
Exercises
d
2.2.1 Consider the differentiation operation D = as a linear mapping over
dt
P (the vector space of all real polynomials of variable t) given by
dp(t)
D(p)(t) = , p ∈ P. (2.2.23)
dt
(i) Find the matrix that represents D ∈ L(P2 , P1 ) with respect to the
standard bases
BP
1
2
= {1, t, t 2 }, BP
1
1
= {1, t} (2.2.24)
of P2 , P1 , respectively.
(ii) If the basis of P2 is changed into
BP
2
2
= {t − 1, t + 1, (t − 1)(t + 1)}, (2.2.25)
BP
2
1
= {t − 1, t + 1}, (2.2.26)
50 Linear mappings
T (u), v , ∀u ∈ U, (2.3.1)
u = T (v ). (2.3.3)
Thus
T ≡ (T ) = T . (2.3.8)
m
T (uj ) = aij vi , j = 1, . . . , n, (2.3.9)
i=1
n
T (vl ) =
akl uk , l = 1, . . . , m. (2.3.10)
k=1
(1) N (T )0 = R(T ).
(2) N (T ) = R(T )0 .
(3) N(T )0 = R(T ).
(4) N(T ) = R(T )0 .
52 Linear mappings
We note that (1) in Theorem 2.8 may also follow from taking the annihilators
on both sides of (2) and applying Exercise 1.4.7. Thus, part (2) of Theorem 2.8
is the core of all the statements there.
We now use Theorem 2.8 to establish the following result.
Proof Using (2) in Theorem 2.8, we have n(T ) = dim(U ) − r(T ). However,
applying the rank equation (2.1.23) or Theorem 2.3, we have n(T ) + r(T ) =
dim(U ). Combining these results, we have r(T ) = r(T ).
On the other hand, since the associated matrix of T is At whose column vec-
tors are the row vectors of A. Hence R(T ) is the vector space spanned by the
row vectors of A (since (Fm ) = Fm ) whose dimension is likewise called the
row rank of A, denoted as rorank(A). Thus, we have
Exercises
T (u) = v, (2.3.16)
has a solution for some u ∈ U if and only if for any v ∈ V such that
T (v ) = 0 one has v, v = 0. In particular, the equation (2.3.16) has
a solution for any v ∈ V if and only if T is injective. (This statement is
also commonly known as the Fredholm alternative.)
2.3.4 Let U be a finite-dimensional vector space over a field F and a ∈ F.
Define T ∈ L(U, U ) by T (u) = au where u ∈ U . Show that T is then
given by T (u ) = au for any u ∈ U .
2.3.5 Let U = F(n, n) and B ∈ U . Define T ∈ L(U, U ) by T (A) = AB−BA
where A ∈ U . For f ∈ U given by
(ii) Apply (i) to show that for any f ∈ U there is a unique element
B ∈ U such that f (A) = Tr(AB t ).
(iii) For T ∈ L(U, U ), defined by T (A) = At , describe T .
54 Linear mappings
Exercises
56 Linear mappings
k
n
T (ui ) = bi i ui , i = 1, . . . , k, T (uj ) = cj j uj , j = k + 1, . . . , n,
i =1 j =1
(2.5.1)
k
l
T (vi ) = bi i vi , i = 1, . . . , k, T (wj ) = cj j wj , j = 1, . . . , l,
i =1 j =1
(2.5.5)
where the matrices B = (bi i ) and C = (cj j ) are in F(k, k) and F(l, l),
respectively. Thus, we see that, with respect to such a basis of U , the asso-
ciated matrix A ∈ F(n, n) assumes the form
B 0
A= , (2.5.6)
0 C
which takes the form of a special kind of matrices called boxed diagonal
matrices.
Thus, we see that a linear mapping T over a finite-dimensional vector space
U is reducible if and only if there is a basis of U so that the associated matrix
of T with respect to this basis is boxed diagonal.
An important and useful family of invariant subspaces are called
eigenspaces which may be viewed as an extension of the notion of null-spaces.
58 Linear mappings
Multiplying (2.5.8) by λm+1 and subtracting the result from (2.5.9), we obtain
T = λ1 I ⊕ · · · ⊕ λk I, (2.5.14)
www.pdfgrip.com
A = diag{λ1 , . . . , λn }. (2.5.16)
Thus, with respect to the basis {(1, −1)t , (1, 1)t }, the associated matrix of T is
1 0
. (2.5.19)
0 3
60 Linear mappings
2.5.2 Projections
In this subsection, we study an important family of reducible linear mappings
called projections.
I = T + (I − T ). (2.5.25)
Proof Since
(I − P )2 = I − 2P + P 2 , (2.5.26)
62 Linear mappings
T = λ1 P1 + · · · + λk Pk . (2.5.29)
Definition 2.19 Let U be a vector space and T ∈ L(U ). For any nonzero
vector u ∈ U , we say that u is T -cyclic if there is an integer m ≥ 1 such that
T m (u) = 0. The smallest such integer m is called the period of u under or
relative to T . If each vector in U is T -cyclic, T is said to be locally nilpotent.
Proof We only need to show that the set of vectors in (2.5.32) are linearly
independent. If m = 1, the statement is self-evident. So we now assume m ≥ 2.
Let c0 , . . . , cm−1 ∈ F so that
c0 u + c1 T (u) + · · · + cm−1 T m−1 (u) = 0. (2.5.33)
Assume that there is some l = 0, . . . , m − 1 such that cl = 0. Let l ≥ 0 be the
smallest such integer. Then l ≤ m − 2 so that (2.5.33) gives us
1
T l (u) = − (cl+1 T l+1 (u) + · · · + cm−1 T m−1 (u)). (2.5.34)
cl
This relation implies that
T m−1 (u) = T l+([m−1]−l) (u)
1
= − (cl+1 T m (u) + · · · + cm−1 T 2(m−1)+l (u)) = 0, (2.5.35)
cl
which contradicts the fact that T m−1 (u) = 0.
We can apply the above results to study the reducibility of a nilpotent
mapping.
64 Linear mappings
Hence, if we use S = (sij ) to denote the matrix of T with respect to the basis
B, then
1, j = i − 1, i = 2, . . . , n,
sij = (2.5.40)
0, otherwise.
That is,
⎛ ⎞
0 0 ··· ··· 0
⎜ ··· ··· ⎟
⎜ 1 0 0 ⎟
⎜ ⎟
⎜ .. .. .. .. ..⎟
S=⎜ . . . . .⎟. (2.5.41)
⎜ ⎟
⎜ ⎟
⎝ 0 ··· 1 0 0 ⎠
0 ··· ··· 1 0
k = max{n1 , . . . , nl } (2.5.47)
holds.
66 Linear mappings
p(t) = an t n + · · · + a1 t + a0 , a0 , a1 , . . . , an ∈ F, (2.5.48)
p(T ) = an T n + · · · + a1 T + a0 I. (2.5.49)
because the powers of T follow the same rule as the powers of t. That is,
T k T l = T k+l , k, l ∈ N.
For T ∈ L(U ), let λ ∈ F be an eigenvalue of T . Then, for any p(t) ∈ P
given as in (2.5.48), p(λ) is an eigenvalue of p(T ). To see this, we assume that
u ∈ U is an eigenvector of T associated with λ. We have
as anticipated.
If p ∈ P is such that p(T ) = 0, then we say that T is a root of p. Hence, if
T is a root of p, any eigenvalue λ of T must also be a root of p, p(λ) = 0, by
virtue of (2.5.51).
For example, an idempotent mapping is a root of the polynomial p(t) =
t 2 − t, and a nilpotent mapping is a root of a polynomial of the form p(t) = t m
(m ∈ N). Consequently, the eigenvalues of an idempotent mapping can only
be 0 and 1, and that of a nilpotent mapping, 0.
For T ∈ L(U ), let λ1 , . . . , λk be some distinct eigenvalues of T
such that T reduces over Eλ1 , . . . , Eλk . Then T must be a root of the
polynomial
p(t) = (t − λ1 ) · · · (t − λk ). (2.5.52)
k
p(T )u = p(T )ui
i=1
k
= (T − λ1 I ) · · · (T
− λi I ) · · · (T − λk I )(T − λi I )ui
i=1
= 0, (2.5.53)
Exercises
68 Linear mappings
n
1
bij = , i = 1, . . . , n. (2.5.58)
a
j =1
2.5.6 Let S, T ∈ L(U ) and assume that S, T are similar. That is, there is an
invertible element R ∈ L(U ) such that S = R ◦ T ◦ R −1 . Show that
λ ∈ F is an eigenvalue of S if and only if it is an eigenvalue of T .
2.5.7 Let U = F(n, n) and T ∈ L(U ) be defined by taking matrix transpose,
T (A) = At , A ∈ U. (2.5.59)
Show that both ±1 are the eigenvalues of T and identify E1 and E−1 .
Prove that T is reducible over the pair E1 and E−1 . Can T have an
eigenvalue different from ±1? What is the minimal polynomial of T ?
2.5.8 Let U be a vector space over a field F and T ∈ L(U ). If λ1 , λ2 ∈
F are distinct eigenvalues and u1 , u2 are the respectively associated
eigenvectors of T , show that, for any nonzero a1 , a2 ∈ F, the vector
u = a1 u1 + a2 u2 cannot be an eigenvector of T .
2.5.9 Let U be a finite-dimensional vector space over a field F and T ∈
L(U ). Assume that T is of rank 1.
(i) Prove that T must have an eigenvalue, say λ, in F.
(ii) If λ = 0, show that R(T ) = Eλ .
(iii) Is the statement in (ii), i.e. R(T ) = Eλ , valid when λ = 0?
2.5.10 Let A ∈ C(n, n) be unitary. That is, AA† = A† A = In . Show that any
eigenvalue of A is of unit modulus.
2.5.11 Show that, if T ∈ L(U ) is reducible over the pair of null-spaces N (T )
and N(I − T ), then T is idempotent.
2.5.12 Let S, T ∈ L(U ) be idempotent. Show that
(i) R(S) = R(T ) if and only if S ◦ T = T , T ◦ S = S.
(ii) N(S) = N (T ) if and only if S ◦ T = S, T ◦ S = T .
2.5.13 Let T ∈ L(R3 ) be defined by
⎛ ⎞ ⎛ ⎞
1 1 −1 x1
⎜ ⎟ ⎜ ⎟
T (x) = ⎝ −1 1 1 ⎠ x, x = ⎝ x2 ⎠ ∈ R3 . (2.5.60)
1 3 −1 x3
70 Linear mappings
where we have used the fact that norms over a finite-dimensional space are all
equivalent. This estimate may also be restated as
T (u)V
≤ C, u ∈ U, u = 0. (2.6.2)
uU
This boundedness result enables us to formulate the definition of the norm
of a linear mapping T ∈ L(U, V ) as follows.
Definition 2.23 For T ∈ L(U, V ), we define the norm of T , induced from the
respective norms of U and V , by
T (u)V
T = sup u ∈ U, u = 0 . (2.6.3)
uU
To show that (2.6.3) indeed defines a norm for the space L(U, V ), we need
to examine that it fulfills all the properties required of a norm. To this end, we
note from (2.6.3) that
72 Linear mappings
Consequently, we get
S ◦ T ≤ ST . (2.6.8)
(1) For any ε > 0 there exists an invertible element S ∈ L(U ) such that
S − T < ε. This property says that the subset of invertible mappings in
L(U ) is dense in L(U ) with respect to the norm of L(U ).
www.pdfgrip.com
(2) If T ∈ L(U ) is invertible, then there is some ε > 0 such that S ∈ L(U ) is
invertible whenever S satisfies S − T < ε. This property says that the
subset of invertible mappings in L(U ) is open in L(U ) with respect to the
norm of L(U ).
74 Linear mappings
Exercises
xn
76 Linear mappings
m
T 1 = max |aij | . (2.6.26)
1≤j ≤n
i=1
(i) Show that, with the one-parameter group (t) = etA , the solution
of the problem (2.6.27) is simply given by x = (t)x0 .
(ii) Moreover, use x(i) (t) to denote the solution of (2.6.27) when x0 =
ei where {e1 , . . . , en } is the standard basis of Rn , i = 1, . . . , n.
Show that (t) = etA is the n × n matrix with x(1) (t), . . . , x(n) (t)
as the n corresponding column vectors,
(This result provides a practical method for computing the matrix ex-
ponential, etA , which may also be viewed as the solution of the matrix-
valued initial value problem
dX
= AX, X(0) = In , X ∈ R(n, n).) (2.6.29)
dt
2.6.10 Consider the mapping : R → R(2, 2) defined by
cos t − sin t
(t) = , t ∈ R. (2.6.30)
sin t cos t
www.pdfgrip.com
3
Determinants
Thus, we may easily resolve the vector v along the direction of u and the
direction perpendicular to u as follows
# % π& % π &$
v = (b1 , b2 ) = c1 (cos θ, sin θ ) + c2 cos θ ± , sin θ ±
2 2
= (c1 cos θ ∓ c2 sin θ, c1 sin θ ± c2 cos θ ), c1 , c2 ∈ R. (3.1.2)
Here c2 may be interpreted as the length of the vector in the resolution that is
taken to be perpendicular to u. Hence, from (3.1.2), we can read off the result
78
www.pdfgrip.com
Therefore, using (3.1.3) and then (3.1.1), we see that the area σ of the parallel-
ogram under consideration is given by
σ = c2 u = |u cos θ b2 − u sin θ b1 | = |a1 b2 − a2 b1 |. (3.1.4)
Thus we see that the quantity a1 b2 − a2 b1 formed from the vectors (a1 , a2 )
and (b1 , b2 ) stands out, that will be called the determinant of the matrix
a1 a2
A= , (3.1.5)
b1 b2
written as det(A) or denoted by
a a2
1
. (3.1.6)
b1 b2
Since det(A) = ±σ , it is also referred to as the signed area of the parallelo-
gram formed by the vectors (a1 , a2 ) and (b1 , b2 ).
We now consider volume. We shall apply some vector algebra over R3 to
facilitate our discussion.
We use · and × to denote the usual dot and cross products between vectors
in R3 . We use i, j, k to denote the standard mutually orthogonal unit vectors in
R3 that form a right-hand system. For any vectors
u = (a1 , a2 , a3 ) = a1 i + a2 j + a3 k, v = (b1 , b2 , b3 ) = b1 i + b2 j + b3 k,
(3.1.7)
in R3 , we know that
u × v = (a2 b3 − a3 b2 )i − (a1 b3 − a3 b2 )j + (a1 b2 − a2 b1 )k (3.1.8)
is perpendicular to both u and v and u × v gives us the area of the paral-
lelogram formed from using u, v as two adjacent edges, which generalizes the
preceding discussion in R2 . To avoid the trivial situation, we assume u and v
are linearly independent. So u × v = 0 and
R3 = Span{u, v, u × v}. (3.1.9)
Let w = (c1 , c2 , c3 ) be another vector. Then (3.1.9) allows us to
express w as
w = au + bv + c(u × v), a, b, c ∈ R. (3.1.10)
From the geometry of the problem, we see that the volume of the parallelepiped
formed from using u, v, w as adjacent edges is given by
δ = u × vc(u × v) = |c|u × v2 (3.1.11)
www.pdfgrip.com
80 Determinants
because c(u × v) is the height of the parallelepiped, with the bottom area
u × v.
From (3.1.10), we have
w · (u × v) = cu × v2 . (3.1.12)
Inserting (3.1.12) into (3.1.11), we obtain the simplified volume formula
δ = |w · (u × v)|
= |c1 (a2 b3 − a3 b2 ) − c2 (a1 b3 − a3 b2 ) + c3 (a1 b2 − a2 b1 )|. (3.1.13)
In analogy of the earlier discussion on the area formula in R2 , we may set up
the matrix
⎛ ⎞
c1 c2 c3
⎜ ⎟
A = ⎝ a1 a2 a3 ⎠ , (3.1.14)
b1 b2 b3
and define the signed volume or determinant of the 3 × 3 matrix A as
c1 c2 c3
det(A) = a1 a2 a3
b b b
1 2 3
Thus, with the notation of determinants and in view of (3.1.18) and (3.1.19),
we may express the solution to (3.1.17) elegantly as
c a a c
1 2 1 1
a a
c2 b2 b1 c2 1 2
x1 = , x2 = , if = 0. (3.1.20)
a a a a b1 b2
1 2 1 2
b1 b2 b1 b2
The function f maps the interval [α, β] to cover the interval [a, b].
At a point t0 ∈ [α, β] where the derivative of f is positive, f (t0 ) > 0, f
maps a small interval around t0 onto a small interval around f (t0 ) and pre-
serves the orientation (from left to right) of the intervals; if f (t0 ) < 0, it
reverses the orientation. If f (t0 ) = 0, we say that t0 is a regular point of f .
For c ∈ [a, b], if f (t) = 0 for any t ∈ f −1 (c), we call c a regular value of
f . It is clear that, f −1 (c) is a finite set when c is a regular value of f . If c is a
regular value of f , we define the integer
where
82 Determinants
Thus, we have seen that, although both N + (f, c) and N − (f, c) may depend
on c, the quantity N (f, c) as given in (3.1.22) does not depend on c and is seen
to satisfy
⎧
⎪
⎨ 1, f (α) = a, f (β) = b,
N(f, c) = −1, f (α) = b, f (β) = a, (3.1.25)
⎪
⎩
0, f (α) = f (β).
This quantity may summarily be rewritten into the form of an integral
β
1 1
N(f, c) = (f (β) − f (α)) = f (t) dt, (3.1.26)
b−a b−a α
and interpreted to be the number count for the orientation-preserving times
minus the orientation-reversing times the function f maps the interval [α, β]
to cover the interval [a, b]. The advantage of using the integral representation
for N(f, c) is that it is clearly independent of the choice of the regular value
c. Indeed, the right-hand side of (3.1.26) is well defined for any differentiable
function f not necessarily having only finitely many critical points and the
specification of a regular value c for f becomes irrelevant. In fact, the integral
on the right-hand side of (3.1.26) is the simplest topological invariant called
the degree of the function f , which may now be denoted by
β
1
deg(f ) = f (t) dt. (3.1.27)
b−a α
The word ‘topological’ is used to refer to the fact that a small alteration of f
cannot perturb the value of deg(f ) since deg(f ) may take only integer val-
ues and the right-hand side of (3.1.27), however, relies on the derivative of f
continuously.
As a simple application, we note that it is not hard to see that for any c ∈
[a, b] the equation f (t) = c has at least one solution when deg(f ) = 0.
We next extend our discussion of topological invariants to two-dimensional
situations.
Let and C be two closed differentiable curves in R2 oriented counterclock-
wise and let
u: →C (3.1.28)
www.pdfgrip.com
84 Determinants
Let SR1 denote the circle of radius R > 0 in R2 centered at the origin. We
may parameterize SR1 by the polar angle θ : x = R cos θ, y = R sin θ , θ ∈
[0, 2π ]. With (3.1.35), we have
1 2
vS 1 = (x − y 2 , 2xy)
R R2
= (cos2 θ − sin2 θ, 2 cos θ sin θ ) = (cos 2θ, sin 2θ ). (3.1.36)
where τ (s) is the unit tangent vector along the unit circle S 1 at the image point
vγ (s) under the map vγ : γ → S 1 . Rewrite vγ as
Then
vγ (s) = fx x (s) + fy y (s), gx x (s) + gy y (s) . (3.1.40)
The assumption on v gives us the uniform boundedness |fx |, |fy |, |gx |, |gy |
inside . Using this property and (3.1.40), we see that there is a γ -independent
constant C > 0 such that
www.pdfgrip.com
vγ (s) ≤ C x (s)2 + y (s)2 = C. (3.1.41)
which leads to absurdness when the total arclength |γ | of the curve γ is made
small enough.
Thus, returning to the example (3.1.35), we conclude that the vector field v
has a zero inside any circle SR1 (R > 0) since we have shown that ind(v|S 1 ) =
R
2 = 0, which can only be the origin as seen trivially in (3.1.35) already.
We now use Theorem 3.1 to establish the celebrated Fundamental Theorem
of Algebra as stated below.
must have a zero in C. That is, there is some z0 ∈ C such that f (z0 ) = 0.
Proof Without loss of generality and for sake of simplicity, we may assume
an = 1 otherwise we may divide f (z) by an .
Let z = x + iy, x, y ∈ R, and write f (z) as
Then it is clear that v(x, y) = |f (z)| and it suffices to show that v vanishes
at some (x0 , y0 ) ∈ R2 .
In order to simplify our calculation, we consider a one-parameter deforma-
tion of f (z) given by
and denote the correspondingly constructed vector field by v t (x, y). So on the
circle SR1 = {(x, y) ∈ R2 | (x, y) = |z| = R} (R > 0), we have the uniform
lower estimate
www.pdfgrip.com
86 Determinants
Thus, when R is sufficiently large, we have C(R) ≥ 1 (say). For such a choice
of R, by topological invariance, we have
# $ # $ # $
ind v|S 1 = ind v 1 |S 1 = ind v 0 |S 1 . (3.1.48)
R R R
On the other hand, over SR1 we may again use the polar angle θ : x =
R cos θ, y = R sin θ , or z = Reiθ , to represent f 0 as f 0 (z) = R n einθ . Hence
v 0 = R n (cos nθ, sin nθ ). Consequently,
1 0
vS0 1 = v = (cos nθ, sin nθ ). (3.1.49)
R v 0
Therefore, as before, we obtain
# $ # $
ind v 0 |S 1 = deg vS0 1
R
2π
R
1 cos nθ sin nθ
= dθ = n. (3.1.50)
2π 0 −n sin nθ n cos nθ
In view of (3.1.48) and (3.1.50), we get ind(v|S 1 ) = n. Thus, applying Theo-
R
rem 3.1, we conclude that v must vanish somewhere inside the circle SR1 .
∂u ∂u
dσ = × dsdt. (3.1.52)
∂s ∂t
Thus, inserting these and |S 2 | = 4π into (3.1.51), we arrive at
1 ∂u ∂u
deg(u) = u· × dsdt
4π ∂s ∂t
f g h
1
= fs gs hs dsdt. (3.1.53)
4π
f g h
t t t
Exercises
88 Determinants
Use the topological method in this section to prove that the system has
at least one solution.
3.1.5 Consider the stereographic projection of S 2 sited in R3 with the Carte-
sian coordinates x, y, z onto the xy-plane through the south pole
(0, 0, −1) which induces a parameterization of S 2 by R2 given by
2x 2y 1 − x2 − y2
f = , g = , h = , (x, y) ∈ R2 ,
1 + x2 + y2 1 + x2 + y2 1 + x2 + y2
(3.1.58)
n
det(A) = ai1 Ci1 . (3.2.2)
i=1
The formula (3.2.2) is also referred to as the cofactor expansion of the deter-
minant according to the first column.
90 Determinants
Therefore, we arrive at
det(B) = B
ai1 Ci1 + rak1 Ck1
B
= A
ai1 rCi1 + rak1 Ck1
A
= r det(A).
i =k i =k
(3.2.5)
Consequently,
det(C) = C
ai1 Ci1 + ck1 Ck1
C
i =k
= A
ai1 (Ci1 + Ci1
B
) + (ak1 + bk1 )Ck1
A
i =k
= A
ai1 Ci1 + ak1 Ck1
A
+ B
ai1 Ci1 + bk1 Ck1
A
i =k i =k
as asserted.
which implies that the corresponding cofactors of A and B all differ by a sign,
B
Ck1 = −Ck1
A
, k = i, i + 1; B
Ci1 = −Ci+1,1
A
, B
Ci+1,1 = −Ci1
A
. (3.2.10)
www.pdfgrip.com
92 Determinants
Hence
det(B) = B
ak1 Ck1 + ai+1,1 Ci1
B
+ ai1 Ci+1,1
B
k =i,i+1
= A
ak1 (−Ck1 ) + ai+1,1 (−Ci+1,1
A
) + ai1 (−Ci1
A
)
k =i,i+1
= − det(A), (3.2.11)
as expected.
This theorem indicates that if two rows of an n × n (n ≥ 2) matrix A are
identical then det(A) = 0. Thus adding a multiple of a row to another row of
A does not alter the determinant ofA:
a11 ... a1n a11 . . . a1n
··· ··· ··· ··· ··· ···
a
ai1 ... ain i1 . . . ain
··· ··· ··· = ··· ··· ···
rai1 + aj 1 . . . rain + aj n rai1 . . . rain
··· ··· ··· ··· ··· ···
an1 ... ann an1 . . . ann
a11 . . . a1n
··· ··· ···
a
i1 . . . ain
+ · · · · · · · · · = det(A). (3.2.12)
aj 1 . . . aj n
··· ··· ···
an1 . . . ann
The above results provide us with practical computational techniques when
evaluating the determinant of an n × n matrix A. In fact, we may perform the
following three types of permissible row operations on A.
(1) Multiply a row of A by a nonzero scalar. Such an operation may also
be realized by multiplying A from the left by the matrix obtained from
multiplying the corresponding row of the n × n identity matrix I by the
same scalar.
(2) Interchange any two rows of A when n ≥ 2. Such an operation may also
be realized by multiplying A from the left by the matrix obtained from
interchanging the corresponding two rows of the n × n identity matrix I .
Theorem 3.8 The conclusions of Theorems 3.4, 3.5, and 3.6 hold when the
statements about the row vectors are replaced correspondingly with the column
vectors of matrices.
Proof Using induction, it is easy to see that the conclusions of Theorems 3.4
and 3.5 hold. We now prove that the conclusion of Theorem 3.6 holds when
the row interchange there is replaced with column interchange. That is, we
show that if two columns in an n × n matrix A are interchanged, its determi-
nant will change sign. This property is not so obvious since our definition of
determinant is based on the cofactor expansion by the first column vector and
an interchange of the first column with another column alters the first column
of the matrix. The effect of such an alteration on the value of the determinant needs to be examined closely, which will be our task below.
We still use induction.
At n = 2 the conclusion may be checked directly.
Assume the conclusion holds at n − 1 ≥ 1.
We now prove the conclusion at n ≥ 3. As before, it suffices to establish the
conclusion for any adjacent column interchange.
If the column interchange does not involve the first column, we see that the
conclusion about the sign change of the determinant clearly holds in view of
the inductive assumption and the cofactor expansion formula (3.2.2) since all
the cofactors Ci1 (i = 1, . . . , n) change their sign exactly once under any pair
of column interchange.
Now consider the effect of an interchange of the first and second columns
of A. It can be checked that such an operation may be carried out through
multiplying A by the matrix F from the right where F is obtained from the
n × n identity matrix I by interchanging the first and second columns of I . Of
course, det(F ) = −1.
Let E1 , . . . , Ek be a sequence of elementary matrices and U = (uij ) an
upper triangular matrix so that Ek · · · E1 A = U . Then we have
UF = \begin{pmatrix} u_{12} & u_{11} & u_{13} & \cdots & u_{1n} \\ u_{22} & 0 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & 0 & u_{nn} \end{pmatrix}.   (3.2.16)
Thus the cofactor expansion formula down the first column, as stated in Definition 3.3, and the inductive assumption at n − 1 lead us to the result
det(UF) = u_{12} \begin{vmatrix} 0 & u_{23} & \cdots & u_{2n} \\ 0 & u_{33} & \cdots & u_{3n} \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & u_{nn} \end{vmatrix} − u_{22} \begin{vmatrix} u_{11} & u_{13} & \cdots & u_{1n} \\ 0 & u_{33} & \cdots & u_{3n} \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & u_{nn} \end{vmatrix} = −u_{11} u_{22} \cdots u_{nn} = −det(U).   (3.2.17)
The formula (3.2.23) can be used immediately to derive a few simple but
basic conclusions about various matrices.
For example, if A ∈ F(n, n) is invertible, then there is some B ∈ F(n, n) such that AB = I_n. Thus det(A) det(B) = det(AB) = det(I_n) = 1, which implies that det(A) ≠ 0. In other words, the condition det(A) ≠ 0 is necessary for any A ∈ F(n, n) to be invertible. In the next section, we shall show that this condition is also sufficient. As another example, if A ∈ R(n, n) is orthogonal, then AA^t = I_n. Hence (det(A))² = det(A) det(A^t) = det(AA^t) = det(I_n) = 1. In other words, the determinant of an orthogonal matrix can only take values ±1. Similarly, if A ∈ C(n, n) is unitary, then the condition AA† = I_n leads us to the conclusion
|det(A)|² = det(A)\,\overline{det(A)} = det(A)\, det(\bar{A}^t) = det(AA†) = det(I_n) = 1   (3.2.26)
(cf. (3.2.38)). That is, the determinant of a unitary matrix is of modulus one.
Below we show that we can make a cofactor expansion along any column
or row to evaluate a determinant.
which establishes the legitimacy of the cofactor expansion formula along the
first row of A. The validity of the cofactor expansion along an arbitrary row
may be proved by induction as done for the column case.
where A₁ ∈ F(k, k), A₂ ∈ F(l, l), A₃ ∈ F(k, l), and k + l = n. Then we have the useful formula
det(A) = det(A₁) det(A₂).   (3.2.32)
Since the diagonal entries of U are u_{11}, …, u_{kk}, u_{k+1,k+1}, …, u_{nn}, we have
where A₁ ∈ F(k, k), A₂ ∈ F(l, l), A₃ ∈ F(l, k), and k + l = n, then (3.2.32) still holds. Indeed, taking the transpose in (3.2.35), we have
A^t = \begin{pmatrix} A₁^t & A₃^t \\ 0 & A₂^t \end{pmatrix},   (3.2.36)
which becomes a boxed upper triangular matrix studied earlier. Thus, using Theorem 3.10 and (3.2.32), we obtain
det(A) = det(A^t) = det(A₁^t) det(A₂^t) = det(A₁) det(A₂),
as anticipated.
Exercises
where Cij (t) is the cofactor of the entry aij (t), i, j = 1, . . . , n, in the
matrix A(t).
3.2.8 Prove the formula
det \begin{pmatrix} x & a_1 & a_2 & \cdots & a_n \\ a_1 & x & a_2 & \cdots & a_n \\ a_1 & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & a_3 & \cdots & x \end{pmatrix} = \Bigl( x + \sum_{i=1}^{n} a_i \Bigr) \prod_{i=1}^{n} (x − a_i).   (3.2.43)
3.2.9 Let p1 (t), . . . , pn+2 (t) be n+2 polynomials of degrees up to n ∈ N and
with coefficients in C. Show that for any n + 2 numbers c1 , . . . , cn+2
in C, there holds
det \begin{pmatrix} p_1(c_1) & p_1(c_2) & \cdots & p_1(c_{n+2}) \\ p_2(c_1) & p_2(c_2) & \cdots & p_2(c_{n+2}) \\ \vdots & \vdots & \ddots & \vdots \\ p_{n+2}(c_1) & p_{n+2}(c_2) & \cdots & p_{n+2}(c_{n+2}) \end{pmatrix} = 0.   (3.2.44)
3.2.12 Let A ∈ F(n, n) be such that AAt = In and det(A) < 0. Show that
det(A + In ) = 0.
3.2.13 Let A ∈ F(n, n) such that the entries of A are either 1 or −1. Show that
det(A) is an even integer when n ≥ 2.
3.2.14 Let A = (a_{ij}) ∈ R(n, n) satisfy the following diagonally dominant condition:
a_{ii} > \sum_{j ≠ i} |a_{ij}|,   i = 1, …, n.   (3.2.47)
Show that det(A) > 0. (This result is also known as the Minkowski
theorem.)
det(I_n − α^t β) = 1 − αβ^t.   (3.2.49)
In fact, it is easy to see that the left-hand side of (3.3.1) is the cofactor expansion, along the lth column, of the determinant of the matrix obtained from A by replacing the lth column with the kth column of A; since that matrix has two identical columns, its determinant must be zero. Likewise, the left-hand side of (3.3.2) is the cofactor expansion, along the lth row, of the determinant of the matrix obtained from A by replacing the lth row with the kth row of A, whose value must also be zero.
We can summarize the properties stated in (3.2.27), (3.3.1), and (3.3.2) by the expressions
A adj(A) = adj(A) A = det(A) I_n,   (3.3.3)
where
adj(A) = C^t.   (3.3.4)
Ax = b (3.3.7)
x_i = \frac{1}{det(A)} \sum_{j=1}^{n} A_{ij} b_j = \frac{1}{det(A)} \sum_{j=1}^{n} b_j C_{ji} = \frac{det(A_i)}{det(A)},   i = 1, …, n,   (3.3.9)
where Ai is the matrix obtained from A after replacing the ith column of A by
the vector b, i = 1, . . . , n.
The formulas stated in (3.3.9) are called Cramer’s formulas. Such a solution
method is also called Cramer’s rule.
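A small Python sketch (illustrative only; the helper name cramer_solve and the sample data are ours) makes the formulas concrete:

import numpy as np

def cramer_solve(A, b):
    # x_i = det(A_i)/det(A), A_i being A with its ith column replaced by b.
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        Ai = A.copy()
        Ai[:, i] = b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer_solve(A, b), np.linalg.solve(A, b))   # identical solutions

For large systems Gaussian elimination is far cheaper; Cramer's rule is mainly of theoretical interest.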
Let A ∈ F(m, n) be of rank k. Then there are k row vectors of A which are linearly independent. Use B to denote the submatrix of A consisting of those k row vectors. Since B is of rank k we know that there are k column vectors of B which are linearly independent. Use C to denote the submatrix of B consisting of those k column vectors. Then C is a submatrix of A which lies in F(k, k) and is of rank k. In particular det(C) ≠ 0. In other words, we have shown that if A is of rank k then A has a k × k submatrix of nonzero determinant.
To end this section, we consider a practical problem as an application of
determinants: The unique determination of a polynomial by interpolation.
Let p(t) be a polynomial of degree (n − 1) ≥ 1 over a field F given by
\begin{cases} a_0 + a_1 t_1 + \cdots + a_{n−2} t_1^{n−2} + a_{n−1} t_1^{n−1} = p_1, \\ a_0 + a_1 t_2 + \cdots + a_{n−2} t_2^{n−2} + a_{n−1} t_2^{n−1} = p_2, \\ \cdots\cdots \\ a_0 + a_1 t_n + \cdots + a_{n−2} t_n^{n−2} + a_{n−1} t_n^{n−1} = p_n, \end{cases}   (3.3.11)
det(A) = \begin{vmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n−2} & t_1^{n−1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n−2} & t_2^{n−1} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n−2} & t_n^{n−1} \end{vmatrix}.   (3.3.12)
Adding the (−t1 ) multiple of the second last column to the last column,. . . ,
the (−t1 ) multiple of the second column to the third column, and the (−t1 )
multiple of the first column to the second column, we get
det(A) = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & (t_2 − t_1) & t_2(t_2 − t_1) & \cdots & t_2^{n−3}(t_2 − t_1) & t_2^{n−2}(t_2 − t_1) \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & (t_n − t_1) & t_n(t_n − t_1) & \cdots & t_n^{n−3}(t_n − t_1) & t_n^{n−2}(t_n − t_1) \end{vmatrix}
= \prod_{i=2}^{n} (t_i − t_1) \begin{vmatrix} 1 & t_2 & t_2^2 & \cdots & t_2^{n−3} & t_2^{n−2} \\ 1 & t_3 & t_3^2 & \cdots & t_3^{n−3} & t_3^{n−2} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n−3} & t_n^{n−2} \end{vmatrix}.   (3.3.13)
det(A) = \prod_{i=2}^{n} (t_i − t_1) \Bigl( \prod_{j=3}^{n} (t_j − t_2) \Bigr) \cdots \Bigl( \prod_{k=n−1}^{n} (t_k − t_{n−2}) \Bigr) (t_n − t_{n−1}) = \prod_{1 ≤ i < j ≤ n} (t_j − t_i).   (3.3.14)
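The formula is easy to verify numerically; the following Python lines (illustrative, with arbitrarily chosen distinct nodes) compare both sides of (3.3.14):

import numpy as np
from itertools import combinations

t = np.array([0.5, 1.0, 2.0, 3.5])            # distinct values t_1, ..., t_n
V = np.vander(t, increasing=True)             # rows (1, t_i, t_i^2, ..., t_i^{n-1})
lhs = np.linalg.det(V)
rhs = np.prod([t[j] - t[i] for i, j in combinations(range(len(t)), 2)])
print(lhs, rhs)                               # both equal the product of (t_j - t_i), i < j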
Exercises
r(adj(A)) = \begin{cases} n, & r(A) = n, \\ 1, & r(A) = n − 1, \\ 0, & r(A) ≤ n − 2. \end{cases}   (3.3.16)
\begin{cases} a_{11} x_1 + \cdots + a_{1n} x_n = 0, \\ \cdots\cdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n = 0. \end{cases}   (3.3.17)
3.4 Characteristic polynomials and the Cayley–Hamilton theorem
We first consider the concrete case of matrices.
Let A = (a_{ij}) be an n × n matrix over a field F. Consider the linear mapping T_A: Fⁿ → Fⁿ induced from A and defined by
T_A(x) = Ax,   x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} ∈ Fⁿ.   (3.4.1)
det(λIn − A) = 0. (3.4.3)
Of course the converse is true as well: If λ satisfies (3.4.3) then (λIn −A)x = 0
has a nontrivial solution which indicates that λ is an eigenvalue of A. Conse-
quently the eigenvalues of A are the roots of the function
can only appear in the product of λ − a22 and its cofactor in the matrix λIn−1 −
An−1 . Carrying out this process to the end we see that the two leading-degree
terms in pA (λ) can appear only in the product
(λ − a11 ) · · · (λ − ann ), (3.4.7)
which may be read off to give us the results
λn , −(a11 + · · · + ann )λn−1 . (3.4.8)
This completes the proof.
If A ∈ C(n, n) we may also establish Theorem 3.15 by means of calculus. In fact, we have
\frac{p_A(λ)}{λ^n} = det\Bigl( I_n − \frac{1}{λ} A \Bigr).   (3.4.9)
Thus, letting λ → ∞ in (3.4.9), we obtain a_n = det(I_n) = 1. Moreover, (3.4.9) also gives us the expression
a_n + a_{n−1} \frac{1}{λ} + \cdots + a_0 \frac{1}{λ^n} = det\Bigl( I_n − \frac{1}{λ} A \Bigr).   (3.4.10)
Thus, replacing \frac{1}{λ} by t, we get
q(t) ≡ a_n + a_{n−1} t + \cdots + a_0 t^n = \begin{vmatrix} 1 − ta_{11} & −ta_{12} & \cdots & −ta_{1n} \\ −ta_{21} & 1 − ta_{22} & \cdots & −ta_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ −ta_{n1} & −ta_{n2} & \cdots & 1 − ta_{nn} \end{vmatrix}.   (3.4.11)
Therefore
a_{n−1} = q′(0) = −\sum_{i=1}^{n} a_{ii}   (3.4.12)
as before.
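These relations among the coefficients of the characteristic polynomial, the trace, and the determinant can be checked numerically. The Python sketch below is not part of the text; the sample matrix is arbitrary, and numpy.poly returns the coefficients of det(λI − A) with the highest degree first.

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 4.0],
              [0.0, 4.0, 5.0]])

coeffs = np.poly(A)                                # [a_n, a_{n-1}, ..., a_0]
print(coeffs[0])                                   # a_n = 1
print(coeffs[1], -np.trace(A))                     # a_{n-1} = -(a_11 + ... + a_nn), as in (3.4.12)
print(coeffs[-1], (-1) ** 3 * np.linalg.det(A))    # constant term equals (-1)^n det(A)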
We now consider the abstract case.
Let U be an n-dimensional vector space over a field F and T ∈ L(U ).
Assume that u ∈ U is an eigenvector of T associated to the eigenvalue λ ∈ F.
Given a basis U = {u1 , . . . , un } we express u as
u = x1 u1 + · · · + xn un , x = (x1 , . . . , xn )t ∈ Fn . (3.4.13)
Let A = (aij ) ∈ F(n, n) be the matrix representation of T with respect to the
basis U so that
T(u_j) = \sum_{i=1}^{n} a_{ij} u_i,   j = 1, …, n.   (3.4.14)
Following the study of the previous chapter we know that A, B, C are re-
lated through the similarity relation A = C −1 BC. Hence we have
pA (λ) = det(λIn − A) = det(λIn − C −1 BC)
= det(C −1 [λIn − B]C)
= det(λIn − B) = pB (λ). (3.4.18)
That is, two similar matrices have the same characteristic polynomial. Thus
we may use pA (λ) to define the characteristic polynomial of linear mapping
T ∈ L(U ), rewritten as pT (λ), where A is the matrix representation of T with
respect to any given basis of U , since such a polynomial is independent of the
choice of the basis.
u_j = \sum_{i=1}^{n} u_{ij} e_i,   j = 1, …, n.   (3.4.25)
Exercises
where Cii is the cofactor of the entry aii of the matrix A (i = 1, . . . , n).
3.4.2 Consider the subset D of C(n, n) defined by
D = {A ∈ C(n, n) | A has n distinct eigenvalues}. (3.4.39)
4 Scalar products
Let u, v ∈ U so that u is not null. Then we can resolve v into the sum of
two mutually perpendicular vectors, one in Span{u}, say cu for some scalar
c, and one in Span{u}⊥ , say w. In fact, rewrite v as v = w + cu and require
(u, w) = 0. We obtain the unique solution c = (u, v)/(u, u). In summary, we
have obtained the orthogonal decomposition
(u, v) (u, v)
v=w+ u, w=v− u ∈ Span{u}⊥ . (4.1.2)
(u, u) (u, u)
As the first application of the decomposition (4.1.2), we state the following.
V = Span{v1 , . . . , vl }. (4.1.3)
It is clear that (w_2 + w_3, w_2 + w_3) = 2(w_2, w_3) ≠ 0 and (w_2 + w_3) ⊥ w_1. Since {v_1, v_2 + v_3, v_3, …, v_l} is also a basis of V, the above procedure indicates that we may rename the basis vectors {v_1, …, v_l} of V if necessary so that we obtain
w_2 = v_2 − \frac{(w_1, v_2)}{(w_1, w_1)} w_1,   (4.1.6)
which satisfies (w_2, w_2) ≠ 0. Of course Span{v_1, …, v_l} = Span{w_1, w_2, v_3, …, v_l}. Now set
w_i = v_i − \frac{(w_2, v_i)}{(w_2, w_2)} w_2 − \frac{(w_1, v_i)}{(w_1, w_1)} w_1,   i = 3, …, l.   (4.1.7)
Then w_i ⊥ w_2 and w_i ⊥ w_1 for i = 3, …, l. If (w_i, w_i) ≠ 0 for some i = 3, …, l, by renaming {v_1, …, v_l} if necessary, we may assume (w_3, w_3) ≠ 0.
If (w_i, w_i) = 0 for all i = 3, …, l, then there is some i = 4, …, l such that (w_3, w_i) ≠ 0. We may assume (w_3, w_4) ≠ 0. Thus (w_3 + w_4, w_3 + w_4) = 2(w_3, w_4) ≠ 0 and (w_3 + w_4) ⊥ w_1, (w_3 + w_4) ⊥ w_2. Of course, {v_1, v_2, v_3 + v_4, v_4, …, v_l} is also a basis of V. Thus we see that we may rename the basis vectors {v_1, …, v_l} of V so that we obtain
w_3 = v_3 − \frac{(w_2, v_3)}{(w_2, w_2)} w_2 − \frac{(w_1, v_3)}{(w_1, w_1)} w_1,   (4.1.8)
which again satisfies (w_3, w_3) ≠ 0 and
Span{v_1, …, v_l} = Span{w_1, w_2, w_3, v_4, …, v_l}.   (4.1.9)
Therefore, by renaming the basis vectors {v1 , . . . , vl } if necessary, we will
be able to carry the above procedure out and obtain a new set of vectors
{w1 , . . . , wl } given by
w_1 = v_1,   w_i = v_i − \sum_{j=1}^{i−1} \frac{(w_j, v_i)}{(w_j, w_j)} w_j,   i = 2, …, l,   (4.1.10)
The method described in the proof of Theorem 4.2, especially the scheme
given by the formulas in (4.1.10)–(4.1.11), is known as the Gram–Schmidt
procedure for basis orthogonalization.
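When the scalar product is positive definite no renaming of the basis vectors is ever needed and (4.1.10) becomes the familiar algorithm. The Python sketch below (illustrative; the function name gram_schmidt is ours) carries it out for the Euclidean scalar product.

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize linearly independent vectors by subtracting, from each v,
    # its components along the previously constructed w's, as in (4.1.10).
    ws = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in ws:
            w -= (u @ w) / (u @ u) * u
        ws.append(w)
    return ws

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
W = gram_schmidt(basis)
print(np.round([[u @ v for v in W] for u in W], 10))   # off-diagonal entries vanish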
n_0 + n_+ + n_− ≤ dim(U).   (4.1.20)
Exercises
4.1.5 In special relativity, one equips the space R4 with the Minkowski scalar
product or Minkowski metric given by
(u, v) = a_1 b_1 − a_2 b_2 − a_3 b_3 − a_4 b_4,   u = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix},   v = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} ∈ R⁴.   (4.1.24)
Find an orthogonal basis and determine the indices of nullity, positivity,
and negativity of R4 equipped with this scalar product.
4.1.6 With the notation of the previous exercise, consider the following modi-
fied scalar product
(u, v) = a1 b1 − a2 b3 − a3 b2 − a4 b4 . (4.1.25)
(V + W )⊥ = V ⊥ ∩ W ⊥ . (4.1.26)
⟨u, ρ(av)⟩ = (u, av) = a(u, v) = a⟨u, ρ(v)⟩ = ⟨u, aρ(v)⟩,   (4.2.11)
On the other hand, for T ∈ L(U), recall the dual mapping of T, T′ ∈ L(U′). The mapping
T* = ρ^{−1} ◦ T′ ◦ ρ   (4.2.15)
is called the dual or adjoint mapping of T with respect to the scalar product (·, ·).
Definition 4.5 Let T ∈ L(U ) where U is a vector space equipped with a scalar
product (·, ·). We say that T is an orthogonal mapping if (T (u), T (v)) = (u, v)
for any u, v ∈ U .
Exercises
Let
V = Span\Bigl\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \Bigr\}.   (4.2.31)
over R².
(i) Show that the scalar product (·, ·)0 is non-degenerate.
(ii) For T_A ∈ L(R²) given in (4.2.32), obtain the adjoint mapping T_A′ of T_A and find conditions on the matrix A so that T_A is self-adjoint with respect to the scalar product (·, ·)_0.
4.2.5 Use U to denote the vector space of real-valued functions with all
orders of derivatives over the real line R which vanish outside bounded
intervals. Equip U with the scalar product
(u, v) = \int_{−∞}^{∞} u(t) v(t)\, dt,   u, v ∈ U.   (4.2.34)
Show that the linear mapping D = \frac{d}{dt}: U → U is anti-self-dual or anti-self-adjoint. That is, D′ = −D.
4.2.6 Let U be a finite-dimensional vector space equipped with a scalar prod-
uct, (·, ·). Decompose U into the direct sum as stated in (4.1.15). Show
that we may use the scalar product (·, ·) of U to make the quotient
space U/U0 into a space with a non-degenerate scalar product when
U0 = U , still denoted by (·, ·), given by
([u], [v]) = (u, v), [u], [v] ∈ U/U0 . (4.2.35)
4.2.7 Let U be a finite-dimensional vector space equipped with a scalar prod-
uct, (·, ·), and let V be a subspace of U . Show that (·, ·) is a non-
degenerate scalar product over V if and only if V ∩ V ⊥ = {0}.
and complex ones over the standard Hermitian scalar product on Cⁿ:
(u, v) = \bar{u} · v = u† v = \bar{a}_1 b_1 + \cdots + \bar{a}_n b_n,   u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix},   v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} ∈ Cⁿ.   (4.3.2)
Definition 4.7 A positive definite scalar product over a real vector space U
is a scalar product (·, ·) satisfying (u, u) ≥ 0 for u ∈ U and (u, u) = 0 only
when u = 0.
A positive definite scalar product over a complex vector space U is a scalar
function (u, v) ∈ C, defined for each pair of vectors u, v ∈ U , which satisfies
the following conditions.
(1) (Hermitian symmetry) (u, v) = (v, u) for u, v ∈ U .
(2) (Additivity) (u + v, w) = (u, w) + (v, w) for u, v, w ∈ U .
(3) (Partial homogeneity) (u, av) = a(u, v) for a ∈ C and u, v ∈ U .
(4) (Positivity) (u, u) ≥ 0 for u ∈ U and (u, u) = 0 only when u = 0.
Since the real case is contained as a special situation of the complex case,
we shall focus our discussion on the complex case, unless otherwise stated.
Needless to say, additivity regarding the second argument in (·, ·) still holds
since
(u, v + w) = (v + w, u) = (v, u) + (w, u) = (u, v) + (u, w), u, v, w ∈ U.
(4.3.3)
On the other hand, homogeneity regarding the first argument takes a modified
form,
(au, v) = (v, au) = a(v, u) = a(u, v), a ∈ C, u ∈ U. (4.3.4)
We will extend our study carried out in the previous two sections for general
scalar products to the current situation of a positive definite scalar product that
is necessarily non-degenerate since (u, u) > 0 for any nonzero vector u in U .
First, we see that for u, v ∈ U we can still use the condition (u, v) = 0 to
define u, v to be mutually perpendicular vectors. Next, since for any u ∈ U we
have (u, u) ≥ 0, we can formally define the norm of u as in Cn by
‖u‖ = \sqrt{(u, u)}.   (4.3.5)
It is clearly seen that the norm so defined enjoys the positivity and homogeneity
conditions required of a norm. That it also satisfies the triangle inequality will
be established shortly. Thus (4.3.5) indeed gives rise to a norm of the space U
that is said to be induced from the positive definite scalar product (·, ·).
Let u, v ∈ U be perpendicular. Then we have
‖u + v‖² = ‖u‖² + ‖v‖².   (4.3.6)
This important expression is also known as the Pythagoras theorem, which may be proved by a simple expansion
‖u + v‖² = (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v) = ‖u‖² + ‖v‖²,   (4.3.7)
f = a_1 f_1 + \cdots + a_n f_n.   (4.3.18)
Consequently, we have
f_i ↦ u_i,   i = 1, …, n;    f = a_1 f_1 + \cdots + a_n f_n ↦ \bar{a}_1 u_1 + \cdots + \bar{a}_n u_n.   (4.3.20)
Proof We need only carry out the proof in the complex case because now the scalar product (·, ·) fails to be symmetric and the relation (4.2.18) is invalid.
That T being unitary implies (1) is trivial since ‖T(u)‖² = (T(u), T(u)) = (u, u) = ‖u‖² for any u ∈ U.
Assume (1) holds. From the expansions
(T(u), T(v)) = \frac12\bigl( ‖T(u + v)‖² − ‖T(u)‖² − ‖T(v)‖² \bigr) + \frac12 i\bigl( ‖T(iu + v)‖² − ‖T(u)‖² − ‖T(v)‖² \bigr)
= \frac12\bigl( ‖u + v‖² − ‖u‖² − ‖v‖² \bigr) + \frac12 i\bigl( ‖iu + v‖² − ‖u‖² − ‖v‖² \bigr)
= (u, v),   u, v ∈ U.   (4.3.28)
Hence T is unitary.
The rest of the proof is similar to that of Theorem 4.6 and thus skipped.
(u, v) = u^t v,   u, v ∈ Rⁿ.   (4.3.29)
Thus
T(u_j) = \sum_{i=1}^{n} a_{ij} u_i,   j = 1, …, n.   (4.3.31)
(u, v) = \bar{u}^t v,   u, v ∈ Cⁿ.   (4.3.33)
Thus
(u, T_A(v)) = \bar{u}^t A v = \overline{(\bar{A}^t u)}^t v = (T_{\bar{A}^t}(u), v),   u, v ∈ Cⁿ.   (4.3.34)
unitary if and only if its sets of column and row vectors both form orthonormal
bases of Cn with the standard Hermitian scalar product.
A complex matrix A ∈ C(n, n) is called Hermitian if it satisfies the property A = A† or, equivalently, if it defines a self-adjoint mapping, T_A′ = T_A.
Indeed, let η denote the right-hand side of (4.3.39). From the Schwarz inequality (4.3.10), we get |(u, v)| ≤ ‖u‖ for any v ∈ U satisfying ‖v‖ = 1. So η ≤ ‖u‖. To show that η ≥ ‖u‖, it suffices to consider the nontrivial situation u ≠ 0. In this case, we have
η ≥ \Bigl( u, \frac{1}{‖u‖} u \Bigr) = ‖u‖.   (4.3.40)
Hence, in conclusion, η = ‖u‖ and (4.3.39) follows.
Exercises
4.3.1 Let U be a vector space with a positive definite scalar product (·, ·).
Show that u1 , . . . , uk ∈ U are linearly independent if and only if their
associated metric matrix, also called the Gram matrix,
M = \begin{pmatrix} (u_1, u_1) & \cdots & (u_1, u_k) \\ \cdots & \cdots & \cdots \\ (u_k, u_1) & \cdots & (u_k, u_k) \end{pmatrix}   (4.3.41)
is nonsingular.
4.3.2 (Continued from Exercise 4.3.1) Show that if u ∈ U lies in
Span{u1 , . . . , uk } then the column vector ((u1 , u), . . . , (uk , u))t lies in
the column space of the metric matrix M. However, the converse is not
true when k < dim(U ).
4.3.3 Let U be a complex vector space with a positive definite scalar product
and B = {u1 , . . . , un } an orthonormal basis of U . For T ∈ L(U ), let
A, A ∈ C(n, n) be the matrices that represent T , T , respectively, with
respect to the basis B. Show that A = A† .
4.3.4 Let U be a finite-dimensional complex vector space with a positive definite scalar product and S ∈ L(U) be anti-self-adjoint. That is, S′ = −S. Show that I ± S must be invertible.
4.3.5 Consider the complex vector space C(m, n). Show that
(A, B) = Tr(A† B),   A, B ∈ C(m, n),
defines a positive definite scalar product over C(m, n) that extends the traditional Hermitian scalar product over Cᵐ = C(m, 1). With such a scalar product, the quantity ‖A‖ = \sqrt{(A, A)} is sometimes called the Hilbert–Schmidt norm of the matrix A ∈ C(m, n).
4.3.6 Let (·, ·) be the standard Hermitian scalar product on Cm and A ∈
C(m, n). Establish the following statement known as the Fredholm
alternative for complex matrix equations: Given b ∈ Cm the non-
homogeneous equation Ax = b has a solution for some x ∈ Cn if and
only if (y, b) = 0 for any solution y ∈ Cm of the homogeneous equation
A† y = 0.
4.3.7 For A ∈ C(n, n) show that if the column vectors of A form an orthonor-
mal basis of Cn with the standard Hermitian scalar product so do the
row vectors of A.
4.3.8 For the matrix
A = \begin{pmatrix} 1 & −1 & 1 \\ −1 & 1 & 2 \\ 2 & 1 & −2 \end{pmatrix},   (4.3.43)
obtain a QR factorization.
u = v + w, v ∈ V, w ∈ V ⊥, (4.4.3)
v = \sum_{i=1}^{k} \frac{(v_i, u)}{(v_i, v_i)} v_i.   (4.4.4)
Moreover, the vector v given in (4.4.4) is the unique solution of the minimization problem
η ≡ inf{ ‖u − x‖ | x ∈ V }.   (4.4.5)
Proof The validity of the expression (4.4.3) for some unique v ∈ V and
w ∈ V ⊥ is already ensured by Theorem 4.11.
We rewrite v as
v = \sum_{i=1}^{k} a_i v_i.   (4.4.6)
u − \sum_{i=1}^{k} b_i v_i = w + \sum_{i=1}^{k} (a_i − b_i) v_i,    x = \sum_{i=1}^{k} b_i v_i ∈ V.   (4.4.8)
Consequently,
‖u − x‖² = \Bigl\| u − \sum_{i=1}^{k} b_i v_i \Bigr\|² = \Bigl\| w + \sum_{i=1}^{k} (a_i − b_i) v_i \Bigr\|² = ‖w‖² + \sum_{i=1}^{k} |a_i − b_i|² ‖v_i‖² ≥ ‖w‖²,   (4.4.9)
and the lower bound ‖w‖² is attained only when b_i = a_i for all i = 1, …, k, or x = v.
So the proof is complete.
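The projection formula (4.4.4) and its minimizing property are easy to experiment with. The following Python sketch is not from the text; the helper name project and the sample vectors are ours, and the Euclidean scalar product is used.

import numpy as np

def project(u, orthogonal_set):
    # Orthogonal projection of u onto Span{v_1, ..., v_k} by formula (4.4.4);
    # the v_i are assumed mutually perpendicular and nonzero.
    v = np.zeros_like(u, dtype=float)
    for vi in orthogonal_set:
        v += (vi @ u) / (vi @ vi) * vi
    return v

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
u = np.array([3.0, 1.0, 2.0])
v = project(u, [v1, v2])
w = u - v
print(v, w, v @ w)                                               # w is perpendicular to the span
print(np.linalg.norm(u - v) <= np.linalg.norm(u - 2.0 * v1))     # v minimizes the distance to the span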
Theorem 4.15 Let U be a vector space with a positive definite scalar product
and V = {v1 , . . . , vn } a set of orthogonal vectors in U . For any u ∈ U , let the
scalars a1 , . . . , an be the Fourier coefficients of u with respect to V.
Exercises
P(u) = \sum_{i=1}^{k} \frac{(v_i, u)}{(v_i, v_i)} v_i,   u ∈ U,   (4.4.16)
In other words, the distance of the images of any two vectors in U under T
is the same as that between the two vectors. A mapping from U into itself
satisfying such a property is called an isometry or isometric. In this section, we
show that any zero-vector preserving mapping from a real vector space U with
a positive definite scalar product into itself satisfying the property (4.5.1) must
be linear. Therefore, in view of Theorem 4.6, it is orthogonal. In other words,
in the real setting, being isometric characterizes a mapping being orthogonal.
Theorem 4.16 Let U be a real vector space with a positive definite scalar product. A mapping T from U into itself satisfies the isometric property (4.5.1) and T(0) = 0 if and only if it is orthogonal.
(u, v) = \frac12\bigl( ‖u + v‖² − ‖u‖² − ‖v‖² \bigr),   u, v ∈ U.   (4.5.2)
Hence (T (u), T (v)) = (u, v) for any u, v ∈ U . Using this result, we have
‖T(u + v) − T(u) − T(v)‖² = ‖T(u + v)‖² + ‖T(u)‖² + ‖T(v)‖² − 2(T(u + v), T(u)) − 2(T(u + v), T(v)) + 2(T(u), T(v))
= ‖u + v‖² + ‖u‖² + ‖v‖² − 2(u + v, u) − 2(u + v, v) + 2(u, v) = ‖(u + v) − u − v‖² = 0,   (4.5.4)
We next give an example showing that, in the complex situation, being iso-
metric alone is not sufficient to ensure a mapping to be unitary.
The vector space we consider here is taken to be Cⁿ with the standard Hermitian scalar product (4.3.2) and the mapping T: Cⁿ → Cⁿ is given by T(u) = \bar{u} for u ∈ Cⁿ. Then T satisfies (4.5.1) and T(0) = 0. However,
Theorem 4.17 Let U be a complex vector space with a positive definite scalar product (·, ·). A mapping T from U into itself satisfies the isometric property (4.5.1), T(0) = 0, and
‖iT(u) − T(v)‖ = ‖iu − v‖,   u, v ∈ U,   (4.5.8)
if and only if it is unitary, where ‖·‖ is induced from (·, ·).
Proof It is obvious that if T ∈ L(U ) is unitary then both (4.5.1) and (4.5.8)
are fulfilled. We now need to show that the converse is true.
First, setting v = iu in (4.5.8), we obtain
T (iu) = iT (u), u ∈ U. (4.5.9)
Next, replacing u, v in (4.3.27) by T(u), −T(v), respectively, and using (4.5.1) and (4.5.8), we have
−(T(u), T(v)) = \frac12\bigl( ‖T(u) − T(v)‖² − ‖T(u)‖² − ‖T(v)‖² \bigr) + \frac12 i\bigl( ‖iT(u) − T(v)‖² − ‖T(u)‖² − ‖T(v)‖² \bigr)
= \frac12\bigl( ‖u − v‖² − ‖u‖² − ‖v‖² \bigr) + \frac12 i\bigl( ‖iu − v‖² − ‖u‖² − ‖v‖² \bigr)
= −(u, v),   u, v ∈ U.   (4.5.10)
That is, (T (u), T (v)) = (u, v) for u, v ∈ U . Thus a direct expansion gives us
the result
‖T(u + v) − T(u) − T(v)‖² = ‖T(u + v)‖² + ‖T(u)‖² + ‖T(v)‖² − 2(T(u + v), T(u)) − 2(T(u + v), T(v)) + 2(T(u), T(v))
= ‖u + v‖² + ‖u‖² + ‖v‖² − 2(u + v, u) − 2(u + v, v) + 2(u, v) = ‖(u + v) − u − v‖² = 0,   (4.5.11)
It is worth noting that Theorems 4.16 and 4.17 are valid in general without
restricting to finite-dimensional spaces.
Exercises
4.5.1 Let U be a real vector space with a positive definite scalar product (·, ·).
Establish the following variant of Theorem 4.16: A mapping T from U
into itself satisfying the property
5 Real quadratic forms and self-adjoint mappings
In this chapter we exclusively consider vector spaces over the field of reals
unless otherwise stated. We first present a general discussion on bilinear and
quadratic forms and their matrix representations. We also show how a sym-
metric bilinear form may be uniquely represented by a self-adjoint mapping.
We then establish the main spectrum theorem for self-adjoint mappings based
on a proof of the existence of an eigenvalue using calculus. We next focus on
characterizing the positive definiteness of self-adjoint mappings. After these
we study the commutativity of self-adjoint mappings. In the last section we
show the effectiveness of using self-adjoint mappings in computing the norm
of a mapping between different spaces and in the formalism of least squares
approximations.
f(u, v) = f\Bigl( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} y_j u_j \Bigr) = \sum_{i,j=1}^{n} x_i f(u_i, u_j) y_j = x^t A y,   (5.1.1)
ũ_j = \sum_{i=1}^{n} b_{ij} u_i,   j = 1, …, n,   (5.1.2)
which leads to the relation à = B t AB and gives rise to the following concept.
Definition 5.2 For A, B ∈ F(n, n), we say that A and B are congruent if there
is an invertible element C ∈ F(n, n) such that
A = C t BC. (5.1.4)
which is called the quadratic form associated with the bilinear form f . The
quadratic form q is homogeneous of degree 2 since q(tu) = t 2 q(u) for any
t ∈ R and u ∈ U .
Of course q is uniquely determined by f through (5.1.5). However, the con-
verse is not true, which will become clear after the following discussion.
To proceed, let B = {u1 , . . . , un } be a basis of U and u ∈ U any given
vector whose coordinate vector with respect to B is x = (x1 , . . . , xn )t . Then,
from (5.1.1), we have
q(u) = x^t A x = x^t \Bigl( \frac12 (A + A^t) + \frac12 (A − A^t) \Bigr) x = \frac12 x^t (A + A^t) x,   (5.1.6)
Thus, if q is the quadratic form associated with f, we derive from (5.1.8) the relation
f(u, v) = \frac12\bigl( q(u + v) − q(u) − q(v) \bigr),   u, v ∈ U,   (5.1.9)
which indicates how f is uniquely determined by q. In a similar manner, we also have
f(u, v) = \frac14\bigl( q(u + v) − q(u − v) \bigr),   u, v ∈ U.   (5.1.10)
As in the situation of scalar products, the relations of the types (5.1.9) and
(5.1.10) are often referred to as polarization identities for symmetric bilinear
forms.
From now on we will concentrate on symmetric bilinear forms.
Let f : U × U → R be a symmetric bilinear form. If x, y ∈ Rn are the
coordinate vector of u, v ∈ U with respect to a basis B, then f (u, v) is given
by (5.1.1) so that matrix A ∈ R(n, n) is symmetric. Recall that (x, y) = x t y is
the Euclidean scalar product over Rn . Thus, if we view A as a linear mapping
Rn → Rn given by x #→ Ax, then the right-hand side of (5.1.1) is simply
(x, Ay). Since A = At , the right-hand side of (5.1.1) is also (Ax, y). In other
words, A defines a self-adjoint mapping over Rn with respect to the standard
Euclidean scalar product over Rn .
Conversely, if U is a vector space equipped with a positive definite scalar
product (·, ·) and T ∈ L(U ) is a self-adjoint or symmetric mapping, then
is a symmetric bilinear form. Thus, in this way, we see that symmetric bilinear
forms are completely characterized.
In a more precise manner, we have the following theorem, which relates
symmetric bilinear forms and self-adjoint mappings over a vector space with a
positive definite scalar product.
Proof For each v ∈ U , the existence of a unique vector, say T (v), so that
(5.1.11) holds, is already shown in Section 4.2. Since f is bilinear, we have
T ∈ L(U ). The self-adjointness or symmetry of T follows from the symmetry
of f and the scalar product.
Note that, in view of Section 4.1, a symmetric bilinear form is exactly a kind
of scalar product as well, not necessarily positive definite, though.
Exercises
Theorem 5.5 Assume that T ∈ L(U ) is self-adjoint with respect to the scalar
product of U . Then
(1) T must have a real eigenvalue,
(2) there is an orthonormal basis of U consisting of eigenvectors of T .
Then set
Q(x) = (u, T(u)) = \Bigl( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} x_j T(u_j) \Bigr) = \Bigl( \sum_{i=1}^{n} x_i u_i, \sum_{j,k=1}^{n} x_j a_{kj} u_k \Bigr) = \sum_{i,j,k=1}^{n} x_i x_j a_{kj} δ_{ik} = \sum_{i,j=1}^{n} x_i x_j a_{ij} = x^t A x.   (5.2.2)
which may also be identified as the unit sphere in Rn centered at the origin
and commonly denoted by S n−1 . Since S n−1 is compact, the function Q given
in (5.2.2) attains its minimum over S n−1 at a certain point on S n−1 , say x 0 =
(x10 , . . . , xn0 )t .
We may assume x_n^0 ≠ 0. Without loss of generality, we may also assume x_n^0 > 0 (the case x_n^0 < 0 can be treated similarly). Hence, near x^0, we may
Therefore, with
P(x_1, …, x_{n−1}) = Q\Bigl( x_1, …, x_{n−1}, \sqrt{1 − x_1^2 − \cdots − x_{n−1}^2} \Bigr),   (5.2.5)
P(x_1, …, x_{n−1}) = \sum_{i,j=1}^{n−1} a_{ij} x_i x_j + 2 \sum_{i=1}^{n−1} a_{in} x_i \sqrt{1 − x_1^2 − \cdots − x_{n−1}^2} + a_{nn}\bigl( 1 − x_1^2 − \cdots − x_{n−1}^2 \bigr).   (5.2.7)
Thus
\frac{∂P}{∂x_i} = 2 \sum_{j=1}^{n−1} a_{ij} x_j + 2 a_{in} \sqrt{1 − x_1^2 − \cdots − x_{n−1}^2} − \frac{2 x_i}{\sqrt{1 − x_1^2 − \cdots − x_{n−1}^2}} \sum_{j=1}^{n−1} a_{jn} x_j − 2 a_{nn} x_i,   i = 1, …, n − 1.   (5.2.8)
T(v_1) = T\Bigl( \sum_{j=1}^{n} x_j^0 u_j \Bigr) = \sum_{j=1}^{n} x_j^0 T(u_j) = \sum_{i,j=1}^{n} a_{ij} u_i x_j^0 = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{n} a_{ij} x_j^0 \Bigr) u_i = \sum_{i=1}^{n} λ_0 x_i^0 u_i = λ_0 v_1,   (5.2.12)
Exercises
5.2.1 The fact that a symmetric matrix A ∈ R(n, n) has and can have only
real eigenvalues may also be proved algebraically more traditionally as
follows. Consider A as an element in C(n, n) and let λ ∈ C be any of
its eigenvalues, whose existence is ensured by the Fundamental Theorem
5.2.15 Let A ∈ R(n, n) be a symmetric matrix whose eigenvalues are all non-negative. Show that det(A + I_n) > 1 if A ≠ 0.
5.2.16 Let u ∈ Rn be a nonzero column vector. Show that there is an orthog-
onal matrix Q ∈ R(n, n) such that
In this section, we apply the results of the previous section to investigate the
situation when q stays positive, which is important in applications.
It is clear that T = S 2 .
Conversely, if T = S² for some positive definite mapping S, then 0 is not an eigenvalue of S. So S is invertible. Therefore
(u, T(u)) = (u, S²(u)) = (S(u), S(u)) = ‖S(u)‖² > 0,   u ∈ U, u ≠ 0,   (5.3.9)
and the positive definiteness of T follows.
Let {u_1, …, u_n} be an arbitrary basis of U and x = (x_1, …, x_n)^t ∈ Rⁿ a nonzero vector. For u = \sum_{i=1}^{n} x_i u_i, we have
(u, T(u)) = \Bigl( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} x_j T(u_j) \Bigr) = \sum_{i,j=1}^{n} x_i (u_i, T(u_j)) x_j = x^t A x,   (5.3.10)
which is positive if T is positive definite, and vice versa. So the equivalence of
the positive definiteness of T and the statement (4) follows.
Suppose that A is positive definite. Let λ be any eigenvalue of A and x an
associated eigenvector. Then x t Ax = λx t x > 0 implies λ > 0 since x t x > 0.
So (5) follows.
Now assume (5). Using Theorem 5.6, we see that A = P t DP where D is a
diagonal matrix in R(n, n) whose diagonal entries are the positive eigenvalues
A = P^t D P = P^t D_1^2 P = (P^t D_1 P)(P^t D_1 P) = B²,   B = P^t D_1 P.   (5.3.13)
Using (5) we know that B is positive definite. So (6) follows. Conversely, if (6) holds, we may use the symmetry of B to get x^t A x = x^t B² x = (Bx)^t (Bx) > 0 whenever x ∈ Rⁿ with x ≠ 0 because B is nonsingular, which implies Bx ≠ 0. Thus (4) holds.
The proof is complete.
Proof That A is positive definite is equivalent to either (1) or (2) has already
been demonstrated in the proof of Theorem 5.8 and it remains to establish (3).
If there is a nonsingular matrix B ∈ R(n, n) such that A = B^t B, then x^t A x = (Bx)^t (Bx) > 0 for x ∈ Rⁿ with x ≠ 0, which implies that A is
positive definite.
Now assume A is positive definite. Then, combining Theorem 5.6 and (1),
we can rewrite A as A = P t DP where P ∈ R(n, n) is orthogonal and
D ∈ R(n, n) is diagonal whose diagonal entries, say λ1 , . . . , λn , are all posi-
tive. Define the diagonal matrix D1 by (5.3.12) and set D1 P = B. Then B is
nonsingular and A = B t B as asserted.
A is negative definite; when a = −√3, A is negative semi-definite but not negative definite; when a ∈ (−√3, √3), A is indefinite.
Exercises
det(λA − B) = 0, (5.3.23)
we have
x^t A x = y^t \begin{pmatrix} I_{n−1} & β \\ β^t & a_{nn} \end{pmatrix} y = y_1^2 + \cdots + y_{n−1}^2 + 2( b_1 y_1 + \cdots + b_{n−1} y_{n−1} ) y_n + a_{nn} y_n^2
= ( y_1 + b_1 y_n )^2 + \cdots + ( y_{n−1} + b_{n−1} y_n )^2 + ( a_{nn} − b_1^2 − \cdots − b_{n−1}^2 ) y_n^2,   (5.4.8)
y^t A_{i_1,…,i_k} y = \sum_{l,m=1}^{k} a_{i_l i_m} y_l y_m = x^t A x,   (5.4.10)
where
A_{i_1,…,i_k} = \begin{pmatrix} a_{i_1 i_1} & \cdots & a_{i_1 i_k} \\ \cdots & \cdots & \cdots \\ a_{i_k i_1} & \cdots & a_{i_k i_k} \end{pmatrix}   (5.4.11)
is a submatrix of A obtained from deleting all the ith rows and jth columns of A for i, j = 1, …, n with i, j ≠ i_1, …, i_k. The quantity det(A_{i_1,…,i_k}) is referred to as a principal minor of A of order k. Such a principal minor becomes a leading principal minor when i_1 = 1, …, i_k = k, with A_{1,…,k} = A_k.
For A = (aij ) the principal minors of order 1 are all its diagonal entries,
a11 , . . . , ann . It is clear that if A is positive definite then so is Ai1 ,...,ik . Hence
det(Ai1 ,...,ik ) > 0. Therefore we arrive at the following slightly strengthened
version of Theorem 5.10.
If we require that all the diagonal entries of L_2 be positive, then the inductive assumption leads to L_2 = L_1. Hence the vector γ is also uniquely determined, γ = L_1^{−1} α. So it remains to show that the number a may be uniquely determined as well. For this purpose, we need to show in (5.4.15) that
A = \begin{pmatrix} A_{n−1} & α \\ α^t & a_{nn} \end{pmatrix} = \begin{pmatrix} L_1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I_{n−1} & γ \\ γ^t & a_{nn} \end{pmatrix} \begin{pmatrix} L_1^t & 0 \\ 0 & 1 \end{pmatrix}.   (5.4.17)
Hence (5.4.6), namely, (5.4.16), follows immediately. Therefore the last relation in (5.4.15) leads to the unique determination of the number a:
a = \sqrt{ a_{nn} − γ^t γ } > 0.   (5.4.18)
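The construction above is the Cholesky decomposition A = LL^t with L lower triangular and positive diagonal. A minimal Python sketch (illustrative; the function name cholesky_lower is ours) transcribes it directly.

import numpy as np

def cholesky_lower(A):
    # Column-by-column construction of the unique lower triangular factor
    # with positive diagonal; assumes A is symmetric positive definite.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for i in range(n):
        for j in range(i + 1):
            s = A[i, j] - L[i, :j] @ L[j, :j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
L = cholesky_lower(A)
print(np.allclose(L @ L.T, A), np.allclose(L, np.linalg.cholesky(A)))   # True True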
Exercises
|aij | ≤ a, i, j = 1, . . . , n. (5.4.28)
Thus S(u) ∈ Eλi . In other words, each Eλi is invariant under S. Since S is self-
adjoint, each Eλi has an orthonormal basis, say {ui,1 , . . . , ui,mi }, consisting of
the eigenvectors of S. Therefore, the set of vectors
Note that although the theorem says that, when S and T commute and
n = dim(U ), there are n mutually orthogonal vectors that are the eigenvec-
tors of S and T simultaneously, it does not say that an eigenvector of S or T
must also be an eigenvector of T or S. For example, we may take S = I (the
identity mapping). Then S commutes with any mapping and any nonzero vec-
tor u is an eigenvector of S but u cannot be an eigenvector of all self-adjoint
mappings.
The matrix version of Theorem 5.13 is easily stated and proved: Two sym-
metric matrices A, B ∈ R(n, n) are commutative, AB = BA, if and only if
there is an orthogonal matrix P ∈ R(n, n) such that
A = P t DA P , B = P t DB P , (5.5.4)
where DA , DB are diagonal matrices in R(n, n) whose diagonal entries are the
eigenvalues of A, B, respectively.
Exercises
5.5.1 Let U be a finite-dimensional vector space over a field F. Show that for
T ∈ L(U ), if T commutes with any S ∈ L(U ), then there is some a ∈ F
such that T = aI .
5.5.2 If A, B ∈ R(n, n) are positive definite matrices and AB = BA, show
that AB is also positive definite.
5.5.3 Let U be an n-dimensional vector space over a field F, where F = R or
C and T , S ∈ L(U ). Show that, if T has n distinct eigenvalues in F and
S commutes with T , then there is a polynomial p(t), of degree at most
(n − 1), of the form
p(t) = a0 + a1 t + · · · + an−1 t n−1 , a0 , a1 , . . . , an−1 ∈ F, (5.5.5)
such that S = p(T ).
5.5.4 (Continued from the previous exercise) If T ∈ L(U ) has n distinct
eigenvalues in F, show that
CT = {S ∈ L(U ) | S ◦ T = T ◦ S}, (5.5.6)
i.e. the subset of linear mappings over U and commutative with T , as a
subspace of L(U ), is exactly n-dimensional.
5.5.5 Let U be a finite-dimensional vector space with a positive definite scalar
product and T ∈ L(U ).
(i) Show that if T is normal, satisfying T ◦ T′ = T′ ◦ T, then ‖T(u)‖ = ‖T′(u)‖ for any u ∈ U. In particular, N(T) = N(T′).
(ii) Show that if T is normal and idempotent, T² = T, then T′ = T.
where
σ_0 = \max_{1 ≤ i ≤ n} {σ_i},   (5.6.7)
which proves ‖T‖ ≤ \sqrt{σ_0}. Furthermore, let i = 1, …, n be such that σ_0 = σ_i.
Then (5.6.5) leads to
In particular, the above also gives us a practical method to compute the norm of a linear mapping T from U into itself by using the generated self-adjoint mapping T′ ◦ T.
‖T‖² = ‖T′ ◦ T‖,   T ∈ L(U, V).   (5.6.11)
In fact, let η denote the right-hand side of (5.6.12). Then the Schwarz inequality (4.3.10) implies that |(u, T(u))| ≤ ‖T(u)‖ ≤ ‖T‖ for u ∈ U satisfying ‖u‖ = 1. So η ≤ ‖T‖. On the other hand, let λ be the eigenvalue of T such that ‖T‖ = |λ| and u ∈ U an associated eigenvector with ‖u‖ = 1. Then ‖T‖ = |λ| = |(u, T(u))| ≤ η. Hence (5.6.12) is verified.
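In matrix terms, the norm of the mapping x ↦ Ax between Euclidean spaces is the square root of the largest eigenvalue of A^tA. A short Python check (illustrative, with an arbitrary matrix) follows.

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])                     # a mapping from R^2 into R^3

sigma0 = np.linalg.eigvalsh(A.T @ A).max()     # largest eigenvalue of A^t A
print(np.sqrt(sigma0))                         # the norm of the mapping
print(np.linalg.norm(A, 2))                    # numpy's spectral norm gives the same value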
We now show how to extend (5.6.12) to evaluate the norm of a general linear
mapping between vector spaces with scalar products.
Thus, for any ε > 0, there is some u_ε ∈ U with ‖u_ε‖ = 1 such that
As an application, we see that, for any matrix A ∈ R(m, n), the largest
eigenvalues of the symmetric matrices At A ∈ R(n, n) and AAt ∈ R(m, m)
must coincide. In fact, regarding eigenvalues, it is not hard to establish a more
general result as stated as follows.
It is interesting that we do not require any additional properties for the vector
spaces U, V in order for Theorem 5.16 to hold.
We next study another important problem in applications known as the least
squares approximation.
Let T ∈ L(U, V) and v ∈ V. Consider the optimization problem
η ≡ inf{ ‖T(u) − v‖²_V | u ∈ U }.   (5.6.19)
For simplicity, we first assume that (5.6.19) has a solution, that is, there is some x ∈ U such that ‖T(x) − v‖²_V = η, and we look for some appropriate condition the solution x will fulfill. For this purpose, we set
Then f(0) = η ≤ f(ε). Hence we may expand the right-hand side of (5.6.20) to obtain
0 = \frac{df}{dε}\Big|_{ε=0} = (T(x) − v, T(y))_V + (T(y), T(x) − v)_V = 2\bigl( (T′ ◦ T)(x) − T′(v), y \bigr)_U,   y ∈ U,   (5.6.21)
which shows that the quantity ‖T(x) − v‖²_V is independent of the solution x of (5.6.22). Besides, for any test element u ∈ U, we rewrite u as u = x + w where x is a solution to (5.6.22) and w ∈ U. Then we have
x = \sum_{i=1}^{k} \frac{1}{λ_i} (u_i, T′(v))_U\, u_i + x_0,   x_0 ∈ N(T),   (5.6.27)
since N(T′ ◦ T) = N(T).
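In the Euclidean matrix setting the normal equation (5.6.22) reads A^tAx = A^tb. The Python sketch below (illustrative data, not from the text) solves it directly and compares the result with numpy's least squares routine.

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([6.0, 5.0, 7.0, 10.0])

x = np.linalg.solve(A.T @ A, A.T @ b)          # A^t A is invertible: A has full column rank
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])    # numpy's solver returns the same minimizer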
Exercises
where R² and R³ are equipped with the standard Euclidean scalar products.
(i) Find the eigenvalues of T′ ◦ T and compute ‖T‖.
(ii) Find the eigenvalues of T ◦ T′ and verify that T ◦ T′ is positive semi-definite but not positive definite (cf. part (ii) of (2)).
(iii) Check to see that the largest eigenvalues of T′ ◦ T and T ◦ T′ are the same.
(iv) Compute all eigenvalues of T′ ◦ T and T ◦ T′ and explain the results in view of Theorem 5.16.
5.6.6 Let A ∈ F(m, n) and B ∈ F(n, m) where m < n and consider AB ∈
F(m, m) and BA ∈ F(n, n). Show that BA has at most m + 1 distinct
eigenvalues in F.
5.6.7 (A specialization of the previous exercise) Let u, v ∈ Fn be column
vectors. Hence ut v ∈ F and vut ∈ F(n, n).
(i) Show that the matrix vu^t has a nonzero eigenvalue in F only if u^t v ≠ 0.
(ii) Show that, when u^t v ≠ 0, the only nonzero eigenvalue of vu^t in F is u^t v, so that v is an associated eigenvector.
(iii) Show that, when u^t v ≠ 0, the eigenspace of vu^t associated with the eigenvalue u^t v is one-dimensional.
5.6.8 Consider the Euclidean space Rk equipped with the standard inner prod-
uct and let A ∈ R(m, n). Formulate a solution of the following optimiza-
tion problem
η ≡ inf{ ‖Ax − b‖² | x ∈ Rⁿ },   b ∈ Rᵐ,   (5.6.30)
6 Complex quadratic forms and self-adjoint mappings
In this chapter we extend our study on real quadratic forms and self-adjoint
mappings to the complex situation. We begin with a discussion on the com-
plex version of bilinear forms and the Hermitian structures. We will relate the
Hermitian structure of a bilinear form with representing it by a unique self-
adjoint mapping. Then we establish the main spectrum theorem for self-adjoint
mappings. We next focus again on the positive definiteness of self-adjoint map-
pings. We explore the commutativity of self-adjoint mappings and apply it to
obtain the main spectrum theorem for normal mappings. We also show how to
use self-adjoint mappings to study a mapping between two spaces.
f(u, v) = f\Bigl( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} y_j u_j \Bigr) = \sum_{i,j=1}^{n} \bar{x}_i f(u_i, u_j) y_j = \bar{x}^t A y = x† A y,   (6.1.1)
where A = (aij ) = (f (ui , uj )) lies in C(n, n) which is the matrix representa-
tion of the sesquilinear form f with respect to B.
Let B̃ = {ũ1 , . . . , ũn } be another basis of U so that the coordinate vectors of
u, v are x̃, ỹ ∈ Cn with respect to B̃. Hence, using à = (ãij ) = (f (ũi , ũj )) ∈
C(n, n) to denote the matrix representation of the sesquilinear form f with
respect to B̃ and B = (bij ) ∈ C(n, n) the basis transition matrix from B into
B̃ so that
ũ_j = \sum_{i=1}^{n} b_{ij} u_i,   j = 1, …, n,   (6.1.2)
Exercises
6.1.1 Let U be a complex vector space with a positive definite scalar product
(·, ·) and T ∈ L(U ). Show that T = 0 if and only if (u, T (u)) = 0 for
any u ∈ U . Give an example to show that the same may not hold for a
real vector space with a positive definite scalar product.
6.1.2 Let U be a complex vector space with a positive definite scalar product
(·, ·) and T ∈ L(U ). Show that T is self-adjoint or Hermitian if and only
if (u, T (u)) = (T (u), u) for any u ∈ U . Give an example to show that
the same may not hold for a real vector space with a positive definite
scalar product.
6.1.3 Let U be a complex vector space with a basis B = {u1 , . . . , un } and
A = (f (ui , uj )) ∈ C(n, n) the matrix representation of f with respect
to B. Show that f is Hermitian if and only if A is Hermitian.
6.1.4 Let U be a complex vector space with a positive definite scalar product
(·, ·) and T ∈ L(U ). Show that T can be uniquely decomposed into a
sum T = R + iS where R, S ∈ L(U ) are both self-adjoint.
6.1.5 Show that the inverse of an invertible self-adjoint mapping is also self-
adjoint.
6.1.6 Show that In and −In cannot be Hermitian congruent, although they are
congruent as elements in C(n, n).
Theorem 6.4 Let U be a complex vector space with a positive definite scalar
product (·, ·) and T ∈ L(U ). If T is self-adjoint, then the following are valid.
which gives us λ = λ̄ so that λ ∈ R. This establishes (1). Note that this proof
does not assume that U is finite dimensional.
To establish (2), we use induction on dim(U ).
If dim(U ) = 1, there is nothing to show.
Assume that the statement (2) is valid if dim(U ) ≤ n − 1 for some n ≥ 2.
Consider dim(U ) = n, n ≥ 2.
Let λ1 be an eigenvalue of T in C. From (1), we know that actually λ1 ∈ R.
Use Eλ1 to denote the eigenspace of T associated with λ1 :
where λ2 , . . . , λk are all the distinct eigenvalues of T over the invariant sub-
space (Eλ1 )⊥ , which are real.
Finally, we need to show that λ1 , . . . , λk obtained above are all possible
eigenvalues of T . For this purpose, let λ be an eigenvalue of T and u an
associated eigenvector. Then there are u1 ∈ Eλ1 , . . . , uk ∈ Eλk such that
u = u1 + · · · + uk . Hence the relation T (u) = λu gives us λ1 u1 + · · · + λk uk =
λ(u1 + · · · + uk ). That is,
Using the mapping (6.2.7) and Theorem 6.4, we may obtain the following
characterization of a Hermitian matrix, which may also be regarded as a matrix
version of Theorem 6.4.
A = P † DP , (6.2.8)
where D ∈ R(n, n) is a real diagonal matrix whose diagonal entries are the
eigenvalues of A.
In other words, Theorem 6.5 states that a complex square matrix is diago-
nalizable through a unitary matrix into a real diagonal matrix if and only if it
is Hermitian.
Practically, the decomposition (6.2.8) for a Hermitian matrix A may also,
more preferably, be established as in the proof of Theorem 5.6 as follows.
Exercises
6.2.1 Let A ∈ C(n, n) be Hermitian. Show that det(A) must be a real number.
6.2.2 (Extension of the previous exercise) Let U be a complex vector space
with a positive definite scalar product and T ∈ L(U ). If T is self-adjoint,
then the coefficients of the characteristic polynomial of T are all real.
6.2.3 Let U be a complex vector space with a positive definite scalar product
and S, T ∈ L(U ) self-adjoint and commutative, S ◦ T = T ◦ S.
(i) Prove the identity
(i) Show that T(u)(t) = tu(t) for t ∈ [0, 1] defines a self-adjoint mapping over U.
(ii) Show that T does not have an eigenvalue whatsoever.
6.2.8 Let T ∈ L(U ) be self-adjoint where U is a finite-dimensional complex
vector space with a positive definite scalar product.
The proof of Theorem 6.7 is similar to that of Theorem 5.8 and thus left as
an exercise. Here we check only that the matrix A defined in (6.3.7) is indeed
Hermitian. In fact, this may be seen from the self-adjointness of T through
\bar{a}_{ji} = \overline{(u_j, T(u_i))} = \overline{(T(u_j), u_i)} = (u_i, T(u_j)) = a_{ij},   i, j = 1, …, n.   (6.3.8)
Note also that if {u_1, …, u_n} is an orthonormal basis of U, then the quantities a_{ij} (i, j = 1, …, n) in (6.3.7) simply give rise to the matrix representation of T with respect to this basis. That is, T(u_j) = \sum_{i=1}^{n} a_{ij} u_i (j = 1, …, n).
A matrix version of Theorem 6.7 may be stated as follows.
Proof The necessity proof is similar to that in Theorem 5.1.11 and omitted.
The sufficiency proof is also similar but needs some adaptation to meet the
delicacy with handling complex numbers. To see how this is done, we may
assume that (6.3.9) holds and we show that A is positive definite by an induc-
tive argument. Again, if n = 1, there is nothing to show, and assume that the
assertion is valid at n−1 (that is, when A ∈ C(n−1, n−1)) (n ≥ 2). It remains
to prove that A is positive definite at n ≥ 2.
As before, we rewrite the Hermitian matrix A in the blocked form
A = \begin{pmatrix} A_{n−1} & α \\ α† & a_{nn} \end{pmatrix},   (6.3.10)
Thus, if we require that all the diagonal entries of L2 are positive, then the
inductive assumption gives us L2 = L1 . So the vector γ is also uniquely
determined, γ = L−11 α. Thus it remains only to show that the number a may
be uniquely determined as well. To this end, we need to show in (6.3.17) that
Hence (6.3.18) follows as a consequence of det(A) > 0. Thus the third equation in (6.3.17) leads to the unique determination of the positive number a as in the real situation:
a = \sqrt{ a_{nn} − γ† γ } > 0.   (6.3.20)
Exercises
is a diagonal matrix. Show also that the same conclusion is true for
some orthogonal matrices P and Q in R(n, n) when A is real.
6.3.13 Let U be a finite-dimensional complex vector space with a positive
definite scalar product (·, ·) and T ∈ L(U ) a positive definite mapping.
(i) Establish the generalized Schwarz inequality
|(u, T(v))| ≤ \sqrt{(u, T(u))}\, \sqrt{(v, T(v))},   u, v ∈ U.   (6.3.27)
(ii) Show that the equality in (6.3.27) occurs if and only if u, v are
linearly dependent.
The proof of the theorem is the same as that for the real situation and
omitted.
We now use Theorem 6.11 to study a slightly larger class of linear mappings
commonly referred to as normal mappings.
An obvious but pretty general example of a normal mapping that is not necessarily self-adjoint or Hermitian is a unitary mapping T ∈ L(U), since it satisfies T ◦ T′ = T′ ◦ T = I. Consequently, for a unitary mapping T ∈ L(U), there is an orthonormal basis of U consisting of eigenvectors of T. Furthermore, since T is isometric, it is clear that all eigenvalues of T are of absolute value 1, as already observed in Section 4.3.
The matrix versions of Definition 6.12 and Theorems 6.13 and 6.14 may be
stated as follows.
To prove the theorem, we may simply follow the standard way to associate
a matrix A ∈ C(n, n) with the mapping it generates over Cn through x #→ Ax
for x ∈ Cn as before and apply Theorems 6.13 and 6.14.
Moreover, if A ∈ C(n, n) is unitary, AA† = A† A = In , then there is a
unitary matrix P ∈ C(n, n) and a diagonal matrix D = diag{λ1 , . . . , λn } with
|λi | = 1, i = 1, . . . , n, such that A = P † DP .
Exercises
6.4.4 Assume that A ∈ C(n, n) is triangular and normal. Show that A must
be diagonal.
6.4.5 Let U be a finite-dimensional complex vector space with a positive
definite scalar product and T ∈ L(U ) be normal.
(i) Show that T is self-adjoint if and only if all the eigenvalues of T
are real.
(ii) Show that T is anti-self-adjoint if and only if all the eigenvalues of
T are imaginary.
(iii) Show that T is unitary if and only if all the eigenvalues of T are of
absolute value 1.
6.4.6 Let T ∈ L(U ) be a normal mapping where U is a finite-dimensional
complex vector space with a positive definite scalar product.
(i) Show that if T k = 0 for some integer k ≥ 1 then T = 0.
(ii) (A sharpened version of (i)) Given u ∈ U show that if T k (u) = 0
for some integer k ≥ 1 then T (u) = 0.
(This is an extended version of Exercise 6.2.8.)
6.4.7 If R, S ∈ L(U ) are Hermitian and commutative, show that T = R ± iS
is normal.
6.4.8 Consider the matrix
A = \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}.   (6.4.11)
6.4.11 Let U be a complex vector space with a positive definite scalar product
(·, ·). A mapping T ∈ L(U ) is called hyponormal if it satisfies
(u, (T ◦ T′ − T′ ◦ T)(u)) ≤ 0,   u ∈ U.   (6.4.12)
(i) Show that T is normal if and only if T and T′ are both hyponormal.
(ii) Show that T being hyponormal is equivalent to
‖T′(u)‖ ≤ ‖T(u)‖,   u ∈ U.   (6.4.13)
(iii) If T is hyponormal, so is T + λI where λ ∈ C.
(iv) Show that if λ ∈ C is an eigenvalue and u an associated eigenvector of a hyponormal mapping T ∈ L(U) then λ̄ and u are an eigenvalue and an associated eigenvector of T′.
6.4.12 Let U be an n-dimensional (n ≥ 2) complex vector space with a pos-
itive definite scalar product and T ∈ L(U ). Show that T is normal if
and only if there is a complex-coefficient polynomial p(t) of degree at most n − 1 such that T′ = p(T).
6.4.13 Let U be a finite-dimensional complex vector space with a positive
definite scalar product and T ∈ L(U). Show that T is normal if and only if T and T′ have the same invariant subspaces of U.
‖T(u)‖²_V = (T(u), T(u))_V = ((T′ ◦ T)(u), u)_U = \sum_{i=1}^{n} σ_i |a_i|² ≤ σ_0,   (6.5.5)
where
σ_0 = \max_{1 ≤ i ≤ n} {σ_i} ≥ 0,   (6.5.6)
which shows ‖T‖ ≤ \sqrt{σ_0}. Moreover, let i = 1, …, n be such that σ_0 = σ_i. Then (6.5.4) gives us
‖T‖² ≥ ‖T(u_i)‖²_V = ((T′ ◦ T)(u_i), u_i)_U = σ_i = σ_0.   (6.5.7)
Consequently, as in the real situation, we conclude with
‖T‖ = \sqrt{σ_0},   where σ_0 ≥ 0 is the largest eigenvalue of the mapping T′ ◦ T.   (6.5.8)
Therefore the norm of a linear mapping T ∈ L(U, V) may be obtained by computing the largest eigenvalue of the induced self-adjoint or Hermitian mapping T′ ◦ T ∈ L(U).
In particular, if T ∈ L(U) is already self-adjoint, then, since ‖T‖ is simply the square root of the largest eigenvalue of T², we arrive at
‖T‖ = \max_{1 ≤ i ≤ n} {|λ_i|},   (6.5.9)
In fact the numbers λ_1², …, λ_k² are all the positive eigenvalues and u_1, …, u_k the associated eigenvectors of the self-adjoint mapping T′ ◦ T ∈ L(U).
Let A ∈ C(m, n). Theorem 6.19 implies that there are unitary matrices P ∈ C(n, n) and Q ∈ C(m, m) such that
AP = QΣ   or   A = QΣP†,    Σ = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} ∈ R(m, n),   D = diag{λ_1, …, λ_k},   (6.5.17)
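The decomposition (6.5.17) is the singular value decomposition of A. The Python sketch below is not part of the text; it computes the decomposition of the matrix in (6.5.18) with numpy and checks the stated relations, with variable names of our choosing.

import numpy as np

A = np.array([[2.0, 1.0 - 1.0j, 3.0],
              [3.0j, -1.0, 3.0 + 2.0j]])

Q, s, Ph = np.linalg.svd(A)                    # A = Q Sigma Ph with Q, Ph unitary
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
P = Ph.conj().T
print(np.allclose(A @ P, Q @ Sigma))           # AP = Q Sigma, as in (6.5.17)
print(np.sqrt(np.linalg.eigvalsh(A.conj().T @ A).max()), s.max())   # largest singular value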
Exercises
6.5.1 Verify that for T ∈ L(U, V) the mapping T′ given in (6.5.2) is a well-defined element in L(V, U), although the scalar products (·, ·)_U and (·, ·)_V of U and V are both sesquilinear.
6.5.2 Consider the matrix
A = \begin{pmatrix} 2 & 1 − i & 3 \\ 3i & −1 & 3 + 2i \end{pmatrix}.   (6.5.18)
Note that most of the exercises in Chapter 5 may be restated in the context
of the complex situation of this chapter and are omitted.
7 Jordan decomposition
Proof If I = I(g) for some g ∈ P, it is clear that g will have the lowest
degree among all elements in I in view of the definition of I(g). Such an
observation indicates what to look for in our proof. Indeed, since I ≠ {0}, we
may choose g to be an element in I \ {0} that is of the lowest degree. Then it
is clear that I(g) ⊂ I. For any h ∈ I \ {0}, we claim g|h (i.e. g divides h).
Otherwise, we may rewrite h as
h(t) = q(t)g(t) + r(t) (7.1.2)
for some q, r ∈ P so that the degree of r is lower than that of g. Since g ∈ I,
we have qg ∈ I. Thus r = h − qg ∈ I, which contradicts the definition of g.
Consequently, g|h. So h ∈ I(g) as expected, which proves I ⊂ I(g).
If there is another h ∈ P such that I = I(h), of course g and h must have
the same degree since h|g and g|h. If the coefficients of the highest-degree
terms of g and h coincide, then g = h, otherwise g − h ∈ I \ {0} would be of
a lower degree, which contradicts the choice of g given earlier.
The theorem is proved.
Let g1 , . . . , gk ∈ P \ {0}. Choose g ∈ P such that I(g) = I(g1 , . . . , gk ).
Then there are elements f1 , . . . , fk ∈ P such that
g = f1 g1 + · · · + fk gk , (7.1.3)
which implies that g contains all common divisors of g1 , . . . , gk . In other
words, if h|g1 , . . . , h|gk , then h|g. On the other hand, the definition of
I(g1 , . . . , gk ) already gives us g|g1 , . . . , g|gk . So g itself is a common
divisor of g1 , . . . , gk . In view of Theorem 7.2, we see that the coefficient of the
highest-degree term of g determines g completely. Thus, we may fix g by tak-
ing the coefficient of the highest-degree term of g to be 1. Such a polynomial
is valid for arbitrary t. This fact will be our starting point in the subsequent
development.
A polynomial p ∈ P of degree at least 1 is called a prime polynomial or an
irreducible polynomial if p cannot be factored into the product of two polyno-
mials in P of degrees at least 1. Two polynomials are said to be equivalent if
one is a scalar multiple of the other.
Exercises
7.1.6 Let U be a finite-dimensional vector space over a field F and P the vec-
tor space of all polynomials over F. Show that I = {p ∈ P | p(T ) = 0}
is an ideal of P. Let g ∈ P be such that I = I(g) and the coefficient of
the highest-degree term of g is 1. Is g the minimal polynomial of T that
has the minimum degree among all the polynomials that annihilate T ?
Therefore, we have
f1 (T )g1 (T ) + · · · + fk (T )gk (T ) = I. (7.2.7)
In fact, since p_i^{n_i} and g_i are relatively prime, there are polynomials q_i and r_i in P such that
q_i p_i^{n_i} + r_i g_i = 1.   (7.2.10)
This gives us the relation
Exercises
N(A) = { x ∈ Cⁿ | Ax = 0 },    N(A − aI_n) = { x ∈ Cⁿ | (A − aI_n)x = 0 },   (7.2.15)
U = U0 ⊕ U1 ⊕ · · · ⊕ Uk , (7.3.3)
k0 + k = n(T ). (7.3.5)
where the a_i s, b_i s, and b_{ij} s are scalars. Applying T to (7.3.14) and using (7.3.12) we arrive at
\sum_{i=1}^{l_0} a_i v_i^0 + \sum_{i=1}^{l} \sum_{j=0}^{m_i−1} b_{ij} T_V^j(v_i) = 0,   (7.3.15)
a_i = 0,  i = 1, …, l_0;    b_{ij} = 0,  j = 0, 1, …, m_i − 1,  i = 1, …, l.   (7.3.16)
Substituting (7.3.16) into (7.3.15) we have
b_1 v_1^0 + \cdots + b_{l_0} v_{l_0}^0 + b_{1 m_1} T^{m_1}(u_1) + \cdots + b_{l m_l} T^{m_l}(u_l) = 0.   (7.3.17)
as well, which establishes that the vectors given in (7.3.13) are linearly independent.
It is clear that
v_1^0, …, v_{l_0}^0, T^{m_1}(u_1), …, T^{m_l}(u_l) ∈ N(T).   (7.3.20)
These vectors form a basis of N(T).
for some unique b_i s and b_{ij} s in F, which gives rise to the result
u − \sum_{i=1}^{l_0} b_i w_i − \sum_{i=1}^{l} \sum_{j=0}^{m_i−1} b_{ij} T^j(u_i) ∈ N(T).   (7.3.23)
We are now prepared to consider general linear mappings over complex vec-
tor spaces.
di ≡ dim(Vi ) = m0 + m1 + · · · + ml . (7.3.31)
With respect to such a basis, the matrix of T − λi I is seen to take the following
boxed diagonal form
⎛ ⎞
0 0 ··· 0
⎜ ⎟
⎜ 0 S1 · · · 0 ⎟
⎜ ⎟
⎜ ⎟, (7.3.32)
⎜ 0 · · · ... 0 ⎟
⎝ ⎠
0 ··· 0 Sl
where, in the diagonal of the above matrix, 0 is the zero matrix of size m0 × m0
and S1 , . . . , Sl are shift matrices of sizes m1 × m1 , . . . , ml × ml , respectively.
Therefore the matrix of T = (T − λi I ) + λi I with respect to the same basis is
simply
A_i = diag{0, S_1, . . . , S_l} + λ_i I_{d_i}.   (7.3.33)
Note that the factorization expressed in (7.3.26) indicates that each eigen-
value, λi , repeats itself ni times as a root in the characteristic polynomial of
T . For this reason, the integer ni is called the algebraic multiplicity of the
eigenvalue λi of T .
Given T ∈ L(U ), let λi be an eigenvalue of T . For any integer m ≥ 1, the
nonzero vectors in N ((T − λi I )m ) are called the generalized eigenvectors and
N((T − λi I )m ) the generalized eigenspace associated to the eigenvalue λi . If
m = 1, generalized eigenvectors and eigenspace are simply eigenvectors and
eigenspace, respectively, associated to the eigenvalue λi of T , and
In other words, the geometric multiplicity of any eigenvalue of a linear mapping is
less than or equal to its algebraic multiplicity.
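As a small numerical illustration (not part of the text), the following Python/NumPy sketch compares the two multiplicities for the eigenvalue 2 of a matrix assembled from a 2 × 2 shift-type block and a 1 × 1 block; it also shows that the generalized eigenspace recovers the full algebraic multiplicity.

import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 5.]])
lam, n = 2.0, A.shape[0]

alg = sum(1 for z in np.linalg.eigvals(A) if abs(z - lam) < 1e-8)               # algebraic multiplicity
geo = n - np.linalg.matrix_rank(A - lam * np.eye(n))                            # dim N(A - lam I)
gen = n - np.linalg.matrix_rank(np.linalg.matrix_power(A - lam * np.eye(n), n)) # dim N((A - lam I)^n)
print(alg, geo, gen)   # 2 1 2: geometric multiplicity <= algebraic multiplicity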
Exercises
with specifying
J_{i,s} = ⎛ λ_i   0    · · ·  · · ·  0   ⎞
          ⎜ 1     λ_i  0      · · ·  0   ⎟
          ⎜       ⋱    ⋱      ⋱          ⎟        (7.4.7)
          ⎜              ⋱    ⋱      0   ⎟
          ⎝ 0     · · ·  · · ·  1    λ_i ⎠
B = B1 ∪ · · · ∪ Bk , (7.4.10)
u = u1 + · · · + uk , u1 ∈ V1 , . . . , uk ∈ Vk . (7.4.13)
Thus, we have
m_T(T)(u) = (T − λ_1 I)^{m_1} · · · (T − λ_k I)^{m_k}(u_1 + · · · + u_k)
          = ∑_{i=1}^{k} (T − λ_1 I)^{m_1} · · · \widehat{[(T − λ_i I)^{m_i}]} · · · (T − λ_k I)^{m_k} (T − λ_i I)^{m_i}(u_i)
          = 0,   (7.4.14)
which establishes m_T(T) = 0 as asserted. Here \widehat{[·]} denotes the item that is
missing in the expression. Since m_T(λ)|p_T(λ), we arrive at p_T(T) = 0, as
stated in the Cayley–Hamilton theorem.
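As an aside, the statement p_T(T) = 0 is easy to test numerically. The sketch below (an illustration only, not the argument above) evaluates the characteristic polynomial of a random matrix at the matrix itself using NumPy.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

coeffs = np.poly(A)    # coefficients of p_A(lambda) = det(lambda I - A), highest degree first
P = sum(c * np.linalg.matrix_power(A, k) for k, c in enumerate(coeffs[::-1]))   # p_A(A)
print(np.allclose(P, np.zeros((4, 4))))   # True up to rounding: the Cayley-Hamilton theorem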
Given T ∈ L(U ), use P to denote the vector space of all polynomials with
coefficients in C and consider the subspace of P:
AT = {p ∈ P | p(T ) = 0}. (7.4.15)
It is clear that AT is an ideal in P. Elements in AT are also called annihilating
polynomials of the mapping T . Let AT be generated by some m(λ). Then m(λ)
has the property that it is a minimal-degree polynomial among all nonzero
elements in AT . If we normalize the coefficient of the highest-degree term of
m(λ) to 1, then m is uniquely determined and is called the minimal polynomial
of the linear mapping T . It is clear that, given T ∈ L(U ), if λ1 , . . . , λk are all
the distinct eigenvalues of T and m1 , . . . , mk are the corresponding degrees of
Exercises
8
Selected topics
In this chapter we present a few selected subjects that are important in applications
but are not usually included in a standard linear algebra course. These subjects may
serve as supplemental or extracurricular material. The first subject is the Schur
decomposition theorem, the second is the classification of skewsymmetric bilinear
forms, the third is the Perron–Frobenius theorem for positive matrices, and the
fourth concerns Markov (stochastic) matrices.
Consequently, pT (T ) = 0, as anticipated.
It is clear that, in Theorem 8.1, if the upper triangular matrix B ∈ C(n, n)
is diagonal, then the orthonormal basis B is made of the eigenvectors of T .
In this situation T is normal. Likewise, if B is diagonal and real, then T is
self-adjoint.
We remark that Theorem 8.1 may also be proved without resorting to the
adjoint mapping.
In fact, use the notation of Theorem 8.1 and proceed to the nontrivial situ-
ation dim(U ) = n ≥ 2 directly. Let λ1 ∈ C be an eigenvalue of T and u1 an
associated unit eigenvector. Then we have the orthogonal decomposition
S(u_j) = ∑_{i=2}^{j} b_{ij} u_i,   j = 2, . . . , n,   (8.1.10)
A = P † BP . (8.1.12)
The proof of Theorem 8.2 may be obtained by applying Theorem 8.1, where
we take U = Cn with the standard Hermitian scalar product and define T ∈
L(Cn ) by setting T (u) = Au for u ∈ Cn . In fact, with B = {u1 , . . . , un } being
the orthonormal basis of Cn stated in Theorem 8.1, the unitary matrix P in
(8.1.12) is such that the ith column vector of P † is simply ui , i = 1, . . . , n.
From (8.1.12) we see immediately that A is normal if and only if B is diag-
onal and that A is Hermitian if and only if B is diagonal and real.
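For a hands-on check of Theorem 8.2 one may use a numerical routine. The sketch below assumes SciPy is available (an outside tool, not part of the text); scipy.linalg.schur returns a unitary Z and an upper triangular B with A = Z B Z†, which is (8.1.12) with P = Z†.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

B, Z = schur(A, output='complex')            # B upper triangular, Z unitary
print(np.allclose(A, Z @ B @ Z.conj().T))    # True: A = Z B Z^dagger
print(np.allclose(np.sort_complex(np.diag(B)),
                  np.sort_complex(np.linalg.eigvals(A))))   # diagonal of B carries the eigenvalues of A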
Exercises
8.1.1 Show that the matrix B may be taken to be lower triangular in Theo-
rems 8.1 and 8.2.
8.1.2 Let A = (aij ), B = (bij ) ∈ C(n, n) be stated in the relation (8.1.12).
Use the fact Tr(A† A) = Tr(B † B) to infer the identity
∑_{i,j=1}^{n} |a_{ij}|² = ∑_{1≤i≤j≤n} |b_{ij}|².   (8.1.13)
8.1.4 For A ∈ R(n, n) assume that the roots of the characteristic polynomial
of A are all real. Establish the real version of the Schur decomposition
theorem, which asserts that there is an orthogonal matrix P ∈ R(n, n)
and an upper triangular matrix B ∈ R(n, n) such that A = P t BP and
that the diagonal entries of B are all the eigenvalues of A. Can you prove
a linear mapping version of the theorem when U is a real vector space
with a positive definite scalar product?
8.1.5 Show that if A ∈ R(n, n) is normal and all the roots of its characteristic
polynomial are real then A must be symmetric.
8.1.6 Let U be a finite-dimensional complex vector space with a positive defi-
nite scalar product and S, T ∈ L(U ). If S and T are commutative, show
that U has an orthonormal basis, say B, such that, with respect to B, the
matrix representations of S and T are both upper triangular.
8.1.7 Let U be an n-dimensional (n ≥ 2) complex vector space with a positive
definite scalar product and T ∈ L(U). If λ ∈ C is an eigenvalue of T and u ∈
U an associated eigenvector, then the quotient space V = U/Span{u} is
of dimension n − 1.
(i) Define a positive definite scalar product over V and show that T
induces an element in L(V ).
Consequently, we have
P^t E A E^t P
  = ⎛ I_2   0       ⎞ ⎛ α_1    β       ⎞ ⎛ I_2   γ       ⎞
    ⎝ γ^t   I_{n−2} ⎠ ⎝ −β^t   A_{n−2} ⎠ ⎝ 0     I_{n−2} ⎠
  = ⎛ α_1              α_1 γ + β                                  ⎞
    ⎝ γ^t α_1 − β^t    γ^t α_1 γ − β^t γ + γ^t β + A_{n−2}        ⎠ .   (8.2.11)
α_1 γ + β = 0.   (8.2.12)
where
G = γ^t α_1 γ − β^t γ + γ^t β + A_{n−2}   (8.2.15)
U_0 = U^⊥ = {u ∈ U | f(u, v) = 0 for all v ∈ U}.   (8.2.26)
Exercises
8.2.1 Let f be a bilinear form over a vector space U . Show that f is skewsym-
metric if and only if f (u, u) = 0 for any u ∈ U .
8.2.2 Let f be a skewsymmetric bilinear form over U and S ⊂ U a non-empty
subset. Show that S ⊥ defined in (8.2.25) is a subspace of U .
8.2.3 Let U be a finite-dimensional vector space and f a skewsymmetric bil-
inear form over U . Define U0 by (8.2.26) and use A to denote the matrix
representation of f with respect to any given basis of U . Show that
dim(U0 ) = dim(U ) − r(A). (8.2.33)
8.2.4 Let U be a symplectic vector space equipped with a symplectic form f .
If V is a subspace of U , V ⊥ is called the symplectic complement of V
in U . Prove the following.
(i) dim(V ) + dim(V ⊥ ) = dim(U ).
(ii) (V ⊥ )⊥ = V .
8.2.5 Let U be a symplectic vector space equipped with a symplectic form f.
If V is a subspace of U, we can consider the restriction of f to V. Prove
that f is symplectic over V if and only if V ∩ V^⊥ = {0}.
which leads to
∑_{j=1}^{n} a_{i_0 j} z_j > r z_{i_0}   when s > 0 is sufficiently small.   (8.3.14)
Thus Az > rz. Set v = z/‖z‖. Then v ∈ S and Av > rv. Hence we may
choose ε > 0 small such that Av > (r + ε)v. So r + ε belongs to the set used to define
r in (8.3.7), which contradicts the definition of r.
Therefore Au = ru and r is a positive eigenvalue of A.
The positivity of u = (ai ) follows easily since u ∈ S and A > 0 so that
ru = Au leads to
r a_i = ∑_{j=1}^{n} a_{ij} a_j > 0,   i = 1, . . . , n,   (8.3.15)
Then s_0 > 0 and u_{s_0} ≥ 0, but u_{s_0} is not strictly positive. If u_{s_0} ≠ 0, then u_{s_0} > 0, since u_{s_0} is an
eigenvector of A associated to r, which contradicts the definition of s_0. Therefore
we must have u_{s_0} = 0, which gives us the result v = s_0^{−1} u as desired.
In order to show that r is the simple root of the characteristic polynomial
of A, we need to prove that there is only one Jordan block associated with the
eigenvalue r and that this Jordan block can only be 1 × 1. To do so, we first
show that the geometric multiplicity of r is 1. Then we show that there is no
generalized eigenvector. That is, the equation
(A − rIn )v = u, v ∈ Cn , (8.3.18)
has no solution.
Suppose otherwise that the dimension of the eigenspace Er is greater than
one. Then there is some v ∈ Er that is not a scalar multiple of u. Write v as
v = v1 + iv2 with v1 , v2 ∈ Rn . Then Av1 = rv1 and Av2 = rv2 . We assert
that one of the sets of vectors {u, v1 } and {u, v2 } must be linearly independent
over R. Otherwise there are a1 , a2 ∈ R such that v1 = a1 u and v2 = a2 u which
imply v = (a1 + ia2 )u, a contradiction. Use w to denote either v1 or v2 which
is linearly independent from u over R. Then we know that ±w can never be
non-negative. Thus there are components of w that have different signs. Now
consider the vector
It is clear that u_s > 0 when s > 0 is sufficiently small since u > 0. So there
is some s_0 > 0 such that u_{s_0} ≥ 0 but a component of u_{s_0} is zero. However,
because u_{s_0} ≠ 0 owing to the presence of a positive component in w, we arrive
at a contradiction, because u_{s_0} is then an eigenvector of A associated to r.
We next show that (8.3.18) has no solution. Since u ∈ Rn , we need only to
consider v ∈ Rn in (8.3.18). We proceed as follows.
Consider
Since Au = ru, we see that ws also satisfies (8.3.18). Take s > 0 sufficiently
large so that ws > 0. Hence (A − rIn )ws = u > 0 or
Exercises
8.3.1 Let A = (aij ) ∈ R(n, n) be a positive matrix and r > 0 the Perron–
Frobenius eigenvalue of A. Show that r satisfies the estimate
min_{1≤i≤n} ( ∑_{j=1}^{n} a_{ij} ) ≤ r ≤ max_{1≤i≤n} ( ∑_{j=1}^{n} a_{ij} ).   (8.3.27)
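The estimate (8.3.27) is easy to test numerically. The following is a small sketch (not part of the text) that draws a random positive matrix and compares its Perron–Frobenius eigenvalue with the smallest and largest row sums.

import numpy as np

rng = np.random.default_rng(2)
A = rng.uniform(0.1, 1.0, size=(5, 5))        # a positive matrix

r = max(np.linalg.eigvals(A), key=abs).real   # the Perron-Frobenius eigenvalue of A
row_sums = A.sum(axis=1)
print(row_sums.min() <= r <= row_sums.max())  # True, as in (8.3.27)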
that is, the components of each row vector in A sum up to 1, then A is called
a Markov or stochastic matrix. If there is an integer m ≥ 1 such that Am is a
positive matrix, then A is called a regular Markov or regular stochastic matrix.
|λ| ≤ 1. (8.4.2)
Proof Using (8.4.1), the fact that u = (1, . . . , 1)t satisfies Au = u may be
checked directly.
Now let λ ∈ C be any eigenvalue of A and v = (bi ) ∈ Cn an associated
eigenvector. Then there is some i0 = 1, . . . , n such that
(1) The absolute value of any other eigenvalue λ ∈ C of A is less than 1. That
is, |λ| < 1.
(2) 1 is a simple root of the characteristic polynomial of A.
A = CBC −1 , (8.4.6)
B = diag{J1 , · · · , Jk } (8.4.7)
J = λI + P , (8.4.9)
J^m = (λI + P)^m = ∑_{s=0}^{m} [m!/(s!(m−s)!)] λ^{m−s} P^s = ∑_{s=0}^{l−1} [m!/(s!(m−s)!)] λ^{m−s} P^s
    = λ^m I + m λ^{m−1} P + · · · + [m(m−1) · · · (m − [l−2]) / (l−1)!] λ^{m−l+1} P^{l−1},   (8.4.10)
which is an upper triangular matrix with the diagonal entries all equal to λm .
From (8.4.6) and (8.4.7), we have
Am = Cdiag{J1m , . . . , Jkm }C −1 . (8.4.11)
However, using the condition Am > 0, Theorem 8.6, and (8.4.10), we
conclude that there exists exactly one Jordan block among the Jordan blocks
J1 , . . . , Jk of A with λ = 1 and that such a Jordan block can only be 1 × 1.
The proof of the theorem is thus complete.
Since in (8.4.10) there are l terms, we see that J m → 0 as m → ∞ when
|λ| < 1. This observation leads to the following important convergence theo-
rem for the Markov matrices.
Inserting the results that the first column vector of C is u = a(1, . . . , 1)t , the
first row vector of D is w t = bv t , where v satisfies (8.4.13), and ab = 1 into
(8.4.19), we obtain
lim_{m→∞} A^m = ⎛ b_1  · · ·  b_n ⎞
                ⎜  ⋮   · · ·   ⋮  ⎟ = K,   (8.4.20)
                ⎝ b_1  · · ·  b_n ⎠
as asserted.
For a Markov matrix A ∈ R(n, n), the power of A, Am , may or may not
approach a limiting matrix as m → ∞. If the limit of Am as m → ∞ exists
and is some K ∈ R(n, n), then it is not hard to show that K is also a Markov
matrix, which is called the stable matrix of A and A is said to be a stable
Markov matrix. Theorem 8.10 says that a regular Markov matrix A is stable
and, at the same time, gives us a constructive method to find the stable matrix
of A.
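As an illustration of this constructive viewpoint (a numerical sketch, not part of the text), the following NumPy lines raise the Markov matrix of Exercise 8.4.3 below to a high power; every row of the limit is the same probability vector, and the limiting matrix K satisfies AK = KA = K.

import numpy as np

A = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])      # the regular Markov matrix of Exercise 8.4.3

K = np.linalg.matrix_power(A, 60)    # A^m for a large m
print(np.round(K, 6))                # each row is approximately (0.25, 0.5, 0.25)
print(np.allclose(A @ K, K), np.allclose(K @ A, K))   # True True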
Exercises
8.4.1 Let A ∈ R(n, n) be a stable Markov matrix and K ∈ R(n, n) the stable
matrix of A. Show that K is also a Markov matrix and satisfies AK =
KA = K.
8.4.2 Show that the matrix
0 1
A= (8.4.21)
1 0
is a Markov matrix which is not regular. Is A stable?
8.4.3 Consider the Markov matrix
A = (1/2) ⎛ 1     1    0   ⎞
          ⎜ 1/2   1    1/2 ⎟ .   (8.4.22)
          ⎝ 0     1    1   ⎠
9
Excursion: Quantum mechanics in a nutshell
The content of this chapter may serve as yet another supplemental topic to meet
the needs and interests beyond those of a usual course curriculum. Here we
shall present an over-simplified, but hopefully totally transparent, description
of some of the fundamental ideas and concepts of quantum mechanics, using
a pure linear algebra formalism.
(u, v) = u^† v = ∑_{i=1}^{n} \bar{a}_i b_i,   (9.1.2)
|u⟩ = ∑_{i=1}^{n} a_i |e_i⟩.   (9.1.3)
Therefore the bra-counterpart of |u⟩ is simply given as
⟨u| = (|u⟩)^† = ∑_{i=1}^{n} \bar{a}_i ⟨e_i|.   (9.1.4)
As a consequence, the orthonormal condition regarding the basis {e_1, . . . , e_n} becomes
⟨e_i|e_j⟩ = δ_{ij},   i, j = 1, . . . , n,   (9.1.5)
and the Hermitian scalar product of the vectors |u⟩ and |v⟩ assumes the form
⟨u|v⟩ = ∑_{i=1}^{n} \bar{a}_i b_i = \overline{⟨v|u⟩}.   (9.1.6)
For the vector |u⟩ given in (9.1.3), we find that
a_i = ⟨e_i|u⟩,   i = 1, . . . , n.   (9.1.7)
Now rewriting |u⟩ as
|u⟩ = ∑_{i=1}^{n} |e_i⟩ a_i,   (9.1.8)
and inserting (9.1.7) into (9.1.8), we obtain
|u⟩ = ∑_{i=1}^{n} |e_i⟩⟨e_i|u⟩ ≡ ( ∑_{i=1}^{n} |e_i⟩⟨e_i| ) |u⟩,   (9.1.9)
which suggests that the strange-looking 'quantity', ∑_{i=1}^{n} |e_i⟩⟨e_i|, should naturally
be identified as the identity mapping or matrix,
∑_{i=1}^{n} |e_i⟩⟨e_i| = I,   (9.1.10)
which readily follows from the associativity property of matrix multiplication.
Similarly, we have
⟨u| = ∑_{i=1}^{n} \overline{⟨e_i|u⟩} ⟨e_i| = ∑_{i=1}^{n} ⟨u|e_i⟩⟨e_i| = ⟨u| ( ∑_{i=1}^{n} |e_i⟩⟨e_i| ).   (9.1.11)
Thus (9.1.10) can be applied to both bra and ket vectors symmetrically, and
what it expresses is simply the fact that |e_1⟩, . . . , |e_n⟩ form an orthonormal
basis of C^n.
We may reexamine some familiar linear mappings under the new notation.
Let |u⟩ be a unit vector in C^n. Use P_{|u⟩} to denote the mapping that projects
C^n onto Span{|u⟩} along (Span{|u⟩})^⊥. Then we have
|u⟩⟨v| = |u⟩(|v⟩)^† = (a_1, . . . , a_n)^t (\bar{b}_1, . . . , \bar{b}_n) = (a_i \bar{b}_j).   (9.1.20)
In particular, we have the representation
∑_{i=1}^{n} |e_i⟩⟨e_i| = diag{1, . . . , 1} = I_n.   (9.1.21)
Finally, if A ∈ C(n, n) is a Hermitian matrix (or equivalently viewed as
a self-adjoint mapping over C^n), the operator A in ⟨u|A|v⟩ may be regarded as
acting on the ket vector |v⟩ from the left or on the bra (row) vector ⟨u| from the
right, without ambiguity, since
⟨u|A|v⟩ = ⟨u|(A|v⟩) = (⟨u|A)|v⟩ = (A|u⟩)^† |v⟩.   (9.1.22)
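The bra–ket manipulations above translate directly into matrix operations. The following is a minimal NumPy sketch (an illustration, not part of the text) of the scalar product ⟨u|v⟩, the outer product |u⟩⟨v|, and the completeness relation (9.1.10).

import numpy as np

n = 3
e = np.eye(n, dtype=complex)                     # the kets |e_1>, ..., |e_n> as columns
u = np.array([1 + 1j, 2.0, -1j])                 # a ket |u>
v = np.array([0.5, 1j, 1.0])                     # a ket |v>

print(np.vdot(u, v))                             # <u|v> = sum of conj(a_i) b_i
outer = np.outer(u, v.conj())                    # |u><v|, the matrix (a_i conj(b_j))
completeness = sum(np.outer(e[:, i], e[:, i].conj()) for i in range(n))
print(np.allclose(completeness, np.eye(n)))      # True: sum_i |e_i><e_i| = I
print(np.allclose(outer @ v, np.vdot(v, v) * u)) # |u><v| acting on |v> gives <v|v> |u>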
Exercises
H = K + V. (9.2.8)
Since H is Hermitian, −(i/ħ)H is anti-Hermitian. Hence U(t) is unitary,
which ensures the conservation of the normality of the state vector |φ(t)⟩.
That is,
where
ω_i = λ_i / ħ,   i = 1, . . . , n,   (9.2.15)
are angular frequencies of the eigenmodes
E = h̄ω, (9.2.17)
where
[H, A] = HA − AH   (9.2.19)
is the commutator of H and A, which measures the non-commutativity of H
and A. Hence we see that ⟨A⟩ is time-independent if A commutes with H:
[H, A] = 0.   (9.2.20)
In particular, the average energy ⟨H⟩ is always conserved. Furthermore, if H is
related to the momentum P and the potential V through (9.2.8) and (9.2.9)
so that P commutes with V, then [H, P] = 0 and the average momentum
⟨P⟩ is also conserved. These are the quantum mechanical extensions of the laws
of conservation of energy and momentum.
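These conservation statements are easy to observe numerically. The sketch below (with the convention ħ = 1 and SciPy assumed available; an illustration only) evolves a normalized state with U(t) = e^{−itH} and prints the norm of the state and the average energy ⟨H⟩ at several times; both stay constant.

import numpy as np
from scipy.linalg import expm

H = np.array([[1.0, 0.5j],
              [-0.5j, 2.0]])                  # a Hermitian Hamiltonian (an arbitrary example)
phi0 = np.array([1.0, 1.0j]) / np.sqrt(2.0)   # a normalized initial state

for t in (0.0, 0.7, 2.3):
    phi = expm(-1j * t * H) @ phi0            # |phi(t)> = U(t)|phi_0>
    print(round(np.vdot(phi, phi).real, 6),   # <phi(t)|phi(t)> stays 1
          round(np.vdot(phi, H @ phi).real, 6))  # <H> stays constant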
Exercises
which is given in a form free of the choice of the basis {|u_i⟩}. Thus,
with (9.3.1), if A and B are two observables, then the Schwarz inequality
implies that
σ_A² σ_B² ≥ |⟨φ|(A − ⟨A⟩I)(B − ⟨B⟩I)|φ⟩|² ≡ |c|²,   (9.3.2)
where the complex number c is given by
c = ⟨φ|(A − ⟨A⟩I)(B − ⟨B⟩I)|φ⟩
  = ⟨φ|AB|φ⟩ − ⟨B⟩⟨φ|A|φ⟩ − ⟨A⟩⟨φ|B|φ⟩ + ⟨A⟩⟨B⟩⟨φ|φ⟩
  = ⟨AB⟩ − ⟨A⟩⟨B⟩.   (9.3.3)
Interchanging A and B, we have
\bar{c} = ⟨φ|(B − ⟨B⟩I)(A − ⟨A⟩I)|φ⟩ = ⟨BA⟩ − ⟨A⟩⟨B⟩.   (9.3.4)
Therefore, we obtain
ℑ(c) = (1/2i)(c − \bar{c}) = (1/2i)⟨[A, B]⟩.   (9.3.5)
Inserting (9.3.5) into (9.3.2), we arrive at the inequality
σ_A² σ_B² ≥ ( (1/2i)⟨[A, B]⟩ )²,   (9.3.6)
which roughly says that if two observables are non-commutative, we cannot
achieve simultaneous high-precision measurements for them. To put the state-
ment in another way, if we know one observable with high precision, we do
not know the other observable at the same time with high precision. This fact,
in particular, the inequality (9.3.6), in quantum mechanics, is known as the
Heisenberg uncertainty principle.
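A quick numerical experiment (not part of the text) makes the inequality (9.3.6) concrete: for two non-commuting Hermitian matrices and a unit state vector, the product of the variances dominates the commutator term.

import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)      # two Hermitian observables
B = np.array([[0.0, -1j], [1j, 0.0]])
phi = np.array([np.cos(0.3), np.sin(0.3) * np.exp(0.4j)])  # a unit state vector

def mean(M):                                               # <M> = <phi|M|phi>
    return np.vdot(phi, M @ phi)

varA = (mean(A @ A) - mean(A) ** 2).real
varB = (mean(B @ B) - mean(B) ** 2).real
bound = abs(mean(A @ B - B @ A) / 2j) ** 2                 # |<[A, B]>/(2i)|^2
print(varA * varB >= bound - 1e-12)                        # True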
On the other hand, when A and B commute, we know that A and B share
the same eigenstates, which may form an orthonormal basis of C^n. Let |φ⟩ be a
commonly shared eigenstate so that
A|φ⟩ = λ_A|φ⟩,   B|φ⟩ = λ_B|φ⟩,   λ_A, λ_B ∈ R.   (9.3.7)
If the system lies in the state |φ⟩, then, with simultaneous full certainty, the
measured values of the observables A and B are λ_A and λ_B, respectively.
Let k ≥ 1 be an integer and define the kth moment of an observable A in the
state |φ⟩ by
⟨A^k⟩ = ⟨φ|A^k|φ⟩.   (9.3.8)
Thus, in (9.3.3), when we set B = A, we see that the variance σ_A² of A in the
state |φ⟩ may be computed using the formula
|φ⟩ = ∑_{i=1}^{n} φ_i |u_i⟩,   (9.3.11)
⟨A²⟩ = ∑_{i=1}^{n} λ_i² |φ_i|²,   ⟨A⟩ = ∑_{i=1}^{n} λ_i |φ_i|².   (9.3.12)
Hence, we have
σ²_{A,|φ⟩} = ∑_{i=1}^{n} λ_i² |φ_i|² − ( ∑_{i=1}^{n} λ_i |φ_i|² )².   (9.3.13)
Thus, using calculus, the maximum points are to be sought among the solutions
of the equations
x_i ( λ_i² − 2λ_i [ ∑_{j=1}^{n} λ_j x_j² ] − ξ ) = 0,   i = 1, . . . , n,   (9.3.15)
in the nontrivial situation σA > 0, we see that there are at least n − 2 values of
i = 1, . . . , n such that
Using the constraint in (9.3.21), we may simplify the objective function of the
problem (9.3.21) into the form
In this case, it is straightforward to check that λ1 and λ2 are indeed the two
roots of equation (9.3.19). In particular,
σ²_{A,|φ⟩} = (1/4)(λ_1 − λ_2)².   (9.3.24)
Consequently, if we use λmin and λmax to denote the smallest and largest eigen-
values of A, then we see in view of (9.3.24) that the maximum uncertainty is
given by
σ_{A,max} = (1/2)(λ_max − λ_min),   (9.3.25)
which is achieved when the system occupies the maximum uncertainty state
|φ⟩_max = a|u_{λ_max}⟩ + b|u_{λ_min}⟩,   a, b ∈ C,   |a| = |b| = 1/√2.   (9.3.26)
In other words, we have the result
Exercises
Evaluate the quantities σ_A², σ_B², and ⟨[A, B]⟩ in the state
|φ⟩ = (1/√5) (−1, 2)^t,   (9.3.29)
|φ⟩ = (1/√n) ∑_{i=1}^{n} |u_i⟩.   (9.3.30)
A(t) = e^{(i/ħ)tH} A e^{−(i/ħ)tH},   (9.4.5)
which indicates how an observable should evolve itself with respect to time.
Differentiating (9.4.5), we are led to the equation
(d/dt) A(t) = (i/ħ)[H, A(t)],   (9.4.6)
which spells out the dynamical law that a time-dependent observable must
follow and is known as the Heisenberg equation.
We next show that (9.4.6) implies (9.2.7) as well.
As preparation, we establish the following Gronwall inequality which is
useful in the study of differential equations: If f (t) and g(t) are continuous
non-negative functions in t ≥ 0 and satisfy
f(t) ≤ a + ∫_0^t f(τ)g(τ) dτ,   t ≥ 0,   (9.4.7)
for some constant a ≥ 0, then
f(t) ≤ a exp( ∫_0^t g(τ) dτ ),   t ≥ 0.   (9.4.8)
To prove it, we modify (9.4.7) into
f(t) < a + ε + ∫_0^t f(τ)g(τ) dτ,   t ≥ 0,   (9.4.9)
and set
h(t) = a + ε + ∫_0^t f(τ)g(τ) dτ,   t ≥ 0,   (9.4.10)
We now turn our attention back to the Heisenberg equation (9.4.6). Suppose
that B(t) is another solution of the equation. Then C(t) = A(t)−B(t) satisfies
(d/dt) C(t) = (i/ħ)[H, C(t)],   C(0) = A(0) − B(0).   (9.4.14)
Therefore, we have
‖C′(t)‖ ≤ (2/ħ)‖H‖ ‖C(t)‖.   (9.4.15)
On the other hand, we may use the triangle inequality to get
‖C(t + h)‖ − ‖C(t)‖ ≤ ‖C(t + h) − C(t)‖,   h > 0,   (9.4.16)
which allows us to conclude with
(d/dt)‖C(t)‖ ≤ ‖C′(t)‖.   (9.4.17)
Inserting (9.4.17) into (9.4.15) and integrating, we find
‖C(t)‖ ≤ ‖C(0)‖ + (2/ħ)‖H‖ ∫_0^t ‖C(τ)‖ dτ,   t ≥ 0.   (9.4.18)
Consequently, it follows from applying the Gronwall inequality that
‖C(t)‖ ≤ ‖C(0)‖ e^{(2/ħ)‖H‖ t},   t ≥ 0.   (9.4.19)
The same argument may be carried out in the domain t ≤ 0 with the time
flipped, t ↦ −t. Thus, in summary, we obtain the collective conclusion
‖C(t)‖ ≤ ‖C(0)‖ e^{(2/ħ)‖H‖ |t|},   t ∈ R.   (9.4.20)
In particular, if C(0) = 0, then C(t) ≡ 0, which implies that the solution to
the initial value problem of the Heisenberg equation (9.4.6) is unique. Hence,
if A(0) = A, then the solution is uniquely given by (9.4.5). As a consequence,
if A commutes with H , A(t) = A for all time t. In other words, if an observ-
able commutes with the Hamiltonian initially, it stays commutative with the
Hamiltonian and remains in fact constant for all time.
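The uniqueness argument can be complemented by a direct computation. The sketch below (ħ = 1, SciPy assumed; an illustration only) integrates the Heisenberg equation (9.4.6) with a crude forward Euler step and compares the result with the closed form (9.4.5); the two agree to the expected accuracy.

import numpy as np
from scipy.linalg import expm

H = np.array([[2.0, 1j], [-1j, 2.0]])           # Hermitian
A0 = np.array([[1.0, 1 - 1j], [1 + 1j, 1.0]])   # the initial observable A(0)

t_final, dt = 1.0, 1e-4
A = A0.copy()
for _ in range(int(t_final / dt)):
    A = A + dt * 1j * (H @ A - A @ H)           # dA/dt = i[H, A]

A_exact = expm(1j * t_final * H) @ A0 @ expm(-1j * t_final * H)
print(np.max(np.abs(A - A_exact)) < 1e-2)       # True: the Euler solution tracks (9.4.5)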
We can now derive the Schrödinger equation (9.2.7) from the Heisenberg
equation (9.4.6).
In fact, let A(t) be the unique solution of (9.4.6) evolving from its initial
state A. Then A(t) is given by (9.4.5). Let |φ(t) denote the state vector that
evolves with respect to t from its initial state vector |φ0 so that it gives rise
to the same expected value as that evaluated using the Heisenberg equation
through A(t). Then (9.4.3) holds. We hope to examine, to what extent, the
relation (9.4.1) must be valid. To this end, we assume that
⟨u|A|u⟩ = ⟨v|A|v⟩   (9.4.21)
holds for any Hermitian matrix A ∈ C(n, n) for some |u⟩, |v⟩ ∈ C^n, and we
investigate how |u⟩ and |v⟩ are related.
If |v⟩ = 0, then ⟨u|A|u⟩ = 0 for any Hermitian matrix A, which implies
|u⟩ = 0 as well. Thus, in the following, we only consider the nontrivial situation
|u⟩ ≠ 0, |v⟩ ≠ 0.
Since |v⟩ ≠ 0, we set V = Span{|v⟩} and W = V^⊥. Choose a Hermitian matrix
A so that |x⟩ ↦ A|x⟩ (|x⟩ ∈ C^n) defines the projection of C^n onto V along
W. Write |u⟩ = a|v⟩ + |w⟩, where a ∈ C and |w⟩ ∈ W. Then A|u⟩ = a|v⟩.
Thus
which leads to
which gives us the result |w⟩ = 0. In other words, that (9.4.21) holds for any
Hermitian matrix A ∈ C(n, n) implies that |u⟩ and |v⟩ differ from each other
by a phase factor a satisfying (9.4.23). That is,
where
|ψ(t)⟩ = e^{−(i/ħ)tH} |φ_0⟩.   (9.4.27)
Exercises
9.4.1 Consider the Heisenberg equation (9.4.6) subject to the initial condition
A(0) = A0 . An integration of (9.4.6) gives
A(t) = A_0 + (i/ħ) ∫_0^t [H, A(τ)] dτ.   (9.4.29)
(i) From (9.4.29) derive the result
[H, A(t)] = [H, A_0] + (i/ħ) ∫_0^t [H, [H, A(τ)]] dτ.   (9.4.30)
(ii) Use (9.4.30) and the Gronwall inequality to come up with an alter-
native proof that H and A(t) commute for all time if and only if
they do so initially.
9.4.2 Consider the Hamiltonian H and an observable A given by
H = ⎛ 2    i ⎞ ,   A = ⎛ 1       1 − i ⎞ .   (9.4.31)
    ⎝ −i   2 ⎠         ⎝ 1 + i   1     ⎠
(i) Solve the Schrödinger equation (9.2.7) to obtain the time-
dependent state |φ(t) evolving from the initial state
|φ_0⟩ = (1/√2) (1, i)^t.   (9.4.32)
(ii) Evaluate the expected value of the observable A assuming that the
system lies in the state |φ(t)⟩.
(iii) Solve the Heisenberg equation (9.4.6) with the initial condition
A(0) = A and use it to evaluate the expected value of the same
observable within the Heisenberg picture. That is, compute the
quantity ⟨φ_0|A(t)|φ_0⟩. Compare the result with that obtained in (ii)
and explain.
Section 1.1
1.1.2 Let n satisfy 1 ≤ n < p and write 2n = kp + l for some integers k and
l where 0 ≤ l < p. Then n + (n − l) = kp. Thus [n − l] = −[n].
If [n] ≠ [0], then n is not a multiple of p. Since p is a prime, the
greatest common divisor of n and p is 1. Thus there are integers k, l such
that
kp + ln = 1. (S1)
Section 1.2
1.2.7 We have
u^t v = (a_1, . . . , a_n)^t (b_1, . . . , b_n) = (a_i b_j), the n × n matrix whose (i, j) entry is a_i b_j,
whose ith row and jth row are
u ∈ U1 ∪ · · · ∪ Um . (S3)
Section 1.4
of (S4) we obtain
f(x_1, . . . , x_n) = a f(e_1) ∑_{i=1}^{n} x_i,   (x_1, . . . , x_n) ∈ F^n.
Section 1.7
1.7.2 Assume the nontrivial situation u ≠ 0. It is clear that ‖u‖_p ≤ ‖u‖_∞ for
any p ≥ 1. Thus lim sup_{p→∞} ‖u‖_p ≤ ‖u‖_∞.
Let t_0 ∈ [a, b] be such that |u(t_0)| = ‖u‖_∞ > 0. For any ε ∈
(0, ‖u‖_∞) we can find an interval around t_0, say I_ε ⊂ [a, b], such that
|u(t)| > ‖u‖_∞ − ε when t ∈ I_ε. Thus
‖u‖_p ≥ ( ∫_{I_ε} |u(t)|^p dt )^{1/p} ≥ (‖u‖_∞ − ε) |I_ε|^{1/p},   (S5)
where |I_ε| denotes the length of the interval I_ε. Letting p → ∞ in (S5)
we find lim inf_{p→∞} ‖u‖_p ≥ ‖u‖_∞ − ε. Since ε > 0 may be chosen arbitrarily
small, we arrive at lim inf_{p→∞} ‖u‖_p ≥ ‖u‖_∞.
Thus the limit ‖u‖_p → ‖u‖_∞ as p → ∞ follows.
Section 2.1
So
r(R ◦ S) + r(S ◦ T ) ≤ l + m + k + l
= (k + l + m) + l
= r(S) + r(R ◦ S ◦ T ).
(y1 , . . . , yk , z1 , . . . , zl ) ∈ Fk+l .
Section 2.2
and
1 1
T (u1 ) = (a + b + c + d)u1 + (a + b − c − d)u2 ,
2 2
1 1
T (u2 ) = (a − b + c − d)u1 + (a − b − c + d)u2 ,
2 2
so that the problem follows.
2.2.5 Let A = (aij ) and define T ∈ L(Fn ) by setting T (x) = Ax where
x ∈ Fn is taken to be a column vector. Then
T(e_j) = ∑_{i=1}^{n} a_{ij} e_i,   j = 1, . . . , n.
f1 = en , f2 = en−1 , ..., fn = e1 ,
or fi = en−i+1 for i = 1, . . . , n. Then we obtain
T(f_j) = ∑_{i=1}^{n} a_{n−i+1, n−j+1} f_i ≡ ∑_{i=1}^{n} b_{ij} f_i,   j = 1, . . . , n.
Since A and B = (bij ) are the matrix representations of the mapping T
under the bases {e1 , . . . , en } and {f1 , . . . , fn }, respectively, we conclude
that A ∼ B.
Section 2.3
2.3.3 That the equation T (u) = v has a solution for some u ∈ U is equivalent
to v ∈ R(T ), which is equivalent to v ∈ N (T )0 (by Theorem 2.8),
which is equivalent to v, v = 0 for all v ∈ N (T ) or T (v ) = 0.
Section 2.4
2.4.3 Let k = r(T̃ ) = dim(R(T̃ )). Let {[v1 ]Y , . . . , [vk ]Y } be a basis of R(T̃ )
in V /Y . Then there are [u1 ]X , . . . , [uk ]X ∈ U/X such that
T̃ ([u1 ]X ) = [v1 ]Y , ..., T̃ ([uk ]) = [vk ]Y .
On the other hand, from the definition of T̃ , we have T̃ ([ui ]X ) =
[T (ui )]Y for i = 1, . . . , k. We claim that T (u1 ), . . . , T (uk ) are linearly
independent. In fact, let a1 , . . . , ak be scalars such that
a1 T (u1 ) + · · · + ak T (uk ) = 0.
Taking cosets, we get a1 [T (u1 )]Y + · · · + ak [T (uk )]Y = [0]Y , which
leads to a1 [v1 ]Y + · · · + ak [vk ]Y = [0]Y . Thus a1 = · · · = ak = 0. This
proves r(T ) ≥ k.
Section 2.5
2.5.12 (i) Assume R(S) = R(T ). Since S projects U onto R(S) = R(T )
along N (S), we have for any u ∈ U the result S(T (u)) = T (u).
Thus S ◦ T = T . Similarly, T ◦ S = S.
Furthermore, from S ◦ T = T , then T (U ) = (S ◦ T )(U ) implies
R(T ) ⊂ R(S). Likewise, T ◦ S = S implies R(S) ⊂ R(T ). So
R(S) = R(T ).
(ii) Assume N (S) = N (T ). For any u ∈ U , rewrite u as u = v + w
with v ∈ R(T ) and w ∈ N (T ). Then T (v) = v, T (w) = 0, and
S(w) = 0 give us
Hence S ◦ T = S. Similarly, T ◦ S = T .
Assume S ◦ T = S, T ◦ S = T . Let u ∈ N (T ). Then S(u) =
S(T (u)) = 0. So u ∈ N (S). So N (T ) ⊂ N (S). Interchange S
and T . We get N (S) ⊂ N (T ).
2.5.14 (i) We have
⟨u, u_1⟩⟨u_1, u_1⟩ + ⟨u, u_2⟩⟨u_2, u_1⟩ = ⟨u, u_1⟩,
⟨u, u_1⟩⟨u_1, u_2⟩ + ⟨u, u_2⟩⟨u_2, u_2⟩ = ⟨u, u_2⟩.
Namely,
⟨u_1, u_1⟩u_1 + ⟨u_2, u_1⟩u_2 = u_1,   ⟨u_1, u_2⟩u_1 + ⟨u_2, u_2⟩u_2 = u_2.
Section 2.6
T = Tn − Sn = Tn ◦ (I − Rn ), (S9)
Section 3.1
3.1.3 (i) Let C be a closed curve around but away from the origin of R2 and
consider
u : C → S¹,   u(x, y) = f(x, y)/‖f(x, y)‖,   (x, y) ∈ C.
Take R > 0 to be a constant and let C be parametrized by θ:
x = (R/a) cos θ, y = (R/b) sin θ, 0 ≤ θ ≤ 2π. Then u = (u_1, u_2) =
(cos θ, sin θ) (on C). So ind(f |_C) = deg(u) = 1. On the other
hand, on C, we have
‖g(x, y)‖² = (a²x² − b²y²)² + 4a²b²x²y² = (a²x² + b²y²)² = R⁴.
Section 3.2
3.2.7 Let the column vectors in A(t) be denoted by A1 (t), . . . , An (t). Then
(1/h) ( det(A(t + h)) − det(A(t)) )
  = | (1/h)(A_1(t + h) − A_1(t)), A_2(t + h), . . . , A_n(t + h) |
    + | A_1(t), (1/h)(A_2(t + h) − A_2(t)), . . . , A_n(t + h) |
    + · · · + | A_1(t), A_2(t), . . . , (1/h)(A_n(t + h) − A_n(t)) | .
Letting h → 0, we obtain
(d/dt) det(A(t)) = | A_1′(t), A_2(t), . . . , A_n(t) | + | A_1(t), A_2′(t), . . . , A_n(t) |
                   + · · · + | A_1(t), A_2(t), . . . , A_n′(t) | .
Expanding each of these determinants by cofactors along the differentiated column, we get
(d/dt) det(A(t)) = ∑_{i=1}^{n} a_{i1}′(t) C_{i1}(t) + ∑_{i=1}^{n} a_{i2}′(t) C_{i2}(t) + · · · + ∑_{i=1}^{n} a_{in}′(t) C_{in}(t).
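As a side check (an illustration only, not part of the solution), the column-by-column differentiation formula can be verified numerically against a central difference quotient:

import numpy as np

def A(t):
    return np.array([[np.cos(t), t**2, 1.0],
                     [np.sin(t), 2.0,  t  ],
                     [t,         1.0,  np.exp(t)]])

def dA(t):   # entrywise derivative of A(t)
    return np.array([[-np.sin(t), 2*t, 0.0],
                     [ np.cos(t), 0.0, 1.0],
                     [ 1.0,       0.0, np.exp(t)]])

t, h = 0.7, 1e-6
exact = sum(np.linalg.det(np.hstack([A(t)[:, :j], dA(t)[:, j:j+1], A(t)[:, j+1:]]))
            for j in range(3))                       # differentiate one column at a time
numeric = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2 * h)
print(abs(exact - numeric) < 1e-5)                   # True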
3.2.8 Adding all columns to the first column of the matrix, we get
det ⎛ x + ∑_{i=1}^{n} a_i   a_1   a_2   · · ·   a_n ⎞
    ⎜ x + ∑_{i=1}^{n} a_i   x     a_2   · · ·   a_n ⎟
    ⎜ x + ∑_{i=1}^{n} a_i   a_2   x     · · ·   a_n ⎟
    ⎜ ⋮                     ⋮     ⋮     ⋱       ⋮   ⎟
    ⎝ x + ∑_{i=1}^{n} a_i   a_2   a_3   · · ·   x   ⎠
  = ( x + ∑_{i=1}^{n} a_i ) det ⎛ 1   a_1   a_2   · · ·   a_n ⎞
                                ⎜ 1   x     a_2   · · ·   a_n ⎟
                                ⎜ 1   a_2   x     · · ·   a_n ⎟
                                ⎜ ⋮   ⋮     ⋮     ⋱       ⋮   ⎟
                                ⎝ 1   a_2   a_3   · · ·   x   ⎠ .
Subtracting the first row from each of the remaining rows, the minor of the entry at
the position (1, 1) becomes the determinant of a lower triangular matrix with diagonal
entries x − a_1, . . . , x − a_n, so we get that the determinant equals
( x + ∑_{i=1}^{n} a_i ) ∏_{i=1}^{n} (x − a_i).
3.2.9 If c1 , . . . , cn+2 are not all distinct then it is clear that the determinant is
zero since there are two identical columns. Assume now c1 , . . . , cn+2
are distinct and consider the function of x given by
⎛ ⎞
p1 (x) p1 (c2 ) ··· p1 (cn+2 )
⎜ ⎟
⎜ p2 (x) p2 (c2 ) ··· p2 (cn+2 ) ⎟
⎜ ⎟
p(x) = det ⎜ .. .. .. ⎟.
⎜ ..
. ⎟
⎝ . . . ⎠
pn+2 (x) pn+2 (c2 ) · · · pn+2 (cn+2 )
By the cofactor expansion along the first column, we see that p(x)
is a polynomial of degree at most n which vanishes at n + 1 points:
c2 , . . . , cn+2 . Hence p(x) = 0 for all x. In particular, p(c1 ) = 0 as well.
3.2.10 We use Dn+1 to denote the determinant and implement induction to do
the computation. When n = 1, we have D2 = a1 x + a0 . At n − 1,
we assume Dn = an−1 x n−1 + · · · + a1 x + a0 . At n, by the cofactor
expansion according to the first column, we get
3.2.11 Use D(λ) to denote the determinant on the left-hand side of (3.2.46). It
is clear that D(0) = 0. So (3.2.46) is true at λ = 0.
Now assume λ ≠ 0 and rewrite D(λ) as
D(λ) = det ⎛ 1   0         0         · · ·   0         ⎞
           ⎜ 1   a_1 − λ   a_2       · · ·   a_n       ⎟
           ⎜ 1   a_1       a_2 − λ   · · ·   a_n       ⎟
           ⎜ ⋮   ⋮         ⋮         ⋱       ⋮         ⎟
           ⎝ 1   a_1       a_2       · · ·   a_n − λ   ⎠ .
Subtracting a_j times the first column from the (j + 1)th column, j = 1, . . . , n,
and then factoring λ out of each of the last n columns, we obtain
D(λ) = det ⎛ 1   −a_1   −a_2   · · ·   −a_n ⎞
           ⎜ 1   −λ      0     · · ·    0   ⎟
           ⎜ 1    0     −λ     · · ·    0   ⎟
           ⎜ ⋮    ⋮      ⋮     ⋱        ⋮   ⎟
           ⎝ 1    0      0     · · ·   −λ   ⎠
     = λ^n det ⎛ 1   −a_1/λ   −a_2/λ   · · ·   −a_n/λ ⎞
               ⎜ 1   −1        0       · · ·    0     ⎟
               ⎜ 1    0       −1       · · ·    0     ⎟
               ⎜ ⋮    ⋮        ⋮       ⋱        ⋮     ⎟
               ⎝ 1    0        0       · · ·   −1     ⎠ .
Adding the columns 2, . . . , n + 1 to the first column, we get
D(λ) = λ^n det ⎛ 1 − (1/λ) ∑_{i=1}^{n} a_i   −a_1/λ   · · ·   −a_n/λ ⎞
               ⎜ 0                           −1       · · ·    0     ⎟
               ⎜ ⋮                            ⋮       ⋱        ⋮     ⎟
               ⎝ 0                            0       · · ·   −1     ⎠
     = λ^n ( 1 − (1/λ) ∑_{i=1}^{n} a_i ) (−1)^n = (−1)^n λ^{n−1} ( λ − ∑_{i=1}^{n} a_i ).
Then |xi | > 0. On the other hand, the ith component of the equation
Ax = 0 reads
which is a contradiction.
3.2.15 Consider the modified matrix
A(t) = D + t (A − D), 0 ≤ t ≤ 1,
det(I_n − α^t β) = det ⎛ 1   b_1         b_2         · · ·   b_n         ⎞
                       ⎜ 0   1 − a_1b_1  −a_1b_2     · · ·   −a_1b_n     ⎟
                       ⎜ 0   −a_2b_1     1 − a_2b_2  · · ·   −a_2b_n     ⎟
                       ⎜ ⋮   ⋮           ⋮           ⋱       ⋮           ⎟
                       ⎝ 0   −a_nb_1     −a_nb_2     · · ·   1 − a_nb_n  ⎠ .
Now adding the a1 multiple of row 1 to row 2, a2 multiple of row
1 to row 3,..., and an multiple of row 1 to the last row, of the above
determinant, we get
det(I_n − α^t β) = det ⎛ 1     b_1   b_2   · · ·   b_n ⎞
                       ⎜ a_1   1     0     · · ·   0   ⎟
                       ⎜ a_2   0     1     · · ·   0   ⎟
                       ⎜ ⋮     ⋮     ⋮     ⋱       ⋮   ⎟
                       ⎝ a_n   0     0     · · ·   1   ⎠ .
Section 3.3
3.3.7 For A = (a_{ij}), we use M_{ij}^A and C_{ij}^A to denote the minor and cofactor of
the entry a_{ij}, respectively. Then we have
M_{ij}^{A^t} = M_{ji}^A,   (S14)
which leads to C_{ij}^{A^t} = C_{ji}^A, i, j = 1, . . . , n. Hence
(adj(A))^t = (C_{ij}^A) = (C_{ij}^{A^t})^t = adj(A^t).
Section 3.4
3.4.1 Using CijA to denote the cofactor of the entry aij of the matrix A = (aij )
and applying (3.2.42), we have
a_1 = (d/dλ) p_A(λ) |_{λ=0} = (d/dλ) det(λI_n − A) |_{λ=0}
    = ∑_{i=1}^{n} C_{ii}^{(−A)} = (−1)^{n−1} ∑_{i=1}^{n} C_{ii}^A.
3.4.2 It suffices to show that M = C(n, n) \ D is closed in C(n, n). Let {Ak }
be a sequence in M which converges to some A ∈ C(n, n). We need to
prove that A ∈ M. To this end, consider p_{A_k}(λ) and let {λ_k} be multiple
roots of p_{A_k}(λ). Then we have
p_{A_k}(λ_k) = 0,   p′_{A_k}(λ_k) = 0,   k = 1, 2, . . . .   (S15)
On the other hand, we know that the coefficients of p_A(λ) are con-
tinuously dependent on the entries of A. So the coefficients of p_{A_k}(λ)
converge to those of p_A(λ), respectively. In particular, the coefficients
of p_{A_k}(λ), say {a_{n−1}^k}, . . . , {a_1^k}, {a_0^k}, are bounded sequences. Thus
|λ_k|^n ≤ |λ_k^n − p_{A_k}(λ_k)| + |p_{A_k}(λ_k)| ≤ ∑_{i=0}^{n−1} |a_i^k| |λ_k|^i,
That is,
Note that (S16) may be used to prove some known facts more
easily. For example, the relation adj(At ) = (adj(A))t (see Exercise
3.3.7) follows immediately.
3.4.6 First assume that A is invertible. Then
⎛ I_n   −A    ⎞ ⎛ A     λI_n ⎞   ⎛ 0      λI_n − AB ⎞
⎝ 0     λI_n  ⎠ ⎝ I_n   B    ⎠ = ⎝ λI_n   λB        ⎠ ,
⎛ I_n   0     ⎞ ⎛ A     λI_n ⎞   ⎛ A            λI_n ⎞
⎝ −B    λI_n  ⎠ ⎝ I_n   B    ⎠ = ⎝ λI_n − BA    0    ⎠ .
The determinants of the left-hand sides of the above are the same. The
determinants of the right-hand sides of the above are
det(λI_n − α^t β) = det ⎛ 1     b_1   b_2   · · ·   b_n ⎞
                        ⎜ a_1   λ     0     · · ·   0   ⎟
                        ⎜ a_2   0     λ     · · ·   0   ⎟
                        ⎜ ⋮     ⋮     ⋮     ⋱       ⋮   ⎟
                        ⎝ a_n   0     0     · · ·   λ   ⎠ .
Subtracting b_j/λ times the (j + 1)th row from the first row, j = 1, . . . , n, we get
det(λI_n − α^t β) = det ⎛ 1 − (1/λ) ∑_{i=1}^{n} a_i b_i   0     0     · · ·   0 ⎞
                        ⎜ a_1                             λ     0     · · ·   0 ⎟
                        ⎜ a_2                             0     λ     · · ·   0 ⎟
                        ⎜ ⋮                               ⋮     ⋮     ⋱       ⋮ ⎟
                        ⎝ a_n                             0     0     · · ·   λ ⎠
                      = ( 1 − (1/λ) ∑_{i=1}^{n} a_i b_i ) λ^n = λ^{n−1} ( λ − αβ^t ).
Since both sides of the above also agree at λ = 0, they are identical for
all λ.
3.4.8 (i) We have pA (λ) = λ3 − 2λ2 − λ + 2. From pA (A) = 0 we have
A(A2 − 2A − In ) = −2In . So
A^{−1} = −(1/2)(A² − 2A − I_n) = ⎛ 1     −1     0  ⎞
                                 ⎜ 0     1/2    0  ⎟ .
                                 ⎝ −2    3/2   −1  ⎠
(ii) We divide λ^10 by p_A(λ) to find
λ^10 = p_A(λ)(λ^7 + 2λ^6 + 5λ^5 + 10λ^4 + 21λ^3 + 42λ^2 + 85λ + 170) + 341λ^2 − 340.
Section 4.1
Section 4.2
4.2.6 We first show that the definition has no ambiguity. For this purpose,
assume [u1 ] = [u] and [v1 ] = [v]. We need to show that (u1 , v1 ) =
(u, v). In fact, since u1 ∈ [u] and v1 ∈ [v], we know that u1 − u ∈ U0
and v1 − v ∈ U0 . Hence there are x, y ∈ U0 such that u1 − u = x, v1 −
v = y. So (u1 , v1 ) = (u + x, v + y) = (u, v) + (u, y) + (x, v + y) =
(u, v).
It is obvious that ([u], [v]) is bilinear and symmetric.
To show that the scalar product is non-degenerate, we assume [u] ∈
U/U0 satisfies ([u], [v]) = 0 for any [v] ∈ U/U0 . Thus (u, v) = 0 for
all v ∈ U which implies u ∈ U0 . In other words, [u] = [0].
4.2.8 Let w ∈ V^⊥. Then (v, w) = 0 for any v ∈ V. Hence 0 = (v, w) =
⟨v, ρ(w)⟩, ∀v ∈ V, which implies ρ(w) ∈ V^0. Hence ρ(V^⊥) ⊂ V^0.
On the other hand, take w′ ∈ V^0. Then ⟨v, w′⟩ = 0 for all v ∈ V.
Since ρ : U → U′ is an isomorphism, there is a unique w ∈ U such
that ρ(w) = w′. Thus (v, w) = ⟨v, ρ(w)⟩ = ⟨v, w′⟩ = 0, v ∈ V. That
is, w ∈ V^⊥. Thus w′ ∈ ρ(V^⊥), which proves V^0 ⊂ ρ(V^⊥).
4.2.10 Recall (4.1.26). Replace V , W in (4.1.26) by V ⊥ , W ⊥ and use
(V ⊥ )⊥ = V , (W ⊥ )⊥ = W . We get (V ⊥ + W ⊥ )⊥ = V ∩ W. Taking
⊥
on both sides of this equation, we see that the conclusion follows.
4.2.11 From Exercise 4.1.8 we know that there is a nonzero vector u ∈ U such
that (u, u) = 0. Since (·, ·) is non-degenerate, there is some w ∈ U
such that (u, w) ≠ 0. It is clear that u, w are linearly independent. If
w satisfies (w, w) = 0, then we take v = w/(u, w). Hence (v, v) = 0
and (u, v) = 1. Assume now (w, w) ≠ 0 and consider x = u + cw (c ∈
C). Then x can never be zero for any c ∈ C. Set (x, x) = 0. We get
2c(u, w) + c²(w, w) = 0. So we may choose c = −2(u, w)/(w, w).
Thus, with v = x/(u, x) where
(u, x) = −2(u, w)²/(w, w) ≠ 0,
Section 4.3
4.3.1 We consider the real case for simplicity. The complex case is similar.
Suppose M is nonsingular and consider a_1 u_1 + · · · + a_k u_k = 0, where
a_1, . . . , a_k are scalars. Taking scalar products of this equation with the
vectors u_1, . . . , u_k consecutively, we get
a_1(u_1, u_i) + · · · + a_k(u_k, u_i) = 0,   i = 1, . . . , k.
Since M is nonsingular, a_1 = · · · = a_k = 0, so u_1, . . . , u_k are linearly independent.
Conversely, if M is singular, there are scalars a_1, . . . , a_k, not all zero, satisfying the
above system of equations. Multiplying the ith equation by a_i and summing over i,
we see that (v, v) = 0 where v = ∑_{i=1}^{k} a_i u_i. Since the scalar product
is positive definite, we arrive at v = 0. Thus u_1, . . . , u_k are linearly
dependent.
4.3.2 Suppose u ∈ Span{u1 , . . . , uk }. Then there are scalars a1 , . . . , ak such
that u = a1 u1 + · · · + ak uk . Taking scalar products of this equation with
u1 , . . . , uk , we have
which establishes
( (u_1, u), . . . , (u_k, u) )^t ∈ Span{ ( (u_1, u_1), . . . , (u_k, u_1) )^t, . . . , ( (u_1, u_k), . . . , (u_k, u_k) )^t }.
Section 4.4
4.4.4 (i) We know that we have the direct sum U = V ⊕V ⊥ . For any x, y ∈ [u]
we have x = v1 + w1 , y = v2 + w2 , vi ∈ V , wi ∈ V ⊥ , i = 1, 2. So
x − y = (v1 − v2 ) + (w1 − w2 ). However, since x − y ∈ V , we have
w1 = w2 . In other words, for any x ∈ [u], there is a unique w ∈ V ⊥
so that x = v + w for some v ∈ V . Of course, [x] = [w] = [u] and
x2 = v2 + w2 whose minimum is attained at v = 0. This proves
[u] = w.
(ii) So we need to find the unique w ∈ V ⊥ such that [w] = [u]. First
find an orthogonal basis {v1 , . . . , vk } of V . Then for u we know that the
projection of u onto V along V ⊥ is given by the Fourier expansion (see
Exercise 4.4.2)
v = ∑_{i=1}^{k} ((v_i, u)/(v_i, v_i)) v_i.
Hence
w = u − ∑_{i=1}^{k} ((v_i, u)/(v_i, v_i)) v_i.
Section 4.5
Section 5.2
D = diag{d1 , . . . , dn }, di = ±1, i = 1, . . . , n.
1
5.2.16 Use · to denote the standard norm of Rn and let u1 = u. Expand
u
{u1 } to get an orthonormal basis {u1 , u2 , . . . , un } of R (with the usual
n
ut Q = ut Q = ut (u1 , u2 , . . . , un )
= (ut u1 , ut u2 , . . . , ut un ) = (u, 0, . . . , 0).
Therefore
Section 5.3
S = { x = (x_1, x_2)^t ∈ R² | x_1² − x_2² = 0 },
det(A + B) = det(P t P + B)
= det(P t ) det(I + [P −1 ]t B[P −1 ]) det(P ).
for some constant λ0 > 0. That is, f (u) ≥ f (x) for all u ∈ U .
(ii) If x, y ∈ U solve (5.3.25), then (i) indicates that x, y are the mini-
mum points of f . Hence f (x) = f (y). Replacing u by y in (S17)
we arrive at x = y.
5.3.19 Let S ∈ L(U ) be positive semi-definite so that S 2 = T . Then q(u) =
(S(u), S(u)) = S(u)2 for any u ∈ U . Hence the Schwarz inequality
(4.3.10) gives us
q(αu + βv) = (S(αu + βv), S(αu + βv))
= α 2 S(u)2 + 2αβ(S(u), S(v)) + β 2 S(v)2
≤ α 2 S(u)2 + 2αβS(u)S(v) + β 2 S(v)2
≤ α 2 S(u)2 + αβ(S(u)2 + S(v)2 )
+ β 2 S(v)2
= α(α + β)S(u)2 + β(α + β)S(v)2
= αq(u) + βq(v), u, v ∈ U,
where we have also used the inequality 2ab ≤ a 2 + b2 for a, b ∈ R.
Section 5.4
We can rewrite q(x) as ∑_{i=1}^{n} y_i² with
yi = xi + ai xi+1 , i = 1, . . . , n − 1, yn = xn + an x1 . (S18)
It is seen that q(x) is positive definite if and only if the change of vari-
ables given in (S18) is invertible, which is equivalent to the condition
1 + (−1)^{n+1} a_1 a_2 · · · a_n ≠ 0.
5.4.3 If A is positive semi-definite, then A + λIn is positive definite for all
λ > 0. Hence all the leading principal minors of A + λIn are positive
Section 5.5
5.5.1 Assume the nontrivial case dim(U) ≥ 2. First suppose that T satisfies
T(u) ∈ Span{u} for any u ∈ U. If T ≠ aI for any a ∈ F, then there are
some linearly independent vectors u, v ∈ U such that T(u) = au and
T(v) = bv for some a, b ∈ F with a ≠ b. Since T(u + v) ∈ Span{u + v},
we have T(u + v) = c(u + v) for some c ∈ F. Thus c(u + v) = au + bv,
which implies a = c and b = c because u, v are linearly independent.
This is false. Next we assume that there is some u ∈ U such that v =
T(u) ∉ Span{u}. Thus u, v are linearly independent. Let S ∈ L(U) be
such that S(u) = v and S(v) = u. Then T(v) = T(S(u)) = S(T(u)) =
Then
a_0 + a_1 λ_i + · · · + a_{n−1} λ_i^{n−1} = b_i,   i = 1, . . . , n,   (S20)
c_0 + c_1 λ_i + · · · + c_{n−1} λ_i^{n−1} = 0,   i = 1, . . . , n,
Section 5.6
5.6.3 (i) Let v ∈ N(T′). Then 0 = (u, T′(v))_U = (T(u), v)_V for any u ∈
U. So v ∈ R(T)^⊥. Conversely, if v ∈ R(T)^⊥, then for any u ∈ U
we have 0 = (T(u), v)_V = (u, T′(v))_U. So T′(v) = 0, or v ∈
N(T′). Thus N(T′) = R(T)^⊥. Taking ⊥ on this relation we obtain
R(T) = N(T′)^⊥.
Section 6.2
Section 6.3
Section 6.4
6.4.4 Assume that A = (aij ) is upper triangular. Then we see that the diag-
onal entries of A† A and AA† are
|a_{11}|²,  . . . ,  ∑_{j=1}^{i} |a_{ji}|²,  . . . ,  ∑_{j=1}^{n} |a_{jn}|²,
∑_{j=1}^{n} |a_{1j}|²,  . . . ,  ∑_{j=i}^{n} |a_{ij}|²,  . . . ,  |a_{nn}|²,
a_0 + a_1 λ_i + · · · + a_{k−1} λ_i^{k−1} = \bar{λ}_i,   i = 1, . . . , k.
Section 6.5
6.5.8 Note that some special forms of this problem have appeared as Exer-
cises 6.3.11 and 6.3.12. By the singular value decomposition for A we
may rewrite A as A = PΣQ where Q, P ∈ C(n, n) are unitary and
Σ ∈ R(n, n) is diagonal whose diagonal entries are all nonnegative.
Alternatively, we also have A = (PQ)(Q†ΣQ) = (PΣP†)(PQ), as
products of a unitary and a positive semi-definite matrix, expressed in
two different orders.
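The two factorizations read off here are the polar decompositions of A. A brief NumPy sketch (an illustration, not part of the solution) that reproduces them from the numerical singular value decomposition:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

P, s, Q = np.linalg.svd(A)               # A = P diag(s) Q with P, Q unitary
S = np.diag(s)
U = P @ Q                                # unitary factor
S1 = Q.conj().T @ S @ Q                  # positive semi-definite, A = U S1
S2 = P @ S @ P.conj().T                  # positive semi-definite, A = S2 U
print(np.allclose(A, U @ S1), np.allclose(A, S2 @ U))   # True True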
Section 7.1
Section 7.2
7.2.1 If p_S(λ) and p_T(λ) are relatively prime, then there are polynomials f, g
such that f (λ)pS (λ) + g(λ)pT (λ) = 1. Thus, I = f (T )pS (T ) +
g(T )pT (T ) = f (T )pS (T ) which implies pS (T ) is invertible and
pS (T )−1 = f (T ). Similarly we see that pT (S) is also invertible.
7.2.2 Using the notation of the previous exercise, we have I = f (T )pS (T ).
Thus, applying R ◦ S = T ◦ R, we get
Section 7.3
T(u) = a_1 λ_1 u_1 + · · · + a_n λ_n u_n,
· · · · · · · · · · · · · · · · · · · · · · · ·
T^{n−1}(u) = a_1 λ_1^{n−1} u_1 + · · · + a_n λ_n^{n−1} u_n.
we obtain
a_1 (x_1 + λ_1 x_2 + · · · + λ_1^{n−1} x_n) = 0,
a_2 (x_1 + λ_2 x_2 + · · · + λ_2^{n−1} x_n) = 0,
· · · · · · · · · · · · · · · · · · · · · · · ·                 (S25)
a_n (x_1 + λ_n x_2 + · · · + λ_n^{n−1} x_n) = 0.
Section 7.4
λ_k^l = λ_k^m. Since A is nonsingular, λ_k ≠ 0. Hence λ_k satisfies
λ_k^{m−l} = 1 as asserted.
7.4.11 Use (3.4.37) without assuming det(A) ≠ 0, or (S16).
7.4.12 Take u = (1, . . . , 1)t ∈ Rn . It is clear that Au = nu. It is also
clear that n(A) = n − 1 since r(A) = 1. So n(A − nIn ) = 1
and A ∼ diag{n, 0, . . . , 0}. Take v = (1, b_2/n, . . . , b_n/n)^t ∈ R^n. Then
Bv = nv. Since r(B) = 1, we get n(B) = n − 1. So n(B − nIn ) = 1.
Consequently, B ∼ diag{n, 0, . . . , 0}. Thus A ∼ B.
7.4.17 Suppose otherwise that there is an A ∈ R(3, 3) such that m(λ) = λ2 +
3λ + 4 is the minimal polynomial of A. Let pA (λ) be the characteristic
polynomial of A. Since pA (λ) ∈ P3 and the coefficients of pA (λ) are
all real, so pA (λ) has a real root. On the other hand, recall that the
roots of m(λ) are all the roots of pA (λ) but the former has no real root.
So we arrive at a contradiction.
7.4.18 (i) Let mA (λ) be the minimal polynomial of A. Then mA (λ)|λ2 + 1.
However, λ2 + 1 is prime over R, so mA (λ) = λ2 + 1.
(ii) Let pA (λ) be the characteristic polynomial of A. Then the degree
of pA (λ) is n. Since mA (λ) = λ2 + 1 contains all the roots of
pA (λ) in C, which are ±i, which must appear in conjugate pairs
because pA (λ) is of real coefficients, so pA (λ) = (λ2 + 1)m for
some integer m ≥ 1. Hence n = 2m.
(iii) Since pA (λ) = (λ − i)m (λ + i)m and mA (λ) = (λ − i)(λ + i) (i.e.,
±i are single roots of mA (λ)), we know that
N (A − iIn ) = {x ∈ Cn | Ax = ix},
N (A + iIn ) = {y ∈ Cn | Ay = −iy},
wi = ui + ivi , ui , vi ∈ Rn , i = 1, . . . , m. (S26)
∑_{i=1}^{m} a_i u_i + ∑_{i=1}^{m} b_i v_i = 0,   a_1, . . . , a_m, b_1, . . . , b_m ∈ R.   (S27)
Section 8.1
Section 8.2
Section 8.3
Section 8.4
8.4.5 It is clear that all the entries of A and At are non-negative. It remains
to show that 1 and u = (1, . . . , 1)t ∈ Rn are a pair of eigenvalues and
eigenvectors of both A and At . In fact, applying Ai u = u and Ati u = u
(i = 1, . . . , k) consecutively, we obtain Au = A1 · · · Ak u = u and
At u = Atk · · · At1 u = u, respectively.
Section 9.3
Bibliographic notes
We end the book by mentioning a few important but more specialized subjects
that are not touched on here. We point out only some relevant references
for the interested reader.
Convex sets. In Lang [23] basic properties and characterizations of convex
sets in Rn are presented. For a deeper study of convex sets using advanced
tools such as the Hahn–Banach theorem see Lax [25].
Tensor products and alternating forms. These topics are covered elegantly
by Halmos [18]. In particular, there, the determinant is seen to arise as
the unique scalar, associated with each linear mapping, defined by the one-
dimensional space of top-degree alternating forms, over a finite-dimensional
vector space.
Minmax principle for computing the eigenvalues of self-adjoint mappings.
This is a classical variational resolution of the eigenvalue problem known
as the method of the Rayleigh–Ritz quotients. For a thorough treatment see
Bellman [6], Lancaster and Tismenetsky [22], and Lax [25].
Calculus of matrix-valued functions. These techniques are useful and pow-
erful in applications. For an introduction see Lax [25].
Irreducible matrices. Such a notion is crucial for extending the Perron–
Frobenius theorem and for exploring the Markov matrices further under more
relaxed conditions. See Berman and Plemmons [7], Horn and Johnson [21],
Lancaster and Tismenetsky [22], Meyer [29], and Xu [38] for related studies.
Transformation groups and bilinear forms. Given a non-degenerate bilin-
ear form over a finite-dimensional vector space, the set of all linear mappings
on the space which preserve the bilinear form is a group under the operation
of composition. With a specific choice of the bilinear form, a particular such
transformation group may thus be constructed and investigated. For a concise
introduction to this subject in the context of linear algebra see Hoffman and
Kunze [19].