A Concise Text On Advanced Linear Algebra


A Concise Text on Advanced Linear Algebra

This engaging textbook for advanced undergraduate students and beginning graduates
covers the core subjects in linear algebra. The author motivates the concepts by
drawing clear links to applications and other important areas.
The book places particular emphasis on integrating ideas from analysis wherever
appropriate and features many novelties in its presentation. For example, the notion of
determinant is shown to appear from calculating the index of a vector field which leads
to a self-contained proof of the Fundamental Theorem of Algebra; the
Cayley–Hamilton theorem is established by recognizing the fact that the set of
complex matrices of distinct eigenvalues is dense; the existence of a real eigenvalue of
a self-adjoint map is deduced by the method of calculus; the construction of the Jordan
decomposition is seen to boil down to understanding nilpotent maps of degree two;
and a lucid and elementary introduction to quantum mechanics based on linear algebra
is given.
The material is supplemented by a rich collection of over 350 mostly proof-oriented
exercises, suitable for readers from a wide variety of backgrounds. Selected solutions
are provided at the back of the book, making it ideal for self-study as well as for use as
a course text.

A Concise Text on
Advanced Linear Algebra

YISONG YANG
Polytechnic School of Engineering, New York University

University Printing House, Cambridge CB2 8BS, United Kingdom

Cambridge University Press is part of the University of Cambridge.


It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107087514

© Yisong Yang 2015
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2015
Printed in the United Kingdom by Clays, St Ives plc
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data
Yang, Yisong.
A concise text on advanced linear algebra / Yisong Yang, Polytechnic School
of Engineering, New York University.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-08751-4 (Hardback) – ISBN 978-1-107-45681-5 (Paperback)
1. Algebras, Linear–Textbooks. 2. Algebras, Linear–Study and teaching (Higher) .
3. Algebras, Linear–Study and teaching (Graduate). I. Title.
II. Title: Advanced linear algebra.
QA184.2.Y36 2015
512.5–dc23 2014028951
ISBN 978-1-107-08751-4 Hardback
ISBN 978-1-107-45681-5 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy
of URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.

For Sheng,
Peter, Anna, and Julia

Contents

Preface page ix
Notation and convention xiii

1 Vector spaces 1
1.1 Vector spaces 1
1.2 Subspaces, span, and linear dependence 8
1.3 Bases, dimensionality, and coordinates 13
1.4 Dual spaces 16
1.5 Constructions of vector spaces 20
1.6 Quotient spaces 25
1.7 Normed spaces 28
2 Linear mappings 34
2.1 Linear mappings 34
2.2 Change of basis 45
2.3 Adjoint mappings 50
2.4 Quotient mappings 53
2.5 Linear mappings from a vector space into itself 55
2.6 Norms of linear mappings 70
3 Determinants 78
3.1 Motivational examples 78
3.2 Definition and properties of determinants 88
3.3 Adjugate matrices and Cramer’s rule 102
3.4 Characteristic polynomials and Cayley–Hamilton
theorem 107
4 Scalar products 115
4.1 Scalar products and basic properties 115


4.2 Non-degenerate scalar products 120


4.3 Positive definite scalar products 127
4.4 Orthogonal resolutions of vectors 137
4.5 Orthogonal and unitary versus isometric mappings 142
5 Real quadratic forms and self-adjoint mappings 147
5.1 Bilinear and quadratic forms 147
5.2 Self-adjoint mappings 151
5.3 Positive definite quadratic forms, mappings, and matrices 157
5.4 Alternative characterizations of positive definite matrices 164
5.5 Commutativity of self-adjoint mappings 170
5.6 Mappings between two spaces 172
6 Complex quadratic forms and self-adjoint mappings 180
6.1 Complex sesquilinear and associated quadratic forms 180
6.2 Complex self-adjoint mappings 184
6.3 Positive definiteness 188
6.4 Commutative self-adjoint mappings and consequences 194
6.5 Mappings between two spaces via self-adjoint mappings 199
7 Jordan decomposition 205
7.1 Some useful facts about polynomials 205
7.2 Invariant subspaces of linear mappings 208
7.3 Generalized eigenspaces as invariant subspaces 211
7.4 Jordan decomposition theorem 218
8 Selected topics 226
8.1 Schur decomposition 226
8.2 Classification of skewsymmetric bilinear forms 230
8.3 Perron–Frobenius theorem for positive matrices 237
8.4 Markov matrices 242
9 Excursion: Quantum mechanics in a nutshell 248
9.1 Vectors in Cn and Dirac bracket 248
9.2 Quantum mechanical postulates 252
9.3 Non-commutativity and uncertainty principle 257
9.4 Heisenberg picture for quantum mechanics 262

Solutions to selected exercises 267


Bibliographic notes 311
References 313
Index 315

Preface

This book is concisely written to provide comprehensive core materials for


a year-long course in Linear Algebra for senior undergraduate and beginning
graduate students in mathematics, science, and engineering. Students who gain
profound understanding and grasp of the concepts and methods of this course
will acquire an essential knowledge foundation to excel in their future aca-
demic endeavors.
Throughout the book, methods and ideas of analysis are greatly emphasized
and used, along with those of algebra, wherever appropriate, and a delicate
balance is cast between abstract formulation and practical origins of various
subject matters.
The book is divided into nine chapters. The first seven chapters embody a
traditional course curriculum. An outline of the contents of these chapters is
sketched as follows.
In Chapter 1 we cover basic facts and properties of vector spaces. These
include definitions of vector spaces and subspaces, concepts of linear dep-
endence, bases, coordinates, dimensionality, dual spaces and dual bases,
quotient spaces, normed spaces, and the equivalence of the norms of a finite-
dimensional normed space.
In Chapter 2 we cover linear mappings between vector spaces. We start from
the definition of linear mappings and discuss how linear mappings may be con-
cretely represented by matrices with respect to given bases. We then introduce
the notion of adjoint mappings and quotient mappings. Linear mappings from
a vector space into itself comprise a special but important family of mappings
and are given a separate treatment later in this chapter. Topics studied there
include invariance and reducibility, eigenvalues and eigenvectors, projections,
nilpotent mappings, and polynomials of linear mappings. We end the chapter
with a discussion of the concept of the norms of linear mappings and use it
to show that being invertible is a generic property of a linear mapping and


then to show how the exponential of a linear mapping may be constructed and
understood.
In Chapter 3 we cover determinants. As a non-traditional but highly
motivating example, we show that the calculation of the topological degree
of a differentiable map from a closed curve into the unit circle in R2 involves
computing a two-by-two determinant, and the knowledge gained allows us to
prove the Fundamental Theorem of Algebra. We then formulate the definition
of a general determinant inductively, without resorting to the notion of permu-
tations, and establish all its properties. We end the chapter by establishing the
Cayley–Hamilton theorem. Two independent proofs of this important theorem
are given. The first proof is analytic and consists of two steps. In the first step,
we show that the theorem is valid for a matrix of distinct eigenvalues. In the
second step, we show that any matrix may be regarded as a limiting point of a
sequence of matrices of distinct eigenvalues. Hence the theorem follows again
by taking the limit. The second proof, on the other hand, is purely algebraic.
In Chapter 4 we discuss vector spaces with scalar products. We start from the
most general notion of scalar products without requiring either non-degeneracy
or positive definiteness. We then carry out detailed studies on non-degenerate
and positive definite scalar products, respectively, and elaborate on adjoint
mappings in terms of scalar products. We end the chapter with a discussion
of isometric mappings in both real and complex space settings and noting their
subtle differences.
In Chapter 5 we focus on real vector spaces with positive definite scalar
products and quadratic forms. We first establish the main spectral theorem for
self-adjoint mappings. We will not take the traditional path of first using the
Fundamental Theorem of Algebra to assert that there is an eigenvalue and then
applying the self-adjointness to show that the eigenvalue must be real. Instead
we shall formulate an optimization problem and use calculus to prove directly
that a self-adjoint mapping must have a real eigenvalue. We then present a
series of characteristic conditions for a symmetric bilinear form, a symmetric
matrix, or a self-adjoint mapping, to be positive definite. We end the chapter
by a discussion of the commutativity of self-adjoint mappings and the useful-
ness of self-adjoint mappings for the investigation of linear mappings between
different spaces.
In Chapter 6 we study complex vector spaces with Hermitian scalar products
and related notions. Much of the theory here is parallel to that of the real space
situation with the exception that normal mappings can only be fully understood
and appreciated within a complex space formalism.
In Chapter 7 we establish the Jordan decomposition theorem. We start with
a discussion of some basic facts regarding polynomials. We next show how
to reduce a linear mapping over its generalized eigenspaces via the Cayley–
Hamilton theorem and the prime factorization of the characteristic polynomial
of the mapping. We then prove the Jordan decomposition theorem. The key
and often the most difficult step in this construction is a full understanding
of how a nilpotent mapping is reduced canonically. We approach this problem
inductively with the degree of a nilpotent mapping and show that it is crucial to
tackle a mapping of degree 2. Such a treatment eases the subtlety of the subject
considerably.
In Chapter 8 we present four selected topics that may be used as materi-
als for some optional extra-curricular study when time and interest permit. In
the first section we present the Schur decomposition theorem, which may be
viewed as a complement to the Jordan decomposition theorem. In the second
section we give a classification of skewsymmetric bilinear forms. In the third
section we state and prove the Perron–Frobenius theorem regarding the prin-
cipal eigenvalues of positive matrices. In the fourth section we establish some
basic properties of the Markov matrices.
In Chapter 9 we present yet another selected topic for the purpose of
optional extra-curricular study: a short excursion into quantum mechanics
using gadgets purely from linear algebra. Specifically we will use Cn as the
state space and Hermitian matrices as quantum mechanical observables to for-
mulate the over-simplified quantum mechanical postulates including Bohr’s
statistical interpretation of quantum mechanics and the Schrödinger equation
governing the time evolution of a state. We next establish Heisenberg’s uncer-
tainty principle. Then we prove the equivalence of the Schrödinger description
via the Schrödinger equation and the Heisenberg description via the Heisen-
berg equation of quantum mechanics.
Also provided in the book is a rich collection of mostly proof-oriented
exercises to supplement and consolidate the main course materials. The
diversity and elasticity of these exercises aim to satisfy the needs and inter-
ests of students from a wide variety of backgrounds.
At the end of the book, solutions to some selected exercises are presented.
These exercises and solutions provide additional illustrative examples, extend
main course materials, and render convenience for the reader to master the
subjects and methods covered in a broader range.
Finally some bibliographic notes conclude the book.
This text may be curtailed to meet the time constraint of a semester-long
course. Here is a suggested list of selected sections for such a plan: Sec-
tions 1.1–1.5, 2.1–2.3, 2.5, 3.1.2, 3.2, and 3.3 (present the concept of adjugate
matrices only), Section 3.4 (give the second proof of the Cayley–Hamilton the-
orem only, based on an adjugate matrix expansion), Sections 4.3, 4.4, 5.1, 5.2

(omit the analytic proof that a self-adjoint mapping must have an eigenvalue
but resort to Exercise 5.2.1 instead), Sections 5.3, 6.1, 6.2, 6.3.1, and 7.1–7.4.
Depending on the pace of lectures and time available, the instructor may
decide in the later stage of the course to what extent the topics in Sections
7.1–7.4 (the Jordan decomposition) can be presented productively.
The author would like to take this opportunity to thank Patrick Lin, Thomas
Otway, and Robert Sibner for constructive comments and suggestions, and
Roger Astley of Cambridge University Press for valuable editorial advice,
which helped improve the presentation of the book.

West Windsor, New Jersey Yisong Yang



Notation and convention

We use N to denote the set of all natural numbers,


N = {0, 1, 2, . . . },
and Z the set of all integers,
Z = {. . . , −2, −1, 0, 1, 2, . . . }.

We use i to denote the imaginary unit √−1. For a complex number c =
a + ib where a, b are real numbers we use
c̄ = a − ib
to denote the complex conjugate of c. We use ℜ{c} and ℑ{c} to denote the real
and imaginary parts of the complex number c = a + ib. That is,
ℜ{c} = a, ℑ{c} = b.
We use i, j, k, l, m, n to denote integer-valued indices or space dimen-
sion numbers, a, b, c scalars, u, v, w, x, y, z vectors, A, B, C, D matrices,
P , R, S, T mappings, and U, V , W, X, Y, Z vector spaces, unless otherwise
stated.
We use t to denote the variable in a polynomial or a function or the transpose
operation on a vector or a matrix.
When X or Y is given, we use X ≡ Y to denote that Y , or X, is defined to
be X, or Y , respectively.
Occasionally, we use the symbol ∀ to express ‘for all’.
Let X be a set and Y, Z subsets of X. We use Y \ Z to denote the subset of
elements in Y which are not in Z.


1
Vector spaces

In this chapter we study vector spaces and their basic properties and structures.
We start by stating the definition and a discussion of the examples of vector
spaces. We next introduce the notions of subspaces, linear dependence, bases,
coordinates, and dimensionality. We then consider dual spaces, direct sums,
and quotient spaces. Finally we cover normed vector spaces.

1.1 Vector spaces


A vector space is a non-empty set consisting of elements called vectors which
can be added and multiplied by some quantities called scalars. In this section,
we start with a study of vector spaces.

1.1.1 Fields
The scalars to operate on vectors in a vector space are required to form a field,
which may be denoted by F, where two operations, usually called addition,
denoted by ‘+’, and multiplication, denoted by ‘·’ or omitted, over F are per-
formed between scalars, such that the following axioms are satisfied.
(1) (Closure) If a, b ∈ F, then a + b ∈ F and ab ∈ F.
(2) (Commutativity) For a, b ∈ F, there hold a + b = b + a and ab = ba.
(3) (Associativity) For a, b, c ∈ F, there hold (a + b) + c = a + (b + c) and
a(bc) = (ab)c.
(4) (Distributivity) For a, b, c ∈ F, there holds a(b + c) = ab + ac.
(5) (Existence of zero) There is a scalar, called zero, denoted by 0, such that
a + 0 = a for any a ∈ F.
(6) (Existence of unity) There is a scalar different from zero, called one,
denoted by 1, such that 1a = a for any a ∈ F.


(7) (Existence of additive inverse) For any a ∈ F, there is a scalar, denoted by
−a or (−a), such that a + (−a) = 0.
(8) (Existence of multiplicative inverse) For any a ∈ F \ {0}, there is a scalar,
denoted by a−1 , such that aa−1 = 1.

It is easily seen that zero, unity, additive and multiplicative inverses are all
unique. Besides, a field consists of at least two elements.
With the usual addition and multiplication, the sets of rational numbers, real
numbers, and complex numbers, denoted by Q, R, and C, respectively, are all
fields. These fields are infinite fields. However, the set of integers, Z, is not a
field because there is a lack of multiplicative inverses for its non-unit elements.
Let p be a prime (p = 2, 3, 5, . . . ) and set pZ = {n ∈ Z | n = kp, k ∈ Z}.
Classify Z into the so-called cosets modulo pZ, that is, some non-overlapping
subsets of Z represented as [i] (i ∈ Z) such that

[i] = {j ∈ Z | i − j ∈ pZ}. (1.1.1)

It is clear that Z is divided into exactly p cosets, [0], [1], . . . , [p − 1]. Use
Zp to denote the set of these cosets and pass the additive and multiplicative
operations in Z over naturally to the elements in Zp so that

[i] + [j ] = [i + j ], [i][j ] = [ij ]. (1.1.2)

It can be verified that, with these operations, Zp becomes a field with its obvi-
ous zero and unit elements, [0] and [1]. Of course, p[1] = [1] + · · · + [1] (p
terms) = [p] = [0]. In fact, p is the smallest positive integer whose multiplication
with the unit element results in the zero element. A number with such a property
is called the characteristic of the field. Thus, Zp is a field of characteristic p.
For Q, R, and C, since no such integer exists, we say that these fields are of
characteristic 0.
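The coset arithmetic (1.1.2) is easy to experiment with on a computer. The following minimal Python sketch (an illustration only, with p = 5 chosen arbitrarily) represents each coset [i] by its representative i ∈ {0, 1, . . . , p − 1}; the brute-force inverse search merely mirrors the defining property of a multiplicative inverse.

```python
# Minimal sketch of the field Z_p (here p = 5); cosets [i] are represented
# by their representatives 0, 1, ..., p - 1.
p = 5

def add(i, j):
    return (i + j) % p      # [i] + [j] = [i + j]

def mul(i, j):
    return (i * j) % p      # [i][j] = [ij]

def inverse(i):
    # For [i] != [0] a multiplicative inverse exists since p is prime;
    # here it is found by brute force.
    if i % p == 0:
        raise ValueError("[0] has no multiplicative inverse")
    return next(j for j in range(1, p) if mul(i, j) == 1)

print(add(3, 4), mul(3, 4), inverse(4))   # 2 2 4
print(sum([1] * p) % p)                   # 0: adding [1] to itself p times gives [0]
```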

1.1.2 Vector spaces


Let F be a field. Consider the set of n-tuples, denoted by Fn , with elements
called vectors arranged in row or column forms such as
\[
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad\text{or}\quad (a_1, \dots, a_n), \quad\text{where } a_1, \dots, a_n \in \mathbb{F}. \tag{1.1.3}
\]
Furthermore, we can define the addition of two vectors and the scalar multipli-
cation of a vector by a scalar following the rules such as

\[
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{pmatrix}, \tag{1.1.4}
\]
\[
\alpha \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} \alpha a_1 \\ \vdots \\ \alpha a_n \end{pmatrix} \quad\text{where } \alpha \in \mathbb{F}. \tag{1.1.5}
\]
The set Fn , modeled over the field F and equipped with the above operations,
is a prototype example of a vector space.
More generally, we say that a set U is a vector space over a field F if U is
non-empty and there is an operation called addition, denoted by ‘+’, between
the elements of U , called vectors, and another operation called scalar mul-
tiplication between elements in F, called scalars, and vectors, such that the
following axioms hold.
(1) (Closure) For u, v ∈ U , we have u + v ∈ U . For u ∈ U and a ∈ F, we
have au ∈ U .
(2) (Commutativity) For u, v ∈ U , we have u + v = v + u.
(3) (Associativity of addition) For u, v, w ∈ U , we have u + (v + w) =
(u + v) + w.
(4) (Existence of zero vector) There is a vector, called zero and denoted by 0,
such that u + 0 = u for any u ∈ U .
(5) (Existence of additive inverse) For any u ∈ U , there is a vector, denoted
as (−u), such that u + (−u) = 0.
(6) (Associativity of scalar multiplication) For any a, b ∈ F and u ∈ U , we
have a(bu) = (ab)u.
(7) (Property of unit scalar) For any u ∈ U , we have 1u = u.
(8) (Distributivity) For any a, b ∈ F and u, v ∈ U , we have (a+b)u = au+bu
and a(u + v) = au + av.
As in the case of the definition of a field, we see that it readily follows from
the definition that zero vector and additive inverse vectors are all unique in
a vector space. Besides, any vector multiplied by zero scalar results in zero
vector. That is, 0u = 0 for any u ∈ U .
Other examples of vector spaces (with obviously defined vector addition and
scalar multiplication) include the following.
(1) The set of all polynomials with coefficients in F defined by
P = {a0 + a1 t + · · · + an t n | a0 , a1 , . . . , an ∈ F, n ∈ N}, (1.1.6)

where t is a variable parameter.


(2) The set of all real-valued continuous functions over the interval [a, b] for
a, b ∈ R and a < b usually denoted by C[a, b].
(3) The set of real-valued solutions to the differential equation

\[
a_n \frac{\mathrm{d}^n x}{\mathrm{d}t^n} + \cdots + a_1 \frac{\mathrm{d}x}{\mathrm{d}t} + a_0 x = 0, \quad a_0, a_1, \dots, a_n \in \mathbb{R}. \tag{1.1.7}
\]

(4) In addition, we can also consider the set of arrays of scalars in F consisting
of m rows of vectors in Fn or n columns of vectors in Fm of the form
\[
(a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}, \tag{1.1.8}
\]

where aij ∈ F, i = 1, . . . , m, j = 1, . . . , n, called an m by n or m × n


matrix and each aij is called an entry or component of the matrix. The set
of all m × n matrices with entries in F may be denoted by F(m, n). In
particular, F(m, 1) or F(1, n) is simply Fm or Fn . Elements in F(n, n) are
also called square matrices.

1.1.3 Matrices
Here we consider some of the simplest manipulations on, and properties of,
matrices.
Let A be the matrix given in (1.1.8). Then At , called the transpose of A, is
defined to be
\[
A^t = \begin{pmatrix} a_{11} & a_{21} & \dots & a_{m1} \\ a_{12} & a_{22} & \dots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \dots & a_{mn} \end{pmatrix}. \tag{1.1.9}
\]

Of course, At ∈ F(n, m). Simply put, At is a matrix obtained from taking the
row (column) vectors of A to be its corresponding column (row) vectors.
For A ∈ F(n, n), we say that A is symmetric if A = At , or skew-symmetric
or anti-symmetric if At = −A. The sets of symmetric and anti-symmetric
matrices are denoted by FS (n, n) and FA (n, n), respectively.
It is clear that (At )t = A.

It will now be useful to introduce the notion of dot product. For any two
vectors u = (a1 , . . . , an ) and v = (b1 , . . . , bn ) in Fn , their dot product u·v ∈ F
is defined to be

u · v = a1 b1 + · · · + an bn . (1.1.10)

The following properties of dot product can be directly examined.

(1) (Commutativity) u · v = v · u for any u, v ∈ Fn .


(2) (Associativity and homogeneity) u · (av + bw) = a(u · v) + b(u · w) for
any u, v, w ∈ Fn and a, b ∈ F.

With the notion of dot product, we can define the product of two matrices
A ∈ F(m, k) and B ∈ F(k, n) by

C = (cij ) = AB, i = 1, . . . , m, j = 1, . . . , n, (1.1.11)

where cij is the dot product of the ith row of A and the j th column of B. Thus
AB ∈ F(m, n).
Alternatively, if we use u, v to denote column vectors in Fn , then

u · v = ut v. (1.1.12)

That is, the dot product of u and v may be viewed as a matrix product of the
1 × n matrix ut and n × 1 matrix v as well.
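As a small illustrative sketch in Python (the matrices below are arbitrary), the (i, j ) entry of AB in (1.1.11) is computed as the dot product (1.1.10) of the ith row of A with the j th column of B.

```python
# Sketch: building the matrix product entrywise from dot products.
def dot(u, v):
    # u . v = a1*b1 + ... + an*bn, as in (1.1.10)
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    # A is m x k, B is k x n; C = AB is m x n, as in (1.1.11)
    cols = list(zip(*B))                     # columns of B
    return [[dot(row, col) for col in cols] for row in A]

A = [[1, 2, 0],
     [3, 1, 4]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 5]]             # 3 x 2
print(matmul(A, B))      # [[5, 2], [5, 21]]
```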
Matrix product (or matrix multiplication) enjoys the following properties.

(1) (Associativity of scalar multiplication) a(AB) = (aA)B = A(aB) for


any a ∈ F and any A ∈ F(m, k), B ∈ F(k, n).
(2) (Distributivity) A(B + C) = AB + AC for any A ∈ F(m, k) and B, C ∈
F(k, n); (A + B)C = AC + BC for any A, B ∈ F(m, k) and C ∈ F(k, n).
(3) (Associativity) A(BC) = (AB)C for any A ∈ F(m, k), B ∈ F(k, l),
C ∈ F(l, n).

Alternatively, if we express A ∈ F(m, k) and B ∈ F(k, n) as made of m row


vectors and n column vectors, respectively, rewritten as


\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix}, \qquad B = (B_1, \dots, B_n), \tag{1.1.13}
\]

then, formally, we have


\[
AB = \begin{pmatrix} A_1 \cdot B_1 & A_1 \cdot B_2 & \cdots & A_1 \cdot B_n \\ A_2 \cdot B_1 & A_2 \cdot B_2 & \cdots & A_2 \cdot B_n \\ \vdots & \vdots & & \vdots \\ A_m \cdot B_1 & A_m \cdot B_2 & \cdots & A_m \cdot B_n \end{pmatrix} = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix} (B_1, \dots, B_n) = \begin{pmatrix} A_1 B_1 & A_1 B_2 & \cdots & A_1 B_n \\ A_2 B_1 & A_2 B_2 & \cdots & A_2 B_n \\ \vdots & \vdots & & \vdots \\ A_m B_1 & A_m B_2 & \cdots & A_m B_n \end{pmatrix}, \tag{1.1.14}
\]

which suggests that matrix multiplication may be carried out with legitimate
multiplications executed over appropriate matrix blocks.
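A quick numerical check of this observation, as a sketch using NumPy (the sizes and the splitting into blocks are chosen arbitrarily): multiplying block by block reproduces the full product AB.

```python
# Sketch: matrix multiplication carried out over matrix blocks.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 6))
B = rng.integers(-3, 4, size=(6, 5))

A1, A2 = A[:2, :], A[2:, :]          # split A into two row blocks
B1, B2 = B[:, :3], B[:, 3:]          # split B into two column blocks

blockwise = np.block([[A1 @ B1, A1 @ B2],
                      [A2 @ B1, A2 @ B2]])

print(np.array_equal(blockwise, A @ B))   # True
```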
If A ∈ F(m, k) and B ∈ F(k, n), then At ∈ F(k, m) and B t ∈ F(n, k) so
that B t At ∈ F(n, m). Regarding how AB and B t At are related, here is the
conclusion.

Theorem 1.1 For A ∈ F(m, k) and B ∈ F(k, n), there holds

(AB)t = B t At . (1.1.15)

The proof of this basic fact is assigned as an exercise.


Other matrices in F(n, n) having interesting properties include the
following.

(1) Diagonal matrices are of the form A = (aij ) with aij = 0 whenever
i ≠ j . The set of diagonal matrices is denoted as FD (n, n).
(2) Lower triangular matrices are of the form A = (aij ) with aij = 0 when-
ever j > i. The set of lower triangular matrices is denoted as FL (n, n).
(3) Upper triangular matrices are of the form A = (aij ) with aij = 0 when-
ever i > j . The set of upper triangular matrices is denoted as FU (n, n).

There is a special element in F(n, n), called the identity matrix, or unit
matrix, and denoted by In , or simply I , which is a diagonal matrix whose
diagonal entries are all 1 (unit scalar) and off-diagonal entries are all 0. For
any A ∈ F(n, n), we have AI = I A = A.

Definition 1.2 A matrix A ∈ F(n, n) is called invertible or nonsingular if there


is some B ∈ F(n, n) such that

AB = BA = I. (1.1.16)

In this situation, B is unique (cf. Exercise 1.1.7) and called the inverse of A
and denoted by A−1 .

If A, B ∈ F(n, n) are such that AB = I , then we say that A is a left inverse


of B and B a right inverse of A. It can be shown that a left or right inverse is
simply the inverse. In other words, if A is a left inverse of B, then both A and
B are invertible and the inverses of each other.
If A ∈ R(n, n) enjoys the property AAt = At A = I , then A is called an
orthogonal matrix. For A = (aij ) ∈ C(m, n), we adopt the notation Ā = (āij )
for taking the complex conjugate of A and use A† to denote taking the complex
conjugate of the transpose of A, A† = Āt , which is also commonly referred
to as taking the Hermitian conjugate of A. If A ∈ C(n, n), we say that A is
Hermitian symmetric, or simply Hermitian, if A† = A, and skew-Hermitian
or anti-Hermitian, if A† = −A. If A ∈ C(n, n) enjoys the property AA† =
A† A = I , then A is called a unitary matrix. We will see the importance of
these notions later.

Exercises

1.1.1 Show that it follows from the definition of a field that zero, unit, additive,
and multiplicative inverse scalars are all unique.
1.1.2 Let p ∈ N be a prime and [n] ∈ Zp . Find −[n] and prove the existence
of [n]−1 when [n] ≠ [0]. In Z5 , find −[4] and [4]−1 .
1.1.3 Show that it follows from the definition of a vector space that both zero
and additive inverse vectors are unique.
1.1.4 Prove the associativity of matrix multiplication by showing that
A(BC) = (AB)C for any A ∈ F(m, k), B ∈ F(k, l), C ∈ F(l, n).
1.1.5 Prove Theorem 1.1.
1.1.6 Let A ∈ F(n, n) (n ≥ 2) and rewrite A as

\[
A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}, \tag{1.1.17}
\]
where A1 ∈ F(k, k), A2 ∈ F(k, l), A3 ∈ F(l, k), A4 ∈ F(l, l), k, l ≥ 1,
k + l = n. Show that

\[
A^t = \begin{pmatrix} A_1^t & A_3^t \\ A_2^t & A_4^t \end{pmatrix}. \tag{1.1.18}
\]

1.1.7 Prove that the inverse of an invertible matrix is unique by showing the
fact that if A, B, C ∈ F(n, n) satisfy AB = I and CA = I then B = C.
1.1.8 Let A ∈ C(n, n). Show that A is Hermitian if and only if iA is anti-
Hermitian.

1.2 Subspaces, span, and linear dependence


Let U be a vector space over a field F and V ⊂ U a non-empty subset of U .
We say that V is a subspace of U if V is a vector space over F with the
inherited addition and scalar multiplication from U . It is worth noting that,
when checking whether a subset V of a vector space U becomes a sub-
space, one only needs to verify the closure axiom (1) in the definition of a
vector space since the rest of the axioms follow automatically as a conse-
quence of (1).
The two trivial subspaces of U are those consisting only of zero vector, {0},
and U itself. A nontrivial subspace is also called a proper subspace.
Consider the subset Pn (n ∈ N) of P defined by
Pn = {a0 + a1 t + · · · + an t n | a0 , a1 , . . . , an ∈ F}. (1.2.1)
It is clear that Pn is a subspace of P and Pm is a subspace of Pn when m ≤ n.
Consider the set Sa of all vectors (x1 , . . . , xn ) in Fn satisfying the equation
x1 + · · · + xn = a, (1.2.2)
where a ∈ F. Then Sa is a subspace of Fn if and only if a = 0.
Let u1 , . . . , uk be vectors in U . The linear span of {u1 , . . . , uk }, denoted by
Span{u1 , . . . , uk }, is the subspace of U defined by
Span{u1 , . . . , uk } = {u ∈ U | u = a1 u1 + · · · + ak uk , a1 , . . . , ak ∈ F}.
(1.2.3)
Thus, if u ∈ Span{u1 , . . . , uk }, then there are a1 , . . . , ak ∈ F such that
u = a1 u1 + · · · + ak uk . (1.2.4)
We also say that u is linearly spanned by u1 , . . . , uk or linearly dependent on
u1 , . . . , uk . Therefore, zero vector 0 is linearly dependent on any finite set of
vectors.
If U = Span{u1 , . . . , uk }, we also say that U is generated by the vectors
u1 , . . . , u k .
For Pn defined in (1.2.1), we have Pn = Span{1, t, . . . , t n }. Thus
Pn is generated by 1, t, . . . , t n . Naturally, for two elements p, q in

Pn , say p(t) = a0 + a1 t + · · · + an t n , q(t) = b0 + b1 t + · · · + bn t n , we


identify p and q if and only if all the coefficients of p and q of like powers of
t coincide in F, or, ai = bi for all i = 0, 1, . . . , n.
In Fn , define

e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1).


(1.2.5)

Then Fn = Span{e1 , e2 , . . . , en } and Fn is generated by e1 , e2 , . . . , en .


Thus, for S0 defined in (1.2.2), we have

(x1 , x2 , . . . , xn ) = −(x2 + · · · + xn )e1 + x2 e2 + · · · + xn en


= x2 (e2 − e1 ) + · · · + xn (en − e1 ), (1.2.6)

where x2 , . . . , xn are arbitrarily taken from F. Therefore

S0 = Span{e2 − e1 , . . . , en − e1 }. (1.2.7)

For F(m, n), we define Mij ∈ F(m, n) to be the vector such that all its
entries vanish except that its entry at the position (i, j ) (at the ith row and j th
column) is 1, i = 1, . . . , m, j = 1, . . . , n. We have

F(m, n) = Span{Mij | i = 1, . . . , m, j = 1, . . . , n}. (1.2.8)

The notion of spans can be extended to cover some useful situations. Let U
be a vector space and S be a (finite or infinite) subset of U . Define

Span(S) = the set of linear combinations


of all possible finite subsets of S. (1.2.9)

It is obvious that Span(S) is a subspace of U . If U = Span(S), we say that U


is spanned or generated by the set of vectors S.
As an example, we have

P = Span{1, t, . . . , t n , . . . }. (1.2.10)

Alternatively, we can also express P as

\[
P = \bigcup_{n=0}^{\infty} P_n. \tag{1.2.11}
\]

The above discussion motivates the following formal definition.

Definition 1.3 Let u1 , . . . , um be m vectors in the vector space U over a


field F. We say that these vectors are linearly dependent if one of them may
be written as a linear span of the rest of them or linearly dependent on the rest

of them. Or equivalently, u1 , . . . , um are linearly dependent if there are scalars


a1 , . . . , am ∈ F where (a1 , . . . , am ) ≠ (0, . . . , 0) such that
a1 u1 + · · · + am um = 0. (1.2.12)
Otherwise u1 , . . . , um are called linearly independent. In this latter case, the
only possible vector (a1 , . . . , am ) ∈ Fm to make (1.2.12) fulfilled is the zero
vector, (0, . . . , 0).

To proceed further, we need to consider the following system of linear


equations

\[
\begin{cases} a_{11} x_1 + \cdots + a_{1n} x_n = 0, \\ \qquad\qquad\cdots\cdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n = 0, \end{cases} \tag{1.2.13}
\]
over F with unknowns x1 , . . . , xn .

Theorem 1.4 In the system (1.2.13), if m < n, then the system has a nontrivial
solution (x1 , . . . , xn ) ≠ (0, . . . , 0).

Proof We prove the theorem by using induction on m + n.


The beginning situation is m + n = 3 when m = 1 and n = 2. It is clear that
we always have a nontrivial solution.
Assume that the statement of the theorem is true when m + n ≤ k where
k ≥ 3.
Let m + n = k + 1. If k = 3, the condition m < n implies m = 1, n = 3 and
the existence of a nontrivial solution is obvious. Assume then k ≥ 4. If all the
coefficients of the variable x1 in (1.2.13) are zero, i.e. a11 = · · · = am1 = 0,
then x1 = 1, x2 = · · · = xn = 0 is a nontrivial solution. So we may assume
one of the coefficients of x1 is nonzero. Without loss of generality, we assume
a11 ≠ 0. If m = 1, there is again nothing to show. Assume m ≥ 2. Dividing the
first equation in (1.2.13) by a11 if necessary, we can further assume a11 = 1.
Then, adding the (−ai1 )-multiple of the first equation into the ith equation, in
(1.2.13), for i = 2, . . . , m, we arrive at


\[
\begin{cases} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = 0, \\ \qquad\quad\;\, b_{22} x_2 + \cdots + b_{2n} x_n = 0, \\ \qquad\qquad\qquad\cdots\cdots \\ \qquad\quad\;\, b_{m2} x_2 + \cdots + b_{mn} x_n = 0. \end{cases} \tag{1.2.14}
\]
The system below the first equation in (1.2.14) contains m − 1 equations and
n − 1 unknowns x2 , . . . , xn . Of course, m − 1 < n − 1. So, in view of the

inductive assumption, it has a nontrivial solution. Substituting this nontrivial


solution into the first equation in (1.2.14) to determine the remaining unknown
x1 , we see that the existence of a nontrivial solution to the original system
(1.2.13) follows.
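The elimination used in this proof is constructive, and the following Python sketch (working over Q with the fractions module; the two equations at the end are an arbitrary example) carries it out to produce a nontrivial solution whenever m < n.

```python
# Sketch: a nontrivial solution of a homogeneous system with m < n,
# found by the elimination described in the proof of Theorem 1.4.
from fractions import Fraction

def nontrivial_solution(rows, n):
    A = [[Fraction(a) for a in row] for row in rows]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue                     # no pivot here: c is a free column
        A[r], A[p] = A[p], A[r]
        A[r] = [a / A[r][c] for a in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        pivots.append((r, c))
        r += 1
    free = next(c for c in range(n) if c not in [pc for _, pc in pivots])
    x = [Fraction(0)] * n
    x[free] = Fraction(1)                # set one free unknown to 1 ...
    for i, c in pivots:
        x[c] = -A[i][free]               # ... and read off the pivot unknowns
    return x

# Two equations, four unknowns (m = 2 < n = 4).
print(nontrivial_solution([[1, 2, -1, 3], [2, 4, 1, 0]], 4))
# [Fraction(-2, 1), Fraction(1, 1), Fraction(0, 1), Fraction(0, 1)]
```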

The importance of Theorem 1.4 is seen in the following.

Theorem 1.5 Any set of more than m vectors in Span{u1 , . . . , um } must be


linearly dependent.

Proof Let v1 , . . . , vn ∈ Span{u1 , . . . , um } be n vectors where n > m.


Consider the possible linear dependence relation

x1 v1 + · · · + xn vn = 0, (1.2.15)

for some x1 , . . . , xn ∈ F.
Since each vj ∈ Span{u1 , . . . , um }, j = 1, . . . , n, there are scalars aij ∈ F
(i = 1, . . . , m, j = 1, . . . , n) such that
\[
v_j = \sum_{i=1}^{m} a_{ij} u_i, \quad j = 1, \dots, n. \tag{1.2.16}
\]

Substituting (1.2.16) into (1.2.15), we have
\[
\sum_{j=1}^{n} x_j \left( \sum_{i=1}^{m} a_{ij} u_i \right) = 0. \tag{1.2.17}
\]

Regrouping the terms in the above equation, we arrive at


\[
\sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} x_j \right) u_i = 0, \tag{1.2.18}
\]

which may be fulfilled by setting
\[
\sum_{j=1}^{n} a_{ij} x_j = 0, \quad i = 1, \dots, m. \tag{1.2.19}
\]

This system of equations is exactly the system (1.2.13) which allows a non-
trivial solution in view of Theorem 1.4. Hence the proof follows.

We are now prepared to study in the next section several fundamental prop-
erties of vector spaces.

Exercises

1.2.1 Let U1 and U2 be subspaces of a vector space U . Show that U1 ∪ U2 is


a subspace of U if and only if U1 ⊂ U2 or U2 ⊂ U1 .
1.2.2 Let Pn denote the vector space of the polynomials of degrees up to n
over a field F expressed in terms of a variable t. Show that the vectors
1, t, . . . , t n in Pn are linearly independent.
1.2.3 Show that the vectors in Fn defined in (1.2.5) are linearly independent.
1.2.4 Show that S0 defined in (1.2.2) may also be expressed as

S0 = Span{e1 − en , . . . , en−1 − en }, (1.2.20)

and deduce that, in Rn , the vectors
\[
\left( 1, \frac{1}{2}, \dots, \frac{1}{2^{n-2}}, \, 2\left( \frac{1}{2^{n-1}} - 1 \right) \right), \quad e_1 - e_n, \ \dots, \ e_{n-1} - e_n, \tag{1.2.21}
\]

are linearly dependent (n ≥ 4).


1.2.5 Show that FS (n, n), FA (n, n), FD (n, n), FL (n, n), and FU (n, n) are all
subspaces of F(n, n).
1.2.6 Let u1 , . . . , un (n ≥ 2) be linearly independent vectors in a vector
space U and set

vi−1 = ui−1 + ui , i = 2, . . . , n; vn = un + u1 . (1.2.22)

Investigate whether v1 , . . . , vn are linearly independent as well.


1.2.7 Let F be a field. For any two vectors u = (a1 , . . . , an ) and v =
(b1 , . . . , bn ) in Fn (n ≥ 2), viewed as matrices, we see that the ma-
trix product A = ut v lies in F(n, n). Prove that any two row vectors of
A are linearly dependent. What happens to the column vectors of A?
1.2.8 Consider a slightly strengthened version of the second part of Exercise
1.2.1 above: Let U1 , U2 be subspaces of a vector space U , U1 ≠ U ,
U2 ≠ U . Show without using Exercise 1.2.1 that there exists a vec-
tor in U which lies outside U1 ∪ U2 . Explain how you may apply the
conclusion of this exercise to prove that of Exercise 1.2.1.
1.2.9 (A challenging extension of the previous exercise) Let U1 , . . . , Uk be k
subspaces of a vector space U over a field of characteristic 0. If Ui ≠ U
for i = 1, . . . , k, show that there is a vector in U which lies outside
U1 ∪ · · · ∪ Uk .
1.2.10 For A ∈ F(m, n) and B ∈ F(n, m) with m > n show that AB as an
element in F(m, m) can never be invertible.

1.3 Bases, dimensionality, and coordinates


Let U be a vector space over a field F, take u1 , . . . , un ∈ U , and set
V = Span{u1 , . . . , un }. Eliminating linearly dependent vectors from the set
{u1 , . . . , un } if necessary, we can certainly assume that the vectors u1 , . . . , un
are already made linearly independent. Thus, any vector u ∈ V may take
the form
u = a1 u1 + · · · + an un , a1 , . . . , an ∈ F. (1.3.1)
It is not hard to see that the coefficients a1 , . . . , an in the above representation
must be unique. In fact, if we also have
u = b1 u1 + · · · + bn un , b1 , . . . , bn ∈ F, (1.3.2)
then, combining the above two relations, we have (a1 − b1 )u1 + · · · +
(an − bn )un = 0. Since u1 , . . . , un are linearly independent, we obtain
a1 = b1 , . . . , an = bn and the uniqueness follows.
Furthermore, if there is another set of vectors v1 , . . . , vm in U such that
Span{v1 , . . . , vm } = Span{u1 , . . . , un }, (1.3.3)
then m ≥ n in view of Theorem 1.5. As a consequence, if v1 , . . . , vm are also
linearly independent, then m = n. This observation leads to the following.

Definition 1.6 If there are linearly independent vectors u1 , . . . , un ∈ U such


that U = Span{u1 , . . . , un }, then U is said to be finitely generated and the set
of vectors {u1 , . . . , un } is called a basis of U . The number of vectors in any
basis of a finitely generated vector space, n, is independent of the choice of the
basis and is referred to as the dimension of the finitely generated vector space,
written as dim(U ) = n. A finitely generated vector space is also said to be
of finite dimensionality or finite dimensional. If a vector space U is not finite
dimensional, it is said to be infinite dimensional, also written as dim(U ) = ∞.

As an example of an infinite-dimensional vector space, we show that when


R is regarded as a vector space over Q, then dim(R) = ∞. In fact, recall
that a real number is called an algebraic number if it is the zero of a polyno-
mial with coefficients in Q. We also know that there are many non-algebraic
numbers in R, called transcendental numbers. Let τ be such a transcenden-
tal number. Then for any n = 1, 2, . . . the numbers 1, τ, τ 2 , . . . , τ n are lin-
early independent in the vector space R over the field Q. Indeed, if there are
r0 , r1 , r2 , . . . , rn ∈ Q so that
r0 + r1 τ + r2 τ 2 + · · · + rn τ n = 0, (1.3.4)

and at least one number among r0 , r1 , r2 , . . . , rn is nonzero, then τ is the zero


of the nontrivial polynomial

p(t) = r0 + r1 t + r2 t 2 + · · · + rn t n , (1.3.5)

which violates the assumption that τ is transcendental. Thus R is infinite


dimensional over Q.
The following theorem indicates that it is fairly easy to construct a basis for
a finite-dimensional vector space.

Theorem 1.7 Let U be an n-dimensional vector space over a field F. Any n


linearly independent vectors in U form a basis of U .

Proof Let u1 , . . . , un ∈ U be linearly independent vectors. We only need to


show that they span U . In fact, take any u ∈ U . We know that u1 , . . . , un , u
are linearly dependent. So there is a nonzero vector (a1 , . . . , an , a) ∈ Fn+1
such that

a1 u1 + · · · + an un + au = 0. (1.3.6)

Of course, a ≠ 0, otherwise it contradicts the assumption that u1 , . . . , un
are linearly independent. So u = (−a−1 )(a1 u1 + · · · + an un ). Thus u ∈
Span{u1 , . . . , un }.

Definition 1.8 Let {u1 , . . . , un } be a basis of the vector space U . Given u ∈ U


there are unique scalars a1 , . . . , an ∈ F such that

u = a1 u1 + · · · + an un . (1.3.7)

These scalars, a1 , . . . , an are called the coordinates, and (a1 , . . . , an ) ∈ Fn the


coordinate vector, of the vector u with respect to the basis {u1 , . . . , un }.

It will be interesting to investigate the relation between the coordinate vec-


tors of a vector under different bases.
Let U = {u1 , . . . , un } and V = {v1 , . . . , vn } be two bases of the vector space
U . For u ∈ U , let (a1 , . . . , an ) ∈ Fn and (b1 , . . . , bn ) ∈ Fn be the coordinate
vectors of u with respect to U and V, respectively. Thus

u = a1 u1 + · · · + an un = b1 v1 + · · · + bn vn . (1.3.8)

On the other hand, we have
\[
v_j = \sum_{i=1}^{n} a_{ij} u_i, \quad j = 1, \dots, n. \tag{1.3.9}
\]

The n × n matrix A = (aij ) is called a basis transition matrix or basis change


matrix. Inserting (1.3.9) into (1.3.8), we have
\[
\sum_{i=1}^{n} a_i u_i = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} a_{ij} b_j \right) u_i. \tag{1.3.10}
\]

Hence, by the linear independence of the basis vectors, we have
\[
a_i = \sum_{j=1}^{n} a_{ij} b_j, \quad i = 1, \dots, n. \tag{1.3.11}
\]

Note that the relation (1.3.9) between bases may be formally and conve-
niently expressed in a ‘matrix form’ as

(v1 , . . . , vn ) = (u1 , . . . , un )A, (1.3.12)

or concisely V = U A, or
\[
\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = A^t \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}, \tag{1.3.13}
\]
where multiplications between scalars and vectors are made in a well defined
manner. On the other hand, the relation (1.3.11) between coordinate vectors
may be rewritten as
\[
\begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = A \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \tag{1.3.14}
\]
or

(a1 , . . . , an ) = (b1 , . . . , bn )At . (1.3.15)
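As a numerical sketch of (1.3.12)–(1.3.14) using NumPy (the two bases of R3 below are chosen arbitrarily): storing the basis vectors as columns, the transition matrix A solves U A = V , and the coordinate vectors with respect to the two bases are related by a = Ab.

```python
# Sketch: basis transition matrix and change of coordinates in R^3.
import numpy as np

U = np.array([[1., 0., 1.],      # columns are the basis vectors u_1, u_2, u_3
              [1., 1., 0.],
              [0., 1., 1.]])
V = np.array([[1., 1., 1.],      # columns are the basis vectors v_1, v_2, v_3
              [0., 1., 1.],
              [0., 0., 1.]])

A = np.linalg.solve(U, V)        # (v_1, ..., v_n) = (u_1, ..., u_n) A

u = np.array([2., 3., 5.])       # a sample vector
a = np.linalg.solve(U, u)        # coordinates of u with respect to {u_i}
b = np.linalg.solve(V, u)        # coordinates of u with respect to {v_i}

print(np.allclose(a, A @ b))     # True: a = A b, as in (1.3.14)
```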

Exercises

1.3.1 Let U be a vector space with dim(U ) = n ≥ 2 and V a subspace of U


with a basis {v1 , . . . , vn−1 }. Prove that for any u ∈ U \ V the vectors
u, v1 , . . . , vn−1 form a basis for U .
1.3.2 Show that dim(F(m, n)) = mn.
1.3.3 Determine dim(FS (n, n)), dim(FA (n, n)), and dim(FD (n, n)).
1.3.4 Let P be the vector space of all polynomials with coefficients in a field
F. Show that dim(P) = ∞.

1.3.5 Consider the vector space R3 and the bases U = {e1 , e2 , e3 } and
V = {e1 , e1 + e2 , e1 + e2 + e3 }. Find the basis transition matrix A from
U into V satisfying V = U A. Find the coordinate vectors of the given
vector (1, 2, 3) ∈ R3 with respect to the bases U and V, respectively,
and relate these vectors with the matrix A.
1.3.6 Prove that a basis transition matrix must be invertible.
1.3.7 Let U be an n-dimensional vector space over a field F where n ≥ 2
(say). Consider the following construction.
(i) Take u1 ∈ U \ {0}.
(ii) Take u2 ∈ U \ Span{u1 }.
(iii) Take (if any) u3 ∈ U \ Span{u1 , u2 }.
(iv) In general, take ui ∈ U \ Span{u1 , . . . , ui−1 } (i ≥ 2).
Show that this construction will terminate itself in exactly n steps,
that is, it will not be possible anymore to get un+1 , and that the vectors
u1 , u2 , . . . , un so obtained form a basis of U .

1.4 Dual spaces


Let U be an n-dimensional vector space over a field F. A functional (also called
a form or a 1-form) f over U is a linear function f : U → F satisfying
f (u + v) = f (u) + f (v), u, v ∈ U ; f (au) = af (u), a ∈ F, u ∈ U.
(1.4.1)
Let f, g be two functionals. Then we can define another functional called
the sum of f and g, denoted by f + g, by
(f + g)(u) = f (u) + g(u), u ∈ U. (1.4.2)
Similarly, let f be a functional and a ∈ F. We can define another functional
called the scalar multiple of a with f , denoted by af , by
(af )(u) = af (u), u ∈ U. (1.4.3)
It is a simple exercise to check that these two operations make the set of all
functionals over U a vector space over F. This vector space is called the dual
space of U , denoted by U′ .
Let {u1 , . . . , un } be a basis of U . For any f ∈ U′ and any u = a1 u1 + · · · +
an un ∈ U , we have
\[
f(u) = f\!\left( \sum_{i=1}^{n} a_i u_i \right) = \sum_{i=1}^{n} a_i f(u_i). \tag{1.4.4}
\]

Hence, f is uniquely determined by its values on the basis vectors,

f1 = f (u1 ), . . . , fn = f (un ). (1.4.5)

Conversely, for arbitrarily assigned values f1 , . . . , fn in (1.4.5), we define
\[
f(u) = \sum_{i=1}^{n} a_i f_i \quad\text{for any}\quad u = \sum_{i=1}^{n} a_i u_i \in U. \tag{1.4.6}
\]

It is clear that f is a functional. That is, f ∈ U′ . Of course, such an f satisfies


(1.4.5).
Thus, we have seen a well-defined 1-1 correspondence

U′ ↔ Fn , f ↔ (f1 , . . . , fn ). (1.4.7)

Especially we may use u′1 , . . . , u′n to denote the elements in U′ correspond-
ing to the vectors e1 , . . . , en in Fn given by (1.2.5). Then we have
\[
u_i'(u_j) = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} \qquad i, j = 1, \dots, n. \tag{1.4.8}
\]

It is clear that u′1 , . . . , u′n are linearly independent and span U′ because an
element f of U′ satisfying (1.4.5) is simply given by

f = f1 u′1 + · · · + fn u′n . (1.4.9)

In other words, {u′1 , . . . , u′n } is a basis of U′ , commonly called the dual basis
of U′ with respect to the basis {u1 , . . . , un } of U . In particular, we have seen
that U and U′ are of the same dimensionality.
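For a concrete numerical sketch (using NumPy, with an arbitrarily chosen basis of R3 ): if the basis vectors u1 , u2 , u3 are the columns of an invertible matrix M, then functionals acting by row–vector multiplication with the rows of M −1 satisfy u′i (uj ) = δij as in (1.4.8), so these rows realize the dual basis.

```python
# Sketch: the dual basis of a basis of R^3, realized by the rows of M^{-1}.
import numpy as np

M = np.array([[1., 1., 0.],      # columns are the basis vectors u_1, u_2, u_3
              [0., 1., 1.],
              [1., 0., 1.]])

dual = np.linalg.inv(M)          # row i acts as the functional u'_i
print(np.allclose(dual @ M, np.eye(3)))   # True: u'_i(u_j) = delta_ij
```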
Let U = {u1 , . . . , un } and V = {v1 , . . . , vn } be two bases of the vec-
tor space U . Let their dual bases be denoted by U′ = {u′1 , . . . , u′n } and
V′ = {v′1 , . . . , v′n }, respectively. Suppose that the bases U′ and V′ are related
through
\[
u_j' = \sum_{i=1}^{n} \bar{a}_{ij} v_i', \quad j = 1, \dots, n. \tag{1.4.10}
\]

Using (1.3.9) and (1.4.10) to evaluate u′i (vj ), we obtain
\[
u_i'(v_j) = u_i'\!\left( \sum_{k=1}^{n} a_{kj} u_k \right) = \sum_{k=1}^{n} a_{kj}\, u_i'(u_k) = \sum_{k=1}^{n} a_{kj} \delta_{ik} = a_{ij}, \tag{1.4.11}
\]
\[
u_i'(v_j) = \sum_{k=1}^{n} \bar{a}_{ki}\, v_k'(v_j) = \sum_{k=1}^{n} \bar{a}_{ki} \delta_{kj} = \bar{a}_{ji}, \tag{1.4.12}
\]

which leads to āij = aji (i, j = 1, . . . , n). In other words, we have arrived at


the correspondence relation
\[
u_j' = \sum_{i=1}^{n} a_{ji} v_i', \quad j = 1, \dots, n. \tag{1.4.13}
\]

With matrix notation as before, we have

(u′1 , . . . , u′n ) = (v′1 , . . . , v′n )At , (1.4.14)

or
\[
\begin{pmatrix} u_1' \\ \vdots \\ u_n' \end{pmatrix} = A \begin{pmatrix} v_1' \\ \vdots \\ v_n' \end{pmatrix}. \tag{1.4.15}
\]

Besides, for any u′ ∈ U′ written as

u′ = a1 u′1 + · · · + an u′n = b1 v′1 + · · · + bn v′n , (1.4.16)

the discussion in the previous section and the above immediately allow us
to get
\[
b_i = \sum_{j=1}^{n} a_{ji} a_j, \quad i = 1, \dots, n. \tag{1.4.17}
\]

Thus, in matrix form, we obtain the relation

(b1 , . . . , bn ) = (a1 , . . . , an )A, (1.4.18)

or
\[
\begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = A^t \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}. \tag{1.4.19}
\]
Comparing the above results with those established in the previous section,
we see that, with respect to bases and dual bases, the coordinate vectors in U
and U′ follow ‘opposite’ rules of correspondence. For this reason, coordinate
vectors in U are often called covariant vectors, and those in U′ contravariant
vectors.
Using the relation stated in (1.4.8), we see that we may naturally view
u1 , . . . , un as elements in (U′ )′ = U′′ so that they form a basis of U′′ dual
to {u′1 , . . . , u′n } since
\[
u_i(u_j') \equiv u_j'(u_i) = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} \qquad i, j = 1, \dots, n. \tag{1.4.20}
\]

Thus, for any u′′ ∈ U′′ satisfying u′′ (u′i ) = ai (i = 1, . . . , n), we have

u′′ = a1 u1 + · · · + an un . (1.4.21)

In this way, we see that U′′ may be identified with U . In other words, we
have seen that the identification just made spells out the relationship

U′′ = U, (1.4.22)

which is also referred to as reflectivity of U , or U is said to be reflective.


Notationally, for u ∈ U and u ∈ U  , it is often convenient to rewrite u (u),
which is linear in both the u and u arguments, as

u (u) = u , u. (1.4.23)

Then our identification (1.4.22), made through setting

u (u) = u(u ), u ∈ U, u ∈ U  , (1.4.24)

simply says that the ‘pairing’ ·, · as given in (1.4.23) is symmetric:

u , u = u, u , u ∈ U, u ∈ U  . (1.4.25)

For any non-empty subset S ⊂ U , the annihilator of S, denoted by S 0 , is
the subset of U′ given by

S 0 = {u′ ∈ U′ | ⟨u′ , u⟩ = 0, ∀u ∈ S}. (1.4.26)

It is clear that S 0 is always a subspace of U′ regardless of whether S is a
subspace of U . Likewise, for any nonempty subset S′ ⊂ U′ , we can define the
annihilator of S′ , S′ 0 , as the subset

S′ 0 = {u ∈ U | ⟨u′ , u⟩ = 0, ∀u′ ∈ S′ } (1.4.27)

of U . Of course, S′ 0 is always a subspace of U .

Exercises

1.4.1 Let F be a field. Describe the dual spaces F′ and (F2 )′ .


1.4.2 Let U be a finite-dimensional vector space. Prove that for any vectors
u, v ∈ U (u ≠ v) there exists an element f ∈ U′ such that f (u) ≠ f (v).
1.4.3 Let U be a finite-dimensional vector space and f, g ∈ U′ . For any v ∈
U , f (v) = 0 if and only if g(v) = 0. Show that f and g are linearly
dependent.

1.4.4 Let F be a field and

V = {(x1 , . . . , xn ) ∈ Fn | x1 + · · · + xn = 0}. (1.4.28)

Show that any f ∈ V 0 may be expressed as
\[
f(x_1, \dots, x_n) = c \sum_{i=1}^{n} x_i, \quad (x_1, \dots, x_n) \in \mathbb{F}^n, \tag{1.4.29}
\]

for some c ∈ F.
1.4.5 Let U = P2 and f, g, h ∈ U′ be defined by

f (p) = p(−1), g(p) = p(0), h(p) = p(1), p(t) ∈ P2 . (1.4.30)

(i) Show that B′ = {f, g, h} is a basis for U′ .
(ii) Find a basis B of U which is dual to B′ .
1.4.6 Let U be an n-dimensional vector space and V be an m-dimensional
subspace of U . Show that the annihilator V 0 is an (n − m)-dimensional
subspace of U′ . In other words, there holds the dimensionality equation

dim(V ) + dim(V 0 ) = dim(U ). (1.4.31)

1.4.7 Let U be an n-dimensional vector space and V be an m-dimensional


subspace of U . Show that V 00 = (V 0 )0 = V .

1.5 Constructions of vector spaces


Let U be a vector space and V , W its subspaces. It is clear that V ∩ W is also
a subspace of U but V ∪ W in general may fail to be a subspace of U . The
smallest subspace of U that contains V ∪ W should contain all vectors in U of
the form v + w where v ∈ V and w ∈ W . Such an observation motivates the
following definition.

Definition 1.9 If U is a vector space and V , W its subspaces, the sum of V


and W , denoted by V + W , is the subspace of U given by

V + W ≡ {u ∈ U | u = v + w, v ∈ V , w ∈ W }. (1.5.1)

Checking that V + W is a subspace of U that is also the smallest subspace


of U containing V ∪ W will be assigned as an exercise.

Now let B0 = {u1 , . . . , uk } be a basis of V ∩ W . Expand it to obtain bases


for V and W , respectively, of the forms

BV = {u1 , . . . , uk , v1 , . . . , vl }, BW = {u1 , . . . , uk , w1 , . . . , wm }. (1.5.2)

From the definition of V + W , we get

V + W = Span{u1 , . . . , uk , v1 , . . . , vl , w1 , . . . , wm }. (1.5.3)

We can see that {u1 , . . . , uk , v1 , . . . , vl , w1 , . . . , wm } is a basis of V + W . In


fact, we only need to show that the vectors u1 , . . . , uk , v1 , . . . , vl , w1 , . . . , wm
are linearly independent. For this purpose, consider the linear relation

a1 u1 + · · · + ak uk + b1 v1 + · · · + bl vl + c1 w1 + · · · + cm wm = 0, (1.5.4)

where a1 , . . . , ak , b1 , . . . , bl , c1 , . . . , cm are scalars. We claim that

w = c1 w1 + · · · + cm wm = 0. (1.5.5)

Otherwise, using (1.5.4) and (1.5.5), we see that w ∈ V . However, we already


have w ∈ W . So w ∈ V ∩ W , which is false since u1 , . . . , uk , w1 , . . . , wm are
linearly independent. Thus (1.5.5) follows and c1 = · · · = cm = 0. Applying
(1.5.5) in (1.5.4), we immediately have a1 = · · · = ak = b1 = · · · = bl = 0.
Therefore we can summarize the above discussion by concluding with the
following theorem.

Theorem 1.10 The following general dimensionality formula

dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ) (1.5.6)

is valid for the sum of any two subspaces V and W of finite dimensions in a
vector space.
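A numerical illustration of (1.5.6), as a NumPy sketch with two arbitrarily chosen subspaces of R4 given by spanning columns: the dimensions are computed as matrix ranks, and dim(V ∩ W ) is obtained as the nullity of [V, −W ] (which equals the intersection dimension here because the chosen columns of V and of W are linearly independent).

```python
# Sketch: checking dim(V + W) = dim V + dim W - dim(V ∩ W) in R^4.
import numpy as np
from numpy.linalg import matrix_rank

V = np.array([[1., 0.],          # spanning vectors of V as columns
              [0., 1.],
              [1., 1.],
              [0., 0.]])
W = np.array([[1., 0.],          # spanning vectors of W as columns
              [0., 0.],
              [1., 0.],
              [0., 1.]])

dim_V, dim_W = matrix_rank(V), matrix_rank(W)
dim_sum = matrix_rank(np.hstack([V, W]))                             # dim(V + W)
dim_cap = V.shape[1] + W.shape[1] - matrix_rank(np.hstack([V, -W]))  # dim(V ∩ W)

print(dim_V, dim_W, dim_sum, dim_cap)          # 2 2 3 1
print(dim_sum == dim_V + dim_W - dim_cap)      # True
```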

Of great importance is the situation when dim(V ∩ W ) = 0 or V ∩ W = {0}.


In this situation, the sum is called direct sum, and rewritten as V ⊕ W . Thus,
we have

dim(V ⊕ W ) = dim(V ) + dim(W ). (1.5.7)

Direct sum has the following characteristic.

Theorem 1.11 The sum of two subspaces V and W of U is a direct sum if and
only if each vector u in V + W may be expressed as the sum of a unique vector
v ∈ V and a unique vector w ∈ W .

Proof Suppose first V ∩ W = {0}. For any u ∈ V + W , assume that it may


be expressed as

u = v1 + w1 = v2 + w2 , v1 , v2 ∈ V , w1 , w2 ∈ W. (1.5.8)

From (1.5.8), we have v1 − v2 = w2 − w1 which lies in V ∩ W . So v1 − v2 =


w2 − w1 = 0 and the stated uniqueness follows.
Suppose that any u ∈ V + W can be expressed as u = v + w for some
unique v ∈ V and w ∈ W . If V ∩ W ≠ {0}, take x ∈ V ∩ W with x ≠ 0. Then
zero vector 0 may be expressed as 0 = x + (−x) with x ∈ V and (−x) ∈ W ,
which violates the stated uniqueness since 0 = 0 + 0 with 0 ∈ V and 0 ∈ W ,
as well.

Let V be a subspace of an n-dimensional vector space U and BV =


{v1 , . . . , vk } be a basis of V . Extend BV to get a basis of U , say
{v1 , . . . , vk , w1 , . . . , wl }, where k + l = n. Define

W = Span{w1 , . . . , wl }. (1.5.9)

Then we obtain U = V ⊕ W . The subspace W is called a linear complement,


or simply complement, of V in U . Besides, the subspaces V and W are said to
be mutually complementary in U .
We may also build up a vector space from any two vector spaces, say V and
W , over the same field F, as a direct sum of V and W . To see this, we construct
vectors of the form

u = (v, w), v ∈ V, w ∈ W, (1.5.10)

and define vector addition and scalar multiplication component-wise by

u1 + u2 = (v1 , w1 ) + (v2 , w2 ) = (v1 + v2 , w1 + w2 ),


v1 , v2 ∈ V , w1 , w2 ∈ W, (1.5.11)
au = a(v, w) = (av, aw), v ∈ V, w ∈ W, a ∈ F. (1.5.12)

It is clear that the set U of all vectors of the form (1.5.10) equipped with the
vector addition (1.5.11) and scalar multiplication (1.5.12) is a vector space over
F. Naturally we may identify V and W with the subspaces of U given by

Ṽ = {(v, 0) | v ∈ V }, W̃ = {(0, w) | w ∈ W }. (1.5.13)

Of course, U = Ṽ ⊕ W̃ . Thus, in a well-understood sense, we may also rewrite


this relation as U = V ⊕ W as anticipated. Sometimes the vector space U so
constructed is also referred to as the direct product of V and W and rewritten
as U = V ×W . In this way, R2 may naturally be viewed as R×R, for example.

More generally, let V1 , . . . , Vk be any k subspaces of a vector space U . In a


similar manner we can define the sum

V = V1 + · · · + Vk , (1.5.14)

which is of course a subspace of U . Suggested by the above discussion, if


each v ∈ V may be written as v = v1 + · · · + vk for uniquely determined
v1 ∈ V1 , . . . , vk ∈ Vk , then we say that V is the direct sum of V1 , . . . , Vk and
rewrite such a relation as

V = V1 ⊕ · · · ⊕ Vk . (1.5.15)

It should be noted that, when k ≥ 3, extra caution has to be exerted when


checking whether the sum (1.5.14) is a direct sum. For example, the naive
condition

Vi ∩ Vj = {0}, i ≠ j, i, j = 1, . . . , k, (1.5.16)

among V1 , . . . , Vk , is not sufficient anymore to ensure (1.5.15).


To illustrate this subtlety, let us consider V = F2 and take
\[
V_1 = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\}, \quad V_2 = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}, \quad V_3 = \mathrm{Span}\left\{ \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}. \tag{1.5.17}
\]

It is clear that V1 , V2 , V3 satisfy (1.5.16) and V = V1 + V2 + V3 but V cannot


be a direct sum of V1 , V2 , V3 .
In fact, in such a general situation, a correct condition which replaces
(1.5.16) is
\[
V_i \cap \left( \sum_{1 \le j \le k,\ j \neq i} V_j \right) = \{0\}, \quad i = 1, \dots, k. \tag{1.5.18}
\]

In other words, when the condition (1.5.18) is fulfilled, then (1.5.15) is valid.
The proof of this fact is left as an exercise.
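A quick numerical sketch of the example (1.5.17) over R (using NumPy): each pairwise intersection is trivial since each pair of spanning vectors is linearly independent, yet dim V1 + dim V2 + dim V3 = 3 > 2 = dim(V1 + V2 + V3 ), so the sum cannot be direct.

```python
# Sketch: the subspaces V1, V2, V3 of (1.5.17), taken over R.
import numpy as np
from numpy.linalg import matrix_rank

v1 = np.array([[1.], [0.]])
v2 = np.array([[1.], [1.]])
v3 = np.array([[1.], [-1.]])

for a, b in [(v1, v2), (v1, v3), (v2, v3)]:
    print(matrix_rank(np.hstack([a, b])))     # 2: each pairwise intersection is {0}

print(matrix_rank(np.hstack([v1, v2, v3])))   # 2 = dim(V1 + V2 + V3) < 1 + 1 + 1
```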

Exercises

1.5.1 For
U = {(x1 , . . . , xn ) ∈ Fn | x1 + · · · + xn = 0},
(1.5.19)
V = {(x1 , . . . , xn ) ∈ Fn | x1 = · · · = xn },

prove that Fn = U ⊕ V .

1.5.2 Consider the vector space of all n × n matrices over a field F, denoted by
F(n, n). As before, use FS (n, n) and FA (n, n) to denote the subspaces of
symmetric and anti-symmetric matrices. Assume that the characteristic
of F is not equal to 2. For any M ∈ F(n, n), rewrite M as
\[
M = \tfrac{1}{2}(M + M^t) + \tfrac{1}{2}(M - M^t). \tag{1.5.20}
\]
Check that (M + M t )/2 ∈ FS (n, n) and (M − M t )/2 ∈ FA (n, n). Use
this fact to prove the decomposition

F(n, n) = FS (n, n) ⊕ FA (n, n). (1.5.21)

What happens when the characteristic of F is 2 such as when F = Z2 ?


1.5.3 Show that FL (n, n) ∩ FU (n, n) = FD (n, n).
1.5.4 Use FL (n, n) and FU (n, n) in F(n, n) to give an example for the dimen-
sionality relation (1.5.6).
1.5.5 Let X = C[a, b] (a, b ∈ R and a < b) be the vector space of all real-
valued continuous functions over the interval [a, b] and
\[
Y = \left\{ f \in X \;\Big|\; \int_a^b f(t)\, \mathrm{d}t = 0 \right\}. \tag{1.5.22}
\]

(i) Prove that if R is identified with the set of all constant functions over
[a, b] then X = R ⊕ Y .
(ii) For a = 0, b = 1, and f (t) = t 2 + t − 1, find the unique c ∈ R and
g ∈ Y such that f (t) = c + g(t) for all t ∈ [0, 1].
1.5.6 Let U be a vector space and V , W its subspaces such that U = V + W .
If X is a subspace of U , is it true that X = (X ∩ V ) + (X ∩ W )?
1.5.7 Let U be a vector space and V , W, X some subspaces of U such that

U = V ⊕ W, U = V ⊕ X. (1.5.23)

Can one infer W = X from the condition (1.5.23)? Explain why or


why not.
1.5.8 Let V1 , . . . , Vk be some subspaces of U and set V = V1 + · · · + Vk .
Show that this sum is a direct sum if and only if one of the following
statements is true.
(i) V1 , . . . , Vk satisfy (1.5.18).
(ii) If non-overlapping sets of vectors

{v^1_1 , . . . , v^1_{l1} }, . . . , {v^k_1 , . . . , v^k_{lk} } (1.5.24)



are bases of V1 , . . . , Vk , respectively, then the union


{v^1_1 , . . . , v^1_{l1} } ∪ · · · ∪ {v^k_1 , . . . , v^k_{lk} } (1.5.25)
is a basis of V .
(iii) There holds the dimensionality relation
dim(V ) = dim(V1 ) + · · · + dim(Vk ). (1.5.26)

1.6 Quotient spaces


In order to motivate the introduction of the concept of quotient spaces, we first
consider a concrete example in R2 .
Let v ∈ R2 be any nonzero vector. Then
V = Span{v} (1.6.1)
represents the line passing through the origin and along (or opposite to) the
direction of v. More generally, for any u ∈ R2 , the coset
[u] = {u + w | w ∈ V } = {x ∈ R2 | x − u ∈ V } (1.6.2)
represents the line passing through the vector u and parallel to the vector v.
Naturally, we define [u1 ] + [u2 ] = {x + y | x ∈ [u1 ], y ∈ [u2 ]} and claim
[u1 ] + [u2 ] = [u1 + u2 ]. (1.6.3)
In fact, let z ∈ [u1 ] + [u2 ]. Then there exist x ∈ [u1 ] and y ∈ [u2 ] such that
z = x + y. Rewrite x, y as x = u1 + w1 , y = u2 + w2 for some w1 , w2 ∈ V .
Hence z = (u1 + u2 ) + (w1 + w2 ), which implies z ∈ [u1 + u2 ]. Conversely,
if z ∈ [u1 + u2 ], then there is some w ∈ V such that z = (u1 + u2 ) + w =
(u1 + w) + u2 . Since u1 + w ∈ [u1 ] and u2 ∈ [u2 ], we see that z ∈ [u1 ] + [u2 ].
Hence the claim follows.
From the property (1.6.3), we see clearly that the coset [0] = V serves as an
additive zero element among the set of all cosets.
Similarly, we may also naturally define a[u] = {ax | x ∈ [u]} for a ∈ R
where a = 0. Note that this last restriction is necessary because otherwise 0[u]
would be a single-point set consisting of zero vector only. We claim
a[u] = [au], a ∈ R \ {0}. (1.6.4)
In fact, if z ∈ a[u], there is some x ∈ [u] such that z = ax. Since x ∈ [u],
there is some w ∈ V such that x = u + w. So z = au + aw which implies z ∈
[au]. Conversely, if z ∈ [au], then there is some w ∈ V such that z = au + w.
Since z = a(u + a −1 w), we get z ∈ a[u]. So (1.6.4) is established.

Since the coset [0] is already seen to be the additive zero when adding cosets,
we are prompted to define

0[u] = [0]. (1.6.5)

Note that (1.6.4) and (1.6.5) may be collectively rewritten as

a[u] = [au], a ∈ R, u ∈ R2 . (1.6.6)

We may examine how the above introduced addition between cosets and
scalar multiplication with cosets make the set of all cosets into a vector space
over R, denoted by R2 /V , and called the quotient space of R2 modulo V . As
investigated, the geometric meaning of R2 /V is that it is the set of all the lines
in R2 parallel to the vector v and that these lines can be added and multiplied
by real scalars so that the set of lines enjoys the structure of a real vector space.
There is no difficulty extending the discussion to the case of R3 with V a
line or plane through the origin.
More generally, the above quotient-space construction may be formulated
as follows.

Definition 1.12 Let U be a vector space over a field F and V a subspace of U .


The set of cosets represented by u ∈ U given by

[u] = {u + w | w ∈ V } = {u} + V ≡ u + V , (1.6.7)

equipped with addition and scalar multiplication defined by

[u] + [v] = [u + v], u, v ∈ U, a[u] = [au], a ∈ F, u ∈ U, (1.6.8)

forms a vector space over F, called the quotient space of U modulo V , and is
denoted by U/V .

Let BV = {v1 , . . . , vk } be a basis of V . Extend BV to get a basis of U , say

BU = {v1 , . . . , vk , u1 , . . . , ul }. (1.6.9)

We claim that {[u1 ], . . . , [ul ]} forms a basis for U/V . In fact, it is


evident that these vectors span U/V . So we only need to show their linear
independence.
Consider the relation

a1 [u1 ] + · · · + al [ul ] = 0, a1 , . . . , al ∈ F. (1.6.10)

Note that 0 = [0] in U/V . So a1 u1 + · · · + al ul ∈ V . Thus there are scalars


b1 , . . . , bk ∈ F such that

a1 u1 + · · · + al ul = b1 v1 + · · · + bk vk . (1.6.11)

Hence a1 = · · · = al = b1 = · · · = bk = 0 and the claimed linear indepen-


dence follows.
As a consequence of the afore-going discussion, we arrive at the following
basic dimensionality relation
dim(U/V ) + dim(V ) = dim(U ). (1.6.12)
Note that the construction made to arrive at (1.6.12) demonstrates a practical
way to find a basis for U/V . The quantity dim(U/V ) = dim(U ) − dim(V ) is
sometimes called the codimension of the subspace V in U . The quotient space
U/V may be viewed as constructed from the space U after ‘collapsing’ or
‘suppressing’ its subspace V .

Exercises

1.6.1 Let V be the subspace of R2 given by V = Span{(1, −1)}.


(i) Draw V in R2 .
(ii) Draw the cosets
S1 = (1, 1) + V , S2 = (2, 1) + V . (1.6.13)

(iii) Describe the quotient space R2 /V .


(iv) Draw the coset S3 = (−2)S1 + S2 .
(v) Determine whether S3 is equal to (−1, 0) + V (explain why).
1.6.2 Let V be the subspace of P2 (set of polynomials of degrees up to 2 and
with coefficients in R) satisfying

∫_{−1}^{1} p(t) dt = 0, p(t) ∈ P2 . (1.6.14)

(i) Find a basis to describe V .


(ii) Find a basis for the quotient space P2 /V and verify (1.6.12).
1.6.3 Describe the plane given in terms of the coordinates (x, y, z) ∈ R3 by
the equation
ax + by + cz = d, (a, b, c) ≠ (0, 0, 0), (1.6.15)

where a, b, c, d ∈ R are constants, as a coset in R3 and as a point in a


quotient space.
1.6.4 Let U be a vector space and V and W some subspaces of U . For u ∈ U ,
use [u]V and [u]W to denote the cosets u + V and u + W , respectively.
Show that, if V ⊂ W and u1 , . . . , uk ∈ U , then linear dependence of
[u1 ]V , . . . , [uk ]V implies that [u1 ]W , . . . , [uk ]W are linearly
dependent. In particular, if U is finite dimensional, then dim(U/V ) ≥
dim(U/W ), if V ⊂ W , as may also be seen from using (1.6.12).

1.7 Normed spaces


It will be desirable to be able to evaluate the ‘length’ or ‘magnitude’ or
‘amplitude’ of any vector in a vector space. In other words, it will be use-
ful to associate to each vector a quantity that resembles the notion of length of
a vector in (say) R3 . Such a quantity is generically called norm.
In this section, we take the field F to be either R or C.

Definition 1.13 Let U be a vector space over the field F. A norm over U is a
correspondence ‖ · ‖ : U → R such that we have the following.

(1) (Positivity) ‖u‖ ≥ 0 for u ∈ U and ‖u‖ = 0 only for u = 0.
(2) (Homogeneity) ‖au‖ = |a|‖u‖ for a ∈ F and u ∈ U .
(3) (Triangle inequality) ‖u + v‖ ≤ ‖u‖ + ‖v‖ for u, v ∈ U .

A vector space equipped with a norm is called a normed space. If ‖ · ‖ is
the specific norm of the normed space U , we sometimes spell this fact out by
stating ‘normed space (U, ‖ · ‖)’.

Definition 1.14 Let (U, ‖ · ‖) be a normed space and {uk } a sequence in U .
For some u0 ∈ U , we say that uk → u0 or {uk } converges to u0 as k → ∞ if

lim_{k→∞} ‖u0 − uk ‖ = 0, (1.7.1)

and the sequence {uk } is said to be a convergent sequence. The vector u0 is
also said to be the limit of the sequence {uk }.

The notions of convergence and limit are essential for carrying out calculus
in a normed space.
Let U be a finite-dimensional space. We can easily equip U with a norm.
For example, assume that B = {u1 , . . . , un } is a basis of U . For any u ∈ U ,
define

‖u‖_1 = Σ_{i=1}^n |ai |, where u = Σ_{i=1}^n ai ui . (1.7.2)

It is direct to verify that ‖ · ‖_1 indeed defines a norm.


More generally, for p ≥ 1, we may set
 n 1
 p 
n
up = |ai | p
, where u = ai ui . (1.7.3)
i=1 i=1

It can be shown that ‖ · ‖_p also defines a norm over U . We will not check this
fact. What interests us here, however, is the limit

lim_{p→∞} ‖u‖_p = max{|ai | | i = 1, . . . , n}. (1.7.4)

To prove (1.7.4), we note that the right-hand side of (1.7.4) is simply |ai0 |
for some i0 ∈ {1, . . . , n}. Thus, in view of (1.7.3), we have

‖u‖_p ≤ ( Σ_{i=1}^n |ai0 |^p )^{1/p} = |ai0 | n^{1/p}. (1.7.5)
From (1.7.5), we obtain
lim sup_{p→∞} ‖u‖_p ≤ |ai0 |. (1.7.6)

On the other hand, using (1.7.3) again, we have

‖u‖_p ≥ ( |ai0 |^p )^{1/p} = |ai0 |. (1.7.7)
Thus,
lim inf_{p→∞} ‖u‖_p ≥ |ai0 |. (1.7.8)

Therefore (1.7.4) is established. As a consequence, we are motivated to adopt


the notation
‖u‖_∞ = max{|ai | | i = 1, . . . , n}, where u = Σ_{i=1}^n ai ui , (1.7.9)

and restate our result (1.7.4) more elegantly as

lim_{p→∞} ‖u‖_p = ‖u‖_∞ , u ∈ U. (1.7.10)

It is evident that ‖ · ‖_∞ does define a norm over U .
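As a quick numerical illustration of the limit (1.7.10) (a sketch only, assuming numpy, taking U = R^4 with the standard basis and arbitrarily chosen coordinates), one may watch ‖u‖_p approach ‖u‖_∞ as p grows.

```python
# Sketch of (1.7.10): the p-norms of a fixed vector approach the
# maximum norm as p grows (here U = R^4 with the standard basis).
import numpy as np

u = np.array([3.0, -4.0, 1.5, 2.0])
for p in [1, 2, 4, 8, 16, 64, 256]:
    print(p, np.sum(np.abs(u) ** p) ** (1.0 / p))
print("max norm:", np.max(np.abs(u)))  # the limiting value 4.0
```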


Thus we have seen that there are many ways to introduce a norm over a
vector space. So it will be important to be able to compare norms. For calculus,
it is obvious that the most important thing with regard to a norm is the notion
of convergence. In particular, assume there are two norms, say ‖ · ‖ and ‖ · ‖' ,
equipped over the vector space U . We hope to know whether convergence with
respect to norm ‖ · ‖ implies convergence with respect to norm ‖ · ‖' . This desire
motivates the introduction of the following concept.

Definition 1.15 Let U be a vector space and ‖ · ‖ and ‖ · ‖' two norms over U .
We say that ‖ · ‖ is stronger than ‖ · ‖' if convergence with respect to norm ‖ · ‖
implies convergence with respect to norm ‖ · ‖' . More precisely, any convergent
sequence {uk } with limit u0 in (U, ‖ · ‖) is also a convergent sequence with the
same limit in (U, ‖ · ‖' ).

Regarding the above definition, we have the following.



Theorem 1.16 Let U be a vector space and ‖ · ‖ and ‖ · ‖' two norms over U .
Then norm ‖ · ‖ is stronger than norm ‖ · ‖' if and only if there is a constant
C > 0 such that

‖u‖' ≤ C‖u‖, ∀u ∈ U. (1.7.11)

Proof If (1.7.11) holds, then it is clear that ‖ · ‖ is stronger than ‖ · ‖' .
Suppose that ‖ · ‖ is stronger than ‖ · ‖' but (1.7.11) does not hold. Thus there
is a sequence {uk } in U such that uk ≠ 0 (k = 1, 2, . . . ) and

‖uk ‖' / ‖uk ‖ ≥ k, k = 1, 2, . . . . (1.7.12)

Define

vk = (1/‖uk ‖' ) uk , k = 1, 2, . . . . (1.7.13)

Then (1.7.12) yields the bounds

‖vk ‖ ≤ 1/k, k = 1, 2, . . . . (1.7.14)

Consequently, vk → 0 as k → ∞ with respect to norm ‖ · ‖. However, accord-
ing to (1.7.13), we have ‖vk ‖' = 1, k = 1, 2, . . . , so {vk } cannot converge to 0
with respect to norm ‖ · ‖' . This reaches a contradiction.

Definition 1.17 Let U be a vector space and ‖ · ‖ and ‖ · ‖' two norms over U .
We say that norms ‖ · ‖ and ‖ · ‖' are equivalent if convergence in norm ‖ · ‖ is
equivalent to convergence in norm ‖ · ‖' .

In view of Theorem 1.16, we see that norms ‖ · ‖ and ‖ · ‖' are equivalent if
and only if the inequality

C1 ‖u‖ ≤ ‖u‖' ≤ C2 ‖u‖, ∀u ∈ U, (1.7.15)

holds true for some suitable constants C1 , C2 > 0.


The following theorem regarding norms over finite-dimensional spaces is of
fundamental importance.

Theorem 1.18 Any two norms over a finite-dimensional space are equivalent.

Proof Let U be an n-dimensional vector space and B = {u1 , . . . , un } a basis


for U . Define the norm ‖ · ‖_1 by (1.7.2). Let ‖ · ‖ be any given norm over U .
Then by the properties of norm we have


‖u‖ ≤ Σ_{i=1}^n |ai |‖ui ‖ ≤ α2 Σ_{i=1}^n |ai | = α2 ‖u‖_1 , (1.7.16)

where we have set

α2 = max{‖ui ‖ | i = 1, . . . , n}. (1.7.17)
In other words, we have shown that ‖ · ‖_1 is stronger than ‖ · ‖.
In order to show that ‖ · ‖ is also stronger than ‖ · ‖_1 , we need to prove the
existence of a constant α1 > 0 such that

α1 ‖u‖_1 ≤ ‖u‖, ∀u ∈ U. (1.7.18)
Suppose otherwise that (1.7.18) fails to be valid. In other words, the set of
ratios
  
{ ‖u‖ / ‖u‖_1 | u ∈ U, u ≠ 0 } (1.7.19)

does not have a positive infimum. Then there is a sequence {vk } in U such that

(1/k)‖vk ‖_1 ≥ ‖vk ‖, vk ≠ 0, k = 1, 2, . . . . (1.7.20)

Now set

wk = (1/‖vk ‖_1 ) vk , k = 1, 2, . . . . (1.7.21)
Then (1.7.20) implies that wk → 0 as k → ∞ with respect to ‖ · ‖.
On the other hand, we have ‖wk ‖_1 = 1 (k = 1, 2, . . . ). Express wk with
respect to basis B as

wk = a1,k u1 + · · · + an,k un , a1,k , . . . , an,k ∈ F, k = 1, 2, . . . . (1.7.22)

The definition of norm ‖ · ‖_1 then implies |ai,k | ≤ 1 (i = 1, . . . , n, k =
1, 2, . . . ). Hence, by the Bolzano–Weierstrass theorem, there is a subsequence
of {k}, denoted by {ks }, such that ks → ∞ as s → ∞ and

ai,ks → some ai,0 ∈ F as s → ∞, i = 1, . . . , n. (1.7.23)

Now set w0 = a1,0 u1 + · · · + an,0 un . Then ‖w0 − wks ‖_1 → 0 as s → ∞.
Moreover, the triangle inequality gives us

‖w0 ‖_1 ≥ ‖wks ‖_1 − ‖wks − w0 ‖_1 = 1 − ‖wks − w0 ‖_1 , s = 1, 2, . . . . (1.7.24)

Thus, letting s → ∞ in (1.7.24), we obtain ‖w0 ‖_1 ≥ 1.
However, substituting u = w0 − wks in (1.7.16) and letting s → ∞, we see
that wks → w0 as s → ∞ with respect to norm ‖ · ‖ as well, which is false
because we already know that wk → 0 as k → ∞ with respect to norm ‖ · ‖.

Summarizing the above study, we see that there are some constants α1 , α2 >
0 such that

α1 ‖u‖_1 ≤ ‖u‖ ≤ α2 ‖u‖_1 , ∀u ∈ U. (1.7.25)

Finally, let ‖ · ‖' be another norm over U . Then we have some constants
β1 , β2 > 0 such that

β1 ‖u‖_1 ≤ ‖u‖' ≤ β2 ‖u‖_1 , ∀u ∈ U. (1.7.26)

Combining (1.7.25) and (1.7.26), we arrive at the desired conclusion

(β1 /α2 )‖u‖ ≤ ‖u‖' ≤ (β2 /α1 )‖u‖, ∀u ∈ U, (1.7.27)

which establishes the equivalence of norms ‖ · ‖ and ‖ · ‖' as stated.
As an immediate application of Theorem 1.18, we have the following.

Theorem 1.19 A finite-dimensional normed space (U, ‖ · ‖) is locally compact.


That is, any bounded sequence in U contains a convergent subsequence.

Proof Let {u1 , . . . , un } be any basis of U and introduce norm ‖ · ‖_1 by the
expression (1.7.2). Let {vk } be a bounded sequence in (U, ‖ · ‖). Then Theo-
rem 1.18 says that {vk } is also bounded in (U, ‖ · ‖_1 ). If we rewrite vk as

vk = a1,k u1 + · · · + an,k un , a1,k , . . . , an,k ∈ F, k = 1, 2, . . . , (1.7.28)

the definition of ‖ · ‖_1 then implies that the sequences {ai,k } (i = 1, . . . , n) are
all bounded in F. Thus the Bolzano–Weierstrass theorem indicates that there
are subsequences {ai,ks } (i = 1, . . . , n) which converge to some ai,0 ∈ F
(i = 1, . . . , n) as s → ∞. Consequently, setting v0 = a1,0 u1 + · · · + an,0 un ,
we have ‖v0 − vks ‖_1 → 0 as s → ∞. Thus ‖v0 − vks ‖ ≤ C‖v0 − vks ‖_1 → 0
as s → ∞ as well.
Let (U, ‖ · ‖) be a finite-dimensional normed vector space and V a subspace
of U . Consider the quotient space U/V . As an exercise, it may be shown that

‖[u]‖ = inf{‖v‖ | v ∈ [u]}, [u] ∈ U/V , u ∈ U, (1.7.29)
defines a norm over U/V .
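The following sketch (assuming numpy, with U = R^2 and the Euclidean norm) illustrates (1.7.29): for a line V , the norm of a coset [u] is the distance from u to V , attained at the vector of least norm in the coset.

```python
# Sketch of the quotient norm (1.7.29) in R^2 with the Euclidean norm:
# for V = Span{v}, the norm of the coset [u] is the distance from u
# to V, attained at the component of u orthogonal to v.
import numpy as np

v = np.array([1.0, 2.0])          # V = Span{v}
u = np.array([-1.0, 1.0])         # the coset [u] = u + V

proj = (u @ v) / (v @ v) * v      # orthogonal projection of u onto V
w = u - proj                      # vector of least norm in [u]
print(np.linalg.norm(w))          # equals inf{ ||u + t v|| : t real }

# Crude check of the infimum by sampling the coset.
ts = np.linspace(-5, 5, 100001)
print(np.min(np.linalg.norm(u[None, :] + ts[:, None] * v[None, :], axis=1)))
```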

Exercises

1.7.1 Consider the vector space C[a, b] of the set of all real-valued continuous
functions over the interval [a, b] where a, b ∈ R and a < b. Define two
norms ‖ · ‖_1 and ‖ · ‖_∞ by setting

‖u‖_1 = ∫_a^b |u(t)| dt, ‖u‖_∞ = max_{t∈[a,b]} |u(t)|, ∀u ∈ C[a, b]. (1.7.30)

Show that ‖ · ‖_∞ is stronger than ‖ · ‖_1 but not vice versa.
1.7.2 Over the vector space C[a, b] again, define norm ‖ · ‖_p (p ≥ 1) by

‖u‖_p = ( ∫_a^b |u(t)|^p dt )^{1/p} , ∀u ∈ C[a, b]. (1.7.31)

Show that ‖ · ‖_∞ is stronger than ‖ · ‖_p for any p ≥ 1 and prove that

lim_{p→∞} ‖u‖_p = ‖u‖_∞ , ∀u ∈ C[a, b]. (1.7.32)

1.7.3 Let (U, ‖ · ‖) be a normed space and use U' to denote the dual space of
U . For each u' ∈ U' , define

‖u' ‖' = sup{|u' (u)| | u ∈ U, ‖u‖ = 1}. (1.7.33)

Show that ‖ · ‖' defines a norm over U' .
1.7.4 Let U be a vector space and U' its dual space which is equipped with a
norm, say ‖ · ‖' . Define

‖u‖ = sup{|u' (u)| | u' ∈ U' , ‖u' ‖' = 1}. (1.7.34)

Show that ‖ · ‖ defines a norm over U as well.
1.7.5 Prove that for any finite-dimensional normed space (U, ‖ · ‖) with a given
subspace V we may indeed use (1.7.29) to define a norm for the quotient
space U/V .
1.7.6 Let ‖ · ‖ denote the Euclidean norm of R2 which is given by

‖u‖ = √(a1^2 + a2^2 ), u = (a1 , a2 ) ∈ R2 . (1.7.35)

Consider the subspace V = {(x, y) ∈ R2 | 2x − y = 0} and the coset


[(−1, 1)] in R2 modulo V .
(i) Find the unique vector v ∈ [(−1, 1)] such that ‖v‖ = ‖[(−1, 1)]‖.
(ii) Draw the coset [(−1, 1)] in R2 and the vector v found in part (i) and
explain the geometric content of the results.

2 Linear mappings

In this chapter we consider linear mappings over vector spaces. We begin by


stating the definition and a discussion of the structural properties of linear map-
pings. We then introduce the notion of adjoint mappings and illustrate some
of their applications. We next focus on linear mappings from a vector space
into itself and study a series of important concepts such as invariance and
reducibility, eigenvalues and eigenvectors, projections, nilpotent mappings,
and polynomials of linear mappings. Finally we discuss the use of norms of
linear mappings and present a few analytic applications.

2.1 Linear mappings


A linear mapping may be regarded as the simplest correspondence
between two vector spaces. In this section we start our study with the def-
inition of linear mappings. We then discuss the matrix representation of a
linear mapping, composition of linear mappings, and the rank and nullity of
a linear mapping.

2.1.1 Definition, examples, and notion of associated matrices


Let U and V be two vector spaces over the same field F. A linear mapping or
linear map or linear operator is a correspondence T from U into V , written as
T : U → V , satisfying the following.

(1) (Additivity) T (u1 + u2 ) = T (u1 ) + T (u2 ), u1 , u2 ∈ U .


(2) (Homogeneity) T (au) = aT (u), a ∈ F, u ∈ U .


A special implication of the homogeneity condition is that T (0) = 0. One


may also say that a linear mapping ‘respects’ or preserves vector addition and
scalar multiplication.
The set of all linear mappings from U into V will be denoted by L(U, V ).
For S, T ∈ L(U, V ), we define S + T to be a mapping from U into V
satisfying
(S + T )(u) = S(u) + T (u), ∀u ∈ U. (2.1.1)

For any a ∈ F and T ∈ L(U, V ), we define aT to be the mapping from U into


V satisfying
(aT )(u) = aT (u), ∀u ∈ U. (2.1.2)

We can directly check that the mapping addition (2.1.1) and scalar-mapping
multiplication (2.1.2) make L(U, V ) a vector space over F. We adopt the
notation L(U ) = L(U, U ).
As an example, consider the space of matrices, F(m, n). For
A = (aij ) ∈ F(m, n), define
TA (x) = Ax, x = (x1 , . . . , xn )^t ∈ F^n . (2.1.3)

Then TA ∈ L(Fn , Fm ). Besides, for the standard basis e1 , . . . , en of Fn , we


have
TA (e1 ) = (a11 , . . . , am1 )^t , . . . , TA (en ) = (a1n , . . . , amn )^t . (2.1.4)

In other words, the images of e1 , . . . , en under the linear mapping TA are sim-
ply the column vectors of the matrix A, respectively.
Conversely, take any element T ∈ L(Fn , Fm ). Let v1 , . . . , vn ∈ Fm be
images of e1 , . . . , en under T such that
T (e1 ) = v1 = (a11 , . . . , am1 )^t = Σ_{i=1}^m ai1 ei , . . . ,
T (en ) = vn = (a1n , . . . , amn )^t = Σ_{i=1}^m ain ei , (2.1.5)
where {e1 , . . . , em } is the standard basis of Fm . Then any x = x1 e1 +· · ·+xn en
has the image
T (x) = T ( Σ_{j=1}^n xj ej ) = Σ_{j=1}^n xj vj = (v1 , . . . , vn )(x1 , . . . , xn )^t = A(x1 , . . . , xn )^t . (2.1.6)
In other words, T can be identified with TA through the matrix A consisting
of column vectors as images of e1 , . . . , en under T given in (2.1.5).
It may also be examined that mapping addition and scalar-mapping multi-
plication correspond to matrix addition and scalar-matrix multiplication.
In this way, as vector spaces, F(m, n) and L(Fn , Fm ) may be identified with
each other.
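A small numerical sketch of this identification (assuming numpy, with F = R and an arbitrarily chosen matrix): the columns of A are the images of the standard basis vectors, and linearity recovers T_A from them.

```python
# Sketch: a linear mapping T in L(F^n, F^m) is determined by the images
# of the standard basis vectors, which are the columns of its matrix A,
# as in (2.1.4)-(2.1.6).
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, -1.0, 4.0]])          # A in F(2, 3), so T_A : R^3 -> R^2

for j in range(3):
    e_j = np.zeros(3); e_j[j] = 1.0
    assert np.allclose(A @ e_j, A[:, j])  # T_A(e_j) is the jth column of A

x = np.array([2.0, -1.0, 5.0])
# Linearity: T_A(x) = sum_j x_j T_A(e_j).
assert np.allclose(A @ x, sum(x[j] * A[:, j] for j in range(3)))
```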
In general, let BU = {u1 , . . . , un } and BV = {v1 , . . . , vm } be bases of the
finite-dimensional vector spaces U and V over the field F, respectively. For
any T ∈ L(U, V ), we can write

T (uj ) = Σ_{i=1}^m aij vi , aij ∈ F, i = 1, . . . , m, j = 1, . . . , n. (2.1.7)

Since the images T (u1 ), . . . , T (un ) completely determine the mapping T , we


see that the matrix A = (aij ) completely determines the mapping T . In other
words, to each mapping in L(U, V ), there corresponds a unique matrix in
F(m, n).
Conversely, for any A = (aij ) ∈ F(m, n), we define T (u1 ), . . . , T (un ) by
(2.1.7). Moreover, for any x ∈ U given by x = Σ_{j=1}^n xj uj for x1 , . . . , xn ∈ F,
we set

T (x) = T ( Σ_{j=1}^n xj uj ) = Σ_{j=1}^n xj T (uj ). (2.1.8)

It is easily checked that this makes T a well-defined element in L(U, V ).



Thus we again see, after specifying bases for U and V , that we may identify
L(U, V ) with F(m, n) in a natural way.
After the above general description of linear mappings, especially their iden-
tification with matrices, we turn our attention to some basic properties of linear
mappings.

2.1.2 Composition of linear mappings


Let U, V , W be vector spaces over a field F of respective dimensions n, l, m.
For T ∈ L(U, V ) and S ∈ L(V , W ), we can define the composition of T and
S with the understanding

(S ◦ T )(x) = S(T (x)), x ∈ U. (2.1.9)

It is obvious that S ◦ T ∈ L(U, W ). We now investigate the matrix of S ◦ T in


terms of the matrices of S and T .
To this end, let BU = {u1 , . . . , un }, BV = {v1 , . . . , vl }, BW = {w1 , . . . , wm }
be the bases of U, V , W , respectively, and A = (aij ) ∈ F(l, n) and B =
(bij ) ∈ F(m, l) be the correspondingly associated matrices of T and S, respec-
tively. Then we have

(S ◦ T )(uj ) = S(T (uj )) = S( Σ_{i=1}^l aij vi ) = Σ_{i=1}^l aij S(vi )
= Σ_{i=1}^l Σ_{k=1}^m aij bki wk = Σ_{i=1}^m ( Σ_{k=1}^l bik akj ) wi . (2.1.10)

In other words, we see that if we take C = (cij ) ∈ F(m, n) to be the matrix


associated to the linear mapping S ◦ T with respect to the bases BU and BW ,
then C = BA. Hence the composition of linear mappings corresponds to the
multiplication of their associated matrices, in the same order. For this reason, it
is also customary to use ST to denote S ◦T , when there is no risk of confusion.
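The correspondence C = BA may be checked numerically; the sketch below assumes numpy and uses arbitrarily chosen matrices.

```python
# Sketch: composition of linear mappings corresponds to multiplication
# of their associated matrices in the same order, C = BA, as in (2.1.10).
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])          # matrix of T : R^3 -> R^2
B = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [1.0, 1.0],
              [4.0, 0.0]])                # matrix of S : R^2 -> R^4

x = np.array([1.0, -2.0, 3.0])
assert np.allclose(B @ (A @ x), (B @ A) @ x)  # (S o T)(x) has matrix BA
```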
The composition of linear mappings obviously enjoys the associativity
property

R ◦ (S ◦ T ) = (R ◦ S) ◦ T , R ∈ L(W, X), S ∈ L(V , W ), T ∈ L(U, V ),


(2.1.11)

as can be seen from

(R ◦ (S ◦ T ))(u) = R((S ◦ T )(u)) = R(S(T (u))) (2.1.12)



and

((R ◦ S) ◦ T )(u) = (R ◦ S)(T (u)) = R(S(T (u))) (2.1.13)

for any u ∈ U .
Thus, when applying (2.1.11) to the situation of linear mappings between
finite-dimensional vector spaces and using their associated matrix representa-
tions, we obtain another proof of the associativity of matrix multiplication.

2.1.3 Null-space, range, nullity, and rank


For T ∈ L(U, V ), the subset

N (T ) = {x ∈ U | T (x) = 0} (2.1.14)

in U is a subspace of U called the null-space of T , which is sometimes called


the kernel of T , and denoted as Kernel(T ) or T −1 (0). Here and in the sequel,
note that, in general, T −1 (v) denotes the set of preimages of v ∈ V under T .
That is,

T −1 (v) = {x ∈ U | T (x) = v}. (2.1.15)

Besides, the subset

R(T ) = {v ∈ V | v = T (x) for some x ∈ U } (2.1.16)

is a subspace of V called the range of T , which is sometimes called the image


of U under T , and denoted as Image(T ) or T (U ).
For T ∈ L(U, V ), we say that T is one-to-one, or 1-1, or injective, if T (x) ≠
T (y) whenever x, y ∈ U with x ≠ y. It is not hard to show that T is 1-1 if and
only if N (T ) = {0}. We say that T is onto or surjective if R(T ) = V .
A basic problem in linear algebra is to investigate whether the equation

T (x) = v (2.1.17)

has a solution x in U for a given vector v ∈ V . From the meaning of R(T ),


we see that (2.1.17) has a solution if and only if v ∈ R(T ), and the solution
is unique if and only if N (T ) = {0}. More generally, if v ∈ R(T ) and u ∈ U
is any particular solution of (2.1.17), then the set of all solutions of (2.1.17) is
simply the coset

[u] = {u} + N (T ) = u + N (T ). (2.1.18)

Thus, in terms of N (T ) and R(T ), we may understand the structure of the


set of solutions of (2.1.17) completely.

First, as the set of cosets, we have


U/N(T ) = {{T −1 (v)} | v ∈ R(T )}. (2.1.19)
Furthermore, for v1 , v2 ∈ R(T ), take u1 , u2 ∈ U such that T (u1 ) =
v1 , T (u2 ) = v2 . Then [u1 ] = T −1 (v1 ), [u2 ] = T −1 (v2 ), and [u1 ] + [u2 ] =
[u1 + u2 ] = T −1 (v1 + v2 ); for a ∈ F, v ∈ R(T ), and u ∈ T −1 (v), we have
au ∈ T −1 (av) and a[u] = [au] = T −1 (av), which naturally give us the
construction of U/N (T ) as a vector space.
Next, we consider the quotient space V /R(T ), referred to as cokernel of
T , written as Cokernel(T ). Recall that V /R(T ) is the set of non-overlapping
cosets in V modulo R(T ) so that exactly [0] = R(T ). Therefore we have

V \ R(T ) = ∪ [v], (2.1.20)
where the union of cosets on the right-hand side of (2.1.20) is made over all
[v] ∈ V /R(T ) except the zero element [0]. In other words, the union of all
cosets in V /R(T ) \ {[0]} gives us the set of all such vectors v in V that the
equation (2.1.17) fails to have a solution in U . With set notation, this last state-
ment is

{v ∈ V | T −1 (v) = ∅} = ∪ [v]. (2.1.21)
Thus, loosely speaking, the quotient space or cokernel V /R(T ) measures
the ‘size’ of the set of vectors in V that are ‘missed’ by the image of U
under T .

Definition 2.1 Let U and V be finite-dimensional vector spaces over a field F.


The nullity and rank of a linear mapping T : U → V , denoted by n(T ) and
r(T ), respectively, are the dimensions of N (T ) and R(T ). That is,
n(T ) = dim(N (T )), r(T ) = dim(R(T )). (2.1.22)

As a consequence of this definition, we see that T is 1-1 if and only if


n(T ) = 0 and T is onto if and only if r(T ) = dim(V ) or dim(Cokernel(T )) =
0.
The following simple theorem will be of wide usefulness.

Theorem 2.2 Let U, V be vector spaces over a field F and T ∈ L(U, V ).


(1) If v1 , . . . , vk ∈ V are linearly independent and u1 , . . . , uk ∈ U are such
that T (u1 ) = v1 , . . . , T (uk ) = vk , then u1 , . . . , uk are linearly indepen-
dent as well. In other words, the preimages of linear independent vectors
are also linearly independent.

(2) If N(T ) = {0} and u1 , . . . , uk ∈ U are linearly independent, then


v1 = T (u1 ), . . . , vk = T (uk ) ∈ V are linearly independent as well. In
other words, the images of linearly independent vectors under a 1-1 linear
mapping are also linearly independent.

Proof (1) Consider the relation a1 u1 + · · · + ak uk = 0, a1 , . . . , ak ∈ F.


Applying T to the above relation, we get a1 v1 + · · · + ak vk = 0. Hence
a1 = · · · = ak = 0.
(2) We similarly consider a1 v1 + · · · + ak vk = 0. This relation immediately
gives us T (a1 u1 + · · · + ak uk ) = 0. Using N (T ) = {0}, we deduce
a1 u1 +· · ·+ak uk = 0. Thus the linear independence of u1 , . . . , uk implies
a1 = · · · = ak = 0.

The fact stated in the following theorem is known as the nullity-rank equa-
tion or simply rank equation.

Theorem 2.3 Let U, V be finite-dimensional vector spaces over a field F and


T ∈ L(U, V ). Then
n(T ) + r(T ) = dim(U ). (2.1.23)

Proof Let {u1 , . . . , uk } be a basis of N (T ). We then expand it to get a basis


for the full space U written as {u1 , . . . , uk , w1 , . . . , wl }. Thus
R(T ) = Span{T (w1 ), . . . , T (wl )}. (2.1.24)
We now show that T (w1 ), . . . , T (wl ) form a basis for R(T ) by establishing
their linear independence. To this end, consider b1 T (w1 ) + · · · + bl T (wl ) = 0
for some b1 , . . . , bl ∈ F. Hence T (b1 w1 + · · · + bl wl ) = 0 or b1 w1 + · · · +
bl wl ∈ N (T ). So there are a1 , . . . , ak ∈ F such that b1 w1 + · · · + bl wl =
a1 u1 + · · · + ak uk . Since u1 , . . . , uk , w1 , . . . , wl are linearly independent, we
arrive at a1 = · · · = ak = b1 = · · · = bl = 0. In particular, r(T ) = l and the
rank equation is valid.
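A numerical sanity check of the rank equation (2.1.23) for a matrix-induced mapping is sketched below; it assumes numpy and scipy are available and uses an arbitrarily chosen matrix.

```python
# Sketch: numerical check of n(T) + r(T) = dim(U) for T = T_A with
# A in F(3, 4), so T_A : R^4 -> R^3.
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [0.0, 1.0, 0.0, 1.0]])

r = np.linalg.matrix_rank(A)            # r(T) = dim(R(T))
n = null_space(A).shape[1]              # n(T) = dim(N(T))
print(n, "+", r, "=", A.shape[1])       # 2 + 2 = 4 = dim(U)
```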
As an immediate application of the above theorem, we have the following.

Theorem 2.4 Let U, V be finite-dimensional vector spaces over a field F. If


there is some T ∈ L(U, V ) such that T is both 1-1 and onto, then there must
hold dim(U ) = dim(V ). Conversely, if dim(U ) = dim(V ), then there is some
T ∈ L(U, V ) which is both 1-1 and onto.

Proof If T is 1-1 and onto, then n(T ) = 0 and r(T ) = dim(V ). Thus
dim(U ) = dim(V ) follows from the rank equation.

Conversely, assume dim(U ) = dim(V ) = n and let {u1 , . . . , un } and


{v1 , . . . , vn } be any bases of U and V , respectively. We can define a unique
linear mapping T : U → V by specifying the images of u1 , . . . , un under T
to be v1 , . . . , vn . That is,
T (u1 ) = v1 , . . . , T (un ) = vn , (2.1.25)
so that

T (u) = Σ_{i=1}^n ai vi , ∀u = Σ_{i=1}^n ai ui ∈ U. (2.1.26)

Since r(T ) = n, the rank equation implies n(T ) = 0. So T is 1-1 as well.


We can slightly modify the proof of Theorem 2.4 to prove the following.

Theorem 2.5 Let U, V be finite-dimensional vector spaces over a field F so


that dim(U ) = dim(V ). If T ∈ L(U, V ) is either 1-1 or onto, then T must be
both 1-1 and onto. In this situation, there is a unique S ∈ L(V , U ) such that
S ◦ T : U → U and T ◦ S : V → V are identity mappings, denoted by IU and
IV , respectively, satisfying IU (u) = u, ∀u ∈ U and IV (v) = v, ∀v ∈ V .

Proof Suppose dim(U ) = dim(V ) = n. If T ∈ L(U, V ) is 1-1, then n(T ) =


0. So the rank equation gives us r(T ) = n = dim(V ). Hence T is onto. If
T ∈ L(U, V ) is onto, then r(T ) = n = dim(U ). So the rank equation gives us
n(T ) = 0. Hence T is 1-1. Now let {u1 , . . . , un } be any basis of U . Since T
is 1-1 and onto, we know that v1 , . . . , vn ∈ V defined by (2.1.25) form a basis
for V . Define now S ∈ L(V , U ) by setting
S(v1 ) = u1 , . . . , S(vn ) = un , (2.1.27)
so that

S(v) = Σ_{i=1}^n bi ui , ∀v = Σ_{i=1}^n bi vi ∈ V . (2.1.28)

Then it is clear that S ◦ T = IU and T ◦ S = IV . Thus the stated existence of


S ∈ L(V , U ) follows.
If R ∈ L(V , U ) is another mapping such that R ◦ T = IU and T ◦
R = IV , then the associativity of the composition of linear mappings (2.1.11)
gives us the result R = R ◦ IV = R ◦ (T ◦ S) = (R ◦ T ) ◦ S = IU ◦
S = S. Thus the stated uniqueness of S ∈ L(V , U ) follows as well.
Let U, V be vector spaces over a field F and T ∈ L(U, V ). We say that T is
invertible if there is some S ∈ L(V , U ) such that S ◦ T = IU and T ◦ S = IV .

Such a mapping S is necessarily unique and is called the inverse of T , denoted


as T −1 .
The notion of inverse can be slightly relaxed to something referred to as a
left or right inverse: We call an element S ∈ L(V , U ) a left inverse of T ∈
L(U, V ) if

S ◦ T = IU ; (2.1.29)
an element R ∈ L(V , U ) a right inverse of T ∈ L(U, V ) if

T ◦ R = IV . (2.1.30)
It is interesting that if T is known to be invertible, then the left and right in-
verses coincide and are simply the inverse of T . To see this, we let T −1 ∈
L(V , U ) denote the inverse of T and use the associativity of composition of
mappings to obtain from (2.1.29) and (2.1.30) the results
S = S ◦ IV = S ◦ (T ◦ T −1 ) = (S ◦ T ) ◦ T −1 = IU ◦ T −1 = T −1 ,
(2.1.31)
and
R = IU ◦ R = (T −1 ◦ T ) ◦ R = T −1 ◦ (T ◦ R) = T −1 ◦ IV = T −1 ,
(2.1.32)
respectively.
It is clear that, regarding T ∈ L(U, V ), the condition (2.1.29) implies that
T is 1-1, and the condition (2.1.30) implies that T is onto. In view of Theo-
rem 2.5, we see that when U, V are finite-dimensional and dim(U ) = dim(V ),
then T is invertible. Thus we have S = R = T −1 . On the other hand, if
dim(U ) ≠ dim(V ), T can never be invertible and the notion of left and right
inverses is of separate interest.
As an example, we consider the linear mappings S : F3 → F2 and T :
F2 → F3 associated with the matrices

A = [ 1 0 0 ; 0 1 0 ] ∈ F(2, 3), B = [ 1 0 ; 0 1 ; 0 0 ] ∈ F(3, 2), (2.1.33)
respectively according to (2.1.3). Then we may examine that S is a left inverse
of T (thus T is a right inverse of S). However, it is absurd to talk about the
invertibility of these mappings.
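The example (2.1.33) may be verified directly; the sketch below assumes numpy.

```python
# Sketch of (2.1.33): S has matrix A and T has matrix B; AB = I_2, so
# S is a left inverse of T (and T a right inverse of S), while BA is
# not the identity on R^3.
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])           # S : R^3 -> R^2
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])                # T : R^2 -> R^3

print(np.allclose(A @ B, np.eye(2)))      # True: S o T = I on R^2
print(np.allclose(B @ A, np.eye(3)))      # False: T o S is not I on R^3
```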
Let U and V be two vector spaces over the same field. An invertible element
T ∈ L(U, V ) (if it exists) is also called an isomorphism. If there is an isomor-
phism between U and V , they are said to be isomorphic to each other, written
as U ≈ V .

As a consequence of Theorem 2.4, we see that two finite-dimensional vector


spaces over the same field are isomorphic if and only if they have the same
dimension.
Let U, V be finite-dimensional vector spaces over a field F and {u1 , . . . , un },
{v1 , . . . , vm } any bases of U, V , respectively. For fixed i = 1, . . . , m, j =
1, . . . , n, define Tij ∈ L(U, V ) by setting

Tij (uj ) = vi ; Tij (uk ) = 0, 1 ≤ k ≤ n, k ≠ j. (2.1.34)

It is clear that {Tij | i = 1, . . . , m, j = 1, . . . , n} is a basis of L(U, V ). In


particular, we have

dim(L(U, V )) = mn. (2.1.35)

Thus naturally L(U, V ) ≈ F(m, n). In the special situation when V = F


we have L(U, F) = U' . Thus dim(L(U, F)) = dim(U' ) = dim(U ) = n as
obtained earlier.

Exercises

2.1.1 For A = (aij ) ∈ F(m, n), define a mapping MA : Fm → Fn by setting


MA (x) = xA = (x1 , . . . , xm ) A, x ∈ Fm , (2.1.36)

where Fl is taken to be the vector space of F-valued


l-component row vectors. Show that MA is linear and the row
vectors of the matrix A are the images of the standard vectors
e1 , . . . , em of Fm again taken to be row vectors. Describe, as vector
spaces, how F(m, n) may be identified with L(Fm , Fn ).
2.1.2 Let U, V be vector spaces over the same field F and T ∈ L(U, V ).
Prove that T is 1-1 if and only if N (T ) = {0}.
2.1.3 Let U, V be vector spaces over the same field F and T ∈ L(U, V ). If
Y is a subspace of V , is the subset of U given by

X = {x ∈ U | T (x) ∈ Y } ≡ T −1 (Y ), (2.1.37)

necessarily a subspace of U ?
2.1.4 Let U, V be finite-dimensional vector spaces and T ∈ L(U, V ). Prove
that r(T ) ≤ dim(U ). In particular, if T is onto, then dim(U ) ≥
dim(V ).

2.1.5 Prove that if T ∈ L(U, V ) satisfies (2.1.29) or (2.1.30) then T is 1-1 or


onto.
2.1.6 Consider A ∈ F(n, n). Prove that, if there is B or C ∈ F(n, n) such
that either AB = In or CA = In , then A is invertible and B = A−1 or
C = A−1 .
2.1.7 Let U be an n-dimensional vector space (n ≥ 2) and U' its dual space.
For f ∈ U' recall that f^0 = N (f ) = {u ∈ U | f (u) = 0}. Let g ∈ U'
be such that f and g are linearly independent. Hence f^0 ≠ g^0 .
(i) Show that U = f^0 + g^0 must hold.
(ii) Establish dim(f^0 ∩ g^0 ) = n − 2.
2.1.8 For T ∈ L(U ), where U is a finite-dimensional vector space, show that
N (T 2 ) = N (T ) if and only if R(T 2 ) = R(T ).
2.1.9 Let U, V be finite-dimensional vector spaces and S, T ∈ L(U, V ).
Show that

r(S + T ) ≤ r(S) + r(T ). (2.1.38)

2.1.10 Let U, V , W be finite-dimensional vector spaces over the same field


and T ∈ L(U, V ), S ∈ L(V , W ). Establish the rank and nullity
inequalities

r(S ◦ T ) ≤ min{r(S), r(T )}, n(S ◦ T ) ≤ n(S) + n(T ). (2.1.39)

2.1.11 Let A, B ∈ F(n, n) and A or B be nonsingular. Show that r(AB) =


min{r(A), r(B)}. In particular, if A = BC or A = CB where C ∈
F(n, n) is nonsingular, then r(A) = r(B).
2.1.12 Let A ∈ F(m, n) and B ∈ F(n, m). Prove that AB ∈ F(m, m) must be
singular when m > n.
2.1.13 Let T ∈ L(U, V ) and S ∈ L(V , W ) where dim(U ) = n and
dim(V ) = m. Prove that

r(S ◦ T ) ≥ r(S) + r(T ) − m. (2.1.40)

(This result is also known as the Sylvester inequality.)


2.1.14 Let U, V , W, X be finite-dimensional vector spaces over the same field
and T ∈ L(U, V ), S ∈ L(V , W ), R ∈ L(W, X). Establish the rank
inequality

r(R ◦ S) + r(S ◦ T ) ≤ r(S) + r(R ◦ S ◦ T ). (2.1.41)

(This result is also known as the Frobenius inequality.) Note that


(2.1.40) may be deduced as a special case of (2.1.41).

2.1.15 Let U be a vector space over a field F and {v1 , . . . , vk } and


{w1 , . . . , wl } be two sets of linearly independent vectors. That is,
the dimensions of the subspaces V = Span{v1 , . . . , vk } and W =
Span{w1 , . . . , wl } of U are k and l, respectively. Consider the subspace
of Fk+l defined by
⎧  ⎫
⎨   ⎬
 k l
S = (y1 , . . . , yk , z1 , . . . , zl ) ∈ Fk+l  yi vi + zj w j = 0 .
⎩  i=1 ⎭
j =1
(2.1.42)
Show that dim(S) = dim(V ∩ W ).
2.1.16 Let T ∈ L(U, V ) be invertible and U = U1 ⊕ · · · ⊕ Uk . Show that
there holds V = V1 ⊕ · · · ⊕ Vk where Vi = T (Ui ), i = 1, . . . , k.
2.1.17 Let U be finite-dimensional and T ∈ L(U ). Show that U = N (T ) ⊕
R(T ) if and only if N (T ) ∩ R(T ) = {0}.

2.2 Change of basis


It will be important to investigate how the associated matrix changes with
respect to a change of basis for a linear mapping.
Let {u1 , . . . , un } and {ũ1 , . . . , ũn } be two bases of the n-dimensional vector
space U over a field F. A change of bases from {u1 , . . . , un } to {ũ1 , . . . , ũn } is
simply a linear mapping R ∈ L(U, U ) such that

R(uj ) = ũj , j = 1, . . . , n. (2.2.1)

Following the study of the previous section, we know that R is invertible.


Moreover, if we rewrite (2.2.1) in a matrix form, we have

ũk = Σ_{j=1}^n bjk uj , k = 1, . . . , n, (2.2.2)

where the matrix B = (bj k ) ∈ F(n, n) is called the basis transition matrix
from the basis {u1 , . . . , un } to the basis {ũ1 , . . . , ũn }, which is necessarily
invertible.
Let V be an m-dimensional vector space over F and take T ∈ L(U, V ).
Let A = (aij ) and à = (ãij ) be the m × n matrices of the linear mapping T
associated with the pairs of the bases

{u1 , . . . , un } and {v1 , . . . , vm }, {ũ1 , . . . , ũn } and {v1 , . . . , vm },


(2.2.3)

of the spaces U and V , respectively. Thus, we have



T (uj ) = Σ_{i=1}^m aij vi , T (ũj ) = Σ_{i=1}^m ãij vi , j = 1, . . . , n. (2.2.4)
Combining (2.2.2) and (2.2.4), we obtain

Σ_{i=1}^m ãij vi = T (ũj ) = T ( Σ_{k=1}^n bkj uk )
= Σ_{k=1}^n Σ_{i=1}^m bkj aik vi
= Σ_{i=1}^m ( Σ_{k=1}^n aik bkj ) vi , j = 1, . . . , n. (2.2.5)
Therefore, we can read off the relation

ãij = Σ_{k=1}^n aik bkj , i = 1, . . . , m, j = 1, . . . , n, (2.2.6)
or
à = AB. (2.2.7)
Another way to arrive at (2.2.7) is to define T̃ ∈ L(U, V ) by

T̃ (uj ) = Σ_{i=1}^m ãij vi , j = 1, . . . , n. (2.2.8)
Then, with the mapping R ∈ L(U, U ) given in (2.2.1), the second relation in
(2.2.4) simply says T̃ = T ◦ R, which leads to (2.2.7) immediately.
Similarly, we may consider a change of basis in V . Let {v̂1 , . . . , v̂m } be
another basis of V which is related to the basis {v1 , . . . , vm } through an invert-
ible mapping S ∈ L(V , V ) with
S(vi ) = v̂i , i = 1, . . . , m, (2.2.9)
given by the basis transition matrix C = (cil ) ∈ F(m, m) so that

v̂i = Σ_{l=1}^m cli vl , i = 1, . . . , m. (2.2.10)

For T ∈ L(U, V ), let A = (aij ), Â = (âij ) ∈ F(m, n) be the matrices of T


associated with the pairs of the bases
{u1 , . . . , un } and {v1 , . . . , vm }, {u1 , . . . , un } and {v̂1 , . . . , v̂m },
(2.2.11)

of the spaces U and V , respectively. Thus, we have



T (uj ) = Σ_{i=1}^m aij vi = Σ_{i=1}^m âij v̂i , j = 1, . . . , n. (2.2.12)

Inserting (2.2.10) into (2.2.12), we obtain the relation



aij = Σ_{l=1}^m cil âlj , i = 1, . . . , m, j = 1, . . . , n. (2.2.13)

Rewriting (2.2.13) in its matrix form, we arrive at

A = C Â. (2.2.14)

As before, we may also define T̂ ∈ L(U, V ) by



T̂ (uj ) = Σ_{i=1}^m âij vi , j = 1, . . . , n. (2.2.15)

Then

(S ◦ T̂ )(uj ) = Σ_{i=1}^m âij S(vi ) = Σ_{i=1}^m âij v̂i , j = 1, . . . , n. (2.2.16)

In view of (2.2.12) and (2.2.16), we have S ◦ T̂ = T , which leads again to


(2.2.14).
An important special case is when U = V with dim(U ) = n. We investigate
how the associated matrices of an element T ∈ L(U, U ) with respect to two
bases are related through the transition matrix between the two bases. For this
purpose, let {u1 , . . . , un } and {ũ1 , . . . , ũn } be two bases of U related through
the mapping R ∈ L(U, U ) so that

R(uj ) = ũj , j = 1, . . . , n. (2.2.17)

Now let A = (aij ), Ã = (ãij ) ∈ F(n, n) be the associated matrices of T with


respect to the bases {u1 , . . . , un }, {ũ1 , . . . , ũn }, respectively. Thus

T (uj ) = Σ_{i=1}^n aij ui , T (ũj ) = Σ_{i=1}^n ãij ũi , j = 1, . . . , n. (2.2.18)

Define T̃ ∈ L(U, U ) so that



T̃ (uj ) = Σ_{i=1}^n ãij ui , j = 1, . . . , n. (2.2.19)

Then (2.2.17), (2.2.18), and (2.2.19) give us


R ◦ T̃ ◦ R −1 = T . (2.2.20)
Thus, if we choose B = (bij ) ∈ F(n, n) to represent the invertible mapping
(2.2.17), that is,

R(uj ) = Σ_{i=1}^n bij ui , j = 1, . . . , n, (2.2.21)

then (2.2.20) gives us the manner in which the two matrices A and à are
related:
A = B ÃB −1 . (2.2.22)
Such a relation spells out the following definition.

Definition 2.6 Two matrices A, B ∈ F(n, n) are said to be similar if there is


an invertible matrix C ∈ F(n, n) such that A = CBC −1 , written as A ∼ B.

It is clear that similarity of matrices is an equivalence relation. In other


words, we have A ∼ A, B ∼ A if A ∼ B, and A ∼ C if A ∼ B and B ∼ C,
for A, B, C ∈ F(n, n).
Consequently, we see that the matrices associated to a linear mapping of a
finite-dimensional vector space into itself with respect to different bases are
similar.
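The similarity relation (2.2.22) can be illustrated numerically; the sketch below assumes numpy and uses an arbitrarily chosen matrix A and transition matrix B.

```python
# Sketch: the matrices of one mapping T with respect to two bases are
# similar, A = B A~ B^{-1} as in (2.2.22), where B is the basis
# transition matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])               # matrix of T in the basis {u1, u2}
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])              # transition matrix to {u~1, u~2}

A_tilde = np.linalg.inv(B) @ A @ B       # matrix of T in the new basis
assert np.allclose(A, B @ A_tilde @ np.linalg.inv(B))   # A ~ A_tilde
```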

Exercises
2.2.1 Consider the differentiation operation D = d/dt as a linear mapping over
P (the vector space of all real polynomials of variable t) given by

D(p)(t) = dp(t)/dt , p ∈ P. (2.2.23)
(i) Find the matrix that represents D ∈ L(P2 , P1 ) with respect to the
standard bases
B^1_{P2} = {1, t, t^2 }, B^1_{P1} = {1, t} (2.2.24)
of P2 , P1 , respectively.
(ii) If the basis of P2 is changed into
B^2_{P2} = {t − 1, t + 1, (t − 1)(t + 1)}, (2.2.25)

find the basis transition matrix from the basis B^1_{P2} into the basis B^2_{P2} .

(iii) Obtain the matrix that represents D ∈ L(P2 , P1 ) with respect to
B^2_{P2} and B^1_{P1} directly first and then obtain it again by using the
results of (i) and (ii) in view of the relation stated as in (2.2.7).
(iv) If the basis of P1 is changed into

B^2_{P1} = {t − 1, t + 1}, (2.2.26)

obtain the matrix that represents D ∈ L(P2 , P1 ) with respect to the
bases B^1_{P2} and B^2_{P1} for P2 and P1 respectively.
(v) Find the basis transition matrix from B^1_{P1} into B^2_{P1} .
(vi) Obtain the matrix that represents D ∈ L(P2 , P1 ) with respect to
B^1_{P2} and B^2_{P1} directly first and then obtain it again by using the
results of (i) and (v) in view of the relation stated as in (2.2.14).
2.2.2 (Continued from Exercise 2.2.1) Now consider D as an element in
L(P2 , P2 ).
(i) Find the matrices that represent D with respect to B^1_{P2} and B^2_{P2} ,
respectively.
(ii) Use the basis transition matrix from the basis B^1_{P2} into the basis B^2_{P2}
of P2 to verify your results in (i) in view of the similarity relation
stated as in (2.2.22).
2.2.3 Let D : Pn → Pn be defined by (2.2.23) and consider the linear map-
ping T : Pn → Pn defined by

T (p) = tD(p) − p, p = p(t) ∈ Pn . (2.2.27)

Find N(T ) and R(T ) and show that Pn = N (T ) ⊕ R(T ).


2.2.4 Show that the real matrices
 
[ a b ; c d ] and (1/2)[ a+b+c+d  a−b+c−d ; a+b−c−d  a−b−c+d ] (2.2.28)

are similar through investigating the matrix representations of a certain


linear mapping over R2 under the standard basis {e1 , e2 } and the trans-
formed basis {u1 = e1 + e2 , u2 = e1 − e2 }, respectively.
2.2.5 Prove that the matrices
(aij )_{n×n} and (a_{n+1−i, n+1−j} )_{n×n} , the latter obtained from the former
by reversing the order of both the rows and the columns, (2.2.29)

in F(n, n) are similar by realizing them as the matrix representatives of


a certain linear mapping over Fn with respect to two appropriate bases
of Fn .

2.3 Adjoint mappings


Let U, V be finite-dimensional vector spaces over a field F and U' , V' their
dual spaces. For T ∈ L(U, V ) and v' ∈ V' , we see that

⟨T (u), v' ⟩, ∀u ∈ U, (2.3.1)

defines a linear functional over U . Hence there is a unique vector u' ∈ U' such
that

⟨u, u' ⟩ = ⟨T (u), v' ⟩, ∀u ∈ U. (2.3.2)

Of course, u' depends on T and v' . So we may write this relation as

u' = T' (v' ). (2.3.3)

Under such a notion, we can rewrite (2.3.2) as

⟨u, T' (v' )⟩ = ⟨T (u), v' ⟩, ∀u ∈ U, ∀v' ∈ V' . (2.3.4)

Thus, in this way we have constructed a mapping T' : V' → U' . We now
show that T' is linear.
In fact, let v'1 , v'2 ∈ V' . Then (2.3.4) gives us

⟨u, T' (v'i )⟩ = ⟨T (u), v'i ⟩, ∀u ∈ U, i = 1, 2. (2.3.5)

Thus

⟨u, T' (v'1 ) + T' (v'2 )⟩ = ⟨T (u), v'1 + v'2 ⟩, ∀u ∈ U. (2.3.6)

In view of (2.3.4) and (2.3.6), we arrive at T' (v'1 + v'2 ) = T' (v'1 ) + T' (v'2 ).
Besides, for any a ∈ F, we have from (2.3.4) that

⟨u, T' (av' )⟩ = ⟨T (u), av' ⟩ = a⟨T (u), v' ⟩ = a⟨u, T' (v' )⟩
= ⟨u, aT' (v' )⟩, ∀u ∈ U, ∀v' ∈ V' , (2.3.7)

which yields T' (av' ) = aT' (v' ). Thus the linearity of T' is established.

Definition 2.7 For any given T ∈ L(U, V ), the linear mapping T' : V' → U'
defined by (2.3.4) is called the adjoint mapping, or simply adjoint, of T .

Using the fact U'' = U, V'' = V , and the relation (2.3.4), we have

T'' ≡ (T' )' = T . (2.3.8)

It will be interesting to study the matrices associated with T and T' . For
this purpose, let {u1 , . . . , un } and {v1 , . . . , vm } be bases of U and V , and
{u'1 , . . . , u'n } and {v'1 , . . . , v'm } the corresponding dual bases, respectively. Take
A = (aij ) ∈ F(m, n) and A' = (a'kl ) ∈ F(n, m) such that

T (uj ) = Σ_{i=1}^m aij vi , j = 1, . . . , n, (2.3.9)

T' (v'l ) = Σ_{k=1}^n a'kl u'k , l = 1, . . . , m. (2.3.10)

Inserting u = uj and v' = v'i in (2.3.4), we have a'ji = aij (i = 1, . . . , m,
j = 1, . . . , n). In other words, we get A' = A^t . Thus the adjoint of a linear
mapping corresponds to the transpose of a matrix.
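Under the identification of (R^n)' with R^n via the standard pairing, the defining relation (2.3.4) reads ⟨Ax, y⟩ = ⟨x, A^t y⟩; the sketch below (assuming numpy, with randomly generated data) checks this numerically.

```python
# Sketch: with U = R^5, V = R^3 and the standard dual pairing given by
# the dot product, the adjoint of T_A is represented by the transpose
# A^t, since <A x, y> = <x, A^t y> for all x, y.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # T_A : R^5 -> R^3
x = rng.standard_normal(5)
y = rng.standard_normal(3)

assert np.isclose((A @ x) @ y, x @ (A.T @ y))
```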
We now investigate the relation between N (T ), R(T ) and N (T' ), R(T' ).

Theorem 2.8 Let U, V be finite-dimensional vector spaces and T ∈ L(U, V ).
Then N (T ), N (T' ), R(T ), R(T' ) and their annihilators enjoy the following
relations.

(1) N (T )^0 = R(T' ).
(2) N (T ) = R(T' )^0 .
(3) N (T' )^0 = R(T ).
(4) N (T' ) = R(T )^0 .

Proof We first prove (2). Let u ∈ N (T ). Then ⟨T' (v' ), u⟩ = ⟨v' , T (u)⟩ = 0
for any v' ∈ V' . So u ∈ R(T' )^0 . If u ∈ R(T' )^0 , then ⟨u' , u⟩ = 0 for any
u' ∈ R(T' ). So ⟨T' (v' ), u⟩ = 0 for any v' ∈ V' . That is, ⟨v' , T (u)⟩ = 0 for
any v' ∈ V' . Hence u ∈ N (T ). So (2) is established.
We now prove (1). If u' ∈ R(T' ), there is some v' ∈ V' such that u' =
T' (v' ). So ⟨u' , u⟩ = ⟨T' (v' ), u⟩ = ⟨v' , T (u)⟩ = 0 for any u ∈ N (T ). This
shows u' ∈ N (T )^0 . Hence R(T' ) ⊂ N (T )^0 . However, using (2) and (1.4.31),
we have

dim(N (T )) = dim(R(T' )^0 ) = dim(U' ) − dim(R(T' ))
= dim(U ) − dim(R(T' )), (2.3.11)

which then implies


dim(R(T  )) = dim(U ) − dim(N (T )) = dim(N (T )0 ). (2.3.12)
This proves N(T )0 = R(T  ).
Finally, (3) and (4) follow from replacing T by T  and using T  = T in (1)
and (2).

We note that (1) in Theorem 2.8 may also follow from taking the annihilators
on both sides of (2) and applying Exercise 1.4.7. Thus, part (2) of Theorem 2.8
is the core of all the statements there.
We now use Theorem 2.8 to establish the following result.

Theorem 2.9 Let U, V be finite-dimensional vector spaces. For any T ∈
L(U, V ), the rank of T and that of its dual T' ∈ L(V' , U' ) are equal.
That is,

r(T ) = r(T' ). (2.3.13)

Proof Using (2) in Theorem 2.8, we have n(T ) = dim(U ) − r(T' ). However,
applying the rank equation (2.1.23) or Theorem 2.3, we have n(T ) + r(T ) =
dim(U ). Combining these results, we have r(T ) = r(T' ).

As an important application of Theorem 2.8, we consider the notion of rank


of a matrix.
Let T ∈ L(Fn , Fm ) be defined by (2.1.6) by the matrix A = (aij ) ∈ F(m, n).
Then the images of the standard basis vectors e1 , . . . , en are the column vectors
of A. Thus R(T ) is the vector space spanned by the column vectors of A whose
dimension is commonly called the column rank of A, denoted as corank(A).
Thus, we have

r(T ) = corank(A). (2.3.14)

On the other hand, the associated matrix of T' is A^t , whose column vec-
tors are the row vectors of A. Hence R(T' ) is the vector space spanned by the
row vectors of A (since (F^m )' = F^m ) whose dimension is likewise called the
row rank of A, denoted as rorank(A). Thus, we have

r(T' ) = rorank(A). (2.3.15)

Consequently, in view of (2.3.14), (2.3.15), and Theorem 2.9, we see that


the column and row ranks of a matrix coincide, although they have different
meanings.
Since the column and row ranks of a matrix are identical, they are jointly
called the rank of the matrix.
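A quick numerical check (assuming numpy, with an arbitrarily chosen matrix) that the column rank and the row rank agree:

```python
# Sketch: the column rank and row rank of a matrix coincide, in line
# with r(T) = r(T') from Theorem 2.9; numerically, A and A^t have the
# same rank.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 2.0, 2.0]])

print(np.linalg.matrix_rank(A))          # column rank of A
print(np.linalg.matrix_rank(A.T))        # row rank of A; the same number
```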

Exercises

2.3.1 Let U be a finite-dimensional vector space over a field F and regard a
given element f ∈ U' as an element in L(U, F). Describe the adjoint of
f , namely, f' , as an element in L(F' , U' ), and verify r(f ) = r(f' ).
2.3.2 Let U, V , W be finite-dimensional vector spaces over a field F.
Show that for T ∈ L(U, V ) and S ∈ L(V , W ) there holds
(S ◦ T )' = T' ◦ S' .
2.3.3 Let U, V be finite-dimensional vector spaces over a field F and T ∈
L(U, V ). Prove that for given v ∈ V the non-homogeneous equation

T (u) = v, (2.3.16)

has a solution for some u ∈ U if and only if for any v' ∈ V' such that
T' (v' ) = 0 one has ⟨v, v' ⟩ = 0. In particular, the equation (2.3.16) has
a solution for any v ∈ V if and only if T' is injective. (This statement is
also commonly known as the Fredholm alternative.)
2.3.4 Let U be a finite-dimensional vector space over a field F and a ∈ F.
Define T ∈ L(U, U ) by T (u) = au where u ∈ U . Show that T' is then
given by T' (u' ) = au' for any u' ∈ U' .
2.3.5 Let U = F(n, n) and B ∈ U . Define T ∈ L(U, U ) by T (A) = AB−BA
where A ∈ U . For f ∈ U' given by

f (A) = ⟨f, A⟩ = Tr(A), A ∈ U, (2.3.17)

where Tr(A), or trace of A, is the sum of the diagonal entries of A,
determine T' (f ).
2.3.6 Let U = F(n, n). Then f (A) = Tr(AB^t ) defines an element f ∈ U' .
(i) Use Mij to denote the element in F(n, n) whose entry in the ith row
and j th column is 1 but all other entries are zero. Verify the formula

Tr(Mij Mkl ) = δil δjk , i, j, k, l = 1, . . . , n. (2.3.18)

(ii) Apply (i) to show that for any f ∈ U' there is a unique element
B ∈ U such that f (A) = Tr(AB^t ).
(iii) For T ∈ L(U, U ), defined by T (A) = A^t , describe T' .

2.4 Quotient mappings


Let U, V be two vector spaces over a field F and T ∈ L(U, V ). Suppose that
X, Y are subspaces of U, V , respectively, which satisfy the property T (X) ⊂ Y

or T ∈ L(X, Y ). We now show that such a property allows us to generate a


linear mapping
T̃ : U/X → V /Y, (2.4.1)
from T naturally.
As before, we use [·] to denote a coset in U/X or V /Y . Define T̃ : U/
X → V /Y by setting
T̃ ([u]) = [T (u)], ∀[u] ∈ U/X. (2.4.2)
We begin by showing that this definition does not suffer any ambiguity by
verifying
T̃ ([u1 ]) = T̃ ([u2 ]) whenever [u1 ] = [u2 ]. (2.4.3)
In fact, if [u1 ] = [u2 ], then u1 −u2 ∈ X. Thus T (u1 )−T (u2 ) = T (u1 −u2 ) ∈ Y ,
which implies [T (u1 )] = [T (u2 )]. So (2.4.3) follows.
The linearity of T̃ can now be checked directly.
First, let u1 , u2 ∈ U . Then, by (2.4.2), we have
T̃ ([u1 ] + [u2 ]) = T̃ ([u1 + u2 ]) = [T (u1 + u2 )] = [T (u1 ) + T (u2 )]
= [T (u1 )] + [T (u2 )] = T̃ ([u1 ]) + T̃ ([u2 ]). (2.4.4)
Next, let a ∈ F and u ∈ U . Then, again by (2.4.2), we have
T̃ (a[u]) = T̃ ([au]) = [T (au)] = a[T (u)] = a T̃ ([u]). (2.4.5)
It is not hard to show that the property T (X) ⊂ Y is also necessary to
ensure that (2.4.2) gives us a well-defined mapping T̃ from U/X into V /Y for
T ∈ L(U, V ). Indeed, let u ∈ X. Then [u] = [0]. Thus [T (u)] = T̃ ([u]) =
T̃ ([0]) = [0] which implies T (u) ∈ Y .
In summary, we can state the following basic theorem regarding construct-
ing quotient mappings between quotient spaces.

Theorem 2.10 Let X, Y be subspaces of U, V , respectively, and T ∈ L(U, V ).


Then (2.4.2) defines a linear mapping T̃ from U/X into V /Y if and only if the
condition
T (X) ⊂ Y (2.4.6)
holds.

In the special situation when X = {0} ⊂ U and Y is an arbitrary subspace


of V , we see that U/X = U and any T ∈ L(U, V ) induces the quotient mapping
T̃ : U → V /Y, T̃ (u) = [T (u)]. (2.4.7)

Exercises

2.4.1 Let V , W be subspaces of a vector space U . Then the quotient mapping


I˜ : U/V → U/W induced from the identity mapping I : U → U is
well-defined and given by
I˜([u]V ) = [u]W , u ∈ U, (2.4.8)
if V ⊂ W , where [·]V and [·]W denote the cosets in U/V and
U/W , respectively. Show that the fact I˜([0]V ) = [0]W implies that if
[u1 ]V , . . . , [uk ]V are linearly dependent in U/V then [u1 ]W , . . . , [uk ]W
are linearly dependent in U/W .
2.4.2 Let U be a finite-dimensional vector space and V a subspace of U . Use I˜
to denote the quotient mapping I˜ : U → U/V induced from the identity
mapping I : U → U given as in (2.4.7). Apply Theorem 2.3 or the rank
equation (2.1.23) to I˜ to reestablish the relation (1.6.12).
2.4.3 Let U, V be finite-dimensional vector spaces over a field and X, Y the
subspaces of U, V , respectively. For T ∈ L(U, V ), assume T (X) ⊂ Y
and use T̃ to denote the quotient mapping from U/X into V /Y induced
from T . Show that r(T̃ ) ≤ r(T ).

2.5 Linear mappings from a vector space into itself


Let U be a vector space over a field F. In this section, we study the important
special situation when linear mappings are from U into itself. We shall denote
the space L(U, U ) simply by L(U ). Such mappings are also often called linear
transformations. First, we consider some general properties such as invariance
and reducibility. Then we present some examples.

2.5.1 Invariance and reducibility


In this subsection, we consider some situations in which the complexity of a
linear mapping may be ‘reduced’ somewhat.

Definition 2.11 Let T ∈ L(U ) and V be a subspace of U . We say that V is an


invariant subspace of T if T (V ) ⊂ V .

Given T ∈ L(U ), it is clear that the null-space N (T ) and range R(T ) of T


are both invariant subspaces of T .
To see how the knowledge about an invariant subspace reduces the complex-
ity of a linear mapping, we assume that V is a nontrivial invariant subspace of

T ∈ L(U ) where U is n-dimensional. Let {u1 , . . . , uk } be any basis of V . We


extend it to get a basis of U , say {u1 , . . . , uk , uk+1 , . . . , un }. With respect to
such a basis, we have


T (ui ) = Σ_{i'=1}^k b_{i'i} u_{i'} , i = 1, . . . , k,
T (uj ) = Σ_{j'=1}^n c_{j'j} u_{j'} , j = k + 1, . . . , n, (2.5.1)

where B = (b_{i'i} ) ∈ F(k, k) and C = (c_{j'j} ) ∈ F(n, n − k). With respect to
this basis, the associated matrix A ∈ F(n, n) becomes

A = [ B C1 ; 0 C2 ], [ C1 ; C2 ] = C. (2.5.2)
A= , = C. (2.5.2)
0 C2 C2

Such a matrix is sometimes referred to as boxed upper triangular.


Thus, we see that a linear mapping T over a finite-dimensional vector space
U has a nontrivial invariant subspace if and only if there is a basis of U so that
the associated matrix of T with respect to this basis is boxed upper triangular.
For the matrix A given in (2.5.2), the vanishing of the entries in the left-
lower portion of the matrix indeed reduces the complexity of the matrix. We
have seen clearly that such a ‘reduction’ happens because of the invariance
property

T (Span{u1 , . . . , uk }) ⊂ Span{u1 , . . . , uk }. (2.5.3)

Consequently, if we also have the following additionally imposed invariance


property

T (Span{uk+1 , . . . , un }) ⊂ Span{uk+1 , . . . , un }, (2.5.4)

then cj  j = 0 for j  = 1, . . . , k in (2.5.1) or C1 = 0 in (2.5.2), which further


reduces the complexity of the matrix A.
The above investigation motivates the introduction of the concept of
reducibility of a linear mapping as follows.

Definition 2.12 We say that T ∈ L(U ) is reducible if there are nontrivial


invariant subspaces V , W of T so that U = V ⊕ W . In such a situation, we
also say that T may be reduced over the subspaces V and W . Otherwise, if no
such pair of invariant subspaces exist, we say that T is irreducible.

Thus, if U is n-dimensional and T ∈ L(U ) may be reduced over the non-


trivial subspaces V and W , then we take {v1 , . . . , vk } and {w1 , . . . , wl } to
be any bases of V and W , so that {v1 , . . . , vk , w1 , . . . , wl } is a basis of U .

Over such a basis, T has the representation


k 
l
T (vi ) = bi  i vi  , i = 1, . . . , k, T (wj ) = cj  j wj  , j = 1, . . . , l,
i  =1 j  =1
(2.5.5)

where the matrices B = (bi  i ) and C = (cj  j ) are in F(k, k) and F(l, l),
respectively. Thus, we see that, with respect to such a basis of U , the asso-
ciated matrix $A \in \mathbb{F}(n, n)$ assumes the form
$$A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix}, \tag{2.5.6}$$

which takes the form of a special kind of matrices called boxed diagonal
matrices.
Thus, we see that a linear mapping T over a finite-dimensional vector space
U is reducible if and only if there is a basis of U so that the associated matrix
of T with respect to this basis is boxed diagonal.
An important and useful family of invariant subspaces are called
eigenspaces which may be viewed as an extension of the notion of null-spaces.

Definition 2.13 For T ∈ L(U ), a scalar λ ∈ F is called an eigenvalue


of T if the null-space N (T − λI ) is not the zero space {0}. Any nonzero
vector in N(T − λI ) is called an eigenvector associated with the eigenvalue
λ and N(T − λI ) is called the eigenspace of T associated with the eigen-
value λ, and often denoted as Eλ . The integer dim(Eλ ) is called the geometric
multiplicity of the eigenvalue λ.
In particular, for A ∈ F(n, n), an eigenvalue, eigenspace, and the eigen-
vectors associated with and geometric multiplicity of an eigenvalue of the
matrix A are those of the A-induced mapping TA : Fn → Fn , defined by
TA (x) = Ax, x ∈ Fn .

Let λ be an eigenvalue of T ∈ L(U ). It is clear that Eλ is invariant under


T and T = λI (I is the identity mapping) over Eλ . Thus, let {u1 , . . . , uk }
be a basis of Eλ and extend it to obtain a basis for the full space U . Then
the associated matrix A of T with respect to this basis takes the boxed upper
triangular form
$$A = \begin{pmatrix} \lambda I_k & C_1 \\ 0 & C_2 \end{pmatrix}, \tag{2.5.7}$$

where Ik denotes the identity matrix in F(k, k).


We may explore the concept of eigenspaces to pursue a further reduction of


the associated matrix of a linear mapping.

Theorem 2.14 Let u1 , . . . , uk be eigenvectors associated with distinct eigen-


values λ1 , . . . , λk of some T ∈ L(U ). Then these vectors are linearly
independent.

Proof We use induction on k. If k = 1, the statement of the theorem is already


valid since u1 ≠ 0. Assume the statement of the theorem is valid for k = m ≥ 1.
For k = m + 1, consider the relation

c1 u1 + · · · + cm um + cm+1 um+1 = 0, c1 , . . . , cm , cm+1 ∈ F. (2.5.8)

Applying T to (2.5.8), we have

c1 λ1 u1 + · · · + cm λm um + cm+1 λm+1 um+1 = 0. (2.5.9)

Multiplying (2.5.8) by λm+1 and subtracting the result from (2.5.9), we obtain

c1 (λ1 − λm+1 )u1 + · · · + cm (λm − λm+1 )um = 0. (2.5.10)

Thus, by the assumption that u1 , . . . , um are linearly independent, we get c1 =


· · · = cm = 0 since λ1 , . . . , λm , λm+1 are distinct. Inserting this result into
(2.5.8), we find cm+1 = 0. So the proof follows.

As a corollary of the theorem, we immediately conclude that a linear


mapping over an n-dimensional vector space may have at most n distinct
eigenvalues.
If λ1 , . . . , λk are the distinct eigenvalues of T , then Theorem 2.14 indicates
that the sum of Eλ1 , . . . , Eλk is a direct sum,

Eλ1 + · · · + Eλk = Eλ1 ⊕ · · · ⊕ Eλk . (2.5.11)

Thus, the equality

dim(Eλ1 ) + · · · + dim(Eλk ) = dim(U ) = n (2.5.12)

holds if and only if

U = Eλ1 ⊕ · · · ⊕ Eλk . (2.5.13)

In this situation T is reducible over Eλ1 , . . . , Eλk and may naturally be


expressed as a direct sum of mappings

T = λ1 I ⊕ · · · ⊕ λk I, (2.5.14)
where λi I is understood to operate on Eλi , i = 1, . . . , k. Furthermore, using


the bases of Eλ1 , . . . , Eλk to form a basis of U , we see that the associated
matrix A of T becomes diagonal

A = diag{λ1 In1 , . . . , λk Ink }, (2.5.15)

where ni = dim(Eλi ) (i = 1, . . . , k). In particular, when T has n distinct


eigenvalues, λ1 , . . . , λn , then any n eigenvectors correspondingly associated
with these eigenvalues form a basis of U . With respect to this basis, the asso-
ciated matrix A of T is simply

A = diag{λ1 , . . . , λn }. (2.5.16)

We now present some examples as illustrations.


Consider $T \in L(\mathbb{R}^2)$ defined by
$$T(x) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \tag{2.5.17}$$

Then it may be checked directly that T has 1 and 3 as eigenvalues and

E1 = Span{(1, −1)t }, E3 = Span{(1, 1)t }. (2.5.18)

Thus, with respect to the basis $\{(1, -1)^t, (1, 1)^t\}$, the associated matrix of T is
$$\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}. \tag{2.5.19}$$

So T is reducible and reduced over the pair of eigenspaces E1 and E3 .
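This small example can be verified numerically; the Python sketch below (numpy) checks the eigenvalues and the diagonal form obtained in (2.5.19).

    import numpy as np

    # Numerical check of (2.5.17)-(2.5.19).
    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    vals, vecs = np.linalg.eig(A)
    print(np.sort(vals))                       # [1. 3.]

    P = np.array([[1.0, 1.0],                  # columns (1, -1)^t and (1, 1)^t
                  [-1.0, 1.0]])
    print(np.linalg.inv(P) @ A @ P)            # diag(1, 3)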


Next, consider $S \in L(\mathbb{R}^2)$ defined by
$$S(x) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \tag{2.5.20}$$

We show that S has no nontrivial invariant spaces. In fact, if V is one, then


dim(V ) = 1. Let x ∈ V such that x ≠ 0. Then the invariance of V requires
S(x) = λx for some λ ∈ R. Inserting this relation into (2.5.20), we obtain

−x2 = λx1 , x1 = λx2 . (2.5.21)

Hence x1 ≠ 0, x2 ≠ 0. Iterating the two equations in (2.5.21), we obtain
λ² + 1 = 0, which is impossible. So S is also irreducible.
2.5.2 Projections
In this subsection, we study an important family of reducible linear mappings
called projections.

Definition 2.15 Let V and W be two complementary subspaces of U . That is,


U = V ⊕ W . For any u ∈ U , express u uniquely as u = v + w, v ∈ V , w ∈ W ,
and define the mapping P : U → U by
P (u) = v. (2.5.22)
Then P ∈ L(U ) and is called the projection of U onto V along W .

We need to check that the mapping P defined in Definition 2.15 is indeed


linear. To see this, we take u1 , u2 ∈ U and express them as u1 = v1 +w1 , u2 =
v2 + w2 , for unique v1 , v2 ∈ V , w1 , w2 ∈ W . Hence P (u1 ) = v1 , P (u2 ) = v2 .
On the other hand, from u1 +u2 = (v1 +v2 )+(w1 +w2 ), we get P (u1 +u2 ) =
v1 + v2 . Thus P (u1 + u2 ) = P (u1 ) + P (u2 ). Moreover, for any a ∈ F and
u ∈ U , write u = v + w for unique v ∈ V , w ∈ W . Thus P (u) = v and
au = av + aw give us P (au) = av = aP (u). So P ∈ L(U ) as claimed.
From Definition 2.15, we see that for v ∈ V we have P (v) = v. Thus
P (P (u)) = P (u) for any u ∈ U . In other words, the projection P satisfies the
special property P ◦ P = P . For notational convenience, we shall use T k to
denote the k-fold composition T ◦· · ·◦T for any T ∈ L(U ). With this notation,
we see that a projection P satisfies the condition P 2 = P . Any linear mapping
satisfying such a condition is called idempotent.
We now show that being idempotent characterizes a linear mapping being a
projection.

Theorem 2.16 A linear mapping T over a vector space U is idempotent if and


only if it is a projection. More precisely, if T 2 = T and
V = N (I − T ), W = N (T ), (2.5.23)
then U = V ⊕ W and T is simply the projection of U onto V along W .

Proof Let T be idempotent and define the subspaces V , W by (2.5.23). We


claim
R(T ) = N (I − T ), R(I − T ) = N (T ). (2.5.24)
In fact, if u ∈ R(T ), then there is some x ∈ U such that u = T (x). Hence
(I − T )u = (I − T )(T (x)) = T (x) − T 2 (x) = 0. So u ∈ N (I − T ). If
u ∈ N(I − T ), then u = T (u) which implies u ∈ R(T ) already. If u ∈
R(I − T ), then there is some x ∈ U such that u = (I − T )(x). So T (u) =


T ◦(I −T )(x) = (T −T 2 )(x) = 0 and u ∈ N (T ). If u ∈ N (T ), then T (u) = 0
which allows us to rewrite u as u = (I − T )(u). Thus u ∈ R(I − T ).
Now consider the identity

I = T + (I − T ). (2.5.25)

Applying (2.5.25) to U and using (2.5.24), we obtain U = V + W . Let u ∈


V ∩ W . Using the definition of V , W in (2.5.23), we get T (u) − u = 0 and
T (u) = 0. Thus u = 0. So U = V ⊕ W .
Finally, for any u ∈ U , express u as u = v + w for some unique v ∈ V
and w ∈ W and let P ∈ L(U ) be the projection of U onto V along W . Then
(2.5.23) indicates that T (v) = v and T (w) = 0. So T (u) = T (v + w) =
T (v) + T (w) = v = P (u). That is, T = P and the proof follows.

An easy but interesting consequence of Theorem 2.16 is the following.

Theorem 2.17 Let V , W be complementary subspaces of U . Then P ∈ L(U )


is the projection of U onto V along W if and only if I − P is the projection of
U onto W along V .

Proof Since
(I − P )2 = I − 2P + P 2 , (2.5.26)

we see that (I −P )2 = I −P if and only if P 2 = P . The rest of the conclusion


follows from (2.5.23).

From (2.5.23), we see that when T ∈ L(U ) is idempotent, it is reducible


over the null-spaces N (T ) and N (I − T ). It may be shown that the converse
is true, which is assigned as an exercise.
Recall that the null-spaces N (T ) and N (I − T ), if nontrivial, of an idempo-
tent mapping T ∈ L(U ), are simply the eigenspaces E0 and E1 of T associated
with the eigenvalue 0 and 1, respectively. So, with respect to a basis of U con-
sisting of the vectors in any bases of E0 and E1 , the associated matrix A of T
is of the form
$$A = \begin{pmatrix} 0_k & 0 \\ 0 & I_l \end{pmatrix}, \tag{2.5.27}$$

where 0k is the zero matrix in F(k, k), k = dim(N (T )) = n(T ), l =


dim(N(I − T )) = r(T ).
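The construction in Definition 2.15 is also easy to carry out in coordinates. The Python sketch below (numpy, with hypothetical basis vectors for V and W) assembles the matrix of the projection of R^3 onto V along W and confirms the idempotent property of Theorem 2.16.

    import numpy as np

    # Hypothetical splitting R^3 = V ⊕ W with V = span{v1, v2}, W = span{w1}.
    v1 = np.array([1.0, 0.0, 1.0])
    v2 = np.array([0.0, 1.0, 1.0])
    w1 = np.array([1.0, 1.0, 0.0])
    B = np.column_stack([v1, v2, w1])          # basis adapted to V ⊕ W

    # In the adapted basis the projection keeps the V-coordinates and kills
    # the W-coordinate, i.e. it is diag(1, 1, 0); transform back.
    P = B @ np.diag([1.0, 1.0, 0.0]) @ np.linalg.inv(B)

    assert np.allclose(P @ P, P)               # idempotent: P^2 = P
    assert np.allclose(P @ v1, v1) and np.allclose(P @ v2, v2)
    assert np.allclose(P @ w1, 0.0)            # W is annihilated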
Let T ∈ L(U ) and λ1 , . . . , λk be some distinct eigenvalues of T such that


T reduces over Eλ1 , . . . , Eλk . Use Pi to denote the projection of U onto Eλi
along
$$E_{\lambda_1} \oplus \cdots \oplus \widehat{E_{\lambda_i}} \oplus \cdots \oplus E_{\lambda_k}, \tag{2.5.28}$$
where $\widehat{[\,\cdot\,]}$ indicates the item that is missing in the expression. Then we can
represent T as

T = λ1 P1 + · · · + λk Pk . (2.5.29)

2.5.3 Nilpotent mappings


Consider the vector space Pn of the set of all polynomials of degrees up to n
with coefficients in a field F and the differentiation operator
$$D = \frac{d}{dt} \quad\text{so that}\quad D(a_0 + a_1 t + \cdots + a_n t^n) = a_1 + 2a_2 t + \cdots + n a_n t^{n-1}. \tag{2.5.30}$$

Then D n+1 = 0 (zero mapping). Such a linear mapping D : Pn → Pn is an


example of nilpotent mappings we now study.

Definition 2.18 Let U be a finite-dimensional vector space and T ∈ L(U ).


We say that T is nilpotent if there is an integer k ≥ 1 such that T k = 0. For a
nilpotent mapping T ∈ L(U ), the smallest integer k ≥ 1 such that T k = 0 is
called the degree or index of nilpotence of T .

The same definition may be stated for square matrices.


Of course, the degree of a nonzero nilpotent mapping is always at least 2.

Definition 2.19 Let U be a vector space and T ∈ L(U ). For any nonzero
vector u ∈ U , we say that u is T -cyclic if there is an integer m ≥ 1 such that
T m (u) = 0. The smallest such integer m is called the period of u under or
relative to T . If each vector in U is T -cyclic, T is said to be locally nilpotent.

It is clear that a nilpotent mapping must be locally nilpotent. In fact, these


two notions are equivalent in finite dimensions.

Theorem 2.20 If U is finite-dimensional, then a mapping T ∈ L(U ) is nilpo-


tent if and only if it is locally nilpotent.
Proof Suppose that dim(U ) = n ≥ 1 and T ∈ L(U ) is locally nilpotent. Let


{u1 , . . . , un } be any basis of U and m1 , . . . , mn be the periods of u1 , . . . , un ,
respectively. Set
k = max{m1 , . . . , mn }. (2.5.31)
Then it is seen that T is nilpotent of degree k.
We now show how to use a cyclic vector to generate an invariant subspace.

Theorem 2.21 For T ∈ L(U ), let u ∈ U be a nonzero cyclic vector under T


of period m. Then the vectors
u, T (u), . . . , T m−1 (u), (2.5.32)
are linearly independent so that they span an m-dimensional T -invariant sub-
space of U . In particular, m ≤ dim(U ).

Proof We only need to show that the set of vectors in (2.5.32) are linearly
independent. If m = 1, the statement is self-evident. So we now assume m ≥ 2.
Let c0 , . . . , cm−1 ∈ F so that
c0 u + c1 T (u) + · · · + cm−1 T m−1 (u) = 0. (2.5.33)
Assume that there is some l = 0, . . . , m − 1 such that $c_l \ne 0$. Let l ≥ 0 be the
smallest such integer. Then $l \le m - 2$ so that (2.5.33) gives us
$$T^l(u) = -\frac{1}{c_l}\left(c_{l+1} T^{l+1}(u) + \cdots + c_{m-1} T^{m-1}(u)\right). \tag{2.5.34}$$
This relation implies that
$$T^{m-1}(u) = T^{l + ((m-1)-l)}(u) = -\frac{1}{c_l}\left(c_{l+1} T^{m}(u) + \cdots + c_{m-1} T^{2(m-1)-l}(u)\right) = 0, \tag{2.5.35}$$
which contradicts the fact that $T^{m-1}(u) \ne 0$.
We can apply the above results to study the reducibility of a nilpotent
mapping.

Theorem 2.22 Let U be a finite-dimensional vector space and T ∈ L(U )


a nilpotent mapping of degree k. If T is reducible over V and W , that is,
U = V ⊕ W , T (V ) ⊂ V , and T (W ) ⊂ W , then
k ≤ max{dim(V ), dim(W )}. (2.5.36)
In particular, if k = dim(U ), then T is irreducible.
Proof Let {v1 , . . . , vn1 } and {w1 , . . . , wn2 } be bases of V and W ,


respectively. Then, by Theorem 2.21, the periods of these vectors cannot
exceed the right-hand side of (2.5.36). Thus (2.5.36) follows from (2.5.31).
If k = dim(U ), then dim(V ) = k and dim(W ) = 0 or dim(V ) = 0 and
dim(W ) = k, which shows that T is irreducible.

Let dim(U ) = n and T ∈ L(U ) be nilpotent of degree n. Then there is a


vector u ∈ U of period n. Set

U1 = Span{T n−1 (u)}, . . . , Un−1 = Span{T (u), . . . , T n−1 (u)}. (2.5.37)

Then U1 , . . . , Un−1 are nontrivial invariant subspaces of T satisfying U1 ⊂


· · · ⊂ Un−1 and dim(U1 ) = 1, . . . , dim(Un−1 ) = n − 1. Thus, T has invariant
subspaces of all possible dimensions, but yet is irreducible.
Again, let dim(U ) = n and T ∈ L(U ) be nilpotent of degree n. Choose a
cyclic vector u ∈ U of period n and set

u1 = u, . . . , un = T n−1 (u). (2.5.38)

Then, with respect to the basis B = {u1 , . . . , un }, we have

T (ui ) = ui+1 , i = 1, . . . , n − 1, T (un ) = 0. (2.5.39)

Hence, if we use S = (sij ) to denote the matrix of T with respect to the basis
B, then
$$s_{ij} = \begin{cases} 1, & j = i-1, \ i = 2, \dots, n, \\ 0, & \text{otherwise}. \end{cases} \tag{2.5.40}$$

That is,
$$S = \begin{pmatrix} 0 & 0 & \cdots & \cdots & 0 \\ 1 & 0 & \cdots & \cdots & 0 \\ \vdots & \ddots & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & \cdots & 1 & 0 \end{pmatrix}. \tag{2.5.41}$$

Alternatively, we may also set

u1 = T n−1 (u), . . . , un = u. (2.5.42)


Then with respect to this reordered basis the matrix of T becomes
$$S = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ \vdots & & & 0 & 1 \\ 0 & \cdots & \cdots & 0 & 0 \end{pmatrix}. \tag{2.5.43}$$

The n × n matrix S expressed in either (2.5.41) or (2.5.43) is also called a


shift matrix, which clearly indicates that T is nilpotent of degree n.
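As a quick numerical illustration (Python/numpy, with the hypothetical size n = 4), the shift matrix of (2.5.43) indeed satisfies S^(n-1) ≠ 0 and S^n = 0.

    import numpy as np

    n = 4
    S = np.eye(n, k=1)                         # 1's on the superdiagonal
    print(np.linalg.matrix_power(S, n - 1))    # nonzero
    print(np.linalg.matrix_power(S, n))        # the zero matrix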
In general, we shall see later that, if T ∈ L(U ) is nilpotent and U is finite-
dimensional, there are T -invariant subspaces U1 , . . . , Ul such that

U = U1 ⊕ · · · ⊕ Ul , dim(U1 ) = n1 , . . . , dim(Ul ) = nl , (2.5.44)

and the degree of T restricted to Ui is exactly ni (i = 1, . . . , l). Thus, we can


choose T -cyclic vectors of respective periods n1 , . . . , nl , say u1 , . . . , ul , and
use the vectors

u1 , · · · , T n1 −1 (u1 ), . . . , ul , . . . , T nl −1 (ul ), (2.5.45)

say, as a basis of U. With respect to such a basis, the matrix S of T is
$$S = \begin{pmatrix} S_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & S_l \end{pmatrix}, \tag{2.5.46}$$

where each Si is an ni × ni shift matrix (i = 1, . . . , l).


Let k be the degree of T or the matrix S. From (2.5.46), we see clearly that
the relation

k = max{n1 , . . . , nl } (2.5.47)

holds.

2.5.4 Polynomials of linear mappings


In this section, we have seen that it is often useful to consider various powers of
a linear mapping T ∈ L(U ) as well as some linear combinations of appropriate
powers of T . These manipulations motivate the introduction of the notion of


polynomials of linear mappings. Specifically, for any p(t) ∈ P with the form

p(t) = an t n + · · · + a1 t + a0 , a0 , a1 , . . . , an ∈ F, (2.5.48)

we define p(T ) ∈ L(U ) to be the linear mapping over U given by

p(T ) = an T n + · · · + a1 T + a0 I. (2.5.49)

It is straightforward to check that all usual operations over polynomials in


variable t can be carried over correspondingly to those over polynomials in
the powers of a linear mapping T over the vector space U . For example, if
f, g, h ∈ P satisfy the relation f (t) = g(t)h(t), then

f (T ) = g(T )h(T ), (2.5.50)

because the powers of T follow the same rule as the powers of t. That is,
T k T l = T k+l , k, l ∈ N.
For T ∈ L(U ), let λ ∈ F be an eigenvalue of T . Then, for any p(t) ∈ P
given as in (2.5.48), p(λ) is an eigenvalue of p(T ). To see this, we assume that
u ∈ U is an eigenvector of T associated with λ. We have

p(T )(u) = (an T n + · · · + a1 T + a0 I )(u)


= (an λn + · · · + a1 λ + a0 )(u) = p(λ)u, (2.5.51)

as anticipated.
If p ∈ P is such that p(T ) = 0, then we say that T is a root of p. Hence, if
T is a root of p, any eigenvalue λ of T must also be a root of p, p(λ) = 0, by
virtue of (2.5.51).
For example, an idempotent mapping is a root of the polynomial p(t) =
t 2 − t, and a nilpotent mapping is a root of a polynomial of the form p(t) = t m
(m ∈ N). Consequently, the eigenvalues of an idempotent mapping can only
be 0 and 1, and that of a nilpotent mapping, 0.
For T ∈ L(U ), let λ1 , . . . , λk be some distinct eigenvalues of T
such that T reduces over Eλ1 , . . . , Eλk . Then T must be a root of the
polynomial

p(t) = (t − λ1 ) · · · (t − λk ). (2.5.52)

To see this, we rewrite any u ∈ U as u = u1 + · · · + uk where ui ∈ Eλi ,


$i = 1, \dots, k$. Then
$$p(T)u = \sum_{i=1}^{k} p(T) u_i = \sum_{i=1}^{k} (T - \lambda_1 I) \cdots \widehat{(T - \lambda_i I)} \cdots (T - \lambda_k I)\,(T - \lambda_i I)\, u_i = 0, \tag{2.5.53}$$

which establishes p(T ) = 0, as claimed.


It is clear that the polynomial p(t), given in (2.5.52), is the lowest-degree
nontrivial polynomial among all polynomials for which T is a root, which is
often referred to as the minimal polynomial of T , a basic notion to be detailed
later.
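A small numerical check of (2.5.52)-(2.5.53) (Python/numpy), reusing the 2 × 2 example (2.5.17) whose eigenvalues are 1 and 3, shows that the matrix is annihilated by p(t) = (t − 1)(t − 3).

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    I = np.eye(2)
    print((A - 1.0 * I) @ (A - 3.0 * I))       # the zero matrix, so p(A) = 0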

Exercises

2.5.1 Let S, T ∈ L(U ) and S ◦ T = T ◦ S. Show that the null-space N (S)


and range R(S) of S are invariant subspaces under T . In particular, an
eigenspace of T associated with eigenvalue λ is seen to be invariant
under T when taking S = T − λI .
2.5.2 Let S, T ∈ L(U ). Prove that the invertibility of the mappings I + S ◦ T
and I + T ◦ S are equivalent by showing that if I + S ◦ T is invertible
then so is I + T ◦ S with

(I + T ◦ S)−1 = I − T ◦ (I + S ◦ T )−1 ◦ S. (2.5.54)

2.5.3 The rotation transformation over R2 , denoted by Rθ ∈ L(R2 ), is given


by
$$R_\theta(x) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \tag{2.5.55}$$

Show that Rθ has no nontrivial invariant subspace in R2 unless θ = kπ


(k ∈ Z).
2.5.4 Consider the vector space P of the set of all polynomials over a field F.
Define T ∈ L(P) by setting

T (p) = tp(t), ∀p(t) = a0 + a1 t + · · · + an t n ∈ P. (2.5.56)

Show that T cannot have an eigenvalue.


2.5.5 Let $A = (a_{ij}) \in \mathbb{F}(n, n)$ be invertible and satisfy
$$\sum_{j=1}^{n} a_{ij} = a \in \mathbb{F}, \quad i = 1, \dots, n. \tag{2.5.57}$$
(i) Show that a must be an eigenvalue of A and that a ≠ 0.


(ii) Show that if $A^{-1} = (b_{ij})$ then
$$\sum_{j=1}^{n} b_{ij} = \frac{1}{a}, \quad i = 1, \dots, n. \tag{2.5.58}$$

2.5.6 Let S, T ∈ L(U ) and assume that S, T are similar. That is, there is an
invertible element R ∈ L(U ) such that S = R ◦ T ◦ R −1 . Show that
λ ∈ F is an eigenvalue of S if and only if it is an eigenvalue of T .
2.5.7 Let U = F(n, n) and T ∈ L(U ) be defined by taking matrix transpose,

T (A) = At , A ∈ U. (2.5.59)

Show that both ±1 are the eigenvalues of T and identify E1 and E−1 .
Prove that T is reducible over the pair E1 and E−1 . Can T have an
eigenvalue different from ±1? What is the minimal polynomial of T ?
2.5.8 Let U be a vector space over a field F and T ∈ L(U ). If λ1 , λ2 ∈
F are distinct eigenvalues and u1 , u2 are the respectively associated
eigenvectors of T , show that, for any nonzero a1 , a2 ∈ F, the vector
u = a1 u1 + a2 u2 cannot be an eigenvector of T .
2.5.9 Let U be a finite-dimensional vector space over a field F and T ∈
L(U ). Assume that T is of rank 1.
(i) Prove that T must have an eigenvalue, say λ, in F.
(ii) If λ ≠ 0, show that R(T ) = Eλ .
(iii) Is the statement in (ii), i.e. R(T ) = Eλ , valid when λ = 0?
2.5.10 Let A ∈ C(n, n) be unitary. That is, AA† = A† A = In . Show that any
eigenvalue of A is of unit modulus.
2.5.11 Show that, if T ∈ L(U ) is reducible over the pair of null-spaces N (T )
and N(I − T ), then T is idempotent.
2.5.12 Let S, T ∈ L(U ) be idempotent. Show that
(i) R(S) = R(T ) if and only if S ◦ T = T , T ◦ S = S.
(ii) N(S) = N (T ) if and only if S ◦ T = S, T ◦ S = T .
2.5.13 Let $T \in L(\mathbb{R}^3)$ be defined by
$$T(x) = \begin{pmatrix} 1 & 1 & -1 \\ -1 & 1 & 1 \\ 1 & 3 & -1 \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3. \tag{2.5.60}$$

Let S ∈ L(R3 ) project R3 onto R(T ) along N (T ). Determine S by


obtaining a matrix A ∈ R(3, 3) that represents S. That is, S(x) = Ax
for x ∈ R3 .
2.5.14 Let U be an n-dimensional vector space (n ≥ 2) and $U'$ its dual space.
Let $u_1, u_2$ and $u_1', u_2'$ be independent vectors in U and $U'$, respectively.
Define a linear mapping $T \in L(U)$ by setting
$$T(u) = \langle u, u_1' \rangle u_1 + \langle u, u_2' \rangle u_2, \quad u \in U. \tag{2.5.61}$$
(i) Find the condition(s) regarding $u_1, u_2, u_1', u_2'$ under which T is a
projection.
(ii) When T is a projection, determine subspaces V and W of U in
terms of $u_1, u_2, u_1', u_2'$ so that T projects U onto V along W.
2.5.15 We may slightly generalize the notion of idempotent mappings to a
mapping T ∈ L(U ) satisfying
$$T^2 = aT, \quad T \ne 0, \quad a \in \mathbb{F}, \quad a \ne 0, 1. \tag{2.5.62}$$
Show that such a mapping T is reduced over the pair of subspaces
N(T ) and N (aI − T ).
2.5.16 Let T ∈ L(U ) and λ1 , . . . , λk be some distinct eigenvalues of T
such that T is reducible over the eigenspaces Eλ1 , . . . , Eλk . Show that
λ1 , . . . , λk are all the eigenvalues of T . In other words, if λ is any
eigenvalue of T , then λ is among λ1 , . . . , λk .
2.5.17 Is the differential operator D : Pn → Pn reducible? Find an element
in Pn that is of period n + 1 under D and use it to obtain a basis of Pn .
2.5.18 Let α and β be nonzero elements in F(n, 1). Then A = αβ t ∈ F(n, n).
(i) Prove that the necessary and sufficient condition for A to be nilpo-
tent is α t β = 0. If A is nilpotent, what is the degree of A?
(ii) Prove that the necessary and sufficient condition for A to be idem-
potent is α t β = 1.
2.5.19 Show that the linear mapping T ∈ L(P) defined in (2.5.56) and the
differential operator D ∈ L(P) satisfy the identity D ◦ T − T ◦ D = I .
2.5.20 Let S, T ∈ L(U ) satisfy S ◦ T − T ◦ S = I . Establish the identity
S k ◦ T − T ◦ S k = kS k−1 , k ≥ 2. (2.5.63)
2.5.21 Let U be a finite-dimensional vector space over a field F and T ∈ L(U )
an idempotent mapping satisfying the equation
T 3 + T − 2I = 0. (2.5.64)
Show that T = I .
2.5.22 Let U be a finite-dimensional vector space over a field F and T ∈ L(U )
satisfying
$$T^3 = T \tag{2.5.65}$$
so that ±1 cannot be the eigenvalues of T . Show that T = 0.
2.5.23 Let U be a finite-dimensional vector space over a field F and S, T ∈


L(U ) satisfying S ∼ T . For any p ∈ P, show that p(S) ∼ p(T ).
2.5.24 Let U be a finite-dimensional vector space over a field F and T ∈
L(U ). For any p ∈ P, show that $p(T)' = p(T')$.
2.5.25 Let U be an n-dimensional vector space over a field F, T ∈ L(U )
nilpotent of degree k ≥ 2, and n(T ) = l ≥ 2. Assume k + l = n + 1.
(i) Show that there are subspaces V and W of U where

V = Span{u, T (u), . . . , T k−1 (u)}, (2.5.66)

with u ∈ U a T -cyclic vector of period k, and W is an (l − 1)-


dimensional subspace of N (T ), such that T is reducible over the
pair V and W .
(ii) Describe R(T ) and determine r(T ).
2.5.26 Let U be a finite-dimensional vector space over a field F and S, T ∈
L(U ).
(i) Show that if S is invertible and T nilpotent, S − T must also be
invertible provided that S and T commute, S ◦ T = T ◦ S.
(ii) Find an example to show that the condition in (i) that S and T
commute cannot be removed.
2.5.27 Let U be a finite-dimensional vector space, T ∈ L(U ), and V a sub-
space of U which is invariant under T . Show that if U = R(T ) ⊕ V
then V = N (T ).

2.6 Norms of linear mappings


In this section, we begin by considering general linear mappings between
arbitrary finite-dimensional normed vector spaces. We then concentrate on
mappings from a normed space into itself.

2.6.1 Definition and elementary properties of norms


of linear mappings
Let U, V be finite-dimensional vector spaces over $\mathbb{R}$ or $\mathbb{C}$ and $\|\cdot\|_U$, $\|\cdot\|_V$
norms on U, V, respectively. Assume that $B = \{u_1, \dots, u_n\}$ is any basis of U.
For any $u \in U$, write u as $u = \sum_{i=1}^{n} a_i u_i$ where $a_1, \dots, a_n$ are the coordinates
of u with respect to B. Thus
$$\|T(u)\|_V = \left\| T\Big(\sum_{i=1}^{n} a_i u_i\Big) \right\|_V \le \sum_{i=1}^{n} |a_i|\, \|T(u_i)\|_V \le \left(\max_{1 \le i \le n}\{\|T(u_i)\|_V\}\right) \sum_{i=1}^{n} |a_i| \equiv \left(\max_{1 \le i \le n}\{\|T(u_i)\|_V\}\right) \|u\|_1 \le C \|u\|_U, \tag{2.6.1}$$
where we have used the fact that norms over a finite-dimensional space are all
equivalent. This estimate may also be restated as
$$\frac{\|T(u)\|_V}{\|u\|_U} \le C, \quad u \in U, \ u \ne 0. \tag{2.6.2}$$
This boundedness result enables us to formulate the definition of the norm
of a linear mapping T ∈ L(U, V ) as follows.

Definition 2.23 For T ∈ L(U, V ), we define the norm of T , induced from the
respective norms of U and V, by
$$\|T\| = \sup\left\{ \left. \frac{\|T(u)\|_V}{\|u\|_U} \,\right|\, u \in U, \ u \ne 0 \right\}. \tag{2.6.3}$$

To show that (2.6.3) indeed defines a norm for the space L(U, V ), we need
to examine that it fulfills all the properties required of a norm. To this end, we
note from (2.6.3) that

$$\|T(u)\|_V \le \|T\|\,\|u\|_U, \quad u \in U. \tag{2.6.4}$$

For S, T ∈ L(U, V ), since the triangle inequality and (2.6.4) give us

$$\|(S + T)(u)\|_V \le \|S(u)\|_V + \|T(u)\|_V \le (\|S\| + \|T\|)\|u\|_U, \tag{2.6.5}$$

we obtain $\|S + T\| \le \|S\| + \|T\|$, which says the triangle inequality holds over


L(U, V ).
Let a be any scalar. Then $\|aT(u)\|_V = |a|\, \|T(u)\|_V$ for any $u \in U$. Hence
$$\|aT\| = \sup\left\{ \left. \frac{\|aT(u)\|_V}{\|u\|_U} \,\right|\, u \in U, \ u \ne 0 \right\} = |a| \sup\left\{ \left. \frac{\|T(u)\|_V}{\|u\|_U} \,\right|\, u \in U, \ u \ne 0 \right\} = |a|\,\|T\|, \tag{2.6.6}$$
which indicates that homogeneity follows.
Finally, it is clear that $\|T\| = 0$ implies $T = 0$.
Let T ∈ L(U, V ) and S ∈ L(V , W ) where W is another normed vector


space with the norm $\|\cdot\|_W$. Then $S \circ T \in L(U, W)$ and it follows from (2.6.4)
that
$$\|(S \circ T)(u)\|_W \le \|S\|\,\|T(u)\|_V \le \|S\|\,\|T\|\,\|u\|_U, \quad u \in U. \tag{2.6.7}$$

Consequently, we get
$$\|S \circ T\| \le \|S\|\,\|T\|. \tag{2.6.8}$$

This simple but general inequality is of basic usefulness.


In the rest of the section, we will focus on mappings from U into itself. We
note that, for the identity mapping $I \in L(U)$, it is clear that $\|I\| = 1$.
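The supremum in Definition 2.23 can be estimated numerically. The Python sketch below (numpy, with a hypothetical 2 × 2 matrix and the Euclidean norm on both spaces) samples random unit vectors and compares the estimate with the induced 2-norm that numpy computes directly.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])                 # hypothetical matrix

    x = rng.standard_normal((2, 100000))
    x /= np.linalg.norm(x, axis=0)             # random unit vectors
    estimate = np.max(np.linalg.norm(A @ x, axis=0))

    print(estimate, np.linalg.norm(A, 2))      # estimate is close to, and below, the exact norm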

2.6.2 Invertibility of linear mappings as a generic property


Let U be a finite-dimensional vector space with norm $\|\cdot\|$. It has been seen
that the space L(U) may be equipped with an induced norm which may also be
denoted by $\|\cdot\|$ since there is no risk of confusion. The availability of a norm of
L(U ) allows one to perform analysis on L(U ) so that a deeper understanding
of L(U ) may be achieved.
As an illustration, in this subsection, we will characterize invertibility of
linear mappings by using the norm.

Theorem 2.24 Let U be a finite-dimensional normed space. An element T ∈


L(U ) is invertible if and only if there is a constant c > 0 such that

$$\|T(u)\| \ge c\|u\|, \quad u \in U. \tag{2.6.9}$$

Proof Assume (2.6.9) is valid. Then it is clear that N (T ) = {0}. Hence T


is invertible. Conversely, assume that T is invertible and T −1 ∈ L(U ) is its
inverse. Then $1 = \|I\| = \|T^{-1} \circ T\| \le \|T^{-1}\|\,\|T\|$ implies that the norm
of an invertible mapping can never be zero. Thus, for any $u \in U$, we have
$\|u\| = \|(T^{-1} \circ T)(u)\| \le \|T^{-1}\|\,\|T(u)\|$ or $\|T(u)\| \ge (\|T^{-1}\|)^{-1}\|u\|$, $u \in U$,
which establishes (2.6.9).

We now show that invertibility is a generic property for elements in L(U ).

Theorem 2.25 Let U be a finite-dimensional normed space and T ∈ L(U ).

(1) For any ε > 0 there exists an invertible element S ∈ L(U ) such that
$\|S - T\| < \varepsilon$. This property says that the subset of invertible mappings in
L(U ) is dense in L(U ) with respect to the norm of L(U ).
(2) If T ∈ L(U ) is invertible, then there is some ε > 0 such that S ∈ L(U ) is
invertible whenever S satisfies $\|S - T\| < \varepsilon$. This property says that the
subset of invertible mappings in L(U ) is open in L(U ) with respect to the
norm of L(U ).

Proof For any scalar λ, consider Sλ = T − λI . If dim(U ) = n, there are at


most n possible values of λ for which $S_\lambda$ is not invertible. Now $\|T - S_\lambda\| = |\lambda|\,\|I\| = |\lambda|$. So for any ε > 0, there is a scalar λ, |λ| < ε, such that $S_\lambda$ is
invertible. This proves (1).
We next consider (2). Let T ∈ L(U ) be invertible. Then (2.6.9) holds for
some c > 0. Let $S \in L(U)$ be such that $\|S - T\| < \varepsilon$ for some ε > 0. Then,
for any $u \in U$, we have
$$c\|u\| \le \|T(u)\| = \|(T - S)(u) + S(u)\| \le \|T - S\|\,\|u\| + \|S(u)\| \le \varepsilon\|u\| + \|S(u)\|, \quad u \in U, \tag{2.6.10}$$
or $\|S(u)\| \ge (c - \varepsilon)\|u\|$. Therefore, if we choose ε < c, we see in view
of Theorem 2.24 that S is invertible when $\|S - T\| < \varepsilon$. Hence (2) follows
as well.
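A numerical illustration of part (1) of Theorem 2.25 (Python/numpy, with a hypothetical rank-one matrix): arbitrarily small shifts T − εI of a non-invertible T are already invertible.

    import numpy as np

    T = np.array([[1.0, 2.0],
                  [2.0, 4.0]])                 # rank 1, hence not invertible
    print(np.linalg.matrix_rank(T))            # 1

    for eps in (1e-1, 1e-3, 1e-6):
        S = T - eps * np.eye(2)
        print(eps, np.linalg.det(S))           # nonzero, so S is invertible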

2.6.3 Exponential of a linear mapping


Let $T \in L(U)$. For a positive integer m, we consider $T_m \in L(U)$ given by
$$T_m = \sum_{k=0}^{m} \frac{1}{k!} T^k, \tag{2.6.11}$$

with the understanding $T^0 = I$. Therefore, for $l < m$, we have the estimate
$$\|T_l - T_m\| \le \sum_{k=l+1}^{m} \frac{\|T\|^k}{k!}. \tag{2.6.12}$$

In particular, $\|T_l - T_m\| \to 0$ as $l, m \to \infty$. Hence we see that the limit
$$\lim_{m \to \infty} \sum_{k=0}^{m} \frac{1}{k!} T^k \tag{2.6.13}$$

is a well-defined element in L(U) and is naturally denoted as
$$e^T = \sum_{k=0}^{\infty} \frac{1}{k!} T^k, \tag{2.6.14}$$
and is called the exponential of $T \in L(U)$. Thus $e^0 = I$. As in calculus, if
$S, T \in L(U)$ are commutative, we can verify the formula
$$e^S e^T \equiv e^S \circ e^T = e^{S+T}. \tag{2.6.15}$$
A special consequence of this simple property is that the exponential of any
mapping $T \in L(U)$ is invertible. In fact, the relation (2.6.15) indicates that
$$(e^T)^{-1} = e^{-T}. \tag{2.6.16}$$
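The partial sums (2.6.11) are straightforward to compute. The Python sketch below (numpy, with a hypothetical 2 × 2 matrix and a fixed truncation order) approximates e^T by a partial sum and checks the inverse property (2.6.16).

    import numpy as np

    def exp_series(T, m=40):
        # Partial sum T_m = sum_{k=0}^m T^k / k! as in (2.6.11).
        result = np.eye(T.shape[0])
        term = np.eye(T.shape[0])
        for k in range(1, m + 1):
            term = term @ T / k
            result = result + term
        return result

    T = np.array([[0.0, 1.0],
                  [-2.0, 0.5]])                # hypothetical matrix
    E, E_inv = exp_series(T), exp_series(-T)
    assert np.allclose(E @ E_inv, np.eye(2))   # e^T e^{-T} = e^0 = I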

More generally, with the notation $\Phi(t) = e^{tT}$ ($t \in \mathbb{R}$), we have
(1) $\Phi(s)\Phi(t) = \Phi(s + t)$, $s, t \in \mathbb{R}$,
(2) $\Phi(0) = I$,
and we say that $\Phi : \mathbb{R} \to L(U)$ defines a one-parameter group.
Furthermore, we also have
$$\frac{1}{h}\left(\Phi(t+h) - \Phi(t)\right) = \frac{1}{h}\left(e^{(t+h)T} - e^{tT}\right) = \frac{1}{h}\left(e^{hT} - I\right)e^{tT} = \frac{1}{h}e^{tT}\left(e^{hT} - I\right) = T\left(\sum_{k=0}^{\infty}\frac{h^k}{(k+1)!}T^k\right)e^{tT} = e^{tT}\left(\sum_{k=0}^{\infty}\frac{h^k}{(k+1)!}T^k\right)T, \quad h \in \mathbb{R}, \ h \ne 0. \tag{2.6.17}$$

Therefore, we obtain the limit
$$\lim_{h \to 0}\frac{1}{h}\left(\Phi(t+h) - \Phi(t)\right) = T e^{tT} = e^{tT} T. \tag{2.6.18}$$

In other words, the above result gives us
$$\Phi'(t) = \frac{d}{dt}e^{tT} = T e^{tT} = e^{tT} T = T\Phi(t) = \Phi(t)T, \quad t \in \mathbb{R}. \tag{2.6.19}$$
In particular, $\Phi'(0) = T$, which intuitively says that T is the initial rate of
change of the one-parameter group $\Phi$. One also refers to this relationship as
'$T$ generates $\Phi$' or '$T$ is the generator of $\Phi$.'
The relation (2.6.19) suggests that it is legitimate to differentiate the series
$\Phi(t) = \sum_{k=0}^{\infty}\frac{1}{k!}t^k T^k$ term by term:
$$\frac{d}{dt}\sum_{k=0}^{\infty}\frac{1}{k!}t^k T^k = \sum_{k=0}^{\infty}\frac{1}{k!}\frac{d}{dt}(t^k)\,T^k = T\sum_{k=0}^{\infty}\frac{1}{k!}t^k T^k, \quad t \in \mathbb{R}. \tag{2.6.20}$$
With the exponential of a linear mapping, T ∈ L(U ), various elementary


functions of T involving the exponentials of T may also be introduced accord-
ingly. For example,
$$\cosh T = \frac{1}{2}\left(e^{T} + e^{-T}\right), \qquad \sinh T = \frac{1}{2}\left(e^{T} - e^{-T}\right), \tag{2.6.21}$$
$$\cos T = \frac{1}{2}\left(e^{iT} + e^{-iT}\right), \qquad \sin T = \frac{1}{2i}\left(e^{iT} - e^{-iT}\right), \tag{2.6.22}$$
are all well defined and enjoy similar properties of the corresponding classical
functions.
The matrix version of the discussion here can also be easily formulated anal-
ogously and is omitted.

Exercises

2.6.1 Let U be a finite-dimensional normed space and {Tn } ⊂ L(U ) a


sequence of invertible mappings that converges to a non-invertible
mapping $T \in L(U)$. Show that $\|T_n^{-1}\| \to \infty$ as $n \to \infty$.
2.6.2 Let U and V be finite-dimensional normed spaces with the norms $\|\cdot\|_U$
and $\|\cdot\|_V$, respectively. For $T \in L(U, V)$, show that the induced norm
of T may also be evaluated by the expression
$$\|T\| = \sup\{\|T(u)\|_V \mid u \in U, \ \|u\|_U = 1\}. \tag{2.6.23}$$

2.6.3 Consider $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ defined by
$$T(x) = Ax, \quad A = (a_{ij}) \in \mathbb{R}(m, n), \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n. \tag{2.6.24}$$
Use the norm $\|\cdot\|_\infty$ for both $\mathbb{R}^m$ and $\mathbb{R}^n$, correspondingly, and denote
by $\|T\|_\infty$ the induced norm of the linear mapping T given in (2.6.24).
Show that
$$\|T\|_\infty = \max_{1 \le i \le m}\left\{\sum_{j=1}^{n} |a_{ij}|\right\}. \tag{2.6.25}$$
(This quantity is also sometimes denoted as $\|A\|_\infty$.)


2.6.4 Let T be the linear mapping defined in (2.6.24). Show that, if we use
the norm $\|\cdot\|_1$ for both $\mathbb{R}^m$ and $\mathbb{R}^n$, correspondingly, and denote by
$\|T\|_1$ the induced norm of T, then
$$\|T\|_1 = \max_{1 \le j \le n}\left\{\sum_{i=1}^{m} |a_{ij}|\right\}. \tag{2.6.26}$$
(This quantity is also sometimes denoted as $\|A\|_1$.)


2.6.5 Let U be a finite-dimensional normed space over R or C and use N to
denote the subset of L(U ) consisting of all nilpotent mappings. Show
that N is not an open subset of L(U ).
2.6.6 Let U be a finite-dimensional normed space and T ∈ L(U ). Prove that
$\|e^T\| \le e^{\|T\|}$.
2.6.7 Let $A \in \mathbb{R}(n, n)$. Show that $(e^A)^t = e^{A^t}$ and that, if A is skew-
symmetric, $A^t = -A$, then $e^A$ is orthogonal.

2.6.8 Let $A \in \mathbb{C}(n, n)$. Show that $(e^A)^\dagger = e^{A^\dagger}$ and that, if A is anti-
Hermitian, $A^\dagger = -A$, then $e^A$ is unitary.
2.6.9 Let A ∈ R(n, n) and consider the initial value problem of the following
system of differential equations
$$\frac{dx}{dt} = Ax, \quad x = x(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}; \qquad x(0) = x_0 = \begin{pmatrix} x_{1,0} \\ \vdots \\ x_{n,0} \end{pmatrix}. \tag{2.6.27}$$
(i) Show that, with the one-parameter group $\Phi(t) = e^{tA}$, the solution
of the problem (2.6.27) is simply given by $x = \Phi(t)x_0$.
(ii) Moreover, use $x^{(i)}(t)$ to denote the solution of (2.6.27) when $x_0 = e_i$,
where $\{e_1, \dots, e_n\}$ is the standard basis of $\mathbb{R}^n$, $i = 1, \dots, n$.
Show that $\Phi(t) = e^{tA}$ is the $n \times n$ matrix with $x^{(1)}(t), \dots, x^{(n)}(t)$
as the n corresponding column vectors,
$$\Phi(t) = \left(x^{(1)}(t), \dots, x^{(n)}(t)\right). \tag{2.6.28}$$
(This result provides a practical method for computing the matrix exponential, $e^{tA}$, which may also be viewed as the solution of the matrix-valued initial value problem
$$\frac{dX}{dt} = AX, \quad X(0) = I_n, \quad X \in \mathbb{R}(n, n).) \tag{2.6.29}$$
2.6.10 Consider the mapping $\Phi : \mathbb{R} \to \mathbb{R}(2, 2)$ defined by
$$\Phi(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}, \quad t \in \mathbb{R}. \tag{2.6.30}$$
(i) Show that $\Phi(t)$ is a one-parameter group.
(ii) Find the generator, say $A \in \mathbb{R}(2, 2)$, of $\Phi(t)$.
(iii) Compute $e^{tA}$ directly by the formula
$$e^{tA} = \sum_{k=0}^{\infty}\frac{t^k}{k!}A^k \tag{2.6.31}$$
and verify $\Phi(t) = e^{tA}$.


(iv) Use the practical method illustrated in Exercise 2.6.9 to obtain
the matrix exponential etA through solving two appropriate initial
value problems as given in (2.6.27).
2.6.11 For the functions of T ∈ L(U ) defined in (2.6.21) and (2.6.22), estab-
lish the identities
$$\cosh^2 T - \sinh^2 T = I, \qquad \cos^2 T + \sin^2 T = I. \tag{2.6.32}$$
2.6.12 For $T \in L(U)$, establish the formulas
$$\frac{d}{dt}(\sinh tT) = T\cosh tT, \qquad \frac{d}{dt}(\cosh tT) = T\sinh tT. \tag{2.6.33}$$
2.6.13 For $T \in L(U)$, establish the formulas
$$\frac{d}{dt}(\sin tT) = T\cos tT, \qquad \frac{d}{dt}(\cos tT) = -T\sin tT. \tag{2.6.34}$$
3
Determinants

In this chapter we introduce one of the most important computational tools in


linear algebra – the determinants. First we discuss some motivational exam-
ples. Next we present the definition and basic properties of determinants. Then
we study some applications of determinants.

3.1 Motivational examples


We now present some examples occurring in geometry, algebra, and topology
that use determinants as a natural underlying computational tool.

3.1.1 Area and volume


Let u = (a1 , a2 ) and v = (b1 , b2 ) be nonzero vectors in R2 . We consider
the area of the parallelogram formed from using these two vectors as adjacent
edges. First we may express u in polar coordinates as

$$u = (a_1, a_2) = \|u\|(\cos\theta, \sin\theta). \tag{3.1.1}$$

Thus, we may easily resolve the vector v along the direction of u and the
direction perpendicular to u as follows
$$v = (b_1, b_2) = c_1(\cos\theta, \sin\theta) + c_2\left(\cos\left(\theta \pm \frac{\pi}{2}\right), \sin\left(\theta \pm \frac{\pi}{2}\right)\right) = (c_1\cos\theta \mp c_2\sin\theta,\ c_1\sin\theta \pm c_2\cos\theta), \quad c_1, c_2 \in \mathbb{R}. \tag{3.1.2}$$

Here c2 may be interpreted as the length of the vector in the resolution that is
taken to be perpendicular to u. Hence, from (3.1.2), we can read off the result

c2 = ±(b2 cos θ − b1 sin θ ) = |b2 cos θ − b1 sin θ |. (3.1.3)

Therefore, using (3.1.3) and then (3.1.1), we see that the area σ of the parallel-
ogram under consideration is given by
$$\sigma = c_2\|u\| = \big|\,\|u\|\cos\theta\, b_2 - \|u\|\sin\theta\, b_1\big| = |a_1 b_2 - a_2 b_1|. \tag{3.1.4}$$
Thus we see that the quantity a1 b2 − a2 b1 formed from the vectors (a1 , a2 )
and $(b_1, b_2)$ stands out, which will be called the determinant of the matrix
$$A = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \tag{3.1.5}$$
written as det(A) or denoted by
$$\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}. \tag{3.1.6}$$
Since det(A) = ±σ , it is also referred to as the signed area of the parallelo-
gram formed by the vectors (a1 , a2 ) and (b1 , b2 ).
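A minimal numerical check (Python/numpy, with hypothetical vectors u and v) that the signed area of (3.1.4) agrees with the 2 × 2 determinant of (3.1.5):

    import numpy as np

    u = np.array([3.0, 1.0])
    v = np.array([1.0, 2.0])
    A = np.vstack([u, v])

    signed_area = u[0] * v[1] - u[1] * v[0]
    print(signed_area, np.linalg.det(A))       # both equal 5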
We now consider volume. We shall apply some vector algebra over R3 to
facilitate our discussion.
We use · and × to denote the usual dot and cross products between vectors
in R3 . We use i, j, k to denote the standard mutually orthogonal unit vectors in
R3 that form a right-hand system. For any vectors
u = (a1 , a2 , a3 ) = a1 i + a2 j + a3 k, v = (b1 , b2 , b3 ) = b1 i + b2 j + b3 k,
(3.1.7)

in R3 , we know that
$$u \times v = (a_2 b_3 - a_3 b_2)\mathbf{i} - (a_1 b_3 - a_3 b_1)\mathbf{j} + (a_1 b_2 - a_2 b_1)\mathbf{k} \tag{3.1.8}$$
is perpendicular to both u and v and $\|u \times v\|$ gives us the area of the paral-
lelogram formed from using u, v as two adjacent edges, which generalizes the
preceding discussion in R2 . To avoid the trivial situation, we assume u and v
are linearly independent. So $u \times v \ne 0$ and
R3 = Span{u, v, u × v}. (3.1.9)
Let w = (c1 , c2 , c3 ) be another vector. Then (3.1.9) allows us to
express w as
w = au + bv + c(u × v), a, b, c ∈ R. (3.1.10)
From the geometry of the problem, we see that the volume of the parallelepiped
formed from using u, v, w as adjacent edges is given by
$$\delta = \|u \times v\|\,\|c(u \times v)\| = |c|\,\|u \times v\|^2 \tag{3.1.11}$$
because $\|c(u \times v)\|$ is the height of the parallelepiped, with the bottom area
$\|u \times v\|$.
From (3.1.10), we have
$$w \cdot (u \times v) = c\|u \times v\|^2. \tag{3.1.12}$$
Inserting (3.1.12) into (3.1.11), we obtain the simplified volume formula
$$\delta = |w \cdot (u \times v)| = |c_1(a_2 b_3 - a_3 b_2) - c_2(a_1 b_3 - a_3 b_1) + c_3(a_1 b_2 - a_2 b_1)|. \tag{3.1.13}$$
In analogy of the earlier discussion on the area formula in R2 , we may set up
the matrix
$$A = \begin{pmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}, \tag{3.1.14}$$
and define the signed volume or determinant of the $3 \times 3$ matrix A as
$$\det(A) = \begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = c_1(a_2 b_3 - a_3 b_2) - c_2(a_1 b_3 - a_3 b_1) + c_3(a_1 b_2 - a_2 b_1). \tag{3.1.15}$$


In view of the 2 × 2 determinants already defined, we may rewrite (3.1.15)
in the decomposed form
$$\begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = c_1\begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} - c_2\begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} + c_3\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}. \tag{3.1.16}$$

3.1.2 Solving systems of linear equations


Next we remark that a more standard motivational example for determinants is
Cramer’s rule or Cramer’s formulas for solving systems of linear equations.
For example, consider the 2 × 2 system

$$\begin{cases} a_1 x_1 + a_2 x_2 = c_1, \\ b_1 x_1 + b_2 x_2 = c_2, \end{cases} \tag{3.1.17}$$
where a1 , a2 , b1 , b2 , c1 , c2 ∈ F. Multiplying the first equation by b1 , the sec-
ond equation by a1 , and subtracting, we have
(a1 b2 − a2 b1 )x2 = a1 c2 − b1 c1 ; (3.1.18)
multiplying the first equation by b2 , the second equation by a2 , and subtracting,


we have

(a1 b2 − a2 b1 )x1 = b2 c1 − a2 c2 . (3.1.19)

Thus, with the notation of determinants and in view of (3.1.18) and (3.1.19),
we may express the solution to (3.1.17) elegantly as
$$x_1 = \frac{\begin{vmatrix} c_1 & a_2 \\ c_2 & b_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \qquad x_2 = \frac{\begin{vmatrix} a_1 & c_1 \\ b_1 & c_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \qquad \text{if } \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \ne 0. \tag{3.1.20}$$

The extension of these formulas to 3 × 3 systems will be assigned as an


exercise.
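As an illustration (not a substitute for the exercise), the Python sketch below applies Cramer's formulas (3.1.20) to a hypothetical 2 × 2 system and compares the answer with a direct solve.

    import numpy as np

    a1, a2, b1, b2 = 2.0, 1.0, 1.0, 3.0        # hypothetical coefficients
    c1, c2 = 5.0, 10.0

    D = np.linalg.det(np.array([[a1, a2], [b1, b2]]))
    x1 = np.linalg.det(np.array([[c1, a2], [c2, b2]])) / D
    x2 = np.linalg.det(np.array([[a1, c1], [b1, c2]])) / D

    print(x1, x2)                              # 1.0, 3.0
    print(np.linalg.solve(np.array([[a1, a2], [b1, b2]]),
                          np.array([c1, c2])))  # the same solution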

3.1.3 Topological invariants


Let f be a real-valued continuously differentiable function over R that maps
the closed interval [α, β] (α < β) into [a, b] (a < b) so that the boundary
points are mapped into boundary points as well,

f : {α, β} → {a, b}. (3.1.21)

The function f maps the interval [α, β] to cover the interval [a, b].
At a point $t_0 \in [\alpha, \beta]$ where the derivative of f is positive, $f'(t_0) > 0$, f
maps a small interval around $t_0$ onto a small interval around $f(t_0)$ and pre-
serves the orientation (from left to right) of the intervals; if $f'(t_0) < 0$, it
reverses the orientation. If $f'(t_0) \ne 0$, we say that $t_0$ is a regular point of f.
For $c \in [a, b]$, if $f'(t) \ne 0$ for any $t \in f^{-1}(c)$, we call c a regular value of
f. It is clear that $f^{-1}(c)$ is a finite set when c is a regular value of f. If c is a
regular value of f , we define the integer

N (f, c) = N + (f, c) − N − (f, c), (3.1.22)

where

$N^{\pm}(f, c)$ = the number of points in $f^{-1}(c)$ where $\pm f' > 0$. (3.1.23)

If $f'(t) \ne 0$ for any $t \in (\alpha, \beta)$, then $N^{\pm}(f, c) = 1$ and $N^{\mp}(f, c) = 0$
according to $\pm f'(t) > 0$ ($t \in (\alpha, \beta)$), which leads to $N(f, c) = \pm 1$ according
to $\pm f'(t) > 0$ ($t \in (\alpha, \beta)$). In particular, N(f, c) is independent of c.
If $f' = 0$ at some point, such a point is called a critical point of f. We
assume further that f has finitely many critical points in (α, β), say t1 , . . . , tm
(m ≥ 1). Set t0 = α, tm+1 = β, and assume ti < ti+1 for i = 0, . . . , m. Then,


if c is a regular value of f , we have
N ± (f, c) = the number of intervals (ti , ti+1 ) containing f −1 (c)
and satisfying $\pm f'(t) > 0$, $t \in (t_i, t_{i+1})$, $i = 0, \dots, m$.
(3.1.24)

Thus, we have seen that, although both N + (f, c) and N − (f, c) may depend
on c, the quantity N (f, c) as given in (3.1.22) does not depend on c and is seen
to satisfy
$$N(f, c) = \begin{cases} 1, & f(\alpha) = a, \ f(\beta) = b, \\ -1, & f(\alpha) = b, \ f(\beta) = a, \\ 0, & f(\alpha) = f(\beta). \end{cases} \tag{3.1.25}$$
This quantity may summarily be rewritten into the form of an integral
$$N(f, c) = \frac{1}{b-a}(f(\beta) - f(\alpha)) = \frac{1}{b-a}\int_\alpha^\beta f'(t)\, dt, \tag{3.1.26}$$
and interpreted to be the number count for the orientation-preserving times
minus the orientation-reversing times the function f maps the interval [α, β]
to cover the interval [a, b]. The advantage of using the integral representation
for N(f, c) is that it is clearly independent of the choice of the regular value
c. Indeed, the right-hand side of (3.1.26) is well defined for any differentiable
function f not necessarily having only finitely many critical points and the
specification of a regular value c for f becomes irrelevant. In fact, the integral
on the right-hand side of (3.1.26) is the simplest topological invariant called
the degree of the function f, which may now be denoted by
$$\deg(f) = \frac{1}{b-a}\int_\alpha^\beta f'(t)\, dt. \tag{3.1.27}$$
The word ‘topological’ is used to refer to the fact that a small alteration of f
cannot perturb the value of deg(f ) since deg(f ) may take only integer val-
ues and the right-hand side of (3.1.27), however, relies on the derivative of f
continuously.
As a simple application, we note that it is not hard to see that for any c ∈
[a, b] the equation f (t) = c has at least one solution when deg(f ) ≠ 0.
We next extend our discussion of topological invariants to two-dimensional
situations.
Let $\Gamma$ and C be two closed differentiable curves in $\mathbb{R}^2$ oriented counterclock-
wise and let
$$u : \Gamma \to C \tag{3.1.28}$$
be a differentiable map. In analogy with the case of a real-valued function


over an interval discussed above, we may express the number count for the
orientation-preserving times minus the orientation-reversing times u maps the
curve $\Gamma$ to cover the curve C in the form of a line integral,
$$\deg(u) = \frac{1}{|C|}\int_\Gamma \tau \cdot du, \tag{3.1.29}$$
where |C| denotes the length of the curve C and τ is the unit tangent vector
along the positive direction of C.
In the special situation when C = S 1 (the unit circle in R2 centered at the
origin), we write u as
u = (f, g), f 2 + g 2 = 1, (3.1.30)
where f, g are real-valued functions, so that
τ = (−g, f ). (3.1.31)
Now assume further that the curve $\Gamma$ is parameterized by a parameter t taken
over the interval [α, β]. Then, inserting (3.1.30) and (3.1.31) into (3.1.29) and
using $|S^1| = 2\pi$, we have
$$\deg(u) = \frac{1}{2\pi}\int_\alpha^\beta (-g, f) \cdot (f', g')\, dt = \frac{1}{2\pi}\int_\alpha^\beta (f g' - g f')\, dt = \frac{1}{2\pi}\int_\alpha^\beta \begin{vmatrix} f & g \\ f' & g' \end{vmatrix} dt. \tag{3.1.32}$$
Thus we see that the concept of determinant arises naturally again.
Let v be a vector field over $\mathbb{R}^2$. Let $\Gamma$ be a closed curve in $\mathbb{R}^2$ where $v \ne 0$.
Then
$$v_\Gamma = \frac{1}{\|v\|} v \tag{3.1.33}$$
defines a map from $\Gamma$ into $S^1$. The index of the vector field v along the curve
$\Gamma$ is then defined as
$$\operatorname{ind}(v|_\Gamma) = \deg(v_\Gamma), \tag{3.1.34}$$
which is another useful topological invariant.
As an example, we consider a vector field v over R2 given by
v(x, y) = (x 2 − y 2 , 2xy), (x, y) ∈ R2 . (3.1.35)
Then $\|v(x, y)\|^2 = (x^2 + y^2)^2 > 0$ for $(x, y) \ne (0, 0)$. So for any closed curve $\Gamma$
not intersecting the origin, the quantity $\operatorname{ind}(v|_\Gamma)$ is well defined.
Let $S_R^1$ denote the circle of radius R > 0 in $\mathbb{R}^2$ centered at the origin. We
may parameterize $S_R^1$ by the polar angle θ: $x = R\cos\theta$, $y = R\sin\theta$, $\theta \in [0, 2\pi]$. With (3.1.35), we have
$$v_{S_R^1} = \frac{1}{R^2}(x^2 - y^2, 2xy) = (\cos^2\theta - \sin^2\theta, 2\cos\theta\sin\theta) = (\cos 2\theta, \sin 2\theta). \tag{3.1.36}$$
Therefore, using (3.1.32), we get
$$\operatorname{ind}(v|_{S_R^1}) = \deg(v_{S_R^1}) = \frac{1}{2\pi}\int_0^{2\pi} \begin{vmatrix} \cos 2\theta & \sin 2\theta \\ -2\sin 2\theta & 2\cos 2\theta \end{vmatrix} d\theta = 2. \tag{3.1.37}$$
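The integral in (3.1.37) can also be evaluated numerically. The Python sketch below (numpy, with a hypothetical radius and a simple finite-difference derivative) recovers the value 2.

    import numpy as np

    R = 1.5                                    # any R > 0 works
    theta = np.linspace(0.0, 2.0 * np.pi, 20001)
    x, y = R * np.cos(theta), R * np.sin(theta)

    f, g = x**2 - y**2, 2.0 * x * y            # the field (3.1.35) on the circle
    norm = np.hypot(f, g)
    f, g = f / norm, g / norm                  # normalize: a map into S^1

    fp, gp = np.gradient(f, theta), np.gradient(g, theta)
    degree = np.trapz(f * gp - g * fp, theta) / (2.0 * np.pi)   # formula (3.1.32)
    print(degree)                              # approximately 2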
For any closed curve $\Gamma$ enclosing but not intersecting the origin, we can
continuously deform it into a circle $S_R^1$ (R > 0), while staying away from
the origin in the process. By continuity or topological invariance, we obtain
$\operatorname{ind}(v|_\Gamma) = \operatorname{ind}(v|_{S_R^1}) = 2$. The meaning of this result will be seen in the
following theorem.

Theorem 3.1 Let v be a vector field that is differentiable over a bounded


domain $\Omega$ in $\mathbb{R}^2$ and let $\Gamma$ be a closed curve contained in $\Omega$. If $v \ne 0$ on $\Gamma$
and $\operatorname{ind}(v|_\Gamma) \ne 0$, then there must be at least one point enclosed inside $\Gamma$
where v = 0.

Proof Assume otherwise that $v \ne 0$ in the domain enclosed by $\Gamma$. Let γ
be another closed curve enclosed inside $\Gamma$. Since γ may be obtained from $\Gamma$
through a continuous deformation and v = 0 nowhere inside $\Gamma$, we have
$\operatorname{ind}(v|_\Gamma) = \operatorname{ind}(v|_\gamma)$. On the other hand, if we parameterize the curve γ using
its arclength s, then
$$\operatorname{ind}(v|_\gamma) = \deg(v_\gamma) = \frac{1}{2\pi}\int_\gamma \tau \cdot v_\gamma'(s)\, ds, \qquad v_\gamma = \left.\frac{1}{\|v\|}v\right|_\gamma, \tag{3.1.38}$$

where τ (s) is the unit tangent vector along the unit circle S 1 at the image point
vγ (s) under the map vγ : γ → S 1 . Rewrite vγ as

vγ (s) = (f (x(s), y(s)), g(x(s), y(s))). (3.1.39)

Then
 
$$v_\gamma'(s) = \left(f_x x'(s) + f_y y'(s),\ g_x x'(s) + g_y y'(s)\right). \tag{3.1.40}$$

The assumption on v gives us the uniform boundedness of $|f_x|, |f_y|, |g_x|, |g_y|$
inside $\Gamma$. Using this property and (3.1.40), we see that there is a γ-independent
constant C > 0 such that


$$\|v_\gamma'(s)\| \le C\sqrt{x'(s)^2 + y'(s)^2} = C. \tag{3.1.41}$$

In view of (3.1.38) and (3.1.41), we have
$$1 \le |\operatorname{ind}(v|_\gamma)| \le \frac{1}{2\pi}\int_\gamma \|v_\gamma'(s)\|\, ds \le \frac{C}{2\pi}|\gamma|, \tag{3.1.42}$$

which leads to absurdness when the total arclength |γ | of the curve γ is made
small enough.

Thus, returning to the example (3.1.35), we conclude that the vector field v
has a zero inside any circle $S_R^1$ (R > 0) since we have shown that $\operatorname{ind}(v|_{S_R^1}) = 2 \ne 0$, which can only be the origin as seen trivially in (3.1.35) already.
We now use Theorem 3.1 to establish the celebrated Fundamental Theorem
of Algebra as stated below.

Theorem 3.2 Any polynomial of degree n ≥ 1 with coefficients in C of the


form

$$f(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0, \quad a_0, \dots, a_{n-1}, a_n \in \mathbb{C}, \quad a_n \ne 0, \tag{3.1.43}$$

must have a zero in C. That is, there is some z0 ∈ C such that f (z0 ) = 0.

Proof Without loss of generality and for sake of simplicity, we may assume
an = 1 otherwise we may divide f (z) by an .
Let z = x + iy, x, y ∈ R, and write f (z) as

f (z) = P (x, y) + iQ(x, y), (3.1.44)

where P , Q are real-valued functions of x, y. Consider the vector field

v(x, y) = (P (x, y), Q(x, y)). (3.1.45)

Then it is clear that $\|v(x, y)\| = |f(z)|$ and it suffices to show that v vanishes
at some (x0 , y0 ) ∈ R2 .
In order to simplify our calculation, we consider a one-parameter deforma-
tion of f (z) given by

f t (z) = zn + t (an−1 zn−1 + · · · + a0 ), t ∈ [0, 1], (3.1.46)

and denote the correspondingly constructed vector field by v t (x, y). So on the
circle $S_R^1 = \{(x, y) \in \mathbb{R}^2 \mid \|(x, y)\| = |z| = R\}$ (R > 0), we have the uniform
lower estimate
$$\|v^t(x, y)\| = |f^t(z)| \ge R^n\left(1 - |a_{n-1}|\frac{1}{R} - \cdots - |a_0|\frac{1}{R^n}\right) \equiv C(R), \quad t \in [0, 1]. \tag{3.1.47}$$

Thus, when R is sufficiently large, we have C(R) ≥ 1 (say). For such a choice
of R, by topological invariance, we have
$$\operatorname{ind}\left(v|_{S_R^1}\right) = \operatorname{ind}\left(v^1|_{S_R^1}\right) = \operatorname{ind}\left(v^0|_{S_R^1}\right). \tag{3.1.48}$$

On the other hand, over $S_R^1$ we may again use the polar angle θ: $x = R\cos\theta$, $y = R\sin\theta$, or $z = Re^{i\theta}$, to represent $f^0$ as $f^0(z) = R^n e^{in\theta}$. Hence
$v^0 = R^n(\cos n\theta, \sin n\theta)$. Consequently,
$$v^0_{S_R^1} = \frac{1}{\|v^0\|}v^0 = (\cos n\theta, \sin n\theta). \tag{3.1.49}$$
Therefore, as before, we obtain
$$\operatorname{ind}\left(v^0|_{S_R^1}\right) = \deg\left(v^0_{S_R^1}\right) = \frac{1}{2\pi}\int_0^{2\pi} \begin{vmatrix} \cos n\theta & \sin n\theta \\ -n\sin n\theta & n\cos n\theta \end{vmatrix} d\theta = n. \tag{3.1.50}$$
In view of (3.1.48) and (3.1.50), we get $\operatorname{ind}(v|_{S_R^1}) = n$. Thus, applying Theorem 3.1, we conclude that v must vanish somewhere inside the circle $S_R^1$.

Use $\Sigma$ to denote a closed surface in $\mathbb{R}^3$ and $S^2$ the standard unit sphere in $\mathbb{R}^3$.
We may also consider a map $u : \Sigma \to S^2$. Since the orientation of $S^2$ is given
by its unit outnormal vector, say ν, we may analogously express the number
count, for the number of times that u covers $S^2$ in an orientation-preserving
manner minus the number of times that u covers $S^2$ in an orientation-reversing
manner, in the form of a surface integral, also called the degree of the map
$u : \Sigma \to S^2$, by
$$\deg(u) = \frac{1}{|S^2|}\int_\Sigma \nu \cdot d\sigma, \tag{3.1.51}$$
where dσ is the vector area element over S 2 induced from the map u.
To further facilitate computation, we may assume that $\Sigma$ is parameterized
by the parameters s, t over a two-dimensional domain $\Omega$ and u = (f, g, h),
where f, g, h are real-valued functions of s, t so that f 2 + g 2 + h2 = 1. At the
image point u, the unit outnormal of S 2 at u, is simply u itself. Moreover, the
vector area element at u under the mapping u can be represented as
 
$$d\sigma = \left(\frac{\partial u}{\partial s} \times \frac{\partial u}{\partial t}\right) ds\, dt. \tag{3.1.52}$$
Thus, inserting these and $|S^2| = 4\pi$ into (3.1.51), we arrive at
$$\deg(u) = \frac{1}{4\pi}\int_\Omega u \cdot \left(\frac{\partial u}{\partial s} \times \frac{\partial u}{\partial t}\right) ds\, dt = \frac{1}{4\pi}\int_\Omega \begin{vmatrix} f & g & h \\ f_s & g_s & h_s \\ f_t & g_t & h_t \end{vmatrix} ds\, dt. \tag{3.1.53}$$

This gives another example of the use of determinants.

Exercises

3.1.1 For the 3 × 3 system of equations
$$\begin{cases} a_1 x_1 + a_2 x_2 + a_3 x_3 = d_1, \\ b_1 x_1 + b_2 x_2 + b_3 x_3 = d_2, \\ c_1 x_1 + c_2 x_2 + c_3 x_3 = d_3, \end{cases} \tag{3.1.54}$$
find similar solution formulas as those for (3.1.17) expressed as ratios of
some 3 × 3 determinants.
3.1.2 Let $f : [\alpha, \beta] \to [a, b]$ be a real-valued continuously differen-
tiable function satisfying $\{f(\alpha), f(\beta)\} \subset \{a, b\}$, where $\alpha, \beta, a, b \in \mathbb{R}$
and α < β and a < b. Show that if $\deg(f) \ne 0$ then f(t) = c has a
solution for any c ∈ (a, b).
3.1.3 Consider the vector fields or maps f, g : R2 → R2 defined by
f (x, y) = (ax, by), g(x, y) = (a 2 x 2 − b2 y 2 , 2abxy), (x, y) ∈ R2 ,
(3.1.55)
where a, b > 0 are constants.
(i) Compute the indices of f and g around a suitable closed curve
around the origin of R2 .
(ii) What do you notice? Do the results depend on a, b? In particular,
what are the numbers (counting multiplicities) of zeros of the maps
f and g?
(iii) If we make some small alterations of f and g by a positive param-
eter ε > 0 to get (say)
fε (x, y) = (ax − ε, by),
(3.1.56)
gε (x, y) = (a 2 x 2 − b2 y 2 − ε, 2abxy), (x, y) ∈ R2 ,
what happens to the indices of fε and gε around the same closed


curve? What happens to the zeros of fε and gε ?
3.1.4 Consider the following simultaneous system of nonlinear equations
$$\begin{cases} x^3 - 3xy^2 = 5\cos^2(x + y), \\ 3x^2 y - y^3 = -2e^{-x^2 y^2}. \end{cases} \tag{3.1.57}$$

Use the topological method in this section to prove that the system has
at least one solution.
3.1.5 Consider the stereographic projection of S 2 sited in R3 with the Carte-
sian coordinates x, y, z onto the xy-plane through the south pole
(0, 0, −1) which induces a parameterization of S 2 by R2 given by

$$f = \frac{2x}{1 + x^2 + y^2}, \quad g = \frac{2y}{1 + x^2 + y^2}, \quad h = \frac{1 - x^2 - y^2}{1 + x^2 + y^2}, \quad (x, y) \in \mathbb{R}^2, \tag{3.1.58}$$
where $u = (f, g, h) \in S^2$. Regarded as the identity map $u : S^2 \to S^2$,
we have deg(u) = 1. Verify this result by computing the integral
$$\deg(u) = \frac{1}{4\pi}\int_{\mathbb{R}^2} \begin{vmatrix} f & g & h \\ f_x & g_x & h_x \\ f_y & g_y & h_y \end{vmatrix} dx\, dy. \tag{3.1.59}$$

3.1.6 The hedgehog map is a map S 2 → S 2 defined in terms of the parame-


terization of R2 by polar coordinates r, θ by the expression

u = (cos(nθ ) sin f (r), sin(nθ ) sin f (r), cos f (r)), (3.1.60)

where 0 < r < ∞, θ ∈ [0, 2π], n ∈ Z, and f is a real-valued function


satisfying the boundary condition f(0) = π and f(∞) = 0. Compute
$$\deg(u) = \frac{1}{4\pi}\int_0^\infty\!\!\int_0^{2\pi} u \cdot \left(\frac{\partial u}{\partial r} \times \frac{\partial u}{\partial \theta}\right) d\theta\, dr \tag{3.1.61}$$
and explain your result.

3.2 Definition and properties of determinants


Motivated by the practical examples shown in the previous section, we now
systematically develop the notion of determinants. There are many ways to
define determinants. The inductive definition to be presented below is perhaps


the simplest.

Definition 3.3 Consider A ∈ F(n, n) (n ≥ 1). If n = 1, then A = (a) (a ∈ F)


and the determinant of A, det(A), is defined to be det(A) = a; if A = (aij )
with n ≥ 2, the minor Mij of the entry aij is the determinant of the (n − 1) ×
(n − 1) submatrix of A obtained from deleting the ith row and j th column of
A occupied by aij and the cofactor Cij is given by

Cij = (−1)i+j Mij , i, j = 1, . . . , n. (3.2.1)

The determinant of A is defined by the expansion formula
$$\det(A) = \sum_{i=1}^{n} a_{i1} C_{i1}. \tag{3.2.2}$$

The formula (3.2.2) is also referred to as the cofactor expansion of the deter-
minant according to the first column.

This definition indicates that if a column of an n × n matrix A is zero then


det(A) = 0. To show this, we use induction. When n = 1, it is trivial. Assume
that the statement is true at n − 1 (n ≥ 2). We now prove the statement at
n (n ≥ 2). In fact, if the first column of A is zero, then det(A) = 0 simply
by the definition of determinant (see (3.2.2)); if another column rather than
the first column of A is zero, then all the cofactors Ci1 vanish by the inductive
assumption, which still results in det(A) = 0. The definition also implies that if
A = (aij ) is upper triangular then det(A) is the product of its diagonal entries,
det(A) = a11 · · · ann , as may be shown by induction as well.
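Definition 3.3 translates directly into a (very inefficient) recursive algorithm. The Python sketch below implements the first-column cofactor expansion (3.2.2) and compares it with numpy on a hypothetical 3 × 3 matrix.

    import numpy as np

    def det_cofactor(A):
        # Expand along the first column as in (3.2.2); the sign (-1)**i is
        # the cofactor sign (-1)**((i+1)+1) for the 0-based row index i.
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for i in range(n):
            minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
            total += (-1) ** i * A[i, 0] * det_cofactor(minor)
        return total

    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [7.0, 0.0, 6.0]])            # hypothetical matrix
    print(det_cofactor(A), np.linalg.det(A))   # both equal 10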
The above definition of a determinant immediately leads to the following
important properties.

Theorem 3.4 Consider the n × n matrices A, B given as
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{k1} & \cdots & a_{kn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad B = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ r a_{k1} & \cdots & r a_{kn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}. \tag{3.2.3}$$
That is, B is obtained from A by multiplying the kth row of A by a scalar r ∈ F,


k = 1, . . . , n. Then we have det(B) = r det(A).

Proof We prove the theorem by induction.


When n = 1, the statement of the theorem is trivial.
Assume that the statement is true for (n − 1) × (n − 1) matrices when n ≥ 2.
Now for the n × n matrices given in (3.2.3), use $C^A_{ij}$ and $C^B_{ij}$ to denote
the cofactors of A and B, respectively, $i, j = 1, \dots, n$. Then, (3.2.3) and the
inductive assumption give us
$$C^B_{k1} = C^A_{k1}; \qquad C^B_{i1} = r C^A_{i1}, \quad i \ne k. \tag{3.2.4}$$

Therefore, we arrive at
$$\det(B) = \sum_{i \ne k} a_{i1} C^B_{i1} + r a_{k1} C^B_{k1} = \sum_{i \ne k} a_{i1}\, r C^A_{i1} + r a_{k1} C^A_{k1} = r \det(A). \tag{3.2.5}$$

The proof is complete.

This theorem implies that if a row of a matrix A is zero then det(A) = 0.


As an application, we show that if an n × n matrix A = (aij ) is lower
triangular then det(A) = a11 · · · ann . In fact, when n = 1, there is nothing
to show. Assume that the formula is true at n − 1 (n ≥ 2). At n ≥ 2, the
first row of the minor Mi1 of A vanishes for each i = 2, . . . , n. So Mi1 = 0,
i = 2, . . . , n. However, the inductive assumption gives us M11 = a22 · · · ann .
Thus det(A) = a11 (−1)1+1 M11 = a11 · · · ann as claimed.
Therefore, if an n × n matrix A = (aij ) is either upper or lower triangular,
we infer that there holds det(At ) = det(A), although, later, we will show that
such a result is true for general matrices.

Theorem 3.5 For the n × n matrices A = (aij ), B = (bij ), C = (cij ) which


have identical rows except the kth row in which ckj = akj + bkj , j = 1, . . . , n,
we have det(C) = det(A) + det(B).

Proof We again use induction on n.


The statement is clear when n = 1.
Assume that the statement is valid for the n − 1 case (n ≥ 2).
For A, B, C given in the theorem with n ≥ 2, with the notation in the proof
of Theorem 3.4 and in view of the inductive assumption, we have
$$C^C_{k1} = C^A_{k1}; \qquad C^C_{i1} = C^A_{i1} + C^B_{i1}, \quad i \ne k. \tag{3.2.6}$$
Consequently,
$$\det(C) = \sum_{i \ne k} a_{i1} C^C_{i1} + c_{k1} C^C_{k1} = \sum_{i \ne k} a_{i1}\left(C^A_{i1} + C^B_{i1}\right) + (a_{k1} + b_{k1}) C^A_{k1} = \left(\sum_{i \ne k} a_{i1} C^A_{i1} + a_{k1} C^A_{k1}\right) + \left(\sum_{i \ne k} a_{i1} C^B_{i1} + b_{k1} C^A_{k1}\right) = \det(A) + \det(B), \tag{3.2.7}$$

as asserted.

Theorem 3.6 Let A, B be two n × n (n ≥ 2) matrices so that B is obtained


from interchanging any two rows of A. Then det(B) = − det(A).

Proof We use induction on n.


At n = 2, we can directly check that the statement of the theorem is true.
Assume that the statement is true at n − 1 ≥ 2.
Let A, B be n × n (n ≥ 3) matrices given by
⎛ ⎞ ⎛ ⎞
a11 . . . a1n a11 . . . a1n
⎜ ⎟ ⎜ ⎟
⎜ ··· ··· ··· ⎟ ⎜ ··· ··· ··· ⎟
⎜ ⎟ ⎜ ⎟
⎜ a ⎟ ⎜ a ⎟
⎜ i1 . . . ain ⎟ ⎜ j 1 . . . aj n ⎟
⎜ ⎟ ⎜ ⎟
A=⎜ ⎟ ⎜
⎜ ··· ··· ··· ⎟, B = ⎜ ··· ··· ··· ⎟,
⎟ (3.2.8)
⎜ ⎟ ⎜ ⎟
⎜ aj 1 . . . aj n ⎟ ⎜ ai1 . . . ain ⎟
⎜ ⎟ ⎜ ⎟
⎜ ··· ··· ··· ⎟ ⎜ ··· ··· ··· ⎟
⎝ ⎠ ⎝ ⎠
an1 . . . ann an1 . . . ann
where j = i + k for some k ≥ 1. We observe that it suffices to prove the
adjacent case when k = 1 because when k ≥ 2 we may obtain B from A sim-
ply by interchanging adjacent rows k times downwardly and then k − 1 times
upwardly, which gives rise to an odd number of adjacent row interchanges.
For the adjacent row interchange, j = i +1, the inductive assumption allows
us to arrive at the following relations between the minors of the matrices A and
B immediately,
M^B_{k1} = -M^A_{k1}, \quad k \neq i, i+1; \qquad M^B_{i1} = M^A_{i+1,1}, \quad M^B_{i+1,1} = M^A_{i1},   (3.2.9)

which implies that the corresponding cofactors of A and B all differ by a sign,
C^B_{k1} = -C^A_{k1}, \quad k \neq i, i+1; \qquad C^B_{i1} = -C^A_{i+1,1}, \quad C^B_{i+1,1} = -C^A_{i1}.   (3.2.10)

Hence

\det(B) = \sum_{k \neq i, i+1} a_{k1} C^B_{k1} + a_{i+1,1} C^B_{i1} + a_{i1} C^B_{i+1,1}
        = \sum_{k \neq i, i+1} a_{k1} (-C^A_{k1}) + a_{i+1,1} (-C^A_{i+1,1}) + a_{i1} (-C^A_{i1})
        = -\det(A),   (3.2.11)
as expected.
This theorem indicates that if two rows of an n × n (n ≥ 2) matrix A are
identical then det(A) = 0. Thus adding a multiple of a row to another row of
A does not alter the determinant of A:

\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ ra_{i1} + a_{j1} & \cdots & ra_{in} + a_{jn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}
= \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ ra_{i1} & \cdots & ra_{in} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}
+ \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ a_{j1} & \cdots & a_{jn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \det(A),   (3.2.12)

since the first determinant on the right-hand side is r times a determinant with two identical rows and hence vanishes.
The above results provide us with practical computational techniques when
evaluating the determinant of an n × n matrix A. In fact, we may perform the
following three types of permissible row operations on A.
(1) Multiply a row of A by a nonzero scalar. Such an operation may also
be realized by multiplying A from the left by the matrix obtained from
multiplying the corresponding row of the n × n identity matrix I by the
same scalar.
(2) Interchange any two rows of A when n ≥ 2. Such an operation may also
be realized by multiplying A from the left by the matrix obtained from
interchanging the corresponding two rows of the n × n identity matrix I .

(3) Add a multiple of a row to another row of A when n ≥ 2. Such an oper-


ation may also be realized by multiplying A from the left by the matrix
obtained from adding the same multiple of the row to another row, corre-
spondingly, of the n × n identity matrix I .
The matrices constructed in the above three types of permissible row oper-
ations are called elementary matrices of types 1, 2, 3. Let E be an elementary
matrix of a given type. Then E is invertible and E −1 is of the same type. More
precisely, if E is of type 1 and obtained from multiplying a row of I by the
scalar r, then det(E) = r det(I ) = r and E −1 is simply obtained from multi-
plying the same row of I by r −1 , resulting in det(E −1 ) = r −1 ; if E is of type
2, then E −1 = E and det(E) = det(E −1 ) = − det(I ) = −1; if E is of type 3
and obtained from adding an r multiple of the ith row to the j th row (i ≠ j )
of I , then E −1 is obtained from adding a (−r) multiple of the ith row to the
j th row of I and det(E) = det(E −1 ) = det(I ) = 1. In all cases,
det(E −1 ) = det(E)−1 . (3.2.13)
In conclusion, the properties of determinant under permissible row opera-
tions may be summarized collectively as follows.

Theorem 3.7 Let A be an n × n matrix and E be an elementary matrix of the


same dimensions. Then
det(EA) = det(E) det(A). (3.2.14)

For an n × n matrix A, we can perform a sequence of permissible row


operations on A to reduce it into an upper triangular matrix, say U , whose
determinant is simply the product of whose diagonal entries. Thus, if we
express A as Ek · · · E1 A = U where E1 , . . . , Ek are some elementary mat-
rices, then Theorem 3.7 gives us the relation
det(Ek ) · · · det(E1 ) det(A) = det(U ). (3.2.15)
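As a computational aside, the reduction (3.2.15) is exactly how determinants are evaluated in practice. The following minimal Python sketch (NumPy assumed; the function name is ours) reduces A to an upper triangular U by permissible row operations of types (2) and (3), keeping track of the sign changes caused by interchanges.

    import numpy as np

    def det_by_row_reduction(A):
        # Reduce A to upper triangular form U; type 3 operations leave the
        # determinant unchanged, each type 2 interchange flips its sign.
        U = np.array(A, dtype=float)
        n = U.shape[0]
        sign = 1.0
        for k in range(n):
            p = k + int(np.argmax(np.abs(U[k:, k])))   # pivot row
            if U[p, k] == 0.0:
                return 0.0                             # zero pivot column
            if p != k:
                U[[k, p]] = U[[p, k]]                  # type 2 operation
                sign = -sign                           # Theorem 3.6
            for i in range(k + 1, n):                  # type 3 operations
                U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
        return sign * float(np.prod(np.diag(U)))

    A = np.array([[2., 1., 3.], [0., 4., 1.], [5., 2., 2.]])
    print(det_by_row_reduction(A), np.linalg.det(A))   # the two values agree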
We are now prepared to establish the similar properties of determinants with
respect to column operations.

Theorem 3.8 The conclusions of Theorems 3.4, 3.5, and 3.6 hold when the
statements about the row vectors are replaced correspondingly with the column
vectors of matrices.

Proof Using induction, it is easy to see that the conclusions of Theorems 3.4
and 3.5 hold. We now prove that the conclusion of Theorem 3.6 holds when
the row interchange there is replaced with column interchange. That is, we
show that if two columns in an n × n matrix A are interchanged, its determi-
nant will change sign. This property is not so obvious since our definition of
determinant is based on the cofactor expansion by the first column vector and
an interchange of the first column with another column alters the first column
of the matrix. The effect of the value of determinant with respect to such an
alteration needs to be examined closely, which will be our task below.
We still use induction.
At n = 2 the conclusion may be checked directly.
Assume the conclusion holds at n − 1 ≥ 1.
We now prove the conclusion at n ≥ 3. As before, it suffices to establish the
conclusion for any adjacent column interchange.
If the column interchange does not involve the first column, we see that the
conclusion about the sign change of the determinant clearly holds in view of
the inductive assumption and the cofactor expansion formula (3.2.2) since all
the cofactors Ci1 (i = 1, . . . , n) change their sign exactly once under any pair
of column interchange.
Now consider the effect of an interchange of the first and second columns
of A. It can be checked that such an operation may be carried out through
multiplying A by the matrix F from the right where F is obtained from the
n × n identity matrix I by interchanging the first and second columns of I . Of
course, det(F ) = −1.
Let E1 , . . . , Ek be a sequence of elementary matrices and U = (uij ) an
upper triangular matrix so that Ek · · · E1 A = U . Then we have
U F = \begin{pmatrix} u_{12} & u_{11} & u_{13} & \cdots & u_{1n} \\ u_{22} & 0 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & 0 & u_{nn} \end{pmatrix}.   (3.2.16)

Thus the cofactor expansion formula down the first column, as stated in Definition 3.3, and the inductive assumption at n − 1 lead us to the result

\det(U F) = u_{12} \begin{vmatrix} 0 & u_{23} & \cdots & \cdots \\ 0 & u_{33} & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & u_{nn} \end{vmatrix} - u_{22} \begin{vmatrix} u_{11} & u_{13} & \cdots & \cdots \\ 0 & u_{33} & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & u_{nn} \end{vmatrix} = -u_{11} u_{22} \cdots u_{nn} = -\det(U).   (3.2.17)

Combining (3.2.15) and (3.2.17), we obtain


det(Ek · · · E1 AF ) = det(U F ) = − det(U )
= − det(Ek ) · · · det(E1 ) det(A). (3.2.18)
Thus, applying (3.2.14) on the left-hand side of (3.2.18), we arrive at
det(AF ) = − det(A) as desired.
Theorem 3.8 gives us additional practical computational techniques when
evaluating the determinant of an n×n matrix A because it allows us to perform
the following three types of permissible column operations on A.
(1) Multiply a column of A by a nonzero scalar. Such an operation may also
be realized by multiplying A from the right by the matrix obtained from
multiplying the corresponding column of the n × n identity matrix I by
the same scalar.
(2) Interchange any two columns of A when n ≥ 2. Such an operation may
also be realized by multiplying A from the right by the matrix obtained
from interchanging the corresponding two columns of the n × n identity
matrix I .
(3) Add a multiple of a column to another column of A when n ≥ 2. Such
an operation may also be realized by multiplying A from the right by the
matrix obtained from adding the same multiple of the column to another
column, correspondingly, of the n × n identity matrix I .
The matrices constructed in the above three types of permissible column
operations are simply the elementary matrices defined and described earlier.
Like those for permissible row operations, the properties of a determinant
under permissible column operations may be summarized collectively as well,
as follows.

Theorem 3.9 Let A be an n × n matrix and E be an elementary matrix of the


same dimensions. Then
det(AE) = det(A) det(E). (3.2.19)

With the above preparation, we are ready to harvest a series of important


properties of determinants.
First we show that a determinant is invariant under matrix transpose.

Theorem 3.10 Let A be an n × n matrix. Then


det(A) = det(At ). (3.2.20)

Proof Choose a sequence of elementary matrices E1 , . . . , Ek such that


Ek · · · E1 A = U (3.2.21)
is an upper triangular matrix. Thus from U t = At E1t · · · Ekt and Theorem 3.9
we have
det(U ) = det(U t ) = det(At E1t · · · Ekt ) = det(At ) det(E1t ) · · · det(Ekt ).
(3.2.22)
Comparing (3.2.22) with (3.2.15) and noting det(El ) = det(Elt ), l = 1, . . . , k,
because an elementary matrix is either symmetric or lower or upper triangular,
we arrive at det(A) = det(At ) as claimed.
Next we show that a determinant preserves matrix multiplication.

Theorem 3.11 Let A and B be two n × n matrices. Then


det(AB) = det(A) det(B). (3.2.23)

Proof Let E1 , . . . , Ek be a sequence of elementary matrices so that (3.2.21)


holds for an upper triangular matrix U . There are two cases to be treated
separately.
(1) U has a zero row. Then det(U ) = 0. Moreover, by the definition of matrix
multiplication, we see that U B also has a zero row. Hence det(U B) = 0.
On the other hand (3.2.21) leads us to Ek · · · E1 AB = U B. So
det(Ek ) · · · det(E1 ) det(A) = det(U ) = 0,
(3.2.24)
det(Ek ) · · · det(E1 ) det(AB) = 0.
In particular, det(A) = 0 and det(AB) = 0. Thus (3.2.23) is valid.
(2) U has no zero row. Hence unn ≠ 0. Using (type 3) row operations if nec-
essary we may assume uin = 0 for all i ≤ n − 1. Therefore un−1,n−1 ≠ 0.
Thus, using more (type 3) row operations if necessary, we may eventually
assume that U is made diagonal with u11 ≠ 0, . . . , unn ≠ 0. Using (type
1) row operations if necessary we may assume u11 = · · · = unn = 1.
That is, U = I . Thus we get Ek · · · E1 A = I and Ek · · · E1 AB = B.
Consequently,
det(Ek ) · · · det(E1 ) det(A) = 1,
(3.2.25)
det(Ek ) · · · det(E1 ) det(AB) = det(B),
which immediately lead us to the anticipated conclusion (3.2.23).
The proof is complete.

The formula (3.2.23) can be used immediately to derive a few simple but
basic conclusions about various matrices.
For example, if A ∈ F(n, n) is invertible, then there is some B ∈ F(n, n)
such that AB = In . Thus det(A) det(B) = det(AB) = det(In ) = 1, which
implies that det(A) ≠ 0. In other words, the condition det(A) ≠ 0 is neces-
sary for any A ∈ F(n, n) to be invertible. In the next section, we shall show
that this condition is also sufficient. As another example, if A ∈ R(n, n) is or-
thogonal, then AAt = In . Hence (det(A))2 = det(A) det(At ) = det(AAt ) =
det(In ) = 1. In other words, the determinant of an orthogonal matrix can only
take values ±1. Similarly, if A ∈ C(n, n) is unitary, then the condition
AA† = In leads us to the conclusion
|\det(A)|^2 = \det(A)\,\overline{\det(A)} = \det(A) \det(\bar{A}^t) = \det(AA^\dagger) = \det(I_n) = 1   (3.2.26)

(cf. (3.2.38)). That is, the determinant of a unitary matrix is of modulus one.
Below we show that we can make a cofactor expansion along any column
or row to evaluate a determinant.

Theorem 3.12 Let A = (aij ) be an n × n matrix and C = (Cij ) its cofactor


matrix. Then

\det(A) = \sum_{i=1}^n a_{ik} C_{ik} = \sum_{j=1}^n a_{kj} C_{kj}, \quad k = 1, \dots, n.   (3.2.27)

In other words, the determinant of A may be evaluated by a cofactor expansion


along any column or any row of A.

Proof We first consider the column expansion case.


We will make induction on k = 1, . . . , n.
If k = 1, there is nothing to show.
Assume the statement is true at k ≥ 1 and n ≥ 2. Interchanging the kth and
(k +1)th columns and using the inductive assumption that a cofactor expansion
can be made along the kth column of the matrix with the two columns already
interchanged, we have

-\det(A) = a_{1,k+1} (-1)^{1+k} M_{1,k+1} + \cdots + a_{i,k+1} (-1)^{i+k} M_{i,k+1} + \cdots + a_{n,k+1} (-1)^{n+k} M_{n,k+1},   (3.2.28)

which is exactly what was claimed at k + 1:

\det(A) = \sum_{i=1}^n a_{i,k+1} (-1)^{i+(k+1)} M_{i,k+1}.   (3.2.29)

We next consider the row expansion case.


We use M^{A^t}_{ij} to denote the minor of the (i, j)th entry of the matrix A^t (i, j = 1, . . . , n). Applying Theorem 3.10 and Definition 3.3 we have

\det(A) = \det(A^t) = \sum_{j=1}^n a_{1j} (-1)^{j+1} M^{A^t}_{j1} = \sum_{j=1}^n a_{1j} (-1)^{1+j} M_{1j},   (3.2.30)

which establishes the legitimacy of the cofactor expansion formula along the
first row of A. The validity of the cofactor expansion along an arbitrary row
may be proved by induction as done for the column case.
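Theorem 3.12 also translates directly into a (rather inefficient) algorithm. The short Python sketch below (the helper names are ours) expands a determinant by cofactors along any chosen row; any choice returns the same value.

    def minor(A, i, j):
        # delete row i and column j (0-based indices)
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det_cofactor(A, row=0):
        # cofactor expansion along the given row, as permitted by Theorem 3.12
        n = len(A)
        if n == 1:
            return A[0][0]
        return sum((-1) ** (row + j) * A[row][j] * det_cofactor(minor(A, row, j))
                   for j in range(n))

    A = [[1, 2, 0], [0, 2, 0], [-2, -1, -1]]
    print(det_cofactor(A, 0), det_cofactor(A, 2))   # the same value for any row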

Assume that A ∈ F(n, n) takes a boxed upper triangular form,



A = \begin{pmatrix} A_1 & A_3 \\ 0 & A_2 \end{pmatrix},   (3.2.31)

where A1 ∈ F(k, k), A2 ∈ F(l, l), A3 ∈ F(k, l), and k + l = n. Then we have
the useful formula

det(A) = det(A1 ) det(A2 ). (3.2.32)

To prove (3.2.32), we use a few permissible row operations to reduce A1 into


an upper triangular form U1 whose diagonal entries are u11 , . . . , ukk (say).
Thus det(U1 ) = u11 · · · ukk . Likewise, we may also use a few permissible
row operations to reduce A2 into an upper triangular form U2 whose diagonal
entries are uk+1,k+1 , . . . , unn (say). Thus det(U2 ) = uk+1,k+1 · · · unn . Now
apply the same sequences of permissible row operations on A. The boxed
upper triangular form of A allows us to reduce A into the following upper
triangular form

U = \begin{pmatrix} U_1 & A_4 \\ 0 & U_2 \end{pmatrix}.   (3.2.33)

Since the diagonal entries of U are u11 , . . . , ukk , uk+1,k+1 , . . . , unn , we have

det(U ) = u11 · · · ukk uk+1,k+1 · · · unn = det(U1 ) det(U2 ). (3.2.34)

Discounting the effects of the permissible row operations on A, A1 , and A2 ,


we see that (3.2.34) implies (3.2.32).

It is easy to see that if A takes a boxed lower triangular form,



A = \begin{pmatrix} A_1 & 0 \\ A_3 & A_2 \end{pmatrix},   (3.2.35)

where A1 ∈ F(k, k), A2 ∈ F(l, l), A3 ∈ F(l, k), and k + l = n, then (3.2.32)
still holds. Indeed, taking transpose in (3.2.35), we have

A^t = \begin{pmatrix} A_1^t & A_3^t \\ 0 & A_2^t \end{pmatrix},   (3.2.36)

which becomes a boxed upper triangular matrix studied earlier. Thus, using
Theorem 3.10 and (3.2.32), we obtain

det(A) = det(At ) = det(At1 ) det(At2 ) = det(A1 ) det(A2 ), (3.2.37)

as anticipated.
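The block formula (3.2.32) is easy to test numerically; here is a quick sketch (NumPy assumed; random blocks chosen purely for illustration).

    import numpy as np

    k, l = 3, 2
    rng = np.random.default_rng(0)
    A1, A2, A3 = rng.random((k, k)), rng.random((l, l)), rng.random((k, l))
    A = np.block([[A1, A3], [np.zeros((l, k)), A2]])    # boxed upper triangular
    print(np.isclose(np.linalg.det(A), np.linalg.det(A1) * np.linalg.det(A2)))  # True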

Exercises

3.2.1 For A ∈ C(n, n) use Definition 3.3 to establish the property

det(A) = det(A), (3.2.38)

where A is the matrix obtained from A by taking complex conjugate


for all entries of A.
3.2.2 Let E ∈ F(n, n) be such that each row and each column of E can only
have exactly one nonzero entry which may either be 1 or −1. Show that
det(E) = ±1.
3.2.3 In F(n, n), anti-upper triangular and anti-lower triangular matrices
are of the forms
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & \ddots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ a_{n1} & 0 & \cdots & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & \cdots & 0 & a_{1n} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \ddots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix},   (3.2.39)
respectively. Establish the formulas to express the determinants of these


matrices in terms of the anti-diagonal entries an1 , . . . , a1n .

3.2.4 Show that


\det \begin{pmatrix} x & 1 & 1 & 1 \\ 1 & y & 0 & 0 \\ 1 & 0 & z & 0 \\ 1 & 0 & 0 & t \end{pmatrix} = txyz - yz - tz - ty, \quad x, y, z, t ∈ \mathbb{R}.   (3.2.40)
3.2.5 Let A, B ∈ F(3, 3) and assume that the first and second columns of
A are same as the first and second columns of B. If det(A) = 5 and
det(B) = 2, find det(3A − 2B) and det(3A + 2B).
3.2.6 (Extension of Exercise 3.2.5) Let A, B ∈ F(n, n) such that only the j th
columns of them are possibly different. Establish the formula
det(aA + bB) = (a + b)n−1 (a det(A) + b det(B)) ,
a, b ∈ F. (3.2.41)
3.2.7 Let A(t) = (aij (t)) ∈ R(n, n) be such that each entry aij (t) is a differ-
entiable function of t ∈ R. Establish the differentiation formula
\frac{d}{dt} \det(A(t)) = \sum_{i,j=1}^n \frac{d a_{ij}(t)}{dt} C_{ij}(t),   (3.2.42)

where Cij (t) is the cofactor of the entry aij (t), i, j = 1, . . . , n, in the
matrix A(t).
3.2.8 Prove the formula
\det \begin{pmatrix} x & a_1 & a_2 & \cdots & a_n \\ a_1 & x & a_2 & \cdots & a_n \\ a_1 & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & a_3 & \cdots & x \end{pmatrix} = \Big( x + \sum_{i=1}^n a_i \Big) \prod_{i=1}^n (x - a_i).   (3.2.43)
3.2.9 Let p1 (t), . . . , pn+2 (t) be n+2 polynomials of degrees up to n ∈ N and
with coefficients in C. Show that for any n + 2 numbers c1 , . . . , cn+2
in C, there holds
\det \begin{pmatrix} p_1(c_1) & p_1(c_2) & \cdots & p_1(c_{n+2}) \\ p_2(c_1) & p_2(c_2) & \cdots & p_2(c_{n+2}) \\ \vdots & \vdots & \ddots & \vdots \\ p_{n+2}(c_1) & p_{n+2}(c_2) & \cdots & p_{n+2}(c_{n+2}) \end{pmatrix} = 0.   (3.2.44)

3.2.10 (Determinant representation of a polynomial) Establish the formula


\det \begin{pmatrix} x & -1 & 0 & \cdots & 0 & 0 \\ 0 & x & -1 & \cdots & 0 & 0 \\ 0 & 0 & x & \ddots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & x & -1 \\ a_0 & a_1 & a_2 & \cdots & a_{n-1} & a_n \end{pmatrix} = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0.   (3.2.45)

3.2.11 For n ≥ 2 establish the formula


\det \begin{pmatrix} a_1 - \lambda & a_2 & \cdots & a_n \\ a_1 & a_2 - \lambda & \cdots & a_n \\ \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & \cdots & a_n - \lambda \end{pmatrix} = (-1)^n \lambda^{n-1} \Big( \lambda - \sum_{i=1}^n a_i \Big).   (3.2.46)

3.2.12 Let A ∈ F(n, n) be such that AAt = In and det(A) < 0. Show that
det(A + In ) = 0.
3.2.13 Let A ∈ F(n, n) such that the entries of A are either 1 or −1. Show that
det(A) is an even integer when n ≥ 2.
3.2.14 Let A = (aij ) ∈ R(n, n) satisfy the following diagonally dominant
condition:

|a_{ii}| > \sum_{j \neq i} |a_{ij}|, \quad i = 1, \dots, n.   (3.2.47)

Show that det(A) ≠ 0. (This result is also known as the Levy–


Desplanques theorem.)
3.2.15 (A refined version of the previous exercise) Let A = (aij ) ∈ R(n, n)
satisfy the following positive diagonally dominant condition:

a_{ii} > \sum_{j \neq i} |a_{ij}|, \quad i = 1, \dots, n.   (3.2.48)

Show that det(A) > 0. (This result is also known as the Minkowski
theorem.)

3.2.16 If A ∈ F(n, n) is skewsymmetric and n is odd, then A must be singular.


What happens when n is even?
3.2.17 Let α, β ∈ F(1, n). Establish the formula

det(In − α t β) = 1 − αβ t . (3.2.49)

3.2.18 Compute the determinant


f(x_1, \dots, x_n) = \det \begin{pmatrix} 100 & x_1 & x_2 & \cdots & x_n \\ x_1 & 1 & 0 & \cdots & 0 \\ x_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & 0 & 0 & \cdots & 1 \end{pmatrix}   (3.2.50)
and describe what the equation f (x1 , . . . , xn ) = 0 represents
geometrically.

3.3 Adjugate matrices and Cramer’s rule


Let A = (aij ) be an n × n (n ≥ 2) matrix and let C = (Cij ) be the associated
cofactor matrix. For k, l = 1, . . . , n, we may apply Theorem 3.12 to obtain the
relations

\sum_{i=1}^n a_{ik} C_{il} = 0, \quad k \neq l,   (3.3.1)

\sum_{j=1}^n a_{kj} C_{lj} = 0, \quad k \neq l.   (3.3.2)

In fact, it is easy to see that the left-hand side of (3.3.1) is the cofactor expan-
sion of the determinant along the lth column of such a matrix that is obtained
from A through replacing the lth column by the kth column of A whose value
must be zero and the left-hand side of (3.3.2) is the cofactor expansion of the
determinant along the lth row of such a matrix that is obtained from A through
replacing the lth row by the kth row of A whose value must also be zero.
We can summarize the properties stated in (3.2.27), (3.3.1), and (3.3.2) by
the expressions

C t A = det(A)In , AC t = det(A)In . (3.3.3)

These results motivate the following definition.



Definition 3.13 Let A be an n × n matrix and C its cofactor matrix. The


adjugate matrix of A, denoted by adj(A), is the transpose of the cofactor matrix
of A:

adj(A) = C t . (3.3.4)

Adjugate matrices are sometimes also called adjoint or adjunct matrices.


As a consequence of this definition and (3.3.3) we have

adj(A)A = A adj(A) = det(A)In , (3.3.5)

which leads immediately to the following conclusion.

Theorem 3.14 Let A be an n × n matrix. Then A is invertible if and only if


det(A) ≠ 0. Furthermore, if A is invertible, then A−1 may be expressed as
A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A).   (3.3.6)

Proof If det(A) ≠ 0, from (3.3.5) we arrive at (3.3.6). Conversely, if A−1
exists, then from AA−1 = In and Theorem 3.11 we have det(A) det(A−1 ) = 1.
Thus det(A) ≠ 0.
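The relations (3.3.5) and (3.3.6) may also be checked numerically. The following Python sketch (NumPy assumed; the function name is ours) builds the adjugate entry by entry from the cofactors.

    import numpy as np

    def adjugate(A):
        # transpose of the cofactor matrix, cf. Definition 3.13
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                M = np.delete(np.delete(A, i, axis=0), j, axis=1)   # minor M_ij
                C[i, j] = (-1) ** (i + j) * np.linalg.det(M)
        return C.T

    A = np.array([[1., 2., 0.], [0., 2., 0.], [-2., -1., -1.]])
    print(adjugate(A) @ A)                     # equals det(A) I_n, cf. (3.3.5)
    print(adjugate(A) / np.linalg.det(A))      # equals the inverse, cf. (3.3.6)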

As an important application, we consider the unique solution of the system

Ax = b (3.3.7)

when A = (aij ) is an invertible n × n matrix, x = (x1 , . . . , xn )t the vector of


unknowns, and b = (b1 , . . . , bn )t a given non-homogeneous right-hand-side
vector.
In such a situation we can use (3.3.6) to get
x = \frac{1}{\det(A)} \mathrm{adj}(A)\, b.   (3.3.8)
Therefore, with adj(A) = (Aij ) = C t (where C = (Cij ) is the cofactor matrix
of A) we may read off to obtain the result

x_i = \frac{1}{\det(A)} \sum_{j=1}^n A_{ij} b_j = \frac{1}{\det(A)} \sum_{j=1}^n b_j C_{ji} = \frac{\det(A_i)}{\det(A)}, \quad i = 1, \dots, n,   (3.3.9)
where Ai is the matrix obtained from A after replacing the ith column of A by
the vector b, i = 1, . . . , n.

The formulas stated in (3.3.9) are called Cramer’s formulas. Such a solution
method is also called Cramer’s rule.
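Cramer's rule is equally simple to state in code; a minimal sketch (NumPy assumed; the function name is ours):

    import numpy as np

    def cramer_solve(A, b):
        # x_i = det(A_i)/det(A), A_i being A with its ith column replaced by b
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[2., 1.], [1., 3.]])
    b = np.array([3., 5.])
    print(cramer_solve(A, b), np.linalg.solve(A, b))   # the same solution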
Let A ∈ F(m, n) and of rank k. Then there are k row vectors of A which are
linearly independent. Use B to denote the submatrix of A consisting of those k
row vectors. Since B is of rank k we know that there are k column vectors of B
which are linearly independent. Use C to denote the submatrix of B consisting
of those k column vectors. Then C is a submatrix of A which lies in F(k, k)
and is of rank k. In particular det(C) ≠ 0. In other words, we have shown that
if A is of rank k then A has a k × k submatrix of nonzero determinant.
To end this section, we consider a practical problem as an application of
determinants: The unique determination of a polynomial by interpolation.
Let p(t) be a polynomial of degree (n − 1) ≥ 1 over a field F given by

p(t) = an−1 t n−1 + · · · + a1 t + a0 , (3.3.10)

and let t1 , . . . , tn be n points in F so that p(ti ) = pi (i = 1, . . . , n). To ease the


illustration to follow, we may assume that n is sufficiently large (say n ≥ 5).
Therefore we have the simultaneous system of equations



\begin{cases} a_0 + a_1 t_1 + \cdots + a_{n-2} t_1^{n-2} + a_{n-1} t_1^{n-1} = p_1, \\ a_0 + a_1 t_2 + \cdots + a_{n-2} t_2^{n-2} + a_{n-1} t_2^{n-1} = p_2, \\ \qquad \cdots \\ a_0 + a_1 t_n + \cdots + a_{n-2} t_n^{n-2} + a_{n-1} t_n^{n-1} = p_n, \end{cases}   (3.3.11)

in the n unknowns a0 , a1 , . . . , an−2 , an−1 , whose coefficient matrix, say A, has


the determinant

\det(A) = \begin{vmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-2} & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-2} & t_2^{n-1} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-2} & t_n^{n-1} \end{vmatrix}.   (3.3.12)

Adding the (−t1 ) multiple of the second last column to the last column,. . . ,
the (−t1 ) multiple of the second column to the third column, and the (−t1 )
multiple of the first column to the second column, we get
\det(A)
= \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & (t_2 - t_1) & t_2(t_2 - t_1) & \cdots & t_2^{n-3}(t_2 - t_1) & t_2^{n-2}(t_2 - t_1) \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & (t_n - t_1) & t_n(t_n - t_1) & \cdots & t_n^{n-3}(t_n - t_1) & t_n^{n-2}(t_n - t_1) \end{vmatrix}
= \prod_{i=2}^n (t_i - t_1) \begin{vmatrix} 1 & t_2 & t_2^2 & \cdots & t_2^{n-3} & t_2^{n-2} \\ 1 & t_3 & t_3^2 & \cdots & t_3^{n-3} & t_3^{n-2} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-3} & t_n^{n-2} \end{vmatrix}.   (3.3.13)

Continuing the same expansion, we eventually get

\det(A) = \prod_{i=2}^n (t_i - t_1) \Big( \prod_{j=3}^n (t_j - t_2) \Big) \cdots \Big( \prod_{k=n-1}^n (t_k - t_{n-2}) \Big) (t_n - t_{n-1}) = \prod_{1 \le i < j \le n} (t_j - t_i).   (3.3.14)

Hence the matrix A is invertible if and only if t1 , t2 , . . . , tn are distinct. Under


such a condition the coefficients a0 , a1 , . . . , an−1 are uniquely determined.
The determinant (3.3.12) is called the Vandermonde determinant.
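The product formula (3.3.14) can be confirmed numerically for any choice of distinct points; a short sketch (NumPy assumed):

    import numpy as np
    from itertools import combinations

    t = np.array([1.0, 2.0, -1.5, 0.5])
    V = np.vander(t, increasing=True)    # rows (1, t_i, t_i^2, ..., t_i^{n-1})
    prod = np.prod([t[j] - t[i] for i, j in combinations(range(len(t)), 2)])
    print(np.isclose(np.linalg.det(V), prod))   # True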
If A, B ∈ F(n, n) are similar, there is an invertible C ∈ F(n, n) such that
A = C −1 BC, which gives us det(A) = det(B). That is, similar matrices
have the same determinant. In view of this fact and the fact that the matrix
representations of a linear mapping from a finite-dimensional vector space into
itself with respect to different bases are similar, we may define the determinant
of a linear mapping, say T , denoted by det(T ), to be the determinant of the
matrix representation of T with respect to any basis. In this situation, we see
that T is invertible if and only if det(T ) ≠ 0.

Exercises

3.3.1 Show that A ∈ F(m, n) is of rank k if and only if there is a k × k


submatrix of A whose determinant is nonzero and that the determinant
of any square submatrix of a larger size (if any) is zero.
3.3.2 Let A ∈ R(n, n) be an orthogonal matrix. Prove that adj(A) = At or
−At depending on the sign of det(A).

3.3.3 For A ∈ F(n, n) (n ≥ 2) establish the formula

det(adj(A)) = (det(A))n−1 . (3.3.15)

In particular this implies that A is nonsingular if and only if adj(A) is


so.
3.3.4 For A ∈ F(n, n) (n ≥ 2) prove the rank relations



r(\mathrm{adj}(A)) = \begin{cases} n, & r(A) = n, \\ 1, & r(A) = n - 1, \\ 0, & r(A) \le n - 2. \end{cases}   (3.3.16)

3.3.5 Let A ∈ F(n, n) be invertible. Show that adj(A−1 ) = (adj(A))−1 .


3.3.6 For A ∈ F(n, n) where n ≥ 3 show that adj(adj(A)) = (det(A))n−2 A.
What happens when n = 2?
3.3.7 For A ∈ F(n, n) where n ≥ 2 show that adj(At ) = (adj(A))t .
3.3.8 Let A ∈ R(n, n) satisfy adj(A) = At . Prove that A = 0 if and only if
det(A) = 0.
3.3.9 For A, B ∈ R(n, n) show that adj(AB) = adj(B)adj(A). Thus, if A is
idempotent, so is adj(A), and if A is nilpotent, so is adj(A).
3.3.10 For A = (aij ) ∈ F(n, n) consider the linear system


\begin{cases} a_{11} x_1 + \cdots + a_{1n} x_n = 0, \\ \qquad \cdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n = 0. \end{cases}   (3.3.17)

Show that if A is singular but adj(A) ≠ 0 then the space of solutions


of (3.3.17) is spanned by any nonzero column vector of adj(A).
3.3.11 Use Cramer’s rule and the Vandermonde determinant to find the
quadratic polynomial p(t) with coefficients in C whose values at
t = 1, 1 + i, −3 are −2, 0, 5, respectively.
3.3.12 Let p(t) be a polynomial of degree n − 1 given in (3.3.10). Use
Cramer’s rule and the Vandermonde determinant to prove that p(t) can-
not have n distinct zeros unless it is the zero polynomial.
3.3.13 Let A ∈ F(m, n), B ∈ F(n, m), and m > n. Show that det(AB) = 0.
3.3.14 For U = F(n, n) define T ∈ L(U ) by T (A) = AB − BA for A ∈ U
where B ∈ U is fixed. Show that for such a linear mapping T we have
det(T ) = 0 no matter how B is chosen.

3.4 Characteristic polynomials and Cayley–Hamilton theorem
We first consider the concrete case of matrices.
Let A = (aij ) be an n × n matrix over a field F. We first consider the linear
mapping TA : Fn → Fn induced from A defined by
T_A(x) = Ax, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{F}^n.   (3.4.1)

Recall that an eigenvalue of TA is a scalar λ ∈ F such that the null-space

Eλ = N (TA − λI ) = {x ∈ Fn | TA (x) = λx}, (3.4.2)

where I ∈ L(Fn ) is the identity mapping, is nontrivial and Eλ is called an


eigenspace of TA , whose nonzero vectors are called eigenvectors. The eigen-
values, eigenspaces, and eigenvectors of TA are also often simply referred to
as those of the matrix A. The purpose of this section is to show how to use
determinant as a tool to find the eigenvalues of A.
If λ is an eigenvalue of A and x an eigenvector, then Ax = λx. In other
words, x is a nonzero solution of the equation (λIn − A)x = 0. Hence the
matrix λIn − A is singular. Therefore λ satisfies

det(λIn − A) = 0. (3.4.3)

Of course the converse is true as well: If λ satisfies (3.4.3) then (λIn −A)x = 0
has a nontrivial solution which indicates that λ is an eigenvalue of A. Conse-
quently the eigenvalues of A are the roots of the function

pA (λ) = det(λIn − A), (3.4.4)

which is seen to be a polynomial of degree n following the cofactor expansion


formula (3.2.2) in Definition 3.3. The polynomial pA (λ) defined in (3.4.4) is
called the characteristic polynomial associated with the matrix A, whose roots
are called the characteristic roots of A. So the eigenvalues of A are the char-
acteristic roots of A. In particular, A can have at most n distinct eigenvalues.

Theorem 3.15 Let A = (aij ) be an n × n (n ≥ 2) matrix and pA (λ) its


characteristic polynomial. Then pA (λ) is of the form

pA (λ) = λn − Tr(A)λn−1 + · · · + (−1)n det(A). (3.4.5)



Proof Write pA (λ) = an λn + an−1 λn−1 + · · · + a0 . Then


a0 = pA (0) = det(−A) = (−1)n det(A) (3.4.6)
as asserted. Besides, using Definition 3.3 and induction, we see that the two
leading-degree terms in p_A(λ) containing λ^n and λ^{n−1} can only appear in the
product of the entry λ − a_{11} and the cofactor C^{λI_n−A}_{11} of the entry λ − a_{11}
in the matrix λI_n − A. Let A_{n−1} be the submatrix of A obtained by deleting
the row and column vectors of A occupied by the entry a_{11}. Then C^{λI_n−A}_{11} =
det(λI_{n−1} − A_{n−1}), whose two leading-degree terms containing λ^{n−1} and λ^{n−2}
can only appear in the product of λ − a_{22} and its cofactor in the matrix λI_{n−1} −
A_{n−1}. Carrying out this process to the end we see that the two leading-degree
terms in p_A(λ) can appear only in the product
(λ − a11 ) · · · (λ − ann ), (3.4.7)
which may be read off to give us the results
λn , −(a11 + · · · + ann )λn−1 . (3.4.8)
This completes the proof.
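Numerically, the coefficients of p_A(λ) are readily available; NumPy, for instance, computes them from the eigenvalues, and the second and last coefficients can be compared with (3.4.5). A small sketch (NumPy assumed):

    import numpy as np

    A = np.array([[1., 2., 0.], [0., 2., 0.], [-2., -1., -1.]])
    coeffs = np.poly(A)   # [1, -Tr(A), ..., (-1)^n det(A)], leading power first
    print(coeffs)
    print(-np.trace(A), (-1) ** 3 * np.linalg.det(A))   # the 2nd and last entries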
If A ∈ C(n, n) we may also establish Theorem 3.15 by means of calculus.
In fact, we have
 
\frac{p_A(\lambda)}{\lambda^n} = \det\Big( I_n - \frac{1}{\lambda} A \Big).   (3.4.9)

Thus, letting λ → ∞ in (3.4.9), we obtain a_n = det(I_n) = 1. Moreover, (3.4.9)
also gives us the expression

a_n + a_{n-1} \frac{1}{\lambda} + \cdots + a_0 \frac{1}{\lambda^n} = \det\Big( I_n - \frac{1}{\lambda} A \Big).   (3.4.10)

Thus, replacing 1/λ by t, we get

q(t) \equiv a_n + a_{n-1} t + \cdots + a_0 t^n = \begin{vmatrix} 1 - t a_{11} & -t a_{12} & \cdots & -t a_{1n} \\ -t a_{21} & 1 - t a_{22} & \cdots & -t a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ -t a_{n1} & -t a_{n2} & \cdots & 1 - t a_{nn} \end{vmatrix}.   (3.4.11)

Therefore

a_{n-1} = q'(0) = -\sum_{i=1}^n a_{ii}   (3.4.12)

as before.
We now consider the abstract case.
Let U be an n-dimensional vector space over a field F and T ∈ L(U ).
Assume that u ∈ U is an eigenvector of T associated to the eigenvalue λ ∈ F.
Given a basis U = {u1 , . . . , un } we express u as
u = x1 u1 + · · · + xn un , x = (x1 , . . . , xn )t ∈ Fn . (3.4.13)
Let A = (aij ) ∈ F(n, n) be the matrix representation of T with respect to the
basis U so that

T(u_j) = \sum_{i=1}^n a_{ij} u_i, \quad j = 1, \dots, n.   (3.4.14)

Then the relation T (u) = λu leads to



\sum_{j=1}^n a_{ij} x_j = \lambda x_i, \quad i = 1, \dots, n,   (3.4.15)

which is exactly the matrix equation Ax = λx already discussed. Hence λ may


be obtained from solving for the roots of the characteristic equation (3.4.3).
Let V = {v1 , . . . , vn } be another basis of U and B = (bij ) ∈ F(n, n) be the
matrix representation of T with respect to the basis V so that

T(v_j) = \sum_{i=1}^n b_{ij} v_i, \quad j = 1, \dots, n.   (3.4.16)

Since U and V are two bases of U , there is an invertible matrix C = (cij ) ∈


F(n, n) called the basis transition matrix such that

v_j = \sum_{i=1}^n c_{ij} u_i, \quad j = 1, \dots, n.   (3.4.17)

Following the study of the previous chapter we know that A, B, C are re-
lated through the similarity relation A = C −1 BC. Hence we have
pA (λ) = det(λIn − A) = det(λIn − C −1 BC)
= det(C −1 [λIn − B]C)
= det(λIn − B) = pB (λ). (3.4.18)
That is, two similar matrices have the same characteristic polynomial. Thus
we may use pA (λ) to define the characteristic polynomial of linear mapping
T ∈ L(U ), rewritten as pT (λ), where A is the matrix representation of T with
respect to any given basis of U , since such a polynomial is independent of the
choice of the basis.

The following theorem, known as the Cayley–Hamilton theorem, is of fun-


damental importance in linear algebra.

Theorem 3.16 Let A ∈ C(n, n) and pA (λ) be its characteristic polynomial.


Then
pA (A) = 0. (3.4.19)
Proof We split the proof into two situations.
First we assume that A has n distinct eigenvalues, say λ1 , . . . , λn . We can
write pA (λ) as
p_A(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i).   (3.4.20)

Let u1 , . . . , un ∈ Cn be the eigenvectors associated to the eigenvalues


λ1 , . . . , λn respectively which form a basis of Cn . Then
p_A(A) u_i = \Big( \prod_{j=1}^n (A - \lambda_j I_n) \Big) u_i = \Big( \prod_{j \neq i} (A - \lambda_j I_n) \Big) (A - \lambda_i I_n) u_i = 0,   (3.4.21)

for any i = 1, . . . , n. This proves pA (A) = 0.


Next we consider the general situation when A may have multiple eigen-
values. We begin by showing that A may be approximated by matrices of n
distinct eigenvalues. We show this by induction on n. When n = 1, there is
nothing to do. Assume the statement is true at n − 1 ≥ 1. We now proceed
with n ≥ 2.
In view of Theorem 3.2 there exists an eigenvalue of A, say λ1 . Let u1 be an
associated eigenvector. Extend {u1 } to get a basis for Cn , say {u1 , u2 , . . . , un }.
Then the linear mapping defined by (3.4.1) satisfies

T_A(u_j) = A u_j = \sum_{i=1}^n b_{ij} u_i, \quad j = 1, \dots, n; \qquad b_{11} = \lambda_1, \quad b_{i1} = 0, \quad i = 2, \dots, n.   (3.4.22)
That is, the matrix B = (bij ) of TA with respect to the basis {u1 , . . . , un } is of
the form

B = \begin{pmatrix} \lambda_1 & b_0 \\ 0 & B_0 \end{pmatrix},   (3.4.23)

where B0 ∈ C(n − 1, n − 1) and b0 ∈ Cn−1 is a row vector. By the inductive


assumption, for any ε > 0, there is some C0 ∈ C(n − 1, n − 1) such that
‖B0 − C0‖ < ε and C0 has n − 1 distinct eigenvalues, say λ2 , . . . , λn , where
we use ‖ · ‖ to denote any norm of the space of square matrices whenever there
is no risk of confusion. Now set

λ1 + δ b0
C= . (3.4.24)
0 C0

It is clear that the eigenvalues of C are λ1 + δ, λ2 , . . . , λn , which are distinct


when δ > 0 is small enough. Of course there is a constant K > 0 depending
on n only such that B − C < Kε when δ > 0 is sufficiently small.
On the other hand, if we use the notation uj = (u1j , . . . , unj )t , j =
1, . . . , n, then we have


u_j = \sum_{i=1}^n u_{ij} e_i, \quad j = 1, \dots, n.   (3.4.25)

Thus, with U = (uij ), we obtain the relation A = U −1 BU . Therefore

‖A − U^{-1}CU‖ = ‖U^{-1}(B − C)U‖ ≤ ‖U^{-1}‖ ‖U‖ Kε.   (3.4.26)

Of course U −1 CU has the same n distinct eigenvalues as the matrix C. Hence


the asserted approximation property in the situation of n × n matrices is estab-
lished.
Finally, let {A(l) } be a sequence in C(n, n) such that A(l) → A as l → ∞
and each A(l) has n distinct eigenvalues (l = 1, 2, . . . ). We have already shown
that pA(l) (A(l) ) = 0 (l = 1, 2, . . . ). Consequently,

pA (A) = lim pA(l) (A(l) ) = 0. (3.4.27)


l→∞

The proof of the theorem is now complete.

Of course Theorem 3.16 is valid for A ∈ F(n, n) whenever F is a subfield


of C although A may not have an eigenvalue in F. Important examples include
F = Q and F = R.
We have seen that the proof of Theorem 3.16 relies on the assumption of the
existence of an eigenvalue which in general is only valid when the underlying
field is C. In fact, however, the theorem holds universally over any field. Below
we give a proof of it without assuming F = C, which is of independent interest
and importance.

To proceed, we need to recall and re-examine the definition of polynomi-


als in the most general terms. Given a field F, a polynomial p over F is an
expression of the form
p(t) = a0 + a1 t + · · · + an t n , a0 , a1 , . . . , an ∈ F, (3.4.28)
where a0 , a1 , . . . , an are called the coefficients of p and the variable t is a
formal symbol that ‘generates’ the formal symbols t 0 = 1, t 1 = t, . . . , t n =
(t)n (called the powers of t) which ‘participate’ in all the algebraic operations
following the usual associative, commutative, and distributive laws as if t were
an F-valued parameter or variable. For example, (at i )(bt j ) = abt i+j for any
a, b ∈ F and i, j ∈ N. The set of all polynomials with coefficients in F and in
the variable t is denoted by P, which is a vector space over F, with addition
and scalar multiplication defined in the usual ways, whose zero element is any
polynomial with zero coefficients. In other words, two polynomials in P are
considered equal if and only if the coefficients of the same powers of t in the
two polynomials all agree.
In the rest of this section we shall assume that the variable of any polynomial
we consider is such a formal symbol. It is clear that this assumption does not
affect the computation we perform.
Let A ∈ F(n, n) and λ be a (formal) variable just introduced. Then we have
(λIn − A)adj(λIn − A) = det(λIn − A)In , (3.4.29)
whose right-hand side is simply
det(λIn − A)In = (λn + an−1 λn−1 + · · · + a1 λ + a0 )In . (3.4.30)
On the other hand, we may expand the left-hand side of (3.4.29) into the form
(λIn − A)adj(λIn − A)
= (λIn − A)(An−1 λn−1 + · · · + A1 λ + A0 )
= A_{n-1} λ^n + (A_{n-2} − AA_{n-1}) λ^{n-1} + \cdots + (A_0 − AA_1) λ − AA_0,   (3.4.31)
where A0 , A1 , . . . , An−1 ∈ F(n, n). Comparing the like powers of λ in
(3.4.30) and (3.4.31) we get the relations
An−1 = In , An−2 − AAn−1 = an−1 In , ...,
(3.4.32)
A0 − AA1 = a1 In , −AA0 = a0 In .
Multiplying from the left the first relation in (3.4.32) by An , the second by
An−1 ,. . . , and the second last by A, and then summing up the results, we obtain
0 = An + an−1 An−1 + · · · + a1 A + a0 In = pA (A), (3.4.33)
as anticipated.

Let U be an n-dimensional vector space over a field F and T ∈ L(U ). Given


a basis U = {u1 , . . . , un } let A ∈ F(n, n) be the matrix that represents T with
respect to U . Then we know that Ai represents T i with respect to U for any
i = 1, 2 . . . . As a consequence, for any polynomial p(t) with coefficients in
F, the matrix p(A) represents p(T ) with respect to U. Therefore the Cayley–
Hamilton theorem may be restated in terms of linear mappings as follows.

Theorem 3.17 Let U be a finite-dimensional vector space over an arbitrary


field. Any T ∈ L(U ) is trivialized or annihilated by its characteristic polyno-
mial pT (λ). That is,
pT (T ) = 0. (3.4.34)

For A ∈ F(n, n) (n ≥ 2) let pA (λ) = λn + an−1 λn−1 + · · · + a1 λ + a0 be


the characteristic polynomial of A. Theorem 3.15 gives us a0 = (−1)n det(A).
Inserting this result into the equation pA (A) = 0 we have
A(An−1 + an−1 An−2 + · · · + a1 In ) = (−1)n+1 det(A)In , (3.4.35)
which leads us again to the conclusion that A is invertible whenever det(A) ≠ 0.
Furthermore, in this situation, the relation (3.4.35) implies
A^{-1} = \frac{(-1)^{n+1}}{\det(A)} (A^{n-1} + a_{n-1} A^{n-2} + \cdots + a_1 I_n),   (3.4.36)
or alternatively,
adj(A) = (−1)^{n+1} (A^{n−1} + a_{n−1} A^{n−2} + · · · + a_1 I_n ), det(A) ≠ 0.
(3.4.37)
We leave it as an exercise to show that the condition det(A) ≠ 0 above for
(3.4.37) to hold is not necessary and can thus be dropped.
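Both the Cayley–Hamilton theorem and the inverse formula (3.4.36) are easy to test numerically; here is a minimal sketch (NumPy assumed), using the matrix of Exercise 3.4.8.

    import numpy as np

    A = np.array([[1., 2., 0.], [0., 2., 0.], [-2., -1., -1.]])
    n = A.shape[0]
    c = np.poly(A)   # [1, a_{n-1}, ..., a_1, a_0], so c[n-k] multiplies A^k
    pA = sum(c[n - k] * np.linalg.matrix_power(A, k) for k in range(n + 1))
    print(np.allclose(pA, 0))                       # p_A(A) = 0
    Ainv = -sum(c[n - k] * np.linalg.matrix_power(A, k - 1)
                for k in range(1, n + 1)) / c[n]    # cf. (3.4.36)
    print(np.allclose(Ainv, np.linalg.inv(A)))      # True since det(A) != 0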

Exercises

3.4.1 Let pA (λ) be the characteristic polynomial of a matrix A = (aij ) ∈


F(n, n), where n ≥ 2 and F = R or C. Show that in pA (λ) = λn + · · · +
a1 λ + a0 we have

a_1 = (-1)^{n-1} \sum_{i=1}^n C_{ii},   (3.4.38)

where Cii is the cofactor of the entry aii of the matrix A (i = 1, . . . , n).
3.4.2 Consider the subset D of C(n, n) defined by
D = {A ∈ C(n, n) | A has n distinct eigenvalues}. (3.4.39)

Prove that D is open in C(n, n).


3.4.3 Let A ∈ F(2, 2) be given by

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.   (3.4.40)
Find the characteristic polynomial of A. Assume det(A) = ad − bc ≠ 0
and use (3.4.36) to derive the formula that gives A−1 .
3.4.4 Show that (3.4.37) is true for any A ∈ F(n, n). That is, the condition
det(A) ≠ 0 in (3.4.37) may actually be removed.
3.4.5 Let U be an n-dimensional vector space over F and T ∈ L(U ) is nilpo-
tent of degree n. What is the characteristic polynomial of T ?
3.4.6 For A, B ∈ F(n, n), prove that the characteristic polynomials of AB and
BA are identical. That is,
pAB (λ) = pBA (λ). (3.4.41)
3.4.7 Let F be a field and α, β ∈ F(1, n). Find the characteristic polynomial
of the matrix α t β ∈ F(n, n).
3.4.8 Consider the matrix
A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 0 \\ -2 & -1 & -1 \end{pmatrix}.   (3.4.42)
(i) Use the characteristic polynomial of A and the Cayley–Hamilton
theorem to find A−1 .
(ii) Use the characteristic polynomial of A and the Cayley–Hamilton
theorem to find A10 .
4
Scalar products

In this chapter we consider vector spaces over a field which is either R or C. We


shall start from the most general situation of scalar products. We then consider
the situations when scalar products are non-degenerate and positive definite,
respectively.

4.1 Scalar products and basic properties


In this section, we use F to denote the field R or C.

Definition 4.1 Let U be a vector space over F. A scalar product over U is


defined to be a bilinear symmetric function f : U × U → F, written simply
as (u, v) ≡ f (u, v), u, v ∈ U . In other words the following properties hold.

(1) (Symmetry) (u, v) = (v, u) ∈ F for u, v ∈ U .


(2) (Additivity) (u + v, w) = (u, w) + (v, w) for u, v, w ∈ U .
(3) (Homogeneity) (au, v) = a(u, v) for a ∈ F and u, v ∈ U .
We say that u, v ∈ U are mutually perpendicular or orthogonal to each
other, written as u ⊥ v, if (u, v) = 0. More generally for any non-empty
subset S of U we use the notation
S ⊥ = {u ∈ U | (u, v) = 0 for any v ∈ S}. (4.1.1)
For u ∈ U we say that u is a null vector if (u, u) = 0.

It is obvious that S ⊥ is a subspace of U for any nonempty subset S of U .


Moreover {0}⊥ = U . Furthermore it is easy to show that if the vectors
u1 , . . . , uk are mutually perpendicular and not null then they are linearly
independent.


Let u, v ∈ U so that u is not null. Then we can resolve v into the sum of
two mutually perpendicular vectors, one in Span{u}, say cu for some scalar
c, and one in Span{u}⊥ , say w. In fact, rewrite v as v = w + cu and require
(u, w) = 0. We obtain the unique solution c = (u, v)/(u, u). In summary, we
have obtained the orthogonal decomposition
v = w + \frac{(u, v)}{(u, u)} u, \qquad w = v - \frac{(u, v)}{(u, u)} u \in \mathrm{Span}\{u\}^\perp.   (4.1.2)
As the first application of the decomposition (4.1.2), we state the following.

Theorem 4.2 Let U be a finite-dimensional vector space equipped with a


scalar product (·, ·) and set U0 = U ⊥ . Any basis of U0 can be extended to
become an orthogonal basis of U . In other words any two vectors in such a
basis of U are mutually perpendicular.

Proof If U0 = U there is nothing to show. Below we assume U0 ≠ U.
If U0 = {0} we may start from any basis of U. If U0 ≠ {0} let {u1 , . . . , uk}
be a basis of U0 and extend it to get a basis of U, say {u1 , . . . , uk , v1 , . . . , vl}.
That is, U = U0 ⊕ V, where

V = Span{v1 , . . . , vl}.   (4.1.3)

If (v1 , v1) = 0 then there is some vi (i ≥ 2) such that (v1 , vi) ≠ 0, otherwise
v1 ∈ U0, which is false. Thus, without loss of generality, we may
assume (v1 , v2) ≠ 0 and consider {v2 , v1 , . . . , vl} instead if (v2 , v2) ≠ 0; oth-
erwise we may consider {v1 + v2 , v2 , . . . , vl} as a basis of V because now
(v1 + v2 , v1 + v2) = 2(v1 , v2) ≠ 0. So we have seen that we may assume
(v1 , v1) ≠ 0 to start with after renaming the basis vectors {v1 , . . . , vl} of V if
necessary. Now let w1 = v1 and set

w_i = v_i - \frac{(w_1, v_i)}{(w_1, w_1)} w_1, \quad i = 2, \dots, l.   (4.1.4)

Then wi ≠ 0 since v1 , vi are linearly independent for all i = 2, . . . , l. It is
clear that wi ⊥ w1 (i = 2, . . . , l). If (wi , wi) ≠ 0 for some i = 2, . . . , l,
we may assume i = 2 after renaming the basis vectors {v1 , . . . , vl} of V if
necessary. If (wi , wi) = 0 for all i = 2, . . . , l, there must be some j ≠ i,
i, j = 2, . . . , l, such that (wi , wj) ≠ 0, otherwise wi ∈ U0 for i = 2, . . . , l,
which is false. Without loss of generality, we may assume (w2 , w3) ≠ 0 and
consider

w_2 + w_3 = (v_2 + v_3) - \frac{(w_1, v_2 + v_3)}{(w_1, w_1)} w_1.   (4.1.5)

It is clear that (w2 + w3 , w2 + w3) = 2(w2 , w3) ≠ 0 and (w2 + w3) ⊥ w1. Since
{v1 , v2 + v3 , v3 , . . . , vl} is also a basis of V, the above procedure indicates that
we may rename the basis vectors {v1 , . . . , vl} of V if necessary so that we
obtain

w_2 = v_2 - \frac{(w_1, v_2)}{(w_1, w_1)} w_1,   (4.1.6)

which satisfies (w2 , w2) ≠ 0. Of course Span{v1 , . . . , vl} =
Span{w1 , w2 , v3 , . . . , vl}. Now set

w_i = v_i - \frac{(w_2, v_i)}{(w_2, w_2)} w_2 - \frac{(w_1, v_i)}{(w_1, w_1)} w_1, \quad i = 3, \dots, l.   (4.1.7)

Then wi ⊥ w2 and wi ⊥ w1 for i = 3, . . . , l. If (wi , wi) ≠ 0 for some i =
3, . . . , l, by renaming {v1 , . . . , vl} if necessary, we may assume (w3 , w3) ≠ 0.
If (wi , wi) = 0 for all i = 3, . . . , l, then there is some i = 4, . . . , l such
that (w3 , wi) ≠ 0. We may assume (w3 , w4) ≠ 0. Thus (w3 + w4 , w3 +
w4) = 2(w3 , w4) ≠ 0 and (w3 + w4) ⊥ w1 , (w3 + w4) ⊥ w2. Of course,
{v1 , v2 , v3 + v4 , v4 , . . . , vl} is also a basis of V. Thus we see that we may
rename the basis vectors {v1 , . . . , vl} of V so that we obtain

w_3 = v_3 - \frac{(w_2, v_3)}{(w_2, w_2)} w_2 - \frac{(w_1, v_3)}{(w_1, w_1)} w_1,   (4.1.8)

which again satisfies (w3 , w3) ≠ 0 and

Span{v1 , . . . , vl} = Span{w1 , w2 , w3 , v4 , . . . , vl}.   (4.1.9)

Therefore, by renaming the basis vectors {v1 , . . . , vl} if necessary, we will
be able to carry the above procedure out and obtain a new set of vectors
{w1 , . . . , wl} given by

w_1 = v_1, \qquad w_i = v_i - \sum_{j=1}^{i-1} \frac{(w_j, v_i)}{(w_j, w_j)} w_j, \quad i = 2, \dots, l,   (4.1.10)

and having the properties

(w_i, w_i) ≠ 0,  i = 1, . . . , l,
(w_i, w_j) = 0,  i ≠ j,  i, j = 1, . . . , l,   (4.1.11)
Span{w_1, . . . , w_l} = Span{v_1, . . . , v_l}.
In other words {u1 , . . . , uk , w1 , . . . , wl } is seen to be an orthogonal basis of U .

The method described in the proof of Theorem 4.2, especially the scheme
given by the formulas in (4.1.10)–(4.1.11), is known as the Gram–Schmidt
procedure for basis orthogonalization.
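When every intermediate vector is non-null (as is automatic for a positive definite scalar product), the scheme (4.1.10) can be run exactly as stated; a minimal Python sketch (NumPy assumed; the function name is ours), with the scalar product given by a symmetric matrix S:

    import numpy as np

    def gram_schmidt(vectors, S):
        # orthogonalize with respect to (u, v) = u^t S v, following (4.1.10);
        # this sketch assumes (w_j, w_j) != 0 at every step
        sp = lambda u, v: float(u @ S @ v)
        w = []
        for v in vectors:
            u = v - sum(sp(wj, v) / sp(wj, wj) * wj for wj in w)
            w.append(u)
        return w

    S = np.eye(3)                                  # the standard scalar product
    v = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
    w = gram_schmidt(v, S)
    print(np.round([[wi @ S @ wj for wj in w] for wi in w], 10))  # diagonal matrix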

In the rest of this section, we assume F = R.


If U0 = U ⊥ is a proper subspace of U , that is, U0 ≠ {0} and U0 ≠ U , we
have seen from Theorem 4.2 that we may express an orthogonal basis of U by
(say)

{u1 , . . . , un0 , v1 , . . . , vn+ , w1 , . . . , wn− }, (4.1.12)

so that {u1 , . . . , un0 } is a basis of U0 and that (if any)

(vi , vi ) > 0, i = 1, . . . , n+ , (wi , wi ) < 0, i = 1, . . . , n− .


(4.1.13)
It is clear that, with

U+ = Span{v1 , . . . , vn+ }, U− = Span{w1 , . . . , wn− }, (4.1.14)

we have the following elegant orthogonal subspace decomposition

U = U0 ⊕ U+ ⊕ U− , dim(U0 ) = n0 , dim(U+ ) = n+ , dim(U− ) = n− .


(4.1.15)
It is interesting that the integers n0 , n+ , n− are independent of the choice of
an orthogonal basis. Such a statement is also known as the Sylvester theorem.
In fact, it is obvious that n0 is independent of the choice of an orthogonal
basis since n0 is the dimension of U0 = U ⊥ . Assume that

{ũ1 , . . . , ũn0 , ṽ1 , . . . , ṽm+ , w̃1 , . . . , w̃m− } (4.1.16)

is another orthogonal basis of U so that {ũ1 , . . . , ũn0 } is a basis of U0 and that


(if any)

(ṽi , ṽi ) > 0, i = 1, . . . , m+ , (w̃i , w̃i ) < 0, i = 1, . . . , m− .


(4.1.17)
To proceed, we assume n+ ≥ m+ for definiteness and we need to est-
ablish n+ ≤ m+ . For this purpose, we show that u1 , . . . , un0 , v1 , . . . , vn+ ,
w̃1 , . . . , w̃m− are linearly independent. Indeed, if there are scalars a1 , . . . , an0 ,
b1 , . . . , bn+ , c1 , . . . , cm− in R such that

a1 u1 + · · · + an0 un0 + b1 v1 + · · · + bn+ vn+ = c1 w̃1 + · · · + cm− w̃m− ,


(4.1.18)
then we may take the scalar products of both sides of (4.1.18) with themselves
to get

b_1^2 (v_1, v_1) + \cdots + b_{n_+}^2 (v_{n_+}, v_{n_+}) = c_1^2 (\tilde{w}_1, \tilde{w}_1) + \cdots + c_{m_-}^2 (\tilde{w}_{m_-}, \tilde{w}_{m_-}).   (4.1.19)

Thus, applying the properties (4.1.13) and (4.1.17) in (4.1.19), we conclude


that b1 = · · · = bn+ = c1 = · · · = cm− = 0. Inserting this result into
(4.1.18) and using the linear independence of u1 , . . . , un0 , we arrive at a1 =
· · · = an0 = 0. So the asserted linear independence follows. As a consequence,
we have

n0 + n+ + m− ≤ dim(U ). (4.1.20)

In view of (4.1.20) and n0 +m+ +m− = dim(U ) we find n+ ≤ m+ as desired.


Thus the integers n0 , n+ , n− are determined by the scalar product and
independent of the choice of an orthogonal basis. These integers are some-
times referred to as the indices of nullity, positivity, and negativity of the scalar
product, respectively.
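For a scalar product on R^n given by a symmetric matrix S, these indices can be read off from the eigenvalues of S: the (Euclidean-orthonormal) eigenvectors of S are mutually orthogonal with respect to (u, v) = u^t S v as well, and (v_i, v_i) equals the corresponding eigenvalue. A small sketch (NumPy assumed; the matrix S is an arbitrary illustration):

    import numpy as np

    S = np.array([[1., 1., 0.],
                  [1., 1., 0.],
                  [0., 0., -2.]])          # an illustrative symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues 0, 2, -2 here
    n0 = int(np.sum(np.isclose(eigvals, 0.0)))
    nplus = int(np.sum(eigvals > 1e-12))
    nminus = int(np.sum(eigvals < -1e-12))
    print(n0, nplus, nminus)   # 1 1 1: indices of nullity, positivity, negativity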
It is clear that for the orthogonal basis (4.1.12) of U we can further rescale
the vectors v1 , . . . , vn+ and w1 , . . . , wn− to make them satisfy

(vi , vi ) = 1, i = 1, . . . , n+ , (wi , wi ) = −1, i = 1, . . . , n− . (4.1.21)

Such an orthogonal basis is called an orthonormal basis.

Exercises

4.1.1 Let S be a non-empty subset of a vector space U equipped with a scalar


product (·, ·). Show that S ⊥ is a subspace of U and S ⊂ (S ⊥ )⊥ .
4.1.2 Let u1 , . . . , uk be mutually perpendicular vectors of a vector space U
equipped with a scalar product (·, ·). Show that if these vectors are not
null then they must be linearly independent.
4.1.3 Let S1 and S2 be two non-empty subsets of a vector space U equipped
with a scalar product (·, ·). If S1 ⊂ S2 , show that S1⊥ ⊃ S2⊥ .
4.1.4 Consider the vector space Rn and define
(u, v) = u^t A v, \quad u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in \mathbb{R}^n,   (4.1.22)
where A ∈ R(n, n).
(i) Show that (4.1.22) defines a scalar product over Rn if and only if A
is symmetric.
(ii) Show that the subspace U0 = (Rn )⊥ is in fact the null-space of the
matrix A given by

N (A) = {x ∈ Rn | Ax = 0}. (4.1.23)



4.1.5 In special relativity, one equips the space R4 with the Minkowski scalar
product or Minkowski metric given by
(u, v) = a_1 b_1 - a_2 b_2 - a_3 b_3 - a_4 b_4, \quad u = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} \in \mathbb{R}^4.   (4.1.24)
Find an orthogonal basis and determine the indices of nullity, positivity,
and negativity of R4 equipped with this scalar product.
4.1.6 With the notation of the previous exercise, consider the following modi-
fied scalar product

(u, v) = a1 b1 − a2 b3 − a3 b2 − a4 b4 . (4.1.25)

(i) Use the Gram–Schmidt procedure to find an orthonormal basis


of R4 .
(ii) Compute the indices of nullity, positivity, and negativity.
4.1.7 Let U be a vector space with a scalar product and V , W two subspaces
of U . Show that

(V + W )⊥ = V ⊥ ∩ W ⊥ . (4.1.26)

4.1.8 Let U be an n-dimensional vector space over C with a scalar product


(·, ·). Show that if n ≥ 2 then there must be a vector u ∈ U , u ≠ 0, such
that (u, u) = 0.

4.2 Non-degenerate scalar products


Let U be a vector space equipped with the scalar product (·, ·). In this section,
we examine the special situation when U0 = U ⊥ = {0}.

Definition 4.3 A scalar product (·, ·) over U is said to be non-degenerate if


U0 = U ⊥ = {0}. Or equivalently, if (u, v) = 0 for all v ∈ U then u = 0.

The most important consequence of a non-degenerate scalar product is that


it allows us to identify U with its dual space U  naturally through the pairing
given by the scalar product. To see this, we note that for each v ∈ U

f (u) = (u, v), u∈U (4.2.1)



defines an element f ∈ U  . We now show that all elements of U  may be


defined this way.
In fact, assume dim(U ) = n and let {u1 , . . . , un } be an orthogonal basis
of U . Since
(u_i, u_i) ≡ c_i ≠ 0, \quad i = 1, \dots, n,   (4.2.2)

we may take

v_i = \frac{1}{c_i} u_i, \quad i = 1, \dots, n,   (4.2.3)
to achieve
(ui , vj ) = δij , i, j = 1, . . . , n. (4.2.4)
Thus, if we define fi ∈ U  by setting
fi (u) = (u, vi ), u ∈ U, i = 1, . . . , n, (4.2.5)

then {f1 , . . . , fn } is seen to be a basis of U dual to {u1 , . . . , un }. Therefore,
for each f ∈ U  , there are scalars a1 , . . . , an such that
f = a1 f1 + · · · + an fn . (4.2.6)
Consequently, for any u ∈ U , we have
f (u) = (a1 f1 + · · · + an fn )(u) = a1 f1 (u) + · · · + an fn (u)
= a1 (u, v1 ) + · · · + an (u, vn ) = (u, a1 v1 + · · · + an vn ) ≡ (u, v),
(4.2.7)
which proves that any element f of U  is of the form (4.2.1). Such a statement,
that is, any functional over U may be represented as a scalar product, in the
context of infinite-dimensional spaces, is known as the Riesz representation
theorem.
In order to make our discussion more precise, we denote the dependence of
f on v in (4.2.1) explicitly by f ≡ v′ ∈ U′ and use ρ : U → U′ to express
this correspondence,

ρ(v) = v′.   (4.2.8)

Therefore, we may summarize various relations discussed above as follows,

⟨u, ρ(v)⟩ = ⟨u, v′⟩ = (u, v),  u, v ∈ U.   (4.2.9)

Then we can check to see that ρ is linear. In fact, for v, w ∈ U, we have

⟨u, ρ(v + w)⟩ = (u, v + w) = (u, v) + (u, w) = ⟨u, ρ(v)⟩ + ⟨u, ρ(w)⟩ = ⟨u, ρ(v) + ρ(w)⟩,  u ∈ U,   (4.2.10)

which implies ρ(v + w) = ρ(v) + ρ(w). Besides, for a ∈ F and v ∈ U ,


we have

⟨u, ρ(av)⟩ = (u, av) = a(u, v) = a⟨u, ρ(v)⟩ = ⟨u, aρ(v)⟩,   (4.2.11)

which establishes ρ(av) = aρ(v). Thus the linearity of ρ : U → U′ follows.


Since we have seen that ρ : U → U′ is onto, we conclude that ρ : U → U′
is an isomorphism, which may rightfully be called the Riesz isomorphism. As
a consequence, we can rewrite (4.2.9) as

⟨u, ρ(v)⟩ = ⟨u, v′⟩ = (u, v) = (u, ρ^{-1}(v′)),  u, v ∈ U,  v′ ∈ U′.   (4.2.12)

On the other hand, for T ∈ L(U), recall that the dual of T, T′ ∈ L(U′),
satisfies

⟨u, T′(v′)⟩ = ⟨T(u), v′⟩,  u ∈ U,  v′ ∈ U′.   (4.2.13)

So in view of (4.2.12) and (4.2.13), we arrive at

(T(u), v) = (u, (ρ^{-1} ∘ T′ ∘ ρ)(v)),  u, v ∈ U.   (4.2.14)

In other words, for any T ∈ L(U ), there is a unique element T ∗ ∈ L(U ),


called the dual of T with respect to the non-degenerate scalar product (·, ·)
and determined by the relation

T^* = ρ^{-1} ∘ T′ ∘ ρ,   (4.2.15)

via the Riesz isomorphism ρ : U → U′, such that

(T (u), v) = (u, T ∗ (v)), u, v ∈ U. (4.2.16)

Through the Riesz isomorphism, we may naturally identify U′ with U. In
this way, we may view U as its own dual space and describe U as a self-dual
space. Thus, for T ∈ L(U), we may naturally identify T^* with T′ as well,
without spelling out the Riesz isomorphism, which leads us to formulate the
following definition.

Definition 4.4 Let U be a vector space with a non-degenerate scalar product


(·, ·). For a mapping T ∈ L(U), the unique mapping T′ ∈ L(U) satisfying

(u, T(v)) = (T′(u), v),  u, v ∈ U,   (4.2.17)

is called the dual or adjoint mapping of T, with respect to the scalar product
(·, ·).

If T = T′, T is said to be a self-dual or self-adjoint mapping with respect to


the scalar product (·, ·).

Definition 4.5 Let T ∈ L(U ) where U is a vector space equipped with a scalar
product (·, ·). We say that T is an orthogonal mapping if (T (u), T (v)) = (u, v)
for any u, v ∈ U .

As an immediate consequence of the above definition, we have the following


basic results.

Theorem 4.6 That T ∈ L(U ) is an orthogonal mapping is equivalent to one


of the following statements.
(1) (T (u), T (u)) = (u, u) for any u ∈ U .
(2) For any orthogonal basis {u1 , . . . , un } of U the vectors T (u1 ), . . . , T (un )
are mutually orthogonal and (T (ui ), T (ui )) = (ui , ui ) for i = 1, . . . , n.
(3) T′ ◦ T = T ◦ T′ = I, the identity mapping over U.

Proof If T is orthogonal, it is clear that (1) holds.


Now assume (1) is valid. Using the properties of the scalar product, we have
the identity
2(u, v) = (u + v, u + v) − (u, u) − (v, v), u, v ∈ U. (4.2.18)
So 2(T (u), T (v)) = (T (u + v), T (u + v)) − (T (u), T (u)) − (T (v), T (v)) =
2(u, v) for any u, v ∈ U . Thus T is orthogonal.
That T being orthogonal implies (2) is trivial.
Assume (2) holds. We express any u, v ∈ U as

u = \sum_{i=1}^n a_i u_i, \quad v = \sum_{i=1}^n b_i u_i, \quad a_i, b_i \in \mathbb{F}, \quad i = 1, \dots, n.   (4.2.19)
Therefore we have
(T(u), T(v)) = \Big( T\Big( \sum_{i=1}^n a_i u_i \Big), T\Big( \sum_{j=1}^n b_j u_j \Big) \Big)
             = \sum_{i,j=1}^n a_i b_j (T(u_i), T(u_j))
             = \sum_{i=1}^n a_i b_i (T(u_i), T(u_i))
             = \sum_{i=1}^n a_i b_i (u_i, u_i)
             = \Big( \sum_{i=1}^n a_i u_i, \sum_{j=1}^n b_j u_j \Big) = (u, v),   (4.2.20)

which establishes the orthogonality of T .


We now show that T being orthogonal and (3) are equivalent. In fact, if T is
orthogonal, then (u, (T′ ◦ T)(v)) = (u, v) or (u, (T′ ◦ T − I)(v)) = 0 for any
u ∈ U. By the non-degeneracy of the scalar product we get (T′ ◦ T − I)(v) = 0
for any v ∈ U, which proves T′ ◦ T = I. In other words, T′ is a left inverse
of T. In view of the discussion in Section 2.1, T′ is also a right inverse of T.
That is, T ◦ T′ = I. So (3) follows. That (3) implies the orthogonality of T is
obvious.

As an example, we consider R2 equipped with the standard Euclidean scalar


product, i.e. the dot product, given as
 
(u, v)_+ ≡ u \cdot v = a_1 b_1 + a_2 b_2 \quad \text{for} \quad u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \; v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in \mathbb{R}^2.   (4.2.21)

It is straightforward to check that the rotation mapping Rθ : R2 → R2


defined by

R_\theta(u) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} u, \quad \theta \in \mathbb{R}, \quad u \in \mathbb{R}^2,   (4.2.22)
is an orthogonal mapping with respect to the scalar product (4.2.21). However,
it fails to be orthogonal with respect to the Minkowski scalar product
 
(u, v)_- ≡ a_1 b_1 - a_2 b_2 \quad \text{for} \quad u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \; v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in \mathbb{R}^2.   (4.2.23)

Nevertheless, if we modify Rθ into ρθ : R2 → R2 using hyperbolic cosine and


sine functions and dropping the negative sign, by setting

ρθ (u) = [ cosh θ , sinh θ ; sinh θ , cosh θ ] u, θ ∈ R, u ∈ R2 , (4.2.24)
we see that ρθ is orthogonal with respect to the scalar product (4.2.23),
although it now fails to be orthogonal with respect to (4.2.21), of course. This
example clearly illustrates the dependence of the form of an orthogonal map-
ping on the underlying scalar product.
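For readers who wish to check this dependence numerically, the following short sketch is offered purely as an illustration (it is written in Python with NumPy, which is not part of the text; the names M_plus and M_minus for the Gram matrices are chosen here for convenience). A mapping with matrix T is orthogonal with respect to the scalar product with Gram matrix M exactly when T^t M T = M.

import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation R_theta, (4.2.22)
H = np.array([[np.cosh(theta), np.sinh(theta)],
              [np.sinh(theta), np.cosh(theta)]])   # hyperbolic rotation rho_theta, (4.2.24)

M_plus = np.eye(2)                # Gram matrix of the Euclidean product (4.2.21)
M_minus = np.diag([1.0, -1.0])    # Gram matrix of the Minkowski product (4.2.23)

print(np.allclose(R.T @ M_plus @ R, M_plus))     # True:  R_theta preserves (., .)_+
print(np.allclose(R.T @ M_minus @ R, M_minus))   # False: R_theta does not preserve (., .)_-
print(np.allclose(H.T @ M_minus @ H, M_minus))   # True:  rho_theta preserves (., .)_-
print(np.allclose(H.T @ M_plus @ H, M_plus))     # False: rho_theta does not preserve (., .)_+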
As another example, consider the space R2 with the scalar product
 
(u, v)∗ = a1 b1 + a1 b2 + a2 b1 − a2 b2 , u = (a1 , a2 )t , v = (b1 , b2 )t ∈ R2 . (4.2.25)
It is clear that (·, ·)∗ is non-degenerate and may be rewritten as



(u, v)∗ = ut [ 1 , 1 ; 1 , −1 ] v. (4.2.26)

Thus, for a mapping T ∈ L(R2 ) defined by



T (u) = [ a , b ; c , d ] u ≡ Au, u ∈ R2 , a, b, c, d ∈ R, (4.2.27)
we have

(T ′ (u), v)∗ = (u, T (v))∗ = ut [ 1 , 1 ; 1 , −1 ] A v
= ut [ 1 , 1 ; 1 , −1 ] A [ 1 , 1 ; 1 , −1 ]^{−1} [ 1 , 1 ; 1 , −1 ] v, (4.2.28)

which implies

T ′ (u) = [ 1 , 1 ; 1 , −1 ]^{−1} At [ 1 , 1 ; 1 , −1 ] u
= (1/2) [ a+b+c+d , a+b−c−d ; a−b+c−d , a−b−c+d ] u, u ∈ R2 . (4.2.29)
Consequently, T is self-adjoint with respect to the scalar product (·, ·)∗ exactly when the condition a = b + c + d holds, while b, c, d remain arbitrary.
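These formulas are easily confirmed numerically. The sketch below (again Python/NumPy, purely illustrative and not part of the text) builds the matrix of T ′ from (4.2.29), checks the defining relation (4.2.17) for the product (4.2.26), and verifies the self-adjointness condition a = b + c + d for one choice of a, b, c, d.

import numpy as np

M = np.array([[1.0, 1.0],
              [1.0, -1.0]])            # Gram matrix of (., .)_* in (4.2.26)
A = np.random.randn(2, 2)              # a generic mapping T(u) = Au
A_adj = np.linalg.inv(M) @ A.T @ M     # matrix of the adjoint T', as in (4.2.29)

u, v = np.random.randn(2), np.random.randn(2)
print(np.isclose(u @ M @ (A @ v), (A_adj @ u) @ M @ v))   # True: (u, T(v))_* = (T'(u), v)_*

a, b, c, d = 5.0, 1.0, 3.0, 1.0        # satisfies a = b + c + d
B = np.array([[a, b], [c, d]])
print(np.allclose(np.linalg.inv(M) @ B.T @ M, B))         # True: B is self-adjoint w.r.t. (., .)_*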

Exercises

4.2.1 Let U be a vector space equipped with a non-degenerate scalar product,


(·, ·), and T ∈ L(U ).
(i) Show that the pairing (u, v)T = (u, T (v)) (u, v ∈ U ) defines a
scalar product on U if and only if T is self-adjoint.
(ii) Show that for a self-adjoint mapping T ∈ L(U ) the pairing (·, ·)T
given above defines a non-degenerate scalar product over U if and
only if T is invertible.
4.2.2 Consider the vector space R2 equipped with the non-degenerate scalar
product
 
(u, v) = a1 b2 + a2 b1 , u = (a1 , a2 )t , v = (b1 , b2 )t ∈ R2 . (4.2.30)
Let
 
V = Span { (1 , 0)t } . (4.2.31)

Show that V ⊥ = V . This provides an example that in general V + V ⊥


may fail to make up the full space.
4.2.3 Let A = (aij ) ∈ R(2, 2) and define TA ∈ L(R2 ) by
 
TA (u) = [ a11 , a12 ; a21 , a22 ] u, u = (a1 , a2 )t ∈ R2 . (4.2.32)
Find conditions on A such that TA is self-adjoint with respect to the
scalar product (·, ·)− defined in (4.2.23).
4.2.4 Define the scalar product
 
(u, v)0 = a1 b2 + a2 b1 , u = (a1 , a2 )t , v = (b1 , b2 )t ∈ R2 , (4.2.33)

over R2 .
(i) Show that the scalar product (·, ·)0 is non-degenerate.
(ii) For TA ∈ L(R2 ) given in (4.2.32), obtain the adjoint mapping TA′ of TA and find conditions on the matrix A so that TA is self-adjoint with respect to the scalar product (·, ·)0 .
4.2.5 Use U to denote the vector space of real-valued functions with all
orders of derivatives over the real line R which vanish outside bounded
intervals. Equip U with the scalar product
(u, v) = ∫_{−∞}^{∞} u(t)v(t) dt, u, v ∈ U. (4.2.34)
Show that the linear mapping D = d/dt : U → U is anti-self-dual or anti-self-adjoint. That is, D ′ = −D.
4.2.6 Let U be a finite-dimensional vector space equipped with a scalar prod-
uct, (·, ·). Decompose U into the direct sum as stated in (4.1.15). Show
that we may use the scalar product (·, ·) of U to make the quotient
space U/U0 into a space with a non-degenerate scalar product when
U0 ≠ U , still denoted by (·, ·), given by
([u], [v]) = (u, v), [u], [v] ∈ U/U0 . (4.2.35)
4.2.7 Let U be a finite-dimensional vector space equipped with a scalar prod-
uct, (·, ·), and let V be a subspace of U . Show that (·, ·) is a non-
degenerate scalar product over V if and only if V ∩ V ⊥ = {0}.
4.2.8 Let U be a finite-dimensional vector space with a non-degenerate scalar


product (·, ·). Let ρ : U → U ′ be the Riesz isomorphism. Show that
for any subspace V of U there holds
ρ(V ⊥ ) = V 0 . (4.2.36)

In other words, the mapping ρ is an isomorphism from V ⊥ onto V 0 .


In particular, dim(V ⊥ ) = dim(V 0 ). Thus, in view of (1.4.31) and
(4.2.36), we have the dimensionality equation
dim(V ) + dim(V ⊥ ) = dim(U ). (4.2.37)
4.2.9 Let U be a finite-dimensional space with a non-degenerate scalar prod-
uct and let V be a subspace of U . Use (4.2.37) to establish V = (V ⊥ )⊥ .
4.2.10 Let U be a finite-dimensional space with a non-degenerate scalar prod-
uct and let V , W be two subspaces of U . Establish the relation
(V ∩ W )⊥ = V ⊥ + W ⊥ . (4.2.38)
4.2.11 Let U be an n-dimensional vector space over C with a non-degenerate
scalar product (·, ·). Show that if n ≥ 2 then there must be linearly
independent vectors u, v ∈ U such that (u, u) = 0 and (v, v) = 0 but
(u, v) = 1.

4.3 Positive definite scalar products


In this section we consider two types of positive definite scalar products: real
ones and complex ones. Real ones are modeled over the standard Euclidean
scalar product on Rn :
(u, v) = u · v = ut v = a1 b1 + · · · + an bn , u = (a1 , . . . , an )t , v = (b1 , . . . , bn )t ∈ Rn , (4.3.1)
and complex ones over the standard Hermitian scalar product on Cn :
(u, v) = u · v = u† v = ā1 b1 + · · · + ān bn , u = (a1 , . . . , an )t , v = (b1 , . . . , bn )t ∈ Cn , (4.3.2)
The common feature of these products is the positivity property (u, u) ≥ 0


for any vector u and (u, u) = 0 only when u = 0. The major difference is
that the former is symmetric but the latter fails to be so. Instead, there holds
the adjusted property (u, v) = \overline{(v, u)} for any u, v ∈ Cn , which is seen to be naturally implemented to ensure the positivity property. As a consequence, in both the Rn and Cn cases, we are able to define the norm of a vector u to be ‖u‖ = √(u, u).
Motivated from the above examples, we can bring forth the following
definition.

Definition 4.7 A positive definite scalar product over a real vector space U
is a scalar product (·, ·) satisfying (u, u) ≥ 0 for u ∈ U and (u, u) = 0 only
when u = 0.
A positive definite scalar product over a complex vector space U is a scalar
function (u, v) ∈ C, defined for each pair of vectors u, v ∈ U , which satisfies
the following conditions.
(1) (Hermitian symmetry) (u, v) = \overline{(v, u)} for u, v ∈ U .
(2) (Additivity) (u + v, w) = (u, w) + (v, w) for u, v, w ∈ U .
(3) (Partial homogeneity) (u, av) = a(u, v) for a ∈ C and u, v ∈ U .
(4) (Positivity) (u, u) ≥ 0 for u ∈ U and (u, u) = 0 only when u = 0.

Since the real case is contained as a special situation of the complex case,
we shall focus our discussion on the complex case, unless otherwise stated.
Needless to say, additivity regarding the second argument in (·, ·) still holds
since
(u, v + w) = \overline{(v + w, u)} = \overline{(v, u)} + \overline{(w, u)} = (u, v) + (u, w), u, v, w ∈ U. (4.3.3)

On the other hand, homogeneity regarding the first argument takes a modified form,

(au, v) = \overline{(v, au)} = \overline{a(v, u)} = ā(u, v), a ∈ C, u, v ∈ U. (4.3.4)
We will extend our study carried out in the previous two sections for general
scalar products to the current situation of a positive definite scalar product that
is necessarily non-degenerate since (u, u) > 0 for any nonzero vector u in U .
First, we see that for u, v ∈ U we can still use the condition (u, v) = 0 to
define u, v to be mutually perpendicular vectors. Next, since for any u ∈ U we
have (u, u) ≥ 0, we can formally define the norm of u as in Cn by
‖u‖ = √(u, u). (4.3.5)
It is clearly seen that the norm so defined enjoys the positivity and homogeneity
conditions required of a norm. That it also satisfies the triangle inequality will
be established shortly. Thus (4.3.5) indeed gives rise to a norm of the space U
that is said to be induced from the positive definite scalar product (·, ·).
Let u, v ∈ U be perpendicular. Then we have
‖u + v‖2 = ‖u‖2 + ‖v‖2 . (4.3.6)
This important expression is also known as the Pythagoras theorem, which
may be proved by a simple expansion
‖u + v‖2 = (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v) = ‖u‖2 + ‖v‖2 , (4.3.7)

since (u, v) = 0 and (v, u) = \overline{(u, v)} = 0.


For u, v ∈ U with u = 0, we may use the orthogonal decomposition formula
(4.1.2) to resolve v into the form
 
v = ( v − ((u, v)/(u, u)) u ) + ((u, v)/(u, u)) u ≡ w + ((u, v)/(u, u)) u, (4.3.8)

so that (u, w) = 0. Hence, in view of the Pythagoras theorem, we get

‖v‖2 = ‖w‖2 + | (u, v)/(u, u) |2 ‖u‖2 ≥ |(u, v)|2 / ‖u‖2 . (4.3.9)
In other words, we have
|(u, v)| ≤ ‖u‖ ‖v‖, (4.3.10)
with equality if and only if w = 0 or equivalently, v ∈ Span{u}. Of course
(4.3.10) is valid when u = 0. Hence, in summary, we may state that (4.3.10)
holds for any u, v ∈ U and that the equality is achieved if and only if u, v are
linearly dependent.
The inequality (4.3.10) is the celebrated Schwarz inequality whose deriva-
tion is seen to be another direct application of the vector orthogonal decompo-
sition formula (4.1.2).
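A quick numerical illustration of the Schwarz inequality and of its equality case may be helpful (the sketch below is written in Python/NumPy and is added only as a check; the chosen vectors and the multiple 3 − 2i are arbitrary).

import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

# standard Hermitian scalar product (4.3.2): np.vdot conjugates its first argument
print(abs(np.vdot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v))   # True: (4.3.10)

w = (3 - 2j) * u                                                     # w and u are linearly dependent
print(np.isclose(abs(np.vdot(u, w)), np.linalg.norm(u) * np.linalg.norm(w)))  # True: equality case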
We now apply the Schwarz inequality to establish the triangle inequality for
the norm  ·  induced from a positive definite scalar product (·, ·).
Let u, v ∈ U . Then in view of (4.3.10) we have
‖u + v‖2 = ‖u‖2 + ‖v‖2 + (u, v) + (v, u)
≤ ‖u‖2 + ‖v‖2 + 2|(u, v)|
≤ ‖u‖2 + ‖v‖2 + 2‖u‖ ‖v‖
= (‖u‖ + ‖v‖)2 . (4.3.11)
Hence the triangle inequality ‖u + v‖ ≤ ‖u‖ + ‖v‖ follows.


If {u1 , . . . , un } is a basis of U , we may invoke the Gram–Schmidt procedure

v1 = u1 , vi = ui − Σ_{j=1}^{i−1} ((vj , ui )/(vj , vj )) vj , i = 2, . . . , n, (4.3.12)

as before to obtain an orthogonal basis for U . In fact, we may examine the


validity of this procedure by a simple induction.
When n = 1, there is nothing to show.
Assume the procedure is valid at n = k ≥ 1.
At n = k + 1, by the inductive assumption, we know that we may construct
{v1 , . . . , vk } to get an orthogonal basis for Span{u1 , . . . , uk }. Define

vk+1 = uk+1 − Σ_{j=1}^{k} ((vj , uk+1 )/(vj , vj )) vj . (4.3.13)

Then we can check that (vk+1 , vi ) = 0 for i = 1, . . . , k and


vk+1 ∈ Span{uk+1 , v1 , . . . , vk } ⊂ Span{u1 , . . . , uk , uk+1 }. (4.3.14)
Of course vk+1 ≠ 0, since otherwise uk+1 ∈ Span{v1 , . . . , vk } = Span{u1 , . . . , uk }.
Thus we have obtained k + 1 nonzero mutually orthogonal vectors
v1 , . . . , vk , vk+1 that make up a basis for Span{u1 , . . . , uk , uk+1 } as asserted.
Thus, we have seen that, in the positive definite scalar product situation,
from any basis {u1 , . . . , un } of U , the Gram–Schmidt procedure (4.3.12) pro-
vides a scheme of getting an orthogonal basis {v1 , . . . , vn } of U so that each
of its subsets {v1 , . . . , vk } is an orthogonal basis of Span{u1 , . . . , uk } for
k = 1, . . . , n.
Let {v1 , . . . , vn } be an orthogonal basis for U . The positivity property allows
us to modify the basis further by setting
wi = (1/‖vi ‖) vi , i = 1, . . . , n, (4.3.15)
so that {w1 , . . . , wn } is an orthogonal basis of U consisting of unit vectors (that is, ‖wi ‖ = 1 for i = 1, . . . , n). Such an orthogonal basis is called an
orthonormal basis.
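The procedure (4.3.12) together with the normalization (4.3.15) translates directly into code. The following sketch (Python/NumPy, given only as an illustration of the algorithm for the standard Hermitian scalar product; the function name gram_schmidt and the sample basis are chosen here) produces an orthonormal basis from a given basis.

import numpy as np

def gram_schmidt(basis):
    """Orthonormalize linearly independent vectors via (4.3.12) and (4.3.15)."""
    ortho = []
    for u in basis:
        v = u.astype(complex)
        for w in ortho:                      # subtract the component along each earlier unit vector
            v = v - np.vdot(w, v) * w        # np.vdot conjugates its first argument, matching (w, v)
        ortho.append(v / np.linalg.norm(v))  # normalize as in (4.3.15)
    return ortho

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]
w1, w2, w3 = gram_schmidt(basis)
print(np.allclose([np.vdot(w1, w2), np.vdot(w1, w3), np.vdot(w2, w3)], 0))   # mutually orthogonal
print(np.allclose([np.linalg.norm(w) for w in (w1, w2, w3)], 1))             # unit vectors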
We next examine the dual space U ′ of U in view of the positive definite scalar product (·, ·) over U .
For any u ∈ U , it is clear that
f (v) = (u, v), v ∈ U, (4.3.16)
defines an element in U ′ . Now let {u1 , . . . , un } be an orthonormal basis of U and define fi ∈ U ′ by setting
fi (v) = (ui , v), v ∈ U. (4.3.17)

Then {f1 , . . . , fn } is a basis of U ′ which is dual to {u1 , . . . , un }. Hence, for any f ∈ U ′ , there are scalars a1 , . . . , an such that

f = a1 f1 + · · · + an fn . (4.3.18)

Consequently, we have

f (v) = a1 f1 (v) + · · · + an fn (v)
= a1 (u1 , v) + · · · + an (un , v)
= (ā1 u1 + · · · + ān un , v)
≡ (u, v), u = ā1 u1 + · · · + ān un . (4.3.19)

Thus, each element in U ′ may be represented by an element in U in the form of a scalar product. In other words, the Riesz representation theorem still holds here, although homogeneity of such a representation takes an adjusted form,

fi ↦ ui , i = 1, . . . , n; f = a1 f1 + · · · + an fn ↦ ā1 u1 + · · · + ān un . (4.3.20)

Nevertheless, we now show that adjoint mappings are well defined.


Let U, V be finite-dimensional vector spaces over C with positive definite
scalar products, both denoted by (·, ·). For T ∈ L(U, V ) and any v ∈ V , the
expression

f (u) = (v, T (u)), u ∈ U, (4.3.21)

defines an element f ∈ U ′ . Hence there is a unique element w ∈ U such that f (u) = (w, u). Since w depends on v, we may denote this relation by w = T ′ (v). Hence

(v, T (u)) = (T ′ (v), u). (4.3.22)

We will prove T ′ ∈ L(V , U ).


In fact, for v1 , v2 ∈ V , we have

(T ′ (v1 + v2 ), u) = (v1 + v2 , T (u))
= (v1 , T (u)) + (v2 , T (u))
= (T ′ (v1 ), u) + (T ′ (v2 ), u)
= (T ′ (v1 ) + T ′ (v2 ), u), u ∈ U. (4.3.23)

Thus T ′ (v1 + v2 ) = T ′ (v1 ) + T ′ (v2 ) and additivity follows.


For a ∈ C and v ∈ V , we have

(T ′ (av), u) = (av, T (u)) = ā(v, T (u))
= ā(T ′ (v), u) = (aT ′ (v), u), u ∈ U. (4.3.24)

This shows T ′ (av) = aT ′ (v) and homogeneity also follows.


Of particular interest is a mapping from U into itself. In this case we can
define a self-dual or self-adjoint mapping T with respect to the positive definite
scalar product (·, ·) to be such that T ′ = T . Similar to Definition 4.5, we also
have the following.

Definition 4.8 Let U be a real or complex vector space equipped with


a positive definite scalar product (·, ·). Assume that T ∈ L(U ) satisfies
(T (u), T (v)) = (u, v) for any u, v ∈ U .

(1) T is called orthogonal when U is real.


(2) T is called unitary when U is complex.

In analogue to Theorem 4.6, we have the following.

Theorem 4.9 That T ∈ L(U ) is orthogonal or unitary is equivalent to one of


the following statements.

(1) T is norm-preserving. That is, ‖T (u)‖ = ‖u‖ for any u ∈ U , where ‖ · ‖ is the norm of U induced from its positive definite scalar product (·, ·).
(2) T maps an orthonormal basis to another orthonormal basis of U .
(3) T ′ ◦ T = T ◦ T ′ = I .

Proof We need only to carry out the proof in the complex case because now
the scalar product (·, ·) fails to be symmetric and the relation (4.2.18) is invalid.
That T being unitary implies (1) is trivial since ‖T (u)‖2 = (T (u), T (u)) = (u, u) = ‖u‖2 for any u ∈ U .
Assume (1) holds. From the expansions

‖u + v‖2 = ‖u‖2 + ‖v‖2 + 2ℜ{(u, v)}, (4.3.25)

‖iu + v‖2 = ‖u‖2 + ‖v‖2 + 2ℑ{(u, v)}, (4.3.26)

we obtain the following polarization identity in the complex situation:

(u, v) = (1/2)(‖u + v‖2 − ‖u‖2 − ‖v‖2 ) + (1/2) i (‖iu + v‖2 − ‖u‖2 − ‖v‖2 ), u, v ∈ U. (4.3.27)
Applying (4.3.27), we obtain

(T (u), T (v)) = (1/2)(‖T (u + v)‖2 − ‖T (u)‖2 − ‖T (v)‖2 )
+ (1/2) i (‖T (iu + v)‖2 − ‖T (u)‖2 − ‖T (v)‖2 )
= (1/2)(‖u + v‖2 − ‖u‖2 − ‖v‖2 )
+ (1/2) i (‖iu + v‖2 − ‖u‖2 − ‖v‖2 )
= (u, v), u, v ∈ U. (4.3.28)

Hence T is unitary.
The rest of the proof is similar to that of Theorem 4.6 and thus skipped.

If T ∈ L(U ) is orthogonal or unitary and λ ∈ C an eigenvalue of T , then it


is clear that |λ| = 1 since T is norm-preserving.
Let A ∈ F(n, n) and define TA ∈ L(Fn ) in the usual way TA (u) = Au for
any column vector u ∈ Fn .
When F = R, let the positive definite scalar product be the Euclidean one
given in (4.3.1). That is,

(u, v) = ut v, u, v ∈ Rn . (4.3.29)

Thus

(u, TA (v)) = ut Av = (At u)t v = (TA′ (u), v), u, v ∈ Rn . (4.3.30)

Therefore TA′ (u) = At u (u ∈ Rn ). If TA is orthogonal, then TA′ ◦ TA = TA ◦ TA′ = I , which leads to At A = AAt = In . Besides, a self-adjoint mapping TA′ = TA is defined by a symmetric matrix, A = At .
The above discussion may be carried over to the abstract setting as follows.
Let U be a real vector space with a positive definite scalar product (·, ·) and
B = {u1 , . . . , un } an orthonormal basis. Assume T ∈ L(U ) is represented by
the matrix A ∈ R(n, n) with respect to the basis B so that


T (uj ) = Σ_{i=1}^{n} aij ui , j = 1, . . . , n. (4.3.31)

Similarly T ′ ∈ L(U ) is represented by A′ = (a′ij ) ∈ R(n, n). Then we have

a′ij = (ui , T ′ (uj )) = (T (ui ), uj ) = aj i , i, j = 1, . . . , n. (4.3.32)


So we again have A′ = At . If T is orthogonal, then T ◦ T ′ = T ′ ◦ T = I , which gives us AAt = At A = In as before. If T is self-adjoint, T = T ′ , then A is again a symmetric matrix.
When F = C, let the positive definite scalar product be the Hermitian one
given in (4.3.2). Then

(u, v) = ūt v, u, v ∈ Cn . (4.3.33)

Thus

(u, TA (v)) = ūt Av = \overline{(Āt u)}t v = (TA′ (u), v), u, v ∈ Cn . (4.3.34)

Therefore TA′ (u) = Āt u (u ∈ Cn ). If TA is unitary, then TA′ ◦ TA = TA ◦ TA′ = I , which leads to Āt A = AĀt = In . Besides, a self-adjoint mapping TA = TA′ is defined by a matrix which is symmetric under complex conjugation and transpose. That is, A = Āt .
Similar to the real vector space situation, we leave it as an exercise to ex-
amine that, in the abstract setting, the matrix representing the adjoint mapping
with respect to an orthonormal basis is obtained by taking matrix transpose
and complex conjugate of the matrix of the original mapping with respect to
the same basis.
The above calculations lead us to formulate the following concepts, which
were originally introduced in Section 1.1 without explanation.

Definition 4.10 A real matrix A ∈ R(n, n) is said to be orthogonal if its trans-


pose At is its inverse. That is, AAt = At A = In . It is easily checked that
A is orthogonal if and only if its sets of column and row vectors both form
orthonormal bases of Rn with the standard Euclidean scalar product.
A complex matrix A ∈ C(n, n) is said to be unitary if the complex conjugate of its transpose, Āt , also called its Hermitian conjugate and denoted as A† = Āt , is the inverse of A. That is, AA† = A† A = In . It is easily checked that A is

unitary if and only if its sets of column and row vectors both form orthonormal
bases of Cn with the standard Hermitian scalar product.
A complex matrix A ∈ C(n, n) is called Hermitian if it satisfies the property
A = A† or, equivalently, if it defines a self-adjoint mapping TA = TA′ .

With the above terminology and the Gram–Schmidt procedure, we may


establish a well-known matrix factorization result, commonly referred to as
the QR factorization, for a non-singular matrix.
In fact, let A ∈ C(n, n) be non-singular and use u1 , . . . , un to denote the n
corresponding column vectors of A that form a basis of Cn . Use (·, ·) to denote
the Hermitian scalar product on Cn . That is, (u, v) = u† v, where u, v ∈ Cn


are column vectors. Apply the Gram–Schmidt procedure to set


v1 = u1 ,
v2 = u2 − ((v1 , u2 )/(v1 , v1 )) v1 ,
· · · · · · · · ·
vn = un − ((v1 , un )/(v1 , v1 )) v1 − · · · − ((vn−1 , un )/(vn−1 , vn−1 )) vn−1 . (4.3.35)
Then {v1 , . . . , vn } is an orthogonal basis of Cn . Set wi = (1/‖vi ‖)vi for i = 1, . . . , n. We see that {w1 , . . . , wn } is an orthonormal basis of Cn .
Therefore, inverting (4.3.35) and rewriting the resulting relations in terms of
{w1 , . . . , wn }, we get


u1 = ‖v1 ‖ w1 ,
u2 = ((v1 , u2 )/(v1 , v1 )) ‖v1 ‖ w1 + ‖v2 ‖ w2 ,
· · · · · · · · ·
un = ((v1 , un )/(v1 , v1 )) ‖v1 ‖ w1 + · · · + ((vn−1 , un )/(vn−1 , vn−1 )) ‖vn−1 ‖ wn−1 + ‖vn ‖ wn . (4.3.36)
For convenience, we may express (4.3.36) in the compressed form


u1 = r11 w1 ,
u2 = r12 w1 + r22 w2 ,
· · · · · · · · ·
un = r1n w1 + · · · + rn−1,n wn−1 + rnn wn , (4.3.37)
which implies the matrix relation
A = (u1 , . . . , un ) = (w1 , . . . , wn ) [ r11 , r12 , · · · , r1n ; 0 , r22 , · · · , r2n ; · · · ; 0 , 0 , · · · , rnn ] = QR, (4.3.38)

where Q = (w1 , . . . , wn ) ∈ C(n, n) is unitary since its column vectors are


mutually perpendicular and of unit length and R = (rij ) ∈ C(n, n) is upper triangular with positive diagonal entries since rii = ‖vi ‖ for i = 1, . . . , n.
This explicit construction is the desired QR factorization for A. It is clear that


if A is real then the matrices Q and R are also real and Q is orthogonal.
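The construction (4.3.35)–(4.3.38) can be carried out mechanically. The sketch below (Python/NumPy, offered only as an illustration; in practice one would simply call numpy.linalg.qr, and the function name qr_by_gram_schmidt and the random test matrix are chosen here) computes Q and R for a nonsingular matrix and confirms A = QR.

import numpy as np

def qr_by_gram_schmidt(A):
    """QR factorization of a nonsingular matrix via (4.3.35)-(4.3.38)."""
    n = A.shape[1]
    Q = np.zeros((A.shape[0], n), dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for j in range(n):
        v = A[:, j].astype(complex)
        for i in range(j):
            R[i, j] = np.vdot(Q[:, i], A[:, j])   # r_ij = (w_i, u_j)
            v = v - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)               # r_jj = ||v_j|| > 0
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, R = qr_by_gram_schmidt(A)
print(np.allclose(Q @ R, A))                      # A = QR
print(np.allclose(Q.conj().T @ Q, np.eye(4)))     # Q is unitary
print(np.allclose(R, np.triu(R)))                 # R is upper triangular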
To end this section, we note that the norm of a vector u in a vector space U
equipped with a positive definite scalar product (·, ·) may be evaluated through
the following useful expression

‖u‖ = sup{|(u, v)| : v ∈ U, ‖v‖ = 1}. (4.3.39)

Indeed, let η denote the right-hand side of (4.3.39). From the Schwarz inequality (4.3.10), we get |(u, v)| ≤ ‖u‖ for any v ∈ U satisfying ‖v‖ = 1. So η ≤ ‖u‖. To show that η ≥ ‖u‖, it suffices to consider the nontrivial situation u ≠ 0. In this case, we have

η ≥ | (u, (1/‖u‖) u) | = ‖u‖. (4.3.40)

Hence, in conclusion, η = ‖u‖ and (4.3.39) follows.

Exercises

4.3.1 Let U be a vector space with a positive definite scalar product (·, ·).
Show that u1 , . . . , uk ∈ U are linearly independent if and only if their
associated metric matrix, also called the Gram matrix,
M = [ (u1 , u1 ) , · · · , (u1 , uk ) ; · · · , · · · , · · · ; (uk , u1 ) , · · · , (uk , uk ) ] (4.3.41)

is nonsingular.
4.3.2 (Continued from Exercise 4.3.1) Show that if u ∈ U lies in
Span{u1 , . . . , uk } then the column vector ((u1 , u), . . . , (uk , u))t lies in
the column space of the metric matrix M. However, the converse is not
true when k < dim(U ).
4.3.3 Let U be a complex vector space with a positive definite scalar product
and B = {u1 , . . . , un } an orthonormal basis of U . For T ∈ L(U ), let
A, A′ ∈ C(n, n) be the matrices that represent T , T ′ , respectively, with respect to the basis B. Show that A′ = A† .
4.3.4 Let U be a finite-dimensional complex vector space with a positive defi-
nite scalar product and S ∈ L(U ) be anti-self-adjoint. That is, S ′ = −S.
Show that I ± S must be invertible.
4.3.5 Consider the complex vector space C(m, n). Show that

(A, B) = Tr(A† B), A, B ∈ C(m, n) (4.3.42)


defines a positive definite scalar product over C(m, n) that extends the traditional Hermitian scalar product over Cm = C(m, 1). With such a scalar product, the quantity ‖A‖ = √(A, A) is sometimes called the Hilbert–Schmidt norm of the matrix A ∈ C(m, n).
4.3.6 Let (·, ·) be the standard Hermitian scalar product on Cm and A ∈
C(m, n). Establish the following statement known as the Fredholm
alternative for complex matrix equations: Given b ∈ Cm the non-
homogeneous equation Ax = b has a solution for some x ∈ Cn if and
only if (y, b) = 0 for any solution y ∈ Cm of the homogeneous equation
A† y = 0.
4.3.7 For A ∈ C(n, n) show that if the column vectors of A form an orthonor-
mal basis of Cn with the standard Hermitian scalar product so do the
row vectors of A.
4.3.8 For the matrix
A = [ 1 , −1 , 1 ; −1 , 1 , 2 ; 2 , 1 , −2 ], (4.3.43)
obtain a QR factorization.

4.4 Orthogonal resolutions of vectors


In this section, we continue to study a finite-dimensional vector space U with a
positive definite scalar product (·, ·). We focus our attention on the problem of
resolving a vector into the span of a given set of orthogonal vectors. To avoid
the trivial situation, we always assume that the set of the orthogonal vectors
concerned never contains a zero vector unless otherwise stated.
We begin with the following basic orthogonal decomposition theorem.

Theorem 4.11 Let V be a subspace of U . Then there holds the orthogonal


decomposition
U = V ⊕ V ⊥. (4.4.1)

Proof If V = {0}, there is nothing to show. Assume V ≠ {0}. Let


{v1 , . . . , vk } be an orthogonal basis of V . We can then expand {v1 , . . . , vk } into
an orthogonal basis for U , which may be denoted as {v1 , . . . , vk , w1 , . . . , wl },
where k + l = n = dim(U ). Of course, w1 , . . . , wl ∈ V ⊥ . For any u ∈ V ⊥ ,
we rewrite u as
u = a1 v1 + · · · + ak vk + b1 w1 + · · · + bl wl , (4.4.2)
with some scalars a1 , . . . , ak , b1 , . . . , bl . From (u, vi ) = ai (vi , vi ) = 0


(i = 1, . . . , k), we obtain a1 = · · · = ak = 0, which establishes
u ∈ Span{w1 , . . . , wl }. So V ⊥ ⊂ Span{w1 , . . . , wl }. Therefore V ⊥ =
Span{w1 , . . . , wl } and U = V + V ⊥ .
Finally, take u ∈ V ∩ V ⊥ . Then (u, u) = 0. By the positivity condition, we
get u = 0. Thus (4.4.1) follows.

We now follow (4.4.1) to concretely construct the orthogonal decomposition


of a given vector.

Theorem 4.12 Let V be a nonzero subspace of U with an orthogonal basis


{v1 , . . . , vk }. Any u ∈ U may be uniquely decomposed into the form

u = v + w, v ∈ V, w ∈ V ⊥, (4.4.3)

where v is given by the expression


v = Σ_{i=1}^{k} ((vi , u)/(vi , vi )) vi . (4.4.4)

Moreover, the vector v given in (4.4.4) is the unique solution of the minimiza-
tion problem

η ≡ inf{‖u − x‖ : x ∈ V }. (4.4.5)

Proof The validity of the expression (4.4.3) for some unique v ∈ V and
w ∈ V ⊥ is already ensured by Theorem 4.11.
We rewrite v as

v = Σ_{i=1}^{k} ai vi . (4.4.6)

Then (vi , v) = ai (vi , vi ) (i = 1, . . . , k). That is,


ai = (vi , v)/(vi , vi ), i = 1, . . . , k, (4.4.7)
which verifies (4.4.4).
For the scalars a1 , . . . , ak given in (4.4.7) and v in (4.4.6), we have from
(4.4.3) the relation


u − Σ_{i=1}^{k} bi vi = w + Σ_{i=1}^{k} (ai − bi )vi , x = Σ_{i=1}^{k} bi vi ∈ V . (4.4.8)
Consequently,

‖u − x‖2 = ‖ u − Σ_{i=1}^{k} bi vi ‖2
= ‖ w + Σ_{i=1}^{k} (ai − bi )vi ‖2
= ‖w‖2 + Σ_{i=1}^{k} |ai − bi |2 ‖vi ‖2
≥ ‖w‖2 , (4.4.9)

and the lower bound ‖w‖2 is attained only when bi = ai for all i = 1, . . . , k,
or x = v.
So the proof is complete.
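The decomposition and its minimizing property are easy to observe in a concrete computation. The following sketch (Python/NumPy, added for illustration only; V is spanned by two orthogonal vectors chosen by hand) implements (4.4.4) and compares ‖u − v‖ with ‖u − x‖ for a few other elements x of V .

import numpy as np

v1 = np.array([1.0, 1.0, 0.0, 0.0])
v2 = np.array([0.0, 0.0, 1.0, -1.0])       # (v1, v2) = 0: an orthogonal basis of V
u = np.array([2.0, -1.0, 3.0, 5.0])

v = sum((np.dot(vi, u) / np.dot(vi, vi)) * vi for vi in (v1, v2))   # formula (4.4.4)
w = u - v                                                           # component in the orthogonal complement

print(np.allclose([np.dot(w, v1), np.dot(w, v2)], 0))               # True: w is perpendicular to V
for a, b in [(0.0, 0.0), (1.0, -2.0), (0.3, 0.7)]:                  # a few competitors x in V
    x = a * v1 + b * v2
    print(np.linalg.norm(u - v) <= np.linalg.norm(u - x))           # True: v minimizes ||u - x||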

Definition 4.13 Let {v1 , . . . , vk } be a set of orthogonal vectors in U . For u ∈


U , the sum

Σ_{i=1}^{k} ai vi , ai = (vi , u)/(vi , vi ), i = 1, . . . , k, (4.4.10)

is called the Fourier expansion of u and a1 , . . . , ak are the Fourier coefficients,


with respect to the orthogonal set {v1 , . . . , vk }.

Of particular interest is when a set of orthogonal vectors becomes a basis.

Definition 4.14 Let {v1 , . . . , vn } be a set of orthogonal vectors in U . The set


is said to be complete if it is a basis of U .

The completeness of a set of orthogonal vectors is seen to be characterized


by the norms of vectors in relation to their Fourier coefficients.

Theorem 4.15 Let U be a vector space with a positive definite scalar product
and V = {v1 , . . . , vn } a set of orthogonal vectors in U . For any u ∈ U , let the
scalars a1 , . . . , an be the Fourier coefficients of u with respect to V.

(1) There holds the inequality



Σ_{i=1}^{n} |ai |2 ‖vi ‖2 ≤ ‖u‖2 (4.4.11)
(which is often referred to as the Bessel inequality).


(2) That u ∈ Span{v1 , . . . , vn } if and only if the equality in (4.4.11) is
attained. That is,

Σ_{i=1}^{n} |ai |2 ‖vi ‖2 = ‖u‖2 (4.4.12)

(which is often referred to as the Parseval identity). Therefore, the set V is


complete if and only if the Parseval identity (4.4.12) holds for any u ∈ U .

Proof For given u, use v to denote the Fourier expansion of u as stated in (4.4.3) and (4.4.4) with k = n. Then ‖u‖2 = ‖w‖2 + ‖v‖2 . In particular, ‖v‖2 ≤ ‖u‖2 , which is (4.4.11).
It is clear that u ∈ Span{v1 , . . . , vn } if and only if w = 0, which is equivalent to the fulfillment of the equality ‖u‖2 = ‖v‖2 , which is (4.4.12).

If {u1 , . . . , un } is a set of orthogonal unit vectors in U , the Fourier expansion


and Fourier coefficients of a vector u ∈ U with respect to {u1 , . . . , un } take
the elegant forms

Σ_{i=1}^{n} ai ui , ai = (ui , u), i = 1, . . . , n, (4.4.13)

such that the Bessel inequality becomes



Σ_{i=1}^{n} |(ui , u)|2 ≤ ‖u‖2 . (4.4.14)

Therefore {u1 , . . . , un } is complete if and only if the Parseval identity



Σ_{i=1}^{n} |(ui , u)|2 = ‖u‖2 (4.4.15)

holds for any u ∈ U .


It is interesting to note that, since ai = (vi , u)/‖vi ‖2 (i = 1, . . . , n) in
(4.4.11) and (4.4.12), the inequalities (4.4.11) and (4.4.14), and the identities
(4.4.12) and (4.4.15), are actually of the same structures, respectively.
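The contrast between (4.4.14) and (4.4.15) can be seen at once in coordinates. The short sketch below (Python/NumPy, illustrative only) uses the standard orthonormal basis of R3 and a sample vector u.

import numpy as np

e1, e2, e3 = np.eye(3)                 # the standard orthonormal basis of R^3
u = np.array([1.0, 2.0, 3.0])

incomplete = sum(np.dot(e, u) ** 2 for e in (e1, e2))        # only two of the three vectors
complete = sum(np.dot(e, u) ** 2 for e in (e1, e2, e3))

print(incomplete <= np.dot(u, u))             # True: the Bessel inequality (4.4.14), here strict
print(np.isclose(complete, np.dot(u, u)))     # True: the Parseval identity (4.4.15)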

Exercises

4.4.1 Let U be a finite-dimensional vector space with a positive definite scalar


product, (·, ·), and S = {u1 , . . . , un } an orthogonal set of vectors in U .
Show that the set S is complete if and only if S ⊥ = {0}.
4.4.2 Let V be a nontrivial subspace of a vector space U with a positive def-


inite scalar product. Show that, if {v1 , . . . , vk } is an orthogonal basis of
V , then the mapping P : U → U given by its Fourier expansion,


P (u) = Σ_{i=1}^{k} ((vi , u)/(vi , vi )) vi , u ∈ U, (4.4.16)

is the projection of U along V ⊥ onto V . That is, P ∈ L(U ), P 2 = P ,


N(P ) = V ⊥ , and R(P ) = V .
4.4.3 Let U be a finite-dimensional vector space with a positive definite scalar
product, (·, ·), and V a subspace of U . Use PV : U → U to denote the
projection of U onto V along V ⊥ . Prove that if W is a subspace of U
containing V then

‖u − PV (u)‖ ≥ ‖u − PW (u)‖, u ∈ U, (4.4.17)

and V = W if and only if equality in (4.4.17) holds, where the norm ‖ · ‖ of U is induced from the positive definite scalar product (·, ·). In other
words, orthogonal projection of a vector into a larger subspace provides
a better approximation of the vector.
4.4.4 Let V be a nontrivial subspace of a finite-dimensional vector space U
with a positive definite scalar product. Recall that over U/V we may
define the norm

‖[u]‖ = inf{‖x‖ : x ∈ [u]}, [u] ∈ U/V , (4.4.18)

for the quotient space U/V .


(i) Prove that for each [u] ∈ U/V there is a unique w ∈ [u] such that ‖[u]‖ = ‖w‖.
(ii) Find a practical method to compute the vector w shown to exist in
part (i) among the coset [u].
4.4.5 Consider the vector space Pn of real-coefficient polynomials in variable
t of degrees up to n ≥ 1 with the positive definite scalar product
(u, v) = ∫_{−1}^{1} u(t)v(t) dt, u, v ∈ Pn . (4.4.19)

Applying the Gram–Schmidt procedure to the standard basis


{1, t, . . . , t n } of Pn , we may construct an orthonormal basis, say
{L0 , L1 , . . . , Ln }, of Pn . The set of polynomials {Li (t)} are the well-
known Legendre polynomials. Explain why the degree of each Li (t)
must be i (i = 0, 1, . . . , n) and find Li (t) for i = 0, 1, 2, 3, 4.
4.4.6 In P2 , find the Fourier expansion of the polynomial u(t) = −3t 2 + t − 5


in terms of the Legendre polynomials.
4.4.7 Let Pn be the vector space with the scalar product defined in (4.4.19).
For f ∈ P3′ satisfying

f (1) = −1, f (t) = 2, f (t 2 ) = 6, f (t 3 ) = −5, (4.4.20)

find an element v ∈ P3 such that v is the pre-image under the Riesz mapping ρ : P3 → P3′ of f ∈ P3′ , or ρ(v) = f . That is, there holds

f (u) = (u, v), u ∈ P3 . (4.4.21)

4.5 Orthogonal and unitary versus isometric mappings


Let U be a vector space with a positive definite scalar product, (·, ·), which induces a norm ‖ · ‖ on U . If T ∈ L(U ) is orthogonal or unitary, then it is clear that

‖T (u) − T (v)‖ = ‖u − v‖, u, v ∈ U. (4.5.1)

In other words, the distance of the images of any two vectors in U under T
is the same as that between the two vectors. A mapping from U into itself
satisfying such a property is called an isometry or isometric. In this section, we
show that any zero-vector preserving mapping from a real vector space U with
a positive definite scalar product into itself satisfying the property (4.5.1) must
be linear. Therefore, in view of Theorem 4.6, it is orthogonal. In other words,
in the real setting, being isometric characterizes a mapping being orthogonal.

Theorem 4.16 Let U be a real vector space with a positive definite scalar
product. A mapping T from U into itself satisfies the isometric property (4.5.1) and T (0) = 0 if and only if it is orthogonal.

Proof Assume T satisfies T (0) = 0 and (4.5.1). We show that T must be


linear. To this end, from (4.5.1) and replacing v by 0, we get ‖T (u)‖ = ‖u‖ for any u ∈ U . On the other hand, the symmetry of the scalar product (·, ·) gives us the identity

(u, v) = (1/2)(‖u + v‖2 − ‖u‖2 − ‖v‖2 ), u, v ∈ U. (4.5.2)
Replacing u, v in (4.5.2) by T (u), −T (v), respectively, we get

−(T (u), T (v)) = (1/2)(‖T (u) − T (v)‖2 − ‖T (u)‖2 − ‖ − T (v)‖2 )
= (1/2)(‖u − v‖2 − ‖u‖2 − ‖v‖2 )
= −(u, v). (4.5.3)

Hence (T (u), T (v)) = (u, v) for any u, v ∈ U . Using this result, we have

‖T (u + v) − T (u) − T (v)‖2
= ‖T (u + v)‖2 + ‖T (u)‖2 + ‖T (v)‖2 − 2(T (u + v), T (u)) − 2(T (u + v), T (v)) + 2(T (u), T (v))
= ‖u + v‖2 + ‖u‖2 + ‖v‖2 − 2(u + v, u) − 2(u + v, v) + 2(u, v)
= ‖(u + v) − u − v‖2 = 0, (4.5.4)

which proves the additivity condition

T (u + v) = T (u) + T (v), u, v ∈ U. (4.5.5)

Besides, setting v = −u in (4.5.5) and using T (0) = 0, we have T (−u) =


−T (u) for any u ∈ U .
Moreover, (4.5.5) also implies that, for any integer m, we have T (mu) = mT (u). Replacing u by (1/m)u where m is a nonzero integer, we also have T ((1/m)u) = (1/m)T (u). Combining these results, we conclude that
T (ru) = rT (u), r ∈ Q, u ∈ U. (4.5.6)

Finally, for any a ∈ R, let {rk } be a sequence in Q such that rk → a as k → ∞.


Then we find
‖T (au) − aT (u)‖ ≤ ‖T (au) − T (rk u)‖ + ‖rk T (u) − aT (u)‖
= |a − rk |‖u‖ + |rk − a|‖T (u)‖ → 0 as k → ∞. (4.5.7)
That is, T (au) = aT (u) for any a ∈ R and u ∈ U . So homogeneity is estab-
lished and the proof follows.

We next give an example showing that, in the complex situation, being iso-
metric alone is not sufficient to ensure a mapping to be unitary.
The vector space we consider here is taken to be Cn with the standard Hermitian scalar product (4.3.2) and the mapping T : Cn → Cn is given by T (u) = ū for u ∈ Cn . Then T satisfies (4.5.1) and T (0) = 0. However,
T is not homogeneous with respect to scalar multiplication. More precisely, T (au) ≠ aT (u) whenever a ∈ C with ℑ(a) ≠ 0 and u ∈ Cn \ {0}. Hence T ∉ L(Cn ).
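Concretely (a Python/NumPy illustration, not part of the text, with arbitrarily chosen vectors): the conjugation map preserves distances but fails to commute with multiplication by i.

import numpy as np

def T(u):
    return np.conj(u)                  # complex conjugation on C^n

u = np.array([1.0 + 2.0j, -3.0j])
v = np.array([2.0 - 1.0j, 4.0 + 4.0j])

print(np.isclose(np.linalg.norm(T(u) - T(v)), np.linalg.norm(u - v)))   # True: T is isometric
print(np.allclose(T(1j * u), 1j * T(u)))                                # False: T is not C-linear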
It will be interesting to spell out some conditions in the complex situation in
addition to (4.5.1) that would, when put together with (4.5.1), along with the
zero-vector preserving property, ensure a mapping to be unitary. The following
theorem is such a result.

Theorem 4.17 Let U be a complex vector space with a positive definite scalar
product (·, ·). A mapping T from U into itself satisfies the isometric property (4.5.1), T (0) = 0, and

‖iT (u) − T (v)‖ = ‖iu − v‖, u, v ∈ U, (4.5.8)

if and only if it is unitary, where ‖ · ‖ is induced from (·, ·).

Proof It is obvious that if T ∈ L(U ) is unitary then both (4.5.1) and (4.5.8)
are fulfilled. We now need to show that the converse is true.
First, setting v = iu in (4.5.8), we obtain
T (iu) = iT (u), u ∈ U. (4.5.9)
Next, replacing u, v in (4.3.27) by T (u), −T (v), respectively, and using
(4.5.1) and (4.5.8), we have
−(T (u), T (v)) = (1/2)(‖T (u) − T (v)‖2 − ‖T (u)‖2 − ‖T (v)‖2 )
+ (1/2) i (‖iT (u) − T (v)‖2 − ‖T (u)‖2 − ‖T (v)‖2 )
= (1/2)(‖u − v‖2 − ‖u‖2 − ‖v‖2 )
+ (1/2) i (‖iu − v‖2 − ‖u‖2 − ‖v‖2 )
= −(u, v), u, v ∈ U. (4.5.10)
That is, (T (u), T (v)) = (u, v) for u, v ∈ U . Thus a direct expansion gives us
the result
‖T (u + v) − T (u) − T (v)‖2
= ‖T (u + v)‖2 + ‖T (u)‖2 + ‖T (v)‖2 − 2ℜ{(T (u + v), T (u))} − 2ℜ{(T (u + v), T (v))} + 2ℜ{(T (u), T (v))}
= ‖u + v‖2 + ‖u‖2 + ‖v‖2 − 2ℜ{(u + v, u)} − 2ℜ{(u + v, v)} + 2ℜ{(u, v)}
= ‖(u + v) − u − v‖2 = 0, (4.5.11)
which establishes the additivity property T (u+v) = T (u)+T (v) for u, v ∈ U


as before. Therefore (4.5.6) holds in the current complex formalism.
In view of the additivity property of T , (4.5.6), and (4.5.9), we have

T ((p + iq)u) = T (pu) + T (iqu) = pT (u) + iqT (u)


= (p + iq)T (u), p, q ∈ Q, u ∈ U. (4.5.12)

Finally, for any a = b + ic ∈ C where b, c ∈ R, we may choose a sequence


{rk } (rk = pk + iqk with pk , qk ∈ Q) such that pk → b and qk → c or
rk → a as k → ∞. In view of these and (4.5.12), we see that (4.5.7) holds in
the complex situation as well. In other words, T (au) = aT (u) for a ∈ C and
u ∈ U.
The proof is complete.

It is worth noting that Theorems 4.16 and 4.17 are valid in general without
restricting to finite-dimensional spaces.

Exercises

4.5.1 Let U be a real vector space with a positive definite scalar product (·, ·).
Establish the following variant of Theorem 4.16: A mapping T from U into itself satisfies the property

‖T (u) + T (v)‖ = ‖u + v‖, u, v ∈ U, (4.5.13)

if and only if it is orthogonal, where ‖ · ‖ is induced from (·, ·).


4.5.2 Show that the mapping T : Cn → Cn given by T (u) = ū for u ∈ Cn ,
which is equipped with the standard Hermitian scalar product, satisfies
(4.5.13) but that it is not unitary.
4.5.3 Let U be a complex vector space with a positive definite scalar product
(·, ·). Establish the following variant of Theorem 4.17: A mapping T
from U into itself satisfies the property (4.5.13) and

‖iT (u) + T (v)‖ = ‖iu + v‖, u, v ∈ U, (4.5.14)

if and only if it is unitary, where ‖ · ‖ is induced from (·, ·).


4.5.4 Check to see why the mapping T defined in Exercise 4.5.2 fails to satisfy
the property (4.5.14) with U = Cn .
4.5.5 Let T : Rn → Rn (n ≥ 2) be defined by
T (x) = (xn , . . . , x1 )t , x = (x1 , . . . , xn )t ∈ Rn , (4.5.15)
where Rn is equipped with the standard Euclidean scalar product.


(i) Show that T is an isometry.
(ii) Determine all the eigenvalues of T .
(iii) Show that eigenvectors associated to different eigenvalues are mu-
tually perpendicular.
(iv) Determine the minimal polynomial of T .
5
Real quadratic forms and self-adjoint mappings

In this chapter we exclusively consider vector spaces over the field of reals
unless otherwise stated. We first present a general discussion on bilinear and
quadratic forms and their matrix representations. We also show how a sym-
metric bilinear form may be uniquely represented by a self-adjoint mapping.
We then establish the main spectrum theorem for self-adjoint mappings based
on a proof of the existence of an eigenvalue using calculus. We next focus on
characterizing the positive definiteness of self-adjoint mappings. After these
we study the commutativity of self-adjoint mappings. In the last section we
show the effectiveness of using self-adjoint mappings in computing the norm
of a mapping between different spaces and in the formalism of least squares
approximations.

5.1 Bilinear and quadratic forms


Let U be a finite-dimensional vector space over R. The simplest real-valued
functions over U are linear functions, which are also called functionals earlier
and have been studied. The next simplest real-valued functions to be studied
are bilinear forms whose definition is given as follows.

Definition 5.1 A function f : U × U → R is called a bilinear form if it


satisfies, for any u, v, w ∈ U and a ∈ R, the following conditions.

(1) f (u + v, w) = f (u, w) + f (v, w), f (au, v) = af (u, v).


(2) f (u, v + w) = f (u, v) + f (u, w), f (u, av) = af (u, v).

Let B = {u1 , . . . , un } be a basis of U . For u, v ∈ U with coordinate vectors


x = (x1 , . . . , xn )t , y = (y1 , . . . , yn )t ∈ Rn with respect to B, we have

f (u, v) = f ( Σ_{i=1}^{n} xi ui , Σ_{j=1}^{n} yj uj ) = Σ_{i,j=1}^{n} xi f (ui , uj )yj = x t Ay, (5.1.1)

where A = (aij ) = (f (ui , uj )) ∈ R(n, n) is referred to as the matrix repre-


sentation of the bilinear form f with respect to the basis B.
Let B̃ = {ũ1 , . . . , ũn } be another basis of U so that à = (ãij ) =
(f (ũi , ũj )) ∈ R(n, n) is the matrix representation of f with respect to B̃.
If x̃, ỹ ∈ Rn are the coordinate vectors of u, v ∈ U with respect to B̃ and the
basis transition matrix between B and B̃ is B = (bij ) so that


ũj = Σ_{i=1}^{n} bij ui , j = 1, . . . , n, (5.1.2)

then x = B x̃, y = B ỹ (cf. Section 1.3). Hence we arrive at

f (u, v) = x̃ t Ãỹ = x t Ay = x̃ t (B t AB)ỹ, (5.1.3)

which leads to the relation à = B t AB and gives rise to the following concept.

Definition 5.2 For A, B ∈ F(n, n), we say that A and B are congruent if there
is an invertible element C ∈ F(n, n) such that

A = C t BC. (5.1.4)

Therefore, our calculation above shows that the matrix representations of a


bilinear form with respect to different bases are congruent.
For a bilinear form f : U × U → R, we can set

q(u) = f (u, u), u ∈ U, (5.1.5)

which is called the quadratic form associated with the bilinear form f . The
quadratic form q is homogeneous of degree 2 since q(tu) = t 2 q(u) for any
t ∈ R and u ∈ U .
Of course q is uniquely determined by f through (5.1.5). However, the con-
verse is not true, which will become clear after the following discussion.
To proceed, let B = {u1 , . . . , un } be a basis of U and u ∈ U any given
vector whose coordinate vector with respect to B is x = (x1 , . . . , xn )t . Then,
from (5.1.1), we have
 
q(u) = x t Ax = x t ( (1/2)(A + At ) + (1/2)(A − At ) ) x = (1/2) x t (A + At )x, (5.1.6)
since x t Ax is a scalar which results in x t Ax = (x t Ax)t = x t At x. In other


words, the quadratic form q constructed from (5.1.5) can only capture the
information contained in the symmetric part, 12 (A + At ), but nothing in the
skewsymmetric part, 12 (A−At ), of the matrix A = (f (ui , uj )). Consequently,
the quadratic form q cannot determine the bilinear form f completely, in gen-
eral situations, unless the skewsymmetric part of A is absent or A − At = 0.
In other words, A = (f (ui , uj )) is symmetric. This observation motivates the
following definition.
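This loss of the skewsymmetric part is transparent in coordinates, as the following small check shows (Python/NumPy, illustrative only; A is an arbitrarily chosen non-symmetric matrix).

import numpy as np

A = np.array([[1.0, 4.0],
              [-2.0, 3.0]])              # an arbitrary non-symmetric matrix
S = 0.5 * (A + A.T)                      # symmetric part
K = 0.5 * (A - A.T)                      # skewsymmetric part

x = np.random.randn(2)
print(np.isclose(x @ A @ x, x @ S @ x))  # True: q(u) = x^t A x only sees the symmetric part
print(np.isclose(x @ K @ x, 0.0))        # True: the skewsymmetric part contributes nothing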

Definition 5.3 A bilinear form f : U × U → R is symmetric if it satisfies

f (u, v) = f (v, u), u, v ∈ U. (5.1.7)

If f is a symmetric bilinear form, then we have the expansion

f (u + v, u + v) = f (u, u) + f (v, v) + 2f (u, v), u, v ∈ U. (5.1.8)

Thus, if q is the quadratic form associated with f , we derive from (5.1.8) the
relation
f (u, v) = (1/2)(q(u + v) − q(u) − q(v)), u, v ∈ U, (5.1.9)

which indicates how f is uniquely determined by q. In a similar manner, we also have

f (u, v) = (1/4)(q(u + v) − q(u − v)), u, v ∈ U. (5.1.10)
As in the situation of scalar products, the relations of the types (5.1.9) and
(5.1.10) are often referred to as polarization identities for symmetric bilinear
forms.
From now on we will concentrate on symmetric bilinear forms.
Let f : U × U → R be a symmetric bilinear form. If x, y ∈ Rn are the coordinate vectors of u, v ∈ U with respect to a basis B, then f (u, v) is given
by (5.1.1) so that matrix A ∈ R(n, n) is symmetric. Recall that (x, y) = x t y is
the Euclidean scalar product over Rn . Thus, if we view A as a linear mapping
Rn → Rn given by x #→ Ax, then the right-hand side of (5.1.1) is simply
(x, Ay). Since A = At , the right-hand side of (5.1.1) is also (Ax, y). In other
words, A defines a self-adjoint mapping over Rn with respect to the standard
Euclidean scalar product over Rn .
Conversely, if U is a vector space equipped with a positive definite scalar
product (·, ·) and T ∈ L(U ) is a self-adjoint or symmetric mapping, then

f (u, v) = (u, T (v)), u, v ∈ U, (5.1.11)


is a symmetric bilinear form. Thus, in this way, we see that symmetric bilinear
forms are completely characterized.
In a more precise manner, we have the following theorem, which relates
symmetric bilinear forms and self-adjoint mappings over a vector space with a
positive definite scalar product.

Theorem 5.4 Let U be a finite-dimensional vector space with a positive def-


inite scalar product (·, ·). For any symmetric bilinear form f : U × U → R,
there is a unique self-adjoint or symmetric linear mapping, say T ∈ L(U ),
such that the relation (5.1.11) holds.

Proof For each v ∈ U , the existence of a unique vector, say T (v), so that
(5.1.11) holds, is already shown in Section 4.2. Since f is bilinear, we have
T ∈ L(U ). The self-adjointness or symmetry of T follows from the symmetry
of f and the scalar product.

Note that, in view of Section 4.1, a symmetric bilinear form is exactly a kind
of scalar product as well, not necessarily positive definite, though.

Exercises

5.1.1 Let f : U × U → R be a bilinear form such that f (u, u) > 0 and


f (v, v) < 0 for some u, v ∈ U .
(i) Show that u, v are linearly independent.
(ii) Show that there is some w ∈ U , w ≠ 0, such that f (w, w) = 0.
5.1.2 Let A, B ∈ F(n, n). Show that if A, B are congruent then A, B must
have the same rank.
5.1.3 Let A, B ∈ F(n, n). Show that if A, B are congruent and A is symmetric
then so is B.
5.1.4 Are the matrices
 
A = [ 2 , 1 ; 1 , 1 ], B = [ 0 , 1 ; 1 , 0 ], (5.1.12)

congruent in R(2, 2)?


5.1.5 Consider the identity matrix In ∈ F(n, n). Show that In and −In are not
congruent if F = R but are congruent if F = C.
5.1.6 Consider the quadratic form


⎛ ⎞
x1
⎜ ⎟
q(x) = x12 + 2x22 − x32 + 2x1 x2 − 4x1 x3 , x = ⎝ x2 ⎠ ∈ R3 ,
x3
(5.1.13)
where R3 is equipped with the standard Euclidean scalar product
(x, y) = x t y for x, y ∈ R3 .
(i) Find the unique symmetric bilinear form f : R3 × R3 → R such
that q(x) = f (x, x) for x ∈ R3 .
(ii) Find the unique self-adjoint mapping T ∈ L(R3 ) such that
f (x, y) = (x, T (y)) for any x, y ∈ R3 .
5.1.7 Let f : U × U → R be a bilinear form.
(i) Show that f is skewsymmetric, that is, f (u, v) = −f (v, u) for any
u, v ∈ U , if and only if f (u, u) = 0 for any u ∈ U .
(ii) Show that if f is skewsymmetric then the matrix representation of
f with respect to an arbitrary basis of U must be skewsymmetric.
5.1.8 Prove that Theorem 5.4 is still true if the positive definite scalar product
(·, ·) of U is only assumed to be non-degenerate instead to ensure the
existence and uniqueness of an element T ∈ L(U ).

5.2 Self-adjoint mappings


Let U be a finite-dimensional vector space with a positive definite scalar
product, (·, ·). For self-adjoint mappings, we have the following foundational
theorem.

Theorem 5.5 Assume that T ∈ L(U ) is self-adjoint with respect to the scalar
product of U . Then
(1) T must have a real eigenvalue,
(2) there is an orthonormal basis of U consisting of eigenvectors of T .

Proof We use induction on dim(U ).


There is nothing to show at dim(U ) = 1.
Assume that the theorem is true at dim(U ) = n − 1 ≥ 1.
We investigate the situation when dim(U ) = n ≥ 2.
For convenience, let B = {u1 , . . . , un } be an orthonormal basis of U . For


any vector u ∈ U , let x = (x1 , . . . , xn )t ∈ Rn be its coordinate vector. Use
A = (aij ) ∈ R(n, n) to denote the matrix representation of T with respect to
B so that

T (uj ) = Σ_{i=1}^{n} aij ui , j = 1, . . . , n. (5.2.1)

Then set

Q(x) = (u, T (u)) = ( Σ_{i=1}^{n} xi ui , Σ_{j=1}^{n} xj T (uj ) )
= ( Σ_{i=1}^{n} xi ui , Σ_{j=1}^{n} Σ_{k=1}^{n} xj akj uk ) = Σ_{i,j,k=1}^{n} xi xj akj δik
= Σ_{i,j=1}^{n} xi xj aij = x t Ax. (5.2.2)

Consider the unit sphere in U given as


  
S = { u = Σ_{i=1}^{n} xi ui ∈ U : ‖u‖2 = (u, u) = Σ_{i=1}^{n} xi² = 1 }, (5.2.3)

which may also be identified as the unit sphere in Rn centered at the origin
and commonly denoted by S n−1 . Since S n−1 is compact, the function Q given
in (5.2.2) attains its minimum over S n−1 at a certain point on S n−1 , say x 0 = (x1^0 , . . . , xn^0 )t .
We may assume xn^0 ≠ 0. Without loss of generality, we may also assume xn^0 > 0 (the case xn^0 < 0 can be treated similarly). Hence, near x 0 , we may

represent the points on S n−1 by the formulas



xn = √(1 − x1² − · · · − xn−1² ), where (x1 , . . . , xn−1 ) is near (x1^0 , . . . , xn−1^0 ). (5.2.4)

Therefore, with
  
P (x1 , . . . , xn−1 ) = Q( x1 , . . . , xn−1 , √(1 − x1² − · · · − xn−1² ) ), (5.2.5)

we see that (x1^0 , . . . , xn−1^0 ) is a critical point of P . Thus we have

(∇P )(x1^0 , . . . , xn−1^0 ) = 0. (5.2.6)
In order to carry out the computation involved in (5.2.6), we rewrite the


function P as


P (x1 , . . . , xn−1 ) = Σ_{i,j=1}^{n−1} aij xi xj + 2 Σ_{i=1}^{n−1} ain xi √(1 − x1² − · · · − xn−1² ) + ann (1 − x1² − · · · − xn−1² ). (5.2.7)

Thus


∂P/∂xi = 2 Σ_{j=1}^{n−1} aij xj + 2ain √(1 − x1² − · · · − xn−1² ) − 2 ( xi / √(1 − x1² − · · · − xn−1² ) ) Σ_{j=1}^{n−1} ajn xj − 2ann xi , i = 1, . . . , n − 1. (5.2.8)

Using (5.2.8) in (5.2.6), we arrive at


Σ_{j=1}^{n} aij xj^0 = (1/xn^0) ( Σ_{j=1}^{n} ajn xj^0 ) xi^0 , i = 1, . . . , n − 1. (5.2.9)

Note that we also have


Σ_{j=1}^{n} anj xj^0 = (1/xn^0) ( Σ_{j=1}^{n} ajn xj^0 ) xn^0 (5.2.10)

automatically since A is symmetric. Combining (5.2.9) and (5.2.10), we get


Ax 0 = λ0 x 0 , λ0 = (1/xn^0) Σ_{j=1}^{n} ajn xj^0 , (5.2.11)

which establishes that λ0 is an eigenvalue of A and x 0 an associated eigenvector.
Let v1 ∈ U be such that its coordinate vector with respect to the basis B is x 0 . Then we have
T (v1 ) = T ( Σ_{j=1}^{n} xj^0 uj ) = Σ_{j=1}^{n} xj^0 T (uj )
= Σ_{i,j=1}^{n} aij xj^0 ui = Σ_{i=1}^{n} ( Σ_{j=1}^{n} aij xj^0 ) ui
= Σ_{i=1}^{n} λ0 xi^0 ui = λ0 v1 , (5.2.12)

which verifies that λ0 is an eigenvalue of T and v1 is an associated eigenvector.


Now set U1 = Span{v1 } and make the decomposition U = U1 ⊕ U1⊥ .
We claim that U1⊥ is invariant under T . In fact, for any u ∈ U1⊥ , we have
(u, v1 ) = 0. Hence (T (u), v1 ) = (u, T (v1 )) = (u, λ0 v1 ) = λ0 (u, v1 ) = 0
which indicates T (u) ∈ U1⊥ .
Of course dim(U1⊥ ) = n − 1. Using the inductive assumption, we know that U1⊥ has an orthonormal basis {v2 , . . . , vn } so that each vi is an eigenvector associated with a real eigenvalue of T , i = 2, . . . , n.
Finally, we may rescale v1 to make it a unit vector. Thus {v1 , v2 , . . . , vn } is
an orthonormal basis of U so that each vector vi is an eigenvector associated
with a real eigenvalue of T , i = 1, 2, . . . , n, as desired.
With respect to the basis {v1 , . . . , vn }, the matrix representation of T is diagonal, whose diagonal entries are the eigenvalues of T , which are shown to be all real. In other words, all eigenvalues of T are real.
Let T ∈ L(U ) be self-adjoint, use λ1 , . . . , λk to denote all the distinct eigen-
values of T , and denote by Eλ1 , . . . , Eλk the corresponding eigenspaces which
are of course invariant subspaces of T . Using Theorem 5.5, we see that there
holds the direct sum
U = Eλ1 ⊕ · · · ⊕ Eλk . (5.2.13)
In particular, we may use E0 to denote the eigenspace corresponding to the
eigenvalue 0 (if any). Moreover, we may set
E+ = ⊕_{λi >0} Eλi , E− = ⊕_{λi <0} Eλi . (5.2.14)

The associated numbers


n0 = dim(E0 ), n+ = dim(E+ ), n− = dim(E− ), (5.2.15)
are exactly what were previously called the indices of nullity, positivity, and
negativity, respectively, in the context of a real scalar product in Section 4.2,
which is simply a symmetric bilinear form here and may always be represented
by a self-adjoint mapping over U . It is clear that n0 is simply the nullity of T ,


or n0 = n(T ), and n+ + n− is the rank of T , n+ + n− = r(T ). Furthermore,
for ui ∈ Eλi , uj ∈ Eλj , we have
λi (ui , uj ) = (T (ui ), uj ) = (ui , T (uj )) = λj (ui , uj ), (5.2.16)
which leads to (λi − λj )(ui , uj ) = 0. Thus, for i ≠ j , we have (ui , uj ) = 0.
In other words, the eigenspaces associated with distinct eigenvalues of T are
mutually perpendicular. (This observation suggests a practical way to con-
struct an orthogonal basis of U consisting of eigenvectors of T : First find
all eigenspaces of T . Then obtain an orthogonal basis for each of these
eigenspaces by using the Gram–Schmidt procedure. Finally put all these or-
thogonal bases together to get an orthogonal basis of the full space.)
A useful matrix version of Theorem 5.5 may be stated as follows.

Theorem 5.6 A matrix A ∈ R(n, n) is symmetric if and only if there is an


orthogonal matrix P ∈ R(n, n) such that
A = P t DP , (5.2.17)
where D ∈ R(n, n) is a diagonal matrix and the diagonal entries of D are the
eigenvalues of A.

Proof The right-hand side of (5.2.17) is of course symmetric.


Conversely, let A be symmetric. The linear mapping T ∈ L(Rn ) defined by
T (x) = Ax, with x ∈ Rn any column vector, is self-adjoint with respect to the
standard scalar product over Rn . Hence there are column vectors u1 , . . . , un
in Rn consisting of eigenvectors of T or A associated with real eigenvalues,
say λ1 , . . . , λn , which form an orthonormal basis of Rn . The relations Au1 =
λ1 u1 , . . . , Aun = λn un , may collectively be rewritten as AQ = QD where
D = diag{λ1 , . . . , λn } and Q ∈ R(n, n) is made of taking u1 , . . . , un as the
respective column vectors. Since uti uj = δij , i, j = 1, . . . , n, we see that Q is
an orthogonal matrix. Setting P = Qt , we arrive at (5.2.17).
Note that the proof of Theorem 5.6 gives us a practical way to construct the
matrices P and D in (5.2.17).
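In numerical work this construction is delegated to an eigenvalue routine. The following sketch (Python/NumPy, for illustration only; the sample symmetric matrix is chosen arbitrarily) recovers the factorization (5.2.17).

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])           # a symmetric matrix

eigenvalues, Q = np.linalg.eigh(A)        # columns of Q: orthonormal eigenvectors of A
D = np.diag(eigenvalues)
P = Q.T                                   # as in the proof of Theorem 5.6

print(np.allclose(P.T @ D @ P, A))        # True: A = P^t D P, which is (5.2.17)
print(np.allclose(P @ P.T, np.eye(3)))    # True: P is orthogonal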

Exercises

5.2.1 The fact that a symmetric matrix A ∈ R(n, n) has and can have only
real eigenvalues may also be proved algebraically more traditionally as
follows. Consider A as an element in C(n, n) and let λ ∈ C be any of
its eigenvalue whose existence is ensured by the Fundamental Theorem
of Algebra. Let u ∈ Cn be an eigenvector of A associated to λ. Then,


using (·, ·) to denote the standard Hermitian scalar product over Cn , we
have
λ(u, u) = λu† u = (u, Au). (5.2.18)
Use (5.2.18) to show that λ must be real, or equivalently, λ̄ = λ. Then
show that A has an eigenvector in Rn associated to λ.
5.2.2 Consider the quadratic form
q(x) = x1² + 2x2² − 2x3² + 4x1 x3 , x = (x1 , x2 , x3 )t ∈ R3 . (5.2.19)

(i) Find a symmetric matrix A ∈ R(3, 3) such that q(x) = x t Ax.


(ii) Find an orthonormal basis of R3 (with respect to the standard Eu-
clidean scalar product of R3 ) consisting of the eigenvectors of A.
(iii) Find an orthogonal matrix P ∈ R(3, 3) so that the substitution of
the variable, y = P x, transforms the quadratic form (5.2.19) into
the diagonal form
λ1 y1² + λ2 y2² + λ3 y3² = y t diag{λ1 , λ2 , λ3 } y, y = (y1 , y2 , y3 )t ∈ R3 , (5.2.20)
where λ1 , λ2 , λ3 are the eigenvalues of A, counting multiplicities.
5.2.3 Show that any symmetric matrix A ∈ R(n, n) must be congruent with
a diagonal matrix of the form D = diag{d1 , . . . , dn }, where di = ±1
or 0 for i = 1, . . . , n.
5.2.4 Let A ∈ R(n, n) be symmetric and det(A) < 0. Show that there is a
column vector x ∈ Rn such that x t Ax < 0.
5.2.5 Show that if A ∈ R(n, n) is orthogonal then adj(A) = ±At .
5.2.6 Show that if A1 , . . . , Ak ∈ R(n, n) are orthogonal, so is their product
A = A1 · · · Ak .
5.2.7 Assume that A ∈ R(n, n) is symmetric and all of its eigenvalues are
±1. Prove that A is orthogonal.
5.2.8 Let A ∈ R(n, n) be an upper or lower triangular matrix. If A is orthog-
onal, show that A must be diagonal and the diagonal entries of A can
only be ±1.
5.2.9 Show that if T ∈ L(U ) is self-adjoint and T m = 0 for some integer
m ≥ 1 then T = 0.
5.2.10 Show that if T ∈ L(U ) is self-adjoint and m an odd positive integer


then there is a self-adjoint element S ∈ L(U ) such that T = S m .
5.2.11 Let A ∈ R(n, n) be symmetric and satisfy the equation A³ + A² + 4In = 0. Prove that A = −2In .
5.2.12 Let A ∈ R(n, n) be orthogonal.
(i) Show that the real eigenvalues of A can only be ±1.
(ii) If n is odd and det(A) = 1, then 1 is an eigenvalue of A.
(iii) If det(A) = −1, then −1 is an eigenvalue of A.
5.2.13 Let x ∈ Rn be a nonzero column vector. Show that
 
P = In − (2/(x t x)) xx t (5.2.21)
is an orthogonal matrix.
5.2.14 Let A ∈ R(n, n) be an idempotent symmetric matrix such that r(A) =
r. Show that the characteristic polynomial of A is

pA (λ) = det(λIn − A) = (λ − 1)r λn−r . (5.2.22)

5.2.15 Let A ∈ R(n, n) be a symmetric matrix whose eigenvalues are all non-negative. Show that det(A + In ) > 1 if A ≠ 0.
5.2.16 Let u ∈ Rn be a nonzero column vector. Show that there is an orthog-
onal matrix Q ∈ R(n, n) such that

Qt (uut )Q = diag{ut u, 0, . . . , 0}. (5.2.23)

5.3 Positive definite quadratic forms, mappings, and matrices
Let q be a quadratic form over a finite-dimensional vector space U with a pos-
itive definite scalar product (·, ·) and f a symmetric bilinear form that induces
q: q(u) = f (u, u), u ∈ U . Then there is a self-adjoint mapping T ∈ L(U )
such that

q(u) = (u, T (u)), u ∈ U. (5.3.1)

In this section, we apply the results of the previous section to investigate the
situation when q stays positive, which is important in applications.

Definition 5.7 The positive definiteness of various subjects of concern is


defined as follows.
(1) A quadratic form q over U is said to be positive definite if

q(u) > 0, u ∈ U, u ≠ 0. (5.3.2)

(2) A self-adjoint mapping T ∈ L(U ) is said to be positive definite if

(u, T (u)) > 0, u ∈ U, u ≠ 0. (5.3.3)

(3) A symmetric matrix A ∈ R(n, n) is said to be positive definite if for any nonzero column vector x ∈ Rn there holds

x t Ax > 0. (5.3.4)

Thus, when a quadratic form q and a self-adjoint mapping T are related


through (5.3.1), then the positive definiteness of q and T are equivalent. There-
fore it will be sufficient to study the positive definiteness of self-adjoint map-
pings which will be shown to be equivalent to the positive definiteness of any
matrix representations of the associated bilinear forms of the self-adjoint map-
pings. Recall that the matrix representation of the symmetric bilinear form
induced from T with respect to an arbitrary basis {u1 , . . . , un } of U is a matrix
A ∈ R(n, n) defined by
A = (aij ), aij = (ui , T (uj )), i, j = 1, . . . , n. (5.3.5)
The following theorem links various notions of positive definiteness.

Theorem 5.8 That a self-adjoint mapping T ∈ L(U ) is positive definite is


equivalent to any of the following statements.
(1) All the eigenvalues of T are positive.
(2) There is a positive constant, λ0 > 0, such that
(u, T (u)) ≥ λ0 u2 , u ∈ U. (5.3.6)

(3) There is a positive definite mapping S ∈ L(U ) such that T = S 2 .


(4) The matrix A defined in (5.3.5) with respect to an arbitrary basis is posi-
tive definite.
(5) The eigenvalues of the matrix A defined in (5.3.5) with respect to an arbi-
trary basis are all positive.
(6) The matrix A defined in (5.3.5) with respect to an arbitrary basis enjoys
the factorization A = B 2 for some positive definite matrix B ∈ R(n, n).

Proof Assume T is positive definite. Let λ be any eigenvalue of T and u ∈ U


an associated eigenvector. Then λu2 = (u, T u) > 0. Thus λ > 0 and (1)
follows.
Assume (1) holds. Let {u1, . . . , un} be an orthonormal basis of U so that ui is an eigenvector associated with the eigenvalue λi (i = 1, . . . , n). Set λ0 = min{λ1, . . . , λn}. Then λ0 > 0. Moreover, for any u ∈ U with u = Σ_{i=1}^{n} ai ui, where ai ∈ R (i = 1, . . . , n), we have
(u, T(u)) = (Σ_{i=1}^{n} ai ui, Σ_{j=1}^{n} λj aj uj) = Σ_{i=1}^{n} λi ai^2 ≥ λ0 Σ_{i=1}^{n} ai^2 = λ0 ‖u‖^2,   (5.3.7)
which establishes (2). It is obvious that (2) implies the positive definiteness
of T .
We now show that the positive definiteness of T is equivalent to the state-
ment (3). In fact, let {u1 , . . . , un } be an orthonormal basis of U consisting of
eigenvectors with {λ1 , . . . , λn } the corresponding eigenvalues. In view of (1),
λi > 0 for i = 1, . . . , n. Now define S ∈ L(U ) to satisfy
S(ui) = √λi ui,   i = 1, . . . , n.   (5.3.8)

It is clear that T = S 2 .
Conversely, if T = S 2 for some positive definite mapping S, then 0 is not
an eigenvalue of S. So S is invertible. Therefore
(u, T(u)) = (u, S^2(u)) = (S(u), S(u)) = ‖S(u)‖^2 > 0,   u ∈ U,   u ≠ 0,
(5.3.9)
and the positive definiteness of T follows.
Let {u1, . . . , un} be an arbitrary basis of U and x = (x1, . . . , xn)^t ∈ R^n a nonzero vector. For u = Σ_{i=1}^{n} xi ui, we have
(u, T(u)) = (Σ_{i=1}^{n} xi ui, Σ_{j=1}^{n} xj T(uj)) = Σ_{i,j=1}^{n} xi (ui, T(uj)) xj = x^t A x,   (5.3.10)
which is positive if T is positive definite, and vice versa. So the equivalence of
the positive definiteness of T and the statement (4) follows.
Suppose that A is positive definite. Let λ be any eigenvalue of A and x an
associated eigenvector. Then x t Ax = λx t x > 0 implies λ > 0 since x t x > 0.
So (5) follows.
Now assume (5). Using Theorem 5.6, we see that A = P t DP where D is a
diagonal matrix in R(n, n) whose diagonal entries are the positive eigenvalues
of A, say λ1, . . . , λn, and P ∈ R(n, n) is an orthogonal matrix. Then, with y = Px = (y1, . . . , yn)^t ∈ R^n where x ∈ R^n, we have
x^t A x = (Px)^t D (Px) = y^t D y = Σ_{i=1}^{n} λi yi^2 > 0,   (5.3.11)
whenever x ≠ 0 since P is nonsingular. Hence (5) implies (4) as well.
Finally, assume (5) holds and set
D1 = diag{√λ1, . . . , √λn}.   (5.3.12)

Then D1^2 = D. Thus
A = P^t D P = P^t D1^2 P = (P^t D1 P)(P^t D1 P) = B^2,   B = P^t D1 P.   (5.3.13)
Using (5) we know that B is positive definite. So (6) follows. Conversely, if (6)
holds, we may use the symmetry of B to get x^t A x = x^t B^2 x = (Bx)^t(Bx) > 0 whenever x ∈ R^n with x ≠ 0, because B is nonsingular, which implies Bx ≠ 0. Thus (4) holds.
The proof is complete.

We note that, if in (5.3.5) the basis {u1, . . . , un} is orthonormal, then
T(uj) = Σ_{i=1}^{n} (ui, T(uj)) ui = Σ_{i=1}^{n} aij ui,   j = 1, . . . , n,   (5.3.14)

which coincides with our notation adopted earlier.


For practical convenience, it is worth stating the following matrix version of
Theorem 5.8.

Theorem 5.9 That a symmetric matrix A ∈ R(n, n) is positive definite is


equivalent to any of the following statements.

(1) All the eigenvalues of A are positive.


(2) There is a positive definite matrix B ∈ R(n, n) such that A = B 2 .
(3) A is congruent to the identity matrix. That is, there is a nonsingular matrix
B ∈ R(n, n) such that A = B t In B = B t B.

Proof That A is positive definite is equivalent to either (1) or (2) has already
been demonstrated in the proof of Theorem 5.8 and it remains to establish (3).
If there is a nonsingular matrix B ∈ R(n, n) such that A = B t B, then
x t Ax = (Bx)t (Bx) > 0 for x ∈ Rn with x = 0, which implies that A is
positive definite.
Now assume A is positive definite. Then, combining Theorem 5.6 and (1),
we can rewrite A as A = P t DP where P ∈ R(n, n) is orthogonal and
D ∈ R(n, n) is diagonal whose diagonal entries, say λ1 , . . . , λn , are all posi-
tive. Define the diagonal matrix D1 by (5.3.12) and set D1 P = B. Then B is
nonsingular and A = B t B as asserted.
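As a computational aside, statements (2) and (3) of Theorem 5.9 are easy to observe numerically. The following is only a minimal sketch (the matrix is an arbitrary illustrative choice): it builds the square root B with B^2 = A and a nonsingular B1 with A = B1^t B1 from an orthogonal diagonalization, using NumPy.

```python
import numpy as np

# An illustrative symmetric positive definite matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Orthogonal diagonalization: eigh returns A = Q diag(lam) Q^t with Q orthogonal.
lam, Q = np.linalg.eigh(A)
print(lam)                        # all eigenvalues are positive

# Theorem 5.9 (2): B = Q sqrt(diag(lam)) Q^t is positive definite and B^2 = A.
B = Q @ np.diag(np.sqrt(lam)) @ Q.T
print(np.allclose(B @ B, A))      # True

# Theorem 5.9 (3): with B1 = sqrt(diag(lam)) Q^t nonsingular, A = B1^t B1.
B1 = np.diag(np.sqrt(lam)) @ Q.T
print(np.allclose(B1.T @ B1, A))  # True
```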

Similarly, we say that the quadratic form q, self-adjoint mapping T , or


symmetric matrix A ∈ R(n, n) is positive semi-definite or non-negative if in
the condition (5.3.2), (5.3.3), or (5.3.4), respectively, the ‘greater than’ sign
(>) is replaced by ‘greater than or equal to’ sign (≥). For positive semi-
definite or non-negative mappings and matrices, the corresponding versions
of Theorems 5.8 and 5.9 can similarly be stated, simply with the word ‘pos-
itive’ there being replaced by ‘non-negative’ and In in Theorem 5.9 (3) by
diag{1, . . . , 1, 0, . . . , 0}.
Besides, we can define a quadratic form q, self-adjoint mapping T , or sym-
metric matrix A to be negative definite or negative semi-definite (non-positive),
if −q, −T , or −A is positive definite or positive semi-definite (non-negative).
Thus, a quadratic form, self-adjoint mapping, or symmetric matrix is called
indefinite or non-definite if it is neither positive semi-definite nor negative
semi-definite.
As an illustration, we consider the quadratic form
q(x) = a(x1^2 + x2^2 + x3^2 + x4^2) + 2 x1 x2 + 2 x1 x3 + 2 x1 x4,   x = (x1, x2, x3, x4)^t ∈ R^4,   (5.3.15)

where a is a real number. It is clear that q may be represented by the matrix

A = ⎡ a  1  1  1 ⎤
    ⎢ 1  a  0  0 ⎥
    ⎢ 1  0  a  0 ⎥ ,   (5.3.16)
    ⎣ 1  0  0  a ⎦

with respect to the standard basis of R^4, whose characteristic equation is

(λ − a)^2 ([λ − a]^2 − 3) = 0.   (5.3.17)
Consequently the eigenvalues of A are λ1 = λ2 = a, λ3 = a + √3, λ4 = a − √3. Therefore, when a > √3, the matrix A is positive definite; when a = √3, A is positive semi-definite but not positive definite; when a < −√3, A is negative definite; when a = −√3, A is negative semi-definite but not negative definite; when a ∈ (−√3, √3), A is indefinite.
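As a computational aside, this classification can be confirmed numerically by computing the spectrum of the matrix in (5.3.16) for a few sample values of a; the following sketch with NumPy is merely an illustrative check.

```python
import numpy as np

def spectrum(a):
    # The matrix A of (5.3.16) for a given real number a.
    A = np.array([[a, 1.0, 1.0, 1.0],
                  [1.0, a, 0.0, 0.0],
                  [1.0, 0.0, a, 0.0],
                  [1.0, 0.0, 0.0, a]])
    return np.linalg.eigvalsh(A)

for a in (2.0, np.sqrt(3.0), 0.0, -2.0):
    print(a, spectrum(a))
# a = 2:        all eigenvalues positive   -> positive definite
# a = sqrt(3):  smallest eigenvalue is 0   -> positive semi-definite, not definite
# a = 0:        eigenvalues of both signs  -> indefinite
# a = -2:       all eigenvalues negative   -> negative definite
```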

Exercises

5.3.1 Let T ∈ L(U ) be self-adjoint and S ∈ L(U ) be anti-self-adjoint. Show


that if T is positive definite then so is T − S 2 .
5.3.2 Prove that if A and B are congruent matrices then A being positive
definite or positive semi-definite is equivalent to B being so.
5.3.3 Let U be a vector space with a positive definite scalar product (·, ·).
Show that, for any set of linearly independent vectors {u1 , . . . , uk } in
U, the metric matrix
M = ⎡ (u1, u1)  · · ·  (u1, uk) ⎤
    ⎢   · · ·     · · ·    · · ·  ⎥ ,   (5.3.18)
    ⎣ (uk, u1)  · · ·  (uk, uk) ⎦
must be positive definite.
5.3.4 Show that a necessary and sufficient condition for A ∈ R(n, n) to be
symmetric is that there exists an invertible matrix B ∈ R(n, n) such
that A = B t B + aIn for some a ∈ R.
5.3.5 Let A ∈ R(n, n) be positive definite. Show that det(A) > 0.
5.3.6 Show that the inverse of a positive definite matrix is also positive defi-
nite.
5.3.7 Show that if A ∈ R(n, n) is positive definite then so is adj(A).
5.3.8 Let A ∈ R(m, n) where m ≠ n. Then AA^t ∈ R(m, m) and A^t A ∈
R(n, n) are both symmetric. Prove that if A is of full rank, that is,
r(A) = min{m, n}, then one of the matrices AAt , At A is positive def-
inite and the other is positive semi-definite but can never be positive
definite.
5.3.9 Let A = (aij ) ∈ R(n, n) be positive definite.
(i) Prove that aii > 0 for i = 1, . . . , n.
(ii) Establish the inequalities
|aij| < √(aii ajj),   i ≠ j,   i, j = 1, . . . , n.   (5.3.19)
5.3.10 Let T ∈ L(U ) be self-adjoint. Show that, if T is positive semi-definite
and k ≥ 1 any integer, then there is a unique positive semi-definite
element S ∈ L(U ) such that T = S k .
5.3.11 Let A ∈ R(n, n) be positive semi-definite and consider the null set
S = {x ∈ Rn | x t Ax = 0}. (5.3.20)
Prove that x ∈ S if and only if Ax = 0. Hence S is the null-space of


the matrix A. What happens when A is indefinite such that there are
nonzero vectors x, y ∈ Rn so that x t Ax > 0 and y t Ay < 0?
5.3.12 Let A, B ∈ R(n, n) be symmetric and A positive definite. Prove that
there is a nonsingular matrix C ∈ R(n, n) so that both C t AC and
C t BC are diagonal matrices (that is, A and B are simultaneously con-
gruent to diagonal matrices).
5.3.13 Assume that A, B ∈ R(n, n) are symmetric and the eigenvalues of
A, B are greater than or equal to a, b ∈ R, respectively. Show that the
eigenvalues of A + B are greater than or equal to a + b.
5.3.14 Let A, B ∈ R(n, n) be positive definite. Prove that the eigenvalues of
AB are all positive.
5.3.15 Consider the quadratic form
q(x) = (n + 1) Σ_{i=1}^{n} xi^2 − ( Σ_{i=1}^{n} xi )^2,   x = (x1, . . . , xn)^t ∈ R^n.   (5.3.21)

(i) Find a symmetric matrix A ∈ R(n, n) such that q(x) = x t Ax for


x ∈ Rn .
(ii) Compute the eigenvalues of A to determine whether q or A is pos-
itive definite.
5.3.16 Let A, B ∈ R(n, n) where A is positive definite and B is positive semi-
definite. Establish the inequality

det(A + B) ≥ det(A) + det(B) (5.3.22)

and show that equality in (5.3.22) holds only when B = 0.


5.3.17 Let A, B ∈ R(n, n) be positive definite such that B − A is positive
semi-definite. Prove that any solution λ of the equation

det(λA − B) = 0, (5.3.23)

must be real and satisfy λ ≥ 1.


5.3.18 Let U be an n-dimensional vector space with a positive definite scalar
product (·, ·) and T ∈ L(U ) a positive definite self-adjoint mapping.
Given b ∈ U, consider the function
f(u) = (1/2)(u, T(u)) − (u, b),   u ∈ U.   (5.3.24)
(i) Show that the non-homogeneous equation


T (x) = b (5.3.25)
enjoys the following variational principle: x ∈ U solves (5.3.25)
if and only if it is the minimum point of f , i.e. f (u) ≥ f (x) for
any u ∈ U .
(ii) Show without using the invertibility of T that the solution to
(5.3.25) is unique.
5.3.19 Use the notation of the previous exercise and consider the quadratic
form q(u) = (u, T (u)) (u ∈ U ) where T ∈ L(U ) is positive semi-
definite. Prove that q is convex. That is, for any α, β ≥ 0 satisfying
α + β = 1, there holds
q(αu + βv) ≤ αq(u) + βq(v), u, v ∈ U. (5.3.26)

5.4 Alternative characterizations of positive definite matrices


In this section, we present two alternative characterizations of positive definite matrices that are useful in applications. The first involves determinants and the second involves an elegant matrix decomposition.

Theorem 5.10 Let A = (aij ) ∈ R(n, n) be symmetric. The matrix A is positive


definite if and only if all its leading principal minors are positive, that is, if and
only if
a11 > 0,
| a11  a12 |
| a21  a22 |  > 0,   . . . ,
| a11  · · ·  a1n |
| · · ·  · · ·  · · · |  > 0.   (5.4.1)
| an1  · · ·  ann |

Proof First assume that A is positive definite. For any k = 1, . . . , n, set
Ak = ⎡ a11  · · ·  a1k ⎤
     ⎢ · · ·  · · ·  · · · ⎥ .   (5.4.2)
     ⎣ ak1  · · ·  akk ⎦

For any y = (y1, . . . , yk)^t ∈ R^k, y ≠ 0, we take x = (y1, . . . , yk, 0, . . . , 0)^t ∈ R^n. Thus, there holds
y^t Ak y = Σ_{i,j=1}^{k} aij yi yj = x^t A x > 0.   (5.4.3)
So Ak ∈ R(k, k) is positive definite. Using Theorem 5.9, there is a nonsingular


matrix B ∈ R(k, k) such that Ak = B t B. Thus, det(Ak ) = det(B t B) =
det(B t ) det(B) = det(B)2 > 0, and (5.4.1) follows.
We next assume that (5.4.1) holds and we show that A is positive definite by
an inductive argument.
If n = 1, there is nothing to show.
Assume that the assertion is valid at n − 1 (n ≥ 2).
We prove that A is positive definite at n ≥ 2.
To proceed, we rewrite A in a blocked form as
A = ⎡ An−1   α  ⎤ ,   (5.4.4)
    ⎣ α^t   ann ⎦

where α = (a1n , . . . , an−1,n )t ∈ Rn−1 . By the inductive assumption at n − 1,


we know that An−1 is positive definite. Hence, applying Theorem 5.9, we have
a nonsingular matrix B ∈ R(n − 1, n − 1) such that An−1 = B^t B. Thus
A = ⎡ An−1   α  ⎤ = ⎡ B^t  0 ⎤ ⎡ In−1   β  ⎤ ⎡ B  0 ⎤ ,   (5.4.5)
    ⎣ α^t   ann ⎦   ⎣ 0    1 ⎦ ⎣ β^t   ann ⎦ ⎣ 0  1 ⎦

if we choose β = (B^t)^{−1} α ≡ (b1, . . . , bn−1)^t ∈ R^{n−1}. Since det(A) > 0, we obtain, after making some suitable row operations, the result
det(A)/det(An−1) = det ⎡ In−1   β  ⎤ = ann − b1^2 − · · · − bn−1^2 > 0.   (5.4.6)
                       ⎣ β^t   ann ⎦

Now for x ∈ R^n so that
y = ⎡ B  0 ⎤ x ≡ (y1, . . . , yn−1, yn)^t ,   (5.4.7)
    ⎣ 0  1 ⎦

we have
x^t A x = y^t ⎡ In−1   β  ⎤ y
              ⎣ β^t   ann ⎦
        = y1^2 + · · · + yn−1^2 + 2(b1 y1 + · · · + bn−1 yn−1) yn + ann yn^2
        = (y1 + b1 yn)^2 + · · · + (yn−1 + bn−1 yn)^2 + (ann − b1^2 − · · · − bn−1^2) yn^2 ,   (5.4.8)

which is clearly positive whenever y ≠ 0, or equivalently, x ≠ 0, in view of the condition (5.4.6).
Therefore the proof is complete.
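As a computational aside, the criterion of Theorem 5.10 is straightforward to check numerically. The following sketch evaluates the leading principal minors with NumPy and compares the outcome with the eigenvalue test of Theorem 5.8; the matrix is the one of Exercise 5.4.8, chosen here merely as an example.

```python
import numpy as np

def leading_principal_minors(A):
    # det(A_k) for the k x k upper-left blocks, k = 1, ..., n; cf. (5.4.2).
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[ 2.0, 1.0, -1.0],
              [ 1.0, 3.0,  1.0],
              [-1.0, 1.0,  2.0]])

minors = leading_principal_minors(A)
print(minors)                              # 2, 5, 3 (up to rounding): all positive
print(all(m > 0 for m in minors))          # True, so A is positive definite
print(np.all(np.linalg.eigvalsh(A) > 0))   # True, consistent with Theorem 5.8
```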
Note that, if we take x = (x1 , . . . , xn )t ∈ Rn so that all the nonvanishing


components of x, if any, are given by

xi1 = y1 , . . . , xik = yk , i1 , . . . , ik = 1, . . . , n, i1 < · · · < ik , (5.4.9)

(if k = 1 it is understood that x has only one nonvanishing component at


xi1 = y1, if any) then for y = (y1, . . . , yk)^t ∈ R^k there holds
y^t A_{i1,...,ik} y = Σ_{l,m=1}^{k} a_{i_l i_m} y_l y_m = x^t A x,   (5.4.10)

where
A_{i1,...,ik} = ⎡ a_{i1 i1}  · · ·  a_{i1 ik} ⎤
                ⎢   · · ·      · · ·    · · ·  ⎥   (5.4.11)
                ⎣ a_{ik i1}  · · ·  a_{ik ik} ⎦
is a submatrix of A obtained from deleting all the ith rows and jth columns of A for i, j = 1, . . . , n and i, j ≠ i1, . . . , ik. The quantity det(A_{i1,...,ik})
is referred to as a principal minor of A of order k. Such a principal minor
becomes a leading principal minor when i1 = 1, . . . , ik = k with A1,...,k = Ak .
For A = (aij ) the principal minors of order 1 are all its diagonal entries,
a11 , . . . , ann . It is clear that if A is positive definite then so is Ai1 ,...,ik . Hence
det(Ai1 ,...,ik ) > 0. Therefore we arrive at the following slightly strengthened
version of Theorem 5.10.

Theorem 5.11 Let A = (aij ) ∈ R(n, n) be symmetric. The matrix A is positive


definite if and only if all its principal minors are positive, that is, if and only if
aii > 0,
| a_{i1 i1}  · · ·  a_{i1 ik} |
|   · · ·      · · ·    · · ·  |  > 0,   (5.4.12)
| a_{ik i1}  · · ·  a_{ik ik} |

for i, i1 , . . . , ik = 1, . . . , n, i1 < · · · < ik , k = 2, . . . , n.

We now pursue the decomposition of a positive definite matrix as another characterization of such matrices.

Theorem 5.12 Let A ∈ R(n, n) be symmetric. The matrix A is positive definite


if and only if there is a nonsingular lower triangular matrix L ∈ R(n, n) such
that A = LLt . Moreover, when A is positive definite, there is a unique L with
the property that all the diagonal entries of L are positive.
Proof If there is a nonsingular L ∈ R(n, n) such that A = LLt , we see in


view of Theorem 5.9 that A is of course positive definite.
Now we assume A = (aij ) ∈ R(n, n) is positive definite. We look for a
unique lower triangular matrix L = (lij ) ∈ R(n, n) such that A = LLt so that
all the diagonal entries of L are positive, l11 > 0, . . . , lnn > 0.
We again use induction.
When n = 1, then A = (a11) with a11 > 0. So the unique choice is L = (√a11).
Assume that the assertion is valid at n − 1 (n ≥ 2).
We establish the decomposition at n (n ≥ 2).
To proceed, we rewrite A in the form (5.4.4). In view of Theorem 5.10, the
matrix An−1 is positive definite. Thus there is a unique lower triangular matrix
L1 ∈ R(n − 1, n − 1), with positive diagonal entries, so that An−1 = L1 Lt1 .
Now take L ∈ R(n, n) with
L = ⎡ L2   0 ⎤ ,   (5.4.13)
    ⎣ γ^t  a ⎦

where L2 ∈ R(n − 1, n − 1) is a lower triangular matrix, γ ∈ R^{n−1} is a column vector, and a ∈ R is a suitable number. Then, if we set A = LL^t, we obtain
A = ⎡ An−1   α  ⎤ = ⎡ L2   0 ⎤ ⎡ L2^t  γ ⎤
    ⎣ α^t   ann ⎦   ⎣ γ^t  a ⎦ ⎣ 0     a ⎦
  = ⎡ L2 L2^t    L2 γ        ⎤ .   (5.4.14)
    ⎣ (L2 γ)^t   γ^t γ + a^2 ⎦

Therefore we arrive at the relations
An−1 = L2 L2^t,   α = L2 γ,   γ^t γ + a^2 = ann.   (5.4.15)

If we require that all the diagonal entries of L2 be positive, then the inductive
assumption leads to L2 = L1 . Hence the vector γ is also uniquely determined,
γ = L1^{−1} α. So it remains to show that the number a may be uniquely deter-
mined as well. For this purpose, we need to show in (5.4.15) that

ann − γ t γ > 0. (5.4.16)

In fact, in (5.4.5), the matrix B is any nonsingular element in R(n − 1, n − 1)


that gives us An−1 = B^t B. Thus, from the relation An−1 = L1 L1^t here we may
specify L1 = B t in (5.4.5) so that β = γ there. That is,
A = ⎡ An−1   α  ⎤ = ⎡ L1   0 ⎤ ⎡ In−1   γ  ⎤ ⎡ L1^t  0 ⎤ .   (5.4.17)
    ⎣ α^t   ann ⎦   ⎣ 0    1 ⎦ ⎣ γ^t   ann ⎦ ⎣ 0     1 ⎦
Hence (5.4.6), namely, (5.4.16), follows immediately. Therefore the last rela-
tion in (5.4.15) leads to the unique determination of the number a:
a = √(ann − γ^t γ) > 0.   (5.4.18)

The proof follows.

The above-described characterization that a positive definite n × n matrix


can be decomposed as the product of a unique lower triangular matrix L, with
positive diagonal entries, and its transpose, is known as the Cholesky decom-
position theorem, which has wide applications in numerous areas. Through
resolving the Cholesky relation A = LLt , it may easily be seen that the lower
triangular matrix L = (lij ) can actually be constructed from A = (aij ) explic-
itly following the formulas
l11 = √a11,
li1 = ai1 / l11,   i = 2, . . . , n,
lii = √( aii − Σ_{j=1}^{i−1} lij^2 ),   i = 2, . . . , n,
lij = (1/ljj) ( aij − Σ_{k=1}^{j−1} lik ljk ),   i = j + 1, . . . , n,   j = 2, . . . , n.
(5.4.19)

Thus, if A ∈ R(n, n) is nonsingular, the product AAt may always be reduced


into the form LLt for a unique lower triangular matrix L with positive diagonal
entries.
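As a computational aside, the recursion (5.4.19) translates directly into a short program. The following sketch implements these formulas with NumPy and compares the result with numpy.linalg.cholesky, which returns the same lower triangular factor with positive diagonal entries; the test matrix is an arbitrary positive definite example (again that of Exercise 5.4.8).

```python
import numpy as np

def cholesky_by_formulas(A):
    """Build the lower triangular L with A = L L^t using the formulas (5.4.19)."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_{k<j} l_jk^2).
        L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[ 2.0, 1.0, -1.0],
              [ 1.0, 3.0,  1.0],
              [-1.0, 1.0,  2.0]])   # positive definite, as checked above

L = cholesky_by_formulas(A)
print(np.allclose(L @ L.T, A))                 # True
print(np.allclose(L, np.linalg.cholesky(A)))   # True: the factor is unique
```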

Exercises

5.4.1 Let a1, a2, a3 be real numbers and set
A = ⎡ a1  a2  a3 ⎤
    ⎢ a2  a3  a1 ⎥ .   (5.4.20)
    ⎣ a3  a1  a2 ⎦
Prove that A can never be positive definite no matter how a1 , a2 , a3 are
chosen.
5.4.2 Consider the quadratic form
q(x) = (x1 + a1 x2)^2 + (x2 + a2 x3)^2 + (x3 + a3 x1)^2,   x = (x1, x2, x3)^t ∈ R^3,   (5.4.21)

where a1 , a2 , a3 ∈ R. Find a necessary and sufficient condition on


a1 , a2 , a3 for q to be positive definite. Can you extend your finding to
the case over Rn ?
5.4.3 Let A = (aij ) ∈ R(n, n) be symmetric. Prove that if the matrix A is pos-
itive semi-definite then all its leading principal minors are non-negative,
a11 ≥ 0,
| a11  a12 |
| a21  a22 |  ≥ 0,   . . . ,
| a11  · · ·  a1n |
| · · ·  · · ·  · · · |  ≥ 0.   (5.4.22)
| an1  · · ·  ann |

Is the converse true?


5.4.4 Investigate the possibility whether Theorem 5.12 can be extended so
that the lower triangular matrix may be replaced by an upper triangular
matrix so that the theorem may now read: A symmetric matrix A is pos-
itive definite if and only if there is a nonsingular upper triangular matrix
U such that A = U U t . Furthermore, if A is positive definite, then there
is a unique upper triangular matrix U with positive diagonal entries such
that A = U U t .
5.4.5 Assume that A = (aij) ∈ R(n, n) is positive definite and rewrite A as
A = ⎡ An−1   α  ⎤ ,   (5.4.23)
    ⎣ α^t   ann ⎦

where An−1 ∈ R(n − 1, n − 1) is positive definite by Theorem 5.10 and α is a


column vector in Rn−1 .
(i) Show that the matrix equation
⎡ In−1  β ⎤^t ⎡ An−1   α  ⎤ ⎡ In−1  β ⎤   =   ⎡ An−1   0       ⎤   (5.4.24)
⎣ 0     1 ⎦   ⎣ α^t   ann ⎦ ⎣ 0     1 ⎦       ⎣ 0      ann + a ⎦

has a unique solution for some β ∈ Rn−1 and a ≤ 0.


(ii) Use (i) to establish the inequality

det(A) ≤ det(An−1 )ann . (5.4.25)

(iii) Use (ii) and induction to establish the general conclusion

det(A) ≤ a11 · · · ann . (5.4.26)

(iv) Show that (5.4.26) still holds when A is positive semi-definite.


5.4.6 Let A ∈ R(n, n) and view A as made of n column vectors u1 , . . . , un in
Rn which is equipped with the standard Euclidean scalar product.
(i) Show that At A is the metric or Gram matrix of u1 , . . . , un ∈ Rn .
(ii) Use the fact that At A is positive semi-definite and (5.4.26) to prove
that

| det(A)| ≤ ‖u1‖ · · · ‖un‖.   (5.4.27)

5.4.7 Let A = (aij ) ∈ R(n, n) and a ≥ 0 is a bound of the entries of A


satisfying

|aij | ≤ a, i, j = 1, . . . , n. (5.4.28)

Apply the conclusion of the previous exercise to establish the following


Hadamard inequality for determinants,
| det(A)| ≤ a^n n^{n/2}.   (5.4.29)

5.4.8 Consider the matrix
A = ⎡  2  1  −1 ⎤
    ⎢  1  3   1 ⎥ .   (5.4.30)
    ⎣ −1  1   2 ⎦
(i) Apply Theorem 5.10 to show that A is positive definite.
(ii) Find the unique lower triangular matrix L ∈ R(3, 3) stated in The-
orem 5.12 such that A = LLt .

5.5 Commutativity of self-adjoint mappings


We continue our discussion about self-adjoint mappings over a real finite-
dimensional vector space U with a positive definite scalar product (·, ·).
The main focus of this section is to characterize a situation when two map-
pings may be simultaneously diagonalized.
Theorem 5.13 Let S, T ∈ L(U ) be self-adjoint. Then there is an orthonor-


mal basis of U consisting of eigenvectors of both S, T if and only if S and T
commute, or S ◦ T = T ◦ S.

Proof Let λ1 , . . . , λk be all the distinct eigenvalues of T and Eλ1 , . . . , Eλk


the associated eigenspaces which are known to be mutually perpendicular. As-
sume that S, T ∈ L(U ) commute. Then, for any u ∈ Eλi (i = 1, . . . , k), we
have

T (S(u)) = S(T (u)) = S(λi u) = λi S(u). (5.5.1)

Thus S(u) ∈ Eλi . In other words, each Eλi is invariant under S. Since S is self-
adjoint, each Eλi has an orthonormal basis, say {ui,1 , . . . , ui,mi }, consisting of
the eigenvectors of S. Therefore, the set of vectors

{u1,1 , . . . , u1,m1 , . . . , uk,1 , . . . , uk,mk } (5.5.2)

is an orthonormal basis of U consisting of the eigenvectors of both S and T .


Conversely, if {u1 , . . . , un } is an orthonormal basis of U consisting of the
eigenvectors of both S and T , then

S(ui ) = εi ui , T (ui ) = λi ui , εi , λi ∈ R, i = 1, . . . , n. (5.5.3)

Thus T (S(ui )) = εi λi ui = S(T (ui )) (i = 1, . . . , n), which establishes S ◦


T = T ◦ S as anticipated.

Note that although the theorem says that, when S and T commute and
n = dim(U ), there are n mutually orthogonal vectors that are the eigenvec-
tors of S and T simultaneously, it does not say that an eigenvector of S or T
must also be an eigenvector of T or S. For example, we may take S = I (the
identity mapping). Then S commutes with any mapping and any nonzero vec-
tor u is an eigenvector of S but u cannot be an eigenvector of all self-adjoint
mappings.
The matrix version of Theorem 5.13 is easily stated and proved: Two sym-
metric matrices A, B ∈ R(n, n) are commutative, AB = BA, if and only if
there is an orthogonal matrix P ∈ R(n, n) such that

A = P t DA P , B = P t DB P , (5.5.4)

where DA , DB are diagonal matrices in R(n, n) whose diagonal entries are the
eigenvalues of A, B, respectively.
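As a computational aside, this matrix statement can be observed numerically: for commuting symmetric matrices, an orthogonal matrix that diagonalizes one of them with distinct eigenvalues automatically diagonalizes the other. In the following sketch the matrices are arbitrary illustrative choices (B is a polynomial in A, so they commute).

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
B = A @ A + 3.0 * A               # a polynomial in A, hence symmetric and AB = BA
print(np.allclose(A @ B, B @ A))  # True

# Since the eigenvalues of A are distinct, the orthogonal matrix Q that
# diagonalizes A also diagonalizes B.
lam, Q = np.linalg.eigh(A)        # A = Q diag(lam) Q^t
DB = Q.T @ B @ Q
print(np.allclose(DB, np.diag(np.diag(DB))))  # True: Q^t B Q is diagonal
```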
Exercises

5.5.1 Let U be a finite-dimensional vector space over a field F. Show that for
T ∈ L(U ), if T commutes with any S ∈ L(U ), then there is some a ∈ F
such that T = aI .
5.5.2 If A, B ∈ R(n, n) are positive definite matrices and AB = BA, show
that AB is also positive definite.
5.5.3 Let U be an n-dimensional vector space over a field F, where F = R or
C and T , S ∈ L(U ). Show that, if T has n distinct eigenvalues in F and
S commutes with T , then there is a polynomial p(t), of degree at most
(n − 1), of the form
p(t) = a0 + a1 t + · · · + an−1 t n−1 , a0 , a1 , . . . , an−1 ∈ F, (5.5.5)
such that S = p(T ).
5.5.4 (Continued from the previous exercise) If T ∈ L(U ) has n distinct
eigenvalues in F, show that
CT = {S ∈ L(U ) | S ◦ T = T ◦ S}, (5.5.6)
i.e. the subset of linear mappings over U and commutative with T , as a
subspace of L(U ), is exactly n-dimensional.
5.5.5 Let U be a finite-dimensional vector space with a positive definite scalar
product and T ∈ L(U ).
(i) Show that if T is normal, satisfying T ◦ T′ = T′ ◦ T, then ‖T(u)‖ = ‖T′(u)‖ for any u ∈ U. In particular, N(T) = N(T′).
(ii) Show that if T is normal and idempotent, T^2 = T, then T = T′.

5.6 Mappings between two spaces


In this section, we briefly illustrate how to use self-adjoint mappings to study
general mappings between two vector spaces with positive definite scalar prod-
ucts.
Use U and V to denote two real vector spaces of finite dimensions equipped
with positive definite scalar products (·, ·)U and (·, ·)V , respectively. For T ∈
L(U, V ), since
f (u) = (T (u), v)V , u ∈ U, (5.6.1)
defines an element f in U  , we know that there is a unique element in U
depending on v, say T  (v), such that f (u) = (u, T  (v))U . That is,
(T (u), v)V = (u, T  (v))U , u ∈ U, v ∈ V . (5.6.2)
Hence we have obtained a well-defined mapping T  : V → U .


It is straightforward to check that T  is linear. Thus T  ∈ L(V , U ). This
construction allows us to consider the composed mappings T  ◦ T ∈ L(U ) and
T ◦ T  ∈ L(V ), which are both seen to be self-adjoint.
Moreover, since

(u, (T  ◦ T )(u))U = (T (u), T (u))V ≥ 0, u ∈ U, (5.6.3)

(v, (T ◦ T  )(v))V = (T  (v), T  (v))U ≥ 0, v ∈ V, (5.6.4)

we see that T  ◦ T ∈ L(U ) and T ◦ T  ∈ L(V ) are both positive semi-definite.


Let ‖ · ‖U and ‖ · ‖V be the norms induced from the positive definite scalar products (·, ·)U and (·, ·)V, respectively. Recall that the norm of T with respect to ‖ · ‖U and ‖ · ‖V is given by
‖T‖ = sup{ ‖T(u)‖V | ‖u‖U = 1, u ∈ U }.   (5.6.5)

On the other hand, since T  ◦ T ∈ L(U ) is self-adjoint and positive semi-


definite, there is an orthonormal basis {u1 , . . . , un } of U consisting of eigen-
vectors of T′ ◦ T, associated with the corresponding non-negative eigenvalues σ1, . . . , σn. Therefore, for u = Σ_{i=1}^{n} ai ui ∈ U with ‖u‖U^2 = Σ_{i=1}^{n} ai^2 = 1, we have
‖T(u)‖V^2 = (T(u), T(u))V = (u, (T′ ◦ T)(u))U = Σ_{i=1}^{n} σi ai^2 ≤ σ0,   (5.6.6)

where
σ0 = max_{1≤i≤n} {σi},   (5.6.7)

which proves ‖T‖^2 ≤ σ0. Furthermore, let i = 1, . . . , n be such that σ0 = σi. Then (5.6.5) leads to
‖T‖^2 ≥ ‖T(ui)‖V^2 = (ui, (T′ ◦ T)(ui))U = σi = σ0.   (5.6.8)

Consequently, we may conclude with
‖T‖ = √σ0,   where σ0 is the largest eigenvalue of the mapping T′ ◦ T.   (5.6.9)

In particular, the above also gives us a practical method to compute the norm
of a linear mapping T from U into itself by using the generated self-adjoint
mapping T  ◦ T .
If T ∈ L(U) is already self-adjoint, then, since ‖T‖ is the square root of the largest eigenvalue of T^2, we have the expression
‖T‖ = max_{1≤i≤n} {|λi|},   (5.6.10)

where λ1 , . . . , λn are the eigenvalues of T .


An immediate consequence of (5.6.9) and (5.6.10) is the elegant formula
‖T‖^2 = ‖T′ ◦ T‖,   T ∈ L(U, V).   (5.6.11)
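In matrix terms, (5.6.9) says that the norm of A ∈ R(m, n), with respect to the Euclidean norms, is the square root of the largest eigenvalue of A^t A. As a computational aside, the following sketch checks this against NumPy's built-in spectral norm; the matrix is the one of Exercise 5.6.5 below, used here only as an example.

```python
import numpy as np

A = np.array([[ 1.0, 2.0],
              [-1.0, 1.0],
              [ 2.0, 3.0]])   # a mapping from R^2 into R^3

sigma = np.linalg.eigvalsh(A.T @ A)   # non-negative eigenvalues of A^t A
norm_via_eigs = np.sqrt(sigma.max())  # ||A|| = sqrt(largest eigenvalue of A^t A)

print(norm_via_eigs)
print(np.linalg.norm(A, 2))           # NumPy's spectral norm
print(np.allclose(norm_via_eigs, np.linalg.norm(A, 2)))  # True
```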

Assume that T ∈ L(U) is self-adjoint. From (5.6.10), it is not hard to see that ‖T‖ may also be assessed according to
‖T‖ = sup{ |(u, T(u))| | u ∈ U, ‖u‖ = 1 }.   (5.6.12)

In fact, let η denote the right-hand side of (5.6.12). Then the Schwarz inequality (4.3.10) implies that |(u, T(u))| ≤ ‖T(u)‖ ≤ ‖T‖ for u ∈ U satisfying ‖u‖ = 1. So η ≤ ‖T‖. On the other hand, let λ be the eigenvalue of T such that ‖T‖ = |λ| and u ∈ U an associated eigenvector with ‖u‖ = 1. Then ‖T‖ = |λ| = |(u, T(u))| ≤ η. Hence (5.6.12) is verified.
We now show how to extend (5.6.12) to evaluate the norm of a general linear
mapping between vector spaces with scalar products.

Theorem 5.14 Let U, V be finite-dimensional vector spaces equipped with


positive definite scalar products (·, ·)U , (·, ·)V , respectively. For T ∈ L(U, V ),
we have

‖T‖ = sup{ |(T(u), v)V| | u ∈ U, v ∈ V, ‖u‖U = 1, ‖v‖V = 1 }.   (5.6.13)

Proof Recall that there holds
‖T‖ = sup{ ‖T(u)‖V | u ∈ U, ‖u‖U = 1 }.   (5.6.14)
Thus, for any ε > 0, there is some uε ∈ U with ‖uε‖U = 1 such that
‖T(uε)‖V ≥ ‖T‖ − ε.   (5.6.15)
Furthermore, for T(uε) ∈ V, we have
‖T(uε)‖V = sup{ |(T(uε), v)V| | v ∈ V, ‖v‖V = 1 }.   (5.6.16)
Thus, there is some vε ∈ V with ‖vε‖V = 1 such that
|(T(uε), vε)V| ≥ ‖T(uε)‖V − ε.   (5.6.17)


As a consequence, if we use η to denote the right-hand side of (5.6.13), we may combine (5.6.15) and (5.6.17) to obtain η ≥ ‖T‖ − 2ε. Since ε > 0 is arbitrary, we have η ≥ ‖T‖.
On the other hand, for u ∈ U, v ∈ V with ‖u‖U = 1, ‖v‖V = 1, we may use the Schwarz inequality (4.3.10) to get |(T(u), v)V| ≤ ‖T(u)‖V ≤ ‖T‖. Hence η ≤ ‖T‖.
Therefore we arrive at ‖T‖ = η and the proof follows.

An important consequence of Theorem 5.14 is that the norms of a linear


mapping, over two vector spaces with positive definite scalar products, and its
dual assume the same value.

Theorem 5.15 Let U, V be finite-dimensional vector spaces equipped with


positive definite scalar products (·, ·)U, (·, ·)V, respectively. For T ∈ L(U, V) and its dual T′ ∈ L(V, U), we have ‖T‖ = ‖T′‖. Thus ‖T′ ◦ T‖ = ‖T ◦ T′‖ and the largest eigenvalues of the positive semi-definite mappings T′ ◦ T ∈ L(U) and T ◦ T′ ∈ L(V) are the same.

Proof The fact that ‖T‖ = ‖T′‖ may be deduced from applying (5.6.13) to T′ and the relation (T(u), v)V = (u, T′(v))U (u ∈ U, v ∈ V). The conclusion ‖T′ ◦ T‖ = ‖T ◦ T′‖ follows from (5.6.11), and that the largest eigenvalues of the positive semi-definite mappings T′ ◦ T and T ◦ T′ are the same is a consequence of the eigenvalue characterization of the norm of a self-adjoint mapping stated in (5.6.10) and ‖T′ ◦ T‖ = ‖T ◦ T′‖.

As an application, we see that, for any matrix A ∈ R(m, n), the largest
eigenvalues of the symmetric matrices At A ∈ R(n, n) and AAt ∈ R(m, m)
must coincide. In fact, regarding eigenvalues, it is not hard to establish a more general result, stated as follows.

Theorem 5.16 Let U, V be vector spaces over a field F and T ∈ L(U, V ), S ∈


L(V , U ). Then the nonzero eigenvalues of S ◦ T and T ◦ S are the same.

Proof Let λ ∈ F be a nonzero eigenvalue of S ◦ T and u ∈ U an associated


eigenvector. We show that λ is also an eigenvalue of T ◦ S. In fact, from

(S ◦ T )(u) = λu, (5.6.18)

we have (T ◦ S)(T(u)) = λT(u). In order to show that λ is an eigenvalue of T ◦ S, it suffices to show T(u) ≠ 0. However, such a property is already seen in (5.6.18) since λ ≠ 0 and u ≠ 0.
Interchanging S and T , we see that, if λ is a nonzero eigenvalue of T ◦ S,


then it is also an eigenvalue of S ◦ T .

It is interesting that we do not require any additional properties for the vector
spaces U, V in order for Theorem 5.16 to hold.
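As a computational aside, Theorem 5.16 is easy to observe numerically for rectangular matrices A ∈ R(m, n) and B ∈ R(n, m): the nonzero eigenvalues of AB and BA coincide, while the larger product merely picks up additional zero eigenvalues. The matrices in the following sketch are random illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # a mapping from R^4 into R^2
B = rng.standard_normal((4, 2))   # a mapping from R^2 into R^4

ev_AB = np.linalg.eigvals(A @ B)  # 2 eigenvalues
ev_BA = np.linalg.eigvals(B @ A)  # 4 eigenvalues; two are (numerically) zero

print(np.sort_complex(ev_AB))
print(np.sort_complex(ev_BA))
# The nonzero entries of the two lists agree up to rounding error.
```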
We next study another important problem in applications known as the least
squares approximation.
Let T ∈ L(U, V) and v ∈ V. Consider the optimization problem
η ≡ inf{ ‖T(u) − v‖V^2 | u ∈ U }.   (5.6.19)

For simplicity, we first assume that (5.6.19) has a solution, that is, there is some x ∈ U such that ‖T(x) − v‖V^2 = η, and we look for some appropriate condition the solution x will fulfill. For this purpose, we set
f(ε) = ‖T(x + εy) − v‖V^2,   y ∈ U,   ε ∈ R.   (5.6.20)

Then f(0) = η ≤ f(ε). Hence we may expand the right-hand side of (5.6.20) to obtain
0 = (df/dε)|_{ε=0} = (T(x) − v, T(y))V + (T(y), T(x) − v)V
  = 2((T′ ◦ T)(x) − T′(v), y)U,   y ∈ U,   (5.6.21)

which implies that x ∈ U is a solution to the equation

(T  ◦ T )(x) = T  (v), v ∈ V. (5.6.22)

This equation is commonly referred to as the normal equation. It is not hard


to see that the equation (5.6.22) is always consistent for any v ∈ V . In fact,
it is clear that R(T  ◦ T ) ⊂ R(T  ). On the other hand, if u ∈ N (T  ◦ T ),
then T (u)2V = (u, (T  ◦ T )(u)) = 0. Thus u ∈ N (T ). This establishes
N (T ) = N (T  ◦ T ). So by the rank equation we deduce dim(U ) = n(T ) +
r(T ) = n(T  ◦ T ) + r(T  ◦ T ). Therefore, in view of Theorem 2.9, we get
r(T  ◦ T ) = r(T ) = r(T  ). That is, R(T  ◦ T ) = R(T  ), as desired, which
proves the solvability of (5.6.22) for any v ∈ V .
Next, let x be a solution to (5.6.22). We show that x solves (5.6.19). In fact,
if y is another solution to (5.6.22), then z = x − y ∈ N (T  ◦ T ) = N (T ). Thus

‖T(y) − v‖V^2 = ‖T(x + z) − v‖V^2 = ‖T(x) − v‖V^2,   (5.6.23)


which shows that the quantity ‖T(x) − v‖V^2 is independent of the solution x of (5.6.22). Besides, for any test element u ∈ U, we rewrite u as u = x + w where x is a solution to (5.6.22) and w ∈ U. Then we have
‖T(u) − v‖V^2 = ‖T(x) − v‖V^2 + 2(T(x) − v, T(w))V + ‖T(w)‖V^2
             = ‖T(x) − v‖V^2 + 2((T′ ◦ T)(x) − T′(v), w)U + ‖T(w)‖V^2
             = ‖T(x) − v‖V^2 + ‖T(w)‖V^2
             ≥ ‖T(x) − v‖V^2.   (5.6.24)

Consequently, x solves (5.6.19) as anticipated.


Using knowledge about self-adjoint mappings, we can express a solution to
the normal equation (5.6.22) explicitly in terms of the eigenvectors and eigen-
values of T  ◦ T . In fact, let {u1 , . . . , uk , . . . , un } be an orthonormal basis of U
consisting of the eigenvectors of the positive semi-definite mapping T  ◦ T ∈
L(U ) associated with the non-negative eigenvalues λ1 , . . . , λk , . . . , λn among
which λ1, . . . , λk are positive (if any). Then a solution x of (5.6.22) may be written x = Σ_{i=1}^{n} ai ui for some a1, . . . , an ∈ R. Inserting this into (5.6.22), we obtain
Σ_{i=1}^{k} ai λi ui = T′(v).   (5.6.25)

Thus, taking the scalar product with ui on both sides of the above, we find
ai = (1/λi)(ui, T′(v))U,   i = 1, . . . , k,   (5.6.26)
which leads to the following general solution formula for the equation (5.6.22):
x = Σ_{i=1}^{k} (1/λi)(ui, T′(v))U ui + x0,   x0 ∈ N(T),   (5.6.27)

since N(T  ◦ T ) = N (T ).
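In matrix form the normal equation reads A^t A x = A^t b for A ∈ R(m, n) and b ∈ R^m (cf. Exercise 5.6.8). As a computational aside, the following sketch solves a small least squares problem both through the normal equation and with NumPy's built-in solver; the data are arbitrary illustrative choices (the matrix is that of Exercise 5.6.5).

```python
import numpy as np

A = np.array([[ 1.0, 2.0],
              [-1.0, 1.0],
              [ 2.0, 3.0]])
b = np.array([1.0, 0.0, 1.0])

# Solve the normal equation (A^t A) x = A^t b; A has full column rank here,
# so A^t A is positive definite and the solution is unique.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Compare with the library least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))   # True
print(A @ x_normal)                     # the point in the range of A closest to b
```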

Exercises

5.6.1 Show that the mapping T  : V → U defined in (5.6.2) is linear.


5.6.2 Let U, V be finite-dimensional vector spaces with positive definite scalar
products. For any T ∈ L(U, V ) show that the mapping T  ◦ T ∈ L(U )
is positive definite if and only if n(T ) = 0.
5.6.3 Let U, V be finite-dimensional vector spaces with positive definite scalar


products, (·, ·)U , (·, ·)V , respectively. For any T ∈ L(U, V ), define T  ∈
L(V , U ) by (5.6.2).
(i) Establish the relations R(T )⊥ = N (T  ) and R(T ) = N (T  )⊥ (the
latter is also known as the Fredholm alternative).
(ii) Prove directly, without resorting to Theorem 2.9, that r(T ) =
r(T  ).
(iii) Prove that r(T ◦ T  ) = r(T  ◦ T ) = r(T ).
5.6.4 Apply the previous exercise to verify the validity of the rank relation

r(At A) = r(AAt ) = r(A) = r(At ), A ∈ R(m, n). (5.6.28)

5.6.5 Let T ∈ L(R^2, R^3) be defined by
T(x) = ⎡  1  2 ⎤ x ,   x = (x1, x2)^t ∈ R^2 ,   (5.6.29)
       ⎢ −1  1 ⎥
       ⎣  2  3 ⎦

where R2 and R3 are equipped with the standard Euclidean scalar prod-
ucts.
(i) Find the eigenvalues of T′ ◦ T and compute ‖T‖.
(ii) Find the eigenvalues of T ◦ T  and verify that T ◦ T  is positive
semi-definite but not positive definite (cf. part (ii) of (2)).
(iii) Check to see that the largest eigenvalues of T  ◦ T and T ◦ T  are
the same.
(iv) Compute all eigenvalues of T ◦T  and T  ◦T and explain the results
in view of Theorem 5.16.
5.6.6 Let A ∈ F(m, n) and B ∈ F(n, m) where m < n and consider AB ∈
F(m, m) and BA ∈ F(n, n). Show that BA has at most m + 1 distinct
eigenvalues in F.
5.6.7 (A specialization of the previous exercise) Let u, v ∈ Fn be column
vectors. Hence ut v ∈ F and vut ∈ F(n, n).
(i) Show that the matrix vu^t has a nonzero eigenvalue in F only if u^t v ≠ 0.
(ii) Show that, when u^t v ≠ 0, the only nonzero eigenvalue of vu^t in F is u^t v, so that v is an associated eigenvector.
(iii) Show that, when u^t v ≠ 0, the eigenspace of vu^t associated with the eigenvalue u^t v is one-dimensional.
5.6.8 Consider the Euclidean space Rk equipped with the standard inner prod-
uct and let A ∈ R(m, n). Formulate a solution of the following optimiza-
tion problem
η ≡ inf{ ‖Ax − b‖^2 | x ∈ R^n },   b ∈ R^m,   (5.6.30)

by deriving a matrix version of the normal equation.


5.6.9 Consider a parametrized plane in R3 given by
x1 = y1 + y2 , x2 = y1 − y2 , x3 = y1 + y2 , y1 , y2 ∈ R.
(5.6.31)
Use the least squares approximation to find a point in the plane that is the
closest to the point in R3 with the coordinates x1 = 2, x2 = 1, x3 = 3.
6 Complex quadratic forms and self-adjoint mappings

In this chapter we extend our study on real quadratic forms and self-adjoint
mappings to the complex situation. We begin with a discussion on the com-
plex version of bilinear forms and the Hermitian structures. We will relate the
Hermitian structure of a bilinear form with representing it by a unique self-
adjoint mapping. Then we establish the main spectrum theorem for self-adjoint
mappings. We next focus again on the positive definiteness of self-adjoint map-
pings. We explore the commutativity of self-adjoint mappings and apply it to
obtain the main spectrum theorem for normal mappings. We also show how to
use self-adjoint mappings to study a mapping between two spaces.

6.1 Complex sesquilinear and associated quadratic forms


Let U be a finite-dimensional vector space over C. Extending the standard
Hermitian scalar product over Cn , we may formulate the notion of a complex
‘bilinear’ form as follows.

Definition 6.1 A complex-valued function f : U × U → C is called a


sesquilinear form, which is also sometimes loosely referred to as a bilinear
form, if it satisfies for any u, v, w ∈ U and a ∈ C the following conditions.
(1) f (u + v, w) = f (u, w) + f (v, w), f (u, v + w) = f (u, v) + f (u, w).
(2) f(au, v) = ā f(u, v),   f(u, av) = a f(u, v).

As in the real situation, we may consider how to use a matrix to represent


a sesquilinear form. To this end, let B = {u1 , . . . , un } be a basis of U . For
u, v ∈ U , let x = (x1 , . . . , xn )t , y = (y1 , . . . , yn )t ∈ Cn denote the coordinate
vectors of u, v with respect to the basis B. Then

f(u, v) = f( Σ_{i=1}^{n} xi ui , Σ_{j=1}^{n} yj uj ) = Σ_{i,j=1}^{n} x̄i f(ui, uj) yj = x̄^t A y = x† A y,   (6.1.1)
where A = (aij ) = (f (ui , uj )) lies in C(n, n) which is the matrix representa-
tion of the sesquilinear form f with respect to B.
Let B̃ = {ũ1 , . . . , ũn } be another basis of U so that the coordinate vectors of
u, v are x̃, ỹ ∈ Cn with respect to B̃. Hence, using à = (ãij ) = (f (ũi , ũj )) ∈
C(n, n) to denote the matrix representation of the sesquilinear form f with
respect to B̃ and B = (bij ) ∈ C(n, n) the basis transition matrix from B into
B̃ so that
ũj = Σ_{i=1}^{n} bij ui,   j = 1, . . . , n,   (6.1.2)

we know that the relations x = B x̃ and y = B ỹ are valid. Therefore, we can


conclude with
f (u, v) = x̃ † Ãỹ = x † Ay = x̃ † (B † AB)ỹ. (6.1.3)

Consequently, there holds à = B † AB. As in the real situation, we make the


definition that two matrices A, B ∈ C(n, n) are said to be Hermitian congru-
ent, or simply congruent, if there is an invertible matrix C ∈ C(n, n) such that
A = C † BC. (6.1.4)
Hence we see that the matrix representations of a sesquilinear form over U
with respect to different bases of U are Hermitian congruent.
Let f : U × U → C be a sesquilinear form. Define q : U → C by setting
q(u) = f (u, u), u ∈ U. (6.1.5)
As in the real situation, we may call q the quadratic form associated with f .
We have the following homogeneity property
q(zu) = |z|2 q(u), u ∈ U, z ∈ C. (6.1.6)
Conversely, we can also show that f is uniquely determined by q. This fact
is in sharp contrast with the real situation.
In fact, since
f (u + v, u + v) = f (u, u)+f (v, v)+f (u, v)+f (v, u), u, v ∈ U,
(6.1.7)
f (u+iv, u+iv) = f (u, u)+f (v, v)+if (u, v) − if (v, u), u, v ∈ U,
(6.1.8)
we have the following polarization identity relating a sesquilinear form to its induced quadratic form,
f(u, v) = (1/2)(q(u + v) − q(u) − q(v)) − (i/2)(q(u + iv) − q(u) − q(v)),   u, v ∈ U.   (6.1.9)
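As a computational aside, the identity (6.1.9) can be checked numerically for the sesquilinear form f(x, y) = x†Ay on C^n; in the following sketch the matrix A and the vectors u, v are random illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def f(x, y):
    # The sesquilinear form f(x, y) = x^dagger A y on C^n.
    return np.conj(x) @ A @ y

def q(x):
    # The associated quadratic form q(x) = f(x, x).
    return f(x, x)

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

rhs = 0.5 * (q(u + v) - q(u) - q(v)) - 0.5j * (q(u + 1j * v) - q(u) - q(v))
print(np.allclose(f(u, v), rhs))   # True
```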
For U , let B = {u1 , . . . , un } be a basis, u ∈ U , and x ∈ Cn the coordinate
vector of u with respect to B. In view of (6.1.1), we get
q(u) = x†Ax = x†( (1/2)(A + A†) + (1/2)(A − A†) )x
     = (1/2) x†(A + A†)x + (1/2) x†(A − A†)x = ℜ{q(u)} + iℑ{q(u)}.   (6.1.10)
In other words, q is real-valued when A is Hermitian, A = A† , which may be
checked to be equivalent to the condition
f(u, v) = \overline{f(v, u)},   u, v ∈ U.   (6.1.11)
In fact, if q is real-valued, then replacing u by iv and v by u in (6.1.9),
we have
−if(v, u) = (1/2)(q(u + iv) − q(u) − q(v)) − (i/2)(q(u + v) − q(u) − q(v)).   (6.1.12)
Combining (6.1.9) and (6.1.12), and using the condition that q is real-valued,
we arrive at (6.1.11).
Thus we are led to the following definition.

Definition 6.2 A sesquilinear form f : U × U → C is said to be Hermitian if


it satisfies the condition (6.1.11).

For any column vectors x, y ∈ Cn , the standard Hermitian scalar product is


given by
(x, y) = x † y. (6.1.13)
Thus f : Cn × Cn → C defined by f (x, y) = (x, y) for x, y ∈ Cn is a
Hermitian sesquilinear form. More generally, with any n × n Hermitian matrix
A, we see that
f (x, y) = x † Ay = (x, Ay), x, y ∈ Cn , (6.1.14)
is also a Hermitian sesquilinear form. Conversely, the relation (6.1.1) indi-
cates that a Hermitian sesquilinear form over an n-dimensional complex vector
space is completely represented, with respect to a given basis, by a Hermitian


matrix, in terms of the standard Hermitian scalar product over Cn .
Consider a finite-dimensional complex vector space U with a positive defi-
nite scalar product (·, ·). Given any sesquilinear form f : U × U → C, since
g(v) = f (u, v), v ∈ U, (6.1.15)
is a linear functional in v depending on u ∈ U, there is a unique vector (say)
T (u) ∈ U such that
f (u, v) = (T (u), v), u, v ∈ U, (6.1.16)
which in fact defines the correspondence T as an element in L(U ). Let T 
denote the adjoint of T . Then f may also be represented as
f (u, v) = (u, T  (v)), u, v ∈ U. (6.1.17)
In other words, a sesquilinear form f over U may completely be represented
by a linear mapping T or T  from U into itself, in terms of the positive definite
scalar product over U , alternatively through the expression (6.1.16) or (6.1.17).
Applying (6.1.9) to f(u, v) = (u, T(v)) (u, v ∈ U) for any T ∈ L(U), we obtain the following useful polarization identity for T:
(u, T(v)) = (1/2)((u + v, T(u + v)) − (u, T(u)) − (v, T(v)))
          − (i/2)((u + iv, T(u + iv)) − (u, T(u)) − (v, T(v))),   u, v ∈ U.   (6.1.18)
The Hermitian situation is especially interesting for us.

Theorem 6.3 Let f be a sesquilinear form over a finite-dimensional complex


vector space U with a positive definite scalar product (·, ·) and represented by
the mapping T ∈ L(U ) through (6.1.16) or (6.1.17). Then f is Hermitian if
and only if T is self-adjoint, T = T  .

Proof If f is Hermitian, then


f(u, v) = \overline{f(v, u)} = \overline{(T(v), u)} = (u, T(v)),   u, v ∈ U.   (6.1.19)
In view of (6.1.17) and (6.1.19), we arrive at T = T′. The converse is similar.

Let B = {u1 , . . . , un } be an orthonormal basis of U , T ∈ L(U ) be self-


adjoint, and A = (aij) ∈ C(n, n) the matrix representation of T with respect to B. Then T(uj) = Σ_{i=1}^{n} aij ui (j = 1, . . . , n), so that
aij = (ui, T(uj)) = (T(ui), uj) = āji,   i, j = 1, . . . , n.   (6.1.20)

Hence A = A† . Of course the converse is true too. Therefore T is self-adjoint


if and only if the matrix representation of T with respect to any orthonormal
basis is Hermitian. Consequently, a self-adjoint mapping over a complex vector
space with a positive definite scalar product is interchangeably referred to as
Hermitian as well.

Exercises

6.1.1 Let U be a complex vector space with a positive definite scalar product
(·, ·) and T ∈ L(U ). Show that T = 0 if and only if (u, T (u)) = 0 for
any u ∈ U . Give an example to show that the same may not hold for a
real vector space with a positive definite scalar product.
6.1.2 Let U be a complex vector space with a positive definite scalar product
(·, ·) and T ∈ L(U ). Show that T is self-adjoint or Hermitian if and only
if (u, T (u)) = (T (u), u) for any u ∈ U . Give an example to show that
the same may not hold for a real vector space with a positive definite
scalar product.
6.1.3 Let U be a complex vector space with a basis B = {u1 , . . . , un } and
A = (f (ui , uj )) ∈ C(n, n) the matrix representation of f with respect
to B. Show that f is Hermitian if and only if A is Hermitian.
6.1.4 Let U be a complex vector space with a positive definite scalar product
(·, ·) and T ∈ L(U ). Show that T can be uniquely decomposed into a
sum T = R + iS where R, S ∈ L(U ) are both self-adjoint.
6.1.5 Show that the inverse of an invertible self-adjoint mapping is also self-
adjoint.
6.1.6 Show that In and −In cannot be Hermitian congruent, although they are
congruent as elements in C(n, n).

6.2 Complex self-adjoint mappings


As in the real situation, we now show that complex self-adjoint or Hermitian
mappings are completely characterized by their spectra as well.

Theorem 6.4 Let U be a complex vector space with a positive definite scalar
product (·, ·) and T ∈ L(U ). If T is self-adjoint, then the following are valid.

(1) The eigenvalues of T are all real.


(2) Let λ1 , . . . , λk be all the distinct eigenvalues of T . Then T may be reduced
over the direct sum of mutually perpendicular eigenspaces
U = Eλ1 ⊕ · · · ⊕ Eλk . (6.2.1)

(3) There is an orthonormal basis of U consisting of eigenvectors of T .

Proof Let T be self-adjoint and λ ∈ C an eigenvalue of T with u ∈ U an


associated eigenvector. Then, using T (u) = λu, we have

λ‖u‖^2 = (u, T(u)) = (T(u), u) = λ̄‖u‖^2,   (6.2.2)
which gives us λ = λ̄ so that λ ∈ R. This establishes (1). Note that this proof
does not assume that U is finite dimensional.
To establish (2), we use induction on dim(U ).
If dim(U ) = 1, there is nothing to show.
Assume that the statement (2) is valid if dim(U ) ≤ n − 1 for some n ≥ 2.
Consider dim(U ) = n, n ≥ 2.
Let λ1 be an eigenvalue of T in C. From (1), we know that actually λ1 ∈ R.
Use Eλ1 to denote the eigenspace of T associated with λ1 :

Eλ1 = {u ∈ U | T (u) = λ1 u} = N (λ1 I − T ). (6.2.3)

If Eλ1 = U, then T = λ1 I and there is nothing more to show. We now assume Eλ1 ≠ U.
It is clear that Eλ1 is invariant under T . In fact, T is reducible over the
direct sum U = Eλ1 ⊕ (Eλ1 )⊥ . To see this, we need to establish the invariance
T ((Eλ1 )⊥ ) ⊂ (Eλ1 )⊥ . Indeed, for any u ∈ Eλ1 and v ∈ (Eλ1 )⊥ , we have

(u, T (v)) = (T (u), v) = λ1 (u, v) = 0. (6.2.4)

Thus T (v) ∈ (Eλ1 )⊥ and T (Eλ1 )⊥ ⊂ (Eλ1 )⊥ .


Using the fact that dim((Eλ1 )⊥ ) = dim(U ) − dim(Eλ1 ) ≤ n − 1 and the
inductive assumption, we see that T is reduced over a direct sum of mutually
perpendicular eigenspaces of T in (Eλ1 )⊥ :

(Eλ1 )⊥ = Eλ2 ⊕ · · · ⊕ Eλk , (6.2.5)

where λ2 , . . . , λk are all the distinct eigenvalues of T over the invariant sub-
space (Eλ1 )⊥ , which are real.
Finally, we need to show that λ1 , . . . , λk obtained above are all possible
eigenvalues of T . For this purpose, let λ be an eigenvalue of T and u an
associated eigenvector. Then there are u1 ∈ Eλ1 , . . . , uk ∈ Eλk such that
u = u1 + · · · + uk . Hence the relation T (u) = λu gives us λ1 u1 + · · · + λk uk =
λ(u1 + · · · + uk ). That is,

(λ1 − λ)u1 + · · · + (λk − λ)uk = 0. (6.2.6)


Since u ≠ 0, there exists some i = 1, . . . , k such that ui ≠ 0. Thus, taking scalar product of the above equation with ui, we get (λi − λ)‖ui‖^2 = 0, which implies λ = λi. Therefore λ1, . . . , λk are all the possible eigenvalues of T.
To establish (3), we simply construct an orthonormal basis over each
eigenspace Eλi , obtained in (2), denoted by Bλi , i = 1, . . . , k. Then B =
Bλ1 ∪ · · · ∪ Bλk is a desired orthonormal basis of U as stated in (3).

Let A = (aij ) ∈ C(n, n) and consider the mapping T ∈ L(Cn ) defined by

T (x) = Ax, x ∈ Cn . (6.2.7)

Using the mapping (6.2.7) and Theorem 6.4, we may obtain the following
characterization of a Hermitian matrix, which may also be regarded as a matrix
version of Theorem 6.4.

Theorem 6.5 A matrix A ∈ C(n, n) is Hermitian if and only if there is a


unitary matrix P ∈ C(n, n) such that

A = P † DP , (6.2.8)

where D ∈ R(n, n) is a real diagonal matrix whose diagonal entries are the
eigenvalues of A.

Proof If (6.2.8) holds, it is clear that A is Hermitian, A = A† .


Conversely, assume A is Hermitian. Using B0 = {e1 , . . . , en } to denote the
standard basis of Cn equipped with the usual Hermitian positive definite scalar
product (x, y) = x † y for x, y ∈ Cn , we see that B0 is an orthonormal basis of
Cn . With the mapping T defined by (6.2.7), we have (T (x), y) = (Ax)† y =
x † Ay = (x, T (y)) for any x, y ∈ Cn , and

n
T (ej ) = aij ei , j = 1, . . . , n. (6.2.9)
i=1

Thus T is self-adjoint or Hermitian. Using Theorem 6.4, there is an


orthonormal basis, say B, consisting of eigenvalues of T , say λ1 , . . . , λn ,
which are all real. With respect to B, the matrix representation of T is diago-
nal, D = diag{λ1 , . . . , λn }. Now since the basis transition matrix from B0 into
B is unitary, thus (6.2.8) must hold for some unitary matrix P as expected.

In other words, Theorem 6.5 states that a complex square matrix is diago-
nalizable through a unitary matrix into a real diagonal matrix if and only if it
is Hermitian.
Practically, the decomposition (6.2.8) for a Hermitian matrix A may also,
more preferably, be established as in the proof of Theorem 5.6 as follows.
Find an orthonormal basis {u1 , . . . , un } of Cn , with the standard Hermitian


scalar product, consisting of eigenvectors of A: Aui = λi ui , i = 1, . . . , n. Let
Q ∈ C(n, n) be the matrix with u1, . . . , un as its respective column vectors.
Then Q is unitary and AQ = QD where D = diag{λ1 , . . . , λn }. Thus P = Q†
renders (6.2.8).
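Numerically, this construction is what NumPy's eigh routine performs for a Hermitian matrix. As a computational aside, the following sketch recovers the factorization (6.2.8) for an arbitrary illustrative Hermitian matrix.

```python
import numpy as np

# An arbitrary Hermitian matrix.
A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])
assert np.allclose(A, A.conj().T)

lam, Q = np.linalg.eigh(A)   # real eigenvalues; columns of Q are orthonormal eigenvectors
D = np.diag(lam)
P = Q.conj().T               # P is unitary

print(lam)                                  # the eigenvalues are real
print(np.allclose(A, P.conj().T @ D @ P))   # A = P^dagger D P, as in (6.2.8)
```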

Exercises

6.2.1 Let A ∈ C(n, n) be Hermitian. Show that det(A) must be a real number.
6.2.2 (Extension of the previous exercise) Let U be a complex vector space
with a positive definite scalar product and T ∈ L(U ). If T is self-adjoint,
then the coefficients of the characteristic polynomial of T are all real.
6.2.3 Let U be a complex vector space with a positive definite scalar product
and S, T ∈ L(U ) self-adjoint and commutative, S ◦ T = T ◦ S.
(i) Prove the identity

‖(S ± iT)(u)‖^2 = ‖S(u)‖^2 + ‖T(u)‖^2,   u ∈ U.   (6.2.10)

(ii) Show that S ± iT is invertible if either S or T is so. However, the


converse is not true.
(This is an extended version of Exercise 4.3.4.)
6.2.4 Let U be a complex vector space with a positive definite scalar product
and V a subspace of U . Show that the mapping P ∈ L(U ) that projects
U onto V along V ⊥ is self-adjoint.
6.2.5 (A strengthened version of the previous exercise) If P ∈ L(U ) is idem-
potent, P 2 = P , show that P projects U onto R(P ) along R(P )⊥ if and
only if P  = P .
6.2.6 We rewrite any A ∈ C(n, n) into the form A = B + iC where
B, C ∈ R(n, n). Show that a necessary and sufficient condition for A
to be unitary is that B t C = C t B and B t B + C t C = In .
6.2.7 Let U = C[0, 1] be the vector space of all complex-valued continuous
functions in the variable t ∈ [0, 1] equipped with the positive definite
scalar product
 1
(u, v) = u(t)v(t) dt, u, v ∈ U. (6.2.11)
0

(i) Show that T(u)(t) = tu(t) for t ∈ [0, 1] defines a self-adjoint mapping over U.
(ii) Show that T does not have an eigenvalue whatsoever.
6.2.8 Let T ∈ L(U ) be self-adjoint where U is a finite-dimensional complex
vector space with a positive definite scalar product.
(i) Show that if T k = 0 for some integer k ≥ 1 then T = 0.


(ii) (A sharpened version of (i)) Given u ∈ U show that if T k (u) = 0
for some integer k ≥ 1 then T (u) = 0.

6.3 Positive definiteness


We now consider the notion of positive definiteness in the complex situation.
Assume that U is a finite-dimensional complex vector space with a positive
definite scalar product (·, ·).

6.3.1 Definition and basic characterization


We start with the definition of the positive definiteness of various subjects of
our interest in the complex situation.

Definition 6.6 The positive definiteness of a quadratic form, a self-adjoint or


Hermitian mapping, or a Hermitian matrix may be defined as follows.
(1) A real-valued quadratic form q over U (hence it is generated from a
sesquilinear Hermitian form) is positive definite if
q(u) > 0,   u ∈ U,   u ≠ 0.   (6.3.1)
(2) A self-adjoint or Hermitian mapping T ∈ L(U) is positive definite if
(u, T(u)) > 0,   u ∈ U,   u ≠ 0.   (6.3.2)
(3) A Hermitian matrix A ∈ C(n, n) is positive definite if
x†Ax > 0,   x ∈ C^n,   x ≠ 0.   (6.3.3)

Let f : U × U → C be a sesquilinear Hermitian form and the quadratic


form q is obtained from f through (6.1.5). Then there is a unique self-adjoint
or Hermitian mapping T ∈ L(U ) such that
q(u) = f (u, u) = (u, T (u)), u ∈ U. (6.3.4)
Thus we see that the positive definiteness of q and that of T are equivalent.

n
Besides, let {u1 , . . . , un } be any basis of U and write u ∈ U as u = xi ui
i=1
where x = (x1 , . . . , xn )t ∈ Cn is the coordinate vector of u. Then, in view of
(6.1.1),
q(u) = x † Ax, A = (f (ui , uj )), (6.3.5)
and the real-valuedness of q is seen to be equivalent to A being Hermitian and,


thus, the positive definiteness of A and that of q are equivalent. Therefore the
positive definiteness of a self-adjoint or Hermitian mapping is central, and it is the focus of this section.
Parallel to Theorem 5.8, we have the following.

Theorem 6.7 That a self-adjoint or Hermitian mapping T ∈ L(U ) is positive


definite is equivalent to any of the following statements.
(1) All the eigenvalues of T are positive.
(2) There is a positive constant, λ0 > 0, such that
(u, T(u)) ≥ λ0 ‖u‖^2,   u ∈ U.   (6.3.6)
(3) There is a positive definite self-adjoint or Hermitian mapping S ∈ L(U )
such that T = S 2 .
(4) The Hermitian matrix A defined by
A = (aij ), aij = (ui , T (uj )), i, j = 1, . . . , n, (6.3.7)
with respect to an arbitrary basis {u1 , . . . , un } of U , is positive definite.
(5) The eigenvalues of the Hermitian matrix A defined in (6.3.7) with respect
to an arbitrary basis are all positive.
(6) The Hermitian matrix A defined in (6.3.7) with respect to an arbitrary
basis enjoys the factorization A = B 2 for some Hermitian positive definite
matrix B ∈ C(n, n).

The proof of Theorem 6.7 is similar to that of Theorem 5.8 and thus left as
an exercise. Here we check only that the matrix A defined in (6.3.7) is indeed
Hermitian. In fact, this may be seen from the self-adjointness of T through
āji = \overline{(uj, T(ui))} = \overline{(T(uj), ui)} = (ui, T(uj)) = aij,   i, j = 1, . . . , n.   (6.3.8)
Note also that if {u1 , . . . , un } is an orthonormal basis of U , then the quanti-
ties aij (i, j = 1, . . . , n) in (6.3.7) simply give rise to the matrix representation
of T with respect to this basis. That is, T(uj) = Σ_{i=1}^{n} aij ui (j = 1, . . . , n).
A matrix version of Theorem 6.7 may be stated as follows.

Theorem 6.8 That a Hermitian matrix A ∈ C(n, n) is positive definite is


equivalent to any of the following statements.
(1) All the eigenvalues of A are positive.
(2) There is a unitary matrix P ∈ C(n, n) and a diagonal matrix D ∈ R(n, n)


whose diagonal entries are all positive such that A = P † DP .
(3) There is a positive definite Hermitian matrix B ∈ C(n, n) such that A=B 2 .
(4) A is Hermitian congruent to the identity matrix. That is, there is a
nonsingular matrix B ∈ C(n, n) such that A = B † In B = B † B.

The proof is similar to that of Theorem 5.9 and left as an exercise.


Positive semi-definiteness and negative definiteness can be defined and
investigated analogously as in the real situation in Section 5.3 and are
skipped here.
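For readers who wish to experiment numerically, the following sketch (not part of the text; it assumes NumPy is available and uses a randomly generated test matrix of our own) checks statements (1), (3), and (4) of Theorem 6.8:

    import numpy as np

    rng = np.random.default_rng(0)
    C = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    A = C.conj().T @ C + np.eye(3)        # a Hermitian positive definite test matrix

    # (1) All eigenvalues of A are positive.
    w, P = np.linalg.eigh(A)              # A = P diag(w) P†, with P unitary
    assert np.all(w > 0)

    # (3) A = B² for a positive definite Hermitian B.
    B = P @ np.diag(np.sqrt(w)) @ P.conj().T
    assert np.allclose(B @ B, A)

    # (4) A is Hermitian congruent to the identity: A = B†B with B nonsingular.
    assert np.allclose(B.conj().T @ B, A)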

6.3.2 Determinant characterization of positive definite


Hermitian matrices
If A ∈ C(n, n) is Hermitian, then det(A) ∈ R. Moreover, if A is positive
definite, Theorem 6.8 says there is a nonsingular matrix B ∈ C(n, n) such that
A = B † B. Thus det(A) = det(B † ) det(B) = | det(B)|2 > 0. Such a property
suggests that it may be possible to extend Theorem 5.10 to Hermitian matrices
as well, as we now do.

Theorem 6.9 Let A = (aij ) ∈ C(n, n) be Hermitian. The matrix A is positive


definite if and only if all its leading principal minors are positive, that is, if and
only if

a11 > 0,   det ⎛ a11  a12 ⎞ > 0,   . . . ,   det ⎛ a11  · · ·  a1n ⎞ > 0.      (6.3.9)
               ⎝ a21  a22 ⎠                      ⎜ · · ·  · · ·  · · · ⎟
                                                 ⎝ an1  · · ·  ann ⎠

Proof The necessity proof is similar to that in Theorem 5.1.11 and omitted.
The sufficiency proof is also similar but needs some adaptation to meet the
delicacy with handling complex numbers. To see how this is done, we may
assume that (6.3.9) holds and we show that A is positive definite by an induc-
tive argument. Again, if n = 1, there is nothing to show, and assume that the
assertion is valid at n−1 (that is, when A ∈ C(n−1, n−1)) (n ≥ 2). It remains
to prove that A is positive definite at n ≥ 2.
As before, we rewrite the Hermitian matrix A in the blocked form

A = ⎛ An−1  α   ⎞ ,      (6.3.10)
    ⎝ α†    ann ⎠

where α = (a1n , . . . , an−1,n )t ∈ Cn−1 . By the inductive assumption at n − 1,


we know that the Hermitian matrix An−1 is positive definite. Hence, applying
Theorem 6.8, we have a nonsingular matrix B ∈ C(n − 1, n − 1) such that


An−1 = B † B. Thus
   
A = ⎛ An−1  α   ⎞ = ⎛ B†  0 ⎞ ⎛ In−1  β   ⎞ ⎛ B  0 ⎞ ,      (6.3.11)
    ⎝ α†    ann ⎠   ⎝ 0   1 ⎠ ⎝ β†    ann ⎠ ⎝ 0  1 ⎠

where β = (B † )−1 α ≡ (b1 , . . . , bn−1 )t ∈ Cn−1 . Since det(A) > 0, we obtain


with some suitable row operations the result

det(A) / det(An−1) = det ⎛ In−1  β   ⎞ = ann − |b1|² − · · · − |bn−1|² > 0,      (6.3.12)
                         ⎝ β†    ann ⎠

with det(B†B) = det(An−1). Taking x ∈ Cn and setting

y = ⎛ B  0 ⎞ x ≡ (y1 , . . . , yn−1 , yn )t ,      (6.3.13)
    ⎝ 0  1 ⎠
we have

x†Ax = y† ⎛ In−1  β   ⎞ y
          ⎝ β†    ann ⎠
     = (ȳ1 + b̄1 ȳn , . . . , ȳn−1 + b̄n−1 ȳn , b1 ȳ1 + · · · + bn−1 ȳn−1 + ann ȳn ) ⎛ y1 ⎞
                                                                                     ⎜ ⋮  ⎟
                                                                                     ⎝ yn ⎠
     = |y1 + b1 yn |² + · · · + |yn−1 + bn−1 yn |²
       + (ann − |b1 |² − · · · − |bn−1 |² ) |yn |² ,      (6.3.14)

which is positive when y ≠ 0, or equivalently x ≠ 0, because of the condition (6.3.12).
Thus the proof is complete.
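The criterion (6.3.9) is easy to test numerically. A brief sketch, assuming NumPy and using a Hermitian test matrix of our own choosing, compares the leading principal minors with the eigenvalue test of Theorem 6.8:

    import numpy as np

    A = np.array([[4.0, 1 + 1j, 0.5],
                  [1 - 1j, 3.0, 1j],
                  [0.5, -1j, 2.0]])        # a Hermitian test matrix (ours)

    # Leading principal minors (Theorem 6.9) versus eigenvalues (Theorem 6.8).
    minors = [np.linalg.det(A[:k, :k]).real for k in range(1, 4)]
    eigenvalues = np.linalg.eigvalsh(A)

    print(minors)         # all positive, so A is positive definite
    print(eigenvalues)    # all positive as well, confirming the equivalence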

6.3.3 Characterization by the Cholesky decomposition


In this subsection, we show that the Cholesky decomposition theorem is also
valid in the complex situation.

Theorem 6.10 Let A = (aij ) ∈ C(n, n) be Hermitian. The matrix A is pos-


itive definite if and only if there is a nonsingular lower triangular matrix
L ∈ C(n, n) such that A = LL† . Moreover, when A is positive definite, there is
a unique such L with the property that all the diagonal entries of L are positive.
Proof Assume A = LL† with L being nonsingular. Then y = L†x ≠ 0 whenever x ≠ 0. Thus we have x†Ax = x†(LL†)x = y†y > 0, which proves
that A is positive definite.
Assume now A is positive definite. We show that there is a unique lower
triangular matrix L = (lij ) ∈ C(n, n) whose diagonal entries are all positive,
l11 , . . . , lnn > 0, so that A = LL† . We again use induction.

When n = 1, we have A = (a11) and the unique choice for L is L = (√a11)
since a11 > 0 in view of Theorem 6.9.
Assume the conclusion is established at n − 1 (n ≥ 2).
We proceed to establish the conclusion at n (n ≥ 2).
Rewrite A in the blocked form (6.3.10). Applying Theorem 6.9, we know
that An−1 ∈ C(n − 1, n − 1) is positive definite. So there is a unique lower
triangular matrix L1 ∈ C(n − 1, n − 1) with positive diagonal entries such that
An−1 = L1 L†1 .
Now consider L ∈ C(n, n) of the form

L = ⎛ L2  0 ⎞ ,      (6.3.15)
    ⎝ γ†  a ⎠

where L2 ∈ C(n − 1, n − 1) is a lower triangular matrix, γ ∈ Cn−1 is a column


vector, and a ∈ R is a suitable number. Then, if we set A = LL† , we obtain

A = ⎛ An−1  α   ⎞ = ⎛ L2  0 ⎞ ⎛ L2†  γ ⎞
    ⎝ α†    ann ⎠   ⎝ γ†  a ⎠ ⎝ 0    a ⎠

  = ⎛ L2 L2†    L2 γ      ⎞ .      (6.3.16)
    ⎝ (L2 γ)†   γ†γ + a²  ⎠

Therefore we arrive at the relations

An−1 = L2 L†2 , α = L2 γ , γ † γ + a 2 = ann . (6.3.17)

Thus, if we require that all the diagonal entries of L2 are positive, then the
inductive assumption gives us L2 = L1 . So the vector γ is also uniquely
determined, γ = L−11 α. Thus it remains only to show that the number a may
be uniquely determined as well. To this end, we need to show in (6.3.17) that

ann − γ † γ > 0. (6.3.18)

In fact, inserting L1 = B† in (6.3.11), we have β = γ . That is,

A = ⎛ An−1  α   ⎞ = ⎛ L1  0 ⎞ ⎛ In−1  γ   ⎞ ⎛ L1†  0 ⎞ .      (6.3.19)
    ⎝ α†    ann ⎠   ⎝ 0   1 ⎠ ⎝ γ†    ann ⎠ ⎝ 0    1 ⎠
Hence (6.3.18) follows as a consequence of det(A) > 0. Thus the third equa-
tion in (6.3.17) leads to the unique determination of the positive number a as
in the real situation:

a = √(ann − γ†γ) > 0.      (6.3.20)

The inductive proof is now complete.
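The inductive construction in the proof translates directly into an algorithm. The sketch below (assuming NumPy; the function name and the test matrix are ours) mirrors (6.3.17) and (6.3.20) and compares the result with numpy.linalg.cholesky:

    import numpy as np

    def cholesky_by_induction(A):
        # Build the lower triangular L with positive diagonal and A = L L†,
        # following the inductive step of Theorem 6.10.
        n = A.shape[0]
        L = np.zeros((n, n), dtype=complex)
        L[0, 0] = np.sqrt(A[0, 0].real)
        for m in range(1, n):
            L1 = L[:m, :m]                      # factor of the leading block
            alpha = A[:m, m]
            gamma = np.linalg.solve(L1, alpha)  # gamma = L1^{-1} alpha, cf. (6.3.17)
            L[m, :m] = gamma.conj()             # the new row is (gamma†, a)
            L[m, m] = np.sqrt((A[m, m] - gamma.conj() @ gamma).real)   # cf. (6.3.20)
        return L

    rng = np.random.default_rng(1)
    C = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = C.conj().T @ C + np.eye(4)              # Hermitian positive definite test matrix

    L = cholesky_by_induction(A)
    assert np.allclose(L @ L.conj().T, A)
    assert np.allclose(L, np.linalg.cholesky(A))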

Exercises

6.3.1 Prove Theorem 6.7.


6.3.2 Prove Theorem 6.8.
6.3.3 Let A ∈ C(n, n) be nonsingular. Prove that there is a unique lower
triangular matrix L ∈ C(n, n) with positive diagonal entries so that
AA† = LL† .
6.3.4 Show that if A1 , . . . , Ak ∈ C(n, n) are unitary matrices, so is their
product A = A1 · · · Ak .
6.3.5 Show that if A ∈ C(n, n) is Hermitian and all its eigenvalues are ±1
then A is unitary.
6.3.6 Assume that u ∈ Cn is a nonzero column vector. Show that there is a
unitary matrix Q ∈ C(n, n) such that

Q† (uu† )Q = diag{u† u, 0, . . . , 0}. (6.3.21)

6.3.7 Let A = (aij ) ∈ C(n, n) be Hermitian. Show that if A is positive


definite then aii > 0 for i = 1, . . . , n and

|aij | < √(aii ajj) ,   i, j = 1, . . . , n,  i ≠ j.      (6.3.22)

6.3.8 Is the Hermitian matrix


A = ⎛ 5      i      2−i ⎞
    ⎜ −i     4      1−i ⎟      (6.3.23)
    ⎝ 2+i    1+i    3   ⎠
positive definite?
6.3.9 Let A = (aij ) ∈ C(n, n) be a positive semi-definite Hermitian matrix.
Show that aii ≥ 0 for i = 1, . . . , n and that

det(A) ≤ a11 · · · ann . (6.3.24)

6.3.10 Let u1 , . . . , un be n column vectors of Cn , with the standard Hermitian


scalar product, which are the n column vectors of a matrix A ∈ C(n, n).
Establish the inequality

| det(A)| ≤ u1  · · · un . (6.3.25)


What is the Hadamard inequality in the context of a complex matrix?


6.3.11 Let A ∈ C(n, n) be nonsingular. Show that there exist a unitary matrix
P ∈ C(n, n) and a positive definite Hermitian matrix B ∈ C(n, n) such
that A = P B. Show also that, if A is real, then the afore-mentioned
matrices P and B may also be chosen to be real so that P is orthogonal
and B positive definite.
6.3.12 Let A ∈ C(n, n) be nonsingular. Since A† A is positive definite, its
eigenvalues, say λ1 , . . . , λn , are all positive. Show that there exist uni-
tary matrices P and Q in C(n, n) such that A = P DQ where
D = diag{√λ1 , . . . , √λn}      (6.3.26)

is a diagonal matrix. Show also that the same conclusion is true for
some orthogonal matrices P and Q in R(n, n) when A is real.
6.3.13 Let U be a finite-dimensional complex vector space with a positive
definite scalar product (·, ·) and T ∈ L(U ) a positive definite mapping.
(i) Establish the generalized Schwarz inequality
|(u, T(v))| ≤ √(u, T(u)) √(v, T(v)),   u, v ∈ U.      (6.3.27)

(ii) Show that the equality in (6.3.27) occurs if and only if u, v are
linearly dependent.

6.4 Commutative self-adjoint mappings and consequences


In this section, we extend our investigation about commutativity of self-adjoint
mappings in the real situation to that in the complex situation. It is not hard to
see that the conclusion in this extended situation is the same as in the real
situation.

Theorem 6.11 Let U be a finite-dimensional complex vector space with a pos-


itive definite scalar product (·, ·) and S, T ∈ L(U ) be self-adjoint or Hermi-
tian. Then there is an orthonormal basis of U consisting of eigenvectors of
both S, T if and only if S and T commute, or S ◦ T = T ◦ S.

The proof of the theorem is the same as that for the real situation and
omitted.
We now use Theorem 6.11 to study a slightly larger class of linear mappings
commonly referred to as normal mappings.
Definition 6.12 Let T ∈ L(U ). We say that T is normal if T and T  commute:


T ◦ T  = T  ◦ T. (6.4.1)

Assume T is normal. Decompose T in the form


T = ½(T + T′) + ½(T − T′) = P + Q,      (6.4.2)

where P = ½(T + T′) ∈ L(U) is self-adjoint or Hermitian and Q = ½(T − T′) ∈ L(U) is anti-self-adjoint or anti-Hermitian since it satisfies Q′ = −Q.
From
(u, iQ(v)) = i(u, Q(v)) = −i(Q(u), v) = (iQ(u), v), u, v ∈ U, (6.4.3)
we see that iQ ∈ L(U ) is self-adjoint or Hermitian. In view of (6.4.1), P
and iQ are commutative. Applying Theorem 6.11, we conclude that U has an
orthonormal basis consisting of eigenvectors of P and iQ simultaneously, say
{u1 , . . . , un }, associated with the corresponding real eigenvalues, {ε1 , . . . , εn }
and {δ1 , . . . , δn }, respectively. Consequently, we have
T (ui ) = (P + Q)(ui ) = (P − i[iQ])(ui ) = (εi + iωi )ui , i = 1, . . . , n,
(6.4.4)
where ωi = −δi (i = 1, . . . , n). This discussion leads us to the following
theorem.

Theorem 6.13 Let T ∈ L(U ). Then T is normal if and only if there is an


orthonormal basis of U consisting of eigenvectors of T .

Proof Suppose that U has an orthonormal basis consisting of eigenvectors


of T , say B = {u1 , . . . , un }, with the corresponding eigenvalues {λ1 , . . . , λn }.
Let T  ∈ L(U ) be represented by the matrix B = (bij ) ∈ C(n, n) with respect
to B. Then
T′(uj) = Σ_{i=1}^{n} bij ui ,   j = 1, . . . , n.      (6.4.5)
Therefore, we have


n

(T (uj ), ui ) = bkj uk , ui = bij , i, j = 1, . . . , n, (6.4.6)
k=1
and
(T  (uj ), ui ) = (uj , T (ui )) = (uj , λi ui ) = λi δij , i, j = 1, . . . , n. (6.4.7)
Combining (6.4.6) and (6.4.7), we find b̄ij = λi δij , that is, bij = λ̄i δij (i, j = 1, . . . , n).
In other words, B is diagonal, B = diag{λ̄1 , . . . , λ̄n }, such that λ̄1 , . . . , λ̄n are the eigenvalues of T′ with the corresponding eigenvectors u1 , . . . , un . In particular,

(T ◦ T′)(ui) = λ̄i λi ui = λi λ̄i ui = (T′ ◦ T)(ui),   i = 1, . . . , n,      (6.4.8)

which proves T ◦ T′ = T′ ◦ T. That is, T is normal.


Conversely, if T is normal, the existence of an orthonormal basis of U con-
sisting of eigenvectors of T is already shown.
The proof is complete.

To end this section, we consider commutative normal mappings.

Theorem 6.14 Let S, T ∈ L(U ) be normal. Then there is an orthonormal


basis of U consisting of eigenvectors of S and T simultaneously if and only if
S and T are commutative, S ◦ T = T ◦ S.

Proof If {u1 , . . . , un } is an orthonormal basis of U so that S(ui ) = γi ui


and T (ui ) = λi ui , γi , λi ∈ C, i = 1, . . . , n, then (S ◦ T )(ui ) = γi λi ui =
(T ◦ S)(ui ), i = 1, . . . , n, which establishes the commutativity of S and T :
S ◦ T = T ◦ S.
Conversely, assume S and T are commutative normal mappings. Let
λ1 , . . . , λk be all the distinct eigenvalues of T and Eλ1 , . . . , Eλk the associated
eigenspaces that may readily be checked to be mutually perpendicular (left
as an exercise in this section). Fix i = 1, . . . , k. For any u ∈ Eλi , we have
T (S(u)) = S(T (u)) = S(λi u) = λi S(u). Thus S(u) ∈ Eλi . That is, Eλi
is invariant under S. Since S is normal, Eλi has an orthonormal basis, say
{ui,1 , . . . , ui,mi }, consisting of the eigenvectors of S. Therefore, the set of
vectors

{u1,1 , . . . , u1,m1 , . . . , uk,1 , . . . , uk,mk } (6.4.9)

is an orthonormal basis of U consisting of the eigenvectors of both S


and T .

An obvious but pretty general example of a normal mapping that is not nec-
essarily self-adjoint or Hermitian is a unitary mapping T ∈ L(U ) since it
satisfies T ◦ T  = T  ◦ T = I . Consequently, for a unitary mapping T ∈ L(U ),
there is an orthonormal basis of U consisting of eigenvectors of T . Further-
more, since T is isometric, it is clear that all eigenvalues of T are of absolute
value 1, as already observed in Section 4.3.
The matrix versions of Definition 6.12 and Theorems 6.13 and 6.14 may be
stated as follows.

Definition 6.15 A matrix A ∈ C(n, n) is said to be normal if it satisfies the


property AA† = A† A.

Theorem 6.16 Normal matrices have the following characteristic properties.

(1) A matrix A ∈ C(n, n) is diagonalizable through a unitary matrix, that is,


there is a unitary matrix P ∈ C(n, n) and a diagonal matrix D ∈ C(n, n)
such that A = P † DP , if and only if A is normal.
(2) Two normal matrices A, B ∈ C(n, n) are simultaneously diagonalizable
through a unitary matrix, that is, there is a unitary matrix P ∈ C(n, n)
and two diagonal matrices D1 , D2 ∈ C(n, n) such that A = P † D1 P ,
B = P † D2 P , if and only if A, B are commutative: AB = BA.

To prove the theorem, we may simply follow the standard way to associate
a matrix A ∈ C(n, n) with the mapping it generates over Cn through x #→ Ax
for x ∈ Cn as before and apply Theorems 6.13 and 6.14.
Moreover, if A ∈ C(n, n) is unitary, AA† = A† A = In , then there is a
unitary matrix P ∈ C(n, n) and a diagonal matrix D = diag{λ1 , . . . , λn } with
|λi | = 1, i = 1, . . . , n, such that A = P † DP .
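For a concrete check of Theorem 6.16 (1), the sketch below (assuming NumPy; the 2 × 2 real matrix is an example of ours) verifies that a normal matrix is unitarily diagonalizable. Here numpy.linalg.eig happens to return orthonormal eigenvectors because the eigenvalues are distinct; in general one would orthonormalize a basis of each eigenspace.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [-2.0, 1.0]])                     # normal: A A† = A† A = 5 I
    assert np.allclose(A @ A.conj().T, A.conj().T @ A)

    w, P = np.linalg.eig(A)                         # eigenvalues 1 ± 2i
    assert np.allclose(P.conj().T @ P, np.eye(2))   # the eigenvector matrix is unitary here
    assert np.allclose(P @ np.diag(w) @ P.conj().T, A)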

Exercises

6.4.1 Let U be a finite-dimensional complex vector space with a positive


definite scalar product (·, ·) and T ∈ L(U ). Show that T is normal if
and only if T satisfies the identity

‖T(u)‖ = ‖T′(u)‖,   u ∈ U.      (6.4.10)

6.4.2 Use the previous exercise and the property (λI − T)′ = λ̄I − T′ (λ ∈ C) to show directly that if T is normal and λ is an eigenvalue of T with an associated eigenvector u ∈ U then λ̄ and u are a pair of eigenvalue and eigenvector of T′.
6.4.3 Let U be a finite-dimensional complex vector space with a positive
definite scalar product (·, ·) and T ∈ L(U ) satisfy the property that, if
λ ∈ C is an eigenvalue of T and u ∈ U an associated eigenvector, then
λ is an eigenvalue and u an associated eigenvector of T  . Show that if
λ, μ ∈ C are two different eigenvalues of T and u, v are the associated
eigenvectors, respectively, then (u, v) = 0.
6.4.4 Assume that A ∈ C(n, n) is triangular and normal. Show that A must
be diagonal.
6.4.5 Let U be a finite-dimensional complex vector space with a positive
definite scalar product and T ∈ L(U ) be normal.
(i) Show that T is self-adjoint if and only if all the eigenvalues of T
are real.
(ii) Show that T is anti-self-adjoint if and only if all the eigenvalues of
T are imaginary.
(iii) Show that T is unitary if and only if all the eigenvalues of T are of
absolute value 1.
6.4.6 Let T ∈ L(U ) be a normal mapping where U is a finite-dimensional
complex vector space with a positive definite scalar product.
(i) Show that if T k = 0 for some integer k ≥ 1 then T = 0.
(ii) (A sharpened version of (i)) Given u ∈ U show that if T k (u) = 0
for some integer k ≥ 1 then T (u) = 0.
(This is an extended version of Exercise 6.2.8.)
6.4.7 If R, S ∈ L(U ) are Hermitian and commutative, show that T = R ± iS
is normal.
6.4.8 Consider the matrix

A = ⎛ 1  i ⎞ .      (6.4.11)
    ⎝ i  1 ⎠

(i) Show that A is not Hermitian nor unitary but normal.


(ii) Find an orthonormal basis of C2 , with the standard Hermitian
scalar product, consisting of the eigenvectors of A.
6.4.9 Let U be a finite-dimensional complex vector space with a positive
definite scalar product and T ∈ L(U ) be normal. Show that for any
integer k ≥ 1 there is a normal element S ∈ L(U ) such that T = S k .
Moreover, if T is unitary, then there is a unitary element S ∈ L(U )
such that T = S k .
6.4.10 Recall that any c ∈ C may be rewritten as c = aeiθ , where a, θ ∈ R
and a ≥ 0, known as a polar decomposition of c. Let U be a finite-
dimensional complex vector space with a positive definite scalar prod-
uct and T ∈ L(U ). Show that T enjoys a similar polar decomposition
property such that there are a positive semi-definite element R and a
unitary element S, both in L(U ), satisfying T = R ◦ S = S ◦ R, if and
only if T is normal.
6.4.11 Let U be a complex vector space with a positive definite scalar product
(·, ·). A mapping T ∈ L(U ) is called hyponormal if it satisfies
(u, (T ◦ T  − T  ◦ T )(u)) ≤ 0, u ∈ U. (6.4.12)
(i) Show that T is normal if any only if T and T  are both hyponormal.
(ii) Show that T being hyponormal is equivalent to
‖T′(u)‖ ≤ ‖T(u)‖,   u ∈ U.      (6.4.13)
(iii) If T is hyponormal, so is T + λI where λ ∈ C.
(iv) Show that if λ ∈ C is an eigenvalue and u an associated eigen-
vector of a hyponormal mapping T ∈ L(U ) then λ and u are an
eigenvalue and an associated eigenvector of T  .
6.4.12 Let U be an n-dimensional (n ≥ 2) complex vector space with a pos-
itive definite scalar product and T ∈ L(U ). Show that T is normal if
and only if there is a complex-coefficient polynomial p(t) of degree at
most n − 1 such that T  = p(T ).
6.4.13 Let U be a finite-dimensional complex vector space with a positive
definite scalar product and T ∈ L(U ). Show that T is normal if and
only if T and T  have the same invariant subspaces of U .

6.5 Mappings between two spaces via self-adjoint mappings


As in the real situation, we show that self-adjoint or Hermitian mappings may
be used to study mappings between two complex vector spaces.
As in Section 5.6, let U and V denote two complex vector spaces of finite
dimensions, with positive definite Hermitian scalar products (·, ·)U and (·, ·)V ,
respectively. Given T ∈ L(U, V ) and v ∈ V , it is clear that
f (u) = (v, T (u))V , u ∈ U, (6.5.1)
defines an element f in U  which depends on v ∈ V . So there is a unique ele-
ment in U depending on v, say T  (v), such that f (u) = (T  (v), u)U . That is,
(v, T (u))V = (T  (v), u)U , u ∈ U, v ∈ V . (6.5.2)
Thus we have obtained a well-defined mapping T  : V → U .
It may easily be verified that T  is linear. Thus T  ∈ L(V , U ). As in the
real situation, we can consider the composed mappings T  ◦ T ∈ L(U ) and
T ◦ T  ∈ L(V ), which are both self-adjoint or Hermitian.
Besides, from the relations


(u, (T  ◦ T )(u))U = (T (u), T (u))V ≥ 0, u ∈ U,
  
(6.5.3)
(v, (T ◦ T )(v))V = (T (v), T (v))U ≥ 0, v ∈ V,

it is seen that T  ◦T ∈ L(U ) and T ◦T  ∈ L(V ) are both positive semi-definite.


Use ‖ · ‖U and ‖ · ‖V to denote the norms induced from (·, ·)U and (·, ·)V , respectively. Then
‖T‖ = sup{‖T(u)‖V | ‖u‖U = 1, u ∈ U}.      (6.5.4)
On the other hand, using the fact that T′ ◦ T ∈ L(U) is self-adjoint and positive semi-definite, we know that there is an orthonormal basis {u1 , . . . , un } of U consisting of eigenvectors of T′ ◦ T , with the corresponding non-negative eigenvalues, σ1 , . . . , σn , respectively. Hence, for any u = Σ_{i=1}^{n} ai ui ∈ U with ‖u‖U² = Σ_{i=1}^{n} |ai|² = 1, we have

‖T(u)‖V² = (T(u), T(u))V = ((T′ ◦ T)(u), u)U = Σ_{i=1}^{n} σi |ai|² ≤ σ0 ,      (6.5.5)
where
σ0 = max_{1≤i≤n} {σi} ≥ 0,      (6.5.6)

which shows ‖T‖ ≤ √σ0 . Moreover, let i = 1, . . . , n be such that σ0 = σi . Then (6.5.4) gives us
‖T‖² ≥ ‖T(ui)‖V² = ((T′ ◦ T)(ui), ui)U = σi = σ0 .      (6.5.7)
Consequently, as in the real situation, we conclude with

‖T‖ = √σ0 ,  where σ0 ≥ 0 is the largest eigenvalue of the mapping T′ ◦ T.      (6.5.8)
Therefore the norm of a linear mapping T ∈ L(U, V ) may be obtained
by computing the largest eigenvalue of the induced self-adjoint or Hermitian
mapping T′ ◦ T ∈ L(U).
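In matrix terms, (6.5.8) says that the norm of A equals the square root of the largest eigenvalue of A†A. The following sketch (assuming NumPy; the random matrix is ours) checks this against the spectral norm computed by numpy.linalg.norm:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

    sigma0 = np.linalg.eigvalsh(A.conj().T @ A).max()    # largest eigenvalue of A†A
    assert np.isclose(np.sqrt(sigma0), np.linalg.norm(A, 2))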
In particular, if T ∈ L(U) is already self-adjoint, then, since ‖T‖ is simply the square root of the largest eigenvalue of T², we arrive at
‖T‖ = max_{1≤i≤n} {|λi|},      (6.5.9)
where λ1 , . . . , λn are the eigenvalues of T .


Thus, a combination of (6.5.8) and (6.5.9) leads to the formula
‖T‖² = ‖T′ ◦ T‖,   T ∈ L(U, V).      (6.5.10)
Analogously, from (6.5.9), we note that, when T ∈ L(U) is self-adjoint, the quantity ‖T‖ may also be evaluated accordingly by
‖T‖ = sup{|(u, T(u))| | u ∈ U, ‖u‖ = 1},      (6.5.11)
as in the real situation.
We can similarly show how to extend (6.5.11) to evaluate the norm of an
arbitrary linear mapping between U and V .

Theorem 6.17 Let U, V be finite-dimensional complex vector spaces with


positive definite Hermitian scalar products (·, ·)U , (·, ·)V , respectively. For
T ∈ L(U, V ), we have
‖T‖ = sup{|(v, T(u))V | | u ∈ U, v ∈ V , ‖u‖U = 1, ‖v‖V = 1}.      (6.5.12)

The proof is identical to that for Theorem 5.14.


As in the real situation, we may use Theorem 6.17 to establish the fact that
the norms of a linear mapping and its dual, over two complex vector spaces
with positive definite Hermitian scalar products, share the same value.

Theorem 6.18 Let U, V be finite-dimensional complex vector spaces with


positive definite Hermitian scalar products (·, ·)U , (·, ·)V , respectively. For
T ∈ L(U, V) and its dual T′ ∈ L(V, U), we have ‖T‖ = ‖T′‖. Thus ‖T′ ◦ T‖ = ‖T ◦ T′‖ and the largest eigenvalues of the positive semi-definite self-adjoint or Hermitian mappings T′ ◦ T ∈ L(U) and T ◦ T′ ∈ L(V) are the same.

Proof The fact that ‖T‖ = ‖T′‖ may be deduced from applying (6.5.12) to T′ and the relation (v, T(u))V = (T′(v), u)U (u ∈ U, v ∈ V). The conclusion ‖T′ ◦ T‖ = ‖T ◦ T′‖ follows from (6.5.10), and that the largest eigenvalues of the positive semi-definite self-adjoint or Hermitian mappings T′ ◦ T and T ◦ T′ are the same is a consequence of the eigenvalue characterization of the norm of a self-adjoint mapping stated in (6.5.9) and ‖T′ ◦ T‖ = ‖T ◦ T′‖.
However, as in the real situation, Theorem 6.18 is natural and hardly sur-
prising in view of Theorem 5.16.
We continue to study a mapping T ∈ L(U, V ), where U and V are two com-
plex vector spaces with positive definite Hermitian product (·, ·)U and (·, ·)V
and of dimensions n and m, respectively. Let σ1 , . . . , σn be all the eigen-


values of the positive semi-definite mapping T  ◦ T ∈ L(U ), among which
σ1 , . . . , σk are positive, say. Use {u1 , . . . , uk , . . . , un } to denote an orthonor-
mal basis of U consisting of eigenvectors of T  ◦ T associated with the eigen-
values σ1 , . . . , σk , . . . , σn . Then we have
(T (ui ), T (uj ))V = (ui , (T  ◦ T )(uj ))U
= σj (ui , uj )U = σj δij , i, j = 1, . . . , n. (6.5.13)
This simple expression indicates that T (ui ) = 0 for i > k (if any) and that
{T (u1 ), . . . , T (uk )} forms an orthogonal basis of R(T ). In particular, k =
r(T ).
Now set
vi = (1/‖T(ui)‖V) T(ui),   i = 1, . . . , k.      (6.5.14)
Then {v1 , . . . , vk } is an orthonormal basis for R(T). Taking i = j = 1, . . . , k in (6.5.13), we see that ‖T(ui)‖V = √σi , i = 1, . . . , k. In view of this and (6.5.14), we arrive at

T(ui) = √σi vi ,   i = 1, . . . , k,   T(uj) = 0,   j > k (if any).      (6.5.15)

In the above construction, the positive numbers √σ1 , . . . , √σk are called
the singular values of T and the expression (6.5.15) the singular value decom-
position for T . This result may conveniently be summarized as a theorem.

Theorem 6.19 Let U, V be finite-dimensional complex vector spaces with


positive definite Hermitian scalar products and T ∈ L(U, V ) is of
rank k ≥ 1. Then there are orthonormal bases {u1 , . . . , uk , . . . , un }
and {v1 , . . . , vk , . . . , vm } of U and V , respectively, and positive numbers
λ1 , . . . , λk , referred to as the singular values of T , such that
T (ui ) = λi vi , i = 1, . . . , k, T (uj ) = 0, j >k (if any). (6.5.16)

In fact the numbers λ21 , . . . , λ2k are all the positive eigenvalues and u1 , . . . , uk
the associated eigenvectors of the self-adjoint mapping T  ◦ T ∈ L(U ).

Let A ∈ C(m, n). Theorem 6.19 implies that there are unitary matrices P ∈
C(n, n) and Q ∈ C(m, m) such that


AP = QΣ or A = QΣP† ,   Σ = ⎛ D  0 ⎞ ∈ R(m, n),   D = diag{λ1 , . . . , λk },      (6.5.17)
                            ⎝ 0  0 ⎠
where k = r(A) and λ1 , . . . , λk are some positive numbers for which


λ21 , . . . , λ2k are all the positive eigenvalues of the Hermitian matrix A† A. The
numbers λ1 , . . . , λk are called the singular values of the matrix A and the
expression (6.5.17) the singular value decomposition for the matrix A.
Note that Exercise 6.3.12 may be regarded as an early and special version of
the general singular value decomposition procedure here.
Of course the results above may also be established similarly for mappings
between real vector spaces and for real matrices.
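Numerically, the decomposition (6.5.17) is what numpy.linalg.svd returns. A brief sketch (with an arbitrary test matrix of ours) relates the singular values to the eigenvalues of A†A:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

    Q, s, Ph = np.linalg.svd(A)            # A = Q Sigma P†, with Ph = P†
    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)
    assert np.allclose(Q @ Sigma @ Ph, A)

    # The squares of the singular values are the positive eigenvalues of A†A.
    eigs = np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1]
    assert np.allclose(eigs[:len(s)], s**2)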

Exercises

6.5.1 Verify that for T ∈ L(U, V ) the mapping T  given in (6.5.2) is a well-
defined element in L(V , U ) although the scalar products (·, ·)U and
(·, ·)V of U and V are both sesquilinear.
6.5.2 Consider the matrix

A = ⎛ 2    1−i    3      ⎞ .      (6.5.18)
    ⎝ 3i   −1     3 + 2i ⎠

(i) Find the eigenvalues of AA† and A† A and compare.


(ii) Find A.
6.5.3 Apply (6.5.8) and use the fact that the nonzero eigenvalues of T  ◦ T
and T ◦ T  are the same to prove directly that T  = T   as stated in
Theorem 6.18.
6.5.4 If U is a finite-dimensional vector space with a positive definite scalar
product and T ∈ L(U) is normal, show that ‖T²‖ = ‖T‖². Can this result be extended to ‖T^m‖ = ‖T‖^m for any positive integer m?
6.5.5 Consider the matrix

A = ⎛ 1+i   2     −1 ⎞ .      (6.5.19)
    ⎝ 2     1−i   1  ⎠

(i) Find the singular values of A.


(ii) Find a singular value decomposition of A.
6.5.6 Let A ∈ C(m, n). Show that A and A† have the same singular values.
6.5.7 Let A ∈ C(n, n) be invertible. Investigate the relationship between the
singular values of A and those of A−1 .
6.5.8 Let A ∈ C(n, n). Use the singular value decomposition for A to show
that A may be rewritten as A = P B = CQ, where P , Q are some uni-
tary matrices and B, C some positive semi-definite Hermitian matrices.
6.5.9 Let U be a finite-dimensional complex vector space with a positive


definite Hermitian scalar product and T ∈ L(U ) positive semi-definite.
Show that the singular values of T are simply the positive eigenvalues
of T .
6.5.10 Show that a square complex matrix is unitary if and only if its singular
values are all 1.

Note that most of the exercises in Chapter 5 may be restated in the context
of the complex situation of this chapter and are omitted.
7
Jordan decomposition

In this chapter we establish the celebrated Jordan decomposition theorem


which allows us to reduce a linear mapping over C into a canonical form in
terms of its eigenspectrum. As a preparation we first recall some facts regard-
ing factorization of polynomials. Then we show how to reduce a linear map-
ping over a set of its invariant subspaces determined by a prime factorization of
the characteristic polynomial of the mapping. Next we reduce a linear mapping
over its generalized eigenspaces. Finally we prove the Jordan decomposition
theorem by understanding how a mapping behaves itself over each of its gen-
eralized eigenspaces.

7.1 Some useful facts about polynomials


Let P be the vector space of all polynomials with coefficients in a given field
F and in the variable t. Various technical computations and concepts involving
elements in P may be simplified considerably with the notion ‘ideal’ as we
now describe.

Definition 7.1 A non-empty subset I ⊂ P is called an ideal of P if it satisfies


the following two conditions.
(1) f + g ∈ I for any f, g ∈ I.
(2) f g ∈ I for any f ∈ P and g ∈ I.

Since F may naturally be viewed as a subset of P, we see that af ∈ I for


any a ∈ F and f ∈ I. Hence an ideal is also a subspace.
Let g1 , . . . , gk ∈ P. Construct the subset of P given by
{f1 g1 + · · · + fk gk | f1 , . . . , fk ∈ P}. (7.1.1)

It is readily checked that the subset defined in (7.1.1) is an ideal of P. We


may say that this deal is generated from g1 , . . . , gk and use the notation
I(g1 , . . . , gk ) to denote it.
There are two trivial ideals: I = {0} and I = P and it is obvious that
{0} = I(0) and P = I(1). That is, both {0} and P are generated from some
single elements in P. The following theorem establishes that any ideal in P
may be generated from a single element in P.

Theorem 7.2 Any ideal in P is singly generated. More precisely, if I = {0}


is an ideal of P, then there is an element g ∈ P such that I = I(g). More-
over, if there is another h ∈ P such that I = I(h), then g and h are of the
same degree. Besides, if the coefficients of the highest-degree terms of g and h
coincide, then g = h.

Proof If I = I(g) for some g ∈ P, it is clear that g will have the lowest
degree among all elements in I in view of the definition of I(g). Such an
observation indicates what to look for in our proof. Indeed, since I = {0}, we
may choose g to be an element in I \ {0} that is of the lowest degree. Then it
is clear that I(g) ⊂ I. For any h ∈ I \ {0}, we claim g|h (i.e. g divides h).
Otherwise, we may rewrite h as
h(t) = q(t)g(t) + r(t) (7.1.2)
for some q, r ∈ P so that the degree of r is lower than that of g. Since g ∈ I,
we have qg ∈ I. Thus r = h − qg ∈ I, which contradicts the definition of g.
Consequently, g|h. So h ∈ I(g) as expected, which proves I ⊂ I(g).
If there is another h ∈ P such that I = I(h), of course g and h must have
the same degree since h|g and g|h. If the coefficients of the highest-degree
terms of g and h coincide, then g = h, otherwise g − h ∈ I \ {0} would be of
a lower degree, which contradicts the choice of g given earlier.
The theorem is proved.
Let g1 , . . . , gk ∈ P \ {0}. Choose g ∈ P such that I(g) = I(g1 , . . . , gk ).
Then there are elements f1 , . . . , fk ∈ P such that
g = f1 g1 + · · · + fk gk , (7.1.3)
which implies that g contains all common divisors of g1 , . . . , gk . In other
words, if h|g1 , . . . , h|gk , then h|g. On the other hand, the definition of
I(g1 , . . . , gk ) already gives us g|g1 , . . . , g|gk . So g itself is a common
divisor of g1 , . . . , gk . In view of Theorem 7.2, we see that the coefficient of the
highest-degree term of g determines g completely. Thus, we may fix g by tak-
ing the coefficient of the highest-degree term of g to be 1. Such a polynomial
g is referred to as the greatest common divisor of g1 , . . . , gk and often denoted


as g = gcd(g1 , . . . , gk ). Therefore, we see that the notion of an ideal and its
generator provides an effective tool for the study of greatest common divisors.
Given g1 , . . . , gk ∈ P \ {0}, if there does not exist a common divisor of a
nontrivial degree (≥ 1) for g1 , . . . , gk , then we say that g1 , . . . , gk are rela-
tively prime or co-prime. Thus g1 , . . . , gk are relatively prime if and only if
gcd(g1 , . . . , gk ) = 1. In this situation, there are elements f1 , . . . , fk ∈ P such
that the identity

f1 (t)g1 (t) + · · · + fk (t)gk (t) = 1 (7.1.4)

is valid for arbitrary t. This fact will be our starting point in the subsequent
development.
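For polynomials with rational coefficients, the greatest common divisor together with the coefficient polynomials in (7.1.4) can be computed by the extended Euclidean algorithm. A small sketch using SymPy (assuming sympy.gcdex behaves as documented; the two polynomials are an example of ours):

    from sympy import symbols, gcdex, expand

    t = symbols('t')
    g1 = (t - 1)**2 * (t + 2)
    g2 = (t - 1) * (t + 3)

    f1, f2, g = gcdex(g1, g2, t)           # f1*g1 + f2*g2 = g = gcd(g1, g2)
    print(g)                               # t - 1
    print(expand(f1*g1 + f2*g2 - g))       # 0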
A polynomial p ∈ P of degree at least 1 is called a prime polynomial or an
irreducible polynomial if p cannot be factored into the product of two polyno-
mials in P of degrees at least 1. Two polynomials are said to be equivalent if
one is a scalar multiple of the other.

Exercises

7.1.1 For f, g ∈ P show that f |g if and only if I(f ) ⊃ I(g).


7.1.2 Consider I ⊂ P over a field F given by

I = {f ∈ P | f (a1 ) = · · · = f (ak ) = 0}, (7.1.5)

where a1 , . . . , ak ∈ F are distinct.


(i) Show that I is an ideal.
(ii) Find a concrete element g ∈ P such that I = I(g).
(iii) To what extent, is the element g obtained in (ii) unique?
7.1.3 Show that if f ∈ P and (t − 1)|f (t n ), where n ≥ 1 is an integer, then
(t n − 1)|f (t n ).
7.1.4 Let f, g ∈ P, g = 0, and n ≥ 1 be an integer. Prove that f |g if and only
if f n |g n .
7.1.5 Let f, g ∈ P be nonzero polynomials and n ≥ 1 an integer. Show that

(gcd(f, g))n = gcd(f n , g n ). (7.1.6)

7.1.6 Let U be a finite-dimensional vector space over a field F and P the vec-
tor space of all polynomials over F. Show that I = {p ∈ P | p(T ) = 0}
is an ideal of P. Let g ∈ P be such that I = I(g) and the coefficient of
the highest-degree term of g is 1. Is g the minimal polynomial of T that
has the minimum degree among all the polynomials that annihilate T ?
7.2 Invariant subspaces of linear mappings


In this section, we show how to use characteristic polynomials and their prime
factorization to resolve or reduce linear mappings into mappings over invari-
ant subspaces. As a preparation, we first establish a factorization theorem for
characteristic polynomials.

Theorem 7.3 Let U be a finite-dimensional vector space over a field F and


T ∈ L(U ). Assume that V , W are nontrivial subspaces of U such that
U = V ⊕ W and V , W are invariant under T . Use R, S to denote T
restricted to V , W , and pR (λ), pS (λ), pT (λ) the characteristic polynomials of
R ∈ L(V ), S ∈ L(W ), T ∈ L(U ), respectively. Then pT (λ) = pR (λ)pS (λ).

Proof Let {v1 , . . . , vk } and {w1 , . . . , wl } be bases of V , W , respectively.


Then {v1 , . . . , vk , w1 , . . . , wl } is a basis of U . Assume that B ∈ F(k, k) and
C ∈ F(l, l) are the matrix representations of R and S, with respect to the bases
{v1 , . . . , vk } and {w1 , . . . , wl }, respectively. Then

B 0
A= (7.2.1)
0 C
is the matrix representation of T with respect to the basis
{v1 , . . . , vk , w1 , . . . , wl }. Consequently, we have

λIk − B 0
pT (λ) = det(λI − A) = det
0 λIl − C
= det(λIk − B) det(λIl − C) = pR (λ)pS (λ), (7.2.2)
as asserted, and the theorem is proved.
We can now demonstrate how to use the factorization of the characteristic
polynomial of a linear mapping to naturally resolve it over invariant subspaces.

Theorem 7.4 Let U be a finite-dimensional vector space over a field F and


T ∈ L(U ) and use pT (λ) to denote the characteristic polynomial of T . Factor
pT (λ) into the form
pT = p1^{n1} · · · pk^{nk} ,      (7.2.3)
where p1 , . . . , pk are nonequivalent prime polynomials in P, and set
gi = pT / pi^{ni} = p1^{n1} · · · p̂i^{ni} · · · pk^{nk} ,   i = 1, . . . , k,      (7.2.4)
where ˆ denotes the factor that is missing. Then we have the following.
(1) The vector space U has the direct decomposition

U = V1 ⊕ · · · ⊕ Vk ,   Vi = N(pi^{ni}(T)),   i = 1, . . . , k.      (7.2.5)

(2) T is invariant over each Vi , i = 1, . . . , k.

Proof Since pT (λ) is the characteristic polynomial, the Cayley–Hamilton


theorem gives us pT (T ) = 0. Since g1 , . . . , gk are relatively prime, there are
polynomials f1 , . . . , fk ∈ P such that
f1 (λ)g1 (λ) + · · · + fk (λ)gk (λ) = 1. (7.2.6)

Therefore, we have
f1 (T )g1 (T ) + · · · + fk (T )gk (T ) = I. (7.2.7)

Thus, given u ∈ U , we can rewrite u as


u = u1 + · · · + uk , ui = fi (T )gi (T )u, i = 1, . . . , k. (7.2.8)

Now since pi^{ni}(T)ui = pi^{ni}(T)fi(T)gi(T)u = fi(T)pT(T)u = 0, we get ui ∈ N(pi^{ni}(T)) (i = 1, . . . , k), which proves U = V1 + · · · + Vk .
For any i = 1, . . . , k, we need to show
Wi ≡ Vi ∩ ( Σ_{1≤j≤k, j≠i} Vj ) = {0}.      (7.2.9)

In fact, since pi^{ni} and gi are relatively prime, there are polynomials qi and ri in P such that

qi pi^{ni} + ri gi = 1.      (7.2.10)
This gives us the relation

qi(T)pi^{ni}(T) + ri(T)gi(T) = I.      (7.2.11)


Note also that the definition of gi indicates that
Σ_{1≤j≤k, j≠i} Vj ⊂ N(gi(T)).      (7.2.12)

Thus, let u ∈ Wi . Then, applying (7.2.11) to u and using (7.2.12), we find

u = qi(T)pi^{ni}(T)u + ri(T)gi(T)u = 0.      (7.2.13)


Therefore (1) is established.
Let u ∈ Vi . Then pi^{ni}(T)(T(u)) = T(pi^{ni}(T)(u)) = 0. Thus T(u) ∈ Vi and
the invariance of Vi under T is proved, which establishes (2).
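A small SymPy sketch of Theorem 7.4 (the test matrix and the helper function are ours): factor the characteristic polynomial, form the null spaces N(pi^{ni}(T)), and check that their dimensions add up to dim(U).

    from sympy import Matrix, symbols, factor_list, eye, zeros

    lam = symbols('lam')
    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 3]])

    def poly_at_matrix(q, x, M):
        # Evaluate the polynomial q (in the symbol x) at the square matrix M (Horner's rule).
        R = zeros(M.rows, M.cols)
        for c in q.as_poly(x).all_coeffs():
            R = R * M + c * eye(M.rows)
        return R

    p = A.charpoly(lam).as_expr()               # here (lam - 2)**2 * (lam - 3)
    total = 0
    for q, n in factor_list(p)[1]:              # prime factors with multiplicities
        Vi_basis = poly_at_matrix(q**n, lam, A).nullspace()   # basis of N(p_i(A)^{n_i})
        total += len(Vi_basis)
    assert total == A.rows                      # dim V_1 + ... + dim V_k = dim U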
Exercises

7.2.1 Let S, T ∈ L(U ), where U is a finite-dimensional vector space over a


field F. Show that if the characteristic polynomials pS (λ) and pT (λ) of
S and T are relatively prime then pS (T ) and pT (S) are both invertible.
7.2.2 Let S, T ∈ L(U ) where U is a finite-dimensional vector space over
a field F. Use the previous exercise to show that if the characteristic
polynomials of S and T are relatively prime and R ∈ L(U ) satisfies
R ◦ S = T ◦ R then R = 0.
7.2.3 Let U be a finite-dimensional vector space over a field F and T ∈ L(U ).
Show that T is idempotent, T 2 = T , if and only if

r(T ) + r(I − T ) = dim(U ). (7.2.14)

7.2.4 Let U be a finite-dimensional vector space over a field F and T ∈ L(U ).


Prove the following slightly extended version of Theorem 7.4.
Suppose that the characteristic polynomial of T , say pT (λ), has the
factorization pT (λ) = g1 (λ)g2 (λ) over F where g1 and g2 are rela-
tively prime polynomials. Then U = N (g1 (T )) ⊕ N (g2 (T )) and both
N(g1 (T )) and N (g2 (T )) are invariant under T .
7.2.5 Let u ∈ Cn be a nonzero column vector and set A = uu† ∈ C(n, n).
(i) Show that A2 = aA, where a = u† u.
(ii) Find a nonsingular matrix B ∈ C(n, n) so that A = B −1 DB where
D ∈ C(n, n) is diagonal and determine D.
(iii) Describe

N (A) = {x ∈ Cn | Ax = 0},
(7.2.15)
N (A − aIn ) = {x ∈ Cn | (A − aIn )x = 0},

as two invariant subspaces of the mapping T ∈ L(Cn ) given by


x #→ Ax (x ∈ Cn ) over which T reduces.
(iv) Determine r(T ).
7.2.6 Let U be a finite-dimensional vector space and T1 , . . . , Tk ∈ L(U ) sat-
isfy Ti2 = Ti (i = 1, . . . , k) and Ti ◦ Tj = 0 (i, j = 1, . . . , k, i = j , if
any). Show that there holds the space decomposition

U = R(T1 ) ⊕ · · · ⊕ R(Tk ) ⊕ V , V = ∩ki=1 N (Ti ), (7.2.16)

which reduces T1 , . . . , Tk simultaneously.


7.3 Generalized eigenspaces as invariant subspaces


In the first part of this section, we will carry out a study of nilpotent mappings,
which will be crucial for the understanding of the structure of a general linear
mapping in terms of its eigenvalues, to be seen in the second part of the section.

7.3.1 Reducibility of nilpotent mappings


Let U be a finite-dimensional vector space over a field F and T ∈ L(U ) a
nilpotent mapping. Recall that T is said to be of degree m ≥ 1 if m is the
smallest integer such that T m = 0. When m = 1, then T = 0 and the situation
is trivial. In the nontrivial case, m ≥ 2, we know that if u ∈ U is of period
m (that is, T m (u) = 0 but T m−1 (u) = 0), then u, T (u), . . . , T m−1 (u) are
linearly independent vectors in U . Thus, in any nontrivial situation, m satisfies
2 ≤ m ≤ dim(U ).
For a nontrivial nilpotent mapping, we have the following key results.

Theorem 7.5 Let T ∈ L(U ) be a nilpotent mapping of degree m ≥ 2 where


U is finite-dimensional. Then the following are valid.

(1) There are k vectors u1 , . . . , uk and k integers m1 ≥ 2, . . . , mk ≥ 2 such


that u1 , . . . , uk are of periods m1 , . . . , mk , respectively, such that U has
a basis of the form

u01 , . . . , u0k0 , u1 , T (u1 ), . . . , T m1 −1 (u1 ), . . . , uk , T (uk ), . . . , T mk −1 (uk ),


(7.3.1)
where u1 , . . . , uk0 , if any, are some vectors taken from N (T ). Thus, setting

U0 = Span{u01 , . . . , u0k0 },
(7.3.2)
Ui = Span{ui , T (ui ), . . . , T mi −1 (ui )}, i = 1, . . . , k,
we have the decomposition

U = U0 ⊕ U1 ⊕ · · · ⊕ Uk , (7.3.3)

and that U0 , U1 , . . . , Uk are invariant under T .


(2) The degree m of T is given by

m = max{mi | i = 1, . . . , k}. (7.3.4)

(3) The sum of the integers k0 and k is the nullity of T :

k0 + k = n(T ). (7.3.5)
Proof (1) We proceed inductively with m ≥ 2.


(i) m = 2.
Let {v1 , . . . , vk } be a basis of R(T ). Choose u1 , . . . , uk ∈ U such that
T (u1 ) = v1 , . . . , T (uk ) = vk . We assert that
u1 , T (u1 ), . . . , uk , T (uk ) (7.3.6)
are linearly independent vectors in U . To see this, consider
a1 u1 + b1 T (u1 ) + · · · + ak uk + bk T (uk ) = 0, ai , bi ∈ F, i = 1, . . . , k.
(7.3.7)
Applying T to (7.3.7), we obtain a1 v1 + · · · + ak vk = 0. Thus a1 = · · · =
ak = 0. Inserting this result into (7.3.7), we obtain b1 v1 + · · · + bk vk = 0
which leads to b1 = · · · = bk = 0.
It is clear that v1 , . . . , vk ∈ N (T ) since T 2 = 0. Choose u01 , . . . , u0k0 ∈
N(T ), if any, so that {u01 , . . . , u0k0 , v1 , . . . , vk } becomes a basis of N (T ).
For any u ∈ U , we rewrite T (u) as
T (u) = a1 v1 + · · · + ak vk = T (a1 u1 + · · · + ak uk ), (7.3.8)
where a1 , . . . , ak ∈ F are uniquely determined. Therefore we conclude
that u − (a1 u1 + · · · + ak uk) ∈ N(T) and that there are unique scalars c1 , . . . , c_{k0} , b1 , . . . , bk such that
u − (a1 u1 + · · · + ak uk) = c1 u^0_1 + · · · + c_{k0} u^0_{k0} + b1 v1 + · · · + bk vk .      (7.3.9)
Consequently we see that
u01 , . . . , u0k0 , u1 , T (u1 ), . . . , uk , T (uk ) (7.3.10)
form a basis of U , as described.
(ii) m ≥ 3.
We assume the statement in (1) holds when the degree of a nilpotent map-
ping is up to m − 1 ≥ 2.
Let T ∈ L(U ) be of degree m ≥ 3 and set V = R(T ). Since V is invariant
under T , we may regard T as an element in L(V ). To avoid confusion, we use
TV to denote the restriction of T over V .
It is clear that the degree of TV is m − 1 ≥ 2. Applying the inductive
assumption to TV , we see that there are vectors v1 , . . . , vl ∈ V with respective
periods m1 ≥ 2, . . . , ml ≥ 2 and vectors v10 , . . . , vl00 ∈ N (TV ), if any, such
that
v^0_1 , . . . , v^0_{l0} , v1 , TV(v1), . . . , TV^{m1−1}(v1), . . . , vl , TV(vl), . . . , TV^{ml−1}(vl)      (7.3.11)
form a basis of V .
Since V = R(T ), we can find some w1 , . . . , wl0 , u1 , . . . , ul ∈ U such that

T (w1 ) = v10 , . . . , T (wl0 ) = vl00 , T (u1 ) = v1 , . . . , T (ul ) = vl . (7.3.12)

Hinted by (i), we assert that


 
w1 , v10 . . . , wl0 , vl00 , u1 , T (u1 ), . . . , T m1 (u1 ), . . . , ul , T (ul ), . . . , T ml (ul )
(7.3.13)
are linearly independent. In fact, consider the relation
Σ_{i=1}^{l0} (ai wi + bi v^0_i) + Σ_{i=1}^{l} Σ_{j=0}^{mi} bij T^j(ui) = 0,      (7.3.14)

where ai s, bi s, and bij s are scalars. Applying T to (7.3.14) and using (7.3.12)
we arrive at


Σ_{i=1}^{l0} ai v^0_i + Σ_{i=1}^{l} Σ_{j=0}^{mi−1} bij TV^j(vi) = 0,      (7.3.15)

which results in the conclusion

ai = 0, i = 1, . . . , l0 ; bij = 0, j = 0, 1, . . . , mi − 1, i = 1, . . . , l.
(7.3.16)
Substituting (7.3.16) into (7.3.14) we have
b1 v^0_1 + · · · + b_{l0} v^0_{l0} + b_{1m1} T^{m1}(u1) + · · · + b_{lml} T^{ml}(ul) = 0.      (7.3.17)

In other words, we get


b1 v^0_1 + · · · + b_{l0} v^0_{l0} + b_{1m1} TV^{m1−1}(v1) + · · · + b_{lml} TV^{ml−1}(vl) = 0.      (7.3.18)

Since the vectors given in (7.3.11) are linearly independent, we find

b1 = · · · = bl0 = b1m1 = · · · = blml = 0 (7.3.19)

as well, which establishes that the vectors given in (7.3.13) are linearly inde-
pendent.
It is clear that
 
v10 , . . . , vl00 , T m1 (u1 ), . . . , T ml (ul ) ∈ N (T ). (7.3.20)

Find u01 , . . . , u0k0 ∈ N (T ), if any, such that


 
u01 , . . . , u0k0 , v10 , . . . , vl00 , T m1 (u1 ), . . . , T ml (ul ) (7.3.21)

form a basis of N (T ).
Take any u ∈ U . We know that T (u) ∈ V implies that




T(u) = Σ_{i=1}^{l0} bi v^0_i + Σ_{i=1}^{l} Σ_{j=0}^{mi−1} bij TV^j(vi)

     = Σ_{i=1}^{l0} bi T(wi) + Σ_{i=1}^{l} Σ_{j=1}^{mi} bij T^j(ui)      (7.3.22)

for some unique bi s and bij s in F, which gives rise to the result


u − Σ_{i=1}^{l0} bi wi − Σ_{i=1}^{l} Σ_{j=0}^{mi−1} bij T^j(ui) ∈ N(T).      (7.3.23)

Consequently, we see that




u − Σ_{i=1}^{l0} bi wi − Σ_{i=1}^{l} Σ_{j=0}^{mi−1} bij T^j(ui) = Σ_{i=1}^{k0} ai u^0_i + Σ_{i=1}^{l0} ci v^0_i + Σ_{i=1}^{l} di T^{mi}(ui)      (7.3.24)

for some unique ai s, ci s, and di s in F. Thus we conclude that the vectors

u01 , . . . , u0k0 , w1 , T (w1 ), . . . , wl0 , T (wl0 ),


 (7.3.25)
ui , T (ui ), . . . , T mi (ui ), i = 1, . . . , l,

form a basis of U so that u01 , . . . , u0k0 ∈ N (T ) (if any), w1 , . . . , wl0 are of


period 2, and u1 , . . . , ul of periods m1 + 1, . . . , ml + 1, respectively.
Thus (1) is proved.
Statement (2) is obvious.
In (7.3.25), we have seen that k = l0 + l. Hence, from (7.3.21), we have
n(T ) = k0 + k as anticipated, which proves (3).
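A minimal numerical illustration of the cyclic structure just established (the shift matrix below is an example of ours): for a nilpotent mapping of degree m and a vector u of period m, the vectors u, T(u), . . . , T^{m−1}(u) are linearly independent.

    import numpy as np

    S = np.array([[0., 0., 0.],
                  [1., 0., 0.],
                  [0., 1., 0.]])                  # a shift matrix: nilpotent of degree 3

    assert np.allclose(np.linalg.matrix_power(S, 3), 0)
    assert not np.allclose(np.linalg.matrix_power(S, 2), 0)

    u = np.array([1., 0., 0.])                    # a vector of period 3
    cyclic = np.column_stack([u, S @ u, S @ S @ u])
    assert np.linalg.matrix_rank(cyclic) == 3     # u, S(u), S²(u) are linearly independent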

We are now prepared to consider general linear mappings over complex vec-
tor spaces.

7.3.2 Reducibility of a linear mapping via generalized eigenspaces


We now assume that the field we work on is C. In this situation, any prime
polynomial must be of degree 1 which allows us to make the statements in
Theorem 7.4 concrete and explicit.
Theorem 7.6 Let U be a complex n-dimensional vector space and T ∈ L(U ).


For the characteristic polynomial pT (λ) of T , let λ1 , . . . , λk be the distinct
roots of pT (λ) so that

pT (λ) = (λ − λ1 )n1 · · · (λ − λk )nk , n1 , . . . , nk ∈ N. (7.3.26)

Then the following statements are valid.

(1) The vector space U has the decomposition

U = V1 ⊕ · · · ⊕ Vk , Vi = N ((T − λi I )ni ), i = 1, . . . , k. (7.3.27)

(2) Each Vi is invariant under T and T − λi I is nilpotent over Vi , i =


1, . . . , k.
(3) The characteristic polynomial of T restricted to Vi is simply

pVi (λ) = (λ − λi )ni , (7.3.28)

and the dimension of Vi is ni , i = 1, . . . , k.

Proof In view of Theorem 7.4, it remains only to establish (3).


From Theorem 7.2, we have the factorization

pT (λ) = pV1 (λ) · · · pVk (λ), (7.3.29)

where pVi (λ) is the characteristic polynomial of T restricted to Vi (i =


1, . . . , k).
For fixed i, we consider T over Vi . Since T − λi I is nilpotent on Vi , we can
find vectors u01 , . . . , u0m0 ∈ N (T − λi I ), if any, and cyclic vectors u1 , . . . , ul
of respective periods m1 ≥ 2, . . . , ml ≥ 2, if any, so that


u^0_1 , . . . , u^0_{m0} ,
u1 , (T − λi I)(u1), . . . , (T − λi I)^{m1−1}(u1),
. . . . . . . . . . . . . . .
ul , (T − λi I)(ul), . . . , (T − λi I)^{ml−1}(ul),      (7.3.30)

form a basis for Vi , in view of Theorem 7.5. In particular, we have

di ≡ dim(Vi ) = m0 + m1 + · · · + ml . (7.3.31)
With respect to such a basis, the matrix of T − λi I is seen to take the following
boxed diagonal form
⎛ ⎞
0 0 ··· 0
⎜ ⎟
⎜ 0 S1 · · · 0 ⎟
⎜ ⎟
⎜ ⎟, (7.3.32)
⎜ 0 · · · ... 0 ⎟
⎝ ⎠
0 ··· 0 Sl

where, in the diagonal of the above matrix, 0 is the zero matrix of size m0 × m0
and S1 , . . . , Sl are shift matrices of sizes m1 × m1 , . . . , ml × ml , respectively.
Therefore the matrix of T = (T − λi I ) + λi I with respect to the same basis is
simply
⎛ ⎞
0 0 ··· 0
⎜ ⎟
⎜ 0 S1 · · · 0 ⎟
⎜ ⎟
Ai = ⎜ ⎟ + λi Idi . (7.3.33)
⎜ 0 · · · ... 0 ⎟
⎝ ⎠
0 ··· 0 Sl

Consequently, it follows immediately that the characteristic polynomial of T


restricted to Vi may be computed by

pVi (λ) = det(λIdi − Ai ) = (λ − λi )di . (7.3.34)

Finally, inserting (7.3.34) into (7.3.29), we obtain

pT (λ) = (λ − λ1 )d1 · · · (λ − λk )dk . (7.3.35)

Comparing (7.3.35) with (7.3.26), we arrive at di = ni , i = 1, . . . , k, as


asserted.

Note that the factorization expressed in (7.3.26) indicates that each eigen-
value, λi , repeats itself ni times as a root in the characteristic polynomial of
T . For this reason, the integer ni is called the algebraic multiplicity of the
eigenvalue λi of T .
Given T ∈ L(U ), let λi be an eigenvalue of T . For any integer m ≥ 1, the
nonzero vectors in N ((T − λi I )m ) are called the generalized eigenvectors and
N((T − λi I )m ) the generalized eigenspace associated to the eigenvalue λi . If
m = 1, generalized eigenvectors and eigenspace are simply eigenvectors and
eigenspace, respectively, associated to the eigenvalue λi of T , and

n(T − λi I ) = dim(N (T − λi I )) (7.3.36)


is the geometric multiplicity of the eigenvalue λi . Since N (T −λi I ) ⊂ N ((T −


λi )ni ), we have

n(T − λi I ) ≤ dim(N (T − λi )ni ) = ni . (7.3.37)

In other words, the geometric multiplicity is less than or equal to the algebraic
multiplicity, of any eigenvalue, of a linear mapping.
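For instance (a sketch with a 2 × 2 example of ours, assuming NumPy), the matrix below has the single eigenvalue 2 with algebraic multiplicity 2 but geometric multiplicity 1, while the generalized eigenspace N((A − 2I)²) already has dimension 2:

    import numpy as np

    A = np.array([[2., 1.],
                  [0., 2.]])
    I = np.eye(2)

    geometric = 2 - np.linalg.matrix_rank(A - 2 * I)                    # dim N(A - 2I) = 1
    generalized = 2 - np.linalg.matrix_rank((A - 2 * I) @ (A - 2 * I))  # dim N((A - 2I)²) = 2
    print(geometric, generalized)                                       # prints: 1 2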

Exercises

7.3.1 Let U be a finite-dimensional vector space over a field F and T ∈ L(U ).


Assume that λ0 ∈ F is an eigenvalue of T and consider the eigenspace
Eλ0 = N(T −λ0 I ) associated with λ0 . Let {u1 , . . . , uk } be a basis of Eλ0
and extend it to obtain a basis of U , say B = {u1 , . . . , uk , v1 , . . . , vl }.
Show that, using the matrix representation of T with respect to the
basis B, the characteristic polynomial of T may be shown to take the
form

pT (λ) = (λ − λ0 )k q(λ), (7.3.38)

where q(λ) is a polynomial of degree l with coefficients in F. In particu-


lar, use (7.3.38) to infer again, without relying on Theorem 7.6, that the
geometric multiplicity does not exceed the algebraic multiplicity, of the
eigenvalue λ0 .
7.3.2 Let U be a finite-dimensional vector space over a field F and T ∈ L(U ).
We say that U is a cyclic vector space with respect to T if there is a
vector u ∈ U such that the vectors

u, T (u), . . . , T n−1 (u), (7.3.39)

form a basis for U , and we call the vector u a cyclic vector of T .


(i) Find the matrix representation of T with respect to the basis

{T n−1 (u), . . . , T (u), u} (7.3.40)

with specifying

T (T n−1 (u)) = an−1 T n−1 (u) + · · · + a1 T (u) + a0 u,


a0 , a1 , . . . , an−1 ∈ F. (7.3.41)

Note that T is nilpotent of degree n only when a0 = a1 = · · · =


an−1 = 0.
(ii) Find the characteristic polynomial and the minimal polynomial of
T in terms of a0 , a1 , . . . , an−1 .
7.3.3 Let U be an n-dimensional vector space (n ≥ 2) over a field F and


T ∈ L(U ). Assume that T has a cyclic vector. Show that S ∈ L(U )
commutes with T if and only S = p(T ) for some polynomial p(t) of
degree at most n − 1 and with coefficients in F.
7.3.4 Let U be an n-dimensional complex vector space with a positive definite
scalar product and T ∈ L(U ) a normal mapping.
(i) Show that if T has a cyclic vector then T has n distinct eigenvalues.
(ii) Assume that T has n distinct eigenvalues and u1 , . . . , un are the
associated eigenvectors. Show that
u = a1 u1 + · · · + an un (7.3.42)
is a cyclic vector of T if and only if ai = 0 for any i = 1, . . . , n.
7.3.5 Let U be an n-dimensional vector space over a field F and T ∈ L(U )
a degree n nilpotent mapping, n ≥ 2. Show that there is no S ∈ L(U )
such that S 2 = T .

7.4 Jordan decomposition theorem


Let T ∈ L(U ) where U is an n-dimensional vector space over C and
λ1 , . . . , λk all the distinct eigenvalues of T , of respective algebraic multiplici-
ties n1 , . . . , nk , so that the characteristic polynomial of T assumes the form
pT(λ) = (λ − λ1)^{n1} · · · (λ − λk)^{nk} .      (7.4.1)
For each i = 1, . . . , k, use Vi to denote the generalized eigenspace associated
with the eigenvalue λi :
Vi = N ((T − λi I )ni ). (7.4.2)
Then we have seen the following.
(1) Vi is invariant under T .
(2) There are eigenvectors u0i,1 , . . . , u0i,l 0 of T , if any, associated with the
i
eigenvalue λi , and cyclic vectors ui,1 , . . . , ui,li of respective periods
mi,1 ≥ 2, . . . , mi,li ≥ 2, if any, relative to T − λi I , such that Vi has a
basis, denoted by Bi , consisting of vectors


⎪ u0i,1 , . . . , u0i,l 0 ,




i

ui,1 , (T − λi I )(ui,1 ), . . . , (T − λi )mi,1 −1 (ui,1 ),


(7.4.3)

⎪ ············



⎩ u , (T − λ I )(u ), . . . , (T − λ )mi,li −1 (u ).
i,li i i,li i i,li
(3) T − λi I is nilpotent of degree mi on Vi where

mi = max{1, mi,1 , . . . , mi,li }. (7.4.4)

Since (T − λi I )ni is null over Vi , we have mi ≤ ni . Therefore, applying T to


these vectors, we have


⎪ T (u0i,s ) = λi u0i,s , s = 1, . . . , li0 ,



⎪ T (ui,1 ) = (T − λi I )(ui,1 ) + λi ui,1 ,





⎪ T ((T − λi I )(ui,1 )) = (T − λi I )2 (ui,1 )



⎪ +λi (T − λi I )(ui,1 ),





⎪ ······ ··· ······



⎨ T ((T − λ )mi,1 −1 (u )) =
i i,1 λi (T − λi )mi,1 −1 (ui,1 ),
(7.4.5)

⎪ ······ ··· ······





⎪ T (ui,li ) = (T − λi I )(ui,li ) + λi ui,li ,



⎪ T ((T − λi I )(ui,li )) = (T − λi I )2 (ui,li )





⎪ +λi (T − λi I )(ui,li ),





⎪ ······ ··· ······

⎩ mi,li −1
T ((T − λi ) (ui,li )) = λi (T − λi )mi,li −1 (ui,li ).

From (7.4.5), we see that, as an element in L(Vi ), the matrix representation of


T with respect to the basis (7.4.3) is
⎛ ⎞
Ji,0 0 ··· 0
⎜ ⎟
⎜ 0 Ji,1 · · · 0 ⎟
⎜ ⎟
Ji ≡ ⎜ ⎟, (7.4.6)
⎜ 0 ..
. 0 ⎟
⎝ 0 ⎠
0 ··· 0 Ji,li

where Ji,0 = λi Il 0 and


i

⎛ ⎞
λi 0 ··· ··· 0
⎜ ··· 0 ⎟
⎜ 1 λi 0 ⎟
⎜ ⎟
⎜ .. .. .. .. ⎟
Ji,s =⎜
⎜ . . . . 0 ⎟⎟ (7.4.7)
⎜ .. ⎟
⎜ .. .. .. ⎟
⎝ . ··· . . . ⎠
0 ··· ··· 1 λi

is an mi,s × mi,s matrix, s = 1, . . . , li .


Alternatively, we may also reorder the vectors listed in (7.4.3) to get




⎪ u0i,1 , . . . , u0i,l 0 ,




i

(T − λi )mi,1 −1 (ui,1 ), . . . , (T − λi I )(ui,1 ), ui,1 ,


(7.4.8)

⎪ · · · · · · · · · · · ·



⎩ (T − λi )mi,li −1 (ui,li ), . . . , (T − λi I )(ui,li ), ui,li .
With the choice of these reordered basis vectors, the submatrix Ji,s instead
takes the following updated form,
⎛ ⎞
λi 1 0 ··· 0
⎜ . ⎟
⎜ 0 λ ..
. .. ⎟
⎜ i 1 ⎟
⎜ ⎟
⎜ .. . . . . ⎟
Ji,s = ⎜ . . . . . . 0 ⎟ ∈ C(mi,s , mi,s ), s = 1, . . . , li .
⎜ ⎟
⎜ . ⎟
⎜ . .. .. ⎟
⎝ . ··· . . 1 ⎠
0 ··· ··· 0 λi
(7.4.9)
The submatrix Ji,s given in either (7.4.7) or (7.4.9) is called a Jordan block.
To simplify this statement, Ji,0 = λi Il 0 is customarily said to consist of li0
i
1 × 1 (degenerate) Jordan blocks.
Consequently, if we choose

B = B1 ∪ · · · ∪ Bk , (7.4.10)

where Bi is as given in (7.4.3) or (7.4.8), i = 1, . . . , k, to be a basis of U , then


the matrix that represents T with respect to B is
⎛ ⎞
J1 0 · · · 0
⎜ . ⎟
⎜ .. .. ⎟
⎜ 0 . . .. ⎟
J =⎜ ⎜ . .
⎟,
⎟ (7.4.11)
⎜ .. .. ... 0 ⎟
⎝ ⎠
0 ··· 0 Jk
which is called a Jordan canonical form or a Jordan matrix.
We may summarize the above discussion into the following theorem, which
is the celebrated Jordan decomposition theorem.

Theorem 7.7 Let U be an n-dimensional vector space over C and T ∈ L(U )


so that its distinct eigenvalues are λ1 , . . . , λk with respective algebraic multi-
plicities n1 , . . . , nk . Then the following hold.
(1) U has the decomposition U = V1 ⊕ · · · ⊕ Vk and T is invariant over each


of the subspaces V1 , . . . , Vk .
(2) For each i = 1, . . . , k, T − λi I is nilpotent of degree mi over Vi where
mi ≤ ni .
(3) For each i = 1, . . . , k, appropriate eigenvectors and generalized eigen-
vectors may be chosen in Vi as stated in (7.4.3) or (7.4.8) which generate
a basis of Vi , say Bi .
(4) With respect to the basis B = B1 ∪· · ·∪Bk of U , the matrix representation
of T assumes the Jordan canonical form (7.4.11).

The theorem indicates that T nullifies the polynomial

mT (λ) = (λ − λ1 )m1 · · · (λ − λk )mk (7.4.12)

which is a polynomial of the minimum degree among all nonzero polynomials


having T as a root. In fact, to show mT (T ) = 0, we rewrite any u ∈ U in the
form

u = u1 + · · · + uk , u1 ∈ V1 , . . . , uk ∈ Vk . (7.4.13)

Thus, we have
mT(T)(u) = (T − λ1 I)^{m1} · · · (T − λk I)^{mk}(u1 + · · · + uk)

          = Σ_{i=1}^{k} { (T − λ1 I)^{m1} · · · [(T − λi I)^{mi}]ˆ · · · (T − λk I)^{mk} } (T − λi I)^{mi}(ui)

          = 0,      (7.4.14)

which establishes mT(T) = 0 as asserted. Here [·]ˆ denotes the item that is missing in the expression. Since mT(λ)|pT(λ), we arrive at pT(T) = 0, as
stated in the Cayley–Hamilton theorem.
Given T ∈ L(U ), use P to denote the vector space of all polynomials with
coefficients in C and consider the subspace of P:
AT = {p ∈ P | p(T ) = 0}. (7.4.15)
It is clear that AT is an ideal in P. Elements in AT are also called annihilating
polynomials of the mapping T . Let AT be generated by some m(λ). Then m(λ)
has the property that it is a minimal-degree polynomial among all nonzero
elements in AT . If we normalize the coefficient of the highest-degree term of
m(λ) to 1, then m is uniquely determined and is called the minimal polynomial
of the linear mapping T . It is clear that, given T ∈ L(U ), if λ1 , . . . , λk are all
the distinct eigenvalues of T and m1 , . . . , mk are the corresponding degrees of
nilpotence of T −λ1 I, . . . , T −λk I over the respective generalized eigenspaces


(7.4.2), then mT (λ) defined in (7.4.12) is the minimal polynomial of T .
For example, if T is nilpotent of degree k, then mT (λ) = λk ; if T is idem-
potent, that is, T 2 = T , and T = 0, T = I , then mT (λ) = λ2 − λ.
We may use minimal polynomials to characterize a diagonalizable linear
mapping whose matrix representation with respect to a suitable basis is diag-
onal. In such a situation it is clear that this basis is made of eigenvectors and
the diagonal entries of the diagonal matrix are the corresponding eigenvalues
of the mapping.

Theorem 7.8 Let U be an n-dimensional vector space over C and T ∈ L(U ). Then T is diagonalizable if and only if its minimal polynomial has only simple roots.

Proof Let mT (λ) be the minimal polynomial of T and λ1 , . . . , λk the distinct eigenvalues of T . If all roots of mT (λ) are simple, then for each eigenvalue λi , the degree of nilpotence of T − λi I over the generalized eigenspace associated with λi is 1, that is, T = λi I there, i = 1, . . . , k. Thus all the Jordan blocks given in (7.4.6) are diagonal, which makes J stated in (7.4.11) diagonal.
Conversely, assume T is diagonalizable and λ1 , . . . , λk are all the distinct eigenvalues of T . Since U = Eλ1 ⊕ · · · ⊕ Eλk , we see that h(λ) = (λ − λ1 ) · · · (λ − λk ) lies in AT defined in (7.4.15). We claim that mT (λ) = h(λ). To see this, we only need to show that for any element p ∈ AT we have p(λi ) = 0, i = 1, . . . , k. Assume otherwise that p(λ1 ) ≠ 0 (say). Then p(λ) and λ − λ1 are co-prime. So there are polynomials f, g such that f (λ)p(λ) + g(λ)(λ − λ1 ) ≡ 1. Consequently, since p(T ) = 0, we get I = g(T )(T − λ1 I ), which leads to the contradiction u = 0 for any u ∈ Eλ1 . Thus h is a lowest-degree element in AT \ {0}. In other words, mT = h.

As a matrix version of Theorem 7.7, we may state that any n × n complex matrix is similar to a Jordan matrix of the form (7.4.11). For matrices, minimal polynomials may be defined analogously; the details are omitted. Besides, the matrix version of Theorem 7.8 may be read as follows: An n × n matrix is diagonalizable if and only if the roots of its minimal polynomial are all simple.
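For a concrete matrix, this dichotomy can be examined through a Jordan form computation; the sketch below uses SymPy's jordan_form routine (an illustrative tool choice, not part of the text) on a matrix whose minimal polynomial has a repeated root.

from sympy import Matrix

# Eigenvalue 2 has algebraic multiplicity 2 but geometric multiplicity 1, so the
# minimal polynomial (lambda - 2)^2 (lambda - 3) has a repeated root and the
# matrix is not diagonalizable.
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])

P, J = A.jordan_form()      # A = P * J * P**(-1)
print(J)                     # a 2x2 Jordan block for 2 and a 1x1 block for 3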

Exercises

7.4.1 Show that a diagonalizable nilpotent mapping must be trivial, T = 0, and that a nontrivial idempotent mapping (T ≠ 0, T ≠ I ) is diagonalizable.
7.4.2 Let λ1 , . . . , λk be all the distinct eigenvalues of a Hermitian mapping


T over an n-dimensional vector space U over C. Show that
mT (λ) = (λ − λ1 ) · · · (λ − λk ). (7.4.16)
7.4.3 Let U be an n-dimensional vector space over C and S, T ∈ L(U ). In
the proof of Theorem 7.8, it is shown that if S ∼ T , then mS (λ) =
mT (λ). Give an example to show that the condition mS (λ) = mT (λ) is
not sufficient to ensure S ∼ T .
7.4.4 Let A ∈ C(n, n). Show that A ∼ At .
7.4.5 Let A, B ∈ R(n, n). Show that if there is a nonsingular element C ∈
C(n, n) such that A = C −1 BC then there is a nonsingular element
K ∈ R(n, n) such that A = K −1 BK.
7.4.6 Let A, B ∈ C(n, n) be normal. Show that if the characteristic polyno-
mials of A, B coincide then A ∼ B.
7.4.7 Let T ∈ L(Cn ) be defined by

T(x) = \begin{pmatrix} x_n \\ \vdots \\ x_1 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{C}^n.    (7.4.17)
(i) Determine all the eigenvalues of T .
(ii) Find the minimal polynomial of T .
(iii) Does Cn have a basis consisting of eigenvectors of T ?
7.4.8 Consider the matrix

A = \begin{pmatrix} a & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix},    (7.4.18)
where a ∈ R.
(i) For what value(s) of a can or cannot the matrix A be diagonalized?
(ii) Find the Jordan forms of A corresponding to various values of a.
7.4.9 Let A ∈ C(n, n) and k ≥ 2 be an integer such that A ∼ Ak .
(i) Show that, if λ is an eigenvalue of A, so is λk .
(ii) Show that, if in addition A is nonsingular, then each eigenvalue of
A is a root of unity. In other words, if λ ∈ C is an eigenvalue of A,
then there is an integer s ≥ 1 such that λs = 1.
7.4.10 Let A ∈ C(n, n) satisfy Am = aIn for some integer m ≥ 1 and nonzero
a ∈ C. Use the information about the minimal polynomial of A to
prove that A is diagonalizable.
7.4.11 Let A, B ∈ F(n, n) satisfy A ∼ B. Show that adj(A) ∼ adj(B).
7.4.12 Show that the n × n matrices

A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \quad B = \begin{pmatrix} n & 0 & \cdots & 0 \\ b_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ b_n & 0 & \cdots & 0 \end{pmatrix},    (7.4.19)
where b2 , . . . , bn ∈ R, are similar and diagonalizable.
7.4.13 Show that the matrices

A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & -6 & 2 \end{pmatrix},    (7.4.20)
are similar and find a nonsingular element C ∈ R(3, 3) such that A =
C −1 BC.
7.4.14 Show that if A ∈ C(n, n) has a single eigenvalue then A is not diago-
nalizable unless A = λIn . In particular, a triangular matrix with iden-
tical diagonal entries can never be diagonalizable unless it is already
diagonal.
7.4.15 Consider A ∈ C(n, n) and express its characteristic polynomial pA (λ) as

p_A(\lambda) = (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_k)^{n_k},    (7.4.21)

where λ1 , . . . , λk ∈ C are the distinct eigenvalues of A and n1 , . . . , nk the respective algebraic multiplicities of these eigenvalues such that \sum_{i=1}^{k} n_i = n. Show that A is diagonalizable if and only if r(λi I − A) = n − ni for i = 1, . . . , k.
7.4.16 Show that the matrix

A = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ a^4 & 0 & 0 & 0 \end{pmatrix},    (7.4.22)

where a > 0, is diagonalizable in C(4, 4) but not in R(4, 4).
7.4.17 Show that there is no matrix in R(3, 3) whose minimal polynomial is
m(λ) = λ2 + 3λ + 4.
7.4.18 Let A ∈ R(n, n) and satisfy A2 + In = 0.
(i) Show that the minimal polynomial of A is simply λ^2 + 1.
(ii) Show that n must be an even integer, n = 2m.
(iii) Show that there are n linearly independent vectors, u1 , v1 , . . . , um , vm , in Rn such that

A u_i = -v_i, \quad A v_i = u_i, \quad i = 1, \ldots, m.    (7.4.23)

(iv) Use the vectors u1 , v1 , . . . , um , vm to construct an invertible matrix B ∈ R(n, n) such that

A = B \begin{pmatrix} 0 & I_m \\ -I_m & 0 \end{pmatrix} B^{-1}.    (7.4.24)
8
Selected topics

In this chapter we present a few selected subjects that are important in applica-
tions as well, but are not usually included in a standard linear algebra course.
These subjects may serve as supplemental or extracurricular materials. The
first subject is the Schur decomposition theorem, the second is about the clas-
sification of skewsymmetric bilinear forms, the third is the Perron–Frobenius
theorem for positive matrices, and the fourth concerns the Markov or stochastic
matrices.

8.1 Schur decomposition

In this section we establish the Schur decomposition theorem, which serves as a useful complement to the Jordan decomposition theorem and renders further insight and fresh angles into various subjects already covered, such as the spectral structures of normal mappings and self-adjoint mappings.

Theorem 8.1 Let U be a finite-dimensional complex vector space with a positive definite scalar product and T ∈ L(U ). There is an orthonormal basis B = {u1 , . . . , un } of U such that the matrix representation of T with respect to B is upper triangular. That is,

T(u_j) = \sum_{i=1}^{j} b_{ij} u_i, \quad j = 1, \ldots, n,    (8.1.1)

for some bij ∈ C, i = 1, . . . , j, j = 1, . . . , n. In particular, the diagonal entries b11 , . . . , bnn of the upper triangular matrix B = (bij ) ∈ C(n, n) are the eigenvalues of T , which are not necessarily distinct.

Proof We prove the theorem by induction on dim(U ).
When dim(U ) = 1, there is nothing to show.
Assume that the theorem holds when dim(U ) = n − 1 ≥ 1. We proceed to establish the theorem when dim(U ) = n ≥ 2.
Let w be an eigenvector of the adjoint mapping T ′ associated with the eigenvalue λ and consider

V = (\mathrm{Span}\{w\})^{\perp}.    (8.1.2)

We assert that V is invariant under T . In fact, for any v ∈ V , we have

(w, T(v)) = (T'(w), v) = (\lambda w, v) = \bar{\lambda}(w, v) = 0,    (8.1.3)

which verifies T (v) ∈ V . Thus T ∈ L(V ).
Applying the inductive assumption to T ∈ L(V ), since dim(V ) = n − 1, we see that there is an orthonormal basis {u1 , . . . , un−1 } of V and scalars bij ∈ C, i = 1, . . . , j, j = 1, . . . , n − 1, such that

T(u_j) = \sum_{i=1}^{j} b_{ij} u_i, \quad j = 1, \ldots, n-1.    (8.1.4)

Finally, setting un = (1/‖w‖)w, we conclude that {u1 , . . . , un−1 , un } is an orthonormal basis of U with the stated properties.
Using the Schur decomposition theorem, Theorem 8.1, the Cayley–Hamilton theorem (over C) can be readily proved.
In fact, since T (u1 ) = b11 u1 , we have (T − b11 I )(u1 ) = 0. For j − 1 ≥ 1, we assume (T − b11 I ) · · · (T − bj−1,j−1 I )(uk ) = 0 for k = 1, . . . , j − 1. Using the matrix representation of T with respect to {u1 , . . . , un }, we have

(T - b_{jj} I)(u_j) = \sum_{i=1}^{j-1} b_{ij} u_i.    (8.1.5)

Hence we arrive at the general conclusion

(T - b_{11} I) \cdots (T - b_{jj} I)(u_j) = (T - b_{11} I) \cdots (T - b_{j-1,j-1} I)\Big( \sum_{i=1}^{j-1} b_{ij} u_i \Big) = 0,    (8.1.6)

for j = 2, . . . , n. Hence (T − b11 I ) · · · (T − bjj I )(uk ) = 0 for k = 1, . . . , j, j = 1, . . . , n. In particular, since the characteristic polynomial of T takes the form

p_T(\lambda) = \det(\lambda I - B) = (\lambda - b_{11}) \cdots (\lambda - b_{nn}),    (8.1.7)

we have

p_T(T)(u_i) = (T - b_{11} I) \cdots (T - b_{nn} I)(u_i) = 0, \quad i = 1, \ldots, n.    (8.1.8)
Consequently, pT (T ) = 0, as anticipated.
It is clear that, in Theorem 8.1, if the upper triangular matrix B ∈ C(n, n)
is diagonal, then the orthonormal basis B is made of the eigenvectors of T .
In this situation T is normal. Likewise, if B is diagonal and real, then T is
self-adjoint.
We remark that Theorem 8.1 may also be proved without resorting to the
adjoint mapping.
In fact, use the notation of Theorem 8.1 and proceed to the nontrivial situ-
ation dim(U ) = n ≥ 2 directly. Let λ1 ∈ C be an eigenvalue of T and u1 an
associated unit eigenvector. Then we have the orthogonal decomposition

U = \mathrm{Span}\{u_1\} \oplus V, \quad V = (\mathrm{Span}\{u_1\})^{\perp}.    (8.1.9)

Let P ∈ L(U ) be the projection of U onto V along Span{u1 } and set S = P ◦ T . Then S may be viewed as an element in L(V ).
Since dim(V ) = n − 1, we may apply the inductive assumption to obtain an orthonormal basis, say {u2 , . . . , un }, of V , such that

S(u_j) = \sum_{i=2}^{j} b_{ij} u_i, \quad j = 2, \ldots, n,    (8.1.10)

for some bij s in C. Of course the vectors u1 , u2 , . . . , un now form an orthonormal basis of U . Moreover, in view of (8.1.10) and R(I − P ) = Span{u1 }, we have

T(u_1) = \lambda_1 u_1 \equiv b_{11} u_1, \qquad T(u_j) = ((I - P) \circ T)(u_j) + (P \circ T)(u_j) = b_{1j} u_1 + \sum_{i=2}^{j} b_{ij} u_i, \quad b_{1j} \in \mathbb{C}, \quad j = 2, \ldots, n,    (8.1.11)

where the matrix B = (bij ) ∈ C(n, n) is clearly upper triangular as described. Thus Theorem 8.1 is again established.
The matrix version of Theorem 8.1 may be stated as follows.

Theorem 8.2 For any matrix A ∈ C(n, n) there is a unitary matrix P ∈ C(n, n) and an upper triangular matrix B ∈ C(n, n) such that

A = P^{\dagger} B P.    (8.1.12)

That is, the matrix A is Hermitian congruent, or similar through a unitary matrix, to an upper triangular matrix B whose diagonal entries are all the eigenvalues of A.
The proof of Theorem 8.2 may be obtained by applying Theorem 8.1, where
we take U = Cn with the standard Hermitian scalar product and define T ∈
L(Cn ) by setting T (u) = Au for u ∈ Cn . In fact, with B = {u1 , . . . , un } being
the orthonormal basis of Cn stated in Theorem 8.1, the unitary matrix P in
(8.1.12) is such that the ith column vector of P † is simply ui , i = 1, . . . , n.
From (8.1.12) we see immediately that A is normal if and only if B is diag-
onal and that A is Hermitian if and only if B is diagonal and real.
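Numerically, a decomposition of this kind can be computed directly; the sketch below uses scipy.linalg.schur (an illustrative tool choice, not prescribed by the text) and checks the factorization A = P†BP of (8.1.12), keeping in mind that SciPy returns the factors in the form A = Z B Z†, so P corresponds to Z†.

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# scipy returns A = Z @ B @ Z.conj().T with Z unitary and B upper triangular,
# so P = Z.conj().T in the notation A = P^dagger B P of (8.1.12).
B, Z = schur(A, output='complex')
P = Z.conj().T

print(np.allclose(A, P.conj().T @ B @ P))        # True: the factorization holds
print(np.allclose(np.sort_complex(np.diag(B)),
                  np.sort_complex(np.linalg.eigvals(A))))  # diag(B) = eigenvalues of A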

Exercises

8.1.1 Show that the matrix B may be taken to be lower triangular in Theorems 8.1 and 8.2.
8.1.2 Let A = (aij ), B = (bij ) ∈ C(n, n) be as stated in the relation (8.1.12). Use the fact Tr(A† A) = Tr(B † B) to infer the identity

\sum_{i,j=1}^{n} |a_{ij}|^2 = \sum_{1 \le i \le j \le n} |b_{ij}|^2.    (8.1.13)

8.1.3 Denote by λ1 , . . . , λn all the eigenvalues of a matrix A ∈ C(n, n). Show that A is normal if and only if it satisfies the equation

\mathrm{Tr}(A^{\dagger} A) = |\lambda_1|^2 + \cdots + |\lambda_n|^2.    (8.1.14)

8.1.4 For A ∈ R(n, n) assume that the roots of the characteristic polynomial
of A are all real. Establish the real version of the Schur decomposition
theorem, which asserts that there is an orthogonal matrix P ∈ R(n, n)
and an upper triangular matrix B ∈ R(n, n) such that A = P t BP and
that the diagonal entries of B are all the eigenvalues of A. Can you prove
a linear mapping version of the theorem when U is a real vector space
with a positive definite scalar product?
8.1.5 Show that if A ∈ R(n, n) is normal and all the roots of its characteristic
polynomial are real then A must be symmetric.
8.1.6 Let U be a finite-dimensional complex vector space with a positive defi-
nite scalar product and S, T ∈ L(U ). If S and T are commutative, show
that U has an orthonormal basis, say B, such that, with respect to B, the
matrix representations of S and T are both upper triangular.
8.1.7 Let U be an n-dimensional (n ≥ 2) complex vector space with a positive definite scalar product and T ∈ L(U ). If λ ∈ C is an eigenvalue of T and u ∈ U an associated eigenvector, then the quotient space V = U/Span{u} is of dimension n − 1.
(i) Define a positive definite scalar product over V and show that T
induces an element in L(V ).
www.pdfgrip.com

230 Selected topics

(ii) Formulate an inductive proof of Theorem 8.1 using the construction


in (i).
8.1.8 Given A ∈ C(n, n) (n ≥ 2), follow the steps below to carry out an
inductive but also constructive proof of Theorem 8.2.
(i) Find an eigenvalue, say λ1 , of A, and a unit eigenvector u1 ∈ Cn ,
taken as a column vector. Let u2 , . . . , un ∈ Cn be chosen so that
u1 , u2 , . . . , un form an orthonormal basis of Cn . Use u1 , u2 , . . . , un
as the first, second,. . . , and the nth column vectors of a matrix called
Q1 . Then Q1 is unitary. Check that

Q_1^{\dagger} A Q_1 = \begin{pmatrix} \lambda_1 & \alpha \\ 0 & A_{n-1} \end{pmatrix},    (8.1.15)
where An−1 ∈ C(n − 1, n − 1) and α ∈ Cn−1 is a row vector.


(ii) Apply the inductive assumption to get a unitary element Q ∈ C(n − 1, n − 1) so that Q† An−1 Q is upper triangular. Show that

Q_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix}    (8.1.16)

is a unitary element in C(n, n) such that

(Q_1 Q_2)^{\dagger} A (Q_1 Q_2) = Q_2^{\dagger} Q_1^{\dagger} A Q_1 Q_2    (8.1.17)
becomes upper triangular as desired.
8.1.9 Let T ∈ L(U ) where U is a finite-dimensional complex vector space with a positive definite scalar product. Prove that if λ is an eigenvalue of T then λ̄ is an eigenvalue of the adjoint mapping T ′.

8.2 Classification of skewsymmetric bilinear forms

Let U be a finite-dimensional vector space over a field F. A bilinear form f : U × U → F is called skewsymmetric or anti-symmetric if it satisfies

f(u, v) = -f(v, u), \quad u, v \in U.    (8.2.1)

Let B = {u1 , . . . , un } be a basis of U . For u, v ∈ U with coordinate vectors x = (x1 , . . . , xn )t , y = (y1 , . . . , yn )t ∈ Fn with respect to B, we can rewrite f (u, v) as

f(u, v) = f\Big( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} y_j u_j \Big) = \sum_{i,j=1}^{n} x_i f(u_i, u_j) y_j = x^t A y,    (8.2.2)
where A = (aij ) = (f (ui , uj )) ∈ F(n, n) is the matrix representation of f with respect to the basis B. Thus, combining (8.2.1) and (8.2.2), we see that A must be skewsymmetric or anti-symmetric, A = −At , because

-x^t A y = -f(u, v) = f(v, u) = y^t A x = (y^t A x)^t = x^t A^t y,    (8.2.3)

and x, y ∈ Fn are arbitrary.
Let B̃ = {ũ1 , . . . , ũn } be another basis of U and à = (ãij ) = (f (ũi , ũj )) ∈ F(n, n) the matrix representation of f with respect to B̃. If the transition matrix between B and B̃ is B = (bij ) ∈ F(n, n) so that

\tilde{u}_j = \sum_{i=1}^{n} b_{ij} u_i, \quad j = 1, \ldots, n,    (8.2.4)

then we know that à and A are congruent through B, Ã = B t AB, as discussed in Chapter 5.
In this section, we study the canonical forms of skewsymmetric forms through a classification of skewsymmetric matrices by congruent relations.

Theorem 8.3 Let A ∈ F(n, n) be skewsymmetric. Then there is some nonsingular matrix C ∈ F(n, n) satisfying det(C) = ±1 so that A is congruent through C to a matrix Λ ∈ F(n, n) of the following canonical form

\Lambda \equiv \begin{pmatrix} \alpha_1 & & & & \\ & \ddots & & & \\ & & \alpha_k & & \\ & & & 0 & \\ & & & & \ddots & \\ & & & & & 0 \end{pmatrix} = C A C^t,    (8.2.5)

where

\alpha_i = \begin{pmatrix} 0 & a_i \\ -a_i & 0 \end{pmatrix}, \quad i = 1, \ldots, k,    (8.2.6)

are some 2 × 2 skewsymmetric matrices given in terms of k (if any) nonzero scalars a1 , . . . , ak ∈ F.

Proof In the trivial case A = 0, there is nothing to show. We now make induction on n.
If n = 1, then A = 0 and there is nothing to show. If n = 2 but A ≠ 0, then

A = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}, \quad a \neq 0.    (8.2.7)

Hence there is nothing to show either.


Assume the statement of the theorem is valid for any n ≤ l for some l ≥ 2. Let n = l + 1. Assume the nontrivial case A = (aij ) ≠ 0. So there is some aij ≠ 0 for some i, j = 1, . . . , n. Let i be the smallest among {1, . . . , n} such that aij ≠ 0 for some j = 1, . . . , n. If i = 1 and j = 2, then a12 ≠ 0. If i = 1 and j > 2, we let E1 be the elementary matrix obtained from interchanging the second and j th rows of In . Then det(E1 ) = −1 and the entry at the first row and second column of E1 AE1t is nonzero. If i > 1, then the first row of A is a zero row. Let E1 be the elementary matrix obtained from interchanging the first and ith rows of In . Then det(E1 ) = −1 and the first row of E1 AE1t = (bij ) is nonzero. So there is some 2 ≤ j ≤ n such that b1j ≠ 0. Let E2 be the elementary matrix obtained from interchanging the second and j th rows of E1 AE1t . Then, in E2 E1 AE1t E2t = (cij ), we have c12 ≡ a1 ≠ 0. We also have det(E2 ) = ±1 depending on whether j = 2 or j ≠ 2.
Thus we may summarize that there is a matrix E with det(E) = ±1 such that

E A E^t = \begin{pmatrix} \alpha_1 & \beta \\ -\beta^t & A_{n-2} \end{pmatrix}, \quad \alpha_1 = \begin{pmatrix} 0 & a_1 \\ -a_1 & 0 \end{pmatrix},    (8.2.8)

where β ∈ F(2, n − 2) and An−2 ∈ F(n − 2, n − 2) is skewsymmetric.
Consider P ∈ F(n, n) of the form

P = \begin{pmatrix} I_2 & 0 \\ \gamma^t & I_{n-2} \end{pmatrix},    (8.2.9)

where γ ∈ F(2, n − 2) is to be determined. Then det(P ) = 1 and

P^t = \begin{pmatrix} I_2 & \gamma \\ 0 & I_{n-2} \end{pmatrix}.    (8.2.10)

Consequently, we have

P E A E^t P^t = \begin{pmatrix} I_2 & 0 \\ \gamma^t & I_{n-2} \end{pmatrix} \begin{pmatrix} \alpha_1 & \beta \\ -\beta^t & A_{n-2} \end{pmatrix} \begin{pmatrix} I_2 & \gamma \\ 0 & I_{n-2} \end{pmatrix} = \begin{pmatrix} \alpha_1 & \alpha_1 \gamma + \beta \\ \gamma^t \alpha_1 - \beta^t & \gamma^t \alpha_1 \gamma - \beta^t \gamma + \gamma^t \beta + A_{n-2} \end{pmatrix}.    (8.2.11)
To proceed, we choose γ to satisfy

\alpha_1 \gamma + \beta = 0.    (8.2.12)

This is possible to do since α1 ∈ F(2, 2) is invertible. In other words, we are led to the unique choice

\gamma = -\alpha_1^{-1} \beta, \quad \alpha_1^{-1} = \begin{pmatrix} 0 & -a_1^{-1} \\ a_1^{-1} & 0 \end{pmatrix}.    (8.2.13)

Thus, in view of (8.2.12), we see that (8.2.11) becomes

P E A E^t P^t = \begin{pmatrix} \alpha_1 & 0 \\ 0 & G \end{pmatrix},    (8.2.14)

where

G = \gamma^t \alpha_1 \gamma - \beta^t \gamma + \gamma^t \beta + A_{n-2}    (8.2.15)

is a skewsymmetric element in F(n − 2, n − 2).


Using the inductive assumption, we can find an element D ∈ F(n − 2, n − 2) satisfying det(D) = ±1 such that DGD t is of the desired canonical form stated in the theorem.
Now let

Q = \begin{pmatrix} I_2 & 0 \\ 0 & D \end{pmatrix}.    (8.2.16)

Then det(Q) = ±1 and

Q P E A E^t P^t Q^t = \begin{pmatrix} \alpha_1 & 0 \\ 0 & D G D^t \end{pmatrix} = \Lambda,    (8.2.17)

where the matrix Λ ∈ F(n, n) is as given in (8.2.5).
Taking C = QP E, we see that the theorem is established.

Furthermore, applying a sequence of elementary matrices to the left and right of the canonical matrix Λ given in (8.2.5), realizing suitable row and column interchanges, we see that we can use a nonsingular matrix of determinant ±1 to congruently reduce Λ into another canonical form,

\tilde{\Lambda} = \begin{pmatrix} 0 & D_k & 0 \\ -D_k & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in F(n, n),    (8.2.18)
where Dk ∈ F(k, k) is the diagonal matrix

D_k = \mathrm{diag}\{a_1, \ldots, a_k\}.    (8.2.19)
As a by-product of the above discussion, we infer that the rank of a
skewsymmetric matrix is 2k, an even number.
We now consider the situation when F = R. By row and column operations if necessary, we may assume without loss of generality that a1 , . . . , ak > 0 in (8.2.6). With this assumption, define

R = \begin{pmatrix} \beta_1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \beta_k & 0 \\ 0 & \cdots & 0 & I_{n-2k} \end{pmatrix}, \quad \beta_i = \begin{pmatrix} \dfrac{1}{\sqrt{a_i}} & 0 \\ 0 & \dfrac{1}{\sqrt{a_i}} \end{pmatrix}, \quad i = 1, \ldots, k.    (8.2.20)

Then

R \Lambda R^t = \Lambda_0 \equiv \begin{pmatrix} J_2 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & J_2 & 0 \\ 0 & \cdots & \cdots & 0 \end{pmatrix},    (8.2.21)

where

J_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}    (8.2.22)

appears k times in (8.2.21).
Of course, as before, we may also further congruently reduce Λ0 in (8.2.21) into another canonical form,

\tilde{\Lambda}_0 = \begin{pmatrix} J_{2k} & 0 \\ 0 & 0 \end{pmatrix},    (8.2.23)

where J2k ∈ R(2k, 2k) is given by

J_{2k} = \begin{pmatrix} 0 & I_k \\ -I_k & 0 \end{pmatrix}.    (8.2.24)
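As a quick sanity check on this classification over R, the following sketch (plain NumPy, chosen here purely for illustration) generates a random real skewsymmetric matrix and confirms two consequences of the canonical form: the rank is even, and the nonzero eigenvalues come in purely imaginary pairs ±i a with a > 0, matching the 2 × 2 blocks αi.

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M - M.T                      # a real skewsymmetric matrix

# The rank of a skewsymmetric matrix is even (here 4, since n = 5 is odd).
print(np.linalg.matrix_rank(A))

# Nonzero eigenvalues occur in purely imaginary pairs +/- i*a_j, matching the
# 2x2 blocks alpha_j of the canonical form (8.2.5)-(8.2.6).
eig = np.linalg.eigvals(A)
print(np.allclose(eig.real, 0))                  # True
print(np.sort(np.abs(eig.imag)))                 # pairs of equal values (and one zero)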
As in the case of scalar products, for a skewsymmetric bilinear form f :
U × U → F, define
S ⊥ = {u ∈ U | f (u, v) = 0, v ∈ S} (8.2.25)
for a non-empty subset S of U . Then S ⊥ is a subspace of U .


Set

U0 = U ⊥ = {u ∈ U | f (u, v) = 0, v ∈ U }. (8.2.26)

We call f non-degenerate if U0 = {0}. In other words, a non-degenerate skewsymmetric bilinear form f is characterized by the fact that when u ∈ U and f (u, v) = 0 for all v ∈ U then u = 0. Thus f is called degenerate if U0 ≠ {0}, that is, dim(U0 ) ≥ 1. It is not hard to show that f is degenerate if and only if the rank of the matrix representation of f with respect to any basis of U is smaller than dim(U ).
If f is degenerate over U , then the canonical form (8.2.5) indicates that we may obtain a basis

\{u_1, \ldots, u_{n_0}, v_1, \ldots, v_k, w_1, \ldots, w_k\}    (8.2.27)

of U so that {u1 , . . . , un0 } is a basis of U0 with n0 = dim(U0 ) and that

\begin{cases} f(v_i, w_i) = a_i \neq 0, & i = 1, \ldots, k, \\ f(v_i, v_j) = f(w_i, w_j) = 0, & i, j = 1, \ldots, k, \\ f(v_i, w_j) = 0, & i \neq j, \; i, j = 1, \ldots, k. \end{cases}    (8.2.28)

Of particular interest is when f is non-degenerate over U . In such a situa-


tion, dim(U ) must be an even number, 2k.

Definition 8.4 Let U be a vector space of 2k dimensions. A skewsymmetric bilinear form f : U × U → F is called symplectic if f is non-degenerate. An even-dimensional vector space U equipped with a symplectic form is called a symplectic vector space. A basis {v1 , . . . , vk , w1 , . . . , wk } of a 2k-dimensional symplectic vector space U equipped with the symplectic form f is called symplectic if

\begin{cases} f(v_i, w_i) = 1, & i = 1, \ldots, k, \\ f(v_i, v_j) = f(w_i, w_j) = 0, & i, j = 1, \ldots, k, \\ f(v_i, w_j) = 0, & i \neq j, \; i, j = 1, \ldots, k. \end{cases}    (8.2.29)

A symplectic basis is also called a Darboux basis.


Therefore, we have seen that a real symplectic vector space always has a
symplectic basis.

Definition 8.5 Let U be a symplectic vector space equipped with a symplectic


form f . A subspace V of U is called isotropic if f (u, v) = 0 for any u, v ∈ V .
If dim(U ) = 2k then any k-dimensional isotropic subspace of U is called a Lagrangian subspace.

Let U be a symplectic vector space with a symplectic basis given as in (8.2.29). Then we see that both

V = \mathrm{Span}\{v_1, \ldots, v_k\}, \qquad W = \mathrm{Span}\{w_1, \ldots, w_k\}    (8.2.30)

are Lagrangian subspaces of U .
If U is a finite-dimensional complex vector space, we may consider a
skewsymmetric sesquilinear form f from U × U into C. Such a form satisfies
f(u, v) = -\overline{f(v, u)}, \quad u, v \in U,    (8.2.31)
and is called Hermitian skewsymmetric or skew-Hermitian. Therefore, the form
g(u, v) = if (u, v), u, v ∈ U, (8.2.32)
is Hermitian. It is clear that the matrix representation, say A ∈ C(n, n), with
respect to any basis of U of a Hermitian skewsymmetric form is anti-Hermitian
or skew-Hermitian, A† = −A. Hence iA is Hermitian. Applying the knowl-
edge about Hermitian forms and Hermitian matrices studied in Chapter 6, it is
not hard to come up with a complete understanding of skew-Hermitian forms
and matrices, in the same spirit of Theorem 8.3. We leave this as an exercise.

Exercises

8.2.1 Let f be a bilinear form over a vector space U . Show that f is skewsym-
metric if and only if f (u, u) = 0 for any u ∈ U .
8.2.2 Let f be a skewsymmetric bilinear form over U and S ⊂ U a non-empty
subset. Show that S ⊥ defined in (8.2.25) is a subspace of U .
8.2.3 Let U be a finite-dimensional vector space and f a skewsymmetric bil-
inear form over U . Define U0 by (8.2.26) and use A to denote the matrix
representation of f with respect to any given basis of U . Show that
dim(U0 ) = dim(U ) − r(A). (8.2.33)
8.2.4 Let U be a symplectic vector space equipped with a symplectic form f .
If V is a subspace of U , V ⊥ is called the symplectic complement of V
in U . Prove the following.
(i) dim(V ) + dim(V ⊥ ) = dim(U ).
(ii) (V ⊥ )⊥ = V .
8.2.5 Let U be a symplectic vector space equipped with a symplectic form f . If V is a subspace of U , we can consider the restriction of f to V . Prove that f is symplectic over V if and only if V ∩ V ⊥ = {0}.
8.2.6 Show that a subspace V of a symplectic vector space U is isotropic if


and only if V ⊂ V ⊥ .
8.2.7 Show that a subspace V of a symplectic vector space U is Lagrangian if
and only if V = V ⊥ .
8.2.8 For A ∈ C(n, n), show that A is skew-Hermitian if and only if there is a
nonsingular element C ∈ C(n, n) such that CAC † = iD where D is an
n × n real diagonal matrix.

8.3 Perron–Frobenius theorem for positive matrices


Let A = (aij ) ∈ R(n, n). We say that A is positive (non-negative) if aij > 0
(aij ≥ 0) for all i, j = 1, . . . , n. Likewise we say that a vector u = (ai ) ∈
Rn is positive (non-negative) if ai > 0 (ai ≥ 0) for all i = 1, . . . , n. More
generally, for A, B ∈ R(n, n), we write A > B (A ≥ B) if (A − B) > 0
((A − B) ≥ 0); for u, v ∈ Rn , we write u > v (u ≥ v) if (u − v) > 0
((u − v) ≥ 0).
The Perron–Frobenius theorem concerns the existence and properties of a
positive eigenvector, associated to a positive eigenvalue, of a positive matrix
and may be stated as follows.

Theorem 8.6 Let A = (aij ) ∈ R(n, n) be a positive matrix. Then there is


a positive eigenvalue, r, of A, called the dominant eigenvalue, satisfying the
following properties.
(1) There is a positive eigenvector, say u, associated with r, such that any
other non-negative eigenvectors of A associated with r must be positive
multiples of u.
(2) r is a simple root of the characteristic polynomial of A.
(3) If λ ∈ C is any other eigenvalue of A, then
|λ| < r. (8.3.1)
Furthermore, any nonnegative eigenvector of A must be associated with
the dominant eigenvalue r.

Proof We equip Rn with the norm


⎛ ⎞
x1
⎜ . ⎟
x = max{|xi | | i = 1, . . . , n}, x=⎜ ⎟
⎝ .. ⎠ ∈ R ,
n
(8.3.2)
xn
and consider the subset

S = \{x \in \mathbb{R}^n \mid \|x\| = 1,\; x \ge 0\}    (8.3.3)

of Rn . Then define

\Lambda = \{\lambda \in \mathbb{R} \mid \lambda \ge 0,\; Ax \ge \lambda x \text{ for some } x \in S\}.    (8.3.4)

We can show that Λ is an interval in R of the form [0, r] for some r > 0.
In fact, take a test vector, say y = (1, . . . , 1)t ∈ Rn . Since A is positive, it is clear that Ay ≥ λy if λ satisfies

0 < \lambda \le \min\Big\{ \sum_{j=1}^{n} a_{ij} \;\Big|\; i = 1, \ldots, n \Big\}.    (8.3.5)

Moreover, the definition of Λ implies immediately that if λ ∈ Λ then [0, λ] ⊂ Λ. Thus Λ is connected.
Let λ ∈ Λ. Then there is some x ∈ S such that Ax ≥ λx. Therefore, since ‖x‖ = 1, we have

\lambda = \lambda \|x\| \le \|Ax\| = \max\Big\{ \Big| \sum_{j=1}^{n} a_{ij} x_j \Big| \;\Big|\; i = 1, \ldots, n \Big\} \le \max\Big\{ \sum_{j=1}^{n} a_{ij} \;\Big|\; i = 1, \ldots, n \Big\},    (8.3.6)

which establishes the boundedness of Λ.
Now set

r = \sup\{\lambda \in \Lambda\}.    (8.3.7)
Then we have seen that r satisfies 0 < r < ∞. Hence there is a sequence {λk } ⊂ Λ such that

r = \lim_{k \to \infty} \lambda_k.    (8.3.8)

On the other hand, the assumption that {λk } ⊂ Λ indicates that there is a sequence {x (k) } ⊂ S such that

A x^{(k)} \ge \lambda_k x^{(k)}, \quad k = 1, 2, \ldots.    (8.3.9)

Using the compactness of S, we may assume that there is a subsequence of {x (k) }, which we still denote by {x (k) } without loss of generality, that converges to some element, say u = (a1 , . . . , an )t , in S, as k → ∞. Letting k → ∞ in (8.3.9), we arrive at

A u \ge r u.    (8.3.10)
In particular, this proves r ∈ Λ. Thus indeed Λ = [0, r].
We next show that equality must hold in (8.3.10).
Suppose otherwise that there is some i0 = 1, . . . , n such that

\sum_{j=1}^{n} a_{i_0 j} a_j > r a_{i_0}; \qquad \sum_{j=1}^{n} a_{ij} a_j \ge r a_i \ \text{ for } i \in \{1, \ldots, n\} \setminus \{i_0\}.    (8.3.11)

Let z = (zi ) = u + sei0 (s > 0). Since zi = ai for i ≠ i0 , we conclude from the second inequality in (8.3.11) that

\sum_{j=1}^{n} a_{ij} z_j > r z_i \ \text{ for } i \in \{1, \ldots, n\} \setminus \{i_0\}, \quad s > 0.    (8.3.12)

However, the first inequality in (8.3.11) gives us

\sum_{j=1}^{n} a_{i_0 j} a_j > r(a_{i_0} + s) \ \text{ when } s > 0 \text{ is sufficiently small},    (8.3.13)

which leads to

\sum_{j=1}^{n} a_{i_0 j} z_j > r z_{i_0} \ \text{ when } s > 0 \text{ is sufficiently small}.    (8.3.14)

Thus Az > rz. Set v = z/‖z‖. Then v ∈ S and Av > rv. Hence we may choose ε > 0 small such that Av > (r + ε)v. So r + ε ∈ Λ, which contradicts the definition of r made in (8.3.7).
Therefore Au = ru and r is a positive eigenvalue of A.
The positivity of u = (ai ) follows easily: since u ∈ S and A > 0, the relation ru = Au leads to

r a_i = \sum_{j=1}^{n} a_{ij} a_j > 0, \quad i = 1, \ldots, n,    (8.3.15)

because u ≥ 0 and u ≠ 0. That is, a non-negative eigenvector of A associated to a positive eigenvalue must be positive.
Let v be any non-negative eigenvector of A associated to r. We show that there is a positive number a such that v = au. For this purpose, we construct the vector

u_s = u - s v, \quad s > 0.    (8.3.16)

Of course us is a non-negative eigenvector of A associated to r when s > 0 is small. Set

s_0 = \sup\{s \mid u_s \ge 0\}.    (8.3.17)
Then s0 > 0 and us0 ≥ 0, but us0 > 0 fails (some component of us0 vanishes). If us0 ≠ 0, then us0 > 0 since us0 is a non-negative eigenvector of A associated to r, which contradicts the definition of s0 . Therefore we must have us0 = 0, which gives us the result v = (1/s0 )u as desired.
In order to show that r is the simple root of the characteristic polynomial
of A, we need to prove that there is only one Jordan block associated with the
eigenvalue r and that this Jordan block can only be 1 × 1. To do so, we first
show that the geometric multiplicity of r is 1. Then we show that there is no
generalized eigenvector. That is, the equation

(A − rIn )v = u, v ∈ Cn , (8.3.18)

has no solution.
Suppose otherwise that the dimension of the eigenspace Er is greater than
one. Then there is some v ∈ Er that is not a scalar multiple of u. Write v as
v = v1 + iv2 with v1 , v2 ∈ Rn . Then Av1 = rv1 and Av2 = rv2 . We assert
that one of the sets of vectors {u, v1 } and {u, v2 } must be linearly independent
over R. Otherwise there are a1 , a2 ∈ R such that v1 = a1 u and v2 = a2 u which
imply v = (a1 + ia2 )u, a contradiction. Use w to denote either v1 or v2 which
is linearly independent from u over R. Then we know that ±w can never be
non-negative. Thus there are components of w that have different signs. Now
consider the vector

us = u + sw, s > 0. (8.3.19)

It is clear that us > 0 when s > 0 is sufficiently small since u > 0. So there is some s0 > 0 such that us0 ≥ 0 but a component of us0 is zero. However, us0 ≠ 0 owing to the presence of a positive component in w, so us0 is a non-negative eigenvector of A associated to r that fails to be positive, a contradiction.
We next show that (8.3.18) has no solution. Since u ∈ Rn , we need only to
consider v ∈ Rn in (8.3.18). We proceed as follows.
Consider

ws = v + su, s > 0. (8.3.20)

Since Au = ru, we see that ws also satisfies (8.3.18). Take s > 0 sufficiently
large so that ws > 0. Hence (A − rIn )ws = u > 0 or

Aws > rws . (8.3.21)

Thus, if δ > 0 is sufficiently small, we have Aws > (r + δ)ws . Rescaling ws if necessary, we may assume ws ∈ S. This indicates r + δ ∈ Λ, which violates the definition of r stated in (8.3.7).
So the assertion that r is a simple root of the characteristic polynomial of A
follows.
Moreover, let λ ∈ C be an eigenvalue of A other than r and v = (bi ) ∈ Cn an associated eigenvector. From Av = λv we have

|\lambda| |b_i| \le \sum_{j=1}^{n} a_{ij} |b_j|, \quad i = 1, \ldots, n.    (8.3.22)

Rescaling if necessary, we may also assume

w = \begin{pmatrix} |b_1| \\ \vdots \\ |b_n| \end{pmatrix} \in S.    (8.3.23)

Since (8.3.22) implies Aw ≥ |λ|w, we see in view of the definition of Λ that |λ| ∈ Λ. Using (8.3.7), we have |λ| ≤ r.
If |λ| < r, there is nothing more to do. If |λ| = r, the discussion just made
in the earlier part of the proof shows that w is a non-negative eigenvector of
A associated to r and thus equality holds in (8.3.22) for all i = 1, . . . , n.
Therefore, since A > 0, the complex numbers b1 , . . . , bn must share the same
phase angle, θ ∈ R, so that

bi = |bi |eiθ , i = 1, . . . , n. (8.3.24)


On the other hand, since w is a non-negative eigenvector of A associated to r,
there is a number a > 0 such that
w = au. (8.3.25)

Combining (8.3.23)–(8.3.25), we have v = aeiθ u. In particular, λ = r.


Finally let v be an arbitrary eigenvector of A which is non-negative and
associated to an eigenvalue λ ∈ C. Since At > 0 and the characteristic poly-
nomial of At is the same as that of A, we conclude that r is also the dominant
eigenvalue of At . Now use w to denote a positive eigenvector of At associated
to r. Then At w = rw. Thus
λv t w = (Av)t w = v t At w = rv t w. (8.3.26)
However, in view of the fact that v ≥ 0, v ≠ 0, and w > 0, we have v t w > 0. So it follows from (8.3.26) that λ = r.
The proof of the theorem is complete.
The dominant eigenvalue r of a positive matrix A, with the distinguished characteristics stated in Theorem 8.6, is also called the Perron or Perron–Frobenius eigenvalue of the matrix A.
For an interesting historical account of the Perron–Frobenius theory and a
discussion of its many applications and generalizations and attributions of the
proofs including the one presented here, see Bellman [6].
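Although the proof above is non-constructive, the dominant eigenpair of a positive matrix is easy to approximate in practice; the following sketch (a standard power iteration written in NumPy, an illustrative choice rather than anything taken from the text) produces r and a positive eigenvector u.

import numpy as np

def perron_pair(A, tol=1e-12, max_iter=10_000):
    """Approximate the Perron-Frobenius eigenvalue and a positive eigenvector
    of a positive matrix A by power iteration."""
    x = np.ones(A.shape[0])
    for _ in range(max_iter):
        y = A @ x
        r = np.max(y / x)               # Collatz-Wielandt style estimate of r
        y /= np.linalg.norm(y, np.inf)
        if np.linalg.norm(y - x, np.inf) < tol:
            x = y
            break
        x = y
    return r, x

A = np.array([[1., 2., 3.],
              [2., 3., 1.],
              [3., 2., 1.]])
r, u = perron_pair(A)
print(r)        # 6.0 for this matrix (each row sums to 6, u = (1,1,1)^t)
print(u)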
Exercises

8.3.1 Let A = (aij ) ∈ R(n, n) be a positive matrix and r > 0 the Perron–Frobenius eigenvalue of A. Show that r satisfies the estimate

\min_{1 \le i \le n}\Big( \sum_{j=1}^{n} a_{ij} \Big) \le r \le \max_{1 \le i \le n}\Big( \sum_{j=1}^{n} a_{ij} \Big).    (8.3.27)

8.3.2 Consider the matrix

A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{pmatrix}.    (8.3.28)
(i) Use (8.3.27) to find the dominant eigenvalue of A.
(ii) Check to see that u = (1, 1, 1)t is a positive eigenvector of A. Use u
and Theorem 8.6 to find the dominant eigenvalue of A and confirm
that this is exactly what was obtained in part (i).
(iii) Compute all the eigenvalues of A directly and confirm the result
obtained in part (i) or (ii).
8.3.3 Let A, B ∈ R(n, n) be positive matrices and use rA , rB to denote
the dominant eigenvalues of A, B, respectively. Show that rA ≤ rB if
A ≤ B.
8.3.4 Let A ∈ R(n, n) be a positive matrix and u = (ai ) ∈ Rn a non-negative eigenvector of A. Show that the dominant eigenvalue r of A may be computed by the formula

r = \frac{1}{\sum_{i=1}^{n} a_i} \sum_{j,k=1}^{n} a_{jk} a_k.    (8.3.29)

8.4 Markov matrices

In this section we discuss a type of nonnegative matrices known as the Markov or stochastic matrices, which are important in many areas of applications.

Definition 8.7 Let A = (aij ) ∈ R(n, n) be such that aij ≥ 0 for i, j = 1, . . . , n. If A satisfies

\sum_{j=1}^{n} a_{ij} = 1, \quad i = 1, \ldots, n,    (8.4.1)
that is, the components of each row vector in A sum up to 1, then A is called a Markov or stochastic matrix. If there is an integer m ≥ 1 such that Am is a positive matrix, then A is called a regular Markov or regular stochastic matrix.

A few immediate consequences follow directly from the definition of a Markov matrix and are stated below.

Theorem 8.8 Let A = (aij ) ∈ R(n, n) be a Markov matrix. Then 1 is an eigenvalue of A which enjoys the following properties.

(1) The vector u = (1, . . . , 1)t ∈ Rn is an eigenvector of A associated to the eigenvalue 1.
(2) Any eigenvalue λ ∈ C of A satisfies

|\lambda| \le 1.    (8.4.2)

Proof Using (8.4.1), the fact that u = (1, . . . , 1)t satisfies Au = u may be
checked directly.
Now let λ ∈ C be any eigenvalue of A and v = (bi ) ∈ Cn an associated
eigenvector. Then there is some i0 = 1, . . . , n such that

|bi0 | = max{|bi | | i = 1, . . . , n}. (8.4.3)

Thus, from the relation Av = λv, we have

\lambda b_{i_0} = \sum_{j=1}^{n} a_{i_0 j} b_j,    (8.4.4)

which in view of (8.4.1) gives us

|\lambda| |b_{i_0}| \le \sum_{j=1}^{n} a_{i_0 j} |b_j| \le |b_{i_0}| \sum_{j=1}^{n} a_{i_0 j} = |b_{i_0}|.    (8.4.5)

Since |bi0 | > 0, we see that the bound (8.4.2) follows.

In fact, for a nonnegative element A ∈ R(n, n), it is clear that A being a


Markov matrix is equivalent to the vector u = (1, . . . , 1)t being an eigenvec-
tor of A associated to the eigenvalue 1. This simple fact establishes that the
product of any number of the Markov matrices in R(n, n) is also a Markov
matrix.
Let A ∈ R(n, n) be a Markov matrix. It will be interesting to identify a
certain condition under which the eigenvalue 1 of A becomes dominant as in
the Perron–Frobenius theorem. Below is such a result.
Theorem 8.9 If A ∈ R(n, n) is a regular Markov matrix, then the eigenvalue


1 of A is the dominant eigenvalue of A which satisfies the following properties.

(1) The absolute value of any other eigenvalue λ ∈ C of A is less than 1. That
is, |λ| < 1.
(2) 1 is a simple root of the characteristic polynomial of A.

Proof Let m ≥ 1 be an integer such that Am > 0. Since Am is a Markov


matrix, we see that 1 is the dominant eigenvalue of Am . On the other hand, if
λ ∈ C is any eigenvalue of A other than 1, since λm is an eigenvalue of Am
other than 1, we have in view of Theorem 8.6 that |λm | < 1, which proves
|λ| < 1.
We now show that 1 is a simple root of the characteristic polynomial of A. If
m = 1, the conclusion follows from Theorem 8.6. So we may assume m ≥ 2.
Recall that there is an invertible matrix C ∈ C(n, n) such that

A = CBC −1 , (8.4.6)

where B ∈ C(n, n) takes the boxed diagonal form

B = \mathrm{diag}\{J_1, \ldots, J_k\}    (8.4.7)

in which each Ji (i = 1, . . . , k) is either a diagonal matrix of the form λI or a Jordan block of the form

J = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda \end{pmatrix},    (8.4.8)

where we use λ to denote a generic eigenvalue of A. In either case we may


rewrite J as the sum of two matrices, in the form

J = λI + P , (8.4.9)

where for some integer l ≥ 1 (with l being the degree of nilpotence of P ) we


have P l = 0. Hence, in view of the binomial expansion formula, we have
J^m = (\lambda I + P)^m = \sum_{s=0}^{m} \frac{m!}{s!(m-s)!} \lambda^{m-s} P^s = \sum_{s=0}^{l-1} \frac{m!}{s!(m-s)!} \lambda^{m-s} P^s = \lambda^m I + m\lambda^{m-1} P + \cdots + \frac{m(m-1)\cdots(m-l+2)}{(l-1)!} \lambda^{m-l+1} P^{l-1},    (8.4.10)

which is an upper triangular matrix with the diagonal entries all equal to λm .
From (8.4.6) and (8.4.7), we have
Am = Cdiag{J1m , . . . , Jkm }C −1 . (8.4.11)
However, using the condition Am > 0, Theorem 8.6, and (8.4.10), we
conclude that there exists exactly one Jordan block among the Jordan blocks
J1 , . . . , Jk of A with λ = 1 and that such a Jordan block can only be 1 × 1.
The proof of the theorem is thus complete.
Since in (8.4.10) there are l terms, we see that J m → 0 as m → ∞ when
|λ| < 1. This observation leads to the following important convergence theo-
rem for the Markov matrices.

Theorem 8.10 If A ∈ R(n, n) is a regular Markov matrix, then

\lim_{m \to \infty} A^m = K,    (8.4.12)

where K is a positive Markov matrix with n identical row vectors, v1t = · · · = vnt = v t , where v = (b1 , . . . , bn )t ∈ Rn is the unique positive vector satisfying

A^t v = v, \quad \sum_{i=1}^{n} b_i = 1.    (8.4.13)

Proof If A is a regular Markov matrix, then there is an integer m ≥ 1 such


that Am > 0. Thus (At )m = (Am )t > 0. Since At and A have the same
characteristic polynomial, At has 1 as its dominant eigenvalue as A does. Let
v = (bi ) be an eigenvector of At associated to the eigenvalue 1. Then v is also
an eigenvector of (At )m associated to 1. Since (At )m > 0 and its eigenvalue 1
as the dominant eigenvalue of (At )m is simple, we see in view of Theorem 8.6
that we may choose v ∈ Rn so that either v > 0 or v < 0. We now choose
v > 0 and normalize it so that its components sum to 1. That is, v satisfies
(8.4.13).
Using Theorem 8.9, let C ∈ C(n, n) be invertible such that


A = Cdiag{1, J1 , . . . , Jk }C −1 . (8.4.14)
Rewriting (8.4.14) as AC = Cdiag{1, J1 , · · · , Jk }, we see that the first column
vector of C is an eigenvector of A associated to the eigenvalue 1 which is
simple by Theorem 8.9. Hence we may choose this first column vector of C to
be u = a(1, . . . , 1)t for some a ∈ C, a ≠ 0.
On the other hand, rewrite D = C −1 and express (8.4.14) as
DA = diag{1, J1 , . . . , Jk }D. (8.4.15)
We see that the first row vector of D, say w t for some w ∈ Cn , satisfies w t A = w t . Hence At w = w. Since 1 is a simple root of the characteristic polynomial of At , we conclude that there is some b ∈ C, b ≠ 0, such that w = bv, where v satisfies (8.4.13).
Since D = C −1 , we have w t u = 1, which leads to

a b\, v^t (1, \ldots, 1)^t = a b \sum_{i=1}^{n} b_i = a b = 1.    (8.4.16)
Finally, for any integer m ≥ 1, (8.4.14) gives us
Am = Cdiag{1, J1m , . . . , Jkm }D. (8.4.17)
Since each of the Jordan blocks J1 , . . . , Jk is of the form (8.4.9) for some
λ ∈ C satisfying |λ| < 1, we have seen that
J1m , . . . , Jkm → 0 as m → ∞. (8.4.18)
Therefore, taking m → ∞ in (8.4.17), we arrive at
lim Am = Cdiag{1, 0, . . . , 0}D. (8.4.19)
m→∞

Inserting the results that the first column vector of C is u = a(1, . . . , 1)t , the first row vector of D is w t = bv t , where v satisfies (8.4.13), and ab = 1 into (8.4.19), we obtain

\lim_{m \to \infty} A^m = \begin{pmatrix} b_1 & \cdots & b_n \\ \vdots & & \vdots \\ b_1 & \cdots & b_n \end{pmatrix} = K,    (8.4.20)

as asserted.
For a Markov matrix A ∈ R(n, n), the power of A, Am , may or may not
approach a limiting matrix as m → ∞. If the limit of Am as m → ∞ exists
and is some K ∈ R(n, n), then it is not hard to show that K is also a Markov
matrix, which is called the stable matrix of A and A is said to be a stable
Markov matrix. Theorem 8.10 says that a regular Markov matrix A is stable
and, at the same time, gives us a constructive method to find the stable matrix
of A.
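Theorem 8.10 translates directly into a computation: find the positive left eigenvector of A for the eigenvalue 1, normalize its entries to sum to 1, and stack it as identical rows. The sketch below (NumPy, an illustrative choice) does this for the matrix of Exercise 8.4.3 and compares the result with a high power of A.

import numpy as np

A = np.array([[0.5 , 0.5 , 0.  ],
              [0.25, 0.5 , 0.25],
              [0.  , 0.5 , 0.5 ]])   # the regular Markov matrix of Exercise 8.4.3

# Solve A^t v = v with the components of v summing to 1.
w, V = np.linalg.eig(A.T)
v = np.real(V[:, np.argmin(np.abs(w - 1))])
v /= v.sum()
K = np.tile(v, (3, 1))               # the stable matrix: identical rows v^t

print(v)                             # [0.25 0.5 0.25]
print(np.allclose(np.linalg.matrix_power(A, 50), K))   # True (up to rounding)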

Exercises

8.4.1 Let A ∈ R(n, n) be a stable Markov matrix and K ∈ R(n, n) the stable
matrix of A. Show that K is also a Markov matrix and satisfies AK =
KA = K.
8.4.2 Show that the matrix

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}    (8.4.21)

is a Markov matrix which is not regular. Is A stable?
8.4.3 Consider the Markov matrix

A = \frac{1}{2}\begin{pmatrix} 1 & 1 & 0 \\ \tfrac12 & 1 & \tfrac12 \\ 0 & 1 & 1 \end{pmatrix}.    (8.4.22)

(i) Check that A is regular by showing that A2 > 0.
(ii) Find the stable matrix of A.
(iii) Use induction to establish the formula

A^m = \frac{1}{2^m}\begin{pmatrix} \tfrac32 + (2^{m-2} - 1) & 2^{m-1} & \tfrac12 + (2^{m-2} - 1) \\ 2^{m-2} & 2^{m-1} & 2^{m-2} \\ \tfrac12 + (2^{m-2} - 1) & 2^{m-1} & \tfrac32 + (2^{m-2} - 1) \end{pmatrix}, \quad m = 1, 2, \ldots.    (8.4.23)

Take m → ∞ in (8.4.23) and verify your result obtained in (ii).
8.4.4 Let A ∈ R(n, n) be a Markov matrix. If At is also a Markov matrix, A is said to be a doubly Markov matrix. Show that the stable matrix K of a regular doubly Markov matrix is simply given by

K = \frac{1}{n}\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}.    (8.4.24)

8.4.5 Let A1 , . . . , Ak ∈ R(n, n) be k doubly Markov matrices. Show that their product A = A1 · · · Ak is also a doubly Markov matrix.
9
Excursion: Quantum mechanics in a nutshell

The content of this chapter may serve as yet another supplemental topic to meet
the needs and interests beyond those of a usual course curriculum. Here we
shall present an over-simplified, but hopefully totally transparent, description
of some of the fundamental ideas and concepts of quantum mechanics, using
a pure linear algebra formalism.

9.1 Vectors in Cn and Dirac bracket

Consider the vector space Cn , consisting of column vectors, and use {e1 , . . . , en } to denote the standard basis of Cn . For u, v ∈ Cn with

u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \sum_{i=1}^{n} a_i e_i, \qquad v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^{n} b_i e_i,    (9.1.1)

recall that the Hermitian scalar product is given by

(u, v) = u^{\dagger} v = \sum_{i=1}^{n} \bar{a}_i b_i,    (9.1.2)

so that {e1 , . . . , en } is a unitary basis, satisfying (ei , ej ) = δij , i, j = 1, . . . , n.
In quantum mechanics, it is customary to rewrite the scalar product (9.1.2) in a bracket form, ⟨u|v⟩. Then it was Dirac who suggested to view ⟨u|v⟩ as the scalar pairing of a 'bra' vector ⟨u| and a 'ket' vector |v⟩, representing the row vector u† and the column vector v. Thus we may use |e1⟩, . . . , |en⟩ to denote the standard basis vectors of Cn and represent the vector u in Cn as

|u\rangle = \sum_{i=1}^{n} a_i |e_i\rangle.    (9.1.3)

Therefore the bra-counterpart of |u⟩ is simply given as

\langle u| = (|u\rangle)^{\dagger} = \sum_{i=1}^{n} \bar{a}_i \langle e_i|.    (9.1.4)

As a consequence, the orthonormal condition regarding the basis {e1 , . . . , en } becomes

\langle e_i | e_j \rangle = \delta_{ij}, \quad i, j = 1, \ldots, n,    (9.1.5)

and the Hermitian scalar product of the vectors |u⟩ and |v⟩ assumes the form

\langle u | v \rangle = \sum_{i=1}^{n} \bar{a}_i b_i = \overline{\langle v | u \rangle}.    (9.1.6)

For the vector |u⟩ given in (9.1.3), we find that

a_i = \langle e_i | u \rangle, \quad i = 1, \ldots, n.    (9.1.7)

Now rewriting |u⟩ as

|u\rangle = \sum_{i=1}^{n} |e_i\rangle a_i,    (9.1.8)

and inserting (9.1.7) into (9.1.8), we obtain

|u\rangle = \sum_{i=1}^{n} |e_i\rangle \langle e_i | u \rangle \equiv \Big( \sum_{i=1}^{n} |e_i\rangle \langle e_i| \Big) |u\rangle,    (9.1.9)

which suggests that the strange-looking 'quantity', \sum_{i=1}^{n} |e_i\rangle\langle e_i|, should naturally be identified as the identity mapping or matrix,

\sum_{i=1}^{n} |e_i\rangle \langle e_i| = I,    (9.1.10)

which readily follows from the associativity property of matrix multiplication. Similarly, we have

\langle u| = \sum_{i=1}^{n} \overline{\langle e_i | u \rangle}\, \langle e_i| = \sum_{i=1}^{n} \langle u | e_i \rangle \langle e_i| = \langle u| \Big( \sum_{i=1}^{n} |e_i\rangle \langle e_i| \Big).    (9.1.11)

Thus (9.1.10) can be applied to both bra and ket vectors symmetrically and what it expresses is simply the fact that |e1⟩, . . . , |en⟩ form an orthonormal basis of Cn .
We may reexamine some familiar linear mappings under the new notation.
Let |u⟩ be a unit vector in Cn . Use P|u⟩ to denote the mapping that projects Cn onto Span{|u⟩} along (Span{|u⟩})⊥ . Then we have

P_{|u\rangle} |v\rangle = \langle u | v \rangle\, |u\rangle, \quad |v\rangle \in \mathbb{C}^n.    (9.1.12)

Placing the scalar number ⟨u|v⟩ to the right-hand side of the above expression, we see that the mapping P|u⟩ can be rewritten as

P_{|u\rangle} = |u\rangle \langle u|.    (9.1.13)

Let {|u1⟩, . . . , |un⟩} be any orthonormal basis of Cn . Then P|ui⟩ projects Cn onto Span{|ui⟩} along

\mathrm{Span}\{|u_1\rangle, \ldots, \widehat{|u_i\rangle}, \ldots, |u_n\rangle\}.    (9.1.14)

It is clear that

I = \sum_{i=1}^{n} P_{|u_i\rangle},    (9.1.15)

since ai = ⟨ui |u⟩, which generalizes the result (9.1.10).
If T is an arbitrary Hermitian operator with eigenvalues λ1 , . . . , λn and the associated orthonormal eigenvectors u1 , . . . , un (which form a basis of Cn ), then we have the representation

T = \sum_{i=1}^{n} \lambda_i |u_i\rangle \langle u_i|,    (9.1.16)

as may be checked easily. Besides, in ⟨u|T |v⟩, we may interpret T as applied either to the ket vector |v⟩, from the left, or to the bra vector ⟨u|, from the right, which will not cause any ambiguity.
To summarize, we may realize Cn by column vectors as ket vectors

|u\rangle = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \quad |v\rangle = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in \mathbb{C}^n,    (9.1.17)

so that the bra vectors are represented as row vectors

\langle u| = (|u\rangle)^{\dagger} = (\bar{a}_1, \ldots, \bar{a}_n), \quad \langle v| = (|v\rangle)^{\dagger} = (\bar{b}_1, \ldots, \bar{b}_n).    (9.1.18)

Consequently,

\langle u | v \rangle = (|u\rangle)^{\dagger} |v\rangle = \sum_{i=1}^{n} \bar{a}_i b_i,    (9.1.19)
|u\rangle \langle v| = |u\rangle (|v\rangle)^{\dagger} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} (\bar{b}_1, \ldots, \bar{b}_n) = \begin{pmatrix} a_1 \bar{b}_1 & \cdots & a_1 \bar{b}_n \\ \vdots & & \vdots \\ a_n \bar{b}_1 & \cdots & a_n \bar{b}_n \end{pmatrix} = (a_i \bar{b}_j).    (9.1.20)

In particular, we have the representation

\sum_{i=1}^{n} |e_i\rangle \langle e_i| = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.    (9.1.21)
Finally, if A ∈ C(n, n) is a Hermitian matrix (or, equivalently, is viewed as a self-adjoint mapping over Cn ), then in ⟨u|A|v⟩ the matrix A may be applied either to the ket vector |v⟩ from the left or to the bra vector ⟨u| from the right, without ambiguity, since

\langle u| A |v\rangle = \langle u| (A|v\rangle) = (\langle u|A) |v\rangle = (A|u\rangle)^{\dagger} |v\rangle.    (9.1.22)
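In the column-vector realization above, bras are conjugate transposes and outer products |u⟩⟨v| are ordinary rank-one matrices, so the completeness relation (9.1.10) and the spectral representation (9.1.16) can be checked mechanically; the small NumPy sketch below (an illustrative choice of tool, not part of the text) does exactly that for a random Hermitian matrix.

import numpy as np

n = 3
rng = np.random.default_rng(2)

# A random Hermitian "observable" and its spectral data.
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
lam, U = np.linalg.eigh(A)           # columns of U are the kets |u_i>

# Completeness relation (9.1.10): sum_i |u_i><u_i| = I.
P = sum(np.outer(U[:, i], U[:, i].conj()) for i in range(n))
print(np.allclose(P, np.eye(n)))     # True

# Spectral representation (9.1.16): A = sum_i lambda_i |u_i><u_i|.
A_rebuilt = sum(lam[i] * np.outer(U[:, i], U[:, i].conj()) for i in range(n))
print(np.allclose(A, A_rebuilt))     # True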

Exercises

9.1.1 Consider the orthonormal basis of C2 consisting of the ket vectors

|u_1\rangle = \frac{1}{2}\begin{pmatrix} 1+\mathrm{i} \\ 1-\mathrm{i} \end{pmatrix}, \quad |u_2\rangle = \frac{1}{2}\begin{pmatrix} 1+\mathrm{i} \\ -1+\mathrm{i} \end{pmatrix}.    (9.1.23)

Verify the identity

|u_1\rangle\langle u_1| + |u_2\rangle\langle u_2| = I_2.    (9.1.24)

9.1.2 Consider the Hermitian matrix

A = \begin{pmatrix} -1 & 3+\mathrm{i} \\ 3-\mathrm{i} & 2 \end{pmatrix}    (9.1.25)
in C(2, 2).
(i) Find an orthonormal basis of C2 consisting of eigenvectors, say
|u1 , |u2 , associated with the eigenvalues, say λ1 , λ2 , of A.
(ii) Find the matrices that represent the orthogonal projections

P_{|u_1\rangle} = |u_1\rangle\langle u_1|, \quad P_{|u_2\rangle} = |u_2\rangle\langle u_2|.    (9.1.26)
(iii) Verify the identity

A = \lambda_1 P_{|u_1\rangle} + \lambda_2 P_{|u_2\rangle} = \lambda_1 |u_1\rangle\langle u_1| + \lambda_2 |u_2\rangle\langle u_2|.    (9.1.27)
9.2 Quantum mechanical postulates


In physics literature, quantum mechanics may be formulated in terms of two
objects referred to as states and observables. A state, also called a wave func-
tion, contains statistical information of a certain physical observable, which
often describes one of some measurable quantities such as energy, momenta,
and position coordinates of a mechanical system, such as those encountered
in describing the motion of a hypothetical particle. Mathematically, states are
vectors in a complex vector space with a positive definite Hermitian scalar
product, called the state space, and observables are Hermitian mappings over
the state space given. In this section, we present an over-simplified formalism
of quantum mechanics using the space Cn as the state space and Hermitian
matrices in C(n, n) as observables.
To proceed, we state a number of axioms, in our context, called the quantum
mechanical postulates, based on which quantum mechanics is built.

• State postulate. The state of a mechanical system, hereby formally referred to as 'a particle', is described by a unit ket vector |φ⟩ in Cn .
• Observable postulate. A physically measurable quantity of the particle such as energy, momenta, etc., called an observable, is represented by a Hermitian matrix A ∈ C(n, n) so that its expected value for the particle in the state |φ⟩, denoted by ⟨A⟩, is given by

\langle A \rangle = \langle \phi | A | \phi \rangle.    (9.2.1)
• Measurement postulate. Let A ∈ C(n, n) be an observable with eigenvalues λ1 , . . . , λn , which are known to be all real, and let the corresponding eigenvectors of A be denoted by |u1⟩, . . . , |un⟩, which form an orthonormal basis of Cn . As a random variable, the measurement XA of the observable A, when the particle lies in the state |φ⟩, can result only in a reading among the eigenvalues of A, and obeys the probability distribution

P(\{X_A = \lambda\}) = \begin{cases} \displaystyle\sum_{\lambda_i = \lambda} |\phi_i|^2, & \text{if } \lambda \text{ is an eigenvalue}, \\ 0, & \text{if } \lambda \text{ is not an eigenvalue}, \end{cases}    (9.2.2)
where φ1 , . . . , φn ∈ C are the coordinates of |φ⟩ with respect to the basis {u1 , . . . , un }, that is,

|\phi\rangle = \sum_{i=1}^{n} \phi_i |u_i\rangle.    (9.2.3)

The measurement postulate (9.2.2) may be viewed as directly motivated from the expected (or expectation) value formula

\langle A \rangle = \sum_{i=1}^{n} \lambda_i |\phi_i|^2,    (9.2.4)

which can be obtained by substituting (9.2.3) into (9.2.1).
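Concretely, given an observable A and a normalized state |φ⟩, the distribution of XA and the expected value ⟨A⟩ = ⟨φ|A|φ⟩ follow from the eigendecomposition of A; a short NumPy sketch (illustrative sample data, not taken from the text):

import numpy as np

rng = np.random.default_rng(3)

# A sample observable (random Hermitian matrix) and a normalized state in C^3.
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (M + M.conj().T) / 2
phi = rng.standard_normal(3) + 1j * rng.standard_normal(3)
phi /= np.linalg.norm(phi)

lam, U = np.linalg.eigh(A)              # eigenvalues and orthonormal eigenvectors |u_i>
coeff = U.conj().T @ phi                # phi_i = <u_i|phi>
probs = np.abs(coeff) ** 2              # P({X_A = lambda_i}) = |phi_i|^2

print(probs.sum())                      # 1.0: the probabilities sum to one
print(np.dot(lam, probs))               # <A> computed via (9.2.4)
print(np.real(phi.conj() @ A @ phi))    # <A> computed via (9.2.1); the two agree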


In the context of the measurement postulate, we have the following quantum
mechanical interpretation of an eigenstate.

Theorem 9.1 The particle lies in an eigenstate of the observable A if and only if all measurements of A render the same value, which must be some eigenvalue of A.

Proof Suppose that λ is an eigenvalue of A. Use Eλ to denote the corresponding eigenspace. If the particle lies in a state |φ⟩ ∈ Eλ , then φi = ⟨ui |φ⟩ = 0 when λi ≠ λ, where λ1 , . . . , λn are all the possible eigenvalues of A and |u1⟩, . . . , |un⟩ are the corresponding eigenvectors that form an orthonormal basis of Cn . Therefore

P(\{X_A = \lambda_i\}) = 0 \ \text{ when } \lambda_i \neq \lambda,    (9.2.5)

which leads to P ({XA = λ}) = ⟨φ|φ⟩ = 1.
Conversely, if P ({XA = λ}) = 1 holds for some λ when the particle lies in the state |φ⟩, then according to the measurement postulate (9.2.2), we have λ = λi for some i = 1, . . . , n. Hence

0 = 1 - P(\{X_A = \lambda_i\}) = \sum_{\lambda_j \neq \lambda_i} |\phi_j|^2,    (9.2.6)

from which it follows that φj = 0 whenever λj ≠ λi . Thus |φ⟩ ∈ Eλi as claimed.
We are now at a position to consider how a state evolves itself with respect
to time t.
• Time evolution postulate. A state vector |φ⟩ follows time evolution according to the law

\mathrm{i}\hbar \frac{\mathrm{d}}{\mathrm{d}t} |\phi\rangle = H |\phi\rangle,    (9.2.7)
where H ∈ C(n, n) is a Hermitian matrix called the Hamiltonian of the system


and h̄ > 0 a universal constant called the Planck constant. Equation (9.2.7) is
the matrix version of the celebrated Schrödinger equation.
In classical mechanics, the Hamiltonian H of a system measures the total
energy, which may be written as the sum of the kinetic energy K and potential
energy V ,

H = K + V. (9.2.8)

If the system consists of a single particle of mass m > 0, then K may be expressed in terms of the momentum P of the particle through the relation

K = \frac{1}{2m} P^2.    (9.2.9)
In quantum mechanics in our context here, both P and V are taken to be Her-
mitian matrices.
Let |φ(0)⟩ = |φ0⟩ be the initial state of the time-dependent state vector |φ(t)⟩. Solving (9.2.7), we obtain

|\phi(t)\rangle = U(t) |\phi_0\rangle, \quad U(t) = \mathrm{e}^{-\frac{\mathrm{i}}{\hbar} t H}.    (9.2.10)

Since H is Hermitian, −(i/h̄)H is anti-Hermitian. Hence U (t) is unitary,

U(t)\, U^{\dagger}(t) = I,    (9.2.11)

which ensures the conservation of the normality of the state vector |φ(t)⟩. That is,

\langle \phi(t) | \phi(t) \rangle = \langle \phi_0 | \phi_0 \rangle = 1.    (9.2.12)

Assume that the eigenvalues of H , say λ1 , . . . , λn , are all positive. Let u1 , . . . , un be the associated eigenstates of H that form an orthonormal basis of Cn . With the expansion

|\phi_0\rangle = \sum_{i=1}^{n} \phi_{0,i} |u_i\rangle,    (9.2.13)

we may rewrite the state vector |φ(t)⟩ as

|\phi(t)\rangle = \sum_{i=1}^{n} \phi_{0,i}\, \mathrm{e}^{-\frac{\mathrm{i}}{\hbar} t H} |u_i\rangle = \sum_{i=1}^{n} \mathrm{e}^{-\frac{\mathrm{i}}{\hbar} \lambda_i t} \phi_{0,i} |u_i\rangle = \sum_{i=1}^{n} \mathrm{e}^{-\mathrm{i} \omega_i t} \phi_{0,i} |u_i\rangle,    (9.2.14)
where

\omega_i = \frac{\lambda_i}{\hbar}, \quad i = 1, \ldots, n,    (9.2.15)

are the angular frequencies of the eigenmodes

\mathrm{e}^{-\mathrm{i} \omega_i t} \phi_{0,i} |u_i\rangle, \quad i = 1, \ldots, n.    (9.2.16)

In other words, the state vector or wave function |φ(t)⟩ is a superposition of n eigenmodes with the associated angular frequencies determined by the eigenvalues of the Hamiltonian through (9.2.15).
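In this finite-dimensional setting the evolution operator U(t) is just a matrix exponential, so (9.2.10) and the eigenmode expansion (9.2.14) can be compared directly; the sketch below uses SciPy's expm (an illustrative choice), with h̄ set to 1 and sample data chosen only for the illustration.

import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[4, 1j],
              [-1j, 4]])                       # a Hermitian Hamiltonian
phi0 = np.array([1, -1]) / np.sqrt(2)          # normalized initial state

t = 0.7
U = expm(-1j * t * H / hbar)                   # U(t) = exp(-i t H / hbar)
phi_t = U @ phi0

# The same state via the eigenmode expansion (9.2.14).
lam, V = np.linalg.eigh(H)
c = V.conj().T @ phi0                          # phi_{0,i} = <u_i|phi_0>
phi_modes = V @ (np.exp(-1j * lam * t / hbar) * c)

print(np.allclose(phi_t, phi_modes))           # True
print(np.vdot(phi_t, phi_t).real)              # 1.0: unitarity preserves the norm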
As an observable, the Hamiltonian H measures the total energy. Thus the
eigenvalues λ1 , . . . , λn of H are the possible energy values of the system. For
this reason, we may use E to denote a generic energy value, among λ1 , . . . , λn .
Correspondingly, we use ω to denote a generic angular frequency, among
ω1 , . . . , ωn . Therefore we arrive at the generic relation

E = h̄ω, (9.2.17)

known as the Einstein formula, arising originally in the work of Einstein


towards an understanding of the photoelectric effect, which later became one of
the two basic equations in the wave–particle duality hypothesis of de Broglie
and laid the very foundation of quantum mechanics. Roughly speaking, the
formula (9.2.17) indicates that a particle of energy E behaves like a wave of
angular frequency ω and that a wave of angular frequency ω also behaves like
a particle of energy E.
Let A be an observable and assume that the system lies in the state |φ(t)⟩ which is governed by the Schrödinger equation (9.2.7). We investigate whether the expectation value ⟨A⟩(t) = ⟨φ(t)|A|φ(t)⟩ is conserved or time-independent. For this purpose, we compute

\frac{\mathrm{d}}{\mathrm{d}t} \langle A \rangle(t) = \frac{\mathrm{d}}{\mathrm{d}t} \langle \phi(t) | A | \phi(t) \rangle = \Big( \frac{\mathrm{d}}{\mathrm{d}t} |\phi(t)\rangle \Big)^{\dagger} A |\phi(t)\rangle + \langle \phi(t) | A \Big( \frac{\mathrm{d}}{\mathrm{d}t} |\phi(t)\rangle \Big) = \Big( -\frac{\mathrm{i}}{\hbar} H |\phi(t)\rangle \Big)^{\dagger} A |\phi(t)\rangle + \langle \phi(t) | A \Big( -\frac{\mathrm{i}}{\hbar} H |\phi(t)\rangle \Big) = \frac{\mathrm{i}}{\hbar} \big\langle \phi(t) \big| [H, A] \big| \phi(t) \big\rangle,    (9.2.18)
where
[H, A] = H A − AH (9.2.19)
is the commutator of H and A which measures the non-commutativity of H
and A. Hence we see that A is time-independent if A commutes with H :
[H, A] = 0. (9.2.20)
In particular, the average energy H  is always conserved. Furthermore, if H is
related through the momentum P and potential V through (9.2.8) and (9.2.9)
so that P commutes with V , then [H, P ] = 0 and the average momentum
P  is also conserved. These are the quantum mechanical extensions of laws
of conservation for energy and momentum.
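The commutator identity (9.2.18) is easy to test numerically: evolve |φ(t)⟩, difference ⟨A⟩(t) in t, and compare with (i/h̄)⟨φ|[H, A]|φ⟩. A minimal sketch (NumPy/SciPy, h̄ = 1, sample matrices chosen only for illustration):

import numpy as np
from scipy.linalg import expm

H = np.array([[4, 1j], [-1j, 4]])
A = np.array([[0, 1], [1, 0]])                  # an observable not commuting with H
phi0 = np.array([1, -1]) / np.sqrt(2)

def expect(t):
    phi = expm(-1j * t * H) @ phi0
    return np.real(np.vdot(phi, A @ phi))       # <A>(t) = <phi(t)|A|phi(t)>

t, h = 0.3, 1e-6
lhs = (expect(t + h) - expect(t - h)) / (2 * h)  # d<A>/dt by central difference

phi = expm(-1j * t * H) @ phi0
comm = H @ A - A @ H                             # here [H, A] = 2i sigma_3, nonzero
rhs = np.real(1j * np.vdot(phi, comm @ phi))     # (i/hbar)<phi|[H,A]|phi>

print(np.isclose(lhs, rhs))                      # True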

Exercises

9.2.1 Consider the Hermitian matrix

A = \begin{pmatrix} 5 & \mathrm{i} & 0 \\ -\mathrm{i} & 3 & 1-\mathrm{i} \\ 0 & 1+\mathrm{i} & 5 \end{pmatrix}    (9.2.21)
as an observable of a system.
(i) Find the eigenvalues of A as possible readings when measuring the
observable A.
(ii) Find the corresponding unit eigenvectors of A, say |u1 , |u2 , |u3 ,
which form an orthonormal basis of the state space C3 .
(iii) Assume that the state the system occupies is given by the vector

|\phi\rangle = \frac{1}{\sqrt{15}} \begin{pmatrix} \mathrm{i} \\ 2+\mathrm{i} \\ 3 \end{pmatrix}    (9.2.22)

and resolve |φ⟩ in terms of |u1⟩, |u2⟩, |u3⟩.
(iv) If the system stays in the state |φ, determine the probability dis-
tribution function of the random variable XA , which is the random
value read each time when a measurement about A is made.
(v) If the system stays in the state |φ, evaluate the expected value, A,
of XA .
9.2.2 Consider the Schrödinger equation

\mathrm{i}\hbar \frac{\mathrm{d}}{\mathrm{d}t} |\phi\rangle = H |\phi\rangle, \quad H = \begin{pmatrix} 4 & \mathrm{i} \\ -\mathrm{i} & 4 \end{pmatrix}.    (9.2.23)
(i) Find the orthonormal eigenstates of H and use them to construct the solution |φ(t)⟩ of (9.2.23) satisfying the initial condition

|\phi(0)\rangle = |\phi_0\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.    (9.2.24)

(ii) Consider a perturbation of the Hamiltonian H given as

H_{\varepsilon} = H + \varepsilon \sigma_1, \quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \varepsilon \in \mathbb{R},    (9.2.25)

where σ1 is known as one of the Pauli matrices. Show that the commutator of H and Hε is

[H, H_{\varepsilon}] = 2\mathrm{i}\varepsilon \sigma_3, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},    (9.2.26)
where σ3 is another Pauli matrix, and use it and (9.2.18) to evaluate
the rate of change of the time-dependent expected value Hε (t) =
φ(t)|Hε |φ(t).
(iii) Establish the formula

\langle \phi(t) | H_{\varepsilon} | \phi(t) \rangle = \langle \phi_0 | H | \phi_0 \rangle + \varepsilon \langle \phi(t) | \sigma_1 | \phi(t) \rangle    (9.2.27)

and use it to verify the result regarding \frac{\mathrm{d}}{\mathrm{d}t}\langle H_{\varepsilon} \rangle(t) obtained in (ii) through the commutator identity (9.2.18).

9.3 Non-commutativity and uncertainty principle


Let {|ui } be an orthonormal set of eigenstates of a Hermitian matrix A ∈
C(n, n) with the associated real eigenvalues {λi } and use XA to denote the
random variable of measurement of A. If the system lies in the state |φ
with φi = ui |φ (i = 1, 2, · · · ), the distribution function of XA is as
given in (9.2.2). Thus the variance of XA can be calculated according to the
formula

n
σA2 = (λi − A)2 |φi |2
i=1

= (|(A − AI )|φ)† (A − AI )|φ


= φ|(A − AI )|(A − AI )|φ, (9.3.1)
which is given in a form free of the choice of the basis {|ui }. Thus,
with (9.3.1), if A and B are two observables, then the Schwarz inequality
implies that
$$\sigma_A^2\sigma_B^2 \ge \big|\langle\phi|(A - \langle A\rangle I)(B - \langle B\rangle I)|\phi\rangle\big|^2 \equiv |c|^2, \qquad (9.3.2)$$
where the complex number c is given by
$$c = \langle\phi|(A - \langle A\rangle I)(B - \langle B\rangle I)|\phi\rangle = \langle\phi|AB|\phi\rangle - \langle B\rangle\langle\phi|A|\phi\rangle - \langle A\rangle\langle\phi|B|\phi\rangle + \langle A\rangle\langle B\rangle\langle\phi|\phi\rangle = \langle AB\rangle - \langle A\rangle\langle B\rangle. \qquad (9.3.3)$$
Interchanging A and B, we have
$$\bar{c} = \langle\phi|(B - \langle B\rangle I)(A - \langle A\rangle I)|\phi\rangle = \langle BA\rangle - \langle A\rangle\langle B\rangle. \qquad (9.3.4)$$
Therefore, we obtain
$$\Im(c) = \frac{1}{2i}(c - \bar{c}) = \frac{1}{2i}\langle[A, B]\rangle. \qquad (9.3.5)$$
Inserting (9.3.5) into (9.3.2), we arrive at the inequality
$$\sigma_A^2\sigma_B^2 \ge \left(\frac{1}{2i}\langle[A, B]\rangle\right)^2, \qquad (9.3.6)$$
2i
which roughly says that if two observables are non-commutative, we cannot
achieve simultaneous high-precision measurements for them. To put the state-
ment in another way, if we know one observable with high precision, we do
not know the other observable at the same time with high precision. This fact, expressed in particular by the inequality (9.3.6), is known in quantum mechanics as the Heisenberg uncertainty principle.
On the other hand, when A and B commute, we know that A and B share
the same eigenstates which may form an orthonormal basis of Cn . Let φ be a
commonly shared eigenstate so that
A|φ = λA |φ, B|φ = λB |φ, λA , λB ∈ R. (9.3.7)
If the system lies in the state |φ, then, with simultaneous full certainty, the
measured values of the observables A and B are λA and λB , respectively.
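A quick numerical check of (9.3.6) can be instructive. The following minimal Python sketch (not part of the text; it assumes NumPy is available) uses, for instance, the observables and the state of Exercise 9.3.1 below and compares the product of the two variances with the commutator bound.

import numpy as np

A = np.array([[1.0, 1j], [-1j, 2.0]])
B = np.array([[-2.0, 1 - 1j], [1 + 1j, 3.0]])
phi = np.array([-1.0, 2.0]) / np.sqrt(5)       # the state of Exercise 9.3.1

def mean(M, psi):
    return np.vdot(psi, M @ psi)               # <psi|M|psi>

def variance(M, psi):
    return (mean(M @ M, psi) - mean(M, psi) ** 2).real

lhs = variance(A, phi) * variance(B, phi)
rhs = abs(mean(A @ B - B @ A, phi) / 2j) ** 2  # |<[A,B]>/(2i)|^2
print("sigma_A^2 sigma_B^2 =", lhs)
print("commutator bound    =", rhs)
print("inequality (9.3.6) holds:", lhs >= rhs - 1e-12)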
Let k ≥ 1 be an integer and define the kth moment of an observable A in the
state |φ by
Ak  = φ|Ak |φ. (9.3.8)
Thus, in (9.3.3), when we set B = A, we see that the variance σ_A² of A in the state |φ⟩ may be computed using the formula
$$\sigma_A^2 = \langle A^2\rangle - \langle A\rangle^2. \qquad (9.3.9)$$
In probability theory, the square root of the variance, $\sigma_A = \sqrt{\sigma_A^2}$, is called the standard deviation. In quantum mechanics, σ_A is also called the uncertainty, which measures the randomness of the observed values of the observable A.
It will be instructive to identify those states which will render the maximum
uncertainty. To simplify our discussion, we shall assume that A ∈ C(n, n) has
n distinct eigenvalues λ1 , . . . , λn . As before, we use |u1 , . . . , |un  to denote
the corresponding eigenstates of A which form an orthonormal basis of the
state space Cn . In order to emphasize the dependence of the uncertainty on the
underlying state |φ, we use σA,|φ 2
to denote the associated variance. We are
to solve the problem
2
max{σA,|φ | φ|φ = 1}. (9.3.10)

For this purpose, we write any normalized state vector |φ as


n
|φ = φi |ui . (9.3.11)
i=1

Then A2  and A are given by


n 
n
A2  = λ2i |φi |2 , A = λi |φi |2 . (9.3.12)
i=1 i=1

Hence, we have
 n 2

n 
2
σA,|φ = λ2i |φi |2 − λi |φi |2 . (9.3.13)
i=1 i=1

To ease computation, we replace |φᵢ| by xᵢ ∈ ℝ (i = 1, . . . , n) and consider instead the constrained maximization problem
$$\begin{cases}\displaystyle \max\left\{\sum_{i=1}^n\lambda_i^2x_i^2 - \left(\sum_{i=1}^n\lambda_ix_i^2\right)^2\right\},\\[2mm] \displaystyle \sum_{i=1}^n x_i^2 = 1.\end{cases} \qquad (9.3.14)$$
Thus, using calculus, the maximum points are to be sought among the solutions of the equations
$$x_i\left(\lambda_i^2 - 2\lambda_i\left[\sum_{j=1}^n\lambda_jx_j^2\right] - \xi\right) = 0, \quad i = 1,\dots,n, \qquad (9.3.15)$$
where ξ ∈ ℝ is a Lagrange multiplier. Multiplying this equation by xᵢ and summing over i = 1, . . . , n, we find
$$\xi = \langle A^2\rangle - 2\langle A\rangle^2. \qquad (9.3.16)$$
Consequently (9.3.15) takes the form
$$x_i\left(\lambda_i^2 - 2\langle A\rangle\lambda_i - [\langle A^2\rangle - 2\langle A\rangle^2]\right) = 0, \quad i = 1,\dots,n. \qquad (9.3.17)$$

On the other hand, since the quadratic equation
$$\lambda^2 - 2\langle A\rangle\lambda - (\langle A^2\rangle - 2\langle A\rangle^2) = 0 \qquad (9.3.18)$$
has two real roots
$$\lambda = \langle A\rangle \pm \sqrt{\langle A^2\rangle - \langle A\rangle^2} = \langle A\rangle \pm \sigma_A \qquad (9.3.19)$$
in the nontrivial situation σ_A > 0, we see that there are at least n − 2 values of i = 1, . . . , n such that
$$\lambda_i^2 - 2\langle A\rangle\lambda_i - (\langle A^2\rangle - 2\langle A\rangle^2) \neq 0, \qquad (9.3.20)$$

which leads us to conclude in view of (9.3.17) that xᵢ = 0 at those values of i. For definiteness, we assume xᵢ = 0 when i ≠ 1, 2. Hence (9.3.14) is reduced to
$$\begin{cases}\max\left\{\lambda_1^2x_1^2 + \lambda_2^2x_2^2 - \left(\lambda_1x_1^2 + \lambda_2x_2^2\right)^2\right\},\\ x_1^2 + x_2^2 = 1.\end{cases} \qquad (9.3.21)$$

Using the constraint in (9.3.21), we may simplify the objective function of the problem (9.3.21) into the form
$$(\lambda_1 - \lambda_2)^2(x_1^2 - x_1^4), \qquad (9.3.22)$$
which may be further maximized to give us the solution
$$|x_1| = |x_2| = \frac{1}{\sqrt{2}}. \qquad (9.3.23)$$
In this case, it is straightforward to check that λ1 and λ2 are indeed the two roots, given in (9.3.19), of the equation (9.3.18). In particular,
$$\sigma^2_{A,|\phi\rangle} = \frac{1}{4}(\lambda_1 - \lambda_2)^2. \qquad (9.3.24)$$
Consequently, if we use λmin and λmax to denote the smallest and largest eigenvalues of A, then we see in view of (9.3.24) that the maximum uncertainty is given by
$$\sigma_{A,\max} = \frac{1}{2}(\lambda_{\max} - \lambda_{\min}), \qquad (9.3.25)$$
which is achieved when the system occupies the maximum uncertainty state
$$|\phi_{\max}\rangle = a|u_{\lambda_{\max}}\rangle + b|u_{\lambda_{\min}}\rangle, \quad a, b\in\mathbb{C}, \quad |a| = |b| = \frac{1}{\sqrt{2}}. \qquad (9.3.26)$$
In other words, we have the result
$$\sigma_{A,|\phi_{\max}\rangle} = \sigma_{A,\max}. \qquad (9.3.27)$$
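The maximum-uncertainty conclusion (9.3.25)–(9.3.27) is easy to test numerically. The following minimal Python sketch (not part of the text; it assumes NumPy is available) uses, for instance, the observable of Exercise 9.3.3 below, forms the equal-weight superposition of the eigenvectors belonging to the extreme eigenvalues, and compares its uncertainty with (λmax − λmin)/2.

import numpy as np

A = np.array([[1.0, 2 + 1j, 0.0], [2 - 1j, -3.0, 0.0], [0.0, 0.0, 5.0]])
evals, evecs = np.linalg.eigh(A)                 # eigenvalues in increasing order

phi_max = (evecs[:, 0] + evecs[:, -1]) / np.sqrt(2)   # the state (9.3.26) with a = b = 1/sqrt(2)

def sigma(M, psi):
    m1 = np.vdot(psi, M @ psi).real
    m2 = np.vdot(psi, M @ M @ psi).real
    return np.sqrt(m2 - m1 ** 2)

print("sigma_A in |phi_max>       :", sigma(A, phi_max))
print("(lambda_max - lambda_min)/2:", (evals[-1] - evals[0]) / 2)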

Exercises

9.3.1 Let A, B ∈ C(2, 2) be two observables given by
$$A = \begin{pmatrix} 1 & i\\ -i & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} -2 & 1-i\\ 1+i & 3 \end{pmatrix}. \qquad (9.3.28)$$
Evaluate the quantities σ_A², σ_B², and ⟨[A, B]⟩ in the state
$$|\phi\rangle = \frac{1}{\sqrt{5}}\begin{pmatrix} -1\\ 2 \end{pmatrix}, \qquad (9.3.29)$$
and use them to check the uncertainty principle (9.3.6).


9.3.2 Consider an observable A ∈ C(n, n), which has n distinct eigenval-
ues λ1 , . . . , λn . Let |u1 , . . . , |un  be the corresponding eigenstates of
A which form an orthonormal basis of Cn . Consider the uniform state
given by
$$|\phi\rangle = \frac{1}{\sqrt{n}}\sum_{i=1}^n|u_i\rangle. \qquad (9.3.30)$$

(i) Compute the uncertainty of A in the state |φ.


(ii) Compare your result with the maximum uncertainty given in the
expression (9.3.25).
9.3.3 Let A ∈ C(3, 3) be an observable given by
$$A = \begin{pmatrix} 1 & 2+i & 0\\ 2-i & -3 & 0\\ 0 & 0 & 5 \end{pmatrix}. \qquad (9.3.31)$$
(i) Compute the maximum uncertainty of A.
(ii) Find all maximum uncertainty states of A.
(iii) Let |φ be the uniform state defined in Exercise 9.3.2. Find the
uncertainty of A in the state |φ and compare it with the maximum
uncertainty of A found in (i).

9.4 Heisenberg picture for quantum mechanics


So far our description of quantum mechanics has been based on the
Schrödinger equation, which governs the evolution of a state vector. Such a
description of quantum mechanics is also called a Schrödinger picture. Here
we study another important description of quantum mechanics called the
Heisenberg picture within which the state vector is time-independent but
observables evolve with respect to time following a dynamical equation similar
to that seen in classical mechanics.
We start from the Schrödinger equation (9.2.7) defined by the Hamiltonian H. Let |φ(t)⟩ be a state vector and |φ(0)⟩ = |φ0⟩. Then
$$|\phi(t)\rangle = e^{-\frac{i}{\hbar}tH}|\phi_0\rangle. \qquad (9.4.1)$$
Thus, for any observable A, its expected value in the state |φ(t)⟩ is given by
$$\langle A\rangle(t) = \langle\phi(t)|A|\phi(t)\rangle = \langle\phi_0|e^{\frac{i}{\hbar}tH}Ae^{-\frac{i}{\hbar}tH}|\phi_0\rangle. \qquad (9.4.2)$$
On the other hand, if we use the time-independent state vector |φ0⟩ and replace A with a correctly formulated time-dependent version, A(t), we are to have the same mechanical conclusion. In particular, the expected value of A at time t in the state |φ(t)⟩ must be equal to the expected value of A(t) in the state |φ0⟩. That is,
$$\langle\phi(t)|A|\phi(t)\rangle = \langle\phi_0|A(t)|\phi_0\rangle. \qquad (9.4.3)$$
Comparing (9.4.2) and (9.4.3), we obtain
$$\langle\phi_0|\left(e^{\frac{i}{\hbar}tH}Ae^{-\frac{i}{\hbar}tH} - A(t)\right)|\phi_0\rangle = 0. \qquad (9.4.4)$$
Therefore, using the arbitrariness of |φ0⟩, we arrive at the relation
$$A(t) = e^{\frac{i}{\hbar}tH}Ae^{-\frac{i}{\hbar}tH}, \qquad (9.4.5)$$
which indicates how an observable should evolve with respect to time. Differentiating (9.4.5), we are led to the equation
$$\frac{d}{dt}A(t) = \frac{i}{\hbar}[H, A(t)], \qquad (9.4.6)$$
which spells out the dynamical law that a time-dependent observable must follow and is known as the Heisenberg equation.
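Before proceeding, the equivalence of the two pictures for expectation values is easy to illustrate numerically. The following minimal Python sketch (not part of the text; it assumes NumPy and SciPy are available, sets h-bar to 1, and borrows the H, A, and |φ0⟩ of Exercise 9.4.2 below) forms A(t) from (9.4.5) and checks that ⟨φ0|A(t)|φ0⟩ equals the Schrödinger-picture value ⟨φ(t)|A|φ(t)⟩.

import numpy as np
from scipy.linalg import expm

hbar = 1.0
H = np.array([[2.0, 1j], [-1j, 2.0]])
A = np.array([[1.0, 1 - 1j], [1 + 1j, 1.0]])
phi0 = np.array([1.0, 1j]) / np.sqrt(2)

t = 1.3
U = expm(-1j * t * H / hbar)                 # e^{-iHt/hbar}
A_t = U.conj().T @ A @ U                     # Heisenberg-picture observable A(t)
heisenberg = np.vdot(phi0, A_t @ phi0).real
schroedinger = np.vdot(U @ phi0, A @ (U @ phi0)).real
print(heisenberg, schroedinger)              # the two numbers coincide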
We next show that (9.4.6) implies (9.2.7) as well.
As preparation, we establish the following Gronwall inequality which is useful in the study of differential equations: If f(t) and g(t) are continuous non-negative functions in t ≥ 0 and satisfy
$$f(t) \le a + \int_0^t f(\tau)g(\tau)\,d\tau, \quad t \ge 0, \qquad (9.4.7)$$
for some constant a ≥ 0, then
$$f(t) \le a\exp\left(\int_0^t g(\tau)\,d\tau\right), \quad t \ge 0. \qquad (9.4.8)$$
To prove it, we modify (9.4.7) into
$$f(t) < a + \varepsilon + \int_0^t f(\tau)g(\tau)\,d\tau, \quad t \ge 0, \qquad (9.4.9)$$
and set
$$h(t) = a + \varepsilon + \int_0^t f(\tau)g(\tau)\,d\tau, \quad t \ge 0, \qquad (9.4.10)$$
where ε > 0. Therefore h(t) is positive-valued and differentiable with h′(t) = f(t)g(t) for t > 0 and h(0) = a + ε. Multiplying (9.4.9) by g(t), we have
$$\frac{h'(t)}{h(t)} \le g(t), \quad t \ge 0. \qquad (9.4.11)$$
Integrating (9.4.11), we obtain
$$h(t) \le h(0)\exp\left(\int_0^t g(\tau)\,d\tau\right), \quad t \ge 0. \qquad (9.4.12)$$
However, (9.4.9) and (9.4.10) indicate that f(t) < h(t) (t ≥ 0). Hence we may use (9.4.12) to get
$$f(t) < (a + \varepsilon)\exp\left(\int_0^t g(\tau)\,d\tau\right), \quad t \ge 0. \qquad (9.4.13)$$
Finally, since ε > 0 is arbitrary, we see that (9.4.8) follows.
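As a simple illustration of how (9.4.7)–(9.4.8) are used (this instance is not taken from the text), suppose a continuous non-negative function f satisfies
$$f(t) \le 1 + \int_0^t 2f(\tau)\,d\tau, \quad t \ge 0.$$
Taking a = 1 and g ≡ 2 in the Gronwall inequality gives f(t) ≤ e^{2t} for t ≥ 0; the bound is attained by f(t) = e^{2t}, which solves f′ = 2f with f(0) = 1.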
We now turn our attention back to the Heisenberg equation (9.4.6). Suppose that B(t) is another solution of the equation. Then C(t) = A(t) − B(t) satisfies
$$\frac{d}{dt}C(t) = \frac{i}{\hbar}[H, C(t)], \qquad C(0) = A(0) - B(0). \qquad (9.4.14)$$
Therefore, we have
$$\|C'(t)\| \le \frac{2}{\hbar}\|H\|\,\|C(t)\|. \qquad (9.4.15)$$
On the other hand, we may use the triangle inequality to get
$$\|C(t+h)\| - \|C(t)\| \le \|C(t+h) - C(t)\|, \quad h > 0, \qquad (9.4.16)$$
which allows us to conclude with
$$\frac{d}{dt}\|C(t)\| \le \|C'(t)\|. \qquad (9.4.17)$$
Inserting (9.4.17) into (9.4.15) and integrating, we find
$$\|C(t)\| \le \|C(0)\| + \frac{2}{\hbar}\|H\|\int_0^t\|C(\tau)\|\,d\tau, \quad t \ge 0. \qquad (9.4.18)$$
Consequently, it follows from applying the Gronwall inequality that
$$\|C(t)\| \le \|C(0)\|e^{\frac{2}{\hbar}\|H\|t}, \quad t \ge 0. \qquad (9.4.19)$$
The same argument may be carried out in the domain t ≤ 0 with the time flipping t ↦ −t. Thus, in summary, we obtain the collective conclusion
$$\|C(t)\| \le \|C(0)\|e^{\frac{2}{\hbar}\|H\|\,|t|}, \quad t\in\mathbb{R}. \qquad (9.4.20)$$
In particular, if C(0) = 0, then C(t) ≡ 0, which implies that the solution to
the initial value problem of the Heisenberg equation (9.4.6) is unique. Hence,
if A(0) = A, then the solution is uniquely given by (9.4.5). As a consequence,
if A commutes with H , A(t) = A for all time t. In other words, if an observ-
able commutes with the Hamiltonian initially, it stays commutative with the
Hamiltonian and remains in fact constant for all time.
We can now derive the Schrödinger equation (9.2.7) from the Heisenberg
equation (9.4.6).
In fact, let A(t) be the unique solution of (9.4.6) evolving from its initial
state A. Then A(t) is given by (9.4.5). Let |φ(t) denote the state vector that
evolves with respect to t from its initial state vector |φ0  so that it gives rise
to the same expected value as that evaluated using the Heisenberg equation
through A(t). Then (9.4.3) holds. We hope to examine, to what extent, the
relation (9.4.1) must be valid. To this end, we assume that
$$\langle u|A|u\rangle = \langle v|A|v\rangle \qquad (9.4.21)$$
holds for any Hermitian matrix A ∈ C(n, n) for some |u⟩, |v⟩ ∈ Cⁿ and we investigate how |u⟩ and |v⟩ are related.
If |v⟩ = 0, then ⟨u|A|u⟩ = 0 for any Hermitian matrix A, which implies |u⟩ = 0 as well. Thus, in the following, we only consider the nontrivial situation |u⟩ ≠ 0, |v⟩ ≠ 0.
Since |v⟩ ≠ 0, we set V = Span{|v⟩} and W = V⊥. Choose a Hermitian matrix A so that |x⟩ ↦ A|x⟩ (|x⟩ ∈ Cⁿ) defines the projection of Cⁿ onto V along W. Write |u⟩ = a|v⟩ + |w⟩, where a ∈ C and |w⟩ ∈ W. Then A|u⟩ = a|v⟩. Thus
$$\langle u|A|u\rangle = |a|^2\langle v|v\rangle = |a|^2\langle v|A|v\rangle, \qquad (9.4.22)$$
which leads to
$$|a| = 1 \quad\text{or}\quad a = e^{i\theta}, \quad \theta\in\mathbb{R}. \qquad (9.4.23)$$
Moreover, let A ∈ C(n, n) be a Hermitian matrix so that |x⟩ ↦ A|x⟩ (|x⟩ ∈ Cⁿ) defines the projection of Cⁿ onto W along V. Then A|v⟩ = 0 and A|w⟩ = |w⟩. Inserting these into (9.4.21), we find
$$0 = \langle v|A|v\rangle = \langle u|A|u\rangle = \langle w|w\rangle, \qquad (9.4.24)$$
which gives us the result |w⟩ = 0. In other words, that (9.4.21) holds for any Hermitian matrix A ∈ C(n, n) implies that |u⟩ and |v⟩ differ from each other by a phase factor a satisfying (9.4.23). That is,
$$|u\rangle = e^{i\theta}|v\rangle, \quad \theta\in\mathbb{R}. \qquad (9.4.25)$$

Now inserting (9.4.5) into (9.4.3), we obtain
$$\langle\phi(t)|A|\phi(t)\rangle = \langle\psi(t)|A|\psi(t)\rangle, \qquad (9.4.26)$$
where
$$|\psi(t)\rangle = e^{-\frac{i}{\hbar}tH}|\phi_0\rangle. \qquad (9.4.27)$$
Consequently, in view of the conclusion (9.4.25), we arrive at the relation
$$|\phi(t)\rangle = e^{i\theta(t)}e^{-\frac{i}{\hbar}tH}|\phi_0\rangle, \qquad (9.4.28)$$
where θ(t) is a real-valued function of t, which simply cancels itself out in (9.4.26) and may well be taken to be zero. In other words, we are prompted to conclude that the state vector should follow the law of evolution given simply by (9.4.1), or by (9.4.28) with θ(t) ≡ 0, which is the unique solution of the Schrödinger equation (9.2.7) subject to the initial condition |φ(0)⟩ = |φ0⟩. Thus the Schrödinger equation inevitably comes into being as anticipated.
Exercises

9.4.1 Consider the Heisenberg equation (9.4.6) subject to the initial condition A(0) = A0. An integration of (9.4.6) gives
$$A(t) = A_0 + \frac{i}{\hbar}\int_0^t[H, A(\tau)]\,d\tau. \qquad (9.4.29)$$
(i) From (9.4.29) derive the result
$$[H, A(t)] = [H, A_0] + \frac{i}{\hbar}\int_0^t[H, [H, A(\tau)]]\,d\tau. \qquad (9.4.30)$$
(ii) Use (9.4.30) and the Gronwall inequality to come up with an alternative proof that H and A(t) commute for all time if and only if they do so initially.
9.4.2 Consider the Hamiltonian H and an observable A given by
$$H = \begin{pmatrix} 2 & i\\ -i & 2 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 1-i\\ 1+i & 1 \end{pmatrix}. \qquad (9.4.31)$$
(i) Solve the Schrödinger equation (9.2.7) to obtain the time-dependent state |φ(t)⟩ evolving from the initial state
$$|\phi_0\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1\\ i \end{pmatrix}. \qquad (9.4.32)$$
(ii) Evaluate the expected value of the observable A assuming that the
system lies in the state |φ(t).
(iii) Solve the Heisenberg equation (9.4.6) with the initial condition
A(0) = A and use it to evaluate the expected value of the same
observable within the Heisenberg picture. That is, compute the
quantity φ0 |A(t)|φ0 . Compare the result with that obtained in (ii)
and explain.
Solutions to selected exercises

Section 1.1

1.1.2 Let n satisfy 1 ≤ n < p and write 2n = kp + l for some integers k and
l where 0 ≤ l < p. Then n + (n − l) = kp. Thus [n − l] = −[n].
If [n] ≠ [0], then n is not a multiple of p. Since p is a prime, the
greatest common divisor of n and p is 1. Thus there are integers k, l such
that

kp + ln = 1. (S1)

Consequently [l] is the multiplicative inverse of [n]. That is,


[l] = [n]−1 .
We may also prove the existence of [n]−1 without using (S1) but by
using an interesting statement called Fermat’s Little Theorem: For any
nonnegative integer m and positive integer p, the integer mp − m is
divisible by p. We prove this theorem by induction. When m = 0, there
is nothing to show. Assume the statement is true at m ≥ 0. At m + 1 we
have
$$(m+1)^p - (m+1) = \sum_{k=1}^{p-1}\binom{p}{k}m^k + (m^p - m), \qquad (S2)$$

which is clearly divisible by p in view of the inductive assumption and


the definition of binomial coefficients. So the theorem follows.
Now we come back to the construction of [n]−1 . Since p is a prime
and divides np − n = n(np−1 − 1) but not n, p divides np−1 − 1. So
there is an integer k such that np−1 − 1 = kp. Therefore nnp−2 = 1
modulo p. That is, [np−2 ] = [n]−1 .
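For readers who want to experiment, the following minimal Python sketch (not part of the text) compares the two constructions of [n]⁻¹ described above: the one via (S1), realized here through the extended Euclidean algorithm, and the one via Fermat's Little Theorem, [n]⁻¹ = [n^{p−2}]. The concrete values p = 13, n = 5 are an arbitrary illustration.

def extended_gcd(a, b):
    # returns (g, x, y) with a*x + b*y = g = gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

p, n = 13, 5
_, k, l = extended_gcd(p, n)          # k*p + l*n = 1, so [l] = [n]^{-1}
inv_euclid = l % p
inv_fermat = pow(n, p - 2, p)         # Fermat: n^{p-2} mod p
print(inv_euclid, inv_fermat, (n * inv_fermat) % p)   # both give 8; the product is 1 mod 13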
1.1.7 Multiplying the relation AB = I by C from the left we have C(AB) =
C which gives us B = I B = (CA)B = C(AB) = C.


Section 1.2

1.2.7 We have
$$u^tv = \begin{pmatrix} a_1\\ \vdots\\ a_n \end{pmatrix}(b_1,\dots,b_n) = \begin{pmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_n\\ a_2b_1 & a_2b_2 & \cdots & a_2b_n\\ \cdots & \cdots & \cdots & \cdots\\ a_nb_1 & a_nb_2 & \cdots & a_nb_n \end{pmatrix},$$
whose ith row and j th row are

ai (b1 , b2 , . . . , bn ) and aj (b1 , b2 , . . . , bn ),

which are clearly linearly dependent with each other.


For column vectors we proceed similarly.
1.2.8 If either U1 ⊂ U2 or U2 ⊂ U1, the statement is trivially true. Suppose otherwise and pick u1 ∈ U1 but u1 ∉ U2 and u2 ∈ U2 but u2 ∉ U1. We assert that u = u1 + u2 ∉ U1 ∪ U2. If u ∈ U1 ∪ U2 then u ∈ U1 or u ∈ U2. So, respectively, u2 = u + (−u1) ∈ U1 or u1 = u + (−u2) ∈ U2, which is false.
1.2.9 Without loss of generality we may assume that there are no such i, j, i, j = 1, . . . , k, i ≠ j, that Ui ⊂ Uj. Thus, in view of the previous exercise, we know that for k = 2 we can find u1 ∈ U1 and u2 ∈ U2 such that u = u1 + u2 ∉ U1 ∪ U2. Assume that the statement of the problem is true at k = m ≥ 2. We proceed to prove the statement at k = m + 1.
Assume otherwise that U = U1 ∪ · · · ∪ Um+1. By the inductive assumption there is some u ∈ U such that
$$u \notin U_1\cup\cdots\cup U_m. \qquad (S3)$$
Thus u ∈ Um+1. Pick v ∉ Um+1 and consider the following m + 1 vectors
$$w_1 = u + v, \quad w_2 = 2u + v, \quad\dots,\quad w_{m+1} = (m+1)u + v.$$
There is some i = 1, . . . , m + 1 such that wi ∉ U1 ∪ · · · ∪ Um. Otherwise, if wi ∈ U1 ∪ · · · ∪ Um for all i = 1, . . . , m + 1, then there are i, j = 1, . . . , m + 1, i ≠ j, such that wi, wj lie in one of the subspaces U1, . . . , Um, say Ul (1 ≤ l ≤ m), which leads to wi − wj = (i − j)u ∈ Ul or u ∈ Ul, a contradiction to (S3).
Now assume wi ∉ U1 ∪ · · · ∪ Um for some i = 1, . . . , m + 1. Thus wi ∈ Um+1. So v = wi + (−iu) ∈ Um+1 since u ∈ Um+1, which is again false.
Section 1.4

1.4.3 Without loss of generality assume f ≠ 0. Then there is some u ∈ U such that f(u) ≠ 0. For any v ∈ U consider
$$w = v - \frac{f(v)}{f(u)}u.$$
We have f(w) = 0. Hence g(w) = 0 as well, which gives us
$$g(v) = \frac{g(u)}{f(u)}f(v) \equiv af(v), \quad v\in U.$$
That is, g = af.
1.4.4 Let {v1 , . . . , vn−1 } be a basis of V . Extend it to get a basis of Fn , say
{v1 , . . . , vn−1 , vn }. Let {f1 , . . . , fn−1 , fn } be a basis of (Fn ) dual to
{v1 , . . . , vn−1 , vn }. It is clear that V 0 = Span{fn }. On the other hand,
for any (x1 , . . . , xn ) ∈ Fn we have
v = (x1 , x2 , . . . , xn ) − (x1 + x2 + · · · + xn , 0, . . . , 0) ∈ V .
So
0 = fn (v) = fn (x1 , . . . , xn ) − (x1 + · · · + xn )fn (e1 ),
(x1 , . . . , xn ) ∈ Fn . (S4)
For any f ∈ V⁰, there is some a ∈ F such that f = afₙ. Hence in view of (S4) we obtain
$$f(x_1,\dots,x_n) = af_n(e_1)\sum_{i=1}^n x_i, \quad (x_1,\dots,x_n)\in\mathbb{F}^n.$$

Section 1.7

1.7.2 Assume the nontrivial situation u = 0. It is clear that up ≤ u∞ for
any p ≥ 1. Thus lim sup up ≤ u∞ .
p→∞
Let t0 ∈ [a, b] be such that |u(t0 )| = u∞ > 0. For any ε ∈
(0, u∞ ) we can find an interval around t0 , say Iε ⊂ [a, b], such that
|u(t)| > ‖u‖∞ − ε when t ∈ Iε. Thus
$$\|u\|_p \ge \left(\int_{I_\varepsilon}|u(t)|^p\,dt\right)^{1/p} \ge (\|u\|_\infty - \varepsilon)|I_\varepsilon|^{1/p}, \qquad (S5)$$
where |Iε | denotes the length of the interval Iε . Letting p → ∞ in (S5)
we find lim inf up ≥ u∞ −ε. Since ε > 0 may be chosen arbitrarily
p→∞
small, we arrive at lim inf up ≥ u∞ .
p→∞
Thus the limit up → u∞ as p → ∞ follows.
1.7.3 (i) Positivity: Of course u  ≥ 0 for any u ∈ U  . If u  = 0,


then |u (u)| = 0 for any u ∈ U satisfying u = 1. Thus, for any
1
v ∈ U, v = 0, we have u (v) = vu ( v) = 0, which shows
v
u (v) = 0 for any v ∈ U . So u = 0.
(ii) Homogeneity: For any a ∈ F and u ∈ U  , we have
au  = sup{|au (u)| | u ∈ U, u = 1}
= |a| sup{|u (u)| | u ∈ U, u = 1} = |a|u  .
(iii) Triangle inequality: For any u , v  ∈ U  , we have
u + v   = sup{|u (u) + v  (u)| | u ∈ U, u = 1}
≤ sup{|u (u)| + |v  (u)| | u ∈ U, u = 1}
≤ sup{|u (u)| | u ∈ U, u = 1}
+ sup{|v  (v)| | v ∈ U, v = 1}
= u  + v   .
1.7.4 (i) Positivity: Of course u ≥ 0. If u = 0, then |u (u)| = 0 for all
u ∈ U  satisfying  u  =  1. Then for any v  ∈ U  , v  = 0, we
1
have v  (u) = v   v  (u) = 0, which shows v  (u) = 0 for
v  
any v  ∈ U  . Hence u = 0.
(ii) Homogeneity: Let a ∈ F and u ∈ U . We have
au = sup{|u (au)| | u ∈ U  , u  = 1}
= |a| sup{|u (u)| | u ∈ U  , u  = 1} = |a|u.
(iii) Triangle inequality: For u, v ∈ U , we have
u + v = sup{|u (u + v)| | u ∈ U  , u  = 1}
≤ sup{|u (u)| + |u (v)| | u ∈ U  , u  = 1}
≤ sup{|u (u)| | u ∈ U  , u  = 1}
+ sup{|v  (v)| | v  ∈ U  , v   = 1}
= u + v.

Section 2.1

2.1.7 (i) Since f, g ≠ 0 we have dim(f⁰) = dim(g⁰) = n − 1. Let {u1, . . . , un−1} be a basis of f⁰ and take v ∈ g⁰ but v ∉ f⁰. Then u1, . . . , un−1, v are linearly independent, and hence, form a basis of U. In particular, U = f⁰ + g⁰.
(ii) From the dimensionality equation

n = dim(f 0 ) + dim(g 0 ) − dim(f 0 ∩ g 0 )

the answer follows.


2.1.8 We have N (T ) ⊂ N (T 2 ) and R(T 2 ) ⊂ R(T ). On the other hand, if
n = dim(U ), the rank equation gives us

n(T ) + r(T ) = n = n(T 2 ) + r(T 2 ).

Thus n(T ) = n(T 2 ) if and only if r(T ) = r(T 2 ). So N (T 2 ) = N (T )


if and only if R(T 2 ) = R(T ).
2.1.10 Let w1 , . . . , wk ∈ W be a basis of R(S ◦ T ). Then there are
u1 , . . . , uk ∈ U such that S(T (ui )) = wi , i = 1, . . . , k.
Let vi = T (ui ), i = 1, . . . , k. Then Span{v1 , . . . , vk } ⊂ R(T ).
Hence k ≤ r(T ). Of course, R(S ◦ T ) ⊂ R(S). So k ≤ r(S) as well.
Next, let u1 , . . . , uk ∈ U form a basis of N (T ) ⊂ N (S ◦ T ). Let
z1 , . . . , zl ∈ N (S ◦ T ) so that u1 , . . . , uk , z1 , . . . , zl form a basis of
N(S ◦ T ). Of course T (z1 ), . . . , T (zl ) ∈ N (S). We assert that these
are linearly independent. In fact, if there are scalars a1 , . . . , al such that
a1 T (z1 ) + · · · + al T (zl ) = 0, then T (a1 z1 + · · · + al zl ) = 0, which
indicates that a1 z1 + · · · + al zl ∈ N (T ). Hence a1 z1 + · · · + al zl =
b1 u1 + · · · + bk uk for some scalars b1 , . . . , bk . However, we know that
u1 , . . . , uk , z1 , . . . , zl are linearly independent. So a’s and b’s are all
zero. In particular, T (z1 ), . . . , T (zl ) are linearly independent. Hence
l ≤ n(S), which proves k + l ≤ n(T ) + n(S).
2.1.12 Define T ∈ L(Fm , Fn ) by T (x) = Bx, x ∈ Fm . Then, since R(T ) ⊂
Rn , we have r(T ) ≤ n. By the rank equation n(T ) + r(T ) = m and the
condition m > n, we see that n(T ) > 0. Hence there is a nonzero vector
x ∈ Fm such that T (x) = 0 or Bx = 0. Thus (AB)x = A(Bx) = 0
which proves that the m × m matrix AB cannot be invertible.
2.1.14 Use N and R to denote the null-space and range of a mapping. Let

N (R) ∩ R(S ◦ T ) = Span{w1 , ..., wk } ⊂ W,

where w1 , ..., wk are independent vectors.


Expand {w1 , ..., wk } to get a basis for R(S ◦ T ) so that

R(S ◦ T ) = Span{w1 , ..., wk , y1 , ..., yl }.

Then {R(y1 ), ..., R(yl )} is a basis for R(R ◦ S ◦ T ) since

R(w1 ) = ... = R(wk ) = 0


and R(y1 ), ..., R(yl ) are independent. In particular, r(R ◦ S ◦ T ) = l.


Since R(S ◦ T ) ⊂ R(S), we can expand {w1 , ..., wk , y1 , ..., yl } to
get a basis for R(S) so that

R(S) = Span{w1 , ..., wk , y1 , ..., yl , z1 , ..., zm }.

Now we can count the numbers as follows:

r(R ◦ S) = dim (Span{R(w1 ), ..., R(wk ),


R(y1 ), ..., R(yl ), R(z1 ), ..., R(zm )})
= dim(Span{R(y1 ), ...,R(yl ), R(z1 ), ..., R(zm )})
≤ l + m,
r(S ◦ T ) = k + l.

So
r(R ◦ S) + r(S ◦ T ) ≤ l + m + k + l
= (k + l + m) + l
= r(S) + r(R ◦ S ◦ T ).

2.1.15 Define a mapping T : Fk+l → V + W by



k 
l
T (y1 , . . . , yk , z1 , . . . , zl ) = yi vi + zj w j ,
i=1 j =1

(y1 , . . . , yk , z1 , . . . , zl ) ∈ Fk+l .

Then N(T ) = S. From the rank equation we have n(T ) + r(T ) = k + l


or dim(S) + r(T ) = k + l. On the other hand, it is clear that R(T ) =
V + W . So r(T ) = dim(V + W ). From the dimensionality equation
(1.5.6) we have dim(V + W ) = dim(V ) + dim(W ) − dim(V ∩ W ), or
r(T ) + dim(V ∩ W ) = k + l. Therefore dim(S) = dim(V ∩ W ).

Section 2.2

2.2.4 Define T ∈ L(R2 ) by setting


 
$$T(x) = \begin{pmatrix} a & b\\ c & d \end{pmatrix}x, \qquad x = \begin{pmatrix} x_1\\ x_2 \end{pmatrix}\in\mathbb{R}^2.$$
Then we have

T (e1 ) = ae1 + ce2 ,


T (e2 ) = be1 + de2 ,
and
1 1
T (u1 ) = (a + b + c + d)u1 + (a + b − c − d)u2 ,
2 2
1 1
T (u2 ) = (a − b + c − d)u1 + (a − b − c + d)u2 ,
2 2
so that the problem follows.
2.2.5 Let A = (aij ) and define T ∈ L(Fn ) by setting T (x) = Ax where
x ∈ Fn is taken to be a column vector. Then

n
T (ej ) = aij ei , j = 1, . . . , n.
i=1

Consider a new basis of F given by


n

f1 = en , f2 = en−1 , ..., fn = e1 ,
or fi = en−i+1 for i = 1, . . . , n. Then we obtain

n 
n
T (fj ) = an−i+1,n−j +1 fi ≡ bij fi , j = 1, . . . , n.
i=1 i=1
Since A and B = (bij ) are the matrix representations of the mapping T
under the bases {e1 , . . . , en } and {f1 , . . . , fn }, respectively, we conclude
that A ∼ B.

Section 2.3

2.3.3 That the equation T (u) = v has a solution for some u ∈ U is equivalent
to v ∈ R(T ), which is equivalent to v ∈ N (T  )0 (by Theorem 2.8),
which is equivalent to v, v   = 0 for all v  ∈ N (T  ) or T  (v  ) = 0.

Section 2.4

2.4.3 Let k = r(T̃ ) = dim(R(T̃ )). Let {[v1 ]Y , . . . , [vk ]Y } be a basis of R(T̃ )
in V /Y . Then there are [u1 ]X , . . . , [uk ]X ∈ U/X such that
T̃ ([u1 ]X ) = [v1 ]Y , ..., T̃ ([uk ]) = [vk ]Y .
On the other hand, from the definition of T̃ , we have T̃ ([ui ]X ) =
[T (ui )]Y for i = 1, . . . , k. We claim that T (u1 ), . . . , T (uk ) are linearly
independent. In fact, let a1 , . . . , ak be scalars such that
a1 T (u1 ) + · · · + ak T (uk ) = 0.
Taking cosets, we get a1 [T (u1 )]Y + · · · + ak [T (uk )]Y = [0]Y , which
leads to a1 [v1 ]Y + · · · + ak [vk ]Y = [0]Y . Thus a1 = · · · = ak = 0. This
proves r(T ) ≥ k.
Section 2.5

2.5.12 (i) Assume R(S) = R(T ). Since S projects U onto R(S) = R(T )
along N (S), we have for any u ∈ U the result S(T (u)) = T (u).
Thus S ◦ T = T . Similarly, T ◦ S = S.
Furthermore, from S ◦ T = T , then T (U ) = (S ◦ T )(U ) implies
R(T ) ⊂ R(S). Likewise, T ◦ S = S implies R(S) ⊂ R(T ). So
R(S) = R(T ).
(ii) Assume N (S) = N (T ). For any u ∈ U , rewrite u as u = v + w
with v ∈ R(T ) and w ∈ N (T ). Then T (v) = v, T (w) = 0, and
S(w) = 0 give us

(S ◦ T )(u) = S(T (v) + T (w)) = S(v) = S(v + w) = S(u).

Hence S ◦ T = S. Similarly, T ◦ S = T .
Assume S ◦ T = S, T ◦ S = T . Let u ∈ N (T ). Then S(u) =
S(T (u)) = 0. So u ∈ N (S). So N (T ) ⊂ N (S). Interchange S
and T . We get N (S) ⊂ N (T ).
2.5.14 (i) We have

T 2 (u) = T (u, u1 u1 + u, u2 u2 )


= u, u1 T (u1 ) + u, u2 T (u2 )
= u, u1 (u1 , u1 u1 + u1 , u2 u2 )
+ u, u2 (u2 , u1 u1 + u2 , u2 u2 )
= (u, u1 u1 , u1  + u, u2 u2 , u1 )u1
+ (u, u1 u1 , u2  + u, u2 u2 , u2 )u2
= T (u) = u, u1 u1 + u, u2 u2 . (S6)

Since u1 , u2 are independent, we have

u, u1 u1 , u1  + u, u2 u2 , u1  = u, u1 ,
u, u1 u1 , u2  + u, u2 u2 , u2  = u, u2 .

Namely,

u, u1 , u1 u1 + u2 , u1 u2  = u, u1 ,


u, u1 , u2 u1 + u2 , u2 u2  = u, u2 .

Since u ∈ U is arbitrary, we have

u1 , u1 u1 + u2 , u1 u2 = u1 , u1 , u2 u1 + u2 , u2 u2 = u2 .
Since u1 , u2 are independent, we have

u1 , u1  = 1, u2 , u1  = 0, u1 , u2  = 0, u2 , u2  = 1.


(S7)
Conversely, if (S7) holds, then we can use (S6) to get

T 2 (u) = u, u1 u1 + u, u2 u2 = T (u), u ∈ U.

In conclusion, (S7) is a necessary and sufficient condition to ensure


that T is a projection.
(ii) Assume T is a projection. Then (S7) holds. From V = {u ∈
U | T (u) = u} and the definition of T , we see that V ⊂
Span{u1 , u2 }. Using (S7), we also have u1 , u2 ∈ V . So V =
Span{u1 , u2 }.
If T (u) = 0, then u, u1 u1 + u, u2 u2 = 0. Since u1 , u2
are independent, we have u, u1  = 0, u, u2  = 0. So W =
(Span{u1 , u2 })0 .
2.5.15 We have T (T − aI ) = 0 or (aI − T )T = T (aI − T ) = 0. Besides,
 
1 1 1 1
I = I − T + T = (aI − T ) + T .
a a a a
Therefore, we may rewrite any u ∈ U as
1 1
u = I (u) = (aI − T )(u) + T (u) ≡ v + w. (S8)
a a
Thus
 
1
T (v) = T(aI − T )(u) = 0,
a
 
1
(aI − T )(w) = (aI − T ) T (u) = 0.
a
That is, v ∈ N (T ) and w ∈ N (aI −T ). Hence U = N (T )+N (aI −T ).
Let u ∈ N (aI − T ) ∩ N (T ). Inserting this into (S8), we get u = 0.
So N(aI − T ) ∩ N (T ) = {0} and U = N (aI − T ) ⊕ N (T ).
Since T commutes with aI − T and T , it is clear that N (aI − T )
and N(T ) are invariant under T .
2.5.25 (i) Since T is nilpotent of degree k, there is an element u ∈ U such
that T^{k−1}(u) ≠ 0 but T^k(u) = 0. It is clear that

V = Span{u, T (u), . . . , T k−1 (u)}

is a k-dimensional subspace of U invariant under T . Let W be the


complement of Span{T k−1 (u)} in N (T ). Then dim(W ) = l − 1.
Consider the subspace X = V + W of U . Pick v ∈ V ∩ W and


write v as

v = a0 u + a1 T (u) + · · · + ak−1 T k−1 (u), a0 , a1 , . . . , ak−1 ∈ F.

Then T (v) = 0 gives us a0 T (u) + · · · + ak−2 T k−1 (u) = 0, which


leads to a0 = a1 = · · · = ak−2 = 0. So v ∈ Span{T k−1 (u)} and
v ∈ W which indicates v = 0. Hence X = V ⊕ W . However, since
dim(X) = dim(V ) + dim(W ) = k + (l − 1) = n, we have X = U
so that T is reducible over V , W .
(ii) It is clear that R(T ) = Span{T (u), . . . , T k−1 (u)} and r(T ) =
k − 1.
2.5.26 (i) Write S − T as S ◦ (I − S −1 ◦ T ) and set P = S −1 ◦ T . Then
P is nilpotent as well. Let the degree of P be k. Assume P ≠ 0.
Then k ≥ 2. It may be checked that I − P is invertible since
I + P + · · · + P k−1 is the inverse of I − P .
(ii) For A ∈ R(2, 2) define TA ∈ L(R2 ) by TA (x) = Ax where x ∈ R2
is a column vector. Now set
 
0 1 0 1
A= , B= .
1 0 0 0

Then TA is invertible, TB is nilpotent, but TA − TB is not invertible.


It is direct to check that TA , TB do not commute.
2.5.27 Let v ∈ V . Then T (v) ∈ R(T ) ∩ V since V is invariant under T . Since
R(T ) ∩ V = {0} we have T (v) = 0. So v ∈ N (T ). That is, V ⊂ N (T ).
On the other hand, the rank equation indicates that dim(V ) = n(T ) =
dim(N(T )). So V = N (T ).

Section 2.6

2.6.1 Write Sn = Tn − T . Then

T = Tn − Sn = Tn ◦ (I − Rn ), (S9)

where Rₙ = Tₙ⁻¹ ∘ Sₙ. If ‖Tₙ⁻¹‖ ↛ ∞ as n → ∞, we may assume that {‖Tₙ⁻¹‖} is bounded without loss of generality. Since ‖Sₙ‖ → 0 as n → ∞, we have ‖Rₙ‖ ≤ ‖Tₙ⁻¹‖‖Sₙ‖ → 0 as n → ∞ as well.
So in view of Theorem 2.25 we see that I − Rn is invertible when n is
sufficiently large, which leads to the false conclusion that T is invertible
in view of (S9).
2.6.5 Let P ∈ N and consider Tλ = λI + P , where λ is a scalar. It is clear


that Tλ is invertible for all λ ≠ 0 and ‖P − Tλ‖ → 0 as λ → 0. Since Tλ is invertible, it can never be nilpotent.

Section 3.1

3.1.3 (i) Let C be a closed curve around but away from the origin of R2 and
consider
f (x, y)
u : C → S1, u(x, y) = , (x, y) ∈ C.
f (x, y)
Take R > 0 to be a constant and let C be parametrized by θ: x = (R/a) cos θ, y = (R/b) sin θ, 0 ≤ θ ≤ 2π. Then u = (u1, u2) =
(cos θ, sin θ ) (on C). So ind(f |C ) = deg(u) = 1. On the other
hand, on C, we have

g(x, y)2 = (a 2 x 2 − b2 y 2 )2 + 4a 2 b2 x 2 y 2 = (a 2 x 2 + b2 y 2 )2 = R 4 .

Now we can construct v : C → S 1 by setting


g(x, y) 1
v(x, y) = = 2 (a 2 x 2 − b2 y 2 , 2abxy)
g(x, y) R
= (cos2 θ − sin2 θ, 2 cos θ sin θ ).
Therefore, ind(g|C ) = deg(v) = 2.
(ii) The origin of R2 is the only zero of f and g that is a simple zero of
f and a double zero of g.
(iii) Since fε and gε are small perturbations of f and g, by stability we
deduce ind(fε|C) = 1, ind(gε|C) = 2. However, fε still has exactly one simple zero, x = ε, y = 0, but gε has two simple zeros, x = ±√ε, y = 0. Thus, when going from ε = 0 to ε > 0, the double zero
of the latter splits into two simple zeros. Note that, algebraically,
a double zero and two simple zeros are both counted as two zeros,
which is, loosely speaking, indicative of the result ind(gε |C ) = 2.
3.1.4 Consider the vector field
# $
v = x 3 − 3xy 2 − 5 cos2 (x + y), 3x 2 y − y 3 + 2e−x y .
2 2

We are to show that v has a zero somewhere. We use the deformation


method again to simplify the problem. For this purpose, we set
# $
v t = x 3 − 3xy 2 − 5t cos2 (x + y), 3x 2 y − y 3 +2te−x y , 0 ≤ t ≤ 1.
2 2
It is not hard to show that there is some R > 0 such that ‖vᵗ‖ ≥ 1 for √(x² + y²) = r ≥ R, t ∈ [0, 1]. Thus, we only need to show that over a suitably large circle given by CR = {r = R}, we have ind(v⁰|CR) ≠ 0. Over CR we have x = R cos θ, y = R sin θ, θ ∈ [0, 2π]. On the other hand, for z = x + iy, we have z³ = (x³ − 3xy²) + i(−y³ + 3x²y). Hence, with z = Re^{iθ}, we get v⁰ = R³(cos 3θ, sin 3θ), so that u = v⁰/‖v⁰‖ = (cos 3θ, sin 3θ). Thus deg(u) = 3. So ind(v|CR) = ind(v¹|CR) = ind(v⁰|CR) = 3 ≠ 0 as expected and v must vanish at some point inside CR.
3.1.5 Inserting the expression for the stereographic projection given and going through a tedious calculation, we obtain
$$\deg(u) = \frac{1}{\pi}\int_{\mathbb{R}^2}\frac{1}{(1+x^2+y^2)^2}\,dxdy = \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^\infty\frac{r}{(1+r^2)^2}\,dr\,d\theta = 1.$$
3.1.6 Inserting the hedgehog expression and integrating, we obtain
$$\deg(u) = \frac{1}{4\pi}\int_0^\infty\!\!\int_0^{2\pi}u\cdot\left(\frac{\partial u}{\partial r}\times\frac{\partial u}{\partial\theta}\right)d\theta\,dr = -\frac{n}{2}\int_\pi^0\sin f\,df = n,$$
which indicates that the map covers the 2-sphere, while preserving the orientation, n times.

Section 3.2

3.2.6 Consider the matrix C = aA + bB. The entries of C not in the jth column are the corresponding entries of A multiplied by (a + b) but the jth column of C is equal to the sum of the a-multiple of the jth column of A and the b-multiple of the jth column of B. So by the properties of determinants, we have
$$\det(C) = (a+b)^{n-1}\det\begin{pmatrix} a_{11} & \cdots & aa_{1j}+bb_{1j} & \cdots & a_{1n}\\ \vdots & \cdots & \cdots & \cdots & \vdots\\ a_{n1} & \cdots & aa_{nj}+bb_{nj} & \cdots & a_{nn} \end{pmatrix} = (a+b)^{n-1}\big(a\det(A) + b\det(B)\big).$$
3.2.7 Let the column vectors in A(t) be denoted by A1 (t), . . . , An (t). Then

det(A(t + h)) − det(A(t))


= |A1 (t + h), A2 (t + h), . . . , An (t + h)|
− |A1 (t), A2 (t), . . . , An (t)|
= |A1 (t + h), A2 (t + h), . . . , An (t + h)|
− |A1 (t), A2 (t + h), . . . , An (t + h)|
+ |A1 (t), A2 (t + h), A3 (t + h) . . . , An (t + h)|
− |A1 (t), A2 (t), A3 (t + h), . . . , An (t + h)|
+ |A1 (t), A2 (t), A3 (t + h), . . . , An (t + h)|
+ · · · + |A1 (t), A2 (t), . . . , An (t + h)|
− |A1 (t), A2 (t), . . . , An (t)|
= |A1 (t + h) − A1 (t), A2 (t + h), . . . , An (t + h)|
+ |A1 (t), A2 (t + h) − A2 (t), A3 (t + h), . . . , An (t + h)|
+ · · · + |A1 (t), A2 (t), . . . , An (t + h) − An (t)|.

Dividing the above by h ≠ 0, we get

1
(det(A(t + h)) − det(A(t)))
h 
1 
=  (A1 (t + h) − A1 (t)), A2 (t + h), . . . , An (t + h)
h
 
 1 
+ A1 (t), (A2 (t + h) − A2 (t)), . . . , An (t + h)
h
 
 1 
+ · · · + A1 (t), A2 (t), . . . , (An (t + h) − An (t)) .
h

Now taking the h → 0 limits on both sides of the above, we arrive at
$$\frac{d}{dt}\det(A(t)) = |A_1'(t), A_2(t),\dots,A_n(t)| + |A_1(t), A_2'(t),\dots,A_n(t)| + \cdots + |A_1(t), A_2(t),\dots,A_n'(t)|.$$

Finally, expanding the j th determinant along the j th column on the


right-hand side of the above by the cofactors, j = 1, 2, . . . , n,
we have
$$\frac{d}{dt}\det(A(t)) = \sum_{i=1}^n a_{i1}'(t)C_{i1}(t) + \sum_{i=1}^n a_{i2}'(t)C_{i2}(t) + \cdots + \sum_{i=1}^n a_{in}'(t)C_{in}(t).$$
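The differentiation formula just derived is easy to test numerically. The following minimal Python sketch (not part of the text; it assumes NumPy is available and uses a hypothetical matrix curve A(t) chosen only for illustration) evaluates the equivalent column-replacement form of the formula and compares it with a finite-difference approximation of d/dt det(A(t)).

import numpy as np

def A(t):
    return np.array([[np.cos(t), t, 1.0],
                     [t ** 2, np.exp(t), 2.0],
                     [0.0, np.sin(t), 3.0 + t]])

def dA(t):
    return np.array([[-np.sin(t), 1.0, 0.0],
                     [2 * t, np.exp(t), 0.0],
                     [0.0, np.cos(t), 1.0]])

def derivative_of_det(t):
    # sum over columns: replace the j-th column of A(t) by its derivative
    total = 0.0
    for j in range(3):
        M = A(t).copy()
        M[:, j] = dA(t)[:, j]
        total += np.linalg.det(M)
    return total

t0, h = 0.7, 1e-6
fd = (np.linalg.det(A(t0 + h)) - np.linalg.det(A(t0 - h))) / (2 * h)
print(derivative_of_det(t0), fd)   # the two values agree to high accuracy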

3.2.8 Adding all columns to the first column of the matrix, we get
$$\begin{vmatrix} x+\sum_{i=1}^n a_i & a_1 & a_2 & \cdots & a_n\\ x+\sum_{i=1}^n a_i & x & a_2 & \cdots & a_n\\ x+\sum_{i=1}^n a_i & a_2 & x & \cdots & a_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ x+\sum_{i=1}^n a_i & a_2 & a_3 & \cdots & x \end{vmatrix} = \left(x+\sum_{i=1}^n a_i\right)\begin{vmatrix} 1 & a_1 & a_2 & \cdots & a_n\\ 1 & x & a_2 & \cdots & a_n\\ 1 & a_2 & x & \cdots & a_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & a_2 & a_3 & \cdots & x \end{vmatrix}.$$
Consider the (n + 1) × (n + 1) determinant on the right-hand side of the above. We now subtract row n from row n + 1, row n − 1 from row n, ..., row 2 from row 3, row 1 from row 2. Then that determinant becomes
$$a \equiv \begin{vmatrix} 1 & a_1 & a_2 & \cdots & a_n\\ 0 & x-a_1 & 0 & \cdots & 0\\ 0 & \cdots & x-a_2 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & \cdots & \cdots & \cdots & x-a_n \end{vmatrix}.$$
Since the minor of the entry at the position (1, 1) is the determinant of a lower triangular matrix, we get $a = \prod_{i=1}^n(x - a_i)$.
3.2.9 If c1, . . . , cn+2 are not all distinct then it is clear that the determinant is zero since there are two identical columns. Assume now c1, . . . , cn+2 are distinct and consider the function of x given by
$$p(x) = \det\begin{pmatrix} p_1(x) & p_1(c_2) & \cdots & p_1(c_{n+2})\\ p_2(x) & p_2(c_2) & \cdots & p_2(c_{n+2})\\ \vdots & \vdots & \ddots & \vdots\\ p_{n+2}(x) & p_{n+2}(c_2) & \cdots & p_{n+2}(c_{n+2}) \end{pmatrix}.$$
By the cofactor expansion along the first column, we see that p(x) is a polynomial of degree at most n which vanishes at n + 1 points: c2, . . . , cn+2. Hence p(x) = 0 for all x. In particular, p(c1) = 0 as well.
3.2.10 We use Dn+1 to denote the determinant and implement induction to do
the computation. When n = 1, we have D2 = a1 x + a0 . At n − 1,
we assume Dn = an−1 x n−1 + · · · + a1 x + a0 . At n, by the cofactor
expansion according to the first column, we get

Dn+1 = xDn + (−1)n+2 a0 (−1)n


= x(an x n−1 + · · · + a2 x + a1 ) + a0 .

3.2.11 Use D(λ) to denote the determinant on the left-hand side of (3.2.46). It is clear that D(0) = 0. So (3.2.46) is true at λ = 0.
Now assume λ ≠ 0 and rewrite D(λ) as
$$D(\lambda) = \det\begin{pmatrix} 1 & 0 & 0 & \cdots & 0\\ 1 & a_1-\lambda & a_2 & \cdots & a_n\\ 1 & a_1 & a_2-\lambda & \cdots & a_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & a_1 & a_2 & \cdots & a_n-\lambda \end{pmatrix}.$$
Adding the (−a1) multiple of column 1 to column 2, the (−a2) multiple of column 1 to column 3, ..., and the (−an) multiple of column 1 to column n + 1, we see that D(λ) becomes
$$D(\lambda) = \det\begin{pmatrix} 1 & -a_1 & -a_2 & \cdots & -a_n\\ 1 & -\lambda & 0 & \cdots & 0\\ 1 & 0 & -\lambda & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & 0 & \cdots & -\lambda \end{pmatrix} = \lambda^n\det\begin{pmatrix} 1 & -\frac{a_1}{\lambda} & -\frac{a_2}{\lambda} & \cdots & -\frac{a_n}{\lambda}\\ 1 & -1 & 0 & \cdots & 0\\ 1 & 0 & -1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.$$
On the right-hand side of the above, adding column 2, column 3, ..., column n + 1 to column 1, we get
$$D(\lambda) = \lambda^n\det\begin{pmatrix} 1-\frac{1}{\lambda}\sum_{i=1}^n a_i & -\frac{a_1}{\lambda} & \cdots & -\frac{a_n}{\lambda}\\ 0 & -1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & -1 \end{pmatrix} = \lambda^n\left(1 - \frac{1}{\lambda}\sum_{i=1}^n a_i\right)(-1)^n = (-1)^n\lambda^{n-1}\left(\lambda - \sum_{i=1}^n a_i\right).$$

3.2.12 From det(AAt ) = det(In ) = 1 and det(A)2 = det(AAt ) we get


det(A)2 = 1. Using det(A) < 0, we get det(A) = −1. Hence
det(A + In ) = det(A + AAt ) = det(A) det(In + At ) = − det(In + A),
which results in det(In + A) = 0.
3.2.13 Since a11 = 1 or −1, we may add or subtract the first row from the
other rows of A so that we reduce A into a matrix B = (bij ) satisfying
b11 = 1 or −1, bi1 = 0 for i ≥ 2, and all entries in the submatrix of
B with the first row and column of B deleted are even numbers. By the
cofactor expansion along the first column of B we see that det(B) is an even integer. Since det(A) = det(B), the proof follows.
3.2.14 To see that det(A) ≠ 0, it suffices to show that A is invertible, or equivalently, N(A) = {x ∈ Rⁿ | Ax = 0} = {0}. In fact, take x ∈ N(A) and assume otherwise that x = (x1, . . . , xn)ᵗ ≠ 0. Let i = 1, . . . , n be such that
$$|x_i| = \max_{1\le j\le n}\{|x_j|\}. \qquad (S10)$$
Then |xᵢ| > 0. On the other hand, the ith component of the equation Ax = 0 reads
$$a_{i1}x_1 + \cdots + a_{ii}x_i + \cdots + a_{in}x_n = 0. \qquad (S11)$$
Combining (S10) and (S11) we arrive at
$$0 = |a_{i1}x_1 + \cdots + a_{ii}x_i + \cdots + a_{in}x_n| \ge |a_{ii}||x_i| - \sum_{1\le j\le n,\,j\ne i}|a_{ij}||x_j| \ge |x_i|\Big(|a_{ii}| - \sum_{1\le j\le n,\,j\ne i}|a_{ij}|\Big) > 0,$$
which is a contradiction.
3.2.15 Consider the modified matrix

A(t) = D + t (A − D), 0 ≤ t ≤ 1,

where D = diag{a11 , . . . , ann }. Then A(t) satisfies the condition stated


in the previous exercise. So det(A(t)) ≠ 0. Furthermore, det(A(0)) = det(D) > 0. So det(A(1)) > 0 as well, otherwise there would be a point t0 ∈ (0, 1) such that det(A(t0)) = 0, which is false.
3.2.17 Let α = (a1, . . . , an), β = (b1, . . . , bn). To compute
$$\det(I_n - \alpha^t\beta) = \begin{vmatrix} 1-a_1b_1 & -a_1b_2 & \cdots & -a_1b_n\\ -a_2b_1 & 1-a_2b_2 & \cdots & -a_2b_n\\ \vdots & \vdots & \ddots & \vdots\\ -a_nb_1 & -a_nb_2 & \cdots & 1-a_nb_n \end{vmatrix},$$
we artificially enlarge it into an (n + 1) × (n + 1) determinant of the form
$$\det(I_n - \alpha^t\beta) = \begin{vmatrix} 1 & b_1 & b_2 & \cdots & b_n\\ 0 & 1-a_1b_1 & -a_1b_2 & \cdots & -a_1b_n\\ 0 & -a_2b_1 & 1-a_2b_2 & \cdots & -a_2b_n\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & -a_nb_1 & -a_nb_2 & \cdots & 1-a_nb_n \end{vmatrix}.$$
Now adding the a1 multiple of row 1 to row 2, the a2 multiple of row 1 to row 3, ..., and the an multiple of row 1 to the last row, of the above determinant, we get
$$\det(I_n - \alpha^t\beta) = \begin{vmatrix} 1 & b_1 & b_2 & \cdots & b_n\\ a_1 & 1 & 0 & \cdots & 0\\ a_2 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_n & 0 & 0 & \cdots & 1 \end{vmatrix}.$$
Next, subtracting the b1 multiple of row 2 from row 1, the b2 multiple of row 3 from row 1, ..., and the bn multiple of the last row from row 1, we obtain
$$\det(I_n - \alpha^t\beta) = \begin{vmatrix} 1-\sum_{i=1}^n a_ib_i & 0 & 0 & \cdots & 0\\ a_1 & 1 & 0 & \cdots & 0\\ a_2 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_n & 0 & 0 & \cdots & 1 \end{vmatrix} = 1 - \sum_{i=1}^n a_ib_i.$$

3.2.18 The method is contained in the solution of Exercise 3.2.17. In fact, adding the (−x1) multiple of row 2 to row 1, the (−x2) multiple of row 3 to row 1, ..., and the (−xn) multiple of the last row to row 1, we get
$$f(x_1,\dots,x_n) = \det\begin{pmatrix} 100-\sum_{i=1}^n x_i^2 & 0 & 0 & \cdots & 0\\ x_1 & 1 & 0 & \cdots & 0\\ x_2 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ x_n & 0 & 0 & \cdots & 1 \end{pmatrix} = 100 - \sum_{i=1}^n x_i^2.$$
So f (x1 , . . . , xn ) = 0 represents the sphere in Rn centered at the origin


and of radius 10.

Section 3.3

3.3.3 First we observe that for A ∈ F(m, n) and B ∈ F(n, l) written as B =


(B1 , . . . , Bl ) where B1 , . . . , Bl ∈ Fn are l column vectors we have

AB = (AB1 , . . . , ABl ), (S12)

where AB1 , . . . , ABl are l column vectors in Fm .


Now for A ∈ F(n, n) we recall the relation (3.3.5). If det(A) ≠ 0, we can evaluate determinants on both sides of (3.3.5) to arrive at (3.3.15). If det(A) = 0 then in view of (3.3.5) and (S12) we have
$$\mathrm{adj}(A)A_1 = 0, \quad\dots,\quad \mathrm{adj}(A)A_n = 0, \qquad (S13)$$
where A1, . . . , An are the n column vectors of A. If A ≠ 0, then at least one of A1, . . . , An is nonzero. In other words, n(adj(A)) ≥ 1 so that r(adj(A)) ≤ n − 1. Hence det(adj(A)) = 0 and (3.3.15) is valid. If A = 0 then adj(A) = 0 and (3.3.15) trivially holds.
3.3.4 If r(A) = n then both A and adj(A) are invertible by (3.3.5). So we
obtain r(adj(A)) = n. If r(A) = n−1 then A contains an (n−1)×(n−1)
submatrix whose determinant is nonzero. So adj(A) ≠ 0, which leads to
r(adj(A)) ≥ 1. On the other hand, from (S13), we see that n(adj(A)) ≥
n − 1. So r(adj(A)) ≤ 1. This proves r(adj(A)) = 1. If r(A) ≤ n − 2
then any n − 1 row vectors of A are linearly dependent, which indicates
that all row vectors of any (n − 1) × (n − 1) submatrix of A are linearly
dependent. In particular all cofactors of A are zero. Hence adj(A) = 0
and r(adj(A)) = 0.
3.3.6 Let n ≥ 3 and assume A is invertible. Using (3.3.5) and (3.3.15) we have

adj(A)adj(adj(A)) = det(adj(A))In = (det(A))n−1 In .

Comparing this with (3.3.5) again we obtain adj(adj(A)) =


(det(A))n−2 A. If A is not invertible, then det(A) = 0 and r(adj(A)) ≤ 1.
Since n ≥ 3, we have r(adj(A)) ≤ 1 ≤ n − 2. So in view of
Exercise 3.3.4 we have r(adj(adj(A))) = 0 or adj(adj(A)) = 0. So
adj(adj(A)) = (det(A))n−2 A is trivially true.
If n = 2 it is direct to verify the relation adj(adj(A)) = A.
3.3.7 For A = (aij ), we use MijA and CijA to denote the minor and cofactor of
the entry aij , respectively. Then we have
t
MijA = MjAi , (S14)
t
which leads to CijA = CjAi , i, j = 1, . . . , n. Hence
t
(adj(A))t = (CijA ) = (CijA )t = adj(At ).

(If A is invertible, we may use (3.3.5) to write adj(A) = det(A)A−1 .


Therefore (adj(A))t = det(A)(A−1 )t = det(At )(At )−1 = adj(At ).)
3.3.8 From (3.3.5) we have AAt = det(A)In . For B = (bij ) = AAt , we get

n
bii = aij2 , i = 1, . . . , n.
j =1

So det(A) = 0 if and only if bii = 0 for all i = 1, . . . , n, or A = 0.


3.3.9 First assume that A, B are both invertible. Then, in view of (3.3.5), we
have
adj(AB) = det(AB)(AB)−1 = det(A) det(B)B −1 A−1
= (det(B)B −1 )(det(A)A−1 ) = adj(B)adj(A).
Next we prove the conclusion without assuming that A, B are invertible.
Since det(A − λI ) and det(B − λI ) are polynomials of the variable λ,
of degree n, they have at most n roots. Let {λk } be a sequence in R so
that λk → 0 as k → ∞ and det(A − λk I ) = 0, det(B − λk I ) = 0 for
k = 1, 2, . . . . Hence we have
adj((A − λk I )(B − λk I )) = adj(B − λk I )adj(A − λk I ), k = 1, 2, . . . .
Letting k → ∞ in the above we arrive at adj(AB) = adj(B)adj(A)
again.
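The identity adj(AB) = adj(B)adj(A) just established can also be checked numerically. The following minimal Python sketch (not part of the text; it assumes NumPy is available) computes the adjugate directly from its cofactor definition and verifies the identity on randomly generated matrices.

import numpy as np

def adjugate(M):
    n = M.shape[0]
    C = np.zeros_like(M)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T                       # adj(M) is the transpose of the cofactor matrix

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
lhs = adjugate(A @ B)
rhs = adjugate(B) @ adjugate(A)
print(np.max(np.abs(lhs - rhs)))     # ~ 1e-13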

Section 3.4

3.4.1 Using C_{ij}^A to denote the cofactor of the entry a_{ij} of the matrix A = (a_{ij}) and applying (3.2.42), we have
$$a_1 = \frac{d}{d\lambda}\Big|_{\lambda=0}p_A(\lambda) = \frac{d}{d\lambda}\Big|_{\lambda=0}\det(\lambda I_n - A) = \sum_{i=1}^n C_{ii}^{(-A)} = (-1)^{n-1}\sum_{i=1}^n C_{ii}^A.$$
3.4.2 It suffices to show that M = C(n, n) \ D is closed in C(n, n). Let {Ak }
be a sequence in M which converges to some A ∈ C(n, n). We need to
prove that A ∈ M. To this end, consider pAk (λ) and let {λk } be multiple
roots of pAk (λ). Then we have

$$p_{A_k}(\lambda_k) = 0, \quad p_{A_k}'(\lambda_k) = 0, \quad k = 1, 2, \dots. \qquad (S15)$$

On the other hand, we know that the coefficients of pA (λ) are con-
tinuously dependent on the entries of A. So the coefficients of pAk (λ)
converge to those of pA (λ), respectively. In particular, the coefficients
of p_{A_k}(λ), say {a_{n−1}^k}, . . . , {a_1^k}, {a_0^k}, are bounded sequences. Thus
$$|\lambda_k|^n \le |\lambda_k^n - p_{A_k}(\lambda_k)| + |p_{A_k}(\lambda_k)| \le \sum_{i=0}^{n-1}|a_i^k||\lambda_k|^i,$$

which indicates that {λk } is a bounded sequence in C. Passing to a sub-


sequence if necessary, we may assume without loss of generality that
λk → some λ0 ∈ C as k → ∞. Letting k → ∞ in (S15), we obtain

p_A(λ0) = 0, p_A′(λ0) = 0. Hence λ0 is a multiple root of p_A(λ), which
proves A ∈ M.
3.4.4 Using (3.4.31), we have

adj(λIn − A) = An−1 λn−1 + · · · + A1 λ + A0 ,

where the matrices An−1 , An−2 , . . . , A1 , A0 are determined through


(3.4.32). Hence, setting λ = 0 in the above, we get

adj(−A) = A0 = a1 In + a2 A + · · · + an−1 An−2 + An−1 .

That is,

adj(A) = (−1)n−1 (a1 In + a2 A+ · · · +an−1 An−2 + An−1 ). (S16)

Note that (S16) may be used to prove some known facts more
easily. For example, the relation adj(At ) = (adj(A))t (see Exercise
3.3.7) follows immediately.
3.4.6 First assume that A is invertible. Then

pAB (λ) = det(λIn − AB) = det(A[λA−1 − B])


= det(A) det(λA−1 − B) = det([λA−1 − B]A)
= det(λIn − BA) = pBA (λ).
Next assume A is arbitrary. If F = R or C, we may proceed as follows.


Let {ak } be a sequence in F such that A − ak In is invertible and ak → 0
as k → ∞. Then we have

p(A−ak In )B (λ) = pB(A−ak In ) (λ), k = 1, 2, . . . .

Letting k → ∞ in the above we arrive at pAB (λ) = pBA (λ) again.


If A is not invertible or F is not R or C, the above methods fail and
a different method needs to be used. To this end, we recognize the fol-
lowing matrix relations in F(2n, 2n):
$$\begin{pmatrix} I_n & -A\\ 0 & \lambda I_n \end{pmatrix}\begin{pmatrix} A & \lambda I_n\\ I_n & B \end{pmatrix} = \begin{pmatrix} 0 & \lambda I_n - AB\\ \lambda I_n & \lambda B \end{pmatrix},$$
$$\begin{pmatrix} I_n & 0\\ -B & \lambda I_n \end{pmatrix}\begin{pmatrix} A & \lambda I_n\\ I_n & B \end{pmatrix} = \begin{pmatrix} A & \lambda I_n\\ \lambda I_n - BA & 0 \end{pmatrix}.$$
The determinants of the left-hand sides of the above are the same. The determinants of the right-hand sides of the above are
$$(-1)^n\lambda^n\det(\lambda I_n - AB), \qquad (-1)^n\lambda^n\det(\lambda I_n - BA).$$
Hence the conclusion p_{AB}(λ) = p_{BA}(λ) follows.


3.4.7 We consider the nontrivial case n ≥ 2. Using the notation and method in the solution to Exercise 3.2.17, we have
$$\det(\lambda I_n - \alpha^t\beta) = \begin{vmatrix} 1 & b_1 & b_2 & \cdots & b_n\\ a_1 & \lambda & 0 & \cdots & 0\\ a_2 & 0 & \lambda & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_n & 0 & 0 & \cdots & \lambda \end{vmatrix}.$$
Assume λ ≠ 0. Subtracting the b1/λ multiple of row 2 from row 1, the b2/λ multiple of row 3 from row 1, ..., and the bn/λ multiple of the last row from row 1, we obtain
$$\det(\lambda I_n - \alpha^t\beta) = \begin{vmatrix} 1-\frac{1}{\lambda}\sum_{i=1}^n a_ib_i & 0 & 0 & \cdots & 0\\ a_1 & \lambda & 0 & \cdots & 0\\ a_2 & 0 & \lambda & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_n & 0 & 0 & \cdots & \lambda \end{vmatrix} = \left(1 - \frac{1}{\lambda}\sum_{i=1}^n a_ib_i\right)\lambda^n = \lambda^{n-1}(\lambda - \alpha\beta^t).$$

Since both sides of the above also agree at λ = 0, they are identical for
all λ.
3.4.8 (i) We have pA (λ) = λ3 − 2λ2 − λ + 2. From pA (A) = 0 we have
A(A² − 2A − Iₙ) = −2Iₙ. So
$$A^{-1} = -\frac{1}{2}(A^2 - 2A - I_n) = \begin{pmatrix} 1 & -1 & 0\\ 0 & \tfrac{1}{2} & 0\\ -2 & \tfrac{3}{2} & -1 \end{pmatrix}.$$
(ii) We divide λ¹⁰ by p_A(λ) to find
$$\lambda^{10} = p_A(\lambda)\left(\lambda^7 + 2\lambda^6 + 5\lambda^5 + 10\lambda^4 + 21\lambda^3 + 42\lambda^2 + 85\lambda + 170\right) + 341\lambda^2 - 340.$$
Consequently, inserting A in the above, we have
$$A^{10} = 341A^2 - 340I_n = \begin{pmatrix} 1 & 2046 & 0\\ 0 & 1024 & 0\\ 0 & -1705 & 1 \end{pmatrix}.$$
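The Cayley–Hamilton technique used above is easy to test numerically. The following minimal Python sketch (not part of the text) does not reproduce the matrix of Exercise 3.4.8 itself; instead it uses, as a stand-in, the companion matrix of p(λ) = λ³ − 2λ² − λ + 2, which has the same characteristic polynomial, and checks the two consequences derived above. NumPy is assumed to be available.

import numpy as np

# companion matrix of lambda^3 - 2 lambda^2 - lambda + 2 (a stand-in for A)
A = np.array([[0.0, 0.0, -2.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0]])
I = np.eye(3)

A_inv = -0.5 * (A @ A - 2 * A - I)              # from A(A^2 - 2A - I) = -2I
print(np.max(np.abs(A @ A_inv - I)))            # ~ 0

A10 = 341 * (A @ A) - 340 * I                   # remainder of lambda^10 modulo p(lambda)
print(np.max(np.abs(A10 - np.linalg.matrix_power(A, 10))))   # ~ 0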

Section 4.1

4.1.8 Let u1 and u2 be linearly independent vectors in U and assume


(u1, u1) ≠ 0 and (u2, u2) ≠ 0, otherwise there is nothing to show. Set u3 = a1u1 + u2 where a1 = −(u1, u2)/(u1, u1). Then u3 ≠ 0 and is perpendicular to u1. So u1, u3 are linearly independent as well. Consider u = cu1 + u3 (c ∈ C). Of course u ≠ 0 for any c. Since (u1, u3) = 0, we
have (u, u) = c2 (u1 , u1 ) + (u3 , u3 ). Thus, in order to have (u, u) = 0,


we may choose
$$c = \sqrt{-\frac{(u_3, u_3)}{(u_1, u_1)}}.$$

Section 4.2

4.2.6 We first show that the definition has no ambiguity. For this purpose,
assume [u1 ] = [u] and [v1 ] = [v]. We need to show that (u1 , v1 ) =
(u, v). In fact, since u1 ∈ [u] and v1 ∈ [v], we know that u1 − u ∈ U0
and v1 − v ∈ U0 . Hence there are x, y ∈ U0 such that u1 − u = x, v1 −
v = y. So (u1 , v1 ) = (u + x, v + y) = (u, v) + (u, y) + (x, v + y) =
(u, v).
It is obvious that ([u], [v]) is bilinear and symmetric.
To show that the scalar product is non-degenerate, we assume [u] ∈
U/U0 satisfies ([u], [v]) = 0 for any [v] ∈ U/U0 . Thus (u, v) = 0 for
all v ∈ U which implies u ∈ U0 . In other words, [u] = [0].
4.2.8 Let w ∈ V ⊥ . Then (v, w) = 0 for any v ∈ V . Hence 0 = (v, w) =
v, ρ(w), ∀v ∈ V , which implies ρ(w) ∈ V 0 . Hence ρ(V ⊥ ) ⊂ V 0 .
On the other hand, take w ∈ V 0 . Then v, w   = 0 for all v ∈ V .
Since ρ : U → U  is an isomorphism, there is a unique w ∈ U such
that ρ(w) = w  . Thus (v, w) = v, ρ(w) = v, w  = 0, v ∈ V . That
is, w ∈ V ⊥ . Thus w  ∈ ρ(V ⊥ ), which proves V 0 ⊂ ρ(V ⊥ ).
4.2.10 Recall (4.1.26). Replace V , W in (4.1.26) by V ⊥ , W ⊥ and use
(V ⊥ )⊥ = V , (W ⊥ )⊥ = W . We get (V ⊥ + W ⊥ )⊥ = V ∩ W. Taking

on both sides of this equation, we see that the conclusion follows.
4.2.11 From Exercise 4.1.8 we know that there is a nonzero vector u ∈ U such
that (u, u) = 0. Since (·, ·) is non-degenerate, there is some w ∈ U such that (u, w) ≠ 0. It is clear that u, w are linearly independent. If w satisfies (w, w) = 0, then we take v = w/(u, w). Hence (v, v) = 0 and (u, v) = 1. Assume now (w, w) ≠ 0 and consider x = u + cw (c ∈ C). Then x can never be zero for any c ∈ C. Set (x, x) = 0. We get 2c(u, w) + c²(w, w) = 0. So we may choose c = −2(u, w)/(w, w). Thus, with v = x/(u, x) where
$$(u, x) = -\frac{2(u, w)^2}{(w, w)} \neq 0,$$

we have (v, v) = 0 and (u, v) = 1 again.


Section 4.3

4.3.1 We consider the real case for simplicity. The complex case is similar.
Suppose M is nonsingular and consider a1 u1 + · · · + ak uk = 0, where
a1 , . . . , ak are scalars. Taking scalar products of this equation with the
vectors u1 , . . . , uk consecutively, we get

a1 (u1 , ui ) + · · · + ak (uk , ui ) = 0, i = 1, . . . , k.

Since M t is nonsingular, we get a1 = · · · = ak = 0.


Suppose M is singular. Then so is Mᵗ and the above system of equations will have a solution (a1, . . . , ak) ≠ (0, . . . , 0). We rewrite the above system as
$$\left(\sum_{i=1}^k a_iu_i, u_1\right) = 0, \quad\dots,\quad \left(\sum_{i=1}^k a_iu_i, u_k\right) = 0.$$
Then modify the above into
$$a_1\left(\sum_{i=1}^k a_iu_i, u_1\right) = 0, \quad\dots,\quad a_k\left(\sum_{i=1}^k a_iu_i, u_k\right) = 0$$
and sum up the k equations. We get
$$\left(\sum_{i=1}^k a_iu_i, \sum_{j=1}^k a_ju_j\right) = 0.$$
That is, we have (v, v) = 0 where $v = \sum_{i=1}^k a_iu_i$. Since the scalar product
is positive definite, we arrive at v = 0. Thus u1 , . . . , uk are linearly
dependent.
4.3.2 Suppose u ∈ Span{u1, . . . , uk}. Then there are scalars a1, . . . , ak such that u = a1u1 + · · · + akuk. Taking scalar products of this equation with u1, . . . , uk, we have
$$(u_i, u) = a_1(u_i, u_1) + \cdots + a_k(u_i, u_k), \quad i = 1,\dots,k,$$
which establishes
$$\begin{pmatrix} (u_1, u)\\ \vdots\\ (u_k, u) \end{pmatrix} \in \mathrm{Span}\left\{\begin{pmatrix} (u_1, u_1)\\ \vdots\\ (u_k, u_1) \end{pmatrix},\dots,\begin{pmatrix} (u_1, u_k)\\ \vdots\\ (u_k, u_k) \end{pmatrix}\right\}.$$
On the other hand, let V = Span{u1, . . . , uk}. If k < dim(U), then V⊥ ≠ {0}. Take u ∈ V⊥, u ≠ 0. Then
$$\begin{pmatrix} (u_1, u)\\ \vdots\\ (u_k, u) \end{pmatrix} = \begin{pmatrix} 0\\ \vdots\\ 0 \end{pmatrix} \in \mathrm{Span}\left\{\begin{pmatrix} (u_1, u_1)\\ \vdots\\ (u_k, u_1) \end{pmatrix},\dots,\begin{pmatrix} (u_1, u_k)\\ \vdots\\ (u_k, u_k) \end{pmatrix}\right\},$$
but u ∉ Span{u1, . . . , uk}.
4.3.4 For any u ∈ U , we have
(I ± S)u2 = ((I ± S)u, (I ± S)u)
= u2 + Su2 ± (u, Su) ± (Su, u)
= u2 + Su2 ± (S  u, u) ± (Su, u)
= u2 + Su2 ∓ (Su, u) ± (Su, u)
= u2 + Su2 .
So (I ± S)u = 0 whenever u = 0. Thus n(I ± S) = 0 and I ± S must
be invertible.
4.3.6 Set R(A) = {u ∈ Cm | u = Ax for some x ∈ Cn }. Rewrite Cm as Cm =
R(A) ⊕ (R(A))⊥ and assert (R(A))⊥ = {y ∈ Cm | A† y = 0} = N (A† ).
In fact, if A† y = 0, then (y, Ax) = y † Ax = (x † A† y)† = 0 for any
x ∈ Cn . That is, N (A† ) ⊂ R(A)⊥ . On the other hand, take z ∈ R(A)⊥ .
Then for any x ∈ Cn we have 0 = (z, Ax) = z† Ax = (A† z)† x =
(A† z, x), which indicates A† z = 0. So z ∈ N (A† ). Thus R(A)⊥ ⊂
N(A† ).
Therefore the equation Ax = b has a solution if and only if b ∈ R(A)
or, if and only if b is perpendicular to R(A)⊥ = N (A† ).

Section 4.4

4.4.2 It is clear that P ∈ L(U ). Let {w1 , . . . , wl } be an orthogonal basis of


W = V ⊥ . Then {v1 , . . . , vk , w1 , . . . , wl } is an orthogonal basis of U .
Now it is direct to check that P (vj ) = vj , j = 1, . . . , k, P (ws ) =
0, s = 1, . . . , l. So P 2 (vj ) = P (vj ), j = 1, . . . , k, P 2 (ws ) =
P (ws ), s = 1, . . . , l. Hence P 2 = P , R(P ) = V , and N (P ) = W
as claimed. In other words, P : U → U is the projection of U onto V
along W .
4.4.4 (i) We know that we have the direct sum U = V ⊕V ⊥ . For any x, y ∈ [u]
we have x = v1 + w1 , y = v2 + w2 , vi ∈ V , wi ∈ V ⊥ , i = 1, 2. So
x − y = (v1 − v2 ) + (w1 − w2 ). However, since x − y ∈ V , we have
w1 = w2 . In other words, for any x ∈ [u], there is a unique w ∈ V ⊥
so that x = v + w for some v ∈ V . Of course, [x] = [w] = [u] and
x2 = v2 + w2 whose minimum is attained at v = 0. This proves
[u] = w.
(ii) So we need to find the unique w ∈ V ⊥ such that [w] = [u]. First
find an orthogonal basis {v1 , . . . , vk } of V . Then for u we know that the
projection of u onto V along V ⊥ is given by the Fourier expansion (see
Exercise 4.4.2)

$$v = \sum_{i=1}^k\frac{(v_i, u)}{(v_i, v_i)}v_i.$$
Thus, from u = v + w where w ∈ V⊥, we get
$$w = u - \sum_{i=1}^k\frac{(v_i, u)}{(v_i, v_i)}v_i.$$

Section 4.5

4.5.5 (i) It is direct to check that T (x) = x for x ∈ Rn .


(ii) Since T (T (x)) = x for x ∈ Rn , we have T 2 = I . Consequently,
T = T  . So if λ is an eigenvalue of T then λ2 = 1. Thus the eigen-
values of T may only be ±1. In fact, for u = (1, 0, . . . , 0, 1)t and
v = (1, 0, . . . , 0, −1)t , we have T (u) = u and T (v) = −v. So ±1
are both eigenvalues of T .
(iii) Let x, y ∈ Rn satisfy T (x) = x and T (y) = −y. Then (x, y) =
(T (x), T (y)) = (x, −y) = −(x, y). So (x, y) = 0.
(iv) Since T 2 − I = 0 and T = ±I , we see that the minimal polyno-
mial of T is mT (λ) = λ2 − 1.

Section 5.2

5.2.7 We can write A as A = P t DP where P is orthogonal and

D = diag{d1 , . . . , dn }, di = ±1, i = 1, . . . , n.

It is easily seen that the column vectors of D are mutually orthogonal


and of length 1. So D is orthogonal as well. Hence A must be orthog-
onal as a product of three orthogonal matrices.
5.2.16 Use ‖·‖ to denote the standard norm of Rⁿ and let u1 = u/‖u‖. Expand {u1} to get an orthonormal basis {u1, u2, . . . , un} of Rⁿ (with the usual scalar product) where u1, . . . , un are all column vectors. Let Q be the n×n matrix whose 1st, 2nd, . . . , nth column vectors are u1, u2, . . . , un. Then
$$u^tQ = u^t(u_1, u_2,\dots,u_n) = (u^tu_1, u^tu_2,\dots,u^tu_n) = (\|u\|, 0,\dots,0).$$
Therefore
$$Q^t(uu^t)Q = (u^tQ)^t(u^tQ) = \begin{pmatrix} \|u\|\\ 0\\ \vdots\\ 0 \end{pmatrix}(\|u\|, 0,\dots,0) = \mathrm{diag}\{\|u\|^2, 0,\dots,0\}.$$

Section 5.3

5.3.8 For definiteness, assume m < n. So r(A) = m. Use  ·  to denote the


standard norm of Rl . Now check

q(x) = x t (At A)x = (Ax)t (Ax) = Ax2 ≥ 0, x ∈ Rn ,

Q(y) = y t (AAt )y = (At y)t (At y) = At y2 ≥ 0, y ∈ Rm .

Hence At A ∈ R(n, n) and AAt ∈ R(m, m) are both semi-positive


definite.
If q(x) = 0, we have Ax = 0. Since r(A) = m < n, we know that
n(A) = n − r(A) > 0. Thus there is some x = 0 such that Ax = 0.
Hence q cannot be positive definite.
If Q(y) = 0, we have At y = 0. Since r(At ) = r(A) = m, we know
that n(At ) = m − r(At ) = 0. Thus y = 0. Hence Q(y) > 0 whenever
y = 0 and Q is positive definite.
5.3.11 Assume A is semi-positive definite. Then there is a semi-positive defi-
nite matrix B such that A = B 2 . If x ∈ S, then 0 = x t Ax = x t B 2 x =
(Bx)t (Bx) = Bx2 . That is, Bx = 0, which implies Ax = B 2 x = 0.
Conversely, Ax = 0 clearly implies x ∈ S.
If A is indefinite, N (A) may not be equal to S. For example, take
A = diag{1, −1}. Then N (A) = {0} and
   
S = {x = (x1 , x2 )t ∈ R² | x1² − x2² = 0},

which is the union of the two lines x1 = x2 and x1 = −x2 , which


cannot even be a subspace of R2 .
5.3.12 Since A is positive definite, there is an invertible matrix P ∈ R(n, n)
such that P t AP = In . Since P t BP is symmetric, there is an orthog-
onal matrix Q ∈ R(n, n) such that Qt (P t BP )Q is a real diagonal
matrix. Set C = P Q. Then both C t AC and C t BC are diagonal.
5.3.13 First we observe that if C is symmetric then the eigenvalues of C are
greater than or equal to c if and only if x t Cx ≥ cx t x = c‖x‖², x ∈ Rn .
In fact, if λ1 , . . . , λn are eigenvalues of C, then there is an orthogonal
matrix P such that C = P t DP , where D = diag{λ1 , . . . , λn }. So with
y = P x we have

x t Cx = (P x)t D(P x) = y t Dy = Σ_{i=1}^{n} λi yi² ≥ λ0 ‖y‖²,

where λ0 is the smallest eigenvalue of C. If λ0 ≥ c then the above gives


us x t Cx ≥ c‖y‖² = c‖x‖². Conversely, we see that

x t Cx − c‖x‖² ≥ 0 implies Σ_{i=1}^{n} λi yi² − c‖y‖² = Σ_{i=1}^{n} (λi − c)yi² ≥ 0.

Since x and hence y can be arbitrarily chosen, we have λi ≥ c for all i.


Now from x t Ax ≥ a‖x‖², x t Bx ≥ b‖x‖², we infer x t (A + B)x ≥
(a + b)‖x‖². Thus the eigenvalues of A + B are greater than or equal
to a + b.
5.3.14 Since A is positive definite, there is another positive definite matrix
P ∈ R(n, n) such that A = P 2 . Let C = AB. Then we have C = P 2 B.
Thus P −1 CP = P BP . Since the right-hand side of this relation is pos-
itive definite, its eigenvalues are all positive. Hence all the eigenvalues
of C are positive.
5.3.16 Let P be a nonsingular matrix such that A = P t P . Then det(A) =
det(P t P ) = det(P )2 . Now

det(A + B) = det(P t P + B)
= det(P t ) det(I + [P −1 ]t B[P −1 ]) det(P ).

Consider C = [P −1 ]t B[P −1 ] which is clearly semi-positive definite.


Let the eigenvalues of C be λ1 , . . . , λn ≥ 0. Then there is an orthogonal
matrix Q such that Qt CQ = D where D = diag{λ1 , . . . , λn }. Thus
det(C) = det(Qt DQ) = det(D) = λ1 · · · λn . Combining the above


results and using det(Q)2 = 1, we obtain

det(A + B) = det(Qt ) det(P t ) det(I + C) det(P ) det(Q)


= det(A) det(I + Qt CQ) = det(A) det(I + D)
= det(A) Π_{i=1}^{n} (1 + λi )
≥ det(A)(1 + λ1 · · · λn )
= det(A) + det(A) det(C).

However, recalling the relation between B and C, we get B = P t CP


which gives us det(B) = det(P )2 det(C) = det(A) det(C) so that the
desired inequality is established.
If the inequality becomes an equality, then all λ1 , . . . , λn vanish. So
C = 0 which indicates that B = 0.
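The inequality det(A + B) ≥ det(A) + det(B) can also be checked numerically; the following Python sketch uses made-up matrices:

```python
import numpy as np

# A is positive definite, B is positive semi-definite; then
# det(A + B) >= det(A) + det(B).
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)          # positive definite
N = rng.standard_normal((4, 4))
B = N @ N.T                          # positive semi-definite

lhs = np.linalg.det(A + B)
rhs = np.linalg.det(A) + np.linalg.det(B)
print(lhs >= rhs)                    # True
```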
5.3.17 Since A is positive definite, there is an invertible matrix P ∈ R(n, n)
such that P t AP = In . Hence we have

(det(P ))2 det(λA − B) = det(P t ((λ − 1)A − (B − A))P )


= det((λ − 1)In − P t (B − A)P ).

Since P t (B − A)P is positive semi-definite whose eigenvalues are all


non-negative, we see that the roots of the equation det(λA − B) = 0
must satisfy λ − 1 ≥ 0.
5.3.18 (i) Let x ∈ U be a minimum point of f . Then for any y ∈ U we have
g(ε) ≥ g(0), where g(ε) = f (x + εy) (ε ∈ R). Thus
 
0 = (dg/dε)|ε=0 = ½(y, T (x)) + ½(x, T (y)) − (y, b) = (y, T (x) − b).

Since y ∈ U is arbitrary, we get T (x)−b = 0 and (5.3.25) follows.


Conversely, if x ∈ U satisfies (5.3.25), then
f (u) − f (x) = ½(u, T (u)) − (u, T (x)) − ½(x, T (x)) + (x, T (x))
= ½(u − x, T (u − x)) ≥ λ0 ‖u − x‖², u ∈ U, (S17)

for some constant λ0 > 0. That is, f (u) ≥ f (x) for all u ∈ U .
(ii) If x, y ∈ U solve (5.3.25), then (i) indicates that x, y are the mini-
mum points of f . Hence f (x) = f (y). Replacing u by y in (S17)
we arrive at x = y.
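A small numerical sketch of (i) and (ii) (Python, with made-up T and b): the minimizer of f is exactly the solution of T (x) = b, and it is unique.

```python
import numpy as np

# For positive definite T, the quadratic functional
# f(u) = (1/2)(u, T u) - (u, b) attains its minimum at the solution of T x = b.
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
T = M @ M.T + np.eye(3)              # positive definite
b = rng.standard_normal(3)

f = lambda u: 0.5 * u @ T @ u - u @ b
x = np.linalg.solve(T, b)            # the unique critical point

# f at random trial points never goes below f(x).
trials = rng.standard_normal((1000, 3))
print(all(f(u) >= f(x) - 1e-12 for u in trials))   # True
```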
5.3.19 Let S ∈ L(U ) be positive semi-definite so that S² = T . Then q(u) =
(S(u), S(u)) = ‖S(u)‖² for any u ∈ U . Hence the Schwarz inequality
(4.3.10) gives us
q(αu + βv) = (S(αu + βv), S(αu + βv))
= α²‖S(u)‖² + 2αβ(S(u), S(v)) + β²‖S(v)‖²
≤ α²‖S(u)‖² + 2αβ‖S(u)‖‖S(v)‖ + β²‖S(v)‖²
≤ α²‖S(u)‖² + αβ(‖S(u)‖² + ‖S(v)‖²) + β²‖S(v)‖²
= α(α + β)‖S(u)‖² + β(α + β)‖S(v)‖²
= αq(u) + βq(v), u, v ∈ U,
where we have also used the inequality 2ab ≤ a 2 + b2 for a, b ∈ R.

Section 5.4

5.4.1 Suppose otherwise that A is positive definite. Using Theorem 5.11 we


have
     
a1 , a2 , a3 > 0, a1 a3 − a2² > 0, a1 a2 − a3² > 0, a2 a3 − a1² > 0,
which contradicts det(A) > 0.
5.4.2 In the general case, n ≥ 3, we have

q(x) = Σ_{i=1}^{n−1} (xi + ai xi+1 )² + (xn + an x1 )², x = (x1 , . . . , xn )t ∈ Rn .

We can rewrite q(x) as Σ_{i=1}^{n} yi² with

yi = xi + ai xi+1 , i = 1, . . . , n − 1, yn = xn + an x1 . (S18)
It is seen that q(x) is positive definite if and only if the change of vari-
ables given in (S18) is invertible, which is equivalent to the condition
1 + (−1)n+1 a1 a2 · · · an ≠ 0.
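The invertibility criterion for the change of variables (S18) can be checked numerically; the Python sketch below builds the coefficient matrix of (S18) for made-up values of a1 , . . . , an .

```python
import numpy as np

# q(x) = ||C x||^2 where C encodes y_i = x_i + a_i x_{i+1} (indices cyclic);
# q is positive definite exactly when det(C) = 1 + (-1)^(n+1) a_1...a_n != 0.
def change_of_variables(a):
    n = len(a)
    C = np.eye(n)
    for i in range(n):
        C[i, (i + 1) % n] = a[i]     # y_i = x_i + a_i x_{i+1}, with x_{n+1} = x_1
    return C

a = [2.0, 1.0, 3.0, -0.5]            # made-up coefficients, n = 4
C = change_of_variables(a)
print(np.isclose(np.linalg.det(C), 1 + (-1) ** (len(a) + 1) * np.prod(a)))  # True
print(np.all(np.linalg.eigvalsh(C.T @ C) > 1e-12))   # True: q positive definite
```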
5.4.3 If A is positive semi-definite, then A + λIn is positive definite for all
λ > 0. Hence all the leading principal minors of A + λIn are positive
when λ > 0. Taking λ → 0+ in these positive minors we arrive at


the conclusion. The converse is not true, however. For example, take
A = diag{1, 0, −1}. Then all the leading principal minors of A are non-
negative but A is indefinite.
5.4.5 (i) The left-hand side of equation (5.4.24) reads

(  An−1               An−1 β + α
   β t An−1 + α t     β t An−1 β + α t β + β t α + ann  ) .

Comparing the above with the right-hand side of (5.4.24), we arrive


at the equation An−1 β = −α, which has a unique solution since
An−1 is invertible. Inserting this result into the entry at the position
(n, n) in the above matrix and comparing with the right-hand side
of (5.4.24) again, we get a = −α t (An−1 )−1 α ≤ 0 since An−1 is
positive definite.
(ii) Taking determinants on both sides of (5.4.24) and using the facts
det(An−1 ) > 0 and a ≤ 0, we obtain
det(A) = det(An−1 )(ann + a) ≤ det(An−1 )ann .
(iii) This follows directly.
(iv) If A is positive semi-definite, then A + λIn is positive definite for
λ > 0. Hence det(A + λIn ) ≤ (a11 + λ) · · · (ann + λ), λ > 0.
Letting λ → 0+ in the above inequality we see that (5.4.26) holds
again.
5.4.6 (i) It is clear that the entry at the position (i, j ) of the matrix At A is
(ui , uj ).
(ii) Thus, applying (5.4.26), we have det(At A) ≤ (u1 , u1 ) · · · (un , un )
or (det(A))² ≤ ‖u1 ‖² · · · ‖un ‖².
5.4.7 Using the definition of the standard Euclidean scalar product we have
‖ui ‖ ≤ a√n, i = 1, . . . , n. Thus (5.4.29) follows.
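A numerical check of the two inequalities of Exercises 5.4.6 and 5.4.7 (a Python sketch with a made-up matrix whose entries are bounded by a = 1):

```python
import numpy as np

# Hadamard-type inequalities: det(A)^2 <= ||u_1||^2 ... ||u_n||^2 for the
# column vectors u_i of A, and |det(A)| <= a^n n^(n/2) when |a_ij| <= a.
rng = np.random.default_rng(3)
A = rng.uniform(-1.0, 1.0, size=(5, 5))      # entries bounded by a = 1
cols = [A[:, i] for i in range(5)]

lhs = np.linalg.det(A) ** 2
rhs = np.prod([np.dot(u, u) for u in cols])
print(lhs <= rhs)                            # True
print(abs(np.linalg.det(A)) <= 5 ** (5 / 2)) # True, with a = 1, n = 5
```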

Section 5.5

5.5.1 Assume the nontrivial case dim(U ) ≥ 2. First suppose that T satisfies
T (u) ∈ Span{u} for any u ∈ U . If T ≠ aI for any a ∈ F, then there are
some linearly independent vectors u, v ∈ U such that T (u) = au and
T (v) = bv for some a, b ∈ F with a ≠ b. Since T (u+v) ∈ Span{u+v},
we have T (u + v) = c(u + v) for some c ∈ F. Thus c(u + v) = au + bv,
which implies a = c and b = c because u, v are linearly independent.
This is false. Next we assume that there is some u ∈ U such that v =
T (u) ∉ Span{u}. Thus u, v are linearly independent. Let S ∈ L(U ) be
such that S(u) = v and S(v) = u. Then T (v) = T (S(u)) = S(T (u)) =
S(v) = u. Let R ∈ L(U ) be such that R(u) = v and R(v) = 0. Then


u = T (v) = T (R(u)) = R(T (u)) = R(v) = 0, which is again false.
5.5.3 Let λ1 , . . . , λn ∈ F be the n distinct eigenvalues of T and u1 , . . . , un
the associated eigenvectors, respectively, which form a basis of U . We
have T (S(ui )) = S(T (ui )) = λi S(ui ). So S(ui ) ∈ Eλi . In other words,
there is some bi ∈ F such that S(ui ) = bi ui for i = 1, . . . , n. It is clear
that the scalars b1 , . . . , bn determine S uniquely. On the other hand,
consider a mapping R ∈ L(U ) defined by

R = a0 I + a1 T + · · · + an−1 T n−1 , a0 , a1 , . . . , an−1 ∈ F. (S19)

Then

R(ui ) = (a0 + a1 λi + · · · + an−1 λi^{n−1} )ui , i = 1, . . . , n.

Therefore, if a0 , a1 , . . . , an−1 are so chosen that

a0 + a1 λi + · · · + an−1 λi^{n−1} = bi , i = 1, . . . , n, (S20)

then R = S. However, using the Vandermonde determinant, we know


that the non-homogeneous system of equations (S20) has a unique so-
lution in a0 , a1 , . . . , an−1 . Thus R may be constructed by (S19) to yield
R = S.
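The construction of R by (S19) and (S20) can be carried out numerically; the following Python sketch works in an eigenbasis of T with made-up eigenvalues and made-up scalars bi :

```python
import numpy as np

# If T has n distinct eigenvalues and S commutes with T, then
# S = a_0 I + a_1 T + ... + a_{n-1} T^{n-1}, where the a_i solve a
# Vandermonde system as in (S20).
lam = np.array([1.0, 2.0, -1.0, 0.5])        # distinct eigenvalues of T
T = np.diag(lam)                             # work in an eigenbasis of T
b = np.array([3.0, -2.0, 0.0, 1.0])          # S(u_i) = b_i u_i
S = np.diag(b)

V = np.vander(lam, increasing=True)          # rows (1, λ_i, λ_i^2, λ_i^3)
a = np.linalg.solve(V, b)                    # a_0 + a_1 λ_i + ... = b_i

R = sum(a[k] * np.linalg.matrix_power(T, k) for k in range(len(lam)))
print(np.allclose(R, S))                     # True: S is a polynomial in T
```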
5.5.4 From the previous exercise we have CT = Span{I, T , . . . , T n−1 }. It
remains to show that, as vectors in L(U ), the mappings I, T , . . . , T n−1
are linearly independent. To see this, we consider

c0 I + c1 T + · · · + cn−1 T n−1 = 0, c0 , c1 , . . . , cn−1 ∈ F.

Applying the above to ui we obtain

c0 + c1 λi + · · · + cn−1 λi^{n−1} = 0, i = 1, . . . , n,

which lead to c0 = c1 = · · · = cn−1 = 0 in view of the Vandermonde


determinant. So dim(CT ) = n.

Section 5.6

5.6.3 (i) Let v ∈ N (T ′ ). Then 0 = (u, T ′ (v))U = (T (u), v)V for any u ∈
U . So v ∈ R(T )⊥ . Conversely, if v ∈ R(T )⊥ , then for any u ∈ U
we have 0 = (T (u), v)V = (u, T ′ (v))U . So T ′ (v) = 0 or v ∈
N (T ′ ). Thus N (T ′ ) = R(T )⊥ . Taking ⊥ on this relation we obtain
R(T ) = N (T ′ )⊥ .
(ii) From R(T )⊥ = N (T ′ ) and V = R(T ) ⊕ R(T )⊥ we have
dim(V ) = r(T ) + n(T ′ ). Applying this result in the rank equation
r(T ′ ) + n(T ′ ) = dim(V ) we get r(T ) = r(T ′ ).
(iii) It is clear that N (T ) ⊂ N (T ′ ◦ T ). If u ∈ N (T ′ ◦ T ) then 0 =
(u, (T ′ ◦ T )(u))U = (T (u), T (u))V . Thus u ∈ N (T ). This proves
N (T ) = N (T ′ ◦ T ). Hence n(T ) = n(T ′ ◦ T ). Using the rank
equations dim(U ) = r(T ) + n(T ) and dim(U ) = r(T ′ ◦ T ) +
n(T ′ ◦ T ) we obtain r(T ) = r(T ′ ◦ T ). Replacing T by T ′ , we get
r(T ′ ) = r(T ◦ T ′ ).

Section 6.2

6.2.5 Since P 2 = P , we know that P projects U onto R(P ) along N (P ). If


N (P ) = R(P )⊥ , we show that P ′ = P . To this end, for any u1 , u2
∈ U we rewrite them as u1 = v1 + w1 , u2 = v2 + w2 where
v1 , v2 ∈ R(P ), w1 , w2 ∈ R(P )⊥ . Then we have P (v1 ) = v1 , P (v2 ) =
v2 , P (w1 ) = P (w2 ) = 0. Hence (u1 , P (u2 )) = (v1 + w1 , v2 ) =
(v1 , v2 ), (P (u1 ), u2 ) = (v1 , v2 + w2 ) = (v1 , v2 ). This proves
(u1 , P (u2 )) = (P (u1 ), u2 ), which indicates P = P ′ . Conversely, we
note that for any T ∈ L(U ) there holds R(T )⊥ = N (T ′ ) (cf. Exercise
5.6.3). If P = P ′ , we have R(P )⊥ = N (P ).
6.2.8 (i) If T k = 0 for some k ≥ 1 then eigenvalues of T vanish. Let
{u1 , . . . , un } be a basis of U consisting of eigenvectors of T . Then
T (ui ) = 0 for all i = 1, . . . , n. Hence T = 0.
(ii) If k = 1, then there is nothing to show. Assume that the statement
is true up to k = l ≥ 1. We show that the statement is true at k = l +1. If
l is odd, then k = 2m for some integer m ≥ 1. Hence T 2m (u) = 0 gives
us (u, T 2m (u)) = (T m (u), T m (u)) = 0. So T m (u) = 0. Since m ≤ l, we
deduce T (u) = 0. If l is even, then l = 2m for some integer m ≥ 1 and
T 2m+1 (u) = 0 gives us T 2m (v) = 0 where v = T (u). Hence T (v) = 0.
That is, T 2 (u) = 0. So it follows again that T (u) = 0.

Section 6.3

6.3.11 Since A† A is positive definite, there is a positive definite Hermitian


matrix B ∈ C(n, n) such that A† A = B 2 . Thus we can rewrite A as
A = P B with P = (A† )−1 B which may be checked to satisfy

P P † = (A† )−1 BB † A−1 = (A† )−1 B 2 A−1 = (A† )−1 A† AA−1 = In .


6.3.12 From the previous exercise, we may rewrite A as A = CB where C, B


are in C(n, n) such that C is unitary and B positive definite. Hence
A† A = B². So if λ1 , . . . , λn > 0 are the eigenvalues of A† A then
√λ1 , . . . , √λn are those of B. Let Q ∈ C(n, n) be unitary such that
B = Q† DQ, where D is given in (6.3.26). Then A = CQ† DQ. Let
P = CQ† . We see that P is unitary and the problem follows.
6.3.13 Since T is positive definite, (u, v)T ≡ (u, T (v)), u, v ∈ U , also de-
fines a positive definite scalar product over U . So the problem follows
from the Schwarz inequality stated in terms of this new scalar product.

Section 6.4

6.4.4 Assume that A = (aij ) is upper triangular. Then we see that the diag-
onal entries of A† A and AA† are

|a11 |², . . . , Σ_{j=1}^{i} |aj i |², . . . , Σ_{j=1}^{n} |aj n |²,

Σ_{j=1}^{n} |a1j |², . . . , Σ_{j=i}^{n} |aij |², . . . , |ann |²,

respectively. Comparing these entries in the equation A† A = AA† , we


get aij = 0 for i < j . So A is diagonal.
6.4.6 (i) The conclusion T = 0 follows from the fact that all eigenvalues of
T vanish and U has a basis consisting of eigenvectors of T .
(ii) We have (T ′ ◦ T )k (u) = 0 since T ′ , T commute. Using Exer-
cise 6.2.8 we have (T ′ ◦ T )(u) = 0. Hence 0 = (u, (T ′ ◦ T )(u)) =
(T (u), T (u)) which gives us T (u) = 0.
6.4.10 Assume that there are positive semi-definite R and unitary S, in L(U ),
such that
T = R ◦ S = S ◦ R. (S21)
    
Then T ′ = S ′ ◦ R ′ = R ′ ◦ S ′ . So T ′ ◦ T = T ◦ T ′ and T is nor-
mal. Now assume that T is normal. Then U has an orthonormal basis,
say {u1 , . . . , un }, consisting of eigenvectors of T . Set T (ui ) = λi ui
(i = 1, . . . , n). Make polar decompositions of these eigenvalues,
λi = |λi |eiθi , i = 1, . . . , n, with the convention that θi = 0 if λi = 0.
Define R, S ∈ L(U ) by setting

R(ui ) = |λi |ui , S(ui ) = eiθi ui , i = 1, . . . , n.


Then it is clear that R is positive semi-definite, S unitary, and (S21)


holds.
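A matrix version of this construction can be checked numerically; in the Python sketch below the normal matrix T and its eigenvalues are made up:

```python
import numpy as np

# A normal matrix T factors as T = R S = S R with R positive semi-definite
# and S unitary, built from the polar form of the eigenvalues of T.
rng = np.random.default_rng(4)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(X)                       # a unitary matrix of eigenvectors
lam = np.array([2.0, -1.0 + 1.0j, 3.0j, 0.0])
T = U @ np.diag(lam) @ U.conj().T            # normal by construction

abs_lam = np.abs(lam)
phase = np.ones_like(lam)                    # convention: phase 1 when lambda = 0
nonzero = abs_lam > 0
phase[nonzero] = lam[nonzero] / abs_lam[nonzero]

R = U @ np.diag(abs_lam) @ U.conj().T        # positive semi-definite
S = U @ np.diag(phase) @ U.conj().T          # unitary

print(np.allclose(R @ S, T), np.allclose(S @ R, T))    # True True
print(np.allclose(S @ S.conj().T, np.eye(4)))          # True: S is unitary
```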
6.4.12 Assume that T is normal. Let λ1 , . . . , λk ∈ C be all the distinct eigen-
values of T and Eλ1 , . . . , Eλk the corresponding eigenspaces. Then
U = Eλ1 ⊕ · · · ⊕ Eλk . Let {ui,1 , . . . , ui,mi } be an orthonormal basis of
Eλi (i = 1, . . . , k). Then T ′ (u) = λ̄i u for any u ∈ Eλi (i = 1, . . . , k).
On the other hand, using the Vandermonde determinant, we know that
we can find a0 , a1 , . . . , ak−1 ∈ C such that

a0 + a1 λi + · · · + ak−1 λi^{k−1} = λ̄i , i = 1, . . . , k.

Now set p(t) = a0 + a1 t + · · · + ak−1 t^{k−1} . Then p(T )(u) = λ̄i u for
any u ∈ Eλi (i = 1, . . . , k). Thus T ′ = p(T ).
6.4.13 Assume that T is normal. If dim(U ) = n = 1, there is nothing to do.
Assume n ≥ 2. Then the previous exercise establishes that there is
a polynomial p(t) such that T ′ = p(T ). Hence any subspace invari-
ant under T is also invariant under T ′ . However, T ′ = p(T ) implies
T = q(T ′ ) where q is the polynomial obtained from p by replacing
the coefficients of p by the complex conjugates of the corresponding
coefficients of p. Thus any invariant subspace of T ′ is also invariant
under T .
Now assume that T and T ′ have the same invariant subspaces. We
show that T is normal. We proceed inductively on dim(U ) = n. If
n = 1, there is nothing to do. Assume that the conclusion is true at
n = k ≥ 1. We consider n = k + 1. Let λ ∈ C be an eigenvalue of
T and u ∈ U an associated eigenvector of T . Then V = Span{u} is
invariant under T and T ′ . We claim that V ⊥ is invariant under T and T ′
as well. In fact, take w ∈ V ⊥ . Since T ′ (u) = au for some a ∈ C, we
have (u, T (w)) = (T ′ (u), w) = a(u, w) = 0. So T (w) ∈ V ⊥ . Hence
V ⊥ is invariant under T and T ′ as well. Since dim(V ⊥ ) = n − 1 = k,
we see that T ◦ T ′ = T ′ ◦ T on V ⊥ . Besides, we have T (T ′ (u)) =
T (au) = aλu and T ′ (T (u)) = T ′ (λu) = λau. So T ◦ T ′ = T ′ ◦ T on
V . This proves that T is normal on U since U = V ⊕ V ⊥ .

Section 6.5

6.5.4 Since T is normal, we have (T ²)′ ◦ T ² = (T ′ ◦ T )² so that the eigenvalues
of (T ²)′ ◦ T ² are those of T ′ ◦ T squared. This proves ‖T ²‖ = ‖T ‖² in
particular. Likewise, (T m )′ ◦ T m = (T ′ ◦ T )m so that the eigenvalues
of (T m )′ ◦ T m are those of T ′ ◦ T raised to the mth power. So ‖T m ‖ =
‖T ‖m is true in general.
6.5.8 Note that some special forms of this problem have appeared as Exer-
cises 6.3.11 and 6.3.12. By the singular value decomposition for A we
may rewrite A as A = P ΣQ where Q, P ∈ C(n, n) are unitary and
Σ ∈ R(n, n) is diagonal whose diagonal entries are all nonnegative.
Alternatively, we also have A = (P Q)(Q† ΣQ) = (P ΣP † )(P Q), as
products of a unitary and a positive semi-definite matrix, expressed in
two different orders.
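A numerical sketch of the two orderings (Python, with a made-up complex matrix), reading both polar factorizations off the singular value decomposition:

```python
import numpy as np

# From the singular value decomposition A = P Sigma Q we obtain the two
# polar decompositions A = (unitary)(psd) and A = (psd)(unitary).
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

P, s, Q = np.linalg.svd(A)               # A = P @ diag(s) @ Q
Sigma = np.diag(s)

W = P @ Q                                # unitary factor
K1 = Q.conj().T @ Sigma @ Q              # positive semi-definite
K2 = P @ Sigma @ P.conj().T              # positive semi-definite

print(np.allclose(W @ K1, A), np.allclose(K2 @ W, A))   # True True
```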

Section 7.1

7.1.4 Since g = 0, we also have f = 0. Assume f n |g n . We show that f |g. If


n = 1, there is nothing to do. Assume n ≥ 2. Let h = gcd(f, g). Then
we have f = hp, g = hq, p, q ∈ P, and gcd(p, q) = 1. If p is a scalar,
then f |g. Suppose otherwise that p is not a scalar. We rewrite f n and g n
as f n = hn pn and g n = hn q n . Since f n |g n , we have g n = f n r, where
r ∈ P. Hence hn q n = hn pn r. Therefore q n = pn r. In particular, p|q n .
However, since gcd(p, q) = 1, we have p|q n−1 . Arguing repeatedly we
arrive at p|q, which is a contradiction.
7.1.5 Let h = gcd(f, g). Then f = hp, g = hq, and gcd(p, q) = 1.
Thus f n = hn pn , g n = hn q n , and gcd(p n , q n ) = 1, which implies
gcd(f n , g n ) = hn .

Section 7.2

7.2.1 If pS (λ) and pT (λ) are relatively prime, then there are polynomials f, g
such that f (λ)pS (λ) + g(λ)pT (λ) = 1. Thus, I = f (T )pS (T ) +
g(T )pT (T ) = f (T )pS (T ) which implies pS (T ) is invertible and
pS (T )−1 = f (T ). Similarly we see that pT (S) is also invertible.
7.2.2 Using the notation of the previous exercise, we have I = f (T )pS (T ).
Thus, applying R ◦ S = T ◦ R, we get

R = f (T )pS (T ) ◦ R = R ◦ f (S)pS (S) = 0.

7.2.3 It is clear that N (T ) ⊂ R(I − T ). It is also clear that R(I − T ) ⊂


N(T ) when T 2 = T . So N (T ) = R(I − T ) if T 2 = T . Thus (7.2.14)
follows from the rank equation r(T ) + n(T ) = dim(U ). Conversely,
assume (7.2.14) holds. From this and the rank equation again we get
r(I − T ) = n(T ). So N (T ) = R(I − T ) which establishes T 2 = T .
7.2.4 We have f1 g1 + f2 g2 = 1 for some polynomials f1 , f2 . So

I = f1 (T )g1 (T ) + f2 (T )g2 (T ). (S22)


As a consequence, for any u ∈ U , we have u = v + w where


v = f1 (T )g1 (T )u and w = f2 (T )g2 (T )u. Hence g2 (T )v =
f1 (T )pT (T )u = 0 and g1 (T )w = f2 (T )pT (T )u = 0. This shows v ∈
N(g2 (T )) and w ∈ N (g1 (T )). Therefore U = N (g1 (T )) + N (g2 (T )).
Pick u ∈ N (g1 (T ))∩N (g2 (T )). Applying (S22) to u we see that u = 0.
Hence the problem follows.
7.2.6 We proceed by induction on k. If k = 1, (7.2.16) follows from Exer-
cise 7.2.3. Assume that at k − 1 ≥ 1 the relation (7.2.16) holds. That is,

U = R(T1 ) ⊕ · · · ⊕ R(Tk−1 ) ⊕ W, W = N (T1 ) ∩ · · · ∩ N (Tk−1 ).


(S23)

Now we have U = R(Tk ) ⊕ N (Tk ). We assert N (Tk ) = R(T1 ) ⊕


· · · ⊕ R(Tk−1 ) ⊕ V . In fact, pick any u ∈ N (Tk ). Then (S23)
indicates that u = u1 + · · · + uk−1 + w for u1 ∈ R(T1 ), . . . , uk−1 ∈
R(Tk−1 ), w ∈ W . Since Tk ◦ Ti = 0 for i = 1, . . . , k − 1, we see that
u1 , . . . , uk−1 ∈ N (Tk ). Hence w ∈ N (Tk ). So w ∈ V . This establishes
the assertion and the problem follows.

Section 7.3

7.3.2 (ii) From (7.3.41) we see that, if we set

p(λ) = λn − an−1 λn−1 − · · · − a1 λ − a0 ,

then p(T )(T k (u)) = T k (p(T )(u)) = 0 for k = 0, 1, . . . , n − 1, where


T 0 = I . This establishes p(T ) = 0 since {T n−1 (u), . . . , T (u), u} is a
basis of U . So pT (λ) = p(λ). It is clear that mT (λ) = pT (λ).
7.3.3 Let u be a cyclic vector of T . Assume that S and T commute. Let
a0 , a1 , . . . , an−1 ∈ F be such that S(u) = an−1 T n−1 (u) + · · · +
a1 T (u)+a0 u. Set p(t) = an−1 t n−1 +· · ·+a1 t +a0 . Hence S(T k (u)) =
T k (S(u)) = (T k p(T ))(u) = p(T )(T k (u)) for k = 1, . . . , n − 1. This
proves S = p(T ) since u, T (u), . . . , T n−1 (u) form a basis of U .
7.3.4 (i) Let u be a cyclic vector of T . Since T is normal, U has a basis
consisting of eigenvectors, say u1 , . . . , un , of T , associated with the
corresponding eigenvalues, λ1 , . . . , λn . Express u as u = a1 u1 + · · · +
an un for some a1 , . . . , an ∈ C. Hence

T (u) = a1 λ1 u1 + · · · + an λn un ,
· · · · · · · · · · · · · · · · · ·
T^{n−1} (u) = a1 λ1^{n−1} u1 + · · · + an λn^{n−1} un .
Inserting the above relations into the equation

x1 u + x2 T (u) + · · · + xn T n−1 (u) = 0, (S24)

we obtain


a1 (x1 + λ1 x2 + · · · + λ1^{n−1} xn ) = 0,
a2 (x1 + λ2 x2 + · · · + λ2^{n−1} xn ) = 0,
· · · · · · · · · · · · · · ·
an (x1 + λn x2 + · · · + λn^{n−1} xn ) = 0. (S25)

If λ1 , . . . , λn are not all distinct, then (S25) has a solution


(x1 , . . . , xn ) ∈ Cn , which is not the zero vector for any given
a1 , . . . , an , contradicting the linear independence of the vectors
u, T (u), . . . , T n−1 (u).
(ii) With (7.3.42), we consider the linear dependence of the vectors
u, T (u), . . . , T n−1 (u) as in part (i) and come up with (S24) and (S25).
Since λ1 , . . . , λn are distinct, in view of the Vandermonde determinant,
we see that the system (S25) has only the zero solution x1 = 0, . . . , xn = 0
if and only if ai ≠ 0 for any i = 1, . . . , n.
7.3.5 Assume there is such an S. Then S is nilpotent of degree m where m
satisfies 2(n − 1) < m ≤ 2n since S 2(n−1) = T n−1 ≠ 0 and S 2n =
T n = 0. By Theorem 2.22, we arrive at 2(n − 1) < m ≤ n, which is
false.

Section 7.4

7.4.5 Let the invertible matrix C ∈ C(n, n) be decomposed as C = P + iQ,


where P , Q ∈ R(n, n). If Q = 0, there is nothing to show. Assume
Q ≠ 0. Then from CA = BC we have P A = BP and QA = BQ.
Thus, for any real number λ, we have (P + λQ)A = B(P + λQ). It
is clear that there is some λ ∈ R such that det(P + λQ) ≠ 0. For such
λ, set K = P + λQ. Then A = K −1 BK.
7.4.9 (i) Let u ∈ Cn \ {0} be an eigenvector associated to the eigenvalue
λ. Then Au = λu. Thus Ak u = λk u. Since there is an invert-
ible matrix B ∈ C(n, n) such that Ak = B −1 AB, we obtain
(B −1 ABu) = λk u or A(Bu) = λk (Bu), so that the problem
follows.
(ii) From (i) we see that if λ ∈ C is an eigenvalue then λ^k , λ^{k²} , . . . , λ^{k^l}
are all eigenvalues of A which cannot be all distinct when l is
large enough. So there are some integers 1 ≤ l < m such that
λ^{k^l} = λ^{k^m} . Since A is nonsingular, λ ≠ 0. Hence λ satisfies
λ^{k^{m−l}} = 1 as asserted.
7.4.11 Use (3.4.37) without assuming det(A) = 0 or (S16).
7.4.12 Take u = (1, . . . , 1)t ∈ Rn . It is clear that Au = nu. It is also
clear that n(A) = n − 1 since r(A) = 1. So n(A − nIn ) = 1
and A ∼ diag{n, 0, . . . , 0}. Take v = (1, b2 /n, . . . , bn /n)t ∈ Rn . Then
Bv = nv. Since r(B) = 1, we get n(B) = n − 1. So n(B − nIn ) = 1.
Consequently, B ∼ diag{n, 0, . . . , 0}. Thus A ∼ B.
7.4.17 Suppose otherwise that there is an A ∈ R(3, 3) such that m(λ) = λ2 +
3λ + 4 is the minimal polynomial of A. Let pA (λ) be the characteristic
polynomial of A. Since pA (λ) ∈ P3 and the coefficients of pA (λ) are
all real, pA (λ) has a real root. On the other hand, recall that the
roots of m(λ) are all the roots of pA (λ) but the former has no real root.
So we arrive at a contradiction.
7.4.18 (i) Let mA (λ) be the minimal polynomial of A. Then mA (λ)|λ2 + 1.
However, λ2 + 1 is prime over R, so mA (λ) = λ2 + 1.
(ii) Let pA (λ) be the characteristic polynomial of A. Then the degree
of pA (λ) is n. Since mA (λ) = λ² + 1 contains all the roots of
pA (λ) in C, which are ±i and must appear in conjugate pairs
because pA (λ) has real coefficients, we get pA (λ) = (λ² + 1)^m for
some integer m ≥ 1. Hence n = 2m.
(iii) Since pA (λ) = (λ − i)m (λ + i)m and mA (λ) = (λ − i)(λ + i) (i.e.,
±i are single roots of mA (λ)), we know that

N (A − iIn ) = {x ∈ Cn | Ax = ix},
N (A + iIn ) = {y ∈ Cn | Ay = −iy},

are both of dimension m in Cn and Cn = N (A − iIn ) ⊕ N (A +


iIn ). Since A is real, we see that if x ∈ N (A − iIn ) then x ∈
N(A + iIn ), and vice versa. Moreover, if {w1 , . . . , wm } is a basis
of N (A − iIn ), then {w1 , . . . , w m } is a basis of N (A + iIn ), and
vice versa. Thus {w1 , . . . , wm , w 1 , . . . , w m } is a basis of Cn . We
now make the decomposition

wi = ui + ivi , ui , vi ∈ Rn , i = 1, . . . , m. (S26)

Then ui , vi (i = 1, . . . , m) satisfy (7.4.23). It remains to show


that these vectors are linearly independent in Rn . In fact, consider

m 
m
ai ui + bi vi = 0, a1 , . . . , am , b1 , . . . , bm ∈ R. (S27)
i=1 i=1

From (S26) we have


1 1
ui = (wi + w i ), vi = (wi − wi ), i = 1, . . . , m.
2 2i
(S28)
Inserting (S28) into (S27), we obtain

m 
m
(ai − ibi )wi + (ai + ibi )w i = 0,
i=1 i=1

which leads to ai = bi = 0 for all i = 1, . . . , m.


(iv) Take ordered basis B = {u1 , . . . , um , v1 , . . . , vm }. Then it is seen
that, with respect to B, the matrix representation of the mapping
TA ∈ L(Rn ) defined by TA (u) = Au, u ∈ Rn is simply

C = (   0    Im
      −Im     0  ) .

More precisely, if a matrix called B is formed by using the vectors


in the ordered basis B as its first, second, . . . , and the nth column
vectors, then AB = BC, which establishes (7.4.24).

Section 8.1

8.1.5 If A is normal, there is a unitary matrix P ∈ C(n, n) such that A =


P † DP , where D is a diagonal matrix of the form diag{λ1 , . . . , λn } with
λ1 , . . . , λn the eigenvalues of A which are assumed to be real. Thus
A† = A. However, because A is real, we have A = At .
8.1.6 We proceed inductively on dim(U ). If dim(U ) = 1, there is nothing to
show. Assume that the problem is true at dim(U ) = n − 1 ≥ 1. We
prove the conclusion at dim(U ) = n ≥ 2. Let λ ∈ C be an eigenvalue
of T and Eλ the associated eigenspace of T . Then for u ∈ Eλ we have
T (S(u)) = S(T (u)) = λS(u). Hence S(u) ∈ Eλ . So Eλ is invariant un-
der S. As an element in L(Eλ ), S has an eigenvalue μ ∈ C. Let u ∈ Eλ
be an eigenvector of S associated with μ. Then u is a common eigen-
vector of S and T . Applying this observation to S ′ and T ′ since S ′ , T ′
commute as well, we know that S ′ and T ′ also have a common eigenvec-
tor, say w, satisfying S ′ (w) = σ w, T ′ (w) = γ w, for some σ, γ ∈ C.
Let V = (Span{w})⊥ . Then V is invariant under S and T as can be seen


from
(w, S(v)) = (S ′ (w), v) = (σ w, v) = σ (w, v) = 0,
(w, T (v)) = (T ′ (w), v) = (γ w, v) = γ (w, v) = 0,
for v ∈ V . Since dim(V ) = n − 1, we may find an orthonormal basis
of V , say {u1 , . . . , un−1 }, under which the matrix representations of S
and T are upper triangular. Let un = w/‖w‖. Then {u1 , . . . , un−1 , un }
is an orthonormal basis of U under which the matrix representations of
S and T are upper triangular.
8.1.9 Let λ ∈ C be any eigenvalue of T and v ∈ U an associated eigenvector.
Then we have
((T ′ − λ̄I )(u), v) = (u, (T − λI )(v)) = 0, u ∈ U, (S29)

which implies that R(T ′ − λ̄I ) ⊂ (Span{v})⊥ . Therefore r(T ′ − λ̄I ) ≤
n − 1. Thus, in view of the rank equation, we obtain n(T ′ − λ̄I ) ≥ 1. In
other words, this shows that λ̄ must be an eigenvalue of T ′ .

Section 8.2

8.2.3 Let B = {u1 , . . . , un } be a basis of U and x, y ∈ Fn the coordinate


vectors of u, v ∈ U with respect to B. With A = (aij ) = (f (ui , uj ))
and (8.2.2), we see that u ∈ U0 if and only if x t Ay = 0 for all y ∈ Fn
or Ax = 0. In other words, u ∈ U0 if and only if x ∈ N (A). So
dim(U0 ) = n(A) = n − r(A) = dim(U ) − r(A).
8.2.4 (i) As in the previous exercise, we use B = {u1 , . . . , un } to denote
a basis of U and x, y ∈ Fn the coordinate vectors of any vectors
u, v ∈ U with respect to B. With A = (aij ) = (f (ui , uj )) and
(8.2.2), we see that u ∈ V ⊥ if and only if
(Ax)t y = 0, v ∈ V. (S30)
Let dim(V ) = m. We can find m linearly independent vectors
y (1) , . . . , y (m) in Fn to replace the condition (S30) by
(y (1) )t (Ax) = 0, . . . , (y (m) )t (Ax) = 0.
These equations indicate that, if we use B to denote the matrix
formed by taking (y (1) )t , . . . , (y (m) )t as its first,. . . , and mth row
vectors, then Ax ∈ N (B) = {z ∈ Fn | Bz = 0}. In other words, the
subspace of Fn consisting of the coordinate vectors of the vectors in
V ⊥ is given by X = {x ∈ Fn | Ax ∈ N (B)}. Since A is invertible,
we have dim(X) = dim(N (B)) = n(B) = n − r(B) = n − m. This


establishes dim(V ⊥ ) = dim(X) = dim(U ) − dim(V ).
(ii) For v ∈ V , we have f (u, v) = 0 for any u ∈ V ⊥ . So V ⊂ (V ⊥ )⊥ .
On the other hand, from (i), we get

dim(V ) + dim(V ⊥ ) = dim(V ⊥ ) + dim((V ⊥ )⊥ ).

So dim(V ) = dim((V ⊥ )⊥ ) which implies V = (V ⊥ )⊥ .


8.2.7 If V = V ⊥ , then V is isotropic and dim(V ) = ½ dim(U ) in view of
Exercise 8.2.4. Thus V is Lagrangian. Conversely, if V is Lagrangian,
then V is isotropic such that V ⊂ V ⊥ and dim(V ) = ½ dim(U ). From
Exercise 8.2.4, we have dim(V ⊥ ) = ½ dim(U ) = dim(V ). So V = V ⊥ .
2

Section 8.3

8.3.1 Let u = (ai ) ∈ Rn be a positive eigenvector associated to r. Then the


ith component of the relation Au = ru reads

rai = Σ_{j=1}^{n} aij aj , i = 1, . . . , n. (S31)

Choose k, l = 1, . . . , n such that

ak = min{ai | i = 1, . . . , n}, al = max{ai | i = 1, . . . , n}.

Inserting these into (S31) we find



n 
n
rak ≥ ak akj , ral ≤ al alj .
j =1 j =1

From these we see that the bounds stated in (8.3.27) follow.
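The bounds (8.3.27), namely that the Perron root lies between the smallest and the largest row sum, can be checked numerically (a Python sketch with a made-up positive matrix):

```python
import numpy as np

# For a positive matrix A, the Perron root r satisfies
# min_i sum_j a_ij <= r <= max_i sum_j a_ij.
rng = np.random.default_rng(6)
A = rng.uniform(0.1, 2.0, size=(5, 5))        # positive matrix

eigvals = np.linalg.eigvals(A)
r = max(eigvals, key=abs).real                # Perron root (real and positive)
row_sums = A.sum(axis=1)
print(row_sums.min() <= r + 1e-12 and r <= row_sums.max() + 1e-12)   # True
```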


8.3.3 Use the notation ΛA = {λ ∈ R | λ ≥ 0, Ax ≥ λx for some x ∈ S},
where S is defined by (8.3.3). Recall the construction (8.3.7). We see
that rA = sup{λ ∈ ΛA }. Since A ≤ B implies ΛA ⊂ ΛB , we deduce
rA ≤ rB .

Section 8.4

8.4.4 From lim Am = K, we obtain lim (At )m = lim (Am )t = K t .


m→∞ m→∞ m→∞
Since all the row vectors of K and K t are identical, we see that all
entries of K are identical. By the condition (8.4.13), we deduce (8.4.24).
8.4.5 It is clear that all the entries of A and At are non-negative. It remains
to show that 1 and u = (1, . . . , 1)t ∈ Rn are a pair of eigenvalues and
eigenvectors of both A and At . In fact, applying Ai u = u and Ati u = u
(i = 1, . . . , k) consecutively, we obtain Au = A1 · · · Ak u = u and
At u = Atk · · · At1 u = u, respectively.
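A quick numerical illustration (Python, with made-up doubly Markov matrices produced by a simple Sinkhorn-type normalization, which is only one convenient way to generate such matrices):

```python
import numpy as np

# A product of doubly Markov (doubly stochastic) matrices is again doubly
# Markov: the all-ones vector u satisfies A u = u and A^t u = u for each factor.
def random_doubly_stochastic(n, rng, iters=200):
    A = rng.uniform(0.5, 1.5, size=(n, n))
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)   # normalize rows
        A /= A.sum(axis=0, keepdims=True)   # normalize columns
    return A

rng = np.random.default_rng(7)
A1, A2, A3 = (random_doubly_stochastic(4, rng) for _ in range(3))
A = A1 @ A2 @ A3
print(np.allclose(A.sum(axis=1), 1.0), np.allclose(A.sum(axis=0), 1.0))  # True True
```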

Section 9.3

9.3.2 (i) In the uniform state (9.3.30), we have


⟨A⟩ = (1/n) Σ_{i=1}^{n} λi ,    ⟨A²⟩ = (1/n) Σ_{i=1}^{n} λi² .

Hence the uncertainty σA of the observable A in the state (9.3.30)


is given by the formula
σA² = ⟨A²⟩ − ⟨A⟩² = (1/n) Σ_{i=1}^{n} λi² − ((1/n) Σ_{i=1}^{n} λi )² . (S32)

(ii) From (9.3.25) and (S32), we obtain the comparison


(1/n) Σ_{i=1}^{n} λi² − ((1/n) Σ_{i=1}^{n} λi )² ≤ ¼ (λmax − λmin )² .
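This comparison can be checked numerically (a Python sketch with made-up eigenvalues λ1 , . . . , λn ):

```python
import numpy as np

# The variance of the eigenvalues in the uniform state never exceeds
# (lambda_max - lambda_min)^2 / 4.
lam = np.array([0.5, 1.0, 2.5, 3.0, 4.0])      # eigenvalues of the observable
variance = np.mean(lam ** 2) - np.mean(lam) ** 2
bound = 0.25 * (lam.max() - lam.min()) ** 2
print(variance <= bound)                       # True
```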
Bibliographic notes

We end the book by mentioning a few important but more specialized subjects
that are not touched upon in this book. We point out only some relevant references
for the interested reader.
Convex sets. In Lang [23] basic properties and characterizations of convex
sets in Rn are presented. For a deeper study of convex sets using advanced
tools such as the Hahn–Banach theorem see Lax [25].
Tensor products and alternating forms. These topics are covered elegantly
by Halmos [18]. In particular, there, the determinant is seen to arise as
the unique scalar, associated with each linear mapping, defined by the one-
dimensional space of top-degree alternating forms, over a finite-dimensional
vector space.
Minmax principle for computing the eigenvalues of self-adjoint mappings.
This is a classical variational resolution of the eigenvalue problem known
as the method of the Rayleigh–Ritz quotients. For a thorough treatment see
Bellman [6], Lancaster and Tismenetsky [22], and Lax [25].
Calculus of matrix-valued functions. These techniques are useful and pow-
erful in applications. For an introduction see Lax [25].
Irreducible matrices. Such a notion is crucial for extending the Perron–
Frobenius theorem and for exploring the Markov matrices further under more
relaxed conditions. See Berman and Plemmons [7], Horn and Johnson [21],
Lancaster and Tismenetsky [22], Meyer [29], and Xu [38] for related studies.
Transformation groups and bilinear forms. Given a non-degenerate bilin-
ear form over a finite-dimensional vector space, the set of all linear mappings
on the space which preserve the bilinear form is a group under the operation
of composition. With a specific choice of the bilinear form, a particular such
transformation group may thus be constructed and investigated. For a concise
introduction to this subject in the context of linear algebra see Hoffman and
Kunze [19].

Computational methods for solving linear systems. Practical methods for


solving systems of linear equations are well investigated and documented. See
Lax [25], Golub and Ortega [13], and Stoer and Bulirsch [32] for a description
of some of the methods.
Computing the eigenvalues of symmetric matrices. This is a much devel-
oped subject and many nice methods are available. See Lax [25] for meth-
ods based on the QR factorization and differential flows. See also Stoer and
Bulirsch [32].
Computing the eigenvalues of general matrices. Some nicely formulated
iterative methods may be employed to approximate the eigenvalues of a general
matrix under certain conditions. These methods include the QR convergence
algorithm and the power method. See Golub and Ortega [13] and Stoer and
Bulirsch [32].
Random matrices. The study of random matrices, the matrices whose entries
are random variables, was pioneered by Wigner [36, 37] to model the spectra of
large atoms and has recently become the focus of active mathematical research.
See Akemann, Baik, and Di Francesco [1], Anderson, Guionnet, and Zeitouni
[2], Mehta [28], and Tao [33], for textbooks, and Beenakker [5], Diaconis [12],
and Guhr, Müller-Groeling, and Weidenmüller [17], for survey articles.
Besides, for a rich variety of applications of Linear Algebra and its related
studies, see Bai, Fang, and Liang [3], Bapat [4], Bellman [6], Berman and
Plemmons [7], Berry, Dumais, and O’Brien [8], Brualdi and Ryser [9], Datta
[10], Davis [11], Gomide et al [14], Graham [15], Graybill [16], Horadam [20],
Latouche and Vaidyanathan [24], Leontief [26], Lyubich, Akin, Vulis, and Kar-
pov [27], Meyn and Tweedie [30], Stinson [31], Taubes [34], Van Dooren and
Wyman [35], and references therein.
References

[1] G. Akemann, J. Baik, and P. Di Francesco, The Oxford Handbook of Random


Matrix Theory, Oxford University Press, Oxford, 2011.
[2] G. W. Anderson, A. Guionnet, and O. Zeitouni, An Introduction to Random
Matrices, Cambridge University Press, Cambridge, 2010.
[3] Z. Bai, Z. Fang, and Y.-C. Liang, Spectral Theory of Large Dimensional Ran-
dom Matrices and Its Applications to Wireless Communications and Finance
Statistics, World Scientific, Singapore, 2014.
[4] R. B. Bapat, Linear Algebra and Linear Models, 3rd edn, Universitext, Springer-
Verlag and Hindustan Book Agency, New Delhi, 2012.
[5] C. W. J. Beenakker, Random-matrix theory of quantum transport, Reviews of
Modern Physics 69 (1997) 731–808.
[6] R. Bellman, Introduction to Matrix Analysis, 2nd edn, Society of Industrial and
Applied Mathematics, Philadelphia, 1997.
[7] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sci-
ences, Society of Industrial and Applied Mathematics, Philadelphia, 1994.
[8] M. Berry, S. Dumais, and G. O’Brien, Using linear algebra for intelligent infor-
mation retrieval, SIAM Review 37 (1995) 573–595.
[9] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory, Encyclopedia of
Mathematics and its Applications 39, Cambridge University Press, Cambridge,
1991.
[10] B. N. Datta, Numerical Linear Algebra and Applications, 2nd edn, Society of
Industrial and Applied Mathematics, Philadelphia, 2010.
[11] E. Davis, Linear Algebra and Probability for Computer Science Applications,
A. K. Peters/CRC Press, Boca Raton, FL, 2012.
[12] P. Diaconis, Patterns in eigenvalues: the 70th Josiah Willard Gibbs lecture, Bul-
letin of American Mathematical Society (New Series) 40 (2003) 155–178.
[13] G. H. Golub and J. M. Ortega, Scientific Computing and Differential Equations,
Academic Press, Boston and New York, 1992.
[14] J. Gomide, R. Melo-Minardi, M. A. dos Santos, G. Neshich, W. Meira, Jr., J. C.
Lopes, and M. Santoro, Using linear algebra for protein structural comparison
and classification, Genetics and Molecular Biology 32 (2009) 645–651.
[15] A. Graham, Nonnegative Matrices and Applicable Topics in Linear Algebra,
John Wiley & Sons, New York, 1987.

[16] F. A. Graybill, Introduction to Matrices with Applications in Statistics,


Wadsworth Publishing Company, Belmont, CA, 1969.
[17] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller, Random-matrix theories
in quantum physics: common concepts, Physics Reports 299 (1998) 189–425.
[18] P. R. Halmos, Finite-Dimensional Vector Spaces, 2nd edn, Springer-Verlag, New
York, 1987.
[19] K. Hoffman and R. Kunze, Linear Algebra, Prentice-Hall, Englewood Cliffs,
NJ, 1965.
[20] K. J. Horadam, Hadamard Matrices and Their Applications, Princeton Univer-
sity Press, Princeton, NJ, 2007.
[21] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press,
Cambridge, New York, and Melbourne, 1985.
[22] P. Lancaster and M. Tismenetsky, The Theory of Matrices, 2nd edn, Academic
Press, San Diego, New York, London, Sydney, and Tokyo, 1985.
[23] S. Lang, Linear Algebra, 3rd edn, Springer-Verlag, New York, 1987.
[24] G. Latouche and R. Vaidyanathan, Introduction to Matrix Analytic Methods in
Stochastic Modeling, Society of Industrial and Applied Mathematics, Philadel-
phia, 1999.
[25] P. D. Lax, Linear Algebra and Its Applications, John Wiley & Sons, Hoboken,
NJ, 2007.
[26] W. Leontief, Input-Output Economics, Oxford University Press, New York,
1986
[27] Y. I. Lyubich, E. Akin, D. Vulis, and A. Karpov, Mathematical Structures in
Population Genetics, Springer-Verlag, New York, 2011.
[28] M. L. Mehta, Random Matrices, Elsevier Academic Press, Amsterdam, 2004.
[29] C. Meyer, Matrix Analysis and Applied Linear Algebra, Society of Industrial
and Applied Mathematics, Philadelphia, 2000.
[30] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stabil-
ity, Springer-Verlag, London, 1993; 2nd edn, Cambridge University Press,
Cambridge, 2009.
[31] D. R. Stinson, Cryptography, Discrete Mathematics and its Applications, Chap-
man & Hall/CRC Press, Boca Raton, FL, 2005.
[32] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag,
New York, Heidelberg, and Berlin, 1980.
[33] T. Tao, Topics in Random Matrix Theory, American Mathematical Society,
Providence, RI, 2012.
[34] C. H. Taubes, Lecture Notes on Probability, Statistics, and Linear Algebra,
Department of Mathematics, Harvard University, Cambridge, MA, 2010.
[35] P. Van Dooren and B. Wyman, Linear Algebra for Control Theory, IMA Vol-
umes in Mathematics and its Applications, Springer-Verlag, New York, 2011.
[36] E. Wigner, Characteristic vectors of bordered matrices with infinite dimensions,
Annals of Mathematics 62 (1955) 548–564.
[37] E. Wigner, On the distribution of the roots of certain symmetric matrices, Annals
of Mathematics 67 (1958) 325–327.
[38] Y. Xu, Linear Algebra and Matrix Theory (in Chinese), 2nd edn, Higher Educa-
tion Press, Beijing, 2008.
Index

1-form, 16 Cayley–Hamilton theorem, 110


characteristic of a field, 2
characteristic roots, 107 characteristic polynomial, 107
characteristic polynomial of linear mapping,
addition, 3
109
adjoint mapping, 50, 122
Cholesky decomposition theorem, 168
adjoint matrix, 103
Cholesky relation, 168
adjugate matrix, 103
co-prime, 207
adjunct matrix, 103
codimension, 27
algebraic multiplicity, 216
cofactor, 89
algebraic number, 13
cofactor expansion, 89
angular frequencies, 255
cokernel, 39
annihilating polynomials, 221
column rank of a matrix, 52
annihilator, 19
commutator, 256
anti-Hermitian mappings, 195
complement, 22
anti-Hermitian matrices, 7 complete set of orthogonal vectors, 139
anti-lower triangular matrix, 99 component, 4
anti-self-adjoint, 126 composition of mappings, 37
anti-self-adjoint mappings, 195 congruent matrices, 148
anti-self-dual, 126 contravariant vectors, 18
anti-symmetric, 4 convergent sequence, 28
anti-symmetric forms, 230 convex, 164
anti-upper triangular matrix, 99 coordinate vector, 14
coordinates, 14
basis, 13
covariant vectors, 18
basis change matrix, 15
Cramer’s formulas, 80, 104
basis orthogonalization, 117
Cramer’s rule, 80, 104
basis transition matrix, 15, 45
cyclic vector, 62, 217
Bessel inequality, 140
cyclic vector space, 217
bilinear form, 147, 180
bilinear forms, 147 Darboux basis, 235
boxed diagonal matrix, 57 degree, 86
boxed lower triangular form, 99 degree of a nilpotent mapping, 62
boxed upper triangular form, 98 dense, 72
boxed upper triangular matrix, 56 determinant, 79
bracket, 248 determinant of a linear mapping, 105

diagonal matrix, 6 Heisenberg picture, 262


diagonalizable, 222 Heisenberg uncertainty principle, 258
diagonally dominant condition, 101 Hermitian congruent, 181
dimension, 13 Hermitian conjugate, 7, 134
direct product of vector spaces, 22 Hermitian matrices, 7
direct sum, 21 Hermitian matrix, 134
direct sum of mappings, 58 Hermitian scalar product, 127
dominant, 237 Hermitian sesquilinear form, 182
dot product, 5 Hermitian skewsymmetric forms, 236
doubly Markov matrices, 247 Hermitian symmetric matrices, 7
dual basis, 17 Hilbert–Schmidt norm of a matrix, 137
dual mapping, 122 homogeneous, 148
dual space, 16 hyponormal mappings, 199

eigenmodes, 255 ideal, 205


eigenspace, 57 idempotent linear mappings, 60
eigenvalue, 57 identity matrix, 6
eigenvector, 57 image, 38
Einstein formula, 255 indefinite, 161
entry, 4 index, 83
equivalence of norms, 30 index of negativity, 119
equivalence relation, 48 index of nilpotence, 62
equivalent polynomials, 207 index of nullity, 119
Euclidean scalar product, 124, 127 index of positivity, 119
infinite dimensional, 13
Fermat’s Little Theorem, 267 injective, 38
field, 1 invariant subspace, 55
field of characteristic 0, 2 inverse mapping, 42
field of characteristic p, 2 invertible linear mapping, 41
finite dimensional, 13 invertible matrix, 7
finite dimensionality, 13 irreducible linear mapping, 56
finitely generated, 13 irreducible polynomial, 207
form, 16 isometric, 142
Fourier coefficients, 139 isometry, 142
Fourier expansion, 139 isomorphic, 42
Fredholm alternative, 53, 137, 178 isomorphism, 42
Frobenius inequality, 44 isotropic subspace, 235
functional, 16
Fundamental Theorem of Algebra, 85 Jordan block, 220
Jordan canonical form, 220
generalized eigenvectors, 216 Jordan decomposition theorem, 220
generalized Schwarz inequality, 194 Jordan matrix, 220
generated, 8
generic property, 72 kernel, 38
geometric multiplicity, 57
Gram matrix, 136 least squares approximation, 176
Gram–Schmidt procedure, 117 left inverse, 7, 42
great common divisor, 207 Legendre polynomials, 141
Gronwall inequality, 263 Levy–Desplanques theorem, 101
limit, 28
Hamiltonian, 254 linear complement, 22
hedgehog map, 88 linear function, 16
Heisenberg equation, 263 linear span, 8
linear transformation, 55 orthogonal matrices, 7


linearly dependent, 8, 9 orthogonal matrix, 134
linearly independent, 10 orthonormal basis, 119, 130
linearly spanned, 8
locally nilpotent mappings, 62 Parseval identity, 140
lower triangular matrix, 6 Pauli matrices, 257
period, 62
mapping addition, 35 permissible column operations, 95
Markov matrices, 243 permissible row operations, 92
matrix, 4 perpendicular, 115
matrix exponential, 76 Perron–Frobenius theorem, 237
matrix multiplication, 5 photoelectric effect, 255
matrix-valued initial value problem, 76 Planck constant, 254
maximum uncertainty, 261 polar decomposition, 198
maximum uncertainty state, 261 polarization identities, 149
Measurement postulate, 252 polarization identity, 132
metric matrix, 136 positive definite Hermitian mapping, 188
minimal polynomial, 67, 221 positive definite Hermitian matrix, 188
Minkowski metric, 120 positive definite quadratic form, 188
Minkowski scalar product, 120 positive definite quadratic forms, 158
Minkowski theorem, 101 positive definite scalar product over a complex
minor, 89 vector space, 128
mutually complementary, 22 positive definite scalar product over a real
vector space, 128
negative definite, 161 positive definite self-adjoint mapping, 188
negative semi-definite, 161 positive definite self-adjoint mappings, 158
nilpotent mappings, 62 positive definite symmetric matrices, 158
non-definite, 161 positive diagonally dominant condition, 101
non-degenerate, 120 positive matrices, 237
non-negative, 161 positive semi-definite, 161
non-negative matrices, 237 positive vectors, 237
non-negative vectors, 237 preimages, 38
non-positive, 161 prime polynomial, 207
nonsingular matrix, 7 principal minors, 166
norm, 28 product of matrices, 5
norm that is stronger, 29 projection, 60
normal equation, 176 proper subspace, 8
normal mappings, 172, 194, 195 Pythagoras theorem, 129
normal matrices, 197
normed space, 28 QR factorization, 134
null vector, 115 quadratic form, 148, 181
null-space, 38 quotient space, 26
nullity, 39
nullity-rank equation, 40 random variable, 252
range, 38
observable postulate, 252 rank, 39
observables, 252 rank equation, 40
one-parameter group, 74 rank of a matrix, 52
one-to-one, 38 reducibility, 56
onto, 38 reducible linear mapping, 56
open, 73 reflective, 19
orthogonal, 115 reflectivity, 19
orthogonal mapping, 123, 132 regular Markov matrices, 243
regular stochastic matrices, 243 State postulate, 252


relatively prime, 207 state space, 252
Riesz isomorphism, 122 states, 252
Riesz representation theorem, 121 stochastic matrices, 243
right inverse, 7, 42 subspace, 8
row rank of a matrix, 52 sum of functionals, 16
sum of subspaces, 20
scalar multiple of a functional, 16 surjective, 38
scalar multiplication, 3 Sylvester inequality, 44
scalar product, 115 Sylvester theorem, 118
scalar-mapping multiplication, 35 symmetric, 4
scalars, 1, 3 symmetric bilinear forms, 149
Schrödinger equation, 254 symplectic basis, 235
Schrödinger picture, 262 symplectic complement, 236
Schur decomposition theorem, 226 symplectic forms, 235
Schwarz inequality, 129 symplectic vector space, 235
self-adjoint mapping, 122
self-dual mapping, 122 time evolution postulate, 253
self-dual space, 122 topological invariant, 82
sesquilinear form, 180 transcendental number, 13
shift matrix, 65 transpose, 4
signed area, 79
uncertainty, 259
signed volume, 80
uniform state, 261
similar matrices, 48
unit matrix, 6
singular value decomposition for a mapping,
unitary mapping, 132
202
unitary matrices, 7
singular value decomposition for a matrix, 203
unitary matrix, 134
singular values of a mapping, 202
upper triangular matrix, 6
singular values of a matrix, 203
skewsymmetric, 4 Vandermonde determinant, 105
skew-Hermitian forms, 236 variance, 257
skew-Hermitian matrices, 7 variational principle, 164
skewsymmetric forms, 230 vector space, 1
special relativity, 120 vector space over a field, 3
square matrix, 4 vectors, 1, 3
stable Markov matrices, 247
stable matrix, 246 wave function, 252
standard deviation, 259 wave–particle duality hypothesis, 255
