MTH5112 Linear Algebra I, 2009-2010
Oscar Bandtlow
30 March 2010
Contents
2 Matrix Algebra 9
2.1 Revision from Geometry I . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Transpose of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Special types of square matrices . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Linear systems in matrix notation . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Elementary matrices and the Invertible Matrix Theorem . . . . . . . . . . . . 16
2.6 Gauss-Jordan inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Determinants 23
3.1 General definition of determinants . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Properties of determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Cramer’s Rule and a formula for A−1 . . . . . . . . . . . . . . . . . . . . . . 31
4 Vector Spaces 35
4.1 Definition and examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3 The span of a set of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5 Basis and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6 Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.7 Row space and column space . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5 Linear Transformations 59
5.1 Definition and examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2 Linear transformations on general vector spaces . . . . . . . . . . . . . . . . . 61
5.3 Image and Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 Matrix representations of linear transformations . . . . . . . . . . . . . . . . . 65
5.5 Composition of linear transformations . . . . . . . . . . . . . . . . . . . . . . 68
5.6 Change of basis and similarity . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6 Orthogonality 75
6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.2 Orthogonal complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3 Orthogonal sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.4 Orthonormal sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Index 103
Chapter 1
Systems of Linear Equations
Systems of linear equations are the bread and butter of linear algebra. They arise in many
areas of the sciences, including physics, engineering, business, economics, and sociology. Their
systematic study also provided part of the motivation for the development of modern linear
algebra at the end of the 19th century.
The material in this chapter will be familiar from Geometry I, where systems of linear
equations have already been discussed in some detail. As this chapter is fundamental for what
is to follow, you want to make sure that the basic ideas are hardwired in your brain for the
rest of term!
A system with no solution is called inconsistent, while a system with at least one solution
is called consistent.
The set of all solutions of a system is called its solution set, which may be empty if the
system is inconsistent.
The basic problem we want to address in this section is the following: given an arbitrary
m × n system, determine its solution set. Later on, we will discuss a procedure that provides
a complete and practical solution to this problem (the so-called ‘Gaussian algorithm’). Before
we encounter this procedure, we require a bit more terminology.
Definition 1.3. Two m × n systems are said to be equivalent, if they have the same solution
set.
(a) is easy to solve: looking at the last equation we find first that x3 = 2; the second from
the bottom implies x2 = 2; and finally the first one yields x1 = (−3 + x2 − 2x3 )/5 = −1. So
the solution set of this system is {(−1, 2, 2)}.
To find the solution of (b), add the first and the second equation. Then x2 = 2, while
subtracting the first from the third equation gives 3x3 = 6, that is x3 = 2. Finally, the first
equation now gives x1 = (−3 + x2 − 2x3 )/5 = −1, so the solution set is again {(−1, 2, 2)}.
Thus the systems (a) and (b) are equivalent.
In solving system (b) above we have implicitly used the following important observation:
Lemma 1.5. The following operations do not change the solution set of a linear system:
(i) interchanging two equations;
(ii) multiplying an equation by a non-zero scalar;
(iii) adding a multiple of one equation to another equation.
Proof. (i) and (ii) are obvious. The proof of (iii) was an exercise in Geometry I (Coursework
4, Exercise 3) for the special case of two equations in three unknowns. The general case can
be proved in exactly the same way and is left as an exercise.
We shall see shortly how to use the above operations systematically to obtain the solution
set of any given linear system. Before doing so, however, we introduce a useful short-hand.
Given an m × n linear system
    a11 x1 + · · · + a1n xn = b1
                ...
    am1 x1 + · · · + amn xn = bm
The variables corresponding to the leading 1’s of the augmented matrix in row echelon
form will be referred to as the leading variables, the remaining ones as the free variables.
Example 1.10.
(a)  [ 1  2  3  −4  6 ]
     [ 0  0  1   2  3 ] .
Leading variables: x1 and x3 ; free variables: x2 and x4 .
(b)  [ 1  0  5 ]
     [ 0  1  3 ] .
Leading variables: x1 and x2 ; no free variables.
Note that if the augmented matrix of a system is in row echelon form, the solution set is
easily obtained.
Example 1.11. Determine the solution set of the systems given by the following augmented
matrices in row echelon form:
(a)  [ 1  3  0  2 ]          (b)  [ 1  −2  0   1  2 ]
     [ 0  0  0  1 ] ,             [ 0   0  1  −2  1 ]
                                  [ 0   0  0   0  0 ] .
Solution. The augmented matrix in (a) corresponds to the system
    x1 + 3x2 = 2
           0 = 1 .
The second equation cannot be satisfied, so this system is inconsistent and its solution set is empty.
The augmented matrix in (b) corresponds to the system
    x1 − 2x2 + x4 = 2
         x3 − 2x4 = 1
                0 = 0 .
We can express the leading variables in terms of the free variables x2 and x4 . So set x2 = α
and x4 = β, where α and β are arbitrary real numbers. The second line now tells us that
x3 = 1 + 2x4 = 1 + 2β, and then the first line that x1 = 2 + 2x2 − x4 = 2 + 2α − β. Thus
the solution set is { (2 + 2α − β, α, 1 + 2β, β) | α, β ∈ R }.
It turns out that every matrix can be brought into row echelon form using only elementary
row operations. The procedure is known as the
Gaussian algorithm:
Step 1 If the matrix consists entirely of zeros, stop — it is already in row echelon form.
Step 2 Otherwise, find the first column from the left containing a non-zero entry (call it
a), and move the row containing that entry to the top position.
Step 3 Multiply that row by 1/a to create a leading 1.
Step 4 By subtracting multiples of that row from the rows below it, make each entry below the
leading 1 zero.
This completes the first row. All further operations are carried out on the other rows.
Step 5 Repeat steps 1-4 on the matrix consisting of the remaining rows.
The process stops when either no rows remain at Step 5 or the remaining rows consist of
zeros.
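For readers who like to experiment on a computer, here is a minimal sketch of the algorithm in Python/NumPy. This is my own illustration, not part of the notes; the function name row_echelon and the use of floating-point arithmetic are arbitrary choices, and no attention is paid to numerical robustness.

import numpy as np

def row_echelon(M):
    """Bring M to a row echelon form, following Steps 1-5 of the Gaussian algorithm."""
    A = np.array(M, dtype=float)
    rows, cols = A.shape
    top = 0                                   # first row of the submatrix still to be processed
    for j in range(cols):
        if top == rows:
            break                             # no rows remain: stop
        pivots = [i for i in range(top, rows) if A[i, j] != 0]
        if not pivots:
            continue                          # this column of the remaining rows is all zeros
        A[[top, pivots[0]]] = A[[pivots[0], top]]   # Step 2: move a row with a non-zero entry up
        A[top] = A[top] / A[top, j]                 # Step 3: create a leading 1
        for r in range(top + 1, rows):              # Step 4: clear the entries below the leading 1
            A[r] = A[r] - A[r, j] * A[top]
        top += 1                                    # Step 5: repeat on the remaining rows
    return A

# Augmented matrix of the system solved in Example 1.12 below
print(row_echelon([[0, 1, 6, 4],
                   [3, -3, 9, -3],
                   [2, 2, 18, 8]]))
# approximately [[1, -1, 3, -1], [0, 1, 6, 4], [0, 0, 1, 0.5]]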
Example 1.12. Solve the following system using the Gaussian algorithm:
x2 + 6x3 =4
3x1 − 3x2 + 9x3 = −3
2x1 + 2x2 + 18x3 = 8
∼ [ 1 −1  3  −1 ]        ∼ [ 1 −1   3  −1 ]        ∼ [ 1 −1  3   −1 ]
  [ 0  1  6   4 ]          [ 0  1   6   4 ]          [ 0  1  6    4 ]  ,
  [ 0  4 12  10 ]          [ 0  0 −12  −6 ]          [ 0  0  1  1/2 ]
   (R3 − 2R1)               (R3 − 4R2)                (−(1/12) R3)
where the last matrix is now in row echelon form. The corresponding system reads:
x1 − x2 + 3x3 = −1
     x2 + 6x3 = 4
           x3 = 1/2
Leading variables are x1 , x2 and x3 ; there are no free variables. The last equation now implies
x3 = 1/2; the second equation from the bottom yields x2 = 4 − 6x3 = 1; and finally the first
equation yields x1 = −1 + x2 − 3x3 = −3/2. Thus the solution is (−3/2, 1, 1/2).
A variant of the Gauss algorithm is the Gauss-Jordan algorithm, which brings a matrix to
reduced row echelon form:
Gauss-Jordan algorithm
Step 1 Bring matrix to row echelon form using the Gaussian algorithm.
Step 2 Find the row containing the first leading 1 from the right, and add suitable multiples
of this row to the rows above it to make each entry above the leading 1 zero.
This completes the first non-zero row from the bottom. All further operations are carried out
on the rows above it.
Step 3 Repeat steps 1-2 on the matrix consisting of the remaining rows.
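A backward phase can be bolted onto the sketch given after the Gaussian algorithm to obtain the reduced row echelon form. Again this is my own illustration (NumPy assumed); it expects a matrix that is already in row echelon form with leading 1s.

import numpy as np

def clear_above(R):
    """Backward phase of the Gauss-Jordan algorithm: R is assumed to be in row
    echelon form with leading 1s; zero out the entries above each leading 1."""
    R = np.array(R, dtype=float)
    rows = R.shape[0]
    for i in range(rows - 1, -1, -1):         # work upwards from the last row
        nonzero = np.flatnonzero(R[i])
        if nonzero.size == 0:
            continue                          # zero row: nothing to do
        j = nonzero[0]                        # column of the leading 1 of row i
        for r in range(i):                    # Step 2: clear the entries above it
            R[r] = R[r] - R[r, j] * R[i]
    return R

# Row echelon form of the augmented matrix from Example 1.12
print(clear_above([[1, -1, 3, -1],
                   [0,  1, 6,  4],
                   [0,  0, 1,  0.5]]))
# approximately [[1, 0, 0, -1.5], [0, 1, 0, 1], [0, 0, 1, 0.5]],
# i.e. x1 = -3/2, x2 = 1, x3 = 1/2, as in Example 1.12.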
Example 1.13. Solve the following system using the Gauss-Jordan algorithm:
x1 + x2 + x3 + x4 + x5 = 4
x1 + x2 + x3 + 2x4 + 2x5 = 5
x1 + x2 + x3 + 2x4 + 3x5 = 7
x1 + x2 + x3 = 3
x4 = −1
x5 = 2
Leading variables are x1 , x4 , and x5 ; free variables x2 and x3 . Now set x2 = α and
x3 = β, and solve for the leading variables starting from the last equation. This yields
x5 = 2, x4 = −1, and finally x1 = 3 − x2 − x3 = 3 − α − β. Thus the solution set is
{ (3 − α − β, α, β, −1, 2) | α, β ∈ R }.
We have just seen that any matrix can be brought to (reduced) row echelon form using
only elementary row operations, and moreover that there is an explicit procedure to achieve
this (namely the Gaussian and Gauss-Jordan algorithm). We record this important insight for
later use.
Theorem 1.14.
(a) Every matrix can be brought to row echelon form by a series of elementary row opera-
tions.
(b) Every matrix can be brought to reduced row echelon form by a series of elementary row
operations.
Proof. For (a): apply the Gaussian algorithm; for (b): apply the Gauss-Jordan algorithm.
Remark 1.15. It can be shown (but not in this module) that the reduced row echelon form
of a matrix is unique. A moment’s thought (to which you are warmly invited) shows that this
is not the case for the row echelon form.
The remark above implies that if a matrix is brought to reduced row echelon form by
any sequence of elementary row operations (that is, not necessarily by those prescribed by the
Gauss-Jordan algorithm) the leading ones will nevertheless always appear in the same positions.
As a consequence, the following definition makes sense.
Definition 1.16. A pivot position in a matrix A is a location that corresponds to a leading
1 in the reduced row echelon form of A. A pivot column is a column of A that contains a
pivot position.
Example 1.17. Let
        [ 1 1 1 1 1 4 ]
    A = [ 1 1 1 2 2 5 ] .
        [ 1 1 1 2 3 7 ]
By Example 1.13 the reduced row echelon form of A is
    [ 1 1 1 0 0  3 ]
    [ 0 0 0 1 0 −1 ] .
    [ 0 0 0 0 1  2 ]
Thus the pivot positions of A are the (1, 1)-entry, the (2, 4)-entry, and the (3, 5)-entry and
the pivot columns of A are columns 1, 4, and 5.
The notion of a pivot position and a pivot column will come in handy later in the module.
Example 1.21.
The first observation about homogeneous systems is that they always have a solution, the
so-called trivial or zero solution: (0, 0, . . . , 0).
For later use we record the following useful consequence of the previous theorem on consistent
homogeneous systems:
Theorem 1.22. An underdetermined homogeneous system has a non-trivial solution.
Proof. We just observed that a homogeneous system is always consistent. Thus, if the system is
underdetermined and homogeneous, it must have infinitely many solutions by Theorem 1.19,
hence, in particular, it must have a non-zero solution.
Our final result in this section is devoted to the special case of n × n systems. For such
systems there is a delightful characterisation of the existence and uniqueness of solutions of a
given system in terms of the associated homogeneous systems. At the same time, the proof of
this result serves as another illustration of the usefulness of the row echelon form for theoretical
purposes.
Theorem 1.23. An n × n system is consistent and has a unique solution, if and only if the
only solution of the associated homogeneous system is the zero solution.
• The same sequence of elementary row operations that brings the augmented matrix
of a system to row echelon form, also brings the augmented matrix of the associated
homogeneous system to row echelon form, and vice versa.
• An n×n system in row echelon form has a unique solution precisely if there are n leading
variables.
Thus, if an n × n system is consistent and has a unique solution, the corresponding homoge-
neous system must have a unique solution, which is necessarily the zero solution.
Conversely, if the associated homogeneous system of a given system has the zero solution
as its unique solution, then the original inhomogeneous system must have a solution, and this
solution must be unique.
Chapter 2
Matrix Algebra
We write A = (aij )m×n or simply A = (aij ) to denote an m × n matrix whose (i, j)-entry is
aij , i.e. aij is the entry in the i-th row and the j-th column.
If A = (aij )m×n we say that A has size m × n. An n × n matrix is said to be square.
Example 2.1. If
    A = [  1  3  2 ]
        [ −2  4  0 ] ,
then A is a matrix of size 2 × 3. The (1, 2)-entry of A is 3 and the (2, 3)-entry of A is 0.
Definition 2.2 (Equality). Two matrices A and B are equal and we write A = B if they
have the same size and aij = bij for all i and j, where A = (aij ) and B = (bij ).
Definition 2.3 (Scalar multiplication). If A = (aij )m×n and α is a scalar, then αA (the scalar
product of α and A) is the m × n matrix whose (i, j)-entry is αaij .
Definition 2.4 (Addition). If A = (aij )m×n and B = (bij )m×n then the sum A + B of A
and B is the m × n matrix whose (i, j)-entry is aij + bij .
Definition 2.6 (Zero matrix). We write Om×n or simply O (if the size is clear from the
context) for the m × n matrix all of whose entries are zero, and call it a zero matrix.
Scalar multiplication and addition of matrices satisfy the following rules proved in Geom-
etry I:
Theorem 2.7. Let A, B and C be matrices of the same size, and let α and β be scalars.
Then:
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A;
(h) 1A = A.
Example 2.8. Simplify 2(A + 3B) − 3(C + 2B), where A, B, and C are matrices with the
same size.
Solution. Using the rules of Theorem 2.7, 2(A + 3B) − 3(C + 2B) = 2A + 6B − 3C − 6B = 2A − 3C .
Example 2.10. Compute the (1, 3)-entry and the (2, 4)-entry of AB, where
    A = [ 3 −1 2 ]        and        B = [  2  1  6  0 ]
        [ 0  1 4 ]                       [  0  2  3  4 ] .
                                         [ −1  0  5  8 ]
Solution.
(1, 3)-entry: 3 · 6 + (−1) · 3 + 2 · 5 = 25;
(2, 4)-entry: 0 · 0 + 1 · 4 + 4 · 8 = 36.
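As a quick check of these two computations (my own addition; assumes NumPy), one can form the whole product and read off the entries, bearing in mind that NumPy indices start at 0:

import numpy as np

A = np.array([[3, -1, 2],
              [0,  1, 4]])
B = np.array([[ 2, 1, 6, 0],
              [ 0, 2, 3, 4],
              [-1, 0, 5, 8]])

AB = A @ B
print(AB[0, 2])   # (1, 3)-entry of AB: 25
print(AB[1, 3])   # (2, 4)-entry of AB: 36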
Definition 2.11 (Identity matrix). An identity matrix I is a square matrix with 1’s on the
diagonal and zeros elsewhere. If we want to emphasise its size we write In for the n × n
identity matrix.
Theorem 2.12. Assume that α is a scalar and that A, B, and C are matrices so that the
indicated operations can be performed. Then:
(a) IA = A and BI = B;
Notation 2.13.
• Since A(BC) = (AB)C, we can omit the brackets and simply write ABC and similarly
for products of more than three factors.
Example 2.14.
    [ 1 0 ] [ 0 1 ]   [ 0 1 ]
    [ 0 0 ] [ 0 0 ] = [ 0 0 ]
but
    [ 0 1 ] [ 1 0 ]   [ 0 0 ]
    [ 0 0 ] [ 0 0 ] = [ 0 0 ] .
Definition 2.15. If A and B are two matrices with AB = BA, then A and B are said to
commute.
A square matrix A is called invertible if there is a matrix B of the same size such that
    AB = I and BA = I .
Note that not every matrix is invertible. For example, the matrix
    A = [ 1 0 ]
        [ 0 0 ]
is not invertible.
Later on in this chapter we shall discuss an algorithm that lets us decide whether a matrix
is invertible and at the same time furnishes an inverse if the matrix is invertible.
B = IB = (CA)B = C(AB) = CI = C .
If A is an invertible matrix, the unique inverse of A is denoted by A−1 . Hence A−1 (if it
exists!) is a square matrix of the same size as A with the property that
AA−1 = A−1 A = I .
Note that the above equality implies that if A is invertible, then its inverse A−1 is also invertible
with inverse A, that is,
(A−1 )−1 = A .
Slightly deeper is the following result:
Theorem 2.18. If A and B are invertible matrices of the same size, then AB is invertible
and
(AB)−1 = B −1 A−1 .
Example 2.20.
                                     [ 1 4 ]
(a)  A = [ 1 2 3 ]    ⇒    AT =     [ 2 5 ]
         [ 4 5 6 ]                   [ 3 6 ]

(b)  B = [ 1  2 ]     ⇒    B T =  [ 1  3 ]
         [ 3 −1 ]                 [ 2 −1 ]
Theorem 2.21. Assume that α is a scalar and that A, B, and C are matrices so that the
indicated operations can be performed. Then:
(a) (AT )T = A;
(c) (A + B)T = AT + B T ;
(d) (AB)T = B T AT .
Proof. (a) is obvious while (b) and (c) are proved as Exercise 6 in Coursework 2. For the proof
of (d) assume A = (aij )m×n and B = (bij )n×p and write AT = (ãij )n×m and B T = (b̃ij )p×n
where
ãij = aji and b̃ij = bji .
Notice that (AB)T and B T AT have the same size, so it suffices to show that they have the
same entries. Now, the (i, j)-entry of B T AT is
    Σ_{k=1}^{n} b̃ik ãkj = Σ_{k=1}^{n} bki ajk = Σ_{k=1}^{n} ajk bki ,
which is the (j, i)-entry of AB, that is, the (i, j)-entry of (AB)T . Thus B T AT = (AB)T .
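A numerical spot check of part (d), using the matrices of Example 2.10 (my own illustration, NumPy assumed):

import numpy as np

A = np.array([[3, -1, 2],
              [0,  1, 4]])
B = np.array([[ 2, 1, 6, 0],
              [ 0, 2, 3, 4],
              [-1, 0, 5, 8]])

# (AB)^T equals B^T A^T; note that A^T B^T is not even defined here (shapes do not match)
print(np.array_equal((A @ B).T, B.T @ A.T))   # True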
Example 2.24.
                 [ 1  2  4 ]      [ 5  2 ]
symmetric:       [ 2 −1  3 ] ,    [ 2 −1 ] .
                 [ 4  3  0 ]

                 [ 2  2  4 ]      [ 1  1  1 ]
not symmetric:   [ 2  2  3 ] ,    [ 1  1  1 ] .
                 [ 1  3  5 ]
Innocent as this definition may seem, symmetric matrices play an important role in many
parts of pure and applied Mathematics as well as in some of the ‘hard’ sciences. Some of
the reasons for this will become clearer towards the end of this course, when we shall study
symmetric matrices in much more detail.
Some other useful classes of square matrices are the triangular ones, which will also play
a role later on in the course.
If A = (aij ) is a square matrix of size n × n, we call a11 , a22 , . . . , ann the diagonal entries
of A. So, informally speaking, a matrix is upper triangular if all the entries below the diagonal
entries are zero, and it is strictly upper triangular if all entries below the diagonal entries and
the diagonal entries themselves are zero. Similarly for (strictly) lower triangular matrices.
Example 2.26.
                     [ 1 2 ]                  [ 1 0 0 0 ]
upper triangular:    [ 0 3 ] ,    diagonal:   [ 0 3 0 0 ]
                                              [ 0 0 5 0 ]
                                              [ 0 0 0 3 ]

                              [  0 0 0 ]
strictly lower triangular:    [ −1 0 0 ] .
                              [  2 3 0 ]
Theorem 2.27. The sum and product of two upper triangular matrices of the same size are
upper triangular.
The first reformulation is based on the observation that we can write this system more suc-
cinctly as a single matrix equation
Ax = b , (2.2)
where
        [ a11 · · · a1n ]          [ x1 ]                    [ b1 ]
    A = [  ..        ..  ] ,   x = [  .. ] ∈ Rn ,   and  b = [  .. ] ∈ Rm ,
        [ am1 · · · amn ]          [ xn ]                    [ bm ]
2x1 − 3x2 + x3 = 2
3x1 − x3 = −1
can be written as
    [ 2 −3  1 ] [ x1 ]   [  2 ]
    [ 3  0 −1 ] [ x2 ] = [ −1 ] ,
                [ x3 ]
where the 2 × 3 coefficient matrix is A, the column of unknowns is x, and the right-hand side is b.
Apart from obvious notational economy, writing (2.1) in the form (2.2) has a number of
other advantages which will become clearer shortly.
The other useful way of writing (2.1) is the following: with A and x as before we have
         [ a11 x1 + · · · + a1n xn ]        [ a11 ]              [ a1n ]
    Ax = [           ..            ] = x1   [  ..  ] + · · · + xn [  ..  ] ,
         [ am1 x1 + · · · + amn xn ]        [ am1 ]              [ amn ]
where a1 , . . . , an denote the columns of A.
Sums such as the left-hand side of (2.3) or (2.4) will turn up time and again in this course,
so it will be convenient to introduce the following terminology
Summarising the previous discussion, we now have the following characterisation of con-
sistency:
Theorem 2.31 (Consistency Theorem for Linear Systems). A linear system Ax = b is con-
sistent if and only if b can be written as a linear combination of the column vectors of A.
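The identity displayed above is easy to see in action on the 2 × 3 matrix A used earlier in this section (my own sketch, NumPy assumed): Ax is exactly the linear combination of the columns of A with weights x1 , x2 , x3 .

import numpy as np

A = np.array([[2, -3,  1],
              [3,  0, -1]])
x = np.array([1, 2, 3])               # an arbitrary choice of x1, x2, x3

direct = A @ x                                          # Ax computed directly
combo = x[0]*A[:, 0] + x[1]*A[:, 1] + x[2]*A[:, 2]      # x1*a1 + x2*a2 + x3*a3
print(direct, combo)                                    # [-1  0] [-1  0]
print(np.array_equal(direct, combo))                    # True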
Ax = b (2.5)
M Ax = M b (2.6)
Proof. Note that if x satisfies (2.5), then it clearly satisfies (2.6). Conversely, suppose that x
satisfies (2.6), that is,
M Ax = M b .
Since M is invertible, we may multiply both sides of the above equation by M −1 from the left
to obtain
M −1 M Ax = M −1 M b ,
so IAx = Ib, and hence Ax = b, that is, x satisfies (2.5).
We now come back to the idea outlined at the beginning of this section. It turns out that
we can ‘algebraize’ the process of applying an elementary row operation to a matrix A by
left-multiplying A by a certain type of matrix, defined as follows:
Definition 2.33. An elementary matrix of type I (respectively, type II, type III) is a
matrix obtained by applying an elementary row operation of type I (respectively, type II, type
III) to an identity matrix.
Example 2.34.
               [ 0 1 0 ]
type I:   E1 = [ 1 0 0 ]    (take I3 and swap rows 1 and 2)
               [ 0 0 1 ]

               [ 1 0 0 ]
type II:  E2 = [ 0 1 0 ]    (take I3 and multiply row 3 by 4)
               [ 0 0 4 ]

               [ 1 0 2 ]
type III: E3 = [ 0 1 0 ]    (take I3 and add 2 times row 3 to row 1)
               [ 0 0 1 ]
Example 2.35. Let A = (aij )3×3 and let El (l = 1, 2, 3) be defined as in the previous example.
Then
        [ 0 1 0 ] [ a11 a12 a13 ]   [ a21 a22 a23 ]
E1 A =  [ 1 0 0 ] [ a21 a22 a23 ] = [ a11 a12 a13 ] ,
        [ 0 0 1 ] [ a31 a32 a33 ]   [ a31 a32 a33 ]

        [ 1 0 0 ] [ a11 a12 a13 ]   [  a11  a12  a13 ]
E2 A =  [ 0 1 0 ] [ a21 a22 a23 ] = [  a21  a22  a23 ] ,
        [ 0 0 4 ] [ a31 a32 a33 ]   [ 4a31 4a32 4a33 ]

        [ 1 0 2 ] [ a11 a12 a13 ]   [ a11 + 2a31  a12 + 2a32  a13 + 2a33 ]
E3 A =  [ 0 1 0 ] [ a21 a22 a23 ] = [     a21         a22         a23    ] .
        [ 0 0 1 ] [ a31 a32 a33 ]   [     a31         a32         a33    ]
You should now pause and marvel at the following observation: interchanging rows 1 and 2
of A produces E1 A, multiplying row 3 of A by 4 produces E2 A, and adding 2 times row 3 to
row 1 of A produces E3 A.
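The effect of left-multiplication by E1 , E2 and E3 is easy to verify numerically (my own illustration, NumPy assumed; the matrix A below is an arbitrary 3 × 3 test matrix):

import numpy as np

E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # I3 with rows 1 and 2 swapped
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 4]])   # I3 with row 3 multiplied by 4
E3 = np.array([[1, 0, 2], [0, 1, 0], [0, 0, 1]])   # I3 with 2 times row 3 added to row 1

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(E1 @ A)   # rows 1 and 2 of A interchanged
print(E2 @ A)   # row 3 of A multiplied by 4
print(E3 @ A)   # 2 times row 3 of A added to row 1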
This example should convince you of the truth of the following theorem, the proof of which
will be omitted as it is straightforward, slightly lengthy and not particularly instructive.
Theorem 2.37. Every elementary matrix E is invertible, and E−1 is an elementary matrix of the same type.
Proof. The assertion follows from the previous theorem and the observation that an elementary
row operation can be reversed by an elementary row operation of the same type. More precisely,
• if two rows of a matrix are interchanged, then interchanging them again restores the
original matrix;
• if a row is multiplied by α ≠ 0, then multiplying the same row by 1/α restores the
original matrix;
• if α times row q has been added to row r, then adding −α times row q to row r restores
the original matrix.
Now, suppose that E was obtained from I by a certain row operation. Then, as we just
observed, there is another row operation of the same type that changes E back to I. Thus
there is an elementary matrix F of the same type as E such that F E = I. A moment’s
thought shows that EF = I as well, since E and F correspond to reverse operations. All in
all, we have now shown that E is invertible and its inverse E −1 = F is an elementary matrix
of the same type.
Example 2.38. Determine the inverses of the elementary matrices E1 , E2 , and E3 in Exam-
ple 2.34.
Solution. In order to transform E1 into I we need to swap rows 1 and 2 of E1 . The elementary
matrix that performs this feat is
            [ 0 1 0 ]
    E1−1 =  [ 1 0 0 ] .
            [ 0 0 1 ]
Finally, in order to transform E3 into I we need to add −2 times row 3 to row 1, and so
            [ 1 0 −2 ]
    E3−1 =  [ 0 1  0 ] .
            [ 0 0  1 ]
Before we come to the main result of this chapter we need some more terminology:
Definition 2.39. A matrix B is row equivalent to a matrix A if there exists a finite sequence
E1 , E2 , . . . , Ek of elementary matrices such that
B = Ek Ek−1 · · · E1 A .
Fact 2.40. Row equivalence has the following properties:
(a) every matrix is row equivalent to itself;
(b) if B is row equivalent to A, then A is row equivalent to B;
(c) if C is row equivalent to B and B is row equivalent to A, then C is row equivalent to A.
Property (b) follows from Theorem 2.37. Details of the proof of (a), (b), and (c) are left
as an exercise.
We are now able to formulate and prove the first highlight of this module, a truly delightful
characterisation of invertibility of matrices. More precisely, the following theorem provides
three equivalent conditions for a matrix to be invertible. Later on in this course, we will
encounter further equivalent conditions.
Before stating the theorem we recall that the zero vector, denoted by 0, is the column
vector all of whose entries are zero.
Theorem 2.41 (Invertible Matrix Theorem). Let A be a square n × n matrix. The following
are equivalent:
(a) A is invertible;
(b) the only solution of Ax = 0 is the zero solution x = 0;
(c) A is row equivalent to In ;
(d) A is a product of elementary matrices.
Proof. We shall prove this theorem using a cyclic argument: we shall first show that (a)
implies (b), then (b) implies (c), then (c) implies (d), and finally that (d) implies (a). This is
a frequently used trick to show the logical equivalence of a list of assertions.
(a) ⇒ (b): Suppose that A is invertible. If x satisfies Ax = 0, then
Chapter 3
Determinants
In other words, with every 2 × 2 matrix A it is possible to associate a scalar, called the
determinant of A, which is given by a certain sum of products of the entries of A. The
following fact was proved in Geometry I by a brute force calculation:
This fact reveals one of the main motivations to introduce this somewhat non-intuitive
object: the determinant of a matrix allows us to decide whether a matrix is invertible or not.
In this chapter we introduce determinants for arbitrary square matrices, study some of their
properties, and then prove the generalisation of the above fact for arbitrary square matrices.
Before giving the general definition of the determinant of an n × n matrix, let us recall the
definition of 3 × 3 determinants given in Geometry I:
If A = (aij ) is a 3 × 3 matrix, then its determinant is defined by
              | a11 a12 a13 |
    det(A) =  | a21 a22 a23 |
              | a31 a32 a33 |

                   | a22 a23 |        | a12 a13 |        | a12 a13 |
            = a11  | a32 a33 |  − a21 | a32 a33 |  + a31 | a22 a23 |                 (3.2)

            = a11 a22 a33 − a11 a32 a23 − a21 a12 a33 + a21 a32 a13 + a31 a12 a23 − a31 a22 a13 .
Notation 3.2. For any square matrix A, let Aij denote the submatrix formed by deleting the
i-th row and the j-th column of A.
Warning: This notation differs from the one used in the course text
Example 3.3. If
        [  3  2  5 −1 ]
    A = [ −2  9  0  6 ]
        [  7 −2 −3  1 ] ,
        [  4 −5  8 −4 ]
then
          [ 3  2 −1 ]
    A23 = [ 7 −2  1 ] .
          [ 4 −5 −4 ]
If we now define the determinant of a 1 × 1 matrix A = (aij ) by det(A) = a11 , we can
recast (3.1) and (3.2) as follows:
• if A = (aij )2×2 then
det(A) = a11 det(A11 ) − a21 det(A21 ) ;
To state the next theorem, it will be convenient to write the definition of det(A) in a
slightly different form.
Definition 3.6. Given a square matrix A = (aij ), the (i, j)-cofactor of A is the number Cij
defined by
Cij = (−1)i+j det(Aij ) .
This is called the cofactor expansion down the first column of A. There is nothing special
about the first column, as the next theorem shows:
Although this theorem is fundamental for the development of determinants, we shall not
prove it here, as it would lead to a rather lengthy digression.1
Before moving on, notice that the plus or minus sign in the (i, j)-cofactor depends on
the position of aij in the matrix, regardless of aij itself. The factor (−1)i+j determines the
following checkerboard pattern of signs
    + − + · · ·
    − + −
    + − +
    ⋮   ⋮   ⋮
Example 3.8. Use a cofactor expansion across the second row to compute det(A), where
        [ 4 −1 3 ]
    A = [ 0  0 2 ] .
        [ 1  0 7 ]
Solution.
1. If not knowing the proof causes you sleepless nights, ask me and I will point you towards one.
We have already computed the value of the above 3 × 3 determinant in the previous example
and found it to be equal to −2. Thus det(A) = 3 · 5 · (−2) = −30.
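For the curious, cofactor expansion down the first column translates directly into a short (and extremely inefficient) recursive function. This is my own illustration, assuming NumPy, and is only meant to mirror the definition; in practice one row reduces or calls numpy.linalg.det.

import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion down the first column."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)   # the submatrix A_{i1}
        # (-1)**i is the cofactor sign (-1)^{(i+1)+1}, written with 0-based row index i
        total += A[i, 0] * (-1) ** i * det_cofactor(minor)
    return total

A = np.array([[4, -1, 3],
              [0,  0, 2],
              [1,  0, 7]])
print(det_cofactor(A))      # -2.0, the value found in Example 3.8
print(np.linalg.det(A))     # agrees up to rounding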
Notice that the matrix in the previous example was almost lower triangular. The method
of this example is easily generalised to prove the following theorem:
Theorem 3.10. If A is either an upper or a lower triangular matrix, then det(A) is the product
of the diagonal entries of A.
     | 3  1 0 |   | 3  1 0 |
(c)  | 4  2 9 | = | 7  3 9 |     by (c) of the previous theorem.
     | 0 −2 1 |   | 0 −2 1 |
The following examples show how to use the previous theorem for the effective computation
of determinants:
Solution. Here we see that the first column already has two zero entries. Using the previous
theorem we can introduce another zero in this column by adding row 2 to row 4. Thus
             |  0  1  2 −1 |   |  0  1  2 −1 |
    det(A) = |  2  5 −7  3 | = |  2  5 −7  3 | .
             |  0  3  6  2 |   |  0  3  6  2 |
             | −2 −5  4 −2 |   |  0  0 −3  1 |
A cofactor expansion down the first column now gives
                 | 1  2 −1 |
    det(A) = −2  | 3  6  2 | .
                 | 0 −3  1 |
The 3 × 3 determinant above can be further simplified by subtracting 3 times row 1 from row
2. Thus
                 | 1  2 −1 |
    det(A) = −2  | 0  0  5 | .
                 | 0 −3  1 |
Finally we notice that the above determinant can be brought to triangular form by swapping
row 2 and row 3, which changes the sign of the determinant by the previous theorem. Thus
                             | 1  2 −1 |
    det(A) = (−2) · (−1) ·   | 0 −3  1 | = (−2) · (−1) · 1 · (−3) · 5 = −30 ,
                             | 0  0  5 |
by Theorem 3.10.
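(As a sanity check, my own addition: numpy.linalg.det applied to the 4 × 4 matrix read off from the first determinant above returns −30 up to rounding.)

import numpy as np

A = np.array([[ 0,  1,  2, -1],
              [ 2,  5, -7,  3],
              [ 0,  3,  6,  2],
              [-2, -5,  4, -2]])
print(round(np.linalg.det(A)))   # -30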
We are now able to prove the first important result about determinants: a square matrix A
is invertible if and only if det(A) ≠ 0. It allows us to decide whether a matrix is invertible or
not by computing its determinant, and it will play an important role in later chapters.
Proof. Bring A to row echelon form U (which is then necessarily upper triangular). Since we
can achieve this using elementary row operations, and since in the process we only ever multiply
a row by a non-zero scalar, we have det(A) = γ det(U ) for some γ with γ ≠ 0, by Theorem 3.11. If A
is invertible, then det(U ) = 1, since U is upper triangular with 1’s on the diagonal, and hence
det(A) = γ det(U ) ≠ 0. Otherwise, at least one diagonal entry of U is zero, so det(U ) = 0,
and hence det(A) = γ det(U ) = 0.
Our next result shows what effect transposing a matrix has on its determinant: for every
square matrix A we have det(AT ) = det(A).
Proof. The proof is by induction on n (that is, the size of A).2 The theorem is obvious for
n = 1. Suppose now that it has already been proved for k × k matrices for some integer k.
Our aim now is to show that the assertion of the theorem is true for (k + 1) × (k + 1) matrices
as well. Let A be a (k + 1) × (k + 1) matrix. Note that the (i, j)-cofactor of A equals the
(i, j)-cofactor of AT , because the cofactors involve k × k determinants only, for which we
assumed that the assertion of the theorem holds. Hence
so det(A) = det(AT ).
Let’s summarise: the theorem is true for 1 × 1 matrices, and the truth of the theorem for
k × k matrices for some k implies the truth of the theorem for (k + 1) × (k + 1) matrices.
2. If you have never encountered this method of proof, don’t despair! Simply read through the following
argument. The last paragraph explains the underlying idea of this method.
Thus, the theorem must be true for 2 × 2 matrices (choose k = 1); but since we now know
that it is true for 2 × 2 matrices, it must be true for 3 × 3 matrices as well (choose k = 2);
continuing with this process, we see that the theorem must be true for matrices of arbitrary
size.
By the previous theorem, each statement of the theorem on the behaviour of determinants
under row operations (Theorem 3.11) is also true if the word ‘row’ is replaced by ‘column’,
since a row operation on AT amounts to a column operation on A.
Theorem 3.19. Let A be a square matrix.
(a) If two columns of A are interchanged to produce B, then det(B) = − det(A).
(b) If one column of A is multiplied by α to produce B, then det(B) = α det(A).
(c) If a multiple of one column of A is added to another column to produce a matrix B
then det(B) = det(A).
Example 3.20. Find det(A) where
        [  1  3  4  8 ]
    A = [ −1  2  1  9 ]
        [  2  5  7  0 ] .
        [  3 −4 −1  5 ]
Solution. Adding column 1 to column 2 gives
             |  1  3  4  8 |   |  1  4  4  8 |
    det(A) = | −1  2  1  9 | = | −1  1  1  9 | .
             |  2  5  7  0 |   |  2  7  7  0 |
             |  3 −4 −1  5 |   |  3 −1 −1  5 |
Now subtracting column 3 from column 2 the determinant is seen to vanish by a cofactor
expansion down column 2.
             |  1  0  4  8 |
    det(A) = | −1  0  1  9 | = 0 .
             |  2  0  7  0 |
             |  3  0 −1  5 |
Our next aim is to prove that determinants are multiplicative, that is, det(AB) = det(A) det(B)
for any two square matrices A and B of the same size. We start by establishing a baby-version
of this result, which, at the same time, proves the theorem on the behaviour of determinants
under row operations stated earlier (see Theorem 3.11).
Theorem 3.21. If A is an n × n matrix and E an elementary n × n matrix, then
det(EA) = det(E) det(A)
with
    det(E) =  −1   if E is of type I (interchanging two rows),
               α   if E is of type II (multiplying a row by α),
               1   if E is of type III (adding a multiple of one row to another).
Proof. By induction on the size of A. The case where A is a 2×2 matrix is an easy exercise (see
Exercise 1, Coursework 4). Suppose now that the theorem has been verified for determinants
of k × k matrices for some k with k ≥ 2. Let A be a (k + 1) × (k + 1) matrix and write B = EA.
Expand det(EA) across a row that is unaffected by the action of E on A, say, row i. Note
that Bij is obtained from Aij by the same type of elementary row operation that E performs
on A. But since these matrices are only k × k, our hypothesis implies that
det(Bij ) = r det(Aij ) ,
= r det(A) .
In particular, taking A = Ik+1 we see that det(E) = −1, α, 1 depending on the nature of E.
To summarise: the theorem is true for 2 × 2 matrices and the truth of the theorem for
k × k matrices for some k ≥ 2 implies the truth of the theorem for (k + 1) × (k + 1) matrices.
By the principle of induction the theorem is true for matrices of any size.
Using the previous theorem we are now able to prove the second important result of this
chapter:
Theorem 3.22. If A and B are square matrices of the same size, then det(AB) = det(A) det(B).
Case II: If A is invertible, then by the Invertible Matrix Theorem A is a product of elementary
matrices, that is, there exist elementary matrices E1 , . . . , Ek , such that
A = Ek Ek−1 · · · E1 .
For brevity, write |A| for det(A). Then, by the previous theorem,
    A = (a1 . . . an ) ,      BA = (Ba1 . . . Ban ) ,      Ai (b) = (a1 . . . b . . . an ) ,
where, in Ai (b), the vector b occupies column i.
Theorem 3.23 (Cramer’s Rule). Let A be an invertible n × n matrix. For any b ∈ Rn , the
unique solution x of Ax = b has entries given by
    xi = det(Ai (b)) / det(A)      for i = 1, . . . , n .
Proof. Let a1 , . . . , an be the columns of A, and e1 , . . . , en the columns of the n × n identity
matrix I. Then
and hence
det(A) det(Ii (x)) = det(Ai (b)) .
But det(Ii (x)) = xi by a cofactor expansion across row i, so
    xi = det(Ai (b)) / det(A) ,
since det(A) ≠ 0.
Example 3.24. Use Cramer’s Rule to solve the system
    3x1 − 2x2 = 6
    −5x1 + 4x2 = 8 .
Then
    A1 (b) = [ 6 −2 ] ,      A2 (b) = [  3  6 ] ,
             [ 8  4 ]                 [ −5  8 ]
and Cramer’s Rule gives
    x1 = det(A1 (b)) / det(A) = 40/2 = 20 ,
    x2 = det(A2 (b)) / det(A) = 54/2 = 27 .
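A small sketch of Cramer's Rule as a function (my own code, assuming NumPy; sensible only for very small systems), applied to the system of Example 3.24:

import numpy as np

def cramer(A, b):
    """Solve Ax = b via Cramer's Rule."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)                 # must be non-zero for the rule to apply
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                     # A_i(b): replace column i of A by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[ 3, -2],
              [-5,  4]])
b = np.array([6, 8])
print(cramer(A, b))              # [20. 27.] (up to rounding)
print(np.linalg.solve(A, b))     # the same solution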
Cramer’s Rule is not really useful for practical purposes (except for very small systems),
since evaluation of determinants is time consuming when the system is large. For 3×3 systems
and larger, you are better off using Gaussian elimination. Apart from its intrinsic beauty, its
main strength is as a theoretical tool. For example, it allows you to study how sensitive the
solution of Ax = b is to a change in an entry in A or b.
As an application of Cramer’s Rule, we shall now derive an explicit formula for the inverse
of a matrix. Before doing so we shall have another look at the process of inverting a matrix.
Again, denote the columns of In by e1 , . . . , en . The Gauss-Jordan inversion process bringing
(A|I) to (I|A−1 ) can be viewed as solving the n systems
Ax = e1 , Ax = e2 , ... Ax = en .
Ax = ej ,
and the i-th entry of x is the (i, j)-entry of A−1 . By Cramer’s rule
    (i, j)-entry of A−1 = xi = det(Ai (ej )) / det(A) .                  (3.3)
A cofactor expansion down column i of Ai (ej ) shows that det(Ai (ej )) = Cji ,
where Cji is the (j, i)-cofactor of A. Thus, by (3.3), the (i, j)-entry of A−1 is the cofactor
Cji divided by det(A) (note that the order of the indices is reversed!). Thus
                          [ C11 C21 · · · Cn1 ]
    A−1 = (1 / det(A))    [ C12 C22 · · · Cn2 ]
                          [  ..   ..        .. ]                         (3.4)
                          [ C1n C2n · · · Cnn ]
The matrix of cofactors on the right of (3.4) is called the adjugate of A, and is denoted by
adj (A). The following theorem is simply a restatement of (3.4):
Theorem 3.25 (Inverse Formula). Let A be an invertible matrix. Then
    A−1 = (1 / det(A)) adj (A) .
Example 3.26. Find the inverse of the following matrix using the Inverse Formula
        [  1  3 −1 ]
    A = [ −2 −6  0 ] .
        [  1  4 −3 ]
Thus
              [ 18  5 −6 ]
    adj (A) = [ −6 −2  2 ] ,
              [ −2 −1  0 ]
and since det(A) = 2, we have
           [  9   5/2  −3 ]
    A−1 =  [ −3   −1    1 ] .
           [ −1  −1/2   0 ]
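The inverse formula is just as easy to transcribe into code (my own sketch, NumPy assumed; the cofactors are computed by brute force, so this is for illustration only — Gauss-Jordan inversion or numpy.linalg.inv is what one would actually use):

import numpy as np

def adjugate(A):
    """Adjugate of a square matrix: the transpose of its matrix of cofactors."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # the (i+1, j+1)-cofactor
    return C.T            # transposing swaps the indices, as in formula (3.4)

A = np.array([[ 1,  3, -1],
              [-2, -6,  0],
              [ 1,  4, -3]])
print(adjugate(A))                        # the matrix computed in Example 3.26 (up to rounding)
print(adjugate(A) / np.linalg.det(A))     # A^{-1}, agreeing with numpy.linalg.inv(A)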
Note that the above calculations are just as laborious as if we had used the Gauss-Jordan
inversion process to compute A−1 . As with Cramer’s Rule, the deceptively neat formula for
the inverse is not useful if you want to invert larger matrices. As a rule, for matrices larger
than 3 × 3 the Gauss-Jordan inversion algorithm is much faster.
Chapter 4
Vector Spaces
• scalar multiplication: if x = (x1 , . . . , xn )T ∈ Rn and α is a scalar, then αx is the vector
  (αx1 , . . . , αxn )T .
After these operations were defined, it turned out that they satisfy a number of rules (see
Theorem 2.7).1 We are now going to turn this process on its head. That is, we start from
a set on which two operations are defined, we postulate that these operations satisfy certain
rules, and we call the resulting structure a ‘vector space’:
1. Go back and look them up!
Definition 4.1. A vector space is a non-empty set V on which are defined two operations,
called addition and scalar multiplication, such that the following axioms hold for all u, v, w
in V and all scalars α, β:
(A1) u + v = v + u;
(A2) u + (v + w) = (u + v) + w;
(A8) 1u = u.
We will refer to V as the universal set for the vector space. Its elements are called vectors,
and we usually write them using bold letters u, v, w, etc.
The term ‘scalar’ will usually refer to a real number, although later on we will sometimes
allow scalars to be complex numbers. To distinguish these cases we will use the term real
vector space (if the scalars are real numbers) or complex vector space (if the scalars are
complex numbers). For the moment, however, we will only consider real vector spaces.
Note that in the above definition the axioms (C1) and (C2), known as closure axioms,
simply state that the two operations produce values in V , that is, they are bona fide operations
on V . The other eight axioms, also known as the classical vector space axioms, stipulate how
the two operations interact.
Let’s have a look at some examples:
Example 4.2. Let Rm×n denote the set of all m × n matrices. Define addition and scalar
multiplication of matrices in the usual way. Then Rm×n is a vector space by Theorem 2.7.
Example 4.3. Let Pn denote the set of all polynomials with real coefficients of degree less
than n. Thus, an element p in Pn is of the form
p(t) = a0 + a1 t + a2 t2 + · · · + an tn ,
q(t) = b0 + b1 t + b2 t2 + · · · + bn tn ,
• p + q is the polynomial (p + q)(t) = (a0 + b0 ) + (a1 + b1 )t + · · · + (an + bn )tn ;
• αp is the polynomial (αp)(t) = αa0 + αa1 t + αa2 t2 + · · · + αan tn .
Note that (C1) and (C2) clearly hold, since if p, q ∈ Pn and α is a scalar, then p + q and αp
are again polynomials of degree less than n. Axiom (A1) holds since if p and q are as above,
then
so p + q = q + p. A similar calculation shows that (A2) holds. Axiom (A3) holds if we let 0
be the zero polynomial, that is
0(t) = 0 + 0 · t + · · · + 0 · tn ,
since then (p + 0)(t) = p(t), that is, p + 0 = p. Axiom (A4) holds if, given p ∈ Pn we set
−p = (−1)p, since then
that is p+(−p) = 0. The remaining axioms are easily verified as well, using familiar properties
of real numbers.
Example 4.4. Let C[a, b] denote the set of all real-valued functions that are defined and
continuous on the closed interval [a, b]. For f , g ∈ C[a, b] and α a scalar, define f + g and αf
pointwise, that is, by
    (f + g)(t) = f (t) + g(t)    and    (αf )(t) = αf (t)    for all t ∈ [a, b].
Axiom (A3) holds if we let 0 be the function which is identically zero on [a, b], since then
(f + 0)(t) = f (t) + 0(t) = f (t) + 0 = f (t) ,
so f + 0 = f . Axiom (A4) holds if, given f ∈ C[a, b], we let −f be the function defined by (−f )(t) = −f (t),
since then
(f + (−f ))(t) = f (t) + (−f )(t) = f (t) − f (t) = 0 = 0(t) ,
that is, f + (−f ) = 0. We leave it as an exercise to verify the remaining axioms.
u + (−u) = 0 . (4.2)
Thus
    0 = u + (−u)              by (4.2)
      = (0u + u) + (−u)       by (4.1)
      = 0u + (u + (−u))       by (A2)
      = 0u + 0                by (4.2)
      = 0u                    by (A3),
and
    −u = −u + 0               by (A3)
       = −u + (u + v)         since u + v = 0
       = (−u + u) + v         by (A2)
       = (u + (−u)) + v       by (A1)
       = 0 + v                by (A4)
       = v + 0                by (A1)
       = v                    by (A3).
4.2 Subspaces
Given a vector space V , a ‘subspace’ of V is, roughly speaking, a subset of V that inherits
the vector space structure from V , and can thus be considered as a vector space in its own
right. One of the main motivations to consider such ‘substructures’ of vector spaces, is the
following. As you might have noticed, it can be frightfully tedious to check whether a given
set, call it H, is a vector space. Suppose that we know that H is a subset of a larger set V
equipped with two operations (addition and scalar multiplication), for which we have already
checked that the vector space axioms are satisfied. Now, in order for H to be the universal set
of a vector space equipped with the operations of addition and scalar multiplication inherited
from V , the set H should certainly be closed under addition and scalar multiplication (so that
(C1) and (C2) are satisfied). Somewhat surprisingly, checking these two axioms is enough in
order for H to be a vector space in its own right, as we shall see shortly. To summarise: if H
is a subset of a vector space V , and if H is closed under addition and scalar multiplication,
then H is a vector space in its own right. So instead of having to check 10 axioms, we only
need to check two in this case. Let’s cast these observations into the following definition:
2. In the language of MTH4104 (Introduction to Algebra) this statement says that the additive inverse is
unique.
Definition 4.6. Let H be a nonempty subset of a vector space V . Suppose that H satisfies
the following two conditions:
(i) if u, v ∈ H, then u + v ∈ H;
(ii) if u ∈ H and α is any scalar, then αu ∈ H.
Then H is called a subspace of V .
Theorem 4.7. Let H be a subspace of a vector space V . Then H with addition and scalar
multiplication inherited from V is a vector space in its own right.
Proof. Clearly, by definition of a subspace, (C1) and (C2) are satisfied. Axioms (A3) and (A4)
follow from Theorem 4.5 and condition (ii) of the definition of a subspace. The remaining
axioms are valid for any elements in V , so, in particular, they are valid for any elements in H
as well.
Remark 4.8. If V is a vector space, then {0} and V are clearly subspaces of V . All other
subspaces are said to be proper subspaces of V . We call {0} the zero subspace of V .
Solution. (a) Notice that an arbitrary element in L is of the form r(1, 1, 1)T for some real
number r. Thus, in particular, L is not empty, since (0, 0, 0)T ∈ L. In order to check that L
is a subspace of R3 we need to check that conditions (i) and (ii) of Definition 4.6 are satisfied.
We start with condition (i). Let x1 and x2 belong to L. Then x1 = r1 (1, 1, 1)T and
x2 = r2 (1, 1, 1)T for some real numbers r1 and r2 , so
    x1 + x2 = r1 (1, 1, 1)T + r2 (1, 1, 1)T = (r1 + r2 )(1, 1, 1)T ∈ L .
Let’s summarise: the non-empty set L satisfies conditions (i) and (ii), that is, it is closed
under addition and scalar multiplication, hence L is a subspace of R3 as claimed.
(b) In order to see that P is a subspace of R3 we first note that (0, 0, 0)T ∈ P , so P is not
empty.
Next we check condition (i). Let x1 = (r1 , s1 , t1 )T ∈ P and x2 = (r2 , s2 , t2 )T ∈ P . Then
r1 − s1 + 3t1 = 0 and r2 − s2 + 3t2 = 0, so
    x1 + x2 = (r1 + r2 , s1 + s2 , t1 + t2 )T ∈ P ,
since (r1 + r2 ) − (s1 + s2 ) + 3(t1 + t2 ) = (r1 − s1 + 3t1 ) + (r2 − s2 + 3t2 ) = 0.
Remark 4.10. In the example above the two subspaces L and P of R3 can also be thought
of as geometric objects. More precisely, L can be interpreted geometrically as a line through
the origin with direction vector (1, 1, 1)T , while P can be interpreted as a plane through the
origin with normal vector (1, −1, 3)T .
More generally, all proper subspaces of R3 can be interpreted geometrically as either lines or
planes through the origin. Similarly, all proper subspaces of R2 can be interpreted geometrically
as lines through the origin.
If this is not crystal clear to you, think about it!
Example 4.11. H = { (r2 , s, r)T | r, s ∈ R } is not a subspace of R3 , since
    (1, 0, 1)T ∈ H ,   but   2 (1, 0, 1)T = (2, 0, 2)T ∉ H .
is not a subspace of R2×2 . In order to see this, note that every subspace must contain the
zero vector. However,
O2×2 ∉ H .
Example 4.13. Let H = { f ∈ C[−2, 2] | f (1) = 0 }. Then H is a subspace of C[−2, 2]. First
observe that the zero function is in H, so H is not empty. Next we check that the closure
properties are satisfied.
Let f , g ∈ H . Then f (1) = 0 and g(1) = 0, so
A class of subspaces we have already encountered (but didn’t think about them in this way)
are the solution sets of homogeneous systems. More precisely, if A ∈ Rm×n is the coefficient
matrix of such a system, then the solution set can be thought of as the collection of all x ∈ Rn
with Ax = 0, and the collection of all such x is a subspace of Rn . Before convincing us of
this fact, we introduce some convenient terminology:
N (A) = { x ∈ Rn | Ax = 0 }
A(x + y) = Ax + Ay = 0 + 0 = 0 ,
A(αx) = α(Ax) = α0 = 0 ,
so αx ∈ N (A).
Thus N (A) is a subspace of Rn as claimed.
Solution. We need to find the solution set of Ax = 0. To do this you can use your favourite
method to solve linear systems. Perhaps the fastest one is to bring the augmented matrix (A|0)
to reduced row echelon form and write the leading variables in terms of the free variables. In
our case, we have
    [ −3  6 −1  1 −7  0 ]              [ 1 −2  0 −1  3  0 ]
    [  1 −2  2  3 −1  0 ]  ∼ · · · ∼   [ 0  0  1  2 −2  0 ] .
    [  2 −4  5  8 −4  0 ]              [ 0  0  0  0  0  0 ]
The leading variables are x1 and x3 , and the free variables are x2 , x4 and x5 . Now setting
x2 = α, x4 = β and x5 = γ we find x3 = −2x4 + 2x5 = −2β + 2γ and x1 = 2x2 + x4 − 3x5 =
2α + β − 3γ. Thus
    (x1 , x2 , x3 , x4 , x5 )T = (2α + β − 3γ, α, −2β + 2γ, β, γ)T
                               = α (2, 1, 0, 0, 0)T + β (1, 0, −2, 1, 0)T + γ (−3, 0, 2, 0, 1)T ,
hence
    N (A) = { α (2, 1, 0, 0, 0)T + β (1, 0, −2, 1, 0)T + γ (−3, 0, 2, 0, 1)T | α, β, γ ∈ R } .
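The three vectors found above can be checked mechanically (my own check, assuming NumPy): A sends each of them to the zero vector, so every linear combination of them indeed lies in N (A).

import numpy as np

A = np.array([[-3,  6, -1, 1, -7],
              [ 1, -2,  2, 3, -1],
              [ 2, -4,  5, 8, -4]])

# the spanning vectors of N(A) obtained above
v1 = np.array([ 2, 1,  0, 0, 0])
v2 = np.array([ 1, 0, -2, 1, 0])
v3 = np.array([-3, 0,  2, 0, 1])

for v in (v1, v2, v3):
    print(A @ v)          # each prints [0 0 0]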
Notice that in the above example Span (e1 , e2 ) can be interpreted geometrically as the
x1 , x2 plane, that is, the plane containing the x1 - and the x2 -axis. In particular, Span (e1 , e2 )
is a subspace of R3 . This is true more generally:
4. In case you forgot what this means go back to Definition 2.30.
Example 4.19. Given vectors v1 and v2 in a vector space V , show that H = Span (v1 , v2 )
is a subspace of V .
Solution. Notice that 0 ∈ H (since 0 = 0v1 + 0v2 ), so H is not empty. In order to show that
H is closed under addition, let u and w be arbitrary vectors in H. Then there are scalars α1 ,
α2 and β1 , β2 , such that
u = α1 v1 + α2 v2 ,
w = β1 v1 + β2 v2 .
so u + w ∈ H.
In order to show that H is closed under scalar multiplication, let u ∈ H, say, u =
α1 v1 + α2 v2 , and let γ be a scalar. Then, by axioms (A5) and (A7)
so γu ∈ H.
More generally, using exactly the same method of proof, it is possible to show the following:
We have just seen that the span of a collection of vectors in a vector space V is a subspace
of V . As we saw in Example 4.18, the span may be a proper subspace of V , or it may be
equal to all of V . The latter is sufficiently interesting a case to merit its own definition:
Definition 4.21. Let V be a vector space, and let v1 , . . . , vn ∈ V . We say that the set
{v1 , . . . , vn } is a spanning set for V if
Span (v1 , . . . , vn ) = V .
If {v1 , . . . , vn } is a spanning set for V , we shall also say that {v1 , . . . , vn } spans V , that
v1 , . . . , vn span V or that V is spanned by v1 , . . . , vn .
Notice that the above definition can be rephrased as follows. A set {v1 , . . . , vn } is a
spanning set for V , if and only if every vector in V can be written as a linear combination of
v1 , . . . , vn .
Example 4.22. Which of the following sets are spanning sets for R3 ?
(a) (1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T , (1, 2, 4)T
(b) (1, 1, 1)T , (1, 1, 0)T , (1, 0, 0)T
(c) (1, 0, 1)T , (0, 1, 0)T
(d) (1, 2, 4)T , (2, 1, 3)T , (4, −1, 1)T
    α1 x1 + α2 x2 + α3 x3 = α1 x1 + α2 x2 + α3 (3x1 + 2x2 )
                          = (α1 + 3α3 )x1 + (α2 + 2α3 )x2 .
Hence
Span (x1 , x2 , x3 ) = Span (x1 , x2 ) .
Observing that equation (4.4) can be written as
we see that any of the three vectors can be expressed as a linear combination of the other
two, so
Span (x1 , x2 , x3 ) = Span (x1 , x2 ) = Span (x1 , x3 ) = Span (x2 , x3 ) .
In other words, because of the dependence relation (4.5), the span of x1 , x2 , x3 can be written
as the span of only two of the given vectors. Or, put yet differently, we can throw away one of
the three vectors without changing their span. So the three vectors are not the most economic
way to express their span, because two of them suffice.
On the other hand, no dependency of the form (4.5) exists between x1 and x2 . Indeed,
suppose that there were scalars c1 and c2 , not both 0, such that
c1 x1 + c2 x2 = 0 , (4.6)
then we could solve for one of the vectors in terms of the other, that is,
    x1 = −(c2 /c1 ) x2   (if c1 ≠ 0)     or     x2 = −(c1 /c2 ) x1   (if c2 ≠ 0) .
However, neither vector is a multiple of the other, so (4.6) can hold only if c1 = c2 = 0. In
particular, Span (x1 ) and Span (x2 ) are both proper subspaces of Span (x1 , x2 ), so we cannot
reduce the number of vectors further to express Span (x1 , x2 , x3 ) = Span (x1 , x2 ).
This discussion motivates the following definitions:
Definition 4.25. The vectors v1 , . . . , vn in a vector space V are said to be linearly depen-
dent if there exist scalars c1 , . . . , cn , not all zero, such that
c1 v1 + · · · + cn vn = 0 .
Example 4.26. The three vectors x1 , x2 , x3 defined in (4.3) are linearly dependent.
Definition 4.27. The vectors v1 , . . . , vn in a vector space V are said to be linearly inde-
pendent if they are not linearly dependent, that is, if
    c1 v1 + · · · + cn vn = 0
is only possible when c1 = · · · = cn = 0.
2c1 + c2 = 0
c1 + c2 = 0
However, as is easily seen, the only solution of this system is c1 = c2 = 0. Thus, the two
vectors are indeed linearly independent as claimed.
p1 (t) = 2 + t , p2 (t) = 1 + t .
Then p1 and p2 are linearly independent. In order to see this, suppose that
c1 p1 + c2 p2 = 0 .
2c1 + c2 = 0
c1 + c2 = 0
However, as in the previous example, the only solution of this system is c1 = c2 = 0. Thus p1
and p2 are indeed linearly independent as claimed.
c1 x + c2 y = 0 ,
(b) Just as in R2 , two vectors in R3 are linearly dependent if and only if they are collinear.
Suppose now that x and y are two linearly independent vectors in R3 . Since they are
not collinear, they will span a plane (through the origin). If z is another vector lying in
this plane, then z can be written as a linear combination of x and y, hence x, y, z are
linearly dependent. Conversely, if z does not lie in the plane spanned by x and y, then
x, y, z are linearly independent.
In other words, three vectors in R3 are linearly independent if and only if they are not
coplanar.
Theorem 4.31. Let x1 , . . . , xn be n vectors in Rn and let X ∈ Rn×n be the matrix whose
j-th column is xj . Then the vectors x1 , . . . , xn are linearly dependent if and only if X is
singular.
Example 4.32. Determine whether the following three vectors in R3 are linearly independent:
    (−1, 3, 1)T ,    (5, 2, 5)T ,    (4, 5, 6)T .
Solution. Since
    | −1 5 4 |   | −1 3 1 |   | 4 5 6 |   | 0 0 0 |
    |  3 2 5 | = |  5 2 5 | = | 5 2 5 | = | 5 2 5 | = 0 ,
    |  1 5 6 |   |  4 5 6 |   | 4 5 6 |   | 4 5 6 |
(first passing to the transpose, then performing R1 + R2, then R1 − R3), the matrix with the
three given vectors as its columns is singular. Hence, by Theorem 4.31, the vectors are linearly
dependent.
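Theorem 4.31 also gives a purely mechanical test that a computer can carry out (my own sketch, NumPy assumed): form the matrix X with the given vectors as columns and check whether it is singular.

import numpy as np

x1 = np.array([-1, 3, 1])
x2 = np.array([ 5, 2, 5])
x3 = np.array([ 4, 5, 6])

X = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(X))            # 2 (< 3), so X is singular
print(abs(np.linalg.det(X)) < 1e-12)       # True: the determinant is (numerically) zero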
The following result will become important later in this chapter, when we discuss coordinate
systems.
v = α1 v1 + · · · + αn vn , (4.7)
for some scalars α1 , . . . , αn . Suppose that v can also be written in the form
v = β1 v1 + · · · + βn vn , (4.8)
is a basis for R3 .
To see this note that the vectors are linearly independent, because
    | 1 1 1 |
    | 0 1 1 | = 1 ≠ 0 .
    | 0 0 1 |
Moreover, the vectors span R3 since, if (a, b, c)T is an arbitrary vector in R3 , then
    (a, b, c)T = (a − b) (1, 0, 0)T + (b − c) (1, 1, 0)T + c (1, 1, 1)T .
The previous two examples show that a vector space may have more than one basis. This
is not a nuisance, but, quite to the contrary, a blessing, as we shall see later in this module. For
the moment, you should only note that both bases consist of exactly three elements. We will
revisit and expand this seemingly innocent observation shortly, when we discuss the dimension
of a vector space.
Example 4.37. Let
    E11 = [ 1 0 ] ,   E12 = [ 0 1 ] ,   E21 = [ 0 0 ] ,   E22 = [ 0 0 ] .
          [ 0 0 ]           [ 0 0 ]           [ 1 0 ]           [ 0 1 ]
Then {E11 , E12 , E21 , E22 } is a basis for R2×2 , because the four vectors span R2×2 (as was
shown in Coursework 5, Exercise 7(b)) and they are linearly independent. To see this, suppose
that
c1 E11 + c2 E12 + c3 E21 + c4 E22 = O2×2 .
Then
    [ c1 c2 ]   [ 0 0 ]
    [ c3 c4 ] = [ 0 0 ] ,
so c1 = c2 = c3 = c4 = 0.
Most of the vector spaces we have encountered so far have particularly simple bases, termed
‘standard bases’:
Example 4.38 (Standard bases for Rn , Rm×n and Pn ).
Rn : The n columns of In form the standard basis of Rn , usually denoted by {e1 , e2 , . . . , en }.
Rm×n : A canonical basis can be constructed as follows. For i = 1, . . . , m and j = 1, . . . , n
let Eij ∈ Rm×n be the matrix whose (i, j)-entry is 1, and all other entries are 0. Then
{ Eij | i = 1, . . . , m , j = 1, . . . , n } is the standard basis for Rm×n .
Pn : The standard basis is the collection {p0 , . . . , pn } of all monomials of degree less than
n, that is,
pk (t) = tk , for k = 0, . . . , n.
If this is not clear to you, you should check that it really is a basis!
Going back to Examples 4.35 and 4.36, recall the observation that both bases of R3
contained exactly three elements. This is not pure coincidence, but has a deeper reason. In
fact, as we shall see shortly, any basis of a vector space must contain the same number of
vectors. Before proving this important result we need the following:
Theorem 4.39. Let v1 , . . . , vn be vectors in a vector space V . If Span (v1 , . . . , vn ) = V ,
then any collection of m vectors in V where m > n is linearly dependent.
Proof. Let u1 , . . . , um be m vectors in V where m > n. Then, since v1 , . . . , vn span V , we
can write
ui = αi1 v1 + · · · + αin vn for i = 1, . . . , m .
Thus, a linear combination of the vectors u1 , . . . , um can be written as
    c1 u1 + · · · + cm um = Σ_{i=1}^{m} ci ui
                          = Σ_{i=1}^{m} ci ( Σ_{j=1}^{n} αij vj )
                          = Σ_{j=1}^{n} ( Σ_{i=1}^{m} αij ci ) vj .
This is a homogeneous system with more unknowns than equations, so by Theorem 1.22 it
must have a non-trivial solution (ĉ1 , . . . , ĉm )T . But then
    ĉ1 u1 + · · · + ĉm um = Σ_{j=1}^{n} 0 vj = 0 ,
so u1 , . . . , um are linearly dependent.
Corollary 4.40. If a vector space V has a basis of n vectors, then every basis of V must have
exactly n vectors.
Proof. Suppose that {v1 , . . . , vn } and {u1 , . . . , um } are both bases for V . We shall show
that m = n. In order to see this, notice that, since Span (v1 , . . . , vn ) = V and u1 , . . . , um are
linearly independent it follows by the previous theorem that m ≤ n. By the same reasoning,
since Span (u1 , . . . , um ) = V and v1 , . . . , vn are linearly independent, we must have n ≤ m.
So, all in all, we have n = m, that is, the two bases have the same number of elements.
In view of this corollary it now makes sense to talk about the number of elements of a
basis, and give it a special name:
Definition 4.41. Let V be a vector space. If V has a basis consisting of n vectors, we say
that V has dimension n, and write dim V = n.
The vector space {0} is said to have dimension 0. The vector space V is said to be finite
dimensional if there is a finite set of vectors spanning V ; otherwise it is said to be infinite
dimensional .
Example 4.42. By Example 4.38 the vector spaces Rn , Rm×n and Pn are finite dimensional
with dimensions
As an example of an infinite dimensional vector space, consider the set of all polynomials with
real coefficients, and call it P . In order to see that P is a vector space when equipped with the
usual addition and scalar multiplication, notice that P is a subset of the vector space C[−1, 1]
of continuous functions on [−1, 1] (in fact, it is a subset of C[a, b] for any a, b ∈ R with
a < b), which is closed under addition and scalar multiplication. Thus P is a vector space.
Note that any finite collection of monomials is linearly independent, so P must be infinite
dimensional. For the same reason, C[a, b] and C 1 [a, b] are infinite dimensional vector spaces.
While infinite dimensional vector spaces play an important role in many parts of contemporary
applied and pure mathematics, we shall be mainly concerned with finite dimensional vector
spaces for the rest of this module.
Example 4.43. We are now able to classify the subspaces of R3 discussed in Remark 4.10:
• 1-dimensional subspaces. Any subspace spanned by a nonzero vector, that is all lines
through the origin.
• 2-dimensional subspaces. Any subspace spanned by two linearly independent vectors,
that is all planes through the origin.
• 3-dimensional subspaces. Only R3 .
We close this section with the following result, which is often useful when trying to decide
whether a collection of vectors forms a basis of a vector space:
Theorem 4.44. If V is a vector space with dim V = n, then:
(a) any set consisting of n linearly independent vectors spans V ;
(b) any n vectors that span V are linearly independent.
Proof. (a) Let v1 , . . . , vn be linearly independent. Pick v ∈ V . Since dim V = n, the n + 1
vectors v, v1 , . . . , vn must be linearly dependent by Theorem 4.39. Thus
c0 v + c1 v1 + · · · + cn vn = 0 , (4.10)
where c0 , c1 , . . . , cn are not all 0. But c0 6= 0 (for otherwise (4.10) would imply that the
vectors v1 , . . . , vn are linearly dependent), hence
    v = (−c1 /c0 ) v1 + · · · + (−cn /c0 ) vn ,
so v ∈ Span (v1 , . . . , vn ). But v was arbitrary, so Span (v1 , . . . , vn ) = V .
(b) Suppose Span (v1 , . . . , vn ) = V . In order to show that v1 , . . . , vn are linearly in-
dependent we argue by contradiction: suppose to the contrary that v1 , . . . , vn are linearly
dependent. Then one of the vi ’s, say vn can be written as a linear combination of the other
vectors. So v1 , . . . , vn−1 also span V . If v1 , . . . , vn−1 are still linearly dependent we can
eliminate another vector and still have a spanning set. We can continue this process until we
have found a spanning set with k linearly independent vectors where k < n. This, however,
contradicts the fact that dim V = n.
Remark 4.45. The above theorem provides a convenient tool to check whether a set of vectors
forms a basis. The theorem tells us that n linearly independent vectors in an n-dimensional
vector space are automatically spanning, so these vectors are a basis for the vector space. This
is often useful in situations where linear independence is easier to check than the spanning
property.
Remark 4.46. The above theorem also provides two perspectives on a basis of a vector space:
a basis is
• a spanning set that is as small as possible;
• a linearly independent collection of vectors that is as large as possible.
4.6 Coordinates
In this short section we shall discuss an important application of the notion of a basis. In
essence, a basis allows us to view a vector space of dimension n as if it were Rn . This is a
tremendously useful idea, with many practical and theoretical applications, many of which you
will see in the following chapters.
The basic idea is the following. Suppose that {b1 , . . . , bn } is a basis for a vector space
V . Since the basis vectors are spanning, given v ∈ V , there are scalars c1 , . . . , cn such that
v = c1 b1 + · · · + cn bn .
Moreover, since the basis vectors b1 , . . . , bn are linearly independent, the scalars c1 , . . . , cn
are uniquely determined by Theorem 4.33. Thus, the vector v in the vector space V can
be uniquely represented as an n-vector (c1 , . . . , cn )T in Rn . This motivates the following
definition:
Definition 4.47. Let B = {b1 , . . . , bn } be a basis for a vector space V and let v ∈ V . The unique scalars c1 , . . . , cn such that
v = c1 b1 + · · · + cn bn
are called the coordinates of v relative to B . The n-vector (c1 , . . . , cn )T ∈ Rn is called the
B-coordinate vector of v, or the coordinate vector of v relative to B, and is denoted
by [v]B .
Example 4.48. Let B = {b1 , b2 } be the basis of R2 given by b1 = (1, 0)^T and b2 = (1, 2)^T , and suppose that [x]B = (−2, 3)^T . Find x.
Solution.
x = −2b1 + 3b2 = (−2)(1, 0)^T + 3(1, 2)^T = (1, 6)^T .
Example 4.49. The entries of x = (1, 6)^T are the coordinates of x relative to the standard basis E = {e1 , e2 }, since
(1, 6)^T = 1 (1, 0)^T + 6 (0, 1)^T = 1e1 + 6e2 .
Thus, x = [x]E .
The notion of coordinates raises a number of natural computational questions:
• Given a basis B for a vector space V and the B-coordinate vector [v]B for some v ∈ V ,
find v (we have already done a particular case of this in Example 4.48).
• Given a basis B for a vector space V and a vector v ∈ V , find the B-coordinate vector
[v]B .
• Given two bases B and D for a vector space V and [v]D , find [v]B .
We shall now address these problems in the case where V is Rn , deferring the general discussion
to the next chapter, when we will have better machinery to deal with these problems.
Theorem 4.51. Let B = {b1 , . . . , bn } be a basis for Rn . There is an invertible n × n matrix
PB such that for any x ∈ Rn
x = PB [x]B .
In fact, the matrix PB is the matrix whose j-th column is bj .
Proof. Let [x]B = (c1 , . . . , cn )T . Then
x = c1 b1 + · · · + cn bn ,
so
x = (b1 · · · bn ) (c1 , . . . , cn )^T = PB [x]B .
Moreover, by Theorem 4.31, the matrix PB is invertible since its columns are linearly indepen-
dent.
Since a vector x ∈ Rn is equal to its coordinate vector relative to the standard basis,
the matrix PB given in the theorem above is called the transition matrix from B to the
standard basis.
Corollary 4.52. Let B = {b1 , . . . , bn } be a basis for Rn . For any x ∈ Rn
[x]B = PB−1 x .
Example 4.53. Let b1 = (2, 1)^T , b2 = (−1, 1)^T and x = (4, 5)^T , and let B = {b1 , b2 } be the corresponding basis for R2 . Find the B-coordinates of x.
Solution. By the previous corollary
[x]B = PB−1 x .
Now
PB = \begin{pmatrix} 2 & −1 \\ 1 & 1 \end{pmatrix} ,
so
PB^{−1} = \frac{1}{3} \begin{pmatrix} 1 & 1 \\ −1 & 2 \end{pmatrix} .
Thus
[x]B = \frac{1}{3} \begin{pmatrix} 1 & 1 \\ −1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix} .
Theorem 4.54. Let B and D be two bases for Rn . If x is any vector in Rn , then
PB [x]B = x = PD [x]D .
The n × n matrix PB−1 PD given in the theorem above is called the transition matrix from
D to B.
Example 4.55. Let B = {b1 , b2 } be the basis given in Example 4.53, let D = {d1 , d2 },
where
d1 = (1, 0)^T , d2 = (1, 2)^T ,
and let x ∈ R2 . If the D-coordinates of x are (−3, 2)T , what are the B-coordinates of x?
Solution.
[x]B = PB^{−1} PD [x]D = \frac{1}{3} \begin{pmatrix} 1 & 1 \\ −1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} −3 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} .
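If you want to check coordinate computations like the two examples above on a computer, a small NumPy sketch along the following lines will do; solving PB [x]B = x avoids forming PB^{−1} explicitly.

```python
import numpy as np

# Basis vectors of Examples 4.53 and 4.55, stored as the columns of PB and PD.
PB = np.column_stack([[2, 1], [-1, 1]])
PD = np.column_stack([[1, 0], [1, 2]])

x = np.array([4, 5])
print(np.linalg.solve(PB, x))          # [x]_B = PB^{-1} x        -> [3. 2.]

x_D = np.array([-3, 2])
print(np.linalg.solve(PB, PD @ x_D))   # [x]_B = PB^{-1} PD [x]_D -> [1. 3.]
```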
4.7 Row space and column space
Definition 4.56. Let A be an m × n matrix.
• The subspace of R1×n spanned by the row vectors of A is called the row space of A
and is denoted by row(A).
• The subspace of Rm×1 spanned by the column vectors of A is called the column space
of A and is denoted by col(A).
Example 4.57. Let A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} .
• Since
α (1 0 0) + β (0 1 0) = (α β 0) ,
row(A) is a 2-dimensional subspace of R1×3 .
• Since
α \begin{pmatrix} 1 \\ 0 \end{pmatrix} + β \begin{pmatrix} 0 \\ 1 \end{pmatrix} + γ \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} α \\ β \end{pmatrix} ,
col(A) is a 2-dimensional subspace of R2×1 .
Notice that the row space and column space of a matrix are generally distinct objects.
Indeed, one is a subspace of R1×n the other a subspace of Rm×1 . However, in the example
above, both spaces have the same dimension (namely 2). We shall see shortly that, rather
surprisingly, this is always the case. Before exploring this topic further we introduce the
following important concept:
Definition 4.58. The rank of a matrix, denoted by rank A, is the dimension of the row space.
How does one calculate the rank of a matrix? The next result provides the clue:
Theorem 4.59. Two row equivalent matrices have the same row space, so, in particular, have
the same rank.
Proof. Let A and B be two row equivalent matrices. Since B is row equivalent to A, the
matrix B can be obtained from A by a finite sequence of elementary row operations. Thus the
rows of B are linear combinations of the rows of A. Consequently, row(B) is a subspace of
row(A). Exchanging the roles of A and B it follows, using the same argument, that row(A)
is also a subspace of row(B), so row(A) = row(B).
Combining the previous theorem with the observation that the nonzero rows of a matrix
in row echelon form are linearly independent, we obtain the following recipe for calculating a
basis for the row space and the rank of a matrix:
In order to calculate a basis for the row space and the rank of a matrix A:
• bring A to row echelon form U ;
• the nonzero rows of U form a basis for row(A);
• the rank of A is the number of nonzero rows of U .
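If you want to experiment with this recipe, a short SymPy sketch such as the following can be used; the matrix A below is just a made-up example, and rref returns the reduced row echelon form (a particular row echelon form) together with the positions of the leading 1’s.

```python
import sympy as sp

# A made-up matrix, just to illustrate the recipe.
A = sp.Matrix([[1, 2, 0, 1],
               [2, 4, 1, 3],
               [3, 6, 1, 4]])

U, pivots = A.rref()                     # reduced row echelon form and pivot columns
row_basis = [U.row(i) for i in range(U.rows) if any(U.row(i))]

print(row_basis)                         # the nonzero rows: a basis for row(A)
print(len(row_basis))                    # rank A  (here 2)
```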
Solution. We have already calculated the nullspace N (A) of this matrix in Example 4.16 by
bringing A to row echelon form U and then using back substitution to solve U x = 0, giving the general solution x = αx1 + βx2 + γx3 (α, β, γ ∈ R), where
x1 = (2, 1, 0, 0, 0)^T , x2 = (1, 0, −2, 1, 0)^T , x3 = (−3, 0, 2, 0, 1)^T .
It is not difficult to see that x1 , x2 , x3 are linearly independent, so {x1 , x2 , x3 } is a basis for
N (A). Thus, nul A = 3.
Notice that in the above example the nullity of A is equal to the number of free variables
of the system Ax = 0. This is no coincidence, but true in general! If this is not obvious to
you, you should mull it over for a little while!
The connection between the rank and nullity of a matrix, alluded to above, is the content
of the following beautiful theorem with an ugly name:
Theorem (Rank-Nullity Theorem). Let A be an m × n matrix. Then
rank A + nul A = n .
Proof. Bring A to row echelon form U . Write r = rank A. Now observe that U has r non-zero
rows, hence U x = 0 has n − r free variables, so nul A = n − r.
We now return to the perhaps rather surprising connection between the dimensions of the
row space and the column space of a matrix.
Theorem. Let A be an m × n matrix. Then dim row(A) = dim col(A), that is, the row space and the column space of A have the same dimension.
Proof. If A has rank r, then the row echelon form U of A will have r leading 1’s. The columns
of U corresponding to the leading 1’s will be linearly independent. They do not, however, form
a basis of col(A), since in general A and U will have different column spaces. Let UL denote
the matrix obtained from U by deleting all the columns corresponding to the free variables.
Delete the same columns from A and denote the new matrix by AL . The matrices AL and UL
are row equivalent. Thus, if x is a solution to AL x = 0, then UL x = 0. Since the columns
of UL are linearly independent, x must equal 0. It follows that the columns of AL must be
linearly independent as well. Since AL has r columns, the dimension of col(A) must be at
least r, that is,
dim col(A) ≥ r = dim row(A) .
Applying the above inequality to AT yields
dim row(A) = dim col(AT ) ≥ dim row(AT ) = dim col(A) ,
and combining the two inequalities gives dim row(A) = dim col(A), as claimed.
The above proof shows how to find a basis for the column space of a matrix A:
• bring A to row echelon form U and locate the leading 1’s, that is, the leading variables;
• the columns of A containing the leading variables form a basis for col(A).
The leading variables are in columns 1, 2, and 4. Thus a basis for col(A) is given by
(1, 1, 2)^T , (−1, 0, −1)^T , (2, 4, 7)^T .
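The pivot-column recipe is equally easy to try out on a computer. In the sketch below the matrix A is a hypothetical one, chosen so that its pivot columns are columns 1, 2 and 4 and coincide with the basis vectors above; the last line also checks the Rank-Nullity Theorem.

```python
import sympy as sp

# A hypothetical matrix whose pivot columns are columns 1, 2 and 4.
A = sp.Matrix([[1, -1, 0, 2],
               [1,  0, 1, 4],
               [2, -1, 1, 7]])

U, pivots = A.rref()
col_basis = [A.col(j) for j in pivots]   # columns of A containing the leading variables
null_basis = A.nullspace()               # a basis for N(A)

print([p + 1 for p in pivots])                       # [1, 2, 4]
print(len(col_basis) + len(null_basis) == A.cols)    # rank A + nul A = n  -> True
```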
Chapter 5
Linear Transformations
L(x) = 2x .
Then L is linear since, if x and y are arbitrary vectors in R2 and α is an arbitrary real number,
then
(i) L(x + y) = 2(x + y) = 2x + 2y = L(x) + L(y);
(ii) L(αx) = 2(αx) = α(2x) = αL(x).
Then L is linear. In order to see this suppose that x and y are arbitrary vectors in R2 with
x = (x1 , x2 )^T , y = (y1 , y2 )^T .
Thus
In order to shorten statements of theorems and examples let us introduce the following
convention:
If x is a vector in Rn , we shall henceforth denote its i-th entry by xi , and similarly for vectors
in Rn denoted by other bold symbols. So, for example, if y = (1, 4, 2, 7)T ∈ R4 , then y3 = 2.
(c) L(\sum_{i=1}^{n} αi vi ) = \sum_{i=1}^{n} αi L(vi ) for any vi ∈ V and any scalars αi , where i = 1, . . . , n.
Proof.
(c) follows by repeated application of the defining properties (i) and (ii) of linear transfor-
mations.
Note that we have used Theorem 4.5 for the proof of (a) and (b).
Let’s look at some examples, which should convince you that linear transformations arise
naturally in other areas of Mathematics.
Example 5.7. Let L : C[a, b] → R1 be defined by
L(f ) = \int_a^b f (t) dt .
Example 5.8. Let D : C 1 [a, b] → C[a, b] be defined by
D(f ) = f ′ .
Example 5.9. Let V be a vector space and let Id : V → V denote the identity transfor-
mation (or identity for short) on V , that is, Id (v) = v for all v ∈ V .
Definition 5.10. Let V and W be vector spaces, and let L : V → W be a linear transfor-
mation. The kernel of L, denoted by ker(L), is the subset of V given by
ker(L) = { v ∈ V | L(v) = 0 } .
The previous example shows that the kernel of a linear transformation is the natural gen-
eralisation of the nullspace of a matrix.
The previous example shows that the range of a linear transformation is the natural gen-
eralisation of the column space of a matrix.
We saw previously that the nullspace and the column space of an m × n matrix are
subspaces of Rn and Rm respectively. The same is true for the abstract analogues introduced
above.
Theorem. Let L : V → W be a linear transformation and let H be a subspace of V . Then ker(L) is a subspace of V and L(H) is a subspace of W .
Proof. (a) First observe that ker(L) is not empty since 0 ∈ ker(L) by Theorem 5.6. Suppose now
that v1 , v2 ∈ ker(L). Then
L(v1 + v2 ) = L(v1 ) + L(v2 ) = 0 + 0 = 0 ,
so v1 + v2 ∈ ker(L). Moreover, if v ∈ ker(L) and α is a scalar, then
L(αv) = αL(v) = α0 = 0 ,
so αv ∈ ker(L). Hence ker(L) is a subspace of V .
(b) First observe that L(H) is not empty since 0 ∈ L(H) by Theorem 5.6. Suppose now that
w1 , w2 ∈ L(H). Then there are v1 , v2 ∈ H such that L(v1 ) = w1 and L(v2 ) = w2
and so
w1 + w2 = L(v1 ) + L(v2 ) = L(v1 + v2 ) .
But v1 +v2 ∈ H, because H is a subspace, so w1 +w2 ∈ L(H). Moreover, if w ∈ L(H)
and α is a scalar, then there is v ∈ H such that L(v) = w and so
αw = αL(v) = L(αv) ,
so αw ∈ L(H). Thus L(H) is a subspace of W .
D(p) = p′ .
D(P3 ) = P2 .
Example 5.16 (Optional; for those of you who have taken MTH4102 ‘Differential Equations’).
Let L : C 1 [−1, 1] → C[−1, 1] be the linear transformation given by
L(f ) = f + f ′ .
Determine the kernel and the range of L.
Solution. In order to determine the kernel of L we need to find all f ∈ C 1 [−1, 1] such that
f + f ′ = 0 . (5.1)
This is a first order homogeneous differential equation with integrating factor e^t . Thus, a
function f satisfies (5.1) if and only if
(d/dt) (e^t f (t)) = 0 ,
so
e^t f (t) = α ,
for some α ∈ R, and hence
f (t) = α e^{−t} .
Thus
ker(L) = Span (h) ,
where h(t) = e^{−t} .
The moral of this part of the example is that whenever you are solving a linear homogeneous
differential equation, you are in fact calculating the kernel of a certain linear transformation!
In order to determine the range of L, notice that L clearly sends continuously differentiable
functions to continuous functions. The question is whether every continuous function arises
as an image of some f ∈ C 1 [−1, 1] under L. The answer is yes! To see this, fix g ∈ C[−1, 1].
We need to show that there is an f ∈ C 1 [−1, 1] such that L(f ) = g, that is,
f ′ + f = g . (5.2)
This is a first order linear inhomogeneous differential equation for f . Using the integrating
factor e^t we find that (5.2) is equivalent to
(d/dt) (e^t f (t)) = e^t g(t) .
But the right hand side of the equation above has an antiderivative, say H, that is,
(d/dt) H(t) = e^t g(t)
so
e^t f (t) = H(t) + α ,
for some α ∈ R, hence
f (t) = e^{−t} H(t) + α e^{−t} .
Notice that the f just found is clearly continuously differentiable. To summarise, we have
shown that, given g ∈ C[−1, 1], we can find f ∈ C 1 [−1, 1] such that L(f ) = g. In other
words, the range of L is C[−1, 1].
The moral of this part of the example is the following: the statement that the range of
L is C[−1, 1] means that the differential equation (5.2) has a solution for any g ∈ C[−1, 1].
So statements about ranges of certain linear transformations are in fact statements about the
existence of solutions for certain classes of differential equations!
5.4 Matrix representations of linear transformations
Theorem 5.17. Let L : Rn → Rm be a linear transformation. Then there is an m × n matrix A such that L(x) = Ax for every x ∈ Rn . In fact, the j-th column aj of A is given by
aj = L(ej ) for j = 1, . . . , n ,
where ej denotes the j-th standard basis vector of Rn .
To paraphrase the theorem above yet again: given a linear transformation L between Rn
and Rm we can find a matrix that represents it. For this reason the matrix A constructed
above is called a matrix representation of the linear transformation L. For the construction
of A we have used the standard basis. We shall see shortly that there is nothing special
about the standard basis, that is, we could have chosen any other basis to represent a given
transformation. This will be the content of the ‘Matrix Representation Theorem’ we will
encounter shortly.
Example 5.18. Let L : R3 → R2 be given by
L(x) = \begin{pmatrix} x1 − x2 \\ x2 + 2x3 \end{pmatrix} .
so if we set
A = \begin{pmatrix} 1 & −1 & 0 \\ 0 & 1 & 2 \end{pmatrix} ,
then indeed
Ax = \begin{pmatrix} 1 & −1 & 0 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} x1 − x2 \\ x2 + 2x3 \end{pmatrix} = L(x) .
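A quick way to check such a computation numerically is to build the matrix column by column from the images of the standard basis vectors, exactly as in the theorem; here is a minimal NumPy sketch for the transformation of this example.

```python
import numpy as np

def L(x):
    """The linear transformation of this example, L : R^3 -> R^2."""
    x1, x2, x3 = x
    return np.array([x1 - x2, x2 + 2 * x3])

# The j-th column of the representing matrix is L(e_j).
A = np.column_stack([L(e) for e in np.eye(3)])
print(A)                                # [[ 1. -1.  0.], [ 0.  1.  2.]]

x = np.array([3.0, 1.0, 2.0])           # an arbitrary test vector
print(np.allclose(A @ x, L(x)))         # True
```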
We are now ready for the main result alluded to at the beginning of this section. This truly
marvellous result is an extension of Theorem 5.17. It states that any linear transformation
between arbitrary finite dimensional vector spaces can be represented by a matrix.
In order to appreciate this statement, let’s suppose that V is an n-dimensional vector space
with basis V and that W is an m-dimensional vector space with basis W. If L : V → W is a
linear transformation we can ask the following question: given an arbitrary vector v ∈ V with
V-coordinates [v]V , what are the W-coordinates of L(v)? It turns out that there is an m × n
matrix A, not depending on v, such that
[L(v)]W = A[v]V .
In other words, the matrix A represents the change of coordinates effected by the linear
transformation L. This important fact is the content of the following theorem:
Theorem 5.19 (Matrix Representation Theorem). If V = {v1 , . . . , vn } and W = {w1 , . . . , wm }
are bases for vector spaces V and W respectively, then corresponding to each linear transfor-
mation L : V → W there is an m × n matrix A such that
[L(v)]W = A[v]V for each v ∈ V .
In fact, the j-th column aj of A is given by
aj = [L(vj )]W .
Proof. For each j = 1, . . . n let aj = (a1j , . . . , amj )T be the W-coordinate vector of L(vj ).
Thus
L(vj ) = a1j w1 + · · · + amj wm = \sum_{i=1}^{m} aij wi .
Fix v ∈ V . Let x be the V-coordinate vector of v, that is,
v = x1 v1 + · · · + xn vn = \sum_{j=1}^{n} xj vj .
Then
L(v) = L(\sum_{j=1}^{n} xj vj ) = \sum_{j=1}^{n} xj L(vj ) = \sum_{j=1}^{n} xj (\sum_{i=1}^{m} aij wi ) = \sum_{i=1}^{m} (\sum_{j=1}^{n} aij xj ) wi .
If we now call
yi = \sum_{j=1}^{n} aij xj for i = 1, . . . , m ,
then the calculation above shows that (y1 , . . . , ym )^T is the W-coordinate vector of L(v). But (y1 , . . . , ym )^T = Ax = A[v]V , where A is the m × n matrix with entries aij , so
[L(v)]W = A[v]V .
Definition 5.20. Given vector spaces V and W with corresponding bases V and W, and a linear transformation L : V → W , we call the matrix A constructed in the theorem above the matrix representation of L with respect to V and W, and denote it by [L]^V_W . Thus, for any v ∈ V we have
[L(v)]W = [L]^V_W [v]V .
where
b1 = (1, 0)^T , b2 = (1, 1)^T .
Find the matrix representation of L with respect to the standard basis E = {e1 , e2 , e3 } and
the basis B = {b1 , b2 }.
Solution. Since
L(e1 ) = 1b1 + 0b2
L(e2 ) = 0b1 + 1b2
L(e3 ) = 0b1 + 1b2
we see that
[L(e1 )]B = (1, 0)^T , [L(e2 )]B = (0, 1)^T , [L(e3 )]B = (0, 1)^T ,
so
[L]^E_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix} .
Example 5.22. Let D : P2 → P1 be the linear transformation given by
(D(p))(t) = p′ (t) .
Define
p1 (t) = 1, p2 (t) = t, p3 (t) = t^2 ,
and let P2 = {p1 , p2 , p3 } and P1 = {p1 , p2 } be bases for P2 and P1 respectively. Since
D(p1 )(t) = 0 , D(p2 )(t) = 1 , D(p3 )(t) = 2t ,
we have
D(p1 ) = 0p1 + 0p2
D(p2 ) = 1p1 + 0p2
D(p3 ) = 0p1 + 2p2
so
[D]^{P2}_{P1} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} .
Suppose now that p ∈ P2 is given by
p(t) = a + bt + ct^2 .
We want to find D(p). Of course we could do this working directly from the definition of D, but we can also use the Matrix Representation Theorem: since [p]P2 = (a, b, c)^T , we get
[D(p)]P1 = [D]^{P2}_{P1} (a, b, c)^T = (b, 2c)^T ,
that is, (D(p))(t) = b + 2ct, as expected.
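The same calculation is easy to carry out numerically: apply the matrix representation of D to the coordinate vector of p. In the sketch below the coefficients a, b, c are arbitrary sample values.

```python
import numpy as np

D = np.array([[0, 1, 0],
              [0, 0, 2]])        # the matrix representation of D found above

a, b, c = 5, -3, 2               # sample coefficients of p(t) = a + b t + c t^2
p_P2 = np.array([a, b, c])       # [p] with respect to P2 = {1, t, t^2}
print(D @ p_P2)                  # [-3  4], i.e. p'(t) = b + 2c t
```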
Example 5.23. Let A ∈ Rm×n , and let LA be the corresponding linear transformation from
Rn to Rm . Since LA (ej ) is just the j-th column of A, we see that the matrix representation
of LA with respect to the standard bases of Rn and Rm is just A itself.
5.5 Composition of linear transformations
Suppose that we are given two linear transformations
T : U → V ,
S : V → W .
Their composite S ◦ T : U → W is defined by (S ◦ T )(u) = S(T (u)), and it is again linear: for example, for any scalar α1 and any u1 ∈ U ,
(S ◦ T )(α1 u1 ) = S(T (α1 u1 )) = S(α1 T (u1 )) = α1 S(T (u1 )) = α1 (S ◦ T )(u1 ) ,
and a similar calculation shows that S ◦ T respects sums, so S ◦ T is linear, as claimed.
Example 5.24. Suppose that A ∈ Rm×n and B ∈ Rn×r . Let LA : Rn → Rm and LB :
Rr → Rn be the corresponding linear transformations. Then LA ◦ LB : Rr → Rm is the linear
transformation given by
(LA ◦ LB )(x) = LA (LB (x)) = A(Bx) = (AB)x ,
so
LA ◦ LB = LAB .
In other words, the composite of LA and LB is the linear transformation arising from the
product AB. Ultimately, this is the deeper reason why matrix multiplication is defined as it
is.
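This relationship is easy to test numerically: composing the two transformations and multiplying the two matrices give the same result. The matrices below are random sample data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))      # L_A : R^3 -> R^2
B = rng.integers(-3, 4, size=(3, 4))      # L_B : R^4 -> R^3
x = rng.integers(-3, 4, size=4)

# (L_A o L_B)(x) = A(Bx) agrees with L_{AB}(x) = (AB)x:
print(np.array_equal(A @ (B @ x), (A @ B) @ x))    # True
```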
We shall shortly discuss a more abstract analogue of the above example for two arbitrary
linear transformations between finite dimensional vector spaces. Before doing so we digress to
introduce some more notation:
Notation 5.25. If V is a vector space and L : V → V is linear, we can compose L with itself
and call the resulting linear transformation L2 , that is,
L2 = L ◦ L .
D(p) = p′ .
Then
D2 (p) = p′′
D3 (p) = p′′′
and so forth.
Suppose now that U , V and W are finite dimensional vector spaces with corresponding
bases U, V and W. If we are given two linear transformations T : U → V and S : V → W
we may ask how the matrix representations of S and T are related to
that of S ◦ T . The following theorem provides a neat answer:
Theorem 5.27 (Composition Formula). Let U , V and W be finite dimensional vector spaces
with corresponding bases U, V and W. Suppose that T : U → V and S : V → W are linear
transformations. Then
[S ◦ T ]^U_W = [S]^V_W [T ]^U_V ,
that is, the matrix representation of S ◦ T with respect to U and W is the matrix product of
the matrix representation of S with respect to V and W and the matrix representation of T with
respect to U and V.
Proof. Write A = [S]^V_W and B = [T ]^U_V . Let u ∈ U and let v = T (u) ∈ V . Thus, by the
Matrix Representation Theorem,
[S(v)]W = A[v]V ,
[T (u)]V = B[u]U .
Hence
[(S ◦ T )(u)]W = [S(T (u))]W = A[T (u)]V = AB[u]U .
Since u ∈ U was arbitrary, the above equation means that AB must be the matrix represen-
tation of S ◦ T with respect to U and W, that is,
AB = [S ◦ T ]^U_W .
Corollary 5.28. Let V be a finite dimensional vector space with basis V and let L : V → V .
If A is the matrix representation of L with respect to V, then An is the matrix representation
of Ln with respect to V, that is
[Ln ]^V_V = An .
5.6 Change of basis and similarity
Theorem 5.29. Let V be an n-dimensional vector space with bases U and W. Then, for every v ∈ V ,
[v]W = [Id ]^U_W [v]U .
The n × n matrix S = [Id ]^U_W appearing in the theorem above is called the transition matrix
from U to W. It allows you to calculate the W-coordinates of a vector, given that you know
its U-coordinates. It shouldn’t come as a surprise that a transition matrix is invertible:
Theorem 5.30. Let V be a finite dimensional vector space with bases U and W. Then the
transition matrix from U to W is invertible, and its inverse is the transition matrix from W
to U.
Proof. By the Composition Formula we have
[Id ]^U_W [Id ]^W_U = [Id ◦ Id ]^W_W = [Id ]^W_W = I ,
[Id ]^W_U [Id ]^U_W = [Id ◦ Id ]^U_U = [Id ]^U_U = I ,
so [Id ]^U_W is invertible with inverse [Id ]^W_U .
Example 5.31. Let P = {p1 , p2 , p3 } and Q = {q1 , q2 , q3 } be the bases of P2 given by
p1 (t) = 1 , p2 (t) = t , p3 (t) = t^2 , q1 (t) = 1 + t^2 , q2 (t) = t , q3 (t) = 2 + 3t^2 .
Now
q1 = 1p1 + 0p2 + 1p3
q2 = 0p1 + 1p2 + 0p3
q3 = 2p1 + 0p2 + 3p3
so
[q1 ]P = (1, 0, 1)^T , [q2 ]P = (0, 1, 0)^T , [q3 ]P = (2, 0, 3)^T ,
hence the transition matrix from Q to P is
[Id ]^Q_P = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 1 & 0 & 3 \end{pmatrix} .
Now if p is given by
p(t) = 2 − 3t + t^2 ,
then [p]P = (2, −3, 1)^T , so
[p]Q = [Id ]^P_Q [p]P = \begin{pmatrix} 3 & 0 & −2 \\ 0 & 1 & 0 \\ −1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ −3 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ −3 \\ −1 \end{pmatrix} .
Thus
p = 4q1 − 3q2 − q3 .
Question. Let V be a finite dimensional vector space with bases U and W. Suppose that
L : V → V is a linear transformation.
How are the two matrix representations [L]^U_U and [L]^W_W related?
Theorem 5.32. Let V be a finite dimensional vector space with bases U and W, and let
L : V → V be a linear transformation. If S is the transition matrix from U to W, then
[L]^U_U = S^{−1} [L]^W_W S .
Proof. Note that S = [Id ]^U_W and S^{−1} = [Id ]^W_U . Thus, by the Composition Formula,
S^{−1} [L]^W_W S = [Id ]^W_U [L]^W_W [Id ]^U_W = [Id ]^W_U [L ◦ Id ]^U_W = [Id ◦ L ◦ Id ]^U_U = [L]^U_U .
Example 5.33. Let L : R3 → R3 be the linear transformation defined by L(x) = Ax, where
A = \begin{pmatrix} 2 & 2 & 0 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \end{pmatrix} .
(a) Find the matrix representation of L with respect to the basis B = {b1 , b2 , b3 }, where
b1 = (1, −1, 0)^T , b2 = (−2, 1, 1)^T , b3 = (1, 1, 1)^T .
(b) For x = (−1, 2, 2)^T , find Ln (x) for every positive integer n.
Solution.
(a) First observe that the matrix representation of L with respect to the standard basis E
of R3 is A itself. Now, the transition matrix S from B to E is simply
S = [Id ]^B_E = PB = \begin{pmatrix} 1 & −2 & 1 \\ −1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix} ,
so the transition matrix from E to B is
S^{−1} = \frac{1}{3} \begin{pmatrix} 0 & −3 & 3 \\ −1 & −1 & 2 \\ 1 & 1 & 1 \end{pmatrix} .
Thus
[L]^B_B = S^{−1} AS = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix} .
(b) Notice that it would be extremely silly to try and work out An in order to obtain
Ln (x) = An x. What you should observe instead is that the matrix representation of L
with respect to B, call it B, is extremely simple: it is diagonal. Thus
[Ln ]^B_B = B^n = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4^n \end{pmatrix} .
Hence, in order to work out Ln (x) we only need to find the coordinates of x with respect
to B and we are done. Now
[x]B = [Id ]^E_B [x]E = S^{−1} x = \frac{1}{3} \begin{pmatrix} 0 & −3 & 3 \\ −1 & −1 & 2 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} −1 \\ 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} ,
so
[Ln (x)]B = [Ln ]^B_B [x]B = B^n \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 4^n \end{pmatrix} ,
and thus
Ln (x) = b2 + 4^n b3 = \begin{pmatrix} −2 + 4^n \\ 1 + 4^n \\ 1 + 4^n \end{pmatrix} .
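Here is a NumPy sketch that reproduces both parts of this example numerically; rounding is needed because floating point arithmetic only gives the diagonal entries up to small errors.

```python
import numpy as np

A = np.array([[2., 2., 0.],
              [1., 1., 2.],
              [1., 1., 2.]])
S = np.column_stack([[1., -1., 0.], [-2., 1., 1.], [1., 1., 1.]])   # columns b1, b2, b3

B = np.linalg.inv(S) @ A @ S
print(np.round(B))                    # (approximately) diag(0, 1, 4)

n = 5
x = np.array([-1., 2., 2.])
x_B = np.linalg.solve(S, x)           # [x]_B  ->  (0, 1, 1)
Dn = np.diag(np.diag(B) ** n)         # B is (numerically) diagonal, so take diag(B)^n
print(S @ Dn @ x_B)                                                  # [1022. 1025. 1025.]
print(np.allclose(S @ Dn @ x_B, np.linalg.matrix_power(A, n) @ x))   # True
```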
Let’s have another look at the previous theorem: it states that if V is an n-dimensional
vector space with two bases U and W and L is a linear transformation from V to itself, then
the matrix representation of L with respect to W, call it A, and the matrix representation of
L with respect to U, call it B, satisfy
B = S^{−1} AS ,
where S is the transition matrix from U to W. This observation motivates the following definition:
Definition 5.34. Let A and B be two n × n matrices. The matrix B is said to be similar
to A if there is an invertible S ∈ Rn×n such that
B = S −1 AS .
Note that if B = S^{−1} AS, then, setting R = S^{−1} , we also have
A = SBS^{−1} = R^{−1} BR ,
so A is in turn similar to B. We may therefore simply speak of two similar matrices.
Chapter 6
Orthogonality
In this chapter we will return to the concrete vector space Rn and add a new concept that will
reveal new aspects of it. The added spice in the discussion is the notion of ‘orthogonality’.
This concept extends our intuitive notion of perpendicularity in R2 and R3 to Rn . Innocent
as it may seem, this new concept turns out to be a rather powerful device, as we shall see
shortly.
6.1 Definition
We start by revisiting a concept that you have already encountered in Geometry I. Before
stating it recall that a vector x in Rn is, by definition, an n × 1 matrix. Given another vector
y in Rn , we may then form the matrix product xT y of the 1 × n matrix xT and the n × 1
matrix y. Notice that by the rules of matrix multiplication xT y is a 1 × 1 matrix, which we
can simply think of as a real number.
Definition 6.1. Let x and y be two vectors in Rn . The scalar xT y is called the scalar
product or dot product of x and y, and is often written x·y. Thus, if
x = (x1 , x2 , . . . , xn )^T , y = (y1 , y2 , . . . , yn )^T ,
then
x·y = xT y = (x1 x2 · · · xn ) \begin{pmatrix} y1 \\ y2 \\ \vdots \\ yn \end{pmatrix} = x1 y1 + x2 y2 + · · · + xn yn .
Example 6.2. If
x = (2, −3, 1)^T and y = (4, 5, 6)^T ,
then
x·y = xT y = (2 −3 1) \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = 2 · 4 + (−3) · 5 + 1 · 6 = 8 − 15 + 6 = −1 ,
y·x = yT x = (4 5 6) \begin{pmatrix} 2 \\ −3 \\ 1 \end{pmatrix} = 4 · 2 + 5 · (−3) + 6 · 1 = 8 − 15 + 6 = −1 .
Having had a second look at the example above it should be clear why x·y = y·x. In fact,
this is true in general. The following further properties of the dot product follow easily from
properties of the transpose operation:
The above example should convince you that in R2 and R3 the definition of the length of
a vector x coincides with the standard notion of the length of the line segment from the origin
to x.
Note that if x ∈ Rn and α ∈ R then
kαxk = |α| kxk ,
because kαxk2 = (αx)·(αx) = α2 (x·x) = α2 kxk2 . Thus, if x ≠ 0, we can always find a unit
vector y in the same direction as x by setting
1
y= x.
kxk
Definition 6.6. For x and y in Rn , the distance between x and y, written dist(x, y), is
the length of x − y, that is,
dist(x, y) = kx − yk .
Again, you might want to mull over this definition for a second and convince yourself
that in R2 and R3 the above definition of the distance between two vectors x and y coincides
with the standard notion of the Euclidean distance between the two points represented by x
and y.
We now come to the main part of this introductory section. In essence, we will extend
the concept of perpendicularity familiar in R2 and R3 to vectors in Rn . In order to motivate
it consider two lines through the origin in R2 or R3 determined by two vectors x and y. By
drawing a picture (which I will try to supply before long) of the triangle formed by the points
representing x, y and −y, convince yourself that the lines in the direction of x and y are
geometrically perpendicular if and only if the distance from x to y equals the distance from x
to −y. Now, using Theorem 6.3 repeatedly,
dist(x, −y)2 = kx + yk2
= (x + y)·(x + y)
= x·(x + y) + y·(x + y)
= x·x + x·y + y·x + y·y
= kxk2 + kyk2 + 2x·y .
Using the same calculation with −y in place of y we see that
dist(x, y)2 = kxk2 + k − yk2 + 2x·(−y)
= kxk2 + kyk2 − 2x·y .
Thus
dist(x, −y) = dist(x, y) if and only if x · y = 0.
In other words, the lines in the direction of x and y are geometrically perpendicular if and only
if x·y = 0. We now take this observation as our starting point for the following definition,
which extends the concept of perpendicularity to Rn .
Definition 6.7. Two vectors x and y in Rn are orthogonal (to each other) if x·y = 0.
Note that the zero vector is orthogonal to every other vector in Rn .
The following theorem is an old acquaintance in new clothing, and, at the same time,
contains a key fact about orthogonal vectors:
Theorem 6.8 (Pythagorean Theorem). Two vectors x and y in Rn are orthogonal if and only
if
kx + yk2 = kxk2 + kyk2 .
Proof. See Exercise 3 on Coursework 9.
6.2 Orthogonal complements
Definition 6.9. Let Y be a subspace of Rn . The set of all vectors in Rn that are orthogonal to every vector in Y is called the orthogonal complement of Y , and is denoted by Y ⊥ .
Example 6.10. Let W be a plane through the origin in R3 and let L be the line through
the origin and perpendicular to W . By construction, each vector in W is orthogonal to every
vector in L, and each vector in L is orthogonal to every vector in W . Hence
L⊥ = W and W⊥ = L.
The following theorem collects some useful facts about orthogonal complements.
Theorem 6.11. Let Y be a subspace of Rn . Then:
(a) Y ⊥ is a subspace of Rn .
(b) A vector x belongs to Y ⊥ if and only if x is orthogonal to every vector in a set that
spans Y .
Proof. See Exercises 4 and 6 on Coursework 9.
We finish this section with an application of the concepts introduced so far to the column
space and the nullspace of a matrix. These subspaces are sometimes called the fundamental
subspaces of a matrix. The next theorem, a veritable gem of Linear Algebra, shows that
the fundamental subspaces of a matrix and that of its transpose are intimately related via
orthogonality:
Theorem 6.12 (Fundamental Subspace Theorem). Let A ∈ Rm×n . Then:
(a) N (A) = col(AT )⊥ .
(b) N (AT ) = col(A)⊥ .
Proof. In this proof we shall identify the rows of A (which are strictly speaking 1 × n matrices)
with vectors in Rn .
(a) Let x ∈ Rn . Then
x ∈ N (A) ⇐⇒ Ax = 0
⇐⇒ x is orthogonal to every row of A
⇐⇒ x is orthogonal to every column of AT
⇐⇒ x ∈ col(AT )⊥ ,
so N (A) = col(AT )⊥ .
(b) Apply (a) to AT .
6.3 Orthogonal sets
Definition 6.13. A set {u1 , . . . , ur } of vectors in Rn is called an orthogonal set if every pair of distinct vectors in the set is orthogonal, that is, if ui ·uj = 0 whenever i ≠ j.
Example 6.14. If
u1 = (3, 1, 1)^T , u2 = (−1, 2, 1)^T , u3 = (−1, −4, 7)^T ,
then {u1 , u2 , u3 } is an orthogonal set since
u1 ·u2 = 3 · (−1) + 1 · 2 + 1 · 1 = 0
u1 ·u3 = 3 · (−1) + 1 · (−4) + 1 · 7 = 0
u2 ·u3 = (−1) · (−1) + 2 · (−4) + 1 · 7 = 0
The next theorem contains the first, perhaps surprising, property of orthogonal sets:
Theorem 6.15. If {u1 , . . . , ur } is an orthogonal set of nonzero vectors, then the vectors
u1 , . . . , ur are linearly independent.
Proof. Suppose that
c1 u1 + c2 u2 + · · · + cr ur = 0 .
Then
0 = 0·u1
= (c1 u1 + c2 u2 + · · · + cr ur )·u1
= c1 (u1 ·u1 ) + c2 (u2 · u1 ) + · · · + cr (ur ·u1 )
= c1 (u1 ·u1 ) ,
Since u1 ·u1 ≠ 0 (because u1 ≠ 0), it follows that c1 = 0. A similar argument shows that c2 = · · · = cr = 0, so the vectors u1 , . . . , ur are linearly independent.
Theorem 6.17. Let {u1 , . . . , ur } be an orthogonal basis for a subspace H of Rn and let y ∈ H. If
y = c1 u1 + · · · + cr ur ,
then
cj = (y·uj )/(uj ·uj ) for each j = 1, . . . , r.
Proof. As in the proof of the preceding theorem, the orthogonality of {u1 , . . . , ur } implies
that
y·u1 = (c1 u1 + · · · + cr ur )·u1 = c1 (u1 ·u1 ) .
Since u1 ·u1 is not zero (why?), we can solve for c1 in the above equation and find the stated
expression. In order to find cj for j = 2, . . . , r, compute y·uj and solve for cj .
Example 6.18. Show that the set {u1 , u2 , u3 } in Example 6.14 is an orthogonal basis for R3
and express the vector y = (6, 1, −8)T as a linear combination of the basis vectors.
Solution. Note that by Theorem 6.15 the vectors in the orthogonal set {u1 , u2 , u3 } are linearly
independent, so must form a basis for R3 , since dim R3 = 3.
Now
y·u1 = 11 , y·u2 = −12 , y·u3 = −66 ,
u1 ·u1 = 11 , u2 ·u2 = 6 , u3 ·u3 = 66 ,
so
y = (y·u1 /u1 ·u1 ) u1 + (y·u2 /u2 ·u2 ) u2 + (y·u3 /u3 ·u3 ) u3 = (11/11) u1 + (−12/6) u2 + (−66/66) u3 = u1 − 2u2 − u3 .
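The coordinate formula of Theorem 6.17 is easy to verify numerically; the following NumPy sketch recomputes the coefficients of this example.

```python
import numpy as np

u1, u2, u3 = np.array([3., 1., 1.]), np.array([-1., 2., 1.]), np.array([-1., -4., 7.])
y = np.array([6., 1., -8.])

c = [y @ u / (u @ u) for u in (u1, u2, u3)]                 # c_j = (y.u_j)/(u_j.u_j)
print(c)                                                    # [1.0, -2.0, -1.0]
print(np.allclose(c[0] * u1 + c[1] * u2 + c[2] * u3, y))    # True
```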
6.4 Orthonormal sets
Definition. A set {u1 , . . . , ur } of vectors in Rn is called an orthonormal set if it is an orthogonal set of unit vectors; an orthonormal basis is a basis which is also an orthonormal set.
Matrices whose columns form an orthonormal set are important in applications, in particular
in computational algorithms. We are now going to explore some of their properties.
Theorem. An m × n matrix U has orthonormal columns if and only if U T U = I.
Proof. As an illustration of the general idea, suppose for the moment that U has only three
columns, each a vector in Rm . Write
U = (u1 u2 u3 ) .
Then
U T U = \begin{pmatrix} uT1 \\ uT2 \\ uT3 \end{pmatrix} (u1 u2 u3 ) = \begin{pmatrix} uT1 u1 & uT1 u2 & uT1 u3 \\ uT2 u1 & uT2 u2 & uT2 u3 \\ uT3 u1 & uT3 u2 & uT3 u3 \end{pmatrix} .
The entries of this matrix are the dot products ui ·uj , so they all vanish except for the diagonal entries, which equal 1, precisely when the columns of U are orthonormal, that is, precisely when U T U = I. The general case is proved in exactly the same way.
Theorem 6.23. Let U ∈ Rm×n have orthonormal columns, and let x and y in Rn . Then:
(b) kU xk = kxk;
The above considerations show that every square matrix with orthonormal columns is an
orthogonal matrix. Two other interesting properties of orthogonal matrices are contained in
the following theorem.
Theorem (Orthogonal Decomposition Theorem). Let H be a subspace of Rn and let {u1 , . . . , ur } be an orthogonal basis for H. Then every y ∈ Rn can be written uniquely in the form
y = ŷ + z , (6.1)
where ŷ ∈ H and z ∈ H ⊥ . In fact,
ŷ = (y·u1 /u1 ·u1 ) u1 + · · · + (y·ur /ur ·ur ) ur . (6.2)
Proof. Let ŷ be given by (6.2). Since ŷ is a linear combination of the vectors u1 , . . . , ur , the
vector ŷ must belong to H. Let z = y − ŷ. Then
z·u1 = (y − ŷ)·u1
= y·u1 − (y·u1 /u1 ·u1 )(u1 ·u1 ) − 0 − · · · − 0
= y·u1 − y·u1
= 0,
so z is orthogonal to u1 . A similar calculation shows that z·uj = 0 for every j, so z ∈ H ⊥ by Theorem 6.11. To see that the decomposition is unique, suppose that also y = ŷ1 + z1 with ŷ1 ∈ H and z1 ∈ H ⊥ . Then ŷ + z = ŷ1 + z1 , so
ŷ − ŷ1 = z1 − z .
The above equality shows that the vector v = ŷ − ŷ1 belongs to both H and H ⊥ . Thus
v·v = 0, which implies v = 0. Therefore ŷ = ŷ1 and z = z1 , so the decomposition (6.1) is
unique.
The vector ŷ in (6.1) is called the orthogonal projection of y onto H , and is written
projH y, that is,
ŷ = projH y .
One of the reasons why orthogonal projections play an important role in Linear Algebra,
and indeed in other branches of Mathematics, is made plain in the following theorem:
Theorem (Best Approximation Theorem). Let H be a subspace of Rn , let y ∈ Rn and let ŷ = projH y. Then
ky − ŷk < ky − vk (6.3)
for every v ∈ H with v ≠ ŷ.
Proof. Let v ∈ H with v ≠ ŷ. Then ŷ − v ∈ H, while y − ŷ ∈ H ⊥ , so y − ŷ and ŷ − v are orthogonal. Since y − v = (y − ŷ) + (ŷ − v), the Pythagorean Theorem gives
ky − vk2 = ky − ŷk2 + kŷ − vk2 .
But kŷ − vk2 > 0, since ŷ ≠ v, so the desired inequality (6.3) holds.
The theorem above is the reason why the orthogonal projection of y onto H is often called
the best approximation of y by elements in H.
We conclude this section with the following consequence of the Orthogonal Decomposition
Theorem:
(a) (H ⊥ )⊥ = H;
Theorem (Gram Schmidt process). Let {x1 , . . . , xr } be a basis for a subspace H of Rn . Define
v1 = x1 ,
v2 = x2 − \frac{x2 ·v1 }{v1 ·v1 } v1 ,
v3 = x3 − \frac{x3 ·v1 }{v1 ·v1 } v1 − \frac{x3 ·v2 }{v2 ·v2 } v2 ,
...
vr = xr − \frac{xr ·v1 }{v1 ·v1 } v1 − \frac{xr ·v2 }{v2 ·v2 } v2 − · · · − \frac{xr ·vr−1 }{vr−1 ·vr−1 } vr−1 .
Then {v1 , . . . , vr } is an orthogonal basis for H, and Span (v1 , . . . , vk ) = Span (x1 , . . . , xk ) for each k = 1, . . . , r.
Proof. Write Hk = Span (x1 , . . . , xk ). Set v1 = x1 , so that Span (v1 ) = Span (x1 ). Suppose
that for some k < r we have already constructed v1 , . . . , vk so that {v1 , . . . , vk } is an
orthogonal basis for Hk . Define
vk+1 = xk+1 − \frac{xk+1 ·v1 }{v1 ·v1 } v1 − · · · − \frac{xk+1 ·vk }{vk ·vk } vk = xk+1 − projHk xk+1 .
One then checks that vk+1 is nonzero and orthogonal to each of v1 , . . . , vk , so that {v1 , . . . , vk+1 } is an orthogonal basis for Hk+1 , and the theorem follows by induction.
Remark 6.30. As with almost all the other results and techniques presented in this module,
the best way to remember the Gram Schmidt process is to understand the proof. Here is the
idea in a nut-shell: the Gram Schmidt process is an iterative procedure; if, at some stage, the
orthogonal vectors v1 , . . . , vk have already been constructed, the next vector vk+1 is obtained
by subtracting the orthogonal projection of xk+1 onto Hk = Span (v1 , . . . vk ) from xk+1 , that
is,
vk+1 = xk+1 − projHk xk+1 ,
as this makes the vector vk+1 orthogonal to Hk , and thus in particular, orthogonal to all previously
constructed vectors v1 , . . . , vk .
Example. Let H = Span (x1 , x2 , x3 ) be the subspace of R4 spanned by
x1 = (1, 1, 1, 1)^T , x2 = (0, 1, 1, 2)^T , x3 = (0, 0, 2, 6)^T .
Find an orthogonal basis {v1 , v2 , v3 } for H using the Gram Schmidt process.
Solution. We start by setting v1 = x1 = (1, 1, 1, 1)^T .
The vector v2 is constructed by subtracting the orthogonal projection of x2 onto Span (v1 )
from x2 , that is,
v2 = x2 − \frac{x2 ·v1 }{v1 ·v1 } v1 = x2 − \frac{4}{4} v1 = (−1, 0, 0, 1)^T .
The vector v3 is constructed by subtracting the orthogonal projection of x3 onto Span (v1 , v2 )
from x3 , that is,
v3 = x3 − \frac{x3 ·v1 }{v1 ·v1 } v1 − \frac{x3 ·v2 }{v2 ·v2 } v2 = x3 − \frac{8}{4} v1 − \frac{6}{2} v2 = (1, −2, 0, 1)^T ,
so {v1 , v2 , v3 } is an orthogonal basis for H.
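The Gram Schmidt process translates almost line by line into code. The sketch below follows the formulas of the theorem (without normalising the vectors) and is applied to the vectors of the example above.

```python
import numpy as np

def gram_schmidt(xs):
    """Orthogonalise a list of linearly independent vectors (no normalisation)."""
    vs = []
    for x in xs:
        v = x.astype(float)
        for u in vs:
            v = v - (x @ u) / (u @ u) * u    # subtract the projection of x onto Span(u)
        vs.append(v)
    return vs

x1 = np.array([1., 1., 1., 1.])
x2 = np.array([0., 1., 1., 2.])
x3 = np.array([0., 0., 2., 6.])
for v in gram_schmidt([x1, x2, x3]):
    print(v)               # [1 1 1 1], [-1 0 0 1], [1 -2 0 1]
```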
How do we find least squares solutions of a given system Ax = b? To motivate the result
to follow, let
b̂ = projcol(A) b .
Since b̂ is in col(A), the equation Ax = b̂ is consistent, and there is an x̂ ∈ Rn such that
Ax̂ = b̂ .
Since b̂ is the orthogonal projection of b onto col(A), the vector b − Ax̂ = b − b̂ is orthogonal to col(A), that is,
b − Ax̂ ∈ col(A)⊥ .
By the Fundamental Subspace Theorem, col(A)⊥ = N (AT ), so
b − Ax̂ ∈ N (AT ) .
Thus
AT (b − Ax̂) = 0 ,
and hence
AT Ax̂ = AT b .
To summarise what we have just said: a least squares solution of Ax = b satisfies
AT Ax = AT b . (6.5)
The matrix equation (6.5) represents a system of equations called the normal equations for
Ax = b.
Theorem 6.33. Let A ∈ Rm×n and b ∈ Rm . The set of least squares solutions of Ax = b
coincides with the non-empty solution set of the normal equations
AT Ax = AT b . (6.6)
Proof. We have just seen that a least squares solution x̂ must satisfy the normal equations.
It turns out that the argument outlined also works in the reverse direction. To be precise,
suppose that x̂ satisfies the normal equations, that is
AT Ax̂ = AT b .
Then AT (b − Ax̂) = 0, so b − Ax̂ ∈ N (AT ) = col(A)⊥ . Thus b − Ax̂ is orthogonal to
col(A). Hence the equation
b = Ax̂ + (b − Ax̂)
is a decomposition of b into a sum of a vector in col(A) and a vector orthogonal to it. By
the uniqueness of the orthogonal decomposition, Ax̂ must be the orthogonal projection of b
onto col(A). Thus, Ax̂ = b̂, and x̂ is a least squares solution.
Example 6.34. Find the least squares solution of the inconsistent system Ax = b, where
A = \begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{pmatrix} , b = \begin{pmatrix} 2 \\ 0 \\ 11 \end{pmatrix} .
Solution. Compute
AT A = \begin{pmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 17 & 1 \\ 1 & 5 \end{pmatrix} ,
AT b = \begin{pmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ 11 \end{pmatrix} = \begin{pmatrix} 19 \\ 11 \end{pmatrix} .
Thus the normal equations AT Ax = AT b are
\begin{pmatrix} 17 & 1 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix} = \begin{pmatrix} 19 \\ 11 \end{pmatrix} .
This system can (and should, in general!) be solved by Gaussian elimination. In our case,
however, it is quicker to spot that the coefficient matrix is invertible with inverse
(AT A)−1 = \frac{1}{84} \begin{pmatrix} 5 & −1 \\ −1 & 17 \end{pmatrix} ,
so AT Ax = AT b can be solved by multiplying both sides with (AT A)−1 from the left, giving the least squares solution
x̂ = (AT A)−1 AT b = \frac{1}{84} \begin{pmatrix} 5 & −1 \\ −1 & 17 \end{pmatrix} \begin{pmatrix} 19 \\ 11 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} .
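Least squares problems are easy to experiment with numerically: one can either solve the normal equations directly or use a built-in least squares routine. The following sketch redoes this example.

```python
import numpy as np

A = np.array([[4., 0.],
              [0., 2.],
              [1., 1.]])
b = np.array([2., 0., 11.])

# Solve the normal equations A^T A x = A^T b ...
print(np.linalg.solve(A.T @ A, A.T @ b))        # [1. 2.]

# ... or use the built-in least squares routine.
print(np.linalg.lstsq(A, b, rcond=None)[0])     # [1. 2.]
```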
Often (but not always!) the matrix AT A is invertible, and the method shown in the
example above can be used. In general, the least squares solution need not be unique, and
Gaussian elimination has to be used to solve the normal equations. The following theorem gives
necessary and sufficient conditions for AT A to be invertible.
Theorem 6.35. Let A ∈ Rm×n and b ∈ Rm . The matrix AT A is invertible if and only if
the columns of A are linearly independent. In this case, Ax = b has only one least squares
solution x̂, given by
x̂ = (AT A)−1 AT b .
Proof. See Exercise 1 on Coursework 11 for the first part. The remaining assertion follows as
in the previous example.
Chapter 7
Eigenvalues and Eigenvectors
In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and eigen-
vectors of matrices, two concepts you have already encountered in Geometry I and, in various
guises, in other modules. Let me first try to refresh your memory.
A scalar λ is called an eigenvalue of an n × n matrix A if there is a nonzero vector x ∈ Rn such that
Ax = λx ,
and every such nonzero x is called an eigenvector of A corresponding to the eigenvalue λ.
Remark 7.3. Note that if x is an eigenvector of a matrix A with eigenvalue λ, then any
nonzero multiple of x is also an eigenvector corresponding to λ, since, for any α ≠ 0,
A(αx) = α(Ax) = α(λx) = λ(αx) .
x = c1 u + c2 w
Ax = c1 Au + c2 Aw = 3c1 u + 2c2 w ,
In other words, relative to the basis B, the action of A on an arbitrary vector is very easy to
understand. We shall explore this observation in more detail shortly, and we shall see that it
holds the key to a new and powerful perspective on matrices.
We shall now investigate how to determine all the eigenvalues and eigenvectors of an n × n
matrix A. We start by observing that the defining equation Ax = λx can be written
(A − λI)x = 0 . (7.1)
Thus λ is an eigenvalue of A if and only if (7.1) has a non-trivial solution. The set of solutions
of (7.1) is N (A − λI), that is, the nullspace of A − λI, which is a subspace of Rn . Thus, λ is
an eigenvalue of A if and only if
N (A − λI) ≠ {0} ,
Theorem 7.4. Let A be an n × n matrix and λ a scalar. The following statements are
equivalent:
(a) λ is an eigenvalue of A;
(b) (A − λI)x = 0 has a non-trivial solution;
(c) N (A − λI) ≠ {0};
(d) A − λI is singular;
(e) det(A − λI) = 0.
(λ + 1)(λ − 2) = 0 ,
(A + I|0) = \begin{pmatrix} −7+1 & −6 & 0 \\ 9 & 8+1 & 0 \end{pmatrix} = \begin{pmatrix} −6 & −6 & 0 \\ 9 & 9 & 0 \end{pmatrix} ∼ \begin{pmatrix} 1 & 1 & 0 \\ 9 & 9 & 0 \end{pmatrix} ∼ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} ,
(A − 2I|0) = \begin{pmatrix} −7−2 & −6 & 0 \\ 9 & 8−2 & 0 \end{pmatrix} = \begin{pmatrix} −9 & −6 & 0 \\ 9 & 6 & 0 \end{pmatrix} ∼ \begin{pmatrix} 1 & 2/3 & 0 \\ 9 & 6 & 0 \end{pmatrix} ∼ \begin{pmatrix} 1 & 2/3 & 0 \\ 0 & 0 & 0 \end{pmatrix} ,
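Reading off A from the row reductions above, a quick numerical check of its eigenvalues and eigenvectors looks as follows; note that a numerical routine may list the eigenvalues in a different order and returns normalised eigenvectors.

```python
import numpy as np

A = np.array([[-7., -6.],
              [ 9.,  8.]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)        # the eigenvalues -1 and 2 (possibly in a different order)
print(eigvecs)        # the columns are corresponding normalised eigenvectors
```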
Solution. A slightly tedious calculation using repeated cofactor expansions shows that the
characteristic polynomial of A is
det(A − λI) = \begin{vmatrix} 2−λ & −3 & 1 \\ 1 & −2−λ & 1 \\ 1 & −3 & 2−λ \end{vmatrix} = −λ(λ − 1)^2 ,
In order to find the eigenspace corresponding to λ2 we find the nullspace of A−λ2 I = A−I,
again using Gaussian elimination:
A − I = \begin{pmatrix} 2−1 & −3 & 1 \\ 1 & −2−1 & 1 \\ 1 & −3 & 2−1 \end{pmatrix} = \begin{pmatrix} 1 & −3 & 1 \\ 1 & −3 & 1 \\ 1 & −3 & 1 \end{pmatrix} ∼ \begin{pmatrix} 1 & −3 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ,
so setting x2 = α and x3 = β we find x1 = 3x2 −x3 = 3α −β. Thus every vector in N (A−I)
is of the form
(3α − β, α, β)^T = α (3, 1, 0)^T + β (−1, 0, 1)^T ,
and so the eigenspace corresponding to the eigenvalue 1 is
{ α (3, 1, 0)^T + β (−1, 0, 1)^T | α, β ∈ R } .
Solution. Using the fact that the determinant of a triangular matrix is the product of the
diagonal entries we find
det(A − λI) = \begin{vmatrix} 1−λ & 2 & 3 \\ 0 & 4−λ & 5 \\ 0 & 0 & 6−λ \end{vmatrix} = (1 − λ)(4 − λ)(6 − λ) ,
We will revisit this theorem from a different perspective in the next section.
7.2 Diagonalisation
In many applications of Linear Algebra one is faced with the following problem: given a square
matrix A, find the k-th power Ak of A for large values of k. In general, this can be a very
onerous task. For certain matrices, however, evaluating powers is spectacularly easy:
Example. Let
D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} .
Then
D^2 = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 2^2 & 0 \\ 0 & 3^2 \end{pmatrix}
and
D^3 = DD^2 = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 2^2 & 0 \\ 0 & 3^2 \end{pmatrix} = \begin{pmatrix} 2^3 & 0 \\ 0 & 3^3 \end{pmatrix} .
In general,
D^k = \begin{pmatrix} 2^k & 0 \\ 0 & 3^k \end{pmatrix} ,
for k ≥ 1.
After having had another look at the example above, you should convince yourself that
if D is a diagonal n × n matrix with diagonal entries d1 , . . . , dn , then Dk is the diagonal
matrix whose diagonal entries are dk1 , . . . , dkn . The moral of this is that calculating powers for
diagonal matrices is easy. What if the matrix is not diagonal? The next best situation arises
if the matrix is similar to a diagonal matrix. In this case, calculating powers is almost as easy
as calculating powers of diagonal matrices, as we shall see shortly. We shall now single out
matrices with this property and give them a special name:
Definition. A matrix A ∈ Rn×n is called diagonalisable if there are an invertible matrix P ∈ Rn×n and a diagonal matrix D such that
P −1 AP = D .
Suppose that A is diagonalisable, with P −1 AP = D as above. Then
A = P DP −1 ,
so
A2 = P DP −1 P DP −1 = P D2 P −1 ,
A3 = AA2 = P DP −1 P D2 P −1 = P D3 P −1 ,
and in general
Ak = P Dk P −1 ,
for any k ≥ 1. Thus powers of A are easily computed, as claimed.
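This is exactly how powers of a diagonalisable matrix can be computed in practice. The sketch below uses a small made-up symmetric (hence diagonalisable) matrix and compares the result with a direct computation of the power.

```python
import numpy as np

def power_via_diagonalisation(A, k):
    """Compute A^k as P D^k P^{-1}; assumes that A is diagonalisable."""
    eigvals, P = np.linalg.eig(A)
    return P @ np.diag(eigvals ** k) @ np.linalg.inv(P)

A = np.array([[2., 1.],
              [1., 2.]])      # a made-up diagonalisable matrix
print(np.allclose(power_via_diagonalisation(A, 10),
                  np.linalg.matrix_power(A, 10)))    # True
```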
Before giving a characterisation of diagonalisable matrices we require an auxiliary result of
independent interest:
Theorem 7.13. If v1 , . . . , vr are eigenvectors of a matrix A corresponding to distinct eigenvalues λ1 , . . . , λr , then the vectors v1 , . . . , vr are linearly independent.
Proof. By contradiction. Suppose to the contrary that the vectors v1 , . . . , vr are linearly
dependent. We may then assume (after reordering the v’s and the λ’s if necessary), that
there is an index p < r such that vp+1 is a linear combination of the preceding linearly
independent vectors. Thus there exist scalars c1 , . . . , cp such that
c1 v1 + · · · + cp vp = vp+1 . (7.3)
Multiplying both sides of the above equation by A and using the fact that Avk = λk vk for
each k, we obtain
c1 λ1 v1 + · · · + cp λp vp = λp+1 vp+1 . (7.4)
Multiplying both sides of (7.3) by λp+1 and subtracting the result from (7.4), we see that
c1 (λ1 − λp+1 )v1 + · · · + cp (λp − λp+1 )vp = 0 . (7.5)
Since the vectors v1 , . . . , vp are linearly independent, the coefficients in (7.5) must all be zero.
But none of the factors λk − λp+1 is zero, because the eigenvalues are distinct, so we must
have c1 = . . . = cp = 0. But then (7.3) implies that vp+1 = 0, which is impossible, because
vp+1 is an eigenvector. Thus our assumption that the vectors v1 , . . . , vr are linearly dependent
must be false, that is, the vectors v1 , . . . , vr are linearly independent.
We are now able to prove the main result of this section:
Theorem (Diagonalisation Theorem). A matrix A ∈ Rn×n is diagonalisable if and only if A has n linearly independent eigenvectors. More precisely, P −1 AP = D with P invertible and D diagonal if and only if the columns of P are n linearly independent eigenvectors of A and the diagonal entries of D are the corresponding eigenvalues.
Proof. First observe that if P is any n × n matrix with columns v1 , . . . , vn , and if D is any
diagonal matrix with diagonal entries λ1 , . . . , λn , then
AP = A (v1 · · · vn ) = (Av1 · · · Avn ) , (7.6)
while
P D = (λ1 v1 · · · λn vn ) . (7.7)
First we prove the ‘only if’ part of the theorem. Suppose that A is diagonalisable and that
P −1 AP = D. Then AP = P D, and equations (7.6) and (7.7) imply that
(Av1 · · · Avn ) = (λ1 v1 · · · λn vn ) . (7.8)
Thus
Av1 = λ1 v1 , Av2 = λ2 v2 , ..., Avn = λn vn . (7.9)
Since P is invertible its columns v1 , . . . , vn must be linearly independent. Moreover, none of its
columns can be zero, so (7.9) implies that λ1 , . . . , λn are eigenvalues of A with corresponding
eigenvectors v1 , . . . , vn . This finishes the proof of the ‘only if’ part.
In order to prove the ‘if’ part, suppose that A has n linearly independent eigenvectors
v1 , . . . , vn with corresponding eigenvalues λ1 , . . . , λn . Let P be the matrix whose columns
are v1 , . . . , vn and let D be the diagonal matrix whose diagonal entries are λ1 , . . . , λn . Then
equations (7.6), (7.7), and (7.8) imply that AP = P D. But since the columns of P are
linearly independent, P is invertible, so P −1 AP = D as claimed.
Example 7.15. Diagonalise the following matrix, if possible:
A = \begin{pmatrix} −7 & 3 & −3 \\ −9 & 5 & −3 \\ 9 & −3 & 5 \end{pmatrix} .
Solution. A slightly tedious calculation shows that the characteristic polynomial is given by
p(λ) = det(A − λI) = \begin{vmatrix} −7−λ & 3 & −3 \\ −9 & 5−λ & −3 \\ 9 & −3 & 5−λ \end{vmatrix} = −λ^3 + 3λ^2 − 4 .
The cubic p above can be factored by spotting that −1 is a root. Polynomial division then
yields
p(λ) = −(λ + 1)(λ^2 − 4λ + 4) = −(λ + 1)(λ − 2)^2 ,
so the distinct eigenvalues of A are 2 and −1.
The usual methods (see Examples 7.6 and 7.7) now produce a basis for each of the two
eigenspaces and it turns out that
N (A − 2I) = Span (v1 , v2 ), where v1 = (1, 3, 0)^T , v2 = (−1, 0, 3)^T ,
N (A + I) = Span (v3 ), where v3 = (−1, −1, 1)^T .
You may now want to confirm, using your favourite method, that the three vectors
v1 , v2 , v3 are linearly independent. As we shall see shortly, this is not really necessary: the
union of basis vectors for eigenspaces always produces linearly independent vectors (see the
proof of Corollary 7.17 below).
Thus, A is diagonalisable, since it has 3 linearly independent eigenvectors. In order to find
the diagonalising matrix P we recall that defining
P = (v1 v2 v3 ) = \begin{pmatrix} 1 & −1 & −1 \\ 3 & 0 & −1 \\ 0 & 3 & 1 \end{pmatrix}
does the trick, that is, P −1 AP = D, where D is the diagonal matrix whose entries are the
eigenvalues of A and where the order of the eigenvalues matches the order chosen for the
eigenvectors in P , that is,
D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & −1 \end{pmatrix} .
It is good practice to check that P and D really do the job they are supposed to do:
AP = \begin{pmatrix} −7 & 3 & −3 \\ −9 & 5 & −3 \\ 9 & −3 & 5 \end{pmatrix} \begin{pmatrix} 1 & −1 & −1 \\ 3 & 0 & −1 \\ 0 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 2 & −2 & 1 \\ 6 & 0 & 1 \\ 0 & 6 & −1 \end{pmatrix} ,
P D = \begin{pmatrix} 1 & −1 & −1 \\ 3 & 0 & −1 \\ 0 & 3 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & −1 \end{pmatrix} = \begin{pmatrix} 2 & −2 & 1 \\ 6 & 0 & 1 \\ 0 & 6 & −1 \end{pmatrix} ,
so AP = P D, and hence P −1 AP = D as required.
Solution. The characteristic polynomial of A turns out to be exactly the same as in the previous
example:
det(A − λI) = −λ^3 + 3λ^2 − 4 = −(λ + 1)(λ − 2)^2 .
Thus the eigenvalues of A are 2 and −1. However, in this case it turns out that both
eigenspaces are 1-dimensional:
N (A − 2I) = Span (v1 ), where v1 = (1, 2, −1)^T ,
N (A + I) = Span (v2 ), where v2 = (−1, −1, 1)^T .
Since A has only 2 linearly independent eigenvectors, the Diagonalisation Theorem implies
that A is not diagonalisable.
Put differently, the Diagonalisation Theorem states that a matrix A ∈ Rn×n is diagonalis-
able if and only if A has enough eigenvectors to form a basis of Rn . The following corollary
makes this restatement even more precise:
Corollary 7.17. Let A ∈ Rn×n and let λ1 , . . . , λr be the (distinct) eigenvalues of A. Then
A is diagonalisable if and only if
dim N (A − λ1 I) + · · · + dim N (A − λr I) = n .
Proof. For simplicity we will only consider the case where A has two distinct eigenvalues; the
general case can be proved along the same lines. Suppose that A has two distinct eigenvalues
λ1 and λ2 . Let {v1 , . . . , vp } be a basis for N (A − λ1 I) and let {w1 , . . . , wq } be a basis
for N (A − λ2 I). First we show that the p + q vectors v1 , . . . , vp , w1 , . . . , wq are linearly
independent. In order to see this suppose that there are scalars α1 , . . . , αp , β1 , . . . , βq such
that
α1 v1 + · · · + αp vp + β1 w1 + · · · + βq wq = 0 . (7.10)
Theorem 7.13 now implies that
α1 v1 + · · · + αp vp = 0 and β1 w1 + · · · + βq wq = 0 , (7.11)
because otherwise (7.10) would state that 0 can be written as a nontrivial linear combination
of eigenvectors belonging to distinct eigenvalues of A. But since v1 , . . . , vp and w1 , . . . wq
are linearly independent, it follows that α1 = · · · = αp = β1 = · · · = βq = 0. Thus the p + q
vectors v1 , . . . , vp , w1 , . . . , wq are linearly independent as claimed.
By the Diagonalisation Theorem A is diagonalisable if and only if A has n linearly inde-
pendent eigenvectors. By what we have just proved, this is the case if and only if p + q = n,
and the corollary follows because dim N (A − λ1 I) = p and dim N (A − λ2 I) = q.
A very useful special case of the Diagonalisation Theorem is the following:
Theorem 7.18. An n × n matrix with n distinct eigenvalues is diagonalisable.
Proof. Let v1 , . . . , vn be eigenvectors corresponding to the n distinct eigenvalues of A. Then
the n vectors v1 , . . . , vn are linearly independent by Theorem 7.13. Hence A is diagonalisable
by the Diagonalisation Theorem.
Remark 7.19. Note that the above condition for diagonalisability is sufficient but not nec-
essary : an n × n matrix which does not have n distinct eigenvalues may or may not be
diagonalisable (see Examples 7.15 and 7.16).
Example 7.20. The matrix
A = \begin{pmatrix} 1 & −1 & 5 \\ 0 & 2 & 6 \\ 0 & 0 & 3 \end{pmatrix}
is diagonalisable, since it has three distinct eigenvalues 1, 2, and 3.
7.3 Interlude: complex vector spaces and matrices
Example. Consider the matrix
A = \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix} .
Its characteristic polynomial is
det(A − λI) = λ^2 + 1 ,
so the characteristic polynomial does not have any real roots, and hence A does not have any
real eigenvalues. However, since
λ^2 + 1 = λ^2 − (−1) = λ^2 − i^2 = (λ − i)(λ + i) ,
the characteristic polynomial has two complex roots, namely i and −i. Thus it makes sense to
say that A has two complex eigenvalues i and −i. What are the corresponding eigenvectors?
Solving
(A − iI)x = 0
leads to the system
−ix1 − x2 = 0
x1 − ix2 = 0
Both equations yield the condition x2 = −ix1 , so (1, −i)^T is an eigenvector corresponding to the eigenvalue i. Indeed
\begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ −i \end{pmatrix} = \begin{pmatrix} i \\ 1 \end{pmatrix} = \begin{pmatrix} i \\ −i^2 \end{pmatrix} = i \begin{pmatrix} 1 \\ −i \end{pmatrix} .
Similarly, we see that (1, i)^T is an eigenvector corresponding to the eigenvalue −i. Indeed
\begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ i \end{pmatrix} = \begin{pmatrix} −i \\ 1 \end{pmatrix} = \begin{pmatrix} −i \\ −i^2 \end{pmatrix} = −i \begin{pmatrix} 1 \\ i \end{pmatrix} .
The moral of this example is the following: on the one hand, we could just say that the matrix
A has no real eigenvalues and stop the discussion right here. On the other hand, we just saw
that it makes sense to say that A has two complex eigenvalues with corresponding complex
eigenvectors.
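Numerical libraries handle this situation in precisely this spirit: asked for the eigenvalues of a real matrix with no real eigenvalues, they simply return complex ones. For the matrix of this example:

```python
import numpy as np

A = np.array([[0., -1.],
              [1.,  0.]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)           # [0.+1.j 0.-1.j], the complex eigenvalues i and -i
print(eigvecs[:, 0])     # a normalised complex eigenvector for the first eigenvalue
```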
This leads to the idea of leaving our current real set-up, and enter a complex realm instead.
As it turns out, this is an immensely powerful idea. However, as our time is limited, we shall
only cover the bare necessities, allowing us to prove the main result of the next section.
Let Cn denote the set of all n-vectors with complex entries, that is,
Cn = { (z1 , . . . , zn )^T | z1 , . . . , zn ∈ C } .
Just as in Rn , we add vectors in Cn by adding their entries, and we can multiply a vector in
Cn by a complex number, by multiplying each entry.
For example, let z = (1 + i, 2i, 3)^T , w = (−2 + 3i, 1, 2 + i)^T ∈ C3 and α = 1 + 2i. Then
z + w = \begin{pmatrix} (1 + i) + (−2 + 3i) \\ 2i + 1 \\ 3 + (2 + i) \end{pmatrix} = \begin{pmatrix} −1 + 4i \\ 1 + 2i \\ 5 + i \end{pmatrix} ,
αz = \begin{pmatrix} (1 + 2i)(1 + i) \\ (1 + 2i)(2i) \\ (1 + 2i) · 3 \end{pmatrix} = \begin{pmatrix} 1 + 2i + i + 2i^2 \\ 2i + (2i)^2 \\ 3 + 6i \end{pmatrix} = \begin{pmatrix} −1 + 3i \\ −4 + 2i \\ 3 + 6i \end{pmatrix} .
If addition and scalar multiplication is defined in this way (now allowing scalars to be in
C), then Cn satisfies all the axioms of a vector space. Similarly, we can introduce the set
of all m × n matrices with complex entries, call it Cm×n , and define addition and scalar
multiplication (again allowing complex scalars) entry-wise just as in Rm×n . Again, Cm×n
satisfies all the axioms of a vector space.
Fact 7.22. All the results in Chapters 1–5, and all the results from the beginning of this
chapter hold verbatim, if ‘scalar’ is taken to mean ‘complex number’.
Since ‘scalars’ are now allowed to be complex numbers, Cn and Cm×n are known as
complex vector spaces.
The reason for allowing this more general set-up is that, in a certain sense, complex
numbers are much nicer than real numbers. More precisely, we have the following result:
Theorem 7.23 (Fundamental Theorem of Algebra). If p is a complex polynomial of degree
n ≥ 1, that is,
p(z) = cn z^n + · · · + c1 z + c0 ,
where c0 , c1 , . . . , cn ∈ C, then p has at least one (possibly complex) root.
Corollary 7.24. Every matrix A ∈ Cn×n has at least one (possibly complex) eigenvalue and
a corresponding eigenvector z ∈ Cn .
Proof. Since λ is an eigenvalue of A if and only if det(A − λI) = 0 and since p(λ) =
det(A − λI) is a polynomial with complex coefficients of degree n, the assertion follows from
the Fundamental Theorem of Algebra.
The corollary above is the main reason why complex vector spaces are considered. We are
guaranteed that every matrix has at least one eigenvalue, and we may then use the powerful
tools developed in the earlier parts of this chapter to analyse matrices through their eigenvalues
and eigenvectors.
7.4 Spectral Theorem for symmetric matrices
Theorem 7.25 (Spectral Theorem). Let A ∈ Rn×n be symmetric. Then there is an orthogonal matrix Q ∈ Rn×n such that QT AQ is a diagonal matrix.
The proof of this theorem requires a number of auxiliary results which we shall first state
and prove.
Lemma 7.26. The product of two orthogonal matrices of the same size is orthogonal, that
is, if Q0 ∈ Rn×n and Q1 ∈ Rn×n are orthogonal, then Q0 Q1 is also orthogonal.
Proof. Since Q0 and Q1 are orthogonal, we have QT0 Q0 = I and QT1 Q1 = I. Thus
(Q0 Q1 )T (Q0 Q1 ) = QT1 QT0 Q0 Q1 = QT1 Q1 = I ,
so Q0 Q1 is orthogonal.
Lemma 7.27. Let A ∈ Rn×n be symmetric. Then, for all x, y ∈ Rn ,
(Ax)·y = x·(Ay) .
Proof. Since AT = A, we have (Ax)·y = (Ax)T y = xT AT y = xT Ay = x·(Ay).
so Aw ∈ H ⊥ as claimed.
The following lemma contains the key result that will allow us to prove the Spectral
Theorem. We already know from the discussion in the previous section that every matrix has
a complex eigenvalue and a corresponding complex eigenvector. The following lemma shows
that, rather unexpectedly, symmetric matrices always have at least one real eigenvalue with a
corresponding real eigenvector:
Lemma 7.29. Every symmetric matrix A ∈ Rn×n has at least one real eigenvalue with
corresponding real eigenvector v ∈ Rn .
Proof. By Corollary 7.24 we know that A has at least one complex eigenvalue λ with corre-
sponding eigenvector z ∈ Cn , that is
Az = λz . (7.12)
Write
λ = a + ib where a, b ∈ R (7.13)
z = v + iw where v, w ∈ Rn
Substituting (7.13) into (7.12) and comparing real and imaginary parts on both sides, we find
Av = av − bw , (7.14)
Aw = aw + bv .
Now (Av)·w = v·(Aw) by Lemma 7.27. Taking the dot product of the first equation in (7.14) with w and of the second with v, we therefore see that
a(v·w) − bkwk2 = a(v·w) + bkvk2
and hence
b(kvk2 + kwk2 ) = 0 .
But z = v + iw is an eigenvector, so the vectors v and w cannot both be the zero vector.
Therefore kvk2 + kwk2 > 0, and hence
b = 0.
Thus, using (7.13) and (7.14), we see that λ = a ∈ R and Av = av = λv, that is, A has a
real eigenvalue with corresponding real eigenvector.
Proof of the Spectral Theorem. By induction on n. The theorem is trivial for n = 1. Suppose
now that the theorem has been shown to be true for some k. Let A ∈ R(k+1)×(k+1) be
symmetric. By Lemma 7.29 there is λ ∈ R and v ∈ Rk+1 such that
Av = λv .
Since multiplying an eigenvector by a nonzero scalar does not change its status as an eigen-
vector, we may assume that v is normalised, that is, kvk = 1. Next choose an orthonormal
basis {w1 , . . . , wk } for Span (v)⊥ . Then {v, w1 , . . . , wk } is an orthonormal basis for Rk+1
and it follows that the matrix Q0 ∈ R(k+1)×(k+1) given by
Q0 = (v w1 · · · wk )
is orthogonal.
Moreover, since
vT Av = vT (λv) = λ(vT v) = λ ,
vT Awj = v·(Awj ) = 0
wiT Av = wi ·(Av) = (Awi )·v = 0 ,
it follows that
QT0 AQ0 = \begin{pmatrix} λ & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{pmatrix} .
Observe now that by Lemma 7.27 the k × k matrix B is symmetric, because
bij = wiT Awj = wi ·(Awj ) = (Awi )·wj = wj ·(Awi ) = wjT Awi = bji .
By the induction hypothesis there are an orthogonal matrix R ∈ Rk×k and a diagonal matrix D ∈ Rk×k such that
RT BR = D .
Now let Q1 ∈ R(k+1)×(k+1) be the matrix
Q1 = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & R & \\ 0 & & & \end{pmatrix} .
A short calculation using the fact that RT R = I shows that QT1 Q1 = I, so Q1 is orthogonal.
If we now let
Q = Q0 Q1 ,
then Q is orthogonal by Lemma 7.26, and
QT AQ = QT1 QT0 AQ0 Q1 = QT1 \begin{pmatrix} λ & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{pmatrix} Q1 = \begin{pmatrix} λ & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & RT BR & \\ 0 & & & \end{pmatrix} = \begin{pmatrix} λ & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & D & \\ 0 & & & \end{pmatrix} ,
which is a diagonal matrix. This completes the induction step, and the theorem follows.
Let us record the following consequence of the Spectral Theorem, which is of independent
interest:
Corollary 7.30. The eigenvalues of a symmetric matrix A are real, and eigenvectors corre-
sponding to distinct eigenvalues are orthogonal.
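Numerically, the Spectral Theorem corresponds to the symmetric eigenvalue routines found in linear algebra libraries; the sketch below uses a made-up symmetric matrix and checks that the computed Q is orthogonal and that QT AQ is diagonal.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])      # a made-up symmetric matrix

eigvals, Q = np.linalg.eigh(A)    # eigh is designed for symmetric matrices
print(np.allclose(Q.T @ Q, np.eye(3)))              # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag(eigvals)))   # True: Q^T A Q is diagonal
```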
Index
adjugate, 32
basis, 49
best approximation, 83
Best Approximation Theorem, 82
characteristic equation, 88
characteristic polynomial, 88
cofactor, 25
cofactor expansion, 25
Cofactor Expansion Theorem, 25
column space, 55
commuting matrices, 11
complex vector space, 36, 98
composite, 69
Composition Formula, 70
coordinate vector, 53
coordinates, 53
Cramer’s Rule, 31
determinant, 24
diagonalisable matrix, 92
Diagonalisation Theorem, 93
dimension, 51
distance, 76
dot product, 75
eigenspace, 89
eigenvalue, 87
eigenvector, 87
elementary row operation, 3
equivalent systems, 2
Fundamental Subspace Theorem, 78
fundamental subspaces, 78
Gauss-Jordan algorithm, 6
Gauss-Jordan inversion, 20
Gaussian algorithm, 5
Gaussian elimination, 3
identity, 62
identity transformation, 62
image, 62
inverse, 11
Inverse Formula, 32
Invertible Matrix Theorem, 19
kernel, 62
leading 1, 3
least squares problem, 85
least squares solution, 85
linear combination, 16
linear equation, 1
linear mapping, 59
linear transformation, 59
linearly dependent vectors, 46
linearly independent vectors, 46
matrix, 9
augmented, 3
coefficient, 3
diagonal, 14
invertible, 11
lower triangular, 14
nonsingular, 28
singular, 28
symmetric, 13
upper triangular, 14
matrix representation, 67
Matrix Representation Theorem, 66
norm, 76
normal equations, 85
normalising, 76
nullspace, 41
orthogonal basis, 79
orthogonal complement, 77
Orthogonal Decomposition Theorem, 82
orthogonal matrix, 81
orthogonal projection, 82
orthogonal set, 78
orthogonal vectors, 77
orthonormal basis, 80
orthonormal set, 80
pivot column, 7
pivot position, 7
Pythagorean Theorem, 77
range, 62
rank, 56
real vector space, 36
reduced row echelon form, 3
row echelon form, 3
row equivalent matrices, 18
row space, 55
scalar product, 75
solution, 1
trivial, 8
zero, 8
span, 42
spanning set, 43
Spectral Theorem, 98
standard basis, 49, 50
subspace, 39
proper, 39
zero, 39
system
associated homogeneous, 8
consistent, 2
homogeneous, 8
inconsistent, 2
inhomogeneous, 8
overdetermined, 7
solution set of a, 2
underdetermined, 7
transition matrix, 54
transpose, 12
unit vector, 76
variable
free, 4
leading, 4
vector space, 36
finite dimensional, 51
infinite dimensional, 51
zero vector, 19