
Part IV

Eigenvalues and Eigenvectors

Section 16

The Determinant
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.

• How do we calculate the determinant of an n × n matrix?


• What is one important fact the determinant tells us about a matrix?

Application: Area and Volume

Consider the problem of finding the area of a parallelogram determined by two vectors u and v,
as illustrated at left in Figure 16.1. We could calculate this area, for example, by breaking up


Figure 16.1: A parallelogram and a parallelepiped.

the parallelogram into two triangles and a rectangle and finding the area of each. Now consider
the problem of calculating the volume of the three-dimensional analog (called a parallelepiped)
determined by three vectors u, v, and w as illustrated at right in Figure 16.1.
It is quite a bit more difficult to break this parallelepiped into subregions whose volumes are
easy to compute. However, all of these computations can be made quickly by using determinants.
The details are later in this section.


Introduction

We know that a non-zero vector x is an eigenvector of an n × n matrix A if Ax = λx for some


scalar λ. Note that this equation can be written as (A − λIn )x = 0. Until now, we were given
eigenvalues of matrices and have used the eigenvalues to find the eigenvectors. In this section we
will learn an algebraic technique to find the eigenvalues ourselves. We will also be able to justify
why an n × n matrix has at most n eigenvalues.
A scalar λ is an eigenvalue of A if (A − λIn )x = 0 has a non-trivial solution x, which happens
if and only if A − λIn is not invertible. In this section we will find a scalar whose value will tell us
when a matrix is invertible and when it is not, and use this scalar to find the eigenvalues of a matrix.
 
Preview Activity 16.1. In this activity, we will focus on 2 × 2 matrices. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a
2 × 2 matrix. To see if A is invertible, we row reduce A by replacing row 2 with a·(row 2) − c·(row 1):
$$\begin{bmatrix} a & b \\ 0 & ad-bc \end{bmatrix}.$$
So the only way A can be reduced to $I_2$ is if ad − bc ≠ 0. We call this quantity ad − bc the determinant
of A, and denote the determinant of A as det(A) or |A|. When det(A) ≠ 0, we know that
$$A^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

We now consider how we can use the determinant to find eigenvalues and other information about
the invertibility of a matrix.
 
(1) Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$. Find det(A) by hand. What does this mean about the matrix A? Can
you confirm this with other methods?
 
(2) One of the eigenvalues of $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ is λ = 4. Recall that we can rewrite the matrix
equation Ax = 4x in the form (A − 4I2 )x = 0. What must be true about A − 4I2 in order
for 4 to be an eigenvalue of A? How does this relate to det(A − 4I2 )?
 
(3) Another eigenvalue of $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ is λ = −1. What must be true about $A + I_2$ in order
for −1 to be an eigenvalue of A? How does this relate to det(A + I2 )?
 
(4) To find the eigenvalues of the matrix $A = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix}$, we rewrite the equation Ax = λx
as (A − λI2)x = 0. The coefficient matrix of this last system has the form
$$A - \lambda I_2 = \begin{bmatrix} 3-\lambda & 2 \\ 2 & 6-\lambda \end{bmatrix}.$$
The determinant of this matrix is a quadratic expression in λ. Since the
eigenvalues will occur when the determinant is 0, we need to solve a quadratic equation. Find
the resulting eigenvalues. (Note: One of the eigenvalues is 2.)

(5) Can you explain why a 2 × 2 matrix can have at most two eigenvalues?

The Determinant of a Square Matrix

Around 1900 or so determinants were deemed much more important than they are today. In fact,
determinants were used even before matrices. According to Tucker1 determinants (not matrices)
developed out of the study of coefficients of systems of linear equations and were used by Leibniz
150 years before the term matrix was coined by J. J. Sylvester in 1848. Even though determinants
are not as important as they once were, the determinant of a matrix is still a useful quantity. We saw
in Preview Activity 16.1 that the determinant of a matrix tells us if the matrix is invertible and how
it can help us find eigenvalues. In this section, we will see how to find the determinant of any size
matrix and how to use this determinant to find the eigenvalues.
 
The determinant of a 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is det(A) = ad − bc. The matrix A is
invertible if and only if det(A) ≠ 0. We will use a recursive approach to find the determinants of
larger size matrices building from the 2 × 2 determinants. We present the result in the 3 × 3 case
here – a more detailed analysis can be found at the end of this section.

To find the determinant of a 3 × 3 matrix $A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$, we will use the determinants
of three 2 × 2 matrices. More specifically, the determinant of A, denoted det(A), is the quantity
$$a_{11}\det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} - a_{12}\det\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} + a_{13}\det\begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}. \tag{16.1}$$
This sum is called a cofactor expansion of the determinant of A. The smaller matrices in this
expansion are obtained by deleting certain rows and columns of the matrix A. In general, when
finding the determinant of an n × n matrix, we find determinants of (n − 1) × (n − 1) matrices,
which we can again reduce to smaller matrices to calculate.
We will use the specific matrix
$$A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 4 & 3 \\ 2 & 2 & 1 \end{bmatrix}$$
as an example in illustrating the cofactor expansion method in general.

• We first pick a row or column of A. We will pick the first row of A for this example.
• For each entry in the row (or column) we choose, in this case the first row, we will calculate
the determinant of a smaller matrix obtained by removing the row and the column the entry
is in. Let Aij be the smaller matrix found by deleting the ith row and jth column of A. For
entry $a_{11}$, we find the matrix $A_{11}$ obtained by removing the first row and first column:
$$A_{11} = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}.$$
For entry $a_{12}$, we find
$$A_{12} = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}.$$
¹Tucker, Alan. (1993). The Growing Importance of Linear Algebra in Undergraduate Mathematics. The College Mathematics Journal, 1, 3–9.

Finally, for entry $a_{13}$, we find
$$A_{13} = \begin{bmatrix} 1 & 4 \\ 2 & 2 \end{bmatrix}.$$

• Notice that in the 3 × 3 determinant formula in (16.1) above, the middle term had a (-) sign.
The signs of the terms in the cofactor expansion alternate within each row and each column.
More specifically, the sign of a term in the ith row and jth column is (−1)i+j . We then obtain
the following pattern of the signs within each row and column:
 
$$\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & & & \end{bmatrix}$$

In particular, the sign factor for $a_{11}$ is $(-1)^{1+1} = 1$, for $a_{12}$ is $(-1)^{1+2} = -1$, and for $a_{13}$ is $(-1)^{1+3} = 1$.

• For each entry aij in the row (or column) of A we chose, we multiply the entry aij by the
determinant of Aij and the sign (−1)i+j . In this case, we obtain the following numbers
 
$$a_{11}(-1)^{1+1}\det(A_{11}) = 1\det\begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} = 1(4-6) = -2$$
$$a_{12}(-1)^{1+2}\det(A_{12}) = -2\det\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix} = -2(1-6) = 10$$
$$a_{13}(-1)^{1+3}\det(A_{13}) = 0$$

Note that in the last calculation, since a13 = 0, we did not have to evaluate the rest of the
terms.

• Finally, we find the determinant by adding all these values:

$$\det(A) = a_{11}(-1)^{1+1}\det(A_{11}) + a_{12}(-1)^{1+2}\det(A_{12}) + a_{13}(-1)^{1+3}\det(A_{13}) = 8.$$
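The expansion just carried out translates directly into a short recursive routine. Below is a minimal Python sketch (our own illustration, not part of the text's development); the function name and the use of NumPy are choices made here for convenience.

```python
import numpy as np

def cofactor_det(A):
    """Determinant by cofactor expansion along the first row (illustrative, not efficient)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        if A[0, j] == 0:
            continue  # zero entries contribute nothing, as noted above
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # delete row 1 and column j+1
        total += (-1) ** j * A[0, j] * cofactor_det(minor)     # sign (-1)^{1+(j+1)} = (-1)^j
    return total

print(cofactor_det([[1, 2, 0], [1, 4, 3], [2, 2, 1]]))  # 8.0, matching the value found above
```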

Cofactors

We will now define the determinant of a general n × n matrix A in terms of a cofactor expansion
as we did in the 3 × 3 case. To do so, we need some notation and terminology.

• We let Aij be the submatrix of A = [aij ] found by deleting the ith row and jth column of A.
The determinant of Aij is called the ijth minor of A or the minor corresponding to the entry
aij .
Section 16. The Determinant 283

• Notice that in the 3 × 3 case, we used the opposite of the 1,2 minor in the sum. It will be the
case that the terms in the cofactor expansion will alternate in sign. We can make the signs
in the sum alternate by taking −1 to an appropriate power. As a result, we define the ijth
cofactor Cij of A as
$$C_{ij} = (-1)^{i+j}\det(A_{ij}).$$

• Finally, we define the determinant of A.

Definition 16.1. If A = [aij ] is an n × n matrix, the determinant of A is the scalar

det(A) = a11 C11 + a12 C12 + a13 C13 + · · · + a1n C1n

where $C_{ij} = (-1)^{i+j}\det(A_{ij})$ is the ij-cofactor of A and $A_{ij}$ is the matrix obtained by removing
row i and column j of matrix A.

This method for computing determinants is called the cofactor expansion or Laplace expansion
of A along the 1st row. The cofactor expansion reduces the computation of the determinant of an
n × n matrix to n computations of determinants of (n − 1) × (n − 1) matrices. These smaller
matrices can be reduced again using cofactor expansions, so it can be a long and grueling process
for large matrices. It turns out that we can actually take this expansion along any row or column of
the matrix (a proof of this fact is given in Section 21). For example, the cofactor expansion along
the 2nd row is
det(A) = a21 C21 + a22 C22 + · · · + a2n C2n
and along the 3rd column the formula is

det(A) = a13 C13 + a23 C23 + · · · + an3 Cn3 .

Note that when finding a cofactor expansion, choosing a row or column with many zeros makes
calculations easier.

Activity 16.1.
 
(a) Let $A = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 0 & 4 \\ 6 & 3 & 0 \end{bmatrix}$. Use the cofactor expansion along the first row to calculate the
determinant of A by hand.

(b) Calculate det(A) by using a cofactor expansion along the second row where $A = \begin{bmatrix} 1 & 4 & 2 \\ 0 & 2 & 0 \\ 2 & 5 & 3 \end{bmatrix}$.

(c) Calculate the determinant of $\begin{bmatrix} 1 & -2 & 3 \\ 0 & 4 & -3 \\ 0 & 0 & 8 \end{bmatrix}$.

(d) Which determinant property can be used to calculate the determinant in part (c)? Explain
how. (Determinant properties are included below for easy reference.)

 
(e) Consider the matrix $A = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 2 & 2 \end{bmatrix}$. Let B be the matrix which results when c times
row 1 is added to row 2 of A. Evaluate the determinant of B by hand to check that it is
equal to the determinant of A, which verifies one other determinant property (in a specific
case).

As with any new idea, like the determinant, we must ask what properties are satisfied. We state
the following theorem without proof for the time being. For the interested reader, the proof of many
of these properties is given in Section 21 and others in the exercises.

Theorem 16.2. Given n × n matrices A, B, the following hold:

(1) det(AB) = det(A) · det(B), and in particular $\det(A^k) = (\det A)^k$ for any positive integer k.

(2) det(AT ) = det(A).

(3) A is invertible if and only if det(A) ≠ 0.

(4) If A is invertible, then $\det(A^{-1}) = (\det A)^{-1}$.


 
(5) For a 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, det(A) = ad − bc.

(6) If A is upper/lower triangular, then det(A) is the product of the entries on the diagonal.

(7) The determinant of a matrix is the product of the eigenvalues, with each eigenvalue repeated
as many times as its multiplicity.

(8) Effect of row operations:

• Adding a multiple of a row to another does NOT change the determinant of the matrix.
• Multiplying a row by a constant multiplies the determinant by the same constant.
• Row swapping multiplies the determinant by (−1).

(9) If the row echelon form U of A is obtained by adding multiples of one row to another, and
row swapping, then det(A) is equal to det(U) multiplied by $(-1)^r$, where r is the number of
row swaps performed during the row reduction.

Note that if we were to find the determinant of a 4 × 4 matrix using the cofactor method, we
would calculate determinants of 4 matrices of size 3 × 3, each of which would in turn require 3 determinant
calculations. So, we would need a total of 12 calculations of determinants of 2 × 2 matrices.
That is a lot of calculations. There are other, more efficient, methods for calculating determinants.
For example, we can row reduce the matrix, keeping track of the effect that each row operation has
on the determinant.
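As a rough sketch of such a row-reduction approach (our own illustration, relying on the row-operation facts in Theorem 16.2; the function name is ours):

```python
import numpy as np

def det_by_row_reduction(A):
    """Determinant via Gaussian elimination, tracking the effect of each row operation."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    det = 1.0
    for k in range(n):
        # choose a pivot row; a row swap multiplies the determinant by -1
        pivot = k + np.argmax(np.abs(U[k:, k]))
        if U[pivot, k] == 0:
            return 0.0
        if pivot != k:
            U[[k, pivot]] = U[[pivot, k]]
            det = -det
        det *= U[k, k]  # the pivots are the diagonal entries of the resulting triangular matrix
        # adding a multiple of one row to another does not change the determinant
        U[k + 1:] -= np.outer(U[k + 1:, k] / U[k, k], U[k])
    return det

print(det_by_row_reduction([[1, 2, 0], [1, 4, 3], [2, 2, 1]]))  # 8.0
```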

The Determinant of a 3 × 3 Matrix

Earlier we defined the determinant of a 3 × 3 matrix. In this section we endeavor to understand the
motivation behind that definition.
We will repeat the process we went through in the 2×2 case to see how to define the determinant
of a 3 × 3 matrix. Let
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$
To find the inverse of A we augment A by the 3 × 3 identity matrix
$$[A \mid I_3] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 1 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 1 \end{bmatrix}$$
and row reduce the matrix (using appropriate technology) to obtain
$$\begin{bmatrix} 1 & 0 & 0 & \dfrac{a_{33}a_{22}-a_{32}a_{23}}{d} & -\dfrac{a_{33}a_{12}-a_{32}a_{13}}{d} & \dfrac{-a_{13}a_{22}+a_{12}a_{23}}{d} \\[4pt] 0 & 1 & 0 & -\dfrac{a_{33}a_{21}-a_{31}a_{23}}{d} & \dfrac{a_{33}a_{11}-a_{31}a_{13}}{d} & -\dfrac{a_{23}a_{11}-a_{21}a_{13}}{d} \\[4pt] 0 & 0 & 1 & \dfrac{-a_{31}a_{22}+a_{32}a_{21}}{d} & -\dfrac{a_{32}a_{11}-a_{31}a_{12}}{d} & \dfrac{a_{22}a_{11}-a_{21}a_{12}}{d} \end{bmatrix},$$
where
$$d = a_{33}a_{11}a_{22} - a_{33}a_{21}a_{12} - a_{31}a_{13}a_{22} - a_{32}a_{11}a_{23} + a_{32}a_{21}a_{13} + a_{31}a_{12}a_{23}. \tag{16.2}$$

In this case, we can see that the inverse of the 3 × 3 matrix A will be defined if and only if
d 6= 0. So, in the 3 × 3 case the determinant of A will be given by the value of d in Equation (16.2).
What remains is for us to see how this is related to determinants of 2 × 2 sub-matrices of A.
To start, we collect all terms involving a11 in d. A little algebra shows that
$$\det(A) = a_{11}(a_{33}a_{22} - a_{32}a_{23}) - a_{33}a_{21}a_{12} - a_{31}a_{13}a_{22} + a_{32}a_{21}a_{13} + a_{31}a_{12}a_{23}.$$

Now let’s collect the remaining terms involving a12 :


$$\det(A) = a_{11}(a_{33}a_{22} - a_{32}a_{23}) - a_{12}(a_{33}a_{21} - a_{31}a_{23}) - a_{31}a_{13}a_{22} + a_{32}a_{21}a_{13}.$$

Finally, we collect the terms involving a13 :


$$\det(A) = a_{11}(a_{33}a_{22} - a_{32}a_{23}) - a_{12}(a_{33}a_{21} - a_{31}a_{23}) + a_{13}(a_{32}a_{21} - a_{31}a_{22}).$$

Now we can connect the determinant of A to determinants of 2 × 2 sub-matrices of A.

• Notice that $a_{33}a_{22} - a_{32}a_{23}$ is the determinant of the 2 × 2 matrix $\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}$ obtained from A by deleting the first row and first column.

• Similarly, the expression $a_{33}a_{21} - a_{31}a_{23}$ is the determinant of the 2 × 2 matrix $\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}$ obtained from A by deleting the first row and second column.

• Finally, the expression $a_{32}a_{21} - a_{31}a_{22}$ is the determinant of the 2 × 2 matrix $\begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}$ obtained from A by deleting the first row and third column.

Putting this all together gives us formula (16.1) for the determinant of a 3 × 3 matrix as we
defined earlier.

Two Devices for Remembering Determinants

There are useful ways to remember how to calculate the formulas for determinants of 2 × 2 and
3 × 3 matrices. In the 2 × 2 case of $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$, we saw that
$$|A| = a_{11}a_{22} - a_{12}a_{21}.$$

This makes |A| the product of the diagonal elements a11 and a22 minus the product of the off-
diagonal elements a12 and a21 . We can visualize this in an array by drawing arrows across the
diagonal and off-diagonal, with a plus sign on the diagonal arrow indicating that we add the product
of the diagonal elements and a minus sign on the off-diagonal arrow indicating that we subtract the
product of the off-diagonal elements as shown in Figure 16.2.

Figure 16.2: A diagram to remember the 2 × 2 determinant.

We can do a similar thing for the determinant of a 3 × 3 matrix. In this case, we extend the
3 × 3 array to a 3 × 5 array by adjoining the first two columns onto the matrix. We then add the
products along the diagonals going from left to right and subtract the products along the diagonals
going from right to left as indicated in Figure 16.3.
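This diagonal device (often called the Rule of Sarrus) is easy to express in code. The following short Python sketch is our own illustration, not a construction from the text:

```python
def sarrus_det(A):
    """3 x 3 determinant via the diagonal device of Figure 16.3."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = A
    # add the left-to-right diagonal products, subtract the right-to-left ones
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a13 * a22 * a31 - a11 * a23 * a32 - a12 * a21 * a33)

print(sarrus_det([[1, 2, 0], [1, 4, 3], [2, 2, 1]]))  # 8
```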

Examples

What follows are worked examples that use the concepts from this section.

Example 16.3. For each of the following



Figure 16.3: A diagram to remember the 3 × 3 determinant.

• Identify the sub-matrices A1,j

• Determine the cofactors C1,j .

• Use the cofactor expansion to calculate the determinant.

 
(a) $A = \begin{bmatrix} 3 & 6 & 2 \\ 0 & 4 & -1 \\ 5 & 0 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 3 & 0 & 1 & 1 \\ 2 & 1 & 2 & 1 \\ 1 & -2 & 2 & -1 \\ -3 & 2 & 3 & 1 \end{bmatrix}$

Example Solution.

(a) With a 3 × 3 matrix, we will find the sub-matrices A11 , A12 , and A13 . Recall that Aij is
the sub-matrix of A obtained by deleting the ith row and jth column of A. Thus,
     
$$A_{11} = \begin{bmatrix} 4 & -1 \\ 0 & 1 \end{bmatrix}, \qquad A_{12} = \begin{bmatrix} 0 & -1 \\ 5 & 1 \end{bmatrix}, \qquad A_{13} = \begin{bmatrix} 0 & 4 \\ 5 & 0 \end{bmatrix}.$$

The ijth cofactor is $C_{ij} = (-1)^{i+j}\det(A_{ij})$, so

$$C_{11} = (-1)^{2}\det\begin{bmatrix} 4 & -1 \\ 0 & 1 \end{bmatrix} = 4, \qquad C_{12} = (-1)^{3}\det\begin{bmatrix} 0 & -1 \\ 5 & 1 \end{bmatrix} = -5, \qquad C_{13} = (-1)^{4}\det\begin{bmatrix} 0 & 4 \\ 5 & 0 \end{bmatrix} = -20.$$

Then

$$\det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = (3)(4) + (6)(-5) + (2)(-20) = -58.$$

(b) With a 4 × 4 matrix, we will find the sub-matrices A11 , A12 , A13 , and A14 . We see that

 
$$A_{11} = \begin{bmatrix} 1 & 2 & 1 \\ -2 & 2 & -1 \\ 2 & 3 & 1 \end{bmatrix}, \qquad A_{12} = \begin{bmatrix} 2 & 2 & 1 \\ 1 & 2 & -1 \\ -3 & 3 & 1 \end{bmatrix}, \qquad A_{13} = \begin{bmatrix} 2 & 1 & 1 \\ 1 & -2 & -1 \\ -3 & 2 & 1 \end{bmatrix}, \qquad A_{14} = \begin{bmatrix} 2 & 1 & 2 \\ 1 & -2 & 2 \\ -3 & 2 & 3 \end{bmatrix}.$$

To calculate the ijth cofactor $C_{ij} = (-1)^{i+j}\det(A_{ij})$, we need to calculate the determinants of the $A_{1j}$. Using the device for calculating the determinant of a 3 × 3 matrix we have that

$$\det(A_{11}) = \det\begin{bmatrix} 1 & 2 & 1 \\ -2 & 2 & -1 \\ 2 & 3 & 1 \end{bmatrix} = (1)(2)(1) + (2)(-1)(2) + (1)(-2)(3) - (1)(2)(2) - (1)(-1)(3) - (2)(-2)(1) = -5,$$

$$\det(A_{12}) = \det\begin{bmatrix} 2 & 2 & 1 \\ 1 & 2 & -1 \\ -3 & 3 & 1 \end{bmatrix} = (2)(2)(1) + (2)(-1)(-3) + (1)(1)(3) - (1)(2)(-3) - (2)(-1)(3) - (2)(1)(1) = 23,$$

$$\det(A_{13}) = \det\begin{bmatrix} 2 & 1 & 1 \\ 1 & -2 & -1 \\ -3 & 2 & 1 \end{bmatrix} = (2)(-2)(1) + (1)(-1)(-3) + (1)(1)(2) - (1)(-2)(-3) - (2)(-1)(2) - (1)(1)(1) = -2,$$

and

$$\det(A_{14}) = \det\begin{bmatrix} 2 & 1 & 2 \\ 1 & -2 & 2 \\ -3 & 2 & 3 \end{bmatrix} = (2)(-2)(3) + (1)(2)(-3) + (2)(1)(2) - (2)(-2)(-3) - (2)(2)(2) - (1)(1)(3) = -37.$$

Then

$$C_{11} = (-1)^{2}\det(A_{11}) = -5, \qquad C_{12} = (-1)^{3}\det(A_{12}) = -23, \qquad C_{13} = (-1)^{4}\det(A_{13}) = -2, \qquad C_{14} = (-1)^{5}\det(A_{14}) = 37,$$

and so

$$\det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} + a_{14}C_{14} = (3)(-5) + (0)(-23) + (1)(-2) + (1)(37) = 20.$$

Example 16.4. Show that for any 2 × 2 matrices A and B,

det(AB) = det(A) det(B).

Example Solution. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and $B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$. Then
$$AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}.$$

So

det(AB) = (a11 b11 + a12 b21 )(a21 b12 + a22 b22 )


− (a11 b12 + a12 b22 )(a21 b11 + a22 b21 )
= (a11 b11 a21 b12 + a11 b11 a22 b22 + a12 b21 a21 b12 + a12 b21 a22 b22 )
− (a11 b12 a21 b11 + a11 b12 a22 b21 + a12 b22 a21 b11 + a12 b22 a22 b21 )
= a11 b11 a22 b22 + a12 b21 a21 b12 − a11 b12 a22 b21 − a12 b22 a21 b11 .

Also,

det(A) det(B) = (a11 a22 − a12 a21 )(b11 b22 − b12 b21 )
= a11 a22 b11 b22 − a11 a22 b12 b21 − a12 a21 b11 b22 + a12 a21 b12 b21 .

We conclude that det(AB) = det(A) det(B) if A and B are 2 × 2 matrices.
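A quick numerical spot-check of this identity (our own illustration) is easy with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(2, 2)).astype(float)
B = rng.integers(-5, 6, size=(2, 2)).astype(float)
# det(AB) and det(A)det(B) agree up to floating-point rounding
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
```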



Summary

• The determinant of an n × n matrix A = [aij ] is found by taking the cofactor expansion of


A along the first row. That is

det(A) = a11 C11 + a12 C12 + a13 C13 + · · · + a1n C1n ,

where
– Aij is the sub-matrix of A found by deleting the ith row and jth column of A.
– $C_{ij} = (-1)^{i+j}\det(A_{ij})$ is the ijth cofactor of A.
• The matrix A is invertible if and only if det(A) ≠ 0.

Exercises

(1) Use the cofactor expansion to explain why multiplying each of the entries of a 3 × 3 matrix
A by 2 multiplies the determinant of A by 8.
 
(2) Use the determinant criterion to determine for which c the matrix $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & c \\ 2 & -1 & 2 \end{bmatrix}$ is
invertible.
(3) Let A be a square matrix.
(a) Explain why $\det(A^2) = [\det(A)]^2$.
(b) Expand on the argument from (a) to explain why $\det(A^k) = [\det(A)]^k$ for any
positive integer k.
(c) Suppose that A is an invertible matrix and k is a positive integer. Must Ak be an
invertible matrix? Why or why not?
(4) Let A be an invertible matrix. Explain why $\det(A^{-1}) = \dfrac{1}{\det(A)}$ using determinant properties.
(5) Simplify the following determinant expression using determinant properties:

$$\det\!\left(P A^4 P^{-1} A^T (A^{-1})^3\right)$$

(6) Find the eigenvalues of the following matrices. Find a basis for and the dimension of each
eigenspace.
 
(a) $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix}$

(7) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False For any two n × n matrices A and B, det(A + B) = det A + det B.
(b) True/False For any square matrix A, det(−A) = − det(A).
(c) True/False For any square matrix A, det(−A) = det(A).
(d) True/False The determinant of a square matrix with all non-zero entries is non-zero.
(e) True/False If the determinant of A is non-zero, then so is the determinant of $A^2$.
(f) True/False If the determinant of a matrix A is 0, then one of the rows of A is a linear
combination of the other rows.
(g) True/False For any square matrix A, $\det(A^2) > \det(A)$.
(h) True/False If A and B are n × n matrices and AB is invertible, then A and B are
invertible.
(i) True/False If $A^2$ is the zero matrix, then the only eigenvalue of A is 0.
(j) True/False If 0 is an eigenvalue of A, then 0 is an eigenvalue of AB for any B of the
same size as A.
(k) True/False Suppose A is a 3 × 3 matrix. Then any three eigenvectors of A will form
a basis of R3 .

Project: Area and Volume Using Determinants

The approach we will take to connecting area (volume) to the determinant will help shed light on
properties of the determinant that we will discuss from an algebraic perspective in a later section.
First, we mention some basic properties of area (we focus on area for now, but these same properties
are valid for volumes as well). As a shorthand, we denote the area of a region R by Area(R).

• Area cannot be negative.

• If two regions R1 and R2 don’t overlap, then the area of the union of the regions is equal
to the sum of the areas of the regions. That is, if R1 ∩ R2 = ∅, then Area(R1 ∪ R2 ) =
Area(R1 ) + Area(R2 ).

• Area is invariant under translation. That is, if we move a geometric region by the same
amount uniformly in a given direction, the area of the original region and the area of the
transformed region are the same. A translation of a region is done by just adding a fixed
vector to each vector in the region. That is, a translation by a vector v is a function Tv such
that the image Tv (R) of a region R is defined as

Tv (R) = {r + v : r ∈ R}.

Since area is translation invariant, Area(Tv (R)) = Area(R).



• The area of a one-dimensional object like a line segment is 0.

Now we turn our attention to areas of parallelograms. Let u and v be vectors in R2 . The
parallelogram P (u, v) defined by u and v with point Q as basepoint is the set
$$P(u, v) = \{\overrightarrow{OQ} + ru + sv : 0 \le r, s \le 1\}.$$

An illustration of such a parallelogram is shown at left in Figure 16.4.

Figure 16.4: A parallelogram and a translated, rotated parallelogram.

If $u = [u_1 \ u_2]^{T}$ and $v = [v_1 \ v_2]^{T}$, then we will also represent $P(u, v)$ as $P\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}$.

Since area is translation and rotation invariant, we can translate our parallelogram by $-\overrightarrow{OQ}$ to
place its basepoint at the origin, then rotate by an angle θ (as shown at left in Figure 16.4). This
transforms the vector v to a vector v′ and the vector u to a vector u′ as shown at right in Figure
16.4. With this in mind we can always assume that our parallelograms have one vertex at the origin,
with u along the x-axis, and v in standard position. Now we can investigate how to calculate the
area of a parallelogram.

Project Activity 16.1. There are two situations to consider when we want to find the area of a
parallelogram determined by vectors u and v, both shown in Figure 16.5. The parallelogram will
be determined by the lengths of these vectors.


Figure 16.5: Parallelograms formed by u and v

(a) In the situation depicted at left in Figure 16.5, use geometry to explain why Area(P (u, v)) =
h|u|. (Hint: What can we say about the triangles ODB and EAC?)

(b) In the situation depicted at right in Figure 16.5, use geometry to again explain why
Area(P (u, v)) = h|u|. (Hint: What can we say about Area(AEC) and Area(ODB)?)

The result of Project Activity 16.1 is that the area of P (u, v) is given by h|u|, where h is the
height of the parallelogram determined by dropping a perpendicular from the terminal point of v to
the line determined by the vector u.
Now we turn to the question of how the determinant is related to area of a parallelogram. Our
approach will use some properties of the area of P (u, v).
Project Activity 16.2. Let u and v be vectors that determine a parallelogram in R2 .


Figure 16.6: Parallelograms formed by ku and v and by u and v + ku.

(a) Explain why


Area(P (u, v)) = Area(P (v, u)) (16.3)
(b) If k is any scalar, then ku either stretches or compresses u. Use this idea, and the result of
Project Activity 16.1, to explain why
Area(P (ku, v)) = Area(P (u, kv)) = |k|Area(P (u, v)) (16.4)
for any real number k. A representative picture of this situation is shown at left in Figure
16.6 for a value of k > 1. You will also need to consider what happens when k < 0.

(c) Finally, use the result of Project Activity 16.1 to explain why
Area(P (u + kv, v)) = Area(P (u, v + ku)) = Area(P (u, v)) (16.5)
for any real number k. A representative picture is shown at right in Figure 16.6.

Properties (16.4) and (16.5) will allow us to calculate the area of the parallelogram determined
by vectors u and v.
Project Activity 16.3. Let $u = [u_1 \ u_2]^{T}$ and $v = [v_1 \ v_2]^{T}$. We will now demonstrate that
$$\operatorname{Area}(P(u, v)) = \left|\det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\right|.$$
Before we begin, note that if both $u_1$ and $v_1$ are 0, then u and v are parallel. This makes P(u, v) a
line segment and so Area(P(u, v)) = 0. But if $u_1 = v_1 = 0$, it is also the case that
$$\det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} = u_1v_2 - u_2v_1 = 0$$
as well. So we can assume that at least one of $u_1$, $v_1$ is not 0. Since P(u, v) = P(v, u), we can
assume without loss of generality that $u_1 \ne 0$.

(a) Explain using properties (16.4) and (16.5) as appropriate why
$$\operatorname{Area}(P(u, v)) = \operatorname{Area}\left(P\left(u, \begin{bmatrix} 0 & v_2 - \tfrac{v_1}{u_1}u_2 \end{bmatrix}^{T}\right)\right).$$

(b) Let $v^1 = \begin{bmatrix} 0 & v_2 - \tfrac{v_1}{u_1}u_2 \end{bmatrix}^{T}$. Recall that our alternate representation of P(u, v) allows us
to write
$$\operatorname{Area}(P(u, v^1)) = \operatorname{Area}\left(P\begin{bmatrix} u_1 & u_2 \\ 0 & v_2 - \tfrac{v_1}{u_1}u_2 \end{bmatrix}\right).$$
This should seem very suggestive. We are essentially applying the process of Gaussian
elimination to our parallelogram matrix to reduce it to a diagonal matrix. From there,
we can calculate the area. The matrix form should indicate the next step – applying an
operation to eliminate the entry in the first row and second column. To do this, we need to
consider what happens if $v_2 - \tfrac{v_1}{u_1}u_2 = 0$ and if $v_2 - \tfrac{v_1}{u_1}u_2 \ne 0$.

i. Assume that $v_2 - \tfrac{v_1}{u_1}u_2 = 0$. Explain why Area(P(u, v)) = 0. Then explain why
$$\operatorname{Area}(P(u, v)) = 0 = \left|\det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\right|.$$
ii. Now we consider the case when $v_2 - \tfrac{v_1}{u_1}u_2 \ne 0$. Complete the process as in part (a),
using properties (16.4) and (16.5) (compare to Gaussian elimination) to continue to reduce
the problem of calculating Area(P(u, v)) to one of calculating Area(P($e_1$, $e_2$)).
Use this process to conclude that
$$\operatorname{Area}(P(u, v)) = \left|\det\begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix}\right|.$$
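A numerical spot-check of this conclusion (our own illustration, not part of the activity): the area computed as base times height, as in Project Activity 16.1, should agree with the absolute value of the determinant.

```python
import numpy as np

def parallelogram_area_two_ways(u, v):
    """Compare |det| with the base-times-height area of P(u, v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    by_det = abs(u[0] * v[1] - u[1] * v[0])
    # height: length of the component of v perpendicular to u
    height = np.linalg.norm(v - (v @ u) / (u @ u) * u)
    by_geometry = np.linalg.norm(u) * height
    return by_det, by_geometry

print(parallelogram_area_two_ways([3, 1], [1, 2]))  # both values are 5 (up to rounding)
```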

We can apply the same arguments as above using rotations, translations, shearings, and scalings
to show that the properties of area given above work in any dimension. Given vectors u1 , u2 , . . .,
un in Rn , we let
$$P(u_1, u_2, \ldots, u_n) = \{\overrightarrow{OQ} + x_1u_1 + x_2u_2 + \cdots + x_nu_n : 0 \le x_i \le 1 \text{ for each } i\}.$$

If n = 2, then P (u1 , u2 ) is the parallelogram determined by u1 and u2 with basepoint Q. If


n = 3, then P (u1 , u2 , u3 ) is the parallelepiped with basepoint Q determined by u1 , u2 , and u3 .
In higher dimensions the sets $P(u_1, u_2, \ldots, u_n)$ are called parallelotopes, and we use the notation
$\operatorname{Vol}(P(u_1, u_2, \ldots, u_n))$ for their volume. The n-dimensional volumes of these parallelotopes
satisfy the following properties:

Vol(P (u1 , u2 , . . . , ui−1 , ui , ui+1 , . . . , uj−1 , uj , uj+1 , . . . , un ))


= Vol(P (u1 , u2 , . . . , ui−1 , uj , ui+1 , . . . , uj−1 , ui , uj+1 , . . . , un )) (16.6)

for any i and j.

Vol(P (u1 , u2 , . . . , ui−1 , kui , ui+1 , . . . , un )) = |k|Vol(P (u1 , u2 , . . . , un )) (16.7)

for any real number k and any i.

Vol(P (u1 , u2 , . . . , ui−1 , ui + kuj , ui+1 , . . . , un )) = Vol(P (u1 , u2 , . . . , un )) (16.8)

for any real number k and any distinct i and j.



ProjectActivity
 16.4. We now show that Vol(P (u1 , u2 , u3 )) is the absolute value of the determi-
u1
nant of  u2 . For easier notation, let u = [u1 u2 u3 ]T , v = [v1 v2 v3 ]T , and w = [w1 w2 w3 ]T .
u3
As we argued in the 2-dimensional case, we can assume that all terms that we need to be nonzero
are nonzero, and we can do so without verification.

(a) Explain how property (16.7) shows that Vol(P (u, v, w)) is equal to

u1 u2 u3
  
1 1
Vol P  0 u1 2 1 − v1 u2 )
(v u u1 (v3 u1 − v1 u3 )
 .
1 1
0 u1 (w2 u1 − w1 u2 ) u1 (w3 u1 − w1 u3 )

(Hint: Think about how these properties are related to row operations.)
h iT
(b) Now let v1 = 0 u11 (v2 u1 − v1 u2 ) u11 (v3 u1 − v1 u3 ) and
h iT
w1 = 0 u11 (w2 u1 − w1 u2 ) u11 (w3 u1 − w1 u3 ) . Explain how property (16.7) shows that
Vol(P (u, v, w)) is equal to
  
u1 u2 u3
Vol P  0 u11 (v2 u1 − v1 u2 ) u11 (v3 u1 − v1 u3 )  ,
0 0 d

where
1
d= (u1 (v2 w3 − v3 w2 ) − u2 (v1 w3 − v3 w1 ) + u3 (v1 w2 − v2 w1 )).
u1 v2 − u2 v1

(c) Just as we saw in the 2-dimensional case, we can proceed to use the diagonal entries to
eliminate the entries above the diagonal without changing the volume to see that
  
u1 0 0
Vol(P (u, v, w)) = Vol P  0 u11 (v2 u1 − v1 u2 ) 0  .
0 0 d

Complete the process, applying appropriate properties to explain why

Vol(P (u, v, w)) = xVol(P (e1 , e2 , e3 ))

for some constant x. Find the constant and, as a result, find a specific expression for
Vol(P (u, v, w)) involving a determinant.

Properties (16.6), (16.7), and (16.8) involve the analogs of row operations on matrices, and we
will prove algebraically that the determinant exhibits the same properties. In fact, the determinant
can be uniquely defined by these properties. So in a sense, the determinant is an area or volume
function.
Section 17

The Characteristic Equation

Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.

• What is the characteristic polynomial of a matrix?


• What is the characteristic equation of a matrix?
• How and why is the characteristic equation of a matrix useful?
• How many different eigenvalues can an n × n matrix have?
• How large can the dimension of the eigenspace corresponding to an eigen-
value be?

Application: Modeling the Second Law of Thermodynamics

Pour cream into your cup of coffee and the cream spreads out; straighten up your room and it soon
becomes messy again; when gasoline is mixed with air in a car’s cylinders, it explodes if a spark is
introduced. In each of these cases a transition from a low energy state (your room is straightened
up) to a higher energy state (a messy, disorganized room) occurs. This can be described by entropy
– a measure of the energy in a system. Low energy is organized (like ice cubes) and higher energy
is not (like water vapor). It is a fundamental property of energy (as described by the second law of
thermodynamics) that the entropy of a system cannot decrease. In other words, in the absence of
any external intervention, things never become more organized.
The Ehrenfest model1 is a Markov process proposed to explain the statistical interpretation
of the second law of thermodynamics using the diffusion of gas molecules. This process can be
modeled as a problem of balls and bins, as we will do later in this section. The characteristic
¹Named after Paul and Tatiana Ehrenfest, who introduced it in “Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem,” Physikalische Zeitschrift, vol. 8 (1907), pp. 311–314.


polynomial of the transition matrix will help us find the eigenvalues and allow us to analyze our
model.

Introduction

We have seen that the eigenvalues of an n × n matrix A are the scalars λ so that A − λIn has
a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not
invertible, we can also say that λ is an eigenvalue of A if

det(A − λIn ) = 0. (17.1)

This equation is called the characteristic equation of A. It provides us an algebraic way to find
eigenvalues, which can then be used in finding eigenvectors corresponding to each eigenvalue.
Suppose we want to find the eigenvalues of $A = \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix}$. Note that
$$A - \lambda I_2 = \begin{bmatrix} 1-\lambda & 1 \\ 1 & 3-\lambda \end{bmatrix},$$
with determinant $(1-\lambda)(3-\lambda) - 1 = \lambda^2 - 4\lambda + 2$. Hence, the eigenvalues $\lambda_1$, $\lambda_2$ are the solutions
of the characteristic equation $\lambda^2 - 4\lambda + 2 = 0$. Using the quadratic formula, we find that $\lambda_1 = 2 + \sqrt{2}$
and $\lambda_2 = 2 - \sqrt{2}$ are the eigenvalues.
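A quick check of this computation (our own illustration) using NumPy:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
# roots of the characteristic polynomial lambda^2 - 4 lambda + 2 ...
print(np.roots([1, -4, 2]))     # [3.4142..., 0.5857...], i.e., 2 + sqrt(2) and 2 - sqrt(2)
# ... agree with the eigenvalues computed directly
print(np.linalg.eigvals(A))
```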
In this activity, our goal will be to use the characteristic equation to obtain information about
eigenvalues and eigenvectors of a matrix with real entries.
Preview Activity 17.1.

(1) For each of the following parts, use the characteristic equation to determine the eigenvalues of
A. Then, for each eigenvalue λ, find a basis of the corresponding eigenspace, i.e., Nul (A −
λI). You might want to recall how to find a basis for the null space of a matrix from Section
13. Also, make sure that your eigenvalue candidate λ yields nonzero eigenvectors in Nul(A − λI),
for otherwise λ will not be an eigenvalue.

(a) $A = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}$   (b) $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$   (c) $A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$

(2) Use your eigenvalue and eigenvector calculations of the above problem as a guidance to
answer the following questions about a matrix with real entries.
(a) At most how many eigenvalues can a 2 × 2 matrix have? Is it possible to have no
eigenvalues? Is it possible to have only one eigenvalue? Explain.
(b) If a matrix is an upper-triangular matrix (i.e., all entries below the diagonal are 0’s,
as in the first two matrices of the previous problem), what can you say about its
eigenvalues? Explain.
(c) How many linearly independent eigenvectors can be found for a 2 × 2 matrix? Is it
possible to have a matrix without 2 linearly independent eigenvectors? Explain.

(3) Using the characteristic equation, determine which matrices have 0 as an eigenvalue.

The Characteristic Equation

Until now, we have been given eigenvalues or eigenvectors of a matrix and determined eigenvectors
and eigenvalues from the known information. In this section we use determinants to find (or ap-
proximate) the eigenvalues of a matrix. From there we can find (or approximate) the corresponding
eigenvectors. The tool we will use is a polynomial equation, the characteristic equation, of a square
matrix whose roots are the eigenvalues of the matrix. The characteristic equation will then provide
us with an algebraic way of finding the eigenvalues of a square matrix.
We have seen that the eigenvalues of a square matrix A are the scalars λ so that A − λI has
a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not
invertible, we can also say that λ is an eigenvalue of A if

det(A − λI) = 0. (17.2)

Note that if A is an n × n matrix, then det(A − λI) is a polynomial of degree n. Furthermore, if A


has real entries, the polynomial has real coefficients. This polynomial, and the equation (17.2) are
given special names.

Definition 17.1. Let A be an n × n matrix. The characteristic polynomial of A is the polynomial

det(A − λIn ),

where In is the n × n identity matrix. The characteristic equation of A is the equation

det(A − λIn ) = 0.

So the characteristic equation of A gives us an algebraic way of finding the eigenvalues of A.

Activity 17.1.
 
(a) Find the characteristic polynomial of the matrix $A = \begin{bmatrix} 3 & -2 & 5 \\ 1 & 0 & 7 \\ 0 & 0 & 1 \end{bmatrix}$, and use the characteristic
polynomial to find all of the eigenvalues of A.

(b) Verify that 1 and 2 are the only eigenvalues of the matrix $\begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$.

As we argued in Preview Activity 17.1, a 2 × 2 matrix can have at most 2 eigenvalues. For
an n × n matrix, the characteristic polynomial will be a degree n polynomial, and we know from
algebra that a degree n polynomial can have at most n roots. Since an eigenvalue of a matrix is a
root of the characteristic polynomial of that matrix, we can conclude that an n × n matrix can have
at most n distinct eigenvalues. Activity 17.1 (b) shows that a 4 × 4 matrix may have fewer than 4
eigenvalues, however. Note that one of these eigenvalues, the eigenvalue 1, appears three times as
a root of the characteristic polynomial of the matrix. The number of times an eigenvalue appears as
a root of the characteristic polynomial is called the (algebraic) multiplicity of the eigenvalue. More
formally:

Definition 17.2. The (algebraic) multiplicity of an eigenvalue λ of a matrix A is the largest integer
m so that $(x - \lambda)^m$ divides the characteristic polynomial of A.

Thus, in Activity 17.1 (b) the eigenvalue 1 has multiplicity 3 and the eigenvalue 2 has multi-
plicity 1. Notice that if we count the eigenvalues of an n × n matrix with their multiplicities, the
total will always be n.
If A is a matrix with real entries, then the characteristic polynomial will have real coefficients.
It is possible that the characteristic polynomial has complex roots, and that the matrix A has
complex eigenvalues. Since the characteristic polynomial of a real matrix has real coefficients, any
complex eigenvalues will appear in conjugate pairs, i.e., if λ1 = a + ib is
an eigenvalue of A, then λ2 = a − ib is another eigenvalue of A. Furthermore, for an odd degree
polynomial, since the complex eigenvalues come in conjugate pairs, we will be able to find at
least one real eigenvalue.
We now summarize the information we have so far about eigenvalues of an n × n real matrix:

Theorem 17.3. Let A be an n × n matrix with real entries. Then

(1) There are at most n eigenvalues of A. If each eigenvalue (including complex eigenvalues) is
counted with its multiplicity, there are exactly n eigenvalues.

(2) If A has a complex eigenvalue λ, the complex conjugate of λ is also an eigenvalue of A.

(3) If n is odd, A has at least one real eigenvalue.

(4) If A is upper or lower-triangular, the eigenvalues are the entries on the diagonal.

Eigenspaces, A Geometric Example

Recall that for each eigenvalue λ of an n × n matrix A, the eigenspace of A corresponding to


the eigenvalue λ is Nul (A − λIn ). These eigenspaces can tell us important information about the
matrix transformation defined by A. For example, consider the matrix transformation T from R3 to
R3 defined by T (x) = Ax, where
 
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix}.$$

We are interested in understanding what this matrix transformation does to vectors in R3. First we
note that A has eigenvalues λ1 = 1 and λ2 = 2, with λ1 having multiplicity 2. There is a pair
$v_1 = [1 \ 0 \ 0]^T$ and $v_2 = [0 \ 1 \ 0]^T$ of linearly independent eigenvectors for A corresponding to the
eigenvalue λ1 and an eigenvector $v_3 = [1 \ 1 \ 1]^T$ for A corresponding to the eigenvalue λ2. Note
that the vectors v1 , v2 , and v3 are linearly independent (recall from Theorem that eigenvectors
corresponding to different eigenvalues are always linearly independent). So any vector b in R3 can

be written uniquely as a linear combination of v1 , v2 , and v3 . Let’s now consider the action of the
matrix transformation T on a linear combination of v1, v2, and v3. Note that

$$T(c_1v_1 + c_2v_2 + c_3v_3) = c_1T(v_1) + c_2T(v_2) + c_3T(v_3) = c_1\lambda_1v_1 + c_2\lambda_1v_2 + c_3\lambda_2v_3 = (1)(c_1v_1 + c_2v_2) + (2)c_3v_3. \tag{17.3}$$

Equation (17.3) illustrates that it is most convenient to view the action of T in the coordinate system
where Span{v1 } serves as the x-axis, Span{v2 } serves as the y-axis, and Span{v3 } as the z-
axis. In this case, we can visualize that when we apply the transformation T to a vector b =
c1 v1 + c2 v2 + c3 v3 in R3 the result is an output vector that is unchanged in the v1 -v2 plane and
scaled by a factor of 2 in the v3 direction. For example, consider the box whose sides are determined
by the vectors v1 , v2 , and v3 as shown in Figure 17.1. The transformation T stretches this box by
a factor of 2 in the v3 direction and leaves everything else alone, as illustrated in Figure 17.1. So
the entire Span{v1, v2} is unchanged by T, but Span{v3} is scaled by 2. In this situation, the
eigenvalues and eigenvectors provide the most convenient perspective through which to visualize
the action of the transformation T .


Figure 17.1: A box and a transformed box.

This geometric perspective illustrates how each eigenvalue and the corresponding eigenspace of
A tells us something important about A. So it behooves us to learn a little more about eigenspaces.

Dimensions of Eigenspaces

There is a connection between the dimension of the eigenspace of a matrix corresponding to an


eigenvalue and the multiplicity of that eigenvalue as a root of the characteristic polynomial. Recall
that the dimension of a subspace of Rn is the number of vectors in a basis for the eigenspace. We
investigate the connection between dimension and multiplicity in the next activity.

Activity 17.2.

 
(a) Find the dimension of the eigenspace for each eigenvalue of matrix $A = \begin{bmatrix} 3 & -2 & 5 \\ 1 & 0 & 7 \\ 0 & 0 & 1 \end{bmatrix}$
from Activity 17.1 (a).

(b) Find the dimension of the eigenspace for each eigenvalue of matrix $A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
from Activity 17.1 (b).

(c) Consider now a 3 × 3 matrix with 3 distinct eigenvalues λ1 , λ2 , λ3 .

i. Recall that a polynomial of degree three can have at most three distinct roots. What does
that say about the multiplicities of λ1, λ2, λ3?
ii. Use the fact that eigenvectors corresponding to distinct eigenvalues are linearly inde-
pendent to find the dimensions of the eigenspaces for λ1 , λ2 , λ3 .

The examples in Activity 17.2 all provide instances of the principle that the dimension of an
eigenspace corresponding to an eigenvalue λ cannot exceed the multiplicity of λ. Specifically:
Theorem 17.4. If λ is an eigenvalue of A, the dimension of the eigenspace corresponding to λ is
less than or equal to the multiplicity of λ.
 
The examples we have seen raise another important point. The matrix $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix}$ from
our geometric example has two eigenvalues 1 and 2, with the eigenvalue 1 having multiplicity 2.
If we let $E_\lambda$ represent the eigenspace of A corresponding to the eigenvalue λ, then dim(E1) = 2
and dim(E2) = 1. If we change this matrix slightly to the matrix $B = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$ we see that
B has two eigenvalues 1 and 2, with the eigenvalue 1 having multiplicity 2. However, in this case
we have dim(E1) = 1 (like the example from Activities 17.1 (a) and 17.2 (a)). In this case the
vector $v_1 = [1 \ 0 \ 0]^T$ forms a basis for E2 and the vector $v_2 = [0 \ 1 \ 0]^T$ forms a basis for E1. We
can visualize the action of B on the square formed by v1 and v2 in the xy-plane as a scaling by 2
in the v1 direction as shown in Figure 17.2, but since we do not have a third linearly independent
eigenvector, the action of B in the direction of $[0 \ 0 \ 1]^T$ is not so clear.
So the action of a matrix transformation can be more easily visualized if the dimension of each
eigenspace is equal to the multiplicity of the corresponding eigenvalue. This geometric perspective
leads us to define the geometric multiplicity of an eigenvalue.
Definition 17.5. The geometric multiplicity of an eigenvalue of an n×n matrix A is the dimension
of the corresponding eigenspace Nul (A − λIn ).
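As an illustration (our own sketch, not part of the text), the two multiplicities of the matrix B above can be compared numerically: the algebraic multiplicity counts repeated roots of the characteristic polynomial, while the geometric multiplicity is dim Nul(B − λI) = n − rank(B − λI).

```python
import numpy as np

B = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
n = B.shape[0]

eigenvalues, alg_mults = np.unique(np.round(np.linalg.eigvals(B).real, 8), return_counts=True)
for lam, alg in zip(eigenvalues, alg_mults):
    geo = n - np.linalg.matrix_rank(B - lam * np.eye(n))  # dim Nul(B - lam I)
    print(f"lambda = {lam}: algebraic multiplicity {alg}, geometric multiplicity {geo}")
# lambda = 1.0: algebraic multiplicity 2, geometric multiplicity 1
# lambda = 2.0: algebraic multiplicity 1, geometric multiplicity 1
```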

Examples

What follows are worked examples that use the concepts from this section.

0
2
y v2
0 v1 2 3
x

Figure 17.2: A box and a transformed box.

 
Example 17.6. Let $A = \begin{bmatrix} -1 & 0 & -2 \\ 2 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$.

(a) Find the characteristic polynomial of A.

(b) Factor the characteristic polynomial and find the eigenvalues of A.

(c) Find a basis for each eigenspace of A.

(d) Is it possible to find a basis for R3 consisting of eigenvectors of A? Explain.

Example Solution.

(a) The characteristic polynomial of A is

$$p(\lambda) = \det(A - \lambda I_3) = \det\begin{bmatrix} -1-\lambda & 0 & -2 \\ 2 & 1-\lambda & 2 \\ 0 & 0 & 1-\lambda \end{bmatrix} = (-1-\lambda)(1-\lambda)(1-\lambda).$$

(b) The eigenvalues of A are the solutions to the characteristic equation. Since

p(λ) = (−1 − λ)(1 − λ)(1 − λ) = 0

implies λ = −1 or λ = 1, the eigenvalues of A are 1 and −1.

(c) To find a basis for the eigenspace of A corresponding to the eigenvalue 1, we find a basis
for Nul(A − I3). The reduced row echelon form of $A - I_3 = \begin{bmatrix} -2 & 0 & -2 \\ 2 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}$ is
$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. If $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, then $(A - I_3)x = 0$ has general solution
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$
Therefore, $\{[0 \ 1 \ 0]^T, [-1 \ 0 \ 1]^T\}$ is a basis for the eigenspace of A corresponding to the
eigenvalue 1.

To find a basis for the eigenspace of A corresponding to the eigenvalue −1, we find a
basis for Nul(A + I3). The reduced row echelon form of $A + I_3 = \begin{bmatrix} 0 & 0 & -2 \\ 2 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix}$ is
$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$. If $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, then $(A + I_3)x = 0$ has general solution
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_2 \\ 0 \end{bmatrix} = x_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}.$$
Therefore, a basis for the eigenspace of A corresponding to the eigenvalue −1 is $\{[-1 \ 1 \ 0]^T\}$.

(d) Let $v_1 = [0 \ 1 \ 0]^T$, $v_2 = [-1 \ 0 \ 1]^T$, and $v_3 = [-1 \ 1 \ 0]^T$. Since eigenvectors
corresponding to different eigenvalues are linearly independent, and since neither v1 nor
v2 is a scalar multiple of the other, we can conclude that the set {v1, v2, v3} is a linearly
independent set with 3 = dim(R3) vectors. Therefore, {v1, v2, v3} is a basis for R3
consisting of eigenvectors of A.
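These conclusions are easy to spot-check numerically (our own illustration): each basis vector found above should be mapped to the claimed multiple of itself.

```python
import numpy as np

A = np.array([[-1.0, 0.0, -2.0],
              [ 2.0, 1.0,  2.0],
              [ 0.0, 0.0,  1.0]])

# (eigenvalue, eigenvector) pairs from the bases found above
pairs = [(1, [0, 1, 0]), (1, [-1, 0, 1]), (-1, [-1, 1, 0])]
for lam, v in pairs:
    v = np.array(v, dtype=float)
    print(np.allclose(A @ v, lam * v))  # True for every pair
```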

Example 17.7. Find a 3 × 3 matrix A that has an eigenvector v1 = [1 0 1]T with corresponding
eigenvalue λ1 = 2, an eigenvector v2 = [0 2 − 3]T with corresponding eigenvalue λ2 = −3, and
an eigenvector v3 = [−4 0 5]T with corresponding eigenvalue λ3 = 5. Explain your process.
Example Solution. We are looking for a 3 × 3 matrix A such that Av1 = 2v1 , Av2 = −3v2 and
Av3 = 5v3 . Since v1 , v2 , and v3 are eigenvectors corresponding to different eigenvalues, v1 , v2 ,
and v3 are linearly independent. So the matrix [v1 v2 v3] is invertible. It follows that
$$A[v_1 \ v_2 \ v_3] = [Av_1 \ Av_2 \ Av_3]$$
$$A\begin{bmatrix} 1 & 0 & -4 \\ 0 & 2 & 0 \\ 1 & -3 & 5 \end{bmatrix} = [2v_1 \ \ {-3}v_2 \ \ 5v_3]$$
$$A\begin{bmatrix} 1 & 0 & -4 \\ 0 & 2 & 0 \\ 1 & -3 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 0 & -20 \\ 0 & -6 & 0 \\ 2 & 9 & 25 \end{bmatrix}$$
$$A = \begin{bmatrix} 2 & 0 & -20 \\ 0 & -6 & 0 \\ 2 & 9 & 25 \end{bmatrix}\begin{bmatrix} 1 & 0 & -4 \\ 0 & 2 & 0 \\ 1 & -3 & 5 \end{bmatrix}^{-1}$$
$$A = \begin{bmatrix} 2 & 0 & -20 \\ 0 & -6 & 0 \\ 2 & 9 & 25 \end{bmatrix}\begin{bmatrix} \frac{5}{9} & \frac{2}{3} & \frac{4}{9} \\ 0 & \frac{1}{2} & 0 \\ -\frac{1}{9} & \frac{1}{6} & \frac{1}{9} \end{bmatrix}$$
$$A = \begin{bmatrix} \frac{10}{3} & -2 & -\frac{4}{3} \\ 0 & -3 & 0 \\ -\frac{5}{3} & 10 & \frac{11}{3} \end{bmatrix}.$$
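The computation above is an instance of the factorization A = PDP⁻¹, where the columns of P are the eigenvectors and D is the diagonal matrix of the corresponding eigenvalues. A NumPy sketch of the same construction (our own illustration):

```python
import numpy as np

P = np.array([[1.0,  0.0, -4.0],
              [0.0,  2.0,  0.0],
              [1.0, -3.0,  5.0]])   # columns are v1, v2, v3
D = np.diag([2.0, -3.0, 5.0])       # corresponding eigenvalues

A = P @ D @ np.linalg.inv(P)
print(np.round(A, 4))               # matches the matrix found above: [[10/3, -2, -4/3], ...]
```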

Summary

In this section we studied the characteristic polynomial of a matrix and similar matrices.

• If A is an n × n matrix, the characteristic polynomial of A is the polynomial


det(A − λIn ),
where In is the n × n identity matrix.
• If A is an n × n matrix, the characteristic equation of A is the equation
det(A − λIn ) = 0.

• The characteristic equation of a square matrix provides us an algebraic method to find the
eigenvalues of the matrix.
• The eigenvalues of an upper or lower-triangular matrix are the entries on the diagonal.
• There are at most n eigenvalues of an n × n matrix.
• For a real matrix A, if an eigenvalue λ of A is complex, then the complex conjugate of λ is
also an eigenvalue.
• The algebraic multiplicity of an eigenvalue λ is the multiplicity of λ as a root of the charac-
teristic equation.
• The dimension of the eigenspace corresponding to an eigenvalue λ is less than or equal to the
algebraic multiplicity of λ.

Exercises

(1) There is interesting relationship2 between a matrix and its characteristic equation that we
explore in this exercise.
 
(a) We first illustrate with an example. Let $B = \begin{bmatrix} 1 & 2 \\ 1 & -2 \end{bmatrix}$.
i. Show that $\lambda^2 + \lambda - 4$ is the characteristic polynomial for B.
ii. Calculate $B^2$. Then compute $B^2 + B - 4I_2$. What do you get?
(b) The first part of this exercise presents an example of a matrix that satisfies its own
characteristic equation. Explain for a general n × n matrix, why A satisfies its char-
acteristic equation.

(2) There is a useful relationship between the determinant and eigenvalues of a matrix A that we
explore in this exercise.
 
(a) Let $B = \begin{bmatrix} 2 & 3 \\ 8 & 4 \end{bmatrix}$. Find the determinant of B and the eigenvalues of B, and compare
det(B) to the eigenvalues of B.
(b) Let A be an n × n matrix. In this part of the exercise we argue the general case
illustrated in the previous part – that det(A) is the product of the eigenvalues of A.
Let p(λ) = det(A − λIn ) be the characteristic polynomial of A.
i. Let λ1 , λ2 , . . ., λn be the eigenvalues of A (note that these eigenvalues may not
all be distinct). Recall that if r is a root of a polynomial q(x), then (x − r) is a
factor of q(x). Use this idea to explain why

$$p(\lambda) = (-1)^n(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$$

ii. Explain why p(0) = λ1 λ2 · · · λn .


iii. Why is p(0) also equal to det(A)? Explain how we have shown that det(A) is
the product of the eigenvalues of A.

(3) Find the eigenvalues of the following matrices. For each eigenvalue, determine its algebraic
and geometric multiplicity.
 
(a) $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix}$

(4) Let A be an n × n matrix. Use the characteristic equation to explain why A and AT have the
same eigenvalues.
²This result is known as the Cayley-Hamilton Theorem and is one of the fascinating results in linear algebra.

(5) Find three 3 × 3 matrices whose eigenvalues are 2 and 3, and for which the dimensions of the
eigenspaces for λ = 2 and λ = 3 are different.

(6) Suppose A is an n × n matrix and B is an invertible n × n matrix. Explain why the characteristic
polynomial of A is the same as the characteristic polynomial of $BAB^{-1}$, and hence
the eigenvalues of A and $BAB^{-1}$ are the same.

(7) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False If the determinant of a 2 × 2 matrix A is positive, then A has two distinct
real eigenvalues.
(b) True/False If two 2 × 2 matrices have the same eigenvalues, then they have the same
eigenvectors.
(c) True/False The characteristic polynomial of an n × n matrix has degree n.
(d) True/False If R is the reduced row echelon form of an n × n matrix A, then A and
R have the same eigenvalues.
(e) True/False If R is the reduced row echelon form of an n × n matrix A, and v is an
eigenvector of A, then v is an eigenvector of R.
(f) True/False Let A and B be n × n matrices with characteristic polynomials pA (λ)
and pB(λ), respectively. If A ≠ B, then $p_A(\lambda) \ne p_B(\lambda)$.
(g) True/False Every matrix has at least one eigenvalue.
(h) True/False Suppose A is a 3 × 3 matrix with three distinct eigenvalues. Then any
three eigenvectors, one for each eigenvalue, will form a basis of R3 .
(i) True/False If an eigenvalue λ is repeated 3 times among the eigenvalues of a matrix,
then there are at most 3 linearly independent eigenvectors corresponding to λ.

Project: The Ehrenfest Model

To realistically model the diffusion of gas molecules we would need to consider a system with a
large number of balls as substitutes for the gas molecules. However, the main idea can be seen in
a model with a much smaller number of balls, as we will do now. Suppose we have two bins that
contain a total of 4 balls between them. Label the bins as Bin 1 and Bin 2. In this case we can
think of entropy as the number of different possible ways the balls can be arranged in the system.
For example, there is only 1 way for all of the balls to be in Bin 1 (low entropy), but there are 4
ways that we can have one ball in Bin 1 (choose any one of the four different balls, which can be
distinguished from each other) and 3 balls in Bin 2 (higher entropy). The highest entropy state has
the balls equally distributed between the bins (with 6 different ways to do this).
We assume that there is a way for balls to move from one bin to the other (like having gas
molecules pass through a permeable membrane). A way to think about this is that we select a ball
(from ball 1 to ball 4, which are different balls) and move that ball from its current bin to the other
bin. Consider a “move” to be any instance when a ball changes bins. A state is any configuration of

balls in the bins at a given time, and the state changes when a ball is chosen at random and moved
to the other bin. The possible states are to have 0 balls in Bin 1 and 4 balls in Bin 2 (State 0, entropy
1), 1 ball in Bin 1 and 3 in Bin 2 (State 1, entropy 4), 2 balls in each Bin (State 2, entropy 6), 3 balls
in Bin 1 and 1 ball in Bin 2 (State 3, entropy 4), and 4 balls in Bin 1 and 0 balls in Bin 2 (State 4,
entropy 1). These states are shown in Figure 17.3.


Figure 17.3: States

Project Activity 17.1. To model the system of balls in bins we need to understand how the system
can transform from one state to another. It suffices to count the number of balls in Bin 1 (since the
remaining balls will be in Bin 2). Even though the balls are labeled, our count only cares about how
many balls are in each bin. Let $x_0 = [x_0, x_1, x_2, x_3, x_4]^T$, where $x_i$ is the probability that Bin 1
contains i balls, and let $x_1 = \left[x^1_0, x^1_1, x^1_2, x^1_3, x^1_4\right]^T$, where $x^1_i$ is the probability that Bin 1 contains
i balls after the first move. We will call the vectors $x_0$ and $x_1$ probability distributions of balls in
bins. Note that since all four balls have to be placed in some bin, the sum of the entries in our
probability distribution vectors must be 1. Recall that a move is an instance when a ball changes
bins. We want to understand how x1 is obtained from x0 . In other words, we want to figure out
what the probability that Bin 1 contains 0, 1, 2, 3, or 4 balls after one ball changes bins if our initial
probability distribution of balls in bins is x0 .
We begin by analyzing the ways that a state can change. For example,

• Suppose there are 0 balls in Bin 1. (In our probability distribution x0 , this happens with
probability x0 .) Then there are four balls in Bin 2. The only way for a ball to change bins is
if one of the four balls moves from Bin 2 to Bin 1, putting us in State 1. Regardless of which
ball moves, we will always be put in State 1, so this happens with a probability of 1. In other
words, if the probability that Bin 1 contains 0 balls is x0 , then there is a probability of (1)x0
that Bin 1 will contain 1 ball after the move.

• Suppose we have 1 ball in Bin 1. There are four ways this can happen (since there are four
balls, and the one in Bin 1 is selected at random from the four balls), so the probability of a
given ball being in Bin 1 is 1/4.

– If the ball in Bin 1 moves, that move puts us in State 0. In other words, if the probability
that Bin 1 contains 1 ball is $x_1$, then there is a probability of $\frac{1}{4}x_1$ that Bin 1 will contain
0 balls after a move.
– If any of the 3 balls in Bin 2 moves (which happens with probability 3/4), that move puts us
in State 2. In other words, if the probability that Bin 1 contains 1 ball is $x_1$, then there
is a probability of $\frac{3}{4}x_1$ that Bin 1 will contain 2 balls after a move.

(a) Complete this analysis to explain the probabilities if there are 2, 3, or 4 balls in Bin 1.

(b) Explain how the results of part (a) show that

$$\begin{aligned}
x^1_0 &= 0x_0 + \tfrac{1}{4}x_1 + 0x_2 + 0x_3 + 0x_4 \\
x^1_1 &= 1x_0 + 0x_1 + \tfrac{1}{2}x_2 + 0x_3 + 0x_4 \\
x^1_2 &= 0x_0 + \tfrac{3}{4}x_1 + 0x_2 + \tfrac{3}{4}x_3 + 0x_4 \\
x^1_3 &= 0x_0 + 0x_1 + \tfrac{1}{2}x_2 + 0x_3 + 1x_4 \\
x^1_4 &= 0x_0 + 0x_1 + 0x_2 + \tfrac{1}{4}x_3 + 0x_4
\end{aligned}$$

The system we developed in Project Activity 17.1 has matrix form

x1 = T x0 ,

where T is the transition matrix

$$T = \begin{bmatrix} 0 & \frac{1}{4} & 0 & 0 & 0 \\ 1 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & \frac{3}{4} & 0 & \frac{3}{4} & 0 \\ 0 & 0 & \frac{1}{2} & 0 & 1 \\ 0 & 0 & 0 & \frac{1}{4} & 0 \end{bmatrix}.$$

Subsequent moves give probability distribution vectors

$$x_2 = Tx_1, \qquad x_3 = Tx_2, \qquad \ldots, \qquad x_k = Tx_{k-1}.$$

This is an example of a Markov process (see Definition 9.4). There are several ques-
tions we can ask about this model. For example, what is the long-term behavior of this system,
and how does this model relate to entropy? That is, given an initial probability distribution vector
x0 , the system will have probability distribution vectors x1 , x2 , . . . after subsequent moves. What
happens to the vectors xk as k goes to infinity, and what does this tell us about entropy? To answer
these questions, we will first explore the sequence {xk } numerically, and then use the eigenvalues
and eigenvectors of T to analyze the sequence {xk }.
Project Activity 17.2. Use appropriate technology to do the following.

(a) Suppose we begin with a probability distribution vector x0 = [1 0 0 0 0]T . Calculate


vectors xk for enough values of k so that you can identify the long term behavior of the
sequence. Describe this behavior.

(b) Repeat part (a) with

     i. x0 = [0 1/2 1/2 0 0]^T
    ii. x0 = [0 1/3 1/3 0 1/3]^T
   iii. x0 = [1/5 1/5 1/5 1/5 1/5]^T

Describe the long term behavior of the sequence {xk} in each case.
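One way to carry out these calculations is with a short script. The following is a minimal sketch in Python (assuming NumPy is available; the variable names are ours), which iterates x_k = T x_{k-1} starting from the distribution in part (a):

    import numpy as np

    # Transition matrix for the balls-in-bins system; each column sums to 1.
    T = np.array([[0,   1/4, 0,   0,   0],
                  [1,   0,   1/2, 0,   0],
                  [0,   3/4, 0,   3/4, 0],
                  [0,   0,   1/2, 0,   1],
                  [0,   0,   0,   1/4, 0]])

    x = np.array([1.0, 0, 0, 0, 0])      # initial distribution x0
    for k in range(1, 21):
        x = T @ x                         # x_k = T x_{k-1}
        print(k, np.round(x, 4))

Printing twenty or so iterates is usually enough to see whether the sequence settles down or alternates between two states.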

In what follows, we investigate the behavior of the sequence {xk } that we uncovered in Project
Activity 17.2.
Project Activity 17.3. We use the characteristic polynomial to find the eigenvalues of T .
(a) Find the characteristic polynomial of T . Factor the characteristic polynomial into a product
of linear polynomials to show that the eigenvalues of T are 0, 1, −1, 1/2, and −1/2.

(b) As we will see a bit later, certain eigenvectors for T will describe the end behavior of
the sequence {xk }. Find eigenvectors for T corresponding to the eigenvalues 1 and −1.
Explain how the eigenvector for T corresponding to the eigenvalue 1 explains the behavior
of one of the sequences we saw in Project Activity 17.2. (Any eigenvector of T with
eigenvalue 1 is called an equilibrium or steady state vector.)
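If a computer algebra system is available, the characteristic polynomial can be computed and factored exactly. A possible sketch in Python with SymPy (entering the fractions exactly so that the factorization is exact):

    from sympy import Matrix, Rational, symbols, factor

    lam = symbols('lambda')
    T = Matrix([[0, Rational(1, 4), 0, 0, 0],
                [1, 0, Rational(1, 2), 0, 0],
                [0, Rational(3, 4), 0, Rational(3, 4), 0],
                [0, 0, Rational(1, 2), 0, 1],
                [0, 0, 0, Rational(1, 4), 0]])

    p = T.charpoly(lam).as_expr()   # characteristic polynomial of T
    print(factor(p))                # product of linear factors
    print(T.eigenvects())           # eigenvalues with corresponding eigenvectors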

Now we can analyze the behavior of the sequence {xk }.


Project Activity 17.4. To make the notation easier, we will let v1 be an eigenvector of T cor-
responding to the eigenvalue 0, v2 an eigenvector of T corresponding to the eigenvalue 1, v3 an
eigenvector of T corresponding to the eigenvalue −1, v4 an eigenvector of T corresponding to the
eigenvalue 1/2, and v5 an eigenvector of T corresponding to the eigenvalue −1/2.
(a) Explain why {v1 , v2 , v3 , v4 , v5 } is a basis of R5 .

(b) Let x0 be any initial probability distribution vector. Explain why we can write x0 as
x0 = a1 v1 + a2 v2 + a3 v3 + a4 v4 + a5 v5 = ∑_{i=1}^{5} ai vi
for some scalars a1 , a2 , a3 , a4 , and a5 .

We can now use the eigenvalues and eigenvectors of T to write the vectors xk in a convenient
form. Let λ1 = 0, λ2 = 1, λ3 = −1, λ4 = 1/2, and λ5 = −1/2. Notice that

x1 = T x0
   = T (a1 v1 + a2 v2 + a3 v3 + a4 v4 + a5 v5)
   = a1 T v1 + a2 T v2 + a3 T v3 + a4 T v4 + a5 T v5
   = a1 λ1 v1 + a2 λ2 v2 + a3 λ3 v3 + a4 λ4 v4 + a5 λ5 v5
   = ∑_{i=1}^{5} ai λi vi.

Similarly,

x2 = T x1 = T ( ∑_{i=1}^{5} ai λi vi ) = ∑_{i=1}^{5} ai λi T vi = ∑_{i=1}^{5} ai λi^2 vi.

We can continue in this manner to ultimately show that for each positive integer k we have
xk = ∑_{i=1}^{5} ai λi^k vi    (17.4)

when x0 = ∑_{i=1}^{5} ai vi.

Project Activity 17.5. Recall that we are interested in understanding the behavior of the sequence
{xk } as k goes to infinity.

(a) Equation (17.4) shows that we need to know lim_{k→∞} λi^k for each i in order to analyze
lim_{k→∞} xk. Calculate or describe these limits.

(b) Use the result of part (a), Equation (17.4), and Project Activity 17.3 (b) to explain why the
sequence {xk } is either eventually fixed or oscillates between two states. Compare to the
results from Project Activity 17.2. How are these results related to entropy? You may use
the facts that

• v1 = [1 0 − 2 0 1]T is an eigenvector for T corresponding to the eigenvalue 0,


• v2 = [1 4 6 4 1]T is an eigenvector for T corresponding to the eigenvalue 1,
• v3 = [1 − 4 6 − 4 1]T is an eigenvector for T corresponding to the eigenvalue −1,
• v4 = [−1 −2 0 2 1]T is an eigenvector for T corresponding to the eigenvalue 1/2,
• v5 = [−1 2 0 −2 1]T is an eigenvector for T corresponding to the eigenvalue −1/2.
Section 18

Diagonalization

Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.

• What is a diagonal matrix?


• What does it mean to diagonalize a matrix?
• What does it mean for two matrices to be similar?
• What important properties do similar matrices share?
• Under what conditions is a matrix diagonalizable?
• When a matrix A is diagonalizable, what is the structure of a matrix P that
diagonalizes A?
• Why is diagonalization useful?

Application: The Fibonacci Numbers

In 1202 Leonardo of Pisa (better known as Fibonacci) published Liber Abaci (roughly translated
as The Book of Calculation), in which he constructed a mathematical model of the growth of a
rabbit population. The problem Fibonacci considered is that of determining the number of pairs of
rabbits produced in a given time period beginning with an initial pair of rabbits. Fibonacci made the
assumptions that each pair of rabbits more than one month old produces a new pair of rabbits each
month, and that no rabbits die. (We ignore any issues that might arise concerning the gender
of the offspring.) If we let Fn represent the number of rabbits in month n, Fibonacci produced the
model

Fn+2 = Fn+1 + Fn , (18.1)


for n ≥ 0 where F0 = 0 and F1 = 1. The resulting sequence

1, 1, 2, 3, 5, 8, 13, 21, . . .

is a very famous sequence in mathematics and is called the Fibonacci sequence. This sequence is
thought to model many natural phenomena, such as the number of seeds in a sunflower and other
patterns that grow in a spiral form. It is so famous, in fact, that it has a journal devoted entirely to it. As
a note, while Fibonacci's work Liber Abaci introduced this sequence to the western world, it had
been described in earlier Sanskrit texts going back as early as the sixth century.
By definition, the Fibonacci numbers are calculated by recursion. This is a very inefficient way
to determine entries Fn for large n. Later in this section we will derive a fascinating and unexpected
formula for the Fibonacci numbers using the idea of diagonalization.

Introduction

As we have seen when studying Markov processes, each state is dependent on the previous state. If
x0 is the initial state and A is the transition matrix, then the nth state is found by An x0 . In these
situations, and others, it is valuable to be able to quickly and easily calculate powers of a matrix.
We explore a way to do that in this section.

Preview Activity 18.1. Consider a very simplified weather forecast. Let us assume there are two
possible states for the weather: rainy (R) or sunny (S). Let us also assume that the weather patterns
are stable enough that we can reasonably predict the weather tomorrow based on the weather today.
If it is sunny today, then there is a 70% chance that it will be sunny tomorrow, and if it is rainy
today then there is a 60% chance that it will be rainy tomorrow. If x0 = [s r]T is a state vector that
indicates a probability s that it is sunny and probability r that it is rainy on day 0, then
 
x1 = [ 0.70 0.40 ; 0.30 0.60 ] x0

tells us the likelihood of it being sunny or rainy on day 1. Let A = [ 0.70 0.40 ; 0.30 0.60 ].
 
(1) Suppose it is sunny today, that is, x0 = [1 0]T . Calculate x1 = Ax0 and explain how this
matrix-vector product tells us the probability that it will be sunny tomorrow.

(2) Calculate x2 = Ax1 and interpret the meaning of each component of the product.

(3) Explain why x2 = A2 x0 . Then explain in general why xn = An x0 .

(4) The previous result demonstrates that to determine the long-term probability of a sunny or
rainy day, we want to be able to easily calculate powers of the matrix A. Use a computer
algebra system (e.g., Maple, Mathematica, Wolfram|Alpha) to calculate the entries of x10 ,
x20 , and x30 . Based on this data, what do you expect the long term probability of any day
being a sunny one?
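For part (4), a few lines of code suffice. A minimal sketch in Python with NumPy (any computer algebra system would work equally well):

    import numpy as np

    A = np.array([[0.70, 0.40],
                  [0.30, 0.60]])
    x0 = np.array([1.0, 0.0])   # sunny today

    for n in (10, 20, 30):
        xn = np.linalg.matrix_power(A, n) @ x0   # x_n = A^n x_0
        print(n, xn)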

Diagonalization

In Preview Activity 18.1 we saw that if we can compute powers of a matrix we can make predictions about
the long-term behavior of some systems. In general, calculating powers of a matrix can be a very
difficult thing, but there are times when the process is straightforward.
 
Activity 18.1. Let D = [ 2 0 ; 0 3 ].

(a) Show that D^2 = [ 2^2 0 ; 0 3^2 ].

(b) Show that D^3 = [ 2^3 0 ; 0 3^3 ]. (Hint: D^3 = DD^2 .)

(c) Explain in general why D^n = [ 2^n 0 ; 0 3^n ] for any positive integer n.

Activity 18.1 illustrates that calculating powers of square matrices whose only nonzero entries
are along the diagonal is rather simple. In general, if
 
D =
[ d11  0    0   ···  0    0   ]
[ 0    d22  0   ···  0    0   ]
[ .    .    .        .    .   ]
[ 0    0    0   ···  0    dnn ],

then

D^k =
[ d11^k  0      0   ···  0    0     ]
[ 0      d22^k  0   ···  0    0     ]
[ .      .      .        .    .     ]
[ 0      0      0   ···  0    dnn^k ]
for any positive integer k. Recall that a diagonal matrix is a matrix whose only nonzero elements
are along the diagonal (see Definition 8.6). In this section we will see that matrices that are sim-
ilar to diagonal matrices have some very nice properties, and that diagonal matrices are useful in
calculations of powers of matrices.
We can utilize the method of calculating powers of diagonal matrices to also easily calculate
powers of other types of matrices.
Activity 18.2. Let D be any matrix, P an invertible matrix, and let A = P −1 DP .

(a) Show that A2 = P −1 D2 P .

(b) Show that A3 = P −1 D3 P .

(c) Explain in general why An = P −1 Dn P for positive integers n.

As Activity 18.2 illustrates, to calculate the powers of a matrix of the form P −1 DP we only need
to determine the powers of the matrix D. If D is a diagonal matrix, this is especially straightforward.
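As a quick numerical check of this observation, the following sketch (Python with NumPy; the particular D and P are chosen only for illustration) compares A^5 computed directly with P −1 D^5 P :

    import numpy as np

    D = np.diag([2.0, 3.0])
    P = np.array([[1.0, 1.0],
                  [1.0, 2.0]])            # any invertible matrix will do
    Pinv = np.linalg.inv(P)
    A = Pinv @ D @ P                      # A = P^{-1} D P

    lhs = np.linalg.matrix_power(A, 5)    # A^5 computed directly
    rhs = Pinv @ np.diag([2.0**5, 3.0**5]) @ P   # P^{-1} D^5 P
    print(np.allclose(lhs, rhs))          # True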

Similar Matrices

Similar matrices play an important role in certain calculations. For example, Activity 18.2 showed
that if we can write a square matrix A in the form A = P −1 DP for some invertible matrix P and
diagonal matrix D, then finding the powers of A is straightforward. As we will see, the relation
A = P −1 DP will imply that the matrices A and D share many properties.

Definition 18.1. The n × n matrix A is similar to the n × n matrix B if there is an invertible matrix
P such that A = P −1 BP .
   
Activity 18.3. Let A = [ 1 1 ; 2 0 ] and B = [ 2 2 ; 0 −1 ]. Assume that A is similar to B via the
matrix P = [ 2 1 ; 2 2 ].

(a) Calculate det(A) and det(B). What do you notice?

(b) Find the characteristic polynomials of A and B. What do you notice?

(c) What can you say about the eigenvalues of A and B? Explain.
 
(d) Explain why x = [1 1]T is an eigenvector for A with eigenvalue 2. Is x an eigenvector for
B with eigenvalue 2? Why or why not?
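Answers to this activity can be checked exactly with a computer algebra system. A sketch in Python with SymPy, using the matrices above:

    from sympy import Matrix, symbols

    lam = symbols('lambda')
    A = Matrix([[1, 1], [2, 0]])
    B = Matrix([[2, 2], [0, -1]])
    P = Matrix([[2, 1], [2, 2]])

    print(A.det(), B.det())             # the determinants
    print(A.charpoly(lam).as_expr())    # characteristic polynomial of A
    print(B.charpoly(lam).as_expr())    # characteristic polynomial of B
    print(P.inv() * A * P == B)         # True: the similarity via P

    x = Matrix([1, 1])
    print(A * x, B * x)                 # is x an eigenvector of A? of B?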

Activity 18.3 suggests that similar matrices share some, but not all, properties. Note that if
A = P −1 BP , then B = Q−1 AQ with Q = P −1 . So if A is similar to B, then B is similar to A.
Similarly (no pun intended), since A = I −1 AI (where I is the identity matrix), any square
matrix is similar to itself. Also, if A = P −1 BP and B = M −1 CM , then A = (M P )−1 C(M P ).
So if A is similar to B and B is similar to C, then A is similar to C. If you have studied relations,
these three properties show that similarity is an equivalence relation on the set of all n × n matrices.
This is one reason why similar matrices share many important traits, as the next activity highlights.

Activity 18.4. Let A and B be similar matrices with A = P −1 BP .

(a) Use the multiplicative property of the determinant to explain why det(A) = det(B). So
similar matrices have the same determinants.

(b) Use the fact that P −1 IP = I to show that A − λI is similar to B − λI.

(c) Explain why it follows from (a) and (b) that

det(A − λI) = det(B − λI).

So similar matrices have the same characteristic polynomial, and the same eigenvalues.

We summarize some properties of similar matrices in the following theorem.

Theorem 18.2. Let A and B be similar n × n matrices and I the n × n identity matrix. Then

(1) det(A) = det(B),



(2) A − λI is similar to B − λI,

(3) A and B have the same characteristic polynomial,

(4) A and B have the same eigenvalues.

Similarity and Matrix Transformations

When a matrix is similar to a diagonal matrix, we can gain insight into the action of the correspond-
ing matrix transformation. As an example, consider the matrix transformation T from R2 to R2
defined by T x = Ax, where

A = [ 3 1 ; 1 3 ].    (18.2)
We are interested in understanding what this matrix transformation does to vectors in R2 . First we
note that A has eigenvalues λ1 = 2 and λ2 = 4 with corresponding eigenvectors v1 = [−1 1]T and
v2 = [1 1]T . If we let P = [v1 v2 ], then you can check that

P −1 AP = D

and
A = P DP −1 ,
where D = [ 2 0 ; 0 4 ].
Thus,
T (x) = P DP −1 x.

A simple calculation shows that


 
P −1 = (1/2) [ −1 1 ; 1 1 ].
   
Let us apply T to the unit square whose sides are formed by the vectors e1 = [1 0]T and e2 = [0 1]T ,
as shown in the first picture in Figure 18.1.
To apply T we first multiply e1 and e2 by P −1 . This gives us

P −1 e1 = (1/2) v1 and P −1 e2 = (1/2) v2 .
So P −1 transforms the standard coordinate system into a coordinate system in which v1 and v2
determine the axes, as illustrated in the second picture in Figure 18.1. Applying D to the output
scales by 2 in the v1 direction and 4 in the v2 direction as depicted in the third picture in Figure
18.1. Finally, we apply P to translate back into the standard xy coordinate system as shown in the
last picture in Figure 18.1.

This example illustrates that it is most convenient to view the action of T in the coordinate
system where v1 serves as the x-direction and v2 as the y-direction. In this case, we can visualize
that when we apply the transformation T to a vector in this system it is just scaled in both directions
by the matrix D. Then the matrix P translates everything back to the standard xy coordinate system.

Figure 18.1: The matrix transformation.

This geometric perspective provides another example of how having a matrix similar to a di-
agonal matrix informs us about the situation. In what follows we determine the conditions that
determine when a matrix is similar to a diagonal matrix.

Diagonalization in General

In Preview Activity 18.1 and in the matrix transformation example we found that a matrix A was
similar to a diagonal matrix via an invertible matrix whose columns were eigenvectors of A. This will work for a general
n × n matrix A as long as we can find an invertible matrix P whose columns are eigenvectors of A.
More specifically, suppose A is an n × n matrix with n linearly independent eigenvectors v1 , v2 ,
. . ., vn with corresponding eigenvalues λ1 , λ2 , . . ., λn (not necessarily distinct). Let

P = [v1 v2 v3 · · · vn ].

Then

AP = [Av1 Av2 Av3 · · · Avn ]
   = [λ1 v1 λ2 v2 λ3 v3 · · · λn vn ]
   = [v1 v2 v3 · · · vn ] D
   = P D,

where

D =
[ λ1  0   0   ···  0     0  ]
[ 0   λ2  0   ···  0     0  ]
[ .   .   .        .     .  ]
[ 0   0   0   ···  λn−1  0  ]
[ 0   0   0   ···  0     λn ].
Since the columns of P are linearly independent, we know P is invertible, and so

P −1 AP = D.

Definition 18.3. An n × n matrix A is diagonalizable if there is an invertible n × n matrix P so


that P −1 AP is a diagonal matrix.

In other words, a matrix A is diagonalizable if A is similar to a diagonal matrix.


IMPORTANT NOTE: The key notion to the process described above is that in order to diago-
nalize an n × n matrix A, we have to find n linearly independent eigenvectors for A. When A is
diagonalizable, a matrix P so that P −1 AP is diagonal is said to diagonalize A.

Activity 18.5. Find an invertible matrix P that diagonalizes A.


 
(a) A = [ 1 1 ; 0 2 ]

(b) A = [ 3 2 4 ; 2 0 2 ; 4 2 3 ]. (Hint: The eigenvalues of A are 8 and −1.)
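One way to check the matrix P you find is with a computer algebra system's built-in diagonalization. A sketch in Python with SymPy, shown for the matrix in part (b) (part (a) works the same way):

    from sympy import Matrix

    A = Matrix([[3, 2, 4],
                [2, 0, 2],
                [4, 2, 3]])

    P, D = A.diagonalize()        # raises an error if A is not diagonalizable
    print(P)                      # columns are linearly independent eigenvectors of A
    print(D)                      # diagonal matrix of the corresponding eigenvalues
    print(P.inv() * A * P == D)   # True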

It should be noted that there are square matrices that are not diagonalizable. For example,
the matrix A = [ 1 1 ; 0 1 ] has 1 as its only eigenvalue and the dimension of the eigenspace of
A corresponding to the eigenvalue is one. Therefore, it will be impossible to find two linearly
independent eigenvectors for A.
We showed previously that eigenvectors corresponding to distinct eigenvalues are always linearly
independent, so if an n × n matrix A has n distinct eigenvalues then A is diagonalizable. Activity
18.5 (b) shows that it is possible to diagonalize an n × n matrix even if the matrix does not have

n distinct eigenvalues. In general, we can diagonalize a matrix as long as the dimension of each
eigenspace is equal to the multiplicity of the corresponding eigenvalue. In other words, a matrix
is diagonalizable if the geometric multiplicity is the same as the algebraic multiplicity for each
eigenvalue.
At this point we might ask one final question. We argued that if an n×n matrix A has n linearly
independent eigenvectors, then A is diagonalizable. It is reasonable to wonder if the converse is true
– that is, if A is diagonalizable, must A have n linearly independent eigenvectors? The answer is
yes, and you are asked to show this in Exercise 6. We summarize the result in the following theorem.

Theorem 18.4 (The Diagonalization Theorem). An n × n matrix A is diagonalizable if and only


if A has n linearly independent eigenvectors. If A is diagonalizable and has linearly independent
eigenvectors v1 , v2 , . . ., vn with Avi = λi vi for each i, then the n × n matrix P = [v1 v2 · · · vn ] whose
columns are these linearly independent eigenvectors of A satisfies P −1 AP = D, where D = [dij ] is the
diagonal matrix with diagonal entries dii = λi for each i.

Examples

What follows are worked examples that use the concepts from this section.
   
Example 18.5. Let A = [ 1 −2 1 ; 0 3 −1 ; 0 −2 2 ] and B = [ 1 2 0 ; 0 1 0 ; 0 0 4 ]. You should use appropriate
technology to calculate determinants, perform any row reductions, or solve any polynomial equa-
tions.

(a) Determine if A is diagonalizable. If diagonalizable, find a matrix P that diagonalizes A.

(b) Determine if B is diagonalizable. If diagonalizable, find a matrix Q that diagonalizes B.

(c) Is it possible for two matrices R and S to have the same eigenvalues with the same algebraic
multiplicities, but one matrix is diagonalizable and the other is not? Explain.

Example Solution.

(a) Technology shows that the characteristic polynomial of A is

p(λ) = det(A − λI3 ) = (4 − λ)(1 − λ)2 .

The eigenvalues of A are the solutions to the characteristic equation p(λ) = 0. Thus, the
eigenvalues of A are 1 and 4.
To find a basis for the eigenspace of A corresponding to the eigenvalue 1, we find the
general solution to the homogeneous system (A − I3 )x = 0. Using technology we see
that the reduced row echelon form of A − I3 = [ 0 −2 1 ; 0 2 −1 ; 0 −2 1 ] is [ 0 1 −1/2 ; 0 0 0 ; 0 0 0 ]. So

 
if x = [x1 x2 x3 ]T , then the general solution to (A − I3 )x = 0 is

x = [ x1  (1/2)x3  x3 ]T = x1 [1 0 0]T + x3 [0 1/2 1]T .

So a basis for the eigenspace of A corresponding to the eigenvalue 1 is


{ [1 0 0]T , [0 1 2]T } .

To find a basis for the eigenspace of A corresponding to the eigenvalue 4, we find the
general solution to the homogeneous system (A −4I3 )x = 0. Using  technology
 we see

−3 −2 1 0 1 −1
that the reduced row echelon form of A − 4I3 =  0 −1 −1  is  0 1 1 .
0 −2 −2 0 0 0
 
So if x = [x1 x2 x3 ]T , then the general solution to (A − 4I3 )x = 0 is

x = [x1 x2 x3 ]T = [x3  −x3  x3 ]T = x3 [1 −1 1]T .

So a basis for the eigenspace of A corresponding to the eigenvalue 4 is


{ [1 −1 1]T } .

Eigenvectors corresponding to different eigenvalues are linearly independent, so the set


{ [1 0 0]T , [0 1 2]T , [1 −1 1]T }

is a basis for R3 . Since we can find a basis for R3 consisting of eigenvectors of A, we


conclude that A is diagonalizable. Letting

P = [ 1 0 1 ; 0 1 −1 ; 0 2 1 ]

gives us

P −1 AP = [ 1 0 0 ; 0 1 0 ; 0 0 4 ].

(b) Technology shows that the characteristic polynomial of B is

p(λ) = det(B − λI3 ) = (4 − λ)(1 − λ)2 .

The eigenvalues of B are the solutions to the characteristic equation p(λ) = 0. Thus, the
eigenvalues of B are 1 and 4.
To find a basis for the eigenspace of B corresponding to the eigenvalue 1, we find the
general solution to the homogeneous system (B −I3 )x = 0. Using  technology we see
0 2 0 0 1 0
that the reduced row echelon form of B − I3 =  0 0 0  is  0 0 1 . So if
0 0 3 0 0 0
 
x = [x1 x2 x3 ]T , then the general solution to (B − I3 )x = 0 is

x = [x1 x2 x3 ]T = [x1 0 0]T = x1 [1 0 0]T .

So a basis for the eigenspace of B corresponding to the eigenvalue 1 is


{ [1 0 0]T } .

To find a basis for the eigenspace of B corresponding to the eigenvalue 4, we find the
general solution to the homogeneous system (B − 4I3 )x = 0. Using
 technologywe see
−3 2 0 1 0 0
that the reduced row echelon form of B − 4I3 =  0 −3 0  is  0 1 0 . So if
0 0 0 0 0 0
 
x = [x1 x2 x3 ]T , then the general solution to (B − 4I3 )x = 0 is

x = [x1 x2 x3 ]T = [0 0 x3 ]T = x3 [0 0 1]T .

So a basis for the eigenspace of B corresponding to the eigenvalue 4 is


{ [0 0 1]T } .

Since each eigenspace is one-dimensional, we cannot find a basis for R3 consisting of


eigenvectors of B. We conclude that B is not diagonalizable.

(c) Yes it is possible for two matrices R and S to have the same eigenvalues with the same
multiplicities, but one matrix is diagonalizable and the other is not. An example is given
by the matrices A and B in this problem.
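The conclusions in this example are easy to confirm with technology. A sketch in Python with SymPy, using the matrices A and B above:

    from sympy import Matrix

    A = Matrix([[1, -2, 1],
                [0, 3, -1],
                [0, -2, 2]])
    B = Matrix([[1, 2, 0],
                [0, 1, 0],
                [0, 0, 4]])

    print(A.is_diagonalizable())   # True: the eigenspaces have total dimension 3
    print(B.is_diagonalizable())   # False: each eigenspace is one-dimensional

    P, D = A.diagonalize()         # P^{-1} A P = D
    print(P, D)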

Example 18.6.

(a) Is it possible to find diagonalizable matrices A and B such that AB is not diagonalizable?
If yes, provide an example. If no, explain why.

(b) Is it possible to find diagonalizable matrices A and B such that A+B is not diagonalizable?
If yes, provide an example. If no, explain why.

(c) Is it possible to find a diagonalizable matrix A such that AT is not diagonalizable? If yes,
provide an example. If no, explain why.

(d) Is it possible to find an invertible diagonalizable matrix A such that A−1 is not diagonaliz-
able? If yes, provide an example. If no, explain why.

Example Solution.
   
(a) Let A = [ 1 1 ; 0 2 ] and B = [ 2 −2 ; 0 1 ]. Since A and B are both triangular matrices,
their eigenvalues are their diagonal entries. With 2 distinct eigenvalues, both A and B are
diagonalizable. In this case we have AB = [ 2 −1 ; 0 2 ], whose only eigenvalue is 2. The
reduced row echelon form of AB − 2I2 is [ 0 1 ; 0 0 ]. So a basis for the eigenspace of AB
is {[1 0]T }. Since there is no basis for R2 consisting of eigenvectors of AB, we conclude
that AB is not diagonalizable.
   
(b) Let A = [ 1 3 ; 0 2 ] and B = [ 2 0 ; 0 1 ]. Since A and B are both triangular matrices,
their eigenvalues are their diagonal entries. With 2 distinct eigenvalues, both A and B are
diagonalizable. In this case we have A + B = [ 3 3 ; 0 3 ], whose only eigenvalue is 3. The
reduced row echelon form of (A + B) − 3I2 is [ 0 1 ; 0 0 ]. So a basis for the eigenspace of
A + B is {[1 0]T }. Since there is no basis for R2 consisting of eigenvectors of A + B, we
conclude that A + B is not diagonalizable.

(c) It is not possible to find a diagonalizable matrix A such that AT is not diagonalizable. To
see why, suppose that matrix A is diagonalizable. That is, there exists a matrix P such that
P −1 AP = D, where D is a diagonal matrix. Recall that (P −1 )T = (P T )−1 . So

D = DT
  = (P −1 AP )T
  = P T AT (P −1 )T
  = P T AT (P T )−1 .

Letting Q = (P T )−1 , we conclude that

Q−1 AT Q = D.

Therefore, Q diagonalizes AT .

(d) It is not possible to find an invertible diagonalizable matrix A such that A−1 is not diag-
onalizable. To see why, suppose that matrix A is diagonalizable. That is, there exists a
matrix P such that P −1 AP = D, where D is a diagonal matrix. Thus, A = P DP −1 .
Since A is invertible, det(A) ≠ 0. It follows that det(D) ≠ 0. So none of the diagonal
entries of D can be 0. Thus, D is invertible and D−1 is a diagonal matrix. Then
−1
D−1 = P −1 AP = P A−1 P −1

and so P −1 diagonalizes A−1 .

Summary

• A matrix D = [dij ] is a diagonal matrix if dij = 0 whenever i 6= j.

• A matrix A is diagonalizable if there is an invertible matrix P so that P −1 AP is a diagonal


matrix.

• Two matrices A and B are similar if there is an invertible matrix P so that

B = P −1 AP.

• Similar matrices have the same determinants, same characteristic polynomials, and same
eigenvalues. Note that similar matrices do not necessarily have the same eigenvectors corre-
sponding to the same eigenvalues.

• An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

• When an n × n matrix A is diagonalizable, then P = [v1 v2 v3 · · · vn ] is invertible and


P −1 AP is diagonal, where v1 , v2 , . . ., vn are n linearly independent eigenvectors for A.

• One use for diagonalization is that once we have diagonalized a matrix A we can quickly
and easily compute powers of A. Diagonalization can also help us understand the actions of
matrix transformations.

Exercises

(1) Determine if each of the following matrices is diagonalizable or not. For diagonalizable
matrices, clearly identify a matrix P which diagonalizes the matrix, and what the resulting
diagonal matrix is.
 
(a) A = [ 2 −1 ; 1 4 ]

(b) A = [ −1 4 −2 ; −3 4 0 ; −3 1 3 ]

  

(2) The 3 × 3 matrix A has two eigenvalues λ1 = 2 and λ2 = 3. The vectors [1 2 1]T , [1 −1 2]T ,
and [2 4 2]T are eigenvectors for λ1 = 2, while the vectors [1 1 1]T and [2 2 2]T are eigenvectors
for λ2 = 3. Find the matrix A.
for λ2 = 3. Find the matrix A.

(3) Find a 2 × 2 non-diagonal matrix A and two different pairs of P and D matrices for which
A = P DP −1 .

(4) Find a 2 × 2 non-diagonal matrix A and two different P matrices for which A = P DP −1
with the same D.

(5) Suppose a 4 × 4 matrix A has eigenvalues 2, 3 and 5 and the eigenspace for the eigenvalue
3 has dimension 2. Do we have enough information to determine if A is diagonalizable?
Explain.

(6) Let A be a diagonalizable n×n matrix. Show that A has n linearly independent eigenvectors.

(7)
   
(a) Let A = [ 1 1 ; 0 1 ] and B = [ 1 2 ; 0 1 ]. Find the eigenvalues and eigenvectors of A
and B. Conclude that it is possible for two different n × n matrices A and B to have
exactly the same eigenvectors and corresponding eigenvalues.
(b) A natural question to ask is if there are any conditions under which n×n matrices that
have exactly the same eigenvectors and corresponding eigenvalues must be equal.
Determine the answer to this question if A and B are both diagonalizable.

(8)
(a) Show that if D and D0 are n × n diagonal matrices, then DD0 = D0 D.
(b) Show that if A and B are n × n matrices and P is an invertible n × n matrix such that
P −1 AP = D and P −1 BP = D0 with D and D0 diagonal matrices, then AB = BA.

(9) Exercise 2 in Section 17 shows that the determinant of a matrix is the product of its eigen-
values. In this exercise we show that the trace of a diagonalizable matrix is the sum of its
eigenvalues.1 First we define the trace of a matrix.

Definition 18.7. The trace of an n × n matrix A = [aij ] is the sum of the diagonal entries
of A. That is,
trace(A) = a11 + a22 + · · · + ann = ∑_{i=1}^{n} aii .

(a) Show that if R = [rij ] and S = [sij ] are n×n matrices, then trace(RS) = trace(SR).
¹This result is true for any matrix, but the argument is more complicated.

(b) Let A be a diagonalizable n×n matrix, and let p(λ) = det(A−λIn ) be the character-
istic polynomial of A. Let P be an invertible matrix such that P −1 AP = D, where
D is the diagonal matrix whose diagonal entries are λ1 , λ2 , . . ., λn , the eigenvalues
of A (note that these eigenvalues may not all be distinct).
i. Explain why trace(A) = trace(D).
ii. Show that the trace of an n × n diagonalizable matrix is the sum of the eigen-
values of the matrix.

(10) In this exercise we generalize the result of Exercise 12 in Section 8 to arbitrary diagonalizable
matrices.
(a) Show that if

D =
[ λ1  0   0   ···  0  ]
[ 0   λ2  0   ···  0  ]
[ .   .   .        .  ]
[ 0   0   0   ···  λn ],

then

e^D =
[ e^λ1  0     0   ···  0    ]
[ 0     e^λ2  0   ···  0    ]
[ .     .     .        .    ]
[ 0     0     0   ···  e^λn ].

(b) Now suppose that an n × n matrix A is diagonalizable, with P −1 AP equal to a


diagonal matrix D. Show that eA = P eD P −1 .
   
(11) Let A = [ 1 1 ; 0 0 ] and let B = [ 0 −1 ; 0 0 ].
(a) Use the result of Exercise 10 to calculate eA .
(b) Calculate eB . (Hint: Explain why B is not diagonalizable.)
(c) Use the result of Exercise 10 to calculate eA+B .
(d) The real exponential function satisfies some familiar properties. For example, ex ey =
ey ex and ex+y = ex ey for any real numbers x and y. Does the matrix exponential
satisfy the corresponding properties? That is, if X and Y are n × n matrices, must
eX eY = eY eX and eX+Y = eX eY ? Explain.

(12) In Exercise 11 we see that we cannot conclude that eX+Y = eX eY for n × n matrices X and
Y . However, a more limited property is true.
(a) Follow the steps indicated to show that if A is an n × n matrix and s and t are any
scalars, then eAs eAt = eA(s+t) . (Although we will not use it, you may assume that
the series for eA converges for any square matrix A.)
i. Use the definition to show that
eAs eAt = ∑_{k≥0} ∑_{m≥0} (s^k t^m)/(k! m!) A^{k+m} .

ii. Relabel and reorder terms with n = k + m to show that


eAs eAt = ∑_{n≥0} (1/n!) A^n ∑_{m=0}^{n} [ n!/((n − m)! m!) ] s^{n−m} t^m .

iii. Complete the problem using the Binomial Theorem that says
(s + t)^n = ∑_{m=0}^{n} [ n!/((n − m)! m!) ] s^{n−m} t^m .

(b) Use the result of part (a) to show that eA is an invertible matrix for any n × n matrix
A.

(13) There is an interesting connection between the determinant of a matrix exponential and the
trace of the matrix. Let A be a diagonalizable n×n matrix with real entries. Let D = P −1 AP
for some invertible matrix P , where D is the diagonal matrix with entries λ1 , λ2 , . . ., λn the
eigenvalues of A.
(a) Show that eA = P eD P −1 .
(b) Use Exercise 9 to show that

det eA = etrace(A) .


(14) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False If matrix A is diagonalizable, then so is AT .
(b) True/False If matrix A is diagonalizable, then A is invertible.
(c) True/False If an n × n matrix A is diagonalizable, then A has n distinct eigenvalues.
(d) True/False If matrix A is invertible and diagonalizable, then so is A−1 .
(e) True/False If an n × n matrix C is diagonalizable, then there exists a basis of Rn
consisting of the eigenvectors of C.
(f) True/False An n × n matrix with n distinct eigenvalues is diagonalizable.
(g) True/False If A is an n × n diagonalizable matrix, then there is a unique diagonal
matrix D such that P −1 AP = D for some invertible matrix P .
(h) True/False If A is an n × n matrix with eigenvalue λ, then the dimension of the
eigenspace of A corresponding to the eigenvalue λ is n − rank(A − λIn ).
(i) True/False If λ is an eigenvalue of an n × n matrix A, then eλ is an eigenvalue of
eA . (See Exercise 12 in Section 8 for information on the matrix exponential.)

Project: Binet’s Formula for the Fibonacci Numbers

We return to the Fibonacci sequence Fn where Fn+2 = Fn+1 + Fn , for n ≥ 0, F0 = 0, and F1 = 1.


Since Fn+2 is determined by previous values Fn+1 and Fn , the relation Fn+2 = Fn+1 + Fn is

called a recurrence relation. The recurrence relation Fn+2 = Fn+1 + Fn is very time consuming
to use to compute Fn for large values of n. It turns out that there is a fascinating formula that gives
the nth term of the Fibonacci sequence directly, without using the relation Fn+2 = Fn+1 + Fn .
Project Activity 18.1. The recurrence relation Fn+2 = Fn+1 + Fn gives the equations

Fn+1 = Fn + Fn−1    (18.3)
Fn = Fn .    (18.4)

Let xn = [Fn+1 Fn ]T for n ≥ 0. Explain how the equations (18.3) and (18.4) can be described with
the matrix equation

xn = Axn−1 ,    (18.5)

where A = [ 1 1 ; 1 0 ].

The matrix equation (18.5) shows us how to find the vectors xn using powers of the matrix A:
x1 = Ax0
x2 = Ax1 = A(Ax0 ) = A2 x0
x3 = Ax2 = A(A2 x0 ) = A3 x0
.. ..
. .
x n = An x 0 .
So if we can somehow easily find the powers of the matrix A, then we can find a convenient formula
for Fn . As we have seen, we know how to do this if A is diagonalizable.
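Before deriving the closed form, the matrix model (18.5) is easy to check directly. A minimal sketch in Python with SymPy (exact integer arithmetic, so large values of n pose no problem):

    from sympy import Matrix

    A = Matrix([[1, 1],
                [1, 0]])
    x0 = Matrix([1, 0])      # x0 = [F_1, F_0]^T

    for n in range(1, 11):
        xn = A**n * x0       # x_n = A^n x_0 = [F_{n+1}, F_n]^T
        print(n, xn[1])      # prints F_n: 1, 1, 2, 3, 5, 8, ...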
 
Project Activity 18.2. Let A = [ 1 1 ; 1 0 ].
(a) Show that the eigenvalues of A are ϕ = (1 + √5)/2 and ϕ̄ = (1 − √5)/2.

(b) Find bases for each eigenspace of A.

Now that we have the eigenvalues and know corresponding eigenvectors for A, we can return
to the problem of diagonalizing A.
Project Activity 18.3.
(a) Why do we know that A is diagonalizable?

(b) Find a matrix P such that P −1 AP is a diagonal matrix. What is the diagonal matrix?

Now we can find a formula for the nth Fibonacci number.


Project Activity 18.4. Since P −1 AP = D, where D is a diagonal matrix, we also have A =
P DP −1 . Recall that when A = P DP −1 , it follows that An = P Dn P −1 . Use the equation
An = P Dn P −1 to show that
Fn = (ϕ^n − ϕ̄^n)/√5 .    (18.6)
(Hint: We just need to calculate the second component of An x0 .)

Formula (18.6) is called Binet's formula. It is a very surprising formula in that the
expression on the right hand side of (18.6) is an integer for each positive integer n. Note that with
Binet’s formula we can quickly compute Fn for very large values of n. For example,

F150 = 9969216677189303386214405760200.
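A couple of lines of code reproduce this value and confirm that Binet's formula agrees with the recursive definition. The sketch below uses SymPy so that √5 is handled exactly; with ordinary floating-point arithmetic the formula is only accurate to about fifteen digits.

    from sympy import sqrt, simplify

    phi = (1 + sqrt(5)) / 2
    phibar = (1 - sqrt(5)) / 2

    def binet(n):
        # Binet's formula, evaluated exactly
        return simplify((phi**n - phibar**n) / sqrt(5))

    F = [0, 1]
    for k in range(2, 21):
        F.append(F[-1] + F[-2])                       # the recursion
    print(all(binet(n) == F[n] for n in range(21)))   # True

    print(binet(150))   # 9969216677189303386214405760200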

The number ϕ = (1 + √5)/2, called the golden mean or golden ratio, is intimately related to the
Fibonacci sequence. Binet’s formula provides a fascinating relationship between the Fibonacci
numbers and the golden ratio. The golden ratio also occurs often in other areas of mathematics. It
was an important number to the ancient Greek mathematicians who felt that the most aesthetically
pleasing rectangles had sides in the ratio of ϕ : 1.

Project Activity 18.5. You might wonder what happens if we use negative integer exponents in
Binet’s formula. In other words, are there negatively indexed Fibonacci numbers? For any integer
n, including negative integers, let
Fn = (ϕ^n − ϕ̄^n)/√5 .
There is a specific relationship between F−n and Fn . Find it and verify it.
Section 19

Approximating Eigenvalues and Eigenvectors

Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.

• What is the power method for?


• How does the power method work?
• How can we use the inverse power method to approximate any eigen-
value/eigenvector pair?

Application: Leslie Matrices and Population Modeling

The Leslie Matrix (also called the Leslie Model) is a powerful model for describing the age-distributed
growth of a population that is closed to migration. In a Leslie model, it is usually the case
that only one gender (most often female) is considered. As an example, we will later consider a
population of sheep that is being grown commercially. A natural question that we will address is
how we can harvest the population to build a sustainable environment.
When working with populations, the matrices we use are often large. For large matrices, using
the characteristic polynomial to calculate eigenvalues is too time and resource consuming to be
practical, and we generally cannot find the exact values of the eigenvalues. As a result, approxi-
mation techniques are very important. In this section we will explore a method for approximating
eigenvalues. The eigenvalues of a Leslie matrix are important because they describe the limiting or
steady-state behavior of a population. The matrix and model were introduced by Patrick H. Leslie
in “On the Use of Matrices in Certain Population Mathematics”, Leslie, P.H., Biometrika, Volume
XXXIII, November 1945, pp. 183-212.


Introduction

We have used the characteristic polynomial to find the eigenvalues of a matrix, and for each eigen-
value row reduced a corresponding matrix to find the eigenvectors. This method is only practical for
small matrices – for more realistic applications approximation techniques are used. We investigate
one such technique in this section, the power method.
 
Preview Activity 19.1. Let A = [ 2 6 ; 5 3 ]. Our goal is to find a scalar λ and a nonzero vector v
so that Av = λv.

(1) If we have no prior knowledge of the eigenvalues and eigenvectors of this matrix, we might
just begin with a guess. Let x0 = [1 0]T be such a guess for an eigenvector. Calculate Ax0 .
Is x0 an eigenvector of A? Explain.

(2) If x0 is not a good approximation to an eigenvector of A, then we need to make a better guess.
We have little to work with other than just random guessing, but we can use x1 = Ax0 as
another guess. We calculated x1 in part 1. Is x1 an eigenvector for A? Explain.

(3) In parts (1) and (2) you might have noticed that in some sense x1 is closer to being an eigen-
vector of A than x0 was. So maybe continuing this process will get us closer to an eigenvector
of A. In other words, for each positive integer k we define xk as Axk−1 . Before we proceed,
however, we should note that as we calculate the vectors x1 , x2 , x3 , . . ., the entries in the
vectors get large very quickly. So it will be useful to scale the entries so that they stay at a
reasonable size, which makes it easier to interpret the output. One way to do this is to divide
each vector xi by its largest component in absolute value so that all of the entries stay between
−1 and 1.1 So in our example we have x0 = [1 0]T , x1 = [2/5 1]T , and x2 = [1 25/34]T .
Explain why scaling our vectors will not affect our search for an eigenvector.

(4) Use an appropriate technological tool to find the vectors xk up to k = 10. What do you think
the limiting vector limk→∞ xk is? Is this limiting vector an eigenvector of A? If so, what is
the corresponding eigenvalue?
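A few lines of code make part (4) painless. A possible sketch in Python with NumPy, scaling each iterate by its entry of largest absolute value as described above:

    import numpy as np

    A = np.array([[2.0, 6.0],
                  [5.0, 3.0]])
    x = np.array([1.0, 0.0])       # the initial guess x0

    for k in range(1, 11):
        x = A @ x                  # x_k = A x_{k-1}
        x = x / np.max(np.abs(x))  # scale so the largest entry is 1 in absolute value
        print(k, x)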

The Power Method

While the examples we present in this text are small in order to highlight the concepts, matrices that
appear in real life applications are often enormous. For example, in Google’s PageRank algorithm
that is used to determine relative rankings of the importance of web pages, matrices of staggering
size are used (most entries in the matrices are zero, but the size of the matrices is still huge). Finding
eigenvalues of such large matrices through the characteristic polynomial is impractical. In fact,
finding the roots of all but the smallest degree characteristic polynomials is a very difficult problem.
As a result, using the characteristic polynomial to find eigenvalues and then finding eigenvectors is
not very practical in general, and it is often a better option to use a numeric approximation method.
We will consider one such method in this section, the power method.
¹There are several other ways to scale, but we won't consider them here.


In Preview Activity 19.1, we saw an example of a matrix A = [ 2 6 ; 5 3 ] so that the sequence
{xk }, where xk = Axk−1 , converged to a dominant eigenvector of A for an initial guess vector
x0 = [1 0]T . The vectors xi for i from 1 to 6 (with scaling) are approximately
     
x1 = [0.4 1]T    x2 = [1 0.7353]T    x3 = [0.8898 1]T
x4 = [1 0.9575]T    x5 = [0.9838 1]T    x6 = [1 0.9939]T .

Numerically we can see that the sequence {xk } approaches the vector [1 1]T , and Figure 19.1
illustrates this geometrically as well. This method of successive approximations xk = Axk−1 is


Figure 19.1: The power method.

called the power method (since we could write xk as Ak x0 ). Our task now is to show that this
method works in general. In the next activity we restrict our argument to the 2 × 2 case, and then
discuss the general case afterwards.
Let A be an arbitrary 2 × 2 matrix with two linearly independent eigenvectors v1 and v2 and
corresponding eigenvalues λ1 and λ2 , respectively. We will also assume |λ1 | > |λ2 |. An eigenvalue
whose absolute value is larger than that of any other eigenvalue is called a dominant eigenvalue.
Any eigenvector for a dominant eigenvalue is called a dominant eigenvector. Before we show that
our method can be used to approximate a dominant eigenvector, we recall that since v1 and v2 are
eigenvectors corresponding to distinct eigenvalues, then v1 and v2 are linearly independent. So
there exist scalars a1 and a2 such that

x0 = a1 v1 + a2 v2 .

We have seen that for each positive integer k we can write xk as

xk = a1 λ1^k v1 + a2 λ2^k v2 .    (19.1)

With this representation of x0 we can now see why the power method approximates a dominant
eigenvector of A.

Activity 19.1. Assume as above that A is an arbitrary 2 × 2 matrix with two linearly independent
eigenvectors v1 and v2 and corresponding eigenvalues λ1 and λ2 , respectively. (We are assuming
that we don’t know these eigenvectors, but we can assume that they exist.) Assume that λ1 is the
dominant eigenvalue for A, x0 is some initial guess to an eigenvector for A, that x0 = a1 v1 + a2 v2 ,
and that xk = Axk−1 for k ≥ 1.

(a) We divide both sides of equation (19.1) by λ1^k (since λ1 is the dominant eigenvalue, we
know that λ1 is not 0) to obtain

(1/λ1^k) xk = a1 v1 + a2 (λ2/λ1)^k v2 .    (19.2)

Recall that λ1 is the dominant eigenvalue for A. What happens to (λ2/λ1)^k as k → ∞?
Explain what happens to the right hand side of equation (19.2) as k → ∞.

(b) Explain why the previous result tells us that the vectors xk are approaching a vector in the
direction of v1 or −v1 as k → ∞, assuming a1 ≠ 0. (Why do we need a1 ≠ 0? What
happens if a1 = 0?)

(c) What does all of this tell us about the sequence {xk } as k → ∞?

The power method is straightforward to implement, but it is not without its drawbacks. We
began by assuming that we had a basis of eigenvectors of a matrix A. So we are also assuming that
A is diagonalizable. We also assumed that A had a dominant eigenvalue λ1 . That is, if A is n × n
we assume that A has eigenvalues λ1 , λ2 , . . ., λn , not necessarily distinct, with

|λ1 | > |λ2 | ≥ |λ3 | ≥ · · · ≥ |λn |

and with vi an eigenvector of A with eigenvalue λi . We could then write any initial guess x0 in the
form
x0 = a1 v1 + a2 v2 + · · · + an vn .
The initial guess is also called a seed.
Then
xk = a1 λ1^k v1 + a2 λ2^k v2 + · · · + an λn^k vn

and

(1/λ1^k) xk = a1 v1 + a2 (λ2/λ1)^k v2 + · · · + an (λn/λ1)^k vn .    (19.3)
Notice that we are not actually calculating the vectors xk here – this is a theoretical argument and
we don’t know λ1 and are not performing any scaling like we did in Preview Activity 19.1. We are
assuming that λ1 is the dominant eigenvalue of A, though, so for each i ≥ 2 the terms (λi/λ1)^k converge
to 0 as k goes to infinity. Thus,

xk ≈ λ1^k a1 v1
for large values of k, which makes the sequence {xk } converge to a vector in the direction of a
dominant eigenvector v1 provided a1 ≠ 0. So we need to be careful enough to choose a seed that
has a nonzero component in the direction of v1 . Of course, we generally don’t know that our matrix

is diagonalizable before we make these calculations, but for many matrices the sequence {xk } will
approach a dominant eigenvector.
Once we have an approximation to a dominant eigenvector, we can then approximate the dom-
inant eigenvalue.

Activity 19.2. Let A be an n × n matrix with eigenvalue λ and corresponding eigenvector v.

(a) Explain why λ = λ(v·v)/(v·v) .

(b) Use the result of part (a) to explain why λ = ((Av)·v)/(v·v) .

The result of Activity 19.2 is that, when the vectors in the sequence {xk } approximate a domi-
nant eigenvector of a matrix A, the quotients

((Axk ) · xk)/(xk · xk ) = (xkT Axk)/(xkT xk )    (19.4)

approximate the dominant eigenvalue of A. The quotients in (19.4) are called Rayleigh quotients.
To summarize, the procedure for applying the power method for approximating a dominant
eigenvector and dominant eigenvalue of a matrix A is as follows.

Step 1: Select an arbitrary nonzero vector x0 as an initial guess to a dominant eigenvector.

Step 2: Let x1 = Ax0 . Let k = 1.

Step 3: To avoid having the magnitudes of successive approximations become excessively large,
scale this approximation xk . That is, find the entry αk of xk that is largest in absolute value.
Then replace xk by (1/|αk |) xk . Note that this does not change the direction of this approximation,
only its magnitude.

Step 4: Calculate the Rayleigh quotient rk = ((Axk )·xk)/(xk ·xk ) .

Step 5: Let xk+1 = Axk . Increase k by 1 and repeat Steps 3 through 5.

If the sequence {xk } converges to a dominant eigenvector of A, then the sequence {rk } converges
to the dominant eigenvalue of A.
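The procedure translates almost line for line into code. Here is a minimal sketch in Python with NumPy (the function name and arguments are ours; a practical implementation would add a stopping criterion instead of a fixed number of iterations):

    import numpy as np

    def power_method(A, x0, num_iters=50):
        """Approximate a dominant eigenvector and eigenvalue of A."""
        x = np.array(x0, dtype=float)
        r = None
        for _ in range(num_iters):
            x = A @ x                       # Steps 2 and 5: multiply by A
            x = x / np.max(np.abs(x))       # Step 3: scale
            r = (A @ x) @ x / (x @ x)       # Step 4: Rayleigh quotient
        return x, r

    A = np.array([[2.0, 6.0], [5.0, 3.0]])
    v, lam = power_method(A, [1, 0])
    print(v, lam)    # approximately [1, 1] and 8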
The power method can be useful for approximating a dominant eigenvector as long as the suc-
cessive multiplications by A are fairly simple – for example, if many entries of A are zero.2 The
rate of convergence of the sequence {xk } depends on the ratio λ2/λ1 . If this ratio is close to 1, then
it can take many iterations before the power (λ2/λ1)^k makes the v2 term negligible. There are other
methods for approximating eigenvalues and eigenvectors, e.g., the QR factorization, that we will
not discuss at this point.
²A matrix in which most entries are zero is called a sparse matrix.

The Inverse Power Method

The power method only allows us to approximate the dominant eigenvalue and a dominant eigen-
vector for a matrix A. It is possible to modify this method to approximate other eigenvectors and
eigenvalues under certain conditions. We consider an example in the next activity to motivate the
general situation.
 
Activity 19.3. Let A = [ 2 6 ; 5 3 ] be the matrix from Preview Activity 19.1. Recall that 8 is an
eigenvalue for A, and a quick calculation can show that −3 is the other eigenvalue of A. Consider
the matrix B = (A − (−2)I2 )−1 = (1/10) [ −5 6 ; 5 −4 ].
(a) Show that 1/(8−(−2)) and 1/(−3−(−2)) are the eigenvalues of B.

(b) Recall that v1 = [1 1]T is an eigenvector of A corresponding to the eigenvalue 8 and assume
that v2 = [−6 5]T is an eigenvector for A corresponding to the eigenvalue −3. Calculate
the products Bv1 and Bv2 . How do the products relate to the results of part (a)?

Activity 19.3 provides evidence that we can translate the matrix A having a dominant eigenvalue
to a different matrix B with the same eigenvectors as A and with a dominant eigenvalue of our
choosing. To see why, let A be an n × n matrix with eigenvalues λ1 , λ2 , . . ., λn , and let α be any
real number distinct from the eigenvalues. Let B = (A − αIn )−1 . In our example in Activity 19.3
the numbers
1/(λ1 − α), 1/(λ2 − α), 1/(λ3 − α), . . ., 1/(λn − α)

were the eigenvalues of B, and if vi is an eigenvector for A corresponding to the eigenvalue
λi , then vi is an eigenvector of B corresponding to the eigenvalue 1/(λi − α). To see why, let λ be an
eigenvalue of an n × n matrix A with corresponding eigenvector v. Let α be a scalar that is not an
eigenvalue of A, and let B = (A − αIn )−1 . Now
Av = λv
Av − αv = λv − αv
(A − αIn )v = (λ − α)v
(1/(λ − α)) v = (A − αIn )−1 v.

So 1/(λ − α) is an eigenvalue of B with eigenvector v.
Now suppose that A is an n × n matrix with eigenvalues λ1 , λ2 , . . ., λn , and that we want to
approximate an eigenvector and corresponding eigenvalue λi of A. If we can somehow find a value
of α so that |λi − α| < |λj − α| for all j ≠ i, then |1/(λi − α)| > |1/(λj − α)| for any j ≠ i. Thus, the
matrix B = (A − αIn )−1 has 1/(λi − α) as its dominant eigenvalue and we can use the power method
to approximate an eigenvector and the Rayleigh quotient to approximate the eigenvalue 1/(λi − α), and
hence approximate λi .
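In code, one usually does not form the inverse explicitly; solving (A − αIn )y = x at each step has the same effect as multiplying by B. A sketch in Python with NumPy (the names are ours), applied to the matrix of Activity 19.4 with α = 0:

    import numpy as np

    def inverse_power_method(A, alpha, x0, num_iters=50):
        """Approximate the eigenvalue of A closest to the shift alpha."""
        M = A - alpha * np.eye(A.shape[0])
        x = np.array(x0, dtype=float)
        for _ in range(num_iters):
            x = np.linalg.solve(M, x)        # same effect as multiplying by B
            x = x / np.max(np.abs(x))        # scale as in the power method
        beta = (np.linalg.solve(M, x) @ x) / (x @ x)   # dominant eigenvalue of B
        return x, alpha + 1 / beta           # recover the eigenvalue of A

    A = np.array([[7, 3, 3], [30, 22, -10], [15, -21, 11]]) / 8
    print(inverse_power_method(A, 0.0, [1, 0, 0]))   # an eigenvalue of A near -1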
 
Activity 19.4. Let A = (1/8) [ 7 3 3 ; 30 22 −10 ; 15 −21 11 ].

(a) Apply the power method to the matrix B = (A − I3 )−1 with initial vector x0 = [1 0 0]T to
fill in Table 19.1 (to four decimal places). Use this information to estimate an eigenvalue
for A and a corresponding eigenvector.

k                          10        15        20
xk
(xkT Axk)/(xkT xk )

Table 19.1: Applying the power method to (A − I3 )−1 .

(b) Applying the power method to the matrix B = (A − 0I3 )−1 with initial vector x0 =
[1 0 0]T yields the information in Table 19.2 (to four decimal places). Use this information
to estimate an eigenvalue for A and a corresponding eigenvector.

k                          10                           15                           20
xk                         [0.3344 −0.6677 −1.0000]T    [−0.3333 0.6666 1.0000]T     [0.3333 −0.6666 −1.0000]T
(xkT Axk)/(xkT xk )        −1.0014                      −1.0000                      −1.0000

Table 19.2: Applying the power method to (A − 0I3 )−1 .

(c) Applying the power method to the matrix B = (A − 5I3 )−1 with initial vector x0 =
[1 0 0]T yields the information in Table 19.3 (to four decimal places). Use this information
to estimate an eigenvalue for A and a corresponding eigenvector.

Examples

What follows are worked examples that use the concepts from this section.
 
Example 19.1. Let A = [ 1 2 3 ; 4 5 6 ; 7 8 9 ].

(a) Approximate the dominant eigenvalue of A accurate to two decimal places using the power
method. Use technology as appropriate.

k                          10                          15                          20
xk                         [0.0000 1.0000 −1.0000]T    [0.0000 −1.0000 1.0000]T    [0.0000 1.0000 −1.0000]T
(xkT Axk)/(xkT xk )        −1.0000                     −1.0000                     −1.0000

Table 19.3: Applying the power method to (A − 5I3 )−1 .

(b) Find the characteristic polynomial p(λ) of A. Then find the root of p(λ) farthest from
the origin. Compare to the result of part (a). Use technology as appropriate.

Example Solution.

(a) We use technology to calculate the scaled vectors Ak x0 for values of k until the components
don’t change in the second decimal place. We start with the seed x0 = [1 1 1]T . For
example, to two decimal places we have xk = [0.28 0.64 1.00]T for k ≥ 20. So we suspect
that [0.28 0.64 1.00]T is close to a dominant eigenvector for A.
For the dominant eigenvalue, we can calculate the Rayleigh quotients ((Axk )·xk)/(xk ·xk ) until they
do not change to two decimal places. For k ≥ 4, our Rayleigh quotients are all (to two
decimal places) equal to 16.12. So we expect that the dominant eigenvalue of A is close to
16.12. Notice that
A[0.28 0.64 1.00]T = [4.56 10.32 16.08]T ,
which is not far off from 16.12[0.28 0.64 1.00]T .

(b) The characteristic polynomial of A is

p(λ) = −λ3 + 15λ2 + 18λ = −λ(λ2 − 15λ − 18).

The quadratic formula gives the nonzero roots of p(λ) as


p √
15 ± 152 + 4(18) 15 ± 3 33
= .
2 2
The roots farthest from the origin is approximately 16.12, as was also calculated in part (a).
 
2 1 0
Example 19.2. Let A =  1 3 1 .
0 1 2

(a) Use the power method to approximate the dominant eigenvalue and a corresponding eigen-
vector (using scaling) accurate to two decimal places. Use x0 = [1 1 1]T as the seed.

(b) Determine the exact value of the dominant eigenvalue of A and compare to your result from
part (a).

(c) Approximate the remaining eigenvalues of A using the inverse power method. (Hint: Try
α = 0.5 and α = 1.8.)

Example Solution.

(a) We use technology to calculate the scaled vectors Ak x0 for values of k until the components
don’t change in the second decimal place. For example, to two decimal places we have
xk = [0.50 1.00 0.50]T for k ≥ 4. So we suspect that [1/2 1 1/2]T is a dominant eigenvector
for A.
For the dominant eigenvalue, we can calculate the Rayleigh quotients ((Axk )·xk)/(xk ·xk ) until they
do not change to two decimal places. For k ≥ 2, our Rayleigh quotients are all (to two
decimal places) equal to 4. So we expect that the dominant eigenvalue of A is 4. We could
also use the fact that

A [1/2 1 1/2]T = [2 4 2]T = 4 [1/2 1 1/2]T

to see that [1/2 1 1/2]T is a dominant eigenvector for A with eigenvalue 4.


(b) Technology shows that the characteristic polynomial of A is

p(λ) = −λ3 + 7λ2 − 14λ + 8 = −(λ − 1)(λ − 2)(λ − 4).

We can see from the characteristic polynomial that 4 is the dominant eigenvalue of A.

(c) Applying the power method to B = (A − 0.5I3 )−1 with seed x0 = [1 1 1]T gives xk ≈
[1.00 −1.00 1.00]T for k ≥ 5, with Rayleigh quotients of 2 (to several decimal places). So 2
is the dominant eigenvalue of B. But 1/(λ − 0.5) is also the dominant eigenvalue of B, where λ
is the corresponding eigenvalue of A. So to find λ, we note that 1/(λ − 0.5) = 2 implies that
λ = 1 is an eigenvalue of A.
Now applying the power method to B = (A − 1.8I3 )−1 with seed x0 = [1 1 1]T gives
xk ≈ [1.00 0.00 −1.00]T for large enough k, with Rayleigh quotients of 5 (to several
decimal places). To find the corresponding eigenvalue λ for A, we note that 1/(λ − 1.8) = 5, or
λ = 2 is an eigenvalue of A.
Admittedly, this method is very limited. Finding good choices for α often depends on
having some information about the eigenvalues of A. Choosing α close to an eigenvalue
provides the best chance of obtaining that eigenvalue.

Summary

• The power method is an iterative method that can be used to approximate the dominant eigen-
value of an n × n matrix A that has n linearly independent eigenvectors and a dominant
eigenvalue.

• To use the power method we start with a seed x0 and then calculate the sequence {xk } of
vectors, where xk = Axk−1 . If x0 is chosen well, then the sequence {xk } converges to a
dominant eigenvector of A.

• If A is an n × n matrix with eigenvalues λ1 , λ2 , . . ., λn , to approximate an eigenvector


of A corresponding to the eigenvalue λi , we apply the power method to the matrix B =
(A − αIn )−1 , where α is not an eigenvalue of A and |1/(λi − α)| > |1/(λj − α)| for any j ≠ i.

Exercises
 
(1) Let A = [ 1 2 ; 2 1 ]. Let x0 = [1 0]T .
(a) Find the eigenvalues and corresponding eigenvectors for A.
(b) Use appropriate technology to calculate xk = Ak x0 for k up to 10. Compare to a
dominant eigenvector for A.
(c) Use the eigenvectors from part (b) to approximate the dominant eigenvalue for A.
Compare to the exact value of the dominant eigenvalue of A.
(d) Assume that the other eigenvalue for A is close to 0. Apply the inverse power method
and compare the results to the remaining eigenvalue and eigenvectors for A.
 
(2) Let A = [ 1 2 0 ; −2 1 2 ; 1 3 1 ]. Use the power method to approximate a dominant eigenvector for
A. Use x0 = [1 1 1]T as the seed. Then approximate the dominant eigenvalue of A.
 
(3) Let A = [ 3 −1 ; −1 3 ]. Use the power method starting with x0 = [1 1]T . Explain why the
method fails in this case to approximate a dominant eigenvector, and how you could adjust
the seed to make the process work.
 
(4) Let A = [ 0 1 ; 1 0 ].
(a) Find the eigenvalues and an eigenvector for each eigenvalue.
(b) Apply the power method with an initial starting vector x0 = [0 1]T . What is the
resulting sequence?
(c) Use equation (19.3) to explain the sequence you found in part (b).
 
(5) Let A = [ 2 6 ; 5 3 ]. Fill in the entries in Table 19.4, where xk is the kth approximation to a
dominant eigenvector using the power method, starting with the seed x0 = [1 0]T . Compare
the results of this table to the eigenvalues of A and lim_{k→∞} (xk+1 · xk)/(xk · xk ). What do you notice?
 
(6) Let A = [ 4 −5 ; 2 15 ]. The power method will approximate the dominant eigenvalue λ = 14.
In this exercise we explore what happens if we apply the power method to A−1 .

v                   x0   x1   x2   x3   x4   x5
(vT Av)/(vT v)

v                   x6   x7   x8   x9   x10  x11
(vT Av)/(vT v)

Table 19.4: Values of the Rayleigh quotient.

(a) Apply the power method to A−1 to approximate the dominant eigenvalue of A−1 .
Use [1 1]T as the seed. How is this eigenvalue related to an eigenvalue of A?
(b) Explain in general why applying the power method to the inverse of an invertible
matrix B might give an approximation to an eigenvalue of B of smallest magnitude.
When might this not work?

(7) There are other algebraic methods that do not rely on the determinant of a matrix that can be
used to find eigenvalues of a matrix. We examine one such method in this exercise. Let A be
any n × n matrix, and let v be any vector in Rn .
(a) Explain why the vectors

v, Av, A2 v, . . . , An v

are linearly dependent.
(b) Let c0 , c1 , . . ., cn be scalars, not all 0, so that

c0 v + c1 Av + c2 A2 v + · · · + cn An v = 0.

Explain why there must be a smallest positive integer k so that there are scalars a0 ,
a1 , . . ., ak with ak ≠ 0 such that

a0 v + a1 Av + a2 A2 v + · · · + ak Ak v = 0.

(c) Let
q(t) = a0 + a1 t + a2 t2 + · · · + ak tk .
Then
q(A) = a0 + a1 A + a2 A2 + · · · + ak Ak
and

q(A)v = (a0 + a1 A + a2 A2 + · · · + ak Ak )v
= a0 v + a1 Av + a2 A2 v + · · · + ak Ak v
= 0.

Suppose the polynomial q(t) has a linear factor, say q(t) = (t − λ)Q(t) for some
degree k −1 polynomial Q(t). Explain why, if Q(A)v is non-zero, λ is an eigenvalue
of A with eigenvector Q(A)v.

(d) This method allows us to find certain eigenvalues and eigenvectors, the roots of
the polynomial q(t). Any other eigenvector must lie outside the eigenspaces we
have already found, so repeating the process with a vector v not in any of the
known eigenspaces
  will produce different eigenvalues and eigenvectors. Let A =
2 2 −1
 2 2 2 .
0 0 6
i. Find the polynomial q(t). Use v = [1 1 1]T .
ii. Find all of the roots of q(t).
iii. For each root λ of q(t), find the polynomial Q(t) and use this polynomial to
determine an eigenvector of A. Verify your work.
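For readers who want to experiment with this method numerically, here is one possible sketch in Python with NumPy (a tool chosen here, not prescribed by the text). It finds coefficients of a polynomial q(t) with q(A)v = 0 by looking for the first linear dependence among v, Av, A2 v, . . .; the least-squares solve and the tolerance are implementation choices, not part of the exercise.

    import numpy as np

    def annihilating_poly(A, v, tol=1e-10):
        # Return coefficients a_0, ..., a_k of a polynomial q(t) of smallest
        # degree with q(A)v = 0, found by detecting when the vectors
        # v, Av, A^2 v, ... first become linearly dependent.
        vectors = [np.array(v, dtype=float)]
        for k in range(1, A.shape[0] + 1):
            vectors.append(A @ vectors[-1])
            M = np.column_stack(vectors[:-1])          # v, Av, ..., A^(k-1) v
            c, *_ = np.linalg.lstsq(M, vectors[-1], rcond=None)
            if np.linalg.norm(M @ c - vectors[-1]) < tol:
                # A^k v = c_0 v + ... + c_{k-1} A^(k-1) v, so
                # q(t) = t^k - c_{k-1} t^(k-1) - ... - c_0 annihilates v.
                return np.concatenate([-c, [1.0]])
        raise RuntimeError("no dependence detected")

    A = np.array([[2.0, 2.0, -1.0], [2.0, 2.0, 2.0], [0.0, 0.0, 6.0]])
    a = annihilating_poly(A, [1.0, 1.0, 1.0])
    print(np.roots(a[::-1]))   # roots of q(t): candidate eigenvalues of A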
(8) We don’t need to use the Rayleigh quotients to approximate the dominant eigenvalue of a
matrix A if we instead keep track of the scaling factors. Recall that the scaling in the power
method can be used to make the magnitudes of the successive approximations smaller and
easier to work with. Let A be an n × n matrix and begin with a non-zero seed v0 . We now want to keep track of the scaling factors, so let α0 be the component of v0 with largest absolute value and let x0 = (1/|α0 |)v0 . For k ≥ 1, let vk = Axk−1 , let αk be the component of vk with largest absolute value and let xk = (1/αk )vk .
 
(a) Let A = [0 1; −8 6]. Use x0 = [1 1]T as the seed and calculate αk for k from 1 to 10. Compare to the dominant eigenvalue of A.
(b) Assume that for large k the vectors xk approach a dominant eigenvector with dom-
inant eigenvalue λ. Show now in general that the sequence of scaling factors αk
approaches λ.
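A sketch of the scaled power method described in this exercise, again in Python with NumPy (assumed here for convenience); the loop simply records the scaling factors αk so that they can be compared with the dominant eigenvalue.

    import numpy as np

    def scaled_power_method(A, v0, num_iters=10):
        # Power method that divides each iterate by its component of largest
        # absolute value and records those scaling factors alpha_k.
        alpha0 = v0[np.argmax(np.abs(v0))]
        x = np.array(v0, dtype=float) / abs(alpha0)
        alphas = []
        for _ in range(num_iters):
            v = A @ x
            alpha = v[np.argmax(np.abs(v))]   # signed entry of largest magnitude
            x = v / alpha
            alphas.append(alpha)
        return x, alphas

    A = np.array([[0.0, 1.0], [-8.0, 6.0]])
    x, alphas = scaled_power_method(A, np.array([1.0, 1.0]))
    print(alphas)   # the scaling factors approach the dominant eigenvalue 4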
(9) Let A be an n × n matrix and let α be a scalar that is not an eigenvalue of A. Suppose that x
is an eigenvector of B = (A − αIn )−1 with eigenvalue β. Find an eigenvalue of A in terms
of β and α with corresponding eigenvector x.
(10) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False The largest eigenvalue of a matrix is a dominant eigenvalue.
(b) True/False If an n × n matrix A has n linearly independent eigenvectors and a
dominant eigenvalue, then the sequence {Ak x0 } converges to a dominant eigenvector
of A for any initial vector x0 .
(c) True/False If λ is an eigenvalue of an n × n matrix A and α is not an eigenvalue of
A, then λ − α is an eigenvalue of A − αIn .
(d) True/False Every square matrix has a dominant eigenvalue.

Project: Managing a Sheep Herd

Sheep farming is a significant industry in New Zealand. New Zealand is reported to have the
highest density of sheep in the world. Sheep can begin to reproduce after one year, and give birth
only once per year. Table 19.5 gives Birth and Survival Rates for Female New Zealand Sheep (from
G. Caughley, “Parameters for Seasonally Breeding Populations,” Ecology, 48, (1967), 834-839).
Since sheep hardly ever live past 12 years, we will only consider the population through 12 years.

Age (years) Birth Rate Survival Rate


0-1 0.000 0.845
1-2 0.045 0.975
2-3 0.391 0.965
3-4 0.472 0.950
4-5 0.484 0.926
5-6 0.546 0.895
6-7 0.543 0.850
7-8 0.502 0.786
8-9 0.468 0.691
9-10 0.459 0.561
10-11 0.433 0.370
11-12 0.421 0.000

Table 19.5: New Zealand female sheep data by age group.

As sheep reproduce, they add to the 0-1 sheep (lamb) population. The potential to produce
offspring is called fecundity (derived from the word fecund which generally refers to reproductive
ability) and determines how many lambs are added to the population. Let Fk (the fecundity rate) be
the rate at which females in age class k give birth to female offspring. Not all members of a given
age group survive to the next age groups, so let sk be the fraction of individuals that survives from
age group k to age group k + 1. With these ideas in mind, we can create a life cycle chart as in
Figure 19.2 that illustrates how the population of sheep changes on a farm (for the sake of space,
we illustrate with four age classes).

Figure 19.2: Life cycle with four age classes (fecundities F1 –F4 feed back into class 1; survival rates s1 , s2 , s3 carry individuals from each class to the next).

To model the sheep population, we need a few variables. Let n1^(0) be the number of sheep in age group 0-1, n2^(0) the number in age group 1-2, n3^(0) the number in age group 2-3 and, in general, nk^(0) the number of sheep in age group (k − 1)-k at some initial time (time 0), and let

x0 = [n1^(0) n2^(0) n3^(0) · · · n12^(0)]T .

We wish to determine the populations in the different groups after one year. Let

x1 = [n1^(1) n2^(1) n3^(1) · · · n12^(1)]T ,

where n1^(1) denotes the number of sheep in age group 0-1, n2^(1) the number of sheep in age group 1-2 and, in general, nk^(1) the number of sheep in age group (k − 1)-k after one year.
Project Activity 19.1. Table 19.5 shows that, on average, each female in age group 1-2 produces
0.045 female offspring in a year. Since there are n2 females in age group 1-2, the lamb population
increases by 0.045n2 in a year.

(a) Continue this analysis to explain why

    n1^(1) = 0.045n2 + 0.391n3 + 0.472n4 + 0.484n5 + 0.546n6 + 0.543n7
           + 0.502n8 + 0.468n9 + 0.459n10 + 0.433n11 + 0.421n12 .

(b) Explain why n2^(1) = 0.845n1 .

(c) Now explain why

    x1 = Lx0 ,                                                        (19.5)

where L is the matrix

    [ 0     0.045 0.391 0.472 0.484 0.546 0.543 0.502 0.468 0.459 0.433 0.421 ]
    [ 0.845 0     0     0     0     0     0     0     0     0     0     0     ]
    [ 0     0.975 0     0     0     0     0     0     0     0     0     0     ]
    [ 0     0     0.965 0     0     0     0     0     0     0     0     0     ]
    [ 0     0     0     0.950 0     0     0     0     0     0     0     0     ]
    [ 0     0     0     0     0.926 0     0     0     0     0     0     0     ]
    [ 0     0     0     0     0     0.895 0     0     0     0     0     0     ]   (19.6)
    [ 0     0     0     0     0     0     0.850 0     0     0     0     0     ]
    [ 0     0     0     0     0     0     0     0.786 0     0     0     0     ]
    [ 0     0     0     0     0     0     0     0     0.691 0     0     0     ]
    [ 0     0     0     0     0     0     0     0     0     0.561 0     0     ]
    [ 0     0     0     0     0     0     0     0     0     0     0.370 0     ]

Notice that our matrix L has the form


 
    [ F1  F2  F3  · · ·  Fn−1  Fn ]
    [ s1  0   0   · · ·  0     0  ]
    [ 0   s2  0   · · ·  0     0  ]
    [ 0   0   s3  · · ·  0     0  ]
    [ ..  ..  ..         ..    .. ]
    [ 0   0   0   · · ·  sn−1  0  ] .

Such a matrix is called a Leslie matrix.
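The 12 × 12 Leslie matrix in (19.6) is tedious to enter by hand. The following sketch (in Python with NumPy, one possible choice of "appropriate technology"; the helper name leslie_matrix is not from the text) builds a Leslie matrix from a list of fecundities and survival rates such as those in Table 19.5.

    import numpy as np

    def leslie_matrix(fecundities, survival_rates):
        # First row holds the fecundities F_1, ..., F_n; the subdiagonal holds
        # the survival rates s_1, ..., s_{n-1}; every other entry is 0.
        n = len(fecundities)
        L = np.zeros((n, n))
        L[0, :] = fecundities
        L[np.arange(1, n), np.arange(0, n - 1)] = survival_rates
        return L

    # Rates from Table 19.5; the final survival rate (0.000) is dropped because
    # no sheep survive past the twelfth year.
    F = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546, 0.543, 0.502,
         0.468, 0.459, 0.433, 0.421]
    s = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895, 0.850, 0.786,
         0.691, 0.561, 0.370]

    L = leslie_matrix(F, s)
    x0 = np.full(12, 100.0)
    print(np.rint(L @ x0))   # population vector after one year, rounded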


Leslie matrices have certain useful properties, and one eigenvalue of a Leslie matrix can tell us
a lot about the long-term behavior of the situation being modeled. You can take these properties as
fact unless otherwise directed.

(1) A Leslie matrix L has a unique positive eigenvalue λ1 with a corresponding eigenvector v1
whose entries are all positive.

(2) If λi (i > 1) is any other eigenvalue (real or complex) of L, then |λi | ≤ λ1 . If λ1 is the largest
magnitude eigenvalue of a matrix L, we call λ1 a dominant eigenvalue of L.

(3) If any two successive entries in the first row of L are both positive, then |λi | < λ1 for every
i > 1. In this case we say that λ1 is a strictly dominant eigenvalue of L. In a Leslie model, this
happens when the females in two successive age classes are fertile, which is almost always
the case.

(4) If λ1 is a strictly dominant eigenvalue, then xk is approximately a scalar multiple of v1 for large values of k, regardless of the initial state x0 . In other words, for large k the state vectors are close to eigenvectors for λ1 .

We can use these properties to determine the long-term behavior of the sheep herd.

Project Activity 19.2. Assume that L is defined by (19.6), and let

xm = [n1^(m) n2^(m) n3^(m) · · · n12^(m)]T ,

where n1^(m) denotes the number of sheep in age group 0-1, n2^(m) the number of sheep in age group 1-2 and, in general, nk^(m) the number of sheep in age group (k − 1)-k after m years.

(a) Assume that x0 = [100 100 100 · · · 100]T . Use appropriate technology to calculate x22 ,
x23 , x24 , and x25 . Round to the nearest whole number. What do you notice about the sheep
population? You may use the GeoGebra applet at https://www.geogebra.org/m/
yqss88xq.

(b) We can use the third and fourth properties of Leslie matrices to better understand the long-
term behavior of the sheep population. Since successive entries in the first row of the Leslie
matrix in (19.6) are positive, our Leslie matrix has a strictly dominant eigenvalue λ1 . Given
the dimensions of our Leslie matrix, finding this dominant eigenvalue through algebraic
means is not feasible. Use the power method to approximate the dominant eigenvalue λ1
of the Leslie matrix in (19.6) to five decimal places. Explain your process. Then explain
how this dominant eigenvalue tells us that, unchecked, the sheep population grows at a rate
that is roughly exponential. What is the growth rate of this exponential growth? You may
use the GeoGebra applet at https://www.geogebra.org/m/yqss88xq.

Project Activity 19.2 indicates that, unchecked, the sheep population will grow without bound,
roughly exponentially with ratio equal to the dominant eigenvalue of our Leslie matrix L. Of course,
a sheep farmer cannot provide the physical environment or the resources to support an unlimited
population of sheep. In addition, most sheep farmers cannot support themselves only by shearing
sheep for the wool. Consequently, some harvesting of the sheep population each year for meat and
skin is necessary. A sustainable harvesting policy allows for the regular harvesting of some sheep
while maintaining the population at a stable level. It is necessary for the farmer to find an optimal
harvesting rate to attain this stable population and the following activity leads us through an analysis
of how such a harvesting rate can be determined.

Project Activity 19.3. The Leslie model can be modified to consider harvesting. It is possible to
harvest different age groups at different rates, and to harvest only some age groups and not others.

In the case of sheep, it might make sense to only harvest from the youngest population since lamb
is more desirable than mutton and the lamb population grows the fastest. Assume that this is our
harvesting strategy and that we harvest our sheep from only the youngest age group at the start of
each year. Let h be the fraction of sheep we harvest from the youngest age group each year after
considering growth.

(a) If we begin with an initial population x0 , then the state vector after births and expected
deaths is Lx0 . Now we harvest. Explain why if we harvest a fraction h from the youngest
age group after considering growth, then the state vector after 1 year will be

x1 = Lx0 − HLx0 ,

where H is the 12 × 12 matrix with h in the (1, 1) position and 0 in every other entry:

        [ h  0  0  · · ·  0 ]
        [ 0  0  0  · · ·  0 ]
    H = [ 0  0  0  · · ·  0 ] .
        [ ..             .. ]
        [ 0  0  0  · · ·  0 ]

(b) Our goal is to find a harvesting rate that will lead to a steady state in which the sheep
population remains the same each year. In other words, we want to find a value of h, if one
exists, that satisfies
x = Lx − HLx. (19.7)

Show that (19.7) is equivalent to the matrix equation

x = (I12 − H)Lx. (19.8)

(c) Use appropriate technology to experiment numerically with different values of h to find the
value you think gives the best uniform harvest rate. Explain your reasoning. You may use
the GeoGebra applet at https://www.geogebra.org/m/yqss88xq.

(d) Now we will use some algebra to find an equation that explicitly gives us the harvest rate in
the general setting. This will take a bit of work, but none of it is too difficult. To simplify
our work but yet illustrate the overall idea, let us consider the general 4 × 4 case with
arbitrary Leslie matrix
 
        [ F1  F2  F3  F4 ]
    L = [ s1  0   0   0  ]
        [ 0   s2  0   0  ] .
        [ 0   0   s3  0  ]

 
Recall that we want to find a value of h that satisfies (19.8) with

    H = [ h  0  0  0 ]
        [ 0  0  0  0 ]
        [ 0  0  0  0 ] .
        [ 0  0  0  0 ]

Let x = [x1 x2 x3 x4 ]T .

i. Calculate the matrix product (I4 − H)L. Explain why this product is again a Leslie
matrix and why (I4 − H)L will have a dominant eigenvalue of 1.
ii. Now calculate (I4 − H)Lx and set it equal to x. Write down the resulting system of
4 equations that must be satisfied. Be sure that your first equation is

x1 = (1 − h)F1 x1 + (1 − h)F2 x2 + (1 − h)F3 x3 + (1 − h)F4 x4 . (19.9)

iii. Equation (19.9) as written depends on the entries of the vector x, but we should be
able to arrive at a result that is independent of x. To see how we do this, we assume
the population of the youngest group is never 0, so we can divide both sides of (19.9)
by x1 to obtain
1 = (1 − h)F1 + (1 − h)F2 (x2 /x1 ) + (1 − h)F3 (x3 /x1 ) + (1 − h)F4 (x4 /x1 ).   (19.10)

Now we need to write the fractions x2 /x1 , x3 /x1 , and x4 /x1 so that they do not involve the xi . Use the remaining equations in your system to show that

    x2 /x1 = s1 ,    x3 /x1 = s1 s2 ,    and    x4 /x1 = s1 s2 s3 .

iv. Now conclude that the harvesting value h must satisfy the equation

1 = (1 − h)[F1 + F2 s1 + F3 s1 s2 + F4 s1 s2 s3 ]. (19.11)

The value R = F1 + F2 s1 + F3 s1 s2 + F4 s1 s2 s3 is called the net reproduction rate of the population and turns out to be the average number of daughters born to a female in her expected lifetime.

(e) Extend (19.11) to the 12 age group case of the sheep herd. Calculate the value of R for this
sheep herd and then find the value of h. Compare this h to the value you obtained through
experimentation earlier. Find the fraction of the lambs that should be harvested each year
and explain what the stable population state vector x tells us about the sheep population for
this harvesting policy.
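As a check on the algebra in part (e), the following sketch evaluates the n-class version of equation (19.11) for the data in Table 19.5 (Python with NumPy assumed, as before; the variable names are choices made here).

    import numpy as np

    # Equation (19.11) extended to n age classes reads
    #   1 = (1 - h) * (F_1 + F_2 s_1 + F_3 s_1 s_2 + ... + F_n s_1 s_2 ... s_{n-1}),
    # so R is the sum in parentheses and h = 1 - 1/R.
    F = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546, 0.543, 0.502,
         0.468, 0.459, 0.433, 0.421]
    s = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895, 0.850, 0.786,
         0.691, 0.561, 0.370]

    survival_products = np.concatenate(([1.0], np.cumprod(s)))  # 1, s1, s1*s2, ...
    R = float(np.dot(F, survival_products))   # net reproduction rate
    h = 1 - 1 / R                             # steady-state harvest fraction
    print(R, h)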
Section 20

Complex Eigenvalues

Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.

• What properties do complex eigenvalues of a real matrix satisfy?


• What properties do complex eigenvectors of a real matrix satisfy?
• What is a rotation-scaling matrix?
• How do we find a rotation-scaling matrix within a matrix with complex
eigenvalues?

Application: The Gershgorin Disk Theorem

We have now seen different methods for calculating/approximating eigenvalues of a matrix. The
algebraic method using the characteristic polynomial can provide exact values, but only in cases
where the size of the matrix is small. Methods like the power method allow us to approximate
eigenvalues in many, but not all, cases. These approximation techniques can be made more efficient
if we have some idea of where the eigenvalues are. The Gershgorin Disk Theorem is a useful tool that can quickly provide bounds on the location of eigenvalues using elementary calculations. For example, using the Gershgorin Disk Theorem we can quickly tell that the real parts of the eigenvalues of the matrix

    [ 3  1       −1  ]
    [ 0  −1 + i  i   ]
    [ 2  1       −2i ]

lie between −4 and 5 and the imaginary parts lie between −5 and 2. Even more, we can say that
the eigenvalues lie in the disks (called Gershgorin disks) shown in Figure 20.1. We will learn more
details about the Gershgorin Disk Theorem at the end of this section.


Figure 20.1: Gershgorin disks (in the complex plane, centered at 3, −1 + i, and −2i).

Introduction

So far we have worked with real matrices whose eigenvalues are all real. However, the characteristic
polynomial of a matrix with real entries can have complex roots. In this section we investigate the
properties of these complex roots and their corresponding eigenvectors, how these complex eigen-
vectors are found, and the geometric interpretation of the transformations defined by matrices with
complex eigenvalues. Although we can consider matrices that have complex numbers as entries,
we will restrict ourselves to matrices with real entries.
 
Preview Activity 20.1. Let A = [2 4; −2 2].

(1) Find the characteristic polynomial of A.


(2) Find the eigenvalues of A. You should get two complex numbers. How are these complex
numbers related?
(3) Find an eigenvector corresponding to each eigenvalue of A. You should obtain vectors with
complex entries.

Complex Eigenvalues

As you noticed in Preview Activity 20.1, the complex roots of the characteristic equation of a real
matrix A come in complex conjugate pairs. This should come as no surprise since we know through
our use of the quadratic formula that complex roots of (real) quadratic polynomials come in complex
conjugate pairs. More generally, if p(x) = a0 + a1 x + a2 x2 + · · · + an xn is a polynomial with real
coefficients and z is a root of this polynomial, meaning p(z) = 0, then taking the complex conjugate of both sides (using the fact that the coefficients ai are real) gives

0 = a0 + a1 z̄ + a2 z̄^2 + · · · + an z̄^n = p(z̄).

Therefore, z̄ is also a root of p(x).

 
Activity 20.1. Let A = [0 −1; 1 0].

(a) The linear transformation T : R2 → R2 defined by T (x) = Ax is a rotation transformation.


What is the angle of rotation?

(b) Find the eigenvalues of A. For each eigenvalue, find an eigenvector.

In Preview Activity 20.1 and in Activity 20.1, you found that if v is an eigenvector of A corresponding to λ, then the vector v̄ obtained by taking the complex conjugate of each entry in v is an eigenvector of A corresponding to λ̄. Specifically, if v = u + iw, where both u and w are real vectors, is an eigenvector of A, then so is v̄ = u − iw. We can justify this property using matrix algebra as follows:

Av̄ = Āv̄ = \overline{Av} = \overline{λv} = λ̄v̄ .

In the first equality, we used the fact that A is a real matrix, so Ā = A. In all the other equalities, we used the properties of the conjugation operation in complex numbers.

Rotation and Scaling Matrices

Recall that a rotation matrix is of the form

    Rθ = [ cos(θ)  − sin(θ) ]
         [ sin(θ)    cos(θ) ]

where the rotation is counterclockwise about the origin by an angle of θ radians. In Activity 20.1,
we considered the rotation matrix with angle π/2 in counterclockwise direction. We will soon see
that rotation matrices play an important role in the geometry of a matrix transformation for a matrix
that has complex eigenvalues. In this activity, we will restrict ourselves to the 2 × 2 case, but similar
arguments can be made in higher dimensions.
 
Activity 20.2. Let A = [1 1; −1 1].

(a) Explain why A is not a rotation matrix.

(b) Although A is not a rotation matrix, there is a rotation matrix B inside A. To find the matrix B, factor out √2 from all entries of A. In other words, write A as a product of two matrices in the form

    A = [ √2  0  ] B.
        [ 0   √2 ]

(c) The B matrix is a rotation matrix with an appropriate θ. Find this θ.

(d) If we think about the product of two matrices as applying one transformation after another,
describe the effect of the matrix transformation defined by A geometrically.

 
More generally, if we have a matrix A of the form A = [a −b; b a], then

    A = [ √(a²+b²)      0     ] [ a/√(a²+b²)   −b/√(a²+b²) ]
        [    0       √(a²+b²) ] [ b/√(a²+b²)    a/√(a²+b²) ] .

The first matrix in the decomposition is a scaling matrix with a scaling factor of s = √(a² + b²). So if s > 1, the transformation stretches vectors, and if s < 1, the transformation shrinks vectors. The second matrix in the decomposition is a rotation matrix with angle θ such that cos(θ) = a/√(a² + b²) and sin(θ) = b/√(a² + b²). This angle is also the angle between the positive x-axis and the vector v = [a b]T . We will refer to matrices of the form [a −b; b a] as rotation-scaling matrices.
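A short sketch of this decomposition (in Python with NumPy, an assumption made here): given a and b, it returns the scaling factor and the rotation angle, and rebuilds [a −b; b a] as the product of the scaling and rotation matrices. Using arctan2 automatically produces a negative angle when b < 0, matching the convention above.

    import numpy as np

    def rotation_scaling(a, b):
        # Decompose [[a, -b], [b, a]] into a scaling by s = sqrt(a^2 + b^2)
        # and a rotation through theta.
        s = np.hypot(a, b)
        theta = np.arctan2(b, a)          # cos(theta) = a/s, sin(theta) = b/s
        S = np.array([[s, 0.0], [0.0, s]])
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        return s, theta, S, R

    s, theta, S, R = rotation_scaling(1.0, 1.0)       # the matrix [[1, -1], [1, 1]]
    print(s, np.degrees(theta))                       # sqrt(2) and 45 degrees
    print(np.allclose(S @ R, [[1.0, -1.0], [1.0, 1.0]]))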

Matrices with Complex Eigenvalues

Now we will investigate how a general 2 × 2 matrix with complex eigenvalues can be seen to be
similar (both in a linear algebra and a colloquial meaning) to a rotation-scaling matrix.
 
Activity 20.3. Let B = [1 −5; 2 3]. The eigenvalues of B are 2 ± 3i. An eigenvector for the eigenvalue 2 − 3i is v = [−5, 1 − 3i]T . We will use this eigenvector to show that B is similar to a rotation-scaling matrix.

(a) Any complex vector v can be written as v = u + iw where both u and w are real vectors.
What are these real vectors u and w for the eigenvector v above?

(b) Let P = [u w] be the matrix whose first column is the real part of v and whose second
column is the imaginary part of v (without the i). Find R = P −1 BP .

(c) Express R as a product of a rotation and a scaling matrix. What is the factor of scaling?
What is the rotation angle?

In Activity 20.3, we saw that the matrix B with complex eigenvalues 2 ± 3i is similar to a rotation-scaling matrix. Specifically R = P −1 BP , where the columns of P are the real and imaginary parts of an eigenvector of B, is the rotation-scaling matrix with a factor of scaling by √(2² + 3²) and a rotation by angle θ = arccos(2/√(2² + 3²)).
Does a similar decomposition result hold for a general 2 × 2 matrix with complex eigenvalues?
We investigate this question in the next activity.

Activity 20.4. Let A be a 2 × 2 matrix with complex eigenvalue λ = a − bi, b ≠ 0, and corresponding complex eigenvector v = u + iw.

(a) Explain why Av = Au + iAw.

(b) Explain why λv = (au + bw) + i(aw − bu).



(c) Use the previous two results to explain why


• Au = au + bw and
• Aw = aw − bu.
 
(d) Let P = [u w]. We will now show that AP = P R where R = [a −b; b a].

i. Without any calculation, explain why


AP = [Au Aw].

ii. Recall that if M is an m × n matrix and x is an n × 1 vector, then the matrix product
M x is a linear combination of the columns of M with weights the corresponding
entries of the vector x. Use this idea to show that
P R = [(au + bw)  (−bu + aw)].

iii. Now explain why AP = P R.


iv. Assume for the moment that P is an invertible matrix. Show that A = P RP −1 .

Your work in Activity 20.4 shows that any 2 × 2 real matrix with complex eigenvalue a − bi, b ≠ 0, is similar to a rotation-scaling matrix with a factor of scaling by √(a² + b²) and a rotation by angle θ = arccos(a/√(a² + b²)) if b ≥ 0, and θ = − arccos(a/√(a² + b²)) if b < 0. Geometrically, this means that every 2 × 2 real matrix with complex eigenvalues is just a scaled rotation (R) with respect to the basis B formed by u and w from the complex eigenvector v. Multiplying by P −1 and P simply provides the change of basis from the standard basis to the basis B, as we will see in detail when we learn about linear transformations.
Theorem 20.1. Let A be a real 2 × 2 matrix with complex eigenvalue a − bi and corresponding eigenvector v = u + iw. Then

    A = P RP −1 , where P = [u w] and R = [a −b; b a].

The one fact that we have not yet addressed is why the matrix P = [u w] is invertible. We do
that now to complete the argument.
Let A be a real 2 × 2 matrix with Av = λv, where λ = a − bi, b ≠ 0 and v = u + iw. To show that u and w are linearly independent, we need to show that no nontrivial linear combination of u and w can be the zero vector. Suppose

x1 u + x2 w = 0

for some scalars x1 and x2 . We will show that x1 = x2 = 0. Assume to the contrary that one of x1 , x2 is not zero. First, assume x1 ≠ 0. Then u = −(x2 /x1 )w. Let c = −x2 /x1 . Then

    Au = A(cw)
    Au = cAw
    au + bw = c(aw − bu)
    (a + cb)u = (ca − b)w
    (a + cb)(cw) = (ca − b)w.

So we must have (a + cb)c = ca − b. This equation simplifies to c²b = −b. Since b ≠ 0, we conclude that c² = −1, which is impossible for a real constant c. Therefore, we cannot have x1 ≠ 0. A similar argument (left to the reader) shows that x2 = 0. Thus we can conclude that u and w are linearly independent.

Examples

What follows are worked examples that use the concepts from this section.
 
Example 20.2. Let A = [0 1 0; −1 0 −1; 1 1 1].
(a) Without doing any computations, explain why not all of the eigenvalues of A can be com-
plex.

(b) Find all of the eigenvalues of A.

Example Solution.
(a) Since complex eigenvalues occur in conjugate pairs, the complex eigenvalues with nonzero
imaginary parts occur in pairs. Since A can have at most 3 different eigenvalues, at most
two of them can have nonzero imaginary parts. So at least one eigenvalue of A is real.
 
(b) For this matrix A we have A − λI3 = [−λ 1 0; −1 −λ −1; 1 1 1−λ]. Using a cofactor expansion along the first row gives us

    det(A − λI3 ) = (−λ)((−λ)(1 − λ) + 1) − ((−1)(1 − λ) + 1)
                  = −λ³ + λ² − λ + 1 − λ − 1
                  = −λ³ + λ² − 2λ
                  = −λ(λ² − λ + 2).

The roots of the characteristic polynomial are λ = 0 and

    λ = (1 ± √(1 − 4(2)))/2 = (1 ± √7 i)/2.
 
Example 20.3. Let A = [1 2; −1 3]. Find a rotation-scaling matrix R that is similar to A. Identify the rotation and scaling factor.

Example Solution. The eigenvalues of A are the roots of the characteristic polynomial

    p(λ) = det(A − λI2 )
         = det([1 − λ, 2; −1, 3 − λ])
         = (1 − λ)(3 − λ) + 2
         = λ² − 4λ + 5.

The quadratic formula shows that the roots of p(λ) are

    λ = (4 ± √(−4))/2 = 2 ± i.

To find an eigenvector for A with eigenvalue 2 − i, we row reduce

    A − (2 − i)I2 = [−1 + i, 2; −1, 1 + i]

to

    [1, −i − 1; 0, 0].

An eigenvector for A with eigenvalue 2 − i is then

    [1 + i 1]T = [1 1]T + i[1 0]T .


 
Letting P = [1 1; 1 0], we have

    R = P −1 AP = [2 −1; 1 2].

The scaling factor is √(det(R)) = √5, and the angle θ of rotation satisfies cos(θ) = 2/√5 and sin(θ) = 1/√5. This makes θ ≈ 0.4636 radians, or approximately 26.57◦ , counterclockwise.
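The computation in Example 20.3 can be checked numerically; the following sketch (Python with NumPy assumed, a choice made here) builds P from the real and imaginary parts of the eigenvector and recovers the rotation-scaling matrix R, the scaling factor, and the rotation angle.

    import numpy as np

    A = np.array([[1.0, 2.0], [-1.0, 3.0]])

    v = np.array([1 + 1j, 1 + 0j])             # eigenvector for the eigenvalue 2 - i
    P = np.column_stack((v.real, v.imag))      # P = [u w]
    R = np.linalg.inv(P) @ A @ P               # should equal [[2, -1], [1, 2]]

    print(np.round(R, 10))
    print(np.sqrt(np.linalg.det(R)))                   # scaling factor, sqrt(5)
    print(np.degrees(np.arctan2(R[1, 0], R[0, 0])))    # rotation angle, about 26.57 degrees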

Summary

• For a real matrix, complex eigenvalues appear in conjugate pairs. Specifically, if λ = a + ib is an eigenvalue of a real matrix A, then λ̄ = a − ib is also an eigenvalue of A.

• For a real matrix, if v is an eigenvector corresponding to λ, then the vector v̄ obtained by taking the complex conjugate of each entry in v is an eigenvector corresponding to λ̄.
 
• The rotation-scaling matrix A = [a −b; b a] can be written as

    [ √(a²+b²)      0     ] [ a/√(a²+b²)   −b/√(a²+b²) ]
    [    0       √(a²+b²) ] [ b/√(a²+b²)    a/√(a²+b²) ] .

This decomposition geometrically means that the transformation corresponding to A can be viewed as a rotation by angle θ = arccos(a/√(a² + b²)) if b ≥ 0, or θ = − arccos(a/√(a² + b²)) if b < 0, followed by a scaling by factor √(a² + b²).

• If A is a real 2 × 2 matrix with complex eigenvalue a − bi and corresponding eigenvector v = u + iw, then A is similar to the rotation-scaling matrix R = [a −b; b a]. More specifically, A = P RP −1 , where P = [u w].

Exercises

(1) Find eigenvalues and eigenvectors of each of the following matrices.

    (a) [2 4; −2 2]
    (b) [3 2; −1 1]
    (c) [1 −2; 4 −3]

(2) Find a rotation-scaling matrix where the rotation angle is θ = 3π/4 and scaling factor is less
than 1.

(3) Determine which rotation-scaling matrices have determinant equal to 1. Be as specific as possible.

(4) Determine the rotation-scaling matrix inside the matrix [2 4; −2 2].
(5) Find a real 2 × 2 matrix with eigenvalue 1 + 2i.

(6) Find a real 2 × 2 matrix which is not a rotation-scaling matrix with eigenvalue −1 + 2i.

(7) We have seen how to find the characteristic polynomial of an n × n matrix. In this exercise we consider the reverse question. That is, given a polynomial p(λ) of degree n, can we find an n × n matrix whose characteristic polynomial is p(λ)?

    (a) Find the characteristic polynomial of the 2 × 2 matrix C = [0 −a0 ; 1 −a1 ]. Use this result to find a real valued matrix whose eigenvalues are 1 + i and 1 − i.

    (b) Repeat part (a) by showing that p(λ) = −(λ³ + a2 λ² + a1 λ + a0 ) is the characteristic polynomial of the 3 × 3 matrix C = [0 0 −a0 ; 1 0 −a1 ; 0 1 −a2 ].

    (c) We can generalize this argument. Prove, using mathematical induction, that the polynomial

        p(λ) = (−1)^n (λ^n + an−1 λ^(n−1) + an−2 λ^(n−2) + · · · + a1 λ + a0 )

    is the characteristic polynomial of the matrix

        C = [ 0  0  0  · · ·  0  −a0   ]
            [ 1  0  0  · · ·  0  −a1   ]
            [ 0  1  0  · · ·  0  −a2   ]
            [ ..  ..  ..      ..  ..   ]
            [ 0  0  0  · · ·  1  −an−1 ] .

    The matrix C is called the companion matrix for p(λ).
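The companion matrix construction in this exercise is easy to test numerically. Here is one possible sketch (Python with NumPy, an assumption of this example) built for the coefficients a0 , . . ., an−1 of part (c); the example reproduces the matrix asked for in part (a).

    import numpy as np

    def companion_matrix(coeffs):
        # Companion matrix of p(lambda) = (-1)^n (lambda^n + a_{n-1} lambda^{n-1} + ... + a_0),
        # where coeffs = [a_0, a_1, ..., a_{n-1}]: 1s on the subdiagonal and
        # -a_0, ..., -a_{n-1} in the last column.
        a = np.asarray(coeffs, dtype=complex)
        n = len(a)
        C = np.zeros((n, n), dtype=complex)
        C[1:, :-1] = np.eye(n - 1)
        C[:, -1] = -a
        return C

    # 1 + i and 1 - i are the roots of lambda^2 - 2*lambda + 2, so a_1 = -2, a_0 = 2.
    C = companion_matrix([2.0, -2.0])
    print(C.real)                  # [[0, -2], [1, 2]]
    print(np.linalg.eigvals(C))    # approximately 1+1j and 1-1j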



(8) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False If 3 − 4i is an eigenvalue of a real matrix, then so is 3 + 4i.
(b) True/False If 2 + 3i is an eigenvalue of a 3 × 3 real matrix A, then A has three
distinct eigenvalues.
(c) True/False Every 2 × 2 real matrix with complex eigenvalues is a rotation-scaling
matrix.
(d) True/False Every square matrix with real entries has real number eigenvalues.
(e) True/False If A is a 2 × 2 matrix with complex eigenvalues similar to a rotation-
scaling matrix R, the eigenvalues of A and R are the same.
(f) True/False If A is a real matrix with complex eigenvalues, all eigenvectors of A must
be non-real.

Project: Understanding the Gershgorin Disk Theorem

To understand the Gershgorin Disk Theorem, we need to recall how to visualize a complex number in the plane. Recall that a complex number z is a number of the form z = a + bi where a and b are real numbers and i² = −1. The number a is the real part of z, denoted as Re(z), and b is the imaginary part of z, denoted Im(z). The set of all complex numbers is denoted C. We define addition and multiplication on C as follows. For a + bi, c + di ∈ C,

(a + bi) + (c + di) = (a + c) + (b + d)i   and   (a + bi)(c + di) = (ac − bd) + (ad + bc)i.

Note that the product is what we would expect if we "expanded" the product in the normal way and used the fact that i² = −1. The set of complex numbers forms a field – that is, C satisfies all of the same properties as R as stated in Theorem 4.2.
We can visualize the complex number a + bi in the plane as the point (a, b). Here we are viewing the horizontal axis as the real axis and the vertical axis as the imaginary axis. The length (or magnitude) of the complex number z = a + bi, which we denote as |z|, is the distance from the origin to z. So by the Pythagorean Theorem we have |a + bi| = √(a² + b²). Note that the magnitude of z = a + bi can be written as a complex product

|z| = √((a + bi)(a − bi)).

The complex number a − bi is called the complex conjugate of z = a + bi and is denoted as z̄. A few important properties of complex numbers and their conjugates are the following. Let z = a + bi and w = c + di be complex numbers. Then

• the conjugate of z + w is (a + c) − (b + d)i = (a − bi) + (c − di) = z̄ + w̄,

• the conjugate of zw is (ac − bd) − (ad + bc)i = (a − bi)(c − di) = z̄ w̄,

• the conjugate of z̄ is z,

• |z| = √(a² + b²) ≥ √(a²) = |a| = |Re(z)|,

• |z| = √(a² + b²) ≥ √(b²) = |b| = |Im(z)|,

• |z̄| = |z|,

• |z| = 0 if and only if z = 0,

• If p(x) is a polynomial with real coefficients and the complex number z satisfies p(z) = 0, then p(z̄) = 0 as well.

Using these facts we can show that the triangle inequality is true for complex numbers. That is,

|z + w| ≤ |z| + |w|.

To see why, notice that

|z + w|² = (z + w)(z̄ + w̄)
         = zz̄ + zw̄ + wz̄ + ww̄
         = zz̄ + zw̄ + \overline{zw̄} + ww̄
         = |z|² + 2Re(zw̄) + |w|²
         ≤ |z|² + 2|zw̄| + |w|²
         = |z|² + 2|z||w| + |w|²
         = (|z| + |w|)².

Since |z + w|, |z|, and |w| are all non-negative, taking square roots of both sides gives us |z + w| ≤
|z| + |w| as desired. We can extend this triangle inequality to any number of complex numbers.
That is, if z1 , z2 , . . ., zk are complex numbers, then

|z1 + z2 + · · · + zk | ≤ |z1 | + |z2 | + · · · + |zk |. (20.1)

We can prove Equation (20.1) by mathematical induction. We have already done the k = 2 case
and so we assume that Equation (20.1) is true for any sum of k complex numbers. Now suppose
that z1 , z2 , . . ., zk , zk+1 are complex numbers. Then

|z1 + z2 + · · · + zk + zk+1 | = |(z1 + z2 + · · · + zk ) + zk+1 |


≤ |z1 + z2 + · · · + zk | + |zk+1 |
≤ (|z1 | + |z2 | + · · · + |zk |) + |zk+1 |
= |z1 | + |z2 | + · · · + |zk | + |zk+1 |.

To prove the Gershgorin Disk Theorem, we will use the Levy-Desplanques Theorem, which
gives conditions that guarantee that a matrix is invertible. We illustrate with an example in the
following activity.
 
Project Activity 20.1. Let A = [3 2; −1 4]. Since det(A) ≠ 0, we know that A is an invertible matrix. Let us assume for a moment that we don't know that A is invertible and try to determine

if 0 is an eigenvalue of A. In other words, we want to know if there is a nonzero vector v so that


Av = 0. Assuming the existence of such a vector v = [v1 v2 ]T , for Av to be 0 it must be the case
that
3v1 + 2v2 = 0 and − v1 + 4v2 = 0.
Since the vector v is not the zero vector, at least one of v1 , v2 is not zero. Note that if one of v1 , v2 is zero, then so is the other. So we can assume that v1 and v2 are nonzero.

(a) Use the fact that 3v1 + 2v2 = 0 to show that |v2 | > |v1 |.

(b) Use the fact that −v1 + 4v2 = 0 to show that |v1 | > |v2 |. What conclusion can we draw
about whether 0 is an eigenvalue of A? Why does this mean that A is invertible?

What makes the arguments work in Project Activity 20.1 is that |3| > |2| and |4| > | − 1|. This
argument can be extended to larger matrices, as described in the following theorem.

Theorem 20.4 (Levy-Desplanques Theorem). Any square matrix A = [aij ] satisfying |aii | > Σj≠i |aij | for all i is invertible.

Proof. Let A = [aij ] be an n × n matrix satisfying |aii | > Σj≠i |aij | for all i. Let us assume that A is not invertible, that is, that there is a vector v ≠ 0 such that Av = 0. Let v = [v1 v2 · · · vn ]T and let t be between 1 and n so that |vt | ≥ |vi | for all i. That is, choose vt to be the component of v with the largest absolute value.
Expanding the product Av using the row-column product along the tth row shows that

at1 v1 + at2 v2 + · · · + atn vn = 0.

Solving for the att term gives us

att vt = −(at1 v1 + at2 v2 + · · · + at(t−1) vt−1 + at(t+1) vt+1 + · · · + atn vn ).

Then

|att ||vt | = | − (at1 v1 + at2 v2 + · · · + at(t−1) vt−1 + at(t+1) vt+1 + · · · + atn vn )|
           = |at1 v1 + at2 v2 + · · · + at(t−1) vt−1 + at(t+1) vt+1 + · · · + atn vn |
           ≤ |at1 ||v1 | + |at2 ||v2 | + · · · + |at(t−1) ||vt−1 | + |at(t+1) ||vt+1 | + · · · + |atn ||vn |
           ≤ |at1 ||vt | + |at2 ||vt | + · · · + |at(t−1) ||vt | + |at(t+1) ||vt | + · · · + |atn ||vt |
           = (|at1 | + |at2 | + · · · + |at(t−1) | + |at(t+1) | + · · · + |atn |)|vt |.

Since |vt | ≠ 0, we can cancel the |vt | term to conclude that

|att | ≤ |at1 | + |at2 | + · · · + |at(t−1) | + |at(t+1) | + · · · + |atn |.

But this contradicts the condition that |aii | > Σj≠i |aij | for all i. We conclude that 0 is not an eigenvalue for A and A is invertible. ∎

Any matrix A = [aij ] satisfying the condition of the Levy-Desplanques Theorem is given a
special name.

Definition 20.5. A square matrix A = [aij ] is strictly diagonally dominant if |aii | > Σj≠i |aij | for all i.

So any strictly diagonally dominant matrix is invertible. A quick glance can show that a matrix is strictly diagonally dominant. For example, since |3| > |1| + | − 1|, |12| > |5| + |6|, and | − 8| > | − 2| + |4|, the matrix

    A = [ 3   1   −1 ]
        [ 5   12  6  ]
        [ −2  4   −8 ]

is strictly diagonally dominant and therefore invertible. However, just because a matrix is not strictly diagonally dominant, it does not follow that the matrix is non-invertible. For example, the matrix B = [1 2; 0 1] is invertible, but not strictly diagonally dominant.
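Checking strict diagonal dominance is a one-line comparison of row sums, as in the following sketch (Python with NumPy assumed; the function name is a choice made here, not part of the text).

    import numpy as np

    def is_strictly_diagonally_dominant(A):
        # |a_ii| > sum of |a_ij| over j != i, for every row i
        A = np.asarray(A)
        diag = np.abs(np.diag(A))
        off_diag_sums = np.sum(np.abs(A), axis=1) - diag
        return bool(np.all(diag > off_diag_sums))

    print(is_strictly_diagonally_dominant([[3, 1, -1], [5, 12, 6], [-2, 4, -8]]))  # True
    print(is_strictly_diagonally_dominant([[1, 2], [0, 1]]))                       # False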
Now we can address the Gershgorin Disk Theorem.

Project Activity 20.2. Let A be an arbitrary n × n matrix and assume that λ is an eigenvalue of A.

(a) Explain why the matrix A − λI is singular.

(b) What does the Levy-Desplanques Theorem tell us about the matrix A − λI?

(c) Explain how we can conclude the Gershgorin Disk Theorem.

Theorem 20.6 (Gershgorin Disk Theorem). Let A = [aij ] be an n × n matrix with complex entries. Then every eigenvalue of A lies in one of the Gershgorin disks

{z ∈ C : |z − aii | ≤ ri },

where ri = Σj≠i |aij |.

Based on this theorem, we define a Gershgorin disk to be D(aii , ri ), where ri = Σj≠i |aij |.

(d) Use the Gershgorin Disk Theorem to give estimates on the locations of the eigenvalues of the matrix A = [−1 2; −3 2].
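The centers and radii of the Gershgorin disks require only the entries of the matrix, so they are easy to compute. The sketch below (Python with NumPy assumed, a choice made here) lists the disks of the matrix from the start of this section and compares them with its actual eigenvalues.

    import numpy as np

    def gershgorin_disks(A):
        # Return the (center, radius) pairs (a_ii, r_i) with r_i = sum over j != i of |a_ij|.
        A = np.asarray(A, dtype=complex)
        radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
        return list(zip(np.diag(A), radii))

    A = [[3, 1, -1], [0, -1 + 1j, 1j], [2, 1, -2j]]   # matrix from the section opening
    for center, radius in gershgorin_disks(A):
        print(center, radius)
    print(np.linalg.eigvals(A))   # every eigenvalue lies in one of the disks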

The Gershgorin Disk Theorem has a consequence that gives additional information about the
eigenvalues if some of the Gershgorin disks do not overlap.

Theorem 20.7. If S is a union of m Gershgorin disks of a matrix A such that S does not intersect
any other Gershgorin disk, then S contains exactly m eigenvalues (counting multiplicities) of A.

Proof. Most proofs of this theorem require some results from topology. For that reason, we will
not present a completely rigorous proof but rather give the highlights. Let A = [aij ] be an n × n
matrix. Let Di be a collection of Gershgorin disks of A for 1 ≤ i ≤ m such that S = ∪1≤i≤m Di
does not intersect any other Gershgorin disk of A, and let S 0 be the union of the Gershgorin disks
of A that are different from the Di . Note that S ∩ S 0 = ∅. Let C be the matrix whose ith column is
aii ei , that is C is the diagonal matrix whose diagonal entries are the corresponding diagonal entries
of A. Note that the eigenvalues of C are aii and the Gershgorin disks of C are just the points aii .
So our theorem is true for this matrix C. To prove the result, we build a continuum of matrices
from C to A as follows: let B = A − C (so that B is the matrix whose off-diagonal entries are
those of A and whose diagonal entries are 0), and let A(t) = tB + C for t in the interval [0, 1].
Note that A(1) = A. Since the diagonal entries of A(t) are the same as those of A, the Gershgorin
disks of A(t) have the same centers as the corresponding Gershgorin disks of A, while the radii
of the Gershgorin disks of A(t) are those of A but scaled by t. So the Gershgorin disks of A(t)
increase from points (the aii ) to the Gershgorin disks of A as t increases from 0 to 1. While the
centers of the disks all remain fixed, it is important to recognize that the eigenvalues of A(t) move
as t changes. An illustration of this is shown in Figure 20.2 with the eigenvalues as the black points and the changing Gershgorin disks dashed in magenta, using the matrix [i, 1/2; 1, −2 + i]. We can
learn about how the eigenvalues move with the characteristic polynomial.

Figure 20.2: How eigenvalues move.

Let p(t, x) be the characteristic polynomial of A(t). Note that these characteristic polynomials
are functions of both t and x. Since polynomials are continuous functions, their roots (the eigen-
values of A(t)) are continuous for t ∈ [0, 1] as well. Let λ(t) be an eigenvalue of A(t). Note that
λ(1) is an eigenvalue of A, and λ(0) is one of the aii and is therefore in S. We will argue that λ(t)
is in S for every value of t in [0, 1]. Let ri be the radius of Di and let D(t)i be the Gershgorin
disk of A(t) with the same center as Di and radius r(t)i = tri . Let S(t) = ∪1≤i≤m D(t)i . Since r(t)i ≤ ri , it follows that D(t)i ⊆ Di and so S(t) ∩ S 0 = ∅ as well. From topology, we know that
since the disks Di are closed, the union S of these disks is also closed. Similarly, S(t) and S 0 are
closed. By the Gershgorin Disk Theorem applied to A(t), each eigenvalue λ(t) lies in S(t) ∪ S 0 ⊆ S ∪ S 0 . Since λ(t) varies continuously, and S and S 0 are disjoint closed sets, λ(t) can never jump from one set to the other. Thus the m eigenvalues that begin at the centers aii in S remain in S for every value of t in [0, 1], the remaining eigenvalues remain in S 0 , and so S contains exactly m eigenvalues of A (counting multiplicities).

Section 21

Properties of Determinants

Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.

• How do elementary row operations change the determinant?


• How can we represent elementary row operations via matrix multiplica-
tion?
• How can we use elementary row operations to calculate the determinant
more efficiently?
• What is Cramer's Rule, and how does it follow from an explicit formula for the inverse of a matrix?
• How can we interpret determinants from a geometric perspective?

Introduction

This section is different than others in that it contains mainly proofs of previously stated results and
only a little new material. Consequently, there is no application attached to this section.
We have seen that an important property of the determinant is that it provides an easy criterion for
the invertibility of a matrix. As a result, we obtained an algebraic method for finding the eigenvalues
of a matrix, using the characteristic equation. In this section, we will investigate other properties of
the determinant related to how elementary row operations change the determinant. These properties
of the determinant will help us evaluate the determinant in a more efficient way compared to using
the cofactor expansion method, which is computationally intensive for large n values due to it being
a recursive method. Finally, we will derive a geometrical interpretation of the determinant.

Preview Activity 21.1.


(1) We first consider how the determinant changes if we multiply a row of the matrix by a con-
stant.
 
(a) Let A = [2 3; 1 4]. Pick a few different values for the constant k and compare the determinant of A and that of [2k 3k; 1 4]. What do you conjecture that the effect of multiplying a row by a constant on the determinant is?
(b) If we want to make sure our conjecture is valid for any 2 × 2 matrix, we need to show that for A = [a b; c d], the relationship between det(A) and the determinant of [a·k b·k; c d] follows our conjecture. We should also check that the relationship between det(A) and the determinant of [a b; c·k d·k] follows our conjecture. Verify this.
(c) Make a similar conjecture for what happens to the determinant when a row of a 3 × 3
matrix A is multiplied by a constant k, and explain why your conjecture is true using
the cofactor expansion definition of the determinant.

(2) The second type of elementary row operation we consider is row swapping.
 
(a) Take a general 2 × 2 matrix A = [a b; c d] and determine how row swapping affects the determinant.
(b) Now choose a few different 3 × 3 matrices and see how row swapping changes the
determinant in these matrices by evaluating the determinant with a calculator or any
other appropriate technology.
(c) Based on your results so far, conjecture how row swapping changes the determinant
in general.

(3) The last type of elementary row operation is adding a multiple of a row to another. Determine
the effect of this operation on a 2 × 2 matrix by evaluating the determinant of a general 2 × 2
matrix after a multiple of one row is added to the other row.

(4) All of the elementary row operations we discussed above can be achieved by matrix mul-
tiplication with elementary matrices. For each of the following elementary matrices, de-
termine what elementary operation it corresponds to by calculating the product EA, where A = [a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ] is a general 3 × 3 matrix.

    (a) E = [0 1 0; 1 0 0; 0 0 1]    (b) E = [1 0 0; 0 3 0; 0 0 1]    (c) E = [1 0 0; 0 1 2; 0 0 1]

Elementary Row Operations and Their Effects on the Determinant

In Preview Activity 21.1, we conjectured how elementary row operations affect the determinant of
a matrix. In the following activity, we prove how the determinant changes when a row is multiplied
by a constant using the cofactor expansion definition of the determinant.

Activity 21.1. In this activity, assume that the determinant of A can be determined by a cofactor
expansion along any row or column. (We will prove this result independently later in this section.)
Consider an arbitrary n × n matrix A = [aij ].

(a)

(b) Write the expression for det(A) using the cofactor expansion along the second row.

(c) Let B be obtained by multiplying the second row of A by k. Write the expression for
det(B) if the cofactor expansion along the second row is used.

(d) Use the expressions you found above to express det(B) in terms of det(A).

(e) Explain how this method generalizes to prove the relationship between the determinant of
a matrix A and that of the matrix obtained by multiplying a row by a constant k.

Your work in Activity 21.1 proves the first part of the following theorem on how elementary
row operations change the determinant of a matrix.

Theorem 21.1. Let A be a square matrix.

(1) If B is obtained by multiplying a row of A by a constant k, then det(B) = k det(A).

(2) If B is obtained by swapping two rows of A, then det(B) = − det(A).

(3) If B is obtained by adding a multiple of a row of A to another, then det(B) = det(A).

In the next section, we will use elementary matrices to prove the last two properties of Theorem
21.1.

Elementary Matrices

As we saw in Preview Activity 21.1, elementary row operations can be achieved by multiplication
by elementary matrices.

Definition 21.2. An elementary matrix is a matrix obtained by performing a single elementary


row operation on an identity matrix.

The following elementary matrices correspond, respectively, to an elementary row operation


which swaps rows 2 and 4; an elementary row operation which multiplies the third row by 5; and an
elementary row operation which adds four times the third row to the first row on any 4 × 4 matrix:

     
    E1 = [1 0 0 0; 0 0 0 1; 0 0 1 0; 0 1 0 0],   E2 = [1 0 0 0; 0 1 0 0; 0 0 5 0; 0 0 0 1],   and   E3 = [1 0 4 0; 0 1 0 0; 0 0 1 0; 0 0 0 1].

To obtain an elementary matrix corresponding to an elementary row operation, we simply per-


form the elementary row operation on the identity matrix. For example, E1 above is obtained by
swapping rows 2 and 4 of the identity matrix.
With the use of elementary matrices, we can now prove the result about how the determinant
is affected by elementary row operations. We first rewrite Theorem 21.1 in terms of elementary
matrices:

Theorem 21.3. Let A be an n × n matrix. If E is an n × n elementary matrix, then det(EA) = det(E) det(A), where

    det(E) =  r    if E corresponds to multiplying a row by r,
             −1    if E corresponds to swapping two rows,
              1    if E corresponds to adding a multiple of a row to another.

Notes on Theorem 21.3. An elementary matrix E obtained by multiplying a row by r is a diagonal


matrix with one r along the diagonal and the rest 1s, so det(E) = r. Similarly, an elementary
matrix E obtained by adding a multiple of a row to another is a triangular matrix with 1s along
the diagonal, so det(E) = 1. The fact that the determinant of an elementary matrix obtained
by swapping two rows is −1 is a bit more complicated and is verified independently later in this
section. Also, the proof of 21.3 depends on the fact that the cofactor expansion of a matrix is the
same along any two rows. A proof of this can also be found later in this section.

Proof of Theorem 21.3. We will prove the result by induction on n, the size of the matrix A. We
verified these results in Preview Activity 21.1 for n = 2 using elementary row operations. The
elementary matrix versions follow immediately.
Now assume the theorem is true for k × k matrices with k ≥ 2 and consider an n × n matrix
A where n = k + 1. If E is an n × n elementary matrix, we want to show that det(EA) =
det(E) det(A). Let EA = B. (Although it is an abuse of language, we will refer to both the
elementary matrix and the elementary row operation corresponding to it by E.)
When finding det(B) = det(EA) we will use a cofactor expansion along a row which is not
affected by the elementary row operation E. Since E affects at most two rows and A has n ≥ 3
rows, it is possible to find such a row, say row i. The cofactor expansion along row i of B is

bi1 (−1)i+1 det(Bi1 ) + bi2 (−1)i+2 det(Bi2 ) + · · · + bin (−1)i+n det(Bin ) . (21.1)

Since we chose a row of A which was not affected by the elementary row operation, it follows
that bij = aij for 1 ≤ j ≤ n. Also, the matrix Bij obtained by removing row i and column j from
matrix B = EA can be obtained from Aij by an elementary row operation of the same type as E.
Hence there is an elementary matrix Ek of the same type as E with Bij = Ek Aij . Therefore, by
induction, det(Bij ) = det(Ek ) det(Aij ) and det(Ek ) is equal to 1, -1 or r depending on the type
of elementary row operation. If we substitute this information into equation (21.1), we obtain

det(B) = ai1 (−1)i+1 det(Ek ) det(Ai1 ) + ai2 (−1)i+2 det(Ek ) det(Ai2 )


+ · · · + ain (−1)i+n det(Ek ) det(Ain )
= det(Ek ) det(A) .

This equation proves det(EA) = det(Ek ) det(A) for any n × n matrix A where Ek is the corre-
sponding elementary row operation on the k × k matrices obtained in the cofactor expansion.
The proof of the inductive step will be finished if we show that det(Ek ) = det(E). This
equality follows if we let A = In in det(EA) = det(Ek ) det(A). Therefore, det(E) is equal to
r, or 1, or −1, depending on the type of the elementary row operation E since the same is true of
det(Ek ) by inductive hypothesis.
Therefore, by the principle of induction, the claim is true for every n ≥ 2. 

As a corollary of this theorem, we can prove the multiplicativity of determinants:


Theorem 21.4. Let A and B be n × n matrices. Then

det(AB) = det(A) det(B) .

Proof. If A is non-invertible, then AB is also non-invertible and both det(A) and det(AB) are 0,
proving the equality in this case.
Suppose now that A is invertible. By the Invertible Matrix Theorem, we know that A is row
equivalent to In . Expressed in terms of elementary matrices, this means that there are elementary
matrices E1 , E2 , . . . , E` such that

A = E1 E2 · · · E` In = E1 E2 · · · E` . (21.2)

Therefore, repeatedly applying Theorem 21.3, we find that

det(A) = det(E1 ) det(E2 ) · · · det(E` ) . (21.3)

If we multiply equation (21.2) by B on the right, we obtain

AB = E1 E2 · · · E` B .

Again, by repeatedly applying Theorem 21.3 with this product of matrices, we find

det(AB) = det(E1 E2 · · · E` B) = det(E1 ) det(E2 ) · · · det(E` ) det(B) .

From equation (21.3), the product of det(Ei )’s equals det(A), so

det(AB) = det(A) det(B)

which finishes the proof of the theorem. 

We can use the multiplicative property of the determinant and the determinants of elementary
matrices to calculate the determinant of a matrix in a more efficient way than using the cofactor
expansion. The next activity provides an example.

 
Activity 21.2. Let A = [1 1 2; 2 2 6; −1 2 1].

(a) Use elementary row operations to reduce A to a row echelon form. Keep track of the elementary row operations you use.

(b) Taking into account how elementary row operations affect the determinant, use the row
echelon form of A to calculate det(A).

Your work in Activity 21.2 provides an efficient method for calculating the determinant. If A is
a square matrix, we use row operations given by elementary matrices E1 , E2 , . . ., Ek to row reduce
A to row echelon form R. That is

R = Ek Ek−1 · · · E2 E1 A.

We know det(Ei ) for each i, and since R is a triangular matrix we can find its determinant. Then

det(A) = det(E1 )−1 det(E2 )−1 · · · det(Ek )−1 det(R).

In other words, if we keep track of how the row operations affect the determinant, we can calculate
the determinant of a matrix A using row operations.
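The following sketch (Python with NumPy, an assumption made here) carries out exactly this bookkeeping: it row reduces to a triangular form using only row swaps and row replacements, tracks the sign changes from the swaps, and multiplies the diagonal of the result.

    import numpy as np

    def det_by_row_reduction(A):
        # Gaussian elimination with partial pivoting: swaps flip the sign of the
        # determinant, row replacements leave it unchanged, and the determinant
        # of the final triangular matrix is the product of its diagonal entries.
        U = np.array(A, dtype=float)
        n = U.shape[0]
        sign = 1.0
        for j in range(n):
            pivot = j + int(np.argmax(np.abs(U[j:, j])))
            if abs(U[pivot, j]) < 1e-12:
                return 0.0
            if pivot != j:
                U[[j, pivot]] = U[[pivot, j]]
                sign = -sign
            for i in range(j + 1, n):
                U[i] -= (U[i, j] / U[j, j]) * U[j]
        return sign * float(np.prod(np.diag(U)))

    A = [[1, 1, 2], [2, 2, 6], [-1, 2, 1]]   # the matrix from Activity 21.2
    print(det_by_row_reduction(A), np.linalg.det(A))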

Activity 21.3. Theorems 21.3 and 21.4 can be used to prove part (c) of Theorem 16.2: A is invertible if and only if det(A) ≠ 0. We see how in this activity. Let A be an n × n matrix. We can row reduce A to its reduced row echelon form R by elementary matrices E1 , E2 , . . ., Ek so that

    R = E1 E2 · · · Ek A.

(a) Suppose A is invertible. What, then, is R? What is det(R)? Can the determinant of an elementary matrix ever be 0? How do we conclude that det(A) ≠ 0?

(b) Now suppose that det(A) ≠ 0. What can we conclude about det(R)? What, then, must R be? How do we conclude that A is invertible?

Summary: Let A be an n × n matrix. Suppose we swap rows s times and divide rows by
constants k1 , k2 , . . . , kr while computing a row echelon form REF(A) of A. Then det(A) =
(−1)s k1 k2 · · · kr det(REF(A)).

Geometric Interpretation of the Determinant

Determinants have interesting and useful applications from a geometric perspective. To understand
the geometric interpretation of the determinant of an n × n matrix A, we consider the image of the
unit square under the transformation T (x) = Ax and see how its area changes based on A.

Activity 21.4.

 
2 0
(a) Let A = . Start with the unit square in R2 with corners at the origin and at (1, 1).
0 3
 
x
In other words, the unit square we are considering consists of all vectors v = where
y
0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, visualized as points in the plane.

i. Consider the collection of image vectors Av obtained by multiplying v’s by A. Sketch


the rectangle formed by these image vectors.
ii. Explain how the areas of this image rectangle and the unit square are related via det(A).
 
iii. Does the relationship you found above generalize to an arbitrary A = [a 0; 0 b]? If not, modify the relationship to hold for all diagonal matrices.
 
(b) Let A = [2 1; 0 3].

i. Sketch the image of the unit square under the transformation T (v) = Av. To make
the sketching easier, find the images of the vectors [0 0]T , [1 0]T , [0 1]T , [1 1]T as
points first and then connect these images to find the image of the unit square.
ii. Check that the area of the parallelogram you obtained in the above part is equal to
det(A).
 
iii. Does the relationship between the area and det(A) still hold if A = [−2 1; 0 3]? If not, how will you modify the relationship?

It can be shown that for all 2 × 2 matrices a similar relationship holds.


Theorem 21.5. For a 2 × 2 matrix A, the area of the image of the unit square under the transfor-
mation T (x) = Ax is equal to | det(A)|. This is equivalent to saying that | det(A)| is equal to the
area of the parallelogram defined by the columns of A. The area of the parallelogram is also equal to the product of the lengths of the column vectors of A multiplied by | sin(θ)|, where θ is the angle between the two column vectors.

There is a similar geometric interpretation of the determinant of a 3 × 3 matrix in terms of


volume.
Theorem 21.6. For a 3 × 3 matrix A, the volume of the image of the unit cube under the transfor-
mation T (x) = Ax is equal to | det(A)|. This is equivalent to saying that | det(A)| is equal to the
volume of the parallelepiped defined by the columns of A.

The sign of det(A) can be interpreted in terms of the orientation of the column vectors of A.
See the project in Section 16 for details.

An Explicit Formula for the Inverse and Cramer’s Rule

In Section 10 we found the inverse A−1 using row reduction of the matrix obtained by augmenting
A with In . However, in theoretical applications, having an explicit formula for A−1 can be handy.

Such an explicit formula provides us with an algebraic expression for A−1 in terms of the entries
of A. A consequence of the formula we develop is Cramer’s Rule, which can be used to provide
formulas that give solutions to certain linear systems.
We begin with an interesting connection between a square matrix and the matrix of its cofactors
that we explore in the next activity.
 
Activity 21.5. Let A = [2 1 3; 1 4 5; 2 −1 2].

(a) Calculate the (1, 1), (1, 2), and (1, 3) cofactors of A.

(b) If Cij represents the (i, j) cofactor of A, then the cofactor matrix C is the matrix C = [Cij ].
The adjugate matrix of A is the transpose of the cofactor matrix. In our example, the
adjugate matrix of A is  
13 −5 −7
adj(A) =  8 −2 −7  .
−9 4 7
Check the entries of this adjugate matrix with your calculations from part (a). Then calcu-
late the matrix product
A adj(A).

(c) What do you notice about the product A adj(A)? How is this product related to det(A)?

The result of Activity 21.5 is rather surprising, but it is valid in general. That is, if A = [aij ] is an invertible n × n matrix and Cij is the (i, j) cofactor of A, then A adj(A) = det(A)In . In other words, A (adj(A)/det(A)) = In and so

    A−1 = (1/det(A)) adj(A).
This gives us another formulation of the inverse of a matrix. To see why A adj(A) = det(A)In , we
use the row-column version of the matrix product to find the ijth entry of A adj(A) as indicated by
the ith row of A and the jth column of adj(A):

    [ a11  a12  · · ·  a1n ]
    [ a21  a22  · · ·  a2n ]   [ C11  C21  · · ·  Cj1  · · ·  Cn1 ]
    [  ..   ..         ..  ]   [ C12  C22  · · ·  Cj2  · · ·  Cn2 ]
    [ ai1  ai2  · · ·  ain ]   [  ..   ..         ..          ..  ]
    [  ..   ..         ..  ]   [ C1n  C2n  · · ·  Cjn  · · ·  Cnn ]
    [ an1  an2  · · ·  ann ]

Thus the ijth entry of A adj(A) is

ai1 Cj1 + ai2 Cj2 + · · · + ain Cjn . (21.4)

Notice that if i = j, then expression (21.4) is the cofactor expansion of A along the ith row. So
the iith entry of A adj(A) is det(A). It remains to show that the ijth entry of A adj(A) is 0 when
i ≠ j.

When i ≠ j, the expression (21.4) is the cofactor expansion of the matrix
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{j-1\,1} & a_{j-1\,2} & \cdots & a_{j-1\,n} \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ a_{j+1\,1} & a_{j+1\,2} & \cdots & a_{j+1\,n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
along the jth row. This matrix is the one obtained by replacing the jth row of A with the ith row
of A. Since this matrix has two identical rows, it is not row equivalent to the identity matrix and
is therefore not invertible, so its determinant is 0. Thus, when i ≠ j, expression (21.4) is 0. This
makes A adj(A) = det(A)In.
One consequence of the formula A−1 = (1/det(A)) adj(A) is Cramer’s rule, which describes the
solution to the equation Ax = b.
   
Activity 21.6. Let $A = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}$ and let $\mathbf{b} = \begin{bmatrix} 2 \\ 6 \end{bmatrix}$.
(a) Solve the equation Ax = b using the inverse of A.
 
(b) Let $A_1 = \begin{bmatrix} 2 & 1 \\ 6 & 2 \end{bmatrix}$, the matrix obtained by replacing the first column of A with b. Calculate
$\frac{\det(A_1)}{\det(A)}$ and compare to your solution from part (a). What do you notice?
 
(c) Now let $A_2 = \begin{bmatrix} 3 & 2 \\ 4 & 6 \end{bmatrix}$, the matrix obtained by replacing the second column of A with b.
Calculate $\frac{\det(A_2)}{\det(A)}$ and compare to your solution from part (a). What do you notice?

The result from Activity 21.6 may seem a bit strange, but turns out to be true in general. The
result is called Cramer’s Rule.
Theorem 21.7 (Cramer’s Rule). Let A be an n × n invertible matrix. For any b in Rn , the solution
x of Ax = b has entries
$$x_i = \frac{\det(A_i)}{\det(A)}$$
where Ai represents the matrix formed by replacing the ith column of A with b.

To see why Cramer’s Rule works in general, let A be an n × n invertible matrix and b =
[b1 b2 · · · bn ]T . The solution to Ax = b is
  
$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)}\operatorname{adj}(A)\,\mathbf{b} = \frac{1}{\det(A)} \begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$

Expanding the product gives us


 
$$\mathbf{x} = \frac{1}{\det(A)} \begin{bmatrix} b_1 C_{11} + b_2 C_{21} + \cdots + b_n C_{n1} \\ b_1 C_{12} + b_2 C_{22} + \cdots + b_n C_{n2} \\ \vdots \\ b_1 C_{1n} + b_2 C_{2n} + \cdots + b_n C_{nn} \end{bmatrix}.$$
The expression
b1 C1j + b2 C2j + · · · + bn Cnj
is the cofactor expansion of the matrix
 
$$A_j = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1\,j-1} & b_1 & a_{1\,j+1} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2\,j-1} & b_2 & a_{2\,j+1} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & b_n & a_{n\,j+1} & \cdots & a_{nn} \end{bmatrix}$$
along the jth column, giving us the formula in Cramer’s Rule.
Cramer’s Rule is not a computationally efficient method. To find a solution to a linear system of
n equations in n unknowns using Cramer’s Rule requires calculating n + 1 determinants of n × n
matrices – quite inefficient when n is 3 or greater. Our standard method of solving systems using
Gaussian elimination is much more efficient. However, Cramer’s Rule does provide a formula for
the solution to Ax = b as long as A is invertible.
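
Although it is inefficient for large systems, Cramer's Rule is straightforward to state as code, and doing so can make the formula concrete. The following sketch (Python with NumPy, assumed only as a convenience) builds each matrix Ai by replacing a column of A with b; it reproduces the answer to Activity 21.6.

    import numpy as np

    def cramer_solve(A, b):
        # Solve Ax = b for an invertible A using Cramer's Rule (illustration only).
        detA = np.linalg.det(A)
        x = np.zeros(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                      # replace the ith column of A with b
            x[i] = np.linalg.det(Ai) / detA
        return x

    A = np.array([[3.0, 1.0], [4.0, 2.0]])
    b = np.array([2.0, 6.0])
    print(cramer_solve(A, b))                 # [-1.  5.], as in Activity 21.6
    print(np.linalg.solve(A, b))              # same answer via elimination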

The Determinant of the Transpose

In this section we establish the fact that the determinant of a square matrix is the same as the
determinant of its transpose.
The result is easily verified for 2 × 2 matrices, so we will proceed by induction and assume that
the determinant of any (n − 1) × (n − 1) matrix is the same as the determinant of its transpose.
Suppose A = [aij ] is an n × n matrix. By definition,
det(A) = a11 C11 + a12 C12 + a13 C13 + · · · + a1n C1n
and
det(AT ) = a11 C11 + a21 C21 + a31 C31 + · · · + an1 Cn1 .
Note that the only term in either determinant that contains a11 is a11 C11 . This term is the same
in both determinants, so we proceed to examine other elements. Let us consider all terms in the
cofactor expansion for det(AT ) that contain ai1 a1j . The only summand that contains ai1 is ai1 Ci1 .
Letting Aij be the sub-matrix of A obtained by deleting the ith row and jth column, we see that
ai1 Ci1 = (−1)i+1 ai1 det(Ai1 ). Now let’s examine the sub-matrix Ai1 :
 
$$A_{i1} = \begin{bmatrix} a_{12} & a_{13} & \cdots & a_{1j} & \cdots & a_{1\,n-1} & a_{1n} \\ a_{22} & a_{23} & \cdots & a_{2j} & \cdots & a_{2\,n-1} & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots & \vdots \\ a_{i-1\,2} & a_{i-1\,3} & \cdots & a_{i-1\,j} & \cdots & a_{i-1\,n-1} & a_{i-1\,n} \\ a_{i+1\,2} & a_{i+1\,3} & \cdots & a_{i+1\,j} & \cdots & a_{i+1\,n-1} & a_{i+1\,n} \\ \vdots & \vdots & & \vdots & & \vdots & \vdots \\ a_{n2} & a_{n3} & \cdots & a_{nj} & \cdots & a_{n\,n-1} & a_{nn} \end{bmatrix}$$

When we expand along the first row to calculate det(Ai1 ), the only term that will involve a1j is

(−1)j−1+1 a1j det(Ai1,1j ),

where Aik,jm denotes the sub-matrix of A obtained by deleting rows i and k and columns j and m
from A. So the term that contains ai1 a1j in the cofactor expansion for det(AT ) is

(−1)i+1 ai1 (−1)j a1j det(Ai1,1j ) = (−1)i+j+1 ai1 a1j det(Ai1,1j ). (21.5)

Now we examine the cofactor expansion for det(A) to find the terms that contain ai1 a1j . The
quantity a1j only appears in the cofactor expansion as

a1j C1j = (−1)1+j a1j det(A1j ).

Now let’s examine the sub-matrix A1j :


 
$$A_{1j} = \begin{bmatrix} a_{21} & a_{22} & \cdots & a_{2\,j-1} & a_{2\,j+1} & \cdots & a_{2n} \\ a_{31} & a_{32} & \cdots & a_{3\,j-1} & a_{3\,j+1} & \cdots & a_{3n} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{i\,j-1} & a_{i\,j+1} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & a_{n\,j+1} & \cdots & a_{nn} \end{bmatrix}$$

Here is where we use the induction hypothesis. Since A1j is an (n − 1) × (n − 1) matrix, its
determinant can be found with a cofactor expansion down the first column. The only term in this
cofactor expansion that will involve ai1 is

(−1)i−1+1 ai1 det(A1i,j1 ).

So the term that contains ai1 a1j in the cofactor expansion for det(A) is

(−1)1+j a1j (−1)i−1+1 ai1 det(A1i,j1 ) = (−1)i+j+1 ai1 a1j det(A1i,j1 ). (21.6)

Since the quantities in (21.5) and (21.6) are equal, we conclude that the terms in the two cofactor
expansions are the same and
det(AT ) = det(A).
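
The identity det(AT) = det(A) is also easy to check numerically. Here is a quick sketch (Python with NumPy assumed, purely for illustration) on a few random matrices; of course, such a check is not a proof, which is what the argument above supplies.

    import numpy as np

    rng = np.random.default_rng(0)
    for n in range(2, 7):
        A = rng.standard_normal((n, n))
        # det(A) and det(A^T) agree up to floating point error.
        assert np.isclose(np.linalg.det(A), np.linalg.det(A.T))
    print("det(A) = det(A^T) holds for these random test matrices")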

Row Swaps and Determinants

In this section we determine the effect of row swaps on the determinant. Let Ers be the elementary
matrix that swaps rows r and s in the n × n matrix A = [aij ]. Applying E12 to a 2 × 2 matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},$$
we see that
$$\det(E_{12}A) = \det\begin{bmatrix} c & d \\ a & b \end{bmatrix} = cb - ad = -(ad - bc) = -\det(A).$$
So swapping rows in a 2 × 2 matrix multiplies the determinant by −1. Suppose that row swapping
on any (n−1)×(n−1) matrix multiplies the determinant by −1 (in other words, we are proving our

statement by mathematical induction). Now suppose A is an n×n matrix and let B = [bij ] = Ers A.
We first consider the case that s = r + 1 – that is, we swap adjacent rows. We consider two cases,
r > 1 and r = 1. First let us suppose that r > 1. Let Cij be the (i, j) cofactor of A and C′ij the
(i, j) cofactor of B. We have

det(A) = a11 C11 + a12 C12 + · · · + a1n C1n

and
det(B) = b11 C′11 + b12 C′12 + · · · + b1n C′1n .
Since r > 1, it follows that a1j = b1j for every j. For each j, the sub-matrix B1j obtained from B
by deleting the first row and jth column is the same matrix as the one obtained from A1j by swapping
the rows corresponding to rows r and s of A. So by our induction hypothesis, we have C′1j = −C1j for each j. Then

det(B) = b11 C′11 + b12 C′12 + · · · + b1n C′1n
       = a11 (−C11 ) + a12 (−C12 ) + · · · + a1n (−C1n )
       = −(a11 C11 + a12 C12 + · · · + a1n C1n )
       = − det(A).

Now we consider the case where r = 1, where B is the matrix obtained from A by swapping the
first and second rows. Here we will use the fact that det(A) = det(AT ) which allows us to calculate
det(A) and det(B) with the cofactor expansions down the first column. In this case we have

det(A) = a11 C11 + a21 C21 + · · · + an1 Cn1

and
det(B) = b11 C′11 + b21 C′21 + · · · + bn1 C′n1
       = a21 C′11 + a11 C′21 + a31 C′31 + · · · + an1 C′n1 .
For each i ≥ 3, the sub-matrix Bi1 is just Ai1 with rows 1 and 2 swapped. So we have C′i1 = −Ci1
by our induction hypothesis. Since we swapped rows 1 and 2, we have B21 = A11 and B11 = A21 .
Thus,

b11 C′11 = (−1)1+1 b11 det(A21 ) = a21 det(A21 ) = −a21 C21

and

b21 C′21 = (−1)2+1 a11 det(A11 ) = −a11 det(A11 ) = −a11 C11 .
Putting this all together gives us
det(B) = b11 C′11 + b21 C′21 + · · · + bn1 C′n1
       = −a21 C21 − a11 C11 + a31 (−C31 ) + · · · + an1 (−Cn1 )
       = −(a11 C11 + a21 C21 + · · · + an1 Cn1 )
       = − det(A).

So we have shown that if B is obtained from A by interchanging two adjacent rows, then det(B) =
− det(A). Now we consider the general case. Suppose B is obtained from A by interchanging rows
r and s, with r < s. We can perform this single row interchange through a sequence of adjacent
row interchanges. First we swap rows r and r + 1, then rows r + 1 and r + 2, and continue until

we swap rows s − 1 and s. This places the original row r into the row s position, and the process
involved s − r adjacent row interchanges. Each of these interchanges multiplies the determinant
by a factor of −1. At the end of this sequence of row swaps, the original row s is now row s − 1.
So it will take one fewer adjacent row interchanges to move this row to be row r. This sequence of
(s − r) + (s − r − 1) = 2(s − r) − 1 row interchanges produces the matrix B. Thus,

det(B) = (−1)2(s−r)−1 det(A) = − det(A),

and interchanging any two rows multiplies the determinant by −1.
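
Here is a corresponding numerical check (again a NumPy sketch, assumed only for illustration) that a single row interchange negates the determinant:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    B = A.copy()
    B[[1, 3]] = B[[3, 1]]                # swap rows 2 and 4 (0-indexed rows 1 and 3)
    print(np.isclose(np.linalg.det(B), -np.linalg.det(A)))   # True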

Cofactor Expansions

We have stated that the determinant of a matrix can be calculated by using a cofactor expansion
along any row or column. We use the result that swapping rows introduces a factor of −1 in the
determinant to verify that result in this section. Note that in proving that det(AT ) = det(A), we
have already shown that the cofactor expansion along the first column is the same as the cofactor
expansion along the first row. If we can prove that the cofactor expansion along any row is the same,
then the fact that det(AT ) = det(A) will imply that the cofactor expansion along any column is
the same as well.
Now we demonstrate that the cofactor expansions along the first row and the ith row are the
same. Let A = [aij ] be an n × n matrix. The cofactor expansion of A along the first row is

a11 C11 + a12 C12 + · · · + a1n C1n

and the cofactor expansion along the ith row is

ai1 Ci1 + ai2 Ci2 + · · · + ain Cin .

Let B be the matrix obtained by swapping row i with previous rows so that row i becomes the first
row and the order of the remaining rows is preserved.

$$B = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\ a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{i-1\,1} & a_{i-1\,2} & \cdots & a_{i-1\,j} & \cdots & a_{i-1\,n} \\ a_{i+1\,1} & a_{i+1\,2} & \cdots & a_{i+1\,j} & \cdots & a_{i+1\,n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix}$$

Then
det(B) = (−1)i−1 det(A).
So, letting C′ij be the (i, j) cofactor of B, we have

det(A) = (−1)i−1 det(B) = (−1)i−1 (ai1 C′11 + ai2 C′12 + · · · + ain C′1n ).

Notice that for each j we have B1j = Aij . So


$$\begin{aligned}
\det(A) &= (-1)^{i-1}\left(a_{i1} C'_{11} + a_{i2} C'_{12} + \cdots + a_{in} C'_{1n}\right) \\
&= (-1)^{i-1}\left(a_{i1}(-1)^{1+1}\det(B_{11}) + a_{i2}(-1)^{1+2}\det(B_{12}) + \cdots + a_{in}(-1)^{1+n}\det(B_{1n})\right) \\
&= (-1)^{i-1}\left(a_{i1}(-1)^{1+1}\det(A_{i1}) + a_{i2}(-1)^{1+2}\det(A_{i2}) + \cdots + a_{in}(-1)^{1+n}\det(A_{in})\right) \\
&= a_{i1}(-1)^{i+1}\det(A_{i1}) + a_{i2}(-1)^{i+2}\det(A_{i2}) + \cdots + a_{in}(-1)^{i+n}\det(A_{in}) \\
&= a_{i1} C_{i1} + a_{i2} C_{i2} + \cdots + a_{in} C_{in}.
\end{aligned}$$
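
The argument above says that a cofactor expansion along any row (and hence, by the transpose result, along any column) produces the same number. The short recursive sketch below (Python with NumPy, an assumed convenience) expands along a chosen row and compares the results; the example matrix is the one from Activity 21.5.

    import numpy as np

    def det_by_cofactors(A, row=0):
        # Determinant by cofactor expansion along the given row (illustration only).
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, row, axis=0), j, axis=1)
            total += (-1) ** (row + j) * A[row, j] * det_by_cofactors(minor)
        return total

    A = np.array([[2.0, 1.0, 3.0],
                  [1.0, 4.0, 5.0],
                  [2.0, -1.0, 2.0]])
    print([round(det_by_cofactors(A, r), 6) for r in range(3)])   # [7.0, 7.0, 7.0]
    print(round(np.linalg.det(A), 6))                             # 7.0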

The LU Factorization of a Matrix

There are many instances where we have a number of systems to solve of the form Ax = b, all
with the same coefficient matrix. The system may evolve over time so that we do not know the
constant vectors b in the system all at once, but only determine them as time progresses. Each time
we obtain a new vector b, we have to apply the same row operations to reduce the coefficient matrix
to solve the new system. This is repetitive and time consuming. Instead, we can keep track of
the row operations in one row reduction and save ourselves a significant amount of time. One way
of doing this is the LU -factorization (or decomposition).
To illustrate, suppose we can write the matrix A as a product A = LU , where
   
$$L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

Let b = [3 1 1 3]T and x = [x1 x2 x3 x4 ]T , and consider the linear system Ax = b. If Ax = b,


then LU x = b. We can solve this system without applying row operations as follows. Let U x = z,
where z = [z1 z2 z3 z4 ]T . We can solve Lz = b by using forward substitution.
The equation Lz = b is equivalent to the system

z1 = 3
−z1 + z2 = 1
z2 + z3 = 1
z1 + z4 = 3.

The first equation shows that z1 = 3. Substituting into the second equation gives us z2 = 4. Using
this information in the third equation yields z3 = −3, and then the fourth equation shows that
z4 = 0. To return to the original system, since U x = z, we now solve this system to find the

solution vector x. In this case, since U is upper triangular, we use back substitution. The equation
U x = z is equivalent to the system

x1 + x3 = 3
x2 + 3x3 − 2x4 = 4
3x4 = −3.

Note that the third column of U is not a pivot column, so x3 is a free variable. The last equation
shows that x4 = −1. Substituting into the second equation and solving for x2 yields x2 = 2 − 3x3 .
The first equation then gives us x1 = 3 − x3 . So the general solution
   
$$\mathbf{x} = \begin{bmatrix} 3 \\ 2 \\ 0 \\ -1 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ -3 \\ 1 \\ 0 \end{bmatrix}$$

to Ax = b can be found through L and U via forward and backward substitution. If we can find a
factorization of a matrix A into a lower triangular matrix L and an upper triangular matrix U , then
A = LU is called an LU -factorization or LU -decomposition.
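
To make the forward and back substitution steps concrete, here is a short computational sketch using the L, U, and b from the discussion above. Python with NumPy is an assumption of convenience, and the free variable x3 is set to 0 to produce one particular solution.

    import numpy as np

    L = np.array([[ 1., 0., 0., 0.],
                  [-1., 1., 0., 0.],
                  [ 0., 1., 1., 0.],
                  [ 1., 0., 0., 1.]])
    U = np.array([[1., 0., 1.,  0.],
                  [0., 1., 3., -2.],
                  [0., 0., 0.,  3.],
                  [0., 0., 0.,  0.]])
    b = np.array([3., 1., 1., 3.])

    # Forward substitution for L z = b (L has 1's on its diagonal).
    z = np.zeros(4)
    for i in range(4):
        z[i] = b[i] - L[i, :i] @ z[:i]
    print(z)                              # [ 3.  4. -3.  0.]

    # Back substitution by hand; U is singular here, so x3 is free (take x3 = 0).
    x3 = 0.0
    x4 = z[2] / 3.0                       # from 3 x4 = z3
    x2 = z[1] - 3 * x3 + 2 * x4           # from x2 + 3 x3 - 2 x4 = z2
    x1 = z[0] - x3                        # from x1 + x3 = z1
    x = np.array([x1, x2, x3, x4])
    print(x)                              # [ 3.  2.  0. -1.]
    print(L @ U @ x)                      # reproduces b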
We can use elementary matrices to obtain a factorization of certain matrices into products of
lower triangular (the "L" in LU) and upper triangular (the "U" in LU) matrices. We illustrate with
an example. Let
$$A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ -1 & 1 & 2 & -2 \\ 0 & 1 & 3 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$
Our goal is to find an upper triangular matrix U and a lower triangular matrix L so that A = LU .
We begin by row reducing A to an upper triangular matrix, keeping track of the elementary matrices
used to perform the row operations. We start by replacing the entries below the (1, 1) entry in A
with zeros. The elementary matrices that perform these operations are
   
$$E_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad E_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix},$$
and
$$E_2 E_1 A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 3 & -2 \\ 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
We next zero out the entries below the (2, 2) entry as
 
$$E_3 E_2 E_1 A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

where
$$E_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
The product E3 E2 E1 A is an upper triangular matrix U . So we have

E3 E2 E1 A = U

and
A = E1−1 E2−1 E3−1 U,

where
$$E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}$$
is a lower triangular matrix L. So we have decomposed the matrix A into a product A = LU , where
L is lower triangular and U is upper triangular. Since every matrix is row equivalent to a matrix in
row echelon form, we can always find an upper triangular matrix U in this way. However, we may
not always obtain a corresponding lower triangular matrix, as the next example illustrates.
Suppose we change the problem slightly and consider the matrix
 
$$B = \begin{bmatrix} 1 & 0 & 1 & 0 \\ -1 & 1 & 2 & -2 \\ 0 & 1 & 3 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}.$$

Using the same elementary matrices E1 , E2 , and E3 as earlier, we have


 
$$E_3 E_2 E_1 B = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & -1 & 1 \end{bmatrix}.$$

To reduce B to row-echelon form now requires a row interchange. Letting


 
$$E_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

brings us to
 
$$E_4 E_3 E_2 E_1 B = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix}.$$

So in this case we have U = E4 E3 E2 E1 B, but


 
$$E_1^{-1} E_2^{-1} E_3^{-1} E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$

is not lower triangular. The difference in this latter example is that we needed a row swap to obtain
the upper triangular form.
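
In practice, software libraries handle this situation by computing a permuted LU factorization, often written PA = LU or A = PLU, where the permutation matrix P records the row swaps. As a brief illustration (SciPy is assumed here; the text itself does not depend on it), applied to the matrix B above:

    import numpy as np
    from scipy.linalg import lu

    B = np.array([[ 1., 0., 1.,  0.],
                  [-1., 1., 2., -2.],
                  [ 0., 1., 3.,  1.],
                  [ 1., 0., 0.,  1.]])

    # scipy.linalg.lu returns P, L, U with B = P @ L @ U.
    P, L, U = lu(B)
    print(np.allclose(B, P @ L @ U))      # True
    print(P)                              # a permutation matrix recording the row exchange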

Examples

What follows are worked examples that use the concepts from this section.

Example 21.8.

(a) If A, B are n × n matrices with det(A) = 3 and det(B) = 2, evaluate the following
determinant values. Briefly justify.

i. det(A−1 )
ii. det(ABAT )
iii. det(A3 (BA)−1 (AB)2 )
 
(b) If the determinant of $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ is m, find the determinant of each of the following
matrices.
 
i. $\begin{bmatrix} a & b & c \\ 2d & 2e & 2f \\ g & h & i \end{bmatrix}$

ii. $\begin{bmatrix} d & e & f \\ g & h & i \\ a & b & c \end{bmatrix}$

iii. $\begin{bmatrix} a & b & c \\ g - 2d & h - 2e & i - 2f \\ a + d & b + e & c + f \end{bmatrix}$

Example Solution.

(a) Assume that det(A) = 3 and det(B) = 2.

i. Since det(A) ≠ 0, we know that A is invertible. Since 1 = det(In ) = det(AA−1 ) =
det(A) det(A−1 ), it follows that det(A−1 ) = 1/det(A) = 1/3.

ii. We know that det(AT ) = det(A), so

det(ABAT ) = det(A) det(B) det(AT )


= det(A) det(B) det(A)
= (3)(2)(3)
= 18.

iii. Using properties of determinants gives us

$$\begin{aligned}
\det(A^3 (BA)^{-1} (AB)^2) &= \det(A^3)\det((BA)^{-1})\det((AB)^2) \\
&= (\det(A))^3 \left(\frac{1}{\det(AB)}\right)(\det(AB))^2 \\
&= 27 \left(\frac{1}{\det(A)\det(B)}\right)(\det(A)\det(B))^2 \\
&= \frac{(27)(6^2)}{6} \\
&= 162.
\end{aligned}$$
 
(b) Assume that $\det\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = m$.

i. Multiplying a row by a scalar multiplies the determinant by that scalar, so
$$\det \begin{bmatrix} a & b & c \\ 2d & 2e & 2f \\ g & h & i \end{bmatrix} = 2m.$$

ii. Interchanging two rows multiplies the determinant by −1. It takes two row swaps in
the original matrix to obtain this one, so
$$\det \begin{bmatrix} d & e & f \\ g & h & i \\ a & b & c \end{bmatrix} = (-1)^2 m = m.$$

iii. Adding a multiple of one row to another does not change the determinant of the matrix.
Since, in addition to such row replacements, one row swap is needed to obtain this
matrix from the original, we have
$$\det \begin{bmatrix} a & b & c \\ g - 2d & h - 2e & i - 2f \\ a + d & b + e & c + f \end{bmatrix} = -m.$$
 
Example 21.9. Let
$$A = \begin{bmatrix} 2 & 8 & 0 \\ 2 & 2 & -3 \\ 1 & 2 & 7 \end{bmatrix}.$$

(a) Find an LU factorization for A.



(b) Use the LU factorization with forward substitution and back substitution to solve the system
Ax = [18 3 12]T .

Example Solution.

(a) We row reduce A to an upper triangular matrix by applying elementary matrices. First
notice that if
$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{then} \quad E_1 A = \begin{bmatrix} 2 & 8 & 0 \\ 0 & -6 & -3 \\ 1 & 2 & 7 \end{bmatrix}.$$
 
Letting
$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\frac{1}{2} & 0 & 1 \end{bmatrix} \quad \text{gives us} \quad E_2 E_1 A = \begin{bmatrix} 2 & 8 & 0 \\ 0 & -6 & -3 \\ 0 & -2 & 7 \end{bmatrix}.$$
 
Finally, when
$$E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\frac{1}{3} & 1 \end{bmatrix} \quad \text{we have} \quad U = E_3 E_2 E_1 A = \begin{bmatrix} 2 & 8 & 0 \\ 0 & -6 & -3 \\ 0 & 0 & 8 \end{bmatrix}.$$

This gives us E3 E2 E1 A = U , so we can take
$$L = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \frac{1}{2} & \frac{1}{3} & 1 \end{bmatrix}.$$

(b) To solve the system Ax = b, where b = [18 3 12]T , we use the LU factorization of A
and solve LU x = b. Let x = [x1 x2 x3 ]T and let z = [z1 z2 z3 ]T with U x = z so
that Lz = L(U x) = Ax = b. First we solve Lz = [18 3 12]T to find z using forward
substitution. The first row of L shows that z1 = 18 and the second row that z1 + z2 = 3.
So z2 = −15. The third row of L gives us $\frac{1}{2}z_1 + \frac{1}{3}z_2 + z_3 = 12$, so z3 = 12 − 9 + 5 = 8.
Now to find x we solve U x = z using back substitution. The third row of U tells us that
8x3 = 8 or that x3 = 1. The second row of U shows that −6x2 − 3x3 = −15 or x2 = 2.
Finally, the first row of U gives us 2x1 + 8x2 = 18, or x1 = 1. So the solution to Ax = b
is x = [1 2 1]T .
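
A brief numerical check of Example 21.9 (again a NumPy sketch assumed only for verification) confirms both the factorization and the solution; here np.linalg.solve stands in for the forward and back substitution carried out by hand above.

    import numpy as np

    A = np.array([[2., 8.,  0.],
                  [2., 2., -3.],
                  [1., 2.,  7.]])
    L = np.array([[1.,   0.,     0.],
                  [1.,   1.,     0.],
                  [0.5,  1./3.,  1.]])
    U = np.array([[2.,  8.,  0.],
                  [0., -6., -3.],
                  [0.,  0.,  8.]])
    b = np.array([18., 3., 12.])

    print(np.allclose(A, L @ U))          # True: A = LU
    z = np.linalg.solve(L, b)             # plays the role of forward substitution
    x = np.linalg.solve(U, z)             # plays the role of back substitution
    print(z)                              # [ 18. -15.   8.]
    print(x)                              # [1. 2. 1.]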

Summary

• The elementary row operations have the following effects on the determinant:
(a) If we multiply a row of a matrix by a constant k, then the determinant is multiplied
by k.
(b) If we swap two rows of a matrix, then the determinant changes sign.
(c) If we add a multiple of a row of a matrix to another, the determinant does not change.

• Each of the elementary row operations can be achieved by multiplication by elementary ma-
trices. To obtain the elementary matrix corresponding to an elementary row operation, we
perform the operation on the identity matrix.

• Let A be an n × n invertible matrix. For any b in Rn , the solution x of Ax = b has entries

$$x_i = \frac{\det(A_i(\mathbf{b}))}{\det(A)}$$

where Ai(b) represents the matrix formed by replacing the ith column of A with b.

• Let A be an invertible n × n matrix. Then
$$A^{-1} = \frac{1}{\det(A)} \operatorname{adj} A,$$
where the matrix adj A, the adjugate of A, is defined as the matrix whose ij-th entry is Cji ,
the ji-th cofactor of A.

• For a 2 × 2 matrix A, the area of the image of the unit square under the transformation
T (x) = Ax is equal to | det(A)|, which is also equal to the area of the parallelogram defined
by the columns of A.

• For a 3 × 3 matrix A, the volume of the image of the unit cube under the transformation
T (x) = Ax is equal to | det(A)|, which is also equal to the volume of the parallelepiped
defined by the columns of A.

Exercises

(1) Find a formula for det(rA) in terms of r and det(A), where A is an n × n matrix and r is a
scalar. Explain why your formula is valid.

(2) Find det(A) by hand using elementary row operations where
$$A = \begin{bmatrix} 1 & 2 & -1 & 3 \\ -1 & -2 & 3 & -1 \\ -2 & -1 & 2 & -3 \\ 1 & 8 & -3 & 8 \end{bmatrix}.$$

 
(3) Consider the matrix
$$A = \begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 \\ -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & 4 \end{bmatrix}.$$
We will find det(A) using elementary
row operations. (This matrix arises in graph theory, and its determinant gives the number of
spanning trees in the complete graph with 5 vertices. This number is also equal to the number
of labeled trees with 5 vertices.)
(a) Add rows R2 , R3 and R4 to the first row in that order.
(b) Then add the new R1 to rows R2 , R3 and R4 to get a triangular matrix B.
(c) Find the determinant of B. Then use det(B) and properties of how elementary row
operations affect determinants to find det(A).
(d) Generalize your work to find the determinant of the n × n matrix
$$A = \begin{bmatrix} n & -1 & -1 & \cdots & -1 & -1 \\ -1 & n & -1 & \cdots & -1 & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ -1 & -1 & -1 & \cdots & -1 & n \end{bmatrix}.$$

(4) For which matrices A, if any, is det(A) = − det(−A)? Justify your answer.
 
(5) Find the inverse A−1 of $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$ using the adjugate matrix.
(6) For an invertible n × n matrix A, what is the relationship between det(A) and det(adj A)?
Justify your result.
 
(7) Let $A = \begin{bmatrix} a & b & 1 \\ c & d & 2 \\ e & f & 3 \end{bmatrix}$, and assume that det(A) = 2. Determine the determinants of each of
the following.

(a) $B = \begin{bmatrix} a & b & 1 \\ 3c & 3d & 6 \\ e + a & f + b & 4 \end{bmatrix}$

(b) $C = \begin{bmatrix} 2e & 2f & 6 \\ 2c - 2e & 2d - 2f & -2 \\ 2a & 2b & 2 \end{bmatrix}$
(8) Find the area of the parallelogram with one vertex at the origin and adjacent vertices at (1, 2)
and (a, b). For which (a, b) is the area 0? When does this happen geometrically?
(9) Find the volume of the parallelepiped with one vertex at the origin and three adjacent vertices
at (3, 2, 0), (1, 1, 1) and (1, 3, c) where c is unknown. For which c is the volume 0? When
does this happen geometrically?
(10) Label each of the following statements as True or False. Provide justification for your re-
sponse.

(a) True/False If two rows are equal in A, then det(A) = 0.


(b) True/False If A is a square matrix and R is a row echelon form of A, then det(A) =
det(R).
(c) True/False If a matrix A is invertible, then 0 is not an eigenvalue of A.
(d) True/False If A is a 2 × 2 matrix for which the image of the unit square under the
transformation T (x) = Ax has zero area, then A is non-invertible.
(e) True/False Row operations do not change the determinant of a square matrix.
(f) True/False If Aij is the matrix obtained from a square matrix A = [aij ] by deleting
the ith row and jth column of A, then

ai1 (−1)i+1 det(Ai1 ) + ai2 (−1)i+2 det(Ai2 ) + · · · + ain (−1)i+n det(Ain )
= a1j (−1)1+j det(A1j ) + a2j (−1)2+j det(A2j ) + · · · + anj (−1)n+j det(Anj )

for any i and j between 1 and n.


(g) True/False If A is an invertible matrix, then det AT A > 0.

