Section 16
The Determinant
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.
Consider the problem of finding the area of a parallelogram determined by two vectors u and v, as illustrated at left in Figure 16.1. We could calculate this area, for example, by breaking up the parallelogram into two triangles and a rectangle and finding the area of each. Now consider the problem of calculating the volume of the three-dimensional analog (called a parallelepiped) determined by three vectors u, v, and w, as illustrated at right in Figure 16.1.

Figure 16.1: A parallelogram determined by u and v (left) and a parallelepiped determined by u, v, and w (right).
It is quite a bit more difficult to break this parallelepiped into subregions whose volumes are
easy to compute. However, all of these computations can be made quickly by using determinants.
The details are later in this section.
Introduction
We now consider how we can use the determinant to find eigenvalues and other information about
the invertibility of a matrix.

Preview Activity 16.1.
(1) Let A = [1 2; 2 4]. Find det(A) by hand. What does this mean about the matrix A? Can you confirm this with other methods?
(2) One of the eigenvalues of A = [1 3; 2 2] is λ = 4. Recall that we can rewrite the matrix equation Ax = 4x in the form (A − 4I2)x = 0. What must be true about A − 4I2 in order for 4 to be an eigenvalue of A? How does this relate to det(A − 4I2)?
(3) Another eigenvalue of A = [1 3; 2 2] is λ = −1. What must be true about A + I2 in order for −1 to be an eigenvalue of A? How does this relate to det(A + I2)?
(4) To find the eigenvalues of the matrix A = [3 2; 2 6], we rewrite the equation Ax = λx as (A − λI2)x = 0. The coefficient matrix of this last system has the form

A − λI2 = [3−λ 2; 2 6−λ].

The determinant of this matrix is a quadratic expression in λ. Since the eigenvalues will occur when the determinant is 0, we need to solve a quadratic equation. Find the resulting eigenvalues. (Note: One of the eigenvalues is 2.)
(5) Can you explain why a 2 × 2 matrix can have at most two eigenvalues?
Around 1900, determinants were deemed much more important than they are today. In fact, determinants were used even before matrices. According to Tucker¹, determinants (not matrices) developed out of the study of coefficients of systems of linear equations and were used by Leibniz 150 years before the term matrix was coined by J. J. Sylvester in 1848. Even though determinants are not as important as they once were, the determinant of a matrix is still a useful quantity. We saw in Preview Activity 16.1 that the determinant of a matrix tells us whether the matrix is invertible and can help us find eigenvalues. In this section, we will see how to find the determinant of a matrix of any size and how to use the determinant to find the eigenvalues.
The determinant of a 2 × 2 matrix A = [a b; c d] is det(A) = ad − bc. The matrix A is invertible if and only if det(A) ≠ 0. We will use a recursive approach to find the determinants of larger matrices, building from the 2 × 2 determinants. We present the result in the 3 × 3 case here; a more detailed analysis can be found at the end of this section.
To find the determinant of a 3 × 3 matrix A = [a11 a12 a13; a21 a22 a23; a31 a32 a33], we will use the determinants of three 2 × 2 matrices. More specifically, the determinant of A, denoted det(A), is the quantity

a11 det([a22 a23; a32 a33]) − a12 det([a21 a23; a31 a33]) + a13 det([a21 a22; a31 a32]).   (16.1)
This sum is called a cofactor expansion of the determinant of A. The smaller matrices in this
expansion are obtained by deleting certain rows and columns of the matrix A. In general, when
finding the determinant of an n × n matrix, we find determinants of (n − 1) × (n − 1) matrices,
which we can again reduce to smaller matrices to calculate.
We will use the specific matrix

A = [1 2 0; 1 4 3; 2 2 1]

as an example to illustrate the cofactor expansion method in general.
• We first pick a row or column of A. We will pick the first row of A for this example.
• For each entry in the row (or column) we choose, in this case the first row, we will calculate
the determinant of a smaller matrix obtained by removing the row and the column the entry
is in. Let Aij be the smaller matrix found by deleting the ith row and jth column of A. For
entry a11, we find the matrix A11 obtained by removing the first row and first column:

A11 = [4 3; 2 1].

For entry a12, we find

A12 = [1 3; 2 1].
¹Tucker, Alan. (1993). The Growing Importance of Linear Algebra in Undergraduate Mathematics. The College Mathematics Journal, 24(1), 3–9.
• Notice that in the 3 × 3 determinant formula in (16.1) above, the middle term has a minus sign. The signs of the terms in the cofactor expansion alternate within each row and each column. More specifically, the sign of the term in the ith row and jth column is (−1)^(i+j). We then obtain the following pattern of signs within each row and column:

+ − + · · ·
− + − · · ·
+ − + · · ·
⋮

In particular, the sign factor for a11 is (−1)^(1+1) = 1, for a12 it is (−1)^(1+2) = −1, and for a13 it is (−1)^(1+3) = 1.
• For each entry aij in the row (or column) of A we chose, we multiply the entry aij by the determinant of Aij and by the sign (−1)^(i+j). In this case, we obtain the following numbers:

a11 (−1)^(1+1) det(A11) = 1 · det([4 3; 2 1]) = 1(4 − 6) = −2,

a12 (−1)^(1+2) det(A12) = −2 · det([1 3; 2 1]) = −2(1 − 6) = 10.

Note that in the last calculation, since a13 = 0, the third term is 0 and we did not have to evaluate det(A13).

• Adding these results gives the determinant: det(A) = −2 + 10 + 0 = 8.
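If you would like to check computations like this with technology, the following Python sketch (using NumPy, an assumed tool rather than one prescribed by this text) carries out a cofactor expansion along the first row and compares the result with a built-in determinant routine.

```python
import numpy as np

def det_cofactor(M):
    """Determinant by cofactor expansion along the first row (for illustration)."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]
    if n == 1:
        return M[0, 0]
    total = 0.0
    for j in range(n):
        if M[0, j] == 0:
            continue  # a zero entry contributes nothing, as in the example above
        minor = np.delete(np.delete(M, 0, axis=0), j, axis=1)  # remove row 1 and column j+1
        total += (-1) ** j * M[0, j] * det_cofactor(minor)
    return total

A = np.array([[1, 2, 0],
              [1, 4, 3],
              [2, 2, 1]])
print(det_cofactor(A))    # 8.0
print(np.linalg.det(A))   # 8.0 (up to roundoff)
```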
Cofactors
We will now define the determinant of a general n × n matrix A in terms of a cofactor expansion
as we did in the 3 × 3 case. To do so, we need some notation and terminology.
• We let Aij be the submatrix of A = [aij ] found by deleting the ith row and jth column of A.
The determinant of Aij is called the ijth minor of A or the minor corresponding to the entry
aij .
• Notice that in the 3 × 3 case, we used the opposite of the 1,2 minor in the sum. It will be the case that the terms in the cofactor expansion alternate in sign. We can make the signs in the sum alternate by taking −1 to an appropriate power. As a result, we define the ijth cofactor Cij of A as

Cij = (−1)^(i+j) det(Aij).

With this notation, the determinant of the n × n matrix A = [aij] is defined to be

det(A) = a11 C11 + a12 C12 + · · · + a1n C1n,

where Cij = (−1)^(i+j) det(Aij) is the ijth cofactor of A and Aij is the matrix obtained by removing row i and column j of matrix A.
This method for computing determinants is called the cofactor expansion or Laplace expansion
of A along the 1st row. The cofactor expansion reduces the computation of the determinant of an
n × n matrix to n computations of determinants of (n − 1) × (n − 1) matrices. These smaller
matrices can be reduced again using cofactor expansions, so it can be a long and grueling process
for large matrices. It turns out that we can actually take this expansion along any row or column of
the matrix (a proof of this fact is given in Section 21). For example, the cofactor expansion along
the 2nd row is
det(A) = a21 C21 + a22 C22 + · · · + a2n C2n
and along the 3rd column the formula is

det(A) = a13 C13 + a23 C23 + · · · + an3 Cn3.
Note that when finding a cofactor expansion, choosing a row or column with many zeros makes
calculations easier.
Activity 16.1.
(a) Let A = [1 2 −1; −2 0 4; 6 3 0]. Use the cofactor expansion along the first row to calculate the determinant of A by hand.
(b) Calculate det(A) by using a cofactor expansion along the second row, where A = [1 4 2; 0 2 0; 2 5 3].
(c) Calculate the determinant of [1 −2 3; 0 4 −3; 0 0 8].
(d) Which determinant property can be used to calculate the determinant in part (c)? Explain
how. (Determinant properties are included below for easy reference.)
(e) Consider the matrix A = [1 1 2; 0 2 1; 1 2 2]. Let B be the matrix which results when c times row 1 is added to row 2 of A. Evaluate the determinant of B by hand to check that it is equal to the determinant of A, which verifies one other determinant property (in a specific case).
As with any new idea, like the determinant, we must ask what properties are satisfied. We state
the following theorem without proof for the time being. For the interested reader, the proof of many
of these properties is given in Section 21 and others in the exercises.
(1) det(AB) = det(A) · det(B), and in particular det(A^k) = (det A)^k for any positive integer k.
(6) If A is upper/lower triangular, then det(A) is the product of the entries on the diagonal.
(7) The determinant of a matrix is the product of the eigenvalues, with each eigenvalue repeated
as many times as its multiplicity.
(8) The elementary row operations have the following effects on the determinant:

• Adding a multiple of a row to another row does NOT change the determinant of the matrix.
• Multiplying a row by a constant multiplies the determinant by the same constant.
• Row swapping multiplies the determinant by (−1).
(9) If the row echelon form U of A is obtained only by adding multiples of one row to another and by swapping rows, then det(A) is equal to det(U) multiplied by (−1)^r, where r is the number of row swaps performed during the row reduction.
Note that if we were to find the determinant of a 4 × 4 matrix using the cofactor method, we would calculate the determinants of 4 matrices of size 3 × 3, each of which requires 3 determinant calculations of its own. So we would need a total of 12 calculations of determinants of 2 × 2 matrices.
That is a lot of calculations. There are other, more efficient, methods for calculating determinants.
For example, we can row reduce the matrix, keeping track of the effect that each row operation has
on the determinant.
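As one hedged illustration of this idea, the Python/NumPy sketch below computes a determinant by row reducing while counting row swaps; the function name det_by_row_reduction is ours, chosen only for this example.

```python
import numpy as np

def det_by_row_reduction(A):
    """Determinant via row reduction: row swaps flip the sign, and adding a
    multiple of one row to another leaves the determinant unchanged, so the
    determinant is (-1)^(number of swaps) times the product of the pivots."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    swaps = 0
    for k in range(n):
        p = k + np.argmax(np.abs(U[k:, k]))   # choose the largest pivot (for stability)
        if U[p, k] == 0:
            return 0.0                        # no pivot in this column
        if p != k:
            U[[k, p]] = U[[p, k]]
            swaps += 1
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return (-1) ** swaps * np.prod(np.diag(U))

A = np.array([[1, 2, 0], [1, 4, 3], [2, 2, 1]])
print(det_by_row_reduction(A))   # 8.0
```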
Earlier we defined the determinant of a 3 × 3 matrix. In this section we endeavor to understand the
motivation behind that definition.
We will repeat the process we went through in the 2×2 case to see how to define the determinant
of a 3 × 3 matrix. Let
A = [a11 a12 a13; a21 a22 a23; a31 a32 a33].

To find the inverse of A we augment A by the 3 × 3 identity matrix,

[A | I3] = [a11 a12 a13 1 0 0; a21 a22 a23 0 1 0; a31 a32 a33 0 0 1],
and row reduce the matrix (using appropriate technology) to obtain
[I3 | A^(−1)], where

A^(−1) = (1/d) [ a33a22 − a32a23   −(a33a12 − a32a13)   −a13a22 + a12a23 ;
                 −(a33a21 − a31a23)   a33a11 − a31a13   −(a23a11 − a21a13) ;
                 −a31a22 + a32a21   −(a32a11 − a31a12)   a22a11 − a21a12 ],
where

d = a33a11a22 − a33a21a12 − a31a13a22 − a32a11a23 + a32a21a13 + a31a12a23.   (16.2)
In this case, we can see that the inverse of the 3 × 3 matrix A will be defined if and only if d ≠ 0. So, in the 3 × 3 case, the determinant of A will be given by the value of d in Equation (16.2).
What remains is for us to see how this is related to determinants of 2 × 2 sub-matrices of A.
To start, we collect all terms involving a11 in d. A little algebra shows that
det(A) = a11 (a33 a22 − a32 a23 ) − a33 a21 a12 − a31 a13 a22 + a32 a21 a13 + a31 a12 a23 .
• Notice that

a33a22 − a32a23 = det([a22 a23; a32 a33]),

the determinant of the 2 × 2 matrix obtained from A by deleting the first row and first column.
Putting this all together gives us formula (16.1) for the determinant of a 3 × 3 matrix as we
defined earlier.
There are useful ways to remember how to calculate the formulas for determinants of 2 × 2 and 3 × 3 matrices. In the 2 × 2 case of A = [a11 a12; a21 a22], we saw that

|A| = a11a22 − a12a21.
This makes |A| the product of the diagonal elements a11 and a22 minus the product of the off-
diagonal elements a12 and a21 . We can visualize this in an array by drawing arrows across the
diagonal and off-diagonal, with a plus sign on the diagonal arrow indicating that we add the product
of the diagonal elements and a minus sign on the off-diagonal arrow indicating that we subtract the
product of the off-diagonal elements as shown in Figure 16.2.
Figure 16.2: A diagram to remember the 2 × 2 determinant.
We can do a similar thing for the determinant of a 3 × 3 matrix. In this case, we extend the
3 × 3 array to a 3 × 5 array by adjoining the first two columns onto the matrix. We then add the
products along the diagonals going from left to right and subtract the products along the diagonals
going from right to left as indicated in Figure 16.3.
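The diagonal-products device just described can be written out directly. The following Python sketch (an illustration assuming NumPy; the function name is ours) adds the three left-to-right diagonal products and subtracts the three right-to-left products, and compares the result with numpy.linalg.det.

```python
import numpy as np

def det3_by_diagonals(M):
    """3 x 3 determinant from the diagram: add the three left-to-right diagonal
    products and subtract the three right-to-left diagonal products."""
    a = np.asarray(M, dtype=float)
    plus  = a[0, 0]*a[1, 1]*a[2, 2] + a[0, 1]*a[1, 2]*a[2, 0] + a[0, 2]*a[1, 0]*a[2, 1]
    minus = a[0, 2]*a[1, 1]*a[2, 0] + a[0, 0]*a[1, 2]*a[2, 1] + a[0, 1]*a[1, 0]*a[2, 2]
    return plus - minus

A = np.array([[1, 2, 0], [1, 4, 3], [2, 2, 1]])
print(det3_by_diagonals(A), np.linalg.det(A))   # 8.0  8.0 (up to roundoff)
```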
Examples
What follows are worked examples that use the concepts from this section.
Example. Find the determinant of each of the following matrices.

(a) A = [3 6 2; 0 4 −1; 5 0 1]

(b) A = [3 0 1 1; 2 1 2 1; 1 −2 2 −1; −3 2 3 1]
Example Solution.
(a) With a 3 × 3 matrix, we will find the sub-matrices A11 , A12 , and A13 . Recall that Aij is
the sub-matrix of A obtained by deleting the ith row and jth column of A. Thus,
A11 = [4 −1; 0 1],   A12 = [0 −1; 5 1],   and   A13 = [0 4; 5 0].
Then
det(A) = a11 C11 + a12 C12 + a13 C13 = (3)(4) + (6)(−5) + (2)(−20) = −58.
(b) With a 4 × 4 matrix, we will find the sub-matrices A11 , A12 , A13 , and A14 . We see that
A11 = [1 2 1; −2 2 −1; 2 3 1],

A12 = [2 2 1; 1 2 −1; −3 3 1],

A13 = [2 1 1; 1 −2 −1; −3 2 1],

A14 = [2 1 2; 1 −2 2; −3 2 3].
To calculate the ijth cofactor Cij = (−1)i+j det(Aij ), we need to calculate the determi-
nants of the A1j . Using the device for calculating the determinant of a 3 × 3 matrix we
have that
det(A11) = det([1 2 1; −2 2 −1; 2 3 1])
         = (1)(2)(1) + (2)(−1)(2) + (1)(−2)(3) − (1)(2)(2) − (1)(−1)(3) − (2)(−2)(1)
         = −5,
det(A12) = det([2 2 1; 1 2 −1; −3 3 1])
         = (2)(2)(1) + (2)(−1)(−3) + (1)(1)(3) − (1)(2)(−3) − (2)(−1)(3) − (2)(1)(1)
         = 23,
det(A13) = det([2 1 1; 1 −2 −1; −3 2 1])
         = (2)(−2)(1) + (1)(−1)(−3) + (1)(1)(2) − (1)(−2)(−3) − (2)(−1)(2) − (1)(1)(1)
         = −2,

and

det(A14) = det([2 1 2; 1 −2 2; −3 2 3])
         = (2)(−2)(3) + (1)(2)(−3) + (2)(1)(2) − (2)(−2)(−3) − (2)(2)(2) − (1)(1)(3)
         = −37.
Then

C11 = (−1)^(1+1) det(A11) = −5,   C12 = (−1)^(1+2) det(A12) = −23,
C13 = (−1)^(1+3) det(A13) = −2,   C14 = (−1)^(1+4) det(A14) = 37,

and so

det(A) = a11C11 + a12C12 + a13C13 + a14C14 = (3)(−5) + (0)(−23) + (1)(−2) + (1)(37) = 20.
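For readers who want to double-check the arithmetic in this example with technology, a short NumPy computation of the same minors and cofactor expansion might look like the following sketch.

```python
import numpy as np

A = np.array([[ 3,  0,  1,  1],
              [ 2,  1,  2,  1],
              [ 1, -2,  2, -1],
              [-3,  2,  3,  1]])

# determinants of the submatrices A_1j (delete row 1 and column j)
minors = [np.linalg.det(np.delete(np.delete(A, 0, axis=0), j, axis=1)) for j in range(4)]
cofactors = [(-1) ** j * m for j, m in enumerate(minors)]
det_A = sum(A[0, j] * cofactors[j] for j in range(4))

print(np.round(minors, 4))                          # [ -5.  23.  -2. -37.]
print(round(det_A, 4), round(np.linalg.det(A), 4))  # 20.0  20.0
```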
Example. Show that det(AB) = det(A) det(B) for any 2 × 2 matrices A and B.

Example Solution.

Let A = [a11 a12; a21 a22] and B = [b11 b12; b21 b22]. Then

AB = [a11b11 + a12b21   a11b12 + a12b22; a21b11 + a22b21   a21b12 + a22b22].
So

det(AB) = (a11b11 + a12b21)(a21b12 + a22b22) − (a11b12 + a12b22)(a21b11 + a22b21)
        = a11a22b11b22 − a11a22b12b21 − a12a21b11b22 + a12a21b12b21.

Also,

det(A) det(B) = (a11a22 − a12a21)(b11b22 − b12b21)
              = a11a22b11b22 − a11a22b12b21 − a12a21b11b22 + a12a21b12b21.

The two expressions are equal, so det(AB) = det(A) det(B).
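A quick numerical spot check of this identity (not a proof, and only a sketch assuming NumPy is available) can be run on randomly chosen matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(2, 2)).astype(float)
B = rng.integers(-5, 6, size=(2, 2)).astype(float)

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(np.isclose(lhs, rhs))   # True (up to floating-point roundoff)
```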
Summary
• The determinant of an n × n matrix A = [aij] can be computed by a cofactor expansion along the first row:

det(A) = a11C11 + a12C12 + · · · + a1nC1n,

where
– Aij is the sub-matrix of A found by deleting the ith row and jth column of A.
– Cij = (−1)i+j det (Aij ) is the ijth cofactor of A.
• The matrix A is invertible if and only if det(A) 6= 0.
Exercises
(1) Use the cofactor expansion to explain why multiplying each of the entries of a 3 × 3 matrix
A by 2 multiplies the determinant of A by 8.
(2) Use the determinant criterion to determine for which c the matrix A = [1 1 2; 1 0 c; 2 −1 2] is invertible.
(3) Let A be a square matrix.
(a) Explain why det(A^2) = [det(A)]^2.
(b) Expand on the argument from (a) to explain why det(A^k) = [det(A)]^k for any positive integer k.
(c) Suppose that A is an invertible matrix and k is a positive integer. Must A^k be an invertible matrix? Why or why not?
(4) Let A be an invertible matrix. Explain why det(A^(−1)) = 1/det(A) using determinant properties.
(5) Simplify the following determinant expression using determinant properties:
det(P A^4 P^(−1) A^T (A^(−1))^3)
(6) Find the eigenvalues of the following matrices. Find a basis for and the dimension of each
eigenspace.
(a) A = [1 1 1; 1 1 1; 1 1 1]

(b) A = [2 0 3; 0 1 0; 0 1 2]
(7) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False For any two n × n matrices A and B, det(A + B) = det A + det B.
(b) True/False For any square matrix A, det(−A) = − det(A).
(c) True/False For any square matrix A, det(−A) = det(A).
(d) True/False The determinant of a square matrix with all non-zero entries is non-zero.
(e) True/False If the determinant of A is non-zero, then so is the determinant of A^2.
(f) True/False If the determinant of a matrix A is 0, then one of the rows of A is a linear
combination of the other rows.
(g) True/False For any square matrix A, det(A^2) > det(A).
(h) True/False If A and B are n × n matrices and AB is invertible, then A and B are
invertible.
(i) True/False If A^2 is the zero matrix, then the only eigenvalue of A is 0.
(j) True/False If 0 is an eigenvalue of A, then 0 is an eigenvalue of AB for any B of the
same size as A.
(k) True/False Suppose A is a 3 × 3 matrix. Then any three eigenvectors of A will form
a basis of R3 .
The approach we will take to connecting area (volume) to the determinant will help shed light on
properties of the determinant that we will discuss from an algebraic perspective in a later section.
First, we mention some basic properties of area (we focus on area for now, but these same properties
are valid for volumes as well). As a shorthand, we denote the area of a region R by
Area(R).
• If two regions R1 and R2 don’t overlap, then the area of the union of the regions is equal
to the sum of the areas of the regions. That is, if R1 ∩ R2 = ∅, then Area(R1 ∪ R2 ) =
Area(R1 ) + Area(R2 ).
• Area is invariant under translation. That is, if we move a geometric region by the same
amount uniformly in a given direction, the area of the original region and the area of the
transformed region are the same. A translation of a region is done by just adding a fixed
vector to each vector in the region. That is, a translation by a vector v is a function Tv such
that the image Tv (R) of a region R is defined as
Tv (R) = {r + v : r ∈ R}.
Now we turn our attention to areas of parallelograms. Let u and v be vectors in R2 . The
parallelogram P (u, v) defined by u and v with point Q as basepoint is the set
P(u, v) = {OQ + ru + sv : 0 ≤ r, s ≤ 1}.
Figure 16.4: Translating the basepoint Q to the origin O and rotating by an angle θ, which takes u to u′ and v to v′.
If u = [u1 u2]T and v = [v1 v2]T, then we will also represent P(u, v) as P([u1 u2; v1 v2]).
Since area is translation and rotation invariant, we can translate our parallelogram by −OQ to place its basepoint at the origin, then rotate by an angle θ (as shown at left in Figure 16.4). This transforms the vector v to a vector v′ and the vector u to a vector u′ as shown at right in Figure 16.4. With this in mind we can always assume that our parallelograms have one vertex at the origin,
with u along the x-axis, and v in standard position. Now we can investigate how to calculate the
area of a parallelogram.
Project Activity 16.1. There are two situations to consider when we want to find the area of a
parallelogram determined by vectors u and v, both shown in Figure 16.5. The parallelogram will
be determined by the lengths of these vectors.
Figure 16.5: Two cases for the parallelogram P(u, v), with vertices O, A, C, B, height h, and points D and E on the line through u.
(a) In the situation depicted at left in Figure 16.5, use geometry to explain why Area(P (u, v)) =
h|u|. (Hint: What can we say about the triangles ODB and EAC?)
(b) In the situation depicted at right in Figure 16.5, use geometry to again explain why
Area(P (u, v)) = h|u|. (Hint: What can we say about Area(AEC) and Area(ODB)?)
The result of Project Activity 16.1 is that the area of P (u, v) is given by h|u|, where h is the
height of the parallelogram determined by dropping a perpendicular from the terminal point of v to
the line determined by the vector u.
Now we turn to the question of how the determinant is related to area of a parallelogram. Our
approach will use some properties of the area of P (u, v).
Project Activity 16.2. Let u and v be vectors that determine a parallelogram in R2 .
Figure 16.6: The parallelograms P(u, v), P(ku, v), and P(u, v + ku), each with height h.
(c) Finally, use the result of Project Activity 16.1 to explain why
Area(P (u + kv, v)) = Area(P (u, v + ku)) = Area(P (u, v)) (16.5)
for any real number k. A representative picture is shown at right in Figure 16.6.
Properties (16.4) and (16.5) will allow us to calculate the area of the parallelogram determined
by vectors u and v.
Project Activity 16.3. Let u = [u1 u2]T and v = [v1 v2]T. We will now demonstrate that

Area(P(u, v)) = |det([u1 u2; v1 v2])|.
Before we begin, note that if both u1 and v1 are 0, then u and v are parallel. This makes P (u, v) a
line segment and so Area(P(u, v)) = 0. But if u1 = v1 = 0, it is also the case that

det([u1 u2; v1 v2]) = u1v2 − u2v1 = 0
as well. So we can assume that at least one of u1 , v1 is not 0. Since P (u, v) = P (v, u), we can
assume without loss of generality that u1 ≠ 0.
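Before working through the activity, a small numerical experiment can make the claim concrete. The sketch below (a hedged illustration with arbitrarily chosen vectors, assuming NumPy) compares |det([u; v])| with the base-times-height computation from Project Activity 16.1.

```python
import numpy as np

def area_by_det(u, v):
    """Area of P(u, v) as the absolute value of det([u; v])."""
    return abs(np.linalg.det(np.array([u, v], dtype=float)))

u = np.array([3.0, 1.0])
v = np.array([1.0, 2.0])

# geometric check: base |u| times the height h (the length of the part of v
# perpendicular to u), as in Project Activity 16.1
base = np.linalg.norm(u)
height = np.linalg.norm(v - (np.dot(u, v) / np.dot(u, u)) * u)

print(area_by_det(u, v), base * height)   # 5.0  5.0 (up to roundoff)
```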
We can apply the same arguments as above using rotations, translations, shearings, and scalings
to show that the properties of area given above work in any dimension. Given vectors u1 , u2 , . . .,
un in Rn , we let
P(u1, u2, . . . , un) = {OQ + x1u1 + x2u2 + · · · + xnun : 0 ≤ xi ≤ 1 for each i}.
Project Activity 16.4. We now show that Vol(P(u1, u2, u3)) is the absolute value of the determinant of the matrix [u1; u2; u3] whose rows are the vectors u1, u2, and u3. For easier notation, let u = [u1 u2 u3]T, v = [v1 v2 v3]T, and w = [w1 w2 w3]T.
As we argued in the 2-dimensional case, we can assume that all terms that we need to be nonzero
are nonzero, and we can do so without verification.
(a) Explain how property (16.7) shows that Vol(P(u, v, w)) is equal to

Vol(P([u1 u2 u3; 0 (1/u1)(v2u1 − v1u2) (1/u1)(v3u1 − v1u3); 0 (1/u1)(w2u1 − w1u2) (1/u1)(w3u1 − w1u3)])).
(Hint: Think about how these properties are related to row operations.)
(b) Now let v′ = [0 (1/u1)(v2u1 − v1u2) (1/u1)(v3u1 − v1u3)]T and w′ = [0 (1/u1)(w2u1 − w1u2) (1/u1)(w3u1 − w1u3)]T. Explain how property (16.7) shows that Vol(P(u, v, w)) is equal to

Vol(P([u1 u2 u3; 0 (1/u1)(v2u1 − v1u2) (1/u1)(v3u1 − v1u3); 0 0 d])),

where

d = (1/(u1v2 − u2v1)) (u1(v2w3 − v3w2) − u2(v1w3 − v3w1) + u3(v1w2 − v2w1)).
(c) Just as we saw in the 2-dimensional case, we can proceed to use the diagonal entries to eliminate the entries above the diagonal without changing the volume to see that

Vol(P(u, v, w)) = Vol(P([u1 0 0; 0 (1/u1)(v2u1 − v1u2) 0; 0 0 d])).

Since this last parallelepiped is a rectangular box, its volume is the product of the lengths of its sides, so Vol(P(u, v, w)) = x for some constant x. Find the constant and, as a result, find a specific expression for Vol(P(u, v, w)) involving a determinant.
Properties (16.6), (16.7), and (16.8) involve the analogs of row operations on matrices, and we
will prove algebraically that the determinant exhibits the same properties. In fact, the determinant
can be uniquely defined by these properties. So in a sense, the determinant is an area or volume
function.
Section 17
The Characteristic Equation
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.
Pour cream into your cup of coffee and the cream spreads out; straighten up your room and it soon becomes messy again; when gasoline is mixed with air in a car's cylinders, it explodes if a spark is introduced. In each of these cases a transition from a low entropy state (your room is straightened up) to a higher entropy state (a messy, disorganized room) occurs. This can be described by entropy, a measure of the disorder in a system. A low entropy state is organized (like ice cubes) and a higher entropy state is not (like water vapor). It is a fundamental property of nature (as described by the second law of thermodynamics) that the entropy of an isolated system cannot decrease. In other words, in the absence of any external intervention, things never become more organized.
The Ehrenfest model¹ is a Markov process proposed to explain the statistical interpretation
of the second law of thermodynamics using the diffusion of gas molecules. This process can be
modeled as a problem of balls and bins, as we will do later in this section. The characteristic
¹Named after Paul and Tatiana Ehrenfest, who introduced it in "Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem," Physikalische Zeitschrift, vol. 8 (1907), pp. 311–314.
polynomial of the transition matrix will help us find the eigenvalues and allow us to analyze our
model.
Introduction
We have seen that the eigenvalues of an n × n matrix A are the scalars λ so that A − λIn has
a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not
invertible, we can also say that λ is an eigenvalue of A if

det(A − λIn) = 0.

This equation is called the characteristic equation of A. It provides us an algebraic way to find eigenvalues, which can then be used in finding eigenvectors corresponding to each eigenvalue.
Preview Activity 17.1. Suppose we want to find the eigenvalues of A = [1 1; 1 3]. Note that

A − λI2 = [1−λ 1; 1 3−λ],

whose determinant is

det(A − λI2) = (1 − λ)(3 − λ) − 1 = λ^2 − 4λ + 2.

The eigenvalues of A are therefore the solutions of the characteristic equation λ^2 − 4λ + 2 = 0.
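Appropriate technology can confirm a computation like this. In Python with NumPy, for example, np.poly applied to a square array returns the coefficients of its characteristic polynomial and np.linalg.eigvals returns approximate eigenvalues:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 3.0]])

# coefficients of the characteristic polynomial, highest degree first:
# here lambda^2 - 4*lambda + 2
print(np.poly(A))             # [ 1. -4.  2.]
print(np.linalg.eigvals(A))   # roughly 0.586 and 3.414, i.e. 2 -/+ sqrt(2)
```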
(1) For each of the following parts, use the characteristic equation to determine the eigenvalues of
A. Then, for each eigenvalue λ, find a basis of the corresponding eigenspace, i.e., Nul (A −
λI). You might want to recall how to find a basis for the null space of a matrix from Section
13. Also, make sure that your eigenvalue candidate λ yields nonzero eigenvectors in Nul (A−
λI), for otherwise λ will not be an eigenvalue.
(a) A = [2 0; 0 −3]   (b) A = [1 2; 0 1]   (c) A = [1 4; 2 3]
(2) Use your eigenvalue and eigenvector calculations from the above problem as guidance to
answer the following questions about a matrix with real entries.
(a) At most how many eigenvalues can a 2 × 2 matrix have? Is it possible to have no
eigenvalues? Is it possible to have only one eigenvalue? Explain.
(b) If a matrix is an upper-triangular matrix (i.e., all entries below the diagonal are 0’s,
as in the first two matrices of the previous problem), what can you say about its
eigenvalues? Explain.
(c) How many linearly independent eigenvectors can be found for a 2 × 2 matrix? Is it
possible to have a matrix without 2 linearly independent eigenvectors? Explain.
(3) Using the characteristic equation, determine which matrices have 0 as an eigenvalue.
Until now, we have been given eigenvalues or eigenvectors of a matrix and determined eigenvectors
and eigenvalues from the known information. In this section we use determinants to find (or ap-
proximate) the eigenvalues of a matrix. From there we can find (or approximate) the corresponding
eigenvectors. The tool we will use is a polynomial equation, the characteristic equation, of a square
matrix whose roots are the eigenvalues of the matrix. The characteristic equation will then provide
us with an algebraic way of finding the eigenvalues of a square matrix.
We have seen that the eigenvalues of a square matrix A are the scalars λ so that A − λI has a nontrivial null space. Since a matrix has a nontrivial null space if and only if the matrix is not invertible, we can also say that λ is an eigenvalue of A if

det(A − λIn) = 0.

The expression det(A − λIn) is a polynomial in λ called the characteristic polynomial of A, and the equation det(A − λIn) = 0 is called the characteristic equation of A.
Activity 17.1.
(a) Find the characteristic polynomial of the matrix A = [3 −2 5; 1 0 7; 0 0 1], and use the characteristic polynomial to find all of the eigenvalues of A.
(b) Verify that 1 and 2 are the only eigenvalues of the matrix [1 0 0 1; 1 2 0 0; 0 0 1 0; 0 0 0 1].
As we argued in Preview Activity 17.1, a 2 × 2 matrix can have at most 2 eigenvalues. For
an n × n matrix, the characteristic polynomial will be a degree n polynomial, and we know from
algebra that a degree n polynomial can have at most n roots. Since an eigenvalue of a matrix is a
root of the characteristic polynomial of that matrix, we can conclude that an n × n matrix can have
at most n distinct eigenvalues. Activity 17.1 (b) shows that a 4 × 4 matrix may have fewer than 4
eigenvalues, however. Note that one of these eigenvalues, the eigenvalue 1, appears three times as
a root of the characteristic polynomial of the matrix. The number of times an eigenvalue appears as
a root of the characteristic polynomial is called the (algebraic) multiplicity of the eigenvalue. More
formally:
Definition 17.2. The (algebraic) multiplicity of an eigenvalue λ of a matrix A is the largest integer
m so that (x − λ)m divides the characteristic polynomial of A.
Thus, in Activity 17.1 (b) the eigenvalue 1 has multiplicity 3 and the eigenvalue 2 has multi-
plicity 1. Notice that if we count the eigenvalues of an n × n matrix with their multiplicities, the
total will always be n.
If A is a matrix with real entries, then the characteristic polynomial will have real coefficients.
It is possible that the characteristic polynomial can have complex roots, and that the matrix A has
complex eigenvalues. Because the characteristic polynomial of a real matrix has real coefficients, any complex eigenvalues must appear in conjugate pairs; that is, if λ1 = a + ib is an eigenvalue of A, then λ2 = a − ib is also an eigenvalue of A. Furthermore, since the complex eigenvalues come in conjugate pairs, a real matrix of odd size always has at least one real eigenvalue.
We now summarize the information we have so far about eigenvalues of an n × n real matrix:
(1) There are at most n eigenvalues of A. If each eigenvalue (including complex eigenvalues) is
counted with its multiplicity, there are exactly n eigenvalues.
(4) If A is upper or lower-triangular, the eigenvalues are the entries on the diagonal.
Consider the matrix transformation T from R^3 to R^3 defined by T(x) = Ax, where

A = [1 0 1; 0 1 1; 0 0 2].

We are interested in understanding what this matrix transformation does to vectors in R^3. First we note that A has eigenvalues λ1 = 1 and λ2 = 2, with λ1 having multiplicity 2. There is a pair v1 = [1 0 0]T and v2 = [0 1 0]T of linearly independent eigenvectors for A corresponding to the eigenvalue λ1, and an eigenvector v3 = [1 1 1]T for A corresponding to the eigenvalue λ2. Note that the vectors v1, v2, and v3 are linearly independent (recall that eigenvectors corresponding to different eigenvalues are always linearly independent). So any vector b in R^3 can
be written uniquely as a linear combination of v1, v2, and v3. Let's now consider the action of the matrix transformation T on a linear combination of v1, v2, and v3. Note that

T(c1v1 + c2v2 + c3v3) = c1T(v1) + c2T(v2) + c3T(v3) = c1v1 + c2v2 + 2c3v3.   (17.3)
Equation (17.3) illustrates that it is most convenient to view the action of T in the coordinate system
where Span{v1 } serves as the x-axis, Span{v2 } serves as the y-axis, and Span{v3 } as the z-
axis. In this case, we can visualize that when we apply the transformation T to a vector b =
c1 v1 + c2 v2 + c3 v3 in R3 the result is an output vector that is unchanged in the v1 -v2 plane and
scaled by a factor of 2 in the v3 direction. For example, consider the box whose sides are determined
by the vectors v1 , v2 , and v3 as shown in Figure 17.1. The transformation T stretches this box by
a factor of 2 in the v3 direction and leaves everything else alone, as illustrated in Figure 17.1. So
the entire Span{v1, v2} is unchanged by T, but Span{v3} is scaled by 2. In this situation, the
eigenvalues and eigenvectors provide the most convenient perspective through which to visualize
the action of the transformation T .
Figure 17.1: The box determined by v1, v2, and v3, and its image under T, which is stretched by a factor of 2 in the v3 direction.
This geometric perspective illustrates how each eigenvalue and the corresponding eigenspace of
A tells us something important about A. So it behooves us to learn a little more about eigenspaces.
Dimensions of Eigenspaces
Activity 17.2.
(a) Find the dimension of the eigenspace for each eigenvalue of the matrix A = [3 −2 5; 1 0 7; 0 0 1] from Activity 17.1 (a).
(b) Find the dimension of the eigenspace for each eigenvalue of the matrix A = [1 0 0 1; 1 2 0 0; 0 0 1 0; 0 0 0 1] from Activity 17.1 (b).
i. Recall that a polynomial of degree three can have at most three distinct roots. What does that say about the multiplicities of λ1, λ2, λ3?
ii. Use the fact that eigenvectors corresponding to distinct eigenvalues are linearly inde-
pendent to find the dimensions of the eigenspaces for λ1 , λ2 , λ3 .
The examples in Activity 17.2 all provide instances of the principle that the dimension of an
eigenspace corresponding to an eigenvalue λ cannot exceed the multiplicity of λ. Specifically:
Theorem 17.4. If λ is an eigenvalue of A, the dimension of the eigenspace corresponding to λ is
less than or equal to the multiplicity of λ.
The examples we have seen raise another important point. The matrix A = [1 0 1; 0 1 1; 0 0 2] from our geometric example has two eigenvalues, 1 and 2, with the eigenvalue 1 having multiplicity 2. If we let Eλ represent the eigenspace of A corresponding to the eigenvalue λ, then dim(E1) = 2 and dim(E2) = 1. If we change this matrix slightly to the matrix B = [2 0 1; 0 1 1; 0 0 1], we see that B has two eigenvalues, 1 and 2, with the eigenvalue 1 having multiplicity 2. However, in this case we have dim(E1) = 1 (like the example from Activities 17.1 (a) and 17.2 (a)). In this case the vector v1 = [1 0 0]T forms a basis for E2 and the vector v2 = [0 1 0]T forms a basis for E1. We
can visualize the action of B on the square formed by v1 and v2 in the xy-plane as a scaling by 2
in the v1 direction as shown in Figure 17.2, but since we do not have a third linearly independent
eigenvector, the action of B in the direction of [0 0 1]T is not so clear.
So the action of a matrix transformation can be more easily visualized if the dimension of each
eigenspace is equal to the multiplicity of the corresponding eigenvalue. This geometric perspective
leads us to define the geometric multiplicity of an eigenvalue.
Definition 17.5. The geometric multiplicity of an eigenvalue λ of an n × n matrix A is the dimension of the corresponding eigenspace Nul(A − λIn).
Examples
What follows are worked examples that use the concepts from this section.
Figure 17.2: The square formed by v1 and v2 and its image under B, which is scaled by 2 in the v1 direction.
Example 17.6. Let A = [−1 0 −2; 2 1 2; 0 0 1].
Example Solution.
(b) The eigenvalues of A are the solutions to the characteristic equation. Since det(A − λI3) = (1 − λ)^2(−1 − λ), the eigenvalues of A are 1 (with algebraic multiplicity 2) and −1 (with algebraic multiplicity 1). To find a basis for the eigenspace of A corresponding to the eigenvalue 1, we note that the reduced row echelon form of A − I3 is [1 0 1; 0 0 0; 0 0 0]. If x = [x1 x2 x3]T, then (A − I3)x = 0 has general solution
x = [x1 x2 x3]T = [−x3 x2 x3]T = x2 [0 1 0]T + x3 [−1 0 1]T.
Therefore, {[0 1 0]T , [−1 0 1]T } is a basis for the eigenspace of A corresponding to the
eigenvalue 1.
To find a basis for the eigenspace of A corresponding to the eigenvalue −1, we note that the reduced row echelon form of A + I3 is [1 1 0; 0 0 1; 0 0 0], so (A + I3)x = 0 has general solution

x = [x1 x2 x3]T = [−x2 x2 0]T = x2 [−1 1 0]T.
Therefore, a basis for the eigenspace of A corresponding to the eigenvalue −1 is {[−1 1 0]T }.
(d) Let v1 = [0 1 0]T, v2 = [−1 0 1]T, and v3 = [−1 1 0]T. Since eigenvectors
corresponding to different eigenvalues are linearly independent, and since neither v1 nor
v2 is a scalar multiple of the other, we can conclude that the set {v1 , v2 , v3 } is a linearly
independent set with 3 = dim(R3 ) vectors. Therefore, {v1 , v2 , v3 } is a basis for R3
consisting of eigenvectors of A.
Example 17.7. Find a 3 × 3 matrix A that has an eigenvector v1 = [1 0 1]T with corresponding eigenvalue λ1 = 2, an eigenvector v2 = [0 2 −3]T with corresponding eigenvalue λ2 = −3, and an eigenvector v3 = [−4 0 5]T with corresponding eigenvalue λ3 = 5. Explain your process.
Example Solution. We are looking for a 3 × 3 matrix A such that Av1 = 2v1 , Av2 = −3v2 and
Av3 = 5v3 . Since v1 , v2 , and v3 are eigenvectors corresponding to different eigenvalues, v1 , v2 ,
and v3 are linearly independent. So the matrix [v1 v2 v3 ] is invertible. It follows that
A [v1 v2 v3] = [Av1 Av2 Av3]

A [1 0 −4; 0 2 0; 1 −3 5] = [2v1 −3v2 5v3]

A [1 0 −4; 0 2 0; 1 −3 5] = [2 0 −20; 0 −6 0; 2 9 25]

A = [2 0 −20; 0 −6 0; 2 9 25] [1 0 −4; 0 2 0; 1 −3 5]^(−1)

A = [2 0 −20; 0 −6 0; 2 9 25] [5/9 2/3 4/9; 0 1/2 0; −1/9 1/6 1/9]

A = [10/3 −2 −4/3; 0 −3 0; −5/3 10 11/3].
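One way to check a construction like this with technology (a hedged sketch, not a required method) is to rebuild A as P D P^(−1) in NumPy, where the columns of P are the given eigenvectors, and confirm that each eigenvector is mapped to the correct multiple of itself.

```python
import numpy as np

P = np.array([[ 1,  0, -4],
              [ 0,  2,  0],
              [ 1, -3,  5]], dtype=float)   # columns are v1, v2, v3
D = np.diag([2.0, -3.0, 5.0])               # the desired eigenvalues

A = P @ D @ np.linalg.inv(P)                 # same as [2v1 -3v2 5v3][v1 v2 v3]^(-1)
print(np.round(A, 4))                        # matches the matrix found above

for i, lam in enumerate([2.0, -3.0, 5.0]):
    print(np.allclose(A @ P[:, i], lam * P[:, i]))   # True, True, True
```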
Summary
In this section we studied the characteristic polynomial of a matrix and how it is used to find the eigenvalues of the matrix.
• The characteristic equation of a square matrix provides us an algebraic method to find the
eigenvalues of the matrix.
• The eigenvalues of an upper or lower-triangular matrix are the entries on the diagonal.
• There are at most n eigenvalues of an n × n matrix.
• For a real matrix A, if an eigenvalue λ of A is complex, then the complex conjugate of λ is
also an eigenvalue.
• The algebraic multiplicity of an eigenvalue λ is the multiplicity of λ as a root of the charac-
teristic equation.
• The dimension of the eigenspace corresponding to an eigenvalue λ is less than or equal to the
algebraic multiplicity of λ.
Exercises
(1) There is an interesting relationship² between a matrix and its characteristic equation that we
explore in this exercise.
(a) We first illustrate with an example. Let B = [1 2; 1 −2].
i. Show that λ^2 + λ − 4 is the characteristic polynomial for B.
ii. Calculate B^2. Then compute B^2 + B − 4I2. What do you get?
(b) The first part of this exercise presents an example of a matrix that satisfies its own
characteristic equation. Explain for a general n × n matrix, why A satisfies its char-
acteristic equation.
(2) There is a useful relationship between the determinant and eigenvalues of a matrix A that we
explore in this exercise.
(a) Let B = [2 3; 8 4]. Find the determinant of B and the eigenvalues of B, and compare det(B) to the eigenvalues of B.
(b) Let A be an n × n matrix. In this part of the exercise we argue the general case
illustrated in the previous part – that det(A) is the product of the eigenvalues of A.
Let p(λ) = det(A − λIn ) be the characteristic polynomial of A.
i. Let λ1 , λ2 , . . ., λn be the eigenvalues of A (note that these eigenvalues may not
all be distinct). Recall that if r is a root of a polynomial q(x), then (x − r) is a
factor of q(x). Use this idea to explain why

p(λ) = (λ1 − λ)(λ2 − λ) · · · (λn − λ).
(3) Find the eigenvalues of the following matrices. For each eigenvalue, determine its algebraic
and geometric multiplicity.
(a) A = [1 1 1; 1 1 1; 1 1 1]

(b) A = [2 0 3; 0 1 0; 0 1 2]
(4) Let A be an n × n matrix. Use the characteristic equation to explain why A and A^T have the same eigenvalues.
²This result is known as the Cayley-Hamilton Theorem and is one of the fascinating results in linear algebra.
(5) Find three 3 × 3 matrices whose eigenvalues are 2 and 3, and for which the dimensions of the
eigenspaces for λ = 2 and λ = 3 are different.
(6) Suppose A is an n × n matrix and B is an invertible n × n matrix. Explain why the characteristic polynomial of A is the same as the characteristic polynomial of BAB^(−1), and hence that the eigenvalues of A and BAB^(−1) are the same.
(7) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False If the determinant of a 2 × 2 matrix A is positive, then A has two distinct
real eigenvalues.
(b) True/False If two 2 × 2 matrices have the same eigenvalues, then they have the same eigenvectors.
(c) True/False The characteristic polynomial of an n × n matrix has degree n.
(d) True/False If R is the reduced row echelon form of an n × n matrix A, then A and
R have the same eigenvalues.
(e) True/False If R is the reduced row echelon form of an n × n matrix A, and v is an
eigenvector of A, then v is an eigenvector of R.
(f) True/False Let A and B be n × n matrices with characteristic polynomials pA (λ)
and pB (λ), respectively. If A 6= B, then pA (λ) 6= pB (λ).
(g) True/False Every matrix has at least one eigenvalue.
(h) True/False Suppose A is a 3 × 3 matrix with three distinct eigenvalues. Then any
three eigenvectors, one for each eigenvalue, will form a basis of R3 .
(i) True/False If an eigenvalue λ is repeated 3 times among the eigenvalues of a matrix,
then there are at most 3 linearly independent eigenvectors corresponding to λ.
To realistically model the diffusion of gas molecules we would need to consider a system with a
large number of balls as substitutes for the gas molecules. However, the main idea can be seen in
a model with a much smaller number of balls, as we will do now. Suppose we have two bins that
contain a total of 4 balls between them. Label the bins as Bin 1 and Bin 2. In this case we can
think of entropy as the number of different possible ways the balls can be arranged in the system.
For example, there is only 1 way for all of the balls to be in Bin 1 (low entropy), but there are 4
ways that we can have one ball in Bin 1 (choose any one of the four different balls, which can be
distinguished from each other) and 3 balls in Bin 2 (higher entropy). The highest entropy state has
the balls equally distributed between the bins (with 6 different ways to do this).
We assume that there is a way for balls to move from one bin to the other (like having gas
molecules pass through a permeable membrane). A way to think about this is that we select a ball
(from ball 1 to ball 4, which are different balls) and move that ball from its current bin to the other
bin. Consider a “move” to be any instance when a ball changes bins. A state is any configuration of
balls in the bins at a given time, and the state changes when a ball is chosen at random and moved
to the other bin. The possible states are to have 0 balls in Bin 1 and 4 balls in Bin 2 (State 0, entropy
1), 1 ball in Bin 1 and 3 in Bin 2 (State 1, entropy 4), 2 balls in each Bin (State 2, entropy 6), 3 balls
in Bin 1 and 1 ball in Bin 2 (State 3, entropy 4), and 4 balls in Bin 1 and 0 balls in Bin 2 (State 4,
entropy 1). These states are shown in Figure 17.3.
Figure 17.3: The five possible states of the system of four balls in Bin 1 and Bin 2.
Project Activity 17.1. To model the system of balls in bins we need to understand how the system
can transform from one state to another. It suffices to count the number of balls in Bin 1 (since the
remaining balls will be in Bin 2). Even though the balls are labeled, our count only cares about how
many balls are in each bin. Let x0 = [x0 x1 x2 x3 x4]T, where xi is the probability that Bin 1 contains i balls, and let x1 = [x^1_0 x^1_1 x^1_2 x^1_3 x^1_4]T, where x^1_i is the probability that Bin 1 contains i balls after the first move. We will call the vectors x0 and x1 probability distributions of balls in
bins. Note that since all four balls have to be placed in some bin, the sum of the entries in our
probability distribution vectors must be 1. Recall that a move is an instance when a ball changes
bins. We want to understand how x1 is obtained from x0 . In other words, we want to figure out
what the probability that Bin 1 contains 0, 1, 2, 3, or 4 balls after one ball changes bins if our initial
probability distribution of balls in bins is x0 .
We begin by analyzing the ways that a state can change. For example,
• Suppose there are 0 balls in Bin 1. (In our probability distribution x0 , this happens with
probability x0 .) Then there are four balls in Bin 2. The only way for a ball to change bins is
if one of the four balls moves from Bin 2 to Bin 1, putting us in State 1. Regardless of which
ball moves, we will always be put in State 1, so this happens with a probability of 1. In other
words, if the probability that Bin 1 contains 0 balls is x0 , then there is a probability of (1)x0
that Bin 1 will contain 1 ball after the move.
• Suppose we have 1 ball in Bin 1. There are four ways this can happen (since there are four
balls, and the one in Bin 1 is selected at random from the four balls), so the probability of a
given ball being in Bin 1 is 1/4.
– If the ball in Bin 1 moves, that move puts us in State 0. In other words, if the probability that Bin 1 contains 1 ball is x1, then there is a probability of (1/4)x1 that Bin 1 will contain 0 balls after a move.
– If any of the 3 balls in Bin 2 moves (which happens with probability 3/4), that move puts us in State 2. In other words, if the probability that Bin 1 contains 1 ball is x1, then there is a probability of (3/4)x1 that Bin 1 will contain 2 balls after a move.
(a) Complete this analysis to explain the probabilities if there are 2, 3, or 4 balls in Bin 1.
(b) Explain why x1 = T x0, where T is the transition matrix

T = [0 1/4 0 0 0; 1 0 1/2 0 0; 0 3/4 0 3/4 0; 0 0 1/2 0 1; 0 0 0 1/4 0].
After subsequent moves we obtain, in the same way,

x2 = T x1,
x3 = T x2,
. . .
xk = T xk−1.
This is an example of a Markov process (see Definition 9.4). There are several ques-
tions we can ask about this model. For example, what is the long-term behavior of this system,
and how does this model relate to entropy? That is, given an initial probability distribution vector
x0 , the system will have probability distribution vectors x1 , x2 , . . . after subsequent moves. What
happens to the vectors xk as k goes to infinity, and what does this tell us about entropy? To answer
these questions, we will first explore the sequence {xk } numerically, and then use the eigenvalues
and eigenvectors of T to analyze the sequence {xk }.
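A numerical exploration of this kind might look like the following Python/NumPy sketch, which builds the matrix T from Project Activity 17.1 and repeatedly applies it to one arbitrarily chosen starting distribution.

```python
import numpy as np

# transition matrix from Project Activity 17.1: column j holds the probabilities
# of moving from the state "j balls in Bin 1" to each possible new state
T = np.array([[0,   1/4, 0,   0,   0],
              [1,   0,   1/2, 0,   0],
              [0,   3/4, 0,   3/4, 0],
              [0,   0,   1/2, 0,   1],
              [0,   0,   0,   1/4, 0]])

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # one possible start: 0 balls in Bin 1
for k in range(1, 7):
    x = T @ x
    print(k, np.round(x, 4))
# the distributions do not settle to a single vector; they alternate between
# vectors supported on the even states and on the odd states
```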
Project Activity 17.2. Use appropriate technology to do the following.
Describe the long term behavior of the sequence {xk } in each case.
In what follows, we investigate the behavior of the sequence {xk } that we uncovered in Project
Activity 17.2.
Project Activity 17.3. We use the characteristic polynomial to find the eigenvalues of T .
(a) Find the characteristic polynomial of T . Factor the characteristic polynomial into a product
of linear polynomials to show that the eigenvalues of T are 0, 1, −1, 1/2, and −1/2.
(b) As we will see a bit later, certain eigenvectors for T will describe the end behavior of
the sequence {xk }. Find eigenvectors for T corresponding to the eigenvalues 1 and −1.
Explain how the eigenvector for T corresponding to the eigenvalue 1 explains the behavior
of one of the sequences we saw in Project Activity 17.2. (Any eigenvector of T with
eigenvalue 1 is called an equilibrium or steady state vector.)
(b) Let x0 be any initial probability distribution vector. Explain why we can write x0 as

x0 = a1v1 + a2v2 + a3v3 + a4v4 + a5v5 = Σ_{i=1}^{5} ai vi

for some scalars a1, a2, a3, a4, and a5.
We can now use the eigenvalues and eigenvectors of T to write the vectors xk in a convenient form. Let λ1 = 0, λ2 = 1, λ3 = −1, λ4 = 1/2, and λ5 = −1/2. Notice that

x1 = T x0
   = T(a1v1 + a2v2 + a3v3 + a4v4 + a5v5)
   = a1 T v1 + a2 T v2 + a3 T v3 + a4 T v4 + a5 T v5
   = a1 λ1 v1 + a2 λ2 v2 + a3 λ3 v3 + a4 λ4 v4 + a5 λ5 v5
   = Σ_{i=1}^{5} ai λi vi.
Similarly,

x2 = T x1 = T(Σ_{i=1}^{5} ai λi vi) = Σ_{i=1}^{5} ai λi T vi = Σ_{i=1}^{5} ai λi^2 vi.
We can continue in this manner to ultimately show that for each positive integer k we have

xk = Σ_{i=1}^{5} ai λi^k vi   (17.4)

when x0 = Σ_{i=1}^{5} ai vi.
Project Activity 17.5. Recall that we are interested in understanding the behavior of the sequence
{xk } as k goes to infinity.
(a) Equation (17.4) shows that we need to know lim_{k→∞} λi^k for each i in order to analyze lim_{k→∞} xk. Calculate or describe these limits.
(b) Use the result of part (a), Equation (17.4), and Project Activity 17.3 (b) to explain why the
sequence {xk } is either eventually fixed or oscillates between two states. Compare to the
results from Project Activity 17.2. How are these results related to entropy? You may use
the facts that
Section 18
Diagonalization
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.
In 1202 Leonardo of Pisa (better known as Fibonacci) published Liber Abaci (roughly translated
as The Book of Calculation), in which he constructed a mathematical model of the growth of a
rabbit population. The problem Fibonacci considered is that of determining the number of pairs of
rabbits produced in a given time period beginning with an initial pair of rabbits. Fibonacci made the
assumptions that each pair of rabbits more than one month old produces a new pair of rabbits each
month, and that no rabbits die. (We ignore any issues that might arise concerning the gender
of the offspring.) If we let Fn represent the number of pairs of rabbits in month n, Fibonacci produced the model

F1 = 1,  F2 = 1,  and  Fn = Fn−1 + Fn−2 for n ≥ 3.

The resulting sequence of numbers
1, 1, 2, 3, 5, 8, 13, 21, . . .
is a very famous sequence in mathematics and is called the Fibonacci sequence. This sequence is
thought to model many natural phenomena such as the number of seeds in a sunflower and anything which grows in a spiral form. It is so famous, in fact, that it has a journal devoted entirely to it. As a note, while Fibonacci's work Liber Abaci introduced this sequence to the western world, it had been described in earlier Sanskrit texts going back as early as the sixth century.
By definition, the Fibonacci numbers are calculated by recursion. This is a very inefficient way to determine entries Fn for large n. Later in this section we will derive a fascinating and unexpected
formula for the Fibonacci numbers using the idea of diagonalization.
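As a preview of why matrices are relevant here, the recurrence can be packaged into a 2 × 2 matrix whose powers generate the Fibonacci numbers; the formula derived later comes from diagonalizing this matrix. The short Python sketch below (an illustration, not the derivation) computes Fn this way.

```python
import numpy as np

# The recurrence F_n = F_{n-1} + F_{n-2} in matrix form:
# [F_{n+1}, F_n]^T = M [F_n, F_{n-1}]^T with M = [[1, 1], [1, 0]].
M = np.array([[1, 1],
              [1, 0]])

def fib(n):
    """Return F_n from a power of M applied to [F_2, F_1] = [1, 1]."""
    if n <= 2:
        return 1
    v = np.linalg.matrix_power(M, n - 1) @ np.array([1, 1])
    return int(v[1])

print([fib(n) for n in range(1, 11)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```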
Introduction
As we have seen when studying Markov processes, each state is dependent on the previous state. If
x0 is the initial state and A is the transition matrix, then the nth state is found by A^n x0. In these
situations, and others, it is valuable to be able to quickly and easily calculate powers of a matrix.
We explore a way to do that in this section.
Preview Activity 18.1. Consider a very simplified weather forecast. Let us assume there are two
possible states for the weather: rainy (R) or sunny(S). Let us also assume that the weather patterns
are stable enough that we can reasonably predict the weather tomorrow based on the weather today.
If it is sunny today, then there is a 70% chance that it will be sunny tomorrow, and if it is rainy today then there is a 40% chance that it will be sunny tomorrow. If x0 = [s r]T is a state vector that indicates a probability s that it is sunny and probability r that it is rainy on day 0, then

x1 = [0.70 0.40; 0.30 0.60] x0

tells us the likelihood of it being sunny or rainy on day 1. Let A = [0.70 0.40; 0.30 0.60].
(1) Suppose it is sunny today, that is, x0 = [1 0]T. Calculate x1 = Ax0 and explain how this matrix-vector product tells us the probability that it will be sunny tomorrow.
(2) Calculate x2 = Ax1 and interpret the meaning of each component of the product.
(4) The previous result demonstrates that to determine the long-term probability of a sunny or
rainy day, we want to be able to easily calculate powers of the matrix A. Use a computer
algebra system (e.g., Maple, Mathematica, Wolfram|Alpha) to calculate the entries of x10 ,
x20 , and x30 . Based on this data, what do you expect the long term probability of any day
being a sunny one?
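Any computer algebra system will work for this preview activity; as one concrete possibility (a sketch assuming NumPy), the following computation produces the requested vectors.

```python
import numpy as np

A = np.array([[0.70, 0.40],
              [0.30, 0.60]])
x0 = np.array([1.0, 0.0])   # sunny today

for k in (1, 2, 10, 20, 30):
    print(k, np.round(np.linalg.matrix_power(A, k) @ x0, 6))
# compare the entries for k = 10, 20, and 30 to guess the long-term probabilities
```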
Diagonalization
In Preview Activity 18.1 we saw that if we can calculate powers of a matrix, then we can make predictions about
the long-term behavior of some systems. In general, calculating powers of a matrix can be a very
difficult thing, but there are times when the process is straightforward.
Activity 18.1. Let D = [2 0; 0 3].

(a) Show that D^2 = [2^2 0; 0 3^2].

(b) Show that D^3 = [2^3 0; 0 3^3]. (Hint: D^3 = D D^2.)

(c) Explain in general why D^n = [2^n 0; 0 3^n] for any positive integer n.
Activity 18.1 illustrates that calculating powers of square matrices whose only nonzero entries
are along the diagonal is rather simple. In general, if
D = [d11 0 · · · 0; 0 d22 · · · 0; ⋮ ⋮ ⋱ ⋮; 0 0 · · · dnn],

then

D^k = [d11^k 0 · · · 0; 0 d22^k · · · 0; ⋮ ⋮ ⋱ ⋮; 0 0 · · · dnn^k]
for any positive integer k. Recall that a diagonal matrix is a matrix whose only nonzero elements
are along the diagonal (see Definition 8.6). In this section we will see that matrices that are sim-
ilar to diagonal matrices have some very nice properties, and that diagonal matrices are useful in
calculations of powers of matrices.
We can utilize the method of calculating powers of diagonal matrices to also easily calculate
powers of other types of matrices.
Activity 18.2. Let D be any matrix, P an invertible matrix, and let A = P^(−1)DP.
As Activity 18.2 illustrates, to calculate the powers of a matrix of the form P^(−1)DP we only need to determine the powers of the matrix D. If D is a diagonal matrix, this is especially straightforward.
Similar Matrices
Similar matrices play an important role in certain calculations. For example, Activity 18.2 showed
that if we can write a square matrix A in the form A = P −1 DP for some invertible matrix P and
diagonal matrix D, then finding the powers of A is straightforward. As we will see, the relation
A = P −1 DP will imply that the matrices A and D share many properties.
Definition 18.1. The n × n matrix A is similar to the n × n matrix B if there is an invertible matrix P such that A = P^(−1)BP.
Activity 18.3. Let A = [1 1; 2 0] and B = [2 2; 0 −1]. Assume that A is similar to B via the matrix P = [2 1; 2 2].
(c) What can you say about the eigenvalues of A and B? Explain.
(d) Explain why x = [1 1]T is an eigenvector for A with eigenvalue 2. Is x an eigenvector for B with eigenvalue 2? Why or why not?
Activity 18.3 suggests that similar matrices share some, but not all, properties. Note that if A = P^(−1)BP, then B = Q^(−1)AQ with Q = P^(−1). So if A is similar to B, then B is similar to A. Similarly (no pun intended), since A = I^(−1)AI (where I is the identity matrix), any square matrix is similar to itself. Also, if A = P^(−1)BP and B = M^(−1)CM, then A = (MP)^(−1)C(MP). So if A is similar to B and B is similar to C, then A is similar to C. If you have studied relations, these three properties show that similarity is an equivalence relation on the set of all n × n matrices.
This is one reason why similar matrices share many important traits, as the next activity highlights.
Activity 18.4. Suppose A and B are similar n × n matrices, so that A = P^(−1)BP for some invertible matrix P.

(a) Use the multiplicative property of the determinant to explain why det(A) = det(B). So similar matrices have the same determinants.

(b) Explain why det(A − λIn) = det(B − λIn) for every scalar λ. So similar matrices have the same characteristic polynomial, and the same eigenvalues.
Theorem 18.2. Let A and B be similar n × n matrices and I the n × n identity matrix. Then

(1) det(A) = det(B),
(2) det(A − λI) = det(B − λI), and
(3) A and B have the same eigenvalues.
When a matrix is similar to a diagonal matrix, we can gain insight into the action of the correspond-
ing matrix transformation. As an example, consider the matrix transformation T from R2 to R2
defined by T x = Ax, where
A = [3 1; 1 3].   (18.2)
We are interested in understanding what this matrix transformation does to vectors in R^2. First we note that A has eigenvalues λ1 = 2 and λ2 = 4 with corresponding eigenvectors v1 = [−1 1]T and v2 = [1 1]T. If we let P = [v1 v2], then you can check that

P^(−1)AP = D   and   A = PDP^(−1),

where

D = [2 0; 0 4].
Thus,

T(x) = PDP^(−1)x.
This example illustrates that it is most convenient to view the action of T in the coordinate
system where v1 serves as the x-direction and v2 as the y-direction. In this case, we can visualize
that when we apply the transformation T to a vector in this system it is just scaled in both directions
by the matrix D. Then the matrix P translates everything back to the standard xy coordinate system.
[Figure: the action of T = PDP^(−1) on e1 and e2, shown in the stages P^(−1)e_i, then DP^(−1)e_i, then T(e_i).]
This geometric perspective provides another example of how having a matrix similar to a di-
agonal matrix informs us about the situation. In what follows we determine the conditions that
determine when a matrix is similar to a diagonal matrix.
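Technology can also produce P and D directly. For the matrix A in (18.2), numpy.linalg.eig returns the eigenvalues together with a matrix whose columns are eigenvectors, which can play the role of P (a sketch; the eigenvectors NumPy returns are scaled differently from v1 and v2 above, which does not affect the diagonalization).

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

evals, P = np.linalg.eig(A)   # columns of P are eigenvectors of A
D = np.diag(evals)

print(evals)                                        # 4.0 and 2.0 (order may vary)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))     # True:  P^(-1) A P = D
print(np.allclose(A, P @ D @ np.linalg.inv(P)))     # True:  A = P D P^(-1)
```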
Diagonalization in General
In Preview Activity 18.1 and in the matrix transformation example we found that a matrix A was similar to a diagonal matrix via an invertible matrix P whose columns are eigenvectors of A. This will work for a general n × n matrix A as long as we can find an invertible matrix P whose columns are eigenvectors of A. More specifically, suppose A is an n × n matrix with n linearly independent eigenvectors v1, v2, . . ., vn with corresponding eigenvalues λ1, λ2, . . ., λn (not necessarily distinct). Let

P = [v1 v2 v3 · · · vn].
Then

AP = [Av1 Av2 · · · Avn] = [λ1v1 λ2v2 · · · λnvn] = PD,

where

D = [λ1 0 · · · 0; 0 λ2 · · · 0; ⋮ ⋮ ⋱ ⋮; 0 0 · · · λn].
Since the columns of P are linearly independent, we know P is invertible, and so
P^(−1)AP = D.
It should be noted that there are square matrices that are not diagonalizable. For example, the matrix A = [1 1; 0 1] has 1 as its only eigenvalue and the dimension of the eigenspace of A corresponding to the eigenvalue 1 is one. Therefore, it will be impossible to find two linearly independent eigenvectors for A.
We showed previously that eigenvectors corresponding to distinct eigenvalues are always linearly independent, so if an n × n matrix A has n distinct eigenvalues then A is diagonalizable. Activity 18.5 (b) shows that it is possible to diagonalize an n × n matrix even if the matrix does not have n distinct eigenvalues. In general, we can diagonalize a matrix as long as the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue. In other words, a matrix is diagonalizable if the geometric multiplicity is the same as the algebraic multiplicity for each eigenvalue.
At this point we might ask one final question. We argued that if an n × n matrix A has n linearly independent eigenvectors, then A is diagonalizable. It is reasonable to wonder if the converse is true; that is, if A is diagonalizable, must A have n linearly independent eigenvectors? The answer is yes, and you are asked to show this in Exercise 6. We summarize the result in the following theorem.

Theorem. An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.
Examples
What follows are worked examples that use the concepts from this section.
Example 18.5. Let A = [1 −2 1; 0 3 −1; 0 −2 2] and B = [1 2 0; 0 1 0; 0 0 4]. You should use appropriate technology to calculate determinants, perform any row reductions, or solve any polynomial equations.
(c) Is it possible for two matrices R and S to have the same eigenvalues with the same algebraic
multiplicities, but one matrix is diagonalizable and the other is not? Explain.
Example Solution.
The eigenvalues of A are the solutions to the characteristic equation p(λ) = 0. Thus, the
eigenvalues of A are 1 and 4.
To find a basis for the eigenspace of A corresponding to the eigenvalue 1, we find the general solution to the homogeneous system (A − I3)x = 0. Using technology we see that the reduced row echelon form of A − I3 = \begin{bmatrix} 0 & -2 & 1 \\ 0 & 2 & -1 \\ 0 & -2 & 1 \end{bmatrix} is \begin{bmatrix} 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. So if x = [x1 x2 x3]^T, then the general solution to (A − I3)x = 0 is
x = \begin{bmatrix} x_1 \\ \frac{1}{2}x_3 \\ x_3 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ \frac{1}{2} \\ 1 \end{bmatrix}.
To find a basis for the eigenspace of A corresponding to the eigenvalue 4, we find the general solution to the homogeneous system (A − 4I3)x = 0. Using technology we see that the reduced row echelon form of A − 4I3 = \begin{bmatrix} -3 & -2 & 1 \\ 0 & -1 & -1 \\ 0 & -2 & -2 \end{bmatrix} is \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}. So if x = [x1 x2 x3]^T, then the general solution to (A − 4I3)x = 0 is
x = [x1 x2 x3]^T = [x3  −x3  x3]^T = x3 [1  −1  1]^T.
The eigenvalues of B are the solutions to the characteristic equation p(λ) = 0. Thus, the
eigenvalues of B are 1 and 4.
To find a basis for the eigenspace of B corresponding to the eigenvalue 1, we find the general solution to the homogeneous system (B − I3)x = 0. Using technology we see that the reduced row echelon form of B − I3 = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 3 \end{bmatrix} is \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}. So if x = [x1 x2 x3]^T, then the general solution to (B − I3)x = 0 is
x = [x1 x2 x3]^T = [x1  0  0]^T = x1 [1  0  0]^T.
To find a basis for the eigenspace of B corresponding to the eigenvalue 4, we find the general solution to the homogeneous system (B − 4I3)x = 0. Using technology we see that the reduced row echelon form of B − 4I3 = \begin{bmatrix} -3 & 2 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 0 \end{bmatrix} is \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}. So if x = [x1 x2 x3]^T, then the general solution to (B − 4I3)x = 0 is
x = [x1 x2 x3]^T = [0  0  x3]^T = x3 [0  0  1]^T.
(c) Yes it is possible for two matrices R and S to have the same eigenvalues with the same
multiplicities, but one matrix is diagonalizable and the other is not. An example is given
by the matrices A and B in this problem.
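For reference, one possible choice of "appropriate technology" is SymPy; the following minimal sketch, assuming SymPy is installed, reproduces the eigenvalue, eigenspace, and diagonalizability computations for the matrices A and B in this example.

```python
from sympy import Matrix, eye

A = Matrix([[1, -2, 1], [0, 3, -1], [0, -2, 2]])
B = Matrix([[1, 2, 0], [0, 1, 0], [0, 0, 4]])

for name, M in [("A", A), ("B", B)]:
    print(name, M.eigenvals())          # eigenvalues with algebraic multiplicities
    for lam, alg, basis in M.eigenvects():
        print("  eigenvalue", lam, "geometric multiplicity", len(basis))
    print("  diagonalizable?", M.is_diagonalizable())

# Row reduce (A - I) directly, as in the solution above:
print((A - eye(3)).rref()[0])
```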
Example 18.6.
(a) Is it possible to find diagonalizable matrices A and B such that AB is not diagonalizable?
If yes, provide an example. If no, explain why.
(b) Is it possible to find diagonalizable matrices A and B such that A+B is not diagonalizable?
If yes, provide an example. If no, explain why.
(c) Is it possible to find a diagonalizable matrix A such that AT is not diagonalizable? If yes,
provide an example. If no, explain why.
(d) Is it possible to find an invertible diagonalizable matrix A such that A−1 is not diagonaliz-
able? If yes, provide an example. If no, explain why.
Example Solution.
(a) Let A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} and B = \begin{bmatrix} 2 & -2 \\ 0 & 1 \end{bmatrix}. Since A and B are both upper triangular matrices, their eigenvalues are their diagonal entries. With 2 distinct eigenvalues, both A and B are diagonalizable. In this case we have AB = \begin{bmatrix} 2 & -1 \\ 0 & 2 \end{bmatrix}, whose only eigenvalue is 2. The reduced row echelon form of AB − 2I2 is \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}. So a basis for the eigenspace of AB is {[1 0]^T}. Since there is no basis for R2 consisting of eigenvectors of AB, we conclude that AB is not diagonalizable.
(b) Let A = \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix} and B = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}. Since A and B are both upper triangular matrices, their eigenvalues are their diagonal entries. With 2 distinct eigenvalues, both A and B are diagonalizable. In this case we have A + B = \begin{bmatrix} 3 & 3 \\ 0 & 3 \end{bmatrix}, whose only eigenvalue is 3. The reduced row echelon form of (A + B) − 3I2 is \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}. So a basis for the eigenspace of A + B is {[1 0]^T}. Since there is no basis for R2 consisting of eigenvectors of A + B, we conclude that A + B is not diagonalizable.
(c) It is not possible to find a diagonalizable matrix A such that A^T is not diagonalizable. To see why, suppose that matrix A is diagonalizable. That is, there exists an invertible matrix P such that P^{-1}AP = D, where D is a diagonal matrix. Recall that (P^{-1})^T = (P^T)^{-1}. So
D = D^T = (P^{-1}AP)^T = P^T A^T (P^{-1})^T = P^T A^T (P^T)^{-1}.
Letting Q = (P^T)^{-1}, we conclude that
Q^{-1} A^T Q = D.
Therefore, Q diagonalizes A^T.
(d) It is not possible to find an invertible diagonalizable matrix A such that A^{-1} is not diagonalizable. To see why, suppose that matrix A is diagonalizable. That is, there exists an invertible matrix P such that P^{-1}AP = D, where D is a diagonal matrix. Thus, A = PDP^{-1}. Since A is invertible, det(A) ≠ 0. It follows that det(D) ≠ 0. So none of the diagonal entries of D can be 0. Thus, D is invertible and D^{-1} is a diagonal matrix. Then
D^{-1} = (P^{-1}AP)^{-1} = P^{-1}A^{-1}P,
so P diagonalizes A^{-1} and A^{-1} is diagonalizable.
Summary
• An n × n matrix B is similar to an n × n matrix A if there is an invertible n × n matrix P such that B = P^{-1}AP.
• Similar matrices have the same determinants, same characteristic polynomials, and same
eigenvalues. Note that similar matrices do not necessarily have the same eigenvectors corre-
sponding to the same eigenvalues.
• One use for diagonalization is that once we have diagonalized a matrix A we can quickly
and easily compute powers of A. Diagonalization can also help us understand the actions of
matrix transformations.
Exercises
(1) Determine if each of the following matrices is diagonalizable or not. For diagonalizable
matrices, clearly identify a matrix P which diagonalizes the matrix, and what the resulting
diagonal matrix is.
(a) A = \begin{bmatrix} 2 & -1 \\ 1 & 4 \end{bmatrix}
(b) A = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}
(2) The 3 × 3 matrix A has two eigenvalues λ1 = 2 and λ2 = 3. The vectors [1 2 1]^T, [1 −1 2]^T, and [2 4 2]^T are eigenvectors for λ1 = 2, while the vectors [1 1 1]^T and [2 2 2]^T are eigenvectors for λ2 = 3. Find the matrix A.
(3) Find a 2 × 2 non-diagonal matrix A and two different pairs of P and D matrices for which
A = P DP −1 .
(4) Find a 2 × 2 non-diagonal matrix A and two different P matrices for which A = P DP −1
with the same D.
(5) Suppose a 4 × 4 matrix A has eigenvalues 2, 3 and 5 and the eigenspace for the eigenvalue
3 has dimension 2. Do we have enough information to determine if A is diagonalizable?
Explain.
(6) Let A be a diagonalizable n×n matrix. Show that A has n linearly independent eigenvectors.
(7)
(a) Let A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} and B = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}. Find the eigenvalues and eigenvectors of A
and B. Conclude that it is possible for two different n × n matrices A and B to have
exactly the same eigenvectors and corresponding eigenvalues.
(b) A natural question to ask is if there are any conditions under which n×n matrices that
have exactly the same eigenvectors and corresponding eigenvalues must be equal.
Determine the answer to this question if A and B are both diagonalizable.
(8)
(a) Show that if D and D0 are n × n diagonal matrices, then DD0 = D0 D.
(b) Show that if A and B are n × n matrices and P is an invertible n × n matrix such that
P −1 AP = D and P −1 BP = D0 with D and D0 diagonal matrices, then AB = BA.
(9) Exercise 2 in Section 17 shows that the determinant of a matrix is the product of its eigen-
values. In this exercise we show that the trace of a diagonalizable matrix is the sum of its
eigenvalues.1 First we define the trace of a matrix.
Definition 18.7. The trace of an n × n matrix A = [aij ] is the sum of the diagonal entries
of A. That is,
trace(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}.
(a) Show that if R = [rij ] and S = [sij ] are n×n matrices, then trace(RS) = trace(SR).
¹This result is true for any matrix, but the argument is more complicated.
(b) Let A be a diagonalizable n×n matrix, and let p(λ) = det(A−λIn ) be the character-
istic polynomial of A. Let P be an invertible matrix such that P −1 AP = D, where
D is the diagonal matrix whose diagonal entries are λ1 , λ2 , . . ., λn , the eigenvalues
of A (note that these eigenvalues may not all be distinct).
i. Explain why trace(A) = trace(D).
ii. Show that the trace of an n × n diagonalizable matrix is the sum of the eigen-
values of the matrix.
(10) In this exercise we generalize the result of Exercise 12 in Section 8 to arbitrary diagonalizable
matrices.
(a) Show that if
D = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix},
then
e^D = \begin{bmatrix} e^{\lambda_1} & 0 & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & e^{\lambda_n} \end{bmatrix}.
(12) In Exercise 11 we see that we cannot conclude that e^{X+Y} = e^X e^Y for n × n matrices X and Y. However, a more limited property is true.
(a) Follow the steps indicated to show that if A is an n × n matrix and s and t are any scalars, then e^{As} e^{At} = e^{A(s+t)}. (Although we will not use it, you may assume that the series for e^A converges for any square matrix A.)
i. Use the definition to show that
e^{As} e^{At} = \sum_{k \ge 0} \sum_{m \ge 0} \frac{s^k t^m}{k! \, m!} A^{k+m}.
iii. Complete the problem using the Binomial Theorem, which says
(s + t)^n = \sum_{m=0}^{n} \frac{n!}{(n-m)! \, m!} s^{n-m} t^m.
(b) Use the result of part (a) to show that e^A is an invertible matrix for any n × n matrix A.
(13) There is an interesting connection between the determinant of a matrix exponential and the
trace of the matrix. Let A be a diagonalizable n×n matrix with real entries. Let D = P −1 AP
for some invertible matrix P , where D is the diagonal matrix with entries λ1 , λ2 , . . ., λn the
eigenvalues of A.
(a) Show that e^A = P e^D P^{-1}.
(b) Use Exercise 9 to show that
det(e^A) = e^{trace(A)}.
(14) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False If matrix A is diagonalizable, then so is AT .
(b) True/False If matrix A is diagonalizable, then A is invertible.
(c) True/False If an n × n matrix A is diagonalizable, then A has n distinct eigenvalues.
(d) True/False If matrix A is invertible and diagonalizable, then so is A−1 .
(e) True/False If an n × n matrix C is diagonalizable, then there exists a basis of Rn
consisting of the eigenvectors of C.
(f) True/False An n × n matrix with n distinct eigenvalues is diagonalizable.
(g) True/False If A is an n × n diagonalizable matrix, then there is a unique diagonal
matrix such that P −1 AP = D for some invertible matrix P .
(h) True/False If A is an n × n matrix with eigenvalue λ, then the dimension of the
eigenspace of A corresponding to the eigenvalue λ is n − rank(A − λIn ).
(i) True/False If λ is an eigenvalue of an n × n matrix A, then e^λ is an eigenvalue of e^A. (See Exercise 12 in Section 8 for information on the matrix exponential.)
called a recurrence relation. The recurrence relation Fn+2 = Fn+1 + Fn is very time consuming
to use to compute Fn for large values of n. It turns out that there is a fascinating formula that gives
the nth term of the Fibonacci sequence directly, without using the relation Fn+2 = Fn+1 + Fn .
Project Activity 18.1. The recurrence relation Fn+2 = Fn+1 + Fn gives the equations
Fn+1 = Fn + Fn−1   (18.3)
Fn = Fn.   (18.4)
Let xn = \begin{bmatrix} F_{n+1} \\ F_n \end{bmatrix} for n ≥ 0. Explain how the equations (18.3) and (18.4) can be described with the matrix equation
xn = Axn−1,   (18.5)
where A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.
The matrix equation (18.5) shows us how to find the vectors xn using powers of the matrix A:
x1 = Ax0
x2 = Ax1 = A(Ax0) = A^2 x0
x3 = Ax2 = A(A^2 x0) = A^3 x0
⋮
xn = A^n x0.
So if we can somehow easily find the powers of the matrix A, then we can find a convenient formula for Fn. As we have seen, we know how to do this if A is diagonalizable.

Project Activity 18.2. Let A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.

(a) Show that the eigenvalues of A are ϕ = \frac{1+\sqrt{5}}{2} and \overline{ϕ} = \frac{1-\sqrt{5}}{2}.
Now that we have the eigenvalues and know corresponding eigenvectors for A, we can return
to the problem of diagonalizing A.
Project Activity 18.3.
(a) Why do we know that A is diagonalizable?
(b) Find a matrix P such that P −1 AP is a diagonal matrix. What is the diagonal matrix?
Formula (18.6) is called Binet's formula. It is a very surprising formula in that the expression on the right hand side of (18.6) is an integer for each positive integer n. Note that with Binet's formula we can quickly compute Fn for very large values of n. For example,
F150 = 9969216677189303386214405760200.
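As a quick illustration (a minimal sketch assuming only the Python standard library, and using the formula Fn = (ϕ^n − \overline{ϕ}^n)/√5 that appears in Project Activity 18.5), we can compare Binet's formula with the recurrence.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
phi_bar = (1 - sqrt(5)) / 2

def binet(n):
    """Binet's formula; returns a float that should round to the integer F_n."""
    return (phi**n - phi_bar**n) / sqrt(5)

def fib_recurrence(n):
    """F_0 = 0, F_1 = 1, F_{k+2} = F_{k+1} + F_k, computed with exact integers."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in [5, 10, 20, 30]:
    print(n, fib_recurrence(n), round(binet(n)))   # the two columns agree

# For very large n the float in binet() loses precision, but the recurrence
# (or exact arithmetic with sqrt(5) in SymPy) still gives, for example,
print(fib_recurrence(150))   # 9969216677189303386214405760200
```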
The number ϕ = \frac{1+\sqrt{5}}{2}, called the golden mean or golden ratio, is intimately related to the
Fibonacci sequence. Binet’s formula provides a fascinating relationship between the Fibonacci
numbers and the golden ratio. The golden ratio also occurs often in other areas of mathematics. It
was an important number to the ancient Greek mathematicians who felt that the most aesthetically
pleasing rectangles had sides in the ratio of ϕ : 1.
Project Activity 18.5. You might wonder what happens if we use negative integer exponents in Binet's formula. In other words, are there negatively indexed Fibonacci numbers? For any integer n, including negative integers, let
F_n = \frac{ϕ^n − \overline{ϕ}^n}{\sqrt{5}}.
There is a specific relationship between F−n and Fn. Find it and verify it.
Section 19
Approximating Eigenvalues and Eigenvectors
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.
The Leslie Matrix (also called the Leslie Model) is a powerful model for describing the age-distributed growth of a population that is closed to migration. In a Leslie model, it is usually the case
that only one gender (most often female) is considered. As an example, we will later consider a
population of sheep that is being grown commercially. A natural question that we will address is
how we can harvest the population to build a sustainable environment.
When working with populations, the matrices we use are often large. For large matrices, using
the characteristic polynomial to calculate eigenvalues is too time and resource consuming to be
practical, and we generally cannot find the exact values of the eigenvalues. As a result, approxi-
mation techniques are very important. In this section we will explore a method for approximating
eigenvalues. The eigenvalues of a Leslie matrix are important because they describe the limiting or
steady-state behavior of a population. The matrix and model were introduced by Patrick H. Leslie
in “On the Use of Matrices in Certain Population Mathematics”, Leslie, P.H., Biometrika, Volume
XXXIII, November 1945, pp. 183-212.
Introduction
We have used the characteristic polynomial to find the eigenvalues of a matrix, and for each eigenvalue row reduced a corresponding matrix to find the eigenvectors. This method is only practical for small matrices – for more realistic applications approximation techniques are used. We investigate one such technique in this section, the power method.
Preview Activity 19.1. Let A = \begin{bmatrix} 2 & 6 \\ 5 & 3 \end{bmatrix}. Our goal is to find a scalar λ and a nonzero vector v
so that Av = λv.
(1) If we have no prior knowledge of the eigenvalues and eigenvectors of this matrix, we might
just begin with a guess. Let x0 = [1 0]T be such a guess for an eigenvector. Calculate Ax0 .
Is x0 an eigenvector of A? Explain.
(2) If x0 is not a good approximation to an eigenvector of A, then we need to make a better guess.
We have little to work with other than just random guessing, but we can use x1 = Ax0 as
another guess. We calculated x1 in part 1. Is x1 an eigenvector for A? Explain.
(3) In parts (1) and (2) you might have noticed that in some sense x1 is closer to being an eigenvector of A than x0 was. So maybe continuing this process will get us closer to an eigenvector
of A. In other words, for each positive integer k we define xk as Axk−1 . Before we proceed,
however, we should note that as we calculate the vectors x1 , x2 , x3 , . . ., the entries in the
vectors get large very quickly. So it will be useful to scale the entries so that they stay at a
reasonable size, which makes it easier to interpret the output. One way to do this is to divide
each vector xi by its largest component in absolute value so that all of the entries stay between −1 and 1.¹ So in our example we have x0 = [1 0]^T, x1 = [2/5 1]^T, and x2 = [1 25/34]^T.
Explain why scaling our vectors will not affect our search for an eigenvector.
(4) Use an appropriate technological tool to find the vectors xk up to k = 10. What do you think
the limiting vector limk→∞ xk is? Is this limiting vector an eigenvector of A? If so, what is
the corresponding eigenvalue?
While the examples we present in this text are small in order to highlight the concepts, matrices that
appear in real life applications are often enormous. For example, in Google’s PageRank algorithm
that is used to determine relative rankings of the importance of web pages, matrices of staggering
size are used (most entries in the matrices are zero, but the size of the matrices is still huge). Finding
eigenvalues of such large matrices through the characteristic polynomial is impractical. In fact,
finding the roots of all but the smallest degree characteristic polynomials is a very difficult problem.
As a result, using the characteristic polynomial to find eigenvalues and then finding eigenvectors is
not very practical in general, and it is often a better option to use a numeric approximation method.
We will consider one such method in this section, the power method.
¹There are several other ways to scale, but we won't consider them here.
In Preview Activity 19.1, we saw an example of a matrix A = \begin{bmatrix} 2 & 6 \\ 5 & 3 \end{bmatrix} so that the sequence {xk}, where xk = Axk−1, converged to a dominant eigenvector of A for an initial guess vector x0 = [1 0]^T. The vectors xi for i from 1 to 6 (with scaling) are approximately
x1 = [0.4  1]^T,  x2 = [1  0.7353]^T,  x3 = [0.8898  1]^T,
x4 = [1  0.9575]^T,  x5 = [0.9838  1]^T,  x6 = [1  0.9939]^T.
Numerically we can see that the sequence {xk } approaches the vector [1 1]T , and Figure 19.1
illustrates this geometrically as well. This method of successive approximations xk = Axk−1 is
called the power method (since we could write xk as Ak x0 ). Our task now is to show that this
method works in general. In the next activity we restrict our argument to the 2 × 2 case, and then
discuss the general case afterwards.
Let A be an arbitrary 2 × 2 matrix with two linearly independent eigenvectors v1 and v2 and
corresponding eigenvalues λ1 and λ2 , respectively. We will also assume |λ1 | > |λ2 |. An eigenvalue
whose absolute value is larger than that of any other eigenvalue is called a dominant eigenvalue.
Any eigenvector for a dominant eigenvalue is called a dominant eigenvector. Before we show that
our method can be used to approximate a dominant eigenvector, we recall that since v1 and v2 are
eigenvectors corresponding to distinct eigenvalues, then v1 and v2 are linearly independent. So
there exist scalars a1 and a2 such that
x0 = a1 v1 + a2 v2 .
With this representation of x0 we can now see why the power method approximates a dominant
eigenvector of A.
Activity 19.1. Assume as above that A is an arbitrary 2 × 2 matrix with two linearly independent
eigenvectors v1 and v2 and corresponding eigenvalues λ1 and λ2 , respectively. (We are assuming
that we don’t know these eigenvectors, but we can assume that they exist.) Assume that λ1 is the
dominant eigenvalue for A, x0 is some initial guess to an eigenvector for A, that x0 = a1 v1 + a2 v2 ,
and that xk = Axk−1 for k ≥ 1.
(a) We divide both sides of equation (19.1) by λ1^k (since λ1 is the dominant eigenvalue, we know that λ1 is not 0) to obtain
\frac{1}{\lambda_1^k} x_k = a_1 v_1 + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k v_2.   (19.2)
Recall that λ1 is the dominant eigenvalue for A. What happens to (λ2/λ1)^k as k → ∞? Explain what happens to the right hand side of equation (19.2) as k → ∞.
(b) Explain why the previous result tells us that the vectors xk are approaching a vector in the direction of v1 or −v1 as k → ∞, assuming a1 ≠ 0. (Why do we need a1 ≠ 0? What happens if a1 = 0?)
(c) What does all of this tell us about the sequence {xk } as k → ∞?
The power method is straightforward to implement, but it is not without its drawbacks. We
began by assuming that we had a basis of eigenvectors of a matrix A. So we are also assuming that
A is diagonalizable. We also assumed that A had a dominant eigenvalue λ1 . That is, if A is n × n
we assume that A has eigenvalues λ1, λ2, . . ., λn, not necessarily distinct, with
|λ1| > |λ2| ≥ |λ3| ≥ · · · ≥ |λn|,
and with vi an eigenvector of A with eigenvalue λi. We could then write any initial guess x0 in the
form
x0 = a1 v1 + a2 v2 + · · · + an vn .
The initial guess is also called a seed.
Then
x_k = a_1 \lambda_1^k v_1 + a_2 \lambda_2^k v_2 + \cdots + a_n \lambda_n^k v_n
and
\frac{1}{\lambda_1^k} x_k = a_1 v_1 + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k v_2 + \cdots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^k v_n.   (19.3)
Notice that we are not actually calculating the vectors xk here – this is a theoretical argument and
we don’t know λ1 and are not performing any scaling like we did in Preview Activity 19.1. We are
assuming that λ1 is the dominant eigenvalue of A, though, so for each i > 1 the terms (λi/λ1)^k converge to 0 as k goes to infinity. Thus,
xk ≈ λk1 a1 v1
for large values of k, which makes the sequence {xk } converge to a vector in the direction of a
dominant eigenvector v1 provided a1 ≠ 0. So we need to be careful enough to choose a seed that
has a nonzero component in the direction of v1 . Of course, we generally don’t know that our matrix
is diagonalizable before we make these calculations, but for many matrices the sequence {xk } will
approach a dominant eigenvector.
Once we have an approximation to a dominant eigenvector, we can then approximate the dom-
inant eigenvalue.
Activity 19.2. Let A be an n × n matrix with eigenvalue λ and corresponding eigenvector v.
(a) Explain why λ = \frac{λ(v \cdot v)}{v \cdot v}.
(b) Use the result of part (a) to explain why λ = \frac{(Av) \cdot v}{v \cdot v}.
The result of Activity 19.2 is that, when the vectors in the sequence {xk } approximate a domi-
nant eigenvector of a matrix A, the quotients
\frac{(Ax_k) \cdot x_k}{x_k \cdot x_k} = \frac{x_k^T A x_k}{x_k^T x_k}   (19.4)
approximate the dominant eigenvalue of A. The quotients in (19.4) are called Rayleigh quotients.
To summarize, the procedure for applying the power method for approximating a dominant
eigenvector and dominant eigenvalue of a matrix A is as follows.
Step 3: To avoid having the magnitudes of successive approximations become excessively large, scale this approximation xk. That is, find the entry αk of xk that is largest in absolute value. Then replace xk by \frac{1}{|\alpha_k|} x_k. Note that this does not change the direction of this approximation, only its magnitude.

Step 4: Calculate the Rayleigh quotient r_k = \frac{(Ax_k) \cdot x_k}{x_k \cdot x_k}.

Step 5: Let xk+1 = Axk. Increase k by 1 and repeat Steps 3 through 5.
If the sequence {xk } converges to a dominant eigenvector of A, then the sequence {rk } converges
to the dominant eigenvalue of A.
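A minimal implementation of these steps might look like the following sketch (assuming NumPy; the function name power_method and the fixed iteration count are choices made here for illustration).

```python
import numpy as np

def power_method(A, x0, num_iters=50):
    """Approximate a dominant eigenvector and eigenvalue of A.

    Follows the steps above: scale each iterate by its largest entry in
    absolute value, compute the Rayleigh quotient, then multiply by A again.
    Returns (approximate eigenvector, approximate dominant eigenvalue).
    """
    x = np.array(x0, dtype=float)
    r = 0.0
    for _ in range(num_iters):
        x = x / np.max(np.abs(x))            # Step 3: scale the approximation
        r = (A @ x) @ x / (x @ x)            # Step 4: Rayleigh quotient
        x = A @ x                            # Step 5: next approximation
    return x / np.max(np.abs(x)), r

# The matrix from Preview Activity 19.1: dominant eigenvalue 8, eigenvector [1, 1]^T.
A = np.array([[2.0, 6.0], [5.0, 3.0]])
vec, val = power_method(A, [1.0, 0.0])
print(np.round(vec, 4), round(val, 4))       # approximately [1. 1.] and 8.0
```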
The power method can be useful for approximating a dominant eigenvector as long as the successive multiplications by A are fairly simple – for example, if many entries of A are zero.² The rate of convergence of the sequence {xk} depends on the ratio |λ2/λ1|. If this ratio is close to 1, then it can take many iterations before the power (λ2/λ1)^k makes the v2 term negligible. There are other methods for approximating eigenvalues and eigenvectors, e.g., the QR factorization, that we will not discuss at this point.
²A matrix in which most entries are zero is called a sparse matrix.
The power method only allows us to approximate the dominant eigenvalue and a dominant eigen-
vector for a matrix A. It is possible to modify this method to approximate other eigenvectors and
eigenvalues under certain conditions. We consider an example in the next activity to motivate the
general situation.
Activity 19.3. Let A = \begin{bmatrix} 2 & 6 \\ 5 & 3 \end{bmatrix} be the matrix from Preview Activity 19.1. Recall that 8 is an eigenvalue for A, and a quick calculation can show that −3 is the other eigenvalue of A. Consider the matrix B = (A − (−2)I_2)^{-1} = \frac{1}{10}\begin{bmatrix} -5 & 6 \\ 5 & -4 \end{bmatrix}.

(a) Show that \frac{1}{8-(-2)} and \frac{1}{-3-(-2)} are the eigenvalues of B.

(b) Recall that v1 = [1 1]^T is an eigenvector of A corresponding to the eigenvalue 8 and assume that v2 = [−6 5]^T is an eigenvector for A corresponding to the eigenvalue −3. Calculate the products Bv1 and Bv2. How do the products relate to the results of part (a)?
Activity 19.3 provides evidence that we can translate the matrix A having a dominant eigenvalue
to a different matrix B with the same eigenvectors as A and with a dominant eigenvalue of our
choosing. To see why, let A be an n × n matrix with eigenvalues λ1 , λ2 , . . ., λn , and let α be any
real number distinct from the eigenvalues. Let B = (A − αIn )−1 . In our example in Activity 19.3
the numbers
\frac{1}{\lambda_1 - \alpha}, \ \frac{1}{\lambda_2 - \alpha}, \ \frac{1}{\lambda_3 - \alpha}, \ \ldots, \ \frac{1}{\lambda_n - \alpha}
were the eigenvalues of B, and if vi is an eigenvector for A corresponding to the eigenvalue λi, then vi is an eigenvector of B corresponding to the eigenvalue 1/(λi − α). To see why, let λ be an eigenvalue of an n × n matrix A with corresponding eigenvector v. Let α be a scalar that is not an eigenvalue of A, and let B = (A − αIn)^{-1}. Now
Av = λv
Av − αv = λv − αv
(A − αIn)v = (λ − α)v
\frac{1}{\lambda - \alpha} v = (A − αIn)^{-1} v.
So 1/(λ − α) is an eigenvalue of B with eigenvector v.
Now suppose that A is an n × n matrix with eigenvalues λ1, λ2, . . ., λn, and that we want to approximate an eigenvector and corresponding eigenvalue λi of A. If we can somehow find a value of α so that |λi − α| < |λj − α| for all j ≠ i, then \left|\frac{1}{\lambda_i - \alpha}\right| > \left|\frac{1}{\lambda_j - \alpha}\right| for any j ≠ i. Thus, the matrix B = (A − αIn)^{-1} has \frac{1}{\lambda_i - \alpha} as its dominant eigenvalue and we can use the power method to approximate an eigenvector and the Rayleigh quotient to approximate the eigenvalue \frac{1}{\lambda_i - \alpha}, and hence approximate λi.
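A minimal sketch of this idea, sometimes called the inverse power method, is given below (assuming NumPy; the function name and the shift value are illustrative choices, and the power-method loop is the same as in the sketch earlier in this section).

```python
import numpy as np

def shifted_inverse_power(A, alpha, x0, num_iters=50):
    """Apply the power method to B = (A - alpha*I)^(-1).

    Returns an approximate eigenvector of A and the corresponding
    approximate eigenvalue of A, recovered from mu = 1/(lambda - alpha).
    """
    n = A.shape[0]
    B = np.linalg.inv(A - alpha * np.eye(n))
    x = np.array(x0, dtype=float)
    mu = 0.0
    for _ in range(num_iters):
        x = x / np.max(np.abs(x))
        mu = (B @ x) @ x / (x @ x)      # Rayleigh quotient for B
        x = B @ x
    return x / np.max(np.abs(x)), alpha + 1.0 / mu

# The matrix from Activity 19.3: eigenvalues 8 and -3.
A = np.array([[2.0, 6.0], [5.0, 3.0]])
vec, lam = shifted_inverse_power(A, -2.0, [1.0, 0.0])
# vec is parallel to [-6, 5] (possibly with opposite sign); lam is close to -3.0
print(np.round(vec, 4), round(lam, 4))
```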
Activity 19.4. Let A = \frac{1}{8}\begin{bmatrix} 7 & 3 & 3 \\ 30 & 22 & -10 \\ 15 & -21 & 11 \end{bmatrix}.
(a) Apply the power method to the matrix B = (A − I3 )−1 with initial vector x0 = [1 0 0]T to
fill in Table 19.1 (to four decimal places). Use this information to estimate an eigenvalue
for A and a corresponding eigenvector.
Table 19.1 (to be filled in, to four decimal places):
k = 10: x_k = _______,  x_k^T A x_k / x_k^T x_k = _______
k = 15: x_k = _______,  x_k^T A x_k / x_k^T x_k = _______
k = 20: x_k = _______,  x_k^T A x_k / x_k^T x_k = _______
(b) Applying the power method to the matrix B = (A − 0I3 )−1 with initial vector x0 =
[1 0 0]T yields the information in Table 19.2 (to four decimal places). Use this information
to estimate an eigenvalue for A and a corresponding eigenvector.
Table 19.2:
k = 10: x_k = [0.3344  −0.6677  −1.0000]^T,  x_k^T A x_k / x_k^T x_k = −1.0014
k = 15: x_k = [−0.3333  0.6666  1.0000]^T,  x_k^T A x_k / x_k^T x_k = −1.0000
k = 20: x_k = [0.3333  −0.6666  −1.0000]^T,  x_k^T A x_k / x_k^T x_k = −1.0000
(c) Applying the power method to the matrix B = (A − 5I3 )−1 with initial vector x0 =
[1 0 0]T yields the information in Table 19.3 (to four decimal places). Use this information
to estimate an eigenvalue for A and a corresponding eigenvector.
Examples
What follows are worked examples that use the concepts from this section.
Example 19.1. Let A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.
(a) Approximate the dominant eigenvalue of A accurate to two decimal places using the power
method. Use technology as appropriate.
Table 19.3 (referenced in Activity 19.4(c)):
k = 10: x_k = [0.0000  1.0000  −1.0000]^T,  x_k^T A x_k / x_k^T x_k = −1.0000
k = 15: x_k = [0.0000  −1.0000  1.0000]^T,  x_k^T A x_k / x_k^T x_k = −1.0000
k = 20: x_k = [0.0000  1.0000  −1.0000]^T,  x_k^T A x_k / x_k^T x_k = −1.0000
(b) Find the characteristic polynomial p(λ) of A. Then find the root of p(λ) farthest from the origin. Compare to the result of part (a). Use technology as appropriate.
Example Solution.
(a) We use technology to calculate the scaled vectors Ak x0 for values of k until the components
don’t change in the second decimal place. We start with the seed x0 = [1 1 1]T . For
example, to two decimal places we have xk = [0.28 0.64 1.00]T for k ≥ 20. So we suspect
that [0.28 0.64 1.00]T is close to a dominant eigenvector for A.
For the dominant eigenvalue, we can calculate the Rayleigh quotients \frac{(Ax_k) \cdot x_k}{x_k \cdot x_k} until they do not change to two decimal places. For k ≥ 4, our Rayleigh quotients are all (to two decimal places) equal to 16.12. So we expect that the dominant eigenvalue of A is close to 16.12. Notice that
A[0.28 0.64 1.00]^T = [4.56 10.32 16.08]^T,
which is not far off from 16.12[0.28 0.64 1.00]^T.
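For comparison, a possible call to the power_method sketch given earlier in this section (an assumption, since any comparable tool can be used) is:

```python
import numpy as np

# Assumes the power_method function sketched earlier in this section.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
vec, val = power_method(A, [1.0, 1.0, 1.0], num_iters=25)
print(np.round(vec, 2))   # approximately [0.28 0.64 1.00]
print(round(val, 2))      # approximately 16.12
```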
(a) Use the power method to approximate the dominant eigenvalue and a corresponding eigen-
vector (using scaling) accurate to two decimal places. Use x0 = [1 1 1]T as the seed.
(b) Determine the exact value of the dominant eigenvalue of A and compare to your result from
part (a).
(c) Approximate the remaining eigenvalues of A using the inverse power method. (Hint: Try
α = 0.5 and α = 1.8.)
Example Solution.
(a) We use technology to calculate the scaled vectors A^k x0 for values of k until the components don't change in the second decimal place. For example, to two decimal places we have xk = [0.50 1.00 0.50]^T for k ≥ 4. So we suspect that [1/2 1 1/2]^T is a dominant eigenvector for A.
For the dominant eigenvalue, we can calculate the Rayleigh quotients \frac{(Ax_k) \cdot x_k}{x_k \cdot x_k} until they do not change to two decimal places. For k ≥ 2, our Rayleigh quotients are all (to two decimal places) equal to 4. So we expect that the dominant eigenvalue of A is 4. We could also use the fact that
A[1/2 1 1/2]^T = [2 4 2]^T = 4[1/2 1 1/2]^T
to see that [1/2 1 1/2]^T is a dominant eigenvector for A with eigenvalue 4.
(b) We can see from the characteristic polynomial that 4 is the dominant eigenvalue of A.
(c) Applying the power method to B = (A − 0.5I3)^{-1} with seed x0 = [1 1 1]^T gives xk ≈ [0.50 1.00 0.50]^T for k ≥ 5, with Rayleigh quotients of 2 (to several decimal places). So 2 is the dominant eigenvalue of B. But \frac{1}{\lambda - 0.5} is also the dominant eigenvalue of B, where λ is the corresponding eigenvalue of A. So to find λ, we note that \frac{1}{\lambda - 0.5} = 2 implies that λ = 1 is an eigenvalue of A.
Now applying the power method to B = (A − 1.8I3)^{-1} with seed x0 = [1 1 1]^T gives xk ≈ [1.00 −1.00 1.00]^T for large enough k, with Rayleigh quotients of 5 (to several decimal places). To find the corresponding eigenvalue λ for A, we note that \frac{1}{\lambda - 1.8} = 5, or λ = 2 is an eigenvalue of A.
Admittedly, this method is very limited. Finding good choices for α often depends on
having some information about the eigenvalues of A. Choosing α close to an eigenvalue
provides the best chance of obtaining that eigenvalue.
Summary
• The power method is an iterative method that can be used to approximate the dominant eigen-
value of an n × n matrix A that has n linearly independent eigenvectors and a dominant
eigenvalue.
• To use the power method we start with a seed x0 and then calculate the sequence {xk } of
vectors, where xk = Axk−1 . If x0 is chosen well, then the sequence {xk } converges to a
dominant eigenvector of A.
Exercises
(1) Let A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}. Let x0 = [1 0]^T.
(a) Find the eigenvalues and corresponding eigenvectors for A.
(b) Use appropriate technology to calculate xk = Ak x0 for k up to 10. Compare to a
dominant eigenvector for A.
(c) Use the eigenvectors from part (b) to approximate the dominant eigenvalue for A.
Compare to the exact value of the dominant eigenvalue of A.
(d) Assume that the other eigenvalue for A is close to 0. Apply the inverse power method
and compare the results to the remaining eigenvalue and eigenvectors for A.
(2) Let A = \begin{bmatrix} 1 & 2 & 0 \\ -2 & 1 & 2 \\ 1 & 3 & 1 \end{bmatrix}. Use the power method to approximate a dominant eigenvector for
A. Use x0 = [1 1 1]T as the seed. Then approximate the dominant eigenvalue of A.
(3) Let A = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}. Use the power method starting with x0 = [1 1]^T. Explain why the
method fails in this case to approximate a dominant eigenvector, and how you could adjust
the seed to make the process work.
(4) Let A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
(a) Find the eigenvalues and an eigenvector for each eigenvalue.
(b) Apply the power method with an initial starting vector x0 = [0 1]T . What is the
resulting sequence?
(c) Use equation (19.3) to explain the sequence you found in part (b).
(5) Let A = \begin{bmatrix} 2 & 6 \\ 5 & 3 \end{bmatrix}. Fill in the entries in Table 19.4, where xk is the kth approximation to a dominant eigenvector using the power method, starting with the seed x0 = [1 0]^T. Compare the results of this table to the eigenvalues of A and \lim_{k \to \infty} \frac{x_{k+1} \cdot x_k}{x_k \cdot x_k}. What do you notice?
(6) Let A = \begin{bmatrix} 4 & -5 \\ 2 & 15 \end{bmatrix}. The power method will approximate the dominant eigenvalue λ = 14.
In this exercise we explore what happens if we apply the power method to A−1 .
Table 19.4 (for Exercise 5):
v:                 x0   x1   x2   x3   x4   x5
v^T A v / v^T v:

v:                 x6   x7   x8   x9   x10   x11
v^T A v / v^T v:
(a) Apply the power method to A−1 to approximate the dominant eigenvalue of A−1 .
Use [1 1]T as the seed. How is this eigenvalue related to an eigenvalue of A?
(b) Explain in general why applying the power method to the inverse of an invertible
matrix B might give an approximation to an eigenvalue of B of smallest magnitude.
When might this not work?
(7) There are other algebraic methods that do not rely on the determinant of a matrix that can be
used to find eigenvalues of a matrix. We examine one such method in this exercise. Let A be
any n × n matrix, and let v be any vector in Rn .
(a) Explain why the vectors
v, Av, A^2 v, . . . , A^n v
are linearly dependent.
(b) Let c0, c1, . . ., cn be scalars, not all 0, so that
c0 v + c1 Av + c2 A^2 v + · · · + cn A^n v = 0.
Explain why there must be a smallest positive integer k so that there are scalars a0, a1, . . ., ak with ak ≠ 0 such that
a0 v + a1 Av + a2 A^2 v + · · · + ak A^k v = 0.
(c) Let
q(t) = a0 + a1 t + a2 t^2 + · · · + ak t^k.
Then
q(A) = a0 In + a1 A + a2 A^2 + · · · + ak A^k
and
q(A)v = (a0 In + a1 A + a2 A^2 + · · · + ak A^k)v
= a0 v + a1 Av + a2 A^2 v + · · · + ak A^k v
= 0.
Suppose the polynomial q(t) has a linear factor, say q(t) = (t − λ)Q(t) for some
degree k −1 polynomial Q(t). Explain why, if Q(A)v is non-zero, λ is an eigenvalue
of A with eigenvector Q(A)v.
(d) This method allows us to find certain eigenvalues and eigenvectors, the roots of the polynomial q(t). Any other eigenvector must lie outside the eigenspaces we have already found, so repeating the process with a vector v not in any of the known eigenspaces will produce different eigenvalues and eigenvectors. Let A = \begin{bmatrix} 2 & 2 & -1 \\ 2 & 2 & 2 \\ 0 & 0 & 6 \end{bmatrix}.
i. Find the polynomial q(t). Use v = [1 1 1]T .
ii. Find all of the roots of q(t).
iii. For each root λ of q(t), find the polynomial Q(t) and use this polynomial to
determine an eigenvector of A. Verify your work.
(8) We don’t need to use the Rayleigh quotients to approximate the dominant eigenvalue of a
matrix A if we instead keep track of the scaling factors. Recall that the scaling in the power
method can be used to make the magnitudes of the successive approximations smaller and
easier to work with. Let A be an n × n matrix and begin with a non-zero seed v0 . We
now want to keep track of the scaling factors, so let α0 be the component of v0 with largest
absolute value and let x0 = \frac{1}{|\alpha_0|} v0. For k ≥ 1, let vk = Axk−1, let αk be the component of vk with largest absolute value and let xk = \frac{1}{\alpha_k} vk.
(a) Let A = \begin{bmatrix} 0 & 1 \\ -8 & 6 \end{bmatrix}. Use x0 = [1 1]^T as the seed and calculate αk for k from 1 to
10. Compare to the dominant eigenvalue of A.
(b) Assume that for large k the vectors xk approach a dominant eigenvector with dom-
inant eigenvalue λ. Show now in general that the sequence of scaling factors αk
approaches λ.
(9) Let A be an n × n matrix and let α be a scalar that is not an eigenvalue of A. Suppose that x
is an eigenvector of B = (A − αIn )−1 with eigenvalue β. Find an eigenvalue of A in terms
of β and α with corresponding eigenvector x.
(10) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False The largest eigenvalue of a matrix is a dominant eigenvalue.
(b) True/False If an n × n matrix A has n linearly independent eigenvectors and a
dominant eigenvalue, then the sequence {Ak x0 } converges to a dominant eigenvector
of A for any initial vector x0 .
(c) True/False If λ is an eigenvalue of an n × n matrix A and α is not an eigenvalue of
A, then λ − α is an eigenvalue of A − αIn .
(d) True/False Every square matrix has a dominant eigenvalue.
Sheep farming is a significant industry in New Zealand. New Zealand is reported to have the
highest density of sheep in the world. Sheep can begin to reproduce after one year, and give birth
only once per year. Table 19.5 gives Birth and Survival Rates for Female New Zealand Sheep (from
G. Caughley, “Parameters for Seasonally Breeding Populations,” Ecology, 48, (1967), 834-839).
Since sheep hardly ever live past 12 years, we will only consider the population through 12 years.
As sheep reproduce, they add to the 0-1 sheep (lamb) population. The potential to produce
offspring is called fecundity (derived from the word fecund which generally refers to reproductive
ability) and determines how many lamb are added to the population. Let Fk (the fecundity rate) be
the rate at which females in age class k give birth to female offspring. Not all members of a given
age group survive to the next age groups, so let sk be the fraction of individuals that survives from
age group k to age group k + 1. With these ideas in mind, we can create a life cycle chart as in
Figure 19.2 that illustrates how the population of sheep changes on a farm (for the sake of space,
we illustrate with four age classes).
Figure 19.2: A life cycle chart with four age classes, fecundity rates F1, F2, F3, F4, and survival rates s1, s2, s3.
To model the sheep population, we need a few variables. Let n_1^{(0)} be the number of sheep in age group 0-1, n_2^{(0)} the number in age group 1-2, n_3^{(0)} the number in age group 2-3 and, in general, n_k^{(0)} the number of sheep in age group (k − 1)-k at some initial time (time 0), and let
x_0 = \left[ n_1^{(0)} \ n_2^{(0)} \ n_3^{(0)} \ \cdots \ n_{12}^{(0)} \right]^T.
We wish to determine the populations in the different groups after one year. Let
x_1 = \left[ n_1^{(1)} \ n_2^{(1)} \ n_3^{(1)} \ \cdots \ n_{12}^{(1)} \right]^T,
where n_1^{(1)} denotes the number of sheep in age group 0-1, n_2^{(1)} the number of sheep in age group 1-2 and, in general, n_k^{(1)} the number of sheep in age group (k − 1)-k after one year.
Project Activity 19.1. Table 19.5 shows that, on average, each female in age group 1-2 produces 0.045 female offspring in a year. Since there are n_2^{(0)} females in age group 1-2, the lamb population increases by 0.045 n_2^{(0)} in a year.

(b) Explain why n_2^{(1)} = 0.845 n_1^{(0)}.
(1) A Leslie matrix L has a unique positive eigenvalue λ1 with a corresponding eigenvector v1
whose entries are all positive.
(2) If λi (i > 1) is any other eigenvalue (real or complex) of L, then |λi | ≤ λ1 . If λ1 is the largest
magnitude eigenvalue of a matrix L, we call λ1 a dominant eigenvalue of L.
(3) If any two successive entries in the first row of L are both positive, then |λi | < λ1 for every
i > 1. In this case we say that λ1 is a strictly dominant eigenvalue of L. In a Leslie model, this
happens when the females in two successive age classes are fertile, which is almost always
the case.
We can use these properties to determine the long-term behavior of the sheep herd.
Project Activity 19.2. For any positive integer m, let
x_m = \left[ n_1^{(m)} \ n_2^{(m)} \ n_3^{(m)} \ \cdots \ n_{12}^{(m)} \right]^T,
where n_1^{(m)} denotes the number of sheep in age group 0-1, n_2^{(m)} the number of sheep in age group 1-2 and, in general, n_k^{(m)} the number of sheep in age group (k − 1)-k after m years.
(a) Assume that x0 = [100 100 100 · · · 100]T . Use appropriate technology to calculate x22 ,
x23 , x24 , and x25 . Round to the nearest whole number. What do you notice about the sheep
population? You may use the GeoGebra applet at https://www.geogebra.org/m/
yqss88xq.
(b) We can use the third and fourth properties of Leslie matrices to better understand the long-
term behavior of the sheep population. Since successive entries in the first row of the Leslie
matrix in (19.6) are positive, our Leslie matrix has a strictly dominant eigenvalue λ1 . Given
the dimensions of our Leslie matrix, finding this dominant eigenvalue through algebraic
means is not feasible. Use the power method to approximate the dominant eigenvalue λ1
of the Leslie matrix in (19.6) to five decimal places. Explain your process. Then explain
how this dominant eigenvalue tells us that, unchecked, the sheep population grows at a rate
that is roughly exponential. What is the growth rate of this exponential growth? You may
use the GeoGebra applet at https://www.geogebra.org/m/yqss88xq.
Project Activity 19.2 indicates that, unchecked, the sheep population will grow without bound,
roughly exponentially with ratio equal to the dominant eigenvalue of our Leslie matrix L. Of course,
a sheep farmer cannot provide the physical environment or the resources to support an unlimited
population of sheep. In addition, most sheep farmers cannot support themselves only by shearing
sheep for the wool. Consequently, some harvesting of the sheep population each year for meat and
skin is necessary. A sustainable harvesting policy allows for the regular harvesting of some sheep
while maintaining the population at a stable level. It is necessary for the farmer to find an optimal
harvesting rate to attain this stable population and the following activity leads us through an analysis
of how such a harvesting rate can be determined.
Project Activity 19.3. The Leslie model can be modified to consider harvesting. It is possible to
harvest different age groups at different rates, and to harvest only some age groups and not others.
In the case of sheep, it might make sense to only harvest from the youngest population since lamb
is more desirable than mutton and the lamb population grows the fastest. Assume that this is our
harvesting strategy and that we harvest our sheep from only the youngest age group at the start of
each year. Let h be the fraction of sheep we harvest from the youngest age group each year after
considering growth.
(a) If we begin with an initial population x0 , then the state vector after births and expected
deaths is Lx0 . Now we harvest. Explain why if we harvest a fraction h from the youngest
age group after considering growth, then the state vector after 1 year will be
x1 = Lx0 − HLx0 ,
where H is the 12 × 12 matrix whose (1, 1) entry is h and whose other entries are all 0:
H = \begin{bmatrix} h & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.
(b) Our goal is to find a harvesting rate that will lead to a steady state in which the sheep
population remains the same each year. In other words, we want to find a value of h, if one
exists, that satisfies
x = Lx − HLx. (19.7)
(c) Use appropriate technology to experiment numerically with different values of h to find the
value you think gives the best uniform harvest rate. Explain your reasoning. You may use
the GeoGebra applet at https://www.geogebra.org/m/yqss88xq.
(d) Now we will use some algebra to find an equation that explicitly gives us the harvest rate in
the general setting. This will take a bit of work, but none of it is too difficult. To simplify
our work but yet illustrate the overall idea, let us consider the general 4 × 4 case with
arbitrary Leslie matrix
L = \begin{bmatrix} F_1 & F_2 & F_3 & F_4 \\ s_1 & 0 & 0 & 0 \\ 0 & s_2 & 0 & 0 \\ 0 & 0 & s_3 & 0 \end{bmatrix}.
Recall that we want to find a value of h that satisfies (19.8) with
H = \begin{bmatrix} h & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
Let x = [x1 x2 x3 x4]^T.
i. Calculate the matrix product (I4 − H)L. Explain why this product is again a Leslie
matrix and why (I4 − H)L will have a dominant eigenvalue of 1.
ii. Now calculate (I4 − H)Lx and set it equal to x. Write down the resulting system of 4 equations that must be satisfied. Be sure that your first equation is
x_1 = (1 − h)F_1 x_1 + (1 − h)F_2 x_2 + (1 − h)F_3 x_3 + (1 − h)F_4 x_4.   (19.9)
iii. Equation (19.9) as written depends on the entries of the vector x, but we should be able to arrive at a result that is independent of x. To see how we do this, we assume the population of the youngest group is never 0, so we can divide both sides of (19.9) by x1 to obtain
1 = (1 − h)F_1 + (1 − h)F_2 \frac{x_2}{x_1} + (1 − h)F_3 \frac{x_3}{x_1} + (1 − h)F_4 \frac{x_4}{x_1}.   (19.10)
Now we need to write the fractions \frac{x_2}{x_1}, \frac{x_3}{x_1}, and \frac{x_4}{x_1} so that they do not involve the xi. Use the remaining equations in your system to show that
\frac{x_2}{x_1} = s_1, \qquad \frac{x_3}{x_1} = s_1 s_2, \qquad \frac{x_4}{x_1} = s_1 s_2 s_3.
iv. Now conclude that the harvesting value h must satisfy the equation
1 = (1 − h)[F1 + F2 s1 + F3 s1 s2 + F4 s1 s2 s3 ]. (19.11)
(e) Extend (19.11) to the 12 age group case of the sheep herd. Calculate the value of R for this
sheep herd and then find the value of h. Compare this h to the value you obtained through
experimentation earlier. Find the fraction of the lambs that should be harvested each year
and explain what the stable population state vector x tells us about the sheep population for
this harvesting policy.
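The computations in this project are easy to script. The sketch below (assuming NumPy) uses a small hypothetical 4 × 4 Leslie matrix (the fecundity and survival values are made up for illustration and are not the sheep data of Table 19.5), estimates the dominant eigenvalue with the power method, and computes the harvest fraction h from equation (19.11).

```python
import numpy as np

# Hypothetical fecundity and survival rates for a 4-age-class Leslie model.
F = [0.0, 1.2, 1.5, 0.8]       # fecundity rates F1..F4 (illustrative values)
s = [0.8, 0.7, 0.6]            # survival rates s1..s3 (illustrative values)

L = np.zeros((4, 4))
L[0, :] = F                    # first row holds the fecundity rates
for i, rate in enumerate(s):
    L[i + 1, i] = rate         # subdiagonal holds the survival rates

# Dominant eigenvalue via the power method (the long-term growth rate).
x = np.ones(4)
for _ in range(200):
    x = L @ x
    x = x / np.max(np.abs(x))
growth_rate = (L @ x) @ x / (x @ x)
print("approximate dominant eigenvalue:", round(growth_rate, 5))

# Harvest fraction from (19.11): 1 = (1 - h)[F1 + F2 s1 + F3 s1 s2 + F4 s1 s2 s3].
# R below is assumed to be the bracketed quantity referenced in part (e).
R = F[0] + F[1]*s[0] + F[2]*s[0]*s[1] + F[3]*s[0]*s[1]*s[2]
h = 1 - 1/R
print("harvest fraction h:", round(h, 5))
```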
Section 20
Complex Eigenvalues
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.
We have now seen different methods for calculating/approximating eigenvalues of a matrix. The
algebraic method using the characteristic polynomial can provide exact values, but only in cases
where the size of the matrix is small. Methods like the power method allow us to approximate
eigenvalues in many, but not all, cases. These approximation techniques can be made more efficient
if we have some idea of where the eigenvalues are. The Gershgorin Disc Theorem is a useful
tool that can quickly provide bounds on the location of eigenvalues using elementary calculations.
For example, using the Gershgorin Disk Theorem we can quickly tell that the real parts of the eigenvalues of the matrix
\begin{bmatrix} 3 & 1 & -1 \\ 0 & -1+i & i \\ 2 & 1 & -2i \end{bmatrix}
lie between −4 and 5 and the imaginary parts lie between −5 and 2. Even more, we can say that the eigenvalues lie in the disks (called Gershgorin disks) shown in Figure 20.1. We will learn more details about the Gershgorin Disk Theorem at the end of this section.
Figure 20.1: The Gershgorin disks in the complex plane (real and imaginary axes), centered at 3, −1 + i, and −2i.
Introduction
So far we have worked with real matrices whose eigenvalues are all real. However, the characteristic
polynomial of a matrix with real entries can have complex roots. In this section we investigate the
properties of these complex roots and their corresponding eigenvectors, how these complex eigen-
vectors are found, and the geometric interpretation of the transformations defined by matrices with
complex eigenvalues. Although we can consider matrices that have complex numbers as entries,
we will restrict ourselves to matrices with real entries.
Preview Activity 20.1. Let A = \begin{bmatrix} 2 & 4 \\ -2 & 2 \end{bmatrix}.
Complex Eigenvalues
As you noticed in Preview Activity 20.1, the complex roots of the characteristic equation of a real
matrix A come in complex conjugate pairs. This should come as no surprise since we know through
our use of the quadratic formula that complex roots of (real) quadratic polynomials come in complex
conjugate pairs. More generally, if p(x) = a0 + a1 x + a2 x^2 + · · · + an x^n is a polynomial with real coefficients and z is a root of this polynomial, meaning p(z) = 0, then
0 = \overline{p(z)} = \overline{a_0 + a_1 z + a_2 z^2 + \cdots + a_n z^n} = a_0 + a_1 \overline{z} + a_2 \overline{z}^2 + \cdots + a_n \overline{z}^n = p(\overline{z}).
Therefore, \overline{z} is also a root of p(x).
Activity 20.1. Let A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
In Preview Activity 20.1 and in Activity 20.1, you found that if v is an eigenvector of A corresponding to λ, then \overline{v}, obtained by taking the complex conjugate of each entry in v, is an eigenvector of A corresponding to \overline{λ}. Specifically, if v = u + iw, where both u and w are real vectors, is an eigenvector of A, then so is \overline{v} = u − iw. We can justify this property using matrix algebra as follows:
A\overline{v} = \overline{A}\,\overline{v} = \overline{Av} = \overline{λ v} = \overline{λ}\,\overline{v}.
In the first equality, we used the fact that A is a real matrix, so \overline{A} = A. In all the other equalities, we used the properties of the conjugation operation in complex numbers.
Recall that a rotation matrix has the form
\begin{bmatrix} \cos(θ) & -\sin(θ) \\ \sin(θ) & \cos(θ) \end{bmatrix},
where the rotation is counterclockwise about the origin by an angle of θ radians. In Activity 20.1, we considered the rotation matrix with angle π/2 in counterclockwise direction. We will soon see that rotation matrices play an important role in the geometry of a matrix transformation for a matrix that has complex eigenvalues. In this activity, we will restrict ourselves to the 2 × 2 case, but similar arguments can be made in higher dimensions.
Activity 20.2. Let A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}.

(b) Although A is not a rotation matrix, there is a rotation matrix B inside A. To find the matrix B, factor out \sqrt{2} from all entries of A. In other words, write A as a product of two matrices in the form
A = \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix} B.
(d) If we think about the product of two matrices as applying one transformation after another,
describe the effect of the matrix transformation defined by A geometrically.
More generally, if we have a matrix A of the form A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}, then
A = \begin{bmatrix} \sqrt{a^2+b^2} & 0 \\ 0 & \sqrt{a^2+b^2} \end{bmatrix} \begin{bmatrix} \frac{a}{\sqrt{a^2+b^2}} & \frac{-b}{\sqrt{a^2+b^2}} \\ \frac{b}{\sqrt{a^2+b^2}} & \frac{a}{\sqrt{a^2+b^2}} \end{bmatrix}.
The first matrix in the decomposition is a scaling matrix with a scaling factor of s = \sqrt{a^2+b^2}. So if s > 1, the transformation stretches vectors, and if s < 1, the transformation shrinks vectors. The second matrix in the decomposition is a rotation matrix with angle θ such that \cos(θ) = \frac{a}{\sqrt{a^2+b^2}} and \sin(θ) = \frac{b}{\sqrt{a^2+b^2}}. This angle is also the angle between the positive x-axis and the vector v = \begin{bmatrix} a \\ b \end{bmatrix}. We will refer to the matrices of the form \begin{bmatrix} a & -b \\ b & a \end{bmatrix} as rotation-scaling matrices.
Now we will investigate how a general 2 × 2 matrix with complex eigenvalues can be seen to be
similar (both in a linear algebra and a colloquial meaning) to a rotation-scaling matrix.
Activity 20.3. Let B = \begin{bmatrix} 1 & -5 \\ 2 & 3 \end{bmatrix}. The eigenvalues of B are 2 ± 3i. An eigenvector for the eigenvalue 2 − 3i is v = \begin{bmatrix} -5 \\ 1-3i \end{bmatrix}. We will use this eigenvector to show that B is similar to a rotation-scaling matrix.
(a) Any complex vector v can be written as v = u + iw where both u and w are real vectors.
What are these real vectors u and w for the eigenvector v above?
(b) Let P = [u w] be the matrix whose first column is the real part of v and whose second
column is the imaginary part of v (without the i). Find R = P −1 BP .
(c) Express R as a product of a rotation and a scaling matrix. What is the factor of scaling?
What is the rotation angle?
In Activity 20.3, we saw that the matrix B with complex eigenvalues 2 ± 3i is similar to a rotation-scaling matrix. Specifically R = P^{-1}BP, where the columns of P are the real and imaginary parts of an eigenvector of B, is the rotation-scaling matrix with a factor of scaling by \sqrt{2^2+3^2} and a rotation by angle θ = \arccos\left(\frac{2}{\sqrt{2^2+3^2}}\right).
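A quick numerical check of this decomposition for the matrix B above might look like the following minimal sketch (assuming NumPy; the sign conventions follow the choice of the eigenvalue 2 − 3i made in Activity 20.3).

```python
import numpy as np

B = np.array([[1.0, -5.0],
              [2.0,  3.0]])

# Eigenvector for the eigenvalue 2 - 3i, written as v = u + i w.
v = np.array([-5.0 + 0.0j, 1.0 - 3.0j])
u, w = v.real, v.imag

# P has columns u and w; R = P^{-1} B P should be the rotation-scaling matrix [[2, -3], [3, 2]].
P = np.column_stack((u, w))
R = np.linalg.inv(P) @ B @ P
print(np.round(R, 10))

scale = np.sqrt(R[0, 0]**2 + R[1, 0]**2)   # sqrt(a^2 + b^2) = sqrt(13)
angle = np.arccos(R[0, 0] / scale)         # rotation angle in radians
print(scale, angle)
```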
Does a similar decomposition result hold for a general 2 × 2 matrix with complex eigenvalues?
We investigate this question in the next activity.
Activity 20.4. Let A be a 2 × 2 matrix with complex eigenvalue λ = a − bi, b ≠ 0, and corre-
sponding complex eigenvector v = u + iw.
ii. Recall that if M is an m × n matrix and x is an n × 1 vector, then the matrix product
M x is a linear combination of the columns of M with weights the corresponding
entries of the vector x. Use this idea to show that
P R = [(au + bw)  (−bu + aw)].
The one fact that we have not yet addressed is why the matrix P = [u w] is invertible. We do
that now to complete the argument.
Let A be a real 2 × 2 matrix with Av = λv, where λ = a − bi, b ≠ 0 and v = u + iw. To
show that u and w are linearly independent, we need to show that no nontrivial linear combination
of u and w can be the zero vector. Suppose
x1 u + x2 w = 0
for some scalars x1 and x2 . We will show that x1 = x2 = 0. Assume to the contrary that one of
x1, x2 is not zero. First, assume x1 ≠ 0. Then u = −\frac{x_2}{x_1} w. Let c = −\frac{x_2}{x_1}, so that u = cw. Then
Au = A(cw)
Au = cAw
au + bw = c(−bu + aw)
(a + cb)u = (ca − b)w
(a + cb)(cw) = (ca − b)w.
So we must have (a+cb)c = ca−b. This equation simplifies to c2 b = −b. Since b 6= 0, we conclude
that c2 = −1 which is impossible for a real constant c. Therefore, we cannot have x1 6= 0. A similar
argument (left to the reader) shows that x2 = 0. Thus we can conclude that u and w are linearly
independent.
Examples
What follows are worked examples that use the concepts from this section.
Example 20.2. Let A = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix}.
(a) Without doing any computations, explain why not all of the eigenvalues of A can be com-
plex.
Example Solution.
(a) Since complex eigenvalues occur in conjugate pairs, the complex eigenvalues with nonzero
imaginary parts occur in pairs. Since A can have at most 3 different eigenvalues, at most
two of them can have nonzero imaginary parts. So at least one eigenvalue of A is real.
(b) For this matrix A we have A − λI3 = \begin{bmatrix} -λ & 1 & 0 \\ -1 & -λ & -1 \\ 1 & 1 & 1-λ \end{bmatrix}. Using a cofactor expansion along the first row gives us
det(A − λI3) = (−λ)((−λ)(1 − λ) + 1) − ((−1)(1 − λ) + 1)
= −λ^3 + λ^2 − λ − λ
= −λ^3 + λ^2 − 2λ
= −λ(λ^2 − λ + 2).
The roots of the characteristic polynomial are λ = 0 and
λ = \frac{1 \pm \sqrt{1 - 4(2)}}{2} = \frac{1}{2}\left(1 \pm \sqrt{7}\,i\right).
Example 20.3. Let A = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}. Find a rotation-scaling matrix R that is similar to A. Identify the rotation and scaling factor.

Example Solution. The eigenvalues of A are the roots of the characteristic polynomial
p(λ) = det(A − λI2) = det\begin{bmatrix} 1-λ & 2 \\ -1 & 3-λ \end{bmatrix} = (1 − λ)(3 − λ) + 2 = λ^2 − 4λ + 5.
So the eigenvalues of A are λ = 2 ± i. Using the eigenvalue 2 − i (so a = 2 and b = 1), A is similar to the rotation-scaling matrix R = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}. The scaling factor is \sqrt{a^2+b^2} = \sqrt{5} (note that det(R) = 5 = (\sqrt{5})^2), and the angle θ of rotation satisfies \cos(θ) = \frac{2}{\sqrt{5}} and \sin(θ) = \frac{1}{\sqrt{5}}. This makes θ ≈ 0.4636 radians, or approximately 26.57°, counterclockwise.
Summary
• If A is a real 2 × 2 matrix with complex eigenvalue a − bi (b ≠ 0) and corresponding eigenvector v = u + iw, then A is similar to the rotation-scaling matrix R = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}. More specifically, A = P RP^{-1}, where P = [u w].
Exercises
(2) Find a rotation-scaling matrix where the rotation angle is θ = 3π/4 and scaling factor is less
than 1.
(6) Find a real 2 × 2 matrix which is not a rotation-scaling matrix with eigenvalue −1 + 2i.
(7) We have seen how to find the characteristic polynomial of an n × n matrix. In this exercise we consider the reverse question. That is, given a polynomial p(λ) of degree n, can we find an n × n matrix whose characteristic polynomial is p(λ)?

(a) Find the characteristic polynomial of the 2 × 2 matrix C = \begin{bmatrix} 0 & -a_0 \\ 1 & -a_1 \end{bmatrix}. Use this result to find a real valued matrix whose eigenvalues are 1 + i and 1 − i.

(b) Repeat part (a) by showing that −p(λ) = −\left(λ^3 + a_2 λ^2 + a_1 λ + a_0\right) is the characteristic polynomial of the 3 × 3 matrix C = \begin{bmatrix} 0 & 0 & -a_0 \\ 1 & 0 & -a_1 \\ 0 & 1 & -a_2 \end{bmatrix}.
(c) We can generalize this argument. Prove, using mathematical induction, that the poly-
nomial
(8) Label each of the following statements as True or False. Provide justification for your re-
sponse.
(a) True/False If 3 − 4i is an eigenvalue of a real matrix, then so is 3 + 4i.
(b) True/False If 2 + 3i is an eigenvalue of a 3 × 3 real matrix A, then A has three
distinct eigenvalues.
(c) True/False Every 2 × 2 real matrix with complex eigenvalues is a rotation-scaling
matrix.
(d) True/False Every square matrix with real entries has real number eigenvalues.
(e) True/False If A is a 2 × 2 matrix with complex eigenvalues similar to a rotation-
scaling matrix R, the eigenvalues of A and R are the same.
(f) True/False If A is a real matrix with complex eigenvalues, all eigenvectors of A must
be non-real.
To understand the Gershgorin Disk Theorem, we need to recall how to visualize a complex number in the plane. Recall that a complex number z is a number of the form z = a + bi where a and b are real numbers and i^2 = −1. The number a is the real part of z, denoted Re(z), and b is the imaginary part of z, denoted Im(z). The set of all complex numbers is denoted C. We define addition and multiplication on C as follows. For a + bi, c + di ∈ C,
(a + bi) + (c + di) = (a + c) + (b + d)i and (a + bi)(c + di) = (ac − bd) + (ad + bc)i.
Note that the product is what we would expect if we “expanded” the product in the normal way and used the fact that i^2 = −1. The set of complex numbers forms a field – that is, C satisfies all of the same properties as R as stated in Theorem 4.2.
We can visualize the complex number a + bi in the plane as the point (a, b). Here we are viewing the horizontal axis as the real axis and the vertical axis as the imaginary axis. The length (or magnitude) of the complex number z = a + bi, which we denote as |z|, is the distance from the origin to z. So by the Pythagorean Theorem we have |a + bi| = \sqrt{a^2 + b^2}. Note that the magnitude of z = a + bi can be written as a complex product
|z| = \sqrt{(a + bi)(a − bi)}.
The conjugate of z = a + bi is \overline{z} = a − bi. For complex numbers z = a + bi and w = c + di, the following properties hold:
• \overline{zw} = \overline{(ac − bd) + (ad + bc)i} = (ac − bd) − (ad + bc)i = (a − bi)(c − di) = \overline{z}\,\overline{w},
• \overline{\overline{z}} = z,
• |z| = \sqrt{a^2 + b^2} \ge \sqrt{a^2} = |a| = |Re(z)|,
• |z| = \sqrt{a^2 + b^2} \ge \sqrt{b^2} = |b| = |Im(z)|,
• |\overline{z}| = |z|,
• If p(x) is a polynomial with real coefficients and the complex number z satisfies p(z) = 0, then p(\overline{z}) = 0 as well.
Using these facts we can show that the triangle inequality is true for complex numbers. That is,
|z + w| ≤ |z| + |w|.
|z + w|² = (z + w)(z̄ + w̄)
         = zz̄ + zw̄ + wz̄ + ww̄
         = |z|² + 2Re(zw̄) + |w|²          (since wz̄ is the conjugate of zw̄)
         ≤ |z|² + 2|zw̄| + |w|²
         = |z|² + 2|z||w| + |w|²
         = (|z| + |w|)².
Since |z + w|, |z|, and |w| are all non-negative, taking square roots of both sides gives us |z + w| ≤
|z| + |w| as desired. We can extend this triangle inequality to any number of complex numbers.
That is, if z1, z2, . . ., zk are complex numbers, then

|z1 + z2 + · · · + zk| ≤ |z1| + |z2| + · · · + |zk|.   (20.1)
We can prove Equation (20.1) by mathematical induction. We have already done the k = 2 case
and so we assume that Equation (20.1) is true for any sum of k complex numbers. Now suppose
that z1, z2, . . ., zk, zk+1 are complex numbers. Then

|z1 + z2 + · · · + zk + zk+1| ≤ |z1 + z2 + · · · + zk| + |zk+1| ≤ (|z1| + |z2| + · · · + |zk|) + |zk+1|,

where the first inequality is the two-number triangle inequality and the second follows from the induction hypothesis. This establishes Equation (20.1) for k + 1 complex numbers and completes the induction.
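A quick numerical check of Equation (20.1) (a sketch with arbitrarily chosen values, not taken from the text):

    # Verify |z1 + ... + zk| <= |z1| + ... + |zk| for a few sample complex numbers.
    zs = [3 + 4j, -1 + 2j, 0.5 - 1j, -2 - 2j]

    lhs = abs(sum(zs))
    rhs = sum(abs(z) for z in zs)
    print(lhs, rhs, lhs <= rhs)    # the left side is smaller, as Equation (20.1) asserts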
To prove the Gershgorin Disk Theorem, we will use the Levy-Desplanques Theorem, which
gives conditions that guarantee that a matrix is invertible. We illustrate with an example in the
following activity.
Project Activity 20.1. Let

A = [  3  2 ]
    [ −1  4 ].

Since det(A) ≠ 0, we know that A is an invertible matrix. Let us assume for a moment that we don't know that A is invertible and try to determine whether 0 can be an eigenvalue of A. To that end, suppose that v = [v1 v2]^T is a nonzero vector with Av = 0.
(a) Use the fact that 3v1 + 2v2 = 0 to show that |v2 | > |v1 |.
(b) Use the fact that −v1 + 4v2 = 0 to show that |v1 | > |v2 |. What conclusion can we draw
about whether 0 is an eigenvalue of A? Why does this mean that A is invertible?
What makes the arguments work in Project Activity 20.1 is that |3| > |2| and |4| > | − 1|. This
argument can be extended to larger matrices, as described in the following theorem.
Theorem 20.4 (Levy-Desplanques Theorem). Any square matrix A = [aij] satisfying |aii| > Σ_{j≠i} |aij| for all i is invertible.

Proof. Let A = [aij] be an n × n matrix satisfying |aii| > Σ_{j≠i} |aij| for all i. Let us assume that A is not invertible, that is, that there is a vector v ≠ 0 such that Av = 0. Let v = [v1 v2 · · · vn]^T and let t be between 1 and n so that |vt| ≥ |vi| for all i. That is, choose vt to be the component of v with the largest absolute value.

Expanding the product Av using the row-column product along the tth row shows that

at1 v1 + at2 v2 + · · · + atn vn = 0,   so   att vt = − Σ_{j≠t} atj vj.

Then

|att| |vt| = | Σ_{j≠t} atj vj | ≤ Σ_{j≠t} |atj| |vj| ≤ ( Σ_{j≠t} |atj| ) |vt|,

and dividing by |vt| (which is positive, since vt is the largest-magnitude component of the nonzero vector v) gives |att| ≤ Σ_{j≠t} |atj|. But this contradicts the condition that |aii| > Σ_{j≠i} |aij| for all i. We conclude that 0 is not an eigenvalue for A and A is invertible.
Any matrix A = [aij ] satisfying the condition of the Levy-Desplanques Theorem is given a
special name.
Definition 20.5. A square matrix A = [aij] is strictly diagonally dominant if |aii| > Σ_{j≠i} |aij| for all i.
So any strictly diagonally dominant matrix is invertible. A quick glance can show that a matrix
is strictly diagonally dominant. For example, since |3| > |1| + | − 1|, |12| > |5| + |6|, and
| − 8| > | − 2| + |4|, the matrix

A = [  3   1  −1 ]
    [  5  12   6 ]
    [ −2   4  −8 ]

is strictly diagonally dominant and therefore invertible. However, just because a matrix is not
strictly diagonally dominant, it does not follow that the matrix is non-invertible. For example, the
matrix

B = [ 1  2 ]
    [ 0  1 ]

is invertible, but not strictly diagonally dominant.
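The strict diagonal dominance test is easy to automate. The following sketch (an illustration of the definition, not part of the text) checks the two matrices just discussed.

    import numpy as np

    def is_strictly_diagonally_dominant(M):
        """Return True if |M[i,i]| > sum of |M[i,j]| over j != i for every row i."""
        M = np.asarray(M, dtype=float)
        diag = np.abs(np.diag(M))
        off_diag_sums = np.abs(M).sum(axis=1) - diag
        return bool(np.all(diag > off_diag_sums))

    A = np.array([[ 3,  1, -1],
                  [ 5, 12,  6],
                  [-2,  4, -8]])
    B = np.array([[1, 2],
                  [0, 1]])

    print(is_strictly_diagonally_dominant(A))   # True, so A is invertible
    print(is_strictly_diagonally_dominant(B))   # False, although B is still invertible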
Now we can address the Gershgorin Disk Theorem.
Project Activity 20.2. Let A be an arbitrary n × n matrix and assume that λ is an eigenvalue of A.

(a) Explain why the matrix A − λI cannot be invertible.

(b) What does the Levy-Desplanques Theorem tell us about the matrix A − λI?
Theorem 20.6 (Gershgorin Disk Theorem). Let A = [aij] be an n × n matrix with complex entries. Then every eigenvalue of A lies in one of the Gershgorin disks

{z ∈ C : |z − aii| ≤ ri},

where ri = Σ_{j≠i} |aij|.
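The theorem is easy to explore numerically. The sketch below (an illustration only) computes the disk centers and radii for the strictly diagonally dominant matrix used earlier and verifies that each eigenvalue lies in at least one Gershgorin disk.

    import numpy as np

    A = np.array([[ 3,  1, -1],
                  [ 5, 12,  6],
                  [-2,  4, -8]], dtype=complex)

    centers = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(centers)    # r_i = sum over j != i of |a_ij|

    for lam in np.linalg.eigvals(A):
        in_some_disk = np.any(np.abs(lam - centers) <= radii)
        print(lam, in_some_disk)    # True for every eigenvalue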
The Gershgorin Disk Theorem has a consequence that gives additional information about the
eigenvalues if some of the Gershgorin disks do not overlap.
Theorem 20.7. If S is a union of m Gershgorin disks of a matrix A such that S does not intersect
any other Gershgorin disk, then S contains exactly m eigenvalues (counting multiplicities) of A.
Proof. Most proofs of this theorem require some results from topology. For that reason, we will
not present a completely rigorous proof but rather give the highlights. Let A = [aij ] be an n × n
matrix. Let Di be a collection of Gershgorin disks of A for 1 ≤ i ≤ m such that S = ∪1≤i≤m Di
does not intersect any other Gershgorin disk of A, and let S′ be the union of the Gershgorin disks
of A that are different from the Di. Note that S ∩ S′ = ∅. Let C be the matrix whose ith column is
aii ei , that is C is the diagonal matrix whose diagonal entries are the corresponding diagonal entries
of A. Note that the eigenvalues of C are aii and the Gershgorin disks of C are just the points aii .
So our theorem is true for this matrix C. To prove the result, we build a continuum of matrices
from C to A as follows: let B = A − C (so that B is the matrix whose off-diagonal entries are
those of A and whose diagonal entries are 0), and let A(t) = tB + C for t in the interval [0, 1].
Note that A(1) = A. Since the diagonal entries of A(t) are the same as those of A, the Gershgorin
disks of A(t) have the same centers as the corresponding Gershgorin disks of A, while the radii
of the Gershgorin disks of A(t) are those of A but scaled by t. So the Gershgorin disks of A(t)
increase from points (the aii ) to the Gershgorin disks of A as t increases from 0 to 1. While the
centers of the disks all remain fixed, it is important to recognize that the eigenvalues of A(t) move
as t changes. An illustration of this is shown in Figure 20.2 with the eigenvalues as the black
points and the changing Gershgorin disks dashed in magenta, using the matrix

[ 1    (1/2)i ]
[ 1    −2 + i ].

We can learn about how the eigenvalues move with the characteristic polynomial.

[Figure 20.2: Gershgorin disks of A(t) in the complex plane (horizontal axis Re(z), vertical axis Im(z)); the eigenvalues are shown as black points and the dashed disks grow as t increases.]
Let p(t, x) be the characteristic polynomial of A(t). Note that these characteristic polynomials
are functions of both t and x. Since polynomials are continuous functions, their roots (the eigen-
values of A(t)) are continuous for t ∈ [0, 1] as well. Let λ(t) be an eigenvalue of A(t). Note that
λ(1) is an eigenvalue of A, and λ(0) is one of the aii and is therefore in S. We will argue that λ(t)
is in S for every value of t in [0, 1]. Let ri be the radius of Di and let D(t)i be the Gershgorin
disk of A(t) with the same center as Di and radius r(t)i = t ri. Let S(t) = ∪1≤i≤m D(t)i. Since
r(t)i ≤ ri, it follows that D(t)i ⊆ Di and so S(t) ∩ S′ = ∅ as well. From topology, we know that
since the disks Di are closed, the union S of these disks is also closed. Similarly, S(t) and S′ are
closed. Now λ(t) varies continuously with t, λ(0) lies in S, and λ(t) always lies in the union of the
Gershgorin disks of A(t), which is contained in S ∪ S′. Since S and S′ are disjoint closed sets, a
continuous path that starts in S cannot pass into S′ without first leaving S ∪ S′. Thus, λ(t) is in S for
every value of t in [0, 1].
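The homotopy A(t) = tB + C used in this proof can also be traced numerically. The sketch below is only an illustration; the 2 × 2 matrix is the one from Figure 20.2 with its entries as reconstructed above, so treat those specific values as an assumption.

    import numpy as np

    A = np.array([[1, 0.5j],
                  [1, -2 + 1j]])          # entries as reconstructed for Figure 20.2
    C = np.diag(np.diag(A))               # diagonal part: the disks collapse to their centers
    B = A - C                             # off-diagonal part

    for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
        At = t * B + C
        radii = np.abs(At).sum(axis=1) - np.abs(np.diag(At))
        print(t, np.linalg.eigvals(At), radii)   # eigenvalues move while the radii scale with t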
Section 21
Properties of Determinants
Focus Questions
By the end of this section, you should be able to give precise and thorough
answers to the questions listed below. You may want to keep these questions
in mind to focus your thoughts as you complete the section.
Introduction
This section is different than others in that it contains mainly proofs of previously stated results and
only a little new material. Consequently, there is no application attached to this section.
We have seen that an important property of the determinant is that it provides an easy criterion for
the invertibility of a matrix. As a result, we obtained an algebraic method for finding the eigenvalues
of a matrix, using the characteristic equation. In this section, we will investigate other properties of
the determinant related to how elementary row operations change the determinant. These properties
of the determinant will help us evaluate the determinant in a more efficient way compared to using
the cofactor expansion method, which is computationally intensive for large n values due to it being
a recursive method. Finally, we will derive a geometrical interpretation of the determinant.
Preview Activity 21.1.

(1) We first consider how the determinant changes if we multiply a row of the matrix by a constant.

(a) Let

    A = [ 2  3 ]
        [ 1  4 ].

Pick a few different values for the constant k and compare the determinant of A and that of

    [ 2k  3k ]
    [ 1    4 ].

What do you conjecture that the effect of multiplying a row by a constant on the determinant is?
(b) If we want to make sure our conjecture is valid for any 2 × 2 matrix, we need to show that for

    A = [ a  b ]
        [ c  d ],

the relationship between det(A) and the determinant of

    [ a·k  b·k ]
    [ c    d   ]

follows our conjecture. We should also check that the relationship between det(A) and the determinant of

    [ a    b   ]
    [ c·k  d·k ]

follows our conjecture. Verify this.
(c) Make a similar conjecture for what happens to the determinant when a row of a 3 × 3
matrix A is multiplied by a constant k, and explain why your conjecture is true using
the cofactor expansion definition of the determinant.
(2) The second type of elementary row operation we consider is row swapping.
(a) Take a general 2 × 2 matrix

    A = [ a  b ]
        [ c  d ]

and determine how row swapping affects the determinant.
(b) Now choose a few different 3 × 3 matrices and see how row swapping changes the
determinant in these matrices by evaluating the determinant with a calculator or any
other appropriate technology.
(c) Based on your results so far, conjecture how row swapping changes the determinant
in general.
(3) The last type of elementary row operation is adding a multiple of a row to another. Determine
the effect of this operation on a 2 × 2 matrix by evaluating the determinant of a general 2 × 2
matrix after a multiple of one row is added to the other row.
(4) All of the elementary row operations we discussed above can be achieved by matrix multiplication with elementary matrices. For each of the following elementary matrices, determine what elementary operation it corresponds to by calculating the product EA, where

    A = [ a11  a12  a13 ]
        [ a21  a22  a23 ]
        [ a31  a32  a33 ]

is a general 3 × 3 matrix.

(a) E = [ 0  1  0 ]        (b) E = [ 1  0  0 ]        (c) E = [ 1  0  0 ]
        [ 1  0  0 ]                [ 0  3  0 ]                [ 0  1  2 ]
        [ 0  0  1 ]                [ 0  0  1 ]                [ 0  0  1 ]
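A quick way to test the conjectures in this preview activity (a sketch for experimentation, not part of the activity itself) is to compare det(EA) with det(A) for a random matrix A and each elementary matrix E above.

    import numpy as np

    E_swap  = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # swaps rows 1 and 2
    E_scale = np.array([[1, 0, 0], [0, 3, 0], [0, 0, 1]])   # multiplies row 2 by 3
    E_add   = np.array([[1, 0, 0], [0, 1, 2], [0, 0, 1]])   # adds 2*(row 3) to row 2

    A = np.random.rand(3, 3)
    dA = np.linalg.det(A)

    print(np.linalg.det(E_swap  @ A) / dA)   # approximately -1
    print(np.linalg.det(E_scale @ A) / dA)   # approximately  3
    print(np.linalg.det(E_add   @ A) / dA)   # approximately  1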
In Preview Activity 21.1, we conjectured how elementary row operations affect the determinant of
a matrix. In the following activity, we prove how the determinant changes when a row is multiplied
by a constant using the cofactor expansion definition of the determinant.
Activity 21.1. In this activity, assume that the determinant of A can be determined by a cofactor
expansion along any row or column. (We will prove this result independently later in this section.)
Consider an arbitrary n × n matrix A = [aij ].
(a)
(b) Write the expression for det(A) using the cofactor expansion along the second row.
(c) Let B be obtained by multiplying the second row of A by k. Write the expression for
det(B) if the cofactor expansion along the second row is used.
(d) Use the expressions you found above to express det(B) in terms of det(A).
(e) Explain how this method generalizes to prove the relationship between the determinant of
a matrix A and that of the matrix obtained by multiplying a row by a constant k.
Your work in Activity 21.1 proves the first part of the following theorem on how elementary
row operations change the determinant of a matrix.

Theorem 21.1. Let A be a square matrix.

(1) If B is obtained from A by multiplying a row of A by a constant k, then det(B) = k det(A).
(2) If B is obtained from A by swapping two rows of A, then det(B) = − det(A).
(3) If B is obtained from A by adding a multiple of one row of A to another row, then det(B) = det(A).

In the next section, we will use elementary matrices to prove the last two properties of Theorem
21.1.
Elementary Matrices
As we saw in Preview Activity 21.1, elementary row operations can be achieved by multiplication
by elementary matrices.
For example, the matrices

E1 = [ 1  0  0  0 ]        E2 = [ 1  0  0  0 ]        E3 = [ 1  0  4  0 ]
     [ 0  0  0  1 ]             [ 0  1  0  0 ]             [ 0  1  0  0 ]
     [ 0  0  1  0 ]             [ 0  0  5  0 ]             [ 0  0  1  0 ]
     [ 0  1  0  0 ]             [ 0  0  0  1 ]             [ 0  0  0  1 ]

are elementary matrices: E1 swaps rows 2 and 4, E2 multiplies row 3 by 5, and E3 adds 4 times row 3 to row 1.
Theorem 21.3. Let E be an n × n elementary matrix and let A be an n × n matrix. Then det(EA) = det(E) det(A).

Proof of Theorem 21.3. We will prove the result by induction on n, the size of the matrix A. We
verified these results in Preview Activity 21.1 for n = 2 using elementary row operations. The
elementary matrix versions follow immediately.
Now assume the theorem is true for k × k matrices with k ≥ 2 and consider an n × n matrix
A where n = k + 1. If E is an n × n elementary matrix, we want to show that det(EA) =
det(E) det(A). Let EA = B. (Although it is an abuse of language, we will refer to both the
elementary matrix and the elementary row operation corresponding to it by E.)
When finding det(B) = det(EA) we will use a cofactor expansion along a row which is not
affected by the elementary row operation E. Since E affects at most two rows and A has n ≥ 3
rows, it is possible to find such a row, say row i. The cofactor expansion along row i of B is
bi1 (−1)i+1 det(Bi1 ) + bi2 (−1)i+2 det(Bi2 ) + · · · + bin (−1)i+n det(Bin ) . (21.1)
Since we chose a row of A which was not affected by the elementary row operation, it follows
that bij = aij for 1 ≤ j ≤ n. Also, the matrix Bij obtained by removing row i and column j from
matrix B = EA can be obtained from Aij by an elementary row operation of the same type as E.
Hence there is an elementary matrix Ek of the same type as E with Bij = Ek Aij . Therefore, by
induction, det(Bij ) = det(Ek ) det(Aij ) and det(Ek ) is equal to 1, -1 or r depending on the type
of elementary row operation. If we substitute this information into equation (21.1), we obtain

det(EA) = det(B) = det(Ek) ( ai1 (−1)^{i+1} det(Ai1) + ai2 (−1)^{i+2} det(Ai2) + · · · + ain (−1)^{i+n} det(Ain) ) = det(Ek) det(A).

This equation proves det(EA) = det(Ek ) det(A) for any n × n matrix A where Ek is the corre-
sponding elementary row operation on the k × k matrices obtained in the cofactor expansion.
The proof of the inductive step will be finished if we show that det(Ek ) = det(E). This
equality follows if we let A = In in det(EA) = det(Ek ) det(A). Therefore, det(E) is equal to
r, or 1, or −1, depending on the type of the elementary row operation E since the same is true of
det(Ek ) by inductive hypothesis.
Therefore, by the principle of induction, the claim is true for every n ≥ 2.
Theorem 21.4. Let A and B be n × n matrices. Then det(AB) = det(A) det(B).

Proof. If A is non-invertible, then AB is also non-invertible and both det(A) and det(AB) are 0,
proving the equality in this case.
Suppose now that A is invertible. By the Invertible Matrix Theorem, we know that A is row
equivalent to In . Expressed in terms of elementary matrices, this means that there are elementary
matrices E1, E2, . . ., Eℓ such that

A = E1 E2 · · · Eℓ In = E1 E2 · · · Eℓ.   (21.2)

Then

AB = E1 E2 · · · Eℓ B.

Again, by repeatedly applying Theorem 21.3 with this product of matrices, we find

det(AB) = det(E1 E2 · · · Eℓ B) = det(E1) det(E2) · · · det(Eℓ) det(B) = det(E1 E2 · · · Eℓ) det(B) = det(A) det(B).
We can use the multiplicative property of the determinant and the determinants of elementary
matrices to calculate the determinant of a matrix in a more efficient way than using the cofactor
expansion. The next activity provides an example.
Activity 21.2. Let

A = [  1  1  2 ]
    [  2  2  6 ]
    [ −1  2  1 ].

(a) Use elementary row operations to reduce A to a row echelon form. Keep track of the elementary row operations you use.
(b) Taking into account how elementary row operations affect the determinant, use the row
echelon form of A to calculate det(A).
Your work in Activity 21.2 provides an efficient method for calculating the determinant. If A is
a square matrix, we use row operations given by elementary matrices E1 , E2 , . . ., Ek to row reduce
A to row echelon form R. That is
R = Ek Ek−1 · · · E2 E1 A.
We know det(Ei) for each i, and since R is a triangular matrix we can find its determinant. Then

det(R) = det(Ek) det(Ek−1) · · · det(E2) det(E1) det(A),   and so   det(A) = det(R) / ( det(E1) det(E2) · · · det(Ek) ).

In other words, if we keep track of how the row operations affect the determinant, we can calculate
the determinant of a matrix A using row operations.
Activity 21.3. Theorems 21.3 and 21.4 can be used to prove part (c) of Theorem 16.2: that A is
invertible if and only if det(A) ≠ 0. We see how in this activity. Let A be an n × n matrix.
We can row reduce A to its reduced row echelon form R by elementary matrices E1 , E2 , . . ., Ek
so that
R = E1 E2 · · · Ek A.
(a) Suppose A is invertible. What, then, is R? What is det(R)? Can the determinant of an
elementary matrix ever be 0? How do we conclude that det(A) ≠ 0?
(b) Now suppose that det(A) ≠ 0. What can we conclude about det(R)? What, then, must R
be? How do we conclude that A is invertible?
Summary: Let A be an n × n matrix. Suppose we swap rows s times and divide rows by
constants k1 , k2 , . . . , kr while computing a row echelon form REF(A) of A. Then det(A) =
(−1)s k1 k2 · · · kr det(REF(A)).
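This summary can be turned directly into an algorithm. The sketch below (an illustration, not from the text) row reduces a matrix using only row swaps and row replacements, counts the swaps, and multiplies the pivots of the resulting triangular matrix; it is checked against numpy on the matrix from Activity 21.2.

    import numpy as np

    def det_by_row_reduction(A):
        """Compute det(A) using only row swaps and row replacements."""
        U = np.array(A, dtype=float)
        n = U.shape[0]
        swaps = 0
        for j in range(n):
            # Bring a nonzero pivot into place; each swap flips the sign of the determinant.
            p = j + int(np.argmax(np.abs(U[j:, j])))
            if np.isclose(U[p, j], 0.0):
                return 0.0
            if p != j:
                U[[j, p]] = U[[p, j]]
                swaps += 1
            # Row replacements do not change the determinant.
            for i in range(j + 1, n):
                U[i] -= (U[i, j] / U[j, j]) * U[j]
        return (-1) ** swaps * float(np.prod(np.diag(U)))

    A = [[ 1, 1, 2],
         [ 2, 2, 6],
         [-1, 2, 1]]
    print(det_by_row_reduction(A), np.linalg.det(A))   # both are -6 (up to roundoff)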
Determinants have interesting and useful applications from a geometric perspective. To understand
the geometric interpretation of the determinant of an n × n matrix A, we consider the image of the
unit square under the transformation T (x) = Ax and see how its area changes based on A.
Activity 21.4.
(a) Let

    A = [ 2  0 ]
        [ 0  3 ].

Start with the unit square in R2 with corners at the origin and at (1, 1). In other words, the unit square we are considering consists of all vectors v = [x y]^T where 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, visualized as points in the plane.
i. Sketch the image of the unit square under the transformation T (v) = Av. To make
the sketching easier, find the images of the vectors [0 0]T , [1 0]T , [0 1]T , [1 1]T as
points first and then connect these images to find the image of the unit square.
ii. Check that the area of the parallelogram you obtained in the above part is equal to
det(A).
iii. Does the relationship between the area and det(A) still hold if

    A = [ −2  1 ]
        [  0  3 ]?

If not, how will you modify the relationship?
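A short sketch (not part of the activity) that computes the images of the square's corners and compares the area of the resulting parallelogram with |det(A)|:

    import numpy as np

    A = np.array([[2, 0],
                  [0, 3]])
    corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]).T   # columns are the corners

    print((A @ corners).T)                 # corners of the image of the unit square
    print(abs(np.linalg.det(A)))           # 6.0, the area of that image

    A2 = np.array([[-2, 1],
                   [ 0, 3]])               # the matrix from part iii
    print(abs(np.linalg.det(A2)))          # 6.0, even though det(A2) is negative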
The sign of det(A) can be interpreted in terms of the orientation of the column vectors of A.
See the project in Section 16 for details.
In Section 10 we found the inverse A−1 using row reduction of the matrix obtained by augmenting
A with In . However, in theoretical applications, having an explicit formula for A−1 can be handy.
Such an explicit formula provides us with an algebraic expression for A−1 in terms of the entries
of A. A consequence of the formula we develop is Cramer’s Rule, which can be used to provide
formulas that give solutions to certain linear systems.
We begin with an interesting connection between a square matrix and the matrix of its cofactors
that we explore in the next activity.
Activity 21.5. Let

A = [ 2   1  3 ]
    [ 1   4  5 ]
    [ 2  −1  2 ].
(a) Calculate the (1, 1), (1, 2), and (1, 3) cofactors of A.
(b) If Cij represents the (i, j) cofactor of A, then the cofactor matrix C is the matrix C = [Cij ].
The adjugate matrix of A is the transpose of the cofactor matrix. In our example, the
adjugate matrix of A is
adj(A) = [ 13  −5  −7 ]
         [  8  −2  −7 ]
         [ −9   4   7 ].
Check the entries of this adjugate matrix with your calculations from part (a). Then calcu-
late the matrix product
A adj(A).
(c) What do you notice about the product A adj(A)? How is this product related to det(A)?
The result of Activity 21.5 is rather surprising, but it is valid in general. That is, if A = [aij] is
an invertible n × n matrix and Cij is the (i, j) cofactor of A, then A adj(A) = det(A) In. In other
words, A ( adj(A) / det(A) ) = In and so

A−1 = (1/det(A)) adj(A).
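The adjugate formula translates directly into code. The sketch below (an illustration, not part of the text) builds adj(A) from cofactors for the matrix of Activity 21.5 and checks that A adj(A) = det(A) I and that adj(A)/det(A) agrees with the inverse.

    import numpy as np

    def adjugate(A):
        """Return adj(A), the transpose of the cofactor matrix of A."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        C = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C.T

    A = np.array([[2,  1, 3],
                  [1,  4, 5],
                  [2, -1, 2]])

    adjA = adjugate(A)
    print(np.round(adjA))                        # [[13 -5 -7], [8 -2 -7], [-9 4 7]]
    print(np.round(A @ adjA))                    # det(A) * I, with det(A) = 7
    print(np.allclose(adjA / np.linalg.det(A), np.linalg.inv(A)))   # True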
This gives us another formulation of the inverse of a matrix. To see why A adj(A) = det(A) In, we
use the row-column version of the matrix product to find the ijth entry of A adj(A): the ith row of A
is [ai1 ai2 · · · ain], and the jth column of adj(A) is [Cj1 Cj2 · · · Cjn]^T (the jth row of the cofactor
matrix, transposed). So the ijth entry of A adj(A) is

ai1 Cj1 + ai2 Cj2 + · · · + ain Cjn.   (21.4)
Notice that if i = j, then expression (21.4) is the cofactor expansion of A along the ith row. So
the iith entry of A adj(A) is det(A). It remains to show that the ijth entry of A adj(A) is 0 when
i ≠ j.
The result from Activity 21.6 may seem a bit strange, but turns out to be true in general. The
result is called Cramer’s Rule.
Theorem 21.7 (Cramer's Rule). Let A be an n × n invertible matrix. For any b in Rn, the solution
x of Ax = b has entries

xi = det(Ai) / det(A),

where Ai represents the matrix formed by replacing the ith column of A with b.
To see why Cramer's Rule works in general, let A be an n × n invertible matrix and b =
[b1 b2 · · · bn]^T. The solution to Ax = b is

x = A−1 b = (1/det(A)) adj(A) b = (1/det(A)) [ C11  C21  · · ·  Cn1 ] [ b1 ]
                                             [ C12  C22  · · ·  Cn2 ] [ b2 ]
                                             [  ⋮    ⋮           ⋮  ] [ ⋮  ]
                                             [ C1n  C2n  · · ·  Cnn ] [ bn ],

so the ith entry of x is (1/det(A)) (b1 C1i + b2 C2i + · · · + bn Cni). Expanding det(Ai) along its ith column (the column containing b) gives exactly det(Ai) = b1 C1i + b2 C2i + · · · + bn Cni, and therefore xi = det(Ai)/det(A).
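Cramer's Rule is short to implement. The sketch below (an illustration only; the right-hand side b is chosen arbitrarily) checks the formula against numpy's solver.

    import numpy as np

    def cramer_solve(A, b):
        """Solve Ax = b for invertible A using Cramer's Rule."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        detA = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                  # replace the ith column of A with b
            x[i] = np.linalg.det(Ai) / detA
        return x

    A = np.array([[2, 1, 3],
                  [1, 4, 5],
                  [2, -1, 2]])
    b = np.array([1, 2, 3])

    print(cramer_solve(A, b))
    print(np.linalg.solve(A, b))          # same result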
In this section we establish the fact that the determinant of a square matrix is the same as the
determinant of its transpose.
The result is easily verified for 2 × 2 matrices, so we will proceed by induction and assume that
the determinant of any (n − 1) × (n − 1) matrix is the same as the determinant of its transpose.
Suppose A = [aij] is an n × n matrix. By definition,
det(A) = a11 C11 + a12 C12 + a13 C13 + · · · + a1n C1n
and
det(AT ) = a11 C11 + a21 C21 + a31 C31 + · · · + an1 Cn1 .
Note that the only term in either determinant that contains a11 is a11 C11 . This term is the same
in both determinants, so we proceed to examine other elements. Let us consider all terms in the
cofactor expansion for det(AT ) that contain ai1 a1j . The only summand that contains ai1 is ai1 Ci1 .
Letting Aij be the sub-matrix of A obtained by deleting the ith row and jth column, we see that
ai1 Ci1 = (−1)i+1 ai1 det(Ai1 ). Now let’s examine the sub-matrix Ai1 :
[ a12      a13      · · ·  a1j      · · ·  a1,n−1      a1n      ]
[ a22      a23      · · ·  a2j      · · ·  a2,n−1      a2n      ]
[  ⋮        ⋮                ⋮                 ⋮          ⋮      ]
[ ai−1,2   ai−1,3   · · ·  ai−1,j   · · ·  ai−1,n−1   ai−1,n   ]
[ ai+1,2   ai+1,3   · · ·  ai+1,j   · · ·  ai+1,n−1   ai+1,n   ]
[  ⋮        ⋮                ⋮                 ⋮          ⋮      ]
[ an2      an3      · · ·  anj      · · ·  an,n−1      ann      ]
When we expand along the first row to calculate det(Ai1), the only term that will involve a1j is

(−1)^{1+(j−1)} a1j det(Ai1,1j) = (−1)^j a1j det(Ai1,1j),

where Aik,jm denotes the sub-matrix of A obtained by deleting rows i and k and columns j and m
from A. So the term that contains ai1 a1j in the cofactor expansion for det(AT) is

(−1)^{i+1} ai1 (−1)^j a1j det(Ai1,1j) = (−1)^{i+j+1} ai1 a1j det(Ai1,1j).   (21.5)
Now we examine the cofactor expansion for det(A) to find the terms that contain ai1 a1j. The
quantity a1j only appears in the cofactor expansion as

(−1)^{1+j} a1j det(A1j).

Here is where we use the induction hypothesis. Since A1j is an (n − 1) × (n − 1) matrix, its
determinant can be found with a cofactor expansion down the first column. The only term in this
cofactor expansion that will involve ai1 is

(−1)^{(i−1)+1} ai1 det(A1i,j1).

So the term that contains ai1 a1j in the cofactor expansion for det(A) is

(−1)^{1+j} a1j (−1)^{(i−1)+1} ai1 det(A1i,j1) = (−1)^{i+j+1} ai1 a1j det(A1i,j1).   (21.6)
Since the quantities in (21.5) and (21.6) are equal, we conclude that the terms in the two cofactor
expansions are the same and
det(AT ) = det(A).
In this section we determine the effect of row swaps on the determinant. Let Ers be the elementary
matrix that swaps rows r and s in the n × n matrix A = [aij]. Applying E12 to a 2 × 2 matrix

A = [ a  b ]
    [ c  d ],

we see that

det(E12 A) = det [ c  d ]
                 [ a  b ]  =  cb − ad  =  −(ad − bc)  =  − det(A).
So swapping rows in a 2 × 2 matrix multiplies the determinant by −1. Suppose that row swapping
on any (n−1)×(n−1) matrix multiplies the determinant by −1 (in other words, we are proving our
statement by mathematical induction). Now suppose A is an n×n matrix and let B = [bij ] = Ers A.
We first consider the case that s = r + 1, that is, we swap adjacent rows. We consider two cases,
r > 1 and r = 1. First let us suppose that r > 1. Let Cij be the (i, j) cofactor of A and C′ij the
(i, j) cofactor of B. We have

det(A) = a11 C11 + a12 C12 + · · · + a1n C1n

and

det(B) = b11 C′11 + b12 C′12 + · · · + b1n C′1n.
Since r > 1, it follows that a1j = b1j for every j. For each j, the sub-matrix B1j obtained from B
by deleting the first row and jth column is the same matrix as that obtained from A1j by swapping
rows r and s. So by our induction hypothesis, we have C′1j = −C1j for each j. Then
det(B) = b11 C′11 + b12 C′12 + · · · + b1n C′1n
       = a11 (−C11) + a12 (−C12) + · · · + a1n (−C1n)
       = −(a11 C11 + a12 C12 + · · · + a1n C1n)
       = − det(A).
Now we consider the case where r = 1, where B is the matrix obtained from A by swapping the
first and second rows. Here we will use the fact that det(A) = det(AT ) which allows us to calculate
det(A) and det(B) with the cofactor expansions down the first column. In this case we have
det(A) = a11 C11 + a21 C21 + · · · + an1 Cn1

and

det(B) = b11 C′11 + b21 C′21 + · · · + bn1 C′n1
       = a21 C′11 + a11 C′21 + a31 C′31 + · · · + an1 C′n1.
For each i ≥ 3, the sub-matrix Bi1 is just Ai1 with rows 1 and 2 swapped. So we have C′i1 = −Ci1
by our induction hypothesis. Since we swapped rows 1 and 2, we have B21 = A11 and B11 = A21.
Thus,
b11 C′11 = (−1)^{1+1} b11 det(A21) = a21 det(A21) = −a21 C21

and

b21 C′21 = (−1)^{2+1} a11 det(A11) = −a11 det(A11) = −a11 C11.
Putting this all together gives us

det(B) = b11 C′11 + b21 C′21 + · · · + bn1 C′n1
       = −a21 C21 − a11 C11 + a31 (−C31) + · · · + an1 (−Cn1)
       = −(a11 C11 + a21 C21 + · · · + an1 Cn1)
       = − det(A).
So we have shown that if B is obtained from A by interchanging two adjacent rows, then det(B) =
− det(A). Now we consider the general case. Suppose B is obtained from A by interchanging rows
r and s, with r < s. We can perform this single row interchange through a sequence of adjacent
row interchanges. First we swap rows r and r + 1, then rows r + 1 and r + 2, and continue until
we swap rows s − 1 and s. This places the original row r into the row s position, and the process
involved s − r adjacent row interchanges. Each of these interchanges multiplies the determinant
by a factor of −1. At the end of this sequence of row swaps, the original row s is now row s − 1.
So it will take one fewer adjacent row interchange to move this row to be row r. This sequence of
(s − r) + (s − r − 1) = 2(s − r) − 1 row interchanges produces the matrix B. Thus,

det(B) = (−1)^{2(s−r)−1} det(A) = − det(A),

since 2(s − r) − 1 is odd.
Cofactor Expansions
We have stated that the determinant of a matrix can be calculated by using a cofactor expansion
along any row or column. We use the result that swapping rows introduces a factor of −1 in the
determinant to verify that result in this section. Note that in proving that det(AT ) = det(A), we
have already shown that the cofactor expansion along the first column is the same as the cofactor
expansion along the first row. If we can prove that the cofactor expansion along any row is the same,
then the fact that det(AT ) = det(A) will imply that the cofactor expansion along any column is
the same as well.
Now we demonstrate that the cofactor expansions along the first row and the ith row are the
same. Let A = [aij] be an n × n matrix. The cofactor expansion of A along the first row is

a11 C11 + a12 C12 + · · · + a1n C1n.
Let B be the matrix obtained by swapping row i with the previous rows so that row i becomes the first
row and the order of the remaining rows is preserved:

B = [ ai1      ai2      · · ·  aij      · · ·  ain      ]
    [ a11      a12      · · ·  a1j      · · ·  a1n      ]
    [ a21      a22      · · ·  a2j      · · ·  a2n      ]
    [  ⋮        ⋮                 ⋮                ⋮      ]
    [ ai−1,1   ai−1,2   · · ·  ai−1,j   · · ·  ai−1,n   ]
    [ ai+1,1   ai+1,2   · · ·  ai+1,j   · · ·  ai+1,n   ]
    [  ⋮        ⋮                 ⋮                ⋮      ]
    [ an1      an2      · · ·  anj      · · ·  ann      ].
Then
det(B) = (−1)i−1 det(A).
So, letting C′ij be the (i, j) cofactor of B, we have

det(A) = (−1)^{i−1} det(B) = (−1)^{i−1} ( ai1 C′11 + ai2 C′12 + · · · + ain C′1n ).
There are many instances where we have a number of systems to solve of the form Ax = b, all
with the same coefficient matrix. The system may evolve over time so that we do not know the
constant vectors b in the system all at once, but only determine them as time progresses. Each time
we obtain a new vector b, we have to apply the same row operations to reduce the coefficient matrix
to solve the new system. This is repetitive and time consuming. Instead, we can keep track of
the row operations in one row reduction and save ourselves a significant amount of time. One way
of doing this is the LU -factorization (or decomposition).
To illustrate, suppose we can write the matrix A as a product A = LU, where

L = [  1  0  0  0 ]        U = [ 1  0  1   0 ]
    [ −1  1  0  0 ]            [ 0  1  3  −2 ]
    [  0  1  1  0 ]            [ 0  0  0   3 ]
    [  1  0  0  1 ]            [ 0  0  0   0 ].

Suppose also that we want to solve Ax = b with b = [3 1 1 3]^T. If we let z = Ux, then Ax = L(Ux) = Lz, so we can first solve Lz = b for z using forward substitution. The system Lz = b is

z1                = 3
−z1 + z2          = 1
       z2 + z3    = 1
z1           + z4 = 3.
The first equation shows that z1 = 3. Substituting into the second equation gives us z2 = 4. Using
this information in the third equation yields z3 = −3, and then the fourth equation shows that
z4 = 0. To return to the original system, since U x = z, we now solve this system to find the
solution vector x. In this case, since U is upper triangular, we use back substitution. The equation
U x = z is equivalent to the system
x1 + x3 = 3
x2 + 3x3 − 2x4 = 4
3x4 = −3.
Note that the third column of U is not a pivot column, so x3 is a free variable. The last equation
shows that x4 = −1. Substituting into the second equation and solving for x2 yields x2 = 2 − 3x3 .
The first equation then gives us x1 = 3 − x3. So the general solution

x = [  3 ]      [ −1 ]
    [  2 ] + x3 [ −3 ]
    [  0 ]      [  1 ]
    [ −1 ]      [  0 ]

to Ax = b can be found through L and U via forward and backward substitution. If we can find a
factorization of a matrix A into a lower triangular matrix L and an upper triangular matrix U , then
A = LU is called an LU -factorization or LU -decomposition.
We can use elementary matrices to obtain a factorization of certain matrices into products of
lower triangular (the "L" in LU) and upper triangular (the "U" in LU) matrices. We illustrate with
an example. Let

A = [  1  0  1   0 ]
    [ −1  1  2  −2 ]
    [  0  1  3   1 ]
    [  1  0  1   0 ].
Our goal is to find an upper triangular matrix U and a lower triangular matrix L so that A = LU .
We begin by row reducing A to an upper triangular matrix, keeping track of the elementary matrices
used to perform the row operations. We start by replacing the entries below the (1, 1) entry in A
with zeros. The elementary matrices that perform these operations are
E1 = [ 1  0  0  0 ]     and     E2 = [  1  0  0  0 ]
     [ 1  1  0  0 ]                  [  0  1  0  0 ]
     [ 0  0  1  0 ]                  [  0  0  1  0 ]
     [ 0  0  0  1 ]                  [ −1  0  0  1 ],
and

E2 E1 A = [ 1  0  1   0 ]
          [ 0  1  3  −2 ]
          [ 0  1  3   1 ]
          [ 0  0  0   0 ].
We next zero out the entries below the (2, 2) entry as

E3 E2 E1 A = [ 1  0  1   0 ]
             [ 0  1  3  −2 ]
             [ 0  0  0   3 ]
             [ 0  0  0   0 ],
where

E3 = [ 1   0  0  0 ]
     [ 0   1  0  0 ]
     [ 0  −1  1  0 ]
     [ 0   0  0  1 ].
The product E3 E2 E1 A is an upper triangular matrix U . So we have
E3 E2 E1 A = U
and
A = E1−1 E2−1 E3−1 U,
where

E1−1 E2−1 E3−1 = [  1  0  0  0 ]
                 [ −1  1  0  0 ]
                 [  0  1  1  0 ]
                 [  1  0  0  1 ]
is a lower triangular matrix L. So we have decomposed the matrix A into a product A = LU , where
L is lower triangular and U is upper triangular. Since every matrix is row equivalent to a matrix in
row echelon form, we can always find an upper triangular matrix U in this way. However, we may
not always obtain a corresponding lower triangular matrix, as the next example illustrates.
Suppose we change the problem slightly and consider the matrix

B = [  1  0  1   0 ]
    [ −1  1  2  −2 ]
    [  0  1  3   1 ]
    [  1  0  0   1 ].
Applying the same row operations to B that we applied to A, and then swapping rows 3 and 4 (with a corresponding elementary matrix E4), brings us to
E4 E3 E2 E1 B = [ 1  0   1   0 ]
                [ 0  1   3  −2 ]
                [ 0  0  −1   1 ]
                [ 0  0   0   3 ].
In this case, however, the corresponding product E1−1 E2−1 E3−1 E4−1 of inverse elementary matrices
is not lower triangular. The difference in this latter example is that we needed a row swap to obtain
the upper triangular form.
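In practice, library routines handle the row-swap issue by computing a permuted factorization of the form A = P L U, where P is a permutation matrix. The sketch below uses scipy (an assumption of convenience, not something used in the text) on the two matrices from this discussion; the permutation it returns for B reflects the row interchange needed above, although the exact permutation chosen depends on the library's pivoting strategy.

    import numpy as np
    from scipy.linalg import lu

    A = np.array([[ 1, 0, 1,  0],
                  [-1, 1, 2, -2],
                  [ 0, 1, 3,  1],
                  [ 1, 0, 1,  0]], dtype=float)

    B = np.array([[ 1, 0, 1,  0],
                  [-1, 1, 2, -2],
                  [ 0, 1, 3,  1],
                  [ 1, 0, 0,  1]], dtype=float)

    for M in (A, B):
        P, L, U = lu(M)                       # M = P @ L @ U
        print(P)                              # permutation chosen by partial pivoting
        print(np.allclose(M, P @ L @ U))      # True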
Examples
What follows are worked examples that use the concepts from this section.
Example 21.8.
(a) If A, B are n × n matrices with det(A) = 3 and det(B) = 2, evaluate the following
determinant values. Briefly justify.
i. det(A−1 )
ii. det(ABAT )
iii. det(A3 (BA)−1 (AB)2 )
(b) If the determinant of

    [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]

is m, find the determinant of each of the following matrices.

i.  [ a   b   c  ]       ii.  [ d  e  f ]       iii.  [ a        b        c      ]
    [ 2d  2e  2f ]            [ g  h  i ]             [ g − 2d   h − 2e   i − 2f ]
    [ g   h   i  ]            [ a  b  c ]             [ a + d    b + e    c + f  ]
Example Solution.
ii. Interchanging two rows multiplies the determinant by −1. It takes two row swaps in
the original matrix to obtain this one, so

det [ d  e  f ]
    [ g  h  i ]   =   (−1)² m = m.
    [ a  b  c ]
iii. Adding a multiple of a row to another does not change the determinant of the matrix.
Since there is a row swap needed to get this matrix from the original, we have

det [ a        b        c      ]
    [ g − 2d   h − 2e   i − 2f ]   =   −m.
    [ a + d    b + e    c + f  ]
Example 21.9. Let

A = [ 2  8   0 ]
    [ 2  2  −3 ]
    [ 1  2   7 ].

(a) Find an LU factorization of A.
(b) Use the LU factorization with forward substitution and back substitution to solve the system
Ax = [18 3 12]T .
Example Solution.
(b) To solve the system Ax = b, where b = [18 3 12]T , we use the LU factorization of A
and solve LU x = b. Let x = [x1 x2 x3 ]T and let z = [z1 z2 z3 ]T with U x = z so
that Lz = L(U x) = Ax = b. First we solve Lz = [18 3 12]T to find z using forward
substitution. The first row of L shows that z1 = 18 and the second row that z1 + z2 = 3.
So z2 = −15. The third row of L gives us (1/2) z1 + (1/3) z2 + z3 = 12, so z3 = 12 − 9 + 5 = 8.
Now to find x we solve U x = z using back substitution. The third row of U tells us that
8x3 = 8 or that x3 = 1. The second row of U shows that −6x2 − 3x3 = −15 or x2 = 2.
Finally, the first row of U gives us 2x1 + 8x2 = 18, or x1 = 1. So the solution to Ax = b
is x = [1 2 1]T .
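The forward and back substitution in this solution are easy to carry out in code. The sketch below (an illustration; the factors L and U shown are the ones implied by the substitution steps above) reproduces x = [1 2 1]^T.

    import numpy as np
    from scipy.linalg import solve_triangular

    # Factors of A = [[2, 8, 0], [2, 2, -3], [1, 2, 7]] implied by the steps above.
    L = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.5, 1/3, 1.0]])
    U = np.array([[2.0,  8.0,  0.0],
                  [0.0, -6.0, -3.0],
                  [0.0,  0.0,  8.0]])
    b = np.array([18.0, 3.0, 12.0])

    print(np.allclose(L @ U, [[2, 8, 0], [2, 2, -3], [1, 2, 7]]))   # True

    z = solve_triangular(L, b, lower=True)     # forward substitution: z = [18, -15, 8]
    x = solve_triangular(U, z, lower=False)    # back substitution:    x = [1, 2, 1]
    print(z, x)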
Summary
• The elementary row operations have the following effects on the determinant:
(a) If we multiply a row of a matrix by a constant k, then the determinant is multiplied
by k.
(b) If we swap two rows of a matrix, then the determinant changes sign.
(c) If we add a multiple of a row of a matrix to another, the determinant does not change.
• Each of the elementary row operations can be achieved by multiplication by elementary ma-
trices. To obtain the elementary matrix corresponding to an elementary row operation, we
perform the operation on the identity matrix.
• (Cramer's Rule) If A is an invertible n × n matrix and b is any vector in Rn, then the solution
x of Ax = b has entries

xi = det(Ai(b)) / det(A),

where Ai(b) represents the matrix formed by replacing the ith column of A with b.

• If A is an invertible n × n matrix, then A−1 = (1/det(A)) adj A, where the matrix adj A, the
adjugate of A, is defined as the matrix whose ij-th entry is Cji, the ji-th cofactor of A.
• For a 2 × 2 matrix A, the area of the image of the unit square under the transformation
T (x) = Ax is equal to | det(A)|, which is also equal to the area of the parallelogram defined
by the columns of A.
• For a 3 × 3 matrix A, the volume of the image of the unit cube under the transformation
T (x) = Ax is equal to | det(A)|, which is also equal to the volume of the parallelepiped
defined by the columns of A.
Exercises
(1) Find a formula for det(rA) in terms of r and det(A), where A is an n × n matrix and r is a
scalar. Explain why your formula is valid.
(3) Consider the matrix

A = [  4  −1  −1  −1 ]
    [ −1   4  −1  −1 ]
    [ −1  −1   4  −1 ]
    [ −1  −1  −1   4 ].

We will find det(A) using elementary
row operations. (This matrix arises in graph theory, and its determinant gives the number of
spanning trees in the complete graph with 5 vertices. This number is also equal to the number
of labeled trees with 5 vertices.)
(a) Add rows R2 , R3 and R4 to the first row in that order.
(b) Then add the new R1 to rows R2 , R3 and R4 to get a triangular matrix B.
(c) Find the determinant of B. Then use det(B) and properties of how elementary row
operations affect determinants to find det(A).
(d) Generalize your work to find the determinant of the n × n matrix
n −1 −1 · · · −1 −1
−1 n −1 · · · −1 −1
A= . .. .. .. .. .
. . . . ··· . .
−1 −1 −1 · · · −1 n
(4) For which matrices A, if any, is det(A) = − det(−A)? Justify your answer.
(5) Find the inverse A−1 of

A = [ 1  0  1 ]
    [ 0  1  0 ]
    [ 2  0  1 ]

using the adjugate matrix.
(6) For an invertible n × n matrix A, what is the relationship between det(A) and det(adj A)?
Justify your result.
(7) Let

A = [ a  b  1 ]
    [ c  d  2 ]
    [ e  f  3 ],

and assume that det(A) = 2. Determine the determinants of each of the following.

(a) B = [ a      b      1 ]
        [ 3c     3d     6 ]
        [ e + a  f + b  4 ]

(b) C = [ 2e       2f        6 ]
        [ 2c − 2e  2d − 2f  −2 ]
        [ 2a       2b        2 ]
(8) Find the area of the parallelogram with one vertex at the origin and adjacent vertices at (1, 2)
and (a, b). For which (a, b) is the area 0? When does this happen geometrically?
(9) Find the volume of the parallelepiped with one vertex at the origin and three adjacent vertices
at (3, 2, 0), (1, 1, 1) and (1, 3, c) where c is unknown. For which c is the volume 0? When
does this happen geometrically?
(10) Label each of the following statements as True or False. Provide justification for your re-
sponse.