Down With Determinants
Sheldon Axler
21 December 1994
1. Introduction
Ask anyone why a square matrix of complex numbers has an eigenvalue, and you’ll
probably get the wrong answer, which goes something like this: The characteristic
polynomial of the matrix—which is defined via determinants—has a root (by the
fundamental theorem of algebra); this root is an eigenvalue of the matrix.
What’s wrong with that answer? It depends upon determinants, that’s what.
Determinants are difficult, non-intuitive, and often defined without motivation. As
we’ll see, there is a better proof—one that is simpler, clearer, provides more insight,
and avoids determinants.
This paper will show how linear algebra can be done better without determinants.
Without using determinants, we will define the multiplicity of an eigenvalue and
prove that the number of eigenvalues, counting multiplicities, equals the dimension
of the underlying space. Without determinants, we’ll define the characteristic and
minimal polynomials and then prove that they behave as expected. Next, we will
easily prove that every matrix is similar to a nice upper-triangular one. Turning
to inner product spaces, and still without mentioning determinants, we’ll have a
simple proof of the finite-dimensional Spectral Theorem.
Determinants are needed in one place in the undergraduate mathematics curricu-
lum: the change of variables formula for multi-variable integrals. Thus at the end of
this paper we’ll revive determinants, but not with any of the usual abstruse defini-
tions. We’ll define the determinant of a matrix to be the product of its eigenvalues
(counting multiplicities). This easy-to-remember definition leads to the usual for-
mulas for computing determinants. We’ll derive the change of variables formula for
multi-variable integrals in a fashion that makes the appearance of the determinant
there seem natural.
This work was partially supported by the National Science Foundation. Many people
made comments that helped improve this paper. I especially thank Marilyn Brouwer,
William Brown, Jonathan Hall, Paul Halmos, Richard Hill, Ben Lotto, and Wade Ramey.
A few friends who use determinants in their research have expressed unease at the
title of this paper. I know that determinants play an honorable role in some areas of
research, and I do not mean to belittle their importance when they are indispensable.
But most mathematicians and most students of mathematics will have a clearer
understanding of linear algebra if they use the determinant-free approach to the
basic structure theorems.
The theorems in this paper are not new; they will already be familiar to most
readers. Some of the proofs and definitions are new, although many parts of this
approach have been around in bits and pieces, but without the attention they de-
served. For example, at a recent annual meeting of the AMS and MAA, I looked
through every linear algebra text on display. Out of over fifty linear algebra texts
offered for sale, only one obscure book gave a determinant-free proof that eigen-
values exist, and that book did not manage to develop other key parts of linear
algebra without determinants. The anti-determinant philosophy advocated in this
paper is an attempt to counter the undeserved dominance of determinant-dependent
methods.
This paper focuses on showing that determinants should be banished from much
of the theoretical part of linear algebra. Determinants are also useless in the com-
putational part of linear algebra. For example, Cramer’s rule for solving systems
of linear equations is already worthless for 10 × 10 systems, not to mention the
much larger systems often encountered in the real world. Many computer programs
efficiently calculate eigenvalues numerically—none of them uses determinants. To
emphasize the point, let me quote a numerical analyst. Henry Thacher, in a review
(SIAM News, September 1988) of the Turbo Pascal Numerical Methods Toolbox,
writes,
I find it hard to conceive of a situation in which the numerical value of a
determinant is needed: Cramer’s rule, because of its inefficiency, is com-
pletely impractical, while the magnitude of the determinant is an indication
of neither the condition of the matrix nor the accuracy of the solution.
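To make the computational point concrete, here is a small numerical sketch, not part of the original paper, assuming Python with numpy is available; the matrix, the right-hand side, and the random seed are arbitrary. Library routines solve linear systems and find eigenvalues through matrix factorizations, and no determinant is formed along the way.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((10, 10))     # a 10 x 10 system, the size mentioned above
    b = rng.standard_normal(10)

    x = np.linalg.solve(A, b)             # Gaussian elimination (an LU factorization), not Cramer's rule
    eigenvalues = np.linalg.eigvals(A)    # iterative similarity transformations, no determinants

    print(np.allclose(A @ x, b))          # True
    print(eigenvalues.round(3))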
2. Eigenvalues
Throughout this paper, V denotes a non-zero, finite-dimensional complex vector space of dimension n, and T denotes a linear operator on V.
Theorem 2.1 Every linear operator on a non-zero, finite-dimensional complex vector space has an eigenvalue.
Proof. To show that T (our linear operator on V) has an eigenvalue, fix any non-
zero vector v ∈ V . The vectors v, T v, T^2 v, . . . , T^n v cannot be linearly independent,
because V has dimension n and we have n + 1 vectors. Thus there exist complex
numbers a0 , . . . , an , not all 0, such that
a0 v + a1 T v + · · · + an T^n v = 0.
Make the a's the coefficients of a polynomial, which can be written in factored form
as
a0 + a1 z + · · · + an z^n = c(z − r1 ) . . . (z − rm ),
where c is a non-zero complex number, each rj is complex, and the equation holds
for all complex z. We then have
0 = (a0 I + a1 T + · · · + an T^n )v
= c(T − r1 I) . . . (T − rm I)v,
which means that T − rj I is not injective for at least one j. In other words, T has
an eigenvalue.
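The proof is essentially an algorithm. The sketch below is not from the paper; it assumes Python with numpy, and the matrix, the starting vector, and the seed are arbitrary. It carries out the same steps numerically: find a linear dependence among v, T v, . . . , T^n v, factor the resulting polynomial, and observe that one of its roots makes T − rI non-injective.

    import numpy as np

    n = 4
    rng = np.random.default_rng(1)
    T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    v = rng.standard_normal(n)

    # The n + 1 vectors v, Tv, T^2 v, ..., T^n v as the columns of an n x (n + 1) matrix.
    K = np.column_stack([np.linalg.matrix_power(T, k) @ v for k in range(n + 1)])

    # A non-zero null vector of K supplies coefficients a_0, ..., a_n with
    # a_0 v + a_1 Tv + ... + a_n T^n v = 0.
    a = np.linalg.svd(K)[2][-1].conj()
    assert np.allclose(K @ a, 0)

    # Roots r_1, ..., r_m of a_0 + a_1 z + ... + a_n z^n (np.roots wants the leading coefficient first).
    roots = np.roots(a[::-1])

    # At least one root r makes T - rI non-injective, i.e. r is an eigenvalue of T.
    smallest = min(np.linalg.svd(T - r * np.eye(n), compute_uv=False)[-1] for r in roots)
    print(smallest)   # approximately 0

Note that np.roots itself finds polynomial roots as eigenvalues of a companion matrix, so this is only an illustration of the argument, not an independent numerical method.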
3. Generalized eigenvectors
Unfortunately, the eigenvectors of T need not span V . For example, the linear
operator on C2 whose matrix is
0 1
0 0
has only one eigenvalue, namely 0, and its eigenvectors form a one-dimensional
subspace of C2 . We will see, however, that the generalized eigenvectors (defined
below) of T always span V .
A vector v ∈ V is called a generalized eigenvector of T if
(T − λI)k v = 0
for some eigenvalue λ of T and some positive integer k. Obviously, the set of
generalized eigenvectors of T corresponding to an eigenvalue λ is a subspace of V .
The following lemma shows that in the definition of generalized eigenvector, instead
of allowing an arbitrary power of T − λI to annihilate v, we could have restricted
attention to the nth power, where n equals the dimension of V . As usual, ker is an
abbreviation for kernel (the set of vectors that get mapped to 0).
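A quick check of the example that opened this section, not from the paper, assuming Python with numpy: for the 2 × 2 matrix displayed above, the eigenvectors of the eigenvalue 0 form only a line, but (T − 0I)^2 = 0, so every vector of C^2 is a generalized eigenvector.

    import numpy as np

    T = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    print(2 - np.linalg.matrix_rank(T))   # 1: ker T, the eigenvectors for 0, is one-dimensional
    print(np.allclose(T @ T, 0))          # True: (T - 0I)^2 annihilates every vector in C^2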
are linearly independent vectors; we will then have k linearly independent elements
in an n-dimensional space, which implies that k ≤ n.
To prove the vectors in (3.2) are linearly independent, suppose a0 , . . . , ak−1 are
complex numbers such that
The next result is the key tool we’ll use to give a description of the structure of
a linear operator.
V1 ∩ V2 = {0}. (3.6)
Because V1 and V2 are the kernel and range of a linear operator on V , we have
a1 v1 + · · · + am vm = 0. (3.9)
Let k be the smallest positive integer such that (T − λ1 I)^k v1 = 0. Apply the linear
operator
(T − λ1 I)^{k−1} (T − λ2 I)^n . . . (T − λm I)^n
to both sides of (3.9), getting
a1 (T − λ1 I)^{k−1} (T − λ2 I)^n . . . (T − λm I)^n v1 = 0, (3.10)
where we have used Lemma 3.1. If we rewrite (T − λ2 I)^n . . . (T − λm I)^n in (3.10) as
[(T − λ1 I) + (λ1 − λ2 )I]^n . . . [(T − λ1 I) + (λ1 − λm )I]^n ,
and then expand each [(T − λ1 I) + (λ1 − λj )I]^n using the binomial theorem and
multiply everything together, we get a sum of terms. Except for the term
(λ1 − λ2 )^n . . . (λ1 − λm )^n I,
each term in this sum includes a power of (T − λ1 I), which when combined with
the (T − λ1 I)^{k−1} on the left and the v1 on the right in (3.10) gives 0. Hence (3.10)
becomes the equation
a1 (λ1 − λ2 )^n . . . (λ1 − λm )^n (T − λ1 I)^{k−1} v1 = 0.
Thus a1 = 0. In a similar fashion, aj = 0 for each j, as desired.
Now we can pull everything together into the following structure theorem. Part
(b) allows us to interpret each linear transformation appearing in parts (c) and (d)
as a linear operator from Uj to itself.
Theorem 3.11 Let λ1 , . . . , λm be the distinct eigenvalues of T , with U1 , . . . , Um
denoting the corresponding sets of generalized eigenvectors. Then
(a) V = U1 ⊕ · · · ⊕ Um ;
(b) T maps each Uj into itself;
(c) each (T − λj I)|Uj is nilpotent;
(d) each T |Uj has only one eigenvalue, namely λj .
Proof. The proof of (a) follows immediately from Propositions 3.8 and 3.4.
To prove (b), suppose v ∈ Uj . Then (T − λj I)k v = 0 for some positive integer k.
We have
(T − λj I)k T v = T (T − λj I)k v = T (0) = 0.
Thus T v ∈ Uj , as desired.
The proof of (c) follows immediately from the definition of a generalized eigen-
vector and Lemma 3.1.
To prove (d), let λ be an eigenvalue of T |Uj with corresponding non-zero eigen-
vector v ∈ Uj . Then (T − λj I)v = (λ − λj )v, and hence
(T − λj I)k v = (λ − λj )k v
for each positive integer k. Because v is a generalized eigenvector of T corresponding
to λj , the left-hand side of this equation is 0 for some k. Thus λ = λj , as desired.
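Here is a numerical illustration of parts (a) and (b), not from the paper; it assumes Python with numpy, and the sample matrix and the kernel helper are ad hoc. Each Uj is computed as ker (T − λj I)^n; the dimensions add up to dim V, and T maps each Uj into itself.

    import numpy as np

    def kernel(M, tol=1e-8):
        """Orthonormal basis (as columns) of ker M, read off from the SVD."""
        _, s, Vh = np.linalg.svd(M)
        rank = int(np.sum(s > tol))
        return Vh[rank:].conj().T

    n = 3
    T = np.array([[2.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 5.0]])       # distinct eigenvalues 2 and 5

    U1 = kernel(np.linalg.matrix_power(T - 2 * np.eye(n), n))   # generalized eigenvectors for 2
    U2 = kernel(np.linalg.matrix_power(T - 5 * np.eye(n), n))   # generalized eigenvectors for 5

    print(U1.shape[1], U2.shape[1])                    # 2 1: the dimensions add up to dim V = 3
    print(np.linalg.matrix_rank(np.hstack([U1, U2])))  # 3: together they span V, as in part (a)
    print(np.allclose(T @ U1, U1 @ (U1.T @ T @ U1)))   # True: T maps U1 into itself, as in part (b)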
6. Upper-Triangular Form
A square matrix is called upper-triangular if all the entries below the main diago-
nal are 0. Our next goal is to show that each linear operator has an upper-triangular
matrix for some choice of basis. We’ll begin with nilpotent operators; our main
structure theorem will then easily give the result for arbitrary linear operators.
Lemma 6.1 Suppose T is nilpotent. Then there is a basis of V with respect to
which the matrix of T contains only 0’s on and below the main diagonal.
Proof. First choose a basis of ker T . Then extend this to a basis of ker T^2 . Then
extend to a basis of ker T^3 . Continue in this fashion, eventually getting a basis of V .
The matrix of T with respect to this basis clearly has the desired form.
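The same construction can be carried out in coordinates. The sketch below is not from the paper; it assumes Python with numpy, and the nilpotent matrix, the random seed, and the kernel helper are ad hoc. It assembles a basis adapted to ker N ⊆ ker N^2 ⊆ · · · and checks that the matrix of N in that basis is 0 on and below the main diagonal.

    import numpy as np

    def kernel(M, tol=1e-8):
        """Orthonormal basis (as columns) of ker M, read off from the SVD."""
        _, s, Vh = np.linalg.svd(M)
        rank = int(np.sum(s > tol))
        return Vh[rank:].conj().T

    rng = np.random.default_rng(2)
    P = rng.standard_normal((4, 4))
    R = np.triu(rng.standard_normal((4, 4)), k=1)   # strictly upper-triangular, hence nilpotent
    N = P @ R @ np.linalg.inv(P)                    # nilpotent, but not triangular as given

    # Collect a basis of ker N, extend it to a basis of ker N^2, and so on.
    basis = np.zeros((4, 0))
    k = 1
    while basis.shape[1] < 4:
        for col in kernel(np.linalg.matrix_power(N, k)).T:
            if np.linalg.matrix_rank(np.column_stack([basis, col])) > basis.shape[1]:
                basis = np.column_stack([basis, col])
        k += 1

    M = np.linalg.inv(basis) @ N @ basis            # the matrix of N with respect to the new basis
    print(np.allclose(np.tril(M), 0, atol=1e-8))    # True: only 0's on and below the main diagonal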
By avoiding determinants and focusing on generalized eigenvectors, we can now
give a simple proof that every linear operator can be put in upper-triangular form.
We actually get a better result, because the matrix in the next theorem has many
more 0’s than required for upper-triangular form.
Theorem 6.2 Let λ1 , . . . , λm be the distinct eigenvalues of T . Then there is a
basis of V with respect to which the matrix of T has the form
\[
\begin{pmatrix}
\begin{matrix} \lambda_1 & & \ast \\ & \ddots & \\ 0 & & \lambda_1 \end{matrix} & & & \text{\huge 0} \\
& \begin{matrix} \lambda_2 & & \ast \\ & \ddots & \\ 0 & & \lambda_2 \end{matrix} & & \\
& & \ddots & \\
\text{\huge 0} & & & \begin{matrix} \lambda_m & & \ast \\ & \ddots & \\ 0 & & \lambda_m \end{matrix}
\end{pmatrix} .
\]
Proof. This follows immediately from Theorem 3.11 and Lemma 6.1.
For many traditional uses of the Jordan form, the theorem above can be used
instead. If Jordan form really is needed, then many standard proofs show (without
determinants) that every nilpotent operator can be put in Jordan form. The result
for general linear operators then follows from Theorem 3.11.
8. Getting Real
So far we have been dealing only with complex vector spaces. As we’ll see, a
real vector space U can be embedded, in a natural way, in a complex vector space
called the complexification of U . Each linear operator on U can be extended to a
linear operator on the complexification of U . Our results about linear operators on
complex vector spaces can then be translated to information about linear operators
on real vector spaces. Let’s see how this process works.
Suppose that U is a real vector space. As a set, the complexification of U , denoted
UC , equals U × U . Formally, a typical element of UC is an ordered pair (u, v), where
u, v ∈ U , but we will write this as u + iv, for obvious reasons. We define addition
on UC by
(u1 + iv1 ) + (u2 + iv2 ) = (u1 + u2 ) + i(v1 + v2 ).
Note that any orthonormal basis of the real inner product space U is also an or-
thonormal basis of the complex inner product space UC .
If S is a self-adjoint operator on U , then obviously SC is self-adjoint on UC . We
can then apply the complex Spectral Theorem (Theorem 7.5) to SC and transfer to
U , getting the real Spectral Theorem. The next theorem gives the formal statement
of the result and the details of the proof.
Theorem 8.3 Suppose U is a real inner product space and S is a linear operator
on U . Then there is an orthonormal basis of U consisting of eigenvectors of S if
and only if S is self-adjoint.
Proof. To prove the easy direction, first suppose that there is an orthonormal basis
of U consisting of eigenvectors of S. With respect to that basis, S has a diagonal
matrix. Clearly, the matrix of S ∗ (with respect to the same basis) equals the matrix
of S. Thus S is self-adjoint.
To prove the other direction, now suppose that S is self-adjoint. As noted above,
this implies that SC is self-adjoint on UC . Thus there is a basis
{u1 + iv1 , . . . , un + ivn } (8.4)
of UC consisting of eigenvectors of SC (by the complex Spectral Theorem, which is
Theorem 7.5); here each uj and vj is in U . Each eigenvalue of SC is real (Proposition
7.6), and thus each uj and each vj is an eigenvector of S. Clearly, {u1 , v1 , . . . , un , vn }
spans U (because (8.4) is a basis of UC ). Conclusion: The eigenvectors of S span U .
For each eigenvalue of S, choose an orthonormal basis of the associated set of
eigenvectors in U . The union of these bases (one for each eigenvalue) is still or-
thonormal, because eigenvectors corresponding to distinct eigenvalues are orthogo-
nal (Proposition 7.4). The span of this union includes every eigenvector of S (by
construction). We have just seen that the eigenvectors of S span U , and so we have
an orthonormal basis of U consisting of eigenvectors of S, as desired.
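A numerical illustration, not from the paper, assuming Python with numpy and an arbitrary random symmetric matrix: for a self-adjoint operator on R^5, the library routine eigh returns exactly the kind of orthonormal basis of eigenvectors that the theorem promises.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 5))
    S = A + A.T                               # a self-adjoint operator on R^5

    w, Q = np.linalg.eigh(S)                  # eigenvalues w, eigenvectors as the columns of Q
    print(np.allclose(Q.T @ Q, np.eye(5)))    # True: the eigenvector basis is orthonormal
    print(np.allclose(S @ Q, Q * w))          # True: column j of Q is an eigenvector for w[j]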
9. Determinants
At this stage we have proved most of the major structure theorems of linear
algebra without even defining determinants. In this section we will give a simple
definition of determinants, whose main reasonable use in undergraduate mathemat-
ics is in the change of variables formula for multi-variable integrals.
The constant term of the characteristic polynomial of T is plus or minus the
product of the eigenvalues of T , counting multiplicity (this is obvious from our
definition of the characteristic polynomial). Let’s look at some additional motivation
for studying the product of the eigenvalues.
Suppose we want to know how to make a change of variables in a multi-variable
integral over some subset of Rn . After linearization, this reduces to the question
of how a linear operator S on Rn changes volumes. Let’s consider the special
case where S is self-adjoint. Then there is an orthonormal basis of Rn consisting of
eigenvectors of S (by the real Spectral Theorem, which is Theorem 8.3). A moment’s
thought about the geometry of an orthonormal basis of eigenvectors shows that if
E is a subset of Rn , then the volume (whatever that means) of S(E) must equal
the volume of E multiplied by the absolute value of the product of the eigenvalues
of S, counting multiplicity. We’ll prove later that a similar result holds even for
non-self-adjoint operators. At any rate, we see that the product of the eigenvalues
seems to be an interesting object. An arbitrary linear operator on a real vector
space need not have any eigenvalues, so we will return to our familiar setting of
a linear operator T on a complex vector space V . After getting the basic results
on complex vector spaces, we’ll deal with real vector spaces by using the notion of
complexification discussed earlier.
Now we are ready for the formal definition. The determinant of T , denoted
det T , is defined to be the product of the eigenvalues of T , counting multiplicity.
This definition would not be possible with the traditional approach to eigenvalues,
because that method uses determinants to prove that eigenvalues exist. With the
techniques used here, we already know (by Theorem 3.11(a)) that T has dim V
eigenvalues, counting multiplicity. Thus our simple definition makes sense.
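A quick sanity check of this definition, not from the paper, assuming Python with numpy and an arbitrary random matrix: numpy computes det by quite different means (an LU factorization), yet the value agrees with the product of the eigenvalues, counting multiplicity.

    import numpy as np

    rng = np.random.default_rng(4)
    T = rng.standard_normal((6, 6))
    product = np.prod(np.linalg.eigvals(T))        # the product of the eigenvalues of T
    print(np.isclose(product, np.linalg.det(T)))   # True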
In addition to simplicity, our definition also makes transparent the following
result, which is not at all obvious from the standard definition.
Theorem 9.1 An operator is invertible if and only if its determinant is non-zero.
Proof. Clearly, T is invertible if and only if 0 is not an eigenvalue of T , and this
happens if and only if det T ≠ 0.
With our definition of determinant and characteristic polynomial, we see immedi-
ately that the constant term of the characteristic polynomial of T equals (−1)n det T ,
where n = dim V . The next result shows that even more is true—our definitions
are consistent with the usual ones.
Proposition 9.2 The characteristic polynomial of T equals det(zI − T ).
Proof. Let λ1 , . . . , λm denote the eigenvalues of T , with multiplicities β1 , . . . , βm .
Thus for z ∈ C, the eigenvalues of zI − T are z − λ1 , . . . , z − λm , with multiplicities
β1 , . . . , βm . Hence the determinant of zI − T is the product
(z − λ1 )^{β1} . . . (z − λm )^{βm} ,
which equals the characteristic polynomial of T .
Note that determinant is a similarity invariant. In other words, if S is an invertible
linear operator on V , then T and ST S −1 have the same determinant (because they
have the same eigenvalues, counting multiplicity).
We define the determinant of a square matrix of complex numbers to be the
determinant of the corresponding linear operator (with respect to some choice of
basis, which doesn’t matter, because two different bases give rise to two linear
operators that are similar and hence have the same determinant). Fix a basis of
V , and for the rest of this section let’s identify linear operators on V with matrices
with respect to that basis. How can we find the determinant of T from its matrix,
without finding all the eigenvalues? Although getting the answer to that question
will be hard, the method used below will show how someone might have discovered
the formula for the determinant of a matrix. Even with the derivation that follows,
determinants are difficult, which is precisely why they should be avoided.
We begin our search for a formula for the determinant by considering matrices
of a special form. Let a1 , . . . , an ∈ C. Consider a linear operator T whose matrix is
\[
\begin{pmatrix}
0 & & & & a_n \\
a_1 & 0 & & & \\
& a_2 & 0 & & \\
& & \ddots & \ddots & \\
& & & a_{n-1} & 0
\end{pmatrix} ; \tag{9.3}
\]
here all entries of the matrix are 0 except for the upper right-hand corner and
along the line just below the main diagonal. Let’s find the determinant of T . Note
that T^n = a1 . . . an I. Because the first columns of {I, T, . . . , T^{n−1} } are linearly
independent (assuming that none of the aj is 0), no polynomial of degree less than
n can annihilate T . Thus z^n − a1 . . . an is the minimal polynomial of T . Hence
z^n − a1 . . . an is also the characteristic polynomial of T . Thus
det T = (−1)^{n−1} a1 . . . an .
(If some aj is 0, then clearly T is not invertible, so det T = 0, and the same formula
holds.)
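A numerical check of this formula, not from the paper, assuming Python with numpy; the particular values a_1, . . . , a_4 are arbitrary.

    import numpy as np

    a = np.array([2.0, 3.0, 5.0, 7.0])                  # a_1, ..., a_n with n = 4
    n = len(a)
    T = np.zeros((n, n))
    T[0, n - 1] = a[n - 1]                              # a_n in the upper right-hand corner
    T[np.arange(1, n), np.arange(n - 1)] = a[:n - 1]    # a_1, ..., a_{n-1} just below the main diagonal

    print(np.linalg.det(T))                             # approximately -210
    print((-1) ** (n - 1) * np.prod(a))                 # -210.0, the value (-1)^{n-1} a_1 ... a_n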
Now let τ be a permutation of {1, . . . , n}, and consider a matrix T whose j th
column consists of all zeroes except for aj in the τ (j)th row. The permutation τ
is a product of cyclic permutations. Thus T is similar to (and so has the same
determinant as) a block diagonal matrix where each block of size greater than one
has the form of (9.3). The determinant of a block diagonal matrix is obviously the
product of the determinants of the blocks, and we know from the last paragraph
how to compute those. Thus we see that det T = (sign τ )a1 . . . an . To put this into
a form that does not depend upon the particular permutation τ , let ti,j denote the
entry in row i, column j, of T (so ti,j = 0 unless i = τ (j)), and let P (n) denote the
set of all permutations of {1, . . . , n}. Then
det T = Σ_{π ∈ P (n)} (sign π) t_{π(1),1} . . . t_{π(n),n} .   (9.4)
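Formula (9.4) can be evaluated directly for small matrices. The sketch below is not from the paper; it assumes Python with numpy and itertools, and the random matrix and the helper names are ad hoc. It compares the permutation sum with the product of the eigenvalues, that is, with det T as defined in this section; the two agree for a general matrix as well, which is where this derivation is headed.

    import itertools
    import numpy as np

    def sign(p):
        """Sign of the permutation p (a tuple), computed by counting inversions."""
        inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def det_by_permutations(T):
        n = T.shape[0]
        return sum(sign(p) * np.prod([T[p[j], j] for j in range(n)])
                   for p in itertools.permutations(range(n)))

    rng = np.random.default_rng(5)
    T = rng.standard_normal((4, 4))
    print(det_by_permutations(T))               # the sum (9.4)
    print(np.prod(np.linalg.eigvals(T)).real)   # the product of the eigenvalues: the same number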
The entries on the diagonal, namely the eigenvalues of S∗S, are all non-negative, as
we have just seen. The square root of S∗S, denoted √(S∗S), is the linear operator on
U corresponding to the diagonal matrix obtained by taking the non-negative square
root of each entry of the matrix of S∗S. Obviously, √(S∗S) is self-adjoint, and its
square equals S∗S. Also, the multiplicativity of det shows that
(det √(S∗S))^2 = det(S∗S) = (det S∗)(det S) = (det S)^2 .
Thus det √(S∗S) = |det S| (because det √(S∗S) must be non-negative).
The next lemma provides the tool we will use to reduce the question of volume
change by a linear operator to the self-adjoint case. It is called the polar
decomposition of an operator S, because it resembles the polar decomposition of a
complex number z = e^{iθ} r. Here r equals √(z̄z) (analogous to √(S∗S) in the lemma),
and multiplication by e^{iθ} is an isometry on C (analogous to the isometric property
of A in the lemma).
Lemma 9.6 Let S be a linear operator on a real inner product space U . Then
there exists a linear isometry A on U such that S = A √(S∗S).
Proof. For u ∈ U we have
‖√(S∗S)u‖^2 = ⟨√(S∗S)u, √(S∗S)u⟩ = ⟨S∗Su, u⟩ = ⟨Su, Su⟩ = ‖Su‖^2 .
In other words, ‖√(S∗S)u‖ = ‖Su‖. Thus the function A defined on ran √(S∗S) by
A(√(S∗S)u) = Su is well defined and is a linear isometry from ran √(S∗S) onto ran S.
Extend A to a linear isometry of U onto U by first extending A to be any isometry
of (ran √(S∗S))⊥ onto (ran S)⊥ (these two spaces have the same dimension, because
we have just seen that there is a linear isometry of ran √(S∗S) onto ran S), and then
extend A to all of U by linearity (with the Pythagorean Theorem showing that A
is an isometry on all of U ). The construction of A shows that S = A √(S∗S), as
desired.
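A numerical sketch of the lemma, not from the paper, assuming Python with numpy and an arbitrary invertible random matrix: √(S∗S) is formed from an orthonormal eigenbasis of S∗S (on R^4 the adjoint S∗ is the transpose), exactly as described before the lemma, and A is recovered as S(√(S∗S))^{-1}.

    import numpy as np

    rng = np.random.default_rng(6)
    S = rng.standard_normal((4, 4))

    w, Q = np.linalg.eigh(S.T @ S)               # S*S is self-adjoint with non-negative eigenvalues
    w = np.clip(w, 0.0, None)                    # guard against tiny negative rounding errors
    sqrt_StS = Q @ np.diag(np.sqrt(w)) @ Q.T     # the square root of S*S

    A = S @ np.linalg.inv(sqrt_StS)              # assumes S (and hence sqrt_StS) is invertible
    print(np.allclose(A.T @ A, np.eye(4)))       # True: A is an isometry
    print(np.allclose(A @ sqrt_StS, S))          # True: S = A sqrt(S*S)
    print(np.isclose(np.linalg.det(sqrt_StS), abs(np.linalg.det(S))))   # True: det sqrt(S*S) = |det S|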
Now we are ready to give a clean, illuminating proof that a linear operator changes
volumes by a factor of the absolute value of the determinant. We will not formally
define volume, but only use the obvious properties that volume should satisfy. In
particular, the subsets E of Rn considered in the theorem below should be restricted
to whatever class the reader uses most comfortably (polyhedrons, open sets, or
measurable sets).
Theorem 9.7 Let S be a linear operator on Rn . Then
vol S(E) = |det S| vol E
for E ⊂ Rn .
Proof. Let S = A √(S∗S) be the polar decomposition of S as given by Lemma 9.6.
Let E ⊂ Rn . Because A is an isometry, it does not change volumes. Thus
vol S(E) = vol A√(S∗S)(E) = vol √(S∗S)(E).
But √(S∗S) is self-adjoint, and we already noted at the beginning of this section that
each self-adjoint operator changes volume by a factor equal to the absolute value of
the determinant. Thus we have
vol S(E) = vol √(S∗S)(E) = |det √(S∗S)| vol E = |det S| vol E,
as desired.
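A Monte Carlo illustration of the theorem, not from the paper, assuming Python with numpy; the matrix S, the sample size, and the bounding box are arbitrary. For E the unit square in R^2, the estimated area of S(E) matches |det S| vol E.

    import numpy as np

    S = np.array([[2.0, 1.0],
                  [0.5, 3.0]])
    rng = np.random.default_rng(7)

    # Sample a box containing S(E) and count the points whose preimage lies in E = [0, 1]^2.
    lo, hi = np.array([-1.0, -1.0]), np.array([4.0, 5.0])
    points = rng.uniform(lo, hi, size=(200_000, 2))
    preimages = points @ np.linalg.inv(S).T
    inside = np.all((preimages >= 0) & (preimages <= 1), axis=1)

    print(inside.mean() * np.prod(hi - lo))   # approximately 5.5, the estimated area of S(E)
    print(abs(np.linalg.det(S)))              # 5.5 = |det S| vol E, since vol E = 1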
10. Conclusion
As mathematicians, we often read a nice new proof of a known theorem, enjoy
the different approach, but continue to derive our internal understanding from the
method we originally learned. This paper aims to change drastically the way math-
ematicians think about and teach crucial aspects of linear algebra. The simple proof
of the existence of eigenvalues given in Theorem 2.1 should be the one imprinted
in our minds, written on our blackboards, and published in our textbooks. Gen-
eralized eigenvectors should become a central tool for the understanding of linear
operators. As we have seen, their use leads to natural definitions of multiplicity
and the characteristic polynomial. Every mathematician and every linear algebra
student should at least remember that the generalized eigenvectors of an operator
always span the domain (Proposition 3.4)—this crucial result leads to easy proofs
of upper-triangular form (Theorem 6.2) and the Spectral Theorem (Theorems 7.5
and 8.3).
Determinants appear in many proofs not discussed here. If you scrutinize such
proofs, you’ll often discover better alternatives without determinants. Down with
Determinants!
Department of Mathematics
Michigan State University
East Lansing, MI 48824 USA