David M. Rocke
Department of Applied Science
UC Davis
1. Overview
A canonical form of a linear transformation is a matrix representation in a basis chosen to
make that representation simple in form. The most common canonical form is a diagonal
matrix. In this section of the course, we explore canonical forms with three main types of
results:
1. For real symmetric or complex Hermitian matrices, we show that these are always
diagonalizable; i.e., there exists a basis wrt which the matrix representation is diagonal.
2. For general linear transformations, we show that a transformation is diagonalizable if and
only if its minimal polynomial is a product of distinct linear factors.
3. For any linear transformation for which the characteristic polynomial factors
completely (this is all linear transformations if the field is C), there is a matrix
representation in Jordan canonical form.
Most of this will be shown directly in class, assuming the standard facts about real and
complex numbers and solution and factoring of polynomials. The Cayley-Hamilton theorem
is given without proof, as this would be too extensive.
Theorem 1 Suppose T ∈ L(V), with V finite dimensional. Then the following are equivalent:
1. λ is an eigenvalue of T.
2. T − λI is singular and therefore not invertible.
3. |T − λI| = 0.
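As a small numerical illustration of Theorem 1, the following Python sketch checks that |A − λI| vanishes at an eigenvalue and not at a non-eigenvalue. The matrix A and the helper names are assumptions chosen for illustration; they are not part of the course material.

```python
def det_shifted_2x2(A, lam):
    """Determinant of A - lam*I for a 2x2 matrix A given as a list of rows."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c

def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Assumed example: a real symmetric matrix with eigenvalues 1 and 3.
A = [[2, 1], [1, 2]]
x = [1, 1]  # eigenvector for the eigenvalue 3

print(matvec(A, x) == [3 * xi for xi in x])  # statement 1: A x = 3 x
print(det_shifted_2x2(A, 3))                 # statement 3 at the eigenvalue 3
print(det_shifted_2x2(A, 1))                 # statement 3 at the eigenvalue 1
print(det_shifted_2x2(A, 2))                 # 2 is not an eigenvalue
```

The determinant is zero exactly at λ = 1 and λ = 3, and nonzero at λ = 2, in line with the equivalence of statements 1 and 3.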
\[
\begin{aligned}
|B - uI| &= |P^{-1}AP - uI| \\
&= |P^{-1}AP - uP^{-1}P| \\
&= |P^{-1}(A - uI)P| \\
&= |P^{-1}|\,|A - uI|\,|P| \\
&= |A - uI|,
\end{aligned}
\]
since |P^{-1}| = |P|^{-1}.
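The invariance of the characteristic polynomial under a change of basis can be checked on a small case. The sketch below uses exact rational arithmetic; the matrices A and P and all function names are assumed for illustration only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(P):
    """Exact inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of |M - uI| = u^2 - tr(M) u + det(M)."""
    (a, b), (c, d) = M
    return (Fraction(1), -(a + d), a * d - b * c)

# Assumed example matrices; P has determinant -1, so it is invertible.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
P = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(5)]]
B = matmul(matmul(inverse_2x2(P), A), P)  # B = P^{-1} A P

print(B != A)                                # the two matrices differ
print(char_poly_2x2(A) == char_poly_2x2(B))  # but |A - uI| = |B - uI|
```

Here A and B have different entries but the same trace and determinant, hence the same characteristic polynomial.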
A linear transformation T is diagonalizable if there exists a basis wrt which the matrix
representation of T is diagonal, or, equivalently, if there is a basis of the whole vector space
consisting of eigenvectors of T. If T has a diagonal representation, and λ_1, λ_2, ..., λ_k are
the distinct eigenvalues, then T's matrix representation consists of k diagonal blocks, of
which the ith block is the d_i × d_i diagonal matrix λ_i I, and the characteristic polynomial
of T is (u − λ_1)^{d_1} (u − λ_2)^{d_2} ··· (u − λ_k)^{d_k}. For diagonalizable transformations, the
dimension of the eigenspace of λ_i is d_i.
Proof. Let (λ_1, x_1), (λ_2, x_2), ..., (λ_n, x_n) be the distinct eigenvalues and an associated
eigenvector for each. We show that {x_1, x_2, ..., x_n} is a linearly independent set. Then this
must be a basis of V since the space spanned by them is of dimension n and must therefore
be the whole space.
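Suppose, on the contrary, that
\[ \sum_{i=1}^{n} a_i x_i = 0 \tag{2.1} \]
for some set of constants a_i not all zero. Without loss of generality, let a_1 ≠ 0. Applying T
to (2.1), we get
\[ \sum_{i=1}^{n} a_i \lambda_i x_i = 0, \]
so that
\[
\begin{aligned}
0 &= \lambda_n \sum_{i=1}^{n} a_i x_i - \sum_{i=1}^{n} a_i \lambda_i x_i \\
&= \sum_{i=1}^{n} a_i (\lambda_n - \lambda_i) x_i \\
&= \sum_{i=1}^{n-1} a_i (\lambda_n - \lambda_i) x_i .
\end{aligned}
\]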
We can then apply T again to this relation, subtract the result from λ_{n−1} times the relation,
and continue the process to obtain
\[ a_1 (\lambda_n - \lambda_1)(\lambda_{n-1} - \lambda_1) \cdots (\lambda_2 - \lambda_1)\, x_1 = 0. \]
Since x_1 ≠ 0 and all the eigenvalues are distinct by hypothesis, this means that a_1 = 0, a
contradiction. Thus, the x_i are linearly independent.
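The elimination step used in this proof can be traced on a 2 × 2 case. The sketch below takes a combination a_1 x_1 + a_2 x_2 of eigenvectors, applies λ_2 I − T, and confirms that only the multiple a_1 (λ_2 − λ_1) x_1 survives. The matrix, eigenpairs, and coefficients are assumed values for illustration.

```python
def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Assumed example with distinct eigenvalues 2 and 3.
A = [[2, 1], [0, 3]]
lam1, x1 = 2, [1, 0]
lam2, x2 = 3, [1, 1]

# Both pairs really are eigenpairs of A.
print(matvec(A, x1) == [lam1 * c for c in x1])
print(matvec(A, x2) == [lam2 * c for c in x2])

# Form v = a1*x1 + a2*x2 and compute lam2*v - A v, as in the proof.
a1, a2 = 5, 7
v = [a1 * p + a2 * q for p, q in zip(x1, x2)]
Tv = matvec(A, v)
eliminated = [lam2 * vi - tvi for vi, tvi in zip(v, Tv)]
expected = [a1 * (lam2 - lam1) * p for p in x1]
print(eliminated == expected)  # the x2 term has been removed

# Independence of x1 and x2: the matrix with these columns has nonzero determinant.
det = x1[0] * x2[1] - x1[1] * x2[0]
print(det != 0)
```

If v were zero with a_1 ≠ 0, the surviving term a_1 (λ_2 − λ_1) x_1 would also have to be zero, which is impossible when λ_1 ≠ λ_2.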
Proof. Suppose that (λ_1, x_1) and (λ_2, x_2) are eigenvalue/eigenvector pairs for a Hermitian
matrix A, and suppose that λ_1 ≠ λ_2. Then A x_1 = λ_1 x_1 and so x_2^* A x_1 = λ_1 x_2^* x_1 (multiplying
on the left by x_2^*). Also, A x_2 = λ_2 x_2, so x_2^* A^* = x_2^* A = λ_2 x_2^* (using the definition of
Hermitian and the previous lemma), and thus x_2^* A x_1 = λ_2 x_2^* x_1 (multiplying on the right
by x_1). This in turn implies that λ_1 x_2^* x_1 = λ_2 x_2^* x_1. Since λ_1 ≠ λ_2, x_2^* x_1 = 0; that is, the
two vectors are orthogonal. The proof for symmetric matrices is the same.
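A concrete complex example may help here. The sketch below takes an assumed 2 × 2 Hermitian matrix with eigenvalues 3 and 1, verifies the two eigenpairs, and checks that x_2^* x_1 = 0. The matrix and helper names are illustrative assumptions.

```python
def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def star_dot(x, y):
    """Return x^* y: conjugate the entries of x, then take the dot product with y."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

# Assumed example: A equals its own conjugate transpose.
A = [[2 + 0j, 1j], [-1j, 2 + 0j]]
is_hermitian = all(A[i][j] == A[j][i].conjugate()
                   for i in range(2) for j in range(2))

x1 = [1j, 1 + 0j]    # eigenvector for lambda_1 = 3
x2 = [-1j, 1 + 0j]   # eigenvector for lambda_2 = 1

print(is_hermitian)
print(matvec(A, x1) == [3 * c for c in x1])
print(matvec(A, x2) == [1 * c for c in x2])
print(star_dot(x2, x1) == 0)  # eigenvectors for distinct eigenvalues are orthogonal
```

Note that the plain dot product without conjugation would give 2 here, so the conjugate in x_2^* is essential in the complex case.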
Proof. Let A be Hermitian and let (λ_1, x_1) be an eigenvalue/eigenvector pair. Construct
an orthonormal basis of V consisting of y_1 = x_1/||x_1||, y_2, ..., y_n. The basis transformation
into this new coordinate system has a matrix P satisfying P^* P = I because the new basis
is orthonormal. Let B be the matrix in the new coordinate system, so that B = P^* A P.
Note that B is still Hermitian because B^* = (P^* A P)^* = P^* A^* P = P^* A P = B. Also,
B has the same eigenvalues as A since it has the same characteristic polynomial, and
y_1 is an eigenvector because y_1 in the new coordinate system is P^* x_1/||x_1|| and B y_1 =
P^* A P P^* x_1/||x_1|| = P^* A x_1/||x_1|| = λ_1 P^* x_1/||x_1|| = λ_1 y_1. The fact that A y_1 = λ_1 y_1
implies that the matrix representation of the linear transformation in the new basis has a
first column consisting of (λ_1, 0, ..., 0)^T. This is so because we know that the vector with
coordinates (1, 0, ..., 0) is mapped into (λ_1, 0, ..., 0). Now consider a basis vector y_i with
i ≠ 1. B y_i ∈ V, so B y_i = c y_1 + ỹ, where ỹ is a linear combination of y_2, y_3, ..., y_n and is
therefore orthogonal to y_1. Now
\[ y_1^* B y_i = c\, y_1^* y_1 + y_1^* \tilde{y} = c. \]
However, y_1^* B = λ_1 y_1^* so y_1^* B y_i = λ_1 y_1^* y_i = 0. Thus, c = 0. This means that B has a first
row in which all the off-diagonal elements are zero and is of the form
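\[ \begin{pmatrix} \lambda_1 & 0 \\ 0 & C \end{pmatrix}, \]
where C is an (n − 1) × (n − 1) matrix. Since B is Hermitian, so is C, so that C^* = C.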
We now finish the proof by induction on the dimension n. For n = 1, the result is obvious
since 1 × 1 matrices are trivially diagonal. If the result is true for Hermitian matrices of
dimension n − 1, then there is a unitary matrix Q such that Q∗ CQ is diagonal. This implies
that B is diagonalized by the unitary matrix
\[ \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix}. \]
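eigenvalues λ_1, λ_2, ..., λ_k. Then as previously observed, A is of the form
\[
A = \begin{pmatrix}
\lambda_1 I_1 & 0 & \cdots & 0 \\
0 & \lambda_2 I_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_k I_k
\end{pmatrix}
\]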
Lemma 5 Let F ⊂ P be a polynomial ideal. Then there exists a monic (leading coefficient
1) polynomial f with F = ⟨f⟩.
Proof. Let f be a nonzero polynomial in F of smallest degree, and wlog let f be monic.
We show that F = ⟨f⟩. Let g ∈ F. It is sufficient to show that f divides g. By polynomial
division with remainders, we can write g = f q + r, where deg(r) < deg(f). But by
hypothesis, g ∈ F, and f q ∈ F because F is a polynomial ideal, so r ∈ F since F is a vector
subspace of P. But f is by choice a polynomial in F of lowest degree possible for nonzero
polynomials, so r = 0, and g is therefore a multiple of f.
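The division-with-remainder argument can be made concrete. The sketch below implements polynomial division over the rationals and uses Euclid's algorithm to find the monic generator of the ideal generated by two polynomials. The coefficient-list representation (lowest degree first) and all function names are assumptions made for this illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zeros; coefficients are stored lowest degree first."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(g, f):
    """Return (q, r) with g = f*q + r and deg(r) < deg(f); f must be nonzero."""
    g, f = trim(g), trim(f)
    q = [Fraction(0)] * max(len(g) - len(f) + 1, 0)
    r = [Fraction(c) for c in g]
    while len(r) >= len(f):
        shift = len(r) - len(f)
        coeff = r[-1] / Fraction(f[-1])
        q[shift] = coeff
        # Subtract coeff * u^shift * f, which cancels the leading term of r.
        for i, c in enumerate(f):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def monic_generator(f, g):
    """Monic generator of the ideal {a*f + b*g} (f, g not both zero), by Euclid."""
    f, g = trim(f), trim(g)
    while g:
        _, r = poly_divmod(f, g)
        f, g = g, r
    lead = Fraction(f[-1])
    return [Fraction(c) / lead for c in f]

# Assumed example: p1 = (u-1)(u-2) and p2 = (u-1)(u-3).
p1 = [2, -3, 1]
p2 = [3, -4, 1]
gen = monic_generator(p1, p2)

print(gen == [-1, 1])               # the generator is u - 1
print(poly_divmod(p1, gen)[1] == [])  # zero remainder: u - 1 divides p1
print(poly_divmod(p2, gen)[1] == [])  # zero remainder: u - 1 divides p2
```

Each remainder produced by Euclid's algorithm is itself a combination of p1 and p2, so it stays in the ideal; this is the same observation that forces r = 0 in the proof.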
V_1 = ⟨x, T x⟩, V_2 = ⟨x, T x, T^2 x⟩, .... At some point, the sequence of subspaces must stop
increasing in size, since the dimension increases by 1 at each step if the size increases, and
the dimension is limited to n. Thus, at some point T^k x ∈ V_{k−1}, so T^k x is a linear
combination of x, T x, ..., T^{k−1} x.
Lemma 6 Let m(u) be the minimal polynomial of a linear transformation T and let λ be
an eigenvalue of T . Then m(λ) = 0. Conversely, if m(c) = 0, then c is an eigenvalue of T .
This implies that the minimal polynomial must divide the characteristic polynomial. We
now show that T is diagonalizable if and only if the minimal polynomial for T is a product
of distinct linear factors.
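Two small matrices illustrate the criterion. In the sketch below, a diagonal matrix with a repeated eigenvalue is annihilated by a product of distinct linear factors, while a 2 × 2 block with a 1 below the diagonal needs a repeated factor. Both matrices and the helper names are assumed examples.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, c):
    """Return A - c*I."""
    n = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]

def is_zero(M):
    """True when every entry of M is zero."""
    return all(entry == 0 for row in M for entry in row)

# Assumed example 1: diagonal, with characteristic polynomial (u-2)^2 (u-3).
D = [[2, 0, 0], [0, 2, 0], [0, 0, 3]]
# Neither D - 2I nor D - 3I is zero, but their product is, so the minimal
# polynomial is (u-2)(u-3): distinct linear factors.
print(is_zero(shifted(D, 2)), is_zero(shifted(D, 3)))
print(is_zero(matmul(shifted(D, 2), shifted(D, 3))))

# Assumed example 2: the only eigenvalue is 2, with a 1 below the diagonal.
J = [[2, 0], [1, 2]]
N = shifted(J, 2)
# N is nonzero but N^2 = 0, so the minimal polynomial is (u-2)^2: a repeated
# factor. N nonzero also means the eigenspace of 2 has dimension at most 1,
# so no basis of eigenvectors exists.
print(is_zero(N))
print(is_zero(matmul(N, N)))
```

The first example also shows that the minimal polynomial can be a proper divisor of the characteristic polynomial.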
Lemma 7 Let T ∈ L(V), V of finite dimension n, and let m(u) be the minimal polynomial
of T. Suppose that m(u) = (u − c_1)^{r_1} (u − c_2)^{r_2} ··· (u − c_k)^{r_k}. Let W ⊂ V, W ≠ V, and
suppose that W is invariant under T, meaning that if w ∈ W, then T w ∈ W. Then there
exists y ∈ V with y ∉ W and some eigenvalue λ of T such that (T − λI)y ∈ W.
Proof. Fix x ∈ V \ W and consider the set F of polynomials p(u) such that p(T)x ∈ W.
The minimal polynomial m(u) is in F, so the set is non-empty. Also, we can show that F
is a polynomial ideal. First, it is clearly a vector space. Next, if p ∈ F and q ∈ P then
p(T)x ∈ W because p ∈ F, and q(T)[p(T)x] ∈ W because W is invariant under T, and thus
pq ∈ F. Since F is a polynomial ideal, it must be the principal ideal generated by some
monic polynomial g. Since m ∈ F, g|m, so g(u) = (u − c_1)^{e_1} (u − c_2)^{e_2} ··· (u − c_k)^{e_k}, with
e_i ≤ r_i. For at least one i, we must have e_i > 0, say e_j, since g ≠ 1 (otherwise x itself
would lie in W). Then g(u) = (u − c_j)h(u) and h(u) ∉ F because g is a polynomial of minimal
degree in F. Now let y = h(T)x ∉ W. We have (T − c_j I)y = (T − c_j I)h(T)x = g(T)x ∈ W, and
c_j is an eigenvalue of T by Lemma 6 because m(c_j) = 0, as required.
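A minimal instance of Lemma 7 is sketched below, under assumed data: T is a 2 × 2 matrix J with single eigenvalue 2, and W is the span of e_2 = (0, 1). The vector y = e_1 lies outside W, yet (J − 2I)y lands in W.

```python
def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def in_W(v):
    """W is the span of e_2 = (0, 1): the vectors whose first coordinate is zero."""
    return v[0] == 0

# Assumed example: eigenvalue 2, with a 1 below the diagonal.
J = [[2, 0], [1, 2]]
lam = 2
e2 = [0, 1]
y = [1, 0]

print(in_W(matvec(J, e2)))  # W is invariant: J e_2 = 2 e_2 stays in W
print(in_W(y))              # y is not in W
shifted_y = [a - lam * b for a, b in zip(matvec(J, y), y)]  # (J - 2I) y
print(shifted_y, in_W(shifted_y))
```

In the notation of the proof, with x = e_1 the generator is g(u) = u − 2 and h(u) = 1, so y = h(T)x = x.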
Theorem 5 Let T ∈ L(V), with V of finite dimension n. Suppose that the characteristic
polynomial of T factors completely as (u − λ_1)^{d_1} (u − λ_2)^{d_2} ··· (u − λ_k)^{d_k}. Then there exists a
basis wrt which the matrix representation A of T has the following form: A is block diagonal
with blocks A_1, A_2, ..., A_k. Each block corresponds to one eigenvalue and is in turn block
diagonal with blocks B_{i1}, B_{i2}, ..., B_{i n_i}. Each of the B_{ij} has a diagonal consisting of the
eigenvalue λ_i, has 1 on each entry directly below the diagonal, and has zeroes elsewhere.
There is exactly one eigenvector corresponding to each B_{ij}.
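The shape of a single block B_{ij} can be examined directly. The sketch below builds a 3 × 3 block with the convention stated in the theorem (1 directly below the diagonal), checks that N = B − λI is nilpotent of index 3, and confirms that the last standard basis vector is an eigenvector. The eigenvalue 5, the block size, and the function names are assumed for illustration.

```python
def jordan_block(lam, n):
    """n x n block: lam on the diagonal, 1 directly below it, 0 elsewhere."""
    return [[lam if i == j else (1 if i == j + 1 else 0) for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_zero(M):
    """True when every entry of M is zero."""
    return all(entry == 0 for row in M for entry in row)

lam, n = 5, 3
B = jordan_block(lam, n)
N = [[B[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

N2 = matmul(N, N)
N3 = matmul(N2, N)
print(B)
print(is_zero(N2), is_zero(N3))  # N^2 is nonzero, N^3 is zero

e3 = [0, 0, 1]
print(matvec(B, e3) == [lam * c for c in e3])  # B e_3 = 5 e_3

# N maps (a, b, c) to (0, a, b), so N v = 0 forces a = b = 0: the eigenspace
# of this block is spanned by e_3 alone.
print(matvec(N, [7, 8, 9]))
```

Up to scaling, e_3 is the only eigenvector of this block, which matches the last sentence of the theorem.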